Abstract: This paper details the approach of the team Kohrrelation in the 2021 Extreme Value Analysis data challenge, dealing with the prediction of wildfire counts and sizes over the contiguous US. Our approach uses ideas from extreme-value theory in a machine learning context with theoretically justified loss functions for gradient boosting. We devise a spatial cross-validation scheme and show that in our setting it provides a better proxy for test set performance than naive cross-validation. The predictions are benchmarked against boosting approaches with different loss functions, and perform competitively in terms of the score criterion, finally placing second in the competition ranking. PubDate: 2023-01-21
Abstract: We present our submission to the Extreme Value Analysis 2021 Data Challenge in which teams were asked to accurately predict distributions of wildfire frequency and size within spatio-temporal regions of missing data. For this competition, we developed a variant of the powerful variational autoencoder models, which we call Conditional Missing data Importance-Weighted Autoencoder (CMIWAE). Our deep latent variable generative model requires little to no feature engineering and does not necessarily rely on the specifics of scoring in the Data Challenge. It is fully trained on incomplete data, with the single objective to maximize log-likelihood of the observed wildfire information. We mitigate the effects of the relatively low number of training samples by stochastic sampling from a variational latent variable distribution, as well as by ensembling a set of CMIWAE models trained and validated on different splits of the provided data. PubDate: 2023-01-18
Abstract: We build a sharp approximation of the whole distribution of the sum of iid heavy-tailed random vectors, combining mean and extreme behaviors. It extends the so-called ‘normex’ approach from a univariate to a multivariate framework. We propose two possible multi-normex distributions, named d-Normex and MRV-Normex. Both rely on the Gaussian distribution for describing the mean behavior, via the CLT, while the difference between the two versions comes from using the exact distribution or the EV theorem for the maximum. The main theorems provide the rate of convergence for each version of the multi-normex distributions towards the distribution of the sum, assuming second order regular variation property for the norm of the parent random vector when considering the MRV-normex case. Numerical illustrations and comparisons are proposed with various dependence structures on the parent random vector, using QQ-plots based on geometrical quantiles. PubDate: 2023-01-13
Abstract: The Weissman extrapolation methodology for estimating extreme quantiles from heavy-tailed distributions is based on two estimators: an order statistic to estimate an intermediate quantile and an estimator of the tail-index. The common practice is to select the same intermediate sequence for both estimators. In this work, we show how an adapted choice of two different intermediate sequences leads to a reduction of the asymptotic bias associated with the resulting refined Weissman estimator. The asymptotic normality of the latter estimator is established and a data-driven method is introduced for the practical selection of the intermediate sequences. Our approach is compared to the Weissman estimator and to six bias-reduced estimators of extreme quantiles in a large-scale simulation study. It appears that the refined Weissman estimator outperforms its competitors in a wide variety of situations, especially in the challenging high-bias cases. Finally, an illustration on an actuarial real data set is provided. PubDate: 2022-12-27
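To make the two-sequence idea concrete, here is a minimal sketch of the classical Weissman quantile estimator, using the Hill estimator for the tail index and allowing the two intermediate sequences to differ. The function names, the parameters `k_quantile` and `k_tail`, and the simulated Pareto sample are illustrative assumptions. The paper's data-driven rule for choosing the two sequences is not reproduced here.

```python
import math
import random


def hill_estimator(sorted_desc, k):
    """Hill estimate of the tail index from the k largest observations.

    `sorted_desc` must be sorted in decreasing order and strictly positive.
    """
    threshold = sorted_desc[k]  # (k+1)-th largest value, X_{n-k,n}
    return sum(math.log(x / threshold) for x in sorted_desc[:k]) / k


def weissman_quantile(sample, p, k_quantile, k_tail=None):
    """Weissman-type estimate of the quantile with exceedance probability p.

    k_quantile selects the order statistic used as the anchor quantile;
    k_tail selects how many top observations feed the Hill estimator.
    With k_tail=None both are equal, which is the common practice.
    """
    if k_tail is None:
        k_tail = k_quantile
    xs = sorted(sample, reverse=True)
    n = len(xs)
    gamma = hill_estimator(xs, k_tail)
    anchor = xs[k_quantile]  # intermediate quantile X_{n-k,n}
    return anchor * (k_quantile / (n * p)) ** gamma


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical data: exact Pareto with tail index 0.5, so the true
    # quantile at exceedance probability p is p ** -0.5.
    data = [(1 - random.random()) ** -0.5 for _ in range(20000)]
    print(weissman_quantile(data, 1e-4, k_quantile=500))
    print(weissman_quantile(data, 1e-4, k_quantile=500, k_tail=1000))
```

On an exact Pareto sample the Hill estimator is essentially unbiased, so the two calls should give similar answers; any benefit of decoupling the sequences would only be expected on distributions with second-order bias.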
Abstract: In order to formulate effective fire-mitigation policies, it is important to understand the spatial and temporal distribution of different types of wildfires and to be able to predict their occurrence taking the main influencing factors into account. The objective of this short communication is to assess the capability of a fast and easy-to-implement random forest algorithm to estimate cumulative probabilities of fire frequency and burned area using a large dataset collected in the USA. The input variables of the algorithm are voluntarily restricted to climate and land use factors, which are easy to obtain in practice. No input related to fire frequency, burned area, or to any other fire characteristic is used. After model selection and training, the performance of random forest is assessed using an independent dataset including 80,000 observations of fire occurrence and burned area. Results show that the score of our simple random forest algorithm is 9% higher than the score of the winner of the data challenge of Opitz (Extremes, 2022), revealing that, although this model performs well, it is not the best. However, the approach proposed here can be implemented using standard packages, does not require any fire monitoring system after training, and requires little specialized knowledge in machine learning, which makes it usable by a large diversity of stakeholders. The results of this study suggest that random forest should be part of the toolbox of engineers and scientists involved in wildfire prediction. PubDate: 2022-12-27
Abstract: Confounding variables are a recurrent challenge for causal discovery and inference. In many situations, complex causal mechanisms only manifest themselves in extreme events, or take simpler forms in the extremes. Stimulated by data on extreme river flows and precipitation, we introduce a new causal discovery methodology for heavy-tailed variables that allows the effect of a known potential confounder to be almost entirely removed when the variables have comparable tails, and also decreases it sufficiently to enable correct causal inference when the confounder has a heavier tail. We also introduce a new parametric estimator for the existing causal tail coefficient and a permutation test. Simulations show that the methods work well and the ideas are applied to the motivating dataset. PubDate: 2022-12-17
Abstract: We establish a one-to-one correspondence between (i) exchangeable sequences of random variables whose finite-dimensional distributions are minimum (or maximum) infinitely divisible and (ii) non-negative, non-decreasing, infinitely divisible stochastic processes. The exponent measure of an exchangeable minimum infinitely divisible sequence is shown to be the sum of a very simple “drift measure” and a mixture of product probability measures, which uniquely corresponds to the Lévy measure of a non-negative and non-decreasing infinitely divisible process. The latter is shown to be supported on non-negative and non-decreasing functions. In probabilistic terms, the aforementioned infinitely divisible process is equal to the conditional cumulative hazard process associated with the exchangeable sequence of random variables with minimum (or maximum) infinitely divisible marginals. Our results provide an analytic umbrella which embeds the de Finetti subfamilies of many interesting classes of multivariate distributions, such as exogenous shock models, exponential and geometric laws with lack-of-memory property, min-stable multivariate exponential and extreme-value distributions, as well as reciprocal Archimedean copulas with completely monotone generator and Archimedean copulas with log-completely monotone generator. PubDate: 2022-12-17
Abstract: The full-information best choice problem asks one to find a strategy maximising the probability of stopping at the minimum (or maximum) of a sequence \(X_1,\ldots,X_n\) of i.i.d. random variables with continuous distribution. In this paper we look at more general models, where independent \(X_j\)'s may have different distributions, discrete or continuous. A central role in our study is played by the running minimum process, which we first employ to re-visit the classic problem and its limit Poisson counterpart. The approach is further applied to two explicitly solvable models: in the first the distribution of the jth variable is uniform on \(\{j,\ldots,n\}\), and in the second it is uniform on \(\{1,\ldots,n\}\). PubDate: 2022-11-29
Abstract: Conditionally specified models are often used to describe complex multivariate data. Such models assume implicit structures on the extremes. So far, no methodology exists for calculating extremal characteristics of conditional models since the copula and marginals are not expressed in closed forms. We consider bivariate conditional models that specify the distribution of X and the distribution of Y conditional on X. We provide tools to quantify implicit assumptions on the extremes of this class of models. In particular, these tools allow us to approximate the distribution of the tail of Y and the coefficient of asymptotic independence \(\eta\) in closed forms. We apply these methods to a widely used conditional model for wave height and wave period. Moreover, we introduce a new condition on the parameter space for the conditional extremes model of Heffernan and Tawn (Journal of the Royal Statistical Society: Series B (Methodology) 66(3), 497-547, 2004), and prove that the conditional extremes model does not capture \(\eta\) when \(\eta < 1\). PubDate: 2022-11-10 DOI: 10.1007/s10687-022-00453-7
Abstract: A bivariate extreme-value copula is characterized by its Pickands dependence function, i.e., a convex function defined on the unit interval satisfying boundary conditions. This paper investigates the large-sample behavior of a nonparametric estimator of this function due to Cormier et al. (Extremes 17:633–659, 2014). These authors showed how to construct this estimator through constrained quadratic median B-spline smoothing of pairs of pseudo-observations derived from a random sample. Their estimator is shown here to exist whatever the order \(m \ge 3\) of the B-spline basis, and its consistency is established under minimal conditions. The large-sample distribution of this estimator is also determined under the additional assumption that the underlying Pickands dependence function is a B-spline of given order with a known set of knots. PubDate: 2022-11-09 DOI: 10.1007/s10687-022-00451-9
Abstract: We consider the standard and kth record values arising in sequences of independent identically distributed continuous and positive random variables with finite expectations. We determine necessary and sufficient conditions on the type of record k, its number n and moment order r so that the rth moment of the nth value of the kth record is finite for every parent distribution. Under these conditions we present the optimal upper bounds on these moments, expressed in scale units given by the respective powers of the first population moment. The theoretical results are illustrated by some numerical evaluations. PubDate: 2022-11-09 DOI: 10.1007/s10687-022-00449-3
Abstract: The tail process \(\boldsymbol{Y}=(Y_{\boldsymbol{i}})_{\boldsymbol{i}\in \mathbb{Z}^d}\) of a stationary regularly varying random field \(\boldsymbol{X}=(X_{\boldsymbol{i}})_{\boldsymbol{i}\in \mathbb{Z}^d}\) represents the asymptotic local distribution of \(\boldsymbol{X}\) as seen from its typical exceedance over a threshold u as \(u\rightarrow \infty\). Motivated by the standard Palm theory, we show that every tail process satisfies an invariance property called exceedance-stationarity and that this property, together with the spectral decomposition of the tail process, characterizes the class of all tail processes. We then restrict to the case when \(Y_{\boldsymbol{i}}\rightarrow 0\) as \(\boldsymbol{i} \rightarrow \infty\) and establish a couple of Palm-like dualities between the tail process and the so-called anchored tail process which, under suitable conditions, represents the asymptotic distribution of a typical cluster of extremes of \(\boldsymbol{X}\). The main message is that the distribution of the tail process is biased towards clusters with more exceedances. Finally, we use these results to determine the distribution of a typical cluster of extremes for moving average processes with random coefficients and heavy-tailed innovations. PubDate: 2022-10-24 DOI: 10.1007/s10687-022-00447-5
Abstract: This paper devises a regression-type model for the situation where both the response and covariates are extreme. The proposed approach is designed for the setting where the response and covariates are modeled as multivariate extreme values, and thus, contrary to standard regression methods, it takes into account the key fact that the limiting distribution of suitably standardized componentwise maxima is an extreme value copula. An important target in the proposed framework is the regression manifold, which consists of a family of regression lines obeying the latter asymptotic result. To learn about the proposed model from data, we employ a Bernstein polynomial prior on the space of angular densities which leads to an induced prior on the space of regression manifolds. Numerical studies suggest a good performance of the proposed methods, and a real-data illustration from finance reveals interesting aspects of the conditional risk of extreme losses in two leading international stock markets. PubDate: 2022-10-21 DOI: 10.1007/s10687-022-00446-6
Abstract: The tail empirical process (TEP) generated by an i.i.d. sequence of regularly varying random variables is key to investigating the behaviour of extreme value statistics such as the Hill and harmonic moment estimators of the tail index. The main contribution of the paper is to prove that Efron’s bootstrap produces versions of the estimators that exhibit the same asymptotic behaviour, including possible bias. In addition, the bootstrap provides new estimators of the tail index based on variability. Further, the asymptotic behaviour of the bootstrap variance estimators is shown to be unaffected by bias. PubDate: 2022-10-14 DOI: 10.1007/s10687-022-00445-7
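As a rough illustration of what bootstrapping the Hill estimator involves, the sketch below resamples the data with replacement (Efron's bootstrap) and recomputes the Hill estimate on each resample. The function names, the choice of `k`, the number of replicates and the simulated Pareto data are assumptions made for illustration only. The sketch shows the mechanics, not the asymptotic results of the paper.

```python
import math
import random
import statistics


def hill(sample, k):
    """Hill estimate of the tail index from the k largest observations."""
    xs = sorted(sample, reverse=True)
    threshold = xs[k]  # (k+1)-th largest value
    return sum(math.log(x / threshold) for x in xs[:k]) / k


def bootstrap_hill(sample, k, n_boot=200, rng=None):
    """Return Hill estimates computed on n_boot Efron bootstrap resamples."""
    rng = rng or random.Random()
    n = len(sample)
    # Each resample draws n points from the data with replacement.
    return [hill(rng.choices(sample, k=n), k) for _ in range(n_boot)]


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical data: exact Pareto sample with tail index 0.5.
    data = [(1 - rng.random()) ** -0.5 for _ in range(2000)]
    reps = bootstrap_hill(data, k=200, n_boot=200, rng=rng)
    print("Hill estimate:", hill(data, 200))
    print("bootstrap mean:", statistics.mean(reps))
    print("bootstrap variance:", statistics.variance(reps))
```

The spread of the replicates gives a variance estimate for the Hill estimator without relying on its asymptotic variance formula.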
Abstract: We investigate branching processes with immigration in a random environment. Using Goldie’s implicit renewal theory we prove that under a generalized Cramér condition the stationary distribution of such processes has a power law tail. We further show how several methods familiar in extreme value theory provide a natural and elegant path to their mathematical analysis. In particular, we rely on point process theory and the concept of tail process to determine the limiting distribution for the corresponding extremes and partial sums. Since Kesten, Kozlov and Spitzer’s seminal 1975 paper, it has been known that one class of these processes has a close relation with random walks in a random environment. Even in that well-studied context, the method we follow yields new results. For instance, we are able to i) move away from the conditions used by Kesten et al., ii) provide a precise form of the limiting distribution in their main theorem, and iii) characterize the long-term behavior of the worst traps a random walk in random environment encounters when drifting away from the origin. PubDate: 2022-07-12 DOI: 10.1007/s10687-022-00443-9
Abstract: In this manuscript, we study the limiting distribution for the joint law of the largest and the smallest singular values for random circulant matrices with generating sequence given by independent and identically distributed random elements satisfying the so-called Lyapunov condition. Under an appropriate normalization, the joint law of the extremal singular values converges in distribution, as the matrix dimension tends to infinity, to an independent product of Rayleigh and Gumbel laws. The latter implies that a normalized condition number converges in distribution to a Fréchet law as the dimension of the matrix increases. PubDate: 2022-07-04 DOI: 10.1007/s10687-022-00442-w
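For readers who want to experiment with the objects in this abstract, the sketch below computes the singular values of a circulant matrix directly from its generating sequence. It uses the standard fact that a circulant matrix is diagonalized by the discrete Fourier transform, so its singular values are the moduli of the DFT of the generating sequence. The function names and the Gaussian generating sequence are illustrative assumptions. No normalization from the paper is applied.

```python
import cmath
import random


def circulant_singular_values(c):
    """Singular values of the circulant matrix generated by the sequence c.

    They equal the moduli of the discrete Fourier transform of c,
    returned here in decreasing order.
    """
    n = len(c)
    values = []
    for k in range(n):
        eig = sum(c[j] * cmath.exp(2j * cmath.pi * j * k / n) for j in range(n))
        values.append(abs(eig))
    return sorted(values, reverse=True)


def condition_number(c):
    """Ratio of the largest to the smallest singular value (inf if singular)."""
    sv = circulant_singular_values(c)
    return float("inf") if sv[-1] == 0 else sv[0] / sv[-1]


if __name__ == "__main__":
    random.seed(2)
    # Hypothetical generating sequence: i.i.d. standard Gaussian entries.
    seq = [random.gauss(0.0, 1.0) for _ in range(256)]
    sv = circulant_singular_values(seq)
    print("largest:", sv[0], "smallest:", sv[-1])
    print("condition number:", condition_number(seq))
```

The direct sum costs O(n^2) operations; an FFT routine would be the natural replacement for large dimensions.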
Abstract: The extremal index is an important parameter in the characterization of extreme values of a stationary sequence. This paper presents a novel approach to estimation of the extremal index based on truncation of interexceedance times. The truncated estimator based on the maximum likelihood method is derived together with its first-order bias. The estimator is further improved using penultimate approximation to the limiting mixture distribution. In order to assess the performance of the proposed estimator, a simulation study is carried out for various stationary processes satisfying the local dependence condition \(D^{(k)}(u_n)\). An application to daily maximum temperatures at Uccle, Belgium, is also presented. PubDate: 2022-06-24 DOI: 10.1007/s10687-022-00444-8
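To show what interexceedance times are and how they can feed an extremal index estimate, the sketch below extracts them from a series and applies the classical intervals estimator of Ferro and Segers (2003). This is deliberately not the truncated maximum likelihood estimator proposed in the paper, whose details are not given in the abstract. The function names and the toy series are illustrative assumptions.

```python
def interexceedance_times(series, threshold):
    """Gaps (in time steps) between successive exceedances of the threshold."""
    idx = [i for i, x in enumerate(series) if x > threshold]
    return [b - a for a, b in zip(idx, idx[1:])]


def intervals_estimator(times):
    """Classical intervals estimator of the extremal index.

    Swapped-in technique (Ferro and Segers, 2003), not the paper's
    truncated estimator. `times` are positive integer gaps.
    """
    m = len(times)
    if m == 0:
        raise ValueError("need at least two exceedances")
    if max(times) <= 2:
        num = 2 * sum(times) ** 2
        den = m * sum(t * t for t in times)
    else:
        num = 2 * sum(t - 1 for t in times) ** 2
        den = m * sum((t - 1) * (t - 2) for t in times)
    return min(1.0, num / den)


if __name__ == "__main__":
    # Toy series with three clusters of three exceedances each.
    series = [0.0] * 110
    for i in (1, 2, 3, 50, 51, 52, 100, 101, 102):
        series[i] = 1.0
    times = interexceedance_times(series, 0.5)
    print(times)
    print(intervals_estimator(times))
```

Values near 1 suggest exceedances that occur in isolation, while smaller values indicate clustering.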
Abstract: The objective of this paper is to investigate the layered structure of topological complexity in the tail of a probability distribution. We establish the functional strong law of large numbers for Betti numbers, a basic quantifier of algebraic topology, of a geometric complex outside an open ball of radius \(R_n\), such that \(R_n\rightarrow \infty\) as the sample size n increases. The nature of the obtained law of large numbers is determined by the decay rate of a probability density and how rapidly \(R_n\) diverges. In particular, if \(R_n\) diverges sufficiently slowly, the limiting function in the law of large numbers is crucially affected by the emergence of arbitrarily large connected components supporting topological cycles in the limit. PubDate: 2022-05-31 DOI: 10.1007/s10687-022-00441-x
Abstract: We provide a new extension of Breiman’s Theorem on computing tail probabilities of a product of random variables to a multivariate setting. In particular, we give a characterization of regular variation on cones in \([0,\infty )^d\) under random linear transformations. This allows us to compute probabilities of a variety of tail events, which classical multivariate regularly varying models would report to be asymptotically negligible. We illustrate our findings with applications to risk assessment in financial systems and reinsurance markets under a bipartite network structure. PubDate: 2022-05-26 DOI: 10.1007/s10687-021-00432-4