Extremes
Journal Prestige (SJR): 1.562 Citation Impact (citeScore): 1 Number of Followers: 2 Hybrid journal (It can contain Open Access articles) ISSN (Print) 1572-915X ISSN (Online) 1386-1999 Published by Springer-Verlag [2468 journals]
 Simple random forest classification algorithms for predicting occurrences
and sizes of wildfires
Abstract: To formulate effective fire-mitigation policies, it is important to understand the spatial and temporal distribution of different types of wildfires and to be able to predict their occurrence, taking the main influencing factors into account. The objective of this short communication is to assess the capability of a fast and easy-to-implement random forest algorithm to estimate cumulative probabilities of fire frequency and burned area using a large dataset collected in the USA. The input variables of the algorithm are voluntarily restricted to climate and land-use factors, which are easy to obtain in practice. No input related to fire frequency, burned area, or any other fire characteristic is used. After model selection and training, the performance of the random forest is assessed using an independent dataset including 80,000 observations of fire occurrence and burned area. Results show that the score of our simple random forest algorithm is 9% higher than the score of the winner of the data challenge of Opitz (Extremes, 2022), revealing that, although this model performs well, it is not the best. However, the approach proposed here can be implemented using standard packages, does not require any fire monitoring system after training, and requires little specialized knowledge in machine learning, which makes it usable by a wide range of stakeholders. The results of this study suggest that random forest should be part of the toolbox of engineers and scientists involved in wildfire prediction.
PubDate: 2023-06-01

 Tail-dependence, exceedance sets, and metric embeddings

Abstract: There are many ways of measuring and modeling tail-dependence in random vectors: from the general framework of multivariate regular variation and the flexible class of max-stable vectors down to simple and concise summary measures like the matrix of bivariate tail-dependence coefficients. This paper starts by providing a review of existing results from a unifying perspective, which highlights connections between extreme value theory and the theory of cuts and metrics. Our approach leads to some new findings in both areas with some applications to current topics in risk management. We begin by using the framework of multivariate regular variation to show that extremal coefficients, or equivalently, the higher-order tail-dependence coefficients of a random vector can simply be understood in terms of random exceedance sets, which allows us to extend the notion of Bernoulli compatibility. In the special but important case of bivariate tail-dependence, we establish a correspondence between tail-dependence matrices and \(L^1\)- and \(\ell_1\)-embeddable finite metric spaces via the spectral distance, which is a metric on the space of jointly 1-Fréchet random variables. Namely, the coefficients of the cut-decomposition of the spectral distance and of the Tawn-Molchanov max-stable model realizing the corresponding bivariate extremal dependence coincide. We show that line metrics are rigid and, if the spectral distance corresponds to a line metric, the higher-order tail-dependence is determined by the bivariate tail-dependence matrix. Finally, the correspondence between \(\ell_1\)-embeddable metric spaces and tail-dependence matrices allows us to revisit the realizability problem, i.e. checking whether a given matrix is a valid tail-dependence matrix. We confirm a conjecture of Shyamalkumar and Tao (2020) that this problem is NP-complete.
PubDate: 2023-05-27

 Large nearest neighbour balls in hyperbolic stochastic geometry

Abstract: Consider a stationary Poisson process in a d-dimensional hyperbolic space. For \(R>0\) define the point process \(\xi _R^{(k)}\) of exceedance heights over a suitable threshold of the hyperbolic volumes of kth nearest neighbour balls centred around the points of the Poisson process within a hyperbolic ball of radius R centred at a fixed point. The point process \(\xi _R^{(k)}\) is compared to an inhomogeneous Poisson process on the real line with intensity function \(e^{-u}\) and point process convergence in the Kantorovich-Rubinstein distance is shown. From this, a quantitative limit theorem for the hyperbolic maximum kth nearest neighbour ball with a limiting Gumbel distribution is derived.
PubDate: 2023-04-20

 Remembering Ross Leadbetter: some personal recollections

Abstract: Ross Leadbetter had a broad and deep influence on the development of the probabilistic and statistical theory of extreme values and on the application of extreme-value methods. He has been an inspiration and a friend for many of us. This editorial collects thirteen personal recollections of Ross and his work. An account of his career and some of his work can be found in the IMS Obituary “Ross Leadbetter 1931–2022”.
PubDate: 2023-04-10

 Extremes of Markov random fields on block graphs: Max-stable limits and
structured Hüsler–Reiss distributions
Abstract: We study the joint occurrence of large values of a Markov random field or undirected graphical model associated to a block graph. On such graphs, containing trees as special cases, we aim to generalize recent results for extremes of Markov trees. Every pair of nodes in a block graph is connected by a unique shortest path. These paths are shown to determine the limiting distribution of the properly rescaled random field given that a fixed variable exceeds a high threshold. The latter limit relation implies that the random field is multivariate regularly varying and it determines the max-stable distribution to which componentwise maxima of independent random samples from the field are attracted. When the subvectors induced by the blocks have certain limits parametrized by Hüsler–Reiss distributions, the global Markov property of the original field induces a particular structure on the parameter matrix of the limiting max-stable Hüsler–Reiss distribution. The multivariate Pareto version of the latter turns out to be an extremal graphical model according to the original block graph. Thanks to these algebraic relations, the parameters are still identifiable even if some variables are latent.
PubDate: 2023-04-04

 A marginal modelling approach for predicting wildfire extremes across the
contiguous United States
Abstract: This paper details a methodology proposed for the EVA 2021 conference data challenge. The aim of this challenge was to predict the number and size of wildfires over the contiguous US between 1993 and 2015, with more importance placed on extreme events. In the data set provided, over 14% of both wildfire count and burnt area observations are missing; the objective of the data challenge was to estimate a range of marginal probabilities from the distribution functions of these missing observations. To enable this prediction, we make the assumption that the marginal distribution of a missing observation can be informed using non-missing data from neighbouring locations. In our method, we select spatial neighbourhoods for each missing observation and fit marginal models to non-missing observations in these regions. For the wildfire counts, we assume the compiled data sets follow a zero-inflated negative binomial distribution, while for burnt area values, we model the bulk and tail of each compiled data set using non-parametric and parametric techniques, respectively. Cross-validation is used to select tuning parameters, and the resulting predictions are shown to significantly outperform the benchmark method proposed in the challenge outline. We conclude with a discussion of our modelling framework, and evaluate ways in which it could be extended.
PubDate: 2023-04-01
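
As a rough illustration of the count model named in this abstract, the following Python sketch evaluates the distribution function of a zero-inflated negative binomial, which is the kind of marginal probability the challenge asks for. The function names, the parameter names (pi_zero, size, prob) and the example values are assumptions made for illustration only; the paper's fitted parameters and neighbourhood selection are not reproduced here.

```python
import math

def nb_pmf(k, size, prob):
    """Negative binomial pmf: k failures before `size` successes,
    with success probability `prob` in (0, 1). `size` may be non-integer."""
    log_pmf = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
               + size * math.log(prob) + k * math.log1p(-prob))
    return math.exp(log_pmf)

def zinb_pmf(k, pi_zero, size, prob):
    """Zero-inflated version: with probability pi_zero the count is a
    structural zero, otherwise it is drawn from the negative binomial."""
    base = (1.0 - pi_zero) * nb_pmf(k, size, prob)
    return pi_zero + base if k == 0 else base

def zinb_cdf(u, pi_zero, size, prob):
    """P(count <= u) under the zero-inflated negative binomial."""
    return sum(zinb_pmf(k, pi_zero, size, prob) for k in range(int(u) + 1))

# Hypothetical parameter values, chosen only to show the call pattern.
for threshold in (0, 5, 20):
    print(threshold, zinb_cdf(threshold, pi_zero=0.3, size=2.5, prob=0.4))
```

The log-gamma form keeps the pmf numerically stable for large counts and allows a non-integer size parameter.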

 A weighted composite log-likelihood approach to parametric estimation of
the extreme quantiles of a distribution
Abstract: Extreme value theory motivates estimating extreme upper quantiles of a distribution by selecting some threshold, discarding those observations below the threshold and fitting a generalized Pareto distribution to exceedances above the threshold via maximum likelihood. This sharp cutoff between observations that are used in the parameter estimation and those that are not is at odds with statistical practice for analogous problems such as nonparametric density estimation, in which observations are typically smoothly downweighted as they become more distant from the value at which the density is being estimated. By exploiting the fact that the order statistics of independent and identically distributed observations form a Markov chain, this work shows how one can obtain a natural weighted composite log-likelihood function for fitting generalized Pareto distributions to exceedances over a threshold. A method for producing confidence intervals based on inverting a test statistic calibrated via parametric bootstrapping is proposed. Some theory demonstrates the asymptotic advantages of using weights in the special case when the shape parameter of the limiting generalized Pareto distribution is known to be 0. Methods for extending this approach to observations that are not identically distributed are described and applied to an analysis of daily precipitation data in New York City. Perhaps the most important practical finding is that including weights in the composite log-likelihood function can reduce the sensitivity of estimates to small changes in the threshold.
PubDate: 2023-03-29
DOI: 10.1007/s10687-023-00466-w
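
To make the sharp-cutoff baseline described in this abstract concrete, here is a minimal Python sketch of the standard peaks-over-threshold quantile estimate in the special case where the generalized Pareto shape parameter is fixed at 0 (an exponential tail). It does not implement the paper's weighted composite log-likelihood. The function name, the simulated data and the thresholds are assumptions chosen for illustration.

```python
import math
import random

def pot_quantile_exponential_tail(data, threshold, p):
    """Estimate the p-quantile from exceedances over `threshold`,
    assuming a generalized Pareto tail with shape 0 (exponential tail)."""
    excesses = [x - threshold for x in data if x > threshold]
    if not excesses:
        raise ValueError("no observations exceed the threshold")
    scale = sum(excesses) / len(excesses)    # MLE of the scale when shape = 0
    exceed_prob = len(excesses) / len(data)  # empirical P(X > threshold)
    if 1.0 - p >= exceed_prob:
        raise ValueError("p must lie beyond the threshold's probability level")
    # Solve exceed_prob * exp(-(q - threshold) / scale) = 1 - p for q.
    return threshold + scale * math.log(exceed_prob / (1.0 - p))

# Simulated standard exponential data; the true 0.999-quantile is log(1000).
rng = random.Random(0)
sample = [rng.expovariate(1.0) for _ in range(5000)]
for u in (1.0, 1.5, 2.0):
    print(u, pot_quantile_exponential_tail(sample, u, 0.999))
```

Running the loop over several thresholds shows how the estimate moves with the cutoff, which is the sensitivity the abstract says weighting can reduce.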

 Editorial: EVA 2021 data challenge on spatiotemporal prediction of
wildfire extremes in the USA
PubDate: 2023-03-27
DOI: 10.1007/s10687-023-00465-x

 Joint modeling and prediction of massive spatiotemporal wildfire count
and burnt area data with the INLA-SPDE approach
Abstract: This paper describes the methodology used by the team RedSea in the data competition organized for the EVA 2021 conference. We develop a novel two-part model to jointly describe the wildfire count data and burnt area data provided by the competition organizers with covariates. Our proposed methodology relies on the integrated nested Laplace approximation combined with the stochastic partial differential equation (INLA-SPDE) approach. In the first part, a binary nonstationary spatiotemporal model is used to describe the underlying process that determines whether or not there is wildfire at a specific time and location. In the second part, we consider a nonstationary hurdle log-Gaussian Cox process (hurdle-LGCP) for the positive wildfire count data, i.e., an LGCP is used to model the shifted positive count data, and a nonstationary log-Gaussian model for positive burnt area data. Dependence between the positive count data and positive burnt area data is captured by a shared spatiotemporal random effect. Our two-part modeling approach performs well in terms of the prediction score criterion chosen by the data competition organizers. Moreover, our model results show that surface pressure is the most influential driver for the occurrence of a wildfire, whilst surface net solar radiation and surface pressure are the key drivers for large numbers of wildfires, and temperature and evaporation are the key drivers of large burnt areas.
PubDate: 2023-03-14
DOI: 10.1007/s10687-023-00463-z

 A combined statistical and machine learning approach for spatial
prediction of extreme wildfire frequencies and sizes
Abstract: Motivated by the Extreme Value Analysis 2021 (EVA 2021) data challenge, we propose a method based on statistics and machine learning for the spatial prediction of extreme wildfire frequencies and sizes. This method is tailored to handle large datasets, including missing observations. Our approach relies on a four-stage, bivariate, sparse spatial model for high-dimensional zero-inflated data that we develop using stochastic partial differential equations (SPDE), allowing sparse precision matrices for the latent processes. In Stage 1, the observations are separated into zero/non-zero categories and modeled using a two-layered hierarchical Bayesian sparse spatial model to estimate the probabilities of these two categories. In Stage 2, we first obtain empirical estimates of the spatially-varying mean and variance profiles across the spatial locations for the positive observations and smooth those estimates using fixed rank kriging. This approximate Bayesian inference method is employed to avoid the high computational burden of large spatial data modeling using spatially-varying coefficients. In Stage 3, we further model the standardized log-transformed positive observations from the second stage using a sparse bivariate spatial Gaussian process. The Gaussian distribution assumption for wildfire counts developed in the third stage is computationally effective but erroneous. Thus, in Stage 4, the predicted exceedance probabilities are post-processed using Random Forests. We draw posterior inference for Stages 1 and 3 using Markov chain Monte Carlo (MCMC) sampling. We then create a cross-validation scheme for the artificially generated gaps and compare the EVA 2021 prediction scores of the proposed model to those obtained using some competitors.
PubDate: 2023-02-21
DOI: 10.1007/s10687-022-00460-8

 Analysis of wildfires and their extremes via spatial quantile
autoregressive model
Abstract: In this paper we propose a procedure to estimate the distribution of wildfire frequency and severity using the wildfire data measured by month during 1993–2015. To this end, a spatial quantile autoregressive model (SQAR) is applied to the data with the aid of extreme value theory. Using the proposed method we are able to predict the distributional behavior of the data and identify the hidden structures beyond their mean structures. In addition, the regression-based model lends itself to interpretation. We provide the estimated results from the wildfire data, including significant explanatory variables and some meaningful interpretations.
PubDate: 2023-02-13
DOI: 10.1007/s10687-023-00462-0

 Gradient boosting with extreme-value theory for wildfire prediction

Abstract: This paper details the approach of the team Kohrrelation in the 2021 Extreme Value Analysis data challenge, dealing with the prediction of wildfire counts and sizes over the contiguous US. Our approach uses ideas from extreme-value theory in a machine learning context with theoretically justified loss functions for gradient boosting. We devise a spatial cross-validation scheme and show that in our setting it provides a better proxy for test set performance than naive cross-validation. The predictions are benchmarked against boosting approaches with different loss functions, and perform competitively in terms of the score criterion, finally placing second in the competition ranking.
PubDate: 2023-01-21
DOI: 10.1007/s10687-022-00454-6

 Reconstruction of incomplete wildfire data using deep generative models

Abstract: We present our submission to the Extreme Value Analysis 2021 Data Challenge in which teams were asked to accurately predict distributions of wildfire frequency and size within spatiotemporal regions of missing data. For this competition, we developed a variant of the powerful variational autoencoder models, which we call Conditional Missing data Importance-Weighted Autoencoder (CMIWAE). Our deep latent variable generative model requires little to no feature engineering and does not necessarily rely on the specifics of scoring in the Data Challenge. It is fully trained on incomplete data, with the single objective to maximize log-likelihood of the observed wildfire information. We mitigate the effects of the relatively low number of training samples by stochastic sampling from a variational latent variable distribution, as well as by ensembling a set of CMIWAE models trained and validated on different splits of the provided data.
PubDate: 2023-01-18
DOI: 10.1007/s10687-022-00459-1

 Exchangeable min-id sequences: Characterization, exponent measures and
non-decreasing id-processes
Abstract: We establish a one-to-one correspondence between (i) exchangeable sequences of random variables whose finite-dimensional distributions are minimum (or maximum) infinitely divisible and (ii) non-negative, non-decreasing, infinitely divisible stochastic processes. The exponent measure of an exchangeable minimum infinitely divisible sequence is shown to be the sum of a very simple “drift measure” and a mixture of product probability measures, which uniquely corresponds to the Lévy measure of a non-negative and non-decreasing infinitely divisible process. The latter is shown to be supported on non-negative and non-decreasing functions. In probabilistic terms, the aforementioned infinitely divisible process is equal to the conditional cumulative hazard process associated with the exchangeable sequence of random variables with minimum (or maximum) infinitely divisible marginals. Our results provide an analytic umbrella which embeds the de Finetti subfamilies of many interesting classes of multivariate distributions, such as exogenous shock models, exponential and geometric laws with lack-of-memory property, min-stable multivariate exponential and extreme-value distributions, as well as reciprocal Archimedean copulas with completely monotone generator and Archimedean copulas with log-completely monotone generator.
PubDate: 2022-12-17
DOI: 10.1007/s10687-022-00450-w

 Running minimum in the best-choice problem

Abstract: The full-information best-choice problem asks one to find a strategy maximising the probability of stopping at the minimum (or maximum) of a sequence \(X_1,\cdots,X_n\) of i.i.d. random variables with continuous distribution. In this paper we look at more general models, where independent \(X_j\)’s may have different distributions, discrete or continuous. A central role in our study is played by the running minimum process, which we first employ to revisit the classic problem and its limit Poisson counterpart. The approach is further applied to two explicitly solvable models: in the first the distribution of the jth variable is uniform on \(\{j,\cdots,n\}\), and in the second it is uniform on \(\{1,\cdots,n\}\).
PubDate: 2022-11-29
DOI: 10.1007/s10687-022-00457-3

 Publisher Correction: Integral Functionals and the Bootstrap for the Tail
Empirical Process
PubDate: 2022-11-28
DOI: 10.1007/s10687-022-00455-5

 Extremal characteristics of conditional models

Abstract: Conditionally specified models are often used to describe complex multivariate data. Such models assume implicit structures on the extremes. So far, no methodology exists for calculating extremal characteristics of conditional models since the copula and marginals are not expressed in closed forms. We consider bivariate conditional models that specify the distribution of X and the distribution of Y conditional on X. We provide tools to quantify implicit assumptions on the extremes of this class of models. In particular, these tools allow us to approximate the distribution of the tail of Y and the coefficient of asymptotic independence \(\eta\) in closed forms. We apply these methods to a widely used conditional model for wave height and wave period. Moreover, we introduce a new condition on the parameter space for the conditional extremes model of Heffernan and Tawn (Journal of the Royal Statistical Society: Series B (Methodology) 66(3), 497–547, 2004), and prove that the conditional extremes model does not capture \(\eta\) when \(\eta <1\).
PubDate: 2022-11-10
DOI: 10.1007/s10687-022-00453-7

 Asymptotic behavior of an intrinsic rank-based estimator of the Pickands
dependence function constructed from B-splines
Abstract: A bivariate extreme-value copula is characterized by its Pickands dependence function, i.e., a convex function defined on the unit interval satisfying boundary conditions. This paper investigates the large-sample behavior of a nonparametric estimator of this function due to Cormier et al. (Extremes 17:633–659, 2014). These authors showed how to construct this estimator through constrained quadratic median B-spline smoothing of pairs of pseudo-observations derived from a random sample. Their estimator is shown here to exist whatever the order \(m \ge 3\) of the B-spline basis, and its consistency is established under minimal conditions. The large-sample distribution of this estimator is also determined under the additional assumption that the underlying Pickands dependence function is a B-spline of given order with a known set of knots.
PubDate: 2022-11-09
DOI: 10.1007/s10687-022-00451-9

 Palm theory for extremes of stationary regularly varying time series and
random fields
Abstract: The tail process \(\varvec{Y}=(Y_{\varvec{i}})_{\varvec{i}\in \mathbb {Z}^d}\) of a stationary regularly varying random field \(\varvec{X}=(X_{\varvec{i}})_{\varvec{i}\in \mathbb {Z}^d}\) represents the asymptotic local distribution of \(\varvec{X}\) as seen from its typical exceedance over a threshold u as \(u\rightarrow \infty\). Motivated by the standard Palm theory, we show that every tail process satisfies an invariance property called exceedance-stationarity and that this property, together with the spectral decomposition of the tail process, characterizes the class of all tail processes. We then restrict to the case when \(Y_{\varvec{i}}\rightarrow 0\) as \(|\varvec{i}| \rightarrow \infty\) and establish a couple of Palm-like dualities between the tail process and the so-called anchored tail process which, under suitable conditions, represents the asymptotic distribution of a typical cluster of extremes of \(\varvec{X}\). The main message is that the distribution of the tail process is biased towards clusters with more exceedances. Finally, we use these results to determine the distribution of a typical cluster of extremes for moving average processes with random coefficients and heavy-tailed innovations.
PubDate: 2022-10-24
DOI: 10.1007/s10687-022-00447-5

 Integral Functionals and the Bootstrap for the Tail Empirical Process

Abstract: The tail empirical process (TEP) generated by an i.i.d. sequence of regularly varying random variables is key to investigating the behaviour of extreme value statistics such as the Hill and harmonic moment estimators of the tail index. The main contribution of the paper is to prove that Efron’s bootstrap produces versions of the estimators that exhibit the same asymptotic behaviour, including possible bias. In addition, the bootstrap provides new estimators of the tail index based on variability. Further, the asymptotic behaviour of the bootstrap variance estimators is shown to be unaffected by bias.
PubDate: 2022-10-14
DOI: 10.1007/s10687-022-00445-7
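
For readers unfamiliar with the estimators mentioned in this abstract, the Python sketch below computes the Hill estimator and a set of Efron bootstrap replicates of it. It only illustrates the resampling idea and makes no claim about the paper's asymptotic results. The function names, the choice of k, the number of replicates and the simulated Pareto sample are all assumptions made for illustration. The quantity estimated here is the reciprocal 1/alpha of the Pareto tail exponent alpha.

```python
import math
import random

def hill_estimator(data, k):
    """Hill estimate of 1/alpha from the k largest values of a positive sample."""
    ordered = sorted(data, reverse=True)
    if not 0 < k < len(ordered):
        raise ValueError("k must satisfy 0 < k < len(data)")
    if ordered[k] <= 0:
        raise ValueError("the (k+1)-th largest value must be positive")
    log_ref = math.log(ordered[k])  # log of the (k+1)-th largest value
    return sum(math.log(x) - log_ref for x in ordered[:k]) / k

def bootstrap_hill(data, k, n_boot=200, seed=0):
    """Efron bootstrap: resample the data with replacement and
    recompute the Hill estimator on each resample."""
    rng = random.Random(seed)
    n = len(data)
    return [hill_estimator(rng.choices(data, k=n), k) for _ in range(n_boot)]

# Simulated Pareto sample with alpha = 2, so the target value is 0.5.
rng = random.Random(1)
sample = [rng.paretovariate(2.0) for _ in range(2000)]
estimate = hill_estimator(sample, k=200)
replicates = bootstrap_hill(sample, k=200)
boot_mean = sum(replicates) / len(replicates)
boot_var = sum((r - boot_mean) ** 2 for r in replicates) / (len(replicates) - 1)
print(estimate, boot_mean, boot_var)
```

The spread of the replicates gives a simple bootstrap variance estimate for the Hill estimator at the chosen k.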
