- Simple random forest classification algorithms for predicting occurrences and sizes of wildfires
Abstract: In order to formulate effective fire-mitigation policies, it is important to understand the spatial and temporal distribution of different types of wildfires and to be able to predict their occurrence, taking the main influencing factors into account. The objective of this short communication is to assess the capability of a fast and easy-to-implement random forest algorithm to estimate cumulative probabilities of fire frequency and burned area using a large dataset collected in the USA. The input variables of the algorithm are voluntarily restricted to climate and land use factors, which are easy to obtain in practice. No input related to fire frequency, burned area, or any other fire characteristic is used. After model selection and training, the performance of random forest is assessed using an independent dataset including 80,000 observations of fire occurrence and burned area. Results show that the score of our simple random forest algorithm is 9% higher than the score of the winner of the data challenge of Opitz (Extremes, 2022), indicating that, although this model performs well, it is not the best. However, the approach proposed here can be implemented using standard packages, does not require any fire monitoring system after training, and requires little specialized knowledge in machine learning, which makes it usable by a wide range of stakeholders. The results of this study suggest that random forest should be part of the toolbox of engineers and scientists involved in wildfire prediction. PubDate: 2023-06-01
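As an illustration of the kind of off-the-shelf workflow the abstract advocates, the sketch below fits a random forest classifier to hypothetical climate and land-use covariates to predict fire occurrence. The covariate names, the synthetic data, and the tuning choices are assumptions for demonstration only, not the authors' dataset or settings.

```python
# Illustrative sketch only: fits a random forest to hypothetical climate/land-use
# covariates to predict fire occurrence, mirroring the "standard packages" workflow
# the abstract advocates. Feature values and labels are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([
    rng.normal(20, 8, n),      # temperature (hypothetical covariate)
    rng.gamma(2.0, 30.0, n),   # precipitation (hypothetical covariate)
    rng.integers(0, 5, n),     # land-cover class (hypothetical covariate)
])
y = rng.binomial(1, 0.2, n)    # fire occurrence indicator (synthetic)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=500, random_state=0)
rf.fit(X_train, y_train)

# Class probabilities can be read as estimated occurrence probabilities.
prob_fire = rf.predict_proba(X_test)[:, 1]
print(prob_fire[:5])
```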
- Tail-dependence, exceedance sets, and metric embeddings
Abstract: There are many ways of measuring and modeling tail-dependence in random vectors: from the general framework of multivariate regular variation and the flexible class of max-stable vectors down to simple and concise summary measures like the matrix of bivariate tail-dependence coefficients. This paper starts by providing a review of existing results from a unifying perspective, which highlights connections between extreme value theory and the theory of cuts and metrics. Our approach leads to some new findings in both areas with some applications to current topics in risk management. We begin by using the framework of multivariate regular variation to show that extremal coefficients, or equivalently, the higher-order tail-dependence coefficients of a random vector can simply be understood in terms of random exceedance sets, which allows us to extend the notion of Bernoulli compatibility. In the special but important case of bivariate tail-dependence, we establish a correspondence between tail-dependence matrices and \(L^1\)- and \(\ell_1\)-embeddable finite metric spaces via the spectral distance, which is a metric on the space of jointly 1-Fréchet random variables. Namely, the coefficients of the cut-decomposition of the spectral distance and of the Tawn-Molchanov max-stable model realizing the corresponding bivariate extremal dependence coincide. We show that line metrics are rigid and that, if the spectral distance corresponds to a line metric, the higher-order tail dependence is determined by the bivariate tail-dependence matrix. Finally, the correspondence between \(\ell_1\)-embeddable metric spaces and tail-dependence matrices allows us to revisit the realizability problem, i.e., checking whether a given matrix is a valid tail-dependence matrix. We confirm a conjecture of Shyamalkumar and Tao (2020) that this problem is NP-complete. PubDate: 2023-05-27
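For readers less familiar with the summary measures mentioned above, the entries of a bivariate tail-dependence matrix are the standard (upper) tail-dependence coefficients
\[
\chi_{ij} \;=\; \lim_{u \uparrow 1} \Pr\bigl(F_j(X_j) > u \mid F_i(X_i) > u\bigr) \in [0,1],
\]
with \(\chi_{ij} = 0\) corresponding to asymptotic independence of the pair and \(\chi_{ij} = 1\) to complete tail dependence; the realizability problem asks which matrices of such coefficients can actually arise.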
- Large nearest neighbour balls in hyperbolic stochastic geometry
Abstract: Consider a stationary Poisson process in a d-dimensional hyperbolic space. For \(R>0\) define the point process \(\xi _R^{(k)}\) of exceedance heights over a suitable threshold of the hyperbolic volumes of kth nearest neighbour balls centred around the points of the Poisson process within a hyperbolic ball of radius R centred at a fixed point. The point process \(\xi _R^{(k)}\) is compared to an inhomogeneous Poisson process on the real line with intensity function \(e^{-u}\) and point process convergence in the Kantorovich-Rubinstein distance is shown. From this, a quantitative limit theorem for the hyperbolic maximum kth nearest neighbour ball with a limiting Gumbel distribution is derived. PubDate: 2023-04-20
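The Gumbel limit quoted at the end can be read off from the limiting Poisson process: for a Poisson process on the real line with intensity function \(e^{-u}\), the probability of seeing no points above a level \(x\) is
\[
\exp\Bigl(-\int_x^{\infty} e^{-u}\,du\Bigr) \;=\; \exp\bigl(-e^{-x}\bigr),
\]
which is precisely the standard Gumbel distribution function for the largest point.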
- Remembering Ross Leadbetter: some personal recollections
Abstract: Ross Leadbetter had a broad and deep influence on the development of the probabilistic and statistical theory of extreme values and on the application of extreme-value methods. He was an inspiration and a friend to many of us. This editorial collects thirteen personal recollections of Ross and his work. An account of his career and some of his work can be found in the IMS Obituary “Ross Leadbetter 1931–2022”. PubDate: 2023-04-10
- Extremes of Markov random fields on block graphs: Max-stable limits and structured Hüsler–Reiss distributions
Abstract: We study the joint occurrence of large values of a Markov random field or undirected graphical model associated to a block graph. On such graphs, containing trees as special cases, we aim to generalize recent results for extremes of Markov trees. Every pair of nodes in a block graph is connected by a unique shortest path. These paths are shown to determine the limiting distribution of the properly rescaled random field given that a fixed variable exceeds a high threshold. The latter limit relation implies that the random field is multivariate regularly varying and it determines the max-stable distribution to which component-wise maxima of independent random samples from the field are attracted. When the sub-vectors induced by the blocks have certain limits parametrized by Hüsler–Reiss distributions, the global Markov property of the original field induces a particular structure on the parameter matrix of the limiting max-stable Hüsler–Reiss distribution. The multivariate Pareto version of the latter turns out to be an extremal graphical model according to the original block graph. Thanks to these algebraic relations, the parameters are still identifiable even if some variables are latent. PubDate: 2023-04-04
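As background, one common parametrization of the bivariate Hüsler–Reiss max-stable distribution with unit Fréchet margins (stated here for orientation; the paper works with a structured multivariate version) is
\[
G(x,y) \;=\; \exp\!\left\{-\frac{1}{x}\,\Phi\!\Bigl(\lambda + \frac{1}{2\lambda}\log\frac{y}{x}\Bigr) - \frac{1}{y}\,\Phi\!\Bigl(\lambda + \frac{1}{2\lambda}\log\frac{x}{y}\Bigr)\right\}, \qquad x,y>0,
\]
where \(\Phi\) is the standard normal distribution function and \(\lambda \ge 0\) controls the strength of dependence; in higher dimensions the pairwise parameters are collected in the parameter matrix whose structure the abstract describes.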
- A marginal modelling approach for predicting wildfire extremes across the contiguous United States
Abstract: This paper details a methodology proposed for the EVA 2021 conference data challenge. The aim of this challenge was to predict the number and size of wildfires over the contiguous US between 1993 and 2015, with more importance placed on extreme events. In the data set provided, over 14% of both wildfire count and burnt area observations are missing; the objective of the data challenge was to estimate a range of marginal probabilities from the distribution functions of these missing observations. To enable this prediction, we make the assumption that the marginal distribution of a missing observation can be informed using non-missing data from neighbouring locations. In our method, we select spatial neighbourhoods for each missing observation and fit marginal models to non-missing observations in these regions. For the wildfire counts, we assume the compiled data sets follow a zero-inflated negative binomial distribution, while for burnt area values, we model the bulk and tail of each compiled data set using non-parametric and parametric techniques, respectively. Cross validation is used to select tuning parameters, and the resulting predictions are shown to significantly outperform the benchmark method proposed in the challenge outline. We conclude with a discussion of our modelling framework, and evaluate ways in which it could be extended. PubDate: 2023-04-01
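A minimal sketch of the bulk/tail idea for a single marginal distribution is given below, assuming synthetic burnt-area values: the empirical distribution function is used below a high threshold and a generalized Pareto tail, fitted with scipy.stats.genpareto, above it. The data, the threshold choice, and the absence of spatial neighbourhood selection are all simplifications relative to the paper.

```python
# Illustrative semi-parametric marginal fit: empirical CDF in the bulk,
# generalized Pareto tail above a high threshold. Data and threshold are synthetic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
area = rng.pareto(2.0, 10_000) + 1.0         # synthetic positive "burnt area" values

u = np.quantile(area, 0.95)                   # illustrative threshold choice
exceed = area[area > u] - u
xi, loc, sigma = stats.genpareto.fit(exceed, floc=0.0)

def marginal_cdf(x):
    """Empirical CDF below the threshold u, GPD tail above it (tail splicing)."""
    if x <= u:
        return np.mean(area <= x)
    p_u = np.mean(area <= u)
    return p_u + (1.0 - p_u) * stats.genpareto.cdf(x - u, xi, loc=0.0, scale=sigma)

print(marginal_cdf(u), marginal_cdf(10 * u))
```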
- A weighted composite log-likelihood approach to parametric estimation of the extreme quantiles of a distribution
Abstract: Extreme value theory motivates estimating extreme upper quantiles of a distribution by selecting some threshold, discarding those observations below the threshold and fitting a generalized Pareto distribution to exceedances above the threshold via maximum likelihood. This sharp cutoff between observations that are used in the parameter estimation and those that are not is at odds with statistical practice for analogous problems such as nonparametric density estimation, in which observations are typically smoothly downweighted as they become more distant from the value at which the density is being estimated. By exploiting the fact that the order statistics of independent and identically distributed observations form a Markov chain, this work shows how one can obtain a natural weighted composite log-likelihood function for fitting generalized Pareto distributions to exceedances over a threshold. A method for producing confidence intervals based on inverting a test statistic calibrated via parametric bootstrapping is proposed. Some theory demonstrates the asymptotic advantages of using weights in the special case when the shape parameter of the limiting generalized Pareto distribution is known to be 0. Methods for extending this approach to observations that are not identically distributed are described and applied to an analysis of daily precipitation data in New York City. Perhaps the most important practical finding is that including weights in the composite log-likelihood function can reduce the sensitivity of estimates to small changes in the threshold. PubDate: 2023-03-29 DOI: 10.1007/s10687-023-00466-w
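The sketch below illustrates fitting a generalized Pareto distribution to exceedances by maximizing a weighted log-likelihood; the exponential-decay weights are a placeholder chosen for illustration and are not the composite-likelihood weights derived from the Markov structure of the order statistics in the paper.

```python
# Illustrative weighted GPD fit to threshold exceedances. The weighting scheme
# (exponential decay in the exceedance size) is a placeholder, not the paper's
# composite-likelihood weights.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
y = rng.standard_exponential(5000)            # synthetic data
u = np.quantile(y, 0.95)
z = y[y > u] - u                              # exceedances above the threshold

w = np.exp(-0.5 * z / z.mean())               # illustrative weights
w = w / w.mean()

def neg_weighted_loglik(theta):
    xi, log_sigma = theta
    sigma = np.exp(log_sigma)
    if abs(xi) < 1e-8:                        # exponential limit of the GPD
        logf = -np.log(sigma) - z / sigma
    else:
        t = 1.0 + xi * z / sigma
        if np.any(t <= 0):
            return np.inf
        # GPD log density: -log(sigma) - (1/xi + 1) * log(1 + xi*z/sigma)
        logf = -np.log(sigma) - (1.0 / xi + 1.0) * np.log(t)
    return -np.sum(w * logf)

fit = minimize(neg_weighted_loglik, x0=np.array([0.1, 0.0]), method="Nelder-Mead")
xi_hat, sigma_hat = fit.x[0], np.exp(fit.x[1])
print(xi_hat, sigma_hat)
```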
- Editorial: EVA 2021 data challenge on spatiotemporal prediction of wildfire extremes in the USA
PubDate: 2023-03-27 DOI: 10.1007/s10687-023-00465-x
- Joint modeling and prediction of massive spatio-temporal wildfire count and burnt area data with the INLA-SPDE approach
Abstract: This paper describes the methodology used by the team RedSea in the data competition organized for the EVA 2021 conference. We develop a novel two-part model to jointly describe, using covariates, the wildfire count data and burnt area data provided by the competition organizers. Our proposed methodology relies on the integrated nested Laplace approximation combined with the stochastic partial differential equation (INLA-SPDE) approach. In the first part, a binary non-stationary spatio-temporal model is used to describe the underlying process that determines whether or not there is a wildfire at a specific time and location. In the second part, we consider a non-stationary hurdle log-Gaussian Cox process (hurdle-LGCP) for the positive wildfire count data, i.e., an LGCP is used to model the shifted positive count data, and a non-stationary log-Gaussian model for the positive burnt area data. Dependence between the positive count data and positive burnt area data is captured by a shared spatio-temporal random effect. Our two-part modeling approach performs well in terms of the prediction score criterion chosen by the data competition organizers. Moreover, our model results show that surface pressure is the most influential driver for the occurrence of a wildfire, whilst surface net solar radiation and surface pressure are the key drivers for large numbers of wildfires, and temperature and evaporation are the key drivers of large burnt areas. PubDate: 2023-03-14 DOI: 10.1007/s10687-023-00463-z
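In simplified, non-spatial notation, the two-part (hurdle) structure for the counts can be written as
\[
\Pr(N = n) \;=\;
\begin{cases}
1 - p, & n = 0,\\
p\,\Pr(N^{+} = n), & n \ge 1,
\end{cases}
\]
where \(p\) is the occurrence probability estimated by the binary model and \(N^{+}\) is the positive (shifted) count driven by the log-Gaussian Cox process; in the paper both parts, as well as the burnt-area model, carry spatio-temporal Gaussian random effects, with a shared effect linking positive counts and positive burnt areas.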
- A combined statistical and machine learning approach for spatial prediction of extreme wildfire frequencies and sizes
Abstract: Motivated by the Extreme Value Analysis 2021 (EVA 2021) data challenge, we propose a method based on statistics and machine learning for the spatial prediction of extreme wildfire frequencies and sizes. This method is tailored to handle large datasets, including missing observations. Our approach relies on a four-stage, bivariate, sparse spatial model for high-dimensional zero-inflated data that we develop using stochastic partial differential equations (SPDE), allowing sparse precision matrices for the latent processes. In Stage 1, the observations are separated into zero/nonzero categories and modeled using a two-layered hierarchical Bayesian sparse spatial model to estimate the probabilities of these two categories. In Stage 2, we first obtain empirical estimates of the spatially-varying mean and variance profiles across the spatial locations for the positive observations and smooth those estimates using fixed rank kriging. This approximate Bayesian inference method is employed to avoid the high computational burden of modeling large spatial data with spatially-varying coefficients. In Stage 3, we further model the standardized log-transformed positive observations from the second stage using a sparse bivariate spatial Gaussian process. The Gaussian distribution assumption for wildfire counts developed in the third stage is computationally effective but erroneous. Thus, in Stage 4, the predicted exceedance probabilities are post-processed using random forests. We draw posterior inference for Stages 1 and 3 using Markov chain Monte Carlo (MCMC) sampling. We then create a cross-validation scheme for the artificially generated gaps and compare the EVA 2021 prediction scores of the proposed model to those obtained with competing approaches. PubDate: 2023-02-21 DOI: 10.1007/s10687-022-00460-8
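The way the occurrence stage and the positive-part stages combine into predictive exceedance probabilities can be summarised, in simplified form, as
\[
\Pr(Y > y) \;=\; \Pr(Y > 0)\,\Pr(Y > y \mid Y > 0), \qquad y > 0,
\]
with Stage 1 estimating \(\Pr(Y > 0)\), Stages 2–3 supplying the conditional distribution of the positive observations, and Stage 4 post-processing the resulting exceedance probabilities with random forests.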
- Analysis of wildfires and their extremes via spatial quantile autoregressive model
Abstract: In this paper we propose a procedure to estimate the distribution of wildfire frequency and severity using monthly wildfire data from 1993–2015. To this end, a spatial quantile autoregressive (SQAR) model is applied to the data with the aid of extreme value theory. Using the proposed method, we are able to predict the distributional behavior of the data and identify hidden structures beyond the mean structure. In addition, the regression-based formulation lends itself to rich interpretation. We report estimation results for the wildfire data, including significant explanatory variables and some meaningful interpretations. PubDate: 2023-02-13 DOI: 10.1007/s10687-023-00462-0
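A minimal, non-spatial illustration of the quantile-regression building block is sketched below using statsmodels; the covariates are synthetic and the spatial autoregressive component of the SQAR model is omitted, so this is only a stand-in for the procedure described in the paper.

```python
# Illustrative quantile regression at a high quantile with synthetic covariates.
# This omits the spatial autoregressive component of the SQAR model in the paper.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 2000
temperature = rng.normal(25, 5, n)               # hypothetical covariate
drought_index = rng.uniform(0, 1, n)             # hypothetical covariate
counts = rng.poisson(np.exp(0.05 * temperature + 1.5 * drought_index))

X = sm.add_constant(np.column_stack([temperature, drought_index]))
model = sm.QuantReg(counts, X)
fit_90 = model.fit(q=0.90)                       # 90th conditional quantile
print(fit_90.params)
```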
- Gradient boosting with extreme-value theory for wildfire prediction
Abstract: This paper details the approach of the team Kohrrelation in the 2021 Extreme Value Analysis data challenge, dealing with the prediction of wildfire counts and sizes over the contiguous US. Our approach uses ideas from extreme-value theory in a machine learning context with theoretically justified loss functions for gradient boosting. We devise a spatial cross-validation scheme and show that in our setting it provides a better proxy for test set performance than naive cross-validation. The predictions are benchmarked against boosting approaches with different loss functions, and perform competitively in terms of the score criterion, finally placing second in the competition ranking. PubDate: 2023-01-21 DOI: 10.1007/s10687-022-00454-6
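A sketch of the spatial cross-validation idea is given below: entire spatial blocks are held out via scikit-learn's GroupKFold so that validation scores are not inflated by spatial autocorrelation. The blocking rule, the synthetic data, and the generic quantile loss (used here in place of the paper's extreme-value-theory losses) are illustrative assumptions.

```python
# Illustrative spatial cross-validation: hold out entire spatial blocks so that
# validation scores are not inflated by spatial autocorrelation. The blocking
# rule and the quantile loss are stand-ins, not the paper's exact choices.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_pinball_loss
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(4)
n = 3000
lon, lat = rng.uniform(-125, -65, n), rng.uniform(25, 50, n)
X = np.column_stack([lon, lat, rng.normal(size=n)])      # synthetic covariates
y = rng.gamma(2.0, 10.0, n)                              # synthetic burnt-area proxy

# Assign each location to a coarse spatial block (5-degree grid cells).
groups = np.floor(lon / 5).astype(int) * 100 + np.floor(lat / 5).astype(int)

cv = GroupKFold(n_splits=5)
scores = []
for train_idx, test_idx in cv.split(X, y, groups):
    model = GradientBoostingRegressor(loss="quantile", alpha=0.9)
    model.fit(X[train_idx], y[train_idx])
    scores.append(mean_pinball_loss(y[test_idx], model.predict(X[test_idx]), alpha=0.9))
print(np.mean(scores))
```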
- Reconstruction of incomplete wildfire data using deep generative models
Abstract: We present our submission to the Extreme Value Analysis 2021 Data Challenge in which teams were asked to accurately predict distributions of wildfire frequency and size within spatio-temporal regions of missing data. For this competition, we developed a variant of the powerful variational autoencoder models, which we call Conditional Missing data Importance-Weighted Autoencoder (CMIWAE). Our deep latent variable generative model requires little to no feature engineering and does not necessarily rely on the specifics of scoring in the Data Challenge. It is fully trained on incomplete data, with the single objective to maximize log-likelihood of the observed wildfire information. We mitigate the effects of the relatively low number of training samples by stochastic sampling from a variational latent variable distribution, as well as by ensembling a set of CMIWAE models trained and validated on different splits of the provided data. PubDate: 2023-01-18 DOI: 10.1007/s10687-022-00459-1
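The generic importance-weighted variational bound underlying such models (the CMIWAE objective additionally conditions on the missingness pattern, which is not shown here) is
\[
\log p_\theta(x) \;\ge\; \mathbb{E}_{z_1,\dots,z_K \sim q_\phi(\cdot \mid x)}\!\left[\log \frac{1}{K}\sum_{k=1}^{K}\frac{p_\theta(x, z_k)}{q_\phi(z_k \mid x)}\right],
\]
a bound that tightens as the number of importance samples \(K\) increases and reduces to the usual evidence lower bound for \(K = 1\).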
- Exchangeable min-id sequences: Characterization, exponent measures and non-decreasing id-processes
Abstract: We establish a one-to-one correspondence between (i) exchangeable sequences of random variables whose finite-dimensional distributions are minimum (or maximum) infinitely divisible and (ii) non-negative, non-decreasing, infinitely divisible stochastic processes. The exponent measure of an exchangeable minimum infinitely divisible sequence is shown to be the sum of a very simple “drift measure” and a mixture of product probability measures, which uniquely corresponds to the Lévy measure of a non-negative and non-decreasing infinitely divisible process. The latter is shown to be supported on non-negative and non-decreasing functions. In probabilistic terms, the aforementioned infinitely divisible process is equal to the conditional cumulative hazard process associated with the exchangeable sequence of random variables with minimum (or maximum) infinitely divisible marginals. Our results provide an analytic umbrella which embeds the de Finetti subfamilies of many interesting classes of multivariate distributions, such as exogenous shock models, exponential and geometric laws with lack-of-memory property, min-stable multivariate exponential and extreme-value distributions, as well as reciprocal Archimedean copulas with completely monotone generator and Archimedean copulas with log-completely monotone generator. PubDate: 2022-12-17 DOI: 10.1007/s10687-022-00450-w
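For reference, the minimum infinite divisibility property in question can be stated as follows: a random vector \(X = (X_1,\dots,X_d)\) is min-id if for every \(n \in \mathbb{N}\) there exist i.i.d. random vectors \(X^{(n,1)},\dots,X^{(n,n)}\) such that
\[
X \;\overset{d}{=}\; \Bigl(\min_{1\le k\le n} X^{(n,k)}_{1},\,\dots,\,\min_{1\le k\le n} X^{(n,k)}_{d}\Bigr),
\]
and an exchangeable sequence is min-id when all of its finite-dimensional margins have this property.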
- Running minimum in the best-choice problem
Abstract: The full-information best choice problem asks one to find a strategy maximising the probability of stopping at the minimum (or maximum) of a sequence \(X_1,\cdots ,X_n\) of i.i.d. random variables with continuous distribution. In this paper we look at more general models, where independent \(X_j\) ’s may have different distributions, discrete or continuous. A central role in our study is played by the running minimum process, which we first employ to re-visit the classic problem and its limit Poisson counterpart. The approach is further applied to two explicitly solvable models: in the first the distribution of the jth variable is uniform on \(\{j,\cdots ,n\}\) , and in the second it is uniform on \(\{1,\cdots , n\}\) . PubDate: 2022-11-29 DOI: 10.1007/s10687-022-00457-3
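As a toy illustration of the stopping problems discussed above, the simulation below estimates the success probability of a naive cutoff rule (skip roughly the first \(n/e\) observations, then stop at the first new running minimum) for i.i.d. uniform observations; this benchmark rule is not the optimal full-information strategy analysed in the paper.

```python
# Monte Carlo estimate of the success probability of a naive cutoff stopping rule
# for picking the minimum of n i.i.d. Uniform(0,1) observations. This is only a
# benchmark, not the optimal full-information strategy studied in the paper.
import numpy as np

def run_once(n, cutoff, rng):
    x = rng.uniform(size=n)
    best_so_far = x[:cutoff].min() if cutoff > 0 else np.inf
    for j in range(cutoff, n):
        if x[j] < best_so_far:          # first new running minimum after the cutoff
            return x[j] == x.min()      # success if it is the overall minimum
    return x[-1] == x.min()             # otherwise forced to take the last observation

rng = np.random.default_rng(5)
n = 50
cutoff = int(n / np.e)
wins = sum(run_once(n, cutoff, rng) for _ in range(20_000))
print(wins / 20_000)                    # roughly the classical ~0.37 benchmark
```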
- Publisher Correction: Integral Functionals and the Bootstrap for the Tail Empirical Process
PubDate: 2022-11-28 DOI: 10.1007/s10687-022-00455-5
- Extremal characteristics of conditional models
Abstract: Conditionally specified models are often used to describe complex multivariate data. Such models impose implicit structure on the extremes. So far, no methodology exists for calculating extremal characteristics of conditional models, since the copula and marginals are not expressed in closed forms. We consider bivariate conditional models that specify the distribution of \(X\) and the distribution of \(Y\) conditional on \(X\). We provide tools to quantify the implicit assumptions on the extremes of this class of models. In particular, these tools allow us to approximate the distribution of the tail of \(Y\) and the coefficient of asymptotic independence \(\eta\) in closed forms. We apply these methods to a widely used conditional model for wave height and wave period. Moreover, we introduce a new condition on the parameter space for the conditional extremes model of Heffernan and Tawn (Journal of the Royal Statistical Society: Series B (Statistical Methodology) 66(3), 497–547, 2004), and prove that the conditional extremes model does not capture \(\eta\) when \(\eta < 1\). PubDate: 2022-11-10 DOI: 10.1007/s10687-022-00453-7
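The coefficient of asymptotic independence referred to above is usually defined, following Ledford and Tawn, through the joint tail on a common marginal scale: with \(X^{*}\) and \(Y^{*}\) transformed to standard Fréchet margins,
\[
\Pr(X^{*} > t,\; Y^{*} > t) \;=\; \mathcal{L}(t)\,t^{-1/\eta}, \qquad t \to \infty,
\]
where \(\mathcal{L}\) is slowly varying and \(\eta \in (0,1]\); \(\eta = 1\) (with \(\mathcal{L}(t) \not\to 0\)) corresponds to asymptotic dependence and \(\eta < 1\) to asymptotic independence.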
- Asymptotic behavior of an intrinsic rank-based estimator of the Pickands dependence function constructed from B-splines
Abstract: A bivariate extreme-value copula is characterized by its Pickands dependence function, i.e., a convex function defined on the unit interval satisfying boundary conditions. This paper investigates the large-sample behavior of a nonparametric estimator of this function due to Cormier et al. (Extremes 17:633–659, 2014). These authors showed how to construct this estimator through constrained quadratic median B-spline smoothing of pairs of pseudo-observations derived from a random sample. Their estimator is shown here to exist whatever the order \(m \ge 3\) of the B-spline basis, and its consistency is established under minimal conditions. The large-sample distribution of this estimator is also determined under the additional assumption that the underlying Pickands dependence function is a B-spline of given order with a known set of knots. PubDate: 2022-11-09 DOI: 10.1007/s10687-022-00451-9
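For context, a bivariate extreme-value copula is related to its Pickands dependence function \(A\) through
\[
C(u,v) \;=\; \exp\!\left\{\log(uv)\,A\!\left(\frac{\log v}{\log(uv)}\right)\right\}, \qquad (u,v) \in (0,1)^2,
\]
where \(A\) is convex on \([0,1]\) and satisfies \(\max(t, 1-t) \le A(t) \le 1\) with \(A(0) = A(1) = 1\); these are exactly the shape and boundary constraints that the constrained B-spline estimator has to respect.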
- Palm theory for extremes of stationary regularly varying time series and random fields
Abstract: The tail process \(\varvec{Y}=(Y_{\varvec{i}})_{\varvec{i}\in \mathbb {Z}^d}\) of a stationary regularly varying random field \(\varvec{X}=(X_{\varvec{i}})_{\varvec{i}\in \mathbb {Z}^d}\) represents the asymptotic local distribution of \(\varvec{X}\) as seen from its typical exceedance over a threshold u as \(u\rightarrow \infty\) . Motivated by the standard Palm theory, we show that every tail process satisfies an invariance property called exceedance-stationarity and that this property, together with the spectral decomposition of the tail process, characterizes the class of all tail processes. We then restrict to the case when \(Y_{\varvec{i}}\rightarrow 0\) as \( \varvec{i} \rightarrow \infty\) and establish a couple of Palm-like dualities between the tail process and the so-called anchored tail process which, under suitable conditions, represents the asymptotic distribution of a typical cluster of extremes of \(\varvec{X}\) . The main message is that the distribution of the tail process is biased towards clusters with more exceedances. Finally, we use these results to determine the distribution of a typical cluster of extremes for moving average processes with random coefficients and heavy-tailed innovations. PubDate: 2022-10-24 DOI: 10.1007/s10687-022-00447-5
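In the time-series case (\(d = 1\)), the tail process referred to above is defined by the convergence of finite-dimensional distributions
\[
\mathcal{L}\bigl(u^{-1}(X_{i})_{i \in \mathbb{Z}} \,\big|\, |X_{0}| > u\bigr) \;\longrightarrow\; \mathcal{L}\bigl((Y_{i})_{i \in \mathbb{Z}}\bigr), \qquad u \to \infty,
\]
with \(|Y_{0}|\) following a Pareto distribution whose exponent is the index of regular variation; the abstract's random-field setting replaces \(\mathbb{Z}\) by \(\mathbb{Z}^{d}\) and conditions on an exceedance at a typical location.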
- Integral Functionals and the Bootstrap for the Tail Empirical Process
Abstract: The tail empirical process (TEP) generated by an i.i.d. sequence of regularly varying random variables is key to investigating the behaviour of extreme value statistics such as the Hill and harmonic moment estimators of the tail index. The main contribution of the paper is to prove that Efron’s bootstrap produces versions of the estimators that exhibit the same asymptotic behaviour, including possible bias. In addition, the bootstrap provides new estimators of the tail index based on variability. Further, the asymptotic behaviour of the bootstrap variance estimators is shown to be unaffected by bias. PubDate: 2022-10-14 DOI: 10.1007/s10687-022-00445-7
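A minimal sketch of the Hill estimator and a naive Efron (i.i.d. resampling) bootstrap of it is given below, using synthetic Pareto data; the choice of the number of upper order statistics k and the plain resampling scheme are illustrative assumptions rather than the paper's recommendations.

```python
# Hill estimator of the tail index and a naive Efron bootstrap of its variability,
# on synthetic Pareto data. The choice of k is illustrative only.
import numpy as np

def hill(x, k):
    """Hill estimator based on the k largest order statistics."""
    xs = np.sort(x)
    return np.mean(np.log(xs[-k:] / xs[-k - 1]))

rng = np.random.default_rng(6)
n, k = 5000, 200
x = (1.0 / rng.uniform(size=n)) ** (1.0 / 2.0)   # Pareto with tail index alpha = 2

gamma_hat = hill(x, k)                            # estimates 1/alpha = 0.5

boot = np.array([
    hill(rng.choice(x, size=n, replace=True), k)  # Efron bootstrap replicate
    for _ in range(500)
])
print(gamma_hat, boot.std())                      # point estimate and bootstrap SE
```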