Abstract: When making inferences about extreme quantiles, using simple parametric models for the entire distribution can be problematic in that a model that accurately describes the bulk of the distribution may lead to substantially biased estimates of extreme quantiles if the model is misspecified. One way to address this problem is to use flexible parametric families of distributions. For the setting where extremes in both the upper and lower tails are of interest, this paper describes various approaches to quantifying notions of flexibility and then proposes new parametric classes of distributions that satisfy these notions and are computable without requiring numerical integration. A semiparametric extension of these distributions is proposed when the parametric classes are not sufficiently flexible. Some of the new models are applied to daily temperature in July from an ensemble of 50 climate model runs that can be treated as independent realizations of the climate system over the period studied. The large ensemble makes it possible to compare estimates of extreme quantiles based on a single model run to estimates based on the full ensemble. For these data, at the four largest US cities, Chicago, Houston, Los Angeles and New York City, the parametric models generally dominate estimates based on fitting generalized Pareto distributions to some fraction of the most extreme observations, sometimes by a substantial margin. Thus, in at least this setting, parametric models not only provide a way to estimate the whole distribution, they also result in better estimates of extreme quantiles than traditional extreme value approaches. PubDate: 2021-06-01
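
As a point of reference for the baseline mentioned in this abstract (fitting a generalized Pareto distribution to some fraction of the most extreme observations), the following is a minimal sketch and not the authors' procedure. It assumes a method-of-moments fit, which is only sensible for shape parameters below 1/2; the function name `gpd_tail_quantile` and its arguments are hypothetical.

```python
import math


def gpd_tail_quantile(data, tail_fraction, p):
    """Estimate the level exceeded with probability p from a GPD tail fit.

    Assumption: method-of-moments estimates of the GPD scale and shape,
    computed from the excesses over an empirical threshold.
    """
    xs = sorted(data)
    n = len(xs)
    k = int(n * tail_fraction)  # number of upper order statistics used
    if k < 2 or k >= n:
        raise ValueError("tail_fraction must leave at least 2 excesses")
    u = xs[n - k - 1]  # threshold: the (k+1)-th largest value
    excess = [x - u for x in xs[n - k:]]
    m = sum(excess) / k
    s2 = sum((e - m) ** 2 for e in excess) / (k - 1)
    ratio = m * m / s2  # equals 1 - 2*xi for an exact GPD
    xi = 0.5 * (1.0 - ratio)  # shape
    sigma = 0.5 * m * (ratio + 1.0)  # scale
    r = k / (n * p)  # tail fraction divided by target exceedance probability
    if abs(xi) < 1e-9:
        return u + sigma * math.log(r)  # exponential-tail limit
    return u + sigma * (r ** xi - 1.0) / xi
```

Calling `gpd_tail_quantile(sample, 0.1, 0.001)` would use the top 10% of the sample to extrapolate to the level exceeded with probability 0.001. The choice of `tail_fraction` is left to the user here.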

Abstract: Estimation of extreme quantile regions, spaces in which future extreme events can occur with a given low probability, even beyond the range of the observed data, is an important task in the analysis of extremes. Existing methods to estimate such regions are available, but do not provide any measures of estimation uncertainty. We develop univariate and bivariate schemes for estimating extreme quantile regions under the Bayesian paradigm that outperform existing approaches and provide natural measures of quantile region estimate uncertainty. We examine the method’s performance in controlled simulation studies. We illustrate the applicability of the proposed method by analysing high bivariate quantiles for pairs of pollutants, conditionally on different temperature gradations, recorded in Milan, Italy. PubDate: 2021-06-01

Abstract: To mitigate the risk posed by extreme rainfall events, we require statistical models that reliably capture extremes in continuous space with dependence. However, assuming a stationary dependence structure in such models is often erroneous, particularly over large geographical domains. Furthermore, there are limitations on the ability to fit existing models, such as max-stable processes, to a large number of locations. To address these modelling challenges, we present a regionalisation method that partitions stations into regions of similar extremal dependence using clustering. To demonstrate our regionalisation approach, we consider a study region of Australia and discuss the results with respect to known climate and topographic features. To visualise and evaluate the effectiveness of the partitioning, we fit max-stable models to each of the regions. This work serves as a prelude to how one might consider undertaking a project where spatial dependence is non-stationary and is modelled on a large geographical scale. PubDate: 2021-06-01

Abstract: A common statistical problem in hydrology is the estimation of annual maximal river flow distributions and their quantiles, with the objective of evaluating flood protection systems. Typically, record lengths are short and estimators imprecise, so that it is advisable to exploit additional sources of information. However, there is often uncertainty about the adequacy of such information, and a strict decision on whether to use it is difficult. We propose penalized quasi-maximum likelihood estimators to overcome this dilemma, allowing one to push the model towards a reasonable direction defined a priori. We are particularly interested in regional settings, with river flow observations collected at multiple stations. To account for regional information, we introduce a penalization term inspired by the popular Index Flood assumption. Unlike in standard approaches, the degree of regionalization can be controlled gradually instead of deciding between a local or a regional estimator. Theoretical results on the consistency of the estimator are provided and extensive simulations are performed for comparison with other local and regional estimators. The proposed procedure yields very good results for both homogeneous and heterogeneous groups of sites. A case study consisting of sites in Saxony, Germany, illustrates the applicability to real data. PubDate: 2021-06-01

Abstract: In an earthquake event, the combination of a strong mainshock and damaging aftershocks is often the cause of severe structural damage and/or high death tolls. The objective of this paper is to estimate the probability of such extreme events where the mainshock and the largest aftershocks exceed certain thresholds. Two approaches are illustrated and compared: a parametric approach based on previously observed stochastic laws in earthquake data, and a non-parametric approach based on bivariate extreme value theory. We analyze the earthquake data from the North Anatolian Fault Zone (NAFZ) in Turkey during 1965–2018 and show that the two approaches provide unifying results. PubDate: 2021-06-01

Abstract: Saudi Arabia has been seeking to reduce its dependence on oil by diversifying its energy portfolio, including the largely underused energy potential from wind. However, extreme winds can disrupt wind turbine operations, thus preventing the stable and continuous production of wind energy. In this study, we assess the risk of disruptions of wind turbine operations, based on return levels with a hierarchical spatial extreme modeling approach for wind speeds in Saudi Arabia. Using a unique Weather Research and Forecasting dataset, we provide the first high-resolution risk assessment of wind extremes under spatial non-stationarity over the country. We account for the spatial dependence with a multivariate intrinsic autoregressive prior at the latent Gaussian process level. The computational efficiency is greatly improved by parallel computing on subregions from spatial clustering, and the maps are smoothed by fitting the model to cluster neighbors. Under the Bayesian hierarchical framework, we measure the uncertainty of return levels from the posterior Markov chain Monte Carlo samples, and produce probability maps of return levels exceeding the cut-out wind speed of wind turbines within their lifetime. The probability maps show that locations in the South of Saudi Arabia and near the Red Sea and the Persian Gulf are at very high risk of disruption of wind turbine operations. PubDate: 2021-06-01

Abstract: Physical considerations and previous studies suggest that extremal dependence between ocean storm severity at two locations exhibits near asymptotic dependence at short inter-location distances, leading to asymptotic independence and perfect independence with increasing distance. We present a spatial conditional extremes (SCE) model for storm severity, characterising extremal spatial dependence of severe storms by distance and direction. The model is an extension of Shooter et al. 2019 (Environmetrics 30, e2562, 2019) and Wadsworth and Tawn (2019), incorporating piecewise linear representations for SCE model parameters with distance and direction; model variants including parametric representations of some SCE model parameters are also considered. The SCE residual process is assumed to follow the delta-Laplace form marginally, with distance-dependent parameter. Residual dependence of remote locations given conditioning location is characterised by a conditional Gaussian covariance dependent on the distances between remote locations, and distances of remote locations to the conditioning location. We apply the model using Bayesian inference to estimate extremal spatial dependence of storm peak significant wave height on a neighbourhood of 150 locations covering over 200,000 km² in the North Sea. PubDate: 2021-06-01

Abstract: It is well known that the distribution of extreme values of strictly stationary sequences differs from that of independent and identically distributed sequences in that extremal clustering may occur. Here we consider non-stationary but identically distributed sequences of random variables subject to suitable long range dependence restrictions. We find that the limiting distribution of appropriately normalized sample maxima depends on a parameter that measures the average extremal clustering of the sequence. Based on this new representation we derive the asymptotic distribution for the time between consecutive extreme observations and construct moment and likelihood based estimators for measures of extremal clustering. We specialize our results to random sequences with periodic dependence structure. PubDate: 2021-05-12

Abstract: First, we consider a stationary random field indexed by an increasing sequence of subsets of \(\mathbb{Z}^{d}\). Under certain mixing and anti-clustering conditions combined with a very broad assumption on how the sequence of spatial index sets increases, we obtain an extremal result that relates a normalized version of the distribution of the maximum of the field over the index sets to the tail distribution of the individual variables. Furthermore, we identify the limiting distribution as an extreme value distribution. Secondly, we consider a continuous, infinitely divisible random field indexed by \(\mathbb{R}^{d}\) given as an integral of a kernel function with respect to a Lévy basis with convolution equivalent Lévy measure. When observing the supremum of this field over an increasing sequence of (continuous) index sets, we obtain an extreme value theorem for the distribution of this supremum. The proof relies on discretization and a conditional version of the technique applied in the first part of the paper, as we condition on the high activity and light-tailed part of the field. PubDate: 2021-05-07

Abstract: In the context of bivariate random variables \(\left(Y^{(1)},Y^{(2)}\right)\), the marginal expected shortfall, defined as \(\mathbb{E}\left(Y^{(1)} \mid Y^{(2)} \ge Q_{2}(1-p)\right)\) for \(p\) small, where \(Q_{2}\) denotes the quantile function of \(Y^{(2)}\), is an important risk measure, which finds applications in areas such as finance and environmental science. Our paper pioneers the statistical modeling of this risk measure when the random variables of main interest \(\left(Y^{(1)},Y^{(2)}\right)\) are observed together with a random covariate \(X\), leading to the concept of the conditional marginal expected shortfall. The asymptotic behavior of an estimator for this conditional marginal expected shortfall is studied for a wide class of conditional bivariate distributions, with heavy-tailed marginal conditional distributions, and where \(p\) tends to zero at an intermediate rate. The finite sample performance is evaluated in a small simulation experiment. The practical applicability of the proposed estimator is illustrated on flood claim data. PubDate: 2021-05-06
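
To make the definition of the marginal expected shortfall concrete, here is a minimal sketch of a plain empirical version: the average of the first variable over the observations whose second variable lies at or above an empirical upper quantile. This is not the estimator studied in the paper: it ignores the covariate and does no tail extrapolation, so it is only meaningful when the sample holds several observations beyond the threshold. The name `empirical_mes` and the order-statistic quantile convention are assumptions.

```python
def empirical_mes(y1, y2, p):
    """Average of y1 over observations whose y2 is in the top fraction p.

    Assumption: the quantile Q_2(1 - p) is replaced by the k-th largest
    value of y2, with k = max(1, floor(n * p)).
    """
    n = len(y2)
    if n == 0 or len(y1) != n:
        raise ValueError("y1 and y2 must be non-empty and of equal length")
    k = max(1, int(n * p))
    threshold = sorted(y2)[n - k]  # k-th largest observation of y2
    selected = [a for a, b in zip(y1, y2) if b >= threshold]
    return sum(selected) / len(selected)
```

For example, with `y2 = [1, ..., 10]`, `y1 = [10, ..., 100]` and `p = 0.2`, the threshold is 9 and the function averages the two values of `y1` paired with 9 and 10.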

Abstract: In this paper we derive new results on multivariate extremes and D-norms. In particular we establish new characterizations of the multivariate max-domain of attraction property. The limit distribution of certain multivariate exceedances above high thresholds is derived, and the distribution of that generator of a D-norm on \(\mathbb{R}^{d}\), whose components sum up to \(d\), is obtained. Finally we introduce exchangeable D-norms and show that the set of exchangeable D-norms is a simplex. PubDate: 2021-05-05

Abstract: Threshold selection plays a key role in various aspects of statistical inference of rare events. In this work, two new threshold selection methods are introduced. The first approach measures the fit of the exponential approximation above a threshold and achieves good performance in small samples. The second method smoothly estimates the asymptotic mean squared error of the Hill estimator and performs consistently well over a wide range of processes. Both methods are analyzed theoretically, compared to existing procedures in an extensive simulation study and applied to a dataset of financial losses, where the underlying extreme value index is assumed to vary over time. PubDate: 2021-04-16
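
For readers unfamiliar with the Hill estimator referred to in this abstract, the standard textbook form can be sketched as follows. The threshold selection methods proposed in the paper are not reproduced here; the number of upper order statistics `k` is simply taken as an input, and the function name is hypothetical.

```python
import math


def hill_estimator(data, k):
    """Hill estimate of the extreme value index from the k largest values.

    Averages the log-ratios of the k largest observations to the
    (k+1)-th largest, which acts as the threshold.
    """
    xs = sorted(data, reverse=True)
    if not 1 <= k < len(xs):
        raise ValueError("k must satisfy 1 <= k < len(data)")
    if xs[k] <= 0:
        raise ValueError("the threshold must be positive")
    log_threshold = math.log(xs[k])
    return sum(math.log(x) - log_threshold for x in xs[:k]) / k
```

Plotting `hill_estimator(data, k)` against `k` shows why threshold selection matters: small `k` gives a noisy estimate, while large `k` pulls in observations from outside the tail and introduces bias.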

Abstract: In this paper, we extend the Zipf distribution by means of the Randomly Stopped Extreme mechanism; we establish the conditions under which the maximum and minimum families of distributions intersect in the original family; and we demonstrate how to generate data from the extended family using any Zipf random number generator. We study in detail the particular cases of geometric and positive Poisson stopping distributions, showing that, in log-log scale, the extended models allow for top-concavity (top-convexity) while maintaining linearity in the tail. We prove the suitability of the models presented by fitting the degree sequences of a collaboration network and a protein-protein interaction network. The proposed models not only give a good fit, but they also allow for extracting interesting insights related to the data generation mechanism. PubDate: 2021-03-23
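
The randomly stopped extreme mechanism can be illustrated by direct simulation: draw a random count N from the stopping distribution, draw N Zipf values, and keep their maximum or minimum. The sketch below is not the generation method of the paper, which is not spelled out in the abstract. It assumes a geometric stopping distribution on {1, 2, ...} with continuation probability `q`, and a Zipf law truncated at `max_value` so that only the standard library is needed. All function names are hypothetical.

```python
import itertools
import random


def make_truncated_zipf_sampler(alpha, max_value, rng):
    """Return a sampler for P(X = k) proportional to k**(-alpha), 1 <= k <= max_value.

    Assumption: the truncation at max_value is for illustration only.
    """
    support = list(range(1, max_value + 1))
    cum_weights = list(itertools.accumulate(k ** -alpha for k in support))

    def draw():
        return rng.choices(support, cum_weights=cum_weights, k=1)[0]

    return draw


def geometric_count(q, rng):
    """Draw N >= 1 with P(N = n) = (1 - q) * q**(n - 1)."""
    n = 1
    while rng.random() < q:
        n += 1
    return n


def randomly_stopped_extreme(draw_zipf, q, rng, use_max=True):
    """Maximum (or minimum) of a geometric number of Zipf draws."""
    n = geometric_count(q, rng)
    values = [draw_zipf() for _ in range(n)]
    return max(values) if use_max else min(values)
```

Any other Zipf generator can be passed in place of `draw_zipf`, since the mechanism only needs a zero-argument callable returning one draw.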

Abstract: We propose a new class of extreme-value copulas which are extreme-value limits of conditional normal models. Conditional normal models are generalizations of conditional independence models, where the dependence among observed variables is modeled using one unobserved factor. Conditional on this factor, the distribution of these variables is given by the Gaussian copula. This structure allows one to build flexible and parsimonious models for data with complex dependence structures, such as data with spatial dependence or factor structure. We study the extreme-value limits of these models and show some interesting special cases of the proposed class of copulas. We develop estimation methods for the proposed models and conduct a simulation study to assess the performance of these algorithms. Finally, we apply these copula models to analyze data on monthly wind maxima and stock return minima. PubDate: 2021-03-19

Abstract: We develop a method for probabilistic prediction of extreme value hot-spots in a spatio-temporal framework, tailored to big datasets containing important gaps. In this setting, direct calculation of summaries from data, such as the minimum over a space-time domain, is not possible. To obtain predictive distributions for such cluster summaries, we propose a two-step approach. We first model marginal distributions with a focus on accurate modeling of the right tail and then, after transforming the data to a standard Gaussian scale, we estimate a Gaussian space-time dependence model defined locally in the time domain for the space-time subregions where we want to predict. In the first step, we detrend the mean and standard deviation of the data and fit a spatially resolved generalized Pareto distribution to apply a correction of the upper tail. To ensure spatial smoothness of the estimated trends, we either pool data using nearest-neighbor techniques, or apply generalized additive regression modeling. To cope with high space-time resolution of data, the local Gaussian models use a Markov representation of the Matérn correlation function based on the stochastic partial differential equations (SPDE) approach. In the second step, they are fitted in a Bayesian framework through the integrated nested Laplace approximation implemented in R-INLA. Finally, posterior samples are generated to provide statistical inferences through Monte-Carlo estimation. Motivated by the 2019 Extreme Value Analysis data challenge, we illustrate our approach to predict the distribution of local space-time minima in anomalies of Red Sea surface temperatures, using a gridded dataset (11315 days, 16703 pixels) with artificially generated gaps. In particular, we show the improved performance of our two-step approach over a purely Gaussian model without tail transformations. PubDate: 2021-03-01 DOI: 10.1007/s10687-020-00394-z

Abstract: Recently in Gao and Stoev (2020) it was established that the concentration of maxima phenomenon is the key to solving the exact sparse support recovery problem in high dimensions. This phenomenon, known also as relative stability, has been little studied in the context of dependence. Here, we obtain bounds on the rate of concentration of maxima in Gaussian triangular arrays. These results are used to establish sufficient conditions for the uniform relative stability of functions of Gaussian arrays, leading to new models that exhibit phase transitions in the exact support recovery problem. Finally, the optimal rate of concentration for Gaussian arrays is studied under general assumptions implied by the classic condition of Berman (1964). PubDate: 2020-11-19 DOI: 10.1007/s10687-020-00399-8

Abstract: We describe our submission to the Extreme Value Analysis 2019 Data Challenge in which teams were asked to predict extremes of sea surface temperature anomaly within spatio-temporal regions of missing data. We present a computational framework which reconstructs missing data using convolutional deep neural networks. Conditioned on incomplete data, we employ autoencoder-like models as multivariate conditional distributions from which possible reconstructions of the complete dataset are sampled using imputed noise. In order to mitigate bias introduced by any one particular model, a prediction ensemble is constructed to create the final distribution of extremal values. Our method does not rely on expert knowledge in order to accurately reproduce dynamic features of a complex oceanographic system with minimal assumptions. The obtained results promise reusability and generalization to other domains. PubDate: 2020-10-21 DOI: 10.1007/s10687-020-00396-x