Abstract: Abstract It has long been known that the distribution of a sum Sn of independent non-negative integer-valued random variables can often be approximated by a Poisson law: Sn≈πλ, where . The problem of evaluating the accuracy of such an approximation has attracted a lot of attention over the past six decades. From a practical point of view, the problem has important applications in insurance, reliability theory, extreme value theory, etc.; from a theoretical point of view, it provides insights into Kolmogorov’s problem. Among the popular metrics considered in the literature is the Gini–Kantorovich distance dG. The task of establishing an estimate of dG(Sn;πλ) with the correct (best possible) constant at the leading term long remained open. The paper presents a solution to that problem. A first-order asymptotic expansion is established as well. We show that the accuracy of approximation can be considerably better if the random variables obey an extra condition involving the first two moments. A sharp estimate of the accuracy of shifted (translated) Poisson approximation is established as well. PubDate: 2021-01-08
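As a rough numerical illustration of the quantity discussed in this abstract, the sketch below computes the Gini–Kantorovich distance between a sum of independent Bernoulli variables and a Poisson law. It is not the paper's method. It assumes Bernoulli summands and takes λ to be the sum of the success probabilities (an assumption here, since the abstract's definition of λ is not shown). All function names are hypothetical. For integer-valued laws the distance equals the sum over k of the absolute difference of the two distribution functions.

```python
import math


def bernoulli_sum_pmf(probs):
    """Exact pmf of a sum of independent Bernoulli(p_i) variables, by convolution."""
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)
            nxt[k + 1] += mass * p
        pmf = nxt
    return pmf


def poisson_pmf(lam, kmax):
    """Poisson(lam) probabilities for k = 0..kmax, computed recursively."""
    pmf = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        pmf.append(pmf[-1] * lam / k)
    return pmf


def gini_kantorovich(probs, tail=60):
    """Sum over k of |F_S(k) - F_Poisson(k)|, truncated `tail` points past n.

    Assumption: lam = sum(probs), i.e. the Poisson law matches the mean of S_n.
    """
    lam = sum(probs)
    s_pmf = bernoulli_sum_pmf(probs)
    kmax = len(s_pmf) - 1 + tail
    p_pmf = poisson_pmf(lam, kmax)
    s_pmf = s_pmf + [0.0] * (kmax + 1 - len(s_pmf))
    dist = cdf_s = cdf_p = 0.0
    for a, b in zip(s_pmf, p_pmf):
        cdf_s += a
        cdf_p += b
        dist += abs(cdf_s - cdf_p)
    return dist


if __name__ == "__main__":
    print(gini_kantorovich([0.05] * 20))
```

The truncation parameter `tail` is a convenience for this sketch; it is adequate when λ is at most the number of summands, as it is for Bernoulli variables.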
Abstract: Abstract In an earthquake event, the combination of a strong mainshock and damaging aftershocks is often the cause of severe structural damage and/or high death tolls. The objective of this paper is to estimate the probability of such extreme events where the mainshock and the largest aftershocks exceed certain thresholds. Two approaches are illustrated and compared – a parametric approach based on previously observed stochastic laws in earthquake data, and a non-parametric approach based on bivariate extreme value theory. We analyze the earthquake data from the North Anatolian Fault Zone (NAFZ) in Turkey during 1965–2018 and show that the two approaches provide unifying results. PubDate: 2020-11-23
Abstract: Abstract Recently in Gao and Stoev (2020) it was established that the concentration of maxima phenomenon is the key to solving the exact sparse support recovery problem in high dimensions. This phenomenon, known also as relative stability, has been little studied in the context of dependence. Here, we obtain bounds on the rate of concentration of maxima in Gaussian triangular arrays. These results are used to establish sufficient conditions for the uniform relative stability of functions of Gaussian arrays, leading to new models that exhibit phase transitions in the exact support recovery problem. Finally, the optimal rate of concentration for Gaussian arrays is studied under general assumptions implied by the classic condition of Berman (1964). PubDate: 2020-11-19
Abstract: Abstract This paper presents our winning entry for the EVA 2019 data competition, the aim of which is to predict Red Sea surface temperature extremes over space and time. To achieve this, we used a stochastic partial differential equation (Poisson equation) based method, improved through a regularization to penalize large magnitudes of solutions. This approach is shown to be successful according to the competition’s evaluation criterion, i.e. a threshold-weighted continuous ranked probability score. Our stochastic Poisson equation and its boundary conditions resolve the data’s non-stationarity naturally and effectively. Meanwhile, our numerical method is computationally efficient at dealing with the data’s high dimensionality, without any parameter estimation. It demonstrates the usefulness of stochastic differential equations for spatio-temporal predictions, including the extremes of the process. PubDate: 2020-11-17
Abstract: Abstract Recently, the notion of implicit extreme value distributions has been established, which is based on a given loss function f ≥ 0. From an application point of view, one is interested in extreme loss events that occur relative to f rather than in the corresponding extreme values themselves. In this context, so-called f-implicit α-Fréchet max-stable distributions arise and have been used to construct independently scattered sup-measures that possess such margins. In this paper we solve an open problem in Goldbach (2016) by developing a stochastic integral of a deterministic function g ≥ 0 with respect to implicit max-stable sup-measures. The resulting theory covers the construction of max-stable extremal integrals (see Stoev and Taqqu, Extremes 8, 237–266 (2005)) and, at the same time, reveals striking parallels. PubDate: 2020-10-22
Abstract: Abstract We describe our submission to the Extreme Value Analysis 2019 Data Challenge in which teams were asked to predict extremes of sea surface temperature anomaly within spatio-temporal regions of missing data. We present a computational framework which reconstructs missing data using convolutional deep neural networks. Conditioned on incomplete data, we employ autoencoder-like models as multivariate conditional distributions from which possible reconstructions of the complete dataset are sampled using imputed noise. In order to mitigate bias introduced by any one particular model, a prediction ensemble is constructed to create the final distribution of extremal values. Our method does not rely on expert knowledge in order to accurately reproduce dynamic features of a complex oceanographic system with minimal assumptions. The obtained results promise reusability and generalization to other domains. PubDate: 2020-10-21
Abstract: Abstract To mitigate the risk posed by extreme rainfall events, we require statistical models that reliably capture extremes in continuous space with dependence. However, assuming a stationary dependence structure in such models is often erroneous, particularly over large geographical domains. Furthermore, there are limitations on the ability to fit existing models, such as max-stable processes, to a large number of locations. To address these modelling challenges, we present a regionalisation method that partitions stations into regions of similar extremal dependence using clustering. To demonstrate our regionalisation approach, we consider a study region of Australia and discuss the results with respect to known climate and topographic features. To visualise and evaluate the effectiveness of the partitioning, we fit max-stable models to each of the regions. This work serves as a prelude to how one might consider undertaking a project where spatial dependence is non-stationary and is modelled on a large geographical scale. PubDate: 2020-10-07
Abstract: Abstract We develop a method for probabilistic prediction of extreme value hot-spots in a spatio-temporal framework, tailored to big datasets containing important gaps. In this setting, direct calculation of summaries from data, such as the minimum over a space-time domain, is not possible. To obtain predictive distributions for such cluster summaries, we propose a two-step approach. We first model marginal distributions with a focus on accurate modeling of the right tail and then, after transforming the data to a standard Gaussian scale, we estimate a Gaussian space-time dependence model defined locally in the time domain for the space-time subregions where we want to predict. In the first step, we detrend the mean and standard deviation of the data and fit a spatially resolved generalized Pareto distribution to apply a correction of the upper tail. To ensure spatial smoothness of the estimated trends, we either pool data using nearest-neighbor techniques, or apply generalized additive regression modeling. To cope with high space-time resolution of data, the local Gaussian models use a Markov representation of the Matérn correlation function based on the stochastic partial differential equations (SPDE) approach. In the second step, they are fitted in a Bayesian framework through the integrated nested Laplace approximation implemented in R-INLA. Finally, posterior samples are generated to provide statistical inferences through Monte-Carlo estimation. Motivated by the 2019 Extreme Value Analysis data challenge, we illustrate our approach to predict the distribution of local space-time minima in anomalies of Red Sea surface temperatures, using a gridded dataset (11315 days, 16703 pixels) with artificially generated gaps. In particular, we show the improved performance of our two-step approach over a purely Gaussian model without tail transformations. PubDate: 2020-09-15
Abstract: Abstract Classification tasks usually assume that all possible classes are present during the training phase. This is restrictive if the algorithm is used over a long time and possibly encounters samples from unknown new classes. It is therefore fundamental to develop algorithms able to distinguish between normal and abnormal test data. In the last few years, extreme value theory has become an important tool in multivariate statistics and machine learning. The recently introduced extreme value machine, a classifier motivated by extreme value theory, addresses this problem and achieves competitive performance in specific cases. We show that this algorithm has some theoretical and practical drawbacks and can fail even if the recognition task is fairly simple. To overcome these limitations, we propose two new algorithms for anomaly detection relying on approximations from extreme value theory that are more robust in such cases. We exploit the intuition that test points that are extremely far from the training classes are more likely to be abnormal objects. We derive asymptotic results motivated by univariate extreme value theory that make this intuition precise. We show the effectiveness of our classifiers in simulations and on real data sets. PubDate: 2020-09-09
Abstract: Abstract In this paper, we investigate temporal clusters of extremes defined as subsequent exceedances of high thresholds in a stationary time series. Two meaningful features of these clusters are the probability distribution of the cluster size and the ordinal patterns giving the relative positions of the data points within a cluster. Since these patterns take only the ordinal structure of consecutive data points into account, the method is robust under monotone transformations and measurement errors. We verify the existence of the corresponding limit distributions in the framework of regularly varying time series, develop non-parametric estimators and show their asymptotic normality under appropriate mixing conditions. The performance of the estimators is demonstrated in a simulated example and a real data application to discharge data of the river Rhine. PubDate: 2020-08-24
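The two cluster features named in this abstract can be sketched in a few lines. The code below is an illustrative reading of "subsequent exceedances of a high threshold", not the paper's estimators: it treats each maximal run of consecutive exceedances as one cluster, tabulates the empirical cluster-size distribution, and encodes the ordinal pattern of a cluster as the ranks of its values. All function names are hypothetical.

```python
from collections import Counter


def exceedance_clusters(series, threshold):
    """Split a series into maximal runs of consecutive values above `threshold`."""
    clusters, current = [], []
    for x in series:
        if x > threshold:
            current.append(x)
        elif current:
            clusters.append(current)
            current = []
    if current:
        clusters.append(current)
    return clusters


def ordinal_pattern(values):
    """Rank of each value within the cluster (0 = smallest); ties broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order):
        ranks[index] = rank
    return tuple(ranks)


def cluster_size_distribution(series, threshold):
    """Empirical relative frequencies of cluster sizes (empty dict if no clusters)."""
    sizes = Counter(len(c) for c in exceedance_clusters(series, threshold))
    total = sum(sizes.values())
    if total == 0:
        return {}
    return {size: count / total for size, count in sorted(sizes.items())}


if __name__ == "__main__":
    data = [1, 5, 6, 2, 7, 1, 9, 8, 10]
    for cluster in exceedance_clusters(data, threshold=4):
        print(cluster, ordinal_pattern(cluster))
```

Because `ordinal_pattern` depends only on the ordering of the values, applying a strictly increasing transformation to the data leaves the patterns unchanged, which is the robustness property the abstract refers to.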
Abstract: Abstract We investigate maxima in incomplete samples from strictly stationary random sequences defined as linear models of i.i.d. random variables with heavy-tailed innovations that satisfy the tail balance condition. Using the point process approach we obtain limit theorems for the sequence of random vectors whose components are properly normalized maxima in complete and incomplete samples. PubDate: 2020-08-23
Abstract: Abstract We derive the exact asymptotics of $\mathbb{P}\left\{ \sup_{t\ge 0} \left(X_{1}(t) - \mu_{1} t\right) > u,\ \sup_{s\ge 0} \left(X_{2}(s) - \mu_{2} s\right) > u \right\}$ as $u\to\infty$, where $(X_{1}(t), X_{2}(s))_{t,s\ge 0}$ is a correlated two-dimensional Brownian motion with correlation $\rho \in [-1,1]$ and $\mu_{1}, \mu_{2} > 0$. It appears that the interplay between $\rho$ and $\mu_{1}, \mu_{2}$ leads to several types of asymptotics. Although the exponent in the asymptotics as a function of $\rho$ is continuous, one can observe different types of prefactor functions depending on the range of $\rho$, which constitutes a phase-type transition phenomenon. PubDate: 2020-08-11
Abstract: Abstract The Marshall-Olkin (MO) distribution is considered a key model in reliability theory and in risk analysis, where it is used to model the lifetimes of dependent components or entities of a system and dependency is induced by “shocks” that hit one or more components at a time. Of particular interest is the Lévy-frailty subfamily of the Marshall-Olkin (LFMO) distribution, since it has few parameters and because the nontrivial dependency structure is driven by an underlying Lévy subordinator process. The main contribution of this work is that we derive the precise asymptotic behavior of the upper order statistics of the LFMO distribution. More specifically, we consider a sequence of n univariate random variables jointly distributed as a multivariate LFMO distribution and analyze the order statistics of the sequence as n grows. Our main result states that if the underlying Lévy subordinator is in the normal domain of attraction of a stable distribution with index of stability α then, after certain logarithmic centering and scaling, the upper order statistics converge in distribution to a stable distribution if α > 1 or a simple transformation of it if α ≤ 1. Our result can also give easily computable confidence intervals for the last failure times, provided that a proper convergence analysis is carried out first. PubDate: 2020-08-07
Abstract: Abstract We consider removing lower order statistics from the classical Hill estimator in extreme value statistics, and compensating for it by rescaling the remaining terms. Trajectories of these trimmed statistics as a function of the extent of trimming turn out to be quite flat near the optimal threshold value. For the regularly varying case, the classical threshold selection problem in tail estimation is then revisited, both visually via trimmed Hill plots and, for the Hall class, also mathematically via minimizing the expected empirical variance. This leads to a simple threshold selection procedure for the classical Hill estimator which circumvents the estimation of some of the tail characteristics, a problem which is usually the bottleneck in threshold selection. As a by-product, we derive an alternative estimator of the tail index, which assigns more weight to large observations, and works particularly well for relatively lighter tails. A simple ratio statistic routine is suggested to evaluate the goodness of the implied selection of the threshold. We illustrate the favourable performance and the potential of the proposed method with simulation studies and real insurance data. PubDate: 2020-07-14
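For orientation, the classical Hill estimator that this abstract starts from can be sketched as follows. This is the standard untrimmed estimator only; the trimmed and rescaled variant and the threshold selection procedure proposed in the paper are not reproduced here. The function name and the choice of k in the usage lines are illustrative assumptions.

```python
import math
import random


def hill_estimator(sample, k):
    """Classical Hill estimate of the extreme value index from the k largest values.

    Uses the (k+1)-th largest observation as the threshold and averages the
    log-excesses of the k observations above it.
    """
    if not 1 <= k < len(sample):
        raise ValueError("k must satisfy 1 <= k < len(sample)")
    ordered = sorted(sample, reverse=True)
    threshold = ordered[k]
    if threshold <= 0:
        raise ValueError("the threshold order statistic must be positive")
    return sum(math.log(x / threshold) for x in ordered[:k]) / k


if __name__ == "__main__":
    random.seed(0)
    alpha = 2.0  # Pareto tail index, so the target value is 1 / alpha = 0.5
    # 1 - random() lies in (0, 1], which avoids a zero base for the negative power.
    data = [(1.0 - random.random()) ** (-1.0 / alpha) for _ in range(5000)]
    print(hill_estimator(data, k=500))
```

Plotting the estimate against k gives the usual Hill plot; the sensitivity of that plot to k is the threshold selection problem the abstract addresses.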
Abstract: Abstract We develop a formula for the power-law decay of various sets for symmetric stable random vectors in terms of how many vectors from the support of the corresponding spectral measure are needed to enter the set. One sees different decay rates in “different directions”, illustrating the phenomenon of hidden regular variation. We give several examples and obtain quite varied behavior, including sets which do not have exact power-law decay. PubDate: 2020-07-09
Abstract: Abstract Measures of risk concentration and their asymptotic behavior for portfolios with heavy-tailed risk factors are of interest in risk management. Second order regular variation is a structural assumption often imposed on such risk factors to study their convergence rates. In this paper, we provide the asymptotic rate of convergence of the measure of risk concentration for a portfolio of heavy-tailed risk factors, when the portfolio admits the so-called second order regular variation property. Moreover, we explore the relationship between multivariate second order regular variation for a vector (e.g., risk factors) and the second order regular variation property for the sum of its components (e.g., the portfolio of risk factors). Results are illustrated with a variety of examples. PubDate: 2020-06-28
Abstract: Abstract Consider the maximum of independent and identically distributed random variables. The classical result says that the renormalized sample maximum converges to an extreme value distribution, under certain conditions on the distribution function. In the present paper, we study the uniform rate of convergence with respect to the Kolmogorov distance in the framework of the Stein equations. Some typical examples are presented in the paper. PubDate: 2020-06-16
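A small numerical check can make the convergence statement concrete. The sketch below does not use Stein equations and is not taken from the paper; it assumes standard exponential variables, for which the maximum of n observations centred by log n has distribution function (1 - e^{-x}/n)^n, and approximates the Kolmogorov distance to the Gumbel law exp(-e^{-x}) on a finite grid. The function name, grid size, and upper grid limit are arbitrary choices for illustration.

```python
import math


def kolmogorov_distance_exp_max(n, grid_points=20001, x_max=20.0):
    """Grid approximation of sup_x |P(M_n - log n <= x) - exp(-exp(-x))|.

    M_n is the maximum of n i.i.d. standard exponential variables (an assumed
    example). Below x = -log n the first distribution function is zero.
    """
    x_min = -math.log(n)
    step = (x_max - x_min) / (grid_points - 1)
    worst = 0.0
    for j in range(grid_points):
        x = x_min + j * step
        t = math.exp(-x)
        cdf_max = max(0.0, 1.0 - t / n) ** n
        cdf_gumbel = math.exp(-t)
        worst = max(worst, abs(cdf_max - cdf_gumbel))
    return worst


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, kolmogorov_distance_exp_max(n))
```

In this particular example the printed distances shrink roughly in proportion to 1/n; other parent distributions can converge much more slowly, which is why uniform rates are of interest.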
Abstract: Abstract We acknowledge the priority on the introduction of the formula of the t-lgHill estimator for the positive extreme value index. We provide a novel motivation for this estimator based on ecologically driven dynamical systems. Another motivation is given directly by applying the general t-Hill procedure to the log-gamma distribution. We illustrate the good quality of the t-lgHill estimator in comparison to the classical Hill estimator on novel data on the concentration of arsenic in drinking water in the rural area of the Arica and Parinacota Region, Chile. PubDate: 2020-04-27
Abstract: Abstract The extreme value dependence of regularly varying stationary time series can be described by the spectral tail process. Drees et al. (Extremes 18(3), 369–402, 2015) proposed estimators of the marginal distributions of this process based on exceedances over high deterministic thresholds and analyzed their asymptotic behavior. In practice, however, versions of the estimators are applied which use exceedances over random thresholds like intermediate order statistics. We prove that these modified estimators have the same limit distributions. This finding is corroborated in a simulation study, but the version using order statistics performs a bit better for finite samples. PubDate: 2020-03-05