Abstract: We develop a method for probabilistic prediction of extreme value hot-spots in a spatio-temporal framework, tailored to big datasets containing important gaps. In this setting, direct calculation of summaries from data, such as the minimum over a space-time domain, is not possible. To obtain predictive distributions for such cluster summaries, we propose a two-step approach. We first model marginal distributions with a focus on accurate modeling of the right tail and then, after transforming the data to a standard Gaussian scale, we estimate a Gaussian space-time dependence model defined locally in the time domain for the space-time subregions where we want to predict. In the first step, we detrend the mean and standard deviation of the data and fit a spatially resolved generalized Pareto distribution to apply a correction of the upper tail. To ensure spatial smoothness of the estimated trends, we either pool data using nearest-neighbor techniques, or apply generalized additive regression modeling. To cope with high space-time resolution of data, the local Gaussian models use a Markov representation of the Matérn correlation function based on the stochastic partial differential equations (SPDE) approach. In the second step, they are fitted in a Bayesian framework through the integrated nested Laplace approximation implemented in R-INLA. Finally, posterior samples are generated to provide statistical inferences through Monte-Carlo estimation. Motivated by the 2019 Extreme Value Analysis data challenge, we illustrate our approach to predict the distribution of local space-time minima in anomalies of Red Sea surface temperatures, using a gridded dataset (11315 days, 16703 pixels) with artificially generated gaps. In particular, we show the improved performance of our two-step approach over a purely Gaussian model without tail transformations. PubDate: 2021-03-01

Abstract: This paper details the approach of team Lancaster to the 2019 EVA data challenge, dealing with spatio-temporal modelling of Red Sea surface temperature anomalies. We model the marginal distributions and dependence features separately; for the former, we use a combination of Gaussian and generalised Pareto distributions, while the dependence is captured using a localised Gaussian process approach. We also propose a space-time moving estimate of the cumulative distribution function that takes into account spatial variation and temporal trend in the anomalies, to be used in those regions with limited available data. The team’s predictions are compared to results obtained via an empirical benchmark. Our approach performs well in terms of the threshold-weighted continuous ranked probability score criterion, chosen by the challenge organiser. PubDate: 2021-03-01
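
For readers unfamiliar with the criterion named above: the threshold-weighted continuous ranked probability score is commonly defined as the integral of w(x) (F(x) - 1{y <= x})^2 over x, where F is the predictive distribution function, y the observation and w a weight function that emphasises the region of interest. The following is a minimal sketch, assuming an ensemble forecast and a plain trapezoidal rule. The function name `tw_crps`, the integration bounds and the weight functions are illustrative assumptions, not the challenge's actual settings.

```python
import bisect


def tw_crps(ensemble, y, weight, lo, hi, n_steps=10000):
    """Approximate the threshold-weighted CRPS on [lo, hi] by the trapezoidal rule.

    ensemble : forecast samples defining an empirical predictive CDF
    y        : the observed value
    weight   : hypothetical non-negative weight function w(x)
    """
    members = sorted(ensemble)
    m = len(members)
    h = (hi - lo) / n_steps

    def integrand(x):
        cdf = bisect.bisect_right(members, x) / m  # empirical F(x)
        step = 1.0 if x >= y else 0.0              # indicator 1{y <= x}
        return weight(x) * (cdf - step) ** 2

    total = 0.5 * (integrand(lo) + integrand(hi))
    for i in range(1, n_steps):
        total += integrand(lo + i * h)
    return total * h


if __name__ == "__main__":
    # With w = 1 the score reduces to the ordinary CRPS of the ensemble.
    print(tw_crps([0.0, 0.5, 2.0], 1.0, lambda x: 1.0, -5.0, 5.0))
```

With a constant weight the score reduces to the ordinary CRPS; a weight that vanishes below a threshold ignores forecast errors outside the upper range.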

Abstract: Large, non-stationary spatio-temporal data are ubiquitous in modern statistical applications, and the modeling of spatio-temporal extremes is crucial for assessing risks in environmental sciences among others. While the modeling of extremes is challenging in itself, the prediction of rare events at unobserved spatial locations and time points is even more difficult. In this Editorial, we describe the data competition that was organized for the 11th international conference on Extreme-Value Analysis (EVA 2019), for which several teams modeled and predicted Red Sea surface temperature extremes over space and time. After introducing the dataset and the goal of the competition, we disclose the final ranking of the teams, and we finally discuss some interesting outcomes and future challenges. PubDate: 2021-03-01

Abstract: We consider a critical branching process in an i.i.d. random environment, in which one immigrant arrives at each generation. We are interested in the event \(\mathcal {A}_{i}(n)\) that all individuals alive at time n are offspring of the immigrant which joined the population at time i. We study the asymptotic probability of this extreme event when n is large and i follows different asymptotics which may be related to n (i fixed, close to n, or going to infinity but far from n). In order to do so, we establish some limit theorems for random walks conditioned to stay positive or nonnegative, which are of independent interest. PubDate: 2021-02-26

Abstract: A Markov tree is a probabilistic graphical model for a random vector indexed by the nodes of an undirected tree encoding conditional independence relations between variables. One possible limit distribution of partial maxima of samples from such a Markov tree is a max-stable Hüsler–Reiss distribution whose parameter matrix inherits its structure from the tree, each edge contributing one free dependence parameter. Our central assumption is that, upon marginal standardization, the data-generating distribution is in the max-domain of attraction of the said Hüsler–Reiss distribution, an assumption much weaker than the one that data are generated according to a graphical model. Even if some of the variables are unobservable (latent), we show that the underlying model parameters are still identifiable if and only if every node corresponding to a latent variable has degree at least three. Three estimation procedures, based on the method of moments, maximum composite likelihood, and pairwise extremal coefficients, are proposed for use on multivariate peaks over thresholds data when some variables are latent. A typical application is a river network in the form of a tree where, at some locations, no data are available. We illustrate the model and the identifiability criterion on a data set of high water levels on the Seine, France, with two latent variables. The structured Hüsler–Reiss distribution is found to fit the observed extremal dependence patterns well. Since the parameters are identifiable, we are able to quantify tail dependence between locations for which there are no data. PubDate: 2021-02-23
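
The identifiability criterion stated in this abstract (every latent node has degree at least three) is easy to check for a given tree. The following is a minimal sketch, assuming the tree is supplied as a list of undirected edges; the function name and the toy node labels are hypothetical and not taken from the paper.

```python
from collections import Counter


def latent_nodes_identifiable(edges, latent):
    """Return True if every latent node has degree at least three in the tree.

    edges  : iterable of (u, v) pairs describing the undirected tree
    latent : iterable of node labels whose variables are unobserved
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return all(degree[node] >= 3 for node in latent)


if __name__ == "__main__":
    # Toy river junction: three observed branches meet at an unobserved node "j".
    junction = [("a", "j"), ("b", "j"), ("c", "j")]
    print(latent_nodes_identifiable(junction, {"j"}))  # latent node has degree 3

    # Toy path a - j - b: the unobserved middle node has only two neighbours.
    path = [("a", "j"), ("j", "b")]
    print(latent_nodes_identifiable(path, {"j"}))
```

The check only covers the degree condition; it does not verify that the edge list actually forms a tree.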

Abstract: We study the behaviour of large values of extremal processes at small times, obtaining an analogue of the Fisher–Tippett–Gnedenko Theorem. Thus, necessary and sufficient conditions for local convergence of such maxima, linearly normalised, to the Fréchet or Gumbel distributions, are established. Weibull distributions are not possible limits in this situation. Moreover, assuming second order regular variation, we prove local asymptotic normality for intermediate order statistics, and derive explicit formulae for the normalising constants for tempered stable processes. We adapt Hill’s estimator of the tail index to the small time setting and establish its asymptotic normality under second order regular variation conditions, illustrating this with simulations. Applications to the fine structure of asset returns processes, possibly with infinite variation, are indicated. PubDate: 2021-02-17

Abstract: We show how fat tails in agricultural commodity returns arise endogenously from productivity shocks in a standard macroeconomic model. Using nearly ninety years of data, we show that the eight agricultural commodities in our sample exhibit fat-tailed return distributions. Statistical tests confirm the heavy-tailedness of price spikes for agricultural commodities. We apply extreme value theory to estimate the size and likelihood of price spikes in agricultural commodities. Back-testing verifies the validity of our risk assessment methodology. PubDate: 2021-02-15

Abstract: In this paper, we derive explicit formulas for the first-passage probabilities of the process S(t) = W(t) − W(t + 1), where W(t) is the Brownian motion, for linear and piece-wise linear barriers on arbitrary intervals [0,T]. Previously, explicit formulas for the first-passage probabilities of this process were known only for the cases of a constant barrier or T ≤ 1. These first-passage probability results are used to derive explicit formulas for the power of a familiar test for change-point detection in the Wiener process. PubDate: 2021-02-12

Abstract: This work continues the research started by Rudzkis (Soviet Math Dokl, 45(1), 226–228, 1992), Rudzkis and Bakshaev (Lithuanian Mathematical Journal, 52(2), 196–213, 2012) and extends it to the case of random fields close to Gaussian ones. Let \(\{\xi(t), t \in \mathbb{R}^{m}\}\) be a differentiable (in the mean square sense) random field with \(\mathbb{E}\xi(t) \equiv 0\), \(\mathbb{D}\xi(t) \equiv 1\) and continuous trajectories. The paper is devoted to the problem of large excursions of the random field \(\xi\). Let \(T\) be an m-dimensional interval and \(u(t)\) be a continuously differentiable function. We investigate the asymptotic properties of the probability \(P = \mathbb{P}\{\xi(t) < u(t), t \in T\}\) as \(\inf_{t \in \mathbb{R}^{m}} u(t) \rightarrow \infty\) and the mixed cumulants of the random field \(\xi\) and its partial derivatives tend to zero, i.e. the scheme of series is considered. It is shown that if the random field \(\xi\) satisfies certain smoothness and regularity conditions, then \(\frac{1-P}{1-G} = 1 + o(1)\), where \(G\) is a constructive functional depending on \(u\), \(T\) and a matrix function \(R(t) = \operatorname{cov}(\xi^{\prime}(t), \xi^{\prime}(t))\), \(\xi^{\prime}(t) = \left(\frac{\partial \xi(t)}{\partial t_{1}}, \ldots, \frac{\partial \xi(t)}{\partial t_{m}}\right)\). PubDate: 2021-02-11

Abstract: In this paper, we consider the distribution of the supremum of non-stationary Gaussian processes, and present a new theoretical result on the asymptotic behaviour of this distribution. We focus on the case when the processes have a finite number of points attaining their maximal variance, but, unlike previously known facts in this field, our main theorem yields the asymptotic representation of the corresponding distribution function with exponentially decaying remainder term. This result can be efficiently used for studying the projection density estimates, based, for instance, on Legendre polynomials. More precisely, we construct the sequence of accompanying laws, which approximates the distribution of maximal deviation of the considered estimates with polynomial rate. Moreover, we construct the confidence bands for densities, which are honest at polynomial rate to a broad class of densities. PubDate: 2021-02-10

Abstract: This paper describes the estimation of extreme spatio-temporal sea surface temperature data based on the quantile factor model implemented by the SNU multiscale team. The proposed method was developed for the EVA2019 Data Challenge. Various attempts have been made to use factor models in spatio-temporal data analysis to find hidden factors in high-dimensional data. Factor models represent high-dimensional data as a linear combination of several factors, and hence, can describe spatially and temporally correlated data in a simple form. Meanwhile, unlike ordinary factor models, there are asymmetric norm-based factor models, such as quantile factor models or expectile dynamic semiparametric factor models, that can help understand the quantitative behavior of data beyond their mean structure. For this purpose, we apply a quantile factor model to the data to obtain significant factors explaining the quantile response of the temperatures and find quantile estimates. We develop a new method for inference of quantiles of extremal levels by extrapolating quantile estimates from the factor model with extreme value theory. The proposed method provides better performance than the benchmark, gives some interpretable insights, and shows the potential to expand the factor model with various data. PubDate: 2021-02-08

Abstract: It has long been known that the distribution of a sum \(S_{n}\) of independent non-negative integer-valued random variables can often be approximated by a Poisson law: \(S_{n} \approx \pi_{\lambda}\). The problem of evaluating the accuracy of such approximation has attracted a lot of attention in the past six decades. From a practical point of view, the problem has important applications in insurance, reliability theory, extreme value theory, etc.; from a theoretical point of view, it provides insights into Kolmogorov’s problem. Among popular metrics considered in the literature is the Gini–Kantorovich distance \(d_{G}\). The task of establishing an estimate of \(d_{G}(S_{n}; \pi_{\lambda})\) with correct (the best possible) constant at the leading term remained open for a long while. The paper presents a solution to that problem. A first-order asymptotic expansion is established as well. We show that the accuracy of approximation can be considerably better if the random variables obey an extra condition involving the first two moments. A sharp estimate of the accuracy of shifted (translated) Poisson approximation is established as well. PubDate: 2021-01-08 DOI: 10.1007/s10687-020-00392-1

Abstract: Recently in Gao and Stoev (2020) it was established that the concentration of maxima phenomenon is the key to solving the exact sparse support recovery problem in high dimensions. This phenomenon, known also as relative stability, has been little studied in the context of dependence. Here, we obtain bounds on the rate of concentration of maxima in Gaussian triangular arrays. These results are used to establish sufficient conditions for the uniform relative stability of functions of Gaussian arrays, leading to new models that exhibit phase transitions in the exact support recovery problem. Finally, the optimal rate of concentration for Gaussian arrays is studied under general assumptions implied by the classic condition of Berman (1964). PubDate: 2020-11-19 DOI: 10.1007/s10687-020-00399-8

Abstract: This paper presents our winning entry for the EVA 2019 data competition, the aim of which is to predict Red Sea surface temperature extremes over space and time. To achieve this, we used a stochastic partial differential equation (Poisson equation) based method, improved through a regularization to penalize large magnitudes of solutions. This approach is shown to be successful according to the competition’s evaluation criterion, i.e. a threshold-weighted continuous ranked probability score. Our stochastic Poisson equation and its boundary conditions resolve the data’s non-stationarity naturally and effectively. Meanwhile, our numerical method is computationally efficient at dealing with the data’s high dimensionality, without any parameter estimation. It demonstrates the usefulness of stochastic differential equations on spatio-temporal predictions, including the extremes of the process. PubDate: 2020-11-17 DOI: 10.1007/s10687-020-00397-w

Abstract: Recently, the notion of implicit extreme value distributions has been established, which is based on a given loss function f ≥ 0. From an application point of view, one is rather interested in extreme loss events that occur relative to f than in the corresponding extreme values themselves. In this context, so-called f-implicit α-Fréchet max-stable distributions arise and have been used to construct independently scattered sup-measures that possess such margins. In this paper we solve an open problem in Goldbach (2016) by developing a stochastic integral of a deterministic function g ≥ 0 with respect to implicit max-stable sup-measures. The resulting theory covers the construction of max-stable extremal integrals (see Stoev and Taqqu Extremes 8, 237–266 (2005)) and, at the same time, reveals striking parallels. PubDate: 2020-10-22 DOI: 10.1007/s10687-020-00388-x

Abstract: We describe our submission to the Extreme Value Analysis 2019 Data Challenge in which teams were asked to predict extremes of sea surface temperature anomaly within spatio-temporal regions of missing data. We present a computational framework which reconstructs missing data using convolutional deep neural networks. Conditioned on incomplete data, we employ autoencoder-like models as multivariate conditional distributions from which possible reconstructions of the complete dataset are sampled using imputed noise. In order to mitigate bias introduced by any one particular model, a prediction ensemble is constructed to create the final distribution of extremal values. Our method does not rely on expert knowledge in order to accurately reproduce dynamic features of a complex oceanographic system with minimal assumptions. The obtained results promise reusability and generalization to other domains. PubDate: 2020-10-21 DOI: 10.1007/s10687-020-00396-x

Abstract: In this paper, we investigate temporal clusters of extremes defined as subsequent exceedances of high thresholds in a stationary time series. Two meaningful features of these clusters are the probability distribution of the cluster size and the ordinal patterns giving the relative positions of the data points within a cluster. Since these patterns take only the ordinal structure of consecutive data points into account, the method is robust under monotone transformations and measurement errors. We verify the existence of the corresponding limit distributions in the framework of regularly varying time series, develop non-parametric estimators and show their asymptotic normality under appropriate mixing conditions. The performance of the estimators is demonstrated in a simulated example and a real data application to discharge data of the river Rhine. PubDate: 2020-08-24 DOI: 10.1007/s10687-020-00391-2
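
The two cluster features described in this abstract can be illustrated on a toy series. The following is a minimal sketch, assuming a cluster is a maximal run of consecutive exceedances of the threshold and encoding an ordinal pattern as the tuple of within-cluster ranks (0 for the smallest value). Both conventions, the function names and the toy data are illustrative assumptions; the paper's estimators and limit theory are not reproduced here.

```python
import math


def exceedance_clusters(series, threshold):
    """Split a series into maximal runs of consecutive values above the threshold."""
    clusters, current = [], []
    for value in series:
        if value > threshold:
            current.append(value)
        elif current:
            clusters.append(current)
            current = []
    if current:
        clusters.append(current)
    return clusters


def ordinal_pattern(values):
    """Return the rank of each value within the sequence (0 = smallest)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order):
        ranks[index] = rank
    return tuple(ranks)


if __name__ == "__main__":
    series = [1, 5, 7, 2, 9, 8, 6, 1]  # toy data, not from the paper
    for cluster in exceedance_clusters(series, threshold=4):
        print(len(cluster), ordinal_pattern(cluster))
    # A strictly increasing transformation leaves the pattern unchanged.
    print(ordinal_pattern([math.exp(v) for v in [9, 8, 6]]))
```

Because only ranks enter the pattern, any strictly increasing transformation of the data yields the same result, which is the robustness property mentioned in the abstract.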

Abstract: We derive the exact asymptotics of \(\mathbb{P}\left\{\sup_{t \ge 0}\left(X_{1}(t) - \mu_{1} t\right) > u,\ \sup_{s \ge 0}\left(X_{2}(s) - \mu_{2} s\right) > u\right\}\) as \(u \to \infty\), where \((X_{1}(t), X_{2}(s))_{t, s \ge 0}\) is a correlated two-dimensional Brownian motion with correlation \(\rho \in [-1, 1]\) and \(\mu_{1}, \mu_{2} > 0\). It appears that the interplay between \(\rho\) and \(\mu_{1}, \mu_{2}\) leads to several types of asymptotics. Although the exponent in the asymptotics as a function of \(\rho\) is continuous, one can observe different types of prefactor functions depending on the range of \(\rho\), which constitutes a phase-type transition phenomenon. PubDate: 2020-08-11 DOI: 10.1007/s10687-020-00387-y

Abstract: We consider removing lower order statistics from the classical Hill estimator in extreme value statistics, and compensating for it by rescaling the remaining terms. Trajectories of these trimmed statistics as a function of the extent of trimming turn out to be quite flat near the optimal threshold value. For the regularly varying case, the classical threshold selection problem in tail estimation is then revisited, both visually via trimmed Hill plots and, for the Hall class, also mathematically via minimizing the expected empirical variance. This leads to a simple threshold selection procedure for the classical Hill estimator which circumvents the estimation of some of the tail characteristics, a problem which is usually the bottleneck in threshold selection. As a by-product, we derive an alternative estimator of the tail index, which assigns more weight to large observations, and works particularly well for relatively lighter tails. A simple ratio statistic routine is suggested to evaluate the goodness of the implied selection of the threshold. We illustrate the favourable performance and the potential of the proposed method with simulation studies and real insurance data. PubDate: 2020-07-14 DOI: 10.1007/s10687-020-00385-0
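
As background for this abstract, the classical Hill estimator from which the trimmed version starts averages the log-excesses of the k largest observations over the (k+1)-th largest one. The following is a minimal sketch of the classical estimator only; the trimming, rescaling and threshold selection procedure proposed in the paper are not reproduced, and the function name and simulation settings are illustrative assumptions.

```python
import math
import random


def hill_estimator(data, k):
    """Classical Hill estimate of the extreme value index (the reciprocal of the
    tail index), based on the k largest observations of a positive sample."""
    if not 1 <= k < len(data):
        raise ValueError("k must satisfy 1 <= k < len(data)")
    ordered = sorted(data, reverse=True)
    threshold = ordered[k]  # the (k+1)-th largest observation
    if threshold <= 0:
        raise ValueError("the threshold order statistic must be positive")
    return sum(math.log(x / threshold) for x in ordered[:k]) / k


if __name__ == "__main__":
    random.seed(0)
    # Simulated Pareto sample with tail index 2, so the target value is 0.5.
    sample = [random.paretovariate(2.0) for _ in range(20000)]
    print(hill_estimator(sample, 1000))
```

The estimate depends on the choice of k, which is the threshold selection problem the abstract addresses.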