Abstract: Structural reliability analysis is concerned with estimation of the probability of a critical event taking place, described by \(P(g(\mathbf{X} ) \le 0)\) for some n-dimensional random variable \(\mathbf{X} \) and some real-valued function g. In many applications the function g is practically unknown, as function evaluation involves time consuming numerical simulation or some other form of experiment that is expensive to perform. The problem we address in this paper is how to optimally design experiments, in a Bayesian decision theoretic fashion, when the goal is to estimate the probability \(P(g(\mathbf{X} ) \le 0)\) using a minimal amount of resources. As opposed to existing methods that have been proposed for this purpose, we consider a general structural reliability model given in hierarchical form. We therefore introduce a general formulation of the experimental design problem, where we distinguish between the uncertainty related to the random variable \(\mathbf{X} \) and any additional epistemic uncertainty that we want to reduce through experimentation. The effectiveness of a design strategy is evaluated through a measure of residual uncertainty, and efficient approximation of this quantity is crucial if we want to apply algorithms that search for an optimal strategy. The method we propose is based on importance sampling combined with the unscented transform for epistemic uncertainty propagation. We implement this for the myopic (one-step look ahead) alternative, and demonstrate the effectiveness through a series of numerical experiments. PubDate: 2021-03-09

Abstract: The renewal Hawkes process is a nascent point process model that generalizes the Hawkes process. Although it has shown strong application potential, fitting the renewal Hawkes process to data remains a challenging task, especially on larger datasets. This article tackles this challenge by providing two approaches that significantly reduce the time required to fit renewal Hawkes processes. Since derivative-based methods for optimization, in general, converge faster than derivative-free methods, our first approach is to derive algorithms for evaluating the gradient and Hessian of the log-likelihood function and then use a derivative-based method, such as the Newton–Raphson method, in maximizing the likelihood, instead of the derivative-free method currently being used. Our second approach is to seek linear time algorithms that produce accurate approximations to the likelihood function, and then directly optimize the approximation to the log-likelihood function. Our simulation experiments show that the Newton–Raphson method reduces the computational time by about 30%. Furthermore, the approximate likelihood methods produce equally accurate estimates compared to the methods based on the exact likelihood and are about 20–40 times faster on datasets with about 10,000 events. We conclude with an analysis of price changes of several currencies relative to the US Dollar. PubDate: 2021-03-04

Abstract: We consider the problem of estimating a parameter \(\theta \in \Theta \subseteq {\mathbb {R}}^{d_{\theta }}\) associated with a Bayesian inverse problem. Typically one must resort to a numerical approximation of gradient of the log-likelihood and also adopt a discretization of the problem in space and/or time. We develop a new methodology to unbiasedly estimate the gradient of the log-likelihood with respect to the unknown parameter, i.e. the expectation of the estimate has no discretization bias. Such a property is not only useful for estimation in terms of the original stochastic model of interest, but can be used in stochastic gradient algorithms which benefit from unbiased estimates. Under appropriate assumptions, we prove that our estimator is not only unbiased but of finite variance. In addition, when implemented on a single processor, we show that the cost to achieve a given level of error is comparable to multilevel Monte Carlo methods, both practically and theoretically. However, the new algorithm is highly amenable to parallel computation. PubDate: 2021-03-03

Abstract: Conditional particle filters (CPFs) are powerful smoothing algorithms for general nonlinear/non-Gaussian hidden Markov models. However, CPFs can be inefficient or difficult to apply with diffuse initial distributions, which are common in statistical applications. We propose a simple but generally applicable auxiliary variable method, which can be used together with the CPF in order to perform efficient inference with diffuse initial distributions. The method only requires simulatable Markov transitions that are reversible with respect to the initial distribution, which can be improper. We focus in particular on random walk type transitions which are reversible with respect to a uniform initial distribution (on some domain), and autoregressive kernels for Gaussian initial distributions. We propose to use online adaptations within the methods. In the case of random walk transition, our adaptations use the estimated covariance and acceptance rate adaptation, and we detail their theoretical validity. We tested our methods with a linear Gaussian random walk model, a stochastic volatility model, and a stochastic epidemic compartment model with time-varying transmission rate. The experimental findings demonstrate that our method works reliably with little user specification and can be substantially better mixing than a direct particle Gibbs algorithm that treats initial states as parameters. PubDate: 2021-03-03

Abstract: Mixtures of unigrams are one of the simplest and most efficient tools for clustering textual data, as they assume that documents related to the same topic have similar distributions of terms, naturally described by multinomials. When the classification task is particularly challenging, such as when the document-term matrix is high-dimensional and extremely sparse, a more composite representation can provide better insight into the grouping structure. In this work, we developed a deep version of mixtures of unigrams for the unsupervised classification of very short documents with a large number of terms, by allowing for models with further deeper latent layers; the proposal is derived in a Bayesian framework. The behavior of the deep mixtures of unigrams is empirically compared with that of other traditional and state-of-the-art methods, namely k-means with cosine distance, k-means with Euclidean distance on data transformed according to semantic analysis, partition around medoids, mixture of Gaussians on semantic-based transformed data, hierarchical clustering according to Ward’s method with cosine dissimilarity, latent Dirichlet allocation, mixtures of unigrams estimated via the EM algorithm, spectral clustering and affinity propagation clustering. The performance is evaluated in terms of both correct classification rate and Adjusted Rand Index. Simulation studies and real data analysis prove that going deep in clustering such data highly improves the classification accuracy. PubDate: 2021-03-03

Abstract: Bayesian additive regression trees (BART) is a tree-based machine learning method that has been successfully applied to regression and classification problems. BART assumes regularisation priors on a set of trees that work as weak learners and is very flexible for predicting in the presence of nonlinearity and high-order interactions. In this paper, we introduce an extension of BART, called model trees BART (MOTR-BART), that considers piecewise linear functions at node levels instead of piecewise constants. In MOTR-BART, rather than having a unique value at node level for the prediction, a linear predictor is estimated considering the covariates that have been used as the split variables in the corresponding tree. In our approach, local linearities are captured more efficiently and fewer trees are required to achieve equal or better performance than BART. Via simulation studies and real data applications, we compare MOTR-BART to its main competitors. R code for MOTR-BART implementation is available at https://github.com/ebprado/MOTR-BART. PubDate: 2021-03-03

Abstract: There is a growing interest in probabilistic numerical solutions to ordinary differential equations. In this paper, the maximum a posteriori estimate is studied under the class of \(\nu \) times differentiable linear time-invariant Gauss–Markov priors, which can be computed with an iterated extended Kalman smoother. The maximum a posteriori estimate corresponds to an optimal interpolant in the reproducing kernel Hilbert space associated with the prior, which in the present case is equivalent to a Sobolev space of smoothness \(\nu +1\) . Subject to mild conditions on the vector field, convergence rates of the maximum a posteriori estimate are then obtained via methods from nonlinear analysis and scattered data approximation. These results closely resemble classical convergence results in the sense that a \(\nu \) times differentiable prior process obtains a global order of \(\nu \) , which is demonstrated in numerical examples. PubDate: 2021-03-03

Abstract: We derive a single-pass algorithm for computing the gradient and Fisher information of Vecchia’s Gaussian process loglikelihood approximation, which provides a computationally efficient means for applying the Fisher scoring algorithm for maximizing the loglikelihood. The advantages of the optimization techniques are demonstrated in numerical examples and in an application to Argo ocean temperature data. The new methods find the maximum likelihood estimates much faster and more reliably than an optimization method that uses only function evaluations, especially when the covariance function has many parameters. This allows practitioners to fit nonstationary models to large spatial and spatial–temporal datasets. PubDate: 2021-03-03

Abstract: We consider the problem of assigning weights to a set of samples or data records, with the goal of achieving a representative weighting, which happens when certain sample averages of the data are close to prescribed values. We frame the problem of finding representative sample weights as an optimization problem, which in many cases is convex and can be efficiently solved. Our formulation includes as a special case the selection of a fixed number of the samples, with equal weights, i.e., the problem of selecting a smaller representative subset of the samples. While this problem is combinatorial and not convex, heuristic methods based on convex optimization seem to perform very well. We describe our open-source implementation rsw and apply it to a skewed sample of the CDC BRFSS dataset. PubDate: 2021-02-28

Abstract: Many multivariate time series observed in practice are second order nonstationary, i.e. their covariance properties vary over time. In addition, missing observations in such data are encountered in many applications of interest, due to recording failures or sensor dropout, hindering successful analysis. This article introduces a novel method for data imputation in multivariate nonstationary time series, based on the so-called locally stationary wavelet modelling paradigm. Our methodology is shown to perform well across a range of simulation scenarios, with a variety of missingness structures, as well as being competitive in the stationary time series setting. We also demonstrate our technique on data arising in a health monitoring application. PubDate: 2021-02-17

Abstract: Adaptive importance sampling is a class of techniques for finding good proposal distributions for importance sampling. Often the proposal distributions are standard probability distributions whose parameters are adapted based on the mismatch between the current proposal and a target distribution. In this work, we present an implicit adaptive importance sampling method that applies to complicated distributions which are not available in closed form. The method iteratively matches the moments of a set of Monte Carlo draws to weighted moments based on importance weights. We apply the method to Bayesian leave-one-out cross-validation and show that it performs better than many existing parametric adaptive importance sampling methods while being computationally inexpensive. PubDate: 2021-02-09

Abstract: In binary and ordinal regression one can distinguish between a location component and a scaling component. While the former determines the location within the range of the response categories, the scaling indicates variance heterogeneity. In particular since it has been demonstrated that misleading effects can occur if one ignores the presence of a scaling component, it is important to account for potential scaling effects in the regression model, which is not possible in available recursive partitioning methods. The proposed recursive partitioning method yields two trees: one for the location and one for the scaling. They show in a simple interpretable way how variables interact to determine the binary or ordinal response. The developed algorithm controls for the global significance level and automatically selects the variables that have an impact on the response. The modeling approach is illustrated by several real-world applications. PubDate: 2021-02-09

Abstract: To overcome the inherent limitations of axis-aligned base learners in ensemble learning, several methods of rotating the feature space have been discussed in the literature. In particular, smoother decision boundaries can often be obtained from axis-aligned ensembles by rotating the feature space. In the present paper, we introduce a low-cost regularization technique that favors rotations which produce compact base learners. The restated problem adds a shrinkage term to the loss function that explicitly accounts for the complexity of the base learners. For example, for tree-based ensembles, we apply a penalty based on the median number of nodes and the median depth of the trees in the forest. Rather than jointly minimizing prediction error and model complexity, which is computationally infeasible, we first generate a prioritized weighting of the available feature rotations that promotes lower model complexity and subsequently minimize prediction errors on each of the selected rotations. We show that the resulting ensembles tend to be significantly more dense, faster to evaluate, and competitive at generalizing in out-of-sample predictions. PubDate: 2021-01-27

Abstract: Historical functional linear models (HFLMs) quantify associations between a functional predictor and functional outcome where the predictor is an exposure variable that occurs before, or at least concurrently with, the outcome. Prior work on the HFLM has largely focused on estimation of a surface that represents a time-varying association between the functional outcome and the functional exposure. This existing work has employed frequentist and spline-based estimation methods, with little attention paid to formal inference or adjustment for multiple testing and no approaches that implement wavelet bases. In this work, we propose a new functional regression model that estimates the time-varying, lagged association between a functional outcome and a functional exposure. Building off of recently developed function-on-function regression methods, the model employs a novel use the wavelet-packet decomposition of the exposure and outcome functions that allows us to strictly enforce the temporal ordering of exposure and outcome, which is not possible with existing wavelet-based functional models. Using a fully Bayesian approach, we conduct formal inference on the time-varying lagged association, while adjusting for multiple testing. We investigate the operating characteristics of our wavelet-packet HFLM and compare them to those of two existing estimation procedures in simulation. We also assess several inference techniques and use the model to analyze data on the impact of lagged exposure to particulate matter finer than 2.5 \(\upmu \) g, or PM \(_{2.5}\) , on heart rate variability in a cohort of journeyman boilermakers during the morning of a typical day’s shift. PubDate: 2021-01-27

Abstract: Adaptive importance samplers are adaptive Monte Carlo algorithms to estimate expectations with respect to some target distribution which adapt themselves to obtain better estimators over a sequence of iterations. Although it is straightforward to show that they have the same \(\mathcal {O}(1/\sqrt{N})\) convergence rate as standard importance samplers, where N is the number of Monte Carlo samples, the behaviour of adaptive importance samplers over the number of iterations has been left relatively unexplored. In this work, we investigate an adaptation strategy based on convex optimisation which leads to a class of adaptive importance samplers termed optimised adaptive importance samplers (OAIS). These samplers rely on the iterative minimisation of the \(\chi ^2\) -divergence between an exponential family proposal and the target. The analysed algorithms are closely related to the class of adaptive importance samplers which minimise the variance of the weight function. We first prove non-asymptotic error bounds for the mean squared errors (MSEs) of these algorithms, which explicitly depend on the number of iterations and the number of samples together. The non-asymptotic bounds derived in this paper imply that when the target belongs to the exponential family, the \(L_2\) errors of the optimised samplers converge to the optimal rate of \(\mathcal {O}(1/\sqrt{N})\) and the rate of convergence in the number of iterations are explicitly provided. When the target does not belong to the exponential family, the rate of convergence is the same but the asymptotic \(L_2\) error increases by a factor \(\sqrt{\rho ^\star } > 1\) , where \(\rho ^\star - 1\) is the minimum \(\chi ^2\) -divergence between the target and an exponential family proposal. PubDate: 2021-01-21

Abstract: Bayesian nonparametric density estimation is dominated by single-scale methods, typically exploiting mixture model specifications, exception made for Pólya trees prior and allied approaches. In this paper we focus on developing a novel family of multiscale stick-breaking mixture models that inherits some of the advantages of both single-scale nonparametric mixtures and Pólya trees. Our proposal is based on a mixture specification exploiting an infinitely deep binary tree of random weights that grows according to a multiscale generalization of a large class of stick-breaking processes; this multiscale stick-breaking is paired with specific stochastic processes generating sequences of parameters that induce stochastically ordered kernel functions. Properties of this family of multiscale stick-breaking mixtures are described. Focusing on a Gaussian specification, a Markov Chain Monte Carlo algorithm for posterior computation is introduced. The performance of the method is illustrated analyzing both synthetic and real datasets consistently showing competitive results both in scenarios favoring single-scale and multiscale methods. The results suggest that the method is well suited to estimate densities with varying degree of smoothness and local features. PubDate: 2021-01-21

Abstract: We develop an Evolutionary Markov Chain Monte Carlo (EMCMC) algorithm for sampling spatial partitions that lie within a large, complex, and constrained spatial state space. Our algorithm combines the advantages of evolutionary algorithms (EAs) as optimization heuristics for state space traversal and the theoretical convergence properties of Markov Chain Monte Carlo algorithms for sampling from unknown distributions. Local optimality information that is identified via a directed search by our optimization heuristic is used to adaptively update a Markov chain in a promising direction within the framework of a Multiple-Try Metropolis Markov Chain model that incorporates a generalized Metropolis-Hastings ratio. We further expand the reach of our EMCMC algorithm by harnessing the computational power afforded by massively parallel computing architecture through the integration of a parallel EA framework that guides Markov chains running in parallel. PubDate: 2021-01-12 DOI: 10.1007/s11222-020-09977-z

Abstract: We present a preconditioned Monte Carlo method for computing high-dimensional multivariate normal and Student-t probabilities arising in spatial statistics. The approach combines a tile-low-rank representation of covariance matrices with a block-reordering scheme for efficient quasi-Monte Carlo simulation. The tile-low-rank representation decomposes the high-dimensional problem into many diagonal-block-size problems and low-rank connections. The block-reordering scheme reorders between and within the diagonal blocks to reduce the impact of integration variables from right to left, thus improving the Monte Carlo convergence rate. Simulations up to dimension 65,536 suggest that the new method can improve the run time by an order of magnitude compared with the hierarchical quasi-Monte Carlo method and two orders of magnitude compared with the dense quasi-Monte Carlo method. Our method also forms a strong substitute for the approximate conditioning methods as a more robust estimation with error guarantees. An application study to wind stochastic generators is provided to illustrate that the new computational method makes the maximum likelihood estimation feasible for high-dimensional skew-normal random fields. PubDate: 2021-01-12 DOI: 10.1007/s11222-020-09978-y

Abstract: The Hawkes process and its extensions effectively model self-excitatory phenomena including earthquakes, viral pandemics, financial transactions, neural spike trains and the spread of memes through social networks. The usefulness of these stochastic process models within a host of economic sectors and scientific disciplines is undercut by the processes’ computational burden: complexity of likelihood evaluations grows quadratically in the number of observations for both the temporal and spatiotemporal Hawkes processes. We show that, with care, one may parallelize these calculations using both central and graphics processing unit implementations to achieve over 100-fold speedups over single-core processing. Using a simple adaptive Metropolis–Hastings scheme, we apply our high-performance computing framework to a Bayesian analysis of big gunshot data generated in Washington D.C. between the years of 2006 and 2019, thereby extending a past analysis of the same data from under 10,000 to over 85,000 observations. To encourage widespread use, we provide hpHawkes, an open-source R package, and discuss high-level implementation and program design for leveraging aspects of computational hardware that become necessary in a big data setting. PubDate: 2021-01-12 DOI: 10.1007/s11222-020-09980-4

Abstract: Modal clustering has a clear population goal, where density estimation plays a critical role. In this paper, we study how to provide better density estimation so as to serve the objective of modal clustering. In particular, we use semiparametric mixtures for density estimation, aided with a novel mode-flattening technique. The use of semiparametric mixtures helps to produce better density estimates, especially in the multivariate situation, and the mode-flattening technique is intended to identify and smooth out spurious and minor modes. With mode flattening, the number of clusters can be sequentially reduced until there is only one mode left. In addition, we adopt the likelihood function in a coherent manner to measure the relative importance of a mode and let the current least important mode disappear in each step. For both simulated and real-world data sets, the proposed method performs very well, as compared with some well-known clustering methods in the literature, and can successfully solve some fairly difficult clustering problems. PubDate: 2021-01-12 DOI: 10.1007/s11222-020-09985-z