Abstract: Adaptive importance samplers are adaptive Monte Carlo algorithms for estimating expectations with respect to some target distribution; they adapt themselves to obtain better estimators over a sequence of iterations. Although it is straightforward to show that they have the same \(\mathcal{O}(1/\sqrt{N})\) convergence rate as standard importance samplers, where N is the number of Monte Carlo samples, the behaviour of adaptive importance samplers over the number of iterations has been left relatively unexplored. In this work, we investigate an adaptation strategy based on convex optimisation which leads to a class of adaptive importance samplers termed optimised adaptive importance samplers (OAIS). These samplers rely on the iterative minimisation of the \(\chi^2\)-divergence between an exponential family proposal and the target. The analysed algorithms are closely related to the class of adaptive importance samplers which minimise the variance of the weight function. We first prove non-asymptotic error bounds for the mean squared errors (MSEs) of these algorithms, which explicitly depend on the number of iterations and the number of samples together. The non-asymptotic bounds derived in this paper imply that, when the target belongs to the exponential family, the \(L_2\) errors of the optimised samplers converge to the optimal rate of \(\mathcal{O}(1/\sqrt{N})\), and the rate of convergence in the number of iterations is provided explicitly. When the target does not belong to the exponential family, the rate of convergence is the same but the asymptotic \(L_2\) error increases by a factor \(\sqrt{\rho^\star} > 1\), where \(\rho^\star - 1\) is the minimum \(\chi^2\)-divergence between the target and an exponential family proposal. PubDate: 2021-01-21
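A minimal sketch of the adaptation idea, not the paper's algorithm: the exponential-family proposal is replaced by a Gaussian location family, the \(\chi^2\)-divergence (up to a constant) is estimated by a Monte Carlo proxy for \(\rho(\theta) = \mathbb{E}_{q_\theta}[w(X)^2]\), and a grid search stands in for the convex optimisation. The N(1, 1) target and N(theta, 4) proposal are toy choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Common random numbers: one fixed noise stream shared across candidate
# parameters, so the divergence estimate is a smooth function of theta.
z = rng.standard_normal(20_000)

def chi2_proxy(theta):
    # Monte Carlo estimate of rho(theta) = E_q[w(X)^2] for a N(1, 1) target
    # and a N(theta, 2^2) proposal; rho - 1 is the chi^2-divergence.
    x = theta + 2.0 * z
    logw = -0.5 * (x - 1.0) ** 2 + 0.5 * ((x - theta) / 2.0) ** 2 + np.log(2.0)
    return float(np.mean(np.exp(2.0 * logw)))

# Adapt the proposal by minimising the divergence proxy over a grid of means;
# the minimiser should sit near the target mean, theta = 1.
grid = np.linspace(-3.0, 3.0, 61)
theta_star = float(grid[np.argmin([chi2_proxy(t) for t in grid])])
```

Because \(\rho(\theta) \ge 1\) always, with equality only when proposal and target coincide, `chi2_proxy` stays above 1 even at the minimiser here, mirroring the \(\sqrt{\rho^\star} > 1\) factor discussed in the abstract.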

Abstract: Bayesian nonparametric density estimation is dominated by single-scale methods, typically exploiting mixture model specifications, with the exception of Pólya tree priors and allied approaches. In this paper we focus on developing a novel family of multiscale stick-breaking mixture models that inherits some of the advantages of both single-scale nonparametric mixtures and Pólya trees. Our proposal is based on a mixture specification exploiting an infinitely deep binary tree of random weights that grows according to a multiscale generalization of a large class of stick-breaking processes; this multiscale stick-breaking is paired with specific stochastic processes generating sequences of parameters that induce stochastically ordered kernel functions. Properties of this family of multiscale stick-breaking mixtures are described. Focusing on a Gaussian specification, a Markov chain Monte Carlo algorithm for posterior computation is introduced. The performance of the method is illustrated by analyzing both synthetic and real datasets, consistently showing competitive results in scenarios favoring both single-scale and multiscale methods. The results suggest that the method is well suited to estimate densities with varying degrees of smoothness and local features. PubDate: 2021-01-21
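For reference, the single-scale building block being generalized here can be sketched in a few lines; the multiscale model of the paper attaches weights of this kind to the nodes of an infinite binary tree rather than to a single sequence.

```python
import numpy as np

rng = np.random.default_rng(1)

def stick_breaking_weights(alpha, k):
    # Truncated single-scale stick-breaking: v_j ~ Beta(1, alpha) and
    # w_j = v_j * prod_{l < j} (1 - v_l), the Dirichlet-process special
    # case of the general class the paper builds on.
    v = rng.beta(1.0, alpha, k)
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return w

w = stick_breaking_weights(2.0, 60)
```

The weights are nonnegative and sum to just under one, the residual mass being the unbroken remainder of the stick, which vanishes geometrically in the truncation level k.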

Abstract: We develop an Evolutionary Markov Chain Monte Carlo (EMCMC) algorithm for sampling spatial partitions that lie within a large, complex, and constrained spatial state space. Our algorithm combines the advantages of evolutionary algorithms (EAs) as optimization heuristics for state space traversal and the theoretical convergence properties of Markov Chain Monte Carlo algorithms for sampling from unknown distributions. Local optimality information that is identified via a directed search by our optimization heuristic is used to adaptively update a Markov chain in a promising direction within the framework of a Multiple-Try Metropolis Markov Chain model that incorporates a generalized Metropolis-Hastings ratio. We further expand the reach of our EMCMC algorithm by harnessing the computational power afforded by massively parallel computing architecture through the integration of a parallel EA framework that guides Markov chains running in parallel. PubDate: 2021-01-12

Abstract: We present a preconditioned Monte Carlo method for computing high-dimensional multivariate normal and Student-t probabilities arising in spatial statistics. The approach combines a tile-low-rank representation of covariance matrices with a block-reordering scheme for efficient quasi-Monte Carlo simulation. The tile-low-rank representation decomposes the high-dimensional problem into many diagonal-block-size problems and low-rank connections. The block-reordering scheme reorders between and within the diagonal blocks to reduce the impact of integration variables from right to left, thus improving the Monte Carlo convergence rate. Simulations up to dimension 65,536 suggest that the new method can improve the run time by an order of magnitude compared with the hierarchical quasi-Monte Carlo method and two orders of magnitude compared with the dense quasi-Monte Carlo method. Our method also provides a strong substitute for approximate conditioning methods, offering more robust estimation with error guarantees. An application study to wind stochastic generators is provided to illustrate that the new computational method makes the maximum likelihood estimation feasible for high-dimensional skew-normal random fields. PubDate: 2021-01-12
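The quantity being computed can be illustrated with a plain Monte Carlo baseline in two dimensions; this is exactly the kind of computation, \(P(X \le b)\) for \(X \sim N(0, \Sigma)\), that the tile-low-rank quasi-Monte Carlo method accelerates in very high dimensions. The covariance and threshold below are toy values.

```python
import numpy as np

rng = np.random.default_rng(2)

# Plain Monte Carlo estimate of a bivariate normal orthant probability
# P(X1 <= 0, X2 <= 0) with correlation 0.5; Cholesky factor maps i.i.d.
# standard normals to the correlated vector.
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
b = np.zeros(2)
L = np.linalg.cholesky(Sigma)
x = rng.standard_normal((200_000, 2)) @ L.T
p_hat = float(np.mean(np.all(x <= b, axis=1)))
```

For correlation \(\rho\), the orthant probability has the closed form \(1/4 + \arcsin(\rho)/(2\pi)\), which equals \(1/3\) at \(\rho = 0.5\), so the estimate can be checked exactly.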

Abstract: The Hawkes process and its extensions effectively model self-excitatory phenomena including earthquakes, viral pandemics, financial transactions, neural spike trains and the spread of memes through social networks. The usefulness of these stochastic process models within a host of economic sectors and scientific disciplines is undercut by the processes’ computational burden: complexity of likelihood evaluations grows quadratically in the number of observations for both the temporal and spatiotemporal Hawkes processes. We show that, with care, one may parallelize these calculations using both central and graphics processing unit implementations to achieve over 100-fold speedups over single-core processing. Using a simple adaptive Metropolis–Hastings scheme, we apply our high-performance computing framework to a Bayesian analysis of big gunshot data generated in Washington D.C. between the years of 2006 and 2019, thereby extending a past analysis of the same data from under 10,000 to over 85,000 observations. To encourage widespread use, we provide hpHawkes, an open-source R package, and discuss high-level implementation and program design for leveraging aspects of computational hardware that become necessary in a big data setting. PubDate: 2021-01-12
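The quadratic likelihood cost mentioned above comes from the double sum over event pairs in the intensity. As a point of contrast, for the special case of an exponential kernel (an assumption made here for illustration, not a claim about the hpHawkes implementation) a classical recursion reduces the temporal likelihood to a single pass:

```python
import numpy as np

def hawkes_loglik(times, mu, alpha, beta, T):
    # Log-likelihood of a temporal Hawkes process on [0, T] with intensity
    # lambda(t) = mu + alpha * sum_{t_i < t} exp(-beta * (t - t_i)).
    # The recursion A_i = exp(-beta * dt) * (A_{i-1} + 1) replaces the
    # naive O(n^2) double sum over event pairs with an O(n) sweep.
    times = np.asarray(times, dtype=float)
    ll, A = 0.0, 0.0
    for i, t in enumerate(times):
        if i > 0:
            A = np.exp(-beta * (t - times[i - 1])) * (A + 1.0)
        ll += np.log(mu + alpha * A)
    # Compensator: integral of the intensity over [0, T].
    ll -= mu * T + (alpha / beta) * float(np.sum(1.0 - np.exp(-beta * (T - times))))
    return ll
```

With the excitation switched off (`alpha = 0`) the expression collapses to the homogeneous Poisson log-likelihood \(n \log \mu - \mu T\), which gives a convenient correctness check.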

Abstract: Modal clustering has a clear population goal, where density estimation plays a critical role. In this paper, we study how to provide better density estimation so as to serve the objective of modal clustering. In particular, we use semiparametric mixtures for density estimation, aided with a novel mode-flattening technique. The use of semiparametric mixtures helps to produce better density estimates, especially in the multivariate situation, and the mode-flattening technique is intended to identify and smooth out spurious and minor modes. With mode flattening, the number of clusters can be sequentially reduced until there is only one mode left. In addition, we adopt the likelihood function in a coherent manner to measure the relative importance of a mode and let the current least important mode disappear in each step. For both simulated and real-world data sets, the proposed method performs very well, as compared with some well-known clustering methods in the literature, and can successfully solve some fairly difficult clustering problems. PubDate: 2021-01-12

Abstract: Ensemble Kalman inversion (EKI) has been a very popular algorithm used in Bayesian inverse problems (Iglesias et al. in Inverse Probl 29: 045001, 2013). It samples particles from a prior distribution and introduces a motion to move the particles around in pseudo-time. As the pseudo-time goes to infinity, the method finds the minimizer of the objective function, and when the pseudo-time stops at 1, the ensemble distribution of the particles resembles, in some sense, the posterior distribution in the linear setting. The ideas trace back further to the ensemble Kalman filter and the associated analysis (Evensen in J Geophys Res: Oceans 99: 10143–10162, 1994; Reich in BIT Numer Math 51: 235–249, 2011), but to this day, when viewed as a sampling method, why EKI works, and in what sense and at what rate the method converges, is still largely unknown. In this paper, we analyze the continuous version of EKI, a coupled SDE system, and prove the mean-field limit of this SDE system. In particular, we will show that 1. as the number of particles goes to infinity, the empirical measure of particles following the SDE converges to the solution to a Fokker–Planck equation in Wasserstein 2-distance with an optimal rate, for both the linear and weakly nonlinear cases; 2. the solution to the Fokker–Planck equation reconstructs the target distribution in finite time in the linear case, as suggested in Iglesias et al. (Inverse Probl 29: 045001, 2013). PubDate: 2021-01-12
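A discrete-time toy version of the particle dynamics, assuming a scalar linear forward model (a stand-in for the coupled SDE system analysed in the paper, not its continuous formulation): each particle is nudged by a Kalman-type gain built from ensemble covariances, and without observation perturbations the ensemble collapses onto the least-squares minimizer, here \(u^\star = y/G = 2\).

```python
import numpy as np

rng = np.random.default_rng(3)

# Scalar linear inverse problem y = G * u + noise, solved by iterating
# the deterministic EKI update on a 500-particle ensemble.
G, y, gamma = 2.0, 4.0, 0.1
u = rng.normal(0.0, 1.0, 500)            # prior ensemble
for _ in range(50):
    g = G * u                            # forward map applied to each particle
    c_ug = float(np.mean((u - u.mean()) * (g - g.mean())))   # Cov(u, G u)
    c_gg = float(np.var(g))                                  # Cov(G u, G u)
    u = u + c_ug / (c_gg + gamma ** 2) * (y - g)             # Kalman-type step
u_hat = float(u.mean())
```

The rapid shrinkage of `np.std(u)` across iterations illustrates the ensemble collapse that makes the mean-field analysis of EKI, as a sampling method, nontrivial.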

Abstract: In this work, we propose a new Bayesian model for unsupervised image segmentation based on a combination of the spatially varying finite mixture models (SVFMMs) and the non-local means (NLM) framework. The probabilistic NLM weighting function is successfully integrated into a varying Gauss–Markov random field, yielding a prior density that adaptively imposes a local regularization to simultaneously preserve edges and enforce smooth constraints in homogeneous regions of the image. Two versions of our model are proposed: a pixel-based model and a patch-based model, depending on the design of the probabilistic NLM weighting function. Contrary to previous methods proposed in the literature, our approach does not introduce new parameters to be estimated in the model, because the NLM weighting function is completely known once the neighborhood of a pixel is fixed. The proposed model can be estimated in closed form via a maximum a posteriori (MAP) estimation in an expectation–maximization scheme. We have compared our model with previously proposed SVFMMs using two public datasets: the Berkeley Segmentation dataset and the BRATS 2013 dataset. The proposed model compares favorably with previous approaches in the literature, achieving better results in terms of Rand Index and Dice metrics in our experiments. PubDate: 2021-01-12

Abstract: In recent years, there has been a growing need for processing methods aimed at extracting useful information from large datasets. In many cases, the challenge is to discover a low-dimensional structure in the data, often concealed by the existence of nuisance parameters and noise. Motivated by such challenges, we consider the problem of estimating a signal from its scaled, cyclically shifted and noisy observations. We focus on the particularly challenging regime of low signal-to-noise ratio (SNR), where different observations cannot be shift-aligned. We show that an accurate estimation of the signal from its noisy observations is possible, and derive a procedure which is proved to consistently estimate the signal. The asymptotic sample complexity (the number of observations required to recover the signal) of the procedure is \(1/\text{SNR}^4\). Additionally, we propose a procedure which is experimentally shown to improve the sample complexity by a factor equal to the signal’s length. Finally, we present numerical experiments which demonstrate the performance of our algorithms and corroborate our theoretical findings. PubDate: 2021-01-12
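To make the problem concrete, here is the naive align-and-average baseline for the cyclic (multireference alignment) model, under toy choices of signal, length and noise level. This baseline works at moderate SNR; the point of the abstract is precisely that it breaks down at low SNR, where observations cannot be shift-aligned and a different procedure is needed.

```python
import numpy as np

rng = np.random.default_rng(11)

# Observations: random cyclic shifts of a sine signal plus Gaussian noise.
L, m, sigma = 21, 500, 0.3
signal = np.sin(2.0 * np.pi * np.arange(L) / L)
obs = np.array([np.roll(signal, rng.integers(L)) + sigma * rng.standard_normal(L)
                for _ in range(m)])

# Align every observation to a noisy template by maximising the cyclic
# cross-correlation, then average; the global shift is fixed by the template.
ref = obs[0]
aligned = []
for o in obs:
    scores = [float(np.dot(np.roll(o, -s), ref)) for s in range(L)]
    aligned.append(np.roll(o, -int(np.argmax(scores))))
est = np.mean(aligned, axis=0)
```

At this noise level the average recovers the signal up to a cyclic shift; as `sigma` grows the per-observation alignment becomes uninformative and the average washes out, which motivates the shift-invariant estimation procedure of the paper.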

Abstract: Bayesian variable selection is an important method for discovering variables which are most useful for explaining the variation in a response. The widespread use of this method has been restricted by the challenging computational problem of sampling from the corresponding posterior distribution. Recently, the use of adaptive Monte Carlo methods has been shown to lead to performance improvement over traditionally used algorithms in linear regression models. This paper looks at applying one of these algorithms (the adaptively scaled independence sampler) to logistic regression and accelerated failure time models. We investigate the use of this algorithm with data augmentation, Laplace approximation and the correlated pseudo-marginal method. The performance of the algorithms is compared on several genomic data sets. PubDate: 2021-01-12

Abstract: The validity of estimation and smoothing parameter selection for the wide class of generalized additive models for location, scale and shape (GAMLSS) relies on the correct specification of a likelihood function. Deviations from this assumption are known to mislead any likelihood-based inference and can hinder penalization schemes meant to ensure some degree of smoothness for nonlinear effects. We propose a general approach to achieve robustness in fitting GAMLSSs by limiting the contribution of observations with low log-likelihood values. Robust selection of the smoothing parameters can be carried out either by minimizing information criteria that naturally arise from the robustified likelihood or via an extended Fellner–Schall method. The latter allows for automatic smoothing parameter selection and is particularly advantageous in applications with multiple smoothing parameters. We also address the challenge of tuning robust estimators for models with nonlinear effects by proposing a novel median downweighting proportion criterion. This enables a fair comparison with existing robust estimators for the special case of generalized additive models, where our estimator competes favorably. The overall good performance of our proposal is illustrated by further simulations in the GAMLSS setting and by an application to functional magnetic resonance brain imaging using bivariate smoothing splines. PubDate: 2021-01-12

Abstract: We investigate sampling \(\beta\)-ensembles with time complexity less than cubic in the cardinality of the ensemble. Following Dumitriu and Edelman (J Math Phys 43(11):5830–5847, 2002), we see the ensemble as the eigenvalues of a random tridiagonal matrix, namely a random Jacobi matrix. First, we provide a unifying and elementary treatment of the tridiagonal models associated with the three classical Hermite, Laguerre, and Jacobi ensembles. For this purpose, we use simple changes of variables between successive reparametrizations of the coefficients defining the tridiagonal matrix. Second, we derive an approximate sampler for the simulation of more general \(\beta\)-ensembles and illustrate how fast it can be for polynomial potentials. This method combines a Gibbs sampler on Jacobi matrices and the diagonalization of these matrices. In practice, even for large ensembles, only a few Gibbs passes suffice for the marginal distribution of the eigenvalues to fit the expected theoretical distribution. When the conditionals in the Gibbs sampler can be simulated exactly, the same fast empirical convergence is observed for the fluctuations of the largest eigenvalue. Our experimental results support a conjecture by Krishnapur et al. (Commun Pure Appl Math 69(1): 145–199, 2016), that the Gibbs chain on Jacobi matrices of size N mixes in \(\mathcal{O}(\log N)\). PubDate: 2021-01-12
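The classical Hermite case of the Dumitriu–Edelman construction referenced above is short enough to sketch directly: a symmetric tridiagonal matrix with Gaussian diagonal and chi-distributed off-diagonal whose eigenvalues follow the Hermite \(\beta\)-ensemble. A dedicated tridiagonal eigensolver would make the diagonalization cheaper; the dense `eigvalsh` call below is used only for brevity.

```python
import numpy as np

rng = np.random.default_rng(4)

def hermite_beta_eigs(n, beta):
    # Dumitriu-Edelman tridiagonal model for the Hermite beta-ensemble:
    # N(0, 2) entries on the diagonal, and chi-distributed off-diagonal
    # entries with degrees of freedom beta*(n-1), beta*(n-2), ..., beta.
    d = rng.normal(0.0, np.sqrt(2.0), n)
    e = np.sqrt(rng.chisquare(beta * np.arange(n - 1, 0, -1)))
    return np.linalg.eigvalsh(np.diag(d) + np.diag(e, 1) + np.diag(e, -1))

lam = hermite_beta_eigs(200, 2.0)
```

In this normalization the empirical spectral distribution approaches a semicircle of radius \(2\sqrt{\beta n}\) (here 40), which gives a quick sanity check on the spectrum's range.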

Abstract: Motivated mainly by applications to partial differential equations with random coefficients, we introduce a new class of Monte Carlo estimators, called the Toeplitz Monte Carlo (TMC) estimator, for approximating the integral of a multivariate function with respect to the direct product of an identical univariate probability measure. The TMC estimator generates a sequence \(x_1, x_2, \ldots\) of i.i.d. samples for one random variable and then uses \((x_{n+s-1}, x_{n+s-2}, \ldots, x_n)\) with \(n = 1, 2, \ldots\) as quadrature points, where s denotes the dimension. Although consecutive points have some dependency, the concatenation of all quadrature nodes is represented by a Toeplitz matrix, which allows for a fast matrix–vector multiplication. In this paper, we study the variance of the TMC estimator and its dependence on the dimension s. Numerical experiments confirm the considerable efficiency improvement over the standard Monte Carlo estimator for applications to partial differential equations with random coefficients, particularly when the dimension s is large. PubDate: 2021-01-06
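The point-construction rule above amounts to sliding a window of width s along a single i.i.d. stream, which can be sketched directly (a toy integrand is used; the function and sample sizes are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(5)

def tmc_estimate(f, s, n):
    # Toeplitz Monte Carlo: one i.i.d. stream x_1, x_2, ... of length
    # n + s - 1 is reused through a sliding window, so consecutive
    # quadrature points share s - 1 coordinates and the node matrix is
    # Toeplitz, enabling fast matrix-vector products.
    x = rng.standard_normal(n + s - 1)
    pts = np.lib.stride_tricks.sliding_window_view(x, s)
    return float(np.mean([f(p) for p in pts]))

# E[sum of s squared standard normals] = s, so the estimate should be near 5.
est = tmc_estimate(lambda v: float(np.sum(v ** 2)), 5, 20_000)
```

Only n + s - 1 random draws are consumed for n s-dimensional points, versus n·s for the standard Monte Carlo estimator; the price, analysed in the paper, is the dependence between overlapping points.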

Abstract: Cluster point processes comprise a class of models that have been used for a wide range of applications. While several models have been studied for the probability density function of the offspring displacements and the parent point process, there are few examples of non-Poisson distributed cluster sizes. In this paper, we introduce a generalization of the Thomas process, which allows for the cluster sizes to have a variance that is greater than or less than its expected value. We refer to this as the cluster sizes being over- and under-dispersed, respectively. To fit the model, we introduce minimum contrast methods and a Bayesian MCMC algorithm. These are evaluated in a simulation study. It is found that using the Bayesian MCMC method, we are in most cases able to detect over- and under-dispersion in the cluster sizes. We use the MCMC method to fit the model to nerve fiber data, and contrast the results to those of a fitted Thomas process. PubDate: 2020-11-01
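A simulation sketch of the over-dispersed case, assuming negative binomial cluster sizes as one concrete choice (the abstract does not name the distribution, so this is an illustration, with toy parameter values, not the paper's exact model):

```python
import numpy as np

rng = np.random.default_rng(6)

# Thomas-type cluster process on the unit square: Poisson parents, Gaussian
# offspring displacements, but negative binomial cluster sizes in place of
# the Poisson sizes of the classical Thomas process.
kappa, mean_size, sigma = 20.0, 5.0, 0.02
parents = rng.uniform(0.0, 1.0, (rng.poisson(kappa), 2))
offspring = []
for p in parents:
    # NegBin(r=2, p=2/(2+mean_size)): mean = 5, variance = 17.5 > 5,
    # i.e. over-dispersed cluster sizes.
    m = rng.negative_binomial(2, 2.0 / (2.0 + mean_size))
    offspring.append(p + sigma * rng.standard_normal((m, 2)))
pts = np.vstack(offspring) if offspring else np.empty((0, 2))
```

Replacing the negative binomial with, say, a binomial distribution would give variance below the mean and hence the under-dispersed regime the paper also covers.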

Abstract: Missing values challenge data analysis because many supervised and unsupervised learning methods cannot be applied directly to incomplete data. Matrix completion based on low-rank assumptions is a very powerful solution for dealing with missing values. However, existing methods do not consider the case of informative missing values, which are widely encountered in practice. This paper proposes matrix completion methods to recover Missing Not At Random (MNAR) data. Our first contribution is to suggest a model-based estimation strategy by modelling the missing mechanism distribution. An EM algorithm is then implemented, involving a Fast Iterative Soft-Thresholding Algorithm (FISTA). Our second contribution is to suggest a computationally efficient surrogate estimation by implicitly taking into account the joint distribution of the data and the missing mechanism: the data matrix is concatenated with the mask coding for the missing values; a low-rank structure for exponential family is assumed on this new matrix, in order to encode links between variables and missing mechanisms. The methodology, which has the great advantage of handling different missing-value mechanisms, is robust to model misspecification. The performances of our methods are assessed on real data collected from a trauma registry (TraumaBase \(^{\textregistered}\)) containing clinical information about over twenty thousand severely traumatized patients in France. The aim is then to predict whether doctors should administer tranexamic acid to patients with traumatic brain injury, which would limit excessive bleeding. PubDate: 2020-11-01
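The soft-thresholded SVD iteration at the heart of FISTA-style low-rank completion can be sketched in a few lines. This is the plain (missing-at-random) building block, shown for illustration only, not the full MNAR model of the paper:

```python
import numpy as np

rng = np.random.default_rng(7)

def soft_impute(X, mask, lam, n_iter=200):
    # Iterative soft-thresholded SVD: fill the missing entries with the
    # current estimate, shrink the singular values by lam, repeat. This is
    # the proximal step that FISTA accelerates.
    Z = np.where(mask, X, 0.0)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(np.where(mask, X, Z), full_matrices=False)
        Z = (U * np.maximum(s - lam, 0.0)) @ Vt
    return Z

u, v = rng.standard_normal(20), rng.standard_normal(15)
X = np.outer(u, v)                        # rank-one ground truth
mask = rng.uniform(size=X.shape) > 0.3    # ~70% of entries observed (MAR)
Z = soft_impute(X, mask, lam=0.1)
rel_err = float(np.linalg.norm((Z - X)[~mask]) / np.linalg.norm(X[~mask]))
```

The surrogate strategy described above would instead run this kind of solver on the concatenation `[X | mask]`, so that the recovered low-rank structure also encodes the missingness mechanism.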

Abstract: This article focuses on the challenging problem of efficiently detecting changes in mean within multivariate data sequences. Multivariate changepoints can be detected by projecting a multivariate series to a univariate one using a suitable projection direction that preserves a maximal proportion of signal information. However, for some existing approaches the computation of such a projection direction can scale unfavourably with the number of series and might rely on additional assumptions on the data sequences, thus limiting their generality. We introduce BayesProject, a computationally inexpensive Bayesian approach to compute a projection direction in such a setting. The proposed approach allows the incorporation of prior knowledge of the changepoint scenario, when such information is available, which can help to increase the accuracy of the method. A simulation study shows that BayesProject is robust, yields projections close to the oracle projection direction and, moreover, that its accuracy in detecting changepoints is comparable to, or better than, existing algorithms while scaling linearly with the number of series. PubDate: 2020-11-01
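The project-then-detect pipeline can be illustrated end to end on synthetic data. The projection direction below is a crude mean-difference estimate, standing in for the BayesProject direction purely to show the mechanics; the CUSUM step is the standard univariate statistic.

```python
import numpy as np

rng = np.random.default_rng(8)

# A 10-dimensional series of length 200 with a mean change at t = 100.
n, d, tau = 200, 10, 100
X = rng.standard_normal((n, d))
X[tau:] += 1.0                                   # mean shift in every coordinate

# Crude projection direction: difference of the two half-sample means
# (a stand-in for the estimated direction, normalised to unit length).
v = X[n // 2:].mean(axis=0) - X[: n // 2].mean(axis=0)
v /= np.linalg.norm(v)
y = X @ v                                        # univariate projected series

# Standardised CUSUM on the projected series; its argmax estimates tau.
t = np.arange(1, n)
cusum = np.abs(np.cumsum(y)[:-1] - t / n * y.sum()) / np.sqrt(t * (n - t) / n)
tau_hat = int(np.argmax(cusum)) + 1
```

Projecting first concentrates the signal spread across all 10 coordinates into a single series with a large mean shift, which is why a well-chosen direction makes the univariate detector so effective.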

Abstract: Hierarchical normalized discrete random measures identify a general class of priors that is suited to flexibly learn how the distribution of a response variable changes across groups of observations. A special case widely used in practice is the hierarchical Dirichlet process. Although current theory on hierarchies of nonparametric priors yields all relevant tools for drawing posterior inference, their implementation comes at a high computational cost. We fill this gap by proposing an approximation for a general class of hierarchical processes, which leads to an efficient conditional Gibbs sampling algorithm. The key idea consists of a deterministic truncation of the underlying random probability measures leading to a finite dimensional approximation of the original prior law. We provide both empirical and theoretical support for such a procedure. PubDate: 2020-11-01

Abstract: We introduce and analyze a parallel sequential Monte Carlo methodology for the numerical solution of optimization problems that involve the minimization of a cost function that consists of the sum of many individual components. The proposed scheme is a stochastic zeroth-order optimization algorithm which demands only the capability to evaluate small subsets of components of the cost function. It can be depicted as a bank of samplers that generate particle approximations of several sequences of probability measures. These measures are constructed in such a way that they have associated probability density functions whose global maxima coincide with the global minima of the original cost function. The algorithm selects the best performing sampler and uses it to approximate a global minimum of the cost function. We prove analytically that the resulting estimator converges to a global minimum of the cost function almost surely and provide explicit convergence rates in terms of the number of generated Monte Carlo samples and the dimension of the search space. We show, by way of numerical examples, that the algorithm can tackle cost functions with multiple minima or with broad “flat” regions which are hard to minimize using gradient-based techniques. PubDate: 2020-11-01
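The core construction, densities proportional to \(\exp(-\lambda_t f)\) whose maxima sit at the minima of f, can be illustrated with a single annealed particle sampler in one dimension (a toy stand-in for the parallel bank of samplers; the cost function, schedule and jitter scale are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(9)

# Zeroth-order annealed sampler: only evaluations of f are used, no
# gradients. Particle approximations of exp(-lambda_t * f) concentrate on
# the minimisers of f as the inverse temperature lambda_t grows.
f = lambda x: (x ** 2 - 1.0) ** 2        # two global minima, at -1 and +1
x = rng.uniform(-3.0, 3.0, 1000)         # initial particle cloud
for lam in np.linspace(1.0, 50.0, 30):   # increasing inverse temperature
    w = np.exp(-lam * f(x))
    w /= w.sum()
    # Resample proportionally to the weights, then jitter to keep diversity.
    x = x[rng.choice(x.size, x.size, p=w)] + 0.05 * rng.standard_normal(x.size)
x_best = float(x[np.argmin(f(x))])
```

The double-well cost here has two global minima, exactly the multimodal situation the abstract notes is hard for gradient-based techniques; the particle cloud is free to settle on either (or both) wells.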

Abstract: To deal with very large datasets, a mini-batch version of the Markov chain Monte Carlo Stochastic Approximation Expectation–Maximization algorithm for general latent variable models is proposed. For exponential models the algorithm is shown to be convergent under classical conditions as the number of iterations increases. Numerical experiments illustrate the performance of the mini-batch algorithm in various models. In particular, we highlight that mini-batch sampling results in an important speed-up of the convergence of the sequence of estimators generated by the algorithm. Moreover, insights on the effect of the mini-batch size on the limit distribution are presented. Finally, we illustrate how to use mini-batch sampling in practice to improve results when a constraint on the computing time is given. PubDate: 2020-09-05

Abstract: Deciding which predictors to use plays an integral role in deriving statistical models in a wide range of applications. Motivated by the challenges of predicting events across a telecommunications network, we propose a semi-automated, joint model-fitting and predictor selection procedure for linear regression models. Our approach can model and account for serial correlation in the regression residuals, produces sparse and interpretable models and can be used to jointly select models for a group of related responses. This is achieved through fitting linear models under constraints on the number of nonzero coefficients using a generalisation of a recently developed mixed integer quadratic optimisation approach. The resultant models from our approach achieve better predictive performance on the motivating telecommunications data than methods currently used by industry. PubDate: 2020-09-04
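The cardinality-constrained fitting problem being solved can be illustrated with a brute-force best-subset search, a stand-in that is feasible only for tiny numbers of predictors and that the mixed integer quadratic optimisation approach scales far beyond (the data-generating model below is a toy assumption):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(10)

# Toy data: the response depends only on predictors 0 and 3.
n, p, k = 100, 8, 2
X = rng.standard_normal((n, p))
y = X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.standard_normal(n)

def rss(subset):
    # Residual sum of squares of the least-squares fit on the chosen columns.
    return float(np.linalg.lstsq(X[:, subset], y, rcond=None)[1][0])

# Exhaustive search over all subsets with at most k nonzero coefficients.
subsets = [s for r in range(1, k + 1) for s in combinations(range(p), r)]
best = min(subsets, key=rss)
```

Enumerating all subsets costs \(\mathcal{O}(p^k)\) least-squares fits, which is exactly why the MIQO formulation, which encodes the cardinality constraint with binary variables, is needed at realistic problem sizes.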