Abstract: A new time-varying autoregressive stochastic volatility model with \(\alpha \) -stable innovations (TVAR \(\alpha \) SV) is proposed. This new model for time series data combines a time-varying autoregressive component and a stochastic scaling as known from stochastic volatility models with \(\alpha \) -stable distributed noise. Hence, the model can cover extreme events better than classical stochastic volatility models. Furthermore, we develop a Gibbs sampling procedure for the estimation of the model parameters. The procedure is based on the estimation strategy by Kim et al. (Rev Econ Stud 65(3): 361–393, 1998) for classical stochastic volatility models, however, the estimation procedure requires a deliberate approximation of \(\alpha \) -stable distributions by finite mixtures of normal distributions and the application of a simulation smoother for linear Gaussian state space models. A simulation study for the new estimation procedure illustrates the appealing accuracy. Finally, we apply the model to electricity spot price data. PubDate: 2021-04-20

Abstract: We introduce the use of the Zig-Zag sampler to the problem of sampling conditional diffusion processes (diffusion bridges). The Zig-Zag sampler is a rejection-free sampling scheme based on a non-reversible continuous piecewise deterministic Markov process. Similar to the Lévy–Ciesielski construction of a Brownian motion, we expand the diffusion path in a truncated Faber–Schauder basis. The coefficients within the basis are sampled using a Zig-Zag sampler. A key innovation is the use of the fully local algorithm for the Zig-Zag sampler that allows to exploit the sparsity structure implied by the dependency graph of the coefficients and by the subsampling technique to reduce the complexity of the algorithm. We illustrate the performance of the proposed methods in a number of examples. PubDate: 2021-04-20

Abstract: We introduce a new variational inference (VI) framework, called energetic variational inference (EVI). It minimizes the VI objective function based on a prescribed energy-dissipation law. Using the EVI framework, we can derive many existing particle-based variational inference (ParVI) methods, including the popular Stein variational gradient descent (SVGD). More importantly, many new ParVI schemes can be created under this framework. For illustration, we propose a new particle-based EVI scheme, which performs the particle-based approximation of the density first and then uses the approximated density in the variational procedure, or “Approximation-then-Variation” for short. Thanks to this order of approximation and variation, the new scheme can maintain the variational structure at the particle level, and can significantly decrease the KL-divergence in each iteration. Numerical experiments show the proposed method outperforms some existing ParVI methods in terms of fidelity to the target distribution. PubDate: 2021-04-17

Abstract: Gaussian processes (GPs) serve as flexible surrogates for complex surfaces, but buckle under the cubic cost of matrix decompositions with big training data sizes. Geospatial and machine learning communities suggest pseudo-inputs, or inducing points, as one strategy to obtain an approximation easing that computational burden. However, we show how placement of inducing points and their multitude can be thwarted by pathologies, especially in large-scale dynamic response surface modeling tasks. As remedy, we suggest porting the inducing point idea, which is usually applied globally, over to a more local context where selection is both easier and faster. In this way, our proposed methodology hybridizes global inducing point and data subset-based local GP approximation. A cascade of strategies for planning the selection of local inducing points is provided, and comparisons are drawn to related methodology with emphasis on computer surrogate modeling applications. We show that local inducing points extend their global and data subset component parts on the accuracy–computational efficiency frontier. Illustrative examples are provided on benchmark data and a large-scale real-simulation satellite drag interpolation problem. PubDate: 2021-04-17

Abstract: High dimensional linear regression problems are often fitted using Lasso approaches. Although the Lasso objective function is convex, it is not differentiable everywhere, making the use of gradient descent methods for minimization not straightforward. To avoid this technical issue, we apply Nesterov smoothing to the original (unsmoothed) Lasso objective function. We introduce a closed-form smoothed Lasso which preserves the convexity of the Lasso function, is uniformly close to the unsmoothed Lasso, and allows us to obtain closed-form derivatives everywhere for efficient and fast minimization via gradient descent. Our simulation studies are focused on polygenic risk scores using genetic data from a genome-wide association study (GWAS) for chronic obstructive pulmonary disease (COPD). We compare accuracy and runtime of our approach to the current gold standard in the literature, the FISTA algorithm. Our results suggest that the proposed methodology provides estimates with equal or higher accuracy than the FISTA algorithm while having the same asymptotic runtime scaling. The proposed methodology is implemented in the R-package smoothedLasso, available on the Comprehensive R Archive Network (CRAN). PubDate: 2021-04-17

Abstract: The simultaneous clustering of documents and words, known as co-clustering, has proved to be more effective than one-sided clustering in dealing with sparse high-dimensional datasets. By their nature, text data are also generally unbalanced and directional. Recently, the von Mises–Fisher (vMF) mixture model was proposed to handle unbalanced data while harnessing the directional nature of text. In this paper, we propose a general co-clustering framework based on a matrix formulation of vMF model-based co-clustering. This formulation leads to a flexible framework for text co-clustering that can easily incorporate both word–word semantic relationships and document–document similarities. By contrast with existing methods, which generally use an additive incorporation of similarities, we propose a bi-directional multiplicative regularization that better encapsulates the underlying text data structure. Extensive evaluations on various real-world text datasets demonstrate the superior performance of our proposed approach over baseline and competitive methods, both in terms of clustering results and co-cluster topic coherence. PubDate: 2021-04-10

Abstract: An essential assumption in traditional regression techniques is that predictors are measured without errors. Failing to take into account measurement error in predictors may result in severely biased inferences. Correcting measurement-error bias is an extremely difficult problem when estimating a regression function nonparametrically. We propose an approach to deal with measurement errors in predictors when modelling flexible regression functions. This approach depends on directly modelling the mean and the variance of the response variable after integrating out the true unobserved predictors in a penalized splines model. We demonstrate through simulation studies that our approach provides satisfactory prediction accuracy largely outperforming previously suggested local polynomial estimators even when the model is incorrectly specified and is competitive with the Bayesian estimator. PubDate: 2021-03-30

Abstract: We derive a probability distribution for the possible number of iterations required for a SIMT (single instruction multiple thread) program using rejection sampling to finish creating a sample across all threads. This distribution is found to match a recently proposed distribution from Chakraborty and Gupta (in: Communications in statistics: theory and methods, 2015) that was shown as a good approximation of certain datasets. This work demonstrates an exact application of this distribution. The distribution can be used to evaluate the relative merit of some sampling methods on the GPU without resort to numerical tests. The distribution reduces to the expected geometric distribution in the single thread per warp limit. A simplified formula to approximate the expected number of iterations required to obtain rejection iteration samples is provided. With this new result, a simple, efficient layout for assigning sampling tasks to threads on a GPU is found as a function of the rejection probability without recourse to more complicated rejection sampling methods. PubDate: 2021-03-30

Abstract: Stochastic approximation methods play a central role in maximum likelihood estimation problems involving intractable likelihood functions, such as marginal likelihoods arising in problems with missing or incomplete data, and in parametric empirical Bayesian estimation. Combined with Markov chain Monte Carlo algorithms, these stochastic optimisation methods have been successfully applied to a wide range of problems in science and industry. However, this strategy scales poorly to large problems because of methodological and theoretical difficulties related to using high-dimensional Markov chain Monte Carlo algorithms within a stochastic approximation scheme. This paper proposes to address these difficulties by using unadjusted Langevin algorithms to construct the stochastic approximation. This leads to a highly efficient stochastic optimisation methodology with favourable convergence properties that can be quantified explicitly and easily checked. The proposed methodology is demonstrated with three experiments, including a challenging application to statistical audio analysis and a sparse Bayesian logistic regression with random effects problem. PubDate: 2021-03-19

Abstract: We introduce a new Markov chain Monte Carlo (MCMC) sampler for infinite-dimensional inverse problems. Our new sampler is based on the affine invariant ensemble sampler, which uses interacting walkers to adapt to the covariance structure of the target distribution. We extend this ensemble sampler for the first time to infinite-dimensional function spaces, yielding a highly efficient gradient-free MCMC algorithm. Because our new ensemble sampler does not require gradients or posterior covariance estimates, it is simple to implement and broadly applicable. PubDate: 2021-03-15

Abstract: Structural reliability analysis is concerned with estimation of the probability of a critical event taking place, described by \(P(g(\mathbf{X} ) \le 0)\) for some n-dimensional random variable \(\mathbf{X} \) and some real-valued function g. In many applications the function g is practically unknown, as function evaluation involves time consuming numerical simulation or some other form of experiment that is expensive to perform. The problem we address in this paper is how to optimally design experiments, in a Bayesian decision theoretic fashion, when the goal is to estimate the probability \(P(g(\mathbf{X} ) \le 0)\) using a minimal amount of resources. As opposed to existing methods that have been proposed for this purpose, we consider a general structural reliability model given in hierarchical form. We therefore introduce a general formulation of the experimental design problem, where we distinguish between the uncertainty related to the random variable \(\mathbf{X} \) and any additional epistemic uncertainty that we want to reduce through experimentation. The effectiveness of a design strategy is evaluated through a measure of residual uncertainty, and efficient approximation of this quantity is crucial if we want to apply algorithms that search for an optimal strategy. The method we propose is based on importance sampling combined with the unscented transform for epistemic uncertainty propagation. We implement this for the myopic (one-step look ahead) alternative, and demonstrate the effectiveness through a series of numerical experiments. PubDate: 2021-03-09

Abstract: The renewal Hawkes process is a nascent point process model that generalizes the Hawkes process. Although it has shown strong application potential, fitting the renewal Hawkes process to data remains a challenging task, especially on larger datasets. This article tackles this challenge by providing two approaches that significantly reduce the time required to fit renewal Hawkes processes. Since derivative-based methods for optimization, in general, converge faster than derivative-free methods, our first approach is to derive algorithms for evaluating the gradient and Hessian of the log-likelihood function and then use a derivative-based method, such as the Newton–Raphson method, in maximizing the likelihood, instead of the derivative-free method currently being used. Our second approach is to seek linear time algorithms that produce accurate approximations to the likelihood function, and then directly optimize the approximation to the log-likelihood function. Our simulation experiments show that the Newton–Raphson method reduces the computational time by about 30%. Furthermore, the approximate likelihood methods produce equally accurate estimates compared to the methods based on the exact likelihood and are about 20–40 times faster on datasets with about 10,000 events. We conclude with an analysis of price changes of several currencies relative to the US Dollar. PubDate: 2021-03-04

Abstract: We consider the problem of estimating a parameter \(\theta \in \Theta \subseteq {\mathbb {R}}^{d_{\theta }}\) associated with a Bayesian inverse problem. Typically one must resort to a numerical approximation of gradient of the log-likelihood and also adopt a discretization of the problem in space and/or time. We develop a new methodology to unbiasedly estimate the gradient of the log-likelihood with respect to the unknown parameter, i.e. the expectation of the estimate has no discretization bias. Such a property is not only useful for estimation in terms of the original stochastic model of interest, but can be used in stochastic gradient algorithms which benefit from unbiased estimates. Under appropriate assumptions, we prove that our estimator is not only unbiased but of finite variance. In addition, when implemented on a single processor, we show that the cost to achieve a given level of error is comparable to multilevel Monte Carlo methods, both practically and theoretically. However, the new algorithm is highly amenable to parallel computation. PubDate: 2021-03-03

Abstract: Conditional particle filters (CPFs) are powerful smoothing algorithms for general nonlinear/non-Gaussian hidden Markov models. However, CPFs can be inefficient or difficult to apply with diffuse initial distributions, which are common in statistical applications. We propose a simple but generally applicable auxiliary variable method, which can be used together with the CPF in order to perform efficient inference with diffuse initial distributions. The method only requires simulatable Markov transitions that are reversible with respect to the initial distribution, which can be improper. We focus in particular on random walk type transitions which are reversible with respect to a uniform initial distribution (on some domain), and autoregressive kernels for Gaussian initial distributions. We propose to use online adaptations within the methods. In the case of random walk transition, our adaptations use the estimated covariance and acceptance rate adaptation, and we detail their theoretical validity. We tested our methods with a linear Gaussian random walk model, a stochastic volatility model, and a stochastic epidemic compartment model with time-varying transmission rate. The experimental findings demonstrate that our method works reliably with little user specification and can be substantially better mixing than a direct particle Gibbs algorithm that treats initial states as parameters. PubDate: 2021-03-03

Abstract: Mixtures of unigrams are one of the simplest and most efficient tools for clustering textual data, as they assume that documents related to the same topic have similar distributions of terms, naturally described by multinomials. When the classification task is particularly challenging, such as when the document-term matrix is high-dimensional and extremely sparse, a more composite representation can provide better insight into the grouping structure. In this work, we developed a deep version of mixtures of unigrams for the unsupervised classification of very short documents with a large number of terms, by allowing for models with further deeper latent layers; the proposal is derived in a Bayesian framework. The behavior of the deep mixtures of unigrams is empirically compared with that of other traditional and state-of-the-art methods, namely k-means with cosine distance, k-means with Euclidean distance on data transformed according to semantic analysis, partition around medoids, mixture of Gaussians on semantic-based transformed data, hierarchical clustering according to Ward’s method with cosine dissimilarity, latent Dirichlet allocation, mixtures of unigrams estimated via the EM algorithm, spectral clustering and affinity propagation clustering. The performance is evaluated in terms of both correct classification rate and Adjusted Rand Index. Simulation studies and real data analysis prove that going deep in clustering such data highly improves the classification accuracy. PubDate: 2021-03-03

Abstract: Bayesian additive regression trees (BART) is a tree-based machine learning method that has been successfully applied to regression and classification problems. BART assumes regularisation priors on a set of trees that work as weak learners and is very flexible for predicting in the presence of nonlinearity and high-order interactions. In this paper, we introduce an extension of BART, called model trees BART (MOTR-BART), that considers piecewise linear functions at node levels instead of piecewise constants. In MOTR-BART, rather than having a unique value at node level for the prediction, a linear predictor is estimated considering the covariates that have been used as the split variables in the corresponding tree. In our approach, local linearities are captured more efficiently and fewer trees are required to achieve equal or better performance than BART. Via simulation studies and real data applications, we compare MOTR-BART to its main competitors. R code for MOTR-BART implementation is available at https://github.com/ebprado/MOTR-BART. PubDate: 2021-03-03

Abstract: There is a growing interest in probabilistic numerical solutions to ordinary differential equations. In this paper, the maximum a posteriori estimate is studied under the class of \(\nu \) times differentiable linear time-invariant Gauss–Markov priors, which can be computed with an iterated extended Kalman smoother. The maximum a posteriori estimate corresponds to an optimal interpolant in the reproducing kernel Hilbert space associated with the prior, which in the present case is equivalent to a Sobolev space of smoothness \(\nu +1\) . Subject to mild conditions on the vector field, convergence rates of the maximum a posteriori estimate are then obtained via methods from nonlinear analysis and scattered data approximation. These results closely resemble classical convergence results in the sense that a \(\nu \) times differentiable prior process obtains a global order of \(\nu \) , which is demonstrated in numerical examples. PubDate: 2021-03-03

Abstract: We derive a single-pass algorithm for computing the gradient and Fisher information of Vecchia’s Gaussian process loglikelihood approximation, which provides a computationally efficient means for applying the Fisher scoring algorithm for maximizing the loglikelihood. The advantages of the optimization techniques are demonstrated in numerical examples and in an application to Argo ocean temperature data. The new methods find the maximum likelihood estimates much faster and more reliably than an optimization method that uses only function evaluations, especially when the covariance function has many parameters. This allows practitioners to fit nonstationary models to large spatial and spatial–temporal datasets. PubDate: 2021-03-03

Abstract: We consider the problem of assigning weights to a set of samples or data records, with the goal of achieving a representative weighting, which happens when certain sample averages of the data are close to prescribed values. We frame the problem of finding representative sample weights as an optimization problem, which in many cases is convex and can be efficiently solved. Our formulation includes as a special case the selection of a fixed number of the samples, with equal weights, i.e., the problem of selecting a smaller representative subset of the samples. While this problem is combinatorial and not convex, heuristic methods based on convex optimization seem to perform very well. We describe our open-source implementation rsw and apply it to a skewed sample of the CDC BRFSS dataset. PubDate: 2021-02-28

Abstract: Many multivariate time series observed in practice are second order nonstationary, i.e. their covariance properties vary over time. In addition, missing observations in such data are encountered in many applications of interest, due to recording failures or sensor dropout, hindering successful analysis. This article introduces a novel method for data imputation in multivariate nonstationary time series, based on the so-called locally stationary wavelet modelling paradigm. Our methodology is shown to perform well across a range of simulation scenarios, with a variety of missingness structures, as well as being competitive in the stationary time series setting. We also demonstrate our technique on data arising in a health monitoring application. PubDate: 2021-02-17 DOI: 10.1007/s11222-021-09998-2