Stat   [SJR: 0.985]   [H-I: 5]
   Hybrid journal (may contain Open Access articles)
   ISSN (Online): 2049-1573
   Published by John Wiley and Sons  [1592 journals]
  • Robust estimation based on a novel family of arctan disparities and the
           limitation of the second order influence function
    • Authors: Bhaveshkumar Choithram Dharmani; Ayanendranath Basu
      Abstract: The article presents a new family of disparity measures based on the trigonometric tan⁻¹ (arctan) function. A subset of members from the proposed disparity family are shown to have excellent robustness properties against both inliers and outliers and are competitive with other popular disparities such as the Hellinger distance and the symmetric chi-square. The most notable thing about the presented disparity is that the strong robustness properties of the corresponding minimum distance estimators are attained in spite of predictions to the contrary by not just the first order influence function but also the second order influence function. This demonstrates the limitation of even the second order influence analysis in predicting the robustness properties of a disparity. Several examples and numerical studies illustrate the aforementioned property. Copyright © 2018 John Wiley & Sons, Ltd.
      PubDate: 2018-01-03T23:30:55.359487-05:00
      DOI: 10.1002/sta4.170
  • Bump hunting by topological data analysis
    • Authors: Max Sommerfeld; Giseon Heo, Peter Kim, Stephen T. Rush, J. S. Marron
      Abstract: A topological data analysis approach is taken to the challenging problem of finding and validating the statistical significance of local modes in a data set. As with the SIgnificance of the ZERo (SiZer) approach to this problem, statistical inference is performed in a multi-scale way, that is, across bandwidths. The key contribution is a two-parameter approach to the persistent homology representation. For each kernel bandwidth, a sub-level set filtration of the resulting kernel density estimate is computed. Inference based on the resulting persistence diagram indicates statistical significance of modes. It is seen through a simulated example, and by analysis of the famous Hidalgo stamps data, that the new method has more statistical power for finding bumps than SiZer. Copyright © 2017 John Wiley & Sons, Ltd.
      PubDate: 2017-11-27T21:48:06.078352-05:00
      DOI: 10.1002/sta4.167
  • A matrix variate skew-t distribution
    • Authors: Michael P.B. Gallaugher; Paul D. McNicholas
      Abstract: Although there is ample work in the literature dealing with skewness in the multivariate setting, there is a relative paucity of work in the matrix variate paradigm. Such work is, for example, useful for modelling three-way data. A matrix variate skew-t distribution is derived based on a mean-variance matrix normal mixture. An expectation-conditional maximization algorithm is developed for parameter estimation. Simulated data are used for illustration. Copyright © 2017 John Wiley & Sons, Ltd.
      PubDate: 2017-05-02T10:15:22.405988-05:00
      DOI: 10.1002/sta4.143
  • Adaptively tuned particle swarm optimization with application to spatial design
    • Authors: Matthew Simpson; Christopher K. Wikle, Scott H. Holan
      Abstract: Particle swarm optimization (PSO) algorithms are a class of heuristic optimization algorithms that are attractive for complex optimization problems. We propose using PSO to solve spatial design problems, e.g. choosing new locations to add to an existing monitoring network. Additionally, we introduce two new classes of PSO algorithms that perform well in a wide variety of circumstances, called adaptively tuned PSO and adaptively tuned bare bones PSO. To illustrate these algorithms, we apply them to a common spatial design problem: choosing new locations to add to an existing monitoring network. Specifically, we consider a network in the Houston, TX, area for monitoring ambient ozone levels, which have been linked to out-of-hospital cardiac arrest rates. Published 2017. This article has been contributed to by US Government employees and their work is in the public domain in the USA
      PubDate: 2017-04-17T19:25:42.733323-05:00
      DOI: 10.1002/sta4.142
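The update rule the abstract builds on is the standard PSO iteration; a minimal sketch follows. This is the vanilla algorithm, not the adaptively tuned variants the paper proposes, and the inertia and acceleration constants `w`, `c1`, `c2` are conventional defaults rather than values from the paper:

```python
import random

def pso(f, dim, n_particles=20, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimizer: each particle's velocity is pulled
    toward its own best position (pbest) and the swarm-wide best (gbest)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Minimize a simple sphere function as a stand-in objective; a spatial
# design criterion would replace f in the paper's setting.
best, val = pso(lambda x: sum(xi * xi for xi in x), dim=3)
```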
  • On the selection consistency of Bayesian structured variable selection
    • Authors: Kaixu Yang; Xiaoxi Shen
      Abstract: A Bayesian variable selection framework is considered for analyzing image data. To model the spatial dependencies among the covariates, an Ising prior is assigned to the binary latent vector γ, which indicates whether a covariate should be selected or not. The selection process, that is, the estimation of γ, can be carried out with a Gibbs sampler. Although the model has been used in many scientific applications, no theoretical development has been made. In this article, we establish model selection consistency under mild conditions, an important theoretical property for high-dimensional variable selection. Copyright © 2017 John Wiley & Sons, Ltd.
      PubDate: 2017-04-17T18:30:36.35643-05:00
      DOI: 10.1002/sta4.141
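The Gibbs update for the inclusion vector γ under an Ising prior can be sketched in a toy form. Here `log_lik` is a hypothetical stand-in for the log marginal likelihood of the selected model (the paper's actual likelihood is not reproduced), and the prior parameters `a`, `b` are illustrative:

```python
import math
import random

def gibbs_ising_select(log_lik, n_vars, neighbors, a=-1.0, b=0.5,
                       n_sweeps=300, burn=100, seed=1):
    """Gibbs sampler for a binary inclusion vector gamma under an Ising
    prior p(gamma) ∝ exp(a * sum_j gamma_j + b * sum_{j~k} gamma_j gamma_k).
    Returns posterior inclusion frequencies after burn-in."""
    rng = random.Random(seed)
    gamma = [0] * n_vars
    freq = [0.0] * n_vars
    for sweep in range(n_sweeps):
        for j in range(n_vars):
            nb = sum(gamma[k] for k in neighbors[j])
            gamma[j] = 1
            lp1 = log_lik(gamma) + a + b * nb   # gamma_j = 1 branch
            gamma[j] = 0
            lp0 = log_lik(gamma)                # gamma_j = 0 branch
            p1 = 1.0 / (1.0 + math.exp(min(50.0, lp0 - lp1)))
            gamma[j] = 1 if rng.random() < p1 else 0
        if sweep >= burn:
            for j in range(n_vars):
                freq[j] += gamma[j]
    return [f / (n_sweeps - burn) for f in freq]

# Toy score: variables 0 and 1 genuinely improve the fit; a chain graph
# couples each covariate with its spatial neighbour.
score = lambda g: 6.0 * (g[0] + g[1]) - 1.0 * sum(g)
nbrs = [[1], [0, 2], [1, 3], [2, 4], [3]]
freq = gibbs_ising_select(score, 5, nbrs)
```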
  • Robust quantile regression using a generalized class of skewed distributions
    • Authors: Christian Galarza Morales; Victor Lachos Davila, Celso Barbosa Cabral, Luis Castro Cepero
      Abstract: It is well known that the widely popular mean regression model could be inadequate if the probability distribution of the observed responses does not follow a symmetric distribution. To deal with this situation, quantile regression turns out to be a more robust alternative for accommodating outliers and the misspecification of the error distribution because it characterizes the entire conditional distribution of the outcome variable. This paper presents a likelihood-based approach for the estimation of the regression quantiles based on a new family of skewed distributions. This family includes the skewed versions of the normal, Student-t, Laplace, contaminated normal and slash distributions, all with the zero quantile property for the error term and with a convenient and novel stochastic representation that facilitates the implementation of the expectation–maximization algorithm for maximum likelihood estimation of the pth quantile regression parameters. We evaluate the performance of the proposed expectation–maximization algorithm and the asymptotic properties of the maximum likelihood estimates through empirical experiments and application to a real-life dataset. The algorithm is implemented in the R package lqr, providing full estimation and inference for the parameters as well as simulation envelope plots useful for assessing the goodness of fit. Copyright © 2017 John Wiley & Sons, Ltd.
      PubDate: 2017-03-15T00:30:31.492398-05:00
      DOI: 10.1002/sta4.140
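The objective underlying all quantile regression is the check loss; the paper's EM machinery for skewed error distributions is beyond a short sketch, but a minimal example shows how minimizing the check loss recovers the pth sample quantile (an intercept-only quantile "regression"):

```python
def check_loss(u, p):
    """Quantile (check) loss: rho_p(u) = u * (p - 1{u < 0})."""
    return u * (p - (1 if u < 0 else 0))

def quantile_by_check_loss(y, p):
    """The pth sample quantile minimizes the total check loss; because the
    objective is piecewise linear with breakpoints at the data, a minimizer
    can always be found among the observed values."""
    return min(y, key=lambda t: sum(check_loss(yi - t, p) for yi in y))

y = list(range(1, 11))
q_med = quantile_by_check_loss(y, 0.5)   # a median of the sample
q_90 = quantile_by_check_loss(y, 0.9)    # an upper quantile
```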
  • Covariate selection for multilevel models with missing data
    • Authors: Miguel Marino; Orfeu M. Buxton, Yi Li
      Pages: 31 - 46
      Abstract: Missing covariate data hamper variable selection in multilevel regression settings. Current variable selection techniques for multiply-imputed data commonly address missingness in the predictors through list-wise deletion and stepwise-selection methods that are problematic. Moreover, most variable selection methods are developed for independent linear regression models and do not accommodate multilevel mixed effects regression models with incomplete covariate data. We develop a novel methodology that is able to perform covariate selection across multiply-imputed data for multilevel random effects models when missing data are present. Specifically, we propose to stack the multiply-imputed data sets from a multiple imputation procedure and to apply a group variable selection procedure through group lasso regularization to assess the overall impact of each predictor on the outcome across the imputed data sets. Simulations confirm the advantageous performance of the proposed method compared with the competing methods. We applied the method to reanalyse the Healthy Directions–Small Business cancer prevention study, which evaluated a behavioural intervention programme targeting multiple risk-related behaviours in a working-class, multi-ethnic population. Copyright © 2017 John Wiley & Sons, Ltd.
      PubDate: 2017-01-08T18:50:26.241275-05:00
      DOI: 10.1002/sta4.133
  • A parametric model bridging between bounded and unbounded variograms
    • Authors: Martin Schlather; Olga Moreva
      Pages: 47 - 52
      Abstract: A simple variogram model with two parameters is presented that includes the power variogram for fractional Brownian motion, a modified De Wijsian model, the generalized Cauchy model and the multiquadric model. One parameter controls the sample path roughness of the process. The other parameter allows for a smooth transition between bounded and unbounded variograms, that is, between stationary and intrinsically stationary processes in a Gaussian framework, or between mixing and non-ergodic Brown–Resnick processes when modeling spatial extremes. Copyright © 2017 John Wiley & Sons, Ltd.
      PubDate: 2017-02-07T18:00:37.543933-05:00
      DOI: 10.1002/sta4.134
  • A Bayesian supervised dual-dimensionality reduction model for simultaneous
           decoding of LFP and spike train signals
    • Authors: Andrew Holbrook; Alexander Vandenberg-Rodes, Norbert Fortin, Babak Shahbaba
      Pages: 53 - 67
      Abstract: Neuroscientists are increasingly collecting multimodal data during experiments and observational studies. Different data modalities—such as electroencephalogram, functional magnetic resonance imaging, local field potential (LFP) and spike trains—offer different views of the complex systems contributing to neural phenomena. Here, we focus on joint modelling of LFP and spike train data and present a novel Bayesian method for neural decoding to infer behavioural and experimental conditions. This model performs supervised dual-dimensionality reduction: it learns low-dimensional representations of two different sources of information that not only explain variation in the input data itself but also predict extraneuronal outcomes. Despite being one probabilistic unit, the model consists of multiple modules: exponential principal components analysis (PCA) and wavelet PCA are used for dimensionality reduction in the spike train and LFP modules, respectively; these modules simultaneously interface with a Bayesian binary regression module. We demonstrate how this model may be used for prediction, parametric inference and identification of influential predictors. In prediction, the hierarchical model outperforms other models trained on LFP alone, spike train alone and combined LFP and spike train data. We compare two methods for modelling the loading matrix and find them to perform similarly. Finally, model parameters and their posterior distributions yield scientific insights. Copyright © 2017 John Wiley & Sons, Ltd.
      PubDate: 2017-02-07T21:48:34.295858-05:00
      DOI: 10.1002/sta4.137
  • “Stationary” point processes are uncommon on linear networks
    • Authors: Adrian Baddeley; Gopalan Nair, Suman Rakshit, Greg McSwiggan
      Pages: 68 - 78
      Abstract: Statistical methodology for analysing patterns of points on a network of lines, such as road traffic accident locations, often assumes that the underlying point process is “stationary” or “correlation-stationary.” However, such processes appear to be rare. In this paper, popular procedures for constructing a point process are adapted to linear networks: many of the resulting models are no longer stationary when distance is measured by the shortest path in the network. This undermines the rationale for popular statistical methods such as the K-function and pair correlation function. Alternative strategies are proposed, such as replacing the shortest-path distance by another metric on the network. Copyright © 2017 John Wiley & Sons, Ltd.
      PubDate: 2017-02-08T18:20:31.12866-05:00
      DOI: 10.1002/sta4.135
  • A second look at inference for bivariate Skellam distributions
    • Authors: Sidi Allal Aissaoui; Christian Genest, Mhamed Mesfioui
      Pages: 79 - 87
      Abstract: Two bivariate extensions of the Skellam distribution were introduced in 2014 by a subset of the present authors, who also proposed moment estimators for the dependence parameters in these models. The limiting distribution of these estimators is established here, and their asymptotic efficiency is compared with that of the corresponding maximum likelihood estimators. © 2017 The Authors. Stat published by John Wiley & Sons Ltd.
      PubDate: 2017-02-16T17:45:25.574195-05:00
      DOI: 10.1002/sta4.136
  • A procedure to detect general association based on concentration of ranks
    • Authors: Pratyaydipta Rudra; Yihui Zhou, Fred A. Wright
      Pages: 88 - 101
      Abstract: In modern high-throughput applications, it is important to identify pairwise associations between variables and desirable to use methods that are powerful and sensitive to a variety of association relationships. We describe RankCover, a new non-parametric test of association between two variables that measures the concentration of paired ranked points. Here, “concentration” is quantified using a disk-covering statistic similar to those employed in spatial data analysis. Considerations from the theory of Boolean coverage processes provide motivation, as well as an R2-like quantity to summarize strength of association. Analysis of simulated and real datasets demonstrates that the method is robust and often powerful in comparison with competing general association tests. Copyright © 2017 John Wiley & Sons, Ltd.
      PubDate: 2017-02-16T17:50:29.948074-05:00
      DOI: 10.1002/sta4.138
  • Accurate logistic variational message passing: algebraic and numerical details
    • Authors: Tui H. Nolan; Matt P. Wand
      Pages: 102 - 112
      Abstract: We provide full algebraic and numerical details required for fitting accurate logistic likelihood regression-type models via variational message passing with factor graph fragments. Existing methodology of this type involves the Jaakkola–Jordan device, which is prone to poor accuracy. We examine two alternatives: the Saul–Jordan tilted bound device and conjugacy enforcement via multivariate normal prespecification of a key message. Both of these approaches appear in related literature. Our contributions facilitate immediate implementation within variational message passing schemes. Copyright © 2017 John Wiley & Sons, Ltd.
      PubDate: 2017-03-09T03:50:30.04134-05:00
      DOI: 10.1002/sta4.139
  • Classification of RNA-Seq data via Gaussian copulas
    • Authors: Qingyang Zhang
      Pages: 171 - 183
      Abstract: RNA-sequencing (RNA-Seq) has become a preferred option to quantify gene expression, because it is more accurate and reliable than microarrays. In RNA-Seq experiments, the expression level of a gene is measured by the count of short reads that are mapped to the gene region. Although some normal-based statistical methods may also be applied to log-transformed read counts, they are not ideal for directly modelling RNA-Seq data. Two discrete distributions, Poisson distribution and negative binomial distribution, have been commonly used in the literature to model RNA-Seq data, where the latter is a natural extension of the former with allowance of overdispersion. Because of the technical difficulty in modelling correlated counts, most existing classifiers based on discrete distributions assume that genes are independent of each other. However, as we show in this paper, the independence assumption may cause non-ignorable bias in estimating the discriminant score, making the classification inaccurate. To this end, we drop the independence assumption and explicitly model the dependence between genes using a Gaussian copula. We apply a Bayesian approach to estimate the covariance matrix and the overdispersion parameter in negative binomial distribution. Both synthetic data and real data are used to demonstrate the advantages of our model. Copyright © 2017 John Wiley & Sons, Ltd.
      PubDate: 2017-05-03T19:15:27.705224-05:00
      DOI: 10.1002/sta4.144
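The Bayesian estimation of the covariance matrix is beyond a short sketch, but the copula transform at the heart of the model is simple: push a count through its (mid) distribution function, then through the standard normal quantile, giving the latent Gaussian coordinate on which dependence is modelled. The negative binomial is parametrized here by number of successes `r` and success probability `p` (an illustrative toy parametrization, not necessarily the paper's):

```python
from math import comb
from statistics import NormalDist

def nb_cdf(x, r, p):
    """CDF of a negative binomial count (failures before the r-th success,
    success probability p); returns 0 for x < 0."""
    return sum(comb(k + r - 1, k) * (1 - p) ** k * p ** r for k in range(x + 1))

def normal_score(x, r, p):
    """Gaussian-copula transform of a count: mid distribution function
    u = (F(x-1) + F(x)) / 2, then the standard normal quantile."""
    u = 0.5 * (nb_cdf(x - 1, r, p) + nb_cdf(x, r, p))
    return NormalDist().inv_cdf(u)

z_low = normal_score(0, 2, 0.5)   # a small count maps to a low normal score
z_high = normal_score(5, 2, 0.5)  # a larger count maps to a higher score
```

The mid distribution function keeps `u` strictly inside (0, 1), which is one common way to handle the discreteness of counts in copula models.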
  • A semi-parametric stochastic generator for bivariate extreme events
    • Authors: Giulia Marcon; Philippe Naveau, Simone Padoan
      Pages: 184 - 201
      Abstract: The analysis of multiple extreme values aims to describe the stochastic behaviour of observations in the joint upper tail of a distribution function. For instance, being able to simulate multivariate extreme events is convenient for end users who need a large number of random replications of extremes as input of a given complex system to test its sensitivity. The simulation of multivariate extremes is often based on the assumption that the dependence structure, the so-called extremal dependence function, is described by a specific parametric model. We propose a simulation method for sampling bivariate extremes, under the assumption that the extremal dependence function is semi-parametric. This yields a flexible tool that can be broadly applied in real-data analyses. With the aim of estimating the probability of belonging to some extreme sets, our methodology is examined via simulation and illustrated by an analysis of strong wind gusts in France. Copyright © 2017 John Wiley & Sons, Ltd.
      PubDate: 2017-05-11T23:50:36.905459-05:00
      DOI: 10.1002/sta4.145
  • Correction: Correcting for non-ignorable missingness in smoking trends
    • Authors: Juho Kopra; Tommi Härkänen, Hanna Tolonen, Juha Karvanen
      Pages: 202 - 203
      PubDate: 2017-05-22T18:30:26.999636-05:00
      DOI: 10.1002/sta4.146
  • When is the mode functional the Bayes classifier?
    • Authors: Tilmann Gneiting
      Pages: 204 - 206
      Abstract: In classification problems, the mode of the conditional probability distribution, that is, the most probable category, is the Bayes classifier under zero-one or misclassification loss. Under any other cost structure, the mode fails to persist. © 2017 The Authors. Stat published by John Wiley & Sons Ltd.
      PubDate: 2017-06-06T00:46:11.398685-05:00
      DOI: 10.1002/sta4.148
  • Posterior convergence rates for high-dimensional precision matrix
           estimation using G-Wishart priors
    • Authors: Sayantan Banerjee
      Pages: 207 - 217
      Abstract: We study the posterior convergence behaviour of a precision matrix corresponding to a Gaussian graphical model in the high-dimensional set-up under sparsity assumptions. Recent works include studying posterior convergence rates of precision matrices assuming an approximate banding structure, and the extension of such results to arbitrary decomposable graphical models using a transformation to the Cholesky factor of the precision matrices. In this paper, we study the same for the wider class of arbitrary decomposable graphical models under similar sparsity assumptions using a G-Wishart prior, but without the complications of using a Cholesky factor, and arrive at identical posterior convergence rates. Copyright © 2017 John Wiley & Sons, Ltd.
      PubDate: 2017-06-09T07:16:56.548528-05:00
      DOI: 10.1002/sta4.147
  • Covariance analysis for temporal data, with applications to DNA modelling
    • Authors: Ian Dryden; Blake Hill, Hao Wang, Charles Laughton
      Pages: 218 - 230
      Abstract: We introduce methodology for analysing the mean size-and-shape and covariance matrix of landmark data that are collected over time. Motivated by a study of DNA damage, we study some permutation-based tests for investigating significant differences in the structure of the mean and the variability/covariance of size and shape of point sets that evolve over time. The covariance matrix tests make use of some recently introduced metrics for comparing covariance matrices. We demonstrate that the tests have the correct significance level in various simulation studies, and we also investigate the relative power of the tests. Finally, we apply the procedures to the DNA datasets, providing practical insights into different types of DNA damage. Copyright © 2017 John Wiley & Sons, Ltd.
      PubDate: 2017-06-13T21:45:24.799838-05:00
      DOI: 10.1002/sta4.149
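The permutation-testing logic is generic; a sketch with a scalar statistic follows. An absolute difference of means stands in for the paper's size-and-shape and covariance-metric statistics, which are beyond a short example:

```python
import random

def permutation_test(x, y, stat, n_perm=2000, seed=0):
    """Two-sample permutation test: the p-value is the proportion of
    relabelled samples whose statistic is at least as extreme as the
    observed one (with the +1 correction so p is never exactly 0)."""
    rng = random.Random(seed)
    observed = stat(x, y)
    pooled = x + y
    n = len(x)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if stat(pooled[:n], pooled[n:]) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)

diff_means = lambda a, b: abs(sum(a) / len(a) - sum(b) / len(b))
# Two clearly separated groups: the test should reject.
p = permutation_test([1.0, 1.2, 0.9, 1.1, 1.0] * 4,
                     [2.0, 2.1, 1.9, 2.2, 2.0] * 4, diff_means)
```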
  • Distance-weighted discrimination of face images for gender classification
    • Authors: Mónica Benito; Eduardo García-Portugués, J. S. Marron, Daniel Peña
      Pages: 231 - 240
      Abstract: We illustrate the advantages of distance-weighted discrimination for classification and feature extraction in a high-dimension low sample size (HDLSS) situation. The HDLSS context is a gender classification problem of face images in which the dimension of the data is several orders of magnitude larger than the sample size. We compare distance-weighted discrimination with Fisher's linear discriminant, support vector machines and principal component analysis by exploring their classification interpretation through insightful visuanimations and by examining the classifiers' discriminant errors. This analysis enables us to make new contributions to the understanding of the drivers of human discrimination between men and women. Copyright © 2017 John Wiley & Sons, Ltd.
      PubDate: 2017-08-08T00:10:44.878913-05:00
      DOI: 10.1002/sta4.151
  • The modified Matérn process
    • Authors: Ian Laga; William Kleiber
      Pages: 241 - 247
      Abstract: The behaviour of a stationary random field can be specified through either its covariance or spectrum. In spatial statistics, the Matérn covariance or spectral density is one of the most popular choices due to separation of scale and smoothness effects. We propose a generalization of the Matérn spectral density, generating random processes we term as modified Matérn processes. Our proposal allows for two additional parameters that can loosely be interpreted as arising from a continuous moving average process. The Matérn is a special case under certain parameter restrictions. We illustrate the flexibility of the modified Matérn in an application on an ocean model simulation. Copyright © 2017 John Wiley & Sons, Ltd.
      PubDate: 2017-08-08T00:40:26.47124-05:00
      DOI: 10.1002/sta4.152
  • Frequentist and Bayesian inference for Gaussian–log-Gaussian wavelet
           trees and statistical signal processing applications
    • Authors: Robert Dahl Jacobsen; Jesper Møller
      Pages: 248 - 256
      Abstract: We introduce new estimation methods for a subclass of the Gaussian scale mixture models for wavelet trees by Wainwright, Simoncelli and Willsky that rely on modern results for composite likelihoods and approximate Bayesian inference. Our methodology is illustrated for denoising and edge detection problems in two-dimensional images. Copyright © 2017 John Wiley & Sons, Ltd.
      PubDate: 2017-08-08T00:36:44.42698-05:00
      DOI: 10.1002/sta4.156
  • Consistency of the generalized likelihood ratio test for heteroscedastic
           normal mixtures
    • Authors: Wenhua Jiang
      Pages: 257 - 270
      Abstract: We study a generalized likelihood ratio test for heteroscedastic normal mixtures. The test is based on the generalized maximum likelihood estimator in the context of demixing. We prove that the test is consistent throughout the detectable region of the sparse heteroscedastic mixtures. The test requires specification of a lower bound for the variances. We study the effect of the misspecification of the lower bound. It turns out that the misspecification does not have an adverse effect on the power of the test. We demonstrate the satisfactory power of the test in detecting non-null components by comparing with several state-of-the-art procedures. Copyright © 2017 John Wiley & Sons, Ltd.
      PubDate: 2017-08-09T20:40:53.861225-05:00
      DOI: 10.1002/sta4.154
  • A determinant-free method to simulate the parameters of large Gaussian models
    • Authors: Louis Ellam; Heiko Strathmann, Mark Girolami, Iain Murray
      Pages: 271 - 281
      Abstract: We propose a determinant-free approach for simulation-based Bayesian inference in high-dimensional Gaussian models. We introduce auxiliary variables with covariance equal to the inverse covariance of the model. The joint probability of the auxiliary model can be computed without evaluating determinants, which are often hard to compute in high dimensions. We develop a Markov chain Monte Carlo sampling scheme for the auxiliary model that requires no more than the application of inverse-matrix-square-roots and the solution of linear systems. These operations can be performed at large scales with rational approximations. We provide an empirical study on both synthetic and real-world data for sparse Gaussian processes and for large-scale Gaussian Markov random fields. Copyright © 2017 John Wiley & Sons, Ltd.
      PubDate: 2017-08-15T00:45:45.46789-05:00
      DOI: 10.1002/sta4.153
  • Explicit, identical maximum likelihood estimates for some cyclic Gaussian
           and cyclic Ising models
    • Authors: Giovanni M. Marchetti; Nanny Wermuth
      Pages: 282 - 291
      Abstract: Cyclic models are a subclass of graphical Markov models with simple, undirected probability graphs that are chordless cycles. In general, all currently known distributions require iterative procedures to obtain maximum likelihood estimates in such cyclic models. For exponential families, the relevant conditional independence constraint for a variable pair, given all remaining variables, is captured by vanishing canonical parameters involving this pair. For Gaussian models, the canonical parameter is a concentration, that is, an off-diagonal element in the inverse covariance matrix, while for Ising models, it is a conditional log-linear, two-factor interaction. We give conditions under which the two different likelihood functions, that is, one for continuous and one for binary variables, permit nevertheless explicit maximum likelihood estimates, and we show that their estimated correlation matrices are identical, provided the relevant starting correlation matrices coincide. Copyright © 2017 John Wiley & Sons, Ltd.
      PubDate: 2017-08-16T23:05:56.475774-05:00
      DOI: 10.1002/sta4.155
  • Visualizing uncertainty in areal data with bivariate choropleth maps, map
           pixelation and glyph rotation
    • Authors: Lydia R. Lucchesi; Christopher K. Wikle
      Pages: 292 - 302
      Abstract: In statistics, we quantify uncertainty to help determine the accuracy of estimates, yet this crucial piece of information is rarely included on maps visualizing areal data estimates. We develop and present three approaches to include uncertainty on maps: (1) the bivariate choropleth map repurposed to visualize uncertainty; (2) the pixelation of counties to include values within an estimate's margin of error; and (3) the rotation of a glyph, located at a county's centroid, to represent an estimate's uncertainty. The second method is presented as both a static map and visuanimation. We use American Community Survey estimates and their corresponding margins of error to demonstrate the methods and highlight the importance of visualizing uncertainty in areal data. An extensive online supplement provides the R code necessary to produce the maps presented in this article as well as alternative versions of them. Copyright © 2017 John Wiley & Sons, Ltd.
      PubDate: 2017-08-22T22:41:58.234516-05:00
      DOI: 10.1002/sta4.150
  • Baum–Welch algorithm on directed acyclic graph for mixtures with
           latent Bayesian networks
    • Authors: Jia Li; Lin Lin
      Pages: 303 - 314
      Abstract: We consider a mixture model with latent Bayesian network (MLBN) for a set of random vectors X(t) ∈ R^(d_t), t = 1, …, T. Each X(t) is associated with a latent state s_t, given which X(t) is conditionally independent from other variables. The joint distribution of the states is governed by a Bayes net. Although specific types of MLBN have been used in diverse areas such as biomedical research and image analysis, the exact expectation–maximization (EM) algorithm for estimating the models can involve visiting all the combinations of states, yielding exponential complexity in the network size. A prominent exception is the Baum–Welch algorithm for the hidden Markov model, where the underlying graph topology is a chain. We hereby develop a new Baum–Welch algorithm on directed acyclic graph (BW-DAG) for the general MLBN and prove that it is an exact EM algorithm. BW-DAG provides insight on the achievable complexity of EM. For a tree graph, the complexity of BW-DAG is much lower than that of the brute-force EM. Copyright © 2017 John Wiley & Sons, Ltd.
      PubDate: 2017-09-17T23:55:50.060241-05:00
      DOI: 10.1002/sta4.158
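The chain special case the abstract singles out — the classical forward–backward recursions that form the E-step of Baum–Welch — can be sketched directly; BW-DAG generalizes these message passes from a chain to an arbitrary DAG:

```python
def forward_backward(pi, A, B, obs):
    """Forward-backward recursions for a chain-structured HMM.
    pi: initial state probabilities, A[i][j]: transition probability
    i -> j, B[i][o]: probability of emitting symbol o from state i.
    Returns the E-step posteriors p(s_t = i | obs) and the evidence."""
    n, T = len(pi), len(obs)
    # forward pass: alpha[t][i] = p(obs[0..t], s_t = i)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for t in range(1, T):
        alpha.append([B[j][obs[t]] * sum(alpha[t - 1][i] * A[i][j]
                                         for i in range(n))
                      for j in range(n)])
    # backward pass: beta[t][i] = p(obs[t+1..T-1] | s_t = i)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n))
    evidence = sum(alpha[T - 1][i] for i in range(n))
    gamma = [[alpha[t][i] * beta[t][i] / evidence for i in range(n)]
             for t in range(T)]
    return gamma, evidence

gamma, ev = forward_backward([0.5, 0.5],
                             [[0.9, 0.1], [0.1, 0.9]],
                             [[0.8, 0.2], [0.2, 0.8]],
                             [0, 0, 1])
```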
  • An ensemble quadratic echo state network for non-linear spatio-temporal forecasting
    • Authors: Patrick L. McDermott; Christopher K. Wikle
      Pages: 315 - 330
      Abstract: Spatio-temporal data and processes are prevalent across a wide variety of scientific disciplines. These processes are often characterized by non-linear time dynamics that include interactions across multiple scales of spatial and temporal variability. The datasets associated with many of these processes are increasing in size because of advances in automated data measurement, management and numerical simulator output. Non-linear spatio-temporal models have only recently seen interest in statistics, but there are many classes of such models in the engineering and geophysical sciences. Traditionally, these models are more heuristic than those that have been presented in the statistics literature but are often intuitive and quite efficient computationally. We show here that with fairly simple, but important, enhancements, the echo state network machine learning approach can be used to generate long-lead forecasts of non-linear spatio-temporal processes, with reasonable uncertainty quantification, and at a fraction of the computational expense of traditional parametric non-linear spatio-temporal models. Copyright © 2017 John Wiley & Sons, Ltd.
      PubDate: 2017-10-16T02:21:47.704195-05:00
      DOI: 10.1002/sta4.160
  • Fast and accurate Bayesian model criticism and conflict diagnostics using R-INLA
    • Authors: Egil Ferkingstad; Leonhard Held, Håvard Rue
      Pages: 331 - 344
      Abstract: Bayesian hierarchical models are increasingly popular for realistic modelling and analysis of complex data. This trend is accompanied by the need for flexible, general and computationally efficient methods for model criticism and conflict detection. Usually, a Bayesian hierarchical model incorporates a grouping of the individual data points, as, for example, with individuals in repeated measurement data. In such cases, the following question arises: Are any of the groups “outliers,” or in conflict with the remaining groups? Existing general approaches aiming to answer such questions tend to be extremely computationally demanding when model fitting is based on Markov chain Monte Carlo. We show how group-level model criticism and conflict detection can be carried out quickly and accurately through integrated nested Laplace approximations (INLA). The new method is implemented as a part of the open-source R-INLA package for Bayesian computing. Copyright © 2017 John Wiley & Sons, Ltd.
      PubDate: 2017-10-16T02:26:02.51441-05:00
      DOI: 10.1002/sta4.163
  • Uncertainty of a detected spatial cluster in 1D: quantification and visualization
    • Authors: Junho Lee; Ronald E. Gangnon, Jun Zhu, Jingjing Liang
      Pages: 345 - 359
      Abstract: Spatial cluster detection is an important problem in a variety of scientific disciplines such as environmental sciences, epidemiology and sociology. However, there appears to be very limited statistical methodology for quantifying the uncertainty of a detected cluster. In this paper, we develop a new method for the quantification and visualization of uncertainty associated with a detected cluster. Our approach is to define a confidence set for the true cluster and to visualize the confidence set, based on the maximum likelihood, in time or in one-dimensional space. We evaluate the pivotal property of the statistic used to construct the confidence set and the coverage rate for the true cluster via empirical distributions. For illustration, our methodology is applied to both simulated data and an Alaska boreal forest dataset. Copyright © 2017 John Wiley & Sons, Ltd.
      PubDate: 2017-10-19T01:57:50.139512-05:00
      DOI: 10.1002/sta4.161
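As a rough illustration of the confidence-set idea in the abstract above, the sketch below scans intervals of 1D Poisson counts, detects the maximum-likelihood cluster, and collects every interval whose profile log-likelihood lies within a cutoff of the maximum. The grid size, rates and chi-square(2) cutoff are all assumptions invented for this toy example; the paper instead calibrates the set via empirical distributions of the statistic.

```python
import numpy as np

rng = np.random.default_rng(3)

# 1D Poisson counts on a grid; the rate is elevated on a true cluster interval.
G, base, rr = 60, 2.0, 2.5
true = (20, 35)                              # true cluster: cells 20..34
lam = np.full(G, base)
lam[true[0]:true[1]] *= rr
y = rng.poisson(lam)

def loglik(y, interval):
    """Profile Poisson log-likelihood with rates estimated inside/outside
    the candidate interval (constant terms dropped)."""
    i, j = interval
    ll = 0.0
    for seg in (y[i:j], np.concatenate([y[:i], y[j:]])):
        m = seg.mean() if seg.size else 0.0
        if m > 0:
            ll += np.sum(seg * np.log(m)) - m * seg.size
    return ll

intervals = [(i, j) for i in range(G) for j in range(i + 5, G + 1)]
lls = np.array([loglik(y, iv) for iv in intervals])
best = intervals[int(np.argmax(lls))]        # detected (ML) cluster

# Confidence set: all intervals whose log-likelihood is within a cutoff of
# the maximum. A chi-square(2) quantile is used here as a crude stand-in for
# the empirically calibrated cutoff of the paper.
cutoff = 5.99 / 2
conf_set = [iv for iv, ll in zip(intervals, lls) if lls.max() - ll <= cutoff]
print("detected cluster:", best, "| confidence set size:", len(conf_set))
```

Visualizing `conf_set` (e.g. shading how often each cell appears in it) then gives a picture of the uncertainty surrounding the detected cluster.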
  • A comparison of resampling and recursive partitioning methods in random
           forest for estimating the asymptotic variance using the infinitesimal jackknife
    • Authors: Cole Brokamp; MB Rao, Patrick Ryan, Roman Jandarov
      Pages: 360 - 372
      Abstract: The infinitesimal jackknife (IJ) has recently been applied to the random forest to estimate its prediction variance. The underlying theorems were verified under a traditional random forest framework that uses classification and regression trees and bootstrap resampling. However, random forests using conditional inference trees and subsampling have been found to be not prone to variable selection bias. Here, we conduct simulation experiments using a novel approach to explore the applicability of the IJ to random forests using variations on the resampling method and base learner. Test data points were simulated, and for each one, a random forest was trained on one hundred simulated training data sets using different combinations of resampling and base learners. Using conditional inference trees instead of traditional classification and regression trees, as well as subsampling instead of bootstrap sampling, resulted in a much more accurate estimation of the prediction variance when using the IJ. The random forest variations here have been incorporated into an open-source software package for the R programming language. Copyright © 2017 John Wiley & Sons, Ltd.
      PubDate: 2017-10-20T00:01:03.467459-05:00
      DOI: 10.1002/sta4.162
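The infinitesimal jackknife variance estimate discussed above can be sketched directly from bootstrap inbag counts and per-learner predictions via Var_IJ = Σ_i Cov_b(N_bi, t_b)². This is a minimal NumPy illustration, not the authors' implementation; the kernel-weighted base learner is a hypothetical stand-in for a regression tree.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data and a "forest" of bootstrap base learners.
n, B = 50, 500                       # training points, number of base learners
X = rng.uniform(-1, 1, n)
y = X + 0.1 * rng.normal(size=n)
x_test = 0.5                         # single test point

inbag = np.zeros((B, n))             # N[b, i]: times point i appears in resample b
preds = np.zeros(B)                  # t_b(x_test): prediction of base learner b

for b in range(B):
    idx = rng.integers(0, n, n)      # bootstrap resample
    inbag[b] = np.bincount(idx, minlength=n)
    # Hypothetical base learner: kernel-weighted average within the resample
    # (a cheap stand-in for a tree, purely for illustration).
    w = np.exp(-50 * (X[idx] - x_test) ** 2)
    preds[b] = np.average(y[idx], weights=w + 1e-12)

# Infinitesimal jackknife: Var_IJ = sum_i Cov_b(N_bi, t_b)^2
cov = ((inbag - inbag.mean(axis=0)) * (preds - preds.mean())[:, None]).mean(axis=0)
var_ij = np.sum(cov ** 2)
print(f"IJ variance estimate at x=0.5: {var_ij:.5f}")
```

Swapping the resampling line for subsampling without replacement, or the base learner for a conditional inference tree, reproduces the kind of variations the simulation study compares.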
  • Bias and estimation under misspecification of the risk period in
           self-controlled case series studies
    • Authors: Luis Fernando Campos; Damla Şentürk, Yanjun Chen, Danh V. Nguyen
      Pages: 373 - 389
      Abstract: The self-controlled case series (SCCS) method is useful for estimating the relative incidence (RI) of acute events, such as adverse events during a specified risk period following an exposure (e.g. the six-week period after vaccination or the 30-day period after an infection-related hospitalization). In practice, the "optimal" risk period is unknown and must be specified. To date, two approaches are available to guide the specification of the risk period, but neither fully utilizes the nature of the bias due to misspecification, which has not previously been characterized. Thus, we elucidate the bias of the SCCS estimate of the RI when the risk period is misspecified. We then propose a novel method that more effectively estimates the optimal risk period and the associated RI of adverse events. The new method incorporates information on the functional form of the bias. Simulation studies illustrate the efficacy of the proposed approach, with substantial reductions in bias and variance. The proposed method is illustrated with two SCCS studies that determine (1) the risk of idiopathic thrombocytopenic purpura after measles–mumps–rubella vaccination in children and (2) the risk of cardiovascular events after infection-related hospitalizations in older patients on dialysis. Copyright © 2017 John Wiley & Sons, Ltd.
      PubDate: 2017-10-20T00:16:23.478737-05:00
      DOI: 10.1002/sta4.166
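For the simplest setting the abstract describes, the SCCS relative-incidence estimate has a closed form: with a single shared risk period and no age effects, the conditional MLE of the RI reduces to the ratio of event rates inside versus outside the risk period. The sketch below is a hedged toy illustration; the observation window, risk-period length and case records are all invented for the example.

```python
# Toy SCCS data: each case has an observation window [0, T), one exposure at
# time e, a risk period of length r after exposure, and recorded event times.
T, r = 365.0, 42.0                   # one-year window, six-week risk period
cases = [
    {"exposure": 100.0, "events": [120.0, 300.0]},
    {"exposure": 200.0, "events": [210.0]},
    {"exposure": 50.0,  "events": [40.0, 60.0, 350.0]},
]

n_risk = n_ctrl = 0                  # event counts in / out of risk periods
t_risk = t_ctrl = 0.0                # person-time in / out of risk periods
for c in cases:
    start, end = c["exposure"], min(c["exposure"] + r, T)
    t_risk += end - start
    t_ctrl += T - (end - start)
    for ev in c["events"]:
        if start <= ev < end:
            n_risk += 1
        else:
            n_ctrl += 1

# Under a multiplicative Poisson model with a single shared risk period,
# the SCCS conditional MLE of the relative incidence is a simple rate ratio.
ri_hat = (n_risk / t_risk) / (n_ctrl / t_ctrl)
print(f"events in/out of risk period: {n_risk}/{n_ctrl}, RI estimate: {ri_hat:.2f}")
```

Misspecifying `r` shifts events between the numerator and denominator here, which is exactly the bias mechanism the paper characterizes and exploits.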
  • Spatial data fusion for large non-Gaussian remote sensing datasets
    • Authors: Hongxiang Shi; Emily L. Kang
      Pages: 390 - 404
      Abstract: Remote sensing data are playing a vital role in understanding the pattern of the Earth's geophysical processes in environmental and climate sciences. We propose a spatial data-fusion methodology that is able to take advantage of two (or potentially more) large remote sensing datasets with the exponential family of distributions. Our hierarchical model follows the generalized linear mixed model but also leverages a low-rank spatial random effects model to allow for flexible spatial covariance and cross-covariance structure. We take an empirical hierarchical modelling approach where any unknown parameters are estimated by maximum likelihood estimation via an efficient expectation–maximization algorithm. Through a Markov chain Monte Carlo algorithm, spatial predictions are obtained by generating samples from the empirical predictive distribution where the unknown parameters are substituted by the estimates. The performance of our proposed method is investigated through a simulation study and a real-data example. The results show that, by borrowing strength across complementary datasets, the proposed method improves spatial predictions reciprocally. Copyright © 2017 John Wiley & Sons, Ltd.
      PubDate: 2017-10-23T23:24:15.89288-05:00
      DOI: 10.1002/sta4.165
  • Robust non-parametric tests for imaging data based on data depth
    • Authors: Sara López-Pintado; Julia Wrobel
      Pages: 405 - 419
      Abstract: Research in many disciplines relies on the analysis of complex high-dimensional data sets. For example, in clinical neuroscience, large collections of brain images from different subjects are obtained by advanced scanning techniques to study variations in different neurological states. New tools are needed to analyse the main characteristics of these rich data sets. We consider the basic unit of observation to be a general function, which is defined on, and takes values in, spaces of arbitrary dimension. On the basis of a notion of depth for general functions denoted as multivariate volume depth (MVD), images can be ranked from the centre outward and robust estimators can be defined. The theoretical properties of MVD are established, and several non-parametric depth-based permutation tests for comparing two groups of images are proposed; in particular, we introduce two-sample location tests based on MVD. In addition, dispersion measures for a sample of images are introduced and used for testing two-sample differences in dispersion. All the proposed tests are calibrated in an extensive simulation study. These statistical tools are applied to detect whether there are differences between the brain images of healthy individuals and those of patients with major depressive disorder. Copyright © 2017 John Wiley & Sons, Ltd.
      PubDate: 2017-10-30T00:40:27.486685-05:00
      DOI: 10.1002/sta4.168
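A minimal sketch of a depth-style two-sample permutation test for curves, using invented data. The pointwise median stands in for an MVD-based central curve, so this illustrates the permutation logic only, not the paper's actual depth notion.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two toy samples of "images" (noisy curves on a common 1D grid);
# sample B is shifted upward by 0.5.
grid = np.linspace(0, 1, 50)
A = np.sin(2 * np.pi * grid) + 0.3 * rng.normal(size=(20, 50))
B = np.sin(2 * np.pi * grid) + 0.5 + 0.3 * rng.normal(size=(20, 50))

def stat(X, Y):
    """Two-sample location statistic: L2 distance between robust central
    curves (pointwise medians stand in for depth-based medians here)."""
    return np.linalg.norm(np.median(X, axis=0) - np.median(Y, axis=0))

obs = stat(A, B)
pooled = np.vstack([A, B])
perm = np.empty(999)
for k in range(999):                 # permutation null distribution
    idx = rng.permutation(len(pooled))
    perm[k] = stat(pooled[idx[:20]], pooled[idx[20:]])
p_value = (1 + np.sum(perm >= obs)) / (1 + len(perm))
print(f"observed statistic {obs:.3f}, permutation p-value {p_value:.3f}")
```

Replacing `stat` with an MVD-based dispersion contrast would give the two-sample dispersion test mentioned in the abstract, with the same permutation calibration.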
  • Parallel Markov chain Monte Carlo for Bayesian dynamic item response
           models in educational testing
    • Authors: Zheng Wei; Xiaojing Wang, Erin Marie Conlon
      Pages: 420 - 433
      Abstract: Bayesian dynamic item response models have been successfully used for educational testing data; these models are especially useful for individually varying and irregularly spaced longitudinal testing data. However, because of the complexity of the models and the large size of the data sets, computation time is excessive for carrying out full data analyses in practice. Here, we introduce a parallel Markov chain Monte Carlo method to speed the implementation of these Bayesian models. Using both simulation data and real educational testing data for reading ability, we demonstrate that computation time is greatly reduced for our parallel computing method versus full data analyses. The estimated error of our method is shown to be small, using common distance metrics. Our parallel computing approach can be used for other models in the educational and psychometric fields, including Bayesian item response theory models. Copyright © 2017 John Wiley & Sons, Ltd.
      PubDate: 2017-11-02T01:25:26.879838-05:00
      DOI: 10.1002/sta4.164
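One simple embarrassingly parallel MCMC scheme, consensus Monte Carlo on a Gaussian toy model, illustrates the shard-then-combine idea behind such methods. This is an assumed stand-in, not the specific parallel algorithm the authors develop for dynamic item response models; for a Gaussian posterior the precision-weighted combination happens to be exact.

```python
import numpy as np

rng = np.random.default_rng(4)

# Full data: y ~ N(theta, sigma^2) with a flat prior, so the full-data
# posterior for theta is N(ybar, sigma^2/n). Split the data into S shards,
# sample each subposterior independently (each shard could run on its own
# worker), then combine draws by precision-weighted averaging.
sigma, n, S, draws = 1.0, 1000, 4, 5000
y = rng.normal(2.0, sigma, n)
shards = np.array_split(y, S)

sub = []
for sh in shards:
    # Subposterior for this shard is N(shard mean, sigma^2/len(shard));
    # we draw from it directly here instead of running an actual sampler.
    sub.append(rng.normal(sh.mean(), sigma / np.sqrt(len(sh)), draws))
sub = np.array(sub)                                   # shape (S, draws)

w = np.array([len(sh) / sigma**2 for sh in shards])   # subposterior precisions
combined = (w[:, None] * sub).sum(axis=0) / w.sum()   # consensus draws

print(f"full-data posterior mean {y.mean():.3f}, consensus mean {combined.mean():.3f}")
```

In a non-Gaussian model each shard would run its own MCMC chain, and the weighted combination becomes an approximation whose error the paper measures with common distance metrics.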
  • Empirical likelihood for linear structural equation models with dependent
           errors
    • Authors: Y. Samuel Wang; Mathias Drton
      Pages: 434 - 447
      Abstract: We consider linear structural equation models that are associated with mixed graphs. The structural equations in these models only involve observed variables, but their idiosyncratic error terms are allowed to be correlated and non-Gaussian. We propose empirical likelihood procedures for inference and suggest several modifications, including a profile likelihood, in order to improve tractability and performance of the resulting methods. Through simulations, we show that when the error distributions are non-Gaussian, the use of empirical likelihood and the proposed modifications may increase statistical efficiency and improve assessment of significance. Copyright © 2017 John Wiley & Sons, Ltd.
      PubDate: 2017-11-03T00:06:01.809416-05:00
      DOI: 10.1002/sta4.169
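To make the empirical likelihood machinery concrete in its simplest form (inference for a mean, not the structural equation setting of the paper), the sketch below solves the Lagrange-multiplier equation by Newton's method and returns the -2 log empirical likelihood ratio, which by Owen's theorem is asymptotically chi-square(1) at the true mean.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(1.0, 1.0, 100)        # toy sample with true mean 1

def el_logratio(x, mu, iters=50):
    """-2 log empirical likelihood ratio for a candidate mean mu.
    Finds the Lagrange multiplier lam solving sum((x-mu)/(1+lam*(x-mu))) = 0
    by Newton's method; the EL weights are w_i = 1/(n*(1+lam*(x_i-mu)))."""
    z = x - mu
    lam = 0.0
    for _ in range(iters):
        denom = 1.0 + lam * z
        f = np.sum(z / denom)
        fp = -np.sum(z**2 / denom**2)
        lam -= f / fp                # Newton step
    # -2 sum log(n*w_i) = 2 sum log(1 + lam*(x_i - mu))
    return 2.0 * np.sum(np.log1p(lam * z))

# Values below the chi-square(1) 95% quantile (3.84) are consistent with mu.
for mu in (0.8, 1.0, 1.2):
    print(f"mu={mu}: -2 log R = {el_logratio(x, mu):.2f}")
```

The paper's procedures impose the structural equation moment restrictions in place of the single mean constraint, and the profile-likelihood modification reduces the dimension of the resulting optimization.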
  • A class of flexible models for analysis of complex structured correlated
           data with application to clustered longitudinal data
    • Authors: Grace Y. Yi; Wenqing He, Haocheng Li
      Pages: 448 - 461
      Abstract: Generalized linear mixed models have been widely used in correlated data analysis. The applicability of these models is, however, hampered when data possess multilevel complex association structures. For instance, for longitudinal data arising in clusters, modelling complexity is a serious issue, and it is desirable to develop flexible models that are both computationally manageable and interpretatively meaningful. For these purposes, we propose a new class of flexible models, pairwise generalized linear mixed models, to handle correlated data that may possess multilevel complex association structures. Inferential procedures are developed to accommodate the proposed modelling framework, and asymptotic properties of the proposed method are established. The proposed models are evaluated through numerical studies. Copyright © 2017 John Wiley & Sons, Ltd.
      PubDate: 2017-11-07T21:11:40.042986-05:00
      DOI: 10.1002/sta4.159
School of Mathematical and Computer Sciences
Heriot-Watt University
Edinburgh, EH14 4AS, UK
Tel: +00 44 (0)131 4513762
Fax: +00 44 (0)131 4513327
JournalTOCs © 2009-