Stat
   Hybrid journal (may contain Open Access articles)
   ISSN (Online) 2049-1573
   Published by John Wiley and Sons
  • Uncovering smartphone usage patterns with multi-view mixed membership models
    • Authors: Seppo Virtanen; Mattias Rost, Alistair Morrison, Matthew Chalmers, Mark Girolami
      Abstract: We present a novel class of mixed membership models for combining information from multiple data sources, inferring inter-view and intra-view statistical associations. An important contemporary application of this work is the meaningful synthesis of data sources corresponding to smartphone application usage, app developers' descriptions and customer feedback. We demonstrate the ability of the model to infer meaningful, interpretable and informative app usage patterns based on the app usage data augmented with rich text data describing the apps. We provide quantitative model evaluations showing the model provides significantly better predictive ability than related existing methods. © 2016 The Authors. Stat published by John Wiley & Sons Ltd
      PubDate: 2016-01-21T19:05:00.383077-05:00
      DOI: 10.1002/sta4.103
  • Correlated components
    • Authors: Trevor F. Cox; David S. Arnold
      Abstract: Principal components analysis is a much used and practical technique for analysing multivariate data, finding a particular set of linear compounds of the variables under consideration, such that covariances between all pairs are 0. An alternative view is that when the variables are considered as axes in a Cartesian coordinate system, then principal components analysis is the particular orthogonal rotation of the axes that makes all the pairwise covariances equal to 0. It is this view that is taken here, but instead of finding the rotation that makes all covariances equal to 0, an orthogonal rotation is found that maximizes the sum of the covariances. The rotation is not unique, except in the two or three component case, and so another criterion can be optimized alongside it. The motivation is that two highly correlated components will tend to measure the same latent variable but with interesting differences because of the orthogonality between them. Theory is given for identifying the correlated components as well as algorithms for finding them. Two illustrative examples are provided, one involving gene expression data and the other consumer questionnaire data. Copyright © 2016 John Wiley & Sons, Ltd.
      PubDate: 2016-01-17T21:11:35.251747-05:00
      DOI: 10.1002/sta4.99
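The starting point of the entry above, that principal components analysis is the orthogonal rotation of the axes that zeroes all pairwise covariances, is easy to verify numerically. A minimal sketch with synthetic data (not the paper's gene-expression or questionnaire examples):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical correlated 3-variable data
X = rng.standard_normal((500, 3)) @ np.array([[1.0, 0.6, 0.2],
                                              [0.0, 1.0, 0.5],
                                              [0.0, 0.0, 1.0]])
Xc = X - X.mean(axis=0)

# PCA as an orthogonal rotation: eigenvectors of the sample covariance matrix
cov = np.cov(Xc, rowvar=False)
_, V = np.linalg.eigh(cov)       # columns of V form an orthogonal rotation
scores = Xc @ V                  # rotated data (the principal components)

# All pairwise covariances of the rotated variables are numerically zero
rotated_cov = np.cov(scores, rowvar=False)
off_diag = rotated_cov - np.diag(np.diag(rotated_cov))
print(np.max(np.abs(off_diag)))  # ~0
```

The paper's own proposal then replaces this zeroing rotation with one that maximizes the sum of covariances, which the sketch above does not implement.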
  • A genome-wide association study of multiple longitudinal traits with related subjects
    • Authors: Yubin Sung; Zeny Feng, Sanjeena Subedi
      Abstract: Pleiotropy is the phenomenon in which a single gene produces multiple correlated phenotypic effects, often characterized as traits, involving multiple biological systems. We propose a two-stage method to identify pleiotropic effects on multiple longitudinal traits from a family-based data set. The first stage analyses each longitudinal trait via a three-level mixed-effects model. Random effects at the subject level and at the family level measure the subject-specific genetic effects and between-subjects intraclass correlations within families, respectively. The second stage performs a simultaneous association test between a single nucleotide polymorphism and all subject-specific effects for multiple longitudinal traits. This is performed using a quasi-likelihood scoring method in which the correlation structure among related subjects is adjusted. Two simulation studies for the proposed method are undertaken to assess both the type I error control and the power. Furthermore, we demonstrate the utility of the two-stage method in identifying pleiotropic genes or loci by analysing the Genetic Analysis Workshop 16 Problem 2 cohort data drawn from the Framingham Heart Study and illustrate an example of the kind of complexity in data that can be handled by the proposed approach. We establish that our two-stage method can identify pleiotropic effects while accommodating varying data types in the model. Copyright © 2016 John Wiley & Sons, Ltd.
      PubDate: 2016-01-12T20:24:55.293787-05:00
      DOI: 10.1002/sta4.102
  • Estimation, filtering and smoothing in the stochastic conditional duration model: an estimating function approach
    • Authors: Ramanathan Thekke; Anuj Mishra, Bovas Abraham
      Abstract: Stochastic conditional duration models are widely used in the financial econometrics literature to model the duration between transactions in a financial market. Even though there have been developments in modelling aspects, estimation, filtering and smoothing are still being investigated by researchers in this area. Almost all the existing procedures are highly computationally intensive because of the complexity of the likelihood function. In this paper, we suggest a new procedure for estimation, filtering and smoothing in stochastic conditional duration models, based on estimating functions. Simulation studies indicate that the suggested procedure performs well and is also computationally fast. Copyright © 2016 John Wiley & Sons, Ltd.
      PubDate: 2016-01-12T19:21:32.593463-05:00
      DOI: 10.1002/sta4.101
  • Confidence bands for smoothness in nonparametric regression
    • Authors: Julian Faraway
      Abstract: The choice of the smoothing parameter in nonparametric regression is critical to the form of the estimated curve and any inference that follows. Many methods are available that will generate a single choice for this parameter. Here, we argue that the considerable uncertainty in this choice should be explicitly represented. The construction of standard simultaneous confidence bands in nonparametric regression often requires difficult mathematical arguments. We question their practical utility, presenting several deficiencies. We propose a new kind of confidence band that reflects the uncertainty regarding the smoothness of the estimate. Copyright © 2016 John Wiley & Sons, Ltd.
      PubDate: 2016-01-10T23:24:47.684078-05:00
      DOI: 10.1002/sta4.100
  • Issue Information
    • Abstract: No abstract is available for this article.
      PubDate: 2015-02-16T02:36:03.549675-05:00
      DOI: 10.1002/sta4.63
  • Correcting for non-ignorable missingness in smoking trends
    • Pages: 1 - 14
      Abstract: Data missing not at random (MNAR) are a major challenge in survey sampling. We propose an approach based on registry data to deal with non-ignorable missingness in health examination surveys. The approach relies on follow-up data available from administrative registers several years after the survey. For illustration, we use data on smoking prevalence in the Finnish National FINRISK study conducted in 1972–97. The data consist of measured survey information including missingness indicators, register-based background information and register-based time-to-disease survival data. The parameters of the missingness mechanism are estimable with these data although the original survey data are MNAR. The underlying data generation process is modelled by a Bayesian model. The results indicate that the estimated smoking prevalence rates in Finland may be significantly affected by missing data. Copyright © 2015 John Wiley & Sons, Ltd.
      PubDate: 2015-01-29T22:52:50.683143-05:00
      DOI: 10.1002/sta4.73
  • Spanifold: spanning tree flattening onto lower dimension
    • Authors: Shoja'eddin Chenouri; Petr Kobelevskiy, Christopher G. Small
      Pages: 15 - 31
      Abstract: Dimensionality reduction and manifold learning techniques attempt to recover a lower-dimensional submanifold from the data as encoded in high dimensions. Many techniques, linear or non-linear, have been introduced in the literature. Standard methods, such as Isomap and local linear embedding, map the high-dimensional data points into a low dimension so as to globally minimize a so-called energy function, which measures the mismatch between the precise geometry in high dimensions and the approximate geometry in low dimensions. However, the local effects of such minimizations are often unpredictable, because the energy minimization algorithms are global in nature. In contrast to these methods, the Spanifold algorithm of this paper constructs a tree on the manifold and flattens the manifold in such a way as to approximately preserve pairwise distance relationships within the tree. The vertices of this tree are the data points, and the edges of the tree form a subset of the edges of the nearest-neighbour graph on the data. In addition, the pairwise distances between data points close to the root of the tree undergo minimal distortion as the data are flattened. This allows the user to design the flattening algorithm so as to approximately preserve neighbour relationships in any chosen local region of the data. Copyright © 2015 John Wiley & Sons, Ltd.
      PubDate: 2015-02-23T04:25:55.709429-05:00
      DOI: 10.1002/sta4.74
  • Unbiased regression estimation under correlated linkage errors
    • Authors: Gunky Kim; Raymond Chambers
      Pages: 32 - 45
      Abstract: Linkage errors can occur when probability-based methods are used to link records from two or more distinct data sets corresponding to the same target population. Recent research on allowing for these errors when carrying out regression analysis based on linked data assumes that the linkage errors are independent when more than two data sets are used to generate these data. In this paper, we extend these results to accommodate the more realistic scenario of dependent linkage errors. Our simulation results show that an incorrect assumption of independent linkage errors can lead to insufficient linkage error bias correction, while an approach that allows for correlated linkage errors appears to overcome this problem. Copyright © 2015 John Wiley & Sons, Ltd.
      PubDate: 2015-03-02T06:49:03.886206-05:00
      DOI: 10.1002/sta4.76
  • Non-parametric Bayes to infer playing strategies adopted in a population of mobile gamers
    • Authors: Seppo Virtanen; Mattias Rost, Matthew Higgs, Alistair Morrison, Matthew Chalmers, Mark Girolami
      Pages: 46 - 58
      Abstract: Analysis of trace logging data collections of interactions of a heterogeneous and diverse population of consumers of digital software with mobile devices provides unprecedented possibilities for understanding how software is actually used and for finding recurring patterns of software usage over the population that are exhibited to a greater or lesser degree in each individual software user. In this work, we consider an elementary mobile game played by a population of mobile gamers and collect pieces of game sessions over an extended period, resulting in a collection of users' trace logs for multiple sessions. We develop a simple, yet flexible, non-parametric Bayes approach to infer playing strategies adopted in the population from the logged traces of game interactions. We demonstrate that our approach finds interpretable strategies and provides good predictive performance compared with alternative modelling assumptions using a non-parametric Bayes framework. Copyright © 2015 John Wiley & Sons, Ltd.
      PubDate: 2015-03-04T03:59:11.414744-05:00
      DOI: 10.1002/sta4.75
  • On sparse representation for optimal individualized treatment selection with penalized outcome weighted learning
    • Authors: Rui Song; Michael Kosorok, Donglin Zeng, Yingqi Zhao, Eric Laber, Ming Yuan
      Pages: 59 - 68
      Abstract: As a new strategy for treatment, which takes individual heterogeneity into consideration, personalized medicine is of growing interest. Discovering individualized treatment rules for patients who have heterogeneous responses to treatment is one of the important areas in developing personalized medicine. As more and more information per individual is being collected in clinical studies and not all of the information is relevant for treatment discovery, variable selection becomes increasingly important in discovering individualized treatment rules. In this article, we develop a variable selection method based on penalized outcome weighted learning, in which finding an optimal treatment rule is cast as a classification problem where each subject is weighted proportionally to his or her clinical outcome. We show that the resulting estimator of the treatment rule is consistent and establish variable selection consistency and the asymptotic distribution of the estimators. The performance of the proposed approach is demonstrated via simulation studies and an analysis of chronic depression data. Copyright © 2015 John Wiley & Sons, Ltd.
      PubDate: 2015-03-06T01:43:09.313932-05:00
      DOI: 10.1002/sta4.78
  • Optimal sample planning for system state analysis with partial data
    • Authors: Martin Heller; Jan Hannig, Malcolm R. Leadbetter
      Pages: 69 - 80
      Abstract: We develop optimal and computationally practical procedures to minimize uncertainty concerning the presence of dangerous levels of a contaminant within a building when neither replication nor complete data collection is feasible. More generally, we address inference about the state of a finite system when the state is related to information collected over components of the system when only partial data collection is feasible. When there is no correlation between sample locations, a simple random sample or maximum a priori trait presence would provide optimal sampling choices. When complicated probability models describe trait manifestation, the need to collect only partial data precludes a full fitting of complicated models, and one must rely heavily on prior information naturally leading to a Bayesian approach. Herein, we introduce a computationally efficient heuristic algorithm to simultaneously find optimal sample locations and decision rule parameterizations and then show that it drastically outperforms both random selection and maximum a priori methods. Copyright © 2015 John Wiley & Sons, Ltd.
      PubDate: 2015-03-27T04:04:11.619384-05:00
      DOI: 10.1002/sta4.79
  • Visuanimation in statistics
    • Pages: 81 - 96
      Abstract: This paper explores the use of visualization through animations, coined visuanimation, in the field of statistics. In particular, it illustrates the embedding of animations in the paper itself and the storage of larger movies in the online supplemental material. We present results from statistics research projects using a variety of visuanimations, ranging from exploratory data analysis of image data sets to spatio-temporal extreme event modelling; these include a multiscale analysis of classification methods, the study of the effects of a simulated explosive volcanic eruption and an emulation of climate model output. This paper serves as an illustration of visuanimation for future publications in Stat. Copyright © 2015 John Wiley & Sons, Ltd.
      PubDate: 2015-04-14T02:35:06.1195-05:00
      DOI: 10.1002/sta4.77
  • A new weighted likelihood approach
    • Authors: Adhidev Biswas; Tania Roy, Suman Majumder, Ayanendranath Basu
      Pages: 97 - 107
      Abstract: In this paper, we propose a new weighted likelihood procedure. Here, the weights are suitably calibrated functions of appropriately described residuals at each data point. The residuals describe the match (or mismatch) between the empirical distribution function and the model distribution function. If the match is high, the observation is considered to be a regular observation. But for large (in magnitude) residuals, there is a mismatch, and the corresponding likelihood score function may require downweighting in order to obtain a robust solution. As there is little or no downweighting for observations where there is no evidence of mismatch, asymptotically, we expect that there will be no downweighting under the pure model leading to highly efficient estimators. On the other hand, properly calibrated weight functions that penalize the observations with large residuals will lead to highly robust solutions under model misspecification and the presence of outliers. Copyright © 2015 John Wiley & Sons, Ltd.
      PubDate: 2015-04-21T02:13:36.208644-05:00
      DOI: 10.1002/sta4.80
  • Multivariate spatial hierarchical Bayesian empirical likelihood methods for small area estimation
    • Authors: Aaron T. Porter; Scott H. Holan, Christopher K. Wikle
      Pages: 108 - 116
      Abstract: Recent advances in small area estimation incorporating both explicit spatial autocorrelation and empirical likelihood techniques have produced estimates with greater precision. Furthermore, multivariate Fay–Herriot models take advantage of within-location correlation between multiple outcomes for a set of small areas. We extend the Fay–Herriot model to the spatially explicit multivariate setting by utilizing empirical likelihood techniques. We then model the five-year period estimates from the American Community Survey (2006–10) of the percent of unemployed individuals and the percent of families in poverty for the counties of Missouri. We demonstrate a bivariate reduction in leave-one-out median absolute deviation over an approximately equivalently specified parametric model. Copyright © 2015 John Wiley & Sons, Ltd.
      PubDate: 2015-05-04T22:50:51.111952-05:00
      DOI: 10.1002/sta4.81
  • A family of likelihood functions to make inferences about the reliability parameter for many stress-strength distributions
    • Pages: 117 - 129
      Abstract: Many research papers in the statistical literature address the estimation of the reliability parameter in stress-strength models, considering different types of distributions for stress and for strength. We have found that for many of these distributions, the corresponding profile likelihood functions of the reliability parameter can be grouped into a family of likelihood functions with a simple algebraic structure that facilitates making inferences about this parameter. The novel family of likelihood functions proposed here, together with maximum likelihood estimation procedures and suitable reparameterizations, was used to obtain a simple closed-form expression for the likelihood confidence interval of the reliability parameter. This new approach is particularly useful when small and/or unequal sample sizes are involved. Simulation studies for some distributions were carried out to illustrate the performance of the likelihood confidence intervals for the reliability parameter, and adequate coverage frequencies were obtained. The simplicity of our unifying proposal is shown here using three stress-strength distributions that have been analysed individually in the statistical literature. However, there are many distributions for which inferences about the reliability parameter could be easily obtained using the proposed family. Copyright © 2015 John Wiley & Sons, Ltd.
      PubDate: 2015-05-27T02:16:01.221211-05:00
      DOI: 10.1002/sta4.83
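The reliability parameter in a stress-strength model is R = P(X &lt; Y) for stress X and strength Y. A minimal sketch of maximum likelihood plug-in estimation for one of the simplest cases, exponential stress and strength (an assumption chosen for illustration because the closed form is elementary; the paper covers a broader family and full likelihood intervals):

```python
import numpy as np

rng = np.random.default_rng(1)
a_true, b_true = 2.0, 1.0                # hypothetical rates for stress X and strength Y
R_true = a_true / (a_true + b_true)      # P(X < Y) = a/(a+b) for Exp(a), Exp(b)

x = rng.exponential(1.0 / a_true, size=5000)   # observed stresses
y = rng.exponential(1.0 / b_true, size=5000)   # observed strengths

# MLE plug-in: rate estimates are 1/sample mean, so R_hat = a_hat/(a_hat + b_hat)
a_hat, b_hat = 1.0 / x.mean(), 1.0 / y.mean()
R_hat = a_hat / (a_hat + b_hat)

# Monte Carlo check against the closed form
R_mc = np.mean(rng.exponential(1.0 / a_true, 20000) < rng.exponential(1.0 / b_true, 20000))
print(R_true, round(R_hat, 3), round(R_mc, 3))
```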
  • Random effects model for bias estimation: higher-order asymptotic
    • Authors: Andrew L. Rukhin
      Pages: 130 - 139
      Abstract: A common issue in physical, chemical and biometrical applications is to validate a laboratory's method. For that purpose, a lab performs measurements on a certified reference material with a given coverage interval. These reference materials are a major tool for assuring quality and reliability of results obtained by a lab in analysis and testing. Assuming that the measurand is random with a normal distribution whose parameters are obtained from the reference material certificate, new remarkably accurate confidence intervals for the bias are derived. These procedures are based on modern higher-order asymptotic statistical methods. Published 2015. This article is a U.S. Government work and is in the public domain in the USA.
      PubDate: 2015-05-31T21:02:51.971205-05:00
      DOI: 10.1002/sta4.82
  • Modelling space–time varying ENSO teleconnections to droughts in North America
    • Authors: InKyung Choi; Bo Li, Hao Zhang, Yun Li
      Pages: 140 - 156
      Abstract: Teleconnection in atmospheric science refers to a significant correlation between climate anomalies in widely separated regions (typically thousands of kilometres), and it is often considered to be responsible for extreme weather conditions occurring simultaneously over large distances. In this paper, we study the influence of the El Niño-Southern Oscillation teleconnection on meteorological droughts represented by the Palmer drought severity index across North America from 1870 to 1990. We develop a flexible statistical framework based on spatial random effects to model the covariance (teleconnection) between winter (October–March) sea surface temperature in the tropical Pacific and summer (June–August) droughts in North America. Our model allows us to analyse the dynamic pattern of teleconnection over space and time, and results indicate that the influence of El Niño-Southern Oscillation teleconnections on droughts varies spatially and temporally across North America. We further provide the time-varying teleconnection estimates with their uncertainties for 12 subregions in North America. Copyright © 2015 John Wiley & Sons, Ltd.
      PubDate: 2015-06-09T19:30:07.090421-05:00
      DOI: 10.1002/sta4.85
  • Preconditioning for classical relationships: a note relating ridge
    • Authors: Karl Rohe
      Pages: 157 - 166
      Abstract: When the design matrix has orthonormal columns, “soft thresholding” the ordinary least squares solution produces the Lasso solution. If one uses the Puffer preconditioned Lasso, then this result generalizes from orthonormal designs to full rank designs (Theorem 1). Theorem 2 refines the Puffer preconditioner to make the Lasso select the same model as removing the elements of the ordinary least squares solution with the largest p-values. Using a generalized Puffer preconditioner, Theorem 3 relates ridge regression to the preconditioned Lasso; this result is for the high-dimensional setting, p > n. Where the standard Lasso is akin to forward selection, Theorems 1, 2, and 3 suggest that the preconditioned Lasso is more akin to backward elimination. These results extend beyond the Lasso penalty: for a broad class of sparse and non-convex techniques (e.g. SCAD and MC+), the results hold for all local minima. Copyright © 2015 John Wiley & Sons, Ltd.
      PubDate: 2015-06-09T19:30:52.330174-05:00
      DOI: 10.1002/sta4.86
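The orthonormal-design fact the abstract above starts from (soft thresholding the OLS solution gives the Lasso solution) is easy to check numerically. The sketch below solves the Lasso by proximal gradient descent rather than the paper's Puffer preconditioner; the design, coefficients and penalty level are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 5

# Orthonormal design: X^T X = I_p, obtained via QR of a random matrix
X, _ = np.linalg.qr(rng.standard_normal((n, p)))
beta = np.array([3.0, -2.0, 0.5, 0.0, 0.0])
y = X @ beta + 0.1 * rng.standard_normal(n)

alpha = 0.01   # penalty in (1/(2n))*||y - Xb||^2 + alpha*||b||_1

def soft(z, t):
    """Soft-thresholding operator S_t(z)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

# OLS solution for an orthonormal design is simply X^T y
ols = X.T @ y

# Lasso via proximal gradient (ISTA); step size t = n, since the
# gradient's Lipschitz constant is lambda_max(X^T X)/n = 1/n here
b = np.zeros(p)
for _ in range(50):
    b = soft(b + X.T @ (y - X @ b), n * alpha)

# Orthonormal case: the Lasso equals the soft-thresholded OLS solution
print(np.max(np.abs(b - soft(ols, n * alpha))))  # ~0
```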
  • Covariance models on the surface of a sphere: when does it matter?
    • Authors: Jaehong Jeong; Mikyoung Jun
      Pages: 167 - 182
      Abstract: There is a growing interest in developing covariance functions for processes on the surface of a sphere because of the wide availability of data on the globe. Utilizing the one-to-one mapping between the Euclidean distance and the great circle distance, isotropic and positive definite functions in a Euclidean space can be used as covariance functions on the surface of a sphere. This approach, however, may result in physically unrealistic distortion on the sphere especially for large distances. We consider several classes of parametric covariance functions on the surface of a sphere, defined with either the great circle distance or the Euclidean distance, and investigate their impact upon spatial prediction. We fit several isotropic covariance models to simulated data as well as real data from National Centers for Environmental Prediction (NCEP)/National Center for Atmospheric Research (NCAR) reanalysis on the sphere. We demonstrate that covariance functions originally defined with the Euclidean distance may not be adequate for some global data. Copyright © 2015 John Wiley & Sons, Ltd.
      PubDate: 2015-06-10T20:22:26.94822-05:00
      DOI: 10.1002/sta4.84
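The large-distance distortion the abstract above warns about comes from the gap between the Euclidean (chordal) distance through the sphere and the great circle distance along it. A small sketch makes it concrete (Earth radius and test points are illustrative assumptions):

```python
import numpy as np

R = 6371.0  # Earth radius in km (illustrative)

def gc_dist(lat1, lon1, lat2, lon2):
    """Great circle distance via the haversine formula (radians in, km out)."""
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = np.sin(dlat / 2)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2)**2
    return 2 * R * np.arcsin(np.sqrt(h))

def chordal_dist(lat1, lon1, lat2, lon2):
    """Euclidean (chordal) distance through the sphere's interior."""
    to_xyz = lambda lat, lon: R * np.array([np.cos(lat) * np.cos(lon),
                                            np.cos(lat) * np.sin(lon),
                                            np.sin(lat)])
    return np.linalg.norm(to_xyz(lat1, lon1) - to_xyz(lat2, lon2))

# Nearby points on the equator: the two distances nearly agree...
print(gc_dist(0.0, 0.0, 0.0, 0.01), chordal_dist(0.0, 0.0, 0.0, 0.01))
# ...but for antipodal points the chord (2R) badly understates the arc (pi*R)
print(gc_dist(0.0, 0.0, 0.0, np.pi), chordal_dist(0.0, 0.0, 0.0, np.pi))
```

Plugging the chordal distance into a covariance function therefore compresses all separations into [0, 2R], which is one source of the distortion studied in the paper.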
  • Accelerated non-parametrics for cascades of Poisson processes
    • Authors: Chris J. Oates
      Pages: 183 - 195
      Abstract: Cascades of Poisson processes are probabilistic models for spatio-temporal phenomena in which (i) previous events may trigger subsequent events and (ii) both the background and triggering processes are conditionally Poisson. Such phenomena are typically “data rich but knowledge poor,” in the sense that large datasets are available, yet a mechanistic understanding of the background and triggering processes that generate the data is unavailable. In these settings, non-parametric estimation plays a central role. However, existing non-parametric estimators have computational and storage complexity O(N²), precluding their application on large datasets. Here, by assuming the triggering process acts only locally, we derive non-parametric estimators with computational complexity O(N log N) and storage complexity O(N). Our approach automatically learns the domain of the triggering process from data and is essentially free from hyperparameters. The methodology is applied to a large seismic dataset where estimation under existing algorithms would be infeasible. Copyright © 2015 John Wiley & Sons, Ltd.
      PubDate: 2015-08-06T12:32:10.741911-05:00
      DOI: 10.1002/sta4.87
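The locality assumption above, a triggering process with bounded temporal support W, is what breaks the O(N²) barrier: a binary search finds each event's window of admissible parents in O(log N), so one pass over all events is O(N log N). A rough sketch of that indexing step only (window width and event times are hypothetical; this is not the authors' full estimator):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical event times (e.g. earthquake occurrences), sorted
times = np.sort(rng.uniform(0.0, 1000.0, size=5000))

W = 5.0                              # assumed local support of the triggering kernel
bins = np.linspace(0.0, W, 21)       # lag histogram for a non-parametric kernel estimate
counts = np.zeros(len(bins) - 1)

# For each event, look back only over events within the window W: the binary
# search is O(log N), so the whole pass is O(N log N) rather than O(N^2).
for j, t in enumerate(times):
    i = np.searchsorted(times, t - W, side="left")
    lags = t - times[i:j]            # inter-event lags, all <= W by construction
    counts += np.histogram(lags, bins=bins)[0]

print(int(counts.sum()), "event pairs within the window")
```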
  • Figures of merit for simultaneous inference and comparisons in simulation
    • Authors: Noel Cressie; Sandy Burden
      Pages: 196 - 211
      Abstract: This article considers the traditional figures of merit, namely, bias and mean squared (prediction) error, which are typically used to evaluate simulation experiments. We propose functions of them that account for different variables' units; these alternative figures of merit are closely tied to simultaneous multivariate inference on an unknown parameter vector or unknown state vector. Their usefulness is illustrated in a simulation experiment, where the goal is to determine the statistical properties associated with prediction of a multivariate state. Copyright © 2015 John Wiley & Sons, Ltd.
      PubDate: 2015-08-06T22:39:50.972665-05:00
      DOI: 10.1002/sta4.88
  • Longitudinal functional data analysis
    • Authors: So Young Park; Ana-Maria Staicu
      Pages: 212 - 226
      Abstract: We consider dependent functional data that are correlated because of a longitudinal-based design: each subject is observed at repeated times and at each time, a functional observation (curve) is recorded. We propose a novel parsimonious modelling framework for repeatedly observed functional observations that allows one to extract low-dimensional features. The proposed methodology accounts for the longitudinal design, is designed to study the dynamic behaviour of the underlying process, allows prediction of the full future trajectory and is computationally fast. Theoretical properties of this framework are studied, and numerical investigations confirm excellent behaviour in finite samples. The proposed method is motivated by and applied to a diffusion tensor imaging study of multiple sclerosis. Copyright © 2015 John Wiley & Sons, Ltd.
      PubDate: 2015-08-24T21:53:57.461007-05:00
      DOI: 10.1002/sta4.89
  • Zeros and ones: a case for suppressing zeros in sensitive count data with an application to stroke mortality
    • Authors: Harrison Quick; Scott H. Holan, Christopher K. Wikle
      Pages: 227 - 234
      Abstract: In the current era of global internet connectivity, privacy concerns are of the utmost importance. When official statistical agencies collect spatially referenced, confidential data that they intend to release as public-use files, the suppression of small counts is a common measure that agencies take to protect the confidentiality of the data-subjects from ill-intentioned users. The goal of this paper is to demonstrate that an interval suppression criterion that does not suppress zeros can fail to protect regions with a single occurrence. We illustrate the difference in disclosure risk between an interval suppression criterion and a one-sided suppression criterion by considering a US county-level dataset composed of the number of deaths due to stroke in White men. Here, we illustrate that an interval suppression criterion leads to a twofold increase in the disclosure risk when compared with a one-sided suppression criterion for regions with a single incidence among a population of less than 600. We conclude with an extension of these findings beyond stroke mortality and by offering general guidelines for data suppression. Copyright © 2015 John Wiley & Sons, Ltd.
      PubDate: 2015-09-21T21:52:47.016667-05:00
      DOI: 10.1002/sta4.92
  • Examining statistical disclosure issues involving digital images of ROC
    • Authors: Gregory J. Matthews; Ofer Harel
      Pages: 235 - 245
      Abstract: It has been established that knowing the true values of the empirical receiver operating characteristic (ROC) curve (i.e. false-positive and true-positive rate pairs for all thresholds) along with a subset of the full data set consisting of n − 1 observations can cause unwanted disclosures. Here, we explore a similar problem with two main extensions. First, rather than knowledge of the true values of the empirical ROC curve, we start only with an image of the empirical ROC curve. Second, rather than considering only subsets of n − 1, we look at several differently sized subsets. Given this information (i.e. empirical ROC image and a subset of the full data set), we experimentally act as a data snooper and explore what can be learned about unobserved portions of the full data set. Copyright © 2015 John Wiley & Sons, Ltd.
      PubDate: 2015-10-01T22:39:51.116117-05:00
      DOI: 10.1002/sta4.93
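For reference, the empirical ROC curve the abstract above is concerned with is just the set of (false-positive rate, true-positive rate) pairs traced out as the decision threshold sweeps the observed scores. A minimal computation (scores and labels are made up, and tied scores are ignored for simplicity):

```python
import numpy as np

def empirical_roc(scores, labels):
    """(FPR, TPR) pairs of the empirical ROC curve over all thresholds."""
    order = np.argsort(-np.asarray(scores))   # descending scores
    labels = np.asarray(labels)[order]
    tps = np.cumsum(labels)                   # true positives as the threshold drops
    fps = np.cumsum(1 - labels)               # false positives
    tpr = tps / labels.sum()
    fpr = fps / (len(labels) - labels.sum())
    return fpr, tpr

# Hypothetical diagnostic scores and true disease labels
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.4])
labels = np.array([1,   1,   0,   1,   0,    0])
fpr, tpr = empirical_roc(scores, labels)
print(list(zip(fpr, tpr)))
```

Each step of this curve corresponds to one observation, which is why an image of it leaks information about the underlying data, the disclosure channel the paper studies.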
  • The perils of quasi-likelihood information criteria
    • Authors: Yishu Wang; Orla Murphy, Maxime Turgeon, ZhuoYu Wang, Sahir R. Bhatnagar, Juliana Schulz, Erica E. M. Moodie
      Pages: 246 - 254
      Abstract: In this paper, we consider some potential pitfalls of the growing use of quasi-likelihood-based information criteria for longitudinal data to select a working correlation structure in a generalized estimating equation framework. In particular, we examine settings where the fully conditional mean does not equal the marginal mean as well as hypothesis testing following selection of the working correlation matrix. Our results suggest that the use of any information criterion for selection of the working correlation matrix is inappropriate when the conditional mean model assumption is violated. We also find that the type I error rate differs from the nominal level in moderate sample sizes following selection of the form of the working correlation, but improves as the sample size increases, since the selection then concentrates on a single correlation structure. Our results serve to underline the potential dangers that can arise when using information criteria to select correlation structure in routine data analysis. Copyright © 2015 John Wiley & Sons, Ltd.
      PubDate: 2015-10-04T21:37:34.138179-05:00
      DOI: 10.1002/sta4.95
  • Spatio-temporal change of support with application to American Community Survey multi-year period estimates
    • Authors: Jonathan R. Bradley; Christopher K. Wikle, Scott H. Holan
      Pages: 255 - 270
      Abstract: We present hierarchical Bayesian methodology to perform spatio-temporal change of support (COS) for survey data with Gaussian sampling errors. This methodology is motivated by the American Community Survey (ACS), which is an ongoing survey administered by the US Census Bureau that provides timely information on several key demographic variables. The ACS has published 1-year, 3-year, and 5-year period estimates, and margins of errors, for demographic and socio-economic variables recorded over predefined geographies. The spatio-temporal COS methodology considered here provides data users with a way to estimate ACS variables on customized geographies and time periods while accounting for sampling errors. Additionally, 3-year ACS period estimates are to be discontinued, and this methodology can provide predictions of ACS variables for 3-year periods given the available period estimates. The methodology is based on a spatio-temporal mixed-effects model with a low-dimensional spatio-temporal basis function representation, which provides multi-resolution estimates through basis function aggregation in space and time. This methodology includes a novel parameterization that uses a target dynamical process and recently proposed parsimonious Moran's I propagator structures. Our approach is demonstrated through two applications using public-use ACS estimates and is shown to produce good predictions on a hold-out set of 3-year period estimates. Copyright © 2015 John Wiley & Sons, Ltd.
      PubDate: 2015-10-06T21:33:58.461836-05:
      DOI: 10.1002/sta4.94
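To illustrate the change-of-support idea in its crudest form: combining several 1-year period estimates (with their sampling errors) into a single multi-year estimate. This precision-weighted average is a toy stand-in for the paper's model-based Bayesian approach; the numbers are hypothetical:

```python
import numpy as np

# Hypothetical 1-year period estimates for one geography, with their
# sampling standard errors.
est = np.array([52.1, 53.4, 51.8])  # three consecutive 1-year estimates
se = np.array([1.2, 1.0, 1.3])      # sampling standard errors

# Precision-weighted combination over a 3-year window -- a crude stand-in
# for the spatio-temporal mixed-effects model described in the abstract.
w = 1.0 / se**2
est3 = np.sum(w * est) / np.sum(w)
se3 = np.sqrt(1.0 / np.sum(w))
print(round(est3, 2), round(se3, 2))
```

The combined standard error is smaller than any single-year error, which is the basic appeal of multi-year period estimates; the paper's contribution is doing this coherently across arbitrary geographies and time periods.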
  • The role of regimes in short-term wind speed forecasting at multiple wind
    • Authors: Karen Kazor; Amanda S. Hering
      Pages: 271 - 290
      Abstract: Large-scale integration of wind energy into electric utility systems requires accurate short-term wind speed forecasts. At these horizons, statistical models that account for spatial and temporal information have demonstrated improved accuracy over both physical models and statistical models that ignore spatial information. Off-site information can be incorporated by modelling wind speeds conditional on a set of regimes that capture the predominant wind patterns within a geographic region. Identifying these regimes is a crucial model-building step. Herein, we propose a new forecasting method that relies on regimes identified by fitting a Gaussian mixture model (GMM) to the wind vector, and we build regimes based on a single site, a local average of sites, and a region-wide average. We compare the performance of the models with GMM-identified regimes against three state-of-the-art reference models that each account for wind regimes differently. The models are evaluated at 30-minute, 1-hour, and 2-hour ahead horizons at ten sites across the Pacific Northwest. GMM regimes based on local information produce the best forecasts and have a significantly improved accuracy at a region-wide level over the state-of-the-art models. Even greater improvements are achieved when an average of the forecasts produced by each method is constructed. Copyright © 2015 John Wiley & Sons, Ltd.
      PubDate: 2015-10-26T23:42:29.207529-05:
      DOI: 10.1002/sta4.91
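The regime-identification step can be sketched with a minimal EM fit. This toy example fits a two-component, one-dimensional Gaussian mixture to a synthetic wind component and assigns hard regime labels; the paper's GMM is multivariate and the data are real, so this is only a conceptual stand-in:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic easterly/westerly regimes: u-component of the wind vector.
u = np.concatenate([rng.normal(-6, 1.5, 300), rng.normal(5, 2.0, 300)])

# Minimal 1-D, two-component EM for a Gaussian mixture.
mu, sd, pi = np.array([-1.0, 1.0]), np.array([1.0, 1.0]), np.array([0.5, 0.5])
for _ in range(50):
    # E-step: responsibility of each component for each observation
    dens = pi * np.exp(-0.5 * ((u[:, None] - mu) / sd) ** 2) / sd
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: update mixture weights, means and standard deviations
    nk = r.sum(axis=0)
    pi = nk / len(u)
    mu = (r * u[:, None]).sum(axis=0) / nk
    sd = np.sqrt((r * (u[:, None] - mu) ** 2).sum(axis=0) / nk)

regime = r.argmax(axis=1)  # hard regime labels used to condition forecasts
print(np.sort(mu))         # component means near the two regime centres
```

Once each time point carries a regime label, a separate forecasting model (or regime-specific coefficients) can be fitted within each regime, which is the conditioning described in the abstract.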
  • A mutual information approach to calculating nonlinearity
    • Authors: Reginald Smith
      Pages: 291 - 303
      Abstract: A new method to measure nonlinear dependence between two variables is described using mutual information to analyse the separate linear and nonlinear components of dependence. This technique, which gives an exact value for the proportion of linear dependence, is then compared with another common test for linearity, the Brock, Dechert and Scheinkman test. Copyright © 2015 John Wiley & Sons, Ltd.
      PubDate: 2015-11-24T21:17:27.580302-05:
      DOI: 10.1002/sta4.96
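The decomposition described in this abstract can be illustrated numerically. Under a Gaussian assumption, the mutual information attributable to linear (correlation-based) dependence is -0.5 ln(1 - rho^2), and its ratio to the total mutual information gives a proportion of linear dependence. The histogram estimator below is a generic plug-in choice, not necessarily the estimator used in the paper:

```python
import numpy as np

def mutual_info(x, y, bins=20):
    """Plug-in (histogram) estimate of I(X;Y) in nats."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz])).sum())

rng = np.random.default_rng(1)
x = rng.normal(size=5000)
y = 0.8 * x + 0.5 * x**2 + 0.3 * rng.normal(size=5000)  # linear + nonlinear parts

rho = np.corrcoef(x, y)[0, 1]
mi_linear = -0.5 * np.log(1 - rho**2)  # MI implied by the correlation alone
mi_total = mutual_info(x, y)
print(mi_linear / mi_total)  # estimated proportion of linear dependence
```

Because the simulated relationship has a deliberate quadratic component, the linear share comes out well below one, flagging nonlinear dependence that the correlation coefficient alone would miss.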
  • Parallel Markov chain Monte Carlo for non-Gaussian posterior distributions
    • Authors: Alexey Miroshnikov; Zheng Wei, Erin Marie Conlon
      Pages: 304 - 319
      Abstract: Recent developments in big data and analytics research have produced an abundance of large data sets that are too big to be analysed in their entirety, because of limits on computer memory or storage capacity. To address these issues, communication-free parallel Markov chain Monte Carlo methods have been developed for Bayesian analysis of big data. These methods partition data into manageable subsets, perform independent Bayesian Markov chain Monte Carlo analysis on each subset and combine the subset posterior samples to estimate the full data posterior. Current approaches to combining subset posterior samples include sample averaging, weighted averaging and kernel smoothing techniques. Although these methods work well for Gaussian posteriors, they are not well suited to non-Gaussian posterior distributions. Here, we develop a new direct density product method for combining subset marginal posterior samples to estimate full data marginal posteriors. Using commonly implemented distance metrics, we show in simulation studies of Bayesian models with non-Gaussian posteriors that our method outperforms the existing methods in approximating the full data marginal posteriors. Because our method estimates only marginal densities, there is no limitation on the number of model parameters analysed. Our procedure is suitable for Bayesian models with unknown parameters with fixed dimension in continuous parameter spaces. Copyright © 2015 John Wiley & Sons, Ltd.
      PubDate: 2015-12-02T00:45:29.715878-05:
      DOI: 10.1002/sta4.97
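The combination step can be sketched for a single parameter. Each "machine" contributes samples from a subset marginal posterior; the densities are estimated on a common grid and multiplied pointwise (in log space for stability). This is a simplified illustration of the density-product idea with synthetic samples, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(2)

# Pretend each of 3 machines produced samples from a (skewed, non-Gaussian)
# subset marginal posterior for one parameter.
subset_samples = [rng.gamma(shape=3 + m, scale=0.5, size=2000) for m in range(3)]

grid = np.linspace(0.0, 10.0, 500)

def kde(samples, grid, h=0.15):
    """Simple Gaussian kernel density estimate evaluated on a grid."""
    z = (grid[None, :] - samples[:, None]) / h
    return np.exp(-0.5 * z**2).mean(axis=0) / (h * np.sqrt(2 * np.pi))

# Density product: multiply the subset marginal densities pointwise in
# log space, then renormalize on the grid.
log_prod = sum(np.log(kde(s, grid) + 1e-300) for s in subset_samples)
dens = np.exp(log_prod - log_prod.max())
dens /= dens.sum() * (grid[1] - grid[0])
print(grid[dens.argmax()])  # mode of the combined marginal posterior
```

Because only one-dimensional marginals are ever multiplied, the cost does not grow with the number of model parameters, which is the scalability point made in the abstract.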
  • Flexible link functions in a joint model of binary and longitudinal data
    • Authors: Dan Li; Xia Wang, Seongho Song, Nanhua Zhang, Dipak K. Dey
      Pages: 320 - 330
      Abstract: Joint models of a binary primary endpoint and a longitudinal continuous process have been proposed when their association is of interest. The dependence between these two submodels can be characterized by introducing a common set of latent random effects. An important consideration that has been less investigated is the choice of appropriate link functions for the binary primary endpoint in this joint model. We introduce two families of flexible link functions based on the generalized extreme value distribution and the symmetric power logit distribution. Our work is the first to investigate the importance of an appropriate and flexible link function in improving the estimation and prediction of a Bayesian joint model. Markov chain Monte Carlo is used for the posterior computation and inference. Flexibility and gains of the proposed joint model are demonstrated through detailed studies on simulated data sets and a real data example. Copyright © 2015 John Wiley & Sons, Ltd.
      PubDate: 2015-12-15T22:33:42.709875-05:
      DOI: 10.1002/sta4.98
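To see why link flexibility matters, compare the symmetric logit inverse link with a GEV-cdf-based one, whose shape parameter xi skews the response curve. The parameterization below is the standard GEV cdf and is only a sketch of the kind of link family the abstract describes, not the authors' exact specification:

```python
import numpy as np

def inv_logit(eta):
    """Standard symmetric logit inverse link."""
    return 1.0 / (1.0 + np.exp(-eta))

def gev_inv_link(eta, xi=-0.3):
    """GEV-cdf inverse link: F(eta) = exp(-(1 + xi*eta)**(-1/xi)) on the
    region 1 + xi*eta > 0. The shape xi controls the skewness of the
    response curve; xi -> 0 recovers the Gumbel cdf exp(-exp(-eta))."""
    eta = np.asarray(eta, dtype=float)
    t = 1.0 + xi * eta
    body = np.exp(-np.maximum(t, 1e-12) ** (-1.0 / xi))
    # Outside the support, the cdf is 1 (xi < 0) or 0 (xi > 0).
    return np.where(t > 0, body, 1.0 if xi < 0 else 0.0)

eta = np.linspace(-3, 3, 7)
print(inv_logit(eta))     # symmetric around P = 0.5 at eta = 0
print(gev_inv_link(eta))  # asymmetric: approaches 0 and 1 at different rates
```

A logit link forces P(Y=1) to rise and fall symmetrically about 0.5; the GEV family lets the data choose the asymmetry, which is the gain the paper quantifies in its simulations.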
  • Wiley-Blackwell Announces Launch of Stat – The ISI's Journal for the
           Rapid Dissemination of Statistics Research
    • PubDate: 2012-04-17T04:34:14.600281-05:
      DOI: 10.1002/sta4.1