Stat [SJR: 0.985] [H-I: 5] Hybrid journal (It can contain Open Access articles) ISSN (Online) 2049-1573 Published by John Wiley and Sons [1611 journals]
- Asymptotic properties of adaptive group Lasso for sparse reduced rank regression
- Authors: Kejun He; Jianhua Z. Huang
Abstract: This paper studies the asymptotic properties of the penalized least squares estimator using an adaptive group Lasso penalty for reduced rank regression. The group Lasso penalty is defined so that the regression coefficients corresponding to each predictor are treated as one group. It is shown that under certain regularity conditions, the estimator can achieve the minimax optimal rate of convergence. Moreover, variable selection consistency can also be achieved; that is, the relevant predictors can be identified with probability approaching one. In the asymptotic theory, the number of response variables, the number of predictors and the rank are allowed to grow to infinity with the sample size. Copyright © 2016 John Wiley & Sons, Ltd.
PubDate: 2016-10-18T03:15:26.676536-05:
DOI: 10.1002/sta4.123
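The abstract treats each predictor's row of coefficients as one group. A minimal NumPy sketch of that penalty, assuming a p x q coefficient matrix with predictors in rows and adaptive weights taken from some initial estimate, together with the row-wise group soft-thresholding step that a proximal-gradient solver would use. The rank constraint and the authors' own algorithm are not reproduced, and all function names are hypothetical.

```python
import numpy as np


def adaptive_weights(B_init, gamma=1.0, eps=1e-12):
    """Adaptive weights w_j = 1 / ||B_init[j, :]||_2 ** gamma.

    Row j of the p x q coefficient matrix holds predictor j's coefficients
    for all q responses, so each row is one group. eps guards against a
    zero initial row (which then receives a very large weight).
    """
    norms = np.linalg.norm(B_init, axis=1)
    return 1.0 / np.maximum(norms, eps) ** gamma


def adaptive_group_lasso_penalty(B, weights, lam):
    """lam * sum_j w_j * ||B[j, :]||_2."""
    return lam * float(np.sum(weights * np.linalg.norm(B, axis=1)))


def group_soft_threshold(B, thresholds):
    """Row-wise proximal map of sum_j t_j * ||B[j, :]||_2.

    Each row is shrunk toward zero by t_j in Euclidean norm and set exactly
    to zero when its norm is at most t_j (the predictor is deselected).
    """
    norms = np.linalg.norm(B, axis=1)
    scale = np.maximum(0.0, 1.0 - thresholds / np.maximum(norms, 1e-300))
    return B * scale[:, None]
```

In a proximal-gradient iteration one would apply `group_soft_threshold(B - step * grad, step * lam * w)` after each gradient step; rows that come out exactly zero correspond to predictors dropped from the model.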
- Revisiting inference of coefficient of variation: nuisances parameters
- Authors: Saeid Amiri
Abstract: This paper studies statistical inference for the coefficient of variation from a different point of view. While the coefficient of variation is well recognized as being pertinent to the interpretation of variability, its use in statistical inference is nevertheless difficult. This work shows that using estimates of kurtosis and skewness allows more accurate inference, even when the distribution is known. This work investigates a new type of estimator based on an indirect estimate; it is shown that when the estimate of the mean in the denominator approaches zero, the plug-in estimate overestimates in instances where the coefficient of variation is larger than one, whereas the indirect estimate is more accurate. Copyright © 2016 John Wiley & Sons, Ltd.
PubDate: 2016-10-04T02:47:56.881124-05:
DOI: 10.1002/sta4.116
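The role of skewness and kurtosis can be seen in the textbook delta-method variance of the plug-in estimate s / x̄. The sketch below implements that standard first-order interval, not the author's indirect estimator (which the abstract does not specify); the function name is hypothetical.

```python
import numpy as np
from scipy import stats


def cv_wald_interval(x, level=0.95):
    """Plug-in coefficient of variation with a delta-method Wald interval.

    Asymptotic variance of the sample CV c = s / xbar:
        Var(c_hat) ~ (c**4 - g1 * c**3 + c**2 * (b2 - 1) / 4) / n,
    where g1 is the skewness and b2 the (non-excess) kurtosis, here replaced
    by their sample estimates. Unreliable when the mean is near zero.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    cv = x.std(ddof=1) / x.mean()
    g1 = stats.skew(x)
    b2 = stats.kurtosis(x, fisher=False)  # equals 3 for a normal distribution
    var = (cv**4 - g1 * cv**3 + cv**2 * (b2 - 1.0) / 4.0) / n
    z = stats.norm.ppf(0.5 + level / 2.0)
    half_width = z * np.sqrt(var)
    return cv, (cv - half_width, cv + half_width)
```

For normal data g1 = 0 and b2 = 3, so the variance reduces to c²(c² + 1/2)/n; skewed or heavy-tailed data change the interval width through the sample moments.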
- Longitudinal functional additive model with continuous proportional outcomes for physical activity data
- Authors: Haocheng Li; Sarah Kozey Keadle, Victor Kipnis, Raymond J. Carroll
Abstract: Motivated by physical activity data obtained from the BodyMedia FIT device (http://www.bodymedia.com), we take a functional data approach for longitudinal studies with continuous proportional outcomes. The functional structure depends on three factors. In our three-factor model, the regression structures are specified as curves measured at various factor points with random effects that have a correlation structure. The random curve for the continuous factor is summarized using a few important principal components. The difficulties in handling continuous proportional variables are addressed by using a quasilikelihood-type approximation. We develop an efficient model-fitting algorithm that includes selecting the number of principal components. The method is evaluated empirically by a simulation study. This approach is applied to the BodyMedia data with 935 males and 84 consecutive days of observation, for a total of 78,540 observations. We show that sleep efficiency increases with increasing physical activity, while its variance decreases at the same time. Copyright © 2016 John Wiley & Sons, Ltd.
PubDate: 2016-10-04T02:44:04.291981-05:
DOI: 10.1002/sta4.121
- Multivariate spatio-temporal survey fusion with application to the American Community Survey and Local Area Unemployment Statistics
- Authors: Jonathan R. Bradley; Scott H. Holan, Christopher K. Wikle
Abstract: There are often multiple surveys available that estimate and report related demographic variables of interest that are referenced over space and/or time. Not all surveys produce the same information, and thus, combining these surveys typically leads to higher quality estimates. That is, not every survey has the same level of precision nor do they always provide estimates of the same variables. In addition, various surveys often produce estimates with incomplete spatio-temporal coverage. By combining surveys using a Bayesian approach, we can account for different margins of error and leverage dependencies to produce estimates of every variable considered at every spatial location and every time point. Specifically, our strategy is to use a hierarchical modelling approach, where the first stage of the model incorporates the margin of error associated with each survey. Then, in a lower stage of the hierarchical model, the multivariate spatio-temporal mixed effects model is used to incorporate multivariate spatio-temporal dependencies of the processes of interest. We adopt a fully Bayesian approach for combining surveys; that is, given all of the available surveys, the conditional distributions of the latent processes of interest are used for statistical inference. To demonstrate our proposed methodology, we jointly analyze period estimates from the US Census Bureau's American Community Survey, and estimates obtained from the Bureau of Labor Statistics Local Area Unemployment Statistics program. Copyright © 2016 John Wiley & Sons, Ltd.
PubDate: 2016-10-03T01:41:02.423685-05:
DOI: 10.1002/sta4.120
- Structure discovery and parametrically guided regression
- Authors: Takuma Yoshida
Abstract: In regression analysis, a parametric model is often assumed on the basis of prior information or a pilot study. If the model assumption is valid, the parametric method is useful. However, the efficiency of the estimator is not guaranteed when a poor model is selected. This article aims to check whether the model assumption is correct and to estimate the regression function. To achieve this, we propose a hybrid of the parametrically guided method and the group lasso. First, a parametric model is specified. The parametrically guided estimator is constructed by summing the parametric estimator and a nonparametric estimator. For the estimation of the nonparametric component, we use B-splines and the group lasso method. If the nonparametric component is estimated to be the zero function, the parametrically guided estimator reduces to the parametric estimator, and we conclude that the parametric model assumption is correct. If the nonparametric estimate is not zero, a semiparametric estimator is obtained. Thus, the proposed method discovers the model structure and estimates the regression function simultaneously. We investigate the asymptotic properties of the proposed estimator. A simulation study and a real data example are presented. Copyright © 2016 John Wiley & Sons, Ltd.
PubDate: 2016-10-03T01:37:52.102184-05:
DOI: 10.1002/sta4.118
- A data‐driven approach to conditional screening of high‐dimensional variables
- Authors: Hyokyoung G. Hong; Lan Wang, Xuming He
Abstract: Marginal screening is a widely applied technique for conveniently reducing the dimensionality of the data when the number of potential features overwhelms the sample size. By their nature, marginal screening procedures are also known for their difficulty in identifying so‐called hidden variables that are jointly important but have weak marginal associations with the response variable. Failing to include a hidden variable in the screening stage has two undesirable consequences: (1) important features are missed in model selection, and (2) biased inference is likely to occur in the subsequent analysis. Motivated by some recent work in conditional screening, we propose a data‐driven conditional screening algorithm, which is computationally efficient, enjoys the sure screening property under weaker assumptions on the model and works robustly in a variety of settings to reduce false negatives of hidden variables. Numerical comparison with alternative screening procedures is also made to shed light on the relative merit of the proposed method. We illustrate the proposed methodology using a leukaemia microarray data example. Copyright © 2016 John Wiley & Sons, Ltd.
PubDate: 2016-07-13T22:20:41.391191-05:
DOI: 10.1002/sta4.115
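To make the "hidden variable" issue concrete: a predictor can have near-zero marginal correlation with the response yet a large coefficient once another predictor is held fixed. The NumPy sketch below contrasts marginal correlation screening with a generic conditional screening score (the absolute least-squares coefficient of each candidate after adjusting for a conditioning set). It assumes the conditioning set is given and the columns are on comparable scales; the paper's data-driven choice of the conditioning set is not reproduced, and the helper names and toy data are hypothetical.

```python
import numpy as np


def marginal_scores(X, y):
    """Absolute Pearson correlation between y and each column of X."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))


def conditional_scores(X, y, cond):
    """|coefficient of X[:, j]| in a least-squares fit of y on X[:, cond] and X[:, j].

    Columns in the conditioning set get NaN. `cond` is supplied by the user
    here; choosing it from the data is the part the paper addresses.
    """
    n, p = X.shape
    base = np.column_stack([np.ones(n), X[:, cond]])
    scores = np.full(p, np.nan)
    for j in range(p):
        if j in cond:
            continue
        design = np.column_stack([base, X[:, j]])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        scores[j] = abs(coef[-1])
    return scores


# Toy example: x0 is "hidden" (almost uncorrelated with y) but jointly important.
rng = np.random.default_rng(1)
n = 2000
x0 = rng.normal(size=n)
x1 = x0 + 0.3 * rng.normal(size=n)
noise_vars = rng.normal(size=(n, 3))
X = np.column_stack([x0, x1, noise_vars])
y = x1 - x0 + 0.1 * rng.normal(size=n)

marg = marginal_scores(X, y)
cond = conditional_scores(X, y, cond=[1])
```

Here x0 receives a marginal score near zero but the largest conditional score once x1 is held fixed, which is the kind of false negative conditional screening is meant to avoid.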
- Hidden Gibbs random fields model selection using Block Likelihood Information Criterion
- Authors: Julien Stoehr; Jean‐Michel Marin, Pierre Pudlo
Abstract: Performing model selection between Gibbs random fields is a very challenging task. Indeed, because of the Markovian dependence structure, the normalizing constant of the fields cannot be computed using standard analytical or numerical methods. Furthermore, such unobserved fields cannot be integrated out, and the likelihood evaluation is a doubly intractable problem. This is the central obstacle to picking the model that best fits the observed data. We introduce a new approximate version of the Bayesian Information Criterion. We partition the lattice into contiguous rectangular blocks, and we approximate the probability measure of the hidden Gibbs field by the product of Gibbs distributions over the blocks. On that basis, we estimate the likelihood and derive the Block Likelihood Information Criterion (BLIC), which answers model choice questions such as the selection of the dependence structure or the number of latent states. We study the performance of BLIC for those questions. In addition, we present a comparison with ABC algorithms to show that the new criterion offers a better trade‐off between time efficiency and reliable results. Copyright © 2016 John Wiley & Sons, Ltd.
PubDate: 2016-04-21T22:06:41.645376-05:
DOI: 10.1002/sta4.112
- Point pattern analysis on a region of a sphere
- Authors: Thomas Lawrence; Adrian Baddeley, Robin K. Milne, Gopalan Nair
Abstract: We develop statistical methods for analysing a pattern of points on a region of the sphere, including intensity modelling and estimation, summary functions such as the K function, point process models, and model‐fitting techniques. The methods are demonstrated by analysing a dataset giving the sky positions of galaxies. Copyright © 2016 John Wiley & Sons, Ltd.
PubDate: 2016-04-13T20:50:46.325464-05:
DOI: 10.1002/sta4.108
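On the sphere, interpoint distance is the great-circle distance, and for a completely random pattern on the whole unit sphere K(r) equals the area of a spherical cap, 2π(1 - cos r). The sketch below assumes the pattern covers the entire sphere, so no edge correction is needed; the paper's estimators for a region of the sphere (where edge correction matters) are not reproduced, and the function names are hypothetical.

```python
import numpy as np


def uniform_sphere(n, rng):
    """n points uniformly distributed on the unit sphere (normalized Gaussians)."""
    v = rng.normal(size=(n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)


def great_circle_distances(P):
    """Pairwise great-circle (angular) distances between unit vectors."""
    return np.arccos(np.clip(P @ P.T, -1.0, 1.0))


def k_function_whole_sphere(P, r):
    """Naive K estimate for a pattern observed on the WHOLE unit sphere.

    K_hat(r) = 4*pi / (n*(n-1)) * #{ordered pairs i != j : d(x_i, x_j) <= r}.
    No edge correction is applied, so this does not cover a sub-region.
    """
    n = P.shape[0]
    iu = np.triu_indices(n, k=1)
    pair_d = np.sort(great_circle_distances(P)[iu])
    counts = 2 * np.searchsorted(pair_d, r, side="right")  # ordered pairs
    return 4.0 * np.pi * counts / (n * (n - 1))


def k_csr(r):
    """K function of a uniform pattern on the unit sphere: cap area 2*pi*(1 - cos r)."""
    return 2.0 * np.pi * (1.0 - np.cos(r))
```

Comparing `k_function_whole_sphere(P, r)` with `k_csr(r)` gives the usual diagnostic: values above the curve suggest clustering, values below suggest regularity.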
- Generalized Tikhonov regularization in estimation of ordinary differential equations models
- Authors: Ivan Vujačić; Seyed Mahdi Mahmoudi, Ernst Wit
Abstract: We consider estimation of parameters in models defined by systems of ordinary differential equations (ODEs). This problem is important because many processes in different fields of science are modelled by systems of ODEs. Various estimation methods based on smoothing have been suggested to bypass numerical integration of the ODE system. In this paper, we do not propose another method based on smoothing but show how some of the existing ones can be brought together under one unifying framework. The framework is based on generalized Tikhonov regularization and extremum estimation. We define an approximation of the ODE solution by viewing the system of ODEs as an operator equation and exploiting the connection with regularization theory. Combining the introduced regularized solution with an extremum criterion function provides a general framework for estimating parameters in ODEs, which can handle partially observed systems. If the extremum criterion function is the negative log‐likelihood, then suitable regularized solutions yield estimators that are consistent and asymptotically efficient. The well‐known generalized profiling procedure fits into the proposed framework. Copyright © 2016 John Wiley & Sons, Ltd.
PubDate: 2016-04-07T00:21:12.77646-05:0
DOI: 10.1002/sta4.111
- Interactive graphics for functional data analyses
- Authors: Julia Wrobel; So Young Park, Ana Maria Staicu, Jeff Goldsmith
Abstract: Although there are established graphics that accompany the most common functional data analyses, generating these graphics for each dataset and analysis can be cumbersome and time‐consuming. Often, the barriers to visualization inhibit useful exploratory data analyses and prevent the development of intuition for a method and its application to a particular dataset. The refund.shiny package was developed to address these issues for several of the most common functional data analyses. After conducting an analysis, the plot_shiny() function is used to generate an interactive visualization environment that contains several distinct graphics, many of which are updated in response to user input. These visualizations reduce the burden of exploratory analyses and can serve as a useful tool for the communication of results to non‐statisticians. Copyright © 2016 John Wiley & Sons, Ltd.
PubDate: 2016-03-31T11:42:13.00238-05:0
DOI: 10.1002/sta4.109
- Uncovering smartphone usage patterns with multi‐view mixed membership models
- Authors: Seppo Virtanen; Mattias Rost, Alistair Morrison, Matthew Chalmers, Mark Girolami
Abstract: We present a novel class of mixed membership models that combine information from multiple data sources and infer inter‐view and intra‐view statistical associations. An important contemporary application of this work is the meaningful synthesis of data sources corresponding to smartphone application usage, app developers' descriptions and customer feedback. We demonstrate the ability of the model to infer meaningful, interpretable and informative app usage patterns from app usage data augmented with rich text data describing the apps. We provide quantitative model evaluations showing that the model has significantly better predictive ability than related existing methods. © 2016 The Authors. Stat Published by John Wiley & Sons Ltd
PubDate: 2016-01-21T19:05:00.383077-05:
DOI: 10.1002/sta4.103
- Correlated components
- Authors: Trevor F. Cox; David S. Arnold
Abstract: Principal components analysis is a widely used and practical technique for analysing multivariate data; it finds a particular set of linear compounds of the variables under consideration such that the covariances between all pairs are 0. An alternative view is that, when the variables are considered as axes in a Cartesian coordinate system, principal components analysis is the particular orthogonal rotation of the axes that makes all the pairwise covariances equal to 0. It is this view that is taken here, but instead of finding the rotation that makes all covariances equal to 0, an orthogonal rotation is found that maximizes the sum of the covariances. The rotation is not unique, except in the two- or three-component case, so a second criterion can be optimized alongside it. The motivation is that two highly correlated components will tend to measure the same latent variable but with interesting differences because of the orthogonality between them. Theory is given for identifying the correlated components, as well as algorithms for finding them. Two illustrative examples are provided, one involving gene expression data and the other consumer questionnaire data. Copyright © 2016 John Wiley & Sons, Ltd.
PubDate: 2016-01-17T21:11:35.251747-05:
DOI: 10.1002/sta4.99
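For two variables the covariance-maximizing orthogonal rotation has a closed form. Rotating by θ gives cov(u, v) = ½ sin 2θ (σ_yy - σ_xx) + cos 2θ σ_xy, which peaks at 2θ = atan2((σ_yy - σ_xx)/2, σ_xy). Starting from two principal components with variances λ1 ≥ λ2, the optimum is a 45-degree rotation with covariance (λ1 - λ2)/2. A minimal NumPy sketch of this two-variable case only; the function names are hypothetical, and the paper's general algorithm and secondary criterion are not reproduced.

```python
import numpy as np


def max_cov_rotation_2d(S):
    """Angle theta maximizing cov(u, v), where u = cos(theta) x + sin(theta) y
    and v = -sin(theta) x + cos(theta) y, for a 2 x 2 covariance matrix S.

    cov(u, v) = a * sin(2 theta) + b * cos(2 theta), with a = (S_yy - S_xx) / 2
    and b = S_xy, so the maximum sqrt(a**2 + b**2) is reached at
    2 theta = atan2(a, b).
    """
    a = 0.5 * (S[1, 1] - S[0, 0])
    b = S[0, 1]
    return 0.5 * np.arctan2(a, b), np.hypot(a, b)


def rotated_cov(S, theta):
    """Covariance matrix of (u, v) after the orthogonal rotation by theta."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, s], [-s, c]])
    return R @ S @ R.T


# Two principal components with variances 3 and 1: the optimum is a
# 45-degree rotation giving covariance (3 - 1) / 2 = 1.
theta, max_cov = max_cov_rotation_2d(np.diag([3.0, 1.0]))
```

The rotation preserves the total variance (the trace of S), so the covariance gained is traded against making the two component variances more equal.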
- A genome-wide association study of multiple longitudinal traits with related subjects
- Authors: Yubin Sung; Zeny Feng, Sanjeena Subedi
Abstract: Pleiotropy is the phenomenon in which a single gene exerts multiple correlated phenotypic effects, often characterized as traits, involving multiple biological systems. We propose a two-stage method to identify pleiotropic effects on multiple longitudinal traits from a family-based data set. The first stage analyses each longitudinal trait via a three-level mixed-effects model. Random effects at the subject level and at the family level measure the subject-specific genetic effects and the between-subject intraclass correlations within families, respectively. The second stage performs a simultaneous association test between a single nucleotide polymorphism and all subject-specific effects for multiple longitudinal traits. This is performed using a quasi-likelihood scoring method that adjusts for the correlation structure among related subjects. Two simulation studies are undertaken to assess both the type I error control and the power of the proposed method. Furthermore, we demonstrate the utility of the two-stage method in identifying pleiotropic genes or loci by analyzing the Genetic Analysis Workshop 16 Problem 2 cohort data drawn from the Framingham Heart Study, and illustrate the kind of data complexity that the proposed approach can handle. We establish that our two-stage method can identify pleiotropic effects while accommodating varying data types in the model. Copyright © 2016 John Wiley & Sons, Ltd.
PubDate: 2016-01-12T20:24:55.293787-05:
DOI: 10.1002/sta4.102
- Estimation, filtering and smoothing in the stochastic conditional duration model: an estimating function approach
- Authors: Ramanathan Thekke; Anuj Mishra, Bovas Abraham
Abstract: Stochastic conditional duration models are widely used in the financial econometrics literature to model the duration between transactions in a financial market. Despite developments in modelling, estimation, filtering and smoothing are still being investigated by researchers in this area. Almost all existing procedures are highly computationally intensive because of the complexity of the likelihood function. In this paper, we suggest a new procedure for estimation, filtering and smoothing in stochastic conditional duration models, based on estimating functions. Simulation studies indicate that the suggested procedure performs well and is computationally fast. Copyright © 2016 John Wiley & Sons, Ltd.
PubDate: 2016-01-12T19:21:32.593463-05:
DOI: 10.1002/sta4.101
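A common specification of the stochastic conditional duration model writes each duration as x_t = exp(ψ_t) ε_t, with a latent AR(1) log-scale ψ_t = ω + β ψ_{t-1} + σ u_t, Gaussian u_t and positive unit-mean ε_t. Simulated data of this kind are a convenient test bed for any estimation, filtering or smoothing procedure. The sketch below assumes exponential ε_t and illustrative parameter values; it does not implement the paper's estimating-function procedure, and the function name is hypothetical.

```python
import numpy as np


def simulate_scd(n, omega, beta, sigma, rng, burn_in=500):
    """Simulate durations x_t = exp(psi_t) * eps_t from an SCD model.

    psi_t = omega + beta * psi_{t-1} + sigma * u_t,  u_t ~ N(0, 1),
    eps_t ~ Exponential(1) (unit mean; an assumption, Weibull is also common).
    """
    if not abs(beta) < 1:
        raise ValueError("|beta| < 1 is required for a stationary latent process")
    total = n + burn_in
    u = rng.normal(size=total)
    psi = np.empty(total)
    prev = omega / (1.0 - beta)  # start at the stationary mean
    for t in range(total):
        prev = omega + beta * prev + sigma * u[t]
        psi[t] = prev
    eps = rng.exponential(1.0, size=total)
    x = np.exp(psi) * eps
    return x[burn_in:], psi[burn_in:]


rng = np.random.default_rng(3)
durations, log_scale = simulate_scd(1000, omega=0.0, beta=0.9, sigma=0.2, rng=rng)
```

Because the latent path `log_scale` is returned alongside the durations, a filter or smoother can be checked directly against the true ψ_t.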
- Confidence bands for smoothness in nonparametric regression
- Authors: Julian Faraway
Abstract: The choice of the smoothing parameter in nonparametric regression is critical to the form of the estimated curve and any inference that follows. Many methods are available that will generate a single choice for this parameter. Here, we argue that the considerable uncertainty in this choice should be explicitly represented. The construction of standard simultaneous confidence bands in nonparametric regression often requires difficult mathematical arguments. We question their practical utility, presenting several deficiencies. We propose a new kind of confidence band that reflects the uncertainty regarding the smoothness of the estimate. Copyright © 2016 John Wiley & Sons, Ltd.
PubDate: 2016-01-10T23:24:47.684078-05:
DOI: 10.1002/sta4.100
- Issue Information
- Pages: 1 - 3
Abstract: No abstract is available for this article.
PubDate: 2016-02-22T07:45:50.590087-05:
DOI: 10.1002/sta4.90
- Exploiting the quantile optimality ratio in finding confidence intervals for quantiles
- Authors: Luke A. Prendergast; Robert G. Staudte
Pages: 70 - 81
Abstract: A standard approach to confidence intervals for quantiles requires good estimates of the quantile density. The optimal bandwidth for kernel estimation of the quantile density depends on an underlying location‐scale family only through the quantile optimality ratio (QOR), which is the starting point for our results. While the QOR is not distribution‐free, it turns out that what is optimal for one family often works quite well for families of similar shape. This allows one to rely on a single representative QOR if one has a rough idea of the distributional shape. Another option that we explore assumes the data can be modelled by the highly flexible generalized lambda distribution (GLD), already studied by others, and we show that using the QOR for the estimated GLD can lead to more than competitive intervals. Effective confidence intervals for the difference between quantiles from independent populations are a byproduct. Copyright © 2016 John Wiley & Sons, Ltd.
PubDate: 2016-02-28T22:43:17.215023-05:
DOI: 10.1002/sta4.105
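The standard approach referred to above rests on the asymptotic result that the sample p-quantile has variance approximately p(1 - p) q(p)² / n, where q(p) = dQ(p)/dp is the quantile density. A minimal sketch, assuming a simple finite-difference estimate of q(p) with a user-chosen half-width h; the QOR-based optimal bandwidth from the paper is not reproduced, and the function names are hypothetical.

```python
import numpy as np
from scipy import stats


def quantile_density(x, p, h):
    """Finite-difference estimate of q(p) = dQ/dp from empirical quantiles."""
    lo, hi = max(p - h, 0.0), min(p + h, 1.0)
    return (np.quantile(x, hi) - np.quantile(x, lo)) / (hi - lo)


def quantile_ci(x, p, h, level=0.95):
    """Wald interval: Q_hat(p) +/- z * sqrt(p (1 - p) / n) * q_hat(p)."""
    x = np.asarray(x, dtype=float)
    xi = np.quantile(x, p)
    se = np.sqrt(p * (1.0 - p) / x.size) * quantile_density(x, p, h)
    z = stats.norm.ppf(0.5 + level / 2.0)
    return xi - z * se, xi + z * se
```

The interval width is driven entirely by the estimate of q(p), which is why the choice of h (the bandwidth question the paper addresses through the QOR) matters.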
- A note on automatic data transformation
- Authors: Qing Feng; Jan Hannig, J. S. Marron
Pages: 82 - 87
Abstract: Modern data analysis frequently involves variables with highly non‐Gaussian marginal distributions. However, commonly used analysis methods are most effective with roughly Gaussian data. This paper introduces an automatic transformation that improves the closeness of distributions to normality. For each variable, a new family of parametrizations of the shifted logarithm transformation is proposed, which is unique in treating the data as real valued and in allowing transformation for both left and right skewness within the single family. This also allows an automatic selection of the parameter value (which is crucial for high‐dimensional data with many variables to transform) by minimizing the Anderson–Darling test statistic of the transformed data. An application to image features extracted from melanoma microscopy slides demonstrates the utility of the proposed transformation in addressing data with excessive skewness, heteroscedasticity and influential observations. Copyright © 2016 John Wiley & Sons, Ltd.
PubDate: 2016-03-01T22:21:34.435768-05:
DOI: 10.1002/sta4.104
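The selection rule described above can be illustrated with a simplified shifted-log family: flip left-skewed data so that one family covers both skew directions, then choose the shift that minimizes the Anderson–Darling normality statistic. This sketch uses `scipy.stats.anderson` and a plain grid of shifts scaled by the standard deviation; the paper's own parametrization of the shift is different and is not reproduced, and the helper names are hypothetical.

```python
import numpy as np
from scipy import stats


def shifted_log(y, c):
    """log(y - min(y) + c) for a shift c > 0; reduces right skewness."""
    return np.log(y - y.min() + c)


def auto_log_transform(x, n_grid=60):
    """Pick the shift minimizing the Anderson-Darling statistic of the result.

    Returns (standardized transformed data, chosen shift, AD statistic).
    Left-skewed data are negated first and negated back afterwards, so the
    output is an increasing function of x in both cases.
    """
    x = np.asarray(x, dtype=float)
    sign = 1.0 if stats.skew(x) >= 0 else -1.0
    y = sign * x
    shifts = y.std() * np.logspace(-3, 2, n_grid)  # hypothetical search grid
    best = None
    for c in shifts:
        t = shifted_log(y, c)
        t = (t - t.mean()) / t.std()
        ad = stats.anderson(t, dist="norm").statistic
        if best is None or ad < best[2]:
            best = (t, c, ad)
    t, c, ad = best
    return sign * t, c, ad
```

Because the shift is chosen by a scalar criterion, the same call can be applied column by column to high-dimensional data without manual tuning.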
- Variable selection in function‐on‐scalar regression
- Authors: Yakuan Chen; Jeff Goldsmith, R. Todd Ogden
Pages: 88 - 101
Abstract: For regression models with functional responses and scalar predictors, it is common for the number of predictors to be large. Despite this, few methods for variable selection exist for function‐on‐scalar models, and none account for the inherent correlation of residual curves in such models. By expanding the coefficient functions using a B‐spline basis, we pose the function‐on‐scalar model as a multivariate regression problem. Spline coefficients are grouped by coefficient function, and a group minimax concave penalty is used for variable selection. We adapt techniques from generalized least squares to account for residual covariance by “pre‐whitening” using an estimate of the covariance matrix and establish theoretical properties for the resulting estimator. We further develop an iterative algorithm that alternately updates the spline coefficients and covariance; simulation results indicate that this iterative algorithm often performs as well as pre‐whitening using the true covariance and substantially outperforms methods that neglect the covariance structure. We apply our method to two‐dimensional planar reaching motions in a study of the effects of stroke severity on motor control and find that our method provides lower prediction errors than competing methods. Copyright © 2016 John Wiley & Sons, Ltd.
PubDate: 2016-03-02T20:33:49.341623-05:
DOI: 10.1002/sta4.106
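Two ingredients named above can be written down directly. The minimax concave penalty is ρ(t; λ, γ) = λt - t²/(2γ) for t ≤ γλ and γλ²/2 beyond, applied here to the Euclidean norm of each group of spline coefficients; pre-whitening multiplies responses by the inverse Cholesky factor of an estimated residual covariance so that residual curves become approximately uncorrelated. A minimal NumPy/SciPy sketch under those assumptions; the function names are hypothetical, and the paper's iterative algorithm and any group-size scaling of λ are not reproduced.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular


def mcp(t, lam, gamma=3.0):
    """Minimax concave penalty evaluated at |t|."""
    t = np.abs(t)
    return np.where(t <= gamma * lam, lam * t - t**2 / (2.0 * gamma), 0.5 * gamma * lam**2)


def group_mcp_penalty(groups, lam, gamma=3.0):
    """Sum of MCP values of the Euclidean norms of the coefficient groups."""
    norms = np.array([np.linalg.norm(g) for g in groups])
    return float(np.sum(mcp(norms, lam, gamma)))


def prewhiten(Y, Sigma):
    """Map each row y_i (length D) to L^{-1} y_i, where Sigma = L L^T.

    If the rows have covariance Sigma, the whitened rows have identity covariance.
    """
    L = cholesky(Sigma, lower=True)
    return solve_triangular(L, Y.T, lower=True).T
```

Unlike the lasso, the MCP stops growing once a group norm exceeds γλ, so large coefficient functions are not shrunk further; in a generalized-least-squares fit the same factor L^{-1} would also be applied to the basis-expanded design.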
- On the smallest eigenvalues of covariance matrices of multivariate spatial processes
- Authors: François Bachoc; Reinhard Furrer
Pages: 102 - 107
Abstract: There has been growing interest in providing models for multivariate spatial processes. A majority of these models specify a parametric matrix covariance function. Based on observations, the parameters are estimated by maximum likelihood or variants thereof. While the asymptotic properties of maximum likelihood estimators for univariate spatial processes have been analyzed in detail, maximum likelihood estimators for multivariate spatial processes have not yet received the attention they deserve. In this article, we consider the classical increasing‐domain asymptotic setting, which restricts the minimum distance between the locations. One of the main components to be studied from a theoretical point of view is then the asymptotic positive definiteness of the underlying covariance matrix. Under very weak assumptions on the matrix covariance function, we show that the smallest eigenvalue of the covariance matrix is asymptotically bounded away from zero. Several practical implications are discussed as well. Copyright © 2016 John Wiley & Sons, Ltd.
PubDate: 2016-03-02T20:41:38.13238-05:0
DOI: 10.1002/sta4.107
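A small numerical illustration of the property studied above, under assumptions much stronger than the paper's: a bivariate separable process with cross-covariance matrix A and exponential correlation exp(-|s - s'|/scale), observed on a line with a fixed minimum spacing. The covariance matrix is then the Kronecker product of A with an AR(1)-type correlation matrix, whose eigenvalues stay above (1 - ρ)/(1 + ρ) with ρ = exp(-spacing/scale), so the smallest eigenvalue stays above min eig(A)(1 - ρ)/(1 + ρ) however many sites are added. Function names and parameter values are hypothetical.

```python
import numpy as np


def separable_cov(n_sites, spacing, scale, A):
    """Covariance of a bivariate separable process A * exp(-|s - s'| / scale)
    observed at sites 0, spacing, 2*spacing, ... on a line (minimum distance = spacing).
    """
    s = spacing * np.arange(n_sites)
    R = np.exp(-np.abs(s[:, None] - s[None, :]) / scale)
    return np.kron(A, R)


def eigen_lower_bound(spacing, scale, A):
    """min eig(A) * (1 - rho) / (1 + rho), with rho = exp(-spacing / scale)."""
    rho = np.exp(-spacing / scale)
    return np.linalg.eigvalsh(A).min() * (1.0 - rho) / (1.0 + rho)


A = np.array([[1.0, 0.5], [0.5, 1.0]])  # cross-covariance between the two variables
bound = eigen_lower_bound(1.0, 1.0, A)
smallest = {n: np.linalg.eigvalsh(separable_cov(n, 1.0, 1.0, A)).min() for n in (10, 100, 400)}
```

Shrinking the spacing toward zero (infill rather than increasing-domain asymptotics) drives ρ to 1 and the bound to 0, which is why the minimum-distance restriction matters.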
- Multinomial probit Bayesian additive regression trees
- Authors: Bereket P. Kindo; Hao Wang, Edsel A. Peña
Pages: 119 - 131
Abstract: This article proposes multinomial probit Bayesian additive regression trees (MPBART) as a multinomial probit extension of Bayesian additive regression trees. MPBART is flexible enough to allow the inclusion of predictors that describe the observed units as well as the available choice alternatives. Through two simulation studies and four real data examples, we show that MPBART exhibits very good predictive performance in comparison with other discrete choice and multiclass classification methods. An implementation of MPBART, the R package mpbart, is freely available from CRAN. Copyright © 2016 John Wiley & Sons, Ltd.
PubDate: 2016-04-04T14:40:44.051215-05:
DOI: 10.1002/sta4.110
- Visualization of robust L1PCA
- Authors: Yi‐Hui Zhou; J. S. Marron
Pages: 173 - 184
Abstract: Robust principal components are particularly challenging to find for high‐dimensional data sets, including genomic data. Conventional principal component analysis is often unduly influenced by a few closely related family members. This phenomenon is explained using the ideas of a high‐dimensional low sample size geometric representation. These ideas further show why the earlier robust method of spherical principal components fails to solve this problem. A solution is provided, which is called the visual L1 principal component analysis (VL1PCA). This approach is based on a backwards L1‐norm best‐fit idea. VL1PCA improves upon the best previous version of L1PCA by providing interpretable scores and a scatterplot visualization of the data. Another contribution is a new notion of robust centre, the backwards L1 median. The utility of VL1PCA is illustrated on examples and a real high‐dimensional data set. Our VL1PCA is not only robust to outliers but also gives a meaningful population stratification for data even in the presence of special family structure, when other methods fail. © 2016 The Authors. Stat Published by John Wiley & Sons Ltd
PubDate: 2016-05-24T20:45:33.581379-05:
DOI: 10.1002/sta4.113
- Flexible functional regression methods for estimating individualized treatment rules
- Authors: Adam Ciarleglio; Eva Petkova, Thaddeus Tarpey, R. Todd Ogden
Pages: 185 - 199
Abstract: A major focus of personalized medicine is on the development of individualized treatment rules that depend upon baseline measures. Good decision rules have the potential to significantly advance patient care and reduce the burden of a host of diseases. Statistical methods for developing such rules are progressing rapidly, but few methods have considered the use of pretreatment functional data to guide decision‐making. Furthermore, those methods that do allow for the incorporation of functional pretreatment covariates typically make strong assumptions about the relationships between the functional covariates and the response of interest. We propose two approaches for using functional data to select an optimal treatment that address some of the shortcomings of previously developed methods. Specifically, we combine the flexibility of functional additive regression models with Q‐learning or A‐learning in order to obtain treatment decision rules. Properties of the corresponding estimators are discussed. Our approaches are evaluated in several realistic settings using synthetic data and are applied to data arising from a clinical trial comparing two treatments for major depressive disorder in which baseline imaging data are available for subjects who are subsequently treated. Copyright © 2016 John Wiley & Sons, Ltd.
PubDate: 2016-05-31T23:00:54.242258-05:
DOI: 10.1002/sta4.114
- Wiley-Blackwell Announces Launch of Stat – The ISI's Journal for the Rapid Dissemination of Statistics Research
PubDate: 2012-04-17T04:34:14.600281-05:
DOI: 10.1002/sta4.1