Hybrid journal (may contain open access articles). ISSN (Print) 0006-3444; ISSN (Online) 1464-3510. Published by Oxford University Press.

Authors: Tian X; Loftus J; Taylor J. Pages: 755 - 768 Abstract: There has been much recent work on inference after model selection in situations where the noise level is known. However, the error variance is rarely known in practice and its estimation is difficult in high-dimensional settings. In this work we propose using the square-root lasso, also known as the scaled lasso, to perform inference for selected coefficients and the noise level simultaneously. The square-root lasso has the property that the choice of a reasonable tuning parameter does not depend on the noise level in the data. We provide valid $p$-values and confidence intervals for coefficients after variable selection and estimates for the model-specific variance. Our estimators perform better in simulations than other estimators of the noise variance. These results make inference after model selection significantly more applicable. PubDate: Thu, 20 Sep 2018 00:00:00 GMT DOI: 10.1093/biomet/asy045 Issue No: Vol. 105, No. 4 (2018)
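The scaled-lasso idea in this abstract can be sketched in a few lines of numpy: alternate a lasso fit with a noise-level update, with the penalty proportional to the current noise estimate so that the base tuning parameter `lam0` can be fixed without knowing the true noise level. This is a minimal illustrative sketch, not the paper's inference procedure; the coordinate-descent solver, the choice `lam0 = sqrt(2 log p / n)`, and all variable names are assumptions for illustration.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent lasso for (1/(2n))||y - X b||^2 + lam ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # partial residual excluding coordinate j
            r = y - X @ beta + X[:, j] * beta[j]
            z = X[:, j] @ r
            beta[j] = np.sign(z) * max(abs(z) - lam * n, 0.0) / col_ss[j]
    return beta

def scaled_lasso(X, y, n_outer=10):
    """Alternate a lasso fit with a noise-level update; the effective
    penalty lam0 * sigma rescales itself, so lam0 needs no knowledge
    of the true noise level."""
    n, p = X.shape
    lam0 = np.sqrt(2 * np.log(p) / n)   # a common pivotal choice
    sigma = np.linalg.norm(y) / np.sqrt(n)
    for _ in range(n_outer):
        beta = lasso_cd(X, y, lam0 * sigma)
        sigma = np.linalg.norm(y - X @ beta) / np.sqrt(n)
    return beta, sigma

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[0], beta_true[1] = 2.0, -3.0
y = X @ beta_true + rng.normal(size=n)   # true noise level is 1
beta_hat, sigma_hat = scaled_lasso(X, y)
```

On simulated data the noise estimate `sigma_hat` lands near the true value 1 even though the tuning parameter was chosen without it, which is the property the abstract highlights.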

Authors: Tan K; Wang Z; Zhang T, et al. Pages: 769 - 782 Abstract: Sliced inverse regression is a popular tool for sufficient dimension reduction, which replaces covariates with a minimal set of their linear combinations without loss of information on the conditional distribution of the response given the covariates. The estimated linear combinations include all covariates, making results difficult to interpret and perhaps unnecessarily variable, particularly when the number of covariates is large. In this paper, we propose a convex formulation for fitting sparse sliced inverse regression in high dimensions. Our proposal estimates the subspace of the linear combinations of the covariates directly and performs variable selection simultaneously. We solve the resulting convex optimization problem via the linearized alternating direction method of multipliers algorithm, and establish an upper bound on the subspace distance between the estimated and the true subspaces. Through numerical studies, we show that our proposal is able to identify the correct covariates in the high-dimensional setting. PubDate: Mon, 22 Oct 2018 00:00:00 GMT DOI: 10.1093/biomet/asy049 Issue No: Vol. 105, No. 4 (2018)
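For readers unfamiliar with the base method, here is a sketch of classical (non-sparse) sliced inverse regression: slice the response, average the standardized covariates within slices, and take the leading eigenvector of the covariance of those slice means. The convex sparse formulation of the paper is considerably more involved; the slice count and the single-index test model below are illustrative assumptions.

```python
import numpy as np

def sir_direction(X, y, n_slices=10):
    """Classical sliced inverse regression: leading direction of the
    covariance of slice means of the standardized covariates."""
    n, p = X.shape
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    # inverse square root of the covariance via eigendecomposition
    w, V = np.linalg.eigh(cov)
    cov_inv_sqrt = V @ np.diag(w ** -0.5) @ V.T
    Z = (X - mu) @ cov_inv_sqrt
    M = np.zeros((p, p))
    for sl in np.array_split(np.argsort(y), n_slices):
        m = Z[sl].mean(axis=0)
        M += (len(sl) / n) * np.outer(m, m)
    evals, evecs = np.linalg.eigh(M)
    beta = cov_inv_sqrt @ evecs[:, -1]   # map back to the covariate scale
    return beta / np.linalg.norm(beta)

rng = np.random.default_rng(0)
n, p = 2000, 6
X = rng.normal(size=(n, p))
b = np.zeros(p)
b[0] = 1.0
y = (X @ b) ** 3 + 0.1 * rng.normal(size=n)   # single-index model
beta_hat = sir_direction(X, y)
```

In this toy single-index model the recovered direction aligns closely with the true index `b`, up to sign.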

Authors: Proietti T; Giovannelli A. Pages: 783 - 795 Abstract: The autocovariance matrix of a stationary random process plays a central role in prediction theory and time series analysis. When the dimension of the matrix is of the same order of magnitude as the number of observations, the sample autocovariance matrix gives an inconsistent estimator. In the nonparametric framework, recent proposals have concentrated on banding and tapering the sample autocovariance matrix. We introduce an alternative approach via a modified Durbin–Levinson algorithm that receives as input the banded and tapered sample partial autocorrelations and returns a consistent and positive-definite estimator of the autocovariance matrix. We establish the convergence rate of our estimator and characterize the properties of the optimal linear predictor obtained from it. The computational complexity of the latter is of the order of the square of the banding parameter, which renders our method scalable for high-dimensional time series. PubDate: Mon, 17 Sep 2018 00:00:00 GMT DOI: 10.1093/biomet/asy042 Issue No: Vol. 105, No. 4 (2018)
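The key building block here, the inverse Durbin–Levinson recursion mapping partial autocorrelations back to autocovariances, is short enough to sketch. Any input sequence with entries strictly inside (-1, 1), such as a banded and tapered sample sequence, yields a valid positive-definite autocovariance sequence; the banding and tapering step itself is omitted from this sketch.

```python
import numpy as np

def pacf_to_acov(kappa, gamma0=1.0):
    """Map partial autocorrelations kappa[0], kappa[1], ... and the
    variance gamma0 to autocovariances via the inverse Durbin-Levinson
    recursion: gamma_k = kappa_k * v_{k-1} + sum_j phi_{k-1,j} gamma_{k-j}."""
    gamma = [gamma0]
    phi = []            # phi[j] holds phi_{k-1, j+1} at the current order
    v = gamma0          # innovation variance v_{k-1}
    for k, kap in enumerate(kappa, start=1):
        g = kap * v + sum(phi[j] * gamma[k - 1 - j] for j in range(k - 1))
        gamma.append(g)
        # Levinson update: phi_{k,j} = phi_{k-1,j} - kappa_k phi_{k-1,k-j}
        phi = [phi[j] - kap * phi[k - 2 - j] for j in range(k - 1)] + [kap]
        v *= 1.0 - kap ** 2
    return np.array(gamma)

# AR(1) with coefficient 0.5: partial autocorrelations (0.5, 0, 0, ...)
gamma = pacf_to_acov([0.5, 0.0, 0.0], gamma0=1.0)
```

For the AR(1) check above the recursion returns the geometric sequence 1, 0.5, 0.25, 0.125, and truncating (banding) the partial autocorrelations can never destroy positive definiteness, which is the advantage over banding the autocovariances directly.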

Authors: McCullagh P; Polson N. Pages: 797 - 814 Abstract: The main contribution of this paper is a mathematical definition of statistical sparsity, which is expressed as a limiting property of a sequence of probability distributions. The limit is characterized by an exceedance measure $H$ and a rate parameter $\rho > 0$, both of which are unrelated to sample size. The definition encompasses all sparsity models that have been suggested in the signal-detection literature. Sparsity implies that $\rho$ is small, and a sparse approximation is asymptotic in the rate parameter, typically with error $o(\rho)$ in the sparse limit $\rho \to 0$. To first order in sparsity, the sparse signal plus Gaussian noise convolution depends on the signal distribution only through its rate parameter and exceedance measure. This is one of several asymptotic approximations implied by the definition, each of which is most conveniently expressed in terms of the zeta transformation of the exceedance measure. One implication is that two sparse families having the same exceedance measure are inferentially equivalent and cannot be distinguished to first order. Thus, aspects of the signal distribution that have a negligible effect on observables can be ignored with impunity, leaving only the exceedance measure to be considered. From this point of view, scale models and inverse-power measures seem particularly attractive. PubDate: Mon, 22 Oct 2018 00:00:00 GMT DOI: 10.1093/biomet/asy051 Issue No: Vol. 105, No. 4 (2018)

Authors: Lynch B; Chen K. Pages: 815 - 831 Abstract: This paper concerns the modelling of multi-way functional data where double or multiple indices are involved. We introduce a concept of weak separability. The weakly separable structure supports the use of factorization methods that decompose the signal into its spatial and temporal components. The analysis reveals interesting connections to the usual strongly separable covariance structure, and provides insights into tensor methods for multi-way functional data. We propose a formal test for the weak separability hypothesis, where the asymptotic null distribution of the test statistic is a chi-squared-type mixture. The method is applied to study brain functional connectivity derived from source localized magnetoencephalography signals during motor tasks. PubDate: Thu, 27 Sep 2018 00:00:00 GMT DOI: 10.1093/biomet/asy048 Issue No: Vol. 105, No. 4 (2018)

Authors: Eckley I; Nason G. Pages: 833 - 848 Abstract: Aliasing is often overlooked in time series analysis but can seriously distort the spectrum, the autocovariance and their estimates. We show that dyadic subsampling of a locally stationary wavelet process, which can cause aliasing, results in a process that is the sum of asymptotic white noise and another locally stationary wavelet process with a modified spectrum. We develop a test for the absence of aliasing in a locally stationary wavelet series at a fixed location, and illustrate its application on simulated data and a wind energy time series. A useful by-product is a new test for local white noise. The tests are robust with respect to model misspecification in that the analysis and synthesis wavelets do not need to be identical. Hence, in principle, the tests work irrespective of which wavelet is used to analyse the time series, although in practice there is a trade-off between increasing statistical power and time localization of the test. PubDate: Mon, 24 Sep 2018 00:00:00 GMT DOI: 10.1093/biomet/asy040 Issue No: Vol. 105, No. 4 (2018)
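The distortion caused by dyadic subsampling is easy to see in a plain Fourier toy example (the paper works in the locally stationary wavelet framework, which this sketch does not attempt to reproduce): a tone above the post-subsampling Nyquist frequency folds back to a spurious low frequency.

```python
import numpy as np

# A tone at 100 cycles per 256 samples is resolvable at the full rate,
# but after keeping every second sample (dyadic subsampling) the Nyquist
# limit drops to 64 cycles per 128 samples, and the tone aliases to
# 128 - 100 = 28 cycles.
n = 256
t = np.arange(n)
x = np.cos(2 * np.pi * 100 * t / n)

peak_full = np.argmax(np.abs(np.fft.rfft(x))[1:]) + 1      # bin 100
x_sub = x[::2]                                             # length 128
peak_sub = np.argmax(np.abs(np.fft.rfft(x_sub))[1:]) + 1   # bin 28
```

The subsampled series is indistinguishable from one genuinely oscillating at 28 cycles, which is why a formal test for the absence of aliasing is needed.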

Authors: Basse G; Airoldi E. Pages: 849 - 858 Abstract: In this paper we consider how to assign treatment in a randomized experiment in which the correlation among the outcomes is informed by a network available pre-intervention. Working within the potential outcome causal framework, we develop a class of models that posit such a correlation structure among the outcomes. We use these models to develop restricted randomization strategies for allocating treatment optimally, by minimizing the mean squared error of the estimated average treatment effect. Analytical decompositions of the mean squared error, due both to the model and to the randomization distribution, provide insights into aspects of the optimal designs. In particular, the analysis suggests new notions of balance based on specific network quantities, in addition to classical covariate balance. The resulting balanced optimal restricted randomization strategies are still design-unbiased when the model used to derive them does not hold. We illustrate how the proposed treatment allocation strategies improve on allocations that ignore the network structure. PubDate: Mon, 06 Aug 2018 00:00:00 GMT DOI: 10.1093/biomet/asy036 Issue No: Vol. 105, No. 4 (2018)

Authors: Wang L; Van Keilegom I; Maidman A. Pages: 859 - 872 Abstract: We consider a heteroscedastic regression model in which some of the regression coefficients are zero but it is not known which ones. Penalized quantile regression is a useful approach for analysing such data. By allowing different covariates to be relevant for modelling conditional quantile functions at different quantile levels, it provides a more complete picture of the conditional distribution of a response variable than mean regression. Existing work on penalized quantile regression has mostly focused on point estimation. Although bootstrap procedures have recently been shown to be effective for inference for penalized mean regression, they are not directly applicable to penalized quantile regression with heteroscedastic errors. We prove that a wild residual bootstrap procedure for unpenalized quantile regression is asymptotically valid for approximating the distribution of a penalized quantile regression estimator with an adaptive $L_1$ penalty and that a modified version can be used to approximate the distribution of an $L_1$-penalized quantile regression estimator. The new methods do not require estimation of the unknown error density function. We establish consistency, demonstrate finite-sample performance, and illustrate the application with a real data example. PubDate: Tue, 14 Aug 2018 00:00:00 GMT DOI: 10.1093/biomet/asy037 Issue No: Vol. 105, No. 4 (2018)

Authors: Lee S; Wu Y. Pages: 873 - 890 Abstract: We propose a general bootstrap recipe for estimating the distributions of post-model-selection least squares estimators under a linear regression model. The recipe constrains residual bootstrapping within the most parsimonious, approximately correct, models to yield a distribution estimator which is consistent provided any wrong candidate model is sufficiently separated from the approximately correct ones. Our theory applies to a broad class of model selection methods based on information criteria or sparse estimation. The empirical performance of our procedure is illustrated with simulated data. PubDate: Tue, 25 Sep 2018 00:00:00 GMT DOI: 10.1093/biomet/asy046 Issue No: Vol. 105, No. 4 (2018)

Authors: Wei S; Kosorok M. Pages: 891 - 903 Abstract: We propose a projection pursuit technique in survival analysis for finding lower-dimensional projections that exhibit differentiated survival outcomes. This idea is formally introduced as the change-plane Cox model, a nonregular Cox model with a change-plane in the covariate space that divides the population into two subgroups whose hazards are proportional. The proposed technique offers a potential framework for principled subgroup discovery. Estimation of the change-plane is accomplished via likelihood maximization over a data-driven sieve constructed using sliced inverse regression. Consistency of the sieve procedure for the change-plane parameters is established. In simulations the sieve estimator demonstrates better classification performance for subgroup identification than alternatives. PubDate: Wed, 17 Oct 2018 00:00:00 GMT DOI: 10.1093/biomet/asy050 Issue No: Vol. 105, No. 4 (2018)

Authors: Ryalen P; Stensrud M; Røysland K. Pages: 905 - 916 Abstract: Time-to-event outcomes are often evaluated on the hazard scale, but interpreting hazards may be difficult. Recently, concerns have been raised in the causal inference literature that hazards have a built-in selection bias that prevents simple causal interpretations. This is a problem even in randomized controlled trials, where hazard ratios have become a standard measure of treatment effects. Modelling on the hazard scale is nevertheless convenient, for example to adjust for covariates; using hazards for intermediate calculations may therefore be desirable. In this paper we present a generic method for transforming hazard estimates consistently to other scales at which these built-in selection biases are avoided. The method is based on differential equations and generalizes a well-known relation between the Nelson–Aalen and Kaplan–Meier estimators. Using the martingale central limit theorem, we show that covariances can be estimated consistently for a large class of estimators, thus allowing for rapid calculation of confidence intervals. Hence, given cumulative hazard estimates based on, for example, Aalen's additive hazard model, we can obtain many other parameters without much more effort. We give several examples and the associated estimators. Coverage and convergence speed are explored via simulations, and the results suggest that reliable estimates can be obtained in real-life scenarios. PubDate: Tue, 21 Aug 2018 00:00:00 GMT DOI: 10.1093/biomet/asy035 Issue No: Vol. 105, No. 4 (2018)
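The "well-known relation" the paper generalizes is that the Kaplan–Meier survival estimate is the product integral of the Nelson–Aalen cumulative-hazard increments, i.e., the solution of dS = -S dA. A minimal numerical sketch, with made-up event counts and risk-set sizes:

```python
import numpy as np

# Event counts d and numbers at risk r at the ordered event times.
d = np.array([2, 1, 3, 1, 2])
r = np.array([20, 18, 17, 14, 13])

dA = d / r                          # Nelson-Aalen hazard increments
nelson_aalen = np.cumsum(dA)        # cumulative hazard estimate A(t)
kaplan_meier = np.cumprod(1 - dA)   # product-integral transform: S(t)
```

Since 1 - x <= exp(-x), the survival curve obtained this way always lies at or below exp(-A(t)); the two agree in the limit of small increments, which is the continuous-time version of the relation. The paper's contribution is a general ODE-based recipe for many such transformations, together with consistent covariance estimates.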

Authors: Li Q; Li L. Pages: 917 - 930 Abstract: Multiple types of data measured on a common set of subjects arise in many areas. Numerous empirical studies have found that integrative analysis of such data can result in better statistical performance in terms of prediction and feature selection. However, the advantages of integrative analysis have mostly been demonstrated empirically. In the context of two-class classification, we propose an integrative linear discriminant analysis method and establish a theoretical guarantee that it achieves a smaller classification error than running linear discriminant analysis on each data type individually. We address the issues of outliers and missing values, frequently encountered in integrative analysis, and illustrate our method through simulations and a neuroimaging study of Alzheimer's disease. PubDate: Mon, 22 Oct 2018 00:00:00 GMT DOI: 10.1093/biomet/asy047 Issue No: Vol. 105, No. 4 (2018)

Authors: Picard F; Reynaud-Bouret P; Roquain E. Pages: 931 - 944 Abstract: We propose a continuous testing framework to test the intensities of Poisson processes that allows a rigorous definition of the complete testing procedure, from an infinite number of hypotheses to joint error rates. Our work extends procedures based on scanning windows by controlling the familywise error rate and the false discovery rate in a non-asymptotic and continuous manner. We introduce the $p$-value process on which the decision rule is based. Our method is applied in neuroscience via the standard homogeneity and two-sample tests. PubDate: Tue, 18 Sep 2018 00:00:00 GMT DOI: 10.1093/biomet/asy044 Issue No: Vol. 105, No. 4 (2018)

Authors: Zhang X; Chiou J; Ma Y. Pages: 945 - 962 Abstract: Prediction is often the primary goal of data analysis. In this work, we propose a novel model averaging approach to the prediction of a functional response variable. We develop a cross-validation model averaging estimator based on functional linear regression models in which the response and the covariate are both treated as random functions. We show that the weights chosen by the method are asymptotically optimal in the sense that the squared error loss of the predicted function is as small as that of the infeasible best possible averaged function. When the true regression relationship belongs to the set of candidate functional linear regression models, the averaged estimator converges to the true model and can estimate the regression parameter functions at the same rate as under the true model. Monte Carlo studies and a data example indicate that in most cases the approach performs better than model selection. PubDate: Wed, 26 Sep 2018 00:00:00 GMT DOI: 10.1093/biomet/asy041 Issue No: Vol. 105, No. 4 (2018)
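The cross-validation model averaging idea transfers to a scalar toy setting that is easy to sketch (the paper's setting is functional; the two candidate models, the grid search over a single weight, and the data-generating process below are simplifying assumptions):

```python
import numpy as np

def loo_residuals(X, y):
    """Leave-one-out residuals of OLS via the hat-matrix shortcut
    e_i / (1 - h_ii), avoiding n separate refits."""
    H = X @ np.linalg.solve(X.T @ X, X.T)
    e = y - H @ y
    return e / (1 - np.diag(H))

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + 0.5 * x ** 2 + rng.normal(size=n)

X1 = np.column_stack([np.ones(n), x])            # linear candidate
X2 = np.column_stack([np.ones(n), x, x ** 2])    # quadratic candidate
e1, e2 = loo_residuals(X1, y), loo_residuals(X2, y)

# Choose the averaging weight to minimize the cross-validation criterion.
grid = np.linspace(0, 1, 101)
cv = [np.mean((w * e1 + (1 - w) * e2) ** 2) for w in grid]
w_hat = grid[np.argmin(cv)]
```

Because the grid contains the pure-model endpoints w = 0 and w = 1, the averaged predictor's cross-validation risk can never exceed that of either candidate alone, which is the intuition behind averaging beating selection.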

Authors: Ertefaie A; Strawderman R. Pages: 963 - 977 Abstract: Existing methods for estimating optimal dynamic treatment regimes are limited to cases where a utility function is optimized over a fixed time period. We develop an estimation procedure for the optimal dynamic treatment regime over an indefinite time period and derive associated large-sample results. The proposed method can be used to estimate the optimal dynamic treatment regime in chronic disease settings. We illustrate this by simulating a dataset corresponding to a cohort of patients with diabetes that mimics the third wave of the National Health and Nutrition Examination Survey, and examining the performance of the proposed method in controlling the level of haemoglobin A1c. PubDate: Mon, 17 Sep 2018 00:00:00 GMT DOI: 10.1093/biomet/asy043 Issue No: Vol. 105, No. 4 (2018)

Authors: Forastiere L; Mattei A; Ding P. Pages: 979 - 986 Abstract: In causal mediation analysis, the definitions of the natural direct and indirect effects involve potential outcomes that can never be observed, so-called a priori counterfactuals. This conceptual challenge translates into issues in identification, which requires strong and often unverifiable assumptions, including sequential ignorability. Alternatively, we can deal with post-treatment variables using the principal stratification framework, where causal effects are defined as comparisons of observable potential outcomes. We establish a novel bridge between mediation analysis and principal stratification, which helps to clarify and weaken the commonly used identifying assumptions for natural direct and indirect effects. Using principal stratification, we show how sequential ignorability extrapolates from observable potential outcomes to a priori counterfactuals, and propose alternative weaker principal ignorability-type assumptions. We illustrate the key concepts using a clinical trial. PubDate: Mon, 22 Oct 2018 00:00:00 GMT DOI: 10.1093/biomet/asy053 Issue No: Vol. 105, No. 4 (2018)

Authors: Miao W; Geng Z; Tchetgen Tchetgen E. Pages: 987 - 993 Abstract: We consider a causal effect that is confounded by an unobserved variable, but for which observed proxy variables of the confounder are available. We show that with at least two independent proxy variables satisfying a certain rank condition, the causal effect can be nonparametrically identified, even if the measurement error mechanism, i.e., the conditional distribution of the proxies given the confounder, may not be identified. Our result generalizes the identification strategy of Kuroki & Pearl (2014), which rests on identification of the measurement error mechanism. When only one proxy for the confounder is available, or when the required rank condition is not met, we develop a strategy for testing the null hypothesis of no causal effect. PubDate: Mon, 13 Aug 2018 00:00:00 GMT DOI: 10.1093/biomet/asy038 Issue No: Vol. 105, No. 4 (2018)

Authors: Fogarty C. Pages: 994 - 1000 Abstract: In paired randomized experiments, individuals in a given matched pair may differ on prognostically important covariates despite the best efforts of practitioners. We examine the use of regression adjustment to correct for persistent covariate imbalances after randomization, and present two regression-assisted estimators for the sample average treatment effect in paired experiments. Using the potential outcomes framework, we prove that these estimators are consistent for the sample average treatment effect under mild regularity conditions even if the regression model is improperly specified, and describe how asymptotically conservative confidence intervals can be constructed. We demonstrate that the variances of the regression-assisted estimators are no larger than that of the standard difference-in-means estimator asymptotically, and illustrate the proposed methods by simulation. The analysis does not require a superpopulation model, a constant treatment effect, or the truth of the regression model, and hence provides inference for the sample average treatment effect with the potential to increase power without unrealistic assumptions. PubDate: Fri, 29 Jun 2018 00:00:00 GMT DOI: 10.1093/biomet/asy034 Issue No: Vol. 105, No. 4 (2018)
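A toy version of pair-level regression adjustment shows where the variance gain comes from (this is a simplified sketch under an assumed linear outcome model, not the paper's exact estimators): regress the treated-minus-control outcome differences on the centered covariate differences. The intercept then coincides with the difference-in-means estimate, while the residual-based standard error can only shrink when the covariate is predictive.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pairs = 50
tau = 1.0
d = rng.normal(size=n_pairs)                    # within-pair covariate differences
y = tau + 2.0 * d + rng.normal(size=n_pairs)    # treated-minus-control outcomes

# Difference-in-means: the average of the within-pair outcome differences.
tau_dm = y.mean()
se_dm = y.std(ddof=1) / np.sqrt(n_pairs)

# Regression-assisted: regress the outcome differences on the centered
# covariate differences; the intercept is the adjusted effect estimate.
D = np.column_stack([np.ones(n_pairs), d - d.mean()])
coef, *_ = np.linalg.lstsq(D, y, rcond=None)
resid = y - D @ coef
tau_adj = coef[0]
se_adj = np.sqrt(resid @ resid / (n_pairs - 2)) / np.sqrt(n_pairs)
```

Here the adjustment leaves the point estimate untouched but replaces the raw outcome variance with the (smaller) residual variance in the standard error, consistent with the abstract's claim that the regression-assisted variances are asymptotically no larger than the difference-in-means variance.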