Hybrid journal (may contain open access articles). ISSN (Print) 0006-3444; ISSN (Online) 1464-3510. Published by Oxford University Press.

Authors: Xia Y., Cai T., Li H. Pages: 249 - 269 Abstract: Multivariate regression with high-dimensional covariates has many applications in genomic and genetic research, in which some covariates are expected to be associated with multiple responses. This paper considers joint testing for regression coefficients over multiple responses and develops simultaneous testing methods with false discovery rate control. The test statistic is based on inverse regression and bias-corrected group lasso estimates of the regression coefficients and is shown to have an asymptotic chi-squared null distribution. A row-wise multiple testing procedure is developed to identify the covariates associated with the responses. The procedure is shown to control the false discovery proportion and false discovery rate at a prespecified level asymptotically. Simulations demonstrate the gain in power, relative to entrywise testing, in detecting the covariates associated with the responses. The test is applied to an ovarian cancer dataset to identify the microRNA regulators that regulate protein expression. PubDate: Fri, 16 Feb 2018 00:00:00 GMT DOI: 10.1093/biomet/asx085 Issue No: Vol. 105, No. 2 (2018)
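The abstract does not spell out the row-wise procedure, so as a generic illustration of the kind of false discovery rate control targeted here, the following is a minimal Benjamini–Hochberg step-up on a vector of p-values; this is a textbook stand-in, not the authors' chi-squared-based procedure.

```python
# Hedged sketch: the classical Benjamini-Hochberg step-up procedure, shown
# only as a generic illustration of false discovery rate control; the paper's
# row-wise procedure uses chi-squared statistics from bias-corrected group
# lasso estimates, which are not reproduced here.

def benjamini_hochberg(pvalues, alpha=0.05):
    """Return indices of hypotheses rejected at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0  # largest rank k with p_(k) <= (k / m) * alpha
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            k = rank
    return sorted(order[:k])

# Example: two genuinely small p-values among mostly null-looking ones.
pvals = [0.001, 0.008, 0.039, 0.041, 0.27, 0.34, 0.46, 0.62, 0.78, 0.91]
print(benjamini_hochberg(pvals, alpha=0.05))  # → [0, 1]
```

The step-up structure matters: hypothesis 2 (p = 0.039) survives a marginal 0.05 threshold but not the rank-adjusted one, which is what keeps the expected proportion of false rejections at the nominal level.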

Authors: Avella-Medina M., Battey H., Fan J., et al. Pages: 271 - 284 Abstract: High-dimensional data are often most plausibly generated from distributions with complex structure and leptokurtosis in some or all components. Covariance and precision matrices provide a useful summary of such structure, yet the performance of popular matrix estimators typically hinges upon a sub-Gaussianity assumption. This paper presents robust matrix estimators whose performance is guaranteed for a much richer class of distributions. The proposed estimators, under a bounded fourth moment assumption, achieve the same minimax convergence rates as do existing methods under a sub-Gaussianity assumption. Consistency of the proposed estimators is also established under the weak assumption of bounded $2+\epsilon$ moments for $\epsilon\in (0,2)$. The associated convergence rates depend on $\epsilon$. PubDate: Tue, 27 Mar 2018 00:00:00 GMT DOI: 10.1093/biomet/asy011 Issue No: Vol. 105, No. 2 (2018)

Authors: Li W., Fearnhead P. Pages: 285 - 299 Abstract: Many statistical applications involve models for which it is difficult to evaluate the likelihood, but from which it is relatively easy to sample. Approximate Bayesian computation is a likelihood-free method for implementing Bayesian inference in such cases. We present results on the asymptotic variance of estimators obtained using approximate Bayesian computation in a large data limit. Our key assumption is that the data are summarized by a fixed-dimensional summary statistic that obeys a central limit theorem. We prove asymptotic normality of the mean of the approximate Bayesian computation posterior. This result also shows that, in terms of asymptotic variance, we should use a summary statistic that is of the same dimension as the parameter vector, $p$, and that any summary statistic of higher dimension can be reduced, through a linear transformation, to dimension $p$ in a way that can only reduce the asymptotic variance of the posterior mean. We look at how the Monte Carlo error of an importance sampling algorithm that samples from the approximate Bayesian computation posterior affects the accuracy of estimators. We give conditions on the importance sampling proposal distribution such that the variance of the estimator will be of the same order as that of the maximum likelihood estimator based on the summary statistics used. This suggests an iterative importance sampling algorithm, which we evaluate empirically on a stochastic volatility model. PubDate: Sat, 20 Jan 2018 00:00:00 GMT DOI: 10.1093/biomet/asx078 Issue No: Vol. 105, No. 2 (2018)
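To fix ideas, here is plain rejection ABC on a toy model (normal data with unknown mean, sample mean as the summary statistic); this illustrates only the likelihood-free mechanism the paper analyses, not its iterative importance sampling algorithm, and the prior and tolerance below are arbitrary choices.

```python
# Hedged sketch: rejection ABC on a toy normal-mean model. The paper studies
# the asymptotics of such estimators and proposes an importance sampling
# refinement that is not reproduced here; prior, tolerance eps and sample
# sizes below are illustrative assumptions only.
import random
import statistics

def abc_rejection(s_obs, n_data, n_draws=2000, eps=0.05, seed=1):
    random.seed(seed)
    accepted = []
    for _ in range(n_draws):
        theta = random.uniform(-5, 5)  # flat prior on the mean
        # simulate data, reduce to the summary statistic (sample mean)
        s_sim = statistics.fmean(random.gauss(theta, 1.0)
                                 for _ in range(n_data))
        if abs(s_sim - s_obs) < eps:   # keep draws whose summary is close
            accepted.append(theta)
    return accepted

post = abc_rejection(s_obs=1.0, n_data=100)
# The accepted draws approximate the posterior; their mean sits near s_obs.
print(len(post), round(statistics.fmean(post), 2))
```

Shrinking `eps` reduces the approximation bias but also the acceptance rate, which is exactly the bias/Monte Carlo error trade-off the paper's conditions on the proposal distribution address.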

Authors: Li W., Fearnhead P. Pages: 301 - 318 Abstract: We present asymptotic results for the regression-adjusted version of approximate Bayesian computation introduced by Beaumont et al. (2002). We show that for an appropriate choice of the bandwidth, regression adjustment will lead to a posterior that, asymptotically, correctly quantifies uncertainty. Furthermore, for such a choice of bandwidth we can implement an importance sampling algorithm to sample from the posterior whose acceptance probability tends to unity as the data sample size increases. This compares favourably to results for standard approximate Bayesian computation, where the only way to obtain a posterior that correctly quantifies uncertainty is to choose a much smaller bandwidth, one for which the acceptance probability tends to zero and hence for which Monte Carlo error will dominate. PubDate: Sat, 27 Jan 2018 00:00:00 GMT DOI: 10.1093/biomet/asx081 Issue No: Vol. 105, No. 2 (2018)
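The Beaumont-style adjustment the paper studies replaces each accepted draw $\theta_i$ by $\theta_i - \hat\beta(s_i - s_{\mathrm{obs}})$, where $\hat\beta$ is the least-squares slope of $\theta$ on the summary $s$. A one-dimensional toy version, with a synthetic accepted sample rather than the paper's bandwidth-calibrated algorithm:

```python
# Hedged sketch: linear regression adjustment of ABC draws,
# theta_adj = theta - beta_hat * (s - s_obs). The "accepted" sample below is
# synthetic (s = theta + noise) purely to show the variance reduction; it is
# not the paper's calibrated sampler.
import random
import statistics

random.seed(7)
s_obs = 0.0
theta = [random.gauss(0, 1) for _ in range(500)]       # accepted parameters
s = [t + random.gauss(0, 0.5) for t in theta]          # their summaries

# Least-squares slope of theta on s.
s_bar, t_bar = statistics.fmean(s), statistics.fmean(theta)
beta = (sum((si - s_bar) * (ti - t_bar) for si, ti in zip(s, theta))
        / sum((si - s_bar) ** 2 for si in s))

# Shift each draw towards what it "would have been" had s equalled s_obs.
theta_adj = [ti - beta * (si - s_obs) for ti, si in zip(theta, s)]
print(round(statistics.pvariance(theta), 2),
      round(statistics.pvariance(theta_adj), 2))
```

With this toy generating mechanism the population slope is $1/1.25 = 0.8$ and the adjusted draws have markedly smaller variance than the raw ones, which is the mechanism that lets regression adjustment tolerate a larger bandwidth.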

Authors: Yu C., Hoff P. Pages: 319 - 335 Abstract: Commonly used interval procedures for multigroup data attain their nominal coverage rates across a population of groups on average, but their actual coverage rate for a given group will be above or below the nominal rate, depending on the group mean. While correct coverage for a given group can be achieved with a standard $t$-interval, this approach is not adaptive to the available information about the distribution of group-specific means. In this article we construct confidence intervals that have a constant frequentist coverage rate and that make use of information about across-group heterogeneity, resulting in constant-coverage intervals that are narrower than standard $t$-intervals on average across groups. Such intervals are constructed by inverting biased Bayes-optimal tests for the mean of each group, where the prior distribution for a given group is estimated with data from the other groups. PubDate: Wed, 11 Apr 2018 00:00:00 GMT DOI: 10.1093/biomet/asy009 Issue No: Vol. 105, No. 2 (2018)

Authors: Fokianos K., Pitsillou M. Pages: 337 - 352 Abstract: We introduce the matrix multivariate auto-distance covariance and correlation functions for time series, discuss their interpretation and develop consistent estimators for practical implementation. We also develop a test of the independent and identically distributed hypothesis for multivariate time series data and show that it performs better than the multivariate Ljung–Box test. We discuss computational aspects and present a data example to illustrate the method. PubDate: Sat, 20 Jan 2018 00:00:00 GMT DOI: 10.1093/biomet/asx082 Issue No: Vol. 105, No. 2 (2018)
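The building block here is distance covariance, which, unlike ordinary covariance, vanishes only under independence. A minimal scalar version via doubly centred distance matrices (Székely et al.'s sample statistic), shown only to fix ideas; the paper's matrix auto-distance covariance function for lagged multivariate series is not reproduced:

```python
# Hedged sketch: scalar sample distance covariance (squared), computed by
# double-centring the pairwise distance matrices; a building block for, not
# an implementation of, the paper's auto-distance covariance function.
def distance_covariance_sq(x, y):
    n = len(x)
    a = [[abs(x[i] - x[j]) for j in range(n)] for i in range(n)]
    b = [[abs(y[i] - y[j]) for j in range(n)] for i in range(n)]

    def centre(m):
        row = [sum(r) / n for r in m]          # row means (m is symmetric)
        grand = sum(row) / n
        return [[m[i][j] - row[i] - row[j] + grand for j in range(n)]
                for i in range(n)]

    A, B = centre(a), centre(b)
    return sum(A[i][j] * B[i][j] for i in range(n) for j in range(n)) / n ** 2

# A nonlinear, zero-correlation dependence that distance covariance detects.
x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]
print(distance_covariance_sq(x, y) > 0)
```

Against a constant `y` the statistic is exactly zero, because the centred distance matrix of a constant sample is the zero matrix.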

Authors: Chang J., Delaigle A., Hall P., et al. Pages: 353 - 369 Abstract: Data observed at a high sampling frequency are typically assumed to be an additive composite of a relatively slow-varying continuous-time component, a latent stochastic process or smooth random function, and measurement error. Supposing that the latent component is an Itô diffusion process, we propose to estimate the measurement error density function by applying a deconvolution technique with appropriate localization. Our estimator, which does not require equally-spaced observed times, is consistent and minimax rate-optimal. We also investigate estimators of the moments of the error distribution and their properties, propose a frequency domain estimator for the integrated volatility of the underlying stochastic process, and show that it achieves the optimal convergence rate. Simulations and an application to real data validate our analysis. PubDate: Tue, 06 Mar 2018 00:00:00 GMT DOI: 10.1093/biomet/asy006 Issue No: Vol. 105, No. 2 (2018)

Authors: Massam H., Li Q., Gao X. Pages: 371 - 388 Abstract: Graphical Gaussian models with edge and vertex symmetries were introduced by Højsgaard & Lauritzen (2008), who gave an algorithm for computing the maximum likelihood estimate of the precision matrix for such models. In this paper, we take a Bayesian approach to its estimation. We consider only models with symmetry constraints, which thus form a natural exponential family with the precision matrix as the canonical parameter. We identify the Diaconis–Ylvisaker conjugate prior for these models, develop a scheme to sample from the prior and posterior distributions, and thus obtain estimates of the posterior mean of the precision and covariance matrices. Such a sampling scheme is essential for model selection in coloured graphical Gaussian models. In order to verify the precision of our estimates, we derive an analytic expression for the expected value of the precision matrix when the graph underlying our model is a tree, a complete graph on three vertices, or a decomposable graph on four vertices with various symmetries, and we compare our estimates with the posterior mean of the precision matrix and the expected mean of the coloured graphical Gaussian model, that is, of the covariance matrix. We also verify the accuracy of our estimates on simulated data. PubDate: Thu, 22 Feb 2018 00:00:00 GMT DOI: 10.1093/biomet/asx084 Issue No: Vol. 105, No. 2 (2018)

Authors: Jung S., Lee M., Ahn J. Pages: 389 - 402 Abstract: We consider how many components to retain in principal component analysis when the dimension is much higher than the number of observations. To estimate the number of components, we propose to sequentially test skewness of the squared lengths of residual scores that are obtained by removing leading principal components. The residual lengths are asymptotically left-skewed if all principal components with diverging variances are removed, and right-skewed otherwise. The proposed estimator is shown to be consistent, performs well in high-dimensional simulation studies, and provides reasonable estimates in examples. PubDate: Mon, 19 Mar 2018 00:00:00 GMT DOI: 10.1093/biomet/asy010 Issue No: Vol. 105, No. 2 (2018)
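The test statistic the procedure examines is the sample skewness of the squared residual lengths, with a sign flip (right- to left-skewed) signalling that all large components have been removed. Only the moment-based skewness statistic itself is sketched here; the residual-score computation and the sequential stopping rule are left to the paper.

```python
# Hedged sketch: moment-based sample skewness, the statistic whose sign the
# paper's sequential test inspects. The PCA residual computation and the
# stopping rule are not reproduced; the data below are an arbitrary example.
import statistics

def sample_skewness(v):
    n = len(v)
    m = statistics.fmean(v)
    s2 = sum((x - m) ** 2 for x in v) / n   # population variance
    m3 = sum((x - m) ** 3 for x in v) / n   # third central moment
    return m3 / s2 ** 1.5

# A long right tail gives positive skewness; the mirror image, negative.
right_tailed = [1, 1, 2, 2, 3, 10]
print(sample_skewness(right_tailed) > 0,
      sample_skewness([-x for x in right_tailed]) < 0)  # → True True
```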

Authors: Diao G., Zeng D., Ke C., et al. Pages: 403 - 418 Abstract: Composite endpoints with censored data are commonly used as study outcomes in clinical trials. For example, progression-free survival is a widely used composite endpoint, with disease progression and death as the two components. Progression-free survival time is often defined as the time from randomization to the earlier occurrence of disease progression or death from any cause. The censoring times of the two components could be different for patients not experiencing the endpoint event. Conventional approaches, such as taking the minimum of the censoring times of the two components as the censoring time for progression-free survival time, may suffer from efficiency loss and could produce biased estimates of the treatment effect. We propose a new likelihood-based approach that decomposes the endpoints and models both the progression-free survival time and the time from disease progression to death. The censoring times for different components are distinguished. The approach makes full use of available information and provides a direct and improved estimate of the treatment effect on progression-free survival time. Simulations demonstrate that the proposed method outperforms several other approaches and is robust against various model misspecifications. An application to a prostate cancer clinical trial is provided. PubDate: Mon, 30 Apr 2018 00:00:00 GMT DOI: 10.1093/biomet/asy013 Issue No: Vol. 105, No. 2 (2018)

Authors: Fattorini L., Marcheselli M., Pisani C., et al. Pages: 419 - 429 Abstract: We analyse the estimation of values of a survey variable throughout a continuum of points in a study area when a sample of points is selected by a probabilistic sampling scheme. At each point, the value is estimated using an inverse distance weighting interpolator, and maps of the survey variable can be obtained. We investigate the design-based asymptotic properties of the interpolator when the study area remains fixed and the number of sampled points approaches infinity, and we derive conditions ensuring design-based asymptotic unbiasedness and consistency. The conditions essentially require the existence of a pointwise or uniformly continuous function describing the behaviour of the survey variable and the use of spatially balanced designs to select points. Finally, we propose a computationally simple mean squared error estimator. PubDate: Wed, 11 Apr 2018 00:00:00 GMT DOI: 10.1093/biomet/asy012 Issue No: Vol. 105, No. 2 (2018)
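The interpolator itself is simple to state: the estimate at a point is a weighted average of the sampled values, with weights decaying in distance. A minimal version with power-2 weights (the power and the example configuration are illustrative choices, not taken from the paper):

```python
# Hedged sketch: a plain inverse distance weighting interpolator of the kind
# the paper studies, with power-2 weights on Euclidean distance; the sampling
# design and the design-based asymptotics are of course not captured here.
import math

def idw(point, sample_points, sample_values, power=2.0):
    """Interpolate the survey variable at `point` from the sampled points."""
    num, den = 0.0, 0.0
    for p, v in zip(sample_points, sample_values):
        d = math.dist(point, p)
        if d == 0.0:
            return v               # exact hit: return the observed value
        w = d ** -power
        num += w * v
        den += w
    return num / den

pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
vals = [0.0, 1.0, 1.0, 2.0]
# The centre is equidistant from all four points, so this is a plain average.
print(idw((0.5, 0.5), pts, vals))
```

Because weights blow up as the distance shrinks, the interpolator is exact at sampled points, which is why mapping the surveyed variable over the whole area reproduces the observations.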

Authors: Johndrow J., Lum K., Dunson D. Pages: 431 - 446 Abstract: There has been substantial recent interest in record linkage, where one attempts to group the records pertaining to the same entities from one or more large databases that lack unique identifiers. This can be viewed as a type of microclustering, with few observations per cluster and a very large number of clusters. We show that the problem is fundamentally hard from a theoretical perspective and, even in idealized cases, accurate entity resolution is effectively impossible unless the number of entities is small relative to the number of records and/or the separation between records from different entities is extremely large. These results suggest conservatism in interpretation of the results of record linkage, support collection of additional data to more accurately disambiguate the entities, and motivate a focus on coarser inference. For example, results from a simulation study suggest that sometimes one may obtain accurate results for population size estimation even when fine-scale entity resolution is inaccurate. PubDate: Mon, 19 Mar 2018 00:00:00 GMT DOI: 10.1093/biomet/asy003 Issue No: Vol. 105, No. 2 (2018)

Authors: Mondal D. Pages: 447 - 454 Abstract: This paper discusses edge correction for a large class of conditional and intrinsic autoregressions on two-dimensional finite regular arrays. The proposed method includes a novel reparameterization, retains the simple neighbourhood structure, ensures the nonnegative definiteness of the precision matrix, and enables scalable matrix-free statistical computation. The edge correction provides new insight into how higher-order differencing enters into the precision matrix of a conditional autoregression. PubDate: Wed, 25 Apr 2018 00:00:00 GMT DOI: 10.1093/biomet/asy014 Issue No: Vol. 105, No. 2 (2018)

Authors: Cronie O., Van Lieshout M. Pages: 455 - 462 Abstract: We propose a new bandwidth selection method for kernel estimators of spatial point process intensity functions. The method is based on an optimality criterion motivated by the Campbell formula applied to the reciprocal intensity function. The new method is fully nonparametric, does not require knowledge of higher-order moments, and is not restricted to a specific class of point process. Our approach is computationally straightforward and does not require numerical approximation of integrals. PubDate: Fri, 16 Feb 2018 00:00:00 GMT DOI: 10.1093/biomet/asy001 Issue No: Vol. 105, No. 2 (2018)
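The Campbell-formula motivation can be made concrete: for the true intensity, the sum of reciprocal intensities over the observed points has expectation equal to the window area $|W|$, so one can pick the bandwidth at which the estimated version of that sum matches $|W|$. A crude toy version, using a Gaussian kernel with no edge correction (both assumptions, not the estimator analysed in the paper):

```python
# Hedged sketch: choose the bandwidth h so that
# sum_i 1 / lambda_h(x_i) is closest to the window area |W|, the
# Campbell-formula heuristic behind the method. Gaussian kernel, no edge
# correction, grid search over candidates: all simplifying assumptions.
import math

def kernel_intensity(x, points, h):
    """Gaussian kernel intensity estimate at x (no edge correction)."""
    c = 1.0 / (2.0 * math.pi * h * h)
    return sum(c * math.exp(-math.dist(x, p) ** 2 / (2.0 * h * h))
               for p in points)

def cvl_bandwidth(points, area, candidates):
    """Pick the h whose reciprocal-intensity sum best matches the area."""
    def discrepancy(h):
        return abs(sum(1.0 / kernel_intensity(p, points, h)
                       for p in points) - area)
    return min(candidates, key=discrepancy)

# A small point pattern on the unit square (area 1).
pts = [(0.1, 0.2), (0.3, 0.7), (0.5, 0.4), (0.6, 0.9),
       (0.8, 0.1), (0.9, 0.6), (0.2, 0.5), (0.7, 0.3)]
h_opt = cvl_bandwidth(pts, area=1.0,
                      candidates=[0.05 * k for k in range(1, 20)])
print(h_opt)
```

Small bandwidths make each point dominate its own intensity estimate (large $\lambda$, small reciprocal sum), while large bandwidths flatten the estimate, so the discrepancy typically crosses zero at an interior candidate.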

Authors: Kong X., Xu S., Zhou W. Pages: 463 - 469 Abstract: Volatility functionals are widely used in financial econometrics. In the literature, they are estimated with realized volatility functionals using high-frequency data. In this paper we introduce a nonparametric local bootstrap method that resamples the high-frequency returns with replacement in local windows shrinking to zero. While the block bootstrap in time series (Hall et al., 1995) aims to reduce correlation, the local bootstrap is intended to eliminate the heterogeneity of volatility. We prove that the local bootstrap distribution of the studentized realized volatility functional is first-order accurate. We present Edgeworth expansions of the studentized realized variance with and without local bootstrapping in the absence of the leverage effect and jumps. The expansions show that the local bootstrap distribution of the studentized realized variance is second-order accurate. Extensive simulation studies verify the theory. PubDate: Thu, 01 Mar 2018 00:00:00 GMT DOI: 10.1093/biomet/asy002 Issue No: Vol. 105, No. 2 (2018)
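The resampling step is easy to picture: returns are redrawn with replacement only within local windows, so a calm stretch stays calm and a volatile stretch stays volatile. A minimal sketch with fixed non-overlapping windows (the window scheme and sizes are illustrative assumptions; the paper's windows shrink asymptotically):

```python
# Hedged sketch: resample high-frequency returns with replacement inside
# non-overlapping local windows, preserving each window's volatility level;
# the studentization and Edgeworth analysis are not reproduced, and the
# fixed window size is an illustrative simplification.
import random

def local_bootstrap(returns, window, seed=0):
    rng = random.Random(seed)
    out = []
    for start in range(0, len(returns), window):
        block = returns[start:start + window]
        # draw only from this window, keeping local heterogeneity intact
        out.extend(rng.choice(block) for _ in block)
    return out

rets = [0.01, -0.02, 0.015, -0.01,   # calm window
        0.08, -0.11, 0.09, -0.07]    # volatile window
resampled = local_bootstrap(rets, window=4)
print(resampled)
```

A global bootstrap would mix the two regimes and destroy the volatility pattern; keeping draws local is what makes the bootstrap distribution of the studentized statistic track the true one to higher order.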

Authors: Wang Y., Yang J., Xu H. Pages: 471 - 477 Abstract: Maximin distance designs and orthogonal designs are widely used in computer and physical experiments. We characterize a broad class of maximin distance designs by establishing new bounds on the minimum intersite distance for mirror-symmetric and general U-type designs. We show that maximin distance designs and orthogonal designs are closely related and coincide under some conditions. PubDate: Wed, 28 Feb 2018 00:00:00 GMT DOI: 10.1093/biomet/asy005 Issue No: Vol. 105, No. 2 (2018)
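The quantity being bounded is the minimum intersite distance of a design, the smallest pairwise distance among its runs, which a maximin design maximizes. Computing it is a two-liner (the small design below is an arbitrary illustration, not one from the paper):

```python
# Hedged sketch: the minimum intersite (Euclidean) distance of a design,
# the quantity the paper's bounds concern. The 4-run, 2-factor design below
# is an arbitrary U-type-style example, not taken from the paper.
import itertools
import math

def min_intersite_distance(design):
    return min(math.dist(a, b)
               for a, b in itertools.combinations(design, 2))

# Each column uses each of the levels {0, 1, 2, 3} exactly once.
design = [(0, 1), (1, 3), (2, 0), (3, 2)]
print(min_intersite_distance(design))  # → sqrt(5) ≈ 2.236
```

A maximin distance design is one for which no rearrangement of the levels achieves a larger value of this minimum.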

Authors: Zhao J., Ma Y. Pages: 479 - 486 Abstract: Tang et al. (2003) considered a regression model with missing response, where the missingness mechanism depends on the value of the response variable and hence is nonignorable. They proposed three pseudolikelihood estimators, based on different treatments of the probability distribution of the completely observed covariates. The first assumes the distribution of the covariate to be known, the second estimates this distribution parametrically, and the third estimates the distribution nonparametrically. While it is not hard to show that the second estimator is more efficient than the first, Tang et al. (2003) only conjectured that the third estimator is more efficient than the first two. In this paper, we investigate the asymptotic behaviour of the third estimator by deriving a closed-form representation of its asymptotic variance. We then prove that the third estimator is more efficient than the other two. Our result can be straightforwardly applied to missingness mechanisms that are more general than that in Tang et al. (2003). PubDate: Wed, 28 Feb 2018 00:00:00 GMT DOI: 10.1093/biomet/asy007 Issue No: Vol. 105, No. 2 (2018)

Authors: Yang S., Ding P. Pages: 487 - 493 Abstract: Causal inference with observational studies often relies on the assumptions of unconfoundedness and overlap of covariate distributions in different treatment groups. The overlap assumption is violated when some units have propensity scores close to $0$ or $1$, so both practical and theoretical researchers suggest dropping units with extreme estimated propensity scores. However, existing trimming methods often do not incorporate the uncertainty in this design stage and restrict inference to only the trimmed sample, due to the nonsmoothness of the trimming. We propose a smooth weighting, which approximates sample trimming and has better asymptotic properties. An advantage of our estimator is its asymptotic linearity, which ensures that the bootstrap can be used to make inference for the target population, incorporating uncertainty arising from both design and analysis stages. We extend the theory to the average treatment effect on the treated, suggesting trimming samples with estimated propensity scores close to $1$. PubDate: Mon, 12 Mar 2018 00:00:00 GMT DOI: 10.1093/biomet/asy008 Issue No: Vol. 105, No. 2 (2018)
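The nonsmoothness at issue is the hard indicator $1\{\alpha \le e(x) \le 1-\alpha\}$ on the estimated propensity score. One generic way to smooth it is with logistic ramps at each cut-off; this is a hypothetical smoothing device to illustrate the idea, not the exact weight function proposed in the paper.

```python
# Hedged sketch: a smooth weight approximating the hard trimming indicator
# 1{alpha <= e <= 1 - alpha} on the propensity score e, built from logistic
# ramps with a sharpness parameter eps. A generic smoothing device, not the
# paper's weight function; alpha and eps below are illustrative choices.
import math

def smooth_trim_weight(e, alpha=0.1, eps=0.01):
    def ramp(t):                     # smooth 0-to-1 step around t = 0
        return 1.0 / (1.0 + math.exp(-t / eps))
    return ramp(e - alpha) * ramp((1.0 - alpha) - e)

# Near 0 or 1 the weight vanishes; in the interior it is essentially 1.
for e in (0.01, 0.5, 0.99):
    print(round(smooth_trim_weight(e), 3))  # prints 0.0, then 1.0, then 0.0
```

Because the weight is differentiable in the propensity score, the resulting estimator can be asymptotically linear, which is what licenses the bootstrap inference described in the abstract.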

Authors: Stallard N., Kimani P. Pages: 495 - 501 Abstract: Multi-arm multi-stage clinical trials compare several experimental treatments with a control treatment, with poorly performing treatments dropped at interim analyses. This leads to inferential challenges, including the construction of unbiased treatment effect estimators. A number of estimators that are unbiased conditional on treatment selection have been proposed, but these are specific to certain selection rules, may ignore the comparison with the control, and are not all minimum variance. We obtain estimators for treatment effects compared with the control that are uniformly minimum variance unbiased conditional on selection with any specified rule or stopping for futility. PubDate: Wed, 28 Feb 2018 00:00:00 GMT DOI: 10.1093/biomet/asy004 Issue No: Vol. 105, No. 2 (2018)