Hybrid journal (may contain Open Access articles). ISSN (Print): 0006-3444; ISSN (Online): 1464-3510. Published by Oxford University Press.

Authors: Zeng D, Gao F, Lin DY
First page: 505
Abstract: Interval-censored multivariate failure time data arise when there are multiple types of failure or there is clustering of study subjects and each failure time is known only to lie in a certain interval. We investigate the effects of possibly time-dependent covariates on multivariate failure times by considering a broad class of semiparametric transformation models with random effects, and we study nonparametric maximum likelihood estimation under general interval-censoring schemes. We show that the proposed estimators for the finite-dimensional parameters are consistent and asymptotically normal, with a limiting covariance matrix that attains the semiparametric efficiency bound and can be consistently estimated through profile likelihood. In addition, we develop an EM algorithm that converges stably for arbitrary datasets. Finally, we assess the performance of the proposed methods in extensive simulation studies and illustrate their application using data derived from the Atherosclerosis Risk in Communities Study.
PubDate: 2017-07-12
DOI: 10.1093/biomet/asx029

Authors: Liu Y, Li P, Qin J
First page: 527
Abstract: Capture-recapture experiments are widely used to collect data needed for estimating the abundance of a closed population. To account for heterogeneity in the capture probabilities, Huggins (1989) and Alho (1990) proposed a semiparametric model in which the capture probabilities are modelled parametrically and the distribution of individual characteristics is left unspecified. A conditional likelihood method was then proposed to obtain point estimates and Wald-type confidence intervals for the abundance. Empirical studies show that the small-sample distribution of the maximum conditional likelihood estimator is strongly skewed to the right, which may produce Wald-type confidence intervals with lower limits that are less than the number of captured individuals, or even negative. In this paper, we propose a full empirical likelihood approach based on Huggins and Alho's model. We show that the null distribution of the empirical likelihood ratio for the abundance is asymptotically chi-squared with one degree of freedom, and that the maximum empirical likelihood estimator achieves semiparametric efficiency. Simulation studies show that the empirical likelihood-based method is superior to the conditional likelihood-based method: its confidence interval has much better coverage, and the maximum empirical likelihood estimator has a smaller mean square error. We analyse three datasets to illustrate the advantages of our empirical likelihood approach.
PubDate: 2017-07-03
DOI: 10.1093/biomet/asx038
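
The chi-squared calibration that underlies such likelihood-ratio intervals can be illustrated in a much simpler setting than the abundance model: empirical likelihood for a population mean, where the -2 log-ratio statistic is likewise asymptotically chi-squared with one degree of freedom. The sketch below (not the paper's method; the model and all names are illustrative) checks the nominal 5% rejection rate by simulation.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2

rng = np.random.default_rng(6)

def el_stat(x, mu0):
    """-2 log empirical likelihood ratio for the mean at mu0."""
    z = x - mu0
    if z.min() >= 0 or z.max() <= 0:
        return np.inf                       # mu0 outside the convex hull
    # Solve the Lagrange condition sum z_i / (1 + lam * z_i) = 0.
    g = lambda lam: np.sum(z / (1.0 + lam * z))
    lo, hi = -1.0 / z.max(), -1.0 / z.min()
    eps = 1e-10 * (hi - lo)
    lam = brentq(g, lo + eps, hi - eps)
    return 2.0 * np.sum(np.log1p(lam * z))

n, reps = 50, 2000
stats = np.array([el_stat(rng.exponential(1.0, size=n), 1.0)
                  for _ in range(reps)])    # skewed data, true mean 1
rej = np.mean(stats > chi2.ppf(0.95, df=1))
print(f"rejection rate at nominal 5% level: {rej:.3f}")
```

Even with strongly skewed data, the rejection rate stays close to the nominal level, which is the behaviour the paper exploits to build confidence intervals whose limits cannot fall below the number of captured individuals.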

Authors: Dai X, Müller H, Yao F
First page: 545
Abstract: Bayes classifiers for functional data pose a challenge. One difficulty is that probability density functions do not exist for functional data, so the classical Bayes classifier using density quotients needs to be modified. We propose to use density ratios of projections onto a sequence of eigenfunctions that are common to the groups to be classified. The density ratios are then factorized into density ratios of individual projection scores, reducing the classification problem to obtaining a series of one-dimensional nonparametric density estimates. The proposed classifiers can be viewed as an extension to functional data of some of the earliest nonparametric Bayes classifiers that were based on simple density ratios in the one-dimensional case. By means of the factorization of the density quotients, the curse of dimensionality that would otherwise severely affect Bayes classifiers for functional data can be avoided. We demonstrate that in the case of Gaussian functional data, the proposed functional Bayes classifier reduces to a functional version of the classical quadratic discriminant. A study of the asymptotic behaviour of the proposed classifiers in the large-sample limit shows that under certain conditions the misclassification rate converges to zero, a phenomenon that has been referred to as perfect classification. The proposed classifiers also perform favourably in finite-sample settings, as we demonstrate through comparisons with other functional classifiers in simulations and various data applications, including spectral data, functional magnetic resonance imaging data from attention deficit hyperactivity disorder patients, and yeast gene expression data.
PubDate: 2017-05-23
DOI: 10.1093/biomet/asx024
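
The projection-score construction can be sketched in a toy setting: project discretized curves onto pooled eigenfunctions, fit one-dimensional kernel density estimates per group and per score, and classify by the product of the density ratios. Everything below (the simulated model, `simulate`, `classify`) is illustrative, not the paper's implementation.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 50)

def simulate(n, mean_scores):
    # Curves built from two fixed basis functions with random scores.
    phi = np.vstack([np.sqrt(2) * np.sin(np.pi * t),
                     np.sqrt(2) * np.sin(2 * np.pi * t)])
    scores = rng.normal(mean_scores, [1.0, 0.7], size=(n, 2))
    return scores @ phi

X0, X1 = simulate(200, [0.0, 0.0]), simulate(200, [1.5, -1.0])

# Common eigenfunctions estimated from the pooled, centred sample.
Xall = np.vstack([X0, X1])
mean_curve = Xall.mean(axis=0)
_, _, Vt = np.linalg.svd(Xall - mean_curve, full_matrices=False)
basis = Vt[:2]

def project(X):
    return (X - mean_curve) @ basis.T

S0, S1 = project(X0), project(X1)
kdes0 = [gaussian_kde(S0[:, j]) for j in range(2)]
kdes1 = [gaussian_kde(S1[:, j]) for j in range(2)]

def classify(x):
    # Product of one-dimensional density ratios (equal priors assumed).
    s = project(x[None, :])[0]
    log_ratio = sum(np.log(kdes1[j](s[j])[0]) - np.log(kdes0[j](s[j])[0])
                    for j in range(2))
    return int(log_ratio > 0)

T0, T1 = simulate(100, [0.0, 0.0]), simulate(100, [1.5, -1.0])
err = (np.mean([classify(x) for x in T0])
       + np.mean([1 - classify(x) for x in T1])) / 2
print(f"misclassification rate: {err:.3f}")
```

The key point is that only one-dimensional densities are ever estimated, which is how the factorization sidesteps the curse of dimensionality.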

Authors: Molina J, Rotnitzky A, Sued M, et al.
First page: 561
Abstract: We consider inference under a nonparametric or semiparametric model with likelihood that factorizes as the product of two or more variation-independent factors. We are interested in a finite-dimensional parameter that depends on only one of the likelihood factors and whose estimation requires the auxiliary estimation of one or several nuisance functions. We investigate general structures conducive to the construction of so-called multiply robust estimating functions, whose computation requires postulating several dimension-reducing models but which have mean zero at the true parameter value provided one of these models is correct.
PubDate: 2017-06-15
DOI: 10.1093/biomet/asx027
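
The best-known special case of such an estimating function is the doubly robust (AIPW) estimator of a counterfactual mean, which has mean zero if either the propensity model or the outcome model is correct. The sketch below (a two-factor special case, not the paper's general construction; the simulated model is illustrative) shows consistency surviving one misspecified nuisance model but not two.

```python
import numpy as np

rng = np.random.default_rng(7)

# Data: confounder x, treatment a with P(a=1|x) = logistic(x),
# outcome y = 1 + 2x + a + noise, so E[Y(1)] = 2.
n = 100_000
x = rng.normal(size=n)
pi_true = 1.0 / (1.0 + np.exp(-x))
a = (rng.uniform(size=n) < pi_true).astype(float)
y = 1.0 + 2.0 * x + a + rng.normal(size=n)

def aipw(pi_hat, m_hat):
    # Augmented inverse-probability-weighted estimate of E[Y(1)].
    return np.mean(a * y / pi_hat - (a - pi_hat) / pi_hat * m_hat)

est_pi_ok    = aipw(pi_true, np.zeros(n))            # outcome model wrong
est_m_ok     = aipw(np.full(n, 0.5), 2.0 + 2.0 * x)  # propensity wrong
est_both_bad = aipw(np.full(n, 0.5), np.zeros(n))    # both wrong
print(est_pi_ok, est_m_ok, est_both_bad)
```

The first two estimates recover E[Y(1)] = 2 despite one misspecified model each; the third is biased because neither postulated model is correct.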

Authors: Huang M, Chan K
First page: 583
Abstract: The estimation of treatment effects based on observational data usually involves multiple confounders, and dimension reduction is often desirable and sometimes inevitable. We first clarify the definition of a central subspace that is relevant for the efficient estimation of average treatment effects. A criterion is then proposed to simultaneously estimate the structural dimension, the basis matrix of the joint central subspace, and the optimal bandwidth for estimating the conditional treatment effects. The method can easily be implemented by forward selection. Semiparametric efficient estimation of average treatment effects can be achieved by averaging the conditional treatment effects with a different data-adaptive bandwidth to ensure optimal undersmoothing. Asymptotic properties of the estimated joint central subspace and the corresponding estimator of average treatment effects are studied. The proposed methods are applied to a nutritional study, where the covariate dimension is reduced from 11 to an effective dimension of one.
PubDate: 2017-05-19
DOI: 10.1093/biomet/asx028

Authors: Wang L, Zhou X, Richardson TS
First page: 597
Abstract: It is common in medical studies that the outcome of interest is truncated by death, meaning that a subject has died before the outcome could be measured. In this case, restricted analysis among survivors may be subject to selection bias. Hence, it is of interest to estimate the survivor average causal effect, defined as the average causal effect among the subgroup consisting of subjects who would survive under either exposure. In this paper, we consider the identification and estimation problems of the survivor average causal effect. We propose to use a substitution variable in place of the latent membership in the always-survivor group. The identification conditions required for a substitution variable are conceptually similar to conditions for a conditional instrumental variable, and may apply to both randomized and observational studies. We show that the survivor average causal effect is identifiable with use of such a substitution variable, and propose novel model parameterizations for estimation of the survivor average causal effect under our identification assumptions. Our approaches are illustrated via simulation studies and a data analysis.
PubDate: 2017-07-11
DOI: 10.1093/biomet/asx034

Authors: Zhou Q, Min S
First page: 613
Abstract: Quantifying the uncertainty in penalized regression under group sparsity is an important open question. We establish, under a high-dimensional scaling, the asymptotic validity of a modified parametric bootstrap method for the group lasso, assuming a Gaussian error model and mild conditions on the design matrix and the true coefficients. Simulation of bootstrap samples provides simultaneous inferences on large groups of coefficients. Through extensive numerical comparisons, we demonstrate that our bootstrap method performs much better than popular competitors, highlighting its practical utility. The theoretical results generalize to other block-norm penalties and to sub-Gaussian errors, which further broadens the potential applications.
PubDate: 2017-08-08
DOI: 10.1093/biomet/asx037

Authors: She Y, Chen K
First page: 633
Abstract: In high-dimensional multivariate regression problems, enforcing low rank in the coefficient matrix offers effective dimension reduction, which greatly facilitates parameter estimation and model interpretation. However, commonly used reduced-rank methods are sensitive to data corruption, as the low-rank dependence structure between response variables and predictors is easily distorted by outliers. We propose a robust reduced-rank regression approach for joint modelling and outlier detection. The problem is formulated as a regularized multivariate regression with a sparse mean-shift parameterization, which generalizes and unifies some popular robust multivariate methods. An efficient thresholding-based iterative procedure is developed for optimization. We show that the algorithm is guaranteed to converge and that the coordinatewise minimum point produced is statistically accurate under regularity conditions. Our theoretical investigations focus on non-asymptotic robust analysis, demonstrating that joint rank reduction and outlier detection leads to improved prediction accuracy. In particular, we show that redescending ψ-functions can essentially attain the minimax optimal error rate, and in some less challenging problems convex regularization guarantees the same low error rate. The performance of the proposed method is examined through simulation studies and real-data examples.
PubDate: 2017-07-12
DOI: 10.1093/biomet/asx032

Authors: Srivastava S, Engelhardt BE, Dunson DB
First page: 649
Abstract: Bayesian sparse factor models have proven useful for characterizing dependence in multivariate data, but scaling computation to large numbers of samples and dimensions is problematic. We propose expandable factor analysis for scalable inference in factor models when the number of factors is unknown. The method relies on a continuous shrinkage prior for efficient maximum a posteriori estimation of a low-rank and sparse loadings matrix. The structure of the prior leads to an estimation algorithm that accommodates uncertainty in the number of factors. We propose an information criterion to select the hyperparameters of the prior. Expandable factor analysis has better false discovery rates and true positive rates than its competitors across diverse simulation settings. We apply the proposed approach to a gene expression study of ageing in mice, demonstrating superior results relative to four competing methods.
PubDate: 2017-06-16
DOI: 10.1093/biomet/asx030

Authors: Li C, Srivastava S, Dunson DB
First page: 665
Abstract: Standard posterior sampling algorithms, such as Markov chain Monte Carlo procedures, face major challenges in scaling up to massive datasets. We propose a simple and general posterior interval estimation algorithm to rapidly and accurately estimate quantiles of the posterior distributions for one-dimensional functionals. Our algorithm runs Markov chain Monte Carlo in parallel for subsets of the data, and then averages quantiles estimated from each subset. We provide strong theoretical guarantees and show that the credible intervals from our algorithm asymptotically approximate those from the full posterior in the leading parametric order. Our algorithm has a better balance of accuracy and efficiency than its competitors across a variety of simulations and a real-data example.
PubDate: 2017-06-25
DOI: 10.1093/biomet/asx033
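
The split-and-average idea can be sketched in a conjugate toy model where each subset posterior is available in closed form, so direct sampling stands in for the parallel MCMC runs. The subset-level variance inflation by a factor of k (the number of subsets) mimics raising each subset likelihood to the k-th power; the model and constants here are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# N(theta, 1) data with a flat prior on theta.
theta_true, n, k = 2.0, 10_000, 10
y = rng.normal(theta_true, 1.0, size=n)
subsets = np.array_split(y, k)
probs = np.array([0.025, 0.5, 0.975])

subset_quantiles = []
for ys in subsets:
    # Subset posterior with the likelihood raised to the k-th power:
    # N(mean(ys), 1/(k*len(ys))).  Direct draws replace an MCMC run.
    draws = rng.normal(ys.mean(), 1.0 / np.sqrt(k * len(ys)), size=5000)
    subset_quantiles.append(np.quantile(draws, probs))

# Average the estimated quantiles across subsets.
lo, med, hi = np.mean(subset_quantiles, axis=0)

# Full-data posterior quantiles for comparison: N(mean(y), 1/n).
full = norm.ppf(probs, loc=y.mean(), scale=1.0 / np.sqrt(n))
print("averaged:", lo, med, hi)
print("full:    ", full)
```

The averaged quantiles track the full-posterior quantiles closely while each chain only ever touches a tenth of the data.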

Authors: Canale A, Lijoi A, Nipoti B, et al.
First page: 681
Abstract: For the most popular discrete nonparametric models, beyond the Dirichlet process, the prior guess at the shape of the data-generating distribution, also known as the base measure, is assumed to be diffuse. Such a specification greatly simplifies the derivation of analytical results, allowing for a straightforward implementation of Bayesian nonparametric inferential procedures. However, in several applied problems the available prior information leads naturally to the incorporation of an atom into the base measure, and then the Dirichlet process is essentially the only tractable choice for the prior. In this paper we fill this gap by considering the Pitman–Yor process with an atom in its base measure. We derive computable expressions for the distribution of the induced random partitions and for the predictive distributions. These findings allow us to devise an effective generalized Pólya urn Gibbs sampler. Applications to density estimation, clustering and curve estimation, with both simulated and real data, serve as an illustration of our results and allow comparisons with existing methodology. In particular, we tackle a functional data analysis problem concerning basal body temperature curves.
PubDate: 2017-08-03
DOI: 10.1093/biomet/asx041
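
For background, the standard Pitman–Yor urn with a diffuse base measure (the case the paper extends by adding an atom) can be simulated in a few lines: item i joins an existing cluster of size c with probability proportional to c - d, or opens a new cluster with probability proportional to theta + d·k. This is a sketch of the classical urn only, not the paper's generalized sampler.

```python
import numpy as np

rng = np.random.default_rng(5)

def py_crp(n, d, theta):
    """Sample a random partition of n items from the Pitman-Yor
    Chinese-restaurant urn with discount d and strength theta."""
    counts = []                          # current cluster sizes
    labels = np.empty(n, dtype=int)
    for i in range(n):
        k = len(counts)
        # Existing clusters get weight c - d; a new cluster gets
        # theta + d*k; weights sum to theta + i.
        probs = np.array([c - d for c in counts] + [theta + d * k])
        probs /= theta + i
        j = rng.choice(k + 1, p=probs)
        if j == k:
            counts.append(1)
        else:
            counts[j] += 1
        labels[i] = j
    return labels, counts

labels, counts = py_crp(500, d=0.25, theta=1.0)
print(len(counts), "clusters; largest:", sorted(counts, reverse=True)[:5])
```

The discount d > 0 produces the power-law growth in the number of clusters that distinguishes the Pitman–Yor process from the Dirichlet process (d = 0).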

Authors: Dobler D, Beyersmann J, Pauly M
First page: 699
Abstract: This paper introduces a new data-dependent multiplier bootstrap for nonparametric analysis of survival data, possibly subject to competing risks. The new procedure includes the general wild bootstrap and the weird bootstrap as special cases. The data may be subject to independent right-censoring and left-truncation. The asymptotic correctness of the proposed resampling procedure is proven under standard assumptions. Simulation results on time-simultaneous inference suggest that the weird bootstrap performs better than the standard normal multiplier approach.
PubDate: 2017-05-31
DOI: 10.1093/biomet/asx026

Authors: He X
First page: 713
Abstract: We propose a new method for constructing minimax distance designs, which are useful for computer experiments. To circumvent computational difficulties, we consider designs with an interleaved lattice structure, a newly defined class of lattice that has repeated or alternated layers based on any single dimension. Such designs have boundary adaptation and low-thickness properties. Our numerical results indicate that the proposed designs are by far the best minimax distance designs for moderate or large samples.
PubDate: 2017-07-03
DOI: 10.1093/biomet/asx036
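
The minimax distance criterion itself is simple to evaluate: it is the largest distance from any point of the design region to its nearest design point (smaller is better). The sketch below compares a random design with a centred regular lattice on a grid approximation of the unit square; the lattice here is only a crude stand-in for the interleaved lattices of the paper.

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(2)

def minimax_distance(design, grid):
    """Max over the region (grid approximation) of the distance to the
    nearest design point."""
    return cdist(grid, design).min(axis=1).max()

# Reference grid approximating [0,1]^2.
g = np.linspace(0, 1, 41)
grid = np.array(np.meshgrid(g, g)).reshape(2, -1).T

# Candidate 1: 16 uniform random points.
random_design = rng.uniform(size=(16, 2))
# Candidate 2: a 4x4 lattice placed at cell centres.
c = (np.arange(4) + 0.5) / 4
lattice_design = np.array(np.meshgrid(c, c)).reshape(2, -1).T

d_rand = minimax_distance(random_design, grid)
d_latt = minimax_distance(lattice_design, grid)
print(f"random: {d_rand:.3f}, lattice: {d_latt:.3f}")
```

The lattice's worst-case distance is attained at the corners of the square (about sqrt(2)/8 ≈ 0.177 here), illustrating why boundary adaptation matters for minimax designs.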

Authors: Sherlock C, Thiery AH, Lee A
First page: 727
Abstract: We consider a pseudo-marginal Metropolis–Hastings kernel ${\mathbb{P}}_m$ that is constructed using an average of $m$ exchangeable random variables, and an analogous kernel ${\mathbb{P}}_s$ that averages $s<m$ of these same random variables. Using an embedding technique to facilitate comparisons, we provide a lower bound for the asymptotic variance of any ergodic average associated with ${\mathbb{P}}_m$ in terms of the asymptotic variance of the corresponding ergodic average associated with ${\mathbb{P}}_s$. We show that the bound is tight and disprove a conjecture that when the random variables to be averaged are independent, the asymptotic variance under ${\mathbb{P}}_m$ is never less than $s/m$ times the variance under ${\mathbb{P}}_s$. The conjecture does, however, hold for continuous-time Markov chains. These results imply that if the computational cost of the algorithm is proportional to $m$, it is often better to set $m=1$. We provide intuition as to why these findings differ so markedly from recent results for pseudo-marginal kernels employing particle filter approximations. Our results are exemplified through two simulation studies; in the first the computational cost is effectively proportional to $m$ and in the second there is a considerable start-up cost at each iteration.
PubDate: 2017-06-21
DOI: 10.1093/biomet/asx031
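
A minimal pseudo-marginal sampler makes the setup concrete: the target likelihood is only accessed through an unbiased noisy estimate, averaged over m replicates, and the current state's estimate is recycled until a proposal is accepted. The model, the log-normal noise, and the tuning constants below are illustrative assumptions, not the paper's simulation studies.

```python
import numpy as np

rng = np.random.default_rng(3)

# Posterior for a N(theta, 1) mean with a flat prior.
y = rng.normal(1.0, 1.0, size=20)

def log_lik(theta):
    return -0.5 * np.sum((y - theta) ** 2)

def noisy_log_lik(theta, m, sigma=1.0):
    # Unbiased likelihood estimator: average of m terms exp(noise_i)
    # with E[exp(noise_i)] = 1 (log-normal), times the true likelihood.
    noise = rng.normal(-0.5 * sigma**2, sigma, size=m)
    return log_lik(theta) + np.log(np.mean(np.exp(noise)))

def pseudo_marginal_mh(m, n_iter=20_000, step=0.5):
    theta, ll = 0.0, noisy_log_lik(0.0, m)
    out = np.empty(n_iter)
    for i in range(n_iter):
        prop = theta + step * rng.normal()
        ll_prop = noisy_log_lik(prop, m)
        if np.log(rng.uniform()) < ll_prop - ll:
            theta, ll = prop, ll_prop   # keep the accepted estimate
        out[i] = theta
    return out

chain1, chain8 = pseudo_marginal_mh(m=1), pseudo_marginal_mh(m=8)
print(chain1.mean(), chain8.mean(), y.mean())
```

Both chains target the exact posterior regardless of m; larger m reduces stickiness at the price of m times the estimator cost per iteration, which is precisely the trade-off the paper's bounds quantify.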

Authors: Hristache M, Patilea V
First page: 735
Abstract: We consider a general statistical model defined by moment restrictions when data are missing at random. Using inverse probability weighting, we show that such a model is equivalent to a model for the observed variables only, augmented by a moment condition defined by the missingness mechanism. Our framework covers parametric and semiparametric mean regressions and quantile regressions. We allow for missing responses, missing covariates and any combination of them. The equivalence result sheds new light on various aspects of missing data, and provides guidelines for building efficient estimators.
PubDate: 2017-05-16
DOI: 10.1093/biomet/asx025
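
The basic inverse-probability-weighting identity behind such equivalences is E[R·Y/π(X)] = E[Y] when Y is missing at random given X with observation probability π(X). The sketch below (simulated model, known π, all names illustrative) contrasts the biased complete-case mean with the IPW mean.

```python
import numpy as np

rng = np.random.default_rng(4)

# Responses missing at random: the observation probability depends on a
# fully observed covariate x, so complete cases over-represent large x.
n = 50_000
x = rng.normal(size=n)
y = 2.0 + x + rng.normal(size=n)             # E[Y] = 2
pi = 1.0 / (1.0 + np.exp(-(0.5 + x)))        # P(observed | x), known here
r = rng.uniform(size=n) < pi                 # observation indicator

cc_mean = y[r].mean()                        # complete-case mean (biased up)
ipw_mean = np.mean(r * y / pi)               # IPW moment: E[R*Y/pi(X)] = E[Y]
print(cc_mean, ipw_mean)
```

In practice π would itself be modelled and estimated, which is where the augmented moment condition of the paper enters.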

Authors: Eck DJ, Cook RD
First page: 743
Abstract: Envelope methodology can provide substantial efficiency gains in multivariate statistical problems, but in some applications the estimation of the envelope dimension can induce selection volatility that may mitigate those gains. Current envelope methodology does not account for the added variance that can result from this selection. In this article, we circumvent dimension selection volatility through the development of a weighted envelope estimator. Theoretical justification is given for our estimator, and the validity of the residual bootstrap for estimating its asymptotic variance is established. A simulation study and real-data analysis illustrate the utility of our weighted envelope estimator.
PubDate: 2017-06-12
DOI: 10.1093/biomet/asx035