Hybrid journal (it can contain Open Access articles). ISSN (Print) 0006-3444, ISSN (Online) 1464-3510. Published by Oxford University Press [368 journals]

Authors: Chang J; Yao Q, Zhou W. Abstract: The paper ‘Testing for high-dimensional white noise using maximum cross-correlations’ was published in Biometrika, on the advance access site on 18 February 2017. Oxford University Press regrets that in the original published version of this manuscript $O_p(1)$ appeared instead of $o_p(1)$ in line 2 of § 2.2 and in the Proof of Theorem 3. Lines 2–6 of the Proof of Theorem 3 are correct as follows: PubDate: 2017-03-04

Authors: Wang L; Robins JM, Richardson TS. Abstract: The paper ‘On falsification of the binary instrumental variable model’ was published in Biometrika, on the advance access site on 23 January 2017. Oxford University Press regrets that in the original published version of this manuscript the equation was incorrectly split and a $\times$ was introduced incorrectly in the second line of the display (6). The display is correct as follows: (6) $\mathrm{pr}(D=d,Y=1\mid Z=1)+\mathrm{pr}(D=d,Y=0\mid Z=0)-1\le \mathrm{ACDE}(d)\le 1-\mathrm{pr}(D=d,Y=0\mid Z=1)-\mathrm{pr}(D=d,Y=1\mid Z=0)$. PubDate: 2017-02-24

Authors: Wang XX; Jiang BB, Liu JS. Abstract: Detecting dependence between two random variables is a fundamental problem. Although the Pearson correlation coefficient is effective for capturing linear dependence, it can be entirely powerless for detecting nonlinear and/or heteroscedastic patterns. We introduce a new measure, G-squared, to test whether two univariate random variables are independent and to measure the strength of their relationship. The G-squared statistic is almost identical to the square of the Pearson correlation coefficient, R-squared, for linear relationships with constant error variance, and has the intuitive meaning of the piecewise R-squared between the variables. It is particularly effective in handling nonlinearity and heteroscedastic errors. We propose two estimators of G-squared and show their consistency. Simulations demonstrate that G-squared estimators are among the most powerful test statistics compared with several state-of-the-art methods. PubDate: 2017-02-22

Authors: Stein ML. Abstract: Motivated by the study of annual temperature extremes, two new results on the limiting distribution of block maxima of random variables with varying upper bounds are obtained. One gives a generalized extreme value distribution as the limit, but with a different shape parameter from that obtained when the bound on the random variables does not vary. The other gives a limiting distribution that is only a generalized extreme value in certain cases. Both results consider triangular arrays of random variables in order to mimic the property of an upper bound that changes slowly with the day of the year, as seems to occur for temperature data at many locations. An analysis of 140 years of daily temperatures in New York City shows mixed results in terms of the ability of the theory presented here to provide new insights into the behaviour of extreme temperatures at this location. PubDate: 2017-02-18

Authors: Chang J; Yao Q, Zhou W. Abstract: We propose a new omnibus test for vector white noise using the maximum absolute autocorrelations and cross-correlations of the component series. Based on an approximation by the $L_\infty$-norm of a normal random vector, the critical value of the test can be evaluated by bootstrapping from a multivariate normal distribution. In contrast to the conventional white noise test, the new method is proved to be valid for testing departure from white noise that is not independent and identically distributed. We illustrate the accuracy and the power of the proposed test by simulation, which also shows that the new test outperforms several commonly used methods, including the Lagrange multiplier test and the multivariate Box–Pierce portmanteau tests, especially when the dimension of the time series is high in relation to the sample size. The numerical results also indicate that the performance of the new test can be further enhanced when it is applied to pre-transformed data obtained via the time series principal component analysis proposed by J. Chang, B. Guo and Q. Yao (arXiv:1410.2323). The proposed procedures have been implemented in an R package. PubDate: 2017-02-18
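
To make the test statistic in the abstract above concrete, here is a minimal Python sketch of a maximum absolute auto- and cross-correlation statistic over lags 1 to `max_lag`. It is an illustration only, not the authors' R package: the function name `max_cross_correlation_stat`, the $\sqrt{n}$ scaling and the simulated data are assumptions, and the Gaussian bootstrap for the critical value is omitted.

```python
import numpy as np


def max_cross_correlation_stat(x, max_lag):
    """Largest sqrt(n) * |sample cross-correlation| over lags 1..max_lag.

    x is an (n, p) array holding n observations of a p-dimensional series.
    Hypothetical helper for illustration; the scaling is an assumption.
    """
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    xc = x - x.mean(axis=0)                  # centre each component series
    scale = np.sqrt((xc ** 2).sum(axis=0))   # root sum of squares per component
    stat = 0.0
    for k in range(1, max_lag + 1):
        # Entry (i, j) sums xc[t + k, i] * xc[t, j] over t (lag-k cross-covariance).
        gamma_k = xc[k:].T @ xc[:-k]
        rho_k = gamma_k / np.outer(scale, scale)
        stat = max(stat, np.sqrt(n) * np.abs(rho_k).max())
    return float(stat)


rng = np.random.default_rng(0)
n, p = 500, 3

# Independent Gaussian noise: no serial dependence at any lag.
white = rng.standard_normal((n, p))

# Componentwise AR(1) with coefficient 0.8: strong lag-1 autocorrelation.
eps = rng.standard_normal((n, p))
ar = np.zeros((n, p))
for t in range(1, n):
    ar[t] = 0.8 * ar[t - 1] + eps[t]

stat_white = max_cross_correlation_stat(white, max_lag=2)
stat_ar = max_cross_correlation_stat(ar, max_lag=2)
print(stat_white, stat_ar)
```

The statistic should be far larger for the autoregressive series than for the white noise series; a complete test would compare it with a bootstrap critical value as the abstract describes.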

Authors: Ogden HE. Abstract: Many statistical models have likelihoods which are intractable: it is impossible or too expensive to compute the likelihood exactly. In such settings, a common approach is to replace the likelihood with an approximation, and proceed with inference as if the approximate likelihood were the true likelihood. In this paper, we describe conditions which guarantee that such naive inference with an approximate likelihood has the same first-order asymptotic properties as inference with the true likelihood. We investigate the implications of these results for inference using a Laplace approximation to the likelihood in a simple two-level latent variable model and using reduced dependence approximations to the likelihood in an Ising model. PubDate: 2017-02-18

Authors: Chen YY; Ning JJ, Ning YY, et al. Abstract: Consider a semiparametric model indexed by a Euclidean parameter of interest and an infinite-dimensional nuisance parameter. In many applications, pseudolikelihood provides a convenient way to infer the parameter of interest, where the nuisance parameter is replaced by a consistent estimator. The purpose of this paper is to establish the asymptotic behaviour of the pseudolikelihood ratio statistic under semiparametric models. In particular, we consider testing the hypothesis that the parameter of interest lies on the boundary of its parameter space. Under regularity conditions, we establish the equivalence between the asymptotic distributions of the pseudolikelihood ratio statistic and a likelihood ratio statistic for a normal mean problem with a misspecified covariance matrix. This result holds when the nuisance parameter is estimated at a rate slower than the usual rate in parametric models. We study three examples in which the asymptotic distributions are shown to be mixtures of chi-squared variables. We conduct simulation studies to examine the finite-sample performance of the pseudolikelihood ratio test. PubDate: 2017-02-18

Authors: Yu T; Li P, Qin J. Abstract: In this paper, we propose a method for estimating the probability density functions in a two-sample problem where the ratio of the densities is monotone. This problem has been widely identified in the literature, but effective solution methods, in which the estimates should be probability densities and the corresponding density ratio should inherit monotonicity, are unavailable. If these conditions are not satisfied, the applications of the resultant density estimates might be limited. We propose estimates for which the ratio inherits the monotonicity property, and we explore their theoretical properties. One implication is that the corresponding receiver operating characteristic curve estimate is concave. Through numerical studies, we observe that both the density estimates and the receiver operating characteristic curve estimate from our method outperform those resulting directly from kernel density estimates, particularly when the sample size is relatively small. PubDate: 2017-02-03

Authors: Zhou QQ; Zhou HH, Cai JJ. Abstract: The case-cohort design has been widely used as a means of cost reduction in collecting or measuring expensive covariates in large cohort studies. The existing literature on the case-cohort design is mainly focused on right-censored data. In practice, however, the failure time is often subject to interval-censoring: it is known to fall only within some random time interval. In this paper, we consider the case-cohort study design for interval-censored failure time and develop a sieve semiparametric likelihood method for analysing data from this design under the proportional hazards model. We construct the likelihood function using inverse probability weighting and build the sieves with Bernstein polynomials. The consistency and asymptotic normality of the resulting regression parameter estimator are established, and a weighted bootstrap procedure is considered for variance estimation. Simulations show that the proposed method works well in practical situations, and an application to real data is provided. PubDate: 2017-02-03

Authors: Sadinle M; Reiter JP. Abstract: We introduce a nonresponse mechanism for multivariate missing data in which each study variable and its nonresponse indicator are conditionally independent given the remaining variables and their nonresponse indicators. This is a nonignorable missingness mechanism, in that nonresponse for any item can depend on values of other items that are themselves missing. We show that under this itemwise conditionally independent nonresponse assumption, one can define and identify nonparametric saturated classes of joint multivariate models for the study variables and their missingness indicators. We also show how to perform sensitivity analysis with respect to violations of the conditional independence assumptions encoded by this missingness mechanism. We illustrate the proposed modelling approach with data analyses. PubDate: 2017-01-23

Authors: Wang L; Robins JM, Richardson TS. Abstract: Instrumental variables are widely used for estimating causal effects in the presence of unmeasured confounding. The discrete instrumental variable model has testable implications for the law of the observed data. However, current assessments of instrumental validity are typically based solely on subject-matter arguments rather than these testable implications, partly due to a lack of formal statistical tests with known properties. In this paper, we develop simple procedures for testing the binary instrumental variable model. Our methods are based on existing techniques for comparing two treatments, such as the $t$-test and the Gail–Simon test. We illustrate the importance of testing the instrumental variable model by evaluating the exogeneity of college proximity using the National Longitudinal Survey of Young Men. PubDate: 2017-01-23

Authors: Luo W; Zhu Y, Ghosh D. Abstract: In many causal inference problems the parameter of interest is the regression causal effect, defined as the conditional mean difference in the potential outcomes given covariates. In this paper we discuss how sufficient dimension reduction can be used to aid causal inference, and we propose a new estimator of the regression causal effect inspired by minimum average variance estimation. The estimator requires a weaker common support condition than propensity score-based approaches, and can be used to estimate the average causal effect, for which it is shown to be asymptotically super-efficient. Its finite-sample properties are illustrated by simulation. PubDate: 2017-01-23

Authors: Shin S; Wu Y, Zhang H, et al. Abstract: Sufficient dimension reduction is popular for reducing data dimensionality without stringent model assumptions. However, most existing methods may work poorly for binary classification. For example, sliced inverse regression (Li, 1991) can estimate at most one direction if the response is binary. In this paper we propose principal weighted support vector machines, a unified framework for linear and nonlinear sufficient dimension reduction in binary classification. Its asymptotic properties are studied, and an efficient computing algorithm is proposed. Numerical examples demonstrate its performance in binary classification. PubDate: 2017-01-19

Authors: Ollier EE; Viallon VV. Abstract: We consider the estimation of regression models on strata defined using a categorical covariate, in order to identify interactions between this categorical covariate and the other predictors. A basic approach requires the choice of a reference stratum. We show that the performance of a penalized version of this approach depends on this arbitrary choice, and propose an approach that bypasses this at almost no additional computational cost. Regarding model selection consistency, our proposal mimics the strategy based on an optimal and covariate-specific choice for the reference stratum. An empirical study confirms that our proposal generally outperforms the basic approach in the identification and description of the interactions. An illustration on gene expression data is provided. PubDate: 2017-01-19

Authors: Johnstone IM; Nadler BB. Abstract: Roy’s largest root is a common test statistic in multivariate analysis, statistical signal processing and allied fields. Despite its ubiquity, provision of accurate and tractable approximations to its distribution under the alternative has been a longstanding open problem. Assuming Gaussian observations and a rank-one alternative, or concentrated noncentrality, we derive simple yet accurate approximations for the most common low-dimensional settings. These include signal detection in noise, multiple response regression, multivariate analysis of variance and canonical correlation analysis. A small-noise perturbation approach, perhaps underused in statistics, leads to simple combinations of standard univariate distributions, such as central and noncentral $\chi^2$ and $F$. Our results allow approximate power and sample size calculations for Roy’s test for rank-one effects, which is precisely where it is most powerful. PubDate: 2017-01-13

Authors: Tsay RS; Pourahmadi M. Abstract: Ensuring positive definiteness of an estimated structured correlation matrix is challenging. We show that reparameterizing Cholesky factors of correlation matrices using hyperspherical coordinates or angles provides a flexible and effective solution. Once a structured correlation matrix is identified, the corresponding angles and hence the constrained correlations may be estimated by maximum likelihood. Consistency and asymptotic normality of the maximum likelihood estimators of the angles are established. Examples demonstrate the flexibility of the method. PubDate: 2017-01-13
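
The hyperspherical reparameterization mentioned in the abstract above can be sketched as follows. This is a generic illustration of the standard angle-based construction of a Cholesky factor with unit-norm rows, not code from the paper; the function name `correlation_from_angles`, the convention of storing angles in the strictly lower triangle, and the example angles are assumptions.

```python
import numpy as np


def correlation_from_angles(theta):
    """Build a correlation matrix from angles in (0, pi).

    theta is a (d, d) array; only its strictly lower-triangular entries
    are used. Row i of the Cholesky factor L is a point on the unit
    sphere written in hyperspherical coordinates, so L @ L.T has unit
    diagonal and is positive definite. Hypothetical helper for illustration.
    """
    theta = np.asarray(theta, dtype=float)
    d = theta.shape[0]
    L = np.zeros((d, d))
    for i in range(d):
        sin_prod = 1.0                        # running product of sines
        for j in range(i):
            L[i, j] = np.cos(theta[i, j]) * sin_prod
            sin_prod *= np.sin(theta[i, j])
        L[i, i] = sin_prod                    # positive because angles lie in (0, pi)
    return L @ L.T


# Example angles (arbitrary values chosen for illustration).
theta = np.zeros((3, 3))
theta[1, 0] = np.pi / 3
theta[2, 0] = np.pi / 4
theta[2, 1] = 2 * np.pi / 3

R = correlation_from_angles(theta)
print(np.round(R, 4))
```

Because any choice of angles in $(0, \pi)$ yields a valid correlation matrix, an optimizer can search over the angles without a separate positive-definiteness constraint, which is the practical appeal of this kind of parameterization.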

Authors: Fattorini LL; Marcheselli MM, Pisani CC, et al. Abstract: We analyse design-based properties of two-phase strategies for estimating totals and nonlinear functions of totals for environmental populations when the sampling schemes are uniquely determined by points placed in the study region. In the first phase, points are located using tessellation stratified sampling, whereas in the second phase a finite population sampling scheme is adopted. We give sufficient conditions on second-phase designs that ensure consistency, and we investigate the variance convergence rate for some familiar schemes. PubDate: 2017-01-03

Authors: Kim JK; Yang SS. Abstract: Multiple imputation is popular for handling item nonresponse in survey sampling. Current multiple imputation techniques with complex survey data assume that the sampling design is ignorable. In this paper, we propose a new multiple imputation procedure for parametric inference without this assumption. Instead of using the sample-data likelihood, we use the sampling distribution of the pseudo maximum likelihood estimator to derive the posterior distribution of the parameters. The asymptotic properties of the proposed method are investigated. A simulation study confirms that the new procedure provides unbiased point estimation and valid confidence intervals with correct coverage properties whether or not the sampling design is ignorable. PubDate: 2017-01-03

Authors: Bertrand A; Legrand C, Carroll RJ, et al. Abstract: In many situations in survival analysis, it may happen that a fraction of individuals will never experience the event of interest: they are considered to be cured. The promotion time cure model takes this into account. We consider the case where one or more explanatory variables in the model are subject to measurement error, which should be taken into account to avoid biased estimators. A general approach is the simulation-extrapolation algorithm, a method based on simulations which allows one to estimate the effect of measurement error on the bias of the estimators and to reduce this bias. We extend this approach to the promotion time cure model. We explain how the algorithm works, and we show that the proposed estimator is approximately consistent and asymptotically normally distributed, and that it performs well in finite samples. Finally, we analyse a database in cardiology: among the explanatory variables of interest is the ejection fraction, which is known to be measured with error. PubDate: 2017-01-03
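
The simulation-extrapolation idea described in the abstract above can be illustrated in a much simpler setting than the promotion time cure model. The sketch below applies it to the slope of a simple linear regression with a covariate measured with additive Gaussian error of known standard deviation. Everything here is an assumption made for illustration: the function names, the grid of noise multipliers, the quadratic extrapolant and the simulated data. It is not the estimator studied in the paper.

```python
import numpy as np


def ols_slope(w, y):
    """Least-squares slope of y on w (with intercept)."""
    wc = w - w.mean()
    return float(wc @ (y - y.mean()) / (wc @ wc))


def simex_slope(w, y, sigma_u, lambdas=(0.5, 1.0, 1.5, 2.0), n_sim=50, seed=0):
    """Simulation-extrapolation estimate of the slope.

    Simulation step: for each lambda, add extra noise with variance
    lambda * sigma_u**2 to the error-prone covariate w and average the
    naive slope over n_sim replicates.
    Extrapolation step: fit a quadratic in lambda and evaluate it at
    lambda = -1, which corresponds to no measurement error.
    """
    rng = np.random.default_rng(seed)
    grid = [0.0]
    slopes = [ols_slope(w, y)]                # lambda = 0 is the naive estimate
    for lam in lambdas:
        sims = [
            ols_slope(w + np.sqrt(lam) * sigma_u * rng.standard_normal(w.size), y)
            for _ in range(n_sim)
        ]
        grid.append(lam)
        slopes.append(np.mean(sims))
    coefs = np.polyfit(grid, slopes, deg=2)   # quadratic extrapolant (an assumption)
    return float(np.polyval(coefs, -1.0))


# Simulated data: true slope 1, covariate observed with error.
rng = np.random.default_rng(1)
n, beta, sigma_u = 5000, 1.0, 0.5
x = rng.standard_normal(n)
y = beta * x + 0.5 * rng.standard_normal(n)
w = x + sigma_u * rng.standard_normal(n)      # error-prone version of x

naive = ols_slope(w, y)
corrected = simex_slope(w, y, sigma_u)
print(naive, corrected)
```

In this toy setting the naive slope is attenuated towards zero, and the extrapolated estimate should recover most, though not all, of the bias, consistent with the abstract's description of the estimator as approximately consistent.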

Authors: She Y. Abstract: This paper studies simultaneous feature selection and extraction in supervised and unsupervised learning. We propose and investigate selective reduced rank regression for constructing optimal explanatory factors from a parsimonious subset of input features. The proposed estimators enjoy sharp oracle inequalities, and with a predictive information criterion for model selection, they adapt to unknown sparsity by controlling both rank and row support of the coefficient matrix. A class of algorithms is developed that can accommodate various convex and nonconvex sparsity-inducing penalties, and can be used for rank-constrained variable screening in high-dimensional multivariate data. The paper also showcases applications in macroeconomics and computer vision to demonstrate how low-dimensional data structures can be effectively captured by joint variable selection and projection. PubDate: 2017-01-03