Abstract: Testing for parameter change has been an important issue in time series analysis. The problem has also been actively explored in the field of integer-valued time series, but testing in the presence of outliers has not yet been extensively investigated. This study considers the problem of testing for parameter change in Poisson autoregressive models, particularly when observations are contaminated by outliers. To lessen the impact of outliers on the testing procedure, we propose a test based on the density power divergence, introduced by Basu et al. (Biometrika 85:549–559, 1998), and derive its limiting null distribution. Monte Carlo simulation results demonstrate the validity and strong robustness of the proposed test. PubDate: 2020-03-04
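For reference, the density power divergence of Basu et al. (1998) between a model density \(f\) and a data density \(g\) is usually written as below. The symbols \(f\), \(g\), \(z\) and the tuning parameter \(\alpha\) are our notation and need not match the paper's.

\[
d_{\alpha}(g, f) = \int \left\{ f^{1+\alpha}(z) - \left( 1 + \frac{1}{\alpha} \right) g(z)\, f^{\alpha}(z) + \frac{1}{\alpha}\, g^{1+\alpha}(z) \right\} dz , \qquad \alpha > 0 .
\]

As \(\alpha \rightarrow 0\) this tends to the Kullback-Leibler divergence \(\int g(z) \{ \log g(z) - \log f(z) \}\, dz\), so larger \(\alpha\) trades some efficiency for robustness to outliers.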
Abstract: Many existing procedures for detecting multiple change-points in data sequences fail in frequent-change-point scenarios. This article proposes a new change-point detection methodology designed to work well in both infrequent and frequent change-point settings. It is made up of two ingredients: one is “Wild Binary Segmentation 2” (WBS2), a recursive algorithm for producing what we call a ‘complete’ solution path to the change-point detection problem, i.e. a sequence of estimated nested models containing \(0, \ldots , T-1\) change-points, where T is the data length. The other ingredient is a new model selection procedure, referred to as “Steepest Drop to Low Levels” (SDLL). The SDLL criterion acts on the WBS2 solution path, and, unlike many existing model selection procedures for change-point problems, it is not penalty-based, and only uses thresholding as a certain discrete secondary check. The resulting WBS2.SDLL procedure, combining both ingredients, is shown to be consistent, and to significantly outperform the competition in the frequent change-point scenarios tested. WBS2.SDLL is fast, easy to code and does not require the choice of a window or span parameter. PubDate: 2020-03-02
Abstract: In this work, we consider the identifiability assumption of Gaussian linear structural equation models (SEMs) in which each variable is determined by a linear function of its parents plus normally distributed error. It has been shown that linear Gaussian structural equation models are fully identifiable if all error variances are the same or known. Hence, this work proves the identifiability of Gaussian SEMs with both homogeneous and heterogeneous unknown error variances. Our new identifiability assumption exploits not only error variances, but edge weights; hence, it is strictly milder than prior work on the identifiability result. We further provide a structure learning algorithm that is statistically consistent and computationally feasible, based on our new assumption. The proposed algorithm assumes that all relevant variables are observed, while it does not assume causal minimality and faithfulness. We verify our theoretical findings through simulations and real multivariate data, and compare our algorithm to state-of-the-art PC, GES and GDS algorithms. PubDate: 2020-03-01
Abstract: Model structural inference on semiparametric measurement error models has not been well developed in the existing literature, partly due to the difficulties in dealing with unobservable covariates. In this study, a framework for adaptive structure selection is developed in partially linear error-in-function models with error-prone covariates. Firstly, based on the profile-least-square estimators of the current models, we define two test statistics via the generalized likelihood ratio (GLR) test method (Fan et al. in Ann Stat 29(1):153–193, 2001). The proposed test statistics are shown to possess the Wilks-type properties, and a class of new Wilks phenomenon is unveiled in the family of semiparametric measurement error models. Then, we demonstrate that the GLR statistics asymptotically follow chi-squared distributions under null hypotheses. Further, we propose efficient algorithms to implement our methodology and assess the finite sample performance by simulated examples. A real example is given to illustrate the performance of the present methodology. PubDate: 2020-03-01
Abstract: In this paper, we propose self-semi-supervised clustering, a new clustering method for large scale data with a massive null group. Self-semi-supervised clustering is a two-stage procedure: preselect a part of “null” group from the data in the first stage and apply semi-supervised clustering to the rest of the data in the second stage, allowing them to be assigned to the null group. We evaluate the performance of the proposed method using a simulation study and demonstrate the method in the analysis of time course gene expression data from a longitudinal study of Influenza A virus infection. PubDate: 2020-03-01
Abstract: In order to analyze longitudinal ordinal data, researchers commonly use the cumulative logit random effects model. In these models, the random effects covariance matrix is used to account for both subject variation and serial correlation of repeated outcomes. However, the covariance matrix is assumed to be homoscedastic and restricted due to the high-dimensionality and positive-definiteness of the matrix. In order to relieve these assumptions, three Cholesky decomposition methods were proposed to model the random effects covariance matrix: modified Cholesky, moving average Cholesky, and autoregressive moving-average decompositions. We also use the three decompositions to model the random effects covariance matrix in cumulative logit random effects models for longitudinal ordinal data. In addition, Bayesian methods are presented for the parameter estimation of the proposed models, and Markov Chain Monte Carlo is conducted using the JAGS program. The proposed methods are illustrated using lung cancer data. PubDate: 2020-03-01
Abstract: Rakotomamonjy (RO-CAI, pp 71–80, 2009) proposed an ROC-SVM that optimizes the receiver operating characteristic (ROC) curve, which is particularly useful for unbalanced classification. In this article, we establish the piecewise linearity of the ROC-SVM solutions as a function of the regularization parameter, and develop an efficient algorithm for computing the entire regularization paths of the ROC-SVM. Finally, we develop an R package, rocsvm.path, now available on CRAN. PubDate: 2020-03-01
Abstract: Allocating computation over multiple chains to reduce sampling time in MCMC is crucial in making MCMC more applicable in state-of-the-art models such as deep neural networks. One of the parallelization schemes for MCMC is partitioning the sample space to run different MCMC chains in each component of the partition (VanDerwerken and Schmidler in Parallel Markov chain Monte Carlo. arXiv:1312.7479, 2013; Basse et al. in Artificial intelligence and statistics, pp 1318–1327, 2016). In this work, we take the bridge sampling approach of Basse et al. (2016) and apply constrained Hamiltonian Monte Carlo on partitioned sample spaces. We propose a random dimension partition scheme that combines well with the constrained HMC. We empirically show that this approach can expedite MCMC sampling for any unnormalized target distribution, such as a Bayesian neural network in a high dimensional setting. Furthermore, in the presence of multi-modality, this algorithm is expected to be more efficient in mixing MCMC chains when proper partition elements are chosen. PubDate: 2020-03-01
Abstract: Aalen’s additive hazards model plays a very important role in survival analysis. In this paper we are interested in the problem of estimating regression coefficients in the additive hazards model with censored length-biased data. Using both the parametric invariance of the proportional likelihood ratio model and the unique structure of length-biased data, we propose a pairwise pseudo-likelihood estimating equation, which relies only on the complete residual lifetimes in censored length-biased data. In addition, two combined estimating equations are also considered to estimate covariate coefficients. These estimators are proved to be consistent and asymptotically normal. In order to evaluate the performance of the proposed estimators in a finite sample, some simulations are conducted. Finally, a real data example is also provided. PubDate: 2020-03-01
Abstract: In the Bayesian framework, the marginal likelihood plays an important role in variable selection and model comparison. The marginal likelihood is the marginal density of the data after integrating out the parameters over the parameter space. However, this quantity is often analytically intractable due to the complexity of the model. In this paper, we first examine the properties of the inflated density ratio (IDR) method, which is a Monte Carlo method for computing the marginal likelihood using a single MC or Markov chain Monte Carlo (MCMC) sample. We then develop a variation of the IDR estimator, called the dimension reduced inflated density ratio (Dr.IDR) estimator. We further propose a more general identity and then obtain a general dimension reduced (GDr) estimator. Simulation studies are conducted to examine empirical performance of the IDR estimator as well as the Dr.IDR and GDr estimators. We further demonstrate the usefulness of the GDr estimator for computing the normalizing constants in a case study on the inequality-constrained analysis of variance. PubDate: 2020-03-01
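The definition of the marginal likelihood given in words above can be written as follows. The notation is ours and not necessarily the paper's: \(y\) denotes the data, \(\theta\) the parameters with parameter space \(\Theta\), \(L(\theta \mid y)\) the likelihood and \(\pi(\theta)\) the prior density.

\[
m(y) = \int_{\Theta} L(\theta \mid y)\, \pi(\theta)\, d\theta .
\]

Equivalently, \(m(y)\) is the normalizing constant of the posterior density \(\pi(\theta \mid y) = L(\theta \mid y)\, \pi(\theta) / m(y)\), which is why estimating it from a posterior sample alone is nontrivial.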
Abstract: Designing experiments is a significant problem that experimenters face. Maximin distance designs, supersaturated designs, minimum aberration designs, uniform designs, minimum moment designs and orthogonal arrays are arguably the most widely used designs for many real-life experiments. From different perspectives, several criteria have been proposed for constructing these designs for investigating quantitative or qualitative factors. Each of those criteria has its pros and cons and thus an optimal criterion does not exist, which may confuse investigators searching for a suitable criterion for their experiment. Some natural questions arise: are these designs consistent? Can an optimal design under one criterion perform well under another criterion? Can an optimal design for screening quantitative factors be optimal for qualitative factors? Through theoretical justifications, this paper tries to answer these questions by building some bridges among these designs. Some conditions under which these designs agree with each other are discussed. These bridges can be used to select a suitable criterion for studying some hard problems effectively, such as detection of (combinatorial/geometrical) non-isomorphism among designs and construction of optimal designs. Benchmarks for reducing the computational complexity are given. PubDate: 2020-03-01
Abstract: Stochastic frontier models have been considered as an alternative to deterministic frontier models in that they attribute the deviation of the output from the production frontier to both measurement error and inefficiency. However, such merit is often dimmed by strong assumptions on the distribution of the measurement error and the inefficiency such as the normal-half normal pair or the normal-exponential pair. Since the distribution of the measurement error is often accepted as being approximately normal, here we show how to estimate various stochastic frontier models with a relaxed assumption on the inefficiency distribution, building on the recent work of Kneip and his coworkers. We illustrate the usefulness of our method with data on Japanese local public hospitals. PubDate: 2020-03-01
Abstract: Change-point models are generative models in which the underlying generative parameters change at different points in time. A Bayesian approach to the problem of hazard change with unknown multiple change-points is developed using informative priors for censored survival data. For the exponential distribution, a piecewise constant hazard is considered with change-point estimation. The stochastic approximation Monte Carlo algorithm is implemented for efficient calculation of the posterior distributions. The performance of the proposed estimator is checked via simulation. As a real data application, leukemia data are analyzed by the proposed method and compared with a previous non-Bayesian method. PubDate: 2020-03-01
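To make the piecewise constant hazard concrete, here is a minimal Python sketch of the survival function it implies, S(t) = exp(-H(t)), where H is the cumulative hazard. The function names, the argument layout and the numbers in the usage note are illustrative assumptions; this is not the paper's estimator and says nothing about its priors or its stochastic approximation Monte Carlo step.

```python
import math


def cumulative_hazard(t, change_points, hazards):
    """Cumulative hazard H(t) for a piecewise constant hazard.

    change_points: increasing times tau_1 < ... < tau_k (hypothetical inputs).
    hazards: k + 1 rates; hazards[j] applies between the j-th and (j+1)-th
             change-point, with 0 and infinity as the outer edges.
    """
    if len(hazards) != len(change_points) + 1:
        raise ValueError("need exactly one more hazard rate than change-points")
    edges = [0.0] + list(change_points) + [math.inf]
    total = 0.0
    for rate, lo, hi in zip(hazards, edges[:-1], edges[1:]):
        if t <= lo:
            break  # t lies before this interval starts
        # Add rate times the part of [lo, hi] that lies below t.
        total += rate * (min(t, hi) - lo)
    return total


def survival(t, change_points, hazards):
    """Survival probability S(t) = exp(-H(t))."""
    return math.exp(-cumulative_hazard(t, change_points, hazards))
```

For example, with a single change-point at 2.0 and rates 0.5 before and 1.0 after it, `cumulative_hazard(3.0, [2.0], [0.5, 1.0])` adds 0.5 × 2 from the first interval and 1.0 × 1 from the second.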
Abstract: We obtain explicit formulae for the expected values \(E \{ \prod \nolimits _{i=1}^{3}g_i ( X_i ) \}\) of a standard trivariate Gaussian random vector \({\underline{X}}= \left( X_1, X_2 , X_3 \right) \) over the set \(g_i (x) \in \left\{ \delta (x), \mathrm {sgn}(x), |x| , x \right\} \) of nonlinear and linear functions. Based on the results, we also suggest corrections to long-known formulae for two incomplete moments. Applications of the formulae in practical examples are also illustrated concisely. For the readers' easy reference, the explicit formulae derived in this paper and related well-known results are summarized in tables. PubDate: 2020-03-01
Abstract: In applications, prior information on parameters, in addition to sample information, can be used to improve the estimation efficiency. In the framework of varying-coefficient partially linear models with the number of parametric and nonparametric components diverging, this paper proposes a restricted profile least-squares estimation for the parametric components after the varying coefficients are estimated by basis function approximations. This estimator is shown to be consistent and asymptotically normal under certain regularity conditions. To check the validity of the linear constraints on the parametric components, we construct a profile generalized likelihood ratio test statistic and demonstrate that it asymptotically follows chi-squared distributions under the null and alternative hypotheses. Simulation studies are conducted and the Boston housing data are analyzed to illustrate the proposed method. PubDate: 2020-03-01
Abstract: In this paper, we study the tests for sphericity and identity of covariance matrices in time-varying coefficient high-dimensional panel data models with fixed effects. In order to construct effective test statistics and avoid the influence of the unknown fixed effects, we apply the difference method to eliminate the dependence of the residual sample, and further construct test statistics using the trace estimators of the covariance matrices. For the estimators of the coefficient functions, we use the local linear dummy variable method. Under some regularity conditions, we study the asymptotic property of the estimators and establish the asymptotic distributions of our proposed test statistics without specifying an explicit relationship between the cross-sectional and the time series dimensions. We further show that the test statistics are asymptotically distribution-free. Subsequently, simulation studies are carried out to evaluate our proposed methods. In order to assess the performance of our proposed test method, we compare it with the existing test methods in panel data linear models with fixed effects. PubDate: 2020-03-01
Abstract: The additive hazards model is one of the most popular regression models for analyzing failure time data, especially when one is interested in the excess risk or risk difference. Although a couple of methods have been developed in the literature for regression analysis of interval-censored data, a general type of failure time data, they may be complicated or inefficient. To address this, we present a new maximum likelihood estimation procedure based on the sieve approach and, in particular, develop an EM algorithm that involves a two-stage data augmentation with the use of Poisson latent variables. The method can be easily implemented and the asymptotic properties of the proposed estimators are established. A simulation study is conducted to assess the performance of the proposed method and indicates that it works well for practical situations. The method is also applied to a set of interval-censored data from an AIDS cohort study. PubDate: 2020-02-26
Abstract: In this paper, the multiple change-point problem in the scale parameter of a sequence of independent gamma distributed observations is discussed. A reversible jump Markov chain Monte Carlo (RJMCMC) algorithm is developed to compute the posterior probabilities of the number and positions of the multiple change-points. Four types of jumps are designed, and the acceptance probability of each type is given. The simulation studies show that the RJMCMC-based method is efficient in detecting multiple change-points in the scale parameter of a gamma distributed sequence, and performs better than a self-normalization based method. In addition, a real data example on successive rises and falls of the Shanghai stock exchange composite index yield is used to illustrate the proposed methodology. PubDate: 2020-02-26
Abstract: Doubly truncated data often arise when event times are observed only if they fall within subject-specific intervals. We analyze doubly truncated data using nonparametric transformation models, where an unknown monotonically increasing transformation of the response variable is equal to an unknown monotonically increasing function of a linear combination of the covariates plus a random error with an unspecified log-concave probability density function. Furthermore, we assume that the truncation variables are conditionally independent of the response variable given the covariates and leave the conditional distributions of truncation variables given the covariates unspecified. For estimation of regression parameters, we propose a weighted rank (WR) estimation procedure and establish the consistency and asymptotic normality of the resulting estimator. The limiting covariance matrix of the WR estimator can be estimated by a resampling technique, which does not involve nonparametric density estimation or numerical derivatives. A numerical study is conducted and suggests that the proposed methodology works well in practice, and an illustration based on real data is provided. PubDate: 2020-02-20
Abstract: This paper presents a general methodology for nonparametric estimation of a function s related to a nonnegative real random variable X, under a constraint of type \(s(0)=c\). When a projection estimator of the target function is available, we explain how to modify it in order to obtain an estimator which satisfies the constraint. We extend risk bounds from the initial to the new estimator, and propose and study adaptive procedures for both estimators. The example of cumulative distribution function estimation illustrates the method for two different models: the multiplicative noise model (\(Y=XU\) is observed, with U following a uniform distribution) and the additive noise model (\(Y=X+V\) is observed, where V is a nonnegative nuisance variable with known density). PubDate: 2020-02-20
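The multiplicative noise model above can be illustrated with a small Python simulation of the observation scheme \(Y=XU\). Everything specific here is an assumption made for illustration: U is taken uniform on [0, 1) and independent of X, X is drawn from a unit-rate exponential distribution, and the function name is hypothetical. The sketch only generates data and does not implement the paper's projection estimator.

```python
import random


def simulate_multiplicative_noise(n, seed=0):
    """Draw n pairs (x, y) with y = x * u, u ~ Uniform[0, 1) independent of x.

    The exponential law for x is an arbitrary illustrative choice.
    """
    rng = random.Random(seed)
    xs = [rng.expovariate(1.0) for _ in range(n)]  # latent nonnegative X
    ys = [x * rng.random() for x in xs]            # observed Y = X * U
    return xs, ys


xs, ys = simulate_multiplicative_noise(50_000)
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
```

Under these assumptions E[Y] = E[X] E[U] = E[X] / 2, so `mean_y` should be close to half of `mean_x`, and every observed y is no larger than the latent x it came from. This shows why the distribution of X has to be recovered indirectly from the observations.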