Abstract: The Fay–Herriot model is an area-level linear mixed model that is widely used for estimating the domain means of a given target variable. Under this model, the dependent variable is a direct estimator calculated from the survey data, and the auxiliary variables are true domain means obtained from external data sources. Administrative registers do not always provide good auxiliary variables, so statisticians sometimes take them from alternative surveys, in which case they are measured with error. We introduce a variant of the Fay–Herriot model that takes into account the measurement error of the auxiliary variables and give two fitting algorithms that calculate maximum and residual maximum likelihood estimates of the model parameters. Based on the new model, empirical best predictors of domain means are introduced and an approximation of their mean squared error is derived. Finally, we give an application to the estimation of poverty proportions in the Spanish Living Condition Survey, with auxiliary information from the Spanish Labour Force Survey. PubDate: 2019-03-22 DOI: 10.1007/s11749-019-00649-3
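
As a rough illustration of the predictor that the abstract builds on, the sketch below computes the best predictor under the basic Fay–Herriot model, that is, without the measurement-error extension proposed in the paper. It assumes the regression predictions and the random-effect variance are already known, whereas the paper estimates them by maximum or residual maximum likelihood. The function name and arguments are hypothetical.

```python
def fay_herriot_blup(direct, x_beta, sampling_var, sigma_u2):
    """Best predictor of each domain mean under the basic Fay-Herriot model.

    direct[d]       : direct survey estimate y_d
    x_beta[d]       : synthetic regression prediction x_d' beta (assumed known)
    sampling_var[d] : known sampling variance of the direct estimate
    sigma_u2        : variance of the domain random effect (assumed known)
    """
    preds = []
    for y, xb, var_d in zip(direct, x_beta, sampling_var):
        # Shrinkage weight placed on the direct estimate.
        gamma = sigma_u2 / (sigma_u2 + var_d)
        preds.append(gamma * y + (1.0 - gamma) * xb)
    return preds
```

Domains with a noisy direct estimate (large sampling variance) are pulled towards the regression prediction, while precisely estimated domains stay close to their direct estimate.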

Abstract: In social and biomedical sciences, testing in contingency tables often involves order restrictions on cell probability parameters. We develop objective Bayes methods for order-constrained testing and model comparison when observations arise under product binomial or multinomial sampling. Specifically, we consider tests for monotone order of the parameters against equality of all parameters. Our strategy combines in a unified way both the intrinsic prior methodology and the encompassing prior approach in order to compute Bayes factors and posterior model probabilities. The performance of our method is evaluated in several simulation studies and on real datasets. PubDate: 2019-03-22 DOI: 10.1007/s11749-019-00650-w

Abstract: This note considers several technical points concerning the work of Kneib, Klein, Lang, and Umlauf. These points are related to the use of basis functions versus function spaces, types of covariates, and model identifiability. PubDate: 2019-03-01 DOI: 10.1007/s11749-019-00635-9

Abstract: The accelerated failure time (AFT) model is a useful semi-parametric model under right censoring and an alternative to the commonly used proportional hazards model. Statistical inference for the AFT model has attracted considerable attention. However, it is difficult to compute the estimators of the regression parameters because the rank-based estimating equations are not smooth. Brown and Wang (Stat Med 26(4):828–836, 2007) used an induced smoothing approach, which smooths the estimating functions to obtain point and variance estimators. In this paper, a more computationally efficient method called jackknife empirical likelihood (JEL) is proposed to make inference for the accelerated failure time model without computing the limiting variance. Results from extensive simulations suggest that the JEL method outperforms the traditional normal approximation method in most cases. Subsequently, two real data sets are analyzed to illustrate the proposed method. PubDate: 2019-03-01 DOI: 10.1007/s11749-018-0601-7

Abstract: We compare two mixtures of an arbitrary collection of stochastically ordered distribution functions with respect to fixed mixing distributions. Under the assumption that the first mixture distribution is known, we establish optimal lower and upper bounds on the latter mixture distribution function and present single families of ordered distributions which attain the bounds uniformly at all real arguments. Furthermore, we determine sharp upper and lower bounds on the differences between the expectations of the mixtures expressed in various scale units. General results are illustrated by several examples. PubDate: 2019-03-01 DOI: 10.1007/s11749-018-0604-4

Abstract: Highly robust and efficient estimators for generalized linear models with a dispersion parameter are proposed. The estimators are based on three steps. In the first step, the maximum rank correlation estimator is used to consistently estimate the slopes up to a scale factor. The scale factor, the intercept, and the dispersion parameter are robustly estimated using a simple regression model. Then, randomized quantile residuals based on the initial estimators are used to define a region S such that observations outside S are considered outliers. Finally, a conditional maximum likelihood (CML) estimator given the observations in S is computed. We show that, under the model, S tends to the whole space as the sample size increases. Therefore, the CML estimator tends to the unconditional maximum likelihood estimator, which implies that it is asymptotically fully efficient. Moreover, the CML estimator maintains the high degree of robustness of the initial one. The negative binomial regression case is studied in detail. PubDate: 2019-03-01 DOI: 10.1007/s11749-018-0624-0
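
The randomized quantile residuals mentioned above can be sketched as follows for a negative binomial response. This is a minimal illustration of the general randomized quantile residual idea for discrete data, not the paper's procedure: the parameters `size` and `prob` are assumed to be given (the paper obtains them from robust initial estimators), and all function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def nb_pmf(k, size, prob):
    """Negative binomial pmf: k failures before `size` successes."""
    log_p = (math.lgamma(k + size) - math.lgamma(k + 1) - math.lgamma(size)
             + size * math.log(prob) + k * math.log(1.0 - prob))
    return math.exp(log_p)


def nb_cdf(k, size, prob):
    """P(Y <= k); returns 0 for k < 0."""
    if k < 0:
        return 0.0
    return min(1.0, sum(nb_pmf(j, size, prob) for j in range(k + 1)))


def randomized_quantile_residual(y, size, prob, rng):
    """Draw u uniformly between F(y - 1) and F(y), then map it to the normal scale."""
    lower = nb_cdf(y - 1, size, prob)
    upper = nb_cdf(y, size, prob)
    u = rng.uniform(lower, upper)
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep u strictly inside (0, 1)
    return NormalDist().inv_cdf(u)


rng = random.Random(0)  # fixed seed so the randomization is reproducible
residuals = [randomized_quantile_residual(y, 2.0, 0.5, rng) for y in [0, 1, 3, 12]]
```

If the fitted model is correct, these residuals behave approximately like standard normal draws, so unusually large absolute values point to candidate outliers.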

Abstract: The multivariate t nonlinear mixed-effects model (MtNLMM) has been shown to be effective for analyzing multi-outcome longitudinal data following nonlinear growth patterns with fat-tailed noise or potential outliers. This paper considers the problem of clustering heterogeneous longitudinal profiles in a mixture framework of MtNLMM. A finite mixture of multivariate t nonlinear mixed models is proposed, and this new model can accommodate more complex features of longitudinal data. Intermittent missing values frequently occur in the data collection process of multiple repeated measures. Under a missing at random mechanism, a pseudo-data version of the alternating expectation-conditional maximization algorithm is developed to carry out maximum likelihood estimation and impute missing values simultaneously. The techniques for clustering of incomplete multiple trajectories, recovery of missing responses, and allocation of future subjects are also investigated. The practical utility is demonstrated through a real data example from a study of 124 normal and 37 abnormal pregnant women. Simulation studies are provided to validate the proposed approach. PubDate: 2019-03-01 DOI: 10.1007/s11749-018-0612-4

Abstract: We consider a bivariate logistic model for a binary response, and we assume that two rival dependence structures are possible. Copula functions are very useful tools to model different kinds of dependence with arbitrary marginal distributions. We consider Clayton and Gumbel copulae as competing association models. The focus is on applications in testing a new drug, looking at both efficacy and toxicity outcomes. In this context, one of the main goals is to find the dose which maximizes the probability of efficacy without toxicity, herein called the P-optimal dose. If the P-optimal dose changes under the two rival copulae, then it is relevant to identify the proper association model. To this end, we propose a criterion (called PKL) which enables us to find the optimal doses to discriminate between the rival copulae, subject to a constraint that protects patients against dangerous doses. Furthermore, by applying the likelihood ratio test for non-nested models in a simulation study, we confirm that the PKL-optimal design is indeed able to discriminate between the rival copulae. PubDate: 2019-03-01 DOI: 10.1007/s11749-018-0595-1
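
The notion of a P-optimal dose can be illustrated with a small grid search. The sketch below assumes that the joint probability of efficacy and toxicity is given by the copula evaluated at the two logistic marginal probabilities, which is one common way to set up such a model and may differ from the paper's exact specification. The marginal coefficients, the dose grid, and the copula parameters are made-up values for illustration only.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def clayton(u, v, theta):
    """Clayton copula C(u, v), theta > 0."""
    return (u ** (-theta) + v ** (-theta) - 1.0) ** (-1.0 / theta)


def gumbel(u, v, theta):
    """Gumbel copula C(u, v), theta >= 1."""
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-s ** (1.0 / theta))


def prob_efficacy_no_toxicity(dose, copula, theta,
                              eff=(-1.0, 2.0), tox=(-3.0, 1.5)):
    """P(efficacy, no toxicity) = P(efficacy) - P(efficacy and toxicity).

    `eff` and `tox` are hypothetical (intercept, slope) pairs for the
    logistic marginal models.
    """
    p_eff = logistic(eff[0] + eff[1] * dose)
    p_tox = logistic(tox[0] + tox[1] * dose)
    return p_eff - copula(p_eff, p_tox, theta)


def p_optimal_dose(doses, copula, theta):
    """Dose on the grid that maximizes P(efficacy, no toxicity)."""
    return max(doses, key=lambda d: prob_efficacy_no_toxicity(d, copula, theta))


doses = [i / 10 for i in range(0, 41)]  # illustrative grid of doses on [0, 4]
dose_clayton = p_optimal_dose(doses, clayton, 2.0)
dose_gumbel = p_optimal_dose(doses, gumbel, 2.0)
```

Comparing `dose_clayton` with `dose_gumbel` shows whether the choice of association model changes the recommended dose, which is the situation in which discriminating between the copulae matters.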

Abstract: Semiparametric regression models offer considerable flexibility concerning the specification of additive regression predictors, including effects as diverse as nonlinear effects of continuous covariates, spatial effects, random effects, or varying coefficients. Recently, such flexible model predictors have been combined with the possibility to go beyond pure mean-based analyses by specifying regression predictors on potentially all parameters of the response distribution in a distributional regression framework. In this paper, we discuss a generic concept for defining interaction effects in such semiparametric distributional regression models based on tensor products of main effects. These interactions can be assigned anisotropic penalties, i.e. different amounts of smoothness will be associated with the interacting covariates. We investigate identifiability and the decomposition of interactions into main effects and pure interaction effects (similar to a smoothing spline analysis of variance) to facilitate a modular model building process. The decomposition is based on orthogonality in function spaces, which allows for considerable flexibility in setting up the effect decomposition. Inference is based on Markov chain Monte Carlo simulations with iteratively weighted least squares proposals under constraints to ensure identifiability and effect decomposition. One important aspect is therefore to maintain sparse matrix structures of the tensor product also in identifiable, decomposed model formulations. The performance of modular regression is verified in a simulation on decomposed interaction surfaces of two continuous covariates and in two applications: the construction of spatio-temporal interactions for the analysis of precipitation, and functional random effects for analysing house prices. PubDate: 2019-03-01 DOI: 10.1007/s11749-019-00631-z

Abstract: We propose two families of tests for the classical goodness-of-fit problem of testing univariate normality. The new procedures are based on \(L^2\)-distances of the empirical zero-bias transformation to the empirical distribution or the normal distribution function. Weak convergence results are derived under the null hypothesis and under contiguous as well as fixed alternatives. A comparative finite-sample power study shows that the new tests are competitive with classical procedures. PubDate: 2019-02-22 DOI: 10.1007/s11749-019-00630-0

Abstract: Perfect repair is conventionally understood as a replacement of the failed item by a new one. However, contrary to the common perception, "new" does not automatically mean that the distribution of the time to the next failure is identical to that on the previous cycle. First, it can differ because of a dynamic environment and, secondly, because of heterogeneity of the items used for replacement. Both of these causes, which affect the failure mechanism of items, are studied. The environment is modeled by a non-homogeneous Poisson shock process. Two models for the failure mechanism, defined by the extreme shock model and the cumulative shock model, are considered. Examples illustrating our findings are presented. PubDate: 2019-02-18 DOI: 10.1007/s11749-019-00645-7

Abstract: In this paper, we consider the situation in which the observations follow an isotonic generalized partly linear model. Under this model, the mean of the responses is modelled, through a link function, linearly on some covariates and nonparametrically on a univariate regressor in such a way that the nonparametric component is assumed to be a monotone function. A class of robust estimates for the monotone nonparametric component and for the regression parameter, related to the linear one, is defined. The robust estimators are based on a spline approach combined with a score function which bounds large values of the deviance. As an application, we consider the isotonic partly linear log-Gamma regression model. Under regularity conditions, we derive consistency results for the nonparametric function estimators as well as consistency and asymptotic distribution results for the regression parameter estimators. In addition, the empirical influence function allows us to study the sensitivity of the estimators to anomalous observations. Through a Monte Carlo study, we investigate the performance of the proposed estimators under a partly linear log-Gamma regression model with an increasing nonparametric component. The proposal is illustrated on a real data set. PubDate: 2019-02-13 DOI: 10.1007/s11749-019-00629-7

Authors: Ana M. Bianco; Graciela Boente; Wenceslao González-Manteiga; Ana Pérez-González Abstract: In this paper, we consider a general regression model where missing data occur in the response and in the covariates. Our aim is to estimate the marginal distribution function and a marginal functional, such as the mean, the median or any \(\alpha\)-quantile of the response variable. A missing at random condition is assumed in order to prevent bias in the estimation of the marginal measures under a non-ignorable missing mechanism. We give two different approaches for the estimation of the response distribution function and of a given marginal functional, involving inverse probability weighting and the convolution of the distribution function of the observed residuals and that of the observed estimated regression function. Through a Monte Carlo study and two real data sets, we illustrate the behaviour of our proposals. PubDate: 2018-06-05 DOI: 10.1007/s11749-018-0591-5
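
The inverse probability weighting idea mentioned above can be sketched in a few lines. This is a simplified illustration rather than the authors' estimator: it uses the normalized form of the weights, assumes the observation probabilities are known instead of estimated, and only handles missing responses. The function name and arguments are hypothetical.

```python
def ipw_distribution(y_values, observed, propensities, y):
    """Inverse-probability-weighted estimate of P(Y <= y).

    observed[i] is True when y_values[i] was recorded; propensities[i] is
    the (assumed known) probability that unit i is observed.
    """
    num = 0.0
    den = 0.0
    for yi, seen, pi in zip(y_values, observed, propensities):
        if not seen:
            continue  # missing responses contribute nothing directly
        weight = 1.0 / pi  # observed units stand in for similar missing ones
        den += weight
        if yi <= y:
            num += weight
    return num / den
```

For example, with responses `[1.0, None, 3.0, 4.0]`, the second one missing, and observation probabilities `[0.5, 0.5, 1.0, 1.0]`, the first response carries weight 2 and the estimate of P(Y <= 1) is 2 / 4 = 0.5.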

Authors: Fode Zhang; Hon Keung Tony Ng; Yimin Shi; Ruibing Wang Abstract: The invariant geometric structures on the statistical manifold under sufficient statistics have played an important role in both statistical inference and information theory. In this paper, we focus on one of the commonly used invariant geometric structures, the Amari–Chentsov structure, on a statistical manifold. The manifold is derived from statistical models for accelerated life tests (ALTs) with censoring based on the exponential family of distributions. The constant-stress ALTs and step-stress ALTs are considered. We show that the statistical manifold still belongs to the exponential family of distributions, but the cumulant generating function depends on a random variable related to the experimental design of the ALT, which is different from the usual situation. We also investigate the Bregman divergence and Riemannian metric. The relationships between the Riemannian metric and the expected Fisher information metric are studied. The dual coordinate system is studied by using the Legendre transformation. Then, the Amari–Chentsov structure is derived based on the two different coordinate systems. The methodologies are illustrated by using two distributions, the exponential and gamma distributions, in the exponential family of distributions. Using the results on the Fisher information metric, optimal designs of the two types of ALTs are presented under different optimality criteria. Finally, numerical examples are provided to demonstrate the practical applications of the results developed in this paper. PubDate: 2018-05-30 DOI: 10.1007/s11749-018-0587-1

Authors: Chaofeng Yuan; Wensheng Zhu; Xuming He; Jianhua Guo Abstract: Investigators routinely use unidimensional summaries for multidimensional data. In microarray data analysis, for example, the gene expression level is indeed a unidimensional summary of probe-level or SNP measurements. In this paper, we propose a mixture factor model for the low-level data, which enables us to examine the adequacy of a unidimensional summary while accommodating known or latent subgroups in the population. We also develop screening procedures based on the proposed model to identify potentially informative genes in biomedical studies. As shown in our empirical studies, the proposed methods are often more effective than existing methods because the new model goes beyond the conventional unidimensional summaries of gene expressions. PubDate: 2018-05-11 DOI: 10.1007/s11749-018-0585-3

Authors: Jun Zhang; Junpeng Zhu; Zhenghui Feng Abstract: Estimation and hypothesis tests for single-index multiplicative models are considered in this paper. To estimate the unknown single-index parameter, we propose a profile least product relative error estimator coupled with a leave-one-component-out method. For the hypothesis testing of parametric components, a Wald-type test statistic is proposed. The asymptotic properties of the estimators and test statistics are established, and a smoothly clipped absolute deviation penalty is employed to select the relevant variables. The resulting penalized estimators are shown to be asymptotically normal and to have the oracle property. A score-type test statistic is then proposed for checking the validity of single-index multiplicative models. The quadratic form of the scaled test statistic has an asymptotic chi-squared distribution under the null hypothesis and follows a noncentral chi-squared distribution under local alternatives, converging to the null hypothesis at a parametric convergence rate. Simulation studies demonstrate the performance of the proposed procedure, and a real example is analyzed to illustrate its practical usage. PubDate: 2018-05-03 DOI: 10.1007/s11749-018-0586-2