• Flexible clustering via extended mixtures of common t-factor analyzers
• Authors: Wan-Lun Wang; Tsung-I Lin
Abstract: Mixtures of t-factor analyzers have been broadly used for model-based density estimation and clustering of high-dimensional data from a heterogeneous population with longer-than-normal tails or atypical observations. To reduce the number of parameters in the component covariance matrices, the mixtures of common t-factor analyzers (MCtFA) have been recently proposed by assuming a common factor loading across different components. In this paper, we present an extended version of MCtFA using distinct covariance matrices for component errors. The modified mixture model offers a more appropriate way to represent the data in a graphical fashion. Two flexible EM-type algorithms are developed for iteratively computing maximum likelihood estimates of parameters. Practical considerations for the specification of starting values, model-based clustering, classification of new subjects and identification of potential outliers are also provided. We demonstrate the superiority of the proposed methodology by analyzing the Italian wine data and a simulation study.
• A test for the global minimum variance portfolio for small sample and singular covariance
• Authors: Taras Bodnar; Stepan Mazur; Krzysztof Podgórski
Abstract: Recently, a test dealing with the linear hypothesis for the global minimum variance portfolio weights was obtained under the assumption of a non-singular covariance matrix. However, the problem of potential multicollinearity and correlations of assets constitutes a limitation of the classical portfolio theory. Therefore, there is an interest in developing theory in the presence of singularities in the covariance matrix. In this paper, we extend the test by analyzing the portfolio weights in the small sample case with a singular population covariance matrix. The results are illustrated using actual stock returns and a discussion of practical relevance of the model is presented.
• Prediction model-based kernel density estimation when group membership is subject to missing
• Authors: Hua He; Wenjuan Wang; Wan Tang
Abstract: The density function is a fundamental concept in data analysis. When a population consists of heterogeneous subjects, it is often of great interest to estimate the density functions of the subpopulations. Nonparametric methods such as kernel smoothing estimates may be applied to each subpopulation to estimate the density functions if there are no missing values. In situations where the membership for a subpopulation is missing, kernel smoothing estimates using only subjects with membership available are valid only under missing completely at random (MCAR). In this paper, we propose new kernel smoothing methods for density function estimates by applying prediction models of the membership under the missing at random (MAR) assumption. The asymptotic properties of the new estimates are developed, and simulation studies and a real study in mental health are used to illustrate the performance of the new estimates.
• Measuring temporal trends in biodiversity
• Authors: S. T. Buckland; Y. Yuan; E. Marcon
Abstract: In 2002, nearly 200 nations signed up to the 2010 target of the Convention for Biological Diversity, ‘to significantly reduce the rate of biodiversity loss by 2010’. To assess whether the target was met, it became necessary to quantify temporal trends in measures of diversity. This resulted in a marked shift in focus for biodiversity measurement. We explore the developments in measuring biodiversity that were prompted by the 2010 target. We consider measures based on species proportions, and also explain why a geometric mean of relative abundance estimates was preferred to such measures for assessing progress towards the target. We look at the use of diversity profiles, and consider how species similarity can be incorporated into diversity measures. We also discuss measures of turnover that can be used to quantify shifts in community composition arising, for example, from climate change.
• Variance estimation for integrated population models
• Authors: Panagiotis Besbeas; Byron J. T. Morgan
Abstract: State-space models are widely used in ecology. However, it is well known that in practice it can be difficult to estimate both the process and observation variances that occur in such models. We consider this issue for integrated population models, which incorporate state-space models for population dynamics. To some extent, the mechanism of integrated population models protects against this problem, but it can still arise, and two illustrations are provided, in each of which the observation variance is estimated as zero. In the context of an extended case study involving data on British Grey herons, we consider alternative approaches for dealing with the problem when it occurs. In particular, we consider penalised likelihood, a method based on fitting splines and a method of pseudo-replication, which is undertaken via a simple bootstrap procedure. For the case study of the paper, it is shown that when it occurs, an estimate of zero observation variance is unimportant for inference relating to the model parameters of primary interest. This unexpected finding is supported by a simulation study.
• First-order random coefficients integer-valued threshold autoregressive processes
• Authors: Han Li; Kai Yang; Shishun Zhao; Dehui Wang
Abstract: In this paper, we introduce a first-order random coefficient integer-valued threshold autoregressive process, which is based on binomial thinning. Basic probabilistic and statistical properties of this model are discussed. Conditional least squares and conditional maximum likelihood estimators are derived both for the case where the threshold variable is known and for the case where it is unknown. The asymptotic properties of the estimators are established. Moreover, the forecasting problem is addressed. Finally, some numerical results of the estimates and a real data example are presented.
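As a rough illustration of the kind of process this abstract describes, the sketch below simulates a two-regime integer-valued autoregression built on binomial thinning. It is not the authors' specification: the Beta-distributed thinning coefficients, the Poisson innovations, and the names `binomial_thinning`, `poisson` and `simulate` are all assumptions made for the example.

```python
import math
import random


def binomial_thinning(alpha, x, rng):
    """Return alpha ∘ x: the number of successes in x Bernoulli(alpha) trials."""
    return sum(1 for _ in range(x) if rng.random() < alpha)


def poisson(lam, rng):
    """Draw a Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate(n, r, beta_params, lam, x0=0, seed=0):
    """Simulate n steps of a two-regime thinning-based count process.

    Assumed (hypothetical) model: if the previous count is <= r, the
    thinning coefficient is drawn from Beta(*beta_params[0]), otherwise
    from Beta(*beta_params[1]); innovations are Poisson(lam).
    """
    rng = random.Random(seed)
    path, x = [], x0
    for _ in range(n):
        a, b = beta_params[0] if x <= r else beta_params[1]
        alpha = rng.betavariate(a, b)  # random coefficient for this step
        x = binomial_thinning(alpha, x, rng) + poisson(lam, rng)
        path.append(x)
    return path


path = simulate(n=200, r=3, beta_params=[(2, 5), (5, 2)], lam=1.5, seed=42)
print(path[:10])
```

Because each thinned count can never exceed the previous count and the innovations are non-negative integers, every value on the simulated path is a non-negative integer, which is the property that makes thinning a natural replacement for scalar multiplication in count time series.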
• A distance-based model for spatial prediction using radial basis functions
• Authors: Carlos E. Melo; Oscar O. Melo; Jorge Mateu
Abstract: In the context of local interpolators, radial basis functions (RBFs) are known to reduce the computational time by using a subset of the data for prediction purposes. In this paper, we propose a new distance-based spatial RBF method which allows modeling spatial continuous random variables. The trend is incorporated into an RBF according to a detrending procedure with mixed variables, among which we may have categorical variables. In order to evaluate the efficiency of the proposed method, a simulation study is carried out for a variety of practical scenarios for five distinct RBFs, incorporating principal coordinates. Finally, the proposed method is illustrated with an application of prediction of calcium concentration measured at a depth of 0–20 cm in Brazil, selecting the smoothing parameter by cross-validation.
• Improving the usability of spatial point process methodology: an interdisciplinary dialogue between statistics and ecology
• Authors: Janine B. Illian; David F. R. P. Burslem
Abstract: The last few decades have seen an increasing interest and strong development in spatial point process methodology, and associated software that facilitates model fitting has become available. A lot of this progress has made these approaches more accessible to users, through freely available software. However, in the ecological user community the methodology has only been slowly picked up despite its obvious relevance to the field. This paper reflects on this development, highlighting mutual benefits of interdisciplinary dialogue for both statistics and ecology. We detail the contribution point process methodology has made to research on biodiversity theory as a result of this dialogue and reflect on reasons for the slow take-up of the methodology. This primarily concerns the current lack of consideration of the usability of the approaches, which we discuss in detail, presenting current discussions as well as indicating future directions.
• Statistical modelling of individual animal movement: an overview of key methods and a discussion of practical challenges
• Authors: Toby A. Patterson; Alison Parton; Roland Langrock; Paul G. Blackwell; Len Thomas; Ruth King
Abstract: With the influx of complex and detailed tracking data gathered from electronic tracking devices, the analysis of animal movement data has recently emerged as a cottage industry among biostatisticians. New approaches of ever greater complexity continue to be added to the literature. In this paper, we review what we believe to be some of the most popular and most useful classes of statistical models used to analyse individual animal movement data. Specifically, we consider discrete-time hidden Markov models, more general state-space models and diffusion processes. We argue that these models should be core components in the toolbox for quantitative researchers working on stochastic modelling of individual animal movement. The paper concludes by offering some general observations on the direction of statistical analysis of animal movement. There is a trend in movement ecology towards what are arguably overly complex modelling approaches which are inaccessible to ecologists, unwieldy with large data sets or not based on mainstream statistical practice. Additionally, some analysis methods developed within the ecological community ignore fundamental properties of movement data, potentially leading to misleading conclusions about animal movement. Corresponding approaches, e.g. based on Lévy walk-type models, continue to be popular despite having been largely discredited. We contend that there is a need for an appropriate balance between the extremes of either being overly complex or being overly simplistic, whereby the discipline relies on models of intermediate complexity that are usable by general ecologists, but grounded in well-developed statistical practice and efficient to fit to large data sets.
• Bayesian conditional inference for Rasch models
• Authors: Clemens Draxler
Abstract: This paper is concerned with Bayesian inference in psychometric modeling. It treats conditional likelihood functions obtained from discrete conditional probability distributions which are generalizations of the hypergeometric distribution. The influence of nuisance parameters is eliminated by conditioning on observed values of their sufficient statistics, and Bayesian considerations are only referred to parameters of interest. Since such a combination of techniques to deal with both types of parameters is less common in psychometrics, a wider scope in future research may be gained. The focus is on the evaluation of the empirical appropriateness of assumptions of the Rasch model, thereby pointing to an alternative to the frequentist approach which dominates in this context. A number of examples are discussed. Some are very straightforward to apply. Others are computationally intensive and may be impractical. The suggested procedure is illustrated using real data from a study on vocational education.
• On composite likelihood in bivariate meta-analysis of diagnostic test accuracy studies
• Authors: Aristidis K. Nikoloulopoulos
Abstract: The composite likelihood is amongst the computational methods used for estimation of the generalized linear mixed model (GLMM) in the context of bivariate meta-analysis of diagnostic test accuracy studies. Its advantage is that the likelihood can be derived conveniently under the assumption of independence between the random effects, but there has not been a clear analysis of the merit or necessity of this method. For synthesis of diagnostic test accuracy studies, a copula mixed model has been proposed in the biostatistics literature. This general model includes the GLMM as a special case and can also allow for flexible dependence modelling, different from assuming simple linear correlation structures, normality and tail independence in the joint tails. A maximum likelihood (ML) method, which is based on evaluating the bi-dimensional integrals of the likelihood with quadrature methods, has been proposed, and in fact it eases any computational difficulty that might be caused by the double integral in the likelihood function. Both methods are thoroughly examined with extensive simulations and illustrated with data of a published meta-analysis. It is shown that the ML method has no non-convergence issues or computational difficulties and at the same time allows estimation of the dependence between study-specific sensitivity and specificity and thus prediction via summary receiver operating curves.
• Estimation of structural impulse responses: short-run versus long-run identifying restrictions
• Authors: Helmut Lütkepohl; Anna Staszewska-Bystrova; Peter Winker
Abstract: There is evidence that estimates of long-run impulse responses of structural vector autoregressive (VAR) models based on long-run identifying restrictions may not be very accurate. This finding suggests that using short-run identifying restrictions may be preferable. We compare structural VAR impulse response estimates based on long-run and short-run identifying restrictions and find that long-run identifying restrictions can result in much more precise estimates for the structural impulse responses than restrictions on the impact effects of the shocks.
• Non-concave penalization in linear mixed-effect models and regularized selection of fixed effects
• Authors: Abhik Ghosh; Magne Thoresen
Abstract: Mixed-effect models are very popular for analyzing data with a hierarchical structure. In medical applications, typical examples include repeated observations within subjects in a longitudinal design and patients nested within centers in a multicenter design. However, recently, due to medical advances, the number of fixed-effect covariates collected from each patient can be quite large, e.g., data on gene expressions of each patient, and not all of these variables are necessarily important for the outcome. So, it is very important to choose the relevant covariates correctly for obtaining the optimal inference for the overall study. On the other hand, the relevant random effects will often be low-dimensional and pre-specified. In this paper, we consider regularized selection of important fixed-effect variables in linear mixed-effect models along with maximum penalized likelihood estimation of both fixed and random-effect parameters based on general non-concave penalties. Asymptotic and variable selection consistency with oracle properties are proved for low-dimensional cases as well as for high dimensionality of non-polynomial order of sample size (number of parameters is much larger than sample size). We also provide a suitable computationally efficient algorithm for implementation. Additionally, all the theoretical results are proved for a general non-convex optimization problem that applies to several important situations well beyond the mixed model setup (like finite mixture of regressions) illustrating the huge range of applicability of our proposal.
• Closure properties of classes of multiple testing procedures
• Authors: Georg Hahn
Abstract: Statistical discoveries are often obtained through multiple hypothesis testing. A variety of procedures exists to evaluate multiple hypotheses, for instance the ones of Benjamini–Hochberg, Bonferroni, Holm or Sidak. We are particularly interested in multiple testing procedures with two desired properties: (solely) monotonic and well-behaved procedures. This article investigates to what extent the classes of (monotonic or well-behaved) multiple testing procedures, in particular the subclasses of so-called step-up and step-down procedures, are closed under basic set operations, specifically the union, intersection, difference and the complement of sets of rejected or non-rejected hypotheses. The present article proves two main results: First, taking the union or intersection of arbitrary (monotonic or well-behaved) multiple testing procedures results in new procedures which are monotonic but not well-behaved, whereas the complement or difference generally preserves neither property. Second, the two classes of (solely monotonic or well-behaved) step-up and step-down procedures are closed under taking the union or intersection, but not the complement or difference.
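To make the set operations in this abstract concrete, the sketch below implements textbook versions of three of the named procedures (Bonferroni, the Holm step-down procedure and the Benjamini–Hochberg step-up procedure), each returning the set of indices of rejected hypotheses, and then combines two of them by union and intersection. The function names and the p-values are illustrative assumptions, and the code does not reproduce the article's formal definitions of monotonic or well-behaved procedures.

```python
def bonferroni(p, alpha=0.05):
    """Reject every hypothesis whose p-value is at most alpha / m."""
    m = len(p)
    return {i for i, pi in enumerate(p) if pi <= alpha / m}


def holm(p, alpha=0.05):
    """Holm step-down: walk up the sorted p-values, stop at the first failure."""
    m = len(p)
    order = sorted(range(m), key=lambda i: p[i])
    rejected = set()
    for k, i in enumerate(order):
        if p[i] > alpha / (m - k):
            break
        rejected.add(i)
    return rejected


def benjamini_hochberg(p, alpha=0.05):
    """BH step-up: reject the k smallest p-values, k the largest passing rank."""
    m = len(p)
    order = sorted(range(m), key=lambda i: p[i])
    cutoff = 0
    for k, i in enumerate(order, start=1):
        if p[i] <= k * alpha / m:
            cutoff = k
    return set(order[:cutoff])


# Hypothetical p-values, chosen only for illustration.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

bh = benjamini_hochberg(p_values)
hm = holm(p_values)
print("BH:", sorted(bh))                 # BH: [0, 1]
print("Holm:", sorted(hm))               # Holm: [0]
print("union:", sorted(bh | hm))         # union: [0, 1]
print("intersection:", sorted(bh & hm))  # intersection: [0]
```

Representing each procedure as a function from p-values to a rejection set means the union and intersection studied in the article reduce to Python's `|` and `&` operators on the returned sets.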
• A penalized spline estimator for fixed effects panel data models
• Authors: Peter Pütz; Thomas Kneib
Abstract: Estimating nonlinear effects of continuous covariates by penalized splines is well established for regressions with cross-sectional data as well as for panel data regressions with random effects. Penalized splines are particularly advantageous since they enable both the estimation of unknown nonlinear covariate effects and inferential statements about these effects. The latter are based, for example, on simultaneous confidence bands that provide a simultaneous uncertainty assessment for the whole estimated functions. In this paper, we consider fixed effects panel data models instead of random effects specifications and develop a first-difference approach for the inclusion of penalized splines in this case. We take the resulting dependence structure into account and adapt the construction of simultaneous confidence bands accordingly. In addition, the penalized spline estimates as well as the confidence bands are also made available for derivatives of the estimated effects which are of considerable interest in many application areas. As an empirical illustration, we analyze the dynamics of life satisfaction over the life span based on data from the German Socio-Economic Panel. An open-source software implementation of our methods is available in the R package pamfe.
• Impact of measurement errors on the performance and distributional properties of the multivariate capability index $$\mathbf{NMC}_{\mathbf{pm}}$$
• Authors: Daniela F. Dianda; Marta B. Quaglino; José A. Pagura
Abstract: Current industrial processes are too sophisticated to be tied to only one quality variable to describe the process result. Instead, many process variables need to be analyzed together to assess the process performance. In particular, multivariate process capability analysis has been the focus of study during the last few decades, during which many authors proposed alternative ways to build multivariate process capability indices (MPCIs). These measures are extremely attractive to people in charge of industrial processes, because they provide a single measure that summarizes the whole process performance regarding its specifications. In most practical applications, these indices are estimated from sampling information collected by measuring the variables of interest on the process outcome. This activity introduces an additional source of variation to the data, whose effect on the properties of the indices needs to be considered. Unfortunately, this problem has received scarce attention, at least in the multivariate domain. In this paper, we study how the presence of measurement errors affects the properties of one of the MPCIs recommended in previous research. The results indicate that even small measurement errors can induce distortions on the index value, leading to wrong conclusions about the process performance.
• Empirical phi-divergence test statistics for the difference of means of two populations
• Authors: N. Balakrishnan; N. Martín; L. Pardo
Abstract: Empirical phi-divergence test statistics have been shown to be a useful technique for the simple null hypothesis to improve the finite sample behavior of the classical likelihood ratio test statistic, as well as for model misspecification problems, in both cases for the one population problem. This paper introduces this methodology for two-sample problems. A simulation study illustrates situations in which the new test statistics become a competitive tool with respect to the classical z test and the likelihood ratio test statistic.
• Box–Cox symmetric distributions and applications to nutritional data
• Authors: Silvia L. P. Ferrari; Giovana Fumes
Abstract: We introduce and study the Box–Cox symmetric class of distributions, which is useful for modeling positively skewed, possibly heavy-tailed, data. The new class of distributions includes the Box–Cox t, Box–Cox Cole-Green (or Box–Cox normal), Box–Cox power exponential distributions, and the class of the log-symmetric distributions as special cases. It provides easy parameter interpretation, which makes it convenient for regression modeling purposes. Additionally, it provides enough flexibility to handle outliers. The usefulness of the Box–Cox symmetric models is illustrated in a series of applications to nutritional data.
• Minimum volume confidence sets for parameters of normal distributions
• Authors: Jin Zhang
Abstract: Under a proper restriction, we establish the minimum volume confidence set (interval and region) for the parameters of any normal distribution. Compared with classical methods, the proposed confidence region is proved to be the best, with minimum area, for any confidence level, sample size and sample data.
• Fourier methods for analyzing piecewise constant volatilities
• Authors: Max Wornowizki; Roland Fried; Simos G. Meintanis
Abstract: We develop procedures for testing whether a sequence of independent random variables has constant variance. If this is fulfilled, the modulus of a Fourier-type transformation of the volatility process is identically equal to one. Our approach takes advantage of this property considering a canonical estimator for the modulus under the assumption of piecewise identically distributed zero mean observations. Using blockwise variance estimation, we introduce several test statistics resulting from different weight functions. All of them are given by simple explicit formulae. We prove the consistency of the corresponding tests and compare them to alternative procedures in extensive Monte Carlo experiments. According to the results, our proposals offer fairly high power, particularly in the case of multiple structural breaks. They also allow for an adequate estimation of the change point positions. We apply our procedure to gold mining data and also briefly discuss how it can be modified to test for the stationarity of other distributional parameters.
