 AStA Advances in Statistical Analysis   Published by Springer-Verlag
• Information content of partially rank-ordered set samples
• Authors: Armin Hatefi; Mohammad Jafari Jozani
Pages: 117 - 149
Abstract: Partially rank-ordered set (PROS) sampling is a generalization of ranked set sampling in which rankers are not required to fully rank the sampling units in each set, hence having more flexibility to perform the necessary judgemental ranking process. PROS sampling has a wide range of applications in fields ranging from environmental and ecological studies to medical research, and it has been shown to be superior to ranked set sampling and simple random sampling for estimating the population mean. We study the Fisher information content and uncertainty structure of PROS samples and compare them with those of simple random sample (SRS) and ranked set sample (RSS) counterparts of the same size from the underlying population. We study uncertainty structure in terms of the Shannon entropy, Rényi entropy and Kullback–Leibler (KL) discrimination measures.
PubDate: 2017-04-01
DOI: 10.1007/s10182-016-0277-9
Issue No: Vol. 101, No. 2 (2017)
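For readers unfamiliar with the three uncertainty measures named in the abstract above, the following is a minimal sketch of their textbook definitions for discrete distributions. It is not the authors' derivation for PROS samples (which concerns continuous populations); the function names are illustrative assumptions.

```python
import math

def shannon_entropy(p):
    """Shannon entropy H(p) = -sum p_i log p_i, in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def renyi_entropy(p, alpha):
    """Renyi entropy of order alpha: log(sum p_i^alpha) / (1 - alpha)."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    return math.log(sum(pi ** alpha for pi in p if pi > 0)) / (1 - alpha)

def kl_divergence(p, q):
    """Kullback-Leibler discrimination KL(p || q) = sum p_i log(p_i / q_i)."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf  # p puts mass where q has none
            total += pi * math.log(pi / qi)
    return total
```

For a uniform distribution both entropies equal the log of the number of outcomes, and the KL measure is zero exactly when the two distributions coincide.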

• How risky is the optimal portfolio which maximizes the Sharpe ratio?
• Authors: Taras Bodnar; Taras Zabolotskyy
Pages: 1 - 28
Abstract: In this paper, we investigate the properties of the optimal portfolio in the sense of maximizing the Sharpe ratio (SR) and develop a procedure for the calculation of the risk of this portfolio. This is achieved by constructing an optimal portfolio which minimizes the Value-at-Risk (VaR) and at the same time coincides with the tangent (market) portfolio on the efficient frontier which is related to the SR portfolio. The resulting significance level of the minimum VaR portfolio is then used to determine the risk of both the market portfolio and the corresponding SR portfolio. However, the expression of this significance level depends on the unknown parameters which have to be estimated in practice. It leads to an estimator of the significance level whose distributional properties are investigated in detail. Based on these results, a confidence interval for the suggested risk measure of the SR portfolio is constructed and applied to real data. Both theoretical and empirical findings document that the SR portfolio is very risky since the corresponding significance level is smaller than 90 % in most of the considered cases.
PubDate: 2017-01-01
DOI: 10.1007/s10182-016-0270-3
Issue No: Vol. 101, No. 1 (2017)
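As a rough illustration of the quantities discussed in the abstract above, the sketch below computes textbook maximum-Sharpe-ratio weights for two assets, proportional to the inverse covariance matrix times the excess returns, and a Value-at-Risk under a normality assumption. The risk-free rate `rf`, the normal model and all function names are assumptions made for this example; this is not the authors' procedure for estimating the significance level.

```python
from statistics import NormalDist

def max_sharpe_weights_2(mu, cov, rf=0.0):
    """Weights proportional to inv(cov) @ (mu - rf), for two assets."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    e1, e2 = mu[0] - rf, mu[1] - rf
    # Closed-form 2x2 inverse applied to the excess-return vector.
    z1 = (d * e1 - b * e2) / det
    z2 = (-c * e1 + a * e2) / det
    total = z1 + z2
    if total == 0:
        raise ValueError("weights cannot be normalised to sum to one")
    return (z1 / total, z2 / total)

def portfolio_mean_sd(w, mu, cov):
    """Mean and standard deviation of the portfolio return."""
    mean = w[0] * mu[0] + w[1] * mu[1]
    var = sum(w[i] * cov[i][j] * w[j] for i in range(2) for j in range(2))
    return mean, var ** 0.5

def sharpe_ratio(w, mu, cov, rf=0.0):
    mean, sd = portfolio_mean_sd(w, mu, cov)
    return (mean - rf) / sd

def normal_var(mean, sd, alpha):
    """Loss not exceeded with probability alpha, assuming normal returns."""
    return -mean + sd * NormalDist().inv_cdf(alpha)
```

With hypothetical inputs `mu = (0.08, 0.12)`, `cov = ((0.04, 0.01), (0.01, 0.09))` and `rf = 0.02`, the resulting weights sum to one and attain a Sharpe ratio at least as large as that of an equally weighted portfolio.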

• Statistical modelling of individual animal movement: an overview of key
methods and a discussion of practical challenges
• Authors: Toby A. Patterson; Alison Parton; Roland Langrock; Paul G. Blackwell; Len Thomas; Ruth King
Abstract: With the influx of complex and detailed tracking data gathered from electronic tracking devices, the analysis of animal movement data has recently emerged as a cottage industry among biostatisticians. New approaches of ever greater complexity continue to be added to the literature. In this paper, we review what we believe to be some of the most popular and most useful classes of statistical models used to analyse individual animal movement data. Specifically, we consider discrete-time hidden Markov models, more general state-space models and diffusion processes. We argue that these models should be core components in the toolbox for quantitative researchers working on stochastic modelling of individual animal movement. The paper concludes by offering some general observations on the direction of statistical analysis of animal movement. There is a trend in movement ecology towards what are arguably overly complex modelling approaches which are inaccessible to ecologists, unwieldy with large data sets or not based on mainstream statistical practice. Additionally, some analysis methods developed within the ecological community ignore fundamental properties of movement data, potentially leading to misleading conclusions about animal movement. Corresponding approaches, e.g. based on Lévy walk-type models, continue to be popular despite having been largely discredited. We contend that there is a need for an appropriate balance between the extremes of either being overly complex or being overly simplistic, whereby the discipline relies on models of intermediate complexity that are usable by general ecologists, but grounded in well-developed statistical practice and efficient to fit to large data sets.
PubDate: 2017-07-04
DOI: 10.1007/s10182-017-0302-7

• Bayesian conditional inference for Rasch models
• Authors: Clemens Draxler
Abstract: This paper is concerned with Bayesian inference in psychometric modeling. It treats conditional likelihood functions obtained from discrete conditional probability distributions which are generalizations of the hypergeometric distribution. The influence of nuisance parameters is eliminated by conditioning on observed values of their sufficient statistics, and Bayesian considerations are applied only to the parameters of interest. Since such a combination of techniques to deal with both types of parameters is less common in psychometrics, a wider scope in future research may be gained. The focus is on the evaluation of the empirical appropriateness of assumptions of the Rasch model, thereby pointing to an alternative to the frequentist approach that dominates in this context. A number of examples are discussed. Some are very straightforward to apply. Others are computationally intensive and may be impractical. The suggested procedure is illustrated using real data from a study on vocational education.
PubDate: 2017-06-21
DOI: 10.1007/s10182-017-0303-6

• On composite likelihood in bivariate meta-analysis of diagnostic test
accuracy studies
• Authors: Aristidis K. Nikoloulopoulos
Abstract: The composite likelihood is amongst the computational methods used for estimation of the generalized linear mixed model (GLMM) in the context of bivariate meta-analysis of diagnostic test accuracy studies. Its advantage is that the likelihood can be derived conveniently under the assumption of independence between the random effects, but there has not been a clear analysis of the merit or necessity of this method. For synthesis of diagnostic test accuracy studies, a copula mixed model has been proposed in the biostatistics literature. This general model includes the GLMM as a special case and can also allow for flexible dependence modelling, different from assuming simple linear correlation structures, normality and tail independence in the joint tails. A maximum likelihood (ML) method, which is based on evaluating the bi-dimensional integrals of the likelihood with quadrature methods, has been proposed, and in fact it eases any computational difficulty that might be caused by the double integral in the likelihood function. Both methods are thoroughly examined with extensive simulations and illustrated with data of a published meta-analysis. It is shown that the ML method has no non-convergence issues or computational difficulties and at the same time allows estimation of the dependence between study-specific sensitivity and specificity and thus prediction via summary receiver operating curves.
PubDate: 2017-06-20
DOI: 10.1007/s10182-017-0299-y

• Estimation of structural impulse responses: short-run versus long-run
identifying restrictions
• Authors: Helmut Lütkepohl; Anna Staszewska-Bystrova; Peter Winker
Abstract: There is evidence that estimates of long-run impulse responses of structural vector autoregressive (VAR) models based on long-run identifying restrictions may not be very accurate. This finding suggests that using short-run identifying restrictions may be preferable. We compare structural VAR impulse response estimates based on long-run and short-run identifying restrictions and find that long-run identifying restrictions can result in much more precise estimates for the structural impulse responses than restrictions on the impact effects of the shocks.
PubDate: 2017-06-20
DOI: 10.1007/s10182-017-0300-9

• Non-concave penalization in linear mixed-effect models and regularized
selection of fixed effects
• Authors: Abhik Ghosh; Magne Thoresen
Abstract: Mixed-effect models are very popular for analyzing data with a hierarchical structure. In medical applications, typical examples include repeated observations within subjects in a longitudinal design and patients nested within centers in a multicenter design. Recently, however, owing to medical advances, the number of fixed-effect covariates collected from each patient can be quite large, e.g., data on gene expressions of each patient, and not all of these variables are necessarily important for the outcome. So, it is very important to choose the relevant covariates correctly for obtaining the optimal inference for the overall study. On the other hand, the relevant random effects will often be low-dimensional and pre-specified. In this paper, we consider regularized selection of important fixed-effect variables in linear mixed-effect models along with maximum penalized likelihood estimation of both fixed and random-effect parameters based on general non-concave penalties. Asymptotic and variable selection consistency with oracle properties are proved for low-dimensional cases as well as for high dimensionality of non-polynomial order of sample size (number of parameters is much larger than sample size). We also provide a suitable computationally efficient algorithm for implementation. Additionally, all the theoretical results are proved for a general non-convex optimization problem that applies to several important situations well beyond the mixed model setup (like finite mixture of regressions), illustrating the huge range of applicability of our proposal.
PubDate: 2017-05-15
DOI: 10.1007/s10182-017-0298-z

• Closure properties of classes of multiple testing procedures
• Authors: Georg Hahn
Abstract: Statistical discoveries are often obtained through multiple hypothesis testing. A variety of procedures exists to evaluate multiple hypotheses, for instance the ones of Benjamini–Hochberg, Bonferroni, Holm or Sidak. We are particularly interested in multiple testing procedures with two desired properties: (solely) monotonic and well-behaved procedures. This article investigates to which extent the classes of (monotonic or well-behaved) multiple testing procedures, in particular the subclasses of so-called step-up and step-down procedures, are closed under basic set operations, specifically the union, intersection, difference and the complement of sets of rejected or non-rejected hypotheses. The present article proves two main results: First, taking the union or intersection of arbitrary (monotonic or well-behaved) multiple testing procedures results in new procedures which are monotonic but not well-behaved, whereas the complement or difference generally preserves neither property. Second, the two classes of (solely monotonic or well-behaved) step-up and step-down procedures are closed under taking the union or intersection, but not the complement or difference.
PubDate: 2017-05-05
DOI: 10.1007/s10182-017-0297-0
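To make the set operations in the abstract above concrete, here is a minimal sketch of three of the named procedures (Bonferroni, the step-down Holm procedure and the step-up Benjamini–Hochberg procedure), each returning the set of indices of rejected hypotheses so that unions and intersections can be taken with ordinary set operators. The function names are illustrative; the sketch does not check the article's "monotonic" or "well-behaved" properties, which the abstract does not define.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject hypothesis i when p_i <= alpha / m."""
    m = len(pvals)
    return {i for i, p in enumerate(pvals) if p <= alpha / m}

def holm(pvals, alpha=0.05):
    """Step-down: walk up the sorted p-values, stop at the first failure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    rejected = set()
    for rank, i in enumerate(order):  # rank = 0 for the smallest p-value
        if pvals[i] <= alpha / (m - rank):
            rejected.add(i)
        else:
            break
    return rejected

def benjamini_hochberg(pvals, alpha=0.05):
    """Step-up: reject the k smallest, k = largest rank with p_(k) <= k*alpha/m."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k = rank
    return set(order[:k])

# Hypothetical p-values, chosen only for illustration.
pvals = [0.045, 0.001, 0.2, 0.008, 0.025]
union = holm(pvals) | benjamini_hochberg(pvals)
intersection = holm(pvals) & benjamini_hochberg(pvals)
```

The union and intersection of two rejection sets are themselves rejection sets of a new procedure, which is the kind of object whose properties the article studies.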

• A penalized spline estimator for fixed effects panel data models
• Authors: Peter Pütz; Thomas Kneib
Abstract: Estimating nonlinear effects of continuous covariates by penalized splines is well established for regressions with cross-sectional data as well as for panel data regressions with random effects. Penalized splines are particularly advantageous since they enable both the estimation of unknown nonlinear covariate effects and inferential statements about these effects. The latter are based, for example, on simultaneous confidence bands that provide a simultaneous uncertainty assessment for the whole estimated functions. In this paper, we consider fixed effects panel data models instead of random effects specifications and develop a first-difference approach for the inclusion of penalized splines in this case. We take the resulting dependence structure into account and adapt the construction of simultaneous confidence bands accordingly. In addition, the penalized spline estimates as well as the confidence bands are also made available for derivatives of the estimated effects which are of considerable interest in many application areas. As an empirical illustration, we analyze the dynamics of life satisfaction over the life span based on data from the German Socio-Economic Panel. An open-source software implementation of our methods is available in the R package pamfe.
PubDate: 2017-04-12
DOI: 10.1007/s10182-017-0296-1
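The first-difference idea mentioned in the abstract above can be illustrated with a deliberately simplified sketch: for a model with unit-specific intercepts, differencing consecutive observations within each unit removes the intercepts. The example assumes a single linear covariate effect rather than the penalized splines used in the article, and it is unrelated to the interface of the pamfe package; the function name and data layout are assumptions.

```python
def first_difference_slope(panel):
    """Estimate b in y_it = a_i + b * x_it + error by first differences.

    panel maps a unit id to its time-ordered list of (x, y) pairs.
    Differencing within a unit cancels the fixed effect a_i, leaving
    dy = b * dx + differenced error, fitted by least squares through
    the origin.
    """
    num = 0.0
    den = 0.0
    for obs in panel.values():
        for (x0, y0), (x1, y1) in zip(obs, obs[1:]):
            dx, dy = x1 - x0, y1 - y0
            num += dx * dy
            den += dx * dx
    if den == 0:
        raise ValueError("no within-unit variation in x")
    return num / den

# Two hypothetical units with very different intercepts (10 and -5), slope 2.
panel = {
    "a": [(0, 10), (1, 12), (3, 16)],
    "b": [(0, -5), (2, -1), (5, 5)],
}
slope = first_difference_slope(panel)
```

Because differencing makes consecutive error terms dependent, inference has to account for that dependence, which is the adjustment the abstract refers to.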

• Impact of measurement errors on the performance and distributional
properties of the multivariate capability index $\mathbf{NMC}_{\mathbf{pm}}$
• Authors: Daniela F. Dianda; Marta B. Quaglino; José A. Pagura
Abstract: Current industrial processes are too sophisticated to be tied to only one quality variable to describe the process result. Instead, many process variables need to be analyzed together to assess the process performance. In particular, multivariate process capability analysis based on multivariate process capability indices (MPCIs) has been the focus of study during the last few decades, during which many authors proposed alternatives to build the indices. These measures are extremely attractive to people in charge of industrial processes, because they provide a single measure that summarizes the whole process performance regarding its specifications. In most practical applications, these indices are estimated from sampling information collected by measuring the variables of interest on the process outcome. This activity introduces an additional source of variation to the data, whose effect on the properties of the indices needs to be considered. Unfortunately, this problem has received scarce attention, at least in the multivariate domain. In this paper, we study how the presence of measurement errors affects the properties of one of the MPCIs recommended in previous research. The results indicate that even small measurement errors can induce distortions in the index value, leading to wrong conclusions about the process performance.
PubDate: 2017-04-07
DOI: 10.1007/s10182-017-0295-2

• A formal framework for hedonic elementary price indices
• Authors: Hans Wolfgang Brachinger; Michael Beer; Olivier Schöni
Abstract: Hedonic methods are considered state of the art for handling quality changes when compiling consumer price indices. The present article proposes first a mathematical description of characteristics and of elementary aggregates. In a following step, a hedonic econometric model is formulated and hedonic elementary population indices are defined. We emphasise that population indices are unobservable economic parameters that need to be estimated by suitable sample indices. It is shown that within the framework developed here, many of the hedonic index formulae used in practice are identified as sample versions corresponding to particular hedonic elementary population indices. The article closes with an empirical part on quarterly housing data where the considered hedonic indices are estimated along with their bootstrapped confidence intervals. It is shown that the computed confidence intervals together with the results from theory suggest a particular answer to the price index problem.
PubDate: 2017-02-22
DOI: 10.1007/s10182-017-0293-4

• A mixture latent variable model for modeling mixed data in heterogeneous
populations and its applications
• Authors: Leila Amiri; Mojtaba Khazaei; Mojtaba Ganjali
Abstract: Latent variable models are widely used for joint modeling of mixed data including nominal, ordinal, count and continuous data. In this paper, we consider a latent variable model for jointly modeling relationships between mixed binary, count and continuous variables with some observed covariates. We assume that, given a latent variable, the mixed variables of interest are independent and that the count and continuous variables have Poisson and normal distributions, respectively. As such data may be extracted from different subpopulations, unobserved heterogeneity has to be taken into account. A mixture distribution is considered (for the distribution of the latent variable) which accounts for the heterogeneity. The generalized EM algorithm, which uses the Newton–Raphson algorithm inside the EM algorithm, is used to compute the maximum likelihood estimates of the parameters. The standard errors of the maximum likelihood estimates are computed by using the supplemented EM algorithm. Analysis of the primary biliary cirrhosis data is presented as an application of the proposed model.
PubDate: 2017-02-21
DOI: 10.1007/s10182-017-0294-3

• Species occupancy estimation and imperfect detection: shall surveys
continue after the first detection?
• Authors: Gurutzeta Guillera-Arroita; José J. Lahoz-Monfort
Abstract: Species occupancy, the proportion of sites occupied by a species, is a state variable of interest in ecology. One challenge in its estimation is that detection is often imperfect in wildlife surveys. As a consequence, occupancy models that explicitly describe the observation process are becoming widely used in the discipline. These models require data that are informative about species detectability. Such information is often obtained by conducting repeat surveys to sampling sites. One strategy is to survey each site a predefined number of times, regardless of whether the species is detected. Alternatively, one can stop surveying a site once the species is detected and reallocate the effort saved to surveying new sites. In this paper we evaluate the merits of these two general design strategies under a range of realistic conditions. We conclude that continuing surveys after detection is beneficial unless the cumulative probability of detection at occupied sites is close to one, and that the benefits are greater when the sample size is small. Since detectability and sample size tend to be small in ecological applications, our recommendation is to follow a strategy where at least some of the sites continue to be sampled after first detection.
PubDate: 2017-02-17
DOI: 10.1007/s10182-017-0292-5
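Two quantities behind the design comparison in the abstract above can be sketched under simple assumptions: a constant per-survey detection probability `p` at occupied sites, independent surveys, and at most `k` surveys per site. The cumulative detection probability is 1 − (1 − p)^k, and under a stop-at-first-detection design the expected number of surveys at an occupied site is (1 − (1 − p)^k) / p. The function names and the occupancy parameter `psi` are notation chosen for this example, not taken from the article.

```python
def cumulative_detection(p, k):
    """Probability of at least one detection in k surveys of an occupied site."""
    if not 0.0 <= p <= 1.0 or k < 0:
        raise ValueError("need 0 <= p <= 1 and k >= 0")
    return 1.0 - (1.0 - p) ** k

def expected_surveys_stop_at_detection(psi, p, k):
    """Expected surveys per site when surveying stops at first detection.

    Occupied sites (probability psi) need sum_{j=1..k} (1-p)^(j-1) surveys
    on average; unoccupied sites always receive all k surveys.
    """
    if p <= 0.0:
        return float(k)  # never detected, so every site is surveyed k times
    occupied = cumulative_detection(p, k) / p
    return psi * occupied + (1.0 - psi) * k
```

The difference between `k` and this expected effort is the saving that the stopping design can reallocate to new sites, which is the trade-off the article evaluates.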

• Empirical phi-divergence test statistics for the difference of means of
two populations
• Authors: N. Balakrishnan; N. Martín; L. Pardo
Abstract: Empirical phi-divergence test statistics have been shown to be a useful technique for the simple null hypothesis to improve the finite sample behavior of the classical likelihood ratio test statistic, as well as for model misspecification problems, in both cases for the one population problem. This paper introduces this methodology for two-sample problems. A simulation study illustrates situations in which the new test statistics become a competitive tool with respect to the classical z test and the likelihood ratio test statistic.
PubDate: 2017-02-13
DOI: 10.1007/s10182-017-0289-0

• Box–Cox symmetric distributions and applications to nutritional data
• Authors: Silvia L. P. Ferrari; Giovana Fumes
Abstract: We introduce and study the Box–Cox symmetric class of distributions, which is useful for modeling positively skewed, possibly heavy-tailed, data. The new class of distributions includes the Box–Cox t, Box–Cox Cole-Green (or Box–Cox normal), Box–Cox power exponential distributions, and the class of the log-symmetric distributions as special cases. It provides easy parameter interpretation, which makes it convenient for regression modeling purposes. Additionally, it provides enough flexibility to handle outliers. The usefulness of the Box–Cox symmetric models is illustrated in a series of applications to nutritional data.
PubDate: 2017-02-13
DOI: 10.1007/s10182-017-0291-6

• Minimum volume confidence sets for parameters of normal distributions
• Authors: Jin Zhang
Abstract: Under a proper restriction, we establish the minimum volume confidence set (interval and region) for the parameters of any normal distribution. Compared with classical methods, the proposed confidence region is proved to be the best, with minimum area, whatever the confidence level, sample size and sample data.
PubDate: 2017-02-08
DOI: 10.1007/s10182-017-0290-7

• Fourier methods for analyzing piecewise constant volatilities
• Authors: Max Wornowizki; Roland Fried; Simos G. Meintanis
Abstract: We develop procedures for testing whether a sequence of independent random variables has constant variance. If this is fulfilled, the modulus of a Fourier-type transformation of the volatility process is identically equal to one. Our approach takes advantage of this property considering a canonical estimator for the modulus under the assumption of piecewise identically distributed zero mean observations. Using blockwise variance estimation, we introduce several test statistics resulting from different weight functions. All of them are given by simple explicit formulae. We prove the consistency of the corresponding tests and compare them to alternative procedures on extensive Monte Carlo experiments. According to the results, our proposals offer fairly high power, particularly in the case of multiple structural breaks. They also allow for an adequate estimation of the change point positions. We apply our procedure to gold mining data and also briefly discuss how it can be modified to test for the stationarity of other distributional parameters.
PubDate: 2017-02-03
DOI: 10.1007/s10182-017-0288-1
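The blockwise variance estimation step mentioned in the abstract above can be sketched as follows, assuming zero-mean observations so that each block variance is the mean of squared values. The Fourier-type test statistics and weight functions of the article are not reproduced here; the function name and the choice to drop an incomplete trailing block are assumptions of this example.

```python
def blockwise_variances(x, block_size):
    """Variance estimate per non-overlapping block, assuming zero mean.

    An incomplete trailing block is dropped so that every estimate is
    based on the same number of observations.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    variances = []
    for start in range(0, len(x) - block_size + 1, block_size):
        block = x[start:start + block_size]
        variances.append(sum(v * v for v in block) / block_size)
    return variances

# Hypothetical sequence with one volatility change after ten observations.
series = [1, -1] * 5 + [3, -3] * 5
estimates = blockwise_variances(series, 10)
```

Under constant variance the block estimates should fluctuate around a common value, whereas a structural break shows up as a shift between neighbouring blocks.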

• A defence of subjective fiducial inference
• Authors: Russell J. Bowater
Abstract: This paper defends the fiducial argument. In particular, an interpretation of the fiducial argument is defended in which fiducial probability is treated as being subjective and the role taken by pivots in a more standard interpretation is taken by what are called primary random variables, which in fact form a special class of pivots. The resulting methodology, which is referred to as subjective fiducial inference, is outlined in the first part of the paper. This is followed by a defence of this methodology arranged in a series of criticisms and responses. These criticisms reflect objections that are often raised against standard fiducial inference and incorporate more specific concerns that are likely to exist with respect to subjective fiducial inference. It is hoped that the responses to these criticisms clarify the contribution that a system of fiducial reasoning can make to statistical inference.
PubDate: 2017-01-24
DOI: 10.1007/s10182-016-0285-9

• From distance sampling to spatial capture–recapture
• Authors: David L. Borchers; Tiago A. Marques
Abstract: Distance sampling and capture–recapture are the two most widely used wildlife abundance estimation methods. Capture–recapture methods have only recently incorporated models for spatial distribution, and there is an increasing tendency for distance sampling methods to incorporate spatial models rather than to rely on partly design-based spatial inference. In this overview we show how spatial models are central to modern distance sampling and that spatial capture–recapture models arise as an extension of distance sampling methods. Depending on the type of data recorded, they can be viewed as particular kinds of hierarchical binary regression, Poisson regression, survival or time-to-event models, with individuals’ locations as latent variables and a spatial model as the latent variable distribution. Incorporation of spatial models in these two methods provides new opportunities for drawing explicitly spatial inferences. Areas of likely future development include more sophisticated spatial and spatio-temporal modelling of individuals’ locations and movements, new methods for integrating spatial capture–recapture and other kinds of ecological survey data, and methods for dealing with the recapture uncertainty that often arises when “capture” consists of detection by a remote device like a camera trap or microphone.
PubDate: 2017-01-10
DOI: 10.1007/s10182-016-0287-7

• Quantile regression in heteroscedastic varying coefficient models
• Authors: Y. Andriyana; I. Gijbels
Abstract: Varying coefficient models are flexible models to describe the dynamic structure in longitudinal data. Quantile regression, more than mean regression, gives partial information on the conditional distribution of the response given the covariates. In the literature, the focus has been so far mostly on homoscedastic quantile regression models, whereas there is an interest in looking into heteroscedastic modelling. This paper contributes to the area by modelling the heteroscedastic structure and estimating it from the data, together with estimating the quantile functions. The use of the proposed methods is illustrated on real-data applications. The finite-sample behaviour of the methods is investigated via a simulation study, which includes a comparison with an existing method.
PubDate: 2016-11-25
DOI: 10.1007/s10182-016-0284-x

