Authors: Armin Hatefi; Mohammad Jafari Jozani Pages: 117–149 Abstract: Partially rank-ordered set (PROS) sampling is a generalization of ranked set sampling in which rankers are not required to fully rank the sampling units in each set, and hence have more flexibility in performing the necessary judgemental ranking process. PROS sampling has a wide range of applications in fields ranging from environmental and ecological studies to medical research, and it has been shown to be superior to ranked set sampling and simple random sampling for estimating the population mean. We study the Fisher information content and uncertainty structure of PROS samples and compare them with those of simple random sample (SRS) and ranked set sample (RSS) counterparts of the same size from the underlying population. We study the uncertainty structure in terms of the Shannon entropy, Rényi entropy and Kullback–Leibler (KL) discrimination measures. PubDate: 2017-04-01 DOI: 10.1007/s10182-016-0277-9 Issue No: Vol. 101, No. 2 (2017)
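As a minimal illustration of why rank-based designs carry more information than SRS, the following sketch compares the Monte Carlo variance of the ranked set sample mean with that of an SRS mean of the same size, assuming perfect rankings and a standard normal population (PROS itself relaxes the full-ranking requirement; this sketch uses ordinary RSS for simplicity, and all settings are hypothetical):

```python
import random
import statistics

def srs_mean(k, rng):
    """Mean of a simple random sample of size k from N(0, 1)."""
    return statistics.fmean(rng.gauss(0, 1) for _ in range(k))

def rss_mean(k, rng):
    """Mean of a ranked set sample of set size k under perfect rankings:
    for each rank r = 1..k, draw a set of k units and keep the r-th smallest."""
    picks = []
    for r in range(1, k + 1):
        s = sorted(rng.gauss(0, 1) for _ in range(k))
        picks.append(s[r - 1])
    return statistics.fmean(picks)

rng = random.Random(1)
k, reps = 4, 4000
var_srs = statistics.variance([srs_mean(k, rng) for _ in range(reps)])
var_rss = statistics.variance([rss_mean(k, rng) for _ in range(reps)])
print(var_srs, var_rss)  # the RSS mean is markedly less variable
```

For a normal population with set size 4, the RSS mean is roughly twice as efficient as the SRS mean, which is the kind of gain the information-theoretic comparisons in the paper quantify.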

Authors: Taras Bodnar; Taras Zabolotskyy Pages: 1–28 Abstract: In this paper, we investigate the properties of the optimal portfolio in the sense of maximizing the Sharpe ratio (SR) and develop a procedure for calculating the risk of this portfolio. This is achieved by constructing an optimal portfolio that minimizes the Value-at-Risk (VaR) and at the same time coincides with the tangent (market) portfolio on the efficient frontier, which is related to the SR portfolio. The resulting significance level of the minimum-VaR portfolio is then used to determine the risk of both the market portfolio and the corresponding SR portfolio. However, the expression for this significance level depends on unknown parameters that have to be estimated in practice. This leads to an estimator of the significance level whose distributional properties are investigated in detail. Based on these results, a confidence interval for the suggested risk measure of the SR portfolio is constructed and applied to real data. Both theoretical and empirical findings document that the SR portfolio is very risky, since the corresponding significance level is smaller than 90% in most of the considered cases. PubDate: 2017-01-01 DOI: 10.1007/s10182-016-0270-3 Issue No: Vol. 101, No. 1 (2017)
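For context, the tangency (maximum Sharpe ratio) portfolio that the paper builds on has weights proportional to the inverse covariance matrix applied to the excess mean returns. A minimal two-asset sketch with entirely hypothetical inputs (mu, rf and sigma are made up for illustration; the paper's contribution concerns the risk of this portfolio, not its computation):

```python
import math

mu = [0.10, 0.05]          # expected returns (hypothetical)
rf = 0.02                  # risk-free rate (hypothetical)
sigma = [[0.04, 0.01],
         [0.01, 0.02]]     # return covariance matrix (hypothetical)

# Tangency weights: w proportional to sigma^{-1} (mu - rf * 1).
excess = [m - rf for m in mu]
det = sigma[0][0] * sigma[1][1] - sigma[0][1] * sigma[1][0]
inv = [[ sigma[1][1] / det, -sigma[0][1] / det],
       [-sigma[1][0] / det,  sigma[0][0] / det]]
z = [inv[0][0] * excess[0] + inv[0][1] * excess[1],
     inv[1][0] * excess[0] + inv[1][1] * excess[1]]
w = [zi / sum(z) for zi in z]                       # normalized weights
# Maximum attainable Sharpe ratio: sqrt(excess' sigma^{-1} excess).
sharpe = math.sqrt(excess[0] * z[0] + excess[1] * z[1])
print(w, sharpe)
```

With these inputs the portfolio tilts heavily toward the first asset (w ≈ [0.76, 0.24]) and attains a Sharpe ratio of about 0.41.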

Authors: Takuma Yoshida Pages: 29–50 Abstract: This paper considers nonlinear regression analysis with a scalar response and multiple predictors. An unknown regression function is approximated by radial basis function models, and the coefficients are estimated in the context of M-estimation. It is known that ordinary M-estimation leads to overfitting in nonlinear regression. The purpose of this paper is to construct a smooth estimator via a two-step procedure. First, sufficient dimension reduction methods are applied to the response and the radial basis functions, transforming the large number of radial bases into a small number of linear combinations of the radial bases without loss of information. In the second step, a multiple linear regression model between the response and the transformed radial bases is assumed and ordinary M-estimation is applied. Thus, the final estimator is also obtained as a linear combination of radial bases. The validity of the proposed method is established and its asymptotic properties are studied. A simulation study and a data example confirm the behavior of the proposed method. PubDate: 2017-01-01 DOI: 10.1007/s10182-016-0271-2 Issue No: Vol. 101, No. 1 (2017)

Authors: Abhik Ghosh; Magne Thoresen Abstract: Mixed-effect models are very popular for analyzing data with a hierarchical structure. In medical applications, typical examples include repeated observations within subjects in a longitudinal design and patients nested within centers in a multicenter design. Recently, however, due to medical advances, the number of fixed-effect covariates collected from each patient can be quite large (e.g., data on gene expressions), and not all of these variables are necessarily important for the outcome. It is therefore very important to choose the relevant covariates correctly in order to obtain optimal inference for the overall study. The relevant random effects, on the other hand, will often be low-dimensional and pre-specified. In this paper, we consider regularized selection of important fixed-effect variables in linear mixed-effect models, along with maximum penalized likelihood estimation of both fixed- and random-effect parameters based on general non-concave penalties. Asymptotic and variable selection consistency with oracle properties are proved for low-dimensional cases as well as for high dimensionality of non-polynomial order of the sample size (i.e., the number of parameters is much larger than the sample size). We also provide a computationally efficient algorithm for implementation. Additionally, all the theoretical results are proved for a general non-convex optimization problem that applies to several important situations well beyond the mixed-model setup (such as finite mixtures of regressions), illustrating the wide applicability of our proposal. PubDate: 2017-05-15 DOI: 10.1007/s10182-017-0298-z
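A standard example of the non-concave penalties referred to above is the SCAD penalty of Fan and Li (used here purely as an illustration; the paper covers a general class, not necessarily this exact choice). It behaves like the lasso near zero but flattens out for large coefficients, so large effects are not over-shrunk:

```python
def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty (Fan and Li): linear near zero, quadratic in a
    transition region, and constant for large |theta|.  a = 3.7 is the
    conventional default."""
    t = abs(theta)
    if t <= lam:
        return lam * t                                   # lasso-like part
    if t <= a * lam:
        return (2 * a * lam * t - t ** 2 - lam ** 2) / (2 * (a - 1))
    return lam ** 2 * (a + 1) / 2                        # constant: no extra bias

print(scad_penalty(0.5, 1.0))  # small coefficient: penalized like the lasso
print(scad_penalty(5.0, 1.0))  # large coefficient: constant penalty
```

The constant tail is what produces the oracle property: sufficiently large coefficients incur no additional shrinkage, unlike under the lasso.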

Authors: Georg Hahn Abstract: Statistical discoveries are often obtained through multiple hypothesis testing. A variety of procedures exists to evaluate multiple hypotheses, for instance those of Benjamini–Hochberg, Bonferroni, Holm or Sidak. We are particularly interested in multiple testing procedures with two desired properties: (solely) monotonic and well-behaved procedures. This article investigates to what extent the classes of (monotonic or well-behaved) multiple testing procedures, in particular the subclasses of so-called step-up and step-down procedures, are closed under basic set operations, specifically the union, intersection, difference and complement of sets of rejected or non-rejected hypotheses. The article proves two main results: first, taking the union or intersection of arbitrary (monotonic or well-behaved) multiple testing procedures results in new procedures which are monotonic but not well-behaved, whereas the complement or difference generally preserves neither property. Second, the two classes of (solely monotonic or well-behaved) step-up and step-down procedures are closed under taking the union or intersection, but not the complement or difference. PubDate: 2017-05-05 DOI: 10.1007/s10182-017-0297-0
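To make the set operations concrete, the following sketch implements the Benjamini–Hochberg step-up and Holm step-down procedures on hypothetical p-values and forms the union of their rejection sets, which is the kind of combined procedure whose properties the article studies:

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Step-up: find the largest rank i with p_(i) <= i/m * alpha and
    reject the hypotheses with the i smallest p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            cutoff = rank
    return {order[j] for j in range(cutoff)}

def holm(pvals, alpha=0.05):
    """Step-down: reject in increasing order of p-value while
    p_(i) <= alpha / (m - i + 1), then stop at the first failure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    rejected = set()
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha / (m - rank + 1):
            rejected.add(i)
        else:
            break
    return rejected

p = [0.005, 0.011, 0.02, 0.04, 0.13]   # hypothetical p-values
bh, hm = benjamini_hochberg(p), holm(p)
union = bh | hm   # the union itself defines a new rejection rule
print(bh, hm, union)
```

Here the step-up procedure rejects four hypotheses and the step-down procedure only two, so the union coincides with the step-up rejection set; in general the union need not coincide with either input procedure.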

Authors: Peter Pütz; Thomas Kneib Abstract: Estimating nonlinear effects of continuous covariates by penalized splines is well established for regressions with cross-sectional data as well as for panel data regressions with random effects. Penalized splines are particularly advantageous since they enable both the estimation of unknown nonlinear covariate effects and inferential statements about these effects. The latter are based, for example, on simultaneous confidence bands that provide a simultaneous uncertainty assessment for the entire estimated function. In this paper, we consider fixed effects panel data models instead of random effects specifications and develop a first-difference approach for the inclusion of penalized splines in this case. We take the resulting dependence structure into account and adapt the construction of simultaneous confidence bands accordingly. In addition, the penalized spline estimates as well as the confidence bands are also made available for derivatives of the estimated effects, which are of considerable interest in many application areas. As an empirical illustration, we analyze the dynamics of life satisfaction over the life span based on data from the German Socio-Economic Panel. An open-source software implementation of our methods is available in the R package pamfe. PubDate: 2017-04-12 DOI: 10.1007/s10182-017-0296-1

Authors: Daniela F. Dianda; Marta B. Quaglino; José A. Pagura Abstract: Current industrial processes are too sophisticated to be described by only one quality variable; instead, many process variables need to be analyzed together to assess the process performance. In particular, multivariate process capability indices (MPCIs) have been the focus of study during the last few decades, during which many authors have proposed alternative ways to build such indices. These measures are extremely attractive to people in charge of industrial processes, because they provide a single measure that summarizes the whole process performance with respect to its specifications. In most practical applications, these indices are estimated from sampling information collected by measuring the variables of interest on the process outcome. This activity introduces an additional source of variation into the data, whose effect on the properties of the indices needs to be considered. Unfortunately, this problem has received scarce attention, at least in the multivariate domain. In this paper, we study how the presence of measurement errors affects the properties of one of the MPCIs recommended in previous research. The results indicate that even small measurement errors can induce distortions in the index value, leading to wrong conclusions about the process performance. PubDate: 2017-04-07 DOI: 10.1007/s10182-017-0295-2
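The mechanism behind the distortion can be seen already in the univariate case (the paper treats the multivariate version): gauge error adds variance to the measurements, so the observed capability index understates the true one. A sketch with hypothetical specification limits and standard deviations:

```python
import math

def cp(usl, lsl, sigma):
    """Univariate process capability index C_p = (USL - LSL) / (6 sigma)."""
    return (usl - lsl) / (6 * sigma)

usl, lsl = 10.0, 4.0     # specification limits (hypothetical)
sigma_process = 0.8      # true process standard deviation (hypothetical)
sigma_gauge = 0.4        # measurement-error standard deviation (hypothetical)

true_cp = cp(usl, lsl, sigma_process)
# Measured values have variance sigma_process^2 + sigma_gauge^2.
observed_cp = cp(usl, lsl, math.hypot(sigma_process, sigma_gauge))
print(true_cp, observed_cp)  # the observed index understates capability
</n = observed_cp```

Here a gauge error half the size of the process variation drags the index from 1.25 down to about 1.12, enough to change the verdict on a process judged against a 1.25 benchmark.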

Authors: Hans Wolfgang Brachinger; Michael Beer; Olivier Schöni Abstract: Hedonic methods are considered state of the art for handling quality changes when compiling consumer price indices. The present article first proposes a mathematical description of characteristics and of elementary aggregates. In a second step, a hedonic econometric model is formulated and hedonic elementary population indices are defined. We emphasise that population indices are unobservable economic parameters that need to be estimated by suitable sample indices. It is shown that, within the framework developed here, many of the hedonic index formulae used in practice can be identified as sample versions of particular hedonic elementary population indices. The article closes with an empirical part on quarterly housing data, in which the considered hedonic indices are estimated along with their bootstrapped confidence intervals. The computed confidence intervals, together with the results from theory, suggest a particular answer to the price index problem. PubDate: 2017-02-22 DOI: 10.1007/s10182-017-0293-4

Authors: Leila Amiri; Mojtaba Khazaei; Mojtaba Ganjali Abstract: Latent variable models are widely used for the joint modeling of mixed data, including nominal, ordinal, count and continuous data. In this paper, we consider a latent variable model for jointly modeling the relationships between mixed binary, count and continuous variables with some observed covariates. We assume that, given a latent variable, the mixed variables of interest are independent, and that the count and continuous variables have Poisson and normal distributions, respectively. As such data may be extracted from different subpopulations, unobserved heterogeneity has to be taken into account; a mixture distribution for the latent variable is considered to account for this heterogeneity. A generalized EM algorithm, which uses the Newton–Raphson algorithm inside the EM algorithm, is used to compute the maximum likelihood estimates of the parameters, and their standard errors are computed using the supplemented EM algorithm. An analysis of the primary biliary cirrhosis data is presented as an application of the proposed model. PubDate: 2017-02-21 DOI: 10.1007/s10182-017-0294-3

Authors: Gurutzeta Guillera-Arroita; José J. Lahoz-Monfort Abstract: Species occupancy, the proportion of sites occupied by a species, is a state variable of interest in ecology. One challenge in its estimation is that detection is often imperfect in wildlife surveys. As a consequence, occupancy models that explicitly describe the observation process are becoming widely used in the discipline. These models require data that are informative about species detectability. Such information is often obtained by conducting repeat surveys at sampling sites. One strategy is to survey each site a predefined number of times, regardless of whether the species is detected. Alternatively, one can stop surveying a site once the species is detected and reallocate the effort saved to surveying new sites. In this paper we evaluate the merits of these two general design strategies under a range of realistic conditions. We conclude that continuing surveys after detection is beneficial unless the cumulative probability of detection at occupied sites is close to one, and that the benefits are greater when the sample size is small. Since detectability and sample size tend to be small in ecological applications, our recommendation is to follow a strategy where at least some of the sites continue to be sampled after first detection. PubDate: 2017-02-17 DOI: 10.1007/s10182-017-0292-5
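The trade-off between the two designs can be quantified with two elementary formulas: the cumulative detection probability over K visits, and the expected number of visits to an occupied site when surveying stops at the first detection. A sketch with hypothetical values (p and k are made up; the paper's evaluation covers a much wider range of conditions):

```python
def cumulative_detection(p, k):
    """Probability of at least one detection in k visits to an occupied
    site, with per-visit detection probability p."""
    return 1 - (1 - p) ** k

def expected_visits_removal(p, k):
    """Expected number of visits to an occupied site under the
    stop-at-first-detection design, capped at k visits."""
    e = sum(j * p * (1 - p) ** (j - 1) for j in range(1, k + 1))
    return e + k * (1 - p) ** k

p, k = 0.3, 3
print(cumulative_detection(p, k))     # ~0.657: detection is far from certain
print(expected_visits_removal(p, k))  # ~2.19 visits instead of always 3
```

With low per-visit detectability, the fixed design spends 3 visits per occupied site while the removal design spends about 2.19 on average, which is the effort that can be reallocated to new sites.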

Authors: N. Balakrishnan; N. Martín; L. Pardo Abstract: Empirical phi-divergence test statistics have been demonstrated to be a useful technique, in the case of a simple null hypothesis, for improving the finite-sample behavior of the classical likelihood ratio test statistic, as well as for model misspecification problems, in both cases for the one-population problem. This paper introduces this methodology for two-sample problems. A simulation study illustrates situations in which the new test statistics become a competitive tool with respect to the classical z-test and the likelihood ratio test statistic. PubDate: 2017-02-13 DOI: 10.1007/s10182-017-0289-0

Authors: Silvia L. P. Ferrari; Giovana Fumes Abstract: We introduce and study the Box–Cox symmetric class of distributions, which is useful for modeling positively skewed, possibly heavy-tailed data. The new class includes the Box–Cox t, Box–Cox Cole–Green (or Box–Cox normal) and Box–Cox power exponential distributions, as well as the class of log-symmetric distributions, as special cases. It provides easy parameter interpretation, which makes it convenient for regression modeling purposes, and enough flexibility to handle outliers. The usefulness of the Box–Cox symmetric models is illustrated in a series of applications to nutritional data. PubDate: 2017-02-13 DOI: 10.1007/s10182-017-0291-6
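At the core of this class is the Box–Cox power transformation, which maps a positive response to a scale on which a symmetric distribution can be assumed. A minimal sketch of the transformation alone (not the full distributional model):

```python
import math

def box_cox(y, lam):
    """Box-Cox power transformation (y^lam - 1) / lam for positive y;
    the lam -> 0 limit is log(y)."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam

print(box_cox(2.0, 0))    # log(2), approximately 0.6931
print(box_cox(2.0, 1))    # y - 1 = 1.0 (identity up to a shift)
print(box_cox(2.0, 0.5))  # 2 * (sqrt(2) - 1), approximately 0.8284
```

The continuity of the family in the power parameter is what lets the class interpolate smoothly between log-symmetric models (lam = 0) and untransformed symmetric models (lam = 1).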

Authors: Jin Zhang Abstract: Under a proper restriction, we establish the minimum-volume confidence set (interval and region) for the parameters of any normal distribution. Compared with classical methods, the proposed confidence region is proved to be the best, having minimum area for any confidence level, sample size and sample data. PubDate: 2017-02-08 DOI: 10.1007/s10182-017-0290-7

Authors: Max Wornowizki; Roland Fried; Simos G. Meintanis Abstract: We develop procedures for testing whether a sequence of independent random variables has constant variance. If this is the case, the modulus of a Fourier-type transformation of the volatility process is identically equal to one. Our approach takes advantage of this property by considering a canonical estimator of the modulus under the assumption of piecewise identically distributed zero-mean observations. Using blockwise variance estimation, we introduce several test statistics resulting from different weight functions, all of which are given by simple explicit formulae. We prove the consistency of the corresponding tests and compare them to alternative procedures in extensive Monte Carlo experiments. According to the results, our proposals offer fairly high power, particularly in the case of multiple structural breaks, and also allow for an adequate estimation of the change point positions. We apply our procedure to gold mining data and briefly discuss how the procedure can be modified to test for the stationarity of other distributional parameters. PubDate: 2016-02-03 DOI: 10.1007/s10182-017-0288-1
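The blockwise variance estimation mentioned above is the basic ingredient of such procedures. The following sketch illustrates only that ingredient on simulated data with a variance break (the paper's actual statistics are built from Fourier-type transforms and weight functions; the crude max/min ratio below is just a hypothetical stand-in to show why block variances are informative):

```python
import random
import statistics

def blockwise_variances(x, n_blocks):
    """Split the series into equal-length blocks and estimate the
    variance within each block."""
    b = len(x) // n_blocks
    return [statistics.variance(x[i * b:(i + 1) * b]) for i in range(n_blocks)]

rng = random.Random(7)
# Simulated series with a variance break halfway: sd 1.0, then sd 3.0.
x = [rng.gauss(0, 1) for _ in range(200)] + [rng.gauss(0, 3) for _ in range(200)]
v = blockwise_variances(x, 8)
ratio = max(v) / min(v)   # large values flag non-constant variance
print(v, ratio)
```

Under constant variance the block estimates fluctuate around a common value; here the ninefold variance jump makes the later blocks stand out clearly, which is what a formal test statistic aggregates into a decision.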

Authors: Russell J. Bowater Abstract: This paper defends the fiducial argument. In particular, it defends an interpretation of the fiducial argument in which fiducial probability is treated as being subjective, and in which the role played by pivots in the more standard interpretation is taken by what are called primary random variables, which in fact form a special class of pivots. The resulting methodology, referred to as subjective fiducial inference, is outlined in the first part of the paper. This is followed by a defence of the methodology arranged as a series of criticisms and responses. These criticisms reflect objections that are often raised against standard fiducial inference and incorporate more specific concerns that are likely to exist with respect to subjective fiducial inference. It is hoped that the responses to these criticisms clarify the contribution that a system of fiducial reasoning can make to statistical inference. PubDate: 2017-01-24 DOI: 10.1007/s10182-016-0285-9

Authors: David L. Borchers; Tiago A. Marques Abstract: Distance sampling and capture–recapture are the two most widely used wildlife abundance estimation methods. Capture–recapture methods have only recently incorporated models for spatial distribution, and there is an increasing tendency for distance sampling methods to incorporate spatial models rather than to rely on partly design-based spatial inference. In this overview we show how spatial models are central to modern distance sampling and that spatial capture–recapture models arise as an extension of distance sampling methods. Depending on the type of data recorded, they can be viewed as particular kinds of hierarchical binary regression, Poisson regression, survival or time-to-event models, with individuals' locations as latent variables and a spatial model as the latent variable distribution. Incorporating spatial models in these two methods provides new opportunities for drawing explicitly spatial inferences. Areas of likely future development include more sophisticated spatial and spatio-temporal modelling of individuals' locations and movements, new methods for integrating spatial capture–recapture and other kinds of ecological survey data, and methods for dealing with the recapture uncertainty that often arises when "capture" consists of detection by a remote device such as a camera trap or microphone. PubDate: 2017-01-10 DOI: 10.1007/s10182-016-0287-7

Authors: Y. Andriyana; I. Gijbels Abstract: Varying coefficient models are flexible models for describing the dynamic structure in longitudinal data. Quantile regression, more than mean regression, gives partial information on the conditional distribution of the response given the covariates. In the literature, the focus has so far been mostly on homoscedastic quantile regression models, whereas there is interest in heteroscedastic modelling. This paper contributes to the area by modelling the heteroscedastic structure and estimating it from the data, together with the quantile functions. The use of the proposed methods is illustrated in real-data applications. The finite-sample behaviour of the methods is investigated via a simulation study, which includes a comparison with an existing method. PubDate: 2016-11-25 DOI: 10.1007/s10182-016-0284-x
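Quantile regression estimators of the kind discussed here minimize the so-called check (or pinball) loss rather than squared error. A minimal sketch of that loss function, whose asymmetry is what targets a specific conditional quantile:

```python
def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0}); minimizing
    its expectation over a constant yields the tau-th quantile."""
    return u * (tau - (1 if u < 0 else 0))

# Asymmetry for tau = 0.9: positive residuals (under-prediction of the
# response) are penalized nine times more heavily than negative ones.
print(check_loss( 1.0, 0.9))  # 0.9
print(check_loss(-1.0, 0.9))  # 0.1
```

Setting tau = 0.5 recovers (half) the absolute loss and hence median regression; a grid of tau values traces out the conditional distribution, which is the "partial information" the abstract refers to.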

Authors: Shuanghua Luo; Changlin Mei; Cheng-yi Zhang Abstract: This paper studies smoothed quantile linear regression models with response data missing at random. Three smoothed quantile empirical likelihood ratios are first proposed and shown to be asymptotically Chi-squared. Confidence intervals for the regression coefficients are then constructed without estimating the asymptotic covariance. Furthermore, a class of estimators for the regression parameter is presented and its asymptotic distribution is derived. Simulation studies are conducted to assess the finite-sample performance. Finally, a real-world data set is analyzed to illustrate the effectiveness of the proposed methods. PubDate: 2016-08-09 DOI: 10.1007/s10182-016-0278-8

Authors: Robert Garthoff; Philipp Otto Abstract: This paper deals with the spatial detection of changes in the model parameters of spatial autoregressive processes. The respective sequential testing problems are formulated, and we introduce characteristic quantities to monitor the means or covariances of multivariate spatial autoregressive processes. We also take into account the simultaneous surveillance of the mean vector and the covariance matrix. The aim is to apply control charts, important tools of sequential analysis, to these quantities. The considered control procedures are based on either cumulative sums or exponential smoothing. Further, we illustrate the methodology of statistical process control by studying the spectrum of additive colors in a satellite photograph. Via simulation studies, the proposed control procedures are calibrated for a predefined average run length, and their performance is compared in the out-of-control situation. Finally, the control charts are applied, the signals of the different schemes are visualized, and the results are critically discussed. PubDate: 2016-07-26 DOI: 10.1007/s10182-016-0276-x
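The exponential-smoothing control procedures mentioned above are built on the EWMA recursion. A minimal sketch with hypothetical observations and an illustrative control limit (a real chart would derive the limit from the in-control variance and a target average run length, as calibrated in the paper):

```python
def ewma(series, lam=0.2, start=0.0):
    """Exponentially weighted moving average recursion
    z_t = lam * x_t + (1 - lam) * z_{t-1}, the core of EWMA charts."""
    z = [start]
    for x in series:
        z.append(lam * x + (1 - lam) * z[-1])
    return z[1:]

obs = [0.1, -0.2, 0.0, 1.5, 1.8, 2.1]   # hypothetical monitored statistic
smoothed = ewma(obs)
limit = 0.8                              # illustrative control limit
signals = [t for t, z in enumerate(smoothed) if abs(z) > limit]
print(smoothed, signals)
```

The smoothing accumulates evidence across observations, so the chart signals at the last time point even though no single raw observation is checked against the limit; this memory is what gives EWMA schemes their sensitivity to small sustained shifts.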

Authors: Haresh D. Rochani; Robert L. Vogel; Hani M. Samawi; Daniel F. Linder Abstract: Missing observations often occur in cross-classified data collected during observational, clinical, and public health studies. Inappropriate treatment of missing data can reduce statistical power and give biased results. This work extends the Baker, Rosenberger and Dersimonian modeling approach to compute maximum likelihood estimates for cell counts in three-way tables with missing data, and studies the association between two dichotomous variables while controlling for a third variable in \( 2\times 2 \times K \) tables. The approach is applied to data from the Behavioral Risk Factor Surveillance System, and simulation studies are used to investigate the efficiency of estimation of the common odds ratio. PubDate: 2016-07-18 DOI: 10.1007/s10182-016-0275-y
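For readers unfamiliar with the target quantity, the common odds ratio across the K strata of a \( 2\times 2 \times K \) table is classically estimated, for complete data, by the Mantel–Haenszel estimator (shown here as a simple illustration; the paper itself uses a maximum likelihood approach that also handles the missing cells):

```python
def mantel_haenszel_or(tables):
    """Mantel-Haenszel common odds ratio across K 2x2 tables,
    each table given as ((a, b), (c, d))."""
    num = den = 0.0
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den

# Two hypothetical strata constructed to share an odds ratio of 2.
tables = [((10, 5), (8, 8)), ((20, 10), (12, 12))]
print(mantel_haenszel_or(tables))
```

Pooling within-stratum contributions in this way controls for the third variable, exactly the role the common odds ratio plays in the analyses the abstract describes.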