Abstract: In this paper, the class of limit distribution functions (df’s) of the joint upper record values with random sample size is fully characterized. Necessary and sufficient conditions, as well as the domains of attraction of the limit df’s, are obtained. As an application of this result, sufficient conditions for the weak convergence of the random record quasi-ranges, record quasi-midranges, record extremal quasi-quotients and record extremal quasi-products are obtained. Moreover, the classes of the non-degenerate limit df’s of these statistics are derived. PubDate: 2019-03-19 DOI: 10.1007/s13171-019-00167-2
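For readers less familiar with the terminology, upper record values are the successive running maxima of a sequence: an observation is a record if it strictly exceeds every earlier observation. A minimal sketch of the definition (not the paper's construction, which concerns limit laws under random sample size):

```python
def upper_records(seq):
    """Return the upper record values of a sequence:
    each entry that strictly exceeds all earlier entries."""
    records = []
    current_max = float("-inf")
    for x in seq:
        if x > current_max:
            records.append(x)
            current_max = x
    return records

print(upper_records([3, 1, 4, 1, 5, 9, 2, 6]))  # → [3, 4, 5, 9]
```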

Abstract: The empirical likelihood ratio statistics are constructed for the parameters in spatial autoregressive models with spatial autoregressive disturbances. It is shown that the limiting distributions of the empirical likelihood ratio statistics are chi-squared distributions, which are used to construct confidence regions for the parameters in the models. PubDate: 2019-03-15 DOI: 10.1007/s13171-019-00166-3

Abstract: We propose an alternative skew-normal random matrix, which is an extension of the multivariate skew-normal vector parameterized in Vernic (An. Stiint. Univ. Ovidius Constanta 13, 83–96, 2005; Insur. Math. Econ. 38, 413–426, 2006). We define the density function and then derive and apply the corresponding moment generating function to determine the mean matrix, covariance matrix, and third and fourth moments of the new skew-normal random matrix. Additionally, we derive eight marginal and two conditional density functions and provide necessary and sufficient conditions such that two pairs of sub-matrices are independent. Finally, we derive the moment generating function for a skew-normal random matrix-based quadratic form and show its relationship to the moment generating function of the noncentral Wishart and central Wishart random matrices. PubDate: 2019-02-20 DOI: 10.1007/s13171-019-00165-4

Abstract: Compositional data consists of vectors of proportions whose components sum to 1. Such vectors lie in the standard simplex, which is a manifold with boundary. One issue that has been rather controversial within the field of compositional data analysis is the choice of metric on the simplex. One popular possibility has been to use the metric implied by log-transforming the data, as proposed by Aitchison (Biometrika 70, 57–65, 1983, 1986) and another popular approach has been to use the standard Euclidean metric inherited from the ambient space. Tsagris et al. (2011) proposed a one-parameter family of power transformations, the α-transformations, which include both the metric implied by Aitchison’s transformation and the Euclidean metric as particular cases. Our underlying philosophy is that, with many datasets, it may make sense to use the data to help us determine a suitable metric. A related possibility is to apply the α-transformations to a parametric family of distributions, and then estimate α along with the other parameters. However, as we shall see, when one follows this last approach with the Dirichlet family, some care is needed in a certain limiting case which arises (α → 0), as we found out when fitting this model to real and simulated data. Specifically, when the maximum likelihood estimator of α is close to 0, the other parameters tend to be large. The main purpose of the paper is to study this limiting case both theoretically and numerically and to provide insight into these numerical findings. PubDate: 2019-02-12 DOI: 10.1007/s13171-018-00160-1
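One common form of the α-transformation (the exact parameterization varies across papers, so treat this as an illustrative sketch) first applies a componentwise power and re-closes to the simplex, u_j = x_j^α / Σ_i x_i^α, and then maps z = (D·u − 1)/α for a D-part composition. As α → 0 this tends to the centered log-ratio transform, recovering Aitchison's geometry, while α = 1 gives an affine map of the raw composition:

```python
import numpy as np

def alpha_transform(x, alpha):
    """Illustrative alpha-transformation of a composition x
    (components positive, summing to 1). For alpha -> 0 this
    approaches the centered log-ratio (clr) transform."""
    x = np.asarray(x, dtype=float)
    D = x.size
    if abs(alpha) < 1e-12:           # limiting case: clr
        logx = np.log(x)
        return logx - logx.mean()
    u = x**alpha / np.sum(x**alpha)  # power transform, re-closed
    return (D * u - 1.0) / alpha

x = np.array([0.2, 0.3, 0.5])
z_small = alpha_transform(x, 1e-6)   # nearly the alpha -> 0 limit
clr = np.log(x) - np.log(x).mean()
print(np.max(np.abs(z_small - clr)))  # tiny discrepancy
```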

Abstract: Randomized nomination sampling (RNS) is a rank-based sampling technique which has been shown to be effective in several nonparametric studies involving environmental, agricultural, medical and ecological applications. In this paper, we investigate parametric inference using RNS design for estimating an unknown vector of parameters θ in some parametric families of distributions. We examine both maximum likelihood (ML) and method of moments (MM) approaches. We introduce four types of RNS-based data as well as necessary EM algorithms for the ML estimation under each data type, and evaluate the performance of corresponding estimators in estimating θ compared with those based on simple random sampling (SRS). Our results can address many parametric inference problems in reliability theory, sport analytics, fisheries, etc. Theoretical results are augmented with numerical evaluations, where we also study inference based on imperfect ranking. We apply our methods to a real data problem in order to study the distribution of mercury contamination in fish using RNS designs. PubDate: 2019-02-08 DOI: 10.1007/s13171-018-00159-8
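For intuition, one version of randomized nomination sampling described in the literature draws, for each measured unit, a comparison set of random size and nominates the set's maximum with probability p and its minimum otherwise; only the nominated unit is measured. The sketch below assumes that form and is not tied to the paper's four specific data types:

```python
import random

def rns_sample(population, n, p=0.5, max_set_size=5, rng=None):
    """Draw n randomized-nomination-sampling observations:
    for each one, form a comparison set of random size and keep
    its maximum with probability p, its minimum otherwise.
    (Illustrative form; actual RNS designs vary.)"""
    rng = rng or random.Random(0)
    out = []
    for _ in range(n):
        m = rng.randint(1, max_set_size)            # random set size
        s = [rng.choice(population) for _ in range(m)]
        out.append(max(s) if rng.random() < p else min(s))
    return out

pop = list(range(100))
sample = rns_sample(pop, 200, p=0.8)
maxima_only = rns_sample(pop, 50, p=1.0)   # p = 1: always the set maximum
```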

Abstract: This study introduces a method of selecting, from k populations, a subset containing the best when the populations are ranked in terms of the population means. It is assumed that the populations have an unknown location family of distribution functions. The proposed method involves estimating the constant in Gupta’s subset selection procedure by bootstrap. It is shown that estimating this constant amounts to estimating the distribution function of a certain function of random variables. The proposed bootstrap method is shown to be consistent and second-order correct in the sense that the accuracy of the bootstrap approximation is better than that of the approximation based on the limiting distribution. Results of a simulation study are given. PubDate: 2019-01-24 DOI: 10.1007/s13171-019-00163-6
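To convey the flavor of such a procedure: Gupta's rule retains population i whenever its sample mean is within a constant d of the largest sample mean, and a bootstrap can be used to calibrate d. The sketch below uses a deliberately conservative calibration (the P* quantile of the range of recentred bootstrap means); it illustrates the idea only and is not the paper's second-order-correct construction:

```python
import numpy as np

def gupta_subset(samples, p_star=0.95, B=2000, seed=0):
    """Conservative bootstrap sketch of Gupta-style subset selection:
    retain population i if its sample mean is within d of the largest
    sample mean, where d is the p_star quantile of the bootstrap range
    of recentred means (illustrative calibration only)."""
    rng = np.random.default_rng(seed)
    means = np.array([np.mean(s) for s in samples])
    ranges = np.empty(B)
    for b in range(B):
        centred = [np.mean(rng.choice(s, size=len(s))) - m
                   for s, m in zip(samples, means)]
        ranges[b] = max(centred) - min(centred)
    d = np.quantile(ranges, p_star)
    return np.nonzero(means >= means.max() - d)[0], d

rng = np.random.default_rng(1)
samples = [rng.normal(mu, 1.0, size=30) for mu in (0.0, 0.1, 0.8)]
subset, d = gupta_subset(samples)
```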

Abstract: We study independent random variables (Zi)i∈I aggregated by integrating with respect to a nonatomic and finitely additive probability ν over the index set I. We analyze the behavior of the resulting random average \(\int_I Z_i\,d\nu(i)\). We establish that any ν that guarantees the measurability of \(\int_I Z_i\,d\nu(i)\) satisfies the following law of large numbers: for any collection (Zi)i∈I of uniformly bounded and independent random variables, almost surely the realized average \(\int_I Z_i\,d\nu(i)\) equals the average expectation \(\int_I E[Z_i]\,d\nu(i)\). PubDate: 2019-01-17 DOI: 10.1007/s13171-018-00162-z

Abstract: Consider a two-dimensional random vector (X, Y)^T. Let Q0, Q1,… denote orthogonal polynomials with respect to the marginal distribution of X and let P0, P1,… denote orthogonal polynomials with respect to the marginal distribution of Y. In this paper, identities of the form E[Pn(Y) | X] = anQn(X), for constants a0, a1,…, are considered and necessary and sufficient conditions for this type of identity to hold are given. Applications of the identity to the maximal correlation of two random variables and to the L2 completeness of a bivariate distribution are discussed. PubDate: 2019-01-02 DOI: 10.1007/s13171-018-00161-0
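A classical instance of such an identity (the Mehler/Lancaster expansion) is the standard bivariate normal: with probabilists' Hermite polynomials He_n on both margins, E[He_n(Y) | X] = ρ^n He_n(X), equivalently E[He_n(X)He_n(Y)] = n! ρ^n. A quick Monte Carlo check for n = 2, where He_2(x) = x² − 1:

```python
import numpy as np

# Monte Carlo check of E[He_2(X) He_2(Y)] = 2! * rho^2 for a
# standard bivariate normal with correlation rho.
rng = np.random.default_rng(0)
rho, N = 0.5, 200_000
x = rng.standard_normal(N)
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(N)
emp = np.mean((x**2 - 1) * (y**2 - 1))
print(emp, 2 * rho**2)  # empirical vs theoretical value 0.5
```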

Abstract: The rapid development of computing power and efficient Markov Chain Monte Carlo (MCMC) simulation algorithms have revolutionized Bayesian statistics, making it a highly practical inference method in applied work. However, MCMC algorithms tend to be computationally demanding, and are particularly slow for large datasets. Data subsampling has recently been suggested as a way to make MCMC methods scalable on massively large data, utilizing efficient sampling schemes and estimators from the survey sampling literature. These developments tend to be unknown to many survey statisticians, who traditionally work with non-Bayesian methods and rarely use MCMC. Our article explains the idea of data subsampling in MCMC by reviewing one strand of work, Subsampling MCMC, a so-called pseudo-marginal MCMC approach to speeding up MCMC through data subsampling. The review is written for a survey statistician without previous knowledge of MCMC methods, since our aim is to motivate survey sampling experts to contribute to the growing Subsampling MCMC literature. PubDate: 2018-12-17 DOI: 10.1007/s13171-018-0153-7
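The core survey-sampling ingredient is a design-unbiased (Horvitz-Thompson-type) estimator of the full-data log-likelihood from a small subsample; the pseudo-marginal machinery then has to deal with the fact that exponentiating an unbiased log-likelihood estimate does not give an unbiased likelihood estimate. A minimal sketch of the estimator itself (illustrative, not the reviewed papers' tuned versions with control variates):

```python
import numpy as np

def loglik_estimate(y, theta, m, rng):
    """Unbiased simple-random-subsampling estimate of the full-data
    Gaussian log-likelihood sum_i log N(y_i; theta, 1): scale the
    subsample total by n/m (Horvitz-Thompson with equal inclusion
    probabilities). Note: exp() of this is biased for the likelihood,
    which Subsampling MCMC must correct for."""
    n = len(y)
    idx = rng.choice(n, size=m, replace=False)
    ll = -0.5 * np.log(2 * np.pi) - 0.5 * (y[idx] - theta) ** 2
    return (n / m) * ll.sum()

rng = np.random.default_rng(0)
y = rng.standard_normal(1000)
full = np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * y**2)        # theta = 0
ests = [loglik_estimate(y, 0.0, m=100, rng=rng) for _ in range(2000)]
print(np.mean(ests), full)  # the averages agree: design-unbiased
```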

Abstract: The Lorenz curve is a much used instrument in economic analysis. It is typically used for measuring inequality and concentration. In insurance, it is used to compare the riskiness of portfolios, to order reinsurance contracts and to summarize relativity scores (see Frees et al. J. Am. Statist. Assoc. 106, 1085–1098, 2011; J. Risk Insur. 81, 335–366, 2014; and Samanthi et al. Insur. Math. Econ. 68, 84–91, 2016). It is sometimes called a concentration curve and, with this designation, it attracted the attention of Mahalanobis (Econometrica 28, 335–351, 1960) in his well known paper on fractile graphical analysis. The extension of the Lorenz curve to higher dimensions is not a simple task. Three definitions of a suitable Lorenz surface have been proposed: by Taguchi (Ann. Inst. Statist. Math. 24, 355–382, 1972a, 599–619, 1972b; Comput. Stat. Data Anal. 6, 307–334, 1988) and Lunetta (1972), by Arnold (1987, 2015), and by Koshevoy and Mosler (J. Am. Statist. Assoc. 91, 873–882, 1996). In this paper, using the definition proposed by Arnold (1987, 2015), we obtain analytic expressions for many multivariate Lorenz surfaces. We consider two general classes of models. The first is based on mixtures of Lorenz surfaces and the second one is based on some simple classes of bivariate mixture distributions. PubDate: 2018-12-05 DOI: 10.1007/s13171-018-00158-9
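For readers who want the univariate object concretely: the empirical Lorenz curve plots, against the poorest fraction k/n of units, the share of the total held by that fraction, and the Gini coefficient is twice the area between this curve and the diagonal. A standard sketch (the paper's contribution is the multivariate surface, which this does not reproduce):

```python
import numpy as np

def lorenz_points(x):
    """Empirical Lorenz curve: cumulative share of the total held by
    the poorest k/n of units, for k = 0..n."""
    x = np.sort(np.asarray(x, dtype=float))
    cum = np.insert(np.cumsum(x), 0, 0.0)
    return np.linspace(0, 1, x.size + 1), cum / cum[-1]

def gini(x):
    """Gini coefficient via the standard order-statistics formula."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)
    return 2 * np.sum(i * x) / (n * x.sum()) - (n + 1) / n

print(gini([1, 1, 1, 1]))   # → 0.0 (perfect equality)
print(gini([0, 0, 0, 1]))   # → 0.75 (one unit holds everything)
```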

Abstract: The paper concerns a random property T of a manufactured product that must, with high probability (e.g. P* = 95%), exceed a specified quantity ηa called the characteristic value (CV). However, the product comes from any one of K different subpopulations that may represent such things as manufacturers, regions or countries; the distribution of T will generally differ from one subpopulation to another, and so will the associated CV ηka, k = 1,…,K. Moreover, in applications such as the one we focus on in this paper, where the subpopulations are species, the subpopulation of origin will, for both strategic and practical reasons, not be known. The problem confronted in this paper is the creation of a single CV for the population consisting of the union of all the subpopulations. A solution proposed long ago in the application concerning manufactured lumber that is addressed in this paper selects, using random samples of the Ts, a subset of the subpopulations, called the subset of controlling species CS, that includes the smallest of the {ηka} with high probability. The estimated CV for the entire population is then found by combining, and treating as one, the samples for the subpopulations in CS. That method has been published in an ASTM standards document for the lumber industry to ensure the structural engineering strength of manufactured lumber. However, this published method has been shown to have some unexpected and undesirable properties, leading to the search for an alternative and to this paper. The paper presents and compares three subset selection methods. The simplest of the three is an extension of a classical nonparametric method for subset selection; the remaining two, which are more complex, are variations of nonparametric Bayesian methods.
Each of the three is a possible candidate for consideration by ASTM committees as a replacement for the current ASTM method for lumber species, depending on what criterion is ultimately used for its selection, and the methods may well apply in other contexts as well. PubDate: 2018-12-03 DOI: 10.1007/s13171-018-00157-w

Abstract: We propose two novel hyper nonlocal priors for variable selection in generalized linear models. To obtain these priors, we first derive two new priors for generalized linear models that combine the Fisher information matrix with the Johnson-Rossell moment and inverse moment priors. We then obtain our hyper nonlocal priors from our nonlocal Fisher information priors by assigning hyperpriors to their scale parameters. As a consequence, the hyper nonlocal priors carry less information on the effect sizes than the Fisher information priors, and thus are very useful in practice whenever prior knowledge of effect size is lacking. We develop a Laplace integration procedure to compute posterior model probabilities, and we show that under certain regularity conditions the proposed methods are variable selection consistent. We also show that, when compared to local priors, our hyper nonlocal priors lead to faster accumulation of evidence in favor of a true null hypothesis. Simulation studies that consider binomial, Poisson, and negative binomial regression models indicate that our methods select true models with higher success rates than other existing Bayesian methods. Furthermore, the simulation studies show that our methods lead to mean posterior probabilities for the true models that are closer to their empirical success rates. Finally, we illustrate the application of our methods with an analysis of the Pima Indians diabetes dataset. PubDate: 2018-11-17 DOI: 10.1007/s13171-018-0151-9
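For concreteness, the first-order Johnson-Rossell moment (pMOM) prior on a scalar coefficient has density π(β) = β² N(β; 0, τ)/τ: it vanishes at β = 0, which is what drives the faster accumulation of evidence for a true null. A quick numerical sanity check of the scalar case only (the paper's priors additionally involve the Fisher information matrix and hyperpriors on τ):

```python
import numpy as np

def pmom_density(beta, tau=1.0):
    """First-order moment (pMOM) prior density of Johnson and Rossell:
    beta^2 * N(beta; 0, tau) / tau. The beta^2 factor forces the
    density to zero at the null value beta = 0."""
    norm = np.exp(-beta**2 / (2 * tau)) / np.sqrt(2 * np.pi * tau)
    return beta**2 * norm / tau

beta = np.linspace(-12, 12, 200_001)
dx = beta[1] - beta[0]
total = np.sum(pmom_density(beta)) * dx   # Riemann sum, should be ~1
print(total, pmom_density(0.0))           # density is exactly 0 at the null
```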

Abstract: Debiased estimation has long been an area of research in the group testing literature. This has led to the development of several estimators with the goal of bias minimization and, recently, an unbiased estimator based on sequential binomial sampling. Previous research, however, has focused heavily on the simple case where no misclassification is assumed and only one trait is to be tested. In this paper, we consider the problem of unbiased estimation in these broader settings, giving constructions of such estimators for several cases. We show that, outside of the standard case addressed previously in the literature, it is impossible to find any proper unbiased estimator, that is, an estimator giving only values in the parameter space. This is shown to hold generally under any binomial or multinomial sampling plan. PubDate: 2018-11-15 DOI: 10.1007/s13171-018-0156-4
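The bias at issue is easy to exhibit in the standard one-trait, no-misclassification setting: with n pools of size k and T positive pools, the MLE of the prevalence is p̂ = 1 − (1 − T/n)^{1/k}, which is biased upward by Jensen's inequality since the map T ↦ p̂ is convex. A simulation sketch (assuming simple Dorfman-style pooling, not the paper's sequential designs):

```python
import numpy as np

# MLE of prevalence p from group testing: n pools of size k,
# each pool tests positive with probability 1 - (1 - p)^k.
rng = np.random.default_rng(0)
p, k, n, reps = 0.05, 10, 100, 20_000
q = 1 - (1 - p) ** k                      # P(pool positive)
T = rng.binomial(n, q, size=reps)         # positive pools per study
p_hat = 1 - (1 - T / n) ** (1 / k)        # MLE, convex in T => biased up
print(p_hat.mean(), p)                    # compare to the true p
```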

Abstract: Gui et al. (2017) proposed the Lindley geometric distribution, derived its properties including estimation issues and illustrated a data application. We introduce a new family of distributions containing the Lindley geometric distribution as a particular case. The new family is shown to provide significantly better fits. We also point out errors in various properties derived by Gui et al. (2017). PubDate: 2018-11-10 DOI: 10.1007/s13171-018-0150-x

Abstract: Osatohanmwen, Oyegue and Ogbonmwan (2017) introduced the Gumbel-Burr XII distribution, studied its properties including estimation issues and provided a data application. However, the likelihood function given there appears to be incorrect, and this might have affected the subsequent inference results and the real data application. Here, we provide the correct likelihood function. PubDate: 2018-11-08 DOI: 10.1007/s13171-018-0152-8

Abstract: This paper deals with uncertainty quantification (UQ) for a class of robust estimators of population parameters of a stationary, multivariate random field that is observed at a finite number of locations s1,…, sn, generated by a stochastic design. The class of robust estimators considered here is given by the so-called M-estimators that in particular include robust estimators of location, scale, linear regression parameters, as well as the maximum likelihood and pseudo maximum likelihood estimators, among others. Finding practically useful UQ measures, both in terms of standard errors of the point estimators as well as interval estimation for the parameters, is a difficult problem due to the presence of inhomogeneous dependence among irregularly spaced spatial observations. Exact and asymptotic variances of such estimators have a complicated form that depends on the autocovariance function of the random field, the spatial sampling density, and also on the relative rate of growth of the sample size versus the volume of the sampling region. Similar complex interactions of these factors are also present in the sampling distributions of these estimators, which makes exact calibration of confidence intervals impractical. Here it is shown that a version of the spatial block bootstrap can be used to produce valid UQ measures, both in terms of estimation of the standard error as well as interval estimation. A key advantage of the proposed method is that it provides valid approximations in very general settings without requiring any explicit adjustments for spatial sampling structures and without requiring explicit estimation of the covariance function and of the spatial sampling density. PubDate: 2018-11-08 DOI: 10.1007/s13171-018-0154-6
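To convey the mechanism in the simplest setting, here is a one-dimensional moving block bootstrap for the standard error of a sample mean; the paper's spatial version resamples blocks of the sampling region, but the idea of resampling contiguous blocks to preserve local dependence is the same. Names and the block-length choice below are illustrative:

```python
import numpy as np

def block_bootstrap_se(x, block_len, B=1000, seed=0):
    """Moving block bootstrap estimate of the standard error of the
    mean: resample overlapping blocks of length block_len with
    replacement until a series of length n is rebuilt, so that
    short-range dependence within blocks is preserved."""
    rng = np.random.default_rng(seed)
    n = len(x)
    n_blocks = int(np.ceil(n / block_len))
    starts_max = n - block_len + 1
    means = np.empty(B)
    for b in range(B):
        starts = rng.integers(0, starts_max, size=n_blocks)
        resample = np.concatenate(
            [x[s:s + block_len] for s in starts])[:n]
        means[b] = resample.mean()
    return means.std(ddof=1)

rng = np.random.default_rng(1)
x = rng.standard_normal(500)              # iid case for the sanity check
se = block_bootstrap_se(x, block_len=10)
print(se, x.std(ddof=1) / np.sqrt(500))   # close for iid data
```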

Abstract: We consider the problem of simultaneous estimation of two population means when one suspects that the two means are nearly equal. It is shown that the hierarchical empirical Bayes estimators which shrink the sample means towards the suspected hypothesis dominate the sample mean vectors in simultaneous estimation under the divergence loss function. PubDate: 2018-11-03 DOI: 10.1007/s13171-018-0155-5
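A generic empirical-Bayes shrinkage of this flavor (an illustration only, not the paper's hierarchical estimator or its divergence-loss analysis) pulls each sample mean toward the pooled mean by a data-driven factor that grows as the two means get closer relative to the sampling noise:

```python
import numpy as np

def shrink_two_means(x1, x2):
    """Illustrative empirical-Bayes-style shrinkage of two sample
    means toward their pooled mean; the shrinkage factor is large
    when the observed gap is small relative to sampling noise."""
    m1, m2 = np.mean(x1), np.mean(x2)
    pooled = 0.5 * (m1 + m2)
    # noise level of the gap m1 - m2
    noise = np.var(x1, ddof=1) / len(x1) + np.var(x2, ddof=1) / len(x2)
    gap2 = (m1 - m2) ** 2
    c = min(1.0, noise / gap2) if gap2 > 0 else 1.0   # shrinkage weight
    return (pooled + (1 - c) * (m1 - pooled),
            pooled + (1 - c) * (m2 - pooled))

rng = np.random.default_rng(0)
x1, x2 = rng.normal(0.0, 1, 50), rng.normal(0.2, 1, 50)
d1, d2 = shrink_two_means(x1, x2)   # each lies between its mean and the pooled mean
```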

Abstract: In this article, we develop a test for the multivariate location parameter in an elliptical model, based on the forward search estimator for a specified scatter matrix. We study the asymptotic power of the test under contiguous alternatives, based on the asymptotic distribution of the test statistic under such alternatives. Moreover, the performance of the test is evaluated on various simulated and real datasets and compared with that of more classical tests. PubDate: 2018-11-03 DOI: 10.1007/s13171-018-0149-3

Abstract: Consider a helix in three-dimensional space along which a sequence of equally spaced points is observed, subject to statistical noise. For data coming from a single helix, a two-stage algorithm based on a profile likelihood is developed to compute the maximum likelihood estimate of the helix parameters. Statistical properties of the estimator are studied and comparisons are made to other estimators found in the literature. Next a likelihood ratio test is developed to test if there is a change point in the helix, splitting the data into two sub-helices. The shapes of protein α-helices are used to illustrate the methodology. PubDate: 2018-11-01 DOI: 10.1007/s13171-018-0144-8
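A toy version of the estimation problem (with the axis assumed known and aligned with z; the paper's profile-likelihood approach handles the general unknown-axis case): generate equally spaced noisy points on a helix and recover the radius and the rise per step by simple averaging.

```python
import numpy as np

# Toy helix fit: points (r cos(w t), r sin(w t), c t) + Gaussian noise,
# with the axis known to be the z-axis (the paper estimates the axis too).
rng = np.random.default_rng(0)
r, w, c, sigma, n = 2.0, 0.4, 0.3, 0.05, 200
t = np.arange(n)
pts = np.column_stack([r * np.cos(w * t), r * np.sin(w * t), c * t])
pts += sigma * rng.standard_normal(pts.shape)

r_hat = np.mean(np.hypot(pts[:, 0], pts[:, 1]))   # mean radial distance
c_hat = (pts[-1, 2] - pts[0, 2]) / (n - 1)        # mean rise per step
print(r_hat, c_hat)  # close to r = 2.0 and c = 0.3
```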

Abstract: We propose a geometric framework to assess global sensitivity in Bayesian nonparametric models for density estimation. We study sensitivity of nonparametric Bayesian models for density estimation, based on Dirichlet-type priors, to perturbations of either the precision parameter or the base probability measure. To quantify the different effects of the perturbations of the parameters and hyperparameters in these models on the posterior, we define three geometrically-motivated global sensitivity measures based on geodesic paths and distances computed under the nonparametric Fisher-Rao Riemannian metric on the space of densities, applied to posterior samples of densities: (1) the Fisher-Rao distance between density averages of posterior samples, (2) the log-ratio of Karcher variances of posterior samples, and (3) the norm of the difference of scaled cumulative eigenvalues of empirical covariance operators obtained from posterior samples. We validate our approach using multiple simulation studies, and consider the problem of sensitivity analysis for Bayesian density estimation models in the context of three real datasets that have previously been studied. PubDate: 2018-10-02 DOI: 10.1007/s13171-018-0145-7
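Under the square-root map p ↦ √p, the nonparametric Fisher-Rao metric becomes the round metric on the unit sphere in L², so (up to a scaling convention) the distance between two densities is the arccosine of the Bhattacharyya coefficient ∫√(p q). A numerical sketch using one common convention (some authors multiply by 2):

```python
import numpy as np

def fisher_rao_distance(p, q, dx):
    """Fisher-Rao geodesic distance between two densities sampled on a
    common grid with spacing dx: the arccos of the Bhattacharyya
    coefficient (one common scaling convention)."""
    bc = np.sum(np.sqrt(p * q)) * dx
    return np.arccos(np.clip(bc, -1.0, 1.0))

x = np.linspace(-12, 13, 50_001)
dx = x[1] - x[0]
normal = lambda mu: np.exp(-(x - mu) ** 2 / 2) / np.sqrt(2 * np.pi)
d = fisher_rao_distance(normal(0.0), normal(1.0), dx)
# For unit-variance normals, BC = exp(-(mu1 - mu2)^2 / 8)
print(d, np.arccos(np.exp(-1 / 8)))
```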