Authors:Sun-Hee Kim; Lei Li; Christos Faloutsos; Hyung-Jeong Yang; Seong-Whan Lee Pages: 1 - 13 Abstract: Publication date: Available online 17 July 2016 Source:Statistical Methodology Author(s): Sun-Hee Kim, Lei Li, Christos Faloutsos, Hyung-Jeong Yang, Seong-Whan Lee Acute hypotensive episodes (AHEs) are serious clinical events in intensive care units (ICUs), and require immediate treatment to prevent patient injury. Reducing the risks associated with an AHE requires effective and efficient mining of data generated from multiple physiological time series. We propose HeartCast, a model that extracts essential features from such data to effectively predict AHE. HeartCast combines a non-linear support vector machine with best-feature extraction via analysis of the baseline threshold, quartile parameters, and window size of the physiological signals. Our approach has the following benefits: (a) it extracts the most relevant features; (b) it provides the best results for identification of an AHE event; (c) it is fast and scales with linear complexity over the length of the window; and (d) it can manage missing values and noise/outliers by using a best-feature extraction method. We performed experiments on data continuously captured from physiological time series of ICU patients (roughly 3 GB of processed data). HeartCast was found to outperform other state-of-the-art methods found in the literature with a 13.7% improvement in classification accuracy.
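For illustration only, the sketch below shows the general shape of such a prediction pipeline: window-level summary features (quartiles and a baseline-threshold exceedance rate, hypothetical stand-ins for the paper's feature set) fed to a nonlinear support vector machine. It is not the authors' HeartCast implementation; the 60 mmHg threshold, window length, and feature names are assumptions.

```python
# Minimal sketch (not the authors' HeartCast code): window-level summary
# features from a blood-pressure series feeding a nonlinear (RBF) SVM.
import numpy as np
from sklearn.svm import SVC

def window_features(signal, window=600):
    """Quartiles and fraction of samples below an illustrative baseline threshold."""
    w = np.asarray(signal[-window:], dtype=float)
    q1, q2, q3 = np.percentile(w, [25, 50, 75])
    below = np.mean(w < 60.0)          # 60 mmHg is an illustrative threshold
    return np.array([q1, q2, q3, below])

# X: one feature vector per patient window, y: 1 if an AHE followed, else 0
rng = np.random.default_rng(0)
X = np.vstack([window_features(rng.normal(80, 10, 600)) for _ in range(40)])
y = rng.integers(0, 2, 40)
clf = SVC(kernel="rbf").fit(X, y)      # nonlinear support vector machine
print(clf.predict(X[:5]))
```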
Authors:Elvan Ceyhan Pages: 31 - 54 Abstract: Publication date: Available online 27 July 2016 Source:Statistical Methodology Author(s): Elvan Ceyhan We consider two types of graphs based on a family of proximity catch digraphs (PCDs) and study their edge density. In particular, the PCDs we use are a parameterized digraph family called proportional-edge (PE) PCDs, and the two associated graph types are the “underlying graphs” and the newly introduced “reflexivity graphs” based on the PE-PCDs. These graphs are extensions of random geometric graphs in which distance is replaced with a dissimilarity measure and the threshold is not fixed but depends on the location of the points. PCDs and the associated graphs are constructed from data points of two classes, say X and Y, where one class (say class X) forms the vertices of the PCD and the Delaunay tessellation of the other class (i.e., class Y) yields the (Delaunay) cells which serve as the support of the class X points. We demonstrate that the edge density of these graphs is a U-statistic and hence obtain its asymptotic normality for data from any distribution that satisfies mild regularity conditions. The rate of convergence to asymptotic normality is sharper for the edge density of the reflexivity and underlying graphs than for the arc density of the PE-PCDs. For uniform data in the Euclidean plane, where Delaunay cells are triangles, we demonstrate that the distribution of the edge density is geometry invariant (i.e., independent of the shape of the triangular support). Utilizing this geometry invariance property, we compute the explicit forms of the asymptotic normal distribution for uniform data in one Delaunay triangle in the Euclidean plane. We also provide various versions of edge density in the multiple triangle case. The approach presented here can also be extended to data in higher dimensions.
Authors:Lei Yan; Dian-tong Kang Pages: 55 - 70 Abstract: Publication date: Available online 25 May 2016 Source:Statistical Methodology Author(s): Lei Yan, Dian-tong Kang Rényi (1961) proposed the Rényi entropy, and Ebrahimi and Pellerey (1995) and Ebrahimi (1996) proposed the residual entropy. Recently, Nanda et al. (2014) obtained a quantile version of the Rényi residual entropy, the Rényi residual quantile entropy (RRQE). Based on the RRQE function, they defined a new stochastic order, the Rényi quantile entropy (RQE) order, and studied some properties of this order. In this paper, we focus on further properties of this new order. Some characterizations of the RQE order are investigated, closure and reversed closure properties are obtained, and some illustrative examples are given. As applications of a main result, the preservation of the RQE order in several stochastic models is discussed.
Authors:Sarah E. Holte; Eva K. Lee; Yajun Mei Pages: 71 - 82 Abstract: Publication date: Available online 24 August 2016 Source:Statistical Methodology Author(s): Sarah E. Holte, Eva K. Lee, Yajun Mei This research is motivated from the analysis of a real gene expression data that aims to identify a subset of “interesting” or “significant” genes for further studies. When we blindly applied the standard false discovery rate (FDR) methods, our biology collaborators were suspicious or confused, as the selected list of significant genes was highly unbalanced: there were ten times more under-expressed genes than the over-expressed genes. Their concerns led us to realize that the observed two-sample t -statistics were highly skewed and asymmetric, and thus the standard FDR methods might be inappropriate. To tackle this case, we propose a symmetric directional FDR control method that categorizes the genes into “over-expressed” and “under-expressed” genes, pairs “over-expressed” and “under-expressed” genes, defines the p -values for gene pairs via column permutations, and then applies the standard FDR method to select “significant” gene pairs instead of “significant” individual genes. We compare our proposed symmetric directional FDR method with the standard FDR method by applying them to simulated data and several well-known real data sets.
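For illustration, a minimal sketch of the final step only: applying the standard Benjamini–Hochberg procedure to p-values that are assumed to have already been computed for over-/under-expressed gene pairs via column permutations; the pairing and permutation steps themselves are not reproduced here.

```python
# Minimal sketch: Benjamini-Hochberg step applied to p-values for gene pairs
# (the pairing of over- and under-expressed genes and the permutation p-values
# are assumed to have been computed already).
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Return a boolean mask of pairs declared significant at FDR level q."""
    p = np.asarray(pvals, dtype=float)
    order = np.argsort(p)
    ranked = p[order]
    m = len(p)
    thresh = q * np.arange(1, m + 1) / m          # i*q/m for the i-th smallest p
    passed = ranked <= thresh
    k = np.max(np.nonzero(passed)[0]) + 1 if passed.any() else 0
    mask = np.zeros(m, dtype=bool)
    mask[order[:k]] = True                        # reject the k smallest p-values
    return mask

pair_pvals = [0.001, 0.02, 0.04, 0.3, 0.7]
print(benjamini_hochberg(pair_pvals, q=0.05))
```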
Authors:Juan Carlos Bustamante; Edixon Chacón Pages: 83 - 95 Abstract: Publication date: Available online 26 May 2016 Source:Statistical Methodology Author(s): Juan Carlos Bustamante, Edixon Chacón Two theoretical approaches are usually employed for the fitting of ordinal data: the underlying variables approach (UV) and the item response theory (IRT). In the UV approach, limited information methods [generalized least squares (GLS) and weighted least squares (WLS)] are employed. In the IRT approach, fitting is carried out with full information methods [Proportional Odds Model (POM), and the Normal Ogive (NOR)]. The four estimation methods (GLS, WLS, POM and NOR) are compared in this article at the same time, using a simulation study and analyzing the goodness-of-fit indices obtained. The parameters used in the Monte Carlo simulation arise from the application of a political action scale whose two-factor structure is well known. The results show that the estimation method employed affects the goodness-of-fit to the model. In our case, the IRT approach shows a better fitting than UV, especially with the POM method.
Authors:Nadine Hilgert; Ghislain Verdier; Jean-Pierre Vila Pages: 96 - 113 Abstract: Publication date: Available online 1 September 2016 Source:Statistical Methodology Author(s): Nadine Hilgert, Ghislain Verdier, Jean-Pierre Vila A new statistical approach for on-line change detection in uncertain dynamic systems is proposed. In a change detection problem, the distribution of a sequence of observations can change at some unknown instant. The goal is to detect this change, for example a parameter change, as quickly as possible with a minimal risk of false detection. In this paper, the observations come from an uncertain system modeled by an autoregressive model containing an unknown functional component. The popular Page's CUSUM rule is no longer applicable since it requires full knowledge of the model. A new CUSUM-like detection scheme is proposed, based on nonparametric estimation of the unknown component from a learning sample. Moreover, the estimation procedure can be updated on line, which ensures better detection, especially at the beginning of the monitoring procedure. Simulation trials were performed on a model describing a water treatment process and demonstrate the advantage of this new procedure over the classic CUSUM rule.
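A minimal sketch of the idea, not the authors' exact scheme: the unknown functional component is replaced by a Nadaraya–Watson kernel estimate fitted on a learning sample, and a one-sided Page-type CUSUM of the resulting residuals is monitored. The bandwidth, drift allowance, and threshold below are illustrative assumptions.

```python
# Minimal sketch (not the authors' exact scheme): a CUSUM-type recursion in
# which the unknown regression function is replaced by a kernel estimate
# fitted on a learning sample, and residuals are monitored for a mean shift.
import numpy as np

def kernel_estimate(x0, x_learn, y_learn, h=0.5):
    """Nadaraya-Watson estimate of f(x0) from the learning sample."""
    w = np.exp(-0.5 * ((x_learn - x0) / h) ** 2)
    return np.sum(w * y_learn) / np.sum(w)

def cusum_monitor(x_new, y_new, x_learn, y_learn, delta=1.0, threshold=5.0):
    """Flag the first time the one-sided CUSUM of residuals exceeds the threshold."""
    s = 0.0
    for t, (xt, yt) in enumerate(zip(x_new, y_new)):
        resid = yt - kernel_estimate(xt, x_learn, y_learn)
        s = max(0.0, s + resid - delta / 2.0)   # Page-type recursion
        if s > threshold:
            return t
    return None

rng = np.random.default_rng(1)
x_learn = rng.uniform(0, 1, 200); y_learn = np.sin(2 * np.pi * x_learn) + rng.normal(0, 0.2, 200)
x_new = rng.uniform(0, 1, 100);   y_new = np.sin(2 * np.pi * x_new) + rng.normal(0, 0.2, 100)
y_new[60:] += 1.5                               # simulated change at t = 60
print(cusum_monitor(x_new, y_new, x_learn, y_learn))
```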
Authors:Michael Brimacombe Pages: 114 - 130 Abstract: Publication date: Available online 24 August 2016 Source:Statistical Methodology Author(s): Michael Brimacombe A general diagnostic approach to the evaluation of asymptotic approximation in likelihood based models is developed and applied to logistic regression. The expected asymptotic and observed log-likelihood functions are compared using a chi distribution in a directional Bayesian setting. This provides a general approach to assessing and visualizing non-convergence in higher dimensional models. Several well-known examples from the logistic regression literature are discussed.
Authors:Clécio S. Ferreira; Víctor H. Lachos Pages: 131 - 146 Abstract: Publication date: Available online 8 September 2016 Source:Statistical Methodology Author(s): Clécio S. Ferreira, Víctor H. Lachos Normal nonlinear regression models are applied in some areas of the sciences and engineering to explain or describe the phenomena under study. However, it is well known that several phenomena are not always represented by the normal model, due to a lack of symmetry or the presence of distributions with heavier or lighter tails than the normal law in the data. This paper proposes an extension of nonlinear regression models using the skew-scale mixtures of normal (SSMN) distributions proposed by Ferreira et al. (2011). This class of models provides a useful generalization of symmetrical nonlinear regression models, since the random-term distributions cover both asymmetric and heavy-tailed distributions, such as the skew-t-normal, skew-slash and skew-contaminated normal, among others. An expectation-maximization (EM) algorithm for maximum likelihood (ML) estimation is presented and the observed information matrix is derived analytically. Some simulation studies are presented to examine the performance of the proposed methods, in relation to the robustness and asymptotic properties of the ML estimates. Finally, an illustration of the method is presented considering a dataset previously analyzed under normal and skew-normal (SN) nonlinear regression models. The main conclusion is that the ML estimates from the heavy-tailed SSMN nonlinear models are more robust against outlying observations than the corresponding SN estimates.
Authors:Sudipta Das; Anup Dewanji; Debasis Sengupta Pages: 147 - 159 Abstract: Publication date: Available online 9 September 2016 Source:Statistical Methodology Author(s): Sudipta Das, Anup Dewanji, Debasis Sengupta In many situations, multiple copies of a software are tested in parallel with different test cases as input, and the detected errors from a particular round of testing are debugged together. In this article, we discuss a discrete time model of software reliability for such a scenario of periodic debugging. We propose likelihood based inference of the model parameters, including the initial number of errors, under the assumption that all errors are equally likely to be detected. The proposed method is used to estimate the reliability of the software. We establish asymptotic normality of the estimated model parameters. The performance of the proposed method is evaluated through a simulation study and its use is illustrated through the analysis of a data set obtained from testing of a real-time flight control software. We also consider a more general model, in which different errors have different probabilities of detection.
Authors:Félix Almendra-Arao; José Juan Castro-Alva; Hortensia Reyes-Cervantes Pages: 160 - 171 Abstract: Publication date: Available online 7 September 2016 Source:Statistical Methodology Author(s): Félix Almendra-Arao, José Juan Castro-Alva, Hortensia Reyes-Cervantes In both statistical non-inferiority (NI) and superiority (S) tests, the critical region should be a Barnard convex set for two main reasons. The first is computational: calculating test sizes is a computationally intensive problem due to the presence of a nuisance parameter, but the calculation is considerably reduced when the critical region is a Barnard convex set. The second is that for NI/S statistical tests to make sense, their critical regions must be Barnard convex sets. While it is indeed possible for the critical regions of NI/S tests not to be Barnard convex sets, for the reasons stated above it is desirable that they are. It is therefore important to generate, from a given NI/S test, a test which guarantees that the critical regions are Barnard convex sets. We propose a method that constructs, from a given NI/S test, another NI/S test whose critical regions are guaranteed to be Barnard convex sets, and we illustrate the construction through examples. This work is theoretical in that the developments refer to the general framework of NI/S testing for two independent binomial proportions, and applied in that statistical tests that do not ensure that their critical regions are Barnard convex sets may appear in practice, particularly in the clinical trials area.
Authors:Kent R. Riggs; Phil D. Young; Dean M. Young Pages: 1 - 13 Abstract: Publication date: Available online 3 March 2016 Source:Statistical Methodology Author(s): Kent R. Riggs, Phil D. Young, Dean M. Young We derive two new confidence ellipsoids (CEs) and four CE variations for covariate coefficient vectors with nuisance parameters under the seemingly unrelated regression (SUR) model. Unlike most CE approaches for SUR models studied so far, we assume unequal regression coefficients for our two regression models. The two new basic CEs are a CE based on a Wald statistic with nuisance parameters and a CE based on the asymptotic normality of the SUR two-stage unbiased estimator of the primary regression coefficients. We compare the coverage and volume characteristics of the six SUR-based CEs via a Monte Carlo simulation. For the configurations in our simulation, we determine that, except for small sample sizes, a CE based on a two-stage statistic with a Bartlett corrected ( 1 − α ) percentile is generally preferred because it has essentially nominal coverage and relatively small volume. For small sample sizes, the parametric bootstrap CE based on the two-stage estimator attains close-to-nominal coverage and is superior to the competing CEs in terms of volume. Finally, we apply three SUR Wald-type CEs with favorable coverage properties and relatively small volumes to a real data set to demonstrate the gain in precision over the ordinary-least-squares-based CE.
Authors:Dian-Tong Kang; Lei Yan Pages: 14 - 35 Abstract: Publication date: Available online 12 February 2016 Source:Statistical Methodology Author(s): Dian-Tong Kang, Lei Yan A new stochastic order called dynamic cumulative residual quantile entropy (DCRQE) order is established. Some characterizations of the new order are investigated. Closure and reversed closure properties of the DCRQE order are obtained. Applications of the DCRQE ordering in characterizing the proportional hazard rate model and the k -record values model are considered.
Authors:Satya Prakash Singh; Siuli Mukhopadhyay Pages: 36 - 52 Abstract: Publication date: Available online 11 March 2016 Source:Statistical Methodology Author(s): Satya Prakash Singh, Siuli Mukhopadhyay Designing cluster trials depends on the knowledge of the intracluster correlation coefficient. To overcome the issue of parameter dependence, Bayesian designs are proposed for two level models with and without covariates. These designs minimize the variance of the treatment contrast under certain cost constraints. A pseudo Bayesian design approach is advocated that integrates and averages the objective function over a prior distribution of the intracluster correlation coefficient. Theoretical results on the Bayesian criterion are noted when the intracluster correlation follows a uniform distribution. Two data sets based on educational surveys conducted in schools are used to illustrate the proposed methodology.
Authors:Kung-Jong Lui Pages: 53 - 62 Abstract: Publication date: Available online 18 February 2016 Source:Statistical Methodology Author(s): Kung-Jong Lui For comparison of two experimental treatments with a placebo under an incomplete block crossover design, we develop the weighted-least-squares estimator (WLSE) and the conditional maximum likelihood estimator (CMLE) of the relative treatment effects in Poisson frequency data. We further develop the interval estimator based on the WLSE, the interval estimator based on the CMLE, the interval estimator based on the conditional-likelihood-ratio test and the interval estimator based on the exact conditional distribution. Using Monte Carlo simulations, we find that all interval estimators developed here can perform well in a variety of situations. The exact interval estimator derived here can be especially of use when both the number of patients and the mean number of event occurrences are small in a trial. We use the data taken as part of a double-blind randomized crossover trial comparing salbutamol and salmeterol with a placebo with respect to the number of exacerbations in asthma patients to illustrate the use of these estimators.
Authors:Miguel A. Sordo; Marilia C. de Souza; Alfonso Suárez-Llorens Pages: 63 - 76 Abstract: Publication date: Available online 18 March 2016 Source:Statistical Methodology Author(s): Miguel A. Sordo, Marilia C. de Souza, Alfonso Suárez-Llorens In this paper, we derive a measure of discrepancy based on the Gini’s mean difference to test the null hypothesis that two random variables, which are ordered in a variability-type stochastic order, are equally dispersive versus the alternative that one strictly dominates the other. We describe the test, evaluate its performance under a variety of situations and illustrate the procedure with an example using log returns of real data.
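For illustration, a minimal sketch of the building block: Gini's mean difference for each sample, with their gap used as a crude discrepancy; the paper's actual test statistic and its null calibration may differ.

```python
# Minimal sketch: Gini's mean difference for two samples and their gap as an
# illustrative discrepancy; the paper's exact statistic and its null
# distribution are not reproduced here.
import numpy as np

def gini_mean_difference(x):
    """Average absolute difference over all pairs of distinct observations."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.sum(np.abs(x[:, None] - x[None, :])) / (n * (n - 1))

rng = np.random.default_rng(2)
x = rng.normal(0, 1, 200)
y = rng.normal(0, 2, 200)        # the more dispersed sample
print(gini_mean_difference(y) - gini_mean_difference(x))
```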
Authors:Alexander Katzur; Udo Kamps Pages: 77 - 90 Abstract: Publication date: Available online 16 April 2016 Source:Statistical Methodology Author(s): Alexander Katzur, Udo Kamps Based on stochastically independent samples with underlying density functions from the same multiparameter exponential family, a weighted version of Matusita’s affinity is applied as test statistic in a homogeneity test of identical densities as well as in a discrimination problem. Asymptotic distributions of the test statistics are stated, and the impact of weights on the deviation of actual and required type I error for finite sample sizes is examined in a simulation study.
Authors:Jingjing Yin; Yi Hao; Hani Samawi; Haresh Rochani Pages: 91 - 106 Abstract: Publication date: Available online 27 April 2016 Source:Statistical Methodology Author(s): Jingjing Yin, Yi Hao, Hani Samawi, Haresh Rochani In medical diagnostics, the ROC curve is the graph of sensitivity against 1-specificity as the diagnostic threshold runs through all possible values. The ROC curve and its associated summary indices are very useful for the evaluation of the discriminatory ability of biomarkers/diagnostic tests with continuous measurements. Among all summary indices, the area under the ROC curve (AUC) is the most popular diagnostic accuracy index, which has been extensively used by researchers for biomarker evaluation and selection. Sometimes, taking the actual measurements of a biomarker is difficult and expensive, whereas ranking them without actual measurements can be easy. In such cases, ranked set sampling based on judgment order statistics would provide more representative samples yielding more accurate estimation. In this study, Gaussian kernel is utilized to obtain a nonparametric estimate of the AUC. Asymptotic properties of the AUC estimates are derived based on the theory of U-statistics. Intensive simulation is conducted to compare the estimates using ranked set samples versus simple random samples. The simulation and theoretical derivation indicate that ranked set sampling is generally preferred with smaller variances and mean squared errors (MSE). The proposed method is illustrated via a real data analysis.
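For illustration, a minimal sketch of a Gaussian-kernel-smoothed AUC estimate, in which the Mann–Whitney indicator I(Y > X) is replaced by a normal CDF; the ranked-set-sampling design and the paper's bandwidth choice are not reproduced, and the bandwidth below is an assumption.

```python
# Minimal sketch: Gaussian-kernel-smoothed AUC estimate, replacing the
# Mann-Whitney indicator I(Y > X) with a normal CDF of the scaled difference.
import numpy as np
from scipy.stats import norm

def smoothed_auc(x_healthy, y_diseased, h=0.5):
    diffs = y_diseased[:, None] - x_healthy[None, :]
    return np.mean(norm.cdf(diffs / h))

rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.0, 100)    # biomarker in healthy subjects
y = rng.normal(1.0, 1.0, 100)    # biomarker in diseased subjects
print(smoothed_auc(x, y))        # close to Phi(1/sqrt(2)) ~ 0.76 for this design
```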
Authors:Chantal Larose; Ofer Harel; Katarzyna Kordas; Dipak K. Dey Pages: 107 - 121 Abstract: Publication date: Available online 10 May 2016 Source:Statistical Methodology Author(s): Chantal Larose, Ofer Harel, Katarzyna Kordas, Dipak K. Dey Latent class analysis is used to group categorical data into classes via a probability model. Model selection criteria then judge how well the model fits the data. When addressing incomplete data, the current methodology restricts the imputation to a single, pre-specified number of classes. We seek to develop an entropy-based model selection criterion that does not restrict the imputation to one number of clusters. Simulations show the new criterion performing well against the current standards of AIC and BIC, while a family studies application demonstrates how the criterion provides more detailed and useful results than AIC and BIC.
Authors:Yunyun Qian; Zhensheng Huang Pages: 122 - 130 Abstract: Publication date: Available online 24 May 2016 Source:Statistical Methodology Author(s): Yunyun Qian, Zhensheng Huang In this study a varying-coefficient partially nonlinear model with measurement errors in the nonparametric part is proposed. Based on the corrected profile least-squares estimation methodology, we define estimators of the unknowns of the current model and check whether the coefficient functions are constant by using the popular generalized likelihood ratio (GLR) test. Further, the corresponding asymptotic distribution is established and a bootstrap procedure is employed to implement the proposed methodology. Simulated and real examples are given to illustrate the proposed methodology.
Authors:N. Nematollahi; R. Farnoosh; Z. Rahnamaei Pages: 131 - 146 Abstract: Publication date: Available online 2 June 2016 Source:Statistical Methodology Author(s): N. Nematollahi, R. Farnoosh, Z. Rahnamaei A flexible class of skew-slash distributions is presented, constructed as a location-scale mixture of a skew-elliptically distributed random variable with a power of a beta random variable. This family of distributions, which generalizes the location-scale mixture of normal and beta distributions, contains some existing and important distributions and is appropriate for modeling data with skewness and a heavy-tailed structure. Some distributional properties and the moments of this new family of distributions are obtained. In the special case of the location-scale mixture of the skew-normal distribution, we estimate the parameters via an EM-type algorithm, and a simulation study and an application to real data are provided for illustration. Finally, we extend some results to the multivariate case.
Authors:Chi Tim Ng; Seungyoung Oh; Youngjo Lee Pages: 147 - 160 Abstract: Publication date: Available online 8 June 2016 Source:Statistical Methodology Author(s): Chi Tim Ng, Seungyoung Oh, Youngjo Lee Recently, the selection consistency of penalized least squares estimators has received a great deal of attention. For penalized likelihood estimation with certain non-convex penalties, a search space can be constructed within which there exists a unique local minimizer that exhibits selection consistency in high-dimensional generalized linear models under certain conditions. In particular, we prove that the SCAD penalty of Fan and Li (2001) and a new modified version of the unbounded penalty of Lee and Oh (2014) can be employed to achieve such a property. These results hold even for the non-sparse cases where the number of relevant covariates increases with the sample size. Simulation studies are provided to compare the performance of the SCAD penalty and the newly proposed penalty.
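For reference, the SCAD penalty of Fan and Li (2001) referred to above has the closed form below for θ = |β| ≥ 0, with a > 2 (Fan and Li suggest a = 3.7); the modified unbounded penalty of Lee and Oh (2014) is not reproduced here.

```latex
p_\lambda(\theta) =
\begin{cases}
\lambda\theta, & 0 \le \theta \le \lambda,\\[4pt]
\dfrac{2a\lambda\theta - \theta^{2} - \lambda^{2}}{2(a-1)}, & \lambda < \theta \le a\lambda,\\[4pt]
\dfrac{(a+1)\lambda^{2}}{2}, & \theta > a\lambda.
\end{cases}
```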
Authors:Xiaojuan Kang; Tizheng Li Pages: 161 - 184 Abstract: Publication date: Available online 14 June 2016 Source:Statistical Methodology Author(s): Xiaojuan Kang, Tizheng Li The varying coefficient model provides a useful tool for statistical modeling. In this paper, we propose a new procedure for more efficient estimation of its coefficient functions when its errors are serially correlated and modeled as an autoregressive (AR) process. We establish the asymptotic distribution of the proposed estimator and show that it is more efficient than the conventional local linear estimator. Furthermore, we suggest a penalized profile least squares method with the smoothly clipped absolute deviation (SCAD) penalty function to select the order of the AR error process. Simulation evidence shows that significant gains can be achieved in finite samples with the proposed estimation procedure. Moreover, a real data example is given to illustrate the usefulness of the proposed estimation procedure.
Authors:Sanku Dey; Sukhdev Singh; Yogesh Mani Tripathi; A. Asgharzadeh Pages: 185 - 202 Abstract: Publication date: Available online 21 June 2016 Source:Statistical Methodology Author(s): Sanku Dey, Sukhdev Singh, Yogesh Mani Tripathi, A. Asgharzadeh In this paper, we consider the generalized inverted exponential distribution, which is capable of modelling various shapes of failure rates and ageing criteria. The purpose of this paper is twofold. First, based on progressive type-II censored data, we consider the problem of estimation of the parameters under classical and Bayesian approaches. In this regard, we obtain maximum likelihood estimates, and Bayes estimates under the squared error loss function. We also compute 95% asymptotic confidence interval and highest posterior density interval estimates under the respective approaches. Second, we consider the problem of prediction of future observations using the maximum likelihood predictor, best unbiased predictor, conditional median predictor and Bayes predictor. The associated predictive interval estimates for the censored observations are computed as well. Finally, we analyze two real data sets and conduct a Monte Carlo simulation study to compare the performance of the various proposed estimators and predictors.
Authors:Viani A. Biatat Djeundje Pages: 203 - 217 Abstract: Publication date: Available online 27 May 2016 Source:Statistical Methodology Author(s): Viani A. Biatat Djeundje The analysis of longitudinal data or repeated measurements is an important and growing area of statistics. In this context, data come in different formats but typically have a hierarchical or multi-level structure including group and subject components, and the main purpose of the analysis is usually to estimate these components from the data. A standard way to perform this estimation is via mixed models. In this paper, we show that the estimated group effects from standard smooth mixed models can deviate systematically from the underlying group mean, leading to wrong conclusions about the data. We then present two ways to avoid such systematic deviations and misinterpretations when fitting flexible mixed models to multi-level data. The first method is a marginal procedure, and the second method is based on the conditional distribution of the subject effects derived from appropriate constraints. Both methods are robust against mis-specification of the covariance structure in the sense that they resolve the lack of centering found in standard smooth mixed models.
Authors:Dian-tong Kang Pages: 218 - 235 Abstract: Publication date: Available online 30 June 2016 Source:Statistical Methodology Author(s): Dian-tong Kang Ebrahimi and Pellerey (1995) and Ebrahimi (1996) proposed the residual entropy. Recently, Sunoj and Sankaran (2012) obtained a quantile version of the residual entropy, the residual quantile entropy (RQE). Based on the RQE function, they defined a new stochastic order, the less quantile entropy (LQE) order, and studied some properties of this order. In this paper, we focus on further properties of this new order. Some characterizations of the LQE order are investigated, closure and reversed closure properties are obtained, and some illustrative examples are given. As applications of a main result, the preservation of the LQE order in several stochastic models is discussed. We give the closure and reversed closure properties of the LQE order for coherent systems with dependent and identically distributed components, and also consider a potential application of this order to insurance.
Authors:H.M. Barakat; A.R. Omar Pages: 1 - 7 Abstract: Publication date: Available online 19 January 2016 Source:Statistical Methodology Author(s): H.M. Barakat, A.R. Omar In this paper we compare the domains of attraction of limit laws of intermediate order statistics under power normalization with those of limit laws of intermediate order statistics under linear normalization. As a result of this comparison, we obtain necessary and sufficient conditions for a univariate distribution function to belong to the domain of attraction for each of the possible limit laws of intermediate order statistics under power normalization.
Authors:Marcelo Bourguignon; Klaus L.P. Vasconcellos Pages: 8 - 19 Abstract: Publication date: Available online 20 January 2016 Source:Statistical Methodology Author(s): Marcelo Bourguignon, Klaus L.P. Vasconcellos In this paper, we introduce a stationary first-order integer-valued autoregressive process with geometric–Poisson marginals. The new process allows negative values for the series. Several properties of the process are established. The unknown parameters of the model are estimated using the Yule-Walker method and the asymptotic properties of the estimator are considered. Some numerical results of the estimators are presented with a brief discussion. Possible application of the process is discussed through a real data example.
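For illustration, a minimal sketch of Yule-Walker-type estimation for a first-order integer-valued autoregression, where the thinning parameter equals the lag-1 autocorrelation; the simulation below uses an ordinary Poisson INAR(1) with binomial thinning rather than the paper's geometric–Poisson marginal process.

```python
# Minimal sketch: Yule-Walker estimation of the thinning parameter of a
# first-order integer-valued autoregression via the lag-1 autocorrelation.
import numpy as np

def yule_walker_lag1(x):
    """Sample lag-1 autocorrelation, the Yule-Walker estimate of alpha."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    return np.sum(xc[1:] * xc[:-1]) / np.sum(xc ** 2)

rng = np.random.default_rng(4)
alpha, n = 0.4, 1000
x = np.empty(n, dtype=int)
x[0] = rng.poisson(5)
for t in range(1, n):
    survivors = rng.binomial(x[t - 1], alpha)   # binomial thinning
    x[t] = survivors + rng.poisson(3)           # Poisson innovations
print(yule_walker_lag1(x))                      # close to alpha = 0.4
```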
Authors:G. Avlogiaris; A. Micheas; K. Zografos Pages: 20 - 42 Abstract: Publication date: Available online 29 January 2016 Source:Statistical Methodology Author(s): G. Avlogiaris, A. Micheas, K. Zografos The aim of this paper is to propose procedures that test statistical hypotheses locally, that is, assess the validity of a model in a specific domain of the data. In this context, the one- and two-sample problems are discussed. The proposed tests are based on local divergences, which are defined in such a way as to quantify the divergence between probability distributions locally, in a specific area of the joint domain of the underlying models. The theoretical results are exemplified using simulations and two real datasets.
Authors:Ehssan Ghashim; Éric Marchand; William E. Strawderman Pages: 43 - 57 Abstract: Publication date: July 2016 Source:Statistical Methodology, Volume 31 Author(s): Ehssan Ghashim, Éric Marchand, William E. Strawderman For estimating a lower restricted parametric function in the framework of Marchand and Strawderman (2006), we show how (1 − α) × 100% Bayesian credible intervals can be constructed so that the frequentist probability of coverage is no less than 1 − 3α/2. As in Marchand and Strawderman (2013), the findings are achieved through the specification of the spending function of the Bayes credible interval and apply to an “equal-tails” modification of the HPD procedure, among others. Our results require a logconcave assumption for the distribution of a pivot, and apply to estimating a lower-bounded normal mean with known variance; further examples include lower-bounded scale parameters from Gamma, Weibull, and Fisher distributions, with the latter also applicable to random-effects analysis of variance.
Authors:Bahman Tarvirdizade; Mohammad Ahmadpour Pages: 58 - 72 Abstract: Publication date: Available online 10 February 2016 Source:Statistical Methodology Author(s): Bahman Tarvirdizade, Mohammad Ahmadpour In this paper, the estimation of the stress-strength reliability Pr ( X > Y ) based on upper record values is considered when X and Y are independent random variables from a two-parameter bathtub-shaped lifetime distribution with the same shape but different scale parameters. The maximum likelihood estimator (MLE), the approximate Bayes estimator and the exact confidence intervals of stress-strength reliability are obtained when the shape parameter is known. When the shape parameter is unknown, we obtain the MLE, the asymptotic confidence interval and some bootstrap confidence intervals of stress-strength reliability. In this case, we also apply the Gibbs sampling technique to study the Bayesian estimation of stress-strength reliability and the corresponding credible interval. A Monte Carlo simulation study is conducted to investigate and compare the performance of the different proposed methods in this paper. Finally, analysis of a real data set is presented for illustrative purposes.
Authors:Cathy W.S. Chen; Mike K.P. So; Jessica C. Li; Songsak Sriboonchitta Pages: 73 - 90 Abstract: Publication date: Available online 9 February 2016 Source:Statistical Methodology Author(s): Cathy W.S. Chen, Mike K.P. So, Jessica C. Li, Songsak Sriboonchitta Integer-valued time series analysis offers various applications in biomedical, financial, and environmental research. However, existing works usually assume no or constant over-dispersion. In this paper, we propose a new model for time series of counts, the autoregressive conditional negative binomial model that has a time-varying conditional autoregressive mean function and heteroskedasticity. The location and scale parameters of the negative binomial distribution are flexible in the proposed set-up, inducing dynamic over-dispersion. We adopt Bayesian methods with a Markov chain Monte Carlo sampling scheme to estimate model parameters and utilize deviance information criterion for model comparison. We conduct simulations to investigate the estimation performance of this sampling scheme for the proposed negative binomial model. To demonstrate the proposed approach in modeling time-varying over-dispersion, we consider two criminal incidents recorded by New South Wales (NSW) Police Force in Australia. We also fit the autoregressive conditional Poisson model to these two datasets. Our results demonstrate that the proposed negative binomial model is preferable to the Poisson model.
Authors:S. Ejaz Ahmed; Mohamed Amezziane Abstract: Publication date: Available online 18 December 2016 Source:Statistical Methodology Author(s): S. Ejaz Ahmed, Mohamed Amezziane Shrinkage estimation is used to develop a semiparametric density estimator as a linear combination of a fully known parametric density function and a nonparametric density estimator. We determine the asymptotic properties of the shrinkage coefficient and of the semiparametric estimator’s integrated squared error. Moreover, we show that the proposed estimation methodology delivers density estimators that are more accurate than nonparametric estimators and that do not require the use of optimal smoothing parameters.
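For illustration, a minimal sketch of the general form of such an estimator: a convex combination of a parametric fit and a kernel density estimate, with the mixing weight chosen here by least-squares cross-validation on a grid. The paper's asymptotically optimal shrinkage coefficient is not reproduced; the normal parametric family and the grid search are assumptions.

```python
# Minimal sketch: semiparametric density estimate as a convex combination of a
# parametric fit and a kernel density estimate; the mixing weight is chosen by
# least-squares cross-validation, not by the paper's shrinkage coefficient.
import numpy as np
from scipy.stats import norm, gaussian_kde

rng = np.random.default_rng(5)
x = rng.normal(0.2, 1.1, 300)

mu, sd = x.mean(), x.std(ddof=1)            # parametric (normal) component, fit by ML
f_par = lambda t: norm.pdf(t, mu, sd)
f_kde = gaussian_kde(x)                     # nonparametric component

def combined(t, lam):
    return lam * f_par(t) + (1.0 - lam) * f_kde(t)

grid = np.linspace(x.min() - 3, x.max() + 3, 400)
dx = grid[1] - grid[0]

def lscv(lam):
    """Least-squares CV score; the parametric fit is held fixed for simplicity."""
    int_f2 = np.sum(combined(grid, lam) ** 2) * dx
    loo = [lam * f_par(xi) + (1 - lam) * gaussian_kde(np.delete(x, i))(xi)[0]
           for i, xi in enumerate(x)]
    return int_f2 - 2.0 * np.mean(loo)

lams = np.linspace(0.0, 1.0, 11)
lam_hat = lams[np.argmin([lscv(l) for l in lams])]
print("selected mixing weight:", lam_hat)
```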
Authors:Najmeh Bathaee; Hamid Sheikhzadeh Abstract: Publication date: Available online 11 November 2016 Source:Statistical Methodology Author(s): Najmeh Bathaee, Hamid Sheikhzadeh In this paper, we present a non-parametric continuous density hidden Markov mixture model (CDHMMix model) with an unknown number of mixture components for blind segmentation or clustering of sequences. In our presented model, the emission distributions of the HMMs are chosen to be Gaussian with full, diagonal, or tridiagonal covariance matrices. We apply a Bayesian approach to train our presented model and derive the inference for our model using the Markov chain Monte Carlo (MCMC) method. For the multivariate Gaussian emission, a method that maintains the tridiagonal structure of the covariance is introduced. Moreover, we present a new sampling method for the hidden state sequences of the HMMs, based on the Viterbi algorithm, that increases the mixing rate.
Authors:Mohamed Chaouch; Naâmane Laïb; Elias Ould Saïd Abstract: Publication date: Available online 29 October 2016 Source:Statistical Methodology Author(s): Mohamed Chaouch, Naâmane Laïb, Elias Ould Saïd The present paper deals with nonparametric M-estimation for a right-censored regression model with stationary ergodic data. A kernel-type estimator of a family of robust regression functions, defined as an implicit function, is considered when the covariate takes its values in R^d (d ≥ 1) and the data are sampled from a stationary ergodic process. The strong consistency (with rate) and the asymptotic distribution of the estimator are established under mild assumptions. Moreover, a usable confidence interval is provided which does not depend on any unknown quantity. Our results hold without any mixing condition and do not require the existence of marginal densities. A comparison study based on simulated data is also provided.
Authors:Shin Zhu Sim; Seng Huat Ong Abstract: Publication date: Available online 26 October 2016 Source:Statistical Methodology Author(s): Shin Zhu Sim, Seng Huat Ong This paper considers a particular generalized inverse trinomial distribution, which may be regarded as the convolution of binomial and negative binomial distributions, for the statistical analysis of count data. This distribution has the flexibility to cater for under-, equi- and over-dispersion in the data. Some basic probabilistic properties and a tail approximation of the distribution are derived. Conditions for the numerical stability of the two-term probability recurrence formula are also examined to facilitate computation. For the purpose of statistical analysis, tests of the hypothesis of equi-dispersion by the score and likelihood ratio tests, a simulation study of their power, and parameter estimation by maximum likelihood and a probability-generating-function-based method are considered. The versatility of the distribution is illustrated by its application to real biological data sets which exhibit under- and over-dispersion. It is shown that the distribution fits better than the well-known generalized Poisson and COM-Poisson distributions.
Authors:Rasul A. Khan Abstract: Publication date: Available online 7 October 2016 Source:Statistical Methodology Author(s): Rasul A. Khan The problem of estimating the number of trials n in the binomial distribution B(n, p) is revisited by considering the large-sample model N(μ, cμ), the associated maximum likelihood estimator (MLE), and some sequential procedures. Asymptotic properties of the MLE of n via the normal model N(μ, cμ) are briefly described. Beyond the asymptotic properties, our main focus is on the sequential estimation of n. Let X_1, X_2, …, X_m, … be iid N(μ, cμ) (c > 0) random variables with an unknown mean μ = 1, 2, … and variance cμ, where c is known. The sequential estimation of μ is explored by a method initiated by Robbins (1970) and further pursued by Khan (1973). Various properties of the procedure, including the error probability and the expected sample size, are determined. An asymptotic optimality of the procedure is given. Sequential interval estimation and point estimation are also briefly discussed.
Authors:Xiang Zhan; Debashis Ghosh Abstract: Publication date: Available online 6 October 2016 Source:Statistical Methodology Author(s): Xiang Zhan, Debashis Ghosh Kernel-based association test (KAT) is a widely used tool in genetics association analysis. The performance of such a test depends on the choice of kernel. In this paper, we study the statistical power of a KAT using a Gaussian kernel. We explicitly develop a notion of analytical power function in this family of tests. We propose a novel approach to select the kernel so as to maximize the analytical power function of the test at a given test level (an upper bound on the probability of making a type I error). We assess some theoretical properties of our optimal estimator, and compare its performance with some similar existing alternatives using simulation studies. Neuroimaging data from an Alzheimer’s disease study is also used to illustrate the proposed kernel selection methodology.
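For reference, a minimal sketch of the Gaussian kernel on which this test family is built; the paper's power-maximizing choice of the bandwidth ρ and the association test statistic itself are not reproduced, and ρ = 5 below is an arbitrary assumption.

```python
# Minimal sketch: a Gaussian kernel matrix as used in kernel-based association
# tests; the power-maximizing bandwidth selection proposed in the paper is not
# reproduced here.
import numpy as np

def gaussian_kernel_matrix(Z, rho):
    """K_ij = exp(-||z_i - z_j||^2 / rho) for an n x p covariate matrix Z."""
    sq = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / rho)

rng = np.random.default_rng(7)
Z = rng.normal(size=(50, 5))
K = gaussian_kernel_matrix(Z, rho=5.0)
print(K.shape, K[0, 0])          # (50, 50), diagonal entries equal 1
```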
Authors:Gregory E. Wilding; Mark C. Baker Abstract: Publication date: Available online 6 October 2016 Source:Statistical Methodology Author(s): Gregory E. Wilding, Mark C. Baker The testing of equality of several Pearson correlations arises in a number of scientific fields. We surmise that in many such cases the alternatives of interest in practice are, indeed, order restricted, and therefore the researcher is best served by testing procedures developed for those specific alternatives. In this note we introduce a collection of tests for testing equality of k correlation coefficients against ordered alternatives, with an emphasis on the simple order. Specifically, we propose likelihood ratio tests and contrast tests based on the well-known Fisher Z transformation, as well as tests which make use of generalized variable methodologies. The proposed procedures are empirically compared with regard to type I and type II error rates via Monte Carlo simulation studies, and the use of the approaches is illustrated with an example. These tests are found to be vastly superior to tests for the general alternative, and the contrast tests based on the Fisher Z transformation are recommended for practice based on the observed test properties and their simplicity.
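For illustration, a minimal sketch of a contrast test on Fisher Z-transformed correlations against a simple (increasing) order; the linear contrast coefficients and the one-sided normal reference are assumptions that may differ from the contrasts studied in the paper.

```python
# Minimal sketch: a contrast test on Fisher Z-transformed sample correlations
# against an increasing ordered alternative, using linear contrast coefficients.
import numpy as np
from scipy.stats import norm

def fisher_z_contrast_test(r, n):
    """One-sided contrast test of H0: all correlations equal vs increasing order."""
    r, n = np.asarray(r, float), np.asarray(n, float)
    z = np.arctanh(r)                        # Fisher Z transformation
    k = len(r)
    c = np.arange(1, k + 1) - (k + 1) / 2.0  # linear contrast coefficients
    stat = np.sum(c * z) / np.sqrt(np.sum(c ** 2 / (n - 3)))  # Var(z_i) ~ 1/(n_i - 3)
    return stat, 1.0 - norm.cdf(stat)        # p-value for the ordered alternative

stat, p = fisher_z_contrast_test(r=[0.20, 0.35, 0.55], n=[80, 90, 85])
print(stat, p)
```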
Authors:Shikai Luo; Subhashis Ghosal Abstract: Publication date: Available online 27 September 2016 Source:Statistical Methodology Author(s): Shikai Luo, Subhashis Ghosal We propose a new variable selection and estimation technique for high dimensional single index models with unknown monotone smooth link function. Among many predictors, typically, only a small fraction of them have significant impact on prediction. In such a situation, more interpretable models with better prediction accuracy can be obtained by variable selection. In this article, we propose a new penalized forward selection technique which can reduce high dimensional optimization problems to several one dimensional optimization problems by choosing the best predictor and then iterating the selection steps until convergence. The advantage of optimizing in one dimension is that the location of optimum solution can be obtained with an intelligent search by exploiting smoothness of the criterion function. Moreover, these one dimensional optimization problems can be solved in parallel to reduce computing time nearly to the level of the one-predictor problem. Numerical comparison with the LASSO and the shrinkage sliced inverse regression shows very promising performance of our proposed method.
Authors:David Stibůrek Abstract: Publication date: Available online 22 July 2016 Source:Statistical Methodology Author(s): David Stibůrek For statistical inference on the drift parameter θ in the process X_t = θ a(t) + ∫_0^t b(s) dW_s, where a(t) and b(t) are known deterministic functions, a large number of options are available. We may, for example, base this inference on the differences between the observed values of the process at discrete times and their normality. Although such methods are very simple, it turns out that it is more appropriate to use sequential methods. For hypothesis testing about the drift parameter θ, it is more appropriate to standardize the observed process and to use sequential methods based on the first exit time of the observed process from a pre-specified interval up to some given time. These methods can be generalized to the case where the random part is a symmetric Itô integral or a continuous symmetric martingale.
Authors:Michael Espendiller; Maria Kateri Abstract: Publication date: Available online 11 January 2016 Source:Statistical Methodology Author(s): Michael Espendiller, Maria Kateri The odds ratio is the predominant measure of association in 2 × 2 contingency tables, which, for inferential purposes, is usually considered on the log-scale. Under an information theoretic set-up, it is connected to the Kullback-Leibler divergence. Considering a generalized family of divergences, the ϕ divergence, alternative association measures are derived for 2 × 2 contingency tables. Their properties are studied and asymptotic inference is developed. For some members of this family, the estimated association measures remain finite in the presence of a sampling zero while for a subset of these members the estimators of these measures have finite variance as well. Special attention is given to the power divergence, which is a parametric family. The role of its parameter λ , in terms of the asymptotic confidence intervals’ coverage probability and average relative length, is further discussed. In special probability table structures, for which the performance of the asymptotic confidence intervals for the classical log odds ratio is poor, the measure corresponding to λ = 1 / 3 is suggested as an alternative.
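For reference, the classical benchmark mentioned above: for a 2 × 2 table with counts n_{ij}, the log odds ratio, its usual asymptotic variance, and the resulting Wald interval are given below; the formula makes the sampling-zero problem explicit, since a zero cell yields an infinite estimate and variance.

```latex
\log\hat{\theta} = \log\frac{n_{11}\,n_{22}}{n_{12}\,n_{21}},
\qquad
\widehat{\operatorname{Var}}\!\left(\log\hat{\theta}\right)
  = \frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}},
\qquad
\log\hat{\theta} \;\pm\; z_{1-\alpha/2}\,
  \sqrt{\widehat{\operatorname{Var}}\!\left(\log\hat{\theta}\right)}.
```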
Authors:Afshin Almasi; Mohammad Reza Eshraghian; Abbas Moghimbeigi; Abbas Rahimi; Kazem Mohammad; Sadegh Fallahigilan Pages: 1 - 14 Abstract: Publication date: May 2016 Source:Statistical Methodology, Volume 30 Author(s): Afshin Almasi, Mohammad Reza Eshraghian, Abbas Moghimbeigi, Abbas Rahimi, Kazem Mohammad, Sadegh Fallahigilan Poisson or zero-inflated Poisson models often fail to fit count data because of over- or underdispersion relative to the Poisson distribution. Moreover, data may be correlated due to the hierarchical study design or the data collection methods. In this study, we propose a multilevel zero-inflated generalized Poisson (ZIGP) regression model that can address both over- and underdispersed count data. Random effects are assumed to be independent and normally distributed. Parameters are estimated by an EM algorithm, which falls within the general framework of maximum likelihood estimation. The performance of the approach is illustrated with data on an index of dental caries in 9-year-old children. In Monte Carlo simulations using various dispersion parameters, the multilevel ZIGP model yielded more accurate parameter estimates, especially for underdispersed data.
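For reference, one common parameterization of the zero-inflated generalized Poisson distribution underlying such models (a hedged reminder, not necessarily the authors' exact parameterization), with zero-inflation probability ω and dispersion parameter λ (λ > 0 over-dispersion, λ < 0 under-dispersion, λ = 0 the Poisson case):

```latex
P(Y=0) = \omega + (1-\omega)\,e^{-\theta},
\qquad
P(Y=y) = (1-\omega)\,\frac{\theta(\theta+\lambda y)^{y-1}\,e^{-\theta-\lambda y}}{y!},
\quad y = 1, 2, \ldots,
```

where the non-inflated generalized Poisson part Y* has E(Y*) = θ/(1 − λ) and Var(Y*) = θ/(1 − λ)^3.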
Authors:Mehrdad Vossoughi; S.M.T. Ayatollahi; Mina Towhidi; Seyyed Taghi Heydari Abstract: Publication date: Available online 29 December 2015 Source:Statistical Methodology Author(s): Mehrdad Vossoughi, S.M.T. Ayatollahi, Mina Towhidi, Seyyed Taghi Heydari In this paper, we propose a new two-sample distribution-free procedure for testing the group-by-time interaction effect in repeated measurements within a linear mixed model setting. The test statistic is based on the maximum difference of partial sums (MDPS) over time points between the two groups. Although the test has a biomedical focus, it can be applied in fields where the study is designed and monitored to be balanced and complete with equal sample sizes, as would generally be done in a controlled experiment. The asymptotic null distribution of the test statistic is also derived, based on the maxima of a Brownian bridge under two different conditions. The simulations revealed that MDPS performed markedly better than the commonly used unstructured multivariate approach (UMA) to profile analysis. However, the empirical powers of the MDPS test were convincingly close to those of the best-fitting linear mixed model (LMM).
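For illustration, a minimal sketch of a maximum-difference-of-partial-sums statistic for a balanced, complete two-group repeated-measures design; the standardization and the Brownian-bridge critical values derived in the paper are omitted, so this is only the raw statistic.

```python
# Minimal sketch: maximum difference of partial sums (MDPS-type) statistic for
# a balanced, complete two-group repeated-measures layout.
import numpy as np

def mdps_statistic(group1, group2):
    """group1, group2: arrays of shape (subjects, time points), equal sizes."""
    d = group1.mean(axis=0) - group2.mean(axis=0)   # mean difference at each time
    return np.max(np.abs(np.cumsum(d)))             # max |partial sum| over time

rng = np.random.default_rng(6)
t = np.arange(6)
g1 = rng.normal(0, 1, (20, 6)) + 0.3 * t            # group with a diverging trend
g2 = rng.normal(0, 1, (20, 6))
print(mdps_statistic(g1, g2))
```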
Authors:Pao-sheng Shen Abstract: Publication date: Available online 17 December 2015 Source:Statistical Methodology Author(s): Pao-sheng Shen We analyze doubly truncated data using semiparametric transformation models. It is demonstrated that the extended estimating equations of Cheng et al. (1995) can be used to analyze doubly truncated data. The asymptotic properties of the proposed estimators are derived. A simulation study is conducted to investigate the performance of the proposed estimators.
Authors:Gwo Dong Lin; Chin-Diew Lai; K. Govindaraju Abstract: Publication date: Available online 11 September 2015 Source:Statistical Methodology Author(s): Gwo Dong Lin, Chin-Diew Lai, K. Govindaraju We first review the basic properties of the Marshall–Olkin bivariate exponential distribution (BVE) and then investigate its correlation structure. We provide correct reasoning for deriving some properties of the Marshall–Olkin BVE and show that the correlation of the BVE is always smaller than that of its copula, regardless of the parameters. The latter implies that the BVE does not exhibit Lancaster's phenomenon (any nonlinear transformation of variables decreases the correlation in absolute value). The dependence structure of the BVE is also investigated.
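For reference, the Marshall–Olkin BVE discussed above has joint survival function and Pearson correlation

```latex
\bar{F}(x,y) = P(X > x,\, Y > y)
  = \exp\!\left\{-\lambda_1 x - \lambda_2 y - \lambda_{12}\max(x,y)\right\},
\quad x, y \ge 0,
\qquad
\rho = \frac{\lambda_{12}}{\lambda_1 + \lambda_2 + \lambda_{12}},
```

so that ρ ranges over [0, 1) and vanishes exactly when λ_{12} = 0, i.e., when X and Y are independent.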