Computational Statistics
Journal Prestige (SJR): 0.803; CiteScore: 1; Followers: 15
Hybrid journal (may contain Open Access articles)
ISSN (Print): 0943-4062; ISSN (Online): 1613-9658
Published by Springer-Verlag
• Dynamic sampling from a discrete probability distribution with a known
distribution of rates

Abstract: In this paper, we consider several efficient data structures for the problem of sampling from a dynamically changing discrete probability distribution, where some prior information is known on the distribution of the rates, in particular the maximum and minimum rate, and where the number of possible outcomes N is large. We consider three basic data structures: the Acceptance–Rejection method, the Complete Binary Tree and the Alias method. These can be used as building blocks in a multi-level data structure, where at each of the levels one of the basic data structures can be used, with the top level selecting a group of events and the bottom level selecting an element from a group. Depending on assumptions on the distribution of the rates of outcomes, different combinations of the basic structures can be used. We prove that for particular data structures the expected time of sampling and update is constant when the rate distribution satisfies certain conditions. We show that, for any distribution, by combining a tree structure with the Acceptance–Rejection method, an expected time of sampling and update of $$O\left( \log \log \left( {r_{max}}/{r_{min}}\right) \right)$$ is possible, where $$r_{max}$$ is the maximum rate and $$r_{min}$$ the minimum rate. We also discuss an implementation of a Two Levels Acceptance–Rejection data structure that allows expected constant time for sampling and amortized constant time for updates, assuming that $$r_{max}$$ and $$r_{min}$$ are known and the number of events is sufficiently large. We also present an experimental verification, highlighting the limits imposed by the constraints of a real-life setting.
PubDate: 2022-07-01
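The Acceptance–Rejection building block described in the abstract above can be sketched in a few lines. This is a minimal illustrative toy, not the paper's implementation; the function and variable names are hypothetical. Sampling costs an expected r_max / (average rate) trials, and a rate update is O(1) as long as the new rate stays in [r_min, r_max].

```python
import random

def ar_sample(rates, r_max):
    """Acceptance-Rejection sampling from a discrete distribution:
    draw a candidate index uniformly and accept it with probability
    rates[i] / r_max. Expected trials: r_max / mean(rates)."""
    n = len(rates)
    while True:
        i = random.randrange(n)                  # uniform candidate
        if random.random() < rates[i] / r_max:   # accept w.p. rates[i]/r_max
            return i

# Updates are O(1): changing rates[i] needs no rebuild while the
# new rate stays within the known [r_min, r_max] bounds.
rates = [3.0, 1.0, 2.0, 4.0]                     # total rate 10.0
counts = [0] * 4
random.seed(0)
for _ in range(20000):
    counts[ar_sample(rates, max(rates))] += 1
```

With these rates the empirical frequencies should settle near 0.3, 0.1, 0.2 and 0.4.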

• Covariance matrix testing in high dimension using random projections

Abstract: Estimation and hypothesis testing for the covariance matrix in high dimensions are challenging problems, as the traditional multivariate asymptotic theory is no longer valid. When the dimension is larger than or increasing with the sample size, standard likelihood-based tests for the covariance matrix perform poorly. Existing high-dimensional tests are either computationally expensive or have very weak control of type I error. In this paper, we propose a test procedure, CRAMP (covariance testing using random matrix projections), for testing hypotheses involving one or more covariance matrices using random projections. Projecting the high-dimensional data randomly into lower-dimensional subspaces alleviates the curse of dimensionality, allowing for the use of traditional multivariate tests. An extensive simulation study is performed to compare CRAMP against asymptotics-based high-dimensional test procedures. An application of the proposed method to two gene expression data sets is presented.
PubDate: 2022-07-01
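The projection step underlying this idea can be sketched as follows: a random matrix maps p-dimensional data (p much larger than n) into a k-dimensional subspace with k < n, where classical covariance machinery applies again. This is a minimal sketch of the general random-projection technique, not the CRAMP procedure itself; all names and dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 50, 500, 5                 # n samples, p >> n dimensions, project to k

X = rng.standard_normal((n, p))      # toy data with identity covariance
R = rng.standard_normal((p, k)) / np.sqrt(p)   # random projection matrix

Y = X @ R                            # projected data lives in k dims, k < n
S = np.cov(Y, rowvar=False)          # a k x k matrix: classical tests now apply
```

Because the columns of R have expected unit norm, the projected coordinates keep variances near those of the original isotropic data.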

• A Bayesian approach to the analysis of asymmetric association for two-way
contingency tables

Abstract: Recently, a subcopula-based asymmetric association measure was developed for the variables in two-way contingency tables. Here, we develop a fully Bayesian method to implement this measure, and examine its performance using simulated data and several real data sets on colorectal cancer. We use coverage probabilities and lengths of the interval estimators to compare the Bayesian approach with a large-sample method of analysis. In simulation studies, we find that the Bayesian method outperforms the large-sample method on average, and provides either similar or improved results for the real data analyses.
PubDate: 2022-07-01

• A sequential test and a sequential sampling plan based on the process
capability index Cpmk

Abstract: In this study we propose a sequential test for hypothesis testing on the $$C_{pmk}$$ process capability index. Furthermore, we propose a sequential sampling plan for lot acceptance based on $$C_{pmk}$$. We compare the statistical properties of the sequential procedures with the performance of the corresponding non-sequential methodologies by carrying out an extensive simulation study. The results show that the proposed sequential methods make it possible to reach decisions much more quickly, on average, than the fixed-sample-size procedures with the same discriminating power.
PubDate: 2022-07-01
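For reference, the point estimate of the index tested above is the standard C_pmk = min(USL − x̄, x̄ − LSL) / (3·sqrt(s² + (x̄ − T)²)), which penalizes both spread and departure from the target T. A minimal sketch (illustrative names, not the paper's sequential procedure):

```python
import math
import statistics

def cpmk(sample, lsl, usl, target):
    """Point estimate of the C_pmk process capability index:
    C_pmk = min(USL - xbar, xbar - LSL) / (3 * sqrt(s^2 + (xbar - T)^2))."""
    xbar = statistics.fmean(sample)
    s2 = statistics.variance(sample)                 # sample variance
    denom = 3.0 * math.sqrt(s2 + (xbar - target) ** 2)
    return min(usl - xbar, xbar - lsl) / denom

# A centered, low-variance process scores far higher than an off-target one.
centered = [10.0, 10.1, 9.9, 10.05, 9.95]
shifted  = [10.8, 10.9, 10.7, 10.85, 10.75]
c1 = cpmk(centered, lsl=9.0, usl=11.0, target=10.0)
c2 = cpmk(shifted,  lsl=9.0, usl=11.0, target=10.0)
```

The off-target sample is punished twice: its distance to the nearer specification limit shrinks the numerator, while (x̄ − T)² inflates the denominator.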

• The modified maximum likelihood estimators for the parameters of the
regression model under bivariate median ranked set sampling

PubDate: 2022-07-01

• A new estimation for INAR(1) process with Poisson distribution

Abstract: The first-order Poisson autoregressive model may be suitable in situations where the time series data are non-negative integer valued. In this article, we propose a new parameter estimator based on empirical likelihood. Our results show that it can lead to efficient estimators by making effective use of auxiliary information. As a by-product, a test statistic for the randomness of the parameter is given. Simulation results show that the proposed test statistic works well. We have applied the suggested method to a real count series.
PubDate: 2022-07-01
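The Poisson INAR(1) process in question is X_t = α ∘ X_{t−1} + ε_t, where ∘ is binomial thinning and ε_t ~ Poisson(λ); its stationary marginal is Poisson(λ/(1−α)). A simulation sketch (names illustrative; this is the model, not the paper's empirical-likelihood estimator):

```python
import math
import random

def simulate_inar1(alpha, lam, n, seed=0):
    """Simulate a Poisson INAR(1) series X_t = alpha o X_{t-1} + eps_t,
    where 'o' is binomial thinning and eps_t ~ Poisson(lam).
    The stationary marginal distribution is Poisson(lam / (1 - alpha))."""
    rng = random.Random(seed)

    def poisson(mu):
        # Knuth's multiplication method; fine for small mu
        L, k, p = math.exp(-mu), 0, 1.0
        while True:
            p *= rng.random()
            if p <= L:
                return k
            k += 1

    x = poisson(lam / (1.0 - alpha))        # start at the stationary mean
    series = [x]
    for _ in range(n - 1):
        thinned = sum(rng.random() < alpha for _ in range(x))   # alpha o X
        x = thinned + poisson(lam)
        series.append(x)
    return series

series = simulate_inar1(alpha=0.5, lam=2.0, n=5000)
```

With α = 0.5 and λ = 2, the long-run mean of the series should hover near λ/(1−α) = 4.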

• Applying the rescaling bootstrap under imputation for a multistage
sampling design

Abstract: In this paper, we propose a method that estimates the variance of an imputed estimator in a multistage sampling design. The method is based on the rescaling bootstrap for multistage sampling introduced by Preston (Surv Methodol 35(2):227–234, 2009). In its original version, this resampling method requires that the dataset include only complete cases and no missing values. We therefore propose two modifications for applying the method under nonresponse and imputation. These modifications are compared to other modifications in a Monte Carlo simulation study. The results of our simulation study show that our two proposed approaches are superior to the other modifications of the rescaling bootstrap and, in many situations, produce valid estimators for the variance of the imputed estimator in multistage sampling designs.
PubDate: 2022-07-01

• Hierarchical correction of p-values via an ultrametric tree running
Ornstein-Uhlenbeck process

Abstract: Statistical testing is classically used as an exploratory tool to search for association between a phenotype and many possible explanatory variables. This approach often leads to multiple testing under dependence. We assume a hierarchical structure between tests via an Ornstein-Uhlenbeck process on a tree. The process correlation structure is used for smoothing the p-values. We design a penalized estimation of the mean of the Ornstein-Uhlenbeck process for p-value computation. The performances of the algorithm are assessed via simulations. Its ability to discover new associations is demonstrated on a metagenomic dataset. The corresponding R package is available from https://github.com/abichat/zazou.
PubDate: 2022-07-01

• New approximate Bayesian computation algorithm for censored data

Abstract: Approximate Bayesian computation refers to a family of algorithms that perform Bayesian inference under intractable likelihoods. In this paper we propose replacing the distance metric in certain algorithms with hypothesis testing. The benefits are that summary statistics are no longer required and that censoring can be present in the observed data set without needing to simulate any censored data. We illustrate our proposed method through a nanotechnology application in which we estimate the concentration of particles in a liquid suspension. We prove that our method results in an approximation to the true posterior and that the parameter estimates are consistent. We further show, through comparative analysis, that it is more efficient than existing methods for censored data.
PubDate: 2022-07-01
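For context, the classical distance-based ABC rejection scheme that the paper modifies can be sketched as follows: keep prior draws whose simulated data fall close to the observed summary. The paper replaces the distance comparison with a hypothesis test and dispenses with the summary statistic; everything below (names, prior, model) is an illustrative toy.

```python
import random
import statistics

def abc_rejection(observed, prior_draw, simulate, n_draws, tol):
    """Basic distance-based ABC rejection sampler: accept a parameter
    draw when its simulated data's mean lies within `tol` of the
    observed mean (the summary statistic here)."""
    obs_mean = statistics.fmean(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_draw()
        sim = simulate(theta)
        if abs(statistics.fmean(sim) - obs_mean) < tol:
            accepted.append(theta)
    return accepted

rng = random.Random(2)
true_rate = 3.0
observed = [rng.expovariate(true_rate) for _ in range(200)]  # toy data

post = abc_rejection(
    observed,
    prior_draw=lambda: rng.uniform(0.5, 10.0),   # flat prior on the rate
    simulate=lambda th: [rng.expovariate(th) for _ in range(200)],
    n_draws=3000,
    tol=0.02,
)
```

The accepted draws concentrate around the true rate of 3; shrinking `tol` sharpens the approximation at the cost of fewer acceptances.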

• Characterizations and generalizations of the negative binomial
distribution

Abstract: In this paper, we give detailed descriptions of the Zero-Modified Negative Binomial distribution for analyzing count data. In particular, we study the characterizations and properties of this distribution, whose main advantage is its flexibility, which makes it suitable for modeling a wide range of overdispersed and underdispersed count data (which may or may not be caused by zero-modification, i.e., the inflation or deflation of zeroes), without requiring previous knowledge about any of these inherent data characteristics. We derive maximum likelihood estimates of the model parameters based on positive observations, and evaluate the loss of efficiency incurred by this procedure. We illustrate the suitability of this distribution on real data sets with different types of zero-modification.
PubDate: 2022-07-01

• Bayesian analysis of mixture autoregressive models covering the complete
parameter space

Abstract: Mixture autoregressive (MAR) models provide a flexible way to model time series with predictive distributions which depend on the recent history of the process and are able to accommodate asymmetry and multimodality. Bayesian inference for such models offers the additional advantage of incorporating the uncertainty in the estimated models into the predictions. We introduce a new way of sampling from the posterior distribution of the parameters of MAR models which allows for covering the complete parameter space of the models, unlike previous approaches. We also propose a relabelling algorithm to deal a posteriori with label switching. We apply our new method to simulated and real datasets, discuss the accuracy and performance of our new method, as well as its advantages over previous studies. The idea of density forecasting using MCMC output is also introduced.
PubDate: 2022-07-01

• Bayesian variable selection and estimation in quantile regression using a
quantile-specific prior

Abstract: The Asymmetric Laplace (AL) specification has become one of the standard statistical models for Bayesian quantile regression. In addition to fast convergence of Markov chain Monte Carlo, the AL specification guarantees posterior consistency under model misspecification. However, variable selection under such a specification is a daunting task because, realistically, the prior specification of regression parameters should take the quantile level into consideration. A quantile-specific g-prior has recently been developed for Bayesian variable selection in quantile regression, but it comes at the high price of a computational burden due to the intractability of the posterior distributions. In this paper, we develop a novel three-stage computational scheme for the foregoing quantile-specific g-prior, which starts with an expectation-maximization algorithm, is followed by a Gibbs sampler, and ends with an importance re-weighting step that improves the accuracy of the approximation. The performance of the proposed procedure is illustrated with simulations and a real-data application. Numerical results suggest that our procedure compares favorably with the Metropolis–Hastings algorithm.
PubDate: 2022-07-01

• Objective Bayesian group variable selection for linear model

Abstract: Predictor variables of a regression model are grouped in many application problems. For example, a factor in an analysis of variance can have several levels, or each original predictor variable in an additive model can be expanded into different-order polynomials or a set of basis functions. It is essential to select important groups as well as the individual variables within the selected groups. In this study, we propose an objective Bayesian procedure for selecting groups, and individual variables within the selected groups, in the regression model; it keeps the computational cost low even when the number of regression variables is large. In addition, we examine the consistency of the proposed group variable selection procedure. The proposed objective Bayesian approach is investigated using simulated and real data examples. Comparisons between penalized regression approaches, the Bayesian group lasso, and the proposed method are presented.
PubDate: 2022-07-01

• Robust estimation of the number of factors for the pair-elliptical factor
models

Abstract: In this paper, we investigate the robust estimation of the number of common factors in a high-dimensional factor model with pair-elliptically distributed idiosyncratic errors. Motivated by the heavy-tailed distributions of financial returns observed during the pandemic, we first introduce a pair-elliptical factor model by allowing the factors and noises to follow pairwise joint elliptical distributions. Compared with the elliptical factor model introduced in Fan et al. (Ann Stat 46:1383–1414, 2018), the pair-elliptical factor model has a richer structure under more relaxed assumptions. We propose two robust quantile-based estimators of the number of factors and obtain the asymptotic properties of the estimators under some mild conditions. Simulation studies and a real data analysis are then carried out to show the effectiveness of the proposed estimators.
PubDate: 2022-07-01

• Statistical inference in massive datasets by empirical likelihood

Abstract: In this paper, we propose a new statistical inference method for massive data sets that is simple and efficient, combining the divide-and-conquer approach with empirical likelihood. Compared with two popular methods (the bag of little bootstraps and the subsampled double bootstrap), our method makes full use of the data and reduces the computational burden. Extensive numerical studies and real data analysis demonstrate the effectiveness and flexibility of the proposed method. Furthermore, the asymptotic properties of our method are derived.
PubDate: 2022-07-01
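The divide-and-conquer idea the abstract builds on can be sketched for a simple point estimate: split the data into blocks, estimate on each block, and average the block estimates. The paper combines this with empirical likelihood for inference; the sketch below (illustrative names, plain mean estimation) shows only the splitting-and-combining skeleton.

```python
import random
import statistics

def divide_and_conquer_mean(data, n_blocks):
    """Divide-and-conquer point estimation: estimate on each block,
    then average the block estimates. With equal-size blocks this
    reproduces the full-sample mean while each pass touches only
    one block at a time."""
    size = len(data) // n_blocks
    block_means = [statistics.fmean(data[i * size:(i + 1) * size])
                   for i in range(n_blocks)]
    return statistics.fmean(block_means)

rng = random.Random(3)
data = [rng.gauss(5.0, 2.0) for _ in range(100000)]
est = divide_and_conquer_mean(data, n_blocks=100)
full = statistics.fmean(data)
```

For linear statistics like the mean the combination is exact; for nonlinear ones the block-and-average scheme trades a small bias for a large reduction in per-block memory and computation.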

• Smallest covering regions and highest density regions for discrete
distributions

Abstract: This paper examines the problem of computing a canonical smallest covering region for an arbitrary discrete probability distribution. This optimisation problem is similar to the classical 0–1 knapsack problem, but it involves optimisation over a set that may be countably infinite, raising a computational challenge that makes the problem non-trivial. To solve the problem we present theorems giving useful conditions for an optimising region and we develop an iterative one-at-a-time computational method to compute a canonical smallest covering region. We show how this can be programmed in pseudo-code and we examine the performance of our method. We compare this algorithm with other algorithms available in statistical computation packages to compute highest density regions (HDRs). We find that our method is the only one that accurately computes HDRs for arbitrary discrete distributions.
PubDate: 2022-07-01
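For a finite support, the one-at-a-time idea reduces to a greedy scheme: add outcomes in decreasing order of probability until the requested coverage is reached. The paper's contribution is handling countably infinite supports and canonical tie-breaking; the sketch below covers only the finite, tie-free core, with illustrative names.

```python
import math

def discrete_hdr(pmf_items, coverage):
    """Greedy highest-density-region computation for a discrete pmf:
    include outcomes one at a time in decreasing probability until
    the accumulated mass reaches the requested coverage."""
    region, total = [], 0.0
    for outcome, p in sorted(pmf_items, key=lambda t: -t[1]):
        if total >= coverage:
            break
        region.append(outcome)
        total += p
    return sorted(region), total

# Poisson(3) pmf, truncated far into the tail.
lam = 3.0
pmf = [(k, math.exp(-lam) * lam ** k / math.factorial(k)) for k in range(30)]
region, mass = discrete_hdr(pmf, coverage=0.90)
```

For Poisson(3) the 90% HDR comes out as the contiguous set {1, …, 6} with mass about 0.917; for multimodal pmfs the same greedy rule naturally returns a non-contiguous region.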

• Kolmogorov–Smirnov simultaneous confidence bands for time series
distribution function

Abstract: Claims about distributions of time series are often unproven assertions rather than substantiated conclusions, for lack of hypothesis-testing tools. In this work, Kolmogorov–Smirnov-type simultaneous confidence bands (SCBs) are constructed based on simple random samples (SRSs) drawn from realizations of time series, together with smooth SCBs using a kernel distribution estimator (KDE) instead of the empirical cumulative distribution function of the SRS. All SCBs are shown to enjoy the same limiting distribution as the standard Kolmogorov–Smirnov statistic for an i.i.d. sample, which is validated in simulation experiments on various time series. Computing these SCBs for the standardized S&P 500 daily returns data leads to some rather unexpected findings: Student's t-distributions with degrees of freedom no less than 3, as well as the normal distribution, are all acceptable versions of the standardized daily returns series' distribution, with proper rescaling. These findings challenge the long-held belief that the daily financial returns distribution is fat-tailed and leptokurtic.
PubDate: 2022-07-01
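The basic ingredient is the Kolmogorov–Smirnov statistic computed on a simple random sample drawn from a time-series realization, compared against a candidate distribution function. A minimal sketch (illustrative names; a Gaussian AR(1) stands in for the time series, and no SCB is constructed):

```python
import math
import random

def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov statistic sup_x |F_n(x) - F(x)| of a sample
    against a reference cdf, evaluated at the jump points of F_n."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        fx = cdf(x)
        d = max(d, abs((i + 1) / n - fx), abs(i / n - fx))
    return d

std_normal_cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

rng = random.Random(4)
# Gaussian AR(1) realization; its stationary marginal is exactly normal,
# so an SRS of the standardized series behaves like i.i.d. N(0, 1) draws.
x, series = 0.0, []
for _ in range(5000):
    x = 0.5 * x + rng.gauss(0.0, 1.0)
    series.append(x)
sd = math.sqrt(1.0 / (1.0 - 0.25))       # stationary sd of the AR(1)
srs = rng.sample([v / sd for v in series], 300)
d = ks_statistic(srs, std_normal_cdf)
```

For a correctly specified marginal, d should fall well below the usual 5% critical value 1.36/sqrt(n) most of the time, which is the behaviour the paper's SCBs formalize.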

• Laplace regression with clustered censored data

Abstract: In survival analysis, data may be correlated or clustered because of shared features such as genes and environmental background. A common approach to accommodating clustered data is the Cox frailty model; however, its proportional hazards assumption and the difficulty of interpreting hazard ratios can lead to misinterpretation of the direct effect on the time to event. In this paper, we consider a Laplace quantile regression model for clustered survival data, which directly interprets the effect of covariates on the time to event. A Bayesian approach with a Markov chain Monte Carlo method was used to fit the model. The results of a simulation study evaluating the proposed model show that the Laplace regression model with a frailty term performed well across different scenarios, with coverage rates of the pointwise 95% CIs close to the nominal level (0.95). An application to breast cancer data is presented to illustrate the theory and method developed in this paper.
PubDate: 2022-07-01

• On community structure validation in real networks

Abstract: Community structure is a commonly observed feature of real networks. The term refers to the presence in a network of groups of nodes (communities) that feature high internal connectivity but are poorly connected to each other. Whereas the issue of community detection has been addressed in several works, the problem of validating a partition of nodes as a good community structure for a real network has received considerably less attention and remains an open issue. We propose a set of indices for community structure validation of network partitions, based on a hypothesis-testing procedure that assesses the distribution of links between and within communities. Using both simulations and real data, we illustrate how the proposed indices can be employed to compare the adequacy of different partitions of nodes as community structures in a given network, to assess whether two networks share the same or similar community structures, and to evaluate the performance of different network clustering algorithms.
PubDate: 2022-07-01
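The quantity such indices assess can be illustrated with raw within- versus between-community link counts: a good partition concentrates edges inside communities. A toy sketch (illustrative names; the paper's indices wrap a hypothesis test around counts like these):

```python
def within_between_counts(edges, partition):
    """Count within- and between-community edges for a node partition.
    A good community structure has a much larger 'within' count than
    the between-count, relative to what random wiring would give."""
    within = between = 0
    for u, v in edges:
        if partition[u] == partition[v]:
            within += 1
        else:
            between += 1
    return within, between

# Two 4-node cliques joined by a single bridge edge.
clique = lambda nodes: [(a, b) for i, a in enumerate(nodes) for b in nodes[i + 1:]]
edges = clique([0, 1, 2, 3]) + clique([4, 5, 6, 7]) + [(3, 4)]
partition = {n: 0 if n < 4 else 1 for n in range(8)}
w, b = within_between_counts(edges, partition)
```

Here the clique partition yields 12 within-community edges against a single between-community edge, the extreme case the validation indices are designed to detect.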

• Hierarchical and multivariate regression models to fit correlated
asymmetric positive continuous outcomes

Abstract: In the extant literature, hierarchical models typically assume a flexible distribution for the random effects. The random-effects approach has been used in the inferential procedure of generalized linear mixed models. In this paper, we propose a random-intercept gamma mixed model to fit correlated asymmetric positive continuous outcomes. The generalized log-gamma (GLG) distribution is assumed as an alternative to the normality assumption for the random intercept. Numerical results demonstrate the impact on the maximum likelihood (ML) estimator when the random-effect distribution is misspecified. The extended inverted Dirichlet (EID) distribution is derived from the random-intercept gamma-GLG model, leading to the EID regression model under a particular parameter setting of the hierarchical model. Monte Carlo simulation studies are performed to evaluate the asymptotic behavior of the ML estimators of the proposed models. Diagnostic methods based on quantile residuals and the COVARATIO statistic are used to assess departures from the EID regression model and identify atypical subjects. Two applications with real data are presented to illustrate the proposed methodology.
PubDate: 2022-07-01

JournalTOCs
School of Mathematical and Computer Sciences
Heriot-Watt University
Edinburgh, EH14 4AS, UK
Email: journaltocs@hw.ac.uk
Tel: +00 44 (0)131 4513762