Computational Statistics
Journal Prestige (SJR): 0.803
Citation Impact (CiteScore): 1
Number of Followers: 15
 
  Hybrid journal (may contain Open Access articles)
ISSN (Print) 0943-4062 - ISSN (Online) 1613-9658
Published by Springer-Verlag
  • Two-stage unrelated randomized response model to estimate the prevalence
           of a sensitive attribute

      Abstract: The present work proposes a new two-stage unrelated randomized response model to estimate the mean number of individuals who possess a rare sensitive attribute in a given population using the Poisson probability distribution, covering both the case where the proportion of the rare non-sensitive unrelated attribute is known and the case where it is unknown. The properties of the proposed model are examined. The variance of the proposed randomized response model is smaller than those of the Land et al. (Stat J Theor Appl Stat, 46(3):351–360, 2012) and Singh and Tarray (Model Assist Stat Appl, 10(2):129–138, 2015) models for estimating the sensitive characteristic under study, so the proposed model provides a more efficient unbiased estimator of the mean number of individuals. The procedure also introduces a measure of privacy protection of respondents and compares randomized response models in terms of efficiency and privacy protection. Empirical illustrations are presented to support the theoretical results, and suitable recommendations are put forward to survey statisticians/practitioners.
      PubDate: 2023-01-30
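
      As background for the mechanics, here is a minimal Python sketch of the classical single-stage unrelated-question randomized response estimator, a simpler ancestor of the two-stage model proposed here; all parameter values are illustrative:

        import numpy as np

        rng = np.random.default_rng(42)
        n = 10_000       # respondents (illustrative)
        p = 0.7          # probability of receiving the sensitive question
        pi_s = 0.05      # true prevalence of the sensitive attribute
        pi_u = 0.30      # known prevalence of the unrelated attribute

        gets_sensitive = rng.random(n) < p            # which question each respondent draws
        answer = np.where(gets_sensitive,
                          rng.random(n) < pi_s,       # truthful sensitive response
                          rng.random(n) < pi_u)       # unrelated-question response

        lam = answer.mean()                           # observed "yes" proportion
        pi_hat = (lam - (1 - p) * pi_u) / p           # unbiased estimator of pi_s
        se = np.sqrt(lam * (1 - lam) / n) / p         # its estimated standard error
        print(f"pi_hat = {pi_hat:.4f} (true {pi_s}), se = {se:.4f}")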
       
  • Compositional PLS biplot based on pivoting balances: an application to
           explore the association between 24-h movement behaviours and adiposity

      Abstract: Movement behaviour data are compositional in nature; therefore, the logratio methodology has been demonstrated to be appropriate for their statistical analysis. Compositional data can be mapped into ordinary real space through new sets of variables (orthonormal logratio coordinates) representing balances between the original compositional parts. Geometric rotation between orthonormal logratio coordinate systems can be used to extract relevant information from any of them. We exploit this idea to introduce the concept of pivoting balances, which facilitates the construction and use of interpretable balances according to the purpose of the data analysis. Moreover, graphical representation through ternary diagrams has ordinarily been used to explore time-use compositions consisting of, or being amalgamated into, three parts. Data dimension reduction techniques can, however, serve well for visualisation and facilitate understanding in the case of larger compositions. We here develop suitable pivoting balance coordinates that, in combination with an adapted formulation of compositional partial least squares regression biplots, enable meaningful visualisation of more complex time-use patterns and their relationships with an outcome variable. The use and features of the proposed method are illustrated in a study examining the association between movement behaviours and adiposity in a sample of Czech school-aged girls. The results suggest that an adequate strategy for obesity prevention in this group would be to focus on achieving a positive balance of vigorous physical activity in combination with sleep against the other daily behaviours.
      PubDate: 2023-01-28
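
      A minimal sketch of pivot (orthonormal logratio) coordinates in Python with NumPy, using the standard pivot-coordinate formula; the four-part 24-h composition below is invented for illustration:

        import numpy as np

        def pivot_coordinates(x):
            """Map rows of a D-part composition to D-1 pivot (ilr) coordinates."""
            x = np.asarray(x, dtype=float)
            D = x.shape[1]
            z = np.empty((x.shape[0], D - 1))
            for j in range(D - 1):
                gm = np.exp(np.log(x[:, j + 1:]).mean(axis=1))  # geometric mean of remaining parts
                z[:, j] = np.sqrt((D - j - 1) / (D - j)) * np.log(x[:, j] / gm)
            return z

        # 24-h behaviours (hours): sleep, sedentary, light PA, vigorous PA (toy values)
        comp = np.array([[8.0, 9.0, 6.0, 1.0],
                         [7.5, 10.0, 5.5, 1.0]])
        comp = comp / comp.sum(axis=1, keepdims=True)           # close to proportions
        print(pivot_coordinates(comp))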
       
  • Robust singular spectrum analysis: comparison between classical and robust
           approaches for model fit and forecasting

      Abstract: Singular spectrum analysis is a powerful and widely used non-parametric method to analyse and forecast time series. Although singular spectrum analysis has proven to outperform traditional parametric methods for model fit and model forecasting, one of the steps of this algorithm is the singular value decomposition of the trajectory matrix, which is very sensitive to the presence of outliers because it uses the \(L_{2}\) norm optimization. Therefore, the presence of outlying observations has a significant impact on the singular spectrum analysis reconstruction and forecasts. The main aim of this paper is to introduce four robust alternatives to singular spectrum analysis, where the singular value decomposition is replaced by: (i) the robust regularized singular value decomposition; (ii) a robust principal component analysis algorithm that combines projection pursuit ideas with robust scatter matrix estimation; (iii) robust principal component analysis based on the grid algorithm and projection pursuit; and (iv) robust principal component analysis based on a robust covariance matrix. The four proposed robust singular spectrum analysis alternatives are compared with classical singular spectrum analysis and other available robust singular spectrum analysis algorithms, in terms of model fit and model forecasting, via Monte Carlo simulations based on synthetic and real data, considering several contamination scenarios.
      PubDate: 2023-01-20
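
      A compact sketch of the classical (non-robust) SSA reconstruction step that the paper robustifies: embed the series into a trajectory matrix, take its SVD, truncate, and Hankel-average back. Window length and rank below are illustrative:

        import numpy as np

        def ssa_reconstruct(y, L, r):
            """Classical SSA: embed, SVD the trajectory matrix, keep r components,
            and reconstruct by diagonal (Hankel) averaging."""
            N = len(y)
            K = N - L + 1
            X = np.column_stack([y[i:i + L] for i in range(K)])  # L x K trajectory matrix
            U, s, Vt = np.linalg.svd(X, full_matrices=False)     # L2-based, outlier-sensitive
            Xr = (U[:, :r] * s[:r]) @ Vt[:r]                     # rank-r approximation
            rec = np.zeros(N)
            counts = np.zeros(N)
            for k in range(K):                                   # average anti-diagonals
                rec[k:k + L] += Xr[:, k]
                counts[k:k + L] += 1
            return rec / counts

        t = np.arange(200)
        y = np.sin(2 * np.pi * t / 25) + 0.3 * np.random.default_rng(1).normal(size=200)
        smooth = ssa_reconstruct(y, L=50, r=2)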
       
  • A pivot-based simulated annealing algorithm to determine oblique splits
           for decision tree induction

      Abstract: We describe a new simulated annealing algorithm to compute near-optimal oblique splits in the context of decision tree induction. The algorithm can be interpreted as a walk on the cells of a hyperplane arrangement defined by the observations in the training set. The cells of this hyperplane arrangement correspond to subsets of oblique splits that divide the feature space in the same manner, and the vertices of this arrangement reveal multiple neighboring solutions. We use a pivoting strategy to iterate over the vertices and to explore this neighborhood. Embedding this neighborhood search in a simulated annealing framework allows the search to escape local minima and increases the probability of finding globally optimal solutions. To overcome problems related to degeneracy, we rely on a lexicographic pivoting scheme. Our experimental results indicate that our approach is well suited for inducing small and accurate decision trees and is capable of outperforming existing univariate and oblique decision tree induction algorithms. Furthermore, oblique decision trees obtained with this method are competitive with other popular prediction models.
      PubDate: 2023-01-18
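
      A rough sketch of a plain Metropolis-style simulated annealing search for an oblique split on binary labels. This uses generic random perturbations rather than the paper's pivot-based walk on the hyperplane arrangement, and the cooling schedule and step sizes are arbitrary:

        import numpy as np

        rng = np.random.default_rng(0)

        def split_error(w, b, X, y):
            """Misclassification rate of the oblique split w.x <= b (majority vote per side)."""
            side = X @ w <= b
            err = 0.0
            for s in (side, ~side):
                if s.any():
                    err += min(y[s].mean(), 1 - y[s].mean()) * s.sum()
            return err / len(y)

        def anneal_split(X, y, iters=2000, T0=0.1):
            w = rng.normal(size=X.shape[1]); w /= np.linalg.norm(w)
            b = np.median(X @ w)
            cur = best = split_error(w, b, X, y)
            best_wb = (w, b)
            for i in range(iters):
                T = T0 * (1 - i / iters)                     # linear cooling schedule
                w2 = w + rng.normal(scale=0.1, size=w.size)  # random perturbation
                w2 /= np.linalg.norm(w2)
                b2 = np.quantile(X @ w2, rng.random())       # random cut along the projection
                e2 = split_error(w2, b2, X, y)
                if e2 < cur or rng.random() < np.exp(-(e2 - cur) / max(T, 1e-9)):
                    w, b, cur = w2, b2, e2                   # Metropolis acceptance
                    if cur < best:
                        best, best_wb = cur, (w, b)
            return best_wb, best

        X = rng.normal(size=(300, 2))
        y = (X @ np.array([1.0, -2.0]) + 0.5 * rng.normal(size=300) > 0).astype(int)
        (w, b), err = anneal_split(X, y)
        print(err)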
       
  • Correction to: Tempered expectation-maximization algorithm for the
           estimation of discrete latent variable models

      PubDate: 2023-01-17
       
  • Distributed quantile regression for longitudinal big data

      Abstract: Longitudinal data, measurements taken from the same subjects over time, appear routinely in many scientific fields, such as biomedical science, public health, ecology and environmental sciences. With the rapid development of information technology, modern longitudinal data are becoming massive in volume and high dimensional, and hence often require distributed analysis in real-world applications. Standard divide-and-conquer techniques do not apply directly to longitudinal big data due to within-subject dependence. In this paper, we focus on developing a distributed algorithm to support quantile regression (QR) analysis of longitudinal big data, which currently remains an open and challenging issue. We employ weighted quantile regression (WQR) to accommodate the correlation in longitudinal big data, and parallelize the WQR estimation process with a two-stage algorithm to support distributed computing. Based on weights estimated in the first stage by the Newton–Raphson algorithm, the second stage solves the WQR problem using the multi-block alternating direction method of multipliers (ADMM). Simulation studies show that, compared to traditional non-distributed algorithms, our proposed method has favorable estimation accuracy and is computationally more efficient in both non-distributed and distributed environments. Further, we analyze an air quality data set to illustrate the practical performance of the method.
      PubDate: 2023-01-17
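
      To give a feel for the distributed idea, here is a naive one-shot divide-and-conquer baseline in Python with statsmodels: fit an ordinary quantile regression on each data block and average the coefficients. It ignores within-subject correlation, which is precisely the issue the paper's WQR-ADMM algorithm addresses; all data below are simulated:

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(7)
        n, p, blocks, tau = 20_000, 5, 8, 0.5
        X = rng.normal(size=(n, p))
        beta = np.arange(1, p + 1, dtype=float)
        y = X @ beta + rng.standard_t(df=3, size=n)      # heavy-tailed noise

        # Fit QR independently on each block, then average the estimates.
        coefs = []
        for Xb, yb in zip(np.array_split(X, blocks), np.array_split(y, blocks)):
            fit = sm.QuantReg(yb, sm.add_constant(Xb)).fit(q=tau)
            coefs.append(fit.params)
        beta_hat = np.mean(coefs, axis=0)
        print(beta_hat.round(3))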
       
  • A batch process for high dimensional imputation

      Abstract: This paper describes a correlation-based batch process for addressing high dimensional imputation problems. There are relatively few algorithms designed to efficiently handle imputation of missing data in high dimensional contexts. Fewer still are flexible enough to natively handle mixed-type data, often requiring lengthy pre-processing to get the data into proper shape and then post-processing to return the data to usable form. Such requirements, as well as the assumptions many methods make (e.g., about the data generating process), limit their performance, flexibility, and usability. Building on a set of complementary algorithms for nonparametric imputation via chained random forests, I introduce a batching process that eases the computational costs associated with high dimensional imputation by subsetting data based on ranked cross-feature absolute correlations. The algorithm then imputes each batch separately and joins the imputed subsets in the final step. The process, hdImpute, is fast and accurate. As a result, high dimensional imputation is more accessible, and researchers are not forced to decide between speed and accuracy. Complementary software is available in the form of an R package, openly developed on GitHub under the MIT public license. In the spirit of open science, collaboration and engagement with the actively developing software are encouraged.
      PubDate: 2023-01-17
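
      A guess at the spirit of the batching step, sketched in Python rather than the package's R: rank features by mean absolute cross-correlation and cut the ranked list into batches, which would then be imputed separately and joined. This is not the hdImpute implementation itself:

        import numpy as np
        import pandas as pd

        def correlation_batches(df, batch_size):
            """Order numeric columns by mean absolute correlation with the other
            columns, then cut the ranked list into batches."""
            corr = df.corr(numeric_only=True).abs()
            # Diagonal entries are 1, so subtract them out of the row means.
            score = (corr.sum(axis=1) - 1.0) / (corr.shape[0] - 1)
            ranked = score.sort_values(ascending=False).index
            return [list(ranked[i:i + batch_size])
                    for i in range(0, len(ranked), batch_size)]

        df = pd.DataFrame(np.random.default_rng(3).normal(size=(100, 10)),
                          columns=[f"x{i}" for i in range(10)])
        for batch in correlation_batches(df, batch_size=4):
            print(batch)   # impute each batch separately, then join the subsets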
       
  • Interactive graphics for visually diagnosing forest classifiers in R

      Abstract: This article describes structuring data and constructing plots to explore forest classification models interactively. A forest classifier is an example of an ensemble, since it is produced by bagging multiple trees. The process of bagging and combining results from multiple trees produces numerous diagnostics which, with interactive graphics, can provide a lot of insight into class structure in high dimensions. Various aspects of models are explored in this article: model complexity, individual model contributions, variable importance and dimension reduction, and uncertainty in prediction associated with individual observations. The ideas are applied to the random forest algorithm and the projection pursuit forest, but could be applied more broadly to other bagged ensembles, helping to address the interpretability deficit of these methods. Interactive graphics are built in R using the ggplot2, plotly, and shiny packages.
      PubDate: 2023-01-12
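
      One of the diagnostics described, the per-tree vote matrix, is easy to extract from any bagged ensemble. A sketch with scikit-learn in Python (the article itself works in R with ggplot2, plotly, and shiny):

        import numpy as np
        from sklearn.datasets import load_iris
        from sklearn.ensemble import RandomForestClassifier

        X, y = load_iris(return_X_y=True)
        forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

        # Vote matrix: one row per observation, one column per tree.
        votes = np.column_stack([forest.classes_[t.predict(X).astype(int)]
                                 for t in forest.estimators_])
        agreement = (votes == y[:, None]).mean(axis=1)   # per-case vote proportion

        # Low-agreement cases are the ones worth inspecting in linked plots
        # (out-of-bag predictions would be the less optimistic choice here).
        print(np.argsort(agreement)[:10])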
       
  • Explainable Ensemble Trees

      Abstract: Ensemble methods are supervised learning algorithms that provide highly accurate solutions by training many models. Random forest is probably the most widely used ensemble method for regression and classification problems. It builds decision trees on different samples and takes their majority vote for classification and their average for regression. However, such an algorithm suffers from a lack of explainability and thus does not allow users to understand how particular decisions are made. To improve on that, we propose a new way of interpreting an ensemble tree structure. Starting from a random forest model, our approach is able to explain graphically the relationship structure between the response variable and the predictors. The proposed method appears useful in all real-world cases where model interpretation for predictive purposes is crucial. The proposal is evaluated by means of real data sets.
      PubDate: 2023-01-12
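
      The paper's graphical explanation method is its own contribution; as a generic point of comparison, variable importance for a random forest can be probed with permutation importance in scikit-learn:

        from sklearn.datasets import load_breast_cancer
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.inspection import permutation_importance
        from sklearn.model_selection import train_test_split

        X, y = load_breast_cancer(return_X_y=True)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
        forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)

        # Permutation importance: drop in held-out accuracy when one feature is shuffled.
        imp = permutation_importance(forest, X_te, y_te, n_repeats=20, random_state=0)
        top = imp.importances_mean.argsort()[::-1][:5]
        print(top, imp.importances_mean[top].round(3))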
       
  • A simple portmanteau test with data-driven truncation point

      Abstract: Time series forecasting is an important application of many statistical methods. When it is appropriate to assume that the data may be projected towards the future based on the past history of the dataset, a preliminary examination is usually required to ensure that the data sequence is autocorrelated. This assumption, obvious as it may seem, can be made the object of a formal test of hypotheses. The most widely used test is the portmanteau test, i.e., a sum of the squared standardized autocorrelations up to an appropriate maximum lag (the truncation point). The choice of the truncation point is not obvious and may be data-driven, exploiting supplementary information such as the largest autocorrelation and the lag where that maximum is found. In this paper, we propose a portmanteau test with a truncation point equal to the lag of the largest (in absolute value) estimated autocorrelation. Theoretical and simulation-based comparisons of size and power are performed against competing portmanteau tests, and encouraging results are obtained.
      PubDate: 2023-01-02
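
      A sketch of the proposed statistic's ingredients in Python: a Ljung-Box-type sum truncated at the lag of the largest absolute autocorrelation. The chi-square p-value below is only the naive reference; the paper works out the appropriate behaviour of the data-driven statistic:

        import numpy as np
        from scipy import stats

        def data_driven_portmanteau(y, max_lag=30):
            """Ljung-Box statistic truncated at the lag of the largest |autocorrelation|."""
            y = np.asarray(y, dtype=float) - np.mean(y)
            n = len(y)
            acf = np.array([np.dot(y[:-k], y[k:]) for k in range(1, max_lag + 1)])
            acf /= np.dot(y, y)
            m = int(np.argmax(np.abs(acf))) + 1          # data-driven truncation point
            q = n * (n + 2) * np.sum(acf[:m] ** 2 / (n - np.arange(1, m + 1)))
            # Naive chi-square(m) reference only; the truncation point is random.
            return q, m, stats.chi2.sf(q, df=m)

        rng = np.random.default_rng(5)
        y = np.convolve(rng.normal(size=500), [1, 0.6], mode="valid")  # MA(1) series
        print(data_driven_portmanteau(y))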
       
  • Computer algorithms of lower-order confounding in regular designs

      Abstract: In the design of experiments, an optimal design should minimize the confounding between factorial effects, especially between main effects and two-factor interaction effects. The general minimum lower-order confounding (GMC) criterion can be used to choose optimal regular designs based on the aliased component-number pattern. This paper studies the confounding properties of lower-order effects and provides several computer algorithms to calculate the lower-order confounding in regular designs, including a search algorithm to obtain GMC designs. The algorithms are implemented in Python, and several examples are analyzed to illustrate their effectiveness.
      PubDate: 2022-12-30
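
      For readers new to regular designs, a toy construction in Python of a 2^(5-2) design from generators D = AB and E = AC, whose defining relation I = ABD = ACE (= BCDE) is what drives the lower-order confounding being counted; this is not the paper's GMC search:

        import itertools
        import numpy as np

        # Basic factors A, B, C: full 2^3 factorial in -1/+1 coding.
        runs = np.array(list(itertools.product([-1, 1], repeat=3)))
        A, B, C = runs.T
        D, E = A * B, A * C                        # generators D = AB, E = AC
        design = np.column_stack([A, B, C, D, E])  # the 8-run 2^(5-2) design

        # Words of the defining relation hold on every run: ABD = ACE = +1.
        print(np.all(A * B * D == 1), np.all(A * C * E == 1))
        # Hence, e.g., main effect A is aliased with BD and with CE.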
       
  • Applications of resampling methods in multivariate Liu estimator

      Abstract: Multicollinearity among independent variables is one of the most common problems in regression models. The aftereffects of this problem, such as ill-conditioning, instability of estimators, and an inflated mean squared error of the ordinary least squares (OLS) estimator, are the same in the multivariate linear regression model (MLRM) as in linear regression models. Several approaches to combat multicollinearity have been presented in the literature. The Liu estimator (LE), a well-known estimator in this connection, has been used by researchers in linear, generalized linear, and nonlinear regression models in recent years. In this paper, for the first time, the LE and the jackknifed Liu estimator (JLE) are investigated in the MLRM. To improve the estimators in the mean squared error sense, two well-known resampling methods, the jackknife and the bootstrap, are used. Finally, OLS, LE, and JLE are compared via a simulation study and a real data set, using resampling methods in the MLRM.
      PubDate: 2022-12-30
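
      A minimal sketch of the univariate-response Liu estimator and a pairs bootstrap around it, using the identity beta_d = (X'X + I)^{-1}(X'y + d * beta_OLS); the collinear toy data and d = 0.5 are illustrative:

        import numpy as np

        def liu_estimator(X, y, d):
            """Liu estimator: (X'X + I)^{-1} (X'y + d * beta_OLS)."""
            XtX = X.T @ X
            beta_ols = np.linalg.solve(XtX, X.T @ y)
            return np.linalg.solve(XtX + np.eye(X.shape[1]), X.T @ y + d * beta_ols)

        rng = np.random.default_rng(11)
        n, p = 60, 4
        Z = rng.normal(size=(n, 1))
        X = Z + 0.05 * rng.normal(size=(n, p))      # strongly collinear columns
        y = X @ np.ones(p) + rng.normal(size=n)

        boot = []
        for _ in range(500):
            i = rng.integers(0, n, n)               # resample cases with replacement
            boot.append(liu_estimator(X[i], y[i], d=0.5))
        boot = np.array(boot)
        print(boot.mean(axis=0).round(3), boot.var(axis=0).round(4))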
       
  • Exact inference for progressively Type-I censored step-stress accelerated
           life test under interval monitoring

      Abstract: Thanks to continuously advancing technology and manufacturing processes, products and devices are becoming highly reliable, but performing life tests on these products at normal operating conditions has become extremely difficult, if not impossible, due to their long lifespans. This problem is solved by accelerated life tests, in which the test units are subjected to stress levels higher than the normal usage level so that information on the lifetime parameters can be obtained more quickly. The lifetime at the design condition is then estimated through extrapolation using a regression model. Although continuous inspection of the exact failure times is ideal, the exact failure times of test units may not be available in practice due to technical limitations and/or budgetary constraints; only the failure counts are collected at certain time points during the test (i.e., interval inspection). In this work, we consider the progressively Type-I censored step-stress accelerated life test under the assumption that the lifetime of each test unit is exponentially distributed. Under this setup, we obtain the maximum likelihood estimator (MLE) of the mean time to failure at each stress level and derive its exact sampling distribution under the condition that its existence is ensured. Using the exact distribution of the MLE, as well as its asymptotic distribution and the parametric bootstrap method, we then discuss the construction of confidence intervals for the mean parameters, and their performance is assessed through Monte Carlo simulations. Finally, an example is presented to illustrate all the methods of inference discussed here.
      PubDate: 2022-12-30
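
      To illustrate the interval-inspection likelihood in the simplest case, a sketch of the exponential MLE from failure counts at a single stress level; the counts and inspection schedule are invented, and the paper's step-stress, progressively censored setting adds stress changes and withdrawals on top of this:

        import numpy as np
        from scipy.optimize import minimize_scalar

        inspect_times = np.array([0.0, 5.0, 10.0, 15.0, 20.0])  # inspection schedule
        counts = np.array([12, 9, 6, 3])                         # failures per interval
        survivors = 10                                           # still alive at t = 20

        def neg_loglik(theta):
            F = 1 - np.exp(-inspect_times / theta)               # exponential CDF
            p = np.diff(F)                                       # interval probabilities
            return -(counts @ np.log(p) + survivors * (-inspect_times[-1] / theta))

        res = minimize_scalar(neg_loglik, bounds=(0.1, 100.0), method="bounded")
        print(f"MLE of mean lifetime: {res.x:.2f}")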
       
  • Deterministic subsampling for logistic regression with massive data

      Abstract: For logistic regression with massive data, subsampling is an effective way to alleviate the computational challenge. In contrast to most existing methods in the literature, which select subsamples randomly, we propose to obtain subsamples in a deterministic way. To be more specific, we measure the influence of each sample on model fitting with leverage scores and deterministically select the ones with the highest scores. We also propose a faster alternative method that mimics the leverage scores with a simple and intuitive form. Our methods pick subsamples catering to the construction of a linear classification boundary and hence are more efficient when the subsample size is small. We derive non-asymptotic properties of the two methods regarding the observed information, prediction, and parameter estimation accuracy. Extensive simulation studies and two real applications validate the theoretical results and demonstrate the superiority of our methods.
      PubDate: 2022-12-30
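
      A simplified rendering of the deterministic idea in Python: compute logistic-regression leverage scores from a pilot fit and keep the top-k cases. The pilot step, score definition, and the paper's fast approximation may all differ from this sketch:

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def top_leverage_subsample(X, y, k, rng=np.random.default_rng(0)):
            """Deterministically keep the k cases with the largest leverage scores
            of the weighted design matrix, using a pilot fit for the weights."""
            n = len(y)
            pilot = rng.choice(n, size=min(1000, n), replace=False)
            prob = LogisticRegression().fit(X[pilot], y[pilot]).predict_proba(X)[:, 1]
            w = prob * (1 - prob)                        # logistic variance weights
            Z = np.column_stack([np.ones(n), X])         # include the intercept column
            Zw = Z * np.sqrt(w)[:, None]
            G = np.linalg.inv(Zw.T @ Zw)
            lev = np.einsum("ij,jk,ik->i", Zw, G, Zw)    # diagonal of the hat matrix
            return np.argsort(lev)[-k:]                  # deterministic, not random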
       
  • Approximating income inequality dynamics given incomplete information: an
           upturned Markov chain model

      Abstract: This article aims to understand mobility within the income distribution in cases where there is incomplete information about how individuals transit between income distribution brackets. Understanding these transitions is crucial for evaluating and designing economic policies that affect the population in the long run. For this reason, we propose a methodology that may assist decision-makers in improving policies related to poverty reduction. We start by assuming that the income distribution bracket a person holds depends exclusively on the previous generation’s income bracket, i.e., it has the memoryless property. Our model therefore resembles a Markov chain model, with a steady state distribution that describes the distribution of the income brackets in the long run and a transition matrix that describes the transitions between income distribution brackets from generation to generation. In contrast to a Markov chain, we assume a given steady state and analyze the space of consistent transition matrices that could generate it. Additionally, we use the joint distribution simulation algorithm developed by Montiel and Bickel (Decis Anal 9:329–347, https://doi.org/10.1287/deca.1120.0252, 2012) to analyze the transition matrix, which allows us to understand the effects of partial information. We test the model with official data from the National Institute of Statistics and Geography and the Social Mobility Survey in Mexico.
      PubDate: 2022-12-30
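
      For orientation, the forward direction of the model is easy to compute: given a transition matrix between income brackets, the steady state is its stationary distribution. The paper works in reverse, fixing the steady state and exploring the consistent transition matrices; the three-bracket matrix below is a toy example:

        import numpy as np

        # Income brackets: low, middle, high (toy intergenerational transitions).
        P = np.array([[0.6, 0.3, 0.1],
                      [0.3, 0.5, 0.2],
                      [0.1, 0.3, 0.6]])

        # Steady state: the left eigenvector of P for eigenvalue 1, normalised.
        vals, vecs = np.linalg.eig(P.T)
        pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
        pi /= pi.sum()
        print(pi)               # long-run distribution over income brackets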
       
  • Correction to: Evaluating countries’ performances by means of rank
           trajectories: functional measures of magnitude and evolution

      PubDate: 2022-12-28
       
  • On the fast computation of the Dirichlet-multinomial log-likelihood
           function

      Abstract: We introduce a new algorithm to compute the difference between values of the \(\log \Gamma\)-function at close points, where \(\Gamma\) denotes Euler’s gamma function. As a consequence, we obtain a way of computing the Dirichlet-multinomial log-likelihood function that is more accurate, has better computational complexity, and has a wider range of application than previously known methods.
      PubDate: 2022-12-26
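
      For reference, the quantity being accelerated: the Dirichlet-multinomial log-likelihood written entirely in terms of log-gamma differences at close points, here evaluated naively with SciPy's gammaln rather than the paper's algorithm:

        import numpy as np
        from scipy.special import gammaln

        def dirmult_loglik(counts, alpha):
            """Dirichlet-multinomial log-likelihood; every alpha-dependent term is
            a difference log Gamma(a + x) - log Gamma(a)."""
            counts = np.asarray(counts, dtype=float)
            n = counts.sum()
            a0 = alpha.sum()
            return (gammaln(n + 1) - gammaln(counts + 1).sum()
                    + gammaln(a0) - gammaln(a0 + n)
                    + (gammaln(alpha + counts) - gammaln(alpha)).sum())

        print(dirmult_loglik(np.array([3, 2, 5]), np.array([1.0, 0.5, 2.0])))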
       
  • Probabilistic learning constrained by realizations using a weak
           formulation of Fourier transform of probability measures

      Abstract: This paper deals with taking into account a given target set of realizations as constraints in the Kullback–Leibler divergence minimum principle (KLDMP). We present a novel probabilistic learning algorithm that makes it possible to use the KLDMP when the constraints are not defined by a target set of statistical moments for the quantity of interest (QoI) of an uncertain/stochastic computational model, but are defined by a target set of realizations of the QoI for which the associated statistical moments are not or cannot be estimated. The method consists in defining a functional constraint, namely the equality of the Fourier transforms of the posterior probability measure and the target probability measure, and in constructing a finite representation of the weak formulation of this functional constraint. The proposed approach allows for estimating the posterior probability measure of the QoI (unsupervised case) or the posterior joint probability measure of the QoI with the control parameter (supervised case). The existence and uniqueness of the posterior probability measure are analyzed for the two cases. The numerical aspects are detailed in order to facilitate the implementation of the proposed method. The presented application in high dimension demonstrates the efficiency and robustness of the proposed algorithm.
      PubDate: 2022-12-23
       
  • Comparing the diagnostic performance of methods used in full-factorial
           design multi-reader multi-case studies

      Abstract: In radiology, patients are frequently diagnosed according to the subjective interpretations of radiologists based on an image. Such diagnoses may be biased and differ significantly among evaluators (i.e., readers) due to different education levels and experiences. One solution to this problem is a multi-reader multi-case study design, in which there are multiple readers and the same images are evaluated multiple times. Several methods, both model-based and bootstrap-based, are available for analyzing multi-reader multi-case studies. In this study, we compared the performance of the available methods on a mammogram dataset. We also conducted a comprehensive simulation study to generalize the results to more general scenarios, considering the effects of the number of samples and readers, the data structures (i.e., correlation structures and variance components), and the overall accuracy of the diagnostic tests (AUC). Results showed that the model-based methods had type-I error rates close to the nominal level as the number of samples and readers increased. Bootstrap-based methods, on the other hand, were generally conservative; however, they performed best when the sample size was small and the AUC level was high. In conclusion, the performance of the methods was not the same under all conditions and was affected by the factors considered in the simulation study. Therefore, using a single method under all scenarios is not a sound strategy, because it may lead to biased conclusions.
      PubDate: 2022-12-18
       
  • Fitting sparse Markov models through a collapsed Gibbs sampler

      Abstract: Sparse Markov models (SMMs) provide a parsimonious representation for higher-order Markov models. We present a computationally efficient method for fitting SMMs using a collapsed Gibbs sampler, the GSDPMM. We prove the consistency of the GSDPMM in fitting SMMs. In simulations, the GSDPMM was found to perform as well as or better than existing methods for fitting SMMs. We apply the GSDPMM method to fit SMMs to patterns of wind speeds and DNA sequences.
      PubDate: 2022-12-15
       
 