Computational Statistics. Journal Prestige (SJR): 0.803; Citation Impact (CiteScore): 1; Number of Followers: 15. Hybrid journal (may contain Open Access articles). ISSN (Print): 0943-4062; ISSN (Online): 1613-9658. Published by Springer-Verlag.
• Topic based quality indexes assessment through sentiment

Abstract: This paper proposes a new methodology called TOpic modeling Based Index Assessment through Sentiment (TOBIAS). The method models the effects of the topics, moods, and sentiments of the comments describing a phenomenon on its overall rating. TOBIAS combines several techniques and methodologies. First, Sentiment Analysis identifies sentiments, emotions, and moods, while Topic Modeling finds the main relevant topics inside comments. Then, Partial Least Squares Path Modeling estimates how these affect an overall rating that summarizes the performance of the analyzed phenomenon. We applied TOBIAS in a real case study on the quality of university courses as evaluated by students at the University of Cagliari (Italy). We found that TOBIAS provides interpretable results on how the topics discussed by students, together with their expressed sentiments, emotions, and moods, relate to the overall rating.
PubDate: 2022-09-20

• Clustering directional data through depth functions

Abstract: A new depth-based clustering procedure for directional data is proposed. The method is fully non-parametric and has the advantage of being flexible and applicable even in high dimensions when a suitable notion of depth is adopted. The introduced technique is evaluated through an extensive simulation study. In addition, a real data example in text mining illustrates its effectiveness in comparison with other existing directional clustering algorithms.
PubDate: 2022-09-19
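
The abstract above does not spell out the depth-based procedure itself. As a point of comparison, directional observations can be clustered by embedding angles on the unit circle and running a few spherical k-means iterations; this is a minimal NumPy sketch of a generic directional clustering baseline, not the paper's method (all names and the simulated data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated groups of angles (radians) on the circle.
angles = np.concatenate([
    rng.vonmises(0.0, 20.0, 100),      # cluster concentrated around 0
    rng.vonmises(np.pi, 20.0, 100),    # cluster concentrated around pi
])
X = np.column_stack([np.cos(angles), np.sin(angles)])  # embed on unit circle

# A few iterations of spherical k-means with k = 2.
centers = X[[0, 100]].copy()           # one seed taken from each group
for _ in range(10):
    labels = np.argmax(X @ centers.T, axis=1)     # assign by cosine similarity
    for j in range(2):
        m = X[labels == j].mean(axis=0)
        centers[j] = m / np.linalg.norm(m)        # renormalise back to the circle
```

With well-separated modes the two groups are recovered exactly; real directional data (as in the paper's text-mining example) is typically far noisier.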

• Joint Bayesian longitudinal models for mixed outcome types and associated
model selection techniques

Abstract: Motivated by data measuring the progression of leishmaniosis in a cohort of US dogs, we develop a Bayesian longitudinal model with autoregressive errors to jointly analyze ordinal and continuous outcomes. Multivariate methods can borrow strength across responses and may produce improved longitudinal forecasts of disease progression over univariate methods. We explore the performance of our proposed model under simulation and demonstrate that it has improved prediction accuracy over traditional Bayesian hierarchical models. We further identify an appropriate model selection criterion. We show that our method holds promise for use in the clinical setting, particularly when ordinal outcomes are measured alongside other variable types that may aid clinical decision making. This approach is particularly applicable when multiple, imperfect measures of disease progression are available.
PubDate: 2022-09-18

• Bayesian variable selection using Knockoffs with applications to genomics

Abstract: Given the costliness of HIV drug therapy research, it is important not only to maximize the true positive rate (TPR) by identifying which genetic markers are related to drug resistance, but also to minimize the false discovery rate (FDR) by reducing the number of incorrectly flagged markers unrelated to drug resistance. In this study, we propose a multiple testing procedure that unifies key concepts in computational statistics, namely Model-free Knockoffs, Bayesian variable selection, and the local false discovery rate. We develop an algorithm that utilizes the augmented data-Knockoff matrix and implements the Bayesian Lasso. We then identify signals using test statistics based on Markov chain Monte Carlo outputs and the local false discovery rate. We compare our proposed methods against non-Bayesian methods such as Benjamini–Hochberg (BHq) and Lasso regression in terms of TPR and FDR. Using numerical studies, we show that the proposed method yields lower FDR than BHq and Lasso in certain cases, such as low-dimensional and equi-dimensional settings. We also discuss an application to an HIV-1 data set, with the aim of analyzing genetic markers linked to drug-resistant HIV in the Philippines in future work.
PubDate: 2022-09-18
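
The Benjamini–Hochberg (BHq) step-up procedure used as a baseline in the comparison is easy to state: sort the p-values, find the largest k with p_(k) <= kq/m, and reject the k smallest. A minimal NumPy sketch (the function name is mine):

```python
import numpy as np

def bh_reject(pvals, q=0.1):
    """Benjamini-Hochberg step-up: boolean mask of rejected hypotheses at level q."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresh = q * np.arange(1, m + 1) / m        # k*q/m for k = 1..m
    below = p[order] <= thresh
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    mask = np.zeros(m, dtype=bool)
    mask[order[:k]] = True                      # reject the k smallest p-values
    return mask

rejected = bh_reject([0.01, 0.02, 0.03, 0.50, 0.60], q=0.1)
```

Here the thresholds are 0.02, 0.04, 0.06, 0.08, 0.10, so exactly the first three hypotheses are rejected.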

• Two-parameter link functions, with applications to negative binomial,
Weibull and quantile regression

Abstract: One-parameter link functions play a fundamental role in regression via generalized linear modelling. This paper develops the general theory for two-parameter links in the very large class of vector generalized linear models by applying total derivatives to a composite log-likelihood within the Fisher scoring/iteratively reweighted least squares algorithm. As our first example, we solve a four-decade-old problem with an interesting history: the canonical link for negative binomial regression. The remaining examples fit Weibull regression using both the mean and a quantile directly, compared with GAMLSS, and perform quantile regression based on the Gaussian distribution. Numerical examples based on real and simulated data are given. The methods described here are implemented in the VGAM and VGAMextra R packages, available on CRAN. Supplementary materials for this article are available online.
PubDate: 2022-09-15
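
For a negative binomial with known size k, the canonical link referenced as the first example maps the mean mu to eta = log(mu / (mu + k)), which is always negative, and its inverse is mu = k e^eta / (1 - e^eta). A small sketch of just this link/inverse pair (function names are mine; the paper's full Fisher-scoring machinery, as implemented in VGAM, is not reproduced):

```python
import numpy as np

def nbcanlink(mu, k):
    """Canonical link for the negative binomial with known size k."""
    return np.log(mu / (mu + k))

def nbcanlink_inv(eta, k):
    """Inverse canonical link; valid for eta < 0."""
    e = np.exp(eta)
    return k * e / (1.0 - e)

eta = nbcanlink(3.7, k=2.0)      # always negative, which is what makes
mu = nbcanlink_inv(eta, k=2.0)   # fitting on the eta scale delicate
```

The constraint eta < 0 is precisely what makes direct fitting with this link awkward, which is the historical difficulty the paper addresses.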

• A hybrid approach for the analysis of complex categorical data structures:
assessment of latent distance learning perception in higher education

Abstract: A long tradition of analysing ordinal response data relies on parametric models, beginning with the seminal cumulative models. When data are collected by means of Likert-scale survey questions in which several scored items measure one or more latent traits, a central issue is how to deal with the ordered categories. We introduce a stacked ensemble (or hybrid) model to tackle the limitations of simply summing up the items. In particular, multiple item responses are synthesised into a single meta-item, defined via a joint data reduction approach; the meta-item is then modelled with regression approaches for ordered polytomous variables that account for potential scaling effects. Finally, a recursive partitioning method yielding trees provides automatic variable selection. The performance of the method is evaluated empirically using a survey on distance learning perception.
PubDate: 2022-09-15
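
As background for the ordinal-regression step, a cumulative-logit (proportional-odds) model can be fitted by maximum likelihood with a generic optimizer. This is only a generic sketch of that single ingredient, not the paper's stacked ensemble; the simulated data, names, and parameterisation (cutpoints kept ordered via an exponential gap) are mine:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(8)
n = 1000
x = rng.normal(size=n)
# Latent-variable generation: z = b*x + logistic noise, cutpoints -0.5 and 0.8.
z = 1.0 * x + rng.logistic(size=n)
y = np.digitize(z, [-0.5, 0.8])            # ordinal response in {0, 1, 2}

def negloglik(params):
    a1, d, b = params                       # cutpoints a1 < a2 = a1 + exp(d)
    a2 = a1 + np.exp(d)
    p0 = expit(a1 - b * x)                  # P(Y <= 0)
    p1 = expit(a2 - b * x)                  # P(Y <= 1)
    probs = np.stack([p0, p1 - p0, 1.0 - p1])
    return -np.sum(np.log(probs[y, np.arange(n)]))

res = minimize(negloglik, x0=[0.0, 0.0, 0.0], method='Nelder-Mead',
               options={'maxiter': 5000})
a1_hat, b_hat = res.x[0], res.x[2]
a2_hat = res.x[0] + np.exp(res.x[1])
```

With n = 1000 the recovered cutpoints and slope land close to the generating values (-0.5, 0.8, 1.0).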

• Correction: Sparse reduced-rank regression for simultaneous rank and
variable selection via manifold optimization

PubDate: 2022-09-10

• Evaluating countries’ performances by means of rank trajectories:
functional measures of magnitude and evolution

Abstract: Countries’ performance can be compared by means of indicators, which in turn give rise to rankings at a given time. However, a ranking does not show whether a country is improving, worsening, or stable in its performance, although the evolution of a country’s performance is of fundamental importance for assessing the effect of adopted policies in both absolute and comparative terms. Establishing a general ranking among countries over time remains an open problem in the literature. This paper therefore analyzes rank dynamics by means of the functional data analysis approach. Specifically, countries’ performances are evaluated by taking into account both their ranking position and their evolutionary behaviour, using two functional measures: the modified hypograph index and the weighted integrated first derivative. These are scalar measures that reflect the behaviour of trajectories over time. Furthermore, a novel visualisation technique based on the suggested measures is proposed to identify groups of countries according to their performance. The effectiveness of the proposed method is shown through a simulation study. The procedure is also applied to a real dataset drawn from the Government Effectiveness index of 27 European countries.
PubDate: 2022-09-08
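
The modified hypograph index has a simple discretised form: for each curve, average over the sample the proportion of time points at which another curve lies at or below it. This sketch assumes the standard López-Pintado and Romo style definition; the paper's exact weighting may differ:

```python
import numpy as np

def mhi(curves):
    """Discretised modified hypograph index for an (n_curves, n_times) array:
    for each curve, the average over all curves of the fraction of time
    points at which that other curve lies at or below it."""
    n, T = curves.shape
    out = np.empty(n)
    for i in range(n):
        out[i] = np.mean(curves <= curves[i])   # averages over both axes
    return out

# Three constant trajectories at levels 1 < 2 < 3 over five time points:
levels = np.tile(np.array([[1.0], [2.0], [3.0]]), (1, 5))
idx = mhi(levels)
```

The highest trajectory dominates everything (index 1), the lowest dominates only itself (index 1/3), which is why the index serves as a magnitude-style functional ranking measure.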

• Polynomial whitening for high-dimensional data

Abstract: The inverse square root of a covariance matrix is often desirable for performing data whitening in the process of applying many common multivariate data analysis methods. Direct calculation of the inverse square root is not available when the covariance matrix is either singular or nearly singular, as often occurs in high dimensions. We develop new methods, which we broadly call polynomial whitening, to construct a low-degree polynomial in the empirical covariance matrix which has similar properties to the true inverse square root of the covariance matrix (should it exist). Our method does not suffer in singular or near-singular settings, and is computationally tractable in high dimensions. We demonstrate that our construction of low-degree polynomials provides a good substitute for high-dimensional inverse square root covariance matrices, in both $$d < N$$ and $$d \ge N$$ cases. We offer examples on data whitening, outlier detection and principal component analysis to demonstrate the performance of the proposed method.
PubDate: 2022-09-08
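
The core idea can be illustrated in the well-conditioned d < N case: fit a low-degree polynomial p with p(lambda) close to lambda^(-1/2) on the spectrum of the empirical covariance S, then evaluate p(S) by Horner's rule and use it as a whitening matrix. This NumPy sketch shows the general idea, not the authors' specific construction (their method is designed for singular settings too):

```python
import numpy as np

rng = np.random.default_rng(1)
d, N = 4, 500
A = rng.standard_normal((N, d)) * np.array([1.0, 2.0, 3.0, 4.0])  # unequal scales
S = np.cov(A, rowvar=False)                   # empirical covariance, nonsingular here

# Fit a degree-3 polynomial matching lambda^(-1/2) at the empirical eigenvalues.
lam = np.linalg.eigvalsh(S)
coef = np.polyfit(lam, lam ** -0.5, deg=3)    # highest-degree coefficient first

# Evaluate p(S) by Horner's rule on the matrix.
P = np.zeros_like(S)
for c in coef:
    P = P @ S + c * np.eye(d)
```

Since a degree d-1 polynomial can interpolate all d eigenvalues exactly, here p(S) acts like S^{-1/2} and whitening with it yields an identity covariance; the interesting regime in the paper is when the degree is far below d.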

• A new non-archimedean metric on persistent homology

Abstract: In this article, we define a new non-archimedean metric structure, called cophenetic metric, on persistent homology classes of all degrees. We then show that zeroth persistent homology together with the cophenetic metric and hierarchical clustering algorithms with a number of different metrics do deliver statistically verifiable commensurate topological information based on experimental results we obtained on different datasets. We also observe that the resulting clusters coming from cophenetic distance do shine in terms of different evaluation measures such as silhouette score and the Rand index. Moreover, since the cophenetic metric is defined for all homology degrees, one can now display the inter-relations of persistent homology classes in all degrees via rooted trees.
PubDate: 2022-09-01
DOI: 10.1007/s00180-021-01187-z
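
The paper's cophenetic metric lives on persistent homology classes, but the non-archimedean (strong triangle inequality) property is already visible for the classical cophenetic distance of hierarchical clustering, which SciPy exposes directly; a sketch checking the ultrametric property on a toy dataset:

```python
import numpy as np
from itertools import combinations
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(2)
X = rng.standard_normal((15, 3))
D = pdist(X)                                   # pairwise Euclidean distances
Z = linkage(D, method='single')                # single-linkage dendrogram
coph_corr, coph_cond = cophenet(Z, D)          # cophenetic correlation + distances
C = squareform(coph_cond)

# Ultrametric (non-archimedean) check: for every triple, the two largest
# of the three pairwise cophenetic distances must coincide.
ultra = True
for a, b, c in combinations(range(15), 3):
    v = sorted([C[a, b], C[a, c], C[b, c]])
    ultra &= v[2] <= v[1] + 1e-9
```

Cophenetic distances are dendrogram merge heights, so they always form an ultrametric; the paper transports this structure to persistence classes.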

• Modified minimum distance estimators: definition, properties and
applications

Abstract: Estimating the location and scale parameters of a distribution is one of the most crucial issues in statistics, and various estimators have been proposed, such as maximum likelihood, the method of moments, and minimum distance estimators (e.g. Cramér-von Mises (CvM) and Anderson-Darling (AD)). In most cases, however, estimators of the location parameter $$\mu$$ and scale parameter $$\sigma$$ cannot be obtained in closed form because of the nonlinear function(s) in the corresponding estimating equations, so numerical methods are used to obtain the estimates. These may suffer from drawbacks such as multiple roots, convergence to a wrong root, or non-convergence of the iterations. In this study, we adopt the idea of Tiku (Biometrika 54:155–165, 1967) within the CvM and AD methodologies with the aim of eliminating these difficulties and obtaining closed-form estimators of $$\mu$$ and $$\sigma$$ . The resulting estimators are called modified CvM (MCvM) and modified AD (MAD), respectively. The proposed estimators are expressed as functions of the sample observations, so their calculation is straightforward; this also avoids the computational cost of iteration. A Monte Carlo simulation study is conducted to compare the efficiencies of the CvM and AD estimators with their modified counterparts, the MCvM and MAD, for the normal, extreme value and Weibull distributions as an illustration. Real data sets are used to show the implementation of the proposed estimation methodologies.
PubDate: 2022-09-01
DOI: 10.1007/s00180-021-01170-8
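
For reference, the unmodified CvM estimator that the paper's MCvM linearises minimises the criterion W^2 = 1/(12n) + sum_i (F((x_(i) - mu)/sigma) - (2i-1)/(2n))^2 numerically; a sketch for the normal case (simulated data and tolerances are mine, and this is the iterative approach the paper seeks to avoid):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(3)
x = np.sort(rng.normal(loc=5.0, scale=2.0, size=500))
n = x.size

def cvm(params):
    """Cramer-von Mises distance between fitted normal CDF and the empirical CDF."""
    mu, log_sigma = params                    # log-parameterise to keep sigma > 0
    u = norm.cdf((x - mu) / np.exp(log_sigma))
    i = np.arange(1, n + 1)
    return 1.0 / (12 * n) + np.sum((u - (2 * i - 1) / (2 * n)) ** 2)

res = minimize(cvm, x0=[np.median(x), np.log(x.std())], method='Nelder-Mead')
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
```

The modified estimators in the paper replace this numerical minimisation with closed-form expressions in the sample observations.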

• New classes of tests for the Weibull distribution using Stein’s method
in the presence of random right censoring

Abstract: We develop two new classes of tests for the Weibull distribution based on Stein’s method. The proposed tests are applied in the full-sample case as well as in the presence of random right censoring. We investigate the finite-sample performance of the new tests using a comprehensive Monte Carlo study. In both the absence and presence of censoring, the newly proposed classes of tests are found to outperform competing tests against the majority of the distributions considered. In the cases where censoring is present we consider various censoring distributions. Some remarks on the asymptotic properties of the proposed tests are included. We also present a result of independent interest: a test initially proposed for use with full samples is amended to allow testing for the Weibull distribution in the presence of censoring. The techniques developed in the paper are illustrated using two practical examples.
PubDate: 2022-09-01
DOI: 10.1007/s00180-021-01178-0
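
The Stein-based statistics themselves are not given in the abstract. As a classical point of comparison, a Weibull goodness-of-fit check can be run with a maximum likelihood fit followed by the Kolmogorov–Smirnov statistic (note the KS p-value is only approximate when parameters are estimated from the data, which is one reason bespoke tests like those in the paper exist):

```python
import numpy as np
from scipy.stats import weibull_min, kstest

rng = np.random.default_rng(9)
data = weibull_min.rvs(1.8, scale=2.0, size=300, random_state=rng)

# MLE with the location fixed at zero (the two-parameter Weibull).
shape_hat, loc_hat, scale_hat = weibull_min.fit(data, floc=0)
stat, pval = kstest(data, 'weibull_min', args=(shape_hat, loc_hat, scale_hat))
```

For data genuinely drawn from a Weibull, the KS statistic is small and the test does not reject.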

• A computationally efficient approach to estimating species richness and
rarefaction curve

Abstract: In ecological and educational studies, estimators of the total number of species and of the rarefaction curve based on empirical samples are important tools. We propose a new method to estimate both the rarefaction curve and the number of species using a ready-made numerical approach, namely quadratic optimization. The key idea of the proposed algorithm is nonparametric empirical Bayes estimation incorporating an interpolated rarefaction curve through quadratic optimization with linear constraints, based on the g-modeling of Efron (Stat Sci 29:285–301, 2014). The proposed algorithm is easily implemented and performs better than existing methods in terms of computational speed and accuracy. Furthermore, we provide a model selection criterion to choose the tuning parameters of the estimation procedure, and construct confidence intervals based on asymptotic theory rather than resampling. We present asymptotic results for our estimator to validate its efficiency theoretically. A broad range of numerical studies, including simulations and real data examples, is also conducted, and the resulting gains are compared with existing methods.
PubDate: 2022-09-01
DOI: 10.1007/s00180-021-01185-1
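
A standard baseline in this literature (not the paper's empirical-Bayes method) is the Chao1 lower-bound estimator of species richness, which needs only the counts of singletons and doubletons:

```python
def chao1(counts):
    """Classical Chao1 lower-bound estimator of total species richness.
    counts: per-species abundance counts from the sample."""
    s_obs = sum(1 for c in counts if c > 0)   # observed species
    f1 = sum(1 for c in counts if c == 1)     # singletons
    f2 = sum(1 for c in counts if c == 2)     # doubletons
    if f2 == 0:
        return s_obs + f1 * (f1 - 1) / 2.0    # bias-corrected form when f2 = 0
    return s_obs + f1 * f1 / (2.0 * f2)

est = chao1([1, 1, 1, 2, 2, 3, 4])            # 7 observed + 3^2 / (2*2) = 9.25
```

Estimators like this undercount when many species remain unseen, which motivates the richer modeling approach the paper develops.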

• Statistical modeling of directional data using a robust hierarchical von
Mises distribution model: perspectives for wind energy

Abstract: For describing wind direction, a variety of statistical distributions has been suggested, providing information about the wind regime at a particular location and aiding the development of efficient wind energy generation. In this paper a systematic approach for data classification with special emphasis on von Mises mixtures is presented. A von Mises mixture model is broad enough to cover both symmetry and asymmetry, and both unimodality and multimodality, of circular data. We develop an improved mathematical model of the classical von Mises mixture method that rests on a number of principles giving it internal coherence and originality. Our hierarchical model of von Mises distributions is flexible enough to precisely model complex directional data sets. We define a new specific expectation-maximization (S-EM) algorithm for estimating the parameters of the model. Simulations show that a satisfactory fit of complex directional data can be obtained (error generally < 1%). Furthermore, the Bayesian Information Criterion is used to judge the goodness of fit and the suitability of this model relative to common distributions found in the literature. The findings show that our hierarchical model of von Mises distributions is well suited to modeling complex directional data with several modes and/or prevailing directions.
PubDate: 2022-09-01
DOI: 10.1007/s00180-021-01173-5
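
Before any mixture or EM machinery, a single von Mises component is routinely fitted by moments: the mean direction comes from the resultant of the angles, and the concentration kappa from inverting the mean resultant length (here via the Banerjee et al. approximation). A minimal NumPy sketch of that building block, not the paper's S-EM algorithm:

```python
import numpy as np

rng = np.random.default_rng(4)
theta = rng.vonmises(0.5, 2.0, size=2000)    # mean direction 0.5, concentration 2

c, s = np.cos(theta).mean(), np.sin(theta).mean()
mu_hat = np.arctan2(s, c)                    # estimated mean direction
R = np.hypot(c, s)                           # mean resultant length in [0, 1)
kappa_hat = R * (2 - R ** 2) / (1 - R ** 2)  # Banerjee et al. approximation
```

An EM algorithm for a von Mises mixture alternates responsibilities with exactly this kind of weighted moment update per component.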

• Estimation and inferences for varying coefficient partially nonlinear
quantile models with censoring indicators missing at random

Abstract: In this paper, we focus on the varying coefficient partially nonlinear quantile regression model when the response variable is right censored and the censoring indicator is missing at random. Based on the calibration and imputation estimation methods, the three-stage approaches are carried out to construct the estimators of the parameter vector in the nonlinear function part and the nonparametric varying-coefficient functions involved in the model. Under some appropriate conditions, the asymptotic properties of the proposed estimators are established. Simulation study and a real data analysis are performed to illustrate the performances of our proposed estimators.
PubDate: 2022-09-01
DOI: 10.1007/s00180-021-01192-2
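
The quantile regression at the heart of the model minimises the pinball (check) loss; a minimal uncensored linear sketch with a generic optimizer (the censoring and varying-coefficient machinery of the paper is not reproduced, and the simulated data are mine):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
x = rng.uniform(0, 2, 300)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, 300)

def pinball(beta, tau=0.5):
    """Check-function loss for the tau-th conditional quantile."""
    r = y - (beta[0] + beta[1] * x)
    return np.sum(np.where(r >= 0, tau * r, (tau - 1) * r))

res = minimize(pinball, x0=[0.0, 0.0], method='Nelder-Mead')
b0_hat, b1_hat = res.x                      # median regression: tau = 0.5
```

With symmetric noise the tau = 0.5 fit recovers the conditional mean line; censoring is what breaks this naive estimator and motivates the calibration/imputation stages in the paper.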

• Reliability inference for multicomponent stress–strength model from
Kumaraswamy-G family of distributions based on progressively first failure
censored samples

Abstract: In this article, the problem of reliability inference for a multicomponent stress–strength (MSS) model from the Kumaraswamy-G (Kw-G) family of distributions under progressive first-failure censoring is considered. The reliability of the MSS system is considered when both the stress and strength variables follow Kw-G distributions with different first shape parameters and a common second shape parameter. The maximum likelihood (ML) and Bayes estimators of reliability are derived when all parameters are unknown. The ML, uniformly minimum variance unbiased, and Bayes estimators of reliability are also derived for the case where the common shape parameter is known. Bayesian credible and HPD credible intervals of reliability are developed using the Gibbs sampling method. The performance of the various estimators is assessed through a Monte Carlo simulation study. Finally, two real-life examples are considered for illustrative purposes.
PubDate: 2022-09-01
DOI: 10.1007/s00180-021-01180-6
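
Multicomponent stress–strength reliability R_{s,k} = P(at least s of k strengths exceed the stress) is easy to approximate by Monte Carlo, with Kumaraswamy draws obtained by inverting the CDF F(x) = 1 - (1 - x^a)^b. A sketch (function names are mine; the paper derives estimators from censored data, not from full simulated samples):

```python
import numpy as np

def rkumaraswamy(a, b, size, rng):
    """Inverse-CDF sampling from Kumaraswamy(a, b) on (0, 1)."""
    u = rng.uniform(size=size)
    return (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)

def mss_reliability(s, k, a1, a2, b, n=20000, seed=6):
    """Monte Carlo estimate of P(at least s of k iid strengths exceed the stress);
    strengths ~ Kw(a1, b), stress ~ Kw(a2, b) with a common second shape b."""
    rng = np.random.default_rng(seed)
    strength = rkumaraswamy(a1, b, (n, k), rng)
    stress = rkumaraswamy(a2, b, (n, 1), rng)
    return np.mean((strength > stress).sum(axis=1) >= s)

r11 = mss_reliability(1, 1, 2.0, 2.0, 3.0)   # identical laws, s = k = 1
```

With identically distributed stress and strength and s = k = 1, the true reliability is P(X > Y) = 1/2, a handy sanity check.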

• Gumbel’s bivariate exponential distribution: estimation of the
association parameter using ranked set sampling

PubDate: 2022-09-01
DOI: 10.1007/s00180-021-01176-2

• RMSE-minimizing confidence intervals for the binomial parameter

Abstract: Let X be the number of successes in n mutually independent and identically distributed Bernoulli trials, each with probability of success p. For fixed n and $$\alpha$$ , there are $$n + 1$$ distinct two-sided $$100(1 - \alpha )$$ % confidence intervals for p associated with the outcomes $${X = 0, 1, 2, \ldots , n}$$ . There is no known exact non-randomized confidence interval for p. Existing approximate confidence interval procedures use a formula, which often requires numerical methods to implement, to calculate confidence interval bounds. The bounds associated with these confidence intervals correspond to discontinuities in the actual coverage function. The paper does not aim to provide a formula for the confidence interval bounds, but rather to select the confidence interval bounds that minimize the root mean square error of the actual coverage function for sample size n and significance level $$\alpha$$ in the frequentist context.
PubDate: 2022-09-01
DOI: 10.1007/s00180-021-01183-3
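
The actual coverage function referenced in the abstract can be computed exactly for any system of n + 1 intervals: at a given p, sum the binomial probabilities of the outcomes whose interval contains p. A sketch for the classical Clopper–Pearson intervals (not the paper's RMSE-minimizing bounds):

```python
import numpy as np
from scipy.stats import binom, beta

n, alpha = 20, 0.05
x = np.arange(n + 1)

# Clopper-Pearson bounds for each possible outcome x = 0..n (beta quantiles).
lower = np.zeros(n + 1)
upper = np.ones(n + 1)
lower[1:] = beta.ppf(alpha / 2, x[1:], n - x[1:] + 1)
upper[:-1] = beta.ppf(1 - alpha / 2, x[:-1] + 1, n - x[:-1])

def coverage(p):
    """Actual coverage: total probability of outcomes whose interval contains p."""
    inside = (lower <= p) & (p <= upper)
    return binom.pmf(x[inside], n, p).sum()

grid = np.linspace(0.01, 0.99, 99)
cov = np.array([coverage(p) for p in grid])
```

Clopper–Pearson coverage never falls below the nominal level, at the price of conservatism; the paper instead tunes the bounds so the coverage function tracks the nominal level in an RMSE sense.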

• Shapiro-Wilk test for multivariate skew-normality

Abstract: The multivariate skew-normal family of distributions is a flexible class of probability models that includes the multivariate normal distribution as a special case. Two procedures for testing that a multivariate random sample comes from the multivariate skew-normal distribution are proposed here based on the estimated canonical form. Canonical data are transformed into approximately multivariate normal observations and then a multivariate version of the Shapiro-Wilk test is used for testing multivariate normality. Critical values for the tests are approximated without using parametric bootstrap. Monte Carlo simulation results provide evidence that the nominal test level is preserved, in general, under the considered settings. The simulation results also indicate that these tests are in general more powerful than existing tests for the same problem versus the studied alternatives.
PubDate: 2022-09-01
DOI: 10.1007/s00180-021-01188-y
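
The final normality-testing step relies on the Shapiro-Wilk test, which SciPy provides in its univariate form; a sketch on genuinely normal data standing in for the transformed observations (the canonical-form transformation and the multivariate extension are the paper's contribution and are not reproduced here):

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(10)
z = rng.standard_normal(100)     # stand-in for canonically transformed data

W, p = shapiro(z)                # Shapiro-Wilk statistic and p-value
```

For normal data the W statistic sits close to 1; systematic departures toward smaller W signal non-normality of the transformed sample.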

• Accelerated failure time models for recurrent event data analysis and
joint modeling

Abstract: There are two commonly encountered problems in survival analysis: (a) recurrent event data analysis, where an individual may experience an event multiple times over follow-up; and (b) joint modeling, where the event time distribution depends on a longitudinally measured internal covariate. The proportional hazards (PH) family offers an attractive modeling paradigm for both. Although there are well-known techniques for testing the PH assumption in standard survival data analysis, checking this assumption for joint modeling has received less attention. An alternative framework considers an accelerated failure time (AFT) model, which is particularly useful when the PH assumption fails. AFT models can describe data with wide-ranging characteristics but have received far less attention in modeling recurrent event data and in joint analysis of time-to-event and longitudinal data. In this paper, we develop methodology to analyze these types of data using the AFT family of distributions. Fitting these models is computationally and numerically much more demanding than standard survival data analysis; in particular, fitting a joint model is computationally intensive because it requires approximating multiple integrals that have no analytic solution except in very special cases. We propose computational algorithms for statistical inference and develop a software package to fit these models. The proposed methodology is demonstrated using both simulated and real data.
PubDate: 2022-09-01
DOI: 10.1007/s00180-021-01171-7
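
In an AFT model, covariates rescale time directly: T = exp(b0 + b1 x) * W for a baseline error W. A minimal maximum-likelihood sketch for the Weibull case with uncensored, non-recurrent data (the recurrent-event and joint-modeling machinery the paper develops is not reproduced; data and names are mine):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)
n = 500
x = rng.uniform(0, 1, n)
shape_true, b0_true, b1_true = 1.5, 0.5, -1.0
lam = np.exp(b0_true + b1_true * x)          # covariate accelerates/decelerates time
t = lam * rng.weibull(shape_true, n)         # T = exp(b0 + b1 x) * W, W ~ Weibull(k)

def negloglik(params):
    """Negative log-likelihood of a Weibull AFT model (shape k, scale exp(b0+b1 x))."""
    b0, b1, log_k = params                   # log-parameterise to keep k > 0
    k = np.exp(log_k)
    scale = np.exp(b0 + b1 * x)
    z = t / scale
    return -np.sum(np.log(k) - np.log(scale) + (k - 1) * np.log(z) - z ** k)

res = minimize(negloglik, x0=[0.0, 0.0, 0.0], method='Nelder-Mead',
               options={'maxiter': 5000})
b0_hat, b1_hat, k_hat = res.x[0], res.x[1], np.exp(res.x[2])
```

Adding censoring, recurrent gap times, or a longitudinal covariate turns this closed-form likelihood into the intractable integrals the paper approximates numerically.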

JournalTOCs
School of Mathematical and Computer Sciences
Heriot-Watt University
Edinburgh, EH14 4AS, UK
Email: journaltocs@hw.ac.uk
Tel: +00 44 (0)131 4513762