  Subjects -> STATISTICS (Total: 130 journals)
Showing 1 - 151 of 151 Journals sorted alphabetically
Advances in Complex Systems     Hybrid Journal   (Followers: 10)
Advances in Data Analysis and Classification     Hybrid Journal   (Followers: 52)
Applied Categorical Structures     Hybrid Journal   (Followers: 4)
Argumentation et analyse du discours     Open Access   (Followers: 7)
Asian Journal of Mathematics & Statistics     Open Access   (Followers: 8)
AStA Advances in Statistical Analysis     Hybrid Journal   (Followers: 2)
Australian & New Zealand Journal of Statistics     Hybrid Journal   (Followers: 12)
Biometrical Journal     Hybrid Journal   (Followers: 9)
Biometrics     Hybrid Journal   (Followers: 51)
British Journal of Mathematical and Statistical Psychology     Full-text available via subscription   (Followers: 17)
Building Simulation     Hybrid Journal   (Followers: 2)
CHANCE     Hybrid Journal   (Followers: 5)
Communications in Statistics - Simulation and Computation     Hybrid Journal   (Followers: 9)
Communications in Statistics - Theory and Methods     Hybrid Journal   (Followers: 11)
Computational Statistics     Hybrid Journal   (Followers: 15)
Computational Statistics & Data Analysis     Hybrid Journal   (Followers: 35)
Current Research in Biostatistics     Open Access   (Followers: 8)
Decisions in Economics and Finance     Hybrid Journal   (Followers: 12)
Demographic Research     Open Access   (Followers: 14)
Engineering With Computers     Hybrid Journal   (Followers: 5)
Environmental and Ecological Statistics     Hybrid Journal   (Followers: 7)
ESAIM: Probability and Statistics     Open Access   (Followers: 4)
Extremes     Hybrid Journal   (Followers: 2)
Fuzzy Optimization and Decision Making     Hybrid Journal   (Followers: 8)
Geneva Papers on Risk and Insurance - Issues and Practice     Hybrid Journal   (Followers: 11)
Handbook of Numerical Analysis     Full-text available via subscription   (Followers: 5)
Handbook of Statistics     Full-text available via subscription   (Followers: 7)
IEA World Energy Statistics and Balances -     Full-text available via subscription   (Followers: 2)
International Journal of Computational Economics and Econometrics     Hybrid Journal   (Followers: 6)
International Journal of Quality, Statistics, and Reliability     Open Access   (Followers: 17)
International Journal of Stochastic Analysis     Open Access   (Followers: 2)
International Statistical Review     Hybrid Journal   (Followers: 12)
Journal of Algebraic Combinatorics     Hybrid Journal   (Followers: 3)
Journal of Applied Statistics     Hybrid Journal   (Followers: 20)
Journal of Biopharmaceutical Statistics     Hybrid Journal   (Followers: 23)
Journal of Business & Economic Statistics     Full-text available via subscription   (Followers: 38, SJR: 3.664, CiteScore: 2)
Journal of Combinatorial Optimization     Hybrid Journal   (Followers: 7)
Journal of Computational & Graphical Statistics     Full-text available via subscription   (Followers: 21)
Journal of Econometrics     Hybrid Journal   (Followers: 82)
Journal of Educational and Behavioral Statistics     Hybrid Journal   (Followers: 7)
Journal of Forecasting     Hybrid Journal   (Followers: 19)
Journal of Global Optimization     Hybrid Journal   (Followers: 6)
Journal of Mathematics and Statistics     Open Access   (Followers: 6)
Journal of Nonparametric Statistics     Hybrid Journal   (Followers: 6)
Journal of Probability and Statistics     Open Access   (Followers: 10)
Journal of Risk and Uncertainty     Hybrid Journal   (Followers: 34)
Journal of Statistical and Econometric Methods     Open Access   (Followers: 3)
Journal of Statistical Physics     Hybrid Journal   (Followers: 13)
Journal of Statistical Planning and Inference     Hybrid Journal   (Followers: 7)
Journal of Statistical Software     Open Access   (Followers: 16, SJR: 13.802, CiteScore: 16)
Journal of the American Statistical Association     Full-text available via subscription   (Followers: 72, SJR: 3.746, CiteScore: 2)
Journal of the Korean Statistical Society     Hybrid Journal  
Journal of the Royal Statistical Society Series C (Applied Statistics)     Hybrid Journal   (Followers: 36)
Journal of the Royal Statistical Society, Series A (Statistics in Society)     Hybrid Journal   (Followers: 28)
Journal of the Royal Statistical Society, Series B (Statistical Methodology)     Hybrid Journal   (Followers: 41)
Journal of Theoretical Probability     Hybrid Journal   (Followers: 3)
Journal of Time Series Analysis     Hybrid Journal   (Followers: 16)
Journal of Urbanism: International Research on Placemaking and Urban Sustainability     Hybrid Journal   (Followers: 23)
Law, Probability and Risk     Hybrid Journal   (Followers: 6)
Lifetime Data Analysis     Hybrid Journal   (Followers: 7)
Mathematical Methods of Statistics     Hybrid Journal   (Followers: 4)
Measurement Interdisciplinary Research and Perspectives     Hybrid Journal   (Followers: 1)
Metrika     Hybrid Journal   (Followers: 4)
Monthly Statistics of International Trade - Statistiques mensuelles du commerce international     Full-text available via subscription   (Followers: 3)
Multivariate Behavioral Research     Hybrid Journal   (Followers: 8)
Optimization Letters     Hybrid Journal   (Followers: 2)
Optimization Methods and Software     Hybrid Journal   (Followers: 6)
Oxford Bulletin of Economics and Statistics     Hybrid Journal   (Followers: 33)
Pharmaceutical Statistics     Hybrid Journal   (Followers: 16)
Queueing Systems     Hybrid Journal   (Followers: 7)
Research Synthesis Methods     Hybrid Journal   (Followers: 7)
Review of Economics and Statistics     Hybrid Journal   (Followers: 138)
Review of Socionetwork Strategies     Hybrid Journal  
Risk Management     Hybrid Journal   (Followers: 17)
Sankhya A     Hybrid Journal   (Followers: 3)
Scandinavian Journal of Statistics     Hybrid Journal   (Followers: 9)
Sequential Analysis: Design Methods and Applications     Hybrid Journal  
Significance     Hybrid Journal   (Followers: 7)
Sociological Methods & Research     Hybrid Journal   (Followers: 40)
SourceOECD Measuring Globalisation Statistics - SourceOCDE Mesurer la mondialisation - Base de donnees statistiques     Full-text available via subscription  
Stata Journal     Full-text available via subscription   (Followers: 8)
Statistica Neerlandica     Hybrid Journal   (Followers: 1)
Statistical Inference for Stochastic Processes     Hybrid Journal   (Followers: 3)
Statistical Methods and Applications     Hybrid Journal   (Followers: 6)
Statistical Methods in Medical Research     Hybrid Journal   (Followers: 27)
Statistical Modelling     Hybrid Journal   (Followers: 18)
Statistical Papers     Hybrid Journal   (Followers: 4)
Statistics & Probability Letters     Hybrid Journal   (Followers: 13)
Statistics and Computing     Hybrid Journal   (Followers: 13)
Statistics and Economics     Open Access  
Statistics in Medicine     Hybrid Journal   (Followers: 122)
Statistics: A Journal of Theoretical and Applied Statistics     Hybrid Journal   (Followers: 12)
Stochastic Models     Hybrid Journal   (Followers: 2)
Stochastics An International Journal of Probability and Stochastic Processes: formerly Stochastics and Stochastics Reports     Hybrid Journal   (Followers: 2)
Structural and Multidisciplinary Optimization     Hybrid Journal   (Followers: 11)
Teaching Statistics     Hybrid Journal   (Followers: 8)
Technology Innovations in Statistics Education (TISE)     Open Access   (Followers: 2)
TEST     Hybrid Journal   (Followers: 2)
The American Statistician     Full-text available via subscription   (Followers: 25)
The Canadian Journal of Statistics / La Revue Canadienne de Statistique     Hybrid Journal   (Followers: 10)
Wiley Interdisciplinary Reviews - Computational Statistics     Hybrid Journal   (Followers: 1)
Advances in Data Analysis and Classification
Journal Prestige (SJR): 1.09
Citation Impact (citeScore): 1
Number of Followers: 52  
 
  Hybrid journal (may contain Open Access articles)
ISSN (Print) 1862-5355 - ISSN (Online) 1862-5347
Published by Springer-Verlag  [2469 journals]
  • Extending finite mixtures of nonlinear mixed-effects models with
           covariate-dependent mixing weights

      Abstract: Finite mixtures of nonlinear mixed-effects models have emerged as a prominent tool for modeling and clustering longitudinal data following nonlinear growth patterns with heterogeneous behavior. This paper proposes an extended finite mixtures of nonlinear mixed-effects model in which the mixing proportions are related to some explanatory covariates. A logistic function is incorporated to describe the relationship between the prior classification probabilities and the covariates of interest. For parameter estimation, we develop an analytically simple expectation conditional maximization algorithm coupled with the first-order Taylor approximation to linearize the model with pseudo data. The calculation of the standard errors of estimators via a general information-based method and the empirical Bayes estimation of random effects are also discussed. The methodology is illustrated through several simulation experiments and an application to the AIDS Clinical Trials Group Protocol 315 study.
      PubDate: 2022-05-08
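
      The covariate-dependent mixing weights mentioned in this abstract can be illustrated with a multinomial-logit (softmax) link. The following is only a minimal sketch, not the authors' implementation: the function name `mixing_weights`, the coefficient layout, and the choice of the last component as the reference class are all assumptions made for illustration.

```python
import math


def mixing_weights(covariates, coefficients):
    """Prior class probabilities from covariates via a multinomial-logit link.

    coefficients holds one list per non-reference component, laid out as
    [intercept, slope_1, slope_2, ...]. The last component is the reference
    class, whose linear predictor is fixed at 0 (an illustrative convention).
    """
    scores = [
        c[0] + sum(b * x for b, x in zip(c[1:], covariates))
        for c in coefficients
    ]
    scores.append(0.0)  # reference component
    top = max(scores)   # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Three components, one covariate (hypothetical coefficient values).
weights = mixing_weights([1.5], [[0.0, 2.0], [-1.0, 0.5]])
```

      With all coefficients set to zero the weights are equal across components, which recovers an ordinary finite mixture with constant mixing proportions.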
       
  • Modal clustering of matrix-variate data

      Abstract: The nonparametric formulation of density-based clustering, known as modal clustering, draws a correspondence between groups and the attraction domains of the modes of the density function underlying the data. Its probabilistic foundation allows for a natural, yet not trivial, generalization of the approach to the matrix-valued setting, increasingly widespread, for example, in longitudinal and multivariate spatio-temporal studies. In this work we introduce nonparametric estimators of matrix-variate distributions based on kernel methods, and analyze their asymptotic properties. Additionally, we propose a generalization of the mean-shift procedure for the identification of the modes of the estimated density. Given the intrinsic high dimensionality of matrix-variate data, we discuss some locally adaptive solutions to handle the problem. We test the procedure via extensive simulations, also with respect to some competitors, and illustrate its performance through two high-dimensional real data applications.
      PubDate: 2022-05-05
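
      For readers unfamiliar with the mean-shift procedure that this abstract generalizes, a minimal sketch of the ordinary vector-valued version with a Gaussian kernel is given below. It is not the matrix-variate estimator proposed in the paper; the function names, the fixed scalar bandwidth, and the rule for merging nearby modes are assumptions made for illustration.

```python
import math


def mean_shift_mode(start, data, bandwidth, tol=1e-8, max_iter=500):
    """Follow Gaussian-kernel mean-shift updates from `start` to a density mode."""
    x = list(start)
    for _ in range(max_iter):
        # Kernel weight of every observation relative to the current position.
        weights = [
            math.exp(-0.5 * math.dist(x, p) ** 2 / bandwidth ** 2) for p in data
        ]
        total = sum(weights)
        # The update is the kernel-weighted mean of the data.
        new_x = [
            sum(w * p[j] for w, p in zip(weights, data)) / total
            for j in range(len(x))
        ]
        shift = math.dist(new_x, x)
        x = new_x
        if shift < tol:
            break
    return x


def modal_clusters(data, bandwidth, merge_tol=1e-3):
    """Label each point by the mode its mean-shift path converges to."""
    modes, labels = [], []
    for p in data:
        m = mean_shift_mode(p, data, bandwidth)
        for k, existing in enumerate(modes):
            if math.dist(m, existing) < merge_tol:
                labels.append(k)
                break
        else:
            modes.append(m)
            labels.append(len(modes) - 1)
    return modes, labels


points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
modes, labels = modal_clusters(points, bandwidth=0.5)
```

      Points whose paths end at the same mode share a label, which is the correspondence between clusters and attraction domains described in the abstract.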
       
  • Sparsifying the least-squares approach to PCA: comparison of lasso and
           cardinality constraint

      Abstract: Sparse PCA methods are used to overcome the difficulty of interpreting the solution obtained from PCA. However, constraining PCA to obtain sparse solutions is an intractable problem, especially in a high-dimensional setting. Penalized methods are used to obtain sparse solutions due to their computational tractability. Nevertheless, recent developments permit efficiently obtaining good solutions of cardinality-constrained PCA problems, allowing comparison between these approaches. Here, we compare a penalized PCA method with its cardinality-constrained counterpart for the least-squares formulation of PCA imposing sparseness on the component weights. We compare the penalized and cardinality-constrained methods through a simulation study that estimates the sparse structure’s recovery, mean absolute bias, mean variance, and mean squared error. Additionally, we use a high-dimensional data set to illustrate the methods in practice. Results suggest that using cardinality-constrained methods leads to better recovery of the sparse structure.
      PubDate: 2022-04-27
       
  • Basis expansion approaches for functional analysis of variance with
           repeated measures

      Abstract: The methodological contribution in this paper is motivated by biomechanical studies where data characterizing human movement are waveform curves representing joint measures such as flexion angles, velocity, acceleration, and so on. In many cases the aim consists of detecting differences in gait patterns when several independent samples of subjects walk or run under different conditions (repeated measures). Classic kinematic studies often analyse discrete summaries of the sample curves, discarding important information and providing biased results. As the sample data are obviously curves, a Functional Data Analysis approach is proposed to solve the problem of testing the equality of the mean curves of a functional variable observed on several independent groups under different treatments or time periods. A novel approach for Functional Analysis of Variance (FANOVA) for repeated measures that takes into account the complete curves is introduced. By assuming a basis expansion for each sample curve, the two-way FANOVA problem is reduced to Multivariate ANOVA (MANOVA) for the multivariate response of basis coefficients. Then, two different approaches for MANOVA with repeated measures are considered. In addition, an extensive simulation study is developed to check their performance. Finally, two applications with gait data are developed.
      PubDate: 2022-04-09
       
  • Early identification of biliary atresia using subspace and the bootstrap
           methods

      Abstract: In clinical medicine, physicians often rely on information derived from medical imaging systems, such as image data for diagnosis. To detect disease early, physicians extract essential information from data manually to distinguish accurately between positive and negative cases of disease. In recent years, deep learning (DL) has been used for this purpose, attracting the attention of prominent researchers because of its excellent performance. Consequently, DL and other artificial intelligence (AI) technologies are expected to develop further through integration with statistical and other approaches. Here, we examine biliary atresia (BA), a rare disease that affects primarily infants. Our study focuses on the identification of BA from image data (stool images of BA patients). Using AI and statistical approaches, we propose a machine learning classifier (model) for accurate diagnosis, efficient classification, and early detection of BA after exposure to limited training data. In an initial study, we used the subspace pattern recognition method for the development of a similar classifier. In this study, we propose the development of a filter based on the subspace method and a statistical approach. The filter enables the classifier to extract essential information from image data and discriminate efficiently between BA and non-BA patients.
      PubDate: 2022-04-04
       
  • Robust mixture regression modeling based on two-piece scale mixtures of
           normal distributions

      Abstract: The inference of mixture regression models (MRM) is traditionally based on the normal (symmetry) assumption of component errors and thus is sensitive to outliers or symmetric/asymmetric light-/heavy-tailed errors. To deal with these problems, some new mixture regression models have been proposed recently. In this paper, a general class of robust mixture regression models is presented based on the two-piece scale mixtures of normal (TP-SMN) distributions. The proposed model is flexible enough to accommodate asymmetry and heavy tails simultaneously. The stochastic representation of the proposed model enables us to easily implement an EM-type algorithm to estimate the unknown parameters of the model based on a penalized likelihood. In addition, the performance of the considered estimators is illustrated using a simulation study and a real data example.
      PubDate: 2022-03-23
       
  • Kurtosis removal for data pre-processing

      Abstract: Mesokurtic projections are linear projections with null fourth cumulants. They might be useful data pre-processing tools when nonnormality, as measured by the fourth cumulants, is either an opportunity or a challenge. Nonnull fourth cumulants are opportunities when projections with extreme kurtosis are used to identify interesting nonnormal features, for example clusters and outliers. Unfortunately, this approach suffers from the curse of dimensionality, which may be addressed by projecting the data onto the subspace orthogonal to mesokurtic projections. Nonnull fourth cumulants are challenges when using statistical methods whose sampling properties heavily depend on the fourth cumulants themselves. Mesokurtic projections ease the problem by allowing the use of the inferential properties of the same methods under normality. The paper shows necessary and sufficient conditions for the existence of mesokurtic projections and compares them with other gaussianization methods. Theoretical and empirical results suggest that mesokurtic transformations are particularly useful when sampling from finite normal mixtures. The practical use of mesokurtic projections is illustrated with the AIS and the RANDU datasets.
      PubDate: 2022-03-19
       
  • Over-optimistic evaluation and reporting of novel cluster algorithms: an
           illustrative study

      Abstract: When researchers publish new cluster algorithms, they usually demonstrate the strengths of their novel approaches by comparing the algorithms’ performance with existing competitors. However, such studies are likely to be optimistically biased towards the new algorithms, as the authors have a vested interest in presenting their method as favorably as possible in order to increase their chances of getting published. Therefore, the superior performance of newly introduced cluster algorithms is over-optimistic and might not be confirmed in independent benchmark studies performed by neutral and unbiased authors. This problem is known among many researchers, but so far, the different mechanisms leading to over-optimism in cluster algorithm evaluation have never been systematically studied and discussed. Researchers are thus often not aware of the full extent of the problem. We present an illustrative study to illuminate the mechanisms by which authors, consciously or unconsciously, paint their cluster algorithm’s performance in an over-optimistic light. Using the recently published cluster algorithm Rock as an example, we demonstrate how optimization of the used datasets or data characteristics, of the algorithm’s parameters and of the choice of the competing cluster algorithms leads to Rock’s performance appearing better than it actually is. Our study is thus a cautionary tale that illustrates how easy it can be for researchers to claim apparent “superiority” of a new cluster algorithm. This illuminates the vital importance of strategies for avoiding the problems of over-optimism (such as, e.g., neutral benchmark studies), which we also discuss in the article.
      PubDate: 2022-03-17
       
  • On discriminating between lognormal and Pareto tail: an unsupervised
           mixture-based approach

      Abstract: Many stochastic models in economics and finance are described by distributions with a lognormal body. Testing for a possible Pareto tail and estimating the parameters of the Pareto distribution in these models is an important topic. Although the problem has been extensively studied in the literature, most applications are characterized by some weaknesses. We propose a method that exploits all the available information by taking into account the data generating process of the whole population. After estimating a lognormal–Pareto mixture with a known threshold via the EM algorithm, we exploit this result to develop an unsupervised tail estimation approach based on the maximization of the profile likelihood function. Monte Carlo experiments and two empirical applications to the size of US metropolitan areas and of firms in an Italian district confirm that the proposed method works well and outperforms two commonly used techniques. Simulation results are available in an online supplementary appendix.
      PubDate: 2022-03-08
       
  • Strong consistency of the MLE under two-parameter Gamma mixture models
           with a structural scale parameter

      Abstract: We study the strong consistency of the maximum likelihood estimator under a special finite mixture of two-parameter Gamma distributions. Somewhat surprisingly, the likelihood function under a Gamma mixture with a set of independent and identically distributed observations is unbounded. There exist many sets of nonsensical parameter values at which the likelihood value is arbitrarily large. This leads to an inconsistent, or arguably undefined, maximum likelihood estimator. Interestingly, when the scale or shape parameter in the finite Gamma mixture model is structural, the maximum likelihood estimator of the mixing distribution is well defined and strongly consistent. Establishing the consistency when the shape parameter is structural is technically less challenging and is already given in the literature. In this paper, we prove the consistency when the scale parameter is structural and provide some illustrative simulation experiments. We further include an application example of the model with a structural scale parameter to salary potential data. We conclude that the Gamma mixture distribution with a structural scale parameter provides another flexible yet relatively parsimonious model for observations with intrinsic positive values.
      PubDate: 2022-03-01
       
  • Mixed Deep Gaussian Mixture Model: a clustering model for mixed datasets

      Abstract: Clustering mixed data presents numerous challenges inherent to the very heterogeneous nature of the variables. A clustering algorithm should be able, despite this heterogeneity, to extract discriminant pieces of information from the variables in order to design groups. In this work we introduce a multilayer architecture model-based clustering method called Mixed Deep Gaussian Mixture Model that can be viewed as an automatic way to merge the clustering performed separately on continuous and non-continuous data. This architecture is flexible and can be adapted to mixed as well as to continuous or non-continuous data. In this sense we generalize Generalized Linear Latent Variable Models and Deep Gaussian Mixture Models. We also design a new initialisation strategy and a data-driven method that selects the best specification of the model and the optimal number of clusters for a given dataset. In addition, our model provides continuous low-dimensional representations of the data, which can be a useful tool to visualize mixed datasets. Finally, we validate the performance of our approach by comparing its results with state-of-the-art mixed data clustering models over several commonly used datasets.
      PubDate: 2022-03-01
       
  • Multivariate cluster weighted models using skewed distributions

      Abstract: Much work has been done in the area of the cluster weighted model (CWM), which extends the finite mixture of regression model to include modelling of the covariates. Although many types of distributions have been considered for both the response(s) and covariates, to our knowledge skewed distributions have not yet been considered in this paradigm. Herein, a family of 24 novel CWMs is considered which allows both the responses and covariates to be modelled using one of four skewed distributions (the generalized hyperbolic and three of its skewed special cases, i.e., the skew-t, the variance-gamma and the normal-inverse Gaussian distributions) or the normal distribution. Parameter estimation is performed using the expectation-maximization algorithm and both simulated and real data are used for illustration.
      PubDate: 2022-03-01
       
  • Robust optimal classification trees under noisy labels

      Abstract: In this paper we propose a novel methodology to construct Optimal Classification Trees that takes into account that noisy labels may occur in the training sample. The motivation of this new methodology is based on the superadditive effect of combining margin-based classifiers and outlier detection techniques. Our approach rests on two main elements: (1) the splitting rules for the classification trees are designed to maximize the separation margin between classes applying the paradigm of SVM; and (2) some of the labels of the training sample are allowed to be changed during the construction of the tree in an attempt to detect the label noise. Both features are considered and integrated to design the resulting Optimal Classification Tree. We present a Mixed Integer Non Linear Programming formulation for the problem, suitable to be solved using any of the available off-the-shelf solvers. The model is analyzed and tested on a battery of standard datasets taken from the UCI Machine Learning repository, showing the effectiveness of our approach. Our computational results show that in most cases the new methodology outperforms, in both accuracy and AUC, the results of the benchmarks provided by OCT and OCT-H.
      PubDate: 2022-03-01
       
  • Editorial for ADAC issue 1 of volume 16 (2022)

      PubDate: 2022-03-01
      DOI: 10.1007/s11634-022-00494-7
       
  • Unobserved classes and extra variables in high-dimensional discriminant
           analysis

      Abstract: In supervised classification problems, the test set may contain data points belonging to classes not observed in the learning phase. Moreover, the same units in the test data may be measured on a set of additional variables recorded at a subsequent stage with respect to when the learning sample was collected. In this situation, the classifier built in the learning phase needs to adapt to handle potential unknown classes and the extra dimensions. We introduce a model-based discriminant approach, Dimension-Adaptive Mixture Discriminant Analysis (D-AMDA), which can detect unobserved classes and adapt to the increasing dimensionality. Model estimation is carried out via a full inductive approach based on an EM algorithm. The method is then embedded in a more general framework for adaptive variable selection and classification suitable for data of large dimensions. A simulation study and an artificial experiment related to classification of adulterated honey samples are used to validate the ability of the proposed framework to deal with complex situations.
      PubDate: 2022-03-01
      DOI: 10.1007/s11634-021-00474-3
       
  • Poisson degree corrected dynamic stochastic block model

      Abstract: The Stochastic Block Model (SBM) provides a statistical tool for modeling and clustering network data. In this paper, we propose an extension of this model for discrete-time dynamic networks that takes into account the variability in node degrees, allowing us to model a broader class of networks. We develop a probabilistic model that generates temporal graphs with a dynamic cluster structure and time-dependent degree corrections for each node. Thanks to these degree corrections, the nodes can have variable in- and out-degrees, allowing us to model complex cluster structures as well as interactions that decrease or increase over time. We compare the proposed model to a model without degree correction and highlight its advantages in the case of inhomogeneous degree distributions in the clusters and in the recovery of unstable cluster dynamics. We propose an inference procedure based on Variational Expectation-Maximization (VEM) that also provides the means to estimate the time-dependent degree corrections. Extensive experiments on simulated and real datasets confirm the benefits of our approach and show the effectiveness of the proposed algorithm.
      PubDate: 2022-02-27
      DOI: 10.1007/s11634-022-00492-9
       
  • Gaussian mixture model with an extended ultrametric covariance structure

      Abstract: Gaussian Mixture Models (GMMs) are one of the most widespread methodologies for model-based clustering. They assume a multivariate Gaussian distribution for each component of the mixture, centered at the mean vector and with volume, shape and orientation derived by the covariance matrix. To reduce the large number of parameters produced by the covariance matrices, parsimonious parameterizations of the latter were proposed in literature, e.g., the eigen-decomposition and the parsimonious GMMs based on mixtures of probabilistic principal component analyzers and mixtures of factor analyzers. We introduce a new parameterization of a covariance matrix by defining an extended ultrametric covariance matrix and we implement it into a GMM. This structure can be used to describe multidimensional phenomena which are characterized by nested latent concepts having different levels of abstraction, from the most specific to the most general. The proposal is able to pinpoint a hierarchical structure on variables for each component of the GMM, thus identifying a different characterization of a multidimensional phenomenon for each component (cluster, subpopulation) of the mixture. At the same time, it defines a new parsimonious GMM since the ultrametric covariance structure reconstructs the relationships among variables with a limited number of parameters. The proposal is applied on synthetic and real data. On the former it shows good performance in terms of classification when compared to the other existing parameterizations, and on the latter it also provides insight into the hierarchical relationships among the variables for each cluster.
      PubDate: 2022-02-25
      DOI: 10.1007/s11634-021-00488-x
       
  • Model-based clustering and outlier detection with missing data

      Abstract: The use of the multivariate contaminated normal (MCN) distribution in model-based clustering is recommended to cluster data characterized by mild outliers; the model can at the same time detect outliers automatically and produce robust parameter estimates in each cluster. However, one of the limitations of this approach is that it requires complete data, i.e. the MCN cannot be used directly on data with missing values. In this paper, we develop a framework for fitting a mixture of MCN distributions to incomplete data sets, i.e. data sets with some values missing at random. Parameter estimation is obtained using the expectation-conditional maximization algorithm, a variant of the expectation-maximization algorithm in which the traditional maximization steps are replaced by simpler conditional maximization steps. We perform a simulation study to compare the results of our model to a mixture of multivariate normal and Student’s t distributions for incomplete data. The simulation also includes a study on the effect of the percentage of missing data on the performance of the three algorithms. The model is then applied to the Automobile data set (UCI machine learning repository). The results show that, while the Student’s t distribution gives similar classification performance, the MCN works better in detecting outliers with a lower false positive rate of outlier detection. The performance of all the techniques decreases linearly as the percentage of missing values increases.
      PubDate: 2022-01-22
      DOI: 10.1007/s11634-021-00476-1
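
      The way a contaminated normal distribution flags outliers can be sketched in the univariate case as a two-part mixture that shares one mean: a "good" normal with weight alpha and a "bad" normal whose variance is inflated by a factor eta. This is only an illustration under simplifying assumptions; it is univariate, uses complete data, and does not implement the paper's expectation-conditional maximization algorithm. The function names and parameter values are hypothetical.

```python
import math


def normal_pdf(x, mean, variance):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / variance) / math.sqrt(
        2.0 * math.pi * variance
    )


def contaminated_normal_pdf(x, mean, variance, alpha, eta):
    """Mixture of a 'good' normal and a variance-inflated 'bad' normal."""
    good = alpha * normal_pdf(x, mean, variance)
    bad = (1.0 - alpha) * normal_pdf(x, mean, eta * variance)
    return good + bad


def prob_good(x, mean, variance, alpha, eta):
    """Posterior probability that x comes from the 'good' component."""
    good = alpha * normal_pdf(x, mean, variance)
    return good / contaminated_normal_pdf(x, mean, variance, alpha, eta)


# Hypothetical parameters: 90% good points, bad points with 10x the variance.
p_center = prob_good(0.0, mean=0.0, variance=1.0, alpha=0.9, eta=10.0)
p_far = prob_good(5.0, mean=0.0, variance=1.0, alpha=0.9, eta=10.0)
```

      A point is typically flagged as an outlier when its posterior probability of being "good" falls below 0.5, which is the case for the distant point here but not for the one at the mean.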
       
  • An empirical comparison and characterisation of nine popular clustering
           methods

      Abstract: Nine popular clustering methods are applied to 42 real data sets. The aim is to give a detailed characterisation of the methods by means of several cluster validation indexes that measure various individual aspects of the resulting clusters such as small within-cluster distances, separation of clusters, closeness to a Gaussian distribution etc. as introduced in Hennig (in: Data analysis and applications 1: clustering and regression, modeling-estimating, forecasting and data mining, ISTE Ltd., London, 2019). 30 of the data sets come with a “true” clustering. On these data sets the similarity of the clusterings from the nine methods to the “true” clusterings is explored. Furthermore, a mixed effects regression relates the observable individual aspects of the clusters to the similarity with the “true” clusterings, which in real clustering problems is unobservable. The study gives new insight not only into the ability of the methods to discover “true” clusterings, but also into properties of clusterings that can be expected from the methods, which is crucial for the choice of a method in a real situation without a given “true” clustering.
      PubDate: 2022-01-09
      DOI: 10.1007/s11634-021-00478-z
       
  • Robust clustering of functional directional data

      Abstract: A robust approach for clustering functional directional data is proposed. The proposal adapts “impartial trimming” techniques to this particular framework. Impartial trimming uses the dataset itself to tell us which curves appear to be the most outlying. A feasible algorithm, justified by some theoretical properties, is proposed for its practical implementation. A “warping” approach is also introduced, which allows controlled time warping to be included in the robust clustering procedure to detect typical “templates”. The proposed methodology is illustrated in a real data analysis problem where it is applied to cluster aircraft trajectories.
      PubDate: 2021-12-09
      DOI: 10.1007/s11634-021-00482-3
       
 
JournalTOCs
School of Mathematical and Computer Sciences
Heriot-Watt University
Edinburgh, EH14 4AS, UK
Email: journaltocs@hw.ac.uk
Tel: +00 44 (0)131 4513762
 



JournalTOCs © 2009-