  STATISTICS (Total: 130 journals)
Advances in Complex Systems     Hybrid Journal   (Followers: 10)
Advances in Data Analysis and Classification     Hybrid Journal   (Followers: 61)
Annals of Applied Statistics     Full-text available via subscription   (Followers: 39)
Applied Categorical Structures     Hybrid Journal   (Followers: 4)
Argumentation et analyse du discours     Open Access   (Followers: 10)
Asian Journal of Mathematics & Statistics     Open Access   (Followers: 8)
AStA Advances in Statistical Analysis     Hybrid Journal   (Followers: 4)
Australian & New Zealand Journal of Statistics     Hybrid Journal   (Followers: 13)
Bernoulli     Full-text available via subscription   (Followers: 9)
Biometrical Journal     Hybrid Journal   (Followers: 10)
Biometrics     Hybrid Journal   (Followers: 51)
British Journal of Mathematical and Statistical Psychology     Full-text available via subscription   (Followers: 18)
Building Simulation     Hybrid Journal   (Followers: 1)
Bulletin of Statistics     Full-text available via subscription   (Followers: 4)
CHANCE     Hybrid Journal   (Followers: 5)
Communications in Statistics - Simulation and Computation     Hybrid Journal   (Followers: 9)
Communications in Statistics - Theory and Methods     Hybrid Journal   (Followers: 11)
Computational Statistics     Hybrid Journal   (Followers: 14)
Computational Statistics & Data Analysis     Hybrid Journal   (Followers: 37)
Current Research in Biostatistics     Open Access   (Followers: 8)
Decisions in Economics and Finance     Hybrid Journal   (Followers: 11)
Demographic Research     Open Access   (Followers: 16)
Electronic Journal of Statistics     Open Access   (Followers: 8)
Engineering With Computers     Hybrid Journal   (Followers: 5)
Environmental and Ecological Statistics     Hybrid Journal   (Followers: 7)
ESAIM: Probability and Statistics     Full-text available via subscription   (Followers: 5)
Extremes     Hybrid Journal   (Followers: 2)
Fuzzy Optimization and Decision Making     Hybrid Journal   (Followers: 8)
Geneva Papers on Risk and Insurance - Issues and Practice     Hybrid Journal   (Followers: 13)
Handbook of Numerical Analysis     Full-text available via subscription   (Followers: 5)
Handbook of Statistics     Full-text available via subscription   (Followers: 7)
IEA World Energy Statistics and Balances     Full-text available via subscription   (Followers: 2)
International Journal of Computational Economics and Econometrics     Hybrid Journal   (Followers: 6)
International Journal of Quality, Statistics, and Reliability     Open Access   (Followers: 17)
International Journal of Stochastic Analysis     Open Access   (Followers: 3)
International Statistical Review     Hybrid Journal   (Followers: 12)
International Trade by Commodity Statistics - Statistiques du commerce international par produit     Full-text available via subscription  
Journal of Algebraic Combinatorics     Hybrid Journal   (Followers: 4)
Journal of Applied Statistics     Hybrid Journal   (Followers: 20)
Journal of Biopharmaceutical Statistics     Hybrid Journal   (Followers: 20)
Journal of Business & Economic Statistics     Full-text available via subscription   (Followers: 39, SJR: 3.664, CiteScore: 2)
Journal of Combinatorial Optimization     Hybrid Journal   (Followers: 7)
Journal of Computational & Graphical Statistics     Full-text available via subscription   (Followers: 20)
Journal of Econometrics     Hybrid Journal   (Followers: 82)
Journal of Educational and Behavioral Statistics     Hybrid Journal   (Followers: 6)
Journal of Forecasting     Hybrid Journal   (Followers: 17)
Journal of Global Optimization     Hybrid Journal   (Followers: 7)
Journal of Interactive Marketing     Hybrid Journal   (Followers: 10)
Journal of Mathematics and Statistics     Open Access   (Followers: 8)
Journal of Nonparametric Statistics     Hybrid Journal   (Followers: 6)
Journal of Probability and Statistics     Open Access   (Followers: 10)
Journal of Risk and Uncertainty     Hybrid Journal   (Followers: 32)
Journal of Statistical and Econometric Methods     Open Access   (Followers: 5)
Journal of Statistical Physics     Hybrid Journal   (Followers: 13)
Journal of Statistical Planning and Inference     Hybrid Journal   (Followers: 8)
Journal of Statistical Software     Open Access   (Followers: 20, SJR: 13.802, CiteScore: 16)
Journal of the American Statistical Association     Full-text available via subscription   (Followers: 72, SJR: 3.746, CiteScore: 2)
Journal of the Korean Statistical Society     Hybrid Journal   (Followers: 1)
Journal of the Royal Statistical Society Series C (Applied Statistics)     Hybrid Journal   (Followers: 31)
Journal of the Royal Statistical Society, Series A (Statistics in Society)     Hybrid Journal   (Followers: 26)
Journal of the Royal Statistical Society, Series B (Statistical Methodology)     Hybrid Journal   (Followers: 43)
Journal of Theoretical Probability     Hybrid Journal   (Followers: 3)
Journal of Time Series Analysis     Hybrid Journal   (Followers: 16)
Journal of Urbanism: International Research on Placemaking and Urban Sustainability     Hybrid Journal   (Followers: 30)
Law, Probability and Risk     Hybrid Journal   (Followers: 8)
Lifetime Data Analysis     Hybrid Journal   (Followers: 7)
Mathematical Methods of Statistics     Hybrid Journal   (Followers: 4)
Measurement Interdisciplinary Research and Perspectives     Hybrid Journal   (Followers: 1)
Metrika     Hybrid Journal   (Followers: 4)
Modelling of Mechanical Systems     Full-text available via subscription   (Followers: 1)
Monte Carlo Methods and Applications     Hybrid Journal   (Followers: 6)
Monthly Statistics of International Trade - Statistiques mensuelles du commerce international     Full-text available via subscription   (Followers: 2)
Multivariate Behavioral Research     Hybrid Journal   (Followers: 5)
Optimization Letters     Hybrid Journal   (Followers: 2)
Optimization Methods and Software     Hybrid Journal   (Followers: 8)
Oxford Bulletin of Economics and Statistics     Hybrid Journal   (Followers: 34)
Pharmaceutical Statistics     Hybrid Journal   (Followers: 17)
Probability Surveys     Open Access   (Followers: 4)
Queueing Systems     Hybrid Journal   (Followers: 7)
Research Synthesis Methods     Hybrid Journal   (Followers: 7)
Review of Economics and Statistics     Hybrid Journal   (Followers: 124)
Review of Socionetwork Strategies     Hybrid Journal  
Risk Management     Hybrid Journal   (Followers: 15)
Sankhya A     Hybrid Journal   (Followers: 2)
Scandinavian Journal of Statistics     Hybrid Journal   (Followers: 9)
Sequential Analysis: Design Methods and Applications     Hybrid Journal  
Significance     Hybrid Journal   (Followers: 7)
Sociological Methods & Research     Hybrid Journal   (Followers: 37)
SourceOCDE Comptes nationaux et Statistiques retrospectives     Full-text available via subscription  
SourceOCDE Statistiques : Sources et methodes     Full-text available via subscription  
SourceOECD Bank Profitability Statistics - SourceOCDE Rentabilite des banques     Full-text available via subscription   (Followers: 1)
SourceOECD Insurance Statistics - SourceOCDE Statistiques d'assurance     Full-text available via subscription   (Followers: 2)
SourceOECD Main Economic Indicators - SourceOCDE Principaux indicateurs economiques     Full-text available via subscription   (Followers: 1)
SourceOECD Measuring Globalisation Statistics - SourceOCDE Mesurer la mondialisation - Base de donnees statistiques     Full-text available via subscription  
SourceOECD Monthly Statistics of International Trade     Full-text available via subscription   (Followers: 1)
SourceOECD National Accounts & Historical Statistics     Full-text available via subscription  
SourceOECD OECD Economic Outlook Database - SourceOCDE Statistiques des Perspectives economiques de l'OCDE     Full-text available via subscription   (Followers: 2)
SourceOECD Science and Technology Statistics - SourceOCDE Base de donnees des sciences et de la technologie     Full-text available via subscription  
SourceOECD Statistics Sources & Methods     Full-text available via subscription   (Followers: 1)
SourceOECD Taxing Wages Statistics - SourceOCDE Statistiques des impots sur les salaires     Full-text available via subscription  
Stata Journal     Full-text available via subscription   (Followers: 9)
Statistica Neerlandica     Hybrid Journal   (Followers: 1)
Statistical Applications in Genetics and Molecular Biology     Hybrid Journal   (Followers: 5)
Statistical Communications in Infectious Diseases     Hybrid Journal  
Statistical Inference for Stochastic Processes     Hybrid Journal   (Followers: 3)
Statistical Methodology     Hybrid Journal   (Followers: 7)
Statistical Methods and Applications     Hybrid Journal   (Followers: 6)
Statistical Methods in Medical Research     Hybrid Journal   (Followers: 27)
Statistical Modelling     Hybrid Journal   (Followers: 19)
Statistical Papers     Hybrid Journal   (Followers: 4)
Statistical Science     Full-text available via subscription   (Followers: 13)
Statistics & Probability Letters     Hybrid Journal   (Followers: 13)
Statistics & Risk Modeling     Hybrid Journal   (Followers: 2)
Statistics and Computing     Hybrid Journal   (Followers: 13)
Statistics and Economics     Open Access   (Followers: 1)
Statistics in Medicine     Hybrid Journal   (Followers: 191)
Statistics, Politics and Policy     Hybrid Journal   (Followers: 6)
Statistics: A Journal of Theoretical and Applied Statistics     Hybrid Journal   (Followers: 14)
Stochastic Models     Hybrid Journal   (Followers: 3)
Stochastics An International Journal of Probability and Stochastic Processes: formerly Stochastics and Stochastics Reports     Hybrid Journal   (Followers: 2)
Structural and Multidisciplinary Optimization     Hybrid Journal   (Followers: 12)
Teaching Statistics     Hybrid Journal   (Followers: 7)
Technology Innovations in Statistics Education (TISE)     Open Access   (Followers: 2)
TEST     Hybrid Journal   (Followers: 3)
The American Statistician     Full-text available via subscription   (Followers: 24)
The Annals of Applied Probability     Full-text available via subscription   (Followers: 8)
The Annals of Probability     Full-text available via subscription   (Followers: 10)
The Annals of Statistics     Full-text available via subscription   (Followers: 34)
The Canadian Journal of Statistics / La Revue Canadienne de Statistique     Hybrid Journal   (Followers: 11)
Wiley Interdisciplinary Reviews - Computational Statistics     Hybrid Journal   (Followers: 1)

Advances in Data Analysis and Classification
Journal Prestige (SJR): 1.09
Citation Impact (citeScore): 1
Number of Followers: 61  
  Hybrid Journal (it can contain Open Access articles)
ISSN (Print) 1862-5355 - ISSN (Online) 1862-5347
Published by Springer-Verlag  [2626 journals]
  • PCA-KL: a parametric dimensionality reduction approach for unsupervised metric learning
    • Abstract: Dimensionality reduction algorithms are powerful mathematical tools for data analysis and visualization. In many pattern recognition applications, a feature extraction step is often required to mitigate the curse of dimensionality, a collection of negative effects caused by an arbitrary increase in the number of features in classification tasks. Principal Component Analysis (PCA) is a classical statistical method that creates new features based on linear combinations of the original ones through the eigenvectors of the covariance matrix. In this paper, we propose PCA-KL, a parametric dimensionality reduction algorithm for unsupervised metric learning, based on the computation of the entropic covariance matrix, a surrogate for the covariance matrix of the data obtained in terms of the relative entropy between local Gaussian distributions instead of the usual Euclidean distance between the data points. Numerical experiments with several real datasets show that the proposed method is capable of producing better defined clusters and also higher classification accuracy in comparison to regular PCA and several manifold learning algorithms, making PCA-KL a promising alternative for unsupervised metric learning.
      PubDate: 2021-01-07
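    The abstract above replaces Euclidean distance with the relative entropy (Kullback-Leibler divergence) between local Gaussian distributions. As a rough illustration of that ingredient only, the sketch below evaluates the standard closed-form KL divergence between two univariate Gaussians. It is not the authors' entropic covariance matrix; the function names `kl_gaussian` and `symmetric_kl` are hypothetical, and the restriction to one dimension is an assumption made for brevity.

```python
import math


def kl_gaussian(mu_p, var_p, mu_q, var_q):
    """KL(P || Q) for univariate Gaussians P = N(mu_p, var_p), Q = N(mu_q, var_q).

    Standard closed form:
    0.5 * (log(var_q / var_p) + (var_p + (mu_p - mu_q)^2) / var_q - 1)
    """
    if var_p <= 0 or var_q <= 0:
        raise ValueError("variances must be positive")
    return 0.5 * (
        math.log(var_q / var_p)
        + (var_p + (mu_p - mu_q) ** 2) / var_q
        - 1.0
    )


def symmetric_kl(mu_p, var_p, mu_q, var_q):
    """Symmetrized divergence: KL(P || Q) + KL(Q || P).

    KL itself is not symmetric, so a symmetrized version is one common
    (assumed, not source-confirmed) way to use it like a dissimilarity.
    """
    return kl_gaussian(mu_p, var_p, mu_q, var_q) + kl_gaussian(
        mu_q, var_q, mu_p, var_p
    )


if __name__ == "__main__":
    # Identical distributions have zero divergence.
    print(kl_gaussian(0.0, 1.0, 0.0, 1.0))
    # Same variance, means one unit apart.
    print(kl_gaussian(0.0, 1.0, 1.0, 1.0))
```

    Unlike Euclidean distance between the means, this quantity also reacts to differences in spread: two Gaussians with equal means but different variances have a nonzero divergence.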
  • Functional data clustering by projection into latent generalized hyperbolic subspaces
    • Abstract: We introduce a latent subspace model which facilitates model-based clustering of functional data. Flexible clustering is attained by imposing jointly generalized hyperbolic distributions on projections of basis expansion coefficients into group-specific subspaces. The model acquires parsimony by assuming these subspaces are of relatively low dimension. Parameter estimation is done through a multicycle ECM algorithm. Applications to simulated and real datasets illustrate competitive clustering capabilities and demonstrate the model's general applicability.
      PubDate: 2021-01-07
  • A process framework for inducing and explaining Datalog theories
    • Abstract: With the increasing prevalence of Machine Learning in everyday life, a growing number of people will be provided with Machine-Learned assessments on a regular basis. We believe that human users interacting with systems based on Machine-Learned classifiers will demand and profit from the systems’ decisions being explained in an approachable and comprehensive way. We developed a general process framework for logic-rule-based classifiers facilitating mutual exchange between system and user. The framework constitutes a guideline for how a system can apply Inductive Logic Programming in order to provide comprehensive explanations for classification choices and to empower users to evaluate and correct the system’s decisions. It also includes users’ corrections being integrated into the system’s core logic rules via retraining in order to increase the overall performance of the human-computer system. The framework suggests various forms of explanations (such as natural language argumentations, near misses emphasizing unique characteristics, or image annotations) to be integrated into the system.
      PubDate: 2021-01-05
  • Automatic gait classification patterns in spastic hemiplegia
    • Abstract: Clinical gait analysis and the interpretation of related records are a powerful tool to aid clinicians in the diagnosis, treatment and prognosis of human gait disabilities. The aim of this study is to investigate kinematic, kinetic, and electromyographic (EMG) data from child patients with spastic hemiplegia (SH) in order to discover useful patterns in human gait. Data mining techniques and classification algorithms were used to explore data from 278 SH patients. We studied different techniques for selection of attributes in order to get the best classification scores. For kinematics data, the dimension of the initial attribute space was 1033, which was reduced to 78 using the Ranker and FilteredAttributeEval algorithms. For kinetics data, the best combination of attributes was determined by SubsetSizeForward Selection and CfsSubEval with a reduction of attribute space size from 931 to 25. Decision-tree based learning algorithms, in particular the logistic model tree based on logistic regression and J48, produced the best scores for correct SH gait classification (89.393% for kinetics, 89.394% for kinematics, and 97.183% for EMG). To evaluate the effectiveness of combined feature selection methods with the classifiers, quantitative measures of model quality were used (kappa statistic, measures of sensitivity and specificity, verisimilitude rates, and ROC curves). Comparison of these results to a qualitative assessment from physicians showed a success rate of 100% for results from kinematics and EMG data, while for kinetics data the success rate was 60%. The patterns resulting from automatic data analysis of gait records have been integrated into an end-user application in order to support medical decision-making.
      PubDate: 2021-01-04
  • A bivariate finite mixture growth model with selection
    • Abstract: A model is proposed to analyze longitudinal data where two response variables are available, one of which is a binary indicator of selection and the other is continuous and observed only if the first is equal to 1. The model also accounts for individual covariates and may be considered as a bivariate finite mixture growth model as it is based on three submodels: (i) a probit model for the selection variable; (ii) a linear model for the continuous variable; and (iii) a multinomial logit model for the class membership. To suitably address endogeneity, the first two components rely on correlated errors as in a standard selection model. The proposed approach is applied to the analysis of the dynamics of household portfolio choices based on an unbalanced panel dataset of Italian households over the 1998–2014 period. For this dataset, we identify three latent classes of households with specific investment behaviors and we assess the effect of individual characteristics on households’ portfolio choices. Our empirical findings also confirm the need to jointly model risky asset market participation and the conditional portfolio share to properly analyze investment behaviors over the life-cycle.
      PubDate: 2020-12-29
  • Adapted single-cell consensus clustering (adaSC3)
    • Abstract: The analysis of single-cell RNA sequencing data is of great importance in health research. It challenges data scientists, but has enormous potential in the context of personalized medicine. The clustering of single cells aims to detect different subgroups of cell populations within a patient in a data-driven manner. Some comparison studies identify single-cell consensus clustering (SC3), proposed by Kiselev et al. (Nat Methods 14(5):483–486, 2017), as the best method for classifying single-cell RNA sequencing data. SC3 includes Laplacian eigenmaps and a principal component analysis (PCA). Our proposal of unsupervised adapted single-cell consensus clustering (adaSC3) suggests replacing the linear PCA by diffusion maps, a non-linear method that takes the transition of single cells into account. We investigate the performance of adaSC3 in terms of accuracy on the data sets of the original source of SC3 as well as in a simulation study. A comparison of adaSC3 with SC3 as well as with related algorithms based on further alternative dimension reduction techniques shows a quite convincing behavior of adaSC3.
      PubDate: 2020-12-15
  • Data generation for composite-based structural equation modeling methods
    • Abstract: Examining the efficacy of composite-based structural equation modeling (SEM) features prominently in research. However, studies analyzing the efficacy of corresponding estimators usually rely on factor model data. Thereby, they assess and analyze their performance on erroneous grounds (i.e., factor model data instead of composite model data). A potential reason for this malpractice lies in the lack of available composite model-based data generation procedures for prespecified model parameters in the structural model and the measurement models. Addressing this gap in research, we derive model formulations and present a composite model-based data generation approach. The findings will assist researchers in their composite-based SEM simulation studies.
      PubDate: 2020-12-01
  • On the use of quantile regression to deal with heterogeneity: the case of multi-block data
    • Abstract: The aim of the paper is to propose a quantile regression based strategy to assess heterogeneity in a multi-block type data structure. Specifically, the paper deals with a particular data structure where several blocks of variables are observed on the same units and a structure of relations is assumed between the different blocks. The idea is that quantile regression complements the results of the least squares regression by evaluating the impact of regressors on the entire distribution of the dependent variable, and not exclusively on the expected value. By taking advantage of this, the proposed approach analyses the relationship between a dependent variable block and a set of regressor blocks while highlighting possible similarities among the statistical units. An empirical analysis is provided in the consumer analysis framework with the aim of clustering groups of consumers according to the similarities in the dependence structure between their overall liking and the liking for different drivers.
      PubDate: 2020-12-01
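    The abstract above contrasts quantile regression with least squares, which targets only the expected value. A minimal sketch of the underlying idea, assuming the standard check (pinball) loss and a constant-only model: the constant that minimizes the total check loss at level tau is a sample quantile rather than the mean. The helper names `pinball_loss`, `total_loss` and `best_constant` are hypothetical, and this is not the multi-block procedure proposed in the paper.

```python
def pinball_loss(residual, tau):
    """Check (pinball) loss rho_tau(u) = u * (tau - 1[u < 0])."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def total_loss(values, candidate, tau):
    """Total check loss when every observation is predicted by one constant."""
    return sum(pinball_loss(v - candidate, tau) for v in values)


def best_constant(values, tau):
    """Search the observed values for the constant with the smallest total loss.

    A minimizer of the total check loss can always be found among the data
    points, so a search over the observations is enough for this toy example.
    """
    return min(sorted(values), key=lambda c: total_loss(values, c, tau))


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 100.0]
    print(best_constant(data, 0.5))  # median-type fit, unaffected by the outlier
    print(sum(data) / len(data))     # least-squares fit of a constant (the mean)
```

    Varying tau moves the fitted constant across the distribution of the response, which is the sense in which quantile regression describes more than the conditional mean.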
  • The GNG neural network in analyzing consumer behaviour patterns: empirical research on a purchasing behaviour processes realized by the elderly
    • Abstract: The paper sheds light on the use of a self-learning GNG neural network for identification and exploration of purchasing behaviour patterns. The test was conducted on data collected from consumers aged 60 years and over, with regard to three product purchases. The primary data used to explore the purchasing behaviour patterns was collected during a survey carried out among elderly students at the Universities of Third Age in Slovenia, the Czech Republic and Poland, in the years 2017–2018. A total of six different types of purchasing patterns have been identified, namely the ‘thoughtful decision’, the ‘sensitive to recommendation’, the ‘beneficiary’, the ‘short thoughtful decision’, the ‘habitual decision’ and the ‘multiple’ patterns. The most significant differences in the purchasing patterns of the three national samples have been identified with regard to the process of purchasing a smartphone, while the most repetitive patterns have been identified with regard to the purchasing of a new product. The results significantly support the GNG network’s validity for identification of consumer behaviour patterns. The application of this method made it possible to identify and segment consumer groups quickly and effectively, to map the differences among these groups, and to compare the consumption behaviour expressed by consumers on different markets. The identified consumer purchase patterns may play a basic role for marketers in understanding consumer behaviour and then proposing tailored strategies in international marketing.
      PubDate: 2020-12-01
  • Mixtures of Dirichlet-Multinomial distributions for supervised and unsupervised classification of short text data
    • Abstract: Topic detection in short textual data is a challenging task due to its representation as a high-dimensional and extremely sparse document-term matrix. In this paper we focus on the problem of classifying textual data on the basis of their (unique) topic. For unsupervised classification, a popular approach called Mixture of Unigrams consists of considering a mixture of multinomial distributions over the word counts, each component corresponding to a different topic. The multinomial distribution can be easily extended by a Dirichlet prior to the compound mixtures of Dirichlet-Multinomial distributions, which is preferable for sparse data. We propose a gradient descent estimation method for fitting the model, and investigate supervised and unsupervised classification performance on real empirical problems.
      PubDate: 2020-12-01
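    The abstract above builds on the Dirichlet-Multinomial distribution, obtained by placing a Dirichlet prior on the multinomial probabilities and integrating them out. A minimal sketch of its log probability mass function for a single vector of word counts, using the standard closed form with log-gamma functions. The function name `dirichlet_multinomial_logpmf` is hypothetical; the mixture model and the gradient descent fitting procedure described in the paper are not reproduced here.

```python
import math


def dirichlet_multinomial_logpmf(counts, alpha):
    """Log pmf of the Dirichlet-Multinomial (compound multinomial) distribution.

    counts: non-negative integer word counts, one per vocabulary term
    alpha:  positive Dirichlet parameters, same length as counts
    """
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    if any(x < 0 for x in counts) or any(a <= 0 for a in alpha):
        raise ValueError("counts must be >= 0 and alpha must be > 0")

    n = sum(counts)
    alpha_sum = sum(alpha)

    # Multinomial coefficient: log n! - sum_k log x_k!
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    # Normalizing ratio: log Gamma(A) - log Gamma(n + A), with A = sum_k alpha_k
    log_ratio = math.lgamma(alpha_sum) - math.lgamma(n + alpha_sum)
    # Per-term factors: sum_k [log Gamma(x_k + alpha_k) - log Gamma(alpha_k)]
    log_terms = sum(
        math.lgamma(x + a) - math.lgamma(a) for x, a in zip(counts, alpha)
    )
    return log_coef + log_ratio + log_terms


if __name__ == "__main__":
    # Two terms, three words in total, flat Dirichlet prior.
    print(math.exp(dirichlet_multinomial_logpmf([2, 1], [1.0, 1.0])))
```

    In a mixture used for classification, each topic would carry its own parameter vector, and a document would be scored by combining these log probabilities with the mixing weights.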
  • Chained correlations for feature selection
    • Abstract: Data-driven algorithms stand and fall with the availability and quality of existing data sources. Both can be limited in high-dimensional settings (\(n \gg m\)). For example, supervised learning algorithms designed for molecular pheno- or genotyping are restricted to samples of the corresponding diagnostic classes. Samples of other related entities, such as arise in differential diagnosis, are usually not utilized in this learning scheme. Nevertheless, they might provide domain knowledge on the background or context of the original diagnostic task. In this work, we discuss the possibility of incorporating samples of foreign classes in the training of diagnostic classification models that can be related to the task of differential diagnosis. Especially in heterogeneous data collections comprising multiple diagnostic categories, the foreign ones can change the magnitude of available samples. More precisely, we utilize this information for the internal feature selection process of diagnostic models. We propose the use of chained correlations of original and foreign diagnostic classes. This method allows the detection of intermediate foreign classes by evaluating the correlation between class labels and features for each pair of original and foreign categories. Interestingly, this criterion does not require direct comparisons of the initial diagnostic groups and therefore might be suitable for settings with restricted data access.
      PubDate: 2020-12-01
  • SEM-Tree hybrid models in the preferences analysis of the members of Polish households
    • Abstract: The purpose of the paper is to identify the dimensions of the resource allocation strategy of Polish household members and to test the hypothesis concerning the risky shift effect in the relationship between the strategy of family decision making and the trade-off in the allocation of scarce family resources. These dimensions were identified on the basis of nationwide empirical data gathered on a representative sample of 1020 respondents nested in 410 households. SEM-Tree hybrid models, which combine confirmatory structural equation models with exploratory and predictive classification and regression trees, are used in the analysis of the results. This makes it possible to apply structural modeling to the study of heterogeneous populations and to assess the hierarchical impact of exogenous predictors on the identification of segments with separate and unique model structural parameters. The approach combines the advantages of a model-based approach (at the stage of constructing hypotheses on structural relationships and specifications of measurement models) and a data-exploration approach (at the stage of recursive division of the sample).
      PubDate: 2020-12-01
  • The ultrametric correlation matrix for modelling hierarchical latent
    • Abstract: Many relevant multidimensional phenomena are defined by nested latent concepts, which can be represented by a tree structure supposing a hierarchical relationship among manifest variables. The root of the tree is a general concept which includes more specific ones. The aim of the paper is to reconstruct an observed data correlation matrix of manifest variables through an ultrametric correlation matrix which is able to pinpoint the hierarchical nature of the phenomenon under study. To this end, we introduce a novel model which detects consistent latent concepts and their relationships starting from the observed correlation matrix.
      PubDate: 2020-12-01
  • A Riemannian geometric framework for manifold learning of non-Euclidean
    • Abstract: A growing number of problems in data analysis and classification involve data that are non-Euclidean. For such problems, a naive application of vector space analysis algorithms will produce results that depend on the choice of local coordinates used to parametrize the data. At the same time, many data analysis and classification problems eventually reduce to an optimization, in which the criteria being minimized can be interpreted as the distortion associated with a mapping between two curved spaces. Exploiting this distortion minimizing perspective, we first show that manifold learning problems involving non-Euclidean data can be naturally framed as seeking a mapping between two Riemannian manifolds that is closest to being an isometry. A family of coordinate-invariant first-order distortion measures is then proposed that measures the proximity of the mapping to an isometry, and is applied to manifold learning for non-Euclidean data sets. Case studies ranging from synthetic data to human mass-shape data demonstrate the many performance advantages of our Riemannian distortion minimization framework.
      PubDate: 2020-11-27
  • Robust semiparametric inference for polytomous logistic regression with complex survey design
    • Abstract: Analyzing polytomous responses from a complex survey scheme, such as stratified or cluster sampling, is crucial in several socio-economic applications. We present a class of minimum quasi weighted density power divergence estimators for the polytomous logistic regression model with such a complex survey. This family of semiparametric estimators is a robust generalization of the maximum quasi weighted likelihood estimator exploiting the advantages of the popular density power divergence measure. Accordingly, robust estimators for the design effects are also derived. Using the new estimators, robust tests of general linear hypotheses on the regression coefficients are proposed. Their asymptotic distributions and robustness properties are theoretically studied and also empirically validated through a numerical example and an extensive Monte Carlo study.
      PubDate: 2020-11-23
  • Predicting brand confusion in imagery markets based on deep learning of visual advertisement content
    • Abstract: In the consumer goods industry, unique brand positionings are assumed to be the road to success. They document product distinctiveness and so justify high prices. However, as products are getting more and more interchangeable, brand positionings must rely, at least partially, on supporting advertisements. Here, especially ads with visual content (e.g. photos, video clips) are able to connect brands with desirable emotions and values. Recently, besides TV, cinema and newspapers, search engines, social networks, and photo- and video-sharing platforms have also been used to spread such ads. In this paper, we demonstrate how deep learning based on such ads can be used to predict uniqueness of brand positionings. A sample application to the German Pils beer market is used for demonstration.
      PubDate: 2020-11-19
  • Special issue on “Learning in data science: theory, methods and
           applications”—preface by the guest editors
    • PubDate: 2020-11-18
  • Clustering of modal-valued symbolic data
    • Abstract: Symbolic data analysis is based on special descriptions of data known as symbolic objects (SOs). Such descriptions preserve more detailed information about units and their clusters than the usual representations with mean values. A special type of SO is a representation with frequency or probability distributions (modal values). This representation enables us to simultaneously consider variables of all measurement types during the clustering process. In this paper, we present the theoretical basis for compatible leaders and agglomerative clustering methods with alternative dissimilarities for modal-valued SOs. The leaders method efficiently solves clustering problems with large numbers of units, while the agglomerative method can be applied either alone to a small data set, or to leaders, obtained from the compatible leaders clustering method. We focus on (a) the inclusion of weights that enables clustering representatives to retain the same structure as if clustering only first order units and (b) the selection of relative dissimilarities that produce more interpretable, i.e., meaningful optimal clustering representatives. The usefulness of the proposed methods with adaptations was assessed and substantiated by carefully constructed simulation settings and demonstrated on three different real-world data sets gaining in interpretability from the use of weights (population pyramids and ESS data) or relative dissimilarity (US patents data).
      PubDate: 2020-10-24
  • Editable machine learning models? A rule-based framework for user studies of explainability
    • Abstract: So far, most user studies dealing with comprehensibility of machine learning models have used questionnaires or surveys to acquire input from participants. In this article, we argue that compared to questionnaires, the use of an adapted version of a real machine learning interface can yield a new level of insight into what attributes make a machine learning model interpretable, and why. We also argue that interpretability research needs to consider the task of humans editing the model, not least due to the existing or forthcoming legal requirements on the right of human intervention. In this article, we focus on rule models as these are directly interpretable as well as editable. We introduce an extension of the EasyMiner system for generating classification and explorative models based on association rules. The presented web-based rule editing software allows the user to perform common editing actions such as modify rule (add or remove attribute), delete rule, create new rule, or reorder rules. To observe the effect of a particular edit on predictive performance, the user can validate the rule list against a selected dataset using a scoring procedure. The system is equipped with functionality that facilitates its integration with crowdsourcing platforms commonly used to recruit participants.
      PubDate: 2020-09-11
  • A comparison of instance-level counterfactual explanation algorithms for behavioral and textual data: SEDC, LIME-C and SHAP-C
    • Abstract: Predictive systems based on high-dimensional behavioral and textual data have serious comprehensibility and transparency issues: linear models require investigating thousands of coefficients, while the opaqueness of nonlinear models makes things worse. Counterfactual explanations are becoming increasingly popular for generating insight into model predictions. This study aligns the recently proposed linear interpretable model-agnostic explainer and Shapley additive explanations with the notion of counterfactual explanations, and empirically compares the effectiveness and efficiency of these novel algorithms against a model-agnostic heuristic search algorithm for finding evidence counterfactuals using 13 behavioral and textual data sets. We show that different search methods have different strengths, and importantly, that there is much room for future research.
      PubDate: 2020-09-02
