Advances in Data Analysis and Classification   [SJR: 1.113]   [H-I: 14]
Hybrid journal (may contain Open Access articles)
ISSN (Print) 1862-5347 - ISSN (Online) 1862-5355
Published by Springer-Verlag
• Exploratory data analysis for interval compositional data
• Authors: Karel Hron; Paula Brito; Peter Filzmoser
Pages: 223 - 241
Abstract: Compositional data are data in which the relative contributions of parts to a whole, conveyed by (log-)ratios between them, are essential for the analysis. In Symbolic Data Analysis (SDA), we are in the framework of interval data when elements are characterized by variables whose values are intervals on ℝ representing inherent variability. In this paper, we address the special problem of analysing interval compositions, i.e., interval data obtained by the aggregation of compositions. It is assumed that the interval information is represented by the respective midpoints and ranges, and both sources of information are considered as compositions. In this context, we introduce the representation of interval data as three-way data. Within the log-ratio approach from compositional data analysis, we outline how interval compositions can be treated in an exploratory context. The goal of the analysis is to represent the compositions by coordinates that are interpretable in terms of the original compositional parts. This is achieved by summarizing all relative information (log-ratios) about each part into one coordinate of the coordinate system. Based on an example from the European Union Statistics on Income and Living Conditions (EU-SILC), several possibilities for an exploratory data analysis of interval compositions are outlined and investigated.
PubDate: 2017-06-01
DOI: 10.1007/s11634-016-0245-y
Issue No: Vol. 11, No. 2 (2017)
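The log-ratio approach the abstract refers to can be illustrated with the centred log-ratio (clr) transform, a standard coordinate representation for compositions (a sketch of the general idea, not the paper's specific interval coordinates):

```python
import numpy as np

def clr(x):
    """Centred log-ratio transform of a composition with positive parts.

    The resulting coordinates are scale-invariant: only the ratios
    between parts matter, not the absolute total.
    """
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean()

# midpoints of an interval composition, treated as a composition itself
mid = np.array([0.5, 0.3, 0.2])
coords = clr(mid)
```

The clr coordinates sum to zero and are unchanged if the composition is rescaled, which is exactly the "relative information" property the abstract describes.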

• Model-based regression clustering for high-dimensional data: application
to functional data
• Authors: Emilie Devijver
Pages: 243 - 279
Abstract: Finite mixture regression models are useful for modeling the relationship between a response and predictors arising from different subpopulations. In this article, we study high-dimensional predictors and a high-dimensional response, and propose two procedures to cluster observations according to the link between predictors and response. To reduce the dimension, we use the Lasso estimator, which accounts for sparsity, and a maximum likelihood estimator penalized by the rank, which accounts for the matrix structure. To choose the number of components and the sparsity level, we construct a collection of models varying those two parameters and select a model from this collection with a non-asymptotic criterion. We extend these procedures to functional data, where predictors and responses are functions, using a wavelet-based approach. For each situation, we provide algorithms and apply and evaluate our methods on both simulated and real datasets to understand how they work in practice.
PubDate: 2017-06-01
DOI: 10.1007/s11634-016-0242-1
Issue No: Vol. 11, No. 2 (2017)

• Mixture models for ordinal responses to account for uncertainty of choice
• Authors: Gerhard Tutz; Micha Schneider; Maria Iannario; Domenico Piccolo
Pages: 281 - 305
Abstract: In CUB models the uncertainty of choice is explicitly modelled as a Combination of discrete Uniform and shifted Binomial random variables. The basic concept of modelling the response as a mixture of a deliberate choice of a response category and an uncertainty component, represented by a uniform distribution on the response categories, is extended to a much wider class of models. The deliberate choice can in particular be determined by classical ordinal response models such as the cumulative and adjacent-categories models. One then obtains the traditional and flexible models as special cases when the uncertainty component is irrelevant. It is shown that the effect of explanatory variables is underestimated if the uncertainty component is neglected in a cumulative-type mixture model. Visualization tools for the effects of variables are proposed, and the modelling strategies are evaluated on real data sets. It is demonstrated that the extended class of models frequently yields a better fit than classical ordinal response models without an uncertainty component.
PubDate: 2017-06-01
DOI: 10.1007/s11634-016-0247-9
Issue No: Vol. 11, No. 2 (2017)
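The mixture structure described above can be sketched as a probability mass function. This minimal version uses the basic CUB parameterization with pi as the mixing weight on the deliberate-choice component and xi as the feeling parameter of the shifted Binomial (names follow common CUB notation; this is an illustrative sketch, not the paper's extended model class):

```python
from math import comb

def cub_pmf(m, pi, xi):
    """P(R = r) for r = 1..m under a basic CUB model:
    pi * shifted-Binomial(m - 1, 1 - xi) + (1 - pi) * Uniform(1..m)."""
    probs = []
    for r in range(1, m + 1):
        shifted_binom = comb(m - 1, r - 1) * (1 - xi) ** (r - 1) * xi ** (m - r)
        probs.append(pi * shifted_binom + (1 - pi) / m)
    return probs
```

Setting pi = 0 recovers pure uncertainty (the uniform distribution), while pi = 1 leaves only the deliberate-choice component.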

• Logistic biplot for nominal data
• Authors: Julio César Hernández-Sánchez; José Luis Vicente-Villardón
Pages: 307 - 326
Abstract: Classical biplot methods allow for the simultaneous representation of the individuals (rows) and variables (columns) of a data matrix. For binary data, logistic biplots have recently been developed. When data are nominal, neither classical nor binary logistic biplots are adequate, and techniques such as multiple correspondence analysis (MCA), latent trait analysis (LTA) or item response theory (IRT) for nominal items should be used instead. In this paper we extend the binary logistic biplot to nominal data. The resulting method is termed the “nominal logistic biplot” (NLB), although the variables are represented as convex prediction regions rather than vectors. Using methods from computational geometry, the set of prediction regions is converted to a set of points such that the prediction for each individual is established by its closest “category point”. Interpretation is then based on distances rather than projections. We study the geometry of this representation and construct computational algorithms for the estimation of parameters and the calculation of prediction regions. Nominal logistic biplots extend both MCA and LTA in the sense that they give a graphical representation for LTA similar to the one obtained in MCA.
PubDate: 2017-06-01
DOI: 10.1007/s11634-016-0249-7
Issue No: Vol. 11, No. 2 (2017)

• Principal component analysis for histogram-valued data
• Authors: J. Le-Rademacher; L. Billard
Pages: 327 - 351
Abstract: This paper introduces a principal component methodology for analysing histogram-valued data under the symbolic data domain. Currently, no comparable method exists for this type of data. The proposed method uses a symbolic covariance matrix to determine the principal component space. The resulting observations on principal component space are presented as polytopes for visualization. Numerical representation of the resulting polytopes via histogram-valued output is also presented. The necessary algorithms are included. The technique is illustrated on a weather data set.
PubDate: 2017-06-01
DOI: 10.1007/s11634-016-0255-9
Issue No: Vol. 11, No. 2 (2017)

• T3C: improving a decision tree classification algorithm’s interval
splits on continuous attributes
• Authors: Panagiotis Tzirakis; Christos Tjortjis
Pages: 353 - 370
Abstract: This paper proposes, describes and evaluates T3C, a classification algorithm that builds decision trees of depth at most three, and results in high accuracy whilst keeping the size of the tree reasonably small. T3C is an improvement over algorithm T3 in the way it performs splits on continuous attributes. When run against publicly available data sets, T3C achieved lower generalisation error than T3 and the popular C4.5, and competitive results compared to Random Forest and Rotation Forest.
PubDate: 2017-06-01
DOI: 10.1007/s11634-016-0246-x
Issue No: Vol. 11, No. 2 (2017)
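The depth-at-most-three constraint that T3C enforces can be mimicked with scikit-learn's generic CART learner (this sketch reproduces only the depth cap, not T3C's own interval-split procedure on continuous attributes):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# cap the tree at depth three, as T3C does, to keep the model small
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
scores = cross_val_score(tree, X, y, cv=5)  # 5-fold CV accuracy
tree.fit(X, y)
```

Even with this hard cap, shallow trees often remain competitive on simple datasets, which is the trade-off the abstract highlights.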

• ADCLUS and INDCLUS: analysis, experimentation, and meta-heuristic
algorithm extensions
• Authors: Stephen L. France; Wen Chen; Yumin Deng
Pages: 371 - 393
Abstract: The ADCLUS and INDCLUS models, along with associated fitting techniques, can be used to extract an overlapping clustering structure from similarity data. In this paper, we examine the scalability of these models. We test the SINDCLUS algorithm and an adapted version of the SYMPRES algorithm on medium-sized datasets and infer their scalability and the severity of the local-optima problem as the problem size increases. We describe several meta-heuristic approaches to minimizing the INDCLUS and ADCLUS loss functions.
PubDate: 2017-06-01
DOI: 10.1007/s11634-016-0244-z
Issue No: Vol. 11, No. 2 (2017)

• A sequential distance-based approach for imputing missing data: Forward
Imputation
• Authors: Nadia Solaro; Alessandro Barbiero; Giancarlo Manzi; Pier Alda Ferrari
Pages: 395 - 414
Abstract: Missing data recurrently affect datasets in almost every field of quantitative research. The subject is vast and complex and has given rise to a literature rich in very different approaches to the problem. Within an exploratory framework, distance-based methods such as nearest-neighbour imputation (NNI), or procedures involving multivariate data analysis (MVDA) techniques, seem to treat the problem properly. In NNI, the metric and the number of donors can be chosen at will. MVDA-based procedures expressly account for variable associations. The new approach proposed here, called Forward Imputation, ideally combines these features. It is designed as a sequential procedure that imputes missing data in a step-by-step process involving subsets of units according to their “completeness rate”. Two methods within this context are developed for the imputation of quantitative data. One applies NNI with the Mahalanobis distance; the other combines NNI and principal component analysis. Statistical properties of the two methods are discussed, and their performance is assessed, also in comparison with alternative imputation methods. To this purpose, a simulation study with different data patterns, along with an application to real data, is carried out, and practical hints for users are provided.
PubDate: 2017-06-01
DOI: 10.1007/s11634-016-0243-0
Issue No: Vol. 11, No. 2 (2017)
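The NNI-with-Mahalanobis building block can be sketched as one-donor imputation: each incomplete row borrows its missing entries from the nearest complete row, with distances computed on the observed variables. This is only the basic NNI step, not the full sequential Forward Imputation procedure, and restricting the inverse covariance to the observed variables is a simplification:

```python
import numpy as np

def nni_mahalanobis(X):
    """Impute NaN entries from the nearest complete row under a
    Mahalanobis-type metric estimated from the complete rows."""
    X = X.astype(float).copy()
    complete = X[~np.isnan(X).any(axis=1)]
    VI = np.linalg.inv(np.cov(complete, rowvar=False))  # inverse covariance
    for i in np.where(np.isnan(X).any(axis=1))[0]:
        obs = ~np.isnan(X[i])                    # observed variables of row i
        d = X[i, obs] - complete[:, obs]         # differences to all donors
        Vo = VI[np.ix_(obs, obs)]                # metric on observed variables
        dist = np.einsum('ij,jk,ik->i', d, Vo, d)  # quadratic forms per donor
        donor = complete[np.argmin(dist)]
        X[i, ~obs] = donor[~obs]                 # copy the donor's values
    return X
```

With a single donor the imputed value is exactly the donor's value; averaging several nearest donors would be the natural multi-donor variant the abstract alludes to.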

• Backtransformation: a new representation of data processing chains with a
scalar decision function
• Authors: Mario Michael Krell; Sirko Straube
Pages: 415 - 439
Abstract: Data processing often transforms a complex signal, using a set of different preprocessing algorithms, into a single value as the outcome of a final decision function. Still, it is challenging to understand and visualize the interplay between the algorithms performing this transformation. Especially when dimensionality reduction is used, the original data structure (e.g., spatio-temporal information) is hidden from subsequent algorithms. To tackle this problem, we introduce the backtransformation concept, which suggests looking at the combination of algorithms as one transformation that maps the original input signal to a single value. It takes the derivative of the final decision function and transforms it back through the previous processing steps via backward iteration and the chain rule. The resulting derivative of the composed decision function at the sample of interest represents the complete decision process, and using it for visualization can improve the understanding of that process. Often, it is possible to construct a feasible processing chain from affine mappings, which considerably simplifies both the calculation of the backtransformation and the interpretation of the result. In this case, the affine backtransformation provides the complete parameterization of the processing chain. This article introduces the theory, provides implementation guidelines, and presents three application examples.
PubDate: 2017-06-01
DOI: 10.1007/s11634-015-0229-3
Issue No: Vol. 11, No. 2 (2017)
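For a chain of affine steps, the backtransformation described above reduces to transposed matrix products: the chain rule applied backwards through each affine map. A sketch with invented shapes and random values:

```python
import numpy as np

rng = np.random.default_rng(0)

# two affine preprocessing steps followed by a scalar linear decision function
A1, b1 = rng.normal(size=(5, 8)), rng.normal(size=5)  # e.g. dimensionality reduction
A2, b2 = rng.normal(size=(3, 5)), rng.normal(size=3)  # e.g. spatial filtering
w, c = rng.normal(size=3), 0.5                        # final decision function

def decision(x):
    """The composed processing chain mapping an 8-d signal to one value."""
    return w @ (A2 @ (A1 @ x + b1) + b2) + c

# backtransformation: push the decision weights back through the chain
# (chain rule; for affine maps this is just the transposes, applied in reverse)
w_back = A1.T @ (A2.T @ w)
```

Because the chain is affine, `w_back` is the exact gradient of `decision` and lives in the original 8-dimensional signal space, so it can be visualized like an input signal.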

• On visual distances for spectrum-type functional data
• Authors: A. Cholaquidis; A. Cuevas; R. Fraiman
Pages: 5 - 24
Abstract: A functional distance ℍ, based on the Hausdorff metric between function hypographs, is proposed for the space ℰ of non-negative real upper semicontinuous functions on a compact interval. The main goal of the paper is to show that the space (ℰ, ℍ) is particularly suitable for some statistical problems with functional data that involve functions with very wiggly graphs and narrow, sharp peaks. A typical example is given by spectrograms, obtained either by magnetic resonance or by mass spectrometry. On the theoretical side, we show that (ℰ, ℍ) is a complete, separable, locally compact space and that ℍ-convergence of a sequence of functions implies convergence of the respective maximum values of those functions. The probabilistic and statistical implications of these results are discussed, in particular regarding the consistency of k-NN classifiers for supervised classification problems with functional data in ℍ. On the practical side, we provide the results of a small simulation study and also check the performance of our method on two real-data supervised classification problems involving mass spectra.
PubDate: 2017-03-01
DOI: 10.1007/s11634-015-0217-7
Issue No: Vol. 11, No. 1 (2017)
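Why a Hausdorff-type metric suits sharp peaks can be seen with a discrete Hausdorff distance between sampled graphs (the paper works with hypographs; this simplified point-set version already shows that a slightly shifted narrow peak stays close in this metric while being far in the supremum norm):

```python
import numpy as np

def hausdorff(A, B):
    """Hausdorff distance between two finite point sets (one point per row)."""
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return max(D.min(axis=1).max(), D.min(axis=0).max())

t = np.linspace(0.0, 1.0, 400)
f = np.exp(-((t - 0.30) / 0.02) ** 2)   # a narrow spectral peak
g = np.exp(-((t - 0.32) / 0.02) ** 2)   # the same peak, slightly shifted
F = np.column_stack([t, f])             # sampled graph of f
G = np.column_stack([t, g])             # sampled graph of g
```

The sup-norm distance between f and g is large (the peaks barely overlap), but the Hausdorff distance between the graphs is on the order of the 0.02 shift.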

• Evaluation of the evolution of relationships between topics over time
• Authors: Wolfgang Gaul; Dominique Vincent
Pages: 159 - 178
Abstract: Topics that attract public attention can originate from current events or developments, may be influenced by situations in the past, and often continue to be of interest in the future. When the respective information is made available textually, one way of detecting such topics of public importance is to scrutinize appropriate press articles using text-processing techniques, enriched, given the continual growth of information, by computer routines that examine present-day textual material, check historical publications, find newly emerging topics, and track topic trends over time. Information clustering based on the content (dis)similarity of the underlying textual material and graph-theoretical considerations for handling the network of relationships between content-similar topics are described and combined in a new approach. Explanatory examples of topic detection and tracking in online news articles illustrate the usefulness of the approach in different situations.
PubDate: 2017-03-01
DOI: 10.1007/s11634-016-0241-2
Issue No: Vol. 11, No. 1 (2017)

• Statistical inference in constrained latent class models for multinomial
data based on ϕ-divergence measures
• Authors: A. Felipe; N. Martín; P. Miranda; L. Pardo
Abstract: In this paper we explore the possibilities of applying ϕ-divergence measures to inferential problems in the field of latent class models (LCMs) for multinomial data. We first treat the problem of estimating the model parameters. As explained below, the minimum ϕ-divergence estimators (MϕEs) considered in this paper are a natural extension of the maximum likelihood estimator (MLE), the usual estimator for this problem; we study the asymptotic properties of MϕEs, showing that they share the same asymptotic distribution as the MLE. To compare the efficiency of the MϕEs when the sample size is not large enough to apply the asymptotic results, we carried out an extensive simulation study; from this study, we conclude that there are estimators in this family that are competitive with the MLE. Next, we deal with the problem of testing whether an LCM for multinomial data fits a data set; again, ϕ-divergence measures can be used to generate a family of test statistics generalizing both the classical likelihood-ratio test and the chi-squared test statistics. Finally, we treat the problem of choosing the best model out of a sequence of nested LCMs; as before, ϕ-divergence measures can handle the problem, and we derive a family of ϕ-divergence test statistics based on them; we study the asymptotic behavior of these test statistics, showing that it is the same as that of the classical test statistics. A simulation study for small and moderate sample sizes shows that some test statistics in the family can compete with the classical likelihood-ratio and chi-squared test statistics.
PubDate: 2017-07-04
DOI: 10.1007/s11634-017-0289-7
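A sketch of how a single ϕ function generates a family of divergences containing both the Kullback-Leibler (likelihood-ratio) and Pearson chi-squared members mentioned above. The convention for ϕ here is one of several in use, and the vectors are illustrative:

```python
import numpy as np

def phi_divergence(p, q, phi):
    """D_phi(p, q) = sum_i q_i * phi(p_i / q_i), for strictly positive pmfs."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(q * phi(p / q)))

kl = lambda t: t * np.log(t) - t + 1     # yields the Kullback-Leibler divergence
chi2 = lambda t: 0.5 * (t - 1.0) ** 2    # yields half the Pearson chi-squared

p = np.array([0.2, 0.3, 0.5])
q = np.array([0.25, 0.25, 0.5])
```

Each convex ϕ with ϕ(1) = 0 gives a divergence that vanishes exactly when p = q, which is what makes the family usable for both estimation and goodness-of-fit testing.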

• Minimum distance method for directional data and outlier detection
• Authors: Mercedes Fernandez Sau; Daniela Rodriguez
Abstract: In this paper, we propose estimators based on the minimum distance for the unknown parameters of a parametric density on the unit sphere. We show that these estimators are consistent and asymptotically normally distributed. We also apply our proposal to develop a method that allows us to detect potential atypical values. The small-sample behavior of the proposed estimators is studied using Monte Carlo simulations. Two applications of our procedure are illustrated with real data sets.
PubDate: 2017-06-02
DOI: 10.1007/s11634-017-0287-9

• Editorial for issue 2/2017
• PubDate: 2017-05-17
DOI: 10.1007/s11634-017-0288-8

• Local generalized quadratic distance metrics: application to the
k-nearest neighbors classifier
• Authors: Karim Abou-Moustafa; Frank P. Ferrie
Abstract: Finding the set of nearest neighbors for a query point of interest appears in a variety of algorithms for machine learning and pattern recognition. Examples include k-nearest-neighbor classification, information retrieval, case-based reasoning, manifold learning, and nonlinear dimensionality reduction. In this work, we propose a new approach for determining, from the data, a distance metric for finding such neighboring points. For a query point of interest, our approach learns a generalized quadratic distance (GQD) metric based on the statistical properties of a “small” neighborhood of that point. The locally learned GQD metric captures information such as the density, curvature, and intrinsic dimensionality of the points falling in this particular neighborhood. Unfortunately, learning the GQD parameters under such a local learning mechanism is a challenging problem with high computational overhead. To address these challenges, we estimate the GQD parameters using the minimum volume covering ellipsoid (MVCE) of a set of points. The advantage of the MVCE is two-fold. First, the MVCE together with the local learning approach approximates the functionality of a well-known robust estimator for covariance matrices. Second, computing the MVCE is a convex optimization problem which, in addition to having a unique global solution, can be solved efficiently using a first-order optimization algorithm. We validate our metric learning approach on a large variety of datasets and show that the proposed metric yields promising results when compared with five supervised metric learning algorithms from the literature.
PubDate: 2017-04-25
DOI: 10.1007/s11634-017-0286-x

• Ensemble feature selection for high dimensional data: a new method and a
comparative study
• Authors: Afef Ben Brahim; Mohamed Limam
Abstract: The curse of dimensionality refers to the fact that high-dimensional data are often difficult to work with. A large number of features can increase the noise in the data and thus the error of a learning algorithm. Feature selection is a solution for such problems, where there is a need to reduce the data dimensionality. Different feature selection algorithms may yield feature subsets that are local optima in the space of feature subsets. Ensemble feature selection combines independent feature subsets and may give a better approximation to the optimal subset of features. We propose an ensemble feature selection approach based on an assessment of the feature selectors' reliability. It aims at providing a unique and stable feature selection without sacrificing predictive accuracy. A classification algorithm is used as an evaluator to assign a confidence to the features selected by each ensemble member, based on the associated classification performance. We compare our proposed approach to several existing techniques and to individual feature selection algorithms. Results show that our approach often improves classification performance and feature selection stability on high-dimensional data sets.
PubDate: 2017-04-24
DOI: 10.1007/s11634-017-0285-y
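The confidence-weighted aggregation idea can be sketched as follows: each base selector proposes a feature subset, a classifier's cross-validated accuracy on that subset serves as the member's confidence, and features accumulate confidence-weighted votes. This is a toy illustration of the general mechanism, not the authors' exact scheme:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=30, n_informative=5,
                           random_state=0)

def top_k(scores, k=5):
    """Indices of the k highest-scoring features."""
    return list(np.argsort(scores)[::-1][:k])

# two base feature selectors, each proposing its own subset
members = [top_k(f_classif(X, y)[0]),
           top_k(mutual_info_classif(X, y, random_state=0))]

votes = np.zeros(X.shape[1])
for subset in members:
    # confidence of this member = CV accuracy of a classifier on its subset
    acc = cross_val_score(LogisticRegression(max_iter=1000),
                          X[:, subset], y, cv=3).mean()
    votes[subset] += acc

selected = top_k(votes, k=5)  # final aggregated feature subset
```

Features chosen by several reliable members accumulate the highest votes, which is what stabilizes the final selection relative to any single selector.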

• Sparsest factor analysis for clustering variables: a matrix decomposition
approach
• Authors: Kohei Adachi; Nickolay T. Trendafilov
Abstract: We propose a new procedure for sparse factor analysis (FA) in which each variable loads on only one common factor. Thus, the loading matrix has a single nonzero element in each row and zeros elsewhere. Such a loading matrix is the sparsest possible for a given number of variables and common factors. For this reason, the proposed method is named sparsest FA (SSFA). It may also be called FA-based variable clustering, since the variables loading on the same common factor can be classified into one cluster. In SSFA, all model parts of FA (common factors, their correlations, loadings, unique factors, and unique variances) are treated as fixed unknown parameter matrices, and their least squares function is minimized through a specific data matrix decomposition. A useful feature of the algorithm is that the matrix of common factor scores is re-parameterized using a QR decomposition in order to estimate factor correlations efficiently. A simulation study shows that the proposed procedure can exactly identify the true sparsest models. Real data examples demonstrate the usefulness of the variable clustering performed by SSFA.
PubDate: 2017-04-13
DOI: 10.1007/s11634-017-0284-z

• Unsupervised classification of children’s bodies using currents
• Authors: Sonia Barahona; Ximo Gual-Arnau; Maria Victoria Ibáñez; Amelia Simó
Abstract: Classifying objects according to their shape and size is of key importance in many scientific fields. This work focuses on the case where the size and shape of an object are characterized by a current. A current is a mathematical object that has proved relevant for modeling geometrical data, such as submanifolds, through the integration of vector fields along them. As a consequence of choosing a vector-valued reproducing kernel Hilbert space (RKHS) as the test space for integrating manifolds, shapes can be considered as embedded in this Hilbert space. A vector-valued RKHS is a Hilbert space of vector fields; it is therefore possible to compute a mean of shapes or to calculate a distance between two manifolds. This embedding enables us to consider size-and-shape clustering algorithms. These algorithms are applied to a 3D database obtained from an anthropometric survey of the Spanish child population, with a potential application to online sales of children’s wear.
PubDate: 2017-03-11
DOI: 10.1007/s11634-017-0283-0

• Relating brand confusion to ad similarities and brand strengths through
image data analysis and classification
• Authors: Daniel Baier; Sarah Frost
PubDate: 2017-03-04
DOI: 10.1007/s11634-017-0282-1

• Editorial for issue 1/2017
• PubDate: 2017-02-22
DOI: 10.1007/s11634-017-0281-2

JournalTOCs
School of Mathematical and Computer Sciences
Heriot-Watt University
Edinburgh, EH14 4AS, UK
Email: journaltocs@hw.ac.uk
Tel: +00 44 (0)131 4513762
Fax: +00 44 (0)131 4513327
