Authors:A. Cholaquidis; A. Cuevas; R. Fraiman Pages: 5 - 24 Abstract: A functional distance \({\mathbb H}\) , based on the Hausdorff metric between the function hypographs, is proposed for the space \({\mathcal E}\) of non-negative real upper semicontinuous functions on a compact interval. The main goal of the paper is to show that the space \(({\mathcal E},{\mathbb H})\) is particularly suitable for some statistical problems with functional data which involve functions with very wiggly graphs and narrow, sharp peaks. A typical example is given by spectrograms, obtained either by magnetic resonance or by mass spectrometry. On the theoretical side, we show that \(({\mathcal E},{\mathbb H})\) is a complete, separable, and locally compact space and that the \({\mathbb H}\) -convergence of a sequence of functions implies the convergence of the respective maximum values of these functions. The probabilistic and statistical implications of these results are discussed, in particular regarding the consistency of k-NN classifiers for supervised classification problems with functional data in \({\mathbb H}\) . On the practical side, we provide the results of a small simulation study and also check the performance of our method in two real data problems of supervised classification involving mass spectra. PubDate: 2017-03-01 DOI: 10.1007/s11634-015-0217-7 Issue No:Vol. 11, No. 1 (2017)
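A minimal sketch of the idea (not the paper's implementation): discretize the hypograph of each function as a point cloud and compute the Hausdorff distance between the clouds by brute force. The grid, step sizes, and function names are illustrative.

```python
import numpy as np

def hypograph_points(f_vals, xs, y_step):
    """Discretize the hypograph {(x, y): 0 <= y <= f(x)} as a point cloud."""
    pts = []
    for x, fx in zip(xs, f_vals):
        y = 0.0
        while y <= fx + 1e-12:
            pts.append((x, y))
            y += y_step
    return np.array(pts)

def hausdorff(A, B):
    """Brute-force Hausdorff distance between two finite point sets."""
    d = np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))
    return max(d.min(axis=1).max(), d.min(axis=0).max())

xs = np.linspace(0.0, 1.0, 3)   # grid on the compact interval [0, 1]
f = np.full(3, 1.0)             # f == 1 everywhere
g = np.full(3, 2.0)             # g == 2 everywhere
H = hausdorff(hypograph_points(f, xs, 0.5), hypograph_points(g, xs, 0.5))
```

On this toy pair the distance equals the gap between the graphs, which also illustrates the theoretical result quoted above: the difference of the maxima is bounded by the hypograph distance.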

Authors:Andri Mirzal Pages: 25 - 48 Abstract: Blind source separation (BSS) is the problem of recovering source signals from signal mixtures with no, or very limited, information about the sources and the mixing process. In the literature, nonnegative matrix factorization (NMF) and independent component analysis (ICA) appear to be the mainstream techniques for solving BSS problems. Even though the use of NMF and ICA for BSS is well studied, there is still a lack of work comparing the performances of these techniques. Moreover, the nonuniqueness property of NMF is rarely mentioned, even though this property can make the reconstructed signals vary significantly and thus raises the difficulty of choosing representative reconstructions from several possible outcomes. In this paper, we compare the performances of NMF and ICA as BSS methods using some standard NMF and ICA algorithms, and point out the difficulty of choosing representative reconstructions that originates from the nonuniqueness property of NMF. PubDate: 2017-03-01 DOI: 10.1007/s11634-014-0192-4 Issue No:Vol. 11, No. 1 (2017)
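The nonuniqueness issue can be seen directly: if V = WH is a nonnegative factorization, then rescaling by any positive diagonal matrix D gives another valid factorization with markedly different "sources". A small illustrative sketch (the matrices are made up):

```python
import numpy as np

# A factorization V = W @ H and a rescaled one V = (W D)(D^-1 H):
# both are valid nonnegative factorizations, yet the "sources" H differ.
W = np.array([[1.0, 2.0],
              [3.0, 1.0]])
H = np.array([[0.5, 1.0, 0.0],
              [1.0, 0.0, 2.0]])
V = W @ H

D = np.diag([4.0, 0.25])              # any positive diagonal matrix works
W2, H2 = W @ D, np.linalg.inv(D) @ H  # alternative nonnegative factors
```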

Authors:Nathalie Girard; Karell Bertet; Muriel Visani Pages: 49 - 77 Abstract: The present paper deals with supervised classification methods based on Galois lattices and decision trees. Such ordered structures require attribute discretization, and it is known that, for decision trees, local discretization improves the classification performance compared with global discretization. While most of the literature on discretization for Galois lattices relies on global discretization, the present work introduces a new local discretization algorithm for Galois lattices which hinges on a property of some specific lattices that we introduce as dichotomic lattices. Their properties, co-atomicity and \(\vee \) -complementarity, are proved along with their links with decision trees. Finally, some quantitative and qualitative evaluations of the local discretization are proposed. PubDate: 2017-03-01 DOI: 10.1007/s11634-015-0225-7 Issue No:Vol. 11, No. 1 (2017)
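For context, a minimal sketch of the decision-tree-style criterion that local discretization applies on each subset of the data rather than once globally: pick the cut point of a numeric attribute that minimizes the weighted entropy of the two induced subsets. Function names and data are illustrative, not the paper's algorithm.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(values, labels):
    """Cut point minimizing the weighted entropy of the two induced subsets."""
    order = sorted(zip(values, labels))
    best = (float('inf'), None)
    for i in range(1, len(order)):
        cut = (order[i - 1][0] + order[i][0]) / 2
        left = [l for v, l in order[:i]]
        right = [l for v, l in order[i:]]
        w = (len(left) * entropy(left) + len(right) * entropy(right)) / len(order)
        best = min(best, (w, cut))
    return best[1]

cut = best_split([1.0, 2.0, 8.0, 9.0], ['a', 'a', 'b', 'b'])
```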

Authors:Wenxin Zhu; Ping Zhong Pages: 79 - 96 Abstract: In this paper, a new Support Vector Machine Plus (SVM+) type model called Minimum Class Variance SVM+ (MCVSVM+) is presented. Similar to SVM+, the proposed model utilizes the group information in the training data. We show that MCVSVM+ has the advantages of both SVM+ and the Minimum Class Variance Support Vector Machine (MCVSVM). That is, MCVSVM+ not only considers class distribution characteristics in its optimization problem but also utilizes the additional information (i.e. group information) hidden in the data, in contrast to SVM+, which takes into consideration only the samples that lie on the class boundaries. The experimental results demonstrate the validity and advantage of the new model compared with the standard SVM, SVM+ and MCVSVM. PubDate: 2017-03-01 DOI: 10.1007/s11634-015-0212-z Issue No:Vol. 11, No. 1 (2017)
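The "class distribution characteristics" that minimum-class-variance models plug into the margin term are captured by the within-class scatter matrix. A minimal sketch of that ingredient only (the data are made up; this is not the MCVSVM+ optimization problem itself):

```python
import numpy as np

def within_class_scatter(X, labels):
    """Within-class scatter matrix: summed outer products of the deviations
    of each sample from its own class mean."""
    S = np.zeros((X.shape[1], X.shape[1]))
    for c in np.unique(labels):
        Xc = X[labels == c]
        D = Xc - Xc.mean(axis=0)
        S += D.T @ D
    return S

X = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 3.0]])
labels = np.array([0, 0, 1, 1])
S = within_class_scatter(X, labels)
```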

Authors:Margret-Ruth Oelker; Gerhard Tutz Pages: 97 - 120 Abstract: Penalized estimation has become an established tool for regularization and model selection in regression models. A variety of penalties with specific features are available and effective algorithms for specific penalties have been proposed. But not much is available to fit models with a combination of different penalties. When modeling the rent data of Munich as in our application, various types of predictors call for a combination of a Ridge, a group Lasso and a Lasso-type penalty within one model. We propose to approximate penalties that are (semi-)norms of scalar linear transformations of the coefficient vector in generalized structured models—such that penalties of various kinds can be combined in one model. The approach is very general such that the Lasso, the fused Lasso, the Ridge, the smoothly clipped absolute deviation penalty, the elastic net and many more penalties are embedded. The computation is based on conventional penalized iteratively re-weighted least squares algorithms and hence, easy to implement. New penalties can be incorporated quickly. The approach is extended to penalties with vector based arguments. There are several possibilities to choose the penalty parameter(s). A software implementation is available. Some illustrative examples show promising results. PubDate: 2017-03-01 DOI: 10.1007/s11634-015-0205-y Issue No:Vol. 11, No. 1 (2017)
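A minimal sketch of the approximation idea in the Gaussian case, where penalized iteratively re-weighted least squares reduces to repeated penalized least-squares solves: the Lasso penalty |b_j| is replaced by a quadratic around the current iterate, giving a ridge-type system at each step. Function name, data, and tuning values are illustrative, not the paper's implementation.

```python
import numpy as np

def lasso_via_pirls(X, y, lam, n_iter=50, eps=1e-8):
    """Approximate |b_j| by b_j^2 / (|b_j_current| + eps), so each step is
    an ordinary penalized (weighted ridge) least-squares solve."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # start from OLS
    for _ in range(n_iter):
        A = np.diag(1.0 / (np.abs(beta) + eps))   # local quadratic weights
        beta = np.linalg.solve(X.T @ X + lam * A, X.T @ y)
    return beta

X = np.eye(2)
y = np.array([2.0, 3.0])
beta = lasso_via_pirls(X, y, lam=1.0)
```

For orthonormal X the iterates converge to the soft-thresholded solution, here (1, 2); swapping the diagonal weight matrix swaps the penalty, which is the sense in which penalties of various kinds can be combined in one model.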

Authors:Caterina Liberati; Furio Camillo; Gilbert Saporta Pages: 121 - 138 Abstract: Due to the recent financial turmoil, a discussion has arisen in the banking sector about how to accomplish long-term success and how to follow an exhaustive and powerful credit-scoring strategy. Recently, significant theoretical advances in machine learning algorithms have pushed the application of kernel-based classifiers, producing very effective results. Unfortunately, such tools are unable to provide an explanation, or comprehensible justification, for the solutions they supply. In this paper, we propose a new strategy to model credit-scoring data, which indirectly exploits the classification power of kernel machines in an operational setting. A reconstruction of the kernel classifier is performed via linear regression, if all predictors are numerical, or via a general linear model, if some or all predictors are categorical. The loss of performance due to this approximation is balanced by better interpretability for the end user, who is able to order, understand and rank the influence of each category of the variable set on the prediction. A case study of an Italian bank is illustrated and discussed; the empirical results reveal a promising performance of the proposed strategy. PubDate: 2017-03-01 DOI: 10.1007/s11634-015-0213-y Issue No:Vol. 11, No. 1 (2017)
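A minimal sketch of the reconstruction idea, assuming numerical predictors: fit a black-box kernel scorer (here kernel ridge regression stands in for the kernel classifier), then regress its scores on the raw predictors so that each coefficient can be read and ranked. All names and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
y = X[:, 0] + 0.5 * X[:, 1]              # target scores (linear, for the sketch)

def rbf(A, B, gamma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# "Black-box" kernel scorer: kernel ridge regression on the responses.
K = rbf(X, X)
alpha = np.linalg.solve(K + 1e-2 * np.eye(len(X)), y)
scores = K @ alpha

# Interpretable reconstruction: linear regression of the kernel scores on
# the raw predictors (intercept plus one coefficient per predictor).
X1 = np.column_stack([np.ones(len(X)), X])
coef = np.linalg.lstsq(X1, scores, rcond=None)[0]
surrogate = X1 @ coef
```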

Authors:Maurizio Carpita; Enrico Ciavolino Pages: 139 - 158 Abstract: We extend the simple linear measurement error model through the inclusion of a composite indicator by using the generalized maximum entropy estimator. A Monte Carlo simulation study is proposed for comparing the performance of the proposed estimator with that of its counterpart, the ordinary least squares estimator "adjusted for attenuation". The two estimators are compared in terms of correlation with the true latent variable, standard error and root mean squared error. Two illustrative case studies are reported in order to discuss the results obtained on the real data sets and to relate them to the conclusions drawn from the simulation study. PubDate: 2017-03-01 DOI: 10.1007/s11634-016-0237-y Issue No:Vol. 11, No. 1 (2017)
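For context, a minimal sketch of the comparator, OLS "adjusted for attenuation": measurement error in the predictor biases the naive slope towards zero, and the adjustment rescales it using the (assumed known) error variance. The data are constructed so the error is exactly mean-zero and uncorrelated with the predictor.

```python
import numpy as np

# True model: y = 2 x, but x is observed as w = x + u (measurement error).
x = np.array([0.0, 1.0, 2.0, 3.0])
u = np.array([1.0, -1.0, -1.0, 1.0])  # mean-zero error, uncorrelated with x
w, y = x + u, 2.0 * x

var_w = np.var(w)                     # population variances throughout
var_u = np.var(u)                     # error variance, assumed known
slope_naive = np.cov(w, y, bias=True)[0, 1] / var_w
slope_adj = slope_naive * var_w / (var_w - var_u)  # adjusted for attenuation
```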

Authors:Wolfgang Gaul; Dominique Vincent Pages: 159 - 178 Abstract: Topics that attract public attention can originate from current events or developments, might be influenced by situations in the past, and often continue to be of interest in the future. When respective information is made available textually, one possibility of detecting such topics of public importance consists in scrutinizing, e.g., appropriate press articles using—given the continual growth of information—text processing techniques enriched by computer routines which examine present-day textual material, check historical publications, find newly emerging topics, and are able to track topic trends over time. Information clustering based on content-(dis)similarity of the underlying textual material and graph-theoretical considerations to deal with the network of relationships between content-similar topics are described and combined in a new approach. Explanatory examples of topic detection and tracking in online news articles illustrate the usefulness of the approach in different situations. PubDate: 2017-03-01 DOI: 10.1007/s11634-016-0241-2 Issue No:Vol. 11, No. 1 (2017)
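A minimal sketch of the two ingredients combined above, with Jaccard similarity on token sets standing in for content-(dis)similarity: link sufficiently similar articles and read topics off the connected components of the resulting graph. Thresholds, functions, and documents are illustrative.

```python
from itertools import combinations

def tokens(doc):
    return set(doc.lower().split())

def jaccard(a, b):
    return len(a & b) / len(a | b)

def topic_components(docs, threshold=0.25):
    """Link content-similar articles; topics = connected components."""
    adj = {i: set() for i in range(len(docs))}
    for i, j in combinations(range(len(docs)), 2):
        if jaccard(tokens(docs[i]), tokens(docs[j])) >= threshold:
            adj[i].add(j); adj[j].add(i)
    seen, comps = set(), []
    for i in adj:
        if i not in seen:
            stack, comp = [i], set()
            while stack:
                k = stack.pop()
                if k not in comp:
                    comp.add(k); stack.extend(adj[k])
            seen |= comp; comps.append(sorted(comp))
    return comps

docs = ["election results announced today",
        "party reacts to election results",
        "new smartphone model released"]
comps = topic_components(docs)
```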

Authors:Vincenzo Spinelli Pages: 179 - 204 Abstract: In this work we address a technique for effectively clustering points into specific convex sets, called homogeneous boxes, whose sides are aligned with the coordinate axes (isothetic condition). The proposed clustering approach is based on homogeneity conditions rather than on a distance measure and, although it was originally developed in the context of the logical analysis of data, it is now placed within the framework of supervised clustering. First, we introduce the basic concepts of box geometry; then, we consider a generalized clustering algorithm based on a class of graphs called incompatibility graphs. For supervised classification problems, we consider classifiers based on box sets and compare their overall performance to the accuracy levels of competing methods on a wide range of real data sets. The results show that the proposed method performs comparably with other supervised learning methods in terms of accuracy. PubDate: 2017-03-01 DOI: 10.1007/s11634-016-0233-2 Issue No:Vol. 11, No. 1 (2017)
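A minimal sketch of the basic box-geometry notions in 2D (illustrative names and points, not the paper's algorithm): the smallest isothetic box covering a set of same-class points, and the homogeneity check that no opposite-class point falls inside it.

```python
def bounding_box(points):
    """Smallest isothetic box (sides aligned with the axes) covering points."""
    xs, ys = zip(*points)
    return (min(xs), min(ys)), (max(xs), max(ys))

def inside(p, box):
    (x0, y0), (x1, y1) = box
    return x0 <= p[0] <= x1 and y0 <= p[1] <= y1

def is_homogeneous(box, other_class_points):
    """A box is homogeneous if no opposite-class point falls inside it."""
    return not any(inside(p, box) for p in other_class_points)

pos = [(0, 0), (1, 1), (2, 0)]
neg = [(5, 5), (6, 4)]
box = bounding_box(pos)
```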

Authors:Carlos Lara-Alvarez; Leonardo Romero; Cuauhtemoc Gomez Pages: 205 - 218 Abstract: This paper introduces a Bayesian approach to the problem of fitting multiple straight lines to a set of 2D points. Whereas other approaches use many arbitrary parameters and threshold values, the proposed criterion uses only the parameters of the measurement errors. Models with multiple lines are useful in many applications; this paper analyzes the performance of the new approach on a classical problem in robotics: finding a map of lines from laser measurements. Tests show that the Bayesian approach obtains reliable models. PubDate: 2017-03-01 DOI: 10.1007/s11634-016-0236-z Issue No:Vol. 11, No. 1 (2017)
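A minimal sketch of one building block of line-map extraction, not the Bayesian criterion itself: an orthogonal least-squares line fit for a group of 2D laser points, with the line direction taken as the principal axis of the centered points.

```python
import numpy as np

def fit_line(points):
    """Orthogonal least-squares line fit: returns a point on the line
    (the centroid) and a unit direction vector (the principal axis)."""
    c = points.mean(axis=0)
    _, _, Vt = np.linalg.svd(points - c)
    return c, Vt[0]

pts = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
c, d = fit_line(pts)
```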

Authors:Karim Abou-Moustafa; Frank P. Ferrie Abstract: Finding the set of nearest neighbors for a query point of interest appears in a variety of algorithms for machine learning and pattern recognition. Examples include k nearest neighbor classification, information retrieval, case-based reasoning, manifold learning, and nonlinear dimensionality reduction. In this work, we propose a new approach for determining a distance metric from the data for finding such neighboring points. For a query point of interest, our approach learns a generalized quadratic distance (GQD) metric based on the statistical properties in a “small” neighborhood for the point of interest. The locally learned GQD metric captures information such as the density, curvature, and the intrinsic dimensionality for the points falling in this particular neighborhood. Unfortunately, learning the GQD parameters under such a local learning mechanism is a challenging problem with a high computational overhead. To address these challenges, we estimate the GQD parameters using the minimum volume covering ellipsoid (MVCE) for a set of points. The advantage of the MVCE is two-fold. First, the MVCE together with the local learning approach approximate the functionality of a well known robust estimator for covariance matrices. Second, computing the MVCE is a convex optimization problem which, in addition to having a unique global solution, can be efficiently solved using a first order optimization algorithm. We validate our metric learning approach on a large variety of datasets and show that the proposed metric has promising results when compared with five algorithms from the literature for supervised metric learning. PubDate: 2017-04-25 DOI: 10.1007/s11634-017-0286-x
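A minimal sketch of the locally learned quadratic distance idea, with the inverse of a ridge-regularized local covariance standing in for the MVCE-derived metric matrix (the MVCE computation itself is not reproduced here; all names are illustrative):

```python
import numpy as np

def local_mahalanobis(query, X, k=5, ridge=1e-6):
    """Quadratic distance learned from the k points nearest the query:
    a covariance-based stand-in for the MVCE-derived GQD metric."""
    d_euc = np.linalg.norm(X - query, axis=1)
    nbrs = X[np.argsort(d_euc)[:k]]
    cov = np.cov(nbrs.T) + ridge * np.eye(X.shape[1])
    M = np.linalg.inv(cov)              # metric matrix of the quadratic form
    diffs = X - query
    return np.sqrt(np.einsum('ij,jk,ik->i', diffs, M, diffs))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
q = np.zeros(2)
d = local_mahalanobis(q, X)
```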

Authors:Afef Ben Brahim; Mohamed Limam Abstract: The curse of dimensionality refers to the fact that high-dimensional data are often difficult to work with. A large number of features can increase the noise of the data and thus the error of a learning algorithm. Feature selection is a solution for such problems when there is a need to reduce the data dimensionality. Different feature selection algorithms may yield feature subsets that can be considered local optima in the space of feature subsets. Ensemble feature selection combines independent feature subsets and may give a better approximation to the optimal subset of features. We propose an ensemble feature selection approach based on the reliability assessment of feature selectors. It aims at providing a unique and stable feature selection without ignoring predictive accuracy. A classification algorithm is used as an evaluator to assign a confidence to features selected by ensemble members based on their associated classification performance. We compare our proposed approach to several existing techniques and to individual feature selection algorithms. Results show that our approach often improves classification performance and feature selection stability for high-dimensional data sets. PubDate: 2017-04-24 DOI: 10.1007/s11634-017-0285-y
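A minimal sketch of the aggregation step, with each selector's validation accuracy standing in for the reliability assessment (selector names, features, and weights are made up): features selected by more reliable members accumulate more confidence.

```python
def aggregate_selections(selections, reliabilities):
    """Weight each selector's chosen features by its reliability (e.g. the
    validation accuracy of a classifier built on its subset) and rank
    features by total accumulated confidence."""
    score = {}
    for feats, w in zip(selections, reliabilities):
        for f in feats:
            score[f] = score.get(f, 0.0) + w
    return sorted(score, key=lambda f: (-score[f], f))

selections = [["f1", "f2"], ["f2", "f3"], ["f2", "f4"]]
reliabilities = [0.9, 0.8, 0.6]
ranking = aggregate_selections(selections, reliabilities)
```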

Authors:Kohei Adachi; Nickolay T. Trendafilov Abstract: We propose a new procedure for sparse factor analysis (FA) such that each variable loads on only one common factor. Thus, the loading matrix has a single nonzero element in each row and zeros elsewhere. Such a loading matrix is the sparsest possible for a given number of variables and common factors. For this reason, the proposed method is named sparsest FA (SSFA). It may also be called FA-based variable clustering, since the variables loading the same common factor can be classified into a cluster. In SSFA, all model parts of FA (common factors, their correlations, loadings, unique factors, and unique variances) are treated as fixed unknown parameter matrices and their least squares function is minimized through a specific data matrix decomposition. A useful feature of the algorithm is that the matrix of common factor scores is re-parameterized using QR decomposition in order to efficiently estimate factor correlations. A simulation study shows that the proposed procedure can exactly identify the true sparsest models. Real data examples demonstrate the usefulness of the variable clustering performed by SSFA. PubDate: 2017-04-13 DOI: 10.1007/s11634-017-0284-z
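A small sketch of the loading structure only (a post-hoc argmax sparsification of a given loading matrix, not the SSFA estimation algorithm): one nonzero per row, which simultaneously assigns every variable to exactly one factor, i.e. a variable clustering.

```python
import numpy as np

def sparsest_loadings(L):
    """Keep only the largest-magnitude loading in each row; the column
    index of that loading clusters each variable to one factor."""
    S = np.zeros_like(L)
    rows = np.arange(L.shape[0])
    cluster = np.abs(L).argmax(axis=1)
    S[rows, cluster] = L[rows, cluster]
    return S, cluster

L = np.array([[0.9, 0.1],
              [0.2, 0.8],
              [-0.7, 0.3]])
S, cluster = sparsest_loadings(L)
```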

Authors:Sonia Barahona; Ximo Gual-Arnau; Maria Victoria Ibáñez; Amelia Simó Abstract: Classifying objects according to their shape and size is of key importance in many scientific fields. This work focuses on the case where the size and shape of an object are characterized by a current. A current is a mathematical object which has proved relevant for modeling geometrical data, such as submanifolds, through the integration of vector fields along them. As a consequence of choosing a vector-valued reproducing kernel Hilbert space (RKHS) as a test space for integrating manifolds, shapes can be considered as embedded in this Hilbert space. A vector-valued RKHS is a Hilbert space of vector fields; therefore, it is possible to compute a mean of shapes or to calculate a distance between two manifolds. This embedding enables us to consider size-and-shape clustering algorithms. These algorithms are applied to a 3D database obtained from an anthropometric survey of the Spanish child population, with a potential application to online sales of children's wear. PubDate: 2017-03-11 DOI: 10.1007/s11634-017-0283-0
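A loose sketch of the embedding idea using a plain scalar-kernel mean embedding of point sets, not the vector-valued currents construction of the paper: each discretized shape maps to an element of an RKHS, and distances between shapes are computed through the kernel trick. All data and parameter values are illustrative.

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def embedding_distance(P, Q, gamma=1.0):
    """Squared RKHS distance between the mean embeddings of two point sets:
    ||mu_P - mu_Q||^2 expanded via the kernel trick."""
    return (rbf(P, P, gamma).mean() + rbf(Q, Q, gamma).mean()
            - 2.0 * rbf(P, Q, gamma).mean())

P = np.array([[0.0, 0.0], [1.0, 0.0]])  # a tiny "shape" as a point set
Q = P + np.array([0.0, 0.1])            # the same shape, shifted slightly
R = P + np.array([0.0, 1.0])            # the same shape, shifted far
```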

Authors:Daniel Baier; Sarah Frost Abstract: Brand confusion occurs when a consumer is exposed to an advertisement (ad) for brand A but believes that it is for brand B. If more consumers are confused in this direction than in the other one (assuming that an ad for B is for A), this asymmetry is a disadvantage for A. Consequently, the confusion potential and structure of ads have to be checked: a sample of consumers is exposed to a sample of ads. For each ad, the consumers have to specify their guess about the advertised brand. Then the collected data are aggregated and analyzed using, e.g., MDS or two-mode clustering. In this paper we compare this approach to a new one in which image data analysis and classification are applied: the confusion potential and structure of ads is related to featurewise distances between ads and, to model asymmetric effects, to the strengths of the advertised brands. A sample application for the German beer market is presented; the results are encouraging. PubDate: 2017-03-04 DOI: 10.1007/s11634-017-0282-1
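A minimal sketch of the aggregated data and the asymmetry notion (the counts and the simple difference measure are illustrative, not the paper's model):

```python
def confusion_asymmetry(counts, a, b):
    """counts[x][y] = number of consumers who saw an ad for brand x but
    guessed brand y. Returns how much more often a's ads are mistaken for
    b than the other way round; positive values disadvantage brand a."""
    return counts[a][b] - counts[b][a]

counts = {"A": {"A": 70, "B": 30},
          "B": {"A": 10, "B": 90}}
delta = confusion_asymmetry(counts, "A", "B")
```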

Authors:Parvin Ahmadi; Iman Gholampour; Mahmoud Tabandeh Abstract: In this paper, we introduce a document clustering method based on Sparse Topical Coding, called Cluster-based Sparse Topical Coding. Topic modeling is capable of improving textual document clustering by describing documents via bag-of-words models and projecting them into a topic space. The latent semantic descriptions derived by the topic model can be utilized as features in a clustering process. In our proposed method, document clustering and topic modeling are integrated in a unified framework in order to achieve the highest performance. This framework includes Sparse Topical Coding, which is responsible for topic mining, and K-means, which discovers the latent clusters in the document collection. Experimental results on widely used datasets show that our proposed method significantly outperforms traditional and other topic-model-based clustering methods. Our method achieves from 4 to 39% improvement in clustering accuracy and from 2% to more than 44% improvement in normalized mutual information. PubDate: 2017-02-28 DOI: 10.1007/s11634-017-0280-3
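A minimal sketch of the clustering half only: plain k-means applied to documents already projected into a topic space (the toy topic-proportion vectors below stand in for the sparse codes; the integrated framework itself is not reproduced).

```python
import numpy as np

def kmeans(Z, k, n_iter=20, seed=0):
    """Plain k-means on topic-space codes Z."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        labels = np.argmin(((Z[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([Z[labels == j].mean(axis=0) for j in range(k)])
    return labels

# Four documents in a 2-topic space: two about topic 1, two about topic 2.
Z = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
labels = kmeans(Z, k=2)
```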

Authors:Roberto Rocci; Stefano Antonio Gattone; Roberto Di Mari Abstract: Maximum likelihood estimation of Gaussian mixture models with different class-specific covariance matrices is known to be problematic. This is due to the unboundedness of the likelihood, together with the presence of spurious maximizers. Existing methods to bypass this obstacle are based on the fact that unboundedness is avoided if the eigenvalues of the covariance matrices are bounded away from zero. This can be done by imposing constraints on the covariance matrices, i.e. by incorporating a priori information on the covariance structure of the mixture components. The present work introduces a constrained approach, where the class conditional covariance matrices are shrunk towards a pre-specified target matrix \(\varvec{\varPsi }\) . Data-driven choices of the matrix \(\varvec{\varPsi }\) , when a priori information is not available, and the optimal amount of shrinkage are investigated. Then, constraints based on a data-driven \(\varvec{\varPsi }\) are shown to be equivariant with respect to linear affine transformations, provided that the method used to select the target matrix is also equivariant. The effectiveness of the proposal is evaluated on the basis of a simulation study and an empirical example. PubDate: 2017-01-06 DOI: 10.1007/s11634-016-0279-1
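A minimal sketch of why shrinkage towards a positive definite target bounds the eigenvalues away from zero (the matrices and shrinkage weight are illustrative; by Weyl's inequality the smallest eigenvalue of the convex combination is at least gamma times the target's smallest eigenvalue):

```python
import numpy as np

def shrink_covariance(S, target, gamma):
    """Convex shrinkage of a class covariance towards a target matrix."""
    return (1.0 - gamma) * S + gamma * target

S = np.array([[1e-10, 0.0],   # a nearly singular class covariance
              [0.0, 4.0]])
Psi = np.eye(2)               # illustrative target matrix
S_shrunk = shrink_covariance(S, Psi, gamma=0.2)
```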

Authors:María Teresa Gallegos; Gunter Ritter Abstract: The present paper proposes a new strategy for probabilistic (often called model-based) clustering. It is well known that local maxima of mixture likelihoods can be used to partition an underlying data set. However, local maxima are rarely unique. Therefore, it remains to select the reasonable solutions, and in particular the desired one. Credible partitions are usually recognized by separation (and cohesion) of their clusters. We use here the p values provided by the classical tests of Wilks, Hotelling, and Behrens–Fisher to single out those solutions that are well separated by location. It has been shown that reasonable solutions to a clustering problem are related to Pareto points in a plot of scale balance vs. model fit of all local maxima. We briefly review this theory and propose as solutions all well-fitting Pareto points in the set of local maxima separated by location in the above sense. We also design a new iterative, parameter-free cutting plane algorithm for the multivariate Behrens–Fisher problem. PubDate: 2016-12-30 DOI: 10.1007/s11634-016-0278-2
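A minimal sketch of the Pareto-point selection over local maxima (the (fit, balance) pairs are made up; higher is taken as better for both coordinates): a solution is kept unless another local maximum dominates it in both model fit and scale balance.

```python
def pareto_points(solutions):
    """Keep the solutions not dominated in both coordinates."""
    front = []
    for i, (fit_i, bal_i) in enumerate(solutions):
        dominated = any(f >= fit_i and b >= bal_i and (f, b) != (fit_i, bal_i)
                        for j, (f, b) in enumerate(solutions) if j != i)
        if not dominated:
            front.append((fit_i, bal_i))
    return front

# (model fit, scale balance) of four hypothetical local maxima
solutions = [(0.9, 0.2), (0.7, 0.7), (0.4, 0.9), (0.5, 0.5)]
front = pareto_points(solutions)
```

Here (0.5, 0.5) is dominated by (0.7, 0.7) and drops out; the remaining three are candidate clusterings, to be screened further by the separation tests described above.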