for Journals by Title or ISSN
for Articles by Keywords
help
  Subjects -> COMPUTER SCIENCE (Total: 2122 journals)
    - ANIMATION AND SIMULATION (31 journals)
    - ARTIFICIAL INTELLIGENCE (105 journals)
    - AUTOMATION AND ROBOTICS (105 journals)
    - CLOUD COMPUTING AND NETWORKS (67 journals)
    - COMPUTER ARCHITECTURE (10 journals)
    - COMPUTER ENGINEERING (11 journals)
    - COMPUTER GAMES (21 journals)
    - COMPUTER PROGRAMMING (26 journals)
    - COMPUTER SCIENCE (1231 journals)
    - COMPUTER SECURITY (50 journals)
    - DATA BASE MANAGEMENT (14 journals)
    - DATA MINING (38 journals)
    - E-BUSINESS (22 journals)
    - E-LEARNING (30 journals)
    - ELECTRONIC DATA PROCESSING (22 journals)
    - IMAGE AND VIDEO PROCESSING (40 journals)
    - INFORMATION SYSTEMS (107 journals)
    - INTERNET (96 journals)
    - SOCIAL WEB (53 journals)
    - SOFTWARE (34 journals)
    - THEORY OF COMPUTING (9 journals)

COMPUTER SCIENCE (1231 journals)                  1 2 3 4 5 6 7 | Last

Showing 1 - 200 of 872 Journals sorted alphabetically
3D Printing and Additive Manufacturing     Full-text available via subscription   (Followers: 24)
Abakós     Open Access   (Followers: 4)
ACM Computing Surveys     Hybrid Journal   (Followers: 31)
ACM Journal on Computing and Cultural Heritage     Hybrid Journal   (Followers: 8)
ACM Journal on Emerging Technologies in Computing Systems     Hybrid Journal   (Followers: 17)
ACM Transactions on Accessible Computing (TACCESS)     Hybrid Journal   (Followers: 3)
ACM Transactions on Algorithms (TALG)     Hybrid Journal   (Followers: 15)
ACM Transactions on Applied Perception (TAP)     Hybrid Journal   (Followers: 5)
ACM Transactions on Architecture and Code Optimization (TACO)     Hybrid Journal   (Followers: 9)
ACM Transactions on Autonomous and Adaptive Systems (TAAS)     Hybrid Journal   (Followers: 9)
ACM Transactions on Computation Theory (TOCT)     Hybrid Journal   (Followers: 12)
ACM Transactions on Computational Logic (TOCL)     Hybrid Journal   (Followers: 3)
ACM Transactions on Computer Systems (TOCS)     Hybrid Journal   (Followers: 18)
ACM Transactions on Computer-Human Interaction     Hybrid Journal   (Followers: 16)
ACM Transactions on Computing Education (TOCE)     Hybrid Journal   (Followers: 7)
ACM Transactions on Design Automation of Electronic Systems (TODAES)     Hybrid Journal   (Followers: 6)
ACM Transactions on Economics and Computation     Hybrid Journal   (Followers: 2)
ACM Transactions on Embedded Computing Systems (TECS)     Hybrid Journal   (Followers: 3)
ACM Transactions on Information Systems (TOIS)     Hybrid Journal   (Followers: 20)
ACM Transactions on Intelligent Systems and Technology (TIST)     Hybrid Journal   (Followers: 8)
ACM Transactions on Interactive Intelligent Systems (TiiS)     Hybrid Journal   (Followers: 5)
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)     Hybrid Journal   (Followers: 9)
ACM Transactions on Reconfigurable Technology and Systems (TRETS)     Hybrid Journal   (Followers: 6)
ACM Transactions on Sensor Networks (TOSN)     Hybrid Journal   (Followers: 8)
ACM Transactions on Speech and Language Processing (TSLP)     Hybrid Journal   (Followers: 9)
ACM Transactions on Storage     Hybrid Journal  
ACS Applied Materials & Interfaces     Hybrid Journal   (Followers: 35)
Acta Automatica Sinica     Full-text available via subscription   (Followers: 2)
Acta Informatica Malaysia     Open Access  
Acta Universitatis Cibiniensis. Technical Series     Open Access  
Ad Hoc Networks     Hybrid Journal   (Followers: 11)
Adaptive Behavior     Hybrid Journal   (Followers: 10)
Advanced Engineering Materials     Hybrid Journal   (Followers: 29)
Advanced Science Letters     Full-text available via subscription   (Followers: 11)
Advances in Adaptive Data Analysis     Hybrid Journal   (Followers: 7)
Advances in Artificial Intelligence     Open Access   (Followers: 15)
Advances in Calculus of Variations     Hybrid Journal   (Followers: 6)
Advances in Catalysis     Full-text available via subscription   (Followers: 5)
Advances in Computational Mathematics     Hybrid Journal   (Followers: 19)
Advances in Computer Engineering     Open Access   (Followers: 4)
Advances in Computer Science : an International Journal     Open Access   (Followers: 14)
Advances in Computing     Open Access   (Followers: 2)
Advances in Data Analysis and Classification     Hybrid Journal   (Followers: 59)
Advances in Engineering Software     Hybrid Journal   (Followers: 28)
Advances in Geosciences (ADGEO)     Open Access   (Followers: 14)
Advances in Human Factors/Ergonomics     Full-text available via subscription   (Followers: 23)
Advances in Human-Computer Interaction     Open Access   (Followers: 21)
Advances in Materials Science     Open Access   (Followers: 15)
Advances in Operations Research     Open Access   (Followers: 12)
Advances in Parallel Computing     Full-text available via subscription   (Followers: 7)
Advances in Porous Media     Full-text available via subscription   (Followers: 5)
Advances in Remote Sensing     Open Access   (Followers: 51)
Advances in Science and Research (ASR)     Open Access   (Followers: 6)
Advances in Technology Innovation     Open Access   (Followers: 6)
AEU - International Journal of Electronics and Communications     Hybrid Journal   (Followers: 8)
African Journal of Information and Communication     Open Access   (Followers: 9)
African Journal of Mathematics and Computer Science Research     Open Access   (Followers: 4)
AI EDAM     Hybrid Journal   (Followers: 1)
Air, Soil & Water Research     Open Access   (Followers: 14)
AIS Transactions on Human-Computer Interaction     Open Access   (Followers: 7)
Algebras and Representation Theory     Hybrid Journal   (Followers: 1)
Algorithms     Open Access   (Followers: 11)
American Journal of Computational and Applied Mathematics     Open Access   (Followers: 5)
American Journal of Computational Mathematics     Open Access   (Followers: 4)
American Journal of Information Systems     Open Access   (Followers: 6)
American Journal of Sensor Technology     Open Access   (Followers: 4)
Anais da Academia Brasileira de Ciências     Open Access   (Followers: 2)
Analog Integrated Circuits and Signal Processing     Hybrid Journal   (Followers: 7)
Analysis in Theory and Applications     Hybrid Journal   (Followers: 1)
Animation Practice, Process & Production     Hybrid Journal   (Followers: 5)
Annals of Combinatorics     Hybrid Journal   (Followers: 4)
Annals of Data Science     Hybrid Journal   (Followers: 12)
Annals of Mathematics and Artificial Intelligence     Hybrid Journal   (Followers: 12)
Annals of Pure and Applied Logic     Open Access   (Followers: 3)
Annals of Software Engineering     Hybrid Journal   (Followers: 13)
Annals of West University of Timisoara - Mathematics and Computer Science     Open Access  
Annual Reviews in Control     Hybrid Journal   (Followers: 8)
Anuario Americanista Europeo     Open Access  
Applicable Algebra in Engineering, Communication and Computing     Hybrid Journal   (Followers: 2)
Applied and Computational Harmonic Analysis     Full-text available via subscription   (Followers: 1)
Applied Artificial Intelligence: An International Journal     Hybrid Journal   (Followers: 12)
Applied Categorical Structures     Hybrid Journal   (Followers: 5)
Applied Clinical Informatics     Hybrid Journal   (Followers: 2)
Applied Computational Intelligence and Soft Computing     Open Access   (Followers: 14)
Applied Computer Systems     Open Access   (Followers: 2)
Applied Informatics     Open Access  
Applied Mathematics and Computation     Hybrid Journal   (Followers: 33)
Applied Medical Informatics     Open Access   (Followers: 11)
Applied Numerical Mathematics     Hybrid Journal   (Followers: 5)
Applied Soft Computing     Hybrid Journal   (Followers: 17)
Applied Spatial Analysis and Policy     Hybrid Journal   (Followers: 7)
Applied System Innovation     Open Access  
Architectural Theory Review     Hybrid Journal   (Followers: 3)
Archive of Applied Mechanics     Hybrid Journal   (Followers: 6)
Archive of Numerical Software     Open Access  
Archives and Museum Informatics     Hybrid Journal   (Followers: 152)
Archives of Computational Methods in Engineering     Hybrid Journal   (Followers: 6)
arq: Architectural Research Quarterly     Hybrid Journal   (Followers: 8)
Artifact     Open Access   (Followers: 2)
Artificial Life     Hybrid Journal   (Followers: 7)
Asia Pacific Journal on Computational Engineering     Open Access  
Asia-Pacific Journal of Information Technology and Multimedia     Open Access   (Followers: 1)
Asian Journal of Control     Hybrid Journal  
Assembly Automation     Hybrid Journal   (Followers: 2)
at - Automatisierungstechnik     Hybrid Journal   (Followers: 1)
Australian Educational Computing     Open Access   (Followers: 1)
Automatic Control and Computer Sciences     Hybrid Journal   (Followers: 6)
Automatic Documentation and Mathematical Linguistics     Hybrid Journal   (Followers: 5)
Automatica     Hybrid Journal   (Followers: 12)
Automation in Construction     Hybrid Journal   (Followers: 7)
Autonomous Mental Development, IEEE Transactions on     Hybrid Journal   (Followers: 8)
Balkan Journal of Electrical and Computer Engineering     Open Access  
Basin Research     Hybrid Journal   (Followers: 5)
Behaviour & Information Technology     Hybrid Journal   (Followers: 51)
Big Data and Cognitive Computing     Open Access   (Followers: 3)
Biodiversity Information Science and Standards     Open Access  
Bioinformatics     Hybrid Journal   (Followers: 323)
Biomedical Engineering     Hybrid Journal   (Followers: 16)
Biomedical Engineering and Computational Biology     Open Access   (Followers: 13)
Biomedical Engineering, IEEE Reviews in     Full-text available via subscription   (Followers: 19)
Biomedical Engineering, IEEE Transactions on     Hybrid Journal   (Followers: 35)
Briefings in Bioinformatics     Hybrid Journal   (Followers: 51)
British Journal of Educational Technology     Hybrid Journal   (Followers: 157)
Broadcasting, IEEE Transactions on     Hybrid Journal   (Followers: 12)
c't Magazin fuer Computertechnik     Full-text available via subscription   (Followers: 1)
CALCOLO     Hybrid Journal  
Calphad     Hybrid Journal   (Followers: 2)
Canadian Journal of Electrical and Computer Engineering     Full-text available via subscription   (Followers: 15)
Capturing Intelligence     Full-text available via subscription  
Catalysis in Industry     Hybrid Journal   (Followers: 1)
CEAS Space Journal     Hybrid Journal   (Followers: 2)
Cell Communication and Signaling     Open Access   (Followers: 2)
Central European Journal of Computer Science     Hybrid Journal   (Followers: 5)
CERN IdeaSquare Journal of Experimental Innovation     Open Access   (Followers: 3)
Chaos, Solitons & Fractals     Hybrid Journal   (Followers: 3)
Chemometrics and Intelligent Laboratory Systems     Hybrid Journal   (Followers: 15)
ChemSusChem     Hybrid Journal   (Followers: 7)
China Communications     Full-text available via subscription   (Followers: 8)
Chinese Journal of Catalysis     Full-text available via subscription   (Followers: 2)
CIN Computers Informatics Nursing     Hybrid Journal   (Followers: 11)
Circuits and Systems     Open Access   (Followers: 15)
Clean Air Journal     Full-text available via subscription   (Followers: 1)
CLEI Electronic Journal     Open Access  
Clin-Alert     Hybrid Journal   (Followers: 1)
Clinical eHealth     Open Access  
Cluster Computing     Hybrid Journal   (Followers: 2)
Cognitive Computation     Hybrid Journal   (Followers: 4)
COMBINATORICA     Hybrid Journal  
Combinatorics, Probability and Computing     Hybrid Journal   (Followers: 4)
Combustion Theory and Modelling     Hybrid Journal   (Followers: 14)
Communication Methods and Measures     Hybrid Journal   (Followers: 13)
Communication Theory     Hybrid Journal   (Followers: 24)
Communications Engineer     Hybrid Journal   (Followers: 1)
Communications in Algebra     Hybrid Journal   (Followers: 3)
Communications in Computational Physics     Full-text available via subscription   (Followers: 2)
Communications in Information Science and Management Engineering     Open Access   (Followers: 4)
Communications in Partial Differential Equations     Hybrid Journal   (Followers: 4)
Communications of the ACM     Full-text available via subscription   (Followers: 51)
Communications of the Association for Information Systems     Open Access   (Followers: 16)
COMPEL: The International Journal for Computation and Mathematics in Electrical and Electronic Engineering     Hybrid Journal   (Followers: 3)
Complex & Intelligent Systems     Open Access   (Followers: 1)
Complex Adaptive Systems Modeling     Open Access  
Complex Analysis and Operator Theory     Hybrid Journal   (Followers: 2)
Complexity     Hybrid Journal   (Followers: 6)
Complexus     Full-text available via subscription  
Composite Materials Series     Full-text available via subscription   (Followers: 8)
Computación y Sistemas     Open Access  
Computation     Open Access   (Followers: 1)
Computational and Applied Mathematics     Hybrid Journal   (Followers: 3)
Computational and Mathematical Biophysics     Open Access   (Followers: 1)
Computational and Mathematical Methods in Medicine     Open Access   (Followers: 2)
Computational and Mathematical Organization Theory     Hybrid Journal   (Followers: 2)
Computational and Structural Biotechnology Journal     Open Access   (Followers: 1)
Computational and Theoretical Chemistry     Hybrid Journal   (Followers: 9)
Computational Astrophysics and Cosmology     Open Access   (Followers: 1)
Computational Biology and Chemistry     Hybrid Journal   (Followers: 12)
Computational Chemistry     Open Access   (Followers: 2)
Computational Cognitive Science     Open Access   (Followers: 2)
Computational Complexity     Hybrid Journal   (Followers: 4)
Computational Condensed Matter     Open Access   (Followers: 1)
Computational Ecology and Software     Open Access   (Followers: 9)
Computational Economics     Hybrid Journal   (Followers: 9)
Computational Geosciences     Hybrid Journal   (Followers: 17)
Computational Linguistics     Open Access   (Followers: 24)
Computational Management Science     Hybrid Journal  
Computational Mathematics and Modeling     Hybrid Journal   (Followers: 8)
Computational Mechanics     Hybrid Journal   (Followers: 5)
Computational Methods and Function Theory     Hybrid Journal  
Computational Molecular Bioscience     Open Access   (Followers: 2)
Computational Optimization and Applications     Hybrid Journal   (Followers: 8)
Computational Particle Mechanics     Hybrid Journal   (Followers: 1)
Computational Research     Open Access   (Followers: 1)
Computational Science and Discovery     Full-text available via subscription   (Followers: 2)
Computational Science and Techniques     Open Access  
Computational Statistics     Hybrid Journal   (Followers: 14)
Computational Statistics & Data Analysis     Hybrid Journal   (Followers: 35)
Computer     Full-text available via subscription   (Followers: 105)
Computer Aided Surgery     Open Access   (Followers: 6)
Computer Applications in Engineering Education     Hybrid Journal   (Followers: 8)
Computer Communications     Hybrid Journal   (Followers: 16)

        1 2 3 4 5 6 7 | Last

Journal Cover
Advances in Data Analysis and Classification
Journal Prestige (SJR): 1.09
Citation Impact (citeScore): 1
Number of Followers: 59  
 
  Hybrid Journal Hybrid journal (It can contain Open Access articles)
ISSN (Print) 1862-5355 - ISSN (Online) 1862-5347
Published by Springer-Verlag Homepage  [2352 journals]
  • Exploration of the variability of variable selection based on distances
           between bootstrap sample results
    • Abstract: It is well known that variable selection in multiple regression can be unstable and that the model uncertainty can be considerable. The model uncertainty can be quantified and explored by bootstrap resampling, see Sauerbrei et al. (Biom J 57:531–555, 2015). Here approaches are introduced that use the results of bootstrap replications of the variable selection process to obtain more detailed information about the data. Analyses will be based on dissimilarities between the results of the analyses of different bootstrap samples. Dissimilarities are computed between the vector of predictions, and between the sets of selected variables. The dissimilarities are used to map the models by multidimensional scaling, to cluster them, and to construct heatplots. Clusters can point to different interpretations of the data that could arise from different selections of variables supported by different bootstrap samples. A new measure of variable selection instability is also defined. The methodology can be applied to various regression models, estimators, and variable selection methods. It will be illustrated by three real data examples, using linear regression and a Cox proportional hazards model, and model selection by AIC and BIC.
      PubDate: 2019-02-15
       
  • Discriminant analysis for discrete variables derived from a
           tree-structured graphical model
    • Authors: Gonzalo Perez-de-la-Cruz; Guillermina Eslava-Gomez
      Abstract: The purpose of this paper is to illustrate the potential use of discriminant analysis for discrete variables whose dependence structure is assumed to follow, or can be approximated by, a tree-structured graphical model. This is done by comparing its empirical performance, using estimated error rates for real and simulated data, with the well-known Naive Bayes classification rule and with linear logistic regression, both of which do not consider any interaction between variables, and with models that consider interactions like a decomposable and the saturated model. The results show that discriminant analysis based on tree-structured graphical models, a simple nonlinear method including only some of the pairwise interactions between variables, is competitive with, and sometimes superior to, other methods which assume no interactions, and has the advantage over more complex decomposable models of finding the graph structure in a fast way and exact form.
      PubDate: 2019-02-12
      DOI: 10.1007/s11634-019-00352-z
       
  • Ensemble of a subset of k NN classifiers
    • Authors: Asma Gul; Aris Perperoglou; Zardad Khan; Osama Mahmoud; Miftahuddin Miftahuddin; Werner Adler; Berthold Lausen
      Pages: 827 - 840
      Abstract: Combining multiple classifiers, known as ensemble methods, can give substantial improvement in prediction performance of learning algorithms especially in the presence of non-informative features in the data sets. We propose an ensemble of subset of kNN classifiers, ESkNN, for classification task in two steps. Firstly, we choose classifiers based upon their individual performance using the out-of-sample accuracy. The selected classifiers are then combined sequentially starting from the best model and assessed for collective performance on a validation data set. We use bench mark data sets with their original and some added non-informative features for the evaluation of our method. The results are compared with usual kNN, bagged kNN, random kNN, multiple feature subset method, random forest and support vector machines. Our experimental comparisons on benchmark classification problems and simulated data sets reveal that the proposed ensemble gives better classification performance than the usual kNN and its ensembles, and performs comparable to random forest and support vector machines.
      PubDate: 2018-12-01
      DOI: 10.1007/s11634-015-0227-5
      Issue No: Vol. 12, No. 4 (2018)
       
  • Understanding non-linear modeling of measurement invariance in
           heterogeneous populations
    • Authors: Deana Desa
      Pages: 841 - 865
      Abstract: This study examined how a non-linear modeling of ordered categorical variables within multiple-group confirmatory factor analysis supported measurement invariance. A four-item classroom disciplinary climate scale used in cross-cultural framework was empirically investigated. In the first part of the analysis, a separated categorical confirmatory factor analysis was initially applied to account for the complex structure of the relationships between the observed measures in each country. The categorical multiple-group confirmatory factor analysis (MGCFA) was then used to conduct a cross-country examination of full measurement invariance namely the configural, metric, and scalar levels of invariance in the classroom discipline climate measures. The categorical MGCFA modeling supported configural and metric invariances as well as scalar invariance for the latent factor structure of classroom disciplinary climate. This finding implying meaningful cross-country comparisons on the scale means, on the associations of classroom disciplinary climate scale with other scales and on the item-factor latent structure. Application of the categorical modeling appeared to correctly specify the factor structure of the scale, thereby promising the appropriateness of reporting comparisons such as rankings of many groups, and illustrating league tables of different heterogeneous groups. Limitations of the modeling in this study and future suggestions for measurement invariance testing in studies with large numbers of groups are discussed.
      PubDate: 2018-12-01
      DOI: 10.1007/s11634-016-0240-3
      Issue No: Vol. 12, No. 4 (2018)
       
  • A comparative study on large scale kernelized support vector machines
    • Authors: Daniel Horn; Aydın Demircioğlu; Bernd Bischl; Tobias Glasmachers; Claus Weihs
      Pages: 867 - 883
      Abstract: Kernelized support vector machines (SVMs) belong to the most widely used classification methods. However, in contrast to linear SVMs, the computation time required to train such a machine becomes a bottleneck when facing large data sets. In order to mitigate this shortcoming of kernel SVMs, many approximate training algorithms were developed. While most of these methods claim to be much faster than the state-of-the-art solver LIBSVM, a thorough comparative study is missing. We aim to fill this gap. We choose several well-known approximate SVM solvers and compare their performance on a number of large benchmark data sets. Our focus is to analyze the trade-off between prediction error and runtime for different learning and accuracy parameter settings. This includes simple subsampling of the data, the poor-man’s approach to handling large scale problems. We employ model-based multi-objective optimization, which allows us to tune the parameters of learning machine and solver over the full range of accuracy/runtime trade-offs. We analyze (differences between) solvers by studying and comparing the Pareto fronts formed by the two objectives classification error and training time. Unsurprisingly, given more runtime most solvers are able to find more accurate solutions, i.e., achieve a higher prediction accuracy. It turns out that LIBSVM with subsampling of the data is a strong baseline. Some solvers systematically outperform others, which allows us to give concrete recommendations of when to use which solver.
      PubDate: 2018-12-01
      DOI: 10.1007/s11634-016-0265-7
      Issue No: Vol. 12, No. 4 (2018)
       
  • A computationally fast variable importance test for random forests for
           high-dimensional data
    • Authors: Silke Janitza; Ender Celik; Anne-Laure Boulesteix
      Pages: 885 - 915
      Abstract: Random forests are a commonly used tool for classification and for ranking candidate predictors based on the so-called variable importance measures. These measures attribute scores to the variables reflecting their importance. A drawback of variable importance measures is that there is no natural cutoff that can be used to discriminate between important and non-important variables. Several approaches, for example approaches based on hypothesis testing, were developed for addressing this problem. The existing testing approaches require the repeated computation of random forests. While for low-dimensional settings those approaches might be computationally tractable, for high-dimensional settings typically including thousands of candidate predictors, computing time is enormous. In this article a computationally fast heuristic variable importance test is proposed that is appropriate for high-dimensional data where many variables do not carry any information. The testing approach is based on a modified version of the permutation variable importance, which is inspired by cross-validation procedures. The new approach is tested and compared to the approach of Altmann and colleagues using simulation studies, which are based on real data from high-dimensional binary classification settings. The new approach controls the type I error and has at least comparable power at a substantially smaller computation time in the studies. Thus, it might be used as a computationally fast alternative to existing procedures for high-dimensional data settings where many variables do not carry any information. The new approach is implemented in the R package vita.
      PubDate: 2018-12-01
      DOI: 10.1007/s11634-016-0276-4
      Issue No: Vol. 12, No. 4 (2018)
       
  • Rank-based classifiers for extremely high-dimensional gene expression data
    • Authors: Ludwig Lausser; Florian Schmid; Lyn-Rouven Schirra; Adalbert F. X. Wilhelm; Hans A. Kestler
      Pages: 917 - 936
      Abstract: Predicting phenotypes on the basis of gene expression profiles is a classification task that is becoming increasingly important in the field of precision medicine. Although these expression signals are real-valued, it is questionable if they can be analyzed on an interval scale. As with many biological signals their influence on e.g. protein levels is usually non-linear and thus can be misinterpreted. In this article we study gene expression profiles with up to 54,000 dimensions. We analyze these measurements on an ordinal scale by replacing the real-valued profiles by their ranks. This type of rank transformation can be used for the construction of invariant classifiers that are not affected by noise induced by data transformations which can occur in the measurement setup. Our 10 \(\times \) 10 fold cross-validation experiments on 86 different data sets and 19 different classification models indicate that classifiers largely benefit from this transformation. Especially random forests and support vector machines achieve improved classification results on a significant majority of datasets.
      PubDate: 2018-12-01
      DOI: 10.1007/s11634-016-0277-3
      Issue No: Vol. 12, No. 4 (2018)
       
  • Ensemble feature selection for high dimensional data: a new method and a
           comparative study
    • Authors: Afef Ben Brahim; Mohamed Limam
      Pages: 937 - 952
      Abstract: The curse of dimensionality is based on the fact that high dimensional data is often difficult to work with. A large number of features can increase the noise of the data and thus the error of a learning algorithm. Feature selection is a solution for such problems where there is a need to reduce the data dimensionality. Different feature selection algorithms may yield feature subsets that can be considered local optima in the space of feature subsets. Ensemble feature selection combines independent feature subsets and might give a better approximation to the optimal subset of features. We propose an ensemble feature selection approach based on feature selectors’ reliability assessment. It aims at providing a unique and stable feature selection without ignoring the predictive accuracy aspect. A classification algorithm is used as an evaluator to assign a confidence to features selected by ensemble members based on their associated classification performance. We compare our proposed approach to several existing techniques and to individual feature selection algorithms. Results show that our approach often improves classification performance and feature selection stability for high dimensional data sets.
      PubDate: 2018-12-01
      DOI: 10.1007/s11634-017-0285-y
      Issue No: Vol. 12, No. 4 (2018)
       
  • An efficient random forests algorithm for high dimensional data
           classification
    • Authors: Qiang Wang; Thanh-Tung Nguyen; Joshua Z. Huang; Thuy Thi Nguyen
      Pages: 953 - 972
      Abstract: In this paper, we propose a new random forest (RF) algorithm to deal with high dimensional data for classification using subspace feature sampling method and feature value searching. The new subspace sampling method maintains the diversity and randomness of the forest and enables one to generate trees with a lower prediction error. A greedy technique is used to handle cardinal categorical features for efficient node splitting when building decision trees in the forest. This allows trees to handle very high cardinality meanwhile reducing computational time in building the RF model. Extensive experiments on high dimensional real data sets including standard machine learning data sets and image data sets have been conducted. The results demonstrated that the proposed approach for learning RFs significantly reduced prediction errors and outperformed most existing RFs when dealing with high-dimensional data.
      PubDate: 2018-12-01
      DOI: 10.1007/s11634-018-0318-1
      Issue No: Vol. 12, No. 4 (2018)
       
  • Equi-Clustream: a framework for clustering time evolving mixed data
    • Authors: Ravi Sankar Sangam; Hari Om
      Pages: 973 - 995
      Abstract: In data stream environment, most of the conventional clustering algorithms are not sufficiently efficient, since large volumes of data arrive in a stream and these data points unfold with time. The problem of clustering time-evolving metric data and categorical time-evolving data has separately been well explored in recent years, but the problem of clustering mixed type time-evolving data remains a challenging issue due to an awkward gap between the structure of metric and categorical attributes. In this paper, we devise a generalized framework, termed Equi-Clustream to dynamically cluster mixed type time-evolving data, which comprises three algorithms: a Hybrid Drifting Concept Detection Algorithm that detects the drifting concept between the current sliding window and previous sliding window, a Hybrid Data Labeling Algorithm that assigns an appropriate cluster label to each data vector of the current non-drifting window based on the clustering result of the previous sliding window, and a visualization algorithm that analyses the relationship between the clusters at different timestamps and also visualizes the evolving trends of the clusters. The efficacy of the proposed framework is shown by experiments on synthetic and real world datasets.
      PubDate: 2018-12-01
      DOI: 10.1007/s11634-018-0316-3
      Issue No: Vol. 12, No. 4 (2018)
       
  • Convex clustering for binary data
    • Authors: Hosik Choi; Seokho Lee
      Abstract: We present a new clustering algorithm for multivariate binary data. The new algorithm is based on the convex relaxation of hierarchical clustering, which is achieved by considering the binomial likelihood as a natural distribution for binary data and by formulating convex clustering using a pairwise penalty on prototypes of clusters. Under convex clustering, we show that the typical \(\ell _1\) pairwise fused penalty results in ineffective cluster formation. In an attempt to promote the clustering performance and select the relevant clustering variables, we propose the penalized maximum likelihood estimation with an \(\ell _2\) fused penalty on the fusion parameters and an \(\ell _1\) penalty on the loading matrix. We provide an efficient algorithm to solve the optimization by using majorization-minimization algorithm and alternative direction method of multipliers. Numerical studies confirmed its good performance and real data analysis demonstrates the practical usefulness of the proposed method.
      PubDate: 2018-11-14
      DOI: 10.1007/s11634-018-0350-1
       
  • Special issue on “Science of big data: theory, methods and
           applications”
    • Authors: Hans A. Kestler; Paul D. McNicholas; Adalbert F. X. Wilhelm
      PubDate: 2018-11-01
      DOI: 10.1007/s11634-018-0349-7
       
  • Orthogonal nonnegative matrix tri-factorization based on Tweedie
           distributions
    • Authors: Hiroyasu Abe; Hiroshi Yadohisa
      Abstract: Orthogonal nonnegative matrix tri-factorization (ONMTF) is a biclustering method using a given nonnegative data matrix and has been applied to document-term clustering, collaborative filtering, and so on. In previously proposed ONMTF methods, it is assumed that the error distribution is normal. However, the assumption of normal distribution is not always appropriate for nonnegative data. In this paper, we propose three new ONMTF methods, which respectively employ the following error distributions: normal, Poisson, and compound Poisson. To develop the new methods, we adopt a k-means based algorithm but not a multiplicative updating algorithm, which was the main method used for obtaining estimators in previous methods. A simulation study and an application involving document-term matrices demonstrate that our method can outperform previous methods, in terms of the goodness of clustering and in the estimation of the factor matrix.
      PubDate: 2018-10-25
      DOI: 10.1007/s11634-018-0348-8
       
  • Random effects clustering in multilevel modeling: choosing a proper
           partition
    • Authors: Claudio Conversano; Massimo Cannas; Francesco Mola; Emiliano Sironi
      Abstract: A novel criterion for estimating a latent partition of the observed groups based on the output of a hierarchical model is presented. It is based on a loss function combining the Gini income inequality ratio and the predictability index of Goodman and Kruskal in order to achieve maximum heterogeneity of random effects across groups and maximum homogeneity of predicted probabilities inside estimated clusters. The index is compared with alternative approaches in a simulation study and applied in a case study concerning the role of hospital level variables in deciding for a cesarean section.
      PubDate: 2018-10-12
      DOI: 10.1007/s11634-018-0347-9
       
  • Supervised learning via smoothed Polya trees
    • Authors: William Cipolli; Timothy Hanson
      Abstract: We propose a generative classification model that extends Quadratic Discriminant Analysis (QDA) (Cox in J R Stat Soc Ser B (Methodol) 20:215–242, 1958) and Linear Discriminant Analysis (LDA) (Fisher in Ann Eugen 7:179–188, 1936; Rao in J R Stat Soc Ser B 10:159–203, 1948) to the Bayesian nonparametric setting, providing a competitor to MclustDA (Fraley and Raftery in Am Stat Assoc 97:611–631, 2002). This approach models the data distribution for each class using a multivariate Polya tree and realizes impressive results in simulations and real data analyses. The flexibility gained from further relaxing the distributional assumptions of QDA can greatly improve the ability to correctly classify new observations for models with severe deviations from parametric distributional assumptions, while still performing well when the assumptions hold. The proposed method is quite fast compared to other supervised classifiers and very simple to implement as there are no kernel tricks or initialization steps perhaps making it one of the more user-friendly approaches to supervised learning. This highlights a significant feature of the proposed methodology as suboptimal tuning can greatly hamper classification performance; e.g., SVMs fit with non-optimal kernels perform significantly worse.
      PubDate: 2018-10-12
      DOI: 10.1007/s11634-018-0344-z
       
  • sARI: a soft agreement measure for class partitions incorporating
           assignment probabilities
    • Authors: Abby Flynt; Nema Dean; Rebecca Nugent
      Abstract: Agreement indices are commonly used to summarize the performance of both classification and clustering methods. The easy interpretation/intuition and desirable properties that result from the Rand and adjusted Rand indices, has led to their popularity over other available indices. While more algorithmic clustering approaches like k-means and hierarchical clustering produce hard partition assignments (assigning observations to a single cluster), other techniques like model-based clustering include information about the certainty of allocation of objects through class membership probabilities (soft partitions). To assess performance using traditional indices, e.g., the adjusted Rand index (ARI), the soft partition is mapped to a hard set of assignments, which commonly overstates the certainty of correct assignments. This paper proposes an extension of the ARI, the soft adjusted Rand index (sARI), with similar intuition and interpretation but also incorporating information from one or two soft partitions. It can be used in conjunction with the ARI, comparing the similarities of hard to soft, or soft to soft partitions to the similarities of the mapped hard partitions. Simulation study results support the intuition that in general, mapping to hard partitions tends to increase the measure of similarity between partitions. In applications, the sARI more accurately reflects the cluster boundary overlap commonly seen in real data.
      PubDate: 2018-10-09
      DOI: 10.1007/s11634-018-0346-x
       
  • Generalised linear model trees with global additive effects
    • Authors: Heidi Seibold; Torsten Hothorn; Achim Zeileis
      Abstract: Model-based trees are used to find subgroups in data which differ with respect to model parameters. In some applications it is natural to keep some parameters fixed globally for all observations while asking if and how other parameters vary across subgroups. Existing implementations of model-based trees can only deal with the scenario where all parameters depend on the subgroups. We propose partially additive linear model trees (PALM trees) as an extension of (generalised) linear model trees (LM and GLM trees, respectively), in which the model parameters are specified a priori to be estimated either globally from all observations or locally from the observations within the subgroups determined by the tree. Simulations show that the method has high power for detecting subgroups in the presence of global effects and reliably recovers the true parameters. Furthermore, treatment–subgroup differences are detected in an empirical application of the method to data from a mathematics exam: the PALM tree is able to detect a small subgroup of students that had a disadvantage in an exam with two versions while adjusting for overall ability effects.
      PubDate: 2018-10-05
      DOI: 10.1007/s11634-018-0342-1
       
  • A classification tree approach for the modeling of competing risks in
           discrete time
    • Authors: Moritz Berger; Thomas Welchowski; Steffen Schmitz-Valckenberg; Matthias Schmid
      Abstract: Cause-specific hazard models are a popular tool for the analysis of competing risks data. The classical modeling approach in discrete time consists of fitting parametric multinomial logit models. A drawback of this method is that the focus is on main effects only, and that higher order interactions are hard to handle. Moreover, the resulting models contain a large number of parameters, which may cause numerical problems when estimating coefficients. To overcome these problems, a tree-based model is proposed that extends the survival tree methodology developed previously for time-to-event models with one single type of event. The performance of the method, compared with several competitors, is investigated in simulations. The usefulness of the proposed approach is demonstrated by an analysis of age-related macular degeneration among elderly people that were monitored by annual study visits.
      PubDate: 2018-09-28
      DOI: 10.1007/s11634-018-0345-y
       
  • Variable selection in discriminant analysis for mixed continuous-binary
           variables and several groups
    • Authors: Alban Mbina Mbina; Guy Martial Nkiet; Fulgence Eyi Obiang
      Abstract: We propose a method for variable selection in discriminant analysis with mixed continuous and binary variables. This method is based on a criterion that permits to reduce the variable selection problem to a problem of estimating suitable permutation and dimensionality. Then, estimators for these parameters are proposed and the resulting method for selecting variables is shown to be consistent. A simulation study that permits to study several properties of the proposed approach and to compare it with an existing method is given, and an example on a real data set is provided.
      PubDate: 2018-09-21
      DOI: 10.1007/s11634-018-0343-0
       
  • Bayesian nonstationary Gaussian process models via treed process
           convolutions
    • Abstract: The Gaussian process is a common model in a wide variety of applications, such as environmental modeling, computer experiments, and geology. Two major challenges often arise: First, assuming that the process of interest is stationary over the entire domain often proves to be untenable. Second, the traditional Gaussian process model formulation is computationally inefficient for large datasets. In this paper, we propose a new Gaussian process model to tackle these problems based on the convolution of a smoothing kernel with a partitioned latent process. Nonstationarity can be modeled by allowing a separate latent process for each partition, which approximates a regional clustering structure. Partitioning follows a binary tree generating process similar to that of Classification and Regression Trees. A Bayesian approach is used to estimate the partitioning structure and model parameters simultaneously. Our motivating dataset consists of 11918 precipitation anomalies. Results show that our model has promising prediction performance and is computationally efficient for large datasets.
      PubDate: 2018-09-15
      DOI: 10.1007/s11634-018-0341-2
       
 
 
JournalTOCs
School of Mathematical and Computer Sciences
Heriot-Watt University
Edinburgh, EH14 4AS, UK
Email: journaltocs@hw.ac.uk
Tel: +00 44 (0)131 4513762
Fax: +00 44 (0)131 4513327
 
About JournalTOCs
API
Help
News (blog, publications)
JournalTOCs on Twitter   JournalTOCs on Facebook

JournalTOCs © 2009-