Advances in Data Analysis and Classification
Journal Prestige (SJR): 1.09. Citation Impact (CiteScore): 1. Hybrid journal (may contain Open Access articles). ISSN (Print) 1862-5355; ISSN (Online) 1862-5347. Published by Springer-Verlag.
• Exploration of the variability of variable selection based on distances
between bootstrap sample results
Abstract: It is well known that variable selection in multiple regression can be unstable and that the model uncertainty can be considerable. The model uncertainty can be quantified and explored by bootstrap resampling, see Sauerbrei et al. (Biom J 57:531–555, 2015). Here, approaches are introduced that use the results of bootstrap replications of the variable selection process to obtain more detailed information about the data. Analyses are based on dissimilarities between the results of the analyses of different bootstrap samples. Dissimilarities are computed between the vectors of predictions and between the sets of selected variables. The dissimilarities are used to map the models by multidimensional scaling, to cluster them, and to construct heatplots. Clusters can point to different interpretations of the data that could arise from different selections of variables supported by different bootstrap samples. A new measure of variable selection instability is also defined. The methodology can be applied to various regression models, estimators, and variable selection methods. It is illustrated by three real data examples, using linear regression and a Cox proportional hazards model, and model selection by AIC and BIC.
PubDate: 2019-02-15
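The workflow described in the abstract (bootstrap resampling, per-sample variable selection, pairwise dissimilarities between selected sets, multidimensional scaling) can be sketched as follows. This is a toy illustration with synthetic data; the Lasso stands in for the AIC/BIC stepwise selection used in the paper, and the Jaccard distance is one plausible choice of set dissimilarity.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)

# Variable selection on B bootstrap samples (Lasso as a stand-in
# for the AIC/BIC stepwise selection used in the paper).
B = 30
selected = []
for _ in range(B):
    idx = rng.integers(0, n, size=n)            # bootstrap resample
    model = LassoCV(cv=5).fit(X[idx], y[idx])
    selected.append(set(np.flatnonzero(model.coef_ != 0)))

# Jaccard dissimilarity between the selected variable sets.
D = np.zeros((B, B))
for i in range(B):
    for j in range(B):
        a, b = selected[i], selected[j]
        union = len(a | b)
        D[i, j] = 1 - len(a & b) / union if union else 0.0

# Map the B selection results into the plane by multidimensional
# scaling; groups of nearby points correspond to competing model
# interpretations supported by different bootstrap samples.
coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(D)

# One simple instability summary: mean pairwise dissimilarity.
instability = D[np.triu_indices(B, k=1)].mean()
```

An `instability` near 0 means nearly all bootstrap samples agree on the selected set; values near 1 indicate highly unstable selection.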

• Discriminant analysis for discrete variables derived from a
tree-structured graphical model
• Authors: Gonzalo Perez-de-la-Cruz; Guillermina Eslava-Gomez
Abstract: The purpose of this paper is to illustrate the potential use of discriminant analysis for discrete variables whose dependence structure is assumed to follow, or can be approximated by, a tree-structured graphical model. This is done by comparing its empirical performance, using estimated error rates for real and simulated data, with the well-known Naive Bayes classification rule and with linear logistic regression, both of which do not consider any interaction between variables, and with models that consider interactions like a decomposable and the saturated model. The results show that discriminant analysis based on tree-structured graphical models, a simple nonlinear method including only some of the pairwise interactions between variables, is competitive with, and sometimes superior to, other methods which assume no interactions, and has the advantage over more complex decomposable models of finding the graph structure in a fast way and exact form.
PubDate: 2019-02-12
DOI: 10.1007/s11634-019-00352-z
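The tree-structured class-conditional models the abstract refers to can be estimated with the classical Chow-Liu construction: a maximum-weight spanning tree over pairwise mutual informations, one tree per class, followed by likelihood-based classification. The sketch below (toy binary data; variable names and smoothing constants are illustrative, not from the paper) shows the idea.

```python
import numpy as np
from itertools import combinations
from scipy.sparse.csgraph import minimum_spanning_tree

def mutual_info(x, y, eps=1e-9):
    """Empirical mutual information between two binary variables."""
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            pab = np.mean((x == a) & (y == b)) + eps
            pa, pb = np.mean(x == a) + eps, np.mean(y == b) + eps
            mi += pab * np.log(pab / (pa * pb))
    return mi

def chow_liu_edges(X):
    """Maximum-weight spanning tree over pairwise mutual information."""
    p = X.shape[1]
    W = np.zeros((p, p))
    for i, j in combinations(range(p), 2):
        # +1 offset keeps every edge nonzero (SciPy drops zero entries);
        # a constant offset does not change which spanning tree is maximal.
        W[i, j] = mutual_info(X[:, i], X[:, j]) + 1.0
    tree = minimum_spanning_tree(-W)   # negate: SciPy finds a *minimum* tree
    return list(zip(*tree.nonzero()))

def tree_log_lik(x, X_class, edges, eps=1e-2):
    """log P(x) under prod_i p(x_i) * prod_(i,j) p(x_i,x_j)/(p(x_i)p(x_j))."""
    n = len(X_class)
    ll = 0.0
    for i in range(len(x)):
        ll += np.log((np.sum(X_class[:, i] == x[i]) + eps) / (n + 2 * eps))
    for i, j in edges:
        pij = (np.sum((X_class[:, i] == x[i]) & (X_class[:, j] == x[j])) + eps) / (n + 4 * eps)
        pi = (np.sum(X_class[:, i] == x[i]) + eps) / (n + 2 * eps)
        pj = (np.sum(X_class[:, j] == x[j]) + eps) / (n + 2 * eps)
        ll += np.log(pij / (pi * pj))
    return ll

# Toy data: class 0 has positively correlated binary variables,
# class 1 is shifted toward zeros.
rng = np.random.default_rng(1)
X0 = (rng.normal(size=(300, 4)) + rng.normal(size=(300, 1))) > 0
X1 = (rng.normal(size=(300, 4)) - 0.8) > 0
edges0, edges1 = chow_liu_edges(X0), chow_liu_edges(X1)

def classify(x):
    return int(tree_log_lik(x, X1, edges1) > tree_log_lik(x, X0, edges0))
```

Finding the tree structure is a single spanning-tree computation per class, which is the "fast way and exact form" advantage the abstract contrasts with searching over decomposable models.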

• Ensemble of a subset of kNN classifiers
• Authors: Asma Gul; Aris Perperoglou; Zardad Khan; Osama Mahmoud; Miftahuddin Miftahuddin; Werner Adler; Berthold Lausen
Pages: 827 - 840
Abstract: Combining multiple classifiers, known as ensemble methods, can give substantial improvement in the prediction performance of learning algorithms, especially in the presence of non-informative features in the data sets. We propose an ensemble of subsets of kNN classifiers, ESkNN, for classification tasks in two steps. First, we choose classifiers based upon their individual performance using out-of-sample accuracy. The selected classifiers are then combined sequentially, starting from the best model, and assessed for collective performance on a validation data set. We use benchmark data sets with their original and some added non-informative features for the evaluation of our method. The results are compared with the usual kNN, bagged kNN, random kNN, the multiple feature subset method, random forests and support vector machines. Our experimental comparisons on benchmark classification problems and simulated data sets reveal that the proposed ensemble gives better classification performance than the usual kNN and its ensembles, and performs comparably to random forests and support vector machines.
PubDate: 2018-12-01
DOI: 10.1007/s11634-015-0227-5
Issue No: Vol. 12, No. 4 (2018)
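The two-step procedure in the abstract can be sketched directly: build kNN classifiers on random feature subsets, score each on held-out data, then add members greedily while the ensemble's validation accuracy improves. All data, subset sizes, and split ratios below are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=600, n_features=20,
                           n_informative=5, random_state=0)
X_tr, X_tmp, y_tr, y_tmp = train_test_split(X, y, test_size=0.5, random_state=0)
X_val, X_te, y_val, y_te = train_test_split(X_tmp, y_tmp, test_size=0.5,
                                            random_state=0)

# Step 1: kNN classifiers on random feature subsets, each scored on
# held-out data (a stand-in for the paper's out-of-sample accuracy).
members = []
for _ in range(50):
    feats = rng.choice(X.shape[1], size=5, replace=False)
    clf = KNeighborsClassifier(n_neighbors=5).fit(X_tr[:, feats], y_tr)
    members.append((clf.score(X_val[:, feats], y_val), feats, clf))
members.sort(key=lambda m: -m[0])

def vote(selected, Xs):
    """Majority vote of the selected members (binary labels 0/1)."""
    preds = np.array([clf.predict(Xs[:, f]) for _, f, clf in selected])
    return (preds.mean(axis=0) > 0.5).astype(int)

# Step 2: starting from the best member, keep each next-best classifier
# only if it does not hurt validation accuracy of the ensemble.
ensemble = [members[0]]
best_acc = np.mean(vote(ensemble, X_val) == y_val)
for m in members[1:]:
    acc = np.mean(vote(ensemble + [m], X_val) == y_val)
    if acc >= best_acc:
        ensemble.append(m)
        best_acc = acc

test_acc = np.mean(vote(ensemble, X_te) == y_te)
```

Because each member sees only a feature subset, poorly performing subsets (those dominated by non-informative features) are filtered out in step 1, which is what gives the method its robustness to noise features.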

• Understanding non-linear modeling of measurement invariance in
heterogeneous populations
• Authors: Deana Desa
Pages: 841 - 865
Abstract: This study examined how a non-linear modeling of ordered categorical variables within multiple-group confirmatory factor analysis supported measurement invariance. A four-item classroom disciplinary climate scale used in a cross-cultural framework was empirically investigated. In the first part of the analysis, a separate categorical confirmatory factor analysis was applied to account for the complex structure of the relationships between the observed measures in each country. Categorical multiple-group confirmatory factor analysis (MGCFA) was then used to conduct a cross-country examination of full measurement invariance, namely the configural, metric, and scalar levels of invariance in the classroom discipline climate measures. The categorical MGCFA modeling supported configural and metric invariance as well as scalar invariance for the latent factor structure of classroom disciplinary climate. This finding implies that meaningful cross-country comparisons can be made on the scale means, on the associations of the classroom disciplinary climate scale with other scales, and on the item-factor latent structure. Application of the categorical modeling appeared to correctly specify the factor structure of the scale, thereby supporting the appropriateness of reporting comparisons such as rankings of many groups and league tables of different heterogeneous groups. Limitations of the modeling in this study and future suggestions for measurement invariance testing in studies with large numbers of groups are discussed.
PubDate: 2018-12-01
DOI: 10.1007/s11634-016-0240-3
Issue No: Vol. 12, No. 4 (2018)

• A comparative study on large scale kernelized support vector machines
• Authors: Daniel Horn; Aydın Demircioğlu; Bernd Bischl; Tobias Glasmachers; Claus Weihs
Pages: 867 - 883
Abstract: Kernelized support vector machines (SVMs) belong to the most widely used classification methods. However, in contrast to linear SVMs, the computation time required to train such a machine becomes a bottleneck when facing large data sets. In order to mitigate this shortcoming of kernel SVMs, many approximate training algorithms were developed. While most of these methods claim to be much faster than the state-of-the-art solver LIBSVM, a thorough comparative study is missing. We aim to fill this gap. We choose several well-known approximate SVM solvers and compare their performance on a number of large benchmark data sets. Our focus is to analyze the trade-off between prediction error and runtime for different learning and accuracy parameter settings. This includes simple subsampling of the data, the poor man's approach to handling large scale problems. We employ model-based multi-objective optimization, which allows us to tune the parameters of the learning machine and the solver over the full range of accuracy/runtime trade-offs. We analyze (differences between) solvers by studying and comparing the Pareto fronts formed by the two objectives, classification error and training time. Unsurprisingly, given more runtime, most solvers are able to find more accurate solutions, i.e., achieve a higher prediction accuracy. It turns out that LIBSVM with subsampling of the data is a strong baseline. Some solvers systematically outperform others, which allows us to give concrete recommendations of when to use which solver.
PubDate: 2018-12-01
DOI: 10.1007/s11634-016-0265-7
Issue No: Vol. 12, No. 4 (2018)
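The subsampling baseline the abstract highlights is easy to reproduce in miniature: train a full RBF-kernel SVM (LIBSVM, via scikit-learn's `SVC`) on nested subsamples of increasing size and record (size, runtime, accuracy) triples. The data set and subsample fractions below are illustrative; the non-dominated triples form the empirical Pareto front of the two objectives.

```python
import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Trace out the runtime/accuracy trade-off of plain subsampling.
results = []
for frac in (0.1, 0.25, 0.5, 1.0):
    m = int(frac * len(X_tr))
    t0 = time.perf_counter()
    clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_tr[:m], y_tr[:m])
    results.append((m, time.perf_counter() - t0, clf.score(X_te, y_te)))

# Each (size, runtime, accuracy) triple is one candidate point on the
# accuracy/runtime Pareto front.
for m, t, acc in results:
    print(f"n={m:5d}  time={t:6.3f}s  acc={acc:.3f}")
```

Approximate solvers earn their keep only when their (runtime, accuracy) points dominate the points produced by this simple loop, which is exactly the comparison the paper carries out at scale.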

• A computationally fast variable importance test for random forests for
high-dimensional data
• Authors: Silke Janitza; Ender Celik; Anne-Laure Boulesteix
Pages: 885 - 915
Abstract: Random forests are a commonly used tool for classification and for ranking candidate predictors based on the so-called variable importance measures. These measures attribute scores to the variables reflecting their importance. A drawback of variable importance measures is that there is no natural cutoff that can be used to discriminate between important and non-important variables. Several approaches, for example approaches based on hypothesis testing, were developed for addressing this problem. The existing testing approaches require the repeated computation of random forests. While for low-dimensional settings those approaches might be computationally tractable, for high-dimensional settings typically including thousands of candidate predictors, computing time is enormous. In this article a computationally fast heuristic variable importance test is proposed that is appropriate for high-dimensional data where many variables do not carry any information. The testing approach is based on a modified version of the permutation variable importance, which is inspired by cross-validation procedures. The new approach is tested and compared to the approach of Altmann and colleagues using simulation studies, which are based on real data from high-dimensional binary classification settings. The new approach controls the type I error and has at least comparable power at a substantially smaller computation time in the studies. Thus, it might be used as a computationally fast alternative to existing procedures for high-dimensional data settings where many variables do not carry any information. The new approach is implemented in the R package vita.
PubDate: 2018-12-01
DOI: 10.1007/s11634-016-0276-4
Issue No: Vol. 12, No. 4 (2018)
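The key idea, building a null distribution from the importance scores of uninformative variables instead of refitting the forest many times, can be sketched as below. This uses scikit-learn's generic permutation importance as a stand-in; the paper's exact construction is a cross-validation-like variant of the permutation importance, implemented in the R package vita.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# 5 informative variables followed by 45 pure noise variables.
X, y = make_classification(n_samples=400, n_features=50, n_informative=5,
                           n_redundant=0, shuffle=False, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
imp = permutation_importance(rf, X_te, y_te, n_repeats=10,
                             random_state=0).importances_mean

# Heuristic null: importances of uninformative variables scatter around
# zero, so mirroring the non-positive scores approximates the null
# distribution without any additional forest fits.
null = np.concatenate([imp[imp <= 0], -imp[imp <= 0]])
pvals = np.array([(null >= v).mean() for v in imp])
significant = np.flatnonzero(pvals < 0.05)
```

Because the null is assembled from a single fitted forest, the cost is one model fit plus the permutations, versus hundreds of refits for resampling-based tests, which is the speedup the abstract refers to.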

• Rank-based classifiers for extremely high-dimensional gene expression data
• Authors: Ludwig Lausser; Florian Schmid; Lyn-Rouven Schirra; Adalbert F. X. Wilhelm; Hans A. Kestler
Pages: 917 - 936
Abstract: Predicting phenotypes on the basis of gene expression profiles is a classification task that is becoming increasingly important in the field of precision medicine. Although these expression signals are real-valued, it is questionable if they can be analyzed on an interval scale. As with many biological signals, their influence on e.g. protein levels is usually non-linear and thus can be misinterpreted. In this article we study gene expression profiles with up to 54,000 dimensions. We analyze these measurements on an ordinal scale by replacing the real-valued profiles by their ranks. This type of rank transformation can be used for the construction of invariant classifiers that are not affected by noise induced by data transformations which can occur in the measurement setup. Our 10 × 10-fold cross-validation experiments on 86 different data sets and 19 different classification models indicate that classifiers largely benefit from this transformation. Especially random forests and support vector machines achieve improved classification results on a significant majority of datasets.
PubDate: 2018-12-01
DOI: 10.1007/s11634-016-0277-3
Issue No: Vol. 12, No. 4 (2018)
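The sample-wise rank transformation is a one-liner, and its invariance property is easy to verify: any strictly monotone transformation of a profile leaves its ranks unchanged. A minimal sketch (synthetic data, random forest as one of the classifiers the abstract mentions):

```python
import numpy as np
from scipy.stats import rankdata
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=30, random_state=0)

# Replace each profile (row) by the ranks of its values.
X_rank = np.apply_along_axis(rankdata, 1, X)

acc_raw = cross_val_score(RandomForestClassifier(random_state=0),
                          X, y, cv=5).mean()
acc_rank = cross_val_score(RandomForestClassifier(random_state=0),
                           X_rank, y, cv=5).mean()

# Invariance check: ranks are unchanged under a monotone transformation
# of the raw measurements (here exp, standing in for scaling/log effects
# introduced by the measurement setup).
assert np.allclose(np.apply_along_axis(rankdata, 1, np.exp(X)), X_rank)
```

The assertion is the whole point: whatever monotone distortion the measurement pipeline introduces, a classifier trained on ranks sees identical inputs.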

• Ensemble feature selection for high dimensional data: a new method and a
comparative study
• Authors: Afef Ben Brahim; Mohamed Limam
Pages: 937 - 952
Abstract: The curse of dimensionality refers to the fact that high-dimensional data are often difficult to work with. A large number of features can increase the noise in the data and thus the error of a learning algorithm. Feature selection is a solution for such problems, where there is a need to reduce the data dimensionality. Different feature selection algorithms may yield feature subsets that can be considered local optima in the space of feature subsets. Ensemble feature selection combines independent feature subsets and may give a better approximation to the optimal subset of features. We propose an ensemble feature selection approach based on the assessment of feature selectors' reliability. It aims at providing a unique and stable feature selection without ignoring predictive accuracy. A classification algorithm is used as an evaluator to assign a confidence to features selected by ensemble members based on their associated classification performance. We compare our proposed approach to several existing techniques and to individual feature selection algorithms. Results show that our approach often improves classification performance and feature selection stability for high dimensional data sets.
PubDate: 2018-12-01
DOI: 10.1007/s11634-017-0285-y
Issue No: Vol. 12, No. 4 (2018)
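The reliability-weighted aggregation described in the abstract can be sketched as follows: run several selectors, score each one by the cross-validated accuracy of a classifier restricted to its chosen features, then sum those reliability scores per feature. The two filter selectors and the logistic-regression evaluator below are illustrative stand-ins, not the paper's exact ensemble.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import (SelectKBest, f_classif,
                                       mutual_info_classif)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=40, n_informative=5,
                           random_state=0)

# Ensemble members: different filter criteria, each selecting k features.
k = 10
selectors = {
    "anova": SelectKBest(f_classif, k=k),
    "mutual_info": SelectKBest(mutual_info_classif, k=k),
}

# Reliability of each selector: cross-validated accuracy of a classifier
# trained only on its selected features.
scores, subsets = {}, {}
clf = LogisticRegression(max_iter=1000)
for name, sel in selectors.items():
    cols = sel.fit(X, y).get_support(indices=True)
    subsets[name] = cols
    scores[name] = cross_val_score(clf, X[:, cols], y, cv=5).mean()

# Aggregate: a feature's confidence is the summed reliability of the
# selectors that chose it; keep the k most trusted features.
conf = np.zeros(X.shape[1])
for name, cols in subsets.items():
    conf[cols] += scores[name]
final = np.argsort(conf)[-k:]
```

Features chosen by several reliable selectors accumulate the highest confidence, which is what stabilizes the final subset relative to any single selector.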

• An efficient random forests algorithm for high dimensional data
classification
• Authors: Qiang Wang; Thanh-Tung Nguyen; Joshua Z. Huang; Thuy Thi Nguyen
Pages: 953 - 972
Abstract: In this paper, we propose a new random forest (RF) algorithm to deal with high dimensional data for classification, using a subspace feature sampling method and feature value searching. The new subspace sampling method maintains the diversity and randomness of the forest and enables one to generate trees with a lower prediction error. A greedy technique is used to handle high-cardinality categorical features for efficient node splitting when building decision trees in the forest. This allows trees to handle very high cardinality while reducing computational time in building the RF model. Extensive experiments on high dimensional real data sets, including standard machine learning data sets and image data sets, have been conducted. The results demonstrate that the proposed approach for learning RFs significantly reduces prediction errors and outperforms most existing RFs when dealing with high-dimensional data.
PubDate: 2018-12-01
DOI: 10.1007/s11634-018-0318-1
Issue No: Vol. 12, No. 4 (2018)

• Equi-Clustream: a framework for clustering time evolving mixed data
• Authors: Ravi Sankar Sangam; Hari Om
Pages: 973 - 995
Abstract: In a data stream environment, most conventional clustering algorithms are not sufficiently efficient, since large volumes of data arrive in a stream and these data points unfold with time. The problem of clustering time-evolving metric data and categorical time-evolving data has separately been well explored in recent years, but clustering mixed-type time-evolving data remains a challenging issue because of an awkward gap between the structure of metric and categorical attributes. In this paper, we devise a generalized framework, termed Equi-Clustream, to dynamically cluster mixed-type time-evolving data. It comprises three algorithms: a Hybrid Drifting Concept Detection Algorithm that detects the drifting concept between the current sliding window and the previous sliding window; a Hybrid Data Labeling Algorithm that assigns an appropriate cluster label to each data vector of the current non-drifting window based on the clustering result of the previous sliding window; and a visualization algorithm that analyses the relationship between the clusters at different timestamps and visualizes the evolving trends of the clusters. The efficacy of the proposed framework is shown by experiments on synthetic and real world datasets.
PubDate: 2018-12-01
DOI: 10.1007/s11634-018-0316-3
Issue No: Vol. 12, No. 4 (2018)

• Convex clustering for binary data
• Authors: Hosik Choi; Seokho Lee
Abstract: We present a new clustering algorithm for multivariate binary data. The new algorithm is based on the convex relaxation of hierarchical clustering, which is achieved by considering the binomial likelihood as a natural distribution for binary data and by formulating convex clustering using a pairwise penalty on prototypes of clusters. Under convex clustering, we show that the typical ℓ1 pairwise fused penalty results in ineffective cluster formation. In an attempt to promote the clustering performance and select the relevant clustering variables, we propose penalized maximum likelihood estimation with an ℓ2 fused penalty on the fusion parameters and an ℓ1 penalty on the loading matrix. We provide an efficient algorithm to solve the optimization using a majorization-minimization algorithm and the alternating direction method of multipliers. Numerical studies confirm its good performance, and a real data analysis demonstrates the practical usefulness of the proposed method.
PubDate: 2018-11-14
DOI: 10.1007/s11634-018-0350-1

• Special issue on “Science of big data: theory, methods and
applications”
• Authors: Hans A. Kestler; Paul D. McNicholas; Adalbert F. X. Wilhelm
PubDate: 2018-11-01
DOI: 10.1007/s11634-018-0349-7

• Orthogonal nonnegative matrix tri-factorization based on Tweedie
distributions
• Authors: Hiroyasu Abe; Hiroshi Yadohisa
Abstract: Orthogonal nonnegative matrix tri-factorization (ONMTF) is a biclustering method for a given nonnegative data matrix and has been applied to document-term clustering, collaborative filtering, and so on. In previously proposed ONMTF methods, it is assumed that the error distribution is normal. However, the assumption of a normal distribution is not always appropriate for nonnegative data. In this paper, we propose three new ONMTF methods, which respectively employ the following error distributions: normal, Poisson, and compound Poisson. To develop the new methods, we adopt a k-means-based algorithm rather than the multiplicative updating algorithm that was the main method for obtaining estimators in previous work. A simulation study and an application involving document-term matrices demonstrate that our methods can outperform previous methods in terms of both the goodness of clustering and the estimation of the factor matrix.
PubDate: 2018-10-25
DOI: 10.1007/s11634-018-0348-8
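The k-means flavor of tri-factorization can be illustrated in its simplest, normal-error form: cluster the rows and the columns separately, encode the clusterings as orthogonal indicator matrices F and G, and read the block means off as the core matrix S in X ≈ F S Gᵀ. The paper's algorithms alternate such steps under the chosen error distribution; this one-shot sketch on a toy block-structured matrix only shows the shape of the factorization.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Nonnegative toy "document-term" matrix with 2x2 block structure.
X = np.abs(rng.normal(size=(60, 40))) + np.kron(np.eye(2), np.ones((30, 20)))

k_row, k_col = 2, 2
row_lab = KMeans(n_clusters=k_row, n_init=10, random_state=0).fit_predict(X)
col_lab = KMeans(n_clusters=k_col, n_init=10, random_state=0).fit_predict(X.T)

F = np.eye(k_row)[row_lab]   # orthogonal row-cluster indicator matrix
G = np.eye(k_col)[col_lab]   # orthogonal column-cluster indicator matrix

# Core matrix of block means: S = (F^T F)^-1 F^T X G (G^T G)^-1.
S = np.linalg.solve(F.T @ F, F.T @ X @ G) @ np.linalg.inv(G.T @ G)
X_hat = F @ S @ G.T          # biclustered reconstruction
```

Each entry of S is the mean of one bicluster, so X_hat is a piecewise-constant approximation of X; swapping the squared-error block mean for a Poisson or compound-Poisson estimate is where the paper's three variants differ.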

• Random effects clustering in multilevel modeling: choosing a proper
partition
• Authors: Claudio Conversano; Massimo Cannas; Francesco Mola; Emiliano Sironi
Abstract: A novel criterion for estimating a latent partition of the observed groups based on the output of a hierarchical model is presented. It is based on a loss function combining the Gini income inequality ratio and the predictability index of Goodman and Kruskal in order to achieve maximum heterogeneity of random effects across groups and maximum homogeneity of predicted probabilities inside estimated clusters. The index is compared with alternative approaches in a simulation study and applied in a case study concerning the role of hospital level variables in deciding for a cesarean section.
PubDate: 2018-10-12
DOI: 10.1007/s11634-018-0347-9

• Supervised learning via smoothed Polya trees
• Authors: William Cipolli; Timothy Hanson
Abstract: We propose a generative classification model that extends Quadratic Discriminant Analysis (QDA) (Cox in J R Stat Soc Ser B (Methodol) 20:215–242, 1958) and Linear Discriminant Analysis (LDA) (Fisher in Ann Eugen 7:179–188, 1936; Rao in J R Stat Soc Ser B 10:159–203, 1948) to the Bayesian nonparametric setting, providing a competitor to MclustDA (Fraley and Raftery in J Am Stat Assoc 97:611–631, 2002). This approach models the data distribution for each class using a multivariate Polya tree and realizes impressive results in simulations and real data analyses. The flexibility gained from further relaxing the distributional assumptions of QDA can greatly improve the ability to correctly classify new observations for models with severe deviations from parametric distributional assumptions, while still performing well when the assumptions hold. The proposed method is quite fast compared to other supervised classifiers and very simple to implement, as there are no kernel tricks or initialization steps, perhaps making it one of the more user-friendly approaches to supervised learning. This highlights a significant feature of the proposed methodology, as suboptimal tuning can greatly hamper classification performance; e.g., SVMs fit with non-optimal kernels perform significantly worse.
PubDate: 2018-10-12
DOI: 10.1007/s11634-018-0344-z

• sARI: a soft agreement measure for class partitions incorporating
assignment probabilities
• Authors: Abby Flynt; Nema Dean; Rebecca Nugent
Abstract: Agreement indices are commonly used to summarize the performance of both classification and clustering methods. The easy interpretation/intuition and desirable properties of the Rand and adjusted Rand indices have led to their popularity over other available indices. While more algorithmic clustering approaches like k-means and hierarchical clustering produce hard partition assignments (assigning observations to a single cluster), other techniques like model-based clustering include information about the certainty of allocation of objects through class membership probabilities (soft partitions). To assess performance using traditional indices, e.g., the adjusted Rand index (ARI), the soft partition is mapped to a hard set of assignments, which commonly overstates the certainty of correct assignments. This paper proposes an extension of the ARI, the soft adjusted Rand index (sARI), with similar intuition and interpretation but also incorporating information from one or two soft partitions. It can be used in conjunction with the ARI, comparing the similarities of hard-to-soft or soft-to-soft partitions to the similarities of the mapped hard partitions. Simulation study results support the intuition that, in general, mapping to hard partitions tends to increase the measure of similarity between partitions. In applications, the sARI more accurately reflects the cluster boundary overlap commonly seen in real data.
PubDate: 2018-10-09
DOI: 10.1007/s11634-018-0346-x
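One plausible reading of the construction (a sketch, not the authors' exact definition): build an expected contingency table from the membership probabilities and apply the usual ARI formula to it, with the pair count generalised to non-integer cells. When both partitions are hard (one-hot probabilities), this reduces to the ordinary ARI.

```python
import numpy as np

def soft_contingency(P, Q):
    """Expected contingency table between two soft partitions:
    N[k, l] = sum_i P[i, k] * Q[i, l], where rows of P and Q are
    per-observation membership probability vectors."""
    return P.T @ Q

def ari_from_table(N):
    """Adjusted Rand index from a (possibly non-integer) contingency
    table, using the generalised pair count comb2(x) = x * (x - 1) / 2."""
    comb2 = lambda x: x * (x - 1) / 2.0
    n = N.sum()
    sum_cells = comb2(N).sum()
    a = comb2(N.sum(axis=1)).sum()   # pairs within rows
    b = comb2(N.sum(axis=0)).sum()   # pairs within columns
    expected = a * b / comb2(n)      # chance-expected agreement
    max_index = (a + b) / 2.0
    return (sum_cells - expected) / (max_index - expected)
```

Comparing this value on the soft table against the ARI of the hardened assignments illustrates the abstract's point: hardening typically inflates the apparent agreement.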

• Generalised linear model trees with global additive effects
• Authors: Heidi Seibold; Torsten Hothorn; Achim Zeileis
Abstract: Model-based trees are used to find subgroups in data which differ with respect to model parameters. In some applications it is natural to keep some parameters fixed globally for all observations while asking if and how other parameters vary across subgroups. Existing implementations of model-based trees can only deal with the scenario where all parameters depend on the subgroups. We propose partially additive linear model trees (PALM trees) as an extension of (generalised) linear model trees (LM and GLM trees, respectively), in which the model parameters are specified a priori to be estimated either globally from all observations or locally from the observations within the subgroups determined by the tree. Simulations show that the method has high power for detecting subgroups in the presence of global effects and reliably recovers the true parameters. Furthermore, treatment–subgroup differences are detected in an empirical application of the method to data from a mathematics exam: the PALM tree is able to detect a small subgroup of students who were disadvantaged by one of the two exam versions, while adjusting for overall ability effects.
PubDate: 2018-10-05
DOI: 10.1007/s11634-018-0342-1
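The split between global and subgroup-specific parameters can be illustrated with a toy alternating fit (purely illustrative, not the authors' estimation algorithm): a single global slope is estimated from all observations, while a depth-1 "tree" (one binary split on a partitioning variable, a simplification assumed here) provides subgroup intercepts; the two steps are iterated.

```python
import numpy as np

def palm_fit(y, x_global, z, n_iter=20):
    """Toy PALM-tree-style alternating estimation for
    y = beta * x_global + mu_{subgroup(z)} + noise,
    with the subgroup defined by one binary split on z."""
    beta, split = 0.0, np.median(z)
    for _ in range(n_iter):
        # local step: remove the global effect, then search the split
        # on z that best explains the residuals by two group means
        r = y - beta * x_global
        best = None
        for s in np.unique(z)[:-1]:
            left = z <= s
            sse = (((r[left] - r[left].mean()) ** 2).sum()
                   + ((r[~left] - r[~left].mean()) ** 2).sum())
            if best is None or sse < best[0]:
                best = (sse, s)
        split = best[1]
        left = z <= split
        mu = np.where(left, r[left].mean(), r[~left].mean())
        # global step: OLS slope after removing the subgroup intercepts
        beta = np.dot(x_global, y - mu) / np.dot(x_global, x_global)
    return beta, split
```

A real PALM tree replaces the single split with a full recursively partitioned (G)LM tree, but the alternation between global and subgroup-local estimation is the core idea the abstract describes.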

• A classification tree approach for the modeling of competing risks in
discrete time
• Authors: Moritz Berger; Thomas Welchowski; Steffen Schmitz-Valckenberg; Matthias Schmid
Abstract: Cause-specific hazard models are a popular tool for the analysis of competing risks data. The classical modeling approach in discrete time consists of fitting parametric multinomial logit models. A drawback of this method is that the focus is on main effects only, and that higher order interactions are hard to handle. Moreover, the resulting models contain a large number of parameters, which may cause numerical problems when estimating coefficients. To overcome these problems, a tree-based model is proposed that extends the survival tree methodology developed previously for time-to-event models with a single event type. The performance of the method, compared with several competitors, is investigated in simulations. The usefulness of the proposed approach is demonstrated by an analysis of age-related macular degeneration among elderly people who were monitored by annual study visits.
PubDate: 2018-09-28
DOI: 10.1007/s11634-018-0345-y
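The classical discrete-time approach mentioned in the abstract rests on a standard preprocessing step: expanding each subject's time-to-event record into person-period format, after which a multinomial logit (or, in the proposed method, a tree) can be fit to the per-period outcome. A minimal sketch of that expansion, with covariates omitted for brevity:

```python
def person_period(records):
    """Expand (time, cause) records into discrete-time person-period
    format: one row per subject and period at risk. The outcome is 0
    (no event) in every period except the last, where it is the cause
    index 1..K; cause 0 in the final period encodes censoring."""
    rows = []
    for subject, (time, cause) in enumerate(records):
        for t in range(1, time + 1):
            outcome = cause if t == time else 0
            rows.append((subject, t, outcome))
    return rows
```

Fitting a multinomial classifier of `outcome` on period and covariate columns of this expanded data yields the cause-specific discrete hazards.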

• Variable selection in discriminant analysis for mixed continuous-binary
variables and several groups
• Authors: Alban Mbina Mbina; Guy Martial Nkiet; Fulgence Eyi Obiang
Abstract: We propose a method for variable selection in discriminant analysis with mixed continuous and binary variables. This method is based on a criterion that reduces the variable selection problem to one of estimating a suitable permutation and dimensionality. Estimators for these parameters are then proposed, and the resulting variable selection method is shown to be consistent. A simulation study examining several properties of the proposed approach and comparing it with an existing method is presented, along with an example on a real data set.
PubDate: 2018-09-21
DOI: 10.1007/s11634-018-0343-0

• Bayesian nonstationary Gaussian process models via treed process
convolutions
• Abstract: The Gaussian process is a common model in a wide variety of applications, such as environmental modeling, computer experiments, and geology. Two major challenges often arise: First, assuming that the process of interest is stationary over the entire domain often proves to be untenable. Second, the traditional Gaussian process model formulation is computationally inefficient for large datasets. In this paper, we propose a new Gaussian process model to tackle these problems based on the convolution of a smoothing kernel with a partitioned latent process. Nonstationarity can be modeled by allowing a separate latent process for each partition, which approximates a regional clustering structure. Partitioning follows a binary tree generating process similar to that of Classification and Regression Trees. A Bayesian approach is used to estimate the partitioning structure and model parameters simultaneously. Our motivating dataset consists of 11,918 precipitation anomalies. Results show that our model has promising prediction performance and is computationally efficient for large datasets.
PubDate: 2018-09-15
DOI: 10.1007/s11634-018-0341-2
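The process-convolution building block the abstract refers to is simple to sketch: a Gaussian process is constructed as y(x) = Σ_j k(x − u_j) z_j, smoothing latent white noise z at knot locations u_j. The treed extension would assign a separate latent process to each tree partition; the toy partition below (hand-picked at z = 5, an assumption for illustration) mimics that regional structure.

```python
import numpy as np

def process_convolution(x, knots, z, bandwidth):
    """Discrete process convolution: smooth latent values z at knot
    locations with a Gaussian kernel to obtain a process on x."""
    K = np.exp(-0.5 * ((x[:, None] - knots[None, :]) / bandwidth) ** 2)
    return K @ z

rng = np.random.default_rng(1)
knots = np.linspace(0, 10, 30)
x = np.linspace(0, 10, 200)
# nonstationarity: a separate latent white-noise process per region,
# here a fixed two-region partition with very different variances
z = np.where(knots < 5,
             rng.normal(0, 0.2, knots.size),
             rng.normal(0, 2.0, knots.size))
y = process_convolution(x, knots, z, bandwidth=0.5)
```

The resulting realization is smooth everywhere but visibly more variable on the high-variance side, which is the kind of regional behavior a stationary GP cannot capture. Computation scales with the (small) number of knots rather than the number of observations, giving the efficiency advantage the abstract mentions.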

JournalTOCs
School of Mathematical and Computer Sciences
Heriot-Watt University
Edinburgh, EH14 4AS, UK
Email: journaltocs@hw.ac.uk
Tel: +00 44 (0)131 4513762
Fax: +00 44 (0)131 4513327


JournalTOCs © 2009-