Annals of Data Science. Hybrid journal (may contain Open Access articles). ISSN (Print) 2198-5804; ISSN (Online) 2198-5812. Published by Springer-Verlag.
• A Novel Multiview Topic Model to Compute Correlation of Heterogeneous Data
• Authors: Jinsheng Shen; Mingmin Chi
Abstract: With the rapid development of Internet and sensor technologies, it is much easier to acquire data from different sources at different dates and times. However, computing the correlation of such heterogeneous data is a big challenge for data mining and information retrieval. Here, the data features from one source are called a view, and the multiview features denote the same data point. In this paper, the hidden correlation of two-view features is used to construct a Heterogeneous (multiview) Topic Model (HTM). In particular, a probabilistic topic model is utilized for the different views because, as usual, generative models provide much richer features when handling high-dimensional data such as text. Nevertheless, most existing probabilistic topic models, such as latent Dirichlet allocation, require the form of the probability distribution to be known. To avoid this limitation, the HTM is reduced to solving a non-negative matrix tri-factorization problem with certain constraints, so that the proposed approach can be used with an arbitrary model.
PubDate: 2018-02-20
DOI: 10.1007/s40745-017-0135-y
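The tri-factorization step this abstract reduces to can be sketched with standard multiplicative updates. This is a generic, unconstrained non-negative matrix tri-factorization in numpy, not the authors' constrained HTM formulation; all names and parameters are illustrative.

```python
import numpy as np

def nmf_tri_factorize(X, k1, k2, n_iter=500, seed=0, eps=1e-9):
    """Approximate a non-negative matrix X (m x n) as F @ S @ G.T with
    F (m x k1), S (k1 x k2), G (n x k2), via multiplicative updates that
    keep all factors non-negative."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    F = rng.random((m, k1))
    S = rng.random((k1, k2))
    G = rng.random((n, k2))
    for _ in range(n_iter):
        # Each update multiplies by a ratio, so non-negativity is preserved.
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
    return F, S, G
```

In a two-view setting, one would factorize each view's data matrix and couple the factors; the sketch shows only the single-matrix core.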

• Assessing Survival Time of Women with Cervical Cancer Using Various
Parametric Frailty Models: A Case Study at Tikur Anbessa Specialized Hospital
• Authors: Selamawit Endale Gurmu
Abstract: Cervical cancer is one of the leading causes of death in the world and represents a tremendous burden on patients, families and societies. It is estimated that over one million women worldwide currently have cervical cancer; most of them have not been diagnosed or have no access to treatment that could cure them or prolong their lives. The goal of this study is to investigate potential risk factors affecting the survival time of women with cervical cancer at Tikur Anbessa specialized hospital. Data were taken from the medical record cards of patients enrolled during September 2011–September 2015. The Kaplan–Meier estimation method, the Cox proportional hazards model and parametric shared frailty models were used to analyze the survival time of cervical cancer patients. The study subjects came from a clustered community, and hence the clustered survival data are correlated at the regional level. Parametric frailty models were explored under the assumption that women within the same cluster (here, region) share similar risk factors. Exponential, Weibull, log-logistic and log-normal distributions were used, and all models were compared for their performance based on the AIC criterion. The lognormal inverse-Gaussian model has the minimum AIC value among the models compared. The results imply that not having given birth by the end of the study and marriage after age twenty significantly prolonged the survival time of patients, while the age classes 51–60, 61–70 and over 70, smoking cigarettes, stage III and IV disease, a family history of cervical cancer, a history of abortion and living with HIV/AIDS significantly shortened it. The findings suggest that age, smoking, stage, family history, abortion history, living with HIV/AIDS, age at first marriage and age at first birth are major factors in the survival time of patients.
There was heterogeneity between the regions in the survival time of cervical cancer patients, indicating that this clustering variable needs to be accounted for using frailty models. The fit statistics showed that the lognormal inverse-Gaussian frailty model described the survival time of the cervical cancer patients better than the other distributions used in this study.
PubDate: 2018-02-17
DOI: 10.1007/s40745-018-0150-7
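The AIC-based comparison described above can be illustrated on uncensored data. This sketch fits exponential and log-normal models by closed-form maximum likelihood and compares AIC = 2k − 2 log L; it ignores censoring and frailty terms, and the function names are illustrative, not from the paper.

```python
import numpy as np

def aic_exponential(t):
    """AIC of an exponential model fitted by MLE (rate = 1/mean)."""
    t = np.asarray(t, dtype=float)
    lam = 1.0 / t.mean()
    loglik = len(t) * np.log(lam) - lam * t.sum()
    return 2 * 1 - 2 * loglik  # one parameter

def aic_lognormal(t):
    """AIC of a log-normal model fitted by MLE on log survival times."""
    t = np.asarray(t, dtype=float)
    logs = np.log(t)
    mu, sigma = logs.mean(), logs.std()
    loglik = np.sum(-np.log(t * sigma * np.sqrt(2 * np.pi))
                    - (logs - mu) ** 2 / (2 * sigma ** 2))
    return 2 * 2 - 2 * loglik  # two parameters
```

The model with the smaller AIC is preferred, which is how the study selects the lognormal inverse-Gaussian frailty model over its competitors.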

• Build a Tourism-Specific Sentiment Lexicon Via Word2vec
• Authors: Wei Li; Luyao Zhu; Kun Guo; Yong Shi; Yuanchun Zheng
Abstract: Online travel and online travel culture have developed rapidly in China in recent years, yet useful knowledge remains hidden in the large number of tourism reviews. We therefore need effective sentiment analysis methods to mine knowledge that can help tourism websites make decisions and improve their travel products. Some data-driven sentiment lexicons perform poorly on sentiment polarity classification due to a lack of semantic information. Thus, we propose a more effective data-driven sentiment lexicon construction method that combines manually labeled sentiment scores with semantic similarity information introduced by the machine learning method word2vec. Experimental results demonstrate that our method significantly improves the performance of tourism sentiment analysis.
PubDate: 2018-02-16
DOI: 10.1007/s40745-017-0130-3
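One common way to combine labeled seed scores with word2vec similarity, as the abstract describes, is to propagate seed sentiment to unlabeled words weighted by embedding cosine similarity. The tiny 2-D vectors below stand in for trained word2vec embeddings, and the words and weighting scheme are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def score_word(word, vectors, seed_scores):
    """Assign a sentiment score to an unlabeled word as the
    similarity-weighted average of manually labeled seed scores."""
    sims = {s: cosine(vectors[word], vectors[s]) for s in seed_scores}
    total = sum(abs(w) for w in sims.values()) or 1.0
    return sum(sims[s] * seed_scores[s] for s in sims) / total
```

With real embeddings, `vectors` would be a trained word2vec model's lookup table and `seed_scores` the manually labeled tourism seed lexicon.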

• User Data Can Tell Defaulters in P2P Lending
• Authors: Jackson J. Mi; Tianxiao Hu; Luke Deer
Abstract: Online peer-to-peer (P2P) lending is a new type of financial platform that enables individuals to borrow and lend money directly from one another. As P2P lending develops rapidly, a number of rating systems for borrowers' creditworthiness have been published by different P2P lending companies. However, whether these rating systems truly reflect the creditworthiness and loan risk of borrowers is unconfirmed. In this paper, we analyzed the differences between credit levels and the users' distribution of CPLP to evaluate whether the credit levels truly reflect the borrowers' credit. We used soft factors to establish a model that can find borrowers who are likely to default. Further, we proposed some strategies to construct and improve the risk control of P2P lending platforms based on the results of our research.
PubDate: 2018-02-05
DOI: 10.1007/s40745-017-0134-z

• Enhancing Situation Awareness Using Semantic Web Technologies and Complex
Event Processing
• Authors: Havva Alizadeh Noughabi; Mohsen Kahani; Alireza Shakibamanesh
Abstract: Data fusion techniques combine raw data from multiple sources and collect associated data to achieve more specific inferences than could be attained with a single source. Situational awareness is one of the levels of the JDL, a mature information fusion model. The aim of situational awareness is to understand the developing relationships of interest between entities within a specific time and space. The present research shows how semantic web technologies, i.e. ontologies and semantic reasoners, can be used to describe situations and increase awareness of them. As the situation awareness level receives data streams from numerous distributed sources, it is necessary to manage these streams with stream-processing engines such as Esper. In addition, this research uses complex event processing, a technique for achieving situational awareness in real time, whose main aim is to automatically generate actionable abstractions from event streams. The proposed approach combines complex event processing and semantic web technologies to achieve better situational awareness. To show the functionality of the proposed approach in practice, some simple examples are discussed.
PubDate: 2018-02-05
DOI: 10.1007/s40745-018-0148-1
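The core complex-event-processing idea, deriving an abstract event from a pattern over a stream, can be sketched with a sliding time window. This toy Python loop stands in for an Esper EPL window query; the event types, window, and threshold are invented for illustration.

```python
from collections import deque

def detect_burst(events, window=10.0, threshold=5):
    """Emit a derived 'burst' alert when more than `threshold` events of
    the same type arrive within a sliding time window.
    `events` is an iterable of (timestamp, event_type), sorted by time."""
    recent = {}   # event_type -> deque of timestamps in the window
    alerts = []
    for t, etype in events:
        q = recent.setdefault(etype, deque())
        q.append(t)
        while q and t - q[0] > window:   # expire old events
            q.popleft()
        if len(q) > threshold:
            alerts.append((t, etype))    # the actionable abstraction
    return alerts
```

A real engine such as Esper expresses the same pattern declaratively and handles many concurrent windows; this loop only shows the semantics.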

• A New Family of Generalized Distributions Based on Alpha Power
Transformation with Application to Cancer Data
• Authors: M. Nassar; A. Alzaatreh; O. Abo-Kasem; M. Mead; M. Mansoor
Abstract: In this paper, we propose a new method for generating distributions based on the idea of alpha power transformation introduced by Mahdavi and Kundu (Commun Stat Theory Methods 46(13):6543–6557, 2017). The new method can be applied to any distribution by inverting its quantile function as a function of the alpha power transformation. We apply the proposed method to the Weibull distribution, obtaining a three-parameter alpha power Weibull distribution via its quantile function. The new distribution possesses very flexible density and hazard rate shapes, which is very useful in cancer research. The hazard rate function can be increasing, decreasing, bathtub or upside-down-bathtub shaped. We derive some general properties of the proposed distribution, including moments, the moment generating function, quantiles and Shannon entropy. The maximum likelihood method is used to estimate the parameters. We illustrate the applicability of the proposed distribution on complete and censored cancer data sets.
PubDate: 2018-02-03
DOI: 10.1007/s40745-018-0144-5
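The quantile inversion follows directly from the alpha power transformation cdf, F_APT(x) = (α^F(x) − 1)/(α − 1) for α > 0, α ≠ 1: setting this equal to p gives F(x) = log(1 + p(α − 1))/log α, then the base quantile is applied. A minimal sketch with the Weibull base distribution (parameter names are illustrative):

```python
import math

def apt_quantile(p, alpha, base_quantile):
    """Quantile of the alpha-power-transformed distribution, obtained by
    inverting F_APT(x) = (alpha**F(x) - 1) / (alpha - 1) for F(x)."""
    u = math.log1p(p * (alpha - 1.0)) / math.log(alpha)
    return base_quantile(u)

def weibull_quantile(u, shape, scale):
    """Quantile of the two-parameter Weibull distribution."""
    return scale * (-math.log1p(-u)) ** (1.0 / shape)
```

Composing the two gives a sampler/quantile for the alpha power Weibull family: draw p uniform on (0, 1) and evaluate `apt_quantile(p, alpha, lambda u: weibull_quantile(u, k, lam))`.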

• $$\ell_1$$-Norm Based Central Point Analysis for Asymmetric Radial Data
• Authors: Qi An; Shu-Cherng Fang; Tiantian Nie; Shan Jiang
Abstract: Multivariate asymmetric radial data clouds with irregularly positioned “spokes” and “clutters” are commonly seen in real life applications. In identifying the spoke directions of such data, a key initial step is to locate a central point from which each spoke extends and diverges. In this technical note, we propose a novel method that features a preselection procedure to screen out candidate points that have sufficiently many data points in the vicinity and identifies the central point by solving an $$\ell _1$$ -norm constrained discrete optimization program. Extensive numerical experiments show that the proposed method is capable of providing central points with superior accuracy and robustness compared with other known methods and is computationally efficient for implementation.
PubDate: 2018-01-29
DOI: 10.1007/s40745-018-0147-2
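The two-stage idea, preselect candidates with enough nearby points, then pick the one minimizing an ℓ1 criterion, can be sketched as below. This is a simplified surrogate of the paper's constrained discrete program, not its exact formulation; the radius and neighbor threshold are illustrative.

```python
import numpy as np

def central_point(X, radius=1.0, min_neighbors=3):
    """Among data points with at least `min_neighbors` other points in
    an l1-ball of the given radius, return the one minimizing the total
    l1 distance to all points."""
    D = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)  # pairwise l1
    dense = (D <= radius).sum(axis=1) - 1 >= min_neighbors  # preselection
    cand = np.where(dense)[0]
    if cand.size == 0:          # fall back to all points
        cand = np.arange(len(X))
    return X[cand[np.argmin(D[cand].sum(axis=1))]]
```

The preselection step is what keeps isolated "clutter" points and spoke tips from being chosen as the center.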

• Collective Anomaly Detection Techniques for Network Traffic Analysis
• Authors: Mohiuddin Ahmed
Abstract: In certain cyber-attack scenarios, such as flooding denial-of-service attacks, the data distribution changes significantly. This forms a collective anomaly, in which similar kinds of normal data instances appear in abnormally large numbers. Since these are not rare anomalies, existing anomaly detection techniques cannot properly identify them. This paper investigates detecting this behaviour with existing clustering and co-clustering based techniques and uses network traffic modelling via the Hurst parameter to propose a more effective algorithm combining clustering and the Hurst parameter. Experimental analysis shows that the proposed Hurst-parameter-based technique outperforms existing collective and rare anomaly detection techniques in terms of detection accuracy and false positive rates. The experimental results are based on benchmark datasets such as the KDD Cup 1999 and UNSW-NB15 datasets.
PubDate: 2018-01-24
DOI: 10.1007/s40745-018-0149-0
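The Hurst parameter the abstract relies on is commonly estimated by rescaled-range (R/S) analysis: the slope of log(R/S) versus log(window size). The sketch below is a textbook R/S estimator, not the paper's full detection algorithm; window sizes and the regression are implementation assumptions.

```python
import numpy as np

def hurst_rs(ts, min_chunk=8):
    """Estimate the Hurst parameter of a 1-D series via rescaled-range
    analysis: average R/S over windows of doubling sizes, then fit the
    slope of log(R/S) against log(size)."""
    ts = np.asarray(ts, dtype=float)
    n = len(ts)
    sizes, rs = [], []
    size = min_chunk
    while size <= n // 2:
        vals = []
        for start in range(0, n - size + 1, size):
            w = ts[start:start + size]
            dev = np.cumsum(w - w.mean())     # cumulative deviations
            r = dev.max() - dev.min()         # range
            s = w.std()                       # scale
            if s > 0:
                vals.append(r / s)
        if vals:
            sizes.append(size)
            rs.append(np.mean(vals))
        size *= 2
    slope, _ = np.polyfit(np.log(sizes), np.log(rs), 1)
    return slope
```

H near 0.5 indicates uncorrelated traffic, while H near 1 indicates strong long-range dependence; a flooding attack shifts the estimate, which is the signal the proposed algorithm exploits.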

• Region Based Instance Document (RID) Approach Using Compression Features
• Authors: N. V. Ganapathi Raju; Someswara Rao Chinta
Abstract: Authorship attribution is concerned with identifying the authors of disputed or anonymous documents, which is potentially important in legal and criminal/civil cases, threatening letters and terrorist communications, and also in computer forensics. There are two basic approaches to authorship attribution: instance based (treat each training text individually) and profile based (treat the training texts cumulatively). Both methods have their own advantages and disadvantages. The present paper proposes a new region based document model for authorship identification to address the dimensionality problem of instance based approaches and the scalability problem of profile based approaches. The proposed model concatenates a set of 'n' individual instance documents of an author into a single region based instance document (RID), on which a compression based similarity distance method is used. Compression based methods require no pre-processing and are easy to apply. This paper uses the gzip compression algorithm with two compression based similarity measures, NCD and CDM. The proposed compression model is character based and can automatically capture non-word features such as word stems and punctuation. The main disadvantage of compression models is their high complexity; the proposed RID approach addresses this issue by reducing the repeated words in the document. The approach was evaluated on English editorial columns, achieving approximately 98% accuracy in identifying the author.
PubDate: 2018-01-20
DOI: 10.1007/s40745-018-0145-4
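The two gzip-based similarity measures named in the abstract have standard definitions: NCD(x, y) = (C(xy) − min(C(x), C(y))) / max(C(x), C(y)) and CDM(x, y) = C(xy) / (C(x) + C(y)), where C(·) is compressed length. A minimal sketch (the texts in the test are invented examples):

```python
import gzip

def c(b: bytes) -> int:
    """Compressed length under gzip, the C(.) in NCD and CDM."""
    return len(gzip.compress(b))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance: near 0 for very similar texts."""
    return (c(x + y) - min(c(x), c(y))) / max(c(x), c(y))

def cdm(x: bytes, y: bytes) -> float:
    """Compression-based Dissimilarity Measure: lower means more similar."""
    return c(x + y) / (c(x) + c(y))
```

For authorship attribution, each candidate author's RID is compared against the disputed text and the smallest distance wins; compression naturally captures character-level features such as stems and punctuation.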

• Analysis of Prevalence of Malaria and Anemia Using Bivariate Probit Model
• Authors: Senayit Seyoum
Abstract: Malaria and anemia are public health problems that have an impact on social and economic development. Malaria causes 70,000 deaths each year and accounts for 17% of outpatient visits to health institutions. It is one of the causes of anemia. Therefore, knowing the relation between malaria and anemia could contribute greatly to the development of prevention strategies. This study jointly models the prevalence of malaria and anemia with a bivariate probit model and shows their relationship. The data were obtained from 384 patients visiting the Alaba health center. The results of the bivariate probit model show that sex, age, education level and marital status are significantly associated with malaria, and that sex and education level are significantly associated with anemia. The results of the seemingly unrelated bivariate probit model show that sex, education level, age and marital status significantly determine the prevalence of malaria, and that malaria, sex and education level significantly determine the prevalence of anemia.
PubDate: 2018-01-19
DOI: 10.1007/s40745-018-0138-3

• Development of Optimal ANN Model to Estimate the Thermal Performance of
Roughened Solar Air Heater Using Two Different Learning Algorithms
Abstract: In the present study, an artificial neural network (ANN) model has been developed with two different training algorithms to predict the thermal efficiency of a wire-rib-roughened solar air heater. A total of 50 data sets were taken from experiments with three different types of absorber plate. The experimental data and the calculated values of collector efficiency were used to develop the ANN model. Scaled conjugate gradient (SCG) and Levenberg–Marquardt (LM) learning algorithms were used. On the basis of statistical error analysis, TRAINLM with 6 neurons and TRAINSCG with 7 neurons were found to be the optimal models. The performance of both models was compared against the actual data, and TRAINLM was found to perform better than TRAINSCG. The value of the coefficient of determination $$(\hbox {R}^{2})$$ for LM-6 is 0.99882, which indicates satisfactory performance. The proposed LM-based MLP ANN model seems more reliable for predicting the performance of solar air heaters.
PubDate: 2018-01-18
DOI: 10.1007/s40745-018-0146-3
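The MLP architecture discussed above (one hidden layer of about 6 neurons) can be sketched in numpy. This uses plain gradient descent as a stand-in for the LM and SCG algorithms, which are second-order methods; the network shape and hyperparameters are illustrative only.

```python
import numpy as np

def train_mlp(X, y, hidden=6, lr=0.1, epochs=2000, seed=0):
    """Train a one-hidden-layer tanh MLP regressor by gradient descent
    on mean squared error; returns a prediction function."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.5, (X.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (hidden, 1)); b2 = np.zeros(1)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)          # forward pass
        err = (H @ W2 + b2) - y           # prediction error
        gW2 = H.T @ err / len(X); gb2 = err.mean(axis=0)
        dH = (err @ W2.T) * (1 - H ** 2)  # backprop through tanh
        gW1 = X.T @ dH / len(X); gb1 = dH.mean(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    return lambda Xn: np.tanh(Xn @ W1 + b1) @ W2 + b2
```

In the study, the inputs would be the collector operating variables and the target the measured thermal efficiency; LM and SCG simply replace the update rule shown here.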

• A Family of Generalised Beta Distributions: Properties and Applications
• Authors: Emilio Gómez-Déniz; José María Sarabia
Abstract: A family of continuous distributions with bounded support, which is a generalisation of the standard beta distribution, is introduced. We study some basic properties of the new family and simulation experiments are performed to observe the behaviour of the maximum likelihood estimators. We also derive a multivariate version of the proposed distributions. Three numerical experiments were performed to determine the flexibility of the new family of distributions in comparison with other extensions of the beta distribution that have been proposed. In this respect, the new family was found to be superior.
PubDate: 2018-01-15
DOI: 10.1007/s40745-018-0143-6

• Classifying Categories of SCADA Attacks in a Big Data Framework
• Authors: Krishna Madhuri Paramkusem; Ramazan S. Aygun
PubDate: 2018-01-15
DOI: 10.1007/s40745-018-0141-8

• On Some Further Properties and Application of Weibull- R Family of
Distributions
• Authors: Indranil Ghosh; Saralees Nadarajah
Abstract: In this paper, we provide some new results for the Weibull-R family of distributions (Alzaghal et al. in Int J Stat Probab 5:139–149, 2016). We derive some new structural properties of the Weibull-R family of distributions. We provide various characterizations of the family via conditional moments, some functions of order statistics and via record values.
PubDate: 2018-01-13
DOI: 10.1007/s40745-018-0142-7

• Ranking of Classification Algorithms in Terms of Mean–Standard
Deviation Using A-TOPSIS
• Authors: André G. C. Pacheco; Renato A. Krohling
Abstract: In classification problems, when multiple algorithms are applied to different benchmarks a difficult issue arises: how can we rank the algorithms? In machine learning, it is common to run the algorithms several times and then compute statistics in terms of means and standard deviations. To compare the performance of the algorithms, it is very common to employ statistical tests. However, these tests may also present limitations, since they consider only the means and not the standard deviations of the obtained results. In this paper, we present the so-called A-TOPSIS, based on the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS), to solve the problem of ranking and comparing classification algorithms in terms of means and standard deviations. We use two case studies to illustrate A-TOPSIS for ranking classification algorithms, and the results show its suitability for this task. The presented approach can be applied to compare the performance of stochastic algorithms in machine learning. Lastly, to encourage researchers to use A-TOPSIS for ranking algorithms, we also present an easy-to-use A-TOPSIS web framework.
PubDate: 2018-01-13
DOI: 10.1007/s40745-018-0136-5
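The TOPSIS core that A-TOPSIS builds on ranks alternatives by closeness to an ideal solution. In the sketch below, rows are algorithms and the two columns are mean accuracy (to maximize) and standard deviation (to minimize); the weights and example numbers are illustrative, not taken from the paper.

```python
import numpy as np

def topsis(matrix, weights, benefit):
    """Rank alternatives (rows) over criteria (columns). benefit[j] is
    True when column j should be maximized. Returns closeness scores in
    [0, 1]; higher is better."""
    M = np.asarray(matrix, dtype=float)
    norm = M / np.linalg.norm(M, axis=0)        # vector-normalize columns
    V = norm * np.asarray(weights, dtype=float)
    ideal = np.where(benefit, V.max(axis=0), V.min(axis=0))
    worst = np.where(benefit, V.min(axis=0), V.max(axis=0))
    d_pos = np.linalg.norm(V - ideal, axis=1)   # distance to ideal
    d_neg = np.linalg.norm(V - worst, axis=1)   # distance to anti-ideal
    return d_neg / (d_pos + d_neg)
```

A-TOPSIS applies this machinery to a decision matrix of per-algorithm means and standard deviations so that variability, not just the mean, influences the final ranking.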

• A New Approach for Improving Classification Accuracy in Predictive
Discriminant Analysis
• Authors: A. Iduseri; J. E. Osemwenkhae
Abstract: The focus of predictive discriminant analysis is to improve classification accuracy, and obtaining a statistically optimal classification accuracy, or hit rate, is still a challenge due to the inherent variability of most real-life datasets. Improving classification accuracy is usually achieved with the best subset of relevant predictors, obtained using classical variable selection methods. The goal of variable selection methods is to choose the best subset (or training sample) of relevant variables, which typically reduces the complexity of a model and makes it easier to interpret, improves the classification accuracy of the model and reduces the training time. However, a statistically optimal hit rate can be achieved if the training sample meets a near-optimal condition by resolving any significant differences in the variances of the groups formed by the dependent variable. This paper proposes a new approach for obtaining a near-optimal training sample that produces a statistically optimal hit rate, using a modified winsorization with a graphical diagnostic. In application to real-life data sets, the proposed approach was able to identify and remove legitimate contaminants in one or more predictors in the training sample, thereby resolving significant differences in the variances of the groups formed by the dependent variable. The graphical diagnostic associated with the new approach also provides a useful visual tool, which serves as an alternative graphical test for homogeneity of variances.
PubDate: 2018-01-12
DOI: 10.1007/s40745-018-0140-9
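Classic winsorization, which the paper's modified procedure builds on, clips extreme predictor values to chosen quantiles instead of discarding them. This sketch shows only the textbook version, not the authors' modification or their graphical diagnostic; the quantile cutoffs are illustrative.

```python
import numpy as np

def winsorize(x, lower=0.05, upper=0.95):
    """Clip values below the lower quantile and above the upper quantile
    to those quantile values, taming extremes while keeping every
    observation in the sample."""
    lo, hi = np.quantile(x, [lower, upper])
    return np.clip(x, lo, hi)
```

Applied per predictor within each group of the dependent variable, this tames contaminant-driven variance differences, the condition the paper argues must be resolved to reach a statistically optimal hit rate.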

• Operational Loss Data Collection: A Literature Review
• Authors: Lu Wei; Jianping Li; Xiaoqian Zhu
Abstract: This paper is the first to provide a comprehensive overview of worldwide operational loss data collection exercises (LDCEs) for internal loss, external loss, scenario analysis and business environment and internal control factors (BEICFs). Based on an analysis of operational risk-related articles from 2002 to March 2017 and a survey of a large amount of other information, the various sources of operational risk data are classified into five types: individual banks, regulatory authorities, consortia of financial institutions, commercial vendors and researchers. By reviewing operational risk databases from these five sources, we summarize and describe 32 internal databases, 26 external databases, 7 scenario databases and 1 BEICFs database. We also find that, compared with developing countries, developed countries have performed relatively better in operational risk LDCEs. In addition, the two subjective data elements, scenario analysis and BEICFs, are less used in operational risk estimation than the two objective elements, internal and external loss data.
PubDate: 2018-01-12
DOI: 10.1007/s40745-018-0139-2

• Cardiopulmonary Function Monitoring Based on MEWMA Control Chart
• Authors: Hongxia Zhang; Liu Liu; Jin Yue; Xin Lai
Abstract: Given that the parameters of cardiopulmonary function are diverse and change slowly in pathology, we apply the multivariate exponentially weighted moving average (MEWMA) control chart to monitor the state of the lungs. This paper considers five indicators of cardiopulmonary function, uses a principal component test to check whether the data follow a multivariate normal distribution, establishes the relationship between the control limit and the weight coefficient of the MEWMA chart, and draws the control chart for monitoring. The process stays in control for the first 103 observations, but the statistic exceeds the control limit at the 104th observation and raises an alarm, indicating a cardiopulmonary problem starting around the 103rd sample. The control chart has a good early-warning function because it can raise an alarm before cardiopulmonary function deteriorates seriously. Using the MEWMA control chart for monitoring can reduce the cost and frequency of medical examinations, improve hospital resource utilization, and help avoid missing the best treatment window.
PubDate: 2018-01-11
DOI: 10.1007/s40745-018-0137-4
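The MEWMA statistic used above follows the standard Lowry et al. form: Z_i = λ(x_i − μ) + (1 − λ)Z_{i−1} and T²_i = Z_iᵀ Σ_Z⁻¹ Z_i with Σ_Z = λ/(2 − λ)[1 − (1 − λ)^{2i}]Σ. The sketch below computes these statistics; the smoothing constant and the simulated alarm scenario are illustrative, not the paper's clinical data.

```python
import numpy as np

def mewma_stats(X, mean, cov, lam=0.2):
    """MEWMA T^2 statistics for observations X (n x p) against an
    in-control mean and covariance. An alarm fires when a statistic
    exceeds a chosen control limit h."""
    Z = np.zeros_like(mean, dtype=float)
    stats = []
    for i, x in enumerate(X, start=1):
        Z = lam * (x - mean) + (1 - lam) * Z            # EWMA smoothing
        cov_z = lam / (2 - lam) * (1 - (1 - lam) ** (2 * i)) * cov
        stats.append(float(Z @ np.linalg.inv(cov_z) @ Z))
    return np.array(stats)
```

A persistent shift in the monitored indicators accumulates in Z and pushes T² past the control limit, which is exactly the alarm behaviour the study reports at the 104th observation.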

• How China Deals with Big Data
• Authors: Yong Shi; Zhiguang Shan; Jianping Li; Yufei Fang
Pages: 433 - 440
Abstract: On September 5, 2015, the State Council of the Chinese Government, China's cabinet, formally announced its Action Framework for Promoting Big Data (www.gov.cn, 2015). This is a milestone in China's effort to catch up with the global wave of big data. Since 2012, big data has been a hot issue for scientific communities as well as the governments of many countries (Lazer et al. in Science 343:1203–1205, 2014; Einav et al. in Science 345:715, 2014; Cate in Science 346:818, 2014; Khoury and Ioannidis in Science 346:1054–1055, 2014). At the 2013 G8 Summit, the leaders of Canada, France, Germany, Italy, Japan, Russia, the U.S.A. and the United Kingdom agreed on an "open government plan" (www.gov.uk/government/publications/open-data-charter/g8-open-data-charter-and-technical-annex, 2013). China's framework, however, mainly emphasizes the integration of all trans-departmental data and establishes a number of government-driven national big data platforms so as to provide big data services to research, the public and enterprises. The framework not only demonstrates a strong commitment of the Chinese government to big data, but also covers a range of governmental branches, enterprises and institutions far wider than that of other countries. In addition, the framework shows an interpretation of big data that differs from that of other countries. If its objective is achieved, China will become a strong "big data country".
PubDate: 2017-12-01
DOI: 10.1007/s40745-017-0129-9
Issue No: Vol. 4, No. 4 (2017)

• Hardware Implementation of Bone Fracture Detector Using Fuzzy Method Along
with Local Normalization Technique
• Authors: Abdullah-Al Nahid; Tariq M. Khan; Yinan Kong
Abstract: Bone fracture detection via digital image segmentation is a well-known image processing application frequently used on biomedical images. Hardware realization of image processing algorithms, especially on Field Programmable Gate Arrays (FPGAs), has gained great interest among researchers. FPGAs have significant features, such as spatial and temporal parallelism, that are well suited to real-time implementation of image processing. To benefit from these characteristics, a new method for bone fracture detection is proposed and its performance is validated through a real-time implementation. Simulation results show that the proposed method gives superior performance compared with the existing method.
PubDate: 2017-07-21
DOI: 10.1007/s40745-017-0118-z

JournalTOCs
School of Mathematical and Computer Sciences
Heriot-Watt University
Edinburgh, EH14 4AS, UK
Email: journaltocs@hw.ac.uk
