Authors:Nádia P. Kozievitch; Thiago H. Silva; Artur Ziviani; Giovani Costa; Gustavo Lugo Pages: 307 - 327 Abstract: Recent concepts such as Smart Cities, Urban Computing, and Geographic Information Systems are being discussed in various international forums, using themes such as sustainability and efficient use of city infrastructure. One important aspect in this regard is to correctly associate computational techniques with statistical models and to integrate heterogeneous data sources using open data shared by cities. Based on that, this study uses open data from the city of Curitiba (Brazil) to present results on the spatiotemporal evolution of business activities over a period of more than thirty years. To that end, the study identifies and discusses important challenges in data quality, data categorization, and data integration that had to be tackled in order to perform this type of study in practice. By looking at the dynamics of geographically grounded microeconomic variables, this study shows how the expansion and diversification of business types happened in different neighborhoods, contributing to a better understanding of how business activity evolves in a city. PubDate: 2017-09-01 DOI: 10.1007/s40745-017-0104-5 Issue No:Vol. 4, No. 3 (2017)

Authors:Suresh Dara; Haider Banka; Chandra Sekhara Rao Annavarapu Pages: 341 - 360 Abstract: Feature selection in high-dimensional data, particularly gene expression data, is one of the challenging tasks in bioinformatics because of the curse of dimensionality, data redundancy and noisy values. In gene expression data, insignificant features cause poor classification; feature selection reduces the feature subset and thereby improves classification accuracy. Existing feature selection algorithms for gene expression data (such as filter-based, wrapper-based and hybrid methods) either yield poor accuracy or take too long to converge to an acceptable result. For example, NSGA-II needs, on average, over 10,000 generations to converge in the search space, which increases the computational time. We propose a rough-set-based hybrid binary PSO algorithm. It uses a fast heuristic processing strategy to reduce the crude domain features by statistical elimination of redundant features; the remaining features are then discretized into a binary table, known in rough set theory as the distinction table. This distinction table is later used as input to evaluate and optimize the objective functions, i.e., to generate a reduct in rough set theory. The proposed hybrid binary PSO is then used to tune the objective functions and to choose the most important features (i.e., the reduct). The fitness function is designed so that it reduces the cardinality of the feature subset and, at the same time, improves the classification performance. Results on three existing benchmark datasets from the literature (colon cancer, lymphoma and leukemia) demonstrate the effectiveness of the proposed method. PubDate: 2017-09-01 DOI: 10.1007/s40745-017-0106-3 Issue No:Vol. 4, No. 3 (2017)

Authors:Million Wesenu; Sudhir Kulkarni; Tafere Tilahun Pages: 361 - 381 Abstract: Preterm birth is the term used to define births that occur before 37 completed weeks or 259 days of gestation. The aim of this study is to model the survival probability of premature infants who were under follow-up and to identify significant risk factors for mortality. Recorded hospital data were obtained for a cohort of 490 infants at Jimma University Specialized Hospital, Ethiopia. The infants were under follow-up from January 2013 to December 2015. Non-parametric, semi-parametric and parametric survival models are used to estimate the survival time and to examine the association between survival time and different demographic, health and risk behavior variables. The analysis shows that most factors significantly contribute to a shorter survival time of premature infants. These factors include prenatal asphyxia, hyaline membrane disease, sepsis, jaundice, low gestational age, respiratory distress syndrome and initial temperature. It is therefore recommended that people be made aware of the burden of these risk factors and be well informed about prematurity. PubDate: 2017-09-01 DOI: 10.1007/s40745-017-0107-2 Issue No:Vol. 4, No. 3 (2017)
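
The abstract above does not say which non-parametric model was used. As an illustration only, the sketch below implements the Kaplan–Meier product-limit estimator, a standard non-parametric survival estimator, on a small made-up sample. The function name `kaplan_meier` and the toy follow-up times are assumptions for this example and are not taken from the study.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate of the survival function.

    times  : follow-up durations (any consistent unit, e.g. days)
    events : 1 if death was observed at that time, 0 if the infant was censored
    Returns a list of (time, survival probability) at each distinct death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)          # subjects still under observation
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects that share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored subjects both leave the risk set.
        at_risk -= removed
    return curve


# Hypothetical toy data: five subjects, two of them censored.
toy_times = [2, 3, 3, 5, 8]
toy_events = [1, 1, 0, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
for t, s in toy_curve:
    print(f"t={t}: S(t)={s:.3f}")
```

Censored subjects contribute to the risk set up to their last follow-up time but never trigger a drop in the curve, which is what distinguishes this estimator from a plain empirical survival fraction.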

Authors:Ramesh Naidu Balaka; Prasad Babu Maddali Surendra Pages: 383 - 404 Abstract: Biometric authentication plays a pivotal role in providing security in any industry. In previous work, authentication systems were developed using a password, PIN or signature as a single source of identification (i.e., a unimodal biometric system). But these identifiers can be noisy, lost, stolen or subjected to spoofing attacks. This paper proposes a multimodal biometric authentication system which uses more than one biometric trait for recognition and is more effective than previous work. The proposed system is robust against attacks because authentication is performed using multiple biometric traits. The present system handles two traits, face and finger, for recognition. The traits are preprocessed, denoised and compressed, and features are then extracted using the Histogram of Oriented Gradients (HOG) technique. Probability density function (PDF) values are obtained from the HOG features using a Gaussian mixture model, and the PDF values are fused using score-level fusion. Finally, correlation is used to compare the traits in the training and testing datasets. Identification of biometric traits is based on the multimodal biometric system, and the results show better recognition performance than existing methods. Experiments were also carried out on different parametric measures such as RMSE, PSNR and CR. It was observed that DCT performs better than the existing Haar wavelet transform. The proposed work is useful for reducing the size of the database and the bandwidth used, and for trait identification and authentication in banking systems, crime investigation, etc. PubDate: 2017-09-01 DOI: 10.1007/s40745-017-0110-7 Issue No:Vol. 4, No. 3 (2017)

Authors:Daya K. Nagar; Saralees Nadarajah; Idika E. Okorie Pages: 405 - 420 Abstract: The most flexible bivariate distribution to date is proposed with one variable restricted to [0, 1] and the other taking any non-negative value. Various mathematical properties and maximum likelihood estimation are addressed. The mathematical properties derived include shape of the distribution, covariance, correlation coefficient, joint moment generating function, Rényi entropy and Shannon entropy. For interval estimation, explicit expressions are derived for the information matrix. Illustrations using two real data sets show that the proposed distribution performs better than all other known distributions of its kind. PubDate: 2017-09-01 DOI: 10.1007/s40745-017-0111-6 Issue No:Vol. 4, No. 3 (2017)

Authors:K. Ramadan; M. I. Dessouky; S. Elagooz; M. Elkordy; F. E. Abd El-Samie Abstract: Due to noise enhancement, conventional Zero Forcing (ZF) equalizers are not suitable for wireless Underwater Acoustic (UWA) Orthogonal Frequency Division Multiplexing (OFDM) communication systems. Furthermore, these systems suffer from increasing complexity due to the large number of subcarriers, especially in Multiple-Input Multiple-Output (MIMO) systems. On the other hand, the Minimum Mean Square Error equalizer suffers from high complexity, and it needs an estimate of the operating Signal-to-Noise Ratio to work properly. In this paper, we propose a Joint Low-Complexity Regularized ZF equalizer for MIMO UWA-OFDM systems to cope with these problems. The main objective of the proposed equalizer is to enhance the system performance at a lower complexity by performing equalization in two steps. The co-channel interference is mitigated in the first step. A regularization term is added in the second step to avoid noise enhancement. Simulation results show that the proposed equalization scheme enhances the UWA system performance with low complexity. PubDate: 2017-09-06 DOI: 10.1007/s40745-017-0127-y

Authors:M. Elgarhy; Muhammad Ahsan ul Haq; Qurat ul Ain Abstract: In this article, we introduce and study the exponentiated generalized Kumaraswamy distribution. We derive mathematical properties including the quantile function, moment generating function, ordinary moments, probability weighted moments, incomplete moments, and Rényi entropy. Expressions for the order statistics are also derived. We discuss parameter estimation by the method of maximum likelihood. Using applications to real datasets, we show that the introduced distribution compares favorably with some existing well-known distributions. PubDate: 2017-08-17 DOI: 10.1007/s40745-017-0128-x

Authors:Hossein Hassani; Xu Huang; Mansi Ghodsi Abstract: Causality analysis remains one of the fundamental research questions and the ultimate objective of a tremendous number of scientific studies. In line with the rapid progress of science and technology, the age of big data has significantly influenced causality analysis in various disciplines, especially over the last decade, because the complexity and difficulty of identifying causality in big data have increased dramatically. Data mining, the process of uncovering hidden information from big data, is now an important tool for causality analysis and has been extensively exploited by scholars around the world. The primary aim of this paper is to provide a concise review of causality analysis in big data. To this end, the paper reviews recent significant applications of data mining techniques in causality analysis, covering a substantial quantity of research to date, presented in chronological order with an overview table of data mining applications in the causality analysis domain as a reference directory. PubDate: 2017-08-01 DOI: 10.1007/s40745-017-0122-3

Authors:S. Viswanadha Raju; K. K. V. V. S. Reddy; Chinta Someswara Rao Abstract: String matching is a technique for searching for a pattern in a text. It is the basic concept used to extract useful information from a large volume of text, and it is used in applications such as text processing, information retrieval, text mining, pattern recognition, DNA sequencing and data cleaning. Although some simple mechanisms are reported to perform very well in practice, plenty of research has been published on the subject, research in this area is still active, and there are ample opportunities to develop new techniques. For this purpose, this paper proposes linear-array-based string matching, string matching with a butterfly model, and string matching with divide-and-conquer models for sequential and parallel environments. To assess the efficiency of the proposed models, genome sequences of different sizes (10–100 Mb) are taken as the input data set. The experimental results show that the proposed string matching algorithms perform very well compared with the brute-force, KMP and Boyer–Moore string matching algorithms. PubDate: 2017-07-29 DOI: 10.1007/s40745-017-0124-1
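
For readers unfamiliar with the baselines named in the abstract above, the sketch below shows two of them, brute-force search and Knuth-Morris-Pratt (KMP), on a short made-up nucleotide string. It does not reproduce the paper's proposed linear-array, butterfly or divide-and-conquer models; the function names and the sample sequence are assumptions for illustration.

```python
def brute_force_search(text, pattern):
    """Return every start index where pattern occurs in text (naive scan)."""
    n, m = len(text), len(pattern)
    if m == 0:
        return []
    return [i for i in range(n - m + 1) if text[i:i + m] == pattern]


def build_failure_table(pattern):
    """failure[i] is the length of the longest proper prefix of
    pattern[:i + 1] that is also a suffix of it."""
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    return failure


def kmp_search(text, pattern):
    """Return every start index where pattern occurs in text using KMP.
    The text is scanned once; on a mismatch the failure table tells how far
    the pattern can be shifted without re-reading text characters."""
    if not pattern:
        return []
    failure = build_failure_table(pattern)
    matches = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = failure[k - 1]  # keep going to find overlapping matches
    return matches


# Hypothetical toy sequence, far smaller than the 10-100 Mb inputs in the paper.
sequence = "ACGTACGTGACG"
print(brute_force_search(sequence, "ACG"))
print(kmp_search(sequence, "ACG"))
```

Brute force compares up to m characters at each of the n text positions, while KMP runs in O(n + m) time after building the failure table, which is why it is a common reference point for new string matching methods.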

Authors:Firuz Kamalov; Fadi Thabtah Abstract: One of the major aspects of any classification process is selecting the relevant set of features to be used in a classification algorithm. This initial step in data analysis is called the feature selection process. Disposing of the irrelevant features from the dataset will reduce the complexity of the classification task and will increase the robustness of the decision rules when applied on the test set. This paper proposes a new filtering method that combines and normalizes the scores of three major feature selection methods: information gain, chi-squared statistic and inter-correlation. Our method utilizes the strengths of each of the aforementioned methods to maximum advantage while avoiding their drawbacks—especially the disparity of the results produced by these methods. Our filtering method stabilizes each variable score and gives it the true rank among the input data’s available variables. Hence it maximizes the stability in the variables’ scores without losing the overall accuracy of the predictive model. A number of experiments on different datasets from various domains have shown that features chosen by the proposed method are highly predictive when compared with features selected by other existing filtering methods. The evaluation of the filtering phase was conducted via thorough experimentations using a number of predictive classification algorithms in addition to statistical analysis of the filtering methods’ scores. PubDate: 2017-07-29 DOI: 10.1007/s40745-017-0116-1
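
The abstract above does not give the exact formula used to combine the three scores. The sketch below shows one plausible reading under stated assumptions: each score table is min-max normalized to [0, 1] and the normalized scores are averaged per feature. The function names, the averaging rule, the convention that a higher score means a more relevant feature, and the numeric values are all hypothetical.

```python
def min_max_normalize(scores):
    """Rescale a {feature: score} table to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All features score the same, so the table carries no ranking signal.
        return {feature: 0.0 for feature in scores}
    return {feature: (s - lo) / (hi - lo) for feature, s in scores.items()}


def combine_scores(score_tables):
    """Average the normalized scores of each feature across all tables.
    Assumes every table covers the same set of features."""
    normalized = [min_max_normalize(table) for table in score_tables]
    features = normalized[0].keys()
    return {
        feature: sum(table[feature] for table in normalized) / len(normalized)
        for feature in features
    }


def rank_features(score_tables):
    """Return feature names sorted from highest to lowest combined score."""
    combined = combine_scores(score_tables)
    return sorted(combined, key=combined.get, reverse=True)


# Hypothetical scores for three features under three filter criteria.
info_gain = {"f1": 0.40, "f2": 0.10, "f3": 0.25}
chi_squared = {"f1": 12.0, "f2": 2.0, "f3": 9.0}
correlation = {"f1": 0.5, "f2": 0.2, "f3": 0.8}

print(rank_features([info_gain, chi_squared, correlation]))
```

Normalizing before combining matters because the raw criteria live on different scales (chi-squared values here are tens of times larger than information gain), so an unnormalized sum would be dominated by a single method.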

Authors:Tariku Tessema Abstract: Community acquired pneumonia refers to pneumonia acquired outside of hospitals or extended health facilities, and it is a leading infectious disease. This study aims to model the mortality of hospitalized pneumonia patients under five years of age and to investigate potential risk factors associated with child mortality due to pneumonia. The study was a retrospective study of 305 sampled under-five patients hospitalized with community acquired pneumonia. A cross-classified multilevel logistic regression was employed, with residence and hospital classified at the second level. A Bayesian estimation method was applied in which the posterior distribution was simulated via Markov Chain Monte Carlo. The variability attributable to hospital was found to be larger than the variability attributable to residence. The odds of dying from community acquired pneumonia were higher among patients who were diagnosed in the spring season, had complications of malaria, AGE and AFI, were in the neonatal age group, or were diagnosed late (after more than a week). The risk of mortality was also found to be high for lower nurse-to-patient and physician-to-patient ratios. PubDate: 2017-07-28 DOI: 10.1007/s40745-017-0121-4

Authors:Harihara Santosh Dadi; Gopala Krishna Mohan Pillutla; Madhavi Latha Makkena Abstract: Tracking and recognition of humans in public places using surveillance cameras is a topic of research in the area of computer vision. Recognizing a human and then tracking that person completes the video surveillance system. A novel algorithm for face recognition and human tracking is presented in this article. The human is tracked using a Gaussian mixture model (GMM). To track a specific human, the GMM template is divided into four regions, which are placed one above the other and tracked simultaneously. To recognize the human, the histogram of oriented gradients features of the face region are given to a support vector machine (SVM) classifier. Three experiments are conducted in selecting the training faces: every \(10{\mathrm{th}}\) frame, every \(5{\mathrm{th}}\) frame and every \(3{\mathrm{rd}}\) frame of the first 100 frames are considered. The other frames in the video are used for testing with the SVM classifier. Three datasets, namely AITAM1 (simple), AITAM2 (moderate) and AITAM3 (complex), are used in this work. The experimental results show that the performance metrics decrease as the complexity of the dataset increases. The more training faces are used in preparing a classifier, the better the face recognition rate; this was tested on all three datasets. The performance results show that the combination of the tracking algorithm and the face recognition algorithm not only tracks the person but also recognizes the person. This combination of tracking and recognition makes it well suited to video surveillance applications. PubDate: 2017-07-25 DOI: 10.1007/s40745-017-0123-2

Authors:Chandrakant; M. K. Rastogi; Y. M. Tripathi Abstract: In this paper we study various reliability properties of a Weibull inverse exponential distribution. The maximum likelihood and Bayes estimates of unknown parameters and reliability characteristics are obtained. Bayes estimates are obtained with respect to the squared error loss function under proper and improper prior situations. We use the Lindley method and the Metropolis–Hastings algorithm to compute the Bayes estimates. Interval estimation is also considered: asymptotic and highest posterior density intervals of the unknown parameters are constructed. We perform a numerical study to compare the performance of all methods and comment on the results. We also analyze two real data sets for illustration purposes. Finally, a conclusion is presented. PubDate: 2017-07-24 DOI: 10.1007/s40745-017-0125-0

Authors:Sakshi Agarwal; Shikha Mehta Abstract: Shortest distance queries are widely used in large-scale networks. Numerous approaches are present in the literature to approximate the distance between two query nodes. The most popular distance approximation approach is the landmark embedding scheme. In this technique, selection of optimal landmarks is an NP-hard problem. Heuristics available for locating landmarks include random selection, degree, closeness centrality, betweenness and eccentricity. In this paper, we propose a k-medoids clustering based approach to improve distance estimation accuracy over local landmark embedding techniques. In particular, it is observed that global selection of the seed landmarks causes a large relative error, which is further reduced using local landmark embedding. The efficacy of the proposed approach is analyzed with respect to conventional graph embedding techniques on six large-scale networks. The results show that the proposed landmark selection scheme reduces the shortest distance estimation error considerably. The proposed technique reduces the approximation error of the shortest distance by up to 29% with respect to the other graph embedding techniques. PubDate: 2017-07-22 DOI: 10.1007/s40745-017-0119-y
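
To make the landmark embedding idea in the abstract above concrete, the sketch below precomputes breadth-first-search distances from a few landmarks in an unweighted, undirected graph and estimates the distance between two nodes by the triangle inequality, d(u, v) <= d(u, l) + d(l, v). The landmarks here are picked by hand; the k-medoids selection proposed in the paper is not reproduced. All names and the toy graph are assumptions for illustration.

```python
from collections import deque


def bfs_distances(graph, source):
    """Exact hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def build_embedding(graph, landmarks):
    """Precompute one distance table per landmark (the offline step)."""
    return {landmark: bfs_distances(graph, landmark) for landmark in landmarks}


def estimate_distance(embedding, u, v):
    """Upper bound on d(u, v): the best detour through any landmark."""
    best = float("inf")
    for dist in embedding.values():
        if u in dist and v in dist:
            best = min(best, dist[u] + dist[v])
    return best


# Hypothetical toy graph: a path 0 - 1 - 2 - 3 - 4 - 5.
graph = {i: [] for i in range(6)}
for i in range(5):
    graph[i].append(i + 1)
    graph[i + 1].append(i)

embedding = build_embedding(graph, landmarks=[2])
print(estimate_distance(embedding, 0, 5))  # landmark lies on the shortest path
print(estimate_distance(embedding, 0, 1))  # landmark forces a detour
```

The estimate is exact when some landmark lies on a shortest path between the two query nodes and overestimates otherwise, which is why the choice of landmarks drives the approximation error.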

Authors:Abdullah-Al Nahid; Tariq M. Khan; Yinan Kong Abstract: Bone fracture detection through digital image segmentation is a well-known image processing application that is frequently used to process biomedical images. Hardware realization of image processing algorithms, especially on Field Programmable Gate Arrays (FPGAs), has gained great interest among researchers. An FPGA has significant features, such as spatial and temporal parallelism, that suit real-time implementation of image processing. To benefit from these characteristics of an FPGA, a new method for bone fracture detection is proposed and its performance is validated through real-time implementation. Simulation results show that the proposed method gives better performance than the existing method. PubDate: 2017-07-21 DOI: 10.1007/s40745-017-0118-z

Authors:Sanku Dey; Tanmay Kayal; Yogesh Mani Tripathi Abstract: This article addresses different methods of estimating the probability density function and the cumulative distribution function of the Gompertz distribution. The following estimation methods are considered: maximum likelihood estimators, uniformly minimum variance unbiased estimators, least squares estimators, weighted least squares estimators, percentile estimators, maximum product of spacings estimators, Cramér–von-Mises estimators, and Anderson–Darling estimators. Monte Carlo simulations are performed to compare the behavior of the proposed methods of estimation for different sample sizes. Finally, one real data set and one simulated data set are analyzed for illustrative purposes. PubDate: 2017-07-21 DOI: 10.1007/s40745-017-0126-z

Authors:Sanku Dey; Mazen Nassar; Devendra Kumar Abstract: In this paper, a new three-parameter distribution, called \(\alpha \) logarithmic transformed generalized exponential distribution ( \(\alpha LTGE\) ) is proposed. Various properties of the proposed distribution, including explicit expressions for the moments, quantiles, moment generating function, mean deviation about the mean and median, mean residual life, Bonferroni curve, Lorenz curve, Gini index, Rényi entropy, stochastic ordering and order statistics are derived. It appears to be a distribution capable of allowing monotonically increasing, decreasing, bathtub and upside-down bathtub shaped hazard rates depending on its parameters. The maximum likelihood estimators of the unknown parameters cannot be obtained in explicit forms, and they have to be obtained by solving non-linear equations only. The asymptotic confidence intervals for the parameters are also obtained based on asymptotic variance covariance matrix. Finally, two empirical applications of the new model to real data are presented for illustrative purposes. PubDate: 2017-07-21 DOI: 10.1007/s40745-017-0115-2

Authors:Pramendra Singh Pundir; Puneet Kumar Gupta Abstract: This study deals with the reliability analysis of a multi-component load sharing system in which the failure of any component induces a higher failure rate on the surviving components. It is assumed that each component failure time follows the Chen distribution. In the classical setup, the maximum likelihood estimates of the load sharing parameters, system reliability and hazard rate are computed along with their standard errors. Since the maximum likelihood estimates are not available in closed form, asymptotic confidence intervals and two bootstrap confidence intervals for the unknown parameters are also constructed. Further, by assuming both informative and non-informative priors for the unknown parameters, Bayes estimates of the parameters are obtained along with their posterior standard errors and HPD intervals. Thereafter, a simulation study illustrates the theoretical developments. Finally, a real data analysis establishes the applicability of the proposed theory. PubDate: 2017-07-20 DOI: 10.1007/s40745-017-0120-5

Authors:Reza Mokarram; Mehdi Emadi Abstract: Classification is one of the most important issues and has gained much attention in various fields such as health and medicine. In survival models especially, classification is a main objective, and it is also one of the main purposes of data mining. Among the data mining methods used for classification, the decision tree has gained much attention and popularity because of its simplicity and its understandable and accurate results. In this paper, we first generate observations by Monte Carlo simulation from a hazard model with three degrees of complexity at censoring levels from 0 to 70%. Then the classification accuracy of the Cox model and the decision tree model is compared for sample sizes of 1000, 5000 and 10,000 using the area under the ROC curve (AUC) and the ROC test. PubDate: 2017-07-12 DOI: 10.1007/s40745-017-0105-4

Authors:Yuanyuan Zhang; Saralees Nadarajah Abstract: The Pareto type I distribution (also known as the power law distribution and Zipf’s law) appears to be the main distribution used to model heavy tailed phenomena in the big data literature. The Pareto type I distribution being one of the oldest heavy tailed distributions is not very flexible. Here, we show flexibility of four other heavy tailed distributions for modeling four big data sets in social networks. The Pareto type I distribution is shown not to provide the best or even an adequate fit for any of the data sets. PubDate: 2017-06-10 DOI: 10.1007/s40745-017-0113-4
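
As background for the abstract above, the sketch below fits the Pareto type I distribution, with density f(x) = alpha * x_min^alpha / x^(alpha + 1) for x >= x_min, by maximum likelihood. The closed-form estimators are x_min = min(x) and alpha = n / sum(log(x_i / x_min)). The data are simulated by inverse-transform sampling; the function names, the seed and the parameter values are assumptions, and none of the paper's four data sets or alternative distributions is reproduced.

```python
import math
import random


def sample_pareto_type1(n, x_min, alpha, seed=0):
    """Draw n Pareto type I values by inverting the CDF
    F(x) = 1 - (x_min / x) ** alpha."""
    rng = random.Random(seed)
    # rng.random() is in [0, 1), so 1 - u is in (0, 1] and never zero.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]


def fit_pareto_type1(data):
    """Maximum likelihood estimates (x_min, alpha) for Pareto type I."""
    if not data or min(data) <= 0:
        raise ValueError("data must be non-empty and strictly positive")
    x_min = min(data)
    log_sum = sum(math.log(x / x_min) for x in data)
    if log_sum == 0:
        raise ValueError("alpha is not identifiable when all values are equal")
    return x_min, len(data) / log_sum


# Simulated heavy-tailed sample with hypothetical parameters.
sample = sample_pareto_type1(20000, x_min=1.0, alpha=2.5, seed=0)
x_min_hat, alpha_hat = fit_pareto_type1(sample)
print(f"x_min = {x_min_hat:.4f}, alpha = {alpha_hat:.3f}")
```

A good fit on simulated Pareto data says nothing about real networks; the abstract's point is that on real data sets this two-parameter family can be too rigid, so a fit like this would normally be compared against more flexible heavy-tailed alternatives.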