Authors:Ramazan S. Aygun Abstract: In this paper, we evaluate maximum subarrays for approximate string matching and alignment. The global alignment score as well as local sub-alignments are indicators of good alignment. After showing how maximum subarrays can be used for string matching, we provide several ways of using them: long, short, loose, strict, and top-k. While the long version extends the local sub-alignments, the short method avoids extensions that would not increase the alignment score. The loose method tries to achieve a high global score, whereas the strict method converts the output of the loose alignment by minimizing unnecessary gaps. The top-k method finds the top-k sub-alignments. The results are compared with two global and local dynamic programming methods that use gap penalties, in addition to one of the state-of-the-art methods. In our experiments, using maximum subarrays generated good overall as well as local sub-alignments without requiring gap penalties. PubDate: 2017-07-19 DOI: 10.1007/s40745-017-0117-0
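The maximum-subarray primitive at the core of these methods can be computed in linear time with Kadane's algorithm; the sketch below runs it over a per-position match/mismatch score array (the ±1 scoring and the example strings are illustrative assumptions, not the paper's scheme):

```python
def max_subarray(scores):
    """Kadane's algorithm: best sum over any contiguous slice, plus its bounds."""
    best_sum, best_lo, best_hi = scores[0], 0, 1
    cur_sum, cur_lo = 0, 0
    for i, s in enumerate(scores):
        if cur_sum <= 0:            # restart the run when the prefix drags it down
            cur_sum, cur_lo = 0, i
        cur_sum += s
        if cur_sum > best_sum:
            best_sum, best_lo, best_hi = cur_sum, cur_lo, i + 1
    return best_sum, best_lo, best_hi

# Score two equal-length strings position by position: +1 match, -1 mismatch.
a, b = "ACGTACGT", "ACGAACTT"
scores = [1 if x == y else -1 for x, y in zip(a, b)]
print(max_subarray(scores))  # best local sub-alignment score and its span
```

A dynamic-programming alignment would supply the score array in practice; the maximum subarray then marks the strongest local sub-alignment without any gap-penalty tuning.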

Authors:Sanku Dey; Chunfang Zhang; A. Asgharzadeh; M. Ghorbannezhad Abstract: The extended exponential distribution due to Nadarajah and Haghighi (Stat J Theor Appl Stat 45(6):543–558, 2011) is an alternative to the gamma, Weibull and generalized exponential distributions and always provides better fits whenever the data contain zero values. This article addresses different methods of estimating the unknown parameters of the Nadarajah and Haghighi (in short, NH) distribution from both frequentist and Bayesian viewpoints. We briefly describe several frequentist approaches, namely maximum likelihood estimators, moment estimators, percentile estimators, least squares and weighted least squares estimators, and compare them using extensive numerical simulations. Next, we consider Bayes estimation under different types of loss functions (symmetric and asymmetric) using gamma priors for both the shape and scale parameters. In addition, asymptotic confidence intervals and two parametric bootstrap confidence intervals obtained using frequentist approaches are provided for comparison with Bayes credible intervals. Furthermore, the Bayes estimators and their respective posterior risks are computed and compared using a Markov chain Monte Carlo algorithm. Finally, two real data sets are analyzed for illustrative purposes. PubDate: 2017-07-17 DOI: 10.1007/s40745-017-0114-3

Authors:Reza Mokarram; Mehdi Emadi Abstract: Classification is one of the most important issues that have gained much attention in various fields such as health and medicine. Especially in survival models, classification represents a main objective, and it is also one of the main purposes of data mining. Among data mining methods used for classification, the decision tree has gained much attention and popularity due to its simplicity and its understandable and accurate results. In this paper, we first generate observations by Monte-Carlo simulation from a hazard model with three degrees of complexity at censoring levels from 0 to 70%. Then the classification accuracy of the Cox and decision tree models is compared for sample sizes of 1000, 5000 and 10,000 using the area under the ROC curve (AUC) and the ROC test. PubDate: 2017-07-12 DOI: 10.1007/s40745-017-0105-4

Authors:Yuanyuan Zhang; Saralees Nadarajah Abstract: The Pareto type I distribution (also known as the power law distribution and Zipf’s law) appears to be the main distribution used to model heavy tailed phenomena in the big data literature. However, the Pareto type I distribution, being one of the oldest heavy tailed distributions, is not very flexible. Here, we show the flexibility of four other heavy tailed distributions for modeling four big data sets from social networks. The Pareto type I distribution is shown not to provide the best, or even an adequate, fit for any of the data sets. PubDate: 2017-06-10 DOI: 10.1007/s40745-017-0113-4
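For context, the Pareto type I fit that the paper benchmarks against has a closed-form maximum likelihood estimate: the scale is the sample minimum and the shape is n divided by the sum of log-ratios. A minimal sketch on synthetic data (the paper's social-network data sets are not reproduced here):

```python
import math
import random

def fit_pareto_type1(data):
    """MLE for Pareto type I: scale x_m = min(data), shape = n / sum(log(x / x_m))."""
    x_m = min(data)
    alpha = len(data) / sum(math.log(x / x_m) for x in data)
    return x_m, alpha

# Sanity check on synthetic Pareto(alpha=2.5, x_m=1) draws via inverse-CDF sampling.
rng = random.Random(0)
sample = [(1 - rng.random()) ** (-1 / 2.5) for _ in range(100_000)]
x_m, alpha = fit_pareto_type1(sample)
print(x_m, alpha)  # shape estimate should land near 2.5
```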

Authors:Feng Liu; Yong Shi; Ying Liu Abstract: Although artificial intelligence (AI) is currently one of the most interesting areas in scientific research, the potential threats posed by emerging AI systems remain a source of persistent controversy. To address the issue of AI threat, this study proposes a “standard intelligence model” that unifies AI and human characteristics in terms of four aspects of knowledge: input, output, mastery, and creation. Using this model, we address three challenges, namely, expanding the von Neumann architecture; testing and ranking the intelligence quotient (IQ) of naturally and artificially intelligent systems, including humans, Google, Microsoft’s Bing, Baidu, and Siri; and finally, dividing artificially intelligent systems into seven grades, from robots to Google Brain. On this basis, we conclude that Google’s AlphaGo belongs to the third grade. PubDate: 2017-05-16 DOI: 10.1007/s40745-017-0109-0

Authors:James M. Tien Abstract: In several earlier papers, the author defined and detailed the concept of a servgood, which can be thought of as a physical good or product enveloped by a services-oriented layer that makes the good smarter or more adaptable and customizable for a particular use. Adding another layer of physical sensors could then enhance its smartness and intelligence, especially if it were to be connected with other servgoods—thus, constituting an Internet of Things (IoT) or servgoods. More importantly, real-time decision making is central to the Internet of Things; it is about decision informatics and embraces the advanced technologies of sensing (i.e., Big Data), processing (i.e., real-time analytics), reacting (i.e., real-time decision-making), and learning (i.e., deep learning). Indeed, real-time decision making (RTDM) is becoming an integral aspect of IoT and artificial intelligence (AI), including its improving abilities at voice and video recognition, speech and predictive synthesis, and language and social-media understanding. These three key and mutually supportive technologies—IoT, RTDM, and AI—are considered herein, including their progress to date. PubDate: 2017-05-16 DOI: 10.1007/s40745-017-0112-5

Authors:Ramesh Naidu Balaka; Prasad Babu Maddali Surendra Abstract: Biometric authentication plays a pivotal role in providing security in any industry. In previous works, biometric authentication systems were developed using a password, PIN number or signature as a single source of identification (i.e., unimodal biometric systems). But such credentials can be noisy, lost, stolen or subjected to spoofing attacks. This paper proposes a multimodal biometric authentication system that uses more than one biometric trait for recognition and is more effective than previous work. The proposed system is robust against attacks because authentication is performed using multiple biometric traits. The present system handles two traits, face and fingerprint, which undergo preprocessing, noise removal and compression; features are then extracted using the Histogram of Oriented Gradients (HOG) technique. Probability density function (PDF) values are obtained from the HOG features using a Gaussian mixture model, and the PDF values are fused using score-level fusion. Finally, correlation compares the training and testing dataset traits. Identification of biometric traits is performed with the multimodal biometric system, and the results show better recognition performance compared to existing methods. Experiments were also conducted on different parametric measures such as RMSE, PSNR and CR; it was observed that the DCT performs better than the existing Haar wavelet transform. The proposed work is useful for reducing database size, bandwidth utilization, trait identification, and authentication in banking, crime investigation, etc. PubDate: 2017-04-18 DOI: 10.1007/s40745-017-0110-7

Authors:Pei-Zhuang Wang; Ho-Chung Lui; Hai-Tao Liu; Si-Cong Guo Abstract: An algorithm named Gravity Sliding is presented in this paper, which emulates gravity-driven sliding motion in a feasible region D constrained by a group of hyperplanes in \(R^{m}\). At each stage point P of the sliding path, we need to calculate the projection of the gravity vector g on the constraint planes: if a constraint plane blocks the way at P, then it may change the direction of the sliding path. The core technique is the combined treatment of multiple blocking planes, which is a basic problem of structural adjustment in practice, while the whole path provides the solution of a linear programming problem. Existing LP algorithms have no intuitive way to emulate gravity sliding; therefore, their paths cannot avoid circling and roving, and they cannot provide a best direction at each step for structural adjustment. The first author presented the algorithm Cone Cutting (Wang in Inf Technol Decis Mak 10(1):65–82, 2011), which provides an intuitive explanation of Simplex pivoting, and then the algorithm Gradient Falling (Wang in Ann Data Sci 1(1):41–71, 2014. doi:10.1007/s40745-014-0005-9), which emulates gradient motion on the feasible region. This paper is an improvement of the gradient falling algorithm: in place of a description focusing on the null subspace of the normal vectors, we focus on the expanding subspace spanned by those vectors, which makes the projection calculation easier and faster. We conjecture that the sliding path realized by the algorithm is the optimal path and that the number of stage points of the path is bounded by a polynomial function of the dimension and the number of constraint planes. PubDate: 2017-04-05 DOI: 10.1007/s40745-017-0108-1
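The projection step the abstract describes, projecting the gravity vector g when constraint planes block the path, can be sketched generically: remove from g its components along the (orthogonalized) normals of the blocking hyperplanes. This is a plain Gram–Schmidt sketch of that single step, not the authors' full Gravity Sliding algorithm:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def scale(u, c):
    return [c * a for a in u]

def slide_direction(g, normals):
    """Project g onto the intersection of blocking hyperplanes by removing
    its components along an orthogonalized basis of the plane normals."""
    basis = []
    for n in normals:
        for b in basis:  # Gram-Schmidt against normals already handled
            n = sub(n, scale(b, dot(n, b) / dot(b, b)))
        if dot(n, n) > 1e-12:  # skip (near-)dependent normals
            basis.append(n)
    d = g
    for b in basis:
        d = sub(d, scale(b, dot(d, b) / dot(b, b)))
    return d

g = [0.0, 0.0, -1.0]                          # "gravity"
print(slide_direction(g, [[0.0, 1.0, 1.0]]))  # slides along the tilted plane
```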

Authors:Daya K. Nagar; Saralees Nadarajah; Idika E. Okorie Abstract: The most flexible bivariate distribution to date is proposed with one variable restricted to [0, 1] and the other taking any non-negative value. Various mathematical properties and maximum likelihood estimation are addressed. The mathematical properties derived include shape of the distribution, covariance, correlation coefficient, joint moment generating function, Rényi entropy and Shannon entropy. For interval estimation, explicit expressions are derived for the information matrix. Illustrations using two real data sets show that the proposed distribution performs better than all other known distributions of its kind. PubDate: 2017-04-05 DOI: 10.1007/s40745-017-0111-6

Authors:Nádia P. Kozievitch; Thiago H. Silva; Artur Ziviani; Giovani Costa; Gustavo Lugo Abstract: Recent concepts such as Smart Cities, Urban Computing, and Geographic Information Systems are being discussed in various international forums, around themes such as sustainability and the efficient use of city infrastructure. One important aspect in this regard is to correctly associate computational techniques with statistical models and to integrate heterogeneous data sources using open data shared by cities. Based on that, this study uses open data from the city of Curitiba (Brazil) to present results on the spatiotemporal evolution of business activities over a period of more than thirty years. To that end, the study identifies and discusses important challenges regarding data quality, data categorization, and data integration that had to be tackled in order to perform this type of study in practice. By looking at the dynamics of geographically grounded microeconomic variables, this study shows how the expansion and diversification of business types in different neighborhoods happened, contributing to a better understanding of the evolution of business activity in a city. PubDate: 2017-03-23 DOI: 10.1007/s40745-017-0104-5

Authors:Million Wesenu; Sudhir Kulkarni; Tafere Tilahun Abstract: Preterm birth is the term used for births that occur before 37 completed weeks or 259 days of gestation. The aim of this study is to model the survival probability of premature infants under follow-up and to identify significant risk factors for mortality. Recorded hospital data were obtained for a cohort of 490 infants at Jimma University Specialized Hospital, Ethiopia. The infants were under follow-up from January 2013 to December 2015. Non-parametric, semi-parametric and parametric survival models are used to estimate the survival time and to examine the association between the survival time and different demographic, health and risk behavior variables. The analysis shows that most factors significantly contribute to a shorter survival time of premature infants. These factors include prenatal asphyxia, hyaline membrane disease, sepsis, jaundice, low gestational age, respiratory distress syndrome and initial temperature. It is therefore recommended that people be cognizant of the burden of these risk factors and well informed about prematurity. PubDate: 2017-03-20 DOI: 10.1007/s40745-017-0107-2
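Of the model classes listed, the non-parametric one is conventionally the Kaplan–Meier estimator; a minimal sketch on invented follow-up data (not the Jimma cohort):

```python
from collections import Counter

def kaplan_meier(times, events):
    """Kaplan-Meier curve: at each event time t, S(t) *= (1 - deaths_t / at_risk_t)."""
    at_risk = len(times)
    deaths = Counter(t for t, e in zip(times, events) if e)  # events per time
    exits = Counter(times)                                   # all exits per time
    s, curve = 1.0, []
    for t in sorted(set(times)):
        if t in deaths:
            s *= 1 - deaths[t] / at_risk
            curve.append((t, s))
        at_risk -= exits[t]                                  # censored also leave
    return curve

# Toy follow-up data: time in days; event=True is death, False is censoring.
times = [2, 3, 3, 5, 8, 8, 9, 13]
events = [True, True, False, True, True, False, False, True]
curve = kaplan_meier(times, events)
print(curve)
```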

Authors:Mehrzad Ghorbani; Seyed Fazel Bagheri; Mojtaba Alizadeh Abstract: In this paper, a new family of distributions, called the additive modified Weibull odd log-logistic-G Poisson distribution, is proposed and studied. Some mathematical properties are presented and special models are discussed. We derive a power series for the quantile function and explicit expressions for the moments, quantile and generating functions and order statistics. We also consider several estimators of the PDF and the survival function of the new family, namely the maximum likelihood estimator, percentile estimator, least squares estimator and weighted least squares estimator. Simulation studies and a real data application are also presented to assess the performance of the new family and to compare these estimators. PubDate: 2017-03-13 DOI: 10.1007/s40745-017-0102-7

Authors:Suresh Dara; Haider Banka; Chandra Sekhara Rao Annavarapu Abstract: Feature selection in high dimensional data, particularly gene expression data, is one of the challenging tasks in bioinformatics due to the curse of dimensionality, data redundancy and noise. In gene expression data, insignificant features cause poor classification, so feature selection reduces the feature subset and improves classification accuracy. Existing feature selection algorithms for gene expression data (filter based, wrapper based and hybrid methods) achieve poor accuracy, and some take too long to converge to acceptable results; NSGA-II, for example, requires over 10,000 generations on average to converge in the search space, which incurs increased computational time. We propose a rough set based hybrid binary PSO algorithm that uses a heuristic fast-processing strategy to reduce the crude domain features by statistical elimination of redundant features; the result is subsequently discretized into a binary table, known as a distinction table in rough set theory. This distinction table is then used as input to evaluate and optimize the objective functions, i.e., to generate a reduct in rough set theory. The proposed hybrid binary PSO is used to tune the objective functions and choose the most important features (i.e., the reduct). The fitness function is designed to reduce the cardinality of the feature set while at the same time improving classification performance. Results on three benchmark datasets from the literature (colon cancer, lymphoma and leukemia) demonstrate the effectiveness of the proposed method. PubDate: 2017-03-11 DOI: 10.1007/s40745-017-0106-3
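The binary PSO driving the search flips feature-selection bits by passing a real-valued velocity through a sigmoid; a generic Kennedy–Eberhart-style sketch of one swarm update (the rough-set distinction-table fitness is omitted, and the inertia/acceleration constants are illustrative assumptions):

```python
import math
import random

def binary_pso_step(positions, velocities, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
    """One binary-PSO update: each bit says whether a feature is selected."""
    for i, (x, v) in enumerate(zip(positions, velocities)):
        for j in range(len(x)):
            r1, r2 = random.random(), random.random()
            v[j] = (w * v[j]
                    + c1 * r1 * (pbest[i][j] - x[j])   # pull toward particle's best
                    + c2 * r2 * (gbest[j] - x[j]))     # pull toward swarm's best
            prob = 1 / (1 + math.exp(-v[j]))           # sigmoid -> P(bit = 1)
            x[j] = 1 if random.random() < prob else 0
    return positions, velocities

random.seed(1)
n_particles, n_features = 3, 6
pos = [[random.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
vel = [[0.0] * n_features for _ in range(n_particles)]
pbest = [p[:] for p in pos]   # per-particle best (here: initial positions)
gbest = pos[0][:]             # swarm best (placeholder without a fitness function)
pos, vel = binary_pso_step(pos, vel, pbest, gbest)
print(pos)
```

In the method described above, the fitness evaluated after each such step would score both the reduct size and the classification accuracy.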

Authors:Zhuopei Yang; Yanmei Zhang; Hengyue Jia Abstract: The low success rate of lending is the main obstacle to the development of online P2P lending platforms in China. Based on the theory of social capital, this study analyses the factors influencing the success rate of a P2P lending platform in China, using social network methods and a multiple linear regression model. Soft information, such as bidding records, has been creatively employed to study the corresponding topics. The data used in this study come from the largest online P2P lending platform in China. The results show that, compared with other factors, the bidding record has a more significant effect on the success rate, and users depend more on social capital; bidding records reduce the asymmetry of information and help increase the success rate of lending and decrease the cost of online P2P lending. PubDate: 2017-03-10 DOI: 10.1007/s40745-017-0103-6

Authors:M. Nassar; S. G. Nassr; S. Dey Abstract: In this paper, we investigate the maximum likelihood estimation of the unknown parameters of the Burr Type-XII distribution and the acceleration factor based on two different progressively hybrid censoring schemes, namely, Type-I progressive hybrid censoring scheme (T-I PHCS) proposed by Kundu and Joarder (Comput Stat Data Anal 50:2509–2528, 2006) and adaptive Type-II progressive hybrid censoring scheme (AT-II PHCS) introduced by Ng et al. (Nav Res Logist 56:687–698, 2009) under step-stress partially accelerated life test model. The observed Fisher information matrix is obtained to construct an approximate confidence interval for the unknown parameters. The performances of the estimators of the model parameters using the above mentioned progressively hybrid censoring schemes are evaluated and compared in terms of the mean squared errors and relative errors through a Monte Carlo simulation study. PubDate: 2017-03-04 DOI: 10.1007/s40745-017-0101-8

Authors:F. Maleki; E. Deiri Abstract: In this paper, we consider the estimation of the PDF and the CDF of the Frechet distribution. The following estimators are considered: uniformly minimum variance unbiased estimator, maximum likelihood estimator, percentile estimator, least squares estimator and weighted least squares estimator. To this end, analytical expressions are derived for the bias and the mean squared error. As the results of simulation studies and real data applications indicate, the ML estimator performs better than the others. PubDate: 2017-02-27 DOI: 10.1007/s40745-017-0100-9

Authors:P. RajaRajeswari; S. Viswanadha Raju Abstract: The data contained in the DNA molecule, even for simple unicellular life forms, is huge and requires efficient storage. Efficient storage means the removal of all redundancy from the information being stored. The proposed compression algorithm “GENBIT Compress” is designed specifically to eliminate all redundancy from the DNA sequences of large genomes. We define a compression distance based on an ordinary compressor and show that it is an admissible distance. Only recently have researchers begun to appreciate the fact that compression ratios convey a great deal of essential statistical information. In applying this methodology, we use a new DNA sequence compressor, “GENBIT Compress”. The NCD is universal in that it is not restricted to a particular application domain and works across application-domain boundaries. A theoretical precursor, the normalized information distance, is provably optimal in the sense that it minimizes every computable normalized metric that satisfies a certain density requirement; however, this optimality comes at the price of using the non-computable notion of Kolmogorov complexity. We propose precise notions of similarity metric and normal compressor, and show that the NCD based on a normal compressor is a similarity metric that approximates optimality. The normalized compression distance, an efficiently computable and thus practically applicable form of the normalized information distance, is used to calculate the distance matrix. In this paper, this new distance matrix is proposed for reconstructing the phylogenetic tree. Phylogenies are the fundamental tool for representing the relationships among biological entities. Phylogenetic reconstruction methods attempt to find the evolutionary history of a given set of species. This history is usually described by an edge-weighted tree, where edges correspond to different branches of evolution and the weight of an edge corresponds to the amount of evolutionary change on that particular branch. We built a phylogenetic tree from the BChE DNA sequences of mammals by supplying the proposed distance matrix, computed with the GENBIT compressor, to the Neighbor-Joining (NJ) algorithm. The results of the present research confirm the existence of low compression ratios for natural DNA sequences with highly repetitive DNA bases (A, C, G, T): the more repetitive the bases, the lower the compression ratio. The ultimate goal is, of course, to learn the “genome organization” principles and to explain this organization using our knowledge of evolution. PubDate: 2017-01-12 DOI: 10.1007/s40745-016-0098-4
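The normalized compression distance itself is simple to compute with any real compressor; a minimal sketch using Python's zlib as an illustrative stand-in for GENBIT Compress (the sequences are invented):

```python
import random
import zlib

def csize(data: bytes) -> int:
    """Compressed size under the chosen compressor (zlib here)."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = csize(x), csize(y)
    return (csize(x + y) - min(cx, cy)) / max(cx, cy)

a = b"ACGT" * 200                                    # highly repetitive sequence
b2 = b"ACGT" * 199 + b"ACGA"                         # near-identical variant
r = bytes(random.Random(0).choices(b"ACGT", k=800))  # unrelated random sequence
print(ncd(a, b2), ncd(a, r))  # the similar pair scores lower than the random pair
```

Feeding the pairwise NCD matrix to a Neighbor-Joining implementation then yields the tree, as described above.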

Authors:Sanku Dey; Vikas Kumar Sharma; Mhamed Mesfioui Abstract: The Weibull distribution has been generalized by many authors in recent years. Here, we introduce a new generalization, called the alpha-power transformed Weibull distribution, that provides better fits than the Weibull distribution and some of its known generalizations. The distribution contains the alpha-power transformed exponential and alpha-power transformed Rayleigh distributions as special cases. Various properties of the proposed distribution, including explicit expressions for the quantiles, mode, moments, conditional moments, mean residual lifetime, stochastic ordering, Bonferroni and Lorenz curves, stress–strength reliability and order statistics, are derived. The distribution is capable of modeling monotonically increasing, decreasing, constant, bathtub, upside-down bathtub and increasing–decreasing–increasing hazard rates. The maximum likelihood estimators of the unknown parameters cannot be obtained in explicit form and have to be obtained numerically by solving non-linear equations. Two data sets have been analyzed to show how the proposed models work in practice. Further, a bivariate extension of the proposed model based on the Marshall–Olkin and copula concepts is developed; its properties are not considered in detail in this paper and can be addressed in future research. PubDate: 2017-01-07 DOI: 10.1007/s40745-016-0094-8
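The alpha-power transform rescales a baseline CDF F(x) to (α^F(x) − 1)/(α − 1); applying it to the Weibull baseline gives the proposed family. The parameterization below follows the usual alpha-power-transform convention and is an assumption, since the abstract does not spell it out:

```python
import math

def weibull_cdf(x, k, lam):
    """Baseline Weibull CDF with shape k and scale lam."""
    return 1 - math.exp(-((x / lam) ** k))

def apt_weibull_cdf(x, alpha, k, lam):
    """Alpha-power transformed Weibull CDF:
    (alpha**F(x) - 1) / (alpha - 1) for alpha != 1; the plain Weibull at alpha = 1."""
    f = weibull_cdf(x, k, lam)
    if alpha == 1:
        return f
    return (alpha ** f - 1) / (alpha - 1)

# Still a valid CDF: 0 at the origin, monotone, tending to 1.
print(apt_weibull_cdf(0.0, 2.0, 1.5, 1.0), apt_weibull_cdf(50.0, 2.0, 1.5, 1.0))
```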

Authors:Yongjia Xie; Dengsheng Wu; Yuanping Chen; Wenbin Jiao; Jianping Li Abstract: It is worthy of notice that the number of scientific researchers has experienced rapid growth in China. Meanwhile, strict restrictions on the total number and the position structure of researchers have exerted great pressure on Chinese researchers. Decision makers have noticed this dilemma, and quantitative predictions for decision support are needed. This paper puts forward a data-driven dynamic programming model to estimate the research position demand gap. The model fully considers the real practice of human resource management in scientific institutions in China. In the empirical study, personnel data from 2006 to 2014, abstracted from the Academia Resource Planning system of the Chinese Academy of Sciences, are applied to estimate the human resource demand gap in the 13th Five Year Plan. The results show that there is a large overall demand gap for research positions in the next five years. PubDate: 2017-01-04 DOI: 10.1007/s40745-016-0095-7

Authors:Tiantian Nie; Ziteng Wang; Shu-Cherng Fang; John E. Lavery Abstract: Cubic \(L^1\) spline fits have been developed for geometric data approximation and have shown excellent performance in shape preservation. To quantify the convex-shape-preserving capability of spline fits, we consider a basic convex-corner shape formed by two line segments in a given window. Given the horizontal length difference and the slope change of the convex corner, we use an analytical approach and a numerical procedure to calculate the second-derivative-based spline fit in 3-node and 5-node windows, respectively. Results in both cases show that the convex shape is well preserved when the horizontal length difference lies within the middle third of the window’s length. In addition, numerical results in the 5-node window indicate that second-derivative-based and first-derivative-based spline fits outperform function-value-based spline fits in preserving this convex shape. Our study extends current quantitative research on the shape preservation of cubic \(L^1\) spline fits and provides insights on choosing spline node positions in advance for shape-preserving purposes. PubDate: 2017-01-03 DOI: 10.1007/s40745-016-0099-3