Abstract: Abstract Recent concepts such as Smart Cities, Urban Computing, and Geographic Information Systems are being discussed in various international forums, using themes such as sustainability and efficient use of the city infrastructures. One important aspect in this regard is to correctly associate computational techniques with statistical models and integrate heterogeneous data sources using open data shared by cities. Based on that, this study uses open data from the city of Curitiba (Brazil) in order to bring results on the spatiotemporal evolution of business activities along a period of over thirty years. To that end, the study identifies and discusses important challenges that had to be tackled toward data quality, data categorization, and data integration, in order to perform this type of study in practice. By looking at the dynamics of geographically grounded microeconomic variables, this study shows how the expansion and diversification of business types in different neighborhoods happened, contributing to a better understanding of the process of evolution of the business activity in a city. PubDate: 2017-03-23

Abstract: Preterm birth is the term used to define births that occur before 37 completed weeks or 259 days of gestation. The aim of this study is to model the survival probability of premature infants who were under follow-up and to identify significant risk factors for mortality. Recorded hospital data were obtained for a cohort of 490 infants at Jimma University Specialized Hospital, Ethiopia. The infants were followed up from January 2013 to December 2015. Non-parametric, semi-parametric and parametric survival models are used to estimate the survival time and to examine the association between survival time and different demographic, health and risk behavior variables. The analysis shows that most factors significantly contribute to a shorter survival time of premature infants. These factors include prenatal asphyxia, hyaline membrane disease, sepsis, jaundice, low gestational age, respiratory distress syndrome and initial temperature. It is therefore recommended that people be made cognizant of the burden of these risk factors and be well informed about prematurity. PubDate: 2017-03-20

Authors:Mehrzad Ghorbani; Seyed Fazel Bagheri; Mojtaba Alizadeh Abstract: In this paper, a new family of distributions, called the additive modified Weibull odd log-logistic-G Poisson distribution, is proposed and studied. Some mathematical properties are presented and special models are discussed. We derive a power series for the quantile function and explicit expressions for the moments, quantile and generating functions, and order statistics. We also consider several estimators of the PDF and the survival function of the new family: the maximum likelihood estimator, percentile estimator, least squares estimator and weighted least squares estimator. Simulation studies and a real data application are used to assess the performance of the new family and to compare these estimators. PubDate: 2017-03-13 DOI: 10.1007/s40745-017-0102-7

Authors:Suresh Dara; Haider Banka; Chandra Sekhara Rao Annavarapu Abstract: Feature selection in high-dimensional data, particularly in gene expression data, is one of the challenging tasks in bioinformatics due to the curse of dimensionality, data redundancy and noisy values. In gene expression data, insignificant features cause poor classification; feature selection reduces the feature subset and thereby improves classification accuracy. Existing feature selection algorithms for gene expression data (such as filter-based, wrapper-based and hybrid methods) achieve poor accuracy, whereas a few methods take too long to converge to acceptable results. For example, NSGA-II needs, on average, over 10,000 generations to converge in the search space, which increases computational time. We propose a rough-set-based hybrid binary PSO algorithm that uses a heuristic-based fast processing strategy to reduce crude domain features by statistical elimination of redundant features; the reduced data are then discretized into a binary table, known in rough set theory as a distinction table. This distinction table is later used as input to evaluate and optimize the objective functions, i.e., to generate a reduct in rough set theory. The proposed hybrid binary PSO is then used to tune the objective functions and choose the most important features (i.e., the reduct). The fitness function is designed so that it reduces the cardinality of the feature set and, at the same time, improves classification performance. Results on three existing benchmark datasets from the literature (colon cancer, lymphoma and leukemia data) demonstrate the effectiveness of the proposed method. PubDate: 2017-03-11 DOI: 10.1007/s40745-017-0106-3

Authors:Zhuopei Yang; Yanmei Zhang; Hengyue Jia Abstract: The low success rate of lending is the main obstacle to the development of online P2P lending platforms in China. Based on the theory of social capital, this study analyses the factors that influence the success rate of P2P lending platforms in China, using a social network method and a multiple linear regression model. Soft information, such as the bidding record, is employed to study these topics. The data used in this study come from the largest online P2P lending platform in China. The results show that, compared with other influencing factors, the bidding record has a more significant effect on the success rate, and users depend more on social capital; bidding records reduce information asymmetry, help increase the success rate of lending and decrease the cost of online P2P lending. PubDate: 2017-03-10 DOI: 10.1007/s40745-017-0103-6

Authors:M. Nassar; S. G. Nassr; S. Dey Abstract: Abstract In this paper, we investigate the maximum likelihood estimation of the unknown parameters of the Burr Type-XII distribution and the acceleration factor based on two different progressively hybrid censoring schemes, namely, Type-I progressive hybrid censoring scheme (T-I PHCS) proposed by Kundu and Joarder (Comput Stat Data Anal 50:2509–2528, 2006) and adaptive Type-II progressive hybrid censoring scheme (AT-II PHCS) introduced by Ng et al. (Nav Res Logist 56:687–698, 2009) under step-stress partially accelerated life test model. The observed Fisher information matrix is obtained to construct an approximate confidence interval for the unknown parameters. The performances of the estimators of the model parameters using the above mentioned progressively hybrid censoring schemes are evaluated and compared in terms of the mean squared errors and relative errors through a Monte Carlo simulation study. PubDate: 2017-03-04 DOI: 10.1007/s40745-017-0101-8

Authors:F. Maleki; E. Deiri Abstract: In this paper, we consider the estimation of the PDF and the CDF of the Fréchet distribution. The following estimators are considered: the uniformly minimum variance unbiased estimator, maximum likelihood estimator, percentile estimator, least squares estimator and weighted least squares estimator. Analytical expressions are derived for the bias and the mean squared error. As the results of simulation studies and real data applications indicate, the ML estimator performs better than the others. PubDate: 2017-02-27 DOI: 10.1007/s40745-017-0100-9

Authors:Kenneth David Strang; Zhaohao Sun Abstract: Abstract We extended the big data body of knowledge by analyzing the longitudinal literature to highlight important research topics and identify critical gaps. We initially collected 79,012 articles from 1900 to 2016 related to big data. We refined our sample to 13,029 articles allowing us to determine that the big data paradigm commenced in late 2011 and the research production exponentially rose starting in 2012, which approximated a Weibull distribution that captured 82% of the variance ( \(p<.01\) ). We developed a dominant topic list for the big data body of knowledge that contained 49 keywords resulting in an inter-rater reliability of 93% ( \(\hbox {r}^{2}=0.89\) ). We found there were 13 dominant topics that captured 49% of the big data production in journals during 2011–2016 but privacy and security related topics accounted for only 2% of those outcomes. We analyzed the content of 970 journal manuscripts produced during the first of 2016 to determine the current status of big data research. The results revealed a vastly different current trend with too many literature reviews and conceptual papers that accounted for 41% of the current big data knowledge production. Interestingly, we observed new big data topics emerging from the healthcare and physical sciences disciplines. PubDate: 2017-01-21 DOI: 10.1007/s40745-016-0096-6

Authors:Yogesh Mani Tripathi; Amulya Kumar Mahto; Sanku Dey Abstract: Abstract The generalized logistic distribution is a useful extension of the logistic distribution, allowing for increasing and bathtub shaped hazard rates and has been used to model the data with a unimodal density. Here, we consider estimation of the probability density function and the cumulative distribution function of the generalized logistic distribution. The following estimators are considered: maximum likelihood estimator, uniformly minimum variance unbiased estimator (UMVUE), least square estimator, weighted least square estimator, percentile estimator, maximum product spacing estimator, Cramér–von-Mises estimator and Anderson–Darling estimator. Analytical expressions are derived for the bias and the mean squared error. Simulation studies are also carried out to show that the maximum-likelihood estimator is better than the UMVUE and that the UMVUE is better than others. Finally, a real data set has been analyzed for illustrative purposes. PubDate: 2017-01-13 DOI: 10.1007/s40745-016-0093-9

Authors:P. RajaRajeswari; S. Viswanadha Raju Abstract: The data contained in the DNA molecule of even basic unicellular life forms is huge and requires efficient storage. Efficient storage implies removal of all redundancy from the information being stored. The proposed compression algorithm "GENBIT Compress" is specifically designed to eliminate all redundancy from the DNA sequences of large genomes. We define a compression distance, based on a normal compressor, and show that it is an admissible distance. Only recently have researchers begun to appreciate that compression ratios carry a great deal of important statistical information. In applying the approach, we have used a new DNA sequence compressor, "GENBIT Compress". The normalized compression distance (NCD) is universal in that it is not restricted to a specific application area, and works across application area boundaries. A theoretical precursor, the normalized information distance, is provably optimal in the sense that it minimises every computable normalized metric that satisfies a certain density requirement. However, the optimality comes at the price of using the non-computable notion of Kolmogorov complexity. We propose precise notions of similarity metric and normal compressor, and show that the NCD based on a normal compressor is a similarity metric that approximates optimality. The normalized compression distance, an efficiently computable and thus practically applicable form of the normalized information distance, is used to calculate the distance matrix. In this paper, this new distance matrix is proposed for reconstructing phylogenetic trees. Phylogenies are the fundamental tool for representing the relationships among biological entities. Phylogenetic reconstruction methods attempt to find the evolutionary history of a given set of species. This history is usually described by an edge-weighted tree, where edges correspond to different branches of evolution, and the weight of an edge corresponds to the amount of evolutionary change on that particular branch. We constructed a phylogenetic tree from BChE DNA sequences of mammals by supplying the new distance matrix produced by the GENBIT compressor to the NJ (Neighbor-Joining) algorithm. The results of the present research confirm the existence of low compression ratios for natural DNA sequences with highly repetitive DNA bases (A, C, G, T): the more repetitive the bases, the lower the compression ratio. The ultimate goal is, of course, to learn the "genome organization" principles, and to explain this organization using our knowledge about evolution. PubDate: 2017-01-12 DOI: 10.1007/s40745-016-0098-4
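
As a rough illustration of the normalized compression distance described in this abstract, the sketch below computes NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)) and a pairwise distance matrix. It is a minimal sketch under stated assumptions: the standard-library `zlib` compressor stands in for GENBIT Compress (whose implementation is not given here), and `compressed_size`, `ncd`, `distance_matrix` and `random_dna` are hypothetical helper names, not the authors' code.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (assumed stand-in compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(seqs):
    """Symmetric matrix of pairwise NCD values with a zero diagonal."""
    n = len(seqs)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Compute once per pair and mirror, since C(xy) and C(yx) may differ slightly.
            m[i][j] = m[j][i] = ncd(seqs[i], seqs[j])
    return m


def random_dna(n: int, seed: int) -> bytes:
    """Hypothetical test helper: reproducible random sequence over A, C, G, T."""
    rng = random.Random(seed)
    return bytes(rng.choice(b"ACGT") for _ in range(n))


if __name__ == "__main__":
    seqs = [random_dna(2000, 1), random_dna(2000, 2), random_dna(2000, 1)]
    for row in distance_matrix(seqs):
        print(["%.3f" % d for d in row])
```

A matrix of this kind could then be passed to any Neighbor-Joining implementation to build a tree; that step is omitted here.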

Authors:Sanku Dey; Vikas Kumar Sharma; Mhamed Mesfioui Abstract: The Weibull distribution has been generalized by many authors in recent years. Here, we introduce a new generalization, called the alpha-power transformed Weibull distribution, that provides better fits than the Weibull distribution and some of its known generalizations. The distribution contains the alpha-power transformed exponential and alpha-power transformed Rayleigh distributions as special cases. Various properties of the proposed distribution are derived, including explicit expressions for the quantiles, mode, moments, conditional moments, mean residual lifetime, stochastic ordering, Bonferroni and Lorenz curves, stress–strength reliability and order statistics. The distribution is capable of modeling monotonically increasing, decreasing, constant, bathtub, upside-down bathtub and increasing–decreasing–increasing hazard rates. The maximum likelihood estimators of the unknown parameters cannot be obtained in explicit form and have to be obtained by solving non-linear equations. Two data sets have been analyzed to show how the proposed models work in practice. Further, bivariate extensions of the proposed model based on the Marshall–Olkin and copula concepts are developed, but their properties are not considered in detail in this paper and can be addressed in future research. PubDate: 2017-01-07 DOI: 10.1007/s40745-016-0094-8

Authors:Reza Hadizadeh; Amir Abbas Shojaie Abstract: Measuring the bullwhip effect, a phenomenon in which demand variability increases as one moves up the supply chain, is a major issue in supply chain management. In this paper, we quantify the impact of the bullwhip effect on a simple two-stage supply chain consisting of one supplier and one retailer, where the retailer employs a base-stock policy to replenish its inventory. Demand is forecast via a mixed autoregressive moving average model, ARMA(1,1), in which the model errors follow a GARCH process and the model's variance changes with time, i.e., the model has conditional heteroscedasticity, in order to simulate the bullwhip effect, which behaves nonlinearly. The definition of the bullwhip effect is expanded to the "over time bullwhip effect" (conditional bullwhip effect). We use the minimum mean-square error forecasting technique and also investigate the effects of the autoregressive coefficient, the moving average parameter and the lead time on the bullwhip effect. Moreover, the bullwhip effect is compared between the linear ARMA demand process and the nonlinear ARMA–GARCH demand process. The results show that the bullwhip effect can be decreased by choosing correct coefficients in the demand process under the nonlinear demand process. PubDate: 2017-01-06 DOI: 10.1007/s40745-016-0097-5

Authors:Yongjia Xie; Dengsheng Wu; Yuanping Chen; Wenbin Jiao; Jianping Li Abstract: The number of scientific researchers in China has grown rapidly. Meanwhile, strict restrictions on the total number and the position structure of researchers have put great pressure on Chinese researchers. Decision makers have noticed this dilemma, and quantitative predictions are needed for decision support. This paper puts forward a data-driven dynamic programming model to estimate the research position demand gap. The model fully considers the real practice of human resource management in scientific management in China. In the empirical study, personnel data from 2006 to 2014, extracted from the Academia Resource Planning system of the Chinese Academy of Sciences, are used to estimate the human resource demand gap in the 13th Five Year Plan. The results show that, on the whole, there is a large demand gap in research positions over the next five years. PubDate: 2017-01-04 DOI: 10.1007/s40745-016-0095-7

Authors:Tiantian Nie; Ziteng Wang; Shu-Cherng Fang; John E. Lavery Abstract: Cubic \(L^1\) spline fits have been developed for geometric data approximation and have shown excellent performance in shape preservation. To quantify the convex-shape-preserving capability of spline fits, we consider a basic shape: a convex corner with two line segments in a given window. Given the horizontal length difference and slope change of the convex corner, we use an analytical approach and a numerical procedure to calculate the second-derivative-based spline fit in 3-node and 5-node windows, respectively. Results in both cases show that the convex shape can be well preserved when the horizontal length difference is within the middle third of the window's length. In addition, numerical results in the 5-node window indicate that the second-derivative-based and first-derivative-based spline fits outperform function-value-based spline fits in preserving this convex shape. Our study extends current quantitative research on shape preservation of cubic \(L^1\) spline fits and provides more insight into improving advance spline node positions for shape-preserving purposes. PubDate: 2017-01-03 DOI: 10.1007/s40745-016-0099-3

Abstract: Traditional anaesthesia training is considered time-consuming, since trainees are required to go through an extended period of knowledge learning and to practise their skills under the supervision of experienced anaesthetists. In this paper, a Computational Virtual Reality Environment for Anesthesia (CVREA) is proposed, which can significantly and efficiently improve the training and learning performance of trainee anaesthetists. Virtual reality, big data, data mining and machine learning techniques will be explored and applied in this system. CVREA consists of two main parts: (1) an immersive and interactive VR-based training platform for anaesthetists, which allows trainees to hone their clinical skills in a virtual environment without putting patients at risk; and (2) a knowledge learning system which records and collects clinical data with greater richness. Knowledge learning algorithms will be developed to explore these data in order to help data processing and facilitate knowledge discovery in anaesthesiology. PubDate: 2016-11-03 DOI: 10.1007/s40745-016-0089-5

Authors:Jingjing Tang; Yingjie Tian Abstract: Similarity detection technology has captured the attention of many researchers. Minwise hashing schemes have become a current research hotspot in machine learning for similarity preservation. During the data preprocessing stage, the basic idea of minwise hashing schemes is to transform the original data into binary codes that are good proxies of the original data and preserve similarity. Minwise hashing schemes can improve computational efficiency and save storage space without notable loss of accuracy. Thus, they have been studied extensively and developed rapidly for decades. Given the minwise hashing algorithm and its many variants, a systematic survey is needed to make this kind of data preprocessing technique easier to understand and use. The purpose of this paper is to review minwise hashing algorithms in detail and provide an insightful understanding of current developments. To show the application prospects of minwise hashing algorithms, various algorithms have been combined with the linear Support Vector Machine for large-scale classification. Both theoretical analysis and experimental results demonstrate that these algorithms can achieve substantial advantages in accuracy, efficiency and energy consumption. Furthermore, their limitations, major opportunities and challenges, extensions and variants, as well as potentially important research directions, are pointed out. PubDate: 2016-10-26 DOI: 10.1007/s40745-016-0091-y
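
The basic idea behind minwise hashing can be sketched as follows: for each of k hash functions, keep only the minimum hash value over a set's elements; the fraction of positions where two signatures agree estimates the Jaccard similarity of the sets. This is a minimal sketch of plain minwise hashing only, not of the binary-code (b-bit) variants or the SVM pipeline the survey covers. The salted BLAKE2b hash family and the names `minhash_signature` and `estimate_jaccard` are assumptions made for illustration.

```python
import hashlib


def _hash(seed: int, token: str) -> int:
    """64-bit hash of a token; the seed selects one member of an assumed hash family."""
    h = hashlib.blake2b(token.encode("utf-8"), digest_size=8,
                        salt=seed.to_bytes(8, "little"))
    return int.from_bytes(h.digest(), "little")


def minhash_signature(tokens, num_hashes: int = 128):
    """Signature of a non-empty set: the minimum hash value under each hash function."""
    return [min(_hash(seed, t) for t in tokens) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b) -> float:
    """Fraction of signature positions that agree; estimates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def jaccard(a: set, b: set) -> float:
    """Exact Jaccard similarity, for comparison with the estimate."""
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    a = {str(i) for i in range(0, 100)}
    b = {str(i) for i in range(50, 150)}
    sig_a = minhash_signature(a, 256)
    sig_b = minhash_signature(b, 256)
    print("exact:", jaccard(a, b), "estimate:", estimate_jaccard(sig_a, sig_b))
```

The signature length trades accuracy for storage: the standard deviation of the estimate shrinks roughly as the inverse square root of the number of hash functions.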

Authors:Shouvik Dutta; Jason Sauppe; Sheldon Jacobson Abstract: Abstract Customers today are faced with a plethora of choices of products to buy and consume. The sheer volume of choices can be daunting, and customers forced to sift through the products are likely to become dissatisfied. Retailers have the ability to solve this problem by providing customers with recommendations of products that are likely to be of interest to each specific customer. This can be done by profiling each customer and identifying products that similar customers like. This paper presents a balance optimization approach, where customers are characterized and matched as groups. By identifying and analyzing a group of customers who have shown positive reactions to a specific product, we propose a technique to find a comparable group who we hypothesize will show a similar positive reaction. This allows for the creation of targeted advertisements, mailing lists, and other material to recommend products to customers. The methodology is tested using a Netflix dataset, where we are able to show a statistically significant improvement on the mean rating of selected users over random selection of 0.384 when the ratings are on a scale of 0–5. PubDate: 2016-10-12 DOI: 10.1007/s40745-016-0090-z

Authors:S. M. A. Jahanshahi; A. Habibi Rad; V. Fakoor Abstract: In this paper, we introduce a new goodness-of-fit test for the Rayleigh distribution based on the Hellinger distance. In addition, some properties of the proposed test are presented. The proposed test is then compared, in terms of power, with other goodness-of-fit tests for the Rayleigh distribution in the literature. Finally, we conclude that the entropy-based tests perform well in terms of power, and that the Hellinger test can be chosen as more powerful than the competing tests. PubDate: 2016-10-11 DOI: 10.1007/s40745-016-0088-6

Authors:Mehdi Jabbari Nooghabi Abstract: In this paper, we find the moment, maximum likelihood, least squares and weighted least squares estimators of the parameters of the Lomax distribution in the presence of outliers. The mixture estimator of these four methods is also derived. Further, we discuss the efficiency of the estimators. Analyses of a simulated data set and an actual example from an insurance company are presented for illustrative purposes. PubDate: 2016-09-15 DOI: 10.1007/s40745-016-0087-7

Authors:Butchi Raju Katari; S. Viswanadha Raju Abstract: The massive demand for information has heightened the importance of managing enormous amounts of data. Fetching the desired data from large data stores is a herculean task, as it involves text processing, text mining, pattern recognition, data cleaning, etc. Handling concurrent events and devising high-performance processing models to extract data is a challenge for researchers. One solution to this challenge is concurrent string matching on such processing models. Some existing mechanisms perform very well in practice. Numerous works have been published on this subject, and research is still active in this area, as the scope and opportunities for developing new techniques are perennial. This paper proposes an N-folded parallel string matching mechanism. This mechanism divides the input sequence files into several parts and distributes them to the processors. Using this mechanism as a model, experiments have been conducted with chloroplast, mitochondria and different categories of plant genome sequence files as input, for different sizes and with seven possible patterns. The results of the experiments show that the N-folded parallel string matching mechanism can reduce the processing time on a multiprocessor system. PubDate: 2016-08-30 DOI: 10.1007/s40745-016-0086-8
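
The partitioning idea in this abstract (split the input sequence into N parts and search each part concurrently) can be sketched as below. This is an illustrative sketch, not the authors' mechanism: the function names are hypothetical, each part is extended by `len(pattern) - 1` characters so matches that straddle a boundary are not lost, and a thread pool is used only to show how the work is divided. In CPython, an actual speed-up on multiple processors would typically need a process pool or native code.

```python
from concurrent.futures import ThreadPoolExecutor


def find_all(text: str, pattern: str, offset: int = 0):
    """All start positions of pattern in text (overlaps allowed), shifted by offset."""
    hits = []
    i = text.find(pattern)
    while i != -1:
        hits.append(offset + i)
        i = text.find(pattern, i + 1)
    return hits


def n_fold_search(text: str, pattern: str, n_parts: int = 4):
    """Split text into about n_parts overlapping pieces and search them concurrently."""
    if not pattern or n_parts < 1:
        raise ValueError("pattern must be non-empty and n_parts must be positive")
    chunk = max(1, -(-len(text) // n_parts))  # ceiling division
    overlap = len(pattern) - 1                # lets boundary-straddling matches be found
    tasks = [(text[start:start + chunk + overlap], start)
             for start in range(0, len(text), chunk)]
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        results = pool.map(lambda t: find_all(t[0], pattern, t[1]), tasks)
        merged = [pos for part in results for pos in part]
    return sorted(merged)


if __name__ == "__main__":
    genome = "ACGTTGCAACGT" * 1000  # made-up sequence for demonstration
    print(len(n_fold_search(genome, "GCAAC", n_parts=8)))
```

Because each piece only reports matches that start inside its own chunk, no position is reported twice and the merged result equals that of a single sequential scan.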