Abstract: The Lindley distribution has been generalized by many authors in recent years. A new two-parameter distribution with decreasing failure rate is introduced, called the Alpha Power Transformed Lindley (APTL, in short, henceforth) distribution, that provides better fits than the Lindley distribution and some of its known generalizations. The new model includes the Lindley distribution as a special case. Various properties of the proposed distribution are derived, including explicit expressions for the ordinary moments, incomplete and conditional moments, mean residual lifetime, mean deviations, L-moments, moment generating function, cumulant generating function, characteristic function, Bonferroni and Lorenz curves, entropies, stress-strength reliability, stochastic ordering, order statistics, and the distributions of sums, differences, ratios and products. The new distribution can have decreasing, increasing, and upside-down bathtub failure rate functions depending on its parameters. The model parameters are estimated by the method of maximum likelihood, and confidence intervals for them are obtained. A simulation study is carried out to examine the bias and mean squared error of the maximum likelihood estimators of the parameters. Finally, two data sets are analyzed to show how the proposed model works in practice. PubDate: 2019-12-01
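The abstract does not spell out the transform behind the APTL model; a minimal Python sketch, assuming the usual alpha power transform G(x) = (α^F(x) − 1)/(α − 1) applied to the standard one-parameter Lindley CDF, could look like this (the function names are illustrative, not the authors'):

```python
import math

def lindley_cdf(x, theta):
    # Standard one-parameter Lindley CDF:
    # F(x) = 1 - (1 + theta*x/(1+theta)) * exp(-theta*x), x > 0
    if x <= 0:
        return 0.0
    return 1.0 - (1.0 + theta * x / (1.0 + theta)) * math.exp(-theta * x)

def aptl_cdf(x, alpha, theta):
    # Alpha power transform of a baseline CDF F:
    # G(x) = (alpha**F(x) - 1) / (alpha - 1) for alpha > 0, alpha != 1,
    # reducing to F itself as alpha -> 1 (the Lindley special case).
    F = lindley_cdf(x, theta)
    if abs(alpha - 1.0) < 1e-12:
        return F
    return (alpha ** F - 1.0) / (alpha - 1.0)
```

At α = 1 the sketch returns the Lindley CDF, matching the abstract's statement that the Lindley distribution is a special case of the new model.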

Abstract: This paper introduces a new family of distributions using the exponential negative binomial distribution. The proposed family generalizes the Marshall–Olkin, complementary exponential G-geometric, complementary beta G-geometric and complementary Kumaraswamy G-geometric families of distributions. Explicit expressions for the statistical and reliability properties of the proposed family are derived. Some special cases of this family are presented in detail. The suitability of the suggested family is established using real-life data sets from different areas of application. The empirical results indicate that the proposed family performs better than existing families of distributions. PubDate: 2019-12-01

Abstract: High dimensional data are growing rapidly in many domains because technological advances make it possible to collect data with a large number of variables to better understand a given phenomenon of interest. Particular examples appear in genomics, fMRI data analysis, large-scale healthcare analytics, text/image analysis and astronomy. In the last two decades, regularisation approaches have become the methods of choice for analysing such high dimensional data. This paper studies the performance of regularisation methods, including the recently proposed de-biased lasso, for the analysis of high dimensional data under different sparse and non-sparse situations. Our investigation concerns prediction, parameter estimation and variable selection. We particularly study the effects of correlated variables, covariate location and effect size, which have not been well investigated. We find that correlation, when it involves important variables, improves the common regularisation methods in all aspects, and that the level of sparsity is reflected not only in the number of important variables but also in their overall effect size and locations; the latter may be seen under a non-sparse data structure. We demonstrate that the de-biased lasso performs well, especially in low dimensional data; however, it still suffers from issues such as multicollinearity and multiple hypothesis testing, similar to the classical regression methods. PubDate: 2019-12-01
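As context for the regularisation methods studied here, the plain lasso can be sketched in a few lines of numpy via cyclic coordinate descent with soft-thresholding. This is a generic sketch (not the de-biased lasso, which adds a bias-correction step on top of such a fit), and the function name and defaults are illustrative:

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Lasso via cyclic coordinate descent: minimizes
    0.5*||y - X b||^2 + lam*||b||_1 (a minimal sketch)."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual with feature j removed from the fit
            r = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r
            z = X[:, j] @ X[:, j]
            # Soft-thresholding update for coordinate j
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / z
    return beta
```

Large values of `lam` shrink all coefficients to exactly zero, which is the sparsity-inducing behaviour the abstract's sparse/non-sparse comparison revolves around.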

Abstract: This article presents Bayesian and classical inference for the Chen distribution assuming upper record values. As the posterior distribution is not available in closed form, a Markov chain Monte Carlo method is used to obtain the posterior summaries. A sensitivity analysis is included to assess the effect of the prior on the estimated parameters, and a comparison between the Bayesian and frequentist approaches is given. Besides the simulation studies, a real data example is discussed to show the application of the study. PubDate: 2019-12-01
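For reference, the abstract does not state the form of the Chen distribution; in the commonly used two-parameter parameterization (an assumption here, not taken from the abstract) its CDF and hazard rate are

```latex
F(t) = 1 - \exp\!\bigl\{\lambda\bigl(1 - e^{t^{\beta}}\bigr)\bigr\}, \qquad
h(t) = \lambda \beta\, t^{\beta-1} e^{t^{\beta}}, \qquad t > 0,\ \lambda, \beta > 0,
```

with the hazard bathtub-shaped for $\beta < 1$ and increasing for $\beta \ge 1$.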

Abstract: Ethiopia’s coffee exports’ percentage share of total export earnings has been waning rapidly over recent decades, even though coffee remains the country’s leading foreign-currency earner. Hence, this study analyses the determinant factors of Ethiopia’s coffee export (ECE) performance, in the dimension of export sales, via a more realistic model, the dynamic panel gravity model. It begins by decomposing the determinants into supply- and demand-side factors. It uses short panel data comprising 71 consistent importers of Ethiopian coffee over the 11 years from 2005 to 2015. The Harris–Tzavalis panel unit root test was applied to each variable, and the first-difference transformation was applied to the variables that had a unit root. A linear dynamic panel gravity model was specified as a system and estimated with the two-step generalized method of moments (GMM) approach. The model results suggest that lagged ECE performance, the real gross domestic product (GDP) of importing countries, the Ethiopian population, Ethiopian real GDP, the trade openness of importing countries, Ethiopian institutional quality, and weighted distance are the determinant factors of Ethiopia’s coffee export performance. The study implies that policies promoting institutional quality, favorable market environments, supply capacity, trade liberalization, and destinations with relatively cheaper transportation costs would improve Ethiopia’s coffee export performance. PubDate: 2019-12-01

Abstract: Machine learning algorithms (MLAs) usually process large and complex datasets containing a substantial number of features to extract meaningful information about the target concept (a.k.a. class). In most cases, MLAs suffer from latency and computational complexity issues while processing such complex datasets, due to the presence of low-weight (i.e., irrelevant or redundant) features. The computing time of MLAs increases explosively with the number of features, feature dependence, number of records, feature types, and nested feature categories present in such datasets. Appropriate feature selection before applying an MLA is a handy way to resolve the trade-off between computing speed and accuracy when processing large and complex datasets. However, selecting features that are sufficient, necessary, and highly correlated with the target concept is very challenging. This paper presents an efficient feature selection algorithm based on random forests that improves the performance of MLAs on large and complex datasets without sacrificing guarantees on accuracy. The proposed algorithm yields unique features that are closely related to the target concept (i.e., class) and significantly reduces the computing time of MLAs without much loss of accuracy when learning the target concept from large and complex datasets. The simulation results confirm the efficacy and effectiveness of the proposed algorithm. PubDate: 2019-12-01
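The general idea of scoring features by how much a classifier degrades when they are scrambled can be sketched in numpy. This is a generic permutation-importance sketch using a simple nearest-centroid classifier as a stand-in; the paper's random-forest-based algorithm is not reproduced here, and all names below are illustrative:

```python
import numpy as np

def nearest_centroid_predict(X, centroids):
    # Assign each row to the class whose centroid is closest (Euclidean)
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return np.argmin(d, axis=1)

def permutation_importance(X, y, n_repeats=10, seed=0):
    """Score each feature by the mean drop in accuracy when that
    feature's column is randomly permuted (a minimal sketch)."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    base = np.mean(nearest_centroid_predict(X, centroids) == y)
    imp = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])
            drops.append(base - np.mean(nearest_centroid_predict(Xp, centroids) == y))
        imp[j] = np.mean(drops)
    return imp
```

Features whose permutation barely changes accuracy are candidates for removal, which is the speed/accuracy trade-off the abstract describes.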

Abstract: In the present paper, we introduce a new lifetime distribution based on the general odd hyperbolic cosine-FG model. Some important properties of the proposed model, including the survival function, quantile function, hazard function and order statistics, are obtained. In addition, estimation of the unknown parameters of this model is examined from both the classical and the Bayesian perspective. Moreover, a real data set is studied; point and interval estimates of all parameters are obtained by maximum likelihood, bootstrap (parametric and non-parametric) and Bayesian procedures. Finally, the superiority of the proposed model, with the exponential distribution as parent, over other fundamental statistical distributions is shown via the example of real observations. PubDate: 2019-12-01

Abstract: In this article, we study composite generalizers of the Weibull distribution using the exponentiated, Kumaraswamy, transmuted and beta distributions. The composite generalizers are constructed using both the forward and the reverse order of each of these distributions. The usefulness and effectiveness of the composite generalizers and their order of composition are investigated by studying the reliability behavior of the resulting distributions. Two sets of real-world data are analyzed using the proposed generalized Weibull distributions. PubDate: 2019-12-01

Abstract: The generalized Lindley distribution is an important distribution for analyzing stress–strength reliability models and lifetime data; it is quite flexible and can be used effectively in modeling survival data. It can have increasing, decreasing, upside-down bathtub and bathtub shaped failure rates. In this paper, we derive exact explicit expressions for the single, double (product), triple and quadruple moments of order statistics from the generalized Lindley distribution. Using these relations, we tabulate the expected values, second moments, variances and covariances of order statistics from samples of sizes up to 10 for various values of the parameters. We also use these moments to obtain the best linear unbiased estimates of the location and scale parameters based on Type-II right-censored samples. In addition, we carry out numerical illustrations through Monte Carlo simulations to show the usefulness of the findings. Finally, we apply the findings of the paper to a real data set. PubDate: 2019-12-01
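The single moments of order statistics referred to above follow the standard form: for an absolutely continuous parent with CDF $F$ and pdf $f$, the $k$-th moment of the $r$-th order statistic in a sample of size $n$ is

```latex
\mu_{r:n}^{(k)} = E\!\left[X_{r:n}^{k}\right]
= \frac{n!}{(r-1)!\,(n-r)!}
\int_{0}^{\infty} x^{k}\,[F(x)]^{r-1}\,[1-F(x)]^{n-r}\,f(x)\,dx,
```

with the generalized Lindley CDF substituted for $F$; the double, triple and quadruple moments use the analogous joint densities of two or more order statistics.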

Abstract: In this paper, statistical inference for the Gompertz distribution based on generalized progressively hybrid censored data is discussed. Estimation of the parameters of the Gompertz distribution is considered using the maximum likelihood method and Bayesian methods under different loss functions. The existence and uniqueness of the maximum likelihood estimates are proved. Point and interval Bayesian predictions for unobserved failures, both from the same sample and from a future sample, are derived. A Monte Carlo simulation is used to compare the proposed methods, and a real data example is used to apply the methods of estimation and to construct the prediction intervals. PubDate: 2019-12-01
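For reference, in one common parameterization (the abstract does not fix one) the Gompertz distribution has an exponentially increasing hazard and CDF

```latex
h(t) = \lambda e^{\gamma t}, \qquad
F(t) = 1 - \exp\!\Bigl\{-\tfrac{\lambda}{\gamma}\bigl(e^{\gamma t} - 1\bigr)\Bigr\},
\qquad t > 0,\ \lambda, \gamma > 0,
```

and the likelihood under progressive hybrid censoring is built from this $F$ and the corresponding density.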

Abstract: The exponentiated Kumaraswamy-power function (EKPF) distribution was proposed recently by Bursa and Ozel (Hacet J Math Stat 46:277–292, 2017) as a distribution that is more flexible, in terms of its probability density and hazard rate functions, than the power function distribution. In this paper, we obtain explicit expressions for the single, double (product), triple and quadruple moments, and the corresponding moment generating functions, of order statistics from the EKPF distribution. Using these relations, we tabulate the means and variances of order statistics from samples of sizes up to 10 for various values of the parameters. We use five frequentist estimation methods to estimate the unknown parameters, and a simulation study compares the performance of the different estimators. Finally, we analyse a real data set for illustrative purposes. PubDate: 2019-11-06

Abstract: Classification is an important task in machine learning. Datasets used for such problems often have a large number of features, of which only a few may actually be useful for the task. Feature selection is the process of removing irrelevant features in order to improve performance. This improvement can be achieved by increasing accuracy or by minimizing the number of features selected for classification; most feature selection algorithms aim at only one of these objectives. This paper presents the use of a Diploid Genetic Algorithm (DGA) for multi-objective optimization of feature selection in a classification problem. The task is to develop a model for solving a subset sum problem using the DGA, and to apply the solution of this problem to achieve the goal of multi-objective optimization: maximizing accuracy using a minimum number of features. The model has been applied to publicly available datasets and the results are encouraging. This work establishes the effectiveness of the DGA in feature selection. PubDate: 2019-10-17
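The subset-sum formulation at the heart of the paper can be illustrated with a plain (haploid) genetic algorithm. This is a generic sketch, not the paper's DGA, which additionally keeps two chromosomes per individual with a dominance scheme; parameters and names below are illustrative:

```python
import random

def ga_subset_sum(weights, target, pop=40, gens=80, pmut=0.05, seed=0):
    """A simple genetic algorithm for subset sum: evolve bitmasks over
    `weights` toward a subset whose sum is closest to `target`."""
    rng = random.Random(seed)
    n = len(weights)

    def err(mask):
        return abs(sum(w for w, b in zip(weights, mask) if b) - target)

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop)]
    best = min(population, key=err)
    for _ in range(gens):
        def pick():
            # Binary tournament selection
            a, b = rng.sample(population, 2)
            return a if err(a) <= err(b) else b
        children = [best[:]]  # elitism: always keep the best individual
        while len(children) < pop:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, n)          # one-point crossover
            child = p1[:cut] + p2[cut:]
            for i in range(n):                 # bit-flip mutation
                if rng.random() < pmut:
                    child[i] ^= 1
            children.append(child)
        population = children
        best = min(best, min(population, key=err), key=err)
    return best, err(best)
```

In the feature-selection analogy, each bit marks a feature as selected or not, and the fitness trades classification accuracy against subset size.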

Abstract: This article considers the problem of estimating the parameter of the DUS-exponential distribution based on upper records, through classical as well as Bayesian procedures. The maximum likelihood estimator is calculated under the classical scheme, and the Bayes estimator is obtained under the squared error loss function using an MCMC technique. The performances of the two estimators are compared on the basis of their estimated mean squared errors. The asymptotic and HPD intervals for the unknown parameter are also calculated. In addition, the entropy of the ith upper record and the joint entropy based on m upper records are obtained for the considered distribution. A real data set is considered to illustrate the suitability of the proposed methodology. PubDate: 2019-09-09
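For context, the DUS transform of a baseline CDF $F$ is usually given (an assumption here, as the abstract does not state it) by

```latex
G(x) = \frac{e^{F(x)} - 1}{e - 1},
```

so that with the exponential baseline $F(x) = 1 - e^{-\theta x}$ the DUS-exponential CDF becomes

```latex
G(x) = \frac{e^{\,1 - e^{-\theta x}} - 1}{e - 1}, \qquad x > 0,\ \theta > 0.
```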

Abstract: In this paper, a new family of distributions called the exponentiated generalized power series family is proposed and studied. Statistical properties such as stochastic order, the quantile function, entropy, mean residual life and order statistics are derived. Bivariate and multivariate extensions of the family are proposed. The method of maximum likelihood is proposed for the estimation of the parameters. Some special distributions from the family are defined and their applications are demonstrated with real data sets. PubDate: 2019-09-01

Abstract: Causal inference with observational data has drawn attention across various fields. These observational studies typically use matching methods, which find matched pairs with similar covariate values. However, matching methods may not directly achieve covariate balance, a measure of matching effectiveness. As an alternative, the Balance Optimization Subset Selection (BOSS) framework, which seeks optimal covariate balance directly, has been proposed. This paper extends BOSS by estimating and decomposing a treatment effect as a combination of heterogeneous treatment effects from a partitioned set. Our method differs from the traditional propensity score subclassification method in that we find a subset in each subclass using BOSS instead of using the stratum determined by the propensity score. Then, by conducting a bootstrap hypothesis test on each component, we check the statistical significance of these treatment effects. These methods are applied to a dataset from the National Supported Work Demonstration (NSW) program, which was conducted in the 1970s. By examining statistical significance, we show that the program was not significantly effective for a specific subgroup composed of those who were already employed. This differs from the combined estimate, under which the NSW program was effective when all individuals are considered. Lastly, we provide results obtained when these steps are repeated with sub-samples. PubDate: 2019-09-01
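A bootstrap hypothesis test of a treatment effect, of the kind applied here to each component, can be sketched in numpy for the simple two-sample case. This is a generic sketch under the assumption that each component reduces to comparing treated and control outcome means; the BOSS subset construction itself is not reproduced:

```python
import numpy as np

def bootstrap_mean_diff_test(treated, control, n_boot=2000, seed=0):
    """Two-sample bootstrap test of H0: equal means.
    Returns the observed mean difference and a two-sided p-value."""
    rng = np.random.default_rng(seed)
    t, c = np.asarray(treated, float), np.asarray(control, float)
    obs = t.mean() - c.mean()
    # Impose H0 by shifting both samples to the common pooled mean
    pooled = np.concatenate([t, c]).mean()
    t0, c0 = t - t.mean() + pooled, c - c.mean() + pooled
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        tb = rng.choice(t0, size=t0.size, replace=True)
        cb = rng.choice(c0, size=c0.size, replace=True)
        diffs[i] = tb.mean() - cb.mean()
    # Two-sided p-value: how often the null resamples are as extreme
    return obs, np.mean(np.abs(diffs) >= abs(obs))
```

A large p-value for a subgroup, as found for the already-employed participants, would indicate no significant treatment effect in that component.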

Abstract: The ranking of English Premier League (EPL) clubs during the football season is of keen interest to many stakeholders, with special attention to the London rivals Arsenal, Chelsea and Tottenham. In particular, the first-half (GF) and second-half (GS) scores, besides being inter-related, are perceived as a convenient measure of a club’s potential. This paper studies the contributory effects of the factors that commonly influence a club’s scoring capacity in each half, along with forecasting diagnostics, via a novel flexible bivariate time series model with COM-Poisson innovations, using data from August 2014 to December 2017. PubDate: 2019-09-01
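The COM-Poisson innovations mentioned above have pmf proportional to λ^y/(y!)^ν, with an intractable normalizing constant that is typically truncated numerically. A minimal sketch (illustrative names; the paper's bivariate model is not reproduced):

```python
import math

def com_poisson_pmf(y, lam, nu, trunc=200):
    """COM-Poisson pmf P(Y=y) = lam**y / (y!)**nu / Z(lam, nu), with the
    normalizing constant Z approximated by truncating its infinite series.
    At nu = 1 this reduces to the ordinary Poisson distribution."""
    # Build the unnormalized terms t_j = lam**j / (j!)**nu recursively
    # (t_j = t_{j-1} * lam / j**nu) to avoid overflowing factorials.
    terms = [1.0]
    for j in range(1, trunc):
        terms.append(terms[-1] * lam / j ** nu)
    Z = sum(terms)
    return terms[y] / Z
```

The dispersion parameter ν makes the family under-dispersed (ν > 1) or over-dispersed (ν < 1) relative to the Poisson, which is what makes it attractive for modelling goal counts.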

Abstract: A new generator of continuous distributions, called the Exponentiated Generalized Marshall–Olkin-G family, with three additional parameters, is proposed. This family of distributions contains several known distributions as sub-models. The probability density function and cumulative distribution function are expressed as infinite mixtures of the Marshall–Olkin distribution. Important properties such as the quantile function, order statistics, moment generating function, probability weighted moments, entropy and shapes are investigated. The maximum likelihood method for estimating the model parameters is presented, and a simulation study assessing the performance of the maximum likelihood estimates is briefly discussed. A distribution from this family is compared with two sub-models and some recently introduced lifetime models in three real-life data fitting applications. PubDate: 2019-09-01
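For reference, the Marshall–Olkin generator underlying this family tilts the baseline survival function: with $\bar F = 1 - F$ and tilt parameter $\alpha > 0$,

```latex
\bar G(x;\alpha) = \frac{\alpha\,\bar F(x)}{1 - (1 - \alpha)\,\bar F(x)},
```

which reduces to the baseline at $\alpha = 1$; the exponentiated generalized step then adds two further shape parameters on top of this CDF, in the composition defined in the paper.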

Abstract: We define and study a new family of distributions, called the generalized Burr XII power series class, obtained by compounding the generalized Burr XII and power series distributions. Several properties of the new family are derived. The maximum likelihood estimation method is used to estimate the model parameters. The importance and potential of the new family are illustrated by means of three applications to real data sets. PubDate: 2019-09-01

Abstract: In this article, we propose and study a new family of distributions defined using the genesis of the truncated Poisson distribution and the beta distribution. Some mathematical properties of the new family, including moments, quantile and generating functions, mean deviations, order statistics and their moments, and reliability analysis, are discussed. We also discuss parameter estimation procedures and potential applications of this generalized family of distributions. PubDate: 2019-09-01

Abstract: Breast cancer is a serious threat to women, and its identification relies heavily on histopathological image analysis. Among the different breast-cancer image analysis techniques, classifying images into benign and malignant classes has been an active area of research, as has the use of machine learning for breast-cancer image classification. Given the importance of this task, this paper classifies a set of histopathological images into benign and malignant classes using neural network techniques and the random forest algorithm. As histopathological images suffer from intensity variation, we normalize the intensity information with a newly proposed intensity normalization technique and classify the images using neural network techniques and tree-based classification tools. Our investigation shows that the proposed normalization technique gives the best performance with neural network techniques, whereas tree-based algorithms such as random forest perform better on images without normalization. PubDate: 2019-09-01
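The abstract does not describe the newly proposed normalization technique; as a baseline for comparison, the standard min-max intensity normalization that such techniques typically improve on can be sketched as follows (the function name is illustrative):

```python
import numpy as np

def minmax_normalize(img, eps=1e-8):
    """Rescale image intensities to [0, 1] by min-max normalization.
    A standard baseline only; not the paper's proposed technique."""
    img = img.astype(np.float64)
    lo, hi = img.min(), img.max()
    # eps guards against division by zero for constant images
    return (img - lo) / (hi - lo + eps)
```

Per-image rescaling of this kind removes global brightness differences between slides, which is the intensity variation the abstract refers to.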