Abstract: Over the last 10 years, soaring housing prices have raised concerns over ‘affordability’ in the Chinese housing market, although the concept is still not enshrined in agreed standards, partly because of differing opinions about how it should be measured. To overcome the inadequacy of a single index, we examine the housing affordability of 35 large and medium-sized cities in China from 2009 to 2016 using the price-to-income ratio (PIR), the monthly payment-to-income ratio (MIR) and the residual income approach (RI). Taking the characteristics of China’s real estate market into account, we re-examine the reasonable range of each index. A comparison of each single index across cities shows significant periodicity, and a multi-index clustering analysis reveals regional characteristics, both of which further the understanding of housing affordability. Finally, policy recommendations for reforming the Chinese urban housing system are suggested according to the differences in, and changing patterns of, housing affordability among cities. PubDate: 2019-06-01
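As a rough illustration of the three indexes named in this abstract, the sketch below computes them for a single hypothetical household. The abstract does not give the exact definitions used in the paper, so the formulas here are common textbook forms and should be read as assumptions: PIR as dwelling price over annual income, MIR as a level mortgage payment over monthly income, and RI as income left after the payment and a basic living cost. All figures and function names are invented for illustration.

```python
def price_to_income_ratio(house_price, annual_income):
    """PIR: dwelling price divided by annual household income."""
    return house_price / annual_income


def monthly_payment(principal, annual_rate, years):
    """Level monthly payment on a fully amortizing loan (annuity formula)."""
    n = years * 12           # number of monthly instalments
    r = annual_rate / 12     # monthly interest rate
    if r == 0:
        return principal / n
    growth = (1 + r) ** n
    return principal * r * growth / (growth - 1)


def payment_to_income_ratio(payment, monthly_income):
    """MIR: monthly mortgage payment divided by monthly household income."""
    return payment / monthly_income


def residual_income(monthly_income, payment, basic_living_cost):
    """RI: income remaining after housing payment and basic non-housing needs."""
    return monthly_income - payment - basic_living_cost


# Hypothetical household (all numbers are assumptions, not data from the paper).
price = 1_200_000.0
annual_income = 150_000.0
loan = 0.7 * price                         # assumed 30% down payment
payment = monthly_payment(loan, 0.049, 30)

pir = price_to_income_ratio(price, annual_income)
mir = payment_to_income_ratio(payment, annual_income / 12)
ri = residual_income(annual_income / 12, payment, 3_500.0)
print(f"PIR={pir:.1f}  MIR={mir:.2f}  RI={ri:.0f}")
```

The three numbers can disagree about the same household, which is one way to see why a single index may be inadequate.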

Abstract: We define and study an inventive distribution called the Type II half logistic exponential (TIIHLE) distribution. Some well-known mathematical properties of the TIIHLE distribution are investigated: moments, probability weighted moments, mean deviation, quantile function and Rényi entropy. Expressions for the order statistics are derived. The parameters of the distribution are estimated using the maximum likelihood method. The importance of the proposed distribution is illustrated with two datasets. PubDate: 2019-06-01

Abstract: In this article, we introduce the inverse Gompertz distribution with two parameters. Some statistical properties are presented, such as the hazard rate function, quantiles, probability weighted moments, skewness, kurtosis, entropy functions, mean residual lifetime and mean inactive lifetime. The model parameters are estimated by the methods of maximum likelihood, bootstrap, least squares, weighted least squares and Cramér-von Mises. Further, Monte Carlo simulations are carried out to compare the long-run performance of the estimators based on complete and type II right censored data. Finally, we estimate the parameters from behavioral sciences data and from censored data on the fatigue life, in hours, of 10 bearings of a certain type; the results show that the model fits the data better than some competing models. PubDate: 2019-06-01

Abstract: In this paper, we introduce a new family of probability distributions generated from a power Lindley random variable, called the power Lindley-generated family. The new family extends several classical distributions and generalizes the odd Lindley family proposed by Silva et al. (Austrian J Stat 46:65–87, 2017). Some of its mathematical properties are obtained, including moments, incomplete moments, the quantile function and order statistics. Four new distributions are provided as special models of the family. The model parameters of the family are estimated by the maximum likelihood technique. An application to a real data set and a simulation study are provided to demonstrate the flexibility and interest of one special model of the suggested family. PubDate: 2019-06-01

Abstract: The Lagos annual maximum rainfall is modeled by the generalized extreme value distribution. Hydrologic risk measures such as the probability of exceedance or recurrence, the return period, and the return level are given. PubDate: 2019-06-01
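The risk measures listed in this abstract follow directly from the generalized extreme value (GEV) distribution function. The sketch below is a minimal illustration using the standard parameterization with location `mu`, scale `sigma` and shape `xi`. The parameter values are hypothetical placeholders and are not the fitted Lagos values, which the abstract does not report.

```python
import math


def gev_cdf(x, mu, sigma, xi):
    """GEV distribution function F(x) for location mu, scale sigma, shape xi."""
    if xi == 0.0:
        # Gumbel limit of the GEV family
        return math.exp(-math.exp(-(x - mu) / sigma))
    t = 1.0 + xi * (x - mu) / sigma
    if t <= 0.0:
        # x lies outside the support: below it when xi > 0, above it when xi < 0
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def exceedance_probability(x, mu, sigma, xi):
    """Probability that the annual maximum exceeds x."""
    return 1.0 - gev_cdf(x, mu, sigma, xi)


def return_period(x, mu, sigma, xi):
    """Average number of years between exceedances of level x."""
    return 1.0 / exceedance_probability(x, mu, sigma, xi)


def return_level(T, mu, sigma, xi):
    """Level exceeded on average once every T years (inverse of return_period)."""
    y = -math.log(1.0 - 1.0 / T)
    if xi == 0.0:
        return mu - sigma * math.log(y)
    return mu + sigma / xi * (y ** (-xi) - 1.0)


# Hypothetical parameters in millimetres (assumed, for illustration only).
mu, sigma, xi = 100.0, 30.0, 0.1
for T in (2, 10, 50, 100):
    print(T, round(return_level(T, mu, sigma, xi), 1))
```

Libraries may use a different sign convention for the shape parameter, so fitted values should be checked against the convention before being plugged into these formulas.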

Abstract: AdEater is an early browsing assistant that automatically removes advertisement images from internet pages. It works by generating rules from training data and applying these rules when browsing the internet. Advertisement images on web pages are replaced by transparent images that display the word “ad”; where images are misclassified, non-advertisement images on a webpage are also replaced by transparent images displaying “ad”. This paper critically examines the dataset derived from a trial of AdEater and tries to build a robust image classifier. We apply data mining techniques to uncover associations between features of advertisements and non-advertisements, and try to predict whether images are advertisements or non-advertisements using three classification methods. We achieve a classification accuracy of 96.5%, using k-fold cross validation to train and test the model. PubDate: 2019-06-01

Abstract: We introduce and study a new three-parameter lifetime distribution named the inverse power Lomax. The proposed distribution is obtained as the inverse form of the power Lomax distribution. Some statistical properties of the inverse power Lomax model are derived. Based on censored samples, maximum likelihood estimators of the model parameters are obtained. An intensive simulation study is performed to evaluate the behavior of the estimators in terms of their biases and mean square errors. The superiority of the new model over some well-known distributions is illustrated by means of real data sets. The results reveal that the suggested model can produce better fits than some well-known distributions. PubDate: 2019-06-01

Abstract: In this paper, we propose a new conjugate prior probability distribution for many likelihood distributions. In particular, we use the weighted Lindley distribution as a conjugate prior distribution. The weighted Lindley distribution can be viewed as a mixture of two gamma distributions with known weights. The weighted Lindley class of conjugate priors offers a more flexible class of priors than the class of gamma prior distributions. The results are illustrated for the problem of inference for Poisson and normal parameters. PubDate: 2019-06-01
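The mixture representation and the Poisson case mentioned in this abstract can be checked numerically. The sketch below assumes one common two-parameter form of the weighted Lindley density, proportional to x^(c-1) (1 + x) exp(-theta x). The paper's own parameterization is not given in the abstract and may differ, so the function names and the update rule are illustrative only.

```python
import math


def gamma_pdf(x, shape, rate):
    """Gamma density with the given shape and rate."""
    return rate ** shape * x ** (shape - 1) * math.exp(-rate * x) / math.gamma(shape)


def weighted_lindley_pdf(x, theta, c):
    """Weighted Lindley density in the assumed (theta, c) parameterization."""
    norm = theta ** (c + 1) / ((theta + c) * math.gamma(c))
    return norm * x ** (c - 1) * (1 + x) * math.exp(-theta * x)


def weighted_lindley_as_mixture(x, theta, c):
    """Same density written as a two-component gamma mixture with known weights."""
    w = theta / (theta + c)  # weight on Gamma(c, theta); 1 - w on Gamma(c + 1, theta)
    return w * gamma_pdf(x, c, theta) + (1 - w) * gamma_pdf(x, c + 1, theta)


def poisson_posterior(theta, c, counts):
    """Posterior (theta, c) for a Poisson mean under a weighted Lindley prior.

    Prior times likelihood is proportional to
    lam**(c + sum(counts) - 1) * (1 + lam) * exp(-(theta + n) * lam),
    which has the same functional form as the prior.
    """
    return theta + len(counts), c + sum(counts)


# Hypothetical prior and data (assumed values).
print(weighted_lindley_pdf(1.0, 2.0, 1.5), weighted_lindley_as_mixture(1.0, 2.0, 1.5))
print(poisson_posterior(2.0, 1.5, [3, 0, 2]))
```

The update only adds the sample size to `theta` and the total count to `c`, mirroring the familiar gamma-Poisson rule.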

Abstract: In this paper, a new extension of the Rayleigh distribution, called the Hyperbolic Sine-Rayleigh distribution, is introduced and studied. The proposed model is very flexible and is capable of modeling data with increasing and unimodal hazard rates. A comprehensive treatment of its mathematical properties is provided, including explicit expressions for the moments, quantiles, moment generating function, entropy and order statistics. Maximum likelihood estimates of the model parameters are obtained. Furthermore, a simulation study is conducted to assess the behavior of the maximum likelihood estimators. Finally, the superiority of the proposed model over other distributions is illustrated empirically by analyzing a real-life application. PubDate: 2019-06-01

Abstract: This paper presents a random projection scheme for cancelable iris recognition. Instead of using original iris features, masked versions of the features are generated through random projection in order to increase the security of the iris recognition system. The proposed framework for iris recognition includes iris localization, sector selection of the iris to avoid eyelid and eyelash effects, normalization, segmentation of the normalized iris region into halves, selection of the upper half for further reduction of eyelid and eyelash effects, feature extraction with a Gabor filter, and finally random projection. This framework guarantees exclusion of eyelid and eyelash effects, and masking of the original Gabor features to increase the level of security. Matching is performed with a Hamming Distance (HD) metric. The proposed framework achieves a promising recognition rate of 99.67% and a leading Equal Error Rate (EER) of 0.58%. PubDate: 2019-06-01
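The last two stages of the pipeline described above, random projection followed by Hamming-distance matching, can be sketched in a few lines. This is only an illustration of the general idea: the Gaussian projection matrix, the sign-based binarization, the user key and the feature values are all assumptions, since the abstract does not specify how the paper builds or matches its templates.

```python
import random


def random_projection_matrix(rows, cols, key):
    """Key-dependent Gaussian matrix; a new key yields a new (revocable) template."""
    rng = random.Random(key)
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def cancelable_template(features, key, out_bits):
    """Project the feature vector and keep only the sign of each projection."""
    matrix = random_projection_matrix(out_bits, len(features), key)
    projected = [sum(w * f for w, f in zip(row, features)) for row in matrix]
    return [1 if value >= 0 else 0 for value in projected]


def hamming_distance(a, b):
    """Fraction of positions at which two equal-length bit templates differ."""
    if len(a) != len(b):
        raise ValueError("templates must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Hypothetical feature vector standing in for Gabor filter responses.
enrolled = [0.8, -1.2, 0.3, 2.1, -0.7, 0.05, 1.4, -0.9]
probe = [f + 0.01 for f in enrolled]   # slightly noisy re-acquisition

stored = cancelable_template(enrolled, key=1234, out_bits=64)
query = cancelable_template(probe, key=1234, out_bits=64)
print(hamming_distance(stored, query))
```

If a stored template is compromised, issuing a new key produces a different template from the same iris features, which is the property that makes the scheme cancelable.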

Abstract: In this paper, we introduce a flexible modified beta linear exponential (MBLE) distribution. Our motivation, among others, is its usefulness in hydrology applications. To support such applications, we investigate a set of its statistical properties, such as moments, moment generating function, conditional moments, mean deviations, entropy, mean and variance of (reversed) residual life, and maximum likelihood estimators with the observed information matrix. The distribution can accommodate both decreasing and increasing hazard rates as well as upside-down bathtub and bathtub-shaped hazard rates. Moreover, several distributions arise as special cases of the distribution. The MBLE distribution and other distributions are fitted to two hydrology data sets. It is shown that the MBLE distribution is the best fit among the compared distributions based on nine goodness-of-fit statistics, among them the corrected Akaike information criterion, the Hannan–Quinn information criterion, and the Anderson–Darling and Kolmogorov–Smirnov p values. Consequently, some quantities for these data are obtained, such as the return level, conditional mean, mean deviation about the return level, and risk of failure for designing hydraulic structures. Finally, we hope that this model will find wider applicability in hydrology and other areas of life. PubDate: 2019-05-18

Abstract: In the present paper, we introduce a new lifetime distribution based on the general odd hyperbolic cosine-FG model. Some important properties of the proposed model are obtained, including the survival function, quantile function, hazard function and order statistics. In addition, estimation of the unknown parameters of this model is examined from the perspective of classical and Bayesian statistics. Moreover, an example of a real data set is studied; point and interval estimates of all parameters are obtained by maximum likelihood, bootstrap (parametric and non-parametric) and Bayesian procedures. Finally, the superiority of the proposed model, in terms of the parent exponential distribution, over other fundamental statistical distributions is shown via the example of real observations. PubDate: 2019-05-17

Abstract: A method for developing generalized parametric regression models for count data is proposed and studied. The method is based on the framework of the T-geometric family of distributions. A T-geometric family consists of discrete distributions that are analogous to the continuous distributions for the random variable T. The general methodology is applied to derive some generalized regression models for count data. These regression models can fit count data that are under-dispersed, equi-dispersed or over-dispersed. The extension to model truncated or inflated data is addressed. Some new generalized T-geometric regression models are applied to real-world data sets to illustrate the flexibility of the models. The models are fitted to four response variables from health care data and their performance is compared. No single regression model outperforms the other models for all four response variables. Thus, a researcher should evaluate different models before selecting a final regression model for a count response variable. PubDate: 2019-05-16

Abstract: We present results for Shannon entropy from environmental data, such as air temperature, relative humidity, rainfall and wind speed. We use hourly generated time-series hydrological model data covering the whole of Tasmania, a state of Australia, and employ concepts from statistical mechanics in our calculations. We also present enthalpy and heat capacitance equivalent quantities for the environment. The results capture interesting seasonal fluctuations in environmental parameters over time. Our results also indicate a slight increase in the number of microstates due to air temperature over the duration of the data considered in this work. PubDate: 2019-05-15
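For readers unfamiliar with the quantity, the sketch below shows the plain histogram estimate of Shannon entropy, H = -sum(p_i ln p_i), for a short time series. The bin width, the sample temperatures and the function name are assumptions made for illustration. The abstract does not describe how the paper discretizes its data or how it links entropy to the statistical-mechanics quantities it reports.

```python
import math
from collections import Counter


def shannon_entropy(values, bin_width):
    """Histogram estimate of Shannon entropy in nats.

    Each value is assigned to a bin of the given width; the relative bin
    frequencies p_i are then plugged into H = -sum(p_i * ln(p_i)).
    """
    bins = Counter(math.floor(v / bin_width) for v in values)
    n = len(values)
    return -sum((count / n) * math.log(count / n) for count in bins.values())


# Hypothetical hourly air temperatures in degrees Celsius (assumed values).
temperatures = [11.2, 11.8, 12.5, 14.1, 15.9, 17.3, 17.8, 16.4, 14.0, 12.2]
print(shannon_entropy(temperatures, bin_width=2.0))
```

A series that spreads over more bins, with more even occupancy, gives a larger value, which is the sense in which entropy tracks the number of accessible states.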

Abstract: Machine learning algorithms (MLAs) usually process large and complex datasets containing a substantial number of features to extract meaningful information about the target concept (a.k.a. class). In most cases, MLAs suffer from latency and computational complexity issues while processing such complex datasets due to the presence of lesser-weight (i.e., irrelevant or redundant) features. The computing time of MLAs increases explosively with the number of features, feature dependence, number of records, types of features, and nested feature categories present in such datasets. Appropriate feature selection before applying an MLA is a handy way to resolve the trade-off between computing speed and accuracy while processing large and complex datasets. However, selecting features that are sufficient, necessary, and highly correlated with the target concept is very challenging. This paper presents an efficient feature selection algorithm based on random forest to improve the performance of MLAs without sacrificing the guarantees on accuracy while processing large and complex datasets. The proposed feature selection algorithm yields unique features that are closely related to the target concept (i.e., class). The proposed algorithm significantly reduces the computing time of MLAs without degrading the accuracy much while learning the target concept from large and complex datasets. The simulation results confirm the efficacy and effectiveness of the proposed algorithm. PubDate: 2019-05-02

Abstract: Women have always faced a number of disadvantageous gaps in the labour market; changes in the status of women in labour markets throughout the world have not substantially narrowed gender gaps in the workplace. Many women in developing countries are domestic workers or informal factory workers, while others are unpaid workers in family enterprises and on family farms. Agriculture is the primary sector of women’s employment; Sub-Saharan Africa is among the regions with the highest proportion of women employed in the agriculture sector. This research was conducted on 274 sampled households with the objective of determining the factors associated with women’s employment status and examining whether the parameters of a logistic regression model estimated by Bayesian and maximum likelihood approaches are similar. The research revealed that 144 (52.6%) of the sampled women were unemployed, that is, they were not involved in any income-earning activity during the data collection. The inferential analysis using both Bayesian and maximum likelihood estimation schemes indicated that pregnancy, age, education level, husband/partner occupation, marital status, family size, training opportunity and having a child less than 5 years old had a statistically significant (p < 0.05) effect on the employment status of women. The maximum likelihood estimates and the Bayesian estimates with a non-informative prior do not differ considerably. PubDate: 2019-04-30

Abstract: A new family of continuous distributions, which ensures model flexibility, is introduced based on the Fréchet distribution and the Topp Leone-G family. Two special sub-models of the new family are discussed. We provide some distributional properties of this family in the general setting, such as the series expansions of the density, moments, generating function, stress-strength model, Rényi and Shannon entropies, probability weighted moments and order statistics. Certain characterizations of the proposed family are presented. The maximum likelihood estimates and the observed information matrix are obtained for the model parameters. We assess the performance of the maximum likelihood estimators by means of a graphical simulation study. The potential of the new class is shown via two applications to real data sets. PubDate: 2019-04-27

Abstract: Latent Dirichlet Allocation (LDA) is a topic model that represents a document as a distribution of multiple topics. It expresses each topic as a distribution of multiple words by mining semantic relationships hidden in text. However, traditional LDA ignores some of the semantic features hidden inside the document semantic structure of medium and long texts. Instead of using the original LDA to model the topic at the document level, it is better to refine the document into different semantic topic units. In this paper, we propose an improved LDA topic model based on partition (LDAP) for medium and long texts. LDAP not only preserves the benefits of the original LDA but also refines the modeled granularity from the document level to the semantic topic level, which is particularly suitable for the topic modeling of the medium and long text. The extensive experimental classification results on Fudan University corpus and Sougou Lab corpus demonstrate that LDAP achieves better performance compared with other topic models, such as LDA, HDP, LSA and doc2vec. PubDate: 2019-04-25

Abstract: In this paper, we derive the classical and Bayesian inferences for the stress–strength reliability \(R=P(X<Y)\) when the stress–strength data are available in the form of generalized order statistics (gos). It is supposed that the two random samples are mutually independent and obtained from exponential populations. Based on gos, the maximum likelihood estimator (MLE) and the uniformly minimum variance unbiased estimator (UMVUE) for R of the exponential distribution are obtained. We also construct the exact confidence interval (CI) and the asymptotic CI for R. In addition, we derive the Bayes estimator for R under the squared error loss function. A simulation study is performed to compare the performance of the MLE and the UMVUE. A Monte Carlo simulation is also carried out to compare the performance of the Bayes estimator under different priors. For illustrative purposes, a real data analysis is also provided. PubDate: 2019-04-24
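To make the target quantity concrete, the sketch below treats the simplest special case: two independent, complete exponential samples rather than generalized order statistics. For X with rate a and Y with rate b, R = P(X < Y) = a / (a + b), and substituting the rate MLEs (one over each sample mean) gives the plug-in estimate. This is not the gos-based estimator derived in the paper. The rates, sample sizes and function names are assumptions chosen for illustration.

```python
import random


def true_reliability(rate_x, rate_y):
    """R = P(X < Y) for independent exponentials with the given rates."""
    return rate_x / (rate_x + rate_y)


def stress_strength_mle(x_sample, y_sample):
    """Plug-in MLE of R from complete samples.

    The rate MLEs are 1 / mean(x) and 1 / mean(y), so
    R_hat = (1 / x_bar) / (1 / x_bar + 1 / y_bar) = y_bar / (x_bar + y_bar).
    """
    x_bar = sum(x_sample) / len(x_sample)
    y_bar = sum(y_sample) / len(y_sample)
    return y_bar / (x_bar + y_bar)


def simulate_estimate(rate_x, rate_y, n, seed):
    """Draw two samples of size n and return the plug-in estimate of R."""
    rng = random.Random(seed)
    x_sample = [rng.expovariate(rate_x) for _ in range(n)]
    y_sample = [rng.expovariate(rate_y) for _ in range(n)]
    return stress_strength_mle(x_sample, y_sample)


# Hypothetical rates: stress X with rate 3, strength Y with rate 1.
print(true_reliability(3.0, 1.0), simulate_estimate(3.0, 1.0, n=20_000, seed=1))
```

With large samples the estimate should sit close to the true value, which is the kind of comparison a simulation study of estimators formalizes through bias and mean squared error.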

Abstract: For the first time, a more detailed statistical analysis of the dependence among Nigerian inflation, the exchange rate, and stock market returns is provided by means of copulas. A positive relationship is found between Nigerian inflation and the exchange rate of the Nigerian Naira versus the USD, a negligible positive relationship exists between Nigerian inflation and Nigerian stock market returns, and a weak positive relationship exists between the Naira/USD exchange rate and Nigerian stock market returns. Eighteen-month forecasts for each of the time series and value-at-risk estimates for the Nigerian stock market returns are given. The Nigerian stock market is confirmed to be weak-form inefficient. PubDate: 2019-04-20