 Annals of Data Science
• Operational Loss Data Collection: A Literature Review
• Authors: Lu Wei; Jianping Li; Xiaoqian Zhu
Pages: 313 - 337
Abstract: This paper is the first to provide a comprehensive overview of the worldwide operational loss data collection exercises (LDCEs) for internal loss data, external loss data, scenario analysis and business environment and internal control factors (BEICFs). Based on an analysis of operational risk-related articles published from 2002 to March 2017 and a survey of a large amount of other information, the sources of operational risk data are classified into five types: individual banks, regulatory authorities, consortia of financial institutions, commercial vendors and researchers. By reviewing the operational risk databases from these five sources, we summarize and describe 32 internal databases, 26 external databases, 7 scenario databases and 1 BEICFs database. We also find that developed countries have performed relatively better than developing countries in operational risk LDCEs. In addition, the two subjective data elements, scenario analysis and BEICFs, are used less than the two objective data elements, internal and external loss data, in operational risk estimation.
PubDate: 2018-09-01
DOI: 10.1007/s40745-018-0139-2
Issue No: Vol. 5, No. 3 (2018)

• A New Approach for Improving Classification Accuracy in Predictive
Discriminant Analysis
• Authors: A. Iduseri; J. E. Osemwenkhae
Pages: 339 - 357
Abstract: The focus of predictive discriminant analysis is to improve classification accuracy, and obtaining a statistically optimal classification accuracy (hit rate) remains a challenge because of the inherent variability of most real-life datasets. Classification accuracy is usually improved by selecting the best subset of relevant predictors with classical variable selection methods. The goal of variable selection is to choose the best subset (or training sample) of relevant variables, which typically reduces model complexity, makes the model easier to interpret, improves its classification accuracy and reduces training time. However, a statistically optimal hit rate can be achieved if the training sample meets a near-optimal condition, obtained by resolving any significant differences in the variances of the groups formed by the dependent variable. This paper proposes a new approach for obtaining a near-optimal training sample that produces a statistically optimal hit rate, using a modified winsorization with a graphical diagnostic. Applied to real-life data sets, the proposed approach was able to identify and remove legitimate contaminants in one or more predictors in the training sample, thereby resolving any significant differences in the variances of the groups formed by the dependent variable. The graphical diagnostic associated with the approach also provides a useful visual tool that serves as an alternative graphical test for homogeneity of variances.
PubDate: 2018-09-01
DOI: 10.1007/s40745-018-0140-9
Issue No: Vol. 5, No. 3 (2018)
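
The abstract does not spell out the modified winsorization, so the following is only a minimal Python sketch of classical symmetric winsorization of one predictor within each group, followed by a simple largest-to-smallest variance ratio as a rough check on homogeneity of variances. The trimming proportion, the function names and the data values are illustrative assumptions, not the authors' procedure or data.

```python
import statistics

def winsorize(values, proportion=0.05):
    """Classical symmetric winsorization: clip each tail to the k-th order statistic."""
    ordered = sorted(values)
    n = len(ordered)
    k = int(proportion * n)  # number of observations replaced in each tail
    lower, upper = ordered[k], ordered[n - 1 - k]
    return [min(max(v, lower), upper) for v in values]

def variance_ratio(groups):
    """Largest over smallest group variance; values near 1 suggest homogeneous variances."""
    variances = [statistics.variance(g) for g in groups]
    return max(variances) / min(variances)

# Hypothetical predictor values for the two groups formed by the dependent variable.
group_a = [4.1, 3.9, 4.3, 4.0, 4.2, 4.1, 3.8, 4.0, 4.2, 12.0]
group_b = [4.0, 4.2, 3.9, 4.1, 4.3, 3.8, 4.0, 4.1, 4.2, 3.9]

before = variance_ratio([group_a, group_b])
after = variance_ratio([winsorize(group_a, 0.1), winsorize(group_b, 0.1)])
print(f"variance ratio before: {before:.1f}, after: {after:.2f}")
```

With one extreme value in `group_a`, the ratio drops sharply once both groups are winsorized; the paper's graphical diagnostic would replace this single number.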

• Classifying Categories of SCADA Attacks in a Big Data Framework
• Authors: Krishna Madhuri Paramkusem; Ramazan S. Aygun
Pages: 359 - 386
PubDate: 2018-09-01
DOI: 10.1007/s40745-018-0141-8
Issue No: Vol. 5, No. 3 (2018)

• On Some Further Properties and Application of Weibull-R Family of
Distributions
• Authors: Indranil Ghosh; Saralees Nadarajah
Pages: 387 - 399
Abstract: In this paper, we provide some new results for the Weibull-R family of distributions (Alzaghal et al. in Int J Stat Probab 5:139–149, 2016). We derive some new structural properties of the Weibull-R family of distributions. We provide various characterizations of the family via conditional moments, functions of order statistics and record values.
PubDate: 2018-09-01
DOI: 10.1007/s40745-018-0142-7
Issue No: Vol. 5, No. 3 (2018)

• A Family of Generalised Beta Distributions: Properties and Applications
• Authors: Emilio Gómez-Déniz; José María Sarabia
Pages: 401 - 420
Abstract: A family of continuous distributions with bounded support, which is a generalisation of the standard beta distribution, is introduced. We study some basic properties of the new family and simulation experiments are performed to observe the behaviour of the maximum likelihood estimators. We also derive a multivariate version of the proposed distributions. Three numerical experiments were performed to determine the flexibility of the new family of distributions in comparison with other extensions of the beta distribution that have been proposed. In this respect, the new family was found to be superior.
PubDate: 2018-09-01
DOI: 10.1007/s40745-018-0143-6
Issue No: Vol. 5, No. 3 (2018)

• A New Family of Generalized Distributions Based on Alpha Power
Transformation with Application to Cancer Data
• Authors: M. Nassar; A. Alzaatreh; O. Abo-Kasem; M. Mead; M. Mansoor
Pages: 421 - 436
Abstract: In this paper, we propose a new method for generating distributions based on the idea of the alpha power transformation introduced by Mahdavi and Kundu (Commun Stat Theory Methods 46(13):6543–6557, 2017). The new method can be applied to any distribution by inverting its quantile function as a function of the alpha power transformation. We apply the proposed method to the Weibull distribution to obtain a three-parameter alpha power within Weibull quantile function. The new distribution has very flexible density and hazard rate shapes, which are useful in cancer research: the hazard rate function can be increasing, decreasing, bathtub-shaped or upside-down bathtub-shaped. We derive some general properties of the proposed distribution, including moments, the moment generating function, quantiles and Shannon entropy. The parameters are estimated by maximum likelihood. We illustrate the applicability of the proposed distribution with complete and censored cancer data sets.
PubDate: 2018-09-01
DOI: 10.1007/s40745-018-0144-5
Issue No: Vol. 5, No. 3 (2018)
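
For reference, the original alpha power transformation of Mahdavi and Kundu maps a base CDF $$F$$ to $$(\alpha^{F(x)} - 1)/(\alpha - 1)$$ for $$\alpha > 0,\ \alpha \neq 1$$, and reduces to $$F$$ when $$\alpha = 1$$. The Python sketch below applies that transformation to a Weibull base CDF and inverts it in closed form. It is not the new quantile-based generator proposed in this paper, and the function names and the (shape, scale) Weibull parametrization are assumptions.

```python
import math

def weibull_cdf(x, shape, scale):
    """Base Weibull CDF F(x) = 1 - exp(-(x/scale)^shape) for x > 0."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / scale) ** shape))

def apt_cdf(x, alpha, shape, scale):
    """Alpha power transformation of the Weibull CDF: (alpha^F - 1) / (alpha - 1)."""
    f = weibull_cdf(x, shape, scale)
    if alpha == 1.0:
        return f  # the transformation reduces to the base CDF when alpha = 1
    return (alpha ** f - 1.0) / (alpha - 1.0)

def apt_quantile(u, alpha, shape, scale):
    """Invert apt_cdf for 0 <= u < 1: recover F, then apply the Weibull quantile."""
    f = u if alpha == 1.0 else math.log(1.0 + u * (alpha - 1.0)) / math.log(alpha)
    return scale * (-math.log(1.0 - f)) ** (1.0 / shape)

print(apt_quantile(0.5, alpha=3.0, shape=1.5, scale=2.0))  # median of one APT-Weibull
```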

• Region Based Instance Document (RID) Approach Using Compression Features
• Authors: N. V. Ganapathi Raju; Someswara Rao Chinta
Pages: 437 - 451
Abstract: Authorship attribution is concerned with identifying the authors of disputed or anonymous documents, which can be important in legal and criminal/civil cases, threatening letters and terrorist communications, and also in computer forensics. There are two basic approaches to authorship attribution: instance-based (each training text is treated individually) and profile-based (training texts are treated cumulatively). Both have their own advantages and disadvantages. The present paper proposes a new region-based document model for authorship identification that addresses the dimensionality problem of instance-based approaches and the scalability problem of profile-based approaches. The proposed model concatenates a set of ‘n’ individual instance documents of an author into a single region based instance document (RID), and a compression-based similarity distance is then computed on the RID. Compression-based methods require no pre-processing and are easy to apply. This paper uses the Gzip compression algorithm with two compression-based similarity measures, NCD and CDM. The proposed compression model is character-based and can automatically capture non-word features such as word stems and punctuation. The only disadvantage of compression models is their high complexity; the proposed RID approach addresses this issue by reducing the repeated words in the document. The approach is evaluated on English editorial columns and achieves approximately 98% accuracy in identifying the author.
PubDate: 2018-09-01
DOI: 10.1007/s40745-018-0145-4
Issue No: Vol. 5, No. 3 (2018)
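
The two compression-based measures named in the abstract can be sketched in Python with the standard `gzip` module, using the usual definitions NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)) and CDM(x, y) = C(xy) / (C(x) + C(y)), where C is compressed length in bytes. Building the RID by plain concatenation and attributing a text to the author with the smallest distance are simplifying assumptions; the paper's reduction of repeated words is not reproduced, and the helper names and sample texts are hypothetical.

```python
import gzip

def csize(data: bytes) -> int:
    """Compressed length C(.) under gzip."""
    return len(gzip.compress(data))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance."""
    cx, cy, cxy = csize(x), csize(y), csize(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

def cdm(x: bytes, y: bytes) -> float:
    """Compression-based dissimilarity measure."""
    return csize(x + y) / (csize(x) + csize(y))

def build_rid(documents):
    """Concatenate one author's instance documents into a single region-based document."""
    return "\n".join(documents).encode("utf-8")

def attribute(unknown: str, corpus: dict, measure=ncd) -> str:
    """Return the author whose RID is closest to the unknown text."""
    target = unknown.encode("utf-8")
    rids = {author: build_rid(docs) for author, docs in corpus.items()}
    return min(rids, key=lambda author: measure(target, rids[author]))

corpus = {
    "A": ["the old sailors sang songs of the harbour while the sea was calm",
          "at dawn the sailors mended the nets and watched the calm grey sea",
          "the harbour lights faded as the old ship left for the open sea"],
    "B": ["the compiler rejected the function because the type signature was wrong",
          "after refactoring the parser the unit tests passed on every platform",
          "the build server cached the dependencies and the compiler ran faster"],
}
unknown = "the sailors watched the calm sea and sang old songs of the harbour"
print(attribute(unknown, corpus), attribute(unknown, corpus, measure=cdm))
```

DEFLATE, the algorithm behind gzip, only looks back 32 KB, so very long RIDs would need a compressor with a larger window for the concatenation to exploit shared material.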

• Development of Optimal ANN Model to Estimate the Thermal Performance of
Roughened Solar Air Heater Using Two Different Learning Algorithms
Pages: 453 - 467
Abstract: In the present study, an artificial neural network (ANN) model was developed with two different training algorithms to predict the thermal efficiency of a wire-rib-roughened solar air heater. A total of 50 data sets were obtained from experiments with three different types of absorber plate. The experimental data and the calculated values of collector efficiency were used to develop the ANN model, which was trained with the scaled conjugate gradient (SCG) and Levenberg–Marquardt (LM) learning algorithms. On the basis of statistical error analysis, TRAINLM with 6 neurons and TRAINSCG with 7 neurons were found to be the optimal models. The performance of both models was compared with actual data, and TRAINLM was found to perform better than TRAINSCG. The coefficient of determination ($$R^{2}$$) for LM-6 is 0.99882, indicating satisfactory performance. The proposed MLP ANN model trained with the LM algorithm appears more reliable for predicting the performance of the solar air heater.
PubDate: 2018-09-01
DOI: 10.1007/s40745-018-0146-3
Issue No: Vol. 5, No. 3 (2018)
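
The abstract reports $$R^2$$ from a statistical error analysis without giving formulas. A minimal Python sketch of two common measures follows, assuming the definition $$R^2 = 1 - SS_{res}/SS_{tot}$$ together with root-mean-square error; other definitions, such as the squared correlation between actual and predicted values, can give different numbers, and the function names are hypothetical.

```python
import math

def r_squared(actual, predicted):
    """Coefficient of determination 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def rmse(actual, predicted):
    """Root-mean-square error between measured and predicted values."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
```

Comparing, say, a 6-neuron LM network and a 7-neuron SCG network would then amount to evaluating both functions on the same held-out efficiency measurements.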

• $$\ell_1$$-Norm Based Central Point Analysis for Asymmetric Radial
Data
• Authors: Qi An; Shu-Cherng Fang; Tiantian Nie; Shan Jiang
Pages: 469 - 486
Abstract: Multivariate asymmetric radial data clouds with irregularly positioned “spokes” and “clutters” are common in real-life applications. In identifying the spoke directions of such data, a key initial step is to locate a central point from which each spoke extends and diverges. In this technical note, we propose a novel method that uses a preselection procedure to select candidate points that have sufficiently many data points in their vicinity, and then identifies the central point by solving an $$\ell_1$$-norm constrained discrete optimization program. Extensive numerical experiments show that the proposed method provides central points with superior accuracy and robustness compared with other known methods and is computationally efficient to implement.
PubDate: 2018-09-01
DOI: 10.1007/s40745-018-0147-2
Issue No: Vol. 5, No. 3 (2018)
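
The abstract does not give the optimization program itself, so the following Python sketch substitutes a simpler, clearly different stand-in: preselect points with at least `min_neighbors` other points within an $$\ell_1$$ radius, then choose the candidate that minimizes the sum of $$\ell_1$$ distances to all points (a discrete $$\ell_1$$-medoid). The radius, the neighbor threshold and the toy spokes are assumptions for illustration only.

```python
def l1(a, b):
    """l1 (Manhattan) distance between two points."""
    return sum(abs(x - y) for x, y in zip(a, b))

def preselect(points, radius, min_neighbors):
    """Keep points with at least `min_neighbors` other points within the l1 radius."""
    chosen = []
    for i, p in enumerate(points):
        neighbors = sum(1 for j, q in enumerate(points) if j != i and l1(p, q) <= radius)
        if neighbors >= min_neighbors:
            chosen.append(p)
    return chosen

def central_point(points, radius, min_neighbors):
    """Among preselected candidates, return the one with the smallest total l1 distance."""
    candidates = preselect(points, radius, min_neighbors) or points
    return min(candidates, key=lambda c: sum(l1(c, q) for q in points))

# Toy asymmetric radial cloud: spokes of different lengths plus one isolated clutter point.
points = ([(0.0, 0.0)]
          + [(float(k), 0.0) for k in range(1, 6)]
          + [(0.0, float(k)) for k in range(1, 4)]
          + [(-float(k), -float(k)) for k in range(1, 3)]
          + [(-8.0, 9.0)])
print(central_point(points, radius=1.5, min_neighbors=2))
```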

• Enhancing Situation Awareness Using Semantic Web Technologies and Complex
Event Processing
• Authors: Havva Alizadeh Noughabi; Mohsen Kahani; Alireza Shakibamanesh
Pages: 487 - 496
Abstract: Data fusion techniques combine raw data from multiple sources and associate related data to achieve more specific inferences than could be attained with a single source. Situational awareness is one of the levels of the JDL model, a mature information fusion model. The aim of situational awareness is to understand the developing relationships of interest between entities within a specific time and space. The present research shows how semantic web technologies, i.e. ontologies and semantic reasoners, can be used to describe situations and increase situation awareness. Because the situation awareness level receives data streams from numerous distributed sources, these streams must be managed with data stream processing engines such as Esper. In addition, this research uses complex event processing, a technique for detecting related situations in real time whose main aim is to automatically generate actionable abstractions from event streams. The proposed approach combines complex event processing and semantic web technologies to achieve better situational awareness. Some simple examples are discussed to show the functionality of the proposed approach in practice.
PubDate: 2018-09-01
DOI: 10.1007/s40745-018-0148-1
Issue No: Vol. 5, No. 3 (2018)
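
Esper expresses event patterns in its own query language, which is not reproduced here. As a language-neutral illustration of the complex event processing idea, the Python sketch below raises a composite situation when one entity emits a given event type at least `threshold` times within a sliding time window. The rule, the event names and the class are hypothetical, and the ontology and reasoner layer described in the abstract is not modeled.

```python
from collections import defaultdict, deque

class SlidingWindowRule:
    """Emit a composite situation when one entity produces `threshold` events
    of `event_type` within `window` seconds (a hypothetical rule)."""

    def __init__(self, event_type, threshold, window):
        self.event_type = event_type
        self.threshold = threshold
        self.window = window
        self.recent = defaultdict(deque)  # entity -> timestamps still inside the window

    def on_event(self, entity, event_type, timestamp):
        if event_type != self.event_type:
            return None
        times = self.recent[entity]
        times.append(timestamp)
        # Evict timestamps that have slid out of the time window.
        while times and timestamp - times[0] > self.window:
            times.popleft()
        if len(times) >= self.threshold:
            times.clear()  # reset so that one burst yields one situation
            return {"situation": f"repeated_{event_type}", "entity": entity, "at": timestamp}
        return None

rule = SlidingWindowRule("door_forced", threshold=3, window=10.0)
for t in (0.0, 4.0, 8.0):
    print(rule.on_event("gate-1", "door_forced", t))
```

In the paper's architecture, a situation emitted this way would be the kind of abstraction passed on to the semantic layer for reasoning.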

• A Primer on a Flexible Bivariate Time Series Model for Analyzing First and
Second Half Football Goal Scores: The Case of the Big 3 London Rivals in
the EPL
• Authors: Yuvraj Sunecher; Naushad Mamode Khan; Vandna Jowaheer; Marcelo Bourguignon; Mohammad Arashi
Abstract: The ranking of English Premier League (EPL) clubs during the football season is of keen interest to many stakeholders, with special attention to the London rivals Arsenal, Chelsea and Tottenham. In particular, the first-half (GF) and second-half (GS) scores, besides being inter-related, are perceived as a convenient measure of a club's potential. This paper studies the contributory effects of the factors that commonly influence a club's scoring capacity in each half, along with forecasting diagnostics, via a novel flexible bivariate time series model with COM-Poisson innovations, using data from August 2014 to December 2017.
PubDate: 2018-09-11
DOI: 10.1007/s40745-018-0180-1
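
The COM-Poisson distribution used for the innovations has probability mass function $$P(Y=y) = \lambda^y / \big((y!)^\nu Z(\lambda,\nu)\big)$$ with normalizing constant $$Z(\lambda,\nu) = \sum_{j \ge 0} \lambda^j/(j!)^\nu$$, and $$\nu = 1$$ gives the Poisson distribution. The Python sketch below evaluates it with a truncated series in log space. The truncation length is an assumption (small $$\nu$$ with large $$\lambda$$ needs more terms), and the bivariate time series structure of the paper is not shown.

```python
import math

def com_poisson_log_z(lam, nu, terms=200):
    """log of Z(lam, nu) = sum_j lam^j / (j!)^nu, truncated after `terms` terms."""
    logs = [j * math.log(lam) - nu * math.lgamma(j + 1) for j in range(terms)]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))  # log-sum-exp

def com_poisson_pmf(y, lam, nu, terms=200):
    """P(Y = y) = lam^y / ((y!)^nu Z(lam, nu)) for lam > 0, nu > 0."""
    log_p = y * math.log(lam) - nu * math.lgamma(y + 1) - com_poisson_log_z(lam, nu, terms)
    return math.exp(log_p)

# nu < 1 gives over-dispersion relative to Poisson, nu > 1 under-dispersion.
print([round(com_poisson_pmf(y, 1.5, 0.8), 4) for y in range(5)])
```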

• Treatment Effect Decomposition and Bootstrap Hypothesis Testing in
Observational Studies
• Authors: Hee Youn Kwon; Jason J. Sauppe; Sheldon H. Jacobson
Abstract: Causal inference with observational data has drawn attention across various fields. Observational studies typically use matching methods that find matched pairs with similar covariate values. However, matching methods may not directly achieve covariate balance, a measure of matching effectiveness. As an alternative, the Balance Optimization Subset Selection (BOSS) framework, which seeks optimal covariate balance directly, has been proposed. This paper extends BOSS by estimating a treatment effect and decomposing it into a combination of heterogeneous treatment effects from a partitioned set. Our method differs from the traditional propensity score subclassification method in that we find a subset in each subclass using BOSS instead of using the stratum determined by the propensity score. Then, by conducting a bootstrap hypothesis test on each component, we check the statistical significance of these treatment effects. These methods are applied to a dataset from the National Supported Work Demonstration (NSW) program, which was conducted in the 1970s. By examining statistical significance, we show that the program was not significantly effective for a specific subgroup composed of those who were already employed. This differs from the combined estimate, according to which the NSW program was effective when all individuals are considered. Lastly, we provide results obtained when these steps are repeated with sub-samples.
PubDate: 2018-09-08
DOI: 10.1007/s40745-018-0179-7
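
The bootstrap test applied to each treatment effect component can be illustrated, under simplifying assumptions, with a two-sided bootstrap test for a difference in means in which both groups are first shifted to the pooled mean so that resampling happens under the null hypothesis. This Python sketch assumes the treated and control outcomes for one subgroup are already available; it does not implement BOSS subset selection, and the function name and data are hypothetical.

```python
import random
import statistics

def bootstrap_mean_diff_test(treated, control, n_boot=5000, seed=0):
    """Two-sided bootstrap test of H0: equal means; returns (effect, p_value)."""
    rng = random.Random(seed)
    mean_t, mean_c = statistics.fmean(treated), statistics.fmean(control)
    observed = mean_t - mean_c
    pooled = statistics.fmean(treated + control)
    # Impose H0 by shifting both groups onto the pooled mean before resampling.
    t0 = [x - mean_t + pooled for x in treated]
    c0 = [x - mean_c + pooled for x in control]
    extreme = 0
    for _ in range(n_boot):
        diff = (statistics.fmean(rng.choices(t0, k=len(t0)))
                - statistics.fmean(rng.choices(c0, k=len(c0))))
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_boot + 1)

control = [10.0, 12.0, 11.0, 13.0, 9.0, 10.0, 12.0, 11.0]
treated = [x + 5.0 for x in control]
print(bootstrap_mean_diff_test(treated, control))
```

Running the same test separately on each subclass gives one p-value per component, which is how a non-significant subgroup can coexist with a significant pooled effect.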

• Two-Stage Composition of Probabilistic Preferences
• Authors: Annibal Parracho Sant’Anna; Gilson Brito Alves Lima; Leonardo Augusto da Fonseca Parrach Sant’Anna; Luiz Octávio Gavião
Abstract: This paper proposes a multicriteria decision-aid strategy in which all alternatives are initially accepted and then automatically classified. In the next step, this classification is used to restrict the comparisons employed to choose the best alternative to the alternatives in the upper class. A procedure to perform the classification through successive divisions is proposed. The paper also studies a change to the calculation of the preference score according to each criterion, a central step in the Composition of Probabilistic Preferences: replacing the probability of being preferable simultaneously to all alternatives with the average of the probabilities of being preferable to each one. A further development is the calculation of preference probabilities from empirical cumulative distributions derived from the observed preference counts. Procedures for putting each of these proposals into practice are presented, and the results of applying them to different practical situations are discussed.
PubDate: 2018-08-29
DOI: 10.1007/s40745-018-0177-9
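
The replacement studied in the paper can be illustrated with a Monte Carlo sketch that computes, for one criterion, both the probability of being preferable to all alternatives simultaneously and the average of the pairwise probabilities of being preferable to each one. Treating each observed evaluation as the mean of an independent normal disturbance with a common standard deviation is an assumption made here for illustration; the paper's empirical-distribution variant and its classification step are not reproduced.

```python
import random

def preference_probabilities(values, sd=1.0, n_draws=10000, seed=1):
    """Monte Carlo preference probabilities for one criterion (larger is better).

    Returns (joint, average): joint[i] estimates P(alternative i beats all others
    simultaneously); average[i] is the mean over j != i of P(i beats j)."""
    rng = random.Random(seed)
    n = len(values)
    best_counts = [0] * n
    pair_wins = [0] * n  # total pairwise wins of each alternative across draws
    for _ in range(n_draws):
        # Treat each observed evaluation as the centre of a normal disturbance (assumption).
        draw = [rng.gauss(v, sd) for v in values]
        best_counts[max(range(n), key=draw.__getitem__)] += 1
        for i in range(n):
            pair_wins[i] += sum(1 for j in range(n) if j != i and draw[i] > draw[j])
    joint = [c / n_draws for c in best_counts]
    average = [w / ((n - 1) * n_draws) for w in pair_wins]
    return joint, average

joint, average = preference_probabilities([1.0, 2.0, 5.0])
print(joint, average)
```

The joint probability concentrates on the leader, while the pairwise average keeps more information about how the weaker alternatives compare with one another.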

• Cubic Transmuted Pareto Distribution
• Authors: Md. Mahabubur Rahman; Bander Al-Zahrani; Muhammad Qaiser Shahbaz
Abstract: In this article, we propose the cubic transmuted Pareto distribution, using the cubic transmuted family of distributions introduced by Rahman et al. (in Pak J Stat Oper Res 14:451–469, 2018). We explore the distribution in detail and study its statistical properties. Parameter estimation for the distribution is discussed, and the performance of the estimators is assessed through an extensive simulation study. Finally, the cubic transmuted Pareto distribution is fitted to two real datasets to investigate its applicability.
PubDate: 2018-08-29
DOI: 10.1007/s40745-018-0178-8
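
A minimal Python sketch of the cubic transmuted Pareto CDF and density follows. It assumes the cubic transmutation $$F(x) = (1+\lambda_1)G(x) + (\lambda_2-\lambda_1)G(x)^2 - \lambda_2 G(x)^3$$ and the Pareto baseline $$G(x) = 1-(\beta/x)^\alpha$$ for $$x \ge \beta$$. Both forms should be checked against Rahman et al. before use, and the parameter constraints that keep the density non-negative are not enforced here.

```python
def pareto_cdf(x, alpha, beta):
    """Pareto baseline G(x) = 1 - (beta/x)^alpha for x >= beta."""
    return 0.0 if x < beta else 1.0 - (beta / x) ** alpha

def pareto_pdf(x, alpha, beta):
    """Pareto baseline density g(x) = alpha beta^alpha / x^(alpha+1)."""
    return 0.0 if x < beta else alpha * beta ** alpha / x ** (alpha + 1)

def ct_pareto_cdf(x, alpha, beta, lam1, lam2):
    """Assumed cubic transmutation of the Pareto CDF."""
    g = pareto_cdf(x, alpha, beta)
    return (1 + lam1) * g + (lam2 - lam1) * g ** 2 - lam2 * g ** 3

def ct_pareto_pdf(x, alpha, beta, lam1, lam2):
    """Derivative of ct_pareto_cdf: g(x) times the derivative of the cubic in G."""
    g = pareto_cdf(x, alpha, beta)
    return pareto_pdf(x, alpha, beta) * ((1 + lam1) + 2 * (lam2 - lam1) * g - 3 * lam2 * g ** 2)
```

Setting both transmutation parameters to zero recovers the Pareto distribution, which is a quick sanity check when fitting.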

• Type II Half Logistic Exponential Distribution with Applications
• Authors: M. Elgarhy; Muhammad Ahsan ul Haq; Ismat Perveen
Abstract: We define and study a new distribution called the Type II half logistic exponential (TIIHLE) distribution. Several mathematical properties of the TIIHLE distribution are investigated, including moments, probability weighted moments, mean deviation, the quantile function and Rényi entropy. Expressions for the order statistics are derived. The parameters of the distribution are estimated by maximum likelihood. The usefulness of the proposed distribution is illustrated with two datasets.
PubDate: 2018-08-20
DOI: 10.1007/s40745-018-0175-y
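
A Python sketch of the TIIHLE CDF, its closed-form quantile and inverse-transform sampling follows, assuming the Type II half logistic generator $$F(x) = 2G(x)^\lambda / (1 + G(x)^\lambda)$$ with exponential baseline $$G(x) = 1 - e^{-\theta x}$$. These forms are assumptions to be checked against the paper. Solving $$u = 2t/(1+t)$$ for $$t = G^\lambda$$ gives $$t = u/(2-u)$$, which is what the quantile function uses.

```python
import math
import random

def tiihle_cdf(x, lam, theta):
    """Assumed TIIHLE CDF 2 G^lam / (1 + G^lam) with G(x) = 1 - exp(-theta x)."""
    if x <= 0:
        return 0.0
    t = (1.0 - math.exp(-theta * x)) ** lam
    return 2.0 * t / (1.0 + t)

def tiihle_quantile(u, lam, theta):
    """Invert the CDF for 0 <= u < 1 via t = u / (2 - u), then the exponential quantile."""
    g = (u / (2.0 - u)) ** (1.0 / lam)
    return -math.log(1.0 - g) / theta

def tiihle_sample(n, lam, theta, seed=0):
    """Inverse-transform sampling, e.g. for simulation studies of the MLE."""
    rng = random.Random(seed)
    return [tiihle_quantile(rng.random(), lam, theta) for _ in range(n)]

print(tiihle_sample(5, lam=1.7, theta=0.8))
```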

• On the Beta-G Poisson Family
• Authors: Gokarna R. Aryal; Sher B. Chhetri; Hongwei Long; Alfred A. Akinsete
Abstract: In this article, we propose and study a new family of distributions which is defined by using the genesis of the truncated Poisson distribution and the beta distribution. Some mathematical properties of the new family, including moments, quantile and generating functions, mean deviations, order statistics and their moments, and reliability analysis, are discussed. We also discuss parameter estimation procedures and potential applications of such a generalized family of distributions.
PubDate: 2018-08-18
DOI: 10.1007/s40745-018-0176-x

• Inverse Gompertz Distribution: Properties and Different Estimation Methods
with Application to Complete and Censored Data
• Authors: M. S. Eliwa; M. El-Morshedy; Mohamed Ibrahim
Abstract: In this article, we introduce the two-parameter inverse Gompertz distribution. Several statistical properties are presented, including the hazard rate function, quantiles, probability weighted moments, skewness, kurtosis, entropies, mean residual lifetime and mean inactive lifetime. The model parameters are estimated by maximum likelihood, bootstrap, least squares, weighted least squares and Cramér-von Mises methods. Further, Monte Carlo simulations are carried out to compare the long-run performance of the estimators based on complete and type II right-censored data. Finally, we estimate the parameters for a behavioral sciences dataset and for censored data on the fatigue life, in hours, of 10 bearings of a certain type, showing that the model fits these data better than some competing models.
PubDate: 2018-08-17
DOI: 10.1007/s40745-018-0173-0
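
A Python sketch of the inverse Gompertz CDF, density, hazard and quantile follows. It assumes the parametrization obtained as the distribution of $$1/Y$$ when $$Y$$ is Gompertz with CDF $$1 - \exp\{-(\alpha/\beta)(e^{\beta y}-1)\}$$, which gives $$F(x) = \exp\{-(\alpha/\beta)(e^{\beta/x}-1)\}$$ for $$x > 0$$; check this against the paper before relying on it. Closed-form inversion makes the quantile convenient for the kind of Monte Carlo simulation mentioned in the abstract.

```python
import math

def ig_cdf(x, alpha, beta):
    """Inverse Gompertz CDF F(x) = exp(-(alpha/beta) * (exp(beta/x) - 1)), x > 0."""
    if x <= 0:
        return 0.0
    return math.exp(-(alpha / beta) * math.expm1(beta / x))

def ig_pdf(x, alpha, beta):
    """Derivative of ig_cdf: (alpha / x^2) * exp(beta/x) * F(x)."""
    if x <= 0:
        return 0.0
    return alpha / x ** 2 * math.exp(beta / x) * ig_cdf(x, alpha, beta)

def ig_hazard(x, alpha, beta):
    """Hazard rate f(x) / (1 - F(x))."""
    return ig_pdf(x, alpha, beta) / (1.0 - ig_cdf(x, alpha, beta))

def ig_quantile(u, alpha, beta):
    """Solve F(x) = u for 0 < u < 1: x = beta / log(1 - (beta/alpha) * log(u))."""
    return beta / math.log1p(-(beta / alpha) * math.log(u))
```

For very small `x`, `exp(beta / x)` overflows in floating point; a production version would work on the log scale.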

• An Alternative Conjugate Prior Distribution for Positive Parameters
• Authors: Marcelo Bourguignon
Abstract: In this paper, we propose a new conjugate prior distribution for many likelihood functions. In particular, we use the weighted Lindley distribution as a conjugate prior distribution. The weighted Lindley distribution can be viewed as a mixture of two gamma distributions with known weights. As a class of conjugate priors, the weighted Lindley distribution is more flexible than the class of gamma prior distributions. The results are illustrated for the problem of inference for Poisson and normal parameters.
PubDate: 2018-08-16
DOI: 10.1007/s40745-018-0174-z
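
Assuming the weighted Lindley density $$f(\theta) = \lambda^{\phi+1}\theta^{\phi-1}(1+\theta)e^{-\lambda\theta} / \big((\lambda+\phi)\Gamma(\phi)\big)$$, it is the mixture of Gamma($$\phi$$, rate $$\lambda$$) and Gamma($$\phi+1$$, rate $$\lambda$$) with weights $$\lambda/(\lambda+\phi)$$ and $$\phi/(\lambda+\phi)$$. Multiplying it by a Poisson likelihood kernel $$\theta^{S}e^{-n\theta}$$, where $$S$$ is the sum of $$n$$ counts, leaves a kernel of the same form, so the posterior is weighted Lindley with parameters $$(\lambda+n, \phi+S)$$. The Python sketch below encodes that update, which is derived here from the assumed density; the normal-parameter case is not shown.

```python
import math

def wl_pdf(x, lam, phi):
    """Weighted Lindley density lam^(phi+1) x^(phi-1) (1+x) e^(-lam x) / ((lam+phi) Gamma(phi))."""
    if x <= 0:
        return 0.0
    log_c = (phi + 1) * math.log(lam) - math.log(lam + phi) - math.lgamma(phi)
    return math.exp(log_c + (phi - 1) * math.log(x) - lam * x) * (1 + x)

def wl_mean(lam, phi):
    """Mean of the gamma mixture: phi (lam + phi + 1) / (lam (lam + phi))."""
    return phi * (lam + phi + 1) / (lam * (lam + phi))

def poisson_posterior(lam, phi, counts):
    """Conjugate update for a Poisson rate under a weighted Lindley(lam, phi) prior."""
    return lam + len(counts), phi + sum(counts)

post = poisson_posterior(1.0, 2.0, [1, 2, 3])
print(post, wl_mean(*post))  # posterior parameters and posterior mean of the rate
```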

• A Modified Cancelable Biometrics Scheme Using Random Projection
• Authors: Randa F. Soliman; Mohamed Amin; Fathi E. Abd El-Samie
Abstract: This paper presents a random projection scheme for cancelable iris recognition. Instead of using the original iris features, masked versions of the features are generated through random projection in order to increase the security of the iris recognition system. The proposed iris recognition framework includes iris localization; selection of an iris sector to avoid eyelid and eyelash effects; normalization; segmentation of the normalized iris region into halves and selection of the upper half to further reduce eyelid and eyelash effects; feature extraction with a Gabor filter; and, finally, random projection. This framework guarantees exclusion of eyelid and eyelash effects and masks the original Gabor features to increase the level of security. Matching is performed with a Hamming Distance (HD) metric. The proposed framework achieves a promising recognition rate of 99.67% and a leading Equal Error Rate (EER) of 0.58%.
PubDate: 2018-08-13
DOI: 10.1007/s40745-018-0172-1
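
The masking step can be sketched in Python as a key-seeded Gaussian random projection of a real-valued feature vector, followed by sign binarization so that templates can be compared with a normalized Hamming distance. The sign binarization, the template length and the key handling are assumptions; the Gabor feature extraction and the paper's exact projection are not reproduced. Issuing a new key yields a largely different template from the same features, which is the point of cancelability.

```python
import random

def projection_matrix(key, n_out, n_in):
    """Gaussian random projection matrix seeded by a user-specific key (hypothetical)."""
    rng = random.Random(key)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]

def cancelable_template(features, key, n_out=256):
    """Project real-valued features and keep only the signs, giving a binary template."""
    matrix = projection_matrix(key, n_out, len(features))
    projected = [sum(w * f for w, f in zip(row, features)) for row in matrix]
    return [1 if v >= 0.0 else 0 for v in projected]

def hamming_distance(a, b):
    """Fraction of differing bits between two equal-length binary templates."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

rng = random.Random(42)
enrolled = [rng.gauss(0.0, 1.0) for _ in range(64)]      # stand-in for Gabor features
probe = [f + rng.gauss(0.0, 0.05) for f in enrolled]     # a noisy second capture
print(hamming_distance(cancelable_template(enrolled, 1234), cancelable_template(probe, 1234)))
```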

• The Generalized Burr XII Power Series Distributions with Properties and
Applications
• Authors: Ibrahim Elbatal; Emrah Altun; Ahmed Z. Afify; Gamze Ozel
Abstract: We define and study a new family of distributions, called the generalized Burr XII power series class, obtained by compounding the generalized Burr XII and power series distributions. Several properties of the new family are derived. The model parameters are estimated by maximum likelihood. The usefulness of the new family is illustrated by means of three applications to real data sets.
PubDate: 2018-08-04
DOI: 10.1007/s40745-018-0171-2

