Authors:Lu Wei; Jianping Li; Xiaoqian Zhu Pages: 313 - 337 Abstract: This paper is the first to provide a comprehensive overview of the worldwide operational loss data collection exercises (LDCEs) of internal loss, external loss, scenario analysis and business environment and internal control factors (BEICFs). Based on analyzing operational risk-related articles from 2002 to March 2017 and surveying a large amount of other information, various sources of operational risk data are classified into five types, i.e. individual banks, regulatory authorities, consortia of financial institutions, commercial vendors and researchers. Then by reviewing operational risk databases from these five data sources, we summarized and described 32 internal databases, 26 external databases, 7 scenario databases and 1 BEICFs database. We also find that compared with developing countries, developed countries have performed relatively better in operational risk LDCEs. Besides, the two subjective data elements of scenario analysis and BEICFs are less used than the two objective data elements of internal and external loss data in operational risk estimation. PubDate: 2018-09-01 DOI: 10.1007/s40745-018-0139-2 Issue No:Vol. 5, No. 3 (2018)

Authors:A. Iduseri; J. E. Osemwenkhae Pages: 339 - 357 Abstract: The focus of a predictive discriminant analysis is to improve classification accuracy, yet obtaining a statistically optimal classification accuracy, or hit rate, remains a challenge because of the inherent variability of most real-life datasets. Classification accuracy is usually improved by using the best subset of relevant predictors obtained with classical variable selection methods. The goal of variable selection methods is to choose the best subset (or training sample) of relevant variables, which typically reduces the complexity of a model and makes it easier to interpret, improves the classification accuracy of the model and reduces the training time. However, a statistically optimal hit rate can be achieved if the training sample meets a near-optimal condition, obtained by resolving any significant differences in the variances for the groups formed by the dependent variable. This paper proposes a new approach for obtaining a near-optimal training sample that will produce a statistically optimal hit rate using a modified winsorization with graphical diagnostic. In application to real-life data sets, the proposed approach was able to identify and remove legitimate contaminants in one or more predictors in the training sample, thereby resolving any significant differences in the variances for the groups formed by the dependent variable. The graphical diagnostic associated with the new approach also provides a useful visual tool that serves as an alternative graphical test for homogeneity of variances. PubDate: 2018-09-01 DOI: 10.1007/s40745-018-0140-9 Issue No:Vol. 5, No. 3 (2018)

Authors:Krishna Madhuri Paramkusem; Ramazan S. Aygun Pages: 359 - 386 Abstract: Supervisory control and data acquisition (SCADA) systems monitor and control industrial control systems in many industrial and economic sectors such as water treatment, power plants, railroads, and gas pipelines. The integration of SCADA systems with the internet and corporate enterprise networks for various economic reasons exposes SCADA systems to attacks by hackers, who could remotely exploit and gain access to SCADA systems to damage the infrastructure and thereby harm people’s lives. The simplicity of datasets and possible overfitting of models to training data are some of the issues in previous research. In this paper, we present a method for detecting and classifying malicious command and response packets in a SCADA network by analyzing attribute differences and the history of packets using k-means clustering. This study presents a solution for detecting and classifying SCADA cyber attacks with high accuracy using a big data framework that comprises Apache Hadoop and Apache Mahout. Apache Mahout’s random forest classification algorithm is applied to a SCADA gas pipeline dataset to categorize attacks. When 70% of the data is used for training the classifier, our approach yielded a 5–17% improvement in accuracy for the classification of read response attacks and a 2–8% improvement in accuracy for write command attacks with respect to using the original dataset. PubDate: 2018-09-01 DOI: 10.1007/s40745-018-0141-8 Issue No:Vol. 5, No. 3 (2018)

Authors:Indranil Ghosh; Saralees Nadarajah Pages: 387 - 399 Abstract: In this paper, we provide some new results for the Weibull-R family of distributions (Alzaghal et al. in Int J Stat Probab 5:139–149, 2016). We derive some new structural properties of the Weibull-R family of distributions. We provide various characterizations of the family via conditional moments, some functions of order statistics and via record values. PubDate: 2018-09-01 DOI: 10.1007/s40745-018-0142-7 Issue No:Vol. 5, No. 3 (2018)

Authors:Emilio Gómez-Déniz; José María Sarabia Pages: 401 - 420 Abstract: A family of continuous distributions with bounded support, which is a generalisation of the standard beta distribution, is introduced. We study some basic properties of the new family and simulation experiments are performed to observe the behaviour of the maximum likelihood estimators. We also derive a multivariate version of the proposed distributions. Three numerical experiments were performed to determine the flexibility of the new family of distributions in comparison with other extensions of the beta distribution that have been proposed. In this respect, the new family was found to be superior. PubDate: 2018-09-01 DOI: 10.1007/s40745-018-0143-6 Issue No:Vol. 5, No. 3 (2018)

Authors:M. Nassar; A. Alzaatreh; O. Abo-Kasem; M. Mead; M. Mansoor Pages: 421 - 436 Abstract: In this paper, we propose a new method for generating distributions based on the idea of alpha power transformation introduced by Mahdavi and Kundu (Commun Stat Theory Methods 46(13):6543–6557, 2017). The new method can be applied to any distribution by inverting its quantile function as a function of alpha power transformation. We apply the proposed method to the Weibull distribution to obtain a three-parameter alpha power within Weibull quantile function. The new distribution possesses a very flexible density and hazard rate function shapes which are very useful in cancer research. The hazard rate function can be increasing, decreasing, bathtub or upside down bathtub shapes. We derive some general properties of the proposed distribution including moments, moment generating function, quantile and Shannon entropy. The maximum likelihood estimation method is used to estimate the parameters. We illustrate the applicability of the proposed distribution to complete and censored cancer data sets. PubDate: 2018-09-01 DOI: 10.1007/s40745-018-0144-5 Issue No:Vol. 5, No. 3 (2018)
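
The alpha power transformation of Mahdavi and Kundu, which the abstract above cites as its starting point, maps a baseline CDF F(x) to (α^F(x) − 1)/(α − 1) for α > 0, α ≠ 1. The sketch below illustrates only that base transformation with a Weibull baseline, together with its closed-form inverse. It is not the new quantile-based construction proposed in the paper, and the function names and parameter values are illustrative assumptions.

```python
import math
from typing import Callable


def weibull_cdf(x: float, shape: float, scale: float) -> float:
    """Baseline Weibull CDF, F(x) = 1 - exp(-(x/scale)^shape) for x > 0."""
    if x <= 0.0:
        return 0.0
    return 1.0 - math.exp(-((x / scale) ** shape))


def weibull_quantile(p: float, shape: float, scale: float) -> float:
    """Inverse of the baseline Weibull CDF for 0 <= p < 1."""
    return scale * (-math.log1p(-p)) ** (1.0 / shape)


def apt_cdf(base_cdf_value: float, alpha: float) -> float:
    """Alpha power transformation applied to a baseline CDF value."""
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    if alpha == 1.0:
        return base_cdf_value  # the transformation reduces to the baseline
    return (alpha ** base_cdf_value - 1.0) / (alpha - 1.0)


def apt_quantile(u: float, alpha: float,
                 base_quantile: Callable[[float], float]) -> float:
    """Invert the transformation: solve (alpha^F - 1)/(alpha - 1) = u for F,
    then apply the baseline quantile function."""
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    if alpha == 1.0:
        p = u
    else:
        p = math.log1p(u * (alpha - 1.0)) / math.log(alpha)
    return base_quantile(p)


# Hypothetical parameter values, chosen only for illustration.
ALPHA, SHAPE, SCALE = 2.5, 1.5, 2.0
x_30 = apt_quantile(0.3, ALPHA, lambda p: weibull_quantile(p, SHAPE, SCALE))
round_trip = apt_cdf(weibull_cdf(x_30, SHAPE, SCALE), ALPHA)
```

Because the inverse is available in closed form, random variates can be drawn by feeding uniform samples through `apt_quantile`.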

Authors:N. V. Ganapathi Raju; Someswara Rao Chinta Pages: 437 - 451 Abstract: Authorship attribution is concerned with identifying the authors of disputed or anonymous documents, which are potentially important in legal, criminal and civil cases, threatening letters and terrorist communications, and also in computer forensics. There are two basic approaches to authorship attribution: one is instance based (treating each training text individually) and the other is profile based (treating the training texts cumulatively). Both of these methods have their own advantages and disadvantages. The present paper proposes a new region based document model for authorship identification, to address the dimensionality problem of instance based approaches and the scalability problem of profile based approaches. The proposed model concatenates a set of ‘n’ individual instance documents of an author into a single region based instance document (RID). A compression based similarity distance method is then applied to the RID. Compression based methods require no pre-processing and are easy to apply. This paper uses the Gzip compression algorithm with two compression based similarity measures, NCD and CDM. The proposed compression model is character based and can automatically capture non-word features such as word stems and punctuation. The only disadvantage of compression models is their high complexity. The proposed RID approach addresses this issue by reducing the repeated words in the document. The approach was tested on English editorial columns and achieved approximately 98% accuracy in identifying the author. PubDate: 2018-09-01 DOI: 10.1007/s40745-018-0145-4 Issue No:Vol. 5, No. 3 (2018)
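
The two compression based measures named in the abstract above can be sketched with Python's standard `gzip` module. This is a minimal illustration using the usual textbook definitions, NCD(x, y) = (C(xy) − min(C(x), C(y))) / max(C(x), C(y)) and CDM(x, y) = C(xy) / (C(x) + C(y)), where C is the compressed size. The way region documents are built here (plain concatenation), the nearest-author rule and the sample texts are assumptions for illustration, not the paper's exact procedure.

```python
import gzip
from typing import Dict, Iterable


def compressed_size(data: bytes) -> int:
    """Length in bytes of the gzip-compressed input."""
    return len(gzip.compress(data, compresslevel=9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance; smaller means more similar."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def cdm(x: bytes, y: bytes) -> float:
    """Compression-based dissimilarity measure; smaller means more similar."""
    return compressed_size(x + y) / (compressed_size(x) + compressed_size(y))


def build_region_document(instance_docs: Iterable[bytes]) -> bytes:
    """Assumed construction: concatenate an author's instance documents."""
    return b"\n".join(instance_docs)


def attribute(unknown: bytes, regions: Dict[str, bytes]) -> str:
    """Return the author whose region document is closest under NCD."""
    return min(regions, key=lambda author: ncd(unknown, regions[author]))


# Hypothetical toy corpus, used only to exercise the functions.
text_a = b"the quick brown fox jumps over the lazy dog " * 20
text_b = b"colorless green ideas sleep furiously tonight " * 20
regions = {
    "author_a": build_region_document([text_a]),
    "author_b": build_region_document([text_b]),
}
predicted = attribute(text_a[:200], regions)
```

Each comparison compresses the concatenated pair from scratch, which is where the high computational cost mentioned in the abstract comes from.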

Authors:Harish Kumar Ghritlahre; Radha Krishna Prasad Pages: 453 - 467 Abstract: In the present study, an artificial neural network (ANN) model has been developed with two different training algorithms to predict the thermal efficiency of a wire rib roughened solar air heater. A total of 50 sets of data were taken from experiments with three different types of absorber plate. The experimental data and calculated values of collector efficiency were used to develop the ANN model. Scaled conjugate gradient (SCG) and Levenberg–Marquardt (LM) learning algorithms were used. On the basis of statistical error analysis, TRAINLM with 6 neurons and TRAINSCG with 7 neurons were found to be the optimal models. The performance of both models was compared with actual data, and TRAINLM was found to perform better than TRAINSCG. The value of the coefficient of determination \(R^{2}\) for LM-6 is 0.99882, which indicates satisfactory performance. The proposed MLP ANN model with the LM learning algorithm therefore appears more reliable for predicting the performance of a solar air heater. PubDate: 2018-09-01 DOI: 10.1007/s40745-018-0146-3 Issue No:Vol. 5, No. 3 (2018)

Authors:Qi An; Shu-Cherng Fang; Tiantian Nie; Shan Jiang Pages: 469 - 486 Abstract: Multivariate asymmetric radial data clouds with irregularly positioned “spokes” and “clutters” are commonly seen in real life applications. In identifying the spoke directions of such data, a key initial step is to locate a central point from which each spoke extends and diverges. In this technical note, we propose a novel method that features a preselection procedure to screen out candidate points that have sufficiently many data points in the vicinity and identifies the central point by solving an \(\ell_1\)-norm constrained discrete optimization program. Extensive numerical experiments show that the proposed method is capable of providing central points with superior accuracy and robustness compared with other known methods and is computationally efficient for implementation. PubDate: 2018-09-01 DOI: 10.1007/s40745-018-0147-2 Issue No:Vol. 5, No. 3 (2018)

Authors:Havva Alizadeh Noughabi; Mohsen Kahani; Alireza Shakibamanesh Pages: 487 - 496 Abstract: Data fusion techniques combine raw data from multiple sources and collect associated data to achieve more specific inferences than could be attained with a single source. Situational awareness is one of the levels of the JDL, a mature information fusion model. The aim of situational awareness is to understand the developing relationships of interest between entities within a specific time and space. The present research shows how semantic web technologies, i.e. ontologies and semantic reasoners, can be used to describe situations and increase awareness of the situation. As the situation awareness level receives data streams from numerous distributed sources, it is necessary to manage these streams with data stream processing engines such as Esper. In addition, this research uses complex event processing, a technique for achieving situational awareness in real time, whose main aim is to generate actionable abstractions from event streams automatically. The proposed approach combines complex event processing and semantic web technologies to achieve better situational awareness. To show the functionality of the proposed approach in practice, some simple examples are discussed. PubDate: 2018-09-01 DOI: 10.1007/s40745-018-0148-1 Issue No:Vol. 5, No. 3 (2018)

Authors:Tariku Tessema Pages: 111 - 132 Abstract: Community acquired pneumonia refers to pneumonia acquired outside of hospitals or extended health facilities, and it is a leading infectious disease. This study aims to model the mortality of hospitalized pneumonia patients under 5 years of age and to investigate potential risk factors associated with child mortality due to pneumonia. The study was a retrospective study of 305 sampled under-five patients hospitalized with community acquired pneumonia. A cross-classified multilevel logistic regression was employed, with residence and hospital classified at the second level. A Bayesian estimation method was applied in which the posterior distribution was simulated via Markov Chain Monte Carlo. The variability attributable to hospital was found to be larger than the variability attributable to residence. The odds of dying from community acquired pneumonia were higher among patients who were diagnosed in the spring season, had complications with malaria, AGE and AFI, were in the neonatal age group, or were diagnosed late (more than a week). The risk of mortality was also found to be high for lower nurse-to-patient and physician-to-patient ratios. PubDate: 2018-06-01 DOI: 10.1007/s40745-017-0121-4 Issue No:Vol. 5, No. 2 (2018)

Authors:K. Ramadan; M. I. Dessouky; S. Elagooz; M. Elkordy; F. E. Abd El-Samie Pages: 259 - 272 Abstract: Due to noise enhancement, conventional Zero Forcing (ZF) equalizers are not suitable for wireless Underwater Acoustic (UWA) Orthogonal Frequency Division Multiplexing (OFDM) communication systems. Furthermore, these systems suffer from increasing complexity due to the large number of subcarriers, especially in Multiple-Input Multiple-Output (MIMO) systems. On the other hand, the Minimum Mean Square Error equalizer suffers from high complexity. This type of equalizer needs an estimate of the operating Signal-to-Noise Ratio to work properly. In this paper, we propose a Joint Low-Complexity Regularized ZF equalizer for MIMO UWA-OFDM systems to cope with these problems. The main objective of the proposed equalizer is to enhance the system performance with lower complexity by performing equalization in two steps. The co-channel interference is mitigated in the first step. A regularization term is added in the second step to avoid noise enhancement. Simulation results show that the proposed equalization scheme is able to enhance the UWA system performance with low complexity. PubDate: 2018-06-01 DOI: 10.1007/s40745-017-0127-y Issue No:Vol. 5, No. 2 (2018)

Authors:Hongxia Zhang; Liu Liu; Jin Yue; Xin Lai Pages: 293 - 299 Abstract: Because the parameters of cardiopulmonary function are diverse and change slowly in pathology, we apply the multivariate exponentially weighted moving average (MEWMA) control chart to monitor the state of the lungs. This paper considers five indicators of cardiopulmonary function, uses a principal component test to check whether the data come from a multivariate normal distribution, clarifies the relationship model between the control limit and the weight coefficient of the MEWMA control chart, and draws the control chart for monitoring. The process stays in control before observation 103; from observation 104 onward, however, the statistic exceeds the control limit and an alarm is given. This means that there is a problem with cardiopulmonary function starting on the 103rd sample. The control chart has a good warning function because it can raise the alarm before cardiopulmonary function develops a serious problem. Monitoring with the MEWMA control chart can reduce the cost and frequency of medical examinations, improve the utilization of hospital resources and confirm the case, so that the best time for treatment is not missed. PubDate: 2018-06-01 DOI: 10.1007/s40745-018-0137-4 Issue No:Vol. 5, No. 2 (2018)
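
The monitoring scheme described above can be sketched with the standard MEWMA recursion Z_t = λ(X_t − μ) + (1 − λ)Z_{t−1}, Z_0 = 0, and the chart statistic T²_t = Z_tᵀ Σ_Z⁻¹ Z_t, where Σ_Z = (λ/(2 − λ))[1 − (1 − λ)^{2t}]Σ. The sketch below assumes a known in-control mean μ and covariance Σ and uses only the standard library. The weight λ, the control limit h and the toy data are hypothetical values, not those of the paper.

```python
from typing import List, Optional, Sequence


def solve(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def mewma_t2(obs: Sequence[Sequence[float]], mean: Sequence[float],
             cov: Sequence[Sequence[float]], lam: float = 0.2) -> List[float]:
    """Return the MEWMA T^2 statistic for each observation in order."""
    z = [0.0] * len(mean)
    stats = []
    for t, x in enumerate(obs, start=1):
        # Exponentially weighted moving average of the centered observations.
        z = [lam * (xi - mi) + (1.0 - lam) * zi
             for xi, mi, zi in zip(x, mean, z)]
        # Exact (time-varying) scale factor of the covariance of z.
        scale = lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t))
        w = solve(cov, z)  # cov^{-1} z
        stats.append(sum(zi * wi for zi, wi in zip(z, w)) / scale)
    return stats


def first_alarm(stats: Sequence[float], h: float) -> Optional[int]:
    """1-based index of the first statistic above the control limit h."""
    for t, s in enumerate(stats, start=1):
        if s > h:
            return t
    return None


# Hypothetical two-indicator example: in control for 5 samples, then a shift.
observations = [[0.0, 0.0]] * 5 + [[3.0, 3.0]] * 5
stats = mewma_t2(observations, mean=[0.0, 0.0],
                 cov=[[1.0, 0.0], [0.0, 1.0]], lam=0.2)
alarm_at = first_alarm(stats, h=10.0)
```

A smaller λ gives more weight to past observations, which makes the chart more sensitive to small sustained shifts but slower to react to abrupt ones. In practice h is chosen jointly with λ to reach a target in-control run length.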

Authors:Ibrahim Elbatal; Emrah Altun; Ahmed Z. Afify; Gamze Ozel Abstract: We define and study a new family of distributions, called generalized Burr XII power series class, by compounding the generalized Burr XII and power series distributions. Several properties of the new family are derived. The maximum likelihood estimation method is used to estimate the model parameters. The importance and potentiality of the new family are illustrated by means of three applications to real data sets. PubDate: 2018-08-04 DOI: 10.1007/s40745-018-0171-2

Authors:Suleman Nasiru; Peter N. Mwita; Oscar Ngesa Abstract: In this paper, a new family of distributions called the exponentiated generalized power series family is proposed and studied. Statistical properties such as stochastic order, quantile function, entropy, mean residual life and order statistics are derived. Bivariate and multivariate extensions of the family are proposed. The method of maximum likelihood is proposed for estimating the parameters. Some special distributions from the family are defined and their applications are demonstrated with real data sets. PubDate: 2018-07-31 DOI: 10.1007/s40745-018-0170-3

Authors:Yun Si Li; Ai Hua Li; Zhi Feng Wang; Qiang Wu Abstract: Over the last 10 years, the soaring housing prices have raised concerns over ‘affordability’ in Chinese housing market, although it is still not enshrined in agreed standards, partly because of different opinions about how it should be measured. To overcome the inadequacy of a single index, we examine the housing affordability of 35 large and medium cities in China from 2009 to 2016 using price-to-income ratio (PIR), monthly payment-income ratio (MIR) and the residual income approach (RI). With consideration of the characteristics of China’s real estate market, we have re-discussed the reasonable range of the indexes. The comparison of single index between cities shows significant periodicity and multi-index clustering analysis reveals regional characteristics, which help us to further the understanding of housing affordability. In the end, policy recommendations on reforming Chinese urban housing system are suggested according to the differences and changing laws of housing affordability among cities. PubDate: 2018-06-19 DOI: 10.1007/s40745-018-0168-x

Authors:Zubair Ahmad Abstract: In this article, a new method is suggested to expand a family of life distributions by adding an additional parameter. The new proposal may be named as the Zubair-G family of distributions. For this family, general expressions for some mathematical properties are derived. The maximum product spacing, ordinary least square and maximum likelihood methods are discussed to estimate the model parameters. A three-parameter special sub-model of the proposed family, called the Zubair–Weibull distribution is considered in detail. Its density function can be symmetrical, left-skewed, right-skewed, and has increasing, decreasing, bathtub and upside-down bathtub shaped failure rates. To illustrate the importance of the proposed family over the other well-known methods, two applications to real data sets are analyzed. PubDate: 2018-06-18 DOI: 10.1007/s40745-018-0169-9

Authors:Fiaz Ahmad Bhatti; G. G. Hamedani; Seyed Morteza Najibi; Munir Ahmad Abstract: In this paper, a flexible modified extended exponential power life testing (MEEPLT) distribution is proposed. The MEEPLT distribution has increasing, decreasing and bathtub hazard rate function. The MEEPLT density is arc, left skewed, right-skewed and symmetrical shaped. The MEEPLT distribution is developed on the basis of the generalized Pearson differential equation. Some structural and mathematical properties including descriptive measures on the basis of quantiles, moments, order statistics and reliability measures are theoretically established. Characterizations of MEEPLT distribution are also studied via different techniques. Parameters of the MEEPLT distribution are estimated using maximum likelihood method. The simulation study for performance of the MLEs of the MEEPLT distribution is carried out. Goodness of fit of this distribution through different methods is studied. PubDate: 2018-06-16 DOI: 10.1007/s40745-018-0167-y

Abstract: AdEater is an early browsing assistant that automatically removes advertisement images from internet pages. It works by generating rules from training data and implementing these rules when browsing the internet. Advertisement images on web pages are replaced by transparent images that display on the image the word “ad”, and where images are misclassified, non-advertisement images on a webpage will also be replaced by transparent images displaying “ad”. This paper critically examines the dataset derived from a trial of AdEater and tries to build a robust image classifier. We apply data mining techniques to uncover associations between features of advertisements and non-advertisements and try to predict whether the images are advertisements or non-advertisements based on three classification methods. We achieve classification accuracy of 96.5%, using k-fold cross validation to train and test the model. PubDate: 2018-06-06 DOI: 10.1007/s40745-018-0164-1

Authors:Laba Handique; Subrata Chakraborty; Thiago A. N. de Andrade Abstract: A new generator of continuous distributions called Exponentiated Generalized Marshall–Olkin-G family with three additional parameters is proposed. This family of distribution contains several known distributions as sub models. The probability density function and cumulative distribution function are expressed as infinite mixture of the Marshall–Olkin distribution. Important properties like quantile function, order statistics, moment generating function, probability weighted moments, entropy and shapes are investigated. The maximum likelihood method to estimate model parameters is presented. A simulation result to assess the performance of the maximum likelihood estimation is briefly discussed. A distribution from this family is compared with two sub models and some recently introduced lifetime models by considering three real life data fitting applications. PubDate: 2018-06-05 DOI: 10.1007/s40745-018-0166-z