Journal of Probability and Statistics
Journal Prestige (SJR): 0.316
Citation Impact (citeScore): 1

  This is an Open Access journal
ISSN (Print) 1687-952X - ISSN (Online) 1687-9538
Published by Hindawi  [339 journals]
  • Random Forests in Count Data Modelling: An Analysis of the Influence of
           Data Features and Overdispersion on Regression Performance

    • Abstract: Machine learning algorithms, especially random forests (RFs), have become an integral part of modern scientific methodology and represent an efficient alternative to conventional parametric algorithms. This study aimed to assess the influence of data features and overdispersion on RF regression performance. We assessed the effect of the types of predictors (100, 75, 50, and 20% continuous, and 100% categorical), the number of predictors (p = 8, 16, and 24), and the sample size (N = 50, 250, and 1250) on RF parameter settings. We also compared RF performance to that of classical generalized linear models (Poisson, negative binomial, and zero-inflated Poisson) and the linear model applied to log-transformed data. Two real datasets were analysed to demonstrate the usefulness of RF for overdispersed data modelling. Goodness-of-fit statistics such as root mean square error (RMSE) and biases were used to determine RF accuracy and validity. Results revealed that the number of variables to be randomly selected for each split, the proportion of samples to train the model, the minimal number of samples within each terminal node, and RF regression performance are not influenced by the sample size or by the number and type of predictors. However, the ratio of observations to the number of predictors affects the stability of the best RF parameters. RF performs well for all types of covariates and different levels of dispersion. The magnitude of dispersion does not significantly influence RF predictive validity. In contrast, its predictive accuracy is significantly influenced by the magnitude of dispersion in the response variable, conditional on the explanatory variables. RF has performed almost as well as the models of the classical Poisson family in the presence of overdispersion. Given RF’s advantages, it is an appropriate statistical alternative for count data.
      PubDate: Thu, 01 Dec 2022 14:50:01 +000
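The abstract above relies on RMSE, bias, and the degree of overdispersion in the response. As a rough illustration only (not the authors' code), the sketch below computes these quantities in plain Python. The function names, the sign convention for bias (predicted minus observed), and the use of the variance-to-mean ratio as a marginal dispersion index are assumptions made for this example; the paper measures dispersion conditional on the explanatory variables.

```python
import math
import statistics


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared_errors = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(statistics.fmean(squared_errors))


def mean_bias(observed, predicted):
    """Average signed error (assumed convention: predicted minus observed)."""
    return statistics.fmean([p - o for o, p in zip(observed, predicted)])


def dispersion_index(counts):
    """Sample variance divided by mean; values above 1 suggest overdispersion
    relative to a Poisson model (marginal, not conditional on covariates)."""
    return statistics.variance(counts) / statistics.fmean(counts)


if __name__ == "__main__":
    observed = [0, 2, 1, 7, 3]      # hypothetical count responses
    predicted = [1, 2, 2, 5, 3]     # hypothetical model predictions
    print(rmse(observed, predicted))
    print(mean_bias(observed, predicted))
    print(dispersion_index(observed))
```

A model with a small bias but a large RMSE is, in the abstract's terms, valid on average but not very accurate.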
  • Mathematical Modeling of Concentration Risk under the Default Risk Charge
           Using Probability and Statistics Theory

    • Abstract: In the Fundamental Review of the Trading Book (FRTB), the latest regulation for minimum capital requirements for market risk, one of the major changes is the replacement of the Incremental Risk Charge (IRC) with the Default Risk Charge (DRC). The DRC measures only default risk and does not consider rating migration risk. A second change is that the DRC now includes equity assets, unlike the IRC. This paper studies DRC modeling under the Internal Model Approach (IMA) and the regulatory conditions that every DRC component must respect. The FRTB presents the DRC measurement as Value at Risk (VaR) over a one-year horizon, with the quantile equal to 99.9%. We use multifactor adjustment to measure the DRC and compare it with the Monte Carlo model to understand how well the approach fits. We then define concentration in the DRC and propose two methods to quantify the concentration risk: the Ad Hoc and Add-On methods. Finally, we study the behavior of the DRC with respect to the concentration risk.
      PubDate: Tue, 01 Nov 2022 13:20:01 +000
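To make the idea of a 99.9% one-year VaR on default losses more concrete, here is a minimal Monte Carlo sketch. It is not the paper's multifactor adjustment or its Ad Hoc and Add-On methods. It assumes a one-factor Gaussian default model, a common default probability, and 100% loss given default, and every name and parameter value (`simulate_default_losses`, `rho`, `pd`) is hypothetical.

```python
import math
import random
from statistics import NormalDist


def empirical_quantile(values, q):
    """Smallest sample value whose empirical CDF is at least q."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, max(0, math.ceil(q * len(ordered)) - 1))
    return ordered[index]


def simulate_default_losses(exposures, pd, rho, n_sims, seed=0):
    """Simulate portfolio default losses under an assumed one-factor model.

    Obligor i defaults when sqrt(rho)*Z + sqrt(1-rho)*eps_i falls below the
    threshold implied by the default probability `pd`. Loss given default
    is assumed to be 100% of the exposure.
    """
    rng = random.Random(seed)
    threshold = NormalDist().inv_cdf(pd)
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    losses = []
    for _ in range(n_sims):
        z = rng.gauss(0.0, 1.0)  # systematic factor shared by all obligors
        loss = sum(
            e for e in exposures
            if a * z + b * rng.gauss(0.0, 1.0) < threshold
        )
        losses.append(loss)
    return losses


if __name__ == "__main__":
    concentrated = simulate_default_losses([1.0], 0.01, 0.2, 5000)
    granular = simulate_default_losses([0.01] * 100, 0.01, 0.2, 5000)
    print("99.9% VaR, single name:", empirical_quantile(concentrated, 0.999))
    print("99.9% VaR, 100 names: ", empirical_quantile(granular, 0.999))
```

With the same total exposure, the single-name portfolio should show the larger 99.9% quantile, which is the kind of concentration effect the abstract sets out to quantify.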
  • Extreme Value Distributions: An Overview of Estimation and Simulation

    • Abstract: The generalized extreme value distribution (GEVD) and various extreme value distributions are commonly applied in air pollution, telecommunications, operational risk management, finance, insurance, material sciences, economics, and hydrology, among many other fields that deal with extreme events. Extreme value distributions (EVDs) typically arise as the limiting distributions of the maximum and minimum values of many random observations drawn from the same arbitrary distribution. They have also emerged as a critical method for forecasting future events. As a result, prior research is required to select the best estimation method to obtain reliable values for the parameters of extreme value distributions. This study provides an overview of three parameter estimation methods based on goodness-of-fit statistics and root mean square error (RMSE). This paper reviewed and compared three estimation methods used to approximate values of parameters for simulated observations taken from the EVD and GEVD. The method of moments (MOM), maximum likelihood estimator (MLE), and maximum product of spacing (MPS) were the methods investigated in this study. Our findings indicated that the MPS performed better based on the mean square errors (MSEs); meanwhile, the MPS had similar goodness-of-fit statistic values compared to the MLE.
      PubDate: Wed, 19 Oct 2022 02:50:01 +000
  • Interpretability of Composite Indicators Based on Principal Components

    • Abstract: Principal component approaches are often used in the construction of composite indicators to summarize the information of input variables. The gain of dimension reduction comes at the cost of difficulties in interpretation, inaccurate targeting, and possible conflicts with the theoretical framework when the signs in the loading are not aligned with the expected direction of impact. In this study, we propose an adjustment in the construction of principal component approaches to avoid these problems. The effectiveness of the proposed approach is illustrated in defining the Food and Agriculture Organization of the United Nations’ Resilience Capacity Index, which is used to measure household-level resilience to food insecurity. We conclude that the robustness gain of using the new method improves the reliability of the composite indicator.
      PubDate: Thu, 29 Sep 2022 09:35:01 +000
  • NetDA: An R Package for Network-Based Discriminant Analysis Subject to
           Multilabel Classes

    • Abstract: In this paper, we introduce the R package NetDA, which aims to deal with multiclass classification while accommodating network structures in the predictors. To address the natural feature of network structures, we apply Gaussian graphical models to characterize dependence structures of the predictors and directly estimate the precision matrix. After that, the estimated precision matrix is employed in linear discriminant functions and quadratic discriminant functions. The R package NetDA is now available on CRAN, and the demonstration of functions is summarized as a vignette in the online documentation.
      PubDate: Tue, 27 Sep 2022 15:05:02 +000
  • Some Improved Classes of Estimators in Stratified Sampling Using Bivariate
           Auxiliary Information

    • Abstract: This manuscript considers some improved combined and separate classes of estimators of population mean using bivariate auxiliary information under stratified simple random sampling. The expressions of bias and mean square error of the proposed classes of estimators are determined to the first order of approximation. It is exhibited that under some particular conditions, the proposed classes of estimators dominate the existing prominent estimators. The theoretical findings are supported by a simulation study performed over a hypothetically generated population.
      PubDate: Wed, 31 Aug 2022 11:35:01 +000
  • D-Optimal Design for a Causal Structure for Completely Randomized and
           Random Blocked Experiments

    • Abstract: Most experimental design literature on causal inference focuses on establishing a causal relationship between variables, but there is no literature on how to identify a design that results in the optimal parameter estimates for a structural equation model (SEM). In this research, search algorithms are used to produce a D-optimal design for a SEM for three-stage least squares and full information maximum likelihood estimators. Then, a D-optimal design for the estimate of the model parameters of a mixed-effects SEM is obtained. The efficiency of each of the D-optimal designs for SEMs is compared with univariate optimal and uniform designs. In each case, the causal relationship changed the optimal designs dramatically and the new D-optimal designs were more efficient.
      PubDate: Tue, 30 Aug 2022 08:35:01 +000
  • On Hierarchical Bayesian Spatial Small Area Model for Binary Data under
           Spatial Misalignment

    • Abstract: Small area models have become popular methods for producing reliable estimates for sub-populations (small geographic areas in this study). Small area modeling may be carried out via model-assisted approaches within the design-based paradigm or via model-based approaches. When there are medium or large samples, a model-assisted approach may be reliable. However, when data are scarce, a model-based technique may be required. Model-based Bayesian analysis is popular for its ability to combine information from several sources and to account for uncertainties in the analysis and spatial prediction of spatial data. Nevertheless, things become more complex when the geographic boundaries of interest are misaligned. Some authors have addressed the problem of misalignment under a hierarchical Bayesian approach. In this study, we developed a non-trivial extension of an existing hierarchical Bayesian model for a binary outcome variable under spatial misalignment, with three contributions. First, the model uses unit-level survey data and area-level auxiliary data to predict the posterior mean proportion spatially at the second geographic area level. Second, the linking model is changed to a logit-normal model in the proposed model. Lastly, the mean process was considered to overcome the multicollinearity between the true predictors and the spatial random effect. A sensitivity analysis was also carried out via simulation.
      PubDate: Thu, 30 Jun 2022 12:20:02 +000
  • Attribute Control Chart for Rayleigh Distribution Using Repetitive
           Sampling under Truncated Life Test

    • Abstract: A control chart is an important tool in statistical process monitoring that is useful for monitoring and improving production process quality. In this article, an attribute control chart using repetitive sampling under a truncated life test is proposed for monitoring the mean life of a product whose lifetime follows the Rayleigh distribution. The repetitive sampling parameters and the control limit coefficients of the chart are determined so that the in-control average run length (ARL) is very close to the target ARL. Tables of ARL values for various shift sizes in the scale parameter are presented, and the performance of the proposed chart is compared with that of existing attribute control charts using the out-of-control ARL. The proposed control chart is shown to outperform the existing control charts in terms of ARL. An illustrative example is given to demonstrate the application of the proposed chart.
      PubDate: Tue, 31 May 2022 05:50:04 +000
  • Nonstationary Generalised Autoregressive Conditional Heteroskedasticity
           Modelling for Fitting Higher Order Moments of Financial Series within
           Moving Time Windows

    • Abstract: Here, we present a method for a simple GARCH (1,1) model to fit higher order moments for different companies’ stock prices. When we assume a Gaussian conditional distribution, we fail to capture any empirical data when fitting the first three even moments of financial time series. We show instead that a mixture of normal distributions is needed to better capture the higher order moments of the data. To demonstrate this point, we construct regions (parameter diagrams), in the fourth- and sixth-order standardised moment space, where a GARCH (1,1) model can be used to fit moment values and compare them with the corresponding moments from empirical data for different sectors of the economy. We found that the ability of the GARCH model with a double normal conditional distribution to fit higher order moments is dictated by the time window our data spans. We can only fit data collected within specific time window lengths and only with certain parameters of the conditional double Gaussian distribution. In order to incorporate the nonstationarity of financial series, we assume that the parameters of the GARCH model can have time dependence. Furthermore, using the method developed here, we investigate the effect that the COVID-19 pandemic has had upon stocks’ stability and how this compares with the 2008 financial crash.
      PubDate: Fri, 20 May 2022 06:50:05 +000
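For readers unfamiliar with the model, the following is a small sketch of a GARCH(1,1) process with a Gaussian conditional distribution, together with the standard closed-form kurtosis of that model. It is a hedged illustration rather than the authors' method: it does not implement the mixture-of-normals conditional distribution or time-varying parameters, and the function names and parameter values are assumptions.

```python
import math
import random


def simulate_garch11(omega, alpha, beta, n, seed=0):
    """Simulate n returns from a GARCH(1,1) model with Gaussian innovations.

    sigma2_t = omega + alpha * r_{t-1}^2 + beta * sigma2_{t-1}
    Requires alpha + beta < 1 so the unconditional variance exists.
    """
    rng = random.Random(seed)
    sigma2 = omega / (1.0 - alpha - beta)  # start at unconditional variance
    returns, variances = [], []
    for _ in range(n):
        r = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)
        returns.append(r)
        variances.append(sigma2)
        sigma2 = omega + alpha * r * r + beta * sigma2
    return returns, variances


def gaussian_garch11_kurtosis(alpha, beta):
    """Unconditional kurtosis of Gaussian GARCH(1,1); infinite if undefined."""
    persistence_sq = (alpha + beta) ** 2
    denom = 1.0 - persistence_sq - 2.0 * alpha ** 2
    if denom <= 0.0:
        return math.inf  # the fourth moment does not exist
    return 3.0 * (1.0 - persistence_sq) / denom


def standardised_moment(xs, order):
    """Sample central moment of the given order divided by sd**order."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return sum((x - mean) ** order for x in xs) / n / var ** (order / 2)


if __name__ == "__main__":
    r, _ = simulate_garch11(omega=1e-5, alpha=0.1, beta=0.8, n=50000)
    print("theoretical kurtosis:", gaussian_garch11_kurtosis(0.1, 0.8))
    print("sample 4th moment:   ", standardised_moment(r, 4))
    print("sample 6th moment:   ", standardised_moment(r, 6))
```

The fourth standardised moment of a Gaussian GARCH(1,1) is tied to `alpha` and `beta` by the formula above, which gives some intuition for why the abstract reports that a Gaussian conditional distribution cannot match several empirical moments at once.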
  • Mean Estimation of a Sensitive Variable under Nonresponse Using
           Three-Stage RRT Model in Stratified Two-Phase Sampling

    • Abstract: The present study addresses the problems of mean estimation and nonresponse under the three-stage RRT model. Auxiliary information on an attribute and variable is used to propose a generalized class of exponential ratio-type estimators. Expressions for the bias, mean squared error, and minimum mean squared error for the proposed estimator are derived up to the first degree of approximation. The efficiency of the proposed estimator is studied theoretically and numerically using two real datasets. From the numerical analysis, the proposed generalized class of exponential ratio-type estimators outperforms ordinary mean estimators, usual ratio estimators, and exponential ratio-type estimators. Furthermore, the efficiencies of the mean estimators are observed to decrease with an increase in the sensitivity level of the survey question. As the inverse sampling rate and nonresponse rate go up, so does the efficiency of the mean estimators, which makes them more accurate.
      PubDate: Fri, 22 Apr 2022 12:05:02 +000
  • Gompertz Ampadu Class of Distributions: Properties and Applications

    • Abstract: This paper introduces a new generator family of distributions called the Gompertz Ampadu-G family. Based on the generator, the Lomax distribution was modified into the Gompertz Ampadu Lomax distribution. The new distribution has a flexible hazard rate function that has upside-down and bathtub shapes, including increasing and decreasing hazard rate functions. The distribution comes with some desirable statistical properties. The distribution is applied to real-life data. Parameter estimates and test statistics show a better fit than the competitive models.
      PubDate: Wed, 20 Apr 2022 10:20:01 +000
  • Tweedie Model for Predicting Factors Associated with Distance Traveled to
           Access Inpatient Services in Kenya

    • Abstract: Aim. This study aims to examine which factors influence the distance traveled by patients for inpatient care in Kenya. Methods. We used data from the fourth round of the Kenya Household Health Expenditure and Utilization survey. Our dependent variable was the self-reported distance traveled by patients to access inpatient care at public health facilities. As the clustered data were correlated, we used the generalized estimating equations approach with an exchangeable correlation under a Tweedie distribution. To select the best-fit covariates for predicting distance, we adopted a variable selection technique using the and criteria, wherein the lowest (highest) value for the former (latter) is preferred. Results. Using data for 451 participants from 47 counties, we found that three-fifths were admitted for between 1 and 5 days, two-thirds resided in rural areas, and 90% were satisfied with the facilities’ service. Wealth quintiles were evenly distributed across respondents. Most admissions (81%) comprised 15, 65, and 25–54 years. Many households were of medium size (4–6 members) and had a low education level (48%), and nine-tenths had no access to insurance. While two-thirds reported employment-based income, the same number reported not having cash to pay for inpatient services; 6 out of 10 paid over 3000 KES. Thus, differences in employment, ability to pay, and household size influence the distance traveled to access government healthcare facilities in Kenya. Interpretation. Low-income individuals are more likely to have large households and live in rural areas and, thus, are forced to travel farther to access inpatient care. Unlike the unemployed, the employed may have better socioeconomic status and possibly live near inpatient healthcare facilities, which would explain the shorter distances they travel to access these services. Policymakers must support equal access to inpatient services, prioritize rural areas, open job opportunities, and encourage smaller families.
      PubDate: Tue, 19 Apr 2022 08:35:00 +000
  • A Mixture of Clayton, Gumbel, and Frank Copulas: A Complete Dependence

    • Abstract: Knowledge of the dependence between random variables is necessary in the area of risk assessment and evaluation. Some of the existing Archimedean copulas, namely the Clayton and the Gumbel copulas, allow for higher correlations on the extreme left and right, respectively. In this study, we use the idea of convex combinations to build a hybrid Clayton–Gumbel–Frank copula that provides all dependence scenarios from existing Archimedean copulas. The corresponding density and conditional distribution functions of the derived models for two random variables, as well as an estimator for the proportion parameter associated with the proposed model, are also derived. The results show that the proposed model is able to show any case of dependence by providing coefficients for the upper tail and lower tail dependence.
      PubDate: Tue, 12 Apr 2022 18:20:01 +000
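The convex-combination idea in the abstract above can be sketched with the textbook bivariate forms of the three copulas. This is an illustrative sketch under stated assumptions, not the paper's estimator: the parameter names and example values are hypothetical, inputs are assumed to lie in (0, 1], and the tail-dependence expressions are the standard ones for Clayton (lower tail) and Gumbel (upper tail), with Frank contributing none.

```python
import math


def clayton_cdf(u, v, theta):
    """Clayton copula, theta > 0 (lower tail dependence)."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def gumbel_cdf(u, v, theta):
    """Gumbel copula, theta >= 1 (upper tail dependence)."""
    a = max(0.0, -math.log(u)) ** theta
    b = max(0.0, -math.log(v)) ** theta
    return math.exp(-((a + b) ** (1.0 / theta)))


def frank_cdf(u, v, theta):
    """Frank copula, theta != 0 (no tail dependence)."""
    ratio = math.expm1(-theta * u) * math.expm1(-theta * v) / math.expm1(-theta)
    return -math.log1p(ratio) / theta


def mixture_cdf(u, v, weights, thetas):
    """Convex combination w_C*Clayton + w_G*Gumbel + w_F*Frank."""
    w_c, w_g, w_f = weights
    if min(weights) < 0 or abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("weights must be non-negative and sum to 1")
    t_c, t_g, t_f = thetas
    return (w_c * clayton_cdf(u, v, t_c)
            + w_g * gumbel_cdf(u, v, t_g)
            + w_f * frank_cdf(u, v, t_f))


def tail_dependence(weights, thetas):
    """(lower, upper) tail dependence coefficients of the mixture."""
    w_c, w_g, _ = weights
    t_c, t_g, _ = thetas
    lower = w_c * 2.0 ** (-1.0 / t_c)
    upper = w_g * (2.0 - 2.0 ** (1.0 / t_g))
    return lower, upper


if __name__ == "__main__":
    weights = (0.5, 0.3, 0.2)   # hypothetical mixing proportions
    thetas = (2.0, 2.0, 5.0)    # hypothetical copula parameters
    print(mixture_cdf(0.3, 0.7, weights, thetas))
    print(tail_dependence(weights, thetas))
```

Because each component is a copula and the weights are non-negative and sum to one, the mixture is itself a copula, and its tail coefficients are the weighted coefficients of the components.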