PROBABILITIES AND MATH STATISTICS (113 journals)                     

Showing 1 - 98 of 98 Journals sorted alphabetically
Advances in Statistics     Open Access   (Followers: 9)
Afrika Statistika     Open Access   (Followers: 1)
American Journal of Applied Mathematics and Statistics     Open Access   (Followers: 10)
American Journal of Mathematics and Statistics     Open Access   (Followers: 8)
Annals of Data Science     Hybrid Journal   (Followers: 17)
Annual Review of Statistics and Its Application     Full-text available via subscription   (Followers: 8)
Applied Medical Informatics     Open Access   (Followers: 12)
Asian Journal of Mathematics & Statistics     Open Access   (Followers: 8)
Asian Journal of Probability and Statistics     Open Access  
Austrian Journal of Statistics     Open Access   (Followers: 4)
Biostatistics & Epidemiology     Hybrid Journal   (Followers: 4)
Cadernos do IME : Série Estatística     Open Access  
Calcutta Statistical Association Bulletin     Hybrid Journal  
Communications in Mathematics and Statistics     Hybrid Journal   (Followers: 3)
Communications in Statistics - Simulation and Computation     Hybrid Journal   (Followers: 9)
Communications in Statistics: Case Studies, Data Analysis and Applications     Hybrid Journal  
Comunicaciones en Estadística     Open Access  
Econometrics and Statistics     Hybrid Journal   (Followers: 1)
Forecasting     Open Access   (Followers: 1)
Foundations and Trends® in Optimization     Full-text available via subscription   (Followers: 2)
Frontiers in Applied Mathematics and Statistics     Open Access   (Followers: 1)
Game Theory     Open Access   (Followers: 3)
Geoinformatics & Geostatistics     Hybrid Journal   (Followers: 13)
Geomatics, Natural Hazards and Risk     Open Access   (Followers: 14)
Indonesian Journal of Applied Statistics     Open Access  
International Game Theory Review     Hybrid Journal   (Followers: 1)
International Journal of Advanced Statistics and IT&C for Economics and Life Sciences     Open Access  
International Journal of Advanced Statistics and Probability     Open Access   (Followers: 6)
International Journal of Algebra and Statistics     Open Access   (Followers: 3)
International Journal of Applied Mathematics and Statistics     Full-text available via subscription   (Followers: 3)
International Journal of Ecological Economics and Statistics     Full-text available via subscription   (Followers: 5)
International Journal of Energy and Statistics     Hybrid Journal   (Followers: 3)
International Journal of Game Theory     Hybrid Journal   (Followers: 3)
International Journal of Mathematics and Statistics     Full-text available via subscription   (Followers: 2)
International Journal of Multivariate Data Analysis     Hybrid Journal  
International Journal of Probability and Statistics     Open Access   (Followers: 3)
International Journal of Statistics & Economics     Full-text available via subscription   (Followers: 6)
International Journal of Statistics and Applications     Open Access   (Followers: 2)
International Journal of Statistics and Probability     Open Access   (Followers: 3)
International Journal of Statistics in Medical Research     Hybrid Journal   (Followers: 5)
International Journal of Testing     Hybrid Journal   (Followers: 1)
Iraqi Journal of Statistical Sciences     Open Access  
Japanese Journal of Statistics and Data Science     Hybrid Journal  
Journal of Biometrics & Biostatistics     Open Access   (Followers: 5)
Journal of Cost Analysis and Parametrics     Hybrid Journal   (Followers: 5)
Journal of Environmental Statistics     Open Access   (Followers: 4)
Journal of Game Theory     Open Access   (Followers: 1)
Journal of Mathematical Economics and Finance     Full-text available via subscription  
Journal of Mathematics and Statistics Studies     Open Access  
Journal of Modern Applied Statistical Methods     Open Access   (Followers: 1)
Journal of Official Statistics     Open Access   (Followers: 2)
Journal of Quantitative Economics     Hybrid Journal  
Journal of Social and Economic Statistics     Open Access  
Journal of Statistical Theory and Practice     Hybrid Journal   (Followers: 2)
Journal of Statistics and Data Science Education     Open Access   (Followers: 2)
Journal of Survey Statistics and Methodology     Hybrid Journal   (Followers: 4)
Journal of the Indian Society for Probability and Statistics     Full-text available via subscription  
Jurnal Biometrika dan Kependudukan     Open Access   (Followers: 1)
Jurnal Ekonomi Kuantitatif Terapan     Open Access  
Jurnal Sains Matematika dan Statistika     Open Access  
Lietuvos Statistikos Darbai     Open Access  
Mathematics and Statistics     Open Access   (Followers: 2)
Methods, Data, Analyses     Open Access   (Followers: 1)
METRON     Hybrid Journal   (Followers: 2)
Nepalese Journal of Statistics     Open Access   (Followers: 1)
North American Actuarial Journal     Hybrid Journal   (Followers: 2)
Open Journal of Statistics     Open Access   (Followers: 3)
Open Mathematics, Statistics and Probability Journal     Open Access  
Pakistan Journal of Statistics and Operation Research     Open Access   (Followers: 1)
Physica A: Statistical Mechanics and its Applications     Hybrid Journal   (Followers: 6)
Probability, Uncertainty and Quantitative Risk     Open Access   (Followers: 2)
Ratio Mathematica     Open Access  
Research & Reviews : Journal of Statistics     Open Access   (Followers: 3)
Revista Brasileira de Biometria     Open Access  
Revista Colombiana de Estadística     Open Access  
RMS : Research in Mathematics & Statistics     Open Access  
Romanian Statistical Review     Open Access  
Sankhya B - Applied and Interdisciplinary Statistics     Hybrid Journal  
SIAM Journal on Mathematics of Data Science     Hybrid Journal   (Followers: 1)
SIAM/ASA Journal on Uncertainty Quantification     Hybrid Journal   (Followers: 3)
Spatial Statistics     Hybrid Journal   (Followers: 2)
Sri Lankan Journal of Applied Statistics     Open Access  
Stat     Hybrid Journal   (Followers: 1)
Stata Journal     Full-text available via subscription   (Followers: 8)
Statistica     Open Access   (Followers: 6)
Statistical Analysis and Data Mining     Hybrid Journal   (Followers: 23)
Statistical Theory and Related Fields     Hybrid Journal  
Statistics and Public Policy     Open Access   (Followers: 4)
Statistics in Transition New Series : An International Journal of the Polish Statistical Association     Open Access  
Statistics Research Letters     Open Access   (Followers: 1)
Statistics, Optimization & Information Computing     Open Access   (Followers: 3)
Stats     Open Access  
Synthesis Lectures on Mathematics and Statistics     Full-text available via subscription   (Followers: 1)
Theory of Probability and its Applications     Hybrid Journal   (Followers: 2)
Theory of Probability and Mathematical Statistics     Full-text available via subscription   (Followers: 2)
Turkish Journal of Forecasting     Open Access   (Followers: 1)
VARIANSI : Journal of Statistics and Its application on Teaching and Research     Open Access  
Zeitschrift für die gesamte Versicherungswissenschaft     Hybrid Journal  



  This is an Open Access journal
ISSN (Online) 2571-905X
Published by MDPI  [249 journals]
  • Stats, Vol. 6, Pages 99-112: An ϵ-Greedy Multiarmed Bandit Approach to Markov Decision Processes

    • Authors: Isa Muqattash, Jiaqiao Hu
      First page: 99
      Abstract: We present REGA, a new adaptive-sampling-based algorithm for the control of finite-horizon Markov decision processes (MDPs) with very large state spaces and small action spaces. We apply a variant of the ϵ-greedy multiarmed bandit algorithm to each stage of the MDP in a recursive manner, thus computing an estimate of the “reward-to-go” value at each stage of the MDP. We provide a finite-time analysis of REGA. In particular, we provide a bound on the probability that the approximation error exceeds a given threshold, where the bound is given in terms of the number of samples collected at each stage of the MDP. We empirically compare REGA against another sampling-based algorithm called RASA by running simulations against the SysAdmin benchmark problem with 2^10 states. The results show that REGA and RASA achieved similar performance. Moreover, REGA and RASA empirically outperformed an implementation of the algorithm that uses the “original” ϵ-greedy algorithm that commonly appears in the literature.
      Citation: Stats
      PubDate: 2023-01-01
      DOI: 10.3390/stats6010006
      Issue No: Vol. 6, No. 1 (2023)
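      Note: for readers unfamiliar with the building block named in the abstract above, the following is a minimal sketch of the textbook ϵ-greedy multiarmed bandit rule. It is not REGA and not the variant analyzed in the paper; the function names (`epsilon_greedy`, `bernoulli`), the Bernoulli reward arms, and the value ϵ = 0.1 are illustrative assumptions.

```python
import random


def epsilon_greedy(reward_fns, n_rounds, epsilon=0.1, seed=0):
    """Textbook epsilon-greedy bandit (illustrative; not the REGA algorithm).

    reward_fns: list of callables, each taking a random.Random and returning a reward.
    Returns (estimated mean reward per arm, number of pulls per arm).
    """
    rng = random.Random(seed)
    k = len(reward_fns)
    counts = [0] * k
    means = [0.0] * k
    for _ in range(n_rounds):
        # Explore with probability epsilon, or while some arm is still untried.
        if rng.random() < epsilon or 0 in counts:
            arm = rng.randrange(k)
        else:
            # Exploit: pick the arm with the highest estimated mean so far.
            arm = max(range(k), key=lambda a: means[a])
        reward = reward_fns[arm](rng)
        counts[arm] += 1
        # Incremental update of the running sample mean for the chosen arm.
        means[arm] += (reward - means[arm]) / counts[arm]
    return means, counts


def bernoulli(p):
    """Hypothetical reward arm: pays 1 with probability p, otherwise 0."""
    return lambda rng: 1.0 if rng.random() < p else 0.0


means, counts = epsilon_greedy(
    [bernoulli(0.2), bernoulli(0.5), bernoulli(0.8)], n_rounds=5000
)
best_arm = max(range(3), key=lambda a: counts[a])
```

      In this toy setting the arm with the highest success probability should end up with most of the pulls, while the ϵ fraction of random pulls keeps the estimates for the other arms from going stale.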
  • Stats, Vol. 6, Pages 113-130: Change Point Detection by State Space Modeling of Long-Term Air Temperature Series in Europe

    • Authors: Magda Monteiro, Marco Costa
      First page: 113
      Abstract: This work presents the statistical analysis of monthly average temperature time series in several European cities using a state space approach, which considers models with a deterministic seasonal component and a stochastic trend. Temperature rise rates in Europe seem to have increased in the last decades when compared with longer periods. Therefore, change point detection methods, both parametric and non-parametric, were applied to the standardized residuals of the state space models (or some other related component) in order to identify these possible changes in the monthly temperature rise rates. All of the methods used identified at least one change point in each of the temperature time series, particularly in the late 1980s or early 1990s. The differences in the average temperature trend are more evident in Eastern European cities than in Western Europe. The smoother-based t-test framework proposed in this work showed an advantage over the other methods, precisely because it considers the time correlation present in the time series. Moreover, this framework focuses the change point detection on the stochastic trend component.
      Citation: Stats
      PubDate: 2023-01-04
      DOI: 10.3390/stats6010007
      Issue No: Vol. 6, No. 1 (2023)
  • Stats, Vol. 6, Pages 131-147: Statistical Prediction of Future Sports Records Based on Record Values

    • Authors: Christina Empacher, Udo Kamps, Grigoriy Volovskiy
      First page: 131
      Abstract: Point prediction of future record values based on sequences of previous lower or upper records is considered by means of the method of maximum product of spacings, where the underlying distribution is assumed to be a power function distribution and a Pareto distribution, respectively. Moreover, exact and approximate prediction intervals are discussed and compared with regard to their expected lengths and their percentages of coverage. The focus is on deriving explicit expressions in the point and interval prediction procedures. Predictions and forecasts are of interest, e.g., in sports analytics, which is gaining more and more attention in several sports disciplines. Previous works on forecasting athletic records have mainly been based on extreme value theory. The presented statistical prediction methods are exemplarily applied to data from various disciplines of athletics as well as to data from American football based on fantasy football points according to the points per reception scoring scheme. The results are discussed along with basic assumptions and the choice of underlying distributions.
      Citation: Stats
      PubDate: 2023-01-11
      DOI: 10.3390/stats6010008
      Issue No: Vol. 6, No. 1 (2023)
  • Stats, Vol. 6, Pages 148-149: Acknowledgment to Reviewers of Stats in 2022

    • Authors: Stats Editorial Office
      First page: 148
      Abstract: High-quality academic publishing is built on rigorous peer review [...]
      Citation: Stats
      PubDate: 2023-01-12
      DOI: 10.3390/stats6010009
      Issue No: Vol. 6, No. 1 (2023)
  • Stats, Vol. 6, Pages 150-168: A Novel Flexible Class of Intervened Poisson Distribution by Lagrangian Approach

    • Authors: Muhammed Rasheed Irshad, Mohanan Monisha, Christophe Chesneau, Radhakumari Maya, Damodaran Santhamani Shibu
      First page: 150
      Abstract: The zero-truncated Poisson distribution (ZTPD) generates a statistical model that could be appropriate when observations begin once at least one event occurs. The intervened Poisson distribution (IPD) is a substitute for the ZTPD, in which some intervention processes may change the mean of the rare events. These two zero-truncated distributions exhibit underdispersion (i.e., their variance is less than their mean). In this research, we offer an alternative solution for dealing with intervention problems by proposing a generalization of the IPD by a Lagrangian approach called the Lagrangian intervened Poisson distribution (LIPD), which in fact generalizes both the ZTPD and the IPD. As a notable feature, it has the ability to analyze both overdispersed and underdispersed datasets. In addition, the LIPD has a closed-form expression of all of its statistical characteristics, as well as an increasing, decreasing, bathtub-shaped, and upside-down bathtub-shaped hazard rate function. A substantial part is devoted to its statistical application. The maximum likelihood estimation method is considered, and the effectiveness of the estimates is demonstrated through a simulation study. To evaluate the significance of the new parameter in the LIPD, a generalized likelihood ratio test is performed. Subsequently, we present a new count regression model that is suitable for both overdispersed and underdispersed datasets using the mean-parametrized form of the LIPD. Additionally, the LIPD’s relevance and application are shown using real-world datasets.
      Citation: Stats
      PubDate: 2023-01-15
      DOI: 10.3390/stats6010010
      Issue No: Vol. 6, No. 1 (2023)
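      Note: the underdispersion claim in the abstract above can be checked numerically for the zero-truncated Poisson case. The sketch below uses the standard ZTPD results (mean λ/(1 − e^(−λ)), variance equal to mean × (1 + λ − mean)) and compares them with moments summed directly from the probability mass function. It covers only the ZTPD, not the IPD or the LIPD; the function names and the truncation point `k_max` are illustrative assumptions.

```python
import math


def ztp_pmf(k, lam):
    """P(X = k) for a zero-truncated Poisson with rate lam, k = 1, 2, ..."""
    return math.exp(-lam) * lam**k / (math.factorial(k) * (1.0 - math.exp(-lam)))


def ztp_mean(lam):
    """Closed-form mean of the zero-truncated Poisson."""
    return lam / (1.0 - math.exp(-lam))


def ztp_variance(lam):
    """Closed-form variance: mean * (1 + lam - mean)."""
    mu = ztp_mean(lam)
    return mu * (1.0 + lam - mu)


def numeric_moments(lam, k_max=100):
    """Mean and variance by direct summation of the pmf (truncated at k_max)."""
    ks = range(1, k_max + 1)
    m1 = sum(k * ztp_pmf(k, lam) for k in ks)
    m2 = sum(k * k * ztp_pmf(k, lam) for k in ks)
    return m1, m2 - m1 * m1


# For each rate: closed-form mean, closed-form variance, and the summed moments.
checks = {lam: (ztp_mean(lam), ztp_variance(lam), numeric_moments(lam))
          for lam in (0.5, 2.0, 5.0)}
```

      Because the truncated mean always exceeds λ, the factor (1 + λ − mean) is below 1, which is why the variance falls below the mean for every λ > 0.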
  • Stats, Vol. 6, Pages 169-191: Informative g-Priors for Mixed Models

    • Authors: Yu-Fang Chien, Haiming Zhou, Timothy Hanson, Theodore Lystig
      First page: 169
      Abstract: Zellner’s objective g-prior has been widely used in linear regression models due to its simple interpretation and computational tractability in evaluating marginal likelihoods. However, the g-prior further allows partitioning the prior variability explained by the linear predictor versus that of pure noise. In this paper, we propose a novel yet remarkably simple g-prior specification when a subject matter expert has information on the marginal distribution of the response y_i. The approach is extended for use in mixed models with some surprising but intuitive results. Simulation studies are conducted to compare the model fitting under the proposed g-prior with that under other existing priors.
      Citation: Stats
      PubDate: 2023-01-16
      DOI: 10.3390/stats6010011
      Issue No: Vol. 6, No. 1 (2023)
  • Stats, Vol. 6, Pages 192-208: Comparing Robust Linking and Regularized Estimation for Linking Two Groups in the 1PL and 2PL Models in the Presence of Sparse Uniform Differential Item Functioning

    • Authors: Alexander Robitzsch
      First page: 192
      Abstract: In the social sciences, the performance of two groups is frequently compared based on a cognitive test involving binary items. Item response models are often utilized for comparing the two groups. However, the presence of differential item functioning (DIF) can impact group comparisons. In order to avoid biased estimation of group differences, appropriate statistical methods for handling differential item functioning are required. This article compares the performance of regularized estimation and several robust linking approaches in three simulation studies that address the one-parameter logistic (1PL) and two-parameter logistic (2PL) models, respectively. The results show that robust linking approaches are at least as effective as the regularized estimation approach in most of the conditions in the simulation studies.
      Citation: Stats
      PubDate: 2023-01-25
      DOI: 10.3390/stats6010012
      Issue No: Vol. 6, No. 1 (2023)
  • Stats, Vol. 6, Pages 209-231: Bayesian Logistic Regression Model for

    • Authors: Lu Chen, Balgobin Nandram
      First page: 209
      Abstract: Many population-based surveys have binary responses from a large number of individuals in each household within small areas. One example is the Nepal Living Standards Survey (NLSS II), in which health status binary data (good versus poor) for each individual from sampled households (sub-areas) are available in the sampled wards (small areas). To make an inference for the finite population proportion of individuals in each household, we use the sub-area logistic regression model with reliable auxiliary information. The contribution of this model is twofold. First, we extend an area-level model to a sub-area level model. Second, because there are numerous sub-areas, standard Markov chain Monte Carlo (MCMC) methods to find the joint posterior density are very time-consuming. Therefore, we provide a sampling-based method, the integrated nested normal approximation (INNA), which permits fast computation. Our main goal is to describe this hierarchical Bayesian logistic regression model and to show that the computation is much faster than the exact MCMC method and also reasonably accurate. The performance of our method is studied by using NLSS II data. Our model can borrow strength from both areas and sub-areas to obtain more efficient and precise estimates. The hierarchical structure of our model captures the variation in the binary data reasonably well.
      Citation: Stats
      PubDate: 2023-01-29
      DOI: 10.3390/stats6010013
      Issue No: Vol. 6, No. 1 (2023)
  • Stats, Vol. 6, Pages 232-252: A New Class of Alternative Bivariate Kumaraswamy-Type Models: Properties and Applications

    • Authors: Indranil Ghosh
      First page: 232
      Abstract: In this article, we introduce two new bivariate Kumaraswamy (KW)-type distributions with univariate Kumaraswamy marginals (under certain parametric restrictions) that are less restrictive in nature compared with several other existing bivariate beta and beta-type distributions. Mathematical expressions for the joint and marginal density functions are presented, and properties such as the marginal and conditional distributions, product moments and conditional moments are obtained. Additionally, we show that both proposed bivariate probability models are positive likelihood ratio dependent, which makes them potential models for fitting positively dependent data in the bivariate domain. The method of maximum likelihood and the method of moments are used to derive the associated estimation procedure. An acceptance and rejection sampling plan to draw random samples from one of the proposed models along with a simulation study are also provided. For illustrative purposes, two real data sets are reanalyzed from different domains to exhibit the applicability of the proposed models in comparison with several other bivariate probability distributions, which are defined on [0,1]×[0,1].
      Citation: Stats
      PubDate: 2023-01-30
      DOI: 10.3390/stats6010014
      Issue No: Vol. 6, No. 1 (2023)
  • Stats, Vol. 6, Pages 253-267: Farlie–Gumbel–Morgenstern Bivariate Moment Exponential Distribution and Its Inferences Based on Concomitants of Order Statistics

    • Authors: Sasikumar Padmini Arun, Christophe Chesneau, Radhakumari Maya, Muhammed Rasheed Irshad
      First page: 253
      Abstract: In this research, we design the Farlie–Gumbel–Morgenstern bivariate moment exponential distribution, a bivariate analogue of the moment exponential distribution, using the Farlie–Gumbel–Morgenstern approach. With the analysis of real-life data, the competitiveness of the Farlie–Gumbel–Morgenstern bivariate moment exponential distribution in comparison with the other Farlie–Gumbel–Morgenstern distributions is discussed. Based on the Farlie–Gumbel–Morgenstern bivariate moment exponential distribution, we develop the distribution theory of concomitants of order statistics and derive the best linear unbiased estimator of the parameter associated with the variable of primary interest (study variable). Evaluations are also conducted regarding the efficiency comparison of the best linear unbiased estimator relative to the respective unbiased estimator. Additionally, empirical illustrations of the best linear unbiased estimator with respect to the unbiased estimator are performed.
      Citation: Stats
      PubDate: 2023-02-03
      DOI: 10.3390/stats6010015
      Issue No: Vol. 6, No. 1 (2023)
  • Stats, Vol. 6, Pages 268-278: Point Cloud Registration via Heuristic Reward Reinforcement Learning

    • Authors: Bingren Chen
      First page: 268
      Abstract: This paper proposes a heuristic reward reinforcement learning framework for point cloud registration. As an essential step of many 3D computer vision tasks such as object recognition and 3D reconstruction, point cloud registration has been well studied in the existing literature. This paper contributes to the literature by addressing the limitations of embedding and reward functions in existing methods. An improved state-embedding module and a stochastic reward function are proposed. While the embedding module enriches the captured characteristics of states, the newly designed reward function follows a time-dependent searching strategy, which allows aggressive attempts at the beginning and tends to be conservative in the end. We assess our method based on two public datasets (ModelNet40 and ScanObjectNN) and real-world data. The results confirm the strength of the new method in reducing errors in object rotation and translation, leading to more precise point cloud registration.
      Citation: Stats
      PubDate: 2023-02-06
      DOI: 10.3390/stats6010016
      Issue No: Vol. 6, No. 1 (2023)
  • Stats, Vol. 6, Pages 1-16: A Semiparametric Tilt Optimality Model

    • Authors: Chathurangi H. Pathiravasan, Bhaskar Bhattacharya
      First page: 1
      Abstract: Practitioners often face the situation of comparing any set of k distributions, which may follow neither normality nor equality of variances. We propose a semiparametric model to compare those distributions using an exponential tilt method. This extends the classical analysis of variance models when all distributions are unknown by relaxing its assumptions. The proposed model is optimal when one of the distributions is known. Large-sample estimates of the model parameters are derived, and the hypotheses for the equality of the distributions are tested for one-at-a-time and simultaneous comparison cases. Real data examples from NASA meteorology experiments and social credit card limits are analyzed to illustrate our approach. The proposed approach is shown to be preferable in a simulated power comparison with existing parametric and nonparametric methods.
      Citation: Stats
      PubDate: 2022-12-22
      DOI: 10.3390/stats6010001
      Issue No: Vol. 6, No. 1 (2022)
  • Stats, Vol. 6, Pages 17-29: Data Cloning Estimation and Identification of a Medium-Scale DSGE Model

    • Authors: Pedro Chaim, Márcio Poletti Laurini
      First page: 17
      Abstract: We apply the data cloning method to estimate a medium-scale dynamic stochastic general equilibrium model. The data cloning algorithm is a numerical method that employs replicas of the original sample to approximate the maximum likelihood estimator as the limit of Bayesian simulation-based estimators. We also analyze the identification properties of the model. We measure the individual identification strength of each parameter by observing the posterior volatility of data cloning estimates and assess the identification problem globally through the maximum eigenvalue of the posterior data cloning covariance matrix. Our results corroborate existing evidence suggesting that the DSGE model of Smets and Wouters is only poorly identified. The model displays weak global identification properties, and many of its parameters seem locally ill-identified.
      Citation: Stats
      PubDate: 2022-12-24
      DOI: 10.3390/stats6010002
      Issue No: Vol. 6, No. 1 (2022)
  • Stats, Vol. 6, Pages 30-49: Estimating Smoothness and Optimal Bandwidth for Probability Density Functions

    • Authors: Dimitris N. Politis, Peter F. Tarassenko, Vyacheslav A. Vasiliev
      First page: 30
      Abstract: The properties of non-parametric kernel estimators for probability density functions from two special classes are investigated. Each class is parametrized by a distribution smoothness parameter. One of the classes was introduced by Rosenblatt; the other is introduced in this paper. For the case of a known smoothness parameter, the rates of mean square convergence of optimal (in the bandwidth) density estimators are found. For the case of an unknown smoothness parameter, an estimation procedure for the parameter is developed and its almost sure convergence is proved. The convergence rates in the almost sure sense of these estimators are obtained. Adaptive estimators of densities from the given class on the basis of the constructed smoothness parameter estimators are presented. It is shown in examples how parameters of the adaptive density estimation procedures can be chosen. Non-asymptotic and asymptotic properties of these estimators are investigated. Specifically, the upper bounds for the mean square error of the adaptive density estimators for a fixed sample size are found and their strong consistency is proved. The convergence of these estimators in the almost sure sense is established. Simulation results illustrate the realization of the asymptotic behavior when the sample size grows large.
      Citation: Stats
      PubDate: 2022-12-27
      DOI: 10.3390/stats6010003
      Issue No: Vol. 6, No. 1 (2022)
  • Stats, Vol. 6, Pages 50-66: Do Deep Reinforcement Learning Agents Model

    • Authors: Tambet Matiisen, Aqeel Labash, Daniel Majoral, Jaan Aru, Raul Vicente
      First page: 50
      Abstract: Inferring other agents’ mental states, such as their knowledge, beliefs and intentions, is thought to be essential for effective interactions with other agents. Recently, multi-agent systems trained via deep reinforcement learning have been shown to succeed in solving various tasks. Still, how each agent models or represents other agents in their environment remains unclear. In this work, we test whether deep reinforcement learning agents trained with the multi-agent deep deterministic policy gradient (MADDPG) algorithm explicitly represent other agents’ intentions (their specific aims or plans) during a task in which the agents have to coordinate the covering of different spots in a 2D environment. In particular, we tracked over time the performance of a linear decoder trained to predict the final targets of all agents from the hidden-layer activations of each agent’s neural network controller. We observed that the hidden layers of agents represented explicit information about other agents’ intentions, i.e., the target landmark the other agent ended up covering. We also performed a series of experiments in which some agents were replaced by others with fixed targets to test the levels of generalization of the trained agents. We noticed that during the training phase, the agents developed a preference for each landmark, which hindered generalization. To alleviate the above problem, we evaluated simple changes to the MADDPG training algorithm which lead to better generalization against unseen agents. Our method for confirming intention modeling in deep learning agents is simple to implement and can be used to improve the generalization of multi-agent systems in fields such as robotics, autonomous vehicles and smart cities.
      Citation: Stats
      PubDate: 2022-12-28
      DOI: 10.3390/stats6010004
      Issue No: Vol. 6, No. 1 (2022)
  • Stats, Vol. 6, Pages 67-98: Applying the Multilevel Approach in Estimation of Income Population Differences

    • Authors: Venera Timiryanova, Dina Krasnoselskaya, Natalia Kuzminykh
      First page: 67
      Abstract: Income inequality remains one of the most burning issues discussed in the world. The difficulty of the problem arises from its multiple manifestations at regional and local levels and unique patterns within countries. This paper employs a multilevel approach to identify factors that influence income and wage inequalities at regional and municipal scales in Russia. We carried out the study on data from 2017 municipalities of 75 Russian regions from 2015 to 2019. A Hierarchical Linear Model with Cross-Classified Random Effects (HLMHCM) allowed us to establish that most of the total variance in population income and average wages is accounted for by the regional scale. Our analysis revealed different variances of income per capita and average wage; we disclosed the reasons for these disparities. We also found a mixed relationship between income inequality and social transfers. These variables influence income growth but change the relationship between income and labour productivity. Our study underlined that the impacts of shares of employees in agriculture and manufacturing should be considered together with labour productivity in these industries.
      Citation: Stats
      PubDate: 2022-12-29
      DOI: 10.3390/stats6010005
      Issue No: Vol. 6, No. 1 (2022)
  • Stats, Vol. 5, Pages 934-947: Benford Networks

    • Authors: Roeland de Kok, Giulia Rotundo
      First page: 934
      Abstract: The Benford law applied within complex networks is an interesting area of research. This paper proposes a new algorithm for the generation of a Benford network based on priority rank, and further specifies the formal definition. The condition to be taken into account is the probability density of the node degree. In addition to this first algorithm, an iterative algorithm is proposed based on rewiring. Its development requires the introduction of an ad hoc measure for understanding how far an arbitrary network is from a Benford network. The definition is a semi-distance and does not lead to a distance in mathematical terms, instead serving to identify the Benford network as a class. The semi-distance is a function of the network; it is computationally less expensive than the degree of conformity and serves to set a descent condition for the rewiring. The algorithm stops when it meets the condition that either the network is Benford or the maximum number of iterations is reached. The second condition is needed because only a limited set of densities allow for a Benford network. Another important topic is assortativity and the extremes which can be achieved by constraining the network topology; for this reason, we ran simulations on artificial networks and explored further theoretical settings as preliminary work on models of preferential attachment. Based on our extensive analysis, the first proposed algorithm remains the best one from a computational point of view.
      Citation: Stats
      PubDate: 2022-09-30
      DOI: 10.3390/stats5040054
      Issue No: Vol. 5, No. 4 (2022)
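      Note: as background to the abstract above, Benford's law assigns first digit d the probability log10(1 + 1/d). The sketch below compares the first-digit frequencies of a list of node degrees with those probabilities using a simple L1 gap. This gap is an illustrative stand-in chosen here, not the semi-distance or the degree of conformity defined in the paper, and all function names are assumptions.

```python
import math
from collections import Counter


def benford_probs():
    """Benford's law: P(first digit = d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(n):
    """Leading decimal digit of a positive integer."""
    return int(str(n)[0])


def first_digit_freqs(degrees):
    """Empirical first-digit frequencies of the positive node degrees."""
    digits = [first_digit(d) for d in degrees if d > 0]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


def l1_gap(degrees):
    """Illustrative L1 gap between empirical and Benford first-digit frequencies.

    This is NOT the semi-distance from the paper; it is only a simple proxy.
    """
    p = benford_probs()
    q = first_digit_freqs(degrees)
    return sum(abs(p[d] - q[d]) for d in range(1, 10))


# Hypothetical degree sequences: powers of two are known to follow Benford's
# law closely, while degrees 1..9 give a uniform first-digit profile.
gap_powers = l1_gap([2**k for k in range(200)])
gap_uniform = l1_gap(list(range(1, 10)))
```

      A degree sequence whose gap is near zero has a Benford-like first-digit profile; the uniform sequence gives a visibly larger gap.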
  • Stats, Vol. 5, Pages 948-969: On the Bivariate Composite Gumbel–Pareto Distribution

    • Authors: Alexandra Badea, Catalina Bolancé, Raluca Vernic
      First page: 948
      Abstract: In this paper, we propose a bivariate extension of univariate composite (two-spliced) distributions defined by a bivariate Pareto distribution for values larger than some thresholds and by a bivariate Gumbel distribution on the complementary domain. The purpose of this distribution is to capture the behavior of bivariate data consisting of mainly small and medium values but also of some extreme values. Some properties of the proposed distribution are presented. Further, two estimation procedures are discussed and illustrated on simulated data and on a real data set consisting of a bivariate sample of claims from an auto insurance portfolio. In addition, the risk of loss in this insurance portfolio is estimated by Monte Carlo simulation.
      Citation: Stats
      PubDate: 2022-10-16
      DOI: 10.3390/stats5040055
      Issue No: Vol. 5, No. 4 (2022)
  • Stats, Vol. 5, Pages 970-976: Ordinal Cochran-Mantel-Haenszel Testing and
           Nonparametric Analysis of Variance: Competing Methodologies

    • Authors: J. C. W. Rayner, G. C. Livingston
      First page: 970
      Abstract: The Cochran-Mantel-Haenszel (CMH) and nonparametric analysis of variance (NP ANOVA) methodologies are both sets of tests for categorical response data. The latter are competitor tests for the ordinal CMH tests in which the response variable is necessarily ordinal; the treatment variable may be either ordinal or nominal. The CMH mean score test seeks to detect mean treatment differences, while the CMH correlation test assesses ordinary or (1, 1) generalized correlation. Since the corresponding nonparametric ANOVA tests assess arbitrary univariate and bivariate moments, the ordinal CMH tests have been extended to enable a fuller comparison. The CMH tests are conditional tests, assuming that certain marginal totals in the data table are known. They have been extended to have unconditional analogues. The NP ANOVA tests are unconditional. Here, we give a brief overview of both methodologies to address the question “which methodology is preferable?”
      Citation: Stats
      PubDate: 2022-10-17
      DOI: 10.3390/stats5040056
      Issue No: Vol. 5, No. 4 (2022)
  • Stats, Vol. 5, Pages 977-984: Extreme Tail Ratios and Overrepresentation
           among Subpopulations with Normal Distributions

    • Authors: Theodore P. Hill, Ronald F. Fox
      First page: 977
      Abstract: Given several different populations, the relative proportions of each in the high (or low) end of the distribution of a given characteristic are often more important than the overall average values or standard deviations. In the case of two different normally-distributed random variables, as is shown here, one of the (right) tail ratios will not only eventually be greater than 1 from some point on, but will even become infinitely large. More generally, in every finite mixture of different normal distributions, there will always be exactly one of those distributions that is not only overrepresented in the right tail of the mixture but even completely overwhelms all other subpopulations in the rightmost tails. This property (and the analogous result for the left tails), although not unique to normal distributions, is not shared by other common continuous centrally symmetric unimodal distributions, such as Laplace, nor even by other bell-shaped distributions, such as Cauchy (Lorentz) distributions.
      Citation: Stats
      PubDate: 2022-10-20
      DOI: 10.3390/stats5040057
      Issue No: Vol. 5, No. 4 (2022)
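The tail-ratio behaviour described in the abstract above can be checked numerically. The following is a minimal sketch, not taken from the paper: the two populations (equal means, standard deviations 1.1 and 1.0), the cutoffs, and the function names `normal_sf` and `right_tail_ratio` are illustrative assumptions. It uses only `math.erfc` from the Python standard library.

```python
import math

def normal_sf(x, mu, sigma):
    """Survival function P(X > x) for X ~ N(mu, sigma^2)."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))

def right_tail_ratio(x, mu_a, sigma_a, mu_b, sigma_b):
    """Ratio P(A > x) / P(B > x) of two normal right tails."""
    return normal_sf(x, mu_a, sigma_a) / normal_sf(x, mu_b, sigma_b)

# Hypothetical populations: same mean, A slightly more variable than B.
cutoffs = [1.0, 2.0, 4.0, 6.0, 8.0]
ratios = [right_tail_ratio(x, 0.0, 1.1, 0.0, 1.0) for x in cutoffs]

for x, r in zip(cutoffs, ratios):
    print(f"cutoff={x:4.1f}  tail ratio A/B={r:.3g}")
```

Under these assumed parameters the ratio exceeds 1 at every cutoff and keeps growing as the cutoff moves further into the right tail, which is the qualitative pattern the abstract describes.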
  • Stats, Vol. 5, Pages 985-992: Snooker Statistics and Zipf’s Law

    • Authors: Wim Hordijk
      First page: 985
      Abstract: Zipf’s law is well known in linguistics: the frequency of a word is inversely proportional to its rank. This is a special case of a more general power law, a common phenomenon in many kinds of real-world statistical data. Here, it is shown that snooker statistics also follow such a mathematical pattern, but with varying parameter values. Two types of rankings (prize money earned and centuries scored), and three different time frames (all-time, decade, and year) are considered. The results indicate that the power law parameter values depend on the type of ranking used, as well as the time frame considered. Furthermore, in some cases, the resulting parameter values vary significantly over time, for which a plausible explanation is provided. Finally, it is shown how individual rankings can be described somewhat more accurately using a log-normal distribution, but that the overall conclusions derived from the power law analysis remain valid.
      Citation: Stats
      PubDate: 2022-10-21
      DOI: 10.3390/stats5040058
      Issue No: Vol. 5, No. 4 (2022)
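Zipf's law, as stated in the abstract above, says that frequency is inversely proportional to rank, a power law with exponent 1. The sketch below is an illustration only: it fits the exponent by ordinary least squares on log-log values, which is one simple choice and not necessarily the fitting method used in the paper. The data are synthetic (not snooker statistics), and the function name `fit_power_law` is hypothetical.

```python
import math

def fit_power_law(ranks, freqs):
    """Least-squares fit of log(freq) = log(c) - s * log(rank).

    Returns (s, c): the power-law exponent and the scale constant.
    """
    xs = [math.log(r) for r in ranks]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)

# Synthetic data following an exact Zipf law: freq = 1000 / rank.
ranks = list(range(1, 51))
freqs = [1000.0 / r for r in ranks]

exponent, scale = fit_power_law(ranks, freqs)
print(f"exponent={exponent:.4f}  scale={scale:.2f}")
```

On real rankings the fitted exponent would generally differ from 1, which is the kind of parameter variation the abstract reports across ranking types and time frames.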
  • Stats, Vol. 5, Pages 993-1003: Comparison of Positivity in Two Epidemic
           Waves of COVID-19 in Colombia with FDA

    • Authors: Cristhian Leonardo Urbano-Leon, Manuel Escabias
      First page: 993
      Abstract: We use the functional data methodology to examine whether there are significant differences between two waves of contagion by COVID-19 in Colombia between 7 July 2020 and 20 July 2021. A pointwise functional t-test is initially used, then an alternative statistical test proposal for paired samples is presented, which has a theoretical distribution and performs well in small samples. As an advantage, our statistical test generates a scalar p-value, which provides a global idea about the significance of the differences between the positivity curves, complementing the existing pointwise tests.
      Citation: Stats
      PubDate: 2022-10-28
      DOI: 10.3390/stats5040059
      Issue No: Vol. 5, No. 4 (2022)
  • Stats, Vol. 5, Pages 1004-1028: A Novel Generalization of Zero-Truncated
           Binomial Distribution by Lagrangian Approach with Applications for the
           COVID-19 Pandemic

    • Authors: Muhammed Rasheed Irshad, Christophe Chesneau, Damodaran Santhamani Shibu, Mohanan Monisha, Radhakumari Maya
      First page: 1004
      Abstract: The importance of Lagrangian distributions and their applicability in real-world events have been highlighted in several studies. In light of this, we create a new zero-truncated Lagrangian distribution. It is presented as a generalization of the zero-truncated binomial distribution (ZTBD) and hence named the Lagrangian zero-truncated binomial distribution (LZTBD). The moments, probability generating function, factorial moments, as well as skewness and kurtosis measures of the LZTBD are discussed. We also show that the new model’s finite mixture is identifiable. The unknown parameters of the LZTBD are estimated using the maximum likelihood method. A broad simulation study is executed as an evaluation of the well-established performance of the maximum likelihood estimates. The likelihood ratio test is used to assess the effectiveness of the third parameter in the new model. Six COVID-19 datasets are used to demonstrate the LZTBD’s applicability, and we conclude that the LZTBD is very competitive on the fitting objective.
      Citation: Stats
      PubDate: 2022-10-30
      DOI: 10.3390/stats5040060
      Issue No: Vol. 5, No. 4 (2022)
  • Stats, Vol. 5, Pages 1029-1043: Spatial Analysis: A Socioeconomic View on
           the Incidence of the New Coronavirus in Paraná-Brazil

    • Authors: Elizabeth Giron Cima, Miguel Angel Uribe Opazo, Marcos Roberto Bombacini, Weimar Freire da Rocha Junior, Luciana Pagliosa Carvalho Guedes
      First page: 1029
      Abstract: This paper presents a spatial analysis of the incidence rate of COVID-19 cases in the state of Paraná, Brazil, from June to December 2020, and a study of the incidence rate of COVID-19 cases associated with socioeconomic variables, such as the Gini index, Theil-L index, and municipal human development index (MHDI). The data were provided from the Paraná State Health Department and Paraná Institute for Economic and Social Development. For the study of spatial autocorrelation, the univariate global Moran index (I), local univariate Moran (LISA), global Geary (c), and univariate local Geary (ci) were calculated. For the analysis of the spatial correlation, the global bivariate Moran index (Ixy), the local multivariate Geary indices (CiM), and the bivariate Lee index (Lxy) were calculated. There is significant positive spatial autocorrelation between the incidence rate of COVID-19 cases and correlations between the incidence rate of COVID-19 cases and the Gini index, Theil-L index, and MHDI in the regions under study. The highest risk areas were concentrated in the macro-regions: east and west. Understanding the spatial distribution of COVID-19, combined with economic and social factors, can contribute to greater efficiency in preventive actions and the control of new viral epidemics.
      Citation: Stats
      PubDate: 2022-10-31
      DOI: 10.3390/stats5040061
      Issue No: Vol. 5, No. 4 (2022)
  • Stats, Vol. 5, Pages 1044-1061: Product Recalls in European Textile and

    • Authors: Vijay Kumar
      First page: 1044
      Abstract: Textile and clothing (T&C) products contribute to a substantial proportion of the nonfood product recalls in the European Union (EU) due to various levels of associated risks. Out of the listed 34 categories for product recalls in the EU’s Rapid Exchange of Information System (RAPEX), the category ‘clothing, textiles, and fashion items’ was among the top 3 categories with the most recall cases during 2013–2019. Previous studies have attempted to highlight the issue of product recalls and their impacts from the perspective of a single company or selected companies, whereas limited attention has been paid to understanding the problem from a sector-specific perspective. However, considering the nature of product risks and the consistently high number of recall cases, it is important to analyze the issue of product recalls in the T&C sector from a sector-specific perspective. In this context, the paper focuses on investigating the past recalls in the T&C sector reported in RAPEX during 2005–2021 to understand the major trends in recall occurrence and associated hazards. Correspondence Analysis (CA) and Latent Dirichlet Allocation (LDA) were applied to analyze the qualitative and quantitative recall data. The results reveal that there is a geographical pattern for the product risk that leads to the recalls. The countries in the eastern part of Europe tend to have proportionately high recalls in strangulation and choking-related issues, whereas chemical-related recalls are proportionately high in countries located in the western part of Europe. Further, text-mining results indicate that design-related recall issues are more prevalent in children’s clothing.
      Citation: Stats
      PubDate: 2022-10-31
      DOI: 10.3390/stats5040062
      Issue No: Vol. 5, No. 4 (2022)
  • Stats, Vol. 5, Pages 1062-1078: Bayesian Hierarchical Copula Models with a
           Dirichlet–Laplace Prior

    • Authors: Paolo Onorati, Brunero Liseo
      First page: 1062
      Abstract: We discuss a Bayesian hierarchical copula model for clusters of financial time series. A similar approach has been developed in a recent paper. However, the prior distributions proposed there do not always provide a proper posterior. In order to circumvent the problem, we adopt a proper global–local shrinkage prior, which is also able to account for potential dependence structures among different clusters. The performance of the proposed model is presented via simulations and a real data analysis.
      Citation: Stats
      PubDate: 2022-11-01
      DOI: 10.3390/stats5040063
      Issue No: Vol. 5, No. 4 (2022)
  • Stats, Vol. 5, Pages 1079-1096: Bias-Corrected Maximum Likelihood
           Estimation and Bayesian Inference for the Process Performance Index Using
           Inverse Gaussian Distribution

    • Authors: Tzong-Ru Tsai, Hua Xin, Ya-Yen Fan, Yuhlong Lio
      First page: 1079
      Abstract: In this study, the estimation methods of bias-corrected maximum likelihood (BCML), bootstrap BCML (B-BCML) and Bayesian using Jeffrey’s prior distribution were proposed for the inverse Gaussian distribution with small sample cases to obtain the ML and Bayes estimators of the model parameters and the process performance index based on the lower specification process performance index. Moreover, an approximate confidence interval and the highest posterior density interval of the process performance index were established via the delta and Bayesian inference methods, respectively. To overcome the computational difficulty of sampling from the posterior distribution in Bayesian inference, the Markov chain Monte Carlo approach was used to implement the proposed Bayesian inference procedures. Monte Carlo simulations were conducted to evaluate the performance of the proposed BCML, B-BCML and Bayesian estimation methods. An example of the active repair times for an airborne communication transceiver is used for illustration.
      Citation: Stats
      PubDate: 2022-11-05
      DOI: 10.3390/stats5040064
      Issue No: Vol. 5, No. 4 (2022)
  • Stats, Vol. 5, Pages 1097-1112: Selected Payback Statistical Contributions
           to Matrix/Linear Algebra: Some Counterflowing Conceptualizations

    • Authors: Daniel A. Griffith
      First page: 1097
      Abstract: Matrix/linear algebra continues bestowing benefits on theoretical and applied statistics, a practice it began decades ago (re Fisher used the word matrix in a 1941 publication), through a myriad of contributions, from recognition of a suite of matrix properties relevant to statistical concepts, to matrix specifications of linear and nonlinear techniques. Consequently, focused parts of matrix algebra are topics of several statistics books and journal articles. Contributions mostly have been unidirectional, from matrix/linear algebra to statistics. Nevertheless, statistics offers great potential for making this interface a bidirectional exchange point, the theme of this review paper. Not surprisingly, regression, the workhorse of statistics, provides one tool for such historically based recompense. Another prominent one is the mathematical matrix theory eigenfunction abstraction. A third is special matrix operations, such as Kronecker sums and products. A fourth is multivariable calculus linkages, especially arcane matrix/vector operators as well as the Jacobian term associated with variable transformations. A fifth, and the final idea this paper treats, is random matrices/vectors within the context of simulation, particularly for correlated data. These are the five prospectively reviewed discipline of statistics subjects capable of informing, inspiring, or otherwise furnishing insight to the far more general world of linear algebra.
      Citation: Stats
      PubDate: 2022-11-09
      DOI: 10.3390/stats5040065
      Issue No: Vol. 5, No. 4 (2022)
  • Stats, Vol. 5, Pages 1113-1129: Conditional Kaplan–Meier Estimator
           with Functional Covariates for Time-to-Event Data

    • Authors: Sudaraka Tholkage, Qi Zheng, Karunarathna B. Kulasekera
      First page: 1113
      Abstract: Due to the wide availability of functional data from multiple disciplines, the studies of functional data analysis have become popular in the recent literature. However, the related development in censored survival data has been relatively sparse. In this work, we consider the problem of analyzing time-to-event data in the presence of functional predictors. We develop a conditional generalized Kaplan–Meier (KM) estimator that incorporates functional predictors using kernel weights and rigorously establish its asymptotic properties. In addition, we propose to select the optimal bandwidth based on a time-dependent Brier score. We then carry out extensive numerical studies to examine the finite sample performance of the proposed functional KM estimator and bandwidth selector. We also illustrate the practical usage of our proposed method using a data set from the Alzheimer’s Disease Neuroimaging Initiative.
      Citation: Stats
      PubDate: 2022-11-10
      DOI: 10.3390/stats5040066
      Issue No: Vol. 5, No. 4 (2022)
  • Stats, Vol. 5, Pages 1130-1144: On the Sampling Size for Inverse Sampling

    • Authors: Daniele Cuntrera, Vincenzo Falco, Ornella Giambalvo
      First page: 1130
      Abstract: In the Big Data era, sampling remains a central theme. This paper investigates the characteristics of inverse sampling on two different datasets (real and simulated) to determine when big data become too small for inverse sampling to be used and to examine the impact of the sampling rate of the subsamples. We find that the method, using the appropriate subsample size for both the mean and proportion parameters, performs well with a smaller dataset than big data through the simulation study and real-data application. Different settings related to the selection bias severity are considered during the simulation study and real application.
      Citation: Stats
      PubDate: 2022-11-15
      DOI: 10.3390/stats5040067
      Issue No: Vol. 5, No. 4 (2022)
  • Stats, Vol. 5, Pages 1145-1158: A New Predictive Algorithm for Time Series
           Forecasting Based on Machine Learning Techniques: Evidence for Decision
           Making in Agriculture and Tourism Sectors

    • Authors: Borrero, Mariscal, Vargas-Sánchez
      First page: 1145
      Abstract: Accurate time series prediction techniques are becoming fundamental to modern decision support systems. As massive data processing develops in its practicality, machine learning (ML) techniques applied to time series can automate and improve prediction models. The radical novelty of this paper is the development of a hybrid model that combines a new approach to the classical Kalman filter with machine learning techniques, i.e., support vector regression (SVR) and nonlinear autoregressive (NAR) neural networks, to improve the performance of existing predictive models. The proposed hybrid model uses, on the one hand, an improved Kalman filter method that eliminates the convergence problems of time series data with large error variance and, on the other hand, an ML algorithm as a correction factor to predict the model error. The results reveal that our hybrid models obtain accurate predictions, substantially reducing the root mean square and absolute mean errors compared to the classical and alternative Kalman filter models and achieving a goodness of fit greater than 0.95. Furthermore, the generalization of this algorithm was confirmed by its validation in two different scenarios.
      Citation: Stats
      PubDate: 2022-11-16
      DOI: 10.3390/stats5040068
      Issue No: Vol. 5, No. 4 (2022)
  • Stats, Vol. 5, Pages 1159-1173: A Weibull-Beta Prime Distribution to Model
           COVID-19 Data with the Presence of Covariates and Censored Data

    • Authors: Elisângela C. Biazatti, Gauss M. Cordeiro, Gabriela M. Rodrigues, Edwin M. M. Ortega, Luís H. de Santana
      First page: 1159
      Abstract: Motivated by the recent popularization of the beta prime distribution, a more flexible generalization is presented to fit symmetrical or asymmetrical and bimodal data, and a non-monotonic failure rate. Thus, the Weibull-beta prime distribution is defined, and some of its structural properties are obtained. The parameters are estimated by maximum likelihood, and a new regression model is proposed. Some simulations reveal that the estimators are consistent, and applications to censored COVID-19 data show the adequacy of the models.
      Citation: Stats
      PubDate: 2022-11-17
      DOI: 10.3390/stats5040069
      Issue No: Vol. 5, No. 4 (2022)
  • Stats, Vol. 5, Pages 1174-1194: Closed Form Bayesian Inferences for Binary
           Logistic Regression with Applications to American Voter Turnout

    • Authors: Kevin Dayaratna, Jesse Crosson, Chandler Hubbard
      First page: 1174
      Abstract: Understanding the factors that influence voter turnout is a fundamentally important question in public policy and political science research. Bayesian logistic regression models are useful for incorporating individual level heterogeneity to answer these and many other questions. When these questions involve incorporating individual level heterogeneity for large data sets that include many demographic and ethnic subgroups, however, standard Markov Chain Monte Carlo (MCMC) sampling methods to estimate such models can be quite slow and impractical to perform in a reasonable amount of time. We present an innovative closed form Empirical Bayesian approach that is significantly faster than MCMC methods, thus enabling the estimation of voter turnout models that had previously been considered computationally infeasible. Our results shed light on factors impacting voter turnout data in the 2000, 2004, and 2008 presidential elections. We conclude with a discussion of these factors and the associated policy implications. We emphasize, however, that although our application is to the social sciences, our approach is fully generalizable to the myriads of other fields involving statistical models with binary dependent variables and high-dimensional parameter spaces as well.
      Citation: Stats
      PubDate: 2022-11-17
      DOI: 10.3390/stats5040070
      Issue No: Vol. 5, No. 4 (2022)
  • Stats, Vol. 5, Pages 1195-1212: Model Validation of a Single
           Degree-of-Freedom Oscillator: A Case Study

    • Authors: Edward Boone, Jan Hannig, Ryad Ghanam, Sujit Ghosh, Fabrizio Ruggeri, Serge Prudhomme
      First page: 1195
      Abstract: In this paper, we investigate a validation process in order to assess the predictive capabilities of a single degree-of-freedom oscillator. Model validation is understood here as the process of determining the accuracy with which a model can predict observed physical events or important features of the physical system. Therefore, assessment of the model needs to be performed with respect to the conditions under which the model is used in actual simulations of the system and to specific quantities of interest used for decision-making. Model validation also supposes that the model be trained and tested against experimental data. In this work, virtual data are produced from a non-linear single degree-of-freedom oscillator, the so-called oracle model, which is supposed to provide an accurate representation of reality. The mathematical model to be validated is derived from the oracle model by simply neglecting the non-linear term. The model parameters are identified via Bayesian updating. This calibration process also includes a modeling error due to model misspecification and modeled as a normal probability density function with zero mean and standard deviation to be calibrated.
      Citation: Stats
      PubDate: 2022-11-18
      DOI: 10.3390/stats5040071
      Issue No: Vol. 5, No. 4 (2022)
  • Stats, Vol. 5, Pages 1212-1220: On the Relation between Lambert W-Function
           and Generalized Hypergeometric Functions

    • Authors: Pushpa Narayan Rathie, Luan Carlos de Sena Monteiro Ozelim
      First page: 1212
      Abstract: In the theory of special functions, finding correlations between different types of functions is of great interest as unifying results, especially when considering issues such as analytic continuation. In the present paper, the relation between Lambert W-function and generalized hypergeometric functions is discussed. It will be shown that it is possible to link these functions by following two different strategies, namely, by means of the direct and inverse Mellin transform of Lambert W-function and by solving the trinomial equation originally studied by Lambert and Euler. The new results can be used both to numerically evaluate Lambert W-function and to study its analytic structure.
      Citation: Stats
      PubDate: 2022-11-23
      DOI: 10.3390/stats5040072
      Issue No: Vol. 5, No. 4 (2022)
  • Stats, Vol. 5, Pages 1221-1230: Assessing Regional Entrepreneurship: A
           Bootstrapping Approach in Data Envelopment Analysis

    • Authors: Ioannis E. Tsolas
      First page: 1221
      Abstract: The aim of the present paper is to demonstrate the viability of using data envelopment analysis (DEA) in a regional context to evaluate entrepreneurial activities. DEA was used to assess regional entrepreneurship in Greece using individual measures of entrepreneurship as inputs and employment rates as outputs. In addition to point estimates, a bootstrap algorithm was used to produce bias-corrected metrics. In the light of the results of the study, the Greek regions perform differently in terms of converting entrepreneurial activity into job creation. Moreover, there is some evidence that unemployment may be a driver of entrepreneurship and thus negatively affects DEA-based inefficiency. The derived indicators can serve as diagnostic tools and can also be used for the design of various interventions at the regional level.
      Citation: Stats
      PubDate: 2022-11-28
      DOI: 10.3390/stats5040073
      Issue No: Vol. 5, No. 4 (2022)
  • Stats, Vol. 5, Pages 1231-1241: A Bootstrap Method for a
           Multiple-Imputation Variance Estimator in Survey Sampling

    • Authors: Yu, Zhao
      First page: 1231
      Abstract: Rubin’s variance estimator of the multiple imputation estimator for a domain mean is not asymptotically unbiased. Kim et al. derived the closed-form bias for Rubin’s variance estimator. In addition, they proposed an asymptotically unbiased variance estimator for the multiple imputation estimator when the imputed values can be written as a linear function of the observed values. However, this requires the assumption that the covariance of the imputed values in the same imputed dataset is twice that in the different imputed datasets. In this study, we propose a bootstrap variance estimator that does not need this assumption. Both theoretical argument and simulation studies show that it is unbiased and asymptotically valid. The new method is applied to the Hox pupil popularity data for illustration.
      Citation: Stats
      PubDate: 2022-11-29
      DOI: 10.3390/stats5040074
      Issue No: Vol. 5, No. 4 (2022)
  • Stats, Vol. 5, Pages 1242-1253: A Bayesian One-Sample Test for Proportion

    • Authors: Luai Al-Labadi, Yifan Cheng, Forough Fazeli-Asl, Kyuson Lim, Yanqing Weng
      First page: 1242
      Abstract: This paper deals with a new Bayesian approach to the one-sample test for proportion. More specifically, let x=(x1,…,xn) be an independent random sample of size n from a Bernoulli distribution with an unknown parameter θ. For a fixed value θ0, the goal is to test the null hypothesis H0:θ=θ0 against all possible alternatives. The proposed approach is based on using the well-known formula of the Kullback–Leibler divergence between two binomial distributions chosen in a certain way. Then, the difference of the distance from a priori to a posteriori is compared through the relative belief ratio (a measure of evidence). Some theoretical properties of the method are developed. Examples and simulation results are included.
      Citation: Stats
      PubDate: 2022-12-01
      DOI: 10.3390/stats5040075
      Issue No: Vol. 5, No. 4 (2022)
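The abstract above relies on the Kullback–Leibler divergence between two binomial distributions. As a hedged illustration of that ingredient only, the sketch below evaluates the standard closed form KL(Bin(n, p) || Bin(n, q)) = n [p log(p/q) + (1 − p) log((1 − p)/(1 − q))] and compares it with a direct sum over the probability mass functions. It does not reproduce the paper's specific choice of binomial distributions or the relative belief ratio; the function names and the example values of n, p, and q are assumptions.

```python
import math

def kl_binomial(n, p, q):
    """Closed-form KL(Bin(n, p) || Bin(n, q)) for 0 < p, q < 1."""
    if not (0.0 < p < 1.0 and 0.0 < q < 1.0):
        raise ValueError("p and q must lie strictly between 0 and 1")
    return n * (p * math.log(p / q)
                + (1.0 - p) * math.log((1.0 - p) / (1.0 - q)))

def binomial_pmf(k, n, p):
    """P(K = k) for K ~ Bin(n, p)."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def kl_binomial_by_sum(n, p, q):
    """Same divergence computed directly from the definition."""
    total = 0.0
    for k in range(n + 1):
        pk = binomial_pmf(k, n, p)
        qk = binomial_pmf(k, n, q)
        total += pk * math.log(pk / qk)
    return total

# Illustrative values (assumptions, not from the paper).
n, p, q = 20, 0.3, 0.5
closed_form = kl_binomial(n, p, q)
direct_sum = kl_binomial_by_sum(n, p, q)
print(f"closed form={closed_form:.6f}  direct sum={direct_sum:.6f}")
```

The two computations should agree up to floating-point error, and the divergence is zero when p equals q.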
  • Stats, Vol. 5, Pages 1254-1270: Addressing Disparities in the Propensity
           Score Distributions for Treatment Comparisons from Observational Studies

    • Authors: Tingting Zhou, Michael R. Elliott, Roderick J. A. Little
      First page: 1254
      Abstract: Propensity score (PS) based methods, such as matching, stratification, regression adjustment, simple and augmented inverse probability weighting, are popular for controlling for observed confounders in observational studies of causal effects. More recently, we proposed penalized spline of propensity prediction (PENCOMP), which multiply-imputes outcomes for unassigned treatments using a regression model that includes a penalized spline of the estimated selection probability and other covariates. For PS methods to work reliably, there should be sufficient overlap in the propensity score distributions between treatment groups. Limited overlap can result in fewer subjects being matched or in extreme weights causing numerical instability and bias in causal estimation. The problem of limited overlap suggests (a) defining alternative estimands that restrict inferences to subpopulations where all treatments have the potential to be assigned, and (b) excluding or down-weighting sample cases where the propensity to receive one of the compared treatments is close to zero. We compared PENCOMP and other PS methods for estimation of alternative causal estimands when limited overlap occurs. Simulations suggest that, when there are extreme weights, PENCOMP tends to outperform the weighted estimators for ATE and performs similarly to the weighted estimators for alternative estimands. We illustrate PENCOMP in two applications: the effect of antiretroviral treatments on CD4 counts using the Multicenter AIDS cohort study (MACS) and whether right heart catheterization (RHC) is a beneficial treatment in treating critically ill patients.
      Citation: Stats
      PubDate: 2022-12-02
      DOI: 10.3390/stats5040076
      Issue No: Vol. 5, No. 4 (2022)
  • Stats, Vol. 5, Pages 1271-1293: The Lookup Table Regression Model for
           Histogram-Valued Symbolic Data

    • Authors: Manabu Ichino
      First page: 1271
      Abstract: This paper presents the Lookup Table Regression Model (LTRM) for histogram-valued symbolic data. We first transform the given symbolic data to a numerical data table by the quantile method. Then, under the selected response variable, we apply the Monotone Blocks Segmentation (MBS) to the obtained numerical data table. If the selected response variable and some remaining explanatory variable(s) form a monotone structure, the MBS generates a Lookup Table composed of interval values. For a given object, we search for the nearest value of an explanatory variable; the corresponding value of the response variable then becomes the estimated value. If the response variable and the explanatory variable(s) covary but follow a non-monotonic structure, we need to divide the given data into several monotone substructures. For this purpose, we apply hierarchical conceptual clustering to the given data, and we obtain Multiple Lookup Tables by applying the MBS to each of the substructures. We show the usefulness of the proposed method by using an artificial data set and real data sets.
      Citation: Stats
      PubDate: 2022-12-04
      DOI: 10.3390/stats5040077
      Issue No: Vol. 5, No. 4 (2022)
  • Stats, Vol. 5, Pages 1294-1304: Regression Models for Lifetime Data: An

    • Authors: Chrys Caroni
      First page: 1294
      Abstract: Two methods dominate the regression analysis of time-to-event data: the accelerated failure time model and the proportional hazards model. Broadly speaking, these predominate in reliability modelling and biomedical applications, respectively. However, many other methods have been proposed, including proportional odds, proportional mean residual life and several other “proportional” models. This paper presents an overview of the field and the concept behind each of these ideas. Multi-parameter modelling is also discussed, in which (in contrast to, say, the proportional hazards model) more than one parameter of the lifetime distribution may depend on covariates. This includes first hitting time (or threshold) regression based on an underlying latent stochastic process. Many of the methods that have been proposed have seen little or no practical use. Lack of user-friendly software is certainly a factor in this. Diagnostic methods are also lacking for most methods.
      Citation: Stats
      PubDate: 2022-12-07
      DOI: 10.3390/stats5040078
      Issue No: Vol. 5, No. 4 (2022)
  • Stats, Vol. 5, Pages 1305-1320: Extracting Proceedings Data from Court
           Cases with Machine Learning

    • Authors: Bruno Mathis
      First page: 1305
      Abstract: France is rolling out an open data program for all court cases, but with few metadata attached. Reusers will have to use named-entity recognition (NER) within the text body of the case to extract any value from it. Any court case may include up to 26 variables, or labels, that are related to the proceeding, regardless of the case substance. These labels are from different syntactic types: some of them are rare; others are ubiquitous. This experiment compares different algorithms, namely CRF, SpaCy, Flair and DeLFT, to extract proceedings data and uses the learning model assessment capabilities of Kairntech, an NLP platform. It shows that an NER model can apply to this large and diverse set of labels and extract data of high quality. We achieved an 87.5% F1 measure with Flair trained on more than 27,000 manual annotations. Quality may yet be improved by combining NER models by data type.
      Citation: Stats
      PubDate: 2022-12-13
      DOI: 10.3390/stats5040079
      Issue No: Vol. 5, No. 4 (2022)
  • Stats, Vol. 5, Pages 1321-1333: Robust Testing of Paired Outcomes
           Incorporating Covariate Effects in Clustered Data with Informative Cluster

    • Authors: Dutta
      First page: 1321
      Abstract: Paired outcomes are common in correlated clustered data where the main aim is to compare the distributions of the outcomes in a pair. In such clustered paired data, informative cluster sizes can occur when the number of pairs in a cluster (i.e., a cluster size) is correlated with the paired outcomes or the paired differences. There have been some attempts to develop robust rank-based tests for comparing paired outcomes in such complex clustered data. Most of these existing rank tests developed for paired outcomes in clustered data compare the marginal distributions in a pair and ignore any covariate effect on the outcomes. However, when potentially important covariate data are available in observational studies, ignoring these covariate effects on the outcomes can result in flawed inference. In this article, using rank-based weighted estimating equations, we propose a robust procedure for covariate-adjusted comparison of paired outcomes in clustered data that can also address the issue of informative cluster size. Through simulated scenarios and real-life neuroimaging data, we demonstrate the importance of considering covariate effects during paired testing and the robust performance of our proposed method in covariate-adjusted paired comparisons in complex clustered data settings.
      Citation: Stats
      PubDate: 2022-12-14
      DOI: 10.3390/stats5040080
      Issue No: Vol. 5, No. 4 (2022)
  • Stats, Vol. 5, Pages 1334-1353: Statistical Analysis in the Presence of
           Spatial Autocorrelation: Selected Sampling Strategy Effects

    • Authors: Daniel A. Griffith, Richard E. Plant
      First page: 1334
      Abstract: Fundamental to most classical data collection sampling theory development is the random drawings assumption requiring that each targeted population member has a known sample selection (i.e., inclusion) probability. Frequently, however, unrestricted random sampling of spatially autocorrelated data is impractical and/or inefficient. Instead, randomly choosing a population subset accounts for its exhibited spatial pattern by utilizing a grid, which often provides improved parameter estimates, such as the geographic landscape mean, at least via its precision. Unfortunately, spatial autocorrelation latent in these data can produce a questionable mean and/or standard error estimate because each sampled population member contains information about its nearby members, a data feature explicitly acknowledged in model-based inference, but ignored in design-based inference. This autocorrelation effect prompted the development of formulae for calculating an effective sample size (i.e., the equivalent number of sample selections from a geographically randomly distributed population that would yield the same sampling error) estimate. Some researchers recently challenged this and other aspects of spatial statistics as being incorrect/invalid/misleading. This paper seeks to address this category of misconceptions, demonstrating that the effective geographic sample size is a valid and useful concept regardless of the inferential basis invoked. Its spatial statistical methodology builds upon the preceding ingredients.
      Citation: Stats
      PubDate: 2022-12-16
      DOI: 10.3390/stats5040081
      Issue No: Vol. 5, No. 4 (2022)
  • Stats, Vol. 5, Pages 583-605: Quantile Regression Approach for Analyzing
           Similarity of Gene Expressions under Multiple Biological Conditions

    • Authors: Dianliang Deng, Mashfiqul Huq Chowdhury
      First page: 583
      Abstract: Temporal gene expression data contain ample information to characterize gene function and are now widely used in bio-medical research. A dense temporal gene expression usually shows various patterns in expression levels under different biological conditions. The existing literature investigates the gene trajectory using the mean function. However, temporal gene expression curves usually show a strong degree of heterogeneity under multiple conditions. As a result, rates of change for gene expressions may be different in non-central locations and a mean function model may not capture the non-central location of the gene expression distribution. Further, the mean regression model depends on the normality assumptions of the error terms of the model, which may be impractical when analyzing gene expression data. In this research, a linear quantile mixed model is used to find the trajectory of gene expression data. This method enables the changes in gene expression over time to be studied by estimating a family of quantile functions. A statistical test is proposed to test the similarity between two different gene expressions based on estimated parameters using a quantile model. Then, the performance of the proposed test statistic is examined using extensive simulation studies. Simulation studies demonstrate the good statistical performance of this proposed test statistic and show that this method is robust to violations of the normal error assumption. As an illustration, the proposed method is applied to analyze a dataset of 18 genes in P. aeruginosa, expressed in 24 biological conditions. Furthermore, a minimum Mahalanobis distance is used to find the clustering tree for gene expressions.
      Citation: Stats
      PubDate: 2022-07-02
      DOI: 10.3390/stats5030036
      Issue No: Vol. 5, No. 3 (2022)
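The quantile idea in the abstract above can be illustrated with the check (pinball) loss that quantile regression minimizes. The following is a minimal sketch for a constant fit only, not the linear quantile mixed model used in the paper; the function names and the data values are made up for illustration.

```python
def check_loss(residual, tau):
    """Check loss: tau * r when r >= 0, (tau - 1) * r when r < 0."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def total_check_loss(values, candidate, tau):
    """Sum of check losses when every value is predicted by `candidate`."""
    return sum(check_loss(v - candidate, tau) for v in values)


def best_constant(values, tau):
    """A minimizer of the check loss is always attained at a data point."""
    return min(sorted(values), key=lambda c: total_check_loss(values, c, tau))


# Hypothetical expression levels with one extreme value.
expression = [1.0, 2.0, 3.0, 4.0, 100.0]
median_fit = best_constant(expression, 0.5)  # tau = 0.5 targets the median
upper_fit = best_constant(expression, 0.9)   # tau = 0.9 targets an upper quantile
mean_fit = sum(expression) / len(expression)
```

With these values the median fit stays at a central data point while the mean is pulled towards the extreme value, which is the kind of heterogeneity a mean function alone does not describe.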
  • Stats, Vol. 5, Pages 606-616: A Log-Det Heuristics for Covariance Matrix
           Estimation: The Analytic Setup

    • Authors: Enrico Bernardi, Matteo Farnè
      First page: 606
      Abstract: This paper studies a new nonconvex optimization problem aimed at recovering high-dimensional covariance matrices with a low rank plus sparse structure. The objective is composed of a smooth nonconvex loss and a nonsmooth composite penalty. A number of structural analytic properties of the new heuristics are presented and proven, thus providing the necessary framework for further investigating the statistical applications. In particular, the first and the second derivative of the smooth loss are obtained, its local convexity range is derived, and the Lipschitzianity of its gradient is shown. This opens the path to solve the described problem via a proximal gradient algorithm.
      Citation: Stats
      PubDate: 2022-07-05
      DOI: 10.3390/stats5030037
      Issue No: Vol. 5, No. 3 (2022)
  • Stats, Vol. 5, Pages 617-630: Semiparametric Survival Analysis of 30-Day
           Hospital Readmissions with Bayesian Additive Regression Kernel Model

    • Authors: Sounak Chakraborty, Peng Zhao, Yilun Huang, Tanujit Dey
      First page: 617
      Abstract: In this paper, we introduce a kernel-based nonlinear Bayesian model for a right-censored survival outcome data set. Our kernel-based approach provides a flexible nonparametric modeling framework to explore nonlinear relationships between predictors with right-censored survival outcome data. Our proposed kernel-based model is shown to provide excellent predictive performance via several simulation studies and real-life examples. Unplanned hospital readmissions greatly impair patients’ quality of life and have imposed a significant economic burden on American society. In this paper, we focus our application on predicting 30-day readmissions of patients. Our survival Bayesian additive regression kernel model (survival BARK or sBARK) improves the timeliness of readmission preventive intervention through a data-driven approach.
      Citation: Stats
      PubDate: 2022-07-14
      DOI: 10.3390/stats5030038
      Issue No: Vol. 5, No. 3 (2022)
  • Stats, Vol. 5, Pages 631-672: Comparing the Robustness of the Structural
           after Measurement (SAM) Approach to Structural Equation Modeling (SEM)
           against Local Model Misspecifications with Alternative Estimation

    • Authors: Alexander Robitzsch
      First page: 631
      Abstract: Structural equation models (SEM), or confirmatory factor analysis as a special case, contain model parameters at the measurement part and the structural part. In most social-science SEM applications, all parameters are simultaneously estimated in a one-step approach (e.g., with maximum likelihood estimation). In a recent article, Rosseel and Loh (2022, Psychol. Methods) proposed a two-step structural after measurement (SAM) approach to SEM that estimates the parameters of the measurement model in the first step and the parameters of the structural model in the second step. Rosseel and Loh claimed that SAM is more robust to local model misspecifications (i.e., cross loadings and residual correlations) than one-step maximum likelihood estimation. In this article, it is demonstrated with analytical derivations and simulation studies that SAM is generally not more robust to misspecifications than one-step estimation approaches. Alternative estimation methods are proposed that provide more robustness to misspecifications. SAM suffers from finite-sample bias that depends on the size of factor reliability and factor correlations. A bootstrap-bias-corrected LSAM estimate provides less biased estimates in finite samples. Nevertheless, we argue in the discussion section that applied researchers should adopt SAM because robustness to local misspecifications is an irrelevant property when applying SAM. Parameter estimates in a structural model are of interest because intentionally misspecified SEMs frequently offer clearly interpretable factors. In contrast, SEMs with some empirically driven model modifications will result in biased estimates of the structural parameters because the meaning of factors is unintentionally changed.
      Citation: Stats
      PubDate: 2022-07-22
      DOI: 10.3390/stats5030039
      Issue No: Vol. 5, No. 3 (2022)
  • Stats, Vol. 5, Pages 673-688: Multivariate Global-Local Priors for Small
           Area Estimation

    • Authors: Tamal Ghosh, Malay Ghosh, Jerry J. Maples, Xueying Tang
      First page: 673
      Abstract: It is now widely recognized that small area estimation (SAE) needs to be model-based. Global-local (GL) shrinkage priors for random effects are important in sparse situations where many areas’ level effects do not have a significant impact on the response beyond what is offered by covariates. We propose in this paper a hierarchical multivariate model with GL priors. We prove the propriety of the posterior density when the regression coefficient matrix has an improper uniform prior. Some concentration inequalities are derived for the tail probabilities of the shrinkage estimators. The proposed method is illustrated via both data analysis and simulations.
      Citation: Stats
      PubDate: 2022-07-25
      DOI: 10.3390/stats5030040
      Issue No: Vol. 5, No. 3 (2022)
  • Stats, Vol. 5, Pages 689-713: A Variable Selection Method for Small Area
           Estimation Modeling of the Proficiency of Adult Competency

    • Authors: Weijia Ren, Jianzhu Li, Andreea Erciulescu, Tom Krenzke, Leyla Mohadjer
      First page: 689
      Abstract: In statistical modeling, it is crucial to have consistent variables that are the most relevant to the outcome variable(s) of interest in the model. With the increasing richness of data from multiple sources, the size of the pool of potential variables is escalating. Some variables, however, could provide redundant information, add noise to the estimation, or waste the degrees of freedom in the model. Therefore, variable selection is needed as a parsimonious process that aims to identify a minimal set of covariates for maximum predictive power. This study illustrated the variable selection methods considered and used in the small area estimation (SAE) modeling of measures related to the proficiency of adult competency that were constructed using survey data collected in the first cycle of the PIAAC. The developed variable selection process consisted of two phases: phase 1 identified a small set of variables that were consistently highly correlated with the outcomes through methods such as correlation matrix and multivariate LASSO analysis; phase 2 utilized a k-fold cross-validation process to select a final set of variables to be used in the final SAE models.
      Citation: Stats
      PubDate: 2022-07-27
      DOI: 10.3390/stats5030041
      Issue No: Vol. 5, No. 3 (2022)
  • Stats, Vol. 5, Pages 714-737: Reciprocal Data Transformations and Their

    • Authors: Daniel A. Griffith
      First page: 714
      Abstract: Variable transformations have a long and celebrated history in statistics, one that was rather academically glamorous at least until generalized linear models theory eclipsed their nurturing normal curve theory role. Still, today it continues to be a covered topic in introductory mathematical statistics courses, offering worthwhile pedagogic insights to students about certain aspects of traditional and contemporary statistical theory and methodology. Since its inception in the 1930s, it has been plagued by a paucity of adequate back-transformation formulae for inverse/reciprocal functions. A literature search exposes that, to date, the inequality E(1/X) ≤ 1/E(X), which often has a sizeable gap captured by the inequality part of its relationship, is the solitary contender for solving this problem. After documenting that inverse data transformations are anything but a rare occurrence, this paper proposes an innovative, elegant back-transformation solution based upon the Kummer confluent hypergeometric function of the first kind. This paper also derives formal back-transformation formulae for the Manly transformation, something apparently never done before. Much related future research remains to be undertaken; this paper furnishes numerous clues about what some of these endeavors need to be.
      Citation: Stats
      PubDate: 2022-07-30
      DOI: 10.3390/stats5030042
      Issue No: Vol. 5, No. 3 (2022)
  • Stats, Vol. 5, Pages 738-754: Model-Based Estimates for Farm Labor

    • Authors: Lu Chen, Nathan B. Cruze, Linda J. Young
      First page: 738
      Abstract: The United States Department of Agriculture’s (USDA’s) National Agricultural Statistics Service (NASS) conducts the Farm Labor Survey to produce estimates of the number of workers, duration of the workweek, and wage rates for all agricultural workers. Traditionally, expert opinion is used to integrate auxiliary information, such as the previous year’s estimates, with the survey’s direct estimates. Alternatively, implementing small area models for integrating survey estimates with additional sources of information provides more reliable official estimates and valid measures of uncertainty for each type of estimate. In this paper, several hierarchical Bayesian subarea-level models are developed in support of different estimates of interest in the Farm Labor Survey. A 2020 case study illustrates the improvement of the direct survey estimates for areas with small sample sizes by using auxiliary information and borrowing information across areas and subareas. The resulting framework provides a complete set of coherent estimates for all required geographic levels. These methods were incorporated into the official Farm Labor publication for the first time in 2020.
      Citation: Stats
      PubDate: 2022-08-03
      DOI: 10.3390/stats5030043
      Issue No: Vol. 5, No. 3 (2022)
  • Stats, Vol. 5, Pages 755-772: Poisson Extended Exponential Distribution
           with Associated INAR(1) Process and Applications

    • Authors: Radhakumari Maya, Christophe Chesneau, Anuresha Krishna, Muhammed Rasheed Irshad
      First page: 755
      Abstract: The significance of count data modeling and its applications to real-world phenomena have been highlighted in several research studies. The present study focuses on a two-parameter discrete distribution that can be obtained by compounding the Poisson and extended exponential distributions. It has tractable and explicit forms for its statistical properties. The maximum likelihood estimation method is used to estimate the unknown parameters. An extensive simulation study was also performed. In this paper, the significance of the proposed distribution is demonstrated in a count regression model and in a first-order integer-valued autoregressive process, referred to as the INAR(1) process. In addition to this, the empirical importance of the proposed model is proved through three real-data applications, and the empirical findings indicate that the proposed INAR(1) model provides better results than other competitive models for time series of counts that display overdispersion.
      Citation: Stats
      PubDate: 2022-08-05
      DOI: 10.3390/stats5030044
      Issue No: Vol. 5, No. 3 (2022)
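For readers unfamiliar with the INAR(1) process mentioned in the abstract above, the recursion X_t = alpha o X_{t-1} + e_t with binomial thinning can be sketched as follows. This is only an illustration: plain Poisson innovations are assumed here for simplicity, whereas the paper uses its Poisson extended exponential distribution, and all function names and parameter values are hypothetical.

```python
import math
import random


def poisson_draw(rng, lam):
    """Poisson variate via Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def binomial_thinning(rng, alpha, count):
    """alpha o count: each of `count` units survives with probability alpha."""
    return sum(1 for _ in range(count) if rng.random() < alpha)


def simulate_inar1(alpha, lam, n_steps, seed=0):
    """Simulate X_t = alpha o X_{t-1} + e_t with Poisson(lam) innovations."""
    rng = random.Random(seed)
    x = poisson_draw(rng, lam / (1.0 - alpha))  # start near the stationary mean
    path = []
    for _ in range(n_steps):
        x = binomial_thinning(rng, alpha, x) + poisson_draw(rng, lam)
        path.append(x)
    return path


path = simulate_inar1(alpha=0.5, lam=2.0, n_steps=20000)
sample_mean = sum(path) / len(path)
stationary_mean = 2.0 / (1.0 - 0.5)  # lam / (1 - alpha) for this Poisson case
```

Under the Poisson assumption the marginal distribution is equidispersed, so a different innovation distribution is needed to reproduce the overdispersion that the abstract refers to.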
  • Stats, Vol. 5, Pages 773-783: Neutrosophic F-Test for Two Counts of Data
           from the Poisson Distribution with Application in Climatology

    • Authors: Muhammad Aslam
      First page: 773
      Abstract: This paper addresses the modification of the F-test for count data following the Poisson distribution. The F-test when the count data are expressed in intervals is considered in this paper. The proposed F-test is evaluated using real data from climatology. The comparative study showed the efficiency of the F-test for count data under neutrosophic statistics over the F-test for count data under classical statistics.
      Citation: Stats
      PubDate: 2022-08-12
      DOI: 10.3390/stats5030045
      Issue No: Vol. 5, No. 3 (2022)
  • Stats, Vol. 5, Pages 784-804: Autoregressive Models with Time-Dependent
           Coefficients—A Comparison between Several Approaches

    • Authors: Rajae Azrak, Guy Mélard
      First page: 784
      Abstract: Autoregressive-moving average (ARMA) models with time-dependent (td) coefficients and marginally heteroscedastic innovations provide a natural alternative to stationary ARMA models. Several theories have been developed in the last 25 years for parametric estimations in that context. In this paper, we focus on time-dependent autoregressive (tdAR) models and consider one of the estimation theories in that case. We also provide an alternative theory for tdAR processes that relies on a mixing property. We compare these theories to the Dahlhaus theory for locally stationary processes and the Bibi and Francq theory, made essentially for cyclically time-dependent models, with our own theory. Regarding existing theories, there are differences in the basic assumptions (e.g., on derivability with respect to time or with respect to parameters) that are better seen in specific cases such as the tdAR(1) process. There are also differences in terms of asymptotics, as shown by an example. Our opinion is that the field of application can play a role in choosing one of the theories. This paper is completed by simulation results that show that the asymptotic theory can be used even for short series (fewer than 50 observations).
      Citation: Stats
      PubDate: 2022-08-12
      DOI: 10.3390/stats5030046
      Issue No: Vol. 5, No. 3 (2022)
  • Stats, Vol. 5, Pages 805-818: Deriving the Optimal Strategy for the Two
           Dice Pig Game via Reinforcement Learning

    • Authors: Tian Zhu, Merry H. Ma
      First page: 805
      Abstract: Games of chance have historically played a critical role in the development and teaching of probability theory and game theory, and, in the modern age, computer programming and reinforcement learning. In this paper, we derive the optimal strategy for playing the two-dice game Pig, both the standard version and its variant with doubles, coined “Double-Trouble”, using certain fundamental concepts of reinforcement learning, especially the Markov decision process and dynamic programming. We further compare the newly derived optimal strategy to other popular play strategies in terms of the winning chances and the order of play. In particular, we compare to the popular “hold at n” strategy, which is considered to be close to the optimal strategy, especially for the best n, for each type of Pig Game. For the standard two-player, two-dice, sequential Pig Game examined here, we found that “hold at 23” is the best choice, with the average winning chance against the optimal strategy being 0.4747. For the “Double-Trouble” version, we found that the “hold at 18” is the best choice, with the average winning chance against the optimal strategy being 0.4733. Furthermore, time in terms of turns to play each type of game is also examined for practical purposes. For optimal vs. optimal or optimal vs. the best “hold at n” strategy, we found that the average number of turns is 19, 23, and 24 for one-die Pig, standard two-dice Pig, and the “Double-Trouble” two-dice Pig games, respectively. We hope our work will inspire students of all ages to invest in the field of reinforcement learning, which is crucial for the development of artificial intelligence and robotics and, subsequently, for the future of humanity.
      Citation: Stats
      PubDate: 2022-08-17
      DOI: 10.3390/stats5030047
      Issue No: Vol. 5, No. 3 (2022)
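The "hold at n" strategy discussed in the abstract above can be sketched for a single player. The rules coded here are an assumption based on the commonly described two-dice Pig (a single 1 forfeits the turn total, double 1s forfeit the whole score); the paper's exact rules, its "Double-Trouble" variant, and its optimal strategy are not reproduced. Function names and the goal of 100 points are illustrative.

```python
import random


def play_turn(hold_at, banked, rng, goal=100):
    """Play one 'hold at n' turn and return the banked score afterwards."""
    turn_total = 0
    # Keep rolling until the turn total reaches n or the goal is secured.
    while turn_total < hold_at and banked + turn_total < goal:
        a, b = rng.randint(1, 6), rng.randint(1, 6)
        if a == 1 and b == 1:
            return 0        # assumed rule: double ones wipe the whole score
        if a == 1 or b == 1:
            return banked   # assumed rule: a single one loses the turn total
        turn_total += a + b
    return banked + turn_total


def turns_to_reach(goal, hold_at, seed=0):
    """Count the turns a lone player needs to bank at least `goal` points."""
    rng = random.Random(seed)
    banked, turns = 0, 0
    while banked < goal:
        banked = play_turn(hold_at, banked, rng, goal)
        turns += 1
    return turns


solo_turns = turns_to_reach(goal=100, hold_at=23, seed=1)
```

Averaging `turns_to_reach` over many seeds gives a rough solitaire turn count; it is not comparable to the two-player figures quoted in the abstract, which depend on the opponent's play.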
  • Stats, Vol. 5, Pages 819-840: A New Bivariate INAR(1) Model with
           Time-Dependent Innovation Vectors

    • Authors: Huaping Chen, Fukang Zhu, Xiufang Liu
      First page: 819
      Abstract: Recently, there has been a growing interest in integer-valued time series models, especially in multivariate models. Motivated by the diversity of the infinite-patch metapopulation models, we propose an extension to the popular bivariate INAR(1) model, whose innovation vector is assumed to be time-dependent in the sense that the mean of the innovation vector is linearly increased by the previous population size. We discuss the stationarity and ergodicity of the observed process and its subprocesses. We consider the conditional maximum likelihood estimate of the parameters of interest, and establish their large-sample properties. The finite sample performance of the estimator is assessed via simulations. Applications on crime data illustrate the model.
      Citation: Stats
      PubDate: 2022-08-19
      DOI: 10.3390/stats5030048
      Issue No: Vol. 5, No. 3 (2022)
  • Stats, Vol. 5, Pages 841-855: A New Benford Test for Clustered Data with
           Applications to American Elections

    • Authors: Katherine M. Anderson, Kevin Dayaratna, Drew Gonshorowski, Steven J. Miller
      First page: 841
      Abstract: A frequent problem with classic first digit applications of Benford’s law is the law’s inapplicability to clustered data, which becomes especially problematic for analyzing election data. This study offers a novel adaptation of Benford’s law by performing a first digit analysis after converting vote counts from election data to base 3 (referred to throughout the paper as 1-BL 3), spreading out the data and thus rendering the law significantly more useful. We test the efficacy of our approach on synthetic election data using discrete Weibull modeling, finding that in many cases election data conform to 1-BL 3. Lastly, we apply 1-BL 3 analysis to selected states from the 2004 US Presidential election to detect potential statistical anomalies.
      Citation: Stats
      PubDate: 2022-08-31
      DOI: 10.3390/stats5030049
      Issue No: Vol. 5, No. 3 (2022)
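The base-3 first digit analysis described in the abstract above can be sketched as follows. In base 3 the leading digit is either 1 or 2, and the general Benford probability for leading digit d in base b is log_b(1 + 1/d). The vote counts below are made up, and the function names are illustrative rather than taken from the paper.

```python
import math


def first_digit(value, base=3):
    """Leading digit of a positive integer written in the given base."""
    if value <= 0:
        raise ValueError("value must be a positive integer")
    while value >= base:
        value //= base
    return value


def benford_probability(digit, base=3):
    """Benford first-digit probability log_base(1 + 1/digit)."""
    return math.log(1.0 + 1.0 / digit) / math.log(base)


def first_digit_shares(counts, base=3):
    """Observed share of each possible leading digit in the given base."""
    digits = [first_digit(c, base) for c in counts]
    return {d: digits.count(d) / len(digits) for d in range(1, base)}


# Hypothetical precinct vote counts, for illustration only.
votes = [412, 1875, 96, 2250, 733, 58]
observed = first_digit_shares(votes)
expected = {d: benford_probability(d) for d in (1, 2)}
```

Comparing `observed` with `expected` (about 0.631 for digit 1 and 0.369 for digit 2) would normally be done with a goodness-of-fit test on far more than six counts.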
  • Stats, Vol. 5, Pages 856-881: Modeling Realized Variance with Realized

    • Authors: Hiroyuki Kawakatsu
      First page: 856
      Abstract: This paper proposes a model for realized variance that exploits information in realized quarticity. The realized variance and quarticity measures are both highly persistent and highly correlated with each other. The proposed model incorporates information from the observed realized quarticity process via autoregressive conditional variance dynamics. It exploits conditional dependence in higher order (fourth) moments, in analogy to the way the class of GARCH models exploits conditional dependence in second moments.
      Citation: Stats
      PubDate: 2022-09-07
      DOI: 10.3390/stats5030050
      Issue No: Vol. 5, No. 3 (2022)
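The two measures named in the abstract above are commonly computed from intraday returns as the sum of squared returns (realized variance) and n/3 times the sum of fourth powers (realized quarticity). The sketch below uses these textbook definitions as an assumption; the paper may use a different scaling, and the return values are made up.

```python
def realized_variance(returns):
    """Sum of squared intraday returns."""
    return sum(r * r for r in returns)


def realized_quarticity(returns):
    """(n / 3) times the sum of fourth powers of intraday returns."""
    n = len(returns)
    return (n / 3.0) * sum(r ** 4 for r in returns)


# Hypothetical intraday log returns for one trading day.
intraday_returns = [0.01, -0.02, 0.015, -0.005]
rv = realized_variance(intraday_returns)
rq = realized_quarticity(intraday_returns)
```

Both measures are dominated by the largest absolute returns, which is one reason they tend to move together from day to day.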
  • Stats, Vol. 5, Pages 881-897: Using Small Area Estimation to Produce
           Official Statistics

    • Authors: Linda J. Young, Lu Chen
      First page: 881
      Abstract: The USDA National Agricultural Statistics Service (NASS) and other federal statistical agencies have used probability-based surveys as the foundation for official statistics for over half a century. Non-survey data that can be used to improve the accuracy and precision of estimates such as administrative, remotely sensed, and retail data have become increasingly available. Both frequentist and Bayesian models are used to combine survey and non-survey data in a principled manner. NASS has recently adopted Bayesian subarea models for three of its national programs: farm labor, crop county estimates, and cash rent county estimates. Each program provides valuable estimates at multiple scales of geography. For each program, technical challenges had to be met and a strenuous review completed before models could be adopted as the foundation for official statistics. Moving models out of the research phase into production required major changes in the production process and a cultural shift. With the implemented models, NASS now has measures of uncertainty, transparency, and reproducibility of its official statistics.
      Citation: Stats
      PubDate: 2022-09-08
      DOI: 10.3390/stats5030051
      Issue No: Vol. 5, No. 3 (2022)
  • Stats, Vol. 5, Pages 898-915: Smoothing County-Level Sampling Variances to
           Improve Small Area Models’ Outputs

    • Authors: Lu Chen, Luca Sartore, Habtamu Benecha, Valbona Bejleri, Balgobin Nandram
      First page: 898
      Abstract: The use of hierarchical Bayesian small area models, which take survey estimates along with auxiliary data as input to produce official statistics, has increased in recent years. Survey estimates for small domains are usually unreliable due to small sample sizes, and the corresponding sampling variances can also be imprecise and unreliable. This affects the performance of the model (i.e., the model will not produce an estimate or will produce a low-quality modeled estimate), which results in a reduced number of official statistics published by a government agency. To mitigate the unreliable sampling variances, these survey-estimated variances are typically modeled against the direct estimates wherever a relationship between the two is present. However, this is not always the case. This paper explores different alternatives to mitigate the unreliable (beyond some threshold) sampling variances. A Bayesian approach under the area-level model set-up and a distribution-free technique based on bootstrap sampling are proposed to update the survey data. An application to the county-level corn yield data from the County Agricultural Production Survey of the United States Department of Agriculture’s (USDA’s) National Agricultural Statistics Service (NASS) is used to illustrate the proposed approaches. The final county-level model-based estimates for small area domains, produced based on updated survey data from each method, are compared with county-level model-based estimates produced based on the original survey data and the official statistics published in 2016.
      Citation: Stats
      PubDate: 2022-09-11
      DOI: 10.3390/stats5030052
      Issue No: Vol. 5, No. 3 (2022)
  • Stats, Vol. 5, Pages 916-933: Robust Permutation Tests for Penalized

    • Authors: Nathaniel E. Helwig
      First page: 916
      Abstract: Penalized splines are frequently used in applied research for understanding functional relationships between variables. In most applications, statistical inference for penalized splines is conducted using the random effects or Bayesian interpretation of a smoothing spline. These interpretations can be used to assess the uncertainty of the fitted values and the estimated component functions. However, statistical tests about the nature of the function are more difficult, because such tests often involve testing a null hypothesis that a variance component is equal to zero. Furthermore, valid statistical inference using the random effects or Bayesian interpretation depends on the validity of the utilized parametric assumptions. To overcome these limitations, I propose a flexible and robust permutation testing framework for inference with penalized splines. The proposed approach can be used to test omnibus hypotheses about functional relationships, as well as more flexible hypotheses about conditional relationships. I establish the conditions under which the methods will produce exact results, as well as the asymptotic behavior of the various permutation tests. Additionally, I present extensive simulation results to demonstrate the robustness and superiority of the proposed approach compared to commonly used methods.
      Citation: Stats
      PubDate: 2022-09-16
      DOI: 10.3390/stats5030053
      Issue No: Vol. 5, No. 3 (2022)
  • Stats, Vol. 5, Pages 339-357: A Bootstrap Variance Estimation Method for
           Multistage Sampling and Two-Phase Sampling When Poisson Sampling Is Used
           at the Second Phase

    • Authors: Jean-François Beaumont, Nelson Émond
      First page: 339
      Abstract: The bootstrap method is often used for variance estimation in sample surveys with a stratified multistage sampling design. It is typically implemented by producing a set of bootstrap weights that is made available to users and that accounts for the complexity of the sampling design. The Rao–Wu–Yue method is often used to produce the required bootstrap weights. It is valid under stratified with-replacement sampling at the first stage or fixed-size without-replacement sampling provided the first-stage sampling fractions are negligible. Some surveys use designs that do not satisfy these conditions. We propose a simple and unified bootstrap method that addresses this limitation of the Rao–Wu–Yue bootstrap weights. This method is applicable to any multistage sampling design as long as valid bootstrap weights can be produced for each distinct stage of sampling. Our method is also applicable to two-phase sampling designs provided that Poisson sampling is used at the second phase. We use this design to model survey nonresponse and derive bootstrap weights that account for nonresponse weighting. The properties of our bootstrap method are evaluated in three limited simulation studies.
      Citation: Stats
      PubDate: 2022-03-22
      DOI: 10.3390/stats5020019
      Issue No: Vol. 5, No. 2 (2022)
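To make the notion of bootstrap weights in the abstract above concrete, here is a minimal sketch of the usual rescaled bootstrap for one stratum: draw n - 1 primary sampling units with replacement from the n sampled units and multiply each design weight by n / (n - 1) times the number of times the unit was drawn. It covers a single stage only and is not the unified multistage method the authors propose; the function name and weights are illustrative.

```python
import random


def rescaled_bootstrap_weights(design_weights, seed=0):
    """One bootstrap replicate of weights for a single stratum."""
    n = len(design_weights)
    if n < 2:
        raise ValueError("need at least two sampled units in the stratum")
    rng = random.Random(seed)
    hits = [0] * n
    for _ in range(n - 1):           # n - 1 draws with replacement
        hits[rng.randrange(n)] += 1
    scale = n / (n - 1)              # rescaling factor for n - 1 draws
    return [w * scale * m for w, m in zip(design_weights, hits)]


# Hypothetical stratum with five sampled units of equal design weight.
weights = [10.0] * 5
replicate = rescaled_bootstrap_weights(weights, seed=42)
```

Repeating the call with different seeds yields a set of replicate weights; the variance of an estimate across replicates then serves as the bootstrap variance estimate.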
  • Stats, Vol. 5, Pages 358-370: Multiple Imputation of Composite Covariates
           in Survival Studies

    • Authors: Lily Clements, Alan C. Kimber, Stefanie Biedermann
      First page: 358
      Abstract: Missing covariate values are a common problem in survival studies, and the method of choice when handling such incomplete data is often multiple imputation. However, it is not obvious how this can be used most effectively when an incomplete covariate is a function of other covariates. For example, body mass index (BMI) is the ratio of weight to height squared. In this situation, the following question arises: Should a composite covariate such as BMI be imputed directly, or is it advantageous to impute its constituents, weight and height, first and to construct BMI afterwards? We address this question through a carefully designed simulation study that compares various approaches to multiple imputation of composite covariates in a survival context. We discuss advantages and limitations of these approaches for various types of missingness and imputation models. Our results are a first step towards providing much needed guidance to practitioners for analysing their incomplete survival data effectively.
      Citation: Stats
      PubDate: 2022-03-29
      DOI: 10.3390/stats5020020
      Issue No: Vol. 5, No. 2 (2022)
  • Stats, Vol. 5, Pages 371-384: ordinalbayes: Fitting Ordinal Bayesian
           Regression Models to High-Dimensional Data Using R

    • Authors: Kellie J. Archer, Anna Eames Seffernick, Shuai Sun, Yiran Zhang
      First page: 371
      Abstract: The stage of cancer is a discrete ordinal response that indicates the aggressiveness of disease and is often used by physicians to determine the type and intensity of treatment to be administered. For example, the FIGO stage in cervical cancer is based on the size and depth of the tumor as well as the level of spread. It may be of clinical relevance to identify molecular features from high-throughput genomic assays that are associated with the stage of cervical cancer to elucidate pathways related to tumor aggressiveness, identify improved molecular features that may be useful for staging, and identify therapeutic targets. High-throughput RNA-Seq data and corresponding clinical data (including stage) for cervical cancer patients have been made available through The Cancer Genome Atlas Project (TCGA). We recently described penalized Bayesian ordinal response models that can be used for variable selection for over-parameterized datasets, such as the TCGA-CESC dataset. Herein, we describe our ordinalbayes R package, available from the Comprehensive R Archive Network (CRAN), which enhances the runjags R package by enabling users to easily fit cumulative logit models when the outcome is ordinal and the number of predictors exceeds the sample size, P>N, such as for TCGA and other high-throughput genomic data. We demonstrate the use of this package by applying it to the TCGA cervical cancer dataset. Our ordinalbayes package can be used to fit models to high-dimensional datasets, and it effectively performs variable selection.
      Citation: Stats
      PubDate: 2022-04-15
      DOI: 10.3390/stats5020021
      Issue No: Vol. 5, No. 2 (2022)
  • Stats, Vol. 5, Pages 385-400: Some Empirical Results on Nearest-Neighbour
           Pseudo-populations for Resampling from Spatial Populations

    • Authors: Sara Franceschi, Rosa Maria Di Biase, Agnese Marcelli, Lorenzo Fattorini
      First page: 385
      Abstract: In finite populations, pseudo-population bootstrap is the sole method preserving the spirit of the original bootstrap performed from iid observations. In spatial sampling, theoretical results about the convergence of bootstrap distributions to the actual distributions of estimators are lacking, owing to the failure of spatially balanced sampling designs to converge to the maximum entropy design. In addition, the issue of creating pseudo-populations able to mimic the characteristics of real populations is challenging in spatial frameworks where spatial trends, relationships, and similarities among neighbouring locations are invariably present. In this paper, we propose the use of the nearest-neighbour interpolation of spatial populations for constructing pseudo-populations that converge to real populations under mild conditions. The effectiveness of these proposals with respect to traditional pseudo-populations is empirically checked by a simulation study.
      Citation: Stats
      PubDate: 2022-04-15
      DOI: 10.3390/stats5020022
      Issue No: Vol. 5, No. 2 (2022)
  • Stats, Vol. 5, Pages 401-407: Has the Market Started to Collapse or Will
           It Resist?

    • Authors: Yao Kuang, Raphael Douady
      First page: 401
      Abstract: Many people are concerned about the stock market in 2022 as it faces several threats, from rising inflation rates to geopolitical events. The S&P 500 Index has already dropped about 10% from the peak in early January 2022 until the end of February 2022. This paper aims at updating the crisis indicator to predict when the market may experience a significant drawdown, which we developed in Crisis Risk Prediction with Concavity from Polymodel (2022). This indicator uses regime switching and Polymodel theory to calculate the market concavity. We found that concavity had not increased in the past 6 months. We conclude that at present, the market does not bear inherent dynamic instability. This does not exclude a possible collapse which would be due to external events unrelated to financial markets.
      Citation: Stats
      PubDate: 2022-04-23
      DOI: 10.3390/stats5020023
      Issue No: Vol. 5, No. 2 (2022)
  • Stats, Vol. 5, Pages 408-421: Omnibus Tests for Multiple Binomial
           Proportions via Doubly Sampled Framework with Under-Reported Data

    • Authors: Dewi Rahardja
      First page: 408
      Abstract: Previously, Rahardja (2020) (the first entry in the reference list) developed a pairwise multiple comparison procedure (MCP) to determine which pairs of multiple binomial proportions (with under-reported data) the significant differences came from. Generally, such an MCP test is the second part of a two-stage sequential test. In this paper, we derive two omnibus tests (i.e., tests of the overall equality of multiple proportions) as the first part of that two-stage sequential test (with under-reported data). Using two likelihood-based approaches, we obtain two Wald-type omnibus tests to compare multiple binomial proportions in the presence of under-reported data. Our closed-form algorithm is easy to implement and not computationally burdensome. We apply our algorithm to a vehicle-accident data example.
      Citation: Stats
      PubDate: 2022-04-23
      DOI: 10.3390/stats5020024
      Issue No: Vol. 5, No. 2 (2022)
  • Stats, Vol. 5, Pages 422-439: Bootstrap Assessment of Crop Area Estimates
           Using Satellite Pixels Counting

    • Authors: Cristiano Ferraz, Jacques Delincé, André Leite, Raydonal Ospina
      First page: 422
      Abstract: Crop area estimates based on counting pixels over classified satellite images are a promising application of remote sensing to agriculture. However, such area estimates are biased, and their variance is a function of the error rates of the classification rule. To redress the bias, estimators (direct and inverse) relying on the so-called confusion matrix have been proposed, but analytic estimators for variances can be tricky to derive. This article proposes a bootstrap method for assessing statistical properties of such estimators based on information from a sample confusion matrix. The proposed method can be applied to any other type of estimator that is built upon confusion matrix information. The resampling procedure is illustrated in a small study to assess the biases and variances of estimates using purely pixel counting and estimates provided by both direct and inverse estimators. The method has the advantage of being simple to implement even when the sample confusion matrix is generated under unequal probability sample design. The results show the limitations of estimates based solely on pixel counting as well as respective advantages and drawbacks of the direct and inverse estimators with respect to their feasibility, unbiasedness, and variance.
      Citation: Stats
      PubDate: 2022-04-25
      DOI: 10.3390/stats5020025
      Issue No: Vol. 5, No. 2 (2022)
  • Stats, Vol. 5, Pages 440-457: Opening the Black Box: Bootstrapping
           Sensitivity Measures in Neural Networks for Interpretable Machine Learning

    • Authors: Michele La Rocca, Cira Perna
      First page: 440
      Abstract: Artificial neural networks are powerful tools for data analysis, particularly in the context of highly nonlinear regression models. However, their utility is critically limited due to the lack of interpretation of the model given its black-box nature. To partially address the problem, the paper focuses on the important problem of feature selection. It proposes and discusses a statistical test procedure for selecting a set of input variables that are relevant to the model while taking into account the multiple testing nature of the problem. The approach is within the general framework of sensitivity analysis and uses the conditional expectation of functions of the partial derivatives of the output with respect to the inputs as a sensitivity measure. The proposed procedure extensively uses the bootstrap to approximate the test statistic distribution under the null while controlling the familywise error rate to correct for data snooping arising from multiple testing. In particular, a pair bootstrap scheme was implemented in order to obtain consistent results when using misspecified statistical models, a typical characteristic of neural networks. Numerical examples and a Monte Carlo simulation were carried out to verify the ability of the proposed test procedure to correctly identify the set of relevant features.
      Citation: Stats
      PubDate: 2022-04-25
      DOI: 10.3390/stats5020026
      Issue No: Vol. 5, No. 2 (2022)
  • Stats, Vol. 5, Pages 458-476: Repeated-Measures Analysis in the Context of
           Heteroscedastic Error Terms with Factors Having Both Fixed and Random

    • Authors: Lyson Chaka, Peter Njuho
      First page: 458
      Abstract: The design and analysis of experiments which involve factors each consisting of both fixed and random levels fit into linear mixed models. The assumed linear mixed-model design matrix takes either a full-rank or less-than-full-rank form. The complexity of the data structures of such experiments falls in the model-selection and parameter-estimation process. The fundamental consideration in the estimation process of linear models is the special case in which elements of the error vector are assumed equal and uncorrelated. However, different assumptions on the structure of the variance–covariance matrix of error vector in the estimation of parameters of a linear mixed model may be considered. We conceptualise a repeated-measures design with multiple between-subjects factors, in which each of these factors has both fixed and random levels. We focus on the construction of linear mixed-effects models, the estimation of variance components, and hypothesis testing in which the default covariance structure of homoscedastic error terms is not appropriate. We illustrate the proposed approach using longitudinal data fitted to a three-factor linear mixed-effects model. The novelty of this approach lies in the exploration of the fixed and random levels of the same factor and in the subsequent interaction effects of the fixed levels. In addition, we assess the differences between levels of the same factor and determine the proportion of the total variation accounted for by the random levels of the same factor.
      Citation: Stats
      PubDate: 2022-05-06
      DOI: 10.3390/stats5020027
      Issue No: Vol. 5, No. 2 (2022)
  • Stats, Vol. 5, Pages 477-493: Bayesian Semiparametric Regression Analysis
           of Multivariate Panel Count Data

    • Authors: Chunling Wang, Xiaoyan Lin
      First page: 477
      Abstract: Panel count data often occur in a long-term recurrent event study, where the exact occurrence time of the recurrent events is unknown, but only the occurrence count between any two adjacent observation time points is recorded. Most traditional methods only handle panel count data for a single type of event. In this paper, we propose a Bayesian semiparametric approach to analyze panel count data for multiple types of events. For each type of recurrent event, the proportional mean model is adopted to model the mean count of the event, where its baseline mean function is approximated by monotone I-splines. The correlation between multiple types of events is modeled by common frailty terms and scale parameters. Unlike many frequentist estimating equation methods, our approach is based on the observed likelihood and makes no assumption on the relationship between the recurrent process and the observation process. Under the Poisson counting process assumption, we develop an efficient Gibbs sampler based on novel data augmentation for the Markov chain Monte Carlo sampling. Simulation studies show good estimation performance of the baseline mean functions and the regression coefficients; meanwhile, the importance of including the scale parameter to flexibly accommodate the correlation between events is also demonstrated. Finally, a skin cancer data example is fully analyzed to illustrate the proposed methods.
      Citation: Stats
      PubDate: 2022-05-10
      DOI: 10.3390/stats5020028
      Issue No: Vol. 5, No. 2 (2022)
  • Stats, Vol. 5, Pages 494-506: The Missing Indicator Approach for
           Accelerated Failure Time Model with Covariates Subject to Limits of

    • Authors: Norah Alyabs, Sy Han Chiou
      First page: 494
      Abstract: The limit of detection (LOD) is commonly encountered in observational studies when one or more covariate values fall outside the measuring ranges. Although the complete-case (CC) approach is widely employed in the presence of missing values, it could result in biased estimations or even become inapplicable in small sample studies. On the other hand, approaches such as the missing indicator (MDI) approach are attractive alternatives as they preserve sample sizes. This paper compares the effectiveness of different alternatives to the CC approach under different LOD settings with a survival outcome. These alternatives include substitution methods, multiple imputation (MI) methods, MDI approaches, and MDI-embedded MI approaches. We found that the MDI approach outperformed its competitors regarding bias and mean squared error in small sample sizes through extensive simulation.
      Citation: Stats
      PubDate: 2022-05-10
      DOI: 10.3390/stats5020029
      Issue No: Vol. 5, No. 2 (2022)
  • Stats, Vol. 5, Pages 507-520: Goodness-of-Fit and Generalized Estimating
           Equation Methods for Ordinal Responses Based on the Stereotype Model

    • Authors: Daniel Fernández, Louise McMillan, Richard Arnold, Martin Spiess, Ivy Liu
      First page: 507
      Abstract: Background: Data with ordinal categories occur in many diverse areas, but methodologies for modeling ordinal data lag severely behind equivalent methodologies for continuous data. There are advantages to using a model specifically developed for ordinal data, such as making fewer assumptions and having greater power for inference. Methods: The ordered stereotype model (OSM) is an ordinal regression model that is more flexible than the popular proportional odds ordinal model. The primary benefit of the OSM is that it uses numeric encoding of the ordinal response categories without assuming the categories are equally-spaced. Results: This article summarizes two recent advances in the OSM: (1) three novel tests to assess goodness-of-fit; (2) a new Generalized Estimating Equations approach to estimate the model for longitudinal studies. These methods use the new spacing of the ordinal categories indicated by the estimated score parameters of the OSM. Conclusions: The recent advances presented can be applied to several fields. We illustrate their use with the well-known arthritis clinical trial dataset. These advances fill a gap in methodologies available for ordinal responses and may be useful for practitioners in many applied fields.
      Citation: Stats
      PubDate: 2022-06-01
      DOI: 10.3390/stats5020030
      Issue No: Vol. 5, No. 2 (2022)
  • Stats, Vol. 5, Pages 521-537: A Comparison of Existing Bootstrap
           Algorithms for Multi-Stage Sampling Designs

    • Authors: Sixia Chen, David Haziza, Zeinab Mashreghi
      First page: 521
      Abstract: Multi-stage sampling designs are often used in household surveys because a sampling frame of elements may not be available or for cost considerations when data collection involves face-to-face interviews. In this context, variance estimation is a complex task as it relies on the availability of second-order inclusion probabilities at each stage. To cope with this issue, several bootstrap algorithms have been proposed in the literature in the context of a two-stage sampling design. In this paper, we describe some of these algorithms and compare them empirically in terms of bias, stability, and coverage probability.
      Citation: Stats
      PubDate: 2022-06-06
      DOI: 10.3390/stats5020031
      Issue No: Vol. 5, No. 2 (2022)
  • Stats, Vol. 5, Pages 538-545: Evaluation of the Gauss Integral

    • Authors: Dmitri Martila, Stefan Groote
      First page: 538
      Abstract: The normal or Gaussian distribution plays a prominent role in almost all fields of science. However, it is well known that the Gauss (or Euler–Poisson) integral over a finite boundary, as is necessary, for instance, for the error function or the cumulative distribution of the normal distribution, cannot be expressed by analytic functions. This is proven by the Risch algorithm. Regardless, there are proposals for approximate solutions. In this paper, we give a new solution in terms of normal distributions by applying a geometric procedure iteratively to the problem.
      Citation: Stats
      PubDate: 2022-06-10
      DOI: 10.3390/stats5020032
      Issue No: Vol. 5, No. 2 (2022)
  • Stats, Vol. 5, Pages 546-560: Quantitative Trading through Random
           Perturbation Q-Network with Nonlinear Transaction Costs

    • Authors: Tian Zhu, Wei Zhu
      First page: 546
      Abstract: In recent years, reinforcement learning (RL) has seen increasing applications in the financial industry, especially in quantitative trading and portfolio optimization when the focus is on the long-term reward rather than short-term profit. Sequential decision making and Markov decision processes are well suited for this type of application. Through trial and error based on historical data, an agent can learn the characteristics of the market and evolve an algorithm to maximize the cumulative returns. In this work, we propose a novel RL trading algorithm that utilizes random perturbation of the Q-network and accounts for the more realistic nonlinear transaction costs. In summary, we first design a new near-quadratic transaction cost function considering the slippage. Next, we develop a convolutional deep Q-learning network (CDQN) with multiple price inputs based on this cost function. We further propose a random perturbation (rp) method to modify the learning network to solve the instability issue intrinsic to the deep Q-learning network. Finally, we use this newly developed CDQN-rp algorithm to make trading decisions based on the daily stock prices of Apple (AAPL), Meta (FB), and Bitcoin (BTC) and demonstrate its strengths over other quantitative trading methods.
      Citation: Stats
      PubDate: 2022-06-10
      DOI: 10.3390/stats5020033
      Issue No: Vol. 5, No. 2 (2022)
  • Stats, Vol. 5, Pages 561-571: Bayesian Bootstrap in Multiple Frames

    • Authors: Daniela Cocchi, Lorenzo Marchi, Riccardo Ievoli
      First page: 561
      Abstract: Multiple frames are becoming increasingly relevant due to the spread of surveys conducted via registers. In this regard, estimators of population quantities have been proposed, including the multiplicity estimator. In all cases, variance estimation still remains a matter of debate. This paper explores the potential of Bayesian bootstrap techniques for computing such estimators. The suitability of the method, which is compared to the existing frequentist bootstrap, is shown by conducting a small-scale simulation study and a case study.
      Citation: Stats
      PubDate: 2022-06-15
      DOI: 10.3390/stats5020034
      Issue No: Vol. 5, No. 2 (2022)
  • Stats, Vol. 5, Pages 572-582: A Multi-Aspect Permutation Test for
           Goodness-of-Fit Problems

    • Authors: Rosa Arboretti, Elena Barzizza, Nicolò Biasetton, Riccardo Ceccato, Livio Corain, Luigi Salmaso
      First page: 572
      Abstract: Parametric techniques commonly rely on specific distributional assumptions. It is therefore fundamental to preliminarily identify the eventual violations of such assumptions. Therefore, appropriate testing procedures are required for this purpose to deal with the goodness-of-fit (GoF) problem. This task can be quite challenging, especially with small sample sizes and multivariate data. Previous studies showed how a GoF problem can be easily represented through a traditional two-sample system of hypotheses. Following this idea, in this paper, we propose a multi-aspect permutation-based test to deal with the multivariate goodness-of-fit, taking advantage of the nonparametric combination (NPC) methodology. A simulation study is then conducted to evaluate the performance of our proposal and to identify the eventual critical scenarios. Finally, a real data application is considered.
      Citation: Stats
      PubDate: 2022-06-17
      DOI: 10.3390/stats5020035
      Issue No: Vol. 5, No. 2 (2022)
  • Stats, Vol. 5, Pages 52-69: A Flexible Mixed Model for Clustered Count

    • Authors: Darcy Steeg Morris, Kimberly F. Sellers
      First page: 52
      Abstract: Clustered count data are commonly modeled using Poisson regression with random effects to account for the correlation induced by clustering. The Poisson mixed model allows for overdispersion via the nature of the within-cluster correlation; however, departures from equi-dispersion may also exist due to the underlying count process mechanism. We study the cross-sectional COM-Poisson regression model—a generalized regression model for count data in light of data dispersion—together with random effects for analysis of clustered count data. We demonstrate model flexibility of the COM-Poisson random intercept model, including choice of the random effect distribution, via simulated and real data examples. We find that COM-Poisson mixed models provide comparable model fit to well-known mixed models for associated special cases of clustered discrete data, and result in improved model fit for data with intermediate levels of over- or underdispersion in the count mechanism. Accordingly, the proposed models are useful for capturing dispersion not consistent with commonly used statistical models, and also serve as a practical diagnostic tool.
      Citation: Stats
      PubDate: 2022-01-07
      DOI: 10.3390/stats5010004
      Issue No: Vol. 5, No. 1 (2022)
  • Stats, Vol. 5, Pages 70-88: A Noncentral Lindley Construction Illustrated
           in an INAR(1) Environment

    • Authors: Johannes Ferreira, Ané van der Merwe
      First page: 70
      Abstract: This paper proposes a previously unconsidered generalization of the Lindley distribution by allowing for a measure of noncentrality. Essential structural characteristics are investigated and derived in explicit and tractable forms, and the estimability of the model is illustrated via the fit of this developed model to real data. Subsequently, this model is used as a candidate for the parameter of a Poisson model, which allows for departure from the usual equidispersion restriction that the Poisson offers when modelling count data. This Poisson-noncentral Lindley is also systematically investigated and characteristics are derived. The value of this count model is illustrated and implemented as the count error distribution in an integer autoregressive environment, and juxtaposed against other popular models. The effect of the systematically-induced noncentrality parameter is illustrated and paves the way for future flexible modelling not only as a standalone contender in continuous Lindley-type scenarios but also in discrete and discrete time series scenarios when the often-encountered equidispersed assumption is not adhered to in practical data environments.
      Citation: Stats
      PubDate: 2022-01-10
      DOI: 10.3390/stats5010005
      Issue No: Vol. 5, No. 1 (2022)
  • Stats, Vol. 5, Pages 89-107: A Bayesian Approach for Imputation of
           Censored Survival Data

    • Authors: Shirin Moghaddam, John Newell, John Hinde
      First page: 89
      Abstract: A common feature of much survival data is censoring due to incompletely observed lifetimes. Survival analysis methods and models have been designed to take account of this and provide appropriate relevant summaries, such as the Kaplan–Meier plot and the commonly quoted median survival time of the group under consideration. However, a single summary is not really a relevant quantity for communication to an individual patient, as it conveys no notion of variability and uncertainty, and the Kaplan–Meier plot can be difficult for the patient to understand and is also often misinterpreted, even by some physicians. This paper considers an alternative approach of treating the censored data as a form of missing, incomplete data and proposes an imputation scheme to construct a completed dataset. This allows the use of standard descriptive statistics and graphical displays to convey both typical outcomes and the associated variability. We propose a Bayesian approach to impute any censored observations, making use of other information in the dataset, and provide a completed dataset. This can then be used for standard displays, summaries, and even, in theory, analysis and model fitting. We particularly focus on the data visualisation advantages of the completed data, allowing displays such as density plots, boxplots, etc., to complement the usual Kaplan–Meier display of the original dataset. We study the performance of this approach through a simulation study and consider its application to two clinical examples.
      Citation: Stats
      PubDate: 2022-01-26
      DOI: 10.3390/stats5010006
      Issue No: Vol. 5, No. 1 (2022)
  • Stats, Vol. 5, Pages 108-110: Acknowledgment to Reviewers of Stats in 2021

    • Authors: Stats Editorial Office
      First page: 108
      Abstract: Rigorous peer-reviews are the basis of high-quality academic publishing [...]
      Citation: Stats
      PubDate: 2022-01-28
      DOI: 10.3390/stats5010007
      Issue No: Vol. 5, No. 1 (2022)
  • Stats, Vol. 5, Pages 111-127: A General Description of Growth Trends

    • Authors: Moshe Elitzur
      First page: 111
      Abstract: Time series that display periodicity can be described with a Fourier expansion. In a similar vein, a recently developed formalism enables the description of growth patterns with the optimal number of parameters. The method has been applied to the growth of national GDP, population and the COVID-19 pandemic; in all cases, the deviations of long-term growth patterns from purely exponential required no more than two additional parameters, mostly only one. Here, I utilize the new framework to develop a unified formulation for all functions that describe growth deceleration, wherein the growth rate decreases with time. The result offers the prospects for a new general tool for trend removal in time-series analysis.
      Citation: Stats
      PubDate: 2022-02-03
      DOI: 10.3390/stats5010008
      Issue No: Vol. 5, No. 1 (2022)
  • Stats, Vol. 5, Pages 128-138: Selection of Auxiliary Variables for
           Three-Fold Linking Models in Small Area Estimation: A Simple and Effective

    • Authors: Song Cai, J.N.K. Rao
      First page: 128
      Abstract: Model-based estimation of small area means can lead to reliable estimates when the area sample sizes are small. This is accomplished by borrowing strength across related areas using models linking area means to related covariates and random area effects. The effective selection of variables to be included in the linking model is important in small area estimation. The main purpose of this paper is to extend the earlier work on variable selection for area level and two-fold subarea level models to three-fold sub-subarea models linking sub-subarea means to related covariates and random effects at the area, sub-area, and sub-subarea levels. The proposed variable selection method transforms the sub-subarea means to reduce the linking model to a standard regression model and applies commonly used criteria for variable selection, such as AIC and BIC, to the reduced model. The resulting criteria depend on the unknown sub-subarea means, which are then estimated using the sample sub-subarea means. Then, the estimated selection criteria are used for variable selection. Simulation results on the performance of the proposed variable selection method relative to methods based on area level and two-fold subarea level models are also presented.
      Citation: Stats
      PubDate: 2022-02-05
      DOI: 10.3390/stats5010009
      Issue No: Vol. 5, No. 1 (2022)
  • Stats, Vol. 5, Pages 139-153: Analysis of Household Pulse Survey
           Public-Use Microdata via Unit-Level Models for Informative Sampling

    • Authors: Alexander Sun, Paul A. Parker, Scott H. Holan
      First page: 139
      Abstract: The Household Pulse Survey, recently released by the U.S. Census Bureau, gathers information about the respondents’ experiences regarding employment status, food security, housing, physical and mental health, access to health care, and education disruption. Design-based estimates are produced for all 50 states and the District of Columbia (DC), as well as 15 Metropolitan Statistical Areas (MSAs). Using public-use microdata, this paper explores the effectiveness of using unit-level model-based estimators that incorporate spatial dependence for the Household Pulse Survey. In particular, we consider Bayesian hierarchical model-based spatial estimates for both a binomial and a multinomial response under informative sampling. Importantly, we demonstrate that these models can be easily estimated using Hamiltonian Monte Carlo through the Stan software package. In doing so, these models can readily be implemented in a production environment. For both the binomial and multinomial responses, an empirical simulation study is conducted, which compares spatial and non-spatial models. Finally, using public-use Household Pulse Survey micro-data, we provide an analysis that compares both design-based and model-based estimators and demonstrates a reduction in standard errors for the model-based approaches.
      Citation: Stats
      PubDate: 2022-02-07
      DOI: 10.3390/stats5010010
      Issue No: Vol. 5, No. 1 (2022)
  • Stats, Vol. 5, Pages 154-171: All-NBA Teams’ Selection Based on
           Unsupervised Learning

    • Authors: João Vítor Rocha da Silva, Paulo Canas Rodrigues
      First page: 154
      Abstract: All-NBA Teams’ selections have great implications for the players’ and teams’ futures. Since contract extensions are highly related to awards, which can be seen as indexes that measure a player’s production in a year, team selection is of mutual interest for athletes and franchises. In this paper, we are interested in studying the current selection format. In particular, this study aims to: (i) identify the factors that are taken into consideration by voters when choosing the three All-NBA Teams; and (ii) suggest a new selection format to evaluate players’ performances. Average game-related statistics of all active NBA players in regular seasons from 2013-14 to 2018-19 were analyzed using LASSO (Logistic) Regression and Principal Component Analysis (PCA). It was possible: (i) to determine an All-NBA player profile; (ii) to determine that this profile can cause a misrepresentation of players’ modern and versatile gameplay styles; and (iii) to suggest a new way to evaluate and select players, through PCA. As a result of this paper, a model is presented that may help not only the NBA but any basketball league to better evaluate players; it may also be a resource for researchers who aim to investigate player performance, development, and their impact over many seasons.
      Citation: Stats
      PubDate: 2022-02-09
      DOI: 10.3390/stats5010011
      Issue No: Vol. 5, No. 1 (2022)
  • Stats, Vol. 5, Pages 172-189: Multivariate Threshold Regression Models
           with Cure Rates: Identification and Estimation in the Presence of the
           Esscher Property

    • Authors: Mei-Ling Ting Lee, George A. Whitmore
      First page: 172
      Abstract: The first hitting time of a boundary or threshold by the sample path of a stochastic process is the central concept of threshold regression models for survival data analysis. Regression functions for the process and threshold parameters in these models are multivariate combinations of explanatory variates. The stochastic process under investigation may be a univariate stochastic process or a multivariate stochastic process. The stochastic processes of interest to us in this report are those that possess stationary independent increments (i.e., Lévy processes) as well as the Esscher property. The Esscher transform is a transformation of probability density functions that has applications in actuarial science, financial engineering, and other fields. Lévy processes with this property are often encountered in practical applications. Frequently, these applications also involve a ‘cure rate’ fraction because some individuals are susceptible to failure and others not. Cure rates may arise endogenously from the model alone or exogenously from mixing of distinct statistical populations in the data set. We show, using both theoretical analysis and case demonstrations, that model estimates derived from typical survival data may not be able to distinguish between individuals in the cure rate fraction who are not susceptible to failure and those who may be susceptible to failure but escape the fate by chance. The ambiguity is aggravated by right censoring of survival times and by minor misspecifications of the model. Slightly incorrect specifications for regression functions or for the stochastic process can lead to problems with model identification and estimation. In this situation, additional guidance for estimating the fraction of non-susceptibles must come from subject matter expertise or from data types other than survival times, censored or otherwise. 
      The identifiability issue is confronted directly in threshold regression but is also present when applying other kinds of models commonly used for survival data analysis. Other methods, however, usually do not provide a framework for recognizing or dealing with the issue, and so the issue is often unintentionally ignored. The theoretical foundations of this work are set out, presenting new and somewhat surprising results for the first hitting time distributions of Lévy processes that have the Esscher property.
      Citation: Stats
      PubDate: 2022-02-11
      DOI: 10.3390/stats5010012
      Issue No: Vol. 5, No. 1 (2022)
  • Stats, Vol. 5, Pages 190-202: Bootstrap Prediction Intervals of Temporal

    • Authors: Bu Hyoung Lee
      First page: 190
      Abstract: In this article, we propose an interval estimation method to trace an unknown disaggregate series within certain bandwidths. First, we consider two model-based disaggregation methods called the GLS disaggregation and the ARIMA disaggregation. Then, we develop iterative steps to construct AR-sieve bootstrap prediction intervals for model-based temporal disaggregation. As an illustration, we analyze the quarterly total balances of U.S. international trade in goods and services between the first quarter of 1992 and the fourth quarter of 2020.
      Citation: Stats
      PubDate: 2022-02-18
      DOI: 10.3390/stats5010013
      Issue No: Vol. 5, No. 1 (2022)
  • Stats, Vol. 5, Pages 203-214: Modeling Secondary Phenotypes Conditional on
           Genotypes in Case–Control Studies

    • Authors: Naomi C. Brownstein, Jianwen Cai, Shad Smith, Luda Diatchenko, Gary D. Slade, Eric Bair
      First page: 203
      Abstract: Traditional case–control genetic association studies examine relationships between case–control status and one or more covariates. It is becoming increasingly common to study secondary phenotypes and their association with the original covariates. The Orofacial Pain: Prospective Evaluation and Risk Assessment (OPPERA) project, a study of temporomandibular disorders (TMD), motivates this work. Numerous measures of interest are collected at enrollment, such as the number of comorbid pain conditions from which a participant suffers. Examining the potential genetic basis of these measures is of secondary interest. Assessing these associations is statistically challenging, as participants do not form a random sample from the population of interest. Standard methods may be biased and lack coverage and power. We propose a general method for the analysis of arbitrary phenotypes utilizing inverse probability weighting and bootstrapping for standard error estimation. The method may be applied to the complicated association tests used in next-generation sequencing studies, such as analyses of haplotypes with ambiguous phase. Simulation studies show that our method performs as well as competing methods when they are applicable and yields promising results for outcome types, such as time-to-event, to which other methods may not apply. The method is applied to the OPPERA baseline case–control genetic study.
      Citation: Stats
      PubDate: 2022-02-22
      DOI: 10.3390/stats5010014
      Issue No: Vol. 5, No. 1 (2022)
  • Stats, Vol. 5, Pages 215-257: The Stacy-G Class: A New Family of
           Distributions with Regression Modeling and Applications to Survival Real

    • Authors: Lucas D. Ribeiro Reis, Gauss M. Cordeiro, Maria do Carmo S. Lima
      First page: 215
      Abstract: We study the Stacy-G family, which extends the gamma-G class and provides four of the most well-known forms of the hazard rate function: increasing, decreasing, bathtub, and inverted bathtub. We provide some of its structural properties. We estimate the parameters by maximum likelihood, and perform a simulation study to verify the asymptotic properties of the estimators for the Burr-XII baseline. We construct the log-Stacy-Burr XII regression for censored data. The usefulness of the new models is shown through applications to uncensored and censored real data.
      Citation: Stats
      PubDate: 2022-03-04
      DOI: 10.3390/stats5010015
      Issue No: Vol. 5, No. 1 (2022)
  • Stats, Vol. 5, Pages 258-269: Resampling under Complex Sampling Designs:
           Roots, Development and the Way Forward

    • Authors: Pier Luigi Conti, Fulvia Mecatti
      First page: 258
      Abstract: In the present paper, resampling for finite populations under an iid sampling design is reviewed. Our attention is mainly focused on pseudo-population-based resampling due to its properties. A principled appraisal of the main theoretical foundations and results is given and discussed, together with important computational aspects. Finally, a discussion on open problems and research perspectives is provided.
      Citation: Stats
      PubDate: 2022-03-08
      DOI: 10.3390/stats5010016
      Issue No: Vol. 5, No. 1 (2022)
  • Stats, Vol. 5, Pages 270-311: Properties and Limiting Forms of the
           Multivariate Extended Skew-Normal and Skew-Student Distributions

    • Authors: Christopher J. Adcock
      First page: 270
      Abstract: This paper is concerned with the multivariate extended skew-normal [MESN] and multivariate extended skew-Student [MEST] distributions, that is, distributions in which the location parameters of the underlying truncated distributions are not zero. The extra parameter leads to greater variability in the moments and critical values, thus providing greater flexibility for empirical work. This paper reports various theoretical properties of the extended distributions, notably the limiting forms as the magnitude of the extension parameter, denoted τ in this paper, increases without limit. In particular, it is shown that as τ→−∞, the limiting forms of the MESN and MEST distributions are different. The effect of the difference is exemplified by a study of stockmarket crashes. A second example is a short study of the extent to which the extended skew-normal distribution can be approximated by the skew-Student.
      Citation: Stats
      PubDate: 2022-03-09
      DOI: 10.3390/stats5010017
      Issue No: Vol. 5, No. 1 (2022)
  • Stats, Vol. 5, Pages 312-338: Importance of Weather Conditions in a Flight

    • Authors: Gong Chen, Hartmut Fricke, Ostap Okhrin, Judith Rosenow
      First page: 312
      Abstract: Current research initiatives, such as the Single European Sky Air Traffic Management Research Program, call for an air traffic system with improved safety and efficiency records and environmental compatibility. The resulting multi-criteria system optimization and individual flight trajectories require, in particular, reliable three-dimensional meteorological information. The Global (Weather) Forecast System only provides data at a resolution of around 100 km. We postulate a reliable interpolation at high resolution to compute these trajectories accurately and in due time to comply with operational requirements. We investigate different interpolation methods for aerodynamically crucial weather variables such as temperature, wind speed, and wind direction. These methods, including Ordinary Kriging, the radial basis function method, neural networks, and decision trees, are compared in terms of cross-validation interpolation errors. We show that using the interpolated data in a flight performance model emphasizes the effect of weather data accuracy on trajectory optimization. Considering a trajectory from Prague to Tunis, a Monte Carlo simulation is applied to examine the effect of errors on input (GFS data) and output (i.e., Ordinary Kriging) on the optimized trajectory.
      Citation: Stats
      PubDate: 2022-03-09
      DOI: 10.3390/stats5010018
      Issue No: Vol. 5, No. 1 (2022)