Authors: Vicente Núñez-Antón, Andreas Mayr, Francesco Bartolucci
Pages: 7-8
Abstract: Statistical Modelling, Volume 23, Issue 1, Page 7-8, February 2023.
Authors: Francisco F. Queiroz, Silvia L. P. Ferrari
Abstract: Statistical Modelling, Ahead of Print. The main purpose of this article is to introduce a new class of regression models for bounded continuous data, commonly encountered in applied research. The models, named the power logit regression models, assume that the response variable follows a distribution in a wide, flexible class of distributions with three parameters, namely the median, a dispersion parameter and a skewness parameter. The article offers a comprehensive set of tools for likelihood inference and diagnostic analysis, and introduces the new R package PLreg. Applications with real and simulated data show the merits of the proposed models, the statistical tools and the computational package.
Citation: Statistical Modelling
PubDate: 2023-02-14T06:29:59Z
DOI: 10.1177/1471082X221140157
Authors: Maria Felice Arezzo, Serena Arima, Giuseppina Guagnano
Abstract: Statistical Modelling, Ahead of Print. In undeclared work research, estimating the magnitude of the phenomenon (i.e., the amount of income and/or the percentage of workers involved) is of major interest. This has been done either using indirect methods or by means of ad hoc surveys such as the Eurobarometer special survey on undeclared work, our motivating study. The extent of undeclared work can be measured by means of two different outcomes: the event of working off-the-books (a binary variable) and, when the event occurs, the amount of earnings derived from the undeclared activity (a continuous variable). This setup has typically been modelled via the so-called two-part model: a binary choice model for the probability of observing a positive-versus-zero outcome and then, conditional on a positive outcome, a regression model for the positive outcome. We propose an extension of the two-part model that goes in two directions. The first concerns measurement error, which, given the very nature of undeclared activities, is most likely to affect both outcomes of interest. The second is that we generalize the linear regression part of the model to allow individual-level means. We also conduct an extensive simulation study to investigate the performance of the proposed model and compare it with traditional approaches.
Citation: Statistical Modelling
PubDate: 2023-02-07T11:22:06Z
DOI: 10.1177/1471082X221145240
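The two-part structure described in this abstract can be illustrated with a minimal simulation. This is a generic sketch with made-up parameters (participation probability, log-normal earnings), not the authors' extended model with measurement error and individual-level means:

```python
import math
import random

random.seed(42)

def simulate_two_part(n, p_work=0.3, mu=6.0, sigma=0.8):
    """Simulate a two-part outcome: a binary indicator of undeclared work
    and, conditional on working, log-normal undeclared earnings."""
    data = []
    for _ in range(n):
        works = random.random() < p_work                    # part 1: binary choice
        earnings = math.exp(random.gauss(mu, sigma)) if works else 0.0
        data.append(earnings)                               # part 2: amount, or 0
    return data

y = simulate_two_part(10_000)
# Naive two-part estimation: the positive share estimates p_work, and the
# mean over positives estimates the conditional earnings level.
share_positive = sum(v > 0 for v in y) / len(y)
mean_positive = sum(v for v in y if v > 0) / sum(v > 0 for v in y)
```

In practice the binary part would be a logistic regression and the continuous part a regression on the log scale; the extension the authors propose additionally handles error in both measured outcomes.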
Authors: Christian H. Weiß, Malte Jahn
Abstract: Statistical Modelling, Ahead of Print. The soft-clipping binomial INGARCH (scBINGARCH) models are proposed as time series models for bounded counts, which have a nearly linear structure and also allow for negative autocorrelations. Conditions that guarantee the existence and certain mixing properties of the scBINGARCH process are derived, and further stochastic properties are discussed. The consistency and asymptotic normality of maximum likelihood estimators are established, and finite-sample properties are studied with simulations. The practical relevance of the scBINGARCH model's ability to allow for negative parameter and ACF values is demonstrated by some real-data examples.
Citation: Statistical Modelling
PubDate: 2022-12-21T03:55:15Z
DOI: 10.1177/1471082X221121223
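The soft-clipping idea can be sketched with the commonly used squashing function sc_c(x) = (1/c) log((1+e^{cx})/(1+e^{c(x-1)})), which maps the real line smoothly into (0, 1) and is nearly linear over most of [0, 1]; whether this is the exact parametrization used by the authors is an assumption here, as is the toy recursion below:

```python
import math
import random

def soft_clip(x, c=10.0):
    """Smooth, nearly linear squashing of x into (0, 1):
    (1/c) * log((1 + e^{cx}) / (1 + e^{c(x-1)}))."""
    def softplus(z):
        # numerically stable log(1 + e^z)
        return max(z, 0.0) + math.log1p(math.exp(-abs(z)))
    return (softplus(c * x) - softplus(c * (x - 1.0))) / c

# Toy binomial INGARCH-style recursion: the soft-clipped linear predictor
# keeps the success probability valid even with a NEGATIVE feedback
# coefficient a1, which produces negative autocorrelation.
random.seed(1)
n, a0, a1 = 20, 0.2, -0.5
x = [10]
for _ in range(200):
    p = soft_clip(a0 + a1 * x[-1] / n)                     # conditional probability
    x.append(sum(random.random() < p for _ in range(n)))   # Binomial(n, p) draw
```

The near-linearity means parameters retain an interpretation close to that of a linear model, while the clipping guarantees the conditional probability stays in (0, 1).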
Authors: Alfonso Russo, Alessio Farcomeni, Maria Grazia Pittau, Roberto Zelli
Abstract: Statistical Modelling, Ahead of Print. We derive a multivariate latent Markov model whose number of latent states can change at each time point. We model both the manifest and latent distributions conditionally on explanatory variables. Bayesian inference is based on a transdimensional Markov chain Monte Carlo approach, where Reversible Jump is performed separately for each time occasion. In a simulation study, we show that our approach can recover the true underlying sequence of latent states with high probability, and that it has lower bias than competitors. We conclude with an analysis of the well-being of 100 nations, as expressed by the dimensions of the Human Development Index, for six time points spanning a period of 22 years. R code with an implementation is available as supplementary material, together with files for reproducing the data analysis.
Citation: Statistical Modelling
PubDate: 2022-11-16T08:24:22Z
DOI: 10.1177/1471082X221127732
Authors: Cornelius Fritz, Giacomo De Nicola, Martje Rave, Maximilian Weigert, Yeganeh Khazaei, Ursula Berger, Helmut Küchenhoff, Göran Kauermann
Abstract: Statistical Modelling, Ahead of Print. Over the course of the COVID-19 pandemic, Generalized Additive Models (GAMs) have been successfully employed on numerous occasions to obtain vital data-driven insights. In this article we further substantiate the success story of GAMs, demonstrating their flexibility by focusing on three relevant pandemic-related issues. First, we examine the interdependency among infections in different age groups, concentrating on school children. In this context, we derive the setting under which parameter estimates are independent of the (unknown) case-detection ratio, which plays an important role in COVID-19 surveillance data. Second, we model the incidence of hospitalizations, for which data are only available with a temporal delay. We illustrate how correcting for this reporting delay through a nowcasting procedure can be naturally incorporated into the GAM framework as an offset term. Third, we propose a multinomial model for the weekly occupancy of intensive care units (ICU), where we distinguish between the number of COVID-19 patients, other patients and vacant beds. With these three examples, we aim to showcase the practical and ‘off-the-shelf’ applicability of GAMs to gain new insights from real-world data.
Citation: Statistical Modelling
PubDate: 2022-09-28T10:49:54Z
DOI: 10.1177/1471082X221124628
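The offset-based correction for reporting delay works because, under a log link, the expected *observed* count is the true incidence times the fraction reported so far, so the log of that fraction can be added as a fixed offset. A schematic sketch with invented reporting fractions (the real fractions would themselves be estimated by the nowcasting step):

```python
import math

# Hypothetical fraction of hospitalizations reported within d days of the event
report_prob = {0: 0.3, 1: 0.6, 2: 0.85, 3: 0.95}

def offset_for_delay(days_since_event):
    """log of the expected reported fraction. Under a Poisson/GAM log link,
    adding this as an offset lets the linear predictor target the TRUE
    incidence rather than the partially reported counts."""
    p = report_prob.get(days_since_event, 1.0)
    return math.log(p)

# If lambda_true is the true expected count, the expected observed count
# d days after the event is lambda_true * p_d:
lambda_true = 100.0
observed_mean = math.exp(math.log(lambda_true) + offset_for_delay(1))  # 60.0
```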
Authors: Joungyoun Kim, Johan Lim, Jong Soo Lee
Abstract: Statistical Modelling, Ahead of Print. In this article, we propose a new semiparametric hidden Markov model (HMM) for use in simultaneous hypothesis testing with dependency. The semi- or non-parametric HMMs in the literature require two conditions for model identifiability: (a) the latent Markov chain (MC) is ergodic and its transition probability matrix is full rank, and (b) the observational distributions of different hidden states are disjoint or linearly independent. Unlike the existing models, our semiparametric HMM with two hidden states makes no assumption on the transition probability of the latent MC but assumes that the observational distributions are extremal for the set of all stationary distributions of the model. To estimate the model, we propose a modified expectation-maximization algorithm whose M-step has an additional purification step to make the observational distribution an extremal one. We numerically investigate the performance of the proposed procedure in estimating the model and compare it to two recent existing methods in various multiple testing error settings. In addition, we apply our procedure to two real data examples: a gas chromatography/mass spectrometry experiment to differentiate the origin of herbal medicine, and the epidemiologic surveillance of an influenza-like illness.
Citation: Statistical Modelling
PubDate: 2022-09-27T07:56:45Z
DOI: 10.1177/1471082X221121235
Authors: Manuel Carlan, Thomas Kneib
Abstract: Statistical Modelling, Ahead of Print. We propose a novel Bayesian model framework for discrete ordinal and count data based on conditional transformations of the responses. The conditional transformation function is estimated from the data in conjunction with an a priori chosen reference distribution. For count responses, the resulting transformation model is novel in the sense that it is a Bayesian fully parametric yet distribution-free approach that can additionally account for excess zeros with additive transformation function specifications. For ordinal categorical responses, our cumulative link transformation model allows the inclusion of linear and non-linear covariate effects that can additionally be made category-specific, resulting in (non-)proportional odds or hazards models and more, depending on the choice of the reference distribution. Inference is conducted by a generic modular Markov chain Monte Carlo algorithm where multivariate Gaussian priors enforce specific properties, such as smoothness, on the functional effects. To illustrate the versatility of Bayesian discrete conditional transformation models, we present applications to counts of patent citations in the presence of excess zeros and to forest health categories in a discrete partial proportional odds model.
Citation: Statistical Modelling
PubDate: 2022-09-23T04:39:02Z
DOI: 10.1177/1471082X221114177
Authors: Jordache Ramjith, Andreas Bender, Kit C. B. Roes, Marianne A. Jonker
Abstract: Statistical Modelling, Ahead of Print. Recurrent events analysis plays an important role in many applications, including the study of chronic diseases or recurrence of infections. Historically, many models for recurrent events have been variants of the Cox model. In this article we introduce and describe the application of the piece-wise exponential Additive Mixed Model (PAMM) for recurrent events analysis and illustrate how PAMMs can be used to flexibly model the dependencies in recurrent events data. Simulations confirm that PAMMs provide unbiased estimates as well as equivalence to the Cox model when proportional hazards are assumed. Applications to recurrence of Staphylococcus aureus and malaria in children illustrate the estimation of seasonality, bivariate non-linear effects, multiple timescales and relaxation of the proportional hazards assumption via time-varying effects. The R package pammtools is extended to facilitate estimation and visualization of PAMMs for recurrent events data.
Citation: Statistical Modelling
PubDate: 2022-09-09T04:35:55Z
DOI: 10.1177/1471082X221117612
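The piece-wise exponential approach rests on a data augmentation step: each subject's follow-up is split at chosen cut points into interval-level rows, with the log of the time at risk in each interval entering a Poisson model as an offset. A minimal sketch of that splitting step (the cut points and record layout here are illustrative, not the pammtools format):

```python
import math

def split_into_intervals(time, status, cuts):
    """Expand one (time, status) survival record into piecewise-exponential
    rows: one row per interval at risk, with the log exposure as offset.
    status == 1 means the event was observed at `time`."""
    rows = []
    start = 0.0
    for end in cuts:
        if start >= time:
            break                                     # no longer at risk
        exposure = min(end, time) - start             # time at risk in interval
        event = 1 if (status == 1 and time <= end) else 0
        rows.append({"start": start, "end": end,
                     "event": event, "offset": math.log(exposure)})
        start = end
    return rows

# Subject observed to have an event at t = 2.5, cuts at 1, 2 and 3:
rows = split_into_intervals(time=2.5, status=1, cuts=[1.0, 2.0, 3.0])
```

Fitting a Poisson GAM to such rows with the offset yields the piece-wise constant (or smoothly varying) hazard; for recurrent events the same split is applied per event episode.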
Authors: Fatima-Zahra Jaouimaa, Il Do Ha, Kevin Burke
Abstract: Statistical Modelling, Ahead of Print. We consider a parametric modelling approach for survival data where covariates are allowed to enter the model through multiple distributional parameters (i.e., scale and shape). This is in contrast to the standard convention of having a single covariate-dependent parameter, typically the scale. This multi-parameter regression (MPR) approach to modelling has been shown to produce flexible and robust models with relatively low model complexity cost. However, clustered data arise very commonly in survival analysis studies, and this is underdeveloped in the MPR context. The purpose of this article is to extend MPR models to handle multivariate survival data by introducing random effects in both the scale and the shape regression components. We consider a variety of possible dependence structures for these random effects (independent, shared and correlated), and estimation proceeds using an h-likelihood approach. The performance of our estimation procedure is investigated by way of an extensive simulation study, and the merits of our modelling approach are illustrated through applications to two real data examples: a lung cancer dataset and a bladder cancer dataset.
Citation: Statistical Modelling
PubDate: 2022-09-07T10:47:43Z
DOI: 10.1177/1471082X221117377
Authors: Julien Gibaud, Xavier Bry, Catherine Trottier, Frédéric Mortier, Maxime Réjou-Méchain
Abstract: Statistical Modelling, Ahead of Print. In this article, we propose to cluster responses in order to identify groups predicted by specific explanatory components. A response matrix is assumed to depend on a set of explanatory variables and a set of additional covariates. The explanatory variables are assumed to be many and redundant, which calls for dimension reduction and regularization. By contrast, the additional covariates contain few selected variables which are forced into the regression model, as they demand no regularization. The response matrix is assumed to be partitioned into several unknown groups of responses, and we suppose that the responses in each group are predictable from an appropriate number of specific orthogonal supervised components of the explanatory variables. The classification is based on a mixture model of the responses. To estimate the model, we propose a criterion extending that of Supervised Component-based Generalized Linear Regression, a Partial Least Squares-type method, and develop an algorithm combining component-based modelling and Expectation-Maximization estimation. This new methodology is tested on simulated data and then applied to a floristic ecology dataset.
Citation: Statistical Modelling
PubDate: 2022-09-05T11:37:36Z
DOI: 10.1177/1471082X221115525
Authors: Paola Bortot, Carlo Gaetan
Abstract: Statistical Modelling, Ahead of Print. In extreme value studies, models for observations exceeding a fixed high threshold have the advantage of exploiting the available extremal information while avoiding bias from low values. In the context of space-time data, the challenge is to develop models for threshold exceedances that account for both spatial and temporal dependence. We address this issue through a modelling approach that embeds spatial dependence within a time series formulation. The model allows for different forms of limiting dependence in the spatial and temporal domains as the threshold level increases. In particular, temporal asymptotic independence is assumed, as this is often supported by empirical evidence, especially in environmental applications, while both asymptotic dependence and asymptotic independence are considered for the spatial domain. Inference from the observed exceedances is carried out through a combination of pairwise likelihood and a censoring mechanism. For those model specifications for which direct maximization of the censored pairwise likelihood is infeasible, we propose an indirect inference procedure which leads to satisfactory results in a simulation study. The approach is applied to a dataset of rainfall amounts recorded at a set of weather stations in the North Brabant province of the Netherlands. The application shows that the range of extremal patterns the model can cover is wide and that its performance is competitive with an alternative existing model for space-time threshold exceedances.
Citation: Statistical Modelling
PubDate: 2022-05-28T06:09:13Z
DOI: 10.1177/1471082X221098224
Authors: Nicoletta D’Angelo, David Payares, Giada Adelfio, Jorge Mateu
Abstract: Statistical Modelling, Ahead of Print. Although there are recent developments for the analysis of first- and second-order characteristics of point processes on networks, there are very few attempts at introducing models for network data. Motivated by the analysis of crime data in Bucaramanga (Colombia), we propose a spatiotemporal Hawkes point process model adapted to events living on linear networks. We first consider a non-parametric modelling strategy, with non-parametric estimation of both the background and the triggering components. We then consider a semi-parametric version, including a parametric estimation of the background based on covariates, and a non-parametric one of the triggering effects. Our model can be easily adapted to multi-type processes. The network model outperforms a planar version, improving the fit of the self-exciting point process model.
Citation: Statistical Modelling
PubDate: 2022-05-20T04:54:21Z
DOI: 10.1177/1471082X221094146
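The self-exciting mechanism underlying any Hawkes model can be sketched in its plain temporal form: a background rate plus a triggering contribution from each past event. The exponential kernel and the parameter values below are illustrative only; the authors' model additionally adapts background and triggering to distances along a linear network, which is not shown here:

```python
import math

def hawkes_intensity(t, history, mu=0.5, alpha=0.8, beta=1.2):
    """Conditional intensity of a self-exciting (Hawkes) process with an
    exponential triggering kernel: background rate mu plus a contribution
    alpha * exp(-beta * (t - t_i)) from every past event t_i < t."""
    return mu + sum(alpha * math.exp(-beta * (t - ti))
                    for ti in history if ti < t)

events = [1.0, 1.4, 3.0]        # made-up past event times
lam = hawkes_intensity(3.1, events)   # elevated just after the event at t = 3.0
```

Each crime raises the short-term intensity of further events nearby; on a network, "nearby" is measured along the streets rather than in the plane.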
Authors: Emmanuel O. Ogundimu
Abstract: Statistical Modelling, Ahead of Print. Prediction models in credit scoring are often formulated using available data on accepted applicants at the loan application stage. Using these data to estimate the probability of default (PD) may lead to bias due to non-random selection from the population of applicants. That is, the PD in the general population of applicants may not be the same as the PD in the subpopulation of accepted applicants. A prominent model for the reduction of bias in this framework is the sample selection model, but there is no consensus on its utility yet. It is unclear whether the bias-variance trade-off of regularization techniques can improve predictions of PD in the non-random sample selection setting. To address this, we propose the use of the Lasso and the adaptive Lasso for variable selection and optimal predictive accuracy. By appealing to the least squares approximation of the likelihood function of the sample selection model, we optimize the resulting function subject to L1 and adaptively weighted L1 penalties using an efficient algorithm. We evaluate the performance of the proposed approach and competing alternatives in a simulation study and apply it to the well-known American Express credit card dataset.
Citation: Statistical Modelling
PubDate: 2022-05-09T09:11:19Z
DOI: 10.1177/1471082X221092181
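The contrast between an L1 and an adaptively weighted L1 penalty is easiest to see in the orthonormal-design case, where each Lasso coefficient is a soft-thresholded least-squares estimate. A schematic sketch with made-up estimates (the authors' algorithm operates on the least squares approximation of the full selection-model likelihood, not on this toy case):

```python
def soft_threshold(z, lam):
    """Lasso update for one coefficient under an orthonormal design:
    shrink the least-squares estimate z towards zero by lam, setting to
    zero any coefficient whose signal does not exceed the penalty."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

ols = [2.5, -0.3, 0.05]    # made-up least-squares estimates
lam = 0.4
lasso = [soft_threshold(b, lam) for b in ols]
# Adaptive Lasso: each penalty is weighted by 1/|initial estimate|, so
# large coefficients are shrunk less and small ones are penalized harder.
adaptive = [soft_threshold(b, lam / abs(b)) for b in ols]
```

The adaptive weighting is what gives the adaptive Lasso its oracle-type selection behaviour: strong signals survive nearly unshrunk while weak ones are zeroed out.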
Authors: Jan Felix Meyer, Göran Kauermann, Michael Stanley Smith
Abstract: Statistical Modelling, Ahead of Print. We propose a model of retail demand for air travel and ticket price elasticity at the daily booking and individual flight level. Daily bookings are modelled as a non-homogeneous Poisson process with respect to the time to departure. The booking intensity is a function of booking- and flight-level covariates, including non-linear effects modelled semi-parametrically using penalized splines. Customer heterogeneity is incorporated using a finite mixture model, where the latent segments have covariate-dependent probabilities. We fit the model to a unique dataset of over one million daily counts of bookings for 9 602 scheduled flights on a short-haul route over two years. A control variate approach with a strong instrument corrects for a substantial level of price endogeneity. A rich latent segmentation is uncovered, along with strong covariate effects. The calibrated model can be used to quantify demand and price elasticity for different flights booked on different days prior to departure, and is a step towards continuous pricing, a major objective of airlines. As our model is interpretable, forecasts can be created under different scenarios. For instance, while our model is calibrated on data collected prior to COVID-19, many of the empirical insights are likely to remain valid as air travel recovers in the post-COVID-19 period.
Citation: Statistical Modelling
PubDate: 2022-05-09T09:01:57Z
DOI: 10.1177/1471082X221083343
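A non-homogeneous Poisson process with a booking intensity that rises towards departure can be simulated by Lewis-Shedler thinning. The intensity shape below is invented for illustration; the authors' intensity is a semi-parametric function of many covariates:

```python
import math
import random

random.seed(0)

def simulate_nhpp(intensity, horizon, lam_max):
    """Simulate a non-homogeneous Poisson process on [0, horizon] by
    thinning: draw candidate times from a homogeneous process with rate
    lam_max and keep each with probability intensity(t) / lam_max."""
    t, events = 0.0, []
    while True:
        t += random.expovariate(lam_max)
        if t > horizon:
            return events
        if random.random() < intensity(t) / lam_max:
            events.append(t)

# Toy booking intensity rising towards departure at t = 100 (assumed shape);
# lam_max = 4.5 dominates it everywhere on [0, 100].
booking_rate = lambda t: 0.5 + 4.0 * math.exp(-(100.0 - t) / 20.0)
bookings = simulate_nhpp(booking_rate, horizon=100.0, lam_max=4.5)
```

Thinning is valid as long as lam_max bounds the intensity over the whole horizon; the accepted times are then a draw from the target process.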
Authors: Zeynab Aghabazaz, Iraj Kazemi, Alireza Nematollahi
Abstract: Statistical Modelling, Ahead of Print. This article studies long-term and short-term volatility and co-volatility in stock markets by introducing modelling strategies for multivariate data analysis that deal with serially correlated innovations and cross-section dependence. In particular, it presents an innovative mixed-effects model through a GARCH process, allowing for heterogeneity effects and time-series dynamics. We propose a non-parametric regression model based on penalized low-rank smoothing splines to represent time trends in the variance and covariance equations. The strategy provides flexible modelling of the low-frequency volatility and co-volatility in equity markets. The decomposed low-frequency matrix is modelled using the modified Cholesky factorization. The Hamiltonian Monte Carlo technique is implemented as a Bayesian computing process for estimating parameters and latent factors. The advantage of our modelling strategy in empirical studies is highlighted by examining the effect of latent financial factors on a panel of 10 equities over 110 weekly series. The model can differentiate non-parametrically between dynamic patterns of high and low frequencies in the variance-covariance structural equations and incorporate economic features to predict variability in stock markets from time-series evidence.
Citation: Statistical Modelling
PubDate: 2022-03-15T05:28:10Z
DOI: 10.1177/1471082X221080488
Authors: Aljo Clair Pingal, Cathy W. S. Chen
Abstract: Statistical Modelling, Ahead of Print. External events, commonly known as interventions, often affect time series of counts. This research introduces a class of transfer function models that covers four different types of interventions on integer-valued time series: abrupt start and abrupt decay (additive outlier), abrupt start and gradual decay (transient shift), abrupt start and permanent effect (level shift), and gradual start and permanent effect. We propose integer-valued transfer function models incorporating a generalized Poisson, log-linear generalized Poisson or negative binomial distribution to estimate and detect these four types of interventions in a time series of counts. Utilizing Bayesian methods, namely adaptive Markov chain Monte Carlo (MCMC) algorithms, for estimation, we further employ the deviance information criterion (DIC), posterior odds ratios and mean squared standardized residuals for model comparison. As an illustration, this study evaluates the effectiveness of our methods through a simulation study and an application to crime data in Albury City, New South Wales (NSW), Australia. Simulation results show that the MCMC procedure is reasonably effective. The empirical outcome also reveals that the proposed models successfully detect the locations and types of interventions.
Citation: Statistical Modelling
PubDate: 2022-03-02T06:34:05Z
DOI: 10.1177/1471082X221075477
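The four intervention shapes can be written down schematically with the usual transfer-function parametrization, where omega is the intervention size and delta in (0, 1) governs the decay or build-up rate. The exact form used by the authors within their count models is not reproduced here; this is the generic textbook shape of each type:

```python
def intervention_effect(t, T, kind, omega=1.0, delta=0.7):
    """Effect at integer time t of an intervention starting at time T,
    for the four types: additive outlier (AO), transient shift (TS),
    level shift (LS) and gradual start with permanent effect (GP)."""
    if t < T:
        return 0.0
    if kind == "AO":                 # abrupt start, abrupt decay
        return omega if t == T else 0.0
    if kind == "TS":                 # abrupt start, gradual decay
        return omega * delta ** (t - T)
    if kind == "LS":                 # abrupt start, permanent effect
        return omega
    if kind == "GP":                 # gradual start, permanent effect
        return omega * (1.0 - delta ** (t - T + 1))
    raise ValueError(kind)
```

An AO perturbs a single observation, a TS decays geometrically back to the old level, an LS shifts the level permanently, and a GP builds up geometrically towards a permanent shift of size omega.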
Authors: Sina Mews, Roland Langrock, Marius Ötting, Houda Yaqine, Jost Reinecke
Abstract: Statistical Modelling, Ahead of Print. Continuous-time state-space models (SSMs) are flexible tools for analysing irregularly sampled sequential observations that are driven by an underlying state process. Corresponding applications typically involve restrictive assumptions concerning linearity and Gaussianity to facilitate inference on the model parameters via the Kalman filter. In this contribution, we provide a general continuous-time SSM framework, allowing both the observation and the state process to be non-linear and non-Gaussian. Statistical inference is carried out by maximum approximate likelihood estimation, where multiple numerical integration within the likelihood evaluation is performed via a fine discretization of the state process. The corresponding reframing of the SSM as a continuous-time hidden Markov model, with structured state transitions, enables us to apply the associated efficient algorithms for parameter estimation and state decoding. We illustrate the modelling approach in a case study using data from a longitudinal study on delinquent behaviour of adolescents in Germany, revealing temporal persistence in the deviation of an individual’s delinquency level from the population mean.
Citation: Statistical Modelling
PubDate: 2022-01-17T04:44:45Z
DOI: 10.1177/1471082X211065785