- Nonparametric estimation of bivariate hidden Markov models using tensor-product B-splines
Authors: Rouven Michels, Roland Langrock Abstract: Statistical Modelling, Ahead of Print. For multivariate time series driven by underlying states, hidden Markov models (HMMs) constitute a powerful framework which can be flexibly tailored to the situation at hand. However, in practice, it can be challenging to choose an adequate family of ... Citation: Statistical Modelling PubDate: 2025-05-06T11:56:03Z DOI: 10.1177/1471082X251335431
- Bayesian semiparametric inference for TVP-SVAR models with asymmetry and fat tails
Authors: Matteo Iacopini, Luca Rossini Abstract: Statistical Modelling, Ahead of Print. Time-varying parameter (TVP) structural vector autoregressive models with stochastic volatility (SVAR-SV) usually assume Gaussian innovations and a smooth or discrete path for the coefficients. To account for possible skewness and fat tails, this work ... Citation: Statistical Modelling PubDate: 2025-04-17T12:38:49Z DOI: 10.1177/1471082X251326360
- Fast and efficient joint modelling of multivariate longitudinal data and time-to-event data with a pairwise-fitting approach
Authors: Dries De Witte, Geert Molenberghs, Ariel Alonso Abad, Thomas Neyens, Geert Verbeke Abstract: Statistical Modelling, Ahead of Print. In empirical studies, multiple outcomes are often measured repeatedly over time, and interest frequently lies in studying the association between these longitudinal outcomes and a time-to-event outcome. Therefore, shared-parameter joint models for ... Citation: Statistical Modelling PubDate: 2025-04-11T05:55:08Z DOI: 10.1177/1471082X251328452
- Realized covariance models with time-varying parameters and spillover effects
Authors: Luc Bauwens, Edoardo Otranto Abstract: Statistical Modelling, Ahead of Print. A realized covariance model specifies a dynamic process for a conditional covariance matrix of daily asset returns as a function of past realized variances and covariances. We propose parsimonious parameterizations enabling a spillover effect in the ... Citation: Statistical Modelling PubDate: 2025-03-19T11:52:30Z DOI: 10.1177/1471082X251324273
- A general framework for random effects models for binary, ordinal, count type and continuous dependent variables
Authors: Gerhard Tutz Abstract: Statistical Modelling, Ahead of Print. A general random effects model is proposed that allows for continuous as well as discrete distributions of the responses. Responses can be unrestricted continuous, bounded continuous, binary, ordered categorical or given in the form of counts. The distribution of the responses is not restricted to exponential families, which is a severe restriction in generalized mixed models. Generalized mixed models use fixed distributions for responses, for example the Poisson distribution in count data, which has the disadvantage of not accounting for overdispersion. By using a response function and a threshold function, the proposed mixed threshold model can account for a variety of alternative distributions that often show better fits than fixed distributions used within the generalized linear model framework. A particular strength of the model is that it provides a tool for joint modelling: responses may be of different types; some can be discrete, others continuous. Citation: Statistical Modelling PubDate: 2025-02-26T04:49:01Z DOI: 10.1177/1471082X251318471
- Editorial: An informal look at 25 Years of Statistical Modelling papers
Authors: Vito M. R. Muggeo, Paul H. C. Eilers Abstract: Statistical Modelling, Ahead of Print.
Citation: Statistical Modelling PubDate: 2025-02-14T12:20:19Z DOI: 10.1177/1471082X251317581
- Semi-parametric model approach to causal mediation analysis for longitudinal data
Authors: Youjun Li, Jeffrey M. Albert Abstract: Statistical Modelling, Ahead of Print. Causal mediation analysis has rarely been implemented for complicated longitudinal data. Most existing work focuses on extensions of parametric models that have been well developed for causal mediation analysis. To better handle more complex data patterns, our approach takes advantage of the flexibility of penalized splines and performs the causal mediation analysis under the structural equation model framework. We also provide the formula for identifying the natural direct and indirect effects based on our semi-parametric models, whose inference is carried out by the delta method and Monte Carlo approximation. Our approach is first evaluated by conducting simulation studies, where the two methods for inference are compared. Finally, we apply the method to data from a longitudinal cohort study to examine the effect of a training programme for healthcare providers on improving their patients' type 2 diabetes condition. Citation: Statistical Modelling PubDate: 2025-02-07T09:25:26Z DOI: 10.1177/1471082X241306911
- A two-level multivariate response model for data with latent structures
Authors: Yingjuan Zhang, Jochen Einbeck, Reza Drikvandi Abstract: Statistical Modelling, Ahead of Print. A novel approach is proposed for analysing multilevel multivariate response data. The approach is based on identifying a one-dimensional latent variable spanning the space of responses, which then induces correlation between upper-level units. The latent variable, which can be thought of as a random effect, is estimated along with the other model parameters using an EM algorithm, which can be seen in the tradition of the 'nonparametric maximum likelihood' estimator for two-level linear (univariate response) models. Simulations and real data examples from different fields are provided to illustrate the proposed methods in the context of regression and clustering applications. Citation: Statistical Modelling PubDate: 2025-02-07T09:24:27Z DOI: 10.1177/1471082X241313024
- Graph-structured variable selection with Gaussian Markov random field horseshoe prior
Authors: Marie Denis, Mahlet G. Tadesse Abstract: Statistical Modelling, Ahead of Print. A graph structure is commonly used to characterize the dependence between variables, which may be induced by time, space, biological networks or other factors. Incorporating this dependence structure into the variable selection procedure can improve the identification of relevant variables, especially those with subtle effects. The Bayesian approach provides a natural framework to integrate this information through the prior distributions. In this work, we propose combining two priors that have been well studied separately, the Gaussian Markov random field prior and the horseshoe prior, to perform selection on graph-structured variables. Local shrinkage parameters that capture the dependence between connected covariates are specified to encourage a similar amount of shrinkage for their regression coefficients, while a standard horseshoe prior is used for non-connected variables. After evaluating the performance of the method on different simulated scenarios, we present three applications: one in quantitative trait loci mapping with block sequential structure, one in near-infrared spectroscopy with sequential non-disjoint dependence and another in a gene expression study with a general dependence structure. Citation: Statistical Modelling PubDate: 2025-01-28T11:07:03Z DOI: 10.1177/1471082X241310958
- Dual-phase threshold selection methodology for modelling extreme events
Authors: K. M. Sakthivel, V. Nandhini Abstract: Statistical Modelling, Ahead of Print. Extreme value theory is a method for modelling and evaluating risks in unusual or rare situations, and it has gained popularity in risk management. In general, the probability of extreme occurrences can be assessed by fitting a probability distribution to a sample of extreme observations. It is crucial to examine the tail shape of the distribution since it affects the estimation of parameters associated with extreme events. The extreme value index is a shape parameter that quantifies the tail behaviour of a distribution. Selecting an appropriate threshold is a major challenge in extreme value analysis, especially in peaks over threshold methods. The choice of threshold significantly impacts the bias-variance trade-off in extreme value models. The choice of the threshold based on the stability plot is often subjective. To address this challenge, we have developed an efficient dual-phase threshold selection method. This approach involves trimming non-exceedances through a two-phase procedure and using the Cramér-von Mises test to assess exceedances above a suitable threshold, thereby determining the optimal threshold. Exceedances above a certain threshold typically follow the generalized Pareto distribution asymptotically. Estimates of the return levels associated with their return periods can be used to predict the statistical properties (magnitude and frequency) of upcoming exceedances. A simulation study is conducted to evaluate the effectiveness of the proposed threshold selection method. Citation: Statistical Modelling PubDate: 2025-01-27T06:32:29Z DOI: 10.1177/1471082X241307286
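The peaks-over-threshold workflow that this abstract builds on can be sketched in a few lines. This is a minimal illustration under assumed data, not the authors' dual-phase procedure: fit a generalized Pareto distribution to exceedances over each candidate threshold and score the fit with the Cramér-von Mises test, keeping the lowest threshold whose fit is not rejected.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.pareto(3.0, size=5000) + 1.0  # illustrative heavy-tailed sample

def cvm_pvalue_at(threshold, x):
    """Fit a GPD to the exceedances over `threshold` and return the
    Cramér-von Mises p-value of the fitted model (approximate, since the
    GPD parameters are estimated from the same data)."""
    exc = x[x > threshold] - threshold
    c, loc, scale = stats.genpareto.fit(exc, floc=0.0)
    return stats.cramervonmises(exc, "genpareto", args=(c, loc, scale)).pvalue

# Scan candidate thresholds (sample quantiles) and keep the lowest one
# whose GPD fit is not rejected at the 5% level.
candidates = np.quantile(data, [0.80, 0.85, 0.90, 0.95])
pvals = [cvm_pvalue_at(u, data) for u in candidates]
chosen = next((u for u, p in zip(candidates, pvals) if p > 0.05), candidates[-1])
```

The two-phase trimming of non-exceedances described in the abstract would replace the simple quantile scan here.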
- Spatio-temporal hierarchical clustering of interval time series with application to suicide rates in Europe
Authors: Raffaele Mattera, Philip Hans Franses Abstract: Statistical Modelling, Ahead of Print. In this paper, we investigate similarities of suicide rates in Europe, which are available as interval time series. For this aim, a novel spatio-temporal hierarchical clustering algorithm for interval time-series data is proposed. The spatial dimension is included in the clustering process to account for possible relevant information such as weather conditions, sunlight hours and socio-cultural factors. Our results indicate the presence of six main clusters in Europe, which almost overlap with the sunlight hours distribution. Differences between male and female suicide rates are also investigated. Citation: Statistical Modelling PubDate: 2024-12-19T01:32:45Z DOI: 10.1177/1471082X241299250
- Joint modelling of dyadic and monadic count outcomes: an application to modelling forced migration flows
Authors: Caterina Conigliani Abstract: Statistical Modelling, Ahead of Print. The aim of this study is to explore the adoption of a joint modelling framework for dealing with dyadic and monadic count outcomes with excess zeros simultaneously via a common latent structure. As a case study, we consider the problem of identifying the different push and pull factors of cross-border forced migration and internal displacement. We consider a full panel data analysis and estimate a random effects joint hurdle model following the Bayesian paradigm; the resultant posterior is approximated through the integrated nested Laplace approximation. Citation: Statistical Modelling PubDate: 2024-12-17T01:02:29Z DOI: 10.1177/1471082X241302176
- Inference on a bivariate binomial distribution with zero-inflation applicable to baseball data
Authors: Seong W. Kim, Kipum Kim, Jaeyong Lee, Beom Seuk Hwang Abstract: Statistical Modelling, Ahead of Print. It is common to encounter situations where two success probabilities are the parameters of interest, based on nested (two-stage) binary data such as commonly arise in sports. Under these circumstances, two correlated and nested binomial random variables can be utilized for analysis. Analysis of discrete count data with excessive zeros has been developed using different zero-inflated statistical models that allow for frequent zero-valued observations. The zero-inflated binomial (ZIB) distribution is one model that can be adequate when the underlying data generation of non-zero values is based on a sequence of independent Bernoulli trials. In this article, we propose a zero-inflated bivariate binomial distribution that can be applied to nested bivariate data when both components are zero-inflated. Some theoretical properties of the model are investigated and default Bayesian procedures regarding prior elicitation are also addressed. Moreover, the Bayesian predictive distribution is derived based on a three-fold distribution to see how a future observation behaves. Extensive simulation studies are performed to support the theoretical results, and real datasets for Major League Baseball players are analysed to illustrate the methodology developed in this paper. Citation: Statistical Modelling PubDate: 2024-12-17T01:01:00Z DOI: 10.1177/1471082X241299916
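For intuition, the univariate zero-inflated binomial that the paper's bivariate model generalizes has a simple pmf: a point mass at zero mixed with an ordinary binomial. A minimal sketch (parameter values are illustrative, not from the paper):

```python
import math

def zib_pmf(k, n, p, pi):
    """P(X = k) for a zero-inflated binomial: with probability `pi` the
    count is a structural zero, otherwise X ~ Binomial(n, p)."""
    binom = math.comb(n, k) * p**k * (1 - p) ** (n - k)
    return pi * (k == 0) + (1 - pi) * binom

# Sanity check: the pmf sums to one, and zeros occur more often than
# under the plain Binomial(10, 0.3).
total = sum(zib_pmf(k, 10, 0.3, 0.2) for k in range(11))
```

The bivariate version in the article couples two such components through a shared zero-inflation mechanism.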
- A Bayesian approach for variable selection in mixture of logistic regressions with Pólya-Gamma data augmentation
Authors: Mariella A. Bogoni, Daiane A. Zuanetti Abstract: Statistical Modelling, Ahead of Print. We present Bayesian methods for estimating and selecting variables in a mixture of logistic regression models. A common issue with the logistic model is its intractable likelihood, which prevents us from applying simpler Bayesian algorithms, such as Gibbs sampling, for estimating and selecting the model since there is no conjugacy for the regression coefficients. We propose to solve this problem by applying the data augmentation approach with Pólya-Gamma random variables to the logistic regression mixture model. For selecting covariates in this model, we investigate the performance of two prior distributions for the regression coefficients. A Gibbs sampling algorithm is then applied to perform variable selection and fit the model. The conjugacy obtained for the distribution of the regression coefficients allows us to analytically calculate the marginal likelihood and gain computational efficiency in the variable selection process. The methodologies are applied to both synthetic and real data. Citation: Statistical Modelling PubDate: 2024-11-14T08:55:11Z DOI: 10.1177/1471082X241277373
- Quantile and expectile copula-based hidden Markov regression models for the analysis of the cryptocurrency market
Authors: Beatrice Foroni, Luca Merlo, Lea Petrella Abstract: Statistical Modelling, Ahead of Print. The role of cryptocurrencies within financial systems has been expanding rapidly in recent years among investors and institutions. It is therefore crucial to investigate this phenomenon and develop statistical methods able to capture their interrelationships, the links with other global systems and, at the same time, the serial heterogeneity. Here we introduce hidden Markov regression models for jointly estimating quantiles and expectiles of cryptocurrency returns using regime-switching copulas. The proposed approach allows us to focus on extreme returns and describe their temporal evolution by introducing time-dependent coefficients, evolving according to a latent Markov chain. Moreover, to model their time-varying dependence structure, we consider elliptical copula functions defined by state-specific parameters. Maximum likelihood estimates are obtained via an expectation-maximization algorithm. The empirical analysis investigates the relationship between daily returns of five cryptocurrencies and major world market indices. Citation: Statistical Modelling PubDate: 2024-10-21T11:53:05Z DOI: 10.1177/1471082X241279513
- Guided structure learning of DAGs for count data
Authors: Thi Kim Hue Nguyen, Monica Chiogna, Davide Risso, Erika Banzato Abstract: Statistical Modelling, Ahead of Print. In this paper, we tackle structure learning of directed acyclic graphs (DAGs), with the idea of exploiting available prior knowledge of the domain at hand to guide the search of the best structure. In particular, we assume to know the topological ordering of variables in addition to the given data. We study a new algorithm for learning the structure of DAGs, proving its theoretical consistency in the limit of infinite observations. Furthermore, we experimentally compare the proposed algorithm to several popular competitors, to study its behaviour in finite samples. Biological validation of the algorithm is presented through the analysis of non-small cell lung cancer data. Citation: Statistical Modelling PubDate: 2024-10-11T11:02:33Z DOI: 10.1177/1471082X241266738
- A doubly stochastic point process approach for spatio-temporal dynamics of crime data
Authors: Jonatan A. González, Jorge Mateu, Nubia E. Céspedes, Erika A. Camacho, Luis C. Cervantes Abstract: Statistical Modelling, Ahead of Print. Crime, in general, underlies crucial issues for many societies in large cities worldwide. Indeed, crime and neighbourhood disorder may negatively impact the health of urban residents. Thus, crime rate reduction is at the core of many local policies driven by active plans supported by police action and local authorities. Considering crime reports as a spatio-temporal point pattern, we propose spatio-temporal log-Gaussian Cox processes as a modelling framework for crimes in space and time. We model the spatial and temporal variation through generalized parametric additive and linear models, and a Gaussian space-time process approximates the residual variation. The inference is performed via Markov chain Monte Carlo through MALA algorithms. We provide short-term forecasts of future crimes and suggest a surveillance system that operates by reporting predictive probabilities. Our data come from the reported crimes in the locality of Kennedy (Bogota) over several years and several types of crimes. The police department may use our method to help allocate police resources and design crime prevention strategies and policies, such as surveillance, in specific zonal planning units. Citation: Statistical Modelling PubDate: 2024-10-08T09:15:36Z DOI: 10.1177/1471082X241264690
- Rational voting behaviour accounting for heterogeneous ballots in British elections
Authors: Ingrid Mauerer, Annemarie Walter Abstract: Statistical Modelling, Ahead of Print. The choice situation most analysed by electoral researchers is the vote at the ballot box. In partially contested elections, such as in Britain or Spain, voters cannot vote for all parties in every region or constituency. We present a modelling approach to study rational voting behaviour that integrates such heterogeneous ballots and takes voting for the numerous parties therein seriously. The empirical application to the 2015 British Election reveals substantial insights not found when neglecting ballot composition heterogeneity or applying existing approaches. Citation: Statistical Modelling PubDate: 2024-10-03T11:52:46Z DOI: 10.1177/1471082X241272346
- A statistical modelling approach to feedforward neural network model selection
Authors: Andrew McInerney, Kevin Burke Abstract: Statistical Modelling, Ahead of Print. Feedforward neural networks (FNNs) can be viewed as non-linear regression models, where covariates enter the model through a combination of weighted summations and non-linear functions. Although these models have some similarities to the approaches used within statistical modelling, the majority of neural network research has been conducted outside of the field of statistics. This has resulted in a lack of statistically based methodology, and, in particular, there has been little emphasis on model parsimony. Determining the input layer structure is analogous to variable selection, while the structure for the hidden layer relates to model complexity. In practice, neural network model selection is often carried out by comparing models using out-of-sample performance. However, in contrast, the construction of an associated likelihood function opens the door to information-criteria-based variable and architecture selection. A novel model selection method, which performs both input- and hidden-node selection, is proposed using the Bayesian information criterion (BIC) for FNNs. The choice of BIC over out-of-sample performance as the model selection objective function leads to an increased probability of recovering the true model, while parsimoniously achieving favourable out-of-sample performance. Simulation studies are used to evaluate and justify the proposed method, and applications on real data are investigated. Citation: Statistical Modelling PubDate: 2024-09-17T05:58:23Z DOI: 10.1177/1471082X241258261
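The selection criterion at the heart of this abstract is generic: BIC = k ln(n) - 2 ln(L-hat), and the model with the smallest BIC wins. A toy sketch with nested linear models under Gaussian errors (simulated data; not the authors' FNN implementation):

```python
import numpy as np

def gaussian_bic(y, X):
    """BIC = k*ln(n) - 2*ln(Lhat) for least squares with Gaussian errors,
    counting the regression coefficients plus the error variance."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / n                       # MLE of the error variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return (k + 1) * np.log(n) - 2 * loglik

rng = np.random.default_rng(1)
n = 500
x1, x2 = rng.normal(size=(2, n))
y = 2.0 * x1 + rng.normal(scale=0.5, size=n)         # x2 is irrelevant

small = np.column_stack([np.ones(n), x1])            # intercept + x1
big = np.column_stack([np.ones(n), x1, x2])          # adds the noise variable
bic_small, bic_big = gaussian_bic(y, small), gaussian_bic(y, big)
# BIC typically prefers the parsimonious model containing only x1.
```

For an FNN the same criterion applies with the network's log-likelihood and weight count in place of the linear-model quantities.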
- An extended generalized Pareto regression model for count data
Authors: Touqeer Ahmad, Carlo Gaetan, Philippe Naveau Abstract: Statistical Modelling, Ahead of Print. The statistical modelling of discrete extremes has received less attention than their continuous counterparts in the extreme value theory (EVT) literature. One approach to the transition from continuous to discrete extremes is the modelling of threshold exceedances of integer random variables by the discrete version of the generalized Pareto distribution. However, the optimal choice of thresholds defining exceedances remains a problematic issue. Moreover, in a regression framework, the treatment of the majority of non-extreme data below the selected threshold is either ignored or separated from the extremes. To tackle these issues, we expand on the concept of employing a smooth transition between the bulk and the upper tail of the distribution. In the case of zero inflation, we also develop models with an additional parameter. To incorporate possible predictors, we relate the parameters to additive smoothed predictors via an appropriate link, as in the generalized additive model (GAM) framework. A penalized maximum likelihood estimation (MLE) procedure is implemented. We illustrate our modelling proposal with a real dataset of avalanche activity in the French Alps. With the advantage of bypassing the threshold selection step, our results indicate that the proposed models are more flexible and robust than competing models, such as the negative binomial distribution. Citation: Statistical Modelling PubDate: 2024-09-16T11:13:37Z DOI: 10.1177/1471082X241266729
- The Skellam distribution revisited: Estimating the unobserved incoming and outgoing ICU COVID-19 patients on a regional level in Germany
Authors: Martje Rave, Göran Kauermann Abstract: Statistical Modelling, Ahead of Print. With the beginning of the COVID-19 pandemic, we became aware of the need for comprehensive data collection and its provision to scientists and experts for proper data analyses. In Germany, the Robert Koch Institute (RKI) has tried to keep up with this demand for data on COVID-19, but there were (and still are) relevant data missing that are needed to understand the whole picture of the pandemic. In this article, we take a closer look at the severity of the course of COVID-19 in Germany, for which ideal information would be the number of incoming patients to ICUs. This information was (and still is) not available. Instead, the current occupancy of ICUs on the district level was reported daily. We demonstrate how this information can be used to predict the number of incoming as well as released COVID-19 patients using a stochastic version of the Expectation Maximization algorithm (SEM). This, in turn, allows for estimating the influence of district-specific and age-specific infection rates as well as further covariates, including spatial effects, on the number of incoming patients. The article demonstrates that even if relevant data are not recorded or provided officially, statistical modelling allows for reconstructing them. This also includes the quantification of uncertainty which naturally results from the application of the SEM algorithm. Citation: Statistical Modelling PubDate: 2024-05-27T11:06:12Z DOI: 10.1177/1471082X241235024
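The distribution in the title arises as the law of a difference of two independent Poisson counts, which is why it suits daily net changes in ICU occupancy (incoming minus outgoing patients). A minimal illustration with made-up admission and release rates:

```python
import numpy as np
from scipy import stats

mu_in, mu_out = 4.0, 2.5            # illustrative daily admission / release rates
rng = np.random.default_rng(0)

# Daily net change in occupancy = incoming - outgoing.
net = rng.poisson(mu_in, 100_000) - rng.poisson(mu_out, 100_000)

# Skellam(mu_in, mu_out) describes exactly this difference:
# mean = mu_in - mu_out, variance = mu_in + mu_out.
mean_theory = mu_in - mu_out        # 1.5
var_theory = mu_in + mu_out         # 6.5
p_zero = stats.skellam.pmf(0, mu_in, mu_out)
```

The paper's task is the inverse problem: observing only the net change `net`, recover the unobserved `mu_in` and `mu_out` components.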
- Bayesian effect selection in structured additive quantile regression
Authors: Anja Rappl, Manuel Carlan, Thomas Kneib, Sebastiaan Klokman, Elisabeth Bergherr Abstract: Statistical Modelling, Ahead of Print. Bayesian structured additive quantile regression is an established tool for regressing outcomes with unknown distributions on a set of explanatory variables and/or when interest lies with effects on the more extreme values of the outcome. Even though variable selection for quantile regression exists, its scope is limited. We propose the use of the Normal Beta Prime Spike and Slab (NBPSS) prior in Bayesian quantile regression to aid the researcher in not only variable but also effect selection. We compare the Bayesian NBPSS approach to statistical boosting for quantile regression, a current standard in automated variable selection in quantile regression, in a simulation study with varying degrees of model complexity and illustrate both methods on an example of childhood malnutrition in Nigeria. The NBPSS prior shows good performance in variable and effect selection as well as prediction compared to boosting and can thus be recommended as an additional tool for quantile regression model building. Citation: Statistical Modelling PubDate: 2024-05-23T12:07:22Z DOI: 10.1177/1471082X241242617
- Modelling dependence in football match outcomes: Traditional assumptions and an alternative proposal
Authors: Marco Petretta, Lorenzo Schiavon, Jacopo Diquigiovanni Abstract: Statistical Modelling, Ahead of Print. The approaches routinely used to model the outcomes of football matches are characterized by strong assumptions about the dependence between the number of goals scored by the two competing teams and their marginal distribution. In this work, we argue that the assumptions traditionally made are not always based on solid arguments. Although most of these assumptions have been relaxed in the recent literature, the model introduced by Dixon and Coles in 1997 still represents a point of reference in the betting industry. While maintaining its conceptual simplicity, alternatives based on modelling the conditional distributions allow for the specification of more comprehensive dependence structures. In view of this, we propose a straightforward modification of the usual Poisson marginal models by means of thoroughly chosen marginal and conditional distributions. Careful model validation is provided, and a real data application involving five European leagues is conducted. The novel dependence structure allows us to extract key insights on league dynamics and presents practical gains in several betting scenarios. Citation: Statistical Modelling PubDate: 2024-05-15T07:25:41Z DOI: 10.1177/1471082X241238802
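The Dixon and Coles (1997) benchmark referenced here couples two independent Poisson goal counts through a correction factor tau applied only to the four low-scoring results (0-0, 0-1, 1-0, 1-1). A sketch of that joint pmf, with illustrative scoring rates:

```python
import math

def dixon_coles_pmf(x, y, lam, mu, rho):
    """Joint pmf of home goals x and away goals y: independent Poisson
    marginals times the Dixon-Coles low-score correction tau."""
    if (x, y) == (0, 0):
        tau = 1 - lam * mu * rho
    elif (x, y) == (0, 1):
        tau = 1 + lam * rho
    elif (x, y) == (1, 0):
        tau = 1 + mu * rho
    elif (x, y) == (1, 1):
        tau = 1 - rho
    else:
        tau = 1.0                      # all other scores are unadjusted
    pois = lambda k, m: math.exp(-m) * m**k / math.factorial(k)
    return tau * pois(x, lam) * pois(y, mu)

# rho = 0 recovers independence; the corrections cancel, so the pmf
# still sums to one for rho != 0.
total = sum(dixon_coles_pmf(x, y, 1.4, 1.1, -0.05)
            for x in range(60) for y in range(60))
```

The article's proposal replaces this single low-score adjustment with fully specified marginal and conditional distributions.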
- A Bayesian hierarchical model for predicting rates of oxygen consumption
in mechanically ventilated intensive care patients-
Authors: Luke Hardcastle, S Samuel Livingstone, Claire Black, Federico Ricciardi, Gianluca Baio Abstract: Statistical Modelling, Ahead of Print. Patients who are mechanically ventilated in the Intensive Care Unit participate in exercise as a component of their rehabilitation to ameliorate the long-term impact of critical illness on their physical function. The effective implementation of these programmes is limited, however, as clinicians do not have access to a patient's V̇O₂ values, a physiological measure that quantifies an individual patient's exercise intensity level in real-time. In this work we have developed a Bayesian hierarchical model with temporally correlated latent Gaussian processes to predict V̇O₂ using readily available physiological data, providing clinicians with information to personalise rehabilitation sessions in real-time. The model was fitted using the Integrated Nested Laplace Approximation and validated using posterior predictive checks, and the impact of alternate specifications of the latent process was examined. Assessed using leave-one-patient-out cross-validation, we show that the ability to provide probabilistic statements describing classification uncertainty gives the model favourable predictive power compared to a state-of-the-art comparator based on the oxygen uptake efficiency slope, with a more than seven-fold increase in accuracy in identifying when a patient is at risk of over-exertion. Citation: Statistical Modelling PubDate: 2024-05-14T08:23:56Z DOI: 10.1177/1471082X241238810
- A novel mixture model for characterizing human aiming performance data-
Authors: Yanxi Li, Derek S. Young, Julien Gori, Olivier Rioul Abstract: Statistical Modelling, Ahead of Print. Fitts’ law is often employed as a predictive model for human movement, especially in the field of human-computer interaction. Models with an assumed Gaussian error structure are usually adequate when applied to data collected from controlled studies. However, observational data (often referred to as data gathered ‘in the wild’) typically display noticeable positive skewness relative to a mean trend as users do not routinely try to minimize their task completion time. As such, the exponentially modified Gaussian (EMG) regression model has been applied to aimed-movement data. However, it is also of interest to reasonably characterize those regions where a user likely was not trying to minimize their task completion time. In this article, we propose a novel model with a two-component mixture structure—one Gaussian and one exponential—on the errors to identify such a region. An expectation-conditional-maximization (ECM) algorithm is developed for estimation of such a model and some properties of the algorithm are established. The efficacy of the proposed model, as well as its ability to inform model-based clustering, are addressed in this work through extensive simulations and an insightful analysis of a human aiming performance study. Citation: Statistical Modelling PubDate: 2024-04-25T10:54:07Z DOI: 10.1177/1471082X241234139
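The core idea of a Gaussian-plus-exponential mixture on the errors can be sketched in isolation. The plain EM routine below (a hypothetical helper, simplified from the ECM algorithm the abstract describes) fits such a mixture to a vector of residuals; the fitted responsibilities then attribute each observation to one component, which is the basis for flagging the non-minimizing region.

```python
import numpy as np

def fit_gauss_exp_mixture(r, n_iter=200, tol=1e-8):
    """EM for a two-component mixture on residuals r: with probability pi,
    r ~ N(0, sigma^2); with probability 1 - pi, r ~ Exp(rate=lam), r >= 0.
    Illustrative sketch only, not the paper's ECM. Returns the mixture
    weight, component parameters, and per-observation responsibilities."""
    pi = 0.5
    sigma = np.std(r)
    lam = 1.0 / max(np.mean(r[r > 0]), 1e-6)
    for _ in range(n_iter):
        # E-step: responsibility of the Gaussian component for each residual
        norm_pdf = np.exp(-0.5 * (r / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        exp_pdf = np.where(r >= 0, lam * np.exp(-lam * np.clip(r, 0, None)), 0.0)
        gamma = pi * norm_pdf / (pi * norm_pdf + (1 - pi) * exp_pdf + 1e-300)
        # M-step: closed-form updates for the weight and both components
        pi_new = gamma.mean()
        sigma = np.sqrt((gamma * r ** 2).sum() / gamma.sum())
        lam = (1 - gamma).sum() / ((1 - gamma) * np.clip(r, 0, None)).sum()
        if abs(pi_new - pi) < tol:
            pi = pi_new
            break
        pi = pi_new
    return pi, sigma, lam, gamma
```

Observations with `gamma` near zero are better explained by the exponential component, i.e., trials where the user was likely not minimizing completion time.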
- Fast, effective, and coherent time series modelling using the
sparsity-ranked lasso-
Authors: Ryan Peterson, Joseph Cavanaugh Abstract: Statistical Modelling, Ahead of Print. The sparsity-ranked lasso (SRL) has been developed for model selection and estimation in the presence of interactions and polynomials. The main tenet of the SRL is that an algorithm should be more sceptical of higher-order polynomials and interactions a priori compared to main effects, and hence the inclusion of these more complex terms should require a higher level of evidence. In time series, the same idea of ranked prior scepticism can be applied to characterize the potentially complex seasonal autoregressive (AR) structure of a series during the model fitting process, becoming especially useful in settings with uncertain or multiple modes of seasonality. The SRL can naturally incorporate exogenous variables, with streamlined options for inference and/or feature selection. The fitting process is quick even for large series with a high-dimensional feature set. In this work, we discuss both the formulation of this procedure and the software we have developed for its implementation via the fastTS R package. We explore the performance of our SRL-based approach in a novel application involving the autoregressive modelling of hourly emergency room arrivals at the University of Iowa Hospitals and Clinics. We find that the SRL is considerably faster than its competitors, while generally producing more accurate predictions. Citation: Statistical Modelling PubDate: 2024-03-08T11:25:27Z DOI: 10.1177/1471082X231225307
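The "ranked scepticism" tenet amounts to a lasso with per-coefficient penalty weights: complex terms (interactions, distant seasonal lags) carry larger weights and therefore need stronger evidence to enter the model. The coordinate-descent sketch below is an illustrative stand-in, not the fastTS implementation, and the weights in the usage are made up for the example.

```python
import numpy as np

def srl_lasso(X, y, weights, alpha=0.1, n_iter=500):
    """Coordinate-descent lasso with per-coefficient penalty weights:
    minimize (1/2n)||y - X b||^2 + alpha * sum_j w_j * |b_j|.
    A larger w_j (e.g., for an interaction or a distant seasonal lag)
    demands more evidence before b_j becomes nonzero. Illustrative sketch."""
    n, p = X.shape
    w = np.asarray(weights, dtype=float)
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # per-column curvature
    for _ in range(n_iter):
        for j in range(p):
            # partial residual with coordinate j removed
            r_j = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ r_j / n
            thr = alpha * w[j]  # ranked penalty: complex terms get larger thr
            b[j] = np.sign(rho) * max(abs(rho) - thr, 0.0) / col_sq[j]
    return b
```

With equal weights this reduces to the ordinary lasso; ranking the weights by term complexity is what encodes the SRL's prior scepticism.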
- Taking advantage of sampling designs in spatial small-area survey studies-
Authors: Carlos Vergara-Hernández, Marc Marí-Dell’Olmo, Laura Oliveras, Miguel Angel Martinez-Beneito Abstract: Statistical Modelling, Ahead of Print. Spatial small area estimation models have become very popular in some contexts, such as disease mapping. Data in disease mapping studies are exhaustive, that is, the available data are supposed to be a complete register of all the observable events. In contrast, some other small area studies do not use exhaustive data, such as survey-based studies, where a particular sampling design is typically followed and inferences are later extrapolated to the entire population. In this article we propose a spatial model for small area survey studies, taking advantage of spatial dependence between units, which is the key assumption used for yielding reliable estimates in exhaustive-data-based studies. In addition, and in contrast to most survey-based spatial studies, we also take into account information on the sampling design and additional supplementary variables to obtain estimates in small areas. This makes it possible to merge spatial and sampling models into a common proposal. Citation: Statistical Modelling PubDate: 2024-03-05T09:01:57Z DOI: 10.1177/1471082X231226287
- Copula-based pairwise estimator for quantile regression with hierarchical
missing data-
Authors: Anneleen Verhasselt, Alvaro J. Flórez, Geert Molenberghs, Ingrid Van Keilegom Abstract: Statistical Modelling, Ahead of Print. Quantile regression can be a helpful technique for analysing clustered (such as longitudinal) data. It can characterize the change in response over time without making distributional assumptions and is robust to outliers in the response. A quantile regression model using a copula-based multivariate asymmetric Laplace distribution for addressing correlation due to clustering is introduced. Furthermore, we propose a pairwise estimator for the parameters of the model. Since it is based on pseudo-likelihood, it needs to be modified to avoid bias in the presence of missingness. Therefore, we enhance the model with inverse probability weighting. In this way, our proposal is unbiased under the missing at random assumption. Simulations show that the estimator is efficient and computationally fast. Finally, the methodology is illustrated using a study in ophthalmology. Citation: Statistical Modelling PubDate: 2024-02-28T06:13:50Z DOI: 10.1177/1471082X231225806
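The inverse-probability-weighting step can be illustrated in isolation. The helper below is hypothetical and far simpler than the authors' copula-based pairwise estimator: it computes a weighted sample quantile in which each observed response is weighted by the inverse of its observation probability, which is what removes the bias of the naive quantile when missingness depends on the outcome through observed information.

```python
import numpy as np

def ipw_quantile(y, observed, p_obs, tau):
    """Inverse-probability-weighted sample quantile (illustrative sketch).
    Each observed value gets weight 1 / p_obs, so that under missing at
    random the weighted tau-quantile targets the full-data tau-quantile."""
    y_obs = y[observed]
    w = 1.0 / p_obs[observed]
    order = np.argsort(y_obs)
    # normalized cumulative weights define the weighted empirical CDF
    cum_w = np.cumsum(w[order]) / w.sum()
    return y_obs[order][np.searchsorted(cum_w, tau)]
```

When observation is more likely for large responses, the naive quantile of the observed data is biased upwards, while the weighted version is not.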
- Estimation for vector autoregressive model under multivariate
skew-t-normal innovations-
Authors: Uchenna Chinedu Nduka, Everestus Okafor Ossai, Mbanefo Solomon Madukaife, Tobias Ejiofor Ugah Abstract: Statistical Modelling, Ahead of Print. Current procedures for estimating the parameters of a pth order vector autoregressive (VAR(p)) model are usually based on assuming that the ensuing error distribution is multivariate normal. But there exists a large body of evidence that many data encountered in real life are skewed, making estimators derived under the normality assumption unsuitable in such scenarios. This prompts the search for methods appropriate for skewed distributions. Therefore, this article proposes estimators for the mean and covariance matrices of the VAR(p) model under the multivariate skew-t-normal (MSTN) distribution. Also, estimators for the shape and skewness parameters are provided. The expectation conditional maximization (ECM) algorithm and its extension, the expectation conditional maximization either (ECME) algorithm, are the tools used to derive the estimators. The performance of the estimators was examined through extensive simulations, and results show that they compete favourably with other numerical methods, especially when the underlying distribution is skewed. The usefulness of our estimators was illustrated using a real data set on some US economic indicators. The VAR(p) model under the MSTN distribution provides a good fit, better than the VAR(p) model under the assumption of normality. Citation: Statistical Modelling PubDate: 2024-02-15T10:36:40Z DOI: 10.1177/1471082X231224910
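For context, the Gaussian baseline that the abstract contrasts with is ordinary multivariate least squares for the VAR(p) model Y_t = c + A_1 Y_{t-1} + … + A_p Y_{t-p} + e_t. The sketch below implements only that baseline (the function name and interface are made up for illustration; it contains none of the MSTN/ECM machinery the article proposes).

```python
import numpy as np

def var_ols(Y, p):
    """Least-squares estimation of a VAR(p) model from a (T, k) array Y.
    Gaussian-baseline sketch: regresses Y_t on [1, Y_{t-1}, ..., Y_{t-p}].
    Returns the intercept c (k,), lag matrices A (p, k, k), and the
    residual covariance Sigma (k, k)."""
    T, k = Y.shape
    # design matrix: intercept column plus p blocks of lagged observations
    Z = np.hstack([np.ones((T - p, 1))] +
                  [Y[p - i:T - i] for i in range(1, p + 1)])
    B, *_ = np.linalg.lstsq(Z, Y[p:], rcond=None)
    c = B[0]
    A = np.stack([B[1 + (i - 1) * k: 1 + i * k].T for i in range(1, p + 1)])
    resid = Y[p:] - Z @ B
    Sigma = resid.T @ resid / (T - p)
    return c, A, Sigma
```

Under skewed innovations these least-squares moment estimates remain consistent for the autoregressive matrices but ignore the shape of the error distribution, which is the gap the MSTN-based ECM/ECME estimators address.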