Authors:Gunther Schauberger, Gerhard Tutz Abstract: Statistical Modelling, Ahead of Print. Common random effects models for repeated measurements account for the heterogeneity in the population by including subject-specific intercepts or variable effects. They do not account for the heterogeneity in answering tendencies. For ordinal responses in particular, the tendency to choose extreme or middle responses can vary in the population. Extended models are proposed that account for this type of heterogeneity. Location effects as well as the tendency to extreme or middle responses are modelled as functions of explanatory variables. It is demonstrated that ignoring response styles may affect the accuracy of parameter estimates. An example demonstrates the applicability of the method. Citation: Statistical Modelling PubDate: 2021-01-06T10:47:57Z DOI: 10.1177/1471082X20978034
Authors:M. Carmen Aguilera-Morillo, Ana M. Aguilera Pages: 592 - 616 Abstract: Statistical Modelling, Volume 20, Issue 6, Page 592-616, December 2020. A functional linear discriminant analysis approach to classify a set of kinematic data (human movement curves of individuals performing different physical activities) is performed. Kinematic data, usually collected in linear acceleration or angular rotation format, can be identified with functions in a continuous domain (time, percentage of gait cycle, etc.). Since kinematic curves are measured in the same sample of individuals performing different activities, they are a clear example of functional data with repeated measures. On the other hand, the sample curves are observed with noise. Then, a roughness penalty might be necessary in order to provide a smooth estimation of the discriminant functions, which would make them more interpretable. Moreover, because of the infinite dimension of functional data, a dimension reduction technique should be considered. To solve these problems, we propose a multi-class approach for penalized functional partial least squares (FPLS) regression. Linear discriminant analysis (LDA) is then performed on the estimated FPLS components. This methodology is motivated by two case studies. The first study considers the linear acceleration recorded every two seconds in 30 subjects, related to three different activities (walking, climbing stairs and descending stairs). The second study works with the triaxial angular rotation, for each joint, in 51 children when they completed a cycle walking under three conditions (walking, carrying a backpack and pulling a trolley). A simulation study is also developed for comparing the performance of the proposed functional LDA with respect to the corresponding multivariate and non-penalized approaches. Citation: Statistical Modelling PubDate: 2020-10-15T04:36:17Z DOI: 10.1177/1471082X19871157 Issue No:Vol. 20, No. 6 (2020)
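The FPLS-plus-LDA pipeline described above can be pictured with a minimal, non-penalized sketch in R: partial least squares components are extracted from discretized curves with the pls package and LDA is then run on the component scores with MASS. The simulated curves, the number of components and all object names are illustrative assumptions, not the article's data or implementation.

```r
# Sketch: PLS dimension reduction followed by LDA on the component scores.
# Non-penalized analogue of the article's penalized FPLS + LDA, on simulated
# curves discretized over a common grid; all names are illustrative.
library(pls)   # plsr()
library(MASS)  # lda()

set.seed(1)
n <- 90
grid <- seq(0, 1, length.out = 50)
activity <- factor(rep(c("walk", "stairs_up", "stairs_down"), each = n / 3))
# noisy curves whose shape depends on the activity
X <- t(sapply(as.integer(activity), function(a)
  sin(2 * pi * a * grid) + rnorm(length(grid), sd = 0.3)))
Y <- model.matrix(~ activity - 1)        # dummy-coded class indicators

fit_pls <- plsr(Y ~ X, ncomp = 4)        # PLS components of the curves
Z <- scores(fit_pls)[, 1:4]              # component scores

fit_lda <- lda(Z, grouping = activity)   # LDA on the scores
table(predicted = predict(fit_lda, Z)$class, observed = activity)
```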
Authors:Francesco Finazzi, Lucia Paci Pages: 617 - 633 Abstract: Statistical Modelling, Volume 20, Issue 6, Page 617-633, December 2020. Localizing people across space and over time is a relevant and challenging problem in many modern applications. Smartphone ubiquity gives the opportunity to collect useful individual data as never before. In this work, the focus is on location data collected by smartphone applications. We propose a kernel-based density estimation approach that exploits cyclical spatio-temporal patterns of people to estimate the individual location density at any time, uncertainty included. Model parameters are estimated by maximum likelihood cross-validation. Unlike classic tracking methods designed for high spatio-temporal resolution data, the approach is suitable when location data are sparse in time and are affected by non-negligible errors. The approach is applied to location data collected by the Earthquake Network citizen science project which carries out a worldwide earthquake early warning system based on smartphones. The approach is parsimonious and is suitable to model location data gathered by any location-aware smartphone application. Citation: Statistical Modelling PubDate: 2020-10-15T04:37:28Z DOI: 10.1177/1471082X19870331 Issue No:Vol. 20, No. 6 (2020)
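As a rough illustration of the estimation idea, the sketch below builds a two-dimensional Gaussian kernel density in base R and picks the bandwidth by maximizing the leave-one-out log-likelihood, which is the cross-validation criterion named in the abstract. The cyclical temporal weighting that is central to the article's estimator is omitted, and the location data are simulated.

```r
# Sketch: 2-D Gaussian kernel density for location data, with the bandwidth
# chosen by leave-one-out maximum likelihood cross-validation (base R only).
# The cyclical temporal weighting of the article's estimator is omitted and
# the coordinates are simulated.
set.seed(2)
loc <- cbind(lon = rnorm(200, 0, 0.01), lat = rnorm(200, 0, 0.01))

# leave-one-out log-likelihood for a product Gaussian kernel with bandwidth h
loo_loglik <- function(h, x) {
  n <- nrow(x)
  sum(sapply(1:n, function(i) {
    log(mean(dnorm(x[i, 1], x[-i, 1], h) * dnorm(x[i, 2], x[-i, 2], h)))
  }))
}

# maximize the criterion over h on a log scale
opt <- optimize(function(lh) loo_loglik(exp(lh), loc),
                interval = log(c(1e-4, 1)), maximum = TRUE)
h_hat <- exp(opt$maximum)

# kernel density estimate of the individual's location at a new point
f_hat <- function(p) mean(dnorm(p[1], loc[, 1], h_hat) * dnorm(p[2], loc[, 2], h_hat))
f_hat(c(0, 0))
```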
Authors:Lizbeth Naranjo, Emmanuel Lesaffre, Carlos J. Pérez Abstract: Statistical Modelling, Ahead of Print. Motivated by a longitudinal oral health study, the Signal-Tandmobiel® study, an inhomogeneous mixed hidden Markov model with continuous state-space is proposed to explain the caries disease process in children between 6 and 12 years of age. The binary caries experience outcomes are subject to misclassification. We model this misclassification process via a longitudinal latent continuous response that is subject to a measurement error process and exhibits monotone behaviour. The baseline distributions of the unobservable continuous processes are defined as a function of the covariates through the specification of conditional distributions making use of the Markov property. In addition, random effects are considered to model the relationships among the multivariate responses. Our approach contrasts with a previous approach that works on the binary outcome scale and requires conditional independence of the possibly corrupted binary outcomes given the true binary outcomes. We instead assume conditional independence on the latent scale, which is a weaker assumption than conditional independence on the binary scale. The aim of this article is therefore to show the properties of a model for a progressive longitudinal response with misclassification on the manifest scale but modelled on the latent scale. The model parameters are estimated in a Bayesian way using an efficient Markov chain Monte Carlo method. The model performance is shown through a simulation-based example, and the analysis of the motivating dataset is presented. Citation: Statistical Modelling PubDate: 2020-12-23T04:42:29Z DOI: 10.1177/1471082X20973473
Authors:Avner Bar-Hen, Pierre Barbillon, Sophie Donnet Abstract: Statistical Modelling, Ahead of Print. Generalized multipartite networks consist of the joint observation of several networks involving some common pre-specified groups of individuals. Such complex networks arise commonly in social sciences, biology, ecology, etc. We propose a flexible probabilistic model, named the Multipartite Block Model (MBM), able to unravel the topology of multipartite networks by identifying clusters (blocks) of nodes sharing the same patterns of connectivity across the collection of networks they are involved in. The model parameters are estimated through a variational version of the Expectation–Maximization algorithm. The numbers of blocks are chosen using an Integrated Completed Likelihood criterion specifically designed for our model. A simulation study illustrates the robustness of the inference strategy. Finally, two datasets, from ecology and ethnobiology respectively, are analysed with the MBM in order to illustrate its flexibility and its relevance for the analysis of real datasets. The inference procedure is implemented in the R package GREMLIN, available on GitHub (https://github.com/Demiperimetre/GREMLIN). Citation: Statistical Modelling PubDate: 2020-12-18T11:43:51Z DOI: 10.1177/1471082X20963254
Authors:Roger S. Bivand, Virgilio Gómez-Rubio Abstract: Statistical Modelling, Ahead of Print. Zhou and Hanson (2015, Nonparametric Bayesian Inference in Biostatistics, pages 215–46, Cham: Springer; 2018, Journal of the American Statistical Association, 113, 571–81; 2020, spBayesSurv: Bayesian Modeling and Analysis of Spatially Correlated Survival Data, R package version 1.1.4) and Zhou et al. (2020, Journal of Statistical Software, 92, 1–33) present methods for estimating spatial survival models using areal data. This article applies their methods to a dataset recording New Orleans business decisions to re-open after Hurricane Katrina; the data were included in LeSage et al. (2011b, Journal of the Royal Statistical Society: Series A (Statistics in Society), 174, 1007–27). In two articles (LeSage et al., 2011a, Significance, 8, 160–63; 2011b, Journal of the Royal Statistical Society: Series A (Statistics in Society), 174, 1007–27), spatial probit models are used to model spatial dependence in this dataset, with decisions to re-open aggregated to the first 90, 180 and 360 days. We re-cast the problem as one of examining the time-to-event records in the data, right-censored as observations ceased before 175 businesses had re-opened; we omit businesses already re-opened when observations began on Day 41. We are interested in checking whether the conclusions about the covariates using aspatial and spatial probit models are modified when applying survival and spatial survival models estimated using MCMC and INLA. In general, we find that the same covariates are associated with re-opening decisions in both modelling approaches. We do, however, find that data collected from three streets differ substantially, and that the streets are probably better handled separately or that the street effect should be included explicitly. Citation: Statistical Modelling PubDate: 2020-12-15T11:11:10Z DOI: 10.1177/1471082X20967158
Authors:Marc Schneble, Göran Kauermann Abstract: Statistical Modelling, Ahead of Print. Estimation of latent network flows is a common problem in statistical network analysis. The typical setting is that we know the margins of the network, that is, in- and outdegrees, but the flows are unobserved. In this article, we develop a mixed regression model to estimate network flows in a bike-sharing network if only the hourly differences of in- and outdegrees at bike stations are known. We also include exogenous covariates such as weather conditions. Two different parameterizations of the model are considered to estimate (a) the whole network flow and (b) the network margins only. The estimation of the model parameters is proposed via an iterative penalized maximum likelihood approach. This is exemplified by modelling network flows in the Vienna bike-sharing system. In order to evaluate our modelling approach, we conduct our analyses under different distributional assumptions, while appropriately accounting for the provider's interventions to keep the estimation error low. Furthermore, a simulation study is conducted to show the performance of the model. For practical purposes, it is crucial to predict when and at which station there is a lack or an excess of bikes. For this application, our model proves to be well suited, providing quite accurate predictions. Citation: Statistical Modelling PubDate: 2020-12-15T10:57:08Z DOI: 10.1177/1471082X20971911
Authors:Zahra Mahdiyeh, Iraj Kazemi, Geert Verbeke Abstract: Statistical Modelling, Ahead of Print. This article introduces a flexible modelling strategy to extend the familiar mixed-effects models for analysing longitudinal responses in the multivariate setting. By introducing a flexible multivariate multimodal distribution, this strategy relaxes the usual normality assumption on the related random effects. We use copulas to construct a multimodal form of elliptical distributions. It can deal with the multimodality of responses and the non-linearity of the dependence structure. Moreover, the proposed model can flexibly accommodate clustered subject-effects for multiple longitudinal measurements. It is particularly useful when several subpopulations exist but cannot be directly identified. Since the implied marginal distribution is not available in closed form, we suggest a computational methodology based on Gauss–Hermite quadrature to approximate the associated likelihood functions, which consequently enables us to implement standard optimization techniques. We conduct a simulation study to highlight the main properties of the theoretical part and make a comparison with regular mixture distributions. Results confirm that the new strategy deserves to receive attention in practice. We illustrate the usefulness of our model by the analysis of a real-life dataset taken from a low back pain study. Citation: Statistical Modelling PubDate: 2020-12-14T03:52:50Z DOI: 10.1177/1471082X20967168
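The reliance on Gauss–Hermite quadrature for the intractable marginal likelihood can be illustrated with a one-dimensional toy computation: below, one subject's likelihood contribution in a Poisson random-intercept model is approximated with statmod::gauss.quad. A Gaussian random effect and the parameter values are assumptions made purely for the illustration; the article's copula-based multimodal random-effect distribution is not reproduced.

```r
# Sketch: Gauss-Hermite approximation of one subject's marginal likelihood
# contribution, the integral over b of prod_j f(y_ij | b) * N(b; 0, sigma_b^2),
# for a Poisson random-intercept model. The Gaussian random effect and the
# parameter values are illustrative only.
library(statmod)  # gauss.quad()

y <- c(2, 3, 1, 4)              # one subject's repeated counts
beta0 <- 0.5; sigma_b <- 0.8    # assumed parameter values
gh <- gauss.quad(15, kind = "hermite")  # nodes/weights for weight exp(-x^2)

# change of variables b = sqrt(2) * sigma_b * node absorbs the normal density
b_vals <- sqrt(2) * sigma_b * gh$nodes
integrand <- sapply(b_vals, function(b) prod(dpois(y, lambda = exp(beta0 + b))))
marg_lik <- sum(gh$weights * integrand) / sqrt(pi)
log(marg_lik)
```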
Authors:Amani Almohaimeed, Jochen Einbeck Abstract: Statistical Modelling, Ahead of Print. Random effect models have been popularly used as a mainstream statistical technique over several decades, and the same can be said for response transformation models such as the Box–Cox transformation. The latter aims at ensuring that the assumptions of normality and of homoscedasticity of the response distribution are fulfilled, which are essential conditions for inference based on a linear model or a linear mixed model. However, methodology for response transformation and simultaneous inclusion of random effects has been developed and implemented only scarcely, and is so far restricted to Gaussian random effects. We develop such methodology, thereby not requiring parametric assumptions on the distribution of the random effects. This is achieved by extending the ‘Nonparametric Maximum Likelihood’ towards a ‘Nonparametric profile maximum likelihood’ technique, allowing us to deal with overdispersion as well as two-level data scenarios. Citation: Statistical Modelling PubDate: 2020-12-14T03:52:10Z DOI: 10.1177/1471082X20966919
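A simplified way to see how response transformation and random effects can be combined is to profile the Box–Cox parameter while refitting a random-intercept model at each candidate value, as sketched below with lme4. This uses a Gaussian random effect for brevity, whereas the article's point is precisely to avoid that parametric assumption via NPML; the data and the grid of lambda values are illustrative.

```r
# Sketch: profile the Box-Cox parameter lambda while refitting a
# random-intercept model at each value (Gaussian random effect via lme4;
# the article instead leaves the random-effect distribution unspecified).
library(lme4)

set.seed(3)
d <- data.frame(id = rep(1:30, each = 5), x = runif(150))
d$y <- exp(1 + 0.5 * d$x + rep(rnorm(30, 0, 0.3), each = 5) + rnorm(150, 0, 0.2))

boxcox_tr <- function(y, lambda)
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda

lambdas <- seq(-1, 1, by = 0.1)
profile_ll <- sapply(lambdas, function(lam) {
  d$z <- boxcox_tr(d$y, lam)
  fit <- lmer(z ~ x + (1 | id), data = d, REML = FALSE)
  # the profile log-likelihood must include the Jacobian of the transformation
  as.numeric(logLik(fit)) + (lam - 1) * sum(log(d$y))
})
lambdas[which.max(profile_ll)]   # lambda maximizing the profile likelihood
```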
Authors:Leonardo Grilli, Maria Francesca Marino, Omar Paccagnella, Carla Rampichini Abstract: Statistical Modelling, Ahead of Print. The article is motivated by the analysis of the relationship between university student ratings and teacher practices and attitudes, which are measured via a set of binary and ordinal items collected by an innovative survey. The analysis is conducted through a two-level random intercept model, where student ratings are nested within teachers. The analysis must face two issues about the items measuring teacher practices and attitudes, which are level 2 predictors: (a) the items are severely affected by missingness due to teacher non-response and (b) there is redundancy in both the number of items and the number of categories of their measurement scale. We tackle the missing data issue by considering a multiple imputation strategy exploiting information at both student and teacher levels. For the redundancy issue, we rely on regularization techniques for ordinal predictors, also accounting for the multilevel data structure. The proposed solution addresses the problem at hand in an original way, and it can be applied whenever it is required to select level 2 predictors affected by missing values. The results obtained with the final model indicate that ratings on teacher ability to motivate students are related to certain teacher practices and attitudes. Citation: Statistical Modelling PubDate: 2020-10-23T06:19:53Z DOI: 10.1177/1471082X20949710
Authors:Jennifer Pohle, Roland Langrock, Mihaela van der Schaar, Ruth King, Frants Havmand Jensen Abstract: Statistical Modelling, Ahead of Print. State-switching models such as hidden Markov models or Markov-switching regression models are routinely applied to analyse sequences of observations that are driven by underlying non-observable states. Coupled state-switching models extend these approaches to address the case of multiple observation sequences whose underlying state variables interact. In this article, we provide an overview of the modelling techniques related to coupling in state-switching models, thereby forming a rich and flexible statistical framework particularly useful for modelling correlated time series. Simulation experiments demonstrate the relevance of being able to account for an asynchronous evolution as well as interactions between the underlying latent processes. The models are further illustrated using two case studies related to (a) interactions between a dolphin mother and her calf as inferred from movement data and (b) electronic health record data collected on 696 patients within an intensive care unit. Citation: Statistical Modelling PubDate: 2020-10-21T12:18:30Z DOI: 10.1177/1471082X20956423
Authors:Katya Mauff, Nicole S. Erler, Isabella Kardys, Dimitris Rizopoulos Abstract: Statistical Modelling, Ahead of Print. Multiple longitudinal outcomes are theoretically easily modelled via extension of the generalized linear mixed effects model. However, due to computational limitations in high dimensions, in practice these models are applied only in situations with relatively few outcomes. We adapt the solution proposed by Fieuws and Verbeke (2006) to the Bayesian setting: fitting all pairwise bivariate models instead of a single multivariate model, and combining the Markov Chain Monte Carlo (MCMC) realizations obtained for each pairwise bivariate model for the relevant parameters. We explore importance sampling as a method to more closely approximate the correct multivariate posterior distribution. Simulation studies show satisfactory results in terms of bias, RMSE and coverage of the 95% credible intervals for multiple longitudinal outcomes, even in scenarios with more limited information and non-continuous outcomes, although the use of importance sampling is not successful. We further examine the incorporation of a time-to-event outcome, proposing the use of Bayesian pairwise estimation of a multivariate GLMM in an adaptation of the corrected two-stage estimation procedure for the joint model for multiple longitudinal outcomes and a time-to-event outcome (Mauff et al., 2020, Statistics and Computing). The method does not work as well in the case of the corrected two-stage joint model; however, the results are promising and should be explored further. Citation: Statistical Modelling PubDate: 2020-09-28T09:48:55Z DOI: 10.1177/1471082X20945069
Authors:Denise Costantin, Andrea Sottosanti, Alessandra R. Brazzale, Denis Bastieri, JunHui Fan Abstract: Statistical Modelling, Ahead of Print. Identifying as yet undetected high-energy sources in the $\gamma$-ray sky is one of the declared objectives of the Fermi Large Area Telescope (LAT) Collaboration. We develop a Bayesian mixture model which is capable of disentangling the high-energy extra-galactic sources present in a given sky region from the pervasive background radiation. We achieve this by combining two model components. The first component models the emission activity of the single sources and incorporates the instrument response function of the Fermi $\gamma$-ray space telescope. The second component reliably reflects the current knowledge of the physical phenomena which underlie the $\gamma$-ray background. The model parameters are estimated using a reversible jump MCMC algorithm, which simultaneously returns the number of detected sources, their locations and relative intensities, and the background component. Our proposal is illustrated using a sample of the Fermi LAT data. In the analysed sky region, our model correctly identifies 116 sources out of the 132 present. The detection rate and the estimated directions and intensities of the identified sources are largely unaffected by the number of detected sources. Citation: Statistical Modelling PubDate: 2020-09-28T09:43:34Z DOI: 10.1177/1471082X20947222
Authors:Dunfu Yang, Gyuhyeong Goh, Haiyan Wang Abstract: Statistical Modelling, Ahead of Print. In the context of high-dimensional multivariate linear regression, sparse reduced-rank regression (SRRR) provides a way to handle both variable selection and low-rank estimation problems. Although there has been extensive research on SRRR, statistical inference procedures that deal with the uncertainty due to variable selection and rank reduction are still limited. To fill this research gap, we develop a fully Bayesian approach to SRRR. A major difficulty that occurs in a fully Bayesian framework is that the dimension of the parameter space varies with the selected variables and the reduced rank. Due to this varying-dimension problem, traditional Markov chain Monte Carlo (MCMC) methods such as the Gibbs sampler and the Metropolis–Hastings algorithm are inapplicable in our Bayesian framework. To address this issue, we propose a new posterior computation procedure based on the Laplace approximation within the collapsed Gibbs sampler. A key feature of our fully Bayesian method is that the model uncertainty is automatically integrated out by the proposed MCMC computation. The proposed method is examined via a simulation study and a real data analysis. Citation: Statistical Modelling PubDate: 2020-09-25T08:40:47Z DOI: 10.1177/1471082X20948697
Authors:Md. Tuhin Sheikh, Joseph G. Ibrahim, Jonathan A. Gelfond, Wei Sun, Ming-Hui Chen Abstract: Statistical Modelling, Ahead of Print. This research is motivated by data from the large Selenium and Vitamin E Cancer Prevention Trial (SELECT). Prostate-specific antigen (PSA) measurements were collected longitudinally, and the survival endpoint was the time to low-grade cancer or the time to high-grade cancer (competing risks). In this article, the goal is to model the longitudinal PSA data and the time to prostate cancer (PC) due to low or high grade. We consider low grade and high grade as two competing causes of developing PC. A joint model for simultaneously analysing longitudinal and time-to-event data in the presence of multiple causes of failure (or competing risks) is proposed within the Bayesian framework. The proposed model allows for handling the missing causes of failure in the SELECT data, and an efficient Markov chain Monte Carlo sampling algorithm is implemented to sample from the posterior distribution via a novel reparameterization technique. Bayesian criteria, $\Delta$DIC and $\Delta$WAIC, are introduced to quantify the gain in fit in the survival sub-model due to the inclusion of longitudinal data. A simulation study is conducted to examine the empirical performance of the posterior estimates as well as of $\Delta$DIC and $\Delta$WAIC, and a detailed analysis of the SELECT data is also carried out to further demonstrate the proposed methodology. Citation: Statistical Modelling PubDate: 2020-09-25T03:40:35Z DOI: 10.1177/1471082X20944620
Authors:Wagner H. Bonat, Ricardo R. Petterle, Priscilla Balbinot, Alexandre Mansur, Ruth Graf Abstract: Statistical Modelling, Ahead of Print. We propose a multivariate regression model to deal with multiple outcomes along with repeated measures in the context of longitudinal data analysis. Our model allows for flexible and interpretable modelling of the covariance structure within outcomes by using a linear combination of known matrices, while the generalized Kronecker product is employed to take into account the correlation between outcomes. We present maximum likelihood estimation along with extensions of the classical multivariate analysis of variance and multiple comparison hypothesis tests to deal with multivariate longitudinal data. The model and the associated multivariate hypothesis test are motivated by a prospective study conducted to compare three aesthetic eyelid surgery techniques, namely blepharoplasty, endoscopic forehead lift and endoscopic forehead lift associated with blepharoplasty. The effect of the techniques was assessed using measurements of a horizontal line through the pupil centre and of three vertical lines running from the lateral canthus, mid-pupil and medial canthus to the top of the brow. In this study, 30 female patients were randomly divided into three groups. Preoperative measurements were compared with postoperative measurements taken 30 days, 90 days and 10 years after the surgery. The presented multivariate model provided a better fit than its univariate counterpart. The results showed that the three surgery techniques tend to increase all considered outcomes in a long-term perspective, that is, from preoperative to 10 years postoperative evaluations. The only exception was for the outcome lateral eyebrow, for which the blepharoplasty had no significant effect. Citation: Statistical Modelling PubDate: 2020-09-21T05:15:44Z DOI: 10.1177/1471082X20943312
Authors:Meredith A. Ray, Dale Bowman, Ryan Csontos, Roy B. Van Arsdale, Hongmei Zhang Abstract: Statistical Modelling, Ahead of Print. Earthquakes are one of the deadliest natural disasters. Our study focuses on detecting temporal patterns of earthquakes occurring along intraplate faults in the New Madrid seismic zone (NMSZ) in the central United States from 1996 to 2016. Based on the magnitude and location of each earthquake, we developed a Bayesian clustering method to group hypocentres such that each group shared the same temporal pattern of occurrence. We constructed a matrix-variate Dirichlet process prior to describe temporal trends over space and to detect regions showing similar temporal patterns. Simulations were conducted to assess the accuracy and performance of the proposed method and to compare it to other commonly used clustering methods such as K-means, K-medians and partitioning around medoids. We applied the method to NMSZ data to identify clusters of temporal patterns, which represent areas of stress that are potentially migrating over time. This information can then be used to assist in the prediction of future earthquakes. Citation: Statistical Modelling PubDate: 2020-08-25T11:52:45Z DOI: 10.1177/1471082X20939767
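The comparison methods named in the abstract (K-means, K-medians, partitioning around medoids) are easy to reproduce on simulated per-site temporal count patterns, as in the sketch below using base R and the cluster package. The Bayesian matrix-variate Dirichlet process clustering itself is not reproduced here, and the simulated pattern structure is an assumption for illustration.

```r
# Sketch: the comparison clusterings named in the abstract (K-means and
# partitioning around medoids) applied to simulated per-site temporal count
# patterns; the Bayesian matrix-variate Dirichlet process model is not shown.
library(cluster)  # pam()

set.seed(4)
years <- 1996:2016
# 40 sites: half with a flat occurrence rate, half with an increasing trend
rate <- rbind(matrix(3, nrow = 20, ncol = length(years)),
              matrix(rep(seq(1, 6, length.out = length(years)), each = 20),
                     nrow = 20))
counts <- matrix(rpois(length(rate), rate), nrow = nrow(rate))

km <- kmeans(counts, centers = 2, nstart = 20)
pm <- pam(counts, k = 2)
table(kmeans = km$cluster, pam = pm$clustering)  # agreement of the two methods
```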
Authors:Aris Perperoglou, Marianne Huebner Abstract: Statistical Modelling, Ahead of Print. In this work, we develop ‘quantile foliation’ to predict outcomes for one explanatory variable based on two covariates and varying quantiles. This is an extension of quantile sheets. Data from World Championships in Olympic weightlifting with athletes aged 13 to 90 are used to study performances across the life span. Weightlifters of all ages compete in body weight classes, and we study performance development for adolescents, age at peak performance and decline for Masters athletes who are 35 years or older. In prior studies, weightlifting performances were compared with a body mass adjustment formula developed using world records. Although intended for elite athletes with highest performances, this formula was applied to weightlifters of all ages, and age factors for Masters were estimated based on these body mass adjustments. A comparison of youth athletes’ performances for different body mass has not been done. With quantile foliation, it is possible to examine age-associated patterns of performance increase for youth and to study the decline after reaching the peak performance. This can be done for athletes with different body mass and different performance levels as measured by quantiles. R code and example data are available as supplementary materials. Citation: Statistical Modelling PubDate: 2020-08-25T06:12:19Z DOI: 10.1177/1471082X20940156
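A much-simplified stand-in for quantile foliation is to fit separate spline quantile regressions of performance on age and body mass for a few quantile levels, as sketched below with quantreg and splines; the joint treatment of age, body mass and the quantile level that defines the foliation, and the article's supplementary R code, are not reproduced. All variables and data are simulated and illustrative.

```r
# Sketch: separate spline quantile-regression curves of performance against
# age and body mass, for a few quantile levels. A simplified stand-in for the
# article's quantile foliation, which treats age, body mass and the quantile
# level jointly.
library(quantreg)  # rq()
library(splines)   # bs()

set.seed(5)
d <- data.frame(age = runif(500, 13, 80), bodymass = runif(500, 55, 110))
d$total <- 250 - 0.04 * (d$age - 32)^2 + 0.8 * d$bodymass + rnorm(500, 0, 15)

taus <- c(0.25, 0.50, 0.75, 0.90)
fits <- lapply(taus, function(tau)
  rq(total ~ bs(age, df = 5) + bodymass, tau = tau, data = d))

# predicted 90th-percentile total for a 70 kg lifter across ages
newd <- data.frame(age = seq(15, 75, by = 5), bodymass = 70)
round(predict(fits[[4]], newdata = newd))
```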
Authors:Mirko Signorelli, Pietro Spitali, Roula Tsonaka Abstract: Statistical Modelling, Ahead of Print. We present a new modelling approach for longitudinal overdispersed counts that is motivated by the increasing availability of longitudinal RNA-sequencing experiments. The distribution of RNA-seq counts typically exhibits overdispersion, zero-inflation and heavy tails; moreover, in longitudinal designs repeated measurements from the same subject are typically (positively) correlated. We propose a generalized linear mixed model based on the Poisson–Tweedie distribution that can flexibly handle each of the aforementioned features of longitudinal overdispersed counts. We develop a computational approach to accurately evaluate the likelihood of the proposed model and to perform maximum likelihood estimation. Our approach is implemented in the R package ptmixed, which can be freely downloaded from CRAN. We assess the performance of ptmixed on simulated data, and we present an application to a dataset with longitudinal RNA-sequencing measurements from healthy and dystrophic mice. The applicability of the Poisson–Tweedie mixed-effects model is not restricted to longitudinal RNA-seq data, but it extends to any scenario where non-independent measurements of a discrete overdispersed response variable are available. Citation: Statistical Modelling PubDate: 2020-08-25T04:56:31Z DOI: 10.1177/1471082X20936017
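The abstract points to the ptmixed package on CRAN for the Poisson–Tweedie mixed model. As a rough baseline for the same kind of data, the sketch below fits a negative-binomial GLMM with lme4, which accommodates overdispersion and within-subject correlation but not the zero inflation and heavy tails that the Poisson–Tweedie model targets; the simulated healthy/dystrophic design is an illustrative assumption, not the article's dataset.

```r
# Sketch: a negative-binomial GLMM as a rough baseline for longitudinal
# overdispersed counts; it handles overdispersion and within-subject
# correlation, but not the zero inflation and heavy tails targeted by the
# Poisson-Tweedie mixed model of the article (R package ptmixed).
library(lme4)

set.seed(6)
d <- expand.grid(id = 1:40, time = 0:3)
d$group <- ifelse(d$id <= 20, "healthy", "dystrophic")
mu <- exp(2 + 0.3 * (d$group == "dystrophic") + 0.1 * d$time +
            rep(rnorm(40, 0, 0.4), times = 4))   # subject-specific intercepts
d$count <- rnbinom(nrow(d), mu = mu, size = 2)

fit <- glmer.nb(count ~ group + time + (1 | id), data = d)
round(summary(fit)$coefficients, 3)
```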
Authors:M. Menictas, T.H. Nolan, D.G. Simpson, M.P. Wand Abstract: Statistical Modelling, Ahead of Print. A two-level group-specific curve model is such that the mean response of each member of a group is a separate smooth function of a predictor of interest. The three-level extension is such that one grouping variable is nested within another one, and higher level extensions are analogous. Streamlined variational inference for higher level group-specific curve models is a challenging problem. We confront it by systematically working through two-level and then three-level cases and making use of the higher level sparse matrix infrastructure laid down in Nolan and Wand (2020, ANZIAM Journal, doi: 10.1017/S1446181120000061). A motivation is analysis of data from ultrasound technology for which three-level group-specific curve models are appropriate. Whilst extension to the number of levels exceeding three is not covered explicitly, the pattern established by our systematic approach sheds light on what is required for even higher level group-specific curve models. Citation: Statistical Modelling PubDate: 2020-08-21T07:25:16Z DOI: 10.1177/1471082X20930894
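For orientation, a two-level group-specific curve model of the kind described can be fitted by penalized likelihood with mgcv's factor-smooth basis, as sketched below; this is only a point of reference, not the streamlined variational inference developed in the article, and the grouping structure and data are simulated.

```r
# Sketch: a two-level group-specific curve model fitted by penalized
# likelihood with mgcv (global smooth plus group-specific smooth deviations,
# "fs" basis); the article's streamlined variational inference and its
# three-level ultrasound application are not reproduced here.
library(mgcv)

set.seed(7)
n_grp <- 15; n_obs <- 40
d <- data.frame(grp = factor(rep(1:n_grp, each = n_obs)),
                x = runif(n_grp * n_obs))
dev <- rep(rnorm(n_grp, 0, 0.5), each = n_obs)   # group-specific deviations
d$y <- sin(2 * pi * d$x) + dev * d$x + rnorm(nrow(d), 0, 0.2)

fit <- gam(y ~ s(x) + s(x, grp, bs = "fs", m = 1), data = d, method = "REML")
summary(fit)
plot(fit, pages = 1)   # global curve and the bundle of group-specific curves
```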
Authors:Fan Zhang, Ming-Hui Chen, Xiuyu Julie Cong, Qingxia Chen Abstract: Statistical Modelling, Ahead of Print. Longitudinal biomarkers such as patient-reported outcomes (PROs) and quality of life (QOL) are routinely collected in cancer clinical trials or other studies. Joint modelling of PRO/QOL and survival data can provide a comparative assessment of patient-reported changes in specific symptoms or global measures that correspond to changes in survival. Motivated by a head and neck cancer clinical trial, we develop a class of trajectory-based models for longitudinal and survival data with disease progression. Specifically, we propose a class of mixed effects regression models for the longitudinal measures, a cure rate model for the disease progression time and a Cox proportional hazards model with time-varying covariates for the overall survival time, accounting for disease progression and treatment switching. Under the semi-competing risks framework, the disease progression is the non-terminal event, the occurrence of which is subject to a terminal event of death. The properties of the proposed models are examined in detail. Within the Bayesian paradigm, we derive the decompositions of the deviance information criterion (DIC) and the logarithm of the pseudo-marginal likelihood (LPML) to assess the fit of the longitudinal component of the model and the fit of each survival component, separately. We further develop $\Delta$DIC as well as $\Delta$LPML to determine the importance and contribution of the longitudinal data to the model fit of the progression and survival data. Citation: Statistical Modelling PubDate: 2020-07-28T05:09:59Z DOI: 10.1177/1471082X20933363
Authors:Vito M.R. Muggeo, Federico Torretta, Paul H. C. Eilers, Mariangela Sciandra, Massimo Attanasio Abstract: Statistical Modelling, Ahead of Print. We propose an iterative algorithm to select the smoothing parameters in additive quantile regression, wherein the functional forms of the covariate effects are unspecified and expressed via B-spline bases with difference penalties on the spline coefficients. The proposed algorithm relies on viewing the penalized coefficients as random effects from the symmetric Laplace distribution, and it turns out to be very efficient and particularly attractive with multiple smooth terms. Through simulations we compare our proposal with some alternative approaches, including the traditional ones based on minimization of the Schwarz Information Criterion. A real-data analysis is presented to illustrate the method in practice. Citation: Statistical Modelling PubDate: 2020-07-18T05:46:08Z DOI: 10.1177/1471082X20929802
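A related, readily available tool is quantreg::rqss, which fits additive quantile regression with total-variation penalties and user-chosen smoothing parameters; the sketch below uses it on simulated data. Note that this differs from the article's B-spline bases with difference penalties and, crucially, from its automatic selection of the smoothing parameters; the lambda values below are arbitrary.

```r
# Sketch: additive quantile regression with two smooth terms via
# quantreg::rqss, which uses total-variation penalties with user-chosen
# lambdas; the article instead uses B-splines with difference penalties and
# selects the smoothing parameters automatically.
library(quantreg)

set.seed(8)
d <- data.frame(x1 = runif(400), x2 = runif(400))
d$y <- sin(2 * pi * d$x1) + (d$x2 - 0.5)^2 + rnorm(400, 0, 0.3)

fit <- rqss(y ~ qss(x1, lambda = 0.5) + qss(x2, lambda = 0.5),
            tau = 0.5, data = d)
plot(fit)   # estimated smooth terms at the median
```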
Authors:Halvard Arntzen, Lars Magnus Hvattum Abstract: Statistical Modelling, Ahead of Print. The main goal of this article is to compare the performance of team ratings and individual player ratings when trying to forecast match outcomes in association football. The well-known Elo rating system is used to calculate team ratings, whereas a variant of plus-minus ratings is used to rate individual players. For prediction purposes, two covariates are introduced. The first represents the pre-match difference in Elo ratings of the two teams competing, while the second is the average difference in individual ratings for the players in the starting line-ups of the two teams. Two different statistical models are used to generate forecasts. The first type is an ordered logit regression (OLR) model that directly outputs probabilities for each of the three possible match outcomes, namely home win, draw and away win. The second type is based on competing risk modelling and involves the estimation of scoring rates for the two competing teams. These scoring rates are used to derive match outcome probabilities using discrete event simulation. Both types of models can be used to generate pre-game forecasts, whereas the competing risk models can also be used for in-game predictions. Computational experiments indicate that there is no statistical difference in the prediction quality for pre-game forecasts between the OLR models and the competing risk models. It is also found that team ratings and player ratings perform about equally well when predicting match outcomes. However, forecasts made when using both team ratings and player ratings as covariates are significantly better than those based on only one of the ratings. Citation: Statistical Modelling PubDate: 2020-07-10T01:56:40Z DOI: 10.1177/1471082X20929881
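The two ingredients named above, an Elo-style rating update and an ordered logit on rating differences, are easy to sketch in R, as below with MASS::polr; the K-factor, the logistic scale constant of 400 and the simulated covariates are illustrative assumptions rather than the article's actual rating systems or data.

```r
# Sketch: the two ingredients named in the abstract, (a) an Elo-style rating
# update and (b) an ordered logit model on rating differences (MASS::polr).
# Constants and simulated data are illustrative only.
library(MASS)  # polr()

# (a) Elo update after one match: score = 1 home win, 0.5 draw, 0 away win
elo_update <- function(r_home, r_away, score, K = 20) {
  expected <- 1 / (1 + 10^((r_away - r_home) / 400))
  delta <- K * (score - expected)
  c(home = r_home + delta, away = r_away - delta)
}
elo_update(1550, 1480, score = 1)

# (b) ordered logit for match outcome on pre-match rating differences
set.seed(9)
d <- data.frame(elo_diff = rnorm(500, 0, 100), pm_diff = rnorm(500, 0, 0.2))
lin <- 0.005 * d$elo_diff + 2 * d$pm_diff + rlogis(500)
d$outcome <- cut(lin, c(-Inf, -0.5, 0.5, Inf),
                 labels = c("away win", "draw", "home win"),
                 ordered_result = TRUE)
fit <- polr(outcome ~ elo_diff + pm_diff, data = d, method = "logistic")
summary(fit)
```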
Authors:Almond Stöcker, Sarah Brockhaus, Sophia Anna Schaffer, Benedikt von Bronk, Madeleine Opitz, Sonja Greven Abstract: Statistical Modelling, Ahead of Print. We extend generalized additive models for location, scale and shape (GAMLSS) to regression with functional response. This allows us to simultaneously model point-wise mean curves, variances and other distributional parameters of the response in dependence of various scalar and functional covariate effects. In addition, the scope of distributions is extended beyond exponential families. The model is fitted via gradient boosting, which offers inherent model selection and is shown to be suitable for both complex model structures and highly auto-correlated response curves. This enables us to analyse bacterial growth in Escherichia coli in a complex interaction scenario, fruitfully extending usual growth models. Citation: Statistical Modelling PubDate: 2020-06-11T04:34:51Z DOI: 10.1177/1471082X20917586
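The scalar starting point of this extension can be sketched with the gamlss package, modelling both the mean and the scale of a normal response as functions of a covariate; the functional-response version fitted by gradient boosting, and the E. coli growth application, go well beyond this illustration, and the data below are simulated.

```r
# Sketch: a scalar GAMLSS in which both the mean and the scale of a normal
# response depend on a covariate (gamlss package, NO family with log link
# for sigma); the article extends this idea to functional responses and
# fits the models by gradient boosting.
library(gamlss)

set.seed(10)
d <- data.frame(x = runif(300))
d$y <- 1 + 2 * d$x + rnorm(300, sd = exp(-1 + 1.5 * d$x))  # scale grows with x

fit <- gamlss(y ~ x, sigma.formula = ~ x, family = NO(), data = d)
summary(fit)   # coefficients for both the mu and the sigma model
```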
Authors:Grigorios Papageorgiou, Dimitris Rizopoulos Abstract: Statistical Modelling, Ahead of Print. Dropout is a common complication in longitudinal studies, especially since the distinction between missing not at random (MNAR) and missing at random (MAR) dropout is intractable. Consequently, one starts with an analysis that is valid under MAR and then performs a sensitivity analysis by considering MNAR departures from it. To this end, specific classes of joint models, such as pattern-mixture models (PMMs) and selection models (SeMs), have been proposed. By contrast, shared-parameter models (SPMs) have received less attention, possibly because they do not embody a characterization of MAR. A few approaches to achieve MAR in SPMs exist, but are difficult to implement in existing software. In this article, we focus on SPMs for incomplete longitudinal and time-to-dropout data and propose an alternative characterization of MAR by exploiting the conditional independence assumption, under which outcome and missingness are independent given a set of random effects. By doing so, the censoring distribution can be utilized to cover a wide range of assumptions for the missing data mechanism on the subject-specific level. This approach offers substantial advantages over its counterparts and can be easily implemented in existing software. More specifically, it offers flexibility over the assumption for the missing data generating mechanism that governs dropout by allowing subject-specific perturbations of the censoring distribution, whereas in PMMs and SeMs dropout is strictly considered MNAR. Citation: Statistical Modelling PubDate: 2020-06-10T10:05:27Z DOI: 10.1177/1471082X20927114
Authors:Lauren J Beesley, Jeremy MG Taylor Abstract: Statistical Modelling, Ahead of Print. Multistate modelling is a strategy for jointly modelling related time-to-event outcomes that can handle complicated outcome relationships, has appealing interpretations, can provide insight into different aspects of disease development and can be useful for making individualized predictions. A challenge with using multistate modelling in practice is the large number of parameters, and variable selection and shrinkage strategies are needed in order for these models to gain wider adoption. Application of existing selection and shrinkage strategies in the multistate modelling setting can be challenging due to complicated patterns of data missingness, inclusion of highly correlated predictors and hierarchical parameter relationships. In this article, we discuss how to modify and implement several existing Bayesian variable selection and shrinkage methods in a general multistate modelling setting. We compare the performance of these methods in terms of parameter estimation and model selection in a multistate cure model of recurrence and death in patients treated for head and neck cancer. We can view this work as a case study of variable selection and shrinkage in a complicated modelling setting with missing data. Citation: Statistical Modelling PubDate: 2020-06-08T06:09:30Z DOI: 10.1177/1471082X20920972
Authors:Janet van Niekerk, Haakon Bakka, Håvard Rue Abstract: Statistical Modelling, Ahead of Print. The methodological advancements made in the field of joint models are numerous. None the less, the case of competing risks joint models has largely been neglected, especially from a practitioner's point of view. In the relevant works on competing risks joint models, the assumptions of a Gaussian linear longitudinal series and proportional cause-specific hazard functions, amongst others, have remained unchallenged. In this article, we provide a framework based on R-INLA to apply competing risks joint models in a unifying way such that non-Gaussian longitudinal data, spatial structures, time-dependent splines and various latent association structures, to mention a few, are all embraced in our approach. Our motivation stems from the SANAD trial, which exhibits non-linear longitudinal trajectories and competing risks for failure of treatment. We also present a discrete competing risks joint model for longitudinal count data as well as a spatial competing risks joint model as specific examples. Citation: Statistical Modelling PubDate: 2020-05-25T10:27:54Z DOI: 10.1177/1471082X19913654
Authors:Özgür Asar Abstract: Statistical Modelling, Ahead of Print. This article is motivated by the panel surveys, called Statistics on Income and Living Conditions (SILC), conducted annually on (randomly selected) country representative households to monitor EU 2020 aims on poverty reduction. We particularly consider the surveys conducted in Turkey within the scope of integration to the EU. Our main interests are on health aspects of economic and living conditions. The outcome is self-reported health that is clustered longitudinal ordinal, since repeated measures of it are nested within individuals and individuals are nested within families. Economic and living conditions have been measured through a number of individual- and family-level explanatory variables. The questions of interest are on the marginal relationships between the outcome and covariates that we address using a polytomous logistic regression with Bridge distributed random effects. This choice of distribution allows us to directly obtain marginal inferences in the presence of random effects. Widely used Normal distribution is also considered as the random effects distribution. Samples from the joint posterior densities of parameters and random effects are drawn using Markov Chain Monte Carlo. Interesting findings from the public health point of view are that differences were found between the subgroups of employment status, income level and panel year in terms of odds of reporting better health. Citation: Statistical Modelling PubDate: 2020-05-25T06:03:03Z DOI: 10.1177/1471082X20920122
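A conventional counterpart to the model described is a cumulative-link mixed model with a normal random intercept, sketched below with the ordinal package on simulated clustered ordinal data; unlike the Bridge-distributed random effects of the article, this fit does not yield directly marginal covariate effects, and all variables and coefficients are illustrative assumptions.

```r
# Sketch: a cumulative-link (proportional-odds) mixed model for clustered
# ordinal self-reported health with a normal random intercept
# (ordinal::clmm); the Bridge-distributed random effects of the article,
# which yield directly marginal covariate effects, are not reproduced.
library(ordinal)

set.seed(11)
d <- expand.grid(id = 1:100, year = 1:4)
d$employed <- rbinom(nrow(d), 1, 0.6)
lin <- 0.8 * d$employed + rep(rnorm(100, 0, 1), times = 4) + rlogis(nrow(d))
d$health <- cut(lin, c(-Inf, -1, 0.5, 2, Inf),
                labels = c("poor", "fair", "good", "very good"),
                ordered_result = TRUE)

fit <- clmm(health ~ employed + factor(year) + (1 | id), data = d)
summary(fit)   # conditional (subject-specific) log odds ratios
```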
Authors:Danilo Alvares, Carmen Armero, Anabel Forte, Nicolas Chopin Abstract: Statistical Modelling, Ahead of Print. The statistical analysis of the information generated by medical follow-up is a very important challenge in the field of personalized medicine. As the evolutionary course of a patient's disease progresses, his/her medical follow-up generates more and more information that should be processed immediately in order to review and update his/her prognosis and treatment. Hence, we focus on this update process through sequential inference methods for joint models of longitudinal and time-to-event data from a Bayesian perspective. More specifically, we propose the use of sequential Monte Carlo (SMC) methods for static parameter joint models with the intention of reducing computational time in each update of the full Bayesian inferential process. Our proposal is very general and can be easily applied to most popular joint models approaches. We illustrate the use of the presented sequential methodology in a joint model with competing risk events for a real scenario involving patients on mechanical ventilation in intensive care units (ICUs). Citation: Statistical Modelling PubDate: 2020-05-23T07:49:49Z DOI: 10.1177/1471082X20916088
Authors:Louise Marquart, Geert Verbeke Abstract: Statistical Modelling, Ahead of Print. The conventional normality assumption for the random effects distribution in logistic mixed models can be too restrictive in some applications. In our data example of a longitudinal study modelling employment participation of Australian women, the random effects exhibit non-normality due to a potential mover–stayer scenario. In such a scenario, the women observed to remain in the same initial response state over the study period may consist of two subgroups: latent stayers—those with extremely small probability of transitioning response states—and latent movers, those with a probability of transitioning response states. The similarities between estimating the random effects using non-parametric approaches and mover–stayer models have previously been highlighted. We explore non-parametric approaches to model univariate and bivariate random effects in a potential mover–stayer scenario. As there are limited approaches available to fit the non-parametric maximum likelihood estimate for bivariate random effects in logistic mixed models, we implement the Vertex Exchange Method (VEM) to estimate the random effects in logistic mixed models. The approximation of the non-parametric maximum likelihood estimate derived by the VEM algorithm induces more flexibility of the random effects, identifying regions corresponding to potential latent stayers in the non-employment category in our data example. Citation: Statistical Modelling PubDate: 2020-05-23T07:43:49Z DOI: 10.1177/1471082X19889143
Authors:Alba Carballo, Maria Durban, Göran Kauermann, Dae-Jin Lee Abstract: Statistical Modelling, Ahead of Print. There are two main approaches to carrying out prediction in the context of penalized regression: with low-rank basis and penalties or through the smooth mixed models. In this article, we give further insight in the case of P-splines showing the influence of the penalty on the prediction. In the context of mixed models, we can connect the new predicted values to the observed values through a joint normal distribution, which allows us to compute prediction intervals. In this work, we propose an alternative approach, called the extended mixed model approach, that allows us to fit and predict data simultaneously. The methodology is illustrated with two real datasets, one of them on aboveground biomass and the other on monthly sulphur dioxide (SO$_2$) levels in a selection of monitoring sites in Europe. Citation: Statistical Modelling PubDate: 2020-02-28T05:30:31Z DOI: 10.1177/1471082X19896867
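The P-spline prediction problem the article studies can be sketched by hand: build a B-spline basis over a domain that extends past the observed covariate range, attach a second-order difference penalty, and solve the penalized least-squares system with a fixed smoothing parameter, so that coefficients in the prediction region are driven by the penalty. The knot spacing and the value of lambda below are arbitrary choices for illustration, and the article's extended mixed-model approach and prediction intervals are not reproduced.

```r
# Sketch: P-spline fit and out-of-range prediction "by hand". A cubic
# B-spline basis is built over a domain extending past the observed x range,
# a second-order difference penalty ties the coefficients in the prediction
# region to the fitted ones, and the penalized least-squares system is
# solved for a fixed smoothing parameter lambda (knot spacing and lambda
# are arbitrary illustrative choices).
library(splines)  # splineDesign()

set.seed(12)
x <- seq(0, 8, length.out = 120)
y <- sin(x) + 0.1 * x + rnorm(120, 0, 0.2)
x_new <- seq(8.05, 9.9, length.out = 40)      # prediction region

knots <- seq(-1.3, 11.5, by = 0.4)            # covers the extended domain
B_all <- splineDesign(knots, c(x, x_new), ord = 4)
B_obs <- B_all[seq_along(x), ]

K <- ncol(B_all)
D <- diff(diag(K), differences = 2)           # second-order differences
lambda <- 10
theta <- solve(t(B_obs) %*% B_obs + lambda * t(D) %*% D, t(B_obs) %*% y)

plot(x, y, xlim = c(0, 10), ylim = range(y) + c(-1, 1))
lines(x, B_obs %*% theta, lwd = 2)                       # fit
lines(x_new, B_all[-seq_along(x), ] %*% theta, lty = 2)  # penalty-driven prediction
```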
Authors:Zhihua Ma, Guanghui Chen Abstract: Statistical Modelling, Ahead of Print. Motivated by the China Health and Nutrition Survey (CHNS) data, a semiparametric latent variable model with a Dirichlet process (DP) mixtures prior on the latent variable is proposed to jointly analyse mixed binary and continuous responses. Non-ignorable missing covariates are considered through a selection model framework where a missing covariate model and a missing data mechanism model are included. The logarithm of the pseudo-marginal likelihood (LPML) is applied for selecting the priors, and the deviance information criterion measure focusing on the missing data mechanism model only is used for selecting different missing data mechanisms. A Bayesian index of local sensitivity to non-ignorability (ISNI) is extended to explore the local sensitivity of the parameters in our model. A simulation study is carried out to examine the empirical performance of the proposed methodology. Finally, the proposed model and the ISNI index are applied to analyse the CHNS data in the motivating example. Citation: Statistical Modelling PubDate: 2020-02-19T11:47:36Z DOI: 10.1177/1471082X19896688