Authors:Luca Merlo, Antonello Maruotti, Lea Petrella Abstract: Statistical Modelling, Ahead of Print. This article develops a two-part finite mixture quantile regression model for semi-continuous longitudinal data. The proposed methodology allows heterogeneity sources that influence the model for the binary response variable to also influence the distribution of the positive outcomes. As is common in the quantile regression literature, estimation and inference on the model parameters are based on the asymmetric Laplace distribution. Maximum likelihood estimates are obtained through the EM algorithm without parametric assumptions on the random effects distribution. In addition, a penalized version of the EM algorithm is presented to tackle the problem of variable selection. The proposed statistical method is applied to the well-known RAND Health Insurance Experiment dataset which gives further insights on its empirical behaviour. Citation: Statistical Modelling PubDate: 2021-04-07T11:42:24Z DOI: 10.1177/1471082X21993603
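The link between quantile regression and the asymmetric Laplace distribution (ALD) mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' two-part mixture model. It only shows, under the standard ALD parameterization (location `mu`, scale `sigma`, quantile level `tau`), that maximizing an ALD likelihood in the location parameter coincides with minimizing the quantile check loss. The function names, the toy data and the grid search are assumptions made for illustration.

```python
import math

def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def ald_logpdf(y, mu, sigma, tau):
    """Log-density of ALD(mu, sigma, tau): log(tau(1-tau)/sigma) - rho_tau((y-mu)/sigma)."""
    return math.log(tau * (1.0 - tau) / sigma) - check_loss((y - mu) / sigma, tau)

def ald_negloglik(mu, ys, sigma, tau):
    """Negative ALD log-likelihood of a sample for a common location mu."""
    return -sum(ald_logpdf(y, mu, sigma, tau) for y in ys)

def total_check_loss(mu, ys, tau):
    """Sum of check losses of the residuals y - mu."""
    return sum(check_loss(y - mu, tau) for y in ys)

# Toy data (hypothetical); tau = 0.5 targets the median.
ys = [1.0, 2.0, 3.0, 4.0, 10.0]
tau = 0.5
grid = [i / 10 for i in range(0, 121)]  # candidate locations 0.0, 0.1, ..., 12.0

# Both criteria are minimized at the same location on the grid.
best_ald = min(grid, key=lambda m: ald_negloglik(m, ys, 1.0, tau))
best_check = min(grid, key=lambda m: total_check_loss(m, ys, tau))
```

With the scale held fixed, the two objectives differ only by a constant, which is why the ALD serves as a working likelihood for quantile regression.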

Authors:Alexandra Grand, Regina Dittrich Abstract: Statistical Modelling, Ahead of Print. This article proposes an alternative method of making comparative judgements in multivariate paired comparisons (PCs) where judgements about change are made directly by comparing an object at two time points for each of a series of attributes. The application deals with the design of shop window displays where products should be arranged by teams of vocational students according to aesthetic principles (attributes). The photos of the students’ window displays at time 1 (before feedback) and at time 2 (after feedback) were compared by judging each attribute as to whether it was fulfilled better at time 1 or at time 2. An advantage of this PC approach over an alternative of a scoring system is the possibility to assess even subtle changes of various aspects of attractiveness, which cannot easily be measured using a score. To analyse these data, we used earlier work which developed both a multivariate PC pattern model for multi-attribute data and a PC model over time and defined a multivariate PC model of changes (MPCC). The model can be fitted as a non-standard Poisson log-linear model and provides estimates of change for the three attributes for time 2 and we were able to check for possible interaction effects between these attributes. Citation: Statistical Modelling PubDate: 2021-04-01T06:43:10Z DOI: 10.1177/1471082X21995675

Authors:Yingying Zhang, Volodymyr Melnykov, Igor Melnykov Abstract: Statistical Modelling, Ahead of Print. A new approach to the analysis of heterogeneous categorical sequences is proposed. The first-order Markov model is employed in a finite mixture setting with initial state and transition probabilities being expressed as functions of time. The expectation–maximization algorithm approach to parameter estimation is implemented in the presence of positive equivalence constraints that determine which observations must be placed in the same class in the solution. The proposed model is applied to a dataset from the British Household Panel Survey to evaluate the association between the education background and life outcomes of study participants. The analysis of the survey data reveals many interesting relationships between the level of education and major life events. Citation: Statistical Modelling PubDate: 2021-03-09T04:31:41Z DOI: 10.1177/1471082X21989170

Authors:Alan Agresti, Francesco Bartolucci, Antonietta Mira Abstract: Statistical Modelling, Ahead of Print. We describe two interesting and innovative strands of Murray Aitkin's research publications, dealing with mixture models and with Bayesian inference. Of his considerable publications on mixture models, we focus on a nonparametric random effects approach in generalized linear mixed modelling, which has proven useful in a wide variety of applications. As an early proponent of ways of implementing the Bayesian paradigm, Aitkin proposed an alternative Bayes factor based on a posterior mean likelihood. We discuss these innovative approaches and some research lines motivated by them and also suggest future related methodological implementations. Citation: Statistical Modelling PubDate: 2021-02-09T03:14:34Z DOI: 10.1177/1471082X20981312

Authors:John Nicholson, Piotr Kokoszka, Robert Lund, Peter Kiessler, Julia Sharp Abstract: Statistical Modelling, Ahead of Print. We propose and estimate an alternating renewal model describing the propagation of anomalies in a backbone internet network in the United States. Internet anomalies, either caused by equipment malfunction, news events or malicious attacks, have been a focus of research in network engineering since the advent of the internet over 30 years ago. This article contributes to the understanding of statistical properties of the times between the arrivals of the anomalies, their duration and stochastic structure. Anomalous, or active, time periods are modelled as periods containing clusters of 1s, where 1 indicates the presence of an anomaly. The inactive periods consisting entirely of 0s dominate the 0–1 time series in every link. Since the active periods contain 0s, a separation parameter is introduced and estimated jointly with all other parameters of the model. Our statistical analysis shows that the integer-valued separation parameter and five other non-negative, scalar parameters satisfactorily describe all statistical properties of the observed 0–1 series. Citation: Statistical Modelling PubDate: 2021-02-02T06:06:13Z DOI: 10.1177/1471082X19983146

Authors:John Nicholson, Piotr Kokoszka, Robert Lund, Peter Kiessler, Julia Sharp Abstract: Statistical Modelling, Ahead of Print. We propose and estimate an alternating renewal model describing the propagation of anomalies in a backbone internet network in the United States. Internet anomalies, either caused by equipment malfunction, news events or malicious attacks, have been a focus of research in network engineering since the advent of the internet over 30 years ago. This article contributes to the understanding of statistical properties of the times between the arrivals of the anomalies, their duration and stochastic structure. Anomalous, or active, time periods are modelled as periods containing clusters of 1s, where 1 indicates the presence of an anomaly. The inactive periods consisting entirely of 0s dominate the 0–1 time series in every link. Since the active periods contain 0s, a separation parameter is introduced and estimated jointly with all other parameters of the model. Our statistical analysis shows that the integer-valued separation parameter and five other non-negative, scalar parameters satisfactorily describe all statistical properties of the observed 0–1 series. Citation: Statistical Modelling PubDate: 2021-01-22T11:55:22Z DOI: 10.1177/1471082X20983146

Authors:Gunther Schauberger, Gerhard Tutz Abstract: Statistical Modelling, Ahead of Print. Common random effects models for repeated measurements account for the heterogeneity in the population by including subject-specific intercepts or variable effects. They do not account for the heterogeneity in answering tendencies. For ordinal responses in particular, the tendency to choose extreme or middle responses can vary in the population. Extended models are proposed that account for this type of heterogeneity. Location effects as well as the tendency to extreme or middle responses are modelled as functions of explanatory variables. It is demonstrated that ignoring response styles may affect the accuracy of parameter estimates. An example demonstrates the applicability of the method. Citation: Statistical Modelling PubDate: 2021-01-06T10:47:57Z DOI: 10.1177/1471082X20978034

Authors:Lizbeth Naranjo, Emmanuel Lesaffre, Carlos J. Pérez Abstract: Statistical Modelling, Ahead of Print. Motivated by a longitudinal oral health study, the Signal-Tandmobiel® study, an inhomogeneous mixed hidden Markov model with continuous state-space is proposed to explain the caries disease process in children between 6 and 12 years of age. The binary caries experience outcomes are subject to misclassification. We modelled this misclassification process via a longitudinal latent continuous response subject to a measurement error process and showing a monotone behaviour. The baseline distributions of the unobservable continuous processes are defined as a function of the covariates through the specification of conditional distributions making use of the Markov property. In addition, random effects are considered to model the relationships among the multivariate responses. Our approach is in contrast with a previous approach working on the binary outcome scale. This method requires conditional independence of the possibly corrupted binary outcomes on the true binary outcomes. We assumed conditional independence on the latent scale, which is a weaker assumption than conditional independence on the binary scale. The aim of this article is therefore to show the properties of a model for a progressive longitudinal response with misclassification on the manifest scale but modelled on the latent scale. The model parameters are estimated in a Bayesian way using an efficient Markov chain Monte Carlo method. The model performance is shown through a simulation-based example, and the analysis of the motivating dataset is presented. Citation: Statistical Modelling PubDate: 2020-12-23T04:42:29Z DOI: 10.1177/1471082X20973473

Authors:Francesco Finazzi, Lucia Paci Abstract: Statistical Modelling, Ahead of Print. Localizing people across space and over time is a relevant and challenging problem in many modern applications. Smartphone ubiquity gives the opportunity to collect useful individual data as never before. In this work, the focus is on location data collected by smartphone applications. We propose a kernel-based density estimation approach that exploits cyclical spatio-temporal patterns of people to estimate the individual location density at any time, uncertainty included. Model parameters are estimated by maximum likelihood cross-validation. Unlike classic tracking methods designed for high spatio-temporal resolution data, the approach is suitable when location data are sparse in time and are affected by non-negligible errors. The approach is applied to location data collected by the Earthquake Network citizen science project which carries out a worldwide earthquake early warning system based on smartphones. The approach is parsimonious and is suitable to model location data gathered by any location-aware smartphone application. Citation: Statistical Modelling PubDate: 2020-12-22T05:12:40Z DOI: 10.1177/1471082X17870331

Authors:Avner Bar-Hen, Pierre Barbillon, Sophie Donnet Abstract: Statistical Modelling, Ahead of Print. Generalized multipartite networks consist of the joint observation of several networks involving some common pre-specified groups of individuals. Such complex networks arise commonly in social sciences, biology, ecology, etc. We propose a flexible probabilistic model named Multipartite Block Model (MBM) able to unravel the topology of multipartite networks by identifying clusters (blocks) of nodes sharing the same patterns of connectivity across the collection of networks they are involved in. The model parameters are estimated through a variational version of the Expectation–Maximization algorithm. The numbers of blocks are chosen using an Integrated Completed Likelihood criterion specifically designed for our model. A simulation study illustrates the robustness of the inference strategy. Finally, two datasets, from ecology and ethnobiology respectively, are analyzed with the MBM in order to illustrate its flexibility and its relevance for the analysis of real datasets. The inference procedure is implemented in an R package, GREMLIN, available on GitHub (https://github.com/Demiperimetre/GREMLIN). Citation: Statistical Modelling PubDate: 2020-12-18T11:43:51Z DOI: 10.1177/1471082X20963254

Authors:Marc Schneble, Göran Kauermann Abstract: Statistical Modelling, Ahead of Print. Estimation of latent network flows is a common problem in statistical network analysis. The typical setting is that we know the margins of the network, that is, in- and outdegrees, but the flows are unobserved. In this article, we develop a mixed regression model to estimate network flows in a bike-sharing network if only the hourly differences of in- and outdegrees at bike stations are known. We also include exogenous covariates such as weather conditions. Two different parameterizations of the model are considered to estimate (a) the whole network flow and (b) the network margins only. The estimation of the model parameters is proposed via an iterative penalized maximum likelihood approach. This is exemplified by modelling network flows in the Vienna bike-sharing system. In order to evaluate our modelling approach, we conduct our analyses under different distributional assumptions, while also accounting for the provider's interventions to keep the estimation error low. Furthermore, a simulation study is conducted to show the performance of the model. For practical purposes, it is crucial to predict when and at which station there is a lack or an excess of bikes. For this application, our model proves to be well suited, providing quite accurate predictions. Citation: Statistical Modelling PubDate: 2020-12-15T10:57:08Z DOI: 10.1177/1471082X20971911

Authors:Zahra Mahdiyeh, Iraj Kazemi, Geert Verbeke Abstract: Statistical Modelling, Ahead of Print. This article introduces a flexible modelling strategy to extend the familiar mixed-effects models for analysing longitudinal responses in the multivariate setting. By introducing a flexible multivariate multimodal distribution, this strategy relaxes the imposed normality assumption of related random-effects. We use copulas to construct a multimodal form of elliptical distributions. It can deal with the multimodality of responses and the non-linearity of dependence structure. Moreover, the proposed model can flexibly accommodate clustered subject-effects for multiple longitudinal measurements. It is particularly useful when several subpopulations exist but cannot be directly identified. Since the implied marginal distribution is not available in closed form, we suggest approximating the associated likelihood functions with a computational methodology based on Gauss–Hermite quadrature, which in turn enables us to implement standard optimization techniques. We conduct a simulation study to highlight the main properties of the theoretical part and make a comparison with regular mixture distributions. Results confirm that the new strategy deserves to receive attention in practice. We illustrate the usefulness of our model by the analysis of a real-life dataset taken from a low back pain study. Citation: Statistical Modelling PubDate: 2020-12-14T03:52:50Z DOI: 10.1177/1471082X20967168

Authors:Amani Almohaimeed, Jochen Einbeck Abstract: Statistical Modelling, Ahead of Print. Random effect models have been popularly used as a mainstream statistical technique over several decades; and the same can be said for response transformation models such as the Box–Cox transformation. The latter aims at ensuring that the assumptions of normality and of homoscedasticity of the response distribution are fulfilled, which are essential conditions for inference based on a linear model or a linear mixed model. However, methodology for response transformation and simultaneous inclusion of random effects has been developed and implemented only scarcely, and is so far restricted to Gaussian random effects. We develop such methodology, thereby not requiring parametric assumptions on the distribution of the random effects. This is achieved by extending the ‘Nonparametric Maximum Likelihood’ towards a ‘Nonparametric profile maximum likelihood’ technique, allowing to deal with overdispersion as well as two-level data scenarios. Citation: Statistical Modelling PubDate: 2020-12-14T03:52:10Z DOI: 10.1177/1471082X20966919
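For readers unfamiliar with the response transformation this abstract refers to, the Box–Cox transformation itself can be sketched in a few lines. This is only the standard textbook transformation and its inverse, not the authors' nonparametric profile maximum likelihood procedure. The function names and the tolerance used to detect `lam == 0` are assumptions made for illustration.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a positive response y with parameter lam."""
    if y <= 0:
        raise ValueError("Box-Cox requires a strictly positive response")
    if abs(lam) < 1e-12:          # limiting case lam -> 0 is the log transform
        return math.log(y)
    return (y ** lam - 1.0) / lam

def inv_box_cox(z, lam):
    """Inverse Box-Cox transform, mapping z back to the original scale."""
    if abs(lam) < 1e-12:
        return math.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)

# Round trip on a hypothetical observation.
y = 7.5
z = box_cox(y, 0.5)
y_back = inv_box_cox(z, 0.5)
```

In practice the parameter `lam` is estimated from the data, for example by profiling the likelihood over a grid of candidate values.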

Authors:Leonardo Grilli, Maria Francesca Marino, Omar Paccagnella, Carla Rampichini Abstract: Statistical Modelling, Ahead of Print. The article is motivated by the analysis of the relationship between university student ratings and teacher practices and attitudes, which are measured via a set of binary and ordinal items collected by an innovative survey. The analysis is conducted through a two-level random intercept model, where student ratings are nested within teachers. The analysis must face two issues about the items measuring teacher practices and attitudes, which are level 2 predictors: (a) the items are severely affected by missingness due to teacher non-response and (b) there is redundancy in both the number of items and the number of categories of their measurement scale. We tackle the missing data issue by considering a multiple imputation strategy exploiting information at both student and teacher levels. For the redundancy issue, we rely on regularization techniques for ordinal predictors, also accounting for the multilevel data structure. The proposed solution addresses the problem at hand in an original way, and it can be applied whenever it is required to select level 2 predictors affected by missing values. The results obtained with the final model indicate that ratings on teacher ability to motivate students are related to certain teacher practices and attitudes. Citation: Statistical Modelling PubDate: 2020-10-23T06:19:53Z DOI: 10.1177/1471082X20949710

Authors:Denise Costantin, Andrea Sottosanti, Alessandra R. Brazzale, Denis Bastieri, JunHui Fan Abstract: Statistical Modelling, Ahead of Print. Identifying as yet undetected high-energy sources in the γ-ray sky is one of the declared objectives of the Fermi Large Area Telescope (LAT) Collaboration. We develop a Bayesian mixture model which is capable of disentangling the high-energy extra-galactic sources present in a given sky region from the pervasive background radiation. We achieve this by combining two model components. The first component models the emission activity of the single sources and incorporates the instrument response function of the Fermi γ-ray space telescope. The second component reliably reflects the current knowledge of the physical phenomena which underlie the γ-ray background. The model parameters are estimated using a reversible jump MCMC algorithm, which simultaneously returns the number of detected sources, their locations and relative intensities, and the background component. Our proposal is illustrated using a sample of the Fermi LAT data. In the analysed sky region, our model correctly identifies 116 sources out of the 132 present. The detection rate and the estimated directions and intensities of the identified sources are largely unaffected by the number of detected sources. Citation: Statistical Modelling PubDate: 2020-09-28T09:43:34Z DOI: 10.1177/1471082X20947222

Authors:Dunfu Yang, Gyuhyeong Goh, Haiyan Wang Abstract: Statistical Modelling, Ahead of Print. In the context of high-dimensional multivariate linear regression, sparse reduced-rank regression (SRRR) provides a way to handle both variable selection and low-rank estimation problems. Although there has been extensive research on SRRR, statistical inference procedures that deal with the uncertainty due to variable selection and rank reduction are still limited. To fill this research gap, we develop a fully Bayesian approach to SRRR. A major difficulty that occurs in a fully Bayesian framework is that the dimension of parameter space varies with the selected variables and the reduced-rank. Due to the varying-dimensional problems, traditional Markov chain Monte Carlo (MCMC) methods such as Gibbs sampler and Metropolis-Hastings algorithm are inapplicable in our Bayesian framework. To address this issue, we propose a new posterior computation procedure based on the Laplace approximation within the collapsed Gibbs sampler. A key feature of our fully Bayesian method is that the model uncertainty is automatically integrated out by the proposed MCMC computation. The proposed method is examined via simulation study and real data analysis. Citation: Statistical Modelling PubDate: 2020-09-25T08:40:47Z DOI: 10.1177/1471082X20948697

Authors:Wagner H. Bonat, Ricardo R. Petterle, Priscilla Balbinot, Alexandre Mansur, Ruth Graf Abstract: Statistical Modelling, Ahead of Print. We propose a multivariate regression model to deal with multiple outcomes along with repeated measures in the context of longitudinal data analysis. Our model allows for flexible and interpretable modelling of the covariance structure within outcomes by using a linear combination of known matrices, while the generalized Kronecker product is employed to take into account the correlation between outcomes. We present maximum likelihood estimation along with extensions of the classical multivariate analysis of variance and multiple comparison hypothesis tests to deal with multivariate longitudinal data. The model and the associated multivariate hypothesis test are motivated by a prospective study conducted to compare three aesthetic eyelid surgery techniques, namely blepharoplasty, endoscopic forehead lift and endoscopic forehead lift associated with blepharoplasty. The effect of the techniques was assessed using measurements of a horizontal line through pupil centre and then three vertical lines, which go in direction to lateral canthus, middle pupil and medial canthus to the top of the brow. In this study, 30 female patients were randomly divided into three groups. Preoperative measurements were compared with postoperative measurements taken 30 days, 90 days and 10 years after the surgery. The presented multivariate model provided a better fit than its univariate counterpart. The results showed that the three surgery techniques tend to increase all considered outcomes in a long-term perspective, that is, from preoperative to 10 years postoperative evaluations. The only exception was for the outcome lateral eyebrow, for which the blepharoplasty had no significant effect. Citation: Statistical Modelling PubDate: 2020-09-21T05:15:44Z DOI: 10.1177/1471082X20943312

Authors:Meredith A. Ray, Dale Bowman, Ryan Csontos, Roy B. Van Arsdale, Hongmei Zhang Abstract: Statistical Modelling, Ahead of Print. Earthquakes are one of the deadliest natural disasters. Our study focuses on detecting temporal patterns of earthquakes occurring along intraplate faults in the New Madrid seismic zone (NMSZ) within the middle of the United States from 1996 to 2016. Based on the magnitude and location of each earthquake, we developed a Bayesian clustering method to group hypocentres such that each group shared the same temporal pattern of occurrence. We constructed a matrix-variate Dirichlet process prior to describe temporal trends in the space and to detect regions showing similar temporal patterns. Simulations were conducted to assess accuracy and performance of the proposed method and to compare it with other commonly used clustering methods such as K-means, K-medians and partition-around-medoids. We applied the method to NMSZ data to identify clusters of temporal patterns, which represent areas of stress that are potentially migrating over time. This information can then be used to assist in the prediction of future earthquakes. Citation: Statistical Modelling PubDate: 2020-08-25T11:52:45Z DOI: 10.1177/1471082X20939767

Authors:Aris Perperoglou, Marianne Huebner Abstract: Statistical Modelling, Ahead of Print. In this work, we develop ‘quantile foliation’ to predict outcomes for one explanatory variable based on two covariates and varying quantiles. This is an extension of quantile sheets. Data from World Championships in Olympic weightlifting with athletes aged 13 to 90 are used to study performances across the life span. Weightlifters of all ages compete in body weight classes, and we study performance development for adolescents, age at peak performance and decline for Masters athletes who are 35 years or older. In prior studies, weightlifting performances were compared with a body mass adjustment formula developed using world records. Although intended for elite athletes with highest performances, this formula was applied to weightlifters of all ages, and age factors for Masters were estimated based on these body mass adjustments. A comparison of youth athletes’ performances for different body mass has not been done. With quantile foliation, it is possible to examine age-associated patterns of performance increase for youth and to study the decline after reaching the peak performance. This can be done for athletes with different body mass and different performance levels as measured by quantiles. R code and example data are available as supplementary materials. Citation: Statistical Modelling PubDate: 2020-08-25T06:12:19Z DOI: 10.1177/1471082X20940156

Authors:Mirko Signorelli, Pietro Spitali, Roula Tsonaka Abstract: Statistical Modelling, Ahead of Print. We present a new modelling approach for longitudinal overdispersed counts that is motivated by the increasing availability of longitudinal RNA-sequencing experiments. The distribution of RNA-seq counts typically exhibits overdispersion, zero-inflation and heavy tails; moreover, in longitudinal designs repeated measurements from the same subject are typically (positively) correlated. We propose a generalized linear mixed model based on the Poisson–Tweedie distribution that can flexibly handle each of the aforementioned features of longitudinal overdispersed counts. We develop a computational approach to accurately evaluate the likelihood of the proposed model and to perform maximum likelihood estimation. Our approach is implemented in the R package ptmixed, which can be freely downloaded from CRAN. We assess the performance of ptmixed on simulated data, and we present an application to a dataset with longitudinal RNA-sequencing measurements from healthy and dystrophic mice. The applicability of the Poisson–Tweedie mixed-effects model is not restricted to longitudinal RNA-seq data, but it extends to any scenario where non-independent measurements of a discrete overdispersed response variable are available. Citation: Statistical Modelling PubDate: 2020-08-25T04:56:31Z DOI: 10.1177/1471082X20936017

Authors:M. Menictas, T.H. Nolan, D.G. Simpson, M.P. Wand Abstract: Statistical Modelling, Ahead of Print. A two-level group-specific curve model is such that the mean response of each member of a group is a separate smooth function of a predictor of interest. The three-level extension is such that one grouping variable is nested within another one, and higher level extensions are analogous. Streamlined variational inference for higher level group-specific curve models is a challenging problem. We confront it by systematically working through two-level and then three-level cases and making use of the higher level sparse matrix infrastructure laid down in (Nolan and Wand (2020), ANZIAM Journal, doi: 10.1017/S1446181120000061). A motivation is analysis of data from ultrasound technology for which three-level group-specific curve models are appropriate. Whilst extension to the number of levels exceeding three is not covered explicitly, the pattern established by our systematic approach sheds light on what is required for even higher level group-specific curve models. Citation: Statistical Modelling PubDate: 2020-08-21T07:25:16Z DOI: 10.1177/1471082X20930894

Authors:Vito M.R. Muggeo, Federico Torretta, Paul H. C. Eilers, Mariangela Sciandra, Massimo Attanasio Abstract: Statistical Modelling, Ahead of Print. We propose an iterative algorithm to select the smoothing parameters in additive quantile regression, wherein the functional forms of the covariate effects are unspecified and expressed via B-spline bases with difference penalties on the spline coefficients. The proposed algorithm relies on viewing the penalized coefficients as random effects from the symmetric Laplace distribution, and it turns out to be very efficient and particularly attractive with multiple smooth terms. Through simulations we compare our proposal with some alternative approaches, including the traditional ones based on minimization of the Schwarz Information Criterion. A real-data analysis is presented to illustrate the method in practice. Citation: Statistical Modelling PubDate: 2020-07-18T05:46:08Z DOI: 10.1177/1471082X20929802

Authors:Halvard Arntzen, Lars Magnus Hvattum Abstract: Statistical Modelling, Ahead of Print. The main goal of this article is to compare the performance of team ratings and individual player ratings when trying to forecast match outcomes in association football. The well-known Elo rating system is used to calculate team ratings, whereas a variant of plus-minus ratings is used to rate individual players. For prediction purposes, two covariates are introduced. The first represents the pre-match difference in Elo ratings of the two teams competing, while the second is the average difference in individual ratings for the players in the starting line-ups of the two teams. Two different statistical models are used to generate forecasts. The first type is an ordered logit regression (OLR) model that directly outputs probabilities for each of the three possible match outcomes, namely home win, draw and away win. The second type is based on competing risk modelling and involves the estimation of scoring rates for the two competing teams. These scoring rates are used to derive match outcome probabilities using discrete event simulation. Both types of models can be used to generate pre-game forecasts, whereas the competing risk models can also be used for in-game predictions. Computational experiments indicate that there is no statistical difference in the prediction quality for pre-game forecasts between the OLR models and the competing risk models. It is also found that team ratings and player ratings perform about equally well when predicting match outcomes. However, forecasts made when using both team ratings and player ratings as covariates are significantly better than those based on only one of the ratings. Citation: Statistical Modelling PubDate: 2020-07-10T01:56:40Z DOI: 10.1177/1471082X20929881
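The team-rating covariate described in this abstract builds on the Elo system, whose basic mechanics can be sketched as follows. This is the textbook Elo update, not necessarily the exact variant used in the article. The scale of 400, the update factor `k = 20`, the absence of a home-advantage term and the function names are all assumptions made for illustration.

```python
def elo_expected(r_home, r_away, scale=400.0):
    """Expected score (between 0 and 1) of the home team under the basic Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((r_away - r_home) / scale))

def elo_update(r_home, r_away, score_home, k=20.0):
    """Return updated (home, away) ratings after a match.

    score_home is 1.0 for a home win, 0.5 for a draw and 0.0 for an away win.
    """
    delta = k * (score_home - elo_expected(r_home, r_away))
    return r_home + delta, r_away - delta

# Hypothetical pre-match ratings; their difference is the kind of
# covariate a forecasting model could take as input.
home, away = 1500.0, 1500.0
rating_diff = home - away
new_home, new_away = elo_update(home, away, score_home=1.0)
```

The update is zero-sum: whatever the winner gains, the loser gives up, so the total rating across the two teams is unchanged.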

Authors:Almond Stöcker, Sarah Brockhaus, Sophia Anna Schaffer, Benedikt von Bronk, Madeleine Opitz, Sonja Greven Abstract: Statistical Modelling, Ahead of Print. We extend generalized additive models for location, scale and shape (GAMLSS) to regression with functional response. This allows us to simultaneously model point-wise mean curves, variances and other distributional parameters of the response in dependence of various scalar and functional covariate effects. In addition, the scope of distributions is extended beyond exponential families. The model is fitted via gradient boosting, which offers inherent model selection and is shown to be suitable for both complex model structures and highly auto-correlated response curves. This enables us to analyse bacterial growth in Escherichia coli in a complex interaction scenario, fruitfully extending usual growth models. Citation: Statistical Modelling PubDate: 2020-06-11T04:34:51Z DOI: 10.1177/1471082X20917586

Authors:Janet van Niekerk, Haakon Bakka, Håvard Rue Abstract: Statistical Modelling, Ahead of Print. The methodological advancements made in the field of joint models are numerous. None the less, the case of competing risks joint models has largely been neglected, especially from a practitioner's point of view. In the relevant works on competing risks joint models, the assumptions of a Gaussian linear longitudinal series and proportional cause-specific hazard functions, amongst others, have remained unchallenged. In this article, we provide a framework based on R-INLA to apply competing risks joint models in a unifying way such that non-Gaussian longitudinal data, spatial structures, time-dependent splines and various latent association structures, to mention a few, are all embraced in our approach. Our motivation stems from the SANAD trial which exhibits non-linear longitudinal trajectories and competing risks for failure of treatment. We also present a discrete competing risks joint model for longitudinal count data as well as a spatial competing risks joint model as specific examples. Citation: Statistical Modelling PubDate: 2020-05-25T10:27:54Z DOI: 10.1177/1471082X19913654

Authors:Özgür Asar Abstract: Statistical Modelling, Ahead of Print. This article is motivated by the panel surveys called Statistics on Income and Living Conditions (SILC), conducted annually on randomly selected, country-representative households to monitor EU 2020 aims on poverty reduction. We particularly consider the surveys conducted in Turkey within the scope of integration to the EU. Our main interest is in the health aspects of economic and living conditions. The outcome is self-reported health, which is a clustered longitudinal ordinal variable, since repeated measures of it are nested within individuals and individuals are nested within families. Economic and living conditions have been measured through a number of individual- and family-level explanatory variables. The questions of interest concern the marginal relationships between the outcome and covariates, which we address using a polytomous logistic regression with Bridge distributed random effects. This choice of distribution allows us to directly obtain marginal inferences in the presence of random effects. The widely used normal distribution is also considered as the random effects distribution. Samples from the joint posterior densities of parameters and random effects are drawn using Markov Chain Monte Carlo. From the public health point of view, the interesting findings are differences between the subgroups of employment status, income level and panel year in terms of the odds of reporting better health. Citation: Statistical Modelling PubDate: 2020-05-25T06:03:03Z DOI: 10.1177/1471082X20920122

Authors:Louise Marquart, Geert Verbeke Abstract: Statistical Modelling, Ahead of Print. The conventional normality assumption for the random effects distribution in logistic mixed models can be too restrictive in some applications. In our data example of a longitudinal study modelling employment participation of Australian women, the random effects exhibit non-normality due to a potential mover–stayer scenario. In such a scenario, the women observed to remain in the same initial response state over the study period may consist of two subgroups: latent stayers—those with extremely small probability of transitioning response states—and latent movers, those with a probability of transitioning response states. The similarities between estimating the random effects using non-parametric approaches and mover–stayer models have previously been highlighted. We explore non-parametric approaches to model univariate and bivariate random effects in a potential mover–stayer scenario. As there are limited approaches available to fit the non-parametric maximum likelihood estimate for bivariate random effects in logistic mixed models, we implement the Vertex Exchange Method (VEM) to estimate the random effects in logistic mixed models. The approximation of the non-parametric maximum likelihood estimate derived by the VEM algorithm induces more flexibility of the random effects, identifying regions corresponding to potential latent stayers in the non-employment category in our data example. Citation: Statistical Modelling PubDate: 2020-05-23T07:43:49Z DOI: 10.1177/1471082X19889143

Authors:Alba Carballo, Maria Durban, Göran Kauermann, Dae-Jin Lee Abstract: Statistical Modelling, Ahead of Print. There are two main approaches to carrying out prediction in the context of penalized regression: with low-rank basis and penalties or through the smooth mixed models. In this article, we give further insight into the case of P-splines, showing the influence of the penalty on the prediction. In the context of mixed models, we can connect the new predicted values to the observed values through a joint normal distribution, which allows us to compute prediction intervals. In this work, we propose an alternative approach, called the extended mixed model approach, that allows us to fit and predict data simultaneously. The methodology is illustrated with two real datasets, one of them on aboveground biomass and the other on monthly sulphur dioxide (SO₂) levels in a selection of monitoring sites in Europe. Citation: Statistical Modelling PubDate: 2020-02-28T05:30:31Z DOI: 10.1177/1471082X19896867

Authors:Zhihua Ma, Guanghui Chen Abstract: Statistical Modelling, Ahead of Print. Motivated by the China Health and Nutrition Survey (CHNS) data, a semiparametric latent variable model with a Dirichlet process (DP) mixtures prior on the latent variable is proposed to jointly analyse mixed binary and continuous responses. Non-ignorable missing covariates are considered through a selection model framework where a missing covariate model and a missing data mechanism model are included. The logarithm of the pseudo-marginal likelihood (LPML) is applied for selecting the priors, and the deviance information criterion measure focusing on the missing data mechanism model only is used for selecting different missing data mechanisms. A Bayesian index of local sensitivity to non-ignorability (ISNI) is extended to explore the local sensitivity of the parameters in our model. A simulation study is carried out to examine the empirical performance of the proposed methodology. Finally, the proposed model and the ISNI index are applied to analyse the CHNS data in the motivating example. Citation: Statistical Modelling PubDate: 2020-02-19T11:47:36Z DOI: 10.1177/1471082X19896688

Authors:Jennifer Pohle, Roland Langrock, Mihaela van der Schaar, Ruth King, Frants Havmand Jensen First page: 264 Abstract: Statistical Modelling, Ahead of Print. State-switching models such as hidden Markov models or Markov-switching regression models are routinely applied to analyse sequences of observations that are driven by underlying non-observable states. Coupled state-switching models extend these approaches to address the case of multiple observation sequences whose underlying state variables interact. In this article, we provide an overview of the modelling techniques related to coupling in state-switching models, thereby forming a rich and flexible statistical framework particularly useful for modelling correlated time series. Simulation experiments demonstrate the relevance of being able to account for an asynchronous evolution as well as interactions between the underlying latent processes. The models are further illustrated using two case studies related to (a) interactions between a dolphin mother and her calf as inferred from movement data and (b) electronic health record data collected on 696 patients within an intensive care unit. Citation: Statistical Modelling PubDate: 2020-10-21T12:18:30Z DOI: 10.1177/1471082X20956423