Statistics in Medicine
  [SJR: 1.811]   [H-I: 131]
   Hybrid journal (can contain Open Access articles)
   ISSN (Print) 0277-6715 - ISSN (Online) 1097-0258
   Published by John Wiley and Sons
  • A nonparametric method to detect increased frequencies of adverse drug reactions over time
    • Authors: Günter Heimann; Rossella Belleli, Jouni Kerman, Roland Fisch, Joseph Kahn, Sigrid Behr, Conny Berlin
      Abstract: Signal detection is routinely applied to spontaneous report safety databases in the pharmaceutical industry and by regulators. As an example, methods that search for increases in the frequencies of known adverse drug reactions for a given drug are routinely applied, and the results are reported to the health authorities on a regular basis. Such methods need to be sensitive enough to detect true signals even when some of the adverse drug reactions are rare. The methods need to be specific and account for multiplicity to avoid false positive signals when the list of known adverse drug reactions is long. To be applied as part of a routine process, the methods also have to cope with very diverse drugs (increasing or decreasing numbers of cases over time, seasonal patterns, very safe drugs versus drugs for life-threatening diseases). In this paper, we develop new nonparametric signal detection methods, directed at detecting differences between a reporting and a reference period, or trends within a reporting period. These methods are based on bootstrap and permutation distributions, and they combine statistical significance with clinical relevance. We conducted a large simulation study to understand the operating characteristics of the methods. Our simulations show that the new methods have good power and control the family-wise error rate at the specified level. Overall, in all scenarios that we explored, the new methods perform much better than our current standard in terms of power, and they generate considerably fewer false positive signals than the current standard.
      PubDate: 2018-01-10T22:56:18.652762-05:00
      DOI: 10.1002/sim.7593
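The bootstrap/permutation idea in the abstract above can be illustrated with a small sketch (an editor's illustration, not the authors' actual procedure): a permutation test comparing adverse-event frequencies between a reference and a reporting period, with a single-step max-statistic adjustment to control the family-wise error rate across terms. All data and names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

def max_t_permutation(ref, rep, n_perm=2000, rng=rng):
    """Permutation test for increased AE rates in the reporting period.

    ref, rep: (n_reports x n_AE_terms) 0/1 matrices of AE mentions.
    Family-wise error is controlled with the single-step max statistic:
    each term's observed increase is compared against the permutation
    distribution of the maximum increase over all terms.
    """
    data = np.vstack([ref, rep])
    n_ref = ref.shape[0]
    obs = rep.mean(axis=0) - ref.mean(axis=0)          # observed rate increases
    max_null = np.empty(n_perm)
    for b in range(n_perm):
        perm = rng.permutation(data)                   # relabel reports at random
        max_null[b] = (perm[n_ref:].mean(axis=0) - perm[:n_ref].mean(axis=0)).max()
    # adjusted p-value per AE term: P(max null statistic >= observed statistic)
    p_adj = (max_null[:, None] >= obs[None, :]).mean(axis=0)
    return obs, p_adj

# Simulated example: 3 AE terms, the first with a genuinely increased rate.
ref = rng.binomial(1, [0.05, 0.10, 0.02], size=(400, 3))
rep = rng.binomial(1, [0.15, 0.10, 0.02], size=(400, 3))
obs, p_adj = max_t_permutation(ref, rep)
print(obs, p_adj)
```

The comparison against the maximum over terms is what delivers family-wise control here; the paper's methods additionally bring in clinical relevance thresholds and bootstrap variants not sketched above.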
  • Two-step estimation in ratio-of-mediator-probability weighted causal mediation analysis
    • Authors: Edward Bein; Jonah Deutsch, Guanglei Hong, Kristin E. Porter, Xu Qin, Cheng Yang
      Abstract: This study investigates appropriate estimation of estimator variability in the context of causal mediation analysis that employs propensity score-based weighting. Such an analysis decomposes the total effect of a treatment on the outcome into an indirect effect transmitted through a focal mediator and a direct effect bypassing the mediator. Ratio-of-mediator-probability weighting estimates these causal effects by adjusting for the confounding impact of a large number of pretreatment covariates through propensity score-based weighting. In step 1, a propensity score model is estimated. In step 2, the causal effects of interest are estimated using weights derived from the prior step's regression coefficient estimates. Statistical inferences obtained from this 2-step estimation procedure are potentially problematic if the estimated standard errors of the causal effect estimates do not reflect the sampling uncertainty in the estimation of the weights. This study extends to ratio-of-mediator-probability weighting analysis a solution to the 2-step estimation problem by stacking the score functions from both steps. We derive the asymptotic variance-covariance matrix for the indirect effect and direct effect 2-step estimators, provide simulation results, and illustrate with an application study. Our simulation results indicate that the sampling uncertainty in the estimated weights should not be ignored. The standard error estimation using the stacking procedure offers a viable alternative to bootstrap standard error estimation. We discuss broad implications of this approach for causal analysis involving propensity score-based weighting.
      PubDate: 2018-01-10T22:46:23.773443-05:00
      DOI: 10.1002/sim.7581
  • Model validation and influence diagnostics for regression models with missing covariates
    • Authors: Paul W. Bernhardt
      Abstract: Missing covariate values are prevalent in regression applications. While an array of methods have been developed for estimating parameters in regression models with missing covariate data for a variety of response types, minimal focus has been given to validation of the response model and influence diagnostics. Previous research has mainly focused on estimating residuals for observations with missing covariates using expected values, after which specialized techniques are needed to conduct proper inference. We suggest a multiple imputation strategy that allows for the use of standard methods for residual analyses on the imputed data sets or a stacked data set. We demonstrate the suggested multiple imputation method by analyzing the Sleep in Mammals data in the context of a linear regression model and the New York Social Indicators Status data with a logistic regression model.
      PubDate: 2018-01-09T22:06:31.859514-05:00
      DOI: 10.1002/sim.7584
  • Fridge: Focused fine‐tuning of ridge regression for personalized
    • Authors: Kristoffer H. Hellton; Nils Lid Hjort
      Abstract: Statistical prediction methods typically require some form of fine‐tuning of tuning parameter(s), with K‐fold cross‐validation as the canonical procedure. For ridge regression, there exist numerous procedures, but common for all, including cross‐validation, is that one single parameter is chosen for all future predictions. We propose instead to calculate a unique tuning parameter for each individual for which we wish to predict an outcome. This generates an individualized prediction by focusing on the vector of covariates of a specific individual. The focused ridge—fridge—procedure is introduced with a 2‐part contribution: First we define an oracle tuning parameter minimizing the mean squared prediction error of a specific covariate vector, and then we propose to estimate this tuning parameter by using plug‐in estimates of the regression coefficients and error variance parameter. The procedure is extended to logistic ridge regression by using parametric bootstrap. For high‐dimensional data, we propose to use ridge regression with cross‐validation as the plug‐in estimate, and simulations show that fridge gives smaller average prediction error than ridge with cross‐validation for both simulated and real data. We illustrate the new concept for both linear and logistic regression models in 2 applications of personalized medicine: predicting individual risk and treatment response based on gene expression data. The method is implemented in the R package fridge.
      PubDate: 2018-01-03T23:37:11.519786-05:00
      DOI: 10.1002/sim.7576
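The plug-in recipe described in the abstract (estimate the regression coefficients and error variance, then pick, for each covariate vector, the tuning parameter minimizing the estimated mean squared prediction error) can be sketched as follows. This is an editor's illustration using the standard ridge bias-variance decomposition and a grid search, not code from the fridge package; the preliminary penalty of 1.0 for the plug-in fit is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.normal(size=(n, p))
beta = np.linspace(1.0, 0.1, p)
y = X @ beta + rng.normal(scale=1.0, size=n)

XtX = X.T @ X
Xty = X.T @ y

def ridge(lam):
    return np.linalg.solve(XtX + lam * np.eye(p), Xty)

# Plug-in estimates from a preliminary fit (assumption: small fixed penalty).
beta_hat = ridge(1.0)
sigma2_hat = np.sum((y - X @ beta_hat) ** 2) / (n - p)

def mspe_hat(lam, x0):
    """Estimated excess mean squared prediction error of the ridge
    prediction at x0: squared plug-in bias + prediction variance."""
    A = np.linalg.inv(XtX + lam * np.eye(p))
    bias = lam * x0 @ A @ beta_hat
    var = sigma2_hat * x0 @ A @ XtX @ A @ x0
    return bias ** 2 + var

def focused_lambda(x0, grid=np.logspace(-2, 3, 60)):
    """Covariate-specific ('focused') tuning parameter: argmin of the
    estimated MSPE at x0 over a grid."""
    return grid[np.argmin([mspe_hat(l, x0) for l in grid])]

x0 = rng.normal(size=p)
lam0 = focused_lambda(x0)
pred = x0 @ ridge(lam0)
print(lam0, pred)
```

The point of the method is that `lam0` varies with `x0`, unlike cross-validation, which fixes one penalty for all future predictions.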
  • Explained variation in shared frailty models
    • Authors: Andreas Gleiss; Michael Gnant, Michael Schemper
      Abstract: Explained variation measures the relative gain in predictive accuracy when prediction based on prognostic factors replaces unconditional prediction. The factors may be measured on different scales or may be of different types (dichotomous, qualitative, or continuous). Thus, explained variation makes it possible to establish a ranking of the importance of factors, even if predictive accuracy is too low to be helpful in clinical practice. In this contribution, the explained variation measure by Schemper and Henderson (2000) is extended to accommodate random factors, such as center effects in multicenter studies. This permits a direct comparison of the importance of centers and of other prognostic factors. We develop this extension for a shared frailty Cox model and provide an SAS macro and an R function to facilitate its application. Interesting empirical properties of the variation explained by a random factor are explored in a Monte Carlo study. Advantages of the approach are exemplified by an Austrian multicenter study of colon cancer.
      PubDate: 2017-12-28T01:55:58.734381-05:00
      DOI: 10.1002/sim.7592
  • Sample size determination for jointly testing a cause-specific hazard and the all-cause hazard in the presence of competing risks
    • Authors: Qing Yang; Wing K. Fung, Gang Li
      Abstract: This article considers sample size determination for jointly testing a cause-specific hazard and the all-cause hazard for competing risks data. The cause-specific hazard and the all-cause hazard jointly characterize important study end points such as the disease-specific survival and overall survival, which are commonly used as coprimary end points in clinical trials. Specifically, we derive sample size calculation methods for 2-group comparisons based on an asymptotic chi-square joint test and a maximum joint test of the aforementioned quantities, taking into account censoring due to lost to follow-up as well as staggered entry and administrative censoring. We illustrate the application of the proposed methods using the Die Deutsche Diabetes Dialyse Studies clinical trial. An R package “powerCompRisk” has been developed and made available at the CRAN R library.
      PubDate: 2017-12-27T22:51:29.096893-05:00
      DOI: 10.1002/sim.7590
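The joint chi-square and maximum tests in this paper are more involved than anything shown here, but the classical single-endpoint building block (Schoenfeld's approximation for the number of events needed to detect a given hazard ratio, e.g. for one cause-specific hazard) is easy to state. A standard-library sketch, not the authors' joint method:

```python
from math import ceil, log
from statistics import NormalDist

def schoenfeld_events(hr, alpha=0.05, power=0.8, allocation=0.5):
    """Required number of events to detect hazard ratio `hr` with a
    two-sided log-rank/Cox test, by the classical Schoenfeld approximation:
    d = (z_{1-alpha/2} + z_{power})^2 / (p (1-p) (log hr)^2)."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return ceil(z ** 2 / (allocation * (1 - allocation) * log(hr) ** 2))

print(schoenfeld_events(0.5))   # events needed for HR = 0.5, 1:1 allocation
```

The paper extends this kind of calculation to a joint test of the cause-specific and all-cause hazards, and additionally accounts for staggered entry and administrative censoring.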
  • Semiparametric Bayesian models for evaluating time-variant driving risk factors using naturalistic driving data and case-crossover approach
    • Authors: Feng Guo; Inyong Kim, Sheila G. Klauer
      Abstract: Driver behavior is a major contributing factor for traffic crashes, a leading cause of death and injury in the United States. The naturalistic driving study (NDS) revolutionizes driver behavior research by using sophisticated nonintrusive in-vehicle instrumentation to continuously record driving data. This paper uses a case-crossover approach to evaluate driver-behavior risk. To properly model the unbalanced and clustered binary outcomes, we propose a semiparametric hierarchical mixed-effect model to accommodate both among-strata and within-stratum variations. This approach overcomes several major limitations of the standard models, eg, the constant stratum effect assumption of the conditional logistic model. We develop 2 methods to calculate the marginal conditional probability. We show the consistency of parameter estimation and the asymptotic equivalence of alternative estimation methods. A simulation study indicates that the proposed model is more efficient and robust than the alternatives. We applied the model to the 100-Car NDS data, a large-scale NDS with 102 participants and 12 months of data collection. The results indicate that cell phone dialing increased the crash/near-crash risk by 2.37 times (odds ratio: 2.37; 95% CI: 1.30-4.30) and drowsiness increased the risk by 33.56 times (odds ratio: 33.56; 95% CI: 21.82-52.19). This paper provides new insight into driver behavior risk and novel analysis strategies for NDS studies.
      PubDate: 2017-12-26T20:55:37.985908-05:00
      DOI: 10.1002/sim.7574
  • A functional supervised learning approach to the study of blood pressure
    • Authors: Georgios I. Papayiannis; Emmanuel A. Giakoumakis, Efstathios D. Manios, Spyros D. Moulopoulos, Kimon S. Stamatelopoulos, Savvas T. Toumanidis, Nikolaos A. Zakopoulos, Athanasios N. Yannacopoulos
      Abstract: In this work, a functional supervised learning scheme is proposed for the classification of subjects into normotensive and hypertensive groups, using solely the 24‐hour blood pressure data and relying on the concepts of Fréchet mean and Fréchet variance for appropriate deformable functional models of blood pressure. The schemes are trained on real clinical data, and their performance is assessed and found to be very satisfactory.
      PubDate: 2017-12-20T00:31:15.88171-05:00
      DOI: 10.1002/sim.7587
  • Innovative modeling of naturalistic driving data: Inference and prediction
    • Authors: Paul S. Albert
      Abstract: Naturalistic driving studies provide opportunities for investigating the effects of key driving exposures on risky driving performance and accidents. New technology provides a realistic assessment of risky driving through the intensive monitoring of kinematic behavior while driving. These studies with their complex data structures provide opportunities for statisticians to develop needed modeling techniques for statistical inference. This article discusses new statistical modeling procedures that were developed to specifically answer important analytical questions for naturalistic driving studies. However, these methodologies also have important applications for the analysis of intensively collected longitudinal data, an increasingly common data structure with the advent of wearable devices. To examine the between‐participant and within‐participant sources of variation in risky driving behavior, we explore the use of generalized linear mixed models with autoregressive random processes to analyze long sequences of kinematic count data from a group of teenagers who have measurements for each trip over a 1.5‐year observation period starting after receiving their license. These models provide a regression framework for examining the effects of driving conditions and exposures on risky driving behavior. Alternatively, generalized estimating equations approaches are explored for the situation where we have intensively collected count measurements on a moderate number of participants. In addition to proposing statistical modeling for kinematic events, we explore models for relating kinematic events with crash risk. Specifically, we propose both latent variable and hidden Markov models for relating these 2 processes and for developing dynamic predictors of crash risk from longitudinal kinematic event data. These different statistical modeling techniques are all used to analyze data from the Naturalistic Teenage Driving Study, a unique investigation into how teenagers drive after licensure.
      PubDate: 2017-12-18T01:21:24.752935-05:00
      DOI: 10.1002/sim.7580
  • Investigation of 2‐stage meta‐analysis methods for joint longitudinal and time‐to‐event data through simulation and real data application
    • Authors: Maria Sudell; Catrin Tudur Smith, François Gueyffier, Ruwanthi Kolamunnage-Dona
      Abstract: Background: Joint modelling of longitudinal and time‐to‐event data is often preferred over separate longitudinal or time‐to‐event analyses as it can account for study dropout, error in longitudinally measured covariates, and correlation between longitudinal and time‐to‐event outcomes. The joint modelling literature focuses mainly on the analysis of single studies, with no methods currently available for the meta‐analysis of joint model estimates from multiple studies. Methods: We propose a 2‐stage method for meta‐analysis of joint model estimates. These methods are applied to the INDANA dataset to combine joint model estimates of systolic blood pressure with time to death, time to myocardial infarction, and time to stroke. Results are compared to meta‐analyses of separate longitudinal or time‐to‐event models. A simulation study is conducted to contrast separate versus joint analyses over a range of scenarios. Results: Using the real dataset, similar results were obtained by the separate and joint analyses. However, the simulation study indicated a benefit of using joint rather than separate methods in a meta‐analytic setting where association exists between the longitudinal and time‐to‐event outcomes. Conclusions: Where evidence of association between longitudinal and time‐to‐event outcomes exists, results from joint models rather than from standalone analyses should be pooled in 2‐stage meta‐analyses.
      PubDate: 2017-12-18T01:00:39.502956-05:00
      DOI: 10.1002/sim.7585
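Stage 2 of a 2-stage meta-analysis reduces each study's joint model to an estimate and standard error, then pools across studies. A generic random-effects (DerSimonian-Laird) pooling step is sketched below with made-up numbers; the paper's actual pooling choices may differ.

```python
import numpy as np

def dersimonian_laird(est, se):
    """Random-effects pooling of per-study estimates
    (stage 2 of a 2-stage meta-analysis)."""
    est, se = np.asarray(est, float), np.asarray(se, float)
    w = 1.0 / se ** 2                      # fixed-effect (inverse-variance) weights
    fixed = np.sum(w * est) / np.sum(w)
    q = np.sum(w * (est - fixed) ** 2)     # Cochran's Q heterogeneity statistic
    df = len(est) - 1
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)          # DL between-study variance estimate
    w_re = 1.0 / (se ** 2 + tau2)          # random-effects weights
    pooled = np.sum(w_re * est) / np.sum(w_re)
    return pooled, np.sqrt(1.0 / np.sum(w_re)), tau2

# Hypothetical per-study association estimates (e.g. joint-model
# longitudinal/time-to-event association parameters) and standard errors.
pooled, se_pooled, tau2 = dersimonian_laird([0.30, 0.45, 0.10, 0.38],
                                            [0.10, 0.12, 0.09, 0.15])
print(pooled, se_pooled, tau2)
```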
  • A multiple‐model generalisation of updating clinical prediction models
    • Authors: Glen P. Martin; Mamas A. Mamas, Niels Peek, Iain Buchan, Matthew Sperrin
      Abstract: There is growing interest in developing clinical prediction models (CPMs) to aid local healthcare decision‐making. Frequently, these CPMs are developed in isolation across different populations, with repetitive de novo derivation a common modelling strategy. However, this fails to utilise all available information and does not respond to changes in health processes through time and space. Alternatively, model updating techniques have previously been proposed that adjust an existing CPM to suit the new population, but these techniques are restricted to a single model. Therefore, we aimed to develop a generalised method for updating and aggregating multiple CPMs. The proposed “hybrid method” re‐calibrates multiple CPMs using stacked regression while concurrently revising specific covariates using individual participant data (IPD) under a penalised likelihood. The performance of the hybrid method was compared with existing methods in a clinical example of mortality risk prediction after transcatheter aortic valve implantation, and in 2 simulation studies. The simulation studies explored the effect of sample size and between‐population heterogeneity on the method, with each representing a situation of having multiple distinct CPMs and 1 set of IPD. When the sample size of the IPD was small, stacked regression and the hybrid method had comparable performance, the highest across the modelling methods. Conversely, in large IPD samples, development of a new model and the hybrid method gave the highest performance. Hence, the proposed strategy can inform the choice between utilising existing CPMs or developing a model de novo, thereby incorporating IPD, existing research, and prior (clinical) knowledge into the modelling strategy.
      PubDate: 2017-12-18T01:00:28.786512-05:00
      DOI: 10.1002/sim.7586
  • Developing points‐based risk‐scoring systems in the presence of competing risks
    • Authors: Peter C. Austin; Douglas S. Lee, Ralph B. D'Agostino, Jason P. Fine
      PubDate: 2017-12-18T00:26:10.805437-05:00
      DOI: 10.1002/sim.7591
  • Dynamic prediction in functional concurrent regression with an application to child growth
    • Authors: Andrew Leroux; Luo Xiao, Ciprian Crainiceanu, William Checkley
      Abstract: In many studies, it is of interest to predict the future trajectory of subjects based on their historical data, referred to as dynamic prediction. Mixed effects models have traditionally been used for dynamic prediction. However, the commonly used random intercept and slope model is often not sufficiently flexible for modeling subject‐specific trajectories. In addition, there may be useful exposures/predictors of interest that are measured concurrently with the outcome, complicating dynamic prediction. To address these problems, we propose a dynamic functional concurrent regression model to handle the case where both the functional response and the functional predictors are irregularly measured. Currently, such a model cannot be fit by existing software. We apply the model to dynamically predict children's length conditional on prior length, weight, and baseline covariates. Inference on model parameters and subject‐specific trajectories is conducted using the mixed effects representation of the proposed model. An extensive simulation study shows that the dynamic functional regression model provides more accurate estimation and inference than existing methods. Methods are supported by fast, flexible, open source software that uses heavily tested smoothing techniques.
      PubDate: 2017-12-11T22:55:37.071622-05:00
      DOI: 10.1002/sim.7582
  • Inverse probability weighting to control confounding in an illness-death model for interval-censored data
    • Authors: Florence Gillaizeau; Thomas Sénage, Florent Le Borgne, Thierry Le Tourneau, Jean-Christian Roussel, Karen Leffondrè, Raphaël Porcher, Bruno Giraudeau, Etienne Dantan, Yohann Foucher
      Abstract: Multistate models with interval-censored data, such as the illness-death model, are still not used to any considerable extent in medical research, despite the significant literature demonstrating their advantages compared to usual survival models. Possible explanations are their uncommon availability in classical statistical software or, when they are available, the limitations of multivariable modelling for taking confounding into consideration. In this paper, we propose a strategy based on propensity scores that allows population causal effects to be estimated: inverse probability weighting in the illness-death semi-Markov model with interval-censored data. Using simulated data, we validated the performance of the proposed approach. We also illustrate the usefulness of the method by an application aiming to evaluate the relationship between the inadequate size of an aortic bioprosthesis and its degeneration and/or patient death. We have updated the R package multistate to facilitate the future use of this method.
      PubDate: 2017-12-04T00:20:26.540447-05:00
      DOI: 10.1002/sim.7550
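The paper applies inverse probability weighting inside an interval-censored illness-death model; that machinery is out of scope here, but the core weighting ingredient can be sketched in a much simpler setting (binary treatment, continuous outcome). The logistic propensity model is fit by hand with Newton-Raphson, and all numbers are simulated; this is illustrative, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

def logistic_fit(X, t, n_iter=50):
    """Logistic regression via Newton-Raphson (IRLS); X includes an intercept column."""
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ b))
        W = p * (1 - p)
        b += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (t - p))
    return b

# Confounded data: x raises both the treatment probability and the outcome.
n = 5000
x = rng.normal(size=n)
t = rng.binomial(1, 1 / (1 + np.exp(-1.5 * x)))
y = 1.0 * t + 2.0 * x + rng.normal(size=n)   # true treatment effect = 1.0

X = np.column_stack([np.ones(n), x])
ps = 1 / (1 + np.exp(-X @ logistic_fit(X, t)))
w = t / ps + (1 - t) / (1 - ps)              # inverse probability of treatment weights

naive = y[t == 1].mean() - y[t == 0].mean()  # confounded comparison
ipw = np.average(y, weights=w * t) - np.average(y, weights=w * (1 - t))
print(naive, ipw)
```

The weighted contrast recovers (approximately) the causal effect that the naive group comparison overstates.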
  • Considerations for analysis of time-to-event outcomes measured with error: Bias and correction with SIMEX
    • Authors: Eric J. Oh; Bryan E. Shepherd, Thomas Lumley, Pamela A. Shaw
      Abstract: For time-to-event outcomes, a rich literature exists on the bias introduced by covariate measurement error in regression models, such as the Cox model, and methods of analysis to address this bias. By comparison, less attention has been given to understanding the impact or addressing errors in the failure time outcome. For many diseases, the timing of an event of interest (such as progression-free survival or time to AIDS progression) can be difficult to assess or reliant on self-report and therefore prone to measurement error. For linear models, it is well known that random errors in the outcome variable do not bias regression estimates. With nonlinear models, however, even random error or misclassification can introduce bias into estimated parameters. We compare the performance of 2 common regression models, the Cox and Weibull models, in the setting of measurement error in the failure time outcome. We introduce an extension of the SIMEX method to correct for bias in hazard ratio estimates from the Cox model and discuss other analysis options to address measurement error in the response. A formula to estimate the bias induced into the hazard ratio by classical measurement error in the event time for a log-linear survival model is presented. Detailed numerical studies are presented to examine the performance of the proposed SIMEX method under varying levels and parametric forms of the error in the outcome. We further illustrate the method with observational data on HIV outcomes from the Vanderbilt Comprehensive Care Clinic.
      PubDate: 2017-11-29T00:55:54.019045-05:00
      DOI: 10.1002/sim.7554
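SIMEX in general works by deliberately adding extra measurement error at several multiples of the known error variance, tracking how the estimate degrades, and extrapolating back to the no-error case. The paper applies this to error in the event time in a Cox model; the sketch below instead uses the classical mismeasured-covariate linear regression setting, purely to show the simulate-then-extrapolate mechanics under assumed error variance.

```python
import numpy as np

rng = np.random.default_rng(7)

n, beta, sigma_u = 4000, 2.0, 0.5
x = rng.normal(size=n)
y = beta * x + rng.normal(scale=0.5, size=n)
w = x + rng.normal(scale=sigma_u, size=n)      # covariate observed with error

def slope(a, b):
    return np.polyfit(a, b, 1)[0]

naive = slope(w, y)                             # attenuated towards 0

# SIMEX step 1: add extra error of variance zeta * sigma_u^2 and
# average the resulting estimates over many replicates.
zetas = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
est = [np.mean([slope(w + rng.normal(scale=np.sqrt(z) * sigma_u, size=n), y)
                for _ in range(200)]) for z in zetas]

# SIMEX step 2: extrapolate the fitted quadratic back to zeta = -1,
# i.e. the hypothetical error-free measurement.
coef = np.polyfit(zetas, est, 2)
simex = np.polyval(coef, -1.0)
print(naive, simex)
```

With attenuation factor 1/(1 + sigma_u^2) = 0.8, the naive slope sits near 1.6; the quadratic extrapolant moves it back towards the true value of 2.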
  • Time, frequency, and time‐varying Granger‐causality measures in neuroscience
    • Abstract: This article proposes a systematic methodological review and an objective criticism of existing methods enabling the derivation of time, frequency, and time‐varying Granger‐causality statistics in neuroscience. The capacity to describe the causal links between signals recorded at different brain locations during a neuroscience experiment is indeed of primary interest for neuroscientists, who often have very precise prior hypotheses about the relationships between recorded brain signals. The increasing interest and the huge number of publications related to this topic call for this systematic review, which describes the very complex methodological aspects underlying the derivation of these statistics. In this article, we first present a general framework that allows us to review and compare Granger‐causality statistics in the time domain, and the link with transfer entropy. Then, the spectral and the time‐varying extensions are exposed and discussed together with their estimation and distributional properties. Although not the focus of this article, partial and conditional Granger causality, dynamical causal modelling, directed transfer function, directed coherence, partial directed coherence, and their variants are also mentioned.
  • Cross‐sectional versus longitudinal designs for function estimation, with an application to cerebral cortex development
    • Abstract: Motivated by studies of the development of the human cerebral cortex, we consider the estimation of a mean growth trajectory and the relative merits of cross‐sectional and longitudinal data for that task. We define a class of relative efficiencies that compare function estimates in terms of aggregate variance of a parametric function estimate. These generalize the classical design effect for estimating a scalar with cross‐sectional versus longitudinal data, and are shown to be bounded above by it in certain cases. Turning to nonparametric function estimation, we find that longitudinal fits may tend to have higher aggregate variance than cross‐sectional ones, but that this may occur because the former have higher effective degrees of freedom reflecting greater sensitivity to subtle features of the estimand. These ideas are illustrated with cortical thickness data from a longitudinal neuroimaging study.
  • Likelihood‐based analysis of outcome‐dependent sampling designs with longitudinal data
    • Abstract: The use of outcome‐dependent sampling with longitudinal data analysis has previously been shown to improve efficiency in the estimation of regression parameters. The motivating scenario is when outcome data exist for all cohort members but key exposure variables will be gathered only on a subset. Inference with outcome‐dependent sampling designs that also incorporates incomplete information from those individuals who did not have their exposure ascertained has been investigated for univariate but not longitudinal outcomes. Therefore, with a continuous longitudinal outcome, we explore the relative contributions of various sources of information toward the estimation of key regression parameters using a likelihood framework. We evaluate the efficiency gains that alternative estimators might offer over random sampling, and we offer insight into their relative merits in select practical scenarios. Finally, we illustrate the potential impact of design and analysis choices using data from the Cystic Fibrosis Foundation Patient Registry.
  • Causal mediation analysis with multiple mediators in the presence of treatment noncompliance
    • Abstract: Randomized experiments are often complicated because of treatment noncompliance. This challenge prevents researchers from identifying the mediated portion of the intention‐to‐treat (ITT) effect, which is the effect of the assigned treatment that is attributed to a mediator. One solution suggests identifying the mediated ITT effect on the basis of the average causal mediation effect among compliers when there is a single mediator. However, considering the complex nature of the mediating mechanisms, it is natural to assume that there are multiple variables that mediate through the causal path. Motivated by an empirical analysis of a data set collected in a randomized interventional study, we develop a method to estimate the mediated portion of the ITT effect when both multiple dependent mediators and treatment noncompliance exist. This enables researchers to make an informed decision on how to strengthen the intervention effect by identifying relevant mediators despite treatment noncompliance. We propose a nonparametric estimation procedure and provide a sensitivity analysis for key assumptions. We conduct a Monte Carlo simulation study to assess the finite sample performance of the proposed approach. The proposed method is illustrated by an empirical analysis of JOBS II data, in which a job training intervention was used to prevent mental health deterioration among unemployed individuals.
  • Discrimination surfaces with application to region‐specific brain asymmetry analysis
    • Abstract: Discrimination surfaces are here introduced as a diagnostic tool for localizing brain regions where discrimination between diseased and nondiseased participants is higher. To estimate discrimination surfaces, we introduce a Mann‐Whitney type of statistic for random fields and present large‐sample results characterizing its asymptotic behavior. Simulation results demonstrate that our estimator accurately recovers the true surface and corresponding interval of maximal discrimination. The empirical analysis suggests that in the anterior region of the brain, schizophrenic patients tend to present lower local asymmetry scores in comparison with participants in the control group.
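At a single brain location, the Mann‐Whitney type of statistic described in the abstract reduces to the familiar two-sample AUC, P(X > Y) + 0.5 P(X = Y). A pointwise sketch with simulated scores (not real asymmetry data):

```python
import numpy as np

def mann_whitney_auc(x, y):
    """Estimate P(X > Y) + 0.5 * P(X = Y): the Mann-Whitney / AUC
    measure of discrimination between two samples."""
    x, y = np.asarray(x), np.asarray(y)
    gt = (x[:, None] > y[None, :]).mean()
    eq = (x[:, None] == y[None, :]).mean()
    return gt + 0.5 * eq

rng = np.random.default_rng(3)
controls = rng.normal(0.0, 1.0, size=300)
patients = rng.normal(-0.8, 1.0, size=300)   # lower asymmetry scores, as in the abstract
auc = mann_whitney_auc(controls, patients)
print(auc)
```

A discrimination surface evaluates this quantity at every location of a random field and looks for regions where it departs furthest from the chance value 0.5.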
  • Some methods for heterogeneous treatment effect estimation in high dimensions
    • Abstract: When devising a course of treatment for a patient, doctors often have little quantitative evidence on which to base their decisions, beyond their medical education and published clinical trials. Stanford Health Care alone has millions of electronic medical records that are only just recently being leveraged to inform better treatment recommendations. These data present a unique challenge because they are high dimensional and observational. Our goal is to make personalized treatment recommendations based on the outcomes for past patients similar to a new patient. We propose and analyze 3 methods for estimating heterogeneous treatment effects using observational data. Our methods perform well in simulations using a wide variety of treatment effect functions, and we present results of applying the 2 most promising methods to data from The SPRINT Data Analysis Challenge, from a large randomized trial of a treatment for high blood pressure.
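One simple member of the family of heterogeneous-treatment-effect estimators is the "T-learner": fit an outcome model separately in each treatment arm and difference the predictions. This is an editor's sketch for orientation, with linear outcome models and simulated data, and is not necessarily one of the 3 methods the paper proposes.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 2000, 5
X = rng.normal(size=(n, p))
t = rng.binomial(1, 0.5, size=n)
tau = 1.0 + X[:, 0]                     # heterogeneous treatment effect
y = X @ np.ones(p) + t * tau + rng.normal(size=n)

def ols_predict(Xtr, ytr, Xte):
    """Fit OLS with intercept on (Xtr, ytr), predict at Xte."""
    A = np.column_stack([np.ones(len(Xtr)), Xtr])
    b = np.linalg.lstsq(A, ytr, rcond=None)[0]
    return np.column_stack([np.ones(len(Xte)), Xte]) @ b

# T-learner: model the outcome separately in each arm, difference the predictions.
mu1 = ols_predict(X[t == 1], y[t == 1], X)
mu0 = ols_predict(X[t == 0], y[t == 0], X)
cate = mu1 - mu0                        # estimated conditional treatment effects

print(np.corrcoef(cate, tau)[0, 1])
```

With observational (non-randomized) data, as in the abstract's setting, such outcome models must additionally adjust for confounding, which this sketch omits.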
  • Assessing the performance of the generalized propensity score for estimating the effect of quantitative or continuous exposures on binary outcomes
    • Abstract: Propensity score methods are increasingly being used to estimate the effects of treatments and exposures when using observational data. The propensity score was initially developed for use with binary exposures. The generalized propensity score (GPS) is an extension of the propensity score for use with quantitative or continuous exposures (eg, dose or quantity of medication, income, or years of education). We used Monte Carlo simulations to examine the performance of different methods of using the GPS to estimate the effect of continuous exposures on binary outcomes. We examined covariate adjustment using the GPS and weighting using weights based on the inverse of the GPS. We examined both the use of ordinary least squares to estimate the propensity function and the use of the covariate balancing propensity score algorithm. The use of methods based on the GPS was compared with the use of G‐computation. All methods resulted in essentially unbiased estimation of the population dose‐response function. However, GPS‐based weighting tended to result in estimates that displayed greater variability and had higher mean squared error when the magnitude of confounding was strong. Of the methods based on the GPS, covariate adjustment using the GPS tended to result in estimates with lower variability and mean squared error when the magnitude of confounding was strong. We illustrate the application of these methods by estimating the effect of average neighborhood income on the probability of death within 1 year of hospitalization for an acute myocardial infarction.
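A sketch of the weighting variant compared in the abstract: model the exposure given covariates, form the generalized propensity score as the estimated conditional density of the observed exposure, and use stabilized inverse-GPS weights in a marginal outcome regression. For simplicity the outcome here is continuous rather than binary, and all data are simulated; this is illustrative, not the paper's simulation design.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 3000
x = rng.normal(size=n)                        # confounder
a = 0.8 * x + rng.normal(size=n)              # continuous exposure
y = 0.5 * a + 1.5 * x + rng.normal(size=n)    # true exposure effect = 0.5

def normal_pdf(z, mean, var):
    return np.exp(-(z - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# Step 1: OLS exposure model; the GPS is the estimated conditional
# density of the observed exposure given the covariates.
g = np.polyfit(x, a, 1)
resid = a - np.polyval(g, x)
gps = normal_pdf(a, np.polyval(g, x), resid.var())

# Stabilized weights: marginal exposure density over the GPS.
w = normal_pdf(a, a.mean(), a.var()) / gps

# Step 2: weighted regression of the outcome on the exposure alone.
A = np.column_stack([np.ones(n), a])
est = np.linalg.lstsq(A * np.sqrt(w)[:, None], y * np.sqrt(w), rcond=None)[0][1]

naive = np.polyfit(a, y, 1)[0]                # confounded slope
print(naive, est)
```

As the abstract notes, such GPS-based weights can become unstable when confounding is strong, which is one reason covariate adjustment using the GPS performed better in some of the authors' scenarios.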
  • Issue Information
    • Abstract: No abstract is available for this article.
  • New research strategy with ambiguous implications: A comment on “Planning future studies based on the conditional power of a
  • A 3‐level Bayesian mixed effects location scale model with an application to ecological momentary assessment data
    • Abstract: Ecological momentary assessment studies usually produce intensively measured longitudinal data with large numbers of observations per unit, and research interest is often centered around understanding the changes in variation of people's thoughts, emotions and behaviors. Hedeker et al developed a 2‐level mixed effects location scale model that allows observed covariates as well as unobserved variables to influence both the mean and the within‐subjects variance, for a 2‐level data structure where observations are nested within subjects. In some ecological momentary assessment studies, subjects are measured at multiple waves, and within each wave, subjects are measured over time. Li and Hedeker extended the original 2‐level model to a 3‐level data structure where observations are nested within days and days are then nested within subjects, by including a random location and scale intercept at the intermediate wave level. However, the 3‐level random intercept model assumes constant response change rate for both the mean and variance. To account for changes in variance across waves, as well as clustering attributable to waves, we propose a more comprehensive location scale model that allows subject heterogeneity at baseline as well as across different waves, for a 3‐level data structure where observations are nested within waves and waves are then further nested within subjects. The model parameters are estimated using Markov chain Monte Carlo methods. We provide details on the Bayesian estimation approach and demonstrate how the Stan statistical software can be used to sample from the desired distributions and achieve consistent estimates. The proposed model is validated via a series of simulation studies. Data from an adolescent smoking study are analyzed to demonstrate this approach. The analyses clearly favor the proposed model and show significant subject heterogeneity at baseline as well as change over time, for both mood mean and variance. 
The proposed 3‐level location scale model can be widely applied to areas of research where the interest lies in the consistency in addition to the mean level of the responses.
  • Estimating the effect of a rare time‐dependent treatment on the
           recurrent event rate
    • Abstract: In many observational studies, the objective is to estimate the effect of treatment or state‐change on the recurrent event rate. If treatment is assigned after the start of follow‐up, traditional methods (eg, adjustment for baseline‐only covariates or fully conditional adjustment for time‐dependent covariates) may give biased results. We propose a two‐stage modeling approach using the method of sequential stratification to accurately estimate the effect of a time‐dependent treatment on the recurrent event rate. At the first stage, we estimate the pretreatment recurrent event trajectory using a proportional rates model censored at the time of treatment. Prognostic scores are estimated from the linear predictor of this model and used to match treated patients to as yet untreated controls based on prognostic score at the time of treatment for the index patient. The final model is stratified on matched sets and compares the posttreatment recurrent event rate to the recurrent event rate of the matched controls. We demonstrate through simulation that bias due to dependent censoring is negligible, provided the treatment frequency is low, and we investigate a threshold at which correction for dependent censoring is needed. The method is applied to liver transplant (LT), where we estimate the effect of development of post‐LT End Stage Renal Disease (ESRD) on rate of days hospitalized.
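One sequential-stratification step, matching a newly treated patient to the nearest as-yet-untreated controls by prognostic score, might be sketched as follows (hypothetical names; calipers and matching-with-replacement rules are omitted):

```python
import numpy as np

def match_at_treatment_time(treated_score, control_scores, n_controls=2):
    """Match one newly treated patient to the nearest not-yet-treated
    controls by prognostic score (one sequential-stratification step;
    caliper and replacement rules are omitted in this sketch)."""
    control_scores = np.asarray(control_scores, dtype=float)
    order = np.argsort(np.abs(control_scores - treated_score))
    return order[:n_controls]

# A patient treated with prognostic score 0.50 is matched to the two
# closest as-yet-untreated controls.
idx = match_at_treatment_time(0.50, [0.10, 0.48, 0.90, 0.55])
```

The final stratified recurrent-event model would then use these matched sets as strata.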
  • Estimation of age effect with change‐points on survival of cancer patients
    • Abstract: There is a global trend that the average onset age of many human complex diseases is decreasing, and the age of cancer patients is becoming more spread out. The age effect on survival is nonlinear in practice and may have one or more important change‐points at which the trend of the effect can be very different before and after these threshold ages. Identification of these change‐points allows clinical researchers to understand the biologic basis for the complex relation between age and prognosis for optimal prognostic decisions. This paper considers estimation of the potentially nonlinear age effect for general partly linear survival models to ensure a valid statistical inference on the treatment effect. A simple and efficient sieve maximum likelihood estimation method that can be implemented easily using standard statistical software is proposed. A data‐driven adaptive algorithm to determine the optimal location and the number of knots for the identification of the change‐points is suggested. Simulation studies are performed to study the performance of the proposed method. For illustration purposes, the method is applied to a breast cancer data set from the public domain to investigate the effect of onset age on the disease‐free survival of the patients. The results revealed that the risk is highest among young patients and young postmenopausal patients, probably because of a change in hormonal environment during a certain phase of menopause.
  • Variable selection with group structure in competing risks quantile regression
    • Abstract: We study the group bridge and the adaptive group bridge penalties for competing risks quantile regression with group variables. While the group bridge consistently identifies nonzero group variables, the adaptive group bridge consistently selects variables not only at group level but also at within‐group level. We allow the number of covariates to diverge as the sample size increases. The oracle property for both methods is also studied. The performance of the group bridge and the adaptive group bridge is compared in simulation and in a real data analysis. The simulation study shows that the adaptive group bridge selects nonzero within‐group variables more consistently than the group bridge. A bone marrow transplant study is provided as an example.
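For readers unfamiliar with the penalty itself, a small sketch of the group bridge penalty in its usual form, lam · Σ_g c_g (Σ_{j∈g} |β_j|)^γ with 0 < γ < 1 (the default group weights c_g are a common choice, not taken from the paper):

```python
import numpy as np

def group_bridge_penalty(beta, groups, lam, gamma=0.5, weights=None):
    """Group bridge penalty: lam * sum_g c_g * (sum_{j in g} |beta_j|) ** gamma.

    `groups` is a list of index arrays; the group weights c_g default to
    group size raised to gamma, a common convention."""
    beta = np.asarray(beta, dtype=float)
    if weights is None:
        weights = [len(g) ** gamma for g in groups]
    return lam * sum(c * np.abs(beta[g]).sum() ** gamma
                     for c, g in zip(weights, groups))

# With gamma < 1 the penalty is concave in the group L1 norms, which is
# what lets entire groups (and, for the adaptive version, individual
# within-group coefficients) be shrunk exactly to zero.
pen = group_bridge_penalty([1.0, -1.0, 0.0, 2.0],
                           groups=[np.array([0, 1]), np.array([2, 3])],
                           lam=1.0, gamma=0.5)
```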
  • Spatiotemporal incidence rate data analysis by nonparametric regression
    • Abstract: To monitor the incidence rates of cancers, AIDS, cardiovascular diseases, and other chronic or infectious diseases, some global, national, and regional reporting systems have been built to collect/provide population‐based data about the disease incidence. Such databases usually report daily, monthly, or yearly disease incidence numbers at the city, county, state, or country level, and the disease incidence numbers collected at different places and different times are often correlated, with the ones closer in place or time being more correlated. The correlation reflects the impact of various confounding risk factors, such as weather, demographic factors, lifestyles, and other cultural and environmental factors. Because such impact is complicated and challenging to describe, the spatiotemporal (ST) correlation in the observed disease incidence data has complicated ST structure as well. Furthermore, the ST correlation is hidden in the observed data and cannot be observed directly. In the literature, there has been some discussion about ST data modeling. But, the existing methods either impose various restrictive assumptions on the ST correlation that are hard to justify, or ignore partially or entirely the ST correlation. This paper aims to develop a flexible and effective method for ST disease incidence data modeling, using nonparametric local smoothing methods. This method can properly accommodate the ST data correlation. Theoretical justifications and numerical studies show that it works well in practice.
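The local-smoothing building block behind such methods can be illustrated in one dimension with a Nadaraya-Watson smoother (an illustrative sketch only; the paper's method is spatiotemporal and properly accommodates the ST correlation):

```python
import numpy as np

def nadaraya_watson(t_obs, y_obs, t_grid, bandwidth):
    """Local constant (Nadaraya-Watson) smoother with a Gaussian kernel."""
    t_obs = np.asarray(t_obs, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    out = np.empty(len(t_grid))
    for i, t in enumerate(t_grid):
        w = np.exp(-0.5 * ((t_obs - t) / bandwidth) ** 2)
        out[i] = np.sum(w * y_obs) / np.sum(w)
    return out

# A noisy incidence-like series: smoothing should recover the slow trend.
rng = np.random.default_rng(1)
t = np.linspace(0, 10, 200)
y = np.sin(t) + rng.normal(scale=0.3, size=t.size)
fit = nadaraya_watson(t, y, t, bandwidth=0.5)
```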
  • Estimating recurrence and incidence of preterm birth subject to
           measurement error in gestational age: A hidden Markov modeling approach
    • Abstract: Prediction of preterm birth as well as characterizing the etiological factors affecting both the recurrence and incidence of preterm birth (defined as gestational age at birth ≤ 37 wk) are important problems in obstetrics. The National Institute of Child Health and Human Development (NICHD) consecutive pregnancy study recently examined this question by collecting data on a cohort of women with at least 2 pregnancies over a fixed time interval. Unfortunately, measurement error due to the dating of conception may induce sizable error in computing gestational age at birth. This article proposes a flexible approach that accounts for measurement error in gestational age when making inference. The proposed approach is a hidden Markov model that accounts for measurement error in gestational age by exploiting the relationship between gestational age at birth and birth weight. We initially model the measurement error as being normally distributed, followed by a mixture of normals that has been proposed on the basis of biological considerations. We examine the asymptotic bias of the proposed approach when measurement error is ignored and also compare the efficiency of this approach to a simpler hidden Markov model formulation where only gestational age and not birth weight is incorporated. The proposed model is compared with alternative models for estimating important covariate effects on the risk of subsequent preterm birth using a unique set of data from the NICHD consecutive pregnancy study.
  • Identifying optimal dosage regimes under safety constraints: An
           application to long term opioid treatment of chronic pain
    • Abstract: There is growing interest and investment in precision medicine as a means to provide the best possible health care. A treatment regime formalizes precision medicine as a sequence of decision rules, one per clinical intervention period, that specify if, when and how current treatment should be adjusted in response to a patient's evolving health status. It is standard to define a regime as optimal if, when applied to a population of interest, it maximizes the mean of some desirable clinical outcome, such as efficacy. However, in many clinical settings, a high‐quality treatment regime must balance multiple competing outcomes; eg, when a high dose is associated with substantial symptom reduction but a greater risk of an adverse event. We consider the problem of estimating the most efficacious treatment regime subject to constraints on the risk of adverse events. We combine nonparametric Q‐learning with policy‐search to estimate a high‐quality yet parsimonious treatment regime. This estimator applies to both observational and randomized data, as well as settings with variable, outcome‐dependent follow‐up, mixed treatment types, and multiple time points. This work is motivated by and framed in the context of dosing for chronic pain; however, the proposed framework can be applied generally to estimate a treatment regime which maximizes the mean of one primary outcome subject to constraints on one or more secondary outcomes. We illustrate the proposed method using data pooled from 5 open‐label flexible dosing clinical trials for chronic pain.
  • Sensitivity analysis for unobserved confounding of direct and indirect
           effects using uncertainty intervals
    • Abstract: To estimate direct and indirect effects of an exposure on an outcome from observed data, strong assumptions about unconfoundedness are required. Since these assumptions cannot be tested using the observed data, a mediation analysis should always be accompanied by a sensitivity analysis of the resulting estimates. In this article, we propose a sensitivity analysis method for parametric estimation of direct and indirect effects when the exposure, mediator, and outcome are all binary. The sensitivity parameters consist of the correlations between the error terms of the exposure, mediator, and outcome models. These correlations are incorporated into the estimation of the model parameters and identification sets are then obtained for the direct and indirect effects for a range of plausible correlation values. We take the sampling variability into account through the construction of uncertainty intervals. The proposed method is able to assess sensitivity to both mediator‐outcome confounding and confounding involving the exposure. To illustrate the method, we apply it to a mediation study based on the data from the Swedish Stroke Register (Riksstroke). An R package that implements the proposed method is available.
  • Sequential parallel comparison design with binary and
           time‐to‐event outcomes
    • Abstract: Sequential parallel comparison design (SPCD) has been proposed to increase the likelihood of success of clinical trials, especially trials with a possibly high placebo effect. Sequential parallel comparison design is conducted with 2 stages. Participants are randomized between active therapy and placebo in stage 1. Then, stage 1 placebo nonresponders are rerandomized between active therapy and placebo. Data from the 2 stages are pooled to yield a single P value. We consider SPCD with binary and with time‐to‐event outcomes. For time‐to‐event outcomes, response is defined as a favorable event prior to the end of follow‐up for a given stage of SPCD. We show that for these cases, the usual test statistics from stages 1 and 2 are asymptotically normal and uncorrelated under the null hypothesis, leading to a straightforward combined testing procedure. In addition, we show that the estimators of the treatment effects from the 2 stages are asymptotically normal and uncorrelated under the null and alternative hypotheses, yielding confidence interval procedures with correct coverage. Simulations and real data analysis demonstrate the utility of the binary and time‐to‐event SPCD.
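A combined testing procedure of the kind described, pooling two asymptotically independent stage z-statistics with prespecified weights, might be sketched as follows (the weight w here is a hypothetical choice, not one proposed in the paper):

```python
from statistics import NormalDist

def spcd_combined_pvalue(z1, z2, w=0.6):
    """One-sided p-value from pooling the two stage test statistics, which
    are asymptotically normal and uncorrelated under the null; the weighted
    sum is rescaled to unit variance."""
    z = (w * z1 + (1.0 - w) * z2) / ((w ** 2 + (1.0 - w) ** 2) ** 0.5)
    return 1.0 - NormalDist().cdf(z)

p_combined = spcd_combined_pvalue(2.0, 1.0)
```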
  • A Bayesian semiparametric Markov regression model for juvenile
    • Abstract: Juvenile dermatomyositis (JDM) is a rare autoimmune disease that may lead to serious complications, even to death. We develop a 2‐state Markov regression model in a Bayesian framework to characterise disease progression in JDM over time and gain a better understanding of the factors influencing disease risk. The transition probabilities between disease and remission state (and vice versa) are a function of time‐homogeneous and time‐varying covariates. These latter types of covariates are introduced in the model through a latent health state function, which describes patient‐specific health over time and accounts for variability among patients. We assume a nonparametric prior based on the Dirichlet process to model the health state function and the baseline transition intensities between disease and remission state and vice versa. The Dirichlet process induces a clustering of the patients in homogeneous risk groups. To highlight clinical variables that most affect the transition probabilities, we perform variable selection using spike and slab prior distributions. Posterior inference is performed through Markov chain Monte Carlo methods. Data were made available from the UK JDM Cohort and Biomarker Study and Repository, hosted at the UCL Institute of Child Health.
  • Multilevel moderated mediation model with ordinal outcome
    • Abstract: Although increasingly complex models have been proposed in the mediation literature, no model or software jointly incorporates the multiple possible generalizations of the simple mediation model. We propose a flexible moderated mediation model allowing for (1) a hierarchical structure of clustered data, (2) multiple, possibly correlated mediators, and (3) an ordinal outcome. The motivating data set is obtained from a European study in nursing research. Patients' willingness to recommend their treating hospital was recorded in an ordinal way. The research question is whether such recommendation directly depends on system‐level features in the organization of nursing care, or whether these associations are mediated by 2 measurements of nursing care left undone and possibly moderated by nurse education. We have developed a Bayesian approach and accompanying program that takes all the above generalizations into account.
  • Controlling the type I error rate in two‐stage sequential adaptive
           designs when testing for average bioequivalence
    • Abstract: In a 2×2 crossover trial for establishing average bioequivalence (ABE) of a generic agent and a currently marketed drug, the recommended approach to hypothesis testing is the two one‐sided test (TOST) procedure, which depends, among other things, on the estimated within‐subject variability. The power of this procedure, and therefore the sample size required to achieve a minimum power, depends on having a good estimate of this variability. When there is uncertainty, it is advisable to plan the design in two stages, with an interim sample size reestimation after the first stage, using an interim estimate of the within‐subject variability. One method and 3 variations of doing this were proposed by Potvin et al. Using simulation, the operating characteristics, including the empirical type I error rate, of the 4 variations (called Methods A, B, C, and D) were assessed by Potvin et al and Methods B and C were recommended. However, none of these 4 variations formally controls the type I error rate of falsely claiming ABE, even though the amount of inflation produced by Method C was considered acceptable. A major disadvantage of assessing type I error rate inflation using simulation is that unless all possible scenarios for the intended design and analysis are investigated, it is impossible to be sure that the type I error rate is controlled. Here, we propose an alternative, principled method of sample size reestimation that is guaranteed to control the type I error rate at any given significance level. This method uses a new version of the inverse‐normal combination of p‐values test, in conjunction with standard group sequential techniques, that is more robust to large deviations in initial assumptions regarding the variability of the pharmacokinetic endpoints. The sample size reestimation step is based on significance levels and power requirements that are conditional on the first‐stage results. 
This necessitates a discussion and exploitation of the peculiar properties of the power curve of the TOST testing procedure. We illustrate our approach with an example based on a real ABE study and compare the operating characteristics of our proposed method with those of Method B of Potvin et al.
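The standard inverse-normal combination of independent stage-wise p-values, which the proposed method builds on, can be sketched as follows (equal weights are assumed here for illustration; this is not the authors' full procedure):

```python
from statistics import NormalDist

def inverse_normal_combination(p1, p2, w1=0.5 ** 0.5):
    """Inverse-normal combination of two independent one-sided p-values;
    the weights satisfy w1**2 + w2**2 = 1."""
    nd = NormalDist()
    w2 = (1.0 - w1 ** 2) ** 0.5
    z = w1 * nd.inv_cdf(1.0 - p1) + w2 * nd.inv_cdf(1.0 - p2)
    return 1.0 - nd.cdf(z)

# In the TOST setting the combination is applied to each one-sided
# hypothesis separately; ABE is declared only if both combined tests
# reject at the chosen significance level.
p = inverse_normal_combination(0.02, 0.04)
```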
  • Simultaneous small‐sample comparisons in longitudinal or
           multi‐endpoint trials using multiple marginal models
    • Abstract: Simultaneous inference in longitudinal, repeated‐measures, and multi‐endpoint designs can be onerous, especially when trying to find a reasonable joint model from which the interesting effects and covariances are estimated. A novel statistical approach known as multiple marginal models greatly simplifies the modelling process: the core idea is to “marginalise” the problem and fit multiple small models to different portions of the data, and then estimate the overall covariance matrix in a subsequent, separate step. Using these estimates guarantees strong control of the family‐wise error rate, though only asymptotically. In this paper, we show how to make the approach also applicable to small‐sample data problems. Specifically, we discuss the computation of adjusted P values and simultaneous confidence bounds for comparisons of randomised treatment groups as well as for levels of a nonrandomised factor such as multiple endpoints, repeated measures, or a series of points in time or space. We illustrate the practical use of the method with a data example.
  • Evaluation of biomarkers for treatment selection using individual
           participant data from multiple clinical trials
    • Abstract: Biomarkers that predict treatment effects may be used to guide treatment decisions, thus improving patient outcomes. A meta‐analysis of individual participant data (IPD) is potentially more powerful than a single‐study data analysis in evaluating markers for treatment selection. Our study was motivated by the IPD that were collected from 2 randomized controlled trials of hypertension and preeclampsia among pregnant women to evaluate the effect of labor induction over expectant management of the pregnancy in preventing progression to severe maternal disease. The existing literature on statistical methods for biomarker evaluation in IPD meta‐analysis has evaluated a marker's performance in terms of its ability to predict risk of disease outcome, an approach that does not directly apply to the treatment selection problem. In this study, we propose a statistical framework for evaluating a marker for treatment selection given IPD from a small number of individual clinical trials. We derive marker‐based treatment rules by minimizing the average expected outcome across studies. The application of the proposed methods to the IPD from 2 studies in women with hypertension in pregnancy is presented.
  • A threshold‐free summary index of prediction accuracy for censored
           time to event data
    • Abstract: Prediction performance of a risk scoring system needs to be carefully assessed before its adoption in clinical practice. Clinical preventive care often uses risk scores to screen asymptomatic populations. The primary clinical interest is to predict the risk of having an event by a prespecified future time t0. Accuracy measures such as positive predictive values have been recommended for evaluating the predictive performance. However, for commonly used continuous or ordinal risk score systems, these measures require a subjective cutoff threshold value that dichotomizes the risk scores. The need for a cutoff value creates barriers for practitioners and researchers. In this paper, we propose a threshold‐free summary index of positive predictive values that accommodates time‐dependent event status and competing risks. We develop a nonparametric estimator and provide an inference procedure for comparing this summary measure between 2 risk scores for censored time to event data. We conduct a simulation study to examine the finite‐sample performance of the proposed estimation and inference procedures. Lastly, we illustrate the use of this measure on a real data example, comparing 2 risk score systems for predicting heart failure in childhood cancer survivors.
  • A Bayesian confirmatory factor model for multivariate observations in the
           form of two‐way tables of data
    • Abstract: Researchers collected multiple measurements on patients with schizophrenia and their relatives, as well as control subjects and their relatives, to study vulnerability factors for schizophrenics and their near relatives. Observations across individuals from the same family are correlated, and also the multiple outcome measures on the same individuals are correlated. Traditional data analyses model outcomes separately and thus do not provide information about the interrelationships among outcomes. We propose a novel Bayesian family factor model (BFFM), which extends the classical confirmatory factor analysis model to explain the correlations among observed variables using a combination of family‐member and outcome factors. Traditional methods for fitting confirmatory factor analysis models, such as full‐information maximum likelihood (FIML) estimation using quasi‐Newton optimization (QNO), can have convergence problems and Heywood cases (lack of convergence) caused by empirical underidentification. In contrast, modern Bayesian Markov chain Monte Carlo handles these inference problems easily. Simulations compare the BFFM to FIML‐QNO in settings where the true covariance matrix is identified, close to not identified, and not identified. For these settings, FIML‐QNO fails to fit the data in 13%, 57%, and 85% of the cases, respectively, while MCMC provides stable estimates. When both methods successfully fit the data, estimates from the BFFM have smaller variances and comparable mean‐squared errors. We illustrate the BFFM by analyzing data from schizophrenics and their family members.
  • Testing causal effects in observational survival data using propensity
           score matching design
    • Abstract: Time‐to‐event data are very common in observational studies. Unlike randomized experiments, observational studies suffer from both observed and unobserved confounding biases. To adjust for observed confounding in survival analysis, the commonly used methods are the Cox proportional hazards (PH) model, the weighted logrank test, and the inverse probability of treatment weighted Cox PH model. These methods do not rely on fully parametric models, but their practical performances are highly influenced by the validity of the PH assumption. Also, there are few methods addressing the hidden bias in causal survival analysis. We propose a strategy to test for survival function differences based on the matching design and explore sensitivity of the P‐values to assumptions about unmeasured confounding. Specifically, we apply the paired Prentice‐Wilcoxon (PPW) test or the modified PPW test to the propensity score matched data. Simulation studies show that the PPW‐type test has higher power in situations when the PH assumption fails. For potential hidden bias, we develop a sensitivity analysis based on the matched pairs to assess the robustness of our finding, following Rosenbaum's idea for nonsurvival data. For a real data illustration, we apply our method to an observational cohort of chronic liver disease patients from a Mayo Clinic study. The PPW test based on observed data initially shows evidence of a significant treatment effect. But this finding is not robust, as the sensitivity analysis reveals that the P‐value becomes nonsignificant if there exists an unmeasured confounder with a small impact.
  • Subgroup identification in dose‐finding trials via model‐based
           recursive partitioning
    • Abstract: An important task in early‐phase drug development is to identify patients who respond better or worse to an experimental treatment. While a variety of different subgroup identification methods have been developed for the situation of randomized clinical trials that study an experimental treatment and control, much less work has been done in the situation when patients are randomized to different dose groups. In this article, we propose new strategies to perform subgroup analyses in dose‐finding trials and discuss the challenges that arise in this new setting. We consider model‐based recursive partitioning, which has recently been applied to subgroup identification in 2‐arm trials, as a promising method to tackle these challenges and assess its viability using a real trial example and simulations. Our results show that model‐based recursive partitioning can be used to identify subgroups of patients with different dose‐response curves and improves estimation of treatment effects and minimum effective doses compared to models ignoring possible subgroups, when heterogeneity among patients is present.
  • Optimizing performance of BreastScreen Norway using value of information
           in graphical models
    • Abstract: This study proposes a method to optimize the performance of BreastScreen Norway through a stratified recommendation of tests including independent double or single reading of the screening mammograms and additional imaging with or without core needle biopsy. This is carefully evaluated by a value of information analysis. An estimated graphical probabilistic model describing the relationship between a set of risk factors and the corresponding risk of breast cancer is used for this analysis, together with a Bayesian network modeling screening test results conditional on the true (but unknown) breast cancer status of a woman. This study contributes towards evaluating the possibility of improving the efficiency of the screening program, where all women aged 50 to 69 are invited every second year, regardless of individual risk factors. Our stratified recommendation of tests depends on the probability that an asymptomatic woman has developed breast cancer at the time she is invited to a screening.
  • Sample size evaluation for a multiply matched case‐control study using
           the score test from a conditional logistic (discrete Cox PH) regression
  • A recursive partitioning approach for subgroup identification in
           individual patient data meta‐analysis
    • Abstract: Background: Motivated by the setting of clinical trials in low back pain, this work investigated statistical methods to identify patient subgroups for which there is a large treatment effect (treatment by subgroup interaction). Statistical tests for interaction are often underpowered. Individual patient data (IPD) meta‐analyses provide a framework with improved statistical power to investigate subgroups. However, conventional approaches to subgroup analyses applied in both a single trial setting and an IPD setting have a number of issues, one of them being that factors used to define subgroups are investigated one at a time. As individuals have multiple characteristics that may be related to response to treatment, alternative exploratory statistical methods are required. Methods: Tree‐based methods are a promising alternative that systematically search the covariate space to identify subgroups defined by multiple characteristics. One tree method in particular, SIDES, is described and extended for application in an IPD meta‐analysis setting by incorporating fixed‐effects and random‐effects models to account for between‐trial variation. The performance of the proposed extension was assessed using simulation studies. The proposed method was then applied to an IPD low back pain dataset. Results: The simulation studies found that the extended IPD‐SIDES method performed well in detecting subgroups especially in the presence of large between‐trial variation. The IPD‐SIDES method identified subgroups with enhanced treatment effect when applied to the low back pain data. Conclusions: This work proposes an exploratory statistical approach for subgroup analyses applicable in any research discipline where subgroup analyses in an IPD meta‐analysis setting are of interest.
  • Flexible multistate models for interval‐censored data: Specification,
           estimation, and an application to ageing research
    • Abstract: Continuous‐time multistate survival models can be used to describe health‐related processes over time. In the presence of interval‐censored times for transitions between the living states, the likelihood is constructed using transition probabilities. Models can be specified using parametric or semiparametric shapes for the hazards. Semiparametric hazards can be fitted using P‐splines and penalised maximum likelihood estimation. This paper presents a method to estimate flexible multistate models that allow for parametric and semiparametric hazard specifications. The estimation is based on a scoring algorithm. The method is illustrated with data from the English Longitudinal Study of Ageing.
  • Morning surge in blood pressure using a random‐effects
           multiple‐component cosinor model
    • Abstract: Blood pressure (BP) fluctuates throughout the day. The pattern it follows represents one of the most important circadian rhythms in the human body. For example, morning BP surge has been suggested as a potential risk factor for cardiovascular events occurring in the morning, but the accurate quantification of this phenomenon remains a challenge. Here, we outline a novel method to quantify morning surge. We demonstrate how the most commonly used method to model 24‐hour BP, the single cosinor approach, can be extended to a multiple‐component cosinor random‐effects model. We outline how this model can be used to obtain a measure of morning BP surge by obtaining derivatives of the model fit. The model is compared with a functional principal component analysis that determines the main components of variability in the data. Data from the Mitchelstown Study, a population‐based study of Irish adults (n = 2047), were used where a subsample (1207) underwent 24‐hour ambulatory blood pressure monitoring. We demonstrate that our 2‐component model provided a significant improvement in fit compared with a single‐component model and a similar fit to a more complex model captured by B‐splines using functional principal component analysis. The estimate of the average maximum slope was 2.857 mmHg/30 min (bootstrap estimates; 95% CI: 2.855‐2.858 mmHg/30 min). Simulation results allowed us to quantify the between‐individual SD in maximum slopes, which was 1.02 mmHg/30 min. By obtaining derivatives we have demonstrated a novel approach to quantify morning BP surge and its variation between individuals. This is the first demonstration of a cosinor approach to obtaining a measure of morning surge.
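A fixed-effects version of the 2-component cosinor fit can be sketched by least squares (simulated data for illustration; the paper's model additionally includes random effects, and all names here are invented):

```python
import numpy as np

def cosinor_design(t_hours, n_components=2, period=24.0):
    """Design matrix with an intercept plus cos/sin pairs at harmonics
    of the base period (24 h and 12 h for two components)."""
    cols = [np.ones_like(t_hours)]
    for k in range(1, n_components + 1):
        w = 2 * np.pi * k / period
        cols.extend([np.cos(w * t_hours), np.sin(w * t_hours)])
    return np.column_stack(cols)

# Simulate a BP-like profile with 24 h and 12 h components, then fit.
rng = np.random.default_rng(2)
t = np.linspace(0, 24, 288, endpoint=False)          # every 5 minutes
truth = 120 + 8 * np.cos(2 * np.pi * t / 24) + 4 * np.sin(2 * np.pi * t / 12)
y = truth + rng.normal(scale=2.0, size=t.size)

X = cosinor_design(t)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef
# The derivative of the fitted curve (mmHg per hour) is what would be
# interrogated for a morning-surge measure, eg, its morning maximum.
slope = np.gradient(fitted, t)
```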
  • Improving estimation and prediction in linear regression incorporating
           external information from an established reduced model
    • Abstract: We consider a situation where rich historical data are available for the coefficients and their standard errors in a linear regression model describing the association between a continuous outcome variable Y and a set of predicting factors X, from a large study. We would like to use this summary information to improve inference in an expanded model of interest, Y given X and B. The additional variable B is a new biomarker, measured on a small number of subjects in a new dataset. We formulate the problem in an inferential framework where the historical information is translated into nonlinear constraints on the parameter space, and propose both frequentist and Bayesian solutions to this problem. We show that a Bayesian transformation approach proposed by Gunn and Dunson is a simple and effective computational method for conducting approximate Bayesian inference for this constrained parameter problem. The simulation results comparing these methods indicate that historical information on E(Y | X) can improve the efficiency of estimation and enhance the predictive power in the regression model of interest, E(Y | X, B). We illustrate our methodology by enhancing a published prediction model for bone lead levels in terms of blood lead and other covariates, with a new biomarker defined through a genetic risk score.
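One simple way to see how historical reduced-model coefficients can stabilise a small-sample expanded fit is to penalise disagreement between the implied reduced-model coefficients and the historical ones. The penalty construction below is an illustrative assumption of ours, not the constrained-inference or Gunn-and-Dunson machinery of the paper; all data and parameter values are simulated:

```python
# Sketch: augmented least squares for Y ~ X, B that shrinks the implied
# reduced-model (Y ~ X) coefficients toward historical values.
import numpy as np

rng = np.random.default_rng(1)

# "Historical" reduced-model summary: coefficients of Y on [1, X].
# Consistent with the generating model below, since B = 0.5*X + noise.
gamma_hist = np.array([2.0, 1.5])

n = 30                                   # small new dataset with biomarker B
X = rng.normal(size=n)
B = 0.5*X + rng.normal(size=n)
y = 2.0 + 1.0*X + 1.0*B + rng.normal(0, 1, n)

Z = np.column_stack([np.ones(n), X, B])  # expanded design
W = Z[:, :2]                             # reduced design [1, X]

# Implied reduced-model coefficients are linear in beta:
# gamma(beta) = (W'W)^{-1} W'Z beta = A beta
A = np.linalg.solve(W.T @ W, W.T @ Z)

lam = 50.0                               # assumed penalty weight
Z_aug = np.vstack([Z, np.sqrt(lam) * A])
y_aug = np.concatenate([y, np.sqrt(lam) * gamma_hist])
beta_con, *_ = np.linalg.lstsq(Z_aug, y_aug, rcond=None)
beta_ols, *_ = np.linalg.lstsq(Z, y, rcond=None)
print("OLS:", beta_ols.round(2), " constrained:", beta_con.round(2))
```

Sending `lam` to infinity enforces the constraint exactly, which is the spirit of treating the historical summary as (nonlinear, in general) constraints on the parameter space.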
  • Five criteria for using a surrogate endpoint to predict treatment effect
           based on data from multiple previous trials
  • Meta‐analysis of Gaussian individual patient data: two‐stage
           or not two‐stage?
    • Abstract: Quantitative evidence synthesis through meta‐analysis is central to evidence‐based medicine. For well‐documented reasons, the meta‐analysis of individual patient data is held in higher regard than aggregate data. With access to individual patient data, the analysis is not restricted to a “two‐stage” approach (combining estimates and standard errors) but can estimate parameters of interest by fitting a single model to all of the data, a so‐called “one‐stage” analysis. There has been debate about the merits of one‐ and two‐stage analysis. Arguments for one‐stage analysis have typically noted that a wider range of models can be fitted and overall estimates may be more precise. The two‐stage side has emphasised that the models that can be fitted in two stages are sufficient to answer the relevant questions, with less scope for mistakes because there are fewer modelling choices to be made in the two‐stage approach. For Gaussian data, we consider the statistical arguments for flexibility and precision in small‐sample settings. Regarding flexibility, several of the models that can be fitted only in one stage may not be of serious interest to most meta‐analysis practitioners. Regarding precision, we consider fixed‐ and random‐effects meta‐analysis and see that, for a model making certain assumptions, the number of stages used to fit this model is irrelevant; the precision will be approximately equal. Meta‐analysts should choose modelling assumptions carefully. Sometimes relevant models can only be fitted in one stage. Otherwise, meta‐analysts are free to use whichever procedure is most convenient to fit the identified model.
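The near-equivalence claim above is easy to check numerically for a fixed-effect analysis of Gaussian individual patient data. A minimal sketch with simulated studies (assumed study means and a common treatment effect of 1):

```python
# Sketch: two-stage (inverse-variance pooling) versus one-stage (single
# model with study-specific intercepts) fixed-effect meta-analysis.
import numpy as np

rng = np.random.default_rng(2)
K, n, effect = 6, 80, 1.0

stud_est, stud_var, ys, trts, studies = [], [], [], [], []
for k in range(K):
    trt = rng.integers(0, 2, n)
    y = rng.normal(0.5*k, 1.0, n) + effect*trt      # study-specific mean
    # Stage 1: per-study difference in means and its variance
    d = y[trt == 1].mean() - y[trt == 0].mean()
    v = (y[trt == 1].var(ddof=1)/(trt == 1).sum()
         + y[trt == 0].var(ddof=1)/(trt == 0).sum())
    stud_est.append(d); stud_var.append(v)
    ys.append(y); trts.append(trt); studies.append(np.full(n, k))

# Stage 2: inverse-variance weighted pooled effect
w = 1/np.array(stud_var)
two_stage = (w*np.array(stud_est)).sum()/w.sum()

# One stage: OLS with K study indicators and a common treatment effect
y_all = np.concatenate(ys); trt_all = np.concatenate(trts)
S = (np.concatenate(studies)[:, None] == np.arange(K)).astype(float)
X = np.column_stack([S, trt_all])
one_stage = np.linalg.lstsq(X, y_all, rcond=None)[0][-1]
print(round(two_stage, 3), round(one_stage, 3))
```

The two estimates differ only through the weighting (per-study versus pooled residual variances), which is exactly the sense in which the number of stages is irrelevant under matching model assumptions.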
  • Promotion time cure rate model with nonparametric form of covariate
    • Abstract: Survival data with a cured portion are commonly seen in clinical trials. Motivated by a biological interpretation of cancer metastasis, the promotion time cure model is a popular alternative to the mixture cure rate model for analyzing such data. Existing promotion time cure models all assume a restrictive parametric form for the covariate effects, which can be incorrectly specified, especially at the exploratory stage. In this paper, we propose a nonparametric approach to modeling the covariate effects within the framework of the promotion time cure model. The covariate effect function is estimated by smoothing splines via the optimization of a penalized profile likelihood. Pointwise interval estimates are also derived from the Bayesian interpretation of the penalized profile likelihood. Asymptotic convergence rates are established for the proposed estimates. Simulations show excellent performance of the proposed nonparametric method, which is then applied to a melanoma study.
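The promotion time cure model gives the population survival function S(t) = exp(−θF(t)), where θ is the mean number of latent "promoted" lesions and F is the CDF of their event times, so survival plateaus at the cure fraction exp(−θ). A minimal numeric sketch with an assumed exponential F and assumed parameter values (the paper's contribution, a spline-based covariate effect on θ, is not reproduced):

```python
# Sketch: improper population survival under a promotion time cure model.
import math

theta = 1.2            # assumed mean number of latent promoted lesions
rate = 0.3             # assumed rate of the latent event-time distribution

def F(t):
    """CDF of the latent promotion times (exponential, for illustration)."""
    return 1.0 - math.exp(-rate * t)

def S(t):
    """Population survival: S(t) = exp(-theta * F(t))."""
    return math.exp(-theta * F(t))

cure_fraction = math.exp(-theta)
print("S(0)=%.2f  S(50)=%.3f  cure fraction=%.3f"
      % (S(0), S(50), cure_fraction))
```

Covariate effects typically enter through θ = exp(η(x)); the paper estimates η nonparametrically rather than assuming it linear.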
  • The effect of risk factor misclassification on the partial population
           attributable risk
    • Abstract: The partial population attributable risk (pPAR) is used to quantify the population‐level impact of preventive interventions in a multifactorial disease setting. In this paper, we consider the effect of nondifferential risk factor misclassification on the direction and magnitude of bias of pPAR estimands and related quantities. We found that the bias in the uncorrected pPAR depends nonlinearly and nonmonotonically on the sensitivities, specificities, relative risks, and joint prevalence of the exposure of interest and background risk factors, as well as the associations between these factors. The bias in the uncorrected pPAR is most dependent on the sensitivity of the exposure. The magnitude of bias varies over a large range, and in a small region of the parameter space determining the pPAR, the direction of bias is away from the null. In contrast, the crude PAR can only be unbiased or biased towards the null by risk factor misclassification. The semiadjusted PAR is calculated using the formula for the crude PAR but plugs in the multivariate‐adjusted relative risk. Because the crude and semiadjusted PARs continue to be used in public health research, we also investigated the magnitude and direction of the bias that may arise when using these formulae instead of the pPAR. These PAR estimators and their uncorrected counterparts were calculated in a study of risk factors for colorectal cancer in the Health Professionals Follow‐up Study. There, misclassification caused the pPAR for low folate intake to be overestimated, with a relative bias of 48%, when red meat and alcohol intake were treated as misclassified but unmodified risk factors; when red meat was instead treated as the modifiable risk factor, the estimated pPAR rose from 14% to 60%, further illustrating the extent to which misclassification can bias estimates of the pPAR.
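The bias-towards-the-null behaviour of the crude PAR under nondifferential misclassification can be seen from closed-form cell probabilities for a single binary exposure; the multifactorial pPAR setting of the paper is more involved. All prevalence, risk, sensitivity, and specificity values below are assumed for illustration:

```python
# Sketch: crude PAR before and after nondifferential exposure
# misclassification, from closed-form probabilities.
p = 0.30              # true exposure prevalence
r1, r0 = 0.10, 0.05   # disease risk in exposed / unexposed
se, sp = 0.80, 0.90   # sensitivity, specificity of exposure measurement

overall = p*r1 + (1 - p)*r0                   # P(D)

# True crude PAR = (P(D) - P(D | unexposed)) / P(D)
par_true = (overall - r0) / overall

# Observed classification E*, nondifferential with respect to disease
p_star1 = p*se + (1 - p)*(1 - sp)             # P(E* = 1)
risk_star0 = (p*(1 - se)*r1 + (1 - p)*sp*r0) / (1 - p_star1)
par_obs = (overall - risk_star0) / overall

print("true PAR %.3f, observed PAR %.3f" % (par_true, par_obs))
```

Here the observed PAR is attenuated relative to the true one, consistent with the abstract's statement that the crude PAR can only be unbiased or biased towards the null.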
  • Controlled pattern imputation for sensitivity analysis of longitudinal
           binary and ordinal outcomes with nonignorable dropout
    • Abstract: The controlled imputation method refers to a class of pattern mixture models that have been widely used in recent years as sensitivity analyses of longitudinal clinical trials with nonignorable dropout. These pattern mixture models assume that participants in the experimental arm who drop out have response profiles similar to those of control participants, or have worse outcomes than otherwise similar participants who remain on the experimental treatment. Despite its popularity, controlled imputation has not been formally developed for longitudinal binary and ordinal outcomes, partly because of the lack of a natural multivariate distribution for such endpoints. In this paper, we propose 2 approaches for implementing the controlled imputation for binary and ordinal data, based respectively on the sequential logistic regression and the multivariate probit model. Efficient Markov chain Monte Carlo algorithms are developed for missing data imputation, using the monotone data augmentation technique for the sequential logistic regression and a parameter‐expanded monotone data augmentation scheme for the multivariate probit model. We assess the performance of the proposed procedures through simulation and the analysis of a schizophrenia clinical trial, and compare them with the fully conditional specification, last observation carried forward, and baseline observation carried forward imputation methods.
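The central idea, imputing experimental-arm dropouts from the control arm's response behaviour, can be sketched in a deliberately simplified form: draw each missing binary response from the control arm's visit-specific observed rate. This is an illustrative copy-reference-style stand-in for the paper's sequential-logistic and multivariate-probit MCMC machinery; all data are simulated with assumed response rates:

```python
# Sketch: controlled imputation of monotone dropout in a binary endpoint.
import numpy as np

rng = np.random.default_rng(3)
n, visits = 100, 4

# Simulated response indicators (assumed visit-specific rates)
control = rng.binomial(1, [0.3, 0.35, 0.4, 0.4], size=(n, visits)).astype(float)
treated = rng.binomial(1, [0.3, 0.45, 0.55, 0.6], size=(n, visits)).astype(float)

# Impose monotone dropout in the treated arm (drop == visits => completer)
drop = rng.integers(1, visits + 1, n)         # first missing visit index
for i in range(n):
    treated[i, drop[i]:] = np.nan

# Controlled imputation: fill missing treated values with draws from the
# control arm's observed response rate at the same visit
ctrl_rate = control.mean(axis=0)
imputed = treated.copy()
miss = np.isnan(imputed)
for j in range(visits):
    idx = miss[:, j]
    imputed[idx, j] = rng.binomial(1, ctrl_rate[j], idx.sum())

print("treated last-visit rate after imputation: %.2f" % imputed[:, -1].mean())
```

A proper analysis would repeat the imputation many times and combine results with Rubin's rules; conditioning each draw on the subject's own observed history is what the sequential logistic regression in the paper adds.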