Statistics in Medicine [SJR: 1.811] [H-Index: 131]. Hybrid journal (may contain Open Access articles). ISSN (Print) 0277-6715; ISSN (Online) 1097-0258. Published by John Wiley and Sons.
- Multivariate space-time modelling of multiple air pollutants and their health effects accounting for exposure uncertainty
- Authors: Guowen Huang; Duncan Lee, E. Marian Scott
Abstract: The long-term health effects of air pollution are often estimated using a spatio-temporal ecological areal unit study, but this design leads to the following statistical challenges: (1) how to estimate spatially representative pollution concentrations for each areal unit; (2) how to allow for the uncertainty in these estimated concentrations when estimating their health effects; and (3) how to simultaneously estimate the joint effects of multiple correlated pollutants. This article proposes a novel 2-stage Bayesian hierarchical model for addressing these 3 challenges, with inference based on Markov chain Monte Carlo simulation. The first stage is a multivariate spatio-temporal fusion model for predicting areal level average concentrations of multiple pollutants from both monitored and modelled pollution data. The second stage is a spatio-temporal model for estimating the health impact of multiple correlated pollutants simultaneously, which accounts for the uncertainty in the estimated pollution concentrations. The novel methodology is motivated by a new study of the impact of both particulate matter and nitrogen dioxide concentrations on respiratory hospital admissions in Scotland between 2007 and 2011, and the results suggest that both pollutants exhibit substantial and independent health effects.
PubDate: 2017-12-04
DOI: 10.1002/sim.7570
- An R2-curve for evaluating the accuracy of dynamic predictions
- Authors: Marie-Cécile Fournier; Etienne Dantan, Paul Blanche
Abstract: In the context of chronic diseases, a patient's health evolution is often evaluated through the study of longitudinal markers and major clinical events such as relapses or death. Dynamic predictions of such events may be useful to improve patient management throughout follow-up. Dynamic predictions are predictions based on information repeatedly collected over time, such as measurements of a biomarker, and they can be updated as soon as new information becomes available. Several techniques to derive dynamic predictions have already been suggested, and computation of dynamic predictions is becoming increasingly popular. In this work, we focus on assessing the predictive accuracy of dynamic predictions and suggest that an R2-curve may help. It facilitates the evaluation of the predictive accuracy gain obtained when accumulating information on a patient's health profile over time. A nonparametric inverse probability of censoring weighted estimator is suggested to deal with censoring. Large sample results are provided, and methods to compute confidence intervals and bands are derived. A simulation study assesses the finite sample behavior of the inference procedures and illustrates the shape of some R2-curves which can be expected in common settings. A detailed application to kidney transplant data is also presented.
PubDate: 2017-12-04
DOI: 10.1002/sim.7571
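The accuracy-gain idea behind an R2-curve can be sketched on simulated, censoring-free longitudinal data (the paper's IPCW estimator and confidence bands are omitted; the data-generating model below is purely an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_visits = 2000, 6
u = rng.normal(0.0, 1.0, n)                                # subject-level random effect
marker = u[:, None] + rng.normal(0.0, 1.5, (n, n_visits))  # noisy repeated biomarker
outcome = u + rng.normal(0.0, 0.5, n)                      # outcome driven by the random effect

# R2(s): predictive accuracy using only the first s biomarker measurements
r2_curve = []
for s in range(1, n_visits + 1):
    pred = marker[:, :s].mean(axis=1)                      # crude dynamic prediction at landmark s
    resid = outcome - np.polyval(np.polyfit(pred, outcome, 1), pred)
    r2_curve.append(1.0 - resid.var() / outcome.var())

print([round(v, 3) for v in r2_curve])                     # accuracy grows with accumulated history
```

Averaging more visits reduces the measurement noise around the subject-level effect, so the curve rises as information accumulates, which is exactly what an R2-curve is meant to display.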
- Bayesian inference for unidirectional misclassification of a binary response trait
- Authors: Michelle Xia; Paul Gustafson
Abstract: When assessing association between a binary trait and some covariates, the binary response may be subject to unidirectional misclassification. Unidirectional misclassification can occur when revealing a particular level of the trait is associated with a type of cost, such as a social desirability or financial cost. The feasibility of addressing misclassification is commonly obscured by model identification issues. The current paper attempts to study the efficacy of inference when the binary response variable is subject to unidirectional misclassification. From a theoretical perspective, we demonstrate that the key model parameters possess identifiability, except for the case with a single binary covariate. From a practical standpoint, the logistic model with quantitative covariates can be weakly identified, in the sense that the Fisher information matrix may be near singular. This can make learning some parameters difficult under certain parameter settings, even with quite large samples. In other cases, the stronger identification enables the model to provide more effective adjustment for unidirectional misclassification. An extension to the Poisson approximation of the binomial model reveals the identifiability of the Poisson and zero-inflated Poisson models. For fully identified models, the proposed method adjusts for misclassification based on learning from data. For binary models where there is difficulty in identification, the method is useful for sensitivity analyses on the potential impact from unidirectional misclassification.
PubDate: 2017-12-04
DOI: 10.1002/sim.7555
- Induced smoothing for rank-based regression with recurrent gap time data
- Authors: Tianmeng Lyu; Xianghua Luo, Gongjun Xu, Chiung-Yu Huang
Abstract: Various semiparametric regression models have recently been proposed for the analysis of gap times between consecutive recurrent events. Among them, the semiparametric accelerated failure time (AFT) model is especially appealing owing to its direct interpretation of covariate effects on the gap times. In general, estimation of the semiparametric AFT model is challenging because the rank-based estimating function is a nonsmooth step function. As a result, solutions to the estimating equations do not necessarily exist. Moreover, the popular resampling-based variance estimation for the AFT model requires solving rank-based estimating equations repeatedly and hence can be computationally cumbersome and unstable. In this paper, we extend the induced smoothing approach to the AFT model for recurrent gap time data. Our proposed smooth estimating function permits the application of standard numerical methods for both the regression coefficients estimation and the standard error estimation. Large-sample properties and an asymptotic variance estimator are provided for the proposed method. Simulation studies show that the proposed method outperforms the existing nonsmooth rank-based estimating function methods in both point estimation and variance estimation. The proposed method is applied to the data analysis of repeated hospitalizations for patients in the Danish Psychiatric Center Register.
PubDate: 2017-12-04
DOI: 10.1002/sim.7564
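The induced smoothing device, replacing the indicator in a rank-based estimating function by a normal CDF so that standard root finders apply, can be sketched for a single-event, censoring-free AFT model (a toy version of the setting, not the authors' recurrent gap time procedure; the bandwidth and data are illustrative):

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

rng = np.random.default_rng(1)
n, beta_true = 300, 1.0
x = rng.normal(size=n)
logt = beta_true * x + rng.gumbel(size=n)       # AFT model on the log scale

def smooth_gehan(beta, h=0.5):
    """Gehan-type estimating function with I(e_i <= e_j) replaced by Phi((e_j - e_i)/h)."""
    e = logt - beta * x
    dx = x[:, None] - x[None, :]
    de = e[None, :] - e[:, None]
    return np.sum(dx * norm.cdf(de / h))

# The smoothed function is continuous and monotone, so a standard root finder works
beta_hat = brentq(smooth_gehan, -2.0, 4.0)
print(round(beta_hat, 3))
```

The unsmoothed step-function version need not have an exact root, which is the computational problem induced smoothing is designed to remove.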
- A joint marginal-conditional model for multivariate longitudinal data
- Authors: James Proudfoot; Walter Faig, Loki Natarajan, Ronghui Xu
Abstract: Multivariate longitudinal data frequently arise in biomedical applications; however, their analyses are often performed one outcome at a time, or jointly using existing software in an ad hoc fashion. A main challenge in the proper analysis of such data is the fact that the different outcomes are measured on different unknown scales. Methodology for handling the scale problem has been previously proposed for cross-sectional data, and here we extend it to the longitudinal setting. We consider modeling the longitudinal data using random effects, while leaving the joint distribution of the multiple outcomes unspecified. We propose an estimating equation together with an expectation-maximization–type (expectation-substitution) algorithm. The consistency and the asymptotic distribution of the parameter estimates are established. The method is evaluated using extensive simulations and applied to a longitudinal nutrition data set from a large dietary intervention trial on breast cancer survivors, the Women's Healthy Eating and Living Study.
PubDate: 2017-12-04
DOI: 10.1002/sim.7552
- Analysis of the U.S. patient referral network
- Authors: Chuankai An; A. James O'Malley, Daniel N. Rockmore, Corey D. Stock
Abstract: In this paper, we analyze the US Patient Referral Network (also called the Shared Patient Network) and various subnetworks for the years 2009 to 2015. In these networks, two physicians are linked if a patient encounters both of them within a specified time interval, according to the data made available by the Centers for Medicare and Medicaid Services. We find power law distributions on most state-level data as well as a core-periphery structure. On a national and state level, we discover a so-called small-world structure as well as a “gravity law” of the type found in some large-scale economic networks. Some physicians play the role of hubs for interstate referral. Strong correlations between certain network statistics and health care system statistics at both the state and national levels are discovered. The patterns in the referral network, evinced using several statistical analyses involving key metrics derived from the network, illustrate the potential for using network analysis to provide new insights into the health care system and opportunities or mechanisms for catalyzing improvements.
PubDate: 2017-12-04
DOI: 10.1002/sim.7565
- Relative efficiency of precision medicine designs for clinical trials with predictive biomarkers
- Authors: Weichung Joe Shih; Yong Lin
Abstract: Prospective randomized clinical trials addressing biomarkers are time consuming and costly but are necessary for regulatory agencies to approve new therapies with predictive biomarkers. For this reason, there have recently been many discussions and proposals of various trial designs, and comparisons of their efficiency, in the literature. We compare statistical efficiencies between the marker-stratified design and the marker-based precision medicine design in testing/estimating 4 hypotheses/parameters of clinical interest, namely, the treatment effects in the marker-positive and marker-negative cohorts, the marker-by-treatment interaction, and the marker's clinical utility. As may be expected, the stratified design is more efficient than the precision medicine design. However, it is perhaps surprising how low the relative efficiency of the precision medicine design can be. We quantify the relative efficiency as a function of design factors including the marker-positive prevalence rate, marker assay and classification sensitivity and specificity, and the treatment randomization ratio. It is interesting to examine the trends of the relative efficiency with these design parameters in testing different hypotheses. We advocate using the stratified design over the precision medicine design in clinical trials with predictive biomarkers.
PubDate: 2017-12-04
DOI: 10.1002/sim.7562
- Inverse probability weighting to control confounding in an illness-death model for interval-censored data
- Authors: Florence Gillaizeau; Thomas Sénage, Florent Le Borgne, Thierry Le Tourneau, Jean-Christian Roussel, Karen Leffondrè, Raphaël Porcher, Bruno Giraudeau, Etienne Dantan, Yohann Foucher
Abstract: Multistate models with interval-censored data, such as the illness-death model, are still not used to any considerable extent in medical research, despite the significant literature demonstrating their advantages over usual survival models. Possible explanations are their limited availability in classical statistical software or, when they are available, the limitations of multivariable modelling for taking confounding into consideration. In this paper, we propose a strategy based on propensity scores that allows population causal effects to be estimated: inverse probability weighting in the illness-death semi-Markov model with interval-censored data. Using simulated data, we validated the performance of the proposed approach. We also illustrated the usefulness of the method in an application evaluating the relationship between the inadequate size of an aortic bioprosthesis and its degeneration and/or patient death. We have updated the R package multistate to facilitate the future use of this method.
PubDate: 2017-12-04
DOI: 10.1002/sim.7550
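The propensity score weighting at the core of this strategy can be sketched in a cross-sectional toy example: fit a logistic propensity model, form inverse-probability-of-treatment weights, and verify covariate balance (the interval-censored semi-Markov machinery is omitted; data and coefficients are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
x = rng.normal(size=n)                                   # confounder
z = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.8 * x - 0.2))))  # confounded treatment

# Logistic propensity model z ~ 1 + x, fitted by Newton-Raphson
X = np.column_stack([np.ones(n), x])
beta = np.zeros(2)
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    W = p * (1.0 - p)
    beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (z - p))

ps = 1.0 / (1.0 + np.exp(-X @ beta))
w = z / ps + (1 - z) / (1 - ps)                          # inverse probability weights

raw_gap = x[z == 1].mean() - x[z == 0].mean()
wmean = lambda v, m: np.sum(w[m] * v[m]) / np.sum(w[m])
ipw_gap = wmean(x, z == 1) - wmean(x, z == 0)
print(round(raw_gap, 3), round(ipw_gap, 3))              # weighting removes the imbalance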
- Joint mixed-effects models for causal inference with longitudinal data
- Authors: Michelle Shardell; Luigi Ferrucci
Abstract: Causal inference with observational longitudinal data and time-varying exposures is complicated due to the potential for time-dependent confounding and unmeasured confounding. Most causal inference methods that handle time-dependent confounding rely on either the assumption of no unmeasured confounders or the availability of an unconfounded variable that is associated with the exposure (eg, an instrumental variable). Furthermore, when data are incomplete, validity of many methods often depends on the assumption of missing at random. We propose an approach that combines a parametric joint mixed-effects model for the study outcome and the exposure with g-computation to identify and estimate causal effects in the presence of time-dependent confounding and unmeasured confounding. G-computation can estimate participant-specific or population-average causal effects using parameters of the joint model. The joint model is a type of shared parameter model where the outcome and exposure-selection models share common random effect(s). We also extend the joint model to handle missing data and truncation by death when missingness is possibly not at random. We evaluate the performance of the proposed method using simulation studies and compare the method to both linear mixed- and fixed-effects models combined with g-computation as well as to targeted maximum likelihood estimation. We apply the method to an epidemiologic study of vitamin D and depressive symptoms in older adults and include code using SAS PROC NLMIXED software to enhance the accessibility of the method to applied researchers.
PubDate: 2017-12-04
DOI: 10.1002/sim.7567
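G-computation itself can be sketched with a linear outcome model on simulated confounded data (a minimal fixed-effects version, not the authors' joint mixed-effects model; all parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20000
L = rng.normal(size=n)                        # confounder
A = rng.binomial(1, 1 / (1 + np.exp(-L)))     # exposure depends on L
Y = 1.0 * A + 2.0 * L + rng.normal(size=n)    # true causal effect of A is 1.0

# Outcome model Y ~ A + L fitted by least squares
X = np.column_stack([np.ones(n), A, L])
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)

# G-computation: predict for everyone under A=1 and under A=0, then average
X1 = np.column_stack([np.ones(n), np.ones(n), L])
X0 = np.column_stack([np.ones(n), np.zeros(n), L])
ate_g = np.mean(X1 @ coef - X0 @ coef)

ate_naive = Y[A == 1].mean() - Y[A == 0].mean()
print(round(ate_g, 3), round(ate_naive, 3))   # g-computation recovers ~1.0; naive is confounded
```

The paper's contribution is to feed g-computation with a joint shared-parameter model for outcome and exposure, but the standardization step itself is this averaging of model predictions under fixed exposure values.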
- A weighted combined effect measure for the analysis of a composite time-to-first-event endpoint with components of different clinical relevance
- Authors: Geraldine Rauch; Kevin Kunzmann, Meinhard Kieser, Karl Wegscheider, Jochem König, Christine Eulenburg
Abstract: Composite endpoints combine several events within a single variable, which increases the number of expected events and is thereby meant to increase the power. However, the interpretation of results can be difficult as the observed effect for the composite does not necessarily reflect the effects for the components, which may be of different magnitude or even point in adverse directions. Moreover, in clinical applications, the event types are often of different clinical relevance, which also complicates the interpretation of the composite effect. The common effect measure for composite endpoints is the all-cause hazard ratio, which gives equal weight to all events irrespective of their type and clinical relevance. Thereby, the all-cause hazard within each group is given by the sum of the cause-specific hazards corresponding to the individual components. A natural extension of the standard all-cause hazard ratio can be defined by a “weighted all-cause hazard ratio” where the individual hazards for each component are multiplied with predefined relevance weighting factors. For the special case of equal weights across the components, the weighted all-cause hazard ratio then corresponds to the standard all-cause hazard ratio. To identify the cause-specific hazard of the individual components, any parametric survival model might be applied. The new weighted effect measure can be tested for deviations from the null hypothesis by means of a permutation test. In this work, we systematically compare the new weighted approach to the standard all-cause hazard ratio by theoretical considerations, Monte-Carlo simulations, and by means of a real clinical trial example.
PubDate: 2017-12-04
DOI: 10.1002/sim.7531
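Under constant (exponential) cause-specific hazards, the weighted all-cause hazard ratio reduces to a ratio of weighted hazard sums, and equal weights recover the standard all-cause hazard ratio (the numbers below are illustrative):

```python
# Weighted all-cause hazard ratio: cause-specific hazards are combined with
# clinical relevance weights before forming the ratio (constant hazards assumed).
haz_treat = {"death": 0.010, "hospitalisation": 0.040}   # cause-specific hazards, treatment arm
haz_ctrl  = {"death": 0.012, "hospitalisation": 0.060}   # control arm
weights   = {"death": 1.0,  "hospitalisation": 0.5}      # death judged twice as relevant

def weighted_hr(w):
    num = sum(w[k] * haz_treat[k] for k in w)
    den = sum(w[k] * haz_ctrl[k] for k in w)
    return num / den

hr_weighted = weighted_hr(weights)
hr_standard = weighted_hr({k: 1.0 for k in weights})     # equal weights = usual all-cause HR
print(round(hr_weighted, 3), round(hr_standard, 3))      # prints 0.714 0.694
```

Down-weighting the less relevant component shifts the combined effect towards the death-specific hazard ratio, which is the interpretability gain the weighted measure is designed to provide.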
- Considerations for analysis of time-to-event outcomes measured with error: Bias and correction with SIMEX
- Authors: Eric J. Oh; Bryan E. Shepherd, Thomas Lumley, Pamela A. Shaw
Abstract: For time-to-event outcomes, a rich literature exists on the bias introduced by covariate measurement error in regression models, such as the Cox model, and methods of analysis to address this bias. By comparison, less attention has been given to understanding the impact or addressing errors in the failure time outcome. For many diseases, the timing of an event of interest (such as progression-free survival or time to AIDS progression) can be difficult to assess or reliant on self-report and therefore prone to measurement error. For linear models, it is well known that random errors in the outcome variable do not bias regression estimates. With nonlinear models, however, even random error or misclassification can introduce bias into estimated parameters. We compare the performance of 2 common regression models, the Cox and Weibull models, in the setting of measurement error in the failure time outcome. We introduce an extension of the SIMEX method to correct for bias in hazard ratio estimates from the Cox model and discuss other analysis options to address measurement error in the response. A formula to estimate the bias induced into the hazard ratio by classical measurement error in the event time for a log-linear survival model is presented. Detailed numerical studies are presented to examine the performance of the proposed SIMEX method under varying levels and parametric forms of the error in the outcome. We further illustrate the method with observational data on HIV outcomes from the Vanderbilt Comprehensive Care Clinic.
PubDate: 2017-11-29
DOI: 10.1002/sim.7554
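The SIMEX recipe, adding extra measurement error at increasing levels and extrapolating back to the error-free level lambda = -1, can be sketched on a linear toy model where the attenuation is easy to see (the Cox-model extension in the paper is not attempted here, and quadratic extrapolation only partially removes the bias):

```python
import numpy as np

rng = np.random.default_rng(4)
n, beta_true, sigma_u = 2000, 1.0, 1.0
x = rng.normal(size=n)
y = beta_true * x + rng.normal(size=n)
w = x + rng.normal(0.0, sigma_u, n)              # error-prone covariate: naive slope attenuated

def slope(cov):
    return np.cov(cov, y, bias=True)[0, 1] / np.var(cov)

naive = slope(w)
lams = [0.5, 1.0, 1.5, 2.0]
sim = []
for lam in lams:
    # SIMulation step: inflate the measurement error variance by a factor (1 + lam)
    reps = [slope(w + np.sqrt(lam) * sigma_u * rng.normal(size=n)) for _ in range(200)]
    sim.append(np.mean(reps))

# EXtrapolation step: fit a quadratic in lambda and evaluate at lambda = -1 (no error)
coefs = np.polyfit([0.0] + lams, [naive] + sim, 2)
beta_simex = np.polyval(coefs, -1.0)
print(round(naive, 3), round(beta_simex, 3))
```

The extrapolated estimate moves substantially back towards the true slope, though the quadratic extrapolant does not fully undo the attenuation; the paper applies the same logic to hazard ratios with error in the event time rather than the covariate.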
- Meta-analysis for the comparison of two diagnostic tests—A new approach based on copulas
- Authors: Annika Hoyer; Oliver Kuss
Abstract: Meta-analysis of diagnostic studies is still a field of ongoing biometrical research. In particular, clinical researchers call for methods that allow a comparison of different diagnostic tests to a common gold standard. Focussing on two diagnostic tests, the main parameters of interest are the differences in sensitivities and specificities (with their corresponding confidence intervals) between the two diagnostic tests, while accounting for the various associations across the two tests and the single studies. Similar to our previous work applying generalized linear mixed models to this task, we propose a model with a quadrivariate response consisting of the two sensitivities and the two specificities of both tests. This new approach uses ideas from copula modelling, in particular a quadrivariate Gaussian copula and a quadrivariate vine copula built from bivariate Plackett copulas. The different copulas are compared in a simulation study and illustrated by an application to population-based screening for type 2 diabetes.
PubDate: 2017-11-29
DOI: 10.1002/sim.7556
- Maximum likelihood estimation of influenza vaccine effectiveness against transmission from the household and from the community
- Authors: Kylie E. C. Ainslie; Michael J. Haber, Ryan E. Malosh, Joshua G. Petrie, Arnold S. Monto
Abstract: Influenza vaccination is recommended as the best way to protect against influenza infection and illness. Due to seasonal changes in influenza virus types and subtypes, a new vaccine must be produced, and vaccine effectiveness (VE) must be estimated, annually. Since 2010, influenza vaccination has been recommended universally in the United States, making randomized clinical trials unethical. Recent studies have used a monitored household cohort study design to determine separate VE estimates against influenza transmission from the household and community. We developed a probability model and accompanying maximum likelihood procedure to estimate vaccine-related protection against transmission of influenza from the household and the community. Using agent-based stochastic simulations, we validated that we can obtain maximum likelihood estimates of transmission parameters and VE close to their true values. Sensitivity analyses to examine the effect of deviations from our assumptions were conducted. We used our method to estimate transmission parameters and VE from data from a monitored household study in Michigan during the 2012-2013 influenza season and were able to detect a significant protective effect of influenza vaccination against community-acquired transmission.
PubDate: 2017-11-28
DOI: 10.1002/sim.7558
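A maximum likelihood fit separating household and community transmission can be sketched with a simple escape-probability model, in which a subject escapes infection from the community and independently from each infected household contact (a hypothetical simplification of the authors' probability model; all parameter values are illustrative):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
n = 20000
vac = rng.binomial(1, 0.5, n)                 # vaccination indicator
k = rng.integers(0, 4, n)                     # number of infected household contacts

# True parameters: community / household infection probabilities and the VE
# against each transmission route (illustrative values)
pc, ph, vec, veh = 0.15, 0.20, 0.50, 0.40
esc = (1 - pc * (1 - vec * vac)) * (1 - ph * (1 - veh * vac)) ** k
infected = rng.binomial(1, 1 - esc)

def nll(theta):
    pc_, ph_, vec_, veh_ = theta
    e = (1 - pc_ * (1 - vec_ * vac)) * (1 - ph_ * (1 - veh_ * vac)) ** k
    p = np.where(infected == 1, 1 - e, e)
    return -np.sum(np.log(np.clip(p, 1e-12, None)))

fit = minimize(nll, x0=[0.1, 0.1, 0.3, 0.3], method="L-BFGS-B",
               bounds=[(1e-3, 0.9)] * 2 + [(0.0, 0.95)] * 2)
pc_hat, ph_hat, vec_hat, veh_hat = fit.x
print(np.round(fit.x, 3))                     # close to (0.15, 0.20, 0.50, 0.40)
```

Variation in the number of infected household contacts is what lets the likelihood separate the household route from the community route, mirroring the monitored household cohort design.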
- Impact of individual behaviour change on the spread of emerging infectious diseases
- Authors: Q.L. Yan; S.Y. Tang, Y.N. Xiao
Abstract: Human behaviour plays an important role in the spread of emerging infectious diseases, and understanding the influence of behaviour changes on epidemics can be key to improving control efforts. How the dynamics of individual behaviour change affect the development of an emerging infectious disease is therefore a key public health issue. To develop different formulas for individual behaviour change and show how to embed them in a dynamic model of infectious diseases, we choose A/H1N1 and Ebola as typical examples, combined with reported epidemic cases and related media news reports. A logistic model combined with the health belief model is used to determine behaviour decisions through the health belief model constructs. Furthermore, we propose 4 candidate infectious disease models, without and with individual behaviour change, and use approximate Bayesian computation based on a sequential Monte Carlo method for model selection. The main results indicate that the classical compartment model without behaviour change and the model with an average rate of behaviour change depicted by an exponential function fit the observed data best. The results provide a new way to choose an infectious disease model to predict the disease prevalence trend or to evaluate the influence of intervention measures on disease control. Moreover, sensitivity analyses indicate that the accumulated numbers of hospital notifications and deaths could be largely reduced as the rate of behaviour change increases. Therefore, in terms of mitigating emerging infectious diseases, both media publicity focused on guiding people's behaviour change and positive responses of individuals are critical.
PubDate: 2017-11-28
DOI: 10.1002/sim.7548
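The effect of an exponentially decaying contact rate, one of the behaviour change formulations compared in the paper, can be sketched with a minimal SIR model (Euler integration; all parameter values are illustrative assumptions):

```python
import numpy as np

# SIR model where the transmission rate decays exponentially as behaviour changes,
# versus a fixed-behaviour baseline (population normalised to 1).
def final_size(alpha, beta0=0.5, gamma=0.2, days=300, dt=0.05, i0=1e-4):
    s, i, r = 1.0 - i0, i0, 0.0
    for step in range(int(days / dt)):
        beta = beta0 * np.exp(-alpha * step * dt)   # behaviour change shrinks transmission
        new_inf = beta * s * i * dt
        s, i, r = s - new_inf, i + new_inf - gamma * i * dt, r + gamma * i * dt
    return r

no_change = final_size(alpha=0.0)
with_change = final_size(alpha=0.02)
print(round(no_change, 3), round(with_change, 3))   # behaviour change shrinks the epidemic
```

Even a modest decay rate in the contact rate noticeably reduces the attack rate, consistent with the abstract's finding that accumulated notifications and deaths fall as the rate of behaviour change increases.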
- Assessing the similarity of dose response and target doses in 2 non-overlapping subgroups
- Authors: Frank Bretz; Kathrin Möllenhoff, Holger Dette, Wei Liu, Matthias Trampisch
Abstract: We consider 2 problems of increasing importance in clinical dose finding studies. First, we assess the similarity of 2 non-linear regression models for 2 non-overlapping subgroups of patients over a restricted covariate space. To this end, we derive a confidence interval for the maximum difference between the 2 given models. If this confidence interval excludes the pre-specified equivalence margin, similarity of dose response can be claimed. Second, we address the problem of demonstrating the similarity of 2 target doses for 2 non-overlapping subgroups, using again an approach based on a confidence interval. We illustrate the proposed methods with a real case study and investigate their operating characteristics (coverage probabilities, Type I error rates, power) via simulation.
PubDate: 2017-11-27
DOI: 10.1002/sim.7546
- Semiparametric regression analysis for alternating recurrent event data
- Authors: Chi Hyun Lee; Chiung-Yu Huang, Gongjun Xu, Xianghua Luo
Abstract: Alternating recurrent event data arise frequently in clinical and epidemiologic studies, where 2 types of events such as hospital admission and discharge occur alternately over time. The 2 alternating states defined by these recurrent events could each carry important and distinct information about a patient's underlying health condition and/or the quality of care. In this paper, we propose a semiparametric method for evaluating covariate effects on the 2 alternating states jointly. The proposed methodology accounts for the dependence among the alternating states as well as the heterogeneity across patients via a frailty with unspecified distribution. Moreover, the estimation procedure, which is based on smooth estimating equations, not only properly addresses challenges such as induced dependent censoring and intercept sampling bias commonly confronted in serial event gap time data but also is more computationally tractable than the existing rank‐based methods. The proposed methods are evaluated by simulation studies and illustrated by analyzing psychiatric contacts from the South Verona Psychiatric Case Register.
PubDate: 2017-11-23
DOI: 10.1002/sim.7563
- Mixture drug‐count response model for the high‐dimensional drug combinatory effect on myopathy
- Authors: Xueying Wang; Pengyue Zhang, Chien-Wei Chiang, Hengyi Wu, Li Shen, Xia Ning, Donglin Zeng, Lei Wang, Sara K. Quinney, Weixing Feng, Lang Li
Abstract: Drug‐drug interactions (DDIs) are a common cause of adverse drug events (ADEs). The electronic medical record (EMR) database and the FDA's adverse event reporting system (FAERS) database are the major data sources for mining and testing the ADE-associated DDI signals. Most DDI data mining methods focus on pair‐wise drug interactions, and methods to detect high‐dimensional DDIs in medical databases are lacking. In this paper, we propose 2 novel mixture drug‐count response models for detecting high‐dimensional drug combinations that induce myopathy. The “count” indicates the number of drugs in a combination. One model is called the fixed probability mixture drug‐count response model with a maximum risk threshold (FMDRM‐MRT). The other is called the count‐dependent probability mixture drug‐count response model with a maximum risk threshold (CMDRM‐MRT), in which the mixture probability is count dependent. Compared with the previous mixture drug‐count response model (MDRM) developed by our group, these 2 new models show a better likelihood in detecting high‐dimensional drug combinatory effects on myopathy. CMDRM‐MRT identified and validated 54, 374, 637, 442, and 131 two‐way to 6‐way drug interactions, respectively, which induce myopathy in both EMR and FAERS databases. We further demonstrate that FAERS data capture a much higher maximum myopathy risk than EMR data do. The consistency of the 2 mixture models' parameters and the local false discovery rate estimates is evaluated through statistical simulation studies.
PubDate: 2017-11-23
DOI: 10.1002/sim.7545
- A state transition framework for patient‐level modeling of engagement and retention in HIV care using longitudinal cohort data
- Authors: Hana Lee; Joseph W. Hogan, Becky L. Genberg, Xiaotian K. Wu, Beverly S. Musick, Ann Mwangi, Paula Braitstein
Abstract: The human immunodeficiency virus (HIV) care cascade is a conceptual model used to outline the benchmarks that reflect the effectiveness of HIV care across the whole HIV care continuum. The model can be used to identify barriers contributing to poor outcomes along each benchmark in the cascade, such as disengagement from care or death. Recently, the HIV care cascade has been widely applied to monitor progress towards HIV prevention and care goals in an attempt to develop strategies to improve health outcomes along the care continuum. Yet, there are challenges in quantifying successes and gaps in HIV care using cascade models, partly due to the lack of analytic approaches. The availability of large cohort data presents an opportunity to develop a coherent statistical framework for analysis of the HIV care cascade. Motivated by data from the Academic Model Providing Access to Healthcare, which has provided HIV care to nearly 200,000 individuals in Western Kenya since 2001, we developed a state transition framework that can characterize patient‐level movements through the multiple stages of the HIV care cascade. We describe how to transform large observational data into an analyzable format. We then illustrate the state transition framework via multistate modeling to quantify dynamics in retention aspects of care. The proposed modeling approach identifies the transition probabilities of moving through each stage in the care cascade. In addition, this approach allows regression‐based estimation to characterize effects of (time‐varying) predictors of within and between state transitions such as retention, disengagement, re‐entry into care, transfer‐out, and mortality. Copyright © 2017 John Wiley & Sons, Ltd.
PubDate: 2017-11-22
DOI: 10.1002/sim.7502
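The simplest version of the multistate idea, estimating one-step transition probabilities between cascade states by row-normalising observed transition counts, can be sketched as follows (state names follow the abstract; the counts are invented for illustration):

```python
import numpy as np

# Empirical transition probabilities between care-cascade states, estimated by
# row-normalising a matrix of observed one-step transition counts (toy numbers).
states = ["in care", "disengaged", "transferred", "dead"]
counts = np.array([
    [900, 80, 15, 5],    # from "in care"
    [120, 250, 5, 25],   # from "disengaged": re-entry, stay, transfer-out, death
    [0,   0,   1, 0],    # "transferred" treated as absorbing here
    [0,   0,   0, 1],    # "dead" is absorbing
], dtype=float)

P = counts / counts.sum(axis=1, keepdims=True)
for name, row in zip(states, P):
    print(name, np.round(row, 3))
```

The paper's regression-based extension replaces these raw proportions with covariate-dependent transition intensities, but the estimand is the same matrix of movements through the cascade.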
- Internal pilot design for balanced repeated measures
- Authors: Xinrui Zhang; Keith E. Muller, Maureen M. Goodenow, Yueh-Yun Chi
Abstract: Repeated measures are common in clinical trials and epidemiological studies. Designing studies with repeated measures requires reasonably accurate specifications of the variances and correlations to select an appropriate sample size. Underspecifying the variances leads to a sample size that is inadequate to detect a meaningful scientific difference, while overspecifying the variances results in an unnecessarily large sample size. Both waste resources and place study participants at unwarranted risk. An internal pilot design allows sample size recalculation based on estimates of the nuisance parameters in the covariance matrix. We provide theoretical results that account for the stochastic nature of the final sample size in a common class of linear mixed models. The results are useful for designing studies with repeated measures and a balanced design. Simulations examine the impact of misspecification of the covariance matrix and demonstrate the accuracy of the approximations in controlling the type I error rate and achieving the target power. The proposed methods are applied to a longitudinal study assessing early antiretroviral therapy for youth living with HIV.
PubDate: 2017-11-21
DOI: 10.1002/sim.7524
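The sample size recalculation step of an internal pilot design can be sketched with the standard two-sample normal-approximation formula, re-evaluated at the interim variance estimate (a simplification that ignores the stochastic-final-sample-size corrections derived in the paper):

```python
from math import ceil
from scipy.stats import norm

def n_per_group(sigma, delta, alpha=0.05, power=0.90):
    """Two-sample normal-approximation sample size for detecting mean difference delta."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return ceil(2 * (sigma * z / delta) ** 2)

planned = n_per_group(sigma=1.0, delta=0.5)       # design-stage variance guess
sigma_hat = 1.3                                   # SD estimate from the internal pilot
recalculated = n_per_group(sigma=sigma_hat, delta=0.5)
print(planned, recalculated)                      # prints 85 143
```

Because the final sample size depends on an interim estimate, it is itself random; the paper's contribution is to keep type I error and power controlled despite that randomness, which this naive plug-in recalculation does not by itself guarantee.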
- Exponential decay for binary time‐varying covariates in Cox models
- Authors: Charles Donald George Keown-Stoneman; Julie Horrocks, Gerarda Darlington
Abstract: Cox models are commonly used in the analysis of time to event data. One advantage of Cox models is the ability to include time‐varying covariates, often a binary covariate that codes for the occurrence of an event that affects an individual subject. A common assumption in this case is that the effect of the event on the outcome of interest is constant and permanent for each subject. In this paper, we propose a modification to the Cox model to allow the influence of an event to exponentially decay over time. Methods for generating data using the inverse cumulative distribution function for the proposed model are developed. Likelihood ratio tests and AIC are investigated as methods for comparing the proposed model to the commonly used permanent exposure model. A simulation study is performed, and 3 different data sets are presented as examples.
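The modification replaces the usual permanent step exposure with one whose influence fades. A minimal sketch of the covariate path and its log hazard contribution (the parameterization, with `gamma` as the decay rate and `beta` as the log hazard ratio at the moment of the event, is ours for illustration):

```python
import math

def decaying_covariate(t, event_time, gamma):
    """Binary exposure whose influence decays exponentially after the
    exposure event: 0 before the event, exp(-gamma*(t - event_time))
    afterwards. event_time=None encodes a never-exposed subject."""
    if event_time is None or t < event_time:
        return 0.0
    return math.exp(-gamma * (t - event_time))

def log_hazard_ratio(t, event_time, beta, gamma):
    """Exposure contribution to the log hazard at time t; the permanent
    exposure model is recovered as gamma -> 0."""
    return beta * decaying_covariate(t, event_time, gamma)
```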
PubDate: 2017-11-21T21:05:43.945237-05:
DOI: 10.1002/sim.7539
- Five criteria for using a surrogate endpoint to predict treatment effect based on data from multiple previous trials
- Authors: Stuart G. Baker
Abstract: A surrogate endpoint in a randomized clinical trial is an endpoint that occurs after randomization and before the true, clinically meaningful, endpoint that yields conclusions about the effect of treatment on the true endpoint. A surrogate endpoint can accelerate the evaluation of new treatments but at the risk of misleading conclusions. Therefore, criteria are needed for deciding whether to use a surrogate endpoint in a new trial. For the meta‐analytic setting of multiple previous trials, each with the same pair of surrogate and true endpoints, this article formulates 5 criteria for using a surrogate endpoint in a new trial to predict the effect of treatment on the true endpoint in the new trial. The first 2 criteria, which are easily computed from a zero‐intercept linear random effects model, involve statistical considerations: an acceptable sample size multiplier and an acceptable prediction separation score. The remaining 3 criteria involve clinical and biological considerations: similarity of biological mechanisms of treatments between the new trial and previous trials, similarity of secondary treatments following the surrogate endpoint between the new trial and previous trials, and a negligible risk of harmful side effects arising after the observation of the surrogate endpoint in the new trial. These 5 criteria constitute an appropriately high bar for using a surrogate endpoint to make a definitive treatment recommendation.
PubDate: 2017-11-21T20:55:36.81695-05:0
DOI: 10.1002/sim.7561
- Bayesian monotonic errors‐in‐variables models with applications to pathogen susceptibility testing
- Authors: Glen DePalma; Bruce A. Craig
Abstract: Drug dilution (MIC) and disk diffusion (DIA) are the 2 most common antimicrobial susceptibility assays used by hospitals and clinics to determine an unknown pathogen's susceptibility to various antibiotics. Since only one assay is commonly used, it is important that the 2 assays give similar results. Calibration of the DIA assay to the MIC assay is typically done using the error‐rate bounded method, which selects DIA breakpoints that minimize the observed discrepancies between the 2 assays. In 2000, Craig proposed a model‐based approach that specifically models the measurement error and rounding processes of each assay, the underlying pathogen distribution, and the true monotonic relationship between the 2 assays. The 2 assays are then calibrated by focusing on matching the probabilities of correct classification (susceptible, indeterminant, and resistant). This approach results in greater precision and accuracy for estimating DIA breakpoints. In this paper, we expand the flexibility of the model‐based method by introducing a Bayesian 4‐parameter logistic model (extending Craig's original 3‐parameter model) as well as a Bayesian nonparametric spline model to describe the relationship between the 2 assays. We propose 2 ways to handle spline knot selection, considering many equally spaced knots but restricting overfitting via a random walk prior and treating the number and location of knots as additional unknown parameters. We demonstrate the 2 approaches via a series of simulation studies and apply the methods to 2 real data sets.
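The paper extends Craig's 3‐parameter model to a Bayesian 4‐parameter logistic. For intuition, the deterministic 4PL curve alone looks like this (parameter names are ours; the actual model additionally handles measurement error, rounding, and the pathogen distribution):

```python
import math

def logistic4(x, lower, upper, slope, mid):
    """Four-parameter logistic curve: monotone in x when slope > 0,
    with asymptotes `lower` and `upper` and inflection point at `mid`.
    A common choice for a monotone relationship between two assay scales."""
    return lower + (upper - lower) / (1.0 + math.exp(-slope * (x - mid)))
```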
PubDate: 2017-11-20T19:25:47.42346-05:0
DOI: 10.1002/sim.7533
- Evidence synthesis from aggregate recurrent event data for clinical trial design and analysis
- Authors: Björn Holzhauer; Craig Wang, Heinz Schmidli
Abstract: Information from historical trials is important for the design, interim monitoring, analysis, and interpretation of clinical trials. Meta‐analytic models can be used to synthesize the evidence from historical data, which are often only available in aggregate form. We consider evidence synthesis methods for trials with recurrent event endpoints, which are common in many therapeutic areas. Such endpoints are typically analyzed by negative binomial regression. However, the individual patient data necessary to fit such a model are usually unavailable for historical trials reported in the medical literature. We describe approaches for back‐calculating model parameter estimates and their standard errors from available summary statistics with various techniques, including approximate Bayesian computation. We propose to use a quadratic approximation to the log‐likelihood for each historical trial based on 2 independent terms for the log mean rate and the log of the dispersion parameter. A Bayesian hierarchical meta‐analysis model then provides the posterior predictive distribution for these parameters. Simulations show this approach with back‐calculated parameter estimates results in very similar inference as using parameter estimates from individual patient data as an input. We illustrate how to design and analyze a new randomized placebo‐controlled exacerbation trial in severe eosinophilic asthma using data from 11 historical trials.
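One simple back-calculation of this kind, recovering the standard error of a log rate ratio from a confidence interval reported on the ratio scale, is sketched below. This covers only the easiest case; the paper also handles settings that require techniques such as approximate Bayesian computation.

```python
import math

def se_log_ratio_from_ci(lower, upper, conf_z=1.959964):
    """Back-calculate the standard error of a log (rate) ratio from a
    reported two-sided CI on the ratio scale, assuming the CI was built
    as estimate * exp(+/- conf_z * se) (default: 95%, conf_z ~ 1.96)."""
    return (math.log(upper) - math.log(lower)) / (2.0 * conf_z)
```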
PubDate: 2017-11-20T01:20:43.605029-05:
DOI: 10.1002/sim.7549
- Sparse boosting for high‐dimensional survival data with varying coefficients
- Authors: Mu Yue; Jialiang Li, Shuangge Ma
Abstract: Motivated by high‐throughput profiling studies in biomedical research, variable selection methods have been a focus for biostatisticians. In this paper, we consider semiparametric varying‐coefficient accelerated failure time models for right censored survival data with high‐dimensional covariates. Instead of adopting the traditional regularization approaches, we offer a novel sparse boosting (SparseL2Boosting) algorithm to conduct model‐based prediction and variable selection. One main advantage of this new method is that we do not need to perform the time‐consuming selection of tuning parameters. Extensive simulations are conducted to examine the performance of our sparse boosting feature selection techniques. We further illustrate our methods using a lung cancer data analysis.
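The flavor of boosting-based variable selection can be seen in plain componentwise L2 boosting for uncensored linear regression. This is a sketch of the generic idea only, not the authors' SparseL2Boosting for varying-coefficient accelerated failure time models.

```python
import numpy as np

def componentwise_l2_boost(X, y, steps=200, nu=0.1):
    """At each step, find the single predictor that best fits the
    current residual and move its coefficient a small step (nu) toward
    the least-squares solution. Predictors never selected keep a zero
    coefficient, which is what makes the fit sparse."""
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.astype(float).copy()
    for _ in range(steps):
        # least-squares coefficient of each predictor on the residual
        coefs = X.T @ resid / (X ** 2).sum(axis=0)
        # pick the predictor with the smallest residual sum of squares
        sse = ((resid[:, None] - X * coefs) ** 2).sum(axis=0)
        j = int(np.argmin(sse))
        beta[j] += nu * coefs[j]
        resid -= nu * coefs[j] * X[:, j]
    return beta
```

Note the absence of a penalty tuning parameter; only the step count and shrinkage `nu` appear, which mirrors the computational advantage the abstract highlights.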
PubDate: 2017-11-19T23:50:44.003095-05:
DOI: 10.1002/sim.7544
- Measures of clustering and heterogeneity in multilevel Poisson regression analyses of rates/count data
- Authors: Peter C. Austin; Henrik Stryhn, George Leckie, Juan Merlo
Abstract: Multilevel data occur frequently in many research areas, such as health services research and epidemiology. A suitable way to analyze such data is through the use of multilevel regression models. These models incorporate cluster‐specific random effects that allow one to partition the total variation in the outcome into between‐cluster variation and between‐individual variation. The magnitude of the effect of clustering provides a measure of the general contextual effect. When outcomes are binary or time‐to‐event in nature, the general contextual effect can be quantified by measures of heterogeneity such as the median odds ratio or the median hazard ratio, respectively, which can be calculated from a multilevel regression model. Outcomes that are integer counts denoting the number of times that an event occurred are common in epidemiological and medical research. The analogous measure for count outcomes in multilevel Poisson regression, the median (incidence) rate ratio, is relatively unknown and rarely used. The median rate ratio is the median relative change in the rate of the occurrence of the event when comparing identical subjects from 2 randomly selected different clusters that are ordered by rate. We also describe how the variance partition coefficient, which denotes the proportion of the variation in the outcome that is attributable to between‐cluster differences, can be computed with count outcomes. We illustrate the application and interpretation of these measures in a case study analyzing the rate of hospital readmission in patients discharged from hospital with a diagnosis of heart failure.
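The median rate ratio has the same closed form as the median odds ratio: it is driven entirely by the between-cluster variance of the (normally distributed) random intercepts.

```python
import math
from statistics import NormalDist

def median_rate_ratio(var_cluster):
    """Median (incidence) rate ratio from the between-cluster variance
    of the random intercepts in a multilevel Poisson model:
    MRR = exp(sqrt(2 * var) * z_0.75), with z_0.75 ~ 0.6745.
    Equals 1 when clustering explains no variation."""
    z75 = NormalDist().inv_cdf(0.75)
    return math.exp(math.sqrt(2.0 * var_cluster) * z75)
```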
PubDate: 2017-11-08T01:05:37.936255-05:
DOI: 10.1002/sim.7532
- Hierarchical models for epidermal nerve fiber data
- Authors: Claes Andersson; Tuomas Rajala, Aila Särkkä
Abstract: While epidermal nerve fiber (ENF) data have been used to study the effects of small fiber neuropathies through the density and the spatial patterns of the ENFs, little research has focused on the effects on the individual nerve fibers. Studying the individual nerve fibers might give a better understanding of the effects of the neuropathy on the growth process of the individual ENFs. In this study, data from 32 healthy volunteers and 20 diabetic subjects, obtained from suction‐induced skin blister biopsies, are analyzed by comparing statistics for the nerve fibers as a whole and for the segments that a nerve fiber is composed of. Moreover, it is evaluated whether this type of data can be used to detect diabetic neuropathy, by using hierarchical models to perform unsupervised classification of the subjects. It is found that using the information about the individual nerve fibers in combination with the ENF counts yields a considerable improvement as compared to using the ENF counts only.
PubDate: 2017-11-07T21:05:58.847554-05:
DOI: 10.1002/sim.7516
- Robust fit of Bayesian mixed effects regression models with application to colony forming unit count in tuberculosis research
- Authors: Divan Aristo Burger; Robert Schall
Abstract: Early bactericidal activity of tuberculosis drugs is conventionally assessed using statistical regression modeling of colony forming unit (CFU) counts over time. Typically, most CFU counts deviate little from the regression curve, but gross outliers due to erroneous sputum sampling are occasionally present and can markedly influence estimates of the rate of change in CFU count, which is the parameter of interest. A recently introduced Bayesian nonlinear mixed effects regression model was adapted to offer a robust approach that accommodates both outliers and potential skewness in the data. At its most general, the proposed regression model fits the skew Student t distribution to residuals and random coefficients. Deviance information criterion statistics and compound Laplace‐Metropolis marginal likelihoods were used to discriminate between alternative Bayesian nonlinear mixed effects regression models. We present a relatively easy method to calculate the marginal likelihoods required to determine compound Laplace‐Metropolis marginal likelihoods, by adapting methods available in currently available statistical software. The robust methodology proposed in this paper was applied to data from 6 clinical trials. The results provide strong evidence that the distribution of CFU count is often heavy tailed and negatively skewed (suggesting the presence of outliers). Therefore, we recommend that robust regression models, such as those proposed here, should be fitted to CFU count.
PubDate: 2017-11-06T18:55:34.336609-05:
DOI: 10.1002/sim.7529
- Multi‐arm trials with multiple primary endpoints and missing values
- Authors: Mario Hasler; Ludwig A. Hothorn
Abstract: We present an extension of multiple contrast tests for multiple endpoints to the case of missing values. The endpoints are assumed to be normally distributed and correlated and to have equal covariance matrices for the different treatments. Different multivariate t distributions will be applied, differing in endpoint‐specific degrees of freedom. In contrast to competing methods, the familywise type I error rate is maintained in the strong sense within an admissible range, and the problem of differing marginal type I error rates is avoided. The information of all observations is exploited, thereby enabling a gain in power compared with a complete case analysis.
PubDate: 2017-11-06T18:50:28.328784-05:
DOI: 10.1002/sim.7542
- A Bayesian approach for analyzing zero‐inflated clustered count data with dispersion
- Authors: Hyoyoung Choo-Wosoba; Jeremy Gaskins, Steven Levy, Somnath Datta
Abstract: In practice, count data may exhibit varying dispersion patterns and excessive zero values; additionally, they may appear in groups or clusters sharing a common source of variation. We present a novel Bayesian approach for analyzing such data. To model these features, we combine the Conway‐Maxwell‐Poisson distribution, which allows both overdispersion and underdispersion, with a hurdle component for the zeros and random effects for clustering. We propose an efficient Markov chain Monte Carlo sampling scheme to obtain posterior inference from our model. Through simulation studies, we compare our hurdle Conway‐Maxwell‐Poisson model with a hurdle Poisson model to demonstrate the effectiveness of our Conway‐Maxwell‐Poisson approach. Furthermore, we apply our model to analyze an illustrative dataset containing information on the number and types of carious lesions on each tooth in a population of 9‐year‐olds from the Iowa Fluoride Study, which is an ongoing longitudinal study on a cohort of Iowa children that began in 1991.
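The Conway‐Maxwell‐Poisson component at the heart of the model has pmf proportional to λ^y/(y!)^ν, with ν < 1 giving overdispersion, ν > 1 underdispersion, and ν = 1 recovering the Poisson. A direct truncated-series evaluation (our sketch; the paper embeds this in a hurdle model with random effects):

```python
import math

def cmp_pmf(y, lam, nu, max_terms=200):
    """Conway-Maxwell-Poisson pmf P(Y = y) = lam**y / (y!)**nu / Z.
    The normalizing constant Z has no closed form, so its infinite
    series is truncated at max_terms (adequate for moderate lam);
    terms are summed on the log scale for numerical stability."""
    log_num = lambda j: j * math.log(lam) - nu * math.lgamma(j + 1)
    terms = [log_num(j) for j in range(max_terms)]
    m = max(terms)
    log_z = m + math.log(sum(math.exp(t - m) for t in terms))
    return math.exp(log_num(y) - log_z)
```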
PubDate: 2017-11-06T17:30:24.780872-05:
DOI: 10.1002/sim.7541
- Investigating the assumptions of the self‐controlled case series method
- Authors: Heather J. Whitaker; Yonas Ghebremichael-Weldeselassie, Ian J. Douglas, Liam Smeeth, C. Paddy Farrington
Abstract: We describe some simple techniques for investigating 2 key assumptions of the self‐controlled case series (SCCS) method, namely, that events do not influence subsequent exposures and that events do not influence the length of observation periods. For each assumption, we propose some simple tests based on the standard SCCS model, along with associated graphical displays. The methods also enable the user to investigate the robustness of the results obtained using the standard SCCS model to failure of assumptions. The proposed methods are investigated by simulations and applied to data on measles, mumps and rubella vaccine, and antipsychotics.
PubDate: 2017-11-02T01:20:31.332328-05:
DOI: 10.1002/sim.7536
- Modeling rater diagnostic skills in binary classification processes
- Authors: Xiaoyan Lin; Hua Chen, Don Edwards, Kerrie P. Nelson
Abstract: Many disease diagnoses involve subjective judgments by qualified raters. For example, through the inspection of a mammogram, MRI, or ultrasound image, the clinician himself becomes part of the measuring instrument. To reduce diagnostic errors and improve the quality of diagnoses, it is necessary to assess raters' diagnostic skills and to improve their skills over time. This paper focuses on a subjective binary classification process, proposing a hierarchical model linking data on rater opinions with patient true disease‐development outcomes. The model allows for the quantification of the effects of rater diagnostic skills (bias and magnifier) and patient latent disease severity on the rating results. A Bayesian Markov chain Monte Carlo (MCMC) algorithm is developed to estimate these parameters. Linking to patient true disease outcomes, the rater‐specific sensitivity and specificity can be estimated using MCMC samples. Cost theory is used to identify poor‐ and strong‐performing raters and to guide adjustment of rater bias and diagnostic magnifier to improve the rating performance. Furthermore, diagnostic magnifier is shown as a key parameter to present a rater's diagnostic ability because a rater with a larger diagnostic magnifier has a uniformly better receiver operating characteristic (ROC) curve when varying the value of diagnostic bias. A simulation study is conducted to evaluate the proposed methods, and the methods are illustrated with a mammography example.
PubDate: 2017-11-02T01:15:30.096294-05:
DOI: 10.1002/sim.7530
- Collaborative targeted learning using regression shrinkage
- Authors: Mireille E. Schnitzer; Matthew Cefalu
Abstract: Causal inference practitioners are routinely presented with the challenge of model selection and, in particular, reducing the size of the covariate set with the goal of improving estimation efficiency. Collaborative targeted minimum loss‐based estimation (CTMLE) is a general framework for constructing doubly robust semiparametric causal estimators that data‐adaptively limit model complexity in the propensity score to optimize a preferred loss function. This stepwise complexity reduction is based on a loss function placed on a strategically updated model for the outcome variable through which the error is assessed using cross‐validation. We demonstrate how the existing stepwise variable selection CTMLE can be generalized using regression shrinkage of the propensity score. We present 2 new algorithms that involve stepwise selection of the penalization parameter(s) in the regression shrinkage. Simulation studies demonstrate that, under a misspecified outcome model, mean squared error and bias can be reduced by a CTMLE procedure that separately penalizes individual covariates in the propensity score. We demonstrate these approaches in an example using electronic medical data with sparse indicator covariates to evaluate the relative safety of 2 similarly indicated asthma therapies for pregnant women with moderate asthma.
PubDate: 2017-11-02T00:50:53.409746-05:
DOI: 10.1002/sim.7527
- Combining multiple biomarkers linearly to maximize the partial area under the ROC curve
- Authors: Qingxiang Yan; Leonidas E. Bantis, Janet L. Stanford, Ziding Feng
Abstract: It is now common in clinical practice to make clinical decisions based on combinations of multiple biomarkers. In this paper, we propose new approaches for combining multiple biomarkers linearly to maximize the partial area under the receiver operating characteristic curve (pAUC). The parametric and nonparametric methods that have been developed for this purpose have limitations. When the biomarker values for populations with and without a given disease follow a multivariate normal distribution, it is easy to implement our proposed parametric approach, which adopts an alternative analytic expression of the pAUC. When normality assumptions are violated, a kernel‐based approach is presented, which handles multiple biomarkers simultaneously. We evaluated the proposed as well as existing methods through simulations and discovered that when the covariance matrices for the disease and nondisease samples are disproportional, traditional methods (such as the logistic regression) are more likely to fail to maximize the pAUC while the proposed methods are more robust. The proposed approaches are illustrated through application to a prostate cancer data set, and a rank‐based leave‐one‐out cross‐validation procedure is proposed to obtain a realistic estimate of the pAUC when there is no independent validation set available.
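Whatever combination rule is used, the objective being maximized is the pAUC of the resulting score. An empirical version restricted to a low false-positive-rate range can be computed directly; this is our sketch with a fixed FPR grid, not one of the paper's estimators.

```python
import numpy as np

def empirical_pauc(scores_pos, scores_neg, fpr_max=0.1, grid=1001):
    """Empirical partial AUC of a score over FPR in [0, fpr_max]:
    for each FPR on a grid, take the matching threshold from the
    non-diseased score distribution, record the TPR among diseased
    subjects, and integrate the ROC curve by trapezoids."""
    fpr = np.linspace(0.0, fpr_max, grid)
    thresholds = np.quantile(scores_neg, 1.0 - fpr)
    tpr = np.array([(scores_pos >= t).mean() for t in thresholds])
    return float(((tpr[1:] + tpr[:-1]) / 2.0 * np.diff(fpr)).sum())
```

A perfectly separating score attains the maximum pAUC of `fpr_max`; a score with no discrimination over the range attains roughly `fpr_max**2 / 2`.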
PubDate: 2017-10-30T00:31:04.271312-05:
DOI: 10.1002/sim.7535
- Pairwise residuals and diagnostic tests for misspecified dependence structures in models for binary longitudinal data
- Authors: Nina Breinegaard; Sophia Rabe-Hesketh, Anders Skrondal
Abstract: Maximum likelihood estimation of models for binary longitudinal data is typically inconsistent if the dependence structure is misspecified. Unfortunately, diagnostics specifically designed for detecting such misspecifications are scant. We develop residuals and diagnostic tests based on comparing observed and expected frequencies of response patterns over time in the presence of arbitrary time‐varying and time‐invariant covariates. To overcome the sparseness problem, we use lower‐order marginal tables, such as two‐way tables for pairs of time‐points, aggregated over covariate patterns. Our proposed pairwise concordance residuals are valuable for exploratory diagnostics and for constructing both generic tests for misspecified dependence structure as well as targeted adjacent pair concordance tests for excess serial dependence. The proposed methods are straightforward to implement and work well for general situations, regardless of the number of time‐points and the number and types of covariates.
PubDate: 2017-10-30T00:30:57.643124-05:
DOI: 10.1002/sim.7512
- Efficient ℓ0‐norm feature selection based on augmented and penalized minimization
- Authors: Xiang Li; Shanghong Xie, Donglin Zeng, Yuanjia Wang
Abstract: Advances in high‐throughput technologies in genomics and imaging yield unprecedentedly large numbers of prognostic biomarkers. To accommodate the scale of biomarkers and study their association with disease outcomes, penalized regression is often used to identify important biomarkers. The ideal variable selection procedure would search for the best subset of predictors, which is equivalent to imposing an ℓ0‐penalty on the regression coefficients. Since this optimization is a nondeterministic polynomial‐time hard (NP‐hard) problem that does not scale with the number of biomarkers, alternative methods mostly place smooth penalties on the regression parameters, which lead to computationally feasible optimization problems. However, empirical studies and theoretical analyses show that convex approximations of the ℓ0‐norm (eg, ℓ1) do not outperform their ℓ0 counterpart. Progress on ℓ0‐norm feature selection has been relatively slow, with the main methods being greedy algorithms such as stepwise regression or orthogonal matching pursuit. Penalized regression based on regularizing the ℓ0‐norm remains much less explored in the literature. In this work, inspired by the recently popular augmenting and data splitting algorithms, including the alternating direction method of multipliers, we propose a 2‐stage procedure for ℓ0‐penalty variable selection, referred to as augmented penalized minimization‐L0 (APM‐L0). APM‐L0 targets the ℓ0‐norm as closely as possible while keeping computation tractable, efficient, and simple, which is achieved by iterating between a convex regularized regression and a simple hard‐thresholding estimation. The procedure can be viewed as arising from regularized optimization with a truncated ℓ1 norm. Thus, we propose to treat the regularization parameter and thresholding parameter as tuning parameters and to select them via cross‐validation. A 1‐step coordinate descent algorithm is used in the first stage to significantly improve computational efficiency.
Through extensive simulation studies and real data application, we demonstrate superior performance of the proposed method in terms of selection accuracy and computational speed as compared to existing methods. The proposed APM‐L0 procedure is implemented in the R‐package APML0.
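The alternation described above, a convex regularized fit followed by hard-thresholding, can be caricatured for ordinary linear regression. We use ridge in place of the paper's penalized first stage and a fixed support size `k` in place of a tuned threshold; this is not the algorithm in the APML0 package.

```python
import numpy as np

def ridge_threshold_sketch(X, y, k, lam=0.1, iters=10):
    """Alternate (1) a convex stage: ridge regression restricted to the
    current support, and (2) an l0 stage: hard-threshold to the k
    largest coefficients. Returns a vector with at most k nonzeros.
    In practice lam and k would be chosen by cross-validation."""
    n, p = X.shape
    support = np.arange(p)
    for _ in range(iters):
        Xs = X[:, support]
        b = np.linalg.solve(Xs.T @ Xs + lam * np.eye(len(support)), Xs.T @ y)
        full = np.zeros(p)
        full[support] = b
        # keep the k coefficients largest in absolute value
        support = np.sort(np.argsort(np.abs(full))[::-1][:k])
    # final refit on the selected support
    beta = np.zeros(p)
    Xs = X[:, support]
    beta[support] = np.linalg.solve(Xs.T @ Xs + lam * np.eye(len(support)),
                                    Xs.T @ y)
    return beta
```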
PubDate: 2017-10-30T00:25:36.307735-05:
DOI: 10.1002/sim.7526
- Weighted estimation for confounded binary outcomes subject to misclassification
- Authors: Christopher A. Gravel; Robert W. Platt
Abstract: In the presence of confounding, the consistency assumption required for identification of causal effects may be violated due to misclassification of the outcome variable. We introduce an inverse probability weighted approach to rebalance covariates across treatment groups while mitigating the influence of differential misclassification bias. First, using a simplified example taken from an administrative health care dataset, we introduce the approach for estimation of the marginal causal odds ratio in a simple setting with the use of internal validation information. We then extend this to the presence of additional covariates and use simulated data to investigate the finite sample properties of the proposed weighted estimators. Estimation of the weights is done using logistic regression with misclassified outcomes, and a bootstrap approach is used for variance estimation.
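Given fitted treatment probabilities, the weighting step itself is straightforward. Below is a sketch of the marginal odds ratio from inverse probability of treatment weights; the paper's contribution is estimating the weights when the outcome is misclassified, which this sketch does not attempt.

```python
import numpy as np

def ipw_marginal_or(treat, outcome, propensity):
    """Marginal odds ratio from inverse-probability-of-treatment
    weighting: treated subjects get weight 1/e(x) and controls
    1/(1 - e(x)), rebalancing covariates across treatment groups,
    then weighted outcome prevalences are compared on the odds scale."""
    treat = np.asarray(treat, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    w1 = treat / propensity
    w0 = (1.0 - treat) / (1.0 - propensity)
    mu1 = (w1 * outcome).sum() / w1.sum()
    mu0 = (w0 * outcome).sum() / w0.sum()
    return (mu1 / (1.0 - mu1)) / (mu0 / (1.0 - mu0))
```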
PubDate: 2017-10-30T00:21:19.269249-05:
DOI: 10.1002/sim.7522
- Regression analysis of interval‐censored failure time data with possibly crossing hazards
- Authors: Han Zhang; Peijie Wang, Jianguo Sun
Abstract: Interval‐censored failure time data occur in many areas, especially in medical follow‐up studies such as clinical trials, and in consequence, many methods have been developed for the problem. However, most of the existing approaches cannot deal with situations where the hazard functions may cross each other. To address this, we develop a sieve maximum likelihood estimation procedure with the application of the short‐term and long‐term hazard ratio model. In the method, I‐splines are used to approximate the underlying unknown function. An extensive simulation study was conducted to assess the finite sample properties of the presented procedure and suggests that the method works well in practical situations. The analysis of a motivating example is also provided.
PubDate: 2017-10-23T23:22:55.33647-05:0
DOI: 10.1002/sim.7538
- Detection of gene–environment interactions in a family‐based population using SCAD
- Authors: Gwangsu Kim; Chao-Qiang Lai, Donna K. Arnett, Laurence D. Parnell, Jose M. Ordovas, Yongdai Kim, Joungyoun Kim
PubDate: 2017-10-23T23:22:51.295252-05:
DOI: 10.1002/sim.7537
- Meta‐analysis approaches to combine multiple gene set enrichment studies
- Authors: Wentao Lu; Xinlei Wang, Xiaowei Zhan, Adi Gazdar
Abstract: In the field of gene set enrichment analysis (GSEA), meta‐analysis has been used to integrate information from multiple studies to present a reliable summarization of the expanding volume of individual biomedical research, as well as to improve the power of detecting essential gene sets involved in complex human diseases. However, an existing method, Meta‐Analysis for Pathway Enrichment (MAPE), may be subject to power loss because of (1) using gross summary statistics for combining end results from component studies and (2) using enrichment scores whose distributions depend on the set sizes. In this paper, we adapt meta‐analysis approaches recently developed for genome‐wide association studies, which are based on fixed effect (FE) and random effects (RE) models, to integrate multiple GSEA studies. We further develop a mixed strategy via adaptive testing for choosing RE versus FE models to achieve greater statistical efficiency as well as flexibility. In addition, a size‐adjusted enrichment score based on a one‐sided Kolmogorov‐Smirnov statistic is proposed to formally account for varying set sizes when testing multiple gene sets. Our methods tend to have much better performance than the MAPE methods and can be applied to both discrete and continuous phenotypes. Specifically, the performance of the adaptive testing method seems to be the most stable in general situations.
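The FE building block here is the usual inverse-variance combination; the RE model inflates each study's variance by a between-study component τ², and the adaptive strategy chooses between the two. A minimal sketch (treating τ² as known rather than estimated):

```python
import numpy as np

def inverse_variance_meta(estimates, ses, tau2=0.0):
    """Inverse-variance meta-analysis of per-study estimates.
    tau2 = 0 gives the fixed effect (FE) pooled estimate; tau2 > 0
    gives the random effects (RE) version with a known between-study
    variance, which widens the pooled standard error."""
    est = np.asarray(estimates, dtype=float)
    w = 1.0 / (np.asarray(ses, dtype=float) ** 2 + tau2)
    pooled = (w * est).sum() / w.sum()
    return pooled, w.sum() ** -0.5
```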
PubDate: 2017-10-19T23:56:44.054671-05:
DOI: 10.1002/sim.7540
- Three‐part joint modeling methods for complex functional data mixed with zero‐and‐one–inflated proportions and zero‐inflated continuous outcomes with skewness
- Authors: Haocheng Li; John Staudenmayer, Tianying Wang, Sarah Kozey Keadle, Raymond J. Carroll
Abstract: We take a functional data approach to longitudinal studies with complex bivariate outcomes. This work is motivated by data from a physical activity study that measured 2 responses over time in 5‐minute intervals. One response is the proportion of time active in each interval, a continuous proportion with excess zeros and ones. The other response, energy expenditure rate in the interval, is a continuous variable with excess zeros and skewness. This outcome is complex because there are 3 possible activity patterns in each interval (inactive, partially active, and completely active), and those patterns, which are observed, induce both nonrandom and random associations between the responses. More specifically, the inactive pattern requires a zero value in both the proportion for active behavior and the energy expenditure rate; a partially active pattern means that the proportion of activity is strictly between zero and one and that the energy expenditure rate is greater than zero and likely to be moderate; and the completely active pattern means that the proportion of activity is exactly one and the energy expenditure rate is greater than zero and likely to be higher. To address these challenges, we propose a 3‐part functional data joint modeling approach. The first part is a continuation‐ratio model for the 3 ordered activity patterns. The second part models the proportions when they are in the interval (0,1). The last component specifies the skewed continuous energy expenditure rate with Box‐Cox transformations when it is greater than zero. In this 3‐part model, the regression structures are specified as smooth curves measured at various time points with random effects that have a correlation structure. The smoothed random curves for each variable are summarized using a few important principal components, and the association of the 3 longitudinal components is modeled through the association of the principal component scores.
The difficulties in handling the ordinal and proportional variables are addressed using a quasi‐likelihood type approximation. We develop an efficient algorithm to fit the model that also involves the selection of the number of principal components. The method is applied to physical activity data and is evaluated empirically by a simulation study.
PubDate: 2017-10-19T23:55:30.675496-05:
DOI: 10.1002/sim.7534
- Measuring precision in bioassays: Rethinking assay validation
- Authors: Michael P. Fay; Michael C. Sachs, Kazutoyo Miura
Abstract: The m:n:θb procedure is often used for validating an assay for precision, where m levels of an analyte are measured with n replicates at each level, and if all m estimates of the coefficient of variation (CV) are less than θb, then the assay is declared validated for precision. The statistical properties of the procedure are unknown, so there is no clear statistical statement of precision upon passing. Further, it is unclear how to modify the procedure for relative potency assays in which the constant standard deviation (SD) model fits much better than the traditional constant CV model. We use simple normal error models to show that under constant CV across the m levels, the probability of passing when the CV is θb is about 10% to 20% for some recommended implementations; however, for extreme heterogeneity of CV when the largest CV is θb, the passing probability can be greater than 50%. We derive 100q% upper confidence limits on the CV under constant CV models and derive analogous limits for the SD under a constant SD model. Additionally, for a post‐validation assay output of y, we derive 68.27% confidence intervals on either the mean or log geometric mean of the assay output using either y±s (for the constant SD model) or log(y)±rG (for the constant CV model), where s and rG are constants that do not depend on y. We demonstrate the methods on a growth inhibition assay used to measure biologic activity of antibodies against the malaria parasite.
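The pass/fail rule itself is mechanical, and that mechanical simplicity is exactly what hides its unknown statistical properties:

```python
import numpy as np

def mnb_passes(replicates_by_level, theta_b):
    """The m:n:theta_b procedure: estimate the CV (sample SD over the
    mean) from the n replicates at each of the m analyte levels, and
    declare the assay validated for precision only if every estimated
    CV falls below theta_b. Returns (pass_flag, list_of_cvs)."""
    cvs = [float(np.std(reps, ddof=1) / np.mean(reps))
           for reps in replicates_by_level]
    return all(cv < theta_b for cv in cvs), cvs
```

Because each CV is only an estimate, an assay whose true CV equals θb can still pass with nontrivial probability, which is the problem the paper's confidence limits address.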
PubDate: 2017-10-19T23:35:48.037985-05:00
DOI: 10.1002/sim.7528
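The m:n:θb passing rule described in the abstract above is simple enough to examine by simulation. Below is a minimal sketch (not the authors' code) that estimates the probability of passing validation under a constant-CV normal model; the analyte levels, default settings, and function names are illustrative assumptions.

```python
import random
import statistics

def passes_validation(n, cv, theta_b, means, rng):
    """One simulated m:n:theta_b run under a constant-CV normal model:
    draw n replicates at each analyte level with SD = cv * mean, and
    pass only if every estimated CV falls below theta_b."""
    for mu in means:
        reps = [rng.gauss(mu, cv * mu) for _ in range(n)]
        est_cv = statistics.stdev(reps) / statistics.mean(reps)
        if est_cv >= theta_b:
            return False
    return True

def passing_probability(m=4, n=6, cv=0.20, theta_b=0.20, sims=20_000, seed=1):
    """Monte Carlo estimate of the probability of passing validation
    when the true CV is constant across the m levels."""
    rng = random.Random(seed)
    means = [10.0 * (i + 1) for i in range(m)]  # illustrative analyte levels
    hits = sum(passes_validation(n, cv, theta_b, means, rng) for _ in range(sims))
    return hits / sims
```

With the true CV set equal to θb, such a simulation makes the abstract's point concrete: all m estimated CVs must fall below the threshold simultaneously, so the passing probability is well below one half, and passing carries no clear precision statement until the operating characteristics are worked out.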
- Hierarchical Archimedean copula models for the analysis of binary familial
data- Authors: Yihao Deng; N. Rao Chaganty
Abstract: Archimedean copulas are commonly used in a wide range of statistical models due to their simplicity, manageable analytical expressions, rich choices of generator functions, and other workable properties. However, the exchangeable dependence structure inherent to Archimedean copulas limits their application to familial data, where the dependence among family members is often different. When response variables are binary, modeling the familial associations becomes more challenging due to the stringent constraints imposed on the dependence parameters. This paper proposes hierarchical Archimedean copulas to account for the natural hierarchical dependence structure in familial data and addresses the details of modeling binary familial data and of inference based on maximum likelihood estimation. An example showing the flexibility of this powerful tool is also presented, with possible extensions to other similar studies.
PubDate: 2017-10-17T00:01:13.749534-05:00
DOI: 10.1002/sim.7521
- A mechanistic nonlinear model for censored and mismeasured covariates in
longitudinal models, with application in AIDS studies- Authors: Hongbin Zhang; Hubert Wong, Lang Wu
Abstract: When modeling longitudinal data, the true values of time‐varying covariates may be unknown because of detection‐limit censoring or measurement error. A common approach in the literature is to empirically model the covariate process based on observed data and then predict the censored values or mismeasured values based on this empirical model. Such an empirical model can be misleading, especially for censored values since the (unobserved) censored values may behave very differently than observed values due to the underlying data‐generation mechanisms or disease status. In this paper, we propose a mechanistic nonlinear covariate model based on the underlying data‐generation mechanisms to address censored values and mismeasured values. Such a mechanistic model is based on solid scientific or biological arguments, so the predicted censored or mismeasured values are more reasonable. We use a Monte Carlo EM algorithm for likelihood inference and apply the methods to an AIDS dataset, where viral load is censored by a lower detection limit. Simulation results confirm that the proposed models and methods offer substantial advantages over existing empirical covariate models for censored and mismeasured covariates.
PubDate: 2017-10-16T02:10:29.26938-05:00
DOI: 10.1002/sim.7515
- Decision theory for comparing institutions
- Authors: Nicholas T. Longford
Abstract: Various forms of performance assessment are applied to public service institutions, such as hospitals, schools, police units, and local authorities. Difficulties arise in the interpretation of the results presented in some established formats because they require a good understanding and appreciation of the uncertainties involved. Usually the results have to be adapted to the perspectives of the users—managers of the assessed units, a consumer, or a central authority (a watchdog) that dispenses awards and sanctions. We present a decision‐theoretical approach to these and related problems in which the perspectives are integrated in the analysis and its results are choices from a finite list of options (alternative courses of action).
PubDate: 2017-10-16T02:05:33.620893-05:00
DOI: 10.1002/sim.7525
- Dissecting gene‐environment interactions: A penalized robust approach
accounting for hierarchical structures- Authors: Cen Wu; Yu Jiang, Jie Ren, Yuehua Cui, Shuangge Ma
Abstract: Identification of gene‐environment (G × E) interactions associated with disease phenotypes has posed a great challenge in high‐throughput cancer studies. The existing marginal identification methods have suffered from not being able to accommodate the joint effects of a large number of genetic variants, while some of the joint‐effect methods have been limited by failing to respect the “main effects, interactions” hierarchy, by ignoring data contamination, and by using inefficient selection techniques under complex structural sparsity. In this article, we develop an effective penalization approach to identify important G × E interactions and main effects, which can account for the hierarchical structures of the 2 types of effects. Possible data contamination is accommodated by adopting the least absolute deviation loss function. The advantage of the proposed approach over the alternatives is convincingly demonstrated in both simulation and a case study on lung cancer prognosis with gene expression measurements and clinical covariates under the accelerated failure time model.
PubDate: 2017-10-16T02:01:08.424936-05:00
DOI: 10.1002/sim.7518
- Identifying gene‐gene interactions using penalized tensor regression
- Authors: Mengyun Wu; Jian Huang, Shuangge Ma
Abstract: Gene‐gene (G×G) interactions have been shown to be critical for the fundamental mechanisms and development of complex diseases beyond main genetic effects. The commonly adopted marginal analysis is limited by considering only a small number of G factors at a time. With the “main effects, interactions” hierarchical constraint, many of the existing joint analysis methods suffer from prohibitively high computational cost. In this study, we propose a new method for identifying important G×G interactions under joint modeling. The proposed method adopts tensor regression to accommodate high data dimensionality and the penalization technique for selection. It naturally accommodates the strong hierarchical structure without imposing additional constraints, making optimization much simpler and faster than in the existing studies. It outperforms multiple alternatives in simulation. The analysis of The Cancer Genome Atlas (TCGA) data on lung cancer and melanoma demonstrates that it can identify markers with important implications and better prediction performance.
PubDate: 2017-10-16T02:00:36.944991-05:00
DOI: 10.1002/sim.7523
- A review of tensor‐based methods and their application to hospital
care data- Authors: Paolo Giordani; Henk A.L. Kiers
Abstract: In many situations, a researcher is interested in the analysis of the scores of a set of observation units on a set of variables. In medicine, however, it is very common for the information to be replicated at different occasions. The occasions can be time‐varying or refer to different conditions. In such cases, the data can be stored in a 3‐way array or tensor. The Candecomp/Parafac and Tucker3 methods represent the most common methods for analyzing 3‐way tensors. In this work, a review of these methods is provided, and this class of methods is then applied to a 3‐way data set of hospital care data for a hospital in Rome (Italy), covering 15 years divided into 3 groups of consecutive years (1892–1896, 1940–1944, 1968–1972). The analysis reveals some distinctive aspects of the use of health services and its evolution over time.
PubDate: 2017-10-10T20:35:53.362607-05:00
DOI: 10.1002/sim.7514
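As a small illustration of the 3-way data layout the review above discusses, the sketch below stores a units × variables × occasions array as nested lists and performs a mode-1 matricization, the flattened form on which methods such as Candecomp/Parafac and Tucker3 typically operate. The ordering convention and function name are illustrative assumptions, not tied to any specific implementation.

```python
def mode1_unfold(tensor):
    """Mode-1 matricization of a 3-way array stored as nested lists
    (units x variables x occasions): each unit becomes one row that
    concatenates its variable-by-occasion slice in row-major order."""
    return [[x for row in unit for x in row] for unit in tensor]
```

For example, a 2 × 2 × 2 array `[[[1, 2], [3, 4]], [[5, 6], [7, 8]]]` unfolds to a 2 × 4 matrix with one row per unit.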
- Covariate adjustment using propensity scores for dependent censoring
problems in the accelerated failure time model- Authors: Youngjoo Cho; Chen Hu, Debashis Ghosh
Abstract: In many medical studies, estimation of the association between treatment and outcome of interest is often of primary scientific interest. Standard methods for its evaluation in survival analysis typically require the assumption of independent censoring. This assumption might be invalid in many medical studies, where the presence of dependent censoring leads to difficulties in analyzing covariate effects on disease outcomes. This data structure is called “semicompeting risks data,” for which many authors have proposed an artificial censoring technique. However, confounders with large variability may lead to excessive artificial censoring, which subsequently results in numerically unstable estimation. In this paper, we propose a strategy for weighted estimation of the associations in the accelerated failure time model. Weights are based on propensity score modeling of the treatment conditional on confounder variables. This novel application of propensity scores avoids excess artificial censoring caused by the confounders and simplifies computation. Monte Carlo simulation studies and application to AIDS and cancer research are used to illustrate the methodology.
PubDate: 2017-10-10T20:22:00.494504-05:00
DOI: 10.1002/sim.7513
- Sensitivity analysis for publication bias in meta‐analysis of diagnostic
studies for a continuous biomarker- Authors: Satoshi Hattori; Xiao-Hua Zhou
Abstract: Publication bias is one of the most important issues in meta‐analysis. For standard meta‐analyses examining intervention effects, the funnel plot and the trim‐and‐fill method are simple and widely used techniques for assessing and adjusting for the influence of publication bias, respectively. However, their use may be subjective and can then produce misleading insights. To make a more objective inference for publication bias, various sensitivity analysis methods have been proposed, including the Copas selection model. For meta‐analysis of diagnostic studies evaluating a continuous biomarker, the summary receiver operating characteristic (sROC) curve is a very useful method in the presence of heterogeneous cutoff values. To the best of our knowledge, no methods are available for evaluating the influence of publication bias on estimation of the sROC curve. In this paper, we introduce a Copas‐type selection model for meta‐analysis of diagnostic studies and propose a sensitivity analysis method for publication bias. Our method enables us to assess the influence of publication bias on the estimation of the sROC curve and thus to judge whether the result of the meta‐analysis is sufficiently reliable or should be interpreted with caution. We illustrate our proposed method with real data.
PubDate: 2017-10-09T01:21:57.717623-05:00
DOI: 10.1002/sim.7510
- Semiparametric accelerated failure time cure rate mixture models with
competing risks- Authors: Sangbum Choi; Liang Zhu, Xuelin Huang
Abstract: Modern medical treatments have substantially improved survival rates for many chronic diseases and have generated considerable interest in developing cure fraction models for survival data with a non‐ignorable cured proportion. Statistical analysis of such data may be further complicated by competing risks that involve multiple types of endpoints. Regression analysis of competing risks is typically undertaken via a proportional hazards model adapted on cause‐specific hazard or subdistribution hazard. In this article, we propose an alternative approach that treats competing events as distinct outcomes in a mixture. We consider semiparametric accelerated failure time models for the cause‐conditional survival function that are combined through a multinomial logistic model within the cure‐mixture modeling framework. The cure‐mixture approach to competing risks provides a means to determine the overall effect of a treatment and insights into how this treatment modifies the components of the mixture in the presence of a cure fraction. The regression and nonparametric parameters are estimated by a nonparametric kernel‐based maximum likelihood estimation method. Variance estimation is achieved through resampling methods for the kernel‐smoothed likelihood function. Simulation studies show that the procedures work well in practical settings. Application to a sarcoma study demonstrates the use of the proposed method for competing risk data with a cure fraction.
PubDate: 2017-10-06T00:06:11.574534-05:00
DOI: 10.1002/sim.7508
- Simultaneous inference for factorial multireader diagnostic trials
- Authors: Frank Konietschke; Randolph R. Aguayo, Wieland Staab
Abstract: We study inference methods for the analysis of multireader diagnostic trials. In these studies, data are usually collected in terms of a factorial design involving the factors Modality and Reader. Furthermore, repeated measures appear in a natural way since the same patient is observed under different modalities by several readers and the repeated measures may have a quite involved dependency structure. The hypotheses are formulated in terms of the areas under the ROC curves. Currently, only global testing procedures exist for the analysis of such data. We derive rank‐based multiple contrast test procedures and simultaneous confidence intervals which take the correlation between the test statistics into account. The procedures allow for testing arbitrary multiple hypotheses. Extensive simulation studies show that the new approaches control the nominal type 1 error rate very satisfactorily. A real data set illustrates the application of the proposed methods.
PubDate: 2017-10-05T02:05:10.410371-05:00
DOI: 10.1002/sim.7507
- Generalized linear mixed model for binary outcomes when covariates are
subject to measurement errors and detection limits- Authors: Xianhong Xie; Xiaonan Xue, Howard D. Strickler
Abstract: Longitudinal measurement of biomarkers is important in determining risk factors for binary endpoints such as infection or disease. However, biomarkers are subject to measurement error, and some are also subject to left‐censoring due to a lower limit of detection. Statistical methods to address these issues are few. We herein propose a generalized linear mixed model and estimate the model parameters using the Monte Carlo Newton‐Raphson (MCNR) method. Inferences regarding the parameters are made by applying Louis's method and the delta method. Simulation studies were conducted to compare the proposed MCNR method with existing methods, including the maximum likelihood (ML) method and the ad hoc approach of replacing the left‐censored values with half of the detection limit (HDL). The results showed that the performance of the MCNR method is superior to ML and HDL with respect to the empirical standard error, as well as the coverage probability for the 95% confidence interval. The HDL method uses an incorrect imputation method, and its computation is constrained by the number of quadrature points; while the ML method also suffers from this constraint on the number of quadrature points, the MCNR method does not have this limitation and approximates the likelihood function better than the other methods. The improvement of the MCNR method is further illustrated with real‐world data from a longitudinal study of local cervicovaginal HIV viral load and its effects on oncogenic HPV detection in HIV‐positive women.
PubDate: 2017-10-05T02:05:10.399533-05:00
DOI: 10.1002/sim.7509
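The ad hoc HDL approach that the abstract above uses as a comparator is easy to state in code. A minimal sketch, with hypothetical function and argument names (the abstract names the idea, not this implementation):

```python
def impute_half_dl(values, detection_limit):
    """Ad hoc fix for left-censoring that the abstract compares against:
    replace any reading below the assay's lower detection limit with
    half the detection limit, leaving observed readings unchanged."""
    return [v if v >= detection_limit else detection_limit / 2 for v in values]
```

The abstract's point is that this simple substitution is an incorrect imputation model, which is why likelihood-based alternatives such as the MCNR method perform better.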
- Correlated Poisson models for age‐period‐cohort analysis
- Authors: Pavel Chernyavskiy; Mark P. Little, Philip S. Rosenberg
Abstract: Age‐period‐cohort (APC) models are widely used to analyze population‐level rates, particularly cancer incidence and mortality. These models are used for descriptive epidemiology, comparative risk analysis, and extrapolating future disease burden. Traditional APC models have 2 major limitations: (1) they lack parsimony because they require estimation of deviations from linear trends for each level of age, period, and cohort; and (2) rates observed at similar ages, periods, and cohorts are treated as independent, ignoring any correlations between them that may lead to biased parameter estimates and inefficient standard errors. We propose a novel approach to estimation of APC models using a spatially correlated Poisson model that accounts for over‐dispersion and correlations in age, period, and cohort, simultaneously. We treat the outcome of interest as event rates occurring over a grid defined by values of age, period, and cohort. Rates defined in this manner lend themselves to well‐established approaches from spatial statistics in which correlation among proximate observations may be modeled using a spatial random effect. Through simulations, we show that in the presence of spatial dependence and over‐dispersion: (1) the correlated Poisson model attains lower AIC; (2) the traditional APC model produces biased trend parameter estimates; and (3) the correlated Poisson model corrects most of this bias. We illustrate our approach using brain and breast cancer incidence rates from the Surveillance Epidemiology and End Results Program of the United States. Our approach can be easily extended to accommodate comparative risk analyses and interpolation of cells in the Lexis diagram with sparse data.
PubDate: 2017-10-04T21:41:53.961392-05:00
DOI: 10.1002/sim.7519
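The grid construction behind the approach above can be sketched directly: rates sit on an age × period grid, the birth cohort of a cell is period minus age (so diagonals share a cohort), and a rook-adjacency neighbourhood of the kind used by CAR-type spatial random effects links proximate cells. This is an illustrative sketch under those standard conventions, not the authors' implementation.

```python
def apc_grid(ages, periods):
    """Birth cohort for each cell of an age x period grid: cohort is
    period minus age, so grid diagonals share a cohort."""
    return [[p - a for p in periods] for a in ages]

def rook_neighbors(n_age, n_period):
    """Rook-adjacency pairs over the age x period grid; this is the
    neighbourhood structure a CAR-type spatial random effect can use
    to correlate rates at proximate ages and periods."""
    pairs = []
    for i in range(n_age):
        for j in range(n_period):
            if i + 1 < n_age:
                pairs.append(((i, j), (i + 1, j)))
            if j + 1 < n_period:
                pairs.append(((i, j), (i, j + 1)))
    return pairs
```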
- Improving likelihood‐based inference in control rate regression
- Authors: Annamaria Guolo
Abstract: Control rate regression is a widely used approach to account for heterogeneity among studies in meta‐analysis by including information about the outcome risk of patients in the control condition. Correcting for the presence of measurement error affecting risk information in the treated and in the control group has been recognized as a necessary step to derive reliable inferential conclusions. Within this framework, the paper considers the problem of small sample size as an additional source of misleading inference about the slope of the control rate regression. Likelihood procedures relying on first‐order approximations are shown to be substantially inaccurate, especially when dealing with increasing heterogeneity and correlated measurement errors. We suggest addressing the problem by relying on higher‐order asymptotics. In particular, we derive Skovgaard's statistic as an instrument to improve the accuracy of the approximation of the signed profile log‐likelihood ratio statistic to the standard normal distribution. The proposal is shown to provide much more accurate results than standard likelihood solutions, with no appreciable computational effort. The advantages of Skovgaard's statistic in control rate regression are shown in a series of simulation experiments and illustrated in a real data example. R code for applying first‐ and second‐order statistics for inference on the slope of the control rate regression is provided.
PubDate: 2017-10-04T00:10:42.087624-05:00
DOI: 10.1002/sim.7511
- Direct likelihood inference on the cause‐specific cumulative incidence
function: A flexible parametric regression modelling approach- Authors: Sarwar Islam Mozumder; Mark Rutherford, Paul Lambert
Abstract: In a competing risks analysis, interest lies in the cause‐specific cumulative incidence function (CIF), which can be calculated by either (1) transforming the cause‐specific hazard or (2) using its direct relationship with the subdistribution hazard. We expand on current competing risks methodology from within the flexible parametric survival modelling framework (FPM) and focus on approach (2). This models all cause‐specific CIFs simultaneously and is more useful when addressing questions of prognosis. We also extend cure models using an approach similar to that described by Andersson et al for flexible parametric relative survival models. Using SEER public use colorectal data, we compare and contrast our approach with standard methods such as the Fine & Gray model and show that many useful out‐of‐sample predictions can be made after modelling the cause‐specific CIFs using an FPM approach. Alternative link functions, such as the logit link, may also be incorporated. Models can also be easily extended for time‐dependent effects.
PubDate: 2017-10-02T23:05:35.804714-05:00
DOI: 10.1002/sim.7498
- Treatment evaluation for a data‐driven subgroup in adaptive enrichment
designs of clinical trials- Authors: Zhiwei Zhang; Ruizhe Chen, Guoxing Soon, Hui Zhang
Abstract: Adaptive enrichment designs (AEDs) of clinical trials allow investigators to restrict enrollment to a promising subgroup based on an interim analysis. Most existing AEDs deal with a small number of predefined subgroups, although appropriate subgroups are often unknown at the design stage. The newly developed Simon design offers a great deal of flexibility in subgroup selection (without requiring predefined subgroups) but does not provide a procedure for estimating and testing treatment efficacy for the selected subgroup. This article proposes a 2‐stage AED which does not require predefined subgroups but requires a prespecified algorithm for choosing a subgroup on the basis of baseline covariate information. Having a prespecified algorithm for subgroup selection makes it possible to use cross‐validation and bootstrap methods to correct for the resubstitution bias in estimating treatment efficacy for the selected subgroup. The methods are evaluated and compared in a simulation study mimicking actual clinical trials of human immunodeficiency virus infection.
PubDate: 2017-09-26T00:10:36.891114-05:00
DOI: 10.1002/sim.7497
- Efficient treatment allocation in 2 × 2 multicenter trials when costs
and variances are heterogeneous- Authors: Francesca Lemme; Gerard J.P. Breukelen, Math J.J.M. Candel
Abstract: At the design stage of a study, it is crucial to compute the sample size needed for treatment effect estimation with maximum precision and power. The optimal design depends on the costs, which may be known at the design stage, and on the outcome variances, which are unknown. A balanced design, optimal for homogeneous costs and variances, is typically used. An alternative to the balanced design is a design optimal for the known and possibly heterogeneous costs, and homogeneous variances, called costs considering design. Both designs suffer from loss of efficiency, compared with optimal designs for heterogeneous costs and variances. For 2 × 2 multicenter trials, we compute the relative efficiency of the balanced and the costs considering designs, relative to the optimal designs. We consider 2 heterogeneous costs and variance scenarios (in 1 scenario, 2 treatment conditions have small and 2 have large costs and variances; in the other scenario, 1 treatment condition has small, 2 have intermediate, and 1 has large costs and variances). Within these scenarios, we examine the relative efficiency of the balanced design and of the costs considering design as a function of the extents of heterogeneity of the costs and of the variances and of their congruence (congruent when the cheapest treatment has the smallest variance, incongruent when the cheapest treatment has the largest variance). We find that the costs considering design is generally more efficient than the balanced design, and we illustrate this theory on a 2 × 2 multicenter trial on lifestyle improvement of patients in general practices.
PubDate: 2017-09-25T19:40:45.322378-05:00
DOI: 10.1002/sim.7499
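The trade-off the abstract above studies can be illustrated with the textbook cost-optimal allocation for independent arms, n_i ∝ σ_i/√c_i. This is a standard result for minimizing the variance of a contrast under a budget constraint, not necessarily the authors' exact design formulas; the function names and example numbers are illustrative.

```python
import math

def optimal_allocation(sds, costs, budget):
    """Cost-optimal per-arm sample sizes for independent arms: the
    textbook rule n_i proportional to sigma_i / sqrt(c_i), scaled so
    the total cost exactly exhausts the budget."""
    w = [s / math.sqrt(c) for s, c in zip(sds, costs)]
    scale = budget / sum(wi * c for wi, c in zip(w, costs))
    return [scale * wi for wi in w]

def contrast_variance(sds, ns):
    """Variance of an equally weighted contrast of the arm means."""
    return sum(s ** 2 / n for s, n in zip(sds, ns))
```

Comparing `contrast_variance` under the optimal allocation with that of a balanced design (equal n per arm at the same budget) gives the relative efficiency studied in the paper; with incongruent costs and variances the balanced design loses efficiency.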
- Type I error probability spending for post–market drug and vaccine
safety surveillance with binomial data- Authors: Ivair R. Silva
Abstract: Type I error probability spending functions are commonly used for designing sequential analysis of binomial data in clinical trials, and they are also quickly emerging for near‐continuous sequential analysis in post‐market drug and vaccine safety surveillance. It is well known that, in clinical trials, it is important to minimize the expected sample size even when the null hypothesis is not rejected. This is not the case in post‐market drug and vaccine safety surveillance: there, especially when the surveillance involves identification of potential signals, the meaningful statistical performance measure to be minimized is the expected sample size when the null hypothesis is rejected. The present paper shows that, instead of the convex Type I error spending shape conventionally used in clinical trials, a concave shape is better suited for post‐market drug and vaccine safety surveillance. This is shown for both continuous and group sequential analysis.
PubDate: 2017-09-25T19:35:54.569145-05:00
DOI: 10.1002/sim.7504
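The convex-versus-concave contrast in the abstract above can be illustrated with the standard power family of spending functions, α(t) = α·t^ρ, which is convex for ρ > 1 (spends little error at early looks, as is conventional in clinical trials) and concave for ρ < 1 (spends more error early). This family is a common illustrative choice, not necessarily the shapes used in the paper.

```python
def power_spending(alpha, t, rho):
    """Power-family Type I error spending function alpha * t**rho,
    evaluated at information fraction t in [0, 1]."""
    return alpha * t ** rho

def incremental_spend(alpha, looks, rho):
    """Error probability newly spent at each look, given increasing
    information fractions in `looks` ending at 1.0; the increments
    always sum to the overall alpha."""
    cum = [power_spending(alpha, t, rho) for t in looks]
    return [cum[0]] + [b - a for a, b in zip(cum, cum[1:])]
```

For the same looks and overall α, the concave shape (ρ < 1) allocates far more error to the first look than the convex shape (ρ > 1), which is exactly the property that favors early signal detection in safety surveillance.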
- Validating effectiveness of subgroup identification for longitudinal data
- Authors: Nichole Andrews; Hyunkeun Cho
Abstract: In clinical trials and biomedical studies, treatments are compared to determine which one is effective against illness; however, individuals can react to the same treatment very differently. We propose a complete process for longitudinal data that identifies subgroups of the population that would benefit from a specific treatment. A random effects linear model is used to evaluate individual treatment effects longitudinally, where the random effects identify a positive or negative reaction to the treatment over time. With the individual treatment effects and characteristics of the patients, various classification algorithms are applied to build prediction models for subgrouping. While many subgrouping approaches have been developed recently, most of them do not assess the validity of the identified subgroups. In this paper, we further propose a simple validation approach which not only determines whether the subgroups used are appropriate and beneficial but also compares methods for predicting individual treatment effects. This entire procedure is readily implemented by existing packages in statistical software. The effectiveness of the proposed method is confirmed with simulation studies and analysis of data from the Women Entering Care study on depression.
PubDate: 2017-09-25T19:30:28.623337-05:00
DOI: 10.1002/sim.7500
- Label‐invariant models for the analysis of
meta‐epidemiological data- Authors: K.M. Rhodes; D. Mawdsley, R.M. Turner, H.E. Jones, J. Savović, J.P.T. Higgins
Abstract: Rich meta‐epidemiological data sets have been collected to explore associations between intervention effect estimates and study‐level characteristics. Welton et al proposed models for the analysis of meta‐epidemiological data, but these models are restrictive because they force heterogeneity among studies with a particular characteristic to be at least as large as that among studies without the characteristic. In this paper we present alternative models that are invariant to the labels defining the 2 categories of studies. To exemplify the methods, we use a collection of meta‐analyses in which the Cochrane Risk of Bias tool has been implemented. We first investigate the influence of small trial sample sizes (less than 100 participants), before investigating the influence of multiple methodological flaws (inadequate or unclear sequence generation, allocation concealment, and blinding). We fit both the Welton et al model and our proposed label‐invariant model and compare the results. Estimates of mean bias associated with the trial characteristics and of between‐trial variances are not very sensitive to the choice of model. Results from fitting a univariable model show that heterogeneity variance is, on average, 88% greater among trials with less than 100 participants. On the basis of a multivariable model, heterogeneity variance is, on average, 25% greater among trials with inadequate/unclear sequence generation, 51% greater among trials with inadequate/unclear blinding, and 23% lower among trials with inadequate/unclear allocation concealment, although the 95% intervals for these ratios are very wide. Our proposed label‐invariant models for meta‐epidemiological data analysis facilitate investigations of between‐study heterogeneity attributable to certain study characteristics.
PubDate: 2017-09-19T23:05:37.791259-05:00
DOI: 10.1002/sim.7491
- A partially linear additive model for clustered proportion data
- Abstract: Proportion data with support lying in the interval [0,1] are commonplace in various domains of medicine and public health. When these data are available as clusters, it is important to correctly incorporate the within‐cluster correlation to improve estimation efficiency while conducting regression‐based risk evaluation. Furthermore, covariates may exhibit a nonlinear relationship with the (proportion) responses while quantifying disease status. As an alternative to various existing classical methods for modeling proportion data (such as augmented Beta regression) that use maximum likelihood or generalized estimating equations, we develop a partially linear additive model based on the quadratic inference function. Relying on quasi‐likelihood estimation techniques and polynomial spline approximation for unknown nonparametric functions, we obtain the estimators for both the parametric and nonparametric parts of our model and study their large‐sample theoretical properties. We illustrate the advantages and usefulness of our proposition over other alternatives via extensive simulation studies and application to a real dataset from a clinical periodontal study.
- Joint modeling of multiple ordinal adherence outcomes via generalized
estimating equations with flexible correlation structure- Abstract: Adherence to medication is critical in achieving effectiveness of many treatments. Factors that influence adherence behavior have been the subject of many clinical studies. Analyzing adherence is complicated because it is often measured on multiple drugs over a period, resulting in a multivariate longitudinal outcome. This paper is motivated by the Viral Resistance to Antiviral Therapy of Chronic Hepatitis C study, where adherence is measured on two drugs as a bivariate ordinal longitudinal outcome. To analyze such an outcome, we propose a joint model assuming the multivariate ordinal outcome arose from a partitioned latent multivariate normal process. We also provide a flexible multilevel association structure covering both between‐ and within‐outcome correlation. In simulation studies, we show that the joint model provides unbiased estimators for regression parameters, which are more efficient than those obtained through fitting a separate model for each outcome. The joint method also yields unbiased estimators for the correlation parameters when the correlation structure is correctly specified. Finally, we analyze the Viral Resistance to Antiviral Therapy of Chronic Hepatitis C adherence data and discuss the findings.
- Prevalence estimation when disease status is verified only among test positives: Applications in HIV screening programs- Abstract: The first goal of the United Nations' 90–90–90 HIV/AIDS elimination strategy is to ensure that, by 2020, 90% of HIV‐positive people know their HIV status. Estimating the prevalence of HIV among people eligible for screening allows assessment of the number of additional cases that might be diagnosed through continued screening efforts in this group. Here, we present methods for estimating prevalence when HIV status is verified by a gold standard only among those who test positive on an initial, imperfect screening test with known sensitivity and specificity. We develop maximum likelihood estimators and asymptotic confidence intervals for use in 2 scenarios: when the total number of test negatives is known (Scenario 1) and unknown (Scenario 2). We also derive Bayesian prevalence estimators to account for non‐negligible uncertainty in previous estimates of the sensitivity and specificity. The Scenario 1 estimator consistently outperformed the Scenario 2 estimator in simulations, demonstrating the value of recording the number of test negatives in public health screening programs. For less accurate tests (sensitivity and specificity
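A minimal sketch of the verification-only-among-test-positives setting, assuming Scenario 1 (total screened known). The simple moment estimator below — confirmed true positives number about p·Se·N in expectation, so p̂ = k/(Se·N) — is an illustration of the design, not the paper's maximum likelihood or Bayesian estimators; all numbers are invented.

```python
import random

def prevalence_estimate(n_total, n_confirmed, sensitivity):
    """Moment estimate: confirmed true positives ~ p * Se * N, so p_hat = k / (Se * N)."""
    return n_confirmed / (sensitivity * n_total)

rng = random.Random(7)
N, p, Se, Sp = 20_000, 0.05, 0.90, 0.95          # hypothetical screening program
truth = [rng.random() < p for _ in range(N)]
test_pos = [(rng.random() < Se) if t else (rng.random() > Sp) for t in truth]
# gold-standard verification happens only among the test positives
k = sum(t and s for t, s in zip(truth, test_pos))
p_hat = prevalence_estimate(N, k, Se)
```

Note how the design never verifies test negatives: the false negatives are recovered only through the known sensitivity, which is why uncertainty in Se and Sp (the paper's Bayesian extension) matters.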
- A novel case‐control subsampling approach for rapid model exploration of large clustered binary data- Abstract: In many settings, an analysis goal is the identification of a factor, or set of factors, associated with an event or outcome. Often, these associations are then used for inference and prediction. Unfortunately, in the big data era, the model building and exploration phases of analysis can be time‐consuming, especially if constrained by computing power (i.e., a typical corporate workstation). To speed up model development, we propose a novel subsampling scheme that enables rapid model exploration of clustered binary data using flexible yet complex model set‐ups (GLMMs with additive smoothing splines). By reframing the binary‐response prospective cohort study as a case‐control–type design, and using our knowledge of the sampling fractions, we show that one can approximate the model estimates that would be calculated from a full cohort analysis. This idea is extended to derive cluster‐specific sampling fractions and thereby incorporate cluster variation into the analysis. Importantly, we demonstrate that previously computationally prohibitive analyses can be conducted in a timely manner on a typical workstation. The approach is applied to analysing risk factors associated with adverse reactions relating to blood donation.
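The core idea — subsample the abundant controls, then use the known sampling fraction to recover cohort-scale estimates — can be sketched with plain logistic regression (the paper works with GLMMs and splines; this is only the classical intercept-offset version, with invented parameters and a hand-rolled Newton fit):

```python
import math, random

def fit_logistic(xs, ys, iters=25):
    """Two-parameter logistic regression (intercept, slope) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w; h01 += w * x; h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

rng = random.Random(3)
true_b0, true_b1 = -3.0, 1.0                     # hypothetical cohort model
cohort = []
for _ in range(100_000):
    x = rng.gauss(0, 1)
    y = int(rng.random() < 1.0 / (1.0 + math.exp(-(true_b0 + true_b1 * x))))
    cohort.append((x, y))

f0 = 0.05                                        # keep every case, 5% of controls
sub = [(x, y) for x, y in cohort if y == 1 or rng.random() < f0]
b0_sub, b1_sub = fit_logistic([x for x, _ in sub], [y for _, y in sub])
b0_adj = b0_sub + math.log(f0)                   # intercept offset undoes the sampling
```

The slope is untouched by the case-control reframing, and adding log of the control sampling fraction back onto the intercept approximately recovers the full-cohort estimate — the property the paper extends to cluster-specific fractions.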
- Dynamic prediction in functional concurrent regression with an application to child growth- Abstract: In many studies, it is of interest to predict the future trajectory of subjects based on their historical data, referred to as dynamic prediction. Mixed effects models have traditionally been used for dynamic prediction. However, the commonly used random intercept and slope model is often not sufficiently flexible for modeling subject‐specific trajectories. In addition, there may be useful exposures/predictors of interest that are measured concurrently with the outcome, complicating dynamic prediction. To address these problems, we propose a dynamic functional concurrent regression model to handle the case where both the functional response and the functional predictors are irregularly measured. Currently, such a model cannot be fit by existing software. We apply the model to dynamically predict children's length conditional on prior length, weight, and baseline covariates. Inference on model parameters and subject‐specific trajectories is conducted using the mixed effects representation of the proposed model. An extensive simulation study shows that the dynamic functional regression model provides more accurate estimation and inference than existing methods. Methods are supported by fast, flexible, open‐source software that uses heavily tested smoothing techniques.
- Data‐generating models of dichotomous outcomes: Heterogeneity in simulation studies for a random‐effects meta‐analysis- Abstract: Simulation studies to evaluate the performance of statistical methods require a well‐specified data‐generating model. Details of these models are essential to interpret the results and arrive at proper conclusions. A case in point is random‐effects meta‐analysis of dichotomous outcomes. We reviewed a number of simulation studies that evaluated approximate normal models for meta‐analysis of dichotomous outcomes, and we assessed the data‐generating models that were used to generate events for a series of (heterogeneous) trials. We demonstrate that the performance of the statistical methods, as assessed by simulation, differs between the 3 alternative data‐generating models identified in our review, with larger differences apparent in the small population setting. Our findings are relevant to multilevel binomial models in general.
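One commonly used data-generating model of the kind this abstract compares draws a trial-specific log-odds from a normal random-effects distribution and then draws events binomially. The sketch below shows that single variant only (the paper contrasts 3 alternatives); trial counts and parameter values are invented.

```python
import math, random

def simulate_trials(n_trials, n_per_arm, theta, tau, seed=0):
    """Binomial-normal DGM: trial log-odds ~ N(theta, tau^2),
    events ~ Binomial(n_per_arm, expit(log-odds))."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        log_odds = rng.gauss(theta, tau)          # between-trial heterogeneity
        p = 1.0 / (1.0 + math.exp(-log_odds))
        events = sum(rng.random() < p for _ in range(n_per_arm))
        trials.append((events, n_per_arm))
    return trials

trials = simulate_trials(n_trials=10, n_per_arm=100, theta=-1.0, tau=0.3)
```

Other choices in the literature put the heterogeneity on the risk scale or condition on observed margins, and — as the abstract notes — method performance can differ across these seemingly interchangeable set-ups, especially with small trials.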
- Tutorial on kernel estimation of continuous spatial and spatiotemporal relative risk- Abstract: Kernel smoothing is a highly flexible and popular approach for estimation of probability density and intensity functions of continuous spatial data. In this role, it also forms an integral part of estimation of functionals such as the density‐ratio or “relative risk” surface. Originally developed with the epidemiological motivation of examining fluctuations in disease risk based on samples of cases and controls collected over a given geographical region, such functions have also been used successfully across a diverse range of disciplines where a relative comparison of spatial density functions is of interest. This versatility has demanded ongoing developments and improvements to the relevant methodology, including the use of spatially adaptive smoothers; tests of significantly elevated risk based on asymptotic theory; extension to the spatiotemporal domain; and novel computational methods for their evaluation. In this tutorial paper, we review the current methodology, including the most recent developments in estimation, computation, and inference. All techniques are implemented in the new software package sparr, publicly available for the R language, and we illustrate its use with a pair of epidemiological examples.
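The tutorial's sparr package is R software; purely as a language-agnostic illustration of the underlying estimand, the sketch below builds fixed-bandwidth bivariate Gaussian KDEs for cases and controls and evaluates their log density ratio (the log relative risk surface) at a point. Cluster locations, bandwidth, and sample sizes are all invented.

```python
import math, random

def kde2d(points, bandwidth):
    """Fixed-bandwidth bivariate Gaussian kernel density estimate."""
    n = len(points)
    norm = 1.0 / (n * 2.0 * math.pi * bandwidth ** 2)
    def f(x, y):
        return norm * sum(
            math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2.0 * bandwidth ** 2))
            for px, py in points)
    return f

def log_relative_risk(cases, controls, bandwidth):
    """log of the density ratio rho(x, y) = f_cases(x, y) / f_controls(x, y)."""
    fc, f0 = kde2d(cases, bandwidth), kde2d(controls, bandwidth)
    return lambda x, y: math.log(fc(x, y)) - math.log(f0(x, y))

rng = random.Random(5)
controls = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(300)]
cases = [(rng.gauss(1.5, 1.0), rng.gauss(1.5, 1.0)) for _ in range(300)]
rr = log_relative_risk(cases, controls, bandwidth=0.5)
```

Positive values of `rr` flag regions where case density exceeds control density; the tutorial's refinements (adaptive bandwidths, asymptotic tolerance contours, spatiotemporal extension) all build on this ratio.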
- Inference on network statistics by restricting to the network space: applications to sexual history data- Abstract: Analysis of sexual history data intended to describe sexual networks presents many challenges arising from the fact that most surveys collect information on only a very small fraction of the population of interest. In addition, partners are rarely identified and responses are subject to reporting biases. Typically, each network statistic of interest, such as the mean number of sexual partners for men or women, is estimated independently of other network statistics. There is, however, a complex relationship among network statistics, and knowledge of these relationships can aid in addressing the concerns mentioned earlier. We develop a novel method that constrains a posterior predictive distribution of a collection of network statistics in order to leverage the relationships among network statistics when making inference about network properties of interest. The method ensures that inference on network properties is compatible with an actual network. Through extensive simulation studies, we also demonstrate that this method can improve estimates, compared with currently available approaches, in settings where uncertainty arises both from sampling and from systematic reporting bias. To illustrate the method, we apply it to estimate network statistics using data from the Chicago Health and Social Life Survey. Copyright © 2017 John Wiley & Sons, Ltd.
- Estimating population effects of vaccination using large, routinely collected data- Abstract: Vaccination in populations can have several kinds of effects. Establishing that vaccination produces population‐level effects beyond the direct effects in the vaccinated individuals can have important consequences for public health policy. Formal methods have been developed for study designs and analyses that can estimate the different effects of vaccination. However, implementing field studies to evaluate the different effects of vaccination can be expensive, of limited generalizability, or unethical. It would be advantageous to use routinely collected data to estimate the different effects of vaccination. We consider how different types of data are needed to estimate different effects of vaccination. The examples include rotavirus vaccination of young children, influenza vaccination of elderly adults, and a targeted influenza vaccination campaign in schools. Directions for future research are discussed. Copyright © 2017 John Wiley & Sons, Ltd.
- Improved estimation of the cumulative incidence of rare outcomes
- Abstract: Studying the incidence of rare events is both scientifically important and statistically challenging. When few events are observed, standard survival analysis estimators behave erratically, particularly if covariate adjustment is necessary. In these settings, it is possible to improve upon existing estimators by considering estimation in a bounded statistical model. This bounded model incorporates existing scientific knowledge about the incidence of an event in the population. Estimators that are guaranteed to agree with existing scientific knowledge on event incidence may exhibit superior behavior relative to estimators that ignore this knowledge. Focusing on the setting of competing risks, we propose estimators of cumulative incidence that are guaranteed to respect a bounded model and show that when few events are observed, the proposed estimators offer improvements over existing estimators in bias and variance. We illustrate the proposed estimators using data from a recent preventive HIV vaccine efficacy trial. Copyright © 2017 John Wiley & Sons, Ltd.
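The idea of estimating within a bounded model can be illustrated very simply: constrain an erratic small-sample cumulative incidence curve to be nondecreasing and to respect a known upper bound. This projection is only a toy stand-in for the paper's competing-risks estimators, and the curve and bound below are invented.

```python
def bounded_cumulative_incidence(curve, upper_bound):
    """Project an estimated cumulative incidence curve into the bounded model:
    nondecreasing over time and never above a scientifically known upper bound."""
    out, prev = [], 0.0
    for v in curve:
        v = min(max(v, prev), upper_bound)   # enforce monotonicity, then the bound
        out.append(v)
        prev = v
    return out

raw = [0.00, 0.01, 0.008, 0.03, 0.12]        # noisy covariate-adjusted estimate
print(bounded_cumulative_incidence(raw, upper_bound=0.05))
# → [0.0, 0.01, 0.01, 0.03, 0.05]
```

With few events, an unconstrained estimator can dip, jump, or exceed what is scientifically plausible; forcing agreement with prior incidence knowledge is what buys the bias and variance improvements the abstract describes.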
- Online cross‐validation‐based ensemble learning
- Abstract: Online estimators update a current estimate with a new incoming batch of data without having to revisit past data, thereby providing streaming estimates that are scalable to big data. We develop flexible, ensemble‐based online estimators of an infinite‐dimensional target parameter, such as a regression function, in the setting where data are generated sequentially by a common conditional data distribution given summary measures of the past. This setting encompasses a wide range of time‐series models and, as a special case, models for independent and identically distributed data. Our estimator considers a large library of candidate online estimators and uses online cross‐validation to identify the algorithm with the best performance. We show that by basing estimates on the cross‐validation‐selected algorithm, we are asymptotically guaranteed to perform as well as the true, unknown best‐performing algorithm. We provide extensions of this approach, including online estimation of the optimal ensemble of candidate online estimators. We illustrate the excellent performance of our methods using simulations and a real data example in which we make streaming predictions of infectious disease incidence using data from a large database. Copyright © 2017 John Wiley & Sons, Ltd.
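The discrete version of this scheme can be sketched in a few lines: each incoming batch first scores every candidate (so each batch acts as its own held-out fold), then trains the candidates, and predictions come from the candidate with the smallest accumulated loss. The two toy learners and the stream below are invented; they are not the paper's library.

```python
import random

class RunningMean:
    """Candidate 1: predict the running mean of y, ignoring x."""
    def __init__(self): self.s, self.n = 0.0, 0
    def predict(self, x): return self.s / self.n if self.n else 0.0
    def update(self, x, y): self.s += y; self.n += 1

class OnlineLinear:
    """Candidate 2: one-pass SGD on a simple linear model."""
    def __init__(self, lr=0.1): self.a, self.b, self.lr = 0.0, 0.0, lr
    def predict(self, x): return self.a + self.b * x
    def update(self, x, y):
        e = self.predict(x) - y
        self.a -= self.lr * e
        self.b -= self.lr * e * x

class OnlineCVSelector:
    """Score each batch on every candidate BEFORE training on it (online CV),
    then predict with the candidate whose running loss is smallest."""
    def __init__(self, learners):
        self.learners, self.losses = learners, [0.0] * len(learners)
    def update(self, batch):
        for i, lrn in enumerate(self.learners):
            self.losses[i] += sum((lrn.predict(x) - y) ** 2 for x, y in batch)
        for lrn in self.learners:
            for x, y in batch:
                lrn.update(x, y)
    def predict(self, x):
        best = min(range(len(self.learners)), key=self.losses.__getitem__)
        return self.learners[best].predict(x)

rng = random.Random(2)
sel = OnlineCVSelector([RunningMean(), OnlineLinear()])
stream = [(x, 2 * x + rng.gauss(0, 0.1)) for x in (rng.random() for _ in range(1000))]
for i in range(0, 1000, 50):          # data arrive in batches of 50
    sel.update(stream[i:i + 50])
```

Scoring before updating is what makes the running losses honest out-of-sample estimates; the paper's extension replaces the argmin selection with an optimally weighted ensemble of the candidates.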
- Constrained binary classification using ensemble learning: an application to cost‐efficient targeted PrEP strategies- Abstract: Binary classification problems are ubiquitous in the health and social sciences. In many cases, one wishes to balance two competing optimality considerations for a binary classifier. For instance, in resource‐limited settings, a human immunodeficiency virus prevention program based on offering pre‐exposure prophylaxis (PrEP) to select high‐risk individuals must balance the sensitivity of the binary classifier in detecting future seroconverters (and hence offering them PrEP regimens) with the total number of PrEP regimens that is financially and logistically feasible for the program. In this article, we consider a general class of constrained binary classification problems wherein the objective function and the constraint are both monotonic with respect to a threshold. These include the minimization of the rate of positive predictions subject to a minimum sensitivity, the maximization of sensitivity subject to a maximum rate of positive predictions, and the Neyman–Pearson paradigm, which minimizes the type II error subject to an upper bound on the type I error. We propose an ensemble approach to these binary classification problems based on the Super Learner methodology. This approach linearly combines a user‐supplied library of scoring algorithms, with combination weights and a discriminating threshold chosen to minimize the constrained optimality criterion. We then illustrate the application of the proposed classifier to develop an individualized PrEP targeting strategy in a resource‐limited setting, with the goal of minimizing the number of PrEP offerings while achieving a minimum required sensitivity. This proof‐of‐concept data analysis uses baseline data from the ongoing Sustainable East Africa Research in Community Health study. Copyright © 2017 John Wiley & Sons, Ltd.
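For a single fixed scoring algorithm, the monotonicity the abstract exploits makes the threshold step trivial: because both the positive-prediction rate and the sensitivity fall as the threshold rises, the rate is minimized by flagging exactly the top-scoring true positives needed to hit the sensitivity target. The sketch below shows that threshold step only (the paper additionally learns the ensemble weights); the scores and class sizes are invented.

```python
import math, random

def threshold_min_positives(scores, labels, min_sensitivity):
    """Minimize the rate of positive predictions subject to
    sensitivity >= min_sensitivity, for a fixed score."""
    pos_scores = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    need = math.ceil(min_sensitivity * len(pos_scores))
    t = pos_scores[need - 1]                 # lowest score we must still flag
    rate = sum(s >= t for s in scores) / len(scores)
    return t, rate

rng = random.Random(11)
labels = [1] * 100 + [0] * 900
# hypothetical risk scores: future seroconverters score higher on average
scores = [rng.gauss(2, 1) if y else rng.gauss(0, 1) for y in labels]
t, rate = threshold_min_positives(scores, labels, min_sensitivity=0.90)
```

In the PrEP application, `rate` is the fraction of the population offered a regimen, so minimizing it at fixed sensitivity is exactly the cost-efficiency trade-off described above.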
- Issue Information
- Abstract: No abstract is available for this article.
- Circular‐circular regression model with a spike at zero
- Abstract: With reference to real data on cataract surgery, we discuss the problem of zero‐inflated circular‐circular regression when both the covariate and the response are circular random variables and a large proportion of the responses are zeros. The regression model is proposed, and the estimation procedure for the parameters is discussed. Some relevant test procedures are also suggested. Simulation studies and a real data analysis are performed to illustrate the applicability of the model.
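The "spike at zero" structure can be made concrete with a toy data-generating sketch: with some probability the circular response is exactly 0, and otherwise it is drawn from a von Mises distribution. This only illustrates the response distribution, not the paper's regression model or estimation procedure; the mixing probability and von Mises parameters are invented.

```python
import math, random

def sample_spiked_circular(n, p_zero, mu, kappa, seed=0):
    """Zero-inflated circular response: angle is exactly 0 with probability
    p_zero, otherwise drawn from a von Mises(mu, kappa) distribution."""
    rng = random.Random(seed)
    return [0.0 if rng.random() < p_zero else rng.vonmisesvariate(mu, kappa)
            for _ in range(n)]

angles = sample_spiked_circular(2000, p_zero=0.3, mu=math.pi / 2, kappa=4.0)
nonzero = [a for a in angles if a != 0.0]
# circular mean direction of the continuous component
mean_dir = math.atan2(sum(math.sin(a) for a in nonzero),
                      sum(math.cos(a) for a in nonzero))
```

In the regression version, both the spike probability and the von Mises mean direction would be linked to a circular covariate; the mixture form above is why ordinary circular regression fits poorly when many responses are exact zeros.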