Statistics in Medicine
  [SJR: 1.811]   [H-I: 131]
   Hybrid journal (it can contain Open Access articles)
   ISSN (Print) 0277-6715 - ISSN (Online) 1097-0258
   Published by John Wiley and Sons
  • A review of tensor‐based methods and their application to hospital
           care data
    • Abstract: In many situations, a researcher is interested in the analysis of the scores of a set of observation units on a set of variables. However, in medicine, the information is very frequently replicated on different occasions. The occasions can be time‐varying or refer to different conditions. In such cases, the data can be stored in a 3‐way array or tensor. The Candecomp/Parafac and Tucker3 methods represent the most common methods for analyzing 3‐way tensors. In this work, a review of these methods is provided, and then this class of methods is applied to a 3‐way data set concerning hospital care data for a hospital in Rome (Italy) over 15 years, divided into 3 groups of consecutive years (1892–1896, 1940–1944, 1968–1972). The analysis reveals some peculiar aspects of the use of health services and its evolution over time.
  • Covariate adjustment using propensity scores for dependent censoring
           problems in the accelerated failure time model
    • Abstract: In many medical studies, estimation of the association between treatment and outcome of interest is often of primary scientific interest. Standard methods for its evaluation in survival analysis typically require the assumption of independent censoring. This assumption might be invalid in many medical studies, where the presence of dependent censoring leads to difficulties in analyzing covariate effects on disease outcomes. This data structure is called “semicompeting risks data,” for which many authors have proposed an artificial censoring technique. However, confounders with large variability may lead to excessive artificial censoring, which subsequently results in numerically unstable estimation. In this paper, we propose a strategy for weighted estimation of the associations in the accelerated failure time model. Weights are based on propensity score modeling of the treatment conditional on confounder variables. This novel application of propensity scores avoids excess artificial censoring caused by the confounders and simplifies computation. Monte Carlo simulation studies and application to AIDS and cancer research are used to illustrate the methodology.
  • Sensitivity analysis for publication bias in meta‐analysis of diagnostic
           studies for a continuous biomarker
    • Abstract: Publication bias is one of the most important issues in meta‐analysis. For standard meta‐analyses to examine intervention effects, the funnel plot and the trim‐and‐fill method are simple and widely used techniques for assessing and adjusting for the influence of publication bias, respectively. However, their use may be subjective and can then produce misleading insights. To make a more objective inference for publication bias, various sensitivity analysis methods have been proposed, including the Copas selection model. For meta‐analysis of diagnostic studies evaluating a continuous biomarker, the summary receiver operating characteristic (sROC) curve is a very useful method in the presence of heterogeneous cutoff values. To our best knowledge, no methods are available for evaluation of influence of publication bias on estimation of the sROC curve. In this paper, we introduce a Copas‐type selection model for meta‐analysis of diagnostic studies and propose a sensitivity analysis method for publication bias. Our method enables us to assess the influence of publication bias on the estimation of the sROC curve and then judge whether the result of the meta‐analysis is sufficiently confident or should be interpreted with much caution. We illustrate our proposed method with real data.
  • Semiparametric accelerated failure time cure rate mixture models with
           competing risks
    • Abstract: Modern medical treatments have substantially improved survival rates for many chronic diseases and have generated considerable interest in developing cure fraction models for survival data with a non‐ignorable cured proportion. Statistical analysis of such data may be further complicated by competing risks that involve multiple types of endpoints. Regression analysis of competing risks is typically undertaken via a proportional hazards model adapted on cause‐specific hazard or subdistribution hazard. In this article, we propose an alternative approach that treats competing events as distinct outcomes in a mixture. We consider semiparametric accelerated failure time models for the cause‐conditional survival function that are combined through a multinomial logistic model within the cure‐mixture modeling framework. The cure‐mixture approach to competing risks provides a means to determine the overall effect of a treatment and insights into how this treatment modifies the components of the mixture in the presence of a cure fraction. The regression and nonparametric parameters are estimated by a nonparametric kernel‐based maximum likelihood estimation method. Variance estimation is achieved through resampling methods for the kernel‐smoothed likelihood function. Simulation studies show that the procedures work well in practical settings. Application to a sarcoma study demonstrates the use of the proposed method for competing risk data with a cure fraction.
  • Simultaneous inference for factorial multireader diagnostic trials
    • Abstract: We study inference methods for the analysis of multireader diagnostic trials. In these studies, data are usually collected in terms of a factorial design involving the factors Modality and Reader. Furthermore, repeated measures appear in a natural way since the same patient is observed under different modalities by several readers and the repeated measures may have a quite involved dependency structure. The hypotheses are formulated in terms of the areas under the ROC curves. Currently, only global testing procedures exist for the analysis of such data. We derive rank‐based multiple contrast test procedures and simultaneous confidence intervals which take the correlation between the test statistics into account. The procedures allow for testing arbitrary multiple hypotheses. Extensive simulation studies show that the new approaches control the nominal type 1 error rate very satisfactorily. A real data set illustrates the application of the proposed methods.
  • Generalized linear mixed model for binary outcomes when covariates are
           subject to measurement errors and detection limits
    • Abstract: Longitudinal measurement of biomarkers is important in determining risk factors for binary endpoints such as infection or disease. However, biomarkers are subject to measurement error, and some are also subject to left‐censoring due to a lower limit of detection. Statistical methods to address these issues are few. We herein propose a generalized linear mixed model and estimate the model parameters using the Monte Carlo Newton‐Raphson (MCNR) method. Inferences regarding the parameters are made by applying Louis's method and the delta method. Simulation studies were conducted to compare the proposed MCNR method with existing methods including the maximum likelihood (ML) method and the ad hoc approach of replacing the left‐censored values with half of the detection limit (HDL). The results showed that the performance of the MCNR method is superior to ML and HDL with respect to the empirical standard error, as well as the coverage probability for the 95% confidence interval. The HDL method uses an incorrect imputation method, and the computation is constrained by the number of quadrature points; while the ML method also suffers from the constraint on the number of quadrature points, the MCNR method does not have this limitation and approximates the likelihood function better than the other methods. The improvement of the MCNR method is further illustrated with real‐world data from a longitudinal study of local cervicovaginal HIV viral load and its effects on oncogenic HPV detection in HIV‐positive women.
  • Correlated Poisson models for age‐period‐cohort analysis
    • Abstract: Age‐period‐cohort (APC) models are widely used to analyze population‐level rates, particularly cancer incidence and mortality. These models are used for descriptive epidemiology, comparative risk analysis, and extrapolating future disease burden. Traditional APC models have 2 major limitations: (1) they lack parsimony because they require estimation of deviations from linear trends for each level of age, period, and cohort; and (2) rates observed at similar ages, periods, and cohorts are treated as independent, ignoring any correlations between them that may lead to biased parameter estimates and inefficient standard errors. We propose a novel approach to estimation of APC models using a spatially correlated Poisson model that accounts for over‐dispersion and correlations in age, period, and cohort, simultaneously. We treat the outcome of interest as event rates occurring over a grid defined by values of age, period, and cohort. Rates defined in this manner lend themselves to well‐established approaches from spatial statistics in which correlation among proximate observations may be modeled using a spatial random effect. Through simulations, we show that in the presence of spatial dependence and over‐dispersion: (1) the correlated Poisson model attains lower AIC; (2) the traditional APC model produces biased trend parameter estimates; and (3) the correlated Poisson model corrects most of this bias. We illustrate our approach using brain and breast cancer incidence rates from the Surveillance Epidemiology and End Results Program of the United States. Our approach can be easily extended to accommodate comparative risk analyses and interpolation of cells in the Lexis with sparse data.
  • Improving likelihood‐based inference in control rate regression
    • Abstract: Control rate regression is a widespread approach to account for heterogeneity among studies in meta‐analysis by including information about the outcome risk of patients in the control condition. Correcting for the presence of measurement error affecting risk information in the treated and in the control group has been recognized as a necessary step to derive reliable inferential conclusions. Within this framework, the paper considers the problem of small sample size as an additional source of misleading inference about the slope of the control rate regression. Likelihood procedures relying on first‐order approximations are shown to be substantially inaccurate, especially when dealing with increasing heterogeneity and correlated measurement errors. We suggest addressing the problem by relying on higher‐order asymptotics. In particular, we derive Skovgaard's statistic as an instrument to improve the accuracy of the approximation of the signed profile log‐likelihood ratio statistic to the standard normal distribution. The proposal is shown to provide much more accurate results than standard likelihood solutions, with no appreciable computational effort. The advantages of Skovgaard's statistic in control rate regression are shown in a series of simulation experiments and illustrated in a real data example. R code for applying first‐ and second‐order statistics for inference on the slope of the control rate regression is provided.
  • Direct likelihood inference on the cause‐specific cumulative incidence
           function: A flexible parametric regression modelling approach
    • Abstract: In a competing risks analysis, interest lies in the cause‐specific cumulative incidence function (CIF) that can be calculated by either (1) transforming on the cause‐specific hazard or (2) through its direct relationship with the subdistribution hazard. We expand on current competing risks methodology from within the flexible parametric survival modelling framework (FPM) and focus on approach (2). This models all cause‐specific CIFs simultaneously and is more useful when we look to questions on prognosis. We also extend cure models using a similar approach described by Andersson et al for flexible parametric relative survival models. Using SEER public use colorectal data, we compare and contrast our approach with standard methods such as the Fine & Gray model and show that many useful out‐of‐sample predictions can be made after modelling the cause‐specific CIFs using an FPM approach. Alternative link functions may also be incorporated such as the logit link. Models can also be easily extended for time‐dependent effects.
  • Multiplicity considerations in subgroup analysis
    • Abstract: This paper deals with the general topic of subgroup analysis in late‐stage clinical trials with emphasis on multiplicity considerations. The discussion begins with multiplicity issues arising in the context of exploratory subgroup analysis, including principled approaches to subgroup search that are applied as part of subgroup exploration exercises as well as in adaptive biomarker‐driven designs. Key considerations in confirmatory subgroup analysis based on one or more pre‐specified patient populations are reviewed, including a survey of multiplicity adjustment methods recommended in multi‐population phase III clinical trials. Guidelines for interpretation of significant findings in several patient populations are introduced to facilitate the decision‐making process and achieve consistent labeling across development programs.
  • Inference on network statistics by restricting to the network space:
           applications to sexual history data
    • Abstract: Analysis of sexual history data intended to describe sexual networks presents many challenges arising from the fact that most surveys collect information on only a very small fraction of the population of interest. In addition, partners are rarely identified and responses are subject to reporting biases. Typically, each network statistic of interest, such as mean number of sexual partners for men or women, is estimated independently of other network statistics. There is, however, a complex relationship among network statistics, and knowledge of these relationships can aid in addressing the concerns mentioned earlier. We develop a novel method that constrains a posterior predictive distribution of a collection of network statistics in order to leverage the relationships among network statistics in making inference about network properties of interest. The method ensures that inference on network properties is compatible with an actual network. Through extensive simulation studies, we also demonstrate that use of this method can improve estimates in settings where there is uncertainty that arises both from sampling and from systematic reporting bias compared with currently available approaches to estimation. To illustrate the method, we apply it to estimate network statistics using data from the Chicago Health and Social Life Survey. Copyright © 2017 John Wiley & Sons, Ltd.
  • Estimating population effects of vaccination using large, routinely
           collected data
    • Abstract: Vaccination in populations can have several kinds of effects. Establishing that vaccination produces population‐level effects beyond the direct effects in the vaccinated individuals can have important consequences for public health policy. Formal methods have been developed for study designs and analysis that can estimate the different effects of vaccination. However, implementing field studies to evaluate the different effects of vaccination can be expensive, of limited generalizability, or unethical. It would be advantageous to use routinely collected data to estimate the different effects of vaccination. We consider how different types of data are needed to estimate different effects of vaccination. The examples include rotavirus vaccination of young children, influenza vaccination of elderly adults, and a targeted influenza vaccination campaign in schools. Directions for future research are discussed.
  • Commentary: Changes are still needed on multiple co‐primary
    • Abstract: The Food and Drug Administration in the United States issued a much‐awaited draft guidance on ‘Multiple Endpoints in Clinical Trials’ in January 2017. The draft guidance is well written and contains a consistent message on the technical implementation of the principles laid out in the guidance. In this commentary, we raise a question on applying the principles to studies designed from a safety perspective. We then direct our attention to issues related to multiple co‐primary endpoints. In a paper published in the Drug Information Journal in 2007, Offen et al. give examples of disorders where multiple co‐primary endpoints are required by regulators. The standard test for multiple co‐primary endpoints is the min test, which tests each endpoint individually, at the one‐sided 2.5% level, for a confirmatory trial. This approach leads to a substantial loss of power when the number of co‐primary endpoints exceeds 2, a fact acknowledged in the draft guidance. We review approaches that have been proposed to tackle the problem of power loss and propose a new one. Using recommendations by Chen et al. for the assessment of drugs for vulvar and vaginal atrophy published in the Drug Information Journal in 2010, we argue the need for more changes and urge a path forward that uses different levels of claims to reflect the effectiveness of a product on multiple endpoints that are equally important and mostly unrelated.
  • A joint logistic regression and covariate‐adjusted
           continuous‐time Markov chain model
    • Abstract: The use of longitudinal measurements to predict a categorical outcome is an increasingly common goal in research studies. Joint models are commonly used to describe two or more models simultaneously by considering the correlated nature of their outcomes and the random error present in the longitudinal measurements. However, there is limited research on joint models with longitudinal predictors and categorical cross‐sectional outcomes. Perhaps the most challenging task is how to model the longitudinal predictor process such that it represents the true biological mechanism that dictates the association with the categorical response. We propose a joint logistic regression and Markov chain model to describe a binary cross‐sectional response, where the unobserved transition rates of a two‐state continuous‐time Markov chain are included as covariates. We use the method of maximum likelihood to estimate the parameters of our model. In a simulation study, coverage probabilities of about 95%, standard deviations close to standard errors, and low biases for the parameter values show that our estimation method is adequate. We apply the proposed joint model to a dataset of patients with traumatic brain injury to describe and predict a 6‐month outcome based on physiological data collected post‐injury and admission characteristics. Our analysis indicates that the information provided by physiological changes over time may help improve prediction of long‐term functional status of these severely ill subjects.
  • Composite and multicomponent end points in clinical trials
    • Abstract: In January 2017, the FDA released the draft guidance to industry on multiple end points in clinical trials. A class of multiplicity problems arises from the testing of individual components, or subsets of components, of a composite or multicomponent end point. This commentary attempts to further clarify these problems. Discussions include general considerations on the use of composite and multicomponent end points, situations when multiplicity adjustments are needed, and the relevant multiple testing methods.
  • Improved estimation of the cumulative incidence of rare outcomes
    • Abstract: Studying the incidence of rare events is both scientifically important and statistically challenging. When few events are observed, standard survival analysis estimators behave erratically, particularly if covariate adjustment is necessary. In these settings, it is possible to improve upon existing estimators by considering estimation in a bounded statistical model. This bounded model incorporates existing scientific knowledge about the incidence of an event in the population. Estimators that are guaranteed to agree with existing scientific knowledge on event incidence may exhibit superior behavior relative to estimators that ignore this knowledge. Focusing on the setting of competing risks, we propose estimators of cumulative incidence that are guaranteed to respect a bounded model and show that when few events are observed, the proposed estimators offer improvements over existing estimators in bias and variance. We illustrate the proposed estimators using data from a recent preventive HIV vaccine efficacy trial.
  • Clustering high‐dimensional mixed data to uncover sub‐phenotypes:
           joint analysis of phenotypic and genotypic data
    • Abstract: The LIPGENE‐SU.VI.MAX study, like many others, recorded high‐dimensional continuous phenotypic data and categorical genotypic data. LIPGENE‐SU.VI.MAX focuses on the need to account for both phenotypic and genetic factors when studying the metabolic syndrome (MetS), a complex disorder that can lead to higher risk of type 2 diabetes and cardiovascular disease. Interest lies in clustering the LIPGENE‐SU.VI.MAX participants into homogeneous groups or sub‐phenotypes, by jointly considering their phenotypic and genotypic data, and in determining which variables are discriminatory. A novel latent variable model that elegantly accommodates high dimensional, mixed data is developed to cluster LIPGENE‐SU.VI.MAX participants using a Bayesian finite mixture model. A computationally efficient variable selection algorithm is incorporated, estimation is via a Gibbs sampling algorithm and an approximate BIC‐MCMC criterion is developed to select the optimal model. Two clusters or sub‐phenotypes (‘healthy’ and ‘at risk’) are uncovered. A small subset of variables is deemed discriminatory, which notably includes phenotypic and genotypic variables, highlighting the need to jointly consider both factors. Further, 7 years after the LIPGENE‐SU.VI.MAX data were collected, participants underwent further analysis to diagnose presence or absence of the MetS. The two uncovered sub‐phenotypes strongly correspond to the 7‐year follow‐up disease classification, highlighting the role of phenotypic and genotypic factors in the MetS and emphasising the potential utility of the clustering approach in early screening. Additionally, the ability of the proposed approach to define the uncertainty in sub‐phenotype membership at the participant level is synonymous with the concepts of precision medicine and nutrition. Copyright © 2017 John Wiley & Sons, Ltd.
  • Some remaining challenges regarding multiple endpoints in clinical trials
    • Abstract: Despite recent advances in methods for handling multiple endpoints in clinical trials, some challenges remain. This paper discusses some of these challenges, including confusion surrounding the terminology used to describe the multiple endpoints, the justification for simultaneously testing for non‐inferiority and superiority in a non‐inferiority trial, lack of agreement on the situations under which multiple objectives do or do not lead to the need for a multiplicity correction, and choice of the most appropriate multiple comparisons procedure. In addition, this paper discusses the position of the recent FDA draft guidance, Multiple Endpoints in Clinical Trials, on these issues.
  • Issue Information
    • Abstract: No abstract is available for this article.
  • A semiparametric method for comparing the discriminatory ability of
           biomarkers subject to limit of detection
    • Abstract: Receiver operating characteristic curves and the area under the curves (AUC) are often used to compare the discriminatory ability of potentially correlated biomarkers. Many biomarkers are subject to limit of detection due to the instrumental limitation in measurements and may not be normally distributed. Standard parametric methods assuming normality can lead to biased results when the normality assumption is violated. We propose new estimation and inference procedures for the AUCs of biomarkers subject to limit of detection by using the semiparametric transformation model allowing for heteroscedasticity. We obtain the nonparametric maximum likelihood estimators by maximizing the likelihood for the observed data with limit of detection. The proposed estimators are shown to be consistent, asymptotically normal, and asymptotically efficient. Additionally, we propose a Wald type test statistic to compare the AUCs of 2 potentially correlated biomarkers with limit of detection. Extensive simulation studies demonstrate that the proposed method is robust to nonnormality while performing as well as its parametric counterpart when the normality assumption is true. An application to an autism study is provided.
  • Optimum design and sequential treatment allocation in an experiment in
           deep brain stimulation with sets of treatment combinations
    • Abstract: In an experiment including patients who underwent surgery for deep brain stimulation electrode placement, each patient responds to a set of 9 treatment combinations. There are 16 such sets, and the design problem is to choose which sets should be administered and in what proportions. Extensions to the methods of nonsequential optimum experimental design lead to identification of an unequally weighted optimum design involving 4 sets of treatment combinations. In the actual experiment, patients arrive sequentially and present with sets of prognostic factors. The idea of loss due to Burman is extended and used to assess designs with varying randomization structures. It is found that a simple sequential design using only 2 sets of treatments has surprisingly good properties for trials with the proposed number of patients.
  • Bayesian hierarchical model for analyzing multiresponse longitudinal
           pharmacokinetic data
    • Abstract: Traditional Chinese medicine (TCM) is a very complex mixture containing many different ingredients. Thus, statistical analysis of traditional Chinese medicine data becomes challenging, as one needs to handle the association among the observed data across different time points and across different ingredients of the multivariate response. This paper builds a 3‐stage Bayesian hierarchical model for analyzing multivariate response pharmacokinetic data. Usually, the dimensionality of the parameter space is very large, which makes parameter estimation difficult. We therefore use hybrid Markov chain Monte Carlo algorithms to obtain the posterior Bayesian estimates of the corresponding parameters in our model. Both a simulation study and a real‐data analysis show that our theoretical model and Markov chain Monte Carlo algorithms perform well; in particular, the correlation among different ingredients can be calculated very accurately.
  • Bayesian noninferiority test for 2 binomial probabilities as the extension
           of Fisher exact test
    • Abstract: Noninferiority trials have recently gained importance for the clinical trials of drugs and medical devices. In these trials, most statistical methods have been used from a frequentist perspective, and historical data have been used only for the specification of the noninferiority margin Δ>0. In contrast, Bayesian methods, which have been studied recently, are advantageous in that they can use historical data to specify prior distributions and are expected to enable more efficient decision making than frequentist methods by borrowing information from historical trials. In the case of noninferiority trials for response probabilities π1,π2, Bayesian methods evaluate the posterior probability of H1:π1>π2−Δ being true. To numerically calculate such a posterior probability, the complicated Appell hypergeometric function or approximation methods are used. Further, the theoretical relationship between Bayesian and frequentist methods is unclear. In this work, we give the exact expression of the posterior probability of noninferiority under some mild conditions and propose the Bayesian noninferiority test framework, which can flexibly incorporate historical data by using the conditional power prior. Further, we show the relationship between Bayesian posterior probability and the P value of the Fisher exact test. From this relationship, our method can be interpreted as the Bayesian noninferior extension of the Fisher exact test, and we can treat superiority and noninferiority in the same framework. Our method is illustrated through Monte Carlo simulations to evaluate the operating characteristics, the application to the real HIV clinical trial data, and the sample size calculation using historical data.
  • Extending the MR‐Egger method for multivariable Mendelian randomization
           to correct for both measured and unmeasured pleiotropy
    • Abstract: Methods have been developed for Mendelian randomization that can obtain consistent causal estimates while relaxing the instrumental variable assumptions. These include multivariable Mendelian randomization, in which a genetic variant may be associated with multiple risk factors so long as any association with the outcome is via the measured risk factors (measured pleiotropy), and the MR‐Egger (Mendelian randomization‐Egger) method, in which a genetic variant may be directly associated with the outcome not via the risk factor of interest, so long as the direct effects of the variants on the outcome are uncorrelated with their associations with the risk factor (unmeasured pleiotropy). In this paper, we extend the MR‐Egger method to a multivariable setting to correct for both measured and unmeasured pleiotropy. We show, through theoretical arguments and a simulation study, that the multivariable MR‐Egger method has advantages over its univariable counterpart in terms of plausibility of the assumption needed for consistent causal estimation and power to detect a causal effect when this assumption is satisfied. The methods are compared in an applied analysis to investigate the causal effect of high‐density lipoprotein cholesterol on coronary heart disease risk. The multivariable MR‐Egger method will be useful to analyse high‐dimensional data in situations where the risk factors are highly related and it is difficult to find genetic variants specifically associated with the risk factor of interest (multivariable by design), and as a sensitivity analysis when the genetic variants are known to have pleiotropic effects on measured risk factors.
  • Treatment evaluation for a data‐driven subgroup in adaptive enrichment
           designs of clinical trials
    • Abstract: Adaptive enrichment designs (AEDs) of clinical trials allow investigators to restrict enrollment to a promising subgroup based on an interim analysis. Most of the existing AEDs deal with a small number of predefined subgroups, which are often unknown at the design stage. The newly developed Simon design offers a great deal of flexibility in subgroup selection (without requiring pre‐defined subgroups) but does not provide a procedure for estimating and testing treatment efficacy for the selected subgroup. This article proposes a 2‐stage AED which does not require predefined subgroups but requires a prespecified algorithm for choosing a subgroup on the basis of baseline covariate information. Having a prespecified algorithm for subgroup selection makes it possible to use cross‐validation and bootstrap methods to correct for the resubstitution bias in estimating treatment efficacy for the selected subgroup. The methods are evaluated and compared in a simulation study mimicking actual clinical trials of human immunodeficiency virus infection.
  • Learning curve estimation in medical devices and procedures: Hierarchical
  • Efficient treatment allocation in 2 × 2 multicenter trials when costs
           and variances are heterogeneous
    • Abstract: At the design stage of a study, it is crucial to compute the sample size needed for treatment effect estimation with maximum precision and power. The optimal design depends on the costs, which may be known at the design stage, and on the outcome variances, which are unknown. A balanced design, optimal for homogeneous costs and variances, is typically used. An alternative to the balanced design is a design optimal for the known and possibly heterogeneous costs, and homogeneous variances, called costs considering design. Both designs suffer from loss of efficiency, compared with optimal designs for heterogeneous costs and variances. For 2 × 2 multicenter trials, we compute the relative efficiency of the balanced and the costs considering designs, relative to the optimal designs. We consider 2 heterogeneous costs and variance scenarios (in 1 scenario, 2 treatment conditions have small and 2 have large costs and variances; in the other scenario, 1 treatment condition has small, 2 have intermediate, and 1 has large costs and variances). Within these scenarios, we examine the relative efficiency of the balanced design and of the costs considering design as a function of the extents of heterogeneity of the costs and of the variances and of their congruence (congruent when the cheapest treatment has the smallest variance, incongruent when the cheapest treatment has the largest variance). We find that the costs considering design is generally more efficient than the balanced design, and we illustrate this theory on a 2 × 2 multicenter trial on lifestyle improvement of patients in general practices.
  • Type I error probability spending for post–market drug and vaccine
           safety surveillance with binomial data
    • Abstract: Type I error probability spending functions are commonly used for designing sequential analysis of binomial data in clinical trials, and they are also quickly emerging for near–continuous sequential analysis in post–market drug and vaccine safety surveillance. It is well known that, for clinical trials, when the null hypothesis is not rejected, it is still important to minimize the sample size. In post–market drug and vaccine safety surveillance, by contrast, that is not important. In post–market safety surveillance, especially when the surveillance involves identification of potential signals, the meaningful statistical performance measure to be minimized is the expected sample size when the null hypothesis is rejected. The present paper shows that, instead of the convex Type I error spending shape conventionally used in clinical trials, a concave shape is more suitable for post–market drug and vaccine safety surveillance. This is shown for both continuous and group sequential analysis.
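The difference between convex and concave spending shapes can be illustrated with a small sketch. It assumes the power-family spending function alpha * t^rho, which is a common textbook choice and not necessarily the family studied in the paper; the function name and the values of rho are illustrative only.

```python
def power_spending(t, alpha=0.05, rho=1.0):
    """Cumulative type I error spent at information fraction t (0 <= t <= 1).

    Assumed power-family form alpha * t**rho:
    rho > 1 gives a convex shape (little alpha spent early),
    rho < 1 gives a concave shape (more alpha spent early).
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return alpha * t ** rho


# Illustrative information fractions for four looks at the data.
fractions = [0.25, 0.50, 0.75, 1.00]
convex = [power_spending(t, rho=2.0) for t in fractions]
concave = [power_spending(t, rho=0.5) for t in fractions]

for t, cv, cc in zip(fractions, convex, concave):
    print(f"t={t:.2f}  convex={cv:.5f}  concave={cc:.5f}")
```

Both shapes spend the full alpha by the final look; the concave one simply spends more of it early, which is the property the abstract argues matters when early signal detection is the goal.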
  • Validating effectiveness of subgroup identification for longitudinal data
    • Abstract: In clinical trials and biomedical studies, treatments are compared to determine which one is effective against illness; however, individuals can react to the same treatment very differently. We propose a complete process for longitudinal data that identifies subgroups of the population that would benefit from a specific treatment. A random effects linear model is used to evaluate individual treatment effects longitudinally where the random effects identify a positive or negative reaction to the treatment over time. With the individual treatment effects and characteristics of the patients, various classification algorithms are applied to build prediction models for subgrouping. While many subgrouping approaches have been developed recently, most of them do not check their validity. In this paper, we further propose a simple validation approach which not only determines if the subgroups used are appropriate and beneficial but also compares methods to predict individual treatment effects. This entire procedure is readily implemented by existing packages in statistical software. The effectiveness of the proposed method is confirmed with simulation studies and analysis of data from the Women Entering Care study on depression.
  • Accommodating the ecological fallacy in disease mapping in the absence of
           individual exposures
    • Abstract: In health exposure modeling, in particular, disease mapping, the ecological fallacy arises because the relationship between aggregated disease incidence on areal units and average exposure on those units differs from the relationship between the event of individual incidence and the associated individual exposure. This article presents a novel modeling approach to address the ecological fallacy in the least informative data setting. We assume the known population at risk with an observed incidence for a collection of areal units and, separately, environmental exposure recorded during the period of incidence at a collection of monitoring stations. We do not assume any partial individual level information or random allocation of individuals to observed exposures. We specify a conceptual incidence surface over the study region as a function of an exposure surface resulting in a stochastic integral of the block average disease incidence. The true block level incidence is an unavailable Monte Carlo integration for this stochastic integral. We propose an alternative manageable Monte Carlo integration for the integral. Modeling in this setting is immediately hierarchical, and we fit our model within a Bayesian framework. To alleviate the resulting computational burden, we offer 2 strategies for efficient model fitting: one is through modularization, the other is through sparse or dimension‐reduced Gaussian processes. We illustrate the performance of our model with simulations based on a heat‐related mortality dataset in Ohio and then analyze associated real data.
  • Label‐invariant models for the analysis of
           meta‐epidemiological data
    • Abstract: Rich meta‐epidemiological data sets have been collected to explore associations between intervention effect estimates and study‐level characteristics. Welton et al proposed models for the analysis of meta‐epidemiological data, but these models are restrictive because they force heterogeneity among studies with a particular characteristic to be at least as large as that among studies without the characteristic. In this paper we present alternative models that are invariant to the labels defining the 2 categories of studies. To exemplify the methods, we use a collection of meta‐analyses in which the Cochrane Risk of Bias tool has been implemented. We first investigate the influence of small trial sample sizes (less than 100 participants), before investigating the influence of multiple methodological flaws (inadequate or unclear sequence generation, allocation concealment, and blinding). We fit both the Welton et al model and our proposed label‐invariant model and compare the results. Estimates of mean bias associated with the trial characteristics and of between‐trial variances are not very sensitive to the choice of model. Results from fitting a univariable model show that heterogeneity variance is, on average, 88% greater among trials with less than 100 participants. On the basis of a multivariable model, heterogeneity variance is, on average, 25% greater among trials with inadequate/unclear sequence generation, 51% greater among trials with inadequate/unclear blinding, and 23% lower among trials with inadequate/unclear allocation concealment, although the 95% intervals for these ratios are very wide. Our proposed label‐invariant models for meta‐epidemiological data analysis facilitate investigations of between‐study heterogeneity attributable to certain study characteristics.
  • Circular‐circular regression model with a spike at zero
    • Abstract: With reference to a real data set on cataract surgery, we discuss the problem of zero‐inflated circular‐circular regression when both covariate and response are circular random variables and a large proportion of the responses are zeros. The regression model is proposed, and the estimation procedure for the parameters is discussed. Some relevant test procedures are also suggested. Simulation studies and real data analysis are performed to illustrate the applicability of the model.
  • Practical recommendations for reporting Fine‐Gray model analyses for
           competing risk data
    • Abstract: In survival analysis, a competing risk is an event whose occurrence precludes the occurrence of the primary event of interest. Outcomes in medical research are frequently subject to competing risks. In survival analysis, there are 2 key questions that can be addressed using competing risk regression models: first, which covariates affect the rate at which events occur, and second, which covariates affect the probability of an event occurring over time. The cause‐specific hazard model estimates the effect of covariates on the rate at which events occur in subjects who are currently event‐free. Subdistribution hazard ratios obtained from the Fine‐Gray model describe the relative effect of covariates on the subdistribution hazard function. Hence, the covariates in this model can also be interpreted as having an effect on the cumulative incidence function or on the probability of events occurring over time. We conducted a review of the use and interpretation of the Fine‐Gray subdistribution hazard model in articles published in the medical literature in 2015. We found that many authors provided an unclear or incorrect interpretation of the regression coefficients associated with this model. An incorrect and inconsistent interpretation of regression coefficients may lead to confusion when comparing results across different studies. Furthermore, an incorrect interpretation of estimated regression coefficients can result in an incorrect understanding about the magnitude of the association between exposure and the incidence of the outcome. The objective of this article is to clarify how these regression coefficients should be reported and to propose suggestions for interpreting these coefficients.
  • Infectious disease prediction with kernel conditional density estimation
    • Abstract: Creating statistical models that generate accurate predictions of infectious disease incidence is a challenging problem whose solution could benefit public health decision makers. We develop a new approach to this problem using kernel conditional density estimation (KCDE) and copulas. We obtain predictive distributions for incidence in individual weeks using KCDE and tie those distributions together into joint distributions using copulas. This strategy enables us to create predictions for the timing of and incidence in the peak week of the season. Our implementation of KCDE incorporates 2 novel kernel components: a periodic component that captures seasonality in disease incidence and a component that allows for a full parameterization of the bandwidth matrix with discrete variables. We demonstrate via simulation that a fully parameterized bandwidth matrix can be beneficial for estimating conditional densities. We apply the method to predicting dengue fever and influenza and compare to a seasonal autoregressive integrated moving average model and HHH4, a previously published extension to the generalized linear model framework developed for infectious disease incidence. The KCDE outperforms the baseline methods for predictions of dengue incidence in individual weeks. The KCDE also offers more consistent performance than the baseline models for predictions of incidence in the peak week and is comparable to the baseline models on the other prediction targets. Using the periodic kernel function led to better predictions of incidence. Our approach and extensions of it could yield improved predictions for public health decision makers, particularly in diseases with heterogeneous seasonal dynamics such as dengue fever.
  • Generalized survival models for correlated time‐to‐event data
    • Abstract: Our aim is to develop a rich and coherent framework for modeling correlated time‐to‐event data, including (1) survival regression models with different links and (2) flexible modeling for time‐dependent and nonlinear effects with rich postestimation. We extend the class of generalized survival models, which expresses a transformed survival in terms of a linear predictor, by incorporating a shared frailty or random effects for correlated survival data. The proposed approach can include parametric or penalized smooth functions for time, time‐dependent effects, nonlinear effects, and their interactions. The maximum (penalized) marginal likelihood method is used to estimate the regression coefficients and the variance for the frailty or random effects. The optimal smoothing parameters for the penalized marginal likelihood estimation can be automatically selected by a likelihood‐based cross‐validation criterion. For models with normal random effects, Gauss‐Hermite quadrature can be used to obtain the cluster‐level marginal likelihoods. The Akaike Information Criterion can be used to compare models and select the link function. We have implemented these methods in the R package rstpm2. Simulating for both small and larger clusters, we find that this approach performs well. Through 2 applications, we demonstrate (1) a comparison of proportional hazards and proportional odds models with random effects for clustered survival data and (2) the estimation of time‐varying effects on the log‐time scale, age‐varying effects for a specific treatment, and two‐dimensional splines for time and age.
  • Understanding MCP‐MOD dose finding as a method based on linear
    • Abstract: MCP‐MOD is a testing and model selection approach for clinical dose finding studies. During testing, contrasts of dose group means are derived from candidate dose response models. A multiple‐comparison procedure is applied that controls the alpha level for the family of null hypotheses associated with the contrasts. Provided at least one contrast is significant, a corresponding set of “good” candidate models is identified. The model generating the most significant contrast is typically selected. There have been numerous publications on the method. It was endorsed by the European Medicines Agency. The MCP‐MOD procedure can be alternatively represented as a method based on simple linear regression, where “simple” refers to the inclusion of an intercept and a single predictor variable, which is a transformation of dose. It is shown that the contrasts are equal to least squares linear regression slope estimates after a rescaling of the predictor variables. The test for each contrast is the usual t statistic for a null slope parameter, except that a variance estimate with fewer degrees of freedom is used in the standard error. Selecting the model corresponding to the most significant contrast P value is equivalent to selecting the predictor variable yielding the smallest residual sum of squares. This criterion orders the models like a common goodness‐of‐fit test, but it does not assure a good fit. Common inferential methods applied to the selected model are subject to distortions that are often present following data‐based model selection.
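The equivalence between a contrast and a rescaled regression slope can be checked numerically. The sketch below assumes a balanced design (equal group sizes) with a common variance, for which the optimal contrast is proportional to the centered candidate-model means; the Emax parameters, doses, and observed group means are all made up for illustration.

```python
import math


def emax(dose, e0=0.0, e_max=1.0, ed50=25.0):
    """Hypothetical Emax candidate model; all parameter values are assumptions."""
    return e0 + e_max * dose / (ed50 + dose)


doses = [0.0, 10.0, 25.0, 50.0, 100.0]
group_means = [0.05, 0.31, 0.46, 0.70, 0.77]  # made-up observed dose-group means

# Predictor: the candidate model evaluated at each dose (a transformation of dose).
x = [emax(d) for d in doses]
x_bar = sum(x) / len(x)
centered = [xi - x_bar for xi in x]
norm = math.sqrt(sum(c * c for c in centered))

# Balanced-design contrast: centered model means scaled to unit length.
contrast = [c / norm for c in centered]
contrast_estimate = sum(c * y for c, y in zip(contrast, group_means))

# Least squares slope of the group means on the rescaled predictor z = x / norm.
z = [xi / norm for xi in x]
z_bar = sum(z) / len(z)
y_bar = sum(group_means) / len(group_means)
sxy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, group_means))
sxx = sum((zi - z_bar) ** 2 for zi in z)
slope = sxy / sxx

print(contrast_estimate, slope)  # the two values agree up to rounding error
```

Dividing the predictor by the norm of its centered values makes the sum of squares in the slope denominator equal to 1, so the slope reduces to the same weighted sum of group means as the contrast.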
  • Modeling conditional dependence among multiple diagnostic tests
    • Abstract: When multiple imperfect dichotomous diagnostic tests are applied to an individual, it is possible that some or all of their results remain dependent even after conditioning on the true disease status. The estimates could be biased if this conditional dependence is ignored when using the test results to infer about the prevalence of a disease or the accuracies of the diagnostic tests. However, statistical methods correcting for this bias by modelling higher‐order conditional dependence terms between multiple diagnostic tests are not well addressed in the literature. This paper extends a Bayesian fixed effects model for 2 diagnostic tests with pairwise correlation to cases with 3 or more diagnostic tests with higher order correlations. Simulation results show that the proposed fixed effects model works well both in the case when the tests are highly correlated and in the case when the tests are truly conditionally independent, provided adequate external information is available in the form of fixed constraints or prior distributions. A data set on the diagnosis of childhood pulmonary tuberculosis is used to illustrate the proposed model.
  • Parametric multistate survival models: Flexible modelling allowing
           transition‐specific distributions with application to estimating
           clinically useful measures of effect differences
    • Abstract: Multistate models are increasingly being used to model complex disease profiles. By modelling transitions between disease states, accounting for competing events at each transition, we can gain a much richer understanding of patient trajectories and how risk factors impact over the entire disease pathway. In this article, we concentrate on parametric multistate models, both Markov and semi‐Markov, and develop a flexible framework where each transition can be specified by a variety of parametric models including exponential, Weibull, Gompertz, Royston‐Parmar proportional hazards models or log‐logistic, log‐normal, generalised gamma accelerated failure time models, possibly sharing parameters across transitions. We also extend the framework to allow time‐dependent effects. We then use an efficient and generalisable simulation method to calculate transition probabilities from any fitted multistate model, and show how it facilitates the simple calculation of clinically useful measures, such as expected length of stay in each state, and differences and ratios of proportion within each state as a function of time, for specific covariate patterns. We illustrate our methods using a dataset of patients with primary breast cancer. User‐friendly Stata software is provided.
  • Modeling continuous response variables using ordinal regression
    • Abstract: We study the application of a widely used ordinal regression model, the cumulative probability model (CPM), for continuous outcomes. Such models are attractive for the analysis of continuous response variables because they are invariant to any monotonic transformation of the outcome and because they directly model the cumulative distribution function from which summaries such as expectations and quantiles can easily be derived. Such models can also readily handle mixed type distributions. We describe the motivation, estimation, inference, model assumptions, and diagnostics. We demonstrate that CPMs applied to continuous outcomes are semiparametric transformation models. Extensive simulations are performed to investigate the finite sample performance of these models. We find that properly specified CPMs generally have good finite sample performance with moderate sample sizes, but that bias may occur when the sample size is small. Cumulative probability models are fairly robust to minor or moderate link function misspecification in our simulations. For certain purposes, the CPMs are more efficient than other models. We illustrate their application, with model diagnostics, in a study of the treatment of HIV. CD4 cell count and viral load 6 months after the initiation of antiretroviral therapy are modeled using CPMs; both variables typically require transformations, and viral load has a large proportion of measurements below a detection limit.
  • Penalized estimation for proportional hazards models with current status
    • Abstract: We provide a simple and practical, yet flexible, penalized estimation method for a Cox proportional hazards model with current status data. We approximate the baseline cumulative hazard function by monotone B‐splines and use a hybrid approach based on the Fisher‐scoring algorithm and the isotonic regression to compute the penalized estimates. We show that the penalized estimator of the nonparametric component achieves the optimal rate of convergence under some smoothness conditions and that the estimators of the regression parameters are asymptotically normal and efficient. Moreover, a simple variance estimation method is considered for inference on the regression parameters. We perform 2 extensive Monte Carlo studies to evaluate the finite‐sample performance of the penalized approach and compare it with the 3 competing R packages: C1.coxph, intcox, and ICsurv. A goodness‐of‐fit test and model diagnostics are also discussed. The methodology is illustrated with 2 real applications.
  • A model‐based conditional power assessment for decision making in
           randomized controlled trial studies
    • Abstract: Conditional power based on a summary statistic that directly compares outcomes (such as the sample mean) between 2 groups is a convenient tool for decision making in randomized controlled trial studies. In this paper, we extend the traditional summary statistic‐based conditional power with a general model‐based assessment strategy, where the test statistic is based on a regression model. Asymptotic relationships between parameter estimates based on the observed interim data and final unobserved data are established, from which we develop an analytic model‐based conditional power assessment for both Gaussian and non‐Gaussian data. The model‐based strategy is not only flexible in handling baseline covariates and more powerful in detecting the treatment effects compared with the conventional method but also more robust in controlling the overall type I error under certain missing data mechanisms. The performance of the proposed method is evaluated by extensive simulation studies and illustrated with an application to a clinical study.
  • Simultaneous confidence regions for multivariate bioequivalence
    • Abstract: Demonstrating bioequivalence of several pharmacokinetic (PK) parameters, such as AUC and Cmax, that are calculated from the same biological sample measurements is in fact a multivariate problem, even though this is neglected by most practitioners and regulatory bodies, who typically settle for separate univariate analyses. We believe, however, that a truly multivariate evaluation of all PK measures simultaneously is clearly more adequate. In this paper, we review methods to construct joint confidence regions around multivariate normal means and investigate their usefulness in simultaneous bioequivalence problems via simulation. Some of them work well for idealised scenarios but break down when faced with real-data challenges such as unknown variance and correlation among the PK parameters. We study the shapes of the confidence regions resulting from different methods, discuss how marginal simultaneous confidence intervals for the individual PK measures can be derived, and illustrate the application to data from a trial on ticlopidine hydrochloride. An R package is available.
  • A test for comparing current status survival data with crossing hazard
           functions and its application to immunogenicity of biotherapeutics
    • Abstract: Several tests have been recently implemented in the nonparametric comparison of current status survival data. However, they are not suited for the situation of crossing hazards. In this setting, we propose a new test specifically designed for crossing hazards alternatives. The proposed test is compared to classical implemented tests through simulations mimicking crossing hazards situations with various schemes of censoring. The results show that the proposed test has a correct type I error and generally outperforms the existing methods. The application of the proposed test on a real dataset on immunogenicity of interferon-β among multiple sclerosis patients highlights the interest of the proposed test.
  • Group testing regression models with dilution submodels
    • Abstract: Group testing, where specimens are tested initially in pools, is widely used to screen individuals for sexually transmitted diseases. However, a common problem encountered in practice is that group testing can increase the number of false negative test results. This occurs primarily when positive individual specimens within a pool are diluted by negative ones, resulting in positive pools testing negatively. If the goal is to estimate a population-level regression model relating individual disease status to observed covariates, severe bias can result if an adjustment for dilution is not made. Recognizing this as a critical issue, recent binary regression approaches in group testing have utilized continuous biomarker information to acknowledge the effect of dilution. In this paper, we have the same overall goal but take a different approach. We augment existing group testing regression models (that assume no dilution) with a parametric dilution submodel for pool-level sensitivity and estimate all parameters using maximum likelihood. An advantage of our approach is that it does not rely on external biomarker test data, which may not be available in surveillance studies. Furthermore, unlike previous approaches, our framework allows one to formally test whether dilution is present based on the observed group testing data. We use simulation to illustrate the performance of our estimation and inference methods, and we apply these methods to 2 infectious disease data sets.
  • Two-stage designs versus European scaled average designs in bioequivalence
           studies for highly variable drugs: Which to choose?
    • Abstract: The usual approach to determine bioequivalence for highly variable drugs is scaled average bioequivalence, which is based on expanding the limits as a function of the within-subject variability in the reference formulation. This requires separately estimating this variability and thus using replicated or semireplicated crossover designs. On the other hand, regulations also allow using common 2 × 2 crossover designs based on two-stage adaptive approaches with sample size reestimation at an interim analysis. The choice between scaled or two-stage designs is crucial and must be fully described in the protocol. Using Monte Carlo simulations, we show that both methodologies achieve comparable statistical power, though the scaled method usually requires a smaller sample size, but at the expense of each subject being exposed more times to the treatments. With an adequate initial sample size (not too low, eg, 24 subjects), two-stage methods are a flexible and efficient option to consider: They have enough power (eg, 80%) at the first stage for non-highly variable drugs, and, if otherwise, they provide the opportunity to step up to a second stage that includes additional subjects.
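How the limits expand with within-subject variability can be sketched as follows. This assumes the European rule as commonly described (regulatory constant 0.760, expansion only when the reference within-subject CV exceeds 30%, and a cap at a CV of 50%); these constants are stated here as an assumption and should be checked against the current guideline. The function name is hypothetical.

```python
import math

K = 0.760  # assumed regulatory constant for the European expanding-limits rule


def expanded_limits(cv_wr):
    """Return (lower, upper) acceptance limits for the geometric mean ratio.

    cv_wr is the within-subject coefficient of variation of the reference
    formulation, given as a fraction (0.40 means 40%).
    """
    if cv_wr <= 0.30:
        return 0.80, 1.25          # conventional limits, no expansion
    cv = min(cv_wr, 0.50)          # expansion is capped at CV = 50%
    s_wr = math.sqrt(math.log(cv ** 2 + 1.0))  # within-subject SD on the log scale
    upper = math.exp(K * s_wr)
    return 1.0 / upper, upper


for cv in (0.25, 0.35, 0.50, 0.80):
    lo, hi = expanded_limits(cv)
    print(f"CV={cv:.0%}: {lo:.4f} to {hi:.4f}")
```

Because the limits depend on the reference variability, that variability has to be estimated from repeated administrations of the reference product, which is why the scaled approach needs replicated or semireplicated designs.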
  • A robust interrupted time series model for analyzing complex health care
           intervention data
    • Abstract: Current health policy calls for greater use of evidence-based care delivery services to improve patient quality and safety outcomes. Care delivery is complex, with interacting and interdependent components that challenge traditional statistical analytic techniques, in particular, when modeling a time series of outcomes data that might be “interrupted” by a change in a particular method of health care delivery. Interrupted time series (ITS) is a robust quasi-experimental design with the ability to infer the effectiveness of an intervention that accounts for data dependency. Current standardized methods for analyzing ITS data do not model changes in variation and correlation following the intervention. This is a key limitation since it is plausible for data variability and dependency to change because of the intervention. Moreover, present methodology either assumes a prespecified interruption time point with an instantaneous effect or removes data for which the effect of intervention is not fully realized. In this paper, we describe and develop a novel robust interrupted time series (robust-ITS) model that overcomes these omissions and limitations. The robust-ITS model formally performs inference on (1) identifying the change point; (2) differences in preintervention and postintervention correlation; (3) differences in the outcome variance preintervention and postintervention; and (4) differences in the mean preintervention and postintervention. We illustrate the proposed method by analyzing patient satisfaction data from a hospital that implemented and evaluated a new nursing care delivery model as the intervention of interest. The robust-ITS model is implemented in an R Shiny toolbox, which is freely available to the community.
  • Algorithms for evaluating reference scaled average bioequivalence: Power,
           bias, and consumer risk
    • Abstract: The determination of the bioequivalence between highly variable drug products involves the evaluation of reference scaled average bioequivalence. The European and US regulatory authorities suggest different algorithms for the implementation of this approach. Both algorithms are based on approximations reflected in lower than the achievable power or higher than the nominal consumer risk of 5%. To overcome these deficiencies, a new class of algorithms, the so-called Exact methods, was earlier introduced. However, their applicability was limited. We propose 2 modifications which make their computation simpler and also applicable with any study design. Four algorithms were evaluated in simulated 3-period and 4-period bioequivalence studies: Hyslop's approach recommended by the US FDA, the method of average bioequivalence with expanding limits requested by the European EMA, and 2 versions of the new Exact methods. At small sample sizes, the Exact methods had substantially higher statistical power than Hyslop's algorithm and had lower consumer risk than the method of average bioequivalence with expanding limits. Similarly to Hyslop's algorithm, higher than 5% consumer risk was observed only with either unbalanced study design or with additional regulatory requirements. The improved Exact algorithms compare favorably with the alternative procedures. They are based on the bias correction method of Hedges. The recognition that the scaled difference statistic is measured with bias has important practical implications when results of pilot bioequivalence studies are evaluated and, at the same time, calls for the revision of the statistical theory of RSABE and its related methods.
  • Improving phase II oncology trials using best observed RECIST response as
           an endpoint by modelling continuous tumour measurements
    • Abstract: In many phase II trials in solid tumours, patients are assessed using endpoints based on the Response Evaluation Criteria in Solid Tumours (RECIST) scale. Often, analyses are based on the response rate. This is the proportion of patients who have an observed tumour shrinkage above a predefined level and no new tumour lesions. The augmented binary method has been proposed to improve the precision of the estimator of the response rate. The method involves modelling the tumour shrinkage to avoid dichotomising it. However, in many trials the best observed response is used as the primary outcome. In such trials, patients are followed until progression, and their best observed RECIST outcome is used as the primary endpoint. In this paper, we propose a method that extends the augmented binary method so that it can be used when the outcome is best observed response. We show through simulated data and data from a real phase II cancer trial that this method improves power in both single-arm and randomised trials. The average gain in power compared to the traditional analysis is equivalent to approximately a 35% increase in sample size. A modified version of the method is proposed to reduce the computational effort required. We show this modified method retains much of the efficiency advantage.
  • Mendelian randomization incorporating uncertainty about pleiotropy
    • Abstract: Mendelian randomization (MR) requires strong assumptions about the genetic instruments, of which the most difficult to justify relate to pleiotropy. In a two-sample MR, different methods of analysis are available if we are able to assume, M1: no pleiotropy (fixed effects meta-analysis), M2: that there may be pleiotropy but that the average pleiotropic effect is zero (random effects meta-analysis), and M3: that the average pleiotropic effect is nonzero (MR-Egger). In the latter 2 cases, we also require that the size of the pleiotropy is independent of the size of the effect on the exposure. Selecting one of these models without good reason would run the risk of misrepresenting the evidence for causality. The most conservative strategy would be to use M3 in all analyses as this makes the weakest assumptions, but such an analysis gives much less precise estimates and so should be avoided whenever stronger assumptions are credible. We consider the situation of a two-sample design when we are unsure which of these 3 pleiotropy models is appropriate. The analysis is placed within a Bayesian framework and Bayesian model averaging is used. We demonstrate that even large samples of the scale used in genome-wide meta-analysis may be insufficient to distinguish the pleiotropy models based on the data alone. Our simulations show that Bayesian model averaging provides a reasonable trade-off between bias and precision. Bayesian model averaging is recommended whenever there is uncertainty about the nature of the pleiotropy.
  • Correction
  • Modelling two cause-specific hazards of competing risks in one cumulative
           proportional odds model?
    • Abstract: Competing risks extend standard survival analysis to considering time-to-first-event and type-of-first-event, where the event types are called competing risks. The competing risks process is completely described by all cause-specific hazards, ie, the hazard marked by the event type. Separate Cox models for each cause-specific hazard are the standard approach to regression modelling, but they come with the interpretational challenge that there are as many regression coefficients as there are competing risks. An alternative approach is to directly model the cumulative event probabilities, but again, there will be as many models as there are competing risks. The aim of this paper is to investigate the usefulness of a third alternative. Proportional odds modelling of all cause-specific hazards summarizes the effect of one covariate on “opposing” competing outcomes in one regression coefficient. For instance, if the competing outcomes are hospital death and alive discharge from hospital, the modelling assumption is that a covariate affects both outcomes in opposing directions, but the effect size is of the same absolute magnitude. We will investigate the interpretational aspects of the approach analysing a data set on intensive care unit patients using parametric methods.
  • Bayesian hierarchical joint modeling of repeatedly measured continuous and
    • Abstract: Modeling of correlated biomarkers jointly has been shown to improve the efficiency of parameter estimates, leading to better clinical decisions. In this paper, we employ a joint modeling approach to a unique diabetes dataset, where blood glucose (continuous) and urine glucose (ordinal) measures of disease severity for diabetes are known to be correlated. The postulated joint model assumes that the outcomes are from distributions that are in the exponential family and hence modeled as multivariate generalized linear mixed effects model associated through correlated and/or shared random effects. The Markov chain Monte Carlo Bayesian approach is used to approximate posterior distribution and draw inference on the parameters. This proposed methodology provides a flexible framework to account for the hierarchical structure of the highly unbalanced data as well as the association between the 2 outcomes. The results indicate improved efficiency of parameter estimates when blood glucose and urine glucose are modeled jointly. Moreover, the simulation studies show that estimates obtained from the joint model are consistently less biased and more efficient than those in the separate models.
  • A synthetic estimator for the efficacy of clinical trials with
           all-or-nothing compliance
    • Abstract: A critical issue in the analysis of clinical trials is patients' noncompliance to assigned treatments. In the context of a binary treatment with all or nothing compliance, the intent-to-treat analysis is a straightforward approach to estimating the effectiveness of the trial. In contrast, there exist 3 commonly used estimators with varying statistical properties for the efficacy of the trial, formally known as the complier-average causal effect. The instrumental variable estimator may be unbiased but can be extremely variable in many settings. The as treated and per protocol estimators are usually more efficient than the instrumental variable estimator, but they may suffer from selection bias. We propose a synthetic approach that incorporates all 3 estimators in a data-driven manner. The synthetic estimator is a linear convex combination of the instrumental variable, per protocol, and as treated estimators, resembling the popular model-averaging approach in the statistical literature. However, our synthetic approach is nonparametric; thus, it is applicable to a variety of outcome types without specific distributional assumptions. We also discuss the construction of the synthetic estimator using an analytic form derived from a simple normal mixture distribution. We apply the synthetic approach to a clinical trial for post-traumatic stress disorder.
  • Survival trees for interval-censored survival data
    • Abstract: Interval-censored data, in which the event time is only known to lie in some time interval, arise commonly in practice, for example, in a medical study in which patients visit clinics or hospitals at prescheduled times and the events of interest occur between visits. Such data are appropriately analyzed using methods that account for this uncertainty in event time measurement. In this paper, we propose a survival tree method for interval-censored data based on the conditional inference framework. Using Monte Carlo simulations, we find that the tree is effective in uncovering underlying tree structure, performs similarly to an interval-censored Cox proportional hazards model fit when the true relationship is linear, and performs at least as well as (and in the presence of right-censoring outperforms) the Cox model when the true relationship is not linear. Further, the interval-censored tree outperforms survival trees based on imputing the event time as an endpoint or the midpoint of the censoring interval. We illustrate the application of the method on tooth emergence data.
  • A semiparametric joint model for terminal trend of quality of life and
           survival in palliative care research
    • Abstract: Palliative medicine is an interdisciplinary specialty focusing on improving quality of life (QOL) for patients with serious illness and their families. Palliative care programs are available or under development at over 80% of large US hospitals (300+ beds). Palliative care clinical trials present unique analytic challenges relative to evaluating the palliative care treatment efficacy, which is to improve patients’ diminishing QOL as disease progresses towards end of life (EOL). A unique feature of palliative care clinical trials is that patients will experience decreasing QOL during the trial despite potentially beneficial treatment. Often longitudinal QOL and survival data are highly correlated which, in the face of censoring, makes it challenging to properly analyze and interpret the terminal QOL trend. To address these issues, we propose a novel semiparametric statistical approach to jointly model the terminal trend of QOL and survival data. There are two sub-models in our approach: a semiparametric mixed effects model for longitudinal QOL and a Cox model for survival. We use the regression splines method to estimate the nonparametric curves and AIC to select knots. We assess the model performance through simulation to establish a novel modeling approach that could be used in future palliative care research trials. Application of our approach in a recently completed palliative care clinical trial is also presented.
  • Study of coverage of confidence intervals for the standardized mortality
           ratio in studies with missing death certificates
    • Abstract: This paper assesses the coverage probability of commonly used confidence intervals for the standardized mortality ratio (SMR) when death certificates are missing. It also proposes alternative confidence interval approaches with coverage probabilities close to 0.95. In epidemiology, the SMR is an important measure of risk of disease mortality (or incidence) to compare a specific group to a reference population. The appropriate confidence interval for the SMR is crucial, especially when the SMR is close to 1.0 and the statistical significance of the risk needs to be determined. There are several ways to calculate confidence intervals, depending on a study's characteristics (ie, studies with a small number of deaths, studies with small counts, aggregate SMRs based on several countries or time periods, and studies with missing death certificates). This paper summarizes the most commonly used confidence intervals and newly applies several existing approaches not previously used for SMR confidence intervals. The coverage probability and length of the different confidence intervals are assessed using a simulation study and different scenarios. The performance of the confidence intervals for the lung cancer SMR and all other cancer SMR is also assessed using the dataset of French and Czech uranium miners. Finally, the most appropriate confidence intervals to use under different study scenarios are recommended.
  • Leveraging Prognostic Baseline Variables to Gain Precision in Randomized
  • A comparison of 20 heterogeneity variance estimators in statistical
           synthesis of results from studies: A simulation study
    • Abstract: When we synthesize research findings via meta-analysis, it is common to assume that the true underlying effect differs across studies. Total variability consists of the within-study and between-study variances (heterogeneity). There have been established measures, such as I², to quantify the proportion of the total variation attributed to heterogeneity. There is a plethora of estimation methods available for estimating heterogeneity. The widely used DerSimonian and Laird estimation method has been challenged, but knowledge of the overall performance of heterogeneity estimators is incomplete. We identified 20 heterogeneity estimators in the literature and evaluated their performance in terms of mean absolute estimation error, coverage probability, and length of the confidence interval for the summary effect via a simulation study. Although previous simulation studies have suggested the Paule-Mandel estimator, it has not been compared with all the available estimators. For dichotomous outcomes, estimating heterogeneity through Markov chain Monte Carlo is a good choice if an informative prior distribution for heterogeneity is employed (eg, by published Cochrane reviews). Nonparametric bootstrap and positive DerSimonian and Laird perform well for all assessment criteria for both dichotomous and continuous outcomes. The Hartung-Makambi estimator can be the best choice when the heterogeneity values are close to 0.07 for dichotomous outcomes and medium heterogeneity values (0.01, 0.05) for continuous outcomes. Hence, there are heterogeneity estimators (nonparametric bootstrap DerSimonian and Laird and positive DerSimonian and Laird) that perform better than the suggested Paule-Mandel. Maximum likelihood provides the best performance for both types of outcome in the absence of heterogeneity.
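      As a rough illustration of the quantities named in this abstract, the sketch below computes the standard method-of-moments DerSimonian and Laird estimate of the between-study variance (truncated at zero) together with I². It is not taken from the paper: the function name `dersimonian_laird` and the example inputs are hypothetical, and none of the other 19 estimators the authors compare are implemented here.

```python
def dersimonian_laird(effects, variances):
    """Return (tau2, i2) from study effect estimates and within-study variances.

    Hypothetical helper for illustration; assumes all variances are positive.
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")

    # Inverse-variance (fixed-effect) weights and pooled estimate.
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum_w

    # Cochran's Q statistic and its degrees of freedom.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1

    # Method-of-moments estimate of the between-study variance, truncated at 0.
    scale = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / scale)

    # I^2: share of total variation attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return tau2, i2


# Made-up example data: three studies with equal within-study variance.
tau2, i2 = dersimonian_laird([0.0, 1.0, 2.0], [0.1, 0.1, 0.1])
print(round(tau2, 3), round(i2, 3))
```

      Truncating at zero keeps the variance estimate non-negative, which is the usual convention for this estimator.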
  • Meta-analytical synthesis of regression coefficients under different
           categorization scheme of continuous covariates
    • Abstract: Recently, the number of clinical prediction models sharing the same regression task has increased in the medical literature. However, evidence synthesis methodologies that use the results of these regression models have not been sufficiently studied, particularly in meta-analysis settings where only regression coefficients are available. One of the difficulties lies in the differences between the categorization schemes of continuous covariates across different studies. In general, categorization methods using cutoff values are study specific across available models, even if they focus on the same covariates of interest. Differences in the categorization of covariates could lead to serious bias in the estimated regression coefficients and thus in subsequent syntheses. To tackle this issue, we developed synthesis methods for linear regression models with different categorization schemes of covariates. A 2-step approach to aggregate the regression coefficient estimates is proposed. The first step is to estimate the joint distribution of covariates by introducing a latent sampling distribution, which uses one set of individual participant data to estimate the marginal distribution of covariates with categorization. The second step is to use a nonlinear mixed-effects model with correction terms for the bias due to categorization to estimate the overall regression coefficients. Especially in terms of precision, numerical simulations show that our approach outperforms conventional methods, which only use studies with common covariates or ignore the differences between categorization schemes. The method developed in this study is also applied to a series of WHO epidemiologic studies on white blood cell counts.
  • A clinical trial design using the concept of proportional time using the
           generalized gamma ratio distribution
    • Abstract: Traditional methods of sample size and power calculations in clinical trials with a time-to-event end point are based on the logrank test (and its variations), Cox proportional hazards (PH) assumption, or comparison of means of 2 exponential distributions. Of these, sample size calculation based on PH assumption is likely the most common and allows adjusting for the effect of one or more covariates. However, when designing a trial, there are situations when the assumption of PH may not be appropriate. Additionally, when it is known that there is a rapid decline in the survival curve for a control group, such as from previously conducted observational studies, a design based on the PH assumption may confer only a minor statistical improvement for the treatment group that is neither clinically nor practically meaningful. For such scenarios, a clinical trial design that focuses on improvement in patient longevity is proposed, based on the concept of proportional time using the generalized gamma ratio distribution. Simulations are conducted to evaluate the performance of the proportional time method and to identify the situations in which such a design will be beneficial as compared to the standard design using a PH assumption, piecewise exponential hazards assumption, and specific cases of a cure rate model. A practical example in which hemorrhagic stroke patients are randomized to 1 of 2 arms in a putative clinical trial demonstrates the usefulness of this approach by drastically reducing the number of patients needed for study enrollment.
  • Mediation analysis for a survival outcome with time-varying exposures,
           mediators, and confounders
    • Abstract: We propose an approach to conduct mediation analysis for survival data with time-varying exposures, mediators, and confounders. We identify certain interventional direct and indirect effects through a survival mediational g-formula and describe the required assumptions. We also provide a feasible parametric approach along with an algorithm and software to estimate these effects. We apply this method to analyze the Framingham Heart Study data to investigate the causal mechanism of smoking on mortality through coronary artery disease. The estimated overall 10-year all-cause mortality risk difference comparing “always smoke 30 cigarettes per day” versus “never smoke” was 4.3 (95% CI = (1.37, 6.30)). Of the overall effect, we estimated 7.91% (95% CI = (1.36%, 19.32%)) was mediated by the incidence and timing of coronary artery disease. The survival mediational g-formula constitutes a powerful tool for conducting mediation analysis with longitudinal data.
  • Notes on the overlap measure as an alternative to the Youden index: How
           are they related?
    • Abstract: The receiver operating characteristic (ROC) curve is frequently used to evaluate and compare diagnostic tests. As one of the ROC summary indices, the Youden index measures the effectiveness of a diagnostic marker and enables the selection of an optimal threshold value (cut-off point) for the marker. Recently, the overlap coefficient, which captures the similarity between 2 distributions directly, has been considered as an alternative index for determining the diagnostic performance of markers. In this case, a larger overlap indicates worse diagnostic accuracy, and vice versa. This paper provides a graphical demonstration and mathematical derivation of the relationship between the Youden index and the overlap coefficient and states their advantages over the most popular diagnostic measure, the area under the ROC curve. Furthermore, we outline the differences between the Youden index and overlap coefficient and identify situations in which the overlap coefficient outperforms the Youden index. Numerical examples and real data analysis are provided.
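      A small numerical sketch of the two indices described in this abstract, under assumptions that are not from the paper: both marker distributions are normal with equal variance, and larger marker values indicate disease. In that special case the two densities cross once, so the Youden index J and the overlap coefficient OVL satisfy J = 1 - OVL. The function name `youden_index` and the example parameters are hypothetical; the overlap is computed with the standard library's `NormalDist.overlap`.

```python
from statistics import NormalDist


def youden_index(healthy, diseased, grid=10001):
    """Return (J, threshold) by scanning cut-off points on a regular grid.

    J = max over c of [sensitivity(c) + specificity(c) - 1], assuming that
    marker values above the cut-off c are classified as diseased.
    """
    lo = min(healthy.mean - 6 * healthy.stdev, diseased.mean - 6 * diseased.stdev)
    hi = max(healthy.mean + 6 * healthy.stdev, diseased.mean + 6 * diseased.stdev)
    best_j, best_c = 0.0, lo
    for i in range(grid):
        c = lo + (hi - lo) * i / (grid - 1)
        sensitivity = 1.0 - diseased.cdf(c)
        specificity = healthy.cdf(c)
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_j, best_c = j, c
    return best_j, best_c


# Made-up example: equal-variance normal markers, means two SDs apart.
healthy = NormalDist(mu=0.0, sigma=1.0)
diseased = NormalDist(mu=2.0, sigma=1.0)

j, cutoff = youden_index(healthy, diseased)
ovl = healthy.overlap(diseased)  # shared area under the two densities
print(round(j, 4), round(ovl, 4), round(j + ovl, 4))
```

      With unequal variances the densities can cross twice and a single cut-off no longer recovers the full overlap, so the simple identity above should not be expected to hold in general.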
  • Inference for multimarker adaptive enrichment trials
    • Abstract: Identification of treatment selection biomarkers has become very important in cancer drug development. Adaptive enrichment designs have been developed for situations where a unique treatment selection biomarker is not apparent based on the mechanism of action of the drug. With such designs, the eligibility rules may be adaptively modified at interim analysis times to exclude patients who are unlikely to benefit from the test treatment. We consider a recently proposed, particularly flexible approach that permits development of model-based multifeature predictive classifiers as well as optimized cut-points for continuous biomarkers. A single significance test, including all randomized patients, is performed at the end of the trial of the strong null hypothesis that the expected outcome on the test treatment is no better than control for any of the subset populations of patients accrued in the K stages of the clinical trial. In this paper, we address 2 issues involving inference following an adaptive enrichment design as described above. The first is specification of the intended use population and estimation of treatment effect for that population following rejection of the strong null hypothesis. The second issue is defining conditions in which rejection of the strong null hypothesis implies rejection of the null hypothesis for the intended use population.
  • Joint two-part Tobit models for longitudinal and time-to-event data
    • Abstract: In this article, we show how Tobit models can address problems of identifying characteristics of subjects having left-censored outcomes in the context of developing a method for jointly analyzing time-to-event and longitudinal data. There are some methods for handling these types of data separately, but they may not be appropriate when time to event is dependent on the longitudinal outcome, and a substantial portion of values are reported to be below the limits of detection. An alternative approach is to develop a joint model for the time-to-event outcome and a two-part longitudinal outcome, linking them through random effects. This proposed approach is implemented to assess the association between the risk of decline of CD4/CD8 ratio and rates of change in viral load, along with discriminating between patients who are potentially progressors to AIDS from patients who do not. We develop a fully Bayesian approach for fitting joint two-part Tobit models and illustrate the proposed methods on simulated and real data from an AIDS clinical study.
  • Blood pressure and the risk of chronic kidney disease progression using
           multistate marginal structural models in the CRIC Study
    • Abstract: In patients with chronic kidney disease (CKD), clinical interest often centers on determining treatments and exposures that are causally related to renal progression. Analyses of longitudinal clinical data in this population are often complicated by clinical competing events, such as end-stage renal disease (ESRD) and death, and time-dependent confounding, where patient factors that are predictive of later exposures and outcomes are affected by past exposures. We developed multistate marginal structural models (MS-MSMs) to assess the effect of time-varying systolic blood pressure on disease progression in subjects with CKD. The multistate nature of the model allows us to jointly model disease progression characterized by changes in the estimated glomerular filtration rate (eGFR), the onset of ESRD, and death, and thereby avoid unnatural assumptions of death and ESRD as noninformative censoring events for subsequent changes in eGFR. We model the causal effect of systolic blood pressure on the probability of transitioning into 1 of 6 disease states given the current state. We use inverse probability weights with stabilization to account for potential time-varying confounders, including past eGFR, total protein, serum creatinine, and hemoglobin. We apply the model to data from the Chronic Renal Insufficiency Cohort Study, a multisite observational study of patients with CKD.
  • Subgroup detection and sample size calculation with proportional hazards
           regression for survival data
    • Abstract: In this paper, we propose a testing procedure for detecting and estimating the subgroup with an enhanced treatment effect in survival data analysis. Here, we consider a new proportional hazard model that includes a nonparametric component for the covariate effect in the control group and a subgroup-treatment–interaction effect defined by a change plane. We develop a score-type test for detecting the existence of the subgroup, which is doubly robust against misspecification of the baseline effect model or the propensity score but not both under mild assumptions for censoring. When the null hypothesis of no subgroup is rejected, the change-plane parameters that define the subgroup can be estimated on the basis of the supremum of the normalized score statistic. The asymptotic distributions of the proposed test statistic under the null and local alternative hypotheses are established. On the basis of the established asymptotic distributions, we further propose a sample size calculation formula for detecting a given subgroup effect and derive a numerical algorithm for implementing the sample size calculation in clinical trial designs. The performance of the proposed approach is evaluated by simulation studies. An application to AIDS clinical trial data is also given for illustration.
  • Links between causal effects and causal association for surrogacy
           evaluation in a gaussian setting
    • Abstract: Two paradigms for the evaluation of surrogate markers in randomized clinical trials have been proposed: the causal effects paradigm and the causal association paradigm. Each of these paradigms relies on assumptions that must be made to proceed with estimation and to validate a candidate surrogate marker (S) for the true outcome of interest (T). We consider the setting in which S and T are Gaussian and are generated from structural models that include an unobserved confounder. Under the assumed structural models, we relate the quantities used to evaluate surrogacy within both the causal effects and causal association frameworks. We review some of the common assumptions made to aid in estimating these quantities and show that assumptions made within one framework can imply strong assumptions within the alternative framework. We demonstrate that there is a similarity, but not an exact correspondence, between the quantities used to evaluate surrogacy within each framework, and show that the conditions for identifiability of the surrogacy parameters are different from the conditions that lead to a correspondence of these quantities.
  • Comparing sampling methods for pharmacokinetic studies using model
           averaged derived parameters
    • Abstract: Pharmacokinetic studies aim to study how a compound is absorbed, distributed, metabolised, and excreted. The concentration of the compound in the blood or plasma is measured at different time points after administration and pharmacokinetic parameters such as the area under the curve (AUC) or maximum concentration (Cmax) are derived from the resulting concentration time profile. In this paper, we want to compare different methods for collecting concentration measurements (traditional sampling versus microsampling) on the basis of these derived parameters. We adjust and evaluate an existing method for testing superiority of multiple derived parameters that accounts for model uncertainty. We subsequently extend the approach to allow testing for equivalence. We motivate the methods through an illustrative example and evaluate the performance using simulations. The extensions show promising results for application to the desired setting.
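      To make the derived parameters mentioned in this abstract concrete, the sketch below computes AUC with the linear trapezoidal rule and Cmax as the largest observed concentration. This is a common noncompartmental convention and an assumption here, not necessarily how the paper derives its parameters; the function names and the concentration-time values are made up for illustration.

```python
def auc_trapezoid(times, concentrations):
    """Area under the concentration-time curve by the linear trapezoidal rule."""
    if len(times) != len(concentrations) or len(times) < 2:
        raise ValueError("need at least two matching time/concentration points")
    points = list(zip(times, concentrations))
    # Sum the area of each trapezoid between consecutive sampling times.
    return sum(
        0.5 * (c0 + c1) * (t1 - t0)
        for (t0, c0), (t1, c1) in zip(points, points[1:])
    )


def cmax(concentrations):
    """Maximum observed concentration."""
    return max(concentrations)


# Hypothetical profile: sampling times in hours, concentrations in arbitrary units.
times = [0.0, 1.0, 2.0, 4.0]
concentrations = [0.0, 4.0, 3.0, 1.0]

print(auc_trapezoid(times, concentrations))  # 2.0 + 3.5 + 4.0 = 9.5
print(cmax(concentrations))                  # 4.0
```

      Comparing two sampling methods on these parameters would then amount to computing them from each method's profile for the same subjects.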
  • STEIN: A simple toxicity and efficacy interval design for seamless phase
           I/II clinical trials
    • Abstract: Seamless phase I/II dose-finding trials are attracting increasing attention nowadays in early-phase drug development for oncology. Most existing phase I/II dose-finding methods use sophisticated yet untestable models to quantify dose-toxicity and dose-efficacy relationships, which always renders them difficult to implement in practice. To simplify the practical implementation, we extend the Bayesian optimal interval design from maximum tolerated dose finding to optimal biological dose finding in phase I/II trials. In particular, optimized intervals for toxicity and efficacy are respectively derived by minimizing probabilities of incorrect classifications. If the pair of observed toxicity and efficacy probabilities at the current dose is located inside the promising region, we retain the current dose; if the observed probabilities are outside of the promising region, we propose an allocation rule by maximizing the posterior probability that the response rate of the next dose falls inside a prespecified efficacy probability interval while still controlling the level of toxicity. The proposed interval design is model-free, thus is suitable for various dose-response relationships. We conduct extensive simulation studies to demonstrate the small- and large-sample performance of the proposed method under various scenarios. Compared to existing phase I/II dose-finding designs, not only is our interval design easy to implement in practice, but it also possesses desirable and robust operating characteristics.
  • Quantile causal mediation analysis allowing longitudinal data
    • Abstract: Mediation analysis has mostly been conducted with mean regression models. With this approach modeling means, formulae for direct and indirect effects are based on changes in means, which may not capture effects that occur in units at the tails of mediator and outcome distributions. Individuals with extreme values of medical endpoints are often more susceptible to disease and can be missed if one investigates mean changes only. We derive the controlled direct and indirect effects of an exposure along percentiles of the mediator and outcome using quantile regression models and a causal framework. The quantile regression models can accommodate an exposure-mediator interaction and random intercepts to allow for longitudinal mediator and outcome. Because DNA methylation acts as a complex “switch” to control gene expression and fibrinogen is a cardiovascular factor, individuals with extreme levels of these markers may be more susceptible to air pollution. We therefore apply this methodology to environmental data to estimate the effect of air pollution, as measured by particle number, on fibrinogen levels through a change in interferon-gamma (IFN-γ) methylation. We estimate the controlled direct effect of air pollution on the qth percentile of fibrinogen and its indirect effect through a change in the pth percentile of IFN-γ methylation. We found evidence of a direct effect of particle number on the upper tail of the fibrinogen distribution. We observed a suggestive indirect effect of particle number on the upper tail of the fibrinogen distribution through a change in the lower percentiles of the IFN-γ methylation distribution.
  • A pattern-mixture model approach for handling missing continuous outcome
           data in longitudinal cluster randomized trials
    • Abstract: We extend the pattern-mixture approach to handle missing continuous outcome data in longitudinal cluster randomized trials, which randomize groups of individuals to treatment arms, rather than the individuals themselves. Individuals who drop out at the same time point are grouped into the same dropout pattern. We approach extrapolation of the pattern-mixture model by applying multilevel multiple imputation, which imputes missing values while appropriately accounting for the hierarchical data structure found in cluster randomized trials. To assess parameters of interest under various missing data assumptions, imputed values are multiplied by a sensitivity parameter, k, which increases or decreases imputed values. Using simulated data, we show that estimates of parameters of interest can vary widely under differing missing data assumptions. We conduct a sensitivity analysis using real data from a cluster randomized trial by increasing k until the treatment effect inference changes. By performing a sensitivity analysis for missing data, researchers can assess whether certain missing data assumptions are reasonable for their cluster randomized trial.
  • Bayesian analysis of pair-matched case-control studies subject to outcome
    • Abstract: We examine the impact of nondifferential outcome misclassification on odds ratios estimated from pair-matched case-control studies and propose a Bayesian model to adjust these estimates for misclassification bias. The model relies on access to a validation subgroup with confirmed outcome status for all case-control pairs as well as prior knowledge about the positive and negative predictive value of the classification mechanism. We illustrate the model's performance on simulated data and apply it to a database study examining the presence of ten morbidities in the prodromal phase of multiple sclerosis.