Statistical Methods in Medical Research
Journal Prestige (SJR): 1.402
Citation Impact (CiteScore): 2
 
  Hybrid journal (may contain Open Access articles)
ISSN (Print) 0962-2802 - ISSN (Online) 1477-0334
Published by Sage Publications
  • Adaptive group sequential survival comparisons based on log-rank and
           pointwise test statistics

      Authors: Jannik Feld, Andreas Faldum, Rene Schmidt
      Abstract: Statistical Methods in Medical Research, Ahead of Print.
      Whereas the theory of confirmatory adaptive designs is well understood for uncensored data, implementation of adaptive designs in the context of survival trials remains challenging. Commonly used adaptive survival tests are based on the independent increments structure of the log-rank statistic, which implies some relevant limitations: first, essentially only the interim log-rank statistic may be used for design modifications (such as data-dependent sample size recalculation); second, the treatment arm allocation ratio in these classical methods is assumed to be constant throughout the trial period. Here, we propose an extension of the independent increments approach to adaptive survival tests that addresses some of these limitations. We present a confirmatory adaptive two-sample log-rank test that allows rejection regions and sample size recalculation rules to be based not only on the interim log-rank statistic but also, simultaneously, on pointwise survival rate estimates. In addition, the treatment arm allocation ratio may be adapted after each interim analysis in a data-dependent way. The ability to include pointwise survival rate estimators in the rejection region of a test for comparing survival curves might be attractive, e.g. for seamless phase II/III designs. Data-dependent adaptation of the allocation ratio could be helpful in multi-arm trials in order to successively steer recruitment into the study arms with the greatest chances of success. The methodology is motivated by the LOGGIC Europe Trial from pediatric oncology. Distributional properties are derived using martingale techniques in the large sample limit. Small sample properties are studied by simulation.
      Citation: Statistical Methods in Medical Research
      PubDate: 2021-10-13T04:03:06Z
      DOI: 10.1177/09622802211043262
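
      A minimal R sketch (using the survival package) of the two interim quantities such a design can combine: the log-rank statistic and pointwise Kaplan–Meier survival estimates at a landmark time. The simulated data and the landmark time are illustrative assumptions; this is not the authors' adaptive procedure.

        library(survival)

        set.seed(1)
        d <- data.frame(
          time   = rexp(200, rate = 0.1),
          status = rbinom(200, 1, 0.8),   # 1 = event, 0 = censored
          arm    = rep(c("control", "treatment"), each = 100)
        )

        survdiff(Surv(time, status) ~ arm, data = d)        # interim log-rank test
        km <- survfit(Surv(time, status) ~ arm, data = d)
        summary(km, times = 12)                             # pointwise survival at t = 12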
       
  • Multiple imputation with missing data indicators

      Authors: Lauren J Beesley, Irina Bondarenko, Michael R Elliott, Allison W Kurian, Steven J Katz, Jeremy MG Taylor
      Abstract: Statistical Methods in Medical Research, Ahead of Print.
      Multiple imputation is a well-established general technique for analyzing data with missing values. A convenient way to implement multiple imputation is sequential regression multiple imputation, also called chained equations multiple imputation. In this approach, we impute missing values using regression models for each variable, conditional on the other variables in the data. This approach, however, assumes that the missingness mechanism is missing at random, and it is not well-justified under not-at-random missingness without additional modification. In this paper, we describe how to generalize the sequential regression multiple imputation procedure to handle missingness not at random in the setting where missingness may depend on other variables that are also missing but not on the missing variable itself, conditioning on fully observed variables. We provide algebraic justification for several generalizations of standard sequential regression multiple imputation using Taylor series and other approximations of the target imputation distribution under missingness not at random. The resulting regression model approximations include indicators for missingness, interactions, or other functions of the assumed missingness-not-at-random model and the observed data. In a simulation study, we demonstrate that the proposed sequential regression multiple imputation modifications result in reduced bias in the final analysis compared to standard sequential regression multiple imputation, with an approximation strategy involving inclusion of an offset in the imputation model performing the best overall. The method is illustrated in a breast cancer study, where the goal is to estimate the prevalence of a specific genetic pathogenic variant.
      Citation: Statistical Methods in Medical Research
      PubDate: 2021-10-13T02:07:48Z
      DOI: 10.1177/09622802211047346
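
      A minimal sketch with the R package mice of the general idea of letting missingness indicators enter the chained-equations imputation models; it is in the spirit of, but not identical to, the modifications studied in the paper, and the data and variable names are simulated assumptions.

        library(mice)

        set.seed(2)
        df <- data.frame(x = rnorm(300), y = rnorm(300), z = rnorm(300))
        df$x[sample(300, 60)] <- NA
        df$y[sample(300, 60)] <- NA

        # Indicators for the missingness of the incomplete variables; each enters
        # the imputation models for the *other* variables as a predictor.
        df$Rx <- as.integer(is.na(df$x))
        df$Ry <- as.integer(is.na(df$y))

        imp <- mice(df, m = 5, printFlag = FALSE)
        summary(pool(with(imp, lm(y ~ x + z))))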
       
  • The asymptotic distribution of the Net Benefit estimator in presence of
           right-censoring

      Authors: Brice Ozenne, Esben Budtz-Jørgensen, Julien Péron
      Abstract: Statistical Methods in Medical Research, Ahead of Print.
      The benefit–risk balance is critical information when evaluating a new treatment. The Net Benefit has been proposed as a metric for benefit–risk assessment, and applied in oncology to simultaneously consider gains in survival and possible side effects of chemotherapies. With complete data, one can construct a U-statistic estimator for the Net Benefit and obtain its asymptotic distribution using standard results of U-statistic theory. However, real data are often subject to right-censoring, e.g. due to patient drop-out in clinical trials. It is then possible to estimate the Net Benefit using a modified U-statistic, which involves the survival time distribution; the latter can be seen as a nuisance parameter affecting the asymptotic distribution of the Net Benefit estimator. We present here how existing asymptotic results on U-statistics can be applied to estimate the distribution of the Net Benefit estimator, and assess their validity in finite samples. The methodology generalizes to other statistics obtained using generalized pairwise comparisons, such as the win ratio. It is implemented in the R package BuyseTest (version 2.3.0 and later), available on the Comprehensive R Archive Network (CRAN).
      Citation: Statistical Methods in Medical Research
      PubDate: 2021-10-11T03:13:19Z
      DOI: 10.1177/09622802211037067
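
      A minimal base-R sketch of the complete-data U-statistic behind the Net Benefit: every treatment/control pair is scored as a win (+1), loss (-1), or tie (0) on survival time, using a clinically relevant threshold tau. Censoring is ignored here; handling right-censoring is exactly what the paper (and the BuyseTest package it references) addresses.

        net_benefit <- function(trt, ctr, tau = 0) {
          dmat <- outer(trt, ctr, "-")              # all pairwise differences
          mean((dmat > tau) - (dmat < (-tau)))      # P(win) - P(loss)
        }

        set.seed(3)
        net_benefit(trt = rexp(50, 0.08), ctr = rexp(50, 0.10), tau = 2)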
       
  • Conditional copula models for correlated survival endpoints: Individual
           patient data meta-analysis of randomized controlled trials

      Authors: Takeshi Emura, Casimir Ledoux Sofeu, Virginie Rondeau
      Abstract: Statistical Methods in Medical Research, Ahead of Print.
      Correlations among survival endpoints are important for exploring surrogate endpoints of the true endpoint. With a valid surrogate endpoint tightly correlated with the true endpoint, the efficacy of a new drug or treatment can be measured on the surrogate. However, the existing methods for measuring correlation between two endpoints impose an invalid assumption: that the correlation structure is constant across different treatment arms. In this article, we reconsider the definition of Kendall's concordance measure (tau) in the context of individual patient data meta-analyses of randomized controlled trials. Under our new definition of Kendall's tau, its value depends on the treatment arm. We then suggest extending the existing copula (and frailty) models so that their Kendall's tau can vary across treatment arms. Our newly proposed model, a joint frailty-conditional copula model, implements the new definition of Kendall's tau in meta-analyses. To facilitate our approach, we develop an original R function condCox.reg(.) and make it available in the R package joint.Cox (https://CRAN.R-project.org/package=joint.Cox). We apply the proposed method to a gastric cancer dataset (3288 patients in 14 randomized trials from the GASTRIC group). This data analysis concludes that Kendall's tau differs between the surgical treatment arm and the adjuvant chemotherapy arm.
      Citation: Statistical Methods in Medical Research
      PubDate: 2021-10-09T05:56:25Z
      DOI: 10.1177/09622802211046390
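
      A minimal base-R sketch of the paper's starting point, that Kendall's tau can differ by treatment arm: tau between two (here uncensored) endpoints is estimated separately in each arm. The proposed joint frailty-conditional copula model and condCox.reg() additionally handle censoring and trial-level clustering, which this simulated example ignores.

        set.seed(4)
        n <- 400
        arm <- rep(0:1, each = n / 2)
        pfs <- rexp(n, 0.1)
        os  <- pfs + rexp(n, rate = ifelse(arm == 1, 0.05, 0.5))  # noisier OS in arm 1

        by(data.frame(pfs, os), arm,
           function(d) cor(d$pfs, d$os, method = "kendall"))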
       
  • Developing clinical prediction models when adhering to minimum sample size
           recommendations: The importance of quantifying bootstrap variability in
           tuning parameters and predictive performance

      Authors: Glen P Martin, Richard D Riley, Gary S Collins, Matthew Sperrin
      Abstract: Statistical Methods in Medical Research, Ahead of Print.
      Recent minimum sample size formulae (Riley et al.) for developing clinical prediction models help ensure that development datasets are of sufficient size to minimise overfitting. While these criteria are known to avoid excessive overfitting on average, the extent of variability in overfitting at recommended sample sizes is unknown. We investigated this through a simulation study and an empirical example, developing logistic regression clinical prediction models using unpenalised maximum likelihood estimation and various post-estimation shrinkage or penalisation methods. While the mean calibration slope was close to the ideal value of one for all methods, penalisation further reduced the level of overfitting, on average, compared to unpenalised methods. This came at the cost of higher variability in predictive performance for penalisation methods in external data. We recommend that penalisation methods be used in data that meet, or surpass, minimum sample size requirements to further mitigate overfitting, and that the variability in predictive performance and in any tuning parameters should always be examined as part of the model development process, since this provides additional information over average (optimism-adjusted) performance alone. Lower variability would give reassurance that the developed clinical prediction model will perform well in new individuals from the same population as was used for model development.
      Citation: Statistical Methods in Medical Research
      PubDate: 2021-10-08T03:51:39Z
      DOI: 10.1177/09622802211046388
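
      A minimal base-R sketch of the recommended check: quantify the bootstrap variability (not just the average) of a performance measure, here the calibration slope of an unpenalised logistic model refitted on bootstrap resamples and evaluated on the original data. The sample size and predictors are illustrative assumptions.

        set.seed(5)
        n <- 300
        X <- matrix(rnorm(n * 5), n, 5); colnames(X) <- paste0("x", 1:5)
        y <- rbinom(n, 1, plogis(X %*% c(0.8, -0.5, 0.4, 0, 0)))
        dat <- data.frame(y = y, X)

        cal_slope <- replicate(200, {
          bs  <- dat[sample(n, replace = TRUE), ]
          fit <- glm(y ~ ., data = bs, family = binomial)
          lp  <- predict(fit, newdata = dat)           # linear predictor on original data
          coef(glm(y ~ lp, family = binomial))[2]      # calibration slope
        })
        quantile(cal_slope, c(0.025, 0.5, 0.975))      # variability, not just the mean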
       
  • Novel empirical likelihood inference for the mean difference with
           right-censored data

      Authors: Kangni Alemdjrodo, Yichuan Zhao
      Abstract: Statistical Methods in Medical Research, Ahead of Print.
      This paper focuses on comparing two means and finding a confidence interval for the difference of two means with right-censored data using the empirical likelihood method combined with the independent and identically distributed random functions representation. In the literature, some early researchers proposed empirical likelihood-based confidence intervals for the mean difference based on right-censored data using the synthetic data approach. However, their empirical log-likelihood ratio statistic has a scaled chi-squared distribution. To avoid estimating the scale parameter when constructing confidence intervals, we propose an empirical likelihood method based on the independent and identically distributed representation of the Kaplan–Meier weights involved in the empirical likelihood ratio, and obtain the standard chi-squared limiting distribution. We also apply the adjusted empirical likelihood to improve coverage accuracy for small samples. In addition, we investigate a new empirical likelihood method, the mean empirical likelihood, within the framework of our study. The performances of all the empirical likelihood methods are compared via extensive simulations. The proposed empirical likelihood-based confidence interval has better coverage accuracy than those from existing methods. Finally, our findings are illustrated with a real data set.
      Citation: Statistical Methods in Medical Research
      PubDate: 2021-10-08T01:51:20Z
      DOI: 10.1177/09622802211041767
       
  • Evaluations of the sum-score-based and item response theory-based tests of
           group mean differences under various simulation conditions

      Authors: Mian Wang, Bryce B. Reeve
      Abstract: Statistical Methods in Medical Research, Ahead of Print.
      The use of patient-reported outcome measures is gaining popularity in clinical trials for comparing patient groups. Such comparisons typically focus on differences in group means and are carried out using either a traditional sum-score-based approach or item response theory (IRT)-based approaches. Several simulation studies have evaluated different group mean comparison approaches in the past, but the performance of these approaches remained unknown under certain uninvestigated conditions (e.g. under the impact of differential item functioning (DIF)). By incorporating some of these uninvestigated simulation features, the current study examines the Type I error, statistical power, and effect size estimation accuracy associated with group mean comparisons using simple sum scores, IRT model likelihood ratio tests, and IRT expected-a-posteriori scores. Manipulated features include sample size per group, number of items, number of response categories, strength of discrimination parameters, location of thresholds, impact of DIF, and presence of missing data. Results are summarized and visualized using decision trees.
      Citation: Statistical Methods in Medical Research
      PubDate: 2021-10-07T03:11:39Z
      DOI: 10.1177/09622802211043263
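
      A rough R sketch of the two comparison approaches the abstract contrasts, using binary 2PL items for simplicity (the study also covers polytomous items, DIF, and missing data): a t-test on sum scores versus a t-test on expected-a-posteriori scores from the mirt package. All simulation settings below are assumptions for illustration.

        library(mirt)

        set.seed(6)
        n <- 500; k <- 10
        grp   <- rep(1:2, each = n / 2)
        theta <- rnorm(n) + 0.3 * (grp == 2)                  # true group difference
        a <- runif(k, 1, 2); b <- rnorm(k)
        resp <- sapply(1:k, function(j) rbinom(n, 1, plogis(a[j] * (theta - b[j]))))

        t.test(rowSums(resp) ~ grp)                           # sum-score-based comparison

        fit <- mirt(data.frame(resp), model = 1, itemtype = "2PL", verbose = FALSE)
        eap <- fscores(fit, method = "EAP")[, 1]
        t.test(eap ~ grp)                                     # IRT-based comparison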
       
  • A multi-state Markov model using notification data to estimate HIV
           incidence, number of undiagnosed individuals living with HIV, and delay
           between infection and diagnosis: Illustration in France, 2008–2018

      Authors: Charlotte Castel, Cecile Sommen, Yann Le Strat, Ahmadou Alioum
      Abstract: Statistical Methods in Medical Research, Ahead of Print.
      Thirty-five years after the discovery of the human immunodeficiency virus (HIV), the epidemic is still ongoing in France. To guide HIV prevention strategies and monitor their impact, it is essential to understand the dynamics of the HIV epidemic. The key indicator for tracking new infections is HIV incidence. Given that HIV is mainly transmitted by undiagnosed individuals and that earlier treatment leads to less HIV transmission, it is essential to know the number of infected people unaware of their HIV-positive status as well as the time between infection and diagnosis. Our approach is based on a non-homogeneous multi-state Markov model describing the progression of HIV disease. We propose a penalized likelihood approach to estimate the HIV incidence curve as well as the diagnosis rates. The HIV incidence curve was approximated using cubic M-splines, while an approximation of the cross-validation criterion was used to estimate the smoothing parameter. In a simulation study, we evaluate the performance of the model for reconstructing the HIV incidence curve and diagnosis rates. The method is illustrated in the population of men who have sex with men using HIV surveillance data collected by the French Institute for Public Health Surveillance since 2004.
      Citation: Statistical Methods in Medical Research
      PubDate: 2021-10-04T08:00:30Z
      DOI: 10.1177/09622802211032697
       
  • Semiparametric estimation for nonparametric frailty models using
           nonparametric maximum likelihood approach

      Authors: Chew-Seng Chee, Il Do Ha, Byungtae Seo, Youngjo Lee
      Abstract: Statistical Methods in Medical Research, Ahead of Print.
      A consequence of using a parametric frailty model with nonparametric baseline hazard for analyzing clustered time-to-event data is that its regression coefficient estimates could be sensitive to the underlying frailty distribution. Recently, there has been a proposal for specifying both the baseline hazard and the frailty distribution nonparametrically, and estimating the unknown parameters by the maximum penalized likelihood method. Instead, in this paper, we propose the nonparametric maximum likelihood method for a general class of nonparametric frailty models, i.e. models where the frailty distribution is completely unspecified but the baseline hazard can be either parametric or nonparametric. The implementation of the estimation procedure can be based on a combination of either the Broyden–Fletcher–Goldfarb–Shanno or expectation-maximization algorithm and the constrained Newton algorithm with multiple support point inclusion. Simulation studies to investigate the performance of estimation of a regression coefficient by several different model-fitting methods were conducted. The simulation results show that our proposed regression coefficient estimator generally gives a reasonable bias reduction when the number of clusters is increased under various frailty distributions. Our proposed method is also illustrated with two data examples.
      Citation: Statistical Methods in Medical Research
      PubDate: 2021-09-27T11:30:54Z
      DOI: 10.1177/09622802211037072
       
  • A permutation test for assessing the presence of individual differences in
           treatment effects

      Authors: Chi Chang, Thomas Jaki, Muhammad Saad Sadiq, Alena Kuhlemeier, Daniel Feaster, Natalie Cole, Andrea Lamont, Daniel Oberski, Yasin Desai, M. Lee Van Horn
      Abstract: Statistical Methods in Medical Research, Ahead of Print.
      An important goal of personalized medicine is to identify heterogeneity in treatment effects and then use that heterogeneity to target the intervention to those most likely to benefit. Heterogeneity is assessed using the predicted individual treatment effects framework, and a permutation test is proposed to establish whether significant heterogeneity is present given the covariates and the predictive model or algorithm used for predicted individual treatment effects. We first show evidence for heterogeneity in the effects of treatment across an illustrative example data set. We then use simulations with two different predictive methods (a linear regression model and random forests) to show that the permutation test has adequate type-I error control. Next, we use an example dataset as the basis for simulations to demonstrate the ability of the permutation test to find heterogeneity in treatment effects for a predicted individual treatment effects estimate as a function of both effect size and sample size. We find that the proposed test has good power for detecting heterogeneity in treatment effects both when the heterogeneity is due primarily to a single predictor and when it is spread across the predictors. Power is greater for predictions from a linear model than from random forests. This non-parametric permutation test can be used to test for significant differences across individuals in predicted individual treatment effects obtained with a given set of covariates using any predictive method, with no additional assumptions.
      Citation: Statistical Methods in Medical Research
      PubDate: 2021-09-27T05:19:03Z
      DOI: 10.1177/09622802211033640
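
      A schematic base-R version of such a permutation test (the paper's procedure differs in details such as the predictive algorithm and the test statistic): the variance of predicted individual treatment effects from a linear model with treatment-covariate interactions is compared against its distribution under permuted treatment labels. The data-generating model is an assumption for illustration.

        set.seed(7)
        n <- 400
        x <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
        tr <- rbinom(n, 1, 0.5)
        y  <- 0.5 * tr + 0.6 * tr * x$x1 + x$x2 + rnorm(n)   # heterogeneity via x1
        d  <- data.frame(y, tr, x)

        pite_var <- function(d) {
          fit <- lm(y ~ tr * (x1 + x2 + x3), data = d)
          var(predict(fit, transform(d, tr = 1)) - predict(fit, transform(d, tr = 0)))
        }

        obs  <- pite_var(d)
        null <- replicate(500, pite_var(transform(d, tr = sample(tr))))
        mean(null >= obs)                                    # permutation p-value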
       
  • Inference for the treatment effect in longitudinal cluster randomized
           trials when treatment effect heterogeneity is ignored

      Authors: Rhys Bowden, Andrew B Forbes, Jessica Kasza
      Abstract: Statistical Methods in Medical Research, Ahead of Print.
      In cluster-randomized trials, the effect of the intervention being studied sometimes differs between clusters, commonly referred to as treatment effect heterogeneity. In the analysis of stepped wedge and cluster-randomized crossover trials, it is possible to include terms in outcome regression models to allow for such treatment effect heterogeneity, yet this is not frequently done. Outside of some simulation studies of specific cases where the outcome is binary, the impact of failing to include terms for treatment effect heterogeneity on the variance of the treatment effect estimator is unknown. We analytically examine this impact when outcomes are continuous. Using analysis of variance and feasible generalized least squares, we provide expressions for this variance. For both the cluster-randomized crossover design and the stepped wedge design, our analytic derivations indicate that failing to include treatment effect heterogeneity results in estimates of the variance of the treatment effect that are too small, leading to inflation of type I error rates. We therefore recommend assessing the sensitivity of sample size calculations, and of conclusions drawn from the analysis of cluster randomized trials, to the inclusion of treatment effect heterogeneity.
      Citation: Statistical Methods in Medical Research
      PubDate: 2021-09-27T01:29:07Z
      DOI: 10.1177/09622802211041754
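
      A minimal R sketch (with lme4) of the modelling choice at issue: adding a cluster-level random treatment effect rather than assuming a common effect. The stepped-wedge rollout rule, effect sizes, and variable names below are all illustrative assumptions.

        library(lme4)

        set.seed(8)
        K <- 20; m <- 30                                   # clusters, subjects per cluster-period
        dat <- expand.grid(cluster = 1:K, period = 1:4, id = 1:m)
        dat$treat <- as.integer(dat$period > dat$cluster %% 4 + 1)   # crude rollout
        u  <- rnorm(K, 0, 0.5)                             # cluster intercepts
        te <- rnorm(K, 1, 0.4)                             # cluster-specific treatment effects
        dat$y <- u[dat$cluster] + 0.2 * dat$period +
                 te[dat$cluster] * dat$treat + rnorm(nrow(dat))

        m0 <- lmer(y ~ treat + factor(period) + (1 | cluster), data = dat)          # heterogeneity ignored
        m1 <- lmer(y ~ treat + factor(period) + (1 + treat | cluster), data = dat)  # heterogeneity allowed
        anova(m0, m1)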
       
  • Commentary on the use of the reproduction number R during the COVID-19
           pandemic

      Authors: Carolin Vegvari, Sam Abbott, Frank Ball, Ellen Brooks-Pollock, Robert Challen, Benjamin S Collyer, Ciara Dangerfield, Julia R Gog, Katelyn M Gostic, Jane M Heffernan, T Déirdre Hollingsworth, Valerie Isham, Eben Kenah, Denis Mollison, Jasmina Panovska-Griffiths, Lorenzo Pellis, Michael G Roberts, Gianpaolo Scalia Tomba, Robin N Thompson, Pieter Trapman
      Abstract: Statistical Methods in Medical Research, Ahead of Print.
      Since the beginning of the COVID-19 pandemic, the reproduction number R has become a popular epidemiological metric used to communicate the state of the epidemic. At its most basic, R is defined as the average number of secondary infections caused by one primary infected individual. R seems convenient, because the epidemic is expanding if R > 1 and contracting if R < 1. The magnitude of R indicates by how much transmission needs to be reduced to control the epidemic. Using R in a naïve way can cause new problems. The reasons for this are threefold: (1) there is not just one definition of R but many, and the precise definition of R affects both its estimated value and how it should be interpreted; (2) even with a particular clearly defined R, there may be different statistical methods used to estimate its value, and the choice of method will affect the estimate; (3) the availability and type of data used to estimate R vary, and it is not always clear what data should be included in the estimation. In this review, we discuss when R is useful, when it may be of use but needs to be interpreted with care, and when it may be an inappropriate indicator of the progress of the epidemic. We also argue that careful definition of R, and of the data and methods used to estimate it, can make R a more useful metric for future management of the epidemic.
      Citation: Statistical Methods in Medical Research
      PubDate: 2021-09-27T01:28:07Z
      DOI: 10.1177/09622802211037079
       
  • Optimal allocation to treatments in a sequential multiple assignment
           randomized trial

      Authors: Andrea Morciano, Mirjam Moerbeek
      Abstract: Statistical Methods in Medical Research, Ahead of Print.
      One of the main questions in the design of a trial is how many subjects should be assigned to each treatment condition. Previous research has shown that equal randomization is not necessarily the best choice. We study the optimal allocation for a novel trial design, the sequential multiple assignment randomized trial, where subjects receive a sequence of treatments across various stages. A subject's randomization probabilities to treatments in the next stage depend on whether he or she responded to treatment in the current stage. We consider a prototypical sequential multiple assignment randomized trial design with two stages. Within such a design, many pairwise comparisons of treatment sequences can be made, and a multiple-objective optimal design strategy is proposed to consider all such comparisons simultaneously. The optimal design is sought under either a fixed total sample size or a fixed budget. A Shiny App is made available to find the optimal allocations and to evaluate the efficiency of competing designs. As the optimal design depends on the response rates to first-stage treatments, maximin optimal design methodology is used to find robust optimal designs. The proposed methodology is illustrated using a sequential multiple assignment randomized trial example on weight loss management.
      Citation: Statistical Methods in Medical Research
      PubDate: 2021-09-23T02:03:39Z
      DOI: 10.1177/09622802211037066
       
  • Flexible extension of the accelerated failure time model to account for
           nonlinear and time-dependent effects of covariates on the hazard

      Authors: Menglan Pang, Robert W Platt, Tibor Schuster, Michal Abrahamowicz
      Abstract: Statistical Methods in Medical Research, Ahead of Print.
      The accelerated failure time model is an alternative to the Cox proportional hazards model in survival analysis. However, conclusions regarding the associations of prognostic factors with event times are valid only if the underlying modeling assumptions are met. In contrast to several flexible methods for relaxing the proportional hazards and linearity assumptions in the Cox model, formal investigation of the constant-over-time time ratio and linearity assumptions in the accelerated failure time model has been limited. Yet, in practice, prognostic factors may have time-dependent and/or nonlinear effects. Furthermore, parametric accelerated failure time models require correct specification of the baseline hazard function, which is treated as a nuisance parameter in the Cox proportional hazards model, and is rarely known in practice. To address these challenges, we propose a flexible extension of the accelerated failure time model where unpenalized regression B-splines are used to model (i) the baseline hazard function of arbitrary shape, (ii) the time-dependent covariate effects on the hazard, and (iii) nonlinear effects for continuous covariates. Simulations evaluate the accuracy of the time-dependent and/or nonlinear estimates, and of the resulting survival functions, in multivariable settings. The proposed flexible extension of the accelerated failure time model is applied to re-assess the effects of prognostic factors on mortality after septic shock.
      Citation: Statistical Methods in Medical Research
      PubDate: 2021-09-22T03:22:16Z
      DOI: 10.1177/09622802211041759
       
  • Correlation-based joint feature screening for semi-competing risks
           outcomes with application to breast cancer data

      Authors: Mengjiao Peng, Liming Xiang
      Abstract: Statistical Methods in Medical Research, Ahead of Print.
      Ultrahigh-dimensional gene features are often collected in modern cancer studies, in which the number of gene features p is much larger than the sample size n. While gene expression patterns have been shown to be related to patients’ survival in microarray-based gene expression studies, one has to deal with the challenges of ultrahigh-dimensional genetic predictors for survival prediction and genetic understanding of the disease in precision medicine. The problem becomes more complicated when two types of survival endpoints, distant metastasis-free survival and overall survival, are of interest in the study, and outcome data can be subject to semi-competing risks because distant metastasis-free survival may be censored by overall survival but not vice versa. Our focus in this paper is to extract important features that have great impact on both distant metastasis-free survival and overall survival jointly, from massive gene expression data in the semi-competing risks setting. We propose a model-free screening method based on ranking the correlation between gene features and the joint survival function of the two endpoints. The method accounts for the relationship between the two endpoints through a simply defined utility measure that is easy to understand and calculate. We show its favorable theoretical properties, such as sure screening and ranking consistency, and evaluate its finite sample performance through extensive simulation studies. Finally, an application to classifying breast cancer data clearly demonstrates the utility of the proposed method in practice.
      Citation: Statistical Methods in Medical Research
      PubDate: 2021-09-14T09:40:12Z
      DOI: 10.1177/09622802211037071
       
  • CauchyCP: A powerful test under non-proportional hazards using Cauchy
           combination of change-point Cox regressions

      Authors: Hong Zhang, Qing Li, Devan V Mehrotra, Judong Shen
      Abstract: Statistical Methods in Medical Research, Ahead of Print.
      Non-proportional hazards data are routinely encountered in randomized clinical trials. In such cases, the classic Cox proportional hazards model can suffer from severe power loss, and interpretation of the estimated hazard ratio is difficult since the treatment effect varies over time. We propose CauchyCP, an omnibus test of change-point Cox regression models, to overcome both challenges while detecting signals of non-proportional hazards patterns. Extensive simulation studies demonstrate that, compared to existing treatment comparison tests under non-proportional hazards, the proposed CauchyCP test (a) controls the type I error better at small significance levels; (b) increases the power of detecting time-varying effects; and (c) is more computationally efficient than popular methods like MaxCombo for large-scale data analysis. The superior performance of CauchyCP is further illustrated using retrospective analyses of two randomized clinical trial datasets and a pharmacogenetic biomarker study dataset. The R package CauchyCP is publicly available on CRAN.
      Citation: Statistical Methods in Medical Research
      PubDate: 2021-09-14T05:13:48Z
      DOI: 10.1177/09622802211037076
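
      A minimal base-R sketch of the Cauchy combination step named in the title: p-values from several candidate change-point models are transformed to Cauchy variates and averaged, giving a combined p-value that remains valid under dependence. The change-point Cox fits themselves (and the CauchyCP package interface) are not reproduced here, and the example p-values are assumed.

        cauchy_combine <- function(p, w = rep(1 / length(p), length(p))) {
          stat <- sum(w * tan((0.5 - p) * pi))   # Cauchy combination statistic
          pcauchy(stat, lower.tail = FALSE)      # combined p-value
        }

        cauchy_combine(c(0.04, 0.20, 0.65))      # e.g. p-values from three change points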
       
  • Sample sizes for cluster-randomised trials with continuous outcomes

      Authors: Jen Lewis, Steven A Julious
      Abstract: Statistical Methods in Medical Research, Ahead of Print.
      Sample size calculations for cluster-randomised trials require inclusion of an inflation factor taking into account the intra-cluster correlation coefficient. Often, estimates of the intra-cluster correlation coefficient are taken from pilot trials, which are known to have uncertainty about their estimation. Given that the value of the intra-cluster correlation coefficient has a considerable influence on the calculated sample size for a main trial, the uncertainty in the estimate can have a large impact on the ultimate sample size and consequently, the power of a main trial. As such, it is important to account for the uncertainty in the estimate of the intra-cluster correlation coefficient. While a commonly adopted approach is to utilise the upper confidence limit in the sample size calculation, this is a largely inefficient method which can result in overpowered main trials. In this paper, we present a method of estimating the sample size for a main cluster-randomised trial with a continuous outcome, using numerical methods to account for the uncertainty in the intra-cluster correlation coefficient estimate. Despite limitations with this initial study, the findings and recommendations in this paper can help to improve sample size estimations for cluster randomised controlled trials by accounting for uncertainty in the estimate of the intra-cluster correlation coefficient. We recommend this approach be applied to all trials where there is uncertainty in the intra-cluster correlation coefficient estimate, in conjunction with additional sources of information to guide the estimation of the intra-cluster correlation coefficient.
      Citation: Statistical Methods in Medical Research
      PubDate: 2021-09-03T02:48:29Z
      DOI: 10.1177/09622802211037073
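
      A minimal base-R sketch of the underlying problem: the design effect 1 + (m - 1) * ICC inflates the individually randomised sample size, so uncertainty in a pilot ICC estimate propagates into the main-trial size. Here that uncertainty is explored by crude simulation; the paper's numerical method is more refined, and the pilot ICC distribution below is an assumption for illustration.

        n_flat <- 2 * (qnorm(0.975) + qnorm(0.9))^2 / 0.3^2       # per arm, standardised effect 0.3
        m <- 20                                                   # cluster size

        set.seed(10)
        icc <- pmax(rnorm(10000, mean = 0.05, sd = 0.02), 0)      # assumed pilot uncertainty
        n_inflated <- n_flat * (1 + (m - 1) * icc)                # per-arm sizes across draws
        quantile(n_inflated, c(0.5, 0.8, 0.95))                   # spread of required sizes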
       
  • Augmenting contact matrices with time-use data for fine-grained
           intervention modelling of disease dynamics: A modelling analysis

      Authors: Edwin van Leeuwen, Frank Sandmann
      Abstract: Statistical Methods in Medical Research, Ahead of Print.
      Social distancing is an important public health intervention to reduce or interrupt the sustained community transmission of emerging infectious pathogens, such as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) during the coronavirus disease 2019 pandemic. Contact matrices are typically used when evaluating such public health interventions to account for the heterogeneity in social mixing of individuals, but the surveys used to obtain the number of contacts often lack detailed information on the time individuals spend on daily activities. The present work addresses this problem by combining the large-scale empirical data of a social contact survey and a time-use survey to estimate contact matrices by age group (0–15, 16–24, 25–44, 45–64, 65+ years) and daily activity (work, schooling, transportation, and four leisure activities: social visits, bar/cafe/restaurant visits, park visits, and non-essential shopping). This augmentation allows exploring the impact of fewer contacts when individuals reduce the time they spend on selected daily activities, as well as when such restrictions are lifted again. For illustration, the derived matrices were applied to an age-structured dynamic-transmission model of coronavirus disease 2019. Findings show how contact matrices can be successfully augmented with time-use data to inform the relative reductions in contacts by activity, which allows for more fine-grained mixing patterns and infectious disease modelling.
      Citation: Statistical Methods in Medical Research
      PubDate: 2021-09-01T02:28:51Z
      DOI: 10.1177/09622802211037078
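
      A minimal base-R sketch of the augmentation arithmetic: activity-specific contact matrices are re-weighted by the proportion of time (and hence contacts) retained for each activity under an intervention, then summed into an overall mixing matrix. The 2x2 matrices and retention factors are purely illustrative assumptions.

        C_work    <- matrix(c(3, 1, 1, 2), 2, 2)
        C_leisure <- matrix(c(4, 2, 2, 5), 2, 2)
        C_school  <- matrix(c(6, 1, 1, 0), 2, 2)

        retain <- c(work = 0.5, leisure = 0.2, school = 1.0)   # e.g. partial closures
        C_total <- retain["work"] * C_work +
                   retain["leisure"] * C_leisure +
                   retain["school"] * C_school
        C_total                                                # intervention-specific mixing matrix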
       
  • A survival mediation model with Bayesian model averaging

      Authors: Jie Zhou, Xun Jiang, Hong Amy Xia, Peng Wei, Brian P Hobbs
      Abstract: Statistical Methods in Medical Research, Ahead of Print.
      Determining the extent to which a patient is benefiting from cancer therapy is challenging. Criteria for quantifying the extent of “tumor response” observed within a few cycles of treatment have been established for various solid tumors as well as hematologic malignancies. These measures comprise the primary endpoints of phase II trials. Regulatory approvals of new cancer therapies, however, are usually contingent upon the demonstration of superior overall survival, with randomized evidence acquired in a phase III trial comparing the novel therapy to an appropriate standard-of-care treatment. With nearly two-thirds of phase III oncology trials failing to achieve statistically significant results, researchers continue to refine and propose new surrogate endpoints. This article presents a Bayesian framework for studying relationships among treatment, patient subgroups, tumor response, and survival. Combining classical components of a mediation analysis with Bayesian model averaging, the methodology is robust to model misspecification among various possible relationships among the observable entities. Posterior inference is demonstrated via an application to a randomized controlled phase III trial in metastatic colorectal cancer. Moreover, the article details posterior predictive distributions of survival and statistical metrics for quantifying the extent of direct and indirect (tumor-response-mediated) treatment effects.
      Citation: Statistical Methods in Medical Research
      PubDate: 2021-08-27T01:41:43Z
      DOI: 10.1177/09622802211037069
       
  • Variable selection for a mark-specific additive hazards model using the
           adaptive LASSO

      Authors: Dongxiao Han, Lianqiang Qu, Liuquan Sun, Yanqing Sun
      First page: 2017
      Abstract: Statistical Methods in Medical Research, Ahead of Print.
      In HIV vaccine efficacy trials, mark-specific hazards models have important applications and can be used to evaluate the strain-specific vaccine efficacy. Additive hazards models have been widely used in practice, especially when continuous covariates are present. In this article, we conduct variable selection for a mark-specific additive hazards model. The proposed method is based on an estimating equation with the first derivative of the adaptive LASSO penalty function. The asymptotic properties of the resulting estimators are established. The finite sample behavior of the proposed estimators is evaluated through simulation studies, and an application to a dataset from the first HIV vaccine efficacy trial is provided.
      Citation: Statistical Methods in Medical Research
      PubDate: 2021-07-16T04:02:12Z
      DOI: 10.1177/09622802211023957
       
  • The change in estimate method for selecting confounders: A simulation
           study

      Authors: Denis Talbot, Awa Diop, Mathilde Lavigne-Robichaud, Chantal Brisson
      First page: 2032
      Abstract: Statistical Methods in Medical Research, Ahead of Print.
      Background: The change in estimate is a popular approach for selecting confounders in epidemiology. It is recommended in epidemiologic textbooks and articles over significance testing of coefficients, but concerns have been raised about its validity. Few simulation studies have been conducted to investigate its performance. Methods: An extensive simulation study was conducted to compare different implementations of the change in estimate method. The implementations were also compared when estimating the association of body mass index with diastolic blood pressure in the PROspective Québec Study on Work and Health. Results: All methods were susceptible to introducing important bias and to producing confidence intervals that included the true effect much less often than expected in at least some scenarios. Overall, mixed results were obtained regarding the accuracy of estimators, as measured by the mean squared error. No implementation adequately differentiated confounders from non-confounders. In the real data analysis, none of the implementations decreased the estimated standard error. Conclusion: Based on these results, it is questionable whether change in estimate methods are beneficial in general, considering their low ability to improve the precision of estimates without introducing bias and their inability to yield valid confidence intervals or to identify true confounders.
      Citation: Statistical Methods in Medical Research
      PubDate: 2021-08-09T09:32:20Z
      DOI: 10.1177/09622802211034219
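
      A minimal base-R sketch of the basic change-in-estimate rule the paper evaluates: a candidate covariate is retained as a confounder if dropping it changes the exposure coefficient by more than some cut-off (often 10%). The data are simulated assumptions; the paper compares many implementations of this idea.

        set.seed(12)
        n <- 1000
        c1 <- rnorm(n)                       # true confounder
        c2 <- rnorm(n)                       # non-confounder
        x  <- 0.5 * c1 + rnorm(n)            # exposure
        y  <- 0.3 * x + 0.7 * c1 + rnorm(n)  # outcome

        full <- coef(lm(y ~ x + c1 + c2))["x"]
        for (v in c("c1", "c2")) {
          reduced <- coef(lm(reformulate(setdiff(c("x", "c1", "c2"), v), "y")))["x"]
          cat(v, ": relative change in exposure coefficient =",
              round(abs(reduced - full) / abs(full), 3), "\n")
        }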
       
  • Validity of a method for identifying disease subtypes that are
           etiologically heterogeneous

      Authors: Emily C Zabor, Venkatraman E Seshan, Shuang Wang, Colin B Begg
      First page: 2045
      Abstract: Statistical Methods in Medical Research, Ahead of Print.
      A focus of cancer epidemiologic research has become the identification of risk factors that influence specific subtypes of disease, a phenomenon known as etiologic heterogeneity. In previous work we developed a novel strategy to cluster tumor markers and identify disease subtypes that differ maximally with respect to known risk factors for use in the context of case-control studies. The method relies on the premise that unsupervised k-means clustering will find candidate solutions that are closely aligned with the sought-after etiologically distinct clusters, which may not be true in the presence of clusters of tumor markers that are not related to risk of disease. In this article, we investigate in detail the ability of the method to identify the “true” clusters in the presence of clusters that are unrelated to risk factors, what we term “counterfeit” clusters. We find that our method works when the strength of structure is larger in the clusters that truly represent etiologic heterogeneity than in the counterfeit clusters, but when this condition is not met, or when there are many tumor markers that simply represent noise, the method will not find the correct solution without first performing variable selection to identify the tumor markers most strongly related to the risk factors. We illustrate the results using data from a breast cancer case-control study.
      Citation: Statistical Methods in Medical Research
      PubDate: 2021-07-28T05:16:06Z
      DOI: 10.1177/09622802211032704
       
  • Impact of unequal censoring and insufficient follow-up on comparing
           survival outcomes: Applications to clinical studies

      Authors: Deo Kumar Srivastava, E Olusegun George, Zhaohua Lu, Shesh N Rai
      First page: 2057
      Abstract: Statistical Methods in Medical Research, Ahead of Print.
      Clinical trials with survival endpoints are typically designed to enroll patients over a specified accrual period (usually 2–3 years), with an additional specified duration of follow-up (usually 2–3 years). Under this scheme, patients who are alive or free of the event of interest at the termination of the study are censored. Consequently, a patient may be censored due to insufficient follow-up duration or due to being lost to follow-up. Potentially, this process could lead to unequal censoring across the treatment arms and to inaccurate or misleading conclusions about treatment effects. In this article, using extensive simulation studies, we assess the impact of such censoring on statistical procedures (generalized log-rank tests) for comparing two survival distributions, and illustrate our observations by revisiting Mukherjee et al.’s findings of cardiovascular events in patients who took Rofecoxib (Vioxx).
      Citation: Statistical Methods in Medical Research
      PubDate: 2021-07-07T07:22:58Z
      DOI: 10.1177/09622802211017592
       
  • Improving the estimation of the COVID-19 effective reproduction number
           using nowcasting

      Authors: Joaquin Salas
      First page: 2075
      Abstract: Statistical Methods in Medical Research, Ahead of Print.
      As interactions between people increase, the impending menace of COVID-19 outbreaks materializes, and there is an inclination to apply lockdowns. In this context, it is essential to have easy-to-use indicators for people to employ as a reference. The effective reproduction number of confirmed positives, Rt, fulfills such a role. This paper proposes a data-driven approach to nowcast Rt based on the statistical behavior of previous observations. As more information arrives, the method naturally becomes more precise about the final count of confirmed positives. Our method’s strength is that it is based on the self-reported onset of symptoms, in contrast to other methods that use the daily reported count to infer this quantity. We show that our approach may be the foundation for determining useful epidemic-tracking indicators.
      Citation: Statistical Methods in Medical Research
      PubDate: 2021-05-06T05:38:02Z
      DOI: 10.1177/09622802211008939
       
  • Distance-based Classification and Regression Trees for the analysis of
           complex predictors in health and medical research

      Authors: Hannah Johns, Julie Bernhardt, Leonid Churilov
      First page: 2085
      Abstract: Statistical Methods in Medical Research, Ahead of Print.
      Predicting patient outcomes based on patient characteristics and care processes is a common task in medical research. Such predictive features are often multifaceted and complex, and are usually simplified into one or more scalar variables to facilitate statistical analysis. This process, while necessary, results in a loss of important clinical detail. While this loss may be prevented by using distance-based predictive methods which better represent complex healthcare features, the statistical literature on such methods is limited, and the range of tools facilitating distance-based analysis is substantially smaller than those of other methods. Consequently, medical researchers must choose to either reduce complex predictive features to scalar variables to facilitate analysis, or instead use a limited number of distance-based predictive methods which may not fulfil the needs of the analysis problem at hand. We address this limitation by developing a Distance-Based extension of Classification and Regression Trees (DB-CART) capable of making distance-based predictions of categorical, ordinal and numeric patient outcomes. We also demonstrate how this extension is compatible with other extensions to CART, including a recently published method for predicting care trajectories in chronic disease. We demonstrate DB-CART by using it to expand upon previously published dose–response analysis of stroke rehabilitation data. Our method identified additional detail not captured by the previously published analysis, reinforcing previous conclusions. We also demonstrate how by combining DB-CART with other extensions to CART, the method is capable of making predictions about complex, multifaceted outcome data based on complex, multifaceted predictive features.
      Citation: Statistical Methods in Medical Research
      PubDate: 2021-07-28T05:14:27Z
      DOI: 10.1177/09622802211032712
       
  • General regression methods for respondent-driven sampling data

      Authors: Mamadou Yauck, Erica EM Moodie, Herak Apelian, Alain Fourmigue, Daniel Grace, Trevor Hart, Gilles Lambert, Joseph Cox
      First page: 2105
      Abstract: Statistical Methods in Medical Research, Ahead of Print.
      Respondent-driven sampling is a variant of link-tracing sampling techniques that aim to recruit hard-to-reach populations by leveraging individuals’ social relationships. As such, a respondent-driven sample has a graphical component which represents a partially observed network of unknown structure. Moreover, it is common to observe homophily, or the tendency to form connections with individuals who share similar traits. Currently, there is a lack of principled guidance on multivariate modelling strategies for respondent-driven sampling to address peer effects driven by homophily and the dependence between observations within the network. In this work, we propose a methodology for general regression techniques using respondent-driven sampling data. This is used to study the socio-demographic predictors of HIV treatment optimism (about the value of antiretroviral therapy) among gay, bisexual and other men who have sex with men, recruited into a respondent-driven sampling study in Montreal, Canada.
      Citation: Statistical Methods in Medical Research
      PubDate: 2021-07-28T05:16:05Z
      DOI: 10.1177/09622802211032713
       
  • A novel model-checking approach for dose-response relationships

      Authors: Shunyao Wu, Xinmin Li, Yu Xia, Hua Liang
      First page: 2119
      Abstract: Statistical Methods in Medical Research, Ahead of Print.
      We propose a test for assessing nonlinear dose-response models based on a Cramér–von Mises statistic. We establish the asymptotic distribution of the test and demonstrate that the test can detect a local alternative converging to the null at the parametric rate n^(-1/2). We provide a bootstrap resampling technique to calculate the critical values. The test is observed to have good power performance in small sample sizes. We apply the proposed method to analyze 250 datasets from a pharmacologic study, and conduct two small simulation experiments to explore the numerical performance of the proposed test and compare it with one commonly used test in practice.
      Citation: Statistical Methods in Medical Research
      PubDate: 2021-07-28T05:14:28Z
      DOI: 10.1177/09622802211032695
       
  • Between-group comparison of area under the curve in clinical trials with
           censored follow-up: Application to HIV therapeutic vaccines

      Authors: Marie Alexandre, Mélanie Prague, Rodolphe Thiébaut
      First page: 2130
      Abstract: Statistical Methods in Medical Research, Ahead of Print.
      In clinical trials, longitudinal data are commonly analyzed and compared between groups using a single summary statistic such as the area under the outcome versus time curve (AUC). However, incomplete data, arising from censoring due to a limit of detection or from missing data, can bias these analyses. In this article, we present a statistical test based on a spline-based mixed model accounting for both the censoring and missingness mechanisms in the AUC estimation. Inferential properties of the proposed method were evaluated and compared to ad hoc approaches and to a non-parametric method through a simulation study based on a two-armed trial in which trajectories and the proportion of missing data were varied. Simulation results highlight that our approach has significant advantages over the other methods. A real working example from two HIV therapeutic vaccine trials is presented to illustrate the applicability of our approach.
      Citation: Statistical Methods in Medical Research
      PubDate: 2021-07-05T03:43:12Z
      DOI: 10.1177/09622802211023963
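
      A minimal base-R sketch of the summary statistic under study, the trapezoidal AUC for one subject with complete data; the paper's contribution is estimating and comparing such AUCs between groups when observations are censored or missing, which this ignores. Visit times and outcome values are assumed.

        auc_trapezoid <- function(t, y) sum(diff(t) * (head(y, -1) + tail(y, -1)) / 2)

        t <- c(0, 2, 4, 8, 12)               # visit times
        y <- c(5.0, 4.2, 3.6, 3.1, 2.8)      # outcome values
        auc_trapezoid(t, y)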
       
  • Testing for treatment effect in covariate-adaptive randomized trials with
           generalized linear models and omitted covariates

      Authors: Yang Li, Wei Ma, Yichen Qin, Feifang Hu
      First page: 2148
      Abstract: Statistical Methods in Medical Research, Ahead of Print.
      Concerns have been expressed over the validity of statistical inference under covariate-adaptive randomization despite its extensive use in clinical trials. In the literature, the inferential properties under covariate-adaptive randomization have been studied mainly for continuous responses; in particular, it is well known that the usual two-sample t-test for treatment effect is typically conservative. This phenomenon of invalid tests has also been found for generalized linear models that do not adjust for the covariates, and is sometimes more worrisome due to inflated Type I error. The purpose of this study is to examine the unadjusted test for treatment effect under generalized linear models and covariate-adaptive randomization. For a large class of covariate-adaptive randomization methods, we obtain the asymptotic distribution of the test statistic under the null hypothesis and derive the conditions under which the test is conservative, valid, or anti-conservative. Several commonly used generalized linear models, such as logistic regression and Poisson regression, are discussed in detail. An adjustment method is also proposed to achieve a valid size based on the asymptotic results. Numerical studies confirm the theoretical findings and demonstrate the effectiveness of the proposed adjustment method.
      Citation: Statistical Methods in Medical Research
      PubDate: 2021-04-26T10:36:05Z
      DOI: 10.1177/09622802211008206
       
  • A competing risks model with binary time varying covariates for estimation
           of breast cancer risks in BRCA1 families

      Authors: Yun-Hee Choi, Hae Jung, Saundra Buys, Mary Daly, Esther M John, John Hopper, Irene Andrulis, Mary Beth Terry, Laurent Briollais
      First page: 2165
      Abstract: Statistical Methods in Medical Research, Ahead of Print.
      Mammographic screening and prophylactic surgery such as risk-reducing salpingo-oophorectomy can potentially reduce breast cancer risks among mutation carriers of BRCA families. The evaluation of these interventions is usually complicated by the fact that their effects on breast cancer may change over time and by the presence of competing risks. We introduce a correlated competing risks model for breast and ovarian cancer risks within BRCA1 families that accounts for time-varying covariates. Different parametric forms for the effects of time-varying covariates are proposed for more flexibility, and a correlated gamma frailty model is specified to account for the correlated competing events. We also introduce a new ascertainment correction approach that accounts for the selection of families through probands affected with either breast or ovarian cancer, or unaffected. Our simulation studies demonstrate the good performance of our proposed approach in terms of bias and precision of the estimators of model parameters and cause-specific penetrances over different levels of familial correlation. We applied our new approach to 498 BRCA1 mutation carrier families recruited through the Breast Cancer Family Registry. Our results demonstrate the importance of the functional form of the time-varying covariate effect when assessing the role of risk-reducing salpingo-oophorectomy on breast cancer. In particular, under the best-fitting time-varying covariate model, the overall effect of risk-reducing salpingo-oophorectomy on breast cancer risk was statistically significant in women with a BRCA1 mutation.
      Citation: Statistical Methods in Medical Research
      PubDate: 2021-07-07T07:22:54Z
      DOI: 10.1177/09622802211008945
       
  • Estimation of required sample size for external validation of risk models
           for binary outcomes

      Authors: Menelaos Pavlou, Chen Qu, Rumana Z Omar, Shaun R Seaman, Ewout W Steyerberg, Ian R White, Gareth Ambler
      First page: 2187
      Abstract: Statistical Methods in Medical Research, Ahead of Print.
      Risk-prediction models for health outcomes are used in practice as part of clinical decision-making, and it is essential that their performance be externally validated. An important aspect in the design of a validation study is choosing an adequate sample size. In this paper, we investigate the sample size requirements for validation studies with binary outcomes to estimate measures of predictive performance (the C-statistic for discrimination, and the calibration slope and calibration-in-the-large). We aim for sufficient precision in the estimated measures. In addition, we investigate the sample size needed to achieve sufficient power to detect a difference from a target value. Under normality assumptions on the distribution of the linear predictor, we obtain simple estimators for sample size calculations based on the measures above. Simulation studies show that the estimators perform well for common values of the C-statistic and outcome prevalence when the linear predictor is marginally normal. Their performance deteriorates only slightly when the normality assumptions are violated. We also propose estimators that do not require normality assumptions but do require specification of the marginal distribution of the linear predictor and the use of numerical integration. These estimators were also seen to perform very well under marginal normality. Our sample size equations require a specified standard error (SE) and the anticipated C-statistic and outcome prevalence. The sample size requirement varies according to the prognostic strength of the model, outcome prevalence, choice of the performance measure, and study objective.
      Citation: Statistical Methods in Medical Research
      PubDate: 2021-04-21T12:55:31Z
      DOI: 10.1177/09622802211007522
       
  • A Bayesian group lasso classification for ADNI volumetrics data

      Authors: Atreyee Majumder, Tapabrata Maiti, Subha Datta
      First page: 2207
      Abstract: Statistical Methods in Medical Research, Ahead of Print.
      The primary objective of this paper is to develop a statistically valid classification procedure for analyzing brain image volumetrics data obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) in elderly subjects with cognitive impairments. The proposed Bayesian group lasso method for logistic regression efficiently selects an optimal model with the use of a spike-and-slab-type prior, selecting groups of attributes of a brain subregion as encouraged by the group lasso penalty. We conduct simulation studies for high- and low-dimensional scenarios in which our method consistently selects the parameters that are truly predictive among a large number of candidates. The method is then applied to dichotomous-response ADNI data, where it selects predictive atrophied brain regions and classifies Alzheimer’s disease patients against healthy controls. Our analysis achieves an accuracy of 80% for classifying Alzheimer’s disease. The suggested method selects 29 brain subregions; the medical literature indicates that all these regions are associated with Alzheimer’s disease. The Bayesian model selection further helps select only the subregions that are statistically significant, thus yielding an optimal model.
      Citation: Statistical Methods in Medical Research
      PubDate: 2021-08-30T07:38:55Z
      DOI: 10.1177/09622802211022404
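
      The Bayesian spike-and-slab machinery is beyond a short sketch, but the frequentist group lasso that motivates it can be illustrated with a simple proximal gradient (ISTA) loop for penalised logistic regression. This is a generic sketch, not the authors' sampler; all names are illustrative.

```python
import numpy as np

def group_soft_threshold(v, t):
    # Block soft-thresholding operator: shrinks a whole group toward zero,
    # setting it exactly to zero when its norm falls below the threshold.
    norm = np.linalg.norm(v)
    return np.zeros_like(v) if norm <= t else (1 - t / norm) * v

def group_lasso_logistic(X, y, groups, lam, lr=0.01, n_iter=5000):
    # Proximal gradient descent for group-lasso-penalised logistic regression.
    # `groups` assigns each column of X to a group id (e.g. a brain subregion).
    n, p = X.shape
    groups = np.asarray(groups)
    beta = np.zeros(p)
    for _ in range(n_iter):
        p_hat = 1.0 / (1.0 + np.exp(-X @ beta))
        beta -= lr * (X.T @ (p_hat - y) / n)      # gradient step on the log-loss
        for g in np.unique(groups):               # proximal step, group by group
            idx = groups == g
            beta[idx] = group_soft_threshold(beta[idx], lr * lam * np.sqrt(idx.sum()))
    return beta
```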
       
  • Adjusting for selection bias due to missing data in electronic health
           records-based research

      Authors: Sarah B Peskoe, David Arterburn, Karen J Coleman, Lisa J Herrinton, Michael J Daniels, Sebastien Haneuse
      First page: 2221
      Abstract: Statistical Methods in Medical Research, Ahead of Print.
      While electronic health records data provide unique opportunities for research, numerous methodological issues must be considered. Among these, selection bias due to incomplete/missing data has received far less attention than other issues. Unfortunately, standard missing data approaches (e.g. inverse-probability weighting and multiple imputation) generally fail to acknowledge the complex interplay of heterogeneous decisions made by patients, providers, and health systems that govern whether specific data elements in the electronic health records are observed. This, in turn, renders the missing-at-random assumption underpinning standard approaches difficult to believe. In the clinical literature, the collection of decisions that gives rise to the observed data is referred to as the data provenance. Building on a recently proposed framework for modularizing the data provenance, we develop a general and scalable framework for estimation and inference in regression models, based on inverse-probability weighting, that allows for a hierarchy of missingness mechanisms to better align with the complex nature of electronic health records data. We show that the proposed estimator is consistent and asymptotically Normal, derive the form of the asymptotic variance, and propose two consistent estimators of it. Simulations show that naïve application of standard methods may yield biased point estimates, that the proposed estimators have good small-sample properties, and that researchers may have to contend with a bias-variance trade-off as they consider how to handle missing data. The proposed methods are motivated by an ongoing, electronic health records-based study of bariatric surgery.
      Citation: Statistical Methods in Medical Research
      PubDate: 2021-08-27T05:30:50Z
      DOI: 10.1177/09622802211027601
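
      As a minimal sketch of the basic building block (a single missingness mechanism, not the paper's hierarchy): fit a logistic model for the probability that a record is observed given fully observed covariates, then reweight the complete cases by the inverse of that probability. Function and variable names are illustrative, assuming numpy arrays and a linear outcome model.

```python
import numpy as np
import statsmodels.api as sm

def ipw_fit(X, y, Z, observed):
    # X: outcome-model covariates, y: outcome, Z: fully observed covariates
    # driving missingness, observed: boolean array marking complete cases.
    ps_model = sm.Logit(observed.astype(int), sm.add_constant(Z)).fit(disp=0)
    ps = ps_model.predict(sm.add_constant(Z))       # P(observed | Z)
    w = 1.0 / ps[observed]                          # inverse-probability weights
    # Weighted least squares on the complete cases recovers the target
    # regression under the assumed missingness model.
    return sm.WLS(y[observed], sm.add_constant(X[observed]), weights=w).fit()
```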
       
  • Additive rates model for recurrent event data with intermittently observed
           time-dependent covariates

      Authors: Tianmeng Lyu, Xianghua Luo, Chiung-Yu Huang, Yifei Sun
      First page: 2239
      Abstract: Statistical Methods in Medical Research, Ahead of Print.
      Various regression methods have been proposed for analyzing recurrent event data. Among them, the semiparametric additive rates model is particularly appealing because its regression coefficients quantify the absolute difference in the occurrence rate of the recurrent events between groups. Estimation of the additive rates model requires that the values of time-dependent covariates be observed throughout the entire follow-up period. In practice, however, time-dependent covariates are usually measured only at intermittent follow-up visits. In this paper, we propose to kernel smooth functions involving the time-dependent covariates across subjects in the estimating function, as opposed to imputing individual covariate trajectories. Simulation studies show that the proposed method outperforms simple imputation methods. The proposed method is illustrated with data from an epidemiologic study of the effect of streptococcal infections on recurrent pharyngitis episodes.
      Citation: Statistical Methods in Medical Research
      PubDate: 2021-08-27T05:29:12Z
      DOI: 10.1177/09622802211027593
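
      A minimal sketch of the pooling idea, assuming a Gaussian kernel: intermittent covariate measurements from all subjects are smoothed around the target time, rather than imputing each subject's trajectory. This illustrates only the smoothing ingredient, not the paper's full estimating function.

```python
import numpy as np

def smoothed_covariate(t, obs_times, obs_values, h):
    # Nadaraya-Watson kernel estimate of the covariate process at time t,
    # pooling intermittent measurements across subjects; h is the bandwidth.
    u = (np.asarray(obs_times) - t) / h
    w = np.exp(-0.5 * u**2)                 # Gaussian kernel weights
    return np.sum(w * np.asarray(obs_values)) / np.sum(w)
```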
       
  • Dealing with missing information on covariates for excess mortality hazard
           regression models – Making the imputation model compatible with the
           substantive model

      Authors: Luís Antunes, Denisa Mendonça, Maria José Bento, Edmund Njeru Njagi, Aurélien Belot, Bernard Rachet
      First page: 2256
      Abstract: Statistical Methods in Medical Research, Ahead of Print.
      Missing data are a common issue in epidemiological databases. Among the ways of dealing with them, multiple imputation has become widely available in standard statistical software packages. However, incompatibility between the imputation and substantive models, which can arise when the associations between variables in the substantive model are not taken into account in the imputation models or when the substantive model is itself nonlinear, can lead to invalid inference. Aiming to analyse population-based cancer survival data, we extended the substantive model compatible fully conditional specification (SMC-FCS) multiple imputation approach, proposed by Bartlett et al. in 2015, to accommodate excess hazard regression models. The proposed approach was compared with the standard fully conditional specification multiple imputation procedure and with complete-case analysis in a simulation study. The SMC-FCS approach produced unbiased estimates in both scenarios tested, while the fully conditional specification produced biased estimates and poor empirical coverage probabilities. The SMC-FCS algorithm was then used to handle missing data in an evaluation of socioeconomic inequalities in survival among colorectal cancer patients diagnosed in the North Region of Portugal. The analysis using SMC-FCS showed a clearer trend of higher excess hazards for patients coming from more deprived areas. The proposed algorithm was implemented in R and is presented as Supplementary Material.
      Citation: Statistical Methods in Medical Research
      PubDate: 2021-09-02T06:03:04Z
      DOI: 10.1177/09622802211031615
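
      The core of SMC-FCS is a rejection step that keeps imputed covariate values compatible with the substantive model. The schematic below is a heavily simplified sketch of that single step, not the authors' full algorithm; `propose` and `outcome_loglik` are hypothetical stand-ins for the imputation-model proposal and the substantive (excess hazard) likelihood contribution, assumed normalized so that exp(outcome_loglik) <= 1.

```python
import numpy as np

def smcfcs_draw(propose, outcome_loglik, rng, max_iter=500):
    # Rejection sampling: draw a candidate for the missing covariate from the
    # imputation-model proposal, accept with probability proportional to the
    # substantive-model likelihood (assumed bounded above by 1).
    x = propose(rng)
    for _ in range(max_iter):
        if rng.uniform() < np.exp(outcome_loglik(x)):
            return x
        x = propose(rng)
    return x  # fall back to the last candidate if acceptance is rare
```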
       
  • A unified Bayesian framework for exact inference of area under the
           receiver operating characteristic curve

      Authors: Ruitao Lin, KC Gary Chan, Haolun Shi
      First page: 2269
      Abstract: Statistical Methods in Medical Research, Ahead of Print.
      The area under the receiver operating characteristic curve is a widely used measure for evaluating the performance of a diagnostic test. Common approaches to inference on the area under the curve are based on approximation; for example, normal-approximation inference tends to suffer from low accuracy at small sample sizes. Frequentist empirical likelihood approaches to estimation may perform better, but are usually implemented through approximation to reduce the computational burden, so the inference is still not exact. By contrast, we propose an exact inferential procedure by adapting the empirical likelihood into a Bayesian framework and drawing inference from posterior samples of the area under the curve obtained via a Gibbs sampler. The full conditional distributions within the Gibbs sampler involve only empirical likelihoods with linear constraints, which greatly simplifies the computation. To further enhance the applicability and flexibility of the Bayesian empirical likelihood, we extend our method to the estimation of the partial area under the curve, the comparison of multiple tests, and the doubly robust estimation of the area under the curve in the presence of missing test results. Simulation studies confirm the desirable performance of the proposed methods, and a real application is presented to illustrate their usefulness.
      Citation: Statistical Methods in Medical Research
      PubDate: 2021-09-01T02:10:49Z
      DOI: 10.1177/09622802211037070
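
      The Gibbs sampler itself is beyond a short sketch, but the quantity whose posterior the procedure targets is the familiar Mann–Whitney (empirical) estimate of the area under the curve, which can be computed directly; a minimal sketch:

```python
import numpy as np

def empirical_auc(cases, controls):
    # Mann-Whitney estimate of the AUC: P(case score > control score),
    # counting ties as one half.
    x = np.asarray(cases)[:, None]
    y = np.asarray(controls)[None, :]
    return (x > y).mean() + 0.5 * (x == y).mean()
```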
       
  • Assigning readers to cases in imaging studies using balanced incomplete
           block designs

      Authors: Erich P Huang, Joanna H Shih
      First page: 2288
      Abstract: Statistical Methods in Medical Research, Ahead of Print.
      In many imaging studies, each case is reviewed by human readers and characterized according to one or more features. Often, the inter-reader agreement of the feature indications is of interest in addition to their diagnostic accuracy or association with clinical outcomes. Complete designs, in which all participating readers review all cases, maximize efficiency and guarantee estimability of agreement metrics for all pairs of readers, but often impose a heavy reading burden. Assigning readers to cases using balanced incomplete block designs substantially reduces this burden by having each reader review only a subset of cases, while still maintaining estimability of inter-reader agreement for all pairs of readers. Methodology for data analysis and for power and sample size calculations under balanced incomplete block designs is presented and applied to simulation studies and an actual example. Simulation results suggest that such designs may reduce reading burdens by >40% while in most scenarios incurring a
      Citation: Statistical Methods in Medical Research
      PubDate: 2021-09-01T02:13:06Z
      DOI: 10.1177/09622802211037074
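
      One simple way to realise such a design is the "unreduced" balanced incomplete block design, in which every k-subset of readers forms a block. The sketch below (hypothetical function names) distributes cases over the blocks round-robin, so each reader reviews only a fraction of the cases while every reader pair still co-reads equally often, up to rounding.

```python
from itertools import combinations

def bibd_reader_assignment(n_readers, readers_per_case, case_ids):
    # Every k-subset of readers is a block; cases are assigned to blocks
    # in round-robin order.
    blocks = list(combinations(range(n_readers), readers_per_case))
    return {c: blocks[i % len(blocks)] for i, c in enumerate(case_ids)}

# e.g. bibd_reader_assignment(5, 3, range(30)): each reader reviews 18 of the
# 30 cases instead of all 30, and every reader pair co-reads 9 cases.
```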
       
  • Multiple imputation analysis for propensity score matching with missing
           causes of failure: An application to hepatocellular carcinoma data

      Authors: Seungbong Han, Kam-Wah Tsui, Hui Zhang, Gi-Ae Kim, Young-Suk Lim, Adin-Cristian Andrei
      First page: 2313
      Abstract: Statistical Methods in Medical Research, Ahead of Print.
      Propensity score matching is widely used to estimate the effects of treatments in observational studies. Competing risks survival data are common in medical research, yet there is a paucity of propensity score matching studies addressing competing risks with missing causes of failure. In this study, we provide guidelines for estimating the treatment effect on the cumulative incidence function when using propensity score matching on competing risks survival data with missing causes of failure. We examine the performance of different methods for imputing the missing causes, evaluate the gain from this imputation in an extensive simulation study, and apply the proposed imputation method to data from a study of the risk of hepatocellular carcinoma in patients with chronic hepatitis B and chronic hepatitis C.
      Citation: Statistical Methods in Medical Research
      PubDate: 2021-09-01T02:27:33Z
      DOI: 10.1177/09622802211037075
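
      A minimal sketch of the matching ingredient only (the cause imputation and cumulative incidence estimation of the paper are not shown): 1:1 nearest-neighbour matching on an estimated propensity score, with replacement. Names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

def ps_match(X, treated):
    # Estimate propensity scores with logistic regression, then match each
    # treated subject to the control with the closest score (with replacement).
    ps = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]
    t_idx = np.flatnonzero(treated == 1)
    c_idx = np.flatnonzero(treated == 0)
    nn = NearestNeighbors(n_neighbors=1).fit(ps[c_idx, None])
    _, match = nn.kneighbors(ps[t_idx, None])
    return t_idx, c_idx[match.ravel()]   # indices of treated and matched controls
```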
       
  • Bayesian approaches to the weighted kappa-like inter-rater agreement
           measures

      Authors: Quoc Duyet Tran, Haydar Demirhan, Anil Dolgun
      First page: 2329
      Abstract: Statistical Methods in Medical Research, Ahead of Print.
      Inter-rater agreement measures are used to estimate the degree of agreement between two or more assessors. When the agreement table is ordinal, weight functions that incorporate row and column scores are used along with the agreement measures, and the choice of row and column scores affects the estimated degree of agreement. The weighted measures are also prone to anomalies frequently seen in agreement tables, such as unbalanced table structures or grey zones arising from the assessment behaviour of the raters. In this study, Bayesian approaches for the estimation of inter-rater agreement measures are proposed. The Bayesian approaches make it possible to include prior information on the assessment behaviour of the raters and to impose order restrictions on the row and column scores. In this way, we improve the accuracy of the agreement measures and mitigate the impact of the anomalies on the estimated strength of agreement between the raters. The elicitation of prior distributions is described theoretically and practically for the Bayesian estimation of five agreement measures with three different weights, using an agreement table having two grey zones. A Monte Carlo simulation study is conducted to assess the classification accuracy of the Bayesian and classical approaches for the considered agreement measures at a given level of agreement. Recommendations for selecting the highest-performing combination of agreement measure and weight are made, broken down by table structure and sample size.
      Citation: Statistical Methods in Medical Research
      PubDate: 2021-08-27T01:38:43Z
      DOI: 10.1177/09622802211037068
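
      For reference, the classical (non-Bayesian) weighted kappa that these approaches generalise can be computed directly from the agreement table; the paper's priors and order restrictions are not reflected in this sketch.

```python
import numpy as np

def weighted_kappa(table, weights="quadratic"):
    # Weighted kappa for a square agreement table, with linear or quadratic
    # agreement weights based on the distance between row and column categories.
    p = np.array(table, dtype=float)
    p /= p.sum()                                     # cell proportions
    k = p.shape[0]
    i, j = np.indices((k, k))
    d = np.abs(i - j) / (k - 1)                      # normalised disagreement
    w = 1 - d if weights == "linear" else 1 - d**2   # agreement weights
    pe = np.outer(p.sum(axis=1), p.sum(axis=0))      # chance-expected proportions
    return (np.sum(w * p) - np.sum(w * pe)) / (1 - np.sum(w * pe))
```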
       
  • Revisiting performance metrics for prediction with rare outcomes

      Authors: Samrachana Adhikari, Sharon-Lise Normand, Jordan Bloom, David Shahian, Sherri Rose
      First page: 2352
      Abstract: Statistical Methods in Medical Research, Ahead of Print.
      Machine learning algorithms are increasingly used in the clinical literature, claiming advantages over logistic regression, but they are generally designed to maximize the area under the receiver operating characteristic curve. While the area under the curve and other measures of accuracy are commonly reported for evaluating binary prediction problems, these metrics can be misleading. We aim to give clinical and machine learning researchers a realistic medical example of the dangers of relying on a single measure of discriminatory performance to evaluate binary predictions. Predicting medical complications after surgery is a frequent but challenging task because many post-surgery outcomes are rare. We predicted post-surgery mortality among patients in a clinical registry who received at least one aortic valve replacement. Estimation incorporated multiple evaluation metrics and algorithms typically regarded as performing well with rare outcomes, as well as an ensemble and a new extension of the lasso for multiple unordered treatments. All algorithms showed high accuracy and moderate cross-validated area under the receiver operating characteristic curve. False positive rates were below 1%; however, true positive rates were below 7%, even when paired with a 100% positive predictive value, and graphical representations of calibration were poor. Similar results were seen in simulations, with a high area under the curve (above 90%) accompanying low true positive rates. Clinical studies should not rely primarily on the area under the receiver operating characteristic curve or accuracy alone.
      Citation: Statistical Methods in Medical Research
      PubDate: 2021-09-01T02:30:11Z
      DOI: 10.1177/09622802211038754
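
      A minimal sketch of the paper's practical recommendation: report several metrics side by side rather than AUC or accuracy alone, since with rare outcomes a high AUC and high accuracy can coexist with a very low true positive rate. Names are illustrative.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, confusion_matrix

def rare_outcome_report(y_true, y_prob, threshold=0.5):
    # Compute AUC, accuracy, and threshold-based rates together so that
    # weaknesses hidden by any single metric become visible.
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "auc": roc_auc_score(y_true, y_prob),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "tpr": tp / (tp + fn) if (tp + fn) else float("nan"),  # sensitivity
        "fpr": fp / (fp + tn) if (fp + tn) else float("nan"),
        "ppv": tp / (tp + fp) if (tp + fp) else float("nan"),  # precision
    }
```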
       
 