Abstract: Mortality deceleration, or the slowing down of death rates at old ages, has been repeatedly investigated, but empirical studies of this phenomenon have produced mixed results. The scarcity of observations at the oldest ages complicates the statistical assessment of mortality deceleration, even in the parsimonious parametric framework of the gamma-Gompertz model considered here. The need for thorough verification of the ages at death can further limit the available data. As logistical constraints may only allow the validation of survivors beyond a certain (high) age, samples may be restricted to a certain age range. If we can quantify the effects of the sample size and the age range on the assessment of mortality deceleration, we can make recommendations for study design. For that purpose, we propose applying the concept of the Fisher information and ideas from the theory of optimal design. We compute the Fisher information matrix in the gamma-Gompertz model, and derive information measures for comparing the performance of different study designs. We then discuss interpretations of these measures. The special case in which the frailty variance takes the value of zero and lies on the boundary of the parameter space is given particular attention. The changes in information related to varying sample sizes or age ranges are investigated for specific scenarios. The Fisher information also allows us to study the power of a likelihood ratio test to detect mortality deceleration depending on the study design. We illustrate these methods with a study of mortality among late nineteenth-century French-Canadian birth cohorts. PubDate: 2021-02-25
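Illustrative note: the deceleration discussed in this abstract can be seen directly in the standard textbook form of the gamma-Gompertz population hazard, sketched below. This is a minimal sketch, not the paper's code. It assumes a Gompertz baseline hazard a·exp(b·x) and gamma frailty with mean 1 and variance sigma2; the function names and the parameter values in the usage note are hypothetical.

```python
import math


def gompertz_hazard(x, a, b):
    """Baseline Gompertz hazard a * exp(b * x) at age x (measured from the starting age)."""
    return a * math.exp(b * x)


def gamma_gompertz_hazard(x, a, b, sigma2):
    """Population hazard under gamma frailty with mean 1 and variance sigma2.

    Uses the standard result: marginal hazard = baseline hazard / (1 + sigma2 * H(x)),
    where H(x) = (a / b) * (exp(b * x) - 1) is the cumulative Gompertz hazard.
    With sigma2 = 0 (the boundary case) this reduces to the plain Gompertz hazard.
    """
    cumulative = (a / b) * (math.exp(b * x) - 1.0)
    return gompertz_hazard(x, a, b) / (1.0 + sigma2 * cumulative)
```

With hypothetical values such as a = 0.05, b = 0.1 and sigma2 = 0.25, the population hazard levels off near b / sigma2 at high ages, whereas the sigma2 = 0 curve keeps growing exponentially. Telling these two shapes apart from sparse old-age data is the design problem the abstract addresses.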

Abstract: For equivalence trials with survival outcomes, a popular testing approach is the elegant test for equivalence of two survival functions suggested by Wellek (Biometrics 49: 877–881, 1993). This test evaluates whether or not the difference between the true survival curves is practically irrelevant by specifying an equivalence margin on the hazard ratio under the proportional hazards assumption. However, this approach is based on extrapolating the behavior of the survival curves to the whole time axis, whereas in practice survival times are only observed until the end of follow-up. We propose a modification of Wellek's test that only addresses equivalence until the end of follow-up and derive the large sample properties of this test. Another issue is the proportional hazards assumption, which may not be realistic. If this assumption is violated, one may severely misjudge the actual treatment effect with a hazard ratio quantification and wrongly declare equivalence. We suggest a non-parametric test for assessing survival equivalence within the follow-up period. We derive the large sample properties of this test and provide an approximation to the limiting distribution under some mild assumptions on the functional form of the difference between the two survival curves. Both suggestions are investigated by simulation and applied to a clinical trial on survival of gastric cancer patients. PubDate: 2021-01-30

Abstract: With recent advances in cancer screening and treatment, many patients with cancer are identified at an early stage and clinically cured. Importantly, uncured patients should be treated promptly, before the cancer progresses to advanced stages for which therapeutic options are rather limited. It is also crucial to identify uncured subjects among patients with early-stage cancers for clinical trials to develop effective adjuvant therapies. Thus, it is of interest to develop statistical predictive models with as high accuracy as possible in predicting the latent cure status. The receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) are among the most widely used statistical metrics for assessing predictive accuracy or discriminatory power for a dichotomous outcome (cured/uncured). Yet the conventional AUC cannot be directly used because the cure status is incompletely observed. In this article, we propose new estimates of the ROC curve and its AUC for predicting latent cure status in Cox proportional hazards (PH) cure models and transformation cure models. We develop explicit formulas to estimate sensitivity, specificity, the ROC curve and its AUC without requiring knowledge of the patient's cure status. We also develop EM-type estimates to approximate sensitivity, specificity, ROC and AUC conditional on observed data. Numerical studies are used to assess the finite-sample performance of the proposed methods. Both methods are consistent and have similar efficiency, as shown in our numerical studies. A melanoma dataset is used to demonstrate the utility of the proposed estimates of the ROC curve for the latent cure status. We have also developed an \(\mathtt{R}\) package called \(\mathtt{evacure}\) to efficiently compute the proposed estimates. PubDate: 2021-01-28

Abstract: The existence of a cured subgroup happens quite often in survival studies and many authors considered this under various situations (Farewell in Biometrics 38:1041–1046, 1982; Kuk and Chen in Biometrika 79:531–541, 1992; Lam and Xue in Biometrika 92:573–586, 2005; Zhou et al. in J Comput Graph Stat 27:48–58, 2018). In this paper, we discuss the situation where only interval-censored data are available and furthermore, the censoring may be informative, for which there does not seem to exist an established estimation procedure. For the analysis, we present a three component model consisting of a logistic model for describing the cure rate, an additive hazards model for the failure time of interest and a nonhomogeneous Poisson model for the observation process. For estimation, we propose a sieve maximum likelihood estimation procedure and the asymptotic properties of the resulting estimators are established. Furthermore, an EM algorithm is developed for the implementation of the proposed estimation approach, and extensive simulation studies are conducted and suggest that the proposed method works well for practical situations. Also the approach is applied to a cardiac allograft vasculopathy study that motivated this investigation. PubDate: 2021-01-22 DOI: 10.1007/s10985-021-09515-7

Abstract: This paper deals with statistical inference procedure of multivariate failure time data when the primary covariate can be measured only on a subset of the full cohort but the auxiliary information is available. To improve efficiency of statistical inference, we use quadratic inference function approach to incorporate the intra-cluster correlation and use kernel smoothing technique to further utilize the auxiliary information. The proposed method is shown to be more efficient than those ignoring the intra-cluster correlation and auxiliary information and is easy to implement. In addition, we develop a chi-squared test for hypothesis testing of hazard ratio parameters. We evaluate the finite-sample performance of the proposed procedure via extensive simulation studies. The proposed approach is illustrated by analysis of a real data set from the study of left ventricular dysfunction. PubDate: 2021-01-08 DOI: 10.1007/s10985-020-09513-1

Abstract: Time-to-event data often violate the proportional hazards assumption inherent in the popular Cox regression model. Such violations are especially common in the sphere of biological and medical data, where latent heterogeneity due to unmeasured covariates or time-varying effects is common. A variety of parametric survival models have been proposed in the literature which make more appropriate assumptions on the hazard function, at least for certain applications. One such model is derived from the First Hitting Time (FHT) paradigm, which assumes that a subject’s event time is determined by a latent stochastic process reaching a threshold value. Several random effects specifications of the FHT model have also been proposed which allow for better modeling of data with unmeasured covariates. While often appropriate, these methods display limited flexibility due to their inability to model a wide range of heterogeneities. To address this issue, we propose a Bayesian model which loosens assumptions on the mixing distribution inherent in the random effects FHT models currently in use. We demonstrate via a simulation study that the proposed model greatly improves both survival and parameter estimation in the presence of latent heterogeneity. We also apply the proposed methodology to data from a toxicology/carcinogenicity study which exhibits nonproportional hazards and contrast the results with both the Cox model and two popular FHT models. PubDate: 2021-01-08 DOI: 10.1007/s10985-020-09514-0

Abstract: This paper considers the optimal design for the frailty model with discrete-time survival endpoints in longitudinal studies. We introduce random effects into the discrete hazard models to account for the heterogeneity between experimental subjects, which causes the observations of the same subject at sequential time points to be correlated. We propose a general design method to collect the survival endpoints as inexpensively and efficiently as possible. A cost-based generalized D (\(D_s\))-optimal design criterion is proposed to derive the optimal designs for estimating the fixed effects under a cost constraint. Different computation strategies based on grid search or the particle swarm optimization (PSO) algorithm are provided to obtain generalized D (\(D_s\))-optimal designs. The equivalence theorem for the cost-based D (\(D_s\))-optimal design criterion is given to verify the optimality of the designs. Our numerical results indicate that the presence of the random effects has a great influence on the optimal designs. Some useful suggestions are also put forward for designing future longitudinal studies. PubDate: 2021-01-08 DOI: 10.1007/s10985-020-09512-2

Abstract: We estimate the dementia incidence hazard in Germany for the birth cohorts 1900 until 1954 from a simple sample of Germany’s largest health insurance company. Followed from 2004 to 2012, 36,000 uncensored dementia incidences are observed and a further 200,000 right-censored insurants are included. From a multiplicative hazard model we find a positive and linear trend in the dementia hazard over the cohorts. The main focus of the study is on 11,000 left-censored persons who had already suffered from the disease in 2004. After including the left-censored observations, the slope of the trend declines markedly due to Simpson’s paradox: left-censored persons are imbalanced between the cohorts. When left-censoring is included, the dementia hazard increases differently for different ages; we consider omitted covariates to be the reason. For the standard errors from large sample theory, left-censoring requires an adjustment to the conditional information matrix equality. PubDate: 2021-01-01 DOI: 10.1007/s10985-020-09505-1

Abstract: In this paper, we propose an innovative method for jointly analyzing survival data and longitudinally measured continuous and ordinal data. We use a random effects accelerated failure time model for survival outcomes, a linear mixed model for continuous longitudinal outcomes and a proportional odds mixed model for ordinal longitudinal outcomes, where these outcome processes are linked through a set of association parameters. A primary objective of this study is to examine the effects of association parameters on the estimators of joint models. The model parameters are estimated by the method of maximum likelihood. The finite-sample properties of the estimators are studied using Monte Carlo simulations. The empirical study suggests that the degree of association among the outcome processes influences the bias, efficiency, and coverage probability of the estimators. Our proposed joint model estimators are approximately unbiased and produce smaller mean squared errors as compared to the estimators obtained from separate models. This work is motivated by a large multicenter study, referred to as the Genetic and Inflammatory Markers of Sepsis (GenIMS) study. We apply our proposed method to the GenIMS data analysis. PubDate: 2020-11-24 DOI: 10.1007/s10985-020-09511-3

Abstract: Models for situations where some individuals are long-term survivors, immune or non-susceptible to the event of interest, are extensively studied in biomedical research. Fitting a regression can be problematic in situations involving small sample sizes with a high censoring rate, since the maximum likelihood estimates of some coefficients may be infinite. This phenomenon is called monotone likelihood, and it occurs in the presence of many categorical covariates, especially when one covariate level is not associated with any failure (in survival analysis) or when a categorical covariate perfectly predicts a binary response (in logistic regression). A well-known solution is an adaptation of the Firth method, originally created to reduce the estimation bias. The method provides a finite estimate by penalizing the likelihood function. Bias correction in the mixture cure model is a topic rarely discussed in the literature, and it constitutes a central contribution of this work. To handle this point in such a context, we propose to derive the adjusted score function based on the Firth method. An extensive Monte Carlo simulation study indicates good inference performance for the penalized maximum likelihood estimates. The analysis is illustrated through a real application involving patients with melanoma assisted at the Hospital das Clínicas/UFMG in Brazil. This is a relatively novel data set affected by the monotone likelihood issue and containing cured individuals. PubDate: 2020-11-13 DOI: 10.1007/s10985-020-09510-4

Abstract: Calibration is an important measure of the predictive accuracy for a prognostic risk model. A widely used measure of calibration when the outcome is survival time is the expected Brier score. In this paper, methodology is developed to accurately estimate the difference in expected Brier scores derived from nested survival models and to compute an accompanying variance estimate of this difference. The methodology is applicable to time invariant and time-varying coefficient Cox survival models. The nested survival model approach is often applied to the scenario where the full model consists of conventional and new covariates and the subset model contains the conventional covariates alone. A complicating factor in the methodologic development is that the Cox model specification cannot, in general, be simultaneously satisfied for nested models. The problem has been resolved by projecting the properly specified full survival model onto the lower dimensional space of conventional markers alone. Simulations are performed to examine the method’s finite sample properties and a prostate cancer data set is used to illustrate its application. PubDate: 2020-10-22 DOI: 10.1007/s10985-020-09509-x

Abstract: Outcome-dependent sampling (ODS) designs such as the case–control or case–cohort design are widely used in epidemiological studies for their outstanding cost-effectiveness. In this article, we propose and develop a smoothed weighted Gehan estimating equation approach for inference in an accelerated failure time model under a general failure time outcome-dependent sampling scheme. The proposed estimating equation is continuously differentiable and can be solved by standard numerical methods. In addition to developing asymptotic properties of the proposed estimator, we also propose and investigate a new optimal power-based subsample allocation criterion in the proposed design by maximizing the power function of a significance test. Simulation results show that the proposed estimator is more efficient than existing competing estimators and that the optimal power-based subsample allocation provides an ODS design that yields improved power for the test of exposure effect. We illustrate the proposed method with a data set from the Norwegian Mother and Child Cohort Study to evaluate the relationship between exposure to perfluoroalkyl substances and women’s subfecundity. PubDate: 2020-10-12 DOI: 10.1007/s10985-020-09508-y

Abstract: In this paper, we first propose a dependent Dirichlet process (DDP) model using a mixture of Weibull models with each mixture component resembling a Cox model for survival data. We then build a Dirichlet process mixture model for competing risks data without regression covariates. Next we extend this model to a DDP model for competing risks regression data by using a multiplicative covariate effect on subdistribution hazards in the mixture components. Though built on proportional hazards (or subdistribution hazards) models, the proposed nonparametric Bayesian regression models do not require the assumption of constant hazard (or subdistribution hazard) ratio. An external time-dependent covariate is also considered in the survival model. After describing the model, we discuss how both cause-specific and subdistribution hazard ratios can be estimated from the same nonparametric Bayesian model for competing risks regression. For use with the regression models proposed, we introduce an omnibus prior that is suitable when little external information is available about covariate effects. Finally we compare the models’ performance with existing methods through simulations. We also illustrate the proposed competing risks regression model with data from a breast cancer study. An R package “DPWeibull” implementing all of the proposed methods is available at CRAN. PubDate: 2020-10-12 DOI: 10.1007/s10985-020-09506-0

Abstract: Interval-censored failure time data arise in a number of fields and many authors have recently paid more attention to their analysis. However, regression analysis of interval-censored data under the additive risk model can be challenging in maximizing the complex likelihood, especially when there exists a non-ignorable cure fraction in the population. For the problem, we develop a sieve maximum likelihood estimation approach based on Bernstein polynomials. To relieve the computational burden, an expectation–maximization algorithm by exploiting a Poisson data augmentation is proposed. Under some mild conditions, the asymptotic properties of the proposed estimator are established. The finite sample performance of the proposed method is evaluated by extensive simulations, and is further illustrated through a real data set from the smoking cessation study. PubDate: 2020-10-01 DOI: 10.1007/s10985-020-09507-z

Abstract: Interval-censored data often arise naturally in medical, biological, and demographical studies. As a matter of routine, the Cox proportional hazards regression is employed to fit such censored data. The related work in the framework of additive hazards regression, which is always considered as a promising alternative, remains to be investigated. We propose a sieve maximum likelihood method for estimating regression parameters in the additive hazards regression with case II interval-censored data, which consists of right-, left- and interval-censored observations. We establish the consistency and the asymptotic normality of the proposed estimator and show that it attains the semiparametric efficiency bound. The finite-sample performance of the proposed method is assessed via comprehensive simulation studies, which is further illustrated by a real clinical example for patients with hemophilia. PubDate: 2020-10-01 DOI: 10.1007/s10985-020-09496-z

Abstract: The cause of failure in cohort studies that involve competing risks is frequently incompletely observed. To address this, several methods have been proposed for the semiparametric proportional cause-specific hazards model under a missing at random assumption. However, these proposals provide inference for the regression coefficients only, and do not consider the infinite dimensional parameters, such as the covariate-specific cumulative incidence function. Nevertheless, the latter quantity is essential for risk prediction in modern medicine. In this paper we propose a unified framework for inference about both the regression coefficients of the proportional cause-specific hazards model and the covariate-specific cumulative incidence functions under missing at random cause of failure. Our approach is based on a novel computationally efficient maximum pseudo-partial-likelihood estimation method for the semiparametric proportional cause-specific hazards model. Using modern empirical process theory we derive the asymptotic properties of the proposed estimators for the regression coefficients and the covariate-specific cumulative incidence functions, and provide methodology for constructing simultaneous confidence bands for the latter. Simulation studies show that our estimators perform well even in the presence of a large fraction of missing causes of failure, and that the regression coefficient estimator can be substantially more efficient than the previously proposed augmented inverse probability weighting estimator. The method is applied using data from an HIV cohort study and a bladder cancer clinical trial. PubDate: 2020-10-01 DOI: 10.1007/s10985-020-09494-1

Abstract: In the time-to-event setting, the concordance probability assesses the relative level of agreement between a model-based risk score and the survival time of a patient. While it provides a measure of discrimination over the entire follow-up period of a study, the probability does not provide information on the longitudinal durability of a baseline risk score. It is possible that a baseline risk model is able to segregate short-term from long-term survivors but unable to maintain its discriminatory strength later in the follow-up period. As a consequence, this would motivate clinicians to re-evaluate the risk score longitudinally. This longitudinal re-evaluation may not, however, be feasible in many scenarios since a single baseline evaluation may be the only data collectible due to treatment or other clinical or ethical reasons. In these scenarios, an attenuation of the discriminatory power of the patient risk score over time would indicate decreased clinical utility and call into question whether this score should remain a prognostic tool at later time points. Working within the concordance probability paradigm, we propose a method to address this clinical scenario and evaluate the discriminatory power of a baseline derived risk score over time. The methodology is illustrated with two examples: a baseline risk score in colorectal cancer defined at the time of tumor resection, and for circulating tumor cells in metastatic prostate cancer. PubDate: 2020-07-24 DOI: 10.1007/s10985-020-09503-3
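Illustrative note: to make the notion of concordance concrete, the sketch below computes Harrell's C-index, a common estimator of concordance for right-censored data. It is swapped in here purely for illustration and is not necessarily the estimator used in the paper, nor does it implement the time-dependent evaluation the abstract proposes. The function name and argument layout are hypothetical.

```python
def harrell_c(times, events, scores):
    """Harrell's C-index for right-censored data.

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if censored
    scores : risk scores, where a higher score means higher predicted risk

    A pair (i, j) is usable when subject i has an observed event strictly
    before subject j's observed time. The pair is concordant when the subject
    who failed earlier also has the higher risk score; tied scores count 0.5.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier failure of a pair
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs: cannot compute concordance")
    return concordant / usable
```

A value near 1 indicates that the baseline score orders patients well over the whole follow-up, and a value near 0.5 indicates no discrimination. One crude way to probe the durability question raised in the abstract would be to recompute such a measure among patients still at risk at later landmark times, although that is only an assumption about how one might explore it, not the paper's method.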

Abstract: The hazard ratio is one of the most commonly reported measures of treatment effect in randomised trials, yet the source of much misinterpretation. This point was made clear by Hernán (Epidemiology (Cambridge, Mass) 21(1):13–15, 2010) in a commentary, which emphasised that the hazard ratio contrasts populations of treated and untreated individuals who survived a given period of time, populations that will typically fail to be comparable—even in a randomised trial—as a result of different pressures or intensities acting on different populations. The commentary has been very influential, but also a source of surprise and confusion. In this note, we aim to provide more insight into the subtle interpretation of hazard ratios and differences, by investigating in particular what can be learned about a treatment effect from the hazard ratio becoming 1 (or the hazard difference 0) after a certain period of time. We further define a hazard ratio that has a causal interpretation and study its relationship to the Cox hazard ratio, and we also define a causal hazard difference. These quantities are of theoretical interest only, however, since they rely on assumptions that cannot be empirically evaluated. Throughout, we will focus on the analysis of randomised experiments. PubDate: 2020-07-11 DOI: 10.1007/s10985-020-09501-5

Abstract: The original version of this article unfortunately contains mistakes. It has been corrected with this Correction. PubDate: 2020-07-09 DOI: 10.1007/s10985-020-09502-4

Abstract: Mean residual life (MRL) is the remaining life expectancy of a subject who has survived to a certain time point and can be used as an alternative to the hazard function for characterizing the distribution of a time-to-event variable. Inference and application of MRL models have primarily focused on full-cohort studies. In practice, case-cohort and nested case-control designs have been commonly used within large cohorts that have long follow-up and study rare diseases, particularly when studying costly molecular biomarkers. They enable prospective inference as in the full-cohort design, with significant cost-saving benefits. In this paper, we study the modeling and inference of a family of generalized MRL models under case-cohort and nested case-control designs. Built upon the idea of inverse selection probability, the weighted estimating equations are constructed to estimate regression parameters and the baseline MRL function. Asymptotic properties of the proposed estimators are established and finite-sample performance is evaluated by extensive numerical simulations. An application to the New York University Women’s Health Study is presented to illustrate the proposed models and demonstrate a model diagnostic method to guide practical implementation. PubDate: 2020-06-11 DOI: 10.1007/s10985-020-09499-w
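Illustrative note: the definition of mean residual life used in this abstract corresponds to the standard identity m(t) = (integral of S(u) from t to infinity) / S(t), where S is the survival function. The sketch below evaluates that identity numerically for a known survival function. It is a minimal sketch under stated assumptions, not the paper's estimator: the function name, the trapezoidal rule, and the finite upper truncation point are all choices made for illustration.

```python
import math


def mean_residual_life(survival, t, upper, steps=100000):
    """Approximate m(t) = integral_t^upper S(u) du / S(t) with the trapezoidal rule.

    survival : callable returning S(u) for a time u
    t        : time already survived
    upper    : finite truncation point standing in for infinity (an assumption;
               it should be large enough that S(upper) is negligible)
    steps    : number of trapezoids
    """
    width = (upper - t) / steps
    total = 0.5 * (survival(t) + survival(upper))
    for k in range(1, steps):
        total += survival(t + k * width)
    return width * total / survival(t)


def exponential_survival(u, rate=0.5):
    """Survival function of an exponential distribution with a hypothetical rate."""
    return math.exp(-rate * u)
```

For the exponential distribution the remaining life expectancy is 1 / rate at every t, because the distribution is memoryless, so calling mean_residual_life(exponential_survival, t, t + 80.0) should return a value close to 2 for any t. Distributions with an increasing hazard would instead show an MRL that shrinks as t grows.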