Abstract: Recurrent events often arise in follow-up studies where a subject may experience multiple occurrences of the same type of event. Most regression models for recurrent events consider the time scale measured from the study origin and assume constant effects of covariates. In many applications, however, gap times between recurrent events are of natural interest, and the covariate effects may in fact vary over time. In this article, we propose a marginal varying-coefficient model for gap times between recurrent events that allows for the intra-individual correlation between events. Estimation and inference procedures are developed for the varying coefficients. Consistency and weak convergence of the proposed estimator are established. Monte Carlo simulation studies demonstrate that the proposed method works well with practical sample sizes. The proposed method is illustrated with an analysis of bladder tumor clinical data. PubDate: 2021-05-08
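
A minimal preprocessing sketch of the gap-time scale mentioned here (hypothetical event times, not from the bladder tumor data): gap times are recovered from calendar event times by differencing, with the study origin prepended.

```python
import numpy as np

# Hypothetical calendar times of one subject's recurrent events,
# measured from the study origin (time 0); invented for illustration.
event_times = np.array([3.0, 5.5, 9.0])

# Gap times are the waiting times between successive events.
gap_times = np.diff(np.concatenate([[0.0], event_times]))
# -> [3.0, 2.5, 3.5]
```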

Abstract: Current status data occur in many fields including demographic, epidemiological, financial, medical, and sociological studies. We consider the regression analysis of current status data with latent variables. The proposed model consists of a factor analytic model for characterizing latent variables through their multiple surrogates and an additive hazard model for examining potential covariate effects on the hazards of interest in the presence of current status data. We develop a borrow-strength estimation procedure that incorporates the expectation–maximization algorithm and correlated estimating equations. The consistency and asymptotic normality of the proposed estimators are established. A simulation study is conducted to evaluate the finite sample performance of the proposed method. A real-life study on chronic kidney disease in type 2 diabetic patients is presented. PubDate: 2021-04-24

Abstract: Classical simultaneous confidence bands for survival functions (i.e., Hall–Wellner, equal precision, and empirical likelihood bands) are derived from transformations of the asymptotic Brownian nature of the Nelson–Aalen or Kaplan–Meier estimators. Due to the properties of Brownian motion, a theoretical derivation of the highest confidence density region cannot be obtained in closed form. Instead, we provide confidence bands derived from a related optimization problem with local time processes. These bands can be applied to the one-sample problem regarding both cumulative hazard and survival functions. In addition, we present a solution to the two-sample problem for testing differences in cumulative hazard functions. The finite sample performance of the proposed method is assessed by Monte Carlo simulation studies. The proposed bands are applied to clinical trial data to assess survival times for primary biliary cirrhosis patients treated with D-penicillamine. PubDate: 2021-04-13
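
For reference, the Nelson–Aalen estimator that these bands transform admits a very short implementation; the sketch below is a plain one-sample version on toy data, not the band construction itself.

```python
import numpy as np

def nelson_aalen(times, events):
    """Nelson-Aalen estimator of the cumulative hazard.

    times  : observed times (event or censoring)
    events : 1 if event, 0 if censored
    Returns (distinct event times, cumulative hazard at those times).
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    t_event = np.unique(times[events == 1])
    H, cum = [], 0.0
    for t in t_event:
        d = np.sum((times == t) & (events == 1))  # events at t
        n = np.sum(times >= t)                    # number still at risk
        cum += d / n
        H.append(cum)
    return t_event, np.array(H)

# toy data: events at times 1, 3, 4; one censoring at time 2
t, H = nelson_aalen([1, 2, 3, 4], [1, 0, 1, 1])
# increments are 1/4, then 1/2, then 1/1 -> H = [0.25, 0.75, 1.75]
```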

Abstract: Understanding the distribution of an event duration time is essential in many studies. The exact time to the event is often unavailable, and thus so is the full event duration. By linking relevant longitudinal measures to the event duration, we propose to estimate the duration distribution via the first-hitting-time model (e.g. Lee and Whitmore in Stat Sci 21(4):501–513, 2006). The longitudinal measures are assumed to follow a Wiener process with random drift. We apply a variant of the MCEM algorithm to compute likelihood-based estimators of the parameters in the longitudinal process model. This allows us to adapt the well-known empirical distribution function to estimate the duration distribution in the presence of missing time origin. Estimators with smooth realizations can then be obtained by conventional smoothing techniques. We establish the consistency and weak convergence of the proposed distribution estimator and present its variance estimation. We use a collection of wildland fire records from Alberta, Canada to motivate and illustrate the proposed approach. The finite-sample performance of the proposed estimator is examined by simulation. Viewing the available data as interval-censored times, we show that the proposed estimator can be more efficient than the well-established Turnbull estimator, an alternative that is often applied in such situations. PubDate: 2021-04-05
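
As a hedged illustration of the first-hitting-time idea (not the paper's MCEM procedure): for a Wiener process with positive drift started a distance below a threshold, the hitting time is inverse Gaussian with mean threshold/drift (Wald's identity). The simulation below uses invented parameter values and a simple Euler discretization.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_hitting_times(n, drift, sigma, threshold, dt=0.01, t_max=200.0):
    """Simulate first-hitting times of a Wiener process with drift.

    Each path starts at 0 and is followed until it first reaches
    `threshold`; paths that have not hit by t_max are returned as np.inf,
    mirroring the incomplete durations discussed in the abstract.
    """
    steps = int(t_max / dt)
    hit = np.full(n, np.inf)
    x = np.zeros(n)
    alive = np.ones(n, dtype=bool)
    for k in range(1, steps + 1):
        x[alive] += drift * dt + sigma * np.sqrt(dt) * rng.standard_normal(alive.sum())
        newly = alive & (x >= threshold)
        hit[newly] = k * dt
        alive &= ~newly
        if not alive.any():
            break
    return hit

# For positive drift the hitting time is inverse Gaussian with mean
# threshold / drift; here 2.0 / 1.0 = 2.0 (up to discretization bias).
times = simulate_hitting_times(4000, drift=1.0, sigma=0.5, threshold=2.0)
finite = times[np.isfinite(times)]
```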

Abstract: Recurrent event data arise in many biomedical longitudinal studies when health-related events can occur repeatedly for each subject during the follow-up time. In this article, we examine the gap times between recurrent events. We propose a new semiparametric accelerated gap time model based on the trend-renewal process, which contains trend and renewal components that allow the intensity function to vary between successive events. We use the Buckley–James imputation approach to deal with censored transformed gap times. The proposed estimators are shown to be consistent and asymptotically normal. Model diagnostic plots of residuals and a method for predicting the number of recurrent events given specified covariates and follow-up time are also presented. Simulation studies are conducted to assess the finite sample performance of the proposed method. The proposed technique is demonstrated through an application to two real data sets. PubDate: 2021-03-25

Abstract: Mortality deceleration, or the slowing down of death rates at old ages, has been repeatedly investigated, but empirical studies of this phenomenon have produced mixed results. The scarcity of observations at the oldest ages complicates the statistical assessment of mortality deceleration, even in the parsimonious parametric framework of the gamma-Gompertz model considered here. The need for thorough verification of the ages at death can further limit the available data. As logistical constraints may only allow validation of survivors beyond a certain (high) age, samples may be restricted to a certain age range. If we can quantify the effects of the sample size and the age range on the assessment of mortality deceleration, we can make recommendations for study design. For that purpose, we propose applying the concept of Fisher information and ideas from the theory of optimal design. We compute the Fisher information matrix in the gamma-Gompertz model and derive information measures for comparing the performance of different study designs. We then discuss interpretations of these measures. The special case in which the frailty variance takes the value of zero and lies on the boundary of the parameter space is given particular attention. The changes in information related to varying sample sizes or age ranges are investigated for specific scenarios. The Fisher information also allows us to study the power of a likelihood ratio test to detect mortality deceleration depending on the study design. We illustrate these methods with a study of mortality among late nineteenth-century French-Canadian birth cohorts. PubDate: 2021-02-25 DOI: 10.1007/s10985-021-09518-4
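
The gamma-Gompertz marginal hazard referenced here has a standard closed form; the sketch below (illustrative parameter values, not the paper's Fisher-information computation) shows the deceleration relative to the plain Gompertz hazard and the frailty-variance-zero boundary case.

```python
import numpy as np

def gamma_gompertz_hazard(x, a, b, gamma):
    """Marginal (population) hazard of the gamma-Gompertz model.

    Individual hazard a*exp(b*x) with gamma-distributed frailty of mean 1
    and variance `gamma`; gamma -> 0 recovers the plain Gompertz hazard,
    the boundary case highlighted in the abstract.
    """
    x = np.asarray(x, dtype=float)
    return a * np.exp(b * x) / (1.0 + (a * gamma / b) * (np.exp(b * x) - 1.0))

ages = np.array([0.0, 10.0, 20.0, 30.0])
h_frail = gamma_gompertz_hazard(ages, a=0.01, b=0.1, gamma=0.2)
h_gompertz = 0.01 * np.exp(0.1 * ages)
# the frailty model decelerates relative to Gompertz at older ages
```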

Abstract: For equivalence trials with survival outcomes, a popular testing approach is the elegant test for equivalence of two survival functions suggested by Wellek (Biometrics 49: 877–881, 1993). This test evaluates whether or not the difference between the true survival curves is practically irrelevant by specifying an equivalence margin on the hazard ratio under the proportional hazards assumption. However, this approach is based on extrapolating the behavior of the survival curves to the whole time axis, whereas in practice survival times are only observed until the end of follow-up. We propose a modification of Wellek's test that only addresses equivalence until the end of follow-up, and we derive the large sample properties of this test. Another issue is the proportional hazards assumption, which may not be realistic. If this assumption is violated, one may severely misjudge the actual treatment effect with a hazard ratio quantification and wrongly declare equivalence. We suggest a non-parametric test for assessing survival equivalence within the follow-up period. We derive the large sample properties of this test and provide an approximation to the limiting distribution under some mild assumptions on the functional form of the difference between the two survival curves. Both suggestions are investigated by simulation and applied to a clinical trial on the survival of gastric cancer patients. PubDate: 2021-01-30 DOI: 10.1007/s10985-021-09517-5

Abstract: With recent advancements in cancer screening and treatment, many patients with cancer are identified at an early stage and clinically cured. Importantly, uncured patients should be treated promptly before the cancer progresses to advanced stages for which therapeutic options are rather limited. It is also crucial to identify uncured subjects among patients with early-stage cancers for clinical trials to develop effective adjuvant therapies. Thus, it is of interest to develop statistical predictive models with as high accuracy as possible in predicting the latent cure status. The receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) are among the most widely used statistical metrics for assessing predictive accuracy or discriminatory power for a dichotomous outcome (cured/uncured). Yet the conventional AUC cannot be directly used due to the incompletely observed cure status. In this article, we propose new estimates of the ROC curve and its AUC for predicting latent cure status in Cox proportional hazards (PH) cure models and transformation cure models. We develop explicit formulas to estimate sensitivity, specificity, the ROC curve and its AUC without requiring knowledge of the patient's cure status. We also develop EM-type estimates to approximate sensitivity, specificity, the ROC and the AUC conditional on observed data. Numerical studies were used to assess the finite-sample performance of the proposed methods. Both methods are consistent and have similar efficiency as shown in our numerical studies. A melanoma dataset was used to demonstrate the utility of the proposed estimates of the ROC curve for the latent cure status. We have also developed an \(\mathtt{R}\) package called \(\mathtt {evacure}\) to efficiently compute the proposed estimates. PubDate: 2021-01-28 DOI: 10.1007/s10985-021-09516-6
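
For the fully observed case, the conventional AUC is the Mann–Whitney concordance probability; a minimal sketch (toy scores and labels) of the quantity the proposed estimators recover when the cure status is latent:

```python
import numpy as np

def empirical_auc(scores, labels):
    """Conventional AUC as the Mann-Whitney concordance probability:
    P(score_case > score_control) + 0.5 * P(tie). This is computable
    only when the dichotomous status (here: cured/uncured) is fully
    observed, which is exactly what fails with latent cure status.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]  # all case-control score differences
    return np.mean(diff > 0) + 0.5 * np.mean(diff == 0)

auc = empirical_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
# all 4 case-control pairs concordant -> AUC = 1.0
```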

Abstract: A cured subgroup occurs quite often in survival studies, and many authors have considered this under various situations (Farewell in Biometrics 38:1041–1046, 1982; Kuk and Chen in Biometrika 79:531–541, 1992; Lam and Xue in Biometrika 92:573–586, 2005; Zhou et al. in J Comput Graph Stat 27:48–58, 2018). In this paper, we discuss the situation where only interval-censored data are available and, furthermore, the censoring may be informative, a setting for which there does not seem to be an established estimation procedure. For the analysis, we present a three-component model consisting of a logistic model for describing the cure rate, an additive hazards model for the failure time of interest, and a nonhomogeneous Poisson model for the observation process. For estimation, we propose a sieve maximum likelihood estimation procedure, and the asymptotic properties of the resulting estimators are established. Furthermore, an EM algorithm is developed for the implementation of the proposed estimation approach, and extensive simulation studies are conducted and suggest that the proposed method works well in practical situations. The approach is also applied to a cardiac allograft vasculopathy study that motivated this investigation. PubDate: 2021-01-22 DOI: 10.1007/s10985-021-09515-7
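
The cure-rate component implies a population survival function that plateaus at the cure fraction; a minimal sketch, with an exponential uncured-survival function chosen purely for illustration (the additive hazards and observation-process parts of the model are omitted):

```python
import numpy as np

def mixture_cure_survival(t, cure_prob, surv_uncured):
    """Population survival under a mixture cure model:
    S_pop(t) = pi + (1 - pi) * S_u(t), where pi is the cure probability
    (modeled logistically in the abstract) and S_u is the survival
    function of the uncured subjects. Illustrative only.
    """
    return cure_prob + (1.0 - cure_prob) * surv_uncured(np.asarray(t, float))

# exponential survival for the uncured, 30% cured (invented values)
S = mixture_cure_survival([0.0, 1.0, 1e6], 0.3, lambda t: np.exp(-t))
# S starts at 1 and plateaus at the cure fraction 0.3 as t grows
```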

Abstract: This paper deals with statistical inference procedures for multivariate failure time data when the primary covariate can be measured only on a subset of the full cohort but auxiliary information is available. To improve the efficiency of statistical inference, we use the quadratic inference function approach to incorporate the intra-cluster correlation and a kernel smoothing technique to further utilize the auxiliary information. The proposed method is shown to be more efficient than those ignoring the intra-cluster correlation and auxiliary information, and it is easy to implement. In addition, we develop a chi-squared test for hypothesis testing of the hazard ratio parameters. We evaluate the finite-sample performance of the proposed procedure via extensive simulation studies. The proposed approach is illustrated by the analysis of a real data set from the study of left ventricular dysfunction. PubDate: 2021-01-08 DOI: 10.1007/s10985-020-09513-1
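
Kernel smoothing of auxiliary information can be illustrated with a Nadaraya–Watson estimate of a conditional mean; a stand-in sketch with a Gaussian kernel and invented data, not the paper's exact smoother:

```python
import numpy as np

def nadaraya_watson(x0, x, y, h):
    """Nadaraya-Watson kernel estimate of E[Y | X = x0] with a Gaussian
    kernel and bandwidth h -- the generic smoothing step of the kind
    used to borrow strength from auxiliary covariate information.
    """
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)  # kernel weights
    return float(np.sum(w * y) / np.sum(w))

# with constant y the weighted average is exactly that constant
val = nadaraya_watson(0.5, [0.0, 0.4, 0.6, 1.0], [2.0, 2.0, 2.0, 2.0], h=0.2)
# -> 2.0
```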

Abstract: Time-to-event data often violate the proportional hazards assumption inherent in the popular Cox regression model. Such violations are especially common in biological and medical data, where latent heterogeneity due to unmeasured covariates or time-varying effects is common. A variety of parametric survival models have been proposed in the literature which make more appropriate assumptions on the hazard function, at least for certain applications. One such model is derived from the first hitting time (FHT) paradigm, which assumes that a subject's event time is determined by a latent stochastic process reaching a threshold value. Several random effects specifications of the FHT model have also been proposed which allow for better modeling of data with unmeasured covariates. While often appropriate, these methods can display limited flexibility due to their inability to model a wide range of heterogeneities. To address this issue, we propose a Bayesian model which loosens the assumptions on the mixing distribution inherent in the random effects FHT models currently in use. We demonstrate via a simulation study that the proposed model greatly improves both survival and parameter estimation in the presence of latent heterogeneity. We also apply the proposed methodology to data from a toxicology/carcinogenicity study which exhibits nonproportional hazards, and we contrast the results with both the Cox model and two popular FHT models. PubDate: 2021-01-08 DOI: 10.1007/s10985-020-09514-0

Abstract: This paper considers the optimal design for the frailty model with discrete-time survival endpoints in longitudinal studies. We introduce random effects into the discrete hazard models to account for the heterogeneity between experimental subjects, which causes the observations of the same subject at sequential time points to be correlated. We propose a general design method to collect the survival endpoints as inexpensively and efficiently as possible. A cost-based generalized D ( \(D_s\) )-optimal design criterion is proposed to derive the optimal designs for estimating the fixed effects under a cost constraint. Different computation strategies based on grid search or the particle swarm optimization (PSO) algorithm are provided to obtain generalized D ( \(D_s\) )-optimal designs. The equivalence theorem for the cost-based D ( \(D_s\) )-optimal design criterion is given to verify the optimality of the designs. Our numerical results indicate that the presence of the random effects has a great influence on the optimal designs. Some useful suggestions are also put forward for the future design of longitudinal studies. PubDate: 2021-01-08 DOI: 10.1007/s10985-020-09512-2
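
D-optimality maximizes the determinant of the information matrix over candidate designs; a minimal grid-search sketch for the simple linear regressor f(x) = (1, x), omitting the mixed-model and cost-constraint details of the paper:

```python
import itertools
import numpy as np

def log_det_information(design):
    """log-determinant of the information matrix sum_i f(x_i) f(x_i)^T
    for the toy model f(x) = (1, x) -- a stand-in for the cost-based
    D-optimality criterion in the abstract.
    """
    F = np.column_stack([np.ones(len(design)), np.asarray(design, float)])
    M = F.T @ F
    sign, logdet = np.linalg.slogdet(M)
    return logdet if sign > 0 else -np.inf

# grid search over all two-point designs on a candidate grid in [-1, 1]
candidates = np.linspace(-1.0, 1.0, 21)
best = max(itertools.combinations(candidates, 2), key=log_det_information)
# for f(x) = (1, x), det M = (x1 - x2)^2, so the two-point D-optimal
# design places its points at the endpoints -1 and +1
```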

Abstract: We estimate the dementia incidence hazard in Germany for the birth cohorts 1900 until 1954 from a simple sample of Germany's largest health insurance company. Followed from 2004 to 2012, 36,000 uncensored dementia incidences are observed, and a further 200,000 right-censored insurants are included. From a multiplicative hazard model, we find a positive and linear trend in the dementia hazard over the cohorts. The main focus of the study is on 11,000 left-censored persons who had already suffered from the disease in 2004. After including the left-censored observations, the slope of the trend declines markedly due to Simpson's paradox: left-censored persons are imbalanced across the cohorts. When left-censoring is included, the dementia hazard increases differently for different ages; we consider omitted covariates to be the reason. For the standard errors from large sample theory, left-censoring requires an adjustment to the conditional information matrix equality. PubDate: 2021-01-01 DOI: 10.1007/s10985-020-09505-1

Abstract: In this paper, we propose an innovative method for jointly analyzing survival data and longitudinally measured continuous and ordinal data. We use a random effects accelerated failure time model for survival outcomes, a linear mixed model for continuous longitudinal outcomes and a proportional odds mixed model for ordinal longitudinal outcomes, where these outcome processes are linked through a set of association parameters. A primary objective of this study is to examine the effects of association parameters on the estimators of joint models. The model parameters are estimated by the method of maximum likelihood. The finite-sample properties of the estimators are studied using Monte Carlo simulations. The empirical study suggests that the degree of association among the outcome processes influences the bias, efficiency, and coverage probability of the estimators. Our proposed joint model estimators are approximately unbiased and produce smaller mean squared errors as compared to the estimators obtained from separate models. This work is motivated by a large multicenter study, referred to as the Genetic and Inflammatory Markers of Sepsis (GenIMS) study. We apply our proposed method to the GenIMS data analysis. PubDate: 2020-11-24 DOI: 10.1007/s10985-020-09511-3

Abstract: Models for situations where some individuals are long-term survivors, immune or non-susceptible to the event of interest, are extensively studied in biomedical research. Fitting a regression model can be problematic in situations involving small sample sizes with high censoring rates, since the maximum likelihood estimates of some coefficients may be infinite. This phenomenon is called monotone likelihood, and it occurs in the presence of many categorical covariates, especially when one covariate level is not associated with any failure (in survival analysis) or when a categorical covariate perfectly predicts a binary response (in logistic regression). A well-known solution is an adaptation of the Firth method, originally created to reduce estimation bias. The method provides finite estimates by penalizing the likelihood function. Bias correction in the mixture cure model is a topic rarely discussed in the literature, and it constitutes a central contribution of this work. To handle this point in such a context, we propose deriving the adjusted score function based on the Firth method. An extensive Monte Carlo simulation study indicates good inference performance for the penalized maximum likelihood estimates. The analysis is illustrated through a real application involving patients with melanoma assisted at the Hospital das Clínicas/UFMG in Brazil. This is a relatively novel data set affected by the monotone likelihood issue and containing cured individuals. PubDate: 2020-11-13 DOI: 10.1007/s10985-020-09510-4
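
The Firth correction penalizes the likelihood by half the log-determinant of the Fisher information, which keeps estimates finite under monotone likelihood. A minimal logistic-regression sketch with invented, completely separated toy data (damped Newton steps on the Firth-adjusted score); this illustrates the generic device, not the paper's cure-model estimator:

```python
import numpy as np

def firth_logistic(X, y, n_iter=200, lr=0.5):
    """Firth-penalized logistic regression via damped Newton iterations
    on the adjusted score U*(b) = X'(y - p + h*(1/2 - p)), where h are
    the leverages of the weighted hat matrix. Under complete separation
    the ordinary MLE diverges to infinity, while this estimate stays
    finite -- the monotone-likelihood fix the abstract builds on.
    """
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)
        I = X.T @ (W[:, None] * X)                # Fisher information
        H = (np.sqrt(W)[:, None] * X) @ np.linalg.solve(I, X.T) * np.sqrt(W)
        h = np.diag(H)                            # leverage values
        grad = X.T @ (y - p + h * (0.5 - p))      # Firth-adjusted score
        beta = beta + lr * np.linalg.solve(I, grad)
    return beta

# completely separated toy data: x < 0 -> y = 0, x > 0 -> y = 1
X = np.column_stack([np.ones(6), [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]])
y = np.array([0, 0, 0, 1, 1, 1])
beta = firth_logistic(X, y)
# the slope estimate is finite despite perfect separation
```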

Abstract: Calibration is an important measure of the predictive accuracy of a prognostic risk model. A widely used measure of calibration when the outcome is survival time is the expected Brier score. In this paper, methodology is developed to accurately estimate the difference in expected Brier scores derived from nested survival models and to compute an accompanying variance estimate of this difference. The methodology is applicable to time-invariant and time-varying coefficient Cox survival models. The nested survival model approach is often applied to the scenario where the full model consists of conventional and new covariates and the subset model contains the conventional covariates alone. A complicating factor in the methodological development is that the Cox model specification cannot, in general, be simultaneously satisfied for nested models. The problem is resolved by projecting the properly specified full survival model onto the lower-dimensional space of the conventional markers alone. Simulations are performed to examine the method's finite sample properties, and a prostate cancer data set is used to illustrate its application. PubDate: 2020-10-22 DOI: 10.1007/s10985-020-09509-x
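
In the censoring-free case the Brier score at a horizon t has a simple form; a minimal sketch with invented predictions of the quantity whose censored, nested-model version the paper targets:

```python
import numpy as np

def brier_score(event_times, pred_surv_at_t, t):
    """Brier score at horizon t for survival predictions, censoring-free
    case: mean squared difference between the status indicator 1{T > t}
    and the predicted survival probability S(t | x). The paper studies
    the expected Brier score under censoring and for nested models.
    """
    alive = (np.asarray(event_times, float) > t).astype(float)
    return float(np.mean((alive - np.asarray(pred_surv_at_t, float)) ** 2))

# toy event times and invented predicted survival probabilities at t = 3
bs = brier_score([2.0, 5.0, 7.0, 1.0], [0.4, 0.9, 0.8, 0.1], t=3.0)
# squared errors 0.16, 0.01, 0.04, 0.01 -> mean 0.055
```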

Abstract: Outcome-dependent sampling (ODS) designs such as the case–control or case–cohort design are widely used in epidemiological studies for their outstanding cost-effectiveness. In this article, we propose and develop a smoothed weighted Gehan estimating equation approach for inference in an accelerated failure time model under a general failure-time ODS scheme. The proposed estimating equation is continuously differentiable and can be solved by standard numerical methods. In addition to developing the asymptotic properties of the proposed estimator, we also propose and investigate a new optimal power-based subsample allocation criterion for the proposed design, derived by maximizing the power function of a significance test. Simulation results show that the proposed estimator is more efficient than other existing competing estimators and that the optimal power-based subsample allocation provides an ODS design that yields improved power for the test of the exposure effect. We illustrate the proposed method with a data set from the Norwegian Mother and Child Cohort Study to evaluate the relationship between exposure to perfluoroalkyl substances and women's subfecundity. PubDate: 2020-10-12 DOI: 10.1007/s10985-020-09508-y

Abstract: In this paper, we first propose a dependent Dirichlet process (DDP) model using a mixture of Weibull models with each mixture component resembling a Cox model for survival data. We then build a Dirichlet process mixture model for competing risks data without regression covariates. Next we extend this model to a DDP model for competing risks regression data by using a multiplicative covariate effect on subdistribution hazards in the mixture components. Though built on proportional hazards (or subdistribution hazards) models, the proposed nonparametric Bayesian regression models do not require the assumption of constant hazard (or subdistribution hazard) ratio. An external time-dependent covariate is also considered in the survival model. After describing the model, we discuss how both cause-specific and subdistribution hazard ratios can be estimated from the same nonparametric Bayesian model for competing risks regression. For use with the regression models proposed, we introduce an omnibus prior that is suitable when little external information is available about covariate effects. Finally we compare the models’ performance with existing methods through simulations. We also illustrate the proposed competing risks regression model with data from a breast cancer study. An R package “DPWeibull” implementing all of the proposed methods is available at CRAN. PubDate: 2020-10-12 DOI: 10.1007/s10985-020-09506-0

Abstract: Interval-censored failure time data arise in a number of fields, and many authors have recently paid increasing attention to their analysis. However, regression analysis of interval-censored data under the additive risk model can be challenging because of the complex likelihood to be maximized, especially when there exists a non-ignorable cure fraction in the population. For this problem, we develop a sieve maximum likelihood estimation approach based on Bernstein polynomials. To relieve the computational burden, an expectation–maximization algorithm exploiting a Poisson data augmentation is proposed. Under some mild conditions, the asymptotic properties of the proposed estimator are established. The finite sample performance of the proposed method is evaluated by extensive simulations and is further illustrated through a real data set from a smoking cessation study. PubDate: 2020-10-01 DOI: 10.1007/s10985-020-09507-z
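
Bernstein polynomials approximate a function on [0, 1] by weighting its values at the points k/n with binomial coefficients; a minimal sketch of the basis that such sieve estimators build on (the paper's monotone-coefficient parameterization of the cumulative hazard is omitted):

```python
import numpy as np
from math import comb

def bernstein_approx(f, n, x):
    """Degree-n Bernstein polynomial approximation of f on [0, 1]:
    B_n(f)(x) = sum_k f(k/n) * C(n, k) * x^k * (1-x)^(n-k).
    If the coefficients f(k/n) are nondecreasing, so is B_n(f),
    which is why the basis suits monotone targets like cumulative
    hazards in sieve estimation.
    """
    x = np.asarray(x, dtype=float)
    total = np.zeros_like(x)
    for k in range(n + 1):
        total += f(k / n) * comb(n, k) * x**k * (1 - x)**(n - k)
    return total

grid = np.linspace(0.0, 1.0, 11)
approx = bernstein_approx(lambda u: u**2, 50, grid)
# uniformly close to u^2 and monotone on the grid
```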

Abstract: The original version of this article unfortunately contains mistakes. It has been corrected with this Correction. PubDate: 2020-07-09 DOI: 10.1007/s10985-020-09502-4