Abstract: Estimating individualized treatment rules—particularly in the context of right-censored outcomes—is challenging because the treatment effect heterogeneity of interest is often small, thus difficult to detect. While this motivates the use of very large datasets such as those from multiple health systems or centres, data privacy may be a concern, with participating data centres reluctant to share individual-level data. In this case study on the treatment of depression, we demonstrate an application of distributed regression for privacy protection used in combination with dynamic weighted survival modelling (DWSurv) to estimate an optimal individualized treatment rule whilst obscuring individual-level data. In simulations, we demonstrate the flexibility of this approach to address local treatment practices that may affect confounding, and show that DWSurv retains its double robustness even when performed through a (weighted) distributed regression approach. The work is motivated by, and illustrated with, an analysis of treatment for unipolar depression using the United Kingdom’s Clinical Practice Research Datalink. PubDate: 2022-05-02
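The core computational idea behind distributed regression, namely fitting a (weighted) regression from site-level summary statistics rather than pooled individual-level data, can be illustrated with ordinary weighted least squares: each centre shares only the aggregated matrices X'WX and X'Wy, and the analysis centre sums them and solves the normal equations. This is a minimal sketch of the general principle, not the DWSurv procedure itself; the function names and simulated data are illustrative.

```python
import numpy as np

def site_summaries(X, y, w=None):
    """Each centre computes only aggregate matrices; individual rows never leave the site."""
    w = np.ones(len(y)) if w is None else w
    Xw = X * w[:, None]
    return Xw.T @ X, Xw.T @ y          # X'WX and X'Wy

def pooled_weighted_least_squares(summaries):
    """The analysis centre sums the site-level matrices and solves the normal equations."""
    XtWX = sum(s[0] for s in summaries)
    XtWy = sum(s[1] for s in summaries)
    return np.linalg.solve(XtWX, XtWy)

# Three hypothetical centres generate data locally; only summaries are shared.
rng = np.random.default_rng(0)
beta_true = np.array([1.0, -2.0, 0.5])
summaries = []
for n in (300, 500, 200):
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    y = X @ beta_true + rng.normal(scale=0.5, size=n)
    w = rng.uniform(0.5, 1.5, n)        # e.g. weights from a treatment or censoring model
    summaries.append(site_summaries(X, y, w))
print(pooled_weighted_least_squares(summaries).round(3))
```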
Abstract: In oncology studies, it is important to understand and characterize disease heterogeneity among patients so that patients can be classified into different risk groups and high-risk patients can be identified at the right time. This information can then be used to identify a more homogeneous patient population for developing precision medicine. In this paper, we propose a mixture survival tree approach for direct risk classification. We assume that the patients can be classified into a pre-specified number of risk groups, where each group has a distinct survival profile. Our proposed tree-based methods are devised to estimate latent group membership using an EM algorithm. The observed-data log-likelihood function is used as the splitting criterion in recursive partitioning. The finite sample performance is evaluated by extensive simulation studies, and the proposed method is illustrated by a case study in breast cancer. PubDate: 2022-04-29
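The tree-based splitting is specific to the paper, but the EM step for estimating latent risk-group membership with right-censored data can be illustrated with a simple two-component exponential mixture: events contribute the component density and censored observations the component survival function. A minimal sketch under that parametric assumption, not the authors' procedure; all names and the simulated data are illustrative.

```python
import numpy as np

def em_exponential_mixture(time, event, n_groups=2, n_iter=200, seed=0):
    """EM for a mixture of exponential survival distributions with right censoring.

    time  : observed times (event or censoring)
    event : 1 if the event was observed, 0 if censored
    Returns mixing proportions, rate parameters, and posterior group memberships.
    """
    rng = np.random.default_rng(seed)
    pi = np.full(n_groups, 1.0 / n_groups)                   # mixing proportions
    lam = rng.uniform(0.5, 1.5, n_groups) / np.mean(time)    # rate parameters

    for _ in range(n_iter):
        # E-step: posterior probability that subject i belongs to group k.
        # Events contribute f_k(t) = lam_k * exp(-lam_k t); censored observations
        # contribute the survival function S_k(t) = exp(-lam_k t).
        log_like = -np.outer(time, lam)                      # log S_k(t_i)
        log_like += np.where(event[:, None] == 1, np.log(lam)[None, :], 0.0)
        resp = pi[None, :] * np.exp(log_like)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: update proportions and rates from the expected memberships.
        pi = resp.mean(axis=0)
        lam = (resp * event[:, None]).sum(axis=0) / (resp * time[:, None]).sum(axis=0)

    return pi, lam, resp

# Toy usage: two latent groups with different hazards and random censoring.
rng = np.random.default_rng(1)
true_group = rng.integers(0, 2, 500)
t = rng.exponential(np.where(true_group == 0, 1.0, 5.0))
c = rng.exponential(10.0, 500)
time, event = np.minimum(t, c), (t <= c).astype(int)
pi_hat, lam_hat, _ = em_exponential_mixture(time, event)
print(pi_hat.round(3), lam_hat.round(3))
```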
Abstract: In this paper we propose a boosting algorithm to extend the applicability of a first hitting time model to high-dimensional frameworks. Based on an underlying stochastic process, first hitting time models do not require the proportional hazards assumption, which is hardly verifiable in the high-dimensional context, and represent a valid parametric alternative to the Cox model for modelling time-to-event responses. First hitting time models also offer a natural way to integrate low-dimensional clinical and high-dimensional molecular information in a prediction model that avoids the complicated weighting schemes typical of current methods. The performance of our novel boosting algorithm is illustrated in three real data examples. PubDate: 2022-04-27
Abstract: This paper discusses the fitting of the proportional hazards model to interval-censored failure time data with missing covariates. Many authors have discussed the problem when complete covariate information is available or the covariates are missing completely at random. In contrast, we focus on the situation where the covariates are missing at random. For this problem, a sieve maximum likelihood estimation approach is proposed, with I-spline functions used to approximate the unknown cumulative baseline hazard function in the model. For the implementation of the proposed method, we develop an EM algorithm based on a two-stage data augmentation. Furthermore, we show that the proposed estimators of the regression parameters are consistent and asymptotically normal. The proposed approach is then applied to a set of data on Alzheimer's disease that motivated this study. PubDate: 2022-03-29
Abstract: In the study of life tables the random variable of interest is usually assumed discrete, since mortality rates are studied for integer ages. In dynamic life tables a time domain is included to account for the evolution of the hazard rates in time. In this article we follow a survival analysis approach and use a nonparametric description of the hazard rates. We construct a discrete-time stochastic process that reflects dependence across age as well as in time. This process is used as a Bayesian nonparametric prior distribution for the hazard rates for the study of evolutionary life tables. Prior properties of the process are studied and posterior distributions are derived. We present a simulation study, with the inclusion of right-censored observations, as well as a real data analysis to show the performance of our model. PubDate: 2022-03-17
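Before any prior is placed on them, the raw ingredient of such an analysis is the empirical discrete-time hazard by age and calendar year: deaths divided by the number at risk in each (age, year) cell, with right-censored subjects simply leaving the risk set. The sketch below computes these crude rates only; it is not the Bayesian model of the paper, and the column names and simulated data are illustrative.

```python
import numpy as np
import pandas as pd

def empirical_discrete_hazard(df):
    """Crude discrete-time hazards h(age, year) = deaths / at-risk.

    Expects one row per person-year with (hypothetical) columns:
      'age'   : integer age attained in that year
      'year'  : calendar year
      'death' : 1 if the person died in that person-year, 0 otherwise
    Right-censored subjects simply stop contributing rows after censoring.
    """
    grouped = df.groupby(["age", "year"])["death"]
    out = grouped.agg(deaths="sum", at_risk="count").reset_index()
    out["hazard"] = out["deaths"] / out["at_risk"]
    return out.pivot(index="age", columns="year", values="hazard")

# Toy usage with simulated person-year records.
rng = np.random.default_rng(0)
records = pd.DataFrame({
    "age": rng.integers(60, 90, 5000),
    "year": rng.integers(2000, 2005, 5000),
})
records["death"] = rng.binomial(1, 0.02 + 0.002 * (records["age"] - 60))
print(empirical_discrete_hazard(records).round(3))
```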
Abstract: For high dimensional gene expression data, one important goal is to identify a small number of genes that are associated with progression of the disease or survival of the patients. In this paper, we consider the problem of variable selection for multivariate survival data. We propose an estimation procedure for high dimensional accelerated failure time (AFT) models with bivariate censored data. The method extends the Buckley-James method by minimizing a penalized \(L_2\) loss function with a penalty function induced from a bivariate spike-and-slab prior specification. In the proposed algorithm, censored observations are imputed using the Kaplan-Meier estimator, which avoids a parametric assumption on the error terms. Our empirical studies demonstrate that the proposed method provides better performance compared to the alternative procedures designed for univariate survival data regardless of whether the true events are correlated or not, and conceptualizes a formal way of handling bivariate survival data for AFT models. Findings from the analysis of a myeloma clinical trial using the proposed method are also presented. PubDate: 2022-03-03 DOI: 10.1007/s10985-022-09549-5
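The spike-and-slab penalization is the paper's contribution, but the Kaplan-Meier-based imputation step shared by Buckley-James-type estimators can be sketched on its own: each censored value c is replaced by its conditional expectation E[T | T > c] computed from the Kaplan-Meier curve. The sketch below codes a univariate version of this step directly; it is not the authors' bivariate procedure, and all names and data are illustrative.

```python
import numpy as np

def km_survival(time, event):
    """Kaplan-Meier estimate: unique event times and S(t) just after each of them."""
    uniq = np.unique(time[event == 1])
    surv, s = [], 1.0
    for u in uniq:
        at_risk = np.sum(time >= u)
        deaths = np.sum((time == u) & (event == 1))
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return uniq, np.asarray(surv)

def km_step(uniq, surv, c):
    """Value of the KM step function at time c (1.0 before the first event time)."""
    idx = np.searchsorted(uniq, c, side="right") - 1
    return 1.0 if idx < 0 else surv[idx]

def buckley_james_impute(time, event):
    """Replace each censored value c by E[T | T > c] under the KM estimate.

    E[T | T > c] = c + (integral of S(t) dt over (c, last event time]) / S(c);
    mass beyond the last observed event time is ignored, a common convention.
    """
    uniq, surv = km_survival(time, event)
    imputed = time.astype(float).copy()
    for i in np.flatnonzero(event == 0):
        c = time[i]
        later = uniq > c
        if not later.any():
            continue                                 # censored after the last event: leave as-is
        grid = np.concatenate(([c], uniq[later]))
        s_grid = np.concatenate(([km_step(uniq, surv, c)], surv[later]))
        area = np.sum(s_grid[:-1] * np.diff(grid))   # step-function integral of S over (c, last event]
        imputed[i] = c + area / s_grid[0]
    return imputed

# Toy usage on (possibly log-transformed) censored outcomes.
rng = np.random.default_rng(0)
t, c = rng.exponential(1.0, 200), rng.exponential(1.5, 200)
y, delta = np.minimum(t, c), (t <= c).astype(int)
print(buckley_james_impute(y, delta)[:5].round(3))
```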
Abstract: The additive hazards model is often used to complement the proportional hazards model in the analysis of failure time data. Statistical inference for the additive hazards model with time-dependent longitudinal covariates requires the availability of the whole trajectory of the longitudinal process, which is not realistic in practice. The commonly used last-value-carried-forward approach for intermittently observed longitudinal covariates can induce biased parameter estimation. The more principled joint modeling of the longitudinal process and failure time data imposes strong modeling assumptions, which are difficult to verify. In this paper, we propose methods that weight the distance between the observation time of the longitudinal covariates and the failure time, resulting in unbiased regression coefficient estimation. We establish the consistency and asymptotic normality of the proposed estimators. Simulation studies provide numerical support for the theoretical findings. Data from an Alzheimer’s study illustrate the practical utility of the methodology. PubDate: 2022-02-11 DOI: 10.1007/s10985-022-09548-6
Abstract: We first review some main results for phase-type distributions, including a discussion of Coxian distributions and their canonical representations. We then consider the extension of phase-type modeling to cover competing risks. This extension involves the consideration of finite state Markov chains with more than one absorbing state, letting each absorbing state correspond to a particular risk. The non-uniqueness of Markov chain representations of phase-type distributions is well known. In the paper we study corresponding issues for the competing risks case with the aim of obtaining identifiable parameterizations. Statistical inference for the Coxian competing risks model is briefly discussed and some real data are analyzed for illustration. PubDate: 2022-02-08 DOI: 10.1007/s10985-022-09547-7
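The construction is easy to make concrete by simulation: the event time and cause correspond to the absorption time and the absorbing state reached by a continuous-time Markov chain with several absorbing states. The sketch below simulates such a chain; the generator matrix and its values are purely illustrative and are not an identifiable parameterization from the paper.

```python
import numpy as np

# Three transient states and two absorbing states, one per competing risk.
# Rows give the exit rates of the continuous-time Markov chain (illustrative values).
Q = np.array([
    #  s0    s1    s2   risk1 risk2
    [-1.0,  0.6,  0.0,  0.3,  0.1],   # state 0
    [ 0.0, -1.5,  0.9,  0.2,  0.4],   # state 1
    [ 0.0,  0.0, -2.0,  1.5,  0.5],   # state 2
    [ 0.0,  0.0,  0.0,  0.0,  0.0],   # absorbing: cause 1
    [ 0.0,  0.0,  0.0,  0.0,  0.0],   # absorbing: cause 2
])
ABSORBING = {3: "cause 1", 4: "cause 2"}

def simulate_phase_type(Q, start=0, rng=None):
    """Simulate one absorption time and cause from a chain with absorbing states."""
    rng = rng or np.random.default_rng()
    state, t = start, 0.0
    while state not in ABSORBING:
        rate = -Q[state, state]
        t += rng.exponential(1.0 / rate)            # holding time in the current state
        probs = Q[state].clip(min=0.0) / rate       # jump probabilities to the other states
        state = rng.choice(len(Q), p=probs)
    return t, ABSORBING[state]

rng = np.random.default_rng(0)
samples = [simulate_phase_type(Q, rng=rng) for _ in range(10000)]
times = np.array([s[0] for s in samples])
causes = np.array([s[1] for s in samples])
print("mean time to absorption:", times.mean().round(3))
print("cause-specific proportions:",
      {c: round(float(np.mean(causes == c)), 3) for c in set(causes)})
```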
Abstract: Survival modeling with time-varying coefficients has proven useful in analyzing time-to-event data with one or more distinct failure types. When studying the cause-specific etiology of breast and prostate cancers using the large-scale data from the Surveillance, Epidemiology, and End Results (SEER) Program, we encountered two major challenges that existing methods for estimating time-varying coefficients cannot tackle. First, these methods, dependent on expanding the original data in a repeated measurement format, result in formidable time and memory consumption as the sample size escalates to over one million. In this case, even a well-configured workstation cannot accommodate their implementations. Second, when the large-scale data under analysis include binary predictors with near-zero variance (e.g., only 0.6% of patients in our SEER prostate cancer data had tumors regional to the lymph nodes), existing methods suffer from numerical instability due to ill-conditioned second-order information. The estimation accuracy deteriorates further with multiple competing risks. To address these issues, we propose a proximal Newton algorithm with a shared-memory parallelization scheme and tests of significance and nonproportionality for the time-varying effects. A simulation study shows that our scalable approach reduces the time and memory costs by orders of magnitude and enjoys improved estimation accuracy compared with alternative approaches. Applications to the SEER cancer data demonstrate the real-world performance of the proximal Newton algorithm. PubDate: 2022-01-29 DOI: 10.1007/s10985-021-09544-2
Abstract: Accurate risk prediction has been the central goal in many studies of survival outcomes. In the presence of multiple risk factors, a censored regression model can be employed to estimate a risk prediction rule. Before the prediction tool can be popularized for practical use, it is crucial to rigorously assess its prediction performance. In our motivating example, researchers are interested in developing and validating a risk prediction tool to identify future lung cancer cases by integrating demographic information, disease characteristics and smoking-related data. Considering the long latency period of cancer, it is desirable for a prediction tool to achieve discriminative performance that does not weaken over time. We propose estimation and inferential procedures to comprehensively assess both the overall predictive discrimination and the temporal pattern of an estimated prediction rule. The proposed methods readily accommodate commonly used censored regression models, including the Cox proportional hazards model and the accelerated failure time model. The estimators are consistent and asymptotically normal, and reliable variance estimators are also developed. The proposed methods offer an informative tool for inferring time-dependent predictive discrimination, as well as for comparing the discrimination performance between candidate models. Applications of the proposed methods demonstrate the enduring performance of the risk prediction tool in the PLCO study and detect decaying performance in a study of liver disease. PubDate: 2022-01-21 DOI: 10.1007/s10985-022-09545-9
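The inferential procedures are the paper's contribution; the quantity of interest, discrimination of a risk score at increasing time horizons, can be approximated crudely with a time-restricted Harrell-type concordance index, as sketched below. This version ignores censoring weights (no IPCW correction) and is not the authors' estimator; all names and the simulated data are illustrative.

```python
import numpy as np
from itertools import combinations

def truncated_c_index(time, event, risk, tau):
    """Harrell-type concordance restricted to pairs whose earlier time is an
    observed event occurring at or before tau (no censoring-weight correction)."""
    concordant, usable = 0.0, 0
    for i, j in combinations(range(len(time)), 2):
        # Order the pair so that subject a has the smaller observed time.
        a, b = (i, j) if time[i] < time[j] else (j, i)
        if time[a] > tau or event[a] == 0 or time[a] == time[b]:
            continue                       # not a usable (comparable) pair
        usable += 1
        if risk[a] > risk[b]:
            concordant += 1.0              # higher risk failed first: concordant
        elif risk[a] == risk[b]:
            concordant += 0.5              # ties in risk count as half
    return concordant / usable if usable else np.nan

# Evaluating the same risk score at increasing horizons shows whether its
# discrimination weakens over time.
rng = np.random.default_rng(0)
x = rng.normal(size=300)
t = rng.exponential(np.exp(-x))            # higher x means higher hazard, shorter time
c = rng.exponential(2.0, 300)
time, event, risk = np.minimum(t, c), (t <= c).astype(int), x
for tau in (0.5, 1.0, 2.0, 4.0):
    print(tau, round(truncated_c_index(time, event, risk, tau), 3))
```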
Abstract: Missing covariates are commonly encountered when evaluating covariate effects on survival outcomes. Excluding missing data from the analysis may lead to biased parameter estimation and a misleading conclusion. The inverse probability weighting method is widely used to handle missing covariates. However, obtaining the asymptotic variance in frequentist inference is complicated because it involves estimating parameters for the propensity scores. In this paper, we propose a new approach based on an approximate Bayesian method, without using Taylor expansion, to handle missing covariates for survival data. We consider a stratified proportional hazards model so that it can be used for non-proportional hazards structures. Two cases of missing patterns are studied: a single missing pattern and multiple missing patterns. The proposed estimators are shown to be consistent and asymptotically normal, matching the frequentist asymptotic properties. Simulation studies show that our proposed estimators are asymptotically unbiased and that the credible region obtained from the posterior distribution is close to the frequentist confidence interval. The algorithm is straightforward and computationally efficient. We apply the proposed method to a stem cell transplantation data set. PubDate: 2022-01-16 DOI: 10.1007/s10985-021-09542-4
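For contrast with the approximate Bayesian route, the standard frequentist inverse probability weighting baseline looks roughly as follows: model the probability that a subject is a complete case from the always-observed covariates, weight complete cases by the inverse of that probability, and fit a weighted Cox model. A minimal unstratified sketch assuming the scikit-learn and lifelines packages are available; the column names and simulated data are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from lifelines import CoxPHFitter

def ipw_cox(df, duration_col="time", event_col="event",
            missing_col="biomarker", always_observed=("age", "sex")):
    """Complete-case Cox fit with inverse-probability-of-observation weights.

    The propensity of being a complete case is modelled from covariates that are
    always observed; complete cases are up-weighted by 1 / propensity.
    """
    observed = df[missing_col].notna().astype(int)
    prop_model = LogisticRegression().fit(df[list(always_observed)], observed)
    propensity = prop_model.predict_proba(df[list(always_observed)])[:, 1]

    complete = df[observed == 1].copy()
    complete["ipw"] = 1.0 / propensity[observed == 1]

    cph = CoxPHFitter()
    cph.fit(complete[[duration_col, event_col, missing_col, *always_observed, "ipw"]],
            duration_col=duration_col, event_col=event_col,
            weights_col="ipw", robust=True)
    return cph

# Toy usage with a covariate that is missing more often for older subjects.
rng = np.random.default_rng(0)
n = 1000
age, sex = rng.normal(60, 10, n), rng.integers(0, 2, n)
biomarker = rng.normal(size=n)
t, c = rng.exponential(np.exp(-0.5 * biomarker)), rng.exponential(2.0, n)
data = pd.DataFrame({"time": np.minimum(t, c), "event": (t <= c).astype(int),
                     "age": age, "sex": sex, "biomarker": biomarker})
data.loc[rng.random(n) < 1 / (1 + np.exp(-(age - 60) / 10)), "biomarker"] = np.nan
print(ipw_cox(data).summary[["coef", "se(coef)"]])
```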
Abstract: A generalized case-cohort design has been used when measuring exposures is expensive and events are not rare in the full cohort. This design collects expensive exposure information from a (stratified) randomly selected subset of the full cohort, called the subcohort, and from a fraction of cases outside the subcohort. For the full cohort study with competing risks, He et al. (Scand J Stat 43:103-122, 2016) studied the non-stratified proportional subdistribution hazards model with covariate-dependent censoring to directly evaluate covariate effects on the cumulative incidence function. In this paper, we propose a stratified proportional subdistribution hazards model with covariate-adjusted censoring weights for competing risks data under the generalized case-cohort design. We consider a general class of weight functions to account for the generalized case-cohort design. We then derive the optimal weight function, which minimizes the asymptotic variance of the parameter estimates within the general class of weight functions. The proposed estimator is shown to be consistent and asymptotically normally distributed. The simulation studies show that (i) the proposed estimator with covariate-adjusted weights is unbiased when the censoring distribution depends on covariates; and (ii) the proposed estimator with the optimal weight function gains parameter estimation efficiency. We apply the proposed method to stem cell transplantation and diabetes data sets. PubDate: 2022-01-15 DOI: 10.1007/s10985-022-09546-8
Abstract: We propose a nonparametric estimate of the scale-change parameter for characterizing the difference between two survival functions under the accelerated failure time model, using an estimating equation based on restricted means. Advantages of our restricted-means-based approach over current nonparametric procedures are the strictly monotone nature of the estimating equation as a function of the scale-change parameter, leading to a unique root, and the availability of a direct standard error estimate, avoiding the need for hazard function estimation or re-sampling to conduct inference. We derive the asymptotic properties of the proposed estimator for both fixed and random points of restriction. In a simulation study, we compare the performance of the proposed estimator with parametric and nonparametric competitors in terms of bias, efficiency, and accuracy of coverage probabilities. The restricted-means-based approach provides unbiased estimates and accurate confidence interval coverage rates, with efficiency ranging from 81% to 95% relative to fitting the correct parametric model. An example from a randomized clinical trial in head and neck cancer is provided to illustrate an application of the methodology in practice. PubDate: 2022-01-12 DOI: 10.1007/s10985-021-09541-5
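The estimating equation itself is in the paper; its building block, a restricted mean survival time, i.e. the area under the Kaplan-Meier curve up to a point of restriction tau, is straightforward to compute. A minimal sketch with illustrative data:

```python
import numpy as np

def restricted_mean(time, event, tau):
    """Restricted mean survival time: area under the Kaplan-Meier curve on [0, tau]."""
    t, d = np.asarray(time), np.asarray(event)
    grid, surv = [0.0], [1.0]
    s = 1.0
    for u in np.unique(t[d == 1]):
        if u > tau:
            break
        s *= 1.0 - np.sum((t == u) & (d == 1)) / np.sum(t >= u)
        grid.append(u)
        surv.append(s)
    grid.append(tau)                       # close the last step at the point of restriction
    # S is a right-continuous step function, so the area is sum of (width * left value).
    return float(np.sum(np.diff(grid) * np.asarray(surv)))

# Comparing the RMST of two arms up to tau = 2 (toy data).
rng = np.random.default_rng(0)
t0, t1 = rng.exponential(1.0, 400), rng.exponential(1.6, 400)
c = rng.exponential(3.0, 400)
for label, t in (("control", t0), ("treatment", t1)):
    print(label, round(restricted_mean(np.minimum(t, c), (t <= c).astype(int), 2.0), 3))
```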
Abstract: We consider accelerated failure time models with error-prone time-to-event outcomes. The proposed models extend the conventional accelerated failure time model by allowing time-to-event responses to be subject to measurement errors. We describe two measurement error models, a logarithm transformation regression measurement error model and an additive error model with a positive increment, to delineate possible scenarios of measurement error in time-to-event outcomes. We develop Bayesian approaches to conduct statistical inference. Efficient Markov chain Monte Carlo algorithms are developed to facilitate the posterior inference. Extensive simulation studies are conducted to assess the performance of the proposed method, and an application to a study of Alzheimer’s disease is presented. PubDate: 2022-01-09 DOI: 10.1007/s10985-021-09543-3
Abstract: Proportional hazards frailty models have been extensively investigated and used to analyze clustered and recurrent failure time data. However, the proportional hazards assumption in these models may not always hold in practice. In this paper, we propose an additive hazards frailty model with semi-varying coefficients, which allows some covariate effects to be time-invariant and others to be time-varying. The time-varying and time-invariant regression coefficients are estimated by a set of estimating equations, whereas the frailty parameter is estimated by the method of moments. The large sample properties of the proposed estimators are established. The finite sample performance of the estimators is examined by simulation studies. The proposed model and estimation are illustrated with an analysis of data from a rehospitalization study of colorectal cancer patients. PubDate: 2021-11-25 DOI: 10.1007/s10985-021-09540-6
Abstract: Many medical conditions are marked by a sequence of events in association with continuous changes in biomarkers. Few works have evaluated the overall accuracy of a biomarker in predicting disease progression. We thus extend the concept of the receiver operating characteristic (ROC) surface and the volume under the surface (VUS) from multi-category outcomes to ordinal competing-risk outcomes that are also subject to noninformative censoring. Two VUS estimators are considered. One is based on the definition of the ROC surface and is obtained by integrating the estimated ROC surface. The other is an inverse probability weighted U estimator that is built upon the equivalence of the VUS to the concordance probability between the marker and the sequential outcomes. Both estimators have desirable asymptotic properties that can be derived using counting process techniques and U-statistics theory. We illustrate their good practical performance through simulations and applications to two studies of cognition and a transplant dataset. PubDate: 2021-11-22 DOI: 10.1007/s10985-021-09539-z
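Setting censoring aside, the VUS targeted here reduces to the probability that markers drawn from the three ordered outcome groups fall in the correct order, which can be computed by enumerating triples. The sketch below illustrates only that uncensored version, not the inverse-probability-weighted estimators of the paper; the data are simulated.

```python
import numpy as np

def vus_three_groups(marker, group):
    """Volume under the ROC surface for three ordered categories (no censoring):
    the proportion of one-from-each-group triples whose markers are correctly ordered.
    Ties are counted as discordant for simplicity."""
    marker, group = np.asarray(marker), np.asarray(group)
    m1, m2, m3 = (marker[group == g] for g in (1, 2, 3))
    # Compare every triple via broadcasting: the correct ordering is m1 < m2 < m3.
    a = m1[:, None, None]
    b = m2[None, :, None]
    c = m3[None, None, :]
    return float(((a < b) & (b < c)).mean())

# Toy usage: a marker that tends to increase with disease severity (groups 1 < 2 < 3).
# A useless marker gives VUS close to 1/6; an informative one gives a larger value.
rng = np.random.default_rng(0)
marker = np.concatenate([rng.normal(0, 1, 100), rng.normal(1, 1, 100), rng.normal(2, 1, 100)])
group = np.repeat([1, 2, 3], 100)
print("estimated VUS:", round(vus_three_groups(marker, group), 3))
```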
Abstract: The analysis of recurrent events in the presence of terminal events requires special attention. Several approaches have been suggested for such analyses either using intensity models or marginal models. When analysing treatment effects on recurrent events in controlled trials, special attention should be paid to competing deaths and their impact on interpretation. This paper proposes a method that formulates a marginal model for recurrent events and terminal events simultaneously. Estimation is based on pseudo-observations for both the expected number of events and survival probabilities. Various relevant hypothesis tests in the framework are explored. Theoretical derivations and simulation studies are conducted to investigate the behaviour of the method. The method is applied to two real data examples. The bivariate marginal pseudo-observation model carries the strength of a two-dimensional modelling procedure and performs well in comparison with available models. Finally, an extension to a three-dimensional model, which decomposes the terminal event per death cause, is proposed and exemplified. PubDate: 2021-11-05 DOI: 10.1007/s10985-021-09533-5
Abstract: In a subunit randomization trial, each cluster consists of multiple subunits from which outcome data are collected, and the subunits are randomized into different intervention arms. Observations from subunits within the same cluster tend to be positively correlated due to shared frailties, so that the outcome data from a subunit randomization trial exhibit dependency between arms as well as within each arm. For subunit randomization trials with a survival endpoint, few methods have been proposed for sample size calculation that show a clear relationship between the joint survival distribution of the subunits and the sample size, especially when the number of subunits per cluster is variable. In this paper, we propose a closed-form sample size formula for a weighted rank test comparing the marginal survival distributions between intervention arms under subunit randomization, possibly with a variable number of subunits among clusters. We conduct extensive simulations to evaluate the performance of our formula under various design settings, and demonstrate our sample size calculation method with some real clinical trials. PubDate: 2021-10-29 DOI: 10.1007/s10985-021-09538-0
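The formula in the paper adjusts for within-cluster correlation and a variable number of subunits; as a point of reference only, the classical Schoenfeld calculation for an ordinary two-arm log-rank test with independent subjects is sketched below (assuming SciPy is available). It does not reproduce the paper's cluster adjustment.

```python
from math import ceil, log
from scipy.stats import norm

def schoenfeld_events(hazard_ratio, alpha=0.05, power=0.8, allocation=0.5):
    """Required number of events for a two-arm log-rank test (independent subjects):
    d = (z_{1-alpha/2} + z_{power})^2 / (p (1 - p) (log HR)^2)."""
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    p = allocation
    return ceil((z_a + z_b) ** 2 / (p * (1 - p) * log(hazard_ratio) ** 2))

def required_subjects(hazard_ratio, event_probability, **kwargs):
    """Translate required events into subjects given the expected event probability."""
    d = schoenfeld_events(hazard_ratio, **kwargs)
    return d, ceil(d / event_probability)

# Example: detect HR = 0.7 with 80% power, expecting 60% of subjects to have an event.
print(required_subjects(0.7, 0.6))
```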
Abstract: Left-truncated data are often encountered in epidemiological cohort studies, where individuals are recruited according to a certain cross-sectional sampling criterion. Length-biased data, a special case of left-truncated data, arise when the incidence of the initial event follows a homogeneous Poisson process. In this article, we consider the analysis of length-biased and interval-censored data with a nonsusceptible fraction. We first point out the importance of a well-defined target population, which depends on prior knowledge of the support of the failure times of susceptible individuals. Given the target population, we proceed with length-biased sampling and draw valid inferences from a length-biased sample. When there are no covariates, we show that it suffices to consider a discrete version of the survival function for the susceptible individuals, with jump points at the left endpoints of the censoring intervals, when maximizing the full likelihood function, and we propose an EM algorithm to obtain the nonparametric maximum likelihood estimates of the nonsusceptible rate and the survival function of the susceptible individuals. We also develop a novel graphical method for assessing the stationarity assumption. When covariates are present, we consider the Cox proportional hazards model for the survival time of the susceptible individuals and the logistic regression model for the probability of being susceptible. We construct the full likelihood function and obtain the nonparametric maximum likelihood estimates of the regression parameters by employing the EM algorithm. The large sample properties of the estimates are established. The performance of the method is assessed by simulations. The proposed model and method are applied to data from an early-onset diabetes mellitus study. PubDate: 2021-10-08 DOI: 10.1007/s10985-021-09536-2
Abstract: Multivariate panel count data frequently arise in follow-up studies involving several related types of recurrent events. For univariate panel count data, several varying coefficient models have been developed. However, varying coefficient models for multivariate panel count data remain to be studied. In this paper, we propose a varying coefficient mean model for multivariate panel count data to describe possible nonlinear interaction effects between the covariates, and a local log-partial-likelihood procedure is considered to estimate the unknown covariate effects. Furthermore, a Breslow-type estimator is constructed for the baseline mean functions. The consistency and asymptotic normality of the proposed estimators are established under some mild conditions. The utility of the proposed approach is evaluated by numerical simulations and an application to a dataset from a skin cancer study. PubDate: 2021-10-05 DOI: 10.1007/s10985-021-09537-1