Analysis of Case-Cohort Designs
Related papers
On the proportional hazards model for occupational and environmental case-control analyses
BMC Medical Research Methodology, 2013
Background: Case-control studies are generally designed to investigate the effect of exposures on the risk of a disease. Detailed information on past exposures is collected at the time of study. However, only the cumulated value of the exposure at the index date is usually used in logistic regression. A weighted Cox (WC) model has been proposed to estimate the effects of time-dependent exposures. The weights depend on the age-conditional probabilities of developing the disease in the source population. While the WC model provided more accurate estimates of the effect of time-dependent covariates than standard logistic regression, the robust sandwich variance estimates were lower than the empirical variance, resulting in a low coverage probability of confidence intervals. The objectives of the present study were to investigate through simulations a new variance estimator and to compare the estimates from the WC model and standard logistic regression for estimating the effects of correlated temporal aspects of exposure with detailed information on exposure history.
Exposure stratified case-cohort designs
1998
A variant of the case-cohort design is proposed for the situation in which a correlate of the exposure (or prognostic factor) of interest is available for all cohort members, and exposure information is to be collected for a case-cohort sample. The cohort is stratified according to the correlate, and the subcohort is selected by stratified random sampling. A number of possible methods for the analysis of such exposure stratified case-cohort samples are presented and some of their statistical properties developed. The bias and efficiency of the methods are compared to each other, and to randomly sampled case-cohort studies, in a limited computer simulation study. We found that all of the proposed analysis methods performed reasonably well and were more efficient than a randomly sampled case-cohort sample. We conclude that these methods are well suited for the "clinical trials setting" in which subjects enter the study at time zero (at diagnosis or treatment) and a correlate of an expensive prognostic factor is collected for all study subjects at the time of entry to the study. In such studies, a correlate stratified subcohort can be much more cost-efficient for investigation of the expensive prognostic factor than a randomly sampled subcohort.
A Simulation Study of Relative Efficiency and Bias in the Nested Case–Control Study Design
Epidemiologic Methods, 2013
Purpose: The nested case-control study design, in which a fixed number of controls are matched to each case, is often used to analyze exposure-response associations within a cohort. It has become common practice to sample four or five controls per case; however, previous research has shown that in certain instances, significant gains in relative efficiency can be realized when more controls are matched to each case. This study expanded upon this and investigated the effect of (i) the number of cases, (ii) the strength of the exposure-response, and (iii) the skewness of the exposure distribution on the bias and relative efficiency of the conditional likelihood estimator from a nested case-control study. Methods: Cohorts were simulated and analyzed using conditional logistic regression. Results: The relative efficiency decreased and bias away from the null increased, as the true exposure-response parameter increased and the skewness of the exposure distribution of the risk-sets increased. This became more pronounced when the number of cases in the cohort was small. Conclusions: Gains in relative efficiency and a reduction in bias can be realized by sampling more than four or five controls per case generally used, especially when there are few cases, a strong exposure-response relation, and a skewed exposure variable.
A weighted Cox model for modelling time-dependent exposures in the analysis of case-control studies
Statistics in Medicine, 2010
Many exposures investigated in epidemiological case-control studies may vary over time. The effects of these exposures are usually estimated using logistic regression, which does not directly account for changes in covariate values over time within individuals. By contrast, the Cox model with time-dependent covariates directly accounts for these changes over time. However, the over-sampling of cases in case-control studies, relative to controls, requires manipulating the risk sets in the Cox partial likelihood. A previous study showed that simple inclusion or exclusion of future cases in each risk set induces an under- or over-estimation bias in the regression parameters, respectively. We investigate the performance of a weighted Cox model that weights subjects according to age-conditional probabilities of developing the disease of interest in the source population. In a simulation study, the lifetime experience of a source population is first generated and a case-control study is then simulated within each population. Different characteristics of exposure are generated, including time-varying intensity. The results show that the estimates from the weighted Cox model are much less biased than the Cox models that simply include or exclude future cases, and are superior to logistic regression estimates in terms of bias and mean-squared error. An application to frequency-matched population-based case-control data on lung cancer illustrates similar differences in the estimated effects of different smoking variables. The investigated weighted Cox model is a potential alternative method to analyse matched or unmatched population-based case-control studies with time-dependent exposures.
A New Method for Estimating the Risk Ratio in Studies Using Case-Parental Control Design
American Journal of Epidemiology, 1998
The authors describe a new simple noniterative, yet efficient method to estimate the risk ratio in studies using case-parental control design. The new method is compared with two other noniterative methods, Khoury's method and Flanders and Khoury's method, and with a maximum likelihood-based method of Schaid and Sommer. The authors found that the variance of the new estimation method is usually smaller than that of Khoury's method or Flanders and Khoury's method and that it is slightly larger than that of the maximum likelihood-based method of Schaid and Sommer. Despite the slightly larger variance of the new estimator compared with that of the maximum likelihood-based method, the simplicity of the new estimator and its variance makes the new method appealing. When genotypic information for only one parent is available, the authors also describe a method to estimate the risk ratio without assuming Hardy-Weinberg equilibrium or random mating. A simple formula for the variance of the estimator is given.
Epidemiology, 2011
In occupational epidemiologic studies, the healthy-worker survivor effect refers to a process that leads to bias in the estimates of an association between cumulative exposure and a health outcome. In these settings, work status acts both as an intermediate and confounding variable, and may violate the positivity assumption (the presence of exposed and unexposed observations in all strata of the confounder). Using Monte Carlo simulation, we assess the degree to which crude, work-status-adjusted, and weighted (marginal structural) Cox proportional hazards models are biased in the presence of time-varying confounding and nonpositivity. We simulate data representing time-varying occupational exposure, work status, and mortality. Bias, coverage, and root mean squared error (MSE) were calculated relative to the true marginal exposure effect in a range of scenarios. For a base-case scenario, using crude, adjusted, and weighted Cox models, respectively, the hazard ratio was biased downward 19%, 9%, and 6%; 95% confidence interval coverage was 48%, 85%, and 91%; and root MSE was 0.20, 0.13, and 0.11. Although marginal structural models were less biased in most scenarios studied, neither standard nor marginal structural Cox proportional hazards models fully resolve the bias encountered under conditions of time-varying confounding and nonpositivity.
Pharmacoepidemiology and Drug Safety, 2019
Background: Epidemiological study reporting is improving but is not transparent enough for easy evaluation or replication. One barrier is insufficient details about design elements in published studies. Methods: Using a previously conducted drug safety evaluation in claims as a test case, we investigated the impact of small changes in five key design elements on risk estimation. These elements are index day of incident exposure's determination of look-back or follow-up periods, exposure duration algorithms, heparin exposure exclusion, propensity score model variables, and Cox proportional hazard model stratification. We covaried these elements using a fractional factorial design, resulting in 24 risk estimates for one outcome. We repeated eight of these combinations for two additional outcomes. We measured design effects on cohort sizes, follow-up time, and risk estimates. Results: Small changes in specifications of index day and exposure algorithm affected the risk estimation process the most. They affected cohort size on average by 8 to 10%, follow-up time by up to 31%, and magnitude of log hazard ratios by up to 0.22. Other elements affected cohort before matching or risk estimate's precision but not its magnitude. Any change in design substantially altered the matched control-group subjects in 1:1 matching. Conclusions: Exposure-related design elements require attention from investigators initiating, evaluating, or wishing to replicate a study or from analysts standardizing definitions. The methods we developed, using factorial design and mapping design effect on causal estimation process, are applicable to planning of sensitivity analyses in similar studies.
A Note on Risk Prediction for Case-Control Studies
2008
We introduce a new method for prediction in case-control study designs, which is a simple extension of the work by van der Laan (2008). Case-control samples are biased since the proportion of cases in the sample is not the same as the population of interest. The case-control weighting for prediction proposed in this paper relies on knowledge of the true incidence probability P(Y=1) to eliminate the bias of the sampling design. In many practical settings, case-control weighting will outperform an existing method for prediction, intercept adjustment.
On Estimation of the Hazard Function From Population-Based Case–Control Studies
Journal of the American Statistical Association, 2018
The population-based case-control study design has been widely used for studying the etiology of chronic diseases. It is well established that the Cox proportional hazards model can be adapted to the case-control study and hazard ratios can be estimated by (conditional) logistic regression model with time as either a matched set or a covariate (Prentice and Breslow, 1978). However, the baseline hazard function, a critical component in absolute risk assessment, is unidentifiable, because the ratio of cases and controls is controlled by the investigators and does not reflect the true disease incidence rate in the population. In this paper we propose a simple and innovative approach, which makes use of routinely collected family history information, to estimate the baseline hazard function for any logistic regression model that is fit to the risk factor data collected on cases and controls. We establish that the proposed baseline hazard function estimator is consistent and asymptotically normal and show via simulation that it performs well in finite samples. We illustrate the proposed method by a population-based case-control study of prostate cancer where the association of various risk factors is assessed and the family history information is used to estimate the baseline hazard function.
Self SG, Prentice RL. Asymptotic distribution theory and efficiency results for case-cohort studies
The case-cohort design is an efficient alternative to the full cohort design. When compared with the case-control study nested within the cohort, the case-cohort design has flexibility for a series of exploratory analyses because a single subcohort is employed to analyze multiple outcomes (1,2). This design feature is of particular importance in some specific types of research, including pharmacoepidemiology studies. For example, it can be used to evaluate the association between a single specific drug and multiple adverse events, of which the association with some of the events is often unknown or little understood at the beginning of the study. Nevertheless, the case-cohort design has not often been employed. One of the reasons hindering the wide use of the design may be the scarcity of the information essential for planning individual studies, including sample size calculation. Recently, Cai and Zeng (3,4) have presented a method for power/sample size calculation as a natural generalization of the log-rank test in the full cohort design. We show a simple sample size formula for the case-cohort design interpretable as the straightforward expansion of the conventional sample-size formula for the cohort study. $N^{\text{full}}$ denotes the sample size needed for the cohort study and $N_1^{\text{full}}$ ($N_0^{\text{full}}$) is the size of the exposed (unexposed) population in the full cohort, that is, $N^{\text{full}} = (1 + K) N_1^{\text{full}}$ where $K = N_0^{\text{full}} / N_1^{\text{full}}$. Let $RR$ be the relative risk, or the ratio of the risk (incidence proportion) in the exposed ($P_1$) to that in the unexposed ($P_0$) (ie, $RR = P_1 / P_0$), and let $P_D$ be the common estimate of the incidence proportion under the null hypothesis, defined as

$$P_D = \frac{N_1^{\text{full}} P_1 + N_0^{\text{full}} P_0}{N^{\text{full}}} = \frac{P_0 (RR + K)}{1 + K}.$$

Then, based on the conventional sample size formula for the cohort study,

$$N_1^{\text{full}} = \left( \frac{z_{\alpha/2} \sqrt{A} + z_{\beta} \sqrt{B}}{C} \right)^2,$$

where $z_c$ is the $(1 - c)$th standard normal quantile, $A = (1 + 1/K) P_D (1 - P_D)$, $B = RR \, P_0 (1 - RR \, P_0) + P_0 (1 - P_0)/K$, and $C = P_0 (RR - 1)$.
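The full-cohort step can be checked numerically with a minimal sketch. It reads the conventional formula as N_1_full = ((z_{alpha/2} sqrt(A) + z_beta sqrt(B)) / C)^2 with A, B, C, and P_D as defined above. The function name `full_cohort_size` and its argument names are illustrative assumptions, not part of the letter.

```python
from math import ceil, sqrt
from statistics import NormalDist


def full_cohort_size(p0, rr, k, alpha=0.05, beta=0.2):
    """Return (N_1_full, N_full, P_D), unrounded.

    p0    : incidence proportion in the unexposed (P_0)
    rr    : relative risk (RR = P_1 / P_0)
    k     : ratio of unexposed to exposed subjects (K)
    alpha : two-sided type I error rate
    beta  : type II error rate (power = 1 - beta)
    """
    z = NormalDist().inv_cdf            # standard normal quantile function
    z_alpha = z(1 - alpha / 2)          # z_{alpha/2}
    z_beta = z(1 - beta)                # z_{beta}
    p_d = p0 * (rr + k) / (1 + k)       # pooled incidence proportion P_D
    a = (1 + 1 / k) * p_d * (1 - p_d)
    b = rr * p0 * (1 - rr * p0) + p0 * (1 - p0) / k
    c = p0 * (rr - 1)
    n1_full = ((z_alpha * sqrt(a) + z_beta * sqrt(b)) / c) ** 2
    return n1_full, (1 + k) * n1_full, p_d


# Parameter values taken from the letter's worked example.
n1_full, n_full, p_d = full_cohort_size(p0=0.001, rr=4, k=3)
print(ceil(n_full), p_d)
```

With these inputs the full-cohort size rounds up to 9,986, which is half of the N = 19,972 the letter quotes for m = 1.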
Using $m$, the ratio of the subcohort to cases in the entire cohort, the entire size of the case-cohort study, $N$, is simply formulated as

$$N = \left( 1 + \frac{1}{m} \right) N^{\text{full}}.$$

Of note, $m$ should be assigned by the researcher who is planning the study. A simulation study using a model subject to time-to-event analysis (5) revealed that the proposed sample size yielded a satisfactory empirical power and empirical type I error rate. For a single event, the number of subjects for whom detailed information on covariates is collected (ie, subcohort members and/or cases), defined as $n_{\text{detail}}$, is the smallest when $m = 1$; however, for multiple events, $n_{\text{detail}}$ is the smallest when $m$ is larger than 1. In general, with a larger $m$, the size of the entire cohort $N$ is closer to $N^{\text{full}}$ but $n_{\text{detail}}$ is larger. To achieve a good balance between $N$ and $n_{\text{detail}}$, $m = 3$ to 5 may be adopted on many occasions. For example, $(N, n_{\text{detail}}) = (19{,}972, 70)$ and $(11{,}984, 126)$ for $m = 1$ and 5, respectively, when $(P_0, RR, K, \alpha, \beta) = (0.001, 4, 3, 0.05, 0.2)$. In actual situations, if the estimation for all or some of the covariates is quite costly, the value of $n_{\text{detail}}$ may be minimized by adjusting $m$ within available resources. Details on the derivation of the formula and the simulation are available in the eAppendix.

REFERENCES
1. Kupper LL, McMichael AJ, Spirtas R. A hybrid epidemiologic study design useful in estimating relative risk. J Am Stat Assoc. 1975;70:524-528.
2. Langholz B, Thomas DC. Nested case-control and case-cohort sampling from a cohort: a critical comparison. Am J Epidemiol. 1990;131:169-176.
3. Cai J, Zeng D. Sample size/power calculation for case-cohort studies.
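The trade-off between N and n_detail in the letter's example can be sketched as follows. This reads the case-cohort formula as N = (1 + 1/m) N_full. It further assumes that n_detail is the expected number of cases (P_D times N, rounded up) plus a subcohort m times that size, ignoring any overlap between the two. That counting rule is an assumption chosen because it reproduces the quoted figures. It is not stated in the letter. The function name `case_cohort_plan` is hypothetical, and the inputs 9,986 and 0.00175 are the rounded N_full and P_D implied by the example parameters.

```python
from math import ceil


def case_cohort_plan(n_full, p_d, m):
    """Return (N, n_detail) for a subcohort-to-case ratio m.

    n_full : full-cohort sample size, already rounded up
    p_d    : overall incidence proportion P_D
    m      : ratio of subcohort size to number of cases
    """
    # N = (1 + 1/m) * N_full, written as (m + 1) / m and rounded up.
    n_total = ceil(n_full * (m + 1) / m)
    # Assumption: expected cases = P_D * N, rounded up.
    expected_cases = ceil(p_d * n_total)
    # Assumption: subcohort is m times the cases and disjoint from them.
    subcohort = ceil(m * expected_cases)
    return n_total, expected_cases + subcohort


for m in (1, 5):
    print(m, case_cohort_plan(9986, 0.00175, m))
```

Under these assumptions the sketch returns (19972, 70) for m = 1 and (11984, 126) for m = 5, matching the example in the letter.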