Conventional case-cohort design and analysis for studies of interaction
Related papers
Exposure stratified case-cohort designs
1998
A variant of the case-cohort design is proposed for the situation in which a correlate of the exposure (or prognostic factor) of interest is available for all cohort members, and exposure information is to be collected for a case-cohort sample. The cohort is stratified according to the correlate, and the subcohort is selected by stratified random sampling. A number of possible methods for the analysis of such exposure stratified case-cohort samples are presented and some of their statistical properties developed. The bias and efficiency of the methods are compared to each other, and to randomly sampled case-cohort studies, in a limited computer simulation study. We found that all of the proposed analysis methods performed reasonably well and were more efficient than a randomly sampled case-cohort sample. We conclude that these methods are well suited for the "clinical trials setting" in which subjects enter the study at time zero (at diagnosis or treatment) and a correlate of an expensive prognostic factor is collected for all study subjects at the time of entry to the study. In such studies, a correlate stratified subcohort can be much more cost-efficient for investigation of the expensive prognostic factor than a randomly sampled subcohort.
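The sampling step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' procedure: the cohort size, the two correlate categories ("high", "low"), the per-stratum sample sizes, the case ids, and the function names are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_subcohort(cohort, stratum_of, n_per_stratum, seed=0):
    """Draw a subcohort by simple random sampling within each stratum.

    cohort        -- list of subject ids
    stratum_of    -- dict: id -> stratum label (category of the correlate)
    n_per_stratum -- dict: label -> number of subjects to sample
    Returns the subcohort ids and the sampling fraction in each stratum.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for sid in cohort:
        strata[stratum_of[sid]].append(sid)
    subcohort, fractions = [], {}
    for label, members in sorted(strata.items()):
        k = min(n_per_stratum.get(label, 0), len(members))
        subcohort.extend(rng.sample(members, k))
        fractions[label] = k / len(members)
    return subcohort, fractions

def case_cohort_sample(subcohort, cases):
    """Subjects needing exposure measurement: the subcohort plus all cases."""
    return sorted(set(subcohort) | set(cases))

# Hypothetical cohort: 1000 subjects, 100 with a "high" correlate value.
cohort = list(range(1000))
stratum_of = {sid: ("high" if sid < 100 else "low") for sid in cohort}
cases = [3, 57, 250, 731, 999]  # made-up case ids

subcohort, fractions = stratified_subcohort(
    cohort, stratum_of, {"high": 50, "low": 50}
)
sample = case_cohort_sample(subcohort, cases)
print(len(subcohort), fractions["high"], round(fractions["low"], 4))
```

Oversampling the rarer "high" stratum is the design choice here: the stratum-specific sampling fractions are returned because any subsequent weighted analysis would need them.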
Analysis of Case-Cohort Designs
Journal of Clinical Epidemiology, 1999
The case-cohort design is most useful in analyzing time to failure in a large cohort in which failure is rare. Covariate information is collected from all failures and a representative sample of censored observations. Sampling is done without respect to time or disease status, and, therefore, the design is more flexible than a nested case-control design. Despite the efficiency of the methods, case-cohort designs are not often used because of perceived analytic complexity. In this article, we illustrate computation of a simple variance estimator and discuss model fitting techniques in SAS. Three different weighting methods are considered. Model fitting is demonstrated in an occupational exposure study of nickel refinery workers. The design is compared to a nested case-control design with respect to analysis and efficiency in a small simulation. In this example, case-cohort sampling from the full cohort was more efficient than using a comparable nested case-control design.
Pharmacoepidemiology and Drug Safety, 2013
Purpose: Instrumental variable (IV) analysis is becoming increasingly popular to adjust for confounding in observational pharmacoepidemiologic research. One of the prerequisites of an IV is that it is strongly associated with exposure; if it is weakly associated with exposure, IV estimates are reported to be biased. We aimed to assess the performance of IV estimates in various (pharmaco)epidemiologic settings. Methods: Data were simulated for continuous/binary exposure, outcome and IV in cohort and nested case-control (NCC) designs with different incidences of the outcome. Pearson's correlation, point-biserial correlation, odds ratio (OR), and F-statistic were used to assess the IV-exposure association. Two-stage analysis was performed to estimate the exposure effect. Results: For all types of IV and exposure in the cohort and NCC designs, IV estimates were extremely unstable and biased when the IV was very weakly associated with exposure (e.g. Pearson's correlation < 0.15 for continuous or OR < 2.0 for binary IV and exposure; although specific cutoff values depend on simulation settings). For stronger IVs, estimates were unbiased and became less variable compared with weaker IVs in the case of continuous and binary (risk difference scale) outcomes. For a similar IV-exposure association (e.g. OR = 1.4 and 5% incidence of the outcome), the variability of the estimates was more pronounced in the NCC (standard deviation = 2.37, case : control = 1:5) compared with the cohort design (standard deviation = 1.14). The variability was even more pronounced for rare (≤1%) outcomes. However, IV estimates from the NCC design became less variable with an increasing number of controls per case. Moreover, estimates were biased when the IV was related to confounders even with strong IVs. Conclusions: Instrumental variable analysis performs poorly when the IV-exposure association is extremely weak, especially in the NCC design. IV estimates in the NCC design become less variable when the number of controls increases. As NCC does not use the entire cohort, in order to achieve stable estimates, this design requires a stronger IV-exposure association than the cohort design.
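A small simulation can illustrate why a weak instrument destabilizes two-stage estimates. The sketch below covers only the simplest case (continuous instrument, exposure, and outcome in a full cohort, with one unmeasured confounder) and is not a reproduction of the study's simulations. The data-generating model, sample size, number of replicates, true effect of 0.5, and all function names are assumptions chosen for illustration.

```python
import random
import statistics

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

def two_stage_estimate(z, x, y):
    """Two-stage estimate of the effect of x on y using one instrument z.

    Stage 1 regresses x on z; stage 2 regresses y on the fitted x.
    Also returns the first-stage F-statistic for the instrument.
    """
    n = len(z)
    slope = cov(z, x) / cov(z, z)                 # stage 1 slope
    mz, mx = statistics.fmean(z), statistics.fmean(x)
    x_hat = [mx + slope * (zi - mz) for zi in z]  # fitted exposure
    beta = cov(x_hat, y) / cov(x_hat, x_hat)      # stage 2 slope
    r2 = cov(z, x) ** 2 / (cov(z, z) * cov(x, x))
    f_stat = r2 * (n - 2) / (1 - r2)
    return beta, f_stat

def simulate(strength, n=500, reps=200, beta=0.5, seed=1):
    """Repeat the two-stage analysis on simulated confounded data."""
    rng = random.Random(seed)
    estimates, f_stats = [], []
    for _ in range(reps):
        z = [rng.gauss(0, 1) for _ in range(n)]
        u = [rng.gauss(0, 1) for _ in range(n)]   # unmeasured confounder
        x = [strength * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
        y = [beta * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]
        b, f = two_stage_estimate(z, x, y)
        estimates.append(b)
        f_stats.append(f)
    return estimates, f_stats

strong_est, strong_f = simulate(strength=1.0)
weak_est, weak_f = simulate(strength=0.05)
for name, est, f in (("strong", strong_est, strong_f), ("weak", weak_est, weak_f)):
    print(name, "median F:", round(statistics.median(f), 1),
          "SD of estimates:", round(statistics.stdev(est), 3))
```

With a single instrument the two-stage slope equals the ratio cov(Z, Y) / cov(Z, X), so a first-stage covariance near zero puts a noisy quantity in the denominator, which is the mechanism behind the instability reported above.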
2017
Logistic regression is often used instead of Cox regression to analyse genome-wide association studies (GWAS) of single-nucleotide polymorphisms (SNPs) and disease outcomes with cohort and case-cohort designs, as it is less computationally expensive. Although Cox and logistic regression models have been compared previously in cohort studies, this work does not completely cover the GWAS setting nor extend to the case-cohort study design. Here, we evaluated Cox and logistic regression applied to cohort and case-cohort genetic association studies using simulated data and genetic data from the EPIC-CVD study. In the cohort setting, there was a modest improvement in power to detect SNP–disease associations using Cox regression compared with logistic regression, which increased as the disease incidence increased. In contrast, logistic regression had more power than (Prentice weighted) Cox regression in the case-cohort setting. Logistic regression yielded inflated effect estimates (assumin...
Using cohort studies in lifecourse epidemiology
Public Health, 2012
Keywords: Population science; Cohort studies. Summary: The UK Medical Research Council (MRC) Population Health Sciences Research Network is a network of MRC research units and centres that aims to bring together and add value to existing MRC investment in public health, health services and epidemiological research. This symposium held in August 2011 at the World Congress of Epidemiology, Edinburgh, discussed a range of topics including methodology and analytical issues based on a number of examples of cohort studies within the context of lifecourse epidemiology.
Estimating Interaction Between Genetic and Environmental Risk Factors
Epidemiology, 2008
Large prospective cohorts originally assembled to study environmental risk factors are increasingly exploited to study gene-environment interactions. Given the cost of genetic studies in large samples, being able to select a subsample for genotyping that contains most of the information from the cohort would lead to substantial savings. We consider nested case-control and case-cohort sampling designs with and without stratification and compare their efficiency relative to the entire cohort for estimating the effects of genetic and environmental risk factors and their interactions. Asymptotic calculations show that the relative efficiency of the case-cohort and nested case-control designs implementing the same sampling stratification are similar over a range of scenarios for the relationships among genes, environmental exposures, and disease status. Sampling equal numbers of exposed and unexposed subjects improves efficiency when the exposure is rare. The case-cohort designs had a slight advantage in simulations of sampling designs within the Framingham Offspring Study, using the interaction between apolipoprotein E and smoking on the risk of coronary heart disease as an example. It was possible to estimate the interaction effect with precision close to that of the full cohort when using case-cohort or nested case-control samples containing fewer than half the subjects of the cohort.
Population Stratification Bias in the Case-Only Study for Gene-Environment Interactions
American Journal of Epidemiology, 2008
The case-only study is a convenient approach and provides increased statistical efficiency in detecting gene-environment interactions. The validity of a case-only study hinges on one well-recognized assumption: The susceptibility genotypes and the environmental exposures of interest are independent in the population. Otherwise, the study will be biased. The authors show that hidden stratification in the study population could also ruin a case-only study. They derive the formulas for population stratification bias. The bias involves three terms: 1) the coefficient of variation of the exposure prevalence odds, 2) the coefficient of variation of the genotype frequency odds, and 3) the correlation coefficient between the exposure prevalence odds and the genotype frequency odds. The authors perform simulations to investigate the magnitude of bias over a wide range of realistic scenarios. It is found that the estimated interaction effect is frequently biased by more than 5%. For a rarer gene and a rarer exposure, the bias becomes even larger (>30%). Because of the potentially large bias, researchers conducting case-only studies should use the boundary formula presented in this paper to make more prudent interpretations of their results, or they should use stratified analysis or a modeling approach to adjust for population stratification bias in their studies. Keywords: bias (epidemiology); data interpretation, statistical; environment; epidemiologic methods; genetics. Abbreviations: CIR, confounding interaction ratio; CV, coefficient of variation; RR, relative risk.
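The effect of hidden stratification can be shown with a toy calculation. In a case-only analysis, the multiplicative interaction is estimated by the genotype-exposure odds ratio among cases. The sketch below uses made-up case counts for two hypothetical subpopulations, each with a within-stratum odds ratio of exactly 1, and shows that pooling them produces a spurious interaction. It does not implement the bias or boundary formulas from the paper; the counts and function name are assumptions.

```python
def case_only_or(a, b, c, d):
    """Genotype-exposure odds ratio among cases.

    a -- exposed carriers        b -- exposed non-carriers
    c -- unexposed carriers      d -- unexposed non-carriers
    """
    return (a * d) / (b * c)

# Hypothetical case counts (1000 cases per subpopulation).
# Subpopulation 1: carrier frequency 0.1, exposure prevalence 0.2.
stratum_1 = (20, 180, 80, 720)
# Subpopulation 2: carrier frequency 0.4, exposure prevalence 0.6.
stratum_2 = (240, 360, 160, 240)

or_1 = case_only_or(*stratum_1)
or_2 = case_only_or(*stratum_2)

# Ignoring the subpopulations means adding the tables cell by cell.
pooled = tuple(x + y for x, y in zip(stratum_1, stratum_2))
or_pooled = case_only_or(*pooled)

print(or_1, or_2, round(or_pooled, 3))  # → 1.0 1.0 1.926
```

Genotype and exposure are independent within each subpopulation, yet both are more common in the second one, so they are correlated in the pooled cases. A stratified analysis, as the abstract recommends, would recover the within-stratum value of 1.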
Analysis of case-cohort data: A comparison of different methods
Journal of Clinical Epidemiology, 2007
Objective: The case-cohort design combines the advantages of a prospective cohort study and the efficiency of a case-control design. Usually a Cox proportional-hazards model is used for the analyses. However, adaptation of the model is necessary because of the sampling. We compared three methods that were proposed in the literature, which differ in weighting of study subjects: Prentice's, Barlow's, and Self and Prentice's method.
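The difference between the three weighting schemes can be sketched as a lookup of the weight a subject receives in the risk set of the weighted Cox pseudo-likelihood. The weights below follow the way these schemes are commonly summarized in the case-cohort literature and should be checked against the original papers before use. The function name, argument names, and the subcohort sampling fraction `alpha` are assumptions introduced for illustration.

```python
METHODS = ("prentice", "self_prentice", "barlow")

def risk_set_weight(method, in_subcohort, is_case, at_failure, alpha):
    """Weight of one subject in a risk set under a given scheme.

    method       -- one of METHODS
    in_subcohort -- True if the subject was sampled into the subcohort
    is_case      -- True if the subject eventually fails
    at_failure   -- True when evaluating the subject's own failure time
    alpha        -- subcohort sampling fraction (used by Barlow's weights)
    """
    if method not in METHODS:
        raise ValueError(f"unknown method: {method}")
    if not in_subcohort:
        if not is_case:
            return 0.0  # never sampled, contributes nothing
        if not at_failure:
            return 0.0  # outside cases are not at risk before failing
        # Outside cases enter the risk set only at their own failure time,
        # except under Self and Prentice, where they stay out of it.
        return 0.0 if method == "self_prentice" else 1.0
    if method == "barlow":
        # Subcohort members stand in for 1/alpha cohort members,
        # except a case at its own failure time, which counts once.
        return 1.0 if (is_case and at_failure) else 1.0 / alpha
    return 1.0  # Prentice, Self and Prentice: subcohort members count once

for m in METHODS:
    print(m,
          risk_set_weight(m, True, False, False, alpha=0.1),   # subcohort non-case
          risk_set_weight(m, False, True, True, alpha=0.1))    # outside case at failure
```

In this summary the schemes differ in only two places: whether subcohort members are inflated by the inverse sampling fraction, and whether a case outside the subcohort appears in the risk set at its own failure time.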