Design Options for Molecular Epidemiology Research within Cohort Studies

The value of reusing prior nested case-control data in new studies with different outcome

Statistics in Medicine, 2012

Many epidemiological studies use a nested case-control (NCC) design to reduce cost while maintaining study power. However, because of the incidence density sampling used, reusing data from NCC studies for the analysis of secondary outcomes is not straightforward. Recent methodological developments have opened the possibility for prior NCC data to be used to complement controls in a current study, thereby improving study efficiency. However, practical guidelines on the effectiveness of prior data relative to newly sampled subjects and on the potential power gains are still lacking. Using simulated cohorts, we show in this paper how the efficiency of NCC studies that use a mixture of prior and newly sampled subjects depends on the number of newly sampled controls and prior subjects, as well as on the overlap in the distributions of the matching variables. We explore the feasibility and efficiency of a current study that gathers no controls, relying instead on prior data. Using the concept of the effective number of controls, we show how researchers can assess the potential power gains from reusing prior data. We apply the method to analyses of anorexia and contralateral breast cancer in the Swedish population and show how power calculations can be done using publicly available software. This work has important applications in genetic and molecular epidemiology, where it can make optimal use of costly exposure measurements.
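To see why prior NCC controls can be reused at all, it helps to look at one well-known development, Samuelsen's inclusion-probability weighting; the abstract does not say which estimator the authors build on, so treat this as an assumed illustration. A non-case's probability of ever being sampled as a control is 1 minus the product, over the case times at which that subject was at risk, of (1 - m/(n(t) - 1)), where n(t) is the number at risk and m the number of controls per case; cases are included with probability 1. The `Subject` record and the risk-set convention `entry < t <= exit` are hypothetical choices for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    # Hypothetical cohort record: follow-up runs from entry to exit.
    entry: float
    exit: float
    is_case: bool


def at_risk(s: Subject, t: float) -> bool:
    # Assumed risk-set convention: under follow-up just before time t.
    return s.entry < t <= s.exit


def inclusion_probabilities(cohort: list[Subject], m: int) -> list[float]:
    """Probability that each subject appears in the NCC data, given m controls per case."""
    case_times = [s.exit for s in cohort if s.is_case]
    probs = []
    for s in cohort:
        if s.is_case:
            probs.append(1.0)  # every case is included with certainty
            continue
        p_never = 1.0
        for t in case_times:
            if at_risk(s, t):
                n_t = sum(at_risk(o, t) for o in cohort)
                # m of the n_t - 1 non-index subjects are drawn; cap at 1 for small risk sets
                p_never *= 1.0 - min(1.0, m / (n_t - 1))
        probs.append(1.0 - p_never)
    return probs


cohort = [
    Subject(0.0, 1.0, True),   # case at t = 1
    Subject(0.0, 2.0, False),  # at risk at t = 1
    Subject(0.0, 3.0, False),  # at risk at t = 1
    Subject(0.0, 0.5, False),  # left before t = 1, never eligible
]
print(inclusion_probabilities(cohort, m=1))  # [1.0, 0.5, 0.5, 0.0]
```

The inverse probabilities 1/p can then serve as weights in a weighted Cox regression for a new outcome; subjects with p = 0 were never at risk at any case time, so they could not have been sampled.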

Two-Stage Designs in Case-Control Association Analysis

Genetics, 2006

DNA pooling is a cost-effective approach for collecting information on marker allele frequency in genetic studies. It is often suggested as a screening tool to identify a subset of candidate markers from a very large number of markers to be followed up by more accurate and informative individual genotyping. In this article, we investigate several statistical properties and design issues related to this two-stage design, including the selection of the candidate markers for second-stage analysis, statistical power of this design, and the probability that truly disease-associated markers are ranked among the top after second-stage analysis. We have derived analytical results on the proportion of markers to be selected for second-stage analysis. For example, to detect disease-associated markers with an allele frequency difference of 0.05 between the cases and controls through an initial sample of 1000 cases and 1000 controls, our results suggest that when the measurement errors are smal...
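A minimal sketch of the first-stage screen, assuming a hypothetical model in which each pooled allele-frequency estimate equals the true frequency plus Gaussian measurement error with standard deviation `error_sd`, and markers are ranked by the absolute case-control difference. The function name, parameters and error model are illustrative and are not taken from the article, whose analytical selection results are not reproduced here.

```python
import random


def stage1_select(freq_cases, freq_controls, fraction, error_sd, rng):
    """Rank markers by pooled case-control allele-frequency difference; keep the top fraction.

    Hypothetical error model: each pooled estimate = true frequency + N(0, error_sd^2).
    """
    scores = []
    for i, (p_case, p_ctrl) in enumerate(zip(freq_cases, freq_controls)):
        est_case = p_case + rng.gauss(0.0, error_sd)
        est_ctrl = p_ctrl + rng.gauss(0.0, error_sd)
        scores.append((abs(est_case - est_ctrl), i))
    n_keep = max(1, round(fraction * len(scores)))
    scores.sort(reverse=True)  # largest observed difference first
    # Indices of markers carried forward to individual genotyping (stage 2)
    return sorted(i for _, i in scores[:n_keep])


rng = random.Random(2006)
cases = [0.30, 0.40, 0.31, 0.35]
controls = [0.30, 0.30, 0.30, 0.30]
print(stage1_select(cases, controls, fraction=0.5, error_sd=0.0, rng=rng))  # [1, 3]
```

Re-running the screen with many seeds and a nonzero `error_sd`, and counting how often a truly associated marker lands in the selected set, gives a Monte Carlo estimate of the kind of ranking probability the abstract studies analytically.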

Analysis of Case-Cohort Designs

Journal of Clinical Epidemiology, 1999

The case-cohort design is most useful in analyzing time to failure in a large cohort in which failure is rare. Covariate information is collected from all failures and a representative sample of censored observations. Sampling is done without respect to time or disease status, and, therefore, the design is more flexible than a nested case-control design. Despite the efficiency of the methods, case-cohort designs are not often used because of perceived analytic complexity. In this article, we illustrate computation of a simple variance estimator and discuss model fitting techniques in SAS. Three different weighting methods are considered. Model fitting is demonstrated in an occupational exposure study of nickel refinery workers. The design is compared to a nested case-control design with respect to analysis and efficiency in a small simulation. In this example, case-cohort sampling from the full cohort was more efficient than using a comparable nested case-control design.
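As a rough illustration of the sampling scheme only, the sketch below draws the subcohort by simple random sampling without reference to time or disease status, then adds every case. The weighting shown (weight 1 for cases, 1/`sampling_fraction` for non-case subcohort members) is a simple assumed choice for illustration and is not meant to reproduce any of the three weighting methods the article compares; the function name is hypothetical.

```python
import random


def case_cohort_sample(case_status, sampling_fraction, rng):
    """Draw a random subcohort, then add all cases outside it.

    Returns (subcohort, selected, weights). The weights are a simple illustrative
    choice (assumption): 1 for cases, 1 / sampling_fraction for non-case subcohort members.
    """
    n = len(case_status)
    k = round(sampling_fraction * n)
    # Subcohort drawn without regard to time or disease status
    subcohort = set(rng.sample(range(n), k))
    cases = {i for i, c in enumerate(case_status) if c}
    selected = sorted(subcohort | cases)
    weights = {i: 1.0 if case_status[i] else 1.0 / sampling_fraction for i in selected}
    return sorted(subcohort), selected, weights


status = [False] * 10
status[2] = status[7] = True  # two failures in a cohort of 10
sub, sel, w = case_cohort_sample(status, 0.3, random.Random(1999))
print(sub, sel)
```

Because the subcohort is chosen before anyone's outcome is known, the same subcohort can serve as the comparison group for several different outcomes, which is the flexibility the abstract contrasts with nested case-control sampling.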

A Simulation Study of Relative Efficiency and Bias in the Nested Case–Control Study Design

Epidemiologic Methods, 2013

Purpose: The nested case-control study design, in which a fixed number of controls is matched to each case, is often used to analyze exposure-response associations within a cohort. It has become common practice to sample four or five controls per case; however, previous research has shown that in certain instances, significant gains in relative efficiency can be realized when more controls are matched to each case. This study expanded on that work and investigated the effect of (i) the number of cases, (ii) the strength of the exposure-response relation, and (iii) the skewness of the exposure distribution on the bias and relative efficiency of the conditional likelihood estimator from a nested case-control study. Methods: Cohorts were simulated and analyzed using conditional logistic regression. Results: The relative efficiency decreased and bias away from the null increased as the true exposure-response parameter and the skewness of the exposure distribution within the risk sets increased. These effects were more pronounced when the number of cases in the cohort was small. Conclusions: Gains in relative efficiency and a reduction in bias can be realized by sampling more than the four or five controls per case that are generally used, especially when there are few cases, a strong exposure-response relation, and a skewed exposure variable.
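For orientation, a commonly cited approximation gives the asymptotic efficiency of 1:m control sampling relative to the full cohort as m/(m + 1) when the exposure has no effect. The short sketch below tabulates it. Because the abstract reports that efficiency falls as the true exposure-response parameter and the exposure skewness increase, these null-hypothesis values should be read as an optimistic baseline, not as the study's results.

```python
from fractions import Fraction


def null_relative_efficiency(m: int) -> Fraction:
    """Approximate efficiency of 1:m nested case-control sampling vs the full cohort,
    valid only when the exposure has no effect (null hypothesis)."""
    return Fraction(m, m + 1)


for m in (1, 2, 4, 5, 10, 20):
    print(f"{m:>2} controls per case: {float(null_relative_efficiency(m)):.3f}")
```

Under the null, moving from 4 to 5 controls changes the figure only from 0.800 to 0.833, and 10 controls give 0.909. The abstract's point is that the gain from additional controls can be larger than these null figures suggest when there are few cases, a strong exposure-response relation, or a skewed exposure.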

Sequential Analysis of Longitudinal Data in a Prospective Nested Case-Control Study

Biometrics, 2010

The nested case-control design is a relatively new type of observational study in which a case-control approach is employed within an established cohort. In this design, we observe cases and controls longitudinally, sampling all cases whenever they occur but sampling controls only at certain time points. Controls can be obtained at time points that are randomly scheduled or prefixed for operational convenience. This design with longitudinal observations is efficient in terms of cost and duration, especially when the disease is rare and the assessment of exposure levels is difficult. In our design, we propose sequential sampling methods and study both (group) sequential testing and estimation methods so that the study can be stopped as soon as the stopping rule is satisfied. To make such longitudinal sampling more efficient in terms of both the number of subjects and the number of replications, we propose applying sequential sampling methods to subjects and replications simultaneously until the information criterion is fulfilled. This simultaneous sequential sampling on subjects and replicates is more flexible for practitioners designing their sampling schemes and differs from the classical approaches used in longitudinal studies. We define a new σ-field to accommodate the proposed sampling scheme, which contains mixtures of independent and correlated observations, and prove the asymptotic optimality of sequential estimation using martingale theory. We also prove that the independent increment structure is retained, so that the group sequential method is applicable. Finally, we present results of sequential estimation and group sequential testing on both simulated data and real data on children's diarrhea.

Exposure stratified case-cohort designs

1998

A variant of the case-cohort design is proposed for the situation in which a correlate of the exposure (or prognostic factor) of interest is available for all cohort members, and exposure information is to be collected for a case-cohort sample. The cohort is stratified according to the correlate, and the subcohort is selected by stratified random sampling. A number of possible methods for the analysis of such exposure stratified case-cohort samples are presented and some of their statistical properties developed. The bias and efficiency of the methods are compared to each other, and to randomly sampled case-cohort studies, in a limited computer simulation study. We found that all of the proposed analysis methods performed reasonably well and were more efficient than a randomly sampled case-cohort sample. We conclude that these methods are well suited for the "clinical trials setting" in which subjects enter the study at time zero (at diagnosis or treatment) and a correlate of an expensive prognostic factor is collected for all study subjects at the time of entry to the study. In such studies, a correlate stratified subcohort can be much more cost-efficient for investigation of the expensive prognostic factor than a randomly sampled subcohort.
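A minimal sketch of the subcohort selection step, assuming the exposure correlate has been grouped into discrete strata (the labels and sampling fractions below are hypothetical). Each stratum is sampled at its own fraction, and each sampled member gets a design weight equal to the inverse of its stratum's realized sampling fraction, a Horvitz-Thompson-style choice made here for illustration rather than one of the article's analysis methods.

```python
import random
from collections import defaultdict


def stratified_subcohort(strata, fractions, rng):
    """Select a subcohort by stratified random sampling on an exposure correlate.

    strata: stratum label for each cohort member (hypothetical labels).
    fractions: sampling fraction per stratum label.
    Returns (sorted subcohort indices, {index: design weight}).
    """
    members = defaultdict(list)
    for i, label in enumerate(strata):
        members[label].append(i)
    subcohort, weights = [], {}
    for label, idx in members.items():
        k = round(fractions[label] * len(idx))
        chosen = rng.sample(idx, k)
        for i in chosen:
            # Inverse of the realized sampling fraction in this stratum
            weights[i] = len(idx) / k
        subcohort.extend(chosen)
    return sorted(subcohort), weights


strata = ["high"] * 20 + ["low"] * 80  # correlate suggests high vs low exposure
sub, w = stratified_subcohort(strata, {"high": 0.5, "low": 0.1}, random.Random(1998))
print(len(sub), sum(w.values()))  # 18 100.0
```

With 20 "high" and 80 "low" members sampled at 0.5 and 0.1, the subcohort has 10 + 8 = 18 members with weights 2 and 10, and the weights sum back to the cohort size of 100. Oversampling the stratum where the correlate points to high exposure is what lets such a subcohort carry more information about the expensive factor than a simple random subcohort of the same size.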

Study designs in biomarker research

The European Research Journal, 2017

Owing to advances in technology, science today faces a large variety of biomarkers. Selecting an appropriate study design for biomarker research, dealing with large numbers of biomarkers or with multiple biomarkers, and evaluating the usefulness of a new biomarker have therefore become more complicated. The current study is an overview of the issues discussed in biomarker studies.

Randomized Clinical Trials With Biomarkers: Design Issues

JNCI Journal of the National Cancer Institute, 2010

Establishing the clinical relevance of a biomarker test for guiding therapy decisions requires demonstrating that it can classify patients into distinct subgroups with different recommended management. Conventional randomized clinical trials (RCTs) with no biomarker evaluation allow estimation only of the average treatment effect in the overall study population; therefore, alternative designs must be considered to evaluate biomarker-guided therapy. We discuss three main types of biomarker RCT designs: biomarker-stratified designs, enrichment designs, and biomarker-strategy designs (3-8) (Figure 1). We assume throughout this presentation that the biomarker test to be evaluated in the RCT is fully specified and can effectively be treated as though it were a single measure. Additionally, we assume that discrete categories for the biomarker have been previously identified (eg, the cutoff value has been determined for a continuous biomarker to classify patients as biomarker-positive vs ...
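To make the three design types concrete, the sketch below assigns a list of patients with known biomarker status under each design. It assumes 1:1 randomization, permuted blocks of size 2 within each biomarker stratum for the stratified design, and, for the strategy design, that the non-guided arm receives the control therapy; real trials vary on all of these points, and the function and arm names are hypothetical.

```python
import random


def randomize(design, markers, rng):
    """Treatment assignment for patients with known biomarker status (True = positive).

    Returns one entry per patient: "new", "control", or None if not enrolled.
    Hypothetical simplifications: 1:1 allocation; permuted blocks of 2 within
    each stratum for the stratified design; control therapy in the unguided arm.
    """
    arms = [None] * len(markers)
    if design == "stratified":
        # Everyone is randomized; randomization is done separately in each stratum
        for stratum in (True, False):
            idx = [i for i, m in enumerate(markers) if m == stratum]
            for start in range(0, len(idx), 2):
                block = ["new", "control"]
                rng.shuffle(block)
                for i, arm in zip(idx[start:start + 2], block):
                    arms[i] = arm
    elif design == "enrichment":
        # Only biomarker-positive patients are enrolled and randomized
        for i, m in enumerate(markers):
            if m:
                arms[i] = rng.choice(["new", "control"])
    elif design == "strategy":
        # Randomize to a strategy, not directly to a treatment
        for i, m in enumerate(markers):
            if rng.random() < 0.5:  # biomarker-guided arm: marker decides therapy
                arms[i] = "new" if m else "control"
            else:  # non-guided arm (assumed here to receive control therapy)
                arms[i] = "control"
    else:
        raise ValueError(f"unknown design: {design}")
    return arms


markers = [True, False, True, False, True, False]
for design in ("stratified", "enrichment", "strategy"):
    print(design, randomize(design, markers, random.Random(2010)))
```

The contrast is in who is randomized and to what: the stratified design randomizes everyone to treatments within marker strata, the enrichment design randomizes only marker-positive patients, and the strategy design randomizes patients to a decision rule rather than to a treatment.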

Programming challenges of sampling controls to cases from the dynamic risk sets in nested case–control studies

Pharmaceutical Programming, 2012

Pharmacoepidemiological studies based on the cohort design are simpler to analyse, and their results are easier to interpret. However, such analyses may not reflect real-life drug use, the capture of which is a major strength of these studies. The nested case-control design is often used instead to avoid the computational burden associated with time-dependent explanatory variables. Unlike the classical case-control design, which is generally easy to programme, the nested case-control design can pose a number of challenges: subjects can be chosen as controls more than once, and a subject who is chosen as a control can later become a case. This is because controls are chosen from among those in the cohort who are at risk of the event at the time of each case (i.e., they are sampled from the risk set defined by the case). We highlight the main programming challenges of the design and describe and demonstrate approaches for their resolution and appropriate implementation.
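The core of the sampling problem described above can be sketched in a few lines. This is a minimal illustration under assumed conventions, not the article's own program: a subject is at risk at time t if `entry < t <= exit`, controls are drawn without replacement within each risk set, and sampling across risk sets is independent, so the same subject may serve as a control for several cases and a future case may be a control for an earlier one. The `Subject` record and function name are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Subject:
    sid: int
    entry: float
    exit: float     # event time for cases, censoring time otherwise
    is_case: bool


def sample_risk_sets(cohort, m, rng):
    """Incidence density sampling: m controls per case from the risk set at the case's event time.

    Returns a list of (case_id, [control_ids]) matched sets, in order of event time.
    """
    matched = []
    cases = sorted((s for s in cohort if s.is_case), key=lambda s: s.exit)
    for case in cases:
        t = case.exit
        # Risk set: everyone still under follow-up at t, except the case itself.
        # It deliberately includes future cases and subjects already used as controls.
        eligible = [s.sid for s in cohort
                    if s.sid != case.sid and s.entry < t <= s.exit]
        controls = rng.sample(eligible, min(m, len(eligible)))
        matched.append((case.sid, controls))
    return matched


cohort = [
    Subject(1, 0.0, 2.0, True),
    Subject(2, 0.0, 5.0, True),    # later case, eligible as a control at t = 2
    Subject(3, 0.0, 1.0, False),   # censored before t = 2
    Subject(4, 0.0, 10.0, False),  # eligible at both case times
]
for case_id, controls in sample_risk_sets(cohort, m=2, rng=random.Random(2012)):
    print(case_id, sorted(controls))
# prints "1 [2, 4]" then "2 [4]"
```

In this toy cohort, subject 2 is a control for case 1 and later becomes a case itself, and subject 4 is sampled twice, which are exactly the two situations the abstract flags as programming challenges. The last risk set has fewer eligible subjects than m, so the sketch takes all of them; a production program would need an explicit policy for such small risk sets.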

Randomized Phase II Trial Designs With Biomarkers

Journal of Clinical Oncology, 2012

Efficient development of targeted therapies that may only benefit a fraction of patients requires clinical trial designs that use biomarkers to identify sensitive subpopulations. Various randomized phase III trial designs have been proposed for definitive evaluation of new targeted treatments and their associated biomarkers (eg, enrichment designs and biomarker-stratified designs). Before proceeding to phase III, randomized phase II trials are often used to decide whether the new therapy warrants phase III testing. In the presence of a putative biomarker, the phase II trial should also provide information as to what type of biomarker phase III trial is appropriate. A randomized phase II biomarker trial design is proposed, which, after completion, recommends the type of phase III trial to be used for the definitive testing of the therapy and the biomarker. The recommendations include the possibility of proceeding to a randomized phase III trial of the new therapy with or without using the ...