Colorectal Cancer Risk Prediction Tool for White Men and Women Without Known Susceptibility

Abstract

Purpose

Given the high incidence of colorectal cancer (CRC), and the availability of procedures that can detect disease and remove precancerous lesions, there is a need for a model that estimates the probability of developing CRC across various age intervals and risk factor profiles.

Methods

The development of separate CRC absolute risk models for men and women included estimating relative risks and attributable risk parameters from population-based case-control data separately for proximal, distal, and rectal cancer and combining these estimates with baseline age-specific cancer hazard rates based on Surveillance, Epidemiology, and End Results (SEER) incidence rates and competing mortality risks.

Results

For men, the model included a cancer-negative sigmoidoscopy/colonoscopy in the last 10 years, polyp history in the last 10 years, history of CRC in first-degree relatives, aspirin and nonsteroidal anti-inflammatory drug (NSAID) use, cigarette smoking, body mass index (BMI), current leisure-time vigorous activity, and vegetable consumption. For women, the model included sigmoidoscopy/colonoscopy, polyp history, history of CRC in first-degree relatives, aspirin and NSAID use, BMI, leisure-time vigorous activity, vegetable consumption, hormone-replacement therapy (HRT), and estrogen exposure on the basis of menopausal status. For men and women, relative risks differed slightly by tumor site. A validation study in independent data indicates that the models for men and women are well calibrated.

Conclusion

We developed absolute risk prediction models for CRC from population-based data, and a simple questionnaire suitable for self-administration. This model is potentially useful for counseling, for designing research intervention studies, and for other applications.

INTRODUCTION

Colorectal cancer (CRC) is the third most commonly diagnosed cancer and the third leading cause of cancer death in the United States. During 2008, an estimated 148,810 new cases of CRC will be diagnosed, and 49,960 persons will die as a result of the disease.1 Approximately one in 18 persons in the United States will develop CRC during his or her life.

Currently, several CRC screening strategies are available, including the fecal occult blood test (FOBT), double-contrast barium enema, flexible sigmoidoscopy, colonoscopy, virtual colonography, and combinations of these tests. Many of these strategies have been shown to be effective for reducing CRC incidence and mortality.2,3

Given the high incidence of CRC, its significant cost to society, and the availability of screening tests, a model that estimates an individual's probability of developing CRC using risk factor information that can be obtained easily in a clinical setting may aid physicians and their patients in deciding on screening regimens, and can also be useful in designing chemoprevention and screening intervention trials.4

Several risk and protective factors for CRC have been consistently identified in epidemiologic studies, including physical activity, cigarette smoking, and body mass index (BMI).5,6 However, no quantitative risk model that takes competing causes of death into account is currently available to estimate the absolute risk or probability of developing CRC. Existing models are qualitative and based on expert opinion,7 or applicable only to special populations.8-10 We present a model that, given a set of risk and protective factors and age, predicts the absolute risk of developing CRC over a given time period, accounting for competing causes of death. We used data from two population-based case-control studies to assess risk or protective factors. After describing the study populations, we describe the development of the CRC absolute risk model and give examples of risk estimates for various combinations of factors. We also present a short, self-administered risk assessment questionnaire that can be used to obtain information about risk factors for the model.

METHODS

Study Populations Used to Estimate Relative Risk

We used data from two population-based case-control studies, one for colon cancer11-13 and one for rectal cancer,14-16 to estimate relative risks (RRs) of CRC. Controls for both case-control studies were matched to cases on sex and 5-year age groups. These studies were conducted by investigators at the Universities of Utah (Salt Lake City, UT), Minnesota (Minneapolis, MN), and the Kaiser Permanente Medical Care Program (KPMCP) of Northern California (Oakland, CA). The Appendix (online only) provides additional details.

We restricted our analyses to non-Hispanic white men and women age 50 years and older, who comprised the majority of participants in both studies. We differentiated proximal (cecum through transverse colon), distal (splenic flexure, descending, and sigmoid colon), and rectal (rectosigmoid junction and rectum) tumor sites because incidence rates for these sites differ dramatically by age, and because risk factors and prior screening may have different effects on each site (Fig 1).

Fig 1.

Colorectal cancer incidence by tumor site for white non-Hispanic men and women (13 Surveillance, Epidemiology, and End Results [SEER] sites 1992-2002).

Risk Factors to Estimate RR Models

We assessed a variety of factors that have been consistently associated with colon or rectal cancer,6,17,18 including age; history of CRC in first-degree relatives; history of sigmoidoscopy and/or colonoscopy; history of polyps; use of multivitamins; red meat, vegetable, and fruit consumption; alcohol intake; BMI (kg/m2); cigarette smoking; use of aspirin and other nonsteroidal anti-inflammatory drugs (NSAIDs); current leisure-time vigorous activity; and estrogen status as assessed by menopausal status and hormone-replacement therapy (HRT) use. Although additional nutrient variables including calcium, dietary fiber, and iron have been related to CRC risk in some studies,6,18 a detailed dietary assessment and supporting nutrient database would be needed to capture these intakes accurately, making such an assessment unfeasible in most clinical settings. Therefore, we did not include dietary variables that require a detailed assessment. The Appendix includes further description of the variables we evaluated in developing our RR models.

Projecting Probabilities (absolute risk) of Developing CRC

Our approach19 included (1) estimating RR parameters from population-based case-control data separately for proximal, distal, and rectal cancer; (2) estimating baseline age-specific cancer hazard rates (based on the National Cancer Institute's Surveillance, Epidemiology, and End Results [SEER] Program incidence rates) and attributable risks (ARs) from the case-control data; and (3) combining competing risks, RRs, and baseline hazards to estimate the probability of developing the first of proximal, distal, or rectal cancer over a prespecified time interval (eg, 5, 10, or 20 years) given a person's age and risk factors. The advantage of modeling the sites separately is that the covariates have different associations with each site, and thus discriminatory accuracy can be improved.20 The Appendix contains additional details.

Estimating the RR Models

We analyzed proximal and distal colon cancer cases separately and used all eligible controls from the colon cancer study11-13 to estimate separate RR models. Age was included in the models in two categories (≤ 65 and > 65 years) when significant, to account for the matching. We determined RR estimates for all factors described earlier herein and assessed interactions among these factors and with age. Because a substantial number of participants in both studies had missing information on sigmoidoscopy/colonoscopy, we included an “unknown” category for that variable in all models. The following variables were coded as 0, 1, 2 for one-df (trend) models: men, proximal (smoking, BMI, and family history); men, distal (BMI and family history); men, rectal (current vigorous exercise); women, proximal (current vigorous exercise and family history); and women, distal (family history). Odds ratios (ORs) estimating RRs and corresponding 95% CIs were computed from unconditional logistic regression models. Variable selection for inclusion in the final model was based on Wald tests for individual parameters as well as information on previously established risk factors. Statistical analyses were performed using SAS software (version 8.2; SAS Institute Inc, Cary, NC).
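The one-df trend coding can be illustrated with a minimal Python sketch. In a logistic model with a covariate coded 0, 1, 2, the OR at level k is exp(β·k), so the OR at level 2 is the square of the OR at level 1. The function name `trend_odds_ratio` is hypothetical; the per-level OR of 1.81 is the family-history estimate for proximal colon cancer in men reported in Table 2.

```python
import math

def trend_odds_ratio(or_per_level: float, level: int) -> float:
    """OR at a given level of a covariate coded 0, 1, 2 in a one-df trend model."""
    beta = math.log(or_per_level)  # log-odds increase per one-level step
    return math.exp(beta * level)

# Family history, proximal colon cancer in men: OR = 1.81 for one relative (Table 2).
for level in (0, 1, 2):
    print(level, round(trend_odds_ratio(1.81, level), 2))
```

At level 2 the sketch gives 3.28, which matches the Table 2 entry for two or more relatives.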

Estimating the Baseline Age-Specific CRC Hazard Rates

The baseline hazard rate was defined as the hazard rate for individuals each of whose risk factors are at the lowest risk level. The age-specific baseline hazard rates were computed by multiplying the age-specific SEER incidence rates by 1 – [the estimate of the AR for each CRC cancer site]19 (Appendix).
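As a rough illustration of this step, the following Python sketch scales an age-specific incidence rate by 1 minus the AR. The function name `baseline_hazard` and the incidence rate of 50 per 100,000 person-years are hypothetical; the AR of 0.86 is the estimate reported in the Results for proximal cancer in men.

```python
def baseline_hazard(seer_rate: float, attributable_risk: float) -> float:
    """Baseline hazard = age-specific incidence rate x (1 - AR)."""
    if not 0.0 <= attributable_risk < 1.0:
        raise ValueError("attributable risk must be in [0, 1)")
    return seer_rate * (1.0 - attributable_risk)

# Hypothetical incidence rate (per 100,000 person-years), not a SEER value.
hypothetical_rate = 50.0
ar_proximal_men = 0.86  # AR estimate for proximal cancer in men
print(round(baseline_hazard(hypothetical_rate, ar_proximal_men), 2))
```

With an AR of 0.86, only 14% of the population incidence rate is assigned to individuals whose risk factors are all at the lowest risk level.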

The age- and sex-specific SEER incidence rates for proximal, distal, and rectal cancer (Appendix) were obtained for white non-Hispanics in 13 SEER registries between 1992 and 2002 that cover 14% of the US population.21 Cancers of the appendix, second primary, and recurrence of colorectal cancers were not included in the computation of these rates. Competing mortality hazards from causes other than CRC were obtained from US mortality data between 1990 and 2002 (Appendix).

CIs on the Absolute Risk Estimates

We extended the influence function method of Graubard and Fears,22 for the three outcomes (proximal, distal, or rectal cancer) to estimate the variance of the absolute risk estimate (Appendix). Approximate normality of the estimates was used to obtain 95% CIs for the estimated absolute risks.

Risk Assessment Questionnaire

We constructed a short, self-administered risk assessment questionnaire to capture the information used in the CRC risk prediction models. We tested the questionnaire using cognitive interviewing techniques.23 Cognitive interviews involved four “rounds” of nine participants each. We used verbal probing and think-aloud techniques to evaluate sources of misunderstanding and inaccuracy in reporting. Based on the results of our cognitive testing, the questionnaire was improved after each round to address potential sources of response errors24 and until it was usually understandable and yielded responses that appeared to classify individuals into risk categories accurately.

RESULTS

RR Models

In our analyses, we included 1,599 colon cancer cases (891 men, 708 women) with 1,974 controls (1,058 men, 916 women) and 664 rectal cancer cases (397 men, 267 women) with 859 controls (478 men, 381 women). Table 1 displays frequencies for age and recruitment site for men and women, respectively, and for proximal, distal, and rectal tumor sites.

Table 1.

Demographic Characteristics of the Colon and Rectal Case–Control Study Populations (restricted to white men and women, age ≥ 50 years)

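| Characteristic | Proximal Cases (colon study, 1991-1994) | Distal Cases (colon study, 1991-1994) | Controls (colon study, 1991-1994) | Rectal Cases (rectal study, 1997-2002) | Controls (rectal study, 1997-2002) |
|---|---|---|---|---|---|
| Men, No. | | | | | |
| Total | 429 | 462 | 1058 | 397 | 478 |
| Location | | | | | |
| Kaiser (HMO) | 196 | 236 | 415 | 240 | 285 |
| Utah (population) | 86 | 75 | 239 | 157 | 193 |
| Minnesota (population) | 147 | 151 | 404 | | |
| Age, years | | | | | |
| 50-59 | 81 | 103 | 189 | 106 | 127 |
| 60-69 | 148 | 188 | 410 | 163 | 186 |
| ≥ 70 | 200 | 171 | 459 | 128 | 165 |
| Women, No. | | | | | |
| Total | 374 | 334 | 916 | 267 | 381 |
| Location | | | | | |
| Kaiser (HMO) | 163 | 154 | 341 | 155 | 220 |
| Utah (population) | 65 | 63 | 200 | 112 | 161 |
| Minnesota (population) | 143 | 117 | 375 | | |
| Age, years | | | | | |
| 50-59 | 66 | 69 | 158 | 69 | 98 |
| 60-69 | 118 | 152 | 341 | 98 | 130 |
| ≥ 70 | 187 | 113 | 417 | 100 | 153 |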

Several factors were not related to CRC risk in our data, including FOBT; multivitamin use; alcohol use; and red meat and fruit consumption. We examined variables for smoking status and smoking pack-years for each risk model, but included only the variables “smoking duration and usual number of cigarettes smoked per day for current and former smokers” when significant, because these variables seemed to have the strongest effect of all the smoking variables, as noted previously in these studies.25

For men, the predictors of proximal colon cancer in the final model were prior negative sigmoidoscopy and/or colonoscopy, polyp history, number of relatives with CRC, aspirin and NSAID use, usual number of cigarettes smoked per day and years of smoking in current and former smokers, BMI, and servings of vegetables per day (Table 2). For example, men with two or more first-degree relatives with CRC were more likely to be diagnosed with proximal colon cancer than those without a positive family history (OR = 3.28; 95% CI, 1.84 to 5.84), whereas regular users of aspirin or NSAIDS were less likely to be diagnosed with proximal colon cancer (OR = 0.65; 95% CI, 0.51 to 0.82). For distal colon cancer, the final model included the same factors except for smoking and vegetable consumption (Table 2). The RR model for rectal cancer included sigmoidoscopy and/or colonoscopy, polyp history, number of relatives with CRC, NSAID use, and current vigorous leisure-time activity (Table 2). Only four controls and two cases in the rectal case-control study reported having two or more family members with CRC; therefore, family history was incorporated into the rectal RR model as “one or more relatives with CRC.” None of the models showed significant two-way interactions among these variables. All three RR models for men included separate intercepts for study center.
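To show how the ORs for a single tumor site combine for one risk profile, here is a minimal Python sketch for proximal colon cancer in men. It assumes a multiplicative logistic model without interaction terms, in line with the statement that no significant two-way interactions were found, and it treats ORs as approximations of RRs. The dictionary layout, level labels, and function name are hypothetical; the OR values are copied from Table 2, and the two smoking variables are omitted (they equal 1.00 for a never smoker).

```python
from math import prod

# ORs for proximal colon cancer in men (Table 2); keys are illustrative labels.
PROXIMAL_MEN_OR = {
    "endoscopy": {"negative_exam": 1.00, "no_exam": 1.42,
                  "exam_with_polyp": 1.77, "unknown": 1.58},
    "relatives": {"0": 1.00, "1": 1.81, "2+": 3.28},
    "nsaid": {"nonuser": 1.00, "regular": 0.65},
    "bmi": {"<=24.9": 1.00, "25-30": 1.26, ">30": 1.59},
    "vegetables": {"<5": 1.00, ">=5": 0.58},
}

def combined_relative_risk(profile: dict) -> float:
    """Multiply the ORs of each factor level (no-interaction assumption)."""
    return prod(PROXIMAL_MEN_OR[factor][level] for factor, level in profile.items())

# Hypothetical never-smoking man: no endoscopy in the last 10 years, one relative
# with CRC, regular NSAID use, BMI between 25 and 30, fewer than 5 servings/day.
profile = {"endoscopy": "no_exam", "relatives": "1", "nsaid": "regular",
           "bmi": "25-30", "vegetables": "<5"}
print(round(combined_relative_risk(profile), 2))
```

For this profile the product is about 2.10 relative to a man with every factor at its reference level.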

Table 2.

Relative Risk Estimates for Proximal, Distal, and Rectal Cancers for White Men Age ≥ 50 Years

| Variable | Proximal OR | Proximal 95% CI | Distal OR | Distal 95% CI | Rectal* OR | Rectal* 95% CI |
|---|---|---|---|---|---|---|
| Sigmoidoscopy and/or colonoscopy and polyp history | | | | | | |
| Sigmoidoscopy and/or colonoscopy in last 10 years, and no history of polyps | 1.00 | | 1.00 | | 1.00 | |
| No sigmoidoscopy and/or colonoscopy in last 10 years | 1.42 | 1.09 to 1.88 | 2.83 | 2.10 to 3.81 | 3.86 | 2.71 to 5.48 |
| Sigmoidoscopy and/or colonoscopy in last 10 years and history of polyps | 1.77 | 1.17 to 2.66 | 1.34 | 0.82 to 2.21 | 1.92 | 1.07 to 3.45 |
| Sigmoidoscopy and/or colonoscopy and polyps unknown | 1.58 | 1.02 to 2.41 | 2.61 | 1.72 to 3.97 | 0.51 | 0.14 to 1.81 |
| No. of relatives with CRC | | | | | | |
| 0 | 1.00 | | 1.00 | | 1.00 | |
| 1 | 1.81 | 1.35 to 2.42 | 1.68 | 1.24 to 2.27 | 1.49 | 0.91 to 2.46 |
| ≥ 2 | 3.28 | 1.84 to 5.84 | 2.81 | 1.53 to 5.16 | | |
| Current leisure-time activity, h/wk | | | | | | |
| 0 | | | | | 1.00 | |
| > 0 and ≤ 2 | | | | | 0.83 | 0.72 to 0.95 |
| > 2 and ≤ 4 | | | | | 0.69 | 0.52 to 0.90 |
| > 4 | | | | | 0.57 | 0.38 to 0.85 |
| Aspirin/NSAID use | | | | | | |
| Nonuser | 1.00 | | 1.00 | | 1.00 | |
| Regular user | 0.65 | 0.51 to 0.82 | 0.71 | 0.57 to 0.90 | 0.66 | 0.46 to 0.95 |
| Smoking, cigarettes/d | | | | | | |
| Never smoker | 1.00 | | | | | |
| > 0 and < 11 | 1.30 | 1.05 to 1.61 | | | | |
| ≥ 11 and ≤ 20 | 1.70 | 1.11 to 2.60 | | | | |
| > 20 | 2.22 | 1.17 to 4.20 | | | | |
| Years of smoking | | | | | | |
| 0 | 1.00 | | | | | |
| > 0 and < 15 | 0.60 | 0.34 to 1.06 | | | | |
| ≥ 15 and < 35 | 0.88 | 0.50 to 1.55 | | | | |
| ≥ 35 | 0.67 | 0.38 to 1.21 | | | | |
| Vegetable intake, servings/d | | | | | | |
| < 5 | 1.00 | | | | | |
| ≥ 5 | 0.58 | 0.41 to 0.80 | | | | |
| Body mass index, kg/m² | | | | | | |
| ≤ 24.9 | 1.00 | | 1.00 | | | |
| 25.0 to ≤ 30 | 1.26 | 1.07 to 1.49 | 1.38 | 1.17 to 1.62 | | |
| > 30 | 1.59 | 1.14 to 2.21 | 1.90 | 1.38 to 2.61 | | |

For women, the proximal colon RR model included prior negative sigmoidoscopy and/or colonoscopy, polyp history, number of relatives with CRC, estrogen status within the last 2 years, current vigorous activity, aspirin and NSAID use, and servings of vegetables per day (Table 3). The distal colon RR model included sigmoidoscopy and/or colonoscopy and polyp history, number of relatives with CRC, aspirin and NSAID use, and estrogen status within the last 2 years, an age indicator (≥ 65 years), BMI, and an interaction between BMI and estrogen status (Table 3). Older obese women (≥ 30 kg/m2) had an increased risk of CRC (OR = 2.68; 95% CI, 1.39 to 5.20). The main effect of BMI, however, was not statistically significant (OR = 1.08; 95% CI, 0.75 to 1.54). No other statistically significant interactions were found in any of the other models. Sigmoidoscopy and/or colonoscopy, polyp history, number of relatives with CRC, estrogen status within the last 2 years, current vigorous leisure-time activity, aspirin and NSAID use, and BMI were all included in the rectal cancer RR model for women (Table 3). Again, because of small numbers in the rectal RR model, family history of colorectal cancer was reduced to “one or more relatives with CRC.” All three RR models for women included separate intercepts for study center.

Table 3.

Relative Risk Estimates for Proximal, Distal, and Rectal Cancers for White Women Age ≥ 50 Years

| Variable | Proximal OR | Proximal 95% CI | Distal OR | Distal 95% CI | Rectal OR | Rectal 95% CI |
|---|---|---|---|---|---|---|
| Sigmoidoscopy and/or colonoscopy and polyp history | | | | | | |
| Sigmoidoscopy and/or colonoscopy in last 10 years, and no history of polyps | 1.00 | | 1.00 | | 1.00 | |
| No sigmoidoscopy and/or colonoscopy in last 10 years | 1.82 | 1.32 to 2.51 | 3.44 | 2.31 to 5.11 | 2.99 | 1.91 to 4.69 |
| Sigmoidoscopy and/or colonoscopy in last 10 years and history of polyps | 2.62 | 1.52 to 4.50 | 4.35 | 2.35 to 8.03 | 3.19 | 1.41 to 7.25 |
| Sigmoidoscopy and/or colonoscopy and polyps unknown | 0.61 | 0.17 to 1.04 | 3.17 | 1.09 to 4.02 | 0.37 | 0.04 to 3.14 |
| No. of relatives with CRC | | | | | | |
| 0 | 1.00 | | 1.00 | | 1.00 | |
| 1 | 1.51 | 1.11 to 2.03 | 1.45 | 1.04 to 2.00 | 1.53 | 0.92 to 2.55 |
| ≥ 2 | 2.27 | 1.25 to 4.14 | 2.09 | 1.09 to 4.02 | | |
| Current vigorous leisure exercise, h/wk | | | | | | |
| 0 | 1.00 | | | | 1.00 | |
| > 0 and ≤ 2 | 0.86 | 0.75 to 1.00 | | | 0.69 | 0.48 to 1.00 |
| > 2 and ≤ 4 | 0.75 | 0.56 to 1.00 | | | 0.79 | 0.45 to 1.37 |
| > 4 | 0.65 | 0.52 to 0.99 | | | 0.63 | 0.36 to 1.10 |
| Aspirin/NSAID use | | | | | | |
| Nonuser | 1.00 | | 1.00 | | 1.00 | |
| Regular user | 0.63 | 0.49 to 0.81 | 0.70 | 0.53 to 0.91 | 0.70 | 0.50 to 0.97 |
| Vegetable intake, servings/d | | | | | | |
| < 5 | 1.00 | | | | | |
| ≥ 5 | 0.72 | 0.51 to 1.02 | | | | |
| BMI, kg/m² | | | | | | |
| ≤ 29.9 | | | 1.00 | | 1.00 | |
| ≥ 30 | | | 1.08 | 0.75 to 1.54 | 1.40 | 0.95 to 2.06 |
| Age, years | | | | | | |
| ≤ 65 | | | 1.00 | | | |
| > 65 | | | 0.55 | 0.41 to 0.74 | | |
| Estrogen status within the last 2 years | | | | | | |
| Negative | 1.00 | | 1.00 | | 1.00 | |
| Positive | 0.68 | 0.52 to 0.90 | 0.48 | 0.33 to 0.68 | 0.67 | 0.48 to 0.94 |
| BMI-estrogen interaction | | | 2.68 | 1.39 to 5.20 | | |

Estimates of the Baseline Age-Specific CRC Hazard Rates

To compute baseline hazard rates, we estimated separate ARs for the three models for men and women. Because the ARs did not differ by study site, we obtained combined AR estimates separately for proximal, distal and rectal cancer. The AR estimates for men were 0.86 (95% CI, 0.79 to 0.91) for proximal, 0.72 (95% CI, 0.63 to 0.80) for distal, and 0.90 (95% CI, 0.69 to 0.97) for rectal cancer. For women, the AR estimates were 0.81 (95% CI, 0.69 to 0.90) for proximal, 0.82 (95% CI, 0.73 to 0.89) for distal in women younger than 65 years, 0.85 (95% CI, 0.76 to 0.91) for distal in women age 65 years or older, and 0.93 (95% CI, 0.57 to 0.99) for rectal cancer.

Examples of Individual Absolute Risk Estimates for CRC

Table 4 presents estimates of the 10- and 20-year projected absolute risks of developing CRC for white men with various ages and risk factor profiles. The first risk profile, the lowest risk example, describes a 50-year-old man who had a colonoscopy in the last 10 years without evidence of polyps. He has no family history of CRC, vigorously exercises 5 hours/week, takes aspirin daily, never smoked, eats more than five servings of vegetables/day, and has a BMI of 24 kg/m2. His 10-year predicted absolute risk of developing CRC is only 0.16% (95% CI, 0.11 to 0.22), and his 20-year risk is 0.53% (95% CI, 0.38 to 0.73). In risk profile 9, a high-risk example, we consider a 60-year-old man who had a colonoscopy in the last 10 years and was found to have a polyp. He has two relatives with CRC, does not exercise regularly, does not take aspirin or NSAIDS regularly, smokes more than 20 cigarettes a day, eats fewer than five servings of vegetables per day, and has a BMI of 31 kg/m2. His 10-year predicted absolute risk of developing CRC is 7.14% (95% CI, 3.9 to 12.8), and his 20-year risk is 16.7% (95% CI, 9.1 to 28.5). Similar 10- and 20-year projected absolute risks of developing CRC for white women with various ages and risk factor profiles are presented in Table 5.

Table 5.

Examples of 10- and 20-Year Absolute Risk Estimates for CRC for White Women of Different Ages and Risk Factor Profiles

| Patient Profile No. | Age (years) | Sigmoidoscopy and/or Colonoscopy in the Last 10 Years | Polyp in the Last 10 Years | No. of First-Degree Relatives With CRC | Current Leisure-Time Activity (h/wk) | Regular User of Aspirin/NSAIDs | Vegetable Intake (servings/d) | Body Mass Index (kg/m²) | Estrogen Status Within the Last 2 Years | 10-Year Absolute Risk, % | 10-Year 95% CI | 20-Year Absolute Risk, % | 20-Year 95% CI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 50 | Yes | No | 0 | 5 | Yes | 6 | 28 | Positive | 0.09 | 0.07% to 0.13% | 0.31 | 0.22% to 0.43% |
| 2 | 50 | No | | 1 | 3 | Yes | 7 | 29 | Negative | 0.71 | 0.50% to 1.01% | 2.25 | 1.61% to 3.13% |
| 3 | 50 | Yes | Yes | 2 | 0 | No | 3 | 32 | Negative | 2.51 | 1.50% to 4.19% | 8.11 | 4.89% to 13.17% |
| 4 | 55 | Yes | No | 0 | 5 | Yes | 6 | 28 | Positive | 0.15 | 0.11% to 0.21% | 0.43 | 0.31% to 0.62% |
| 5 | 55 | No | | 1 | 1 | No | 2 | 31 | Positive | 1.85 | 1.30% to 2.64% | 4.99 | 3.55% to 6.98% |
| 6 | 55 | Yes | Yes | 2 | 0 | No | 3 | 32 | Negative | 4.13 | 2.46% to 6.84% | 11.38 | 6.88% to 18.23% |
| 7 | 60 | Yes | No | 0 | 5 | Yes | 6 | 28 | Positive | 0.22 | 0.16% to 0.32% | 0.57 | 0.40% to 0.82% |
| 8 | 60 | No | | 1 | 1 | No | 3 | 32 | Positive | 2.61 | 1.85% to 3.68% | 6.37 | 4.57% to 8.82% |
| 9 | 60 | Yes | Yes | 2 | 0 | No | 3 | 32 | Negative | 6.01 | 3.60% to 9.87% | 14.72 | 8.94% to 23.29% |
| 10 | 65 | Yes | No | 0 | 5 | Yes | 6 | 28 | Positive | 0.30 | 0.21% to 0.44% | 0.69 | 0.48% to 1.00% |
| 11 | 65 | No | | 1 | 4 | Yes | 6 | 28 | Negative | 2.13 | 1.54% to 2.94% | 4.64 | 3.38% to 6.35% |
| 12 | 65 | Yes | Yes | 2 | 0 | No | 3 | 32 | Negative | 8.16 | 4.89% to 13.31% | 17.49 | 10.66% to 27.37% |

Table 4.

Examples of 10- and 20-Year Absolute Risk Estimates for CRC for White Men of Different Ages and Risk Factor Profiles

| Patient Profile No. | Age (years) | Sigmoidoscopy and/or Colonoscopy in the Last 10 Years | Polyp in the Last 10 Years | No. of First-Degree Relatives With CRC | Current Leisure-Time Activity (h/wk) | Regular User of Aspirin/NSAIDs (at least 3 times/wk) | Smoking (usual No. of cigarettes/day) | Vegetable Intake (servings/d) | Body Mass Index (kg/m²) | 10-Year Absolute Risk, % | 10-Year 95% CI | 20-Year Absolute Risk, % | 20-Year 95% CI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 50 | Yes | No | 0 | 5 | Yes | 0 | 6 | 24 | 0.16 | 0.11% to 0.22% | 0.53 | 0.38% to 0.73% |
| 2 | 50 | No | | 0 | 1 | No | 8 | 3 | 27 | 1.06 | 0.90% to 1.25% | 3.48 | 2.92% to 4.13% |
| 3 | 50 | Yes | Yes | 2 | 0 | No | > 20 | 4 | 31 | 2.71 | 1.49% to 4.87% | 9.14 | 5.02% to 16.05% |
| 4 | 55 | Yes | No | 0 | 6 | Yes | 0 | 6 | 24 | 0.26 | 0.19% to 0.36% | 0.70 | 0.53% to 1.04% |
| 5 | 55 | No | | 1 | 3 | Yes | 0 | 3 | 33 | 2.02 | 1.45% to 2.81% | 5.57 | 3.97% to 7.76% |
| 6 | 55 | Yes | Yes | 2 | 0 | No | > 20 | 4 | 32 | 4.59 | 2.51% to 8.25% | 12.98 | 7.11% to 22.50% |
| 7 | 60 | Yes | No | 0 | 5 | Yes | 0 | 6 | 24 | 0.40 | 0.29% to 0.56% | 0.93 | 0.66% to 1.33% |
| 8 | 60 | No | | 1 | 4 | Yes | 0 | 3 | 32 | 3.07 | 2.19% to 4.28% | 6.97 | 4.93% to 9.77% |
| 9 | 60 | Yes | Yes | 2 | 0 | No | > 20 | 2 | 31 | 7.14 | 3.89% to 12.77% | 16.65 | 9.11% to 28.46% |
| 10 | 65 | Yes | No | 0 | 5 | Yes | 0 | 6 | 24 | 0.54 | 0.38% to 0.77% | 1.07 | 0.74% to 1.55% |
| 11 | 65 | No | | 0 | 1 | No | 10 | 2 | 26 | 3.51 | 2.89% to 4.26% | 6.78 | 5.50% to 8.32% |
| 12 | 65 | Yes | Yes | 2 | 0 | No | > 20 | 4 | 31 | 9.95 | 5.37% to 17.8% | 19.4 | 10.6% to 32.7% |

Risk Assessment Questionnaire

The short, paper-based, self-administered risk assessment questionnaire we constructed to capture the information required by the models (Tables 2 and 3) requires 5 to 8 minutes to complete. A Web version of the questionnaire is available at http://www.cancer.gov/colorectalcancerrisk/.

DISCUSSION

We present a model that predicts the probability or absolute risk of developing CRC for men and women age 50 years and older. We combined separate RRs and ARs and baseline hazards for proximal, distal, and rectal cancers to project the risk of the earliest of these tumors. We also developed a short, simple, self-administered risk assessment questionnaire that can be used to obtain information for risk estimation.

In related work, we used independent data from the National Institutes of Health (NIH)-AARP Diet and Health Cohort Study26 of men and women age 55 years and older to assess the performance of our models.27 We found that the models had discriminatory accuracy comparable with absolute risk models for other cancers and were well calibrated.

Although the models were developed from cases and controls age 50 years and older, one could project risk for younger individuals by assuming that our RRs and ARs apply to younger populations and by using younger age-specific SEER rates. However, such assumptions would need to be checked in independent data because risk factors and biologic mechanisms may differ among those developing CRC at younger ages.

Although absolute risk models exist for breast cancer and lung cancer,19,28 this is the first such model for CRC. The four other CRC risk prediction models currently available either apply to special populations, such as patients who were referred by general practitioners to gastroenterologists for symptoms,8 provide a qualitative index of CRC risk,7,10 or predict different outcomes, such as the risk of having an advanced polyp or cancer in the proximal portion of the colon.9

Our model estimates the probability of developing CRC over a prespecified time interval using data from two large US population-based case-control studies of colon and rectal cancer, incidence data from 13 SEER registries, which are generally representative of the US population,29 and national mortality rates. Thus, our risk prediction models would be expected to apply to the general non-Hispanic white US population.

We used factors in our models that, in addition to having strong predictive ability, can also be ascertained easily in a clinical setting. Thus, we did not include some factors that may have predictive value, such as calcium intake or long-term vigorous activity,30-33 but which would require a much more complex questionnaire.34

Our risk prediction model has some limitations. Because the majority of participants in the case-control studies were white, we could not estimate RRs for other racial or ethnic groups. A first step to developing models for other racial/ethnic groups could be to combine RR and AR estimates for whites with SEER rates for blacks, Asians, or Hispanics. However, the assumption of constant AR and RR estimates across racial groups needs to be validated in minority populations. Our model is not applicable to individuals with ulcerative colitis, Crohn's disease and familial adenomatous polyposis, because these conditions carry a high risk of CRC, and individuals with these conditions were excluded from the studies. Additionally, our model is not applicable to individuals with hereditary nonpolyposis CRC.

We used US mortality data from 1990 to 2002 for our competing mortality hazards. We did not adjust these estimates for potential confounders such as BMI, because our sensitivity analyses indicated that the resulting changes in the risk estimates were small (data not shown). Although our two case-control studies were conducted at slightly different time periods, we believe any changes in the distribution of risk factors would have a minimal effect on our risk estimates, considering that RRs and ARs were estimated separately for the two studies.

Another limitation of our model is that we estimated our RRs and ARs from case-control rather than from cohort studies. Although case-control studies have been used in the development of risk prediction models for melanoma,35 breast,19,36,37 bladder,38 and lung cancer,39 a general criticism is that such estimates could be subject to recall bias. However, recall bias likely plays a minor role in our models as most of the RRs we found, including BMI, physical activity, HRT, and aspirin and NSAIDs use, were consistent with RRs summarized in a recent comprehensive review of the epidemiologic literature.6 Although not covered in this review, our risk estimates for screening and polyp history,2,40,41 smoking,42 and family history43,44 are also consistent with many previously published findings, including results from cohort studies. Additionally, the models were well calibrated in an independent validation study using the AARP cohort.27

In summary, we developed an absolute CRC risk projection model for white men and women age 50 years or older that may aid physicians and their patients in deciding on screening regimens, and can also be useful in designing chemoprevention and screening intervention trials.

Supplementary Material

[Data Supplement]

Acknowledgment

We thank Anne Rodgers for her helpful comments and editorial assistance.

Appendix

Study Populations

For the colon cancer study, cases with a first primary CRC (ICD-O second edition codes 18.0, 18.2 to 18.9) diagnosed between October 1, 1991, and September 30, 1994, were identified in Utah, the metropolitan Twin Cities area in Minnesota, and KPMCP of Northern California using rapid-reporting systems. The second study included cases with a first primary tumor in the rectosigmoid junction or rectum identified between May 1997 and May 2001 in Utah or Northern California, using a rapid case-ascertainment program conducted in conjunction with the SEER Program. Individuals with a previous CRC, familial adenomatous polyposis, ulcerative colitis, or Crohn's disease as determined by the pathology report were not eligible for these studies.

Controls for both case-control studies were matched to cases (during their respective recruitment periods) by sex and 5-year age groups. In Utah, controls age 65 years and older were randomly selected from Health Care Financing Administration (now the Centers for Medicare & Medicaid Services) lists. Controls younger than 65 years were randomly selected from driver's license lists. In Minnesota, all controls were randomly selected from driver's license lists. In Northern California, controls were randomly selected from KPMCP membership lists. Data for both studies were collected using the same study questionnaire and the same quality-control procedures that have been described in detail elsewhere (Slattery ML, Caan BJ, Duncan D, et al. J Am Diet Assoc 94:761-766, 1994; Edwards S, Slattery ML, Mori M, et al. Am J Epidemiol 140:1020-1028, 1994; Slattery ML, Jacobs DR Jr. Ann Epidemiol 5:292-296, 1995)

Description of Variables

For all of the variables in our analyses, including age, the referent year was the calendar year, 2 years before the date of diagnosis for cases, and 2 years before date of selection for controls.

History of CRC in first-degree relatives.

We used information reported by participants on the number of first-degree relatives (ie, father, mother, siblings, and children) diagnosed with CRC. We categorized the number of first-degree relatives with CRC as none, one, two, or more in our analyses.

FOBT.

We categorized participants based on whether they had reported an FOBT in the last 10 years before the referent year.

Sigmoidoscopy/colonoscopy.

We also categorized participants on the basis of whether they reported having a sigmoidoscopy and/or colonoscopy in the last 10 years before the referent year. Because these examinations occurred at least 2 years before the date of diagnosis for cases, none of these examinations resulted in a CRC diagnosis.

History of polyps.

Among those who reported having had a sigmoidoscopy and/or colonoscopy, we categorized participants by whether they had been told by a physician that they had a polyp in the last 10 years before the referent year.

Use of multivitamins.

We categorized participants as regular users of multivitamins if they reported using multivitamins at least three times/week for at least 1 month during the referent year.

Red meat, vegetable, and fruit consumption.

We used information on the number of servings of red meat, vegetables, and fruits per week reported by participants during the referent year. These dietary exposures were collected during in-person interviews conducted by trained and certified interviewers.

Alcohol intake.

We combined reported weekly intake of servings of red wine, beer, and liquor to derive an estimate of total alcohol consumption per week. We categorized these estimates into quartiles among controls for our analyses.

BMI.

Self-reported weight during the referent year and measured height at interview were used to calculate BMI, defined as weight (kg)/height² (m²). We categorized participants' BMI based on WHO classifications: underweight and normal weight (BMI ≤ 24.9 kg/m²), overweight (BMI 25-29.9 kg/m²), and obese (BMI ≥ 30 kg/m²).
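The BMI calculation and categorization can be sketched in Python as follows. The function names are hypothetical, and treating any value above 24.9 and below 30 as overweight is an assumption made here to close the gap between the stated category boundaries.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2

def bmi_category(value: float) -> str:
    """Categories as described above; values in (24.9, 30) count as overweight (assumption)."""
    if value <= 24.9:
        return "underweight or normal weight"
    if value < 30.0:
        return "overweight"
    return "obese"

# Hypothetical participant: 80 kg, 1.75 m.
value = bmi(80.0, 1.75)
print(round(value, 1), bmi_category(value))
```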

Smoking.

We categorized participants by (1) cigarette smoking status (never smoker, current smoker, and former smoker), (2) pack-years (calculated by dividing the average number of cigarettes smoked in a usual day by 20 and multiplying this number by the number of years smoked), (3) usual number of cigarettes smoked per day for current and former smokers, and (4) years of smoking.
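The pack-years definition translates directly into a short Python sketch; the function name and the example smoker are hypothetical.

```python
def pack_years(cigarettes_per_day: float, years_smoked: float) -> float:
    """Pack-years: (cigarettes per usual day / 20) x years smoked."""
    return cigarettes_per_day / 20.0 * years_smoked

# Hypothetical smoker: 30 cigarettes per day for 20 years.
print(pack_years(30, 20))
```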

Aspirin and other NSAIDs.

We categorized participants as regular users of aspirin and other NSAIDs if they reported using these medications at least three times a week for at least 1 month during the referent year.

Physical activity.

We assessed participants' current leisure-time vigorous activity during the referent year and grouped participants into four categories: 0 hours/week, up to 2 hours/week, 2 to 4 hours/week, and more than 4 hours/week. Vigorous leisure-time activities were defined as “those activities which make you sweat or get out of breath” and included questions about racket sports, jogging, running and biking, exercise or dance class, weight lifting, hiking, vigorous swimming, scrubbing floors or mowing the lawn, and gardening and heavy labor.

Estrogen status.

We created a variable “estrogen status,” which was categorized as “estrogen positive” or “estrogen negative” based on a combination of menopausal status and use of HRT. Premenopausal women and women who had used HRT within the 2 years before the referent year were considered estrogen positive. Postmenopausal women not reporting the use of HRT during the 2 years before the referent year were considered estrogen negative.
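The estrogen status rule reduces to a single logical condition, sketched below. The function and argument names are hypothetical; the classification itself follows the description above.

```python
def estrogen_status(premenopausal: bool, hrt_in_last_2_years: bool) -> str:
    """Estrogen positive if premenopausal or HRT used within the 2 years before the referent year."""
    if premenopausal or hrt_in_last_2_years:
        return "estrogen positive"
    # Remaining case: postmenopausal with no HRT reported in the last 2 years
    return "estrogen negative"


print(estrogen_status(premenopausal=False, hrt_in_last_2_years=True))
```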

Projecting Probabilities (Absolute Risk) of Developing CRC

The absolute risk A*(a,b) of CRC in the age interval (a,b) is the probability of developing CRC during that interval, given that one is alive and free of previous CRC at the beginning of the interval. The absolute risk is reduced by death from causes other than CRC. The absolute risk is defined mathematically by

$$A^*(a,b) = \int_a^b \lambda_{crc}(t,x)\,\frac{S^*(t)}{S^*(a)}\,dt, \qquad (1)$$

where $S^*(a) = \exp\left[-\int_0^a \{\lambda_{crc}(u,x) + \lambda_M(u)\}\,du\right]$.

The main building blocks for our model are the hazard rates $\lambda_{crc}$ for CRC incidence and $\lambda_M$ for death from causes other than CRC. Because incidence rates and the impact of risk factors differ by CRC tumor site, we decomposed the CRC incidence hazard rate into the sum of proximal, distal, and rectal cancer hazards, indexed by i = P, D, R. Let T denote the earlier of the age at onset of the first colorectal outcome or the age at death resulting from other causes, and let J denote the type of that first event. The cause-specific hazards, which may depend on covariates x, are defined as $\lambda_i(a,x) = \lim_{\varepsilon \to 0} P(a \le T < a + \varepsilon, J = i \mid T > a, x)/\varepsilon$, i = P, D, R, M.

We used $\lambda_{crc}(a,x) = \lambda_P(a,x) + \lambda_D(a,x) + \lambda_R(a,x)$ and modeled $\lambda_i(a,x) = \lambda_i(a)\,rr_i(a,x)$ as the product of the age-specific baseline hazard rate $\lambda_i(a)$ and the RR $rr_i(a,x)$, which includes covariates, for outcomes i = P, D, R. We thus estimated three separate RR models for the CRC outcomes. We did not include covariates in the hazard for competing causes of death and used $\lambda_M(a,x) = \lambda_M(a)$. The absolute risk of developing CRC is thus the probability of developing the first of proximal, distal, or rectal cancer in a given age interval. Following the approach outlined by Willis (Willis GB: Cognitive Interviewing: A Tool for Improving Questionnaire Design. Thousand Oaks, CA, SAGE Publications, 2004), we estimated the RR parameters from case-control data and then estimated baseline age-specific cancer hazard rates $\lambda_i(a)$ for i = P, D, R as described in the following sections.
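To make the absolute risk calculation concrete, the sketch below accumulates, band by band, the probability of a first CRC event in the presence of competing mortality. It is an illustration under stated assumptions, not the authors' implementation: hazards are taken as constant within each 5-year age band; the rates are the men's values from Appendix Tables A1 and A3, read as annual rates per 100,000; and the `multipliers` argument is a hypothetical stand-in for the site-specific factors (1 − AR_i) × rr_i(x), with the default of 1.0 for each site chosen only for illustration.

```python
import math

# (age_from, age_to, (proximal, distal, rectal) incidence, all-cause-minus-CRC mortality)
# Rates per 100,000 for white non-Hispanic men, from Appendix Tables A1 and A3.
BANDS = [
    (50, 55, (15.4, 14.4, 21.0), 592.5),
    (55, 60, (27.9, 27.4, 35.2), 942.2),
    (60, 65, (49.2, 46.0, 52.5), 1525.9),
    (65, 70, (80.1, 69.6, 76.7), 2403.8),
]


def absolute_risk(a, b, multipliers=(1.0, 1.0, 1.0)):
    """Probability of a first CRC between ages a and b, given alive and CRC-free at a.

    multipliers: hypothetical per-site factors (proximal, distal, rectal)
    applied to the tabulated composite rates.
    """
    if not (BANDS[0][0] <= a < b <= BANDS[-1][1]):
        raise ValueError("age interval must lie within 50-70 with a < b")
    risk = 0.0
    surviving = 1.0  # probability of being alive and CRC-free so far
    for lo, hi, incidence, other_mortality in BANDS:
        start, end = max(a, lo), min(b, hi)
        if start >= end:
            continue
        lam_crc = sum(h * r for h, r in zip(multipliers, incidence)) / 1e5
        lam_all = lam_crc + other_mortality / 1e5
        # Probability of any event (CRC or other-cause death) within this band
        p_any = 1.0 - math.exp(-lam_all * (end - start))
        # Share of those events that are CRC, for constant hazards
        risk += surviving * (lam_crc / lam_all) * p_any
        surviving *= 1.0 - p_any
    return risk


print(absolute_risk(50, 60))
```

With constant hazards inside a band, the integral of the CRC hazard times the survival function has the closed form used in the loop, so no numerical integration is needed.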

Estimating the Baseline Age-Specific CRC Hazard Rates and ARs

The baseline hazard rate $\lambda_i(a)$ at age a for i = P, D, R is the hazard rate for individuals whose risk factors are all at the lowest risk level. The age-specific baseline hazard rates are computed by multiplying the age-specific SEER incidence rates $\lambda_i^*(a)$ by 1 minus the estimate of the appropriate AR, as described in Smith and Dean (Smith T, Dean D: Colorectal Cancer Health Assessment Testing of a Self-Administered Survey: Summary Report: Technical Report Prepared for the National Cancer Institute. Rockville, MD, Westat, 2005); that is, $\lambda_i(a) = \lambda_i^*(a)\,(1 - AR_i(a))$. To estimate $AR_i$ for each of the three sites, we used $AR_i = 1 - (1/m_i)\sum_j 1/r_{ji}$, where $m_i$ is the number of cases of cancer type i, $r_{ji}$ is the estimated RR for the jth case of type i, and the summation is over all the cases of that type (Bruzzi P, Green SB, Byar DP, et al. Am J Epidemiol 122:904-914, 1985).
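The two formulas above translate into a few lines of code. The sketch below is illustrative only: the function names are hypothetical and the relative risks in the example are made-up numbers, not estimates from the study.

```python
def attributable_risk(case_relative_risks):
    """AR = 1 - (1/m) * sum over cases of 1/r_j, for the m cases of one tumor site."""
    m = len(case_relative_risks)
    if m == 0:
        raise ValueError("at least one case is required")
    if any(r <= 0 for r in case_relative_risks):
        raise ValueError("relative risks must be positive")
    return 1.0 - sum(1.0 / r for r in case_relative_risks) / m


def baseline_hazard(composite_rate, ar):
    """Baseline rate = composite (SEER) rate multiplied by (1 - AR)."""
    return composite_rate * (1.0 - ar)


# Hypothetical estimated RRs for three cases of one tumor site
ar = attributable_risk([1.0, 2.0, 4.0])
print(ar, baseline_hazard(49.2, ar))
```

If every case has a relative risk of 1, the AR is 0 and the baseline rate equals the composite rate, which is a quick sanity check on the formula.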

Colorectal Cancer Incidence and Mortality

Appendix Table A1 shows colorectal cancer incidence for white non-Hispanic men. Appendix Table A2 shows colorectal cancer incidence for white non-Hispanic women. Appendix Table A3 shows mortality for white non-Hispanic men. Appendix Table A4 shows mortality for white non-Hispanic women.

Variance Calculations

Recall that the age-specific baseline hazard rates λ(t) at age t are computed as λ(t) = λ*(t) (1 – AR(t)), the product of the composite rates λ* from SEER and 1 – AR, where AR is the attributable risk. Thus, we can write λi(a,x) in equation 1 as λi(a,x) = λi*(a) (1 – ARi(a)) rri(x), i = P,D,R. In what follows, we assume that the composite rates λ* are known without error; thus, only the factors (1 – ARi(a)) rri(x) contribute to the variance of A* in equation 1.

For ease of presentation, we let $H_i = (1 - AR_i(a))\,rr_i(x)$. First, we compute the covariance matrix of $(H_P, H_D, H_R)$ by adapting the influence function-based approach of Graubard and Fears (Graubard BI, Fears TR: Biometrics 61:847-855, 2005). Let $Y_{ij}$ denote case-control status, that is, $Y_{ij} = 1$ for a case with cancer type i and 0 for a control, and let $x_{ij}$ denote the vector of covariates for the jth subject, including an intercept term. Also let $p_{ij} = \exp(x_{ij}b)\{1 + \exp(x_{ij}b)\}^{-1}$. The AR for the ith outcome is estimated as

[Equation image not reproduced: zlj00509-8036-m02.jpg]

(Bruzzi P, Green SB, Byar DP, et al. Am J Epidemiol 122:904-914, 1985). Letting

[Equation image not reproduced: zlj00509-8036-m03.jpg]

and

[Equation image not reproduced: zlj00509-8036-m04.jpg]

[Equation image not reproduced: zlj00509-8036-m05.jpg]

where $x_i$ denotes the risk factors specific to the ith cancer model. The influence of observation j on $H_i$ is

[Equation image not reproduced: zlj00509-8036-m06.jpg]

[Equation image not reproduced: zlj00509-8036-m07.jpg]

[Equation image not reproduced: zlj00509-8036-m08.jpg]

[Equation image not reproduced: zlj00509-8036-m09.jpg]

The random variables z_j for cases and controls from the three study centers are independent and are assumed to be random samples from six separate strata, defined by case-control status and site. The variance of H_i is thus estimated as

[Equation image not reproduced: zlj00509-8036-m10.jpg]

where $n_s$ is the number of cases or controls in stratum s and $\bar{z}_s$ is the stratum mean of the variables $z_j$. Because the same controls were used to estimate the RR parameters for proximal and distal cancers, we also compute the covariance between $H_P$ and $H_D$ based on the controls as

[Equation image not reproduced: zlj00509-8036-m11.jpg]

The superscript u denotes the scores from the controls from the three study sites, and $n_u$ stands for the corresponding numbers of controls.

After the covariance matrix Σ of $(H_P, H_D, H_R)$ is computed, the variance of A* in equation 1 is obtained by applying the delta method as $D^T \Sigma D$, where $D^T = (\partial A^*/\partial H_P,\ \partial A^*/\partial H_D,\ \partial A^*/\partial H_R)$.
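The delta method step can be sketched as a quadratic form in the gradient of A* with respect to the three H values. In the illustration below the gradient is approximated by central finite differences, which is a substitution made here for simplicity; the text does not state how the derivatives were obtained. The function name, the step size, and the example inputs are all hypothetical.

```python
def delta_method_variance(f, h, cov, step=1e-6):
    """Approximate Var[f(H)] as D^T * cov * D, with D the gradient of f at h.

    f    : function of a list of H values (for example, the absolute risk)
    h    : point estimates (H_P, H_D, H_R)
    cov  : covariance matrix of the estimates, as nested lists
    step : finite-difference step (an assumption of this sketch)
    """
    k = len(h)
    grad = []
    for i in range(k):
        up, down = list(h), list(h)
        up[i] += step
        down[i] -= step
        grad.append((f(up) - f(down)) / (2.0 * step))
    return sum(grad[i] * cov[i][j] * grad[j] for i in range(k) for j in range(k))


# Toy example: a linear function, so the gradient is exactly (2, 3, -1)
toy = lambda v: 2.0 * v[0] + 3.0 * v[1] - v[2]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(delta_method_variance(toy, [1.0, 1.0, 1.0], identity))
```

A nonzero off-diagonal entry between the proximal and distal components, as arises from the shared controls, enters the quadratic form twice and so changes the variance accordingly.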

Table A1.

Colorectal Cancer Incidence for White Non-Hispanic Men

Incidence (cases per 100,000)

| Age (years) | Proximal | Distal | Rectal |
| --- | --- | --- | --- |
| 50-54 | 15.4 | 14.4 | 21.0 |
| 55-59 | 27.9 | 27.4 | 35.2 |
| 60-64 | 49.2 | 46.0 | 52.5 |
| 65-69 | 80.1 | 69.6 | 76.7 |
| 70-74 | 121.5 | 88.3 | 90.2 |
| 75-79 | 169.0 | 107.3 | 103.5 |
| 80-84 | 219.02 | 118.1 | 115.1 |
| ≥ 85 | 250.5 | 113.7 | 111.0 |

Table A2.

Colorectal Cancer Incidence for White Non-Hispanic Women

Incidence (cases per 100,000)

| Age (years) | Proximal | Distal | Rectal |
| --- | --- | --- | --- |
| 50-54 | 12.3 | 11.5 | 13.3 |
| 55-59 | 23.3 | 18.6 | 20.9 |
| 60-64 | 42.1 | 29.9 | 30.5 |
| 65-69 | 67.8 | 41.2 | 40.8 |
| 70-74 | 103.1 | 57.6 | 50.9 |
| 75-79 | 144.9 | 65.7 | 62.7 |
| 80-84 | 191.4 | 74.9 | 72.8 |
| ≥ 85 | 220.2 | 77.2 | 75.4 |

Table A3.

Mortality for White Non-Hispanic Men

Mortality (cases per 100,000)

| Age (years) | All Cause | CRC | All Cause Minus CRC |
| --- | --- | --- | --- |
| 50-54 | 612.9 | 20.4 | 592.5 |
| 55-59 | 979.7 | 37.5 | 942.2 |
| 60-64 | 1,594.6 | 68.7 | 1,525.9 |
| 65-69 | 2,503.0 | 99.2 | 2,403.8 |
| 70-74 | 3,813.6 | 139.5 | 3,674.1 |
| 75-79 | 5,870.7 | 190.6 | 5,680.1 |
| 80-84 | 9,338.6 | 259.7 | 9,078.9 |
| ≥ 85 | 17,577.5 | 371.5 | 17,206.0 |

Table A4.

Mortality for White Non-Hispanic Women

Mortality (cases per 100,000)

| Age (years) | All Cause | CRC | All Cause Minus CRC |
| --- | --- | --- | --- |
| 50-54 | 361.0 | 14.0 | 347.0 |
| 55-59 | 589.5 | 24.7 | 564.8 |
| 60-64 | 954.6 | 39.7 | 914.9 |
| 65-69 | 1,480.1 | 59.7 | 1,420.4 |
| 70-74 | 2,319.1 | 86.9 | 2,232.2 |
| 75-79 | 3,680.4 | 123.1 | 3,557.3 |
| 80-84 | 6,175.8 | 177.2 | 5,998.6 |
| ≥ 85 | 14,316.6 | 285.8 | 14,030.8 |

Table A5.

Cases and Controls for Covariates Used in Relative Risk Estimation (in Table 2) for Proximal, Distal, and Rectal Cancers for White Men Age ≥ 50 Years

Variable No.
Proximal Cancer Cases Distal Cancer Cases Proximal and Distal Controls Rectal Cancer Cases Rectal Cancer Controls
Sigmoidoscopy and/or colonoscopy and polyp history
Sigmoidoscopy and/or colonoscopy in last 10 years, and no history of polyps 101 71 348 56 172
No sigmoidoscopy and/or colonoscopy in last 10 years 225 305 511 312 249
Sigmoidoscopy and/or colonoscopy in last 10 years and history of polyps 57 28 97 26 40
Sigmoidoscopy and/or colonoscopy and polyps unknown 46 58 102 3 17
No. of relatives with CRC
0 360 390 960 37 441
1 54 64 91 39 37
≥ 2 15 8 7
Current leisure-time activity, h/wk
0 123 100
> 0 and ≤ 2 155 191
> 2 and ≤ 4 44 92
> 4 75 95
Aspirin/NSAID use
Nonuser 262 275 536 337 365
Regular user 167 187 522 60 113
Smoking, cigarettes/d
Never smoker 130 386
> 0 and < 11 41 125
≥ 11 and ≤ 20 138 263
> 20 120 284
Years of smoking
0 130 386
> 0 and < 15 41 125
≥ 15 and < 35 138 263
≥ 35 120 284
Vegetable intake, servings/d
< 5 373 842
≥ 5 56 216
Body mass index, kg/m2
≤ 24.9 110 110 337
25.0 to ≤ 30 205 221 514
> 30 114 131 207
Table A6.

Cases and Controls for Covariates Used in Relative Risk Estimation (in Table 3) for Proximal, Distal, and Rectal Cancers for White Women Age ≥ 50 Years

Variable No.
Proximal Cases Distal Cases Proximal and Distal Controls Rectal Cases Rectal Controls
Sigmoidoscopy and/or colonoscopy and polyp history
Sigmoidoscopy and/or colonoscopy in last 10 years, and no history of polyps 66 35 269 32 110
No sigmoidoscopy and/or colonoscopy in last 10 years 220 225 493 217 248
Sigmoidoscopy and/or colonoscopy in last 10 years and history of polyps 31 26 45 17 15
Sigmoidoscopy and/or colonoscopy and polyps unknown 54 48 109 1 8
No. of relatives with CRC
0 308 285 811 39 352
1 55 41 93 37 39
≥ 2 8 8 12
Current leisure-time activity, h/wk
0 152 400 110 120
> 0 and ≤ 2 112 351 101 166
> 2 and ≤ 4 40 68 29 48
> 4 30 97 27 47
Aspirin/NSAID use
Nonuser 235 207 462 149 169
Regular user 136 127 454 118 212
Vegetable intake, servings/d
< 5 317 743
≥ 5 54 173
Body mass index, kg/m2
≤ 29.9 246 726 192 298
≥ 30 88 190 75 83
Age ≤ 65 > 65 years 139,195 287,629
Estrogen status within the last 2 years
Negative 270 236 593 153 169
Positive 101 98 323 114 212

Footnotes

Authors' disclosures of potential conflicts of interest and author contributions are found at the end of this article.

AUTHORS' DISCLOSURES OF POTENTIAL CONFLICTS OF INTEREST

The author(s) indicated no potential conflicts of interest.

AUTHOR CONTRIBUTIONS

Conception and design: Andrew N. Freedman, Martha L. Slattery, Rachel Ballard-Barbash, Mitchell H. Gail, Ruth M. Pfeiffer

Financial support: Andrew N. Freedman, Rachel Ballard-Barbash

Administrative support: Andrew N. Freedman, Martha L. Slattery, Rachel Ballard-Barbash, David Pee

Provision of study materials or patients: Martha L. Slattery, Bette J. Cann

Collection and assembly of data: Martha L. Slattery, Gordon Willis, Bette J. Cann

Data analysis and interpretation: Andrew N. Freedman, Martha L. Slattery, Rachel Ballard-Barbash, Gordon Willis, Bette J. Cann, David Pee, Mitchell H. Gail, Ruth M. Pfeiffer

Manuscript writing: Andrew N. Freedman, Martha L. Slattery, Rachel Ballard-Barbash, Gordon Willis, Bette J. Cann, Mitchell H. Gail, Ruth M. Pfeiffer

Final approval of manuscript: Andrew N. Freedman, Martha L. Slattery, Rachel Ballard-Barbash, Gordon Willis, Bette J. Cann, David Pee, Mitchell H. Gail
