Effect of long-term caloric restriction on DNA methylation measures of biological aging in healthy adults from the CALERIE trial
We conducted new DNAm assays of stored blood biospecimens collected from the CALERIE Phase 2 randomized controlled trial and merged these data with existing secondary data from the trial. The assays of the biospecimens were conducted blind to the conditions of the trial. Details of trial design and the collection of other trial data were reported previously10,26.
Study design and participants
CALERIE Phase 2 was a multi-center, randomized controlled trial conducted at three clinical centers in the United States10 (ClinicalTrials.gov Identifier: NCT00427193). It aimed to evaluate the time-course effects of 25% CR (that is, intake 25% below the individual’s baseline level) over a 2-yr period in healthy adults (men aged 21–50 yr, premenopausal women aged 21–47 yr) with BMI in the normal weight or slightly overweight range (BMI 22.0–27.9 kg m−2). The study protocol was approved by Institutional Review Boards at three clinical centers (Washington University School of Medicine, St Louis, MO, USA; Pennington Biomedical Research Center, Baton Rouge, LA, USA; Tufts University, Boston, MA, USA) and the coordinating center at Duke University (Durham, NC, USA). All study participants provided written, informed consent. Nongenomic data were obtained from the CALERIE Biorepository (https://calerie.duke.edu/apply-samples-and-data-analysis).
Randomization and masking
After baseline testing, participants were randomly assigned at a ratio of 2:1 to a CR behavioral intervention or to an AL control group. Randomization was stratified by site, sex and BMI. A permuted block randomization technique was used.
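As an illustration of the allocation scheme described above, the following is a minimal sketch of stratified permuted-block randomization at a 2:1 ratio. The block size of three, the site labels, the stratum size and the seeds are assumptions made for the example; the trial's actual block size and allocation software are not stated here.

```python
import random

def permuted_block_assignments(n_participants, block=("CR", "CR", "AL"), seed=0):
    """Assign arms within one stratum using shuffled blocks (2:1 CR:AL)."""
    rng = random.Random(seed)
    assignments = []
    while len(assignments) < n_participants:
        shuffled = list(block)
        rng.shuffle(shuffled)  # permute the arm order within this block
        assignments.extend(shuffled)
    return assignments[:n_participants]

# One independent allocation sequence per stratum (site x sex x BMI category).
# Site labels "A", "B", "C" are placeholders, not the trial's site codes.
strata = [(site, sex, bmi)
          for site in ("A", "B", "C")
          for sex in ("F", "M")
          for bmi in ("normal", "overweight")]
schedule = {stratum: permuted_block_assignments(12, seed=i)
            for i, stratum in enumerate(strata)}
```

Because every complete block contains two CR slots and one AL slot, the 2:1 ratio holds within each stratum whenever the stratum size is a multiple of the block size.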
Procedures
Study procedures were published previously10,21,26 and are described here in brief. Participants in the CR group were prescribed a 25% restriction in calorie intake based on energy requirements estimated from two DLW measurement periods at baseline. Participants were provided three meals per day for 27 d to familiarize themselves with portion sizes for a 25% reduced calorie intake; meals included eating plans modified to suit various cultural preferences. Participants also received instruction on the essentials of CR. Finally, participants were provided with intensive group and individual behavioral counseling sessions once a week, with 24 group and individual counseling sessions over the first 24 weeks of the intervention. Adherence to the CR intervention was estimated in real time by the degree to which individual weight change followed a predicted weight loss trajectory (15.5% weight loss at 1 yr followed by weight loss maintenance). The precise level of CR achieved was quantified retrospectively by calculating energy intake during the CR intervention and comparing it with baseline energy intake. Energy intake during the 2-yr trial was quantified from total daily energy expenditure (assessed during 2-week DLW periods every 6 months) and changes in body composition (that is, fat mass and fat-free mass). Participants assigned to the AL group continued on their regular diets; they received no specific dietary intervention or counseling. They had quarterly contact with study investigators to complete the assessments.
Quantification of %CR
Mean %CR was calculated at each of the follow-up timepoints as percentage decrease in energy intake relative to baseline using the equation %CR_mean = (1 − EI_mean/EI_BL) × 100 (ref. 21). EI_BL was defined as total energy expenditure (TEE) at preintervention baseline and EI_mean was defined as the average of TEE across all follow-up visits through the visit at which %CR was calculated. TEE was measured by the DLW method during two consecutive 2-week periods at baseline and during 2-week periods at months 6, 12, 18 and 24 in the CR group10,44.
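The %CR formula above can be sketched as a short function. The function name and the energy values below are hypothetical and serve only to show the arithmetic.

```python
def percent_cr(ei_baseline, ei_followups):
    """%CR = (1 - EI_mean / EI_BL) * 100.

    ei_baseline: baseline energy intake estimate (kcal per day).
    ei_followups: estimates from all follow-up visits up to and
    including the visit at which %CR is calculated.
    """
    ei_mean = sum(ei_followups) / len(ei_followups)
    return (1.0 - ei_mean / ei_baseline) * 100.0

# Hypothetical participant: 2,500 kcal/d at baseline, then 2,100 and
# 2,200 kcal/d at the month-6 and month-12 visits.
cr_12m = percent_cr(2500.0, [2100.0, 2200.0])
```

In this example the mean follow-up intake is 2,150 kcal/d, which corresponds to approximately 14% CR at 12 months.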
DNAm data
DNA extracted from blood samples was obtained from the CALERIE Biorepository at the University of Vermont. DNAm data were generated by the Kobor Lab at the University of British Columbia and processed by the Genomic Analysis and Bioinformatics Shared Resource at Duke University. Illumina Infinium Methylation EPIC BeadChip arrays were used to assay genome-wide DNAm data from banked DNA samples extracted from blood collected at the baseline, 12-month and 24-month follow-ups. The EPIC array quantifies DNAm levels at >850,000 CpG sites across all known genes, regions and key regulatory regions. Briefly, 750-ng extracted DNA samples were bisulfite converted using the EZ DNA Methylation kit (Zymo Research), and 160 ng of the converted DNA was used as input for the EPIC arrays (Illumina). EPIC arrays were processed according to the manufacturer’s instructions and scanned using the Illumina iScan platform. To the extent possible, baseline, 12-month and 24-month samples from the same individual were processed in the same array batch and on the same BeadChip to minimize batch effects; CR treatment and AL control participants were included on all chips. Quality control and normalization analyses were performed using the methylumi (v.2.32.0)51 Bioconductor (v.2.46.0)52 package for the R statistical programming environment (v.3.6.3). Probes were considered missing in a sample if they had detection P values >0.05 and were excluded from the analysis if they were missing in >5% of samples. Normalization to eliminate systematic dye bias in 2-channel probes was carried out using the methylumi default method. Following quality control and normalization, DNAm data for 828,613 CpGs were available for n = 595 samples (baseline n = 214; 12 months n = 193; 24 months n = 188). Additional batch correction was performed by residualizing DNAm measurements for PCs estimated from array control-probe beta values53.
Cell count estimation was performed using the Houseman equation via the minfi and FlowSorted.Blood.EPIC R packages28,54.
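The probe-level missingness rule described in this section (detection P > 0.05 counts as missing; probes missing in more than 5% of samples are excluded) can be sketched as follows. This is an illustrative NumPy version with toy data, not the methylumi code used in the study; the function name and array layout are assumptions.

```python
import numpy as np

def filter_probes(detection_p, p_threshold=0.05, max_missing_frac=0.05):
    """Return a boolean mask of probes to keep.

    detection_p: array of shape (n_probes, n_samples) holding
    detection P values.
    """
    missing = detection_p > p_threshold   # probe counted as missing in a sample
    missing_frac = missing.mean(axis=1)   # share of samples missing, per probe
    return missing_frac <= max_missing_frac

# Toy data: 3 probes x 40 samples, all well detected by default.
det_p = np.full((3, 40), 0.001)
det_p[1, :3] = 0.2   # probe 1 missing in 3/40 = 7.5% of samples: excluded
det_p[2, :1] = 0.2   # probe 2 missing in 1/40 = 2.5% of samples: retained
keep = filter_probes(det_p)
```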
DNAm clocks and pace-of-aging measures
DNAm clocks are algorithms that combine information from DNAm measurements across the genome to quantify variation in biological age55.
The first-generation DNAm clocks were developed from machine-learning analyses comparing samples from individuals of different chronological age. These clocks were highly accurate in predicting the chronological age of new samples and also showed some capacity for predicting differences in mortality risk, although effect sizes tend to be small and inconsistent across studies56,57,58. We analyzed the first-generation clocks proposed by Horvath (Horvath clock) and Hannum et al. (Hannum clock)56,57.
The second-generation DNAm clocks were developed with the goal of improving quantification of biological aging by focusing on differences in mortality risk instead of on differences in chronological age22,23. These clocks also include an intermediate step in which DNAm data are fitted to physiological parameters. The second-generation clocks are more predictive of morbidity and mortality as compared with the first-generation clocks59 and are proposed to have improved potential for testing impacts of interventions to slow aging14. We analyzed the second-generation clocks proposed by Levine et al. (PhenoAge clock) and Lu et al. (GrimAge clock)22,23.
A limitation of several DNAm clocks is that when residualized for chronological age, values show only moderate test–retest reliability across technical replicates. Test–retest reliability is a critical feature of measurements used to evaluate the impact of intervention because change from preintervention to postintervention cannot be distinguished from technical noise unless reliability is high. To improve technical reliability, Higgins-Chen and colleagues developed a new computational method that retrained DNAm clocks using DNAm PCs25. The resulting ‘PC clocks’ demonstrate exceptional test–retest reliability across technical replicates.
A third generation of DNAm measures of aging is referred to as pace-of-aging measures. In contrast to first- and second-generation DNAm clocks, which aim to quantify how much aging has occurred up to the time of measurement, pace-of-aging measures aim to quantify how fast the process of aging-related deterioration of system integrity is proceeding. We analyzed the newest pace-of-aging measure, DunedinPACE, which is shorthand for ‘Pace of Aging Computed from the Epigenome’24. DunedinPACE was developed by modeling within-individual multi-system physiological change across four timepoints in same-age individuals in the Dunedin Study 1972–1973 birth cohort60,61, when participants were aged 26, 32, 38 and 45 yr. DunedinPACE was developed from analysis of a pace-of-aging composite of slopes of aging-related change in the following physiological measures: ApoB100/ApoA1 ratio, BMI, blood urea nitrogen, high-sensitivity C-reactive protein, cardiorespiratory fitness, dental caries experience, total cholesterol, forced expiratory volume in 1 second, forced expiratory volume in 1 second/forced vital capacity ratio, estimated glomerular filtration rate, hemoglobin A1C, high-density lipoprotein cholesterol, leptin, lipoprotein(a), mean arterial pressure, mean periodontal attachment loss, triglycerides, waist-to-hip ratio and white blood cell count. Slopes of change were estimated from four repeated measurements collected over a period of two decades. This physiological pace-of-aging composite is described in detail in ref. 61. The DunedinPACE DNAm algorithm was derived from elastic net regression of the physiological pace-of-aging composite on Illumina EPIC array DNAm data derived from blood samples collected at the age 45 follow-up assessment. The set of CpG sites included in the DNAm dataset used to develop the DunedinPACE algorithm was restricted to those showing acceptable test–retest reliability as determined in the analysis in ref. 62.
The DunedinPACE DNAm algorithm is described in detail in ref. 24.
Our primary analysis focused on the PC versions of the PhenoAge and GrimAge second-generation clocks and DunedinPACE, all of which show exceptional test–retest reliability in technical replicates. We report results for both original and PC versions of DNAm clocks in the Supplementary Information.
Analysis
Analysis included all participants with available DNAm data at trial baseline and at least one follow-up timepoint.
We computed change scores for all aging measures by comparing values at the 12-month and 24-month follow-up assessments with baseline values (that is, 12-month change = 12-month value − baseline; 24-month change = 24-month value − baseline). We conducted analyses of these change scores to test the hypothesis that CR slows biological aging using two complementary approaches: (1) we conducted ITT analysis which compared change scores between participants randomized to CR intervention and the AL control group; (2) we conducted TOT analysis using IV methods to estimate the effect of CR on change scores.
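The change-score definition above amounts to a simple difference from baseline at each available follow-up. A minimal sketch, with a hypothetical function name and made-up DunedinPACE values:

```python
def change_scores(baseline, month12=None, month24=None):
    """Return follow-up minus baseline for each available timepoint."""
    scores = {}
    if month12 is not None:
        scores["12m"] = month12 - baseline
    if month24 is not None:
        scores["24m"] = month24 - baseline
    return scores

# Hypothetical participant with a missing 24-month sample.
example = change_scores(baseline=1.00, month12=0.97)
```

A participant with only one follow-up still contributes a change score for that timepoint, consistent with the inclusion rule of baseline plus at least one follow-up.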
In ITT analysis, we tested the effect of randomization to CR versus AL on aging measure change scores using repeated-measures ANCOVA implemented under mixed models, following the approach used in past CALERIE analysis26. The model included terms for treatment condition (CR or AL), follow-up time, an interaction term modeling heterogeneity in the treatment effect between the 12- and 24-month follow-ups, the baseline level of the aging measure and the following pretreatment covariates: chronological age, sex, race/ethnicity (Black, White, Other), BMI stratum at randomization (normal weight (22.0–24.9 kg m−2) and overweight (25.0–27.9 kg m−2)) and study site. Models were fitted using the Stata software’s ‘mixed’ command. Details of estimation and calculation of confidence intervals are reported in Stata’s documentation of the command63.
In TOT analysis, we tested the effect of the CR intervention on aging measure change scores using IV regression implemented using a two-stage least squares approach64. The first-stage regression modeled CR treatment dose as a function of randomization condition (CR versus AL) and pretreatment characteristics (chronological age, sex, race/ethnicity, BMI, study site and baseline value of the biological aging measure). The model instruments were randomization condition and interactions of randomization condition with sex and pretreatment values of BMI and the biological aging measure. The second-stage regression modeled aging measure change scores as a function of the CR treatment dose estimated from the first-stage regression and pretreatment covariates. Separate models were fitted for the 12- and 24-month follow-ups. IV regression models were fitted using the Stata 16.0 software’s ‘ivregress’ command. Details of estimation and calculation of confidence intervals are reported in Stata’s documentation of the command65. TOT models are described in detail below.
In ITT and TOT analyses, effect sizes were scaled in standardized units according to the distribution of the aging measures at pretreatment baseline. For the DNAm clocks, clock ages were differenced from chronological ages and standard deviations for these age-difference values were used for scaling. For DunedinPACE, the standard deviations of the original values were used for scaling. Treatment effects denominated in these standardized units are interpreted as Cohen’s d.
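The scaling described above can be sketched as dividing a raw treatment effect by the baseline standard deviation of the relevant quantity. The values below are invented, and the use of the sample (n − 1) standard deviation is an assumption, since the text does not specify the estimator.

```python
import statistics

def standardized_effect(raw_effect, baseline_values):
    """Scale a raw effect by the baseline s.d. (sample s.d. assumed)."""
    return raw_effect / statistics.stdev(baseline_values)

# DNAm clocks: scale by the s.d. of (clock age - chronological age).
clock_ages = [40.0, 35.0, 52.0, 47.0]   # hypothetical baseline clock ages
chron_ages = [38.0, 36.0, 50.0, 44.0]   # hypothetical chronological ages
age_diff = [c - a for c, a in zip(clock_ages, chron_ages)]
d_clock = standardized_effect(-0.5, age_diff)   # raw effect in years

# DunedinPACE: scale by the s.d. of the original baseline values.
pace_baseline = [0.9, 1.0, 1.1, 1.0]    # hypothetical baseline values
d_pace = standardized_effect(-0.02, pace_baseline)
```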
Specification of TOT regression models
We tested TOT effects using two-stage least squares IV regression. IV regression is a method commonly used to reduce the impact of confounding in association analysis. It can also be applied to account for contamination/nonadherence in randomized trials64. Under conditions of nonadherence, traditional ITT analysis can result in a biased estimate of the treatment effect and an IV estimator can provide a complement66. In CALERIE, adherence was imperfect; the average CR achieved in the treatment group was roughly half the prescribed dose of 25% (ref. 10). The ITT estimate may therefore underestimate the effect of CR on biological aging.
In our analysis, we used IV regression to estimate the effect of 20% CR on change in measures of biological aging. We focused on a CR dose of 20% instead of the 25% dose prescribed in the trial because few individuals achieved 25% CR, especially through the 24-month follow-up. The 20% CR level represented the 75th percentile of the treatment group CR distribution at 12-month follow-up and the 87th percentile of the treatment group CR distribution at 24-month follow-up.
The IV approach we used involved two related regressions. The first regression modeled observed treatment dose (%CR relative to baseline) on pretreatment characteristics and the instrument of randomization condition. The second regression modeled the outcomes (changes in measures of biological aging) as functions of the predicted treatment dose estimated by the first regression and pretreatment covariates.
We developed our IV regression model by first modeling intervention group participants’ achieved CR treatment dose as a function of pretreatment covariates: chronological age, sex, BMI, study site. We fitted a saturated regression model including interactions among all pretreatment characteristics and additional covariate adjustment for race/ethnicity, which was included only as a main effect. (Race/ethnicity was omitted from the interaction terms because there was insufficient site- and sex-specific variation in race/ethnicity to fit models.) This analysis identified sex, baseline BMI and their interaction as statistically significant predictors of CR dose at the alpha = 0.05 level.
Next, we parameterized our IV regression specifying the first stage to include the ‘instruments’ of intervention group and interactions of intervention group with sex, pretreatment BMI and a three-way interaction between intervention condition, sex and pretreatment BMI. The base first-stage regression took the form

$$\%\mathrm{CR}_{t} = a + \mathrm{CR} + \mathrm{CR} \times \mathrm{sex} + \mathrm{CR} \times \mathrm{BMI}_{\mathrm{baseline}} + \mathrm{CR} \times \mathrm{sex} \times \mathrm{BMI}_{\mathrm{baseline}} + X + e \qquad (1)$$

in which %CR_t is the %CR relative to baseline achieved at time t (either 12- or 24-month follow-up), BMI_baseline is pretreatment BMI, X is a matrix of all pretreatment covariates, a is a model intercept and e is the error term. Results from this first-stage regression were then included in the second-stage model:

$$\Delta\mathrm{BA}_{t} = a + \%\mathrm{CR}_{t} + X + e \qquad (2)$$

in which %CR_t is %CR predicted from equation (1). For final TOT analysis, we included a further instrument in the first-stage regression consisting of the interaction between the baseline level of the aging measure and the CR treatment group. Sensitivity analysis involving re-estimating the IV regression models omitting this final instrument did not change results.
Supplementary Fig. 1 plots predicted values of %CR based on our base first-stage model (that is, the model in equation (1)).
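The two-stage logic of equations (1) and (2) can be illustrated with a small simulation. This is a sketch under stated assumptions, not the Stata ‘ivregress’ analysis: the data are simulated, all variable names and effect sizes are invented, and the function returns point estimates only (valid two-stage least squares standard errors require a correction that is omitted here).

```python
import numpy as np

def two_stage_least_squares(y, dose, instruments, covariates):
    """2SLS point estimates, ordered [intercept, dose, covariates...]."""
    ones = np.ones((len(y), 1))
    # First stage: regress observed dose on instruments and covariates.
    Z = np.hstack([ones, instruments, covariates])
    gamma, *_ = np.linalg.lstsq(Z, dose, rcond=None)
    dose_hat = Z @ gamma
    # Second stage: regress the outcome on predicted dose and covariates.
    X = np.hstack([ones, dose_hat[:, None], covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Simulated trial (all quantities hypothetical).
rng = np.random.default_rng(0)
n = 5000
arm = rng.binomial(1, 2 / 3, n).astype(float)   # 1 = CR, 0 = AL
sex = rng.binomial(1, 0.7, n).astype(float)
bmi = rng.uniform(22.0, 27.9, n) - 25.0         # centered baseline BMI
u = rng.normal(size=n)                          # unobserved adherence factor
dose = arm * (10.0 + 2.0 * sex + 0.5 * bmi + 3.0 * u)   # achieved %CR
true_effect = -0.05
y = true_effect * dose + 0.5 * u + rng.normal(scale=0.5, size=n)

# Instruments mirror equation (1): arm and its interactions.
instruments = np.column_stack([arm, arm * sex, arm * bmi, arm * sex * bmi])
covariates = np.column_stack([sex, bmi])
beta = two_stage_least_squares(y, dose, instruments, covariates)
```

Because the unobserved factor u influences both achieved dose and the outcome, an ordinary regression of y on dose would be confounded; using randomization-based instruments recovers an estimate close to the simulated effect in beta[1].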
Statistics and reproducibility
We conducted new DNAm assays of stored blood biospecimens collected from the CALERIE Phase 2 randomized controlled trial and merged these data with existing secondary data from the trial. The assays of the biospecimens were conducted blind to the conditions of the trial. After baseline testing, n = 220 participants were randomly assigned at a ratio of 2:1 to a CR behavioral intervention or to an AL control group. Randomization was stratified by site, sex and BMI. A permuted block randomization technique was used. No statistical methods were used to predetermine sample sizes; we analyzed data from all participants for whom blood DNAm data were available at baseline and at least one follow-up timepoint (N = 197; CR n = 128, AL n = 69). Participants had mean age of 38 yr (s.d. = 7), 70% were women and 77% were white; there were no differences in age, sex or race/ethnicity between AL and CR at baseline (Table 1). Data met model assumptions. Normality of outcome variables was evaluated by visual inspection of distributions and the Shapiro–Wilk test67. Equality of variances was evaluated according to the tests proposed by Brown and Forsythe68 and Markowski and Markowski69. Models used to test ITT and TOT effects were fitted with heteroskedasticity-robust standard errors. Normality of distribution of error terms was evaluated by visual inspection of histograms of residuals and the Shapiro–Wilk test.
DNAm clocks
DNAm clock measures of aging are algorithms that estimate biological age, the state of an organism’s biology represented as the age at which that state would be typical in a reference population. The clocks we analyzed were developed to predict mortality risk. The age values computed by the clock algorithms correspond to the age at which predicted mortality risk would be approximately normal in the reference population used to develop the clock. We computed clock values based on versions of the clock algorithms developed from DNAm PCs (sometimes referred to as ‘PC clocks’)18,21.
PhenoAge clock
The PhenoAge clock was based on analysis of nine blood chemistry markers, age and mortality data from the US National Health and Nutrition Examination Surveys (n = 9,926 participants aged 18 yr and older; 23 yr of mortality follow-up); DNAm and blood chemistry data from the Invecchiare in Chianti (InCHIANTI) Study (n = 912 participants aged 21–100 yr); and the US Health and Retirement Study (n = 3,593 participants aged 51–100 yr)19.
GrimAge clock
The GrimAge clock was based on analysis of eight plasma protein markers, smoking pack years, age, sex and mortality data from the Framingham Heart Study Offspring and Gen3 Cohorts (n = 2,751 participants aged 24–92 yr)47,48,49.
Pace of aging
Pace-of-aging measures estimate the rate of biological aging, defined as the rate of decline in overall system integrity. Pace-of-aging values correspond to the years of biological aging experienced during a single calendar year. A value of 1 represents the typical pace of aging in a reference population; values above 1 indicate faster pace of aging; values below 1 indicate slower pace of aging.
DunedinPACE
Based on analysis of pace of aging in the Dunedin Study (n = 817 participants examined at ages 26, 32, 38 and 45 yr)24, pace of aging was measured from within-person change over time in 19 blood chemistry and organ function test metrics of system integrity24. DNAm was measured at age 45 yr.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.