A systematic assessment of normalization approaches for the Infinium 450K methylation platform
Michael C Wu et al. Epigenetics. 2014 Feb.
Abstract
The Illumina Infinium HumanMethylation450 BeadChip has emerged as one of the most popular platforms for genome-wide profiling of DNA methylation. While the technology is widespread, systematic technical biases are believed to be present in the data. For example, the array incorporates two different chemical assays, Type I and Type II probes, which exhibit different technical characteristics and potentially complicate computational and statistical analysis. Several normalization methods have been introduced recently to adjust for possible biases. However, there is considerable debate within the field about which normalization procedure should be used and, indeed, whether normalization is even necessary. Despite the importance of the question, there has been little comprehensive comparison of normalization methods. We sought to systematically compare several popular normalization approaches, using the Norwegian Mother and Child Cohort Study (MoBa) methylation data set and the technical replicates analyzed with it as a case study. We assessed both the reproducibility between technical replicates following normalization and the effect of normalization on association analysis. Results indicate that the raw data are already highly reproducible: some normalization approaches can slightly improve reproducibility, but others may introduce more variability into the data. Results also suggest that differences in association analysis after applying different normalizations are not large when the signal is strong; when the signal is more modest, however, different normalizations can yield very different numbers of findings that meet a weaker statistical significance threshold. Overall, our work provides a useful, objective assessment of the effectiveness of key normalization methods.
Keywords: association testing; cotinine exposure; genome wide methylation profiling; normalization; reproducibility.
Figures
Figure 1. Boxplots of the pairwise Pearson correlation estimates between duplicate pairs constructed using (A) all probes, (B) just Type I probes, and (C) just Type II probes after application of each normalization method. Note that the scale of the y-axis for the center panel is considerably narrower, which reflects the overall better reproducibility for Type I probes.
Figure 2. Boxplots of the pairwise 99th-QAD between duplicate pairs constructed using (A) all probes, (B) just Type I probes, and (C) just Type II probes after application of each normalization method. Note that the scale of the y-axis for the center panel is considerably narrower, which reflects the overall better reproducibility for Type I probes.
Figure 3. Standard deviations of the probe intensities across technical replicates for two different adult DNA samples after applying each normalization method.
Figure 4. (A) Comparison of the distribution of pairwise Pearson correlations between duplicate pairs and non-duplicate pairs following application of each normalization method. Correlations for non-duplicate pairs are represented by shaded boxes and are lower across all methods. (B) Comparison of the distribution of 99th-QADs between duplicate pairs and non-duplicate pairs following application of each normalization method. 99th-QADs between non-duplicate pairs are represented by the shaded boxes and are higher across all methods.
Figure 5. Comparison of the density plots for adjacent Type I (blue lines) and Type II (red lines) probes in two different samples following application of each normalization approach.
Figure 6. Comparison of the mean absolute difference between adjacent Type I and Type II probes across 40 samples after applying each method. Adjacent probes are believed to behave similarly and should exhibit similar distributions.
Figure 7. (A) Pairwise Pearson correlation between CpG island level aggregate values for duplicate pairs. (B) Comparison of the distribution of pairwise Pearson correlations between CpG island level aggregate values for duplicate pairs and non-duplicate pairs following application of each normalization method. Correlations for non-duplicate pairs are represented by shaded boxes and are lower across all methods.
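The reproducibility summaries named in the figure captions (pairwise Pearson correlation, 99th-QAD, and mean absolute difference) can be illustrated with a minimal sketch. This is not the authors' code. It assumes that "99th-QAD" denotes the 99th percentile of the absolute differences between two paired measurement vectors, which this page does not define; the function names and the toy beta values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def quantile_abs_diff(x, y, q=0.99):
    """q-th quantile of |x_i - y_i| (assumed reading of 'QAD').

    Uses linear interpolation between neighbouring order statistics.
    """
    diffs = sorted(abs(a - b) for a, b in zip(x, y))
    pos = q * (len(diffs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return diffs[lo] + (diffs[hi] - diffs[lo]) * (pos - lo)

def mean_abs_diff(x, y):
    """Mean of |x_i - y_i|, e.g. for paired adjacent probes."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)

# Hypothetical beta values for one duplicate pair (not data from the study).
rep_a = [0.10, 0.85, 0.50, 0.92, 0.05]
rep_b = [0.12, 0.83, 0.47, 0.95, 0.06]

print("Pearson r:", pearson(rep_a, rep_b))
print("99th-QAD :", quantile_abs_diff(rep_a, rep_b, q=0.99))
print("Mean AD  :", mean_abs_diff(rep_a, rep_b))
```

Under this reading, a normalization method that improves reproducibility would raise the correlation and lower the 99th-QAD for duplicate pairs relative to the raw data.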
Grants and funding
- P30 ES010126/ES/NIEHS NIH HHS/United States
- R01HD058008/HD/NICHD NIH HHS/United States
- R01 HD058008/HD/NICHD NIH HHS/United States
- U01 NS047537/NS/NINDS NIH HHS/United States
- Z01-ES-49019/ES/NIEHS NIH HHS/United States
- P30ES010126/ES/NIEHS NIH HHS/United States
- ES-75558/ES/NIEHS NIH HHS/United States
- 1 U01 NS 047537-01/NS/NINDS NIH HHS/United States
- N01 ES075558/ES/NIEHS NIH HHS/United States
- Z01 ES049019/ImNIH/Intramural NIH HHS/United States