A systematic assessment of normalization approaches for the Infinium 450K methylation platform

Michael C Wu et al. Epigenetics. 2014 Feb.

Abstract

The Illumina Infinium HumanMethylation450 BeadChip has emerged as one of the most popular platforms for genome-wide profiling of DNA methylation. While the technology is widespread, systematic technical biases are believed to be present in the data. For example, this array incorporates two different chemical assays, i.e., Type I and Type II probes, which exhibit different technical characteristics and potentially complicate the computational and statistical analysis. Several normalization methods have been introduced recently to adjust for possible biases. However, there is considerable debate within the field about which normalization procedure should be used and indeed whether normalization is even necessary. Yet, despite the importance of the question, there has been little comprehensive comparison of normalization methods. We sought to systematically compare several popular normalization approaches, using as a case study the Norwegian Mother and Child Cohort Study (MoBa) methylation data set and the technical replicates analyzed with it. We assessed both the reproducibility between technical replicates following normalization and the effect of normalization on association analysis. Results indicate that the raw data are already highly reproducible; some normalization approaches can slightly improve reproducibility, whereas others may introduce additional variability into the data. Results also suggest that differences in association analysis across normalizations are not large when the signal is strong, but when the signal is more modest, different normalizations can yield very different numbers of findings that meet a weaker statistical significance threshold. Overall, our work provides a useful, objective assessment of the effectiveness of key normalization methods.

Keywords: association testing; cotinine exposure; genome wide methylation profiling; normalization; reproducibility.
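The contrast between strong and modest signals comes down to how many probes clear a given threshold. As a minimal sketch, assume each normalization method has already produced one p-value per probe from the same association test; the code counts hits under a strict Bonferroni threshold and under a looser Benjamini-Hochberg FDR threshold. Both cutoffs (alpha = q = 0.05) and all function names are illustrative assumptions, not the study's exact criteria.

```python
def bonferroni_hits(pvalues, alpha=0.05):
    """Count probes whose p-value passes a Bonferroni-corrected threshold."""
    m = len(pvalues)
    return sum(p <= alpha / m for p in pvalues)


def bh_hits(pvalues, q=0.05):
    """Count probes declared significant by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    ranked = sorted(pvalues)
    # Largest rank k with p_(k) <= (k / m) * q; the k smallest p-values are all rejected.
    k_max = 0
    for k, p in enumerate(ranked, start=1):
        if p <= k / m * q:
            k_max = k
    return k_max


def compare_methods(pvalues_by_method, alpha=0.05, q=0.05):
    """Map each (hypothetical) method name to (strict count, lenient count)."""
    return {
        method: (bonferroni_hits(p, alpha), bh_hits(p, q))
        for method, p in pvalues_by_method.items()
    }
```

Under the results above, the strict counts would be expected to differ little between methods when the signal is strong, while the lenient counts are where divergence between normalizations can appear.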

Figures

Figure 1. Boxplots of the pairwise Pearson correlation estimates between duplicate pairs, computed using (A) all probes, (B) Type I probes only, and (C) Type II probes only, after application of each normalization method. Note that the y-axis scale of the center panel (B) is considerably narrower, reflecting the overall better reproducibility of Type I probes.

Figure 2. Boxplots of the pairwise 99th-QAD between duplicate pairs, computed using (A) all probes, (B) Type I probes only, and (C) Type II probes only, after application of each normalization method. Note that the y-axis scale of the center panel (B) is considerably narrower, reflecting the overall better reproducibility of Type I probes.
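The caption does not spell out 99th-QAD. This sketch assumes it is the 99th percentile of the absolute probe-wise differences between the two arrays of a duplicate pair, which would track the worst-behaved 1% of probes rather than the average. The function name, the `pct` parameter, and linear interpolation via `statistics.quantiles(..., method="inclusive")` are all assumptions.

```python
from statistics import quantiles


def qad(rep_a, rep_b, pct=99):
    """Percentile of absolute probe-wise differences between two replicates.

    Assumes 99th-QAD means the 99th percentile of |a_i - b_i| over probes;
    interpolation between order statistics is an illustrative choice.
    """
    if len(rep_a) != len(rep_b):
        raise ValueError("replicates must cover the same probes")
    if not 1 <= pct <= 99:
        raise ValueError("pct must be between 1 and 99")
    diffs = [abs(a - b) for a, b in zip(rep_a, rep_b)]
    # quantiles(..., n=100) returns the 1st..99th percentiles; index pct - 1 selects one.
    return quantiles(diffs, n=100, method="inclusive")[pct - 1]
```

Unlike a correlation, this value grows as reproducibility worsens, so lower boxes indicate better agreement between duplicates.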

Figure 3. Standard deviations of the probe intensities across technical replicates for two different adult DNA samples after applying each normalization method.
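Per-probe variability across technical replicates of one DNA sample can be sketched with the standard library, assuming each replicate is a list of values in a shared probe order. The function name is hypothetical, and using the sample standard deviation (n - 1 denominator) is an assumption; the caption does not state the estimator.

```python
from statistics import stdev


def per_probe_sd(replicates):
    """Standard deviation of each probe across technical replicates of one sample.

    replicates: list of replicate profiles, each a list of values in the same probe order.
    Returns one sample standard deviation per probe.
    """
    if len(replicates) < 2:
        raise ValueError("need at least two replicates")
    n_probes = len(replicates[0])
    if any(len(r) != n_probes for r in replicates):
        raise ValueError("all replicates must cover the same probes")
    # zip(*replicates) walks the probes, yielding each probe's values across replicates.
    return [stdev(values) for values in zip(*replicates)]
```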

Figure 4. (A) Comparison of the distribution of pairwise Pearson correlations between duplicate pairs and non-duplicate pairs following application of each normalization method. Correlations for non-duplicate pairs are represented by shaded boxes and are lower across all methods. (B) Comparison of the distribution of 99th-QADs between duplicate pairs and non-duplicate pairs following application of each normalization method. 99th-QADs between non-duplicate pairs are represented by the shaded boxes and are higher across all methods.

Figure 5. Comparison of the density plots for adjacent Type I (blue lines) and Type II (red lines) probes in two different samples following application of each normalization approach.
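Comparing Type I and Type II densities for adjacent probes first requires pairing the probes. The caption does not define adjacency, so this sketch assumes each Type I probe is paired with its nearest Type II probe on the same chromosome, provided the two lie within a hypothetical `max_distance` of 50 bp. The input tuple layout and function name are likewise assumptions. The density curves would then be estimated from each sample's values at the paired probes.

```python
from bisect import bisect_left


def adjacent_type_pairs(probes, max_distance=50):
    """Pair each Type I probe with its nearest Type II probe on the same chromosome.

    probes: iterable of (probe_id, chrom, position, probe_type) tuples, where
        probe_type is "I" or "II" (hypothetical layout).
    max_distance: hypothetical adjacency cutoff in base pairs.
    Returns a list of (type_i_id, type_ii_id) tuples.
    """
    probes = list(probes)  # iterated twice below
    # Index Type II probes by chromosome, sorted by position.
    type_ii = {}
    for pid, chrom, pos, ptype in probes:
        if ptype == "II":
            type_ii.setdefault(chrom, []).append((pos, pid))
    for entries in type_ii.values():
        entries.sort()
    starts = {chrom: [pos for pos, _ in entries] for chrom, entries in type_ii.items()}

    pairs = []
    for pid, chrom, pos, ptype in probes:
        if ptype != "I" or chrom not in type_ii:
            continue
        entries = type_ii[chrom]
        i = bisect_left(starts[chrom], pos)
        # The nearest Type II probe sits just before or at/after the insertion point.
        candidates = [entries[k] for k in (i - 1, i) if 0 <= k < len(entries)]
        best_pos, best_id = min(candidates, key=lambda e: abs(e[0] - pos))
        if abs(best_pos - pos) <= max_distance:
            pairs.append((pid, best_id))
    return pairs
```

Binary search over sorted positions keeps each lookup logarithmic, which matters at the scale of roughly 450,000 probes.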

Figure 6. Comparison of the mean absolute difference between adjacent Type I and Type II probes across 40 samples after applying each method. Adjacent probes are believed to behave similarly and should exhibit similar distributions.
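The summary in this figure reduces to a per-sample mean of absolute differences across adjacent probe pairs. A sketch, assuming a precomputed list of (Type I ID, Type II ID) pairs and one dict of probe values per sample; the names are hypothetical. Since adjacent probes are expected to behave similarly, a smaller mean would suggest that a normalization has better reconciled the two probe types.

```python
from statistics import mean


def mean_abs_adjacent_diff(sample_values, pairs):
    """Mean |Type I - Type II| over adjacent probe pairs for one sample.

    sample_values: dict mapping probe ID -> methylation value for this sample.
    pairs: list of (type_i_id, type_ii_id) adjacent probe pairs.
    """
    diffs = [abs(sample_values[i] - sample_values[j]) for i, j in pairs]
    return mean(diffs)


def per_sample_summary(samples, pairs):
    """Apply mean_abs_adjacent_diff to every sample (e.g. the 40 samples per method)."""
    return {sid: mean_abs_adjacent_diff(vals, pairs) for sid, vals in samples.items()}
```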

Figure 7. (A) Pairwise Pearson correlations between CpG island-level aggregate values for duplicate pairs. (B) Comparison of the distribution of pairwise Pearson correlations between CpG island-level aggregate values for duplicate pairs and for non-duplicate pairs following application of each normalization method. Correlations for non-duplicate pairs are represented by shaded boxes and are lower across all methods.
