Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays

Martin J Aryee et al. Bioinformatics. 2014.

Abstract

Motivation: The recently released Infinium HumanMethylation450 array (the '450k' array) provides a high-throughput assay to quantify DNA methylation (DNAm) at ∼450 000 loci across a range of genomic features. Although less comprehensive than high-throughput sequencing-based techniques, this product is more cost-effective and promises to be the most widely used DNAm high-throughput measurement technology over the next several years.

Results: Here we describe a suite of computational tools that incorporate state-of-the-art statistical techniques for the analysis of DNAm data. The software is structured to easily adapt to future versions of the technology. We include methods for preprocessing, quality assessment and detection of differentially methylated regions from the kilobase to the megabase scale. We show how our software provides a powerful and flexible development platform for future methods. We also illustrate how our methods empower the technology to make discoveries previously thought to be possible only with sequencing-based methods.

Availability and implementation: http://bioconductor.org/packages/release/bioc/html/minfi.html.

Contact: khansen@jhsph.edu; rafa@jimmy.harvard.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.

Beta density estimates for a typical sample showing type I (solid) and type II (dashed) loci located in CGIs, CGI shores, CGI shelves and open sea regions

Fig. 2.

Illustration of locus-collapsing procedure for block finding. Loci in CpG islands, shores, shelves and open sea regions are represented by green, orange, purple and pink, respectively. (A) The boxes represent locus groups, each of which is collapsed to a single mean methylation value. We group loci within the same CGI, the same CGI shore or the same CGI shelf, as well as adjacent open sea probes that are within 500 bp of each other. (B) The first row of points shows the midpoints of collapsed open sea clusters. These are grouped into long-range clusters and used for block finding. The second row of points shows all collapsed clusters across all region types with color representing region type
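The grouping rule in this caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the minfi implementation: the `Locus` record, its `region_id` field and the `collapse_loci` function are hypothetical names, loci are assumed to lie on a single chromosome, and "within 500 bp" is read as a gap of at most 500 bp between adjacent open sea probes.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Locus:
    pos: int                  # genomic position in bp (single chromosome assumed)
    region_type: str          # "island", "shore", "shelf" or "opensea"
    region_id: Optional[str]  # hypothetical ID of the CGI/shore/shelf; None for open sea
    beta: float               # methylation value at this locus


def _same_group(prev, cur, max_gap):
    """Decide whether two position-adjacent loci belong to the same group."""
    if prev.region_type == "opensea" and cur.region_type == "opensea":
        # Adjacent open sea probes are grouped when they are close together.
        return cur.pos - prev.pos <= max_gap
    # Otherwise both loci must sit in the same island, shore or shelf.
    return (prev.region_type == cur.region_type
            and prev.region_id is not None
            and prev.region_id == cur.region_id)


def collapse_loci(loci, max_gap=500):
    """Collapse each group to (midpoint, region_type, mean beta)."""
    groups, current = [], []
    for locus in sorted(loci, key=lambda l: l.pos):
        if current and _same_group(current[-1], locus, max_gap):
            current.append(locus)
        else:
            if current:
                groups.append(current)
            current = [locus]
    if current:
        groups.append(current)
    return [((g[0].pos + g[-1].pos) / 2, g[0].region_type, mean(l.beta for l in g))
            for g in groups]
```

Under these assumptions, block finding would then use only the collapsed groups whose type is `"opensea"`, as panel (B) describes.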

Fig. 3.

Accuracy and precision assessment of preprocessing algorithms. (A) For each locus, we compute the average and standard deviation across liver technical samples. The resulting loess curve fitted to the standard deviation versus average scatterplot for each method is shown. (B) Using the same samples, we compute the average difference between liver and placenta (effect size) for each locus. We then plot the resulting effect sizes for each preprocessing method against effect sizes from the default Illumina procedure
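The per-locus summaries behind both panels can be sketched as follows. This is a minimal illustration, assuming each sample is given as a list of Beta values in the same locus order; the function names are hypothetical, and the loess fit shown in panel (A) is not reproduced here.

```python
from statistics import mean, stdev


def locus_mean_sd(replicates):
    """Per-locus (average, standard deviation) across technical replicates.

    `replicates` is a list of samples, each a list of per-locus values.
    At least two replicates are needed for the standard deviation.
    """
    return [(mean(values), stdev(values)) for values in zip(*replicates)]


def effect_sizes(group_a, group_b):
    """Per-locus difference between the two group averages (a minus b)."""
    return [mean(a) - mean(b) for a, b in zip(zip(*group_a), zip(*group_b))]
```

A lower standard deviation at a given average suggests better precision, while effect sizes shrunk toward zero relative to a reference method suggest a loss of accuracy.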

Fig. 4.

Quality assessment plots based on the blood sample dataset. (A) A multidimensional scaling plot. Color represents reported ethnicity. (B) Scatterplot of median Unmeth signal versus median Meth signal value for each sample. Points outside the dashed lines represent cases where the differences are >0.5. (C) Beta density plots for all samples, with black curves representing samples where the average of the median Unmeth and Meth is <11.5.
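The two thresholds in panels (B) and (C) amount to a simple per-sample check, sketched below. The caption does not state the scale of the medians; this sketch assumes they are log2 medians of the raw channel intensities, and the function and key names are hypothetical.

```python
from math import log2
from statistics import median


def qc_summary(meth, unmeth, cutoff=11.5, max_diff=0.5):
    """Flag a sample from its Meth and Unmeth channel intensities.

    Assumption: medians are compared on the log2 scale.
    """
    m_med = log2(median(meth))
    u_med = log2(median(unmeth))
    return {
        "m_med": m_med,
        "u_med": u_med,
        # Panel (C): low overall signal.
        "low_signal": (m_med + u_med) / 2 < cutoff,
        # Panel (B): the two channels disagree by more than max_diff.
        "imbalanced": abs(m_med - u_med) > max_diff,
    }
```

Both cutoffs are taken from the caption and would likely need adjusting for other datasets.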

Fig. 5.

DMRs associate more strongly with gene expression than methylation differences at single CpGs, as observed in a dataset of normal lung and colon samples. (A) An example of a tissue-DMR, identified by bumphunter. The 15 CpGs in the region show concordant methylation differences. (B) An example of a significant tissue-DMP, identified by a locus-level limma model. Note that the CpG probes adjacent to the DMP do not show a methylation difference. (C) Between-tissue differential expression is greater for genes with a DMR located within 2 kb of the transcriptional start site (left) than for genes with a DMP located within 2 kb of the transcriptional start site (right). (D) A greater fraction of DMRs is located close to DEG promoters than are DMPs
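The "within 2 kb of the transcriptional start site" criterion used in panels (C) and (D) can be sketched as an interval-to-point distance. This is an illustration only: the function names are hypothetical, all coordinates are assumed to be on one chromosome, and a DMP is represented as a region whose start equals its end.

```python
def distance_to_tss(start, end, tss):
    """Distance in bp from the region [start, end] to a TSS (0 if it overlaps)."""
    if start <= tss <= end:
        return 0
    return min(abs(start - tss), abs(end - tss))


def fraction_near_tss(regions, tss_positions, window=2000):
    """Fraction of regions lying within `window` bp of any listed TSS."""
    if not regions:
        return 0.0
    near = sum(
        1 for start, end in regions
        if any(distance_to_tss(start, end, t) <= window for t in tss_positions)
    )
    return near / len(regions)
```

Passing the TSS positions of differentially expressed genes gives a quantity comparable in spirit to panel (D), computed once for DMRs and once for DMPs.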

Fig. 6.

Large regions of hypomethylation in colon cancer are reliably identified by minfi. We used the block finding method on 450k data for colon cancer and matched normal samples from the TCGA project. (A) The top panel shows smoothed estimates of average methylation at the collapsed locus level in the region plotted as Figure 2a in Hansen et al. (2011). Loss of methylation in tumor is clearly observed in this region. The second panel shows the methylation difference between cancer and normal. Dots indicate the probe clusters used in the block finder algorithm, which ignores clusters corresponding to CpG islands, shores or shelves. The smoothed methylation difference used for segmentation is also plotted. The gap in this smooth curve results from large genomic distances between probe clusters over which no smoothing is performed. The bottom panel shows the minfi segmentation of the cluster-level measurements, with blue indicating blocks of significant hypomethylation. The bottom track shows the blocks of methylation difference defined from whole-genome bisulfite sequencing in Hansen et al. (2011). (B) Hypomethylation regions identified by minfi consistently overlap hypomethylation blocks identified in Hansen et al. (2011).

References

    1. Aryee MJ, et al. Accurate genome-scale percentage DNA methylation estimates from microarray data. Biostatistics. 2011;12:197–210.
    2. Berman BP, et al. Regions of focal DNA hypermethylation and long-range hypomethylation in colorectal cancer coincide with nuclear lamina-associated domains. Nat. Genet. 2012;44:40–46.
    3. Bibikova M, et al. High density DNA methylation array with single CpG site resolution. Genomics. 2011;98:288–295.
    4. Bolstad BM, et al. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics. 2003;19:185–193.
    5. Chambers JM. Programming with Data: A Guide to the S Language. New York: Springer; 1998.
