methylKit: a comprehensive R package for the analysis of genome-wide DNA methylation profiles

Altuna Akalin et al. Genome Biol. 2012.

Abstract

DNA methylation is a chemical modification of cytosine bases that is pivotal for gene regulation, cellular specification and cancer development. Here, we describe an R package, methylKit, that rapidly analyzes genome-wide cytosine epigenetic profiles from high-throughput methylation and hydroxymethylation sequencing experiments. methylKit includes functions for clustering, sample quality visualization, differential methylation analysis and annotation features, thus automating and simplifying many of the steps for discerning statistically significant bases or regions of DNA methylation. Finally, we demonstrate methylKit on breast cancer data, in which we find statistically significant regions of differential methylation and stratify tumor subtypes. methylKit is available at http://code.google.com/p/methylkit.

Figures

Figure 1

Flowchart of possible operations by methylKit. A summary of the most important methylKit features is shown in a flowchart. It depicts the main features of methylKit and the sequential relationship between them. The functions that could be used for those features are also printed in the boxes.

Figure 2

Descriptive statistics per sample. (a) Histogram of %methylation per cytosine for ER+ T47D sample. Most of the bases have either high or low methylation. (b) Histogram of read coverage per cytosine for ER+ T47D sample. ER+, estrogen receptor-alpha expressing.
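The per-cytosine quantity behind a histogram like panel (a) can be sketched as follows. This is a minimal illustration in plain Python, not methylKit's own code: it assumes bisulfite-style counts in which reads showing C are methylated and reads showing T are unmethylated, and the function names and toy counts are hypothetical.

```python
def percent_methylation(num_c, num_t):
    """Return % methylation for one cytosine from C (methylated) and
    T (unmethylated) read counts."""
    coverage = num_c + num_t
    if coverage == 0:
        raise ValueError("cytosine has no coverage")
    return 100.0 * num_c / coverage


def histogram(values, bin_width=10, upper=100):
    """Count values into fixed-width bins over [0, upper];
    a value equal to `upper` is placed in the last bin."""
    n_bins = upper // bin_width
    counts = [0] * n_bins
    for v in values:
        idx = min(int(v // bin_width), n_bins - 1)
        counts[idx] += 1
    return counts


# Hypothetical (C count, T count) pairs, one per cytosine.
toy_counts = [(0, 20), (1, 19), (19, 1), (20, 0), (10, 10), (18, 2)]

meth = [percent_methylation(c, t) for c, t in toy_counts]
meth_hist = histogram(meth)                   # values for a panel (a)-style plot
coverage = [c + t for c, t in toy_counts]     # values for a panel (b)-style plot
```

With these toy counts most cytosines fall into the lowest or highest bin, which is the bimodal shape the caption describes.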

Figure 3

Scatter plots for sample pairs. Scatter plots of %methylation values for each pair in seven breast cancer cell lines. Numbers in the upper right corner denote pairwise Pearson's correlation scores. The histograms on the diagonal are %methylation histograms similar to Figure 2a for each sample.

Figure 4

Sample clustering. (a) Hierarchical clustering of seven breast cancer methylation profiles using 1-Pearson's correlation distance. (b) Principal Component Analysis (PCA) of seven breast cancer methylation profiles; the plot shows principal component 1 and principal component 2 for each sample. Samples closer to each other in principal component space are similar in their methylation profiles.
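The distance used in panel (a), 1 minus Pearson's correlation, can be illustrated with a small sketch. The caption does not state which linkage rule was used, so the average linkage below is an assumption, and the sample names and %methylation vectors are hypothetical. This is plain Python for illustration, not methylKit's implementation.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def correlation_distance(x, y):
    """1 - Pearson's correlation: 0 for identical trends, 2 for opposite ones."""
    return 1.0 - pearson(x, y)


def average_linkage(profiles):
    """Agglomerative clustering with average linkage (an assumed choice).
    Returns the merges in the order they happen."""
    clusters = {name: [name] for name in profiles}
    merges = []
    while len(clusters) > 1:
        def linkage(a, b):
            # Mean pairwise distance between members of clusters a and b.
            pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
            total = sum(correlation_distance(profiles[i], profiles[j])
                        for i, j in pairs)
            return total / len(pairs)

        a, b = min(combinations(sorted(clusters), 2),
                   key=lambda p: linkage(*p))
        merges.append((a, b))
        clusters[a + "+" + b] = clusters.pop(a) + clusters.pop(b)
    return merges


# Hypothetical %methylation profiles over the same four cytosines.
toy_profiles = {
    "s1": [10, 20, 80, 90],
    "s2": [12, 22, 78, 88],
    "s3": [90, 80, 20, 10],
    "s4": [88, 79, 22, 12],
}
merges = average_linkage(toy_profiles)
```

In this toy data s1 pairs with s2 and s3 pairs with s4 before the two groups are joined, mirroring how samples with similar methylation profiles end up on neighboring branches of a dendrogram.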

Figure 5

Visualizing differential methylation events. (a) Horizontal bar plots show the number of hyper- and hypomethylation events per chromosome, as a percentage of the sites that meet the minimum coverage and methylation difference thresholds. By default, these are a 25% change in methylation and 10X coverage in all samples. (b) Example of a bedgraph file uploaded to the UCSC browser. The bedgraph file is for differentially methylated CpGs with at least a 25% difference and q-value <0.01. Hyper- and hypomethylated bases are color coded. The bar heights correspond to % methylation difference between ER+ and ER- sets. ER+, estrogen receptor-alpha expressing; ER-, estrogen receptor-alpha non-expressing; UCSC, University of California Santa Cruz.
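The thresholds named in this caption (at least a 25% methylation difference and q-value <0.01) can be applied as in the sketch below. It is a hedged illustration, not methylKit's code: the record layout, function names and toy values are hypothetical, the sign convention (positive difference means hypermethylated) is an assumption, and every listed site is assumed to have already passed the coverage filter. The output lines follow the four-column bedGraph layout (chromosome, 0-based start, end, value).

```python
from collections import Counter


def classify(meth_diff, qvalue, min_diff=25.0, max_q=0.01):
    """Return 'hyper', 'hypo', or None when the site fails either threshold."""
    if qvalue >= max_q or abs(meth_diff) < min_diff:
        return None
    return "hyper" if meth_diff > 0 else "hypo"


def percent_per_chromosome(sites):
    """Hyper/hypo events per chromosome as a percentage of the tested sites
    on that chromosome."""
    tested = Counter(chrom for chrom, *_ in sites)
    events = Counter()
    for chrom, _, diff, q in sites:
        label = classify(diff, q)
        if label:
            events[(chrom, label)] += 1
    return {key: 100.0 * n / tested[key[0]] for key, n in events.items()}


def bedgraph_lines(sites):
    """One bedGraph line per differentially methylated site; the value column
    holds the % methylation difference."""
    lines = []
    for chrom, start, diff, q in sites:
        if classify(diff, q):
            lines.append(f"{chrom}\t{start}\t{start + 1}\t{diff}")
    return lines


# Hypothetical records: (chromosome, 0-based position, % difference, q-value).
toy_sites = [
    ("chr1", 1000, 40.0, 0.001),   # hypermethylated
    ("chr1", 2000, -30.0, 0.005),  # hypomethylated
    ("chr1", 3000, 10.0, 0.001),   # difference too small
    ("chr2", 1500, 55.0, 0.2),     # q-value too large
    ("chr2", 2500, -25.0, 0.009),  # hypomethylated, exactly at the threshold
]
per_chrom = percent_per_chromosome(toy_sites)
track = bedgraph_lines(toy_sites)
```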

Figure 6

Annotation of differentially methylated CpGs. (a) Distances to TSS for differentially methylated CpGs from the ER+ versus ER- analysis are plotted. (b) Pie chart showing percentages of differentially methylated CpGs on promoters, exons, introns and intergenic regions. (c) Pie chart showing percentages of differentially methylated CpGs on CpG islands, CpG island shores (defined as 2kb flanks of CpG islands) and other regions outside of shores and CpG islands. (d) Pie chart showing percentages of differentially methylated CpGs on enhancers and other regions. ER+, estrogen receptor-alpha expressing; ER-, estrogen receptor-alpha non-expressing; TSS, transcription start site.
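The island/shore/other split in panel (c) can be sketched for a single chromosome as follows. The 2 kb shore width comes from the caption; everything else is an assumption made for illustration: the half-open interval convention, the rule that an island hit takes precedence over a shore hit, and the hypothetical function names and coordinates. This is not methylKit's annotation code.

```python
from collections import Counter


def annotate_cpg(pos, islands, shore_width=2000):
    """Label one CpG position as 'island', 'shore' or 'other'.
    `islands` is a list of half-open (start, end) intervals on one chromosome."""
    if any(start <= pos < end for start, end in islands):
        return "island"
    if any(start - shore_width <= pos < end + shore_width
           for start, end in islands):
        return "shore"
    return "other"


def annotation_percentages(positions, islands):
    """Percentage of positions in each category, as a pie chart would show."""
    counts = Counter(annotate_cpg(p, islands) for p in positions)
    total = len(positions)
    return {label: 100.0 * counts[label] / total
            for label in ("island", "shore", "other")}


# Hypothetical CpG island and differentially methylated CpG positions.
toy_islands = [(10000, 11000)]
toy_positions = [10500, 9000, 12999, 13000, 50000]
shares = annotation_percentages(toy_positions, toy_islands)
```

Checking islands before shores keeps the three categories mutually exclusive, so the percentages sum to 100.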
