Single-cell genome-wide bisulfite sequencing for assessing epigenetic heterogeneity

Accession codes

Primary accessions

Gene Expression Omnibus

Referenced accessions

Sequence Read Archive

References

  1. Jones, P.A. Nat. Rev. Genet. 13, 484–492 (2012).
  2. Smith, Z.D. & Meissner, A. Nat. Rev. Genet. 14, 204–220 (2013).
  3. Jaitin, D.A. et al. Science 343, 776–779 (2014).
  4. Deng, Q. et al. Science 343, 193–196 (2014).
  5. Macaulay, I.C. & Voet, T. PLoS Genet. 10, e1004126 (2014).
  6. Lee, H.J. et al. Cell Stem Cell 14, 710–719 (2014).
  7. Miura, F. et al. Nucleic Acids Res. 40, e136 (2012).
  8. Shirane, K. et al. PLoS Genet. 9, e1003439 (2013).
  9. Chambers, I. et al. Nature 450, 1230–1234 (2007).
  10. Islam, S. et al. Nat. Methods 11, 163–166 (2014).
  11. Hayashi, K. et al. Cell Stem Cell 3, 391–401 (2008).
  12. Torres-Padilla, M.E. & Chambers, I. Development 141, 2173–2181 (2014).
  13. Ficz, G. et al. Cell Stem Cell 13, 351–359 (2013).
  14. Habibi, E. et al. Cell Stem Cell 13, 360–369 (2013).
  15. Stadler, M.B. et al. Nature 480, 490–495 (2011).
  16. Ziller, M.J. et al. Nature 500, 477–481 (2013).
  17. Hon, G.C. et al. Nat. Genet. 45, 1198–1206 (2013).
  18. Guo, H. et al. Genome Res. 23, 2126–2135 (2013).
  19. Smallwood, S.A. et al. Nat. Genet. 43, 811–814 (2011).
  20. Quail, M.A. et al. Nat. Methods 9, 10–11 (2012).
  21. Krueger, F. & Andrews, S.R. Bioinformatics 27, 1571–1572 (2011).
  22. Illingworth, R.S. et al. PLoS Genet. 6, e1001134 (2010).
  23. Creyghton, M.P. et al. Proc. Natl. Acad. Sci. USA 107, 21931–21936 (2010).
  24. Li, Y. et al. PLoS Biol. 8, e1000533 (2010).
  25. Bock, C. et al. Mol. Cell 47, 633–647 (2012).

Acknowledgements

We thank K. Tabbada and the Wellcome Trust Sanger Institute sequencing pipeline team for assistance with Illumina sequencing, R. Walker for assistance with flow cytometry, T. Hore (Babraham Institute, Cambridge, UK) for providing ESCs maintained in 2i medium and serum conditions, and T. Hore, J. Huang, I. Macaulay, S. Lorenz, M. Quail, T. Voet and H. Swerdlow for helpful discussions. This work was supported by the UK Biotechnology and Biological Sciences Research Council grant BB/J004499/1, UK Medical Research Council grant MR/K011332/1, Wellcome Trust award 095645/Z/11/Z and EU FP7 EpiGeneSys and BLUEPRINT.

Author information

Author notes

  1. Sébastien A Smallwood and Heather J Lee: These authors contributed equally to this work.
  2. Wolf Reik and Gavin Kelsey: These authors jointly directed this work.

Authors and Affiliations

  1. Epigenetics Programme, Babraham Institute, Cambridge, UK
    Sébastien A Smallwood, Heather J Lee, Heba Saadeh, Julian Peat, Wolf Reik & Gavin Kelsey
  2. Wellcome Trust Sanger Institute, Cambridge, UK
    Heather J Lee & Wolf Reik
  3. European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK
    Christof Angermueller & Oliver Stegle
  4. Bioinformatics Group, Babraham Institute, Cambridge, UK
    Felix Krueger & Simon R Andrews
  5. Centre for Trophoblast Research, University of Cambridge, Cambridge, UK
    Wolf Reik & Gavin Kelsey

Authors

  1. Sébastien A Smallwood
  2. Heather J Lee
  3. Christof Angermueller
  4. Felix Krueger
  5. Heba Saadeh
  6. Julian Peat
  7. Simon R Andrews
  8. Oliver Stegle
  9. Wolf Reik
  10. Gavin Kelsey

Contributions

S.A.S. and H.J.L. designed the study, prepared scBS-seq libraries, analyzed data and wrote the manuscript. F.K., H.S. and S.R.A. performed sequence mapping and analyzed data. J.P. contributed to technical developments. C.A. and O.S. analyzed data. O.S. provided advice on statistical analyses. W.R. and G.K. supervised the study and wrote the manuscript.

Corresponding authors

Correspondence to Wolf Reik or Gavin Kelsey.

Ethics declarations

Competing interests

W.R. is a consultant to Cambridge Epigenetix Ltd.

Integrated supplementary information

Supplementary Figure 1 Quality control of scBS-seq libraries.

(a) Mapping efficiency of scBS-Seq samples and negative controls. Boxplot representation of the mapping efficiencies (on sequences obtained after trimming and mapping against the human genome) for each single cell and negative control (red crosses represent individual cell values). The overall higher mapping efficiency of oocytes versus ESCs can be explained by the amount of DNA in each cell (4n for MII oocytes and 2n for ESCs), resulting in a relatively lower contribution of spurious sequences in MIIs (see Supplementary Fig. 2). All negative controls had less than 3.5% mapping efficiency (the dashed line indicates 5% mapping efficiency). (b) Visualization of scBS-Seq library fragment size distribution on the Bioanalyser platform. The Bioanalyser trace of library MII#1 is shown as an example.

Supplementary Figure 2 Contribution of spurious sequences to scBS-seq mapping efficiency.

(a) The relatively low mapping efficiency of scBS-Seq is associated with a significant fraction of sequences mapping at multiple genomic locations, which are therefore discarded. (b) Analysis of the G+C content of the raw sequences (i.e., prior to mapping) of scBS-Seq libraries revealed many with <3% G+C, which are absent from bulk samples. These correspond to poly-T stretches (poly-Ts) (i.e., (T)N with N>50). Poly-Ts are present in both actual samples and the corresponding negative controls, suggesting a contaminant as their main source. (c,d) The amount of poly-Ts is higher in ESCs than in oocytes, and the percentage of sequences with poly-Ts and the percentage of sequences with multiple alignments are tightly correlated across samples. (e) This suggests that poly-Ts are the major cause of the low mapping efficiency of scBS-Seq. To test this, we trimmed, from the raw FASTQ file, sequences containing poly-Ts of at least 50 bp in size and repeated the mapping. This resulted in a drastic reduction in the percentage of sequences with multiple alignments and an increase in the percentage of sequences with unique alignments. Poly-Ts are inherent to our current methodology, and while alternative protocols we developed do not generate these artifacts, they still yield significantly fewer measured CpGs.
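The poly-T test in (e) can be illustrated with a short sketch. It assumes that a read is discarded in full when it contains a run of at least 50 consecutive T bases; the legend does not say whether reads were removed or only clipped, and the function names and the `(name, sequence, quality)` record layout are hypothetical, not taken from the authors' pipeline.

```python
import re

# Assumed threshold, read from the legend: poly-T stretches of at least 50 bp.
MIN_POLY_T = 50


def has_poly_t(seq, min_len=MIN_POLY_T):
    """Return True if seq contains a run of at least min_len consecutive T bases."""
    return re.search("T{%d,}" % min_len, seq.upper()) is not None


def drop_poly_t_reads(records, min_len=MIN_POLY_T):
    """Yield only the (name, sequence, quality) records without a poly-T run.

    Assumption: the whole read is discarded, not just the poly-T portion.
    """
    for name, seq, qual in records:
        if not has_poly_t(seq, min_len):
            yield name, seq, qual
```

The surviving records would then be written back to a FASTQ file and mapped again, so that the multiple-alignment rate can be compared before and after filtering.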

Supplementary Figure 3 Saturation level of scBS-seq libraries.

For each individual MII scBS-Seq library and one representative example of bulk BS-Seq (PBAT), the percentage of informative CpGs is plotted for 10% increments of mapped sequences. This demonstrates that, in contrast to the bulk BS-Seq example (black line), MII scBS-Seq libraries (colored lines) have not reached the plateau of saturating sequencing depth, indicating that further sequencing would yield additional information. MII#2 Deep Seq and MII#5 Deep Seq correspond to the deeper sequencing of these libraries (see main text and Supplementary Table 1).
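A saturation curve of this kind can be sketched as follows. This is a minimal illustration, assuming each mapped read has already been reduced to the list of CpG positions it covers; the random subsampling order, the seed and the function name are assumptions, and the sketch returns counts of distinct CpGs, leaving the conversion to a percentage of all CpGs to the caller.

```python
import random


def saturation_curve(read_cpgs, steps=10, seed=0):
    """Count distinct CpGs covered by growing random subsets of mapped reads.

    read_cpgs: list in which each element holds the CpG positions covered
    by one mapped read (hypothetical input layout).
    Returns a list of (fraction_of_reads, n_distinct_cpgs) tuples.
    """
    order = list(range(len(read_cpgs)))
    random.Random(seed).shuffle(order)  # fixed seed keeps the curve reproducible

    seen = set()
    curve = []
    done = 0
    for step in range(1, steps + 1):
        upto = round(len(order) * step / steps)
        # Add only the reads that are new in this increment.
        for idx in order[done:upto]:
            seen.update(read_cpgs[idx])
        done = upto
        curve.append((step / steps, len(seen)))
    return curve
```

A library that is close to saturation shows a flattening curve, whereas a curve that is still rising at 100% of reads suggests that deeper sequencing would add CpGs.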

Supplementary Figure 4 scBS-seq generates a digital output of DNA methylation.

(a) For each single MII BS-Seq library, and for the bulk MII sample, CpGs were grouped based on their read depth. The proportion of CpGs in each group with a methylation value of either 0% or 100% (digital output) was calculated for each sample. The boxplot represents the results from all 12 single MII libraries. The results from the bulk MII sample are superimposed as solid blue circles. As expected, the proportion of digital CpGs in the scBS-Seq libraries was very high (>90% for read depth 2-5 in all cells, dashed line). In contrast, the bulk sample had fewer digital CpGs (66% at read depth 5) due to cell-to-cell variability within the population. (b) Histograms of the distribution of CpG methylation values for MII bulk and MII single cells for CpGs with at least 2 reads.
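The "digital output" measure in (a) can be sketched as below, assuming that per-CpG counts of methylated and unmethylated reads are available. The input layout and the function name are hypothetical.

```python
from collections import defaultdict


def digital_fraction_by_depth(cpg_counts):
    """Fraction of CpGs with 0% or 100% methylation, grouped by read depth.

    cpg_counts: iterable of (methylated_reads, unmethylated_reads) per CpG.
    Returns {depth: fraction of CpGs at that depth with a digital value}.
    """
    totals = defaultdict(int)
    digital = defaultdict(int)
    for meth, unmeth in cpg_counts:
        depth = meth + unmeth
        if depth == 0:
            continue  # CpG not covered, so it is not informative
        totals[depth] += 1
        # A CpG is digital when all of its reads agree.
        if meth == 0 or unmeth == 0:
            digital[depth] += 1
    return {d: digital[d] / totals[d] for d in sorted(totals)}
```

Note that every CpG at read depth 1 is digital by construction, which is presumably why the legend reports the proportion from read depth 2 upward.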

Supplementary Figure 5 CpG concordance obtained from MIIs and ESCs using scBS-seq.

(a) CpG concordance was calculated for each cell pair as the proportion of overlapping CpGs with identical methylation state. On average, 1.8 M CpGs were measured for each pairwise analysis. Within each cell type, the order from bottom to top is the same as in Supplementary Table 1 (for oocytes, the bottom sample is MII#1 and the top sample is MII#12). (b) Pearson correlation matrix of MII, 2i ESC and serum ESC scBS-Seq libraries, calculated using 2 kb window methylation values.
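The pairwise concordance defined in (a) can be written compactly. The sketch assumes each cell is represented as a dictionary from CpG position to a binary methylation state; that representation and the function names are illustrative choices, not the authors' code.

```python
from itertools import combinations


def cpg_concordance(cell_a, cell_b):
    """Proportion of CpGs measured in both cells that have the same state.

    cell_a, cell_b: dicts mapping a CpG position to a binary state (0 or 1).
    Returns (concordance, n_shared), or (None, 0) if no CpG is shared.
    """
    shared = cell_a.keys() & cell_b.keys()
    if not shared:
        return None, 0
    same = sum(1 for pos in shared if cell_a[pos] == cell_b[pos])
    return same / len(shared), len(shared)


def pairwise_concordance(cells):
    """Apply cpg_concordance to every unordered pair of cells."""
    return {
        (a, b): cpg_concordance(cells[a], cells[b])
        for a, b in combinations(sorted(cells), 2)
    }
```

Returning the number of shared CpGs alongside the concordance makes it easy to see how many sites support each pairwise value.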

Supplementary Figure 6 scBS-seq accurately determines CpG island (CGI) methylation status in MII oocytes.

(a) Heatmap displaying, for individual MII libraries, the methylation level of CGIs identified as methylated (>80%) and unmethylated (<20%; random selection) in bulk. The number on top indicates the number of individual MIIs in which CGIs are commonly informative. The discrepancy between the number of methylated and unmethylated CGIs informative across single cells reflects the different CpG density between these two groups, as previously described (ref. 19). (b) Histogram displaying, for MII bulk and individual MII libraries, the percentage of total CGIs (23,020) found methylated, unmethylated or with an intermediate level of methylation, and the percentage of wrong calls (i.e., CGI methylated in bulk (>80%) and called unmethylated (<20%) in single cells, and vice versa). (c) Boxplot presenting the methylation level in each individual MII of CGIs found methylated in bulk (>80%). The percentage of these CGIs informative in each MII with a methylation level lower than 80% is shown below the plot. (d) Similar to (c) for unmethylated CGIs (<20%).
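The CGI calls and the "wrong call" definition in (b) can be sketched with the 80% and 20% cutoffs given in the legend. The function names and the dictionary inputs are hypothetical, and the sketch returns raw counts so that the denominator (for example, all 23,020 CGIs) stays with the caller.

```python
def call_cgi(meth_pct, high=80.0, low=20.0):
    """Classify a CGI from its methylation percentage using the legend's cutoffs."""
    if meth_pct > high:
        return "methylated"
    if meth_pct < low:
        return "unmethylated"
    return "intermediate"


def count_wrong_calls(bulk, single):
    """Count CGIs whose single-cell call is the opposite of the bulk call.

    bulk, single: dicts mapping a CGI id to its methylation percentage.
    Returns (n_wrong, n_shared), where n_shared is the number of CGIs
    informative in both datasets.
    """
    opposite = {"methylated": "unmethylated", "unmethylated": "methylated"}
    shared = bulk.keys() & single.keys()
    n_wrong = sum(
        1
        for cgi in shared
        if opposite.get(call_cgi(bulk[cgi])) == call_cgi(single[cgi])
    )
    return n_wrong, len(shared)
```

Intermediate calls in either dataset are never counted as wrong here, which follows the legend's definition of a wrong call as a switch between the two extreme classes.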

Supplementary Figure 7 scBS-seq provides information on all genomic contexts.

(a) Snapshot displaying read distribution across 61 Mbp of chromosome 19. Below the annotation tracks are displayed the mapped reads and the quantification (number of reads per 25 kb window (log)). (b) The representation of different genomic contexts in single cell and bulk libraries is shown as fold enrichment over the expected value (dashed line). The boxplot represents the values for all single cell samples, and the bulk samples are superimposed as blue diamonds (MII), purple crosses (serum ESCs) and red plus signs (2i ESCs).

Supplementary Figure 8 Union and intersect for scBS-seq libraries.

Number of CpGs (a) and CGIs (b) for the union and intersect of all possible combinations of the 12 individual MII scBS-Seq libraries. The union shows that pooling data from multiple scBS-Seq samples increases the number of measured sites. The intersect shows that the number of measured sites common to multiple scBS-Seq datasets decreases as the number of datasets increases. Dotted lines show the information obtained in standard BS-Seq experiments as well as the number of CpGs and CGIs in the mouse genome.
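The union and intersect counts over all library combinations can be illustrated as below, assuming each library is reduced to the set of sites (CpGs or CGIs) it measures. The names are hypothetical. For 12 libraries there are 4,095 non-empty combinations, so exhaustive enumeration is feasible in principle, although genome-scale sets would need a more memory-efficient representation.

```python
from itertools import combinations


def union_intersect_sizes(libraries):
    """Sizes of the union and intersect for every combination of libraries.

    libraries: dict mapping a library name to the set of sites it measures.
    Returns {k: [(union_size, intersect_size), ...]} with one tuple per
    combination of k libraries.
    """
    names = sorted(libraries)
    out = {}
    for k in range(1, len(names) + 1):
        sizes = []
        for combo in combinations(names, k):
            sets = [set(libraries[name]) for name in combo]
            union_size = len(set().union(*sets))
            intersect_size = len(set.intersection(*sets))
            sizes.append((union_size, intersect_size))
        out[k] = sizes
    return out
```

As k grows, the union sizes can only increase and the intersect sizes can only decrease, which is the trend the legend describes.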

Supplementary Figure 9 scBS-seq snapshot of the imprinted locus Plagl1.

The imprinted Plagl1 locus (top) and the Plagl1 maternal DMR/CGI (bottom) are shown for all 12 individual MIIs, MIIs merged and MII bulk. Quantification is the absolute level of methylation (%), at individual CpG resolution, as indicated on the scale on the left of each sample (0 is 0% methylation, 1 is 100% methylation).

Supplementary Figure 10 Comparison of cluster analyses for ESCs.

Cluster dendrograms are shown for (a) genome-wide methylation estimates (equivalent to the dendrogram shown in Figure 2b) and (b) the top 300 most variable sites among single ESC samples (equivalent to the dendrogram shown in Figure 2c). The cell IDs are included for direct comparison between dendrograms. (c) The distance matrix for the 300 most variable sites is grossly similar to that for all sites (Figure 2b). Cells are presented in the order shown in (b).

Supplementary Figure 11 Cluster dendrogram and distance matrix for the most variable sites in ESCs.

The top 300 ranked most variable sites in ESCs show similar methylation patterns across ESCs, as indicated by the low distance between sites.

Supplementary Figure 12 Detailed variance analysis for different genomic contexts.

(a) Receiver Operating Characteristic (ROC) curves showing the fraction of annotated sites (sensitivity) versus the fraction of non-annotated sites (1-specificity). Sites with high variance are more likely to belong to a given genomic context if the ROC curve is above the diagonal (e.g. H3K4me1), and less likely to belong to genomic contexts if the ROC curve is below the diagonal (e.g. CGI). (b) Different genomic contexts have different mean methylation values. (c) For most genomic contexts, variance was greatest for sites with mean methylation rates close to 50%. H3K27ac and H3K4me1 sites were among the most variable, even after accounting for mean methylation rate. CGI and p300 sites with intermediate mean methylation rates were also highly variable.
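The ROC construction in (a) can be sketched by ranking sites from highest to lowest variance and tracking how many annotated and non-annotated sites have been passed at each rank. This is a minimal illustration with hypothetical names; ties in variance are not given special treatment, and the area under the curve is computed with the trapezoid rule.

```python
def roc_points(variances, annotated):
    """ROC curve for 'high variance predicts membership of a genomic context'.

    variances: per-site methylation variance across cells.
    annotated: per-site booleans, True if the site lies in the context.
    Returns a list of (1 - specificity, sensitivity) points.
    """
    n_pos = sum(1 for a in annotated if a)
    n_neg = len(annotated) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both annotated and non-annotated sites")

    ranked = sorted(zip(variances, annotated), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, is_annotated in ranked:
        if is_annotated:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
```

An area above 0.5 corresponds to a curve above the diagonal (high-variance sites enriched in the context), and an area below 0.5 to a curve below it.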

Supplementary Figure 13 Comparison of scRRBS and scBS-seq in MII oocytes.

(a) Summary table showing the number of raw sequences, informative CpGs and CGIs. For scRRBS, the number of CpG dinucleotides and the number of informative CGIs were calculated using the methylation calls present in the .bed file of GEO accession number GSE47343 from Guo et al. (ref. 18). (b) Plots showing the number of raw sequences generated and the corresponding number of CpGs obtained in MII oocytes for both methods.

Cite this article

Smallwood, S., Lee, H., Angermueller, C. et al. Single-cell genome-wide bisulfite sequencing for assessing epigenetic heterogeneity. Nat Methods 11, 817–820 (2014). https://doi.org/10.1038/nmeth.3035