Genome-wide characterization of the routes to pluripotency
Accession codes
Data deposits
Sequencing data have been deposited in the NCBI Sequence Read Archive (SRA) under accession number SRP046744 for all RNA-seq and ChIP-seq experiments, and in the European Bioinformatics Institute under the European Nucleotide Archive (ENA) accession number ERP004116 for MethylC-sequencing. The global and cell surface mass spectrometry proteomics raw data have been deposited in the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository under data set identifiers PXD000413 and PXD001456, respectively.
Change history
10 December 2014
A minor addition was made to the Acknowledgements in the HTML and PDF versions.
References
- Takahashi, K. & Yamanaka, S. Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell 126, 663–676 (2006)
- Mikkelsen, T. S. et al. Dissecting direct reprogramming through integrative genomic analysis. Nature 454, 49–55 (2008)
- Graf, T. & Enver, T. Forcing cells to change lineages. Nature 462, 587–594 (2009)
- Tonge, P. D. et al. Divergent reprogramming routes lead to alternative stem-cell states. Nature http://dx.doi.org/10.1038/nature14047 (this issue)
- Samavarchi-Tehrani, P. et al. Functional genomics reveals a BMP-driven mesenchymal-to-epithelial transition in the initiation of somatic cell reprogramming. Cell Stem Cell 7, 64–77 (2010)
- Polo, J. M. et al. A molecular roadmap of reprogramming somatic cells into iPS cells. Cell 151, 1617–1632 (2012)
- Golipour, A. et al. A late transition in somatic cell reprogramming requires regulators distinct from the pluripotency network. Cell Stem Cell 11, 769–782 (2012)
- O’Malley, J. et al. High-resolution analysis with novel cell-surface markers identifies routes to iPS cells. Nature 499, 88–91 (2013)
- Nagy, A. Secondary cell reprogramming systems: as years go by. Curr. Opin. Genet. Dev. 23, 534–539 (2013)
- Woltjen, K. et al. piggyBac transposition reprograms fibroblasts to induced pluripotent stem cells. Nature 458, 766–770 (2009)
- Buganim, Y. et al. Single-cell expression analyses during cellular reprogramming reveal an early stochastic and a late hierarchic phase. Cell 150, 1209–1222 (2012)
- Belteki, G. et al. Conditional and inducible transgene expression in mice through the combinatorial use of Cre-mediated recombination and tetracycline induction. Nucleic Acids Res. 33, e51 (2005)
- Wells, C. A. et al. Stemformatics: visualisation and sharing of stem cell gene expression. Stem Cell Res. 10, 387–395 (2013)
- Clancy, J. L. et al. Small RNA changes en route to distinct cellular states of induced pluripotency. Nature Commun. http://dx.doi.org/10.1038/ncomms6522 (2014)
- Benevento, M. et al. Proteome adaptation in cell reprogramming proceeds via distinct transcriptional networks. Nature Commun. http://dx.doi.org/10.1038/ncomms6613 (2014)
- Polo, J. M. et al. Cell type of origin influences the molecular and functional properties of mouse induced pluripotent stem cells. Nature Biotechnol. 28, 848–855 (2010)
- Ohi, Y. et al. Incomplete DNA methylation underlies a transcriptional memory of somatic cells in human iPS cells. Nature Cell Biol. 13, 541–549 (2011)
- Schug, J. et al. Promoter features related to tissue specificity as measured by Shannon entropy. Genome Biol. 6, R33 (2005)
- Li, R. et al. A mesenchymal-to-epithelial transition initiates and is required for the nuclear reprogramming of mouse fibroblasts. Cell Stem Cell 7, 51–63 (2010)
- Kojima, Y. et al. The transcriptional and functional properties of mouse epiblast stem cells resemble the anterior primitive streak. Cell Stem Cell 14, 107–120 (2014)
- Li, B., Carey, M. & Workman, J. L. The role of chromatin during transcription. Cell 128, 707–719 (2007)
- Simon, J. A. & Kingston, R. E. Occupying chromatin: polycomb mechanisms for getting to genomic targets, stopping transcriptional traffic, and staying put. Mol. Cell 49, 808–824 (2013)
- Mansour, A. A. et al. The H3K27 demethylase Utx regulates somatic and germ cell epigenetic reprogramming. Nature 488, 409–413 (2012)
- Pereira, C. F. et al. ESCs require PRC2 to direct the successful reprogramming of differentiated cells toward pluripotency. Cell Stem Cell 6, 547–556 (2010)
- Wong, J. J.-L. et al. Orchestrated intron retention regulates normal granulocyte differentiation. Cell 154, 583–595 (2013)
- Fadloun, A. et al. Chromatin signatures and retrotransposon profiling in mouse embryos reveal regulation of LINE-1 by RNA. Nature Struct. Mol. Biol. 20, 332–338 (2013)
- Tang, S.-J. Chromatin organization by repetitive elements (CORE): a genomic principle for the higher-order structure of chromosomes. Genes 2, 502–515 (2011)
- Lunyak, V. V. et al. Developmentally regulated activation of a SINE B2 repeat as a domain boundary in organogenesis. Science 317, 248–251 (2007)
- Rebollo, R., Romanish, M. T. & Mager, D. L. Transposable elements: an abundant and natural source of regulatory sequences for host genes. Annu. Rev. Genet. 46, 21–42 (2012)
- Bernstein, B. E. et al. A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell 125, 315–326 (2006)
- Mikkelsen, T. S. et al. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature 448, 553–560 (2007)
- Jørgensen, H. F. et al. Stem cells primed for action: polycomb repressive complexes restrain the expression of lineage-specific regulators in embryonic stem cells. Cell Cycle 5, 1411–1414 (2006)
- Voigt, P. et al. Asymmetrically modified nucleosomes. Cell 151, 181–193 (2012)
- Schmitges, F. W. et al. Histone methylation by PRC2 is inhibited by active chromatin marks. Mol. Cell 42, 330–341 (2011)
- Yuan, W. et al. H3K36 methylation antagonizes PRC2-mediated H3K27 methylation. J. Biol. Chem. 286, 7983–7989 (2011)
- Voigt, P., Tee, W. W. & Reinberg, D. A double take on bivalent promoters. Genes Dev. 27, 1318–1338 (2013)
- Lee, D.-S. et al. DNA methylation as a reprogramming modulator: an epigenomic roadmap to induced pluripotency. Nature Commun. http://dx.doi.org/10.1038/ncomms6619 (2014)
- Guttman, M. et al. Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nature Biotechnol. 28, 503–510 (2010)
- Cabili, M. N. et al. Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev. 25, 1915–1927 (2011)
- Khalil, A. M. et al. Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc. Natl Acad. Sci. USA 106, 11667–11672 (2009)
- Guttman, M. et al. Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature 458, 223–227 (2009)
- Guttman, M. et al. lincRNAs act in the circuitry controlling pluripotency and differentiation. Nature 477, 295–300 (2011)
- Behringer, R. R., Gertsenstein, M., Nagy-Vintersten, K. & Nagy, A. Manipulating the Mouse Embryo: A Laboratory Manual (Cold Spring Harbor Laboratory Press, 2013)
- Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010)
- Kong, L. et al. CPC: assess the protein-coding potential of transcripts using sequence features and support vector machine. Nucleic Acids Res. 35, W345–W349 (2007)
- Anders, S., Reyes, A. & Huber, W. Detecting differential usage of exons from RNA-seq data. Genome Res. 22, 2008–2017 (2012)
- Xie, W. et al. Epigenomic analysis of multilineage differentiation of human embryonic stem cells. Cell 153, 1134–1148 (2013)
- Robinson, M. D. & Oshlack, A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 11, R25 (2010)
- O’Geen, H., Echipare, L. & Farnham, P. J. in Epigenetics Protocols 791, 265–286 (Humana, 2011)
- Gaspar-Maia, A. et al. Chd1 regulates open chromatin and pluripotency of embryonic stem cells. Nature 460, 863–868 (2009)
- Wang, T. et al. The histone demethylases Jhdm1a/1b enhance somatic cell reprogramming in a vitamin-C-dependent manner. Cell Stem Cell 9, 575–587 (2011)
- Roberts, A., Trapnell, C., Donaghey, J., Rinn, J. L. & Pachter, L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 12, R22 (2011)
- Feng, J., Liu, T., Qin, B., Zhang, Y. & Liu, X. S. Identifying ChIP-seq enrichment using MACS. Nature Protocols 7, 1728–1740 (2012)
- Hawkins, R. D. et al. Distinct epigenomic landscapes of pluripotent and lineage-committed human cells. Cell Stem Cell 6, 479–491 (2010)
- Shen, L., Shao, N., Liu, X. & Nestler, E. ngs.plot: Quick mining and visualization of next-generation sequencing data by integrating genomic databases. BMC Genomics 15, 284 (2014)
- Krueger, F. & Andrews, S. R. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics 27, 1571–1572 (2011)
- Gauci, S. et al. Lys-N and trypsin cover complementary parts of the phosphoproteome in a refined SCX-based approach. Anal. Chem. 81, 4493–4501 (2009)
- Wollscheid, B. et al. Mass-spectrometric identification and relative quantification of N-linked cell surface glycoproteins. Nature Biotechnol. 27, 378–386 (2009)
- Kislinger, T. et al. PRISM, a generic large scale proteomic investigation strategy for mammals. Mol. Cell. Proteomics 2, 96–106 (2003)
Acknowledgements
We thank M. Gertsenstein and M. Pereira for chimaera production, C. Monetti for cell culture, R. Cowling for DNA purification, and K. Harpal for chimaera embryo sectioning and staining. We acknowledge the intellectual contributions of P. P. L. Tam and R. P. Harvey. A.N. is Tier 1 Canada Research Chair in Stem Cells and Regeneration. This work was supported by grants awarded to A.N., I.M.R. and P.W.Z. from the Ontario Research Fund Global Leadership Round in Genomics and Life Sciences (GL2-01-028), and to A.N. from the Canadian Stem Cell Network (9/5254 (TR3)) and from the Canadian Institutes of Health Research (CIHR MOP102575). This work received support from the Korean Ministry of Knowledge Economy (grant 10037410 to J.-S.S.), from the SNUCM Research Fund (grant 0411-20100074 to J.-S.S.), and from Macrogen Inc. (grants MGR03-11 and MGR03-12). The Stemformatics resource is supported by an Australian Research Council special research grant to Stem Cells Australia (C.A.W. and S.M.G.). The analysis of the miRNA was supported by grants from the National Health and Medical Research Council of Australia (1024852 to J.L.C. and T.P.) and the Australian Research Council (DP1300101928 to T.P.). W.R. is a Cancer Institute of NSW Fellow and, with J.E.J.R., receives support from the Cancer Council of NSW and the National Health & Medical Research Council (571156 and 1061906). J.E.J.R. receives funding from Cure the Future & Tour de Cure. K.-A.L.C. is supported, in part, by the Wound Management Innovation CRC (established and supported under the Australian Government’s Cooperative Research Centres Program). S.M.G. received support from the Australian Research Council (SR110001002). C.A.W. is a QLD Smart Futures Fellow. M.B., J.M. and A.J.R.H. are supported by the Netherlands Proteomics Centre, and by the European Community’s Seventh Framework Programme (FP7/2007-2013) through the PRIME-XS project, grant agreement number 262067. P.W.Z. is the Canada Research Chair in Stem Cell Bioengineering. S.M.I.H. received a fellowship from the McEwen Centre of Regenerative Medicine.
Author information
Author notes
- Samer M. I. Hussein, Mira C. Puri and Peter D. Tonge: These authors contributed equally to this work.
- Javier Munoz & Kim-Anh Lê Cao: Present addresses: Proteomics Unit, Spanish National Cancer Research Centre (CNIO), 28029 Madrid, Spain (J.M.); The University of Queensland Diamantina Institute, Translational Research Institute, 37 Kent Street, Princess Alexandra Hospital, Brisbane, Queensland 4102, Australia (K.-A.L.C.).
Authors and Affiliations
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, Ontario M5G 1X5, Canada
  Samer M. I. Hussein, Mira C. Puri, Peter D. Tonge, Andrew J. Corso, Mira Li, Ian M. Rogers & Andras Nagy
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario M5T 3H7, Canada
  Mira C. Puri
- Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
  Marco Benevento, Javier Munoz & Albert J. R. Heck
- Netherlands Proteomics Centre, Padualaan 8, 3584CH Utrecht, The Netherlands
  Marco Benevento, Javier Munoz & Albert J. R. Heck
- Institute of Medical Science, University of Toronto, Toronto, Ontario M5T 3H7, Canada
  Andrew J. Corso & Andras Nagy
- Genome Biology Department, The John Curtin School of Medical Research, The Australian National University, Acton (Canberra), ACT 2601, Australia
  Jennifer L. Clancy, Hardip R. Patel & Thomas Preiss
- Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, 4072, Queensland, Australia
  Rowland Mosbergen, Othmar Korn & Christine A. Wells
- Genomic Medicine Institute, Medical Research Center, Seoul National University, Seoul 110-799, South Korea
  Dong-Sung Lee, Jong-Yeon Shin, Jong-Il Kim & Jeong-Sun Seo
- Department of Biomedical Sciences and Biochemistry, Seoul National University College of Medicine, Seoul 110-799, South Korea
  Dong-Sung Lee, Jong-Il Kim & Jeong-Sun Seo
- Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, The University of Queensland, St Lucia, 4072, Queensland, Australia
  Nicole Cloonan, David L. A. Wood, Maely E. Gauthier, Kim-Anh Lê Cao & Sean M. Grimmond
- Gene and Stem Cell Therapy Program and Bioinformatics Lab, Centenary Institute, Camperdown 2050, NSW, Australia & Sydney Medical School, 31 University of Sydney 2006, New South Wales, Australia
  Robert Middleton, William Ritchie & John E. J. Rasko
- Genome Discovery Unit, The John Curtin School of Medical Research, The Australian National University, Acton (Canberra) 2601, ACT, Australia
  Hardip R. Patel
- Institute of Biomaterials and Biomedical Engineering (IBBME), University of Toronto, Toronto M5S-3G9, Canada
  Carl A. White, Nika Shakiba & Peter W. Zandstra
- The Donnelly Centre for Cellular and Biomolecular Research (CCBR), University of Toronto, Toronto M5S 3E1, Canada
  Carl A. White & Peter W. Zandstra
- Life Science Institute, Macrogen Inc., Seoul 153-781, South Korea
  Jong-Yeon Shin & Jeong-Sun Seo
- Department of Systems & Computational Biology, Albert Einstein College of Medicine of Yeshiva University, Bronx, 10461, New York, USA
  Jessica C. Mar
- Cell and Molecular Therapies, Royal Prince Alfred Hospital, Camperdown 2050, New South Wales, Australia
  John E. J. Rasko
- College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow G12 8TA, UK
  Christine A. Wells
- Victor Chang Cardiac Research Institute, Darlinghurst (Sydney), New South Wales 2010, Australia
  Thomas Preiss
- Department of Physiology, University of Toronto, Toronto, Ontario M5S 1A8, Canada
  Ian M. Rogers
- Department of Obstetrics and Gynaecology, University of Toronto, Toronto, Ontario M5S 1E2, Canada
  Ian M. Rogers & Andras Nagy
- QIMR Berghofer Medical Research Institute, Genomic Biology Lab, 300 Herston Road, Herston, 4006, Queensland, Australia
  Nicole Cloonan
Authors
- Samer M. I. Hussein
- Mira C. Puri
- Peter D. Tonge
- Marco Benevento
- Andrew J. Corso
- Jennifer L. Clancy
- Rowland Mosbergen
- Mira Li
- Dong-Sung Lee
- Nicole Cloonan
- David L. A. Wood
- Javier Munoz
- Robert Middleton
- Othmar Korn
- Hardip R. Patel
- Carl A. White
- Jong-Yeon Shin
- Maely E. Gauthier
- Kim-Anh Lê Cao
- Jong-Il Kim
- Jessica C. Mar
- Nika Shakiba
- William Ritchie
- John E. J. Rasko
- Sean M. Grimmond
- Peter W. Zandstra
- Christine A. Wells
- Thomas Preiss
- Jeong-Sun Seo
- Albert J. R. Heck
- Ian M. Rogers
- Andras Nagy
Contributions
S.M.I.H., M.C.P., P.D.T. and A.N. conceived, designed and carried out most of the experiments, interpreted results and wrote the manuscript. P.W.Z. contributed to study design. T.P., C. A. Wells, I.M.R., P.W.Z., C. A. White, N.S., A.J.C. and J.C.M. assisted with data interpretation and manuscript writing. M.L., S.M.I.H. and M.C.P. performed ChIP. M.C.P., S.M.I.H., N.C., O.K., D.L.A.W., M.E.G. and S.M.G. produced and analysed RNA-seq data. S.M.I.H., D.-S.L., M.C.P., J.-Y.S., J.-I.K. and J.-S.S. produced and analysed MethylC-seq and ChIP-seq data. J.E.J.R., W.R. and R.Mi. performed the IR analysis and interpretation, and contributed to the manuscript writing. C. A. Wells, R.Mo., O.K., K.-A.L.C. and J.C.M. provided support for bioinformatics analyses and data visualization. M.B., J.M. and A.J.R.H. performed the LC-MS analysis and proteomic data analysis. H.R.P. mapped the miRNA Next Generation Sequencing (NGS) data and provided support for bioinformatics analyses and data visualization. J.L.C. and T.P. analysed and interpreted the miRNA NGS data. C.A.W. performed the CSC proteomics. C.A.W., N.S. and P.W.Z. analysed CSC proteome data.
Corresponding author
Correspondence to Andras Nagy.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Extended data figures and tables
Extended Data Figure 1 Effects of lowering doxycycline on reprogramming cells.
a, Frequency of doxycycline-independent pluripotent cells obtained when 1B secondary MEFs were reprogrammed in 1,500 ng ml−1 doxycycline until the indicated day. b, Morphology of cells at day 15 after lowering the doxycycline concentration from 1,500 ng ml−1 to levels as indicated on day 8 of reprogramming. c, Clonal efficiency measurement at day 15 of reprogramming after lowering the doxycycline concentration on day 8 to the level indicated. d, e, 1B secondary iPSCs show widespread contribution to all germ layers of chimaeric embryos. Whole-mount view (d) and transverse section of E10.5 diploid chimaera (e). Embryo is representative of n = 6 chimaeric embryos with strong (>75%) iPSC donor cell contribution. h, heart; hg, hindgut; nt, neural tube. Scale bars, 750 μm (d) and 400 µm (e). f, RNA-seq analysis of transgene and endogenous expression levels during reprogramming. CPM, counts per million.
Extended Data Figure 2 Locus-specific sequencing data.
Read coverage histograms representing gene expression and epigenetic status at the genomic loci of selected ESC-associated genes.
Extended Data Figure 3 Hierarchical clustering and principal component analysis (PCA) for multi-omics analyses.
a, Pearson correlation complete linkage hierarchical clustering of long RNA-seq data set. Colour coding indicates the grouping of samples based on clustering. b–d, PCA performed on each platform (10 neighbours for _k_-value nearest neighbour (KNN) imputation). Short RNA-seq platform PCA was performed on miRNAs (b). Long RNA-seq platform PCA was performed on protein-coding transcripts (b). Cell surface proteome PCA represents proteins detected by cell surface focused mass spectrometry analysis (b). c, PCA of global CpG methylation analysis. Red arrow follows the high-doxycycline sample trajectory; black dashed arrow follows D8H through low-doxycycline trajectory. Low-doxycycline samples D21L and D21 are highlighted in blue to indicate that compared to other platforms they do not project with ESC/iPSC (see text for further details). d, H3K4me3, H3K36me3 and H3K27me3 PCAs represent genome-wide enriched regions at annotated genes.
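The clustering named in panel a can be illustrated with a small, self-contained sketch. It is not the authors' pipeline: it assumes that the distance between two samples is 1 minus their Pearson correlation and that clusters are merged by complete linkage (largest pairwise distance). The function names and the toy expression values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def complete_linkage(profiles, n_clusters):
    """Merge samples bottom-up until n_clusters remain.

    Sample distance is 1 - Pearson r (an assumption of this sketch); cluster
    distance is the largest pairwise sample distance (complete linkage).
    """
    clusters = [[name] for name in profiles]
    dist = {
        frozenset((a, b)): 1.0 - pearson(profiles[a], profiles[b])
        for a, b in combinations(profiles, 2)
    }

    def cluster_distance(c1, c2):
        return max(dist[frozenset((a, b))] for a in c1 for b in c2)

    while len(clusters) > n_clusters:
        # Find the two closest clusters and merge them.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: cluster_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [sorted(c) for c in clusters]


# Hypothetical expression values for four samples across five genes.
toy_profiles = {
    "MEF": [9.0, 8.0, 1.0, 0.5, 2.0],
    "D2H": [8.5, 7.0, 1.5, 1.0, 2.5],
    "iPSC": [1.0, 0.5, 9.0, 8.0, 6.0],
    "ESC": [0.5, 1.0, 8.5, 9.0, 6.5],
}
groups = complete_linkage(toy_profiles, n_clusters=2)
```

With these toy values the fibroblast-like and pluripotent-like samples fall into separate groups. For real data sets an established library implementation would normally be preferable to this quadratic pure-Python loop.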
Extended Data Figure 4 Integration of gene expression data from 1B reprogramming and other transcriptome data sets.
a, Distribution of the entropy score of protein-coding gene expression for individual samples (blue) and sample groups (red) indicated as probability density curve. b, Pearson correlation analysis of 1B secondary reprogramming sample protein-coding gene expression with transcriptomes of early embryonic stages and epiblast stem cells (EpiSCs) derived from a range of developmental stages (ref. 20). c, Pearson correlation analysis of 1B secondary reprogramming sample protein-coding gene expression with transcriptome of sorted secondary reprogramming intermediates (ref. 8). d, Expression of CD44 and Icam1 markers during 1B reprogramming. Error bars represent standard error of the mean. e, Pearson correlation analysis of 1B reprogramming sample protein-coding gene expression with sorted reprogramming and pluripotent cells from the Col1a1 primary reprogramming system (ref. 6).
Extended Data Figure 5 Effect of Oct4, Sox2, Klf4 and Myc expression level on reprogramming outcomes.
a, Pearson correlation analysis of RNA-seq data from 1B reprogramming samples and reprogramming clones from ref. 7 that are competent or incompetent to become factor-independent secondary iPSC (SC and SI clones, respectively). b, Transgene and endogenous gene expression determined by RNA-seq for Myc, Pou5f1 (Oct4), Sox2 and Klf4 in SC and SI clones (ref. 7). Bar graphs represent average expression of doxycycline-treated samples or SC iPSCs. Error bars represent standard error of the mean. Student’s _t_-test was used for statistics. c, PCA of protein-coding stage-specific genes from Fig. 2c, comparing 1B reprogramming samples and secondary reprogramming clones from ref. 7. F-class cells cluster separately from SI and SC clones. Moreover, 1B reprogramming follows a different trajectory than SI and SC clones towards iPSCs. Colour coding indicates the grouping of samples. d, Pearson correlation complete linkage hierarchical clustering of 1B reprogramming samples and SI and SC secondary reprogramming clones. Clustering was performed on protein-coding stage-specific genes and based on FPKM values normalized to the averaged ESC/iPSCs values from the respective study. Heat maps show stage-specific protein-coding gene expression belonging to iPSC/ESC (top heat map) and F-class (bottom heat map) genes. Clusters and genes on the right of each heat map highlight genes that show a different expression pattern between F-class and doxycycline-treated SI clones. For gene lists associated with d, refer to Supplementary Table 1.
Extended Data Figure 6 Global analysis of histone mark and intron retention changes during reprogramming.
a, Intensity plots of genes associated with H3K4me3 (green) and H3K27me3 (red) ±10 kb of annotated TSSs. b, Heat map representation of PRC2 components and histone demethylase expression at the RNA (RNA-seq) and protein level. c, Correlation of gene transcription with protein and intron retention for genes that exhibit intron retention from Fig. 2c. d, Correlation of intron retention, RNA expression and protein level for Kdm6a. e, Violin plots comparing observed and random Pearson correlations of intron retention versus gene FPKM at reprogramming stages. Bars represent average Pearson correlation coefficients. Error bars represent standard error of the mean. Student’s _t_-test was used for statistics. f, Number of expressed transposable elements during reprogramming.
Extended Data Figure 7 Tracking secondary MEF histone mark changes during reprogramming from one sample to another.
a, Pie-chart diagram tracking the histone mark changes using secondary MEF and secondary iPSCs as reference points. Each histone mark is colour coded: H3K4me3, green; H3K4me3/H3K27me3, orange; H3K27me3, red; no mark, grey. Loci were tracked from their start (2°MEF) and end (2°iPSCs) histone signatures. b–g, Tracking bar graphs of histone mark changes. The histone mark change is shown at the top of each set of 12 histograms. Bars represent number of genes whose mark changed for the time point indicated at the top of the individual histogram, and which of these genes carry the same mark at the other time points (x axis). For example, in b ‘2°MEF (H3K4me3/H3K27me3→H3K4me3)’ the histogram shows the number of genes that were bivalent in secondary MEFs but changed to H3K4me3 monovalent at another time point. In the case of the small histogram labelled D2H, the black-framed green bar represents the number of loci that showed this change from secondary MEFs at D2H and the bars for all the other samples indicate how many of these D2H loci were also H3K4me3+ in the other samples.
Extended Data Figure 8 Determining expression threshold for defining bivalent loci and bivalency in other reprogramming systems.
a, RNA-seq expression value (log2 of FPKM) distribution (as represented by density curves) of four categories of genes: (1) genes marked by H3K4me3 and H3K36me3 (blue line); (2) genes marked by H3K4me3 alone (green line); (3) genes marked by H3K27me3 alone (red line); and (4) genes marked by H3K4me3 and H3K27me3, but not H3K36me3 (orange line). Genes were combined from all the samples to identify each category. Expression threshold was defined as the 10th percentile expression boundary of genes marked by H3K4me3 and H3K36me3. Genes that were expressed at lower levels than this threshold were considered not expressed in subsequent analyses. b, Assessment of cellular heterogeneity in 1B reprogramming by chromatin mark and expression association of two cell surface markers: CD24 and CD73. Upper scatter plots show H3K27me3 versus H3K36me3 enrichment in individual samples. Lower plot shows percentage of cells expressing each marker for the same samples as determined by FACS analysis. Active locus: H3K4me3+H3K36me3+H3K27me3−. Heterogeneous locus: H3K4me3+H3K36me3+H3K27me3+. c, Absolute number (primary y axis) and proportion (secondary y axis) of false (heterogeneous) bivalent loci during secondary reprogramming. The presence of H3K36me3 distinguishes false bivalent loci (H3K4me3+H3K27me3+H3K36me3+) that represent heterogeneity from true bivalent loci that are transcriptionally repressed (H3K36me3−). d, Tracking of histone mark status of secondary MEF heterogeneous loci. Heterogeneous loci resolve into silent and active loci during reprogramming. e, Total number of detected bivalent loci as defined by lack of H3K36me3 mark and expression levels below the threshold as shown in panel a. Dark and light green bar graphs highlight proportion shared among all samples and with secondary MEFs, respectively. f, Sequential addition of novel bivalent marks with respect to stages of reprogramming, as indicated by colours. g, h, Corresponding bivalent loci identified in 1B samples and two independent data sets (refs 6, 31). i, Tracking of bivalent loci for Polo et al. reprogramming system (ref. 6). For gene lists related to e, refer to Supplementary Table 2.
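The locus definitions in this legend can be restated as a short classification sketch. It is illustrative only and rests on several assumptions: each gene is a dictionary with boolean mark calls (`k4`, `k27`, `k36`) and a `log2_fpkm` value, the percentile uses linear interpolation, and every gene carrying both H3K4me3 and H3K36me3 (with or without H3K27me3) contributes to the threshold. All names and toy values are hypothetical.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def expression_threshold(genes):
    """10th percentile of log2 FPKM among genes with H3K4me3 and H3K36me3."""
    marked = [g["log2_fpkm"] for g in genes if g["k4"] and g["k36"]]
    return percentile(marked, 10)


def classify_locus(gene, threshold):
    """Assign a category from three histone marks and the expression level."""
    k4, k27, k36 = gene["k4"], gene["k27"], gene["k36"]
    if k4 and k36 and not k27:
        return "active"
    if k4 and k36 and k27:
        return "heterogeneous"  # 'false' bivalent: marks from a mixed population
    if k4 and k27 and not k36 and gene["log2_fpkm"] < threshold:
        return "bivalent"  # 'true' bivalent: repressed and lacking H3K36me3
    return "other"


# Hypothetical toy data: eleven active genes plus three other loci.
toy_genes = [
    {"name": f"active{i}", "k4": True, "k27": False, "k36": True,
     "log2_fpkm": float(i)}
    for i in range(1, 12)
]
toy_genes += [
    {"name": "poised", "k4": True, "k27": True, "k36": False, "log2_fpkm": -3.0},
    {"name": "mixed", "k4": True, "k27": True, "k36": True, "log2_fpkm": 4.0},
    {"name": "silent", "k4": False, "k27": True, "k36": False, "log2_fpkm": -5.0},
]

threshold = expression_threshold(toy_genes)
labels = {g["name"]: classify_locus(g, threshold) for g in toy_genes}
```

The "other" label is a catch-all introduced for this sketch; the legend itself distinguishes further categories, such as H3K4me3 alone and H3K27me3 alone, that are not separated here.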
Extended Data Figure 9 Long non-coding RNA expression analysis.
a, Determination of expression threshold for lncRNA genes using H3K4me3 and H3K36me3 chromatin mark. b, Distribution of the entropy of non-coding gene expression for individual samples (blue) and sample groups (red) indicated as probability density curve. c, Percentage of unannotated transcripts with listed genomic features. d, Analysis of unannotated lncRNA transcripts for coding potential using coding potential calculator (CPC). (See Supplementary Information for details.) e, RNA and protein expression profiles of three novel coding transcripts.
Extended Data Figure 10 Comparison of lncRNA expression in 1B secondary reprogramming and other reprogramming systems.
a, Pearson correlation analysis of differentially expressed unannotated RNA transcripts for 1B reprogramming samples and secondary reprogramming clones that are competent or incompetent to become factor-independent secondary iPSCs (SC and SI clones, respectively; ref. 7). b, Pearson correlation analysis of differentially expressed unannotated RNA transcripts for 1B reprogramming samples and sorted reprogramming intermediates from ref. 8. c, Heat map of differentially expressed novel RNAs from 1B reprogramming samples with secondary reprogramming clones that are competent or incompetent to become factor-independent secondary iPSCs (SC and SI clones, respectively; ref. 7). For gene lists related to c, refer to Supplementary Table 4. d, Read coverage histograms representing gene expression and epigenetic status of unannotated lncRNAs observed in F-class (D16H) and ESC-like state (secondary iPSCs). e, GO analysis results for genes downregulated in F-class state (FDR <1%), but unchanged in ESC-like state, from D8H (combined groups 3, 6 and 9). f, GO analysis results for genes upregulated in ESC-like state (FDR <1%), but unchanged in F-class state, from D8H (combined groups 1b, 4b and 7b). For gene lists, full GO term analyses and P values associated with e, f refer to Supplementary Table 5.
About this article
Cite this article
Hussein, S., Puri, M., Tonge, P. et al. Genome-wide characterization of the routes to pluripotency. Nature 516, 198–206 (2014). https://doi.org/10.1038/nature14046
- Received: 10 October 2013
- Accepted: 10 November 2014
- Published: 10 December 2014
- Issue date: 11 December 2014
- DOI: https://doi.org/10.1038/nature14046