Ab initio reconstruction of cell type–specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs
- Article
- Published: 02 May 2010
- Mitchell Guttman (1, 2; contributed equally),
- Manuel Garber (1; contributed equally),
- Joshua Z Levin (1),
- Julie Donaghey (1),
- James Robinson (1),
- Xian Adiconis (1),
- Lin Fan (1),
- Magdalena J Koziol (1, 3),
- Andreas Gnirke (1),
- Chad Nusbaum (1),
- John L Rinn (1, 3),
- Eric S Lander (1, 2, 4) &
- Aviv Regev (1, 2, 5)
Nature Biotechnology volume 28, pages 503–510 (2010)
A Corrigendum to this article was published on 01 July 2010
This article has been updated
Abstract
Massively parallel cDNA sequencing (RNA-Seq) provides an unbiased way to study a transcriptome, including both coding and noncoding genes. Until now, most RNA-Seq studies have depended crucially on existing annotations and thus focused on expression levels and variation in known transcripts. Here, we present Scripture, a method to reconstruct the transcriptome of a mammalian cell using only RNA-Seq reads and the genome sequence. We applied it to mouse embryonic stem cells, neuronal precursor cells and lung fibroblasts to accurately reconstruct the full-length gene structures for most known expressed genes. We identified substantial variation in protein-coding genes, including thousands of novel 5′ start sites, 3′ ends and internal coding exons. We then determined the gene structures of more than a thousand large intergenic noncoding RNA (lincRNA) and antisense loci. Our results open the way to direct experimental manipulation of thousands of noncoding RNAs and demonstrate the power of ab initio reconstruction to render a comprehensive picture of mammalian transcriptomes.
Accession codes
Accessions
Gene Expression Omnibus
Change history
09 July 2010
In the version of this article initially published, the fourth sentence in the Methods section "RNA extraction and library preparation" described a "procedure that combines a random priming step with a shearing step (refs. 8, 9, 28) and results in fragments of ~700 bp in size". It should have read: "procedure that combines fragmentation of mRNA to a peak size of ~750 nucleotides by heating (ref. 6) followed by random-primed reverse transcription (ref. 8)". The error has been corrected in the HTML and PDF versions of the article.
References
1. Carninci, P. et al. The transcriptional landscape of the mammalian genome. Science 309, 1559–1563 (2005).
2. Kapranov, P. et al. RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science 316, 1484–1488 (2007).
3. Bertone, P. et al. Global identification of human transcribed sequences with genome tiling arrays. Science 306, 2242–2246 (2004).
4. Guttman, M. et al. Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature 458, 223–227 (2009).
5. Khalil, A.M. et al. Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc. Natl. Acad. Sci. USA 106, 11667–11672 (2009).
6. Cloonan, N. et al. Stem cell transcriptome profiling via massive-scale mRNA sequencing. Nat. Methods 5, 613–619 (2008).
7. Wang, E.T. et al. Alternative isoform regulation in human tissue transcriptomes. Nature 456, 470–476 (2008).
8. Mortazavi, A., Williams, B.A., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods 5, 621–628 (2008).
9. Yassour, M. et al. Ab initio construction of a eukaryotic transcriptome by massively parallel mRNA sequencing. Proc. Natl. Acad. Sci. USA 106, 3264–3269 (2009).
10. Pan, Q., Shai, O., Lee, L.J., Frey, B.J. & Blencowe, B.J. Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat. Genet. 40, 1413–1415 (2008).
11. Maher, C.A. et al. Transcriptome sequencing to detect gene fusions in cancer. Nature 458, 97–101 (2009).
12. Birol, I. et al. De novo transcriptome assembly with ABySS. Bioinformatics 25, 2872–2877 (2009).
13. Trapnell, C., Pachter, L. & Salzberg, S.L. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25, 1105–1111 (2009).
14. Denoeud, F. et al. Annotating genomes with massive-scale RNA sequencing. Genome Biol. 9, R175 (2008).
15. Pruitt, K.D., Tatusova, T. & Maglott, D.R. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 35, D61–D65 (2007).
16. Mikkelsen, T.S. et al. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature 448, 553–560 (2007).
17. Lin, M.F., Deoras, A.N., Rasmussen, M.D. & Kellis, M. Performance and scalability of discriminative metrics for comparative gene identification in 12 Drosophila genomes. PLoS Comput. Biol. 4, e1000067 (2008).
18. Lin, M.F. et al. Revisiting the protein-coding gene catalog of Drosophila melanogaster using 12 fly genomes. Genome Res. 17, 1823–1836 (2007).
19. Garber, M. et al. Identifying novel constrained elements by exploiting biased substitution patterns. Bioinformatics 25, i54–i62 (2009).
20. Brown, C.J. et al. A gene from the region of the human X inactivation centre is expressed exclusively from the inactive X chromosome. Nature 349, 38–44 (1991).
21. Rinn, J.L. et al. Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs. Cell 129, 1311–1323 (2007).
22. Willingham, A.T. et al. A strategy for probing the function of noncoding RNAs finds a repressor of NFAT. Science 309, 1570–1573 (2005).
23. Zhao, J., Sun, B.K., Erwin, J.A., Song, J.J. & Lee, J.T. Polycomb proteins targeted by a short repeat RNA to the mouse X chromosome. Science 322, 750–756 (2008).
24. Katayama, S. et al. Antisense transcription in the mammalian transcriptome. Science 309, 1564–1566 (2005).
25. Wu, J.Q. et al. Dynamic transcriptomes during neural differentiation of human embryonic stem cells revealed by short, long, and paired-end sequencing. Proc. Natl. Acad. Sci. USA 107, 5254–5259 (2010).
26. Ramsköld, D., Wang, E.T., Burge, C.B. & Sandberg, R. An abundance of ubiquitously expressed genes revealed by tissue transcriptome sequence data. PLoS Comput. Biol. 5, e1000598 (2009).
27. Conti, L. et al. Niche-independent symmetrical self-renewal of a mammalian tissue stem cell. PLoS Biol. 3, e283 (2005).
28. Berger, M.F. et al. Integrative analysis of the melanoma transcriptome. Genome Res. 20, 413–427 (2010).
29. Lister, R. et al. Highly integrated single-base resolution maps of the epigenome in Arabidopsis. Cell 133, 523–536 (2008).
30. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S.L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
31. Ewens, W.J. & Grant, G.R. Statistical Methods in Bioinformatics: An Introduction 2nd edn. (Springer, 2005).
32. Glaz, J., Naus, J.I. & Wallenstein, S. Scan Statistics (Springer, 2001).
Acknowledgements
We thank M. Wernig (MIT) for providing neuronal precursor cells (NPCs); M. Lin and M. Kellis (MIT) for codon substitution frequency (CSF) code; the Broad Sequencing Platform for sample sequencing; L. Gaffney for assistance with graphics; and C. Burge, J. Merkin, R. Bradley and members of the Lander and Regev laboratories (in particular, M. Yassour, T. Mikkelsen and I. Amit) for discussions. A.R. and J.L.R. were supported by the Merkin Family Foundation for Stem Cell Research at the Broad Institute. M. Guttman was supported by a Vertex scholarship. Work was supported by a Burroughs Wellcome Fund Career Award at the Scientific Interface, a US National Institutes of Health PIONEER award, a US National Human Genome Research Institute (NHGRI) R01 grant and the Howard Hughes Medical Institute (A.R.), and by NHGRI and the Broad Institute of MIT and Harvard (E.S.L.).
Author information
Author notes
- Mitchell Guttman and Manuel Garber: These authors contributed equally to this work.
Authors and Affiliations
1. Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
   Mitchell Guttman, Manuel Garber, Joshua Z Levin, Julie Donaghey, James Robinson, Xian Adiconis, Lin Fan, Magdalena J Koziol, Andreas Gnirke, Chad Nusbaum, John L Rinn, Eric S Lander & Aviv Regev
2. Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
   Mitchell Guttman, Eric S Lander & Aviv Regev
3. Department of Pathology, Beth Israel Deaconess Medical Center, Boston, Massachusetts, USA
   Magdalena J Koziol & John L Rinn
4. Department of Systems Biology, Harvard Medical School, Boston, Massachusetts, USA
   Eric S Lander
5. Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
   Aviv Regev
Contributions
M. Guttman and M. Garber conceived the project, designed research, implemented Scripture, performed computational analysis and wrote the paper. A.G., C.N. and J.Z.L. oversaw cDNA sequencing, provided molecular biology advice and helped to edit the manuscript. J.D. constructed cDNA libraries, performed validation experiments and helped to edit the manuscript. J.R. implemented components of Scripture and provided computational support and technical advice. X.A., L.F. and M.J.K. constructed cDNA libraries. J.L.R. provided reagents and helped edit the manuscript. E.S.L. designed research direction and wrote the paper. A.R. provided cDNA sequencing guidance, conceived the project, designed research direction and wrote the paper.
Corresponding authors
Correspondence to Mitchell Guttman, Manuel Garber or Aviv Regev.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
About this article
Cite this article
Guttman, M., Garber, M., Levin, J. et al. Ab initio reconstruction of cell type–specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nat Biotechnol 28, 503–510 (2010). https://doi.org/10.1038/nbt.1633
- Received: 10 March 2010
- Accepted: 06 April 2010
- Published: 02 May 2010
- Issue Date: May 2010
- DOI: https://doi.org/10.1038/nbt.1633