Computational methods for transcriptome annotation and quantification using RNA-seq
Marra, M. et al. An encyclopedia of mouse genes. Nat. Genet. 21, 191–194 (1999).
Carninci, P. et al. Targeting a complex transcriptome: the construction of the mouse full-length cDNA encyclopedia. Genome Res. 13, 1273–1289 (2003).
de Souza, S.J. et al. Identification of human chromosome 22 transcribed sequences with ORF expressed sequence tags. Proc. Natl. Acad. Sci. USA 97, 12690–12693 (2000).
Guttman, M. et al. Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature 458, 223–227 (2009).
Wang, E.T. et al. Alternative isoform regulation in human tissue transcriptomes. Nature 456, 470–476 (2008).
Adams, M.D. et al. Complementary DNA sequencing: expressed sequence tags and human genome project. Science 252, 1651–1656 (1991).
Haas, B.J. et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 31, 5654–5666 (2003).
Wu, T.D. & Watanabe, C.K. GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics 21, 1859–1875 (2005).
Kapranov, P. et al. Large-scale transcriptional activity in chromosomes 21 and 22. Science 296, 916–919 (2002).
Pan, Q. et al. Revealing global regulatory features of mammalian alternative splicing using a quantitative microarray platform. Mol. Cell 16, 929–941 (2004).
Castle, J.C. et al. Expression of 24,426 human alternative splicing events and predicted cis regulation in 48 tissues and cell lines. Nat. Genet. 40, 1416–1425 (2008).
Schena, M., Shalon, D., Davis, R.W. & Brown, P.O. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270, 467–470 (1995).
Golub, T.R. et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286, 531–537 (1999).
Cloonan, N. et al. Stem cell transcriptome profiling via massive-scale mRNA sequencing. Nat. Methods 5, 613–619 (2008).
Denoeud, F. et al. Annotating genomes with massive-scale RNA sequencing. Genome Biol. 9, R175 (2008).
Lister, R. et al. Highly integrated single-base resolution maps of the epigenome in Arabidopsis. Cell 133, 523–536 (2008).
Maher, C.A. et al. Transcriptome sequencing to detect gene fusions in cancer. Nature 458, 97–101 (2009).
Marioni, J.C., Mason, C.E., Mane, S.M., Stephens, M. & Gilad, Y. RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res. 18, 1509–1517 (2008). The first systematic comparison of expression arrays and RNA-seq, which revealed that technical variability between RNA-seq runs is extremely low; the authors developed the first methods for principled differential analysis of expression with read counts.
Mortazavi, A., Williams, B.A., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-seq. Nat. Methods 5, 621–628 (2008). One of the first papers to describe the RNA-seq experimental protocol; it provided the foundations for the computational analysis of quantitative transcriptome sequencing by introducing the RPKM expression metric.
Nagalakshmi, U. et al. The transcriptional landscape of the yeast genome defined by RNA sequencing. Science 320, 1344–1349 (2008).
Sultan, M. et al. A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome. Science 321, 956–960 (2008).
Yassour, M. et al. Ab initio construction of a eukaryotic transcriptome by massively parallel mRNA sequencing. Proc. Natl. Acad. Sci. USA 106, 3264–3269 (2009).
Blekhman, R., Marioni, J.C., Zumbo, P., Stephens, M. & Gilad, Y. Sex-specific and lineage-specific alternative splicing in primates. Genome Res. 20, 180–189 (2010).
Wilhelm, B.T. et al. RNA-seq analysis of two closely related leukemia clones that differ in their self-renewal capacity. Blood 117, e27–e38 (2010).
Berger, M.F. et al. Integrative analysis of the melanoma transcriptome. Genome Res. 20, 413–427 (2010).
Mortazavi, A. et al. Scaffolding a Caenorhabditis nematode genome with RNA-seq. Genome Res. 20, 1740–1747 (2010).
Guttman, M. et al. Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nat. Biotechnol. 28, 503–510 (2010). This paper describes a spliced alignment–based, genome-guided transcript reconstruction method that allows discovery of novel genes and isoforms from RNA-seq data.
Trapnell, C. et al. Transcript assembly and quantification by RNA-seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat. Biotechnol. 28, 511–515 (2010). This paper describes a spliced alignment–based, genome-guided transcript reconstruction method that allows discovery of novel genes and isoforms from RNA-seq data, and provides a method for estimating the expression of each reconstructed isoform.
Katz, Y., Wang, E.T., Airoldi, E.M. & Burge, C.B. Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat. Methods 7, 1009–1015 (2010). This paper describes a computational method that estimates isoform expression making use of both single and paired-end reads, and provides a Bayesian approach for detecting differential isoform expression.
Homer, N., Merriman, B. & Nelson, S.F. BFAST: an alignment tool for large scale genome resequencing. PLoS ONE 4, e7767 (2009).
Jiang, H. & Wong, W.H. SeqMap: mapping massive amount of oligonucleotides to the genome. Bioinformatics 24, 2395–2396 (2008). A statistical algorithm to calculate isoform abundances for alternatively spliced genes is described.
Li, H., Ruan, J. & Durbin, R. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 18, 1851–1858 (2008).
Li, R., Li, Y., Kristiansen, K. & Wang, J. SOAP: short oligonucleotide alignment program. Bioinformatics 24, 713–714 (2008).
Lunter, G. & Goodson, M. Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res. advance online publication 27 October 2010 (doi:10.1101/gr.111120.110).
Rizk, G. & Lavenier, D. GASSST: global alignment short sequence search tool. Bioinformatics 26, 2534–2540 (2010).
Rumble, S.M. et al. SHRiMP: accurate mapping of short color-space reads. PLoS Comput. Biol. 5, e1000386 (2009).
Smith, A.D., Xuan, Z. & Zhang, M.Q. Using quality scores and longer reads improves accuracy of Solexa read mapping. BMC Bioinformatics 9, 128 (2008).
Langmead, B., Trapnell, C., Pop, M. & Salzberg, S.L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009). Introduced short read alignment with the Burrows-Wheeler transform, allowing the construction of the first fast alignment pipelines for RNA-seq.
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Li, R. et al. SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics 25, 1966–1967 (2009).
Burrows, M. & Wheeler, D.J. A block-sorting lossless data compression algorithm. Digital SRC Reports 124 (1994).
Ferragina, P. & Manzini, G. An experimental study of a compressed index. Inf. Sci. 135, 13–28 (2001).
Griffith, M. et al. Alternative expression analysis by RNA sequencing. Nat. Methods 7, 843–847 (2010).
Cloonan, N. et al. RNA-MATE: a recursive mapping strategy for high-throughput RNA-sequencing data. Bioinformatics 25, 2615–2616 (2009).
Degner, J.F. et al. Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data. Bioinformatics 25, 3207–3212 (2009).
Au, K.F., Jiang, H., Lin, L., Xing, Y. & Wong, W.H. Detection of splice junctions from paired-end RNA-seq data by SpliceMap. Nucleic Acids Res. 38, 4570–4578 (2010).
Trapnell, C., Pachter, L. & Salzberg, S.L. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25, 1105–1111 (2009). This method combined fast read alignment based on the Burrows-Wheeler transform with novel junction discovery, was one of the first scalable RNA-seq alignment programs, and paved the way for gene discovery and transcript reconstruction with RNA-seq.
Wang, K. et al. MapSplice: accurate mapping of RNA-seq reads for splice junction discovery. Nucleic Acids Res. 38, e178 (2010).
Wu, T.D. & Nacu, S. Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics 26, 873–881 (2010).
De Bona, F., Ossowski, S., Schneeberger, K. & Ratsch, G. Optimal spliced alignments of short sequence reads. Bioinformatics 24, i174–i180 (2008).
Mikkelsen, T.S. et al. Genome of the marsupial Monodelphis domestica reveals innovation in non-coding sequences. Nature 447, 167–177 (2007).
Robertson, G. et al. De novo assembly and analysis of RNA-seq data. Nat. Methods 7, 909–912 (2010). Described a variable k-mer approach for genome-independent reconstruction that allows for transcript discovery without a reference genome.
Birol, I. et al. De novo transcriptome assembly with ABySS. Bioinformatics 25, 2872–2877 (2009).
Surget-Groba, Y. & Montoya-Burgos, J.I. Optimization of de novo transcriptome assembly from next-generation sequencing data. Genome Res. 20, 1432–1440 (2010).
De Bruijn, N.G. A combinatorial problem. Koninklijke Nederlandse Akademie v. Wetenschappen 46, 6 (1946).
Pevzner, P.A. 1-Tuple DNA sequencing: computer analysis. J. Biomol. Struct. Dyn. 7, 63–73 (1989).
Zerbino, D.R. & Birney, E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 18, 821–829 (2008).
Zerbino, D.R. Using the Velvet de novo assembler for short-read sequencing technologies. Curr. Protoc. Bioinformatics 31, 11.5.1–11.5.12 (2010).
Blencowe, B.J., Ahmad, S. & Lee, L.J. Current-generation high-throughput sequencing: deepening insights into mammalian transcriptomes. Genes Dev. 23, 1379–1386 (2009).
Lister, R., Gregory, B.D. & Ecker, J.R. Next is now: new technologies for sequencing of genomes, transcriptomes, and beyond. Curr. Opin. Plant Biol. 12, 107–118 (2009).
Pepke, S., Wold, B. & Mortazavi, A. Computation for ChIP-seq and RNA-seq studies. Nat. Methods 6, S22–S32 (2009).
Wang, Z., Gerstein, M. & Snyder, M. RNA-Seq: a revolutionary tool for transcriptomics. Nat. Rev. Genet. 10, 57–63 (2009).
Oshlack, A. & Wakefield, M.J. Transcript length bias in RNA-seq data confounds systems biology. Biol. Direct 4, 14 (2009).
Robinson, M.D. & Oshlack, A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 11, R25 (2010).
Jiang, H. & Wong, W.H. Statistical inferences for isoform expression in RNA-Seq. Bioinformatics 25, 1026–1032 (2009).
Li, B., Ruotti, V., Stewart, R.M., Thomson, J.A. & Dewey, C.N. RNA-Seq gene expression estimation with read mapping uncertainty. Bioinformatics 26, 493–500 (2010).
Bullard, J.H., Purdom, E., Hansen, K.D. & Dudoit, S. Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics 11, 94 (2010).
Wang, X., Wu, Z. & Zhang, X. Isoform abundance inference provides a more accurate estimation of gene expression levels in RNA-seq. J. Bioinform. Comput. Biol. 8 (Suppl. 1), 177–192 (2010).
Tusher, V.G., Tibshirani, R. & Chu, G. Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl. Acad. Sci. USA 98, 5116–5121 (2001).
Grant, G.R., Manduchi, E. & Stoeckert, C.J. Jr. Analysis and management of microarray gene expression data. Curr. Protoc. Mol. Biol. 19 6 (2007).
Grant, G.R., Liu, J. & Stoeckert, C.J. Jr. A practical false discovery rate approach to identifying patterns of differential expression in microarray data. Bioinformatics 21, 2684–2690 (2005).
Robinson, M.D. & Smyth, G.K. Moderated statistical tests for assessing differences in tag abundance. Bioinformatics 23, 2881–2887 (2007). Provided a statistical framework that is well suited to differential expression testing when a small number of RNA-seq replicates are available, and which also works well for larger experiments.
Robinson, M.D., McCarthy, D.J. & Smyth, G.K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).
Anders, S. & Huber, W. Differential expression analysis for sequence count data. Genome Biol. 11, R106 (2010).
Wang, L., Feng, Z., Wang, X. & Zhang, X. DEGseq: an R package for identifying differentially expressed genes from RNA-seq data. Bioinformatics 26, 136–138 (2010).
Levin, J.Z. et al. Comprehensive comparative analysis of strand-specific RNA sequencing methods. Nat. Methods 7, 709–715 (2010).
Jan, C.H., Friedman, R.C., Ruby, J.G. & Bartel, D.P. Formation, regulation and evolution of Caenorhabditis elegans 3′UTRs. Nature 469, 97–101 (2011).
Mangone, M. et al. The landscape of C. elegans 3′UTRs. Science 329, 432–435 (2010).
Plessy, C. et al. Linking promoters to functional transcripts in small samples with nanoCAGE and CAGEscan. Nat. Methods 7, 528–534 (2010).
Lee, S. et al. Accurate quantification of transcriptome from RNA-Seq data by effective length normalization. Nucleic Acids Res. 39, e9 (2010).