Differential Expression for RNA Sequencing (RNA-Seq) Data: Mapping, Summarization, Statistical Analysis, and Experimental Design
’t Hoen PA, Ariyurek Y, Thygesen HH, et al. (2008) Deep sequencing-based expression analysis shows major advances in robustness, resolution and inter-lab portability over five microarray platforms. Nucleic Acids Res 36:e141
Ameur A, Wetterbom A, Feuk L, et al. (2010) Global and unbiased detection of splice junctions from RNA-seq data. Genome Biol 11:R34
Anders S and Huber W (2010) Differential expression analysis for sequence count data. Genome Biol 11:R106
Auer PL (2010) Statistical design and analysis of next-generation sequencing data. PhD dissertation, Purdue University
Auer PL and Doerge RW (2010) Statistical design and analysis of RNA sequencing data. Genetics 185:405–16
Babak T, Garrett-Engele P, Armour CD, et al. (2010) Genetic validation of whole-transcriptome sequencing for mapping expression affected by cis-regulatory variation. BMC Genomics 11:473
Binder H, Kirsten T, Loeffler M, et al. (2004) Sensitivity of microarray oligonucleotide probes: variability and effect of base composition. J Phys Chem B 108:18003–14
Blekhman R, Marioni JC, Zumbo P, et al. (2010) Sex-specific and lineage-specific alternative splicing in primates. Genome Res 20:180–9
Bock C, Tomazou EM, Brinkman AB, et al. (2010) Quantitative comparison of genome-wide DNA methylation mapping technologies. Nat Biotechnol 28:1106–14
Bradford JR, Hey Y, Yates T, et al. (2010) A comparison of massively parallel nucleotide sequencing with oligonucleotide microarrays for global transcription profiling. BMC Genomics 11:282
Bullard JH, Purdom E, Hansen KD, et al. (2010) Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics 11:94
Carvalho PC, Hewel J, Barbosa VC, et al. (2008) Identifying differences in protein expression levels by spectral counting and feature selection. Genet Mol Res 7:342–56
Churchill GA (2002) Fundamentals of experimental design for cDNA microarrays. Nat Genet 32 Suppl:490–5
Cloonan N, Forrest AR, Kolle G, et al. (2008) Stem cell transcriptome profiling via massive-scale mRNA sequencing. Nat Methods 5:613–9
De Bona F, Ossowski S, Schneeberger K, et al. (2008) Optimal spliced alignments of short sequence reads. Bioinformatics 24:i174–80
Degner JF, Marioni JC, Pai AA, et al. (2009) Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data. Bioinformatics 25:3207–12
Dennis G, Jr., Sherman BT, Hosack DA, et al. (2003) DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol 4:P3
Ferragina P and Manzini G (2000) Opportunistic data structures with applications. Annu Symp Found Comput Sci Proc 2000:390–8
Flicek P and Birney E (2009) Sense from sequence reads: methods for alignment and assembly. Nat Methods 6:S6–S12
Fu X, Fu N, Guo S, et al. (2009) Estimating accuracy of RNA-Seq and microarrays with proteomics. BMC Genomics 10:161
Griffith M, Griffith OL, Mwenifumbo J, et al. (2010) Alternative expression analysis by RNA sequencing. Nat Methods 7:843–7
Hansen KD, Brenner SE and Dudoit S (2010) Biases in Illumina transcriptome sequencing caused by random hexamer priming. Nucleic Acids Res 38:e131
Hardcastle TJ and Kelly KA (2010) baySeq: empirical Bayesian methods for identifying differential expression in sequence count data. BMC Bioinformatics 11:422
Harr B and Turner LM (2010) Genome-wide analysis of alternative splicing evolution among Mus subspecies. Mol Ecol 19 Suppl 1:228–39
Harris RA, Wang T, Coarfa C, et al. (2010) Comparison of sequencing-based methods to profile DNA methylation and identification of monoallelic epigenetic modifications. Nat Biotechnol 28:1097–105
Hawkins RD, Hon GC and Ren B (2010) Next-generation genomics: an integrative approach. Nat Rev Genet 11:476–86
Hu J, Coombes KR, Morris JS, et al. (2005) The importance of experimental design in proteomic mass spectrometry experiments: some cautionary tales. Brief Funct Genomic Proteomic 3:322–31
Jiang H and Wong WH (2009) Statistical inferences for isoform expression in RNA-Seq. Bioinformatics 25:1026–32
Kanehisa M and Goto S (2000) KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28:27–30
Langmead B, Hansen KD and Leek JT (2010) Cloud-scale RNA-sequencing differential expression analysis with Myrna. Genome Biol 11:R83
Langmead B, Trapnell C, Pop M, et al. (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10:R25
Levin JZ, Yassour M, Adiconis X, et al. (2010) Comprehensive comparative analysis of strand-specific RNA sequencing methods. Nat Methods 7:709–15
Li B, Ruotti V, Stewart RM, et al. (2010) RNA-Seq gene expression estimation with read mapping uncertainty. Bioinformatics 26:493–500
Li H and Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25:1754–60
Li H, Ruan J and Durbin R (2008) Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res 18:1851–8
Linsen SE, de Wit E, Janssens G, et al. (2009) Limitations and possibilities of small RNA digital gene expression profiling. Nat Methods 6:474–6
Lister R, Pelizzola M, Dowen RH, et al. (2009) Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 462:315–22
Liu S, Lin L, Jiang P, et al. (2011) A comparison of RNA-Seq and high-density exon array for detecting differential gene expression between closely related species. Nucleic Acids Res 39:578–88
Lu J, Tomfohr JK and Kepler TB (2005) Identifying differential expression in multiple SAGE libraries: an overdispersed log-linear model approach. BMC Bioinformatics 6:165
Maher CA, Kumar-Sinha C, Cao X, et al. (2009) Transcriptome sequencing to detect gene fusions in cancer. Nature 458:97–101
Marioni JC, Mason CE, Mane SM, et al. (2008) RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res 18:1509–17
McCullagh P and Nelder JA (1989) Generalized linear models, 2nd edn. Chapman and Hall, London; New York
Montgomery SB, Sammeth M, Gutierrez-Arcelus M, et al. (2010) Transcriptome genetics using second generation sequencing in a Caucasian population. Nature 464:773–7
Mortazavi A, Williams BA, McCue K, et al. (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5:621–8
Naef F and Magnasco MO (2003) Solving the riddle of the bright mismatches: labeling and effective binding in oligonucleotide arrays. Phys Rev E Stat Nonlin Soft Matter Phys 68:011906
Oshlack A and Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4:14
Ouyang Z, Zhou Q and Wong WH (2009) ChIP-Seq of transcription factors predicts absolute and differential gene expression in embryonic stem cells. Proc Natl Acad Sci USA 106:21521–6
Pan Q, Shai O, Lee LJ, et al. (2008) Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat Genet 40:1413–5
Parikh A, Miranda ER, Katoh-Kurasawa M, et al. (2010) Conserved developmental transcriptomes in evolutionarily divergent species. Genome Biol 11:R35
Picardi E, Horner DS, Chiara M, et al. (2010) Large-scale detection and analysis of RNA editing in grape mtDNA by RNA deep-sequencing. Nucleic Acids Res 38:4755–67
Pickrell JK, Marioni JC, Pai AA, et al. (2010) Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature 464:768–72
Quackenbush J (2002) Microarray data normalization and transformation. Nat Genet 32 Suppl:496–501
Quail MA, Kozarewa I, Smith F, et al. (2008) A large genome center’s improvements to the Illumina sequencing system. Nat Methods 5:1005–10
Raha D, Wang Z, Moqtaderi Z, et al. (2010) Close association of RNA polymerase II and many transcription factors with Pol III genes. Proc Natl Acad Sci USA 107:3639–44
Robertson G, Schein J, Chiu R, et al. (2010) De novo assembly and analysis of RNA-seq data. Nat Methods 7:909–12
Robinson MD, McCarthy DJ and Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26:139–40
Robinson MD and Oshlack A (2010) A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol 11:R25
Robinson MD and Smyth GK (2007) Moderated statistical tests for assessing differences in tag abundance. Bioinformatics 23:2881–7
Robinson MD and Smyth GK (2008) Small-sample estimation of negative binomial dispersion, with applications to SAGE data. Biostatistics 9:321–32
Robinson MD, Stirzaker C, Statham AL, et al. (2010) Evaluation of affinity-based genome-wide DNA methylation data: effects of CpG density, amplification bias, and copy number variation. Genome Res 20:1719–29
Schadt EE, Linderman MD, Sorenson J, et al. (2010) Computational solutions to large-scale data management and analysis. Nat Rev Genet 11:647–57
Simpson JT, Wong K, Jackman SD, et al. (2009) ABySS: a parallel assembler for short read sequence data. Genome Res 19:1117–23
Srivastava S and Chen L (2010) A two-parameter generalized Poisson model to improve the analysis of RNA-seq data. Nucleic Acids Res 38:e170
Subramanian A, Tamayo P, Mootha VK, et al. (2005) Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA 102:15545–50
Sultan M, Schulz MH, Richard H, et al. (2008) A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome. Science 321:956–60
Taub M and Speed TP (2010) Methods for allocating ambiguous short-reads. Commun Inf Syst 10:69–82
Trapnell C, Pachter L and Salzberg SL (2009) TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25:1105–11
Trapnell C, Williams BA, Pertea G, et al. (2010) Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol 28:511–5
Wang ET, Sandberg R, Luo S, et al. (2008) Alternative isoform regulation in human tissue transcriptomes. Nature 456:470–6
Wang L, Xi Y, Yu J, et al. (2010) A statistical method for the detection of alternative splicing using RNA-seq. PLoS One 5:e8529
Wang Z, Gerstein M and Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10:57–63
White JR, Nagarajan N and Pop M (2009) Statistical methods for detecting differentially abundant features in clinical metagenomic samples. PLoS Comput Biol 5:e1000352
Wu D, Lim E, Vaillant F, et al. (2010) ROAST: rotation gene set tests for complex microarray experiments. Bioinformatics 26:2176–82
Wu Z and Irizarry RA (2005) Stochastic models inspired by hybridization theory for short oligonucleotide arrays. J Comput Biol 12:882–93
Yang YH and Speed T (2002) Design issues for cDNA microarray experiments. Nat Rev Genet 3:579–88
Young MD, Wakefield MJ, Smyth GK, et al. (2010) Gene ontology analysis for RNA-seq: accounting for selection bias. Genome Biol 11:R14
Zerbino DR and Birney E (2008) Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 18:821–9
Zhang K, Li JB, Gao Y, et al. (2009) Digital RNA allelotyping reveals tissue-specific and allele-specific gene expression in human. Nat Methods 6:613–8