ReCount: a multi-experiment resource of analysis-ready RNA-seq gene count datasets - PubMed

ReCount: a multi-experiment resource of analysis-ready RNA-seq gene count datasets

Alyssa C Frazee et al. BMC Bioinformatics. 2011.

Abstract

Background: RNA sequencing is a flexible and powerful new approach for measuring gene, exon, or isoform expression. To maximize the utility of RNA sequencing data, new statistical methods are needed for clustering, differential expression, and other analyses. A major barrier to the development of new statistical methods is the lack of RNA sequencing datasets that can be easily obtained and analyzed in common statistical software packages such as R. To speed up the development process, we have created a resource of analysis-ready RNA-sequencing datasets.

Description: ReCount is an online resource of RNA-seq gene count tables and auxiliary data. Tables were built from raw RNA sequencing data from 18 different published studies comprising 475 samples and over 8 billion reads. Using the Myrna package, reads were aligned, overlapped with gene models, and tabulated into gene-by-sample count tables that are ready for statistical analysis. Count tables and phenotype data were combined into Bioconductor ExpressionSet objects for ease of analysis. ReCount also contains the Myrna manifest files and R source code used to process the samples, allowing statistical and computational scientists to consider alternative parameter values.

Conclusions: By combining datasets from many studies and providing data that has already been processed from .fastq format into ready-to-use .RData and .txt files, ReCount facilitates analysis and methods development for RNA-seq count data. We anticipate that ReCount will also be useful for investigators who wish to consider cross-study comparisons and alternative normalization strategies for RNA-seq.
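As an illustration of the tabulation step described above, the following is a minimal sketch of how per-read gene assignments could be collapsed into a gene-by-sample count table. It is not the Myrna implementation: the function name `build_count_table`, the input layout (one `(sample, gene)` pair per aligned read), and the toy sample and gene names are all assumptions made for this example.

```python
from collections import Counter


def build_count_table(assignments):
    """Tabulate (sample, gene) read assignments into a gene-by-sample table.

    `assignments` is a list with one (sample, gene) pair per aligned read.
    Returns the sorted sample names and a dict mapping each gene to its
    list of counts, ordered to match the sample names.
    """
    counts = Counter(assignments)
    samples = sorted({sample for sample, _ in assignments})
    genes = sorted({gene for _, gene in assignments})
    # Counter returns 0 for (sample, gene) pairs that were never observed.
    table = {gene: [counts[(sample, gene)] for sample in samples] for gene in genes}
    return samples, table


# Hypothetical toy input: four reads across two samples and two genes.
reads = [
    ("sampleA", "gene1"),
    ("sampleA", "gene1"),
    ("sampleB", "gene1"),
    ("sampleB", "gene2"),
]
samples, table = build_count_table(reads)
```

Each row of the resulting table corresponds to a gene and each column to a sample, which is the shape the abstract describes for the distributed count tables.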

Figures

Figure 1

Histogram of adjusted p-values from differential expression analysis on the 29 samples included in both Cheung and Montgomery. The p-values in the histogram are from paired t-tests on the 25% of genes with nonzero counts in at least one of the two studies. The peak near zero is somewhat indicative of technical variability between the two studies.
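The per-gene comparison behind this kind of figure can be sketched as follows, under stated assumptions. The sketch keeps genes with a nonzero count in at least one of the two studies and computes a paired t-statistic across the shared samples. It stops at the test statistic: converting it to a p-value requires a t distribution (for example via `scipy.stats.ttest_rel`), and the caption does not state which multiple-testing adjustment was used, so both steps are left out. The function names and toy counts are hypothetical, and any transformation or normalization applied to the counts in the original analysis is not reproduced here.

```python
import math
import statistics


def expressed_genes(study1, study2):
    """Return genes with a nonzero count in at least one of the two studies."""
    return [gene for gene in study1 if any(study1[gene]) or any(study2[gene])]


def paired_t_statistic(x, y):
    """Paired t-statistic for matched samples x[i], y[i].

    Returns None when the paired differences have zero variance,
    since the statistic is undefined in that case.
    """
    diffs = [a - b for a, b in zip(x, y)]
    sd = statistics.stdev(diffs)
    if sd == 0:
        return None
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


# Hypothetical counts for the same four samples measured in two studies.
study1 = {"g1": [10, 12, 9, 11], "g2": [0, 0, 0, 0], "g3": [5, 5, 5, 5]}
study2 = {"g1": [8, 9, 7, 10], "g2": [0, 0, 0, 0], "g3": [5, 5, 5, 5]}

kept = expressed_genes(study1, study2)
t_stats = {gene: paired_t_statistic(study1[gene], study2[gene]) for gene in kept}
```

Because the same samples appear in both studies, a paired test on the per-sample differences is the natural choice here; an unpaired test would ignore that matching.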

Figure 2

Histogram of adjusted p-values from analysis of differential expression between YRI and CEU populations. The p-values in the histogram are from two-sample t-tests on the 25% of genes with nonzero counts in at least one of the two studies. The peak near zero indicates differential gene expression that may result from either technical or biological variability.

Cited by

Expression profile of ectopic olfactory receptors determined by deep sequencing.
Flegel C, Manteniotis S, Osthold S, Hatt H, Gisselmann G. PLoS One. 2013;8(2):e55368. doi: 10.1371/journal.pone.0055368. Epub 2013 Feb 6. PMID: 23405139. Free PMC article.
An Iterative Leave-One-Out Approach to Outlier Detection in RNA-Seq Data.
George NI, Bowyer JF, Crabtree NM, Chang CW. PLoS One. 2015 Jun 3;10(6):e0125224. doi: 10.1371/journal.pone.0125224. eCollection 2015. PMID: 26039068. Free PMC article.
Elucidating tissue specific genes using the Benford distribution.
Karthik D, Stelzer G, Gershanov S, Baranes D, Salmon-Divon M. BMC Genomics. 2016 Aug 9;17:595. doi: 10.1186/s12864-016-2921-x. PMID: 27506195. Free PMC article.
Sparse sliced inverse regression for high dimensional data analysis.
Hilafu H, Safo SE. BMC Bioinformatics. 2022 May 7;23(1):168. doi: 10.1186/s12859-022-04700-3. PMID: 35525975. Free PMC article.
Differential abundance analysis for microbial marker-gene surveys.
Paulson JN, Stine OC, Bravo HC, Pop M. Nat Methods. 2013 Dec;10(12):1200-2. doi: 10.1038/nmeth.2658. Epub 2013 Sep 29. PMID: 24076764. Free PMC article.
