RNA-Seq gene expression estimation with read mapping uncertainty
Bo Li et al. Bioinformatics. 2010.
Abstract
Motivation: RNA-Seq is a promising new technology for accurately measuring gene expression levels. Expression estimation with RNA-Seq requires the mapping of relatively short sequencing reads to a reference genome or transcript set. Because reads are generally shorter than transcripts from which they are derived, a single read may map to multiple genes and isoforms, complicating expression analyses. Previous computational methods either discard reads that map to multiple locations or allocate them to genes heuristically.
Results: We present a generative statistical model and associated inference methods that handle read mapping uncertainty in a principled manner. Through simulations parameterized by real RNA-Seq data, we show that our method is more accurate than previous methods. Our improved accuracy is the result of handling read mapping uncertainty with a statistical model and the estimation of gene expression levels as the sum of isoform expression levels. Unlike previous methods, our method is capable of modeling non-uniform read distributions. Simulations with our method indicate that a read length of 20-25 bases is optimal for gene-level expression estimation from mouse and maize RNA-Seq data when sequencing throughput is fixed.
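The idea of allocating ambiguous reads with a statistical model, then reporting gene expression as the sum of isoform expression, can be illustrated with a small expectation-maximization (EM) loop. The sketch below is not the authors' model: it assumes every candidate alignment of a read is equally plausible a priori, and it ignores transcript length, sequencing error and non-uniform read start positions. The function names (`em_abundance`, `gene_level`) and the toy data are hypothetical.

```python
from collections import defaultdict


def em_abundance(read_hits, n_transcripts, n_iter=200, tol=1e-12):
    """Estimate the fraction of reads originating from each transcript.

    read_hits: one list per read, holding the indices of the transcripts
    that the read aligns to (more than one index = multi-mapping read).
    Simplifying assumption: all alignments of a read are equally good.
    """
    theta = [1.0 / n_transcripts] * n_transcripts
    for _ in range(n_iter):
        # E-step: split each read across its candidates in proportion
        # to the current abundance estimates.
        counts = [0.0] * n_transcripts
        for hits in read_hits:
            total = sum(theta[t] for t in hits)
            if total == 0.0:
                continue  # no candidate has support; skip this read
            for t in hits:
                counts[t] += theta[t] / total
        # M-step: renormalize expected counts into fractions.
        assigned = sum(counts)
        new_theta = [c / assigned for c in counts]
        converged = max(abs(a - b) for a, b in zip(theta, new_theta)) < tol
        theta = new_theta
        if converged:
            break
    return theta


def gene_level(theta, transcript_to_gene):
    """Sum isoform-level fractions into gene-level fractions."""
    genes = defaultdict(float)
    for t, gene in enumerate(transcript_to_gene):
        genes[gene] += theta[t]
    return dict(genes)


# Toy data: transcripts 0 and 1 are isoforms of gene "A"; transcript 2 is gene "B".
toy_reads = [[0]] * 6 + [[0, 1]] * 4 + [[2]] * 2
toy_theta = em_abundance(toy_reads, n_transcripts=3)
toy_genes = gene_level(toy_theta, ["A", "A", "B"])
```

In this toy case the four ambiguous reads map only to isoforms of the same gene, so the gene-level estimate for "A" does not depend on how they are split between isoforms, while the isoform-level split is driven by the uniquely mapping reads.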
Figures
Fig. 1.
The graphical model for RNA-Seq data used by our method.
Fig. 2.
Gene expression estimation accuracy varies with read length given fixed base throughput (_T_). The curves are (1) mouse liver, _T_=375 × 10^6, (2) mouse liver, _T_=750 × 10^6, (3) mouse liver, _T_=1.5 × 10^7, (4) mouse brain, _T_=750 × 10^6 and (5) maize, _T_=750 × 10^6. The τ MPE was calculated with respect to the true expression values for all genes with true level at least 1 TPM.
Similar articles
- Zea mays RNA-seq estimated transcript abundances are strongly affected by read mapping bias.
  Zhan S, Griswold C, Lukens L. BMC Genomics. 2021 Apr 20;22(1):285. doi: 10.1186/s12864-021-07577-3. PMID: 33874908. Free PMC article.
- Haplotype and isoform specific expression estimation using multi-mapping RNA-seq reads.
  Turro E, Su SY, Gonçalves Â, Coin LJ, Richardson S, Lewin A. Genome Biol. 2011;12(2):R13. doi: 10.1186/gb-2011-12-2-r13. Epub 2011 Feb 10. PMID: 21310039. Free PMC article.
- TIGAR2: sensitive and accurate estimation of transcript isoform expression with longer RNA-Seq reads.
  Nariai N, Kojima K, Mimori T, Sato Y, Kawai Y, Yamaguchi-Kabata Y, Nagasaki M. BMC Genomics. 2014;15 Suppl 10(Suppl 10):S5. doi: 10.1186/1471-2164-15-S10-S5. Epub 2014 Dec 12. PMID: 25560536. Free PMC article.
- Mapping RNA-seq Reads with STAR.
  Dobin A, Gingeras TR. Curr Protoc Bioinformatics. 2015 Sep 3;51:11.14.1-11.14.19. doi: 10.1002/0471250953.bi1114s51. PMID: 26334920. Free PMC article. Review.
- Characterizing and annotating the genome using RNA-seq data.
  Chen G, Shi T, Shi L. Sci China Life Sci. 2017 Feb;60(2):116-125. doi: 10.1007/s11427-015-0349-4. Epub 2016 Jun 13. PMID: 27294835. Review.
Cited by
- Exploration and Validation of Immune and Therapeutic-Related Hub Genes in Aortic Valve Calcification and Carotid Atherosclerosis.
  Wei K, Cao Y, Kong X, Liu C, Gu X. J Inflamm Res. 2024 Sep 17;17:6485-6500. doi: 10.2147/JIR.S462546. eCollection 2024. PMID: 39310903. Free PMC article.
- TYRP1 directed CAR T cells control tumor progression in preclinical melanoma models.
  Hackett CS, Hirschhorn D, Tang MS, Purdon TJ, Marouf Y, Piersigilli A, Agaram NP, Liu C, Schad SE, de Stanchina E, Rafiq S, Monette S, Wolchok JD, Merghoub T, Brentjens RJ. Mol Ther Oncol. 2024 Aug 22;32(3):200862. doi: 10.1016/j.omton.2024.200862. eCollection 2024 Sep 19. PMID: 39308793. Free PMC article.
- Enhancing transcriptome expression quantification through accurate assignment of long RNA sequencing reads with TranSigner.
  Ji HJ, Pertea M. bioRxiv [Preprint]. 2024 Aug 17:2024.04.13.589356. doi: 10.1101/2024.04.13.589356. PMID: 39185147. Free PMC article. Preprint.
- A novel flavobacterial phage abundant during green tide, representing a new viral family, Zblingviridae.
  Guo X, Zhang X, Shao H, McMinn A, Liang Y, Wang M. Appl Environ Microbiol. 2024 Jul 24;90(7):e0036724. doi: 10.1128/aem.00367-24. Epub 2024 Jul 2. PMID: 38953371. Free PMC article.