RNA-Seq gene expression estimation with read mapping uncertainty - PubMed

RNA-Seq gene expression estimation with read mapping uncertainty

Bo Li et al. Bioinformatics. 2010.

Abstract

Motivation: RNA-Seq is a promising new technology for accurately measuring gene expression levels. Expression estimation with RNA-Seq requires the mapping of relatively short sequencing reads to a reference genome or transcript set. Because reads are generally shorter than transcripts from which they are derived, a single read may map to multiple genes and isoforms, complicating expression analyses. Previous computational methods either discard reads that map to multiple locations or allocate them to genes heuristically.
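To make the problem concrete, the sketch below applies two of the simple strategies described above to a toy set of alignments. The first discards every read that maps to more than one gene. The second is one possible heuristic, which splits each multi-mapping read among its candidate genes in proportion to their unique-read counts. The function name and input format are hypothetical, and the proportional split is only an illustrative heuristic, not necessarily the rule used by any particular earlier tool.

```python
from collections import defaultdict


def allocate_heuristically(read_alignments):
    """Compare two simple ways of handling multi-mapping reads (toy example).

    read_alignments: list of lists; each inner list holds the distinct genes
    a single read maps to. Returns (discard_counts, proportional_counts).
    """
    # Strategy 1: count only reads that map to exactly one gene.
    unique = defaultdict(float)
    for genes in read_alignments:
        if len(genes) == 1:
            unique[genes[0]] += 1.0
    discard_counts = dict(unique)

    # Strategy 2 (hypothetical heuristic): start from the unique counts and
    # split each multi-mapping read in proportion to those counts.
    proportional = defaultdict(float, unique)
    for genes in read_alignments:
        if len(genes) > 1:
            weights = [unique.get(g, 0.0) for g in genes]
            total = sum(weights)
            for g, w in zip(genes, weights):
                # Fall back to an even split if no candidate has unique reads.
                proportional[g] += w / total if total > 0 else 1.0 / len(genes)
    return discard_counts, dict(proportional)
```

With three unique reads on gene A, one on gene B, and two reads shared between them, discarding yields counts of 3 and 1, while the proportional split yields 4.5 and 1.5. Neither rule is derived from a model of how the reads were generated, which is the gap the statistical approach in this paper targets.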

Results: We present a generative statistical model and associated inference methods that handle read mapping uncertainty in a principled manner. Through simulations parameterized by real RNA-Seq data, we show that our method is more accurate than previous methods. Our improved accuracy is the result of handling read mapping uncertainty with a statistical model and the estimation of gene expression levels as the sum of isoform expression levels. Unlike previous methods, our method is capable of modeling non-uniform read distributions. Simulations with our method indicate that a read length of 20-25 bases is optimal for gene-level expression estimation from mouse and maize RNA-Seq data when sequencing throughput is fixed.
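The abstract does not spell out the inference procedure, so the following is only a minimal sketch under simplifying assumptions. Each read is reduced to the set of transcripts it aligns to, and a read from transcript i is assumed equally likely to start at any of its ℓ_i effective positions, so P(read | i) = 1/ℓ_i. Under that assumption, expectation-maximization (EM) alternates between two steps: fractionally assigning each read to its candidate transcripts, then re-estimating the read fractions θ. The sketch converts θ to transcript fractions with τ_i ∝ θ_i/ℓ_i (an assumption about how τ relates to θ) and obtains gene-level values by summing isoform-level τ. The function names em_isoform_abundance and gene_level are hypothetical, and the paper's full model contains more detail than is shown here.

```python
from collections import defaultdict


def em_isoform_abundance(compat, eff_len, n_iter=200):
    """Estimate read fractions theta by EM under a simplified model.

    compat: list of lists; compat[n] holds the indices of the transcripts
            that read n aligns to (assumed non-empty).
    eff_len: effective length of each transcript (assumed > 0).
    Returns (theta, tau): read fractions and transcript fractions.
    """
    k = len(eff_len)
    theta = [1.0 / k] * k  # start from uniform read fractions
    for _ in range(n_iter):
        expected = [0.0] * k
        for cands in compat:
            # E-step: the responsibility of transcript i for this read is
            # proportional to theta_i * P(read | i) = theta_i / eff_len_i.
            w = [theta[i] / eff_len[i] for i in cands]
            s = sum(w)
            for i, wi in zip(cands, w):
                expected[i] += wi / s
        # M-step: the new theta is each transcript's share of expected reads.
        theta = [c / len(compat) for c in expected]
    # Convert read fractions to transcript fractions: tau_i ∝ theta_i / l_i.
    per_len = [t / l for t, l in zip(theta, eff_len)]
    total = sum(per_len)
    tau = [p / total for p in per_len]
    return theta, tau


def gene_level(tau, gene_of):
    """Sum isoform-level fractions into gene-level fractions.

    gene_of[i] names the gene that isoform i belongs to.
    """
    out = defaultdict(float)
    for i, g in enumerate(gene_of):
        out[g] += tau[i]
    return dict(out)
```

For example, consider two equal-length isoforms with 6 reads unique to the first, 2 unique to the second, and 4 shared. The EM fixed point gives θ ≈ (0.75, 0.25), so the shared reads are split 3:1 instead of being discarded or divided evenly. Summing isoform τ values to the gene level means a gene's estimate does not depend on how reads are divided among its own isoforms.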

Figures

Fig. 1.

The graphical model for RNA-Seq data used by our method.
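The figure itself is not reproduced here. As a rough illustration of what a generative model of RNA-Seq reads means, the sketch below samples reads from a deliberately simplified process. It first chooses a transcript according to the read fractions θ. It then chooses a start position uniformly among the ℓ_i − L + 1 positions where a read of length L fits. It omits sequencing errors and other components a realistic model would include, and the function name and parameters are hypothetical. It also assumes that every transcript is at least L bases long.

```python
import random


def simulate_reads(theta, transcripts, read_len, n_reads, seed=0):
    """Sample reads from a simplified generative model (illustrative only).

    theta: probability that a read comes from each transcript (sums to 1).
    transcripts: transcript sequences, each at least read_len long.
    Returns a list of (transcript_index, start, read_sequence) tuples.
    """
    rng = random.Random(seed)
    indices = range(len(transcripts))
    reads = []
    for _ in range(n_reads):
        # Hidden variable 1: which transcript the read is drawn from.
        i = rng.choices(indices, weights=theta, k=1)[0]
        seq = transcripts[i]
        # Hidden variable 2: a uniform start among len(seq) - read_len + 1 slots.
        start = rng.randrange(len(seq) - read_len + 1)
        # Observed data: only the read sequence itself.
        reads.append((i, start, seq[start:start + read_len]))
    return reads
```

Only the read sequence is observed. When transcripts share sequence, the same read string can arise from more than one transcript, so inference has to reason about the hidden transcript index. This is the read mapping uncertainty described in the abstract.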

Fig. 2.

Gene expression estimation accuracy varies with read length given fixed base throughput (T). The curves are (1) mouse liver, T = 375 × 10⁶; (2) mouse liver, T = 750 × 10⁶; (3) mouse liver, T = 1.5 × 10⁷; (4) mouse brain, T = 750 × 10⁶; and (5) maize, T = 750 × 10⁶. The τ MPE was calculated with respect to the true expression values for all genes with a true level of at least 1 TPM.
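The quantities in this caption can be computed along the following lines. The caption does not give formulas, so this sketch makes three assumptions. First, TPM is τ × 10⁶. Second, MPE is the mean over genes of the absolute percent error, 100·|estimate − truth|/truth. Third, at a fixed base throughput T the number of reads is T divided by the read length. The function names are hypothetical.

```python
import math


def tpm_from_tau(tau):
    """Scale transcript fractions tau (summing to 1) to transcripts per million."""
    return [t * 1e6 for t in tau]


def mean_percent_error(est_tpm, true_tpm, min_true_tpm=1.0):
    """Mean absolute percent error over genes whose true level >= min_true_tpm.

    Genes below the threshold are excluded, mirroring the 1 TPM cutoff in
    the caption. Returns NaN if no gene passes the threshold.
    """
    errors = [
        100.0 * abs(e - t) / t
        for e, t in zip(est_tpm, true_tpm)
        if t >= min_true_tpm
    ]
    return sum(errors) / len(errors) if errors else math.nan


def reads_for_throughput(total_bases, read_len):
    """Number of whole reads of length read_len that fit in total_bases."""
    return total_bases // read_len
```

For example, at T = 750 × 10⁶ bases, 25-base reads give 30 × 10⁶ reads, while 50-base reads give half as many. The curves trade off a larger number of reads against longer reads, which are less likely to map to multiple genes.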

Cited by

Exploration and Validation of Immune and Therapeutic-Related Hub Genes in Aortic Valve Calcification and Carotid Atherosclerosis.
Wei K, Cao Y, Kong X, Liu C, Gu X. J Inflamm Res. 2024 Sep 17;17:6485-6500. doi: 10.2147/JIR.S462546. eCollection 2024. PMID: 39310903. Free PMC article.
TYRP1 directed CAR T cells control tumor progression in preclinical melanoma models.
Hackett CS, Hirschhorn D, Tang MS, Purdon TJ, Marouf Y, Piersigilli A, Agaram NP, Liu C, Schad SE, de Stanchina E, Rafiq S, Monette S, Wolchok JD, Merghoub T, Brentjens RJ. Mol Ther Oncol. 2024 Aug 22;32(3):200862. doi: 10.1016/j.omton.2024.200862. eCollection 2024 Sep 19. PMID: 39308793. Free PMC article.
Enhancing transcriptome expression quantification through accurate assignment of long RNA sequencing reads with TranSigner.
Ji HJ, Pertea M. bioRxiv [Preprint]. 2024 Aug 17:2024.04.13.589356. doi: 10.1101/2024.04.13.589356. PMID: 39185147. Free PMC article.
A novel flavobacterial phage abundant during green tide, representing a new viral family, Zblingviridae.
Guo X, Zhang X, Shao H, McMinn A, Liang Y, Wang M. Appl Environ Microbiol. 2024 Jul 24;90(7):e0036724. doi: 10.1128/aem.00367-24. Epub 2024 Jul 2. PMID: 38953371. Free PMC article.
