Discrimination of non-protein-coding transcripts from protein-coding mRNA
RNA Biol. 2006 Jan-Mar;3(1):40-8.
doi: 10.4161/rna.3.1.2789. Epub 2006 Apr 3.
Martin C Frith, Timothy L Bailey, Takeya Kasukawa, Flavio Mignone, Sarah K Kummerfeld, Martin Madera, Sirisha Sunkara, Masaaki Furuno, Carol J Bult, John Quackenbush, Chikatoshi Kai, Jun Kawai, Piero Carninci, Yoshihide Hayashizaki, Graziano Pesole, John S Mattick
- PMID: 17114936
Martin C Frith et al. RNA Biol. 2006 Jan-Mar.
Abstract
Several recent studies indicate that mammals and other organisms produce large numbers of RNA transcripts that do not correspond to known genes. It has been suggested that these transcripts do not encode proteins, but may instead function as RNAs. However, discrimination of coding and non-coding transcripts is not straightforward, and different laboratories have used different methods, whose ability to perform this discrimination is unclear. In this study, we examine ten bioinformatic methods that assess protein-coding potential and compare their ability and congruency in the discrimination of non-coding from coding sequences, based on four underlying principles: open reading frame size, sequence similarity to known proteins or protein domains, statistical models of protein-coding sequence, and synonymous versus non-synonymous substitution rates. Despite these different approaches, the methods show broad concordance, suggesting that coding and non-coding transcripts can, in general, be reliably discriminated, and that many of the recently discovered extra-genic transcripts are indeed non-coding. Comparison of the methods indicates reasons for unreliable predictions, and approaches to increase confidence further. Conversely and surprisingly, our analyses also provide evidence that as much as approximately 10% of entries in the manually curated protein database Swiss-Prot are erroneous translations of actually non-coding transcripts.
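The first of the four principles named in the abstract, open reading frame size, is the simplest to illustrate. The following is a minimal sketch only, not one of the ten methods evaluated in the study: it scans the three forward-strand frames for the longest ATG-to-stop ORF and compares its length with a threshold. The function names and the 100-codon cutoff are assumptions made for illustration; the abstract does not state a cutoff.

```python
# Illustrative sketch of an ORF-size filter (not the paper's implementation).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_length(seq: str) -> int:
    """Length in codons (stop codon excluded) of the longest ATG-to-stop ORF.

    Only the three forward-strand frames are scanned, and an ORF must end
    with an in-frame stop codon to be counted.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    best = 0
    for frame in range(3):
        start = None  # position of the first ATG since the last stop codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
    return best


def looks_coding(seq: str, min_codons: int = 100) -> bool:
    """Hypothetical rule: call a transcript coding if its longest ORF is long.

    The default of 100 codons is an assumed cutoff, not a value from the source.
    """
    return longest_orf_length(seq) >= min_codons
```

A filter of this kind will misclassify genuine short proteins as non-coding and long chance ORFs as coding, which is one reason to combine ORF size with the other three principles rather than rely on it alone.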