RNAcode: Robust discrimination of coding and noncoding regions in comparative sequence data
- Stefan Washietl1,2,8,
- Sven Findeiß3,
- Stephan A. Müller4,
- Stefan Kalkhof4,
- Martin von Bergen4,
- Ivo L. Hofacker2,
- Peter F. Stadler2,3,5,6,7 and
- Nick Goldman1
- 1EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, United Kingdom
- 2Institute for Theoretical Chemistry, University of Vienna, A-1090 Wien, Austria
- 3Bioinformatics Group, Department of Computer Science; and Interdisciplinary Center for Bioinformatics, University of Leipzig, D-04107 Leipzig, Germany
- 4Department of Proteomics, Helmholtz Centre for Environmental Research, 04318 Leipzig, Germany
- 5Max Planck Institute for Mathematics in the Sciences, D-04103 Leipzig, Germany
- 6RNomics Group, Fraunhofer Institute for Cell Therapy and Immunology, 04103 Leipzig, Germany
- 7Santa Fe Institute, Santa Fe, New Mexico 87501, USA
- 8Present address: MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar Street, Cambridge, MA 02139, USA.
Abstract
With the availability of genome-wide transcription data and massive comparative sequencing, two tasks have emerged as core analyses: discriminating coding from noncoding RNAs, and assessing the coding potential of evolutionarily conserved regions. Here we present RNAcode, a program that detects coding regions in multiple sequence alignments. It is optimized for emerging applications not covered by current protein gene-finding software. Our algorithm combines information from nucleotide substitution and gap patterns in a unified framework, and it also handles real-life issues such as alignment and sequencing errors. It uses an explicit statistical model with no machine learning component. It can therefore be applied "out of the box," without any training, to data from all domains of life. We describe the RNAcode method and apply it in combination with mass spectrometry experiments to predict and confirm seven novel short peptides in Escherichia coli and to analyze the coding potential of RNAs previously annotated as "noncoding." RNAcode is open source software and is available for all major platforms at http://wash.github.com/rnacode.
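The abstract names two kinds of evidence, nucleotide substitution patterns and gap patterns, but does not say how RNAcode scores them. The Python sketch below is a hypothetical toy that only illustrates those two signals. It is not RNAcode's scoring scheme. It assumes a pairwise alignment, the standard genetic code, and a reading frame that starts at column 0. It counts codon differences that keep the amino acid (synonymous) versus those that change it (nonsynonymous). It also checks whether every gap run is a multiple of three nucleotides, which means the gap preserves the frame. The names `CODON_TABLE`, `gap_runs`, and `coding_signals` are illustrative and not part of RNAcode.

```python
from itertools import product

# Hypothetical illustration only; this is not RNAcode's actual scoring model.
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def gap_runs(seq):
    """Yield the lengths of all maximal runs of '-' in an aligned sequence."""
    run = 0
    for ch in seq:
        if ch == "-":
            run += 1
        elif run:
            yield run
            run = 0
    if run:
        yield run


def coding_signals(ref, other):
    """Toy summary of two coding-like signals in a pairwise alignment.

    Assumes both rows have equal length and that column 0 starts codon 1
    of the reading frame. Returns (synonymous, nonsynonymous, frame_ok).
    """
    if len(ref) != len(other):
        raise ValueError("aligned rows must have equal length")
    syn = nonsyn = 0
    for i in range(0, len(ref) - len(ref) % 3, 3):
        a, b = ref[i:i + 3].upper(), other[i:i + 3].upper()
        if a == b or a not in CODON_TABLE or b not in CODON_TABLE:
            continue  # identical, gapped, or ambiguous codons are skipped
        if CODON_TABLE[a] == CODON_TABLE[b]:
            syn += 1  # nucleotide change that keeps the amino acid
        else:
            nonsyn += 1  # nucleotide change that alters the amino acid
    # Frame-preserving gaps: every gap run in either row is a multiple of 3.
    frame_ok = all(n % 3 == 0 for s in (ref, other) for n in gap_runs(s))
    return syn, nonsyn, frame_ok
```

Two examples show how the function behaves. `coding_signals("ATGCTGAAA---GGC", "ATGTTGAAG---GGT")` returns `(3, 0, True)`: there are three synonymous codon changes and one three-nucleotide, frame-preserving gap. `coding_signals("ATGCTGAAAGGC", "ATGCCG-AAGGC")` returns `(0, 1, False)` because the single-nucleotide gap would shift the frame. The toy reports raw counts only. It makes no attempt at the explicit statistical model the abstract describes, and it ignores how a frameshifting gap changes the downstream codons.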
Footnotes
Reprint requests to: Stefan Washietl, MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar Street, Cambridge, MA 02139, USA; e-mail: wash@mit.edu; fax: (617) 253-6652.
Article published online ahead of print. Article and publication date are at http://www.rnajournal.org/cgi/doi/10.1261/rna.2536111.
Received November 10, 2010.
Accepted January 12, 2011.
Copyright © 2011 RNA Society