PGAA, a Plant Genome Annotation Pipeline for Rice Gene and Alternatively Spliced Variant Identification with Cross-species EST Conservation from …

The completion of the rice (Oryza sativa L.) genome draft has brought unprecedented opportunities for genomic studies of the world's most important food crop. Previous rice gene annotations have relied mainly on ab initio methods, which usually yield a high rate of false-positive predictions and give only limited information regarding alternative splicing in rice genes. Comparative approaches based on ESTs can compensate for the drawbacks of ab initio methods because they can simultaneously identify genes supported by experimental data and alternatively spliced transcripts. Furthermore, cross-species EST information can be used not only to offset the insufficiency of same-species ESTs but also to derive evolutionary implications. In this study, we used ESTs from 7 plant species (rice, wheat, maize, barley, sorghum, soybean, and Arabidopsis thaliana) to annotate the rice genome. We developed a plant genome annotation pipeline, the Plant Gene and Alternatively spliced variant Annotator (PGAA). Using this approach, we identified 852 genes (931 isoforms) not annotated in other widely used databases (i.e., TIGR, NCBI, and RAP) and found 87% of them supported by both rice and non-rice EST evidence. PGAA also identified more than 44,000 alternative splicing events, of which ~20% are not observed in the other 3 annotations. These novel annotations represent rich opportunities for rice genome research because the functions of most of our annotated genes are currently unknown. In addition, in the PGAA annotation, the isoforms with non-rice-EST-supported exons are significantly enriched in transporter activity but significantly underrepresented in transcription regulator activity. We have also identified potential lineage-specific and conserved isoforms, which are important markers in evolutionary studies. The data and the web-based interface, RiceViewer, are available for public access at http://RiceViewer.genomics.sinica.edu.tw/.
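As an illustration of how annotated isoforms might be grouped by the origin of their EST support (rice only, non-rice only, or both), the following minimal Python sketch tallies support classes. It is not the PGAA implementation: the function names `support_class` and `summarize`, the input layout, and the toy records are all assumptions made for this example, and the toy data are not results from the study.

```python
from collections import Counter

RICE = "rice"  # label assumed for same-species ESTs


def support_class(est_species):
    """Classify one isoform by the species of the ESTs that support it.

    est_species: iterable of species names, one per supporting EST
    (hypothetical input format, not PGAA's actual data model).
    """
    species = set(est_species)
    has_rice = RICE in species
    has_other = bool(species - {RICE})
    if has_rice and has_other:
        return "both"
    if has_rice:
        return "rice_only"
    if has_other:
        return "non_rice_only"
    return "unsupported"


def summarize(isoform_support):
    """Count isoforms per support class from an {isoform_id: species list} map."""
    return Counter(support_class(s) for s in isoform_support.values())


# Toy input, made up for illustration only
toy = {
    "isoform_a": ["rice", "maize"],
    "isoform_b": ["rice"],
    "isoform_c": ["wheat", "barley"],
    "isoform_d": [],
}
counts = summarize(toy)
fraction_both = counts["both"] / len(toy)
```

Under these assumptions, a figure such as the share of novel genes supported by both rice and non-rice ESTs corresponds to `fraction_both` computed over the relevant gene set.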