Plant Gene and Alternatively Spliced Variant Annotator. A Plant Genome Annotation Pipeline for Rice Gene and Alternatively Spliced Variant Identification with Cross-Species Expressed Sequence Tag Conservation from Seven Plant Species

Plant gene and alternatively spliced variant annotator. A plant genome annotation pipeline for rice gene and alternatively spliced variant identification with cross- …

Plant …, 2007

The completion of the rice (Oryza sativa) genome draft has brought unprecedented opportunities for genomic studies of the world's most important food crop. Previous rice gene annotations have relied mainly on ab initio methods, which usually yield a high rate of false-positive predictions and give only limited information regarding alternative splicing in rice genes. Comparative approaches based on expressed sequence tags (ESTs) can compensate for the drawbacks of ab initio methods because they can simultaneously identify experimental data-supported genes and alternatively spliced transcripts. Furthermore, cross-species EST information can be used to not only offset the insufficiency of same-species ESTs but also derive evolutionary implications. In this study, we used ESTs from seven plant species, rice, wheat (Triticum aestivum), maize (Zea mays), barley (Hordeum vulgare), sorghum (Sorghum bicolor), soybean (Glycine max), and Arabidopsis (Arabidopsis thaliana), to annotate the rice genome. We developed a plant genome annotation pipeline, Plant Gene and Alternatively Spliced Variant Annotator (PGAA). Using this approach, we identified 852 genes (931 isoforms) not annotated in other widely used databases (i.e. the Institute for Genomic Research, National Center for Biotechnology Information, and Rice Annotation Project) and found 87% of them supported by both rice and nonrice EST evidence. PGAA also identified more than 44,000 alternatively spliced events, of which approximately 20% are not observed in the other three annotations. These novel annotations represent rich opportunities for rice genome research, because the functions of most of our annotated genes are currently unknown. Also, in the PGAA annotation, the isoforms with non-rice-EST-supported exons are significantly enriched in transporter activity but significantly underrepresented in transcription regulator activity. 
We have also identified potential lineage-specific and conserved isoforms, which are important markers in evolutionary studies. The data and the Web-based interface, RiceViewer, are available for public access at http://RiceViewer.genomics.sinica.edu.tw/.

PGAA, a Plant Genome Annotation Pipeline for Rice Gene and Alternatively Spliced Variant Identification with Cross-species EST Conservation from …

Plant …, 2007

The completion of the rice (Oryza sativa L.) genome draft has brought unprecedented opportunities for genomic studies of the world's most important food crop. Previous rice gene annotations have relied mainly on ab initio methods, which usually yield a high rate of false-positive predictions and give only limited information regarding alternative splicing in rice genes. Comparative approaches based on ESTs can compensate for the drawbacks of ab initio methods because they can simultaneously identify experimental data-supported genes and alternatively spliced transcripts. Furthermore, cross-species EST information can be used to not only offset the insufficiency of same-species ESTs but also derive evolutionary implications. In this study, we used ESTs from 7 plant species (rice, wheat, maize, barley, sorghum, soybean, and Arabidopsis thaliana) to annotate the rice genome. We developed a plant genome annotation pipeline, Plant Gene and Alternatively Spliced Variant Annotator (PGAA). Using this approach, we identified 852 genes (931 isoforms) not annotated in other widely used databases (i.e., TIGR, NCBI, and RAP) and found 87% of them supported by both rice and non-rice EST evidence. PGAA also identified more than 44,000 alternatively spliced events, of which ~20% are not observed in the other 3 annotations. These novel annotations represent rich opportunities for rice genome research because the functions of most of our annotated genes are currently unknown. Also, in the PGAA annotation, the isoforms with non-rice-EST-supported exons are significantly enriched in transporter activity but significantly underrepresented in transcription regulator activity. We have also identified potential lineage-specific and conserved isoforms, which are important markers in evolutionary studies. The data and the web-based interface, RiceViewer, are available for public access at http://RiceViewer.genomics.sinica.edu.tw/.

The TIGR Rice Genome Annotation Resource: improvements and new features

Nucleic Acids Research, 2007

In The Institute for Genomic Research Rice Genome Annotation project (http://rice.tigr.org), we have continued to update the rice genome sequence with new data and improve the quality of the annotation. In our current release of annotation (Release 4.0; January 12, 2006), we have identified 42 653 nontransposable element-related genes encoding 49 472 gene models as a result of the detection of alternative splicing. We have refined our identification methods for transposable element-related genes resulting in 13 237 genes that are related to transposable elements. Through incorporation of multiple transcript and proteomic expression data sets, we have been able to annotate 24 799 genes (31 739 gene models), representing 50% of the total gene models, as expressed in the rice genome. All structural and functional annotation is viewable through our Rice Genome Browser which currently supports 59 tracks. Enhanced data access is available through web interfaces, FTP downloads and a Data Extractor tool developed in order to support discrete dataset downloads.

Comprehensive Analysis of Alternative Splicing In Rice and Comparative Analyses With Arabidopsis

BMC …, 2006

Background: Recently, genomic sequencing efforts were finished for Oryza sativa (cultivated rice) and Arabidopsis thaliana (Arabidopsis). Additionally, these two plant species have extensive cDNA and expressed sequence tag (EST) libraries. We employed the Program to Assemble Spliced Alignments (PASA) to identify and analyze alternatively spliced isoforms in both species. Results: A comprehensive analysis of alternative splicing was performed in rice that started with >1.1 million publicly available spliced ESTs and over 30,000 full length cDNAs in conjunction with the newly enhanced PASA software. A parallel analysis was performed with Arabidopsis to compare and ascertain potential differences between monocots and dicots. Alternative splicing is a widespread phenomenon (observed in greater than 30% of the loci with transcript support) and we have described nine alternative splicing variations. While alternative splicing has the potential to create many RNA isoforms from a single locus, the majority of loci generate only two or three isoforms and transcript support indicates that these isoforms are generally not rare events. For the alternate donor (AD) and acceptor (AA) classes, the distance between the splice sites for the majority of events was found to be less than 50 basepairs (bp). In both species, the most frequent distance between AA is 3 bp, consistent with reports in mammalian systems. Conversely, the most frequent distance between AD is 4 bp in both plant species, as previously observed in mouse. Most alternative splicing variations are localized to the protein coding sequence and are predicted to significantly alter the coding sequence. Conclusion: Alternative splicing is widespread in both rice and Arabidopsis and these species share many common features. Interestingly, alternative splicing may play a role beyond creating novel combinations of transcripts that expand the proteome. 
Many isoforms will presumably have negative consequences for protein structure and function, suggesting that their biological role involves post-transcriptional regulation of gene expression.
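The link between the reported splice-site distances and their effect on the coding sequence can be illustrated with a small sketch. It assumes the event falls inside the protein-coding region and that no second event restores the frame; the function name `classify_shift` is hypothetical and is not part of PASA.

```python
def classify_shift(distance_bp: int) -> str:
    """Label an alternate donor/acceptor event by its effect on the reading frame.

    Assumption: the two competing splice sites lie within the coding
    sequence, so the distance between them is the number of coding
    bases gained or lost.
    """
    if distance_bp <= 0:
        raise ValueError("distance between splice sites must be positive")
    # A multiple of 3 adds or removes whole codons; anything else shifts the frame.
    return "in-frame" if distance_bp % 3 == 0 else "frameshift"


# Most frequent distances reported in the abstract above.
for label, distance in [("alternate acceptor (AA)", 3), ("alternate donor (AD)", 4)]:
    print(f"{label}: {distance} bp -> {classify_shift(distance)}")
```

Under these assumptions, a 3 bp acceptor shift changes a single codon, whereas a 4 bp donor shift disrupts the downstream reading frame, which is consistent with the suggestion that many isoforms have negative consequences for the protein.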

Comparative genomics of grass EST libraries reveals previously uncharacterized splicing events in crop plants

BMC plant biology, 2015

Background: Crop plants such as rice, maize and sorghum play economically important roles as main sources of food, fuel, and animal feed. However, current genome annotations of crop plants still suffer from false-positive predictions; a more comprehensive registry of alternative splicing (AS) events is also in demand. Comparative genomics of crop plants is largely unexplored. Results: We performed a large-scale comparative analysis (ExonFinder) of the expressed sequence tag (EST) library from nine grass plants against three crop genomes (rice, maize, and sorghum) and identified 2,879 previously unannotated exons (i.e., novel exons) in the three crops. We validated 81% of the tested exons by RT-PCR-sequencing, supporting the effectiveness of our in silico strategy. Evolutionary analysis reveals that the novel exons, compared with their flanking annotated ones, are generally under weaker selection pressure at the protein level, but under stronger pressure at the RNA level, suggesting that most...

Comparative Cross-Species Alternative Splicing in Plants

Plant Physiology, 2007

Alternative splicing (AS) can add significantly to genome complexity. Plants are thought to exhibit less AS than animals. An algorithm, based on expressed sequence tag (EST) pairs gapped alignment, was developed that takes advantage of the relatively small intron and exon size in plants and directly compares pairs of ESTs to search for AS. EST pairs gapped alignment was first evaluated in Arabidopsis (Arabidopsis thaliana), rice (Oryza sativa), and tomato (Solanum lycopersicum) for which annotated genome sequence is available and was shown to accurately predict splicing events. The method was then applied to 11 plant species that include 17 cultivars for which enough ESTs are available. The results show a large, 3.7-fold difference in AS rates between plant species with Arabidopsis and rice in the lower range and lettuce (Lactuca sativa) and sorghum (Sorghum bicolor) in the upper range. Hence, compared to higher animals, plants show a much greater degree of variety in their AS rates...
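The idea of comparing two ESTs directly and looking for a large internal gap can be sketched in a few lines. This is a deliberately simplified toy, not the published algorithm: it assumes both ESTs start and end at the same transcript positions, contain no sequencing errors, and differ by at most one contiguous insertion. The function name `find_candidate_gap` and the thresholds `min_flank` and `min_gap` are hypothetical.

```python
from typing import Optional, Tuple


def find_candidate_gap(
    est_a: str, est_b: str, min_flank: int = 20, min_gap: int = 30
) -> Optional[Tuple[int, int]]:
    """Return (start, length) of a single internal gap between two ESTs, or None.

    The shorter EST must equal the longer one with one contiguous block
    removed, flanked on both sides by at least `min_flank` identical bases.
    """
    longer, shorter = (est_a, est_b) if len(est_a) >= len(est_b) else (est_b, est_a)
    gap_len = len(longer) - len(shorter)
    if gap_len < min_gap:
        return None

    # Extend the shared prefix as far as the two sequences agree.
    prefix = 0
    while prefix < len(shorter) and longer[prefix] == shorter[prefix]:
        prefix += 1

    # Extend the shared suffix, never overlapping the prefix in the shorter EST.
    suffix = 0
    while suffix < len(shorter) - prefix and longer[-1 - suffix] == shorter[-1 - suffix]:
        suffix += 1

    # The shorter EST is fully explained only if prefix and suffix cover it entirely.
    if prefix + suffix == len(shorter) and prefix >= min_flank and suffix >= min_flank:
        return prefix, gap_len
    return None


# Toy example: the second EST lacks a 40-base block present in the first.
left = "ACGTTGCAAGCTTAGGCTAACGTA"
block = "GT" + "A" * 36 + "AG"
right = "CCATGGTTCAGGATCCTTAGCAGT"
print(find_candidate_gap(left + block + right, left + right))
```

A real implementation would need a proper gapped (local) alignment that tolerates mismatches and differing EST ends; the prefix/suffix comparison here only shows why short plant introns and exons make pairwise EST comparison tractable.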

The TIGR rice genome annotation resource: annotating the rice genome and creating resources for plant biologists

Nucleic Acids Research, 2003

Rice is not only a major food staple for the world's population but also a model species for a major group of flowering plants, the monocotyledonous plants. Draft genomic sequences of two subspecies of rice, Oryza sativa ssp. japonica and ssp. indica, are publicly available. To provide the community with a resource to data-mine the rice genome, we have constructed an annotation resource for rice (http://www.tigr.org/tdb/e2k1/osa1/). In this resource, we have annotated the rice genome for gene content, identified motifs/domains within the predicted genes, constructed a rice repeat database, identified related sequences in other plant species, and identified syntenic sequences between rice and maize.

The Rice Genome Knowledgebase (RGKbase): an annotation database for rice comparative genomics and evolutionary biology

Nucleic Acids Research, 2012

Over the past 10 years, genomes of cultivated rice cultivars and their wild counterparts have been sequenced, although most efforts are focused on genome assembly and annotation of two major cultivated rice (Oryza sativa L.) subspecies, 93-11 (indica) and Nipponbare (japonica). To integrate information from genome assemblies and annotations for better analysis and application, we now introduce a comparative rice genome database, the Rice Genome Knowledgebase (RGKbase, http://rgkbase.big.ac.cn/RGKbase/). RGKbase is built to have three major components: (i) integrated data curation for rice genomics and molecular biology, which includes genome sequence assemblies, transcriptomic and epigenomic data, genetic variations, quantitative trait loci (QTLs) and the relevant literature; (ii) user-friendly viewers, such as Gbrowse, GeneBrowse and Circos, for genome annotations and evolutionary dynamics; and (iii) bioinformatic tools for compositional and synteny analyses, gene family classifications, gene ontology terms and pathways and gene co-expression networks. RGKbase currently includes data from five rice cultivars and species: Nipponbare (japonica), 93-11 (indica), PA64s (indica), the African rice (Oryza glaberrima) and a wild rice species (Oryza brachyantha). We are also constantly introducing new datasets from a variety of public efforts, such as two recent releases: sequence data from ~1000 rice varieties, which are mapped onto the reference genome, yielding ample high-quality single-nucleotide polymorphisms and insertions-deletions.

Analysis of Alternatively Spliced Rice Transcripts Using Microarray Data

Rice, 2008

Alternative splicing creates a diversity of gene products in higher eukaryotes. Twenty-five percent (1,583/6,371) of predicted alternatively spliced transcripts can be detected using the NSF45K rice whole-genome oligonucleotide array. We used the NSF45K array to assess differential expression patterns of 507 loci showing at least a twofold change in expression between light- and dark-grown seedlings. At least 42% of these loci show evidence of alternative splicing in aerial seedling tissue of Oryza sativa ssp. japonica cv. Nipponbare. Most alternative splice forms display the same pattern of regulation as the primary, or most highly expressed, transcript; however, splice forms for ten loci, represented by 35 oligos, display opposite expression patterns in the light vs. dark. We found similar evidence of alternative splicing events in Affymetrix microarray data for Nipponbare rice treated with the causative agent of fungal rice blast, Magnaporthe grisea. This strategy for analyzing alternative splicing in microarray data will enable delineation of the diversity of splicing in rice.

Genome-wide structural and functional variant discovery of rice landraces using genotyping by sequencing

Molecular Biology Reports, 2020

Rice landraces are vital genetic resources for agronomic and quality traits but the undeniable collection of Kerala landraces remains poorly delineated. To effectively conserve, manage, and use these resources, understanding the genomic structure of germplasm is essential. Genotyping by sequencing (GBS) enables identification of a large number of single-nucleotide polymorphisms (SNPs) and insertion-deletions (InDels) from 96 rice germplasm. In the present study, a total of 16.9 × 10^7 reads were generated, of which 16.3 × 10^7 reads were mapped to the indica reference genome. The GBS data revealed wide genomic variation, including 8,259,639 SNPs and 107,140 InDels. Both neighbor-joining tree and principal coordinate analysis with InDel markers revealed the selected germplasm in this study to be highly diverse in structure. We assembled unmapped reads, which were further employed for gene ontology analysis. These unmapped sequences, which are generally excluded from subsequent GBS data analysis, may be an unexplored resource for novel and significant biological findings. The discovery of SNPs from the haplotyping results of the GS3 and GIF1 genes provided insight into marker-assisted selection based on grain size and yield and can be utilized for rice yield improvement. To our knowledge, this is the first report on structural variation analysis using the GBS platform in rice landraces collected from Kerala. Genomic information from this study provides valuable resources for understanding rice landrace structure and can also facilitate sequencing-based molecular breeding.