The TIGR Gene Indices: analysis of gene transcript sequences in highly sampled eukaryotic species

J Quackenbush et al. Nucleic Acids Res. 2001.

Abstract

While genome sequencing projects are advancing rapidly, EST sequencing and analysis remains a primary research tool for the identification and categorization of gene sequences in a wide variety of species and an important resource for annotation of genomic sequence. The TIGR Gene Indices (http://www.tigr.org/tdb/tgi.shtml) are a collection of species-specific databases that use a highly refined protocol to analyze EST sequences in an attempt to identify the genes represented by that data and to provide additional information regarding those genes. Gene Indices are constructed by first clustering, then assembling EST and annotated gene sequences from GenBank for the targeted species. This process produces a set of unique, high-fidelity virtual transcripts, or Tentative Consensus (TC) sequences. The TC sequences can be used to provide putative genes with functional annotation, to link the transcripts to mapping and genomic sequence data, to provide links between orthologous and paralogous genes and as a resource for comparative sequence analysis.
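As a rough illustration of the "first clustering" step described above, the following Python sketch groups sequences that share enough exact k-mers, using a union-find structure for single-linkage clustering. This is not the TIGR protocol: the function name `cluster_sequences`, the k-mer criterion, and the parameters `k` and `min_shared` are assumptions chosen for illustration, and the later step of assembling each cluster into a consensus sequence is not shown.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_sequences(seqs, k=8, min_shared=3):
    """Toy single-linkage clustering (hypothetical, not the TIGR method).

    seqs: dict mapping sequence id -> nucleotide string.
    Two sequences are linked when they share at least `min_shared`
    distinct k-mers; clusters are the connected components.
    Returns a sorted list of sorted id lists.
    """
    parent = {sid: sid for sid in seqs}

    def find(x):
        # Follow parent pointers to the root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Index every k-mer by the sequences that contain it.
    index = defaultdict(set)
    for sid, seq in seqs.items():
        for km in kmers(seq.upper(), k):
            index[km].add(sid)

    # Count distinct shared k-mers for each pair of sequences.
    shared = defaultdict(int)
    for ids in index.values():
        ordered = sorted(ids)
        for i in range(len(ordered)):
            for j in range(i + 1, len(ordered)):
                shared[(ordered[i], ordered[j])] += 1

    # Link pairs that pass the (assumed) overlap threshold.
    for (a, b), count in shared.items():
        if count >= min_shared:
            union(a, b)

    clusters = defaultdict(list)
    for sid in seqs:
        clusters[find(sid)].append(sid)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    example = {
        "est1": "ACGTACGGTTCAGGCTAAGT",
        "est2": "TTCAGGCTAAGTCCGATAGC",  # overlaps the 3' end of est1
        "est3": "GGGGGGGGCCCCCCCCAAAA",  # unrelated
    }
    print(cluster_sequences(example))
```

A real pipeline would use alignment-based overlap criteria (identity, overlap length, overhang) rather than exact k-mer counts, and would pass each cluster to an assembler to produce the consensus.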

Figures

Figure 1

An example THC from the Human Gene Index. The consensus sequence is presented in FASTA format; below it, the gene sequences (red) and ESTs that comprise the assembly are shown at their respective locations within the assembly. Links are provided to GenBank records, internal data for all ESTs sequenced at TIGR and to clones available through the ATCC. This THC has been assigned a putative ID of ‘insulin receptor inhibitor, muscle’ as it contains a HT853 (as well as gene sequences from GenBank).

Figure 2

An example TOG from the TOGA database. The human, mouse and rat TCs all contain annotated genes; those in mouse and rat have been identified as ‘bithoraxoid-like protein’ while the human gene is simply annotated as ‘HSPC162’ and the cattle TC consists only of ESTs. The stringent overlap criteria used to construct the TOGs make it unlikely that these matches are spurious and provide putative functional annotation for the previously unclassified human and bovine gene and EST sequences.

Figure 3

Alignment of TCs from the TIGR Plant Gene Indices with the sequence of Arabidopsis thaliana chromosome II. The coding sequence of a putative casein kinase II catalytic subunit shows significant homology to the same gene in other plants, as is evident from an alignment between the Arabidopsis genomic sequence and the various plant TCs. This gene is well conserved across both monocots and dicots. The multiple hits seen in some species may represent paralogs, gene families, alternative splice forms or partial TC assemblies.
