TIGRFAMs: a protein family resource for the functional identification of proteins

D H Haft et al. Nucleic Acids Res. 2001.

Abstract

TIGRFAMs is a collection of protein families featuring curated multiple sequence alignments, hidden Markov models and associated information designed to support the automated functional identification of proteins by sequence homology. We introduce the term 'equivalog' to describe members of a set of homologous proteins that are conserved with respect to function since their last common ancestor. Related proteins are grouped into equivalog families where possible, and otherwise into protein families with other hierarchically defined homology types. TIGRFAMs currently contains over 800 protein families, available for searching or downloading at www.tigr.org/TIGRFAMs. Classification by equivalog family, where achievable, complements classification by orthology, superfamily, domain or motif. It provides the information best suited for automatic assignment of specific functions to proteins from large-scale genome sequencing projects.

Figures

Figure 1

Homology relationships can be classified by evolutionary history, as shown in this model phylogenetic tree. The ancestral node, or root, is at the top. Duplication creates paralogs A and B with distinct functions. Speciation creates the orthologous set A1, A2 and A3 from A, and the orthologous set B1, B2 and B3 from B. If B1, B2 and B3 share the same function, they are equivalogs as well as orthologs. Dashed lines indicate a possible pattern of gene loss that leaves only A1, B2 and B3. The resulting protein subfamily should exhibit bi-directional best hits across species but is not orthologous and does not show conserved function.
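The gene-loss scenario in the caption can be illustrated with a small sketch. It is not the TIGRFAMs method. It is a toy model of bi-directional best hits in which the species names, the `similarity` function and its scores (90 within a paralog family, 40 between families) are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical gene content per species, before and after the gene loss
# indicated by the dashed lines in the figure.
BEFORE_LOSS = {
    "species1": ["A1", "B1"],
    "species2": ["A2", "B2"],
    "species3": ["A3", "B3"],
}
AFTER_LOSS = {
    "species1": ["A1"],
    "species2": ["B2"],
    "species3": ["B3"],
}


def family(gene):
    """Return the paralog family label ("A" or "B") of a gene name."""
    return gene[0]


def similarity(a, b):
    """Assumed toy score: genes in the same family are more similar."""
    return 90 if family(a) == family(b) else 40


def best_hit(query, candidates):
    """Return the candidate gene with the highest similarity to the query."""
    return max(candidates, key=lambda gene: similarity(query, gene))


def bidirectional_best_hits(genomes, g1, g2):
    """Return gene pairs that are each other's best hit between two genomes."""
    pairs = []
    for query in genomes[g1]:
        hit = best_hit(query, genomes[g2])
        if best_hit(hit, genomes[g1]) == query:
            pairs.append((query, hit))
    return pairs


def report(genomes):
    """Print every bi-directional best hit and its true relationship."""
    for g1, g2 in combinations(genomes, 2):
        for a, b in bidirectional_best_hits(genomes, g1, g2):
            relation = "orthologs" if family(a) == family(b) else "paralogs"
            print(f"{a} <-> {b}: {relation}")


if __name__ == "__main__":
    print("Before gene loss:")
    report(BEFORE_LOSS)
    print("After gene loss:")
    report(AFTER_LOSS)
```

Under these assumed scores, every bi-directional best hit before the loss pairs genes from the same family. After the loss, A1 still pairs with B2 and with B3 because no closer candidate remains, even though those pairs are paralogs.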
