The COG database: new developments in phylogenetic classification of proteins from complete genomes

R L Tatusov et al. Nucleic Acids Res. 2001.

Abstract

The database of Clusters of Orthologous Groups of proteins (COGs), which represents an attempt at a phylogenetic classification of the proteins encoded in complete genomes, currently consists of 2791 COGs including 45 350 proteins from 30 genomes of bacteria, archaea and the yeast Saccharomyces cerevisiae (http://www.ncbi.nlm.nih.gov/COG). In addition, a supplement to the COGs is available that includes those proteins encoded in the genomes of two multicellular eukaryotes, the nematode Caenorhabditis elegans and the fruit fly Drosophila melanogaster, that are shared with bacteria and/or archaea. The new features added to the COG database include information pages with structural and functional details on each COG and literature references, improvements to the COGNITOR program that is used to fit new proteins into the COGs, and a classification of genomes and COGs constructed using principal component analysis.

Figures

Figure 1

Growth dynamics of the COG set as the number of included genomes increases. The circles show the sequence of genome inclusion according to the actual order of sequencing, and the smooth line shows the mean of 10^6 random permutations of the genome order. The colored area indicates the range between the maximal and minimal values for each point (number of genomes) in the 10^6 random permutations.
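The permutation analysis described in this caption can be illustrated with a small sketch. It is not the authors' code: the function names, the toy `presence` data and the rule that a cluster counts once it is represented in at least three of the included genomes are assumptions made for illustration, and far fewer permutations are used than in the figure.

```python
import random
from statistics import mean

def growth_curve(order, presence, min_genomes=3):
    """Number of qualifying clusters after each genome in `order` is added.

    Assumption: a cluster qualifies once it is found in `min_genomes` genomes.
    """
    counts = {}
    total = 0
    curve = []
    for genome in order:
        for cog in presence[genome]:
            counts[cog] = counts.get(cog, 0) + 1
            if counts[cog] == min_genomes:
                total += 1  # cluster has just reached the threshold
        curve.append(total)
    return curve

def permutation_envelope(presence, n_perm=200, seed=0, min_genomes=3):
    """Mean, minimum and maximum growth curves over random genome orders."""
    rng = random.Random(seed)
    genomes = sorted(presence)
    curves = []
    for _ in range(n_perm):
        order = genomes[:]
        rng.shuffle(order)
        curves.append(growth_curve(order, presence, min_genomes))
    columns = list(zip(*curves))  # one column per number of included genomes
    return ([mean(c) for c in columns],
            [min(c) for c in columns],
            [max(c) for c in columns])

# Hypothetical toy data: genome name -> set of cluster identifiers.
presence = {"A": {1, 2, 3}, "B": {1, 2}, "C": {1, 3}, "D": {2, 3, 4}}
actual = growth_curve(["A", "B", "C", "D"], presence)
mean_curve, low, high = permutation_envelope(presence)
```

Here `actual` plays the role of the circles, `mean_curve` the smooth line, and `low` and `high` the bounds of the colored area.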

Figure 2

An example of a COG-Info page.

Figure 3

Classification of genomes by co-occurrence in COGs using PCA. (A) All COGs. (B) Translation, transcription and replication (functional categories J, K and L). (C) Metabolism (functional categories C, E, F, G, H and I).
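As a rough illustration of how genomes might be placed along a principal component from their presence or absence in COGs, the sketch below computes first-component scores by power iteration. The abstract does not give the exact matrix or preprocessing used by the authors, so the binary genome-by-COG matrix, the column centering and the function name `first_pc_scores` are all assumptions.

```python
import math

def first_pc_scores(matrix, n_iter=200):
    """Scores of each row (genome) on the first principal component.

    `matrix` is a list of rows of 0/1 values (genome x COG); columns are
    mean-centered and the top eigenvector of the row Gram matrix is found
    by power iteration. The overall sign of the scores is arbitrary.
    """
    n, m = len(matrix), len(matrix[0])
    col_means = [sum(row[j] for row in matrix) / n for j in range(m)]
    centered = [[row[j] - col_means[j] for j in range(m)] for row in matrix]
    # Gram matrix: dot products between centered genome profiles.
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in centered]
            for r1 in centered]
    v = [float(i + 1) for i in range(n)]  # deterministic starting vector
    for _ in range(n_iter):
        w = [sum(gram[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            return [0.0] * n  # no variance to explain
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(gram[i][k] * v[k] for k in range(n))
                     for i in range(n))
    scale = math.sqrt(max(eigenvalue, 0.0))
    return [x * scale for x in v]

# Hypothetical toy matrix: two pairs of genomes with complementary COG sets,
# plus one COG (last column) present in every genome.
toy = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
scores = first_pc_scores(toy)
```

Genomes with identical profiles receive identical scores, and the two groups fall on opposite sides of zero. Restricting the columns to COGs from selected functional categories would correspond to panels (B) and (C).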
