eggNOG v2.0: extending the evolutionary genealogy of genes with enhanced non-supervised orthologous groups, species and functional annotations

Nucleic Acids Res. 2010 Jan;38(Database issue):D190-5.

doi: 10.1093/nar/gkp951. Epub 2009 Nov 9.

J Muller et al. Nucleic Acids Res. 2010 Jan.

Abstract

The identification of orthologous relationships forms the basis for most comparative genomics studies. Here, we present the second version of the eggNOG database, which contains orthologous groups (OGs) constructed through identification of reciprocal best BLAST matches and triangular linkage clustering. We applied this procedure to 630 complete genomes (529 bacteria, 46 archaea and 55 eukaryotes), which is a 2-fold increase relative to the previous version. The pipeline yielded 224,847 OGs, including 9,724 extended versions of the original COG and KOG. We computed OGs for different levels of the tree of life; in addition to the species groups included in our first release (i.e. fungi, metazoa, insects, vertebrates and mammals), we have now constructed OGs for archaea, fishes, rodents and primates. We automatically annotate the non-supervised orthologous groups (NOGs) with functional descriptions, protein domains, and functional categories as defined initially for the COG/KOG database. In-depth analysis is facilitated by precomputed high-quality multiple sequence alignments and maximum-likelihood trees for each of the available OGs. Altogether, eggNOG covers 2,242,035 proteins (built from 2,590,259 proteins) and provides a broad functional description for at least 1,966,709 (88%) of them. Users can access the complete set of orthologous groups via a web interface at: http://eggnog.embl.de.
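
The two steps named in the abstract, reciprocal best matches followed by triangular linkage clustering, can be illustrated with a toy sketch. This is not the eggNOG pipeline: the abstract does not give its implementation details, so the function names (`reciprocal_best_hits`, `triangle_clusters`), the input layout (a dictionary of best hits per query gene and target genome) and the merge rule (join groups that share two genes) are all assumptions made for illustration.

```python
from itertools import combinations


def reciprocal_best_hits(best_hit, genome_of):
    """Return gene pairs that are each other's best hit across two genomes.

    best_hit maps (query_gene, target_genome) -> best-scoring gene in that
    genome (hypothetical input, e.g. distilled from BLAST results).
    """
    pairs = set()
    for (query, _target_genome), hit in best_hit.items():
        # Keep the pair only if the hit points back at the query.
        if hit != query and best_hit.get((hit, genome_of[query])) == query:
            pairs.add(frozenset((query, hit)))
    return pairs


def triangle_clusters(pairs):
    """Seed groups from triangles of reciprocal best hits, then merge
    groups that share an edge (at least two genes). Simplified rule."""
    neighbours = {}
    for pair in pairs:
        a, b = tuple(pair)
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    # A triangle is three genes that are pairwise reciprocal best hits.
    triangles = set()
    for pair in pairs:
        a, b = tuple(pair)
        for c in neighbours[a] & neighbours[b]:
            triangles.add(frozenset((a, b, c)))

    groups = [set(t) for t in triangles]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(groups)), 2):
            if len(groups[i] & groups[j]) >= 2:
                groups[i] |= groups[j]
                del groups[j]
                merged = True
                break  # indices changed, restart the scan
    return groups


# Toy data: four genomes A-D; gene names are made up.
genome_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C", "d1": "D"}
best_hit = {
    ("a1", "B"): "b1", ("b1", "A"): "a1",
    ("a1", "C"): "c1", ("c1", "A"): "a1",
    ("b1", "C"): "c1", ("c1", "B"): "b1",
    ("b1", "D"): "d1", ("d1", "B"): "b1",
    ("c1", "D"): "d1", ("d1", "C"): "c1",
    ("a1", "D"): "d1", ("d1", "A"): "a2",  # not reciprocal
    ("a2", "B"): "b2", ("b2", "A"): "a2",  # reciprocal, but in no triangle
}

pairs = reciprocal_best_hits(best_hit, genome_of)
groups = triangle_clusters(pairs)
print(groups)
```

In this toy input the triangles a1-b1-c1 and b1-c1-d1 share the edge b1-c1 and are merged into one group, while the isolated pair a2-b2 forms no triangle and stays ungrouped. A real pipeline would also have to handle details such as recent paralogs and multi-domain proteins, which this sketch ignores.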

Figures

Figure 1.

Statistics on the content of the eggNOG database. The eggNOG assignments for 630 complete genomes were mapped onto the tree of life. The stacked bar charts outside the tree show the proportion of genes from each genome that can be assigned to a functionally annotated orthologous group (green), an unannotated orthologous group (orange) or no orthologous group (gray). The length of each bar is proportional to the logarithm of the number of genes in the respective genome. The pie charts inside the tree show the fractions of orthologous groups at each level in the hierarchy that could be annotated with a functional category (green for NOGs, light green for extended COGs and KOGs) or not (orange for NOGs, light orange for extended COGs and KOGs). An interactive version is available in the ‘Overview’ section at http://eggnog.embl.de. This figure was made using iTOL.

Figure 2.

Screenshot of the detailed results page. The eggNOG database was queried for the term ‘mTERF’, the mitochondrial precursor of the transcription termination factor 1. The navigation tree at the top of the page allows the user to change the view to more coarse-grained orthologous groups, for example, the mammalian orthologous groups. The tab menu, shown here, enables several in-depth interactions with the new data (i.e. MSA or phylogenetic trees, here displayed with SMART domains).
