eggNOG v4.0: nested orthology inference across 3686 organisms

Nucleic Acids Res. 2014 Jan;42(Database issue):D231-9.

doi: 10.1093/nar/gkt1253. Epub 2013 Dec 1.

Sean Powell, Kristoffer Forslund, Damian Szklarczyk, Kalliopi Trachana, Alexander Roth, Jaime Huerta-Cepas, Toni Gabaldón, Thomas Rattei, Chris Creevey, Michael Kuhn, Lars J Jensen, Christian von Mering, Peer Bork

Sean Powell et al. Nucleic Acids Res. 2014 Jan.

Abstract

With the increasing availability of various 'omics data, high-quality orthology assignment is crucial for evolutionary and functional genomics studies. We here present the fourth version of the eggNOG database (available at http://eggnog.embl.de) that derives nonsupervised orthologous groups (NOGs) from complete genomes, and then applies a comprehensive characterization and analysis pipeline to the resulting gene families. Compared with the previous version, we have more than tripled the underlying species set to cover 3686 organisms, keeping track with genome project completions while prioritizing the inclusion of high-quality genomes to minimize error propagation from incomplete proteome sets. Major technological advances include (i) a robust and scalable procedure for the identification and inclusion of high-quality genomes, (ii) provision of orthologous groups for 107 different taxonomic levels compared with 41 in eggNOGv3, (iii) identification and annotation of particularly closely related orthologous groups, facilitating analysis of related gene families, (iv) improvements of the clustering and functional annotation approach, (v) adoption of a revised tree building procedure based on the multiple alignments generated during the process and (vi) implementation of quality control procedures throughout the entire pipeline. As in previous versions, eggNOGv4 provides multiple sequence alignments and maximum-likelihood trees, as well as broad functional annotation. Users can access the complete database of orthologous groups via a web interface, as well as through bulk download.

Figures

Figure 1.

Taxonomic levels for which orthologous groups are provided, with functional annotation coverage displayed. This tree shows the levels of the Tree of Life for which eggNOG v4 provides orthologous groups. For internal nodes, the size of the orange circle increases with the number of species in the core/periphery set that fall under this taxonomic level. Blue dot markers or circles denote the 67 of 107 taxonomic levels that are new in eggNOG v4 relative to eggNOG v3. The bar charts displayed at the edge show what fraction of orthologous groups have meaningful free-text descriptions or COG/KOG/arCOG functional categories assigned, respectively.

Figure 2.

Benchmarking and comparing eggNOGv4 and eggNOGv3. (A) The performance of the eggNOG database was evaluated at two levels, gene (identifying false and missing assignments) and group (identifying fusions and fissions), using the Reference Orthologous Groups (RefOGs). Initially, we mapped the reference orthologs to the bilaterian-specific orthologous groups (biNOGs). We scored eggNOG performance using (i) all orthologous groups (‘All OGs’) to identify the number of fissions and fusions for every RefOG and (ii) the orthologous group with the largest overlap with the RefOG (‘Single OG’, i.e. OG1). Then, we calculated how many genes were predicted accurately (true assignments, TA, black box), how many genes were not predicted as orthologs (missing assignments, MA, striped white box) and how many genes were erroneous orthology predictions (false assignments, FA, white box). The numbers of true, missing and false assignments depend on whether the database is evaluated in a ‘Single OG’ or an ‘All OGs’ manner. (B) Comparison of the two most recent eggNOG versions (v3 and v4) in terms of %RefOG coverage (number of true assignments per total number of reference orthologs). The Venn diagram shows the number of species in the two database releases; the 47 overlapping species include the 12 animals used in the benchmarking data set. (C) Comparison of eggNOGv3 and eggNOGv4 at the gene level (false and missing assignments). Larger bars indicate a larger number of errors. (D) Comparison of eggNOGv3 and eggNOGv4 at the group level (fusion and fission events). Larger bars indicate a larger number of errors.
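
The gene-level counts described in panel (A) can be illustrated with a small sketch. This is not the benchmark's actual code: the function name `score_refog`, the gene and group identifiers, and the choice to count every extra overlapping group as one fission are assumptions made for illustration. Fusions (one predicted group spanning several RefOGs) would need all RefOGs at once and are not covered here.

```python
from typing import Dict, Set


def score_refog(refog: Set[str], ogs: Dict[str, Set[str]], mode: str = "single") -> Dict[str, float]:
    """Score one reference group (RefOG) against predicted orthologous groups.

    Hypothetical helper: 'single' keeps only the predicted group with the
    largest overlap, 'all' pools every predicted group that overlaps the RefOG.
    """
    if not refog:
        raise ValueError("refog must contain at least one gene")

    # Predicted groups sharing at least one gene with the reference group.
    overlapping = {name: genes for name, genes in ogs.items() if genes & refog}
    if not overlapping:
        return {"TA": 0, "MA": len(refog), "FA": 0, "fissions": 0, "coverage": 0.0}

    if mode == "single":
        # Sorting the names first makes tie-breaking deterministic.
        best = max(sorted(overlapping), key=lambda n: len(overlapping[n] & refog))
        predicted = set(overlapping[best])
    elif mode == "all":
        predicted = set().union(*overlapping.values())
    else:
        raise ValueError("mode must be 'single' or 'all'")

    ta = len(predicted & refog)   # true assignments
    ma = len(refog - predicted)   # missing assignments
    fa = len(predicted - refog)   # false assignments
    return {
        "TA": ta,
        "MA": ma,
        "FA": fa,
        # Assumption: each extra overlapping group counts as one fission.
        "fissions": len(overlapping) - 1,
        # %RefOG coverage as a fraction: true assignments per reference ortholog.
        "coverage": ta / len(refog),
    }


# Toy data, invented for illustration only.
refog = {"a", "b", "c", "d"}
ogs = {"OG1": {"a", "b", "x"}, "OG2": {"c"}, "OG3": {"y", "z"}}
single = score_refog(refog, ogs, mode="single")
pooled = score_refog(refog, ogs, mode="all")
```

In this toy case the 'Single OG' view finds two true assignments and the 'All OGs' view finds three, which shows why the two evaluation modes report different numbers for the same RefOG.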

Figure 3.

Web site screenshots. The navigation tool has been improved to help users find relevant orthologous groups in a simple and intuitive way. The added insight of related groups is displayed inline with the use of chord diagrams. The thickness of the link (chord) between the groups represents the number of proteins mapped between two orthologous groups. The tooltips on the outer edge and chords display the number of proteins mapped from a group and between groups, respectively.
