KEGG OC: a large-scale automatic construction of taxonomy-based ortholog clusters

Nucleic Acids Res. 2013 Jan;41(Database issue):D353-7.

doi: 10.1093/nar/gks1239. Epub 2012 Nov 27.

Akihiro Nakaya, Toshiaki Katayama, Masumi Itoh, Kazushi Hiranuka, Shuichi Kawashima, Yuki Moriya, Shujiro Okuda, Michihiro Tanaka, Toshiaki Tokimatsu, Yoshihiro Yamanishi, Akiyasu C Yoshizawa, Minoru Kanehisa, Susumu Goto

PMID: 23193276
PMCID: PMC3531156
DOI: 10.1093/nar/gks1239

Akihiro Nakaya et al. Nucleic Acids Res. 2013 Jan.

Abstract

The identification of orthologous genes in an increasing number of fully sequenced genomes is a challenging issue in recent genome science. Here we present KEGG OC (http://www.genome.jp/tools/oc/), a novel database of ortholog clusters (OCs). The current version of KEGG OC contains 1 176 030 OCs, obtained by clustering 8 357 175 genes in 2112 complete genomes (153 eukaryotes, 1830 bacteria and 129 archaea). The OCs were constructed by applying a quasi-clique-based clustering method to all possible protein-coding genes in all complete genomes, based on their amino acid sequence similarities. The OCs are computationally efficient to calculate, which allows the contents to be updated regularly. KEGG OC has the following two features: (i) it covers all complete genomes of a wide variety of organisms from the three domains of life, and the number of organisms is the largest among the existing databases; and (ii) it is compatible with the KEGG database, sharing the same sets of genes and identifiers, which leads to seamless integration of OCs with useful components in KEGG such as biological pathways, pathway modules, functional hierarchy, diseases and drugs. The KEGG OC resources are accessible via the OC Viewer, which provides an interactive visualization of OCs at different taxonomic levels.
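
The abstract does not spell out the quasi-clique-based clustering method, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a density-based definition (a cluster is kept when at least a fraction gamma of its gene pairs are linked by a similarity edge) and a simple greedy growth strategy. The function names, the gamma parameter and the toy gene labels are all hypothetical.

```python
from itertools import combinations


def density(members, edges):
    """Fraction of member pairs that are joined by a similarity edge."""
    n = len(members)
    if n < 2:
        return 1.0
    linked = sum(
        1 for a, b in combinations(sorted(members), 2)
        if frozenset((a, b)) in edges
    )
    return linked / (n * (n - 1) / 2)


def greedy_quasi_cliques(genes, similar_pairs, gamma=0.8):
    """Greedily grow clusters whose edge density stays at or above gamma.

    genes: list of gene identifiers (order decides which seeds come first).
    similar_pairs: list of (gene, gene) pairs judged similar (assumed input).
    """
    edges = {frozenset(p) for p in similar_pairs}
    neighbors = {g: set() for g in genes}
    for a, b in similar_pairs:
        neighbors[a].add(b)
        neighbors[b].add(a)

    unassigned = set(genes)
    clusters = []
    for seed in genes:
        if seed not in unassigned:
            continue
        cluster = {seed}
        unassigned.discard(seed)
        while True:
            candidates = {n for m in cluster for n in neighbors[m]} & unassigned
            # Try the candidates with the most links into the cluster first.
            ranked = sorted(candidates,
                            key=lambda c: (-len(neighbors[c] & cluster), c))
            chosen = next(
                (c for c in ranked if density(cluster | {c}, edges) >= gamma),
                None,
            )
            if chosen is None:
                break
            cluster.add(chosen)
            unassigned.discard(chosen)
        clusters.append(sorted(cluster))
    return clusters


# Toy example with hypothetical gene labels.
toy_genes = ["a", "b", "c", "d", "e"]
toy_pairs = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]
toy_clusters = greedy_quasi_cliques(toy_genes, toy_pairs, gamma=0.8)
print(toy_clusters)
```

Lowering gamma in this sketch lets sparser groups merge, which is the trade-off a quasi-clique relaxation makes relative to requiring a full clique.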

Figures

Figure 1.

Distribution of OCs in KEGG OC across the three domains: eukaryotes, bacteria and archaea. Each number indicates the number of OCs consisting of multiple genes, whereas the number in parentheses indicates the number of singletons (OCs consisting of a single gene).

Figure 2.

An example of the OC Viewer output page for the query ‘eco:b0002’ (a KEGG GENES ID for a gene of E. coli K-12 MG1655). The PC column shows the PCs (eco.14, ecj.17, ecd.113, etc.). These PCs are aggregated into a TC named Escherichia_col.10890 at the higher taxonomic level shown in the 5th column. As the aggregation of the TCs is iterated from the 5th column to the 2nd column of the OC table, these PCs are merged into the top-level cluster OC.149602. Using the slider at the bottom left, one can focus on an arbitrary depth in the taxonomic tree shown at the bottom right.
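
The level-by-level aggregation described in this caption can be pictured with a small sketch. This is an illustration under stated assumptions, not the procedure used by KEGG OC: it assumes each cluster carries a root-to-leaf lineage of equal depth and that a list of similar cluster pairs is already available, and it merges linked clusters with a union-find structure, moving from the deepest taxonomic level to the root. The taxon labels, the cluster xxx.1 and the link list are hypothetical.

```python
def aggregate_by_taxonomy(lineages, links):
    """Merge linked clusters level by level, deepest taxonomic level first.

    lineages: cluster name -> list of taxa from root to leaf (equal depth).
    links: list of (cluster, cluster) pairs judged similar (assumed input).
    Returns one grouping per level, ordered from deepest level to root.
    """
    parent = {c: c for c in lineages}

    def find(c):
        # Union-find lookup with path halving.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    depth = len(next(iter(lineages.values())))
    levels = []
    for level in range(depth - 1, -1, -1):
        for a, b in links:
            # Merge only when both clusters share the same taxon at this level.
            if lineages[a][:level + 1] == lineages[b][:level + 1]:
                parent[find(a)] = find(b)
        groups = {}
        for c in lineages:
            groups.setdefault(find(c), []).append(c)
        levels.append(sorted(sorted(g) for g in groups.values()))
    return levels


# Hypothetical lineages and links; only the names eco.14 and ecj.17
# come from the figure caption.
toy_lineages = {
    "eco.14": ["root", "taxonA"],
    "ecj.17": ["root", "taxonA"],
    "xxx.1": ["root", "taxonB"],
}
toy_links = [("eco.14", "ecj.17"), ("ecj.17", "xxx.1")]
toy_levels = aggregate_by_taxonomy(toy_lineages, toy_links)
print(toy_levels)
```

In this toy run the two clusters under taxonA merge first, and the third joins them only at the root, which mirrors how the viewer shows progressively larger clusters at shallower depths.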

Cited by

Metagenome survey of a multispecies and alga-associated biofilm revealed key elements of bacterial-algal interactions in photobioreactors.
Krohn-Molt I, Wemheuer B, Alawi M, Poehlein A, Güllert S, Schmeisser C, Pommerening-Röser A, Grundhoff A, Daniel R, Hanelt D, Streit WR. Appl Environ Microbiol. 2013 Oct;79(20):6196-206. doi: 10.1128/AEM.01641-13. Epub 2013 Aug 2. PMID: 23913425
Transcription factor and microRNA-regulated network motifs for cancer and signal transduction networks.
Hsieh WT, Tzeng KR, Ciou JS, Tsai JJ, Kurubanjerdjit N, Huang CH, Ng KL. BMC Syst Biol. 2015;9 Suppl 1(Suppl 1):S5. doi: 10.1186/1752-0509-9-S1-S5. Epub 2015 Jan 21. PMID: 25707690
Proposal for a new therapy for drug-resistant malaria using Plasmodium synthetic lethality inference.
Lee SJ, Seo E, Cho Y. Int J Parasitol Drugs Drug Resist. 2013 Jun 28;3:119-28. doi: 10.1016/j.ijpddr.2013.06.001. eCollection 2013 Dec. PMID: 24533301
KCF-S: KEGG Chemical Function and Substructure for improved interpretability and prediction in chemical bioinformatics.
Kotera M, Tabei Y, Yamanishi Y, Moriya Y, Tokimatsu T, Kanehisa M, Goto S. BMC Syst Biol. 2013;7 Suppl 6(Suppl 6):S2. doi: 10.1186/1752-0509-7-S6-S2. Epub 2013 Dec 13. PMID: 24564846
Oncofinder, a new method for the analysis of intracellular signaling pathway activation using transcriptomic data.
Buzdin AA, Zhavoronkov AA, Korzinkin MB, Venkova LS, Zenin AA, Smirnov PY, Borisov NM. Front Genet. 2014 Mar 25;5:55. doi: 10.3389/fgene.2014.00055. eCollection 2014. PMID: 24723936
