Benchmarking ortholog identification methods using functional genomics data
Tim Hulsen et al. Genome Biol. 2006.
Abstract
Background: The transfer of functional annotations from model organism proteins to human proteins is one of the main applications of comparative genomics. Various methods are used to analyze cross-species orthologous relationships according to an operational definition of orthology. Often the definition of orthology is incorrectly interpreted as a prediction of proteins that are functionally equivalent across species, while in fact it only defines the existence of a common ancestor for a gene in different species. However, it has been demonstrated that orthologs often reveal significant functional similarity. Therefore, the quality of the orthology prediction is an important factor in the transfer of functional annotations (and other related information). To identify protein pairs with the highest possible functional similarity, it is important to qualify ortholog identification methods.
Results: To measure the similarity in function of proteins from different species we used functional genomics data, such as expression data and protein interaction data. We tested several of the most popular ortholog identification methods. In general, we observed a sensitivity/selectivity trade-off: the functional similarity scores per orthologous pair of sequences become higher when the number of proteins included in the ortholog groups decreases.
Conclusion: By combining the sensitivity and the selectivity into an overall score, we show that the InParanoid program is the best ortholog identification method in terms of identifying functionally equivalent proteins.
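As an illustration of the expression-based benchmark, the sketch below scores a set of predicted ortholog pairs by the correlation of their expression profiles. It is not the authors' code: the use of Pearson correlation, the requirement that both profiles are measured over the same ordered set of samples, and the names `pearson` and `score_pairs` are assumptions made for this example, and the toy values are invented.

```python
from math import sqrt
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length numeric profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / sqrt(var_x * var_y)


def score_pairs(pairs, expr_a, expr_b):
    """Return (mean r, standard deviation of r, number of pairs scored).

    pairs  : iterable of (gene_in_species_a, gene_in_species_b)
    expr_a : dict mapping gene -> expression profile in species A
    expr_b : dict mapping gene -> expression profile in species B
    Pairs lacking expression data in either species are skipped.
    """
    rs = [
        pearson(expr_a[a], expr_b[b])
        for a, b in pairs
        if a in expr_a and b in expr_b
    ]
    if not rs:
        raise ValueError("no pair has expression data in both species")
    return mean(rs), pstdev(rs), len(rs)


# Invented toy data: two predicted pairs, four matched samples each.
expr_hs = {"HsA": [1, 2, 3, 4], "HsB": [4, 3, 2, 1]}
expr_mm = {"MmA": [2, 4, 6, 8], "MmB": [1, 2, 3, 4]}
predicted = [("HsA", "MmA"), ("HsB", "MmB"), ("HsC", "MmC")]
print(score_pairs(predicted, expr_hs, expr_mm))
```

Running the same function on the pair list produced by each method gives one average correlation per method, which can then be compared against the number of proteins that method covers.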
Figures
Figure 1
Correlation in expression profiles. Correlation in expression patterns between the (a) human-mouse (Hs-Mm) and (b) human-worm (Hs-Ce) orthologous pairs from the benchmarked methods versus the average proteome size. Vertical error bars show the standard deviation from the average correlation coefficient. The trendline shown is a linear regression trendline. The methods having a fourth letter 'B' behind the method name, shown as squares in the graph, are group orthology methods in which only the best scoring pairs are taken into account.
Figure 2
Equal InterPro accession number. Conservation of InterPro accession number between the (a) human-mouse (Hs-Mm) and (b) human-worm (Hs-Ce) orthologous pairs from the benchmarked methods versus the average proteome size.
Figure 3
Conservation of co-expression. Conservation of co-expression from human-human gene pairs to orthologous (a) mouse-mouse and (b) worm-worm gene pairs from the benchmarked methods versus the average proteome size. Ce, Caenorhabditis elegans; Hs, Homo sapiens; Mm, Mus musculus.
Figure 4
Conservation of gene order. Conservation of gene order from human-human gene pairs to orthologous (a) mouse-mouse and (b) worm-worm gene pairs from the benchmarked methods versus the average proteome size. Ce, Caenorhabditis elegans; Hs, Homo sapiens.
Figure 5
Conservation of protein-protein interaction. Conservation of protein-protein interaction from human-human protein pairs to orthologous (a) mouse-mouse and (b) worm-worm protein pairs from the benchmarked methods versus the average proteome size. Ce, Caenorhabditis elegans; Hs, Homo sapiens.
Figure 6
Overall scoring graph. Overall scoring graph, created by adding up all normalized benchmarking scores per ortholog identification method. X-axis, the several ortholog identification methods, sorted by average proteome size or number of protein pairs; Y-axis, the sum of all five benchmarking scores per ortholog identification method. Red, correlation of expression profiles; green, equal InterPro accession numbers; blue, conservation of co-expression; orange, conservation of gene order; purple, conservation of protein-protein interaction. (a) Human-mouse (Hs-Mm). (b) Human-worm (Hs-Ce).
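The overall score in Figure 6 is described as the sum of the normalized benchmarking scores per method. A minimal sketch of that idea follows. The normalization shown here (dividing each benchmark's scores by the highest score any method reached on that benchmark) is an assumption, since the normalization is not specified above, and the function name, method names, and numbers are hypothetical.

```python
def overall_scores(raw):
    """Sum per-benchmark normalized scores for each method.

    raw: dict mapping benchmark name -> {method name: raw score}
    Each benchmark is scaled so that its best method scores 1.0
    (an assumed normalization), then the scaled scores are added.
    """
    totals = {}
    for benchmark, by_method in raw.items():
        top = max(by_method.values())
        if top <= 0:
            raise ValueError(f"benchmark {benchmark!r} has no positive score")
        for method, score in by_method.items():
            totals[method] = totals.get(method, 0.0) + score / top
    return totals


# Invented values for two hypothetical methods on two benchmarks.
raw_scores = {
    "expression_correlation": {"method_A": 0.50, "method_B": 0.25},
    "equal_interpro": {"method_A": 0.20, "method_B": 0.80},
}
totals = overall_scores(raw_scores)
best = max(totals, key=totals.get)
print(totals, best)
```

Scaling each benchmark to the same range before summing keeps a benchmark with numerically large raw values from dominating the total.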