An integrative approach to ortholog prediction for disease-focused and other functional studies - PubMed (original) (raw)

An integrative approach to ortholog prediction for disease-focused and other functional studies

Yanhui Hu et al. BMC Bioinformatics. 2011.

Abstract

Background: Mapping of orthologous genes among species serves an important role in functional genomics by allowing researchers to develop hypotheses about gene function in one species based on what is known about the functions of orthologs in other species. Several tools for predicting orthologous gene relationships are available. However, these tools can give different results and identification of predicted orthologs is not always straightforward.

Results: We report a simple but effective tool, the Drosophila RNAi Screening Center Integrative Ortholog Prediction Tool (DIOPT; http://www.flyrnai.org/diopt), for rapid identification of orthologs. DIOPT integrates existing approaches, facilitating rapid identification of orthologs among human, mouse, zebrafish, C. elegans, Drosophila, and S. cerevisiae. As compared to individual tools, DIOPT shows increased sensitivity with only a modest decrease in specificity. Moreover, the flexibility built into the DIOPT graphical user interface allows researchers with different goals to appropriately 'cast a wide net' or limit results to highest confidence predictions. DIOPT also displays protein and domain alignments, including percent amino acid identity, for predicted ortholog pairs. This helps users identify the most appropriate matches among multiple possible orthologs. To facilitate using model organisms for functional analysis of human disease-associated genes, we used DIOPT to predict high-confidence orthologs of disease genes in Online Mendelian Inheritance in Man (OMIM) and genes in genome-wide association study (GWAS) data sets. The results are accessible through the DIOPT diseases and traits query tool (DIOPT-DIST; http://www.flyrnai.org/diopt-dist).

Conclusions: DIOPT and DIOPT-DIST are useful resources for researchers working with model organisms, especially those who are interested in exploiting model organisms such as Drosophila to study the functions of human disease genes.

PubMed Disclaimer

Figures

Figure 1

Figure 1

DIOPT facilitates identification of predicted orthologs based on various approaches, with flexible inputs and outputs. The DIOPT graphical user interface facilitates input of small or large gene or protein lists and is compatible with one or more of a variety of identifiers. Three filters are available: two of them exclude relationships that are predicted by < 2 or < 3 tools unless the only match score is equal to or lower than the threshold. The third one and most stringent filter limits outputs to the best-matching ortholog(s) per gene, as judged by the number of tools supporting the prediction. DIOPT provides a score based on the number of tools supporting the prediction and a weighted score based on the average of GO semantic similarity of orthologous pairs predicted by each tool using high quality GO functional annotation. In addition, DIOPT also provides the option to view the original score of each tool and a protein-protein alignment based on an updated proteome annotation (RefSeq release 44). Percent protein identity is calculated for both the overall alignment and domain regions.

Figure 2

Figure 2

Assessing the correlation of functional relatedness and DIOPT score with gene ontology molecular function (GO MF) annotations. GO MF1, all GO annotations of molecular function. GO MF2, the subset of GO annotations of molecular function supported either by experimental data or author/curator statement. GO MF3, the subset of GO annotation of molecular function supported by experimental data. 2a. Correlation of functional relatedness and DIOPT score with GO MF annotations. 2b. Comparison of the functional relatedness of orthologous genes predicted by individual tools.

Figure 3

Figure 3

Comparing sensitivity and specificity among the nine individual ortholog prediction tools and the integrative tool DIOPT. Comparing sensitivity and specificity using a manually assembled reference set (3a), or using the KOG human-specific gene set (3b). Sensitivity is defined as the percent of manually assembled pairs that can be identified by each tool versus all the manually assembled orthologous pairs (same for Figures 3a and 3b). In 3a, specificity is defined as the percent of manually assembled pairs that can be identified by each tool versus all the orthologous pairs identified by each tool if queried with either the Drosophila or the human genes from the test set. In 3b, specificity is defined as the percent of putative human-specific genes that do not have fly ortholog versus all the human specific genes from KOG list.

Figure 4

Figure 4

Identification of predicted disease gene orthologs in model organisms. 4a. Strategy for identifying disease genes in model organisms. 4b. Summary of disease genes from OMIM and GWAS and predicted orthologous relationships for Drosophila.

Figure 5

Figure 5

User Interface for the disease and trait query tool DIOPT-DIST. Users can query with a list of genes from model organisms, and a list of human orthologs along with their disease annotations from OMIM/GWAS will be retrieved. Alternatively, users can query with a disease term, disease category or OMIM IDs (OMIM IDs for disease phenotype and/or gene/locus), and a list of human genes as well as their corresponding orthologs in a model organism will be retrieved. The DIOPT score is displayed on the results page and is hyper-linked to detailed information including a protein alignment, the original tool scores, and the weighted DIOPT score.

Similar articles

Cited by

References

    1. McKusick VA. On the naming of clinical disorders, with particular reference to eponyms. Medicine (Baltimore) 1998;77(1):1–2. doi: 10.1097/00005792-199801000-00001. - DOI - PubMed
    1. Amberger J, Bocchini CA, Scott AF, Hamosh A. McKusick's Online Mendelian Inheritance in Man (OMIM) Nucleic Acids Res. 2009. pp. D793–796. - PMC - PubMed
    1. Hamosh A, Scott AF, Amberger J, Valle D, McKusick VA. Online Mendelian Inheritance in Man (OMIM) Hum Mutat. 2000;15(1):57–61. doi: 10.1002/(SICI)1098-1004(200001)15:1<57::AID-HUMU12>3.0.CO;2-G. - DOI - PubMed
    1. Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 2005. pp. D514–517. - PMC - PubMed
    1. Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, Collins FS, Manolio TA. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci USA. 2009;106(23):9362–9367. doi: 10.1073/pnas.0903103106. - DOI - PMC - PubMed

Publication types

MeSH terms

LinkOut - more resources