Reconstruction of a functional human gene network, with an application for prioritizing positional candidate genes

Am J Hum Genet. 2006 Jun;78(6):1011-25.

doi: 10.1086/504300. Epub 2006 Apr 25.

Lude Franke et al. Am J Hum Genet. 2006 Jun.

Abstract

Most common genetic disorders have a complex inheritance and may result from variants in many genes, each contributing only weak effects to the disease. Pinpointing these disease genes within the myriad of susceptibility loci identified in linkage studies is difficult because these loci may contain hundreds of genes. However, in any disorder, most of the disease genes will be involved in only a few different molecular pathways. If we know something about the relationships between the genes, we can assess whether some genes (which may reside in different loci) functionally interact with each other, indicating a joint basis for the disease etiology. There are various repositories of information on pathway relationships. To consolidate this information, we developed a functional human gene network that integrates information on genes and the functional relationships between genes, based on data from the Kyoto Encyclopedia of Genes and Genomes, the Biomolecular Interaction Network Database, Reactome, the Human Protein Reference Database, the Gene Ontology database, predicted protein-protein interactions, human yeast two-hybrid interactions, and microarray co-expressions. We applied this network to interrelate positional candidate genes from different disease loci and then tested 96 heritable disorders for which the Online Mendelian Inheritance in Man database reported at least three disease genes. Artificial susceptibility loci, each containing 100 genes, were constructed around each disease gene, and we used the network to rank these genes on the basis of their functional interactions. By following up the top five genes per artificial locus, we were able to detect at least one known disease gene in 54% of the loci studied, representing a 2.8-fold increase over random selection. 
This suggests that our method can significantly reduce the cost and effort of pinpointing true disease genes in analyses of disorders for which numerous loci have been reported but for which most of the genes are unknown.

Figures

Figure 1

Basic principles of the prioritization method for positional candidate genes with the use of a functional human gene network. The method integrates different gene-gene interaction data sources in a Bayesian way (left panel). Subsequently, this gene network is used to prioritize positional candidate genes, with all genes assigned an initial score of zero. In the example (right panel), three different susceptibility loci are analyzed, each containing a disease gene (P, Q, or R) and two nondisease genes. In each locus, the three positional candidate genes increase the scores of nearby genes in the gene network, by use of a kernel function that models the relationship between gene-gene distance and score effect. Genes within each locus are ranked on the basis of their eventual effect score, corrected for differences in the topology of the network (see the “Material and Methods” section).
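The scoring idea in the right panel can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the Gaussian kernel and its width, the use of hop counts as gene-gene distance, the restriction to contributions between different loci, and the names `shortest_path_lengths`, `gaussian_kernel`, and `prioritize`. The topology correction mentioned in the caption is not included.

```python
from collections import deque
import math

def shortest_path_lengths(graph, source):
    """Breadth-first search: hop distance from source to every reachable gene."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def gaussian_kernel(distance, width=1.0):
    """Hypothetical kernel: the score effect decays with network distance."""
    return math.exp(-(distance * distance) / (2.0 * width * width))

def prioritize(graph, loci, kernel=gaussian_kernel):
    """Rank the genes of each locus by the score received from genes in other loci."""
    locus_of = {gene: name for name, genes in loci.items() for gene in genes}
    scores = {gene: 0.0 for gene in locus_of}  # every candidate starts at zero
    for source in locus_of:
        for target, d in shortest_path_lengths(graph, source).items():
            # Assumption: only candidates from a different locus add to a score.
            if target in locus_of and locus_of[target] != locus_of[source]:
                scores[target] += kernel(d)
    return {name: sorted(genes, key=lambda g: -scores[g])
            for name, genes in loci.items()}

# Toy network mirroring the caption: P, Q, and R interact; the other genes do not.
graph = {"P": ["Q", "R"], "Q": ["P", "R"], "R": ["P", "Q"],
         "a1": ["a2"], "a2": ["a1"]}
loci = {"A": ["a1", "P", "a2"], "B": ["b1", "Q", "b2"], "C": ["c1", "c2", "R"]}
ranked = prioritize(graph, loci)
print(ranked)
```

In this toy case the three interacting genes reach the top of their loci because they are the only candidates with network neighbors in other loci.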

Figure 2

Integration of data sets in four gene networks. a, Data sets were benchmarked against a set of 55,606 known true-positive gene pairs derived from BIND, KEGG, HPRD, and Reactome and 800,608 true-negative gene pairs derived from GO. The Venn diagram indicates the data sources from which the true positives were derived and their degree of overlap. Numbers in parentheses indicate the number of interactions that are provided by each of the data sets. b, Potential gene-gene interactions derived from GO, microarray coexpression data, and human and orthologous protein-protein interaction data were integrated using a Bayesian classifier. The steps involved in building this classifier are shown.
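One common way to integrate evidence sources in a Bayesian classifier is to multiply per-source likelihood ratios onto prior odds, under a naive conditional-independence assumption. The sketch below shows that generic technique only; it is not taken from the paper, and all counts, the prior odds, and the function names are hypothetical.

```python
def likelihood_ratio(tp_with, tp_total, tn_with, tn_total):
    """P(evidence | true positive pair) / P(evidence | true negative pair)."""
    return (tp_with / tp_total) / (tn_with / tn_total)

def posterior_odds(prior_odds, ratios):
    """Naive Bayes update: prior odds times the product of likelihood ratios."""
    odds = prior_odds
    for ratio in ratios:
        odds *= ratio
    return odds

def odds_to_probability(odds):
    return odds / (1.0 + odds)

# Hypothetical counts: how often each kind of evidence is seen among
# benchmark true-positive and true-negative gene pairs.
lr_coexpression = likelihood_ratio(200, 1000, 100, 10000)
lr_interaction = likelihood_ratio(50, 1000, 10, 10000)

# Hypothetical prior odds that a random gene pair is functionally related.
odds = posterior_odds(0.001, [lr_coexpression, lr_interaction])
print(odds, odds_to_probability(odds))
```

A likelihood ratio above 1 raises the odds that a gene pair interacts and a ratio below 1 lowers them, so sources of differing reliability can be combined on a single scale.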

Figure 3

ROC curve of the GO network, the MA+PPI network, and the combined GO+MA+PPI network. The baseline (solid gray line) indicates the performance of a classifier that would be totally uninformative.
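The meaning of the uninformative baseline can be checked with a small sketch. It computes the area under the ROC curve as the probability that a randomly chosen positive pair scores higher than a randomly chosen negative pair, with ties counted as half. The function name `auc` and the example scores are hypothetical.

```python
def auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# A classifier that separates the classes perfectly.
perfect = auc([0.9, 0.8], [0.1, 0.2])
# A classifier that gives every pair the same score carries no information.
uninformative = auc([0.5, 0.5], [0.5, 0.5, 0.5])
print(perfect, uninformative)
```

An uninformative classifier yields an area of 0.5, which corresponds to the diagonal of an ROC plot.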

Figure 4

Accuracy of positional candidate-gene prioritization. a and b, Percentage of the 409 disease genes that was ranked among the top 5 (a) or top 10 (b) genes per locus, after artificial susceptibility loci of varying widths around these genes were constructed and when different types of gene networks were used. The baselines (gray lines) indicate the percentage of disease genes expected to rank among the top 5 or top 10 genes by chance. c, ROC curves for susceptibility loci that contain 50, 100, or 150 genes.
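A chance baseline of this kind can be worked out directly. The sketch assumes that each artificial locus contains exactly one disease gene and that a random ranking places it at any position with equal probability. The function name `chance_top_k` is hypothetical.

```python
from fractions import Fraction

def chance_top_k(k, locus_size):
    """Probability that a single disease gene lands in the top k by chance."""
    return Fraction(min(k, locus_size), locus_size)

# Locus sizes taken from panel c of the caption; k = 5 and k = 10 from panels a and b.
baselines = {(k, n): float(chance_top_k(k, n))
             for k in (5, 10) for n in (50, 100, 150)}
for (k, n), value in sorted(baselines.items()):
    print(f"top {k} of {n} genes: {value:.1%}")
```

Under these assumptions the baseline falls as the locus widens, because the same number of top-ranked genes covers a smaller share of the locus.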

Figure 5

Probability of detecting at least one disease gene when a fixed number of top-ranked positional candidate genes—as ranked by Prioritizer—are followed up for each locus. Each locus contains either 100 or 150 genes, and the GO+MA+PPI+TP network was employed. The baselines (dashed lines) show the probability of detecting at least one disease gene if a fixed number of arbitrarily chosen genes in each locus are followed up.
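A baseline for arbitrary follow-up can be sketched under simplifying assumptions that are not taken from the paper: every locus holds exactly one disease gene, the followed-up genes are drawn uniformly at random, and the loci are independent. The function name `p_at_least_one` and the number of loci in the example are hypothetical.

```python
def p_at_least_one(k, locus_size, n_loci):
    """Chance that random follow-up of k genes per locus hits a disease gene."""
    miss_one_locus = 1.0 - k / locus_size  # the disease gene is not among the k
    return 1.0 - miss_one_locus ** n_loci  # complement of missing in every locus

# Hypothetical example: three loci of 100 genes each.
for k in (1, 5, 10):
    print(k, round(p_at_least_one(k, 100, 3), 4))
```

The probability rises with both the number of genes followed up per locus and the number of loci, which is why a baseline of this kind is a curve rather than a single value.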

Figure 6

Prioritizer analysis of breast cancer. Susceptibility loci, each containing 100 genes, were defined around 10 known breast cancer genes. The 10 highest-ranked genes for each locus are shown in the graph, with colors indicating the locus in which they reside. Use of the GO+MA+PPI network led to four breast cancer genes (PIK3CA, CHEK2, BARD1, and TP53 [circles]) being ranked in the top 10. Chr. = chromosome.

Figure A1

Difference in likelihood ratios between genes that were represented on the microarrays and genes that were not.

Figure A2

Degree distributions for the four networks. The MA+Y2H network has a topology that most closely follows a scale-free, power-law distribution, compared with the other three networks.

References

Web Resources

    1. Biomolecular Interaction Network Database (BIND), http://bind.ca/
    2. Ensembl, http://www.ensembl.org/index.html
    3. GeneNetwork, http://www.genenetwork.nl
    4. Human Protein Reference Database (HPRD), http://www.hprd.org/
    5. Kyoto Encyclopedia of Genes and Genomes (KEGG), http://www.genome.jp/kegg/
