Predicting genes for orphan metabolic activities using phylogenetic profiles
Lifeng Chen et al. Genome Biol. 2006.
Abstract
Homology-based methods fail to assign genes to many metabolic activities present in sequenced organisms. To suggest genes for these orphan activities we developed a novel method that efficiently combines local structure of a metabolic network with phylogenetic profiles. We validated our method using known metabolic genes in Saccharomyces cerevisiae and Escherichia coli. We show that our method should be easily transferable to other organisms, and that it is robust to errors in incomplete metabolic networks.
Figures
Figure 1
The average phylogenetic correlation between a target gene and all other network genes at a certain metabolic network distance. The standard deviation of the average correlation for all possible network gaps is represented by the error bars. The dashed line shows the background correlation, estimated by the average phylogenetic correlation between any metabolic and non-metabolic genes. The average phylogenetic correlation between two genes decreases monotonically with their separation in the network.
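As a minimal sketch of the quantity plotted in Figure 1, the correlation between two phylogenetic profiles can be computed as a Pearson correlation over presence/absence vectors. The binary encoding, the gene names, and the eight-genome profiles below are illustrative assumptions, not data from the paper.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("profiles must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Convention chosen for this sketch: a constant profile carries no signal.
        return 0.0
    return cov / sqrt(var_x * var_y)

# Hypothetical profiles: 1 = homolog present in a genome, 0 = absent.
gene_a = [1, 1, 0, 0, 1, 0, 1, 1]
gene_b = [1, 1, 0, 0, 1, 0, 1, 0]  # differs from gene_a in one genome
gene_c = [0, 0, 1, 1, 0, 1, 0, 0]  # exact complement of gene_a

corr_ab = pearson(gene_a, gene_b)
corr_ac = pearson(gene_a, gene_c)
```

Averaging such correlations over all gene pairs at a given network distance would give one point on the curve described in the caption.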
Figure 2
'Fit' test of a candidate gene in a network gap. We use a self-consistent test in which a known gene E4 is removed from the network, leaving a gap in its place. We then: 1, place candidate genes in the gap one at a time; 2, compute the function value for every candidate gene (Equations 1 to 3); and 3, rank all candidate genes by their function values. The figure shows an example in which the correct gene E4 was ranked sixth.
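The self-consistent test can be sketched as follows. Equations 1 to 3 are not reproduced on this page, so the scoring function here is a stand-in assumption (the mean profile correlation between a candidate and the first-layer neighbors of the gap), and all gene names and profiles are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def stand_in_score(candidate_profile, neighbor_profiles):
    """Stand-in for the paper's function value: mean correlation with neighbors."""
    total = sum(pearson(candidate_profile, p) for p in neighbor_profiles)
    return total / len(neighbor_profiles)

def rank_candidates(gap_neighbors, candidates, profiles):
    """Score each candidate placed in the gap and return names sorted best-first."""
    neighbor_profiles = [profiles[g] for g in gap_neighbors]
    scores = {c: stand_in_score(profiles[c], neighbor_profiles) for c in candidates}
    return sorted(candidates, key=lambda c: scores[c], reverse=True)

# Hypothetical presence/absence profiles across six genomes.
profiles = {
    "E3": [1, 1, 0, 0, 1, 0],  # neighbor of the gap
    "E5": [1, 1, 0, 1, 1, 0],  # neighbor of the gap
    "E4": [1, 1, 0, 0, 1, 0],  # known gene removed to create the gap
    "X1": [0, 0, 1, 1, 0, 1],  # unrelated candidate
    "X2": [1, 0, 1, 0, 1, 0],  # unrelated candidate
}

ranking = rank_candidates(["E3", "E5"], ["X1", "E4", "X2"], profiles)
rank_of_correct_gene = ranking.index("E4") + 1  # 1-based rank of the removed gene
```

Repeating this for every known gene in the network and collecting `rank_of_correct_gene` would yield a cumulative rank distribution of the kind shown in Figure 3.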
Figure 3
Enzyme predictions based on phylogenetic profiles. (a) The cumulative fraction of correctly predicted genes as a function of rank among all non-metabolic genes. All 6,093 non-metabolic yeast genes plus a known correct gene were ranked using Equation 2. The cumulative distribution is shown for ranks from 1 to 100; the inset shows the same distribution for all ranks. (b) The effect of connection specificity adjustment. Only highly ranked genes (1 to 50) are shown. (c) Comparison of the performance with all non-metabolic genes as candidates to that with only hypothetical genes as candidates for an orphan activity. (d) Predictions for the E. coli metabolic network. The cost function with the parameters optimized for the yeast network showed comparable performance to the cost function with the parameters specifically optimized for the E. coli network.
Figure 4
Importance of metabolic neighborhood for the predictive power of the algorithm. (a) Informative and non-informative gaps. About one-third of the gaps did not allow any discrimination between the correct and average genes (represented by bin 0 in the figure), that is, the function value of the correct gene is equal to or smaller than the function value for average genes determined by Equation 2. The red line shows the average rank of correct genes represented in each bin. Genes filling gaps with higher discrimination ratios are ranked higher by the algorithm. (b) The relationship between the rank of a correct enzyme in a gap and the average correlation of first layer genes around the gap. A metabolic gene for a gap with a high average first layer correlation (>0.5) is usually highly ranked by the prediction algorithm (black line) but the fraction of such gaps is small (red bins).
Figure 5
The algorithm performance using an incomplete metabolic network. We show the algorithm performance for yeast networks with a certain fraction of genes randomly deleted. The performance decrease is gradual as up to 50% of the network nodes are deleted. For example, when half of the network is deleted, we can still predict more than 33% of the correct metabolic genes within the top 50 among all candidate genes, compared to 0.8% by random chance.
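The random-chance figure can be checked with a short calculation. This sketch assumes the candidate pool is the same as in Figure 3a (6,093 non-metabolic yeast genes plus the one correct gene) and that a random ranking places the correct gene uniformly among all positions.

```python
def random_top_k_rate(k, n_candidates):
    """Probability that a uniformly random ranking puts the correct gene in the top k."""
    if not 0 <= k <= n_candidates:
        raise ValueError("k must lie between 0 and the number of candidates")
    return k / n_candidates

# Assumed pool: 6,093 non-metabolic genes plus the known correct gene.
n_candidates = 6093 + 1
baseline = random_top_k_rate(50, n_candidates)
print(f"{baseline:.1%}")  # prints 0.8%
```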
Figure 6
Context-based associations versus the metabolic network distance for the yeast metabolic network. (a) mRNA expression distance. The expression distance is calculated as 1-|correlation|, where correlation is the Spearman's rank correlation between the genes' mRNA expression profiles. Close neighbors in the metabolic network have similar expression profiles. (b) Gene fusion events (Rosetta Stone). The fraction of proteins involved in gene fusion events is shown. Adjacent genes in the network are much more likely to form a Rosetta Stone protein. (c) Phylogenetic profiles. Genes close in the network are more likely to have similar phylogenetic profiles, as measured by Pearson's correlation. (d) Chromosomal distance between genes. The mean physical distances (in kilobase pairs (kbp)) between ORFs are shown. Adjacent genes in the network are significantly closer to each other on yeast chromosomes.
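The expression distance in panel (a), 1-|correlation| with a Spearman rank correlation, can be sketched with the standard library alone. The expression values below are hypothetical, and ties are handled with average ranks, which is a common convention assumed here rather than a detail stated in the caption.

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def expression_distance(x, y):
    """1 - |Spearman correlation| between two expression profiles."""
    rho = pearson(average_ranks(x), average_ranks(y))
    return 1.0 - abs(rho)

# Hypothetical expression levels of three genes across four conditions.
gene_up = [0.1, 0.5, 0.9, 1.4]
gene_up_too = [2.0, 3.0, 5.0, 9.0]
gene_down = [9.0, 5.0, 3.0, 2.0]

d_same = expression_distance(gene_up, gene_up_too)
d_opposite = expression_distance(gene_up, gene_down)
```

Because the absolute value is taken, strongly anti-correlated genes are treated as close as strongly co-expressed ones, so both distances above are near zero.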
Figure 7
Construction of a network from a list of metabolic reactions. The direct connections are established between the dependency pairs: gene pairs sharing metabolites (M) as reactants or products. An orphan activity (metabolic network gap) is marked by a question mark and surrounded by known metabolic genes. The first and second network layers around the gap are colored yellow and blue, respectively. E, enzyme.
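The construction described in Figure 7 can be sketched as a small graph routine: genes that share a metabolite are connected, and a breadth-first search from the gap gives the first and second layers. The reaction list, the gene and metabolite names, and the `GAP` placeholder are hypothetical, and the sketch makes no distinction between reactants and products.

```python
from collections import defaultdict, deque
from itertools import combinations

def build_gene_network(reactions):
    """Connect every pair of genes whose reactions share at least one metabolite.

    reactions: dict mapping gene name -> set of metabolite names.
    """
    genes_by_metabolite = defaultdict(set)
    for gene, metabolites in reactions.items():
        for metabolite in metabolites:
            genes_by_metabolite[metabolite].add(gene)
    graph = {gene: set() for gene in reactions}
    for genes in genes_by_metabolite.values():
        for a, b in combinations(sorted(genes), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def layers_around(graph, start, max_distance=2):
    """Group genes by shortest-path distance from `start`, up to max_distance."""
    distance = {start: 0}
    layers = defaultdict(set)
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_distance:
            continue  # do not expand beyond the outermost requested layer
        for neighbor in graph[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                layers[distance[neighbor]].add(neighbor)
                queue.append(neighbor)
    return dict(layers)

# Hypothetical linear pathway; "GAP" stands for the orphan activity.
reactions = {
    "E1": {"M1", "M2"},
    "E2": {"M2", "M3"},
    "GAP": {"M3", "M4"},
    "E3": {"M4", "M5"},
    "E4": {"M5", "M6"},
}

graph = build_gene_network(reactions)
layers = layers_around(graph, "GAP")
```

Here `layers[1]` corresponds to the first (yellow) layer around the gap and `layers[2]` to the second (blue) layer.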