Using Machine Learning to Measure Relatedness Between Genes: A Multi-Features Model

Yan Wang et al. Sci Rep. 2019.

Abstract

Measuring conditional relatedness between a pair of genes is a fundamental technique and remains a significant challenge in computational biology. Such relatedness can be assessed by gene expression similarities, but these suffer from high false discovery rates. Meanwhile, other types of features, e.g., prior-knowledge based similarities, are only viable for measuring global relatedness. In this paper, we propose a novel machine learning model, named Multi-Features Relatedness (MFR), for accurately measuring conditional relatedness between a pair of genes by combining expression similarities with prior-knowledge based similarities in an assessment criterion. MFR is used to predict gene-gene interactions extracted from the COXPRESdb, KEGG, HPRD, and TRRUST databases using 10-fold cross-validation and test verification, and to identify gene-gene interactions collected from the GeneFriends and DIP databases for further verification. The results show that MFR achieves the highest area under the curve (AUC) values for identifying gene-gene interactions in the development, test, and DIP datasets. Specifically, it improves precision by 1.1% on average for detecting gene pairs with both high expression similarities and high prior-knowledge based similarities in all datasets, compared with other linear models and coexpression analysis methods. For cancer gene network construction and gene function prediction, MFR also yields results with greater biological significance and higher average prediction accuracy than the other compared models and methods. A website for the MFR model and the relevant datasets can be accessed at http://bmbl.sdstate.edu/MFR.
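
To make the idea of pairing an expression similarity with a prior-knowledge based similarity concrete, here is a minimal sketch. It assumes Pearson correlation as the expression similarity and a Jaccard index over annotation terms as the prior-knowledge similarity. Both choices, and the names `pearson`, `jaccard`, and `pair_features`, are illustrative assumptions; the abstract does not specify which similarity measures the MFR features use.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    # A constant profile has zero variance; treat it as uncorrelated.
    return num / den if den else 0.0


def jaccard(terms_a, terms_b):
    """Jaccard index between two sets of annotation terms (hypothetical prior knowledge)."""
    a, b = set(terms_a), set(terms_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_features(expr_a, expr_b, terms_a, terms_b):
    """Build a two-element feature vector for one gene pair (assumed layout)."""
    return [abs(pearson(expr_a, expr_b)), jaccard(terms_a, terms_b)]


# Toy data: expression values under one set of conditions, plus annotation terms.
features = pair_features(
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.1, 5.9, 8.0],
    {"term1", "term2", "term3"},
    {"term2", "term3", "term4"},
)
print(features)
```

The expression-based feature depends on the samples supplied, so it can change from one condition to another, while the annotation-based feature is the same for a given gene pair regardless of condition.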

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1

Workflow of MFR model. Five steps are in the workflow, including (i) gene pair samples collection, (ii) gene features extraction, (iii) gene pair features calculation, (iv) SVM model construction and (v) verification and discussion.

Figure 2

Structure of the MFR model. The model is based on an SVM and takes 12 similarity-based gene pair features as input. Its output value, the MFR, is applied as an assessment criterion for measuring conditional relatedness between genes.
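
The caption describes an SVM whose output value is used as a relatedness score. The sketch below illustrates that pattern with a small linear SVM trained by subgradient descent on the regularized hinge loss. It is an assumption-laden illustration, not the authors' model: the kernel, hyperparameters, and the 12 actual features are not given here, so the code uses two made-up features per gene pair, and the names `train_linear_svm` and `decision_value` are hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Fit weights w and bias b by batch subgradient descent on the hinge loss.

    X: list of feature vectors; y: labels in {-1, +1}.
    lam, epochs and lr are illustrative defaults, not values from the paper.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # only margin violators contribute to the hinge term
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def decision_value(w, b, x):
    """Signed score of the linear SVM; used here as a relatedness score."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


# Toy gene pairs: two hypothetical similarity features each, scaled to [0, 1].
X = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.9], [0.1, 0.2], [0.2, 0.1], [0.3, 0.2]]
y = [1, 1, 1, -1, -1, -1]  # +1 = interacting pair, -1 = non-interacting pair

w, b = train_linear_svm(X, y)
scores = [decision_value(w, b, x) for x in X]
print(scores)
```

Using the continuous decision value rather than the predicted class lets gene pairs be ranked, which is what a relatedness criterion needs.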

Figure 3

(A) ROCs of nine models or methods for identifying gene-gene interactions by the 10-fold cross-validation. (B) Average PPVs of nine models or methods for detecting B0/B1 matched gene pairs by 10-fold cross-validation.
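
The evaluation quantities named in this caption (area under the ROC curve, positive predictive value, and 10-fold cross-validation splits) can be sketched as follows. This is a generic illustration under stated assumptions: the B0/B1 matching criteria are not defined in this excerpt, so PPV is computed against a plain score threshold, and the function names `auc`, `ppv`, and `k_fold_indices` are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


def ppv(scores, labels, threshold):
    """Positive predictive value: fraction of pairs scored >= threshold that are true."""
    called = [l for s, l in zip(scores, labels) if s >= threshold]
    return sum(called) / len(called) if called else 0.0


def k_fold_indices(n, k=10):
    """Yield (train, test) index lists; fold i holds out every k-th sample from i."""
    for i in range(k):
        test = list(range(i, n, k))
        held_out = set(test)
        train = [j for j in range(n) if j not in held_out]
        yield train, test


# Toy scores for six gene pairs; label 1 marks a known interaction.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
print(auc(scores, labels))
print(ppv(scores, labels, threshold=0.5))
print(sum(1 for _ in k_fold_indices(20, k=10)))
```

In a cross-validation run, each fold's model would be trained on the `train` indices and scored on the `test` indices, with the per-fold metrics then averaged.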

Figure 4

ROCs of nine models or methods for identifying gene-gene interactions in the (A) test, (C) GeneFriends and (E) DIP datasets. Average PPVs of nine models or methods for detecting B0/B1 matched gene pairs in the (B) test, (D) GeneFriends and (F) DIP datasets.

Figure 5

Metabolic pathways are predicted to be directly influenced by increased glutamine and glutamate metabolism in nine BRCA gene networks.

Figure 6

Number of metabolic pathways predicted to be directly influenced by increased glutamine and glutamate metabolism in four cancer types. These pathways were predicted in cancer gene networks, where nodes represent up-regulated metabolic genes and edges represent relatedness between genes, measured by the five linear models and six coexpression analysis methods.

Figure 7

Percentages of L0- and L1-matched selected genes in the nine KEGG metabolic gene networks. In these networks, nodes represent genes involved in KEGG metabolism pathways, and edges represent relatedness between genes, measured by the nine models or methods.
