Machine learning-based predictions of dietary restriction associations across ageing-related genes

Machine-learning-based predictions of caloric restriction associations across ageing-related genes

2021

Caloric restriction (CR) is the most studied pro-longevity intervention; however, a complete understanding of its underlying mechanisms remains elusive, and new research directions may emerge from the identification of novel CR-related genes and CR-related genetic features. This work used a Machine Learning (ML) approach to classify ageing-related genes as CR-related or NotCR-related using 9 different types of predictive features: PathDIP pathways, two types of features based on KEGG pathways, two types of Protein-Protein Interaction (PPI) features, Gene Ontology (GO) terms, Genotype-Tissue Expression (GTEx) expression features, GeneFriends co-expression features and protein sequence descriptors. Our findings suggested that features biased towards curated knowledge (i.e. GO terms and biological pathways) had the greatest predictive power, while unbiased features (mainly gene expression and co-expression data) had the least predictive power. Moreover, a combination of all the fea...

Predicting the Pro-Longevity or Anti-Longevity Effect of Model Organism Genes with New Hierarchical Feature Selection Methods

Ageing is a highly complex biological process that is still poorly understood. With the growing amount of ageing-related data available on the web, in particular concerning the genetics of ageing, it is timely to apply data mining methods to that data in order to discover novel patterns that may assist ageing research. In this work, we introduce new hierarchical feature selection methods for the classification task of data mining and apply them to ageing-related data from four model organisms: Caenorhabditis elegans (worm), Saccharomyces cerevisiae (yeast), Drosophila melanogaster (fly), and Mus musculus (mouse). The main novel aspect of the proposed feature selection methods is that they exploit hierarchical relationships in the set of features (Gene Ontology terms) in order to improve the predictive accuracy of the Naïve Bayes and 1-Nearest Neighbour (1-NN) classifiers, which are used to classify model organisms' genes into pro-longevity or anti-longevity genes. The results show that our hierarchical feature selection methods, when used together with Naïve Bayes and 1-NN classifiers, obtain higher predictive accuracy than the standard (without feature selection) Naïve Bayes and 1-NN classifiers, respectively. We also discuss the biological relevance of a number of Gene Ontology terms very frequently selected by our algorithms in our datasets.
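As a rough illustration of the kind of hierarchical relationship among Gene Ontology features that such methods can exploit, the sketch below shows how an annotation to a specific term implies annotation to all of its ancestors, and how the implied (redundant) terms can be filtered out. This is not the algorithm proposed in the paper: the term names, the tiny hierarchy, and the helper functions `ancestors`, `propagate` and `most_specific` are all hypothetical and chosen only for illustration.

```python
# Toy GO-like hierarchy (hypothetical terms): each term maps to its parents.
PARENTS = {
    "root": set(),
    "metabolism": {"root"},
    "lipid_metabolism": {"metabolism"},
    "signalling": {"root"},
    "insulin_signalling": {"signalling"},
}


def ancestors(term, parents=PARENTS):
    """Return every ancestor of `term` in the hierarchy (term itself excluded)."""
    seen = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def propagate(annotations, parents=PARENTS):
    """Annotation to a term implies annotation to all of its ancestors."""
    full = set(annotations)
    for term in annotations:
        full |= ancestors(term, parents)
    return full


def most_specific(annotations, parents=PARENTS):
    """Keep only terms that are not implied by a more specific annotated term."""
    full = propagate(annotations, parents)
    implied = set()
    for term in full:
        implied |= ancestors(term, parents)
    return full - implied


gene_terms = {"lipid_metabolism", "signalling"}
print(sorted(propagate(gene_terms)))
print(sorted(most_specific(gene_terms)))
```

In this toy setting, the features that remain after `most_specific` carry the same information as the propagated set, which is one simple sense in which a hierarchy makes some features redundant for a given gene.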

Towards finding the linkage between metabolic and age-related disorders using semantic gene data network analysis

Bioinformation, 2016

A metabolic disorder (MD) occurs when the metabolic process is disturbed. This process is carried out by thousands of enzymes participating in numerous inter-dependent metabolic pathways. Critical biochemical reactions that involve the processing and transportation of carbohydrates, proteins and lipids are affected in metabolic diseases. Therefore, it is of interest to identify the common pathways of metabolic disorders by building protein-protein interaction (PPI) networks for analysis. The molecular network linkages between MD and age-related diseases (ARD) are intriguing. Hence, we created networks of protein-protein interactions related to MD and ARD using relevant known data in the public domain. The network analysis identified known MD-associated proteins and predicted ARD genes and/or their products in common pathways. The genes in the common pathways were isolated from the network and further analyzed for their co-localization and shared domains. Thus, a model hy...
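The idea of comparing two PPI networks to find what they share can be sketched in a few lines. The example below is only a minimal illustration under stated assumptions: the edge lists and protein names (`P1` to `P6`) are placeholders rather than data from the study, and plain set intersection stands in for whatever network analysis the authors actually performed.

```python
# Hypothetical edge lists; protein names are placeholders, not study data.
md_edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P4")]
ard_edges = [("P4", "P3"), ("P4", "P5"), ("P2", "P6")]


def nodes(edges):
    """Collect every protein that appears in at least one interaction."""
    return {protein for edge in edges for protein in edge}


def undirected(edges):
    """Treat interactions as unordered pairs so (A, B) equals (B, A)."""
    return {frozenset(edge) for edge in edges}


# Proteins and interactions present in both networks.
shared_proteins = nodes(md_edges) & nodes(ard_edges)
shared_interactions = undirected(md_edges) & undirected(ard_edges)

print(sorted(shared_proteins))
print([sorted(edge) for edge in shared_interactions])
```

Storing each interaction as a `frozenset` is a design choice that makes the comparison insensitive to the order in which the two partners are listed.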

An Extensive Empirical Comparison of Probabilistic Hierarchical Classifiers in Datasets of Ageing-Related Genes

IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2015

This study comprehensively evaluates the performance of 5 types of probabilistic hierarchical classification methods used for predicting Gene Ontology (GO) terms related to ageing. Of those tested, a new hybrid of a Local Hierarchical Classifier (LHC) and the Predictive Clustering Tree algorithm (LHC-PCT) had the best predictive accuracy results. We also tested the impact of two types of variations in most hierarchical classification algorithms, namely: (a) changing the base algorithm (we tested Naive Bayes and Support Vector Machines), and (b) using or not using the Correlation-based Feature Selection (CFS) algorithm in a pre-processing step. In total, we evaluated the predictive performance of 17 variations of hierarchical classifiers across 15 datasets of ageing and longevity-related genes. We conclude that the LHC-PCT algorithm ranks better across several tests (7 out of 12). In addition, we interpreted the models generated by the PCT algorithm to show how hierarchical cl...

Supervised Machine Learning Models and Protein-Protein Interaction Network Analysis of Gene Expression Profiles Induced by Omega-3 Polyunsaturated Fatty Acids

Current Chinese Science, 2022

Background: Omega-3 polyunsaturated fatty acids (PUFAs), such as eicosapentaenoic (EPA) and docosahexaenoic (DHA) acids, have beneficial effects on human health, but their effect on gene expression in elderly individuals (age ≥ 65) is largely unknown. In order to examine this, the gene expression profiles were analyzed in healthy subjects (n = 96) at baseline and after 26 weeks of supplementation with EPA+DHA to determine up-regulated and down-regulated differentially expressed genes (DEGs) triggered by PUFAs. The protein-protein interaction (PPI) networks were constructed by mapping these DEGs to a human interactome and linking them to the specific pathways. Objective: This study aimed to implement supervised machine learning models and protein-protein interaction network analysis of gene expression profiles induced by PUFAs. Methods: The transcriptional profile of GSE12375 was obtained from the Gene Expression Omnibus database, which is based on the Affymetrix NuGO array. Th...

Supervised machine learning with feature selection for prioritization of targets related to time-based cellular dysfunction in aging

Background: Global life expectancy has been increasing without a corresponding increase in health span and with greater risk for aging-associated diseases such as Alzheimer’s disease (AD). An urgent need to delay the onset of aging-associated diseases has arisen, and a dramatic increase in the number of potential molecular targets has led to the challenge of prioritizing targets to promote successful aging. Here, we developed a pipeline to prioritize aging-related genes which integrates the plethora of publicly available genomic, transcriptomic, proteomic and morphological data of C. elegans by applying a supervised machine learning approach. Additionally, a unique biological post-processing analysis of the computational output was performed to better reveal the prioritized gene’s function within the context of pathways and processes involved in aging across the lifespan of C. elegans. Results: Four known aging-related genes (daf-2, involved in insulin signaling; let-363 and rsks-1, invo...

Human Ageing Genomic Resources: Integrated databases and tools for the biology and genetics of ageing

Nucleic Acids Research, 2013

The Human Ageing Genomic Resources (HAGR, http://genomics.senescence.info) is a freely available online collection of research databases and tools for the biology and genetics of ageing. HAGR now features several databases with high-quality manually curated data: (i) GenAge, a database of genes associated with ageing in humans and model organisms; (ii) AnAge, an extensive collection of longevity records and complementary traits for >4000 vertebrate species; and (iii) GenDR, a newly incorporated database, containing both gene mutations that interfere with dietary restriction-mediated lifespan extension and consistent gene expression changes induced by dietary restriction. Since its creation about 10 years ago, major efforts have been undertaken to maintain the quality of data in HAGR, while continuing to develop, improve and extend it. This article briefly describes the content of HAGR and details the major updates since its previous publications, in terms of both structure and content. The completely redesigned interface, more intuitive and more integrative of HAGR resources, is also presented. Altogether, we hope that through its improvements, the current version of HAGR will continue to provide users with the most comprehensive and accessible resources available today in the field of biogerontology.

Predicting Protein Relationships to Human Pathways through a Relational Learning Approach based on Simple Sequence Features

IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2014

Biological pathways are important elements of systems biology, and in the past decade an increasing number of pathway databases have been set up to document the growing understanding of complex cellular processes. Although more genome-sequence data are becoming available, a large fraction of it remains functionally uncharacterized. Thus, it is important to be able to predict the mapping of poorly annotated proteins to original pathway models. Results: We have developed a Relational Learning-based Extension (RLE) system to investigate pathway membership through a function prediction approach that mainly relies on combinations of simple properties attributed to each protein. RLE searches for proteins with molecular similarities to specific pathway components. Using RLE, we associated 383 uncharacterized proteins to 28 pre-defined human Reactome pathways, demonstrating relative confidence after proper evaluation. Indeed, in specific cases manual inspection of the database annotations and the related literature supported the proposed classifications. Examples of possible additional components of the Electron transport system, Telomere maintenance and Integrin cell surface interactions pathways are discussed in detail. Availability: All the predicted human proteins for Reactome releases 30 (2009) and 40 (2012) are available at http://rle.bioinfo.cnio.es.