Predicting the Pro-Longevity or Anti-Longevity Effect of Model Organism Genes with New Hierarchical Feature Selection Methods

An Extensive Empirical Comparison of Probabilistic Hierarchical Classifiers in Datasets of Ageing-Related Genes

IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM, 2015

This study comprehensively evaluates the performance of 5 types of probabilistic hierarchical classification methods used for predicting Gene Ontology (GO) terms related to ageing. Of those tested, a new hybrid of a Local Hierarchical Classifier (LHC) and the Predictive Clustering Tree algorithm (LHC-PCT) achieved the best predictive accuracy. We also tested the impact of two types of variation in most hierarchical classification algorithms, namely: (a) changing the base algorithm (we tested Naive Bayes and Support Vector Machines), and (b) using or omitting the Correlation-based Feature Selection (CFS) algorithm in a pre-processing step. In total, we evaluated the predictive performance of 17 variations of hierarchical classifiers across 15 datasets of ageing and longevity-related genes. We conclude that the LHC-PCT algorithm ranks better across several tests (7 out of 12). In addition, we interpreted the models generated by the PCT algorithm to show how hierarchical cl...

Supervised machine learning with feature selection for prioritization of targets related to time-based cellular dysfunction in aging

Background: Global life expectancy has been increasing without a corresponding increase in health span and with greater risk for aging-associated diseases such as Alzheimer’s disease (AD). An urgent need to delay the onset of aging-associated diseases has arisen and a dramatic increase in the number of potential molecular targets has led to the challenge of prioritizing targets to promote successful aging. Here, we developed a pipeline to prioritize aging-related genes which integrates the plethora of publicly available genomic, transcriptomic, proteomic and morphological data of C. elegans by applying a supervised machine learning approach. Additionally, a unique biological post-processing analysis of the computational output was performed to better reveal the prioritized gene’s function within the context of pathways and processes involved in aging across the lifespan of C. elegans. Results: Four known aging-related genes — daf-2, involved in insulin signaling; let-363 and rsks-1, invo...

Taxonomy of Ageing and Non-Ageing Genes by Means of General Data Mining Techniques

International Journal of Computer Applications, 2013

Classifying DNA repair genes as ageing-related or non-ageing-related is a vital step in identifying faulty genes. The human genes to be classified as ageing or non-ageing number over ten thousand, and the proportion of ageing genes in the human genome is very small. Ageing genes need to be classified accurately in order to understand the complex processes occurring in living organisms. Data mining approaches are routinely applied to classify DNA repair genes using various characteristics and features. This paper proposes building classification models that discriminate between ageing-related and non-ageing-related DNA repair genes. To make better use of the genes' different properties, classification performance should be evaluated by applying different kinds of classification algorithms, such as pruning, multiperceptron and Logistics. The results should be helpful for biomedical researchers, gene analysts, patients and other kinds of end users.

Gene Categories Differentially Expressed in C. elegans Age-1 Mutants of Extraordinary Longevity: New Insights From Novel Data-Mining Procedures

The journals of gerontology. Series A, Biological sciences and medical sciences, 2011

Two nonsense mutants of age-1, the Caenorhabditis elegans gene encoding phosphoinositide 3-kinase, live nearly 10-fold longer than wild-type controls and are exceptionally resistant to several stresses. Genome-wide expression analyses implicated downregulation of many more genes than were upregulated in second-generation age-1 homozygotes. Functional-annotation analysis, based on Gene Ontology terms, suggested that novel mechanisms may mediate the stronger phenotypes observed for these worms than with milder age-1 disruption. For the current study, the same microarray data were reanalyzed using novel meta-analytic procedures that we developed recently. First, gene p values were corrected for systematic biases based on the observed distribution for nonexpressed genes; these values were then combined to derive an aggregate p value for each functional-annotation term while adjusting for intergene covariance. This resulted in much better coverage of relevant gene categories, including m...

ACO-Based Bayesian network ensembles for the hierarchical classification of ageing-related proteins

The task of predicting protein functions using computational techniques is a major research area in the field of bioinformatics. Casting the task into a classification problem makes it challenging, since the classes (functions) to be predicted are hierarchically related, and a protein can have more than one function. One approach is to produce a set of local classifiers; each is responsible for discriminating between a subset of the classes in a certain level of the hierarchy. In this paper we tackle the hierarchical classification problem in a local fashion, by learning an ensemble of Bayesian network classifiers for each class in the hierarchy and combining their outputs with four alternative methods: a) selecting the best classifier, b) majority voting, c) weighted voting, and d) constructing a meta-classifier. The ensemble is built using ABC-Miner, our recently introduced Ant-based Bayesian Classification algorithm. We use different types of protein representations to learn diff...
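As a rough illustration of two of the combination methods named above (majority voting and weighted voting), the following minimal sketch combines the class labels predicted by several classifiers for a single class in the hierarchy. It is not ABC-Miner and not the authors' implementation: the function names, the toy labels and the weights (which could stand for, say, each classifier's validation accuracy) are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def majority_vote(predictions):
    """Return the label predicted by the most classifiers.

    Ties are broken in favour of the label that appears first
    (Counter.most_common keeps first-seen order for equal counts).
    """
    counts = Counter(predictions)
    return counts.most_common(1)[0][0]


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight.

    `weights` is a hypothetical per-classifier score, e.g. an
    accuracy estimate; it is an assumption of this sketch.
    """
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


# Toy example: three ensemble members vote on one class ("yes"/"no").
predictions = ["yes", "no", "no"]
weights = [0.9, 0.3, 0.4]  # hypothetical classifier weights

majority_label = majority_vote(predictions)
weighted_label = weighted_vote(predictions, weights)
```

In this toy case the two rules disagree: two of the three members say "no", but the single "yes" vote carries more total weight (0.9 against 0.7), which shows why the choice of combination method can change the ensemble's output.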

Prediction of C. elegans Longevity Genes by Human and Worm Longevity Networks

PLoS ONE, 2012

Intricate and interconnected pathways modulate longevity, but screens to identify the components of these pathways have not been saturating. Because biological processes are often executed by protein complexes and fine-tuned by regulatory factors, the first-order protein-protein interactors of known longevity genes are likely to participate in the regulation of longevity. Data-rich maps of protein interactions have been established for many cardinal organisms such as yeast, worms, and humans. We propose that these interaction maps could be mined for the identification of new putative regulators of longevity. For this purpose, we have constructed longevity networks in both humans and worms. We reasoned that the essential first-order interactors of known longevity-associated genes in these networks are more likely to have longevity phenotypes than randomly chosen genes. We have used C. elegans to determine whether post-developmental inactivation of these essential genes modulates lifespan. Our results suggest that the worm and human longevity networks are functionally relevant and possess a high predictive power for identifying new longevity regulators.
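The core idea of mining an interaction map for first-order interactors of known longevity genes can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the edge list is invented toy data (not real interactions), `geneA`, `geneB` and `geneC` are placeholder names, and the essentiality filter described in the abstract is omitted.

```python
from collections import defaultdict


def first_order_interactors(edges, known_genes):
    """Map each candidate gene to the known longevity genes it touches.

    `edges` is an iterable of undirected (gene, gene) interaction pairs.
    A candidate is any gene that is not itself in `known_genes` but
    interacts directly with at least one gene that is.
    """
    known = set(known_genes)

    # Build an undirected adjacency map from the edge list.
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Collect direct partners of known genes, skipping known genes themselves.
    candidates = {}
    for gene in known:
        for partner in neighbours[gene]:
            if partner not in known:
                candidates.setdefault(partner, set()).add(gene)
    return candidates


# Toy interaction map (hypothetical edges, for illustration only).
edges = [
    ("daf-2", "geneA"),
    ("age-1", "geneA"),
    ("age-1", "geneB"),
    ("geneB", "geneC"),  # geneC is two steps away, so it is not a candidate
]
known_longevity_genes = ["daf-2", "age-1"]

candidates = first_order_interactors(edges, known_longevity_genes)
```

Recording which known genes each candidate touches, rather than only the candidate set, makes it possible to rank candidates afterwards, for example by the number of known longevity genes they interact with.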

An evidence-based approach to identify aging-related genes in Caenorhabditis elegans

BMC Bioinformatics, 2015

Background: Extensive studies have been carried out on Caenorhabditis elegans as a model organism to elucidate mechanisms of aging and the effects of perturbing known aging-related genes on lifespan and behavior. This research has generated large amounts of experimental data that is increasingly difficult to integrate and analyze with existing databases and domain knowledge. To address this challenge, we demonstrate a scalable and effective approach for automatic evidence gathering and evaluation that leverages existing experimental data and literature-curated facts to identify genes involved in aging and lifespan regulation in C. elegans. Results: We developed a semantic knowledge base for aging by integrating data about C. elegans genes from WormBase with data about 2005 human and model organism genes from GenAge and 149 genes from GenDR, and with the Bio2RDF network of linked data for the life sciences. Using HyQue (a Semantic Web tool for hypothesis-based querying and evaluation) to interrogate this knowledge base, we examined 48,231 C. elegans genes for their role in modulating lifespan and aging. HyQue identified 24 novel but well-supported candidate aging-related genes for further experimental validation. Conclusions: We use semantic technologies to discover candidate aging genes whose effects on lifespan are not yet well understood. Our customized HyQue system, the aging research knowledge base it operates over, and HyQue evaluations of all C. elegans genes are freely available at http://hyque.semanticscience.org.

Machine learning-based predictions of dietary restriction associations across ageing-related genes

BMC Bioinformatics, 2022

Background: Dietary restriction (DR) is the most studied pro-longevity intervention; however, a complete understanding of its underlying mechanisms remains elusive, and new research directions may emerge from the identification of novel DR-related genes and DR-related genetic features. Results: This work used a Machine Learning (ML) approach to classify ageing-related genes as DR-related or NotDR-related using 9 different types of predictive features: PathDIP pathways, two types of features based on KEGG pathways, two types of Protein–Protein Interactions (PPI) features, Gene Ontology (GO) terms, Genotype Tissue Expression (GTEx) expression features, GeneFriends co-expression features and protein sequence descriptors. Our findings suggested that features biased towards curated knowledge (i.e. GO terms and biological pathways) had the greatest predictive power, while unbiased features (mainly gene expression and co-expression data) had the least predictive power. Moreover, a combinat...

Predicting Aging/Longevity-Related Genes in the Nematode Caenorhabditis elegans

We present a novel mathematical/computational strategy for predicting genes/proteins associated with aging/longevity. The novelty of our method arises from the topological analysis of an organismal longevity gene/protein network (LGPN), which extends the existing cellular networks. The LGPN nodes represent both genes and corresponding proteins. Links stand for all known interactions between the nodes. The LGPN of C. elegans incorporated 362 genes/proteins, 160 connecting and 202 age-related ones, from a list of 321 with known impact on aging/longevity. A longevity core of 129 directly interacting genes or proteins was identified. This core may shed light on the large-scale mechanisms of aging. Predictions were made, based upon the finding that LGPN hubs and centrally located nodes have higher likelihoods of being associated with aging/longevity than do randomly selected nodes. Analysis singled out 15 potential aging/longevity-related genes for further examination: mpk-1,