Biomolecular Systems of Disease Buried Across Multiple GWAS Unveiled by Information Theory and Ontology

Analysis of genome-wide association study data using the protein knowledge base

BMC Genetics, 2011

Background: Genome-wide association studies (GWAS) aim to identify causal variants and genes for complex disease by independently testing a large number of SNP markers for disease association. Although genes have been implicated in these studies, few utilise the multiple-hit model of complex disease to identify causal candidates. A major benefit of multi-locus comparison is that it compensates for some shortcomings of current statistical analyses, which test the frequency of each SNP in isolation in the phenotype population versus controls. Results: Here we developed and benchmarked several protocols for GWAS data analysis using different in-silico gene prediction and prioritisation methodologies. We adopted a high-sensitivity approach to the data, using less conservative statistical SNP associations. Multiple gene search spaces, either fixed-width or proximity-based, were generated around each SNP marker. We used the candidate disease gene prediction system Gentrepid to identify candidates based on shared biomolecular pathways or domain-based protein homology. Predictions were made either with phenotype-specific known disease genes as input, or without a priori knowledge, by exhaustive comparison of genes in distinct loci. Because Gentrepid uses biomolecular data to find interactions and common features between genes in distinct loci of the search spaces, it takes advantage of the multi-locus aspect of the data.
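The two kinds of search space mentioned in this abstract can be illustrated with a minimal Python sketch. It is not Gentrepid's implementation, which the abstract does not describe. The `Gene` record, both function names, and the reading of "proximity-based" as "the k genes nearest to the SNP" are assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    # Hypothetical minimal gene record; coordinates are in base pairs.
    name: str
    chrom: str
    start: int
    end: int


def fixed_width_space(genes, chrom, snp_pos, width):
    """Return names of genes overlapping a window of `width` bp centred on the SNP."""
    half = width // 2
    lo, hi = snp_pos - half, snp_pos + half
    return [
        g.name
        for g in genes
        if g.chrom == chrom and g.start <= hi and g.end >= lo
    ]


def nearest_genes_space(genes, chrom, snp_pos, k=1):
    """Return names of the k genes closest to the SNP on the same chromosome.

    Assumption: 'proximity-based' is read here as a k-nearest-gene rule.
    """
    def distance(g):
        if g.start <= snp_pos <= g.end:
            return 0  # the SNP lies inside the gene
        return min(abs(g.start - snp_pos), abs(g.end - snp_pos))

    same_chrom = [g for g in genes if g.chrom == chrom]
    return [g.name for g in sorted(same_chrom, key=distance)[:k]]


# Toy data, invented for the example.
genes = [
    Gene("A", "1", 100, 200),
    Gene("B", "1", 1000, 1100),
    Gene("C", "2", 150, 160),
]
window_hits = fixed_width_space(genes, "1", snp_pos=250, width=200)
nearest_hits = nearest_genes_space(genes, "1", snp_pos=900, k=1)
```

A fixed window can return no genes in a gene-poor region, whereas a nearest-gene rule always returns candidates, which is one reason a protocol might generate both kinds of search space.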

Complex-disease networks of trait-associated single-nucleotide polymorphisms (SNPs) unveiled by information theory

Journal of the American Medical Informatics Association, 2012

Objective: Thousands of complex-disease single-nucleotide polymorphisms (SNPs) have been discovered in genome-wide association studies (GWAS). However, these intragenic SNPs have not been collectively mined to unveil the genetic architecture between complex clinical traits. The authors hypothesize that biological annotations of host genes of trait-associated SNPs may reveal the biomolecular modularity across complex-disease traits and offer insights for drug repositioning.

A data driven approach reveals disease similarity on a molecular level

npj Systems Biology and Applications

Could there be unexpected similarities between different studies, diseases, or treatments, on a molecular level due to common biological mechanisms involved? To answer this question, we develop a method for computing similarities between empirical, statistical distributions of high-dimensional, low-sample datasets, and apply it on hundreds of -omics studies. The similarities lead to dataset-to-dataset networks visualizing the landscape of a large portion of biological data. Potentially interesting similarities connecting studies of different diseases are assembled in a disease-to-disease network. Exploring it, we discover numerous nontrivial connections between Alzheimer's disease and schizophrenia, asthma and psoriasis, or liver cancer and obesity, to name a few. We then present a method that identifies the molecular quantities and pathways that contribute the most to the identified similarities and could point to novel drug targets or provide biological insights. The proposed method acts as a "statistical telescope" providing a global view of the constellation of biological data; readers can peek through it at: http://datascope.csd.uoc.gr:25000/.

Linking disease associations with regulatory information in the human genome

Genome …, 2012

Genome-wide association studies have been successful in identifying single nucleotide polymorphisms (SNPs) associated with a large number of phenotypes. However, an associated SNP is likely part of a larger region of linkage disequilibrium. This makes it difficult to precisely identify the SNPs that have a biological link with the phenotype. We have systematically investigated the association of multiple types of ENCODE data with disease-associated SNPs and show that there is significant enrichment for functional SNPs among the currently identified associations. This enrichment is strongest when integrating multiple sources of functional information and when highest confidence disease-associated SNPs are used. We propose an approach that integrates multiple types of functional data generated by the ENCODE Consortium to help identify "functional SNPs" that may be associated with the disease phenotype. Our approach generates putative functional annotations for up to 80% of all previously reported associations. We show that for most associations, the functional SNP most strongly supported by experimental evidence is a SNP in linkage disequilibrium with the reported association rather than the reported SNP itself. Our results show that the experimental data sets generated by the ENCODE Consortium can be successfully used to suggest functional hypotheses for variants associated with diseases and other phenotypes.

Network Cluster Analysis of PPI and Phenotype Ontology for Type 1 Diabetes Mellitus

PubMed, 2024

Background: Our knowledge of Type 1 Diabetes Mellitus (T1DM) etiology is incomplete; however, the pathogenesis of the disease includes T-cell-mediated destruction of β-cells. Objective: The present study aimed to investigate the key gene pathways and co-expression networks in T1DM. Material and methods: T1DM-associated genes were identified from 13 databases, followed by pathway enrichment with functional annotation and analysis of protein-protein interaction networks. Next, functional modules and transcription factor networks were constructed. The analysis of gene co-expression networks was conducted to discover associated pivotal modules. Results: A total of 172 expressed genes and four variants (SNPs) were filtered for T1DM; pathway enrichment analysis identified key pathways, such as inflammatory bowel disease, type I diabetes mellitus, cytokine-cytokine receptor interaction, Th17 cell differentiation, JAK-STAT signaling pathway, and graft-versus-host disease. A weighted correlation network analysis revealed one module that was strongly correlated with T1DM. Functional annotation revealed that the module was mainly enriched in pathways such as T cell activation, regulation of immune system process, and response to organic substance. IRF2, IRF4, IRF8, and CDX2 were regulated in the module at a significant level. Conclusion: The study identified IL-2 as a significant T1DM hotspot and highlighted the role of hub genes and transcription factors in the autoimmune disease, offering potential for treatment and prevention.

Reduction of redundancy in diabetes-related gene categories utilizing conceptual similarity and gene expression information

International journal of health sciences

Insulin resistance or insulin deficiency can cause diabetes, which is characterized by a high level of glucose in the blood (American Diabetes Association 2014). It is one of the most pressing issues in modern public health. Around 366 million individuals had diabetes in 2011, according to the International Diabetes Federation (IDF). This figure is expected to rise to 552 million by the end of 2030. In this study, the average of the semantic measure score and the Pearson correlation was used to create a similarity index, from which gene similarity can be determined. "One" indicates perfect similarity between two genes, whereas "zero" indicates perfect dissimilarity. Algorithms based on Pearson and semantic measures were used to identify similar genes. The algorithm was written in Perl (Practical Extraction and Report Language). The Perl program computed gene-to-gene similarity and progressively removed redundant information. An algorithm known as &...
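The similarity index described in this abstract, an average of a semantic score and a Pearson correlation, can be sketched as follows. This is an illustrative Python version, not the authors' Perl code. The function names are hypothetical, and the semantic score is assumed to be supplied already in the range [0, 1]. Because a Pearson correlation ranges from -1 to 1 while the index is said to run from zero to one, the sketch clips negative correlations to zero; the abstract does not say how the authors handled this.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


def similarity_index(semantic_score, expr_a, expr_b):
    """Average of a semantic score in [0, 1] and a clipped Pearson correlation.

    Assumption: negative correlations are clipped to 0 so the index stays in [0, 1].
    """
    r = max(0.0, pearson(expr_a, expr_b))
    return (semantic_score + r) / 2


# Toy expression profiles, invented for the example.
identical_trend = similarity_index(1.0, [1, 2, 3], [2, 4, 6])
opposite_trend = similarity_index(0.0, [1, 2, 3], [3, 2, 1])
```

With these toy inputs, two genes that share annotations and co-vary perfectly score at the top of the range, while genes with no shared annotation and opposite expression trends score at the bottom.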