Open Targets Genetics: An open approach to systematically prioritize causal variants and genes at all published human GWAS trait-associated loci

PhenoScanner: a database of human genotype-phenotype associations

Bioinformatics (Oxford, England), 2016

PhenoScanner is a curated database of publicly available results from large-scale genetic association studies. This tool aims to facilitate "phenome scans", the cross-referencing of genetic variants with many phenotypes, to aid understanding of disease pathways and biology. The database currently contains over 350 million association results and over 10 million unique genetic variants, mostly single nucleotide polymorphisms. It is accompanied by a web-based tool that queries the database for associations with user-specified variants, providing results according to the same effect and non-effect alleles for each input variant. The tool provides the option of searching for trait associations with proxies of the input variants, calculated using the European samples from 1000 Genomes and HapMap. PhenoScanner is available at www.phenoscanner.medschl.cam.ac.uk. Contact: jrs95@medschl.cam.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online.
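Reporting results "according to the same effect and non-effect alleles" implies that effects reported against the opposite allele have to be flipped. The following is a minimal sketch of that idea, not PhenoScanner's own code: the function name, the record fields and the example values are all hypothetical, and effects are assumed to be on an additive scale (a regression coefficient or log odds ratio) so that swapping alleles only changes the sign.

```python
# Hypothetical sketch: express association effects relative to one chosen
# effect allele. Field names and values are illustrative assumptions.

def align_to_effect_allele(beta, effect_allele, other_allele, target_allele):
    """Return beta re-expressed so that target_allele is the effect allele.

    beta is assumed to be on an additive scale, so swapping the effect and
    non-effect alleles flips its sign. Strand flips are NOT handled here.
    """
    effect = effect_allele.upper()
    other = other_allele.upper()
    target = target_allele.upper()
    if target == effect:
        return beta
    if target == other:
        return -beta
    raise ValueError(f"allele {target!r} matches neither {effect!r} nor {other!r}")


# Made-up results for one variant, reported with different effect alleles.
results = [
    {"trait": "trait_1", "beta": 0.10, "effect": "A", "other": "G"},
    {"trait": "trait_2", "beta": 0.25, "effect": "G", "other": "A"},
]
aligned = {
    r["trait"]: align_to_effect_allele(r["beta"], r["effect"], r["other"], "A")
    for r in results
}
print(aligned)
```

A real pipeline would also need to deal with strand ambiguity for A/T and C/G variants, which this sketch deliberately leaves out.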

VaDE: a manually curated database of reproducible associations between various traits and human genomic polymorphisms

Nucleic acids research, 2015

Genome-wide association studies (GWASs) have identified numerous single nucleotide polymorphisms (SNPs) associated with the development of common diseases. However, it is clear that genetic risk factors of common diseases are heterogeneous among human populations. Therefore, we developed a database of genomic polymorphisms that are reproducibly associated with disease susceptibilities, drug responses and other traits for each human population: 'VarySysDB Disease Edition' (VaDE; http://bmi-tokai.jp/VaDE/). SNP-trait association data were obtained from the National Human Genome Research Institute GWAS (NHGRI GWAS) catalog and RAvariome, and we added detailed information of sample populations by curating original papers. In addition, we collected and curated original papers, and registered the detailed information of SNP-trait associations in VaDE. Then, we evaluated reproducibility of associations in each population by counting the number of significantly associated studies. V...
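Evaluating reproducibility "by counting the number of significantly associated studies" in each population can be illustrated with a short tally. This is only a sketch under assumptions: the record layout, the significance threshold `alpha` and the example studies are hypothetical, and the abstract does not state the criteria VaDE actually applies.

```python
# Hypothetical sketch: count significant studies per (SNP, trait, population).
from collections import defaultdict


def count_significant_studies(records, alpha=5e-8):
    """Tally studies with p_value < alpha for each SNP/trait/population key.

    alpha defaults to the conventional genome-wide threshold; this choice is
    an assumption, not a value taken from the VaDE abstract.
    """
    counts = defaultdict(int)
    for rec in records:
        if rec["p_value"] < alpha:
            counts[(rec["snp"], rec["trait"], rec["population"])] += 1
    return dict(counts)


# Made-up study records for illustration only.
records = [
    {"snp": "rs_a", "trait": "trait_x", "population": "pop_1", "p_value": 1e-9},
    {"snp": "rs_a", "trait": "trait_x", "population": "pop_1", "p_value": 3e-10},
    {"snp": "rs_a", "trait": "trait_x", "population": "pop_2", "p_value": 0.02},
    {"snp": "rs_b", "trait": "trait_x", "population": "pop_2", "p_value": 4e-8},
]
counts = count_significant_studies(records)
print(counts)
```

Under this toy definition, an association with a count of two or more in a population would be the natural candidate for "reproducible" in that population.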

TAGOOS: genome-wide supervised learning of non-coding loci associated to complex phenotypes

Nucleic Acids Research, 2019

Genome-wide association studies (GWAS) associate single nucleotide polymorphisms (SNPs) with complex phenotypes. Most human SNPs fall in non-coding regions and are likely regulatory SNPs, but linkage disequilibrium (LD) blocks make it difficult to distinguish functional SNPs. Therefore, putative functional SNPs are usually annotated with molecular markers of gene regulatory regions and prioritized with dedicated prediction tools. We integrated associated SNPs, LD blocks and regulatory features into a supervised model called TAGOOS (TAG SNP bOOSting) and computed scores genome-wide. The TAGOOS scores enriched and prioritized unseen associated SNPs with odds ratios of 4.3 and 3.5 and areas under the curve (AUC) of 0.65 and 0.6 for intronic and intergenic regions, respectively. The TAGOOS score was correlated with the maximal significance of associated SNPs and expression quantitative trait loci (eQTLs) and with the number of biological samples annotated for key regulatory features. Analysis of loci and regions associated with cleft lip and human adult height phenotypes recovered known functional loci and predicted new functional loci enriched in transcription factors related to the phenotypes. In conclusion, we trained a supervised model based on associated SNPs to prioritize putative functional regions. The TAGOOS scores, annotations and UCSC genome tracks are available here: https://tagoos.readthedocs.io.
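The two evaluation measures quoted in this abstract, the odds ratio and the AUC, can be computed as follows. This is a generic sketch of the standard definitions, not the TAGOOS evaluation code; the function names, the 2x2 table layout and the toy numbers are assumptions made for illustration.

```python
# Generic sketch of an enrichment odds ratio and a rank-based AUC.

def odds_ratio(hi_assoc, hi_not, lo_assoc, lo_not):
    """Odds ratio of a 2x2 table.

    hi_assoc / hi_not: high-scoring SNPs that are / are not associated.
    lo_assoc / lo_not: low-scoring SNPs that are / are not associated.
    """
    return (hi_assoc * lo_not) / (hi_not * lo_assoc)


def auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative.

    Ties count as one half. This pairwise form is quadratic in the input
    size, which is fine for a small illustration.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy numbers, not taken from the paper.
print(odds_ratio(30, 10, 20, 40))
print(auc([0.9, 0.8, 0.4], [0.7, 0.3]))
```

An AUC of 0.5 corresponds to a score that ranks associated and non-associated SNPs no better than chance, which gives a reference point for the reported values of 0.65 and 0.6.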

GWASdb: a database for human genetic variants identified by genome wide association studies

Recent advances in genome-wide association studies (GWAS) have enabled us to identify thousands of genetic variants (GVs) that are associated with human diseases. As next-generation sequencing technologies become less expensive, more GVs will be discovered in the near future. Existing databases, such as NHGRI GWAS Catalog, collect GVs with only genome-wide level significance. However, many true disease susceptibility loci have relatively moderate P values and are not included in these databases. We have developed GWASdb that contains 20 times more data than the GWAS Catalog and includes less significant GVs (P < 1.0 × 10⁻³) manually curated from the literature. In addition, GWASdb provides comprehensive functional annotations for each GV, including genomic mapping information, regulatory effects (transcription factor binding sites, microRNA target sites and splicing sites), amino acid substitutions, evolution, gene expression and disease associations. Furthermore, GWASdb classifies these GVs according to diseases using Disease-Ontology Lite and Human Phenotype Ontology. It can conduct pathway enrichment and PPI network association analysis for these diseases. GWASdb provides an intuitive, multifunctional database for biologists and clinicians to explore GVs and their functional inferences. It is freely available at http://jjwanglab.org/gwasdb and will be updated frequently.
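The difference between a genome-wide significance cutoff and the more permissive cutoff described here can be made concrete with a small classifier. This is a sketch only: the moderate threshold of 1.0e-3 comes from the abstract, whereas the genome-wide value of 5e-8 is the conventional threshold and is an assumption here, as are the function name, the labels and the example P values.

```python
# Hypothetical sketch: bucket variants by P value under two thresholds.

GENOME_WIDE_P = 5e-8  # conventional cutoff; assumed, not stated in the abstract
MODERATE_P = 1.0e-3   # inclusion cutoff quoted in the abstract


def classify_association(p_value):
    """Label a P value as genome-wide, moderate, or not retained."""
    if p_value < GENOME_WIDE_P:
        return "genome-wide"
    if p_value < MODERATE_P:
        return "moderate"
    return "not retained"


# Made-up P values for illustration.
example_p_values = {"variant_1": 2e-12, "variant_2": 4e-5, "variant_3": 0.03}
labels = {name: classify_association(p) for name, p in example_p_values.items()}
print(labels)
```

Only the first bucket would appear in a resource restricted to genome-wide significance, while the first two would be kept under the more permissive cutoff.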

Are genome-wide association studies all that we need to dissect the genetic component of complex human diseases?

European Journal of Human Genetics, 2006

With the availability of dense maps of anonymous and frequent SNPs spanning the whole human genome, genome-wide association studies are now becoming a reality. In this paper, we discuss the utility of these approaches to detect genetic risk variants involved in complex disease susceptibility and, in the best case scenario where a signal is detected, how helpful it will be to the understanding of the pathological process.

Linking disease associations with regulatory information in the human genome

Genome …, 2012

Genome-wide association studies have been successful in identifying single nucleotide polymorphisms (SNPs) associated with a large number of phenotypes. However, an associated SNP is likely part of a larger region of linkage disequilibrium. This makes it difficult to precisely identify the SNPs that have a biological link with the phenotype. We have systematically investigated the association of multiple types of ENCODE data with disease-associated SNPs and show that there is significant enrichment for functional SNPs among the currently identified associations. This enrichment is strongest when integrating multiple sources of functional information and when highest confidence disease-associated SNPs are used. We propose an approach that integrates multiple types of functional data generated by the ENCODE Consortium to help identify "functional SNPs" that may be associated with the disease phenotype. Our approach generates putative functional annotations for up to 80% of all previously reported associations. We show that for most associations, the functional SNP most strongly supported by experimental evidence is a SNP in linkage disequilibrium with the reported association rather than the reported SNP itself. Our results show that the experimental data sets generated by the ENCODE Consortium can be successfully used to suggest functional hypotheses for variants associated with diseases and other phenotypes.
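The idea that the best-supported functional SNP may be one in linkage disequilibrium with the reported SNP, rather than the reported SNP itself, can be sketched as a simple selection step. This is an illustration under assumptions, not the authors' method: the function name, the r² cutoff of 0.8, the integer "functional evidence" scores and the SNP identifiers are all hypothetical.

```python
# Hypothetical sketch: pick the best-supported SNP among the reported SNP
# and its LD proxies.

def best_functional_snp(lead_snp, ld_r2, functional_score, r2_min=0.8):
    """Return the candidate SNP with the highest functional evidence score.

    ld_r2 maps other SNPs to their r^2 with lead_snp; SNPs with r^2 below
    r2_min are ignored. Missing scores count as 0. On a tie in score the
    reported (lead) SNP is preferred.
    """
    candidates = {lead_snp}
    candidates.update(snp for snp, r2 in ld_r2.items() if r2 >= r2_min)
    # Sort first so the result does not depend on set iteration order.
    return max(
        sorted(candidates),
        key=lambda snp: (functional_score.get(snp, 0), snp == lead_snp),
    )


# Made-up LD values and evidence scores.
ld_r2 = {"rs_proxy_1": 0.95, "rs_proxy_2": 0.40}
functional_score = {"rs_lead": 1, "rs_proxy_1": 3, "rs_proxy_2": 5}
chosen = best_functional_snp("rs_lead", ld_r2, functional_score)
print(chosen)
```

In this toy case the weakly linked SNP is excluded despite its higher score, and the selected SNP is a strong proxy rather than the reported one.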

GPCards: An integrated database of genotype–phenotype correlations in human genetic diseases

Computational and Structural Biotechnology Journal, 2021

Genotype-phenotype correlations are the basis of precision medicine of human genetic diseases. However, it remains a challenge for clinicians and researchers to conveniently access detailed individual-level clinical phenotypic features of patients with various genetic variants. To address this urgent need, we manually searched for genetic studies in PubMed and catalogued 8,309 genetic variants in 1,288 genes from 17,738 patients with detailed clinical phenotypic features from 1,855 publications. Based on genotype-phenotype correlations in this dataset, we developed a user-friendly online database called GPCards (http://genemed.tech/gpcards/), which provides not only the association between genetic diseases and disease genes, but also the prevalence of various clinical phenotypes related to disease genes and the patient-level mapping between these clinical phenotypes and genetic variants. To accelerate the interpretation of genetic variants, we integrated 62 well-known variant-level and gene-level genomic data sources, including functional predictions, allele frequencies in different populations, and disease-related information. Furthermore, GPCards enables automatic analyses of users' own genetic data, comprehensive annotation, prioritization of candidate functional variants, and identification of genotype-phenotype correlations using custom parameters. In conclusion, GPCards is expected to accelerate the interpretation of genotype-phenotype correlations, subtype classification, and candidate gene prioritisation in human genetic diseases.

Predicting causal variants affecting expression using whole genome sequence and RNA-seq from multiple human tissues

2016

Genetic association mapping produces statistical links between phenotypes and genomic regions, but identifying the causal variants themselves remains difficult. Complete knowledge of all genetic variants, as provided by whole genome sequence (WGS), will help, but is currently financially prohibitive for well powered GWAS studies. To explore the advantages of WGS in a well powered setting, we performed eQTL mapping using WGS and RNA-seq, and showed that the lead eQTL variants called using WGS are more likely to be causal. We derived properties of the causal variant from simulation studies, and used these to propose a method for implicating likely causal SNPs. This method predicts that 25% - 70% of the causal variants lie in open chromatin regions, depending on tissue and experiment. Finally, we identify a set of high confidence causal variants and show that they are more enriched in GWAS associations than other eQTL. Of these, we find 65 associations with GWAS traits and show example...