E. Gamazon - Academia.edu
Papers by E. Gamazon
Biochemical phenotypes are major indexes for protein structure and function characterization. They are determined, at least in part, by the intrinsic physicochemical properties of amino acids and may be reflected in the protein three-dimensional structure. Modeling mutational effects on biochemical phenotypes is a critical step for understanding protein function and disease mechanism as well as enabling drug discovery. Deep Mutational Scanning (DMS) experiments have been performed on SARS-CoV-2's spike receptor binding domain and the human ACE2 zinc-binding peptidase domain (both central players in viral infection and evolution and antibody evasion), quantifying how mutations impact binding affinity and protein expression. Here, we modeled biochemical phenotypes from massively parallel assays, using convolutional neural networks trained on protein sequence mutations in the virus and human host. We found that neural networks are significantly predictive of binding affinity, prot...
Summary: With the increasing availability of biobank-scale datasets that incorporate both genomic data and electronic health records, many associations between genetic variants and phenotypes of interest have been discovered. Polygenic risk scores (PRS), which are being widely explored in precision medicine, use the results of association studies to predict the genetic component of disease risk by accumulating risk alleles weighted by their effect sizes. However, limited studies have thoroughly investigated best practices for PRS in global populations across different diseases. In this study, we utilize data from the Global-Biobank Meta-analysis Initiative (GBMI), which consists of individuals from diverse ancestries and across continents, to explore methodological considerations and PRS prediction performance in 9 different biobanks for 14 disease endpoints. Specifically, we constructed PRS using heuristic (pruning and thresholding, P+T) and Bayesian (PRS-CS) methods. We found that t...
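The phrase "accumulating risk alleles weighted by their effect sizes" can be illustrated with a minimal sketch. This is only the basic weighted sum that underlies a PRS, not the P+T or PRS-CS procedures named in the abstract; the variant IDs, effect sizes, and the function name `polygenic_risk_score` are hypothetical and chosen purely for illustration.

```python
from typing import Mapping


def polygenic_risk_score(
    dosages: Mapping[str, float], weights: Mapping[str, float]
) -> float:
    """Sum risk-allele dosages (0, 1, or 2 copies) weighted by effect size.

    Variants missing from `dosages` are skipped rather than imputed.
    """
    return sum(
        beta * dosages[variant]
        for variant, beta in weights.items()
        if variant in dosages
    )


# Hypothetical per-variant effect sizes (e.g. log odds ratios from a GWAS).
weights = {"rsA": 0.10, "rsB": -0.05, "rsC": 0.20}
# Hypothetical genotype for one individual: copies of the risk allele.
person = {"rsA": 2, "rsB": 1, "rsC": 0}

score = polygenic_risk_score(person, weights)
print(f"PRS = {score:.2f}")
```

In this toy case the score is 0.10 × 2 − 0.05 × 1 + 0.20 × 0, or about 0.15. Real pipelines differ mainly in how the weights are selected or shrunk before this sum is taken.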
The pharmacogenomics journal, Jan 27, 2016
We conducted a discovery genome-wide association study with expression quantitative trait loci (eQTL) annotation of new-onset diabetes (NOD) among European Americans, who were exposed to a calcium channel blocker-based strategy (CCB strategy) or a β-blocker-based strategy (β-blocker strategy) in the INternational VErapamil SR Trandolapril STudy. Replication of the top signal from the SNP*treatment interaction analysis was attempted in Hispanic and African Americans, and a joint meta-analysis was performed (total 334 NOD cases and 806 matched controls). PLEKHH2 rs11124945 at 2p21 interacted with antihypertensive exposure for NOD (meta-analysis P=5.3 × 10^-8). rs11124945 G allele carriers had lower odds for NOD when exposed to the β-blocker strategy compared with the CCB strategy (odds ratio (OR)=0.38 (0.24-0.60), P=4.0 × 10^-5), whereas A/A homozygotes exposed to the β-blocker strategy had increased odds for NOD compared with the CCB strategy (OR=2.02 (1.39-2.92), P=2.0 × 10^-4)...
Journal of the American Medical Informatics Association, 2013
Background: While genome-wide association studies (GWAS) of complex traits have revealed thousands of reproducible genetic associations to date, these loci collectively confer very little of the heritability of their respective diseases and, in general, have contributed little to our understanding of the underlying disease biology. Physical protein interactions have been utilized to increase our understanding of human Mendelian disease loci but have yet to be fully exploited for complex traits. Methods: We hypothesized that protein interaction modeling of GWAS findings could highlight important disease-associated loci and unveil the role of their network topology in the genetic architecture of diseases with complex inheritance. Results: Network modeling of proteins associated with the intragenic single nucleotide polymorphisms of the National Human Genome Research Institute catalog of complex trait GWAS revealed that complex trait associated loci are more likely to be hub and bottleneck genes in available, albeit incomplete, networks (OR=1.59, Fisher's exact test p<2.24×10^-12). Network modeling also prioritized novel type 2 diabetes (T2D) genetic variations from the Finland-USA Investigation of Non-Insulin-Dependent Diabetes Mellitus Genetics and the Wellcome Trust GWAS data, and demonstrated the enrichment of hubs and bottlenecks in prioritized T2D GWAS genes. The potential biological relevance of the T2D hub and bottleneck genes was revealed by their increased number of first degree protein interactions with known T2D genes according to several independent sources (p<0.01, probability of being first interactors of known T2D genes). Conclusion: Virtually all common diseases are complex human traits, and thus the topological centrality in protein networks of complex trait genes has implications in genetics, personal genomics, and therapy.
npj Genomic Medicine, 2019
African Americans (AAs) are an admixed population with widely varying proportions of West African ancestry (WAA). Here we report the correlation of WAA to gene expression and DNA methylation in AA-derived hepatocytes, a cell type important in disease and drug response. We perform mediation analysis to test whether methylation is a mediator of the effect of ancestry on expression. GTEx samples and a second cohort are used as validation. One hundred and thirty-one genes are associated with WAA (FDR
Blood
3892 Survivors of Hodgkin lymphoma (HL) are susceptible to radiation-induced second malignant neoplasms (SMNs). In a genome-wide association study (GWAS) of patients treated for HL who did or did not develop SMNs, we identified and validated two SMN-associated single nucleotide polymorphisms (SNPs) at 6q21, intergenic between PRDM1 and ATG5 [rs4946728: P = 1.04×10^-9, OR = 3.21 (95% CI = 2.37–6.42), and rs1040411: P = 4.24×10^-8, OR = 2.43 (95% CI = 1.76–3.34)]. Recently, it was demonstrated that disease-associated SNPs are more likely to be expression quantitative trait loci (eQTLs), SNPs that regulate gene expression, than are randomly chosen SNPs matched for their population allele frequencies. Indeed, we found that the 1000 SNPs most associated with SMNs are significantly enriched for eQTLs (P = 0.01). Exploring the processes regulated by SMN-associated SNPs can inform the mechanism by which SMNs result in patients treated with radiation therapy. As an initial investigation of the...
Journal of Thrombosis and Haemostasis
Background: Warfarin is commonly used to control and prevent thromboembolic disorders. However, because of warfarin's complex dose-requirement relationship, safe and effective use is challenging. Pharmacogenomics-guided warfarin dosing algorithms that include the well-established VKORC1 and CYP2C9 polymorphisms explain only a small proportion of interindividual variability in African Americans (AAs). Objectives: We aimed to assess whether transcriptomic analyses could be used to identify regulatory variants associated with warfarin dose response in AAs. Patients/Methods: We identified a total of 56 expression quantitative trait loci (eQTLs) for CYP2C9, VKORC1 and CALU derived from human livers and evaluated their association with warfarin dose response in two independent AA warfarin patient cohorts.
The pharmacogenomics journal, Jan 9, 2016
Variation in the expression level and activity of genes involved in drug disposition and action ('pharmacogenes') can affect drug response and toxicity, especially in tissues of pharmacological importance. Previous studies have relied primarily on microarrays to understand gene expression differences, or have focused on a single tissue or small number of samples. The goal of this study was to use RNA-sequencing (RNA-seq) to determine the expression levels and alternative splicing of 389 Pharmacogenomics Research Network pharmacogenes across four tissues (liver, kidney, heart and adipose) and lymphoblastoid cell lines, which are used widely in pharmacogenomics studies. Analysis of RNA-seq data from 139 different individuals across the 5 tissues (20-45 individuals per tissue type) revealed substantial variation in both expression levels and splicing across samples and tissue types. Comparison with GTEx data yielded a consistent picture. This in-depth exploration also reve...
Leukemia, 2015
Epigenetic deregulation is a common finding in myeloid malignancies, and epigenetic therapies have been used successfully to treat patients with acute myeloid leukemia (AML) and myelodysplastic syndrome (MDS). Inactivating mutations of TET2 have been found in...
Science, 2015
The GTEx Consortium. Understanding the functional consequences of genetic variation, and how it affects complex human disease and quantitative traits, remains a critical challenge for biomedicine. We present an analysis of RNA sequencing data from 1641 samples across 43 tissues from 175 individuals, generated as part of the pilot phase of the Genotype-Tissue Expression (GTEx) project. We describe the landscape of gene expression across tissues, catalog thousands of tissue-specific and shared regulatory expression quantitative trait loci (eQTL) variants, describe complex network relationships, and identify signals from genome-wide association studies explained by eQTLs. These findings provide a systematic understanding of the cellular and biological consequences of human genetic variation and of the heterogeneity of such effects among a diverse set of human tissues.
Briefings in Functional Genomics, 2015
Recent years have witnessed a flurry of important technological and methodological developments in the discovery and analysis of copy number variations (CNVs), which are increasingly enabling the systematic evaluation of their impact on a broad range of phenotypes from molecular-level (intermediate) traits to higher-order clinical phenotypes. Like single nucleotide variants in the human genome, CNVs have been linked to complex traits in humans, including disease and drug response. These recent developments underscore the importance of incorporating complex forms of genetic variation into disease mapping studies and promise to transform our understanding of genome function and the genetic basis of disease. Here we review some of the findings that have emerged from transcriptome studies of CNVs facilitated by the rapid advances in -omics technologies and corresponding methodologies.
Science (New York, N.Y.), Jan 8, 2015
Accurate prediction of the functional effect of genetic variation is critical for clinical genome interpretation. We systematically characterized the transcriptome effects of protein-truncating variants, a class of variants expected to have profound effects on gene function, using data from the Genotype-Tissue Expression (GTEx) and Geuvadis projects. We quantitated tissue-specific and positional effects on nonsense-mediated transcript decay and present an improved predictive model for this decay. We directly measured the effect of variants both proximal and distal to splice junctions. Furthermore, we found that robustness to heterozygous gene inactivation is not due to dosage compensation. Our results illustrate the value of transcriptome data in the functional interpretation of genetic variants.
Background: Although ASD has a significant genetic component, a heterogeneous etiology makes gene discovery complex. With the advent of efficient and affordable sequencing tools, genomic data is becoming widely available in public databases. Two data types of particular relevance for ASD investigation are copy number variants (CNVs) and whole exome sequence data. Exome sequence data has been previously shown to help identify causative genes and variations in complex disorders. De novo CNVs have been found in dosage-sensitive regions, which are typically stable in healthy controls. Because of the heterogeneity of ASD, it is necessary to integrate CNV and exome sequence data to help identify candidate genes. Objectives: To develop a pilot queryable and scalable database for effective integration of whole exome sequence data and CNV data in order to identify additional genes and variants of interest in ASD. Methods: A pilot database was developed using available data for 24 probands, 23 ...
Database, 2015
Functional annotation of genetic variants including single nucleotide polymorphisms (SNPs) and copy number variations (CNV) promises to greatly improve our understanding of human complex traits. Previous transcriptomic studies involving individuals from different global populations have investigated the genetic architecture of gene expression variation by mapping expression quantitative trait loci (eQTL). Functional interpretation of genome-wide association studies (GWAS) has identified enrichment of eQTL in top signals from GWAS of human complex traits. The SCAN (SNP and CNV Annotation) database was developed as a web-based resource of genetical genomic studies including eQTL detected in the HapMap lymphoblastoid cell line samples derived from apparently healthy individuals of European and African ancestry. Considering the critical roles of epigenetic gene regulation, cytosine modification quantitative trait loci (mQTL) are expected to add a crucial layer of annotation to existing functional genomic information. Here, we describe the new features of the SCAN database that integrate comprehensive mQTL mapping results generated in the HapMap CEU (Caucasian residents from
Blood, Jan 2, 2014
The anticoagulant warfarin has >30 million prescriptions per year in the United States. Doses can vary 20-fold between patients, and incorrect dosing can result in serious adverse events. Variation in warfarin pharmacokinetic and pharmacodynamic genes, such as CYP2C9 and VKORC1, does not fully explain the dose variability in African Americans. To identify additional genetic contributors to warfarin dose, we exome sequenced 103 African Americans on stable doses of warfarin at extremes (≤ 35 and ≥ 49 mg/week). We found an association between lower warfarin dose and a population-specific regulatory variant, rs7856096 (P = 1.82 × 10^-8, minor allele frequency = 20.4%), in the folate homeostasis gene folylpolyglutamate synthase (FPGS). We replicated this association in an independent cohort of 372 African American subjects whose stable warfarin doses represented the full dosing spectrum (P = .046). In a combined cohort, adding rs7856096 to the International Warfarin Pharmacogenetic Con...
AMIA Joint Summits on Translational Science proceedings AMIA Summit on Translational Science, 2010
A key challenge for genome-wide association studies (GWAS) is to understand how single nucleotide polymorphisms (SNPs) mechanistically underpin complex diseases. While this challenge has been addressed partially by Gene Ontology (GO) enrichment of large lists of host genes of SNPs prioritized in GWAS, these enrichments have not been formally evaluated. Here, we develop a novel computational approach anchored in information theoretic similarity, by systematically mining lists of host genes of SNPs prioritized in three adult-onset diabetes mellitus GWAS. The "gold-standard" is based on GO associated with 20 published diabetes SNPs' host genes and on our own evaluation. We computationally identify 69 similarity-predicted GO independently validated in all three GWAS (FDR<5%), enriched with those of the gold-standard (odds ratio=5.89, P=4.81e-05), and these terms can be organized by similarity criteria into 11 groupings termed…