Annotation of functional variation in personal genomes using RegulomeDB

doi: 10.1101/gr.137323.112.

Eurie L Hong, Manoj Hariharan, Yong Cheng, Marc A Schaub, Maya Kasowski, Konrad J Karczewski, Julie Park, Benjamin C Hitz, Shuai Weng, J Michael Cherry, Michael Snyder

Alan P Boyle et al. Genome Res. 2012 Sep.

Abstract

As the sequencing of healthy and disease genomes becomes more commonplace, detailed annotation provides interpretation for individual variation responsible for normal and disease phenotypes. Current approaches focus on direct changes in protein coding genes, particularly nonsynonymous mutations that directly affect the gene product. However, most individual variation occurs outside of genes and, indeed, most markers generated from genome-wide association studies (GWAS) identify variants outside of coding segments. Identification of potential regulatory changes that perturb these sites will lead to a better localization of truly functional variants and interpretation of their effects. We have developed a novel approach and database, RegulomeDB, which guides interpretation of regulatory variants in the human genome. RegulomeDB includes high-throughput, experimental data sets from ENCODE and other sources, as well as computational predictions and manual annotations to identify putative regulatory potential and identify functional variants. These data sources are combined into a powerful tool that scores variants to help separate functional variants from a large pool and provides a small set of putative sites with testable hypotheses as to their function. We demonstrate the applicability of this tool to the annotation of noncoding variants from 69 fully sequenced genomes as well as that of a personal genome, where thousands of functionally associated variants were identified. Moreover, we demonstrate a GWAS where the database is able to quickly identify the known associated functional variant and provide a hypothesis as to its function. Overall, we expect this approach and resource to be valuable for the annotation of human genome sequences.
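
The idea of combining several evidence types into a single variant score can be illustrated with a minimal sketch. This is not the RegulomeDB scoring scheme: the `Feature` class, the feature kinds (`chip_seq`, `dnase`, `motif`), the four rank levels, and all coordinates below are hypothetical, chosen only to show how overlapping annotations might be turned into a rank.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """One annotated interval (hypothetical schema, 0-based half-open)."""
    kind: str    # assumed labels: "chip_seq", "dnase", "motif"
    chrom: str
    start: int   # inclusive
    end: int     # exclusive


def overlapping_kinds(chrom, pos, features):
    """Return the set of feature kinds whose interval contains the position."""
    return {
        f.kind
        for f in features
        if f.chrom == chrom and f.start <= pos < f.end
    }


def evidence_rank(kinds):
    """Map a set of overlapping kinds to an illustrative rank (1 = strongest).

    The thresholds are an assumption made for this sketch, not the
    categories used by RegulomeDB.
    """
    if {"chip_seq", "dnase", "motif"} <= kinds:
        return 1
    if {"chip_seq", "dnase"} <= kinds:
        return 2
    if kinds:
        return 3
    return 4


# Made-up annotations and variant positions, for illustration only.
features = [
    Feature("chip_seq", "chr6", 100, 200),
    Feature("dnase", "chr6", 150, 250),
    Feature("motif", "chr6", 160, 170),
]
variants = [("chr6", 165), ("chr6", 155), ("chr6", 220), ("chr6", 300)]

# Sort variants so those with the most supporting evidence come first.
ranked = sorted(
    variants,
    key=lambda v: evidence_rank(overlapping_kinds(v[0], v[1], features)),
)
```

Sorting by rank mirrors the stated goal of reducing a large pool of variants to a small set of candidate sites; a real analysis would use an interval index rather than a linear scan over all features.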

Figures

Figure 1.

A SNV (rs9261424) overlapping many regulatory features. (A) This SNV falls within peak regions for many ChIP-seq factors as well as DNase-seq peaks from multiple cell lines. (B) The same SNV overlaps a motif match to the NFKB motif and has been shown to alter binding. The signal tracks represent ChIP-seq peaks of NFKB at the SNV site for three individuals: homozygous to reference allele (G), heterozygous, and homozygous to alternate allele (C) (Kasowski et al. 2010).

Figure 2.

Incidence of SNVs in features and categories. Average percent count of SNVs in each genomic feature (A) and in each RegulomeDB category (B). Although the differences between homozygous and heterozygous SNV counts are small, they are nevertheless significant (P < 5 × 10⁻¹⁵). Actual SNV count in features (C) and categories for the cell line GM12878 (D).

Figure 3.

Protein coding and noncoding SNVs can be classified as potentially functional by Polyphen-2 and RegulomeDB, respectively. Heterozygous, damaging coding SNVs can act in conjunction with a heterozygous regulatory SNV on the opposite allele to create a compound heterozygote and loss of function on both alleles (one regulatory, the other coding).

Figure 4.

_TNFAIP3_-associated SNV. (A) RegulomeDB results for rs117480515, which is likely a functional variant associated with systemic lupus erythematosus. (B) This SNV was the most likely to be functional in the associated region but might be missed in a standard study because it lies >20 kb downstream from its target. (C) An enlargement of the region around rs117480515 (red line) shows the overlap with a large number of functional elements (NFKB, purple; BCL, light blue; and DNase, green) as well as the motif for BCL.
