An abundance of rare functional variants in 202 drug target genes sequenced in 14,002 people

Science. 2012 Jul 6;337(6090):100-4.

doi: 10.1126/science.1217876. Epub 2012 May 17.

Matthew R Nelson, Daniel Wegmann, Margaret G Ehm, Darren Kessner, Pamela St Jean, Claudio Verzilli, Judong Shen, Zhengzheng Tang, Silviu-Alin Bacanu, Dana Fraser, Liling Warren, Jennifer Aponte, Matthew Zawistowski, Xiao Liu, Hao Zhang, Yong Zhang, Jun Li, Yun Li, Li Li, Peter Woollard, Simon Topp, Matthew D Hall, Keith Nangle, Jun Wang, Gonçalo Abecasis, Lon R Cardon, Sebastian Zöllner, John C Whittaker, Stephanie L Chissoe, John Novembre, Vincent Mooser

Matthew R Nelson et al. Science. 2012.

Abstract

Rare genetic variants contribute to complex disease risk; however, the abundance of rare variants in human populations remains unknown. We explored this spectrum of variation by sequencing 202 genes encoding drug targets in 14,002 individuals. We find rare variants are abundant (1 every 17 bases) and geographically localized, so that even with large sample sizes, rare variant catalogs will be largely incomplete. We used the observed patterns of variation to estimate population growth parameters, the proportion of variants in a given frequency class that are putatively deleterious, and mutation rates for each gene. We conclude that because of rapid population growth and weak purifying selection, human populations harbor an abundance of rare variants, many of which are deleterious and have relevance to understanding disease risk.

Figures

Fig. 1

(A) Frequency spectrum of variants, showing the number of variants per kb at each minor allele count. Solid gray lines provide expectations from nucleotide diversity (θπ) and the number of segregating sites (θW). (B) The number of common (MAF > 0.5%, above the origin) and rare (MAF ≤ 0.5%, below the origin) coding variants observed in each gene, shown as stacked bars of nonsynonymous (NS) and synonymous (S) variants. (C) Log-likelihood surface of European population growth (r) and population size (Ne) in a demographic model. Colored contours correspond to 2 log-likelihood intervals. The blue point is the maximum likelihood estimate of r and Ne. (D) Per-gene mutation rates with 2 log-likelihood intervals. Horizontal lines are the 10th, 50th and 90th mutation rate percentiles. Seven genes on the X chromosome and four genes with low target coverage or yielding too few common variants for inference (ADRB3, CCR5, MIF and PTGER1) were excluded. (E) Proportion of rare cumulative MAF (cMAF) accounted for by SNVs of increasing frequency. (F) Proportion of rare variants in four cMAF ranges falling within the MAF categories shown in (E). The successfully sequenced coding length of each gene (in kb) is overlaid as a gray line. cMAFs in (E) and (F) are for amino acid-changing variants in each gene that are predicted to be damaging or are evolutionarily conserved (phyloP ≥ 2). Genes in (B), (D) and (F) are ordered by number of rare coding variants per gene, and vertical lines correspond to rank deciles.
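As an illustration of the two summary statistics named in panel (A), the sketch below computes Watterson's estimator (θW, based on the number of segregating sites) and nucleotide diversity (θπ, the mean number of pairwise differences) from a site frequency spectrum using their standard textbook definitions. The function names and the dictionary layout of the spectrum are assumptions made for this example; this is not the authors' pipeline, and the toy counts are not data from the paper.

```python
def harmonic_number(n):
    """Return a_n = sum_{i=1}^{n-1} 1/i for a sample of n chromosomes."""
    return sum(1.0 / i for i in range(1, n))


def watterson_theta(sfs, n):
    """Watterson's estimator: segregating sites divided by a_n.

    sfs maps an allele count k (1 <= k <= n-1) to the number of sites
    observed at that count; n is the number of sampled chromosomes.
    """
    segregating_sites = sum(sfs.values())
    return segregating_sites / harmonic_number(n)


def nucleotide_diversity(sfs, n):
    """Mean number of pairwise differences across all chromosome pairs.

    A site with allele count k differs in k * (n - k) of the
    n * (n - 1) / 2 possible pairs. The expression is symmetric in
    k and n - k, so minor allele counts work as well as derived counts.
    """
    pairs = n * (n - 1) / 2
    return sum(count * k * (n - k) for k, count in sfs.items()) / pairs


# Toy spectrum (hypothetical): 4 chromosomes, two singletons, one doubleton.
toy_sfs = {1: 2, 2: 1}
theta_w = watterson_theta(toy_sfs, 4)
theta_pi = nucleotide_diversity(toy_sfs, 4)
```

Dividing either value by the sequenced length in kb gives a per-kb figure comparable to the axis described in panel (A). An excess of rare variants pulls θπ below θW, because low-count sites contribute little to pairwise differences.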

Fig. 2

(A,B) Number of variants per kilobase of intronic, UTR, nonsynonymous (NS) or synonymous (S) sequence with sample size increasing to 50,000 (A) and one million (B) Europeans. Observed numbers are given as dots; solid and dashed lines indicate hypergeometric expectations and jackknife projections, respectively. (C) Expected ratios of NS to S variants in the absence of selection and observed ratios for different minor allele frequency (MAF) bins. (D) The proportion of NS variants predicted to be benign, possibly damaging or probably damaging by PolyPhen or SIFT, and the proportion of NS variants that are neutral, deleterious enough that they will never become common (MAF > 5%), or deleterious enough that they will never be fixed in Europeans, as predicted by the relative ratios of NS:S variant abundances observed at different MAF (2). (C,D) 95% confidence intervals are represented by white lines. (E) phyloP score for intronic, UTR, NS and S variants for different MAF bins.

Fig. 3

Number of variants per kilobase of sequence with sample sizes increasing to 5,000 people for multiple populations. Observed numbers are given as dots; solid and dashed lines indicate hypergeometric expectations and jackknife projections, respectively.
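One plausible reading of the hypergeometric expectations in Figs. 2 and 3 is a down-sampling calculation: given the allele counts observed in the full sample, how many variants would still be seen as polymorphic in a random subsample of chromosomes? The sketch below implements that standard calculation. The function name and the spectrum layout are assumptions for illustration, the toy counts are not data from the paper, and the jackknife projections to larger samples are not covered.

```python
from math import comb


def expected_variants_in_subsample(sfs, n, m):
    """Expected number of sites still polymorphic in m of n chromosomes.

    sfs maps an allele count k (1 <= k <= n-1) in the full sample of n
    chromosomes to the number of sites with that count. A site remains
    polymorphic in a subsample drawn without replacement unless the
    subsample contains none of the k copies or only the k copies; both
    events have hypergeometric probabilities.
    """
    if not 0 <= m <= n:
        raise ValueError("subsample size m must be between 0 and n")
    total = comb(n, m)
    expected = 0.0
    for k, count in sfs.items():
        p_absent = comb(n - k, m) / total  # no copy of the allele drawn
        p_fixed = comb(k, m) / total       # only copies of the allele drawn
        expected += count * (1.0 - p_absent - p_fixed)
    return expected


# Toy spectrum (hypothetical): 4 chromosomes, two singletons, one doubleton.
toy_sfs = {1: 2, 2: 1}
expected_at_2 = expected_variants_in_subsample(toy_sfs, 4, 2)
expected_at_4 = expected_variants_in_subsample(toy_sfs, 4, 4)
```

Evaluating the function over a range of m and dividing by the sequenced length in kb traces a curve of variants per kb against sample size; at m = n it recovers the observed count.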

References

    1. Pritchard JK. Am J Hum Genet. 2001;69:124.
    2. Kryukov GV, Pennacchio LA, Sunyaev SR. Am J Hum Genet. 2007;80:727.
    3. Marth GT, et al. Genome Biol. 2011;12:R84.
    4. Manolio TA, et al. Nature. 2009;461:747.
    5. Eichler EE, et al. Nat Rev Genet. 2010;11:446.
