Comparative Study

Genomic regions exhibiting positive selection identified from dense genotype data

Christopher S Carlson et al. Genome Res. 2005 Nov.

Abstract

The allele frequency spectrum of polymorphisms in DNA sequences can be used to test for signatures of natural selection that depart from the frequency spectrum expected under the neutral theory. For a set of 179 genes, we observed a significant (P = 0.001) correlation between Tajima's D computed from full resequencing data and Tajima's D computed from a dense, genome-wide data set of genotyped polymorphisms. On this basis, we used a sliding-window analysis of Tajima's D across the human genome to identify regions putatively subject to strong, recent selective sweeps. This survey identified seven Contiguous Regions of Tajima's D Reduction (CRTRs) in an African-descent population (AD), 23 in a European-descent population (ED), and 29 in a Chinese-descent population (XD). Only four CRTRs overlapped between populations: three between ED and XD and one between AD and ED. Full resequencing of eight genes within six CRTRs revealed frequency spectra inconsistent with neutral expectations for at least one gene within each CRTR. Identifying the functional polymorphism (and/or haplotype) responsible for the selective sweep within each CRTR may provide insight into the strongest selective pressures experienced by the human genome over recent evolutionary history.
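Tajima's D contrasts two estimates of the population mutation parameter: π, the mean number of pairwise differences between sequences, and Watterson's estimator S/a1, based on the number of segregating sites S. A strong, recent sweep leaves an excess of rare variants, which lowers π relative to S/a1 and drives D negative. The minimal Python sketch below computes D from per-site allele counts using the standard normalizing constants. It assumes every site is scored in all n chromosomes (no missing data), and the function name `tajimas_d` is hypothetical, not taken from the study's pipeline.

```python
import math
from typing import Optional, Sequence


def tajimas_d(allele_counts: Sequence[int], n: int) -> Optional[float]:
    """Tajima's D from per-site allele counts (hypothetical helper).

    allele_counts: count of one allele (either allele works, since only
        k * (n - k) enters the formula) at each site.
    n: number of chromosomes sampled (2 x individuals for diploid data).
    Assumes no missing data: every site is scored in all n chromosomes.
    Returns None when no site segregates in the sample.
    """
    if n < 4:
        # for n = 2 or 3 the variance constants are zero and D is undefined
        raise ValueError("need at least 4 chromosomes")
    # keep only sites that are polymorphic in this sample
    seg = [k for k in allele_counts if 0 < k < n]
    s = len(seg)
    if s == 0:
        return None
    pairs = n * (n - 1) / 2
    # pi: average pairwise differences, summed over segregating sites
    pi = sum(k * (n - k) for k in seg) / pairs
    # standard constants used to normalize the difference pi - S/a1
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    theta_w = s / a1  # Watterson's estimator
    return (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

For example, 24 diploid individuals give n = 48, and a window in which every site is a singleton yields a negative D. On a genotyped SNP panel, D also reflects how the SNPs were ascertained, which is why agreement with full resequencing data was established before the genome-wide scan.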

Figures

Figure 1.

Comparison of Tajima's D between the Perlegen and SeattleSNPs data sets. For each gene, Tajima's D was calculated from the complete resequencing data in the SeattleSNPs data set and, in the Perlegen data, from the region spanning 10 kb upstream of the transcript, the full transcript, and 10 kb downstream of the transcript. (A) Tajima's D from Perlegen vs. Tajima's D from SeattleSNPs for the AD population. (B) Tajima's D from Perlegen vs. Tajima's D from SeattleSNPs for the ED population. Genes previously resequenced by SeattleSNPs are shown in red, with a trend line representing a linear regression on the data. Genes resequenced as part of the present study are shown as purple dots, with filled circles indicating that the gene lay within a CRTR in the population being plotted. The seven SeattleSNPs genes with robust signatures of selection in the SeattleSNPs data (Akey et al. 2004) are shown in green.
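The Perlegen-side gene region and the trend line described in this caption can be sketched as follows. This is an illustration, not the authors' code: `gene_window` and `fit_line` are hypothetical helpers, the 10-kb flank is applied symmetrically (so strand does not matter), and `fit_line` reports only the least-squares slope, intercept, and Pearson r. The significance test behind the reported P value is not reproduced here.

```python
import math
from typing import Sequence, Tuple

FLANK = 10_000  # 10 kb upstream and downstream of the transcript (Figure 1)


def gene_window(tx_start: int, tx_end: int, flank: int = FLANK) -> Tuple[int, int]:
    """Return (start, end) covering the transcript plus a symmetric flank.

    Because the flank is identical on both sides, strand does not matter.
    The start is clipped at 0 for genes near the chromosome end.
    """
    return max(0, tx_start - flank), tx_end + flank


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit y = slope * x + intercept, plus Pearson r.

    Assumes neither x nor y is constant (otherwise division by zero).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r
```

Following the panel layout, x would hold the SeattleSNPs D values and y the Perlegen D values computed over `gene_window` for the same genes.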

Figure 2.

Probability density of Tajima's D across the sliding windows, shown for each population. All three distributions depart significantly from a normal distribution, most noticeably in the heavy tail at low values.

Figure 3.

Tajima's D in 100-kbp sliding windows with 10-kbp steps is shown across the first 50 Mbp of chromosome 1. Several CRTRs are visible, including a region near 35 Mbp in the ED population containing CLSPN (large blue arrowhead) and a region near 41 Mbp in the AD population spanning CTPS, FLJ23878, and SCMH1 (large green arrowhead). CRTRs at the less stringent 5% level are also indicated in the ED population as small blue arrowheads and in the XD population as small red arrowheads.
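A minimal sketch of the windowing and region-calling steps follows. The 100-kbp windows and 10-kbp steps come from this caption; everything else is an assumption. The statistic is passed in as a function (for example, a Tajima's D routine), the lower-tail cutoff is an empirical quantile over the windows supplied, and flagged windows that overlap are merged into one region. The names `window_stats` and `call_regions` are hypothetical, and the exact CRTR criteria used in the study may differ.

```python
import bisect
from typing import Callable, List, Optional, Sequence, Tuple

WINDOW = 100_000  # 100-kbp windows (Figure 3)
STEP = 10_000     # 10-kbp steps (Figure 3)

Stat = Callable[[List[int]], Optional[float]]


def window_stats(positions: Sequence[int], counts: Sequence[int], chrom_len: int,
                 stat: Stat, window: int = WINDOW, step: int = STEP
                 ) -> List[Tuple[int, Optional[float]]]:
    """Apply `stat` to the allele counts of SNPs in each [start, start + window).

    `positions` must be sorted ascending. Only full-length windows are used,
    so SNPs in a trailing stretch shorter than one step may be skipped.
    """
    out = []
    for start in range(0, max(chrom_len - window, 0) + 1, step):
        lo = bisect.bisect_left(positions, start)
        hi = bisect.bisect_left(positions, start + window)
        out.append((start, stat(list(counts[lo:hi]))))
    return out


def call_regions(windows: Sequence[Tuple[int, Optional[float]]],
                 quantile: float = 0.05,
                 window: int = WINDOW) -> List[Tuple[int, int]]:
    """Merge overlapping windows whose value lies in the lower `quantile` tail.

    Assumption: the cutoff is the k-th lowest value, k = quantile * (number
    of windows with a value), and ties at the cutoff are flagged too.
    """
    values = sorted(v for _, v in windows if v is not None)
    if not values:
        return []
    k = max(1, int(quantile * len(values)))
    cutoff = values[k - 1]
    regions: List[Tuple[int, int]] = []
    for start, v in windows:
        if v is None or v > cutoff:
            continue
        end = start + window
        if regions and start <= regions[-1][1]:
            # this flagged window overlaps the current region: extend it
            regions[-1] = (regions[-1][0], end)
        else:
            regions.append((start, end))
    return regions
```

Because consecutive windows share 90 kbp, a run of k consecutive low-D windows yields one region of 100 kbp + (k - 1) x 10 kbp.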

Figure 4.

(A) A visual genotype for 1.5 Mbp spanning the CLSPN CRTR in the Perlegen data. Each row corresponds to an individual and each column to a polymorphic site, with genotypes color coded as follows: common allele homozygotes in blue, heterozygotes in red, rare allele homozygotes in yellow, and missing data in gray. The top 24 samples are ED, the middle 23 samples are AD, and the bottom 24 samples are XD. Although nucleotide diversity is depressed across a large region, there is no clear minimum within the CRTR. Because nucleotide diversity was relatively constant across the region, CLSPN (black box) was selected as a target for resequencing on the basis of interesting patterns of Fst between ED and XD in addition to low nucleotide diversity. (B) A visual genotype of the resequencing results for the CLSPN gene. The top 24 samples are ED, the middle 24 samples are AD, and the bottom 24 samples are XD. As expected, a number of polymorphisms nearly fixed between ED and XD were observed. One of these SNPs (10710, red arrowhead) changes an amino acid (Ser525Asn), whereas the other three are intronic (green arrowheads).
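The Fst comparison between ED and XD can be illustrated with a simple per-SNP estimator. The sketch below uses the Nei-style form Fst = (H_T - H_S) / H_T for two equally weighted populations, with genotypes coded as 0, 1, or 2 copies of the counted allele and missing calls (gray in the figure) as None. This is a swapped-in textbook estimator, not necessarily the one used in the study, and it omits sample-size corrections such as those in Weir and Cockerham's estimator; `allele_freq` and `fst_two_pop` are hypothetical names.

```python
from typing import Optional, Sequence


def allele_freq(genotypes: Sequence[Optional[int]]) -> float:
    """Frequency of the counted allele from 0/1/2 genotype codes.

    None marks a missing call and is excluded from the denominator.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no genotyped individuals at this site")
    return sum(called) / (2 * len(called))


def fst_two_pop(p1: float, p2: float) -> Optional[float]:
    """Nei-style Fst = (H_T - H_S) / H_T for one biallelic SNP.

    Both populations are weighted equally. Returns None when the site is
    monomorphic for the same allele in both samples (H_T = 0).
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                          # pooled heterozygosity
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2      # mean within-population
    if h_t == 0:
        return None
    return (h_t - h_s) / h_t
```

A SNP fixed for different alleles in the two samples gives Fst = 1, the pattern expected for the nearly fixed ED/XD differences in panel B.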

Figure 5.

Figure 5.

A close-up of the CLSPN CRTR from the UCSC Genome Browser, showing the Tajima's D tracks together with tracks of the relative recombination rate inferred by LDhat for each population, in grayscale (track label, LDhat log RR AD/ED/XD); darker segments correspond to higher inferred recombination rates. CLSPN is located at 35.9 Mbp. The left edge of the CLSPN CRTR (at ∼35 Mbp in the ED population) corresponds to a strong recombination hotspot observed in all three populations, but of greater interest are the hotspots spanned by the CRTR at ∼35.4 Mbp and ∼35.8 Mbp. Thus, although this CRTR spans a region with reduced recombination overall, it contains several inferred hotspots that are shared between populations.

References

Akey, J.M., Eberle, M.A., Rieder, M.J., Carlson, C.S., Shriver, M.D., Nickerson, D.A., and Kruglyak, L. 2004. Population history and natural selection shape patterns of genetic variation in 132 genes. PLoS Biol. 2: e286.
Bersaglieri, T., Sabeti, P.C., Patterson, N., Vanderploeg, T., Schaffner, S.F., Drake, J.A., Rhodes, M., Reich, D.E., and Hirschhorn, J.N. 2004. Genetic signatures of strong recent positive selection at the lactase gene. Am. J. Hum. Genet. 74: 1111–1120.
Carlson, C.S., Eberle, M.A., Rieder, M.J., Smith, J.D., Kruglyak, L., and Nickerson, D.A. 2003. Additional SNPs and linkage-disequilibrium analyses are necessary for whole-genome association studies in humans. Nat. Genet. 33: 518–521.
Carlson, C.S., Eberle, M.A., Rieder, M.J., Yi, Q., Kruglyak, L., and Nickerson, D.A. 2004. Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium. Am. J. Hum. Genet. 74: 106–120.
Clark, A.G., Glanowski, S., Nielsen, R., Thomas, P., Kejariwal, A., Todd, M.J., Tanenbaum, D.M., Civello, D., Lu, F., Murphy, B., et al. 2003a. Positive selection in the human genome inferred from human–chimp–mouse orthologous gene alignments. Cold Spring Harb. Symp. Quant. Biol. 68: 471–477.

Web site references

http://pga.gs.washington.edu; Seattle SNPs Web site.
http://genome.perlegen.com/browser/download.html; Perlegen Web site.
http://genome.ucsc.edu/cgi-bin/hgGateway; UCSC Genome Browser.