Matching Strategies for Genetic Association Studies in Structured Populations

Abstract

Association studies in populations that are genetically heterogeneous can yield large numbers of spurious associations if population subgroups are unequally represented among cases and controls. This problem is particularly acute for studies involving pooled genotyping of very large numbers of single-nucleotide–polymorphism (SNP) markers, because most methods for analysis of association in structured populations require individual genotyping data. In this study, we present several strategies for matching case and control pools to have similar genetic compositions, based on ancestry information inferred from genotype data for ∼300 SNPs tiled on an oligonucleotide-based genotyping array. We also discuss methods for measuring the impact of population stratification on an association study. Results for an admixed population and a phenotype strongly confounded with ancestry show that these simple matching strategies can effectively mitigate the impact of population stratification.

Introduction

Genomewide association studies provide a powerful approach to implicate DNA variants (and, by extension, the genomic regions they represent) in the predisposition to complex diseases and in the genetic underpinnings of drug efficacy and adverse reactions. The success of these studies relies on the accurate measurement or estimation of allele-frequency differences between case and control subjects. When searching for small genetic effects in large association studies, systematic differences in ancestry between the cases and controls are likely to produce many statistically significant but spurious associations (e.g., Knowler et al. 1988; Lander and Schork 1994). Such differences are expected to be found when genetically distinct population subgroups have a different prevalence of the target phenotype.

The use of family-based association study designs mitigates the impact of systematic ancestry differences (population stratification) but can lead to an increased burden in the recruitment of subjects and in genotyping (Cardon and Palmer 2003). Self-reported ancestry is also useful in matching case and control subjects to reduce the prevalence of spurious associations. Population structure can be empirically determined by individually genotyping all potential cases and controls across a set of unlinked marker loci (Pritchard and Rosenberg 1999). When individual genotypes are known, analysis methods can correct the association test statistic for unmatched groups by use of the inferred population structure (Pritchard et al. 2000b; Reich and Goldstein 2001; Satten et al. 2001; Thornsberry et al. 2001; Hoggart et al. 2003).

In association studies using DNA pooled from many individuals, significant causal disease (or pharmacogenetic) associations would be indistinguishable from associations due to ancestry differences between cases and controls. Thus, genetic-ancestry matching prior to DNA pooling is essential. By use of inferred population-structure data, DNA pools can be constructed that are matched to have similar genetic composition, to minimize the likelihood of spurious associations due to population stratification. Allele-frequency estimates in the matched DNA pools should then give a more reliable indication of causal disease association. See the work of Sham et al. (2002) for a recent review of DNA pooling methodologies and implications for association studies.

In genomewide association studies, it is necessary to test at least hundreds of thousands of SNP markers because of the generally limited extent of linkage disequilibrium in the human genome (Risch and Merikangas 1996; Kruglyak 1999; Risch 2000; Patil et al. 2001). We are currently testing >1.5 million SNP markers in association studies, using pooled genotyping with multiple measurements of allele frequency in each of two pools as an efficient screen to enrich for SNPs with significant allele-frequency differences. The SNPs with the greatest apparent allele-frequency differences in the pooled data are then selected for individual genotyping. The pooled genotyping step reduces the number of SNPs that must be individually genotyped to confirm allele-frequency differences between case and control groups. In this context, spurious associations due to population structure force us either to examine more SNPs by individual genotyping or, if that is impractical, to sacrifice power to detect causal associations.

In this study, we describe the use of unlinked SNP markers to detect and correct for population stratification in case and control subjects in an admixed population prior to pooled genotyping for association testing. Using a phenotype that is strongly confounded with ancestry, we show that several strategies for matching case and control groups are successful at eliminating significant stratification. We also discuss methods for measuring the impact of stratification on a pooled genotyping experiment.

Methods

Subject Collection

Subjects were chronic alcoholics, some with alcoholic liver disease, recruited in Mexico City under full informed consent. The international institutional review board of the Instituto Nacional de Ciencias Médicas y Nutrición Salvador Zubirán (INCMNSZ), which is registered with the Office of Human Research Protection, approved the human patient sample-collection protocol. Subjects were measured for height in cm at the time of blood-sample collection. The three self-reported ethnicities in this population were “Caucasian,” those of primarily Spanish European ancestry; “Otomi” Indians, from the Pachuca region in Mexico; and “Mestizo,” a mix of Spanish European and Mexican Indian ancestry. A total of 824 Mestizo males were examined to determine the distribution of height. The definitions of “tall” and “short” were chosen to include the upper and lower 25% of the observed distribution. This yielded a minimum height of 174 cm for the “tall” group and a maximum height of 162 cm for the “short” group.
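The quartile-based definition of the two height groups can be sketched as follows. This is a minimal illustration, assuming the heights are available as a list of values in cm; the function names `height_cutoffs` and `classify` are hypothetical, and the percentile convention used in the study is not stated, so the "inclusive" method here is an assumption.

```python
import statistics

def height_cutoffs(heights_cm):
    """Return (short_max, tall_min): the lower and upper quartile boundaries."""
    q1, _median, q3 = statistics.quantiles(heights_cm, n=4, method="inclusive")
    return q1, q3

def classify(height_cm, short_max, tall_min):
    """Label a subject 'short', 'tall', or None (middle of the distribution)."""
    if height_cm <= short_max:
        return "short"
    if height_cm >= tall_min:
        return "tall"
    return None  # the middle 50% is not used in either group
```

With the cutoffs reported above, `classify(174, 162, 174)` gives `"tall"` and `classify(168, 162, 174)` gives `None`.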

SNP Selection

From a genomewide collection of SNPs discovered by Perlegen Sciences in a globally diverse panel of individuals (Patil et al. 2001), we selected a set of 312 that were roughly equally spaced across the autosomes and were expected to behave well in oligonucleotide array–based genotyping. SNPs were selected to be at least 150 bp from the nearest common repetitive element, as identified by the RepeatMasker 2 program (available on the RepeatMasker Web site), and the 25-bp sequence containing the SNP (± 12 bases of context) was required to be unique in the human genome, according to then-current National Center for Biotechnology Information (NCBI) Build 29 (available on the NCBI Web site). We also required that in Perlegen’s previously collected SNP discovery data, the SNPs have a high rate of high-confidence genotype calls and an allele frequency close to 0.5. A combination of these quality metrics was used to numerically score each candidate SNP. We then selected the highest-scoring candidates from a series of 2-Mb windows spaced at 9-Mb intervals across each NCBI Build 29 chromosome.
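The windowed selection step can be sketched as below. This is only an illustration under stated assumptions: the quality score is taken as precomputed (the text does not give the scoring formula), windows are assumed to start at position 0 of each chromosome, and exactly one SNP is kept per window. The names `Candidate` and `select_by_window` are hypothetical.

```python
from collections import namedtuple

Candidate = namedtuple("Candidate", "snp_id chrom pos score")

WINDOW_BP = 2_000_000  # width of each selection window
STEP_BP = 9_000_000    # spacing between window starts

def select_by_window(candidates, window_bp=WINDOW_BP, step_bp=STEP_BP):
    """Keep the highest-scoring candidate in each window of each chromosome."""
    best = {}
    for c in candidates:
        index, offset = divmod(c.pos, step_bp)
        if offset >= window_bp:
            continue  # position lies in the gap between two windows
        key = (c.chrom, index)
        if key not in best or c.score > best[key].score:
            best[key] = c
    return [best[key] for key in sorted(best)]
```

Candidates that fall between windows are ignored regardless of score, which is what keeps the selected markers roughly evenly spaced.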

Primer Design

PCR primer pairs for each SNP were selected using the program Oligo, version 6.57 (Molecular Biology Insights). We selected primers having a Tm of 59°C–66°C, a length of 18–22 bases, a PCR product size of 50–200 bases, and 3′-end ΔG of between −5.5 and −9.8 kcal/mol. We also required that each primer be at least 5 bases from its target SNP. Primer sequences containing repetitive sequences, as determined by the RepeatMasker 2 program, were excluded. Only primer sequences determined to be unique (P<10⁻⁴) in the genome (NCBI Build 29) by use of the BLAST program (available on the NCBI BLAST Web site) (Altschul et al. 1990) were selected.
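The numeric primer criteria can be expressed as a simple filter. The sketch below assumes that Tm and 3′-end ΔG have already been computed by an external tool (the study used Oligo); the `Primer` record and the functions `primer_ok` and `pair_ok` are hypothetical names, and the repeat and uniqueness checks are not modeled.

```python
from dataclasses import dataclass

@dataclass
class Primer:
    sequence: str
    tm_c: float           # melting temperature in deg C, assumed precomputed
    delta_g_3p: float     # 3'-end delta G in kcal/mol, assumed precomputed
    distance_to_snp: int  # bases between the primer and its target SNP

def primer_ok(p):
    """Check one primer against the Tm, length, delta G, and distance limits."""
    return (
        59.0 <= p.tm_c <= 66.0
        and 18 <= len(p.sequence) <= 22
        and -9.8 <= p.delta_g_3p <= -5.5
        and p.distance_to_snp >= 5
    )

def pair_ok(forward, reverse, product_size):
    """Check both primers plus the 50-200 base PCR product size limit."""
    return primer_ok(forward) and primer_ok(reverse) and 50 <= product_size <= 200
```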

Genotyping Oligonucleotide Array Design

Genotyping arrays of 25-bp oligonucleotides were designed as four sets of 20 features (80 features per SNP), corresponding to forward and reverse strand tilings for sequences complementary to each of two SNP alleles. A set of 20 features consisted of five sets of 4 features where the location of the SNP within the oligonucleotide varied from position 11 to position 15. A set of 4 features consisted of sequences where A, C, T, or G was substituted at position 13. Thus, each set of four features provided one perfect match to the sequence of the corresponding SNP allele and three features with a single-base mismatch for that allele. Mismatch probes were used to measure background and, by comparison with the signal for the perfect match probes, to detect the presence or absence of a specific PCR product in a sample. Light-directed chemical synthesis of the appropriate oligonucleotides was carried out by Affymetrix (Fodor et al. 1991).
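The tiling scheme can be made concrete with a short sketch. It assumes a flanking sequence with the SNP at its center (as in the 51-bp sequences of table A) and emits target-strand sequences; physical probes would be complementary to these, and that detail is left out. The function names `tile_allele`, `tile_snp`, and `reverse_complement` are hypothetical.

```python
BASES = "ACGT"
OLIGO_LEN = 25                 # probe length in bases
SUB_POS = 13                   # 1-based oligo position that is substituted
SNP_POSITIONS = range(11, 16)  # 1-based oligo positions where the SNP may sit

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]

def tile_allele(flank, allele):
    """Return 20 (snp_pos, base, sequence, is_perfect_match) features.

    `flank` is an odd-length sequence with the SNP at its center.
    """
    center = len(flank) // 2
    target = flank[:center] + allele + flank[center + 1:]
    features = []
    for snp_pos in SNP_POSITIONS:
        start = center - (snp_pos - 1)
        window = target[start:start + OLIGO_LEN]
        if start < 0 or len(window) != OLIGO_LEN:
            raise ValueError("flanking sequence too short for this tiling")
        for base in BASES:
            seq = window[:SUB_POS - 1] + base + window[SUB_POS:]
            features.append((snp_pos, base, seq, seq == window))
    return features

def tile_snp(flank, allele_a, allele_b):
    """Return all 80 features: 2 strands x 2 alleles x 20 features."""
    features = []
    strands = (("forward", flank, False), ("reverse", reverse_complement(flank), True))
    for strand, seq, complemented in strands:
        for allele in (allele_a, allele_b):
            base = allele.translate(_COMPLEMENT) if complemented else allele
            for feature in tile_allele(seq, base):
                features.append((strand, allele) + feature)
    return features
```

Each group of four features contains exactly one perfect match, so a 20-feature set yields 5 perfect-match and 15 mismatch probes, and the two strands together give 10 perfect-match features per allele.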

Hybridization Sample Preparation

For analysis of the 312 stratification SNPs, DNA was amplified by PCR in 12-μl volume containing 13 primer pairs at 0.4 mM of each primer, 10 ng of individual genomic DNA, 2 U Titanium Taq (Clontech), 0.5 mM deoxynucleotide triphosphates, 10 mM Tris-HCl (pH 9.1), 3 mM MgCl2, and additives. Thermocycling was performed on a 9700 cycler (Perkin-Elmer), with initial denaturation at 96°C for 5 min, followed by 10 cycles of 96°C for 30 s, 58°C minus 0.5°C/cycle for 30 s, 65°C for 1 min, then 40 cycles of 96°C for 10 s, 53°C for 30 s, and 65°C for 60 s, and, finally, an extension at 65°C for 7 min. PCR products were pooled together and labeled with 0.7 μM biotin-16-ddUTP/dUTP (Roche) with 25 units of terminal deoxynucleotidyl transferase (Roche), by incubating at 37°C for 90 min, after which the reaction was stopped by heat-inactivation at 99°C for 10 min.
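The annealing temperatures implied by the touchdown program can be listed with a short sketch. It assumes the first touchdown cycle anneals at 58°C and each later cycle is 0.5°C lower, which is one reading of "58°C minus 0.5°C/cycle"; the function name `annealing_schedule` is hypothetical.

```python
def annealing_schedule(start_c=58.0, step_c=0.5, touchdown_cycles=10,
                       final_c=53.0, final_cycles=40):
    """Return the annealing temperature (deg C) for every PCR cycle in order."""
    touchdown = [start_c - step_c * i for i in range(touchdown_cycles)]
    return touchdown + [final_c] * final_cycles
```

Under this reading the touchdown phase ends at 53.5°C, one step above the 53°C used for the remaining 40 cycles.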

Hybridization of Samples to High-Density Oligonucleotide Arrays

Labeled DNA samples were incubated in hybridization buffer (3 M tetramethylammonium chloride, 10 mM Tris-HCl [pH 7.8], 0.01% Triton X-100, 100 μg/ml herring sperm DNA, and 50 pM control oligomer) at 99°C for 10 min and hybridized to a chip overnight at 50°C on a rotisserie at 25 rpm. Chips were washed twice in 1 × MES buffer (0.1 M 2-[N-morpholine]ethane sulfonic acid [pH 6.7], 1 M NaCl, and 0.01% Triton X-100), and incubated with 5 μg/ml streptavidin (Sigma-Aldrich) and 2.5 mg/ml acetylated bovine serum albumin (Sigma-Aldrich) in 1 × MES for 15 min on a rotisserie at room temperature (RT). After two washes with 1 × MES at 35°C, chips were incubated with antibody solution (1.25 μg/ml biotinylated antistreptavidin antibody [Vector Laboratories] and 2.5 mg/ml BSA in 1 × MES) for 15 min on a rotisserie at RT, followed by another two washes with 1 × MES at 35°C. Then, chips were stained with 1 μg/ml streptavidin-Cy-chrome conjugate (Molecular Probes) and 2.5 mg/ml BSA for 15 min on a rotisserie at RT, followed by two washes with 1 × MES at 35°C. Chips were incubated for 30 min at 37°C in 0.2 × SSPET (30 mM NaCl, 2 mM NaH2PO4, 0.2 mM EDTA [pH 7.4], 0.01% Triton X-100), followed by a wash with 1 × MES at RT. Hybridization of the labeled sample to the chip was detected using a confocal laser scanner (Perlegen) (Patil et al. 2001).

SNP Genotyping

For each SNP, we measured ratios of the mean intensity of perfect-match features for one allele to the sum of mean intensities for both alleles. In principle, these ratios should take on values near 1.0, 0.5, or 0.0 for AA, AB, or BB genotypes. We discarded data if, for both alleles, <9 out of 10 perfect-match features were brighter than their corresponding mismatch features. We used an expectation-maximization algorithm and a normal mixture model to assign intensity ratios to clusters.
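The ratio calculation, the detection filter, and the clustering step can be sketched in a few functions. This is a simplified illustration, not the study's implementation: each perfect-match feature is assumed to be paired with a single summarized mismatch intensity, the mixture has three components started at 0.0, 0.5, and 1.0, and the posterior threshold for making a call is an arbitrary assumption. All function names are hypothetical.

```python
import math

def allele_ratio(pm_a, pm_b):
    """Mean allele-A intensity over the summed mean intensities of both alleles."""
    mean_a = sum(pm_a) / len(pm_a)
    mean_b = sum(pm_b) / len(pm_b)
    return mean_a / (mean_a + mean_b)

def allele_detected(pm, mm, min_brighter=9):
    """True if at least `min_brighter` PM features exceed their paired MM value."""
    return sum(p > m for p, m in zip(pm, mm)) >= min_brighter

def keep_measurement(pm_a, mm_a, pm_b, mm_b):
    """Discard only when both alleles fail the PM > MM check."""
    return allele_detected(pm_a, mm_a) or allele_detected(pm_b, mm_b)

def _pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def fit_mixture(ratios, means=(0.0, 0.5, 1.0), sd0=0.1, n_iter=50, min_sd=0.01):
    """Fit a 1-D normal mixture by expectation-maximization."""
    k = len(means)
    mu, sd, w = list(means), [sd0] * k, [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each cluster for each ratio
        resp = []
        for x in ratios:
            p = [w[j] * _pdf(x, mu[j], sd[j]) for j in range(k)]
            total = sum(p) or 1e-300  # guard against numerical underflow
            resp.append([pj / total for pj in p])
        # M-step: update weight, mean, and spread of each cluster
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-9:
                continue  # empty cluster keeps its previous parameters
            w[j] = nj / len(ratios)
            mu[j] = sum(r[j] * x for r, x in zip(resp, ratios)) / nj
            var = sum(r[j] * (x - mu[j]) ** 2 for r, x in zip(resp, ratios)) / nj
            sd[j] = max(math.sqrt(var), min_sd)
    return mu, sd, w

def call_genotypes(ratios, mu, sd, w, labels=("BB", "AB", "AA"), min_posterior=0.95):
    """Assign each ratio to its most probable cluster, or None if ambiguous."""
    calls = []
    for x in ratios:
        p = [w[j] * _pdf(x, mu[j], sd[j]) for j in range(len(mu))]
        total = sum(p)
        best = max(range(len(p)), key=p.__getitem__)
        if total > 0 and p[best] / total >= min_posterior:
            calls.append(labels[best])
        else:
            calls.append(None)
    return calls
```

Ratios that do not reach the posterior threshold are returned as `None`, which corresponds to the ambiguous cluster assignments counted in the quality-control step.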

For the stratification analyses, we only used data for SNPs that showed consistently good genotyping results (table 1). We excluded SNPs that had a pass rate of <80% on the basis of the perfect-match/mismatch comparison. We also excluded SNPs for which fewer than three genotype clusters could be identified, as well as those that had >20 ambiguous cluster assignments. Many SNPs showed moderate departures from Hardy-Weinberg equilibrium, which would be expected in a heterogeneous population. We excluded only those SNPs showing extreme deviations that could be traced back to convergence failures of the clustering algorithm. For the 275 SNPs passing these criteria, the overall call rate was 98.4%. In a set of 24 individuals genotyped in triplicate for these SNPs, we had a concordance of 99.8%. The 275 SNPs and all individual genotype data used in this study have been submitted to dbSNP (ss12673803–ss12674077) (available on the dbSNP Web site). SNP positions in NCBI Build 33 are also shown in table A (online only).
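A Hardy-Weinberg check of the kind used as a quality filter can be sketched as below. The text does not say which test was applied, so the 1-df chi-square goodness-of-fit test here is an assumption, as are the function names; the default threshold of 0.00001 is the one listed in table 1.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Return (chi2, p_value) for a 1-df Hardy-Weinberg goodness-of-fit test."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

def passes_hwe(n_aa, n_ab, n_bb, threshold=1e-5):
    """True if the SNP is not an extreme outlier from Hardy-Weinberg proportions."""
    return hwe_chi_square(n_aa, n_ab, n_bb)[1] > threshold
```

A loose threshold suits this setting, because moderate departures from equilibrium are expected in an admixed population and only extreme ones point to genotyping failures.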

Table 1.

Quality-Control Checks for SNP Genotyping Results

| Data-Quality-Filter Criterion | No. of SNPs Passing | % Passing |
|---|---|---|
| Pass rate >80% | 309 | 99% |
| Three genotype clusters identified | 308 | 99% |
| <20 ambiguous calls | 305 | 98% |
| _P_>.00001 for Hardy-Weinberg equilibrium | 303 | 97% |
| Maximum cluster width | 282 | 90% |
| All criteria | 275 | 88% |

Table A.

SNP Positions, Alleles, and Flanking Sequences

dbSNP ID Chromosome Accession No. Position Reference Allele Alternate Allele Flanking 51-bp Sequence (−25 to +25)
ss12673885 1 NC_000001.4 9936378 C T AATAGAATAAAAGGGCCTAGAGTTACGGCATTTCACTTTGTAAAGGTTGCT
ss12673848 1 NC_000001.4 16347653 G A TATGGCTGACCAGGGGCATCTTTACGCATTGAACTCTCAGGTCACAAGTAT
ss12673890 1 NC_000001.4 31426127 A G TGACCATCTTGGCCAATTGCTCATCAAGTCCATGAAAGAAGATTGAATTTA
ss12673851 1 NC_000001.4 39299078 T C TATTCCTGAGTTTGGTACGCTTCAATGTTATATGCGGTTGTTCAGCCAAAC
ss12673850 1 NC_000001.4 48093199 G A CCCATTCCATGGACAGTAAAATTAAGGTTTGAGATCTCTAGGTATTCCCAC
ss12673891 1 NC_000001.4 54476998 G C CAAGAAACACCTCCAAGCTGAACCTGAATGTAGCTCCAGATTCTCATGGGA
ss12673892 1 NC_000001.4 71079585 G A CTCACTGTTCATGTTAACCGTCTGTGCTCTCTACACGGAGTGGAGCCCGTG
ss12673887 1 NC_000001.4 77499794 A T AAAGAGCCAACATTATATCCAACCAACTCTTGGCTCTAGACAATGAAGGTA
ss12673893 1 NC_000001.4 85573463 C A GCTGTTAAAGTTTCGTAGGCCGTAACCTGATGCCATGAGGCTCTGATATGT
ss12673894 1 NC_000001.4 93335361 G C GCTTCGTTCCCCTCTTACTTTTCACGTTACTACTTGCAATTCTCTAGCTCA
ss12673895 1 NC_000001.4 100468314 G T TTTGTGTTGCAAATGAGTTATAGAGGTGAATCCATGTGGGGCCAGAAAGTA
ss12673896 1 NC_000001.4 109070518 A C TGAGAAATCTCTTGAAAACCATTCTAAATTCCAGTTCCTATAAAATCAGAC
ss12673888 1 NC_000001.4 116699736 G A GCAACAGACCTGGCTAGGAGCTACAGATAGTTCCAACCAACCAGCTAGAAA
ss12673897 1 NC_000001.4 149656841 C G ACACACAGCATTCCAATGGGAGATTCAGGCCTAGAGCATGTCCTGTGGCTC
ss12673898 1 NC_000001.4 157108893 A G GCCACTGCTCTCAAAGGTACATATTAGGATGAGATACTGTTTACCCAGAGT
ss12673847 1 NC_000001.4 173882748 C T ACGGTCTGTATTGACTGGCTGCGCACGGACAAGTGTCATCTTGCCACACCT
ss12673899 1 NC_000001.4 180522150 C T ATAAAATATGACATCATTCTTACCACTGGTGCTGAAAAATGTCACTGATAC
ss12673900 1 NC_000001.4 189969066 A G AGAAAACCATGTAATTTTTCAGCTTAATTTAACATTGTATCTAGGGCAAGC
ss12673901 1 NC_000001.4 197042249 A G CAAAGTTAGGAAGGCCAATGAGAATACAATAAAGATTATGGGAGAGTAACA
ss12673902 1 NC_000001.4 214750406 G A ATAACCTACCATGTTTACCAGGTCCGCCAAGACTATGAAACAAATATATGT
ss12673903 1 NC_000001.4 221802714 G A GTGTAAGTTGTAATCACACAAAGGCGTAATCATGAAAAGTAGTAAAACATA
ss12673849 1 NC_000001.4 229272493 C T TTTACACATCAGTGCATTTTTGTATCTAGCACAATTCCTGGCATGTGGCTG
ss12673886 1 NT_077961.1 186534 T C AACTTGAGACATGTAATTGGGGTTTTGTAACTAGCTACTCACTATATAGCT
ss12673884 1 NT_077984.1 2495 G C ACAAGCTGATAATTACCATCATTTTGGAATTGTTCAGAACCATGAAGAATA
ss12673904 2 NC_000002.5 9639015 G A TTTGACACAAGGCAATAACTTCCGCGTAATGAGTACCCAAGAAAGTAGAAC
ss12673905 2 NC_000002.5 17853426 C G ACAGCAGAGTCAACTGGCTTCAGAACTGATCTTTTCCTCACTAATCCAAGT
ss12673906 2 NC_000002.5 26279750 A G AACCTTTGAATTTGTATTTGTCTGAATCATAGAATTTAGAATTATAATGGC
ss12673907 2 NC_000002.5 35537401 A G GCGAATAAAATAACAAGTCACATCAAGAAGTTGTGGCCTGATTTAAAACAA
ss12673854 2 NC_000002.5 41971450 A T TTCTGAACATACACCCAGGAATGTAATATTGTTCCGTTTTTGCAGAAGCTA
ss12673852 2 NC_000002.5 50417805 T G TTTAAGATTTGAAGCTTCTCAATAATATGGCTGCTTATATCAAGTTCTATA
ss12673908 2 NC_000002.5 76828447 A G ATCATGCAATTCAGGCAGGGAACCAATCTTTAGAAACTATACCCAGTTTAG
ss12673909 2 NC_000002.5 86713991 G A TGAGTTTTTCCTATTCAAGGAACCCGTGTTGATAATAACAGCAACCCCGGC
ss12673910 2 NC_000002.5 113628511 A G CAATAGCAACGTTTTTGAATCAGAGAAGTGATTTTGAACACACTGTACATA
ss12673911 2 NC_000002.5 123753048 A C TCTTTCCTCATTGTCTGCGATCTGGAAATAGAGCTTTCAGTTCTCATCACT
ss12673912 2 NC_000002.5 130675905 A G TAGAATCCATTCATTATTATGTGTGACTGGAGAGGTATATTGCTTAAAAAC
ss12673913 2 NC_000002.5 140961080 G A TGGCTCTTGACTCAGTAATCACTTTATGTCAAAATGTTTCCTAGTCTCCTT
ss12673914 2 NC_000002.5 157001034 T C ATCATCCCATTTCACATGAAGAAATTTGCATCTAGTGAGGTTCACTAACTT
ss12673915 2 NC_000002.5 166705382 G A AGGATGCTTCCTAAATTTCAGCAGAGGGATTATGATGCATTTATAAAGAAA
ss12673916 2 NC_000002.5 175050139 C T TACCATCATCCTAAATTGCTCCAGCCTGGAAAATTTTAAGTCAAATATCCC
ss12673917 2 NC_000002.5 183322843 A T AATGTACATTCAAAACAATCGAGGTAGGCTTTAAAGGAGCATTCAAAATCA
ss12673918 2 NC_000002.5 191081229 G A ATCAATATCACAAAAGACTTGCTATGAACTGTGCTAACTTGGGTATTTTTC
ss12673919 2 NC_000002.5 199615627 A G ACATGGCTAGCACATTGCTTGGCACATTCTCAGAGGTGAATAGGTCATTTT
ss12673853 2 NC_000002.5 207935369 C T CAACAGCCTTTCTCCATGAAGTTCCCCTGCAAGAAGCGTGAATCATACATG
ss12673920 2 NC_000002.5 216170075 A G TAGATTCCATGGTACCATGTTGAGAATTGTGCCTAGCTACTGAGAGTCTTT
ss12673921 2 NC_000002.5 225720515 C G TTTTAAAAATTATGCAGACCAGAGACTGTCAATTTAAGTCAGATCTGGGGC
ss12673922 2 NC_000002.5 233086431 G A TTCAGCCAGAGATACATCTTAATGAAGTGCTGACATTTTTCAGAGGATAAA
ss12674008 3 NC_000003.5 3699847 C T GAATCGACAAAGCTCTCCTGGAAAACGGCCATCTCATGGAAGTGATGGCCT
ss12674009 3 NC_000003.5 8874886 T C TTGTTTCCTTTAGATAACATCCTGCTGGTTACTGATCATACCTGTTGATGA
ss12674010 3 NC_000003.5 13413791 G A TATCCCTCAATAGCCCTGGGAACACGCTACTGAGAGCCACATTTTGGGGAT
ss12673856 3 NC_000003.5 21334999 G C TATACTTTTCATCAAGTGACAAGTTGTTCCCCATAGTAGCCTGCATGAAAC
ss12674011 3 NC_000003.5 30025863 T C CCTACCTGTGATGAACTTACTGGAATGGGAACTTTTCACTTTACAATTAGC
ss12674012 3 NC_000003.5 40271257 T C ATTAGGCAGACTTGATACCCTTATATGGCAGAACTTTAGAGCAACCACATT
ss12673855 3 NC_000003.5 47208180 T A AGTGTGGTTTTGCCTGTTGGGAAACTCTTCAGTCACACTTTTCCAAAAGTC
ss12674013 3 NC_000003.5 56092272 G A TTTGGCTTAAAAGGGGTACAATTAGGTCTTACTCATGCTGATTAAGGCAAA
ss12674014 3 NC_000003.5 70589259 C T CCGTTTGGGTTCAGCCTTACAGAGCCGTGATTTTGGCTACATCCTTTAGAA
ss12673857 3 NC_000003.5 84902112 A G ACGGTATAGTGCAAATCCTGACGGTAGGTTCTACAATTATGCTAATAGATT
ss12674015 3 NC_000003.5 99066776 T C GTGACCACTGACTTTTCAAGAGGTGTGAGGACAAGGCCAGATGACCATAGA
ss12674016 3 NC_000003.5 106351294 A G TCCAGGGAAAATACATTCCTGGCTGATCGTAGACAAGGGATATTGCCTGAA
ss12674017 3 NC_000003.5 114254346 A G GTCATTTAGACATGAATCAGAAGGCAAAATGTTTGGGCCTGACTAGAAAGA
ss12674018 3 NC_000003.5 122316930 G C TCCTTTCTCTGCCCATTTTTTCACTGTTGTTCCACGTTACTTTTCTTAATG
ss12674029 3 NC_000003.5 136194178 A G GTGGCTACAGATTAAAGGGTCAGTCATTGAAACATTGCTGGGATCACTGCT
ss12674030 3 NC_000003.5 144332113 T C GAACTCCTACTATGCGTTGGTTACTTGCCGGGCACTGGGAAACAAAGATGA
ss12674033 3 NC_000003.5 159534573 G A GAGAATTATTTCATGACTTTAGAGCGTAGTATTTTAGCATTTCAGCAGTAG
ss12674034 3 NC_000003.5 166984897 C T AAGTACAAAATTGTAGGTGGCAAATCGATACCCTGAGGGCTGCATATTTTA
ss12674035 3 NC_000003.5 174673711 C A ATATGCAGGGCACTGATTTTTGGATCGGAAAGGAGACTAAATTTTCCCCCT
ss12674036 3 NC_000003.5 182314045 G C AAGGATGGGTAGAGCGAGGACCTATCGGATGTGTAGGCAAAGCTGGTGGCC
ss12674037 3 NC_000003.5 190743817 G A TCACAGATAAATTCATGGCGCTTTAGCACAAGTATGGAACTTCACATTATA
ss12674052 4 NC_000004.5 733928 T C CATCCACTCCTCTTTCTAGTCTCCTTCCCGGGCGGTCAGGGAGTGTTGTCT
ss12674053 4 NC_000004.5 8720247 G C GTCTCCGGGTTGTTCTGAAACCATCGCAGGAATGATGCACACTGTGGCTGA
ss12673846 4 NC_000004.5 17350708 A C AACTTTTGAACCAGCTGAGAATTTCAAGTGTGATCGGTTTAGATTGGGATG
ss12673845 4 NC_000004.5 24834655 G A GACCATACTAGATGTCTCCCTCCCAGTGCAAGAAACCAACAGGCAGAAAAC
ss12674061 4 NC_000004.5 33801012 C T TTTAATCGGTTTTTATGTAAAAGCACGTGTCATTATGAACAAGGAATACAG
ss12674059 4 NC_000004.5 41840213 T C TTTTGAGAAGACATCACATTTTCTTCCAAAGGCAAGTACCCATTGACTTAG
ss12674060 4 NC_000004.5 48828164 T G CCCTGGATGGTGTTTACTGAACAAATTATCATCTCAGGCAGTATAACAATT
ss12674062 4 NC_000004.5 58397869 C T ACTGCAATGAGATTGAATAATGCCTCTAGGGTCTGAATAATACGTGTTGTC
ss12674065 4 NC_000004.5 66672503 T C AGGTAGGAGAGACCACTTTACGGCTTTCACCATTCACATTCCTCTTGGAGT
ss12674066 4 NC_000004.5 77117873 A G TTAATGGAGCCTTCTTTGAAATCTTGTAGTTCTGCTGCATGTTAATTGGCC
ss12674057 4 NC_000004.5 85230848 T C CCATTAAGACAGGTTCCTCACTGTATTGCTTATTCTCTTTCCAGTCTTTTT
ss12674058 4 NC_000004.5 92088698 T C CATGCTTGGTGATGAGCCTGGCATATGAAACAATTTCCTACAGTAATAAAA
ss12674054 4 NC_000004.5 100188103 G A AAGCTGCATTATAGAACACAATAACGAAACAGATGATGGTCTGTATACATG
ss12674055 4 NC_000004.5 108912304 A G AAGAAAATGCATTTGCAGGTTTACAAGAAGCAACCCAGAAATTCAATGGTA
ss12674056 4 NC_000004.5 115777312 C T GTATTAGGACTTTCACTGATACCGACATTTACTTTTAGGATTTCTAGGATT
ss12674067 4 NC_000004.5 124057410 G A ATTAAAACAGTATTAGATAGCATGCGGCTTCAAGAAGACAGCTCAGAAGAA
ss12674068 4 NC_000004.5 132889303 G A GCACCTAGCATGCAATAAGCCGCCAATGAACATCGCTGAGTTCTCAATTAT
ss12674073 4 NC_000004.5 141354227 G A CCTCCTGACTTCCCACTAATCAAGTGCACTAGCCCTCCTGGATATAAGCTG
ss12674075 4 NC_000004.5 149160357 C T GACGGAACTCCTGCATTCCAGAGTTCAGTGGTAAAAAGAACACTTTATGTA
ss12674063 4 NC_000004.5 158093442 A G AAATTATAGCCTGTTCAAATCATCCACAGACTTGTTCAAGAATAGGTAAAG
ss12674064 4 NC_000004.5 166155853 T C AGGAACTCACCACACATGAGCTATGTGTCCAGAAAACATTAACAGCTGCCT
ss12674076 4 NC_000004.5 175609800 T A GTGTTTGTTCATTCTGGGCAAATCATCCATTAACACTCAGAAGACTTTTAA
ss12674077 4 NC_000004.5 182884117 A T ACTGATGTTGTCTGAACCATGATTGAGGTCCTCCAAAAGTTATTCCCTTTA
ss12674027 5 NC_000005.4 2026753 C T CTACTCCAGCACCATTACCTCCTTACGCAAATGACCACAACTGAATTAACT
ss12673871 5 NC_000005.4 7856314 G A CCAGCCAGAACAAGAATCGTGGGGCGTTTTGGCAGCCTAGTCTAATCATTT
ss12673868 5 NC_000005.4 16716847 T C TAAAGTAGGGTGCACACTACAGTAATACCTCTAGCCCACAACTCCCAATCA
ss12673869 5 NC_000005.4 30043011 C A CATTGTAAAATATTCAATGCAGGAGCATAATCATGTTTGCCTTCTGATCTG
ss12674028 5 NC_000005.4 41185313 C A GATTGGTTGTTTTGGAACTGGAAAGCAACTTCCATCAGAACAATATTACAT
ss12673803 5 NC_000005.4 50605403 C T AGTGTTGAAAATGTGTCAGGCACTTCAGGGCATTGACCTATTAATCCTCCC
ss12673870 5 NC_000005.4 59427503 C T GACACTGAGAGAAGGGCCGCAATAACTCTCATGGCTCCTTTCCACTTTCTA
ss12674044 5 NC_000005.4 67386238 C T CTGCCTTGTTTACTGTGCTTTCTGACGCCTACCTCCAAGGTCAAAGGGGGA
ss12674045 5 NC_000005.4 74960399 G A CCTGGTTCTCTTCAGGATGGAATGAGCGCCCTCCACTTTGCCACTCAGAGC
ss12674041 5 NC_000005.4 82273510 G A ATTTGCTCTGGTAGAAATCATGTCCGCCCTTTTCCAGTTATCTTAGGTGAG
ss12674042 5 NC_000005.4 91333028 C T TGCTTGCATGAAATACCTATTCGCCCTCCATGAAACAGTTTGGAAAATGTT
ss12674046 5 NC_000005.4 100229818 T C AAAAGGTTGATTCTGAGATTTCCTATGATGGAAGTGTGGGAAAAAGTTAGG
ss12674047 5 NC_000005.4 108102517 T C CAGCTTATCTGGCCTGTTTCCTTTCTGGTTGGTTCGTTAGCTCCAGTCATG
ss12674072 5 NC_000005.4 115453797 T C AGGCTCTTCCCTGCGCAGTACAAGGTTTCCCTAGTAGTTTGGTTTGCCCAA
ss12674048 5 NC_000005.4 130864809 C T TTCAGTTTTTGCCTTTTAAGGCTCTCGTCCACTTTTAGCATGCTATTCTGT
ss12673867 5 NC_000005.4 139501013 C G TAAGCTGGCTGGTAAAGCCTCCACGGGACAACAGTTTCACTTTGCTTCGGG
ss12674049 5 NC_000005.4 147071503 A G TCGACAGTTATGAGAGACCTCAAAGATTCACAACATGAAGCCCTTTGTAAC
ss12674050 5 NC_000005.4 163553409 T C AATACTGGATGGAAAACATTTACAATGGGATAATAATAGAGCTAGAAACAG
ss12673872 5 NC_000005.4 171593857 T C GAACAAAAGAAGTCAATCAAGGGCATCAGTGACAATATTAACACCCAGAAT
ss12674051 5 NC_000005.4 177958304 G A GGAATGTGCAAAGGCCACGTCAGCAGATGGTTAGGTGCAATTTCACGCCTC
ss12673923 6 NC_000006.5 1504798 C T TGCTAAGTCTTCATTACAGGTTTCACTTTTTTATCGTCTATGACCACTATG
ss12673823 6 NC_000006.5 8572931 A T TTCTTATAGAATCAACCTTACTATGATCCTAAACTTTTGTTCTCAGAAACA
ss12673819 6 NC_000006.5 15865771 A C GTGGCTCGCGTGATATGAAAGGCCAAAGCATAGAGTTTCGTGAGGAAGAAG
ss12673924 6 NC_000006.5 25789873 C T CGAAAGCTATGAGCATTATGAATTCCTTCGTCACTGATATCTTTGAGCGTA
ss12673925 6 NC_000006.5 39933409 C A ATGTCCTTCTCCTGAACCACAGAAACGTGCTCTGCCTTAAGCACCTGTAAC
ss12673926 6 NC_000006.5 41245155 C T GTGAGACGCTGACTTTAGAAATAGCCGGTGATTACAGATTTAATTCATGTT
ss12673927 6 NC_000006.5 57634762 G C CCCAGGACCATTCCAGAGCTTATTCGTTCTACCTTGTTCTTCCTTGGGATG
ss12673820 6 NC_000006.5 66890421 A G CCCCAGGTACTTTGCATGTCTCACAACATTACGAATGGATAACTGAATCTC
ss12673928 6 NC_000006.5 75359591 T C TTTTCTTACTTTCTGCATAGTAATCTTTCATTCAGCACAGGACTTGAAAAC
ss12673929 6 NC_000006.5 83061795 C T TAATTCTATGTGGTAGCTACAGTTACCGATTCCGCTTATACAAAGTAATTG
ss12673930 6 NC_000006.5 92114257 C T GAAATGTGAATTTAGTATTTGTCAACTAATGCTGTTAAGTTAGAGACCTGT
ss12673818 6 NC_000006.5 109008107 T G GCATGAACTTGAGCACCTGAGTCCCTTGAATGCTGCTAAGGATAGGATGGA
ss12673825 6 NC_000006.5 117082712 T C TTCATACTTCTGTGCAATAGCTAATTGAGTTCCTGATTTAATGAATGATCT
ss12673931 6 NC_000006.5 125000422 T A AGAGCATATTGGTTACTTTGATTAATGGCTGATGATATTAAAACAGCATAG
ss12673822 6 NC_000006.5 133738141 C G TGCAAGTTTTGATCTAAATTGGCACCGACAAATTTTAAAACTATAGCCATT
ss12673824 6 NC_000006.5 152028449 G A TAAAAGCAGCCATGTCCAATTAGCAGTAAGTGCCATGCACCTGCAGTTACT
ss12673932 6 NC_000006.5 160529746 T G AATCCGTACATAGCTTTTGTTCATTGGATAATCGGGTGTAATATATGCAAA
ss12673821 6 NC_000006.5 169356087 T C GGATGTCCCTAAATCACGTTGTAACTGAGCAGACATTCACAGGGAAAACTT
ss12673843 7 NC_000007.7 2525072 A G ACTAACATCTTTCAAGTTTTTGGATAGACAATACATGCACAGAGTACCAAA
ss12673833 7 NC_000007.7 10586231 A G AGTCCATGTTGTCAATTCAGACCACACTTAGGGAATCAGACTCTCCAGGGA
ss12673834 7 NC_000007.7 19501292 T C ATTGATAGGTGCTGTCCACAAAGGTTTGGAATATAAAACCAGCACTGCTCT
ss12673835 7 NC_000007.7 27111483 G T ATTTTTTCACCTCTTGTGATATTCCGCCAAAGTAAACAATAGAGGTATTAC
ss12673839 7 NC_000007.7 36197746 G T TTGCTTTAATTACTCTGTACCTCATGTACTTGTAGTCTTTCTCACTATAAA
ss12673836 7 NC_000007.7 44314883 A G CCTTCCCATGTAACCTTCGGCTCTGAATGCACCTGAGTTTACCTAGCAAGC
ss12673841 7 NC_000007.7 68684292 A G TTGGGGCCAGGGCTCTGCACCTGGAAAGGCTTTATAACGTGAGATTCTCAA
ss12673831 7 NC_000007.7 81030028 C T GCAGTTGGGTATCTCAAGTGCCTGCCACAAGTAAATAGTTGTAAAAGCAAG
ss12673837 7 NC_000007.7 98149562 C T AAATGTAATACTCCACTCGAGCATGCGGCATTATTTAATCACTGATAGTTC
ss12673838 7 NC_000007.7 107294194 C T ATGAAAGGTATTAATCAGTCATTTCCGGCTCTTTATGTACAAGTGGTTCAT
ss12673832 7 NC_000007.7 113711846 A G AAACAAATGTGTTTTTGGAAACTAGATATGGTTTGGCTGCCTTCGAAATCT
ss12673933 7 NC_000007.7 124117659 C T AACTCTAGCTGCCATGTGATACTTACGAATTCCACCAGTATTTATTGGTTT
ss12673842 7 NC_000007.7 140143478 C T GTTTAACAGTAGAGTCCATTTTGTTCTCACTCAGCTGTTCTAGTTGAAGCA
ss12673840 7 NC_000007.7 148344112 G A TTCTGGGCCCAACTACAGTACAGACGTTGATGAGACCAACTCTGACTTTGG
ss12673963 8 NC_000008.5 6543128 C T AGATAATATTTAAAAAGTTTCATTCCGGGAGGCTTGGAACTATAGAGATAG
ss12673964 8 NC_000008.5 14891730 G C TTCCCATTATGTTCCACTTCTAATAGCTTTCACAAGACTGTCATAAACCAC
ss12673965 8 NC_000008.5 23192709 T A GAAAACGAGTCATCGTAAACTGAGCTGACCTGTACCCTACGCTGGAGAAAT
ss12673966 8 NC_000008.5 31938636 T C AATGTGCCAGGCACTGTGTTAAACTTCCAGATGGCAGTGAGAAACAAACTC
ss12673967 8 NC_000008.5 39612530 G C TATGAGTCTGGGCCAGCTGGAAACAGGTCTGGGATCTTCCAAGAAAGTCCT
ss12673968 8 NC_000008.5 49331802 G A ATCACAGCTGCCTGTTAACCAGCCTGAATGCAAAAAGTGAAAAAGCATTGC
ss12673969 8 NC_000008.5 56073823 C T TGAAGGAAGACGTAACAGCCAGAGTTCCTGTAAGAGCAAGAGAGGGTGGCT
ss12673970 8 NC_000008.5 63475926 C T CAAATATATTTCTGGCATACATCTTCCTTAACCTACATTATCCTCCTACTG
ss12673971 8 NC_000008.5 71825336 C G TTTCAGAAACCTAGGTCCAAAAGTCCTGCTAGGTATCTGGTATCTGGGATT
ss12673972 8 NC_000008.5 80590545 C T CGATAAGGTACTGCTTTAAGTTATTCTGAGGTCTTGCCTTTCTATAGACCC
ss12673973 8 NC_000008.5 87441087 A T TTTGGGAATTAGAATGCGTAGGTTAAGGTCCTAGTTCAATAAGTTAATGCC
ss12673974 8 NC_000008.5 96224852 C A TGAATCTTGGACTGTGCTACTTTGACTGTGAAAATACATTGACACTTGTGC
ss12673975 8 NC_000008.5 103925518 T C GTCCAGGTGACAACTCAGGAAAGAATTGCCACTTCGAAGCCGGAACACAAA
ss12673976 8 NC_000008.5 113781010 C T ATGACCAATTATTATTTGATGTGACCGATAGCTCCAGAACCTAAACAAATG
ss12673977 8 NC_000008.5 120241683 C T TATCTTCCTAAAGCAGAGCCAAAAACGTTGCTCTTCCAACTAAACATTTTC
ss12673978 8 NC_000008.5 129227046 A G AAGAAAAAGTTGACATTGTGATTACATATCAGTAGCATGACAAATTACATC
ss12673979 8 NC_000008.5 135579063 G A GTTCCAAGTGCACACCCTTTCTACTGTACTATCACAGCCTCTTGTTTCCCT
ss12674019 9 NC_000009.5 9383975 T C ACTTTCCTGATAGCTAGTGCTTTCATGATGCCCTTAGTGTCTACTGCCACG
ss12673873 9 NC_000009.5 17663047 C T GATTTCTTGCCCTGTTACCCTTACACGTGGCTGTTTGCCATGGTCTGTCAA
ss12673876 9 NC_000009.5 26070402 T A TTATAGGCATAATTTCTAACTCTCATTTAAGTGAGGCAGTTATCAATGTTG
ss12674020 9 NC_000009.5 33195409 T C GATGTATTAGCTGAGGGCCCAAAGTTGGGTAATGTGAGAAACCAGGACTCT
ss12673877 9 NC_000009.5 70506757 T C ACTAATAATTCCAGCCAATGTTTAGTGGAGATATTTCTTCTGACATTCTAA
ss12674021 9 NC_000009.5 77568465 C T TTAGTAAAGCCATTGTTCAAGCCATCGATATTAGGTTGTCAAATGTCTCTT
ss12674022 9 NC_000009.5 86491072 T C TCACAGTGTCCCTGTGTGATGCTCTTTTTTGACCCACACACTGTATAGGTC
ss12673874 9 NC_000009.5 91240102 C T TGGTGCCTGTGCAAAGAGTGGAACCCCAAAGAACACTGGGTGGTCAACACA
ss12673875 9 NC_000009.5 101820299 A G GCAGAGTTATATTTTGAAATATTGCAGTATTAGAAAAGCACATTATATATG
ss12674031 9 NC_000009.5 110194153 G A TGATGTGAGGATTTGAAACTTAGGCGGAATAGTAAGTACCAGGCATGGGCC
ss12674032 9 NC_000009.5 118689216 A G TCTGTACAAAGTGTATCATGGGACCATCCTATAAGGTTAAGCTTTCTCATT
ss12674023 9 NC_000009.5 126715684 A G CTCTGCATAAACTTGGAGAGAGGCCATTTCCTAATCAGAGGTCACAACTAG
ss12673997 10 NC_000010.4 509164 G C GATGTGAATCCACCTGTCACATATTGATTACATTCAGGCAATAACAGGGTG
ss12673998 10 NC_000010.4 9764292 G A TCACCTACATATGAGCAGCCTATCCGTCAGGCCAATGCTTAAGGTACCCCC
ss12673999 10 NC_000010.4 17081249 C A TTTCATTACCATTGTAATCTAGCCACAACAATGGTTGCTTTTTAAAACTAG
ss12674000 10 NC_000010.4 25905020 C T CAGGCAAGATCTCGTTTGTAAATTTCGTGGATTGAAAGTGAGGGACTAAGT
ss12674001 10 NC_000010.4 44291617 G A CTCCACAGCTGTTCCCAGGAATTTCGAAAGGGAGCACACCCTTGACTTGGT
ss12674002 10 NC_000010.4 53583207 G C TAGCTACTGCTCTTATTGAGGTTGTGTTTCTCTACTCCTCTGTAACATCGT
ss12674003 10 NC_000010.4 61618552 G A CTGATTTGCCTGTTAAAAGGCAGTAGGAAGGCAGTCCACCTGCTGTTTGCT
ss12673861 10 NC_000010.4 68562600 C A CCTTAACCATCACTTCTGCTGGAAACTTAGGGTGATCACCTTTTCCTAGAA
ss12674004 10 NC_000010.4 77221697 G T CAGCTTGGATTATTTTCCCCTGTCAGTTTAGCAATCAACAGCAATAAAAAC
ss12673859 10 NC_000010.4 83404531 C T CTGATTCATTGGTTCCTATATGGTGCCCCAAATTCTTAAGTCCTAATGCTC
ss12674005 10 NC_000010.4 92160155 C A GCTGAGGTTAGAAGCCTCCTTTCAACCCTGGTGAGAAGAGGTTGTACAGCG
ss12674006 10 NC_000010.4 101448303 A G GAGGCTAGATTCTGAAATGTTCCCAAGTCCAGCCATGAGGCCAAGGGAATC
ss12673860 10 NC_000010.4 110319618 T A TCACTTTTTCTGGTTTTAGCGAGGGTTCATTCGTTCATTCTAGCAGACAAA
ss12673858 10 NC_000010.4 117679716 G A ATTATGAAATCCATTCTCGAGTGGCGATTTTTTATGATGTTGTGTTATCAC
ss12674007 10 NC_000010.4 124558750 T A CGTGCAAGCCTAGTGAAACCAACCATGGGTCTCTCATCTGCTTTTACAGGA
ss12673947 11 NC_000011.4 9978639 A G CCTCTTCCACACTATTTTGGTAAACAGGACCAGCATTTATTCAGTCGCCTA
ss12673948 11 NC_000011.4 19526219 T G AACTTCTGTAATTTCCAATTCATGATGAAAGCCTAAGTAAAAATATCTGAC
ss12673949 11 NC_000011.4 26302239 T C ATTAATTCATTAGGAGCTTTTCCCATGTATGATCTGACACATTTCTGCCTT
ss12673950 11 NC_000011.4 34941032 G C AAATGTGTTTGATCTAGATCTCTTAGCAGTTTAATCCTGCATTCATAACCA
ss12673951 11 NC_000011.4 50463164 T C TTGAGGTTTTTGGCATCATTGGACATCATGAAATATGTAAATAAGATGGCA
ss12673952 11 NC_000011.4 58525253 T C AATTAAAAACAGGATGAGGAAAATTTGGTACATTCATTTGTATGCTTCAAT
ss12673953 11 NC_000011.4 73007371 A G GCTCTGTAAACCTCACAAACGCTCAATCTTTTTAGTCAATCAATCCTTTGC
ss12673954 11 NC_000011.4 79267467 T C AAAATGAAACTACACCTAATATCTATGAAGCCAATTGTACGTAGTAAAGAT
ss12673955 11 NC_000011.4 88615466 A G AACAATTCAAAAATCAGGGATCATAGCACTGACAAAAGCTCTAAAGTAATA
ss12673956 11 NC_000011.4 95783160 G A GTTTGTAGAACACACTAAGATGCTGAGAAGACTGCAGGTAAAGAGTTCTGC
ss12673957 11 NC_000011.4 102987050 A G AAATGGGTAAAGATTGCACGGGAGCAGTTACAACATTTCTACTTTTGTCCT
ss12673844 11 NC_000011.4 110769906 G A GTTTTCATCAGTTTTGTGGTCATACGTTTCTGATATGCTTCATTAATTGTT
ss12673958 11 NC_000011.4 119349702 T C AAATCTTCAATTTTGAAACCAAGTTTGTACTCTTGGCTGTAGAACCCCAAT
ss12673879 12 NC_000012.5 9734612 C T AGAGAGACCCTTCAAATACTGCTTACGTAACTTAAGAGTCAGCAATACTTG
ss12673980 12 NC_000012.5 25449289 C T GCTCTAGATTACCCATATAAAGTGGCTGGTTTTAGGCCTATGGCTTTTATT
ss12673981 12 NC_000012.5 33771690 G A CACATAGGCGATGTGGCTTCCAAGAGTCCCCTGGTCAGAGTAAGCCATGAT
ss12673982 12 NC_000012.5 41545985 C G GAAAAAGCAAACATTTTCATTGATAGAAGGGTGAGCCATCTTTGCCTTACT
ss12673983 12 NC_000012.5 48787992 T C CATACATCTCTTCAAAGCAGCAAGTTTGGCCATCTAGAACCACAATGGAAA
ss12673984 12 NC_000012.5 64435176 A G TCTTGCTGGGATGTCTAGACGTGGTAAAAGGTTTATCTGCTGTGCAATGGA
ss12673985 12 NC_000012.5 78087554 C T AGCTCAAGTGTGAGTCAGGCAATTACGAGTACTAGGAGGCAGGACCATCAT
ss12673986 12 NC_000012.5 85983132 T C CCTGTCTCATTCAAGTTGTATAGTATGAAATAGCATTATTGGAAGTTTTCT
ss12673987 12 NC_000012.5 94257563 A G TTACAAATCTGGAGATAACCAAATCATTTTTCGGATTTAAGTGAAGACACT
ss12673988 12 NC_000012.5 102304251 G A TTTCCAGTATAGCAAACTTAACTGCGTTCTCAAATAGTGCATTATGAACAT
ss12673880 12 NC_000012.5 109687937 A G ATTATCATTCTCAGATTTGATCCTTATAAATTCCATAGCTAAGACCCCTTG
ss12673989 12 NC_000012.5 120061185 C T CAAAGGCACAGAAAACTCAAAGAACCTCCCAAAGGCAACAATACACTCAGC
ss12673878 12 NC_000012.5 126352865 T C TGCTTTCTTGGAATATCCTCAAATTTGGTCACTCAGGTGACTTTGCTGAAA
ss12673990 13 NC_000013.5 37569214 A C TTTCACAATTTCTTTCTTGTGTCTCAACATTTTGTATGATTCATGAAAATG
ss12673991 13 NC_000013.5 46030774 C T CTAGGCAAATATGTATTGGTTCAGACACTATTCGAAATAGGGCTGTTGGCC
ss12673992 13 NC_000013.5 54733488 A G TGTTGGCGCATTTCAATTGCAGAGAAGTTTTCAAATGATTTTAATTTTTCC
ss12673993 13 NC_000013.5 61713070 A C TAGATAGGTATTATGGCTAAATGAAACAGTCACATCTACTATTTGTTGAAT
ss12673994 13 NC_000013.5 71242882 T C ATTTGGGGGATCTTGATTCCACCATTATCTATAGCTCCATCTAGGCTCCAG
ss12673881 13 NC_000013.5 79078549 A G TGTATTGGAATCCTTAGTGACTCACAGTATACATCCCATTAGATCTGCTGT
ss12673995 13 NC_000013.5 87134862 G T GTAAAGTATAACGGAGTCTACCATTGTATTGGGTACATGAGAAACAAATAA
ss12673996 13 NC_000013.5 104455053 G T AGAATATGTTCTGAAGTCTTTTCCTGTTGAATACCATCCAGAATTTTTAAA
ss12673809 14 NC_000014.4 21935494 A G GCTTGGTTCCAGTACATTATGGTATAAACTTTGGCTGCTGCCTCCTCAGCA
ss12673810 14 NC_000014.4 30627841 T C TAGAATTCAGGCAATGGCTTAATCATAAGGAACTACATGTGAGCCTAATGT
ss12673811 14 NC_000014.4 37903865 T A TGGATGGTTGTAGTGCACTGGGTTGTTTCAGGTAGGGATGACAAGGTTTTG
ss12673817 14 NC_000014.4 47597308 C A ATACACAAACAGGTCAGAAAGCTCCCAATGTAGCAGTTAAACAGTGTTTCC
ss12673812 14 NC_000014.4 56536816 A G TAGGCAACAGCCAGGTTTGACTGCCAACGATGCTAAGACAAGGAGATGAGG
ss12673813 14 NC_000014.4 64870703 C T ACATTTGCTGAATTACAAAGTAGTGCAGCTGTACATCAAGGCCAAAAGCTA
ss12673814 14 NC_000014.4 73249181 C T CATAATCTTGTAGTCTCAGGAGAAGCGGCCCTTCTGATGAGAGCTAATCCT
ss12673816 14 NC_000014.4 81522156 C T TTCTTTTTGCCTAATTGCAAACTTACGATATTCACAAAGACACAAATCTTA
ss12673815 14 NC_000014.4 98203350 G A TGTTTGCTATCCTGTGCTTGCCTCCGCTCTATCGGGCGCTGTGCCCCATCT
ss12674074 15 NC_000015.4 31015317 G A AGGTCCAAAACCTATCGCCTTGATAGAAATATGATATGGAAATCAGTAGGG
ss12674070 15 NC_000015.4 41408146 C T CCTACTCCATCCTCTACTGCTTCATCGCCCTCTAGTACTTGACTAACCTAC
ss12674071 15 NC_000015.4 46635808 T G TTAAAACATGAACTTGTTGTGCGTGTCTTGGATAGCAAAAAAAATCCCTCT
ss12674069 15 NC_000015.4 64802044 C T GAAACCTGGGCCAGGGATACATTTTCGCAGGTCCCGCAGACACTGCTAAGC
ss12673942 16 NC_000016.4 929722 C T AGATGGGAAGATACTTGTGATTTGACGGGAAGTAAAAAAACTTTGGTTATT
ss12673943 16 NC_000016.4 8210016 T G TTATAAACCAATCACCATTGAGAGGTTCCCCTTAGCCAGATCCTGGTTTAA
ss12673944 16 NC_000016.4 16027252 C T CCTATTTTGTACTTCTTATTTTATCCGATTGAATTGTGGTGGAGATAGGAA
ss12673882 16 NC_000016.4 22941613 A G AGAAAACAATGGAACAGTAACAATCGATCATTATGAGCTATCACCAAGACA
ss12673883 16 NC_000016.4 53860338 G A ACAACTATGAGATATTTCGTATTTTGAATGCCCCACAAATAAACAGATATT
ss12673945 16 NC_000016.4 60769659 A G TTAGCCTGTATTCCCATGAAAGATGACTCCAGAAACTTCAGAAGGATTGCT
ss12673946 16 NC_000016.4 68835084 T C TCCTGCCTTTCTTTACTGACCGTCCTGACGCTTTCAGTGAAGTGTCTCAAA
ss12673889 16 NC_000016.4 76218899 G C AGTAGCTATAATAACTTTGTCACATCAAACAAGATGAGTAAACTGGAATGT
ss12674043 16 NC_000016.4 83056636 G A ACAGCCTTATTAACTAACTCATCCCGCAGTTTTCAAAGAGCATGTATTTCT
ss12673862 17 NC_000017.5 17949595 T C GCTCTACAGAGGTCAGGACACAGCTCGGGGTCACGGCGCAAACCTTCAAGC
ss12674024 17 NC_000017.5 25674535 G C TTGTCCAGTAAGGCTGTCTCTACCAGGTAACACATGACTGCCAAGTGGGTA
ss12673863 17 NC_000017.5 33821968 G A TATTTTTATTTATCTCGGTCTTGACGGTCTGAATTACTGTGGCCTCCATGT
ss12674025 17 NC_000017.5 41821078 G A TCATGGAGGCAATTCCAGACAAAGGGATCAGTGCAAGCAAAGGAAGCGAGG
ss12673864 17 NC_000017.5 49396518 T C TATTTCCTATAATTCTCCTATTTGTTCCATGGCAGTTATCTAAAAATATAC
ss12673865 17 NC_000017.5 56570082 C A GAAAGAACCCACGGTTACTGACGGGCTTTAGCCATTACAGTGACACTCAAA
ss12673866 17 NC_000017.5 65526529 T C AAAGACGGAGGTCATGTTAGAGAGATTGTGAAAAGTAAAAATGTGTCAAAG
ss12674026 17 NC_000017.5 74688517 T C AACCCTGTACCCTTCTTCCTTGTGGTGCTCTCAGAACCCTTATGCATTACA
ss12673959 18 NC_000018.4 1287985 C G ATTATGGGACTGCTATCTTAGCCTACTAGAATGGAATCAGCATGGGGATCC
ss12674038 18 NC_000018.4 8059870 T C AAGAAGTAAGCTGGGATACAGAAAACTCACACCCTCAACACACGATCACTA
ss12673960 18 NC_000018.4 24641105 G C ATTTAACCTCATTTACTTTGTCCCTGTCATAGAACCTGTACTTGATGGATA
ss12673961 18 NC_000018.4 32549247 T A TGAAGTAAATACTGTGCATTCTTCAAACTGATTTGGGATCCTTCTGATACT
ss12673962 18 NC_000018.4 54949539 G A CTCTATTCCCTAAAGCAGGCTAAAGGTTTCACTGAAGTCTTATACTCTGTC
ss12674039 18 NC_000018.4 63533863 A G TGGAAATGGCTACATTATCATTTGCATAAGCCTCTCATGCAGAATTATCTC
ss12674040 18 NC_000018.4 70863198 G A TTTGTGCAAACTTCATACACTTCCAAATCTTCTGTAGCTGAGACGAGTGAA
ss12673934 19 NC_000019.5 206544 C T TAGTCATGAAGTTAATGATAAAAGACGACCCATGCCTTATTTATGTAATAA
ss12673935 19 NC_000019.5 7106531 A G GTTACTCACCCAACAATCTAATGCCACAAGAAAAAATAACTCGGGAACAGC
ss12673936 19 NC_000019.5 14842230 T C ATACCTTTCCTCCTGTTATTCCAACTCTGAACACATCAGTTTCCTGGGGGA
ss12673937 19 NC_000019.5 21737353 T C TTGGGTAAAGGTAAAACTGTGTCCATTACTCTCAGTCATCTTGGTTAGAAT
ss12673938 19 NC_000019.5 34028148 G A TCCACAGTCAGAAGACACGCTAGACGAAGGGCGTCCATCCAGTCTCAGCCC
ss12673939 19 NC_000019.5 41616809 T G GAGCCGAGTTCTTTCTTAAACTGCCGATTACATTCCCAATCATCTCTGAAA
ss12673940 19 NC_000019.5 49030287 T C GCTAAAAAAATGAGACTTGAAAAAATCCAGACTTTTGAAGAGTTTAGGAAA
ss12673941 19 NC_000019.5 57184140 C T GAGCAAGGTCTGAAGAGGAACAAAACGGTAAGTAATTAATAAAGCCTAAAT
ss12673830 20 NC_000020.5 1332298 T C GGTGAAACTGTAGCCAAAACTCTTATAAATTCTATGGTGGACATTTGGTGA
ss12673829 20 NC_000020.5 9620074 A C TTAGGCAACTGTCACGAAAATCATAAGACTCTACGGAAAGAAAAAGACTGT
ss12673827 20 NC_000020.5 18818060 A G TAGCCAAGAAAATCAAATTTCCACTATCCCGAGAAGGTTAGCTCTGTTGTT
ss12673828 20 NC_000020.5 38659006 C G TTCTGGCTCTTGGAAAGTCATTGTTCTCAAATGGGATGCCATGATTTGTAG
ss12673826 20 NC_000020.5 46896999 A G ATACTAAATAAAATATCTTTAAGCAATTTAGCAAGTAGCATCTTTGAAAAT
ss12673804 21 NC_000021.3 21432625 A G TCTGTAAATTGAAGATGATTACAGTAGTCGTAGTTCCCCAATCTTAAGCTA
ss12673805 22 NC_000022.4 15767574 G T GGTCGTGCCTCCTGCGGACCTGAGTGACCTCATGGAACAGAGCCAACGACA
ss12673806 22 NC_000022.4 25256704 G A CAGACAGGAACAAATCAGATGACCAGGAATTGAGAGACTGAACATTTCCTC
ss12673807 22 NC_000022.4 32926742 A G CCATGTGGACTCGCTACAGAGGTACATGCATAGGTCCAAGATAGGCGTCCC
ss12673808 22 NC_000022.4 42338946 C T TGTTAGAACCTTCTTTTTCTATAGACGGCCAGCACTGGCATGAAGAGATGC

Statistical Analysis

We used the structure program (Pritchard et al. 2000_a_) to identify population subgroups and infer admixture information from SNP genotype data. All runs were 100,000 cycles, after a 20,000-cycle burn-in period. We selected a model with admixture and with correlated allele frequencies; we used the defaults for other settings. We did not use prior information about population membership to direct the clustering. Without this information, the structure program cannot distinguish between solutions with permuted cluster labels; therefore, we manually assigned labels to clusters, for consistency across multiple analyses. Genetic distances (_F_ST) were calculated from _structure_’s allele-frequency estimates, as in the study by Weir (1996). False-discovery rates were calculated using Q-VALUE (available on the Q-VALUE Software Web site) (Storey and Tibshirani 2003). All other statistical analyses were performed with the R package (available on the R Project Web site) (Ihaka and Gentleman 1996).
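As a rough illustration of the false-discovery-rate step, the sketch below computes Benjamini-Hochberg adjusted P values. This is a swapped-in, simpler procedure and not the Q-VALUE software used in the study: Storey and Tibshirani's q values additionally estimate the proportion of true null hypotheses, and they coincide with the values below only when that proportion is fixed at 1. The function names are hypothetical.

```python
def bh_adjust(p_values):
    """Return Benjamini-Hochberg adjusted P values, in the input order.

    Assumption: this stands in for q values with the null proportion set to 1.
    """
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, so each adjusted value is the
    # minimum of p * m / rank over all ranks at or above the current one.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def count_significant(p_values, level=0.05):
    """Count tests whose adjusted P value falls below the chosen level."""
    return sum(q < level for q in bh_adjust(p_values))
```

Counting SNPs with adjusted values below 0.05 corresponds to the "q<.05" style of summary, under the stated simplification.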

Results

Assessment of Population Structure

A total of 707 individuals recruited in Mexico City were selected for genotyping. The majority of subjects (655) were of Mestizo (“mixed”) ancestry; small numbers of individuals of self-reported Caucasian (23) and Otomi Indian (29) ancestry were also included. Using high-density oligonucleotide arrays, we genotyped these subjects for 312 uniformly spaced, unlinked SNPs. Of the 312 markers, 275 yielded high-quality genotype data. Many of the SNPs showed larger-than-expected allele-frequency differences between the three subpopulations, measured as an excess of small P values in χ² tests (table 2). Controlling for false-discovery rate (Storey and Tibshirani 2003), we also counted SNPs having q values < 0.05 and found many significant associations. The q value method accounts for multiple testing, and it indicates the number of SNPs with significant associations such that, on average, only 5% will be false positives.

Table 2.

Association Test Results for Population Subgroups with 275 SNPs

Number of SNPs with χ² Test Statistics
Comparison P<.0001 P<.001 P<.01 P<.1 q<.05
Expected 0 0 2.75 27.5 0
Caucasian—Mestizo 2 5 23 85 8
Otomi—Mestizo 0 1 15 50 0
Otomi—Caucasian 3 14 34 105 32
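The "Expected" row of table 2 follows from multiplying the number of SNPs by each P value threshold (for example, 275 × 0.01 = 2.75). The sketch below reproduces that arithmetic and adds a Pearson χ² test on a 2×2 table of allele counts. The 2×2 allele-count layout is an assumption made for illustration; the exact contingency table used in the study is not stated here, and the function names are hypothetical.

```python
import math


def expected_null_counts(n_snps, thresholds):
    """Expected number of SNPs with P below each threshold if no SNP differs."""
    return {t: n_snps * t for t in thresholds}


def allele_chi2(a1, a2, b1, b2):
    """Pearson chi-square (1 df) for allele counts in two groups.

    a1, a2: counts of the two alleles in group A; b1, b2: the same in group B.
    Assumes every row and column total is nonzero.
    Returns (statistic, P value).
    """
    n = a1 + a2 + b1 + b2
    row_totals = (a1 + a2, b1 + b2)
    col_totals = (a1 + b1, a2 + b2)
    observed = ((a1, a2), (b1, b2))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, the upper tail of the chi-square
    # distribution equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

With 275 SNPs, the thresholds .0001 and .001 give expected counts of 0.0275 and 0.275, which the table rounds to 0.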

We analyzed these genotype data for population structure using the structure program (Pritchard et al. 2000_a_; available on the Pritchard Lab Web site). This is a model-based method for identifying subpopulations in which, within each subpopulation, all markers are in Hardy-Weinberg and linkage equilibrium. The analysis supported the presence of two genetically distinct population clusters, one of mostly European ancestry (“cluster A”), and one of mostly Indian ancestry (“cluster B”). The estimated cluster-membership proportions for self-reported Caucasian and Otomi Indian samples are well separated; Mestizo samples are uniformly distributed across nearly the full range of values (fig. 1). There was no strong evidence for models with more than two population clusters. On the basis of their estimated allele frequencies, we determined a genetic distance of _F_ST = 0.14 between the two clusters. Phenotype information and cluster-membership proportions for each sample are reported in table B (online only).
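To make the genetic-distance calculation concrete, here is a minimal sketch of a heterozygosity-based _F_ST for two clusters at biallelic SNPs. It is a simplified estimate with equal cluster weights and no sample-size correction, not the Weir (1996) calculation used in the study; the function name and the example frequencies are hypothetical.

```python
def fst_two_clusters(freqs_a, freqs_b):
    """Simplified F_ST between two clusters from per-SNP allele frequencies.

    freqs_a, freqs_b: frequency of the same reference allele at each SNP
    in cluster A and cluster B. Uses (H_T - H_S) / H_T, summing numerator
    and denominator across SNPs before dividing.
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("frequency lists must have the same length")
    numerator = 0.0
    denominator = 0.0
    for p_a, p_b in zip(freqs_a, freqs_b):
        p_mean = (p_a + p_b) / 2.0
        # Expected heterozygosity in the pooled population.
        h_total = 2.0 * p_mean * (1.0 - p_mean)
        # Mean expected heterozygosity within the two clusters.
        h_within = (2.0 * p_a * (1.0 - p_a) + 2.0 * p_b * (1.0 - p_b)) / 2.0
        numerator += h_total - h_within
        denominator += h_total
    if denominator == 0.0:
        raise ValueError("all SNPs are monomorphic; F_ST is undefined")
    return numerator / denominator
```

Under this definition, identical frequencies in both clusters give 0, and a SNP fixed for different alleles in the two clusters gives 1.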

Figure 1.


Distribution of ancestry for self-reported population subgroups. Density distributions for the inferred fraction of subjects with cluster A ancestry are shown for 655 Mestizo, 23 Caucasian, and 29 Otomi Indian subjects. Each tick mark represents the fractional ancestry of an individual subject.

Table B.

Sample Phenotypes and Inferred Cluster Membership Proportions

Subject Ethnicity Sex Height (cm) Cluster A Cluster B
0b7 Mestizo Male .3126 .6874
0b8 Mestizo Male .1795 .8205
0bb Mestizo Male .1457 .8543
0bc Mestizo Male .4727 .5273
0bd Caucasian Male .7723 .2277
0be Caucasian Male .9331 .0669
0bf Caucasian Male .8838 .1162
0bg Mestizo Male .4903 .5097
0bh Otomi Male .4210 .5790
0bi Mestizo Male .3954 .6046
0bj Mestizo Female .3851 .6149
0bk Mestizo Male 162 .6158 .3842
0bl Mestizo Male 170 .9062 .0938
0bm Caucasian Male 168 .5929 .4071
0bn Mestizo Male 166 .5515 .4485
0bo Mestizo Male 162 .5460 .4540
0br Mestizo Male 165 .6342 .3658
0bs Mestizo Male 174 .8261 .1739
0bt Mestizo Male 158 .3408 .6592
0bu Caucasian Male 180 .9376 .0624
0bv Caucasian Male 185 .8000 .2000
0bw Mestizo Male 174 .7291 .2709
0by Mestizo Male 161 .1883 .8117
0bz Mestizo Male 163 .5132 .4868
0c0 Mestizo Female 156 .4868 .5132
0c1 Mestizo Male 165 .8240 .1760
0c2 Mestizo Male 159 .3869 .6131
0c4 Mestizo Male 165 .2128 .7872
0c6 Mestizo Male 169 .7532 .2468
0c7 Mestizo Male 155 .1457 .8543
0c9 Mestizo Male 162 .4311 .5689
0ca Mestizo Male 168 .6268 .3732
0cb Mestizo Female 150 .6889 .3111
0cc Mestizo Male 164 .6794 .3206
0cd Mestizo Male 160 .1669 .8331
0ce Mestizo Male 163 .5118 .4882
0cf Mestizo Male 165 .4008 .5992
0cg Otomi Male 155 .3292 .6708
0ch Mestizo Male 175 .3374 .6626
0ci Mestizo Male 157 .1288 .8712
0cj Mestizo Male 170 .6687 .3313
0ck Mestizo Male 165 .4926 .5074
0cl Mestizo Male 155 .0853 .9147
0co Mestizo Female 141 .1381 .8619
0cp Mestizo Male 165 .6193 .3807
0cq Caucasian Male 179 .9381 .0619
0cr Mestizo Male 160 .2016 .7984
0cs Mestizo Male 168 .6686 .3314
0ct Mestizo Male 177 .5407 .4593
0cu Mestizo Male 170 .6671 .3329
0cv Mestizo Male 174 .4032 .5968
0cw Mestizo Male 170 .2612 .7388
0cx Mestizo Male 179 .3576 .6424
0cy Mestizo Male 170 .6408 .3592
0cz Mestizo Male 157 .6199 .3801
0d0 Mestizo Male 160 .6221 .3779
0d1 Mestizo Male 175 .6839 .3161
0d2 Mestizo Male 173 .3033 .6967
0d3 Mestizo Male 175 .4385 .5615
0d4 Mestizo Female 149 .2430 .7570
0d5 Mestizo Male 170 .7654 .2346
0d7 Mestizo Male 165 .6873 .3127
0d8 Mestizo Male 173 .2555 .7445
0d9 Caucasian Male 171 .9000 .1000
0da Mestizo Male 183 .3681 .6319
0db Mestizo Female 165 .9016 .0984
0dc Caucasian Male 170 .9298 .0702
0dd Mestizo Male 175 .6059 .3941
0df Caucasian Male 181 .5684 .4316
0dh Caucasian Male 172 .6811 .3189
0di Mestizo Male 157 .7941 .2059
0dj Caucasian Male 178 .7389 .2611
0dk Mestizo Male 175 .9220 .0780
0dl Mestizo Male 172 .4538 .5462
0dm Mestizo Female 136 .2183 .7817
0dn Mestizo Male 166 .3694 .6306
0do Mestizo Male 168 .5476 .4524
0dp Mestizo Male 166 .1653 .8347
0dq Mestizo Male 170 .7309 .2691
0dr Caucasian Female 158 .7624 .2376
0ds Caucasian Female 160 .9170 .0830
0dv Caucasian Female 151 .5181 .4819
0dw Mestizo Male 178 .5863 .4137
0dx Mestizo Male 167 .6549 .3451
0dy Mestizo Male 170 .4953 .5047
0dz Caucasian Male 171 .8717 .1283
0e0 Caucasian Male 160 .2408 .7592
0e1 Mestizo Male 172 .5484 .4516
0e2 Mestizo Male 171 .6700 .3300
0e4 Caucasian Male 168 .6855 .3145
0e5 Mestizo Male 167 .5110 .4890
0e6 Mestizo Male 158 .1879 .8121
0e7 Caucasian Male 165 .6337 .3663
0e9 Caucasian Male 173 .8050 .1950
0ea Caucasian Male 178 .8034 .1966
0eb Mestizo Male 161 .8818 .1182
0ed Mestizo Male 153 .1724 .8276
0ee Mestizo Male 179 .5559 .4441
0ef Mestizo Male 170 .1890 .8110
0eh Mestizo Male 170 .6383 .3617
0ei Mestizo Male 158 .1396 .8604
0ek Mestizo Male 161 .5511 .4489
0eo Mestizo Male 155 .1599 .8401
0ep Mestizo Female 153 .1997 .8003
0eq Mestizo Female 154 .1193 .8807
0er Mestizo Male 166 .1575 .8425
0et Mestizo Female 156 .8033 .1967
0eu Mestizo Male 172 .1903 .8097
0ev Mestizo Female 144 .0896 .9104
0ex Mestizo Male 163 .5826 .4174
0f0 Mestizo Male 162 .1381 .8619
0f1 Mestizo Male 160 .3733 .6267
0f2 Mestizo Male 151 .1719 .8281
0f3 Caucasian Male 191 .8856 .1144
0f4 Mestizo Male 164 .7052 .2948
0f5 Mestizo Male 160 .2973 .7027
0f8 Caucasian Male 184 .8615 .1385
0fb Mestizo Male 160 .0949 .9051
0fc Mestizo Male 167 .2901 .7099
0fe Mestizo Male 168 .1597 .8403
0fh Mestizo Male 160 .2974 .7026
0fj Mestizo Male 160 .0968 .9032
0fl Mestizo Male 163 .3957 .6043
0fm Mestizo Male 168 .0731 .9269
0fn Mestizo Male 164 .0996 .9004
0fo Mestizo Male 160 .6349 .3651
0fq Mestizo Male 154 .4988 .5012
0fr Mestizo Male 166 .2022 .7978
0fs Mestizo Female 154 .4985 .5015
0ft Mestizo Male 155 .4147 .5853
0fu Mestizo Male 168 .9101 .0899
0fv Mestizo Male 167 .8343 .1657
0fw Mestizo Male 166 .4248 .5752
0g1 Mestizo Female 118 .2731 .7269
0g2 Mestizo Male 171 .5132 .4868
0g5 Mestizo Male 154 .7244 .2756
0g6 Mestizo Male 171 .3916 .6084
0g7 Mestizo Female 159 .5678 .4322
0g8 Mestizo Male 160 .3190 .6810
0g9 Mestizo Male 165 .3160 .6840
0gb Mestizo Female 144 .1203 .8797
0gc Mestizo Male 185 .5447 .4553
0ge Mestizo Male 162 .2413 .7587
0gh Mestizo Male 165 .7255 .2745
0gj Mestizo Male 167 .1250 .8750
0gk Mestizo Male 165 .6056 .3944
0gl Mestizo Male 162 .4868 .5132
0gm Mestizo Male 160 .0690 .9310
0gn Mestizo Male 161 .1788 .8212
7op Mestizo Male 149 .2217 .7783
7oq Mestizo Male 162 .0618 .9382
7ou Mestizo Male 163 .4906 .5094
7ov Mestizo Male 170 .3040 .6960
7ow Mestizo Male 164 .1904 .8096
7oy Mestizo Male 159 .6188 .3812
7oz Mestizo Male 179 .7177 .2823
7p2 Mestizo Male .5777 .4223
7p3 Mestizo Male 157 .2603 .7397
7p4 Mestizo Male 169 .4466 .5534
7p5 Mestizo Male 168 .5085 .4915
7p7 Mestizo Male 178 .5731 .4269
7p8 Mestizo Female 160 .8835 .1165
7pa Mestizo Male 181 .9286 .0714
7pb Mestizo Male 165 .6542 .3458
7pd Mestizo Male 162 .2041 .7959
7pi Mestizo Male 157 .5592 .4408
7pj Mestizo Male 174 .4927 .5073
7pl Mestizo Male 179 .5644 .4356
7pm Mestizo Male 159 .4851 .5149
7pn Mestizo Male 162 .4956 .5044
7pr Mestizo Male 195 .5563 .4437
7ps Mestizo Male 176 .5133 .4867
7pu Mestizo Male 172 .4362 .5638
7pw Mestizo Male 158 .0942 .9058
7px Mestizo Male 169 .6493 .3507
7pz Mestizo Male 161 .4779 .5221
7q0 Mestizo Male 156 .7743 .2257
7q1 Mestizo Male 162 .2425 .7575
7q2 Mestizo Male 166 .1697 .8303
7q3 Mestizo Male 172 .5815 .4185
7qa Mestizo Female 146 .4247 .5753
7qc Mestizo Male 162 .3309 .6691
7qf Mestizo Male 162 .4822 .5178
7qh Mestizo Male 163 .1427 .8573
7qj Mestizo Male 169 .3919 .6081
7qm Mestizo Male 155 .6480 .3520
7qn Mestizo Male 155 .4069 .5931
7qp Mestizo Male 167 .6990 .3010
7qq Mestizo Male 165 .7216 .2784
7qv Mestizo Male 158 .5001 .4999
7qw Otomi Male 180 .6493 .3507
7qx Mestizo Male 168 .4843 .5157
7qz Mestizo Male 164 .5880 .4120
7r1 Mestizo Male 142 .3875 .6125
7r2 Mestizo Male 162 .4021 .5979
7r3 Mestizo Male 160 .0884 .9116
7r4 Mestizo Male 168 .5629 .4371
7r5 Mestizo Male 154 .1291 .8709
7r6 Mestizo Female 150 .1211 .8789
7ri Mestizo Male 176 .8348 .1652
7rj Mestizo Male 160 .2914 .7086
7rm Mestizo Male 158 .4687 .5313
7rq Mestizo Male 163 .2027 .7973
7rr Mestizo Male 177 .2619 .7381
7rs Mestizo Male 160 .0780 .9220
7ru Mestizo Male 172 .5849 .4151
7rw Mestizo Male 154 .0928 .9072
7ry Mestizo Male 151 .2050 .7950
7rz Mestizo Female 155 .3809 .6191
7s0 Mestizo Female 163 .5029 .4971
7s1 Mestizo Male 178 .6896 .3104
7s3 Mestizo Male 157 .4147 .5853
7s4 Mestizo Male 155 .3096 .6904
7s5 Mestizo Male 156 .3947 .6053
7s6 Mestizo Male 156 .1174 .8826
7s7 Mestizo Male 173 .6185 .3815
7s9 Mestizo Male 160 .4705 .5295
7sb Mestizo Male 168 .3815 .6185
7sd Mestizo Male 160 .5431 .4569
7se Otomi Male 150 .1733 .8267
7sf Otomi Male 168 .2148 .7852
7sg Otomi Male 158 .1271 .8729
7sh Otomi Female 143 .2110 .7890
7si Otomi Female 143 .0713 .9287
7sj Otomi Male 159 .2113 .7887
7sk Otomi Female 153 .2440 .7560
7sl Otomi Male 166 .1136 .8864
7sm Otomi Female 157 .3143 .6857
7sn Otomi Female 150 .1732 .8268
7so Otomi Male 152 .1007 .8993
7sp Otomi Male 166 .2582 .7418
7sr Otomi Male 154 .2083 .7917
7ss Otomi Male 178 .1095 .8905
7st Otomi Male 146 .2718 .7282
7su Mestizo Male 167 .3750 .6250
7sv Mestizo Male 166 .6236 .3764
7sw Mestizo Male 160 .1936 .8064
7sz Mestizo Male 160 .6043 .3957
7t0 Mestizo Male 161 .4253 .5747
7t2 Mestizo Male 176 .7161 .2839
7t4 Mestizo Male 178 .8549 .1451
7t5 Mestizo Male 180 .7728 .2272
7t7 Mestizo Male 174 .6574 .3426
7t8 Mestizo Male 156 .1293 .8707
7t9 Mestizo Male 170 .6342 .3658
7ta Mestizo Male 168 .2329 .7671
7tb Mestizo Male 166 .6250 .3750
7tc Mestizo Male 169 .3303 .6697
7td Mestizo Male 162 .1408 .8592
7te Mestizo Male 165 .1545 .8455
7tf Mestizo Male 172 .9084 .0916
7tg Mestizo Male 178 .9062 .0938
7th Mestizo Male 181 .6093 .3907
7ti Mestizo Male 180 .8071 .1929
7tk Mestizo Male 158 .4396 .5604
7tl Mestizo Male 156 .3333 .6667
7tn Mestizo Male 160 .4235 .5765
7to Mestizo Female 153 .6285 .3715
7tp Mestizo Female 154 .4776 .5224
7tq Mestizo Male 161 .1615 .8385
7tr Mestizo Male 168 .1676 .8324
7ts Mestizo Male 160 .1171 .8829
7tu Mestizo Male 165 .4346 .5654
7tv Mestizo Male 156 .1421 .8579
7tx Mestizo Female 155 .5290 .4710
7ty Mestizo Male 159 .2850 .7150
7tz Mestizo Male 168 .2214 .7786
7u0 Mestizo Female 152 .5419 .4581
7u3 Mestizo Male 163 .1941 .8059
7u4 Mestizo Male 160 .4576 .5424
7u5 Mestizo Male 161 .2100 .7900
7u6 Mestizo Male 167 .5917 .4083
7u8 Mestizo Male 168 .9431 .0569
7u9 Mestizo Male 165 .1774 .8226
7ua Mestizo Male 163 .4069 .5931
7ub Mestizo Male 172 .9570 .0430
7uc Mestizo Male 190 .7985 .2015
7ue Mestizo Male 172 .5063 .4937
7uf Mestizo Male 165 .5739 .4261
7ug Mestizo Male 157 .4205 .5795
7uh Mestizo Male 170 .3555 .6445
7uk Mestizo Male 160 .1563 .8437
7ul Mestizo Male 168 .5278 .4722
7um Mestizo Male 174 .7976 .2024
7un Mestizo Male 165 .3994 .6006
7up Mestizo Male 160 .4159 .5841
7uq Mestizo Male 165 .4985 .5015
7ur Mestizo Male 158 .1292 .8708
7us Mestizo Male 180 .8326 .1674
7ut Mestizo Male 163 .9325 .0675
7uu Mestizo Male 174 .6588 .3412
7uv Mestizo Male 167 .6644 .3356
7uw Mestizo Male 185 .9276 .0724
7ux Mestizo Male 180 .3082 .6918
7uy Mestizo Male 180 .6943 .3057
7uz Mestizo Male 186 .8417 .1583
7v1 Mestizo Male 172 .9133 .0867
7v2 Mestizo Female 156 .7400 .2600
7v3 Mestizo Male 170 .1986 .8014
7v4 Mestizo Male 170 .4294 .5706
7v5 Mestizo Male 163 .4120 .5880
7v6 Mestizo Male 170 .3065 .6935
7v7 Mestizo Male 173 .6868 .3132
7v8 Mestizo Male 178 .8400 .1600
7v9 Mestizo Male 162 .2877 .7123
7va Mestizo Male 173 .6125 .3875
7vc Mestizo Male 183 .3608 .6392
7vd Mestizo Male 186 .3975 .6025
7ve Mestizo Male 163 .6361 .3639
7vj Mestizo Male 172 .3938 .6062
7vk Mestizo Male 182 .7810 .2190
7vl Mestizo Male 166 .5863 .4137
7vm Otomi Male 162 .1494 .8506
7vo Mestizo Male 164 .3427 .6573
7vp Mestizo Male 163 .5622 .4378
7vq Mestizo Male 186 .0568 .9432
7vw Otomi Female 148 .1872 .8128
7vz Mestizo Female 159 .2409 .7591
7w0 Mestizo Male 170 .3778 .6222
7w2 Otomi Female 153 .5968 .4032
7w3 Mestizo Male 165 .1099 .8901
7w7 Otomi Male 166 .0945 .9055
7w8 Mestizo Male 159 .1765 .8235
7w9 Mestizo Male 160 .3967 .6033
7wa Mestizo Male 162 .5551 .4449
7wc Mestizo Male 180 .4988 .5012
7wn Mestizo Male 180 .4742 .5258
7wp Mestizo Male 178 .5604 .4396
7ws Mestizo Male 166 .3801 .6199
7wt Mestizo Male 172 .8085 .1915
7wu Mestizo Female 157 .4878 .5122
7wv Mestizo Male 177 .6316 .3684
7ww Mestizo Male 180 .3873 .6127
7wx Mestizo Female 156 .4296 .5704
7wy Mestizo Male 175 .4363 .5637
7wz Mestizo Male 163 .1446 .8554
7x0 Mestizo Male 167 .7575 .2425
7x1 Mestizo Male 170 .2288 .7712
7x2 Mestizo Female 162 .4132 .5868
7x3 Mestizo Male 170 .4404 .5596
7x4 Mestizo Male 162 .6423 .3577
7x5 Mestizo Male 160 .2939 .7061
7x6 Mestizo Male 168 .5462 .4538
7x7 Mestizo Male 182 .5964 .4036
7x8 Mestizo Female 162 .6169 .3831
7x9 Mestizo Male 180 .7221 .2779
7xb Mestizo Male 180 .7376 .2624
7xd Mestizo Male 168 .2357 .7643
7xf Mestizo Male 168 .3490 .6510
7xl Mestizo Male 175 .5976 .4024
7xm Mestizo Male 157 .3636 .6364
7xp Mestizo Male 161 .4115 .5885
7xx Mestizo Male 170 .3199 .6801
7xy Mestizo Male 158 .1743 .8257
7y3 Mestizo Male 187 .5545 .4455
7y5 Mestizo Male 178 .9041 .0959
7ya Mestizo Male 173 .6812 .3188
7yb Mestizo Male 175 .7895 .2105
7yc Mestizo Male 183 .8914 .1086
7yd Mestizo Male 178 .9484 .0516
7ye Mestizo Male 175 .4782 .5218
7yf Mestizo Male 171 .5358 .4642
7yh Mestizo Male 180 .7300 .2700
7yi Mestizo Male 162 .3142 .6858
7yj Mestizo Male 176 .5502 .4498
7yl Mestizo Male 164 .6590 .3410
7ym Mestizo Female 158 .2836 .7164
7yn Mestizo Male 170 .6483 .3517
7yo Mestizo Male 165 .2106 .7894
7yv Mestizo Male 154 .3878 .6122
7yw Mestizo Male 158 .2596 .7404
7yy Otomi Male 168 .4977 .5023
7z6 Otomi Female 145 .2737 .7263
7z8 Mestizo Male 168 .3650 .6350
7zd Mestizo Male 180 .4340 .5660
7zf Mestizo Male 162 .0632 .9368
7zg Mestizo Male 163 .4694 .5306
7zi Mestizo Male 162 .3179 .6821
7zl Mestizo Male 187 .7883 .2117
7zm Mestizo Male 182 .8448 .1552
7zo Mestizo Male 158 .2280 .7720
7zp Mestizo Male 160 .6799 .3201
7zq Mestizo Male 176 .5255 .4745
7zs Mestizo Male 185 .6608 .3392
7zy Mestizo Male 175 .7408 .2592
7zz Mestizo Male 175 .5542 .4458
800 Mestizo Male 179 .0730 .9270
804 Mestizo Male 179 .4978 .5022
808 Mestizo Male 180 .9080 .0920
80a Mestizo Male 175 .3391 .6609
80c Mestizo Male 160 .6212 .3788
80g Mestizo Male 178 .8452 .1548
80i Mestizo Male 160 .6255 .3745
80k Mestizo Male 175 .5195 .4805
80p Mestizo Male 185 .4407 .5593
80r Mestizo Male 175 .6390 .3610
80u Mestizo Male 178 .9401 .0599
80x Mestizo Male 162 .4841 .5159
cd6 Mestizo Female 160 .4590 .5410
cei Mestizo Male 160 .4917 .5083
cek Mestizo Male 160 .2380 .7620
cem Mestizo Male 160 .3231 .6769
cen Mestizo Male 158 .4957 .5043
cep Mestizo Male 156 .3283 .6717
ceq Mestizo Male 180 .2894 .7106
ces Mestizo Male 175 .6730 .3270
cev Mestizo Male 160 .4306 .5694
cf0 Mestizo Female 158 .5227 .4773
cf3 Mestizo Female 160 .4316 .5684
cff Mestizo Male 163 .2125 .7875
cfm Mestizo Female 160 .4760 .5240
cfr Mestizo Male 186 .5059 .4941
cfs Mestizo Male 175 .5130 .4870
cfu Mestizo Male 155 .0912 .9088
cfv Mestizo Male 158 .3552 .6448
cfy Mestizo Male 163 .4233 .5767
cg1 Mestizo Female 155 .4061 .5939
cg2 Mestizo Female 160 .1271 .8729
cg3 Mestizo Male 157 .3245 .6755
cg4 Mestizo Male 160 .7861 .2139
cg6 Mestizo Male 164 .5470 .4530
cg8 Mestizo Male 164 .4259 .5741
cg9 Mestizo Male 167 .3282 .6718
cga Mestizo Male 161 .2202 .7798
cgc Mestizo Male 162 .5532 .4468
cgf Mestizo Male 164 .5785 .4215
cgg Mestizo Male 175 .6731 .3269
cgi Mestizo Male 170 .8933 .1067
cgj Mestizo Male 180 .6012 .3988
cgk Mestizo Male 168 .7003 .2997
cgm Mestizo Female 148 .0753 .9247
cgo Mestizo Male 154 .1075 .8925
cgp Mestizo Male 161 .3718 .6282
cgq Mestizo Male 175 .4953 .5047
cgr Mestizo Male 160 .5099 .4901
cgs Mestizo Female 158 .1542 .8458
cgt Mestizo Male 168 .2268 .7732
cgu Mestizo Male 160 .1672 .8328
cgv Mestizo Male 160 .8747 .1253
cgw Mestizo Female 152 .3015 .6985
cgx Mestizo Male 162 .4679 .5321
cgy Mestizo Male 158 .4983 .5017
cgz Mestizo Male 160 .4002 .5998
ch1 Mestizo Male 158 .5599 .4401
ch2 Mestizo Male 156 .4946 .5054
ch5 Mestizo Male 160 .6014 .3986
ch6 Mestizo Male 159 .3855 .6145
ch7 Mestizo Male 160 .3793 .6207
chc Mestizo Male 165 .4911 .5089
chd Mestizo Male 162 .1948 .8052
che Otomi Male 160 .0890 .9110
chf Mestizo Male 168 .4535 .5465
chg Mestizo Male 160 .3978 .6022
chh Mestizo Male 173 .1835 .8165
chi Mestizo Male 160 .4183 .5817
chm Mestizo Male 175 .5505 .4495
chn Mestizo Male 155 .5957 .4043
cho Mestizo Male 170 .6976 .3024
chp Mestizo Male 160 .4180 .5820
chq Mestizo Male 164 .3477 .6523
chr Mestizo Male 160 .2536 .7464
chs Mestizo Male 168 .2898 .7102
cht Mestizo Male 162 .5725 .4275
chu Mestizo Male 162 .2614 .7386
ci0 Mestizo Male 184 .8965 .1035
ci1 Mestizo Female 165 .7463 .2537
ci3 Mestizo Male 168 .2474 .7526
ci4 Mestizo Female 153 .3485 .6515
ci7 Mestizo Male 172 .5412 .4588
ci8 Mestizo Male 158 .3928 .6072
ci9 Mestizo Male 165 .0887 .9113
cia Mestizo Male 160 .1353 .8647
cid Mestizo Male 170 .5194 .4806
cie Mestizo Male 170 .5424 .4576
cif Mestizo Male 170 .3959 .6041
cig Mestizo Male 169 .5562 .4438
cih Mestizo Female 150 .3813 .6187
cii Mestizo Male 168 .2885 .7115
cij Mestizo Male 170 .2932 .7068
cik Mestizo Female 165 .3344 .6656
cil Mestizo Male 170 .3358 .6642
cim Mestizo Male 166 .5217 .4783
cir Mestizo Male 188 .4899 .5101
cis Mestizo Male 162 .4245 .5755
ciu Mestizo Male 176 .4796 .5204
ciw Mestizo Male 165 .6046 .3954
cix Mestizo Male 180 .6664 .3336
cj1 Mestizo Male 189 .5819 .4181
cj2 Mestizo Male 170 .4985 .5015
cj3 Mestizo Male 178 .4031 .5969
cj5 Mestizo Male 178 .7656 .2344
cj6 Mestizo Male 178 .7587 .2413
cj7 Mestizo Male 163 .7340 .2660
cj8 Mestizo Female 158 .5880 .4120
cja Mestizo Male 171 .2849 .7151
cjc Mestizo Male 152 .2346 .7654
cjd Mestizo Male 164 .2892 .7108
cje Mestizo Female 144 .0960 .9040
cjf Mestizo Male 174 .4539 .5461
cjk Mestizo Male 162 .2774 .7226
cjl Mestizo Male 163 .5032 .4968
cjm Mestizo Male 169 .2475 .7525
cjp Mestizo Male 162 .2344 .7656
cjq Mestizo Male 159 .5760 .4240
cjt Mestizo Male 161 .0901 .9099
cju Mestizo Male 173 .8877 .1123
cjv Mestizo Female 165 .4959 .5041
cjw Mestizo Male 185 .3050 .6950
cjx Mestizo Male 175 .3155 .6845
cjz Mestizo Female 165 .2790 .7210
ck0 Mestizo Male 158 .1991 .8009
ck1 Mestizo Male 165 .3378 .6622
ck2 Mestizo Male 160 .5444 .4556
ck4 Mestizo Male 183 .2729 .7271
ck9 Mestizo Male 158 .1429 .8571
cka Mestizo Male 160 .1603 .8397
ckb Mestizo Male 171 .3592 .6408
ckc Mestizo Male 164 .2318 .7682
ckd Mestizo Male 165 .2490 .7510
cke Otomi Male 158 .2185 .7815
ckf Mestizo Male 175 .2882 .7118
ckg Mestizo Male 170 .4503 .5497
cki Mestizo Male 165 .2414 .7586
ckj Mestizo Male 172 .3228 .6772
ckk Mestizo Male 170 .7612 .2388
ckl Mestizo Male 172 .5816 .4184
ckm Mestizo Male 165 .1543 .8457
cko Mestizo Male 159 .4716 .5284
ckq Mestizo Male 167 .3547 .6453
ckr Mestizo Male 173 .6474 .3526
cks Mestizo Male 170 .2036 .7964
ckt Mestizo Male 182 .6745 .3255
cku Mestizo Male 180 .5302 .4698
ckv Mestizo Male 168 .2537 .7463
cky Mestizo Male 160 .1361 .8639
cl0 Mestizo Male 163 .7293 .2707
cl2 Mestizo Male 170 .6975 .3025
cl3 Mestizo Male 160 .2114 .7886
cl4 Mestizo Female 150 .1104 .8896
cl8 Mestizo Male 164 .8745 .1255
cl9 Mestizo Male 160 .5484 .4516
cla Mestizo Male 160 .1149 .8851
clb Mestizo Male 160 .0654 .9346
cld Mestizo Male 160 .3909 .6091
cle Mestizo Male .6304 .3696
clf Mestizo Male 156 .3609 .6391
clg Mestizo Male 162 .3228 .6772
clh Mestizo Male 158 .3679 .6321
cli Mestizo Male 168 .2824 .7176
clj Mestizo Male 160 .4434 .5566
cll Mestizo Male 166 .3035 .6965
clm Mestizo Male 165 .6688 .3312
clo Mestizo Female 173 .7143 .2857
clp Mestizo Male 170 .1587 .8413
clq Mestizo Male 151 .4306 .5694
clu Otomi Female 163 .2285 .7715
cly Mestizo Male 170 .5207 .4793
clz Mestizo Female 159 .1337 .8663
cm0 Mestizo Female 160 .0803 .9197
cm1 Mestizo Female 158 .3935 .6065
cm2 Mestizo Female 160 .1687 .8313
cm3 Mestizo Male 174 .5492 .4508
cm4 Mestizo Male 152 .0764 .9236
cm5 Mestizo Male 163 .0531 .9469
cm6 Mestizo Male 175 .0951 .9049
cm8 Mestizo Male 156 .0731 .9269
cm9 Mestizo Male 173 .2500 .7500
cma Mestizo Male 176 .8621 .1379
cmb Mestizo Male 172 .6413 .3587
cmc Mestizo Male 153 .2075 .7925
cmd Mestizo Male 158 .4901 .5099
cme Mestizo Female 148 .5704 .4296
cmf Mestizo Male 161 .3058 .6942
cmg Mestizo Male 162 .2842 .7158
cmi Mestizo Male 165 .4758 .5242
cmj Mestizo Male 172 .2286 .7714
cmk Mestizo Male 169 .3634 .6366
cml Mestizo Male 165 .1749 .8251
cmn Mestizo Male 178 .4953 .5047
cmo Mestizo Male 160 .5291 .4709
cmp Mestizo Male 179 .5358 .4642
cms Mestizo Male 173 .6984 .3016
cmt Mestizo Male 172 .2734 .7266
cmu Mestizo Male 165 .2056 .7944
cmv Mestizo Male 172 .6267 .3733
cmw Mestizo Male 169 .0696 .9304
cn0 Mestizo Male 164 .3236 .6764
cn1 Mestizo Male 165 .8299 .1701
cn2 Mestizo Male 175 .7790 .2210
cn3 Mestizo Male 182 .8640 .1360
cn4 Mestizo Male 197 .5495 .4505
cn7 Mestizo Male 179 .8030 .1970
cn8 Mestizo Male 170 .4233 .5767
cn9 Mestizo Male 169 .4747 .5253
cnb Mestizo Male 187 .5456 .4544
cnc Mestizo Male 170 .5493 .4507
cnd Mestizo Male 175 .6508 .3492
cne Mestizo Male 178 .6549 .3451
cnf Mestizo Male 165 .3321 .6679
cnh Mestizo Male 175 .5416 .4584
cni Mestizo Male 168 .2476 .7524
cnl Mestizo Male 180 .7485 .2515
cnm Mestizo Male 187 .4098 .5902
cnn Mestizo Male 175 .3086 .6914
cno Mestizo Male 178 .5785 .4215
cnq Mestizo Male 176 .7999 .2001
cnr Mestizo Male 178 .9065 .0935
cns Mestizo Male 177 .7149 .2851
cnz Mestizo Male 177 .8719 .1281
co1 Mestizo Male 185 .6890 .3110
co3 Mestizo Male 162 .7017 .2983
co6 Mestizo Male 150 .3495 .6505
co7 Mestizo Male 182 .9448 .0552
co9 Mestizo Male 195 .9177 .0823
cod Mestizo Male 176 .4690 .5310
coi Mestizo Male 161 .5186 .4814
cok Mestizo Male 175 .8392 .1608
con Mestizo Male 160 .6052 .3948
coo Mestizo Male 175 .5601 .4399
cop Mestizo Male 177 .7855 .2145
cor Mestizo Male 175 .6703 .3297
cou Mestizo Male 170 .6554 .3446
cov Mestizo Male 173 .7484 .2516
cox Mestizo Male 158 .5018 .4982
coy Mestizo Male 172 .3146 .6854
coz Mestizo Female 150 .4496 .5504
cp1 Mestizo Male 158 .2756 .7244
cp2 Mestizo Male 158 .1475 .8525
cp4 Mestizo Male 160 .4598 .5402
cp5 Mestizo Male 164 .3057 .6943
cp6 Mestizo Male 161 .3064 .6936
cp7 Mestizo Male 180 .8891 .1109
cp8 Mestizo Male 155 .2484 .7516
cp9 Mestizo Male 170 .0968 .9032
cpc Mestizo Male 160 .5259 .4741
cpd Mestizo Male 198 .7882 .2118
cpf Mestizo Male 155 .3186 .6814
cpi Mestizo Male 178 .7113 .2887
cpj Mestizo Male 178 .1082 .8918
cpn Mestizo Male 182 .5435 .4565
cpr Mestizo Male 176 .5803 .4197
cps Mestizo Male 184 .9105 .0895
cpv Otomi Male 168 .4396 .5604
cpw Mestizo Male 140 .1643 .8357
cpz Otomi Female 145 .2631 .7369
cq0 Mestizo Male 172 .5564 .4436
cq3 Mestizo Male 180 .4671 .5329
cq4 Mestizo Male 182 .6424 .3576
cq7 Mestizo Male 160 .3037 .6963
cq9 Mestizo Male 180 .3245 .6755
cqd Mestizo Male 186 .5922 .4078
cqe Mestizo Male 160 .3377 .6623
cqh Mestizo Male 160 .3452 .6548
cqi Mestizo Male 175 .7030 .2970
cqk Mestizo Male 179 .7167 .2833
cql Mestizo Male 162 .5204 .4796
cqm Mestizo Male 160 .2244 .7756
cqv Mestizo Male 163 .4053 .5947
cqy Mestizo Male 150 .1871 .8129
cr0 Mestizo Male 160 .6213 .3787
cr1 Mestizo Male 160 .4885 .5115
cr2 Mestizo Male 158 .6159 .3841
cr3 Mestizo Male 177 .3927 .6073
cr6 Mestizo Male 180 .6227 .3773
cr9 Mestizo Male 172 .1045 .8955
cra Mestizo Male 178 .4386 .5614
crb Mestizo Male 163 .1118 .8882
crc Mestizo Male 180 .8329 .1671
cre Mestizo Male 162 .6336 .3664
crg Mestizo Male 177 .3747 .6253
cri Mestizo Male 186 .5990 .4010
crn Mestizo Male 178 .4081 .5919
crq Mestizo Male 177 .3416 .6584
crr Mestizo Male 175 .8971 .1029
cs1 Mestizo Male 178 .8305 .1695
cs6 Mestizo Male 158 .2088 .7912
csd Mestizo Male 153 .4577 .5423
csf Mestizo Male 183 .3101 .6899
csg Mestizo Male 161 .4497 .5503
csi Mestizo Male 185 .8559 .1441
csr Mestizo Male 175 .9115 .0885
css Mestizo Male 167 .5829 .4171
cst Mestizo Male 160 .8267 .1733
csw Mestizo Male 168 .6293 .3707
ct0 Mestizo Male 160 .4537 .5463
ct2 Mestizo Male 187 .1484 .8516
cta Mestizo Male 184 .8225 .1775
cti Mestizo Male 179 .3820 .6180
ctk Mestizo Male 185 .5056 .4944
ctl Mestizo Male 155 .4277 .5723
ctm Mestizo Male 191 .8045 .1955
ctn Mestizo Male 175 .7304 .2696
ctq Mestizo Male 186 .6809 .3191
cts Mestizo Male 177 .5180 .4820
ctt Mestizo Male 176 .9071 .0929
ctv Mestizo Male 178 .6137 .3863
ctx Mestizo Male 176 .9351 .0649
cu2 Mestizo Male 187 .6762 .3238
cu3 Mestizo Male 168 .2758 .7242
cu4 Mestizo Male 170 .2379 .7621
cu5 Mestizo Male 180 .6858 .3142
cu7 Mestizo Male 175 .5640 .4360
cu8 Mestizo Male 181 .6415 .3585
cu9 Mestizo Male 183 .5631 .4369
cuc Mestizo Male 175 .6701 .3299
cuh Mestizo Male 176 .6290 .3710
cuk Mestizo Male 180 .3272 .6728

The admixture model used in the structure program assumes a unimodal distribution of individual admixture proportions. However, we found that our inclusion of small numbers of Caucasian and Otomi samples in the analysis did not significantly perturb the admixture estimates for the Mestizo samples. A separate analysis of just the Mestizo samples, which might be expected to better fit the unimodal admixture model, yielded admixture proportions that had a correlation of 0.9994 with the full analysis (data not shown). Thus, this analysis seems to be robust against some limited misspecification of the admixture model.

Association of Ancestry with Height

We compared the inferred ancestry information for individuals selected to represent the tallest and shortest 25% of male Mestizo subjects. Of the samples that were genotyped, we identified 164 short and 166 tall individuals. Height is strongly correlated with the inferred proportion of cluster A ancestry (fig. 2), and many spurious allele-frequency differences occur solely as a result of differences in ancestry between the tall and short groups (table 4, all samples). This is an extremely stratified population, and there are multiple SNPs with χ²-test P values of <10⁻⁸. This level of significance would exceed genomewide significance thresholds for 1 million independent SNP association tests with conservative (e.g., Bonferroni) adjustment for multiple testing, which would clearly be a problem if these groups were to be used in the type of genomewide association study described above.
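As a rough illustration of the single-SNP test behind these counts, the following sketch computes a 1-df Pearson χ² statistic from allele counts in two groups and compares the resulting P value with a Bonferroni threshold for 1 million tests. The function name, the example allele counts, and the 0.05 family-wise error rate are illustrative assumptions and are not taken from the study.

```python
import math

def allele_chi2(case_alt, case_total, ctrl_alt, ctrl_total):
    """Pearson chi-square (1 df) on a 2x2 table of allele counts.

    Returns (chi2, p). Totals are numbers of chromosomes (2 per subject).
    """
    table = [[case_alt, case_total - case_alt],
             [ctrl_alt, ctrl_total - ctrl_alt]]
    rows = [case_total, ctrl_total]
    cols = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    n = case_total + ctrl_total
    if min(rows) == 0 or min(cols) == 0:
        return 0.0, 1.0  # monomorphic marker or empty group: no evidence
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 df, the chi-square upper tail equals erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p

# Hypothetical SNP: allele frequency ~0.60 in 166 tall subjects (332 alleles)
# and ~0.35 in 164 short subjects (328 alleles).
chi2, p = allele_chi2(199, 332, 115, 328)
bonferroni = 0.05 / 1_000_000  # assumed family-wise error rate of 0.05
print(f"chi2 = {chi2:.1f}, P = {p:.2e}, passes threshold: {p < bonferroni}")
```

A frequency difference of this size could arise from ancestry alone in a stratified sample, which is why the threshold comparison by itself says nothing about causality.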

Figure 2.

Distribution of ancestry versus height categories. Density distributions for the inferred fraction of subjects with cluster A ancestry are shown for 164 short and 166 tall subjects. Each tick mark represents the fractional ancestry of an individual subject.

Table 4.

Association Test Results for Height in 275 SNPs

Number of SNPs with χ² Test Statistics

| Data Set | P<.0001 | P<.001 | P<.01 | P<.1 | q<.05 |
|---|---|---|---|---|---|
| Expected | 0 | 0 | 2.75 | 27.5 | 0 |
| All samples | 22 | 38 | 69 | 126 | 94 |
| Random subset | 10 | 20 | 44 | 106 | 62 |
| Matched subset | 0 | 0 | 7 | 35 | 0 |
| Leave out 20% | 0 | 0 | 6 | 39 | 0 |
| Linear adjusted | 0 | 0 | 4 | 44 | 0 |

Matching Based on Average Ancestry Estimates

We composed new groups using subsets of the tall and short individuals, so the groups would have the same average proportions of ancestry in clusters A and B, while retaining as many samples as possible. This involved removing tall samples with the highest proportions of cluster A ancestry and short samples with the lowest proportions of cluster A ancestry. We were able to retain 98 tall samples and 98 short samples with this matching strategy. Ancestry proportions before and after matching are shown in table 3. For a direct comparison, 98 samples were also selected at random from the lists of tall and short samples. The random and matched groups were tested for significant allele-frequency differences (table 4, random and matched subsets). Matching removed most evidence for population structure. An overall test for stratification that was based on the sum of χ² statistics (Pritchard and Rosenberg 1999) for the matched set gave a P value of ∼.005, versus ∼10⁻⁷¹ for the randomly selected set. The distribution of P values for the 275 SNPs is more nearly uniform for the matched groups (fig. 3), and no markers showed significant association after controlling the false-discovery rate.
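The trimming step can be sketched as a simple greedy loop. This is only one plausible reading of the procedure, under stated assumptions: one sample is dropped from each group per step, the loop stops once the group means agree within a tolerance, and the tall group starts with the higher mean. The function name and the tolerance `tol` are hypothetical.

```python
from statistics import mean

def match_by_mean_ancestry(tall, short, tol=0.005):
    """Trim two groups until their mean cluster A ancestry agrees within tol.

    tall, short: lists of per-individual cluster A ancestry proportions.
    Assumes mean(tall) > mean(short) at the start, as in the height example.
    Returns the retained (tall, short) ancestry lists.
    """
    tall = sorted(tall)                  # highest ancestry at the end
    short = sorted(short, reverse=True)  # lowest ancestry at the end
    while (len(tall) > 1 and len(short) > 1
           and mean(tall) - mean(short) > tol):
        tall.pop()    # drop the tall sample with the most cluster A ancestry
        short.pop()   # drop the short sample with the least cluster A ancestry
    return tall, short

# Toy example with made-up ancestry proportions.
tall_kept, short_kept = match_by_mean_ancestry(
    [0.9, 0.8, 0.7, 0.6, 0.5, 0.4],
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
)
print(len(tall_kept), len(short_kept), mean(tall_kept), mean(short_kept))
```

Dropping samples in pairs keeps the retained groups close in size, but it matches only the means; the shapes of the two ancestry distributions may still differ.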

Table 3.

Average Proportion of Ancestry in Cluster A, for Tall and Short Groups

| Data Set | No. of Subjects in Tall Group | No. of Subjects in Short Group | Proportion of Ancestry in Cluster A in Tall Group | Proportion of Ancestry in Cluster A in Short Group |
|---|---|---|---|---|
| All samples | 166 | 164 | .62 | .36 |
| Matched subset | 98 | 98 | .48 | .48 |

Figure 3.

Cumulative distribution of P values for 275 SNPs, for the random and ancestry-matched subsets of tall and short subjects. In the absence of population structure, the P values should be uniformly distributed, and their cumulative distribution should be a straight line from (0,0) to (1,1). The random subset shows an excess of small P values, whereas the matched subset has a nearly uniform distribution.

In the previous analysis, the SNPs used to test for associations were the same ones used for the stratification analysis. Although the stratification analysis is blind to the phenotype, in principle, this analysis could underestimate the residual population structure expected for other SNPs not included in the stratification analysis. To address this, we split the 275 SNPs into five random subsets of 55. For each subset, we performed a stratification analysis of the other 220 SNPs, matched tall and short groups on the basis of that analysis, and then tested for association in the 20% that had been left out. Then we combined results for all the subsets, yielding a test result for each SNP stratified by use of what was, for that SNP, an independent set of data. Results (table 4, leave-out-20% data set) were essentially the same as for matching on all 275 SNPs, and there were no significant associations.
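The partitioning used in this leave-out analysis can be sketched as follows. Only the bookkeeping is shown; the stratification analysis and matching that would run on each 220-SNP set are outside the sketch. The function name and the fixed random seed are assumptions made for illustration.

```python
import random

def leave_out_folds(n_snps=275, n_folds=5, seed=0):
    """Yield (analysis_snps, held_out_snps) index lists, one pair per fold.

    Each SNP index is held out exactly once, so every SNP is tested against
    groups matched with an ancestry analysis that did not include it.
    """
    rng = random.Random(seed)
    order = list(range(n_snps))
    rng.shuffle(order)
    fold_size = n_snps // n_folds
    for k in range(n_folds):
        start = k * fold_size
        # The last fold absorbs any remainder when n_snps is not divisible.
        stop = n_snps if k == n_folds - 1 else start + fold_size
        held_out = order[start:stop]
        analysis = order[:start] + order[stop:]
        yield analysis, held_out

for analysis, held_out in leave_out_folds():
    # Placeholder: infer ancestry from `analysis`, match groups, then test
    # only the SNPs in `held_out` for association.
    print(len(analysis), len(held_out))
```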

Matching Based on an Ancestry-Adjusted Phenotype

An alternative approach to eliminating stratification for a quantitative trait is to define groups on the basis of a phenotype that has been adjusted to remove effects of ancestry differences. We performed a linear regression of height against the inferred fraction of cluster A ancestry for the male Mestizo subjects in our study and determined that a 10% increase in cluster A ancestry corresponded, on average, to a 1.8-cm increase in height. We adjusted height by subtracting out this contribution, and we selected the tallest and shortest 98 individuals on the basis of the adjusted phenotype. We did not see any significant associations using these groups (table 4, linear adjusted).
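A minimal sketch of this adjustment, assuming an ordinary least-squares fit of height on cluster A ancestry, is given below. The helper names and the toy data are hypothetical; a reported effect of 1.8 cm per 10% ancestry corresponds to a slope of about 18 cm per unit of ancestry proportion.

```python
from statistics import mean

def fit_slope(ancestry, height):
    """Least-squares slope of height (cm) on cluster A ancestry proportion."""
    a_bar, h_bar = mean(ancestry), mean(height)
    sxx = sum((a - a_bar) ** 2 for a in ancestry)
    if sxx == 0:
        raise ValueError("ancestry values are all identical")
    sxy = sum((a - a_bar) * (h - h_bar) for a, h in zip(ancestry, height))
    return sxy / sxx

def adjust_heights(ancestry, height):
    """Subtract the fitted ancestry contribution from each height."""
    slope = fit_slope(ancestry, height)
    return [h - slope * a for a, h in zip(ancestry, height)]

def select_extremes(adjusted, k):
    """Return indices of the k shortest and k tallest adjusted phenotypes."""
    order = sorted(range(len(adjusted)), key=lambda i: adjusted[i])
    return order[:k], order[-k:]

# Toy data: heights generated with a slope of 18 cm per unit ancestry.
ancestry = [0.1, 0.3, 0.5, 0.7, 0.9]
height = [160 + 18 * a for a in ancestry]
print(fit_slope(ancestry, height))
short_idx, tall_idx = select_extremes(adjust_heights(ancestry, height), 2)
```

In the study's setting, `k` would be 98 and the inputs would be the measured heights and inferred ancestry proportions of the male Mestizo subjects.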

In principle, adjusting for ancestry should yield a cleaner phenotype and a more powerful study design than the simple strategy of matching the mean ancestry of case and control groups. A comparison of the distributions of height and inferred ancestry for the two designs (fig. 4) shows that the regression design includes fewer individuals with relatively mild ancestry-adjusted phenotypes and intermediate ancestry coefficients, and more individuals with extreme ancestry-adjusted phenotypes and ancestry coefficients. The regression design may be more challenging to implement, however, if it requires collecting genotype data for additional individuals to accurately determine the relationship between phenotype and ancestry.

Figure 4.

Comparison of a matching strategy with independently determined cutoffs for height and ancestry (A) and a strategy based on a linear regression of height against ancestry (B). The samples retained from tall and short subjects by use of each method are shown as blackened circles, and excluded samples are shown as unblackened circles. The regression method results in inclusion of the tallest and shortest individuals within any narrow window of ancestry values.

Effects of Population Structure on Pooled Genotyping

In many, if not most, association studies, careful pool matching may not be necessary, either because the target population is relatively homogeneous or because there is little confounding between the target phenotype and ancestry (e.g., Ardlie et al. 2002). Thus, it is useful to have a way of quantifying the practical impact of population structure on an association study, to decide when corrective action is needed. Significance tests are not appropriate for this purpose because they do not directly measure the magnitude of an effect. One approach is to model population structure as one of various sources of error that lead to an increase in the false-positive rate. If the effect of population structure is determined to be small compared with other known sources of experimental error, then correcting for it will have limited benefit.

We examined the behavior of the sum of χ² statistics for association tests with data from the tall and short groups matched for average ancestry, as various amounts of random noise were added to allele frequencies in the two groups. Genotypes for each SNP were first permuted to eliminate any residual disequilibrium, so we essentially only preserved overall SNP allele frequencies from the original data. The allele frequencies for each pool were then perturbed by a normally distributed error term, with standard deviation specified in units of allele frequency (fig. 5). The sum statistics for the unpermuted random and matched groups (928 and 338, respectively; table 5) are comparable to permuted data with additional experimental error of ∼5% and ∼1%, respectively. Additional error on the order of 1% seems tolerable for currently available pooled genotyping technologies, which generally cannot determine allele frequencies with better accuracy than that (Sham et al. 2002). This approach could be combined with estimates of experimental variance components (Barratt et al. 2002) to produce more realistic end-to-end power estimates for pooled genotyping study designs.
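The noise experiment can be approximated with a small simulation. The sketch below is a stand-in rather than the procedure used in the study: instead of permuting real genotypes, it draws both pools binomially from a common allele frequency, then adds Gaussian error to each pool's frequency and sums the per-SNP χ² statistics. The function names, the uniform frequency spectrum, and the pool size of 196 alleles (98 diploid subjects) are assumptions.

```python
import random

def chi2_from_freqs(f1, f2, n_alleles):
    """1-df chi-square for two equal-sized pools, from allele frequencies."""
    p_bar = (f1 + f2) / 2.0
    var = p_bar * (1.0 - p_bar) * 2.0 / n_alleles  # null variance of f1 - f2
    if var <= 0.0:
        return 0.0  # monomorphic in both pools: no information
    return (f1 - f2) ** 2 / var

def simulated_chi2_sum(freqs, n_alleles, noise_sd, rng):
    """Sum of chi-square statistics over SNPs under the null plus noise."""
    total = 0.0
    for p in freqs:
        pools = []
        for _ in range(2):
            # Binomial sampling of alleles into the pool.
            count = sum(rng.random() < p for _ in range(n_alleles))
            # Additive measurement error in units of allele frequency.
            noisy = count / n_alleles + rng.gauss(0.0, noise_sd)
            pools.append(min(1.0, max(0.0, noisy)))
        total += chi2_from_freqs(pools[0], pools[1], n_alleles)
    return total

rng = random.Random(1)
freqs = [rng.uniform(0.1, 0.9) for _ in range(275)]  # made-up frequencies
for sd in (0.0, 0.01, 0.05):
    print(sd, round(simulated_chi2_sum(freqs, 196, sd, rng)))
```

With no added noise the sum should scatter around 275, the number of SNPs, and it grows as the noise standard deviation becomes comparable to the sampling error of a pool.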

Figure 5.

Effect of simulated experimental error on an overall population-structure test statistic. We simulated the effect of experimental error by adding normally distributed noise to allele-frequency estimates in permuted copies of the genotype data for the matched tall and short groups. The overall test statistic is the sum of resulting χ² statistics for the 275 individual SNPs; this is expected to follow a χ² distribution, with 275 df. We show results for 20 separate permutations for each value of the noise parameter.

Table 5.

Overall Measures of Population Structure for Height Pools

| Data Set | χ² Sum | P Value | λ | ESS |
|---|---|---|---|---|
| All samples | 1,380 | 2 × 10⁻¹⁴⁶ | 4.9 | 34 |
| Random subset | 928 | 8 × 10⁻⁷² | 3.6 | 27 |
| Matched subset | 338 | 5 × 10⁻³ | 1.1 | 89 |
| Leave out 20% | 345 | 2 × 10⁻³ | 1.4 | 70 |
| Linear adjusted | 313 | 6 × 10⁻² | 1.3 | 75 |
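The P values for the χ² sums can be roughly checked without a statistics library by using the Wilson-Hilferty normal approximation to the χ² distribution with 275 df. This approximation is our assumption for illustration, not the method used in the study. It is reasonable near the bulk of the distribution but can be off by orders of magnitude in the extreme tail, where an exact survival function (for example, `scipy.stats.chi2.sf`) would be preferable.

```python
import math

def chi2_sum_pvalue(chi2_sum, df):
    """Approximate upper-tail P value for a chi-square sum (Wilson-Hilferty).

    The cube root of chi2/df is treated as approximately normal with
    mean 1 - 2/(9 df) and variance 2/(9 df).
    """
    c = 2.0 / (9.0 * df)
    z = ((chi2_sum / df) ** (1.0 / 3.0) - (1.0 - c)) / math.sqrt(c)
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Chi-square sums for the three 98-vs-98 designs, 275 SNPs each.
for label, total in [("Matched subset", 338),
                     ("Leave out 20%", 345),
                     ("Linear adjusted", 313)]:
    print(label, f"{chi2_sum_pvalue(total, 275):.1e}")
```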

Genomic control (Devlin and Roeder 1999) provides another approach to estimating the magnitude of the effect of population structure in an association study. In this approach, rather than modeling structure in a population, its effects are measured by the inflation of test statistics for markers that, in aggregate, should not show evidence for association. We estimated the variance-inflation factors (λ) due to population structure for each set of tall and short groups by use of this approach. One interpretation of the variance inflation is as a reduction in effective sample size (ESS), which we estimate here as (N / λ), where N is the original sample size (table 5). Genomic control would effectively maintain a desired type I error rate in the presence of population structure in this example; however, it does so at a substantial cost in the ESS and, hence, power to detect causal associations. Our results show that matching to mitigate the impact of population structure can substantially boost the ESS, despite the reduction in raw sample count.
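The ESS arithmetic can be sketched as below. The λ estimator shown, the median of the per-SNP χ² statistics divided by the median of a 1-df χ² distribution (about 0.4549), is a commonly used form of genomic control and is assumed here for illustration; the estimator actually used in the study is not specified in this section. The ESS values in table 5 are consistent with N being the per-group sample size, which is also an inference rather than a stated fact.

```python
from statistics import median

CHI2_1DF_MEDIAN = 0.4549  # approximate median of a 1-df chi-square variable

def genomic_control_lambda(chi2_stats):
    """Inflation factor from the median of per-marker chi-square statistics."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN

def effective_sample_size(n, lam):
    """Effective sample size after deflating by the inflation factor."""
    return n / lam

# Per-group sample sizes and inflation factors for each design.
for label, n, lam in [("All samples", 166, 4.9),
                      ("Random subset", 98, 3.6),
                      ("Matched subset", 98, 1.1),
                      ("Leave out 20%", 98, 1.4),
                      ("Linear adjusted", 98, 1.3)]:
    print(label, round(effective_sample_size(n, lam)))
```

The comparison between the all-samples and matched designs is the point of interest: the matched design keeps fewer subjects per group (98 versus 166) yet has a larger ESS because λ is much closer to 1.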

Discussion

Our results indicate that relatively simple matching strategies can effectively control for population stratification in case-control association studies, for a phenotype with a very large ancestry effect in an admixed population. The genotyping can be efficiently implemented in the laboratory in a high-throughput setting, with a single generic SNP genotyping array carrying around 300 uniformly distributed SNPs that are chosen without regard to their allele frequencies in specific target populations. We have now processed many thousands of these arrays.

Although we chose to use the structure program to infer admixture proportions, other methods are available, including the ADMIXMAP program (available on the Genetic Epidemiology Group Web site) (McKeigue et al. 2000; Hoggart et al. 2003), which may offer significant benefits in some situations. The admixture model in structure suffers from a theoretical deficiency (Pritchard et al. 2000a; Hoggart et al. 2003), in that it does not permit specification of prior allele-frequency information for the ancestral populations and thus cannot disambiguate between symmetric modes that differ only in the labels assigned to clusters. Also, interpretation of the admixture coefficients relies on the sampler only exploring one of these symmetric modes. In our analysis, we verified that individual structure runs consistently settled in one (randomly selected) mode, and we could easily determine consistent cluster labels when comparing results across multiple runs. The matching strategies we describe are also invariant under permutations of the cluster labels. Still, it is possible that the structure sampler may have more trouble in situations with more clusters or less clearly separated ones.

In the context of a pooled genotyping screen, absolute control of population structure is probably not required in many cases. It is probably only necessary to ensure that the incremental increase in variance due to population differences between case and control pools is small compared with other sources of variance in the genotyping experiment. In an association study design consisting of an initial screen of many markers by pooled genotyping followed by individual genotyping of candidates, there should be more tolerance for spurious associations in the pooled step. In these cases, a test for population structure on a representative subset of cases and controls may be sufficient to place bounds on the impact of population stratification on the entire study, thus avoiding unnecessary recruitment or individual genotyping effort.

A complete association study would consist of three phases. First, some or all samples would be individually genotyped to ascertain their population structure using our array of ∼300 SNPs. On the basis of those results, and constrained by the form of the phenotype and its ascertainment method, a strategy for mitigating population structure would be selected and validated using the available genotype data. Both of our matching strategies require genotyping some individuals who will end up being excluded from the matched case and control pools. The second phase would consist of pooled genotyping of many SNPs in replicate experiments. In a third phase, candidate SNPs would be selected for individual genotyping on the basis of the pooled data. Samples originally excluded from the pools could be genotyped at this point and could be analyzed using one of the structured association approaches. Genomic control could also be used to adjust significance tests for any residual population structure left in the matched pools.

The matching strategies we discuss here were designed for whole-genome association studies for which we required that a solution could not increase the experimental effort required at the pooled genotyping stage. This constraint (a practical, economic one) severely limited the range of solutions that we could consider. Another approach to controlling for population structure would be to perform a stratified analysis of subpools composed of individuals of similar ancestry. For experimental designs permitting many replicates, this may be a useful strategy for discrete traits that cannot be adjusted to remove ancestry effects. Such a design would allow all individuals to be included in the pooled analysis; however, strata with very unbalanced representation of the trait values would have somewhat lower informativeness for equal experimental effort. The number of strata required to account for most of the variance in ancestry would multiply the experimental effort required for allele-frequency determination, since this would be orthogonal to any replication required to characterize experimental variance.

The strategies we describe can be extended to more complex structured populations. For either admixed populations or populations composed of several unadmixed groups, our approach would be either to match the average genetic contribution of each empirically identified cluster in the case and control groups by excluding samples, or to use multivariate regression to determine an ancestry-adjusted phenotype for each individual on the basis of the individual's inferred cluster-membership proportions. In the absence of admixture, a multiethnic pooled study would be most sensitive for detecting loci that account for phenotypic variation in all of the included populations; such a study would be insensitive for loci accounting for fixed differences between populations.

Admixed populations are attractive targets for association studies because these groups should show more linkage disequilibrium over larger physical distances (Chakraborty and Weiss 1988). If the admixture is between populations with significantly different genetic predispositions to a target phenotype, then heritability of a trait in the admixed population may also be higher than in the more homogeneous ancestral populations. Although linkage-based admixture mapping (McKeigue 1998) can be a more efficient approach for identifying loci that specifically explain phenotypic variance between populations, an association study in an admixed population has the ability to detect loci that explain variance either between or within populations. Pooled allele-frequency differences would not distinguish within- from between-population associations, but these could be resolved later by modeling ancestry effects at associated loci by use of individual genotyping data. The groups used in this study are small, and larger sample sizes would be required for a whole-genome association study of a complex multigenic phenotype. The impact of stratification would be correspondingly larger for more realistic study designs, because although sampling variation in allele frequencies becomes smaller for larger pool sizes, the variance due to population stratification does not. Careful management of population structure is likely to be an important component of future whole-genome association studies.

Acknowledgments

We wish to acknowledge Dr. Raul Bernal Reyes (Instituto Mexicano del Seguro Social, Pachuca), Dr. Armando Diaz Belmont (Hospital General de México), and A. Christian Perez Pruna and Marta Garcia Sandoval (Instituto Nacional de Nutrición Salvador Zubirán), for their invaluable efforts to recruit subjects for this study. We wish to thank Robin Li, Coleen Hacker, Naiping Shen, Claire Marjoribanks, and Albert Yee for excellent technical assistance and overall contribution to this work. We thank Alberto Cevallos and Jesse Hsu, for helping to establish our Mexican collaboration, and Pascual Starink, for assistance with sample tracking. We also thank Kelly Frazer and two anonymous reviewers for helpful comments on the manuscript.

Electronic-Database Information

Accession numbers and URLs for data presented herein are as follows:

  1. dbSNP Home Page, http://www.ncbi.nlm.nih.gov/SNP/ (for ss12673803–ss12674077)
  2. Genetic Epidemiology Group Web Site, http://www.lshtm.ac.uk/eu/genetics/ (for ADMIXMAP software)
  3. NCBI BLAST, http://www.ncbi.nlm.nih.gov/BLAST/ (for BLAST search engine)
  4. NCBI Home Page, http://www.ncbi.nlm.nih.gov/
  5. Pritchard Lab, http://pritch.bsd.uchicago.edu/ (for the structure program)
  6. Q-VALUE Software, http://faculty.washington.edu/~jstorey/qvalue/
  7. R Project for Statistical Computing, http://www.r-project.org/
  8. RepeatMasker Web Server, http://ftp.genome.washington.edu/cgi-bin/RepeatMasker

References

  1. Ardlie KG, Lunetta KL, Seielstad M (2002) Testing for population subdivision and association in four case-control studies. Am J Hum Genet 71:304–311
  2. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215:403–410. doi:10.1006/jmbi.1990.9999
  3. Barratt BJ, Payne F, Rance HE, Nutland S, Todd JA, Clayton DG (2002) Identification of the sources of error in allele frequency estimations from pooled DNA indicates an optimal experimental design. Ann Hum Genet 66:393–405. doi:10.1046/j.1469-1809.2002.00125.x
  4. Cardon LR, Palmer LJ (2003) Population stratification and spurious allelic association. Lancet 361:598–604. doi:10.1016/S0140-6736(03)12520-2
  5. Chakraborty R, Weiss KM (1988) Admixture as a tool for finding linked genes and detecting that difference from allelic association between loci. Proc Natl Acad Sci USA 85:9119–9123
  6. Devlin B, Roeder K (1999) Genomic control for association studies. Biometrics 55:997–1004
  7. Fodor SPA, Read JL, Pirrung MC, Stryer L, Lu AT, Solas D (1991) Light-directed, spatially addressable parallel chemical synthesis. Science 251:767–773
  8. Hoggart CJ, Parra EJ, Shriver MD, Bonilla C, Kittles RA, Clayton DG, McKeigue PM (2003) Control of confounding in genetic associations in stratified populations. Am J Hum Genet 72:1492–1504
  9. Ihaka R, Gentleman R (1996) R: a language for data analysis and graphics. J Comp Graph Stat 5:299–314
  10. Knowler WC, Williams RC, Pettitt DJ, Steinberg AG (1988) GM 3;5,13,14 and type 2 diabetes mellitus: an association in American Indians with genetic admixture. Am J Hum Genet 43:520–526
  11. Kruglyak L (1999) Prospects for whole-genome linkage disequilibrium mapping of common disease genes. Nat Genet 22:139–144. doi:10.1038/9642
  12. Lander ES, Schork N (1994) Genetic dissection of complex traits. Science 265:2037–2048
  13. Patil N, Berno AJ, Hinds DA, Barrett WA, Doshi JM, Hacker CR, Kautzer CR, Lee DH, Marjoribanks C, McDonough DP, Nguyen BTN, Norris MC, Sheehan JB, Shen N, Stern D, Stokowski RP, Thomas DJ, Trulson MO, Vyas KR, Frazer KA, Fodor SPA, Cox DR (2001) Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science 294:1719–1723. doi:10.1126/science.1065573
  14. McKeigue PM (1998) Mapping genes that underlie ethnic differences in disease risk: methods for detecting linkage in admixed populations by conditioning on parental admixture. Am J Hum Genet 63:241–251
  15. McKeigue PM, Carpenter JR, Parra EJ, Shriver MD (2000) Estimation of admixture and detection of linkage in admixed populations by a Bayesian approach: application to African-American populations. Ann Hum Genet 64:171–186. doi:10.1046/j.1469-1809.2000.6420171.x
  16. Pritchard JK, Rosenberg NA (1999) Use of unlinked genetic markers to detect population stratification in association studies. Am J Hum Genet 65:220–228
  17. Pritchard JK, Stephens M, Donnelly P (2000a) Inference of population structure using multilocus genotype data. Genetics 155:945–959
  18. Pritchard JK, Stephens M, Rosenberg NA, Donnelly P (2000b) Association mapping in structured populations. Am J Hum Genet 67:170–181
  19. Reich DE, Goldstein DB (2001) Detecting association in a case-control study while correcting for population stratification. Genet Epidemiol 20:4–16
  20. Risch N (2000) Searching for genetic determinants in the new millennium. Nature 405:847–856. doi:10.1038/35015718
  21. Risch N, Merikangas K (1996) The future of genetic studies of complex human diseases. Science 273:1516–1517
  22. Satten GA, Flanders WD, Yang Q (2001) Accounting for unmeasured population structure in case-control studies of genetic association using a novel latent-class model. Am J Hum Genet 68:466–477
  23. Sham P, Bader JS, Craig I, O’Donovan M, Owen M (2002) DNA pooling: a tool for large-scale association studies. Nat Rev Genet 3:862–871. doi:10.1038/nrg930
  24. Storey JD, Tibshirani R (2003) Statistical significance for genomewide studies. Proc Natl Acad Sci USA 100:9440–9445. doi:10.1073/pnas.1530509100
  25. Thornsberry JM, Goodman MM, Doebley J, Kresovich S, Nielsen D, Buckler ES 4th (2001) Dwarf8 polymorphisms associate with variation in flowering time. Nat Genet 28:286–289. doi:10.1038/90135
  26. Weir B (1996) Genetic data analysis II. Sinauer, Sunderland, MA