High-throughput variation detection and genotyping using microarrays

D J Cutler et al. Genome Res. 2001 Nov.

Abstract

The genetic dissection of complex traits may ultimately require a large number of SNPs to be genotyped in multiple individuals who exhibit phenotypic variation in a trait of interest. Microarray technology can enable rapid genotyping of variation specific to study samples. To facilitate their use, we have developed an automated statistical method (ABACUS) to analyze microarray hybridization data and applied this method to Affymetrix Variation Detection Arrays (VDAs). ABACUS provides a quality score to individual genotypes, allowing investigators to focus their attention on sites that give accurate information. We have applied ABACUS to an experiment encompassing 32 autosomal and eight X-linked genomic regions, each consisting of approximately 50 kb of unique sequence spanning a 100-kb region, in 40 humans. At sufficiently high quality scores, we are able to read approximately 80% of all sites. To assess the accuracy of SNP detection, 108 of 108 SNPs have been experimentally confirmed; an additional 371 SNPs have been confirmed electronically. To assess the accuracy of diploid genotypes at segregating autosomal sites, we confirmed 1515 of 1515 homozygous calls, and 420 of 423 (99.29%) heterozygotes. In replicate experiments, consisting of independent amplification of identical samples followed by hybridization to distinct microarrays of the same design, genotyping is highly repeatable. In an autosomal replicate experiment, 813,295 of 813,295 genotypes are called identically (including 351 heterozygotes); at an X-linked locus in males (haploid), 841,236 of 841,236 sites are called identically.
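The heterozygote confirmation rate quoted in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the counts given in the abstract; the function name `confirmation_rate` is a hypothetical helper, not part of ABACUS.

```python
def confirmation_rate(confirmed: int, tested: int) -> float:
    """Return the percentage of tested calls that were confirmed."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    return 100.0 * confirmed / tested

# Counts taken from the abstract.
heterozygote_rate = confirmation_rate(420, 423)
homozygote_rate = confirmation_rate(1515, 1515)

print(f"heterozygotes: {heterozygote_rate:.2f}%")
print(f"homozygotes:   {homozygote_rate:.2f}%")
```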

Figures

Figure 1

Eight features (four for the forward strand and four for the reverse complement strand) are associated with every queried site. Each feature consists of a 25-base oligonucleotide. The 13th base is the query base and all possible genotypes are tested. Each feature is divided into 56 equal pixels, and the pixels are scanned individually. The outermost 26 pixels are “masked,” so that only the 30 interior pixels are used for any calculation.
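The pixel counts in this caption are consistent with masking a one-pixel border around a rectangular feature. The sketch below assumes a 7 x 8 pixel layout, which the caption does not state; it is chosen only because 7 x 8 = 56 and the 5 x 6 interior gives the 30 usable pixels. The function name `interior_pixels` is hypothetical.

```python
ROWS, COLS = 7, 8  # assumed layout (not stated in the source): 7 x 8 = 56 pixels

def interior_pixels(feature):
    """Drop the outermost ring of pixels and return the interior grid."""
    if len(feature) != ROWS or any(len(row) != COLS for row in feature):
        raise ValueError("expected a 7 x 8 grid of pixel intensities")
    # Skip the first and last row, and the first and last column of each row.
    return [row[1:-1] for row in feature[1:-1]]

# Dummy intensities, used only to count pixels.
feature = [[float(r * COLS + c) for c in range(COLS)] for r in range(ROWS)]
interior = interior_pixels(feature)

n_interior = sum(len(row) for row in interior)
n_masked = ROWS * COLS - n_interior
print(n_interior, n_masked)
```

Under this assumed layout, 26 border pixels are masked and 30 interior pixels remain, matching the caption.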

Figure 2

The effect of varying the total threshold in haploid and diploid replicate experiments. The strand threshold is fixed at −2 and the total threshold is allowed to vary. Over these thresholds, haploid data vary far less than diploid data.

Figure 3

The effect of varying the strand threshold in diploid data, with the total threshold fixed at 10, 20, or 30.

Figure 4

All sites were characterized as either called or N. All sites designated N were partitioned into one of seven categories. (1) Primer: Primer failure, indicating that <50% of the sites between a pair of PCR primers were called; (2) Low Signal: Mean fluorescence intensity extremely low; (3) High Saturation: Mean fluorescence intensity near detector limits; (4) Threshold: No model obtained a quality score greater than the total threshold; (5) Two Models: Different models fit best on the forward and reverse strands, so that no model met the strand threshold on both strands; (6) Neighborhood: <50% of the 20 surrounding sites could be called; and (7) Sample: <50% of the samples could be called at a particular site.
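The seven categories can be read as a sequence of checks on per-site summary values. The following is a minimal sketch, not the ABACUS implementation: the `SiteSummary` fields, the `low_cutoff` and `saturation_cutoff` parameters, and the order in which the checks are applied are all assumptions made for illustration. The "Two Models" test is also simplified to a comparison of the best-fitting model on each strand.

```python
from dataclasses import dataclass

@dataclass
class SiteSummary:
    """Hypothetical per-site summary; field names are illustrative only."""
    primer_call_rate: float        # fraction of sites called between the PCR primer pair
    mean_intensity: float          # mean fluorescence intensity at the site
    best_total_score: float        # best quality score over all models
    forward_best_model: str        # best-fitting model on the forward strand
    reverse_best_model: str        # best-fitting model on the reverse strand
    neighborhood_call_rate: float  # fraction of the 20 surrounding sites called
    sample_call_rate: float        # fraction of samples called at this site

def n_category(site, total_threshold, low_cutoff, saturation_cutoff):
    """Return the failure category for a site, or 'Called' if none applies.

    Checks run in the order the caption lists them (an assumption).
    """
    if site.primer_call_rate < 0.5:
        return "Primer"
    if site.mean_intensity < low_cutoff:
        return "Low Signal"
    if site.mean_intensity > saturation_cutoff:
        return "High Saturation"
    if site.best_total_score <= total_threshold:
        return "Threshold"
    if site.forward_best_model != site.reverse_best_model:
        return "Two Models"
    if site.neighborhood_call_rate < 0.5:
        return "Neighborhood"
    if site.sample_call_rate < 0.5:
        return "Sample"
    return "Called"

# Made-up values: the strands disagree on the best model.
example = SiteSummary(0.9, 500.0, 35.0, "AA", "AG", 0.8, 0.9)
category = n_category(example, total_threshold=30.0,
                      low_cutoff=50.0, saturation_cutoff=60000.0)
print(category)
```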

Figure 7

ABACUS quality scores for genotype calls. More than 64.5 million diploid genotype calls and >16.1 million haploid genotype calls were made.

Figure 5

VDA schematics. The schematic is colored as follows. (black) Control regions; (white) sites where high reliability ABACUS calls could be made; (red) sites with signal saturation; (violet) sites with low signal; (green) sites where different models fit best on the forward and reverse strand; (blue) sites where no model reached the total threshold; and (yellow) all other sources of failure. Both of these schematics come from VDA design CWRS-13. (a) An excellent VDA: very little failure of any kind, distributed relatively evenly across the VDA. (b) PCR failure: the last PCR primer pair failed. Notice small spots of white mixed among the failure.

Figure 6

Fluorescence intensity for autosomal (diploid) loci. VDA features are tiled with 25mers. The number of As, Cs, Gs, and Ts were counted for the reference 25mer on both the forward and reverse strand. The fluorescence at each pixel within the feature was measured (>3.9 billion pixels in total). Error bars represent two standard errors, under the assumption that separate features are independent but pixels within a feature are not.
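One common way to honor the stated independence assumption is to collapse each feature to its mean pixel intensity and compute the standard error across feature means rather than across pixels. The sketch below illustrates that approach; whether the authors computed their error bars exactly this way is an assumption, and the function name and example intensities are hypothetical.

```python
import math
import statistics

def mean_and_two_se(features):
    """Return (mean, 2 * standard error), treating features as the independent units.

    `features` is a list of per-feature pixel intensity lists. Pixels within
    a feature are averaged first, so they do not count as separate observations.
    """
    feature_means = [statistics.fmean(pixels) for pixels in features]
    n = len(feature_means)
    if n < 2:
        raise ValueError("need at least two features to estimate a standard error")
    standard_error = statistics.stdev(feature_means) / math.sqrt(n)
    return statistics.fmean(feature_means), 2.0 * standard_error

# Made-up intensities for three features with two pixels each.
features = [[10.0, 12.0], [14.0, 16.0], [18.0, 20.0]]
mean_intensity, error_bar = mean_and_two_se(features)
print(mean_intensity, error_bar)
```

Treating every pixel as independent would divide by the square root of the pixel count instead, which would shrink the error bars when pixels within a feature are correlated.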
