Mining SNPs from EST databases

L Picoult-Newberg et al. Genome Res. 1999 Feb.

Abstract

There is considerable interest in the discovery and characterization of single nucleotide polymorphisms (SNPs) to enable the analysis of the potential relationships between human genotype and phenotype. Here we present a strategy that permits the rapid discovery of SNPs from publicly available expressed sequence tag (EST) databases. From a set of ESTs derived from 19 different cDNA libraries, we assembled 300,000 distinct sequences and identified 850 mismatches from contiguous EST data sets (candidate SNP sites), without de novo sequencing. Through a polymerase-mediated, single-base, primer extension technique, Genetic Bit Analysis (GBA), we confirmed the presence of a subset of these candidate SNP sites and have estimated the allele frequencies in three human populations with different ethnic origins. Altogether, our approach provides a basis for rapid and efficient regional and genome-wide SNP discovery using data assembled from sequences from different libraries of cDNAs.
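The mismatch-detection step described in the abstract can be illustrated with a minimal sketch. It assumes the ESTs in a contig are already aligned and padded to a common length, with a per-base quality score for each read. The function name `candidate_snp_sites`, the quality cutoff of 20, and the requirement of two supporting reads per allele are illustrative assumptions, not values taken from the paper.

```python
from collections import Counter

def candidate_snp_sites(aligned_reads, min_quality=20, min_reads_per_allele=2):
    """Return (position, {base: count}) for columns with two or more supported bases.

    aligned_reads: list of (sequence, qualities) pairs, all padded to the same
    length, with '-' marking alignment gaps. Positions are 0-based.
    Thresholds are hypothetical defaults chosen for illustration only.
    """
    if not aligned_reads:
        return []
    length = len(aligned_reads[0][0])
    sites = []
    for pos in range(length):
        counts = Counter()
        for seq, quals in aligned_reads:
            base = seq[pos]
            # Count only unambiguous bases that pass the quality cutoff.
            if base in "ACGT" and quals[pos] >= min_quality:
                counts[base] += 1
        # Keep alleles seen in enough high-quality reads.
        supported = {b: n for b, n in counts.items() if n >= min_reads_per_allele}
        if len(supported) >= 2:
            sites.append((pos, supported))
    return sites

# Toy contig: five aligned reads of length 5.
reads = [
    ("ACGTA", [30, 30, 30, 30, 30]),
    ("ACGTA", [30, 30, 30, 30, 30]),
    ("ACTTA", [30, 30, 30, 30, 30]),
    ("ACTTA", [30, 30, 30, 30, 30]),
    ("AGGTA", [30, 10, 30, 30, 30]),  # low-quality G at position 1 is ignored
]
print(candidate_snp_sites(reads))
```

Requiring each allele to appear in more than one high-quality read is one simple way to separate likely polymorphisms from isolated sequencing errors; the actual filtering criteria used by the authors may differ.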

Figures

Figure 1

An example of a contig containing a high quality mismatch. (a) A Consed view of a contig containing sequences from the 3′-untranslated region of the erythroblastosis virus oncogene homolog 2 (ETS2). The arrow indicates the location of the high-quality mismatch (A vs. T, position 207). Examples of sequence traces show an A at position 207 (b) and a T at position 207 (c). The mismatch has been confirmed as a common SNP by DNA sequence analysis.

Figure 2

Examples of two SNPs in the GADD34 gene confirmed by DNA sequence analysis. (a) Homozygote C vs. heterozygote C/T; (b) homozygote G vs. heterozygote G/C. Less than 2% of contigs contained more than one SNP. When confirmed, the majority of these were in complete linkage disequilibrium with one another.

Figure 3

Cluster analyses of candidate EST–SNP sites of GBA genotype data from two contigs. The contig number and library source (3′ or 5′) are given. Each GBA experiment assayed 18 genomic DNA samples from three ethnic groups: eight Caucasian DNA samples (+), five African American DNA samples (o), and five Hispanic DNA samples (x), as well as two positive (/) and two negative controls (\ , −). Raw optical density values from the two-color ELISA system were plotted on an xy scatterplot. Allele 1 data (fluorescein–PNPP reactions) were captured by a standard plate reader at 405 nm and were plotted on the _x_-axis. Allele 2 readings (biotin–TMB reactions, 620 nm) were plotted on the _y_-axis. Thus, the axes represent the base analog extension signals determined by primer extension. Homozygotes for allele 1 lie on the _x_-axis, homozygotes for allele 2 lie on the _y_-axis, and heterozygotes lie on the diagonal. Negative PCR controls (\, reactions controlling for cross-hybridization of the PCR primer and capture–extension primer) and negative GBA controls (−, reactions controlling for self-extension of the capture–extension primer) lie near the plot origins. Synthetic templates (/) were also included as positive controls to monitor hybridization and extension efficiency. Cluster analyses for genotype determination were done automatically by an in-house software program.
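The geometry described in this caption (allele 1 homozygotes along the x-axis, allele 2 homozygotes along the y-axis, heterozygotes on the diagonal, controls near the origin) suggests a simple angle-based genotype call. The sketch below is a stand-in for the in-house clustering program, whose details are not given here. The function names, the minimum-signal cutoff, and the 30 to 60 degree heterozygote band are all assumptions made for illustration.

```python
import math

def call_genotype(od_allele1, od_allele2, min_signal=0.3, het_band=(30.0, 60.0)):
    """Call a genotype from two optical density readings.

    od_allele1 is the x-axis signal (405 nm) and od_allele2 the y-axis
    signal (620 nm). min_signal and het_band are hypothetical thresholds.
    """
    # Points near the origin (e.g. negative controls) carry too little signal.
    if math.hypot(od_allele1, od_allele2) < min_signal:
        return "no_call"
    # Angle from the x-axis: near 0 = allele 1 only, near 90 = allele 2 only.
    angle = math.degrees(math.atan2(od_allele2, od_allele1))
    if angle < het_band[0]:
        return "1/1"
    if angle > het_band[1]:
        return "2/2"
    return "1/2"

def allele1_frequency(genotypes):
    """Estimate the allele 1 frequency from called genotypes, ignoring no-calls."""
    called = [g for g in genotypes if g != "no_call"]
    if not called:
        return None
    allele1_count = sum(g.count("1") for g in called)
    return allele1_count / (2 * len(called))

# Toy readings: (allele 1 OD, allele 2 OD) for four wells.
wells = [(1.2, 0.05), (0.9, 0.8), (0.04, 1.1), (0.05, 0.05)]
calls = [call_genotype(x, y) for x, y in wells]
print(calls, allele1_frequency(calls))
```

Fixed angular thresholds are easy to reason about but assume the two colorimetric signals are on comparable scales; a data-driven clustering step would be more robust when they are not.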
