Genome-wide analysis of single-nucleotide polymorphisms in human expressed sequences
- Letter
- Published: October 2000
- Vlad Kustanovich2,
- Cheng Li3,
- Nik Brown5,
- Stanley Nelson2,4,
- Wing Wong3 &
- …
- Christopher J. Lee1
Nature Genetics volume 26, pages 233–236 (2000)
A Correction to this article was published on 01 November 2000
Abstract
Single-nucleotide polymorphisms (SNPs) have been explored as a high-resolution marker set for accelerating the mapping of disease genes [1–11]. Here we report 48,196 candidate SNPs detected by statistical analysis of human expressed sequence tags (ESTs), associated primarily with coding regions of genes. We used Bayesian inference to weigh evidence for true polymorphism versus sequencing error, misalignment or ambiguity, misclustering or chimaeric EST sequences, assessing data such as raw chromatogram height, sharpness, overlap and spacing, sequencing error rates, context-sensitivity and cDNA library origin. Three separate validations (comparison with 54 genes screened for SNPs independently, verification of HLA-A polymorphisms and restriction fragment length polymorphism (RFLP) testing) verified 70%, 89% and 71% of our predicted SNPs, respectively. Our method detects tenfold more true HLA-A SNPs than previous analyses of the EST data. We found SNPs in a large fraction of known disease genes, including some disease-causing mutations (for example, the HbS sickle-cell mutation). Our comprehensive analysis of human coding region polymorphism provides a public resource for mapping of disease genes (available at http://www.bioinformatics.ucla.edu/snp).
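As a rough illustration of the kind of Bayesian weighing described in the abstract, the sketch below compares two hypotheses for a single alignment column: every EST read derives from one allele and mismatches are sequencing errors, or the site carries two alleles. It is not the authors' model, which also draws on chromatogram features, context and library origin. The function names, the prior of 0.001, the 50% minor-allele sampling fraction and the equal spread of errors over the three other bases are all assumptions made for this example. The only borrowed convention is the phred relation between a quality score Q and an error probability, 10^(-Q/10).

```python
import math


def phred_to_error(q):
    """Convert a phred quality score Q to an error probability, 10^(-Q/10)."""
    return 10 ** (-q / 10)


def base_likelihood(observed, true_base, err):
    """P(observed base | true base), spreading errors evenly over the other 3 bases."""
    return 1 - err if observed == true_base else err / 3


def snp_posterior(reads, major, minor, prior=1e-3, minor_freq=0.5):
    """Posterior probability that a column is a two-allele SNP.

    reads      -- list of (base, phred_quality) pairs at one alignment column
    major      -- consensus allele under the "sequencing error only" hypothesis
    minor      -- candidate alternative allele
    prior      -- assumed prior probability that a site is polymorphic (hypothetical)
    minor_freq -- assumed chance that a read samples the minor allele (hypothetical)
    """
    log_l0 = 0.0  # log-likelihood: single allele plus sequencing error
    log_l1 = 0.0  # log-likelihood: mixture of two alleles
    for base, q in reads:
        # Cap at 0.75 (a random base call) so Q=0 does not give a zero likelihood.
        err = min(phred_to_error(q), 0.75)
        p_major = base_likelihood(base, major, err)
        p_minor = base_likelihood(base, minor, err)
        log_l0 += math.log(p_major)
        log_l1 += math.log((1 - minor_freq) * p_major + minor_freq * p_minor)
    # Bayes' rule expressed as posterior log-odds.
    log_odds = math.log(prior) - math.log(1 - prior) + log_l1 - log_l0
    if log_odds < -700:  # avoid overflow in exp() for overwhelming evidence against
        return 0.0
    return 1 / (1 + math.exp(-log_odds))
```

Under these assumptions, several high-quality reads supporting each allele push the posterior towards 1, whereas a single low-quality mismatch among otherwise concordant reads leaves it below the prior, which is the qualitative behaviour needed to separate polymorphism from sequencing error.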
Accession codes
Accessions
GenBank/EMBL/DDBJ
References
1. Li, W. & Sadler, L.A. Low nucleotide diversity in man. Genetics 129, 513–523 (1991).
2. Risch, N. & Merikangas, K. The future of genetic studies of complex human diseases. Science 273, 1516–1517 (1996).
3. Lai, E., Riley, J., Purvis, I. & Roses, A. A 4-Mb high-density single nucleotide polymorphism-based map around human APOE. Genomics 54, 31–38 (1998).
4. Nickerson, D.A. et al. DNA sequence diversity in a 9.7 kb region of the human lipoprotein lipase gene. Nature Genet. 19, 233–240 (1998).
5. Pennisi, E. A closer look at SNPs suggests difficulties. Science 281, 1787–1789 (1998).
6. Taillon-Miller, P., Gu, Z., Li, Q., Hillier, L. & Kwok, P.Y. Overlapping genomic sequences: a treasure trove of single-nucleotide polymorphisms. Genome Res. 8, 748–754 (1998).
7. Wang, D.G. et al. Large-scale identification, mapping, and genotyping of single-nucleotide polymorphisms in the human genome. Science 280, 1077–1082 (1998).
8. Brookes, A.J. The essence of SNPs. Gene 234, 177–186 (1999).
9. Cargill, M. et al. Characterization of single-nucleotide polymorphisms in coding regions of human genes. Nature Genet. 22, 231–238 (1999).
10. Halushka, M.K. et al. Patterns of single-nucleotide polymorphisms in candidate genes for blood-pressure homeostasis. Nature Genet. 22, 239–246 (1999).
11. Masood, E. Consortium plans free SNP map of human genome. Nature 398, 545–546 (1999).
12. Schuler, G. Pieces of the puzzle: expressed sequence tags and the catalog of human genes. J. Mol. Med. 75, 694–698 (1997).
13. Buetow, K.H., Edmonson, M.N. & Cassidy, A.B. Reliable identification of large numbers of candidate SNPs from public EST data. Nature Genet. 21, 323–325 (1999).
14. Picoult-Newberg, L. et al. Mining SNPs from EST databases. Genome Res. 9, 167–174 (1999).
15. Marth, G.T. et al. A general approach to single nucleotide polymorphism discovery. Nature Genet. 23, 452–456 (1999).
16. Jackson, A.L. & Loeb, L.A. The mutation rate and cancer. Genetics 148, 1483–1490 (1998).
17. Boguski, M.S., Tolstoshev, C.M. & Bassett, D.E.J. Gene discovery in dbEST. Science 265, 1993–1994 (1994).
18. Ingram, V.M. Abnormal human haemoglobin. III. The chemical difference between normal and sickle cell haemoglobins. Biochim. Biophys. Acta 36, 402–411 (1959).
19. Baur, E.W. & Motulsky, A.G. Hemoglobin tacoma - a β-chain variant associated with increased hb A2. Humangenetik 1, 621–634 (1965).
20. Ewing, B., Hillier, L., Wendl, M.C. & Green, P. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8, 175–185 (1998).
21. Ewing, B. & Green, P. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 8, 186–194 (1998).
22. Maeda, M. et al. A simple and rapid method for HLA-DQA1 genotyping by digestion of PCR-amplified DNA with allele specific restriction endonucleases. Tissue Antigens 34, 290–298 (1989).
Acknowledgements
We thank B. Modrek for assistance in mapping polymorphisms; P. Green for the PHRED, PHRAP and CONSED programs; K. Buetow for the CGAP SNP data; S. McGinnis for information about Unigene and E. Partsch for the sequence of pCMVSPORT. K.I. was supported by USPHS National Research Service Award GM08375. V.K. was supported by USPHS National Research Service Award GM07104. S.N. was supported by the Gwynn Hazen Cherry Memorial Laboratory. W.W. was supported by National Science Foundation grants NSF-DMS-9703918 and NSF-DBI-9904701. C.J.L. was supported by Department of Energy grant DEFG0387ER60615 and a grant from the Searle Scholars Program. Experimental SNP verification costs were supported in part by UC-Biostar grant S97106 to S.N.
Author information
Authors and Affiliations
- Department of Chemistry & Biochemistry, University of California, Los Angeles, Los Angeles, California, USA
  Kris Irizarry & Christopher J. Lee
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, California, USA
  Vlad Kustanovich & Stanley Nelson
- Department of Statistics, University of California, Los Angeles, Los Angeles, California, USA
  Cheng Li & Wing Wong
- Department of Pediatrics, University of California, Los Angeles, Los Angeles, California, USA
  Stanley Nelson
- Graduate Program in Computer Science, University of California, Los Angeles, Los Angeles, California, USA
  Nik Brown
Authors
- Kris Irizarry
- Vlad Kustanovich
- Cheng Li
- Nik Brown
- Stanley Nelson
- Wing Wong
- Christopher J. Lee
Corresponding author
Correspondence to Christopher J. Lee.
About this article
Cite this article
Irizarry, K., Kustanovich, V., Li, C. et al. Genome-wide analysis of single-nucleotide polymorphisms in human expressed sequences. Nat Genet 26, 233–236 (2000). https://doi.org/10.1038/79981
- Received: 29 November 1999
- Accepted: 10 July 2000
- Issue Date: October 2000
- DOI: https://doi.org/10.1038/79981