Genome-wide analysis of single-nucleotide polymorphisms in human expressed sequences

Nature Genetics volume 26, pages 233–236 (2000)

A Correction to this article was published on 01 November 2000

Abstract

Single-nucleotide polymorphisms (SNPs) have been explored as a high-resolution marker set for accelerating the mapping of disease genes (refs 1–11). Here we report 48,196 candidate SNPs detected by statistical analysis of human expressed sequence tags (ESTs), associated primarily with the coding regions of genes. We used Bayesian inference to weigh the evidence for true polymorphism against sequencing error, misalignment or ambiguity, misclustering or chimaeric EST sequences, assessing data such as raw chromatogram height, sharpness, overlap and spacing, sequencing error rates, context sensitivity and cDNA library origin. Three separate validations verified 70%, 89% and 71% of our predicted SNPs, respectively: comparison with 54 genes screened independently for SNPs, verification of HLA-A polymorphisms, and restriction fragment length polymorphism (RFLP) testing. Our method detects tenfold more true HLA-A SNPs than previous analyses of the EST data. We found SNPs in a large fraction of known disease genes, including some disease-causing mutations (for example, the HbS sickle-cell mutation). Our comprehensive analysis of human coding-region polymorphism provides a public resource for mapping disease genes (available at http://www.bioinformatics.ucla.edu/snp).
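The abstract describes the Bayesian weighing of evidence only at a high level. As an illustration of the general idea, and not the authors' actual model (which also used chromatogram height, sharpness, overlap and spacing, context-sensitive error rates and library origin), the sketch below scores a single EST alignment column using only PHRED quality scores (refs 20, 21). It assumes that base-calling errors are spread evenly over the three other bases, a hypothetical per-site SNP prior of 10^-3, and a uniform prior over minor-allele frequency; the function names `phred_error`, `call_likelihood` and `snp_posterior` are hypothetical.

```python
import math
from collections import Counter


def phred_error(q: int) -> float:
    """Convert a PHRED quality score to an error probability.

    Capped at 0.75 (a uniformly random call) so that very low scores never
    drive a likelihood to zero.
    """
    return min(10 ** (-q / 10), 0.75)


def call_likelihood(call: str, true_base: str, q: int) -> float:
    """P(observed call | true base); assumes errors hit the 3 other bases equally."""
    e = phred_error(q)
    return 1 - e if call == true_base else e / 3


def _logistic(x: float) -> float:
    """Numerically stable 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    z = math.exp(x)
    return z / (1 + z)


def snp_posterior(reads, prior_snp=1e-3, freq_grid=None):
    """Posterior probability that a column is a biallelic SNP rather than
    a monomorphic site whose discordant calls are sequencing errors.

    reads: list of (base_call, phred_quality) pairs from overlapping ESTs.
    prior_snp: hypothetical prior probability that a site is polymorphic.
    """
    if freq_grid is None:
        # Illustrative grid of minor-allele frequencies 0.05, 0.10, ..., 0.95.
        freq_grid = [i / 20 for i in range(1, 20)]
    ranked = [base for base, _ in Counter(b for b, _ in reads).most_common()]
    if len(ranked) < 2:
        return 0.0  # no discordant calls, so no evidence for a second allele
    major, minor = ranked[0], ranked[1]  # simplification: alleles chosen from the data

    # H0: every EST derives from the major allele; mismatches are errors.
    log_l0 = sum(math.log(call_likelihood(b, major, q)) for b, q in reads)

    # H1: each EST derives from the minor allele with probability f.
    log_terms = [
        sum(
            math.log((1 - f) * call_likelihood(b, major, q)
                     + f * call_likelihood(b, minor, q))
            for b, q in reads
        )
        for f in freq_grid
    ]
    # Average the likelihood over the grid (uniform prior on f), via log-mean-exp.
    m = max(log_terms)
    log_l1 = m + math.log(sum(math.exp(t - m) for t in log_terms) / len(log_terms))

    # Posterior odds in log space, mapped back to a probability.
    log_odds = (math.log(prior_snp) + log_l1) - (math.log(1 - prior_snp) + log_l0)
    return _logistic(log_odds)


if __name__ == "__main__":
    balanced = [("A", 30)] * 5 + [("G", 30)] * 5    # high-quality calls on both alleles
    lone_error = [("A", 40)] * 20 + [("G", 10)]     # one low-quality discordant call
    print(snp_posterior(balanced), snp_posterior(lone_error))
```

Working in log space keeps the products of many per-read likelihoods from underflowing, and averaging over allele frequency avoids committing to a single assumed frequency. A column with several high-quality calls for each of two bases scores close to 1, whereas a single low-quality mismatch among many high-quality concordant reads scores close to 0.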


Accession codes

GenBank/EMBL/DDBJ

References

  1. Li, W. & Sadler, L.A. Low nucleotide diversity in man. Genetics 129, 513–523 (1991).
  2. Risch, N. & Merikangas, K. The future of genetic studies of complex human diseases. Science 273, 1516–1517 (1996).
  3. Lai, E., Riley, J., Purvis, I. & Roses, A. A 4-Mb high-density single nucleotide polymorphism-based map around human APOE. Genomics 54, 31–38 (1998).
  4. Nickerson, D.A. et al. DNA sequence diversity in a 9.7 kb region of the human lipoprotein lipase gene. Nature Genet. 19, 233–240 (1998).
  5. Pennisi, E. A closer look at SNPs suggests difficulties. Science 281, 1787–1789 (1998).
  6. Taillon-Miller, P., Gu, Z., Li, Q., Hillier, L. & Kwok, P.Y. Overlapping genomic sequences: a treasure trove of single-nucleotide polymorphisms. Genome Res. 8, 748–754 (1998).
  7. Wang, D.G. et al. Large-scale identification, mapping, and genotyping of single-nucleotide polymorphisms in the human genome. Science 280, 1077–1082 (1998).
  8. Brookes, A.J. The essence of SNPs. Gene 234, 177–186 (1999).
  9. Cargill, M. et al. Characterization of single-nucleotide polymorphisms in coding regions of human genes. Nature Genet. 22, 231–238 (1999).
  10. Halushka, M.K. et al. Patterns of single-nucleotide polymorphisms in candidate genes for blood-pressure homeostasis. Nature Genet. 22, 239–246 (1999).
  11. Masood, E. Consortium plans free SNP map of human genome. Nature 398, 545–546 (1999).
  12. Schuler, G. Pieces of the puzzle: expressed sequence tags and the catalog of human genes. J. Mol. Med. 75, 694–698 (1997).
  13. Buetow, K.H., Edmonson, M.N. & Cassidy, A.B. Reliable identification of large numbers of candidate SNPs from public EST data. Nature Genet. 21, 323–325 (1999).
  14. Picoult-Newberg, L. et al. Mining SNPs from EST databases. Genome Res. 9, 167–174 (1999).
  15. Marth, G.T. et al. A general approach to single nucleotide polymorphism discovery. Nature Genet. 23, 452–456 (1999).
  16. Jackson, A.L. & Loeb, L.A. The mutation rate and cancer. Genetics 148, 1483–1490 (1998).
  17. Boguski, M.S., Tolstoshev, C.M. & Bassett, D.E.J. Gene discovery in dbEST. Science 265, 1993–1994 (1994).
  18. Ingram, V.M. Abnormal human haemoglobin. III. The chemical difference between normal and sickle cell haemoglobins. Biochim. Biophys. Acta 36, 402–411 (1959).
  19. Baur, E.W. & Motulsky, A.G. Hemoglobin Tacoma, a β-chain variant associated with increased Hb A2. Humangenetik 1, 621–634 (1965).
  20. Ewing, B., Hillier, L., Wendl, M.C. & Green, P. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8, 175–185 (1998).
  21. Ewing, B. & Green, P. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 8, 186–194 (1998).
  22. Maeda, M. et al. A simple and rapid method for HLA-DQA1 genotyping by digestion of PCR-amplified DNA with allele specific restriction endonucleases. Tissue Antigens 34, 290–298 (1989).


Acknowledgements

We thank B. Modrek for assistance in mapping polymorphisms; P. Green for the PHRED, PHRAP and CONSED programs; K. Buetow for the CGAP SNP data; S. McGinnis for information about UniGene; and E. Partsch for the sequence of pCMVSPORT. K.I. was supported by USPHS National Research Service Award GM08375. V.K. was supported by USPHS National Research Service Award GM07104. S.N. was supported by the Gwynn Hazen Cherry Memorial Laboratory. W.W. was supported by National Science Foundation grants NSF-DMS-9703918 and NSF-DBI-9904701. C.J.L. was supported by Department of Energy grant DE-FG03-87ER60615 and a grant from the Searle Scholars Program. Experimental SNP verification costs were supported in part by UC-Biostar grant S97106 to S.N.

Author information

Authors and Affiliations

  1. Department of Chemistry & Biochemistry, University of California, Los Angeles, Los Angeles, California, USA
    Kris Irizarry & Christopher J. Lee
  2. Department of Human Genetics, University of California, Los Angeles, Los Angeles, California, USA
    Vlad Kustanovich & Stanley Nelson
  3. Department of Statistics, University of California, Los Angeles, Los Angeles, California, USA
    Cheng Li & Wing Wong
  4. Department of Pediatrics, University of California, Los Angeles, Los Angeles, California, USA
    Stanley Nelson
  5. Graduate Program in Computer Science, University of California, Los Angeles, Los Angeles, California, USA
    Nik Brown

Authors

  1. Kris Irizarry
  2. Vlad Kustanovich
  3. Cheng Li
  4. Nik Brown
  5. Stanley Nelson
  6. Wing Wong
  7. Christopher J. Lee

Corresponding author

Correspondence to Christopher J. Lee.


About this article

Cite this article

Irizarry, K., Kustanovich, V., Li, C. et al. Genome-wide analysis of single-nucleotide polymorphisms in human expressed sequences. Nat Genet 26, 233–236 (2000). https://doi.org/10.1038/79981
