Genome-wide analysis of single-nucleotide polymorphisms in human expressed sequences

Nature Genetics volume 26, pages 233–236 (2000)

A Correction to this article was published on 01 November 2000


Single-nucleotide polymorphisms (SNPs) have been explored as a high-resolution marker set for accelerating the mapping of disease genes (refs 1–11). Here we report 48,196 candidate SNPs detected by statistical analysis of human expressed sequence tags (ESTs), associated primarily with coding regions of genes. We used Bayesian inference to weigh evidence for true polymorphism versus sequencing error, misalignment or ambiguity, misclustering or chimaeric EST sequences, assessing data such as raw chromatogram height, sharpness, overlap and spacing, sequencing error rates, context-sensitivity and cDNA library origin. Three separate validations (comparison with 54 genes screened for SNPs independently, verification of HLA-A polymorphisms, and restriction fragment length polymorphism (RFLP) testing) verified 70%, 89% and 71% of our predicted SNPs, respectively. Our method detects tenfold more true HLA-A SNPs than previous analyses of the EST data. We found SNPs in a large fraction of known disease genes, including some disease-causing mutations (for example, the HbS sickle-cell mutation). Our comprehensive analysis of human coding region polymorphism provides a public resource for mapping of disease genes (available at
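
The idea of weighing true polymorphism against sequencing error can be illustrated with a deliberately simplified sketch. It is not the authors' model: it uses only per-base quality scores (converted with the standard Phred relation, error = 10^(-Q/10)) and ignores chromatogram shape, alignment ambiguity, clustering errors and library origin. The function names, the assumed minor-allele frequency `minor_freq` and the prior `prior_snp` are all hypothetical choices made for illustration.

```python
def phred_to_error(q: float) -> float:
    """Convert a Phred quality score to a base-call error probability."""
    return 10 ** (-q / 10)


def base_likelihood(observed: str, true_base: str, err: float) -> float:
    """P(observed base | true base), spreading errors evenly over 3 other bases."""
    return 1 - err if observed == true_base else err / 3


def snp_posterior(observations, major, minor, minor_freq=0.2, prior_snp=0.001):
    """Posterior probability that one alignment column is a two-allele SNP.

    observations: list of (base, phred_quality) pairs, one per EST read.
    minor_freq and prior_snp are illustrative assumptions, not values
    taken from the paper.
    """
    l_mono = 1.0  # likelihood if every read truly carries the major allele
    l_poly = 1.0  # likelihood if reads are drawn from a major/minor mixture
    for base, q in observations:
        err = phred_to_error(q)
        p_major = base_likelihood(base, major, err)
        p_minor = base_likelihood(base, minor, err)
        l_mono *= p_major
        l_poly *= (1 - minor_freq) * p_major + minor_freq * p_minor
    numerator = prior_snp * l_poly
    return numerator / (numerator + (1 - prior_snp) * l_mono)


# Two high-quality reads disagreeing with the majority: strong SNP evidence.
well_supported = [("A", 30)] * 8 + [("G", 30)] * 2
# One low-quality disagreeing read: more plausibly a sequencing error.
poorly_supported = [("A", 30)] * 9 + [("G", 5)]

print(snp_posterior(well_supported, "A", "G"))
print(snp_posterior(poorly_supported, "A", "G"))
```

Under these assumptions the first column scores far higher than the second, because two independent high-quality mismatches are very unlikely under the error-only hypothesis, whereas a single low-quality mismatch is easily explained by error.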

We thank B. Modrek for assistance in mapping polymorphisms; P. Green for the PHRED, PHRAP and CONSED programs; K. Buetow for the CGAP SNP data; S. McGinnis for information about Unigene; and E. Partsch for the sequence of pCMVSPORT. K.I. was supported by USPHS National Research Service Award GM08375. V.K. was supported by USPHS National Research Service Award GM07104. S.N. was supported by the Gwynn Hazen Cherry Memorial Laboratory. W.W. was supported by National Science Foundation grants NSF-DMS-9703918 and NSF-DBI-9904701. C.J.L. was supported by Department of Energy grant DEFG0387ER60615 and a grant from the Searle Scholars Program. Experimental SNP verification costs were supported in part by UC-Biostar grant S97106 to S.N.

Author information

Authors and Affiliations

  1. Department of Chemistry & Biochemistry, University of California, Los Angeles, Los Angeles, California, USA
    Kris Irizarry & Christopher J. Lee
  2. Department of Human Genetics, University of California, Los Angeles, Los Angeles, California, USA
    Vlad Kustanovich & Stanley Nelson
  3. Department of Statistics, University of California, Los Angeles, Los Angeles, California, USA
    Cheng Li & Wing Wong
  4. Department of Pediatrics, University of California, Los Angeles, Los Angeles, California, USA
    Stanley Nelson
  5. Graduate Program in Computer Science, University of California, Los Angeles, Los Angeles, California, USA
    Nik Brown


  1. Kris Irizarry
  2. Vlad Kustanovich
  3. Cheng Li
  4. Nik Brown
  5. Stanley Nelson
  6. Wing Wong
  7. Christopher J. Lee

Corresponding author

Correspondence to Christopher J. Lee.

Irizarry, K., Kustanovich, V., Li, C. et al. Genome-wide analysis of single-nucleotide polymorphisms in human expressed sequences. Nat Genet 26, 233–236 (2000).

