Pattern of Sequence Variation Across 213 Environmental Response Genes

Robert J. Livingston 1,
Andrew von Niederhausern 2,
Anil G. Jegga 3,
Dana C. Crawford 1,
Christopher S. Carlson 1,
Mark J. Rieder 1,
Sivakumar Gowrisankar 3,
Bruce J. Aronow 3,
Robert B. Weiss 2, and
Deborah A. Nickerson 1,4
1 Department of Genome Sciences, University of Washington, Seattle, Washington 98195-7730, USA
2 Department of Human Genetics, University of Utah, Salt Lake City, Utah 84112-5330, USA
3 Division of Pediatric Informatics and Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio 45229, USA

Abstract

To promote the clinical and epidemiological studies that improve our understanding of human genetic susceptibility to environmental exposure, the Environmental Genome Project (EGP) has scanned 213 environmental response genes involved in DNA repair, cell cycle regulation, apoptosis, and metabolism for single nucleotide polymorphisms (SNPs). Many of these genes have been implicated by loss-of-function mutations associated with severe diseases attributable to decreased protection of genomic integrity. Therefore, the hypothesis for these studies is that individuals with functionally significant polymorphisms within these genes may be particularly susceptible to genotoxic environmental agents. On average, 20.4 kb of baseline genomic sequence, or 86% of each gene, including a substantial portion of the introns, all exons, and 1.3 kb upstream and downstream, was scanned for variations in the 90 samples of the Polymorphism Discovery Resource panel. The average nucleotide diversity across the 4.2 Mb of these 213 genes is 6.7 × 10⁻⁴, or one SNP every 1500 bp, when two random chromosomes are compared. The average candidate environmental response gene contains 26 PHASE-inferred haplotypes, 34 common SNPs, 6.2 coding SNPs (cSNPs), and 2.5 nonsynonymous cSNPs. SIFT and PolyPhen analysis of 541 nonsynonymous cSNPs identified 57 potentially deleterious SNPs. An additional eight polymorphisms predict altered protein translation. Because these genes represent 1% of all known human genes, extrapolation from these data predicts the total genomic set of cSNPs, nonsynonymous cSNPs, and potentially deleterious nonsynonymous cSNPs. The implications for the use of these data in direct and indirect association studies of environmentally induced diseases are discussed.
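
The arithmetic behind two of the abstract's statements can be sketched in a few lines of Python. This is a back-of-the-envelope illustration, not the authors' analysis: the function names are hypothetical, the SNP spacing is taken as the simple reciprocal of the nucleotide diversity, and the genome-wide extrapolation assumes that the 213 genes are exactly 1% of all genes and are representative of the rest.

```python
# Figures quoted in the abstract.
NUCLEOTIDE_DIVERSITY = 6.7e-4     # per-site differences between two random chromosomes
GENES_SCANNED = 213
FRACTION_OF_KNOWN_GENES = 0.01    # "1% of all known human genes"
CSNPS_PER_GENE = 6.2              # average coding SNPs per gene
NONSYNONYMOUS_CSNPS = 541         # nonsynonymous cSNPs analysed with SIFT and PolyPhen
POTENTIALLY_DELETERIOUS = 57      # of those, predicted potentially deleterious


def bp_per_snp(pi: float) -> float:
    """Expected spacing (bp) between differences when two chromosomes are compared."""
    return 1.0 / pi


def extrapolate(count_in_sample: float, sampled_fraction: float) -> float:
    """Naive linear scaling of a sample count to the whole gene set (assumption)."""
    return count_in_sample / sampled_fraction


if __name__ == "__main__":
    print(f"bp per SNP: {bp_per_snp(NUCLEOTIDE_DIVERSITY):.0f}")

    csnps_in_sample = CSNPS_PER_GENE * GENES_SCANNED
    for label, count in [
        ("cSNPs", csnps_in_sample),
        ("nonsynonymous cSNPs", NONSYNONYMOUS_CSNPS),
        ("potentially deleterious cSNPs", POTENTIALLY_DELETERIOUS),
    ]:
        total = extrapolate(count, FRACTION_OF_KNOWN_GENES)
        print(f"{label}: about {total:,.0f} genome-wide (naive scaling)")
```

The reciprocal of 6.7 × 10⁻⁴ is roughly 1490 bp, which matches the abstract's rounded "one SNP every 1500 bp". The scaled totals are only as good as the 1% assumption and should not be read as the extrapolated values reported in the paper itself.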

Footnotes

Supplemental material is available online at www.genome.org. All sequence data from this study have been submitted to GenBank and are available from our Web site at http://egp.gs.washington.edu and at other sites listed herein.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.2730004. Article published online ahead of print in September 2004.
4 Corresponding author. E-mail debnick{at}u.washington.edu; fax (206) 221-6498.
- Received April 30, 2004.
- Revision received August 4, 2004.
Cold Spring Harbor Laboratory Press