Single nucleotide polymorphism-based validation of exonic splicing enhancers

William G Fairbrother et al. PLoS Biol. 2004 Sep.

Abstract

Because deleterious alleles arising from mutation are filtered by natural selection, mutations that create such alleles will be underrepresented in the set of common genetic variation existing in a population at any given time. Here, we describe an approach based on this idea called VERIFY (variant elimination reinforces functionality), which can be used to assess the extent of natural selection acting on an oligonucleotide motif or set of motifs predicted to have biological activity. As an application of this approach, we analyzed a set of 238 hexanucleotides previously predicted to have exonic splicing enhancer (ESE) activity in human exons using the relative enhancer and silencer classification by unanimous enrichment (RESCUE)-ESE method. Aligning the single nucleotide polymorphisms (SNPs) from the public human SNP database to the chimpanzee genome allowed inference of the direction of the mutations that created present-day SNPs. Analyzing the set of SNPs that overlap RESCUE-ESE hexamers, we conclude that nearly one-fifth of the mutations that disrupt predicted ESEs have been eliminated by natural selection (odds ratio = 0.82 +/- 0.05). This selection is strongest for the predicted ESEs that are located near splice sites. Our results demonstrate a novel approach for quantifying the extent of natural selection acting on candidate functional motifs and also suggest certain features of mutations/SNPs, such as proximity to the splice site and disruption or alteration of predicted ESEs, that should be useful in identifying variants that might cause a biological phenotype.
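As a rough illustration of the odds-ratio reasoning in the abstract, the sketch below compares how often an outcome (for example, disruption of a predicted ESE) occurs among selected mutations (SNPs) versus unselected (simulated) mutations. An odds ratio of 0.82 corresponds to a depletion of 1 − 0.82 = 0.18, which is the "nearly one-fifth" quoted above. The function names and all counts are hypothetical and chosen only for illustration; they are not data from the study.

```python
import math

def odds_ratio(sel_hit, sel_miss, unsel_hit, unsel_miss):
    """Odds of an outcome in the selected set divided by the odds in the unselected set."""
    return (sel_hit / sel_miss) / (unsel_hit / unsel_miss)

def log_or_standard_error(sel_hit, sel_miss, unsel_hit, unsel_miss):
    """Standard error of log(OR) from the usual large-sample approximation."""
    return math.sqrt(1 / sel_hit + 1 / sel_miss + 1 / unsel_hit + 1 / unsel_miss)

# Hypothetical counts (NOT from the paper): SNPs that do / do not disrupt a
# predicted ESE, and simulated mutations that do / do not.
snp_disrupt, snp_other = 400, 1600
sim_disrupt, sim_other = 3000, 7000

or_value = odds_ratio(snp_disrupt, snp_other, sim_disrupt, sim_other)
se_log_or = log_or_standard_error(snp_disrupt, snp_other, sim_disrupt, sim_other)
depletion = 1 - or_value  # fraction of disrupting mutations apparently removed

print(f"OR = {or_value:.3f}, SE(log OR) = {se_log_or:.3f}, depletion = {depletion:.3f}")
```

An OR below 1 indicates that the outcome is rarer among SNPs than expected from the unselected mutation model, which is the signature of purifying selection that VERIFY looks for.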

Conflict of interest statement

The authors have declared that no conflicts of interest exist.

Figures

Figure 1. Density of Predicted ESEs and SNPs along Human Exons

RESCUE-ESE hexamers were searched against a database of 121,000 internal human exons. ESE density (blue curve) was determined as the fraction of hexamers beginning at the given exon position in this dataset that were contained in the RESCUE-ESE set. SNP density (red curve) was determined analogously using SNPs from dbSNP mapped to the exon database. Both curves were smoothed by averaging the densities over a leading (3′ss) or lagging (5′ss) window of ten nucleotides.
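The density calculation and smoothing described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names, the toy exons, and the one-element hexamer set are hypothetical, and reading "leading" as a window extending ahead of the position and "lagging" as a window extending behind it is an assumption.

```python
def ese_density(exons, ese_set, k=6, max_pos=50):
    """Fraction of k-mers starting at each exon position that belong to ese_set."""
    hits = [0] * max_pos
    totals = [0] * max_pos
    for exon in exons:
        # Only positions where a full k-mer fits inside the exon are counted.
        for pos in range(min(max_pos, len(exon) - k + 1)):
            totals[pos] += 1
            if exon[pos:pos + k] in ese_set:
                hits[pos] += 1
    return [h / t if t else 0.0 for h, t in zip(hits, totals)]

def smooth_leading(values, window=10):
    """Average each value with the values that follow it (window truncated at the end)."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[i:i + window]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

def smooth_lagging(values, window=10):
    """Average each value with the values that precede it (window truncated at the start)."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1):i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Toy input (hypothetical): two short "exons" and a one-hexamer ESE set.
toy_exons = ["GAAGAAGCT", "TTGAAGAAC"]
toy_ese_set = {"GAAGAA"}

density = ese_density(toy_exons, toy_ese_set, k=6, max_pos=4)
lead = smooth_leading(density, window=2)
lag = smooth_lagging(density, window=2)
print(density, lead, lag)
```

SNP density could be computed the same way by replacing the hexamer-membership test with a check for whether a mapped SNP falls at the given exon position.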

Figure 2. Analysis of the Effects of SNPs and Unselected Mutations on Predicted ESEs

(A) The percentages of the four prediction outcomes. ESE disruption (+ −), ESE alteration (+ +), ESE neutrality (− −), and ESE creation (− +) changes are listed for the set of 2,561 validated SNPs (selected) and for the set of 100,000 simulated (unselected) mutations. (B) Synonymous and nonsynonymous mutations were analyzed separately and then compared using the Mantel-Haenszel (MH) test for homogeneity. All outcomes passed the MH test for homogeneity (H_0: Outcome_synon ≈ Outcome_nonsynon; p < 0.05) and could, therefore, be combined into a summary odds ratio (OR), a weighted combination of the ORs measured in the synonymous and nonsynonymous sets. The height of each bar can be interpreted as the odds that the listed outcome will occur in the evolutionarily selected set of mutations (SNPs) relative to the odds that the same outcome will occur in the unselected (simulated mutation) set. Error bars extend one standard deviation on either side of the calculated value.
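One way to read the four outcome labels is as a pair of flags: whether any hexamer overlapping the mutated position is a predicted ESE in the ancestral allele (first sign) and in the derived allele (second sign). The sketch below implements that reading. The any-overlap rule, the function names, the sequences, and the toy hexamer sets are assumptions made for illustration, not the paper's exact procedure.

```python
def overlapping_kmers(seq, pos, k=6):
    """All k-mers of seq that contain position pos (up to k of them)."""
    first_start = max(0, pos - k + 1)
    last_start = min(pos, len(seq) - k)
    return [seq[s:s + k] for s in range(first_start, last_start + 1)]

def classify_mutation(ancestral_seq, pos, derived_base, ese_set, k=6):
    """Label a point mutation by ESE presence before and after the change."""
    derived_seq = ancestral_seq[:pos] + derived_base + ancestral_seq[pos + 1:]
    before = any(h in ese_set for h in overlapping_kmers(ancestral_seq, pos, k))
    after = any(h in ese_set for h in overlapping_kmers(derived_seq, pos, k))
    labels = {
        (True, False): "disruption (+ -)",
        (True, True): "alteration (+ +)",
        (False, False): "neutrality (- -)",
        (False, True): "creation (- +)",
    }
    return labels[(before, after)]

# Toy sequences and hexamer sets (hypothetical, for illustration only).
seq_with_ese = "CCCCCGAAGAACCCCC"   # contains GAAGAA at positions 5..10
seq_without = "CCCCCGACGAACCCCC"

disrupt = classify_mutation(seq_with_ese, 7, "C", {"GAAGAA"})
create = classify_mutation(seq_without, 7, "A", {"GAAGAA"})
alter = classify_mutation(seq_with_ese, 7, "T", {"GAAGAA", "GATGAA"})
neutral = classify_mutation("CCCCCCCCCCCC", 5, "T", {"GAAGAA"})
print(disrupt, create, alter, neutral)
```

Tallying these labels over the SNP set and over the simulated mutations gives the per-outcome counts from which percentages and odds ratios can be formed.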

Figure 3. Selection against Disruption of Predicted ESEs in Different Exon Regions

Summary ORs were calculated for mutations that disrupt RESCUE-ESEs as in Figure 2, for each of four regions spanning the length of a typical human internal exon. The heights of the blue bars represent the odds that an ESE will be disrupted by a mutation in the set of 2,561 validated SNPs (selected mutations) relative to the odds of disruption in the set of 100,000 simulated (unselected) mutations. Error bars extend one standard deviation on either side of the calculated value.
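A summary OR that pools strata (here, synonymous and nonsynonymous mutations) is commonly computed with the Mantel-Haenszel estimator, OR_MH = Σ(a_i d_i / n_i) / Σ(b_i c_i / n_i), where each stratum i is a 2×2 table (a, b, c, d) with total n. The sketch below assumes that this standard estimator is the weighted combination meant in the captions; the counts are hypothetical and are not taken from the study.

```python
def mantel_haenszel_or(strata):
    """Pooled odds ratio over 2x2 tables.

    Each stratum is (a, b, c, d):
      a = selected with outcome,   b = selected without outcome,
      c = unselected with outcome, d = unselected without outcome.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator

# Hypothetical strata (NOT from the paper): synonymous, then nonsynonymous.
example_strata = [
    (10, 20, 30, 40),
    (20, 40, 60, 80),
]
summary_or = mantel_haenszel_or(example_strata)
print(f"summary OR = {summary_or:.4f}")
```

Repeating the calculation on mutations restricted to each exon region would give one summary OR per region, which is the kind of quantity plotted as a bar in this figure.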

Figure 4. Measuring Selective Pressure on Each RESCUE-ESE Hexamer

Any point mutation alters six overlapping hexamers, and so a database of 8,408 SNP mutations alters a total of approximately 50,000 hexamers in the wild-type (ancestral) allele. For each of the 238 RESCUE-ESE hexamers, its frequency in the total set of ancestral alleles was recorded for the database of SNPs and for the simulated mutations (8,408 SNP mutations and 100,000 simulated mutations). The ESE frequency in the SNP set was divided by the ESE frequency in the simulated set to calculate the relative risk (RR) for each of the 238 hexamers. (A) The distribution of RR for all 238 ESE hexamers is plotted on a logarithmic scale. A resampling strategy was used to identify 57 ESE hexamers that were significantly conserved (pink bars have an RR less than 1; p < 0.05) and also six ESE hexamers that were not conserved (blue bars have an RR greater than 1; p < 0.05). (B) The output of RESCUE-ESE was compared for several vertebrate genomes (human, mouse, pufferfish, and zebrafish). The set of 238 human RESCUE-ESE hexamers was divided into nonoverlapping subsets based on their conservation in the RESCUE-ESE output generated from other vertebrates. The proportion of ESEs that were significantly conserved in the SNP analysis (as described above in [A]) was recorded for each subset of RESCUE-ESE hexamers and is represented as pink sectors in the pie chart.
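The per-hexamer ratio in panel (A) can be sketched as follows: collect the six ancestral hexamers overlapping each mutated position, compute each hexamer's share of that pool in the SNP set and in the simulated set, and divide. With 8,408 SNPs this pool holds 8,408 × 6 = 50,448 hexamers, matching the "approximately 50,000" in the caption. The function names and toy mutations are hypothetical, and the resampling step used for significance is not shown.

```python
from collections import Counter

def ancestral_hexamers(seq, pos, k=6):
    """The k-mers of the ancestral sequence that contain the mutated position."""
    first_start = max(0, pos - k + 1)
    last_start = min(pos, len(seq) - k)
    return [seq[s:s + k] for s in range(first_start, last_start + 1)]

def hexamer_frequencies(mutations, k=6):
    """Share of each hexamer among all ancestral hexamers overlapping the mutations."""
    counts = Counter()
    for seq, pos in mutations:
        counts.update(ancestral_hexamers(seq, pos, k))
    total = sum(counts.values())
    return {hexamer: count / total for hexamer, count in counts.items()}

def relative_frequency_ratio(snp_mutations, sim_mutations, ese_set, k=6):
    """Frequency in the SNP set divided by frequency in the simulated set, per hexamer."""
    snp_freq = hexamer_frequencies(snp_mutations, k)
    sim_freq = hexamer_frequencies(sim_mutations, k)
    # Hexamers never seen in the simulated set are skipped to avoid dividing by zero.
    return {
        h: snp_freq.get(h, 0.0) / sim_freq[h]
        for h in ese_set
        if sim_freq.get(h, 0.0) > 0
    }

# Toy mutations (hypothetical): (ancestral sequence, mutated position).
toy_snps = [("CCCCCGAAGAACCCCC", 7), ("CCCCCCCCCCCCCCCC", 7)]
toy_simulated = [("CCCCCGAAGAACCCCC", 7)]

ratios = relative_frequency_ratio(toy_snps, toy_simulated, {"GAAGAA"})
total_hexamers = 8408 * 6
print(ratios, total_hexamers)
```

A ratio below 1 means the hexamer is hit by SNPs less often than by simulated mutations, which is the pattern the caption describes as conserved.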
