Genomic features defining exonic variants that modulate splicing

Adam Woolfe et al. Genome Biol. 2010.

Abstract

Background: Single point mutations at both synonymous and non-synonymous positions within exons can have severe effects on gene function through disruption of splicing. Predicting these mutations in silico purely from the genomic sequence is difficult due to an incomplete understanding of the multiple factors that may be responsible. In addition, little is known about which computational prediction approaches, such as those involving exonic splicing enhancers and exonic splicing silencers, are most informative.

Results: We assessed the features of single-nucleotide genomic variants verified to cause exon skipping and compared them to a large set of coding SNPs common in the human population, which are likely to have no effect on splicing. Our findings implicate a number of features important for their ability to discriminate splice-affecting variants, including the naturally occurring density of exonic splicing enhancers and exonic splicing silencers of the exon and intronic environment, extensive changes in the number of predicted exonic splicing enhancers and exonic splicing silencers, proximity to the splice junctions and evolutionary constraint of the region surrounding the variant. By extending this approach to additional datasets, we also identified relevant features of variants that cause increased exon inclusion and ectopic splice site activation.

Conclusions: We identified a number of features that have statistically significant representation among exonic variants that modulate splicing. These analyses highlight putative mechanisms responsible for splicing outcome and emphasize the role of features important for exon definition. We developed a web-tool, Skippy, to score coding variants for these relevant splice-modulating features.

Figures

Figure 1

Proportion of variants with gains or losses in exonic splicing regulatory sequence with significant differences between splice-affecting genome variants and HapMap SNPs. SAVs were characterized by (a) the loss of ESEs and (b) the gain of ESSs. As a comparison, ESEfinder, Ast-ESR and PESE losses are also included. These were not significantly different between SAVs and hSNPs. Z score P-values from random bootstrap sampling relating to each type of change are located on the right of the histogram.
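
The bootstrap comparison mentioned in this caption can be illustrated with a minimal sketch. It assumes each variant is reduced to a 0/1 flag (for example, "loses at least one ESE"), resamples hSNP flags with replacement in sets the size of the SAV set, and converts the resulting Z score into a two-sided normal P-value. The function name, the number of resamples and the toy data are assumptions for illustration, not the authors' procedure or data.

```python
import random
import statistics
from math import erfc, sqrt


def bootstrap_z(sav_flags, hsnp_flags, n_samples=1000, seed=0):
    """Z score and two-sided P-value for the SAV proportion versus
    bootstrap resamples of hSNP flags (hypothetical helper)."""
    rng = random.Random(seed)
    k = len(sav_flags)
    observed = sum(sav_flags) / k

    # Proportion of flagged variants in each random hSNP sample of size k.
    sampled = []
    for _ in range(n_samples):
        draw = rng.choices(hsnp_flags, k=k)  # sampling with replacement
        sampled.append(sum(draw) / k)

    mu = statistics.mean(sampled)
    sd = statistics.stdev(sampled)
    if sd == 0:
        raise ValueError("bootstrap distribution has zero variance")

    z = (observed - mu) / sd
    p = erfc(abs(z) / sqrt(2))  # two-sided P-value under a normal approximation
    return z, p


# Invented toy data: 40% of SAVs flagged versus 10% of hSNPs.
toy_savs = [1] * 40 + [0] * 60
toy_hsnps = [1] * 100 + [0] * 900
z_score, p_value = bootstrap_z(toy_savs, toy_hsnps)
```

Fixing the seed keeps the resampling reproducible; the normal approximation is only reasonable when the bootstrap distribution is roughly symmetric.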

Figure 2

Splice-affecting genome variants are characterized by the loss of large numbers of NI-ESEs and the gain of large numbers of NI-ESSs, often in combination. For both ESE losses and ESS gains, the proportion of SAVs with changes of two or more was significantly greater than for hSNPs. Combinations of ESE losses and ESS gains, as opposed to each occurring independently, are highly enriched in SAVs compared to hSNPs (bottom graph).
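
As a rough illustration of how a single-nucleotide variant can change several hexamers at once, the sketch below enumerates the up to six overlapping hexamers that cover the variant position in the wild-type and variant sequences and counts ESE losses and ESS gains. The hexamer sets here are tiny invented placeholders, not the NI-ESE/NI-ESS sets used in the study, and the function names are hypothetical.

```python
def overlapping_hexamers(seq, pos, k=6):
    """Return every k-mer of seq that covers the 0-based index pos."""
    first = max(0, pos - k + 1)
    last = min(pos, len(seq) - k)
    return [seq[s:s + k] for s in range(first, last + 1)]


def esr_changes(wt_seq, pos, alt, ese_set, ess_set):
    """Count ESE hexamers lost and ESS hexamers gained by substituting
    the base at pos with alt."""
    var_seq = wt_seq[:pos] + alt + wt_seq[pos + 1:]
    pairs = list(zip(overlapping_hexamers(wt_seq, pos),
                     overlapping_hexamers(var_seq, pos)))
    ese_lost = sum(w in ese_set and v not in ese_set for w, v in pairs)
    ess_gained = sum(v in ess_set and w not in ess_set for w, v in pairs)
    return ese_lost, ess_gained


# Placeholder motif sets, invented for this example only.
TOY_ESE = {"AAGAAG", "GAAGAA"}
TOY_ESS = {"AATAAG"}

# G>T substitution at index 5 of a toy exon fragment.
changes = esr_changes("AAGAAGAAGAAG", 5, "T", TOY_ESE, TOY_ESS)
```

In this toy case the single substitution removes four ESE hexamers and creates one ESS hexamer, the kind of combined change the caption describes.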

Figure 3

Distribution of specific types of NI-ESR changes for SAVs and hSNPs compared to neutral expectation. The tilde symbol (~) signifies an alteration where the hexamer is designated an ESE, neutral or ESS in both the wild-type and variant sequences. The arrow represents the direction of the change as a consequence of the change between wild type and variant hexamer. The neutral expected distribution reflects the underlying probability of each type of change given the ESE/ESS distribution among NI hexamers and the genome-wide nucleotide substitution bias in coding regions.

Figure 4

SAVs are enriched at the borders of exons. SAV- and hSNP-containing exons were divided into six equal sections and the proportion of variants falling into each section was plotted. While hSNPs were distributed roughly equally across the exon (with some depletion towards the edges), SAVs were significantly enriched at both edges of the exon (P = 0.005).
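
The binning described here can be sketched as follows, assuming each variant is given as a 0-based position within its exon together with the exon length. The function names and toy coordinates are illustrative assumptions.

```python
def exon_section(pos_in_exon, exon_length, n_sections=6):
    """Map a 0-based position within an exon to one of n equal sections."""
    if not 0 <= pos_in_exon < exon_length:
        raise ValueError("position must lie inside the exon")
    return pos_in_exon * n_sections // exon_length


def section_proportions(variants, n_sections=6):
    """Proportion of (position, exon_length) variants in each section."""
    counts = [0] * n_sections
    for pos, length in variants:
        counts[exon_section(pos, length, n_sections)] += 1
    total = sum(counts)
    return [c / total for c in counts]


# Invented toy variants: (position in exon, exon length).
toy_variants = [(0, 120), (19, 120), (20, 120), (119, 120), (59, 60), (30, 60)]
proportions = section_proportions(toy_variants)
```

Scaling by exon length lets exons of different sizes contribute to the same six relative sections.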

Figure 5

Regions surrounding SAVs are under greater non-coding evolutionary constraint. (a) We created a 192-codon position-specific scoring matrix based on genome-wide conservation levels across mammals. Matrix scores are visualized increasing from green to red. As scores are inversely proportional to the genome-wide conservation of each codon position, conservation levels can also be visualized using the same matrix, decreasing from green to red. (b) For each variant, four-way mammalian multiple DNA alignments were extracted for a region surrounding the variant, and a score assigned to each fully conserved column via the scoring matrix, and the total normalized by the length of the alignment. An example of a random synonymous C>G variant is shown. (c) The mean conservation score for all SAVs (blue arrow) and SAVs on autosomes (yellow arrow) was compared to a distribution of randomly sampled sets of scores from all hSNPs (orange distribution). Randomly sampled distributions of hSNPs were also created controlling for minimum distance from a splice junction by having similar distributions in this regard as SAVs (blue distribution). A distribution of mean conservation scores was also produced for hSNPs from autosomes also controlled by minimum distance from the splice site (yellow distribution).
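
The scoring step in panel (b) can be approximated with the sketch below. It assumes an ungapped, in-frame reference sequence as the first row of the alignment and a lookup keyed by (codon, position in codon); the matrix values shown are invented placeholders, since the actual 192-entry matrix is not reproduced here, and the function name is hypothetical.

```python
def conservation_score(alignment, matrix, frame=0):
    """Sum matrix scores over fully conserved columns and normalize by
    alignment length. alignment[0] is taken as the reference sequence."""
    ref = alignment[0]
    length = len(ref)
    total = 0.0
    for i in range(length):
        column = {seq[i] for seq in alignment}
        if len(column) != 1 or "-" in column:
            continue  # only fully conserved, ungapped columns are scored
        codon_pos = (i - frame) % 3
        codon_start = i - codon_pos
        if codon_start < 0 or codon_start + 3 > length:
            continue  # partial codon at the alignment edge
        codon = ref[codon_start:codon_start + 3]
        total += matrix.get((codon, codon_pos), 0.0)
    return total / length


# Placeholder scores for two codons; values are invented for illustration.
toy_matrix = {
    ("ATG", 0): 1.0, ("ATG", 1): 1.0, ("ATG", 2): 1.0,
    ("GCC", 0): 2.0, ("GCC", 1): 2.0, ("GCC", 2): 0.5,
}
toy_alignment = ["ATGGCC", "ATGGCC", "ATGGCT", "ATGGCC"]
score = conservation_score(toy_alignment, toy_matrix)
```

Here the last column differs in one species and is skipped, so only five of the six columns contribute to the total.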

Figure 6

Exons containing SAVs have significantly lower ESE and significantly higher ESS densities than exons containing hSNPs. As an illustration, the proportion of overlapping hexamers that are considered ESEs (green), ESSs (red) or splice neutral (grey) was plotted for 35 exons containing SAVs (that cause ESE/ESS changes) and a set of 35 randomly selected, length-matched hSNP-containing exons. Exons in both sets are sorted in descending order by ESS density.
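
The per-exon densities plotted in this figure can be computed along the following lines: slide a hexamer window one base at a time across the exon and report the fraction of windows classed as ESE, ESS or neutral. The motif sets are invented placeholders and the function name is an assumption for illustration.

```python
def esr_density(exon_seq, ese_set, ess_set, k=6):
    """Fraction of overlapping k-mers in an exon that are ESE, ESS or neutral."""
    n = len(exon_seq) - k + 1
    if n <= 0:
        raise ValueError("exon is shorter than the hexamer length")
    ese = ess = 0
    for s in range(n):
        hexamer = exon_seq[s:s + k]
        if hexamer in ese_set:
            ese += 1
        elif hexamer in ess_set:
            ess += 1
    return {"ESE": ese / n, "ESS": ess / n, "neutral": (n - ese - ess) / n}


# Placeholder motif sets, invented for this example only.
density = esr_density("AAGAAGAAGAAG",
                      ese_set={"AAGAAG", "GAAGAA"},
                      ess_set={"AGAAGA"})
```

Sorting exons by the "ESS" value of this dictionary would reproduce the ordering used in the figure.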

Figure 7

Features that characterize variants that activate de novo ectopic splice sites ('ectopic SAVs'). (a) Most ectopic SAVs, in contrast to hSNPs and skipping SAVs, have a large ΔSS value and create an ectopic splice site that is stronger than the natural splice site. (b) Hexamers in the vicinity of the splice junctions are largely made up of ESSs. The graph represents the proportion of positions occupied either by an ESE or ESS motif across approximately 25,000 internal exons. Each position on the graph represents the first base of a hexamer sliding across 100 bp of the upstream and downstream introns and the first and last 50 bp of the exon. (c) Ectopic SAVs are located predominantly in the vicinity of the splice site of the same type created, that is, the majority of ectopic splice sites created are 5' ectopic sites and are located towards the end of the exon close to the 5' splice site. hSNPs that create a strong ectopic splice site computationally ('ectopic-like' hSNPs) are distributed across the exon in quite the opposite way, indicating the same constraints do not apply to these variants.
