Splicing of designer exons reveals unexpected complexity in pre-mRNA splicing

Abstract

Pre-messenger RNA (mRNA) splicing requires the accurate recognition of splice sites by the cellular RNA processing machinery. In addition to sequences that comprise the branchpoint and the 3′ and 5′ splice sites, the cellular splicing machinery relies on additional information in the form of exonic and intronic splicing enhancer and silencer sequences. The high abundance of these motifs makes it difficult to investigate their effects using standard genetic perturbations, since their disruption often leads to the formation of new elements. To lessen this problem, we have designed synthetic exons composed of multiple copies of a single prototypical exonic enhancer and a single prototypical exonic silencer sequence separated by neutral spacer sequences. The spacer sequences buffer the exon against the formation of new elements as the number and order of the original elements are varied. Over 100 such designer exons were constructed by random ligation of enhancer, silencer, and neutral elements. Each exon was positioned as the central exon in a 3-exon minigene and tested for exon inclusion after transient transfection. The level of inclusion of the test exons was seen to be dependent on the provision of enhancers and could be decreased by the provision of silencers. In general, there was a good quantitative correlation between the proportion of enhancers and splicing. However, widely varying inclusion levels could be produced by different permutations of the enhancer and silencer elements, indicating that even in this simplified system splicing decisions rest on complex interplays of yet to be determined parameters.

Keywords: pre-mRNA, splicing, enhancer, silencer, designer, synthetic

INTRODUCTION

In higher organisms, pre-messenger RNA (mRNA) splicing represents an essential step in the transfer of information from DNA to protein, i.e., the central dogma. Much is known about the chemistry of intron removal catalyzed by the spliceosome, a multisubunit ribonucleoprotein comparable in size and complexity to the ribosome. Less is known about the recognition of the splice sites, which is the key step in deciphering the information resident in the primary transcript. The splice site sequences themselves (a 9-nucleotide [nt] stretch straddling the 5′ splice site, an ∼15-nt region at the 3′ splice site [including the polypyrimidine tract], and a 7-nt branch point sequence) do not seem to contain sufficient information for this purpose, since such combinations of sequences occur within large introns at frequencies greater than the actual splice sites (Senapathy et al. 1990; Sun and Chasin 2000; Chasin 2007). Additional information can be provided in the form of splicing enhancer elements located at various positions within the exons (exonic splicing enhancers [ESEs]) or their intronic flanks (ISEs) and in similarly placed splicing silencers (exonic splicing silencers [ESSs] and intronic splicing silencers [ISSs]). In general ESE elements are bound by SR proteins and ESSs and ISSs by hnRNP proteins, but proteins outside these exact categories are also often involved (for reviews, see Ladd and Cooper 2002; Black 2003; Bourgeois et al. 2004; Zheng 2004; Pozzoli and Sironi 2005).

Complete catalogs of these regulatory sequence motifs have been sought by protein binding determinations, functional selections, and validated computational predictions. The results have in a sense been too successful, in that by now at least 75% of the nucleotides in a typical human exon reside in motifs that have been found to influence splicing in one study or another (Chasin 2007). This high density of regulatory information often makes it difficult to make genetic perturbations that cleanly test the role of a particular motif. Three examples of this emergent ambiguity are presented in Figure 1. Example 1 shows two typical cases of a SELEX winner from a functional selection for splicing activity (Liu et al. 1998). The fact that many motifs are likely to be found in any random sequence leads to the presence of a high noise level in such experiments. (A complete analysis of all sequences underlying the ESEfinder program is presented in Supplemental Fig. S1.) Example 2 shows the sequence resulting from the insertion of a putative exonic splicing silencer (PESS) that we examined for silencing activity in a test exon (Zhang and Chasin 2004). Besides the addition of the ESS, several enhancer motifs were created and a preexisting silencer of another class was disrupted. Example 3 shows the substitution in a test exon of a predicted exonic splicing regulator (ESR) motif that reduced splicing efficiency (Goren et al. 2006). Concomitant with the substitution, an ESE was disrupted and an additional ESR was created. These examples are typical rather than exceptional. Thus, the very act of placing a motif at an exactly specified location often changes the nature of the exon in unintended ways, reminiscent of the Heisenberg uncertainty principle (Heisenberg 1927).

FIGURE 1.

Examples of ambiguity in the identification or testing of splicing regulatory motifs. (1) Top: A functional SELEX selected sequence (Liu et al. 1998) that conferred responsiveness to SRp55. The match to the derived ESEfinder (Cartegni et al. 2003) SRp55 consensus sequence is underlined. The sequence also contains overlapping predicted PESE motifs (bold). Bottom: A functional SELEX selected sequence that conferred responsiveness to SRp40, with the SRp40 ESEfinder motif underlined. The sequence also contains a RESCUE-ESE (Fairbrother et al. 2002) motif (bold). These sequences are taken from those underlying ESEfinder (v.2; http://rulai.cshl.edu/tools/ESE2/) and were provided by Adrian Krainer (Cold Spring Harbor Laboratory). A similar analysis of all the sequences used by ESEfinder is presented in Supplemental Figure S1. (2) Testing of a predicted PESS by its insertion into a test exon (thbs4 exon 13) by Zhang and Chasin (2004). The bold sequence at the bottom was inserted into a BamHI site (arrowhead) in a test exon. Beyond the addition of the PESS, a fashex3 ESS (f-ESS) was disrupted (underlined in top sequence), a PESE was created (underlined in the bottom sequence), and two overlapping RESCUE-ESEs were created (R-ESEs, underlined in the bottom sequence). (3) Testing a predicted exonic splicing regulator (ESR, bold 6-mer at bottom) by substituting it for a 10-mer (bold at top) in a test exon (Goren et al. 2006). Besides the addition of the ESR, a PESE (underlined at top) was disrupted and an additional ESR (underlined at bottom) was created.

The context of a splicing regulatory motif (Kanopka et al. 1996; Mayeda et al. 1999; Goren et al. 2006) or of a splice site (Lear et al. 1990; Carothers et al. 1993; Hwang and Cohen 1997) can exert a strong influence on splicing efficiency. This context can be viewed in molecular terms by the quality and proximity of splicing regulatory motifs relative to each other and relative to the splice sites. A straightforward molecular genetic approach to test such a model would involve varying these parameters, but due to the density of regulatory motifs such variations would almost always change several parameters at once, and so confound the interpretation. One way to get around this problem might be to search for rare exons containing just a few well-defined regulatory motifs that are each separated by sequences predicted to have no effect on splicing, i.e., neutral sequences. Another way would be to construct such exons in silico and then in vitro, using as building blocks known motifs that have enhancing, silencing, or neutral effects. Here we have used the latter approach, using a prototype ESE, ESS, and a putatively neutral 8-mer motif. We have assembled exons with random combinations of these elements placed between a constant pair of natural 3′ and 5′ splice sites. These “designer exons” have a general requirement for the ESE modules to achieve efficient splicing and are inhibited by the inclusion of the ESS modules. Despite their apparently simplified modular organization, splicing of these designer exons exhibits a complex dependence on the exact pattern of the ESEs and ESSs present.

RESULTS

Design of the exons

We used three modules to build designer exons: an exonic splicing enhancer (ESE), an exonic splicing silencer (ESS), and a neutral sequence. Each module consisted of one particular 8-nt sequence chosen from among putative ESEs and ESSs and neutral sequences we previously identified on the basis of their overrepresentation in exons vs. human transcript regions that do not undergo splicing (Zhang and Chasin 2004). Libraries of exons consisting of multiple instances of one ESE and one ESS motif were created by using linkers with complementary overhangs (Fig. 2A) for the random ligation of the synthetic sequences. The linkers were designed to create a neutral motif upon ligation (Fig. 2B); thus the same neutral spacer is in place between each and every enhancer or silencer motif. The ligation products were inserted between 3′ and 5′ splice sites taken from intron 1 and intron 3, respectively, of the Chinese hamster dihydrofolate reductase (dhfr) gene. The exons so formed constituted the middle exon of a 3-exon minigene (Fig. 2C) with the 5′ exon being dhfr exon 1 (with its promoter) and the 3′ exon consisting of the fused dhfr exons 4 through 6 (with the first dhfr polyA site). Each designer exon also contains a neutral sequence at each end of the stretch of modules, generated as part of the insertion process (Fig. 2C).

FIGURE 2.

Construction of designer exons. (A) Cartoons of E, N, and S modules showing single-stranded ends used for ligation, along with the actual sequences. (B) Color-coded examples of possible designer exons (green, ESE; red, ESS; gray, neutral) along with the abbreviated notation used (E, S, N). Note that the abbreviated notation does not indicate the neutral 8-mer that lies between each E and S module and at each end. (C) Diagram of a designer exon within the test minigene used. Exon 1 is exon 1 of the Chinese hamster dhfr gene, exon 3 comprises the fused exons 4 to 6 of the dhfr gene. The 3′ and 5′ splice sites (SS) are from dhfr introns 1 and 3, respectively. (D) _Z_-score profile of all 8-mers in a possible designer exon (EESSEESE). The E and S modules generate salient signals over an otherwise unremarkable landscape.

The first and key step in the experimental design was to select an appropriate combination of ESE, ESS, and neutral sequence modules. We required the combination of ESE/ESS/neutral sequences to meet several criteria. The first, and only essential, criterion was that the concatenation of these modules in any order should not yield any sequence that falls outside a neutral range (see below). Second, their concatenation should not produce any in-frame stop codons, to rule out nonsense-mediated decay (NMD) as a factor. Third, the ESE and ESS should contain distinct restriction sites, to facilitate the determination of their order by partial digestion.

We previously devised a scoring scheme and identified lists of octamers as putative splicing enhancers and silencers (PESE and PESSs) (Zhang and Chasin 2004). In that work, each octamer was assigned two _Z_-scores based on its over/underrepresentation in internal noncoding exons vs. (1) pseudo exons and (2) 5′-UTRs of intronless genes. The two _Z_-scores were called the P-score and I-score, respectively. Underrepresented octamers were assigned negative _Z_-scores. An octamer was called a PESE if both scores were >2.62 or a PESS if both scores were lower than −2.62. Based on these criteria, we collected a list of ∼2000 PESEs and ∼1000 PESSs. We searched this list for PESE, neutral, and PESS sequence combinations that fulfilled the criteria discussed in the above paragraph, requiring the neutral spacer sequence to have a _Z_-score with an absolute value of <1.8. From among millions of 8-mer combinations, only about three dozen met all the criteria. We chose the following three sequences to build designer exons: TCCTCGAA (an ESE, P-score +3.99, I-score +3.44), CCAAACAA (a neutral sequence: P-score −0.28, I-score −0.98), and CACATGGT (an ESS, P-score −4.50, I-score −3.38), which we term “E,” “N,” and “S” for brevity. An example of the distribution of these scores across all of the 8-mers of a typical designer exon is shown in Figure 2D.
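The selection criteria can be illustrated with a minimal Python sketch. The cutoffs, the three 8-mers, and their scores are taken from the text; the function names, the assumption that the neutral cutoff applies to both scores, and the choice to scan all three reading frames for stop codons are illustrative assumptions rather than a description of the authors' software.

```python
# Sketch of the module selection criteria (names are hypothetical).
PESE_CUTOFF = 2.62     # both Z-scores above +2.62 -> PESE; both below -2.62 -> PESS
NEUTRAL_CUTOFF = 1.8   # assumption: |Z| < 1.8 is required of both scores
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Sequence, P-score, I-score for the three chosen modules (values from the text).
MODULES = {
    "E": ("TCCTCGAA", 3.99, 3.44),
    "N": ("CCAAACAA", -0.28, -0.98),
    "S": ("CACATGGT", -4.50, -3.38),
}

def classify_octamer(p_score, i_score):
    """Label an octamer from its two Z-scores."""
    if p_score > PESE_CUTOFF and i_score > PESE_CUTOFF:
        return "PESE"
    if p_score < -PESE_CUTOFF and i_score < -PESE_CUTOFF:
        return "PESS"
    if abs(p_score) < NEUTRAL_CUTOFF and abs(i_score) < NEUTRAL_CUTOFF:
        return "neutral"
    return "unclassified"

def has_stop_codon(seq):
    """True if TAA, TAG, or TGA occurs in any of the three reading frames."""
    return any(seq[k:k + 3] in STOP_CODONS for k in range(len(seq) - 2))

labels = {name: classify_octamer(p, i) for name, (_, p, i) in MODULES.items()}

# Every junction in a designer exon is spacer->module or module->spacer,
# so flanking each module with the spacer covers all junction types.
spacer = MODULES["N"][0]
flanked = {name: spacer + seq + spacer for name, (seq, _, _) in MODULES.items()}

stop_free = not any(has_stop_codon(s) for s in flanked.values())
taqi_modules = [name for name, s in flanked.items() if "TCGA" in s]    # TaqI site
cviaii_modules = [name for name, s in flanked.items() if "CATG" in s]  # CviAII site

print(labels, stop_free, taqi_modules, cviaii_modules)
```

Under these assumptions the three chosen sequences are labeled PESE, neutral, and PESS, no stop codon appears in any frame, and each diagnostic restriction site is confined to one module type.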

Enhancers are required for the efficient splicing of designer exons

We first tested these sequences for their effect on splicing by inserting each singly into a BamHI site in the central exon (chuk exon 8) of a 3-exon minigene, and measuring the proportion of exon inclusion (included/[included + skipped]) after transfection into human 293 cells and semiquantitative PCR, as described in our previous study (Zhang and Chasin 2004). As expected, the E sequence promoted splicing of a poorly spliced version of the _chuk_8 exon, the S sequence inhibited splicing of a well spliced _chuk_8 exon, and the N sequence had little effect on either type of exon (data not shown). We next constructed homogeneous designer exons made up of multiple copies of just a single type of motif (E, S, or N), constructed as described above and in Figure 2. Designer exons made up of E modules spliced very well, whereas those made up of S modules showed little or no splicing. Designer exons made up exclusively of concatenated N modules were also very poorly spliced: An exon with three modules (seven N 8-mers counting the spacers) was included 13% of the time and an exon with five modules (11 N 8-mers counting the spacers) showed almost no inclusion. Thus, these designer exons in this context require an enhancer for efficient splicing. We then went on to assemble a large number of additional designer exons carrying both E and S modules and test their splicing efficiency after transfection into 293 cells.
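The 8-mer counts quoted above follow from the abbreviated notation: an exon written with m named modules carries m + 1 additional spacer 8-mers. The sketch below expands the notation under the assumption that the ligated product is a simple head-to-tail run of 8-mers; the function names are hypothetical, and the splice sites and minigene flanks are not modeled.

```python
# Expand abbreviated designer exon notation into its 8-mers (illustrative sketch).
SPACER = "CCAAACAA"  # the neutral 8-mer, also used as the named N module
MODULE_SEQ = {"E": "TCCTCGAA", "N": "CCAAACAA", "S": "CACATGGT"}

def expand(notation):
    """Return the 8-mers in order: spacer, module, spacer, ..., module, spacer."""
    octamers = [SPACER]
    for letter in notation:
        octamers.append(MODULE_SEQ[letter])
        octamers.append(SPACER)
    return octamers

def designer_exon_sequence(notation):
    """Concatenate the expanded 8-mers into one sequence."""
    return "".join(expand(notation))

for notation in ("NNN", "NNNNN", "ENE"):
    octamers = expand(notation)
    print(notation, len(octamers), len(designer_exon_sequence(notation)))
```

With this expansion, NNN yields seven N 8-mers (56 nt) and NNNNN yields 11 (88 nt), matching the counts given in the text.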

Splicing of designer exons carrying randomly combined E and S motifs

We ligated E and S motifs at various ratios, inserted the ligation products into the 3-exon minigene vector, and isolated 139 clones. The number and order of the modules were quickly determined by PCR amplification of the plasmid region spanning the designer exon using a fluorescently labeled primer followed by partial digestion with TaqI (TCGA) and with CviAII (CATG), as these sites are present in the E and S modules, respectively. From the ladder of fluorescent bands seen after electrophoresis, the arrangement of modules could be deduced (Fig. 3A). Splicing was then measured after transfection of 293 cells with plasmid DNA and using semiquantitative radioactive RT-PCR (Chen and Chasin 1993) to quantify molecules that included or skipped the designer exon. An example of these results is shown in Figure 3B. The splicing efficiencies of the 139 exons tested are presented in Figure 4, where they are ranked according to the proportion of designer exon inclusion and shown with a graphical depiction of their structure. Reading from left to right, it can be seen by eye that in general splicing efficiency decreases as the number of Es (green boxes) per exon decreases and as the number of Ss (red boxes) increases. It should be kept in mind that in between each E and/or S there exists an N module that is not depicted. Less easily seen but also discernible is a tendency for splicing efficiency to decrease with exon length.
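The logic of reading module order from the two partial digests can be sketched as follows. This is an illustration only: it assumes the exon is a head-to-tail run of 8-mers with a spacer between modules and at each end, measures positions from the 5′ end of the exon rather than from the labeled primer, and ignores the exact cut position within each recognition site, so only the relative order of bands is meaningful. Function names are hypothetical.

```python
# Simulate the TaqI and CviAII ladders for a designer exon and read the order back.
SPACER = "CCAAACAA"
MODULE_SEQ = {"E": "TCCTCGAA", "S": "CACATGGT"}
SITES = {"TaqI": "TCGA", "CviAII": "CATG"}

def exon_sequence(order):
    """Spacer, then each module followed by a spacer."""
    return SPACER + "".join(MODULE_SEQ[m] + SPACER for m in order)

def site_positions(seq, site):
    """0-based start positions of every occurrence of a recognition site."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions

def read_order(taqi_positions, cviaii_positions):
    """Merge both ladders by distance from the labeled end: TaqI bands mark E, CviAII bands mark S."""
    ladder = [(p, "E") for p in taqi_positions] + [(p, "S") for p in cviaii_positions]
    return "".join(label for _, label in sorted(ladder))

order = "SEESSESESE"  # the 10-module clone shown in Figure 3A
seq = exon_sequence(order)
taqi = site_positions(seq, SITES["TaqI"])
cviaii = site_positions(seq, SITES["CviAII"])
print(taqi)
print(cviaii)
print(read_order(taqi, cviaii))
```

Because each site occurs once per module and nowhere in the spacers or junctions, interleaving the two ladders recovers the input order.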

FIGURE 3.

Determination of designer exon genotypes and phenotypes. (A) Module order screening. Plasmid DNA from designer exon clones was PCR-amplified using one fluorescently tagged primer and then cleaved with the diagnostic restriction enzymes (RE) TaqI, which cuts in the E module, or CviAII, which cuts in the S module. (Lane 1) A clone with seven E modules and no S modules, cut with TaqI; (lane 4) a clone with six S modules and no E modules, cut with CviAII; these lanes serve here as standards. (Lanes 2,3) The same 10-module clone (SEESSESESE) cut with either TaqI (lane 2) or CviAII (lane 3). The relative positions of the bands (labeled E or S) allow the order of E and S modules to be read directly from the gel. All constructs used for analysis were subsequently DNA-sequenced. (B) Splicing phenotype measurement. Plasmids harboring designer exons were transfected into 293 cells and the mRNA products were amplified by radioactive RT-PCR. The relative amounts of molecules that included (I) or skipped (S) the designer exon were determined by Phosphorimaging. The analysis of eight representative clones is shown, with two independent transfections for each clone. The sequence of modules present in each exon is shown below each pair of lanes.

FIGURE 4.

Splicing of designer exons. Bottom: Splicing of 139 designer exons ranked by percent exon inclusion. Exon inclusion is defined as 100 × included/(included + skipped), and the value is the average of at least two independent transfections (average SE = 16%). Top: Structure of the corresponding designer exons. Each colored rectangle represents an E (green) or S (red) module. The 5′ end of the exon is at the bottom. Exon inclusion levels for these designer exons are presented in tabular form in Supplemental Table 1.
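The inclusion measure defined in this legend can be written out as a short sketch. The band intensities below are hypothetical placeholders, not measured data, and the function names are illustrative.

```python
# Percent exon inclusion as defined in the legend: 100 * included / (included + skipped).
def percent_inclusion(included, skipped):
    """Percent inclusion from the signals of the included and skipped products."""
    total = included + skipped
    if total <= 0:
        raise ValueError("no signal in either band")
    return 100.0 * included / total

def mean_inclusion(replicates):
    """Average percent inclusion over independent transfections."""
    values = [percent_inclusion(inc, skip) for inc, skip in replicates]
    return sum(values) / len(values)

# Hypothetical band intensities (arbitrary units) from two transfections of one clone.
replicates = [(750.0, 250.0), (650.0, 350.0)]
print(mean_inclusion(replicates))  # 75% and 65% average to 70.0
```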

A more quantitative assessment of correlations between splicing efficiency and designer exon structure was made by calculating the coefficient of determination, R² (the square of Pearson's correlation coefficient), from scatter plots of the data (Fig. 5). Each of the six charts in Figure 5 tests a hypothesis about the dependence of splicing on these regulatory sequences. The first is that splicing is proportional to the absolute number of enhancer modules in an exon. We found a significant, if weak, correlation between splicing and the number of Es (Fig. 5B; R² = 0.06, P = 0.004, _t_-test). The correlation was much stronger when we considered the proportion of Es in an exon (Fig. 5A; R² = 0.53, P < 3e-24). A converse hypothesis is that it is the silencers that play the most important role in determining splicing efficiency. Indeed, the number of Ss per exon produced a much stronger (negative) correlation with inclusion rate (Fig. 5C; R² = 0.78, P < 5e-47) than the number of Es. The correlation for the proportion of Ss is the same as for the proportion of Es by definition here, although of opposite sign.
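The six predictors plotted in Figure 5 and the statistics quoted here can be sketched in a few lines of Python. The function names, the handling of exons without silencers (the E/S ratio is set to infinity), and the small data set in the usage lines are assumptions for illustration; the original fits were made in Excel. The t statistic uses the standard relation t = sqrt(R²(n − 2)/(1 − R²)) for a simple linear regression.

```python
import math

def features(order):
    """Predictors tested in Figure 5 for one designer exon, e.g. 'EESSEESE'."""
    n_e, n_s = order.count("E"), order.count("S")
    return {
        "pct_E": 100.0 * n_e / (n_e + n_s),
        "n_E": n_e,
        "n_S": n_s,
        "ratio": n_e / n_s if n_s else math.inf,  # assumption: undefined ratio -> inf
        "diff": n_e - n_s,
        "length": n_e + n_s,
    }

def r_squared(xs, ys):
    """Coefficient of determination of a simple linear regression of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

def t_statistic(r2, n):
    """t statistic (n - 2 degrees of freedom) for testing a nonzero correlation."""
    return math.sqrt(r2 * (n - 2) / (1.0 - r2))

print(features("EESSEESE"))

# Hypothetical inclusion values, used only to show the calculation.
exons = {"EEEE": 95.0, "EESE": 70.0, "ESES": 40.0, "SSES": 15.0, "SSSS": 2.0}
xs = [features(order)["pct_E"] for order in exons]
ys = list(exons.values())
print(round(r_squared(xs, ys), 3))

# R² = 0.53 over 139 exons gives a t statistic above 12.
print(round(t_statistic(0.53, 139), 1))
```

A t statistic of this size with 137 degrees of freedom is in line with the very small P value reported for the %E fit.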

FIGURE 5.

Correlations between exon inclusion and splicing regulatory elements. Straight lines were fitted by a linear regression (Excel). (A) Percent enhancers (100 × E/[E + S]). (B) Number of enhancers. (C) Number of silencers. (D) Ratio of enhancers to silencers. (E) Number of enhancers minus number of silencers. (F) Exon length (number of Es plus Ss).

The factors governing splicing decisions, or the splicing code, have often been ascribed to a balance between positive and negative factors, so we tested the effect of combining the E and S content of each exon by calculating the E/S ratio and the E − S difference. To our surprise, these variables were less correlated with splicing (Fig. 5D,E; R² = 0.40 and 0.42, respectively) than the proportion of Es or Ss considered independently, suggesting that combining Es and Ss in these ways added more noise than information. We also examined exon length, a variable related to E or S content, and found a weaker but highly significant negative correlation with splicing (Fig. 5F; R² = 0.16, P < 1e-6). The strongest correlation seen was with the number of Ss (Fig. 5C), indicating that silencing was the most important factor at play in these exons. However, the negative effect of the number of Ss is actually a measurement of two variables: percentage of Ss and length, since longer exons will tend to have more Ss. The effect of Ss normalized for exon length can be seen in the plot of %E (Fig. 5A), which is equivalent to 1 − %S, and which shows a lesser but still strong correlation (0.53 for %E vs. 0.78 for number of Ss).

Splicing of designer exons carrying no silencers

Although the correlation coefficients for %E and number of Ss indicate that most of the variance can be explained by a linear relationship between inclusion and these variables, the fact remains that there is considerable scatter among these points. For example, in the plot with the best correlation (Fig. 5C), exons having three Ss yielded inclusion levels ranging from 3% to 55%. We considered the possibility that complexities inherent in antagonism between Es and Ss tend to produce a metastable state, and that a better correlation between splicing and Es would be seen in the absence of added silencers. Designer exons were therefore constructed by randomly ligating E and N modules. Here again we point out that there were always additional N modules as spacers between each pair of named modules, plus one at each end: i.e., the designation ENE represents the sequence nEnNnEn, where the lower case n is formed in the course of construction and is the same sequence as N. Twenty-two E + N exons were analyzed for splicing. The results are shown in Figure 6A, which displays the inclusion levels that correspond to the specific exon structures. The correlation between inclusion and %E was somewhat better for these E + N exons (Fig. 6B; R² = 0.75, P < 6e-13) compared to E + S exons (Fig. 5A; R² = 0.53, P < 3e-24), but there was still considerable scatter: At E = 50% the inclusion levels ranged from 38% to 94%.

FIGURE 6.

Splicing of exons designed without silencers. (A) Bottom: Splicing of 22 designer exons ranked by percent exon inclusion. Top: Structure of the corresponding designer exons. Each colored rectangle represents an E (dark gray) or N (light gray) module. An additional N module is present between each E and N module but these are not depicted. The 5′ end of the exon is at the bottom. (B) Correlation between exon inclusion and the proportion of E elements in an exon. Exon inclusion levels for these designer exons are presented in tabular form in Supplemental Table 1.

Consideration of predicted secondary structure

It is possible that many of the E and S sequences included in these exons are not available for enhancing or silencing splicing because they are sequestered in the double-stranded stems of secondary structures. Anecdotal examples of secondary structure affecting splicing are many (for review, see Buratti and Baralle 2004) and a survey of functional ESEs showed that these tend to remain single-stranded (Hiller et al. 2007). If only the single-stranded Es in our designer exons were functional, then we might see a better correlation between inclusion and this subset of Es. We used RNAstructure (Mathews 2006) to fold each of the E + S designer exons and then assigned each base in the E, N, and S modules a probability of being in a double-stranded stem (s), a single-stranded loop (l), or a single-stranded interstem region (i). The designer exons as a whole did not form exceptionally stable secondary structures; for the most stable structures the average free energy value per nucleotide was −0.22 kcal/mol compared to −0.21 for scrambled versions. Correlation coefficients were calculated between exon inclusion level and each of these nine variables (Es, El, Ei; Ns, Nl, Ni; Ss, Sl, Si). As can be seen in Table 1, none of the R² values for the proportion of E or S nucleotides that are confined to stems, loops, or interstem regions was appreciably greater than the value for the E or S nucleotides as a whole. These results do not support the idea that variable secondary structures underlie the wide ranges of inclusion levels for designer exons that have the same proportion of E or S modules.

TABLE 1.

Correlation coefficients between inclusion level and the proportion of module bases found in different types of predicted secondary structures


DISCUSSION

Responses to enhancers and silencers

The concatenation of single specific enhancer and silencer modules has been used here to construct exons that are much less complex than their natural counterparts. The plainness of these exons has allowed us to test some simple hypotheses regarding internal exon recognition in pre-mRNA splicing.

The first hypothesis is that enhancers are necessary for efficient splicing, and it is supported by our results. Designer exons consisting solely of neutral sequences spliced poorly if at all; incorporation of enhancers was required to achieve inclusion levels near 100%. As yet, this conclusion is limited to the context we provided, the natural splice sites and intronic flanks found in the dhfr gene. These 3′ and 5′ splice sites are of average or above average strength (i.e., agreement with the consensuses), with consensus values (Senapathy et al. 1990; Zhang et al. 2005b) of 81 and 88, respectively. It is likely that provision of stronger splice site sequences could obviate the need for an enhancer (see, for example, Ram et al. 2008). Nonetheless, most natural exons do not have splice sites stronger than those used here.

The second hypothesis is that splicing efficiency increases in proportion to the number of enhancer elements, and it is less directly supported by our data. Hertel and Maniatis showed that in vitro splicing of a 2-exon transcript responded linearly to the addition of multiple enhancer elements downstream from the 3′ splice site (Hertel and Maniatis 1998). Our results for an internal exon agree with this finding in that a highly significant correlation (R² = 0.53) to a linear model was found for exon inclusion vs. the number of enhancers per exon if the data were normalized for exon length differences (Fig. 5A; %E). However, the great splicing variability seen among exons having the same proportion of enhancers belies the simple model in which the mere presence of an enhancer sequence adds linearly to the probability of binding an activator protein which in turn leads to a proportional increase in splicing.

We tried to take into account the possibility that some of the included motifs were being sequestered in secondary structures, but our test did not provide support for this idea. In particular, the simple notion that Es are much more effective when present as single-stranded targets was not substantiated. This negative result cannot be considered conclusive, as our ability to predict the in vivo secondary and tertiary structures of RNA molecules is limited. RNA folding in vivo may be influenced by RNA binding proteins that unwind, compete with, or enhance RNA–RNA interactions. Some of this secondary structure analysis was rather surprising, suggesting that S modules were more effective in inhibiting splicing when present in stems and loops compared to interstem regions (Table 1). It is possible that the particular S motif we used is a better target when presented in double-strand form. The N modules also showed an inhibitory effect when present in stems but a stimulatory effect when present in interstem regions (Table 1). These effects could be indirect, for instance, by N sequences forming a stem that places an S module in a loop. These ideas are amenable to experimental testing.

The third hypothesis is that splicing is the result of a balance between positive and negative elements. This oft-quoted inference has gained wide acceptance based mostly on its reasonableness, but has rarely been tested using multiple elements. Our data have supported this idea in a general sense: Whether the balance is considered the difference between enhancer and silencer content or their ratio, R² values of 0.4 were obtained. These ways of defining balance proved no better than that given by the proportion of enhancer motifs (%E), which intrinsically also compares enhancer to silencer content (%E = 100 × E/[E + S]) in this system. However, the scatter in the data suggests that splicing is highly dependent on the relative positions of the E and S motifs, beyond their relative proportions. As yet we cannot posit a straightforward mathematical model capable of predicting splicing patterns based on the positions of the E and S modules. Table 2 illustrates this difficulty by showing three examples of pairs of compositional isoforms exhibiting two- to 10-fold differences in splicing efficiency despite having identical E/S ratios and E − S differences.
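A search for compositional isoforms of the kind listed in Table 2 could be sketched as below. The function names, the twofold threshold, and the inclusion values are hypothetical; the measured values are those in the supplemental table and are not reproduced here.

```python
from collections import defaultdict
from itertools import combinations

def composition(order):
    """Number of E and S modules, ignoring their order."""
    return (order.count("E"), order.count("S"))

def isoform_pairs(inclusion, min_fold=2.0):
    """Pairs of exons with identical composition whose inclusion differs by at least min_fold."""
    groups = defaultdict(list)
    for order, pct in inclusion.items():
        groups[composition(order)].append((order, pct))
    pairs = []
    for members in groups.values():
        for (a, pct_a), (b, pct_b) in combinations(members, 2):
            low, high = sorted((pct_a, pct_b))
            if low > 0 and high / low >= min_fold:
                pairs.append((a, b, high / low))
    return pairs

# Hypothetical percent inclusion values, for illustration only.
inclusion = {"EESS": 60.0, "SESE": 20.0, "ESSE": 55.0, "EEES": 90.0}
pairs = isoform_pairs(inclusion)
for a, b, fold in pairs:
    print(a, b, round(fold, 2))
```

Exons in the same group share their E/S ratio, E − S difference, and length, so any fold difference reported this way reflects module arrangement (or an unmodeled factor) rather than composition.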

TABLE 2.

Designer exon pairs with the same composition but different splicing behavior


Designer exons as a model system

The synthetic biology used to create designer exons results in molecules that are unlike any found in living cells. One might argue that we run the danger of being misled by the analysis of such artificial molecules; their behavior will tell us little about the rules governing the splicing of their more complex natural counterparts. We contend the opposite: a prerequisite to understanding how individual elements combine to yield an emergent property is an understanding of how the individual elements act, first alone, and then in simple combinations. The way in which transcriptional regulatory signals combine to produce what is often a binary decision is in many ways analogous to the splicing decision problem, and can also be explored using synthetic promoters (Alper et al. 2005; Ligr et al. 2006). Additional examples of this approach lie in the de novo design of proteins (Kuhlman et al. 2003) and cell membranes (Tanaka and Sackmann 2005).

We have not implemented an orchestrated combinatorial approach in these first experiments, but rather relied on seeing simple patterns emerge from randomly assembled molecules. Such was not the case, pointing to our ignorance of important parameters yet to be defined. We can speculate on several ways in which parameters may have been hidden in these designer exons. First, the assumption that the N modules are truly neutral may be wrong. The N module was chosen from among 10,000 sequences predicted to be neutral and was evaluated in an arbitrarily chosen test exon; in a different context it may not act neutrally. Second, we may have been overzealous in isolating the E motifs from each other and from the S motifs. The inclusion of 8-nt neutral spacers between each motif may be precluding important interactions that require close apposition. In this regard, we see no reason to assume that fundamentally neutral spacers are not occupied by RNA-binding proteins and so act passively to prevent positive interactions. Third, the multiplicity of exon lengths that were produced by random ligation added a confounding factor in the interpretation of the results.

Many of these problems will be solved by synthesizing designer exons in a more deliberate fashion. To this end we are developing methods to readily synthesize exons of defined size, and adding specific modules one or two at a time to produce a stable of exactly designed molecules. For instance, the placement of an E at 10 evenly spaced positions within a 160-nt exon will answer the question of whether proximity to a splice site is required for ESE action and whether there is specificity of a given E for a 3′ or 5′ splice site. This information can then be extended to real exons to weigh the potential of embedded ESEs and so improve exon prediction and the prediction of alternative splicing efficiency. The interplay between splice site strength and ESE and ESS effectiveness can also be weighed first in designer exons and then applied to natural exons. Finally, an all N vector with a unique restriction site will provide a convenient and perhaps more reliable vector for testing candidate ESEs gathered from real genes. Thus, despite the complexities revealed by this first generation analysis, we think that the properties of these synthetic molecules are likely to be useful for discovering properties of their natural counterparts.

MATERIALS AND METHODS

Construction of designer exons

The starting plasmid was pDCH1P12D, which contains a Chinese hamster dhfr minigene consisting of exon 1, a hybrid intron 1 + 3, exons 4 through 6 of the cDNA, and the first natural dhfr polyA site in exon 6; a NotI site was added near the center of the intron, which also contains a unique NheI site. The chuk8 minigene used for the initial test of candidate motifs was constructed by inserting exon 8 of the human chuk gene into the NotI site, either with ∼50 nt of intronic flanks or with just its splice sites. The construction of these chuk8 minigenes has been described previously (Zhang et al. 2005a,c). To construct designer exons, the two strands of E, S, or N modules were first mixed at equal concentrations (∼5 μg/μL) in an annealing buffer (30 mM HEPES at pH 7.4, 2 mM MgOAc, 100 mM KOAc). The mixtures were heated to 95°C and then allowed to cool gradually at room temperature. The annealed E, S, or N modules were mixed at different ratios (1:3, 1:1, or 3:1) with T4 ligase (10 U in a 20-μL reaction mixture, 0.5 μg of each module). The ligation mix was subsequently electrophoresed in a 2% agarose gel, and the products corresponding in length to 3 to 16 modules were extracted from the gel. These fragments were then ligated as a pool to the constant 5′ and 3′ modules containing an upstream EagI or downstream NheI restriction site, respectively. The product of this second ligation was then subjected to PCR with primers targeting the constant 5′ and 3′ modules. The PCR products were cut with EagI and NheI and inserted into pDCH1P12D that had been cut with NotI and NheI, the latter located 300 nt downstream from the NotI site of pDCH1P12D. The resulting minigenes have the designer exon located between a 300-bp upstream intron and a 600-bp downstream intron.

The order of the E and S modules in individual transformant colonies was determined by PCR amplification of the region spanning the designer exon, using one primer 5′-end-labeled with Cy5, followed by separate partial digestions with TaqI and CviAII, for which there are restriction sites in the E and S sequences, respectively. After electrophoresis, the fluorescent bands were visualized using a PhosphorImager Storm (Fig. 3A). The order of the E and S modules could be read from the partial digest patterns. All unique designer exons chosen for analysis were subsequently sequenced (GeneWiz).
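The logic of reading module order from end-labeled partial digests can be illustrated with a minimal, idealized sketch. It is not the procedure used here: the module length, flank length, and cut offset below are hypothetical placeholders (the actual values are not stated in this section), and real gel bands are not measured to single-nucleotide precision. The sketch assumes that every module has the same length, that TaqI cuts only within E modules, and that CviAII cuts only within S modules.

```python
# Hypothetical geometry, chosen only for illustration (not taken from the paper).
MODULE_LEN = 16   # assumed uniform length of each E, S, or N module (nt)
FLANK_LEN = 40    # assumed distance from the labeled 5' end to the first module (nt)
CUT_OFFSET = 4    # assumed position of the cut within a module (nt)


def expected_bands(order):
    """Return labeled fragment lengths per lane for a module order such as 'ENSE'.

    Only fragments that retain the labeled 5' end are visible, so each cut
    site yields one band whose length is the distance from the label to the cut.
    """
    bands = {"TaqI": [], "CviAII": []}
    for idx, module in enumerate(order):
        length = FLANK_LEN + idx * MODULE_LEN + CUT_OFFSET
        if module == "E":
            bands["TaqI"].append(length)
        elif module == "S":
            bands["CviAII"].append(length)
        # N modules carry neither site and leave a gap in both ladders.
    return bands


def read_order(taqi_bands, cviaii_bands, n_modules):
    """Reconstruct the module order from the two partial-digest ladders."""
    order = ["N"] * n_modules
    for lane_bands, letter in ((taqi_bands, "E"), (cviaii_bands, "S")):
        for length in lane_bands:
            idx, rem = divmod(length - FLANK_LEN - CUT_OFFSET, MODULE_LEN)
            if rem != 0 or not 0 <= idx < n_modules:
                raise ValueError(f"band of {length} nt does not map to a module")
            if order[idx] != "N":
                raise ValueError(f"conflicting bands at module {idx + 1}")
            order[idx] = letter
    return "".join(order)
```

Under these assumptions, `read_order(**...)` inverts `expected_bands`: positions with a TaqI band are called E, positions with a CviAII band are called S, and positions with no band in either lane are called N.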

Measurement of splicing

Human HEK293 cells were transfected with plasmid DNA using Lipofectamine 2000 (Invitrogen) according to the manufacturer's protocol. After 24 h, total RNA was isolated using RNAwiz (Ambion), reverse-transcribed with Omniscript and random hexamers from QIAGEN, and subjected to radioactive semiquantitative PCR as described previously (Chen and Chasin 1993). Percent inclusion was calculated as 100 × included/(included + skipped), using PhosphorImager counts of the indicated electrophoretic bands, taking into account the number of labeled bases in each molecule. Each transfection was performed at least twice; the average standard error for biological replicates was 16%.
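The percent inclusion formula can be written out as a short sketch. The function and parameter names are illustrative, and the normalization shown (dividing each band's counts by its number of labeled bases to obtain a relative molar amount) is our reading of "taking into account the number of labeled bases in each molecule", not a description of the authors' actual software.

```python
import math
import statistics


def percent_inclusion(included_counts, skipped_counts,
                      included_labeled_bases, skipped_labeled_bases):
    """Percent inclusion = 100 x included / (included + skipped).

    Band counts are first divided by the number of labeled bases per
    molecule so that longer products are not over-weighted.
    """
    if included_labeled_bases <= 0 or skipped_labeled_bases <= 0:
        raise ValueError("labeled base counts must be positive")
    included = included_counts / included_labeled_bases
    skipped = skipped_counts / skipped_labeled_bases
    total = included + skipped
    if total == 0:
        raise ValueError("no signal in either band")
    return 100.0 * included / total


def mean_and_sem(replicates):
    """Mean and standard error of the mean for two or more replicate values."""
    if len(replicates) < 2:
        raise ValueError("need at least two replicates")
    mean = statistics.mean(replicates)
    sem = statistics.stdev(replicates) / math.sqrt(len(replicates))
    return mean, sem
```

For example, with made-up numbers, an included band of 3000 counts carrying 300 labeled bases and a skipped band of 1000 counts carrying 200 labeled bases give relative molar amounts of 10 and 5, that is, about 66.7% inclusion.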

Secondary structure analysis

The 139 E + S designer exon sequences were folded from −100 to +100 relative to the exon ends using RNAstructure 4.5 (Mathews 2006) for DOS with default settings. The output of this program included the 20 most stable structures in .ct table format. The .ct values from all 20 structures were converted to Vienna dot-bracket notation. The average designation of each base in each of the three structural categories (stem, loop, and interstem) was recorded, along with its identity as part of an E, N, or S motif. Pearson's correlation coefficient (r) and coefficient of determination (R²) were calculated for percent inclusion vs. percent of motif bases in each of the various predicted structural classes using a custom-written computer program.
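The two computational steps named above, conversion of a .ct table to dot-bracket notation and calculation of Pearson's r, can be sketched as follows. This is an illustrative reimplementation, not the custom program used in this work. It assumes the standard .ct column layout (index, base, previous index, next index, pairing partner or 0, original numbering), reads only the first structure in the text, and assumes no pseudoknots.

```python
import math

# Toy single-structure .ct table for a 9-nt hairpin (made-up example).
EXAMPLE_CT = """\
9 ENERGY = -1.0 example
1 G 0 2 9 1
2 G 1 3 8 2
3 G 2 4 7 3
4 A 3 5 0 4
5 A 4 6 0 5
6 A 5 7 0 6
7 C 6 8 3 7
8 C 7 9 2 8
9 C 8 0 1 9
"""


def ct_to_dot_bracket(ct_text):
    """Return (sequence, dot-bracket string) for the first structure in a .ct table."""
    lines = [ln for ln in ct_text.strip().splitlines() if ln.strip()]
    n = int(lines[0].split()[0])          # header: base count, then a title
    sequence = []
    structure = ["."] * n
    for ln in lines[1:n + 1]:
        fields = ln.split()
        i, base, partner = int(fields[0]), fields[1], int(fields[4])
        sequence.append(base)
        if partner > i:                   # record each pair once, from its 5' base
            structure[i - 1] = "("
            structure[partner - 1] = ")"
    return "".join(sequence), "".join(structure)


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)
```

The coefficient of determination for a simple linear fit is then `pearson_r(xs, ys) ** 2`.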

SUPPLEMENTAL MATERIAL

Supplemental material can be found at http://www.rnajournal.org.

ACKNOWLEDGMENTS

We thank Ashira Lubkin for a critical reading of the manuscript. This work was supported by a grant from the NIH (GM072740) to L.A.C.

REFERENCES

  1. Alper H., Fischer C., Nevoigt E., Stephanopoulos G. Tuning genetic control through promoter engineering. Proc. Natl. Acad. Sci. 2005;102:12678–12683. doi: 10.1073/pnas.0504604102.
  2. Black D.L. Mechanisms of alternative pre-messenger RNA splicing. Annu. Rev. Biochem. 2003;72:291–336. doi: 10.1146/annurev.biochem.72.121801.161720.
  3. Bourgeois C.F., Lejeune F., Stevenin J. Broad specificity of SR (serine/arginine) proteins in the regulation of alternative splicing of pre-messenger RNA. Prog. Nucleic Acid Res. Mol. Biol. 2004;78:37–88. doi: 10.1016/S0079-6603(04)78002-2.
  4. Buratti E., Baralle F.E. Influence of RNA secondary structure on the pre-mRNA splicing process. Mol. Cell. Biol. 2004;24:10505–10514. doi: 10.1128/MCB.24.24.10505-10514.2004.
  5. Carothers A.M., Urlaub G., Grunberger D., Chasin L.A. Splicing mutants and their second-site suppressors at the dihydrofolate reductase locus in Chinese hamster ovary cells. Mol. Cell. Biol. 1993;13:5085–5098. doi: 10.1128/mcb.13.8.5085.
  6. Cartegni L., Wang J., Zhu Z., Zhang M.Q., Krainer A.R. ESEfinder: A web resource to identify exonic splicing enhancers. Nucleic Acids Res. 2003;31:3568–3571. doi: 10.1093/nar/gkg616.
  7. Chasin L.A. Searching for splicing motifs. In: Blencowe B., Graveley B., editors. Alternative splicing in the postgenomic era. Landes Bioscience; Austin, TX: 2007. pp. 85–106.
  8. Chen I.T., Chasin L.A. Direct selection for mutations affecting specific splice sites in a hamster dihydrofolate reductase minigene. Mol. Cell. Biol. 1993;13:289–300. doi: 10.1128/mcb.13.1.289.
  9. Fairbrother W.G., Yeh R.F., Sharp P.A., Burge C.B. Predictive identification of exonic splicing enhancers in human genes. Science. 2002;297:1007–1013. doi: 10.1126/science.1073774.
  10. Goren A., Ram O., Amit M., Keren H., Lev-Maor G., Vig I., Pupko T., Ast G. Comparative analysis identifies exonic splicing regulatory sequences—the complex definition of enhancers and silencers. Mol. Cell. 2006;22:769–781. doi: 10.1016/j.molcel.2006.05.008.
  11. Heisenberg W. Über den anschaulichen Inhalt der quantentheoretischen Kinematik und Mechanik. Z. Phys. 1927;43:172–198.
  12. Hertel K.J., Maniatis T. The function of multisite splicing enhancers. Mol. Cell. 1998;1:449–455. doi: 10.1016/s1097-2765(00)80045-3.
  13. Hiller M., Zhang Z., Backofen R., Stamm S. Pre-mRNA secondary structures influence exon recognition. PLoS Genet. 2007;3:e204. doi: 10.1371/journal.pgen.0030204.
  14. Hwang D.Y., Cohen J.B. U1 small nuclear RNA-promoted exon selection requires a minimal distance between the position of U1 binding and the 3′ splice site across the exon. Mol. Cell. Biol. 1997;17:7099–7107. doi: 10.1128/mcb.17.12.7099.
  15. Kanopka A., Muhlemann O., Akusjarvi G. Inhibition by SR proteins of splicing of a regulated adenovirus pre-mRNA. Nature. 1996;381:535–538. doi: 10.1038/381535a0.
  16. Kuhlman B., Dantas G., Ireton G.C., Varani G., Stoddard B.L., Baker D. Design of a novel globular protein fold with atomic-level accuracy. Science. 2003;302:1364–1368. doi: 10.1126/science.1089427.
  17. Ladd A.N., Cooper T.A. Finding signals that regulate alternative splicing in the post-genomic era. Genome Biol. 2002;3:0008.1–0008.16. doi: 10.1186/gb-2002-3-11-reviews0008.
  18. Lear A.L., Eperon L.P., Wheatley I.M., Eperon I.C. Hierarchy for 5′ splice site preference determined in vivo. J. Mol. Biol. 1990;211:103–115. doi: 10.1016/0022-2836(90)90014-D.
  19. Ligr M., Siddharthan R., Cross F.R., Siggia E.D. Gene expression from random libraries of yeast promoters. Genetics. 2006;172:2113–2122. doi: 10.1534/genetics.105.052688.
  20. Liu H.X., Zhang M., Krainer A.R. Identification of functional exonic splicing enhancer motifs recognized by individual SR proteins. Genes & Dev. 1998;12:1998–2012. doi: 10.1101/gad.12.13.1998.
  21. Mathews D.H. RNA secondary structure analysis using RNAstructure. In: Baxevanis A.D., et al., editors. Current Protocols in Bioinformatics. Wiley; Hoboken, NJ: 2006. pp. 12.6.1–12.6.14.
  22. Mayeda A., Screaton G.R., Chandler S.D., Fu X.D., Krainer A.R. Substrate specificities of SR proteins in constitutive splicing are determined by their RNA recognition motifs and composite pre-mRNA exonic elements. Mol. Cell. Biol. 1999;19:1853–1863. doi: 10.1128/mcb.19.3.1853.
  23. Pozzoli U., Sironi M. Silencers regulate both constitutive and alternative splicing events in mammals. Cell. Mol. Life Sci. 2005;62:1579–1604. doi: 10.1007/s00018-005-5030-6.
  24. Ram O., Schwartz S., Ast G. Multifactorial interplay controls the splicing profile of Alu-derived exons. Mol. Cell. Biol. 2008;28:3513–3525. doi: 10.1128/MCB.02279-07.
  25. Senapathy P., Shapiro M.B., Harris N.L. Splice junctions, branch point sites, and exons: Sequence statistics, identification, and applications to genome project. Methods Enzymol. 1990;183:252–278. doi: 10.1016/0076-6879(90)83018-5.
  26. Sun H., Chasin L.A. Multiple splicing defects in an intronic false exon. Mol. Cell. Biol. 2000;20:6414–6425. doi: 10.1128/mcb.20.17.6414-6425.2000.
  27. Tanaka M., Sackmann E. Polymer-supported membranes as models of the cell surface. Nature. 2005;437:656–663. doi: 10.1038/nature04164.
  28. Zhang X.H., Chasin L.A. Computational definition of sequence motifs governing constitutive exon splicing. Genes & Dev. 2004;18:1241–1250. doi: 10.1101/gad.1195304.
  29. Zhang X.H., Kangsamaksin T., Chao M.S., Banerjee J.K., Chasin L.A. Exon inclusion is dependent on predictable exonic splicing enhancers. Mol. Cell. Biol. 2005a;25:7323–7332. doi: 10.1128/MCB.25.16.7323-7332.2005.
  30. Zhang X.H., Leslie C.S., Chasin L.A. Computational searches for splicing signals. Methods. 2005b;37:292–305. doi: 10.1016/j.ymeth.2005.07.011.
  31. Zhang X.H., Leslie C.S., Chasin L.A. Dichotomous splicing signals in exon flanks. Genome Res. 2005c;15:768–779. doi: 10.1101/gr.3217705.
  32. Zheng Z.M. Regulation of alternative RNA splicing by exon definition and exon sequences in viral and mammalian gene expression. J. Biomed. Sci. 2004;11:278–294. doi: 10.1159/000077096.