AT-AC Pre-mRNA Splicing Mechanisms and Conservation of Minor Introns in Voltage-Gated Ion Channel Genes


Shortly after the discovery of split genes in 1977, a conserved sequence feature at both ends of cellular and viral introns was recognized, i.e., the presence of GT at the 5′ splice site and AG at the 3′ splice site, giving rise to the so-called GT-AG rule (8). This rule holds in most cases, but exceptions have been found. For example, GC is occasionally found at the 5′ end of certain introns (Table 1) (38). GC-AG introns are processed by the same splicing pathway as conventional GT-AG introns (3). It had long been assumed that removal of all introns from eukaryotic pre-mRNAs took place by the same splicing pathway, until recent developments demonstrated the existence of a second pre-mRNA splicing pathway.

TABLE 1.

Compilation of GC-AG introns (a)

Gene product     Intron no.   5′ splice site   Size (nt)
SCA2 (c)         9            AGgcaagt         6,600
Protein Z (d)    2            CTgcaagt         NA (b)
AGL (e)          17           AGgcaaga         400
LAMA4 (f)        7            AGgcatgg         NA
s-Laminin (g)    1            AGgcagag         86
HAL (h)          20           AGgcaagc         2,000
MNK (i)          9            AGgcaagt         1,200
FAA (j)          16           AGgcaagc         NA
FAA              37           AGgcaaga         NA
FAA              41           AGgcaggt         NA
CACNL1A3 (k)     32           AGgcacgc         1,000
RYR1 (l)         57           AGgcacgc         596
FE65 (m)         10           AGgcacgg         144
XPG (n)          3            AGgcaaga         NA
Ftnb (o)         13           AGgcaaga         3,100
Reln (p)         30           AGgcaagt         1,000
TFIIS.oA (q)     8            AGgcaagt         NA
TGG1 (r)         1            AGgcatgt         NA

Intron 6 of the gene encoding human P120, a proliferating cell nucleolar antigen, and intron 7 of the gene encoding human CMP, a cartilage matrix protein (matrilin 1), were the first reported examples of introns with AT and AC at the intron ends, instead of GT and AG (38). Intron 6 of the gene encoding Rep-3, a DNA repair protein, and intron 2 of the gene encoding Prospero, a Drosophila melanogaster homeodomain transcription factor, also have AT-AC ends (28). In addition to their distinctive dinucleotide ends, these and other AT-AC introns have highly conserved 8-nucleotide sequence elements at the 5′ splice site (ATATCCTY) and at the presumptive branch site (TCCTTRAY) that are not present in the major class of introns. On the basis of these sequence features, it was proposed that the minor U11 and U12 snRNAs, which have regions of complementarity to these elements, are required for splicing of AT-AC introns (28). Important aspects of this prediction were soon verified experimentally (29, 100), and two additional minor snRNAs involved in the novel pre-mRNA splicing pathway were discovered (101).
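The consensus elements ATATCCTY and TCCTTRAY are written with IUPAC degenerate codes (Y is C or T; R is A or G). As a minimal sketch of how a candidate sequence might be screened against them, the Python snippet below expands the codes into regular expressions. The function names are illustrative assumptions rather than part of any published tool, and the test sequences are the CACNL1A1 intron 2 entries of Table 4 and the SCA2 GC-AG 5′ splice site of Table 1.

```python
import re

# IUPAC degenerate nucleotide codes needed for the two consensus elements.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]"}

FIVE_PRIME_CONSENSUS = "ATATCCTY"   # AT-AC 5' splice-site element
BRANCH_SITE_CONSENSUS = "TCCTTRAY"  # presumptive AT-AC branch site element


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a compiled regular expression."""
    return re.compile("".join(IUPAC[code] for code in consensus))


def matches_consensus(sequence, consensus):
    """Return True if the consensus occurs anywhere in the DNA sequence."""
    return consensus_to_regex(consensus).search(sequence.upper()) is not None


# CACNL1A1 intron 2 (Table 4): exon bases in uppercase, intron bases in lowercase.
print(matches_consensus("AAatatcctt", FIVE_PRIME_CONSENSUS))   # True
print(matches_consensus("tccttgac", BRANCH_SITE_CONSENSUS))    # True
# SCA2 intron 9 (Table 1), a GC-AG intron of the major class.
print(matches_consensus("AGgcaagt", FIVE_PRIME_CONSENSUS))     # False
```

An exact match is a deliberately strict criterion: several entries in the tables differ from the consensus at one position, so a practical screen would probably tolerate mismatches.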

The AT-AC splicing pathway was originally named after the distinctive sequences of the intron ends (100). It was later found that a few introns with AT-AC ends are processed by the major pathway (120) and, conversely, that some introns with GT-AG ends are spliced by the minor pathway (18). Therefore, the name AT-AC no longer reflects the dinucleotide intron ends per se, but rather it refers to the minor pathway itself. An alternative designation for the two pathways—U2 dependent and U12 dependent—reflects their observed or expected requirements for one of the four snRNAs specific to each pathway (86).

DISTRIBUTION OF AT-AC INTRONS

AT-AC introns exist in a variety of organisms ranging from Arabidopsis thaliana to Drosophila, Xenopus laevis, and mammals (86, 102, 120). However, AT-AC introns are absent in budding yeast. There are no obvious structural relationships or common expression patterns among the genes or gene families that contain AT-AC introns. The lengths of known AT-AC introns range from less than 100 bases to more than 3,000 bases (120). The position of AT-AC introns is not conserved among unrelated genes. However, AT-AC introns are conserved phylogenetically and within gene families. For example, the last intron of the CMP gene is an AT-AC intron that is conserved in human, mouse, and chicken genes (4). Two genes with an AT-AC intron, mouse Rep-3 (a homologue of bacterial MutS) and XPG (a gene defective in xeroderma pigmentosum), are thought to be involved in DNA repair (120). The significance of the presence of AT-AC introns in DNA repair genes may be borne out as more sequences of DNA repair genes are determined. To date, three gene families have been noted to have AT-AC introns: the E2F transcription factor genes, the voltage-gated sodium and calcium channel α subunit genes (reference 120 and references therein), and the cartilage matrix protein (matrilin) family genes (4, 109). The voltage-gated ion channel genes are especially interesting because they have multiple minor introns; the conservation of minor introns in these genes is therefore reviewed in detail below.

IN VITRO AND IN VIVO SYSTEMS TO STUDY AT-AC PRE-MRNA SPLICING

Two different approaches have been used to study the mechanisms of AT-AC pre-mRNA splicing. The in vivo approach consists of transfecting wild-type or mutant minigenes containing an AT-AC intron and analyzing the splicing patterns after transient expression. Suppressor AT-AC snRNAs can be cotransfected with AT-AC intron mutants to test the functional significance of proposed base-pairing interactions between conserved intron elements and complementary regions in the snRNAs (29, 37, 45), as originally done for the conventional pathway (129).

The development of in vitro systems for AT-AC pre-mRNA splicing made it possible to begin to study the biochemical mechanisms of the reaction. To date, in vitro splicing conditions have been established for processing the AT-AC introns of pre-mRNAs from two different genes: the gene encoding P120 and that encoding SCN4A, the voltage-gated skeletal muscle sodium channel α subunit (100, 119). In vitro splicing of both P120 and SCN4A pre-mRNAs in HeLa cell nuclear extract results in generation of a lariat intermediate and release of the intron as a lariat. Therefore, AT-AC splicing occurs in two steps involving _trans_-esterification reactions similar to those of the major splicing pathway. The results are consistent in the two AT-AC in vitro splicing systems, but there is one major difference between them. Inactivation of U1 or U2 snRNAs in the nuclear extract is required to detect P120 AT-AC splicing (100; reviewed in reference 102). This observation led to the suggestion that U2 may compete with U12, which is about a hundred times less abundant (64), for binding to the AT-AC branch site (47). However, this is unlikely to be the case for SCN4A AT-AC splicing in vitro, which does not require inactivation of the major splicing pathway (119). In fact, SCN4A AT-AC splicing and cryptic splicing via conventional splice sites occur in the same reaction.

FOUR MINOR SNRNAS ARE REQUIRED FOR AT-AC PRE-MRNA SPLICING

AT-AC introns have unique and highly conserved 5′-splice-site and branch site elements, which are recognized by a unique set of minor snRNAs, U11, U12, U4atac, and U6atac. These snRNAs lack extensive sequence homology to the major snRNAs, but they appear to have related secondary structures, and more importantly, they play analogous roles in splice site recognition and perhaps in splicing catalysis. Despite the remarkable parallels, there are some significant differences between major and minor snRNAs. For example, U11 and U12 form a stable di-snRNP particle (111) that probably enters the AT-AC spliceosome as a single entity, whereas U1 and U2 are discrete snRNP mono-particles. Although the major U5 snRNA appears to be involved in both splicing pathways (100), this snRNA assembles onto the major spliceosome as part of a U4/U6 · U5 tri-snRNP particle, whereas an analogous U4atac/U6atac · U5 tri-snRNP particle has not yet been described. Notwithstanding the fact that the AT-AC and major spliceosomes have different snRNA constituents, the catalytic core of the AT-AC spliceosome is thought to resemble that of the major spliceosome (101; reviewed in references 9, 69, 70, and 102).

U11 and U12 snRNAs.

Human U11 and U12 are rare snRNAs that have Sm antigen binding sites but exhibit no sequence homology to other snRNAs (64). U11 and U12 snRNP particles presumably contain all the Sm core proteins. U12 interacts with a fraction of the more abundant U11 to form a di-snRNP complex (111). A 65-kDa protein of the U11/U12 complex, identified by virtue of its reactivity with a scleroderma patient antiserum, has been described previously, although its sequence is not known (25). The predicted secondary structures of U11 and U12 snRNAs are similar to those of U1 and U2, respectively (64, 126). U11 and U12 localize in the nucleoplasm and are concentrated in coiled bodies and nuclear speckles, but they are excluded from nucleoli (58). This distribution is very similar to that of the major spliceosomal snRNAs U1 and U2 (reviewed in reference 89). U12 orthologues have been cloned from mouse, chicken, and frog (103, 126), species in which AT-AC introns are known to exist (reviewed in references 86, 102, and 120).

The role of U11 and U12 snRNAs in AT-AC splicing has been firmly established (29, 45, 100, 119, 125; reviewed in references 66, 70, and 102). U11 snRNA is present in in vitro-assembled P120 AT-AC spliceosomes (100) and can be cross-linked to the P120 AT-AC 5′ splice site (125). U11 also interacts with the P120 AT-AC 5′ splice site in vivo through base pairing (45). U12 snRNA is essential for AT-AC splicing in vitro and in vivo (29, 100, 119). U12 functions in the AT-AC splicing pathway by base pairing with the highly conserved branch site sequences (29, 100). Thus, U11 is analogous to U1 snRNA in the major pathway, whereas U12 is analogous to U2.

U11 and U12 snRNPs were also shown to be part of a negative regulator of splicing complex that inhibits splicing of Rous sarcoma virus pre-mRNA via the major pathway (26). This finding suggests a regulatory role for the U11/U12 di-snRNP particle in addition to its general role in AT-AC pre-mRNA splicing.

U4atac and U6atac snRNAs.

U4 and U6 snRNAs are not required for AT-AC splicing. U6 is highly conserved between yeasts and mammals and is thought to function at the catalytic core of the major spliceosome. This critical role of U6 in the conventional pathway suggested the existence of an analogous molecule for the AT-AC pathway (100). The spliceosomal U6 snRNA has a γ-monomethyl guanosine triphosphate (meGTP) cap structure. Several low-abundance snRNAs with a meGTP cap structure but otherwise structurally distinct from U6 were identified by immunoprecipitation with antibody specific for the meGTP cap structure (27). Using the same antibody, two novel minor snRNAs termed U4atac and U6atac were identified in affinity-purified AT-AC spliceosomes assembled in vitro on P120 pre-mRNA (101). U4atac has a trimethylguanosine cap but coprecipitated because of its tight association with U6atac, analogous to the interaction between U4 and U6. U4atac, like U4, is an Sm snRNA. U6atac, like U6, has a meGTP cap structure and terminates with a stretch of U residues, suggesting its transcription by RNA polymerase III. Both U4atac and U6atac were shown to be essential for in vitro splicing of the P120 AT-AC intron and subsequently also for the SCN4A AT-AC intron (101, 120).

The predicted secondary structure of the U4atac/U6atac snRNA complex is strikingly similar to that of the U4/U6 snRNA complex, although U4atac and U6atac exhibit only about 40% overall sequence homology to U4 and U6, respectively (101). In the central region of U6atac, the homology to U6 is much greater, about 80%, and this region includes sites that can be cross-linked to U12 with psoralen. U6atac, which has an AAGGAGA box at the region corresponding to the ACAGA box of U6, base pairs with the AT-AC 5′ splice site, replacing U11 (37, 125). The U6atac-U12 helix identified by psoralen cross-linking is analogous to a U6-U2 helix thought to be part of the major spliceosome active site, which strongly suggests that U6atac likewise functions at the catalytic core of the AT-AC spliceosome (101). U4atac appears to act as a chaperone for U6atac, similar to the role of U4 vis-à-vis U6 (101). U5 is apparently required for splicing of both AT-AC and GT-AG introns (100, 120). Whether U5 base pairs with AT-AC exon borders, as has been shown for conventional exon sequences (reviewed in reference 68), remains to be determined.

AT-AC SPLICE SITE RECOGNITION

The mechanisms by which conventional splice sites are selected with high fidelity in metazoan pre-mRNAs remain largely unknown (reviewed in reference 7). The short, degenerate splice site and branch site elements are clearly required for proper splice site recognition, but they are not sufficient (93). The arrangement, spacing, and sequence context of the splice sites probably also contribute to accurate splice site selection.

In the case of AT-AC introns, even though the 5′-splice-site and branch site sequences are highly conserved 8-nucleotide elements, they probably lack sufficient information to specify AT-AC splice site recognition. Genomic sequences contain many pairs of sequences that match the AT-AC 5′-splice-site and branch site elements, but probably only a few of these are authentic AT-AC intron elements. Therefore, the sequence complementarity between the minor snRNAs and the AT-AC 5′-splice-site and branch site sequences is probably not sufficient to accurately select AT-AC splice sites. As with conventional introns, additional mechanisms and/or auxiliary signals for AT-AC splicing may help identify the authentic splice sites. For example, the arrangement, spacing, and sequence context of AT-AC splice sites within the entire pre-mRNA, as well as the presence of intronic and exonic elements, are likely to contribute to accurate splice site selection.
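A rough calculation illustrates why an 8-nucleotide element carries limited information. The sketch below estimates how often each consensus would match by chance, under two simplifying assumptions that are not taken from the text: all four bases are equally frequent and independent, and the sequence searched is a hypothetical 3 × 10^9 nucleotides long (one strand only). It treats each element separately and ignores the spacing constraint between the two elements.

```python
from fractions import Fraction

# Number of bases permitted by each IUPAC code used in the consensus elements.
DEGENERACY = {"A": 1, "C": 1, "G": 1, "T": 1, "R": 2, "Y": 2}


def chance_match_probability(consensus):
    """Probability that a random position matches (equal base frequencies assumed)."""
    probability = Fraction(1)
    for code in consensus:
        probability *= Fraction(DEGENERACY[code], 4)
    return probability


def expected_matches(consensus, sequence_length):
    """Expected number of matching start positions on one strand."""
    positions = sequence_length - len(consensus) + 1
    return float(positions * chance_match_probability(consensus))


# Assumed round figure for illustration only; not a value from the text.
HYPOTHETICAL_SEQUENCE_LENGTH = 3_000_000_000

print(chance_match_probability("ATATCCTY"))   # 1/32768
print(chance_match_probability("TCCTTRAY"))   # 1/16384
print(round(expected_matches("ATATCCTY", HYPOTHETICAL_SEQUENCE_LENGTH)))
```

Under these assumptions the 5′-splice-site element alone would be expected to occur by chance on the order of 10^5 times, which is consistent with the argument that sequence matches alone cannot single out authentic AT-AC introns. Real base composition is not uniform, so the figure is only an order-of-magnitude guide.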

Exon definition interactions between the AT-AC and major spliceosomes.

Most vertebrate genes have multiple introns, which are usually very large, whereas the exons are relatively small (30, 94). Sequences within large introns can match the degenerate splice site consensus elements, but they are not recognized by the spliceosome, at least in the presence of the wild-type splice sites. The exon definition model (reviewed in reference 6) proposes that in pre-mRNAs with large introns, the splicing machinery initially recognizes a pair of splice sites around a short exon and assembles on the exon; subsequently, neighboring exons are juxtaposed (80). Thus, the sequences within large introns need not be recognized. In lower eukaryotes, or in the case of small introns, a pair of splice sites at the ends of the short intron is directly recognized by the spliceosome, by an intron definition mechanism (97). Exon definition and intron definition are probably not mutually exclusive. Both types of interactions may occur simultaneously to facilitate the recognition of multiple splice sites.

The exon definition model predicts that in genes with multiple introns, exons and their flanking introns cannot both be large. Indeed, large introns are usually flanked by small exons and large exons are usually flanked by small introns (94, 128). Statistical analyses of the length of vertebrate internal regulated exons and primate internal exons showed that very large exons and very small exons are very rare (6, 91). Several lines of experimental evidence have given strong support to the exon definition model. First, mutation of a downstream 5′ splice site inhibits splicing of the upstream intron (97). Second, strengthening the downstream 5′ splice site increases the splicing efficiency of the upstream intron (49). Finally, joining a 5′ splice site to the end of the downstream exon increases the splicing efficiency of the upstream intron in vitro (48). Numerous cases of splice site mutations in vertebrate genes cause exon skipping rather than intron retention (46, 67). These phenomena are consistent with the exon definition model, although exon skipping can also be explained by _cis_-competition between splice sites or, in some cases, by instability of the retained-intron RNAs because of blocked mRNA export and/or nonsense-mediated mRNA decay. Evidence for the recognition of terminal exons also supports the exon definition model. Components that bind to the 5′ cap cooperate with the splicing machinery to facilitate the recognition of the first exon (52). Definition of the last exon involves cross talk between the splicing and polyadenylation machineries (71).

After exon definition, the correct exons must be ligated. The mechanisms responsible for the correct juxtaposition of exons are poorly understood. Members of the SR protein family, a group of essential pre-mRNA splicing factors with characteristic arginine-serine C-terminal repeats (RS domain) and one or two N-terminal RNA-recognition motifs, may play a bridging role for exon juxtaposition, since they can bind to exon sequences and also can interact with themselves (14, 92, 116). SR proteins can interact with U1-70K, a U1 snRNP-associated protein, and U2AF, a U2 auxiliary factor (116). It is thought that SR proteins can bridge partially assembled spliceosomes on neighboring exons through U1-70K and U2AF. Splicing factor 1 (SF1) may also play a bridging role for exon juxtaposition. SF1 interacts with both U2- and U1-associated factors (1, 78).

Because AT-AC introns always coexist with multiple major introns, the question arises whether exon definition interactions take place between the two different classes of introns. Indeed, in vitro splicing of the SCN4A AT-AC intron 2 is strongly stimulated when exon 3 is followed by the conventional 5′ splice site of intron 3. More importantly, the stimulatory effect is dependent on intact U1 snRNP (119, 121). Therefore, U1 bound at the downstream 5′ splice site interacts with the upstream AT-AC splicing machinery in vitro (Fig. 1). A 4-base deletion in the conventional 5′ splice site of SCN8A intron 3 results in complete skipping of exons 2 and 3 in vivo (43). This is probably due to disruption of exon definition interactions between the AT-AC intron 2 and the conventional intron 3. The fact that exon 2 does not join to exon 4 implies that an AT-AC 5′ splice site and a conventional 3′ splice site are incompatible, i.e., that the AT-AC 5′ spliceosomal components and the conventional 3′ spliceosomal components cannot be bridged by intron definition. Consistent with this idea, replacement of the SCN4A AT-AC intron 2 branch site element with a conventional branch site abrogates AT-AC splicing in vitro, resulting in stimulation of the conventional pathway via a pair of cryptic splice sites (118).

FIG. 1.

Exon definition interactions between consecutive minor and major introns. Components of the minor and major spliceosomes interact across the intervening exon. The spliceosomes are denoted by the ellipses. U11 and U12 are shown bound to the AT-AC 5′ splice site and branch site, respectively, and U1 and U2 are shown bound to the conventional 5′ splice site and branch site, respectively. The interaction across the exon requires U1 snRNA base pairing at the conventional 5′ splice site. Bound U1 snRNP probably interacts indirectly with U12 snRNP components, perhaps through bridging by SR proteins. U11 and U12 form a di-snRNP particle, although it is not known if their interactions are maintained in the spliceosome.

Exon definition interactions between the major and minor spliceosomes presumably involve indirect contacts between the major and minor snRNPs (Fig. 1). Whether exon definition can also take place between a downstream AT-AC intron and an upstream conventional intron is not known. The question also arises how the terminal exons are defined in the context of AT-AC introns, which are occasionally the first or the last intron. For example, the AT-AC intron in the calcium channel CACNL1A4 gene is the first intron (73), whereas the AT-AC introns in the matrilin family genes (4, 109) and GT335, a gene of unknown function (50), are the last introns of their respective genes. Therefore, it will be of interest to determine whether terminal exons can be defined by interactions between the AT-AC splicing machinery and the 5′-cap recognition components or the polyadenylation machinery.

Purine-rich enhancers contribute to AT-AC splice site recognition.

In addition to splice site interactions that take place via intron or exon definition, splice site selection can also be facilitated by exonic and/or intronic sequences. Splicing enhancers are elements that contribute to the recognition of authentic splice sites. The actual prevalence of these elements is not yet known. Purine-rich sequences characteristic of most natural exonic splicing enhancers (ESEs) characterized to date are present in a wide variety of cellular and viral genes in metazoans. Purine-rich sequences have also been shown to modulate plant 5′ splice-site selection (61). In metazoans, purine-rich ESEs are recognized by members of the SR protein family. Purine-rich enhancers are usually composed of GAR repeats (R represents a purine nucleotide) but not runs of G or A (35, 98). However, G-rich repeats typical of small intron sequences in vertebrate genes may facilitate the recognition of splice sites in small introns (60). Non-purine-rich splicing enhancers can also stimulate splicing (16, 53, 104, 105).
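The distinction between GAR repeats and simple runs of G or A can be made concrete with a short sketch. The snippet below computes two crude descriptors for a sequence: its purine fraction and the number of GAR triplets (G, A, then A or G). Both function names are illustrative assumptions, and counting triplets is only a rough proxy, not a validated enhancer predictor. The example sequence is the TFIIS.oA exon 7 fragment listed in Table 2.

```python
import re

# A GAR triplet is G, A, then any purine. The lookahead lets overlapping
# triplets (for example GAGAA contains GAG and GAA) each be counted.
GAR_TRIPLET = re.compile(r"(?=GA[AG])")


def purine_fraction(sequence):
    """Fraction of A and G residues in a non-empty DNA sequence."""
    seq = sequence.upper()
    return sum(base in "AG" for base in seq) / len(seq)


def count_gar_triplets(sequence):
    """Number of positions at which a GAR triplet starts."""
    return len(GAR_TRIPLET.findall(sequence.upper()))


fragment = "CCTGAGAAGAAATGT"  # TFIIS.oA exon 7 fragment from Table 2
print(count_gar_triplets(fragment))          # 3
print(round(purine_fraction(fragment), 2))   # 0.67

# Runs of a single purine are fully purine-rich but contain no GAR triplet.
print(count_gar_triplets("GGGGGGGG"), count_gar_triplets("AAAAAAAA"))  # 0 0
```

The last line shows why purine content alone is not an adequate criterion: homopolymeric runs score 1.0 for purine fraction yet contain no GAR motif.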

Some exons downstream of known AT-AC introns have purine-rich sequences that closely resemble those present in natural enhancers of the major splicing pathway (Table 2), raising the question of whether purine-rich sequences can act as AT-AC splicing enhancers. Indeed, heterologous purine-rich sequences that function as enhancers in the conventional pathway also strongly stimulate AT-AC splicing when placed in the context of a downstream exon that naturally lacks such sequences (121). The purine-rich sequences that are present in many exons flanking AT-AC introns may be natural AT-AC splicing enhancers, or they may influence splicing of both conventional and AT-AC introns on either side. It will be interesting to determine whether non-purine-rich exonic enhancers or intronic enhancers can also function in the AT-AC splicing pathway. The finding that purine-rich enhancers can function in the AT-AC splicing pathway suggests that the enhancer-specific functions of SR proteins are relevant to the AT-AC splicing pathway. If this is the case, an important question is whether SR proteins interact with overlapping or with distinct components of the major and minor spliceosomes to mediate splicing enhancement.

TABLE 2.

Exonic purine-rich sequences present downstream of known AT-AC introns (a)

Gene product     Exon no.   Sequence
P120 (b)         7          ...AGCGGGAGGAAGGGCGG...CTCAAGAAGGATCT...
CMP (c)          8          ...CCTGGAGAACAC...
GT335 (d)        7          ...GGTGAGGAAGGTGCTGGAACTCACTGGAAAGTGA...
CDK5 (e)         10         ...GACGAGGAACAA...
REP-3 (f)        7          ...TACGAAGAAAAGGAGAACAT...GACAAAAAGAAGGGGAACCT...
HPS (g)          16         CTACCTGGAGGATTT...AACGAAAAGATGT...GGCAAGGGGCCC...
TFIIS.oA (h)     7          ...CCTGAGAAGAAATGT...

In the major splicing pathway, both the downstream 5′ splice site and purine-rich enhancers contribute to the selection of the upstream splice site and appear to involve the action of SR proteins (reviewed in references 6, 7, 10, and 57). In addition, the U1 snRNP has also been implicated in enhancer function: U1 snRNA has been cross-linked to a purine-rich enhancer (112) and is present in an in vitro-assembled enhancer complex (90, 92). SR proteins facilitate U1 binding to 5′ splice sites (21, 44, 127), and it follows that SR proteins bound to purine-rich enhancers may also recruit U1 (reviewed in references 10, 57, and 108). However, intact U1 snRNP is not required for the stimulation of AT-AC splicing by purine-rich enhancers in vitro (121). Therefore, enhancer function and exon definition are mechanistically different, at least in terms of a requirement for U1.

In the major pathway, ESEs and the downstream 5′ splice site facilitate recognition of the upstream 3′ splice site by specific interactions with SR proteins. These proteins apparently interact with the essential splicing factor U2AF, facilitating its binding to the 3′-splice-site polypyrimidine tract, while U2AF in turn promotes binding of U2 to the branch site (31, 130). The 3′ region of AT-AC introns differs from that of conventional introns: the former lack a polypyrimidine tract, and the branch site is much closer to the 3′ splice junction. These differences suggest that the AT-AC spliceosomal components involved in recognition of the 3′ splice site are different from those of the conventional spliceosome. Surprisingly, despite these differences, both a downstream 5′ splice site and purine-rich enhancers stimulate AT-AC splicing, suggesting that both contribute to AT-AC splice-site selection, as they do in the major splicing pathway. Although in the AT-AC splicing pathway both exonic enhancers and a downstream conventional 5′ splice site are also expected to interact with spliceosomal components at the upstream 3′ splice site, some of these components may be unique to the AT-AC splicing pathway.

A DIVERGENT SUBCLASS OF AT-AC INTRONS

In vitro studies showed that splicing of the SCN4A intron 2 occurs by the AT-AC splicing pathway, and this is probably also the case for the homologous introns in the voltage-gated sodium and calcium channel genes (119, 120). Interestingly, the SCN4A gene has another unusual intron, intron 21, which has 5′-AT and AC-3′ ends (24, 59, 120), as does the corresponding intron 25 of the SCN5A gene (110, 120). However, the 5′-splice-site and branch site elements of these two introns do not match the highly conserved AT-AC 5′-splice-site and branch site consensus sequences, suggesting that these divergent introns belong to a distinct subclass of AT-AC introns (119). Indeed, in vitro splicing of these two sodium channel introns, termed AT-AC II, requires the major U1, U2, U4, U5, and U6 spliceosomal snRNAs, rather than the minor AT-AC snRNAs (120). Other AT-AC II introns have previously been described in the chicken parvalbumin gene and in xylanase genes of several filamentous fungi (120).

The vast majority of introns have GT and AG boundaries at the intron 5′ and 3′ ends, respectively. The importance of the intron ends is underscored not only by in vitro and in vivo mutational analyses (3) but also by the fact that splice site mutations impair gene expression and cause numerous human genetic diseases (46, 67). On the other hand, previous mutational analyses showed that certain splice site mutations are compatible with accurate splicing. For example, in the yeast actin intron and a human tropomyosin intron, mutation of G to A at the first position or G to C at the last position compromises splicing; however, the double mutation at both ends of these introns allows accurate splicing in vivo, albeit less efficiently (13, 74, 85). The double mutations generate 5′-AT and AC-3′ intron boundaries. Although these observations preceded the discovery of a unique AT-AC splicing pathway, a possible explanation of these results is that the double-mutant pre-mRNAs were processed by the minor pathway. This appears unlikely, because the double-mutant intron sequences do not match the highly conserved AT-AC 5′-splice-site and branch site consensus (66). Although the snRNA requirements for splicing of the mutant introns have not been determined, it is likely that they are processed by the major pathway, by analogy to the natural AT-AC II introns, which require the major snRNAs (120). The mutational results were interpreted to suggest a non-Watson-Crick interaction between the first and last bases of the introns. Analysis of the splicing of introns with inosine inserted at the intron ends supports this model (85, 99). The non-Watson-Crick interaction between the intron ends probably exists in the natural AT-AC II introns and probably also exists in all AT-AC introns. On the other hand, the observed lack of specificity in selection of the last nucleotide of a yeast intron argues against a direct interaction between the first and last bases of introns (55).

Although the splice site sequences and the positions of the SCN4A AT-AC II intron 21 and SCN5A AT-AC II intron 25 are conserved, the lengths of these two introns are different. The longer SCN4A AT-AC II intron 21 contains an Alu repeat insertion at nucleotides 358 to 660 (numbering as per GenBank entry AF007782) (117). Alu sequences are repetitive elements that are unique to primates and are thought to be derived from the 7SL RNA (reviewed in reference 62). The functions of Alu sequences are unclear, although in some cases they can influence gene expression. Alu sequences contain several regions that differ from either 5′ or 3′ splice site consensus elements by only one or two nucleotides. When Alu sequences are present within conventional introns, they can have dramatic effects on splicing (reviewed in reference 56). Point mutations can activate cryptic splice sites in intronic Alu elements and result in abnormal protein formation and clinical disease, e.g., Alport syndrome and ornithine delta-aminotransferase deficiency (41, 63). Mutations in the SCN4A gene cause hyperkalemic periodic paralysis and paramyotonia congenita (reviewed in reference 32). Whether there are natural mutations of this gene that involve activation of cryptic splice sites within the Alu insertion in intron 21 or whether this Alu insertion has any functional consequences remains to be seen.

U12-TYPE GT-AG INTRONS

AT-AC II introns have 5′-AT and AC-3′ boundaries, yet their splicing requires the major spliceosomal snRNAs (120). Conversely, some introns that have 5′-GT and AG-3′ boundaries are spliced by the minor spliceosomal snRNAs (18). With the exception of the first and last nucleotides, which are both G, the sequences of these introns match the AT-AC splice site and branch site consensus elements (Tables 3 and 4). This subclass of GT-AG introns has been termed U12-type GT-AG introns (86). Recent compilations showed that U12-type GT-AG introns are more prevalent than AT-AC introns (18, 86). Additional examples of this type of intron are shown in Tables 3 and 4. Interestingly, XPG- and CDK5-encoding genes contain both an AT-AC intron and a U12-type GT-AG intron (54, 72). Likewise, the members of the voltage-gated ion channel α subunit gene family have several unconventional introns (see below). Intron 2 of the gene encoding human CACNLB3, the voltage-gated calcium channel β subunit, also belongs to the U12-type GT-AG intron class.
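Because the terminal dinucleotides no longer identify the splicing pathway, a provisional classification has to combine the intron ends with the 5′-splice-site element. The sketch below is one way to express that logic, under stated assumptions: the function name and class labels are illustrative, the 5′ consensus is applied as a strict match with the first base left free (A or G), and the branch site is ignored. The first three examples use sequences from Tables 4, 3, and 5; the last two sequences are hypothetical and serve only to exercise the remaining branches.

```python
import re

# AT-AC 5' splice-site consensus ATATCCTY with the first intron base
# allowed to be A or G, so that U12-type GT-AG introns also match.
U12_FIVE_PRIME = re.compile(r"^[AG]TATCCT[CT]")


def classify_intron(intron_5p, intron_3p):
    """Provisional intron class from sequence alone.

    intron_5p: intron bases starting at the first intron nucleotide.
    intron_3p: intron bases ending at the last intron nucleotide.
    """
    five = intron_5p.upper()
    three = intron_3p.upper()
    ends = f"{five[:2]}-{three[-2:]}"
    u12_like = U12_FIVE_PRIME.match(five) is not None
    if ends == "AT-AC":
        return "AT-AC (U12-type)" if u12_like else "AT-AC II (U2-type)"
    if ends == "GT-AG":
        return "U12-type GT-AG" if u12_like else "conventional GT-AG"
    return f"other ({ends})"


print(classify_intron("atatcctt", "tcctctctac"))     # SCN8A intron 2 (Table 4)
print(classify_intron("gtatcctt", "aagttgaaattag"))  # CACNL1A2 intron 16 (Table 3)
print(classify_intron("atatcctt", "tattattaa"))      # hMSH3 intron 6 (Table 5)
print(classify_intron("gtaagtat", "ttttttcag"))      # hypothetical sequence
print(classify_intron("ataagtat", "tttttcac"))       # hypothetical sequence
```

A strict match is a simplification. Several introns listed in Table 3 deviate from the consensus at one or two positions, so a sequence-based label of this kind is only a hypothesis until the snRNA requirements are tested experimentally.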

TABLE 3.

Compilation of U12-type GT-AG introns (a)

Gene product     Intron no.   5′ splice site    Presumptive branch site   3′ splice site       Distance (nt)   Size (nt)
MDC (c)          11           AGgtatcctc ...    ctcac                     ccctcagGG            9               109
HAL (d)          3            CTgtatcctt ...    NA (b)                    ctggatacagAT         NA              100
HAL              6            AGgtatcttt ...    NA                        tttattctagTT         NA              100
HAL              15           CTgtatcttt ...    NA                        cttttgaaagAT         NA              3,000
MNK (e)          11           AGgtatttat ...    atgttaac                  ttatatccagTG         12              1,000
BMP1 (f)         15           AGgtatcc ...      NA                        ttctgttgctccagTC     NA              NA
BMP1             15                      ...    NA                        ctctctcgtttcagAA     NA              NA
BMP1             15                      ...    NA                        ttgctcccctgcagAG     NA              NA
CACNL1A1 (g)     15           CTgtatcctt ...    NA                        acacaaacagAT         NA              1,600
CACNL1A2 (h)     16           CTgtatcctt ...    cccttaaa                  aagttgaaattagAT      15              3,400
CACNL1A3 (i)     13           CTgtatcctc ...    tccttagc                  taaacccgctcagAC      15              900
CACNL1A4 (j)     16           CTgtatcctt ...    tcctgact                  cagacatttgcagAC      15              155
CACNLB3 (k)      2            AGgtatactt ...    gcactaat                  gggcaaattctccagCA    17              216
MSH5 (l)         6            TGgtatctcc ...    ccctcaaa                  tagGT                5               733
Reln (m)         27           AGgtatct ...      tccttcac                  tagCT                5               2,000
SCN10A (n)       8            AGgtatctt ...     cccttgaa                  tctccagAC            9               300
XPG (o)          1            GGgtatcctt ...    tcctttac                  tggttcccccagAT       14              NA
CDK5 (p)         6            GGgtatctgt ...    NA                        ctcccctcagAA         NA              470

TABLE 4.

Minor intron conservation in voltage-gated sodium and calcium channel α subunit genes (a)

Gene product     Intron no.   5′ splice site    Presumptive branch site   3′ splice site       Distance (nt)   Size (nt)
SCN4A (c)        2            GCatatcctg ...    tccttgac                  cctgccccacGC         12              126
SCN5A (d)        3            TCatatcc ...      NA (b)                    cccacgcacGC          NA              NA
SCN8A (e)        2            TCatatcctt ...    cccttaac                  tcctctctacAG         12              NA
SCN10A (f)       2            TCgtgtcct ...     tccttaac                  atggacctcacagCT      15              4,200
CACNL1A1 (g)     2            AAatatcctt ...    tccttgac                  tccctttctcagAC       14              5,000
CACNL1A2 (h)     2            AAgtatcctt ...    accttaac                  acattttttcagAC       14              4,800
CACNL1A3 (i)     1            AAgtatcctt ...    tccttaac                  cctgctccagGC         12              600
CACNL1A4 (j)     1            CCatatcctt ...    tccttaat                  tccccaatacTC         12              NA

OTHER NATURAL INTRON BOUNDARIES

AT-AA introns.

Previous studies showed that a G-to-A mutation at the last position of a major class intron can partially suppress a G-to-A mutation at the first position of the intron in vivo (11, 74). To date, three natural AT-AA introns are known: intron 6 of the gene encoding the DNA mismatch repair protein hMSH3 (113); intron 7 of the gene encoding Arabidopsis AtG5 (115); and intron 3 of the gene encoding pig succinyl-coenzyme A (CoA) synthetase (82) (Table 5). Intron 6 of the gene encoding hMSH3 and intron 7 of the gene encoding AtG5 have the AT-AC 5′-splice-site consensus. Intron 6 of the gene encoding hMSH3 also has the AT-AC branch site consensus and is the homologous intron of the AT-AC intron 6 of the Rep-3 gene. Therefore, this intron is almost certainly processed via the AT-AC pathway. The corresponding intron 6 of the related hMSH5 gene is a U12-type GT-AG intron (Table 3). Interestingly, in alternative splicing of hMSH2, another related DNA repair gene, intron 12 was reported to have unusual TA-TT ends (65). Although the AT-AA intron 7 of the gene encoding AtG5 may be processed by the AT-AC pathway, this assumption needs to be verified experimentally, since the intron lacks a good match to the AT-AC branch site consensus. Intron 3 of the gene encoding pig succinyl-CoA synthetase is alternatively spliced and lacks the AT-AC 5′-splice-site and branch site consensus. Therefore, splicing of this intron likely requires the major snRNAs.
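The consensus comparisons made above can be checked mechanically. The following minimal Python sketch counts mismatches between the intronic sequences listed in Table 5 and the AT-AC 5′-splice-site (ATATCCTY) and branch site (TCCTTRAY) elements. The function name `mismatches`, the dictionary layout, and the choice to compare the shorter pig SCS 5′ splice site against only the first six consensus positions are illustrative assumptions, not part of any published analysis.

```python
# IUPAC codes needed for the two consensus elements (DNA alphabet).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT"}

FIVE_SS = "ATATCCTY"  # AT-AC 5' splice site element
BRANCH = "TCCTTRAY"   # AT-AC presumptive branch site element


def mismatches(seq, consensus):
    """Count positions of seq not allowed by the consensus.

    Assumption: if seq is shorter than the consensus, only the
    overlapping leading positions are compared.
    """
    return sum(base.upper() not in IUPAC[code]
               for base, code in zip(seq, consensus))


# Intronic (lowercase) 5' splice site and branch element, copied from Table 5.
table5 = {
    "hMSH3 intron 6": ("atatcctt", "tctttaat"),
    "AtG5 intron 7": ("atatcctt", "atattaac"),
    "SCS intron 3": ("ataagt", "tttttgaa"),
}

scores = {
    name: (mismatches(five, FIVE_SS), mismatches(branch, BRANCH))
    for name, (five, branch) in table5.items()
}

for name, (five_mm, branch_mm) in scores.items():
    print(f"{name}: 5' splice site mismatches={five_mm}, "
          f"branch site mismatches={branch_mm}")
```

Under these assumptions, hMSH3 intron 6 gives 0 and 1 mismatches, AtG5 intron 7 gives 0 and 3, and pig SCS intron 3 gives 3 and 3, which is in line with the qualitative assignments in the text. A mismatch count is only a crude proxy for snRNA base pairing and does not replace experimental verification.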

TABLE 5.

Compilation of known AT-AA intronsa

Gene product Intron no. 5′ splice site Presumptive branch site 3′ splice site Distance (nt) Size (nt)
hMSH3c 6 AGatatcctt ... tctttaat tattattaaAT 11 2,000
AtG5d 7 AAatatcctt ... atattaac caaggcttaaGT 12 138
SCSe 3 TGataagt ... tttttgaa ttctccttctttaaGG 16 NAb

AT-AG introns.

Mutation of the first nucleotide of a major-class intron from G to A inhibits the second step of splicing in vitro (13, 74, 85). Many natural mutations giving rise to various genetic diseases consist of a G-to-A mutation at the first position of an intron, with the consequent impairment of gene expression (46, 67). For example, this type of mutation in intron 1 of the human β-globin gene prevents normal splicing, activates three cryptic 5′ splice sites, and causes thalassemia (106); in intron 5 of the human adenosine deaminase gene, it results in severe immunodeficiency (84); in intron 5 or intron 18 of the human myosin VIIA gene, it causes Usher syndrome type IB (2). These in vitro and in vivo observations suggested that 5′-AT and AG-3′ boundaries within the same intron are incompatible. This is likely to be true in the context of the major splicing pathway, so the few natural examples of AT-AG introns are likely to be processed via the minor splicing pathway.

Three natural introns have 5′-AT and AG-3′ ends: intron 38 of the myosin VIIA gene (40, 51), which codes for a long-tailed unconventional myosin; intron 2 of CACNL1A1 (87); and intron 24 of SCN10A (88). Intron 38 of the myosin VIIA gene has the 5′-splice-site and presumptive branch site sequences ATATCCGT and TCCTTGAC, respectively, as independently reported by two different groups (40, 51). These sequences are close matches to the AT-AC consensus elements. The introns corresponding to intron 2 of CACNL1A1 in other voltage-gated calcium channel α subunit genes are either of the AT-AC class or of the U12-type GT-AG class. Thus, splicing of all three types of introns presumably requires the minor snRNAs, although this assumption needs to be confirmed experimentally. Splicing via the minor pathway would account for the tolerance to 5′-AT and AG-3′ intron ends (see below).

DIFFERENTIAL TOLERANCE TO MUTATION OF THE LAST INTRON NUCLEOTIDE IN AT-AC AND CONVENTIONAL SPLICING

Most natural AT-AG and AT-AA introns have close matches to the highly conserved AT-AC 5′-splice-site and branch site consensus sequences. The distances between the branch site and the 3′-splice junction are all very short. Therefore, it appears that the AT-AC spliceosome tolerates changes in the last intron nucleotide, which can be either C, A, or G. Indeed, a recent in vivo mutational analysis of the last nucleotide of the AT-AC intron in the P120 gene showed that although C-to-G mutation results in activation of a pair of conventional cryptic splice sites, accurate splicing via the AT-AC pathway can still occur (18).
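The short branch site-to-junction distances can be reproduced from the tabulated sequences. The sketch below is an illustration under stated assumptions: that the branch element and the 3′-splice-site sequence shown in Table 5 are contiguous in the intron, that the presumptive branch adenosine is the penultimate nucleotide of the branch element, and that the tabulated distance counts inclusively from that adenosine through the last intron nucleotide. The function name `branch_to_junction` is hypothetical.

```python
def branch_to_junction(branch_element, three_ss):
    """Inclusive count from the presumptive branch A to the last intron nucleotide.

    Assumptions (not stated explicitly in the tables): the branch A is the
    penultimate base of branch_element, and three_ss follows it directly.
    Lowercase letters are intronic; uppercase letters belong to the exon.
    """
    if branch_element[-2].lower() != "a":
        raise ValueError("penultimate base of the branch element is not A")
    intron_tail = [base for base in three_ss if base.islower()]
    # The branch A plus the one base after it, then the intronic 3' tail.
    return 2 + len(intron_tail)


# Branch element and 3' splice site, copied from Table 5.
table5 = {
    "hMSH3 intron 6": ("tctttaat", "tattattaaAT"),
    "AtG5 intron 7": ("atattaac", "caaggcttaaGT"),
    "SCS intron 3": ("tttttgaa", "ttctccttctttaaGG"),
}

distances = {name: branch_to_junction(b, s) for name, (b, s) in table5.items()}
print(distances)
```

With these assumptions the computed values (11, 12, and 16 nt) agree with the distance column of Table 5. The calculation does not apply to entries in which an ellipsis separates the branch element from the 3′ splice site.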

The 5′-AT and AG-3′ boundaries of the CACNL1A1 intron 2 may represent an evolutionary transition between the AT-AC introns in the voltage-gated sodium channel α subunit genes and the U12-type GT-AG introns within the homologous domain I of the voltage-gated calcium channel α subunit genes. Intron 2 of mouse SCN10A, which has a GTGTCC 5′ splice site and a canonical AT-AC branch site (Table 4), may represent a further evolutionary transition from a U12-type GT-AG intron to a conventional intron. The 5′-AT and AA-3′ boundaries of hMSH3 intron 6 represent a drift from the corresponding AT-AC intron 6 of the homologous mouse Rep-3 gene. The 5′-GT and AG-3′ boundaries of hMSH5 intron 6 represent a further drift. The major spliceosome has a more stringent sequence requirement for the last intron nucleotide, which can only be a G, unless the first nucleotide is simultaneously mutated. Conversely, in AT-AC II introns, the last nucleotide presumably has to be a C, unless the first nucleotide changes simultaneously. It is probably difficult for an AT-AC II intron to evolve into a conventional intron, or vice versa, because this would require simultaneous mutations at both ends of the intron. In contrast, AT-AC introns may more easily evolve gradually into conventional introns. The last nucleotide, C, can be first mutated to G, then the first nucleotide, A, can be mutated to G, and splicing would still be catalyzed by the minor spliceosome. Then the +5 position of the 5′ splice site can change from C to G. This last mutation may be sufficient for switching from the minor splicing pathway to the major splicing pathway and would likely constitute an irreversible switch, since AT-AC introns have highly conserved 5′-splice-site and branch site elements, whereas conventional introns have much more degenerate elements.
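The proposed stepwise route from an AT-AC intron to a conventional intron can be traced on a model sequence. The sketch below is purely illustrative: the starting element ATATCCTT, the relaxed pattern used to flag a minor-class 5′ element (ATATCCTY with A or G allowed at the first position), and the helper names `mutate` and `describe` are assumptions made for this example, and the flag is not a prediction of actual spliceosome usage.

```python
import re

# Assumed relaxed pattern: AT-AC 5' element with A or G at position +1.
MINOR_5SS = re.compile(r"^[AG]TATCCT[CT]$")


def mutate(seq, pos, new_base):
    """Return seq with the base at 1-based position pos replaced."""
    return seq[:pos - 1] + new_base + seq[pos:]


def describe(five_ss, last_two):
    """Summarize the intron ends and whether the 5' element fits the pattern."""
    ends = f"{five_ss[:2]}-{last_two}"
    if MINOR_5SS.match(five_ss):
        return f"{ends}: minor-class 5' element"
    return f"{ends}: no minor-class 5' element"


five_ss, last_two = "ATATCCTT", "AC"  # model AT-AC intron
path = [describe(five_ss, last_two)]

last_two = mutate(last_two, 2, "G")   # step 1: last nucleotide C -> G
path.append(describe(five_ss, last_two))

five_ss = mutate(five_ss, 1, "G")     # step 2: first nucleotide A -> G
path.append(describe(five_ss, last_two))

five_ss = mutate(five_ss, 5, "G")     # step 3: position +5 C -> G
path.append(describe(five_ss, last_two))

for state in path:
    print(state)
```

The first three states (AT-AC, AT-AG, and GT-AG) retain the conserved 5′ element, corresponding to the AT-AC, AT-AG, and U12-type GT-AG classes discussed above, whereas the final +5 change removes it while leaving GT-AG ends.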

COMMITMENT TO A SPECIFIC SPLICING PATHWAY

The dinucleotides at the intron ends are not responsible for commitment of a pre-mRNA to the major or the minor splicing pathway (18, 120). Likewise, the highly conserved AT-AC branch site consensus sequence (UCCUURAY) recognized by U12 snRNA is probably not the major determinant of commitment to the minor pathway, since it also matches the degenerate consensus for conventional branch sites (YNYURAY), and therefore it should also be recognized by the abundant U2 snRNA. Intron 7 of the xylanase xylP gene in the filamentous fungus Penicillium chrysogenum has a sequence identical to the AT-AC branch site consensus, but it is probably processed via the major splicing pathway (120). It is likely that the highly conserved AT-AC 5′ splice site, which is recognized by U11 and U6atac snRNAs, is the major determinant of commitment to the AT-AC splicing pathway. An AT-AC branch site sequence is of course also required but is probably compatible with either pathway. The lack of an extensive polypyrimidine tract and the short distance between the branch site and the 3′ splice site in AT-AC introns probably also contribute to their specific commitment to the AT-AC splicing pathway.
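The statement that the AT-AC branch site consensus also satisfies the degenerate conventional consensus can be verified by enumeration. The following sketch, written in the DNA alphabet (T in place of U), expands TCCTTRAY into its concrete sequences and tests each against YNYTRAY. The helper names `expand` and `to_regex` are hypothetical, and sequence matching is used here only as a stand-in for recognition by U12 or U2 snRNA.

```python
import itertools
import re

# IUPAC codes needed for the two branch site consensus sequences.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "N": "ACGT"}


def expand(consensus):
    """List every concrete sequence allowed by an IUPAC consensus."""
    choices = (IUPAC[code] for code in consensus)
    return ["".join(bases) for bases in itertools.product(*choices)]


def to_regex(consensus):
    """Compile an IUPAC consensus into a regular expression."""
    return re.compile("".join(f"[{IUPAC[code]}]" for code in consensus))


minor_sites = expand("TCCTTRAY")   # AT-AC branch site consensus
major_pattern = to_regex("YNYTRAY")  # conventional branch site consensus

# Every minor-class branch site contains a match to the conventional consensus.
all_match = all(major_pattern.search(site) for site in minor_sites)

print(minor_sites)
print("all match conventional consensus:", all_match)
print("sequences allowed:", len(minor_sites), "vs", len(expand("YNYTRAY")))
```

The four sequences allowed by TCCTTRAY all contain a match to YNYTRAY, which itself allows 64 different heptamers. This difference in degeneracy is consistent with the argument that the branch site alone is unlikely to commit a pre-mRNA to the minor pathway.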

MULTIPLE MINOR INTRONS IN VOLTAGE-GATED ION CHANNEL GENES

The voltage-gated sodium channel and calcium channel α subunit genes belong to the voltage-gated ion channel gene superfamily. Mutations in these genes cause numerous neuromuscular and neurological diseases (reviewed in reference 19). The genes have four internal homologous domains (I to IV) that are thought to have arisen from two rounds of duplication from a single ancestral gene (Fig. 2) (12, 42). Interestingly, these genes have two or sometimes three nonconsensus introns (Fig. 2; Tables 3, 4, and 6). Multiple sequence alignments of the different family members show that these introns interrupt the genes at exactly homologous positions of the coding sequence (Fig. 3).

FIG. 2.


Topology of voltage-gated ion channel proteins and positions of the unusual introns. The diagram shows the proposed topology of the voltage-gated sodium and calcium channel α subunit protein families (12, 42). Domain repeats I through IV (DI through DIV) are indicated at the top. Within each domain repeat, the six membrane-spanning segments are shown in blue, and the connecting loops are shown in gray. The positions of unusual introns that interrupt the coding sequences of the corresponding genes are indicated. For some of these introns, the first and last nucleotides are not invariant among different family members (Tables 4 and 6). The conservation of the position of the AT-AC introns (yellow rectangles) in domain I of the sodium and calcium channel genes is indicative of the presence of this intron prior to the divergence of these two gene families. The U12-type GT-AG introns (red circles) are present at different positions of the sodium and calcium channel genes. The AT-AC II intron (green square) is only present in the sodium channel genes. Some family members have lost one or more of these introns or have conventional introns at the same position. Each type of intron interrupts a different part of the domain repeat coding sequence, and hence each intron at the four indicated locations in the voltage-gated ion channel genes probably arose independently.

TABLE 6.

AT-AC II intron conservation in voltage-gated sodium channel α subunit genesa

Gene product Intron no. 5′ splice site Presumptive branch site 3′ splice site Distance (nt) Size (nt)
SCN4Ac 21 AGatgagtat ... acctgac ... ccactatacTT 21 822
SCN5Ad 25 AGatacgtgg ... ctctgag ... tctttgcacTT 37 680
SCN8Ae 21 AGataggtct ... NAb cctcctttacacTT NA NA
SCN10Af 24 AGataagtg ... cgttaat tcctcccccctagTT 15 900

FIG. 3.


Conserved positions of unusual introns in voltage-gated ion channels. Multiple protein sequence alignments of the relevant regions of the voltage-gated sodium and calcium channel α subunits are shown. The complete amino acid sequences encoded by the flanking exons were aligned by using the Genetics Computer Group Pileup program, with a gap creation penalty of 12 and a gap extension penalty of 4 (23). The positions of the unusual introns in the corresponding genes are indicated by the vertical black bar. Amino acid identities in more than half of the sequences are indicated by gray shading. Complete exon sequences are shown, except where indicated. For each alignment, the intron position is also conserved at the nucleotide level (117). (A) Conserved positions of the minor AT-AC introns in eight ion channel genes. Translations of the 5′ portions of the calcium channel upstream exons are not shown in the alignment. The human SCN5A sequence is from reference 110; human SCN4A, mouse SCN8A, mouse SCN10A, and human CACNL1A1, CACNL1A2, CACNL1A3, and CACNL1A4 have GenBank accession no. L04216, U59964, Y09108, AJ224873, D43706, U30666, and X99897, respectively. (B) Conserved positions of the minor U12-type GT-AG introns in four calcium channel genes. Human CACNL1A1, CACNL1A2, CACNL1A3, and CACNL1A4 sequences have GenBank accession no. AJ224873, D43718, U30677, and X99897, respectively. (C) Conserved positions of the AT-AC II introns in four sodium channel genes. Human SCN5A and SCN8A sequences are from references 110 and 77, respectively; human SCN4A and mouse SCN10A sequences have GenBank accession no. L04233 and Y09108, respectively.

SCN4A, SCN5A, and SCN8A, which encode the voltage-gated sodium channel α subunits of skeletal muscle, cardiac muscle, and brain and spinal cord, respectively, have two rare introns: an AT-AC intron and an AT-AC II intron (43, 77, 120; Fig. 2). The AT-AC intron interrupts the coding sequence of internal domain I, while the AT-AC II intron is located between domains III and IV. Interestingly, the corresponding AT-AC intron 2 of the mouse gene encoding SCN10A, the sensory neuron voltage-gated sodium channel α subunit, has the sequence GTGTCCT at the 5′ splice site and AG as the 3′-splice-site dinucleotide (Table 4) (88). The voltage-gated calcium channel α subunit genes also have an unusual intron at precisely the same position within domain repeat I (Fig. 2 and 3A). However, these introns have different boundaries (Table 4): intron 2 of the gene encoding CACNL1A1, the fibroblast L-type voltage-gated calcium channel α subunit, has 5′-AT and AG-3′ boundaries; intron 1 of the gene encoding CACNL1A4, the brain-specific P/Q-type voltage-gated calcium channel α subunit, has 5′-AT and AC-3′ boundaries; and intron 2 of the gene encoding CACNL1A2, the pancreatic voltage-gated calcium channel α subunit, and intron 1 of the gene encoding CACNL1A3, the skeletal muscle voltage-gated calcium channel α subunit, have 5′-GT and AG-3′ boundaries. All eight introns have the highly conserved AT-AC branch site element. The same kind of intron is expected to be present in other voltage-gated sodium and calcium channel α subunit genes and to be processed via the AT-AC splicing pathway (120). Splicing of an intron of the ADP ribose polymerase gene, which has the features of U12-type GT-AG introns, requires the minor U12 and U6atac snRNAs (18). It is therefore very likely that the GT-AG introns in CACNL1A2 (CACN4) and CACNL1A3 are also processed via the minor pathway.
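The boundary assignments listed above can be read directly from the Table 4 notation, in which exon nucleotides are uppercase and intron nucleotides are lowercase. The following sketch extracts the terminal dinucleotides of the eight domain I introns. The sequences are copied from Table 4, whereas the function name `termini` and the dictionary layout are illustrative assumptions.

```python
def termini(five_ss, three_ss):
    """Return the intron end dinucleotides, e.g. 'AT-AC'.

    Lowercase letters are taken as intronic and uppercase as exonic,
    following the notation used in the tables.
    """
    intron_start = "".join(base for base in five_ss if base.islower())
    intron_end = "".join(base for base in three_ss if base.islower())
    return f"{intron_start[:2].upper()}-{intron_end[-2:].upper()}"


# 5' and 3' splice site sequences copied from Table 4.
table4 = {
    "SCN4A intron 2": ("GCatatcctg", "cctgccccacGC"),
    "SCN5A intron 3": ("TCatatcc", "cccacgcacGC"),
    "SCN8A intron 2": ("TCatatcctt", "tcctctctacAG"),
    "SCN10A intron 2": ("TCgtgtcct", "atggacctcacagCT"),
    "CACNL1A1 intron 2": ("AAatatcctt", "tccctttctcagAC"),
    "CACNL1A2 intron 2": ("AAgtatcctt", "acattttttcagAC"),
    "CACNL1A3 intron 1": ("AAgtatcctt", "cctgctccagGC"),
    "CACNL1A4 intron 1": ("CCatatcctt", "tccccaatacTC"),
}

boundaries = {name: termini(five, three) for name, (five, three) in table4.items()}

for name, ends in boundaries.items():
    print(f"{name}: {ends}")
```

This reproduces the pattern described in the text: AT-AC ends in SCN4A, SCN5A, SCN8A, and CACNL1A4; AT-AG ends in CACNL1A1; and GT-AG ends in SCN10A, CACNL1A2, and CACNL1A3.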

Interestingly, the CACNL1A2 and CACNL1A3 genes have a second minor intron—a U12-type GT-AG intron—which falls within domain repeat II (Table 3; Fig. 2). This type of intron also has a precisely conserved position in all four known calcium channel α subunit gene sequences (Fig. 3B). These observations led to the prediction that analogous U12-type GT-AG introns may exist in other voltage-gated calcium channel α subunit genes. Alignment of the sequences of domain repeats I and II shows that the two minor introns do not interrupt a homologous position. The different location of these introns within their respective domain repeats suggests that they arose independently.

The SCN10A gene is also unusual in having three rare introns out of a total of 26 introns (Fig. 2): the U12-type GT-AG introns 2 and 8 (Tables 3 and 4) and the AT-AC II intron 24 (Table 6). The SCN4A intron 8 and SCN5A intron 9, which correspond to SCN10A intron 8, are conventional GT-AG introns. The voltage-gated calcium channel α subunit genes lack an intron at the corresponding position. This sequence comparison strongly suggests a shift from a U12-type GT-AG intron to a conventional GT-AG intron by mutational drift after the divergence of sodium and calcium channels from a common ancestral gene.

The SCN4A, SCN5A, SCN8A, and SCN10A genes are paralogues and belong to the voltage-gated sodium channel α subunit gene family. The genomic organization of these four genes is very similar (77, 88, 110). SCN4A, SCN5A, and SCN8A have an AT-AC II intron at a homologous position (Table 6). Interestingly, the corresponding intron 24 of SCN10A has 5′-AT and AG-3′ boundaries (Table 6) (88). All four introns interrupt a homologous position of the coding sequence of their respective gene (Fig. 3C). However, the voltage-gated calcium channel α subunit genes lack an intron at the homologous position (Fig. 2). Therefore, similar introns may exist in other members of the voltage-gated sodium channel α subunit genes but not in the voltage-gated calcium channel α subunit genes. Because splicing of the corresponding AT-AC II introns of SCN4A and SCN5A requires the major snRNAs, this is also expected to be the case for SCN8A intron 21 and SCN10A intron 24. The 5′-AT and AG-3′ boundaries of SCN10A intron 24 may represent an evolutionary transition between the corresponding AT-AC II introns and the major GT-AG introns.

In summary, the voltage-gated sodium and calcium channel α subunit genes, which are derived from a common ancestral gene, show an unusually high frequency of noncanonical introns. The distribution of introns among the two gene families and individual genes, the position of the introns within domain repeats of the coding sequence, and the patterns of sequence divergence at the intron ends provide interesting information about the evolutionary history of minor introns.

CONCLUSION

Although the first examples of AT-AC intron sequences were discovered only recently (28, 38, 100), a considerable amount of information about the AT-AC pre-mRNA splicing pathway has already been obtained. It is now known that AT-AC introns are widespread in higher eukaryotes (86, 102, 120). The removal of AT-AC introns from pre-mRNA requires the four minor U11, U12, U4atac, and U6atac snRNAs, which play roles analogous to those of U1, U2, U4, and U6 snRNAs in the major splicing pathway, and the major U5 snRNA, which is the only snRNA component shared with the major spliceosome (reviewed in references 70 and 102). Both exon definition interactions and purine-rich exonic enhancers stimulate AT-AC splicing in vitro (119, 121). Therefore, they may contribute to AT-AC splice site recognition in vivo.

Introns exist in the majority of eukaryotic cellular or viral genes. The mechanisms of splicing, including both catalysis and splice site selection, and the regulation of splicing are incompletely understood at present. However, the identification and characterization of some of the many protein factors involved in splicing continue to provide important clues about splicing and splice site selection mechanisms. The unexpected discovery of the AT-AC splicing pathway provides additional challenges and opportunities for understanding splicing mechanisms and specificity. Whereas the snRNAs involved in AT-AC splicing have been identified, no information is currently available about snRNP and non-snRNP proteins involved in this pathway. It will be especially interesting to determine whether some protein components are shared between the two pathways, and/or whether novel components—which may or may not resemble splicing factors in the conventional pathway—are uniquely required for AT-AC splicing. Determining which components are involved in interactions between the two pathways, such as in exon definition, should also be an important priority. Identification and characterization of these components should provide interesting insights into the mechanisms, regulation, and evolution of pre-mRNA splicing.

ADDENDUM

Shortly after this review was written, Burge et al. provided an extensive classification of U12-type and U2-type introns on the basis of a statistical analysis of 5′-splice-site and branch site sequences (9a). They describe instances of apparent intron conversion from U12 type to U2 type during evolution by examining introns of homologous or paralogous genes, and they discuss models for the evolution of the major and minor spliceosomes.

Comparison of the cDNA and genomic sequences (GenBank no. AF051782, AC005366, and AC005368) of HDIA1 (human diaphanous 1), a nonsyndromic deafness susceptibility gene, reveals the presence of two AT-AC introns and one GC-AG intron (118).

ACKNOWLEDGMENTS

We thank M. Hastings, M. Murray, and B. Graveley for helpful comments on the manuscript and M. Meisler for sharing unpublished sequence information.

The work on AT-AC pre-mRNA splicing in our laboratory is supported by grant GM42699 from the NIH to A.R.K.

REFERENCES