Analysis of canonical and non-canonical splice sites in mammalian genomes
M Burset et al. Nucleic Acids Res. 2000.
Abstract
A set of 43 337 splice junction pairs was extracted from mammalian GenBank annotated genes. Expressed sequence tag (EST) sequences support 22 489 of them. Of these, 98.71% contain canonical dinucleotides GT and AG for donor and acceptor sites, respectively; 0.56% hold non-canonical GC-AG splice site pairs; and the remaining 0.73% occur in many small groups (with a maximum size of 0.05%). Studying these groups, we observe that many of them contain splicing dinucleotides shifted from the annotated splice junction by one position. After close examination of such cases, we present a new classification consisting of only eight observed types of splice site pairs (out of 256 a priori possible combinations). EST alignments allow us to verify the exonic part of the splice sites, but many non-canonical cases may be due to intron sequencing errors. This idea is given substantial support when we compare the sequences of human genes having non-canonical splice sites deposited in GenBank by high throughput genome sequencing projects (HTG). A high proportion (156 out of 171) of the human non-canonical and EST-supported splice site sequences had a clear match in the human HTG. After corrections, they can be classified as: 79 GC-AG pairs (of which one was an error that corrected to GC-AG); 61 errors that were corrected to canonical GT-AG pairs; six AT-AC pairs (of which two were errors that corrected to AT-AC); one case produced from a non-existent intron; seven cases found in HTG sequences that were deposited in GenBank; and, finally, only two remaining cases of supported non-canonical splice sites. If we assume that approximately the same situation holds for the whole set of annotated mammalian non-canonical splice sites, then 99.24% of splice site pairs should be GT-AG, 0.69% GC-AG, 0.05% AT-AC, and only 0.02% could consist of other types of non-canonical splice sites.
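The extrapolated percentages can be reproduced approximately from the counts given above. The sketch below is one possible reading, not the authors' stated procedure: it assumes that the non-existent intron (1 case) and the HTG-derived GenBank entries (7 cases) are left out, so that the combined non-canonical share (0.56% + 0.73%) is redistributed over the remaining 148 informative cases. The names `observed`, `htg_outcome` and `extrapolate` are illustrative.

```python
# Observed shares (percent) among the EST-supported pairs, from the abstract.
observed = {"GT-AG": 98.71, "GC-AG": 0.56, "other": 0.73}

# Outcome of the HTG comparison, from the abstract.
# Assumption: the non-existent intron (1) and the HTG-derived entries (7)
# are excluded, leaving 148 informative cases out of the 156 matched ones.
htg_outcome = {"GT-AG": 61, "GC-AG": 79, "AT-AC": 6, "other": 2}


def extrapolate(observed, htg_outcome):
    """Redistribute the non-canonical share according to the HTG outcome."""
    non_canonical = observed["GC-AG"] + observed["other"]
    informative = sum(htg_outcome.values())
    estimate = {
        pair: non_canonical * count / informative
        for pair, count in htg_outcome.items()
    }
    # Errors corrected to GT-AG are added to the share that was already canonical.
    estimate["GT-AG"] += observed["GT-AG"]
    return {pair: round(share, 2) for pair, share in estimate.items()}


estimates = extrapolate(observed, htg_outcome)
print(estimates)
```

Under this assumption the rounded values agree with the 99.24%, 0.69%, 0.05% and 0.02% quoted in the abstract.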
We analyze several characteristics of EST-verified splice sites and build weight matrices for the major groups, which can be incorporated into gene prediction programs. We also present a set of EST-verified canonical splice sites larger by two orders of magnitude than the current one (22 199 entries versus approximately 600) and finally, a set of 290 EST-supported non-canonical splice sites. Both sets should be significant for future investigations of the splicing mechanism.
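The abstract does not say how the weight matrices are constructed. As a generic illustration only, the following sketch builds a log-odds position weight matrix from aligned, equal-length site windows, assuming a uniform background and a pseudocount of 1. The sequences in `toy_donors` are hypothetical and are not taken from the paper's data set.

```python
import math


def build_weight_matrix(sites, pseudocount=1.0):
    """Return per-position log2-odds scores against a uniform background."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for pos in range(length):
        # Start every base at the pseudocount so unseen bases get a finite score.
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        matrix.append(
            {base: math.log2((count / total) / 0.25) for base, count in counts.items()}
        )
    return matrix


def score(matrix, seq):
    """Sum the per-position scores of seq; higher means closer to the training sites."""
    return sum(column[base] for column, base in zip(matrix, seq))


# Toy donor-site windows (hypothetical, for illustration only).
toy_donors = ["AGGTAAGT", "AGGTGAGT", "CGGTAAGA", "AGGTAAGG"]
pwm = build_weight_matrix(toy_donors)
print(score(pwm, "AGGTAAGT"), score(pwm, "AGCCAAGT"))
```

A gene prediction program could use such a score as one signal among several; the window length and background model here are arbitrary choices.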
Figures
Figure 1
Structure and classification of spliced constructs. (a) Structure of spliced constructs. Two sequence regions of a splice pair (marked as Donor and Acceptor), each with its splice site dinucleotide surrounded by 40 bp of gene sequence on each side. Joining the exonic part of the donor (ExonL) to the exonic part of the acceptor (ExonR) produces the splice construct sequence to be verified by ESTs. (b) EST alignment classification. After the ESTs were aligned to the splice constructs, every match was classified as D-end (EST covers only the donor part), A-end (EST covers only the acceptor part), B-ends (EST covers the splice junction without mismatches) or Error (EST covers the junction with mismatches).
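The construct and the four match classes described in this caption can be sketched as follows. This is a simplified reading: the intron is given as a 0-based half-open span, an alignment is reduced to a half-open span on the construct plus a mismatch count, and the function names are illustrative rather than taken from the paper.

```python
FLANK = 40  # bp of exon kept on each side of the junction, as in the caption


def splice_construct(gene, intron_start, intron_end, flank=FLANK):
    """Join ExonL and ExonR around an intron given as a 0-based half-open span.

    Returns the construct and the junction position (length of ExonL).
    """
    exon_l = gene[max(0, intron_start - flank):intron_start]
    exon_r = gene[intron_end:intron_end + flank]
    return exon_l + exon_r, len(exon_l)


def classify_est_match(aln_start, aln_end, junction, mismatches=0):
    """Classify an EST alignment given as a half-open span on the construct."""
    if aln_end <= junction:
        return "D-end"   # EST covers only the donor (ExonL) part
    if aln_start >= junction:
        return "A-end"   # EST covers only the acceptor (ExonR) part
    # The alignment spans the junction.
    return "B-ends" if mismatches == 0 else "Error"


# Toy gene: 50 bp exon, a GT...AG intron, 50 bp exon (hypothetical sequence).
gene = "A" * 50 + "GT" + "C" * 30 + "AG" + "T" * 50
construct, junction = splice_construct(gene, 50, 84)
print(len(construct), junction)
print(classify_est_match(10, 70, junction))
```

Only an EST that spans the junction cleanly (B-ends) supports the splice pair; D-end and A-end matches say nothing about how the two exons are joined.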
Figure 2
Examples of possible ambiguities in EST-supported splice pairs. (a) Homo sapiens Telethonin gene, intron 1 (AJ010063). An example of an annotated non-canonical junction supported by an EST. The same EST can also support a canonical splice junction. The annotated non-canonical junction and the putative canonical one produce the same spliced sequence. (b) Homo sapiens FUS gene, intron 14 (X99001). An example of an annotated non-canonical junction supported by an EST. Another EST supports a closely located canonical splice junction. In this case the EST-supported putative spliced sequence differs by 2 nucleotides (gg) from the annotated one.
Figure 3
Shifted splice sites. Examples of GG-AG verified splice pairs (11 cases). In donor sites, a GG pair is always found immediately after the cut point. To decide which type of splice pair these non-canonical examples should be assigned to, we checked all closely located standard dinucleotides. They are found shifted by 1 nucleotide downstream. We reclassify the presented splice pairs as nine canonical GT-AG, one GC-AG and one GA-AG site.
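The check for closely located standard dinucleotides might look like the sketch below. It is an assumption-laden illustration: the set of "standard" donor dinucleotides (`GT`, `GC`, `AT`), the search order and the function name `shifted_donor` are choices made here, not details given in the caption.

```python
def shifted_donor(seq, cut, max_shift=1, standard=("GT", "GC", "AT")):
    """Return (shift, dinucleotide) for the nearest standard donor dinucleotide.

    `cut` is the 0-based index of the first intron base as annotated.
    Shifts are tried in the order 0, +1, -1, +2, -2, ...; positive is downstream.
    Returns None if no standard dinucleotide lies within `max_shift`.
    """
    offsets = [0]
    for k in range(1, max_shift + 1):
        offsets += [k, -k]
    for shift in offsets:
        start = cut + shift
        if start < 0:
            continue
        dinucleotide = seq[start:start + 2].upper()
        if dinucleotide in standard:
            return shift, dinucleotide
    return None


# Hypothetical donor region: exon "ACCAG", then an intron annotated as starting with GG.
print(shifted_donor("ACCAG" + "GGTAAGT", cut=5))
```

In the toy sequence the annotated intron starts with GG, and a GT is found one position downstream, mirroring the situation described for the GG-AG cases.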
Figure 4
Analysis of EST-supported non-canonical splice site groups. (a) Classification. This classification was produced by analyzing all EST-verified non-canonical splice pairs and taking into account cases with a shifted canonical consensus. Practically all splice pairs have only one non-canonical splice dinucleotide. (b) Table of possible splice pairs. After generalization we obtained only seven non-canonical splice pair groups, for a total of eight groups if the canonical splice pairs are included. The first (top) part of the right figures shows the canonical donor site combined with all observed variations of the acceptor site (GT-AG, GT-CG and GT-TG). The second (middle) part shows the AT-AC group and hybrid pairs (GT-AC, AT-AC and AT-AG). The third (bottom) part shows the canonical acceptor site combined with all observed variations of the donor site (GA-AG, GC-AG and GT-AG).
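The eight groups listed in this caption can be written down as a small lookup table. The sketch below is only a restatement of the caption in code; the group labels and the function name `classify_pair` are illustrative.

```python
# The eight observed donor-acceptor groups named in the caption.
OBSERVED_GROUPS = {
    ("GT", "AG"): "canonical",
    ("GT", "CG"): "acceptor variant",
    ("GT", "TG"): "acceptor variant",
    ("GT", "AC"): "AT-AC hybrid",
    ("AT", "AC"): "AT-AC",
    ("AT", "AG"): "AT-AC hybrid",
    ("GA", "AG"): "donor variant",
    ("GC", "AG"): "donor variant",
}

# 4 bases ** 2 positions = 16 dinucleotides per site, 16 * 16 = 256 possible pairs.
POSSIBLE_PAIRS = (4 ** 2) ** 2


def classify_pair(donor, acceptor):
    """Return the group label for a donor-acceptor pair, or 'unobserved'."""
    return OBSERVED_GROUPS.get((donor.upper(), acceptor.upper()), "unobserved")


print(len(OBSERVED_GROUPS), "of", POSSIBLE_PAIRS)
print(classify_pair("gc", "ag"))
```

Every non-canonical group differs from GT-AG or AT-AC in only one of its two dinucleotides, which matches the caption's remark that practically all pairs have a single non-canonical dinucleotide.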
Figure 5
Comparison of human GenBank sequences and available HTGs.
Figure 6
Small annotated and EST-supported non-canonical splice pair groups (without shifted dinucleotides).
Similar articles
- SpliceDB: database of canonical and non-canonical mammalian splice sites. Burset M, Seledtsov IA, Solovyev VV. Nucleic Acids Res. 2001 Jan 1;29(1):255-9. doi: 10.1093/nar/29.1.255. PMID: 11125105. Free PMC article.
- Information for the Coordinates of Exons (ICE): a human splice sites database. Chong A, Zhang G, Bajic VB. Genomics. 2004 Oct;84(4):762-6. doi: 10.1016/j.ygeno.2004.05.007. PMID: 15475254.
- Genome-wide analyses supported by RNA-Seq reveal non-canonical splice sites in plant genomes. Pucker B, Brockington SF. BMC Genomics. 2018 Dec 29;19(1):980. doi: 10.1186/s12864-018-5360-z. PMID: 30594132. Free PMC article.
- Splicing mutations in inherited retinal diseases. Weisschuh N, Buena-Atienza E, Wissinger B. Prog Retin Eye Res. 2021 Jan;80:100874. doi: 10.1016/j.preteyeres.2020.100874. Epub 2020 Jun 15. PMID: 32553897. Review.
- Genomic sequence, splicing, and gene annotation. Mount SM. Am J Hum Genet. 2000 Oct;67(4):788-92. doi: 10.1086/303098. Epub 2000 Sep 8. PMID: 10986039. Free PMC article. Review. No abstract available.
Cited by
- Characterization of the myostatin gene in the neotropical species Piaractus mesopotamicus and the possibility of its use in genetic improvement programs. Lattanzi GR, Dias MAD, Hashimoto DT, Costa AC, Neto SD, Pazo FD, Diaz J, Villanova GV, Reis Neto RV. Mol Biol Rep. 2024 Oct 10;51(1):1048. doi: 10.1007/s11033-024-09960-1. PMID: 39388010.
- Human cells contain myriad excised linear intron RNAs with links to gene regulation and potential utility as biomarkers. Yao J, Xu H, Ferrick-Kiddie EA, Nottingham RM, Wu DC, Ares M Jr, Lambowitz AM. PLoS Genet. 2024 Sep 26;20(9):e1011416. doi: 10.1371/journal.pgen.1011416. eCollection 2024 Sep. PMID: 39325823. Free PMC article.
- Novornabreak: Local Assembly for Novel Splice Junction and Fusion Transcript Detection from RNA-Seq Data. Tan Y, Mohanty V, Liang S, Dou J, Ma J, Kim KH, Bonder MJ, Shi X, Lee C; Human Genome Structural Variation Consortium; Chong Z, Chen K. J Bioinform Syst Biol. 2023;6(2):74-81. doi: 10.26502/jbsb.5107050. Epub 2023 Apr 4. PMID: 39301431. Free PMC article.
- Splam: a deep-learning-based splice site predictor that improves spliced alignments. Chao KH, Mao A, Salzberg SL, Pertea M. Genome Biol. 2024 Sep 16;25(1):243. doi: 10.1186/s13059-024-03379-4. PMID: 39285451. Free PMC article.
- Impact of genome build on RNA-seq interpretation and diagnostics. Ungar RA, Goddard PC, Jensen TD, Degalez F, Smith KS, Jin CA; Undiagnosed Diseases Network; Bonner DE, Bernstein JA, Wheeler MT, Montgomery SB. Am J Hum Genet. 2024 Jul 11;111(7):1282-1300. doi: 10.1016/j.ajhg.2024.05.005. Epub 2024 Jun 3. PMID: 38834072.