A non-EST-based method for exon-skipping prediction

Comparative Study

Genome Res. 2004 Aug;14(8):1617-23.

doi: 10.1101/gr.2572604.

Rotem Sorek et al. Genome Res. 2004 Aug.

Abstract

It is estimated that between 35% and 74% of all human genes can undergo alternative splicing. Currently, the most efficient methods for large-scale detection of alternative splicing use expressed sequence tags (ESTs) or microarray analysis. As these methods merely sample the transcriptome, splice variants that do not appear in deeply sampled tissues have a low probability of being detected. We present a new method by which we can predict that an internal exon is skipped (namely whether it is a cassette-exon) merely based on its naked genomic sequence and on the sequence of its mouse ortholog. No other data, such as ESTs, are required for the prediction. Using our method, which was experimentally validated, we detected hundreds of novel splice variants that were not detectable using ESTs. We show that a substantial fraction of the splice variants in the human genome could not be identified through current human EST or cDNA data.

Copyright 2004 Cold Spring Harbor Laboratory Press

Figures

Figure 1

Graphic representation of the differences between alternative and constitutive exons. In each of the following curves, constitutive exons are shown as squares and alternative exons as diamonds. (A) Length of the conserved region in the nearest 100 nt of the flanking upstream intron. _x_-axis, length of conserved region (best Sim4 local alignment); _y_-axis, percent of exons with an upstream conserved region greater than or equal to the value in _x_. Conservation was detected by local alignment with the 100 counterpart intronic nt in mouse. A minimum hit was 12 consecutive perfectly matching nt. (B) Length of the conserved region in the nearest 100 nt of the flanking downstream intron. Axes as in A. (C) Exon size distribution. _x_-axis, exon size; _y_-axis, percent of exons with size less than or equal to the value in _x_. (D) Human–mouse exon identity. _x_-axis, percent identity in the global alignment of the human and mouse exons; _y_-axis, percent of exons with identity greater than or equal to the value in _x_. (E) Human–mouse exon identity for exons whose size is a multiple of 3. Axes as in D. Note that combining two features gives better separation of the two exon types.
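The cumulative curves and the combined-feature idea in this caption can be sketched in a few lines of Python. This is only an illustration: the `Exon` record, the function names, and every threshold passed to `is_candidate` are hypothetical and are not the authors' rules (those are in their Supplemental material). The only value taken from the caption is the 12-nt minimum hit for intronic conservation.

```python
from dataclasses import dataclass


@dataclass
class Exon:
    """Hypothetical record holding the features plotted in Figure 1."""
    length: int            # exon size in nt (panel C)
    identity: float        # percent human-mouse exon identity (panels D, E)
    upstream_cons: int     # conserved nt in the nearest 100 nt upstream (panel A)
    downstream_cons: int   # conserved nt in the nearest 100 nt downstream (panel B)


def percent_at_least(values, threshold):
    """Percent of values >= threshold (y-axis of panels A, B, D, E)."""
    if not values:
        return 0.0
    return 100.0 * sum(v >= threshold for v in values) / len(values)


def percent_at_most(values, threshold):
    """Percent of values <= threshold (y-axis of panel C)."""
    if not values:
        return 0.0
    return 100.0 * sum(v <= threshold for v in values) / len(values)


def is_candidate(exon, min_identity, min_upstream, min_downstream):
    """Combine several features into one yes/no rule.

    All thresholds are supplied by the caller; none are the paper's values.
    """
    return (
        exon.length % 3 == 0                      # size is a multiple of 3 (panel E)
        and exon.identity >= min_identity         # high exon identity (panel D)
        and exon.upstream_cons >= min_upstream    # conserved upstream intron (panel A)
        and exon.downstream_cons >= min_downstream  # conserved downstream intron (panel B)
    )


# Toy values, invented for illustration only.
toy_identities = [10, 20, 30, 40]
share_high_identity = percent_at_least(toy_identities, 20)
```

Requiring several features at once mirrors the caption's closing remark that two combined features separate the exon types better than either one alone.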

Figure 2

Experimental validation of alternative splicing in selected predicted exons. RT–PCR for 15 exons (detailed in Table 2), for which no EST/cDNA indicating alternative splicing was found, was conducted over 14 different tissue types and cell lines (see Methods). Detected splice variants were confirmed by sequencing. For nine of these exons, a splice isoform was detected in at least one of the tissues tested. Only a single tissue is shown here for each of these nine exons. Lane 1, DNA size marker. Lane 2, exon 2 skipping in FGF11 gene in ovary tissue (exon inclusion 344 nt; skipping 233 nt). Lane 3, exon 4 skipping in EFNA5 gene in ovary tissue (exon inclusion 287 nt; skipping 199 nt). Lane 4, exon 8 skipping in NCOA1 gene in placenta tissue (exon inclusion 377 nt; skipping 275 nt). Lane 5, exon 22 skipping in PAM gene in cervix tissue (exon inclusion 323 nt; skipping 215 nt). The additional upper band contains a novel exon in PAM. Lane 6, exon 9 skipping in GOLGA4 gene in uterus tissue (exon inclusion 288 nt; skipping 213 nt). Lane 7, exon 9 skipping in NPR2 gene in placenta tissue (exon inclusion 282 nt; skipping 207 nt). Lane 8, intron 8 retention in VLDLR gene in ovary tissue (wild type 324 nt; intron retention 427 nt). Lane 9, alternative acceptor site in exon 12 of BAZ1A in ovary tissue (wild type 351 nt; alternative acceptor variant 265 nt). The uppermost band represents a new exon in BAZ1A, inserted between exons 12 and 13. Lane 10, alternative acceptor site in exon 7 of SMARCD1 in uterus tissue (wild type 353 nt; exon 7 extension 397 nt).
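The product sizes reported for lanes 2 to 7 allow a quick arithmetic check. The sketch below assumes that the difference between the inclusion and skipping products equals the length of the skipped exon, which holds only if both products come from the same primer pair placed in the flanking exons; the caption does not state this. The function names are hypothetical; the sizes are copied from the caption.

```python
# RT-PCR product sizes in nt, copied from the Figure 2 caption:
# gene -> (exon-inclusion product, exon-skipping product)
PRODUCT_SIZES = {
    "FGF11": (344, 233),
    "EFNA5": (287, 199),
    "NCOA1": (377, 275),
    "PAM": (323, 215),
    "GOLGA4": (288, 213),
    "NPR2": (282, 207),
}


def skipped_exon_length(inclusion_nt, skipping_nt):
    """Size difference between the two products.

    Assumption: this equals the skipped exon's length, which requires
    both products to be amplified with the same flanking primers.
    """
    return inclusion_nt - skipping_nt


def is_multiple_of_three(length_nt):
    """True if removing a segment of this length keeps the reading frame."""
    return length_nt % 3 == 0


for gene, (inclusion, skipping) in PRODUCT_SIZES.items():
    length = skipped_exon_length(inclusion, skipping)
    print(gene, length, is_multiple_of_three(length))
```

Under that assumption, the check ties the gel back to the exon-size feature of Figure 1E, where exons whose size is a multiple of 3 are treated separately.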

Figure 3

Sensitivity vs. false-positive rate of the classification rules. Each square on the curve represents the performance of a single classification rule. _x_-axis, 1 − specificity, i.e., the percentage of constitutive exons (false positives) retrieved by the rule. _y_-axis, sensitivity, i.e., the percentage of alternative exons (true positives) identified by the rule. Values were computed relative to the training set. Rules that were used for this plot are provided as Supplemental material.
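Each point on such a curve can be computed as in the sketch below, which follows the axis definitions given in the caption. The exon records, `example_rule`, and its thresholds are invented toy values for illustration; they are not the authors' training set or rules.

```python
def rule_performance(rule, alternative_exons, constitutive_exons):
    """Return (false-positive rate, sensitivity) in percent for one rule.

    sensitivity         = percent of alternative exons the rule retrieves
    false-positive rate = percent of constitutive exons the rule retrieves
                          (that is, 1 - specificity)
    """
    true_pos = sum(1 for exon in alternative_exons if rule(exon))
    false_pos = sum(1 for exon in constitutive_exons if rule(exon))
    sensitivity = 100.0 * true_pos / len(alternative_exons)
    false_positive_rate = 100.0 * false_pos / len(constitutive_exons)
    return false_positive_rate, sensitivity


# Toy training set, invented for illustration only.
alternative = [
    {"length": 99, "identity": 98.0},
    {"length": 120, "identity": 96.0},
    {"length": 100, "identity": 97.0},
    {"length": 81, "identity": 90.0},
]
constitutive = [
    {"length": 120, "identity": 88.0},
    {"length": 131, "identity": 97.0},
    {"length": 150, "identity": 96.0},
    {"length": 143, "identity": 85.0},
    {"length": 90, "identity": 80.0},
]


def example_rule(exon):
    """Hypothetical rule: size divisible by 3 and identity of at least 95%."""
    return exon["length"] % 3 == 0 and exon["identity"] >= 95.0


point = rule_performance(example_rule, alternative, constitutive)
```

Evaluating many rules with different thresholds and plotting the resulting pairs would trace a curve of the kind shown in the figure.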
