Inferring Alternative Splicing Patterns in Mouse from a Full-Length cDNA Library and Microarray Data
Abstract
Although many studies on alternative splicing of specific genes have been reported in the literature, the general mechanism that regulates alternative splicing is not clearly understood. In this study, we systematically aligned each pair of the 21,076 cDNA sequences of Mus musculus, searched for putative alternative splicing patterns, and constructed a list of potential alternative splicing sites. Two cDNAs are suspected to be alternatively spliced and to originate from a common gene if they share a high degree of sequence homology over most of their length, but parts of the sequences are very distinctive or deleted in either cDNA. The list contains the following information: (1) tissue, (2) developmental stage, (3) sequences around splice sites, (4) the length of each gapped region, and (5) other comments. The list is available at http://www.bioinfo.sfc.keio.ac.jp/intron. Our results have predicted a number of unreported alternatively spliced genes, some of which are expressed only in a specific tissue or at a specific developmental stage.
Alternative splicing of pre-mRNA plays an important role in the production of diverse mRNAs from individual genes, and it helps increase the functional range of gene products in higher eukaryotes. In many cases, gene expression is tightly regulated at the splicing level by specific mechanisms to provide suitable proteins for a particular tissue or stage (McKeown 1992; Chabot 1996; Wang and Manley 1997). On the other hand, alternative transcripts are generated in the same tissue, especially in brain or muscle, to supply an extensive number of proteins that have distinct functions, contributing to their plasticity (Bernstein et al. 1986; Missler and Sudhof 1998). The total number of genes in the human genome is estimated to range from 28,000 to 120,000 (Crollius et al. 2000; Ewing and Green 2000; Liang et al. 2000; Wright et al. 2001), and at least one-third of them might give rise to alternatively spliced transcripts (Mironov et al. 1999; Brett et al. 2000). Although databases of alternative splicing have been established by collecting alternatively spliced genes from annotated databases (Dralyuk et al. 2000; Ji et al. 2001), the number of alternatively spliced genes cataloged in such databases is small compared with the estimated total number of alternatively spliced human genes (Modrek et al. 2001).
Using single-pass end sequencing of randomly selected cDNA clones, >1 million expressed sequence tags (ESTs) have been submitted to publicly available databases (Adams et al. 1991). The accumulation of ESTs contributes not only to the discovery of new genes (Adams et al. 1995) but also to the detection of new alternatively spliced genes. There are several ways to detect alternatively spliced genes, including (1) mapping EST sequences onto the genome sequence (Wolfsberg and Landsman 1997; Modrek et al. 2001), (2) comparing full-length mRNA sequences from annotated databases against the EST database (Brett et al. 2000), and (3) clustering EST sequences (Burke et al. 1998). Although ESTs are effective material for identifying novel candidates of alternatively spliced genes, full-length cDNAs are much more desirable for that purpose because they cover entire coding regions.
In this study, we used 21,076 full-length cDNA clones of Mus musculus derived from numerous tissues or developmental stages (The RIKEN Genome Exploration Research Group Phase II and the FANTOM Consortium 2001) to analyze the extent of alternative splicing. We conducted a systematic analysis to extract putative alternative cDNAs by comprehensive, round-robin comparisons among the 21,076 clone sequences and constructed a list of potential alternatively spliced transcripts. We then analyzed the expression patterns of the clusters using their expression profiles (Miki et al. 2001) and selected the clusters whose cDNAs tended to be expressed in a specific tissue or at a specific developmental stage. It has been reported that 69 out of 1600 rat genes were detected as alternatively spliced genes based on expression data (Hu et al. 2001). Our analysis used a putative alternative splicing data set and an enormous microarray data set.
The use of this method is significant not only because it allows alternatively spliced genes to be identified but also because it can narrow the search to the specific conditions under which alternative splicing occurs, thereby reducing experimental work. This method may serve as a model for transcriptome analysis of alternative splicing.
RESULTS
Overview of the Clusters Predicted as Alternatively Spliced Genes
The data set of alternatively spliced cDNAs was constructed from a library of 21,076 cDNAs as described in Methods. The data set consists of 415 clusters with a total of 1136 cDNAs. In the data set, potentially alternatively spliced cDNAs are listed with the following information: (1) tissue, (2) developmental stage, (3) sequences around splice sites, (4) the length of each gapped region, and (5) other comments. These cDNAs are available at http://www.bioinfo.sfc.keio.ac.jp/intron. Most clusters have only one gapped region (putative alternatively spliced site), as summarized in Table 1.
Table 1.
Clusters by the Number of Gapped Regions
No. of gapped regions | No. of clusters |
---|---|
One gap | 346 |
Two gaps | 48 |
Three gaps | 18 |
More than three gaps | 3 |
Total | 415 |
Various types of alternative splicing patterns have been discussed. Breitbart et al. (1987) suggested five canonical types of alternative splicing (illustrated in Fig. 1): (A) cassette, (B) internal donor site, (C) internal acceptor site, (D) mutually exclusive, and (E) retained intron. We classified the 490 gapped regions of the 415 clusters into one of these five categories according to the criteria defined below. For the sake of classification, we consider nucleotide sequences around the splicing sites (Mount 1982; Padgett et al. 1986) 5′-(a/c)ag‖GT(a/g)agt and (c/t)10N(c/t)AG‖g-3′. These consensus nucleotides are reflected in Figure 1. For each gapped region to be classified into one of the five categories, the nucleotides represented by capital letters are compulsory, and the nucleotides represented by lower-case letters are preferred. More precisely, we used the following criteria: (A) cassette: GT or AG; (B) internal donor site: GT required, and at least four of the seven preferred nucleotides of donor site; (C) internal acceptor site: AG required, and at least 8 of the 13 preferred nucleotides of acceptor site; and (E) retained intron: GT—AG required, and at least four of the seven preferred nucleotides of donor site and 8 of the 13 preferred nucleotides of acceptor site. Because category D can be uniquely determined by the pattern of alignment alone, no nucleotides were checked for it. The gapped regions that could not be classified in each category were categorized as Unclassified. The results of this categorization are presented in Table 2. To estimate the tendency of misclassifications, alternative exons of M. musculus known in the literature (Stamm et al. 2000) were used as a sample set and classified according to the same criteria. The result of this classification is represented in Table 3. 
The majority of the known exons were categorized correctly in accordance with their appropriate splicing patterns, except that many (A) cassette exons were classified as (C) internal acceptor sites. These misclassifications arise from the fact that exonic consensus sequences in the acceptor site are similar to the intronic consensus sequence AG, making it difficult to predict the form of alternative splicing on the basis of sequence data (Thanraj 2000). From this control study, it can be inferred that a good portion of the 134 gapped regions listed as (C) internal acceptor sites in Table 2 are actually (A) cassettes.
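The donor-site criterion used for category B (GT compulsory, at least four of the seven preferred nucleotides) can be illustrated with a minimal Python sketch. This is not the authors' program: the function name `donor_site_ok` and the 9-nt window layout (three exonic nucleotides followed by the first six intronic nucleotides) are assumptions made here for illustration, based only on the consensus 5′-(a/c)ag‖GT(a/g)agt given above.

```python
# Donor-site consensus from the text: (a/c)ag | GT (a/g)agt
# Each entry lists the accepted bases at that position.
# Positions 3 and 4 (0-based) are the compulsory GT; the other seven are "preferred".
DONOR_CONSENSUS = ["AC", "A", "G", "G", "T", "AG", "A", "G", "T"]
DONOR_REQUIRED = (3, 4)


def donor_site_ok(window, min_preferred=4):
    """Check a 9-nt window (3 exonic nt + first 6 intronic nt) against the donor criterion.

    Hypothetical helper: the window layout is an assumption, not taken from the paper.
    """
    window = window.upper()
    if len(window) != len(DONOR_CONSENSUS):
        raise ValueError("expected a 9-nt window")
    matches = [base in allowed for base, allowed in zip(window, DONOR_CONSENSUS)]
    # The capital-letter nucleotides (GT) are compulsory.
    if not all(matches[i] for i in DONOR_REQUIRED):
        return False
    # Count matches among the seven lower-case (preferred) positions.
    preferred = sum(m for i, m in enumerate(matches) if i not in DONOR_REQUIRED)
    return preferred >= min_preferred


if __name__ == "__main__":
    for seq in ("CAGGTAAGT", "CAGGTCCCT", "TTTGTCCCC", "CAGGCAAGT"):
        print(seq, donor_site_ok(seq))
```

The acceptor-site criterion would follow the same pattern with a longer window; it is left out here because the exact positions counted among the 13 preferred nucleotides are not spelled out in the text.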
Figure 1.
Patterns of alternative splicing. Nucleotide sequences are consensus sequences around the splicing sites (Mount 1982; Padgett et al. 1986).
Table 2.
Classification of Potential Sites of Alternative Splicing
Patterns | No. of gapped regions |
---|---|
(A) Cassette | 111 |
(B) Internal donor site | 56 |
(C) Internal acceptor site | 134 |
(D) Mutually exclusive | 8 |
(E) Retained intron | 125 |
Unclassified | 56 |
Total | 490 |
Table 3.
Known Alternative Exons of Mus musculus Were Classified According to the Same Criteria
Patterns (predicted) | Actual: (A) cassette | Actual: (B) internal donor site | Actual: (C) internal acceptor site | Actual: (E) retained intron |
---|---|---|---|---|
(A) Cassette | 26 | 0 | 0 | 0 |
(B) Internal donor site | 2 | 5 | 0 | 0 |
(C) Internal acceptor site | 15 | 0 | 9 | 1 |
(E) Retained intron | 3 | 2 | 1 | 7 |
Unclassified | 14 | 0 | 0 | 0 |
Total | 60 | 7 | 10 | 8 |
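The counts in Table 3 can be summarized as the fraction of known exons of each actual pattern that the criteria classified correctly. The following Python sketch only re-expresses the table; the variable and function names are illustrative and not part of the original analysis.

```python
# Counts copied from Table 3: keys are predicted patterns,
# list entries are the actual patterns in the order A, B, C, E.
ACTUAL = ["A", "B", "C", "E"]
TABLE3 = {
    "A": [26, 0, 0, 0],
    "B": [2, 5, 0, 0],
    "C": [15, 0, 9, 1],
    "E": [3, 2, 1, 7],
    "Unclassified": [14, 0, 0, 0],
}


def column_totals(table, n_columns):
    """Number of known exons of each actual pattern."""
    return [sum(row[j] for row in table.values()) for j in range(n_columns)]


def fraction_correct(table, actual_labels):
    """Fraction of exons of each actual pattern that received the matching prediction."""
    totals = column_totals(table, len(actual_labels))
    return {label: table[label][j] / totals[j] for j, label in enumerate(actual_labels)}


totals = column_totals(TABLE3, len(ACTUAL))
correct = fraction_correct(TABLE3, ACTUAL)
# Share of known cassette exons that were predicted as internal acceptor sites.
cassette_as_acceptor = TABLE3["C"][0] / totals[0]

if __name__ == "__main__":
    for label in ACTUAL:
        print(label, round(correct[label], 3))
    print("cassette predicted as acceptor:", cassette_as_acceptor)
```

Under these counts, 15 of the 60 known cassette exons (25%) fall into the internal acceptor category, which is the misclassification discussed in the text.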
The numbers of spliced and unspliced regions (illustrated in Fig. 2) of putative alternative splicing are summarized in Tables 4 and 5 according to expressed tissue and developmental stage, respectively. No general tendency specific to a tissue or developmental stage was found, indicating that alternative splicing takes place widely across tissues and developmental stages.
Figure 2.
An example of spliced and unspliced regions. Spliced has a gapped region.
Table 4.
The Number of Spliced and Unspliced Regions Listed by Tissues
Tissue | No. of gapped regions (spliced) | No. of gapped regions (unspliced) |
---|---|---|
Adipose | 0 | 1 |
Brain | 4 | 9 |
C. quadrigemina microdissected | 0 | 2 |
Cecum | 4 | 1 |
Cerebellum | 21 | 15 |
Cerebellum microdissected | 1 | 1 |
Colon | 0 | 4 |
Corpus striatum microdissected | 2 | 3 |
ES cell | 31 | 24 |
Extra testis | 1 | 0 |
Extra testis microdissected | 1 | 3 |
Eyeball | 2 | 2 |
Eyeball microdissected | 0 | 1 |
Forelimb | 3 | 0 |
Head | 36 | 35 |
Heart | 5 | 5 |
Hippocampus | 13 | 8 |
Hypothalamus microdissected | 1 | 1 |
Intestine | 1 | 0 |
Kidney | 21 | 40 |
Liver | 33 | 16 |
Liver microdissected | 19 | 16 |
Lower body | 1 | 1 |
Lung | 25 | 7 |
Lung microdissected | 0 | 5 |
Mammary gland | 1 | 0 |
Medulla oblongata microdissected | 5 | 3 |
Ovary and uterus | 5 | 2 |
Pancreas | 30 | 39 |
Pituitary gland | 6 | 5 |
Placenta | 10 | 6 |
Placenta and extraembryonic tissues | 4 | 1 |
Retina microdissected | 3 | 0 |
Skin | 8 | 9 |
Small intestine | 26 | 34 |
Spinal cord microdissected | 1 | 1 |
Spleen | 0 | 1 |
Stomach | 28 | 17 |
Stomach microdissected | 0 | 1 |
Testis | 123 | 101 |
Testis microdissected | 1 | 1 |
Thymus | 2 | 3 |
Tongue | 60 | 57 |
Upper body | 2 | 2 |
Urinary bladder | 1 | 0 |
Whole body | 188 | 199 |
Whole body microdissected | 1 | 1 |
Total | 720 | 683 |
Table 5.
The Number of Spliced and Unspliced Regions Listed by Developmental Stage
Developmental stage | No. of gapped regions (spliced) | No. of gapped regions (unspliced) |
---|---|---|
Adult | 418 | 375 |
Embryo-10 | 52 | 58 |
Embryo-10+ | 45 | 38 |
Embryo-11 | 30 | 26 |
Embryo-12 | 9 | 11 |
Embryo-13 | 45 | 34 |
Embryo-14 | 0 | 1 |
Embryo-14, 17 | 2 | 3 |
Embryo-15 | 1 | 1 |
Embryo-16 | 1 | 6 |
Embryo-17 | 2 | 3 |
Embryo-18 | 34 | 58 |
Embryo-7 | 1 | 1 |
Embryo-8 | 25 | 19 |
ES cell | 31 | 24 |
Lactation-10 | 1 | 0 |
Neonate-0 | 12 | 14 |
Neonate-10 | 2 | 6 |
Neonate-6 | 4 | 3 |
Pregnant-11 | 5 | 2 |
Total | 720 | 683 |
Details of the Several Clusters Predicted as Alternatively Spliced Genes
One of the clusters in category D (mutually exclusive) is homologous (96% identity) to the CHIP protein (Ballinger et al. 1999). The form of this protein is shown in Figure 3. Although the CHIP gene has not been reported as an alternatively spliced gene, it is likely that this gene has alternative transcripts.
Figure 3.
Mutually exclusive splicing of the CHIP gene (Ballinger et al. 1999).
Figure 4 shows examples of more complicated alternative splicing patterns in which three cDNAs were potentially produced in different forms from a single gene. An open reading frame (ORF) was predicted for each cDNA using the RIKEN DECODER program (Fukunishi and Hayashizaki 2001).
Figure 4.
Examples of more complicated alternative splicing patterns in which three cDNAs were potentially produced in different forms from a single gene. Cluster 8: homologs to human PR domain zinc finger protein 5 (Deng et al., unpubl.). Cluster 45: homologs to human mitochondrial carrier homolog 2 (Jang et al., unpubl.). Cluster 63: homologs to human HSPC204 protein (Zhang et al. 2000). Cluster 74: homologs to human HSPC223 protein (Ye et al., unpubl.). Cluster 85: homologs to human heterogeneous nuclear ribonucleoprotein C (Nakagawa et al. 1986). Clusters 3022, 3058, and 3110: no homology found (hypothetical protein). Splice variant of Cluster 3058, no homology found (unclassifiable). Cluster 3147: homologs to D. melanogaster brain cDNA clone NMCB-2386 (Osada et al., unpubl.). Cluster 3148: homologs to bisphosphate 3′-nucleotidase (Spiegelberg et al. 1999).
In the case that an alternatively spliced region resides in a predicted ORF, it is likely that the spliced exon increases variation of the protein function. In particular, cDNA Cluster 8 has three splicing patterns, and the second spliced region causes a drastic change of amino acids by a frameshift. Although it is possible that this frameshift is caused by a sequencing error, we think it is not, because the frameshifted region includes a zinc finger motif (Table 6). It could be suggested that the variety of zinc finger motifs in the three translation products contributes to variation in gene regulation by altering their DNA-binding sites.
Table 6.
The Result of Motif Analysis in Alternate Exons (Cluster 8)
Seq. ID | Description of motif | No. of motifs | E value |
---|---|---|---|
18486 | Zinc finger C2H2 type | 11 | 7.00E-14 |
7864 | Zinc finger C2H2 type | 11 | 1.80E-82 |
22966 | Zinc finger C2H2 type | 8 | 3.20E-66 |
Besides this case, frameshifts were identified in cDNA Clusters 63 and 3071, but no motif was found in these exons. It has been reported that in the integrin β5 subunit of mouse and major protein zero (MPZ) of human, the occurrence of alternative splicing events in the ORF resulted in open-reading frameshifts (Besancon et al. 1999). Thus, these two clusters may also have distinct gene functions regulated by frameshifts.
Transcriptome Analysis of Mouse DNA Arrays with Our Data Set
Figures 5 and 6 show the transcriptome analyses of mouse DNA arrays with our putative alternative splicing data set. These clusters each have a prominent splicing pattern in specific tissues or at distinct developmental stages. The level of gene expression is presented as a score of signal intensity between cDNAs.
Figure 5.
These clusters each have a prominent splicing pattern in specific tissues or at distinct developmental stages. Cluster 2204: homologs to prolactin-like-peptide (Ishibashi and Imai 1999). Cluster 3082: homologs to human HSPC011 and 28S ribosomal protein S17, mitochondrial precursor (Gantt and Thompson 1990). Cluster 3138: homologs to TIA-1 cytotoxic granule-associated RNA-binding protein-like 1 (Lowin et al. 1996). Cluster 3148: homologs to bisphosphate 3′-nucleotidase (Spiegelberg et al. 1999).
Figure 6.
The horizontal axis is the tissue in which the gene expression was observed. The vertical axis is the level of gene expression as a score of signal intensity between cDNAs (log).
In Cluster 2204, cDNAs are homologs to prolactin-like peptide. It is known that the prolactin (PRL)/growth hormone (GH) gene is expressed in the pituitary gland, uterus, or the placenta (Ishibashi and Imai 1999). Our data show that SeqID 4107 is expressed in the placenta but not in the thymus or uterus. On the other hand, SeqID 3784 presents high expression in thymus and uterus. The alternative exon may contribute to the construction of this protein in a particular tissue.
In Cluster 3148, cDNAs are homologs to bisphosphate 3′-nucleotidase (Spiegelberg et al. 1999), which has not been reported to have alternative transcripts. Although the distal start codon may be adopted by both cDNAs, two start codons may be properly used at a specific developmental stage by alternative splicing.
Some alternatively spliced regions are outside of predicted ORFs (Clusters 3082, 3138). The cDNAs of Cluster 3138 are homologs to TIA-1 cytotoxic granule-associated RNA-binding protein-like 1. This gene is expressed in the cells fated to be brain and retina at embryonic day 12.5. Its expression is also found in the lung, kidney, and thymus (Lowin et al. 1996). On the other hand, the gene expression of cDNA Cluster 3082 is likely to be regulated according to the skin developmental stage. The cDNAs of this cluster are homologs to 28S ribosomal protein S17 (Gantt and Thompson 1990). It has been reported that alternative splicing often occurs in 5′-untranslated regions, resulting in alternative regulation of gene expression (Mironov et al. 1999). Therefore, the alternatively spliced regions may contain regulatory elements.
DISCUSSION
We divided 1136 cDNAs into 415 clusters as putative alternatively spliced transcripts. These cDNAs constitute 7.4% of the 15,294 cDNAs (the estimated number of unique sequences). Although it has been reported that ∼38% of human mRNAs contain possible alternative splice forms (Brett et al. 2000), our number should not be interpreted as the percentage of alternatively spliced genes in general. In the process of constructing the cDNA library, we tried to reduce redundancy by not sequencing cDNAs with the same nucleotide sequence in their 5′- or 3′-untranslated regions (The RIKEN Genome Exploration Research Group Phase II and the FANTOM Consortium 2001). This procedure should have eliminated a large number of alternatively spliced transcripts.
It has been reported that many genes are alternatively spliced at multiple sites (Smith et al. 1989), from which hundreds of alternate transcripts could be produced in theory. One example of this is the lymphocyte homing receptor gene CD44, which can generate enormous molecular diversity, >1000 potential isoforms (2^10 = 1024 combinations), by including or excluding each of 10 exons in the gene (Screaton et al. 1992; Tolg et al. 1993). In our results, on the other hand, most of the clusters showed potential alternative splicing at only one site (Table 1); it may be that they have many more splicing variants that we have overlooked. To study this possibility, a greater amount of cDNA sequence data from a given gene will be necessary (Regan et al. 2000).
In summary, computational analysis is a powerful means for predicting potential sites of alternative splicing, and we have constructed a list of these sites from the largest available data set of mouse full-length cDNA sequences. Our results have predicted a number of unreported alternatively spliced genes, some of which are expressed only in a specific tissue or at a specific developmental stage.
METHODS
We used a set of 21,076 mouse full-length cDNAs produced by The RIKEN Genome Exploration Research Group Phase II and the FANTOM Consortium (2001). The average length of all the cDNAs was 1257 bp. The number of unique sequences, after eliminating redundant sequences, is presumed to be 15,294. In our work, however, we did not make any attempt to eliminate redundancy and used all of the 21,076 sequences, in order not to miss any potential alternative transcripts.
First, we conducted a round-robin BLAST search (Altschul et al. 1990) of the 21,076 cDNA sequences against each other. The cDNA pairs whose BLAST output met the following criteria were extracted from the data set: (1) >95% of nucleotides were identical for >20 consecutive nucleotides; and (2) more than one such matching region in common. After these comprehensive pair-wise comparisons, the cDNA pairs were merged into clusters if one sequence was paired with two or more different sequences.
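The pair extraction and merging steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline actually used: the input is assumed to be a list of hypothetical `(query_id, subject_id, percent_identity, aligned_length)` tuples already parsed from the BLAST output, criterion (2) is read as "at least two qualifying matching regions for the pair", and merging is implemented as single-linkage grouping with a union-find structure.

```python
from collections import defaultdict


def qualifying_pairs(hits, min_identity=95.0, min_length=20, min_regions=2):
    """Return cDNA pairs with at least `min_regions` matching regions.

    hits: iterable of (query_id, subject_id, percent_identity, aligned_length).
    The tuple layout is an assumption made for this sketch.
    """
    regions = defaultdict(int)
    for query, subject, identity, length in hits:
        if query == subject:
            continue  # ignore self-hits from the all-against-all search
        if identity > min_identity and length > min_length:
            regions[(query, subject)] += 1
    # Normalize the direction so that (a, b) and (b, a) give a single pair.
    pairs = {tuple(sorted(p)) for p, n in regions.items() if n >= min_regions}
    return sorted(pairs)


def merge_into_clusters(pairs):
    """Group sequences that are linked through shared partners (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for node in parent:
        clusters[find(node)].add(node)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    # Made-up identifiers and values, for illustration only.
    example_hits = [
        ("c1", "c2", 98.0, 300), ("c1", "c2", 99.0, 150),
        ("c2", "c3", 97.0, 200), ("c2", "c3", 96.5, 80),
        ("c4", "c5", 99.0, 500),   # only one matching region: not extracted
        ("c6", "c6", 100.0, 900),  # self-hit: ignored
    ]
    print(merge_into_clusters(qualifying_pairs(example_hits)))
```

Requiring two or more matching regions reflects the idea that an alternatively spliced pair aligns as separate high-identity blocks interrupted by the variable region, whereas a pair with a single block is more likely redundant or only partially overlapping.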
Next, the sequences of these clusters were aligned using the multiple sequence alignment program CLUSTALW (Thompson et al. 1994). The gap penalty parameter was set to 0 to tolerate large gaps. If the aligned sequences shared a high degree of sequence homology over most of their length but parts of the sequences were very distinctive or deleted in either cDNA, the cluster was suspected to be alternatively spliced and to originate from a common gene. We define such distinctive or deleted regions as gapped regions, and consider them as candidate alternatively spliced exons.
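As an illustration of how deleted regions might be located in an alignment, the sketch below scans two aligned rows for runs of columns in which exactly one sequence carries gap characters. It is a simplified assumption-laden example: the function name `gapped_regions`, the `-` gap character, and the choice to ignore gaps at the alignment ends are all introduced here, and the "very distinctive" (mismatching) regions mentioned above are not handled.

```python
def gapped_regions(row_a, row_b, min_length=1, internal_only=True, gap_char="-"):
    """Find runs of alignment columns where exactly one row is gapped.

    Returns a list of (start, end, gapped_row) with 0-based start and exclusive end;
    gapped_row is "a" or "b". Hypothetical helper for illustration only.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    n = len(row_a)
    regions = []
    start, gapped = 0, None
    for i in range(n + 1):
        if i < n:
            a_gap = row_a[i] == gap_char
            b_gap = row_b[i] == gap_char
            if a_gap and not b_gap:
                current = "a"
            elif b_gap and not a_gap:
                current = "b"
            else:
                current = None
        else:
            current = None  # sentinel column closes a run that reaches the end
        if current != gapped:
            if gapped is not None and i - start >= min_length:
                terminal = start == 0 or i == n
                # Terminal gaps usually reflect differing cDNA ends, not splicing.
                if not (internal_only and terminal):
                    regions.append((start, i, gapped))
            start, gapped = i, current
    return regions


if __name__ == "__main__":
    a = "ACGTACGT----ACGT"
    b = "ACGTACGTTTGAACGT"
    print(gapped_regions(a, b))
```

In practice a minimum length threshold would probably be needed so that single-nucleotide indels from sequencing errors are not reported; the `min_length` parameter is provided for that purpose, but no particular value is implied by the text.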
We also used microarray data of expression patterns for 18,816 mouse cDNA sequences (Miki et al. 2001), to extract alternatively spliced genes whose expression pattern is prominent in a specific tissue or at a specific developmental stage. We presented the level of gene expression as a score of signal intensity between cDNAs.
WEB SITE REFERENCES
http://www.bioinfo.sfc.keio.ac.jp/intron; a list of alternative splicing patterns.
Acknowledgments
We thank Atsushi Sakurai, Shigeo Fujimori, Koya Mori, Hitomi Itoh, and members of the Tomita laboratory for helpful discussions and suggestions during the course of this work. This study was supported in part by a research grant for the RIKEN Genome Exploration Research Project from the Japanese Ministry of Education, Culture, Sports, Science and Technology (MEXT) to Y.H. This work was also supported by a research grant from the Ministry of Agriculture, Forestry and Fisheries of Japan (Rice Genome Project), New Energy and Industrial Technology Development Organization (NEDO) of the Ministry of Economy, Trade and Industry of Japan (Development of a Technological Infrastructure for Industrial Bioprocesses Project), and Japan Science and Technology Agency.
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 USC section 1734 solely to indicate this fact.
Footnotes
E-MAIL mt@sfc.keio.ac.jp; FAX 81 (466) 47-5099.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.220302. Article published online before print in July 2002.
REFERENCES
- Adams MD, Kelley JM, Gocayne JD, Dubnick M, Polymeropoulos MH, Xiao H, Merril CR, Wu A, Olde B, Moreno RF, et al. Complementary DNA sequencing: Expressed sequence tags and human genome project. Science. 1991;252:1651–1656. doi: 10.1126/science.2047873.
- Adams MD, Kerlavage AR, Fleischmann RD, Fuldner RA, Bult CJ, Lee NH, Kirkness EF, Weinstock KG, Gocayne JD, White O, et al. Initial assessment of human gene diversity and expression patterns based upon 83 million nucleotides of cDNA sequence. Nature. 1995;377:3–174.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215:403–410. doi: 10.1016/S0022-2836(05)80360-2.
- Ballinger CA, Connell P, Wu Y, Hu Z, Thompson LJ, Yin LY, Patterson C. Identification of CHIP, a novel tetratricopeptide repeat-containing protein that interacts with heat shock proteins and negatively regulates chaperone functions. Mol Cell Biol. 1999;19:4535–4545. doi: 10.1128/mcb.19.6.4535.
- Bernstein SI, Hansen CJ, Becker KD, Wassenberg DR, II, Roche ES, Donady JJ, Emerson CP., Jr Alternative RNA splicing generates transcripts encoding a thorax-specific isoform of Drosophila melanogaster myosin heavy chain. Mol Cell Biol. 1986;6:2511–2519. doi: 10.1128/mcb.6.7.2511.
- Besancon R, Prost AL, Konecny L, Latour P, Petiot P, Boutrand L, Kopp N, Mularoni A, Chamba G, Vandenberghe A. Alternative exon 3 splicing of the human major protein zero gene in white blood cells and peripheral nerve tissue. FEBS Lett. 1999;457:339–342. doi: 10.1016/s0014-5793(99)01069-8.
- Breitbart RE, Andreadis A, Nadal-Ginard B. Alternative splicing: A ubiquitous mechanism for the generation of multiple protein isoforms from single genes. Annu Rev Biochem. 1987;56:467–495. doi: 10.1146/annurev.bi.56.070187.002343.
- Brett D, Hanke J, Lehmann G, Haase S, Delbruck S, Krueger S, Reich J, Bork P. EST comparison indicates 38% of human mRNAs contain possible alternative splice forms. FEBS Lett. 2000;26:83–86. doi: 10.1016/s0014-5793(00)01581-7.
- Burke J, Wang H, Hide W, Davison DB. Alternative gene form discovery and candidate gene selection from gene indexing projects. Genome Res. 1998;8:276–290. doi: 10.1101/gr.8.3.276.
- Chabot B. Directing alternative splicing: Cast and scenarios. Trends Genet. 1996;12:472–478. doi: 10.1016/0168-9525(96)10037-8.
- Crollius RH, Jaillon O, Bernot A, Dasilva C, Bouneau L, Fischer C, Fizames C, Wincker P, Brottier P, Quetier F, et al. Estimate of human gene number provided by genome-wide analysis using Tetraodon nigroviridis DNA sequence. Nat Genet. 2000;25:235–238. doi: 10.1038/76118.
- Dralyuk I, Brudno M, Gelfand MS, Zorn M, Dubchak I. ASDB: Database of alternatively spliced genes. Nucleic Acids Res. 2000;28:296–297. doi: 10.1093/nar/28.1.296.
- Ewing B, Green P. Analysis of expressed sequence tags indicates 35,000 human genes. Nat Genet. 2000;25:232–234. doi: 10.1038/76115.
- Fukunishi Y, Hayashizaki Y. Amino acid translation program for full-length cDNA sequences with frameshift errors. Physiol Genomics. 2001;5:81–87. doi: 10.1152/physiolgenomics.2001.5.2.81.
- Gantt JS, Thompson MD. Plant cytosolic ribosomal protein S11 and chloroplast ribosomal protein S17. Their primary structures and evolutionary relationships. J Biol Chem. 1990;265:2763–2767.
- Hu GK, Madore SJ, Moldover B, Jatkoe T, Balaban D, Thomas J, Wang Y. Predicting splice variant from DNA chip expression data. Genome Res. 2001;11:1237–1245. doi: 10.1101/gr.165501.
- Ishibashi K, Imai M. Identification of four new members of the rat prolactin/growth hormone gene family. Biochem Biophys Res Commun. 1999;262:575–578. doi: 10.1006/bbrc.1999.1260.
- Ji H, Zhou Q, Wen F, Xia H, Lu X, Li Y. AsMamDB: An alternative splice database of mammals. Nucleic Acids Res. 2001;29:260–263. doi: 10.1093/nar/29.1.260.
- Liang F, Holt I, Pertea G, Karamycheva S, Salzberg SL, Quackenbush J. Gene Index analysis of the human genome estimates approximately 120,000 genes. Nat Genet. 2000;25:239–240. doi: 10.1038/76126.
- Lowin B, French L, Martinou JC, Tschopp J. Expression of the CTL-associated protein TIA-1 during murine embryogenesis. J Immunol. 1996;157:1448–1454.
- McKeown M. Alternative mRNA splicing. Annu Rev Cell Biol. 1992;8:133–155. doi: 10.1146/annurev.cb.08.110192.001025.
- Miki R, Kadota K, Bono H, Mizuno Y, Tomaru Y, Carninci P, Itoh M, Shibata K, Kawai J, Konno H, et al. Delineating developmental and metabolic pathways in vivo by expression profiling using the RIKEN set of 18,816 full-length enriched mouse cDNA arrays. Proc Natl Acad Sci. 2001;98:2199–2204. doi: 10.1073/pnas.041605498.
- Mironov AA, Fickett JW, Gelfand MS. Frequent alternative splicing of human genes. Genome Res. 1999;9:1288–1293. doi: 10.1101/gr.9.12.1288.
- Missler M, Sudhof TC. Neurexins: Three genes and 1001 products. Trends Genet. 1998;14:20–26. doi: 10.1016/S0168-9525(97)01324-3.
- Modrek B, Resch A, Grasso C, Lee C. Genome-wide detection of alternative splicing in expressed sequences of human genes. Nucleic Acids Res. 2001;29:2850–2859. doi: 10.1093/nar/29.13.2850.
- Mount SM. A catalogue of splice junction sequences. Nucleic Acids Res. 1982;10:459–472. doi: 10.1093/nar/10.2.459.
- Nakagawa TY, Swanson MS, Wold BJ, Dreyfuss G. Molecular cloning of cDNA for the nuclear ribonucleoprotein particle C proteins: A conserved gene family. Proc Natl Acad Sci. 1986;83:2007–2011. doi: 10.1073/pnas.83.7.2007.
- Padgett RA, Grabowski PJ, Konarska MM, Seiler S, Sharp PA. Splicing of messenger RNA precursors. Annu Rev Biochem. 1986;55:1119–1150. doi: 10.1146/annurev.bi.55.070186.005351.
- Regan MR, Emerick MC, Agnew WS. Full-length single-gene cDNA libraries: Applications in splice variant analysis. Anal Biochem. 2000;286:265–276. doi: 10.1006/abio.2000.4819.
- The RIKEN Genome Exploration Research Group Phase II Team and the FANTOM Consortium. Functional annotation of a full-length mouse cDNA collection. Nature. 2001;409:685–690. doi: 10.1038/35055500.
- Screaton GR, Bell MV, Jackson DG, Cornelis FB, Gerth U, Bell JI. Genomic structure of DNA encoding the lymphocyte homing receptor CD44 reveals at least 12 alternatively spliced exons. Proc Natl Acad Sci. 1992;89:12160–12164. doi: 10.1073/pnas.89.24.12160.
- Smith CW, Patton JG, Nadal-Ginard B. Alternative splicing in the control of gene expression. Annu Rev Genet. 1989;23:527–577. doi: 10.1146/annurev.ge.23.120189.002523.
- Spiegelberg BD, Xiong JP, Smith JJ, Gu RF, York JD. Cloning and characterization of a mammalian lithium-sensitive bisphosphate 3′-nucleotidase inhibited by inositol 1,4-bisphosphate. J Biol Chem. 1999;274:13619–13628. doi: 10.1074/jbc.274.19.13619.
- Stamm S, Zhu J, Nakai K, Stoilov P, Stoss O, Zhang MQ. An alternative-exon database and its statistical analysis. DNA Cell Biol. 2000;19:739–756. doi: 10.1089/104454900750058107.
- Thanraj TA. Positional characterization of false positives from computational prediction of human splice sites. Nucleic Acids Res. 2000;28:744–754. doi: 10.1093/nar/28.3.744.
- Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994;22:4673–4680. doi: 10.1093/nar/22.22.4673.
- Tolg C, Hofmann M, Herrlich P, Ponta H. Splicing choice from ten variant exons establishes CD44 variability. Nucleic Acids Res. 1993;21:1225–1229. doi: 10.1093/nar/21.5.1225.
- Wang J, Manley JL. Regulation of pre-mRNA splicing in metazoa. Curr Opin Genet Dev. 1997;7:205–211. doi: 10.1016/s0959-437x(97)80130-x.
- Wright FA, Lemon WJ, Zhao WD, Sears R, Zhuo D, Wang JP, Yang HY, Baer T, Stredney D, Spitzner J, et al. A draft annotation and overview of the human genome. Genome Biol. 2001;2:RESEARCH0025.1–RESEARCH0025.18. doi: 10.1186/gb-2001-2-7-research0025.
- Wolfsberg TG, Landsman D. A comparison of expressed sequence tags (ESTs) to human genomic sequences. Nucleic Acids Res. 1997;25:1626–1632. doi: 10.1093/nar/25.8.1626.
- Zhang QH, Ye M, Wu XY, Ren SX, Zhao M, Zhao CJ, Fu G, Shen Y, Fan HY, Lu G, et al. Cloning and functional analysis of cDNAs with open reading frames for 300 previously undefined genes expressed in CD34+ hematopoietic stem progenitor cells. Genome Res. 2000;10:1546–1560. doi: 10.1101/gr.140200.