A comparison of expressed sequence tags (ESTs) to human genomic sequences

Abstract

The Expressed Sequence Tag (EST) division of GenBank, dbEST, is a large repository of data generated by human genome sequencing centers. ESTs are short, single-pass cDNA sequences generated from randomly selected library clones. The approximately 415 000 human ESTs represent a valuable, low-cost, and easily accessible biological reagent. Because many ESTs derive from as-yet uncharacterized genes, dbEST is a prime starting point for identifying novel mRNAs. Conversely, other genes are represented by hundreds of ESTs, a redundancy that may provide data about rare mRNA isoforms. Here we present an analysis of >1000 ESTs generated by the WashU-Merck EST project. These ESTs were collected by querying dbEST with the genomic sequences of 15 human genes. When we aligned the matching ESTs to the genomic sequences, we found that in one gene, 73% of the ESTs that derive from spliced or partially spliced transcripts either contain intron sequence or are spliced at previously unreported sites; other genes have lower percentages of such ESTs, and some have none. This finding suggests that ESTs could provide researchers with novel information about alternative splicing in certain genes. In a related analysis of pairs of ESTs reported to derive from a single gene, we found that in as many as 26% of the pairs, the two ESTs do not both align with the sequence of that gene. We suspect that some of these unusual ESTs result from artifacts of EST generation, and caution researchers that they may encounter such clones when analyzing sequences in dbEST.
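The abstract does not specify the alignment software or the exact rules used to call an EST unusual, so the following is only a hypothetical sketch of the two tallies it reports. It assumes each EST has already been aligned to the genomic sequence and is represented as a list of aligned genomic blocks (0-based, half-open), and that the known gene model is a list of exons in the same coordinates. All names (`classify_est`, `unusual_fraction`, `discordant_pair_fraction`, `min_overlap`) are illustrative inventions. As a further assumption, an EST counts as "spliced or partially spliced" when its alignment contains at least one gap; it is "unusual" if any block overlaps a known intron or any gap does not match a known intron exactly. An EST pair is counted as discordant when either member has no alignment (`None`) to the gene.

```python
from typing import List, NamedTuple, Optional, Sequence, Tuple

# Genomic interval as (start, end), 0-based and half-open (assumed convention).
Interval = Tuple[int, int]


class EstCall(NamedTuple):
    spliced: bool         # alignment has at least one gap (a putative splice junction)
    intronic: bool        # some aligned block overlaps a known intron
    novel_junction: bool  # some gap does not coincide exactly with a known intron


def introns_of(exons: Sequence[Interval]) -> List[Interval]:
    """Known introns are the gaps between consecutive exons of the gene model."""
    ex = sorted(exons)
    return [(a_end, b_start) for (_, a_end), (b_start, _) in zip(ex, ex[1:])]


def classify_est(blocks: Sequence[Interval], exons: Sequence[Interval],
                 min_overlap: int = 1) -> EstCall:
    """Compare one EST's aligned blocks with the known exon/intron model."""
    blk = sorted(blocks)
    introns = introns_of(exons)
    known = set(introns)
    # Each gap between consecutive aligned blocks is treated as a splice junction.
    junctions = [(a_end, b_start) for (_, a_end), (b_start, _) in zip(blk, blk[1:])]
    # Intron sequence present if any block overlaps any intron by >= min_overlap bases.
    intronic = any(
        min(end, i_end) - max(start, i_start) >= min_overlap
        for start, end in blk
        for i_start, i_end in introns
    )
    novel = any(j not in known for j in junctions)
    return EstCall(spliced=bool(junctions), intronic=intronic, novel_junction=novel)


def unusual_fraction(est_alignments: Sequence[Sequence[Interval]],
                     exons: Sequence[Interval]) -> float:
    """Share of spliced ESTs that contain intron sequence or use an unreported junction."""
    calls = [classify_est(b, exons) for b in est_alignments]
    spliced = [c for c in calls if c.spliced]
    if not spliced:
        return 0.0
    return sum(c.intronic or c.novel_junction for c in spliced) / len(spliced)


def discordant_pair_fraction(
        pairs: Sequence[Tuple[Optional[List[Interval]], Optional[List[Interval]]]]) -> float:
    """Fraction of EST pairs in which at least one member fails to align (None)."""
    if not pairs:
        return 0.0
    return sum(a is None or b is None for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    exons = [(0, 100), (200, 300), (400, 500)]  # toy gene model, not real data
    ests = [
        [(50, 100), (200, 250)],   # uses the known exon 1/exon 2 junction
        [(250, 350), (400, 450)],  # unreported junction, plus intron sequence
        [(60, 100), (200, 330)],   # partially spliced: runs into intron 2
        [(10, 90)],                # no gap, so not counted as spliced
    ]
    print(unusual_fraction(ests, exons))  # 0.6666666666666666
    print(discordant_pair_fraction([(ests[0], ests[0]), (ests[1], None)]))  # 0.5
```

Requiring exact coordinate matches for known junctions is a simplification; real alignments often have a few bases of ambiguity at exon boundaries, so a practical version would likely allow a small tolerance there as well as in `min_overlap`.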
