Using the transcriptome to annotate the genome

Nature Biotechnology volume 20, pages 508–512 (2002)

Abstract

A remaining challenge for the human genome project involves the identification and annotation of expressed genes. The public and private sequencing efforts have identified ∼15,000 sequences that meet stringent criteria for genes, such as correspondence with known genes from humans or other species, and have made another ∼10,000–20,000 gene predictions of lower confidence, supported by various types of in silico evidence, including homology studies, domain searches, and ab initio gene predictions [1,2]. These computational methods have limitations, both because they are unable to identify a significant fraction of genes and exons and because they are unable to provide definitive evidence about whether a hypothetical gene is actually expressed [3,4]. As the in silico approaches identified a smaller number of genes than anticipated [5,6,7,8,9], we wondered whether high-throughput experimental analyses could be used to provide evidence for the expression of hypothetical genes and to reveal previously undiscovered genes. We describe here the development of such a method, called long serial analysis of gene expression (LongSAGE), an adaptation of the original SAGE approach [10], that can be used to rapidly identify novel genes and exons.

References

  1. Lander, E.S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
  2. Venter, J.C. et al. The sequence of the human genome. Science 291, 1304–1351 (2001).
  3. Wheelan, S.J. & Boguski, M.S. Late-night thoughts on the sequence annotation problem. Genome Res. 8, 168–169 (1998).
  4. Guigo, R., Agarwal, P., Abril, J.F., Burset, M. & Fickett, J.W. An assessment of gene prediction accuracy in large DNA sequences. Genome Res. 10, 1631–1642 (2000).
  5. Fields, C., Adams, M.D., White, O. & Venter, J.C. How many genes in the human genome? Nat. Genet. 7, 345–346 (1994).
  6. Dunham, I. et al. The DNA sequence of human chromosome 22. Nature 402, 489–495 (1999).
  7. Velculescu, V.E. et al. Analysis of human transcriptomes. Nat. Genet. 23, 387–388 (1999).
  8. Liang, F. et al. Gene index analysis of the human genome estimates approximately 120,000 genes. Nat. Genet. 25, 239–240 (2000).
  9. de Souza, S.J. et al. Identification of human chromosome 22 transcribed sequences with ORF expressed sequence tags. Proc. Natl. Acad. Sci. USA 97, 12690–12693 (2000).
  10. Velculescu, V.E., Zhang, L., Vogelstein, B. & Kinzler, K.W. Serial analysis of gene expression. Science 270, 484–487 (1995).
  11. Lal, A. et al. A public database for gene expression in human cancers. Cancer Res. 59, 5403–5407 (1999).
  12. Caron, H. et al. The human transcriptome map: clustering of highly expressed genes in chromosomal domains. Science 291, 1289–1292 (2001).
  13. Burge, C. & Karlin, S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78–94 (1997).
  14. Polyak, K., Xia, Y., Zweier, J.L., Kinzler, K.W. & Vogelstein, B. A model for p53-induced apoptosis. Nature 389, 300–304 (1997).
  15. Adams, M.D. et al. Initial assessment of human gene diversity and expression patterns based upon 83 million nucleotides of cDNA sequence. Nature 377, 3 ff. (1995).
  16. Okubo, K., Yoshii, J., Yokouchi, H., Kameyama, M. & Matsubara, K. An expression profile of active genes in human colonic mucosa. DNA Res. 1, 37–45 (1994).
  17. Shoemaker, D.D. et al. Experimental annotation of the human genome using microarray technology. Nature 409, 922–927 (2001).
  18. Boyd, A.C., Charles, I.G., Keyte, J.W. & Brammar, W.J. Isolation and computer-aided characterization of _Mme_I, a type II restriction endonuclease from Methylophilus methylotrophus. Nucleic Acids Res. 14, 5255–5274 (1986).
  19. Tucholski, J., Skowron, P.M. & Podhajska, A.J. MmeI, a class-IIS restriction endonuclease: purification and characterization. Gene 157, 87–92 (1995).

Acknowledgements

We thank Kathy Romans for assistance with database searches, Jennifer Davis for statistical analyses, and Steve Madden, Kathy Klinger, Xiaohong Cao, and members of our laboratories for helpful discussions. This work was supported by NIH grant CA57345.

Author information

Author notes

  1. Andrew B. Sparks
    Present address: GMP Genetics, 200 Prospect Street, Waltham, MA, 02451
  2. Saurabh Saha and Andrew B. Sparks: These authors contributed equally to this work.

Authors and Affiliations

  1. Howard Hughes Medical Institute and the Sidney Kimmel Comprehensive Cancer Center, Baltimore, 21231, MD
    Saurabh Saha, Andrew B. Sparks, Carlo Rago, Bert Vogelstein, Kenneth W. Kinzler & Victor E. Velculescu
  2. Program in Cellular and Molecular Medicine, Johns Hopkins Medical Institutions, Baltimore, 21231, MD
    Saurabh Saha
  3. Genzyme Molecular Oncology, P.O. Box 9322, Framingham, 01701, MA
    Viatcheslav Akmaev & Clarence J. Wang

Authors

  1. Saurabh Saha
  2. Andrew B. Sparks
  3. Carlo Rago
  4. Viatcheslav Akmaev
  5. Clarence J. Wang
  6. Bert Vogelstein
  7. Kenneth W. Kinzler
  8. Victor E. Velculescu

Corresponding authors

Correspondence to Kenneth W. Kinzler or Victor E. Velculescu.

Ethics declarations

Competing interests

K.W.K. received research funding from Genzyme Molecular Oncology (Genzyme). Under a licensing agreement between the Johns Hopkins University and Genzyme, the SAGE technology was licensed to Genzyme for commercial purposes, and B.V., K.W.K., and V.E.V. are entitled to shares of royalties received by the university from the sales of the licensed technology. The SAGE technology is freely available to academia for research purposes. K.W.K. and V.E.V. are consultants to Genzyme, and B.V. has consulted for Genzyme in the past. The university and researchers (B.V., K.W.K., and V.E.V.) own Genzyme stock, which is subject to certain restrictions under university policy. The terms of these arrangements are being managed by the university in accordance with its conflict of interest policies.

About this article

Cite this article

Saha, S., Sparks, A., Rago, C. et al. Using the transcriptome to annotate the genome. Nat Biotechnol 20, 508–512 (2002). https://doi.org/10.1038/nbt0502-508
