Using the transcriptome to annotate the genome

Technical Report
Published: 01 May 2002
Saurabh Saha, Andrew B. Sparks, Carlo Rago, Viatcheslav Akmaev, Clarence J. Wang, Bert Vogelstein, Kenneth W. Kinzler & Victor E. Velculescu

Nature Biotechnology volume 20, pages 508–512 (2002)

Abstract

A remaining challenge for the human genome project involves the identification and annotation of expressed genes. The public and private sequencing efforts have identified ∼15,000 sequences that meet stringent criteria for genes, such as correspondence with known genes from humans or other species, and have made another ∼10,000–20,000 gene predictions of lower confidence, supported by various types of in silico evidence, including homology studies, domain searches, and ab initio gene predictions [1,2]. These computational methods have limitations, both because they are unable to identify a significant fraction of genes and exons and because they are unable to provide definitive evidence about whether a hypothetical gene is actually expressed [3,4]. As the in silico approaches identified a smaller number of genes than anticipated [5–9], we wondered whether high-throughput experimental analyses could be used to provide evidence for the expression of hypothetical genes and to reveal previously undiscovered genes. We describe here the development of such a method, called long serial analysis of gene expression (LongSAGE), which adapts the original SAGE approach [10] and can be used to rapidly identify novel genes and exons.

References

1. Lander, E.S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
2. Venter, J.C. et al. The sequence of the human genome. Science 291, 1304–1351 (2001).
3. Wheelan, S.J. & Boguski, M.S. Late-night thoughts on the sequence annotation problem. Genome Res. 8, 168–169 (1998).
4. Guigo, R., Agarwal, P., Abril, J.F., Burset, M. & Fickett, J.W. An assessment of gene prediction accuracy in large DNA sequences. Genome Res. 10, 1631–1642 (2000).
5. Fields, C., Adams, M.D., White, O. & Venter, J.C. How many genes in the human genome? Nat. Genet. 7, 345–346 (1994).
6. Dunham, I. et al. The DNA sequence of human chromosome 22. Nature 402, 489–495 (1999).
7. Velculescu, V.E. et al. Analysis of human transcriptomes. Nat. Genet. 23, 387–388 (1999).
8. Liang, F. et al. Gene index analysis of the human genome estimates approximately 120,000 genes. Nat. Genet. 25, 239–240 (2000).
9. de Souza, S.J. et al. Identification of human chromosome 22 transcribed sequences with ORF expressed sequence tags. Proc. Natl. Acad. Sci. USA 97, 12690–12693 (2000).
10. Velculescu, V.E., Zhang, L., Vogelstein, B. & Kinzler, K.W. Serial analysis of gene expression. Science 270, 484–487 (1995).
11. Lal, A. et al. A public database for gene expression in human cancers. Cancer Res. 59, 5403–5407 (1999).
12. Caron, H. et al. The human transcriptome map: clustering of highly expressed genes in chromosomal domains. Science 291, 1289–1292 (2001).
13. Burge, C. & Karlin, S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78–94 (1997).
14. Polyak, K., Xia, Y., Zweier, J.L., Kinzler, K.W. & Vogelstein, B. A model for p53-induced apoptosis. Nature 389, 300–304 (1997).
15. Adams, M.D. et al. Initial assessment of human gene diversity and expression patterns based upon 83 million nucleotides of cDNA sequence. Nature 377, 3 ff. (1995).
16. Okubo, K., Yoshii, J., Yokouchi, H., Kameyama, M. & Matsubara, K. An expression profile of active genes in human colonic mucosa. DNA Res. 1, 37–45 (1994).
17. Shoemaker, D.D. et al. Experimental annotation of the human genome using microarray technology. Nature 409, 922–927 (2001).
18. Boyd, A.C., Charles, I.G., Keyte, J.W. & Brammar, W.J. Isolation and computer-aided characterization of MmeI, a type II restriction endonuclease from Methylophilus methylotrophus. Nucleic Acids Res. 14, 5255–5274 (1986).
19. Tucholski, J., Skowron, P.M. & Podhajska, A.J. MmeI, a class-IIS restriction endonuclease: purification and characterization. Gene 157, 87–92 (1995).

Acknowledgements

We thank Kathy Romans for assistance with database searches, Jennifer Davis for statistical analyses, and Steve Madden, Kathy Klinger, Xiaohong Cao, and members of our laboratories for helpful discussions. This work was supported by NIH grant CA57345.

Author information

Author notes

Andrew B. Sparks
Present address: GMP Genetics, 200 Prospect Street, Waltham, MA, 02451
Saurabh Saha and Andrew B. Sparks: These authors contributed equally to this work.

Authors and Affiliations

Howard Hughes Medical Institute and the Sidney Kimmel Comprehensive Cancer Center, Baltimore, MD 21231
Saurabh Saha, Andrew B. Sparks, Carlo Rago, Bert Vogelstein, Kenneth W. Kinzler & Victor E. Velculescu
Program in Cellular and Molecular Medicine, Johns Hopkins Medical Institutions, Baltimore, MD 21231
Saurabh Saha
Genzyme Molecular Oncology, P.O. Box 9322, Framingham, MA 01701
Viatcheslav Akmaev & Clarence J. Wang

Authors

Saurabh Saha
Andrew B. Sparks
Carlo Rago
Viatcheslav Akmaev
Clarence J. Wang
Bert Vogelstein
Kenneth W. Kinzler
Victor E. Velculescu

Corresponding authors

Correspondence to Kenneth W. Kinzler or Victor E. Velculescu.

Ethics declarations

Competing interests

K.W.K. received research funding from Genzyme Molecular Oncology (Genzyme). Under a licensing agreement between the Johns Hopkins University and Genzyme, the SAGE technology was licensed to Genzyme for commercial purposes, and B.V., K.W.K., and V.E.V. are entitled to shares of royalties received by the university from the sales of the licensed technology. The SAGE technology is freely available to academia for research purposes. K.W.K. and V.E.V. are consultants to Genzyme, and B.V. has consulted for Genzyme in the past. The university and researchers (B.V., K.W.K., and V.E.V.) own Genzyme stock, which is subject to certain restrictions under university policy. The terms of these arrangements are being managed by the university in accordance with its conflict of interest policies.

About this article

Cite this article

Saha, S., Sparks, A., Rago, C. et al. Using the transcriptome to annotate the genome. Nat Biotechnol 20, 508–512 (2002). https://doi.org/10.1038/nbt0502-508

Received: 02 October 2001
Accepted: 25 February 2002
Issue Date: 01 May 2002
DOI: https://doi.org/10.1038/nbt0502-508