Using the transcriptome to annotate the genome
- Technical Report
- Published: 01 May 2002
- Saurabh Saha1,2,
- Andrew B. Sparks1,
- Carlo Rago1,
- Viatcheslav Akmaev3,
- Clarence J. Wang3,
- Bert Vogelstein1,
- Kenneth W. Kinzler1 &
- …
- Victor E. Velculescu1
Nature Biotechnology volume 20, pages 508–512 (2002)
Abstract
A remaining challenge for the human genome project involves the identification and annotation of expressed genes. The public and private sequencing efforts have identified ∼15,000 sequences that meet stringent criteria for genes, such as correspondence with known genes from humans or other species, and have made another ∼10,000–20,000 gene predictions of lower confidence, supported by various types of in silico evidence, including homology studies, domain searches, and ab initio gene predictions [1,2]. These computational methods have limitations, both because they are unable to identify a significant fraction of genes and exons and because they are unable to provide definitive evidence about whether a hypothetical gene is actually expressed [3,4]. As the in silico approaches identified a smaller number of genes than anticipated [5–9], we wondered whether high-throughput experimental analyses could be used to provide evidence for the expression of hypothetical genes and to reveal previously undiscovered genes. We describe here the development of such a method, called long serial analysis of gene expression (LongSAGE), an adaptation of the original SAGE approach [10], that can be used to rapidly identify novel genes and exons.
References
1. Lander, E.S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
2. Venter, J.C. et al. The sequence of the human genome. Science 291, 1304–1351 (2001).
3. Wheelan, S.J. & Boguski, M.S. Late-night thoughts on the sequence annotation problem. Genome Res. 8, 168–169 (1998).
4. Guigo, R., Agarwal, P., Abril, J.F., Burset, M. & Fickett, J.W. An assessment of gene prediction accuracy in large DNA sequences. Genome Res. 10, 1631–1642 (2000).
5. Fields, C., Adams, M.D., White, O. & Venter, J.C. How many genes in the human genome? Nat. Genet. 7, 345–346 (1994).
6. Dunham, I. et al. The DNA sequence of human chromosome 22. Nature 402, 489–495 (1999).
7. Velculescu, V.E. et al. Analysis of human transcriptomes. Nat. Genet. 23, 387–388 (1999).
8. Liang, F. et al. Gene index analysis of the human genome estimates approximately 120,000 genes. Nat. Genet. 25, 239–240 (2000).
9. de Souza, S.J. et al. Identification of human chromosome 22 transcribed sequences with ORF expressed sequence tags. Proc. Natl. Acad. Sci. USA 97, 12690–12693 (2000).
10. Velculescu, V.E., Zhang, L., Vogelstein, B. & Kinzler, K.W. Serial analysis of gene expression. Science 270, 484–487 (1995).
11. Lal, A. et al. A public database for gene expression in human cancers. Cancer Res. 59, 5403–5407 (1999).
12. Caron, H. et al. The human transcriptome map: clustering of highly expressed genes in chromosomal domains. Science 291, 1289–1292 (2001).
13. Burge, C. & Karlin, S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78–94 (1997).
14. Polyak, K., Xia, Y., Zweier, J.L., Kinzler, K.W. & Vogelstein, B. A model for p53-induced apoptosis. Nature 389, 300–304 (1997).
15. Adams, M.D. et al. Initial assessment of human gene diversity and expression patterns based upon 83 million nucleotides of cDNA sequence. Nature 377, 3 ff. (1995).
16. Okubo, K., Yoshii, J., Yokouchi, H., Kameyama, M. & Matsubara, K. An expression profile of active genes in human colonic mucosa. DNA Res. 1, 37–45 (1994).
17. Shoemaker, D.D. et al. Experimental annotation of the human genome using microarray technology. Nature 409, 922–927 (2001).
18. Boyd, A.C., Charles, I.G., Keyte, J.W. & Brammar, W.J. Isolation and computer-aided characterization of MmeI, a type II restriction endonuclease from Methylophilus methylotrophus. Nucleic Acids Res. 14, 5255–5274 (1986).
19. Tucholski, J., Skowron, P.M. & Podhajska, A.J. MmeI, a class-IIS restriction endonuclease: purification and characterization. Gene 157, 87–92 (1995).
Acknowledgements
We thank Kathy Romans for assistance with database searches, Jennifer Davis for statistical analyses, and Steve Madden, Kathy Klinger, Xiaohong Cao, and members of our laboratories for helpful discussions. This work was supported by NIH grant CA57345.
Author information
Author notes
- Andrew B. Sparks
Present address: GMP Genetics, 200 Prospect Street, Waltham, MA, 02451
- Saurabh Saha and Andrew B. Sparks: These authors contributed equally to this work.
Authors and Affiliations
- Howard Hughes Medical Institute and the Sidney Kimmel Comprehensive Cancer Center, Baltimore, 21231, MD
Saurabh Saha, Andrew B. Sparks, Carlo Rago, Bert Vogelstein, Kenneth W. Kinzler & Victor E. Velculescu
- Program in Cellular and Molecular Medicine, Johns Hopkins Medical Institutions, Baltimore, 21231, MD
Saurabh Saha
- Genzyme Molecular Oncology, P.O. Box 9322, Framingham, 01701, MA
Viatcheslav Akmaev & Clarence J. Wang
Authors
- Saurabh Saha
- Andrew B. Sparks
- Carlo Rago
- Viatcheslav Akmaev
- Clarence J. Wang
- Bert Vogelstein
- Kenneth W. Kinzler
- Victor E. Velculescu
Corresponding authors
Correspondence to Kenneth W. Kinzler or Victor E. Velculescu.
Ethics declarations
Competing interests
K.W.K. received research funding from Genzyme Molecular Oncology (Genzyme). Under a licensing agreement between the Johns Hopkins University and Genzyme, the SAGE technology was licensed to Genzyme for commercial purposes, and B.V., K.W.K., and V.E.V. are entitled to shares of royalties received by the university from the sales of the licensed technology. The SAGE technology is freely available to academia for research purposes. K.W.K. and V.E.V. are consultants to Genzyme, and B.V. has consulted for Genzyme in the past. The university and researchers (B.V., K.W.K., and V.E.V.) own Genzyme stock, which is subject to certain restrictions under university policy. The terms of these arrangements are being managed by the university in accordance with its conflict of interest policies.
About this article
Cite this article
Saha, S., Sparks, A., Rago, C. et al. Using the transcriptome to annotate the genome. Nat Biotechnol 20, 508–512 (2002). https://doi.org/10.1038/nbt0502-508
- Received: 02 October 2001
- Accepted: 25 February 2002
- Issue Date: 01 May 2002
- DOI: https://doi.org/10.1038/nbt0502-508