Identifying novel transcripts and novel genes in the human genome by using novel SAGE tags

Jianjun Chen et al. Proc Natl Acad Sci U S A. 2002.

Abstract

The number of genes in the human genome is still a controversial issue. Whereas most of the genes in the human genome are said to have been physically or computationally identified, many short cDNA sequences identified as tags by use of serial analysis of gene expression (SAGE) do not match these genes. By performing experimental verification of more than 1,000 SAGE tags and analyzing 4,285,923 SAGE tags of human origin in the current SAGE database, we examined the nature of the unmatched SAGE tags. Our study shows that most of the unmatched SAGE tags are truly novel SAGE tags that originated from novel transcripts not yet identified in the human genome, including alternatively spliced transcripts from known genes and potential novel genes. Our study indicates that by using novel SAGE tags as probes, we should be able to identify efficiently many novel transcripts/novel genes in the human genome that are difficult to identify by conventional methods.
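Deciding whether a SAGE tag "matches" a known gene usually means comparing it with the tag each known transcript would be expected to produce. The sketch below assumes the standard SAGE convention: NlaIII (recognition site CATG) as the anchoring enzyme, and a 10-base tag taken immediately 3′ of the 3′-most CATG in the transcript. The function names `virtual_tag` and `unmatched_tags` are hypothetical, and a real pipeline would also have to handle sequencing errors, alternative polyadenylation, and tags arising from internal sites.

```python
from typing import Iterable, List, Optional

ANCHOR = "CATG"  # NlaIII recognition site, the usual SAGE anchoring enzyme
TAG_LEN = 10     # assumed tag length (the tags named in Fig 2 are 10 bases)


def virtual_tag(transcript: str, tag_len: int = TAG_LEN) -> Optional[str]:
    """Return the tag_len bases just 3' of the 3'-most CATG, or None if absent."""
    seq = transcript.upper()
    pos = seq.rfind(ANCHOR)  # start of the 3'-most anchoring-enzyme site
    if pos == -1:
        return None
    start = pos + len(ANCHOR)
    tag = seq[start:start + tag_len]
    # A site too close to the 3' end cannot yield a full-length tag.
    return tag if len(tag) == tag_len else None


def unmatched_tags(observed: Iterable[str], transcripts: Iterable[str]) -> List[str]:
    """Observed tags that equal no virtual tag derived from the known transcripts."""
    known = {t for t in map(virtual_tag, transcripts) if t is not None}
    return sorted({tag.upper() for tag in observed} - known)
```

In this simplified picture, the unmatched tags are the candidates the abstract describes: tags that no currently annotated transcript accounts for.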

Figures

Fig 1.

Analyses of the SAGE tags collected from the 101 human SAGE libraries. (A) Relationship between the number of SAGE tags collected and the number of unique SAGE tags identified. Total and unique SAGE tags were extracted from 1, 10, 20, 40, 60, 80, and 101 human SAGE libraries and compared. (B) Change in the frequency distribution of unique SAGE tags as the number of SAGE tags increases. Unique SAGE tags from 1, 10, and 101 SAGE libraries were grouped by copy number; the bar graph shows the proportion of unique SAGE tags in each group.
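Both panels of Fig 1 reduce to counting. The sketch below computes cumulative total and unique tag counts over the first k libraries (panel A) and the fraction of unique tags whose copy number falls in each group (panel B). It assumes each library is a plain list of tag strings; the function names and the copy-number bins used in any call are hypothetical and illustrative, not the groups used in the paper.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Bin = Tuple[int, float]  # inclusive (low, high) copy-number range; high may be inf


def saturation_curve(libraries: Sequence[Sequence[str]],
                     checkpoints: Sequence[int]) -> List[Tuple[int, int, int]]:
    """(k, total tags, unique tags) pooled over the first k libraries, per checkpoint."""
    counts: Counter = Counter()
    total = 0
    done = 0
    curve = []
    for k in sorted(checkpoints):
        for lib in libraries[done:k]:  # pool only libraries not yet counted
            counts.update(lib)
            total += len(lib)
        done = max(done, k)
        curve.append((k, total, len(counts)))
    return curve


def copy_number_fractions(tags: Sequence[str], bins: Sequence[Bin]) -> Dict[Bin, float]:
    """Fraction of unique tags whose copy number falls in each bin."""
    counts = Counter(tags)
    n_unique = len(counts)
    return {
        (lo, hi): (sum(lo <= c <= hi for c in counts.values()) / n_unique
                   if n_unique else 0.0)
        for lo, hi in bins
    }
```

With the actual data, `checkpoints` would follow the caption, i.e. `[1, 10, 20, 40, 60, 80, 101]`. Pooling libraries incrementally avoids recounting the earlier ones at every checkpoint.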

Fig 2.

Genomic confirmation of novel full-length cDNAs derived from novel SAGE tags. Full-length cDNAs were generated from 3′ cDNAs converted from novel SAGE tags and matched to human genomic sequences. (A) The full-length cDNA originating from novel SAGE tag GTTCACACGG exactly matches the antisense strand of the β2-microglobulin (B2M) gene. (B) A full-length cDNA originating from novel SAGE tag TTTTAGGTGG partially overlaps predicted exons but matches no known mRNA or EST.
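Panel A depends on which strand a sequence matches. The sketch below reports exact matches of a query on either strand of a genomic string, assuming sequences are plain A/C/G/T/N text. `find_on_strands` is a hypothetical helper, and exact substring search stands in for the alignment tools real genome matching would use; in particular, a spliced full-length cDNA would not match genomic DNA as one contiguous string.

```python
from typing import List, Tuple

# Base-pairing map; N (unknown base) pairs with N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def _all_starts(text: str, pattern: str) -> List[int]:
    """Every 0-based start of pattern in text, overlapping hits included."""
    starts = []
    i = text.find(pattern)
    while i != -1:
        starts.append(i)
        i = text.find(pattern, i + 1)
    return starts


def find_on_strands(query: str, genomic: str) -> List[Tuple[str, int]]:
    """Exact matches of query on either strand of genomic, as (strand, start).

    Starts are positions on the supplied (+) strand; a '-' hit means the query
    equals the reverse complement of genomic[start:start + len(query)].
    """
    g, q = genomic.upper(), query.upper()
    hits = [("+", i) for i in _all_starts(g, q)]
    hits += [("-", i) for i in _all_starts(g, reverse_complement(q))]
    return hits
```

A hit reported on the "-" strand corresponds to the antisense situation described in panel A: the query reads along the strand opposite to the annotated gene.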
