Genome-wide analysis of mouse transcripts using exon microarrays and factor graphs

Nature Genetics volume 37, pages 991–996 (2005)

Abstract

Recent mammalian microarray experiments detected widespread transcription and indicated that there may be many undiscovered multiple-exon protein-coding genes. To explore this possibility, we hybridized labeled cDNA from unamplified, polyadenylation-selected RNA samples from 37 mouse tissues to microarrays encompassing 1.14 million exon probes. We analyzed these data using GenRate, a Bayesian algorithm that uses a genome-wide scoring function in a factor graph to infer genes. At a stringent exon false detection rate of 2.7%, GenRate detected 12,145 gene-length transcripts and confirmed 81% of the 10,000 most highly expressed known genes. Notably, our analysis showed that most of the 155,839 exons detected by GenRate were associated with known genes, providing microarray-based evidence that most multiple-exon genes have already been identified. GenRate also detected tens of thousands of potential new exons and reconciled discrepancies in current cDNA databases by 'stitching' new transcribed regions into previously annotated genes.
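
The abstract does not specify GenRate's scoring function or graph structure, so the following is not GenRate. It is only a minimal sketch of the general idea of sum-product inference on a factor graph, using a toy chain in which each probe has a binary label (0 = background, 1 = exon). The function name `chain_marginals`, the factor tables and all numeric values are hypothetical and chosen purely for illustration.

```python
def _normalize(pair):
    """Scale a two-element message so that it sums to 1."""
    total = pair[0] + pair[1]
    return [pair[0] / total, pair[1] / total]


def chain_marginals(unary, pairwise):
    """Sum-product (forward-backward) on a chain factor graph.

    unary[i][s]    : evidence factor for probe i in state s
                     (hypothetical: 0 = background, 1 = exon).
    pairwise[a][b] : factor linking adjacent probes in states a and b.
    Returns the posterior probability that each probe is in state 1.
    """
    n = len(unary)

    # Forward messages: information arriving at probe i from its left side.
    fwd = [[1.0, 1.0] for _ in range(n)]
    for i in range(1, n):
        msg = [
            sum(fwd[i - 1][a] * unary[i - 1][a] * pairwise[a][b] for a in (0, 1))
            for b in (0, 1)
        ]
        fwd[i] = _normalize(msg)

    # Backward messages: information arriving at probe i from its right side.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        msg = [
            sum(pairwise[a][b] * unary[i + 1][b] * bwd[i + 1][b] for b in (0, 1))
            for a in (0, 1)
        ]
        bwd[i] = _normalize(msg)

    # Belief at each probe: local evidence times both incoming messages.
    marginals = []
    for i in range(n):
        belief = [fwd[i][s] * unary[i][s] * bwd[i][s] for s in (0, 1)]
        marginals.append(belief[1] / (belief[0] + belief[1]))
    return marginals


# Toy data (made up): two confident probes flank one ambiguous probe.
toy_unary = [[0.1, 0.9], [0.5, 0.5], [0.1, 0.9]]
# Made-up coupling that favors neighboring probes sharing a label.
toy_pairwise = [[0.9, 0.1], [0.1, 0.9]]
toy_result = chain_marginals(toy_unary, toy_pairwise)
```

In this toy setting the ambiguous middle probe is pulled toward the label of its confident neighbors, which is the intuition behind combining evidence from adjacent probes rather than scoring each probe in isolation.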

Accession codes

Accessions

Gene Expression Omnibus

Acknowledgements

We thank G.E. Hinton for conversations and C. Boone and B. Andrews for their support. This work was supported by grants from the Canadian Institutes of Health Research, the Natural Sciences and Engineering Research Council of Canada and the Canadian Foundation for Innovation (to T.R.H., B.J.F. and B.J.B.), by a PREA award (to B.J.F.) and by a Natural Sciences and Engineering Research Council of Canada postdoctoral fellowship (to Q.D.M.).

Author information

Author notes

  1. Brendan J Frey, Naveed Mohammad, Quaid D Morris and Wen Zhang: These authors contributed equally to this work.

Authors and Affiliations

  1. Electrical and Computer Engineering, University of Toronto, 10 King's College Rd., Toronto, M5S 3G4, Ontario, Canada
    Brendan J Frey, Quaid D Morris & Mark D Robinson
  2. Banting and Best Department of Medical Research, University of Toronto, 112 College St., Toronto, M5G 1L6, Ontario, Canada
    Brendan J Frey, Naveed Mohammad, Quaid D Morris, Wen Zhang, Mark D Robinson, Sanie Mnaimneh, Richard Chang, Qun Pan, Benjamin J Blencowe & Timothy R Hughes
  3. Medical Genetics and Microbiology, University of Toronto, 1 King's College Ct., Toronto, M5S 3G4, Ontario, Canada
    Wen Zhang, Janet Rossant, Benoit G Bruneau, Jane E Aubin, Benjamin J Blencowe & Timothy R Hughes
  4. Samuel Lunenfeld Research Institute, Mount Sinai Hospital, 600 University Avenue, Toronto, M5G 1X5, Ontario, Canada
    Eric Sat & Janet Rossant
  5. The Hospital for Sick Children, 555 University Ave., Toronto, M5G 1X8, Ontario, Canada
    Benoit G Bruneau

Authors

  1. Brendan J Frey
  2. Naveed Mohammad
  3. Quaid D Morris
  4. Wen Zhang
  5. Mark D Robinson
  6. Sanie Mnaimneh
  7. Richard Chang
  8. Qun Pan
  9. Eric Sat
  10. Janet Rossant
  11. Benoit G Bruneau
  12. Jane E Aubin
  13. Benjamin J Blencowe
  14. Timothy R Hughes

Corresponding authors

Correspondence to Benjamin J Blencowe or Timothy R Hughes.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Cite this article

Frey, B., Mohammad, N., Morris, Q. et al. Genome-wide analysis of mouse transcripts using exon microarrays and factor graphs. Nat Genet 37, 991–996 (2005). https://doi.org/10.1038/ng1630
