Computational methods for exon detection

References

  1. Pearce, M., Blake, D. J., Tinsley, J. M., Byth, B. C., Campbell, L., Monaco, A. P., and Davies, K. E. (1993) The utrophin and dystrophin genes share similarities in genomic structure. Hum. Mol. Genet. 2, 1765–1772.
  2. Levinson, B., Kenwrick, S., Gamel, P., Fisher, K., and Gitschier, J. (1992) Evidence for a third transcript from the human factor VIII gene. Genomics 14, 585–589.
  3. De Backer, O., Verheyden, A. M., Martin, B., Godelaine, D., De Plaen, E., Brasseur, R., Avner, P., and Boon, T. (1995) Structure, chromosomal location, and expression pattern of three mouse genes homologous to the human MAGE genes. Genomics 28, 74–83.
  4. Legouis, R., Hardelin, J.-P., Levilliers, J., Claverie, J.-M., Compain, S., Wunderle, V., Millasseau, P., Le Paslier, D., Cohen, D., Caterina, D., Bougueleret, L., Lutfalla, G., Weissenbach, J., and Petit, C. (1991) The candidate gene for the X-linked Kallmann syndrome encodes a protein related to adhesion molecules. Cell 67, 423–435.
  5. Senapathy, P., Shapiro, M. B., and Harris, N. L. (1990) Splice junctions, branch point sites, and exons: sequence statistics, identification, and applications to genome project. Methods Enzymol. 183, 252–278.
  6. Stormo, G. D. (1990) Consensus patterns in DNA. Methods Enzymol. 183, 211–221.
  7. Brunak, S., Engelbrecht, J., and Knudsen, S. (1991) Prediction of human mRNA donor and acceptor sites from the DNA sequence. J. Mol. Biol. 220, 49–65.
  8. Simmler, M. C., Cunningham, D., Clerc, P., Vermat, T., Cruaud, C., Pawlak, A., Szpirer, C., Weissenbach, J., Claverie, J.-M., and Avner, P. (1996) A 94 kb genomic sequence 3′ to the murine _Xist_ gene reveals an AT-rich region containing a new testis specific gene _Tex_. Hum. Mol. Genet. 5, 1713–1726.
  9. Hawkins, J. D. (1988) A survey of intron and exon lengths. Nucleic Acids Res. 16, 9893–9908.
  10. Snyder, E. E., and Stormo, G. D. (1995) Identification of protein coding regions in genomic DNA. J. Mol. Biol. 248, 1–18.
  11. Grantham, R., Gautier, C., Gouy, M., Mercier, R., and Pavé, A. (1980) Codon catalog usage and the genome hypothesis. Nucleic Acids Res. 8, r49–r60.
  12. Staden, R. (1990) Finding protein coding regions in genomic sequences. Methods Enzymol. 183, 163–180.
  13. Shepherd, J. C. W. (1981) Proc. Natl. Acad. Sci. USA 78, 1596–1600.
  14. Shepherd, J. C. W. (1990) Ancient patterns in nucleic acid sequences. Methods Enzymol. 183, 180–192.
  15. Fickett, J. W. (1982) Recognition of protein coding regions in DNA sequences. Nucleic Acids Res. 10, 5303–5318.
  16. Claverie, J.-M., and Bougueleret, L. (1986) Heuristic informational analysis of sequences. Nucleic Acids Res. 14, 179–196.
  17. Beckmann, J. S., Brendel, V., and Trifonov, E. N. (1986) Intervening sequences exhibit distinct vocabulary. J. Biomolec. Struct. Dynamics 4, 391–400.
  18. Borodovsky, M., Sprizhitskii, Y. A., Golovanov, E. I., and Aleksandrov, A. A. (1986) Statistical patterns in primary structure of the functional regions of the genome in _E. coli_. III. Computer recognition of coding regions. Molekulyarnaya Biologiya 20, 1390–1398.
  19. Fickett, J. W., and Tung, C.-S. (1992) Assessment of protein coding measures. Nucleic Acids Res. 20, 6441–6450.
  20. Claverie, J.-M., Sauvaget, I., and Bougueleret, L. (1990) k-tuple frequency analysis: from intron/exon discrimination to T-cell epitope mapping. Methods Enzymol. 183, 237–252.
  21. Bougueleret, L., Tekaia, F., Sauvaget, I., and Claverie, J.-M. (1988) Objective comparison of exon and intron sequences by the mean of 2-dimensional data analysis methods. Nucleic Acids Res. 16, 1729–1738.
  22. Borodovsky, M. Y., Rudd, K. E., and Koonin, E. V. (1994) Intrinsic and extrinsic approaches for detecting genes in a bacterial genome. Nucleic Acids Res. 22, 4756–4767.
  23. Uberbacher, E. C., and Mural, R. J. (1991) Locating protein-coding regions in DNA sequences by a multiple sensor-neural approach. Proc. Natl. Acad. Sci. USA 88, 11,261–11,265.
  24. Xu, Y., Einstein, J. R., Mural, R. J., Shah, M. B., and Uberbacher, E. C. (1994) Recognizing exons in genomic sequence using GRAIL II, in: Genetic Engineering: Principles and Methods (Setlow, J., ed.), Plenum Press.
  25. Sulston, J., Du, Z., Thomas, K., Wilson, R., Hillier, L., Staden, R., Halloran, N., Green, P., Thierry-Mieg, J., Qiu, L., et al. (1992) The C. elegans genome sequencing project: a beginning. Nature 356, 37–41.
  26. Guigo, R., Knudsen, S., Drake, N., and Smith, T. (1992) Prediction of gene structure. J. Mol. Biol. 226, 141–157.
  27. Solovyev, V. V., Salamov, A. A., and Lawrence, C. B. (1994) Predicting internal exons by oligonucleotide composition and discriminant analysis of spliceable open reading frames. Nucleic Acids Res. 22, 5156–5163.
  28. Zhang, M. Q. (1997) Identification of protein coding regions in the human genome by quadratic discriminant analysis. Proc. Natl. Acad. Sci. USA 94, 565–568.
  29. Claverie, J.-M. (1997) Computational methods for the identification of genes in vertebrate genomic sequences. Hum. Mol. Genet. 6, 1735–1744.
  30. http://igs-server.cnrs-mrs.fr
  31. Wu, T. D. (1996) A segment-based dynamic programming algorithm for predicting gene structure. J. Comput. Biol. 3, 375–394.
  32. Burge, C., and Karlin, S. (1997) Prediction of complete gene structure in human genomic DNA. J. Mol. Biol. 268, 1–17.
  33. Xu, Y., Mural, R. J., and Uberbacher, E. C. (1994) Constructing gene models from accurately predicted exons: an application of dynamic programming. Comput. Appl. Biosci. 10, 613–623.
  34. Claverie, J.-M. (1995) Progress in large scale sequence analysis, in: Advances in Computational Biology (H. Villar, ed.), Vol. 2, JAI Press, London.
  35. Lopez, R., Larsen, F., and Prydz, H. (1994) Evaluation of the exon prediction of the Grail software. Genomics 24, 133–136.
  36. Ansari-Lari, M. A., Shen, Y., Muzny, D. M., Lee, W., and Gibbs, R. A. (1997) Large-scale sequencing in human chromosome 12p13: experimental and computational gene structure determination. Genome Res. 7, 268–280.
  37. Ansari-Lari, M. A., Muzny, D. M., Lu, J., Lu, F., Lilley, C. E., Spanos, S., Malley, T., and Gibbs, R. A. (1996) A gene-rich cluster between the CD4 and triose-phosphate isomerase genes at human chromosome 12p13. Genome Res. 6, 314–326.
  38. Hunkapiller, T., Kaiser, R. J., Koop, B. F., and Hood, L. (1991) Large-scale and automated DNA sequence determination. Science 254, 59–67.
  39. Olson, M. V. (1993) The human genome project. Proc. Natl. Acad. Sci. USA 90, 4338–4344.
  40. Nowak, R. (1995) Bacterial genome sequence bagged (news). Science 269, 468–470.
  41. Fleischmann, R. D., Adams, M. D., White, O., Clayton, R. A., Kirkness, E. F., Kerlavage, A. R., Bult, C. J., Tomb, J.-F., Dougherty, B. A., Merrick, J. M., et al. (1995) Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269, 496–512.
  42. Adams, M. D., Kelley, J. M., Gocayne, J. D., Dubnick, M., Polymeropoulos, M. H., Xiao, H., Merril, C. R., Wu, A., Olde, B., Moreno, R. F., et al. (1991) Complementary DNA sequencing: expressed sequence tags and human genome project. Science 252, 1651–1656.
  43. Adams, M. D., Dubnick, M., Kerlavage, A. R., Moreno, R. F., Kelley, J. M., Utterback, T. R., Nagle, J. W., Fields, C. A., and Venter, J. C. (1992) Sequence identification of 2,375 human brain genes. Nature 355, 632–634.
  44. Adams, M. D., Kerlavage, A. R., Fields, C., and Venter, J. C. (1993) 3,400 new expressed sequence tags identify diversity of transcripts in human brain. Nature Genet. 4, 256–267.
  45. Adams, M. D., Soares, M. B., Kerlavage, A. R., Fields, C., and Venter, J. C. (1993) Rapid cDNA sequencing (expressed sequence tags) from a directionally cloned human infant brain cDNA library. Nature Genet. 4, 373–380.
  46. (1995) Merck releases first ‘gene index’ sequences (news). Nature 373, 549.
  47. Hillier, L. D., Lennon, G., Becker, M., Bonaldo, M. F., Chiapelli, B., Chissoe, S., Dietrich, N., DuBuque, T., Favello, A., Gish, W., Hawkins, M., Hultman, M., Kucaba, T., Lacy, M., Le, M., Le, N., Mardis, E., Moore, B., Morris, M., Parsons, J., Prange, C., Rifkin, L., Rohlfing, T., Schellenberg, K., Marra, M., et al. (1996) Generation and analysis of 280,000 human expressed sequence tags. Genome Res. 6, 807–828.
  48. Aaronson, J. S., Eckman, B., Blevins, R. A., Borkowski, J. A., Myerson, J., Imran, S., and Elliston, K. O. (1996) Toward the development of a gene index to the human genome: an assessment of the nature of high-throughput EST sequence data. Genome Res. 6, 829–845.
  49. Adams, M. D., Kerlavage, A. R., Fleischmann, R. D., Fuldner, R. A., Bult, C. J., Lee, N. H., Kirkness, E. F., Weinstock, K. G., Gocayne, J. D., White, O., et al. (1995) Initial assessment of human gene diversity and expression patterns based upon 83 million nucleotides of cDNA sequence. Nature 377 (6547 Suppl.), 3–174.
  50. Benson, D. A., Boguski, M., Lipman, D. J., and Ostell, J. (1994) GenBank. Nucleic Acids Res. 22, 3441–3444.
  51. Boguski, M. S., Lowe, T. M., and Tolstoshev, C. M. (1993) dbEST—database for “expressed sequence tags.”Nature Genet. 4, 332–333.
  52. Kuska, B. (1996) Cancer genome anatomy project set for take-off. J. Natl. Cancer Inst. 88, 1801–1803.
  53. O'Brien, C. (1997) Cancer genome anatomy project launched. Mol. Med. Today 3, 94.
  54. Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990) Basic local alignment search tool. J. Mol. Biol. 215, 403–410.
  55. Altschul, S. F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D. J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402.
  56. Claverie, J.-M. (1992) Identifying coding exons by similarity search: Alu-derived and other potentially misleading protein sequences. Genomics 12, 838–841.
  57. Gish, W. and States, D. J. (1993) Identification of protein coding regions by database similarity search. Nature Genet. 3, 266–272.
  58. Claverie, J.-M. (1994) A streamlined random sequencing strategy for finding coding exons. Genomics 23, 575–581.
  59. Oliver, S. G., van der Aart, Q. J., Agostoni-Carbone, M. L., Aigle, M., Alberghina, L., Alexandraki, D., Antoine, G., Anwar, R., Ballesta, J. P., Benit, P., et al. (1992) The complete DNA sequence of yeast chromosome III. Nature 357, 38–46.
  60. Dujon, B., Alexandraki, D., Andre, B., Ansorge, W., Baladron, V., Ballesta, J. P., Banrevi, A., Bolle, P. A., Bolotin-Fukuhara, M., Bossier, P., et al. (1994) Complete DNA sequence of yeast chromosome XI. Nature 369, 371–378.
  61. Wilson, R., Ainscough, R., Anderson, K., Baynes, C., Berks, M., Bonfield, J., Burton, J., Connell, M., Copsey, T., Cooper, J., et al. (1994) 2.2 Mb of contiguous nucleotide sequence from chromosome III of C. elegans. Nature 368, 32–38.
  62. Green, P., Lipman, D., Hillier, L., Waterston, R., States, D., and Claverie, J.-M. (1993) Ancient conserved regions in new gene sequences and the protein databases. Science 259, 1711–1716.
  63. Claverie, J.-M. (1993) Database of ancient sequences. Nature 364, 19,20.
  64. Bairoch, A. and Boeckmann, B. (1994) The SWISS-PROT protein sequence database: current status. Nucleic Acids Res. 22, 3578–3580.
  65. Brockdorff, N., Ashworth, A., Kay, G. F., McCabe, V. M., Norris, D. P., Cooper, P. J., Swift, S., and Rastan, S. (1992) The product of the mouse Xist gene is a 15 kb inactive X-specific transcript containing no conserved ORF and located in the nucleus. Cell 71, 515–526.
  66. Pfeifer, K., Leighton, P. A., and Tilghman, S. M. (1996) The structural H19 gene is required for transgene imprinting. Proc. Natl. Acad. Sci. USA 93, 13,876–13,883.
  67. Wevrick, R., and Francke, U. (1997) An imprinted mouse transcript homologous to the human imprinted in Prader-Willi syndrome (IPW) gene. Hum. Mol. Genet. 6, 325–332.
  68. Velleca, M. A., Wallace, M. C., and Merlie, J. P. (1994) A novel synapse-associated noncoding RNA. Mol. Cell. Biol. 14, 7095–7104.
  69. Askew, D. S., Li, J., and Ihle, J. N. (1994) Retroviral insertions in the murine His-1 locus activate the expression of a novel RNA that lacks an extensive open reading frame. Mol. Cell. Biol. 14, 1743–1751.
  70. Liu, A. Y., Torchia, B. S., Migeon, B. R., and Siliciano, R. F. (1997) The human NTT gene: identification of a novel 17-kb noncoding nuclear RNA expressed in activated CD4+ T cells. Genomics 39, 171–184.
  71. Fichant, G. A. and Burks, C. (1991) Identifying potential genes in genomic DNA sequences. J. Mol. Biol. 220, 659–671.
  72. Laferriere, A., Gautheret, D., and Cedergren, R. (1994) An RNA pattern matching program with enhanced performance and portability. Comput. Appl. Biosci. 10, 211,212.
  73. States, D. J., Gish, W., and Altschul, S. F. (1991) Improved sensitivity of nucleic acid database searches using application-specific scoring matrices. Methods 3, 66–70.
  74. Altschul, S. F. (1991) Amino acid substitution matrices from an information theoretic perspective. J. Mol. Biol. 219, 555–565.
  75. Claverie, J.-M. (1993) Detecting frame shifts by amino acid sequence comparison. J. Mol. Biol. 234, 1140–1157.
  76. Henikoff, S. and Henikoff, J. G. (1993) Performance evaluation of amino acid substitution matrices. Proteins 17, 49–61.
  77. Claverie, J.-M. (1994) A streamlined random sequencing strategy for finding coding exons. Genomics 23, 575–581.
  78. Rice, C. M. and Cameron, G. N. (1994) Submission of nucleotide sequence data to EMBL/GenBank/DDBJ. Methods Mol. Biol. 24, 355–366.
  79. Pearson, W. R. (1990) Rapid and sensitive sequence comparison with FASTP and FASTA. Methods Enzymol. 183, 4698–4702.
  80. Sturrock, S. and Collins, J. (1993) MPsrch version 1.3. Biocomputing Research Unit, University of Edinburgh, UK.
  81. Claverie, J.-M. and Makalowski, W. (1994) Alu alert. Nature 371, 752–752.
  82. Kehoe, B. P. (1996) Zen and the Art of the Internet: A Beginner's Guide. Fourth Edition. Prentice Hall: Englewood Cliffs, NJ.
  83. Internet for the Molecular Biologist (1996) (Swindell, S. R., Miller, R. R., and Myers, G., eds.), ISBN 1-898486-02-6, Horizon Scientific Press, London, UK.
  84. Claverie, J.-M. and States, D. (1993) Information enhancement methods for large scale sequence analysis. Computers Chem. 17, 191–201.
  85. Claverie, J.-M. (1994) Large scale sequence analysis, in _Automated DNA Sequencing and Analysis Techniques_ (Adams, M. D., Fields, C., and Venter, J. C., eds.), Academic Press, New York, pp. 267–279.
  86. Claverie, J.-M. (1996) Effective large scale sequence similarity searches, in _Computer Methods for Macromolecular Sequence Analysis_ (Doolittle, R., ed.), pp. 212–227.
  87. Altschul, S. F., Boguski, M. S., Gish, W., and Wootton, J. C. (1994) Issues in searching molecular sequence databases. Nature Genet. 6, 119–129.
  88. Burglin, T. R., and Barnes, T. M. (1992) Introns in sequence tags. Nature 357, 367.
  89. Smit, A. F. A. and Green, P. (1997) The RepeatMasker program, available at http://ftp.genome.washington.edu.
