Computational methods for exon detection
References
Pearce, M., Blake, D. J., Tinsley, J. M., Byth, B. C., Campbell, L., Monaco, A. P., and Davies, K. E. (1993) The utrophin and dystrophin genes share similarities in genomic structure. Hum. Mol. Genet. 2, 1765–1772.
Levinson, B., Kenwrick, S., Gamel, P., Fisher, K., and Gitschier, J. (1992) Evidence for a third transcript from the human factor VIII gene. Genomics 14, 585–589.
De Backer, O., Verheyden, A. M., Martin, B., Godelaine, D., De Plaen, E., Brasseur, R., Avner, P., and Boon, T. (1995) Structure, chromosomal location, and expression pattern of three mouse genes homologous to the human MAGE genes. Genomics 28, 74–83.
Legouis, R., Hardelin, J.-P., Levilliers, J., Claverie, J.-M., Compain, S., Wunderle, V., Millasseau, P., Le Paslier, D., Cohen, D., Caterina, D., Bougueleret, L., Lutfalla, G., Weissenbach, J., and Petit, C. (1991) The candidate gene for the X-linked Kallmann syndrome encodes a protein related to adhesion molecules. Cell 67, 423–435.
Senapathy, P., Shapiro, M. B., and Harris, N. L. (1990) Splice junctions, branch point sites, and exons: sequence statistics, identification, and applications to genome project. Methods Enzymol. 183, 252–278.
Stormo, G. D. (1990) Consensus patterns in DNA. Methods Enzymol. 183, 211–221.
Brunak, S., Engelbrecht, J., and Knudsen, S. (1991) Prediction of human mRNA donor and acceptor sites from the DNA sequence. J. Mol. Biol. 220, 49–65.
Simmler, M. C., Cunningham, D., Clerc, P., Vermat, T., Cruaud, C., Pawlak, A., Szpirer, C., Weissenbach, J., Claverie, J.-M., and Avner, P. (1996) A 94 kb genomic sequence 3′ to the murine Xist gene reveals an AT-rich region containing a new testis specific gene Tex. Hum. Mol. Genet. 5, 1713–1726.
Hawkins, J. D. (1988) A survey of intron and exon lengths. Nucleic Acids Res. 16, 9893–9908.
Snyder, E. E. and Stormo, G. D. (1995) Identification of protein coding regions in genomic DNA. J. Mol. Biol. 248, 1–18.
Grantham, R., Gautier, C., Gouy, M., Mercier, R., and Pavé, A. (1980) Codon catalog usage and the genome hypothesis. Nucleic Acids Res. 8, r49–r60.
Staden, R. (1990) Finding protein coding regions in genomic sequences. Methods Enzymol. 183, 163–180.
Shepherd, J. C. W. (1990) Ancient patterns in nucleic acid sequences. Methods Enzymol. 183, 180–192.
Fickett, J. W. (1982) Recognition of protein coding regions in DNA sequences. Nucleic Acids Res. 10, 5303–5318.
Claverie, J.-M. and Bougueleret, L. (1986) Heuristic informational analysis of sequences. Nucleic Acids Res. 14, 179–196.
Beckmann, J. S., Brendel, V., and Trifonov, E. N. (1986) Intervening sequences exhibit distinct vocabulary. J. Biomol. Struct. Dyn. 4, 391–400.
Borodovsky, M., Sprizhitskii, Y. A., Golovanov, E. I., and Aleksandrov, A. A. (1986) Statistical patterns in primary structure of the functional regions of the genome in E. coli. III. Computer recognition of coding regions. Molekulyarnaya Biologiya 20, 1390–1398.
Fickett, J. W. and Tung, C.-S. (1992) Assessment of protein coding measures. Nucleic Acids Res. 20, 6441–6450.
Claverie, J.-M., Sauvaget, I., and Bougueleret, L. (1990) k-tuple frequency analysis: from intron/exon discrimination to T-cell epitope mapping. Methods Enzymol. 183, 237–252.
Bougueleret, L., Tekaia, F., Sauvaget, I., and Claverie, J.-M. (1988) Objective comparison of exon and intron sequences by the mean of 2-dimensional data analysis methods. Nucleic Acids Res. 16, 1729–1738.
Borodovsky, M. Y., Rudd, K. E., and Koonin, E. V. (1994) Intrinsic and extrinsic approaches for detecting genes in a bacterial genome. Nucleic Acids Res. 22, 4756–4767.
Uberbacher, E. C. and Mural, R. J. (1991) Locating protein-coding regions in DNA sequences by a multiple sensor-neural approach. Proc. Natl. Acad. Sci. USA 88, 11261–11265.
Xu, Y., Einstein, J. R., Mural, R. J., Shah, M. B., and Uberbacher, E. C. (1994) Recognizing exons in genomic sequence using GRAIL II, in: Genetic Engineering: Principles and Methods (Setlow, J., ed.), Plenum Press.
Sulston, J., Du, Z., Thomas, K., Wilson, R., Hillier, L., Staden, R., Halloran, N., Green, P., Thierry-Mieg, J., Qiu, L., et al. (1992) The C. elegans genome sequencing project: a beginning. Nature 356, 37–41.
Guigo, R., Knudsen, S., Drake, N., and Smith, T. (1992) Prediction of gene structure. J. Mol. Biol. 226, 141–157.
Solovyev, V. V., Salamov, A. A., and Lawrence, C. B. (1994) Predicting internal exons by oligonucleotide composition and discriminant analysis of spliceable open reading frames. Nucleic Acids Res. 22, 5156–5163.
Zhang, M. Q. (1997) Identification of protein coding regions in the human genome by quadratic discriminant analysis. Proc. Natl. Acad. Sci. USA 94, 565–568.
Claverie, J.-M. (1997) Computational methods for the identification of genes in vertebrate genomic sequences. Hum. Mol. Genet. 6, 1735–1744.
Wu, T. D. (1996) A segment-based dynamic programming algorithm for predicting gene structure. J. Comput. Biol. 3, 375–394.
Burge, C. and Karlin, S. (1997) Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78–94.
Xu, Y., Mural, R. J., and Uberbacher, E. C. (1994) Constructing gene models from accurately predicted exons: an application of dynamic programming. Comput. Appl. Biosci. 10, 613–623.
Claverie, J.-M. (1995) Progress in large scale sequence analysis, in: Advances in Computational Biology (Villar, H., ed.), Vol. 2, JAI Press, London.
Lopez, R., Larsen, F., and Prydz, H. (1994) Evaluation of the exon prediction of the Grail software. Genomics 24, 133–136.
Ansari-Lari, M. A., Shen, Y., Muzny, D. M., Lee, W., and Gibbs, R. A. (1997) Large-scale sequencing in human chromosome 12p13: experimental and computational gene structure determination. Genome Res. 7, 268–280.
Ansari-Lari, M. A., Muzny, D. M., Lu, J., Lu, F., Lilley, C. E., Spanos, S., Malley, T., and Gibbs, R. A. (1996) A gene-rich cluster between the CD4 and triose-phosphate isomerase genes at human chromosome 12p13. Genome Res. 6, 314–326.
Hunkapiller, T., Kaiser, R. J., Koop, B. F., and Hood, L. (1991) Large-scale and automated DNA sequence determination. Science 254, 59–67.
Fleischmann, R. D., Adams, M. D., White, O., Clayton, R. A., Kirkness, E. F., Kerlavage, A. R., Bult, C. J., Tomb, J.-F., Dougherty, B. A., Merrick, J. M., et al. (1995) Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269, 496–512.
Adams, M. D., Kelley, J. M., Gocayne, J. D., Dubnick, M., Polymeropoulos, M. H., Xiao, H., Merril, C. R., Wu, A., Olde, B., Moreno, R. F., et al. (1991) Complementary DNA sequencing: expressed sequence tags and human genome project. Science 252, 1651–1656.
Adams, M. D., Dubnick, M., Kerlavage, A. R., Moreno, R. F., Kelley, J. M., Utterback, T. R., Nagle, J. W., Fields, C. A., and Venter, J. C. (1992) Sequence identification of 2,375 human brain genes. Nature 355, 632–634.
Adams, M. D., Kerlavage, A. R., Fields, C., and Venter, J. C. (1993) 3,400 new expressed sequence tags identify diversity of transcripts in human brain. Nature Genet. 4, 256–267.
Adams, M. D., Soares, M. B., Kerlavage, A. R., Fields, C., and Venter, J. C. (1993) Rapid cDNA sequencing (expressed sequence tags) from a directionally cloned human infant brain cDNA library. Nature Genet. 4, 373–380.
(1995) Merck releases first ‘gene index’ sequences [news]. Nature 373, 549.
Hillier, L. D., Lennon, G., Becker, M., Bonaldo, M. F., Chiapelli, B., Chissoe, S., Dietrich, N., DuBuque, T., Favello, A., Gish, W., Hawkins, M., Hultman, M., Kucaba, T., Lacy, M., Le, M., Le, N., Mardis, E., Moore, B., Morris, M., Parsons, J., Prange, C., Rifkin, L., Rohlfing, T., Schellenberg, K., Marra, M., et al. (1996) Generation and analysis of 280,000 human expressed sequence tags. Genome Res. 6, 807–828.
Aaronson, J. S., Eckman, B., Blevins, R. A., Borkowski, J. A., Myerson, J., Imran, S., and Elliston, K. O. (1996) Toward the development of a gene index to the human genome: an assessment of the nature of high-throughput EST sequence data. Genome Res. 6, 829–845.
Adams, M. D., Kerlavage, A. R., Fleischmann, R. D., Fuldner, R. A., Bult, C. J., Lee, N. H., Kirkness, E. F., Weinstock, K. G., Gocayne, J. D., White, O., et al. (1995) Initial assessment of human gene diversity and expression patterns based upon 83 million nucleotides of cDNA sequence. Nature 377 (6547 Suppl.), 3–174.
Benson, D. A., Boguski, M., Lipman, D. J., and Ostell, J. (1994) GenBank. Nucleic Acids Res. 22, 3441–3444.
Boguski, M. S., Lowe, T. M., and Tolstoshev, C. M. (1993) dbEST: database for “expressed sequence tags.” Nature Genet. 4, 332–333.
Kuska, B. (1996) Cancer genome anatomy project set for take-off. J. Natl. Cancer Inst. 88, 1801–1803.
O'Brien, C. (1997) Cancer genome anatomy project launched. Mol. Med. Today 3, 94.
Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990) Basic local alignment search tool. J. Mol. Biol. 215, 403–410.
Altschul, S. F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D. J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402.
Claverie, J.-M. (1992) Identifying coding exons by similarity search: Alu-derived and other potentially misleading protein sequences. Genomics 12, 838–841.
Gish, W. and States, D. J. (1993) Identification of protein coding regions by database similarity search. Nature Genet. 3, 266–272.
Claverie, J.-M. (1994) A streamlined random sequencing strategy for finding coding exons. Genomics 23, 575–581.
Oliver, S. G., van der Aart, Q. J., Agostoni-Carbone, M. L., Aigle, M., Alberghina, L., Alexandraki, D., Antoine, G., Anwar, R., Ballesta, J. P., Benit, P., et al. (1992) The complete DNA sequence of yeast chromosome III. Nature 357, 38–46.
Dujon, B., Alexandraki, D., Andre, B., Ansorge, W., Baladron, V., Ballesta, J. P., Banrevi, A., Bolle, P. A., Bolotin-Fukuhara, M., Bossier, P., et al. (1994) Complete DNA sequence of yeast chromosome XI. Nature 369, 371–378.
Wilson, R., Ainscough, R., Anderson, K., Baynes, C., Berks, M., Bonfield, J., Burton, J., Connell, M., Copsey, T., Cooper, J., et al. (1994) 2.2 Mb of contiguous nucleotide sequence from chromosome III of C. elegans. Nature 368, 32–38.
Green, P., Lipman, D., Hillier, L., Waterston, R., States, D., and Claverie, J.-M. (1993) Ancient conserved regions in new gene sequences and the protein databases. Science 259, 1711–1716.
Bairoch, A. and Boeckmann, B. (1994) The SWISS-PROT protein sequence database: current status. Nucleic Acids Res. 22, 3578–3580.
Brockdorff, N., Ashworth, A., Kay, G. F., McCabe, V. M., Norris, D. P., Cooper, P. J., Swift, S., and Rastan, S. (1992) The product of the mouse Xist gene is a 15 kb inactive X-specific transcript containing no conserved ORF and located in the nucleus. Cell 71, 515–526.
Pfeifer, K., Leighton, P. A., and Tilghman, S. M. (1996) The structural H19 gene is required for transgene imprinting. Proc. Natl. Acad. Sci. USA 93, 13876–13883.
Wevrick, R. and Francke, U. (1997) An imprinted mouse transcript homologous to the human imprinted in Prader-Willi syndrome (IPW) gene. Hum. Mol. Genet. 6, 325–332.
Velleca, M. A., Wallace, M. C., and Merlie, J. P. (1994) A novel synapse-associated noncoding RNA. Mol. Cell. Biol. 14, 7095–7104.
Askew, D. S., Li, J., and Ihle, J. N. (1994) Retroviral insertions in the murine His-1 locus activate the expression of a novel RNA that lacks an extensive open reading frame. Mol. Cell. Biol. 14, 1743–1751.
Liu, A. Y., Torchia, B. S., Migeon, B. R., and Siliciano, R. F. (1997) The human NTT gene: identification of a novel 17-kb noncoding nuclear RNA expressed in activated CD4+ T cells. Genomics 39, 171–184.
Fichant, G. A. and Burks, C. (1991) Identifying potential tRNA genes in genomic DNA sequences. J. Mol. Biol. 220, 659–671.
Laferriere, A., Gautheret, D., and Cedergren, R. (1994) An RNA pattern matching program with enhanced performance and portability. Comput. Appl. Biosci. 10, 211–212.
States, D. J., Gish, W., and Altschul, S. F. (1991) Improved sensitivity of nucleic acid database searches using application-specific scoring matrices. Methods 3, 66–70.
Altschul, S. F. (1991) Amino acid substitution matrices from an information theoretic perspective. J. Mol. Biol. 219, 555–565.
Claverie, J.-M. (1993) Detecting frame shifts by amino acid sequence comparison. J. Mol. Biol. 234, 1140–1157.
Henikoff, S. and Henikoff, J. G. (1993) Performance evaluation of amino acid substitution matrices. Proteins 17, 49–61.
Claverie, J.-M. (1994) A streamlined random sequencing strategy for finding coding exons. Genomics 23, 575–581.
Rice, C. M. and Cameron, G. N. (1994) Submission of nucleotide sequence data to EMBL/GenBank/DDBJ. Methods Mol. Biol. 24, 355–366.
Pearson, W. R. (1990) Rapid and sensitive sequence comparison with FASTP and FASTA. Methods Enzymol. 183, 63–98.
Sturrock, S. and Collins, J. (1993) MPsrch version 1.3. Biocomputing Research Unit, University of Edinburgh, UK.
Kehoe, B. P. (1996) Zen and the Art of the Internet: A Beginner's Guide, 4th ed. Prentice Hall, Englewood Cliffs, NJ.
Swindell, S. R., Miller, R. R., and Myers, G., eds. (1996) Internet for the Molecular Biologist. Horizon Scientific Press, London, UK. ISBN 1-898486-02-6.
Claverie, J.-M. and States, D. (1993) Information enhancement methods for large scale sequence analysis. Comput. Chem. 17, 191–201.
Claverie, J.-M. (1994) Large scale sequence analysis, in: Automated DNA Sequencing and Analysis Techniques (Adams, M. D., Fields, C., and Venter, J. C., eds.), Academic Press, New York, pp. 267–279.
Claverie, J.-M. (1996) Effective large scale sequence similarity searches, in: Computer Methods for Macromolecular Sequence Analysis (Doolittle, R., ed.), pp. 212–227.
Altschul, S. F., Boguski, M. S., Gish, W., and Wootton, J. C. (1994) Issues in searching molecular sequence databases. Nature Genet. 6, 119–129.