Computational prediction of eukaryotic protein-coding genes
Claverie, J.-M. Computational methods for the identification of genes in vertebrate genomic sequences. Hum. Mol. Genet. 6, 1735–1744 (1997).
Burge, C. & Karlin, S. Prediction of complete gene structure in human genomic DNA. J. Mol. Biol. 268, 78–94 (1997). In this paper, the popular Genscan gene-prediction algorithm was first reported.
Milanesi, L. & Rogozin, I. B. in Guide to Human Genome Computing 2nd edn (ed. Bishop, M. J.) 215–260 (Academic, New York, 1998).
Krogh, A. in Guide to Human Genome Computing 2nd edn (ed. Bishop, M. J.) 261–274 (Academic, New York, 1998).
Pavy, N. et al. Evaluation of gene prediction software using a genomic data set: application to Arabidopsis thaliana sequences. Bioinformatics 15, 887–899 (1999).
Rogic, S., Mackworth, A. K. & Ouellette, F. B. F. Evaluation of gene-finding programs on mammalian sequences. Genome Res. 11, 817–832 (2001).
Solovyev, V. V. in Current Topics in Computational Molecular Biology (eds Jiang, T., Xu, Y. & Zhang, M. Q.) 201–248 (MIT Press, Cambridge, Massachusetts, 2002). An up-to-date introduction to, and review of, computational gene-prediction methods.
Brent, M. R. Predicting full-length transcripts. Trends Biotechnol. 20, 273–275 (2002).
Zhang, M. Q. Statistical features of human exons and their flanking regions. Hum. Mol. Genet. 7, 919–932 (1998).
Senapathy, P., Shapiro, M. B. & Harris, N. L. Splice junctions, branch point sites, and exons: sequence statistics, identification and application to genome project. Methods Enzymol. 183, 252–278 (1990). A good introduction to the statistical features of splicing signals and exons.
Chen, T. & Zhang, M. Q. POMBE: a fission yeast gene-finding and exon–intron structure prediction system. Yeast 14, 701–710 (1998).
Lim, L. P. & Burge, C. B. A computational analysis of sequence features involved in recognition of short introns. Proc. Natl Acad. Sci. USA 98, 11193–11198 (2001). A systematic study of the sequence features that might define a short intron.
Robberson, B. L., Cote, G. J. & Berget, S. M. Exon definition may facilitate splice site selection in RNAs with multiple exons. Mol. Cell. Biol. 10, 84–94 (1990).
Ripley, B. D. Pattern Recognition and Neural Networks (Cambridge Univ. Press, Cambridge, UK, 1996).
Solovyev, V. V., Salamov, A. A. & Lawrence, C. B. Predicting internal exons by oligonucleotide composition and discriminant analysis of spliceable open reading frames. Nucleic Acids Res. 22, 248–250 (1994).
Pertea, M., Lin, X. & Salzberg, S. L. GeneSplicer: a new computational method for splice site prediction. Nucleic Acids Res. 29, 1185–1190 (2001).
Fickett, J. W. & Tung, C.-S. Assessment of protein coding measures. Nucleic Acids Res. 20, 6441–6450 (1992). This is a comprehensive assessment of protein-coding measures, which are used in many gene-prediction algorithms.
Salzberg, S. L., Delcher, A. L., Kasif, S. & White, O. Microbial gene identification using interpolated Markov models. Nucleic Acids Res. 26, 544–548 (1998).
Bernardi, G. The human genome: organization and evolutionary history. Annu. Rev. Genet. 29, 445–476 (1995).
Zhang, M. Q. Identification of protein coding regions in the human genome based on quadratic discriminant analysis. Proc. Natl Acad. Sci. USA 94, 565–568 (1997).
Uberbacher, E. C. & Mural, R. J. Locating protein coding segments in human DNA sequences by a multiple sensor-neural network approach. Proc. Natl Acad. Sci. USA 88, 11261–11265 (1991).
Graber, J. H., Cantor, C. R., Mohr, S. C. & Smith, T. F. In silico detection of control signals: mRNA 3′-end-processing sequences in diverse species. Proc. Natl Acad. Sci. USA 96, 14055–14060 (1999).
Tabaska, J. E. & Zhang, M. Q. Detection of polyadenylation signals in human DNA sequences. Gene 231, 77–86 (1999).
Tabaska, J. E., Davuluri, R. V. & Zhang, M. Q. Identifying the 3′-terminal exon in human DNA. Bioinformatics 17, 602–607 (2001).
Schell, T., Kulozik, A. E. & Hentze, M. W. Integration of splicing, transport and translation to achieve mRNA quality control by the nonsense-mediated decay pathway. Genome Biol. 3, REVIEWS1006 (2002).
Cartegni, L., Chew, S. L. & Krainer, A. R. Listening to silence and understanding nonsense: exonic mutations that affect splicing. Nature Rev. Genet. 3, 285–298 (2002).
Suzuki, Y. et al. DBTSS: database of human transcriptional start sites and full-length cDNAs. Nucleic Acids Res. 30, 328–331 (2002).
Carey, M. & Smale, S. T. Transcriptional Regulation in Eukaryotes: Concepts, Strategies, and Techniques (Cold Spring Harbor Laboratory Press, New York, 2000).
Fickett, J. W. & Hatzigeorgiou, A. G. Eukaryotic promoter recognition. Genome Res. 7, 861–878 (1997). The first comparison of promoter prediction programs.
Werner, T. Models for prediction and recognition of eukaryotic promoters. Mamm. Genome 10, 168–175 (1999).
Ohler, U. & Niemann, H. Identification and analysis of eukaryotic promoters: recent computational approaches. Trends Genet. 17, 56–60 (2001).
Zhang, M. Q. in Current Topics in Computational Molecular Biology (eds Jiang, T., Xu, Y. & Zhang, M. Q.) 249–268 (MIT Press, Cambridge, Massachusetts, 2002).
Ioshikhes, I. P. & Zhang, M. Q. Large-scale human promoter mapping using CpG islands. Nature Genet. 26, 61–63 (2000).
Scherf, M., Klingenhoff, A. & Werner, T. Highly specific localization of promoter regions in large genomic sequences by PromoterInspector: a novel context analysis approach. J. Mol. Biol. 297, 599–606 (2000).
Solovyev, V. & Salamov, A. The Gene-Finder computer tools for analysis of human and model organisms genome sequences. Proc. Int. Conf. Intell. Syst. Mol. Biol. 5, 294–302 (1997).
Down, T. A. & Hubbard, T. J. P. Computational detection and location of transcription start sites in mammalian genomic DNA. Genome Res. 12, 458–461 (2002).
Frech, K., Quandt, K. & Werner, T. Muscle actin genes: a first step towards computational classification of tissue specific promoters. In Silico Biol. 1, 29–38 (1998).
Kel, A., Kel-Margoulis, O., Babenko, V. & Wingender, E. Recognition of NFATp/AP-1 composite elements within genes induced upon the activation of immune cells. J. Mol. Biol. 288, 353–376 (1999).
Kozak, M. A progress report on translational control in eukaryotes. Sci. STKE 2001, PE1 (2001).
Davuluri, R. V., Grosse, I. & Zhang, M. Q. Computational identification of promoters and first exons in the human genome. Nature Genet. 29, 412–417 (2001). The first report of a first-exon prediction algorithm.
Fickett, J. W. ORFs and genes: how strong a connection? J. Comput. Biol. 2, 117–123 (1995).
Harrison, P. M. et al. Molecular fossils in the human genome: identification and analysis of the pseudogenes in chromosomes 21 and 22. Genome Res. 12, 272–280 (2002).
Gelfand, M. S. & Roytberg, M. A. Prediction of the exon–intron structure by a dynamic programming approach. Biosystems 30, 173–182 (1993).
Snyder, E. E. & Stormo, G. D. Identification of coding regions in genomic DNA sequences: an application of dynamic programming and neural networks. Nucleic Acids Res. 21, 607–613 (1993).
Stormo, G. D. & Haussler, D. Optimally parsing a sequence into different classes based on multiple types of evidence. Proc. Int. Conf. Intell. Syst. Mol. Biol. 2, 369–375 (1994).
Rabiner, L. R. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77, 257–286 (1989).
Krogh, A. Two methods for improving performance of an HMM and their application for gene finding. Proc. Int. Conf. Intell. Syst. Mol. Biol. 5, 179–186 (1997).
Kulp, D., Haussler, D., Reese, M. G. & Eeckman, F. H. A generalized hidden Markov model for the recognition of human genes in DNA. Proc. Int. Conf. Intell. Syst. Mol. Biol. 4, 134–142 (1996).
Hooper, P. M., Zhang, H. & Wishart, D. S. Prediction of genetic structure in eukaryotic DNA using reference point logistic regression and sequence alignment. Bioinformatics 16, 425–438 (2000).
Cox, D. R. & Snell, E. J. Analysis of Binary Data 2nd edn (Chapman & Hall, London, 1989).
Rogic, S., Mackworth, A. K. & Ouellette, F. B. F. Improving gene recognition accuracy by combining predictions from two gene-finding programs. Bioinformatics (in the press).
Lukashin, A. V. & Borodovsky, M. GeneMark.hmm: new solutions for gene finding. Nucleic Acids Res. 26, 1107–1115 (1998).
Reese, M. G., Kulp, D., Tammana, H. & Haussler, D. Genie — gene finding in Drosophila melanogaster. Genome Res. 10, 529–538 (2000).
Burset, M. & Guigo, R. Evaluation of gene structure prediction programs. Genomics 34, 353–367 (1996). The first comprehensive evaluation of gene-prediction programs using a common standard training set.
Korf, I., Flicek, P., Duan, D. & Brent, M. R. Integrating genomic homology into gene structure prediction. Bioinformatics 17 (Suppl.), 140–148 (2001).
Frisch, M. et al. In silico prediction of scaffold/matrix attachment regions in large genome sequences. Genome Res. 12, 349–354 (2002).
Zhan, H. C., Liu, D. P. & Liang, C. C. Insulator: from chromatin domain boundary to gene regulation. Hum. Genet. 109, 471–478 (2001).
Gish, W. & States, D. J. Identification of protein coding regions by database similarity search. Nature Genet. 3, 266–272 (1993).
Florea, L. et al. A computer program for aligning a cDNA sequence with a genomic DNA sequence. Genome Res. 8, 967–974 (1998).
Gelfand, M. S., Mironov, A. & Pevzner, P. Gene recognition via spliced sequence alignment. Proc. Natl Acad. Sci. USA 93, 9061–9066 (1996).
Kulp, D., Haussler, D., Reese, M. G. & Eeckman, F. H. Integrating database homology in a probabilistic gene structure model. Pacif. Symp. Biocomput. 232–244 (1997).
Xu, Y. & Uberbacher, E. C. Gene prediction by pattern recognition and homology search. Proc. Int. Conf. Intell. Syst. Mol. Biol. 4, 241–251 (1996).
Krogh, A. Using database matches with HMMgene for automated gene detection in Drosophila. Genome Res. 10, 523–528 (2000).
Gotoh, O. Homology-based gene structure prediction: simplified matching algorithm using a translated codon (tron) and improved accuracy by allowing for long gaps. Bioinformatics 16, 190–202 (2000).
Guigo, R. et al. An assessment of gene prediction accuracy in large DNA sequences. Genome Res. 10, 1631–1642 (2000). A comparison of ab initio and alignment-based gene-prediction programs.
Yeh, R. F., Lim, L. P. & Burge, C. B. Computational inference of homologous gene structures in the human genome. Genome Res. 11, 803–816 (2001).
Pennacchio, L. A. & Rubin, E. M. Genomic strategies to identify mammalian regulatory sequences. Nature Rev. Genet. 2, 100–119 (2001).
Mayor, C. et al. VISTA: visualizing global DNA sequence alignment of arbitrary length. Bioinformatics 16, 1046–1047 (2000).
Schwartz, S. et al. PipMaker — a web server for aligning two genomic DNA sequences. Genome Res. 10, 577–586 (2000).
Batzoglou, S. et al. Human and mouse gene structure: comparative analysis and application to exon prediction. Genome Res. 10, 950–958 (2000).
Kent, W. J. & Zahler, A. M. Conservation, regulation, synteny, and introns in a large C. briggsae–C. elegans genomic alignment. Genome Res. 10, 1115–1125 (2000).
Bafna, V. & Huson, D. H. The conserved exon method for gene finding. Proc. Int. Conf. Intell. Syst. Mol. Biol. 8, 3–12 (2000).
Wiehe, T., Gebauer-Jung, S., Mitchell-Olds, T. & Guigo, R. SGP-1: prediction and validation of homologous genes based on sequence alignments. Genome Res. 11, 1574–1583 (2001).
Pachter, L., Alexandersson, M. & Cawley, S. Applications of generalized pair hidden Markov models to alignment and gene finding problems. J. Comput. Biol. 9, 389–399 (2002).
Claverie, J.-M. From bioinformatics to computational biology. Genome Res. 10, 1277–1279 (2000).
Zhang, M. Q. Predicting full-length transcripts. Nature Biotechnol. 20, 275 (2002).
Miyajima, N., Burge, C. B. & Saito, T. Computational and experimental analysis identifies many novel human genes. Biochem. Biophys. Res. Commun. 272, 801–807 (2000).
Shoemaker, D. D. et al. Experimental annotation of the human genome using microarray technology. Nature 409, 922–927 (2001).
Kapranov, P. et al. Large-scale transcriptional activity in chromosomes 21 and 22. Science 296, 916–919 (2002).
Lee, S. et al. Correct identification of genes from serial analysis of gene expression tag sequences. Genomics 79, 598–602 (2002).
Horak, C. E. & Snyder, M. ChIP-chip: a genomic approach for identifying transcription factor binding sites. Methods Enzymol. 350, 469–483 (2002).
Clark, T. A., Sugnet, C. W. & Ares, M. Jr. Genomewide analysis of mRNA processing in yeast using splicing-specific microarrays. Science 296, 907–910 (2002).
Yeakley, J. M. et al. Profiling alternative splicing on fiber-optic arrays. Nature Biotechnol. 20, 353–358 (2002).
Goldstrohm, A. C., Greenleaf, A. L. & Garcia-Blanco, M. A. Co-transcriptional splicing of pre-messenger RNAs: considerations for the mechanism of alternative splicing. Gene 277, 31–47 (2001).
Proudfoot, N. J., Furger, A. & Dye, M. J. Integrating mRNA processing with transcription. Cell 108, 501–512 (2002). A recent review of the interdependence of transcription and RNA processing.