FrameD: A flexible program for quality check and gene prediction in prokaryotic genomes and noisy matured eukaryotic sequences

Thomas Schiex et al. Nucleic Acids Res. 2003.

Abstract

We describe FrameD, a program that predicts coding regions in prokaryotic and matured eukaryotic sequences. Initially targeted at gene prediction in GC-rich bacterial genomes, the gene model used in FrameD also allows genes to be predicted in the presence of frameshifts and partially undetermined sequences, which makes it well suited to gene prediction and frameshift correction in unfinished sequences such as EST and EST cluster sequences. Like recent eukaryotic gene prediction programs, FrameD can also take protein similarity information into account, both in its prediction and in its graphical output. Its performance is evaluated on different bacterial genomes. The web site (http://genopole.toulouse.inra.fr/bioinfo/FrameD/FD) offers direct prediction, sequence correction and translation, as well as the ability to learn new models for new organisms.

Figures

Figure 1

A simplified view of the directed acyclic graph built for analyzing the sequence CATGAGTACNGA. This view ignores the additional complexity induced by overlapping gene regions and frameshift modeling. The occurrence of a START codon at positions 2 to 4 induces a ‘signal’ edge that goes from the non-coding track to the +2 coding track. Similarly, the occurrence of the NGA codon at the end induces a STOP signal edge. The sources of the edge weights are indicated by dotted arrows.
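The signal detection described in this caption can be illustrated with a minimal sketch. It is not FrameD's implementation: the function names, the restriction to ATG as the only START codon, and the frame numbering (frame = (position - 1) mod 3 + 1, chosen so that the ATG at positions 2 to 4 falls in frame 2 as in the caption) are assumptions made for illustration. The sketch treats N as a wildcard, so NGA is reported as a possible STOP because it is compatible with TGA.

```python
# Hypothetical sketch: locate START/STOP signals on the forward strand,
# treating N as "any nucleotide". Not taken from FrameD's source code.
START_CODONS = {"ATG"}                # assumption: ATG only, for simplicity
STOP_CODONS = {"TAA", "TAG", "TGA"}   # the three standard stop codons


def is_compatible(codon, targets):
    """Return True if codon matches any target codon, with N as a wildcard."""
    return any(
        all(c == "N" or c == t for c, t in zip(codon, target))
        for target in targets
    )


def find_signals(seq):
    """List (kind, first, last, frame) tuples using 1-based positions."""
    signals = []
    for i in range(len(seq) - 2):
        codon = seq[i:i + 3]
        first = i + 1          # 1-based position of the codon's first base
        frame = i % 3 + 1      # assumed frame numbering, see lead-in
        if is_compatible(codon, START_CODONS):
            signals.append(("START", first, first + 2, frame))
        if is_compatible(codon, STOP_CODONS):
            signals.append(("STOP", first, first + 2, frame))
    return signals


signals = find_signals("CATGAGTACNGA")
for signal in signals:
    print(signal)
```

In a graph such as the one in Figure 1, each reported signal would correspond to a candidate edge between the non-coding track and a coding track; the edge weights themselves are not modeled here.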

Figure 2

An example of the graphical output of FrameD. The sequence is on the x-axis. The y-axis corresponds to possible predictions. From top to bottom: frame 3, 2, 1 coding tracks, intergenic track (IG) and frame −1, −2, −3 coding tracks. In-frame START codons are represented as blue vertical lines: the longer the line, the better the candidate ribosome binding site (RBS). In-frame STOP codons are represented as small red vertical lines (grey if the STOP codon is degenerate). Thin black lines represent the smoothed, normalized coding/non-coding score. Finally, BlastX hits are represented as magenta blocks. The prediction itself is visible as red blocks and the ‘mean’ prediction as a thin grey line. The thin magenta line represents frameshift expectation. The sequence here has been specifically modified for the example: the ATG start codon of the gene, at position 148, has been replaced by ANG, and the 15 nucleotides from position 915 to 929 have been replaced by 14 Ns. Using the ‘low quality sequence’ frameshift penalty, FrameD correctly predicts a gene that starts at position 148 and a frameshift between positions 911 and 912. The frameshift expectation and the ‘mean’ prediction make clear the uncertainties on the frameshift and START positions.
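The sequence modification described in this caption can be reproduced on any sequence with a short sketch. The input below is a synthetic placeholder, not the sequence shown in the figure, and the helper names are hypothetical. Replacing 15 nucleotides with 14 Ns shortens the sequence by one base, which is what shifts the reading frame downstream of the edit.

```python
# Hypothetical sketch of the edits described in the Figure 2 caption.
def replace_span(seq, start, end, replacement):
    """Replace 1-based inclusive positions start..end with replacement."""
    return seq[:start - 1] + replacement + seq[end:]


def degrade(seq):
    """Mask the start codon at 148-150 and swap 915-929 for 14 Ns."""
    # Apply the downstream edit first; the start codon edit keeps the
    # length unchanged, so both sets of coordinates remain valid.
    seq = replace_span(seq, 915, 929, "N" * 14)
    seq = replace_span(seq, 148, 150, "ANG")
    return seq


# Synthetic placeholder: 147 filler bases, an ATG at 148-150, then codons.
original = "C" * 147 + "ATG" + "GCT" * 300
degraded = degrade(original)

print(len(original), len(degraded))
print(degraded[147:150], degraded[914:928])
```

Because the length difference is one base, a predictor that does not model frameshifts would lose the reading frame after the masked region.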
