GeneMark gene prediction

Gene Prediction in Bacteria, Archaea, Metagenomes and Metatranscriptomes
Novel prokaryotic genomic sequences longer than 50 kb can be analyzed by the self-training software tool GeneMarkS-2. For some species, pre-trained model parameters are available through GeneMark.hmm. Metagenomic sequences and individual short sequences (shorter than 50 kb) can be analyzed by MetaGeneMark.
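The length rule above can be sketched as a small routing helper. This is an illustrative sketch only, not part of the GeneMark software: the function names are hypothetical, 50 kb is assumed to mean 50,000 nucleotides, and a sequence of exactly 50 kb (a case the text does not cover) is arbitrarily sent to MetaGeneMark.

```python
from typing import Dict, Iterable, List, Tuple

# Threshold from the text: 50 kb, assumed here to mean 50,000 nucleotides.
LENGTH_THRESHOLD_NT = 50_000


def choose_prokaryotic_tool(sequence_length: int) -> str:
    """Return the name of the tool the text suggests for this length.

    Hypothetical helper: it only returns a tool name and does not
    invoke any GeneMark program.
    """
    if sequence_length < 0:
        raise ValueError("sequence length must be non-negative")
    if sequence_length > LENGTH_THRESHOLD_NT:
        return "GeneMarkS-2"
    # Assumption: exactly 50 kb is not covered by the text;
    # this sketch routes it to MetaGeneMark.
    return "MetaGeneMark"


def group_by_tool(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (name, sequence) pairs by the suggested tool."""
    groups: Dict[str, List[str]] = {"GeneMarkS-2": [], "MetaGeneMark": []}
    for name, sequence in records:
        groups[choose_prokaryotic_tool(len(sequence))].append(name)
    return groups
```

For example, `group_by_tool([("contig_1", "ACGT" * 20_000), ("contig_2", "ACGT" * 100)])` places `contig_1` (80,000 nt) under GeneMarkS-2 and `contig_2` (400 nt) under MetaGeneMark.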
Gene Prediction in Eukaryotes
Novel eukaryotic genomes can be analyzed by the self-training GeneMark-ES. The fungal mode of GeneMark-ES accounts for fungal-specific intron organization. GeneMark-ET extends GeneMark-ES with information from mapped RNA-Seq reads. GeneMark-EP+ extends GeneMark-ES with information from cross-species protein sequences. GeneMark-ETP extends GeneMark-ES with both types of external information: RNA-Seq reads and cross-species proteins.
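The relationship between the eukaryotic tools and the external evidence they use can be summarized as a simple lookup. The sketch below is illustrative only: the function name and its boolean parameters are hypothetical, it returns a tool name without running anything, and it does not model the fungal mode of GeneMark-ES.

```python
def choose_eukaryotic_tool(has_rnaseq_reads: bool, has_protein_evidence: bool) -> str:
    """Map the available external evidence to the tool described in the text.

    Hypothetical helper for illustration; it does not call any GeneMark program.
    """
    if has_rnaseq_reads and has_protein_evidence:
        return "GeneMark-ETP"  # RNA-Seq reads and cross-species proteins
    if has_rnaseq_reads:
        return "GeneMark-ET"   # mapped RNA-Seq reads only
    if has_protein_evidence:
        return "GeneMark-EP+"  # cross-species proteins only
    return "GeneMark-ES"       # self-training, no external evidence
```

With no external evidence the helper falls back to the self-training GeneMark-ES, mirroring the order in which the tools are introduced above.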
Gene Prediction in Transcripts
Sets of eukaryotic transcripts can be analyzed by GeneMarkS-T.
Gene Prediction in Viruses, Phages and Plasmids
Sequences of viruses, phages, or plasmids can be analyzed either by GeneMarkS (sequences longer than 50 kb) or by MetaGeneMark (sequences shorter than 50 kb).
All the software tools mentioned here are available for download. The GeneMark software is part of genome annotation pipelines at NIH NCBI (for prokaryotes) and DOE JGI (for eukaryotes), as well as other tools:
QUAST: assessment of genome assembly quality (uses GeneMarkS)
MetAMOS: a tool for metagenome assembly and analysis (uses MetaGeneMark)
Eukaryotic genome annotation pipelines:
MAKER2: uses GeneMark-ES along with SNAP and AUGUSTUS
BRAKER1: integrates RNA-Seq reads (uses GeneMark-ET and AUGUSTUS)
BRAKER2: integrates known proteins (uses GeneMark-EP+ and AUGUSTUS)
BRAKER3: integrates RNA-Seq reads and known proteins (uses GeneMark-ETP and AUGUSTUS)
For more information see Background and Publications.