Improved profile HMM performance by assessment of critical algorithmic features in SAM and HMMER
Markus Wistrand et al. BMC Bioinformatics. 2005.
Abstract
Background: Profile hidden Markov model (HMM) techniques are among the most powerful methods for protein homology detection. Yet the features that are critical for successful modelling are not fully known. In the present work we investigated them by comparing two of the most popular HMM packages, SAM and HMMER. The programs' abilities to build models and to score sequences were compared on a SCOP/Pfam-based test set. The comparison was done separately for local and global HMM scoring.
Results: Using default settings, SAM was overall more sensitive. SAM's model estimation was superior, while HMMER's model scoring was more accurate. Critical features for model building were then analysed by comparing the two packages' algorithmic choices and parameters. The weighting between prior probabilities and multiple alignment counts was the primary explanation for SAM's superior model building. Our analysis suggests that HMMER gives too much weight to the sequence counts. Models built with SAM's emission prior probabilities were also shown to be more sensitive. The relative sequence weighting schemes differ between the two packages but performed equivalently.
Conclusion: SAM model estimation was more sensitive, while HMMER model scoring was more accurate. By combining the best algorithmic features from both packages the accuracy was substantially improved compared to their default performance.
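The trade-off between prior probabilities and alignment counts described in the Results can be illustrated with a minimal sketch. It assumes a single Dirichlet prior over a toy three-letter alphabet, whereas both packages use richer priors; the function name, the pseudocount values and the example counts are hypothetical and are not taken from SAM or HMMER. The point is only that the total sequence weight decides how far the observed column counts pull the emission estimate away from the prior.

```python
def posterior_mean_emissions(counts, pseudocounts, total_weight):
    """Posterior-mean emission probabilities for one alignment column.

    counts:        observed (relatively weighted) residue counts, residue -> float
    pseudocounts:  Dirichlet prior parameters (hypothetical values), residue -> float
    total_weight:  the sum the counts are rescaled to before mixing with the prior
    """
    observed = sum(counts.values())
    # Rescale the counts so that they sum to total_weight (no data: prior only).
    scale = total_weight / observed if observed > 0 else 0.0
    effective_weight = total_weight if observed > 0 else 0.0
    denom = effective_weight + sum(pseudocounts.values())
    return {
        residue: (scale * counts.get(residue, 0.0) + alpha) / denom
        for residue, alpha in pseudocounts.items()
    }


if __name__ == "__main__":
    column = {"A": 8.0, "C": 2.0}            # toy column counts (assumption)
    prior = {"A": 1.0, "C": 1.0, "D": 1.0}   # toy flat prior (assumption)
    for weight in (10.0, 1.0):
        estimate = posterior_mean_emissions(column, prior, weight)
        print(weight, {r: round(p, 3) for r, p in estimate.items()})
```

With a large total weight the estimate follows the counts closely; with a small one it stays near the prior. Giving "too much weight to the sequence counts" corresponds to the first regime.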
Figures
Figure 1
Model building and database searching performance of SAM and HMMER. (A) Local/local mode. (B) Global/local mode. The Viterbi algorithm was used for all searches. Otherwise, default settings were used. In the legend, the model building program is named first and the scoring program second: 'SAM-SAM' means 'building the HMMs using SAM; searching using SAM', 'HMMER-SAM' means 'building the HMMs using HMMER; searching using SAM', etc.
Figure 2
Analysis of emission prior probabilities. HMMER models were estimated using the SAM emission prior recode3.20.comp. Both local/local and global/local models were improved compared to using the HMMER default emission prior. Part of the SAM – HMMER performance difference can therefore be assigned to the emission prior.
Figure 3
Analysis of transition prior probabilities. (A) Local/local mode. (B) Global/local mode. SAM and HMMER models were built using their own default transition prior and using the default transition prior of the other program. The transition priors appear to be program-specific: each degrades performance when used in the other program.
Figure 4
Analysis of relative and total sequence weight calculation methods. (A) Local/local mode. (B) Global/local mode. HMMER was run with relative and total sequence weights produced by HMMER or SAM in different combinations. In local/local mode the benefit of SAM's total weight is strong. In global/local mode the benefit is less pronounced and depends on using SAM's transition prior. With our implementation of SAM's "bits saved" method for total weight calculation, HMMER performed about as well as it did with weights estimated by SAM. SAM's recode3.20.comp emission prior was used for all model building.
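The caption above names a "bits saved" method for total weight calculation without giving its details. One way to read the idea is sketched below, under stated assumptions: the total weight is chosen so that the average relative entropy per column, measured against a background distribution, reaches a target value. The function names, the single flat Dirichlet prior, the toy alphabet, the target of 0.5 bits and the bisection search are all illustrative assumptions; this is not the SAM code or the authors' implementation. The background is set equal to the prior mean so that bits saved grows monotonically with the total weight, which is what makes bisection valid.

```python
import math


def posterior_mean(freqs, pseudocounts, total_weight):
    """Mix column frequencies (summing to 1) with a single Dirichlet prior."""
    alpha_sum = sum(pseudocounts.values())
    return {
        r: (total_weight * freqs.get(r, 0.0) + pseudocounts[r]) / (total_weight + alpha_sum)
        for r in pseudocounts
    }


def bits_saved(columns, pseudocounts, background, total_weight):
    """Average relative entropy (bits per column) of the model versus the background."""
    total = 0.0
    for freqs in columns:
        p = posterior_mean(freqs, pseudocounts, total_weight)
        total += sum(p[r] * math.log2(p[r] / background[r]) for r in p)
    return total / len(columns)


def fit_total_weight(columns, pseudocounts, background, target_bits,
                     max_weight, tol=1e-6):
    """Bisection for the total weight whose bits saved matches target_bits.

    Returns max_weight if the target cannot be reached below that cap.
    """
    if bits_saved(columns, pseudocounts, background, max_weight) <= target_bits:
        return max_weight
    lo, hi = 0.0, max_weight
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if bits_saved(columns, pseudocounts, background, mid) < target_bits:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # Toy data: two columns over a three-letter alphabet (assumption).
    columns = [{"A": 1.0}, {"A": 0.5, "C": 0.5}]
    prior = {"A": 1.0, "C": 1.0, "D": 1.0}
    background = {r: a / sum(prior.values()) for r, a in prior.items()}
    weight = fit_total_weight(columns, prior, background, 0.5, 100.0)
    print(weight, bits_saved(columns, prior, background, weight))
```

A conserved alignment reaches the target at a low total weight and a diverse one needs more, so under this reading the method limits how strongly the counts can dominate the prior.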
Similar articles
- Improving profile HMM discrimination by adapting transition probabilities.
Wistrand M, Sonnhammer EL. J Mol Biol. 2004 May 7;338(4):847-54. doi: 10.1016/j.jmb.2004.03.023. PMID: 15099750
- HMM-ModE--improved classification using profile hidden Markov models by optimising the discrimination threshold and modifying emission probabilities with negative training sequences.
Srivastava PK, Desai DK, Nandi S, Lynn AM. BMC Bioinformatics. 2007 Mar 27;8:104. doi: 10.1186/1471-2105-8-104. PMID: 17389042 Free PMC article.
- Fast model-based protein homology detection without alignment.
Hochreiter S, Heusel M, Obermayer K. Bioinformatics. 2007 Jul 15;23(14):1728-36. doi: 10.1093/bioinformatics/btm247. Epub 2007 May 8. PMID: 17488755
- Sequence comparison and protein structure prediction.
Dunbrack RL Jr. Curr Opin Struct Biol. 2006 Jun;16(3):374-84. doi: 10.1016/j.sbi.2006.05.006. Epub 2006 May 19. PMID: 16713709 Review.
- What is a hidden Markov model?
Eddy SR. Nat Biotechnol. 2004 Oct;22(10):1315-6. doi: 10.1038/nbt1004-1315. PMID: 15470472 Review. No abstract available.
Cited by
- The Virome of Cocoa Fermentation-Associated Microorganisms.
Santos JPN, Rodrigues GVP, Ferreira LYM, Monteiro GP, Fonseca PLC, Lopes ÍS, Florêncio BS, da Silva Junior AB, Ambrósio PE, Pirovani CP, Aguiar ERGR. Viruses. 2024 Jul 31;16(8):1226. doi: 10.3390/v16081226. PMID: 39205200 Free PMC article.
- Biogenesis of flavor-related linalool is diverged and genetically conserved in tree peony (Paeonia × suffruticosa).
Li S, Zhang L, Sun M, Lv M, Yang Y, Xu W, Wang L. Hortic Res. 2022 Nov 7;10(2):uhac253. doi: 10.1093/hr/uhac253. eCollection 2023 Feb. PMID: 36751271 Free PMC article.
- Structural basis for the calmodulin-mediated activation of eukaryotic elongation factor 2 kinase.
Piserchio A, Isiorho EA, Long K, Bohanon AL, Kumar EA, Will N, Jeruzalmi D, Dalby KN, Ghose R. Sci Adv. 2022 Jul 8;8(27):eabo2039. doi: 10.1126/sciadv.abo2039. Epub 2022 Jul 6. PMID: 35857468 Free PMC article.
- Profile Comparer Extended: phylogeny of lytic polysaccharide monooxygenase families using profile hidden Markov model alignments.
Voshol GP, Punt PJ, Vijgenboom E. F1000Res. 2019 Oct 31;8:1834. doi: 10.12688/f1000research.21104.1. eCollection 2019. PMID: 31956399 Free PMC article.
- Changes in Gene Expression and Metabolite Profiles in Platanus acerifolia Leaves in Response to Feeding Damage Caused by Corythucha ciliata.
Li F, Wu C, Dewer Y, Li D, Qu C, Luo C. Int J Mol Sci. 2019 Jul 15;20(14):3465. doi: 10.3390/ijms20143465. PMID: 31311085 Free PMC article.