Improved profile HMM performance by assessment of critical algorithmic features in SAM and HMMER

Markus Wistrand et al. BMC Bioinformatics. 2005.

Abstract

Background: Profile hidden Markov model (HMM) techniques are among the most powerful methods for protein homology detection. Yet the features critical for successful modelling are not fully known. In the present work we addressed this question by comparing two of the most popular HMM packages, SAM and HMMER. The programs' abilities to build models and score sequences were compared on a SCOP/Pfam-based test set. The comparison was done separately for local and global HMM scoring.

Results: Using default settings, SAM was more sensitive overall. SAM's model estimation was superior, while HMMER's model scoring was more accurate. Critical features for model building were then analysed by comparing the two packages' algorithmic choices and parameters. The weighting between prior probabilities and multiple alignment counts was the primary reason SAM's model building was superior. Our analysis suggests that HMMER gives too much weight to the sequence counts. SAM's emission prior probabilities were also shown to be more sensitive. The relative sequence weighting schemes differ between the two packages but performed equivalently.

Conclusion: SAM's model estimation was more sensitive, while HMMER's model scoring was more accurate. Combining the best algorithmic features from both packages substantially improved accuracy compared to their default performance.
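
The balance between prior probabilities and alignment counts discussed in the Results can be illustrated with a minimal Python sketch. It is not the estimation code of SAM or HMMER: it assumes a single Dirichlet prior (a simplification chosen for illustration), and the function name, the toy four-letter alphabet, and all numbers are hypothetical. The parameter total_weight plays the role of the total sequence weight, that is, how much the observed counts are allowed to outweigh the prior.

```python
def posterior_mean_emissions(counts, prior_alpha, total_weight):
    """Posterior-mean emission probabilities for one match column.

    counts       : observed residue counts in the column (hypothetical data)
    prior_alpha  : Dirichlet pseudocounts, one entry per residue in the alphabet
    total_weight : value the observed counts are rescaled to sum to
    """
    unknown = set(counts) - set(prior_alpha)
    if unknown:
        raise ValueError(f"residues missing from prior: {sorted(unknown)}")

    observed = sum(counts.values())
    # With no observations, the estimate falls back to the normalised prior.
    scale = total_weight / observed if observed > 0 else 0.0
    effective_weight = total_weight if observed > 0 else 0.0
    denom = effective_weight + sum(prior_alpha.values())

    return {
        residue: (counts.get(residue, 0.0) * scale + alpha) / denom
        for residue, alpha in prior_alpha.items()
    }


# Toy column: 8 of 10 aligned residues are L, the rest are V.
column = {"L": 8, "V": 2}
# Uniform toy prior over a four-letter alphabet (assumption for illustration).
prior = {"L": 1.0, "V": 1.0, "I": 1.0, "A": 1.0}

heavy = posterior_mean_emissions(column, prior, total_weight=10.0)
light = posterior_mean_emissions(column, prior, total_weight=2.0)
print(heavy["L"], light["L"])
```

With the larger total weight the estimate for L stays closer to the observed frequency of 0.8; with the smaller total weight it is pulled toward the prior value of 0.25. This is the kind of trade-off the abstract refers to when it suggests that HMMER gives too much weight to the sequence counts.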

Figures

Figure 1

Model building and database searching performance of SAM and HMMER. (A) Local/local mode. (B) Global/local mode. The Viterbi algorithm was used for all searches. Otherwise, default settings were used. The model building program is mentioned first in the legend, and the scoring program second. 'SAM-SAM' means 'Building the HMM using SAM; Searching using SAM', 'HMMER-SAM' means 'Building the HMMs using HMMER; Searching using SAM', etc.

Figure 2

Analysis of emission prior probabilities. HMMER models were estimated using the SAM emission prior recode3.20.comp. Both local/local and global/local models were improved compared to using the HMMER default emission prior. Part of the SAM – HMMER performance difference can therefore be assigned to the emission prior.

Figure 3

Analysis of transition prior probabilities. (A) Local/local mode. (B) Global/local mode. SAM and HMMER models were built using their own default transition prior, and using the default transition prior of the other program. The transition priors appear to be specific to their programs, as each degrades the performance of the other program.

Figure 4

Analysis of relative and total sequence weight calculation methods. (A) Local/local mode. (B) Global/local mode. HMMER was run with relative and total sequence weights produced by HMMER or SAM in different combinations. In local/local mode the benefit of SAM's total weight is strong. In global/local mode, the benefit is less pronounced and depends on using SAM's transition prior. With our implementation of SAM's "bits saved" method for total weight calculation, HMMER performed about as well as when using weights estimated by SAM. SAM's recode3.20.comp emission prior was used for all model building.

Cited by

The Virome of Cocoa Fermentation-Associated Microorganisms.
Santos JPN, Rodrigues GVP, Ferreira LYM, Monteiro GP, Fonseca PLC, Lopes ÍS, Florêncio BS, da Silva Junior AB, Ambrósio PE, Pirovani CP, Aguiar ERGR. Viruses. 2024 Jul 31;16(8):1226. doi: 10.3390/v16081226. PMID: 39205200
Biogenesis of flavor-related linalool is diverged and genetically conserved in tree peony (Paeonia × suffruticosa).
Li S, Zhang L, Sun M, Lv M, Yang Y, Xu W, Wang L. Hortic Res. 2022 Nov 7;10(2):uhac253. doi: 10.1093/hr/uhac253. eCollection 2023 Feb. PMID: 36751271
Structural basis for the calmodulin-mediated activation of eukaryotic elongation factor 2 kinase.
Piserchio A, Isiorho EA, Long K, Bohanon AL, Kumar EA, Will N, Jeruzalmi D, Dalby KN, Ghose R. Sci Adv. 2022 Jul 8;8(27):eabo2039. doi: 10.1126/sciadv.abo2039. Epub 2022 Jul 6. PMID: 35857468
Profile Comparer Extended: phylogeny of lytic polysaccharide monooxygenase families using profile hidden Markov model alignments.
Voshol GP, Punt PJ, Vijgenboom E. F1000Res. 2019 Oct 31;8:1834. doi: 10.12688/f1000research.21104.1. eCollection 2019. PMID: 31956399
Changes in Gene Expression and Metabolite Profiles in Platanus acerifolia Leaves in Response to Feeding Damage Caused by Corythucha ciliata.
Li F, Wu C, Dewer Y, Li D, Qu C, Luo C. Int J Mol Sci. 2019 Jul 15;20(14):3465. doi: 10.3390/ijms20143465. PMID: 31311085
