Improved profile HMM performance by assessment of critical algorithmic features in SAM and HMMER

Markus Wistrand et al. BMC Bioinformatics. 2005.

Abstract

Background: Profile hidden Markov model (HMM) techniques are among the most powerful methods for protein homology detection. Yet the features critical for successful modelling are not fully known. We addressed this question using two of the most popular HMM packages, SAM and HMMER, comparing the programs' abilities to build models and to score sequences on a test set based on SCOP and Pfam. Local and global HMM scoring were compared separately.

Results: With default settings, SAM was more sensitive overall. SAM's model estimation was superior, whereas HMMER's model scoring was more accurate. We then analysed the features critical for model building by comparing the two packages' algorithmic choices and parameters. The relative weighting of prior probabilities against multiple-alignment counts was the primary reason SAM's model building was superior; our analysis suggests that HMMER gives too much weight to the sequence counts. SAM's emission prior probabilities were also shown to yield more sensitive models. The two packages use different relative sequence weighting schemes, but these performed equivalently.
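As a rough picture of what "the weighting between prior probabilities and multiple alignment counts" means, consider the posterior-mean estimate of one column's emission probabilities under a single Dirichlet prior, p_a = (n_a + α_a) / (N + Σα). The sketch below, in plain Python with a hypothetical four-letter alphabet and made-up counts, rescales the weighted counts to a chosen total weight N: a larger total pulls the estimate toward the observed counts, a smaller one toward the prior. As far as we know, both SAM and HMMER use Dirichlet mixture emission priors and their own total-weight rules, so this is a simplification, not either program's estimator.

```python
def posterior_mean(counts, alpha, total_weight=None):
    """Posterior-mean emission probabilities under one Dirichlet prior.

    counts: weighted residue counts for one alignment column.
    alpha: Dirichlet pseudocounts (the prior), same length as counts.
    total_weight: if given, rescale the counts so they sum to this value;
        this is the knob that sets how strongly the data outweigh the prior.
    """
    n = [float(c) for c in counts]
    if total_weight is not None:
        s = sum(n)
        if s > 0:
            n = [c * total_weight / s for c in n]
    denom = sum(n) + sum(alpha)
    return [(c + a) / denom for c, a in zip(n, alpha)]


column = [10, 0, 0, 0]        # hypothetical counts: residue 1 seen ten times
prior = [1.0, 1.0, 1.0, 1.0]  # hypothetical uniform pseudocounts
print(posterior_mean(column, prior))                  # p[0] = 11/14, count-dominated
print(posterior_mean(column, prior, total_weight=2))  # p[0] = 0.5, prior-dominated
```

Giving the counts too much total weight (the tendency the abstract attributes to HMMER) drives the unobserved residues' probabilities toward zero, which can make a model less tolerant of divergent homologues.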

Conclusion: SAM's model estimation was more sensitive, while HMMER's model scoring was more accurate. Combining the best algorithmic features of both packages substantially improved accuracy over either package's default performance.

Figures

Figure 1

Model building and database searching performance of SAM and HMMER. (A) Local/local mode. (B) Global/local mode. The Viterbi algorithm was used for all searches; otherwise, default settings were used. In the legend the model-building program is listed first and the scoring program second: 'SAM-SAM' means the HMM was built with SAM and searched with SAM, 'HMMER-SAM' means it was built with HMMER and searched with SAM, and so on.
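The caption distinguishes local/local from global/local Viterbi scoring. Assuming the usual convention that the first term refers to the model and the second to the sequence, global/local means the whole model is aligned to some stretch of the sequence. The minimal log-odds Viterbi sketch below illustrates that case only; it is not SAM's or HMMER's implementation. It assumes position-independent transition scores, insert emissions and unaligned flanking residues that score 0 against the null model, and no insert state before the first match state; the function name, transition keys and toy numbers are all hypothetical.

```python
import math

NEG_INF = -math.inf


def viterbi_global_local(seq, match_scores, trans):
    """Best log-odds score of the whole model aligned to any substring of seq.

    match_scores[k][a]: log-odds score of residue a in match state k (0-based).
    trans: hypothetical position-independent transition scores with keys
      BM, BD (begin->match/delete), MM, MI, MD, IM, II, DM, DD, ME, DE.
    """
    L, n = len(match_scores), len(seq)
    # M/I/D[k][i]: best score ending in state k after consuming seq[:i].
    M = [[NEG_INF] * (n + 1) for _ in range(L)]
    I = [[NEG_INF] * (n + 1) for _ in range(L)]
    D = [[NEG_INF] * (n + 1) for _ in range(L)]
    for i in range(n + 1):
        for k in range(L):
            if i > 0:
                emit = match_scores[k].get(seq[i - 1], NEG_INF)
                if k == 0:
                    # Local in the sequence: the model may start at any residue.
                    prev = trans["BM"]
                else:
                    prev = max(M[k - 1][i - 1] + trans["MM"],
                               I[k - 1][i - 1] + trans["IM"],
                               D[k - 1][i - 1] + trans["DM"])
                M[k][i] = emit + prev
                # Insert emissions are assumed to score 0 (background).
                I[k][i] = max(M[k][i - 1] + trans["MI"],
                              I[k][i - 1] + trans["II"])
            # Delete states consume no residue.
            D[k][i] = trans["BD"] if k == 0 else max(
                M[k - 1][i] + trans["MD"], D[k - 1][i] + trans["DD"])
    # Global in the model: the path must pass the last node; trailing residues are free.
    return max(max(M[L - 1][i] + trans["ME"], D[L - 1][i] + trans["DE"])
               for i in range(n + 1))


toy_match = [{"A": 2, "C": -1, "G": -1}, {"A": -1, "C": 2, "G": -1}]
toy_trans = {"BM": 0, "BD": -3, "MM": 0, "MI": -3, "MD": -3, "IM": 0,
             "II": -1, "DM": 0, "DD": -1, "ME": 0, "DE": 0}
print(viterbi_global_local("GGACGG", toy_match, toy_trans))  # 4: A then C
```

Local/local scoring would additionally let the path enter and leave at interior match states; that extension, and the exact entry and exit rules used by either package, are left out here.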

Figure 2

Analysis of emission prior probabilities. HMMER models were estimated using the SAM emission prior recode3.20.comp. Both local/local and global/local models improved compared with the HMMER default emission prior. Part of the SAM/HMMER performance difference can therefore be attributed to the emission prior.
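Swapping emission priors, as in this figure, changes the pseudocounts that are mixed with the column counts. To our understanding recode3.20.comp is a 20-component Dirichlet mixture. The sketch below shows how such a mixture is generally combined with counts: each component's responsibility is proportional to q_j B(n + α_j) / B(α_j), and the estimate is the responsibility-weighted average of the per-component posterior means. The component weights and pseudocounts are hypothetical toy values, not taken from either package's prior files, and no prior file format is parsed.

```python
import math


def log_beta(alpha):
    """Log of the multivariate Beta function B(alpha)."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))


def mixture_posterior_mean(counts, components):
    """Posterior-mean probabilities under a Dirichlet mixture prior.

    components: list of (mixture weight q_j, pseudocount vector alpha_j).
    The multinomial coefficient is the same for every component and cancels.
    """
    log_post = []
    for q, alpha in components:
        summed = [c + a for c, a in zip(counts, alpha)]
        log_post.append(math.log(q) + log_beta(summed) - log_beta(alpha))
    top = max(log_post)  # subtract the max for numerical stability
    resp = [math.exp(lp - top) for lp in log_post]
    z = sum(resp)
    resp = [r / z for r in resp]

    total = sum(counts)
    probs = [0.0] * len(counts)
    for r, (_, alpha) in zip(resp, components):
        denom = total + sum(alpha)
        for idx, (c, a) in enumerate(zip(counts, alpha)):
            probs[idx] += r * (c + a) / denom
    return probs


# Hypothetical two-component mixture over a four-letter toy alphabet.
toy_mixture = [(0.5, [5.0, 0.1, 0.1, 0.1]), (0.5, [0.1, 5.0, 0.1, 0.1])]
print(mixture_posterior_mean([5, 0, 0, 0], toy_mixture))
```

With a single component of weight 1 the function reduces to the ordinary single-Dirichlet posterior mean, which is a convenient sanity check.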

Figure 3

Analysis of transition prior probabilities. (A) Local/local mode. (B) Global/local mode. SAM and HMMER models were built with each program's own default transition prior and with the default transition prior of the other program. The transition priors appear to be specific to their own programs: each degrades the performance of the other program.

Figure 4

Analysis of relative and total sequence weight calculation methods. (A) Local/local mode. (B) Global/local mode. HMMER was run with relative and total sequence weights produced by HMMER or SAM in different combinations. In local/local mode, SAM's total weight gives a strong benefit. In global/local mode, the benefit is less pronounced and depends on using SAM's transition prior. With our implementation of SAM's "bits saved" method for total weight calculation, HMMER performed about as well as with weights estimated by SAM. SAM's recode3.20.comp emission prior was used for all model building.
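The caption separates relative weights (how sequences are weighted against each other) from the total weight (how much all counts together weigh against the prior). The sketch below illustrates that split under explicit assumptions; it is not SAM's code or the authors' reimplementation. Relative weights come from Henikoff position-based weighting (Henikoff & Henikoff, 1994), a common scheme that is not necessarily what either package uses. The total weight is then found by bisection so that the mean per-column relative entropy to a background distribution equals a target, assuming that is what "bits saved" measures. The target, toy alignment and function names are hypothetical, and the pseudocounts are assumed proportional to the background, which makes bits saved increase monotonically with total weight and so makes bisection valid.

```python
import math


def henikoff_weights(alignment):
    """Position-based relative weights, normalised to sum to 1 (no gaps assumed)."""
    w = [0.0] * len(alignment)
    for col in range(len(alignment[0])):
        column = [seq[col] for seq in alignment]
        n_types = len(set(column))
        for s, res in enumerate(column):
            w[s] += 1.0 / (n_types * column.count(res))
    total = sum(w)
    return [x / total for x in w]


def column_counts(alignment, weights, alphabet):
    """Weighted residue counts per column, ordered by alphabet."""
    cols = []
    for col in range(len(alignment[0])):
        c = {a: 0.0 for a in alphabet}
        for seq, w in zip(alignment, weights):
            c[seq[col]] += w
        cols.append([c[a] for a in alphabet])
    return cols


def mean_bits_saved(cols, total_weight, alpha, background):
    """Mean relative entropy (bits) of posterior-mean columns vs background."""
    bits = 0.0
    for counts in cols:
        n = [total_weight * c for c in counts]  # relative weights sum to 1
        denom = sum(n) + sum(alpha)
        p = [(c + a) / denom for c, a in zip(n, alpha)]
        bits += sum(pi * math.log2(pi / bi) for pi, bi in zip(p, background))
    return bits / len(cols)


def total_weight_for_target(cols, alpha, background, target_bits,
                            hi=1e4, iters=60):
    """Bisect on the total weight until mean bits saved reaches target_bits."""
    if mean_bits_saved(cols, hi, alpha, background) < target_bits:
        return hi  # target unreachable; return the count-dominated limit
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mean_bits_saved(cols, mid, alpha, background) < target_bits:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Keeping the two steps separate mirrors the caption's experiment: either the relative weights or the total-weight rule can be swapped independently of the other.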
