FreeContact: fast and free software for protein contact prediction from residue co-evolution

László Kaján et al. BMC Bioinformatics. 2014.

Abstract

Background: Twenty years of improved technology and a growing number of sequences now make residue-residue contact constraints, inferred from correlated mutations in large protein families, accurate enough to drive de novo predictions of protein three-dimensional structure. The method EVfold broke new ground using mean-field Direct Coupling Analysis (EVfold-mfDCA); the method PSICOV applied a related concept by estimating a sparse inverse covariance matrix. Both methods (EVfold-mfDCA and PSICOV) are publicly available, but both require too much CPU time for interactive applications. In addition, EVfold-mfDCA depends on proprietary software.

Results: Here, we present FreeContact, a fast, open source implementation of EVfold-mfDCA and PSICOV. On a test set of 140 proteins, FreeContact was almost eight times faster than PSICOV without decreasing prediction performance. The EVfold-mfDCA implementation of FreeContact was over 220 times faster than PSICOV with negligible performance decrease. EVfold-mfDCA was unavailable for testing due to its dependency on proprietary software. FreeContact is implemented as the free C++ library "libfreecontact", complete with command line tool "freecontact", as well as Perl and Python modules. All components are available as Debian packages. FreeContact supports the BioXSD format for interoperability.

Conclusions: FreeContact provides the opportunity to compute reliable contact predictions in any environment (desktop or cloud).

Figures

Figure 1

Runtimes for FreeContact. We measured the runtime (logarithmic y-axis) for different program components (x-axis) on a single thread. The program components were: “seqw” – sequence weighting; “pairfreq” – pairwise residue frequencies; “shrink” – shrinking of covariance matrix; “inv” – sparse inverse covariance estimation/covariance matrix inversion. The different colors distinguish: the original PSICOV implementation (blue), our acceleration of PSICOV (FC.psicov, yellow), our acceleration of the faster PSICOV version “sensible default” (FC.psicov-fast, green), and our implementation of EVfold-mfDCA (FC.evfold, red). The whiskers on the box plots show the most extreme data point that is less than 1.5-times the interquartile range from the box. Outliers are not shown. Total runtime of all methods tested is dominated by the sparse inverse covariance estimation/covariance matrix inversion component.
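
The first two components named in this caption, "seqw" and "pairfreq", can be illustrated with a minimal pure-Python sketch. It is not FreeContact's code: the function names, the 0.9 identity cutoff, and the toy alignment are assumptions chosen for illustration, and the sketch omits pseudocounts and gap handling. It weights each sequence by the inverse of its number of near-identical neighbours, then accumulates weighted residue-pair frequencies for every pair of columns.

```python
from itertools import combinations


def sequence_weights(alignment, identity_cutoff=0.9):
    """Weight each sequence by 1 / (number of sequences that are at least
    identity_cutoff identical to it, the sequence itself included).
    The 0.9 cutoff is an illustrative assumption."""
    n = len(alignment)
    length = len(alignment[0])
    neighbours = [1] * n  # every sequence counts itself
    for i, j in combinations(range(n), 2):  # all sequence pairs
        matches = sum(a == b for a, b in zip(alignment[i], alignment[j]))
        if matches >= identity_cutoff * length:
            neighbours[i] += 1
            neighbours[j] += 1
    return [1.0 / count for count in neighbours]


def pair_frequencies(alignment, weights):
    """Weighted frequency of each residue pair (a, b) observed at each
    column pair (i, j) with i < j. No pseudocounts are added."""
    length = len(alignment[0])
    total = sum(weights)
    freqs = {}
    for seq, w in zip(alignment, weights):  # all sequences
        for i, j in combinations(range(length), 2):  # all column pairs
            key = (i, j, seq[i], seq[j])
            freqs[key] = freqs.get(key, 0.0) + w / total
    return freqs


# Toy alignment (hypothetical): the first two sequences are identical,
# so they share their weight.
alignment = ["ACDA", "ACDA", "GCEA"]
weights = sequence_weights(alignment)
freqs = pair_frequencies(alignment, weights)
```

In this sketch the weighting loop visits every pair of sequences and the frequency loop visits every pair of columns, so the first scales quadratically with alignment depth and the second quadratically with protein length.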

Figure 2

Speedup using multiple threads. A: Sequence weighting. Speed is calculated as: (proteins in alignment)² × length of target protein / runtime. B: Pairwise residue frequency calculation. Speed is calculated as: proteins in alignment × (length of target protein)² / runtime. Dashed lines indicate linear correlation, extrapolated from one thread. The whiskers extend to the most extreme data point that is less than 1.5-times the interquartile range from the box. The surprisingly clear correlation between the number of threads and speed demonstrates how well our implementation scales for multi-threading.
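
The two speed measures in this caption can be written as short Python helpers. This is a sketch under stated assumptions: the sequence-weighting speed is read as (proteins in alignment)² × target length / runtime, the pair-frequency speed as proteins in alignment × (target length)² / runtime, and the dashed line as the single-thread speed multiplied by the thread count. The function names and the example numbers are hypothetical, not measurements from the paper.

```python
def seqw_speed(n_sequences, target_length, runtime_s):
    """Normalised speed for sequence weighting (panel A)."""
    return n_sequences ** 2 * target_length / runtime_s


def pairfreq_speed(n_sequences, target_length, runtime_s):
    """Normalised speed for pairwise residue frequencies (panel B)."""
    return n_sequences * target_length ** 2 / runtime_s


def ideal_speed(single_thread_speed, n_threads):
    """Linear extrapolation from one thread (the dashed line)."""
    return single_thread_speed * n_threads


# Hypothetical example: 1000 aligned sequences, a 200-residue target,
# and a 2-second single-thread runtime for each component.
base_a = seqw_speed(1000, 200, 2.0)
base_b = pairfreq_speed(1000, 200, 2.0)
expected_a_on_4_threads = ideal_speed(base_a, 4)
```

Dividing the work by the runtime in this way makes alignments of different depth and length comparable, which is what allows a single dashed line per panel.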

Cited by

Deep-learning contact-map guided protein structure prediction in CASP13.
Zheng W, Li Y, Zhang C, Pearce R, Mortuza SM, Zhang Y. Proteins. 2019 Dec;87(12):1149-1164. doi: 10.1002/prot.25792. Epub 2019 Aug 14. PMID: 31365149 Free PMC article.
The EVcouplings Python framework for coevolutionary sequence analysis.
Hopf TA, Green AG, Schubert B, Mersmann S, Schärfe CPI, Ingraham JB, Toth-Petroczy A, Brock K, Riesselman AJ, Palmedo P, Kang C, Sheridan R, Draizen EJ, Dallago C, Sander C, Marks DS. Bioinformatics. 2019 May 1;35(9):1582-1584. doi: 10.1093/bioinformatics/bty862. PMID: 30304492 Free PMC article.
Inter-Residue Distance Prediction From Duet Deep Learning Models.
Zhang H, Huang Y, Bei Z, Ju Z, Meng J, Hao M, Zhang J, Zhang H, Xi W. Front Genet. 2022 May 16;13:887491. doi: 10.3389/fgene.2022.887491. eCollection 2022. PMID: 35651930 Free PMC article.
Protein tertiary structure modeling driven by deep learning and contact distance prediction in CASP13.
Hou J, Wu T, Cao R, Cheng J. Proteins. 2019 Dec;87(12):1165-1178. doi: 10.1002/prot.25697. Epub 2019 Apr 25. PMID: 30985027 Free PMC article.
ConEVA: a toolbox for comprehensive assessment of protein contacts.
Adhikari B, Nowotny J, Bhattacharya D, Hou J, Cheng J. BMC Bioinformatics. 2016 Dec 7;17(1):517. doi: 10.1186/s12859-016-1404-z. PMID: 27923350 Free PMC article.
