A divide and conquer approach (DACA) to predict high fidelity structure of large multidomain protein BRWD1

Divide and Conquer Strategies for Protein Structure Prediction

Mathematical Approaches to Polymer Sequence Analysis and Related Problems, 2010

In this chapter, we discuss some approaches to the problem of protein structure prediction by addressing "simpler" sub-problems. The rationale behind this strategy is to develop methods for predicting some interesting structural characteristics of the protein, which can be useful per se and, at the same time, can help in solving the main problem. In particular, we discuss the problem of predicting protein secondary structure, which is at the moment one of the most successfully addressed sub-problems in computational biology. Available secondary structure predictors are very reliable and can be routinely used for annotating new genomes or as input for other, more complex prediction tasks, such as remote homology detection and functional assignment. As a second example, we discuss the problem of predicting residue-residue contacts in proteins. This task is much more complex than secondary structure prediction, and no satisfactory results have been achieved so far. Unlike the secondary structure sub-problem, the residue-residue contact sub-problem is not intrinsically simpler than the prediction of the protein structure, since a roughly correct set of predicted residue-residue contacts would directly lead to the prediction of a protein backbone very close to the real structure. These two sub-problems are discussed in the light of current performance evaluations, which are based on periodic blind tests (CASP meetings) and continuous evaluation (EVA servers).
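To make the residue-residue contact sub-problem concrete, the following minimal sketch derives a contact set from C-alpha coordinates. The 8 Å cutoff, the minimum sequence separation, and the function name `contact_map` are illustrative assumptions, not definitions taken from the chapter.

```python
import math

def contact_map(ca_coords, cutoff=8.0, min_separation=3):
    """Return the set of index pairs (i, j), i < j, whose C-alpha atoms
    lie closer than `cutoff` (in Å) and are at least `min_separation`
    positions apart in the sequence. Both thresholds are assumptions."""
    contacts = set()
    n = len(ca_coords)
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(ca_coords[i], ca_coords[j]) < cutoff:
                contacts.add((i, j))
    return contacts

# Hypothetical toy chain: four pseudo-residues on a line, 3.8 Å apart.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
toy_contacts = contact_map(coords, cutoff=8.0, min_separation=2)
```

A contact predictor would try to recover such a set from sequence alone; here it is computed from known coordinates only to show what the prediction target looks like.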

Recursive protein modeling: A divide and conquer strategy for protein structure prediction and its case study in CASP9

2011

After decades of research, protein structure prediction remains a very challenging problem. In order to address the different levels of complexity of modeling structure, two types of modeling techniques - template-based modeling and template-free modeling - have been developed. Template-based modeling can often generate a moderate to high resolution model when a similar, homologous template structure is found for a query protein but fails if no template or only incorrect templates are found. Template-free modeling such as fragment-based assembly may generate models of moderate resolution for small proteins of low topological complexity. Seldom have the two techniques been integrated together to improve protein modeling. Here we develop a recursive protein modeling approach to selectively and collaboratively apply template-based and template-free modeling methods to model template-covered (i.e., certain) and template-free (i.e., uncertain) regions of a protein. A preliminary implementation of the approach was tested on a number of hard modeling cases during the 9th Critical Assessment of Techniques for Protein Structure Prediction (CASP9) and successfully improved the quality of modeling in most of these cases. Recursive modeling can significantly reduce the complexity of protein structure modeling and integrate template-based and template-free modeling to improve the quality and efficiency of protein structure prediction.
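The partition into template-covered and template-free regions can be illustrated with a short sketch. It assumes a per-residue boolean coverage mask is already available from a template alignment; the function name `split_by_coverage` and the segment labels are hypothetical, and the abstract does not specify how the authors' implementation performs this step.

```python
from itertools import groupby

def split_by_coverage(covered):
    """Split a per-residue coverage mask into contiguous segments.

    `covered[i]` is True when residue i is aligned to a template.
    Returns (start, end, label) tuples with `end` exclusive."""
    segments = []
    pos = 0
    for is_covered, run in groupby(covered):
        length = len(list(run))
        label = "template-based" if is_covered else "template-free"
        segments.append((pos, pos + length, label))
        pos += length
    return segments

# Hypothetical 10-residue query with an uncovered stretch at residues 5-7.
coverage = [True] * 5 + [False] * 3 + [True] * 2
segments = split_by_coverage(coverage)
```

Each segment could then be routed to the modeling technique that suits it, which is the divide step of the strategy described above.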

Protein Structure Prediction Using CABS – A Consensus Approach

We have designed a new pipeline for protein structure prediction based on the CABS engine. The procedure is fully automated and generates consensus models from a set of templates. Restraints derived from the templates define a region of conformational space, which is then sampled by the Replica Exchange Monte Carlo algorithm implemented in CABS. Results from CASP9 show that, for the great majority of targets, this approach leads to models that are better than the mean quality of the templates (with respect to GDT_TS). In five cases, the obtained models were the best among all predictions submitted to CASP9 as first models.
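As background on Replica Exchange Monte Carlo, the sketch below shows the standard Metropolis criterion for swapping configurations between two replicas held at different inverse temperatures. It is a generic textbook form, not the CABS implementation; the function names and the numeric values are assumptions chosen for illustration.

```python
import math
import random

def swap_probability(beta_i, beta_j, energy_i, energy_j):
    """Acceptance probability for exchanging the configurations of two
    replicas: min(1, exp((beta_i - beta_j) * (E_i - E_j)))."""
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)

def attempt_swap(beta_i, beta_j, energy_i, energy_j, rng=random):
    """Return True when the proposed exchange is accepted."""
    return rng.random() < swap_probability(beta_i, beta_j, energy_i, energy_j)

# The colder replica (larger beta) currently holds the higher energy,
# so moving the low-energy configuration to it is always accepted.
p_favourable = swap_probability(1.0, 0.5, 10.0, 4.0)
# The reverse situation is accepted only with probability exp(-3).
p_unfavourable = swap_probability(1.0, 0.5, 4.0, 10.0)
```

Exchanges of this kind let configurations trapped at low temperature escape through the high-temperature replicas, which is why the scheme suits sampling a restraint-defined region of conformational space.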

Identification of novel natural product inhibitors of BRD4 using high throughput virtual screening and MD simulation

Bromodomains are evolutionarily conserved structural motifs that recognize acetylated lysine residues on histone tails. They play a crucial role in shaping chromatin architecture and regulating gene expression in various biological processes. Mutations in bromodomain-containing proteins lead to multiple human diseases, which makes them attractive targets for therapeutic intervention. Extensive studies have been done on BRD4 as a target for several cancers, such as Acute Myeloid Leukemia (AML) and Burkitt Lymphoma. Several potential inhibitors have been identified against the BRD4 bromodomain. However, most of these inhibitors have drawbacks such as nonspecificity and toxicity, decreasing their appeal and necessitating the search for novel non-toxic inhibitors. This study aims to address this need by virtually screening natural compounds from the NPASS database against the Kac binding site of BRD4-BD1 using high throughput molecular docking followed by similarity clustering, pharmac...

The protein structure prediction problem could be solved using the current PDB library

Proceedings of the National Academy of Sciences, 2005

For single-domain proteins, we examine the completeness of the structures in the current Protein Data Bank (PDB) library for use in full-length model construction of unknown sequences. To address this issue, we employ a comprehensive benchmark set of 1,489 medium-size proteins that cover the PDB at the level of 35% sequence identity and identify templates by structure alignment. With homologous proteins excluded, we can always find similar folds to native with an average rms deviation (RMSD) from native of 2.5 Å with ≈82% alignment coverage. These template structures often contain a significant number of insertions/deletions. The TASSER algorithm was applied to build full-length models, where continuous fragments are excised from the top-scoring templates and reassembled under the guide of an optimized force field, which includes consensus restraints taken from the templates and knowledge-based statistical potentials. For almost all targets (except for 2/1,489), the resultant full-length models have an RMSD to native below 6 Å (97% of them below 4 Å). On average, the RMSD of full-length models is 2.25 Å, with aligned regions improved from 2.5 Å to 1.88 Å, comparable with the accuracy of low-resolution experimental structures. Furthermore, starting from state-of-the-art structural alignments, we demonstrate a methodology that can consistently bring template-based alignments closer to native. These results are highly suggestive that the protein-folding problem can in principle be solved based on the current PDB library by developing efficient fold recognition algorithms that can recover such initial alignments.
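Since the results above are reported as RMSD to native, a minimal sketch of the quantity may help. It assumes the two coordinate sets are already optimally superimposed and matched residue by residue; the superposition step (for example the Kabsch algorithm) is deliberately omitted, and the coordinates are hypothetical.

```python
import math

def rmsd(model, native):
    """Root-mean-square deviation (same units as the input, e.g. Å)
    between two equally long lists of (x, y, z) coordinates.
    Assumes the structures have already been superimposed."""
    if len(model) != len(native) or not model:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(
        (mx - nx) ** 2 + (my - ny) ** 2 + (mz - nz) ** 2
        for (mx, my, mz), (nx, ny, nz) in zip(model, native)
    )
    return math.sqrt(squared / len(model))

# Hypothetical two-atom example: every atom is displaced by 2 Å along z.
model_xyz = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
native_xyz = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
deviation = rmsd(model_xyz, native_xyz)
```

In practice the value depends on which atoms are compared (often C-alpha only) and on the alignment used, so figures from different studies are comparable only when those choices match.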

Protein structure prediction in biology and medicine

2000

Fox Chase Cancer Center 2005 Scientific Report 2 structures from 250 research groups for 76 protein targets for which the experimental structures became available in the autumn of 2004. With the other assessors and the organizers, we chose the top 6 groups who were asked to present their results at the CASP6 meeting in Gaeta, Italy in December 2004. In 2005, we finished the analysis of the CASP6 data and published the assessment papers (1–3).

MULTICOM: a multi-level combination approach to protein structure prediction and its assessments in CASP8

Bioinformatics/Computer Applications in the Biosciences, 2010

Protein structure prediction is one of the most important problems in structural bioinformatics. Here we describe MULTICOM, a multi-level combination approach to improve the various steps in protein structure prediction. In contrast to those methods which look for the best templates, alignments and models, our approach tries to combine complementary and alternative templates, alignments and models to achieve on average better accuracy. Results: The multi-level combination approach was implemented via five automated protein structure prediction servers and one human predictor which participated in the eighth Critical Assessment of Techniques for Protein Structure Prediction (CASP8), 2008. The MULTICOM servers and human predictor were consistently ranked among the top predictors on the CASP8 benchmark. The methods can predict moderate- to high-resolution models for most template-based targets and low-resolution models for some template-free targets. The results show that the multi-level combination of complementary templates, alternative alignments and similar models aided by model quality assessment can systematically improve both template-based and template-free protein modeling. Availability: The MULTICOM server is freely available at

Prediction of the secondary structures of proteins by using PREDICT, a nearest neighbor method on pattern space

2004

We introduce a novel method for predicting the secondary structure of proteins, PREDICT (PRofile Enumeration DICTionary), in which the nearest-neighbor method is applied to a pattern space. For a given protein sequence, PSI-BLAST is used to generate a profile that defines patterns for amino acid residues and their local sequence environments. By applying PSI-BLAST to protein sequences with known secondary structures, we construct pattern databases. The secondary structure of a query residue of a protein with unknown structure can be determined by comparing the query pattern with those in the pattern databases and selecting the patterns close to the query pattern. We have tested PREDICT on the CB513 set (a set of 513 non-homologous proteins) in three different ways. The first test was based on a pattern database derived from 7777 proteins in the Protein Data Bank (PDB), including those homologous to proteins in the CB513 set, and gave an average Q3 score of 78.8% per chain. In the second test, in order to carry out a more stringent benchmark on the CB513 set, we removed from the 7777 proteins all proteins homologous to the CB513 set, leaving 4330 proteins. Pattern databases were constructed based on these proteins, and the average Q3 score was 74.6%. In the third test, we selected one query protein from the CB513 set and built pattern databases by using the remaining 512 proteins. This procedure was repeated for each of the 513 proteins, and the average Q3 score was 73.1%. Finally, we participated in CASP5 (group ID: 531), where we employed the first-layer database based on the 7777 proteins and the second-layer database based on the CB513 set. PREDICT gave quite promising results, with an average Q3 (Sov) score of 78.1 (77.4)% on 55 CASP5 targets.
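The Q3 score quoted above is the percentage of residues whose three-state label (helix, strand, coil) is predicted correctly. The sketch below computes it for one chain; the H/E/C alphabet and the example strings are assumptions for illustration and are not data from the study.

```python
def q3_score(predicted, observed):
    """Per-chain Q3: percentage of positions where the predicted
    three-state label (H, E or C) equals the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("label strings must be non-empty and equally long")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

# Hypothetical 8-residue chain with one mislabelled position.
example_q3 = q3_score("HHHEECCC", "HHHEEECC")
```

Averaging this value over all chains in a benchmark gives a per-chain mean of the kind reported for the CB513 tests.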

Tools for Protein Structure Prediction at the bri-shur.com Web Portal

2012

Internet services for bioinformatics remain a popular tool among researchers. Here the authors present a recently developed website, http://bri-shur.com, where several tools and pipelines for protein structure prediction are implemented. Predicting the structure of a particular protein often requires a careful, iterative approach, and the website provides an environment for this kind of work. The software used in the services includes both free programs available on the Internet and newly developed algorithms. The service for homology screening of the PDB for a structure template is implemented using an approach that is an alternative to the well-known BLAST algorithm and has some advantages over it. The homology modeling service uses the well-known Nest program. The protein energy estimation service allows selection of the best template from a set of homologs and adds fold recognition functionality to the environment. The design of the site simplifies several of the most useful bioinformatics routines, thus making them available to a large community of researchers. Services are provided free of charge without registration, and the user's privacy is taken care of.