Template-based protein structure modeling using the RaptorX web server

Morten Källberg et al. Nat Protoc. 2012.

Abstract

A key challenge of modern biology is to uncover the functional role of the protein entities that compose cellular proteomes. To this end, the availability of reliable three-dimensional atomic models of proteins is often crucial. This protocol presents a community-wide web-based method using RaptorX (http://raptorx.uchicago.edu/) for protein secondary structure prediction, template-based tertiary structure modeling, alignment quality assessment and sophisticated probabilistic alignment sampling. RaptorX distinguishes itself from other servers by the quality of the alignment between a target sequence and one or multiple distantly related template proteins (especially those with sparse sequence profiles) and by a novel nonlinear scoring function and a probabilistic-consistency algorithm. Consequently, RaptorX delivers high-quality structural models for many targets with only remote templates. At present, it takes RaptorX ~35 min to finish processing a sequence of 200 amino acids. Since its official release in August 2011, RaptorX has processed ~6,000 sequences submitted by ~1,600 users from around the world.

Conflict of interest statement

Competing financial interests: The authors declare no competing financial interests.

Figures

Figure 1

Performance assessment of core prediction modules in the RaptorX server. (a) Comparison of structure prediction performance by global distance test (GDT) score for RaptorX and three other publicly available protocols on the CASP9 targets. Performance is compared in four categories: all CASP9 targets, template-based modeling (TBM) targets, hard TBM targets and multidomain targets. (b) Performance comparison for domain parsing between RaptorX and DoBo. Metrics are given for overall performance and for performance on single-domain and multidomain CASP9 target proteins. Specifically, accuracy is the overall proportion of single-domain and multidomain proteins identified correctly; single (multi) recall is the fraction of single-domain (multidomain) proteins that are correctly predicted; single (multi) precision is the fraction of correctly predicted single-domain (multidomain) proteins among all proteins predicted to be single-domain (multidomain). A multidomain protein is correctly predicted only if its domain boundaries are correctly identified. (c) Performance comparison between RaptorX and PSIPRED for secondary structure prediction. The accuracies achieved for three-state prediction (helix, sheet and coil) are compared. (d) Performance comparison between RaptorX and SSPRO8 for eight-state secondary structure prediction. The accuracies achieved for eight-state prediction for the classes H, G, I, E, B, T, S, L (using SSPRO8 nomenclature) are compared.
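The domain-parsing metrics in panel (b) can be illustrated with a short sketch. It is not the authors' evaluation code. It assumes that precision for a class is taken over all proteins predicted to be in that class, and that whether the domain boundaries of a multidomain protein were identified correctly has already been decided elsewhere and is supplied as a flag. The names DomainCall, is_correct and domain_metrics are hypothetical, and the example calls are made-up illustrations, not CASP9 data.

```python
from dataclasses import dataclass


@dataclass
class DomainCall:
    """One target protein: true class, predicted class, boundary check (hypothetical record)."""
    true_multi: bool            # True if the protein really is multidomain
    pred_multi: bool            # True if it was predicted to be multidomain
    boundaries_ok: bool = True  # assumed to be judged by an external boundary criterion


def is_correct(call):
    """A call is correct if the class matches and, for multidomain proteins,
    the domain boundaries were also identified correctly."""
    if call.true_multi != call.pred_multi:
        return False
    return call.boundaries_ok if call.true_multi else True


def _ratio(numerator, denominator):
    # Return NaN instead of raising when a class is empty.
    return numerator / denominator if denominator else float("nan")


def domain_metrics(calls):
    """Compute accuracy plus per-class recall and precision as described in the caption."""
    correct = [c for c in calls if is_correct(c)]
    out = {"accuracy": _ratio(len(correct), len(calls))}
    for name, multi in (("single", False), ("multi", True)):
        hits = sum(1 for c in correct if c.true_multi == multi)
        actual = sum(1 for c in calls if c.true_multi == multi)
        predicted = sum(1 for c in calls if c.pred_multi == multi)
        out[f"{name}_recall"] = _ratio(hits, actual)
        out[f"{name}_precision"] = _ratio(hits, predicted)
    return out


# Illustrative, made-up calls (not taken from the paper).
calls = [
    DomainCall(False, False),                     # single-domain, predicted single
    DomainCall(False, True),                      # single-domain, predicted multi
    DomainCall(True, True, boundaries_ok=True),   # multidomain, boundaries right
    DomainCall(True, True, boundaries_ok=False),  # multidomain, boundaries wrong
    DomainCall(True, False),                      # multidomain, predicted single
]
metrics = domain_metrics(calls)
for key, value in metrics.items():
    print(f"{key}: {value:.3f}")
```

In this sketch a multidomain protein predicted as multidomain but with wrong boundaries counts against both multi recall and multi precision, which follows the caption's rule that such a protein is correct only if its boundaries are identified correctly.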

Figure 2

Workflow used by the RaptorX server. Outline of the three modeling tasks users can accomplish using the RaptorX server, namely tertiary structure prediction, custom alignment and secondary structure prediction. For each stage, details of the computation and the approximate completion time for a 250-residue target sequence are given (for threading, the indicated time is for a full template library scan). Blue boxes indicate mandatory stages, green boxes indicate optional stages and gray boxes indicate the resulting output. The blue, red and yellow directed paths indicate the flow for structure prediction, custom alignment jobs and secondary structure prediction, respectively. Dashed and solid paths indicate that the subsequent steps are optional and required, respectively. n/a, not applicable.

Figure 3

Job-listing interface. Selecting ‘My jobs’ displays this job overview for the user’s account, which gives the status of each prediction in the job along with overall information on the predictions being made for each submitted sequence.

Figure 4

Secondary structure result interface. The numbered labels indicate the location of the following screen features: (1) tabs for switching between the three-state and eight-state prediction; (2) hovering over a residue will give detailed statistics on the secondary-state distribution; (3) the status and current running time of the job; (4) a download link for the prediction results; and (5) a color-code legend for the secondary structure diagram.

Figure 5

Tertiary structure result interface. The numbered labels indicate the location of the following screen features: (1) the rank of the currently selected model; (2) the quality score of the model; (3) the PDB IDs for the set of templates used for modeling; (4) a drop-down menu for selecting alternative structure models; (5) tabs for switching between structure prediction, function annotation and BLAST output; (6) an interactive viewer displaying the currently selected model structure; (7) a menu for controlling the interactive viewer; (8) the alignment used for structure modeling; (9) the status and current running time of the job; (10) download links for prediction results; and (11) a user guide for the interactive structure viewer.

Figure 6

Disorder prediction result display. Graphics comparable to those described for the secondary structure result interface (Fig. 4) are used to visualize the probability that a given residue is either in a disorder segment (marked in red) or nondisorder segment (marked in blue). Again, hovering over a residue will give detailed statistics on the disorder prediction, whereas the right-hand side shows the status of the job with a download link for the disorder prediction results and a color-code legend for the disorder prediction diagram.

Figure 7

Custom alignment result interface. The numbered labels indicate the location of the following screen features: (1) a drop-down menu for switching between alternative alignments; (2) the alignment between the target sequence and the template; (3) the status and current running time of the job; (4) a link for downloading the prediction result; and (5) a legend indicating the alignment color coding.

Figure 8

Domain parsing result display. If multiple domains are found, the domain parsing results outline the span of each segment, the Pfam family it is predicted to belong to, a confidence measure (E value) for the domain assignment and a possible functional annotation of the domain region.
