The ConSurf-HSSP database: The mapping of evolutionary conservation among homologs onto PDB structures
Related papers
The ConSurf-DB: pre-calculated evolutionary conservation profiles of protein structures
Nucleic Acids Research, 2009
ConSurf-DB is a repository for evolutionary conservation analysis of the proteins of known structures in the Protein Data Bank (PDB). Sequence homologues of each of the PDB entries were collected and aligned using standard methods. The evolutionary conservation of each amino acid position in the alignment was calculated using the Rate4Site algorithm, implemented in the ConSurf web server. The algorithm explicitly takes into account the phylogenetic relations between the aligned proteins and the stochastic nature of the evolutionary process. Rate4Site assigns a conservation level for each position in the multiple sequence alignment using an empirical Bayesian inference. Visual inspection of the conservation patterns on the 3D structure often enables the identification of key residues that comprise the functionally important regions of the protein. The repository is updated with the latest PDB entries on a monthly basis and will be rebuilt annually. ConSurf-DB is available online at
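To make the Bayesian step concrete, here is a minimal sketch, not Rate4Site itself, of inferring one position's rate from an alignment column and a tree. It assumes a simple Poisson (equal-rates) substitution model over the 20 amino acids, a hypothetical four-leaf tree, and a fixed uniform prior over a hypothetical grid of rates. In a genuinely empirical Bayesian method the prior would itself be estimated from the whole alignment, and richer substitution models would normally be used. The posterior mean serves as the rate estimate and the posterior standard deviation as a rough confidence measure.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
N = len(ALPHABET)  # 20 amino acid states

def p_same_diff(t):
    """Poisson model: P(no change) and P(change to one specific other state) over branch length t."""
    e = math.exp(-N / (N - 1) * t)
    return 1.0 / N + (N - 1) / N * e, (1.0 - e) / N

def partial(node, column, rate):
    """Felsenstein pruning. A node is a leaf name (str) or a list of (child, branch_length) pairs."""
    if isinstance(node, str):
        return [1.0 if aa == column[node] else 0.0 for aa in ALPHABET]
    result = [1.0] * N
    for child, length in node:
        child_l = partial(child, column, rate)
        same, diff = p_same_diff(rate * length)  # the site rate scales every branch
        total = sum(child_l)
        for x in range(N):
            # sum over child states y of P(x -> y) * L_child[y]
            result[x] *= diff * total + (same - diff) * child_l[x]
    return result

def site_likelihood(tree, column, rate):
    # Uniform stationary frequencies (1/N) at the root under the Poisson model.
    return sum(partial(tree, column, rate)) / N

def posterior_rate(tree, column, rates, prior):
    """Posterior mean and standard deviation of the site rate over a discrete grid of rates."""
    joint = [p * site_likelihood(tree, column, r) for r, p in zip(rates, prior)]
    z = sum(joint)
    post = [j / z for j in joint]
    mean = sum(r * w for r, w in zip(rates, post))
    sd = math.sqrt(sum((r - mean) ** 2 * w for r, w in zip(rates, post)))
    return mean, sd

# Hypothetical tree ((s1:0.1,s2:0.1):0.2,(s3:0.1,s4:0.1):0.2), rate grid, and uniform prior.
TREE = [([("s1", 0.1), ("s2", 0.1)], 0.2), ([("s3", 0.1), ("s4", 0.1)], 0.2)]
RATES = [0.2, 0.6, 1.0, 1.4, 1.8]
PRIOR = [1.0 / len(RATES)] * len(RATES)

if __name__ == "__main__":
    conserved = {"s1": "G", "s2": "G", "s3": "G", "s4": "G"}
    variable = {"s1": "G", "s2": "A", "s3": "W", "s4": "K"}
    print(posterior_rate(TREE, conserved, RATES, PRIOR))  # low posterior mean rate
    print(posterior_rate(TREE, variable, RATES, PRIOR))   # higher posterior mean rate
```

A fully conserved column pulls the posterior toward slow rates and a column of four different residues pulls it toward fast ones; the tree enters through the branch lengths that each rate multiplies.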
ConSurf 2005: the projection of evolutionary conservation scores of residues on protein structures
Nucleic Acids Research, 2005
Key amino acid positions that are important for maintaining the 3D structure of a protein and/or its function(s), e.g. catalytic activity, binding to ligand, DNA or other proteins, are often under strong evolutionary constraints. Thus, the biological importance of a residue often correlates with its level of evolutionary conservation within the protein family. ConSurf (http://consurf.tau.ac.il/) is a web-based tool that automatically calculates evolutionary conservation scores and maps them on protein structures via a user-friendly interface. Structurally and functionally important regions in the protein typically appear as patches of evolutionarily conserved residues that are spatially close to each other. We present here version 3.0 of ConSurf. This new version includes an empirical Bayesian method for scoring conservation, which is more accurate than the maximum-likelihood method that was used in the earlier release. Various additional steps in the calculation can now be controlled by a number of advanced options, thus further improving the accuracy of the calculation. Moreover, ConSurf version 3.0 also includes a measure of confidence for the inferred amino acid conservation scores.
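The observation that important regions appear as spatial patches of conserved residues can be illustrated with a small graph search. This is a hypothetical sketch, not ConSurf's procedure: it assumes each residue is reduced to a single coordinate (for example its Cα atom), that residues have already been labeled conserved or not by thresholding their scores, and that two conserved residues belong to the same patch when they are linked by a chain of conserved residues each within an arbitrary 7 Å cutoff.

```python
import math
from collections import deque

def conserved_patches(residues, cutoff=7.0):
    """Group conserved residues into spatial patches.

    residues: list of (residue_id, (x, y, z), is_conserved) tuples.
    Conserved residues share a patch if connected by steps no longer than `cutoff`
    (same units as the coordinates).
    """
    conserved = [(rid, xyz) for rid, xyz, flag in residues if flag]
    seen = set()
    patches = []
    for start in range(len(conserved)):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        patch = []
        while queue:  # breadth-first search over the distance graph
            i = queue.popleft()
            patch.append(conserved[i][0])
            for j in range(len(conserved)):
                if j not in seen and math.dist(conserved[i][1], conserved[j][1]) <= cutoff:
                    seen.add(j)
                    queue.append(j)
        patches.append(sorted(patch))
    return patches

# Toy coordinates in angstroms (hypothetical, not from a real structure).
toy = [
    (10, (0.0, 0.0, 0.0), True),
    (11, (3.8, 0.0, 0.0), True),
    (12, (7.6, 0.0, 0.0), False),
    (40, (5.0, 5.0, 0.0), True),
    (80, (30.0, 0.0, 0.0), True),
    (81, (33.8, 0.0, 0.0), True),
]
print(conserved_patches(toy))  # [[10, 11, 40], [80, 81]]
```

Residue 40 joins the first patch through residue 11 even though it is more than 7 Å from residue 10, which is the transitive chaining the cutoff rule implies.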
ConSurf: Using Evolutionary Data to Raise Testable Hypotheses about Protein Function
Israel Journal of Chemistry, 2013
Many mutations disappear from the population because they impair protein function and/or stability. Thus, amino acid positions that are essential for proper function evolve more slowly than others, or in other words, the slow evolutionary rate of a position reflects its importance. ConSurf (http://consurf.tau.ac.il), reviewed in this manuscript, exploits this to reveal key amino acid positions that are important for maintaining the native conformation(s) of the protein and its function, be it binding, catalysis, transport, etc. Given the sequence or 3D structure of the query protein as input, a search for similar sequences is conducted and the sequences are aligned. The multiple sequence alignment is subsequently used to calculate the evolutionary rates of each amino acid site, using Bayesian or maximum-likelihood algorithms. Both algorithms take into account the evolutionary relationships between the sequences, reflected in phylogenetic trees, to alleviate problems due to uneven (biased) sampling in sequence space. This is particularly important when the number of sequences is low. The ConSurf-DB, a new release of which is presented here, provides precalculated ConSurf conservation analysis of nearly all available structures in the Protein Data Bank (PDB). The usefulness of ConSurf for the study of individual proteins and mutations, as well as a range of large-scale, genome-wide applications, is reviewed.
Nucleic Acids Research, 2010
It is informative to detect highly conserved positions in proteins and nucleic acid sequence/structure since they are often indicative of structural and/or functional importance. ConSurf (http://consurf.tau.ac.il) and ConSeq (http://conseq.tau.ac.il) are two well-established web servers for calculating the evolutionary conservation of amino acid positions in proteins using an empirical Bayesian inference, starting from protein structure and sequence, respectively. Here, we present the new version of the ConSurf web server that combines the two independent servers, providing an easier and more intuitive step-by-step interface, while offering the user more flexibility during the process. In addition, the new version of ConSurf calculates the evolutionary rates for nucleic acid sequences. The new version is freely available at: http://consurf.tau.ac.il/.
Bioinformatics, 2002
Motivation: A number of proteins of known three-dimensional (3D) structure have as yet unknown function. In light of the recent progress in structure determination methodology, this number is likely to increase rapidly. A novel method is presented here: 'Rate4Site', which maps the rate of evolution among homologous proteins onto the molecular surface of one of the homologues whose 3D structure is known. Functionally important regions often correspond to surface patches of slowly evolving residues. Results: Rate4Site estimates the rate of evolution of amino acid sites using the maximum likelihood (ML) principle. The ML estimate of the rates considers the topology and branch lengths of the phylogenetic tree, as well as the underlying stochastic process. To demonstrate its potency, we study the Src SH2 domain. Like previously established methods, Rate4Site detected the SH2 peptide-binding groove. Interestingly, it also detected inter-domain interactions between the SH2 domain and the rest of the Src protein that other methods failed to detect.
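The maximum-likelihood principle can be sketched for a single alignment column. As simplifying assumptions, not Rate4Site's actual setup, this uses a star tree with given branch lengths instead of an inferred topology, the Poisson substitution model, and golden-section search over a hypothetical rate interval. The example also exposes a property of per-site ML: for a fully conserved column the likelihood keeps rising as the rate shrinks, so the estimate collapses onto the lower bound of the search interval.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
N = len(ALPHABET)

def p_transition(same, t):
    """Poisson model: P(no change), or P(change to one specific state), over branch length t."""
    e = math.exp(-N / (N - 1) * t)
    return 1.0 / N + (N - 1) / N * e if same else (1.0 - e) / N

def star_log_likelihood(column, lengths, rate):
    """Log-likelihood of one column on a star tree; each branch length is scaled by the site rate."""
    total = 0.0
    for root in ALPHABET:
        prob = 1.0 / N  # uniform stationary frequency of the root state
        for aa, t in zip(column, lengths):
            prob *= p_transition(aa == root, rate * t)
        total += prob
    return math.log(total)

def golden_section_max(f, lo, hi, tol=1e-6):
    """Maximize a unimodal function f on [lo, hi]."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c, d = b - g * (b - a), a + g * (b - a)
        if f(c) > f(d):
            b = d
        else:
            a = c
    return (a + b) / 2

def ml_rate(column, lengths, lo=1e-3, hi=20.0):
    """ML site rate within a hypothetical search interval [lo, hi]."""
    return golden_section_max(lambda r: star_log_likelihood(column, lengths, r), lo, hi)

if __name__ == "__main__":
    lengths = [0.3] * 6                      # hypothetical branch lengths
    print(ml_rate("GGGGGG", lengths))        # collapses to the lower bound
    print(ml_rate("GGGGAS", lengths))        # interior estimate
```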
Journal of Molecular Biology, 2001
Experimental approaches for the identification of functionally important regions on the surface of a protein involve mutagenesis, in which exposed residues are replaced one after another while the change in binding to other proteins or changes in activity are recorded. However, practical considerations limit the use of these methods to small-scale studies, precluding a full mapping of all the functionally important residues on the surface of a protein. We present here an alternative approach involving the use of evolutionary data in the form of multiple-sequence alignment for a protein family to identify hot spots and surface patches that are likely to be in contact with other proteins, domains, peptides, DNA, RNA or ligands. The underlying assumption in this approach is that key residues that are important for binding should be conserved throughout evolution, just like residues that are crucial for maintaining the protein fold, i.e. buried residues. A main limitation in the implementation of this approach is that the sequence space of a protein family may be unevenly sampled, e.g. mammals may be overly represented. Thus, a seemingly conserved position in the alignment may reflect a taxonomically uneven sampling, rather than being indicative of structural or functional importance. To avoid this problem, we present here a novel methodology based on evolutionary relations among proteins as revealed by inferred phylogenetic trees, and demonstrate its capabilities for mapping binding sites in SH2 and PTB signaling domains. A computer program that implements these ideas is available freely at: http://ashtoret.tau.ac.il/~rony
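The sampling bias described above is easy to reproduce on a toy alignment. The sketch below does not use the phylogenetic-tree method of this paper; it swaps in a simpler, well-known correction, Henikoff-style position-based sequence weights, purely to show how down-weighting redundant sequences lowers an inflated conservation estimate. The alignment is hypothetical and assumed gap-free.

```python
from collections import Counter

def henikoff_weights(alignment):
    """Position-based sequence weights (Henikoff style), normalized to sum to 1.

    At each column, a sequence with residue a gets 1 / (r * n_a), where r is the number of
    distinct residues in the column and n_a the number of sequences sharing a.
    Assumes equal-length, gap-free sequences.
    """
    weights = [0.0] * len(alignment)
    for column in zip(*alignment):
        counts = Counter(column)
        for i, aa in enumerate(column):
            weights[i] += 1.0 / (len(counts) * counts[aa])
    total = sum(weights)
    return [w / total for w in weights]

def residue_frequency(alignment, pos, residue, weights=None):
    """(Weighted) fraction of sequences carrying `residue` at column `pos`."""
    if weights is None:
        weights = [1.0 / len(alignment)] * len(alignment)
    return sum(w for seq, w in zip(alignment, weights) if seq[pos] == residue)

# Hypothetical toy alignment: four near-identical "mammalian" sequences, two divergent ones.
alignment = ["MKVLA", "MKVLA", "MKVLA", "MKVLS", "MRILT", "MQAVT"]
w = henikoff_weights(alignment)
print(round(residue_frequency(alignment, 1, "K"), 3))     # 0.667 unweighted
print(round(residue_frequency(alignment, 1, "K", w), 3))  # 0.48 weighted
```

Lysine at the second position looks conserved in two thirds of the sequences only because the mammalian copies are over-represented; once the redundant sequences share their weight, its frequency drops below one half.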
Bioinformatics, 2004
A web-based application for analyzing amino acid conservation in proteins, Consensus Sequence (ConSSeq), is presented. ConSSeq graphically represents information about amino acid conservation based on the sequence alignments reported in homology-derived structures of proteins. Beyond the relative entropy for each position in the alignment, ConSSeq also presents the consensus sequence and indicates which amino acids predominate at each position of the alignment. ConSSeq is part of the STING Millennium Suite and is implemented as a Java applet.
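The two per-position quantities named above, relative entropy and the consensus residue, can be computed directly from an alignment. This is a minimal sketch; ConSSeq's own background distribution and gap handling are not stated here, so the uniform background over the 20 amino acids and the skipping of gaps are assumptions.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def relative_entropy(column, background=None):
    """Relative entropy (bits) of one alignment column against a background distribution.

    Gaps and non-standard symbols are ignored; the column must contain at least one residue.
    """
    residues = [aa for aa in column if aa in ALPHABET]
    if background is None:  # assumption: uniform background over the 20 amino acids
        background = {aa: 1.0 / len(ALPHABET) for aa in ALPHABET}
    n = len(residues)
    return sum((c / n) * math.log2((c / n) / background[aa])
               for aa, c in Counter(residues).items())

def consensus(alignment):
    """Most frequent residue in each column (ties go to the residue seen first)."""
    return "".join(
        Counter(aa for aa in col if aa in ALPHABET).most_common(1)[0][0]
        for col in zip(*alignment)
    )

print(consensus(["MKV", "MRV", "MKI"]))  # MKV
print(round(relative_entropy("MMM"), 3))  # 4.322, i.e. log2(20) for a fully conserved column
```

A fully conserved column reaches the maximum of log2(20) bits against a uniform background, while a column using every amino acid equally scores zero.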
The PSSH database of alignments between protein sequences and tertiary structures
We introduce the PSSH (Protein Sequence-to-Structure Homologies) database derived from HSSP2, an improved version of the HSSP (Homology-derived Secondary Structure of Proteins) database [Dodge et al. (1998) Nucleic Acids Res., 26, 313-315]. Whereas each HSSP entry lists all protein sequences related to a given 3D structure, PSSH is the inverse, with each entry listing all structures related to a given sequence. In addition, we introduce two other derived databases: HSSPchain, in which each entry lists all sequences related to a given PDB chain, and HSSPalign, in which each entry gives details of one sequence aligned onto one PDB chain. This reorganization makes it easier to navigate from sequence to structure, and to map sequence features onto 3D structures. Currently (September 2002), PSSH provides structural information for over 400 000 protein sequences, covering 48% of SWALL and 61% of SWISS-PROT sequences; HSSPchain provides sequence information for over 25 000 PDB chains, and HSSPalign gives over 14 million sequence-to-structure alignments. The databases can be accessed via SRS 3D, an extension to the SRS system, at http://srs3d.ebi.ac.uk/
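The inversion that turns structure-centred entries into sequence-centred ones is, at its core, an inverted index. The sketch below is hypothetical: it keys entries by chain (as in HSSPchain), uses made-up identifiers, and ignores the alignment details that real entries carry.

```python
from collections import defaultdict

def invert_to_pssh(hssp_chain):
    """Turn a chain -> [sequence IDs] mapping into sequence ID -> [chains]."""
    pssh = defaultdict(list)
    for chain, seq_ids in hssp_chain.items():
        for seq_id in seq_ids:
            pssh[seq_id].append(chain)
    return {seq_id: sorted(chains) for seq_id, chains in pssh.items()}

def to_align_records(hssp_chain):
    """Flatten into one (sequence ID, chain) record per alignment, HSSPalign-style."""
    return sorted((seq_id, chain) for chain, seq_ids in hssp_chain.items() for seq_id in seq_ids)

# Hypothetical identifiers, not real PDB or UniProt accessions.
hssp_chain = {
    "chain1": ["seqA", "seqB"],
    "chain2": ["seqB", "seqC"],
}
print(invert_to_pssh(hssp_chain))
# {'seqA': ['chain1'], 'seqB': ['chain1', 'chain2'], 'seqC': ['chain2']}
print(len(to_align_records(hssp_chain)))  # 4
```

The same set of sequence-to-chain pairs supports all three views; only the grouping key changes.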