Protein evolution along phylogenetic histories under structurally constrained substitution models

Substitution rates predicted by stability-constrained models of protein evolution are not consistent with empirical data

Molecular Biology and Evolution, 2017

Protein structures strongly influence molecular evolution. In particular, the evolutionary rate of a protein site depends on the number of its native contacts. Stability-constrained models of protein evolution consider this influence of protein structure on evolution by predicting the effect of mutations on the stability of the native state, but they currently neglect how mutations affect the protein structure. These models predict that buried protein sites with more native contacts are more constrained by natural selection and less variable, as observed. Nevertheless, previous work did not consider the stability against compact misfolded conformations, although it is known that the negative design that destabilizes these misfolded conformations influences protein evolution significantly. Here we show that stability-constrained models that consider misfolding predict that site-specific sequence entropy and substitution rate peak at amphiphilic sites with an intermediate number of co...
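The site-specific sequence entropy mentioned in this abstract can be illustrated with a minimal sketch. It assumes the usual Shannon entropy of the amino acid frequencies observed in one alignment column; the function name `site_entropy` and the choice to skip gap characters are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter

def site_entropy(column, gap="-"):
    """Shannon entropy (in nats) of the residues in one alignment column.

    Assumption: gap characters are ignored, and a column containing
    only gaps is given entropy 0.
    """
    counts = Counter(residue for residue in column if residue != gap)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts.values())
```

A fully conserved column gives zero entropy, while a column split evenly between two residues gives ln 2; more variable sites score higher.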

Modeling Evolution at the Protein Level Using an Adjustable Amino Acid Fitness Model

Biocomputing 2000, 1999

An adjustable fitness model for amino acid site substitutions is investigated. This model, a generalization of previously developed evolutionary models, has several distinguishing characteristics: it separately accounts for the processes of mutation and substitution, allows for heterogeneity among substitution rates and among evolutionary constraints, and does not make any prior assumptions about which sites or characteristics of proteins are important to molecular evolution. While the model has fewer adjustable parameters than the general reversible mtREV model, when optimized it outperforms mtREV in likelihood analysis on protein-coding mitochondrial genes. In addition, the optimized fitness parameters of the model show correspondence to some biophysical characteristics of amino acids.
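One common way to account separately for mutation and substitution is to multiply a mutation rate by a relative fixation probability that depends on the fitness difference between the two amino acids. The sketch below uses the standard weak-mutation form S / (1 - exp(-S)); this is an assumption for illustration, since the abstract does not state the model's exact functional form, and the names `fixation_factor` and `substitution_rate` are hypothetical.

```python
import math

def fixation_factor(s):
    """Fixation rate relative to a neutral mutation, for a scaled
    selection coefficient s. Equals 1 when s == 0."""
    if s == 0.0:
        return 1.0
    # -expm1(-s) equals 1 - exp(-s) and stays accurate for small |s|.
    return s / -math.expm1(-s)

def substitution_rate(mu_ij, fitness_i, fitness_j):
    """Rate of substitution i -> j: mutation rate times fixation factor.

    Assumption: fitness values are already scaled by population size,
    so their difference acts as the scaled selection coefficient.
    """
    return mu_ij * fixation_factor(fitness_j - fitness_i)
```

With symmetric mutation rates this form satisfies detailed balance with equilibrium frequencies proportional to exp(fitness), which is why fitness parameters can be fitted by likelihood in the same way as the parameters of a reversible rate matrix.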

A Model of Substitution Trajectories in Sequence Space and Long-Term Protein Evolution

Molecular Biology and Evolution, 2014

What drives protein evolution is still being debated. One hypothesis states that the rate and extent of protein evolution can only be explained if amino acid substitutions change the fitness effects of amino acids at other sites. Using a simple model of protein evolution we show that the rate and extent of long-term protein sequence divergence can be explained if interactions between amino acid sites are taken into account. The conclusion that amino acid substitutions strongly affect the direction of evolution at other sites explains many patterns in long-term protein evolution and has important consequences for our understanding of protein function and protein engineering.

Biophysics of protein evolution and evolutionary protein biophysics

The study of molecular evolution at the level of protein-coding genes often entails comparing large datasets of sequences to infer their evolutionary relationships. Despite the importance of a protein's structure and conformational dynamics to its function and thus its fitness, common phylogenetic methods embody minimal biophysical knowledge of proteins. To underscore the biophysical constraints on natural selection, we survey effects of protein mutations, highlighting the physical basis for marginal stability of natural globular proteins and how requirement for kinetic stability and avoidance of misfolding and misinteractions might have affected protein evolution. The biophysical underpinnings of these effects have been addressed by models with an explicit coarse-grained spatial representation of the polypeptide chain. Sequence–structure mappings based on such models are powerful conceptual tools that rationalize mutational robustness, evolvability, epistasis, promiscuous function performed by ‘hidden’ conformational states, resolution of adaptive conflicts and conformational switches in the evolution from one protein fold to another. Recently, protein biophysics has been applied to derive more accurate evolutionary accounts of sequence data. Methods have also been developed to exploit sequence-based evolutionary information to predict biophysical behaviours of proteins. The success of these approaches demonstrates a deep synergy between the fields of protein biophysics and protein evolution.

The evolution dynamics of model proteins

The Journal of Chemical Physics, 2004

Explicit simulations of protein evolution, where protein chains are described at a molecular, although simplified, level, provide important information to understand the similarities found to exist between known proteins. The results of such simulations suggest that a number of evolutionary-related quantities, such as the distribution of sequence similarity for structurally similar proteins, are controlled by evolutionary kinetics and do not reflect an equilibrium state. An important result for phylogeny is that a subset of the residues of each protein evolves on a much larger time scale than the other residues.

Protein evolution within and between species

Journal of Theoretical Biology, 2007

Background: Protein evolution is particularly shaped by the conservation of the amino acids' physico-chemical properties and the structure of the genetic code. While conservation is the result of negative selection against proteins with reduced functionality, the codon sequences determine the stochastic aspect of amino acid exchanges. Thus far, it is known that the genetic code is the dominant factor if little time has elapsed since the divergence of one gene into two, but physicochemical forces gain importance at greater evolutionary distances. Further details, however, on how the influence of these factors varies with time are unknown to date.

The interface of protein structure, protein biophysics, and molecular evolution

Protein Science, 2012

The interface of protein structural biology, protein biophysics, molecular evolution, and molecular population genetics forms the foundations for a mechanistic understanding of many aspects of protein biochemistry. Current efforts in interdisciplinary protein modeling are in their infancy, and the state of the art of such models is described. Beyond the relationship between amino acid substitution and static protein structure, protein function, and corresponding organismal fitness, other considerations are also discussed. More complex mutational processes such as insertion and deletion and domain rearrangements and even circular permutations should be evaluated. The role of intrinsically disordered proteins is still controversial, but may be increasingly important to consider. Protein geometry and protein dynamics as a deviation from static considerations of protein structure are also important. Protein expression level is known to be a major determinant of evolutionary rate and several considerations including selection at the mRNA level and the role of interaction specificity are discussed. Lastly, the relationship between modeling and needed high-throughput experimental data as well as experimental examination of protein evolution using ancestral sequence resurrection and in vitro biochemistry are presented, towards an aim of ultimately generating better models for biological inference and prediction.

A protein evolution model with independent sites that reproduces site-specific amino acid distributions from the Protein Data Bank

BMC Evolutionary Biology, 2006

Since thermodynamic stability is a global property of proteins that has to be conserved during evolution, the selective pressure at a given site of a protein sequence depends on the amino acids present at other sites. However, models of molecular evolution that aim at reconstructing the evolutionary history of macromolecules become computationally intractable if such correlations between sites are explicitly taken into account. We introduce an evolutionary model with sites evolving independently under a global constraint on the conservation of structural stability. This model consists of a selection process, which depends on two hydrophobicity parameters that can be computed from protein sequences without any fit, and a mutation process for which we consider various models. It reproduces quantitatively the results of Structurally Constrained Neutral (SCN) simulations of protein evolution in which the stability of the native state is explicitly computed and conserved. We then compare...

SIMPROT: using an empirically determined indel distribution in simulations of protein evolution

BMC Bioinformatics, 2005

General protein evolution models help determine the baseline expectations for the evolution of sequences, and they have been extensively useful in sequence analysis and for the computer simulation of artificial sequence data sets. We have developed a new method of simulating protein sequence evolution, including insertion and deletion (indel) events in addition to amino-acid substitutions. The simulation generates both the simulated sequence family and a true sequence alignment that captures the evolutionary relationships between amino acids from different sequences. Our statistical model for indel evolution is based on the empirical indel distribution determined by Qian and Goldstein. We have parameterized this distribution so that it applies to sequences diverged by varying evolutionary times and generalized it to provide flexibility in simulation conditions. Our method uses a Monte-Carlo simulation strategy, and has been implemented in a C++ program named Simprot. Simprot will be...
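To make the idea of a Monte-Carlo simulation with both substitutions and indels concrete, here is a toy sketch. It is not Simprot's algorithm: it uses uniform amino acid replacement and a geometric indel length as stand-ins for a real substitution matrix and for the empirical Qian and Goldstein distribution, and it does not track the true alignment. The function `evolve` and all of its parameters are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def evolve(seq, n_events, indel_prob=0.1, mean_indel_len=2.0, rng=None):
    """Apply n_events random events (substitution or indel) to seq.

    Assumptions: uniform replacement residues (a draw may repeat the
    current residue) and geometrically distributed indel lengths.
    """
    if rng is None:
        rng = random.Random()
    residues = list(seq)
    for _ in range(n_events):
        if rng.random() < indel_prob:
            # Geometric length with mean mean_indel_len (minimum 1).
            length = 1
            while rng.random() > 1.0 / mean_indel_len:
                length += 1
            if rng.random() < 0.5 or not residues:
                # Insertion of random residues at a random position.
                pos = rng.randint(0, len(residues))
                residues[pos:pos] = [rng.choice(AMINO_ACIDS) for _ in range(length)]
            else:
                # Deletion, truncated at the end of the sequence.
                pos = rng.randrange(len(residues))
                del residues[pos:pos + length]
        elif residues:
            pos = rng.randrange(len(residues))
            residues[pos] = rng.choice(AMINO_ACIDS)
    return "".join(residues)
```

Passing a seeded `random.Random` instance makes a run reproducible, for example `evolve("ACDEFGHIKL", 50, rng=random.Random(7))`. A fuller simulator would apply such events along every branch of a tree and record which residues are homologous, so that the true alignment is known.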