Neutral networks in protein space: a computational study based on knowledge-based potentials of mean force
Background: Many protein sequences, often unrelated, adopt similar folds. Sequences folding into the same shape thus form subsets of sequence space. The shape and the connectivity of these sets have implications for protein evolution and de novo design.
Results: We investigate the topology of these sets for some proteins with known three-dimensional structure using inverse folding techniques. First, we find that sequences adopting a given fold do not cluster in sequence space and that there is no detectable sequence homology among them. Nevertheless, these sequences are connected: every sequence can be reached from every other sequence along a path on which the fold remains unchanged. We find similar results for restricted amino acid alphabets in some cases (e.g., ADLG). In other cases, it seems impossible to find sequences with native-like behavior (e.g., QLR). These findings seem to be independent of the particular structure considered.
Conclusions: Amino acid sequences folding into a common shape are distributed homogeneously in sequence space. Hence, the connectivity of the set of these sequences implies the existence of very long neutral paths on all examined protein structures. Regarding protein design, these results imply that sequences with more or less arbitrary chemical properties can be attached to a given structural framework. But we also observe that designability varies significantly among native structures. These features of protein sequence space are similar to what has been found for nucleic acids.
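The notion of a neutral path used above can be illustrated with a minimal sketch. It is not the method of the study: the fold test `toy_fold_ok` is a hypothetical stand-in (a buried/exposed hydrophobicity pattern) for the knowledge-based potentials of mean force, and the names `neutral_walk`, `pattern`, and `HYDROPHOBIC` are assumptions made for this example only. The walk accepts a point mutation only if the toy fold test still passes and the Hamming distance to the starting sequence grows, so the length of the path measures how far a sequence can drift while the (toy) fold is preserved.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: a crude hydrophobic class, chosen only for this toy example.
HYDROPHOBIC = set("AVILMFWC")


def toy_fold_ok(seq, pattern):
    """Hypothetical fold test: buried sites ('B') must hold a hydrophobic
    residue, exposed sites ('E') must hold a non-hydrophobic one."""
    return all((aa in HYDROPHOBIC) == (site == "B")
               for aa, site in zip(seq, pattern))


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def neutral_walk(start, pattern, alphabet=AMINO_ACIDS, steps=2000, seed=0):
    """Return a neutral path: each step is one point mutation that keeps
    the toy fold test true and increases the distance to `start`."""
    rng = random.Random(seed)
    current = start
    path = [start]
    for _ in range(steps):
        pos = rng.randrange(len(current))
        candidate = current[:pos] + rng.choice(alphabet) + current[pos + 1:]
        if (toy_fold_ok(candidate, pattern)
                and hamming(candidate, start) > hamming(current, start)):
            current = candidate
            path.append(current)
    return path


if __name__ == "__main__":
    start, pattern = "AVKELLDS", "BBEEBBEE"
    path = neutral_walk(start, pattern)
    print(path[-1], "distance from start:", hamming(path[-1], start))
    # Same walk on a restricted alphabet (ADLG, as mentioned in the abstract).
    restricted = neutral_walk("ALDGAADG", pattern, alphabet="ADLG")
    print(restricted[-1], "distance:", hamming(restricted[-1], "ALDGAADG"))
```

Passing a restricted `alphabet` mimics the reduced-alphabet experiments only in spirit; with a real inverse folding score in place of `toy_fold_ok`, whether long paths exist would depend on the structure and the alphabet.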