
doi:10.1006/jmbi.2001.5226, available online at http://www.idealibrary.com

J. Mol. Biol. (2002) 315, 479–484

Why Are Proteins So Robust To Site Mutations?


Darin M. Taverna¹ and Richard A. Goldstein²*

¹Biophysics Research Division and ²Department of Chemistry, University of Michigan, Ann Arbor, MI 48109-1055, USA

*Corresponding author

There have been repeated observations that proteins are surprisingly robust to site mutations, enduring significant numbers of substitutions with little change in structure, stability, or function. These results are almost paradoxical in light of what is known about random heteropolymers and the sensitivity of their properties to seemingly trivial mutations. To address this discrepancy, the preservation of biological protein properties in the presence of mutation has been interpreted as indicating the independence of selective pressure on such properties. Such results also lead to the prediction that de novo protein design should be relatively easy, in contrast to what is observed. Here, we use a computational model with lattice proteins to demonstrate how this robustness can result from population dynamics during the evolutionary process. As a result, sequence plasticity may be a characteristic of evolutionarily derived proteins and not necessarily a property of designed proteins. This suggests that this robustness must be re-interpreted in evolutionary terms, and has consequences for our understanding of both in vivo and in vitro protein evolution.

© 2002 Academic Press

Keywords: site substitution; mutagenesis; molecular evaluation; protein stability; protein folding

Present address: D. Taverna, Protein Pathways, Inc., 1145 Gayley Ave., Suite 304, Los Angeles, CA 90024, USA.

E-mail address of the corresponding author: richard@umich.edu

Introduction

There has been much interest in probing the relationship between a protein's sequence and its resulting structural, thermodynamic, and functional properties. It is hoped that insights resulting from these pursuits will lead to the ability to predict protein properties based on sequence information as well as how these properties could be altered by changes in the sequence. Such insights are also crucial in developing the ability to design proteins with prescribed or altered structures, stabilities, and functionalities.

One of the major methods of investigating the relationship of protein sequence to the corresponding properties is to alter naturally occurring proteins through site mutagenesis. Often the substitution is chosen so as to modify a specific interaction, although more exhaustive and random substitutions have been studied. One of the surprising results of such studies is the robustness of protein properties to mutations. Although most site substitutions are destabilizing, many result in essentially unchanged stabilities, and a significant fraction of mutations actually result in increased stability over the wild type (e.g. see Reddy et al. [1]). The conclusion drawn from these studies is that there is an inherent robustness to the mapping of sequence to structure, and that sequence space consists of large regions of possible sequences corresponding to proteins with essentially equivalent properties. The general level of sequence plasticity has also led researchers to conclude that the robust properties must not be under active selection during evolution [2]. This plasticity provides optimism for de novo protein design, in that it indicates that there are large numbers of amino acid sequences consistent with a given stable structure; the ability of a protein to fold despite changed interactions means that the interactions do not have to be formulated precisely in advance. Protein design may correspond to finding a needle in a haystack, but at least it is a quite sizable needle.

In contrast, de novo engineering of well-packed proteins has proven "surprisingly difficult" [3]. Why does the sequence plasticity observed in site-directed mutagenesis not translate into ease in protein engineering? Perhaps we are interpreting the results of these mutagenesis experiments in the wrong context. Proteins are the result of a long

0022-2836/02/030479±6 $35.00/0 # 2002 Academic Press



evolutionary process, involving the dynamics of a population cluster, or pseudospecies, in sequence space. During the past few years we (among others) have looked at how population-based evolutionary dynamics can alter the structures, functions, and thermodynamics of proteins and other biological macromolecules [4–10].

Here, we use a simple computational model to demonstrate how population dynamics can explain why proteins are so robust to changes in sequence. We describe simulations of the evolution of lattice proteins, using two different models. In one model, a single sequence performs a random walk in the space of all sufficiently stable sequences. In the second model, we represent the evolution of a population of lattice proteins as they undergo random mutagenesis, reproduction, and death. Proteins that emerge during the population simulations have a robustness to sequence change similar to that found in biological proteins. In contrast, proteins derived from the single-sequence random walk, even with the same structure and stability, are extremely fragile to sequence changes. The evolution of robust sequences proceeds without any explicit evolutionary pressure for robustness, but rather results directly from the population dynamics. In contrast, de novo sequence design must deal with the more random set of possible sequences. These results suggest that the observed sequence plasticity of biological proteins may occur because proteins have evolved to be robust to these specific experiments. If so, we may need to revise the conclusions based on these observations.

Results

We use a computational model of proteins consisting of 25 residues confined to a maximally compact, two-dimensional lattice. In addition to studying the properties of protein sequences chosen at random, we implement two different dynamic models of sequence change. In the first model, we allow a single sequence to perform a random walk in sequence space among all viable sequences. Sequences are considered viable if the native state of the protein remains fixed and if it remains sufficiently stable, that is, with a free energy of folding, ΔG_folding, smaller than some fixed parameter, ΔG_crit. In the second model, we consider a population of such proteins where the sequences undergo random mutagenesis, death, and random reproduction, again with the constraint that all proteins must fold into a constant native state and remain sufficiently stable in order to survive to the next generation. Simulations are performed for both models five times for each of five values of ΔG_crit (0.0, −0.5, −1.0, −1.5, −2.0). The robustness of the resulting sequences to mutation is monitored by recording the change in stability (ΔΔG_mut) as a function of the protein's stability prior to the mutation (ΔG_wt).

In Figure 1, we compare the destabilizing probability for a mutation (the probability that ΔΔG_mut > 0) made in a sequence chosen at random with the destabilizing probability of a mutation in a sequence resulting from population evolution (with ΔG_crit = 0.0) as a function of ΔG_wt. As can be seen, the sequences resulting from the population evolution have a much smaller probability of having a destabilizing mutation compared to random sequences with identical initial stabilities.

Figure 2 shows the resulting distributions of ΔΔG_mut for the two different types of evolutionary simulations for three different values of ΔG_crit. In the case of sequences performing a random walk, most of the mutations lead to reduced stabilities close to the average of the random sequence distribution; only 0.04% to 0.4% of mutations result in increased stability (depending upon the value of ΔG_crit). Conversely, in the case of the population trials, there is an appreciable probability (18% to 28%) of a mutation being stabilizing, with the most likely change in stability near zero, especially for highly stable proteins with large negative values of ΔG_crit.

Since our population evolution sequences are mutated with a Poisson distribution, we have a decreasing probability that there will be multiple mutations of a particular sequence in the population. Figure 3 presents the average probability that a multiple mutation will be destabilizing compared with single mutations, under the constraints of a specified ΔG_crit. Note that even significant sequence changes (three out of 25 residues) in the proteins resulting from population evolution have a non-negligible chance of resulting in increased stabilization.

Figure 1. Probability of a destabilizing mutation (P(ΔΔG_mut > 0)) from sequences resulting from population evolution with ΔG_crit = 0 (continuous line) is compared with random sequences (broken line), as a function of the original stability of the unmutated protein ΔG_wt. The destabilization probability for stable random sequences (with ΔG_wt < 0) is close to unity.

Figure 2. Density distribution of ΔΔG_mut from model proteins undergoing population (continuous line) and single sequence evolution (broken line), for various values of ΔG_crit.

Figure 3. Probability of destabilizing mutation from model proteins undergoing population evolution, according to the number of point mutations, as a function of ΔG_crit (thin continuous line). The average rate for all mutations (thick continuous line) and the high rate of destabilizing mutations for single sequence evolution (broken line) are included for comparison.

Discussion

It is not surprising that most mutations in biological proteins result in decreased protein stability. If we consider the fact that most random sequences of amino acids do not have a stable folded state, then any mutation in one of the few
viable sequences with a stable ground state would most likely move the stability in the direction of the more random sequences; that is, towards being less stable.

The surprise is rather that experimental mutations have a significant probability of resulting in unchanged or increased stability. The exact percentages of mutations that are stabilizing vary according to the protein and the nature of the substitutions, ranging from approximately 8% in mutations of barnase [11] and staphylococcal nuclease [12–14] designed to eliminate specific interactions, to 17% in interior locations of myoglobin [15], 20% of non-Ala locations in Arc repressor [16], and 29% for two specific solvent-exposed locations in phage T4 lysozyme [17]. A more comprehensive set of 356 site-directed mutations compiled from the literature by Reddy and co-workers showed that 25% of the mutations increased protein stability [1]. While the specifics vary, these results are sufficiently consistent to conclude that robustness would seem to be a general characteristic of systems that have come into being through the Darwinian evolution of populations. The range of these experimental results is close to the 18% to 28% (depending upon ΔG_crit) we observed for our lattice proteins evolving through population dynamics, but far from the 0.04% to 0.4% observed for random-walk sequences with comparable ΔG_crit.

Note that this robustness occurs in the absence of any selective pressure towards robustness. The various sequences in the model with ΔG_folding < ΔG_crit have equal fitness and equal probability of contributing to the next generation. Robustness towards mutations is just one of a number of properties that emerge from neutral evolution in sequence space, as has been emphasized by a number of authors [4,5,7,10,18–24].

So how can evolution, where robustness is not an explicit selection criterion, result in such unexpectedly robust proteins? Insight into this phenomenon comes from the pioneering work by Eigen [25]. In analytical studies of RNA evolution, he found that evolution selected for a network of genotypes, what he called quasispecies. The relevant fitness of the quasispecies is a function of the fitness of all of the genotypes, so the population of any one genotype would be enhanced by being surrounded by fit neighbors. This effect depends on the possibility of back-mutations: if one genotype contributes to a neighboring genotype in one generation, there is a probability that the neighboring genotype will return the favor in a future generation. Through this mechanism, population dynamics result in an evolutionary selection of genotypes biased by the fitness of their neighbors; that is, on their robustness to mutations. In the sense of fitness landscapes, nature may choose broad fitness plateaus of well-connected neighbors even in the presence of higher, yet poorly connected fitness peaks. This evolutionary heritage is encoded in the genotype, resulting in a sequence plasticity that distinguishes these sequences from random sequences chosen to have the same phenotype. Bornberg-Bauer & Chan, for instance, found that evolutionary dynamics would result in a bias in the population towards "prototype" sequences with the maximum number of "neutral neighbors" [6]. The work described here concentrates on protein stability, but it should be true of any protein property that is important for survival of the organism. This evolutionary trend towards robustness may be a general characteristic of biological systems [26,27].

There are a number of important consequences of this effect. Firstly, the lessons of sequence plasticity in biological proteins may be inapplicable to artificially designed proteins. It may be necessary to

have a de novo sequence exquisitely designed to have properties similar to biological proteins. This also suggests that taking advantage of the observed robustness by modifying existing proteins may be a more effective route. Alternatively, in vitro evolution studies may provide proteins with the same degree of sequence plasticity as natural proteins. More optimistically, proteins may have compromised possible interactions and properties in developing this robustness, which suggests that more effective if less robust proteins may be available.

In addition, these results suggest that the observed sequence plasticity may have non-obvious consequences for our understanding of proteins and their evolution. For instance, Baker and co-workers observed that sequence changes in the IgG binding domain of protein L often resulted in proteins that folded faster than the wild-type protein, and concluded that this indicates that the folding rate is not under strong selective pressure [2]. The model presented here results in the opposite conclusion, that properties of the protein under stronger selective pressure are more likely to be "buffered" and thus robust to mutations. In other words, robustness to site mutations would paradoxically be an indication of stronger selective pressure on these characteristics.

Finally, we note that there is growing interest in the relationship between robustness and evolvability; that is, between the ability to buffer genotypic variations and the ability of an organism to modify to new situations and environments [28]. If so, the tendency of population dynamics to increase sequence plasticity might have had significant impact on the evolutionary process, including the development of new functionalities of existent proteins.

Methods

We consider a highly simplified representation of evolving proteins. Our model proteins consist of chains of n = 25 monomers, confined to a 5 × 5 two-dimensional, maximally compact square lattice with each monomer located at one lattice point. This provides us with 1081 possible conformations represented by the 1081 self-avoiding walks on this lattice, neglecting structures related by rotation, reflection, or inversion. The non-compact states were neglected in order to allow for a reasonable number of stable sequences. Alternatively, we would expect the non-compact states to be negligible as long as the contact energies were sufficiently attractive. The fact that most protein structures are reasonably compact makes this assumption not too unreasonable. There are important differences between the two-dimensional and three-dimensional models, especially in folding simulations where the two-dimensional conformation space may not be ergodic [29,30]. While these limitations are critical in folding simulations, we are more interested in the mapping of sequence to structure rather than how the sequence folds to that given structure; the thermodynamic properties described below involve sums over states and should be less affected by the dimensionality of the model. We use the two-dimensional model to provide a more realistic ratio of buried to exposed sites.

We assume that the energy of any sequence in conformation k is given by a simple contact energy of the form:

$$E = \sum_{i<j} \gamma(A_i, A_j)\, U_{ij}^{k} \tag{1}$$

Here, U^k_ij is equal to 1 if residues i and j are not covalently connected but are on adjacent lattice sites in conformation k (and 0 otherwise), and γ(A_i, A_j) is the contact energy between amino acid A_i at location i and A_j at location j in the sequence. We use the contact energies derived by Miyazawa & Jernigan based on a statistical analysis of the database of known proteins that implicitly includes the effect of interactions of the protein with the solvent [31]. In our simplified proteins, there are 132 pairs of residues that can possibly come into contact, with 16 of these contacts present in any given compact structure.

Using equation (1) we can calculate the energy of a given protein sequence in all 1081 possible conformations. We make the assumption that the thermodynamic hypothesis is obeyed and that the lowest-energy structure is the native state [32]; the other 1080 possible structures represent the ensemble of unfolded states.

Not all possible protein sequences are viable. In general, a protein must fulfil a number of conditions relating to stability, functionality, and foldability. Here, we concentrate on stability. For each sequence, we calculate the free energy of folding:

$$\Delta G_{\mathrm{folding}} = E_f + kT \ln\left(Z - \exp(-E_f/kT)\right) \tag{2}$$

where E_f is the energy of the native state and Z is the partition function. (For the Miyazawa-Jernigan potential, we use kT = 0.6.) We consider a sequence as representing a viable protein as long as its ΔG_folding is less than some specified ΔG_crit.

We implement two different dynamic models of sequence change. In the first model, we choose a sequence at random and make point mutations until we arrive at a sufficiently stable protein sequence. Starting with this initial stable sequence, residue positions are randomly mutated with the number of mutations chosen from a Poisson distribution with an average of 0.002 mutations per amino acid residue per generation. With this low mutation rate, multiple mutations are rare (the ratio of single mutants to multiple mutants is 200). We calculate the stability of the new sequence; if ΔG_folding is larger than ΔG_crit or the structure has changed, the mutation is rejected and the original sequence retained. Generations where no mutations occurred are not counted. This allows the single sequence to diffuse randomly over the range of acceptable sequences, analogous to random-walk models in which a particle has average zero velocity when a boundary is encountered. This is done five times for each of five values of ΔG_crit (0.0, −0.5, −1.0, −1.5, −2.0). Sequences that arise during these runs are probed for robustness to mutations. We make mutations in the sequence with a Poisson distribution with mean 0.002 mutation per amino acid residue, maintaining a constant rare rate of multiple mutations. We then calculate the probability that a mutation results in a given change in stability (ΔΔG_mut) as a function of the stability prior to the mutation (ΔG_wt).

For the second model, we simulate the effect of population dynamics with an evolutionary scheme described elsewhere [8]. We construct a population of N = 3000 identical viable sequences. For each generation, each residue in the protein population has a probability of 0.002 to be mutated to another random residue; both the population size and mutation rate were chosen to be comparable to previous analytical models of evolution processes [33–35]. The stability of each protein in the population is then calculated. We use truncation selection where the N0 sequences having ΔG_folding < ΔG_crit and a conserved native state structure are considered viable and capable of reproducing; the rest are removed from the population. The next generation of N sequences is chosen from the N0 surviving sequences randomly with replacement, representing the stochastic process of reproduction. The population is first allowed to pre-equilibrate for 30,000 generations. The evolutionary simulations are then continued for an additional 30,000 generations. In the subsequent 30,000 generations, we monitor the stability of the sequences (ΔG_wt) as well as the changes in stability that occur with mutations (ΔΔG_mut). We perform these calculations five times for the same ΔG_crit constraints used for the single-sequence trials.

Acknowledgments

We thank Lee Altenberg, Nicolas Buchler, Matthew Dimmic, Walter Fontana, Luca Peliti, Kevin Plaxco, David Pollock, and Peter Wolynes for insights and helpful comments, and Matthew Dimmic, Bin Qian, and Todd Raeker for computational assistance. Financial support was provided by NIH grant numbers LM05770 and GM08270, and NSF shared equipment grant number BIR9512955.

References

1. Reddy, B. V. B., Datta, S. & Tiwari, S. (1998). Use of propensities of amino acids to the local structural environment to understand effect of substitution mutations on protein stability. Protein Eng. 11, 1137-1145.
2. Kim, D. E., Gu, H. & Baker, D. (1998). The sequences of small proteins are not extensively optimized for rapid folding by natural selection. Proc. Natl Acad. Sci. USA, 95, 4982-4986.
3. DeGrado, W. F., Summa, C. M., Pavone, V., Nastri, F. & Lombardi, A. (1999). De novo design and structural characterization of proteins and metalloproteins. Annu. Rev. Biochem. 68, 779-819.
4. Fontana, W. & Schuster, P. (1998). Continuity in evolution: on the nature of transitions. Science, 280, 1451-1455.
5. Bastolla, U., Roman, H. E. & Vendruscolo, M. (1999). Neutral evolution of model proteins: diffusion in sequence space and overdispersion. J. Theor. Biol. 200, 49-64.
6. Bornberg-Bauer, E. & Chan, H. S. (1999). Modeling evolutionary landscapes: mutational stability, topology, and superfunnels in sequence space. Proc. Natl Acad. Sci. USA, 96, 10689-10694.
7. Ancel, L. W. & Fontana, W. (2000). Plasticity, evolvability and modularity in RNA. J. Expt. Zool. 288, 242-283.
8. Taverna, D. & Goldstein, R. A. (2000). The distribution of structures in evolving protein populations. Biopolymers, 53, 1-8.
9. Williams, P. D., Pollock, D. D. & Goldstein, R. A. (2001). Evolution of functionality in lattice proteins. J. Mol. Graph. Mod. 19, 150-156.
10. Taverna, D. & Goldstein, R. A. (2001). Why are proteins marginally stable? Proteins: Struct. Funct. Genet. In the press.
11. Serrano, L. J. T., Kellis, J., Cann, P., Matouschek, A. & Fersht, A. R. (1992). The folding of an enzyme II: substructure of barnase and the contribution of different interactions to protein stability. J. Mol. Biol. 224, 783-804.
12. Shortle, D., Stites, W. E. & Meeker, A. K. (1990). Contributions of the large hydrophobic amino acids to the stability of staphylococcal nuclease. Biochemistry, 29, 8033-8041.
13. Green, S. M., Meeker, A. K. & Shortle, D. (1992). Contributions of the polar, uncharged amino acids to the stability of staphylococcal nuclease: evidence for mutational effects on the free energy of the denatured state. Biochemistry, 31, 5717-5728.
14. Meeker, A. K., Garcia-Moreno, B. & Shortle, D. (1996). Contributions of the ionizable amino acids to the stability of staphylococcal nuclease. Biochemistry, 35, 6443-6449.
15. Lin, L., Pinker, R. J. & Kallenbach, N. R. (1993). α-Helix stability and the native state of myoglobin. Biochemistry, 32, 12638-12643.
16. Milla, M. E., Brown, B. M. & Sauer, R. T. (1994). Protein stability effects of a complete set of alanine substitutions in arc repressor. Nature Struct. Biol. 1, 518-523.
17. Blaber, M., Zhang, X. J., Lindstrom, J. D., Pepiot, S. D., Baase, W. A. & Matthews, B. W. (1994). Determination of alpha-helix propensity within the context of a folded protein. Sites 44 and 131 in bacteriophage T4 lysozyme. J. Mol. Biol. 235, 600-624.
18. Lipman, D. J. & Wilbur, W. J. (1991). Modelling neutral and selective evolution of protein folding. Proc. Roy. Soc. London, 245, 7-11.
19. Schuster, P., Fontana, W., Stadler, P. F. & Hofacker, I. L. (1994). From sequences to shapes and back: a case study in RNA secondary structures. Proc. Roy. Soc. ser. B. 255, 279-284.
20. Bornberg-Bauer, E. (1997). How are model protein structures distributed in sequence space? Biophys. J. 73, 2393-2403.
21. Babajide, A., Hofacker, I. L., Sippl, M. J. & Stadler, P. F. (1997). Neutral networks in protein space: a computational study based on knowledge-based potentials of mean force. Fold. Des. 2, 261-269.
22. Bourdeau, V., Ferbeyre, G., Pageau, M., Paquin, B. & Cedergren, R. (1999). The distribution of RNA motifs in natural sequences. Nucl. Acids Res. 27, 4457-4467.
23. Forst, C. V. (2000). Molecular evolution of catalysis. J. Theor. Biol. 205, 409-431.
24. Reidys, C., Forst, C. V. & Schuster, P. (2001). Replication and mutation on neutral networks. Bull. Math. Biol. 63, 57-94.
25. Eigen, M. (1971). Selforganization of matter and the evolution of biological macromolecules. Naturwissenschaften, 10, 465-523.
26. van Nimwegen, E., Crutchfield, J. P. & Huynen, M. (1999). Neutral evolution of mutational robustness. Proc. Natl Acad. Sci. USA, 96, 9716-9720.
27. Wilke, C. O., Wang, J. L., Ofria, C., Lenski, R. E. & Adami, C. (2001). Evolution of digital organisms at high mutation rates leads to survival of the flattest. Nature, 412, 331-333.

28. Kirschner, M. & Gerhart, J. (1998). Evolvability. Proc. Natl Acad. Sci. USA, 95, 8420-8427.
29. Abkevich, A. I., Gutin, A. M. & Shakhnovich, E. I. (1995). Impact of local and non-local interactions on thermodynamics and kinetics of protein folding. J. Mol. Biol. 252, 460-471.
30. Pande, V. S., Grosberg, A. Y. & Tanaka, T. (1997). Statistical mechanics of simple models of protein folding and design. Biophys. J. 73, 3192-3210.
31. Miyazawa, S. & Jernigan, R. L. (1985). Estimation of effective interresidue contact energies from protein crystal structures: quasi-chemical approximation. Macromolecules, 18, 534-552.
32. Govindarajan, S. & Goldstein, R. A. (1998). On the thermodynamic hypothesis of protein folding. Proc. Natl Acad. Sci. USA, 95, 5545-5549.
33. Kimura, M. (1979). The neutral theory of molecular evolution. Sci. Am. 241, 98-126.
34. Ohta, T. (1987). Simulating evolution by gene duplication. Genetics, 115, 207-213.
35. Ohta, T. (1988). Multigene and supergene families. Oxford Surv. Evol. Biol. 5, 41-65.
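
Illustrative sketch (not part of the original article): the Methods section quotes 1081 maximally compact conformations on the 5 × 5 lattice, 16 non-bonded contacts per compact structure, and 132 residue pairs that can possibly come into contact. The Python below is one way to check those counts by brute-force enumeration. All function names are hypothetical, and the reading of "possible contact" as a chain separation that is odd and at least 3 is an assumption based on the parity of the square lattice, not a statement from the article.

```python
def enumerate_compact_walks(n=5):
    """Return every directed self-avoiding walk that visits all n*n lattice sites."""
    neighbors = []
    for site in range(n * n):
        r, c = divmod(site, n)
        adj = []
        if r > 0:
            adj.append(site - n)
        if r < n - 1:
            adj.append(site + n)
        if c > 0:
            adj.append(site - 1)
        if c < n - 1:
            adj.append(site + 1)
        neighbors.append(adj)

    walks, path, visited = [], [], [False] * (n * n)

    def extend(site):
        # Depth-first extension of the current walk by one lattice site.
        visited[site] = True
        path.append(site)
        if len(path) == n * n:
            walks.append(tuple(path))
        else:
            for nxt in neighbors[site]:
                if not visited[nxt]:
                    extend(nxt)
        path.pop()
        visited[site] = False

    for start in range(n * n):
        extend(start)
    return walks


def symmetry_tables(n):
    """Site permutations for the 8 rotations/reflections of the square lattice."""
    m = n - 1
    maps = [
        lambda r, c: (r, c), lambda r, c: (r, m - c),
        lambda r, c: (m - r, c), lambda r, c: (m - r, m - c),
        lambda r, c: (c, r), lambda r, c: (c, m - r),
        lambda r, c: (m - c, r), lambda r, c: (m - c, m - r),
    ]
    tables = []
    for f in maps:
        table = []
        for site in range(n * n):
            r, c = f(*divmod(site, n))
            table.append(r * n + c)
        tables.append(table)
    return tables


def distinct_conformations(walks, n=5):
    """Keep one canonical representative per symmetry-related family of walks."""
    tables = symmetry_tables(n)
    return {min(tuple(t[s] for s in w) for t in tables) for w in walks}


def contacts(walk, n=5):
    """Residue pairs (i, j), i < j, on adjacent sites but not covalently bonded."""
    position = {site: i for i, site in enumerate(walk)}  # lattice site -> residue index
    pairs = []
    for site, i in position.items():
        r, c = divmod(site, n)
        for nr, nc in ((r + 1, c), (r, c + 1)):  # visit each lattice edge once
            if nr < n and nc < n:
                j = position[nr * n + nc]
                if abs(i - j) > 1:
                    pairs.append((min(i, j), max(i, j)))
    return pairs


def possible_contact_pairs(length=25):
    """Pairs whose chain separation is odd and at least 3 (lattice parity assumption)."""
    return [(i, j) for i in range(length) for j in range(i + 3, length, 2)]


if __name__ == "__main__":
    walks = enumerate_compact_walks(5)
    print(len(walks), len(distinct_conformations(walks)))
    print({len(contacts(w)) for w in walks}, len(possible_contact_pairs()))
```

Because a rotation or reflection that maps a directed space-filling walk onto itself must fix every site, each conformation appears in exactly eight symmetry-related copies, which is why the directed count divided by eight should agree with the 1081 quoted in the text. The enumeration is plain depth-first search and takes a few seconds in pure Python.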

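
Illustrative sketch (not part of the original article): equations (1) and (2) of the Methods section can be written as two short functions. The code below assumes that the non-bonded contacts of each conformation are already available as lists of residue index pairs, and it uses a made-up two-letter contact table in place of the Miyazawa-Jernigan energies, which are not reproduced in the article. The names `contact_energy`, `folding_free_energy`, `is_viable`, and `TOY_GAMMA` are hypothetical.

```python
import math

# Hypothetical contact energies for a two-letter alphabet, for illustration only.
# The article uses the 20-letter Miyazawa-Jernigan table instead.
TOY_GAMMA = {
    frozenset("H"): -1.0,   # H-H contact
    frozenset("HP"): -0.2,  # H-P contact
    frozenset("P"): 0.0,    # P-P contact
}


def contact_energy(sequence, contact_pairs, gamma=TOY_GAMMA):
    """Equation (1): sum of gamma(A_i, A_j) over the non-bonded contacts (i, j)."""
    return sum(gamma[frozenset((sequence[i], sequence[j]))] for i, j in contact_pairs)


def folding_free_energy(energies, kT=0.6):
    """Equation (2): E_f + kT * ln(Z - exp(-E_f / kT)).

    Rewritten as kT * ln(sum over non-native states of exp(-(E_k - E_f) / kT)),
    which is algebraically the same and avoids overflow for very negative energies.
    """
    energies = list(energies)
    native = min(range(len(energies)), key=energies.__getitem__)
    e_f = energies[native]
    unfolded = sum(
        math.exp(-(e - e_f) / kT) for k, e in enumerate(energies) if k != native
    )
    return kT * math.log(unfolded)


def is_viable(energies, dg_crit, kT=0.6):
    """Stability criterion only: the folding free energy must lie below dg_crit."""
    return folding_free_energy(energies, kT) < dg_crit


if __name__ == "__main__":
    # Three toy conformations of a four-residue chain, given by their contact lists.
    conformations = [[(0, 3)], [], []]
    energies = [contact_energy("HPPH", pairs) for pairs in conformations]
    print(energies, folding_free_energy(energies), is_viable(energies, 0.0))
```

In the article the viability test also requires that the native structure be conserved; that check is left out here and would need the identity of the lowest-energy conformation to be compared before and after mutation.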
Edited by J. Thornton

(Received 6 August 2001; received in revised form 22 October 2001; accepted 23 October 2001)
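
Illustrative sketch (not part of the original article): one generation of the population model described in Methods (per-residue mutation, truncation selection against ΔG_crit, resampling with replacement) might look like the Python below. The stability function is passed in as a parameter; the toy function used in the example is hypothetical and stands in for the lattice calculation of ΔG_folding. Two further assumptions are made: "another random residue" is read as a residue different from the current one, and the requirement of a conserved native structure is omitted.

```python
import random


def next_generation(population, stability, dg_crit, alphabet, mu=0.002, rng=random):
    """Advance the population by one generation.

    population : list of sequences (strings), length N
    stability  : callable returning the folding free energy of a sequence
    dg_crit    : sequences with stability(seq) < dg_crit are viable
    mu         : per-residue mutation probability per generation
    """
    mutated = []
    for seq in population:
        residues = list(seq)
        for pos, current in enumerate(residues):
            if rng.random() < mu:
                # Assumption: the replacement residue differs from the current one.
                residues[pos] = rng.choice([a for a in alphabet if a != current])
        mutated.append("".join(residues))

    # Truncation selection: every viable sequence has equal fitness.
    survivors = [seq for seq in mutated if stability(seq) < dg_crit]
    if not survivors:
        raise RuntimeError("no viable sequences left in the population")

    # Reproduction: draw N offspring from the survivors with replacement.
    return [rng.choice(survivors) for _ in range(len(population))]


if __name__ == "__main__":
    # Toy stability: more 'H' residues means a more negative (more stable) value.
    def toy_stability(seq):
        return 1.5 - seq.count("H")

    rng = random.Random(0)
    population = ["HHHPP"] * 3000
    for _ in range(100):
        population = next_generation(population, toy_stability, 0.0, "HP", rng=rng)
    print(len(population), len(set(population)))
```

Because all survivors are equally likely to reproduce, any enrichment of mutation-tolerant sequences in such a simulation comes from the population dynamics rather than from an explicit robustness term, which is the point the article makes.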
