evolutionary process, involving the dynamics of a population cluster, or pseudospecies, in sequence space. During the past few years we (among others) have looked at how population-based evolutionary dynamics can alter the structures, functions, and thermodynamics of proteins and other biological macromolecules.4-10

Here, we use a simple computational model to demonstrate how population dynamics can explain why proteins are so robust to changes in sequence. We describe simulations of the evolution of lattice proteins, using two different models. In one model, a single sequence performs a random walk in the space of all sufficiently stable sequences. In the second model, we represent the evolution of a population of lattice proteins as they undergo random mutagenesis, reproduction, and death. Proteins that emerge during the population simulations have a robustness to sequence change similar to that found in biological proteins. In contrast, proteins derived from the single-sequence random walk, even with the same structure and stability, are extremely fragile to sequence changes. The evolution of robust sequences proceeds without any explicit evolutionary pressure for robustness, but rather results directly from the population dynamics. In contrast, de novo sequence design must deal with the more random set of possible sequences. These results suggest that the observed sequence plasticity of biological proteins may occur because proteins have evolved to be robust to these specific experiments. If so, we may need to revise the conclusions based on these observations.

Results

We use a computational model of proteins consisting of 25 residues confined to a maximally compact, two-dimensional lattice. In addition to studying the properties of protein sequences chosen at random, we implement two different dynamic models of sequence change. In the first model, we allow a single sequence to perform a random walk in sequence space among all viable sequences. Sequences are considered viable if the native state of the protein remains fixed and if it remains sufficiently stable, that is, with a free energy of folding, $\Delta G_{\mathrm{folding}}$, smaller than some fixed parameter, $\Delta G_{\mathrm{crit}}$. In the second model, we consider a population of such proteins where the sequences undergo random mutagenesis, death, and random reproduction, again with the constraint that all proteins must fold into a constant native state and remain sufficiently stable in order to survive to the next generation. Simulations are performed for both models five times for each of five values of $\Delta G_{\mathrm{crit}}$ (0.0, -0.5, -1.0, -1.5, -2.0). The robustness of the resulting sequences to mutation is monitored by recording the change in stability ($\Delta\Delta G_{\mathrm{mut}}$) as a function of the protein's stability prior to the mutation ($\Delta G_{\mathrm{wt}}$).

In Figure 1, we compare the destabilizing probability for a mutation (the probability that $\Delta\Delta G_{\mathrm{mut}} > 0$) made in a sequence chosen at random with the destabilizing probability of a mutation in a sequence resulting from population evolution (with $\Delta G_{\mathrm{crit}} = 0.0$) as a function of $\Delta G_{\mathrm{wt}}$. As can be seen, the sequences resulting from the population evolution have a much smaller probability of having a destabilizing mutation compared to random sequences with identical initial stabilities.

Figure 2 shows the resulting distributions of $\Delta\Delta G_{\mathrm{mut}}$ for the two different types of evolutionary simulations for three different values of $\Delta G_{\mathrm{crit}}$. In the case of sequences performing a random walk, most of the mutations lead to reduced stabilities close to the average of the random sequence distribution; only 0.04% to 0.4% of mutations result in increased stability (depending upon the value of $\Delta G_{\mathrm{crit}}$). Conversely, in the case of the population trials, there is an appreciable probability (18% to 28%) of a mutation being stabilizing, with the most likely change in stability near zero, especially for highly stable proteins with large negative values of $\Delta G_{\mathrm{crit}}$.

Since our population evolution sequences are mutated with a Poisson distribution, we have a decreasing probability that there will be multiple mutations of a particular sequence in the population. Figure 3 presents the average probability that a multiple mutation will be destabilizing compared with single mutations, under the constraints of a specified $\Delta G_{\mathrm{crit}}$. Note that even significant sequence changes (three out of 25 residues) in the proteins resulting from population evolution have a non-negligible chance of resulting in increased stabilization.

Figure 1. Probability of a destabilizing mutation ($P(\Delta\Delta G_{\mathrm{mut}} > 0)$) from sequences resulting from population evolution with $\Delta G_{\mathrm{crit}} = 0$ (continuous line) is compared with random sequences (broken line), as a function of the original stability of the unmutated protein $\Delta G_{\mathrm{wt}}$. The destabilization probability for stable random sequences (with $\Delta G_{\mathrm{wt}} < 0$) is close to unity.

Discussion

It is not surprising that most mutations in biological proteins result in decreased protein stab-
Why Are Proteins So Robust To Site Mutations? 481
Figure 2. Density distribution of $\Delta\Delta G_{\mathrm{mut}}$ from model proteins undergoing population (continuous line) and single sequence evolution (broken line), for various values of $\Delta G_{\mathrm{crit}}$.

Figure 3. Probability of destabilizing mutation from model proteins undergoing population evolution, according to the number of point mutations, as a function of $\Delta G_{\mathrm{crit}}$ (thin continuous line). The average rate for all mutations (thick continuous line) and the high rate of destabilizing mutations for single sequence evolution (broken line) are included for comparison.
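The quantity plotted in Figures 1 and 3, the probability that one or several point mutations are destabilizing, could be estimated by simple sampling. The sketch below is only an illustration under stated assumptions: `mutate`, `destabilizing_probability` and `toy_stability` are hypothetical names, residues are coded as integers 0 to 19, multiple mutations are placed at distinct positions, and the toy stability function merely stands in for the lattice free energy of folding.

```python
import random

N_TYPES = 20  # number of amino acid types, coded 0..19 (assumed encoding)

def mutate(seq, n_mut, rng):
    """Copy `seq` and change `n_mut` distinct positions to a different residue type."""
    new = list(seq)
    for pos in rng.sample(range(len(seq)), n_mut):
        new[pos] = rng.choice([a for a in range(N_TYPES) if a != seq[pos]])
    return new

def destabilizing_probability(seq, stability, n_mut, rng, trials=2000):
    """Monte Carlo estimate of P(ddG_mut > 0) for mutants carrying `n_mut` changes."""
    g_wt = stability(seq)
    worse = sum(stability(mutate(seq, n_mut, rng)) > g_wt for _ in range(trials))
    return worse / trials

def toy_stability(seq):
    """Toy stand-in for the folding free energy (NOT the lattice model):
    every residue of type 0 contributes -1, so lower values are more stable."""
    return -float(seq.count(0))

rng = random.Random(1)
most_stable = [0] * 25    # every possible mutation raises the toy free energy
least_stable = [1] * 25   # no mutation can raise the toy free energy
p_single = destabilizing_probability(most_stable, toy_stability, 1, rng)
p_triple = destabilizing_probability(most_stable, toy_stability, 3, rng)
p_none = destabilizing_probability(least_stable, toy_stability, 1, rng)
```

With a real stability function in place of `toy_stability`, calling the estimator for `n_mut` of 1, 2 and 3 would give the kind of comparison between single and multiple mutations that Figure 3 reports.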
have a de novo sequence exquisitely designed to have properties similar to biological proteins. This also suggests that taking advantage of the observed robustness by modifying existing proteins may be a more effective route. Alternatively, in vitro evolution studies may provide proteins with the same degree of sequence plasticity as natural proteins. More optimistically, proteins may have compromised possible interactions and properties in developing this robustness, which suggests that more effective if less robust proteins may be available.

In addition, these results suggest that the observed sequence plasticity may have non-obvious consequences for our understanding of proteins and their evolution. For instance, Baker and co-workers observed that sequence changes in the IgG binding domain of protein L often resulted in proteins that folded faster than the wild-type protein, and concluded that this indicates that the folding rate is not under strong selective pressure.2 The model presented here results in the opposite conclusion, that properties of the protein under stronger selective pressure are more likely to be "buffered" and thus robust to mutations. In other words, robustness to site mutations would paradoxically be an indication of stronger selective pressure on these characteristics.

Finally, we note that there is growing interest in the relationship between robustness and evolvability; that is, between the ability to buffer genotypic variations and the ability of an organism to modify to new situations and environments.28 If so, the tendency of population dynamics to increase sequence plasticity might have had significant impact on the evolutionary process, including the development of new functionalities of existent proteins.

Methods

We consider a highly simplified representation of evolving proteins. Our model proteins consist of chains of $n = 25$ monomers, confined to a $5 \times 5$ two-dimensional, maximally compact square lattice with each monomer located at one lattice point. This provides us with 1081 possible conformations represented by the 1081 self-avoiding walks on this lattice, neglecting structures related by rotation, reflection, or inversion. The non-compact states were neglected in order to allow for a reasonable number of stable sequences. Alternatively, we would expect the non-compact states to be negligible as long as the contact energies were sufficiently attractive. The fact that most protein structures are reasonably compact makes this assumption not too unreasonable. There are important differences between the two-dimensional and three-dimensional models, especially in folding simulations where the two-dimensional conformation space may not be ergodic.29,30 While these limitations are critical in folding simulations, we are more interested in the mapping of sequence to structure rather than how the sequence folds to that given structure; the thermodynamic properties described below involve sums over states and should be less affected by the dimensionality of the model. We use the two-dimensional model to provide a more realistic ratio of buried to exposed sites.

We assume that the energy of any sequence in conformation $k$ is given by a simple contact energy of the form:

$$E = \sum_{i<j} \gamma(A_i, A_j)\, U_{ij}^{k} \tag{1}$$

Here, $U_{ij}^{k}$ is equal to 1 if residues $i$ and $j$ are not covalently connected but are on adjacent lattice sites in conformation $k$ (and 0 otherwise), and $\gamma(A_i, A_j)$ is the contact energy between amino acid $A_i$ at location $i$ and $A_j$ at location $j$ in the sequence. We use the contact energies derived by Miyazawa & Jernigan based on a statistical analysis of the database of known proteins that implicitly includes the effect of interactions of the protein with the solvent.31 In our simplified proteins, there are 132 pairs of residues that can possibly come into contact, with 16 of these contacts present in any given compact structure.

Using equation (1) we can calculate the energy of a given protein sequence in all 1081 possible conformations. We make the assumption that the thermodynamic hypothesis is obeyed and that the lowest-energy structure is the native state;32 the other 1080 possible structures represent the ensemble of unfolded states.

Not all possible protein sequences are viable. In general, a protein must fulfil a number of conditions relating to stability, functionality, and foldability. Here, we concentrate on stability. For each sequence, we calculate the free energy of folding:

$$\Delta G_{\mathrm{folding}} = E_f + kT \ln\left[ Z - \exp\left(-E_f / kT\right) \right] \tag{2}$$

where $E_f$ is the energy of the native (lowest-energy) state and $Z$ is the partition function. (For the Miyazawa-Jernigan potential, we use $kT = 0.6$.) We consider a sequence as representing a viable protein as long as its $\Delta G_{\mathrm{folding}}$ is less than some specified $\Delta G_{\mathrm{crit}}$.

We implement two different dynamic models of sequence change. In the first model, we choose a sequence at random and make point mutations until we arrive at a sufficiently stable protein sequence. Starting with this initial stable sequence, residue positions are randomly mutated with the number of mutations chosen from a Poisson distribution with an average of 0.002 mutations per amino acid residue per generation. With this low mutation rate, multiple mutations are rare (the ratio of single mutants to multiple mutants is 200). We calculate the stability of the new sequence; if $\Delta G_{\mathrm{folding}}$ is larger than $\Delta G_{\mathrm{crit}}$ or the structure has changed, the mutation is rejected and the original sequence retained. Generations where no mutations occurred are not counted. This allows the single sequence to diffuse randomly over the range of acceptable sequences, analogous to random-walk models in which a particle has average zero velocity when a boundary is encountered. This is done five times for each of five values of $\Delta G_{\mathrm{crit}}$ (0.0, -0.5, -1.0, -1.5, -2.0). Sequences that arise during these runs are probed for robustness to mutations. We make mutations in the sequence with a Poisson distribution with mean 0.002 mutation per amino acid residue, maintaining a constant rare rate of multiple mutations. We then calculate the probability that a mutation results in a given change in stability ($\Delta\Delta G_{\mathrm{mut}}$) as a function of the stability prior to the mutation ($\Delta G_{\mathrm{wt}}$).

For the second model, we simulate the effect of population dynamics using an evolutionary scheme, using a method described elsewhere.8 We construct a population of $N = 3000$ identical viable sequences.
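The lattice model and equations (1) and (2) can be made concrete with a short sketch. It is a minimal illustration under stated assumptions, not the authors' code: the contact energies are random placeholder values rather than the Miyazawa-Jernigan table, residues are coded as integers 0 to 19, all function names are hypothetical, and the 1081 conformations are taken to be directed compact walks counted once per class under the eight rotations and reflections of the square.

```python
import math
import random

SIDE = 5                 # 5 x 5 lattice
N_RES = SIDE * SIDE      # 25 residues
KT = 0.6                 # value quoted in the text
TOP = SIDE - 1

def neighbours(cell):
    """Yield the lattice sites adjacent to `cell` that lie on the grid."""
    r, c = cell
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if 0 <= r + dr < SIDE and 0 <= c + dc < SIDE:
            yield (r + dr, c + dc)

# The eight symmetries of the square (rotations and reflections).
SYMMETRIES = (
    lambda r, c: (r, c), lambda r, c: (c, r),
    lambda r, c: (r, TOP - c), lambda r, c: (TOP - r, c),
    lambda r, c: (TOP - r, TOP - c), lambda r, c: (c, TOP - r),
    lambda r, c: (TOP - c, r), lambda r, c: (TOP - c, TOP - r),
)

def canonical(path):
    """Pick one representative among the eight symmetry images of a walk."""
    return min(tuple(s(r, c) for r, c in path) for s in SYMMETRIES)

def enumerate_conformations():
    """Enumerate compact self-avoiding walks, one per symmetry class."""
    found = set()

    def extend(path, used):
        if len(path) == N_RES:
            found.add(canonical(path))
            return
        for nxt in neighbours(path[-1]):
            if nxt not in used:
                used.add(nxt)
                path.append(nxt)
                extend(path, used)
                path.pop()
                used.remove(nxt)

    # A walk filling all 25 sites must start on the majority colour of the
    # checkerboard; one start site per symmetry class of those sites suffices.
    for start in ((0, 0), (0, 2), (1, 1), (2, 2)):
        extend([start], {start})
    return sorted(found)

def contact_pairs(conf):
    """Residue pairs (i, j) with j > i + 1 on adjacent sites (U_ij^k = 1)."""
    index = {cell: i for i, cell in enumerate(conf)}
    return [(i, index[nb]) for i, cell in enumerate(conf)
            for nb in neighbours(cell) if index[nb] > i + 1]

def energies(seq, contact_lists, gamma):
    """Equation (1) evaluated for every conformation."""
    return [sum(gamma[seq[i]][seq[j]] for i, j in pairs) for pairs in contact_lists]

def folding_free_energy(energy_list, kT=KT):
    """Equation (2), with exp(-E_f/kT) factored out for numerical stability."""
    e_f = min(energy_list)
    native = energy_list.index(e_f)
    unfolded = sum(math.exp(-(e - e_f) / kT)
                   for k, e in enumerate(energy_list) if k != native)
    return kT * math.log(unfolded)

rng = random.Random(0)
# Placeholder symmetric contact energies (hypothetical values, NOT Miyazawa-Jernigan).
gamma = [[0.0] * 20 for _ in range(20)]
for a in range(20):
    for b in range(a, 20):
        gamma[a][b] = gamma[b][a] = rng.uniform(-7.0, 0.0)

conformations = enumerate_conformations()
contact_lists = [contact_pairs(conf) for conf in conformations]
seq = [rng.randrange(20) for _ in range(N_RES)]
dG = folding_free_energy(energies(seq, contact_lists, gamma))
```

A sequence would then count as viable when `dG` is below the chosen threshold and its lowest-energy conformation is the required native structure. Factoring out the native Boltzmann weight leaves equation (2) unchanged algebraically and avoids subtracting two nearly equal large numbers.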
For each generation, each residue in the protein population has a probability of 0.002 to be mutated to another random residue; both the population size and mutation rate were chosen to be comparable to previous analytical models of evolution processes.33-35 The stability of each protein in the population is then calculated. We use truncation selection where the $N'$ sequences having $\Delta G_{\mathrm{folding}} < \Delta G_{\mathrm{crit}}$ and a conserved native state structure are considered viable and capable of reproducing; the rest are removed from the population. The next generation of $N$ sequences is chosen from the $N'$ surviving sequences randomly with replacement, representing the stochastic process of reproduction. The population is first allowed to pre-equilibrate for 30,000 generations. The evolutionary simulations are then continued for an additional 30,000 generations. In the subsequent 30,000 generations, we monitor the stability of the sequences ($\Delta G_{\mathrm{wt}}$) as well as the changes in stability that occur with mutations ($\Delta\Delta G_{\mathrm{mut}}$). We perform these calculations five times for the same $\Delta G_{\mathrm{crit}}$ constraints used for the single-sequence trials.

Acknowledgments

We thank Lee Altenberg, Nicolas Buchler, Matthew Dimmic, Walter Fontana, Luca Peliti, Kevin Plaxco, David Pollock, and Peter Wolynes for insights and helpful comments, and Matthew Dimmic, Bin Qian, and Todd Raeker for computational assistance. Financial support was provided by NIH grant numbers LM05770 and GM08270, and NSF shared equipment grant number BIR9512955.

References

1. Reddy, B. V. B., Datta, S. & Tiwari, S. (1998). Use of propensities of amino acids to the local structural environment to understand effect of substitution mutations on protein stability. Protein Eng. 11, 1137-1145.
2. Kim, D. E., Gu, H. & Baker, D. (1998). The sequences of small proteins are not extensively optimized for rapid folding by natural selection. Proc. Natl Acad. Sci. USA, 95, 4982-4986.
3. DeGrado, W. F., Summa, C. M., Pavone, V., Nastri, F. & Lombardi, A. (1999). De novo design and structural characterization of proteins and metalloproteins. Annu. Rev. Biochem. 68, 779-819.
4. Fontana, W. & Schuster, P. (1998). Continuity in evolution: on the nature of transitions. Science, 280, 1451-1455.
5. Bastolla, U., Roman, H. E. & Vendruscolo, M. (1999). Neutral evolution of model proteins: diffusion in sequence space and overdispersion. J. Theor. Biol. 200, 49-64.
6. Bornberg-Bauer, E. & Chan, H. S. (1999). Modeling evolutionary landscapes: mutational stability, topology, and superfunnels in sequence space. Proc. Natl Acad. Sci. USA, 96, 10689-10694.
7. Ancel, L. W. & Fontana, W. (2000). Plasticity, evolvability and modularity in RNA. J. Expt. Zool. 288, 242-283.
8. Taverna, D. & Goldstein, R. A. (2000). The distribution of structures in evolving protein populations. Biopolymers, 53, 1-8.
9. Williams, P. D., Pollock, D. D. & Goldstein, R. A. (2001). Evolution of functionality in lattice proteins. J. Mol. Graph. Mod. 19, 150-156.
10. Taverna, D. & Goldstein, R. A. (2001). Why are proteins marginally stable? Proteins: Struct. Funct. Genet. In the press.
11. Serrano, L. J. T., Kellis, J., Cann, P., Matouschek, A. & Fersht, A. R. (1992). The folding of an enzyme II: substructure of barnase and the contribution of different interactions to protein stability. J. Mol. Biol. 224, 783-804.
12. Shortle, D., Stites, W. E. & Meeker, A. K. (1990). Contributions of the large hydrophobic amino acids to the stability of staphylococcal nuclease. Biochemistry, 29, 8033-8041.
13. Green, S. M., Meeker, A. K. & Shortle, D. (1992). Contributions of the polar, uncharged amino acids to the stability of staphylococcal nuclease: evidence for mutational effects on the free energy of the denatured state. Biochemistry, 31, 5717-5728.
14. Meeker, A. K., Garcia-Moreno, B. & Shortle, D. (1996). Contributions of the ionizable amino acids to the stability of staphylococcal nuclease. Biochemistry, 35, 6443-6449.
15. Lin, L., Pinker, R. J. & Kallenbach, N. R. (1993). α-Helix stability and the native state of myoglobin. Biochemistry, 32, 12638-12643.
16. Milla, M. E., Brown, B. M. & Sauer, R. T. (1994). Protein stability effects of a complete set of alanine substitutions in arc repressor. Nature Struct. Biol. 1, 518-523.
17. Blaber, M., Zhang, X. J., Lindstrom, J. D., Pepiot, S. D., Baase, W. A. & Matthews, B. W. (1994). Determination of alpha-helix propensity within the context of a folded protein. Sites 44 and 131 in bacteriophage T4 lysozyme. J. Mol. Biol. 235, 600-624.
18. Lipman, D. J. & Wilbur, W. J. (1991). Modelling neutral and selective evolution of protein folding. Proc. Roy. Soc. London, 245, 7-11.
19. Schuster, P., Fontana, W., Stadler, P. F. & Hofacker, I. L. (1994). From sequences to shapes and back: a case study in RNA secondary structures. Proc. Roy. Soc. ser. B. 255, 279-284.
20. Bornberg-Bauer, E. (1997). How are model protein structures distributed in sequence space? Biophys. J. 73, 2393-2403.
21. Babajide, A., Hofacker, I. L., Sippl, M. J. & Stadler, P. F. (1997). Neutral networks in protein space: a computational study based on knowledge-based potentials of mean force. Fold. Des. 2, 261-269.
22. Bourdeau, V., Ferbeyre, G., Pageau, M., Paquin, B. & Cedergren, R. (1999). The distribution of RNA motifs in natural sequences. Nucl. Acids Res. 27, 4457-4467.
23. Forst, C. V. (2000). Molecular evolution of catalysis. J. Theor. Biol. 205, 409-431.
24. Reidys, C., Forst, C. V. & Schuster, P. (2001). Replication and mutation on neutral networks. Bull. Math. Biol. 63, 57-94.
25. Eigen, M. (1971). Selforganization of matter and the evolution of biological macromolecules. Naturwissenschaften, 10, 465-523.
26. van Nimwegen, E., Crutchfield, J. P. & Huynen, M. (1999). Neutral evolution of mutational robustness. Proc. Natl Acad. Sci. USA, 96, 9716-9720.
27. Wilke, C. O., Wang, J. L., Ofria, C., Lenski, R. E. & Adami, C. (2001). Evolution of digital organisms at high mutation rates leads to survival of the flattest. Nature, 412, 331-333.
28. Kirschner, M. & Gerhart, J. (1998). Evolvability. Proc. Natl Acad. Sci. USA, 95, 8420-8427.
29. Abkevich, A. I., Gutin, A. M. & Shakhnovich, E. I. (1995). Impact of local and non-local interactions on thermodynamics and kinetics of protein folding. J. Mol. Biol. 252, 460-471.
30. Pande, V. S., Grosberg, A. Y. & Tanaka, T. (1997). Statistical mechanics of simple models of protein folding and design. Biophys. J. 73, 3192-3210.
31. Miyazawa, S. & Jernigan, R. L. (1985). Estimation of effective interresidue contact energies from protein crystal structures: quasi-chemical approximation. Macromolecules, 18, 534-552.
32. Govindarajan, S. & Goldstein, R. A. (1998). On the thermodynamic hypothesis of protein folding. Proc. Natl Acad. Sci. USA, 95, 5545-5549.
33. Kimura, M. (1979). The neutral theory of molecular evolution. Sci. Am. 241, 98-126.
34. Ohta, T. (1987). Simulating evolution by gene duplication. Genetics, 115, 207-213.
35. Ohta, T. (1988). Multigene and supergene families. Oxford Surv. Evol. Biol. 5, 41-65.
Edited by J. Thornton
(Received 6 August 2001; received in revised form 22 October 2001; accepted 23 October 2001)
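As a rough illustration of the two evolutionary schemes described in Methods, the sketch below implements one step of the single-sequence random walk and one generation of the population model. It rests on several assumptions: the function names are hypothetical, residues are coded as integers 0 to 19, multiple mutations fall on distinct positions, and viability is supplied by the caller as a predicate `is_viable` (in the paper this would test both stability and conservation of the native structure; here a toy rule is used so the example runs on its own).

```python
import math
import random

N_TYPES = 20
RATE = 0.002   # mutations per residue per generation, as quoted in Methods

def poisson(lam, rng):
    """Draw a Poisson variate by Knuth's multiplication method (fine for small lam)."""
    threshold, k, p = math.exp(-lam), 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

def other_residue(current, rng):
    """Return a random residue type different from `current`."""
    return rng.choice([a for a in range(N_TYPES) if a != current])

def random_walk_step(seq, is_viable, rng, rate=RATE):
    """First model: propose a mutant; keep the old sequence if it is not viable."""
    n_mut = 0
    while n_mut == 0:          # generations without a mutation are not counted
        n_mut = poisson(rate * len(seq), rng)
    trial = list(seq)
    for pos in rng.sample(range(len(seq)), min(n_mut, len(seq))):
        trial[pos] = other_residue(seq[pos], rng)
    return trial if is_viable(trial) else list(seq)

def next_generation(population, is_viable, rng, rate=RATE):
    """Second model: per-residue mutation, truncation selection, resampling."""
    mutated = [[other_residue(res, rng) if rng.random() < rate else res
                for res in seq] for seq in population]
    survivors = [seq for seq in mutated if is_viable(seq)]   # the viable subset
    if not survivors:
        raise RuntimeError("no viable sequences left")
    # Draw the next generation from the survivors with replacement.
    return [list(rng.choice(survivors)) for _ in population]

def toy_viable(seq):
    """Toy viability rule (an assumption): the first residue must stay of type 0."""
    return seq[0] == 0

rng = random.Random(2)
walker = [0] * 25
for _ in range(200):
    walker = random_walk_step(walker, toy_viable, rng)

population = [[0] * 25 for _ in range(100)]
for _ in range(50):
    population = next_generation(population, toy_viable, rng, rate=0.02)
```

The two schemes differ only in how the viability constraint acts: the walker bounces off it one proposal at a time, whereas the population is filtered and then refilled from the survivors, which is the step through which population dynamics enters.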