
Protein Science (1997), 6:501-523. Cambridge University Press. Printed in the USA.

Copyright © 1997 The Protein Society

REVIEW

Subtilases:
The superfamily of subtilisin-like serine proteases

ROLAND J. SIEZEN¹ AND JACK A.M. LEUNISSEN²

¹Department of Biophysical Chemistry, NIZO, P.O. Box 20, 6710 BA Ede, The Netherlands
²CAOS/CAMM Center, University of Nijmegen, Toernooiveld, 6525 ED Nijmegen, The Netherlands

(RECEIVED August 22, 1996; ACCEPTED November 5, 1996)

Abstract
Subtilases are members of the clan (or superfamily) of subtilisin-like serine proteases. Over 200 subtilases are presently known, more than 170 of them with their complete amino acid sequence. In this update of our previous overview (Siezen RJ, de Vos WM, Leunissen JAM, Dijkstra BW, 1991, Protein Eng 4:719-731), details of more than 100 new subtilases discovered in the past five years are summarized, and amino acid sequences of their catalytic domains are compared in a multiple sequence alignment. Based on sequence homology, a subdivision into six families is proposed. Highly conserved residues of the catalytic domain are identified, as are large or unusual deletions and insertions. Predictions have been updated for Ca2+-binding sites, disulfide bonds, and substrate specificity, based on both sequence alignment and three-dimensional homology modeling.

Keywords: homology modeling; sequence alignment; serine protease; subtilase; subtilisin family

Serine endo- and exo-peptidases are of extremely widespread occurrence and diverse function. Many distinct families of serine proteases exist; they have been grouped into six clans (Rawlings and Barrett, 1994; Barrett and Rawlings, 1995), of which the two largest are the (chymo)trypsin-like and subtilisin-like clans. These two clans are distinguished by a highly similar arrangement of catalytic His, Asp, and Ser residues in radically different β/β (chymotrypsin) and α/β (subtilisin) protein scaffolds.

In 1991, we presented a review of over 40 members of the subtilisin-like serine proteases, termed subtilases, which occur in Archaea, Bacteria, fungi, yeasts, and higher eukaryotes (Siezen et al., 1991). The mature enzymes were found to contain up to 1775 residues, with N-terminal catalytic domains ranging from 268 to 511 residues, and signal and/or activation-peptides ranging from 27 to 280 residues. Several members contain C-terminal extensions, relative to the subtilisins, which display additional properties such as sequence repeats, Cys-rich domains, or transmembrane segments. From four known crystal structures and a multiple alignment of 40 known amino acid sequences, a core structure was predicted for the catalytic domain of all subtilases, together with the variations that are allowed in the main-chain length as a result of insertions and deletions (Fig. 1). Nineteen of these core residues were found to be highly conserved, 10 of which are glycines. Predictions were also made for subtilases of unknown three-dimensional structure concerning essential conserved residues, allowable substitutions, disulfide bonds, Ca2+-binding sites, substrate-binding site residues, ionic and aromatic interactions, and surface loops. Based on these predictions, strategies for homology modeling and protein engineering were developed and implemented, aimed at modulating either stability, catalytic activity, or substrate specificity (Siezen et al., 1991, 1993, 1994, 1995a).

Since 1991, more than 100 new subtilases have been discovered, and these are now included in this updated review. In addition to many new enzymes from micro-organisms, numerous members of the subtilase superfamily have now also been identified in various eukaryotes such as slime molds, plants, insects, nematodes, molluscs, amphibia, fish, mammals, and even in a catfish virus.

Structure-based alignment

The coordinates of subtilisin BPN', subtilisin Carlsberg, thermitase, and proteinase K were used previously (Siezen et al., 1991) to determine the core of structurally conserved regions (SCRs; Greer, 1990) and the common secondary structure elements, as analyzed with the DSSP program (Kabsch and Sander, 1983). This core of about 190 residues contains virtually all of the common α-helix and β-strand elements, including the active site residues D32, H64, and S221 (Siezen et al., 1991). Slight adjustments to these core regions have now been incorporated (core ABC in Fig. 2) based on a recent spatial superpositioning of seven structures that also included mesentericopeptidase, Savinase, and Esperase (Heringa et al., 1995); topologically equivalent residues were defined as those that have Cα-atom distances of less than 2.0 Å. The variable regions (or VRs) nearly always correspond to connecting loops between helices and strands and generally lie on the external surface of the protein (Fig. 1).

When only the subtilisin BPN', subtilisin Carlsberg, and thermitase structures were superimposed, the number of structurally equivalent Cα atoms increased to over 230 (or about 85% of all Cα atoms), which we refer to as the "extended core" (core AB in Fig. 2). This distinction between core and extended core SCRs is of relevance for homology modeling, because the superfamily of subtilases can be subdivided into several families (see below).

Identification of subtilase superfamily members

An extensive search of scientific literature and databases (EMBL, Genbank, Swiss-Prot) was performed to identify new subtilisin-like serine proteases, using the programs BLAST (Altschul et al., 1990), TFASTA, and FASTA (Pearson and Lipman, 1988). Consensus sequence segments of 20-40 residues around the active site residues D32, H64, and S221 were used for this purpose; different consensus segments were obtained for different subtilase families (see Fig. 2). Sequences from patent literature and databases are not included because they represent synthetic or mutated genes encoding engineered subtilases. The main results of these searches are summarized in Tables 1 and 2. Further details, including reference to 10 crystal structures, can be found in the EMBL/Genbank and PDB databases using codes listed in the tables.

At present, over 170 complete and several partial amino acid sequences of subtilases are known; most are derived from the corresponding gene or cDNA sequences. We caution that in many cases it has not been established whether these genes encode functional proteins or whether the encoded protein is actually a protease. Examples of the latter are the outer-membrane antigen phssa1 of Pasteurella haemolytica (Lo et al., 1991), and the anti-freeze protein af70 of Picea abies (EMBL D86598), which were not described as proteases by the authors.

The majority of the subtilases are synthesized as pre-pro-enzymes, subsequently translocated over a cell membrane via the pre-peptide (or signal peptide), and finally activated by cleavage of the pro-peptide. A detailed comparison of the pre-pro sequences and the putative processing sites of these subtilases has identified two main types of pro-peptide (Siezen et al., 1995b). However, there are numerous exceptions in which the pro-peptides appear to be completely unrelated or even absent. A small number of subtilases is intracellular (Table 1).

Table 1 shows that the (putative) mature enzymes range in size from 266 to 1775 residues. The catalytic domain or module is defined as the segment with sequence homology to subtilisins; it is always located at the N-terminal end of the amino acid sequence directly after the pre-pro region. This review is focussed only on the catalytic domains.

Alignment of primary sequences

The multiple sequence alignment of the catalytic domains of over 120 subtilases is shown in Figure 2. Additional variants with <10%

Reprint requests to: Dr. Roland J. Siezen, Department of Biophysical Chemistry, NIZO, P.O. Box 20, 6710 BA Ede, The Netherlands; e-mail: siezen@nizo.nl.

R.J. Siezen and J.A.M. Leunissen

Fig. 1. A: Schematic representation of the secondary structure topology of subtilases, with α-helices shown as cylinders and β-sheet strands as arrows. Solid lines indicate the conserved regions (SCRs) in all subtilases, and dashed lines the variable regions (VRs). Approximate location is indicated of the main Ca2+-binding sites (by Ca1 and Ca2), catalytic triad residues D32, H64, and S221 (by *) and substrate-binding region (between strands eI and eIII). B: Ribbon-plot representation of the secondary and tertiary structure of subtilisin (PDB code 2SNI), made with MOLSCRIPT (Kraulis, 1991). Side chains of the catalytic residues are shown in ball-and-stick representation.
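
The distance criterion quoted for the structure-based alignment (Cα atoms closer than 2.0 Å after superposition) lends itself to a short illustration. The following Python sketch is not the authors' procedure: it assumes that the two structures have already been superposed and that their residues have already been paired position by position, and the function names and toy coordinates are hypothetical. It flags the paired positions that satisfy the cutoff and groups them into contiguous stretches, a rough stand-in for structurally conserved regions.

```python
import math


def equivalent_positions(ca_a, ca_b, cutoff=2.0):
    """Return indices of paired C-alpha atoms closer than `cutoff` angstroms.

    ca_a and ca_b are equal-length lists of (x, y, z) tuples for two
    structures that have ALREADY been superposed and paired position by
    position (both steps are assumed to have been done elsewhere).
    """
    if len(ca_a) != len(ca_b):
        raise ValueError("coordinate lists must be paired one-to-one")
    return [i for i, (p, q) in enumerate(zip(ca_a, ca_b))
            if math.dist(p, q) < cutoff]


def contiguous_runs(indices):
    """Group sorted indices into (start, end) runs of consecutive positions."""
    runs = []
    for i in indices:
        if runs and i == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], i)   # extend the current run
        else:
            runs.append((i, i))           # start a new run
    return runs


# Hypothetical toy coordinates (angstroms), not taken from any PDB entry.
ca_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
ca_b = [(0.5, 0.0, 0.0), (3.8, 1.0, 0.0), (7.6, 4.0, 0.0), (11.4, 0.0, 1.5)]

core = equivalent_positions(ca_a, ca_b)
print(core)                    # [0, 1, 3]
print(contiguous_runs(core))   # [(0, 1), (3, 3)]
print(len(core) / len(ca_a))   # 0.75, fraction of equivalent positions
```

On real structures, the last value would correspond to the share of structurally equivalent Cα atoms, the quantity reported as about 85% for the extended core.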


Subtilases
[Fig. 2 (first page): multiple sequence alignment of the catalytic domains of subtilases, alignment positions 1-40.]
Fig. 2. Continues on following pages.
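
Conservation in an alignment such as Fig. 2 can be expressed per column. The sketch below is a minimal illustration in Python, not the method used to prepare the figure: the toy sequences are invented, the gap symbols are assumed to be '-', '~' or '.', and the threshold for calling a column conserved is an arbitrary parameter.

```python
from collections import Counter


def column_conservation(alignment, gap_chars="-~."):
    """For each alignment column return (most common residue, fraction).

    The fraction is the share of ALL sequences (gaps count in the
    denominator) that carry the most common non-gap residue.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("aligned sequences must have equal length")
    result = []
    for column in zip(*alignment):
        counts = Counter(c for c in column if c not in gap_chars)
        if not counts:
            result.append(("-", 0.0))     # column contains only gaps
            continue
        residue, n = counts.most_common(1)[0]
        result.append((residue, n / len(column)))
    return result


def conserved_columns(alignment, threshold=0.9):
    """Return (column index, residue) for columns at or above `threshold`."""
    return [(i, res)
            for i, (res, frac) in enumerate(column_conservation(alignment))
            if frac >= threshold]


# Hypothetical toy alignment; the sequences are invented for illustration.
toy = [
    "VAVLDTGI",
    "VAVIDTGV",
    "IAVVDSGI",
    "VGVIDTGI",
]
print(conserved_columns(toy, threshold=1.0))   # [(2, 'V'), (4, 'D'), (6, 'G')]
```

Lowering the threshold admits columns with a few substitutions, which is one simple way to separate invariant positions from merely well-conserved ones.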



[Fig. 2 (second page): multiple sequence alignment of the catalytic domains of subtilases, alignment positions 50-130.]
Fig. 2. Continues.

[Fig. 2 (third page): multiple sequence alignment of the catalytic domains of subtilases, alignment positions 140-200.]
WAVNRAYEQ.....
G"LLVIsc~GNGK
~~~.~~."............
p v ~ p I \ R q s . . ~ . S"SAT
~ A ~ . . . . . . ~ . ~ . . .L.A.
S F..""
..~
STTGD
Q ~ ~ ~ ~ ~V.E F
.~.
A p.C.
T "." ".. ~.
. .
. . . . . ~ ~ ~ ~ ~
K S A L ~ X A Y N ~ " " . C I L I ~ ~ . ~ ~ ~ s ~ ~ ~ ~ ~ . . . . . . ~ . . . . . . ~ ~ ~ ~ ~ ~ L y p ~ ~ y ~ . . . . ~ ~ I ~ ~ c ~ ~ ~ ~ ~ . ~ ~ ~ ~ ~ ~ ~ . . " " " ~ L Q R L p ~ ~ ~ ~ ~ ~
Q N R I I ~ ~ L Y ~ ~ " . ~ . C ~ L ~ I ~ ~ N s G ~ . . . . . . . . . . . ~ . . . ~ ~ ~ ~ ~ ~ ~ ~ " s y p ~ s y ~ . . . . ~ " ~ ~ " ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ .
. . . . . . .~
. ~ ~ ~ ~ ~ .
~ ~ ~ ~ ~ ~~
~
Q N A H R N F y Q Q " " . G H L L ~ ~ c N ~ c ~ ~ . ~ . . . . . . . . . ~ . ~ . . . ~ . . ~ . c ~ ~ y p ~ s y ~ ~ ~ ~ . ~ ~ ~ ~ " ~ " ~ ~ . . . . . . . ~ ~ . . . . . . . . s ~ ~ ~ ~ ~ . . . . . ~ Q ~ ~ ~ Q ~ ~ ~ ~ ~ ~ ~ . . . . . . ~ E ~ $ A p ~
RDASyWAqQQ . ~ GAVQI-I\QTsGDc~pL
~..
~ . . . . . . ~ . .~.~.~~
C ~
y p.
A.
~...
K y S~ V I ~ - V D Q.....~~~~........
N c S V p T ~ . ~.S S~
D G p.E
~~-~...
V.
D T.A.A.P ~
G V.
-------~~
~ A V N Y S Y N K - ~ ~ ~ ~ G V L I I A ~ I \ Q T S G P Y Q ~ - - - - - - - - - - - - - ~ ~ ~ ~ ~ ~ ~ S I G Y P G A L V - - - - N A ~ A V ~ L E N ~ ~ ~ ~ ~ ~ ~ ~ ~ - - - ~ V E N C T Y R V A D F - ~ ~ ~ ~ S- S -R- C ~ S ~ ~ C D ~ A ~ Q ~ ~ D - V ~ I $ A P C A
T N A V D Y A Y D K - ~ ~ ~ - G Y L I I ~ A I \ G I ( S G P K P G - - - - - - - - - - - - - ~ ~ ~ ~ - - - S I ~ Y ~ ~ A L V - - - - N ~ ~ ~ ~ A ~ ~ ~ N ~ ~ ~ ~ ~ ~ ~ ~ ~ - - - ~ I Q -~ ~ T ~ ~ ~ ~ ~ F ~ ~ ~ ~ ~ S S R G ~ K R
" E A " K ~ ~ "~ S . " " Q I L " W ~ - ~ ~ c ~
~ ~~
D ~ ~ ~ ~ . . . . . .E V.
I S V~
G A I.
NF .
~. ~
. .~. ~
~~
~
~ ~
~
~ "~
~
A S~
~
E F~
~
~ .
~~
.
~.""
~.
~
~.~ S.
~.~N~
.~ S~..
~~ ~..
E~ V D L V A p G E . . .~.
. . .
~ ~ ~ " ~ N I \ V ~ N . " " G V L V V C ~ C ~ ~ D C D~~~..............
ERTE
EL~ypAA
~ . ~.
E V.I A V G S V S V~ ~ ~ ~ ~A ~
R E ~
~ S ~~
F ~~
~ ~ .
~ ~.
...........
S.
N A. N.K .E ~.~ .
~ ~ L V A ~ G ~ . . . ..
. . .. ~
I(~RVXyAVSN"""1Svv~~~~cDc
~
~ ~~
D ~ ~ . . . . .E.
~A
.y.p ~
A A y.N
""
...
EVIAVGAVDF
.
~ ~ ~ ~ ~
~ L~
R L~
s D F~
. ~ ~~
~ ~.
p -.
E E. ~ .
~ ~.. ~." .
IDIVAPG"""".~
" "..
.
.
H Q ~ I R W \ ~ ~ E . ~ ~ " D I L V ~ ~ ~ ~ ~ ~ ~ c ~ ~ ~ ~ ~ . . . . . . . . ~ . . . . . ~ y ~ y p c ~ y p . . . . ~ ~ ~ Q ~ ~ ~ ~ ~ ~ ~ ~ .~ ~ ~ ~ ~ ~ ~ . . . . . . .
WDRIKEAVAS"".GRLVV=-G~cDcNEE
~~~~
~ .~ .~. ~. .
~ .
~ .~ p""E
~ F AVVQVGSVSL
y p G A. .
y . ~ . . ~ ~ ~ . . . . . ~. . .N ~ ~
S
~
N ~
C~
IK
~
D L~
~V I.\ .
~ * .G.~E. " ~" " . ~~ ~ ~ ~ ~~ ~ ~ ~~ ~ ~~ ~ ~~ . ~
~

Yaprod
l l l p r i
trt41a
Laaqua

taproc
iaprok
Lapror
bbprl
tuhrlp
plbspr
macdpa
"Uespr
acalpr
dfO'YZ

aooryz
afelsl
dnprfd
dnpepd
rhprbl
anpepc
SCpFbl
SCYFPJ

"P6FPr
y1xprz

scyct5
COr.AB(

etcy12
6PPFPP
1siasp
bspara
hecplp
Il">PP
1spo:
cepc2
bcpc:
hspc2
dCPC1

lapel
hepcl3
bCPCl
hspac4
hrpcb

aatur
dmfurl
fLfYI
cctur,
actur1
actur2
Isfur2
"PC4
xlfuri
hrfur
dmfu2-2
cetur2
hakxl
hslpc
ylxpr6
k l kexl
eckex2

apkrpl
hvccvp
avprca
"maspa
SlSSP
scsepr

rmserp
SWSPl
6.66P2
phesal
bsspra
besprb
bsbpf
bsvpr
'p6cp'
11prtp
Idprtb
llsp09
agserp
1cp69
CrnCYC"

paat70
dCr3FI-P
hsklaa
ddtagb
ddcagc
dapg.9
hstpp2
CCtPP
emstab

PfPYro
t6p16L

Fig. 2. Continues.

~~~

506

R.J. Siezen and J.A.M. Leunissen

basbpn
bs6168
bssdy
blscax
bstipl c
bsspxd

bsapxq
bls147

baa1 k p

bseyab
bsaprs

bscpr
bsseor
Ymvapt

p=ap=p
pdalY6
bsfa19
b~t.341
bplsp

bsispl

bslakp
bslspq
Lllap

LvLher
tstap

bsakl
hmhlys
nahlys
syo531
bsvpra
dnbpi
dnavp?
dna"pc.
alaprl

xrproa
..c.st

c o r . AB
"aproa
alapl2
Lrt41,
Laaqua

taprot
Laprak

Lapro.
bbprl
tuea1p

plbsp,
macdpa

aocspr
.CdlD.

ataryz
d"0'yZ

ilfC1.t
anprta

anpepd

Lhilrbl
="pep=
s;cprbl

ECYspl
spsepr

ylXpr2
scyct5
cor-C

~ ~ N l L S T W I -C - S- - ~- -~- -~- -~ - ~ - ~N Y ~A R~I I~I -~ S G


~ T S H R S P H I A C L L A Y F V S L Q P S S D S A ~ A V ~ ~ ~ ~ ~ ~ E E L T P A K L K K D I I A I A T E ~ A ~ ~ - - - - - - - ~ ~ ~ ~ ~8~6 ~
1 ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ D I P S N T
~ ~ N I L S T Y I G S~- ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ - - ~ - - - - - - - O D I T A T L - S G T S H I S P R Y I C L L T Y F L S L Q ~ C S D S E F F E L G Q ~ ~ ~ D S L T P Q Q L K K ~ ~ ~ ~ ~ 4~5 1T ~ ~ ~ ~ ~ - - - - - - - ~ ~
~ ~ N l M S T Y I ~ C~ ~S - ~~ -~ ~ ~
- - ~R N~ A ~T L~ S L
~ - ~S G~ T ~S M~I S~ P ~R Y~ l G l L S Y F L S L Q P R P D S E F F N ~ ~ ~ ~ ~ ~ D A P S P Q E L K E ~ " ~ ~ ~ ~ = ~ " L G - ~ ~
- N I L S T W I C S ~ ~ -~ -~~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ N T S T N T I - S G T -S- -M- A~ T~ P- H- V ~A -G IL\SA AS YI Y
I SL E~ VLX~D ~A .I .
IKMGIHDVLL~~~~~~~~~~~~~~~~~~~~~~~SIPVGSSTINLLR
- - O I I S A S Y Q S - - ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ D S G T L V Y ~ S G T S M A C P H V A G L A ~ Y Y L ~ l ~ - - - - - - - - - - - - - - E V L T P A Q V E A L I T E S N T G V L P T ~ ~ ~ ~ ~ ~
- - EIESLSHLN~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ - ~ Y N D T L I L - S G T S M S T P l V T G V ~ ~ L L ~ K C . . . - - ~ - - - - ~ - ~ - - - - I E P E M I A Q E I E Y L S T R ~ F 6H5 1R R T L - F F I ( P S T P N Q I

efcyla
sep=pp
161 asp

- - Y Y P T S L V S P L G K A A D F ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ P D ~ Y T L S - F G T S L A T P E V S A A L A ~ l ~ ~ ~ ~ -- --RNSHLKYKEVRII
- - - - ~ - - - - - - D ~ ~ ~ D S N ~ V ~ N ~ L F
~ E I T T M l V A N 7 R L V G K I S D ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ P I G Y T L N - - M- -G- N -SI RIYAP STI SN EYI R
I SSL l
G SCr YNO-D K-E -R N L ~~~
~ --IEITKRVIEDEIV ~
~
~ - E I I T T I G T D A I W I D F Q F I E N V P R G F I l n - I G T S L I T G L F ~ I ~ - - - - - - - - - - - - - ----- - S-L-Q R F K S A N F Y

bspara

~ ~ E " L A l D K ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ . ~ " " " " " " " . Q S E I T I Q . S G T S F I \ T P ~ " ~ ~ " ~ ~ L y l E D C E " ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ S I D L D F L R S I ( S E D L G " " " " ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
K Q S V L S T S S ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ - - - ~ - ~ ~ - - - - N G R Y I Y Q - S G T S L R ~ P I Y -S LG ~I \
O LQ RP LE ETIAD II E( LY F
Qr -K ~r ~c-l-E~ K~ E~ r- Y -H D R X B Y G N C r L D V Y K L L K E
KDWLFTTAN ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ T G W Y Q Y V ~ Y C N S F A T P K Y S G A L ~ L ~ ~ D K ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~

Geeplp
lln16p
1 "PC2
ccpcz

bcpr2
hPpc2
dcpcl

laprl

hspcl3
bcpc3
hspac4
hspc6
aatur
dmfuri
tttur
cctur1
acfurl
actur2

1etur2
mmpc4
xlturR
hsful

dmtur2
cetur2
hakxZ

hslpc
ylrpr6
klkexl
ecker2
epkrpl
hvccvp
avprca
asaspa
s1ssp
6c6epT
srnscrp

6mSSpl
6msspZ
phseal
bsepra
bsaprb
bsbpf
bsGr
spscpa
11prtp
ldprtb
116~09
agscrp
lep69
cm.Ic"c"

paat70
atscrp

hsklaa
ddtagb
ddtagc
dmpga9
hstpp2
=tPP
smstab
PfPFO
tsp1st

D A G V A T T D L Y - - - - - - -- - - - ~ ~ ~ - ~ ~ ~ ~ - ~ ~ ~ ~ ~ M I C T A S H ~ S ~ T S A A A P E A A C Y P R L R L E A ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ N L T W R D M Q ~ L N L T S K ~ N ~ ~ ~ D ~ - - - ~ ~ - - - - N ~ ~ ~ - H W K
E T C V A T m L y - ~ ~ " ................
~~~
GRCTRSH-SGTSAARPEAAG\IFRLALAL~ANP~~~~~~~~~~~~~~~SLTWRDLQHLNLT~~~N~~~D~~C~FII~lNCSHFEU~NGVGLEYIDM(LFGFGVLDA
E A C V A T T D L y - ~ ~ " ~ ................ G N C T L ~ R - S G T S ~ A P E A A C Y F R L A L A L Q A N P ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ N L T W R D ~ Q " L T " L T ~ ~ ~ N ~ ~ ~ ~ ~ V H E - - - - - - - ~ - ~ ~ - W ~ N G V G L E F I D M ( L F G F G V L
EAGVATTDLY~~~~~~~~~~~~~~~~~~~~~~~~~~GNCTLRH-SGTSAAAP~~~~~F~L~LEANL----------~----GLTWRDMQHLT~LT~K~NQLHDEVHQ-~-~~~~~~~~~
E C R V T S A D L H ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ - - - G K C R I S R - ~ ~ T ~ ~ A A ~ ~ ~ A ~ L ~ A L L L E S N P ~ - ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ N I T W R D A Q ~ ~ ~ A H T S R M E P L A L E ~
DPRITSADLH~~~~~~~~~~~~~~~~~~~~~~~~--NECTQTH-TGTSASAPLAA~IFALALEQNP-~~~~-~~~~~~~~~~LTWRDLQHIVVWTSEFDPLA~G-------------WI(RS
DQRITSADLH~~~~~~~-~-~~--------------NDCTETH-TGTSASAPLAAGIFALALEANP~~~~~~~~~~~~~~~NLTWRDM~HL~~~TSEYDPLAMIPG~~~~~~~~~----WKKNGAG
DQKISS~LH~~~~~~~~~-----------------HECTDSH-TC~SAAAPLAAG~LALALEANP~~~~~~~~~~~~~~~NLTWRDVQ~LIVWTSEYDPLSS~G~~~~~~~~~----~FQNG
ERKIV~DLR~~~~~~~~~~----------------QRCmCH-TGTBVSRPMVACIIALALEANS~~~~~~~~~~~~~~~QLTWRDVQHLL~KTSRPAHLKASD--------------~~~N~A~HKV~HF
DKKII~DLR~~~~~~~~~~----------------QRC~IDM(~TCTSASAPMAAGIIALALEANP~~~~~~~~~~~~~~~FLTWRDVQHVIVRTSRAGHLNA~~-~~~~-~--~~--~K
E K Q V I ~ L H ~ ~ - - - - - - - - - - - - - ~ - - ~ - - - - ~ ~ - H S C T S S H T C T s R S R P L A A G I A A L V L E A N P ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ - ~ ~ ~ ~N ~L ~T ~W WR SD~LNQG~YIGVRVR RV TS AH KS ~F G Y
N GL LK MD DP AT A~ H~V~I~L~A~O~ ~
EKQVVT~LH------------HSCNSHTCTBRSRPLRRGIAALVLQSNQ~~~~~~~~~~--~-~NLTWRDLQ~IVVRTAKPANLKDPS~~~~~~~~~~~~~~~SRNGV~RRVSHSFGYGL
EREIITSDLH-~--------------~-~-~-~~~~HSCTTQH~TGTSASAPLAAGICALALEANK~~~~~~~~~~~~~~~QLTWRDMQHIVVRTARLANLQSSD~~~~~~~~------~~TN~"~RH"~
EK~ILTTDLH~~~~~~~~------------------HAC~H-TGTSASAPLAAGIVALALEANP~~~~~~~~~~~~~~~NLTWRDLQH~VIRTAKPINL~GD--------~-----WTTNGVGR~SHSFC
E K Q I V T ~ L H ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ - - - - - - - - - - - - - - - ~ ~ ~ ~ ~ ~ - ~ ~ T ~ A S A P I V ~ ~ L L A L A L E A N P ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ S L T W R D L Q ~ I I ~ E T A K ~ D ~ L ~ ~ D ~ - - - - - - - - ERQIATTDLR~~~~-~~~~-----------------QRCTTI1I-TGTSlSAPLAA~I~AL~LEA~~~~~~~~~~~~~~~~DLTWRDVQYITLMTSRSDPI~DGQ-----------~--WIVNGVCRKVSLRY
E K C I A S T D L H ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ - ~ ~ ~ - - - - - - - - - - - - E K C T ~ - T G T S I \ S I ~ ~ ~ ~ ~ ~ ~ E - - - - - - - - - - ~ ~ - ~ W V T N C V G R Q V S L R Y G Y C L H D
O P Q I V ~ L H - - ~ - ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ H Q C T D K H T G T S A S R P L A A G M I A L ~ L E A N P - - - - - - - - - - - - - - - L L T W R O L Q H L V V R A S R P A Q L Q A E D ~ ~ ~ ~ ~ ~ ~
EI(QIVTTDLR---------------~~~~-~~~~~~QKC~SH~TCTSASAPLAAGIlALALEANK~~~~~~~~~~~-~~-NLT~RDMQHLVVQTSNPAGLNAND~~~~~~~~~~~~--~IT
EKQIVTTDLR---------------"-QI[CTESHTCTESH~TGTSASAPLAAGIIALTLEANK-~~~-~~~~~-----NLTWRDMQHLVVQTSK~AHLNA~~~~-~~~~~-~~--WATNGVGRKVSHSYGYGLLDAGAMVAL
DKSVANDHDGSLRPD--------"-HIC?1IEHTCTERSAPLAAGICALALEANP~~~~~~~~--~~-~~ELTWRDMQYLVVYTSRPAPLE~ENC~~~-~~~------~TLN~VKRKYSNKFGYGLMDAGAMVSLAI9
QPAIVNDVP--------------------~~~-~~GGC~KH~TGTSASAPLAACIIALALEANP~~~~~~~~~~~~~~~ELTWRDMQHLVLRTAN~KPLE~G-------------WSRNGVGRMVSNKFGYGL
EN~HY~LY------------------------~-H~TEEF~KGTSASAPLAAGI~ALTLEANP~~~~~~~~~~~~~~~LLT~RDVQALIVHTAQITSPVDE~--------------W~RNCRCFHF~KFGFCR
LRSIVTTDWDLQKG----~~--~~~~~~~~~~~~~~T~CTECH~T~TSAAAPLAA~MIALMLQV~P~~~~~~~~~~~~~~-CLTWRDVQHIIVFTATRYEDRRAE-~~~~~~-------WV~
--YIYGTDINAIDDKSRR---~~~-~~~~~~~~~~~PRCQNQH~GGTSARAPLAAGVFALALSVRP--~~-~~~---~---DLTWRDMQYLALYSAVEINSNDDG~~~~~~~~~~----~QDTASGQRFHHQF
--YIITTDLD---~-~~~~~~~~~~~~~~~~~~---EKCSKI1(-GCTslU\RPLAAGIYTLVLERNP---------------NLTWRDVQYLSILSSEEINPHDGK~~~~~~~~~~~~~-WQDT~GKRYSH
--Y~HSSDIN~~~~~~~~~~~~~~~-~~--~~~---GRCSNSH-GGTS~APLAAG~YTL~LEAN~---------------NL~WROYQYLSILSAVGLEIWADGD~~~~~~~~~
~~SIL~PE~~------------------------GTCTRSH-GGTSAAAPLASA~YALALSIRP-------------~~DLSWRDIQH~~YSASPFDSPSQNAE--~-~--~~~~~UQKTPAGFQFSHHFGFGKLDASKFVEV
~~PNE?VUYD~-------------------------GKCGFIP-SSSSARPPILG~LLALIRAHP--------------~TLTL~IQRIL~RAA~~V~T~~GRGW~~~~~~~~~~WLMlV~R~~RNFGFGEVS
T P G I W T R D R T G V - V G Y N S G N L G D Q A - - - - - - - - - G N Y R I ~ ~ - ~ ~ T S ~ A C P ~ - ~ " ~ ~ L I L S ~ N ~ - - - - - - - - ~ - - - - - - ~ ~ ~ ~ D ~ " ~ D I I K R ~ C D R I D P V G G - ~ ~ ~ ~ ~ ~ ~881
~~~~~~~---~NAEGRSPF
APA~VT~LPGCDUGYMlVDDPSTNRLHMIPQLDlSCDYNG~~~--------------~~DLSYRDLRDLLI~NITRLDAN~PVQINYI9)VTGLECWERNAAGLWYSPSYGFGLVDVNKTQPCSIIl91
~

--LILGTLP---------------------------GGK~GYM-AGTSMASPHVAGVAALlKS~P---------------HASPAMVKALLYA~ADATACTKPYDlDGDGKVDAV------~E~PK~~CFYG~GMADALDAVTW
- - D I Y S T Y P - - - - - - - - - - - - - - - - - - - - - - - - - G C G Q C T Y P - - - - - - - - - - - - ~ ~ ~ D ~ T P A Q I ~ T R I E ~ T A E R S V N G ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ - ~ - - - 7-0-) - - - - H D D F V G W G V V D
~~DIYSNCRLES~GCAVM(EAYNKGELSL~~-~~NPGYGNK-SGTSMAAPHVTGVAAVLMQRFP~~~~~~~~~~~~~~~Y~SADQISAVIK~ATDLGVA--------"GIDNLFA
-~RVYSSIIEGTSVENL-------------------TTGYAKY-SGTSMAAPHVAGSVAVLMERFP--~~----~------YLNGAQVAEVLKTTA~MGAP~~~~----------~~~~~~~~~~~GIDALYGWGMINLGKAI
--I(IYSNRNGSDP~-----~-~-~~-~------~SDYGNK-NGTSMATPHVTGAVA~LLQRFP---------------~~SSAQIADVLKTTA~MGAP~~~~~----------~---~~~~~~CIDALYGWGIIINLGI(
--LIGVADEHKKP-----------------------QYGLTKE-~TSFSAPAITASLAVLKE~~D---------------~~TATQIRDTLLTTA~LGEK------------~-~~~~~~~~~~~G~
--OIYSARYFTPLSALSAQILEYISPRH-------LPYYTTF-SGTSMAAPHVAGIlALMLE~~-----~---------~~~~LE~KEILEGTAlPMEGY-----------~~~~~~~~~~~~~~~AlWETCAGYVDA
~-DIYS~RVLAPLSALMEIA~Ll~PQH-~-~~~~LPYYT~~SGTSMATP~AGlVALMLEADP---------------T~~PDQVKEI~QHTA~PGY~~~~~-------------~~~~~~~~EAWEVGAC
~~NIRSSVP~~~~~~~~-~~~~~~----------~~GQTYEDG~DGTSMAGP~VSAVAALLKQ~A---------~-----SLSVDEMEDILTSTAEPLTDST~~~----------~~~-~~-~~PDSPMIGY
~~NIVSTIP~PDH~~~~---~------------PYCYCSI(QCTSMASPH~AGAVAVIKQAKP~-~~~-~~~------KWSVEQ1KAA~M~AVTLKDSD~~~~~~--------~~~~~~~GEVYPH
~~DILSSVA-~~--~~---~----------------MIKYAKL~SGTSMSAPLVAGIMGLLQKQYE~~~~~~~TQYPDMTPSERLDLAKK~LMSSATALYDED~~~~~~~~----------~~E~YFSPRQ~A

~ ~ N I W S T Q N ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ - - ~ ~ ~ M I C Y ~ ~ S C T S M A S P ~ l A ~ ~ Q ~ ~ ~ K Q A L M I K M I ~ ~ Y A ~ ~ ~ Q ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ? V ~ ~ ~ ~
- - ~ l Y S L I \ ~ - - - - - - - - - ~ - ~ ~ - ~ ~ ~ ~ ~ ~ ~ ~ - ~ ~ ~ - D N K Y Q Q M ~ S G T S M A S P F ~ A ~ S E A L ~ L Q G ~ ~ - - - - - - - ~ Q ~ ~ N ~ ~ ~ ~ ~ ~ ~ Q F ~ ~ ~ A ~ ~ S H P ~
--SIWRAWSSNSTE----------------------GENFAL~-S~TSMATP~AG~A~~~KQ~HP---------------NWSP~lASAlM~AQ~D~~~LL-~~~~~~~~~AQQAT~PSTATPFDYG
~~LYLASWIPNEATAQICRiYYL-------------~~H~~~-~~TSMA~PHA~GV~ALL~AHP~~~-~~~~~~~~~~~EWSPARl~~~~~~~N~~~NTLNPl~~~~~~-~~-~~NILAAWPTSVDDN~------------------KSTFNII-SGTSMSCPHLSGVRALL~S~P~~~~~~~~~--~~~-D~~P~~KS~M~RDTLNLANSPI--------------LDERLLPAD~YAICAGH~
- - EILAAWPSVAPVGGIR - - - - - - - - - - - - - - - - -N T L F N I I - S G T S M S C P H I T G I A T ~ K T Y ~-P - - - - - - ~-~-T W S.P.A.
AlKSALM~ASPMN~---~~~~~~~~~~---------FNPQAEFAYCSGHVNPLKAVR?Gll441
- ~ N I I A A W N P P N Q S D E D T W S E H T - - - - - - - - - - - - P S T F M L L ~ ~ - ~- - - - T
~N~
- S D- ~ P- G T P F D F G A G W N P I C R L P P C o
~~N~LAAUTGARGPTGLASDSR--------------RVEFNII~SGTSMSCPHVSGLAALLKSVHP---------------~~~P~~RSALM~AYKTYKDGK~~-----~~~~~~--LD~ATGKPSTPFDHGAGHVSPTTATNPGIl
- - GVRGSGV~ - ~ - - - KGGCRAL
~ ~. ~ G~T ~ ~~A ~ ~~~ A ~~A ~ T~L L ~~~ ~ Q- . .~
. . .-. . ~
. . .~
. . ~~
E L ~ .N ~ ~
A ~ ~~
K Q A~
L I A~ ~ ~~
~ ~ ~-~ . . . ~ ~ ~ ~ ~ ~ - - - - - - - - - - --Y~?.~~~~N~ENSToQCGDGSLPN-----------RN~~~~~-~~TS~ATPLATAATTlLRQYLVDGYFPTGESVEENKL~P~~~~~~~L~lMIAQLLNGTYFWSASS--~-~~TNPSNA~FEQlNCANL~QGW

--YITS~SNG~~QCGDGSLPN-----------~ALLAl-SGTSMATSFAAAA~lLRQYLVDGYYPTGSlVES~LQPTGSLLKALMlMIAQLLNGTFQLlTSSSl~~~-TYPSNQVFENFAGASLV~WGAI~SNWLHVllO~2l
~ ~ A I A S V P Q~
F T ~ ~ ~ ~ . M S~
K S Q L~
M.NG~
T ~ M - ~~
~ A = A~
" * ~ ~~
I S = L K~
...~~
~ ~ .~ ~
~ ~ .
~ ~~
~ N. I .E .
~ ~~
~~
~. .
S.
~
I .
K. .
R .~ .~. ~. V. .
T ~
A ~T. ~~
K ~L~
A G ~~
~H V~
G L~
L ~~
~ = ~ ~ ~ H L 1
-2 7 ~ 1
~~AIASVPNW
~
T
~
~ ~
~ ~ ~ Q ~ H .~
N ~ ~ ~ ~ ~ ~
S. ~. N. A. C. ~~
~~
I~~
DA
~ -~
L~ V~R~
RL A SL~
~
E~~ A~
L V K~ A
-D N~
] ~
~ . . . . . .~
. . . . .E~
V F A~
.
QG~
~
G l ~~
QV~
DKA
~
~
Y D Y~
L 1 73 ~1 - ~
--~
..AFAGYPQYC.........................
R Q ~ M - ~ . N ~ T S ~ S ~ ~ N - G - A C M L ~ G L.
K..........Q-K ~ T P Y ? V R M A L E ~ A Y M L P " I. . . . ~ ~ ~ ~ ~
~ ~ ~~
~ E S~
F SQ G~ U-l K lA TA Y-E K Li ~-l 3-l
.FEUAS~TIDCRGY~~~..............~.~~~AQpDVF-~~T~~ATPyTS~T~AL~~QAYKE-----~~~-~Vy~TpDp~TA~~~LKSSAKDIWY~~---------~--~~~~~~------~PAFSQCSGRMALKARD?Vl6O
..GIYSSLPMW.........................
1 G ~ ~ F M . s G T s M ~ T p ~ V S G ~ A L L I s...........
GpK
p ~ ~ ~ y ~ p D ~ ~ ~ ~ v L E s ~ A T ~ L E G D P ~ ~ ~ ~ ~ ~ ~ - - - - - - - - ~ ~ ~ ~ ~ T G Q K Y T ~ ~ ~ ~
~~HI~.SSLPLWYTV-S ~
~
~
~ ~
~ ~ ~ ~
. ~ ~ ~
~ ~ ~ A
~~ ~ ~. ~ ~ ~
~ ~ ~~
A L ~ ~
I ~ ~ ~
A K ~ ~
~ ~ ~ ~~
~ ~ ~~
~ ~ ~. Q ~ .~ ~ ~- ~ ~ ~ ~ ~ ? A I L ~ L ~ ~ K ~ ~ N ~

Fig. 2. Continued (see facing page for caption).

507

Subtilases

sequence difference are not shown in Figure 2 but are listed in
Table 2. Amino acid numbering used throughout this review corresponds to that of mature subtilisin BPN' (acronym basbpn), our
reference sequence. Residues in inserts relative to this reference
sequence are numbered in square brackets; for instance, residues
inserted between positions 12 and 13 are numbered 12[+1], 12[+2],
etc., or 13[-2], 13[-1] if more appropriate.
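The bracket convention above can be illustrated with a short sketch. It assumes a pairwise alignment in which gaps are written as "-"; the function name `number_against_reference` and the toy sequences are hypothetical and are not taken from this review. Each residue is labeled with the reference position it aligns to, and a residue that falls in a gap of the reference receives the preceding reference position plus a [+n] suffix. Only the forward form (12[+1], 12[+2]) is generated; choosing the backward form (13[-2], 13[-1]) "if more appropriate" is a judgment the sketch does not attempt.

```python
def number_against_reference(ref_aln, seq_aln, gap="-"):
    """Label each residue of seq_aln with reference-based numbering.

    ref_aln and seq_aln are two rows of the same alignment (equal length).
    Returns a list of (residue, label) tuples for the non-gap residues
    of seq_aln.
    """
    if len(ref_aln) != len(seq_aln):
        raise ValueError("aligned sequences must have equal length")

    labels = []
    ref_pos = 0      # last reference position seen so far
    insert_run = 0   # residues inserted since that reference position

    for ref_res, seq_res in zip(ref_aln, seq_aln):
        if ref_res != gap:
            # Column occupied in the reference: advance the numbering.
            ref_pos += 1
            insert_run = 0
            if seq_res != gap:
                labels.append((seq_res, str(ref_pos)))
        elif seq_res != gap:
            # Insert relative to the reference: number in square brackets.
            insert_run += 1
            labels.append((seq_res, f"{ref_pos}[+{insert_run}]"))
        # Columns that are gaps in both rows carry no label.
    return labels


# Toy example (hypothetical sequences): two residues inserted after position 2.
labels = number_against_reference("AB--CD", "ABXYC-")
print([label for _, label in labels])  # ['1', '2', '2[+1]', '2[+2]', '3']
```

An insert preceding the first reference residue would be labeled 0[+n] by this sketch; a real numbering tool would need an explicit rule for that case.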
The conserved catalytic residues Asp 32, His 64, and Ser 221 are
highlighted in Figure 2, as is the oxyanion-hole residue Asn 155.
Conserved core elements (black bars) and secondary structure are
indicated (Siezen et al., 1991; Heringa et al., 1995). This structural
framework can be used for homology modeling of subtilases of
known primary structure but unknown three-dimensional structure.
In some of the most highly diverged sequences there are regions
with very weak sequence homology, even in the core, which results in ambiguous alignments. In those cases, alternative alignments to those in Figure 2 may need to be considered.
These regions are found on the surface of the molecule and contain
numerous solvent-exposed residues, allowing for greater side-chain variation. Examples are (a) the exposed regions 43-58 and
182-218, which contain structurally conserved β-strands and turns;
and (b) the exposed amphipathic helices 104-116, 133-144, and
243-252. In the latter case, the sequence alignment of amphipathic
helices is also based on the requirement that at certain positions
non-polar side chains are conserved that point into the interior of
the molecule, while polar residues face outward. When necessary,
correct three-dimensional positioning of Cys residues to form
putative disulfide bonds was used as an aid in proper sequence
alignment.
Sequence homology and family division

In Figure 3, the pairwise sequence identity within the catalytic
domains is plotted graphically for all members of the subtilase
superfamily aligned in Figure 2. It is clear that clustering occurs
into groups or families, in which members show higher sequence
identity to each other.
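As a minimal sketch of how pairwise identities of this kind can be derived from a multiple alignment: the review does not state the exact formula used for Figure 3, so the definition below (identical residues divided by the number of columns in which both sequences have a residue) is an assumption, as are the function names and the toy sequences.

```python
from itertools import combinations


def percent_identity(a, b, gap="-"):
    """Percent identity between two rows of the same alignment.

    Assumed definition: identical residues / columns where both rows
    carry a residue. Columns with a gap in either row are skipped.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(alignment):
    """Map each unordered pair of names to its percent identity.

    `alignment` is a dict of name -> aligned sequence (all equal length).
    """
    return {
        (n1, n2): percent_identity(alignment[n1], alignment[n2])
        for n1, n2 in combinations(alignment, 2)
    }


# Toy alignment (hypothetical sequences, not taken from Figure 2).
toy = {"s1": "ACDF", "s2": "ACEF", "s3": "ACDF"}
for pair, identity in identity_matrix(toy).items():
    print(pair, identity)
```

Other denominators (alignment length, or the length of the shorter sequence) give different values for sequences with large inserts, so any threshold such as ">55% identity" is only comparable between analyses that share the same definition.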
Figure 4 shows the parts of a family tree or cladogram, a measure of the sequence homology between superfamily members,
constructed from the sequence alignment of the catalytic domains
in Figure 2. In our earlier paper, a less extensive tree identified two
main classes and some subclasses (Siezen et al., 1991). This expanded sequence information now allows a new subdivision into
six families, which are summarized below. The dendrograms in
Figure 4B illustrate the sequence homology within these families
and further subdivision into subgroups (or subfamilies). Many of
these subgroups are also apparent from the color patterns of sequence identity in Figure 3.

Subtilisin family
Only found in micro-organisms as yet. Includes mainly enzymes
from Bacillus, with subgroups of true subtilisins (>64% identity),
high-alkaline proteases (>55% identity), and intracellular proteases (>37% identity). Numerous minor variants of true subtilisins
and high-alkaline proteases have been identified (Table 2). Long
C-terminal extensions are rare. Several 3D structures are known
(see Tables 1 and 2).
Thermitase family
Enzymes found only in micro-organisms, including some thermophiles (>55% identity) and halophiles. The characteristic N-terminal
sequence was also found in several other Bacillus proteases
(Table 3). Only one 3D structure is known (thermitase).
Proteinase K family
Large family of secreted endopeptidases found only in fungi,
yeasts, and gram-negative bacteria as yet; the bacterial subgroup
has >55% sequence identity. This family is characterized by a
high degree of sequence similarity (>37% identity), only minor
insertions and deletions, and the absence of the Ca2+-binding loop
residues 76-81. Only a few of these enzymes have a significant
C-terminal extension beyond the catalytic domain. One 3D structure is known (proteinase K).
Lantibiotic peptidase family
A small number of highly specialized enzymes for cleavage of
leader peptides from precursors of lantibiotics, a unique group of
post-translationally modified, antimicrobial peptides (Sahl et al.,
1995). These endopeptidases have only been found in gram-positive bacteria, and several are intracellular. Only llnisp has a
C-terminal extension, which acts as a membrane anchor. Characterized by low sequence similarity with each other and other subtilases (Fig. 3), and by numerous insertions/deletions. The most
recently reported protein bspara from Bacillus subtilis is described
as a putative protease required for plasmid stability; we speculate
that it may also play a role in lantibiotic processing.
A few 3D structures have been predicted by homology modeling
(Siezen et al., 1995a; Booth et al., 1996).
Kexin family
A large group of proprotein convertases (PCs) have been identified, all involved in activation of peptide hormones, growth factors, viral proteins, etc. (Barr, 1991; van de Ven et al., 1993). High
specificity is seen for cleavage after dibasic (Lys-Arg or Arg-Arg)
or multiple basic residues. Nearly all are eukaryotic and have high
sequence homology (>40% identity), while two more distant members from Aeromonas and Anabaena provide links to other subti-

Fig. 2. (facing page) Alignment of amino acid sequences of catalytic domains of subtilases. Multiple sequence alignment was initially
performed using the PILEUP program (Devereux et al., 1984). Next, improvements were made manually by taking into account the
structure-based alignment (Siezen et al., 1991; Heringa et al., 1995). Inserts were judged to occur most likely in turns in external loops.
Families A to F are indicated on the left. Enzyme acronyms are given in Table 1. (*) New entries, and (c) corrected entries since Siezen
et al. (1991). Residue numbering at the top corresponds to that of mature subtilisin BPN' (basbpn). Catalytic residues Asp 32, His 64,
and Ser 221 are in bold (highlighted red), as is the oxyanion-hole residue Asn 155. Green = highly conserved residues from Table 4;
yellow = Cys residues. Structurally conserved regions of the coreABC and extended coreAB are shown as solid bars; common
secondary structure elements are shown as: h = helix, e = extended β-sheet, b = bend and t = β-turn (see also Fig. 1). The number
of additional residues in large inserts in the catalytic domain, and in N- and C-terminal extensions, are shown in brackets. Each
sequence begins at the mature N-terminus; an N-terminus based on the predicted pro-peptide cleavage site is indicated as (<). An
unknown number of C-terminal residues is presented as (>). Residues 146-156 of bspara are from a different reading frame than
proposed by the authors.

[Tabular material on pp. 508-511 of the original: content not recoverable from the extracted text.]

2 *O mZ N * w g g z
2 sp ,sp s& E
*
E s
g 2- 2 2

gsz

29&

* r i a

512

R.J. Siezen and J.A.M. kunissen


m

'

Y g

1 e +
)

SF

z 2

E2

.-0
m

gs3zz z g g z g

Q$$

*mvlz

- * w

8 zZ ga sE zo :E

; q z m m s g N % Z % g s 2 % r$- 2m *2
58
g c 3 % r-zs
133 E E I P X r ,

R-C
p !mC !m! Sm! z s g s m
8 8
m 2
p 2
g m x x x E a a a Qnn z n e x 2 , E x 2

Z N s
x 3 x

e,

se!

N N N m

e
0
2

m m

0'"

2 8 ;

-00

mrn

213

z
Q

y1

$ .?i

" m e
m m
* m m

n I.

22 2%

E.
a

Eg

2'5z 2

2j

D
s
u

P P P S

45

45 45 2% Pi

:@
Q R u ,vuu
,<,zg

Q\

m
a

-o

3
"

g j :
S

w- E
a % E $
c m " t n

1
"

$e,%%

L L

$ 3

e a

3 $ .E:2 . 33 . Gg

gg ggg
.e
.-.-.-.S s % 2 .-222 .- .s s a s l 2.Zg$$ z.sgz a s 2 P P $ 2 :
ssss:E s4:ss s 2 z s $ 3 7 a m u u u
c

'j;

,E:

vl

c 'c
'C
j;
2

'j;

-2z2z2z

e,

E
._

.-E
m

.E:
c z . 2 . 9
.E:
.E:

.E:

Ee, 3

.9

.E:

"

-009
" I

e,

"
"

.s

& & eZ eZ eZ

-"
x
x
&
&
K c K K u u & & e
8 95 9% c c c
9
g
p
3
.- .- ;;n n n '5
.s .5
e: , M A 3 3
Pm m
P

e,

E EQ EQ EQ

E E E E

m a

K K K
X X X

"

u u gZg

L L L

39

P m8 $23
0 u
0
u
m

$L1L141

2 %

*Q ~$ n~n n2 %r % nt

: %

k g 9 5

B + B B + + s B B ~ + + + I + B + + ~ , ~ : ~ : : : ~ ~ ~ ~ ~ Q + +

._
0

*r

&
_a

00

r-

E
.\

* a

.E Eu

*r

3
u

53

*8

.-

a 2 b , = E*
xs s3 z9 3 z
.y .y .y .y .y

-2

0- 0x'

2 -

13
0

z o 2m %m

0 2
0
g
E N
M n
U . *G - *
N
2
2
,
p
5 E & 232.8
-,.

*i

u u *

3
9 2 2 7
3.2 Q Q %

. y . y , y p

C 9
Z 9
C Q
9
*
4

$ 2 2.:a8
=

s s
~
j
z
<
~
~
$
~ 2 9 2 2 p * * g- x m
~ ? g~
~
22$-$~~.-~;~~~$~$gg-$~ g g g Z3 Zx u2 u9 x 2
2 2 2 2 2 .- 2z 2z 2z =
2 2l 2z 2z 2x 2z 2z =
2 2z 2- .-2 $ $ S e e e s 9 s $ 2 g 5
z z z x :
u u u u u
u u u u u
u u u u
u u u " C ' G E
Q EQ E Q S S H < < % % a O P
G G G G G G G G G G G G G G G G G G G &jdi;j 9 9 9 9 6 6 4 4 X I P d

ss

-._2

*
'1

E
G b
k 2
p "= *

s
2
E
f
f
E
E
S
4:
8 Z2ES.Z
3 %--I:
e %- 3r: g g g g g 4 i:

'1

Y)

a -

$a 5z u
2

.-

00
2
\o

'i:

A
5
._

--

.e'
._

;.>

. - E E

ks

.s$a
s2 a4 p
2

g
+ aj zs n
s

+2 %
=

2:s

2 9 %
E E S
a

2
!?

E
CI

00

.5

z
2

2
x

rn

r-

* N

a
-ra

E E

2L Z6
8JE

8%
iiia

9
ci

X X E

mrnE 5

$ g
9 s

L.

513

Subtilases

0 - m

gm ogp gW
X X N

g 8 8 %8 5

Zgz? N s g g
P N r n
;=F
p % m
zwg.0
$
$
$
gg
WP-F:
m m F W
X P E D E E E Z
~ Z 283 8 2 5:s E E

mu20

r-mm

00

N m

m m m o o
g m mg
WEm%
. l

zz

2 %

555

"2;;

r-

.2al

8
8

.c 'C

22

C C
' C .c

3 %

8 8 8 8

.-z g

C
0

.5
2 .5
2

Ea $ $ ? !a

5$ .5

0 0

u
o o a
C C C C

E Z

g2 2g

&E

k3
gg
a a a W - kb a
E Z.E.E.6B 6B 6B 6S
.
u u P - & d > > K .g.g
E g g E E g g 2 E g 2 2 g E2 E E E 2 2
m-mmm

a a

C)N

""

"

+++

6> 5>

o a l o a l

222

5:

3 8 8 3

>
C >
C >
C >
C

P)

'C

C C , C
'C'C
C

$%S$
5665

N N N N

u u

S B + +

"OGN

B S F

+ + + + + + :d: + + +

c~

EE

.g.s a
2 2 AE &G &G -g,

22

5 5 2
x *

z+ ++++

2-8
3 k

.y .2

f %

z g s sgz.9
g u u =:E&&

2Qo>
:
2 8
f %
:'E fal
E&+
rs Eg ez :. a9 !&.,a
- 0

,y
$ ,y5

5EE *.@3=A

.Y

s s 3Q F.
9
2
5 %
35 2 2 %
g z g g z E L : s.2 g z g z
Ei% X E e : $ 4 Q + = & E *

8g

Ei

-kz? 2
m

$22

22%

8::

E E ,E S D

E %

g g8 Zg g 4
m

P *
@

8
P

%%,x
E E x

v1

$
6
.d

B
c

B
c

*%

-% S

m
i
E
Zv

" 8
0
E
N

~x

5
6
Z

514

R.J. Siezen and J A M . Leunissen

Fig. 3. Pair-wise sequence identity matrix. Sequences are plotted vertically and horizontally in the same order as in Figure 2; the incomplete sequence of hvccvp is not included. Subdivision into families A to F is indicated. A color code bar for percentage sequence identity is shown.

lase families. A subgroup of yeast enzymes is evident, as are subgroups of PC1 ( 2 3 % identity), PC2 (>73% identity), and furin ( X 5 5 identity). In catfish herpes virus 1 a related but incomplete amino acid sequence has been found that is presumed to have been captured from a host (Rawlings and Barrett, 1994).
Several 3D structures have been predicted by modeling (see below).

Pyrolysin family

Heterogeneous group of enzymes of varied origin and low sequence conservation (most <37% identity). Characterized by large insertions and/or long C-terminal extensions, many with sequence homology suggesting common ancestors. The most extreme example is llsp09 from the plant Lilium, with insertions totaling more than 260 residues compared to subtilisin, almost doubling the size of the catalytic domain. Subgroups of tripeptidyl peptidases and plant subtilases (>37% identity) are distinguished; the former are of higher eukaryotic origin, but only the human and mouse enzymes have actually been identified biochemically as tripeptidyl peptidases.
Several 3D structures have been predicted by modeling (see below).

Several other subtilases have been identified for which only the N-terminal or other partial sequence of the purified enzyme is available; based on sequence alignment with Figure 2, these subtilases presumably belong to families A, B, and C (Table 3).

Conserved residues

Highly conserved residues are listed in Table 4 and highlighted in Figure 2. Only the essential catalytic triad residues D32, H64, and S221 and a single glycine residue (G219) are totally conserved in all sequences. Four other glycine residues (34, 65, 83, and 154) are varied only once or twice; G34 and G154 have main-chain torsion angles that do not allow for amino acid residues with side chains. At several other positions the variation is limited to two or three residues, which are usually structurally similar. In general, the residues of the two internal helices hC and hF are the most highly conserved in all subtilases. Three amino acid sequences (lslasp, sepepp, and asaspa) are particularly poorly conserved; although it seems questionable whether these enzymes are functional, a mutation analysis of the pepP gene suggests that it indeed encodes a functional protease (Meyer et al., 1995).
Many more residues are totally conserved within each of the six families A to F, and these can be used to identify new family members. In particular, families A and C are most conserved, with a total of 32 and 41 invariant residues, respectively, while family E has 63 invariant residues if the two more divergent sequences (asaspa and avprca) are excluded.
Residue N155 (in a conserved segment 152-155), which helps to stabilize the oxyanion generated in the tetrahedral transition state (Carter and Wells, 1990), is not fully conserved. The only accepted substitution here is N155D, as is found in the PC2 subgroup of the kexin family. The effect of this substitution on the catalytic efficiency of these proteases has been investigated by protein engineering (Benjannet et al., 1995; Zhou et al., 1995).

Fig. 4. Family tree or dendrogram analysis of the subtilase superfamily, based on sequence alignment of the catalytic domains only (Fig. 2). A: General layout of the relationship between families A to F. B: Detailed dendrograms of the individual families, in which branch lengths are in inverse proportion to the degree of sequence similarity. Not included are members with >90% sequence identity to one of the listed enzymes (see Table 2). Trees were constructed using the neighbor-joining method of Saitou and Nei (1987), as implemented in the programs NEIGHBOR (Felsenstein, 1993) and GROWTREE (Devereux et al., 1984). The distance matrices that were used as input for the programs were calculated using DISTANCES (Devereux et al., 1984), PROTDIST (Felsenstein, 1993), and HOMOLOGIES (Leunissen, unpubl. obs.). Positions containing gaps were ignored, as were the large insertions indicated between brackets in Figure 2. Whenever appropriate, the distances were corrected for multiple substitutions (Jukes and Cantor, 1969; Kimura, 1983). All methods used delivered in principle identical topologies, except for the branch lengths; these may vary, depending upon the method used to calculate the distances between the proteins, and correcting for multiple substitutions.

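The distance calculation summarized in the legend to Figure 4 (gap positions ignored, optional correction for multiple substitutions) can be illustrated with a minimal Python sketch. It is not a reimplementation of DISTANCES, PROTDIST, or HOMOLOGIES; the function names are hypothetical, and the two correction formulas are the standard textbook forms of the Jukes-Cantor (20-state) and Kimura (1983) protein distances, which are assumed here to correspond to the corrections cited.

```python
import math


def p_distance(seq_a, seq_b, gap="-"):
    """Fraction of differing residues, counted only over gap-free columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # columns containing a gap are ignored
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return differing / compared


def percent_identity(seq_a, seq_b):
    """Pairwise identity in percent, as plotted in an identity matrix."""
    return 100.0 * (1.0 - p_distance(seq_a, seq_b))


def jukes_cantor_distance(p):
    """Jukes-Cantor correction for 20 states: d = -(19/20) ln(1 - 20p/19)."""
    if p >= 19.0 / 20.0:
        raise ValueError("p too large for the Jukes-Cantor correction")
    return -(19.0 / 20.0) * math.log(1.0 - (20.0 / 19.0) * p)


def kimura_distance(p):
    """Kimura (1983) empirical correction: d = -ln(1 - p - 0.2 p^2)."""
    arg = 1.0 - p - 0.2 * p * p
    if arg <= 0.0:
        raise ValueError("p too large for the Kimura correction")
    return -math.log(arg)
```

Both corrections leave small distances almost unchanged and stretch large ones, which is consistent with the remark in the legend that topologies agree between methods while branch lengths may differ.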
Homology modeling

The procedure for homology modeling and protein engineering of the catalytic domain of subtilases of unknown 3D structure based on known crystal structures was described in our previous review (Siezen et al., 1991), and can be applied to any of the enzymes listed in Tables 1 and 2.
Modeling should be based on the known crystal structure of the most closely related enzyme, and this will be straightforward for members of the families A-C, because 3D structures are known in each family. For the families D-F, with no known 3D structures, modeling will be less straightforward and can be based on any known structure from families A-C or a combination of these. Problems will arise where large insertions occur, because these are still impossible to model reliably. It would be extremely helpful for modeling purposes to determine the crystal structure of at least one member of each of the D-F families, preferably those with large inserts.
This homology method has since been refined and applied for modeling and engineering of (a) the cell-envelope proteinase llprtp of Lactococcus lactis (Siezen et al., 1993; Bruinenberg et al., 1994a, 1994b); (b) the lantibiotic leader peptidases llnisp of Lactococcus lactis (Van der Meer et al., 1994; Siezen et al., 1995a), and efcyla of Enterococcus faecalis (Booth et al., 1996); (c) the kexin family members furin (hsfur: Creemers et al., 1993; Siezen et al., 1994) and PC2/PC3 (Lipkind et al., 1995); and (d) the heat-stable proteases pfpyro and tsplst of the hyperthermophiles Pyrococcus furiosus and Thermococcus stetteri (W. Voorhorst, A. Warner, W. de Vos, R. Siezen, in prep.). These studies have provided predictions and evidence for inserted and disposable loops, disulfide bridges, Ca2+-ion binding sites, surface salt bridges and networks, aromatic surface clusters, and residues involved in enzyme-substrate interactions. Some examples are discussed below.

Table 3. Incomplete amino acid sequences of subtilases

Organism                             Enzyme                         Acronym  N-term.  Reference
BACTERIA
Gram-positive
Bacillus subtilis A50                Intracell. serine protease     bsia50   1-54     Strongin et al., 1978
Bacillus sp. GX6644                  Subtilisin GX                  bssugx   1-16     Durham, 1993
Bacillus sp. Y                       Protease BYA                   bspbya   1-21     Shimogaki et al., 1991
Bacillus thuringiensis israelensis   Extracellular serine protease  btisra   1-14     Chestukhina et al., 1986
Bacillus thuringiensis finitimus     Extracellular serine protease  btfini   1-15     Chestukhina et al., 1986
Bacillus thuringiensis kurstaki      Extracellular serine protease  btkurs   6-20     Kunitate et al., 1989
Bacillus cereus                      Extracellular serine protease  bcespr   1-15     Chestukhina et al., 1986
Bacillus intermedius 3-19            Alkaline serine protease       biprot   1-15     Balaban et al., 1994
Nocardiopsis dassonvillei (prasina)  Alkaline serine protease       ndapII   1-26     Tsujibo et al., 1990

Streptomyces rutgersensis            Proteinase D                   srespd   1-23     Lavrenova et al., 1984
Thermus Tok3A1                       Caldolysin                     tscald   1-15     Freeman et al., 1993
Vibrio metschnikovii                 Alkaline protease VapK         vmapk    1-36     Kwon et al., 1994
Cochliobolus carbonum                Extracellular protease         ccalp2   1-29     Murphy & Walton, 1996

EUKARYA
Fungi
Agaricus bisporus                    Extracellular serine protease  abexpr   1-19     Burton et al., 1993
Malbranchea sulfurea                 Thermomycolin                  msthmy   1-28     Gaucher & Stevenson, 1976
Ophiostoma piceae                    Extracellular protease         opexpr   1-18     Abraham & Breuil, 1995
Verticillium chlamydosporium         Extracellular protease VCP1    vcexp1   1-20     Segers et al., 1995
Scedosporium apiospermum             Extracellular protease         saalpr   1-13     Larcher et al., 1996

N-term. = N-terminal residues determined. The table also carries a group heading "Gram-negative" and the columns "Other" (other residues determined) and "Family"; their row assignment cannot be recovered here. Entries in source order: Other: 223-243, 223-243, 217-222, 170-193. Family: C, C, A, C, C, C, C, C.

Table 4. Highest conserved residues in subtilases (v = variability)

Residue  Context/function
32       Catalytic triad residue
34       Bend; φ, ψ = 99°, 179°
64       Catalytic triad residue
65       Buried helix, close packing
68       Buried helix, close packing, directly under catalytic triad
69       Buried helix, close packing
70       Buried helix, close packing
83       Helix/turn, close packing
90       Buried β-strand, hydrophobic packing to helix C
125      Bend, directly adjacent to catalytic triad
152      Lines S1 pocket
154      Lines S1 pocket; φ, ψ = 114°, 163°
155      Oxyanion stabilization
189      Turn at surface, side chain turned into pocket
193      Begin turn
201      Bend at end β-strand, hydrophobic ring stacks with H226
219      Bend between e9 and hF; φ, ψ = 147°, 160°
220      OD1 H-bonded to backbone NH-154
221      Catalytic triad residue
223      Buried helix, close packing
225      Buried helix, close packing
229      Buried helix, close packing

The table also carries the columns "v = 1", "v = 2", "v = 3" (surviving entries: G, S, G, N) and "Exception"; their row assignment cannot be recovered here. Exception entries in source order:
N (lslasp), A (smserp), P (smssp1, smssp2)
del (asaspa)
M (lslasp), I (sepepp, ddtagc)
G (nahlys), T (bsbpf), I (sepepp)
T (smstab), A (smssp1, paaf70)
A (lslasp), T (efcyla)
W (bsbpf), M (seepip)
P (lslasp), C (hakx2), T (acfurl, bcpc2)
M (sepepp), del (bssepr)
D (lspc2, bcpc2, cepc2, hspc2)
del (sepepp), S (smserp), L (bspara)
Y (=pew), D (dmpga9), T (vmvapt)
I (seepip, smstab)
N (sepepp, llnisp)
G (lslasp), S (sepepp, ddtagc)
T (bssepr)

Large insertions and deletions

The 190 residues that constitute the scrs, as defined from the known crystal structures (Siezen et al., 1991) and shown in Figure 2, are present in nearly all the subtilases. Some unusual deletions are found, however, as listed in Table 5, and this implies that not all of these core residues are essential for proper folding. Most of these deletions occur in subtilase family D, the lantibiotic leader peptidases, and include large N- and C-terminal deletions. All but one of the internal deletions can be readily accommodated by connecting residues that are spatially adjacent in the 3D structures of subtilisin/thermitase. Particularly interesting in this respect is the natural deletion of the Ca1-ion binding loop, residues 74-82, in the Enterococcus subtilase (efcyla), thereby presumably extending helix C by another four residues (Booth et al., 1996); this is precisely the loop deletion that was engineered into subtilisin to obtain a Ca2+-independent, faster-folding variant (Gallagher et al., 1993, 1995). The only deletion that cannot be accommodated in this way comprises residues 65-66 of helix C, adjacent to the catalytic His; this deletion is not due to a sequencing error (G. Coleman, pers. comm.).
The vrs essentially comprise all the connections between conserved elements of secondary structure, as shown schematically in Figure 1. While the positions of vrs are essentially the same as defined before (Siezen et al., 1991), the length of these vrs is now found to vary considerably more. The largest and most unusual insertions are listed in Table 5. Some of these large inserts are unique for a single enzyme, for example, pfpyro (in vr6), phssal (in vr7), vmvapt (in vr16), and cepc2 (in vr19). Large insertions occur most frequently in vr5, vr13, and vr16. Sequence conservation in large inserts is frequently also apparent, particularly within subtilase family B (inserts in vr4), as shown in Figure 2, and within family F, as shown in the alignments in Figure 5. This is further evidence for a common evolutionary origin of subgroups of enzymes that were already clustered together in the cladogram (Fig. 4). Also note that the inserts in the kexin family E are not large, but they are highly conserved (Fig. 2).

Table 5. Large or unusual deletions and insertions

Unusual deletion
Missing residues  Context                          Enzyme
1-13              N-terminus, hA                   sepepp
65-66             Part hC, adjacent catalytic His  asaspa
74-82             Ca-binding loop + hC extended    efcyla
96-102            Turn, substrate-binding region   smserp
180-189           Turns                            sepepp
257-215           C-terminus, hH                   lslasp

Large insertion
Number      Family  Enzyme
Up to 98    E       Most family members
59          F       spscpa
34          C       scyct5
18          C       scyct5
30-33       F       spscpa, llprtp, ldprtb
28-30       B       dnbpr, dnavp2, dnavp5, xcproa, alapr1
26-31       F       llsp09, atserp, cmcucu, agserp, lep69, paaf70
23          F       smssp1, smssp2
147-213     F       pfpyro, tsplst, dmpga9, hstpp2, cetpp
30          F       pfpyro
42          F       phssal
16          C       anpepC
51          F       smserp
34          F       smssp1, smssp2
18          F       phssal
22          F       slssp
16-18       D       seepip, llnisp
134-169     F       spscpa, llprtp, ldprtb, bsvpr, lep69, paaf70, atserp, agserp, cmcucu, llsp09
13-15       F       ddtagb, ddtagc
21          F       pfpyro
20-22       D       efcyla, seepip, llnisp
149         A       vmvapt
21          E       asaspa
22          F       smserp
20          F       bsspra, bssprb
19          B       sy0535
38          E       cepc2
34          E       asaspa
25          B       bswpra
22-24       F       ddtagb, ddtagc
21          F       slssp

Number = number of inserted residues. The table also carries the columns "Position" and "Properties"; their row assignment cannot be recovered here. Position labels in source order: N-term., vr1, vr4, vr5, vr6, vr7, vr8, vr9, vr11, vr13, vr15, vr16, vr18, vr19. Properties entries in source order: No homology; Highly charged; Highly charged; Weak homology; High homology; Medium homology, conserved S-S bond ?; High homology; Weak homology, see alignment in Fig. 5; Highly charged (50%); Highly charged; High homology; Weak homology; Weak homology in central section (Fig. 5); High homology; Weak homology; S-S bond ?; High homology; S-S bond ?; High homology; S-S bond ?
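As an illustration of how insertions of the kind listed in Table 5 could be located, the following Python sketch scans a pairwise alignment against a reference sequence and reports runs where the reference has gaps. The function name, the gap symbol, and the default length threshold are assumptions made for this example (13 is simply the smallest insert length listed in Table 5); positions are counted in residues of the supplied reference, which equals subtilisin BPN' numbering only if the reference is that sequence starting at residue 1.

```python
def find_insertions(ref_aln, query_aln, min_len=13, gap="-"):
    """Return (reference residues preceding the insert, insert length) pairs."""
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    insertions = []
    ref_pos = 0  # number of reference residues seen so far
    run = 0      # length of the current run of inserted query residues
    for r, q in zip(ref_aln, query_aln):
        if r == gap and q == gap:
            continue  # column empty in both sequences: ignore
        if r == gap:
            run += 1  # query residue with no counterpart in the reference
            continue
        if run >= min_len:
            insertions.append((ref_pos, run))
        run = 0
        ref_pos += 1
    if run >= min_len:  # insertion that extends to the end of the alignment
        insertions.append((ref_pos, run))
    return insertions
```

For example, with a toy alignment and a threshold of 3, `find_insertions("AC----DE", "ACWWWWDE", min_len=3)` reports one insert of four residues after the second reference residue.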

Fig. 5. Sequence alignment of most homologous regions in large inserts in vr5 and vr13. The numbers of additional residues in these large inserts are shown in brackets. Consensus residues are indicated (* = hydrophobic; upper/lower case = totally/highly conserved). Sequences shown, with total insert length: vr5: dmpga9 (168), hstpp2 (181), cetpp (182); vr13: llsp09 (169), agserp (135), cmcucu (140), paaf70 (149), lep69 (147), atserp (145), bsvpr (134), spscpa (134), llprtp (151), ldprtb (150).

Fig. 6. Schematic representation of substrate/inhibitor (bold lines) binding to a subtilisin-like serine proteinase (smooth surfaces). Nomenclature P4-P2' and S4-S2' according to Schechter and Berger (1967). Side chains of the P4-P2' residues are shown as large spheres; positions of the enzyme residues that may interact with these P4-P2' side chains are shown surrounding the binding sites (S1, S2, etc.). Enzyme numbering is that of subtilisin BPN'. Hydrogen bonds between enzyme and substrate/inhibitor are shown as dotted lines, and the scissile bond is shown by a jagged line. Catalytic residues D32, H64, and S221, and oxyanion-hole residue N155 are indicated. Approximate positions of inserts (vr7, vr9, and vr11) are shown by large arrows.

N-terminal extensions can also be quite large, particularly in the kexin family (Fig. 2, Table 5). These extensions are unusual in that they show no apparent sequence homology to one another. They are often highly charged, like the pro-peptides, but their function is unknown.
Substrate specificity and catalysis

Figure 6 shows our schematic representation of the binding region in subtilases, based on 3D binding data of subtilisins (McPhalen and James, 1988; Heinz et al., 1991; Takeuchi et al., 1991a, 1991b) and thermitase (Gros et al., 1989). This binding region can be described as a surface channel or crevice capable of accommodating at least six amino acid residues (P4-P2') of a polypeptide substrate or inhibitor (pseudo-substrate). Both main-chain and side-chain interactions between enzyme and substrate/inhibitor contribute to binding. The P4-P1 backbone is H-bonded to the enzyme backbone β-sheet residues 100-102 (strand eI in Fig. 1) and 125-127 (strand eIII), forming the central strand (eII) of a three-stranded antiparallel β-sheet. The C-terminal or leaving portion P1'-P2' of the substrate appears to be held less tightly as it runs along the enzyme backbone segment 217-219.

In general, the specificity of subtilases appears to be largely determined by interactions of the P4-P1 residue side chains in the enzymes' S4-S1 binding sites, respectively, with S4 and S1 dominating the substrate preference in subtilisin (Gron et al., 1992). These sites have the following general characteristics:
S T : A hydrophobic pocket of variable size depending on the orientation of the conserved aromatic side chain of residue 189.
S1: A distinct, large and elongated cleft, surrounded at the sides and bottom by the backbone segments 125-128 and 152-155, at the bottom end by residue 166 and at the rim by residues 156 and 129.
S2: A less distinct, smaller cleft, bounded at either side by residue 100 and active site residue H64, at the bottom by hydrophobic residue 96 and active site residue D32, at the bottom end by residue 33 and at the rim by residue 62.
S3: Not a distinct site, because the P3 residue points away from the enzyme towards the solvent. However, the most likely interaction is with enzyme residue 101, which is adjacent and points in the same direction. P3 side chains could also interact with nearby residue 100 and the more distant residue 129.
S4: A very distinct pocket, between the segments 101-104 and 126-130, which appears to have two subsites; these subsites can have different characteristics. Site 4a has at the side and bottom the residues 96, 107, and 126, and at the rim 102. Site 4b has at the side and bottom residues 104 and 135, and at the rim residues 128 and 130. These side chains determine the size of the S4 pocket; in subtilisin Y104 is thought to form a flexible lid to the S4 pocket (Takeuchi et al., 1991a).
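The site descriptions above can be captured as a small lookup table, for example to ask which binding sites a given residue position may contribute to. The Python sketch below is only an illustration: the dictionary and function names are hypothetical, positions follow subtilisin BPN' numbering as in the text, and the first site of the list (residue 189) is left out because its label is not legible in this copy.

```python
# Residues lining each site, transcribed from the list above
# (subtilisin BPN' numbering). S4 is split into its two subsites.
SUBSITE_RESIDUES = {
    "S1": set(range(125, 129)) | set(range(152, 156)) | {166, 156, 129},
    "S2": {100, 64, 96, 32, 33, 62},
    "S3": {101, 100, 129},
    "S4a": {96, 107, 126, 102},
    "S4b": {104, 135, 128, 130},
}


def sites_for(position):
    """Return the sites, in S1..S4b order, that list this residue position."""
    return [site for site, residues in SUBSITE_RESIDUES.items()
            if position in residues]
```

A lookup such as `sites_for(96)` shows that one residue can line more than one site (here S2 and S4a), which is worth keeping in mind when a single substitution is meant to change only one substrate preference.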
In subtilisin and thermitase the S1 and S4 binding sites are large and hydrophobic, which explains the broad specificity of both enzymes with a preference for aromatic or large nonpolar P1 and P4 substrate residues (Gron et al., 1992). For further details on the structural basis of substrate specificity in subtilases we refer to the recent review by Perona and Craik (1995).
Variations in the substrate specificity of naturally occurring subtilases should be due to (and could be modified by) modulation of the residues in the substrate-binding region as shown in Figure 6, and in particular those residues whose side chains interact with P1 and P4 substrate residues. Some general predictions of substrate specificity can be made by comparison of the multiple sequence alignment in Figure 2 with the substrate-binding model in Figure 6. Most of the subtilases should have a broad specificity and can be considered as general-purpose proteases, because their binding regions resemble those of subtilisin and thermitase.
The most notable exception occurs when residue 166 at the bottom of the S1 pocket is an Asp, making the protease specific for cleavage after P1 Arg residues. This occurs in family C (ylxpr2), family D (seepip and llnisp), and in all of the kexin family E members; these highly specific proteases are all involved in activation of pro-proteins. A certain preference for cleavage after P1 Lys residues is observed in proteases with a negative charge on residue 156 at the rim of the S1 pocket (Wells et al., 1987; W. Voorhorst, A. Warner, W. de Vos, R. Siezen, in prep.). The kexin family proteases are even more specific because they cleave only after dibasic or multiple basic residues (Barr, 1991; van de Ven et al., 1993). Modeling and engineering studies indicate that a high density of negative charge at the substrate-binding face, and in particular at the S1, S2, and S4 sites, is responsible for this high selectivity (Van de Ven et al., 1990; Creemers et al., 1993; Siezen et al., 1994; Lipkind et al., 1995; Perona & Craik, 1995). These modeling studies predict that the (semi-)conserved acidic residues at positions 33, 61, 97, 104, 107, 129, 130, 131, 161, 166, 191, and 209 in family E subtilases are all in or near the substrate-binding region. Based on this modeling, acidic residues were introduced in the S1 and S2 sites of subtilisin, and this led to a specificity for dibasic residues (Ballinger et al., 1995).
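The two sequence-based rules stated in this paragraph (Asp at position 166, negative charge at position 156) and the list of acidic positions in family E can be written down as a minimal Python sketch. It is a rule-of-thumb illustration, not a validated predictor: the function names and return strings are hypothetical, and the input is assumed to be a mapping from subtilisin BPN' position to one-letter residue code taken from an alignment such as Figure 2.

```python
ACIDIC = {"D", "E"}

# (Semi-)conserved acidic positions reported for family E (BPN' numbering).
FAMILY_E_ACIDIC_POSITIONS = (33, 61, 97, 104, 107, 129, 130, 131,
                             161, 166, 191, 209)


def predict_p1_preference(residues):
    """Apply the two P1 rules from the text to a position -> residue mapping."""
    if residues.get(166) == "D":
        return "Arg"                  # Asp at the bottom of the S1 pocket
    if residues.get(156) in ACIDIC:
        return "Lys (preference)"     # negative charge at the S1 rim
    return "broad"                    # resembles subtilisin/thermitase


def count_acidic(residues, positions=FAMILY_E_ACIDIC_POSITIONS):
    """Count how many of the listed positions carry Asp or Glu."""
    return sum(1 for p in positions if residues.get(p) in ACIDIC)
```

The count returned by `count_acidic` is only a crude proxy for the "high density of negative charge" described above; the modeling studies cited consider the spatial arrangement of these residues, which a position list cannot capture.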
Details of other enzyme-substrate interactions that could be important for substrate binding and selectivity can be obtained by homology modeling, as demonstrated for the family D members efcyla (Booth et al., 1996) and llnisp (Siezen et al., 1995a), the family E members (Siezen et al., 1994), and the family F members llprtp (Siezen et al., 1993), tsplst and pfpyro (W. Voorhorst, A. Warner, W. de Vos, R. Siezen, in prep.). These studies all suggest that electrostatic interactions between enzyme and substrate are more dominant than hydrophobic interactions, and that they are used to generate a narrower specificity for certain substrates. In addition, interactions with P3, P5, and P6 residues can also contribute to substrate binding, particularly if these are charge-charge interactions. These predictions have been verified by protein engineering of residues involved in enzyme-substrate interactions in llprtp (Siezen et al., 1993), llnisp (Van der Meer et al., 1994; Siezen et al., 1995a), and hsfur (Creemers et al., 1993; Siezen et al., 1994).
Calcium coordination sites

Four calcium-ion binding sites are known from crystal structures of subtilisins, thermitase, and proteinase K; these calcium ions are essential for stability and activity. From previous sequence alignments and homology modeling it was predicted that the Ca1 (strong) and Ca3 (weak) sites are most common in members of the subtilase family, whereas the Ca2 (medium-strength) site is less common (Siezen et al., 1991). The weak Ca4 site has only been found in proteinase K (Betzel et al., 1988a, 1988b).
For the new subdivision into six families the following predictions can be made about the Ca1 and Ca2 sites. The Ca1 site requires coordination from side-chain ligands of residues 2 and 41 and from several side chains of residues 76-81 in the Ca2+-ion embracing loop; these ligands are usually carboxyl/carbonyl groups of Asp/Asn, but can also be from Glu/Gln. This Ca1 site is therefore predicted to be present in nearly all members of families A, B, and E, because they appear to contain the required ligands. In contrast, this Ca1 site cannot be present in any member of families C and D due to the lack of loop 76-81, nor is it likely to occur in family F members due to the high variability in sequence in this loop.
The Ca2 site requires coordination from several side-chain and main-chain ligands of the loop 49-58, with side chains of residues 49, 52, and 54 appearing to be essential, and stabilization by the positively charged side chain Arg/Lys of residue 94. Many family B and E members have the elements required for this Ca2 site if the side-chain oxygen ligands of Asp, Asn, Glu, Gln, Ser, and Thr are considered as acceptable. Some members of families A (intracellular proteases) and C (vaproa, alapr2) should also have the Ca2 site. Predictions for the families D and F are too difficult because in general the sequence alignment is rather speculative in this region; however, some likely candidates for the Ca2 site are seepip, llnisp, bsvpr, llprtp, and ldprtb.
The Ca3 and Ca4 sites are weak and characteristically only have one or two side-chain ligands in the known structures. For this reason no predictions are attempted for these sites in other proteases.
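The ligand requirements described for the Ca1 and Ca2 sites can be turned into a simple sequence check, sketched below in Python. This is an assumption-laden illustration rather than the procedure used for the predictions above: the function names are hypothetical, the input is a mapping from subtilisin BPN' position to one-letter residue code (deleted positions are simply absent), and "several" loop ligands is interpreted here as at least two, which is a guess.

```python
CA1_LIGAND_TYPES = {"D", "N", "E", "Q"}            # Asp/Asn, also Glu/Gln
CA2_LIGAND_TYPES = {"D", "N", "E", "Q", "S", "T"}  # side-chain oxygen ligands


def has_ca1_ligands(residues, min_loop_ligands=2):
    """Check residues 2 and 41 plus ligand-type residues in loop 76-81."""
    anchors_ok = all(residues.get(p) in CA1_LIGAND_TYPES for p in (2, 41))
    loop_ligands = sum(1 for p in range(76, 82)
                       if residues.get(p) in CA1_LIGAND_TYPES)
    return anchors_ok and loop_ligands >= min_loop_ligands


def has_ca2_ligands(residues):
    """Check residues 49, 52, 54 and a positively charged residue 94."""
    side_chains_ok = all(residues.get(p) in CA2_LIGAND_TYPES
                         for p in (49, 52, 54))
    return side_chains_ok and residues.get(94) in {"R", "K"}
```

A sequence lacking loop 76-81 altogether fails the Ca1 check automatically, mirroring the argument made above for families C and D. Main-chain ligands, which also contribute to the Ca2 site, are not visible at the sequence level and are ignored here.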
Disulfide bonds

Disulfide bridges can contribute to the overall stability of a protein, and the introduction of new S-S bonds can enhance the thermal stability, as demonstrated in, for example, phage T4 lysozyme (Matsumura et al., 1989). Initial attempts to stabilize subtilisin by introduction of S-S bonds 22-87, 24-87, 26-232, 29-119, 36-210, 41-80, and 148-243 were not successful (Wells & Powers, 1986; Pantoliano et al., 1987; Mitchinson & Wells, 1989; Katz & Kossiakoff, 1990); all of these crosslinks were designed by inspection of the three-dimensional structure of subtilisin. The first successful thermal stabilization of subtilisin was the introduction of the 61-98 S-S bond (Takagi et al., 1990), which occurs naturally in aqualysin. It stands to reason, therefore, that naturally occurring S-S bonds should provide a better choice for stabilization of subtilisins than previously designed disulfides.
Seven naturally occurring disulfide bonds have now been identified in subtilases, i.e., 27-118[-2] and 175-247 in proteinase K (taprok; Betzel et al., 1988a, 1988b), 61-98 and 163-195 in aqualysin (taaqua; Kwon et al., 1988), 53-100 and 171-131[+1] in Dichelobacter basic protease (dnbpr; Lilley et al., 1992), and 47-59[-1] in Bacillus subtilisin S41 (bsta41; Davail et al., 1994). Based on sequence homology (Fig. 2), we predict that these seven S-S bonds also occur in many other subtilases (Table 6). Based on the known three-dimensional structures, together with the sequence alignment in Fig. 2, we also predict that many Cys residues are correctly positioned to form other natural S-S bonds, as listed in Table 6.
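The homology-based prediction described here amounts to asking whether a sequence has Cys at both positions of a disulfide already observed in a crystal structure. The Python sketch below shows that check for four of the known bonds; it is a simplified illustration with hypothetical names, it takes a mapping from subtilisin BPN' position to one-letter residue code, and it deliberately omits the bonds written with bracketed offsets (27-118[-2], 171-131[+1], 47-59[-1]) because their exact alignment positions are not restated in this section.

```python
# Known natural disulfides without bracketed offsets (BPN' numbering),
# with the enzyme in which each was observed.
KNOWN_SS_PAIRS = {
    (61, 98): "aqualysin (taaqua)",
    (163, 195): "aqualysin (taaqua)",
    (175, 247): "proteinase K (taprok)",
    (53, 100): "Dichelobacter basic protease (dnbpr)",
}


def candidate_disulfides(residues):
    """Return the known pairs for which both positions are Cys."""
    return sorted(pair for pair in KNOWN_SS_PAIRS
                  if residues.get(pair[0]) == "C"
                  and residues.get(pair[1]) == "C")
```

Having Cys at both positions is a necessary condition only; as the text notes, the predictions in Table 6 also rest on the known three-dimensional structures, so a hit from this check is a candidate to inspect in a model rather than a confirmed bond.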

Table 6. Putative and known S-S bonds in subtilases (enzymes with a known bond, printed in bold in the original, are marked *)

Family  S-S bond     Context        Enzyme
A       29-114       e1-hD          vmvapt, psaprp
        35-69        Strand-hC      bsispq, tsiap
        47-59[-1]    e2-loop        paalys, bsta39, bsta41*
        49-55        e2-loop        bssepr
        vr16         Within insert  vmvapt
B       53-100       Turn-turn      dnbpr*, xcproa, dnapv2, dnapv5, alapr1
        171-131[+1]  Loop-loop      dnbpr*, xcproa, dnapv2, dnapv5, alapr1
        259-263      Intraloop      xcproa, alapr1
C       27-118[-2]   e1-hD          taprok*, tapror, taprot, bbprl, fusalp, plbspr, macdpa
        175-247      e6-hG          taprok*, tapror, taprot, bbprl, fusalp, plbspr, macdpa
        61-98        Loop-turn      taaqua*, vaproa, alapr2, trt41a
        163-195      Loop-turn      taaqua*, vaproa, alapr2, trt41a, anpepc, spsepr, scprbl, scysp3, ylxpr2
        120-117[+1]  Intraloop      scyct5
        68-224       hC-hF          aoespr
E       80-214       Loop-e9        All except asaspa, avprca
        163-193      Loop-loop      All except asaspa, avprca
        68-224       hC-hF          avprca
        198-254      e7-hG          avprca
        vr1          Within insert  hspac4, hspc6
        vr16         Within insert  asaspa
        vr19         Within insert  cepc2
F       96-102       Intraloop      atserp, cmcucu, lep69, paaf70
        135-167      hE-loop        hstpp2
        151-224      e5-hF          atserp, lep69
        193-197      Intraloop      smserp, phssal, smssp1, smssp2
        214-75[+3]   e9-loop        hskiaa
        vr4          Within insert  llsp09, cmcucu, agserp, atserp, lep69, paaf70
        vr5          Within insert  hstpp2, cetpp, dmpga9
        vr13         Within insert  llsp09, cmcucu, agserp, atserp, lep69, paaf70, ddtagb, ddtagc
        vr19         Within insert  slssp

Fig. 7. Stereo view of known and putative natural disulfide bonds (bold) in subtilase members, superimposed on the subtilisin BPN'
structure (α-carbon atom trace). Side chains of the catalytic residues are shown in ball-and-stick representation.

Figure 7 shows a stereo view of these known and putative natural S-S bonds, but only those that can be superimposed on the
subtilisin BPN' structure. Other putative S-S bonds may occur in
large inserts that have more than one Cys residue (Table 6). Of the
17 natural S-S bonds shown in Figure 7, only 29-114 and 163-193 were included in a theoretical prediction of the 31 most energetically and stereochemically favorable disulfide conformations
in subtilisin (Hazes & Dijkstra, 1988). Two natural S-S bonds
appear to be the maximum for any subtilase, with the possible
exception of atserp (Table 6). None of the remaining Cys residues in
Figure 2 should form S-S bonds, because they are either single
or predicted to be too far removed spatially from another Cys
residue.
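
The spatial criterion mentioned above (a Cys residue is excluded when it is single or too far from any other Cys) can be sketched as a simple distance filter. This is an illustration only, not the authors' modeling procedure: it assumes α-carbon coordinates taken from a homology model, and the 7.0 Å cutoff, the function name, and the toy coordinates are assumptions chosen for the example.

```python
import math
from itertools import combinations


def plausible_cys_pairs(cys_coords, max_dist=7.0):
    """List Cys pairs whose alpha-carbons lie within max_dist (in angstroms).

    cys_coords: dict mapping residue number -> (x, y, z) of the C-alpha atom.
    max_dist is an assumed cutoff, not a value given in the review.
    """
    pairs = []
    for (i, a), (j, b) in combinations(sorted(cys_coords.items()), 2):
        if math.dist(a, b) <= max_dist:
            pairs.append((i, j))
    return pairs


# Toy coordinates (hypothetical): two nearby Cys and one distant Cys.
toy_coords = {
    29: (0.0, 0.0, 0.0),
    114: (5.5, 0.0, 0.0),
    200: (30.0, 0.0, 0.0),
}

print(plausible_cys_pairs(toy_coords))
```

A single Cys yields no pairs at all, and a Cys that is far from every other Cys never appears in the output, which mirrors the two exclusion cases named in the text.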
Disulfides are predicted in each of the families A to F, except for
family D, the lantibiotic leader peptidases. These disulfides appear
to be family specific, and in some cases even sub-family specific.
Extracellular enzymes of gram-positive bacteria rarely contain disulfides. While Cys residues are indeed rare in extracellular subtilases from gram-positive bacteria (Fig. 2), a disulfide has been
found in a subtilisin (bsta41) from a psychrophilic Bacillus, and a
different disulfide is predicted in another extracellular subtilase
(bssepr) from Bacillus (Table 6).

Conclusion
New members of the subtilase superfamily are being identified
continuously, with even more to be expected from the accelerating
genome sequencing projects. Therefore, this summary is bound to
be incomplete when it appears in print. The fact that subtilases
have now been discovered in numerous Archaea, Bacteria, and
Eucarya suggests that they are ubiquitous and have been around
for a long time. The novel information accumulated since our
previous review (Siezen et al., 1991) provides exciting new insights into this unique set of enzymes. Through evolution, many
variants have arisen and at present these can be divided into six
main families A to F (Fig. 4), based on sequence alignment of only
the catalytic domains. This classification is by no means definitive
yet, as a further subdivision of family F may become apparent
when more sequences are available. Subtilases are quite common
in gram-positive bacteria, and Bacillus species stand out in particular, as many extracellular and even intracellular variants have
been identified (Tables 1 and 2), belonging to four different families. Recently, a Bacillus strain was even found to have a cluster
of four different subtilase genes (Schmidt et al., 1995), belonging
to families A and F.
What is most surprising now is the high degree of sequence
variability that is observed within the catalytic domains of subtilases. With the exception of the three catalytic residues Asp-His-Ser, virtually every other residue can be replaced by one or more
different residues. Moreover, it is not even clear what the minimal
structural framework requirement is, because large deletions have
now been found (Table 5). Large insertions in this domain are also
quite common (Table 5), and it is still not clear whether these
additions provide extra stability, binding sites, or other functionalities. In one case, Bruinenberg et al. (1994a) demonstrated that
deletion of such a large insert (151 residues) in Lactococcus lactis
proteinase PrtP did not impair protein folding, but it did affect
proteolytic activity and specificity. While sequence comparisons and
homology modeling can provide a first estimate of overall structure
and functionality, and are useful as a tool for rational design of engineered enzymes (Siezen et al., 1991), the high sequence and structural variabilities observed here clearly make some of the predictions
speculative and emphasize the need for more detailed 3D structural
information to complement the sequence data.
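
As a rough illustration of how strictly conserved positions such as the catalytic Asp-His-Ser can be picked out of a multiple alignment, the following Python sketch lists the columns in which every sequence carries the same residue. The function name and the toy sequences are hypothetical; the alignment in Fig. 2 was not produced or analyzed with this code.

```python
from collections import Counter


def invariant_columns(sequences):
    """Return 0-based columns where all sequences share one non-gap residue.

    sequences: list of aligned sequences of equal length ('-' = gap).
    """
    cols = []
    for idx, column in enumerate(zip(*sequences)):
        counts = Counter(column)
        if len(counts) == 1 and "-" not in counts:
            cols.append(idx)
    return cols


# Toy alignment (hypothetical): D, H and S are invariant, the rest vary.
toy = ["ADGHS", "VDAHS", "LD-HS", "ADGHS"]

print(invariant_columns(toy))
```

With high sequence variability, the list of invariant columns shrinks as more sequences are added, which is the effect described in the paragraph above.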
High sequence variability is also found in other protease families, such as in the trypsin family of serine proteases (Rypniewski
et al., 1994) and the papain family of cysteine proteases (Berti &
Storer, 1995), although these both tend to have more conserved
disulfides. Subtilases do not rely on highly conserved disulfides for
stabilization, and in fact, most subtilases do not have any disulfides. When these enzymes do have disulfides there is presumably
a maximum of two bonds, which can occur in many different
positions (Table 6).
Known members of the subtilase superfamily are all (putative)
endoproteases or tripeptidylpeptidases. In most bacteria, archaea,
and lower eukaryotes they are extracellular, rather unspecific enzymes required either for defense or for growth on proteinaceous
substrates. In certain cases, and particularly in higher eukaryotes,
the subtilases have developed into highly specialized enzymes of
biosynthetic pathways where they are involved in processing and
maturation of pro-proteins; examples are all family D and E members. As yet, no other completely different function appears to have
arisen for this protein through evolution. On the other hand, given
the high sequence variability allowed for proteases, it is questionable whether such a protein would be recognized as a subtilase
superfamily member in database screening if it has also lost one or
more of the three conserved catalytic residues. Nevertheless, the
search is still on, and the authors would appreciate any useful
comments, updates or advice.

Acknowledgments
We are greatly indebted to our many colleagues for communicating their
sequence data prior to publication. We thank Drs. S. Visser and O. Kuipers
for critically reading this manuscript. Use of the services and facilities of
the Dutch National NWO/SURF Expertise Center CAOS/CAMM is gratefully acknowledged.

References
Abraham LD, Breuil C. 1995. Factors affecting autolysis of a subtilisin-like
serine proteinase secreted by Ophiostoma piceae and identification of the
cleavage site. Biochim Biophys Acta 1245:76-84.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local
alignment search tool. J Mol Biol 215:403-410.
Balaban NP, Sharipova MR, Itskovich EL, Leshchinskaya IB, Rudenskaya GN.
1994. Secreted serine protease from the spore-forming bacterium Bacillus
intermedius 3-19. Biochemistry (Moscow) 59:1033-1038.
Ballinger MD, Tom J, Wells JA. 1995. Designing subtilisin BPN' to cleave
substrates containing dibasic residues. Biochemistry 34:13312-13319.
Barr PJ. 1991. Mammalian subtilisins: The long-sought dibasic processing endoproteases. Cell 66:1-3.
Barrett AJ, Rawlings ND. 1995. Families and clans of serine peptidases. Arch
Biochem Biophys 318:247-250.
Benjannet S, Lusson L, Hamelin J, Savaria D, Chrétien M, Seidah NG. 1995.
Structure-function studies on the biosynthesis and bioactivity of the precursor convertase PC2 and the formation of the PC2/7B2 complex. FEBS
Lett 362:151-155.
Berti PJ, Storer AC. 1995. Alignment/phylogeny of the papain superfamily of
cysteine proteases. J Mol Biol 246:273-283.
Betzel C, Belleman M, Pal GP, Bajorath J, Saenger W, Wilson KS. 1988a. X-ray
and model-building studies on the specificity of the active site of proteinase
K. Proteins Struct Funct Genet 4:157-164.
Betzel C, Pal GP, Saenger W. 1988b. Three-dimensional structure of proteinase
K at 0.15 nm resolution. Eur J Biochem 178:155-171.
Booth MC, Bogie ChP, Sahl HG, Siezen RJ, Hatter KL, Gilmore MS. 1996.
Structural analysis and proteolytic activation of Enterococcus faecalis cytolysin, a novel lantibiotic. Mol Microbiol 21:1175-1184.
Bruinenberg PC, Doesburg P, Alting AC, Exterkate FA, Vos WM de, Siezen RJ.
1994a. Evidence for a large dispensable segment in the subtilisin-like catalytic domain of the Lactococcus lactis cell-envelope proteinase. Protein
Eng 7:991-996.
Bruinenberg PC, Vos WM de, Siezen RJ. 1994b. Prevention of C-terminal
autoprocessing of Lactococcus lactis SK11 cell-envelope proteinase by engineering of an essential surface loop. Biochem J 302:957-963.
Burton KS, Wood DA, Thurston CF, Barker PJ. 1993. Purification and characterization of a serine proteinase from senescent sporophores of the commercial mushroom Agaricus bisporus. J Gen Microbiol 139:1379-1386.

Carter P, Wells JA. 1990. Functional interaction among catalytic residues in
subtilisin BPN'. Proteins Struct Funct Genet 7:335-342.
Chestukhina GG, Zagnitko OP, Revina LP, Klepikova FS, Stepanov VM. 1986.
Extracellular serine proteinases from subspecies of Bacillus thuringiensis
evolve much more slowly than the corresponding δ-endotoxins. Biochemistry (Moscow) 51:1472-1479.
Creemers JWM, Siezen RJ, Roebroek AJM, Ayoubi TAY, Heylebroeck D, Van
de Ven WJM. 1993. Modulation of furin-mediated proprotein processing
activity by site-directed mutagenesis. J Biol Chem 268:21826-21834.
Davail S, Feller G, Narinx E, Gerday C. 1994. Cold adaptation of proteins.
J Biol Chem 269:17448-17453.
Devereux J, Haeberli P, Smithies O. 1984. A comprehensive set of sequence
analysis programs for the VAX. Nucleic Acids Res 12:387-395.
Durham DR. 1993. The elastolytic properties of subtilisin GX from alkalophilic
Bacillus sp. strain 6644 provides a means of differentiation from other
subtilisins. Biochem Biophys Res Commun 194:1365-1370.
Felsenstein J. 1993. PHYLIP (Phylogeny Inference Package) version 3.5c. Distributed by the author. Seattle, Department of Genetics, University of Washington.
Freeman SA, Peek K, Prescott M, Daniel R. 1993. Characterization of a chelator-resistant proteinase from Thermus strain Rt4A2. Biochem J 295:463-469.
Gallagher T, Bryan P, Gilliland GL. 1993. Calcium-independent subtilisin by
design. Proteins 16:205-213.
Gallagher T, Gilliland G, Wang L, Bryan P. 1995. The prosegment-subtilisin BPN'
complex: Crystal structure of a specific foldase. Structure 3:907-914.
Gaucher GM, Stevenson KJ. 1976. Thermomycolin. Methods Enzymol 45:415-433.
Greer J. 1990. Comparative modeling methods: Application to the family of
mammalian serine proteases. Proteins Struct Funct Genet 7:317-334.
Grøn H, Meldal M, Breddam K. 1992. Extensive comparison of the substrate
preferences of two subtilisins as determined with peptide substrates which
are based on the principle of intramolecular quenching. Biochemistry 31:6011-6018.
Gros P, Betzel Ch, Dauter Z, Wilson KS, Hol WGJ. 1989. Molecular dynamics
refinement of a thermitase-eglin-c complex at 1.98 Å resolution and comparison of two crystal forms that differ in calcium content. J Mol Biol
210:347-361.
Hazes B, Dijkstra BW. 1988. Model building of disulfide bonds in proteins with
known three-dimensional structure. Protein Eng 2:119-125.
Heinz DW, Priestle JP, Rahuel J, Wilson KS, Grütter MG. 1991. Refined crystal
structures of subtilisin Novo in complex with wild-type and two mutant
eglins: Comparison with other serine proteinase inhibitor complexes. J Mol
Biol 217:353-371.
Heringa J, Argos P, Egmond MR, Vlieg J de. 1995. Increasing thermal stability
of subtilisin from mutations suggested by strongly interacting side-chain
clusters. Protein Eng 8:21-30.
Jukes TH, Cantor CR. 1969. Evolution of protein molecules. In: Munro HN, ed.
Mammalian protein metabolism, vol. III. New York: Academic Press. pp 21-132.
Kabsch W, Sander C. 1983. Dictionary of protein secondary structure: Pattern
recognition of hydrogen-bonded and geometrical features. Biopolymers
22:2577-2637.
Katz B, Kossiakoff AA. 1990. Crystal structures of subtilisin BPN' variants
containing disulfide bonds and cavities: Concerted structural rearrangements induced by mutagenesis. Proteins Struct Funct Genet 7:343-357.
Kimura M. 1983. The neutral theory of molecular evolution. Cambridge, UK:
Cambridge University Press.
Kraulis PJ. 1991. MOLSCRIPT: A program to produce both detailed and schematic plots of protein structure. J Appl Crystal 24:946-950.
Kunitate A, Okamoto M, Ohmori I. 1989. Purification and characterization of a
thermostable serine protease from Bacillus thuringiensis. Agric Biol Chem
53:3251-3256.
Kwon ST, Terada I, Matsuzawa H, Ohta T. 1988. Nucleotide sequence of the
gene for aqualysin I (a thermophilic alkaline serine protease) of Thermus
aquaticus YT-1 and characteristics of the deduced primary structure of the
enzyme. Eur J Biochem 173:491-497.
Kwon YT, Kim JO, Moon SY, Lee HH, Rho HM. 1994. Extracellular alkaline
proteases from alkalophilic Vibrio metschnikovii strain RH530. Biotech Lett
16:413-418.
Larcher G, Cimon B, Symoens F, Tronchin G, Charbasse D, Bouchara J-P. 1996.
A 33 kDa serine proteinase from Scedosporium apiospermum. Biochem J
315:119-126.
Lavrenova GI, Gulnik SV, Kalugar SV, Borovikova VP, Revina LP, Stepanov
VM. 1984. Extracellular acid serine proteinase D of Streptomyces rutgersensis. Biochemistry (Moscow) 49:447-454.
Lilley G, Stewart DJ, Kortt AA. 1992. Amino acid and DNA sequences of an
extracellular basic protease of Dichelobacter nodosus show that it is a
member of the subtilisin family of proteases. Eur J Biochem 210:13-21.

Lipkind G, Gong Q, Steiner DF. 1995. Molecular modeling of the substrate
specificity of prohormone convertases SPC2 and SPC3. J Biol Chem
270:13277-13284.
Lo RYC, Strathdee CA, Shewen PE, Cooney BJ. 1991. Molecular studies of
Ssa1, a serotype-specific antigen of Pasteurella haemolytica A1. Infect Immun 59:3398-3406.
Matsumura M, Signor G, Matthews BW. 1989. Substantial increase of protein
stability by multiple disulphide bonds. Nature 342:291-293.
McPhalen CA, James MNG. 1988. Structural comparison of two serine proteinase-protein inhibitor complexes: Eglin-C-subtilisin Carlsberg and CI-2-subtilisin Novo. Biochemistry 27:6582-6598.
Meyer C, Bierbaum G, Heidrich C, Reis M, Süling J, Iglesias-Wind MI, Kempter
C, Molitor E, Sahl HG. 1995. Nucleotide sequence analysis of the lantibiotic
gene cluster and functional analysis of PepP and PepC. Eur J Biochem
232:478-489.
Mitchinson C, Wells JA. 1989. Protein engineering of disulfide bonds in subtilisin BPN'. Biochemistry 28:4807-4815.
Murphy JM, Walton JD. 1996. Three extracellular proteases from Cochliobolus
carbonum: Cloning and targeted disruption of ALP1. Mol Plant-Microbe
Interact 9:290-297.
Pantoliano MW, Ladner RC, Bryan PN, Rollence ML, Wood JF, Poulos TL.
1987. Protein engineering of subtilisin BPN': Enhanced stabilization through
the introduction of two cysteines to form a disulfide bond. Biochemistry
26:2077-2082.
Pearson WR, Lipman DJ. 1988. Improved tools for biological sequence comparison. Proc Natl Acad Sci USA 85:2444-2448.
Perona JJ, Craik CS. 1995. Structural basis of substrate specificity in the serine
proteases. Protein Sci 4:337-360.
Rawlings ND, Barrett AJ. 1994. Families of serine peptidases. Methods Enzymol
244:19-61.
Rypniewski WR, Perrakis A, Vorgias CE, Wilson KS. 1994. Evolutionary divergence and conservation of trypsin. Protein Eng 7:57-64.
Sahl HG, Jack RW, Bierbaum G. 1995. Biosynthesis and biological activities of
lantibiotics with unique post-translational modifications. Eur J Biochem
230:827-853.
Saitou N, Nei M. 1987. The neighbor-joining method: A new method for reconstructing phylogenetic trees. Mol Biol Evol 4:406-425.
Schechter I, Berger A. 1967. On the size of the active site in proteases. I. Papain.
Biochem Biophys Res Commun 27:157-162.
Schmidt BF, Woodhouse L, Adams RM, Ward T, Mainzer SE, Lad PJ. 1995.
Alkalophilic Bacillus sp. strain LG12 has a series of serine protease genes.
Appl Environ Microbiol 61:4490-4493.
Segers R, Butt TM, Keen JN, Kerry BR, Peberdy JF. 1995. The subtilisins of the
invertebrate mycopathogens Verticillium chlamydosporium and Metarhizium anisopliae are serologically and functionally related. FEMS Microbiol
Lett 126:227-232.
Shimogaki H, Takeuchi K, Nishino T, Ohdera M, Kudo T, Ohba K, Iwama M,
Irie M. 1991. Purification and properties of a novel surface-active agent- and alkaline-resistant protease from Bacillus sp. Y. Agric Biol Chem 55:2251-2258.
Siezen RJ, Vos WM de, Leunissen JAM, Dijkstra BW. 1991. Homology modelling and protein engineering strategy of subtilases, the family of subtilisin-like serine proteases. Protein Eng 4:719-737.
Siezen RJ, Bruinenberg PC, Vos P, van Alen-Boerrigter IJ, Nijhuis M, Alting
AC, Exterkate FA, Vos WM de. 1993. Engineering of the substrate binding
region of the subtilisin-like, cell-envelope proteinase of Lactococcus lactis.
Protein Eng 6:927-937.
Siezen RJ, Creemers JWM, van de Ven WJM. 1994. Homology modelling of the
catalytic domain of human furin. A model for the eukaryotic subtilisin-like
proprotein convertases. Eur J Biochem 222:255-266.
Siezen RJ, Rollema HS, Kuipers OP, Vos WM de. 1995a. Homology modelling
of the Lactococcus lactis leader peptidase NisP and its interaction with the
precursor of the lantibiotic nisin. Protein Eng 8:117-125.
Siezen RJ, Leunissen JAM, Shinde U. 1995b. Homology analysis of the propeptides of subtilisin-like serine proteases (subtilases). In: Shinde U, ed.
Intramolecular chaperones and folding. Austin: R.G. Landes Company. pp
231-253.
Strongin AYa, Izotova LS, Abramov ZT, Gorodetsky DI, Ermakova LM, Baratova LA, Belyanova LP, Stepanov VM. 1978. Intracellular serine protease of
Bacillus subtilis: Sequence homology with extracellular subtilisins. J Bacteriol 133:1401-1411.
Takagi H, Takahashi T, Momose H, Inouye M, Maeda Y, Matsuzawa H, Ohta T.
1990. Enhancement of the thermostability of subtilisin E by introduction of
a disulfide bond engineered on the basis of structural comparison with a
thermophilic serine protease. J Biol Chem 265:6874-6878.
Takeuchi Y, Noguchi S, Satow Y, Kojima S, Kumagai I, Miura K, Nakamura
KT, Mitsui Y. 1991a. Molecular recognition at the active site of subtilisin
BPN': Crystallographic studies using genetically engineered proteinaceous
inhibitor SSI (Streptomyces subtilisin inhibitor). Protein Eng 4:501-508.
Takeuchi Y, Satow Y, Nakamura KT, Mitsui Y. 1991b. Refined crystal structure
of the complex of subtilisin BPN' and Streptomyces subtilisin inhibitor at
1.8 Å resolution. J Mol Biol 221:309-325.
Tsujibo H, Miyamoto K, Hasegawa T, Inamori Y. 1990. Amino acid compositions and partial sequences of two types of alkaline serine proteases from
Nocardiopsis dassonvillei subsp. prasina OPC-210. Agric Biol Chem 54:2177-2179.
van de Ven WJM, Voorberg J, Fontijn R, Pannekoek H, Ouweland AMW van
den, Duijnhoven HLP van, Roebroek AJM, Siezen RJ. 1990. Furin is a
subtilisin-like proprotein processing enzyme in higher eukaryotes. Mol Biol
Rep 14:265-275.
van de Ven WJM, Roebroek AJM, Duijnhoven HLP van. 1993. Structure and
function of eukaryotic proprotein processing enzymes of the subtilisin family of serine proteases. Crit Rev Oncogenesis 4:115-136.
van der Meer JR, Rollema HS, Siezen RJ, Kuipers OP, Vos WM de. 1994.
Influence of amino acid substitutions in the nisin leader peptide on biosynthesis and secretion of nisin by Lactococcus lactis. J Biol Chem 269:3555-3562.
Wells JA, Powers DB. 1986. In vivo formation and stability of engineered
disulfide bonds in subtilisin. J Biol Chem 261:6564-6570.
Wells JA, Powers DB, Bott RR, Graycar TP, Estell DA. 1987. Designing substrate specificity by protein engineering of electrostatic interactions. Proc
Natl Acad Sci USA 84:1219-1223.
Zhou A, Paquet L, Mains RE. 1995. Structural elements that direct specific
processing of different mammalian subtilisin-like prohormone convertases.
J Biol Chem 270:21509-21516.
