REVIEW
Subtilases:
The superfamily of subtilisin-like serine proteases
ROLAND J. SIEZEN
AND
Department of Biophysical Chemistry, NIZO, P.O. Box 20, 6710BA Ede, The Netherlands
CAOS/CAMM Center, University of Nijmegen, Toernooiveld, 6525ED Nijmegen, The Netherlands
(RECEIVED August 22, 1996; ACCEPTED November 5, 1996)
Abstract
Subtilases are members of the clan (or superfamily) of subtilisin-like serine proteases. Over 200 subtilases are presently
known, more than 170 of them with their complete amino acid sequence. In this update of our previous overview
(Siezen RJ, de Vos WM, Leunissen JAM, Dijkstra BW, 1991, Protein Eng 4:719-731), details of more than 100 new
subtilases discovered in the past five years are summarized, and amino acid sequences of their catalytic domains are
compared in a multiple sequence alignment. Based on sequence homology, a subdivision into six families is proposed.
Highly conserved residues of the catalytic domain are identified, as are large or unusual deletions and insertions.
Predictions have been updated for Ca2+-binding sites, disulfide bonds, and substrate specificity, based on both sequence
alignment and three-dimensional homology modeling.
Keywords: homology modeling; sequence alignment; serine protease; subtilase; subtilisin family
Serine endo- and exo-peptidases are of extremely widespread occurrence and diverse function. Many distinct families of serine proteases exist; they have been grouped into six clans (Rawlings and Barrett, 1994; Barrett and Rawlings, 1995), of which the two largest are the (chymo)trypsin-like and subtilisin-like clans. These two clans are distinguished by a highly similar arrangement of catalytic His, Asp, and Ser residues in radically different β/β (chymotrypsin) and α/β (subtilisin) protein scaffolds.
In 1991, we presented a review of over 40 members of the subtilisin-like serine proteases, termed subtilases, which occur in Archaea, Bacteria, fungi, yeasts, and higher eukaryotes (Siezen et al., 1991). The mature enzymes were found to contain up to 1775 residues, with N-terminal catalytic domains ranging from 268 to 511 residues, and signal and/or activation-peptides ranging from 27 to 280 residues. Several members contain C-terminal extensions, relative to the subtilisins, which display additional properties such as sequence repeats, Cys-rich domains, or transmembrane segments. From four known crystal structures and a multiple alignment of 40 known amino acid sequences, a core structure was predicted for the catalytic domain of all subtilases, together with the variations that are allowed in the main-chain length as a result of insertions and deletions (Fig. 1). Nineteen of these core residues were found to be highly conserved, 10 of which are glycines. Predictions were also made for subtilases of unknown three-dimensional structure concerning essential conserved residues, allowable substitutions, disulfide bonds, Ca2+-binding sites, substrate-binding site residues, ionic and aromatic interactions, and surface loops. Based on these predictions, strategies for homology modeling and protein engineering were developed and implemented, aimed at modulating either stability, catalytic activity, or substrate specificity (Siezen et al., 1991, 1993, 1994, 1995a).
Since 1991, more than 100 new subtilases have been discovered, and these are now included in this updated review. In addition to many new enzymes from micro-organisms, numerous members of the subtilase superfamily have now also been identified in various eukaryotes such as slime molds, plants, insects, nematodes, molluscs, amphibia, fish, and mammals, and even in a catfish virus.
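The statement that a subset of core residues is highly conserved across the aligned sequences can be illustrated with a small column-wise count. The following is a minimal sketch, not the procedure used in the review: the function name `conserved_columns`, the `threshold` parameter, and the toy sequences are all assumptions made for illustration, and the sequences are invented rather than taken from Fig. 2.

```python
from collections import Counter


def conserved_columns(alignment, threshold=0.9, gap="-"):
    """Return (column, residue, fraction) for well-conserved columns.

    `alignment` is a list of equal-length aligned sequences. A column is
    reported when its most common non-gap residue occurs in at least
    `threshold` of all sequences (gaps count against conservation).
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    conserved = []
    for col in range(length):
        # Count residues in this column, ignoring gap characters.
        counts = Counter(seq[col] for seq in alignment if seq[col] != gap)
        if not counts:
            continue  # column consists of gaps only
        residue, count = counts.most_common(1)[0]
        fraction = count / len(alignment)
        if fraction >= threshold:
            conserved.append((col, residue, fraction))
    return conserved


# Invented toy alignment (hypothetical, for illustration only).
toy = [
    "VAVLDTGID",
    "VAVVDTGIL",
    "IAVLDSGVS",
    "VTVADTGC-",
]

for col, residue, fraction in conserved_columns(toy, threshold=1.0):
    print(col, residue, fraction)
# prints:
# 2 V 1.0
# 4 D 1.0
# 6 G 1.0
```

Lowering `threshold` reports columns that tolerate a few substitutions, which is closer to the notion of "allowable substitutions" than strict identity.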
Structure-based alignment
The coordinates of subtilisin BPN', subtilisin Carlsberg, thermitase, and proteinase K were used previously (Siezen et al., 1991)
to determine the core of structurally conserved regions (SCRs;
Greer, 1990) and the common secondary structure elements, as
analyzed with the DSSP program (Kabsch and Sander, 1983). This
core of about 190 residues contains virtually all of the common
α-helix and β-strand elements, including the active site residues
D32, H64, and S221 (Siezen et al., 1991). Slight adjustments to
these core regions have now been incorporated (core ABC in
Fig. 2) based on a recent spatial superpositioning of seven structures that also included mesentericopeptidase, Savinase, and Esperase (Heringa et al., 1995); topologically equivalent residues
were defined as those that have Cα-atom distances of less than
2.0 Å. The variable regions (or VRs) nearly always correspond to
[Fig. 2. Multiple sequence alignment of the catalytic domains of subtilases, one row per sequence code (basbpn, bssdy, blscar, and others), with additional rows marking the core regions (core ABC) and the DSSP secondary structure, and column numbering from 10 to 200. The aligned residues are not recoverable from the extracted text. Fig. 2. Continues.]
EN~HY~LY------------------------~-H~TEEF~KGTSASAPLAAGI~ALTLEANP~~~~~~~~~~~~~~~LLT~RDVQALIVHTAQITSPVDE~--------------W~RNCRCFHF~KFGFCR
LRSIVTTDWDLQKG----~~--~~~~~~~~~~~~~~T~CTECH~T~TSAAAPLAA~MIALMLQV~P~~~~~~~~~~~~~~-CLTWRDVQHIIVFTATRYEDRRAE-~~~~~~-------WV~
--YIYGTDINAIDDKSRR---~~~-~~~~~~~~~~~PRCQNQH~GGTSARAPLAAGVFALALSVRP--~~-~~~---~---DLTWRDMQYLALYSAVEINSNDDG~~~~~~~~~~----~QDTASGQRFHHQF
--YIITTDLD---~-~~~~~~~~~~~~~~~~~~---EKCSKI1(-GCTslU\RPLAAGIYTLVLERNP---------------NLTWRDVQYLSILSSEEINPHDGK~~~~~~~~~~~~~-WQDT~GKRYSH
--Y~HSSDIN~~~~~~~~~~~~~~~-~~--~~~---GRCSNSH-GGTS~APLAAG~YTL~LEAN~---------------NL~WROYQYLSILSAVGLEIWADGD~~~~~~~~~
~~SIL~PE~~------------------------GTCTRSH-GGTSAAAPLASA~YALALSIRP-------------~~DLSWRDIQH~~YSASPFDSPSQNAE--~-~--~~~~~UQKTPAGFQFSHHFGFGKLDASKFVEV
~~PNE?VUYD~-------------------------GKCGFIP-SSSSARPPILG~LLALIRAHP--------------~TLTL~IQRIL~RAA~~V~T~~GRGW~~~~~~~~~~WLMlV~R~~RNFGFGEVS
T P G I W T R D R T G V - V G Y N S G N L G D Q A - - - - - - - - - G N Y R I ~ ~ - ~ ~ T S ~ A C P ~ - ~ " ~ ~ L I L S ~ N ~ - - - - - - - - ~ - - - - - - ~ ~ ~ ~ D ~ " ~ D I I K R ~ C D R I D P V G G - ~ ~ ~ ~ ~ ~ ~881
~~~~~~~---~NAEGRSPF
APA~VT~LPGCDUGYMlVDDPSTNRLHMIPQLDlSCDYNG~~~--------------~~DLSYRDLRDLLI~NITRLDAN~PVQINYI9)VTGLECWERNAAGLWYSPSYGFGLVDVNKTQPCSIIl91
~
--LILGTLP---------------------------GGK~GYM-AGTSMASPHVAGVAALlKS~P---------------HASPAMVKALLYA~ADATACTKPYDlDGDGKVDAV------~E~PK~~CFYG~GMADALDAVTW
- - D I Y S T Y P - - - - - - - - - - - - - - - - - - - - - - - - - G C G Q C T Y P - - - - - - - - - - - - ~ ~ ~ D ~ T P A Q I ~ T R I E ~ T A E R S V N G ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ - ~ - - - 7-0-) - - - - H D D F V G W G V V D
~~DIYSNCRLES~GCAVM(EAYNKGELSL~~-~~NPGYGNK-SGTSMAAPHVTGVAAVLMQRFP~~~~~~~~~~~~~~~Y~SADQISAVIK~ATDLGVA--------"GIDNLFA
-~RVYSSIIEGTSVENL-------------------TTGYAKY-SGTSMAAPHVAGSVAVLMERFP--~~----~------YLNGAQVAEVLKTTA~MGAP~~~~----------~~~~~~~~~~~GIDALYGWGMINLGKAI
--I(IYSNRNGSDP~-----~-~-~~-~------~SDYGNK-NGTSMATPHVTGAVA~LLQRFP---------------~~SSAQIADVLKTTA~MGAP~~~~~----------~---~~~~~~CIDALYGWGIIINLGI(
--LIGVADEHKKP-----------------------QYGLTKE-~TSFSAPAITASLAVLKE~~D---------------~~TATQIRDTLLTTA~LGEK------------~-~~~~~~~~~~~G~
--OIYSARYFTPLSALSAQILEYISPRH-------LPYYTTF-SGTSMAAPHVAGIlALMLE~~-----~---------~~~~LE~KEILEGTAlPMEGY-----------~~~~~~~~~~~~~~~AlWETCAGYVDA
~-DIYS~RVLAPLSALMEIA~Ll~PQH-~-~~~~LPYYT~~SGTSMATP~AGlVALMLEADP---------------T~~PDQVKEI~QHTA~PGY~~~~~-------------~~~~~~~~EAWEVGAC
~~NIRSSVP~~~~~~~~-~~~~~~----------~~GQTYEDG~DGTSMAGP~VSAVAALLKQ~A---------~-----SLSVDEMEDILTSTAEPLTDST~~~----------~~~-~~-~~PDSPMIGY
~~NIVSTIP~PDH~~~~---~------------PYCYCSI(QCTSMASPH~AGAVAVIKQAKP~-~~~-~~~------KWSVEQ1KAA~M~AVTLKDSD~~~~~~--------~~~~~~~GEVYPH
~~DILSSVA-~~--~~---~----------------MIKYAKL~SGTSMSAPLVAGIMGLLQKQYE~~~~~~~TQYPDMTPSERLDLAKK~LMSSATALYDED~~~~~~~~----------~~E~YFSPRQ~A
~ ~ N I W S T Q N ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ - - ~ ~ ~ M I C Y ~ ~ S C T S M A S P ~ l A ~ ~ Q ~ ~ ~ K Q A L M I K M I ~ ~ Y A ~ ~ ~ Q ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ? V ~ ~ ~ ~
- - ~ l Y S L I \ ~ - - - - - - - - - ~ - ~ ~ - ~ ~ ~ ~ ~ ~ ~ ~ - ~ ~ ~ - D N K Y Q Q M ~ S G T S M A S P F ~ A ~ S E A L ~ L Q G ~ ~ - - - - - - - ~ Q ~ ~ N ~ ~ ~ ~ ~ ~ ~ Q F ~ ~ ~ A ~ ~ S H P ~
--SIWRAWSSNSTE----------------------GENFAL~-S~TSMATP~AG~A~~~KQ~HP---------------NWSP~lASAlM~AQ~D~~~LL-~~~~~~~~~AQQAT~PSTATPFDYG
~~LYLASWIPNEATAQICRiYYL-------------~~H~~~-~~TSMA~PHA~GV~ALL~AHP~~~-~~~~~~~~~~~EWSPARl~~~~~~~N~~~NTLNPl~~~~~~-~~-~~NILAAWPTSVDDN~------------------KSTFNII-SGTSMSCPHLSGVRALL~S~P~~~~~~~~~--~~~-D~~P~~KS~M~RDTLNLANSPI--------------LDERLLPAD~YAICAGH~
- - EILAAWPSVAPVGGIR - - - - - - - - - - - - - - - - -N T L F N I I - S G T S M S C P H I T G I A T ~ K T Y ~-P - - - - - - ~-~-T W S.P.A.
AlKSALM~ASPMN~---~~~~~~~~~~---------FNPQAEFAYCSGHVNPLKAVR?Gll441
- ~ N I I A A W N P P N Q S D E D T W S E H T - - - - - - - - - - - - P S T F M L L ~ ~ - ~- - - - T
~N~
- S D- ~ P- G T P F D F G A G W N P I C R L P P C o
~~N~LAAUTGARGPTGLASDSR--------------RVEFNII~SGTSMSCPHVSGLAALLKSVHP---------------~~~P~~RSALM~AYKTYKDGK~~-----~~~~~~--LD~ATGKPSTPFDHGAGHVSPTTATNPGIl
- - GVRGSGV~ - ~ - - - KGGCRAL
~ ~. ~ G~T ~ ~~A ~ ~~~ A ~~A ~ T~L L ~~~ ~ Q- . .~
. . .-. . ~
. . .~
. . ~~
E L ~ .N ~ ~
A ~ ~~
K Q A~
L I A~ ~ ~~
~ ~ ~-~ . . . ~ ~ ~ ~ ~ ~ - - - - - - - - - - --Y~?.~~~~N~ENSToQCGDGSLPN-----------RN~~~~~-~~TS~ATPLATAATTlLRQYLVDGYFPTGESVEENKL~P~~~~~~~L~lMIAQLLNGTYFWSASS--~-~~TNPSNA~FEQlNCANL~QGW
--YITS~SNG~~QCGDGSLPN-----------~ALLAl-SGTSMATSFAAAA~lLRQYLVDGYYPTGSlVES~LQPTGSLLKALMlMIAQLLNGTFQLlTSSSl~~~-TYPSNQVFENFAGASLV~WGAI~SNWLHVllO~2l
~ ~ A I A S V P Q~
F T ~ ~ ~ ~ . M S~
K S Q L~
M.NG~
T ~ M - ~~
~ A = A~
" * ~ ~~
I S = L K~
...~~
~ ~ .~ ~
~ ~ .
~ ~~
~ N. I .E .
~ ~~
~~
~. .
S.
~
I .
K. .
R .~ .~. ~. V. .
T ~
A ~T. ~~
K ~L~
A G ~~
~H V~
G L~
L ~~
~ = ~ ~ ~ H L 1
-2 7 ~ 1
~~AIASVPNW
~
T
~
~ ~
~ ~ ~ Q ~ H .~
N ~ ~ ~ ~ ~ ~
S. ~. N. A. C. ~~
~~
I~~
DA
~ -~
L~ V~R~
RL A SL~
~
E~~ A~
L V K~ A
-D N~
] ~
~ . . . . . .~
. . . . .E~
V F A~
.
QG~
~
G l ~~
QV~
DKA
~
~
Y D Y~
L 1 73 ~1 - ~
--~
..AFAGYPQYC.........................
R Q ~ M - ~ . N ~ T S ~ S ~ ~ N - G - A C M L ~ G L.
K..........Q-K ~ T P Y ? V R M A L E ~ A Y M L P " I. . . . ~ ~ ~ ~ ~
~ ~ ~~
~ E S~
F SQ G~ U-l K lA TA Y-E K Li ~-l 3-l
.FEUAS~TIDCRGY~~~..............~.~~~AQpDVF-~~T~~ATPyTS~T~AL~~QAYKE-----~~~-~Vy~TpDp~TA~~~LKSSAKDIWY~~---------~--~~~~~~------~PAFSQCSGRMALKARD?Vl6O
..GIYSSLPMW.........................
1 G ~ ~ F M . s G T s M ~ T p ~ V S G ~ A L L I s...........
GpK
p ~ ~ ~ y ~ p D ~ ~ ~ ~ v L E s ~ A T ~ L E G D P ~ ~ ~ ~ ~ ~ ~ - - - - - - - - ~ ~ ~ ~ ~ T G Q K Y T ~ ~ ~ ~
~~HI~.SSLPLWYTV-S ~
~
~
~ ~
~ ~ ~ ~
. ~ ~ ~
~ ~ ~ A
~~ ~ ~. ~ ~ ~
~ ~ ~~
A L ~ ~
I ~ ~ ~
A K ~ ~
~ ~ ~ ~~
~ ~ ~~
~ ~ ~. Q ~ .~ ~ ~- ~ ~ ~ ~ ~ ? A I L ~ L ~ ~ K ~ ~ N ~
Subtilisin family
Only found in micro-organisms as yet. Includes mainly enzymes
from Bacillus, with subgroups of true subtilisins (>64% identity),
high-alkaline proteases (>55% identity), and intracellular proteases (>37% identity). Numerous minor variants of true subtilisins
and high-alkaline proteases have been identified (Table 2). Long
C-terminal extensions are rare. Several 3D structures are known
(see Tables 1 and 2).
Thermitase family
Enzymes found only in micro-organisms, including some thermophiles (>55% identity) and halophiles. The characteristic N-terminal
sequence was also found in several other Bacillus proteases
(Table 3). Only one 3D structure is known (thermitase).
Proteinase K family
Large family of secreted endopeptidases found only in fungi,
yeasts, and gram-negative bacteria as yet; the bacterial subgroup
has >55% sequence identity. This family is characterized by a
high degree of sequence similarity (>37% identity), only minor
insertions and deletions, and the absence of the Ca2+-binding loop
residues 76-81. Only a few of these enzymes have a significant
C-terminal extension beyond the catalytic domain. One 3D structure is known (proteinase K).
Lantibiotic peptidase family
A small number of highly specialized enzymes for cleavage of
leader peptides from precursors of lantibiotics, a unique group of
post-translationally modified, antimicrobial peptides (Sahl et al.,
1995). These endopeptidases have only been found in gram-positive bacteria, and several are intracellular. Only llnisp has a
C-terminal extension, which acts as a membrane anchor. Characterized by low sequence similarity with each other and other subtilases (Fig. 3), and by numerous insertions/deletions. The most
recently reported protein bspara from Bacillus subtilis is described
as a putative protease required for plasmid stability; we speculate
that it may also play a role in lantibiotic processing.
A few 3D structures have been predicted by homology modeling
(Siezen et al., 1995a; Booth et al., 1996).
Kexin family
A large group of proprotein convertases (PCs) have been identified, all involved in activation of peptide hormones, growth factors, viral proteins, etc. (Barr, 1991; van de Ven et al., 1993). High
specificity is seen for cleavage after dibasic (Lys-Arg or Arg-Arg)
or multiple basic residues. Nearly all are eukaryotic and have high
sequence homology (>40% identity), while two more distant members from Aeromonas and Anabaena provide links to other subti-
Fig. 2. (facing page) Alignment of amino acid sequences of catalytic domains of subtilases. Multiple sequence alignment was initially
performed using the PILEUP program (Devereux et al., 1984). Next, improvements were made manually by taking into account the
structure-based alignment (Siezen et al., 1991; Heringa et al., 1995). Inserts were judged to occur most likely in turns in external loops.
Families A to F are indicated on the left. Enzyme acronyms are given in Table 1. (*) New entries, and (c) corrected entries since Siezen
et al. (1991). Residue numbering at the top corresponds to that of mature subtilisin BPN (basbpn). Catalytic residues Asp 32, His 64,
and Ser 221 are in bold (highlighted red), as is the oxyanion-hole residue Asn 155. Green = highly conserved residues from Table 4;
yellow = Cys residues. Structurally conserved regions of the coreABC and extended coreAB are shown as solid bars; common
secondary structure elements are shown as: h = helix, e = extended β-sheet, b = bend, and t = β-turn (see also Fig. 1). The number
of additional residues in large inserts in the catalytic domain, and in N- and C-terminal extensions, are shown in brackets. Each
sequence begins at the mature N-terminus; an N-terminus based on the predicted pro-peptide cleavage site is indicated as (<). An
unknown number of C-terminal residues is presented as (>). Residues 146-156 of bspara are from a different reading frame than
proposed by the authors.
[Table printed sideways over pages 508-513: contents not recoverable from the extracted text.]
Fig. 3. Pair-wise sequence identity matrix. Sequences are plotted vertically and horizontally in the same order
as in Figure 2; the incomplete sequence of hvccvp is not included. Subdivision into families
A to F is indicated. A color code bar for percentage sequence
identity is shown.
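A pair-wise identity matrix of the kind plotted in Fig. 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the set of gap characters, and the choice to count identity only over columns where neither sequence has a gap are ours, and the source does not say how gaps were treated for this figure.

```python
from typing import Dict, List

# Assumption: these characters mark gaps in the aligned sequences.
GAP_CHARS = set("-~.")


def percent_identity(a: str, b: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in GAP_CHARS or y in GAP_CHARS:
            continue  # skip gapped columns
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(alignment: Dict[str, str]) -> List[List[float]]:
    """Square matrix of pair-wise percent identities, in the order of the input."""
    names = list(alignment)
    return [
        [percent_identity(alignment[row], alignment[col]) for col in names]
        for row in names
    ]


if __name__ == "__main__":
    # Hypothetical toy fragments, not taken from Fig. 2.
    toy = {"seqA": "GTSMA-P", "seqB": "GTSAAAP"}
    for name, row in zip(toy, identity_matrix(toy)):
        print(name, [round(value, 1) for value in row])
```

Keeping the sequences in the same order as the alignment, as the caption describes, makes family blocks appear as squares along the diagonal of the matrix.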
Conserved residues
Fig. 4. Family tree or dendrogram analysis of the subtilase superfamily, based on sequence alignment of the
catalytic domains only (Fig. 2). A: General layout of
the relationship between families A to F. B: Detailed
dendrograms of the individual families, in which branch
lengths are in inverse proportion to the degree of sequence similarity. Not included are members with >90%
sequence identity to one of the listed enzymes (see
Table 2). Trees were constructed using the neighbor-joining method of Saitou and Nei (1987), as implemented in the programs NEIGHBOR (Felsenstein, 1993)
and GROWTREE (Devereux et al., 1984). The distance matrices that were used as input for the programs
were calculated using DISTANCES (Devereux et al.,
1984), PROTDIST (Felsenstein, 1993), and HOMOLOGIES (Leunissen, unpubl. obs.). Positions containing
gaps were ignored, as were the large insertions indicated between brackets in Figure 2. Whenever appropriate, the distances were corrected for multiple
substitutions (Jukes and Cantor, 1969; Kimura, 1983).
All methods used delivered in principle identical topologies, except for the branch lengths; these may vary,
depending upon the method used to calculate the distances between the proteins, and correcting for multiple substitutions.
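The distance step described in this caption can be illustrated with a short Python sketch. It computes the uncorrected fraction of differing residues (ignoring gapped positions, as the caption states) and then applies two textbook corrections for multiple substitutions: the Jukes-Cantor form for 20 amino acid states and Kimura's empirical protein formula. The function names and gap characters are our assumptions, and the sketch does not claim to reproduce the exact options used in DISTANCES, PROTDIST, or HOMOLOGIES.

```python
import math

# Assumption: these characters mark gaps in the aligned sequences.
GAP_CHARS = set("-~.")


def p_distance(a: str, b: str) -> float:
    """Fraction of differing residues over ungapped alignment columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in GAP_CHARS or y in GAP_CHARS:
            continue  # positions containing gaps are ignored
        compared += 1
        differences += x != y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


def jukes_cantor_protein(p: float) -> float:
    """Jukes-Cantor correction for 20 states: d = -(19/20) ln(1 - 20p/19)."""
    arg = 1.0 - (20.0 / 19.0) * p
    return math.inf if arg <= 0 else -(19.0 / 20.0) * math.log(arg)


def kimura_protein(p: float) -> float:
    """Kimura's empirical protein correction: d = -ln(1 - p - 0.2 p^2)."""
    arg = 1.0 - p - 0.2 * p * p
    return math.inf if arg <= 0 else -math.log(arg)


if __name__ == "__main__":
    # Hypothetical toy fragments, not taken from Fig. 2.
    p = p_distance("GTSMA-P", "GTSAAAP")
    print(p, jukes_cantor_protein(p), kimura_protein(p))
```

Both corrections leave small distances almost unchanged and stretch large ones, which is consistent with the caption's remark that the choice of method affects branch lengths rather than topology.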
Organism                              Enzyme                    Acronym   N-term.
BACTERIA
Gram-positive
Bacillus subtilis A50                                           bsia50    1-54
Bacillus sp. GX6644                                             bssugx    1-16
Bacillus sp. Y                                                  bspbya    1-21
Bacillus thuringiensis israelensis                              btisra    1-14
Bacillus thuringiensis finitimus                                btfini    1-15
Bacillus thuringiensis kurstaki                                 btkurs    6-20
Bacillus cereus                                                 bcespr    1-15
Bacillus intermedius 3-19                                       biprot    1-15
Nocardiopsis dassonvillei (prasina)                             ndapII    1-26
Streptomyces rutgersensis             Proteinase D              srespd    1-23
Gram-negative
Thermus Tok3A1                        Caldolysin                tscald    1-15
Vibrio metschnikovii                  Alkaline protease VapK    vmapk     1-36
EUKARYA
Fungi
Cochliobolus carbonum                 Extracellular protease    ccalp2    1-29
Agaricus bisporus                                               abexpr    1-19
Malbranchea sulfurea                                            msthmy    1-28
Ophiostoma piceae                                               opexpr    1-18
Verticillium chlamydosporium                                    vcexp1    1-20
Scedosporium apiospermum                                        saalpr    1-13
[The columns "Other", "Family", and "References" of this table, and a further table fragment headed "Context/function" and "Exception", are not recoverable from the extracted text.]
Large insertions and deletions
The 190 residues that constitute the structurally conserved regions (SCRs), as defined from the
known crystal structures (Siezen et al., 1991) and shown in Figure 2, are present in nearly all the subtilases. Some unusual deletions are found, however, as listed in Table 5, and this implies that
not all of these core residues are essential for proper folding. Most
of these deletions occur in subtilase family D, the lantibiotic leader
Table 5 (deletions)
Position   Context                           Enzyme
1-13       N-terminus, hA                    sepepp
65-66      Part hC, adjacent catalytic His   asaspa
14-82      Ca-binding loop + hC extended     efcyla
96-102     Turn, substrate-binding region    smserp
180-189    Turns                             sepepp
257-215    C-terminus, hH                    lslasp
[Table 5 (large insertions: position, number of inserted residues, properties, family, enzyme): rows not recoverable from the extracted text.]
[Fig. 5: alignment of the large inserts vr5 (dmpga9, hstpp2, cetpp) and vr13; the sequence rows are not recoverable from the extracted text.]
essential for stability and activity. From previous sequence alignments and homology modeling it was predicted that the Ca1 (strong)
and Ca3 (weak) sites are most common in members of the subtilase family, whereas the Ca2 (medium-strength) site is less common (Siezen et al., 1991). The weak Ca4 site has only been found
in proteinase K (Betzel et al., 1988a, 1988b).
For the new subdivision into six families the following predictions can be made about the Ca1 and Ca2 sites. The Ca1 site requires coordination from side-chain ligands of residues 2 and 41 and
from several side chains of residues 76-81 in the Ca2+-ion embracing loop; these ligands are usually carboxyl/carbonyl groups of
Asp/Asn, but can also be from Glu/Gln. This Ca1 site is therefore
predicted to be present in nearly all members of families A, B, and
E, because they appear to contain the required ligands. In contrast,
this Ca1 site cannot be present in any member of families C and D
due to the lack of loop 76-81, nor is it likely to occur in family F
members due to the high variability in sequence in this loop.
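The Ca1 rule described above can be written down as a small check. The following Python sketch is an illustration only: it assumes that each sequence has already been mapped onto subtilisin BPN numbering (a dictionary from position to one-letter residue, with missing positions treated as gaps), and the threshold of two ligand-type side chains in loop 76-81 is our hypothetical reading of "several", not a value given in the text.

```python
from typing import Dict

# Side chains named in the text as possible Ca1 ligands: Asp/Asn, Glu/Gln.
LIGAND_RESIDUES = set("DNEQ")
LOOP_POSITIONS = range(76, 82)  # residues 76-81 of the Ca2+-binding loop
GAP = "-"


def predict_ca1_site(residues: Dict[int, str], min_loop_ligands: int = 2) -> bool:
    """Return True if the sequence carries the elements required for the Ca1 site.

    `residues` maps subtilisin BPN positions to one-letter residues.
    `min_loop_ligands` is a hypothetical threshold for "several side chains".
    """
    # Residues 2 and 41 must both offer a ligand-type side chain.
    anchors_ok = all(residues.get(pos, GAP) in LIGAND_RESIDUES for pos in (2, 41))
    # The loop must be present in full (no deletion of 76-81).
    loop = [residues.get(pos, GAP) for pos in LOOP_POSITIONS]
    loop_present = GAP not in loop
    loop_ligands = sum(res in LIGAND_RESIDUES for res in loop)
    return anchors_ok and loop_present and loop_ligands >= min_loop_ligands


# Hypothetical inputs for illustration: one with the loop, one without it.
example_with_loop = {2: "Q", 41: "D", 76: "N", 77: "N", 78: "S", 79: "I", 80: "G", 81: "V"}
example_without_loop = {2: "Q", 41: "D"}

if __name__ == "__main__":
    print(predict_ca1_site(example_with_loop))
    print(predict_ca1_site(example_without_loop))
```

A sequence lacking loop 76-81, as described for families C and D, fails the check regardless of residues 2 and 41.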
The Ca2 site requires coordination from several side-chain and
main-chain ligands of the loop 49-58, with side chains of residues
49, 52, and 54 appearing to be essential, and stabilization by the
positively charged side chain Arg/Lys of residue 94. Many family
B and E members have the elements required for this Ca2 site if
the side-chain oxygen ligands of Asp, Asn, Glu, Gln, Ser, and Thr
are considered as acceptable. Some members of families A (intracellular proteases) and C (vaproa, alapr2) should also have the Ca2
site. Predictions for the families D and F are too difficult because
in general the sequence alignment is rather speculative in this
region; however, some likely candidates for the Ca2 site are seepip,
llnisp, bsvpr, llprtp, and ldprtb.
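The Ca2 criteria can be sketched in the same spirit, this time returning a per-requirement report so that borderline candidates are easy to inspect. This is a minimal sketch under assumptions: the input is a hypothetical mapping from subtilisin BPN positions to one-letter residues, main-chain ligands of loop 49-58 are not modeled, and the function name and report keys are ours.

```python
from typing import Dict

# Side-chain oxygen ligands considered acceptable in the text.
OXYGEN_LIGANDS = set("DNEQST")
# Positively charged side chains (Arg/Lys) expected at residue 94.
BASIC_RESIDUES = set("RK")
GAP = "-"


def ca2_site_report(residues: Dict[int, str]) -> Dict[str, bool]:
    """Report which Ca2 requirements a sequence meets.

    `residues` maps subtilisin BPN positions to one-letter residues;
    positions absent from the mapping are treated as gaps.
    """
    report = {
        f"ligand_{pos}": residues.get(pos, GAP) in OXYGEN_LIGANDS
        for pos in (49, 52, 54)
    }
    report["basic_94"] = residues.get(94, GAP) in BASIC_RESIDUES
    # A sequence is a candidate only if every requirement above is met.
    report["candidate"] = all(report.values())
    return report


if __name__ == "__main__":
    # Hypothetical residues, for illustration only.
    print(ca2_site_report({49: "D", 52: "E", 54: "T", 94: "K"}))
    print(ca2_site_report({49: "D", 52: "G", 54: "T", 94: "K"}))
```

Because the alignment in this region is described as speculative for families D and F, any positive result for those families should be read as a candidate rather than a prediction.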
The Ca3 and Ca4 sites are weak and characteristically only have
one or two side-chain ligands in the known structures. For this
reason no predictions are attempted for these sites in other proteases.
Disulfide bonds
[Table 6 (known and putative disulfide bonds: S-S bond position, context, enzyme): rows not recoverable from the extracted text.]
Figure 7 shows a stereo view of these known and putative natural S-S bonds, but only those that can be superimposed on the
subtilisin BPN structure. Other putative S-S bonds may occur in
large inserts that have more than one Cys residue (Table 6). Of the
17 natural S-S bonds shown in Figure 7, only 29-114 and 163-193 were included in a theoretical prediction of the 31 most energetically and stereochemically favorable disulfide conformations
in subtilisin (Hazes & Dijkstra, 1988). Two natural S-S bonds
Fig. 7. Stereo view of known and putative natural disulfide bonds (bold) in subtilase members, superimposed on the subtilisin BPN
structure (α-carbon atom trace). Side chains of the catalytic residues are shown in ball-and-stick representation.
Conclusion
New members of the subtilase superfamily are being identified
continuously, with even more to be expected from the accelerating
genome sequencing projects. Therefore, this summary is bound to
be incomplete when it appears in print. The fact that subtilases
have now been discovered in numerous Archaea, Bacteria, and
Eucarya suggests that they are ubiquitous and have been around
for a long time. The novel information accumulated since our
previous review (Siezen et al., 1991) provides exciting new insights into this unique set of enzymes. Through evolution, many
variants have arisen and at present these can be divided into six
main families A to F (Fig. 4), based on sequence alignment of only
the catalytic domains. This classification is by no means definitive
yet, as a further subdivision of family F may become apparent
when more sequences are available. Subtilases are quite common
in gram-positive bacteria, and Bacillus species stand out in particular, as many extracellular and even intracellular variants have
been identified (Tables 1 and 2), belonging to four different families. Recently, a Bacillus strain was even found to have a cluster
of four different subtilase genes (Schmidt et al., 1995), belonging
to families A and F.
What is most surprising now is the high degree of sequence
variability that is observed within the catalytic domains of subtilases. With the exception of the three catalytic residues Asp-HisSer virtually every other residue can be replaced by one or more
different residues. Moreover, it is not even clear what the minimal
structural framework requirement is, because large deletions have
now been found (Table 5). Large insertions in this domain are also
quite common (Table 5), and it is still not clear whether these
additions provide extra stability, binding sites, or other functionalities. In one case, Bruinenberg et al. (1994a) demonstrated that
deletion of such a large insert (151 residues) in Lactococcus lactis
proteinase PrtP did not impair protein folding, but it did affect
proteolytic activity and specificity. While sequence comparisons and
homology modeling can provide a first estimate of overall structure
and functionality, and are useful as a tool for rational design of engineered enzymes (Siezen et al., 1991), the high sequence and structural variabilities observed here clearly make some of the predictions
speculative and emphasize the need for more detailed 3D structural
information to complement the sequence data.
High sequence variability is also found in other protease families, such as in the trypsin family of serine proteases (Rypniewski
et al., 1994) and the papain family of cysteine proteases (Berti &
Storer, 1995), although these both tend to have more conserved
disulfides. Subtilases do not rely on highly conserved disulfides for
stabilization, and in fact, most subtilases do not have any disulfides. When these enzymes do have disulfides there is presumably
a maximum of two bonds, which can occur in many different
positions (Table 6).
Known members of the subtilase superfamily are all (putative)
endoproteases or tripeptidylpeptidases. In most bacteria, archaea,
and lower eukaryotes they are extracellular, rather unspecific enzymes required either for defense or for growth on proteinaceous
substrates. In certain cases, and particularly in higher eukaryotes,
the subtilases have developed into highly specialized enzymes of
biosynthetic pathways where they are involved in processing and
maturation of pro-proteins; examples are all family D and E members. As yet, no other completely different function appears to have
arisen for this protein through evolution. On the other hand, given
the high sequence variability allowed for proteases, it is questionable whether such a protein would be recognized as a subtilase
superfamily member in database screening if it has also lost one or
more of the three conserved catalytic residues. Nevertheless, the
search is still on, and the authors would appreciate any useful
comments, updates or advice.
Acknowledgments
We are greatly indebted to our many colleagues for communicating their
sequence data prior to publication. We thank Drs. S. Visser and O. Kuipers
for critically reading this manuscript. Use of the services and facilities of
the Dutch National NWO/SURF Expertise Center CAOS/CAMM is gratefully acknowledged.
References
Abraham LD, Breuil C. 1995. Factors affecting autolysis of a subtilisin-like
serine proteinase secreted by Ophiostoma piceae and identification of the
cleavage site. Biochim Biophys Acta 1245:76-84.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local
alignment search tool. J Mol Biol 215:403-410.
Balaban NP, Sharipova MR, Itskovich EL, Leshchinskaya IB, Rudenskaya GN.
1994. Secreted serine protease from the spore-forming bacterium Bacillus
intermedius 3-19. Biochemistry (Moscow) 59:1033-1038.
Ballinger MD, Tom J, Wells JA. 1995. Designing subtilisin BPN' to cleave
substrates containing dibasic residues. Biochemistry 34:13312-13319.
Barr PJ. 1991. Mammalian subtilisins: The long-sought dibasic processing endoproteases. Cell 66:1-3.
Barrett AJ, Rawlings ND. 1995. Families and clans of serine peptidases. Arch
Biochem Biophys 318:247-250.
Benjannet S, Lusson L, Hamelin J, Savaria D, Chrétien M, Seidah NG. 1995.
Structure-function studies on the biosynthesis and bioactivity of the precursor convertase PC2 and the formation of the PC2/7B2 complex. FEBS
Lett 362:151-155.
Berti PJ, Storer AC. 1995. Alignment/phylogeny of the papain superfamily of
cysteine proteases. J Mol Biol 246:273-283.
Betzel C, Belleman M, Pal GP, Bajorath J, Saenger W, Wilson KS. 1988a. X-ray
and model-building studies on the specificity of the active site of proteinase
K. Proteins Struct Funct Genet 4:157-164.
Betzel C, Pal GP, Saenger W. 1988b. Three-dimensional structure of proteinase
K at 0.15 nm resolution. Eur J Biochem 178:155-171.
Booth MC, Bogie ChP, Sahl HG, Siezen RJ, Hatter KL, Gilmore MS. 1996.
Structural analysis and proteolytic activation of Enterococcus faecalis cytolysin, a novel lantibiotic. Mol Microbiol 21:1175-1184.
Bruinenberg PC, Doesburg P, Alting AC, Exterkate FA, Vos WM de, Siezen RJ.
1994a. Evidence for a large dispensable segment in the subtilisin-like catalytic domain of the Lactococcus lactis cell-envelope proteinase. Protein
Eng 7:991-996.
Bruinenberg PC, Vos WM de, Siezen RJ. 1994b. Prevention of C-terminal
autoprocessing of Lactococcus lactis SK11 cell-envelope proteinase by engineering of an essential surface loop. Biochem J 302:957-963.
Burton KS, Wood DA, Thurston CF, Barker PJ. 1993. Purification and characterization of a serine proteinase from senescent sporophores of the commercial mushroom Agaricus bisporus. J Gen Microbiol 139:1379-1386.
Carter P, Wells JA. 1990. Functional interaction among catalytic residues in
subtilisin BPN'. Proteins Struct Funct Genet 7:335-342.
Chestukhina GG, Zagnitko OP, Revina LP, Klepikova FS, Stepanov VM. 1986.
Extracellular serine proteinases from subspecies of Bacillus thuringiensis
evolve much more slowly than the corresponding δ-endotoxins. Biochemistry (Moscow) 51:1472-1479.
Creemers JWM, Siezen RJ, Roebroek AJM, Ayoubi TAY, Heylebroeck D, Van
de Ven WJM. 1993. Modulation of furin-mediated proprotein processing
activity by site-directed mutagenesis. J Biol Chem 268:21826-21834.
Davail S, Feller G, Narinx E, Gerday C. 1994. Cold adaptation of proteins.
J Biol Chem 269:17448-17453.
Devereux J, Haeberli P, Smithies O. 1984. A comprehensive set of sequence
analysis programs for the VAX. Nucleic Acids Res 12:387-395.
Durham DR. 1993. The elastolytic properties of subtilisin GX from alkalophilic
Bacillus sp. strain 6644 provides a means of differentiation from other
subtilisins. Biochem Biophys Res Commun 194:1365-1370.
Felsenstein J. 1993. PHYLIP (Phylogeny Inference Package) version 3.5c. Distributed by the author. Seattle, Department of Genetics, University of Washington.
Freeman SA, Peek K, Prescott M, Daniel R. 1993. Characterization of a chelator-resistant proteinase from Thermus strain Rt4A2. Biochem J 295:463-469.
Gallagher T, Bryan P, Gilliland GL. 1993. Calcium-independent subtilisin by
design. Proteins 16:205-213.
Gallagher T, Gilliland G, Wang L, Bryan P. 1995. The prosegment-subtilisin BPN'
complex: Crystal structure of a specific foldase. Structure 3:907-914.
Gaucher GM, Stevenson KJ. 1976. Thermomycolin. Methods Enzymol 45:415-433.
Greer J. 1990. Comparative modeling methods: Application to the family of
mammalian serine proteases. Proteins Struct Funct Genet 7:317-334.
Grøn H, Meldal M, Breddam K. 1992. Extensive comparison of the substrate
preferences of two subtilisins as determined with peptide substrates which
are based on the principle of intramolecular quenching. Biochemistry 31:6011-6018.
Gros P, Betzel Ch, Dauter Z, Wilson KS, Hol WGJ. 1989. Molecular dynamics
refinement of a thermitase-eglin-c complex at 1.98 Å resolution and comparison of two crystal forms that differ in calcium content. J Mol Biol
210:347-361.
Hazes B, Dijkstra BW. 1988. Model building of disulfide bonds in proteins with
known three-dimensional structure. Protein Eng 2:119-125.
Heinz DW, Priestle JP, Rahuel J, Wilson KS, Grütter MG. 1991. Refined crystal
structures of subtilisin Novo in complex with wild-type and two mutant
eglins: Comparison with other serine proteinase inhibitor complexes. J Mol
Biol 217:353-371.
Heringa J, Argos P, Egmond MR, Vlieg J de. 1995. Increasing thermal stability
of subtilisin from mutations suggested by strongly interacting side-chain
clusters. Protein Eng 8:21-30.
Jukes TH, Cantor CR. 1969. Evolution of protein molecules. In: Munro HN, ed.
Mammalian protein metabolism, vol. III. New York: Academic Press. pp 21-132.
Kabsch W, Sander C. 1983. Dictionary of protein secondary structure: Pattern
recognition of hydrogen-bonded and geometrical features. Biopolymers
22:2577-2637.
Katz B, Kossiakoff AA. 1990. Crystal structures of subtilisin BPN' variants
containing disulfide bonds and cavities: Concerted structural rearrangements induced by mutagenesis. Proteins Struct Funct Genet 7:343-357.
Kimura M. 1983. The neutral theory of molecular evolution. Cambridge, UK:
Cambridge University Press.
Kraulis PJ. 1991. MOLSCRIPT: A program to produce both detailed and schematic plots of protein structure. J Appl Crystal 24:946-950.
Kunitate A, Okamoto M, Ohmori I. 1989. Purification and characterization of a
thermostable serine protease from Bacillus thuringiensis. Agric Biol Chem
53:3251-3256.
Kwon ST, Terada I, Matsuzawa H, Ohta T. 1988. Nucleotide sequence of the
gene for aqualysin I (a thermophilic alkaline serine protease) of Thermus
aquaticus YT-1 and characteristics of the deduced primary structure of the
enzyme. Eur J Biochem 173:491-497.
Kwon YT, Kim JO, Moon SY, Lee HH, Rho HM. 1994. Extracellular alkaline
proteases from alkalophilic Vibrio metschnikovii strain RH530. Biotech Lett
16:413-418.
Larcher G, Cimon B, Symoens F, Tronchin G, Charbasse D, Bouchara J-P. 1996.
A 33 kDa serine proteinase from Scedosporium apiospermum. Biochem J
315:119-126.
Lavrenova GI, Gulnik SV, Kalugar SV, Borovikova VP, Revina LP, Stepanov
VM. 1984. Extracellular acid serine proteinase D of Streptomyces rutgersensis. Biochemistry (Moscow) 49:447-454.
Lilley G, Stewart DJ, Kortt AA. 1992. Amino acid and DNA sequences of an
extracellular basic protease of Dichelobacter nodosus show that it is a
member of the subtilisin family of proteases. Eur J Biochem 210:13-21.
ceous inhibitor SSI (Streptomyces subtilisin inhibitor). Protein Eng 4:501-508.
Takeuchi Y, Satow Y, Nakamura KT, Mitsui Y. 1991b. Refined crystal structure
of the complex of subtilisin BPN' and Streptomyces subtilisin inhibitor at
1.8 Å resolution. J Mol Biol 221:309-325.
Tsujibo H, Miyamoto K, Hasegawa T, Inamori Y. 1990. Amino acid compositions and partial sequences of two types of alkaline serine proteases from
Nocardiopsis dassonvillei subsp. prasina OPC-210. Agric Biol Chem 54:2177-2179.
van de Ven WJM, Voorberg J, Fontijn R, Pannekoek H, Ouweland AMW van
den, Duijnhoven HLP van, Roebroek AJM, Siezen RJ. 1990. Furin is a
subtilisin-like proprotein processing enzyme in higher eukaryotes. Mol Biol
Rep 14:265-275.
van de Ven WJM, Roebroek AJM, Duijnhoven HLP van. 1993. Structure and
function of eukaryotic proprotein processing enzymes of the subtilisin family of serine proteases. Crit Rev Oncogenesis 4:115-136.
van der Meer JR, Rollema HS, Siezen RJ, Kuipers OP, Vos WM de. 1994.
Influence of amino acid substitutions in the nisin leader peptide on biosynthesis and secretion of nisin by Lactococcus lactis. J Biol Chem 269:3555-3562.
Wells JA, Powers DB. 1986. In vivo formation and stability of engineered
disulfide bonds in subtilisin. J Biol Chem 261:6564-6570.
Wells JA, Powers DB, Bott RR, Graycar TP, Estell DA. 1987. Designing substrate specificity by protein engineering of electrostatic interactions. Proc
Natl Acad Sci USA 84:1219-1223.
Zhou A, Paquet L, Mains E. 1995. Structural elements that direct specific
processing of different mammalian subtilisin-like prohormone convertases.
J Biol Chem 270:21509-21516.