
Scott Hollingsworth

(Department of Biochemistry & Biophysics, Oregon State University)

Mentor:
Dr. P. Andrew Karplus (Department of Biochemistry & Biophysics, OSU)
In Collaboration With:
Dr. Weng-Keen Wong (Department of Computer Science, OSU)
Dr. Donald Berkholz (Department of Biochemistry and Molecular Biology, M
Dr. Dale Tronrud (Department of Biochemistry & Biophysics, OSU)

Each protein has an individual structure
Structure flows from function
Understand structure, understand function

Ptr Tox A

Phi & Psi (φ, ψ)

Phi and psi describe the conformation of the planar peptide (amino acid) relative to other peptides
One amino acid, two angles
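As an illustration of how the two angles can be measured, here is a minimal sketch that computes a signed dihedral angle from four 3D points. It relies on the standard backbone definitions (phi from C(i-1), N, CA, C; psi from N, CA, C, N(i+1)). The function name `dihedral` and the tuple-based coordinates are assumptions made for this example, not code from the project.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees, in (-180, 180], about the p1-p2 bond."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Hypothetical usage with backbone atom coordinates:
#   phi = dihedral(C_prev, N, CA, C)
#   psi = dihedral(N, CA, C, N_next)
```

A cis arrangement of the four points gives 0 degrees and a trans arrangement gives 180 degrees.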

Ramachandran Plot
Voet, Voet & Pratt Biochemistry
(Upcoming 4th Edition)

3₁₀ Helix

Use of Protein Geometry Database (PGD) to identify linear group existence (e.g. α-helix, β-sheet, π-helix)
Simple repeating structures
Methods: manual searches
Hollingsworth et al. 2009. On the occurrence of linear groups in proteins. Protein Sci. 18:1321-25

-Helix

Linear groups are only part of the picture


Not all common protein motifs are repeating structures
Many have changing conformations

Goal of this research:


Identify all common motifs in proteins

Too complex for manual searches


Enter machine learning

Form of artificial intelligence
Can identify clusters within a dataset
Cluster: significant grouping of data points

Visual example

Topographical map of Oregon


Data value: Elevation

Mt. Hood
(11,239 Feet)
Mt. Jefferson
(10,497 Feet)
Three Sisters
(10,358-10,047 Feet)

Highest points (Individual peaks)

Topographical map of Oregon


Data value: Elevation

[Map labels: Cascades, Coast Range, Siskiyous (Klamath), Tualatin Hills, Ochoco, Strawberries, Paulina Mts, Jackass Mts, Mahogany Mts, Hart Mtn, Trout Creek Mts]

Mountain ranges (Broad patterns)

Similar approach with our data


2-Dimensional Example

helix

PI

Complications
Our data: 4-dimensional dataset
4D to 2D distance conversions
What has and hasn't been observed?
No definitive source
Abundance / Peak Heights
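One complication of a 4-dimensional angular dataset is that each angle wraps around at ±180°, so a plain subtraction overstates some distances. The sketch below shows one possible way to measure distance between two four-angle conformations with wraparound. The function names and the choice of a Euclidean distance are assumptions for illustration; the deck does not state which distance measure was used.

```python
import math

def angle_diff(a, b):
    """Smallest signed difference a - b in degrees, wrapped into [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0

def torsion_distance(p, q):
    """Euclidean distance between two equal-length tuples of torsion angles,
    treating every angle as periodic."""
    return math.sqrt(sum(angle_diff(a, b) ** 2 for a, b in zip(p, q)))

# Two conformations that differ only across the +180/-180 boundary
# are 2 degrees apart, not 358.
near = torsion_distance((179.0, 0.0, 0.0, 0.0), (-179.0, 0.0, 0.0, 0.0))
```

The same function works unchanged for the higher-dimensional cases (6D and up), since it only assumes both tuples have the same length.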

Machine learning programs can identify both previously documented and unknown common motifs and their abundances

1) Create and prep datasets
Resolution cutoffs: 1.2 Å or better, and 1.75 Å or better

2) Run cuevas

3) Analyze identified clusters
Automated process using Python to remove bias

4) Analyze context of motifs
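The dataset-preparation step can be sketched as follows: keep only structures at or better than a resolution cutoff, then turn each pair of adjacent residues into one 4D point (phi and psi of residue i, phi and psi of residue i+1). The input layout (a list of dicts with `resolution` and `phi_psi` keys) and the function name `build_pairs` are hypothetical, chosen only for this example; the actual data came from the Protein Geometry Database.

```python
def build_pairs(chains, max_resolution=1.2):
    """Return 4D points (phi_i, psi_i, phi_i+1, psi_i+1) from chains whose
    resolution value is at or below max_resolution (lower value = better)."""
    points = []
    for chain in chains:
        if chain["resolution"] > max_resolution:
            continue  # structure is not resolved well enough
        angles = chain["phi_psi"]
        # Pair every residue with its immediate successor.
        for (phi1, psi1), (phi2, psi2) in zip(angles, angles[1:]):
            points.append((phi1, psi1, phi2, psi2))
    return points

# Hypothetical input: one well-resolved chain and one that is filtered out.
example_chains = [
    {"resolution": 1.0, "phi_psi": [(-63.4, -42.0), (-64.0, -40.6), (-65.5, -21.4)]},
    {"resolution": 2.0, "phi_psi": [(-125.5, 132.4), (-118.0, 130.2)]},
]
example_points = build_pairs(example_chains)
```

A chain of n residues contributes n - 1 points, so the three-residue chain above yields two points.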

2D-visual example of cuevas clustering

Goal: Definitive list of the most common protein motifs
In order of abundance

Everest Method
Locate highest peak first
Bad pun: Mt. Alpha-rest
Locate second highest peak
Locate third.
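The peak-by-peak idea above can be sketched as a simple loop: find the point with the most neighbours inside a fixed radius, record it, remove that neighbourhood, and repeat. This is a brute-force illustration written for this summary, not the cuevas program itself. The 10° radius mirrors the "circle r=10" used in the results, and the function names are assumptions.

```python
import math

def _dist(p, q):
    """Euclidean distance between torsion-angle tuples with 360-degree wraparound."""
    return math.sqrt(sum(((a - b + 180.0) % 360.0 - 180.0) ** 2
                         for a, b in zip(p, q)))

def everest_peaks(points, radius=10.0, n_peaks=3):
    """Return up to n_peaks (center, count) pairs, highest count first."""
    remaining = list(points)
    peaks = []
    while remaining and len(peaks) < n_peaks:
        best, best_count = None, -1
        for p in remaining:
            count = sum(1 for q in remaining if _dist(p, q) <= radius)
            if count > best_count:  # ties keep the first point encountered
                best, best_count = p, count
        peaks.append((best, best_count))
        # Remove the peak's neighbourhood so the next pass finds the next peak.
        remaining = [q for q in remaining if _dist(best, q) > radius]
    return peaks
```

The loop compares every pair of points on each pass, so it is only practical for small examples; a real dataset would need a spatial index or a binned density grid.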

Identifying motifs
Search for peaks while looking for ranges
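One way to picture a "range" as opposed to a single peak is a chain of points in which each point lies close to at least one other member, even when the two ends are far apart. The sketch below groups points this way (single-linkage grouping with a distance threshold). Treating ranges as linked groups is an assumption made for illustration; the deck does not specify how ranges were delimited, and the function names are hypothetical.

```python
import math

def _dist(p, q):
    """Euclidean distance between torsion-angle tuples with 360-degree wraparound."""
    return math.sqrt(sum(((a - b + 180.0) % 360.0 - 180.0) ** 2
                         for a, b in zip(p, q)))

def find_ranges(points, link=10.0):
    """Group point indices so that every member is within `link` degrees of
    at least one other member of its group. Largest group first."""
    unvisited = set(range(len(points)))
    groups = []
    while unvisited:
        seed = unvisited.pop()
        group, frontier = [seed], [seed]
        while frontier:
            i = frontier.pop()
            near = [j for j in unvisited if _dist(points[i], points[j]) <= link]
            for j in near:
                unvisited.remove(j)
            group.extend(near)
            frontier.extend(near)
        groups.append(sorted(group))
    return sorted(groups, key=len, reverse=True)
```

With a 10° link, five points spaced 8° apart along one angle form a single range even though the first and last are 32° apart.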

Results:
Definitive list of common protein motifs in order of abundance
The list

Points = points per circle (r = 10°); Density = points per degree²; angles in degrees for residues i and i+1; Size = cluster size; [?] = character lost in extraction

 #   Points  Density  φ(i)    ψ(i)    φ(i+1)  ψ(i+1)  Size  Motif name
 1   5644    18.07    -63.4   -42     -64     -40.6   1     α-helix / 3₁₀-helix
 2   247     0.7909   -125.5  132.4   -118    130.2   1     β-strand
 3   173     0.5540   -69.9   157.4   -61     -36.3   1     PII-[?] Helix N-Cap / Capping Box
 4   147     0.4707   -65.5   -21.4   -90.3   1.5     1     Type I Turn#
 5   125     0.4003   -70.4   153.6   -60.4   143     1     PII
 6   117     0.3747   -57.2   131     82.4    -0.6    1     Type II Turn
 7   88      0.2818   -88.3   -2      -64.7   136.9   1     Type I Turn Cap
 8   55      0.1761   -88.1   1.3     87.9    5.7     1     Schellman Motif
 9   51      0.1633   -91.8   -1.9    -58.4   -42.5   1     Reverse Type I Turn
10   43      0.1377   93.5    -0.1    -71.7   146     1     Reverse Type II Turn
11   40      0.1281   -133.9  164.3   -62.2   -34.1   1     [?] Turn
12   36      0.1153   -82.4   -26.8   -146.3  152.1   2     Classic Beta Bulge
13   35      0.1121   54.9    38.3    84.5    0.8     1     Type I` Turn
14   34      0.1089   -122.3  119.6   52.7    41      1     [?]L
15   31      0.0993   -136.1  70.4    -65     -19     1     [?]P
16   31      0.0993   65.3    28.3    -67.2   140.8   1     G1 Beta Bulge
17   30      0.0961   82.6    5.6     -103.1  137.5   1     [?]L
18   29      0.0929   56.7    -133.5  -73.7   -10.7   3     Type II` Turn
19   24      0.0769   78      0.5     -67.5   -43.1   1     [?]L
20   20      0.0640   -78.3   116     -89.1   -31.1   1     Type VIa1 Turn (S)
21   20      0.0640   -96.6   0.9     -133.8  156.3   1     Classic Beta Bulge (S)
22   20      0.0640   50.5    49.9    -61.2   148.3   1     Wide Beta Bulge (S)
23   19      0.0608   -69.9   -32.3   -129.8  73.1    2
24   17      0.0544   -129.1  80.8    -70.3   141.9   1
25   15      0.0480   53.7    48      -118.9  126.6   1
26   14      0.0448   -87.6   61      -140.3  149.5   1
27   11      0.0352   76.3    -169.3  -61.4   138.3   2
28   10      0.0320   78.8    171.1   -69.3   -29.6   1
29   9       0.0288   -138.5  165.7   57.7    -137.8  2
30   9       0.0288   92.8    165.9   -62.5   -35.7   1
31   8       0.0256   -107.6  16.8    80      -177    1
32   8       0.0256   84.6    8.1     -143    169.3   1
33   7       0.0224   -85.8   71.8    -83.1   163.5   3
34   6       0.0192   -102.4  -9      92.6    163.3   4
35   6       0.0192   -77.9   -8.6    86.7    174.2   1
36   6       0.0192   83.8    -166.3  -121.9  132.1   1

Motif names for rows 23-36, in extracted order (row alignment lost): PII; L (S); ` Turn; PII` PII; PII` (S); PII`; Reverse Type II` Turn; L; ` PII; (S); PII`; PII; PII`; PII`; PII
New Motif: 16 of the 36 motifs are marked X (row alignment lost)

Motif shapes
Each motif analyzed by plotting each motif range
Understand the shape of the cluster/motif

Results:
New insight into each motif's structure
Context
Comparisons

Type II vs. Type II`

Hairpin turns
180° turn
Two residues
Defined as mirror images of each other

Distributions show differences between the two structures
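Mirroring a backbone conformation negates every torsion angle, so the mirror-image claim can be checked directly from the cluster centers in the results list. The sketch below takes the Type II and Type II` centers reported there, mirrors the Type II` center, and computes the per-angle deviation. The function names are assumptions for this example, and comparing centers is only a rough stand-in for comparing the full distributions.

```python
def angle_diff(a, b):
    """Smallest signed difference a - b in degrees, wrapped into [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0

def mirror(conformation):
    """Mirror image of a conformation: every torsion angle changes sign."""
    return tuple(-x for x in conformation)

# Cluster centers (phi_i, psi_i, phi_i+1, psi_i+1) from the results list.
type_ii = (-57.2, 131.0, 82.4, -0.6)
type_ii_prime = (56.7, -133.5, -73.7, -10.7)

# Per-angle deviation of Type II from the exact mirror image of Type II`.
deviation = tuple(round(angle_diff(a, b), 1)
                  for a, b in zip(type_ii, mirror(type_ii_prime)))
```

With these centers the first residue agrees with the mirror image to within about 3°, while the second residue is off by roughly 9° and 11°.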
Nearly four years in the making

The results go on
Motif analysis
Viral forming of Pangea

Range and peak method sections


Adapting cuevas for our data
Python automation
Identification of 3₁₀ Helix & Type I Turn
6D, 8D, 10D and 12D clustering
Full helix caps, loops, half-turns

For the full story, a manuscript for publication is being prepared:
Hollingsworth et al. The protein parts list: motif identification through the application of machine learning. (Unpublished)

Cuevas was successful in identifying both documented and undocumented motifs
Previously described: Linear groups, helix caps, β-turns (& reverses), β-bulges, -turns, loops, helix bends, structures


Numerous new motifs
Successful from 4D through 20D

Results form the Protein Parts List


Comprehensive list of all common protein motifs found in proteins

Dr. P. Andrew Karplus


Dr. Weng-Keen Wong
Dr. Donald Berkholz
Dr. Dale Tronrud
Dr. Kevin Ahern
Howard Hughes Medical Institute
