Identification of common protein motifs through the application of machine learning
![Page 1: Identification of common protein motifs through the application of machine learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56813788550346895d9f248f/html5/thumbnails/1.jpg)
Scott Hollingsworth (Department of Biochemistry & Biophysics, Oregon State University)
Mentor: Dr. P. Andrew Karplus (Department of Biochemistry & Biophysics, OSU)
In collaboration with:
Dr. Weng-Keen Wong (Department of Computer Science, OSU)
Dr. Donald Berkholz (Department of Biochemistry and Molecular Biology, Mayo Clinic)
Dr. Dale Tronrud (Department of Biochemistry & Biophysics, OSU)
![Page 2: Identification of common protein motifs through the application of machine learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56813788550346895d9f248f/html5/thumbnails/2.jpg)
Each protein has an individual structure
Structure flows from function
Understand structure, understand function
Ptr Tox A
![Page 3: Identification of common protein motifs through the application of machine learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56813788550346895d9f248f/html5/thumbnails/3.jpg)
Phi & Psi (φ, ψ)
Phi and psi describe the conformation of each planar peptide (amino acid) relative to its neighboring peptides
One amino acid – two angles
Figure: Ramachandran plot (axes: φ, ψ). Voet, Voet & Pratt, Biochemistry (upcoming 4th edition)
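As a rough illustration of what "two angles per amino acid" means numerically, the sketch below computes a signed torsion angle from four atom positions. It is a minimal example using the standard atan2 formulation, not code from the talk; the function name `dihedral` and the tuple-based coordinates are assumptions made for illustration.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond (hypothetical helper)."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# For residue i, the backbone angles use these atoms:
#   phi = dihedral(C[i-1], N[i], CA[i], C[i])
#   psi = dihedral(N[i], CA[i], C[i], N[i+1])
```

Each residue therefore contributes one (φ, ψ) pair, which is a single point on the Ramachandran plot.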
![Page 4: Identification of common protein motifs through the application of machine learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56813788550346895d9f248f/html5/thumbnails/4.jpg)
Use of the Protein Geometry Database (PGD) to identify linear group existence (e.g. α-helix, β-sheet, π-helix…)
▪ Simple repeating structures
▪ Methods: manual searches
Hollingsworth et al. 2009. “On the occurrence of linear groups in proteins.” Protein Sci. 18:1321-25
Figure labels: α-helix, 3₁₀ helix
![Page 5: Identification of common protein motifs through the application of machine learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56813788550346895d9f248f/html5/thumbnails/5.jpg)
Linear groups are only part of the picture
▪ Not all common protein motifs are repeating structures
▪ Many have changing conformations
Goal of this research: Identify all common motifs in proteins
▪ Too complex for manual searches
▪ Enter machine learning
![Page 6: Identification of common protein motifs through the application of machine learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56813788550346895d9f248f/html5/thumbnails/6.jpg)
Form of artificial intelligence
Can identify clusters within a dataset
▪ Cluster – significant grouping of data points
Visual example…
![Page 7: Identification of common protein motifs through the application of machine learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56813788550346895d9f248f/html5/thumbnails/7.jpg)
Topographical map of Oregon
Data value: Elevation
Highest points (individual peaks)
▪ Mt. Hood (11,239 feet)
▪ Mt. Jefferson (10,497 feet)
▪ Three Sisters (10,358-10,047 feet)
![Page 8: Identification of common protein motifs through the application of machine learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56813788550346895d9f248f/html5/thumbnails/8.jpg)
Topographical map of Oregon
Data value: Elevation
Highest points (individual peaks)
![Page 9: Identification of common protein motifs through the application of machine learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56813788550346895d9f248f/html5/thumbnails/9.jpg)
Topographical map of Oregon
Data value: Elevation
Mountain ranges (broad patterns): Cascades, Coast Range, Siskiyous (Klamath), Blue Mts, Wallowas, Steens, Strawberries, Ochoco, Mahogany Mts, Jackass Mts, Hart Mtn, Tualatin Hills, Trout Creek Mts, Paulina Mts
![Page 10: Identification of common protein motifs through the application of machine learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56813788550346895d9f248f/html5/thumbnails/10.jpg)
Similar approach with our data
2-dimensional example (axes: φ, ψ)
![Page 11: Identification of common protein motifs through the application of machine learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56813788550346895d9f248f/html5/thumbnails/11.jpg)
Similar approach with our data
2-dimensional example (axes: φ, ψ)
Labeled regions: α-helix, β, PII, αL
![Page 12: Identification of common protein motifs through the application of machine learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56813788550346895d9f248f/html5/thumbnails/12.jpg)
Complications…
Our data:
▪ 4-dimensional dataset
▪ 4D to 2D distance conversions
What has and hasn’t been observed?
▪ No definitive source
▪ Abundance / peak heights
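One complication with a 4-dimensional dataset of torsion angles is that each axis is periodic, so +179° and -179° are only 2° apart. The sketch below shows one plausible way to measure distance between two (φi, ψi, φi+1, ψi+1) points with that wraparound; it is an assumption for illustration and not necessarily the distance conversion used in this work. The names `angle_diff` and `torsion_distance` are hypothetical.

```python
import math

def angle_diff(a, b):
    """Smallest signed difference a - b in degrees, wrapped into [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0

def torsion_distance(p, q):
    """Euclidean distance between two tuples of torsion angles, with wraparound."""
    return math.sqrt(sum(angle_diff(a, b) ** 2 for a, b in zip(p, q)))

# Two points that look far apart numerically but are close on the circle.
d = torsion_distance((179.0, 0.0, 0.0, 0.0), (-179.0, 0.0, 0.0, 0.0))
```

The same function works unchanged for 2D (φ, ψ) points or for longer windows, since it only zips the two tuples.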
![Page 13: Identification of common protein motifs through the application of machine learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56813788550346895d9f248f/html5/thumbnails/13.jpg)
Machine learning programs can identify both previously documented and unknown common motifs and their abundances
![Page 14: Identification of common protein motifs through the application of machine learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56813788550346895d9f248f/html5/thumbnails/14.jpg)
1) Create and prep datasets at two resolution cutoffs: 1.2 Å or better, and 1.75 Å or better
2) Run cuevas
3) Analyze identified clusters
▪ Automated process using Python to remove bias
4) Analyze context of motifs
Figure: 2D visual example of cuevas clustering
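Step 1 can be sketched as follows: keep only chains at or below a resolution cutoff, then pair each residue with the next one to form 4-dimensional points. The input layout (a list of dicts with `resolution` and `angles` keys) and the function name `two_residue_points` are hypothetical, chosen only to keep the example self-contained; the actual datasets came from the PGD.

```python
def two_residue_points(chains, max_resolution):
    """Build (phi_i, psi_i, phi_i+1, psi_i+1) tuples from chains at or below a cutoff.

    `chains` is an assumed layout: each item has a "resolution" in Angstroms
    and an ordered list of per-residue (phi, psi) pairs under "angles".
    Chain breaks are not handled in this sketch.
    """
    points = []
    for chain in chains:
        # A numerically lower resolution value means a better structure.
        if chain["resolution"] > max_resolution:
            continue
        angles = chain["angles"]
        for (phi1, psi1), (phi2, psi2) in zip(angles, angles[1:]):
            points.append((phi1, psi1, phi2, psi2))
    return points

example_chains = [
    {"resolution": 1.0, "angles": [(-63, -42), (-64, -41), (-65, -40)]},
    {"resolution": 1.5, "angles": [(-120, 130), (-118, 128)]},
]
strict = two_residue_points(example_chains, 1.2)   # first chain only
loose = two_residue_points(example_chains, 1.75)   # both chains
```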
![Page 15: Identification of common protein motifs through the application of machine learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56813788550346895d9f248f/html5/thumbnails/15.jpg)
Goal: Definitive list of the most common protein motifs, in order of abundance
“Everest” method
▪ Locate “highest” peak first (bad pun: “Mt. Alpha-rest”)
▪ Locate second highest peak
▪ Locate third…
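The "highest peak first" idea can be illustrated with a simple greedy loop: find the point with the most neighbors inside a fixed radius, record it, remove its neighbors, and repeat. This is a simplified stand-in written for illustration; it is not the cuevas algorithm, and the 10° radius, the `min_count` threshold, and the name `everest_peaks` are assumptions.

```python
import math

def angle_diff(a, b):
    return (a - b + 180.0) % 360.0 - 180.0

def torsion_distance(p, q):
    return math.sqrt(sum(angle_diff(a, b) ** 2 for a, b in zip(p, q)))

def everest_peaks(points, radius=10.0, min_count=2):
    """Greedy peak picking: returns [(center, neighbor_count), ...] by descending count."""
    remaining = list(points)
    peaks = []
    while remaining:
        best, best_count = None, 0
        # O(n^2) neighbor counting; fine for a sketch, slow for real datasets.
        for p in remaining:
            count = sum(1 for q in remaining if torsion_distance(p, q) <= radius)
            if count > best_count:
                best, best_count = p, count
        if best_count < min_count:
            break  # nothing left that is taller than the noise floor
        peaks.append((best, best_count))
        # Remove everything under the peak just found, then look for the next one.
        remaining = [q for q in remaining if torsion_distance(best, q) > radius]
    return peaks
```

Because each pass removes the points under the current peak, later peaks are reported in decreasing order of abundance.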
![Page 16: Identification of common protein motifs through the application of machine learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56813788550346895d9f248f/html5/thumbnails/16.jpg)
Identifying motifs
▪ Search for peaks while looking for ranges
Results: Definitive list of common protein motifs in order of abundance
The list…
![Page 17: Identification of common protein motifs through the application of machine learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56813788550346895d9f248f/html5/thumbnails/17.jpg)
| Points | Per Residue (Circle r=10 Degree²) | φi | ψi | φi+1 | ψi+1 | i | i+1 | Cluster Size | Motif Name | New Motif |
|---|---|---|---|---|---|---|---|---|---|---|
| 5644 | 18.07 | -63.4 | -42 | -64 | -40.6 | α | α | 1 | α-helix / 3₁₀-helix | |
| 247 | 0.7909 | -125.5 | 132.4 | -118 | 130.2 | β | β | 1 | β-strand | |
| 173 | 0.5540 | -69.9 | 157.4 | -61 | -36.3 | PII | α | 1 | PII-Helix N-Cap / Capping Box | |
| 147 | 0.4707 | -65.5 | -21.4 | -90.3 | 1.5 | α | δ | 1 | Type I Turn# | |
| 125 | 0.4003 | -70.4 | 153.6 | -60.4 | 143 | PII | PII | 1 | PII | |
| 117 | 0.3747 | -57.2 | 131 | 82.4 | -0.6 | PII | δL | 1 | Type II Turn | |
| 88 | 0.2818 | -88.3 | -2 | -64.7 | 136.9 | δ | PII | 1 | Type I Turn Cap | |
| 55 | 0.1761 | -88.1 | 1.3 | 87.9 | 5.7 | δ | δL | 1 | Schellman Motif | |
| 51 | 0.1633 | -91.8 | -1.9 | -58.4 | -42.5 | δ | α | 1 | Reverse Type I Turn | X |
| 43 | 0.1377 | 93.5 | -0.1 | -71.7 | 146 | δL | PII | 1 | Reverse Type II Turn | X |
| 40 | 0.1281 | -133.9 | 164.3 | -62.2 | -34.1 | β | α | 1 | βα Turn | |
| 36 | 0.1153 | -82.4 | -26.8 | -146.3 | 152.1 | δ | β | 2 | Classic Beta Bulge‡ | |
| 35 | 0.1121 | 54.9 | 38.3 | 84.5 | 0.8 | αL | δL | 1 | Type I′ Turn | |
| 34 | 0.1089 | -122.3 | 119.6 | 52.7 | 41 | β | αL | 1 | β → αL | X |
| 31 | 0.0993 | -136.1 | 70.4 | -65 | -19 | ζ | α | 1 | ζ → αP† | |
| 31 | 0.0993 | 65.3 | 28.3 | -67.2 | 140.8 | αL | PII | 1 | G1 Beta Bulge | |
| 30 | 0.0961 | 82.6 | 5.6 | -103.1 | 137.5 | δL | β | 1 | δL → β | X |
| 29 | 0.0929 | 56.7 | -133.5 | -73.7 | -10.7 | PII′ | δ | 3 | Type II′ Turn | |
| 24 | 0.0769 | 78 | 0.5 | -67.5 | -43.1 | δL | α | 1 | δL → α | X |
| 20 | 0.0640 | -78.3 | 116 | -89.1 | -31.1 | PII | δ | 1 | Type VIa1 Turn (S) | |
| 20 | 0.0640 | -96.6 | 0.9 | -133.8 | 156.3 | δ | β | 1 | Classic Beta Bulge (S) | |
| 20 | 0.0640 | 50.5 | 49.9 | -61.2 | 148.3 | αL | PII | 1 | Wide Beta Bulge (S) | |
| 19 | 0.0608 | -69.9 | -32.3 | -129.8 | 73.1 | α | ζ | 2 | α → ζ† | |
| 17 | 0.0544 | -129.1 | 80.8 | -70.3 | 141.9 | ζ | PII | 1 | ζ → PII | X |
| 15 | 0.0480 | 53.7 | 48 | -118.9 | 126.6 | αL | β | 1 | αL → β (S) | X |
| 14 | 0.0448 | -87.6 | 61 | -140.3 | 149.5 | γ′ | β | 1 | γ′ Turn | |
| 11 | 0.0352 | 76.3 | -169.3 | -61.4 | 138.3 | PII′ | PII | 2 | PII′ → PII | X |
| 10 | 0.0320 | 78.8 | 171.1 | -69.3 | -29.6 | PII′ | α | 1 | PII′ → α (S) | X |
| 9 | 0.0288 | -138.5 | 165.7 | 57.7 | -137.8 | β | PII′ | 2 | β → PII′ | X |
| 9 | 0.0288 | 92.8 | 165.9 | -62.5 | -35.7 | ε | α | 1 | ε → α | X |
| 8 | 0.0256 | -107.6 | 16.8 | 80 | -177 | δ | PII′ | 1 | Reverse Type II′ Turn | X |
| 8 | 0.0256 | 84.6 | 8.1 | -143 | 169.3 | δL | β | 1 | δL → β | X |
| 7 | 0.0224 | -85.8 | 71.8 | -83.1 | 163.5 | γ′ | PII | 3 | γ′ → PII | X |
| 6 | 0.0192 | -102.4 | -9 | 92.6 | 163.3 | δ | ε | 4 | δ → ε | X |
| 6 | 0.0192 | -77.9 | -8.6 | 86.7 | 174.2 | δ | ε | 1 | δ → ε (S) | X |
| 6 | 0.0192 | 83.8 | -166.3 | -121.9 | 132.1 | PII′ | β | 1 | PII′ → β | X |
| 6 | 0.0192 | 57.1 | 44.5 | -152.5 | 158.8 | αL | β | 1 | αL → β | X |
| 6 | 0.0192 | -128.3 | 98.7 | 56.7 | -133.3 | ζ | PII′ | 1 | ζ → PII′ | X |
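A list of peak centers like the one above could be used to label a new two-residue conformation by finding its nearest center. The sketch below does this for five of the listed motifs, with angle values copied from the list; the wraparound distance and the name `nearest_motif` are assumptions for illustration, not the authors' assignment procedure.

```python
import math

# Peak centers (phi_i, psi_i, phi_i+1, psi_i+1) copied from the list above.
MOTIF_CENTERS = {
    "α-helix / 3₁₀-helix": (-63.4, -42.0, -64.0, -40.6),
    "β-strand": (-125.5, 132.4, -118.0, 130.2),
    "Type I Turn": (-65.5, -21.4, -90.3, 1.5),
    "Type II Turn": (-57.2, 131.0, 82.4, -0.6),
    "Schellman Motif": (-88.1, 1.3, 87.9, 5.7),
}

def angle_diff(a, b):
    return (a - b + 180.0) % 360.0 - 180.0

def torsion_distance(p, q):
    return math.sqrt(sum(angle_diff(a, b) ** 2 for a, b in zip(p, q)))

def nearest_motif(point):
    """Return (motif_name, distance_in_degrees) for the closest listed center."""
    name = min(MOTIF_CENTERS, key=lambda n: torsion_distance(point, MOTIF_CENTERS[n]))
    return name, torsion_distance(point, MOTIF_CENTERS[name])

label, dist = nearest_motif((-55.0, 130.0, 85.0, 0.0))
```

In practice a distance cutoff would also be needed, so that conformations far from every peak are left unlabeled rather than forced into the closest motif.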
![Page 18: Identification of common protein motifs through the application of machine learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56813788550346895d9f248f/html5/thumbnails/18.jpg)
Motif “shapes”
▪ Each motif analyzed by plotting each motif range
▪ Understand the shape of the cluster/motif
Results: New insight into each motif’s structure
▪ Context
▪ Comparisons
![Page 19: Identification of common protein motifs through the application of machine learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56813788550346895d9f248f/html5/thumbnails/19.jpg)
Type II vs. Type II′
Hairpin turns
▪ 180° turn
▪ Two residues
Defined as mirror images of each other
Distributions show differences between the two structures
Nearly four years in the making…
Figure axes: φ, ψ
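The mirror-image definition can be checked against the peak centers reported in the motif list: mirroring a backbone conformation flips the sign of every torsion angle, so the mirrored Type II center can be compared with the observed Type II′ center. This is a minimal sketch using only those two listed centers; the helper names are hypothetical.

```python
def angle_diff(a, b):
    """Smallest signed difference a - b in degrees, wrapped into [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0

# Peak centers (phi_i, psi_i, phi_i+1, psi_i+1) from the motif list.
TYPE_II = (-57.2, 131.0, 82.4, -0.6)
TYPE_II_PRIME = (56.7, -133.5, -73.7, -10.7)

def mirror(point):
    """Mirror image of a conformation: negate every torsion angle."""
    return tuple(-a for a in point)

def deviations(p, q):
    """Per-angle signed differences between two conformations."""
    return tuple(angle_diff(a, b) for a, b in zip(p, q))

# How far the observed Type II' peak sits from an exact mirror of Type II.
dev = deviations(TYPE_II_PRIME, mirror(TYPE_II))
# approximately (-0.5, -2.5, 8.7, -11.3) degrees
```

The first residue matches the mirror closely, while the second residue's angles are off by roughly 9° to 11°, in line with the slide's point that the two distributions differ.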
![Page 20: Identification of common protein motifs through the application of machine learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56813788550346895d9f248f/html5/thumbnails/20.jpg)
The results go on…
Motif analysis
▪ Viral forming of “Pangea”
Range and peak method sections
▪ Adapting cuevas for our data
▪ Python automation
▪ Identification of 3₁₀ helix & Type I turn
6D, 8D, 10D and 12D clustering
▪ Full helix caps, loops, half-turns…
For the full story, a manuscript is being prepared for publication: Hollingsworth et al. “The protein parts list: motif identification through the application of machine learning.” (Unpublished)
![Page 21: Identification of common protein motifs through the application of machine learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56813788550346895d9f248f/html5/thumbnails/21.jpg)
Cuevas was successful in identifying both documented and undocumented motifs
▪ Previously described: linear groups, helix caps, β-turns (& reverses), β-bulges, α-turns, loops, helix bends, π-structures…
▪ Numerous new motifs
▪ Successful from 4D through 20D
Results form the “Protein Parts” list
▪ Comprehensive list of all common motifs found in proteins
![Page 22: Identification of common protein motifs through the application of machine learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56813788550346895d9f248f/html5/thumbnails/22.jpg)
• Dr. P. Andrew Karplus
• Dr. Weng-Keen Wong
• Dr. Donald Berkholz
• Dr. Dale Tronrud
• Dr. Kevin Ahern
• Howard Hughes Medical Institute