Topology based deep learning for drug discovery
Source: users.math.msu.edu/users/wei/Topology2017.pdf
![Page 1: Topology based deep learning for drug discoveryusers.math.msu.edu/users/wei/Topology2017.pdf · Poincare-Hopf index Morse theory. ... topological fingerprints Feature vector Training](https://reader031.fdocuments.in/reader031/viewer/2022020302/5ab4fd7b7f8b9a0f058c567b/html5/thumbnails/1.jpg)
Guowei Wei
Departments of Mathematics
Michigan State University
http://www.math.msu.edu/~wei
The 3rd Annual Meeting of SIAM Central States Section
September 29 - October 1, 2017
Colorado State University
Grant support:
NSF, NIH, MSU and BMS
Topology based deep learning for drug discovery
![Page 2: Topology based deep learning for drug discoveryusers.math.msu.edu/users/wei/Topology2017.pdf · Poincare-Hopf index Morse theory. ... topological fingerprints Feature vector Training](https://reader031.fdocuments.in/reader031/viewer/2022020302/5ab4fd7b7f8b9a0f058c567b/html5/thumbnails/2.jpg)
Drug design and discovery
1) Disease identification
2) Target hypothesis
3) Virtual screening
4) Drug structural optimization in the target binding site
5) Preclinical in vitro and in vivo test
6) Clinical test
7) Optimize drug’s efficacy, toxicity, pharmacokinetics, and pharmacodynamics properties (quantitative systems pharmacology)
Figure labels: Influenza (flu virus); M2 channel; Amantadine; M2-A complex
![Page 3: Topology based deep learning for drug discoveryusers.math.msu.edu/users/wei/Topology2017.pdf · Poincare-Hopf index Morse theory. ... topological fingerprints Feature vector Training](https://reader031.fdocuments.in/reader031/viewer/2022020302/5ab4fd7b7f8b9a0f058c567b/html5/thumbnails/3.jpg)
CPU
GPU
TPU
Half of all jobs will be done by robots in the near future
Welcome to the big-data era
![Page 4: Topology based deep learning for drug discoveryusers.math.msu.edu/users/wei/Topology2017.pdf · Poincare-Hopf index Morse theory. ... topological fingerprints Feature vector Training](https://reader031.fdocuments.in/reader031/viewer/2022020302/5ab4fd7b7f8b9a0f058c567b/html5/thumbnails/4.jpg)
| Release | Date | GenBank bases | GenBank sequences | Whole Genome Shotgun bases | Whole Genome Shotgun sequences |
|---|---|---|---|---|---|
| 219 | Apr 2017 | 231824951552 | 200877884 | 2035032639807 | 451840147 |
![Page 5: Topology based deep learning for drug discoveryusers.math.msu.edu/users/wei/Topology2017.pdf · Poincare-Hopf index Morse theory. ... topological fingerprints Feature vector Training](https://reader031.fdocuments.in/reader031/viewer/2022020302/5ab4fd7b7f8b9a0f058c567b/html5/thumbnails/5.jpg)
Biological sciences are undergoing a historic transition: From
qualitative, phenomenological, and descriptive to quantitative,
analytical and predictive, as quantum physics did a century ago
Yearly Growth of Total Structures in the Protein Data Bank
![Page 6: Topology based deep learning for drug discoveryusers.math.msu.edu/users/wei/Topology2017.pdf · Poincare-Hopf index Morse theory. ... topological fingerprints Feature vector Training](https://reader031.fdocuments.in/reader031/viewer/2022020302/5ab4fd7b7f8b9a0f058c567b/html5/thumbnails/6.jpg)
Deep learning
Fukushima (1980) – Neo-Cognitron; LeCun (1998) – Convolutional Neural Networks (CNN);…
![Page 7: Topology based deep learning for drug discoveryusers.math.msu.edu/users/wei/Topology2017.pdf · Poincare-Hopf index Morse theory. ... topological fingerprints Feature vector Training](https://reader031.fdocuments.in/reader031/viewer/2022020302/5ab4fd7b7f8b9a0f058c567b/html5/thumbnails/7.jpg)
How to do deep learning for 3D biomolecular data?
Obstacles for deep learning of 3D biomolecules:
• Geometric dimensionality: $\mathbb{R}^{3N}$,
where $N \sim 5500$ for a protein.
• Machine learning dimensionality: $> m \cdot 1024^3$, where $m$ is the
number of atom types in a protein.
• Molecules have different sizes --- non-scalable.
• Complexity: biochemistry & biophysics
Solution:
• Topological simplification
• Dimensionality reduction & unification (scalability)
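The two dimensionality figures on this slide can be put side by side with a little arithmetic. This is only a sketch: the slide does not give a value for m, so `M_ATOM_TYPES = 4` below is a hypothetical choice made purely for illustration.

```python
# Back-of-the-envelope sizes for the two representations on this slide.
N_ATOMS = 5500        # N ~ 5500 atoms for a protein (from the slide)
GRID = 1024           # grid points per axis (from the slide's 1024^3)
M_ATOM_TYPES = 4      # hypothetical value of m, not given on the slide

geometric_dim = 3 * N_ATOMS              # coordinates in R^{3N}
voxel_dim = M_ATOM_TYPES * GRID ** 3     # one 1024^3 grid per atom type

print(geometric_dim)   # 16500
print(voxel_dim)       # 4294967296
```

Even with this small assumed m, the voxel representation is several orders of magnitude larger than the raw coordinates, which is the motivation for the dimensionality reduction listed under "Solution".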
![Page 8: Topology based deep learning for drug discoveryusers.math.msu.edu/users/wei/Topology2017.pdf · Poincare-Hopf index Morse theory. ... topological fingerprints Feature vector Training](https://reader031.fdocuments.in/reader031/viewer/2022020302/5ab4fd7b7f8b9a0f058c567b/html5/thumbnails/8.jpg)
Möbius Strips (1858)
Klein Bottle (1882)
Classical topological objects
Torus
Double Torus
Sphere
Trefoil Knot
Seven Bridges of Königsberg
Leonhard Euler (1735)
Leonhard Paul Euler (Swiss mathematician,
April 15, 1707 – Sept 18, 1783)
![Page 9: Topology based deep learning for drug discoveryusers.math.msu.edu/users/wei/Topology2017.pdf · Poincare-Hopf index Morse theory. ... topological fingerprints Feature vector Training](https://reader031.fdocuments.in/reader031/viewer/2022020302/5ab4fd7b7f8b9a0f058c567b/html5/thumbnails/9.jpg)
Topological invariants: Betti numbers
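$\beta_0$ is the number of connected components.
$\beta_1$ is the number of tunnels or circles.
$\beta_2$ is the number of cavities or voids.

| Object | $\beta_0$ | $\beta_1$ | $\beta_2$ |
|---|---|---|---|
| Point | 1 | 0 | 0 |
| Circle | 1 | 1 | 0 |
| Sphere | 1 | 0 | 1 |
| Torus | 1 | 2 | 1 |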
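As a quick consistency check on the Betti numbers of the four shapes on this slide, the alternating sum of Betti numbers gives the Euler characteristic (the Euler-Poincaré formula). The slide does not mention this check; the sketch below is added only as an illustration, and the dictionary and function names are hypothetical.

```python
# Betti numbers (beta_0, beta_1, beta_2) of the four shapes on this slide.
BETTI = {
    "point": (1, 0, 0),
    "circle": (1, 1, 0),
    "sphere": (1, 0, 1),
    "torus": (1, 2, 1),
}

def euler_characteristic(betti):
    """Alternating sum beta_0 - beta_1 + beta_2 - ..."""
    return sum((-1) ** k * b for k, b in enumerate(betti))

for name, betti in BETTI.items():
    print(name, euler_characteristic(betti))
# point 1
# circle 0
# sphere 2
# torus 0
```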
![Page 10: Topology based deep learning for drug discoveryusers.math.msu.edu/users/wei/Topology2017.pdf · Poincare-Hopf index Morse theory. ... topological fingerprints Feature vector Training](https://reader031.fdocuments.in/reader031/viewer/2022020302/5ab4fd7b7f8b9a0f058c567b/html5/thumbnails/10.jpg)
Topological simplification
HIV: 4.2 million atoms
Trefoil Knot
DNA
Mug Doughnut
Poincaré-Hopf index, Morse theory
![Page 11: Topology based deep learning for drug discoveryusers.math.msu.edu/users/wei/Topology2017.pdf · Poincare-Hopf index Morse theory. ... topological fingerprints Feature vector Training](https://reader031.fdocuments.in/reader031/viewer/2022020302/5ab4fd7b7f8b9a0f058c567b/html5/thumbnails/11.jpg)
Opportunities, challenges and promises
Challenges with topological methods:
Geometric methods are often inundated with too
much structural detail.
Topological tools incur too much reduction of
original geometric information.
Topology is hardly used for quantitative prediction.
Opportunities from topological methods:
New approach for big data characterization and classification.
Dramatic reduction of dimensionality and data size.
Applicable to a variety of fields.
Promises from persistent homology:
Embeds geometric information in topological invariants.
Bridges the gap between geometry and topology.
![Page 12: Topology based deep learning for drug discoveryusers.math.msu.edu/users/wei/Topology2017.pdf · Poincare-Hopf index Morse theory. ... topological fingerprints Feature vector Training](https://reader031.fdocuments.in/reader031/viewer/2022020302/5ab4fd7b7f8b9a0f058c567b/html5/thumbnails/12.jpg)
What is the topology of a benzene?
What is the topology of a H2O-CO2 complex?
Level sets generated by
Laplace-Beltrami flows:
Electron density level sets computed
by using quantum mechanics:
Persistent homology answers the following questions
![Page 13: Topology based deep learning for drug discoveryusers.math.msu.edu/users/wei/Topology2017.pdf · Poincare-Hopf index Morse theory. ... topological fingerprints Feature vector Training](https://reader031.fdocuments.in/reader031/viewer/2022020302/5ab4fd7b7f8b9a0f058c567b/html5/thumbnails/13.jpg)
Vietoris-Rips complexes of planar point sets
Simplexes:
0-simplex 1-simplex 2-simplex 3-simplex
Simplicial complexes of ten points:
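A Vietoris-Rips complex of a planar point set can be sketched in a few lines. The version below assumes the convention that a set of points spans a simplex when every pairwise distance is at most `epsilon` (some texts use balls of radius epsilon instead, which doubles the threshold). The function name and the example points are hypothetical, and the brute-force search is only meant for tiny inputs.

```python
from itertools import combinations
from math import dist

def vietoris_rips(points, epsilon, max_dim=2):
    """Return the simplices (as index tuples) of the Vietoris-Rips complex.

    A set of points spans a simplex when every pairwise distance is
    at most epsilon. Brute force, so only suitable for tiny inputs.
    """
    n = len(points)
    close = {
        (i, j)
        for i, j in combinations(range(n), 2)
        if dist(points[i], points[j]) <= epsilon
    }
    simplices = [(i,) for i in range(n)]
    for k in range(2, max_dim + 2):           # k vertices -> (k-1)-simplex
        for verts in combinations(range(n), k):
            if all(pair in close for pair in combinations(verts, 2)):
                simplices.append(verts)
    return simplices

# Four corners of a unit square (hypothetical example points).
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(len(vietoris_rips(square, 1.0)))   # 8: 4 vertices + 4 edges
print(len(vietoris_rips(square, 1.5)))   # 14: 4 vertices + 6 edges + 4 triangles
```

At `epsilon = 1.0` the square has a hole (only its sides are present); at `epsilon = 1.5` the diagonals appear and the triangles fill it in, which is the kind of change a filtration tracks.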
![Page 14: Topology based deep learning for drug discoveryusers.math.msu.edu/users/wei/Topology2017.pdf · Poincare-Hopf index Morse theory. ... topological fingerprints Feature vector Training](https://reader031.fdocuments.in/reader031/viewer/2022020302/5ab4fd7b7f8b9a0f058c567b/html5/thumbnails/14.jpg)
Topological modeling - Persistent homology
Simplexes:
0-simplex 1-simplex 2-simplex 3-simplex
k-chain: $c = \sum_i c_i \sigma_k^i$
Chain group: $C_k(K, \mathbb{Z}_2)$
Boundary operator: $\partial_k \sigma_k = \sum_{i=0}^{k} (-1)^i \{v_0, v_1, \ldots, \hat{v}_i, \ldots, v_k\}$
$Z_k = \mathrm{Ker}\,\partial_k$, $B_k = \mathrm{Im}\,\partial_{k+1}$, $H_k = Z_k / B_k$, $\beta_k = \mathrm{Rank}\,H_k$
Frosini and Landi (1999), Robins (1999), Edelsbrunner, Letscher and Zomorodian (2002), Kaczynski, Mischaikow and Mrozek (2004), Zomorodian and Carlsson (2005), Edelsbrunner and Harer (2007), Ghrist (2008), …
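The rank, image and kernel notation on this slide translates directly into linear algebra over $\mathbb{Z}_2$: $\beta_k = \dim \mathrm{Ker}\,\partial_k - \dim \mathrm{Im}\,\partial_{k+1}$. The sketch below is one minimal way to compute this, assuming the complex is given as an explicit list of simplices with all faces included. The function names and the bitmask encoding of matrix rows are illustrative choices, not taken from the slides.

```python
from itertools import combinations

def rank_gf2(rows):
    """Rank over Z_2 of a matrix whose rows are given as integer bitmasks."""
    rank = 0
    rows = list(rows)
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot                  # lowest set bit of the pivot row
        rows = [r ^ pivot if r & low else r for r in rows]
    return rank

def betti_numbers(simplices):
    """Betti numbers over Z_2 of a simplicial complex.

    `simplices` must list every simplex (all faces included) as a
    sorted tuple of vertex indices.
    """
    by_dim = {}
    for s in simplices:
        by_dim.setdefault(len(s) - 1, []).append(s)
    top = max(by_dim)
    # ranks[k] = rank of the boundary operator d_k : C_k -> C_{k-1}
    ranks = {0: 0, top + 1: 0}
    for k in range(1, top + 1):
        index = {s: i for i, s in enumerate(by_dim[k - 1])}
        rows = []
        for s in by_dim[k]:
            mask = 0
            for face in combinations(s, k):   # drop one vertex at a time
                mask |= 1 << index[face]
            rows.append(mask)
        ranks[k] = rank_gf2(rows)
    # beta_k = dim Ker d_k - dim Im d_{k+1}
    return [len(by_dim[k]) - ranks[k] - ranks[k + 1] for k in range(top + 1)]

# Hollow triangle: three vertices and three edges, no filled face.
circle = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]
print(betti_numbers(circle))   # [1, 1]
```

The hollow triangle is a combinatorial circle, so it has one connected component and one loop. Over $\mathbb{Z}_2$ the signs $(-1)^i$ in the boundary operator play no role, which is why the code uses XOR.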
![Page 15: Topology based deep learning for drug discoveryusers.math.msu.edu/users/wei/Topology2017.pdf · Poincare-Hopf index Morse theory. ... topological fingerprints Feature vector Training](https://reader031.fdocuments.in/reader031/viewer/2022020302/5ab4fd7b7f8b9a0f058c567b/html5/thumbnails/15.jpg)
Vietoris-Rips complexes, persistent homology and
persistent barcodes (Xia, Wei, 2014)
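For Betti-0, a persistent barcode of a Vietoris-Rips filtration can be computed with nothing more than sorted edges and a union-find structure. The sketch below assumes the filtration value of an edge is its length and that every point is born at 0. It is a hand-rolled illustration with hypothetical names and coordinates, not the software used in the cited work, and it does not cover Betti-1 or Betti-2.

```python
from itertools import combinations
from math import dist, inf

def betti0_barcode(points):
    """Betti-0 bars (birth, death) of the Vietoris-Rips filtration.

    Every point is born at filtration value 0. Edges are added in order
    of length; each edge that merges two components ends one bar.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]     # path halving
            i = parent[i]
        return i

    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    bars = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            bars.append((0.0, length))
    bars.append((0.0, inf))                   # the component that never dies
    return bars

# Two tight pairs of points far apart (hypothetical coordinates).
pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
print(betti0_barcode(pts))
# [(0.0, 1.0), (0.0, 1.0), (0.0, 9.0), (0.0, inf)]
```

The two short bars record the spacing within each pair and the long bar records the gap between the pairs, so bar lengths carry geometric information and not only connectivity.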
![Page 16: Topology based deep learning for drug discoveryusers.math.msu.edu/users/wei/Topology2017.pdf · Poincare-Hopf index Morse theory. ... topological fingerprints Feature vector Training](https://reader031.fdocuments.in/reader031/viewer/2022020302/5ab4fd7b7f8b9a0f058c567b/html5/thumbnails/16.jpg)
Topological fingerprints of an alpha helix
(Xia & Wei, IJNMBE, 2014)
Short bars are NOT noise!
![Page 17: Topology based deep learning for drug discoveryusers.math.msu.edu/users/wei/Topology2017.pdf · Poincare-Hopf index Morse theory. ... topological fingerprints Feature vector Training](https://reader031.fdocuments.in/reader031/viewer/2022020302/5ab4fd7b7f8b9a0f058c567b/html5/thumbnails/17.jpg)
Topological fingerprints of beta barrel
(Xia & Wei, IJNMBE, 2014)
Protein: 2GR8
![Page 18: Topology based deep learning for drug discoveryusers.math.msu.edu/users/wei/Topology2017.pdf · Poincare-Hopf index Morse theory. ... topological fingerprints Feature vector Training](https://reader031.fdocuments.in/reader031/viewer/2022020302/5ab4fd7b7f8b9a0f058c567b/html5/thumbnails/18.jpg)
Topological noise reduction
Panels: Original data; Ten-iteration denoising; Twenty-iteration denoising; Forty-iteration denoising
(Xia & Wei, IJNMBE 2015)
![Page 19: Topology based deep learning for drug discoveryusers.math.msu.edu/users/wei/Topology2017.pdf · Poincare-Hopf index Morse theory. ... topological fingerprints Feature vector Training](https://reader031.fdocuments.in/reader031/viewer/2022020302/5ab4fd7b7f8b9a0f058c567b/html5/thumbnails/19.jpg)
Persistent homology for ill-posed inverse problems
Original data: microtubule
Fitted with one type of tubulin: PCC=0.96
Fitted with two types of tubulins: PCC=0.96
(Xia, Wei, IJNMBE, 2015)
![Page 20: Topology based deep learning for drug discoveryusers.math.msu.edu/users/wei/Topology2017.pdf · Poincare-Hopf index Morse theory. ... topological fingerprints Feature vector Training](https://reader031.fdocuments.in/reader031/viewer/2022020302/5ab4fd7b7f8b9a0f058c567b/html5/thumbnails/20.jpg)
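Objective oriented persistent homology
$G = \int \gamma\,[\mathrm{area}]\,d\mathbf{r}, \quad [\mathrm{area}] = |\nabla S|$
where $\gamma$ is the surface tension, and $S$ is a surface
characteristic function:
S=1
S=0
Generalized Laplace-Beltrami flow:
$\frac{\partial S}{\partial t} = |\nabla S|\,\nabla\cdot\left(\gamma\,\frac{\nabla S}{|\nabla S|}\right)$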
Objective: Minimal surface energy
(Wang & Wei, JCP, 2016)
![Page 21: Topology based deep learning for drug discoveryusers.math.msu.edu/users/wei/Topology2017.pdf · Poincare-Hopf index Morse theory. ... topological fingerprints Feature vector Training](https://reader031.fdocuments.in/reader031/viewer/2022020302/5ab4fd7b7f8b9a0f058c567b/html5/thumbnails/21.jpg)
$\frac{\partial S}{\partial t} = |\nabla S|\,\nabla\cdot\left(\gamma\,\frac{\nabla S}{|\nabla S|}\right)$
Level sets generated from Laplace-Beltrami flow
Objective oriented persistent homology
(Wang & Wei, JCP, 2016)
![Page 22: Topology based deep learning for drug discoveryusers.math.msu.edu/users/wei/Topology2017.pdf · Poincare-Hopf index Morse theory. ... topological fingerprints Feature vector Training](https://reader031.fdocuments.in/reader031/viewer/2022020302/5ab4fd7b7f8b9a0f058c567b/html5/thumbnails/22.jpg)
Topological analysis
of protein folding
ID: 1I2T
(Xia, Wei, IJNMBE, 2014)
$E^{\mathrm{Bond}} \sim \sum_j L_j^{\beta_0}$
$E^{\mathrm{Total}} \sim 1 / \sum_j L_j^{\beta_1}$
Quantitative!
![Page 23: Topology based deep learning for drug discoveryusers.math.msu.edu/users/wei/Topology2017.pdf · Poincare-Hopf index Morse theory. ... topological fingerprints Feature vector Training](https://reader031.fdocuments.in/reader031/viewer/2022020302/5ab4fd7b7f8b9a0f058c567b/html5/thumbnails/23.jpg)
2D persistence in protein 1UBQ unfolding
(Xia & Wei, JCC, 2015)
Figure: panels for Betti-0, Betti-1 and Betti-2; axes: Time, Radius; color scale: log10(N)
![Page 24: Topology based deep learning for drug discoveryusers.math.msu.edu/users/wei/Topology2017.pdf · Poincare-Hopf index Morse theory. ... topological fingerprints Feature vector Training](https://reader031.fdocuments.in/reader031/viewer/2022020302/5ab4fd7b7f8b9a0f058c567b/html5/thumbnails/24.jpg)
Multicomponent and multichannel persistent
homology for a protein-drug complex
Figure axes: Radius, Component
Components are generated from element specific persistent
homology. Eight channels are constructed from births,
deaths and persistences at Betti-0, Betti-1 and Betti-2.
(Cang & Wei, IJNMBE, 2017)
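One way to picture how births, deaths and persistences become image-like channels is to bin a barcode along the radius axis. The sketch below is an assumption about the general idea, not the published construction: the bin layout, the truncation at `r_max`, and the function name `barcode_channels` are all hypothetical.

```python
def barcode_channels(bars, r_max, n_bins):
    """Turn one barcode into three fixed-length channels.

    births[i], deaths[i]: number of bars that start / end in bin i.
    alive[i]: number of bars that overlap bin i.
    """
    width = r_max / n_bins
    births = [0] * n_bins
    deaths = [0] * n_bins
    alive = [0] * n_bins

    def bin_of(r):
        return min(int(r / width), n_bins - 1)   # clamp r == r_max into last bin

    for birth, death in bars:
        death = min(death, r_max)                # truncate long bars at r_max
        b, d = bin_of(birth), bin_of(death)
        births[b] += 1
        deaths[d] += 1
        for i in range(b, d + 1):
            alive[i] += 1
    return births, deaths, alive

# Hypothetical Betti-1 bars, binned on [0, 4] with 4 bins of width 1.
bars = [(0.5, 2.5), (1.2, 1.8)]
print(barcode_channels(bars, r_max=4.0, n_bins=4))
# ([1, 1, 0, 0], [0, 1, 1, 0], [1, 2, 1, 0])
```

Because the output length depends only on `n_bins`, molecules of different sizes map to vectors of the same shape. Repeating this per Betti number and per element specific group would give a stack of channels over a shared radius axis.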
![Page 25: Topology based deep learning for drug discoveryusers.math.msu.edu/users/wei/Topology2017.pdf · Poincare-Hopf index Morse theory. ... topological fingerprints Feature vector Training](https://reader031.fdocuments.in/reader031/viewer/2022020302/5ab4fd7b7f8b9a0f058c567b/html5/thumbnails/25.jpg)
Topology based learning architecture
(Cang & Wei, IJNMBE, 2017)
Feature pipeline: Protein-ligand complex → Element specific groups → Correlation matrix → Topological fingerprints → Feature vector
Training and prediction: Training data and known labels → Learning algorithm → Trained model; Query → Trained model → Prediction
![Page 26: Topology based deep learning for drug discoveryusers.math.msu.edu/users/wei/Topology2017.pdf · Poincare-Hopf index Morse theory. ... topological fingerprints Feature vector Training](https://reader031.fdocuments.in/reader031/viewer/2022020302/5ab4fd7b7f8b9a0f058c567b/html5/thumbnails/26.jpg)
Topological fingerprint based machine learning method for the classification of 2400 proteins
Protein domains: 85% accuracy (Alzheimer’s disease)
Influenza A virus drug inhibition: 96% accuracy
Hemoglobins in their relaxed and taut forms: 80% accuracy
55 classification tasks of protein superfamilies over 1357 proteins from Protein Classification Benchmark Collection: 82% accuracy
(Cang et al, MBMB, 2015)
![Page 27: Topology based deep learning for drug discoveryusers.math.msu.edu/users/wei/Topology2017.pdf · Poincare-Hopf index Morse theory. ... topological fingerprints Feature vector Training](https://reader031.fdocuments.in/reader031/viewer/2022020302/5ab4fd7b7f8b9a0f058c567b/html5/thumbnails/27.jpg)
Topological convolutional deep learning architecture
Pipeline: Original complex → Classify atoms into element specific groups → Generate topological fingerprints → Multichannel images (54x200) → Convolutional deep learning neural network → Prediction
Network layers: Convolution (128x200) → … → Pooling (128x100) → Flattening (1xN) → Prediction
(Cang & Wei, PLOS CB, 2017)
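The layer sizes quoted on this slide can be traced with simple shape bookkeeping. The sketch below assumes a 1D convolution with 'same' padding and stride 1, non-overlapping pooling of size 2, and a flattening step applied directly after the pooling shown. The slide elides intermediate layers and leaves N unspecified, so the final size computed here holds only under these assumptions.

```python
def conv1d_same(shape, n_filters):
    """Shape after a 1D convolution with 'same' padding and stride 1."""
    _channels, length = shape
    return (n_filters, length)

def pool1d(shape, pool_size):
    """Shape after non-overlapping 1D pooling."""
    channels, length = shape
    return (channels, length // pool_size)

def flatten(shape):
    """Shape after collapsing channels and length into one row."""
    channels, length = shape
    return (1, channels * length)

x = (54, 200)              # multichannel image: 54 channels, 200 bins
x = conv1d_same(x, 128)    # (128, 200)
x = pool1d(x, 2)           # (128, 100)
x = flatten(x)             # (1, 12800) under the assumptions above
print(x)
```

The convolution slides along the 200-bin axis only, treating the 54 topological channels the way an image network treats color channels.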
![Page 28: Topology based deep learning for drug discoveryusers.math.msu.edu/users/wei/Topology2017.pdf · Poincare-Hopf index Morse theory. ... topological fingerprints Feature vector Training](https://reader031.fdocuments.in/reader031/viewer/2022020302/5ab4fd7b7f8b9a0f058c567b/html5/thumbnails/28.jpg)
Blind binding affinity prediction of PDBBind v2013 core set of 195 protein-ligand complexes
![Page 29: Topology based deep learning for drug discoveryusers.math.msu.edu/users/wei/Topology2017.pdf · Poincare-Hopf index Morse theory. ... topological fingerprints Feature vector Training](https://reader031.fdocuments.in/reader031/viewer/2022020302/5ab4fd7b7f8b9a0f058c567b/html5/thumbnails/29.jpg)
Directory of Useful Decoys (DUD): classification of 98266 compounds containing 95316 decoys and 2950
active ligands binding to 40 targets from six families
![Page 30: Topology based deep learning for drug discoveryusers.math.msu.edu/users/wei/Topology2017.pdf · Poincare-Hopf index Morse theory. ... topological fingerprints Feature vector Training](https://reader031.fdocuments.in/reader031/viewer/2022020302/5ab4fd7b7f8b9a0f058c567b/html5/thumbnails/30.jpg)
Drug Design Data Resource (D3R) Grand Challenge 2
Given: Farnesoid X receptor (FXR) and 102 ligands
Tasks: Dock 102 ligands to FXR, and compute their poses, binding free energies and energy ranking
Duc Nguyen
![Page 31: Topology based deep learning for drug discoveryusers.math.msu.edu/users/wei/Topology2017.pdf · Poincare-Hopf index Morse theory. ... topological fingerprints Feature vector Training](https://reader031.fdocuments.in/reader031/viewer/2022020302/5ab4fd7b7f8b9a0f058c567b/html5/thumbnails/31.jpg)
D3R Grand Challenge 2
Given: Farnesoid X receptor (FXR) and 102 ligands
Tasks: Dock 102 ligands to FXR, and compute their poses, binding free energies and energy ranking
![Page 32: Topology based deep learning for drug discoveryusers.math.msu.edu/users/wei/Topology2017.pdf · Poincare-Hopf index Morse theory. ... topological fingerprints Feature vector Training](https://reader031.fdocuments.in/reader031/viewer/2022020302/5ab4fd7b7f8b9a0f058c567b/html5/thumbnails/32.jpg)
Topological Multi-Task Deep Learning
Diagram labels: Topological feature extraction; Convolution and pooling; Multi-task topological deep learning; Task specific representation; Membrane protein mutation impacts; Globular protein mutation impacts
(Cang & Wei, PLOS CB, 2017)
![Page 33: Topology based deep learning for drug discoveryusers.math.msu.edu/users/wei/Topology2017.pdf · Poincare-Hopf index Morse theory. ... topological fingerprints Feature vector Training](https://reader031.fdocuments.in/reader031/viewer/2022020302/5ab4fd7b7f8b9a0f058c567b/html5/thumbnails/33.jpg)
Blind prediction of mutation energies
Prediction correlations for 2648 mutations on globular proteins
Prediction correlations for 223 mutations on membrane proteins
(Cang & Wei, Bioinformatics, 2017)
![Page 34: Topology based deep learning for drug discoveryusers.math.msu.edu/users/wei/Topology2017.pdf · Poincare-Hopf index Morse theory. ... topological fingerprints Feature vector Training](https://reader031.fdocuments.in/reader031/viewer/2022020302/5ab4fd7b7f8b9a0f058c567b/html5/thumbnails/34.jpg)
Prediction of partition coefficients: Star Set (223 molecules)
Wu, Wang, Zhao, Wang, Wei, 2016
![Page 35: Topology based deep learning for drug discoveryusers.math.msu.edu/users/wei/Topology2017.pdf · Poincare-Hopf index Morse theory. ... topological fingerprints Feature vector Training](https://reader031.fdocuments.in/reader031/viewer/2022020302/5ab4fd7b7f8b9a0f058c567b/html5/thumbnails/35.jpg)
Concluding remarks
• Multidimensional, multicomponent, multichannel and objective oriented persistent homologies are introduced to retain essential chemical and biological information during the topological simplification of biomolecular geometric complexity.
• The abovementioned approaches are integrated with advanced machine learning, including deep learning, to achieve state-of-the-art predictions of protein-ligand binding affinities & ranking, mutation induced protein stability changes, and drug partition coefficients.
Take home messages
• Molecular based mathbio (3 NSF-Simons Centers)
• Topological data analysis
• Machine learning
![Page 36: Topology based deep learning for drug discoveryusers.math.msu.edu/users/wei/Topology2017.pdf · Poincare-Hopf index Morse theory. ... topological fingerprints Feature vector Training](https://reader031.fdocuments.in/reader031/viewer/2022020302/5ab4fd7b7f8b9a0f058c567b/html5/thumbnails/36.jpg)
P Bates (MSU)
N Baker (PNNL)
Z Burton (MSU)
X Ye (UKLR)
K Dong (MSU)
J Wang (NSF)
M Feig (MSU)
H Hong (MSU)
J Hu (MSU)
Y Tong (MSU)