Automatic chemical design using a continuous, data-driven ...duvenaud/talks/mol-auto-talk.pdf ·...
Transcript of Automatic chemical design using a continuous, data-driven ...duvenaud/talks/mol-auto-talk.pdf ·...
Automatic chemical design using a continuous, data-driven
representation of moleculesRafa Gómez-Bombarelli, David Duvenaud, José Miguel Hernández-Lobato,
Jorge Aguilera-Iparraguirre, Timothy Hirzel, Ryan P. Adams, Alán Aspuru-Guzik
How to design molecules?
• Can predict properties of molecules (QSAR)
• Want a molecule with a certain property (inverse QSAR)
How to optimize molecules?• Build a giant library of molecules and try them all
• Or: Local search based on small changes
Cornelius J. O'Connor, Henning S.G. Beckmanna and David R. Spring
Discrete optimization is basically impossible
• In more than 10 or 20 dimensions, search is too slow because no way to know which direction to go
• Gradient gives D hints for your D parameters
• need some version of@molecule
@property
Discrete optimization is basically impossible
• In more than 10 or 20 dimensions, search is too slow because no way to know which direction to go
• Gradient gives D hints for your D parameters
• need some version of@molecule
@property
Discrete optimization is basically impossible
• In more than 10 or 20 dimensions, search is too slow because no way to know which direction to go
• Gradient gives D hints for your D parameters
• need some version of@molecule
@property
Nguyen, Dosovitskiy, Yosinski, Brox, Clune
Credit: Natalie Fey
Credit: Natalie Fey
What is a molecule?Graph SMILES string
Text autoencoders
• Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz, Samy Bengio
TextVAE-Interpola*on
Repurposing text autoencoders
c1ccccc1 c1ccccc1
ENCODERNeural Network
DECODERNeural Network
CONTINUOUS MOLECULARREPRESENTATION
Latent Space
Discrete Structure SMILES
Discrete Structure SMILES
Can be trained on unlabeled data
Map of 220,000 Drugs
Map of 100,000 OLEDs
Random Organic LEDs
F S S F O OO S
N N N
HS S S S SH
HO SHH2N NH2
O
FHNH2N
NH3
F F
HO F
N NH2
OHSHSO
F S FH2S
OHHO OH
FHS
NH2
OH
O
N SH
O
NH2H2N FHF
SHS
S
F
H2O
SHSHS F
CH4
H2N OHN
F F
HO S S S S S S S S S S S SH
N
S N
NO
O
N
N
N N
N
O O
OO
S O
N
N N
NN N
N
NN
N
NN N
NN
NS
N
N N
N
N
N N
N
N
N
O
O
NN
S N
NNN
N
N NN
N
O
NN N
N
N
NNO
NN
NO
NN
N
NNN
N
Standard autoencoderVariational autoencoder
Molecules near aspirin
O
NHN+
O
-O
Cl
Cl
NH
NH
N+
O
O-Cl
Cl
Br
N
N+
O
-OCl BrCl
O
NH
N+
O
O-Cl
Cl
SHCl
ON+
O
-O
NH
Cl
Cl
O
NH
N
N+
O
-OCl
F
O
ON+
O
-OCl
Cl
HN
O
NH
N+
O
O-Br
Cl O
ON+
O
O-Cl
Br O
NH
NH
N+
O
O-Cl
Cl
O
O
HN
N+
O
-O
HN Cl
Br
O
NH
N+
O
O-HO
Cl
Cl
ON+
O
O-Cl
Cl Br
ON+
O
O-Br
Cl
Cl
HN
N+
O
O-
Cl
Cl
O
O
NH
N
N+
O
O-Cl
Cl
Br
O
NH
N+
O
O-F
NH
Cl
O
O
HN
N+
O
O-HO
F
F
NHN+
O
-OF
NH
Cl
BrO
NH
N+
O
O-Br
Cl
Cl
O
O
N+
O
O-Cl
Cl Br
O
NH
N+
O
O-Cl
Br
O
ON+
O
O-Cl
Cl O
O
NH
N+
O
O-Cl
Cl
O
ON+
O
O-Cl
F
Cl
O
NH
N+
O
O-
Cl
Cl
Cl
O
NH
N+
O
O-Cl
Cl O
ON+
O
O-Cl
Cl
O
O
NH
N+
O
O-Cl
Cl
Br
O
NH
N
N+
O
O-Cl
Cl
Cl
O
NN+
O
O-
Br
Cl
Cl
O
NH
N+
O
O-Cl
O Cl
ON+
O
O-Cl
Cl
Cl
O
NH
N+
O
O-Cl
Cl Cl
O NH
N+O O-
Cl
N
O NH
N+O O-
Br
O
NH
N+
O
O-Br
Cl Cl
O
NH
N+
O
O-
NH
Cl
Br
O
N+
O
-O
N NH
Cl
O
O
NH
N+
O
O-Cl
NCl
Cl
O
O
N+
O
O-
NH
Cl Cl
O
NH
N+
O
O-Cl
Cl Br
HN
N+
O
O-
Cl
Cl
Br
O
NH
N+
O
O-Br
Cl
Br
O
NH
N+
O
O-
Cl
Cl
O
NH
N+
O
O-Cl
OCl
ON+
O
O-Cl
Cl
Br
O
NH
N
N+
O
O-Cl
Cl
Cl
O
NN+
O
O-
Cl
Cl Cl
O
NH
N+
O
O-Cl
Cl
OH
No chemistry-specific design!
Gradient-based optimization
Objective Values in Training Data
Objective Values
Freq
uenc
y
15 10 5 0 5 10
050
0015
000
2500
0Molecule 1
Molecule 2
Molecule 1Molecule 2
Proof of conceptBut be careful what you wish for
Applications of small-molecule design
OrganicLEDs,liquidcrystals,organicsolarcells,gasdielectrics,supercapacitors,ba8eries,electronicpolymers,homogeneouscatalysts,plas;caddi;ves,adhesives,sealants,3Dprin;ng,paintsandcoa;ngs,specialtyfibers,biodegradablepolymers,medicalplas;cs,pes;cides,smallmoleculedrugs
Future work
Future work• Jointly train autoencoder
and prediction model
Future work• Jointly train autoencoder
and prediction model
• Decode directly to graphs
Future work• Jointly train autoencoder
and prediction model
• Decode directly to graphs
• Decode directly to recipes for synthesis
Thanks!
Rafa Gómez-Bombarelli, Miguel Hernández-Lobato, Jorge Aguilera-Iparraguirre
Timothy Hirzel, Ryan P. Adams, Alán Aspuru-Guzik