Automatic chemical design using a continuous, data-driven ...duvenaud/talks/mol-auto-talk.pdf ·...

Automatic chemical design using a continuous, data-driven

representation of moleculesRafa Gómez-Bombarelli, David Duvenaud, José Miguel Hernández-Lobato,

Jorge Aguilera-Iparraguirre, Timothy Hirzel, Ryan P. Adams, Alán Aspuru-Guzik

How to design molecules?

• Can predict properties of molecules (QSAR)

• Want a molecule with a certain property (inverse QSAR)

How to optimize molecules?• Build a giant library of molecules and try them all

• Or: Local search based on small changes

Cornelius J. O'Connor, Henning S.G. Beckmanna and David R. Spring

Discrete optimization is basically impossible

• In more than 10 or 20 dimensions, search is too slow because no way to know which direction to go

• Gradient gives D hints for your D parameters

• need some version of@molecule

@property

Discrete optimization is basically impossible

• In more than 10 or 20 dimensions, search is too slow because no way to know which direction to go

• Gradient gives D hints for your D parameters

• need some version of@molecule

@property

Nguyen, Dosovitskiy, Yosinski, Brox, Clune

Credit: Natalie Fey

What is a molecule?Graph SMILES string

Text autoencoders

• Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz, Samy Bengio

TextVAE-Interpola*on

Repurposing text autoencoders

c1ccccc1 c1ccccc1

ENCODERNeural Network

DECODERNeural Network

CONTINUOUS MOLECULARREPRESENTATION

Latent Space

Discrete Structure SMILES

Discrete Structure SMILES

Can be trained on unlabeled data

Map of 220,000 Drugs

Map of 100,000 OLEDs

Random Organic LEDs

F S S F O OO S

N N N

HS S S S SH

HO SHH2N NH2

O

FHNH2N

NH3

F F

HO F

N NH2

OHSHSO

F S FH2S

OHHO OH

FHS

NH2

OH

O

N SH

O

NH2H2N FHF

SHS

S

F

H2O

SHSHS F

CH4

H2N OHN

F F

HO S S S S S S S S S S S SH

N

S N

NO

O

N

N

N N

N

O O

OO

S O

N

N N

NN N

N

NN

N

NN N

NN

NS

N

N N

N

N

N N

N

N

N

O

O

NN

S N

NNN

N

N NN

N

O

NN N

N

N

NNO

NN

NO

NN

N

NNN

N

Standard autoencoderVariational autoencoder

Molecules near aspirin

O

NHN+

O

-O

Cl

Cl

NH

NH

N+

O

O-Cl

Cl

Br

N

N+

O

-OCl BrCl

O

NH

N+

O

O-Cl

Cl

SHCl

ON+

O

-O

NH

Cl

Cl

O

NH

N

N+

O

-OCl

F

O

ON+

O

-OCl

Cl

HN

O

NH

N+

O

O-Br

Cl O

ON+

O

O-Cl

Br O

NH

NH

N+

O

O-Cl

Cl

O

O

HN

N+

O

-O

HN Cl

Br

O

NH

N+

O

O-HO

Cl

Cl

ON+

O

O-Cl

Cl Br

ON+

O

O-Br

Cl

Cl

HN

N+

O

O-

Cl

Cl

O

O

NH

N

N+

O

O-Cl

Cl

Br

O

NH

N+

O

O-F

NH

Cl

O

O

HN

N+

O

O-HO

F

F

NHN+

O

-OF

NH

Cl

BrO

NH

N+

O

O-Br

Cl

Cl

O

O

N+

O

O-Cl

Cl Br

O

NH

N+

O

O-Cl

Br

O

ON+

O

O-Cl

Cl O

O

NH

N+

O

O-Cl

Cl

O

ON+

O

O-Cl

F

Cl

O

NH

N+

O

O-

Cl

Cl

Cl

O

NH

N+

O

O-Cl

Cl O

ON+

O

O-Cl

Cl

O

O

NH

N+

O

O-Cl

Cl

Br

O

NH

N

N+

O

O-Cl

Cl

Cl

O

NN+

O

O-

Br

Cl

Cl

O

NH

N+

O

O-Cl

O Cl

ON+

O

O-Cl

Cl

Cl

O

NH

N+

O

O-Cl

Cl Cl

O NH

N+O O-

Cl

N

O NH

N+O O-

Br

O

NH

N+

O

O-Br

Cl Cl

O

NH

N+

O

O-

NH

Cl

Br

O

N+

O

-O

N NH

Cl

O

O

NH

N+

O

O-Cl

NCl

Cl

O

O

N+

O

O-

NH

Cl Cl

O

NH

N+

O

O-Cl

Cl Br

HN

N+

O

O-

Cl

Cl

Br

O

NH

N+

O

O-Br

Cl

Br

O

NH

N+

O

O-

Cl

Cl

O

NH

N+

O

O-Cl

OCl

ON+

O

O-Cl

Cl

Br

O

NH

N

N+

O

O-Cl

Cl

Cl

O

NN+

O

O-

Cl

Cl Cl

O

NH

N+

O

O-Cl

Cl

OH

No chemistry-specific design!

Gradient-based optimization

Objective Values in Training Data

Objective Values

Freq

uenc

y

15 10 5 0 5 10

050

0015

000

2500

0Molecule 1

Molecule 2

Molecule 1Molecule 2

Proof of conceptBut be careful what you wish for

Applications of small-molecule design

OrganicLEDs,liquidcrystals,organicsolarcells,gasdielectrics,supercapacitors,ba8eries,electronicpolymers,homogeneouscatalysts,plas;caddi;ves,adhesives,sealants,3Dprin;ng,paintsandcoa;ngs,specialtyfibers,biodegradablepolymers,medicalplas;cs,pes;cides,smallmoleculedrugs

Future work

Future work• Jointly train autoencoder

and prediction model



• Decode directly to graphs



• Decode directly to graphs

• Decode directly to recipes for synthesis

Thanks!

Rafa Gómez-Bombarelli, Miguel Hernández-Lobato, Jorge Aguilera-Iparraguirre

Timothy Hirzel, Ryan P. Adams, Alán Aspuru-Guzik

Automatic chemical design using a continuous, data-driven ...duvenaud/talks/mol-auto-talk.pdf ·...

Documents

Transcript of Automatic chemical design using a continuous, data-driven ...duvenaud/talks/mol-auto-talk.pdf ·...