Automatic chemical design using a continuous, data-driven ...duvenaud/talks/mol-auto-talk.pdf ·...

30
Automatic chemical design using a continuous, data-driven representation of molecules Rafa Gómez-Bombarelli, David Duvenaud, José Miguel Hernández-Lobato, Jorge Aguilera-Iparraguirre, Timothy Hirzel, Ryan P. Adams, Alán Aspuru-Guzik

Transcript of Automatic chemical design using a continuous, data-driven ...duvenaud/talks/mol-auto-talk.pdf ·...

Page 1: Automatic chemical design using a continuous, data-driven ...duvenaud/talks/mol-auto-talk.pdf · • Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol

Automatic chemical design using a continuous, data-driven

representation of moleculesRafa Gómez-Bombarelli, David Duvenaud, José Miguel Hernández-Lobato,

Jorge Aguilera-Iparraguirre, Timothy Hirzel, Ryan P. Adams, Alán Aspuru-Guzik

Page 2: Automatic chemical design using a continuous, data-driven ...duvenaud/talks/mol-auto-talk.pdf · • Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol

How to design molecules?

• Can predict properties of molecules (QSAR)

• Want a molecule with a certain property (inverse QSAR)

Page 3: Automatic chemical design using a continuous, data-driven ...duvenaud/talks/mol-auto-talk.pdf · • Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol

How to optimize molecules?• Build a giant library of molecules and try them all

• Or: Local search based on small changes

Cornelius J. O'Connor, Henning S.G. Beckmanna and David R. Spring

Page 4: Automatic chemical design using a continuous, data-driven ...duvenaud/talks/mol-auto-talk.pdf · • Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol

Discrete optimization is basically impossible

• In more than 10 or 20 dimensions, search is too slow because no way to know which direction to go

• Gradient gives D hints for your D parameters

• need some version of@molecule

@property

Page 5: Automatic chemical design using a continuous, data-driven ...duvenaud/talks/mol-auto-talk.pdf · • Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol

Discrete optimization is basically impossible

• In more than 10 or 20 dimensions, search is too slow because no way to know which direction to go

• Gradient gives D hints for your D parameters

• need some version of@molecule

@property

Page 6: Automatic chemical design using a continuous, data-driven ...duvenaud/talks/mol-auto-talk.pdf · • Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol

Discrete optimization is basically impossible

• In more than 10 or 20 dimensions, search is too slow because no way to know which direction to go

• Gradient gives D hints for your D parameters

• need some version of@molecule

@property

Nguyen, Dosovitskiy, Yosinski, Brox, Clune

Page 7: Automatic chemical design using a continuous, data-driven ...duvenaud/talks/mol-auto-talk.pdf · • Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol

Credit: Natalie Fey

Page 8: Automatic chemical design using a continuous, data-driven ...duvenaud/talks/mol-auto-talk.pdf · • Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol

Credit: Natalie Fey

Page 9: Automatic chemical design using a continuous, data-driven ...duvenaud/talks/mol-auto-talk.pdf · • Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol

What is a molecule?Graph SMILES string

Page 10: Automatic chemical design using a continuous, data-driven ...duvenaud/talks/mol-auto-talk.pdf · • Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol

Text autoencoders

• Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz, Samy Bengio

Page 11: Automatic chemical design using a continuous, data-driven ...duvenaud/talks/mol-auto-talk.pdf · • Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol

TextVAE-Interpola*on

Page 12: Automatic chemical design using a continuous, data-driven ...duvenaud/talks/mol-auto-talk.pdf · • Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol

Repurposing text autoencoders

c1ccccc1 c1ccccc1

ENCODERNeural Network

DECODERNeural Network

CONTINUOUS MOLECULARREPRESENTATION

Latent Space

Discrete Structure SMILES

Discrete Structure SMILES

Can be trained on unlabeled data

Page 13: Automatic chemical design using a continuous, data-driven ...duvenaud/talks/mol-auto-talk.pdf · • Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol

Map of 220,000 Drugs

Page 14: Automatic chemical design using a continuous, data-driven ...duvenaud/talks/mol-auto-talk.pdf · • Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol

Map of 100,000 OLEDs

Page 15: Automatic chemical design using a continuous, data-driven ...duvenaud/talks/mol-auto-talk.pdf · • Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol

Random Organic LEDs

F S S F O OO S

N N N

HS S S S SH

HO SHH2N NH2

O

FHNH2N

NH3

F F

HO F

N NH2

OHSHSO

F S FH2S

OHHO OH

FHS

NH2

OH

O

N SH

O

NH2H2N FHF

SHS

S

F

H2O

SHSHS F

CH4

H2N OHN

F F

HO S S S S S S S S S S S SH

N

S N

NO

O

N

N

N N

N

O O

OO

S O

N

N N

NN N

N

NN

N

NN N

NN

NS

N

N N

N

N

N N

N

N

N

O

O

NN

S N

NNN

N

N NN

N

O

NN N

N

N

NNO

NN

NO

NN

N

NNN

N

Standard autoencoderVariational autoencoder

Page 16: Automatic chemical design using a continuous, data-driven ...duvenaud/talks/mol-auto-talk.pdf · • Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol

Molecules near aspirin

O

NHN+

O

-O

Cl

Cl

NH

NH

N+

O

O-Cl

Cl

Br

N

N+

O

-OCl BrCl

O

NH

N+

O

O-Cl

Cl

SHCl

ON+

O

-O

NH

Cl

Cl

O

NH

N

N+

O

-OCl

F

O

ON+

O

-OCl

Cl

HN

O

NH

N+

O

O-Br

Cl O

ON+

O

O-Cl

Br O

NH

NH

N+

O

O-Cl

Cl

O

O

HN

N+

O

-O

HN Cl

Br

O

NH

N+

O

O-HO

Cl

Cl

ON+

O

O-Cl

Cl Br

ON+

O

O-Br

Cl

Cl

HN

N+

O

O-

Cl

Cl

O

O

NH

N

N+

O

O-Cl

Cl

Br

O

NH

N+

O

O-F

NH

Cl

O

O

HN

N+

O

O-HO

F

F

NHN+

O

-OF

NH

Cl

BrO

NH

N+

O

O-Br

Cl

Cl

O

O

N+

O

O-Cl

Cl Br

O

NH

N+

O

O-Cl

Br

O

ON+

O

O-Cl

Cl O

O

NH

N+

O

O-Cl

Cl

O

ON+

O

O-Cl

F

Cl

O

NH

N+

O

O-

Cl

Cl

Cl

O

NH

N+

O

O-Cl

Cl O

ON+

O

O-Cl

Cl

O

O

NH

N+

O

O-Cl

Cl

Br

O

NH

N

N+

O

O-Cl

Cl

Cl

O

NN+

O

O-

Br

Cl

Cl

O

NH

N+

O

O-Cl

O Cl

ON+

O

O-Cl

Cl

Cl

O

NH

N+

O

O-Cl

Cl Cl

O NH

N+O O-

Cl

N

O NH

N+O O-

Br

O

NH

N+

O

O-Br

Cl Cl

O

NH

N+

O

O-

NH

Cl

Br

O

N+

O

-O

N NH

Cl

O

O

NH

N+

O

O-Cl

NCl

Cl

O

O

N+

O

O-

NH

Cl Cl

O

NH

N+

O

O-Cl

Cl Br

HN

N+

O

O-

Cl

Cl

Br

O

NH

N+

O

O-Br

Cl

Br

O

NH

N+

O

O-

Cl

Cl

O

NH

N+

O

O-Cl

OCl

ON+

O

O-Cl

Cl

Br

O

NH

N

N+

O

O-Cl

Cl

Cl

O

NN+

O

O-

Cl

Cl Cl

O

NH

N+

O

O-Cl

Cl

OH

Page 17: Automatic chemical design using a continuous, data-driven ...duvenaud/talks/mol-auto-talk.pdf · • Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol
Page 18: Automatic chemical design using a continuous, data-driven ...duvenaud/talks/mol-auto-talk.pdf · • Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol
Page 19: Automatic chemical design using a continuous, data-driven ...duvenaud/talks/mol-auto-talk.pdf · • Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol
Page 20: Automatic chemical design using a continuous, data-driven ...duvenaud/talks/mol-auto-talk.pdf · • Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol
Page 21: Automatic chemical design using a continuous, data-driven ...duvenaud/talks/mol-auto-talk.pdf · • Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol

No chemistry-specific design!

Page 22: Automatic chemical design using a continuous, data-driven ...duvenaud/talks/mol-auto-talk.pdf · • Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol

Gradient-based optimization

Page 23: Automatic chemical design using a continuous, data-driven ...duvenaud/talks/mol-auto-talk.pdf · • Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol

Objective Values in Training Data

Objective Values

Freq

uenc

y

15 10 5 0 5 10

050

0015

000

2500

0Molecule 1

Molecule 2

Molecule 1Molecule 2

Proof of conceptBut be careful what you wish for

Page 24: Automatic chemical design using a continuous, data-driven ...duvenaud/talks/mol-auto-talk.pdf · • Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol

Applications of small-molecule design

OrganicLEDs,liquidcrystals,organicsolarcells,gasdielectrics,supercapacitors,ba8eries,electronicpolymers,homogeneouscatalysts,plas;caddi;ves,adhesives,sealants,3Dprin;ng,paintsandcoa;ngs,specialtyfibers,biodegradablepolymers,medicalplas;cs,pes;cides,smallmoleculedrugs

Page 25: Automatic chemical design using a continuous, data-driven ...duvenaud/talks/mol-auto-talk.pdf · • Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol
Page 26: Automatic chemical design using a continuous, data-driven ...duvenaud/talks/mol-auto-talk.pdf · • Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol

Future work

Page 27: Automatic chemical design using a continuous, data-driven ...duvenaud/talks/mol-auto-talk.pdf · • Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol

Future work• Jointly train autoencoder

and prediction model

Page 28: Automatic chemical design using a continuous, data-driven ...duvenaud/talks/mol-auto-talk.pdf · • Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol

Future work• Jointly train autoencoder

and prediction model

• Decode directly to graphs

Page 29: Automatic chemical design using a continuous, data-driven ...duvenaud/talks/mol-auto-talk.pdf · • Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol

Future work• Jointly train autoencoder

and prediction model

• Decode directly to graphs

• Decode directly to recipes for synthesis

Page 30: Automatic chemical design using a continuous, data-driven ...duvenaud/talks/mol-auto-talk.pdf · • Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol

Thanks!

Rafa Gómez-Bombarelli, Miguel Hernández-Lobato, Jorge Aguilera-Iparraguirre

Timothy Hirzel, Ryan P. Adams, Alán Aspuru-Guzik