Abstract arXiv:2002.03244v1 [cs.LG] 8 Feb 2020

10
Composing Molecules with Multiple Property Constraints Wengong Jin, Regina Barzilay, Tommi Jaakkola CSAIL, Massachusetts Institute of Technology [email protected] Abstract Drug discovery aims to find novel compounds with specified chemical property profiles. In terms of generative modeling, the goal is to learn to sample molecules in the intersection of multiple property constraints. This task becomes increas- ingly challenging when there are many property constraints. We propose to offset this complex- ity by composing molecules from a vocabulary of substructures that we call molecular rationales. These rationales are identified from molecules as substructures that are likely responsible for each property of interest. We then learn to expand ratio- nales into a full molecule using graph generative models. Our final generative model composes molecules as mixtures of multiple rationale com- pletions, and this mixture is fine-tuned to preserve the properties of interest. We evaluate our model on various drug design tasks and demonstrate sig- nificant improvements over state-of-the-art base- lines in terms of accuracy, diversity, and novelty of generated compounds. 1. Introduction The key challenge in drug discovery is to find molecules that satisfy multiple constraints, from potency, safety, to desired metabolic profiles. Optimizing these constraints simultaneously is challenging for existing computational models. The primary difficulty lies in the lack of training instances of molecules that conform to all the constraints. For example, for this reason, Jin et al. (2019a) reports over 60% performance loss when moving beyond the single- constraint setting. In this paper, we propose a novel approach to multi- property molecular optimization. Our strategy is inspired by fragment-based drug discovery (Murray & Rees, 2009) of- ten followed by medicinal chemists. The idea is to start with substructures (e.g., functional groups or later pieces) that drive specific properties of interest, and then combine these building blocks into a target molecule. To automate this Figure 1. Illustration of our rationale based generative model. To generate a dual inhibitor against biological targets GSK3β and JNK3, our model first identifies rationale substructures S for each property, and then learns to compose them into a full molecule G. Note that rationales are not provided as domain knowledge. process, our model has to learn two complementary tasks illustrated in Figure 1: (1) identification of the building blocks that we call rationales, and (2) assembling multi- ple rationales together into a fully formed target molecule. In contrast to competing methods, our generative model does not build molecules from scratch, but instead assem- bles them from automatically extracted rationales already implicated for specific properties (see Figure 1). We implement this idea using a generative model of molecules where the rationale choices play the role of latent variables. Specifically, a molecular graph G is generated from underlying rationale sets S according to: P (G)= X S P (G|S )P (S ) (1) As ground truth rationales (e.g., functional groups or sub- graphs) are not provided, the model has to extract candidate rationales from molecules with the help of a property pre- dictor. We formulate this task as a discrete optimization problem efficiently solved by Monte Carlo tree search. Our rationale conditioned graph generator, P (G|S ), is initially trained on a large collection of real molecules so that it is capable of expanding any subgraph into a full molecule. arXiv:2002.03244v1 [cs.LG] 8 Feb 2020

Transcript of Abstract arXiv:2002.03244v1 [cs.LG] 8 Feb 2020

Page 1: Abstract arXiv:2002.03244v1 [cs.LG] 8 Feb 2020

Composing Molecules with Multiple Property Constraints

Wengong Jin, Regina Barzilay, Tommi Jaakkola

CSAIL, Massachusetts Institute of [email protected]

AbstractDrug discovery aims to find novel compoundswith specified chemical property profiles. In termsof generative modeling, the goal is to learn tosample molecules in the intersection of multipleproperty constraints. This task becomes increas-ingly challenging when there are many propertyconstraints. We propose to offset this complex-ity by composing molecules from a vocabularyof substructures that we call molecular rationales.These rationales are identified from molecules assubstructures that are likely responsible for eachproperty of interest. We then learn to expand ratio-nales into a full molecule using graph generativemodels. Our final generative model composesmolecules as mixtures of multiple rationale com-pletions, and this mixture is fine-tuned to preservethe properties of interest. We evaluate our modelon various drug design tasks and demonstrate sig-nificant improvements over state-of-the-art base-lines in terms of accuracy, diversity, and noveltyof generated compounds.

1. IntroductionThe key challenge in drug discovery is to find moleculesthat satisfy multiple constraints, from potency, safety, todesired metabolic profiles. Optimizing these constraintssimultaneously is challenging for existing computationalmodels. The primary difficulty lies in the lack of traininginstances of molecules that conform to all the constraints.For example, for this reason, Jin et al. (2019a) reports over60% performance loss when moving beyond the single-constraint setting.

In this paper, we propose a novel approach to multi-property molecular optimization. Our strategy is inspired byfragment-based drug discovery (Murray & Rees, 2009) of-ten followed by medicinal chemists. The idea is to start withsubstructures (e.g., functional groups or later pieces) thatdrive specific properties of interest, and then combine thesebuilding blocks into a target molecule. To automate this

Figure 1. Illustration of our rationale based generative model. Togenerate a dual inhibitor against biological targets GSK3β andJNK3, our model first identifies rationale substructures S for eachproperty, and then learns to compose them into a full molecule G.Note that rationales are not provided as domain knowledge.

process, our model has to learn two complementary tasksillustrated in Figure 1: (1) identification of the buildingblocks that we call rationales, and (2) assembling multi-ple rationales together into a fully formed target molecule.In contrast to competing methods, our generative modeldoes not build molecules from scratch, but instead assem-bles them from automatically extracted rationales alreadyimplicated for specific properties (see Figure 1).

We implement this idea using a generative model ofmolecules where the rationale choices play the role of latentvariables. Specifically, a molecular graph G is generatedfrom underlying rationale sets S according to:

P (G) =∑SP (G|S)P (S) (1)

As ground truth rationales (e.g., functional groups or sub-graphs) are not provided, the model has to extract candidaterationales from molecules with the help of a property pre-dictor. We formulate this task as a discrete optimizationproblem efficiently solved by Monte Carlo tree search. Ourrationale conditioned graph generator, P (G|S), is initiallytrained on a large collection of real molecules so that it iscapable of expanding any subgraph into a full molecule.

arX

iv:2

002.

0324

4v1

[cs

.LG

] 8

Feb

202

0

Page 2: Abstract arXiv:2002.03244v1 [cs.LG] 8 Feb 2020

The mixture model is then fine-tuned using reinforcementlearning to ensure that the generated molecules preserve allthe properties of interest. This training paradigm enables usto realize molecules that satisfy multiple constraints withoutobserving any such instances in the training set.

The proposed model is evaluated on molecule design tasksunder different combinations of property constraints. Ourbaselines include state-of-the-art molecule generation meth-ods (Olivecrona et al., 2017; You et al., 2018a). Across alltasks, our model achieve state-of-the art results in terms ofaccuracy, novelty and diversity of generated compounds. Inparticular, we outperform the best baseline with 38% abso-lute improvement in the task with three property constraints.We further provide ablation studies to validate the benefitof our architecture in the low-resource scenario. Finally, weshow that identified rationales are chemically meaningful ina toxicity prediction task (Sushko et al., 2012).

2. Related WorkReinforcement Learning One of the prevailing paradigmsfor drug design is reinforcement learning (RL) (You et al.,2018a; Olivecrona et al., 2017; Popova et al., 2018), whichseeks to maximize the expected reward defined as the sumof predicted property scores using the property predictors.Their approach learns a distribution P (G) (a neural network)for generating molecules. Ideally, the model should achievehigh success rate in generating molecules that meet all theconstraints, while maintaining the diversity of P (G).

The main challenge of RL lies in the sparsity of rewards,especially when there are multiple competing constraints.For illustration, we tested a state-of-the-art reinforcementlearning method (Olivecrona et al., 2017) under three prop-erty constraints: biological activity to target DRD2, GSK3βand JNK3 (Li et al., 2018). As shown in Figure 2, initiallythe success rate and diversity is high when given only oneof the constraints, but they decrease dramatically when allthe property constraints are added. The reason of this failureis that the property predictor (i.e., reward function) remainsblack-box and the model has limited understand of how andwhy certain molecules are desirable.

Our framework offsets this complexity by understandingproperty landscape through rationales. At a high level, therationales are analogous to options (Sutton et al., 1999;Stolle & Precup, 2002), which are macro-actions leadingthe agent faster to its goal. The rationales are automaticallydiscovered from molecules with labeled properties.

Molecule Generation Previous work have adopted var-ious approaches for generating molecules under specificproperty constraints. Roughly speaking, existing methodscan be divided along two axes — representation and opti-mization. On the representation side, they either operate

Figure 2. Challenge of multi-objective drug design. Standard re-inforcement learning method (Olivecrona et al., 2017) fails underthree property constraints due to reward sparsity.

on SMILES strings (Gomez-Bombarelli et al., 2018; Segleret al., 2017; Kang & Cho, 2018) or directly on moleculargraphs (Simonovsky & Komodakis, 2018; Jin et al., 2018;Samanta et al., 2018; Liu et al., 2018; De Cao & Kipf, 2018;Ma et al., 2018; Seff et al., 2019). On the optimizationside, the task has been formulated as reinforcement learn-ing (Guimaraes et al., 2017; Olivecrona et al., 2017; Popovaet al., 2018; You et al., 2018a; Zhou et al., 2018), continu-ous optimization in the latent space learned by variationalautoencoders (Gomez-Bombarelli et al., 2018; Kusner et al.,2017; Dai et al., 2018; Jin et al., 2018; Kajino, 2018; Liuet al., 2018), or graph-to-graph translation (Jin et al., 2019b).In contrast to existing approaches, our model focuses on themulti-objective setting of the problem and offers a differentformulation for molecule generation based on rationales.

Interpretability Our rationale based generative modelseeks to provide transparency (Doshi-Velez & Kim, 2017)for molecular design. The choice of rationales P (S) is visi-ble to users and can be easily controlled by human experts.Prior work on interpretability primarily focuses on findingrationales (i.e., explanations) of model predictions in imageand text classification (Lei et al., 2016; Ribeiro et al., 2016;Sundararajan et al., 2017) and molecule property predic-tion (McCloskey et al., 2019; Ying et al., 2019; Lee et al.,2019). In contrast, our model uses rationales as buildingblocks for molecule generation.

3. Composing Molecules using RationalesMolecules are represented as graphs G = (V, E) with atomsV as nodes and bonds E as edges. The goal of drug discov-ery is to find novel compounds satisfying given propertyconstraints (e.g., drug-likeness, binding affinity, etc.). With-out loss of generality, we assume the property constraints tobe of the following form:

Find molecules GSubject to ri(G) ≥ δi; i = 1, · · · ,M

(2)

For each property i, the property score ri(G) ∈ [0, 1] ofmolecule G must be higher than threshold δi ∈ [0, 1]. A

Page 3: Abstract arXiv:2002.03244v1 [cs.LG] 8 Feb 2020

Figure 3. Overview of our approach. We first construct rationales for each individual property and then combine them as multi-propertyrationales. The method learns a graph completion model P (G|S) and rationale distribution P (S) in order to generate positive molecules.

molecule G is called positive to property i if ri(G) ≥ δi andnegative otherwise.

Following previous work (Olivecrona et al., 2017; Popovaet al., 2018), ri(G) is output of property prediction models(e.g., random forests) which effectively approximate empir-ical measurements. The prediction model is trained overa set of molecules with labeled properties gathered fromreal experimental data. The property predictor is then fixedthroughout the rest of the training process.

Overview Our model generates molecules by first samplinga rationale S from the vocabulary V [M ]

S , and then complet-ing it into a molecule G. The generative model is definedas

P (G) =∑S∈V [M]

SP (S)P (G|S) (3)

As shown in Figure 3, our model consists of three modules:

• Rational Extraction: Construct rationale vocabulary V iSeach individual property i and combines these rationalesfor multiple properties V [M ]

S (see §3.1).

• Graph Completion P (G|S): Generate molecules G us-ing multi-property rationales S[M ] ∈ V [M ]

S . The model isfirst pre-trained on natural compounds and then fine-tunedto generate molecules satisfying multiple constraints (see§3.2 for its architecture and §3.3 for fine-tuning).

• Rationale Distribution P (S): The rationale distribu-tion P (S) is learned based on the properties of completemolecules G generated from P (G|S). A rationale S issampled more frequently if it is more likely to be ex-panded into a positive molecule G (see §3.3).

3.1. Rationale Extraction from Predictive Models

Single-property Rationale We define a rationale Si for asingle property i as a subgraph of some molecule G whichcauses G to be positive (see Figure 1). To be specific, let V iSbe the vocabulary of such rationales for property i. Eachrationale Si ∈ V iS should satisfy the following two criteria

to be considered as a rationale:

1. The size of Si should be small (less thanN = 20 atoms).

2. Its predicted property score ri(Si) ≥ δi.

For a single property i, we propose to extract its rationalesfrom a set of positive molecules Dposi used to train theproperty predictor. For each molecule Gposi ∈ Dposi , we finda rationale subgraph with high predicted property and smallsize (N = 20):

Find subgraph Si ⊂ Gposi

Subject to ri(Si) ≥ δi,|Si| ≤ N and Si is connected

(4)

Solving the above problem is challenging because rationaleSi is discrete and the potential number of subgraphs growsexponentially to the size of Gposi . To limit the search space,we have added an additional constraint that Si has to be aconnected subgraph.1 In this case, we can find a rationaleSi by iteratively removing some peripheral bonds whilemaintaining its property. Therefore, the key is learning toprune the molecule.

This search problem can be efficiently solved by MonteCarlo Tree Search (MCTS) (Silver et al., 2017). The rootof the search tree is Gpos and each state s in the search treeis a subgraph derived from a sequence of bond deletions.To ensure that each subgraph is chemically valid and staysconnected, we only allow deletion of one peripheral non-aromatic bond or one peripheral ring from each state. Asshown in Figure 4, a bond or a ring a is called peripheral ifGpos stays connected after deleting a.

During search process, each state s in the search tree con-tains edges (s, a) for all legal deletions a. Following Silveret al. (2017), each edge (s, a) stores the following statistics:

1This assumption is valid in many cases. For instance, ratio-nales for toxicity (i.e., toxicophores) are connected subgraphs inmost cases (Sushko et al., 2012).

Page 4: Abstract arXiv:2002.03244v1 [cs.LG] 8 Feb 2020

• N(s, a) is the visit count of deletion a, which is used forexploration-exploitation tradeoff in the search process.

• W (s, a) is total action value which indicates how likelythe deletion a will lead to a good rationale.

• R(s, a) = ri(s′) is the predicted property score of the

new subgraph s′ derived from deleting a from s.

Guided by these statistics, MCTS searches for rationales inmultiple iterations. Each iteration consists of two phases:

1. Forward pass: Select a path s0, · · · , sL from the root s0to a leaf state sL with less than N atoms and evaluate itsproperty score ri(sL). At each state sk, an deletion akis selected according to the statistics in the search tree:

ak = argmaxa

W (sk, a)

N(sk, a)+ U(sk, a) (5)

U(sk, a) = cpuctR(sk, a)

√∑bN(sk, b)

1 +N(sk, a)(6)

where cpuct determines the level of exploration. Thissearch strategy is a variant of the PUCT algorithm (Rosin,2011). It initially prefers to explore deletions with highR(s, a) and low visit count, but asympotically prefersdeletions that are likely to lead to good rationales.

2. Backward pass: The edge statistics are updated for eachstate sk. Specifically, N(sk, ak)← N(sk, ak) + 1 andW (sk, ak)←W (sk, ak) + ri(sL).

In the end, we collect all the leaf states s with ri(s) ≥ δiand add them to the rationale vocabulary V iS .

Multi-property Rationale For a set of M properties, wecan similarly define its rationale S [M ] by imposing M prop-erty constraints at the same time, namely

∀i : ri(S [M ]) ≥ δi, i = 1, · · · ,M

In principle, we can apply MCTS to extract rationales frommolecules that satisfy all the property constraints. However,in many cases there are no such molecules available. Asa result, we propose to construct rationales from single-property rationales for each property i. Specifically, eachmulti-property rationale S [M ] is a disconnected graph withM connected components S1, · · · ,SM :

V[M ]S = {S [M ] = S1 ⊕ · · · ⊕ SM | Si ∈ V iS} (7)

where ⊕ means to concatenate two graphs Si and Sj . Fornotational convenience, we denote both single and multi-property rationales as S. In the rest of the paper, S is arationale graph with one or multiple connected components.

3.2. Graph Completion

This module is a variational autoencoder which completes afull molecule G given a rationale S . Since each rationale S

Figure 4. Illustration of Monte Carlo tree search for molecules.Peripheral bonds and rings are highlighted in red. In the forwardpass, the model deletes a peripheral bond or ring from each statewhich has maximum Q + U value. In the backward pass, themodel updates the statistics of each state.

can be realized into many different molecules, we introducea latent variable z to generate diverse outputs:

P (G|S) =∫z

P (G|S, z)P (z)dz (8)

where P (z) is the prior distribution. Different from standardgraph generation, our graph decoder must generate graphsthat contain subgraph S. Our VAE architecture is adaptedfrom existing atom-by-atom generative models (You et al.,2018b; Liu et al., 2018) to incorporate the subgraph con-straint. For completeness, we present our architecture here:

Encoder Our encoder is a message passing network (MPN)which learns the approximate posterior Q(z|G,S) for vari-ational inference. Let e(au) be the embedding of atom uwith atom type au, and e(buv) be the embedding of bond(u, v) with bond type buv. The MPN computes atom repre-sentations {hv|v ∈ G}.

{hv} = MPNe (G, {e(au)}, {e(buv)}) (9)

For simplicity, we denote the MPN encoding process asMPN(·), which is detailed in the appendix. The atomvectors are aggregated to represent G as a single vectorhG =

∑v hv. Finally, we sample latent vector zG from

Q(z|G,S) with mean µ(hG) and log variance Σ(hG):

zG = µ(hG) + exp(Σ(hG)) · ε; ε ∼ N (0, I) (10)

Decoder The decoder generates molecule G according toits breadth-first order. In each step, the model generates a

Page 5: Abstract arXiv:2002.03244v1 [cs.LG] 8 Feb 2020

new atom and all its connecting edges. During generation,we maintain a queue Q that contains frontier nodes in thegraph who still have neighbors to be generated. Let Gt bethe partial graph generated till step t. To ensure G containsS as subgraph, we set the initial state of G1 = S and put allthe peripheral atoms of S to the queue Q (only peripheralatoms are needed due to the rationale extraction algorithm).

In tth generation step, the decoder first runs a MPN overcurrent graph Gt to compute atom representations h(t)

v :

{h(t)v } = MPNd (Gt, {e(au)}, {e(buv)}) (11)

The current graph Gt is represented as the sum of its atomvectors hGt =

∑v∈Gt h

(t)v . Suppose the first atom in Q is

vt. The decoder decides to expand Gt in three steps:

1. Predict whether there will be a new atom attached to vt:

pt = sigmoid(MLP(h(t)vt ,hGt , zG)) (12)

where MLP(·, ·, ·) is a ReLU network whose input is aconcatenation of multiple vectors.

2. If pt < 0.5, discard vt and move on to the next node inQ. Stop generation if Q is empty. Otherwise, create anew atom ut and predict its atom type:

put = softmax(MLP(h(t)vt ,hGt , zG)) (13)

3. Predict the bond type between ut and other frontier nodesin Q = {q1, · · · , qn} (q1 = vt). Since atoms are gener-ated in breadth-first order, there are no bonds betweenut and atoms not in Q.

To fully capture edge dependencies, we predict the bondsbetween ut and atoms in Q sequentially and update therepresentation of ut when new bonds are added to Gt. In thekth step, we predict the bond type of (ut, qk) as follows:

but,qk = softmax(MLP(g(k−1)ut

,h(t)qk,hGt , zG)

)(14)

where g(k−1)ut is the new representation of ut after bonds{(ut, q1), · · · , (ut, qk−1)} have been added to Gt:

g(k−1)ut= MLP

(e(aut

),∑k−1

j=1MLP(h(t)

qj , e(bqj ,ut))

)3.3. Training Procedure

Our training objective is to maximize the expected rewardof generated molecules G, where the reward is an indicatorof ri(G) ≥ δi for all properties 1 ≤ i ≤M∑

GI[∧

iri(G) ≥ δi

]P (G) + λH[P (S)] (15)

We incorporate an entropy regularization term H[P (S)] toencourage the model to explore different types of rationales.

Algorithm 1 Training method with n property constraints.

1: for i = 1 toM do2: V iS ← rationales extracted from existing molecules

Dposi positive to property i. (see §3.1)3: end for4: Construct multi-property rationales V [M ]

S .5: Pre-train P (G|S) on the pre-training dataset Dpre.6: Fine-tune model P (G|S) on Df for L iterations using

policy gradient.7: Compute P (S) based on Eq.(16) using fine-tuned

model P (G|S).

The rationale distribution P (S) is a categorical distributionover the rationale vocabulary. Let I[G] = I [

∧i ri(G) ≥ δi].

It is easy to show that the optimal P (S) has a closed formsolution:

P (Sk) ∝ exp

(1

λ

∑GI[G]P (G|Sk)

)(16)

The remaining question is how to train graph generatorP (G|S). The generator seeks to produce molecules that arerealistic and positive. However, Eq.(15) itself does not takeinto account whether generated molecules are realistic or not.To encourage the model to generate realistic compounds,we train the graph generator in two phases:

• Pre-training P (G|S) using real molecules.

• Fine-tuning P (G|S) using policy gradient with rewardfrom property predictors.

The overall training algorithm is shown in Algorithm 1.

3.3.1. PRE-TRAINING

In addition to satisfying all the property constraints, theoutput of the model should constitute a realistic molecule.For this purpose, we pre-train the graph generator on a largeset of molecules from ChEMBL (Gaulton et al., 2017). Eachtraining example is a pair (S,G), where S is a (random)subgraph of a molecule G. The task is to take as inputsubgraphs and complete them into a full molecule. Given amolecule G, we consider two types of random subgraphs:

• S is a connected subgraph of G with up to N atoms.

• S is a disconnected subgraph of G with multiple con-nected components. This is to simulate the case of gener-ating molecules from multi-property rationales.

Finally, we train the graph generator to maximize the likeli-hood of the pre-training dataset Dpre = {(Gi,Si)}ni=1.

3.3.2. FINE-TUNING

After pre-training, we further fine-tune the graph generatoron property-specific rationales S ∈ VS in order to maximize

Page 6: Abstract arXiv:2002.03244v1 [cs.LG] 8 Feb 2020

Eq.(15). The model is fine-tuned through multiple iterationsusing policy gradient (Sutton et al., 2000). Let Pθt(G|S)be the model trained till tth iteration. In each iteration, weperform the following two steps:

1. Initialize the fine-tuning set Df = ∅. For each ratio-nale Si, use the current model to sample K molecules{G1i , · · · ,GKi } ∼ Pθt(G|Si). Add (Gki ,Si) to set Df ifGki is predicted to be positive.

2. Update the model Pθ(G|S) on the fine-tuning set Dfusing policy gradient method.

After fine-tuning P (G|S), we compute the rationale distri-bution P (S) based on Eq.(16).

4. ExperimentsWe evaluate our method on molecule design tasks under vari-ous combination of property constraints. In our experiments,we consider the following three properties:

• DRD2: Dopamine type 2 receptor activity. The DRD2 ac-tivity prediction model is trained on the dataset providedby Olivecrona et al. (2017), which contains 7219 positiveand 100K negative compounds.

• GNK3β: Inhibition against glycogen synthase kinase-3beta. The GNK3β prediction model is trained on thedataset from Li et al. (2018), which contains 2665 posi-tives and 50K negative compounds.

• JNK3: Inhibition against c-Jun N-terminal kinase-3. TheJNK3 prediction model is also trained on the dataset fromLi et al. (2018) with 740 positives and 50K negatives.

Following Li et al. (2018), the property prediction model isa random forest using Morgan fingerprint features (Rogers& Hahn, 2010). We set the positive threshold δi = 0.5.

Multi-property Constraints We also consider variouscombinations of property constraints:

• GNK3β + JNK3: Jointly inhibiting JNK3 and GSK-3β may provide potential benefit for the treatment ofAlzheimers disease (Li et al., 2018). There exist 316 dualinhibitors already available in the dataset.

• DRD2 + JNK3: JNK3 inhibitors active to DRD2.

• DRD2 + GSK3β: GSK3β inhibitors active to DRD2.

• DRD2 + GSK3β + JNK3: Combining all constraints.

Except the first case, all other combinations have less thanthree positive molecules in the datasets, which impose sig-nificant challenge for molecule design methods.

Evaluation Metrics Our evaluation effort measures variousaspects of molecule design. For each method, we generaten = 5000 molecules and compute the following metrics:

• Success: The fraction of sampled molecules predicted to

be positive (i.e., satisfying all property constraints). Agood model should have a high success rate. Followingprevious work (Olivecrona et al., 2017; You et al., 2018a),we only consider the success rate under property predic-tion, as it is hard to obtain real property measurements.

• Diversity: It is also important for a model generate di-verse range of positive molecules. To this end, we mea-sure the diversity of generated positive compounds bycomputing their pairwise molecular distance sim(X,Y ),which is defined as the Tanimoto distance over Morganfingerprints of two molecules.

• Novelty: Crucially, a good model should discover novelpositive compounds. In this regard, for each generatedpositive compound G, we find its nearest neighbor GSNN

from positive molecules in the training set. We define thenovelty as the fraction of molecules with nearest neighborsimilarity greater than 0.4 (Olivecrona et al., 2017):

Novelty =1

n

∑G

1 [sim(G,GSNN) > 0.4]

Baselines We compare our method against the followingstate-of-the-art generation methods for molecule design:

• REINVENT (Olivecrona et al., 2017) is a RL modelgenerating molecules based on their SMILES strings(Weininger, 1988). To generate realistic molecules, theirmodel is pre-trained over one million molecules fromChEMBL and then finetuned under property reward.

• GCPN (You et al., 2018a) is a RL model which generatesmolecular graphs atom by atom. It uses GAN (Goodfel-low et al., 2014) to help generate realistic molecules.

• GVAE + RL is a graph variational autoencoder which gen-erates molecules atom by atom. The graph VAE architec-ture is the same as our model, but it generates moleculesfrom scratch without using rationales (i.e., S = ∅). Themodel is pre-trained on the same ChEMBL dataset andfine-tuned for each property using policy gradient. This isan ablation study of our method to show the importanceof using rationales.

Rationales Details of the rationales used in our model:

• Single property: For single properties, the rationale sizeis required to be less than 20 atoms. For each positivemolecule, we run 20 iteration of MCTS with cpuct =10. In total, we extracted 8611, 6299 and 417 differentrationales for DRD2, GSK3β and JNK3.

• Two properties: We constructed dual-property rationalesby V iS ⊕ V

jS (i, j ∈ DRD2,GSK3β, JNK3). To limit

the vocabulary size, we cluster the rationales using thealgorithm from Butina (1999) and keep the cluster cen-troids only. In total, we extracted 60K, 9.9K, 9.2K ratio-nales for DRD2+GSK3β, DRD2+JNK3, GSK3β+JNK3respectively.

Page 7: Abstract arXiv:2002.03244v1 [cs.LG] 8 Feb 2020

Table 1. Results on molecule design with single property constraint.

Method DRD2 GSK3β JNK3Success Novelty Diversity Success Novelty Diversity Success Novelty Diversity

GVAE + RL 55.4% 56.2% 0.858 33.2% 76.4% 0.874 57.7% 62.6% 0.832GCPN 40.1% 12.8% 0.880 42.4% 11.6% 0.904 32.3% 4.4% 0.884REINVENT 98.1% 28.8% 0.798 99.3% 61.0% 0.733 98.5% 31.6% 0.729Ours 100% 84.9% 0.866 100% 76.8% 0.850 100% 81.4% 0.847

Table 2. Results on molecule design with two property constraints.

Method DRD2 + GSK3β DRD2 + JNK3 GSK3β + JNK3Success Novelty Diversity Success Novelty Diversity Success Novelty Diversity

GVAE + RL 96.1% 0% 0.447 98.5% 0% 0.0 40.7% 80.3% 0.783GCPN 0.1% 40% 0.793 0.1% 25% 0.785 3.5% 8.0% 0.874REINVENT 98.7% 100% 0.584 91.6% 100% 0.533 97.4% 39.7% 0.595Ours 100% 100% 0.866 100% 100% 0.837 100% 94.4% 0.844

Table 3. Molecule design with three property constraints.

Method DRD2 + GSK3β + JNK3Success Novelty Diversity

GVAE + RL 0% 0% 0.0GCPN 0% 0% 0.0REINVENT 48.3% 100% 0.166Ours 86.2% 100% 0.726

• Three properties: The rationale for three properties isconstructed as V DRD2

S ⊕V GSK3βS ⊕V JNK3

S . In total, weextracted 8163 different rationales for this task.

Model Setup Our model (and all baselines) are pretrainedon the same ChEMBL dataset from Olivecrona et al. (2017).We fine-tune our model for L = 5 iterations, where eachrationale is expanded for K = 20 times.

4.1. Results

The results on all design tasks are reported in Table 1-3. Onthe single-property design tasks, our model and REINVENTdemonstrate nearly perfect success rate since there is onlyone constraint. We also outperform all the baselines in termsof novelty metric. Though our diversity score is slightlylower than GCPN, our success rate and novelty score ismuch higher.

Table 2 summarizes the results on dual-property designtasks. On the GSK3β+JNK3 task, we achieved 100% suc-cess rate while maintaining 94.4% novelty and 0.844 di-versity score. Meanwhile, GCPN fails to discover positivecompounds on all the tasks due to reward sparsity. GVAE +RL fails to discover novel positives on the DRD2+GSK3β

and DRD2+JNK3 tasks because there are less than threepositive compounds available for training. Therefore it canonly learn to replicate existing positives. Our method is ableto succeed in all cases regardless of training data scarcity.

Table 3 shows the results on the three-property design task,which is the most challenging. The difference between ourmodel the baselines become significantly larger. In fact,GVAE + RL and GCPN completely fail in this task due toreward sparsity. Our model outperforms REINVENT witha wide margin (success rate: 86.8% versus 48.3%; diversity:0.726 versus 0.166).

When there are multiple properties, our method significantlyoutperforms GVAE + RL, which has the same generativearchitecture but does not utilize rationales and generatesmolecules from scratch. Thus we conclude the importanceof rationales for multi-property molecular design.

Visualization We further provide visualizations to help un-derstand our model. In Figure 5, we plotted a t-SNE (Maaten& Hinton, 2008) plot of the extracted rationales for GSK3βand JNK3. For both properties, rationales mostly cover thechemical space populated by existing positive molecules.The generated GSK3β+JNK3 dual inhibitors mostly appearin the place where GSK3β and JNK3 rationales overlap. InFigure 6, we show examples of molecules generated fromdual-property rationales. Each rationale has two connectedcomponents: one from GSK3β and the other from JNK3.

4.2. Rationale Sanity Check

While our rationales are mainly extracted for generation, itis also important for them to be chemically relevant. In otherwords, the extracted rationales should accurately explain

Page 8: Abstract arXiv:2002.03244v1 [cs.LG] 8 Feb 2020

GSK3 GSK3 Rationale JNK3 JNK3 RationaleGSK3 RationaleJNK3 Rationale

Generated

Figure 5. Left & middle: t-SNE plot of the extracted rationales for GSK3β and JNK3. For both properties, rationales mostly covers thechemical space populated by existing positive molecules. Right: t-SNE plot of generated GSK3β+JNK3 dual inhibitors.

Figure 6. Left: Examples of molecules generated from dual-property rationales. The model learns to combine two disjoint rationalegraphs. The added subgraphs are highlighted in red. Right: Example structural alerts in the toxicity dataset. The ground truth rationale(Azobenzene) is highlighted in red. Our learned rationale almost matches the ground truth (error highlighted in dashed circle).

Table 4. Rationale accuracy on the toxicity dataset.

Method Partial Match Exact MatchIntegrated Gradient 0.857 39.4%MCTS Rationale 0.861 46.0%

the property of interest. As there is no ground truth ratio-nales available for DRD2, JNK3 and GSK3β, we turn toan auxiliary toxicity dataset for evaluating rationale quality.Specifically, our dataset contains 100K molecules randomlyselected from ChEMBL. Each molecule is labeled as toxic ifit contains structural alerts (Sushko et al., 2012) — chemicalsubstructures that is correlated with human or environmen-tal hazards (see Figure 6).2 We trained a neural networkto predict toxicity on this dataset. Under this setup, thestructural alerts are ground truth rationales and we evaluatehow often the extracted rationales match them.

We compare our MCTS based rationale extraction with inte-

2Structural alerts used in our paper are from surechembl.org/knowledgebase/169485-non-medchem-friendly-smarts

grated gradient (Sundararajan et al., 2017), which has beenapplied to explain property prediction models (McCloskeyet al., 2019). We report two metrics: partial match AUC (at-tribution AUC metric used in McCloskey et al. (2019)) andexact match accuracy which measures how often a rationalegraph exactly matches the true rationale in the molecule.As shown in Table 4, our method significantly outperformsthe baseline in terms of exact matching. The extracted ra-tionales has decent overlap with true rationales, with 0.86partial match on average. Therefore, our model is capableof finding rationales that are chemically meaningful.

5. ConclusionIn this paper, we developed a rationale based generativemodel for molecular design. Our model generates moleculesin two phases: 1) identifying rationales whose presence in-dicate strong positive signals for each property; 2) expand-ing rationale graphs into molecules using graph generativemodels and fine-tuning it towards desired combination ofproperties. Our model demonstrates strong improvementover prior reinforcement learning methods in various tasks.

Page 9: Abstract arXiv:2002.03244v1 [cs.LG] 8 Feb 2020

ReferencesButina, D. Unsupervised data base clustering based on

daylight’s fingerprint and tanimoto similarity: A fast andautomated way to cluster small and large data sets. Jour-nal of Chemical Information and Computer Sciences, 39(4):747–750, 1999.

Dai, H., Tian, Y., Dai, B., Skiena, S., and Song, L. Syntax-directed variational autoencoder for structured data. arXivpreprint arXiv:1802.08786, 2018.

De Cao, N. and Kipf, T. Molgan: An implicit genera-tive model for small molecular graphs. arXiv preprintarXiv:1805.11973, 2018.

Doshi-Velez, F. and Kim, B. Towards a rigorous sci-ence of interpretable machine learning. arXiv preprintarXiv:1702.08608, 2017.

Gaulton, A., Hersey, A., Nowotka, M., Bento, A. P., Cham-bers, J., Mendez, D., Mutowo, P., Atkinson, F., Bellis,L. J., Cibrian-Uhalte, E., et al. The chembl database in2017. Nucleic acids research, 45(D1):D945–D954, 2017.

Gomez-Bombarelli, R., Wei, J. N., Duvenaud, D.,Hernandez-Lobato, J. M., Sanchez-Lengeling, B., She-berla, D., Aguilera-Iparraguirre, J., Hirzel, T. D., Adams,R. P., and Aspuru-Guzik, A. Automatic chemical de-sign using a data-driven continuous representation ofmolecules. ACS Central Science, 2018. doi: 10.1021/acscentsci.7b00572.

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B.,Warde-Farley, D., Ozair, S., Courville, A., and Bengio,Y. Generative adversarial nets. In Advances in neuralinformation processing systems, pp. 2672–2680, 2014.

Guimaraes, G. L., Sanchez-Lengeling, B., Farias, P. L. C.,and Aspuru-Guzik, A. Objective-reinforced generative ad-versarial networks (organ) for sequence generation mod-els. arXiv preprint arXiv:1705.10843, 2017.

Jin, W., Barzilay, R., and Jaakkola, T. Junction tree varia-tional autoencoder for molecular graph generation. Inter-national Conference on Machine Learning, 2018.

Jin, W., Barzilay, R., and Jaakkola, T. Hierarchicalgraph-to-graph translation for molecules. arXiv preprintarXiv:1907.11223, 2019a.

Jin, W., Yang, K., Barzilay, R., and Jaakkola, T. Learn-ing multimodal graph-to-graph translation for molecularoptimization. International Conference on Learning Rep-resentations, 2019b.

Kajino, H. Molecular hypergraph grammar with its ap-plication to molecular optimization. arXiv preprintarXiv:1809.02745, 2018.

Kang, S. and Cho, K. Conditional molecular design withdeep generative models. Journal of chemical informationand modeling, 59(1):43–52, 2018.

Kusner, M. J., Paige, B., and Hernandez-Lobato, J. M.Grammar variational autoencoder. arXiv preprintarXiv:1703.01925, 2017.

Lee, G.-H., Jin, W., Alvarez-Melis, D., and Jaakkola, T. S.Functional transparency for structured data: a game-theoretic approach. arXiv preprint arXiv:1902.09737,2019.

Lei, T., Barzilay, R., and Jaakkola, T. Rationalizing neuralpredictions. arXiv preprint arXiv:1606.04155, 2016.

Li, Y., Zhang, L., and Liu, Z. Multi-objective de novo drugdesign with conditional graph generative model. arXivpreprint arXiv:1801.07299, 2018.

Liu, Q., Allamanis, M., Brockschmidt, M., and Gaunt, A. L.Constrained graph variational autoencoders for moleculedesign. Neural Information Processing Systems, 2018.

Ma, T., Chen, J., and Xiao, C. Constrained generationof semantically valid graphs via regularizing variationalautoencoders. In Advances in Neural Information Pro-cessing Systems, pp. 7113–7124, 2018.

Maaten, L. v. d. and Hinton, G. Visualizing data usingt-sne. Journal of machine learning research, 9(Nov):2579–2605, 2008.

McCloskey, K., Taly, A., Monti, F., Brenner, M. P., and Col-well, L. J. Using attribution to decode binding mechanismin neural network models for chemistry. Proceedings ofthe National Academy of Sciences, 116(24):11624–11629,2019.

Murray, C. W. and Rees, D. C. The rise of fragment-baseddrug discovery. Nature chemistry, 1(3):187, 2009.

Olivecrona, M., Blaschke, T., Engkvist, O., and Chen, H.Molecular de-novo design through deep reinforcementlearning. Journal of cheminformatics, 9(1):48, 2017.

Popova, M., Isayev, O., and Tropsha, A. Deep reinforcementlearning for de novo drug design. Science advances, 4(7):eaap7885, 2018.

Ribeiro, M. T., Singh, S., and Guestrin, C. ” why shouldi trust you?” explaining the predictions of any classifier.In Proceedings of the 22nd ACM SIGKDD internationalconference on knowledge discovery and data mining, pp.1135–1144, 2016.

Rogers, D. and Hahn, M. Extended-connectivity finger-prints. Journal of chemical information and modeling, 50(5):742–754, 2010.

Page 10: Abstract arXiv:2002.03244v1 [cs.LG] 8 Feb 2020

Rosin, C. D. Multi-armed bandits with episode context.Annals of Mathematics and Artificial Intelligence, 61(3):203–230, 2011.

Samanta, B., De, A., Jana, G., Chattaraj, P. K., Gan-guly, N., and Gomez-Rodriguez, M. Nevae: A deepgenerative model for molecular graphs. arXiv preprintarXiv:1802.05283, 2018.

Seff, A., Zhou, W., Damani, F., Doyle, A., and Adams, R. P.Discrete object generation with reversible inductive con-struction. In Advances in Neural Information ProcessingSystems, pp. 10353–10363, 2019.

Segler, M. H., Kogej, T., Tyrchan, C., and Waller, M. P.Generating focussed molecule libraries for drug dis-covery with recurrent neural networks. arXiv preprintarXiv:1701.01329, 2017.

Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou,I., Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M.,Bolton, A., et al. Mastering the game of go withouthuman knowledge. Nature, 550(7676):354–359, 2017.

Simonovsky, M. and Komodakis, N. Graphvae: Towardsgeneration of small graphs using variational autoencoders.arXiv preprint arXiv:1802.03480, 2018.

Stolle, M. and Precup, D. Learning options in reinforcementlearning. In International Symposium on abstraction,reformulation, and approximation, pp. 212–223. Springer,2002.

Sundararajan, M., Taly, A., and Yan, Q. Axiomatic attri-bution for deep networks. In Proceedings of the 34thInternational Conference on Machine Learning-Volume70, pp. 3319–3328. JMLR. org, 2017.

Sushko, I., Salmina, E., Potemkin, V. A., Poda, G., andTetko, I. V. Toxalerts: a web server of structural alerts fortoxic chemicals and compounds with potential adversereactions, 2012.

Sutton, R. S., Precup, D., and Singh, S. Between mdpsand semi-mdps: A framework for temporal abstraction inreinforcement learning. Artificial intelligence, 112(1-2):181–211, 1999.

Sutton, R. S., McAllester, D. A., Singh, S. P., and Mansour,Y. Policy gradient methods for reinforcement learningwith function approximation. In Advances in neural in-formation processing systems, pp. 1057–1063, 2000.

Weininger, D. Smiles, a chemical language and informationsystem. 1. introduction to methodology and encodingrules. Journal of chemical information and computersciences, 28(1):31–36, 1988.

Ying, Z., Bourgeois, D., You, J., Zitnik, M., and Leskovec, J.Gnnexplainer: Generating explanations for graph neuralnetworks. In Advances in Neural Information ProcessingSystems, pp. 9240–9251, 2019.

You, J., Liu, B., Ying, R., Pande, V., and Leskovec, J. Graphconvolutional policy network for goal-directed molecu-lar graph generation. arXiv preprint arXiv:1806.02473,2018a.

You, J., Ying, R., Ren, X., Hamilton, W. L., and Leskovec,J. Graphrnn: A deep generative model for graphs. arXivpreprint arXiv:1802.08773, 2018b.

Zhou, Z., Kearnes, S., Li, L., Zare, R. N., and Riley, P. Op-timization of molecules via deep reinforcement learning.arXiv preprint arXiv:1810.08678, 2018.