A Generative Model for Molecular Distance Geometrynot targeting a distribution over conformations...

A Generative Model for Molecular Distance Geometry

Gregor N. C. Simm * 1 José Miguel Hernández-Lobato 1

Abstract

Great computational effort is invested in generat-ing equilibrium states for molecular systems us-ing, for example, Markov chain Monte Carlo. Wepresent a probabilistic model that generates statis-tically independent samples for molecules fromtheir graph representations. Our model learns alow-dimensional manifold that preserves the ge-ometry of local atomic neighborhoods through aprincipled learning representation that is based onEuclidean distance geometry. In a new benchmarkfor molecular conformation generation, we showexperimentally that our generative model achievesstate-of-the-art accuracy. Finally, we show how touse our model as a proposal distribution in an im-portance sampling scheme to compute molecularproperties.

1. IntroductionOver the last few years, many highly-effective deep learningmethods generating small molecules with desired proper-ties (e.g., novel drugs) have emerged (Gómez-Bombarelliet al., 2018; Segler et al., 2018; Dai et al., 2018; Jin et al.,2018; Bradshaw et al., 2019a; Liu et al., 2018; You et al.,2018; Bradshaw et al., 2019b). These methods operateusing graph representations of molecules in which nodesand edges represent atoms and bonds, respectively. A rep-resentation that is closer to the physical system is one inwhich a molecule is described by its geometry or conforma-tion. A conformation x of a molecule is defined by a set ofatoms {(�i, ri)}Nvi=1, whereNv is the number of atoms in themolecule, �i ∈ {H,C,O, ...} is the chemical element of theatom i, and ri ∈ R3 is its position in Cartesian coordinates.Importantly, the relative positions of the atoms are restrictedby the bonds in the molecule and the angles between them.Due to thermal fluctuations resulting in stretching of and

1Department of Engineering, University of Cambridge,Cambridge, UK. Correspondence to: Gregor N. C. Simm.

Proceedings of the 37 th International Conference on MachineLearning, Online, PMLR 119, 2020. Copyright 2020 by the au-thor(s).

Figure 1. Standard graph representation G of a molecule (left)with a set of possible conformations {xi} (right). It is the goal ofthis work to generate such conformations from the graph represen-tation of a molecule. Conformations feature the same atom typesand bonds but the atoms are arranged differently in space. Thesedifferences arise from rotations around and stretching of bonds inthe molecule. Hydrogen (H), carbon (C), and oxygen (O) atomsare colored white, gray, and red, respectively.

rotations around bonds, there exist infinitely many confor-mations of a molecule. A molecule’s graph representationand a set of its conformations are shown in Fig. 1. Undera wide range of conditions, the probability p(x) of a con-formation x, is governed by the Boltzmann distribution andis proportional to exp{−E(x)/kBT}, where E(x) is theconformation’s energy, kB is the Boltzmann constant, andT is the temperature.

To compute a molecular property for a molecule, one mustsample from p(x). The main approach is to start with oneconformation and make small changes to it over time, e.g.,by using Markov chain Monte Carlo (MCMC) or molec-ular dynamics (MD). These methods can be used to ac-curately sample equilibrium states of molecules, but theybecome computationally expensive for larger ones (Shim& MacKerell, 2011; Ballard et al., 2015; De Vivo et al.,2016). Other heuristic approaches exist in which distancesbetween atoms are set to fixed idealized values (Havel, 2002;Blaney & Dixon, 2007). Several methods based on statis-tical learning have also recently been developed to tacklethe issue of conformation generation. However, they aremainly geared towards studying proteins and their foldingdynamics (AlQuraishi, 2019). Some of these models arenot targeting a distribution over conformations but the moststable folded configuration, e.g. AlphaFold (Senior et al.,

arX

iv:1

909.

1145

9v4

[st

at.M

L]

13

Aug

202

0


2020), while others are not transferable between differentmolecules (Lemke & Peter, 2019; Noé et al., 2019).

This work includes the following key contributions:

• We introduce a novel probabilistic model for learningconformational distributions of molecules with graphneural networks.

• We create a new, challenging benchmark for conforma-tion generation, which is made publicly available. Tothe best of our knowledge, this is the first benchmarkof this kind.

• By combining a conditional variational autoencoder(CVAE) with an Euclidean distance geometry (EDG)algorithm we present a state-of-the-art approach forgenerating one-shot samples of molecular conforma-tions for unseen molecules that is independent of theirsize and shape.

• We develop a rigorous experimental approach for eval-uating and comparing the accuracy of conformationgeneration methods based on the mean maximum devi-ation distance metric.

• We show how this generative model can be used as aproposal distribution in an importance sampling (IS)scheme to estimate molecular properties.

2. MethodOur goal is to build a statistical model that generates molec-ular conformations in a one-shot fashion from a molecule’sgraph representation. First, we describe how a molecule’sconformation can be represented by a set of pairwise dis-tances between atoms and why this presentation is advan-tageous over one in Cartesian coordinates (Section 2.1).Second, we present a generative model in Section 2.2 thatwill generate sets of atomic distances for a given molecu-lar graph. Third, we explain in Section 2.3 how a set ofpredicted distances can be transformed into a molecular con-formation and why this transformation is necessary. Finally,we detail in Section 2.4 how our generative model can beused as a proposal distribution in an IS scheme to estimatemolecular properties.

2.1. Extended Molecular Graphs and DistanceGeometry

In this study, a molecule is represented by an undirectedgraph which is defined as a tupleG = (V,E). V = {vi}Nvi=1is the set of nodes representing atoms, where each vi ∈ RFvholds atomic attributes (e.g., the element type �i). E ={(ek, rk, sk)}Nek=1 is the set of edges, where each ek ∈ RFeholds an edge’s attributes (e.g., the bond type), and rk and

sk are the nodes an edge is connecting. Here, E representsthe molecular bonds in the molecule.

We assume that, given a molecular graph G, one can repre-sent one of its conformations x by a set of atomic distancesd = {dk}Nek=1, where dk = |rrk − rsk | is the Euclideandistance between the positions of the atoms rk and sk inthis conformation. As the set of edges between the bondedatoms (Ebond) alone would not suffice to describe a confor-mation, we expand the traditional graph representation of amolecule by adding auxiliary edges to obtain an extendedgraph G. Auxiliary edges between atoms that are secondneighbors in the original graph G fix angles between atoms,and those between third neighbors fix dihedral angles (de-noted Eangle and Edihedral, respectively). In this work, Eangleare added between nodes in G which are second neighborsin G. After all Eangle have been added, additional edgesare added to G from a node v to a randomly chosen thirdneighbor of v in G if v has less then three neighbors inG. Therefore, a graph G can give rise to multiple differentextend graphs G. In Fig. 2, the process of extending themolecular graph and the extraction of d from x and G areillustrated.

A key advantage of a representation in terms of distancesis its invariance to rotation and translation; by contrast,Cartesian coordinates depend on the (arbitrary) choice oforigin, for example. In addition, it reflects pair-wise physicalinteractions and their generally local nature. Auxiliary edgescan be placed between higher-order neighbors depending onhow far the physical interactions dominating the potentialenergy of the system reach.

We have a set of NG pairs, {Gi, xi}NGi=1, consisting of amolecular graph and a conformation. With the protocoldescribed above, we convert each pair into a pair of anextended molecular graph together with a set of distancesd to obtain {Gi,di}NGi=1. With this data, we will train agenerative model which we detail in the following section.

2.2. Generative Model

We employ a CVAE (Kingma & Welling, 2014; Pagnoniet al., 2018) to model the distribution over distances d givena molecular graph G. A CVAE first encodes G together withd into a latent space z ∈ RkNv , where k ∈ N+, with anencoder qφ(z|d,G). Subsequently, the decoder pθ(d|z,G)decodes z back into a set of distances. A graphical model isshown in Fig. 2, C.

A conformation has, in general, 3Nv − 6 spatial degrees offreedom (dofs): one dof per spacial dimension per atom mi-nus three translational and three rotational dofs. Therefore,the latent space should be proportional to the number ofatoms in the molecule. In addition, the latent space shouldbe smaller than 3Nv as it is the role of the encoder to project


A B C

OH

H

HH ...

Figure 2. A) The structural formula of a molecule G is converted to an extended molecular graph G consisting of nodes representingatoms (circles, e.g., v1) and edges representing molecular bonds (solid lines, e.g., e1 ∈ Ebond) and auxiliary edges (dotted lines, e.g.,e2 ∈ Eangle and e3 ∈ Edihedral). B) The distances d are extracted from a conformation x based on the edges E. C) Graphical model of thevariational autoencoder: generative model pθ(d|z,G)pθ(z|G) (solid lines) and variational approximation qφ(z|d,G) (dashed lines).

the conformation into a lower-dimensional space. As aresult, we set k = 1.1

Here, qφ(z|d,G) and pθ(d|z,G) are Gaussian distributions,the mean and variance of which are modeled by two ar-tificial neural networks. At the center of this model aremessage-passing neural networks (MPNNs) (Gilmer et al.,2017). In short, an MPNN is a convolutional neural net-work that allows end-to-end learning of prediction pipelineswhose inputs are graphs of arbitrary size and shape. In aconvolution, neighboring nodes exchange so-called mes-sages between neighbors to update their attributes. Edgesupdate their attributes with the features of the nodes theyare connecting. The MPNN is a well-studied techniquethat achieves state-of-the-art performance in representationlearning for molecules (Kipf & Welling, 2017; Duvenaudet al., 2015; Kearnes et al., 2016; Schütt et al., 2017b; Gilmeret al., 2017; Kusner et al., 2017; Bradshaw et al., 2019a).

In the following, we describe the details of the modewhich is illustrated in Fig. 3.2 In the encoder qφ(z|d,G),each dk is concatenated with the respective edge fea-ture ek to give e′k ∈ RFe+1. Then, each vi andeach e′k are passed to Fenc,v and Fenc,e (two multilayerperceptrons, MLPs), respectively, to give G(0)enc , whereG(t)enc = ({v(t)i,enc}

Nvi=1, {(e

(t)k,enc, rk, sk)}

Nek=1), v

(t)i,enc ∈ RLv ,

and e(t)k,enc ∈ RLe . Then, T MPNNs of depth 1, {MP(t)enc}Tt=1,

are consecutively applied to obtain G(T )enc . Finally, the read-out function Renc (an MLP) takes each v

(T )i, enc to predict the

mean µzi ∈ R and the variance σ2zi ∈ R of the Gaussiandistribution for zi. The so-called reparametrization trick isemployed to draw a sample for zi. In summary,

v(0)i,enc = Fenc,v(vi), e

(0)k,enc = Fenc,e(e

′i), (1)

1 Experiments showed that our model performs similarly wellwith a latent space of R2Nv and R3Nv . We chose to use k = 1 forsimplicity.

2The model is available online https://github.com/gncs/graphdg

G(t+1)enc = MP(t)enc(G(t)enc), (2)

µzi , σ2zi = Renc(v

(T )i,enc). (3)

In the decoder pθ(d|z,G), each zi is concatenated withthe respective node feature vi to give v′i ∈ RFv+1.Each v′i and each ek are passed to Fdec,v and Fdec,e(two MLPs), respectively, to give G(0)dec , where G

(t)dec =

({v(t)i,dec}Nvi=1, {(e

(t)k,dec, rk, sk)}

Nek=1), v

(t)i,dec ∈ RLv , and

e(t)k,dec ∈ RLe . Then, T MPNNs of depth 1, {MP

(t)dec}Tt=1,

are consecutively applied to obtain G(T )dec . Finally, the read-out function Rdec (an MLP) takes each e

(T )k, dec to predict the

mean µdk ∈ R and the variance σ2dk ∈ R of the Gaussiandistribution for dk. In summary,

v(0)i,dec = Fdec,v(v

′i), e

(0)k,dec = Fdec,e(ei), (4)

G(t+1)dec = MP(t)dec(G

(t)dec), (5)

µdk , σ2dk

= Rdec(e(T )k,dec). (6)

The sets of parameters in the encoder and decoder, φ and θ(i.e., parameters in Fenc,v, Fenc,e, {MP(t)enc}Tt=1, Renc, Fdec,v,Fdec,e, {MP(t)dec}Tt=1, Rdec), respectively, are optimized bymaximizing the evidence lower bound (ELBO):

L = Ez∼qφ(z|d,G)[log pθ(d|z,G)]−DKL[qφ(z|d,G)||pθ(z|G)],

(7)

where the prior pθ(z|G) consists of factorized standard Gaus-sians. The optimal values for the hyperparameters for thenetwork dimensions, number of message passes, batch size,and learning rate of the Adam optimizer (Kingma & Ba,2014) were manually tuned by maximizing the validationperformance (ELBO) and are reported in the Appendix.

2.3. Conformation Generation through EuclideanDistance Geometry

To compute molecular properties, quantum-chemical meth-ods need to be employed which require the input, i.e., the

https://github.com/gncs/graphdghttps://github.com/gncs/graphdg


+

+

...

+

......

......

.........

...

...

...

Figure 3. The molecular graph G together with the distances d are passed through the model consisting of an encoder qφ(z|d,G) and adecoder pθ(d|z,G). See Section 2.2 for details.

molecule, to be in Cartesian coordinates.3 Therefore, we usean EDG algorithm to translate the set of distances {dk}Nek=1to a set of atomic coordinates {ri}Nvi=1.4

EDG is the mathematical basis for a geometric theory ofmolecular conformation. In the field of machine learning,Weinberger & Saul (2006) used it for learning image man-ifolds, Tenenbaum et al. (2000) for image understandingand handwriting recognition, Jain & Saul (2004) for speechand music, and Demaine et al. (2009) for music and musi-cal rhythms. An EDG description of a molecular systemconsists of a list of lower and upper bounds on the dis-tances between pairs of atoms {(dk,min, dk,max)}Nek=1. Here,pθ(d|z,G) is used to model these bounds, namely, we setthe bounds to {(µdk − σdk , µdk + σdk)}, where µdk andσdk are the mean and standard deviation for each distancedk given by the CVAE. Then, an EDG algorithm determinesa set of Cartesian coordinates {ri}Nvi=1 so that these boundsare fulfilled (see the Appendix for details).5 Together withthe corresponding chemical elements {�i}Nvi=1, we obtain aconformation x.

2.4. Calculation of Molecular Properties

We can get an MC estimate of the expectation E[O] of aproperty O (e.g., the dipole moment) for a molecule rep-

3 Even though quantum-chemical methods require the inputto be in Cartesian coordinates, calculated properties, such as theenergy, are invariant under translation and rotation.

4 There are additional constraints due to chirality. However,since they are given by G and are fixed, they are not modeled byour method.

5 Often there exist multiple solutions for the same set of bounds.As the bounds are generally tight, the solutions are very similar.Therefore, we only generate one set of coordinates per set ofbounds.

resented by G by generating an extended graph G, draw-ing conformational samples xi ∼ p(x|G), and comput-ing O(xi) ∈ R with a quantum-chemical method (e.g.,density functional theory). Since we cannot draw sam-ples from p(x|G) directly, we employ an IS integrationscheme (Bishop, 2009) with our CVAE as the proposaldistribution. We assume that we can readily evaluate theunnormalized probability of a conformation p̃(x|G) =exp{−E(x)/kBT}, where x must be a conformation ofthe molecule and the energy E(x) is determined witha quantum-chemical method. Since the EDG algorithmis mapping the distribution pθ(d|z,G) to a point massin R3Nv , the MC estimate for the resulting distributionpprop(x|G) is approximated by a mixture of delta functions,each of which is centered at the xi resulting from map-ping pθ(d|zi,G) to R3Nv , where zi ∼ pθ(z|G), that is,pprop(x|G) ≈ 1N

∑Ni=1 δ(x− xi). The IS estimator for the

expectation of O w. r. t. p̃(x|G) then reads

ÊG [O]MC≈ 1

N

N∑i=1

O(xi)IS=

1

N

N∑i=1

O(x′i)p̃(x′i|G)

pprop(x′i|G),

(8)where xi ∼ p̃(xi|G) and x′i ∼ pprop(x′i|G), so that theexpectation of O w. r. t. the normalized version of p̃(x) isthen

EG [O] =ÊG [O]ÊG [1]

≈ 1Z

N∑i=1

O(xi)p̃(x′i|G), (9)

where ÊG [1] is the expectation of an operator that returns 1for every conformation x, Z ≈

∑Ni=1 p̃(x

′i | G), and N is

the number of samples. When dividing two delta functionswe have assumed that they take some arbitrarily large finitevalue.


3. Related WorksThe standard approach for generating molecular conforma-tions is to start with one, and make small changes to it overtime, e.g., by using MCMC or MD. These methods are con-sidered the gold standard for sampling equilibrium states,but they are computationally expensive, especially if themolecule is large and the Hamiltonian is based on quantum-mechanical principles (Shim & MacKerell, 2011; Ballardet al., 2015; De Vivo et al., 2016).

A much faster but more approximate approach for confor-mation generation is EDG (Havel, 2002; Blaney & Dixon,2007; Lagorce et al., 2009; Riniker & Landrum, 2015).Lower and upper distance bounds for pairs of atoms ina molecule are fixed values based on ideal bond lengths,bond angles, and torsional angles. These values are often ex-tracted from crystal structure databases (Allen, 2002). Thesemethods aim to produce a low-energy conformation, not togenerate unbiased samples from the underlying distributionat a certain temperature.

There exist several machine learning approaches as well,however, they are mostly tailored towards studying proteindynamics. For example, Noé et al. (2019) trained Boltzmanngenerators on the energy function of proteins to provide un-biased, one-shot samples from their equilibrium states. Thisis achieved by training an invertible neural network to learna coordinate transformation from a systems configurationsto a latent space representation. Further, Lemke & Peter(2019) proposed a dimensionality reduction algorithm thatis based on a neural network autoencoder in combinationwith a nonlinear distance metric to generate samples forprotein structures. Both models learn protein-specific coor-dinate transformations that cannot be transferred to othermolecules.

AlQuraishi (2019) introduced an end-to-end differentiablerecurrent geometric network for protein structure learningbased on amino acid sequences. Also, Ingraham et al. (2019)proposed a neural energy simulator model for protein struc-ture that makes use of protein sequence information. Re-cently, Senior et al. (2020) significantly advanced the fieldof protein-structure prediction with a new model called Al-phaFold. In contrast to amino acid sequences, moleculargraphs are, in general, not linear but highly branched and of-ten contain cycles. This makes these approaches unsuitablefor general molecules.

Finally, Mansimov et al. (2019) presented a conditionaldeep generative graph neural network to generate molecularconformations given a molecular graph. Their goal is to pre-dict the most likely conformation and not a distribution overconformations. Instead of encoding molecular environmentsin atomic distances, they work directly in Cartesian coor-dinates. As a result, the generated conformations showed

significant structural differences compared to the ground-truth and required refinement through a force field, which isoften employed in MD simulations.

We argue that our model has several advantages over theapproaches reviewed above:

• It is a fast alternative to resource-intensive approachesbased on MCMC or MD.

• Our principled representation based on pair-wise dis-tances does not restrict our approach to any particularmolecular structure.

• Our model is, in principle, transferable to unseenmolecules.

4. The CONF17 BenchmarkThe CONF17 benchmark is the first benchmark for molec-ular conformation sampling.6 It is based on the ISO17dataset (Schütt et al., 2017a) which consists of conforma-tions of various molecules with the atomic compositionC7H10O2 drawn from the QM9 dataset (Ramakrishnanet al., 2014). These conformations were generated by ab ini-tio molecular dynamics simulations at 500 Kelvin. From theISO17 dataset, 430692 valid molecular graph-conformationpairs could be extracted and 197 unique molecular graphscould be identified. We split the dataset into training andtest sets such that no molecular graph in the training set canbe found in the test or vice versa. Training and test splitsconsist of 176 and 30 unique molecular graphs, respectively(see Appendix A for details).

In Fig. 4, A, the structural formulae of a random selection ofmolecules from this benchmark are shown. Most moleculesfeature highly-strained, complex 3D structures such as ringswhich are typical of drug-like molecules. It is thus thestructural complexity of the molecules, not their number ofdegrees of freedom, that makes this benchmark challenging.In Fig. 4, B–D, the frequency of distances (in Å) in theconformations are shown for each edge type. It can be seenthat the marginal distributions of the edge distances aremultimodal and highly context-dependent.

5. ExperimentsWe assess the performance of our method, named GraphDistance Geometry (GRAPHDG), by comparing it with twostate-of-the-art methods for molecular conformation gen-eration: RDKIT (Riniker & Landrum, 2015), a classicalEDG approach, and DL4CHEM (Mansimov et al., 2019),

6 Datasets such as the one published by Kanal et al. (2018) onlyinclude conformers, i.e., the stable conformations of a molecule,and not a distribution over conformations.


CH3O

O

H3CH3C

HOO

H3C

CH3O

O

OHH3C

O

1.0 1.5Distance [ ]

0

250

500

750

1000

1250

1500

1750

Coun

t

1e3

H-CH-OC-CC-O

2 3Distance [ ]

0

500

1000

1500

2000

Coun

t

1e3

H-HH-CH-OC-CC-OO-O

2 3 4Distance [ ]

0

20

40

60

80

100

120

Coun

t

1e3

H-HH-CH-OC-CC-OO-O

A B

C D

Figure 4. Overview of the CONF17 benchmark. A: Structuralformulae of a random selection of molecules. B–D: Distributionof distances (in Å) grouped by edge (B: Ebond, C: Eangle, and D:Edihedral) and vertex type (chemical element).

a machine learning approach. We trained GRAPHDG andDL4CHEM on three different training and test splits of theCONF17 benchmark using Adam (Kingma & Ba, 2014). Wegenerated 100 conformations with each method for molecu-lar graphs in a test set.

5.1. Distributions Over Distances

We assessed the accuracy of the distance distributions ofRDKIT, DL4CHEM, and GRAPHDG by calculating themaximum mean discrepancy (MMD) (Gretton et al., 2012)to the ground-truth distribution. In particular, we computethe MMD using a Gaussian kernel, where we set the stan-dard deviation to be the median distance between distancesd in the aggregate sample. For this, we determined the dis-tances in the conformations from the ground-truth and thosegenerated by RDKIT, DL4CHEM, and GRAPHDG. Foreach train-test split and each G in a test set, we compute theMMD of the joint distribution of distances between C and Oatoms p({dk}|G) (H atoms are usually ignored), the MMDsof pair-wise distances p(di, dj |G), and the MMDs betweenthe marginals of individual distances p(di|G). We aggregatethe results of three train-test splits, and, finally, computethe median MMDs and average rankings. The results are

summarized in Table 1. It can be seen that the samples fromGRAPHDG are significantly closer to the ground-truth dis-tribution than the other methods. RDKIT is slightly worsethan GRAPHDG while DL4CHEM seems to struggle withthe complexity of the molecules and the small number ofgraphs in the training set.

In Fig. 5, we showcase the accuracy of our model by plottingthe marginal distributions p(di|G) for distances between Cand O atoms, given a molecular graph from a test set. Itcan be seen that RDKIT consistently underestimates themarginal variances. This is because this method aims topredict the most stable conformation, i.e., the distribution’smode. In contrast, DL4CHEM often fails to predict thecorrect mean. For this molecule, GRAPHDG is the mostaccurate, predicting the right mean and variance in mostcases. Additional figures can be found in the Appendix,where we also show plots for the marginal distributionsp(di, dj |G).

5.2. Generation of Conformations

We passed the distances from our generative model to anEDG algorithm to obtain conformations. For 99.9% of thesets of distances, all triangle inequalities held. For 83% ofthe molecular graphs, the algorithm succeeded which is 7 pphigher than the success rate we observed for RDKIT. Foreach molecular graph in a test set, we generated 50 confor-mations with each method. This took DL4CHEM, RDKIT,and GRAPHDG on average around hundreds of millisec-onds per molecule.7 In contrast, a single conformation inthe ISO17 dataset takes around a minute to compute.

To assess the approximations made in the IS scheme, westudied the overlap between p(d|z,G) for a given G anddifferent samples of z. We found experimentally that for 50samples the overlap between the distributions is small. Thisfinding can be explained by the high dimensionality of dwhich is on average ≈ 60.

In Fig. 6, an overlay of these conformations of six moleculesgenerated by the different methods is shown. It can be seenthat RDKIT’s conformations show too little variance, whileDL4CHEM’s structures are mostly invalid, which is due inpart to its failure to predict the correct interatomic angles.Our method slightly overestimates the structural variance(see, for example, Fig. 6, top row, second column), butproduces conformations that are the closest to the ground-truth.

5.3. Calculation of Molecular Properties

We estimate expected molecular properties for moleculargraphs from the test set with N = 50 conformational sam-

7All simulations were carried out on a computer equipped withan i7-3820 CPU and a GeForce GTX 1080 Ti GPU.


Table 1. Assessment of the accuracy of the distributions over conformations generated by three models compared to the ground-truth. Wecompare the distributions with respect to the marginals p(dk|G), p(dk, dl|G), and the distribution over all edges between C and O atomsp({dk}|G). Two different metrics are used: median MMD between ground-truth conformations and generated ones, and mean ranking (1to 3) based on the MMD. Reported are the results for molecular graphs in a test set from three train-test splits. Standard deviations aregiven in brackets.

Median MMD Mean RankingRDKIT DL4CHEM GRAPHDG RDKIT DL4CHEM GRAPHDG

p(dk|G) 0.37 (0.23) 1.11 (0.25) 0.13 (0.13) 1.98 (0.44) 2.90 (0.35) 1.12 (0.33)p(dk, dl|G) 0.47 (0.18) 1.12 (0.15) 0.14 (0.11) 1.95 (0.29) 2.98 (0.13) 1.07 (0.26)p({dk}|G) 0.57 (0.11) 1.03 (0.13) 0.19 (0.08) 2.00 (0.00) 3.00 (0.00) 1.00 (0.00)

1.0 1.50

50

100

Coun

t 0 1

0.5 1.0 1.5

1 2

2 3

1 7

1.5 2.0

1 8

1.5 2.0

2 3

1.5 2.0

3 4

1.5 2.0

3 5

1.5 2.0 2.5

3 8

1 20

50

100

Coun

t 5 6

1.0 1.5

6 7

1.0 1.5

7 8

1 2

0 2

2.5 3.0 3.5

0 7

2.5 3.0

0 8

1.5 2.0

1 3

2.5 3.0 3.5

1 6

0 20

50

100

Coun

t 2 4

2.5 3.0 3.5

2 5

2.50 2.75

2 7

2.00 2.25

2 8

2.25 2.50 2.75

3 6

1.5 2.0

3 7

2.5 3.0 3.5

4 5

2.0 2.5

4 8

1.5 2.0 2.50

50

Coun

t 5 7

2.5 3.0

5 8

2.0 2.5

6 8 Ground-truth

GraphDG

DL4ChemRDKit

0

1

2

87

34

65

Figure 5. Marginal distributions p(dk|G) of ground-truth and predicted distances (in Å) between C and O atoms given a molecular graphfrom a test set. The atoms connected by each edge dk are indicated in each subplot (sk–rk). In the 3D structure of the molecule, carbonand oxygen atoms are colored gray and red, respectively. H atoms are omitted for clarity.

Ground-truth RDKit DL4Chem GraphDG Ground-truth RDKit DL4Chem GraphDG

Figure 6. Overlay of 50 conformations from the ground-truth and three models based on six random molecular graphs from the test set.C, O, and H atoms are colored gray, red, and white, respectively.


Table 2. Median difference in average properties between ground-truth and RDKIT and GRAPHDG: total electronic energy Eelec(in kJ/mol), the energy of the HOMO and the LUMO �HOMO and�LUMO, respectively (in eV), and the dipole moment µ (in debye).Reported are the results for molecular graphs from the test set,averaged over three train-test splits. Standard errors are given inbrackets.

RDKIT GRAPHDG

Eelec 42.7 (4.3) 58.0 (21.0)�HOMO 0.08 (0.04) 0.10 (0.05)�LUMO 0.15 (0.03) 0.09 (0.05)µ 0.29 (0.05) 0.33 (0.09)

ples each. Due to their poor quality, we could not computeproperties O(x), including the energy E(x), for conforma-tions generated with DL4CHEM, and thus, this method isexcluded from this analysis. In Table 2, it can be seen thatRDKIT and GRAPHDG perform similarly well (computa-tional details can be found in the Appendix). However, bothmethods are still highly inaccurate for Eelec (in practice, anaccuracy of less than 5 kJ/mol is required). Close inspectionof the conformations shows that, even though GRAPHDGpredicts the most accurate distances overall, the variancesof certain strongly constrained distances (e.g., triple bonds)are overestimated so that the energies of the conformationsincrease drastically.

6. LimitationsThe first limitation of this work is that the CVAE can sampleinvalid sets of distances for which there exists no 3D struc-ture. Second, the CONF17 benchmark covers only a smallportion of chemical space. Finally, a large set of auxiliaryedges would be required to capture long-range correlations(e.g., in proteins). Future work will address these points.

7. ConclusionsWe presented GRAPHDG, a transferable, generative modelthat allows sampling from a distribution over molecular con-formations. We developed a principled learning represen-tation of conformations that is based on distances betweenatoms. Then, we proposed a challenging benchmark forcomparing molecular conformation generators. With thisbenchmark, we show experimentally that conformationsgenerated by GRAPHDG are closer to the ground-truth thanthose generated by other methods. Finally, we employ ourmodel as a proposal distribution in an IS integration schemeto estimate molecular properties. While orbital energies andthe dipole moments were predicted well, a larger and morediverse dataset will be necessary for meaningful estimatesof electronic energies. Further, methods have to be devised

to estimate how many conformations need to be generatedto ensure all important conformations have been sampled.Finally, our model could be trained on conformational distri-butions at different temperatures in a transfer learning-typesetting.

AcknowledgmentsWe would like to thank the anonymous reviewers fortheir valuable feedback. We further thank Robert Per-harz and Hannes Harbrecht for useful discussions and feed-back. GNCS acknowledges funding through an Early Post-doc.Mobility fellowship by the Swiss National ScienceFoundation (P2EZP2 181616).

ReferencesAllen, F. H. The Cambridge Structural Database: A quarter

of a million crystal structures and rising. Acta Crystallogr.,Sect. B: Struct. Sci, 58(3):380–388, 2002. doi: 10.1107/S0108768102003890.

AlQuraishi, M. End-to-End Differentiable Learning of Pro-tein Structure. Cell Systems, 8(4):292–301.e3, 2019. doi:10.1016/j.cels.2019.03.006.

Ballard, A. J., Martiniani, S., Stevenson, J. D., Somani, S.,and Wales, D. J. Exploiting the potential energy landscapeto sample free energy. WIREs Comput. Mol. Sci., 5(3):273–289, 2015. doi: 10.1002/wcms.1217.

Bishop, C. M. Pattern Recognition and Machine Learning.Information Science and Statistics. Springer, New York,8 edition, 2009. ISBN 978-0-387-31073-2.

Blaney, J. M. and Dixon, J. S. Distance Geometry in Molec-ular Modeling. In Reviews in Computational Chemistry,pp. 299–335. John Wiley & Sons, Ltd, 2007. ISBN 978-0-470-12582-3. doi: 10.1002/9780470125823.ch6.

Bradshaw, J., Kusner, M. J., Paige, B., Segler, M. H. S.,and Hernández-Lobato, J. M. A generative model forelectron paths. In International Conference on LearningRepresentations, 2019a.

Bradshaw, J., Paige, B., Kusner, M. J., Segler, M. H. S.,and Hernández-Lobato, J. M. A Model to Search forSynthesizable Molecules. arXiv:1906.05221, 2019b.

Dai, H., Tian, Y., Dai, B., Skiena, S., and Song, L. Syntax-directed variational autoencoder for structured data. InInternational Conference on Learning Representations,2018.

De Vivo, M., Masetti, M., Bottegoni, G., and Cavalli, A.Role of Molecular Dynamics and Related Methods inDrug Discovery. J. Med. Chem., 59(9):4035–4061, 2016.doi: 10.1021/acs.jmedchem.5b01684.


Demaine, E. D., Gomez-Martin, F., Meijer, H., Rappa-port, D., Taslakian, P., Toussaint, G. T., Winograd, T.,and Wood, D. R. The distance geometry of music.Computational Geometry, 42(5):429–454, 2009. doi:10.1016/j.comgeo.2008.04.005.

Duvenaud, D. K., Maclaurin, D., Iparraguirre, J., Bombarell,R., Hirzel, T., Aspuru-Guzik, A., and Adams, R. P. Con-volutional Networks on Graphs for Learning MolecularFingerprints. In Cortes, C., Lawrence, N. D., Lee, D. D.,Sugiyama, M., and Garnett, R. (eds.), Advances in Neu-ral Information Processing Systems 28, pp. 2224–2232.Curran Associates, Inc., 2015.

Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., andDahl, G. E. Neural message passing for quantum chem-istry. In Proceedings of the 34th International Conferenceon Machine Learning - Volume 70, ICML’17, pp. 1263–1272, 2017.

Gómez-Bombarelli, R., Wei, J. N., Duvenaud, D.,Hernández-Lobato, J. M., Sánchez-Lengeling, B., She-berla, D., Aguilera-Iparraguirre, J., Hirzel, T. D., Adams,R. P., and Aspuru-Guzik, A. Automatic Chemical De-sign Using a Data-Driven Continuous Representation ofMolecules. ACS Cent. Sci., 4(2):268–276, 2018. doi:10.1021/acscentsci.7b00572.

Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf,B., and Smola, A. A Kernel Two-Sample Test. J. Mach.Learn. Res., 13:723–773, 2012.

Havel, T. F. Distance Geometry: Theory, Algorithms, andChemical Applications. In Encyclopedia of Computa-tional Chemistry. American Cancer Society, 2002. ISBN978-0-470-84501-1. doi: 10.1002/0470845015.cda018.

Ingraham, J., Riesselman, A., Sander, C., and Marks, D.Learning Protein Structure with a Differentiable Simula-tor. In International Conference on Learning Representa-tions, 2019.

Jain, V. and Saul, L. K. Exploratory analysis and visual-ization of speech and music by locally linear embedding.In 2004 IEEE International Conference on Acoustics,Speech, and Signal Processing, volume 3, pp. iii–984,2004. doi: 10.1109/ICASSP.2004.1326712.

Jensen, J. XYZ2Mol.https://github.com/jensengroup/xyz2mol, 2019.

Jin, W., Barzilay, R., and Jaakkola, T. Junction tree vari-ational autoencoder for molecular graph generation. InDy, J. and Krause, A. (eds.), Proceedings of the 35th In-ternational Conference on Machine Learning, volume 80of Proceedings of Machine Learning Research, pp. 2323–2332, Stockholm, Sweden, 2018. PMLR.

Kanal, I. Y., Keith, J. A., and Hutchison, G. R. A soberingassessment of small-molecule force field methods for lowenergy conformer predictions. Int. J. Quantum Chem.,118(5):e25512, 2018. doi: 10.1002/qua.25512.

Kearnes, S., McCloskey, K., Berndl, M., Pande, V., andRiley, P. Molecular graph convolutions: Moving beyondfingerprints. J. Comput.-Aided Mol. Des., 30(8):595–608,2016. doi: 10.1007/s10822-016-9938-8.

Kingma, D. P. and Ba, J. Adam: A Method for StochasticOptimization. arXiv:1412.6980, 2014.

Kingma, D. P. and Welling, M. Auto-Encoding VariationalBayes. In International Conference on Learning Repre-sentations, 2014.

Kipf, T. N. and Welling, M. Semi-Supervised Classifica-tion with Graph Convolutional Networks. InternationalConference on Learning Representations, 2017.

Kusner, M. J., Paige, B., and Hernández-Lobato, J. M.Grammar Variational Autoencoder. In Precup, D. andTeh, Y. W. (eds.), Proceedings of the 34th InternationalConference on Machine Learning, volume 70 of Pro-ceedings of Machine Learning Research, pp. 1945–1954,International Convention Centre, Sydney, Australia, 2017.PMLR.

Lagorce, D., Pencheva, T., Villoutreix, B. O., and Miteva,M. A. DG-AMMOS: A New tool to generate 3D con-formation of small molecules using Distance Geometryand Automated Molecular Mechanics Optimization forin silico Screening. BMC Chem. Biol., 9:6, 2009. doi:10.1186/1472-6769-9-6.

Lemke, T. and Peter, C. EncoderMap: DimensionalityReduction and Generation of Molecule Conformations.J. Chem. Theory Comput., 2019. doi: 10.1021/acs.jctc.8b00975.

Liu, Q., Allamanis, M., Brockschmidt, M., and Gaunt,A. Constrained Graph Variational Autoencoders forMolecule Design. In Bengio, S., Wallach, H., Larochelle,H., Grauman, K., Cesa-Bianchi, N., and Garnett, R. (eds.),Advances in Neural Information Processing Systems 31,pp. 7795–7804. Curran Associates, Inc., 2018.

Mansimov, E., Mahmood, O., Kang, S., and Cho, K. Molec-ular Geometry Prediction using a Deep Generative GraphNeural Network. Sci. Rep., 9(1):1–13, 2019. doi:10.1038/s41598-019-56773-5.

Noé, F., Olsson, S., Köhler, J., and Wu, H. Boltzmann gener-ators: Sampling equilibrium states of many-body systemswith deep learning. Science, 365(6457):eaaw1147, 2019.doi: 10.1126/science.aaw1147.


Pagnoni, A., Liu, K., and Li, S. Conditional Vari-ational Autoencoder for Neural Machine Translation.arXiv:1812.04405, 2018.

Perdew, J. P., Ernzerhof, M., and Burke, K. Rationale formixing exact exchange with density functional approxi-mations. J. Chem. Phys., 105(22):9982–9985, 1996. doi:10.1063/1.472933.

Ramakrishnan, R., Dral, P. O., Rupp, M., and von Lilienfeld,O. A. Quantum chemistry structures and properties of134 kilo molecules. Sci. Data, 1:140022, 2014. doi:10.1038/sdata.2014.22.

Riniker, S. and Landrum, G. A. Better Informed DistanceGeometry: Using What We Know To Improve Conforma-tion Generation. J. Chem. Inf. Model., 55(12):2562–2574,2015. doi: 10.1021/acs.jcim.5b00654.

Schütt, K., Kindermans, P.-J., Sauceda Felix, H. E., Chmiela,S., Tkatchenko, A., and Müller, K.-R. SchNet: Acontinuous-filter convolutional neural network for model-ing quantum interactions. In Guyon, I., Luxburg, U. V.,Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S.,and Garnett, R. (eds.), Advances in Neural InformationProcessing Systems 30, pp. 991–1001. Curran Associates,Inc., 2017a.

Schütt, K. T., Arbabzadah, F., Chmiela, S., Müller, K. R.,and Tkatchenko, A. Quantum-chemical insights fromdeep tensor neural networks. Nat. Commun., 8:13890,2017b. doi: 10.1038/ncomms13890.

Segler, M. H. S., Kogej, T., Tyrchan, C., and Waller, M. P.Generating Focused Molecule Libraries for Drug Discov-ery with Recurrent Neural Networks. ACS Cent. Sci., 4(1):120–131, 2018. doi: 10.1021/acscentsci.7b00512.

Senior, A. W., Evans, R., Jumper, J., Kirkpatrick, J., Sifre,L., Green, T., Qin, C., Žı́dek, A., Nelson, A. W. R.,Bridgland, A., Penedones, H., Petersen, S., Simonyan,K., Crossan, S., Kohli, P., Jones, D. T., Silver, D.,Kavukcuoglu, K., and Hassabis, D. Improved proteinstructure prediction using potentials from deep learning.Nature, pp. 1–5, 2020. doi: 10.1038/s41586-019-1923-7.

Shim, J. and MacKerell, Jr., A. D. Computational ligand-based rational design: Role of conformational samplingand force fields in model development. Med. Chem. Com-mun., 2(5):356–370, 2011. doi: 10.1039/C1MD00044F.

Sun, Q., Berkelbach, T. C., Blunt, N. S., Booth, G. H., Guo,S., Li, Z., Liu, J., McClain, J. D., Sayfutyarova, E. R.,Sharma, S., Wouters, S., and Chan, G. K.-L. PySCF:The Python-based simulations of chemistry framework.WIREs Comput. Mol. Sci., 8(1), 2018. doi: 10.1002/wcms.1340.

Tenenbaum, J. B., de Silva, V., and Langford, J. C. AGlobal Geometric Framework for Nonlinear Dimension-ality Reduction. Science, 290(5500):2319–2323, 2000.doi: 10.1126/science.290.5500.2319.

Weigend, F. Accurate Coulomb-fitting basis sets for H toRn. Phys. Chem. Chem. Phys., 8(9):1057–1065, 2006.doi: 10.1039/B515623H.

Weigend, F. and Ahlrichs, R. Balanced basis sets of splitvalence, triple zeta valence and quadruple zeta valencequality for H to Rn: Design and assessment of accuracy.Phys. Chem. Chem. Phys., 7(18):3297–3305, 2005. doi:10.1039/B508541A.

Weinberger, K. Q. and Saul, L. K. Unsupervised Learningof Image Manifolds by Semidefinite Programming. Int.J. Comput. Vision, 70(1):77–90, 2006. doi: 10.1007/s11263-005-4939-z.

You, J., Liu, B., Ying, Z., Pande, V., and Leskovec, J.Graph Convolutional Policy Network for Goal-DirectedMolecular Graph Generation. In Bengio, S., Wallach,H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., andGarnett, R. (eds.), Advances in Neural Information Pro-cessing Systems 31, pp. 6410–6421. Curran Associates,Inc., 2018.


A. CONF17 BenchmarkA.1. Data Generation

The ISO17 dataset (Schütt et al., 2017a) was processedin the following way. First, conformations in which couldnot be parsed by the tool XYZ2MOL (Jensen, 2019) werediscarded. Second, the molecular graphs were augmentedby adding auxiliary edges for reasons described in the maintext. This can lead to an over-specification of the system’sgeometry, however, this did not pose a problem in our ex-periments.

A.2. Input Features

Below we list the node and edge features in the CONF17benchmark.

Table 3. Node features.

Feature Data Type Dimension

atomic number one-hot (H, He, ... F) 9chiral tag one-hot (R, S, and None) 3

Table 4. Edge features.

Feature Data Type Dimension

kind one-hot (Ebond, Eangle,or Edihedral)

3

stereo chemistry one-hot (E, Z, Any,None)

4

type one-hot (single, dou-ble, triple, aromatic orNone)

5

is aromatic binary 1is conjugated binary 1is in ring of size one-hot (3, 4, . . . , 9) or

None7

B. Model ArchitectureThe source code of the model (including pre-processingscripts) is available online https://github.com/gncs/graphdg. In Table 5, the model architecture aresummarized. After each hidden layer, a ReLU non-linearityis used. In Table 6, all hyperparameters are listed.

Table 5. Model architecture for generative model.

Network Output Layers

Fenc,v 50, 50, 10Fenc,e 50, 50, 10

MP(t)enc 50, 50, 10MP(t)dec 50, 50, 10Rdec,v 50, 50, 1Rdec,e 50, 50, 1

Table 6. Hyperparameters used throughout this work.

Hyperparameter Value

Minibatch size 32Learning rate 0.001

Number of convolutions T 3

C. Computational DetailsC.1. Quantum-Chemical Calculations

All quantum-chemical calculations were carried out withthe PySCF program package (version 1.5) (Sun et al., 2018)employing the exchange-correlation density functional PBE(Perdew et al., 1996), and the def2-SVP (Weigend &Ahlrichs, 2005; Weigend, 2006) basis set.

Conformations generated by DL4CHEM did not succeed assome atoms were too close to each other. Self-consistentfield algorithms in quantum-chemical software such asPySCF do not converge for such molecular structures.

With quantum-chemical methods, we calculate several prop-erties that concern the states of the electrons in the con-formation. These are the total electronic energy Eelec, theenergy of the electron in the highest occupied molecularorbital (HOMO in eV) �HOMO, the energy of the lowest un-occupied molecular orbital (LUMO in eV) �LUMO, and thenorm of the dipole moment µ (in debye).

C.2. Euclidean Distance Geometry

We refer the reader to Havel (2002) for theory on EDG,algorithms, and chemical applications. In summary, theEDG procedure consists of the following three steps:

1. Bound smoothing: extrapolating a complete set oflower and upper limits on all the distances from thesparse set of lower and upper bounds.

2. Embedding: choosing a random distance matrix fromwithin these limits, and computing coordinates that area certain best-fit to the distances.

3. Optimization: optimizing these coordinates versus an

https://github.com/gncs/graphdghttps://github.com/gncs/graphdg


error function which measures the total violation of thedistance (and chirality) constraints.

We use the EDG implementation found in RDKIT (Riniker& Landrum, 2015) with default settings.

D. Generation of ConformationsFig. 7 shows an overlay of 50 conformations from theground-truth, RDKIT, DL4CHEM, and GRAPHDG basedon two random molecular graphs from a test set.

E. Distributions over DistancesWe show the marginal distributions p(dk|G) and p(di, dj |G)of ground-truth and predicted distances (in Å) for additionalmolecules from a test set.


Ground-truth RDKit DL4Chem GraphDG

Figure 7. Overlay of 50 conformations from the ground-truth, RDKIT, DL4CHEM, and GRAPHDG based on two random moleculargraphs from a test set. C, O, and H atoms are colored gray, red, and white, respectively.

Figure 8. Marginal distributions p(di, dj |G) of ground-truth and predicted distances for a molecular graph from a test set (in Å). Here, diand dj are restricted to edges representing bonds between C and O atoms. In the 3D structure of the molecule, carbon and oxygen atomsare colored gray and red, respectively. H atoms are omitted for clarity.


1.0 1.50

50

100

Coun

t 0 1

0.5 1.0 1.5

1 2

2 3

1 5

2 3

1 6

1 2 3

2 3

0.5 1.0 1.5

3 4

0.5 1.0 1.5

3 8

2 3

4 5

1.0 1.50

50

100

Coun

t 5 6

0.5 1.0 1.5

6 7

2 3

6 8

1 2

0 2

3 4

0 5

3 4

0 6

0 2

1 3

1 2

1 4

2.5 3.00

50

100

Coun

t 1 7

1 2

1 8

1 2

2 4

3 4

2 5

3 4

2 6

1 2

2 8

2 3

3 5

2.5 3.0

3 6

3 40

50

100

Coun

t 4 6

1.5 2.0 2.5

4 8

1 2

5 7

2.5 3.0

5 8

2.25 2.50 2.75

7 8 Ground-truth

GraphDG

DL4ChemRDKit

0

1

28

7

3

4

65

Figure 9. See caption of Fig. 5


1.5 1.60

50

100

Coun

t 0 1

1.5 1.6

1 2

1.5 2.0 2.5

1 5

1.5 2.0 2.5

1 7

1.5 1.6

2 3

1.25 1.50

3 4

1.50 1.75

3 8

1.5 2.0 2.5

4 5

1.4 1.50

50

100

Coun

t 5 6

1.5 2.0 2.5

5 7

1.5 2.0 2.5

7 8

2.50 2.75

0 2

2.5 3.0 3.5

0 5

2.5 3.0

0 7

2.2 2.4

1 3

2.5 3.0 3.5

1 4

1.5 2.0 2.50

50

100

Coun

t 1 6

1.5 2.0 2.5

1 8

2.25 2.50

2 4

2.4 2.6

2 5

2.5 3.0 3.5

2 7

2.25 2.50

2 8

1.5 2.0

3 5

2.5 3.0 3.5

3 7

2.5 3.00

50

100

Coun

t 4 6

3 4

4 7

2.25 2.50 2.75

4 8

2.0 2.5

5 8

1.5 2.0 2.5

6 7 Ground-truth

GraphDG

DL4ChemRDKit

0

1

2

8

7

3

4

6

5



1.5 1.6 1.70

50

100

Coun

t 0 1

1.5 1.6 1.7

1 2

1.0 1.5

1 7

1.25 1.50

1 8

1.5 1.6 1.7

2 3

1.25 1.50

3 4

1.4 1.6

3 8

1.5 2.0 2.5

4 5

1.25 1.50 1.750

50

100

Coun

t 5 6

1.4 1.6

5 8

1.5 2.0

6 7

2.50 2.75

0 2

2.0 2.5

0 7

2.50 2.75

0 8

2.1 2.2 2.3

1 3

2.25 2.50

1 5

2.2 2.40

50

Coun

t 1 6

2.4 2.6

2 4

2.4 2.6

2 7

1.75 2.00 2.25

2 8

1.5 2.0

3 5

2.5 3.0 3.5

4 6

2.0 2.5

4 8

2.0 2.5

5 7

2.0 2.50

20

40

Coun

t 6 8

1.5 2.0 2.5

7 8 Ground-truth

GraphDG

DL4ChemRDKit 0

1

2

8

7

3

4

6

5


A Generative Model for Molecular Distance Geometrynot targeting a distribution over conformations...

Documents

Transcript of A Generative Model for Molecular Distance Geometrynot targeting a distribution over conformations...