
Autonomous Molecular Design: Then and Now

Tanja Dimitrov,† Christoph Kreisbeck,†,‡ Jill S. Becker,† Alán Aspuru-Guzik,¶,† and Semion K. Saikin*,†,‡

†Kebotix, Inc., 501 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
‡Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts 02138, United States
¶Department of Chemistry and Department of Computer Science, University of Toronto, Toronto, Ontario M5S 3H6, Canada

ABSTRACT: The success of deep machine learning in processing large amounts of data, for example, in image or voice recognition and generation, raises the possibility that these tools can also be applied to solving complex problems in materials science. In this forum article, we focus on molecular design, which aims to answer the question of how we can predict and synthesize molecules with tailored physical, chemical, or biological properties. A potential answer to this question could be found by using intelligent systems that integrate physical models and computational machine learning techniques with automated synthesis and characterization tools. Such systems learn from every single experiment, in analogy to a human scientific expert. While the general idea of an autonomous system for molecular synthesis and characterization has been around for a while, its implementations for the materials sciences are sparse. Here we provide an overview of the developments in chemistry automation and the applications of machine learning techniques in the chemical and pharmaceutical industries, with a focus on the novel capabilities that deep learning brings in.

    KEYWORDS: machine learning, inverse design, deep learning, artificial intelligence, autonomous synthesis, neural networks

    1. INTRODUCTION

Discovery of new organic materials with tailored properties is a complex process which combines systematic and tedious tasks with a number of "lucky" coincidences. There are many questions as to which materials with specific properties we should make. Example questions include: which molecules could form an ideal organic superconducting material, or which molecules could make the most energy-efficient light sensors and emitters in wearable electronics? How would we make these materials nontoxic? These are just a few of the questions that molecular design aims to answer. For a given set of macroscopic properties, we aim to find the corresponding microscopic molecular structures and molecular packings. Two complementary ways to approach this problem are inverse design and direct design. In inverse design, microscopic structures are derived from the macroscopic properties. However, these structure−property relations are very complex and, in most practical applications, cannot be derived using analytical or computational models. In contrast, the direct approach tests microscopic structures for the desired macroscopic properties. For example, in a naive "trial and error" method, we randomly select molecules, synthesize them, and test them for the property. This approach is very inefficient because the number of potentially synthesizable molecules is huge.1−3 The conventional approach to address these problems is based on humans' abilities to correlate and generalize experience. Making educated guesses, scientists generate hypotheses, synthesize and test the molecules, and then adjust these hypotheses according to the obtained experimental feedback.

Nowadays, many routine synthesis and characterization operations, as well as computational modeling, can be automated. However, higher-level analysis of the results and decision making are still left to humans, who remain heavily involved in the molecular discovery loop. Such an automated open-loop system possesses several advantages because, on average, machines can operate at higher speed while maintaining higher precision. Moreover, this approach releases researchers from monotonous, tedious procedures, leaving more time for creative work.

Recent advances in deep machine learning (ML)4,5 have brought us a set of analytical tools that enter into many aspects of our lives. For example, it is natural now to use ML-based speech and face recognition, text translation, maps, and navigation on our cell phones. These tools are also exploited for the analysis of scientific data and for the scientific decision-making process. The problems to which ML has been successfully applied include searches for molecules with specific properties,6 discoveries of chemical reaction pathways,7 modeling of excitation dynamics,8 analysis of wave functions of complex systems,9 and identification of phase transitions.10 One of the reasons for the success of ML in materials science is the intrinsic hierarchy of physical phenomena.11 However, a systematic understanding of which ML methods are optimal for molecular characterization and what their limitations are is yet to be developed.

    Special Issue: Materials Discovery and Design

Received: January 21, 2019
Accepted: March 15, 2019

    Forum Article

ACS Applied Materials & Interfaces. DOI: 10.1021/acsami.9b01226


Combining automatic characterization, synthesis, and computational modeling with deep ML-based analysis and decision-making modules naturally closes the molecular discovery loop. The main advantage of such a closed-loop autonomous system, as we see it now, is a thorough, unbiased analysis of the data and the generation of hypotheses with a larger fraction of hits, rather than just higher throughput. Figure 1 shows a schematic diagram of such a system, where human intuition and existing knowledge provide input to the control module. The system generates hypotheses using a variety of theoretical models and ML tools and automatically plans, executes, and analyzes the experiments, providing feedback at each stage.12

Figure 1. Schematic illustration of an autonomous molecular discovery system with multiple feedback loops.
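To make the workflow of Figure 1 concrete, a minimal sketch of such a closed loop is given below; all function names and the random "measurements" are hypothetical placeholders rather than components described in this article.

```python
# Minimal sketch of a closed-loop molecular discovery workflow (hypothetical
# function names; each stub stands in for a hardware or ML module).
import random

def generate_hypotheses(knowledge, n=5):
    # Stand-in for ML/physics-based candidate generation.
    return [{"candidate": f"mol_{random.randint(0, 999)}"} for _ in range(n)]

def plan_experiment(hypothesis):
    # Stand-in for automated synthesis/characterization planning.
    return {"candidate": hypothesis["candidate"], "temperature_C": 25}

def run_experiment(plan):
    # Stand-in for robotic execution; here we fake a measured property.
    return {"plan": plan, "measured_property": random.random()}

def analyze(result, knowledge):
    # Stand-in for data analysis; update the shared knowledge base.
    knowledge.append(result)
    return result["measured_property"]

knowledge = []                                   # accumulated experimental feedback
best = None
for iteration in range(3):                       # each pass closes the loop once
    for hypothesis in generate_hypotheses(knowledge):
        score = analyze(run_experiment(plan_experiment(hypothesis)), knowledge)
        if best is None or score > best[0]:
            best = (score, hypothesis["candidate"])
print("best candidate so far:", best)
```

In a real platform, each stub would be replaced by a call into a physical model, an ML module, or the robotic hardware, and the accumulated knowledge would steer the next round of hypotheses.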

The general idea of having a feedback loop in automated synthesis and characterization of molecules has been around at least since the late 1970s. However, the elements composing the system and our understanding of how it should operate have evolved considerably. Feedback loops can be implemented at multiple levels, including the optimization of the synthesis procedure, finding appropriate reaction pathways, and suggesting new molecular structures for testing. Modern computational capabilities allow us to process data at a much higher rate; therefore, more complex data analysis models can be used. Moreover, the automated system can learn directly from online databases and even generate and update existing libraries. These are just some highlights of modern capabilities that still have to be implemented and tested in the autonomous discovery workflow. In this forum article, we provide a concise overview of the closed-loop discovery approach, bringing together modern advances with research from the late previous century, when many components of automated chemistry and machine learning were developed.

The rest of the forum article is structured as follows. In section 2, we discuss the key steps in chemistry automation, focusing on the developments from the 1970s until the 2000s. Section 3 briefly outlines the early developments in ML and high-throughput screening before the period of deep learning. Section 4 focuses on applications of deep machine learning methods in the prediction of molecular properties. Section 5 provides examples and discusses general trends in using autonomous systems in the characterization and synthesis of molecules. Finally, section 6 provides a discussion and some practical ideas about the current issues in the design of autonomous platforms.

    2. AUTOMATION

The first successful attempts at the complete automation of a system that synthesizes molecules date back to the 1960s−1970s.13−15 In these pioneering studies, automation of the reaction was used for the optimization of reaction conditions. In one of the earliest research studies involving computer-controlled synthesis,14 the authors developed a system where the dispensing of chemicals into the reactor was controlled through a set of pumps and syringes. The products of the reaction were also characterized automatically. The operation of the system was demonstrated using the hydrolysis of p-nitrophenyl phosphate to p-nitrophenol by the enzyme alkaline phosphatase. The product of this reaction is yellow colored; therefore, the amount of product was easily monitored using spectrophotometry during the experiment. The authors demonstrated three main operational phases: (1) routine operation of the system, where conditions for single chemical reactions were preprogrammed on a computer and then the synthesis was executed autonomously; (2) design of experimental procedures, where a set of experiments was executed automatically while scanning selected condition parameters; and finally (3) a simple decision-making procedure. This procedure included feedback that adapted the grid step of the experiment, which allowed the authors to optimize the experimental conditions. Importantly, the authors of the study highlighted that the computer, besides controlling routine procedures, can be used for tasks such as data interpretation and the design of the experiment with a feedback loop. In yet another early work,15 researchers from Smith Kline and French Laboratories developed a closed-loop automated chemical synthesis system to optimize chemical reaction conditions using a simplex algorithm. The system was composed of a single chemical reactor, where the reactor conditions, including heating/cooling, stirring, and the addition of chemical reagents, were controlled by a protocol that ran on a computer. In addition, a liquid chromatographic column was used for the characterization of the products. Interestingly, this work showed a prototype of a distributed lab, where a computer communicated through a modem with the synthetic system. We also would like to highlight here the system developed by Legrand and Foucard.16 The main features of this system were its modularity and its ability to use standard chemistry lab equipment. It was introduced as a versatile chemistry automation kit, Logilap, that would allow for automatic control of the reaction parameters. Later, in the 1980s, this Logilap kit was used by several research teams.17
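The simplex strategy of ref 15 is essentially what is now called the Nelder−Mead method. As a rough, hedged illustration only (the yield model below is an invented stand-in, not the chemistry or hardware of that study), such a self-directed optimization could be sketched with SciPy as follows:

```python
# Hedged sketch: simplex (Nelder-Mead) optimization of reaction conditions.
# The "yield" model below is a synthetic stand-in for real experimental feedback.
import numpy as np
from scipy.optimize import minimize

def simulated_yield(conditions):
    temperature_C, reagent_equiv = conditions
    # Invented smooth response surface with an optimum near 80 C, 1.5 equiv.
    return 0.9 * np.exp(-((temperature_C - 80.0) / 25.0) ** 2
                        - ((reagent_equiv - 1.5) / 0.6) ** 2)

def objective(conditions):
    # Nelder-Mead minimizes, so return the negative yield.
    return -simulated_yield(conditions)

result = minimize(objective, x0=[60.0, 1.0], method="Nelder-Mead")
print("suggested conditions:", result.x, "predicted yield:", -result.fun)
```

In a closed-loop setting, the objective would be evaluated by actually running the reaction at the proposed conditions and measuring the product, rather than by a model.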

Within the following two decades, the automation of chemical laboratories became more systematic, with a focus on the versatility of the systems. It has been recognized since the early stages that automation of chemistry requires a diverse set of methods. Therefore, the automated systems have to be easily adapted for chemical reactions using solid and liquid reagents, multiple solvents, and various chemical conditions.18−20 Moreover, the computer-controlled architecture requires that the software be easily modifiable by experimentalists, who in turn are not specialized in software design.18 Architectures with multiple reactors and robotic arms21 have been introduced to parallelize the process. In the particular study,21 the designed system had a robot with three remote hands, a reagent station or stockroom that could contain up to fifteen 30−100 mL bottles, a reaction stage with up to nine magnetically stirred reactors, and a storage area that could accommodate 100 test tubes. The authors demonstrated the operation of the platform using the preparation of a trifunctional vinyl sulfone in a one-pot sequence from keto-sulfone and methyl coumalate. This multistep reaction is sensitive to the catalyst and the solvent used; therefore, these parameters were used in the automated optimization. Particular attention should be paid to the studies from Takeda Chemical Industries.22−24 The developed automated platform was used for the synthesis of substituted N-(carboxyalkyl)amino acids, where some intermediate components are unstable. To optimize the reaction conditions, the research team developed a kinetic model for the reaction. The parameters of this model were obtained from the experiment, creating a feedback loop.23 The developed platform was used to generate 90 compounds, working 24 h a day with an average reported productivity of three compounds per day. This workstation demonstrated that even if the chemical yields are low under the optimum conditions, it is still possible to obtain a sufficient amount of the desired product by repeating the reaction.

The interest in combinatorial methods for automated chemistry grew tremendously from the end of the 1980s onward. In part, this was triggered by the success of high-throughput screening (HTS) in the pharmaceutical industry.25 As compared to process chemistry, where the main focus is on the control and optimization of the process conditions, combinatorial chemistry is focused on the versatility of products. This approach allows generating large libraries of related chemicals using similar reaction pathways. By the end of the 1990s, sufficient progress had been achieved in the synthesis of polymers, where combinatorial libraries of polymers were automatically synthesized and characterized using fluorescence or Fourier transform infrared (FT-IR) spectroscopy.26,27 Development of HTS also resulted in numerous patents on high-throughput experimentation (HTE) in the research and development of new materials.28

At the same time, commercial synthesizers appeared on the market.29,30 Moreover, sufficient progress in materials characterization specific to combinatorial chemistry had been achieved. For example, for many applications, fast characterization of mixtures before purification has been implemented as a more time- and cost-efficient approach. The software used for controlling automated experiments has also changed to account for parallel processing. While the original protocols used in the 1980s were focused on serial implementation of reaction steps and optimization algorithms, the planners designed in the 1990s accounted for parallel processes, implementing factorial design, design of experiment, modified simplex, and tree search algorithms.31−33
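The factorial and design-of-experiment planners mentioned above rest on straightforward combinatorial enumeration; a minimal sketch (with arbitrary example solvents, catalysts, and temperatures, not taken from refs 31−33) is:

```python
# Sketch of a full factorial design over discrete reaction parameters
# (example values only), the kind of plan a parallel workstation could execute.
from itertools import product

solvents = ["toluene", "THF", "DMF"]
catalysts = ["Pd(PPh3)4", "Pd(OAc)2"]
temperatures_C = [25, 60, 100]

design = list(product(solvents, catalysts, temperatures_C))
for run_id, (solvent, catalyst, temperature) in enumerate(design, start=1):
    print(f"run {run_id:02d}: {solvent}, {catalyst}, {temperature} C")
print(f"{len(design)} experiments in the full factorial design")
```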

By the beginning of the 21st century, the main components of chemical reaction automation were (1) the delivery of the different chemical components to the reactor, (2) control of the reactor conditions, (3) product purification, and (4) product characterization. All components were conceptually solved and implemented in commercially available synthesizers. These synthesizers were combined with characterization tools in systems ready for the generation of combinatorial libraries. Figure 2 illustrates how the integration of automated chemical systems evolved from the 1970s16 until the end of the 1990s.34 While the original motivation for automated chemistry was in replacing repetitive work,15,16 HTE brought in the capability of exploring more of chemical space. Multiple chemical reactions have been demonstrated using these systems.35 However, the synthetic capabilities of each particular system were limited to specific types of reactions, and a truly universal chemical synthesizer had yet to be developed. It had been postulated that the design of the synthesizers should be adjusted for the particular tasks of either process screening, process optimization, or library generation.

Figure 2. Examples of automated chemical synthesis systems from the 1970s (a) and 1990s (b) illustrate the improved level of integration. Left panel is reproduced with permission from ref 16. Copyright 1978 American Chemical Society. Right panel is reproduced from ref 34 by permission of John Wiley & Sons, Inc.

Within the last 20 years, automation in chemistry kept evolving at a somewhat slower pace, with the major improvements in the integration of the components. Many automated characterization devices became conventional tools in chemistry and biology laboratories. However, the problem of synthesis versatility has not been solved. In his review of automated systems for chemical synthesis developed at the beginning of the 1990s, Lindsey outlined17 the following architectures: (1) flow reactors, (2) single-batch reactors, (3) single-robotic synthesizers, (4) dual-robotic synthesizers, and (5) workstations. Presently, these architectures have merged into two main classes: (1) systems for flow chemistry and (2) systems for batch chemistry. Both classes have their own advantages. For example, flow chemistry setups allow for a continuous screening of experimental parameters, such as concentrations of active chemical components and catalysts, temperature, and time.36,37 Moreover, the upscaling of the process is much easier in the flow setup, which can be transferred from research to the development stage. In contrast, the batch setup brings in the "digitization" of the synthesis process, which is natural for the generation of libraries of chemical components and compounds. Additionally, the batch setup allows operation with very small amounts of chemicals, which is important when the cost of chemicals is a critical factor.38

3. VIRTUAL SCREENING AND MACHINE LEARNING

While introduced much earlier, high-throughput virtual screening and machine learning methods were extensively developed for medicinal chemistry applications in the 1990s−2000s.39,40 The limitations of synthetic capabilities were quickly recognized as the main issue of the HTS approach, especially for applications in the pharmaceutical domain. The sizes of the largest molecular libraries, on the order of 10⁴−10⁵ molecules, were negligible compared to the number of potentially synthesizable molecules.41 Therefore, virtual screening methods naturally appeared as a tool to help contain the exponentially growing cost of HTS. These methods allowed for a less expensive presynthesis analysis of molecules in order to limit the search space. Closed-loop platforms involving virtual screening, scoring the molecules, and synthesizing the promising leads, followed by their characterization, have been proposed.39

It should be noted that, despite many similarities, virtual screening for drug design and for materials design are different.42 While the virtual screening of potential drug candidates is frequently based on phenomenological equations or empirical data and cannot be done using microscopic models, the virtual screening of materials can be done using phenomenological microscopic computational models or ab initio methods such as molecular dynamics or density functional theory (DFT). Further, the virtual screening of materials comes with the following drawbacks: (1) large libraries are still expensive, (2) the precision of computed results is often not sufficient, and (3) some properties are difficult to compute.

In parallel, ML methods evolved as a tool for the optimization, classification, and prediction of molecular properties in the pharmaceutical industry.43−48 The methods included random forests, decision trees, support vector machines, and artificial neural networks (ANNs). At the beginning of the 21st century, these methods became conventional tools for classifying results from HTS. On average, the ML methods showed a several-fold improvement in hit rates as compared to random or expert-based HTS. It had been found that many existing molecular libraries used in HTS are not diverse enough and contain multiple clusters of molecules. Proper learning from these unbalanced data sets and evaluating the ML methods required additional balancing and reweighting of the components. As an illustration of the ML methods before the era of deep learning, we review the effort of a research group at DuPont published in ref 49. This study illustrates the state of the art of ML from HTS by the middle of the 2000s. The research team compared four different ML models (two decision-tree models, an in-house-built InfoEvolve model,50 and a neural network model) with eight different descriptor sets for the classification of large, ∼10⁶ molecule, agrochemical data sets. The main outcome of this multiyear project was that the best prediction performance could be obtained using a combined model trained on different types of descriptors. The authors argued that the descriptor sets contain complementary information and that combining them averages the performance of the models.

Sparse applications of ANNs in chemistry can be dated back at least to the beginning of the 1990s.51,52 In these studies, ANNs were applied to molecular characterization, including the prediction of chemical shifts in NMR spectra of organic compounds,53 classification of IR spectra,54 and FT-Raman spectra.55 The training sets used in these studies were fairly small, ∼100−1000 data points. The performance of ANNs had been compared to other ML methods,49,56 but the results did not seem too promising. This has also been associated with the higher computational cost of training NNs as well as the complexities in their design.
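As a toy illustration of the class-imbalance issue noted above, the following scikit-learn sketch trains a random forest on a synthetic, highly imbalanced "HTS-like" data set with class reweighting; the data, descriptors, and model settings are invented and do not reproduce the studies of refs 43−56.

```python
# Sketch: classifying a highly imbalanced "HTS-like" data set with a random
# forest, reweighting classes so that rare hits are not ignored. Synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=50, n_informative=10,
                           weights=[0.98, 0.02], random_state=0)  # ~2% "hits"
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

model = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                               random_state=0)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test), digits=3))
```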

4. DEEP LEARNING IN CHEMISTRY

In recent years, we have experienced major breakthroughs in various applications of deep learning. One of the most famous examples is AlphaGo,57 which outperformed the best human player in a complex and abstract board game. The main driver for the recent developments in deep learning is the vast increase in computational resources following Moore's law. Today, even standard desktop computers can reach tera floating point operations per second (TFLOPS), as tailored hardware technologies for deep learning, such as Google's tensor processing units (TPUs) or the Tesla V100 graphics processing unit (GPU) (up to 125 TFLOPS tensor performance), are available as consumer products.

The fast-paced advances in deep learning have been quickly adopted by the field of chemistry for small-molecule design.58−60 Successes range from (1) the prediction of binding activities of small molecules61−63 to (2) sophisticated AI software for reaction prediction64,65 and reaction route planning7 and (3) the inverse design of small molecules.60,66

New opportunities have emerged in the last years with the advent of deep generative models67−69 originally developed for text and images. Deep generative models offer the prospect of a paradigm shift from traditional forward design and virtual screening of combinatorial libraries toward a more diverse, yet focused, exploration of chemical space for various applications ranging from de novo drug discovery to small-molecule-based materials for organic photovoltaics or energy storage applications, among others. Such an inverse-design pipeline needs to (1) learn the rules of chemistry to generate valid chemical structures, (2) efficiently evaluate the molecular properties of newly generated structures, and (3) quickly identify the relevant chemical space, resulting in either focused libraries or a small set of lead candidates for experimental testing.

Since molecules can be represented as Simplified Molecular Input Line Entry Specification (SMILES) character strings, recurrent neural networks for sequence and text generation70−73 serve as natural frameworks for algorithms where the computer "dreams" up new molecular libraries. Segler et al.74 have trained a long short-term memory (LSTM) recurrent neural network (RNN) on a large molecular library of 1.4 million molecules extracted from the ChEMBL database. About 900 000 new molecules were generated by sampling 50 000 000 SMILES symbols. The properties of the generated molecules, such as the number of proton H-donors/acceptors, solubility (log P), total polar surface area, etc., resemble what the model has seen in the training set. In the first step, the task of the LSTM-RNN model is to reproduce valid SMILES strings; in the second step, transfer learning is applied to shift the distribution of the molecular properties toward the generation of active compounds against certain targets. Since the model is already pretrained, fewer examples of known active compounds (∼1 000) are sufficient to retrain the model to generate more focused libraries of novel potential drug candidates. A similar approach has been used by Merk et al.75 for the de novo molecular design of agonists of the therapeutically relevant retinoid X and/or peroxisome proliferator-activated receptors. A total of 5 out of 49 high-scoring lead candidates extracted by the deep-learning approach were experimentally tested, and 4 of them did indeed show considerable potency.
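The character-level language-modeling idea behind these generative models can be sketched compactly; the following PyTorch toy (three molecules, a tiny vocabulary, and an arbitrary architecture) only illustrates next-character training on SMILES and is not the model or training protocol of ref 74.

```python
# Minimal sketch of a character-level LSTM over SMILES strings, in the spirit
# of the generative models discussed above (toy vocabulary and data).
import torch
import torch.nn as nn

smiles = ["CCO", "c1ccccc1", "CC(=O)O"]                       # toy training set
vocab = sorted({ch for s in smiles for ch in s} | {"^", "$"}) # ^ start, $ end
stoi = {ch: i for i, ch in enumerate(vocab)}

class SmilesLSTM(nn.Module):
    def __init__(self, vocab_size, embed_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        out, _ = self.lstm(self.embed(tokens))
        return self.head(out)                 # logits for the next character

model = SmilesLSTM(len(vocab))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):                        # tiny demonstration loop
    for s in smiles:
        seq = torch.tensor([[stoi[c] for c in "^" + s + "$"]])
        inputs, targets = seq[:, :-1], seq[:, 1:]
        logits = model(inputs)
        loss = loss_fn(logits.reshape(-1, len(vocab)), targets.reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
print("final loss:", loss.item())
```

Sampling new strings character by character from such a model, and transfer learning on a small set of actives, are the additional steps described in the text.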

While focused library generation has been shown to yield a diverse set of molecules resulting in experimentally verified novel bioactive compounds,75,76 the covered chemical space is inherently restricted by the training set used. One way to push the distribution of generated molecules outside the chemical space of the training set is to couple the sequence-based generative models with policy-based reinforcement learning.77,78 In ref 78, a prior RNN was trained on 1.5 million compounds from the ChEMBL database, while another RNN is used as the agent network. Based on Markov decision processes, the agent chooses the next character in the sequence when generating the SMILES string. During the learning epochs, the agent policy is subsequently updated to maximize its expected return when evaluating an application-specific scoring function for the generated character sequence. The model has been successfully demonstrated for various tasks, ranging from avoiding certain elements in the generated molecules to producing new compounds that are predicted to be biologically active. Reinforcement learning for inverse molecular design has also been applied in combination with Generative Adversarial Networks (GANs) for molecular structure generation.79,80 Both stochastic and deterministic policy gradients have been used. Compared to RNNs for character sequence generation, GANs offer more flexibility in the representation of molecular structures. MolGAN,80 for example, represents molecules as graphs.

Another approach to the inverse design of small molecules, pioneered by Gómez-Bombarelli et al.,66 uses variational autoencoders (VAEs). An encoder NN is used to compress molecular space into a continuous vector space representation. Vector points in this so-called latent space can then be decoded back into molecules. By training the VAE jointly with a model for property prediction, the latent space ensures a sufficiently smooth and continuous representation of both structures and properties. This facilitates Bayesian optimization in the latent space for the de novo design of molecules with the desired properties. In a simple example, Gómez-Bombarelli et al. demonstrate their concept for the design of druglike compounds that are easy to synthesize. Hereby, the authors trained the VAE on 250 000 molecules extracted from the ZINC database and estimate that their VAE architecture can potentially generate approximately 7.5 million distinct structures.
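The key ingredient of this approach, a latent space shaped jointly by reconstruction and property prediction, can be sketched as follows. The PyTorch code below operates on generic fixed-length feature vectors with random stand-in data and arbitrary layer sizes, not on SMILES sequences, and is not the architecture of ref 66.

```python
# Hedged sketch of a variational autoencoder with a joint property-prediction
# head; inputs are generic feature vectors, and all sizes are arbitrary.
import torch
import torch.nn as nn

class PropertyVAE(nn.Module):
    def __init__(self, input_dim=128, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 64), nn.ReLU())
        self.to_mu = nn.Linear(64, latent_dim)
        self.to_logvar = nn.Linear(64, latent_dim)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                     nn.Linear(64, input_dim))
        self.property_head = nn.Linear(latent_dim, 1)   # joint property model

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        return self.decoder(z), self.property_head(z), mu, logvar

model = PropertyVAE()
x = torch.randn(8, 128)                     # fake batch of molecular features
y = torch.randn(8, 1)                       # fake target property values
recon, prop, mu, logvar = model(x)
kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
loss = nn.functional.mse_loss(recon, x) + nn.functional.mse_loss(prop, y) + kl
loss.backward()
print("joint loss:", loss.item())
```

Because the property head shares the latent code with the decoder, optimizing in that latent space and decoding the optimum is what enables the de novo design step described above.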

Since generative models such as autoencoders do not depend on hand-coded rules, the model needs to learn chemistry and the syntax of molecular representations, for example, SMILES strings. This can be a challenge, and usually VAEs produce, to a large degree, invalid molecules. This issue has been addressed by various improvements on the autoencoder architectures for small molecules. For example, Kusner et al.81 introduced a grammar variational autoencoder (GVAE) to generate molecules with valid SMILES syntax. However, the SMILES language is not entirely context free, which becomes a problem when ring bonds are involved. This has led to the development of a syntax-directed VAE,82 which applies on-the-fly semantic validation by implementing stochastically lazy attributes. Besides SMILES strings, graphs have also been used to generate diverse libraries of molecules.83,84 To date, the best VAE when it comes to generating chemically valid molecules is the Junction Tree Variational Autoencoder,85 which represents molecules as graphs that are dissected into subgraphs and smaller building blocks. Each molecule is then described as a tree-structured scaffold over chemical substructures. Further, a variety of additional methods for inverse design have recently emerged. For example, ChemTS86 uses a Monte Carlo tree search approach for which an RNN is trained as the rollout policy. Molecules are represented as SMILES. The approach was successfully demonstrated for the design of molecules that maximize the octanol−water partition coefficient (log P) and simultaneously optimize for synthetic accessibility, with an additional penalty score to avoid the generation of molecules with large rings. Another recent approach to inverse design is based on Bayesian molecular design.87 Hereby, ML models are trained to predict structure−function relationships for molecular properties. Then Bayes's law is used to derive a posterior distribution for backward prediction. Molecular generation is based on the SMILES representation, and a chemical language model is trained separately. A summary of the discussed deep learning methods that are used in chemistry can be found in Table 1.

Table 1. Deep Learning in Chemistry

RNN LSTM: Long short-term memory recurrent neural network for generating focused molecular libraries.74
GAN: Generative Adversarial Networks for molecular structure generation.79,80
MolGAN: Generative Adversarial Network that represents molecules as graphs.80
VAE: Variational autoencoder for the inverse design of small molecules.66
GVAE: Grammar variational autoencoder to generate molecules with valid SMILES syntax.81
SD-VAE: Syntax-directed variational autoencoder which applies an on-the-fly semantic validation of the generated SMILES.82
JTVAE: Junction Tree Variational Autoencoder that represents molecules as graphs.85
ChemTS: Combines recurrent neural networks and Monte Carlo tree search for de novo molecular design.86
Bayesian molecular design: Inverse molecule design using Bayesian statistics combined with a chemical language model.87


5. AUTONOMOUS DISCOVERY SYSTEMS

The progress in automation and deep learning algorithms, as described in the previous sections, has enabled the design of autonomous discovery systems. In particular, for chemical systems, autonomous molecular discovery systems must go beyond the automation of the synthesis and/or the characterization of potential compounds. Such a system needs to (1) generate hypotheses, (2) test them, and (3) adapt these hypotheses by performing automated experiments.

Adam and Eve, designed to generate and test hypotheses in a closed-loop cycle using laboratory automation in the field of biomedical research,89,90 are considered among the first autonomous systems. Adam produced new scientific knowledge by analyzing genes and enzyme functions of yeast.89,91−93 To propose hypotheses, Adam used logic programming for representing background knowledge, where the metabolism of yeast, including most of the genes, proteins, enzymic functions, and metabolites, was modeled as a directed, labeled hypergraph.89 In a closed loop, Adam then applied abductive reasoning to form hypotheses, used active learning to select experiments, generated experimental data by measuring the optical density of the yeast cultures, and then tested the hypotheses using decision tree and random forest algorithms.89 The combination of this software with the automation of Adam's hardware, which includes robotic arms to control the experimental setup such as a liquid handler and a plate reader, allowed Adam to autonomously perform microbial experiments.89

While Adam had been designed to investigate genes and enzyme functions, Eve specializes in early-stage screening and design of drugs that target neglected Third World diseases.88−94 Eve, shown in Figure 3, comprises three types of liquid handlers, two microplate readers, and an automated cellular imager that are operated by an active learning algorithm.88,95 This algorithm allows for cellular growth assays, cell-based chemical compound screening assays, and cellular morphology assays.90 Eve can perform scientific investigations that go beyond standard library screening and is capable of performing hit confirmation and lead generation.96 While Eve demonstrates the possibilities of active learning in compound screening, like Adam, it was not designed for synthesizing chemicals.88

Figure 3. Eve autonomous system. The figure is adapted from ref 88 and used under CC BY 4.0.
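The active-learning loop at the core of such screening campaigns can be sketched generically; the uncertainty-sampling example below uses synthetic scikit-learn data and a logistic-regression surrogate and has no relation to Eve's actual assays or algorithms.

```python
# Generic active-learning sketch (uncertainty sampling) on synthetic data,
# in the spirit of the screening loop described above; not Eve's actual code.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
labeled = list(range(20))                     # start with a few labeled points
unlabeled = [i for i in range(len(X)) if i not in labeled]

model = LogisticRegression(max_iter=1000)
for round_id in range(10):
    model.fit(X[labeled], y[labeled])
    proba = model.predict_proba(X[unlabeled])[:, 1]
    # Query the unlabeled point the model is least certain about.
    query = unlabeled[int(np.argmin(np.abs(proba - 0.5)))]
    labeled.append(query)                     # "run the experiment" = reveal label
    unlabeled.remove(query)
print("labeled set size:", len(labeled),
      "accuracy on remaining pool:", model.score(X[unlabeled], y[unlabeled]))
```

In a laboratory setting, "revealing a label" corresponds to actually running an assay, which is why query selection matters when each data point is expensive.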

Several academic research teams are driving the progress in the design of autonomous chemical synthesis systems that can be used for specific applications.97−110 There is also a number of automated platforms in pharmaceutical companies. However, since these platforms are often commercialized and not open source, their current state of the art is less transparent, and it is not easy to evaluate how autonomous they are. In the following, we explicitly highlight a few selected examples from five different research groups (1−5), some of which are also shown in Figure 4. A summary of the discussed systems can be found in Table 2.

(1) Remarkable designs of autonomous systems have been developed in the laboratory of Lee Cronin. These include an organic synthesis system,111 a dropfactory system,112 and the Chemputer.113 The latter is shown in Figure 4a. The autonomous organic synthesis system,111 with its liquid handling robot and inline spectroscopy tools, i.e., an NMR system, a mass spectrometer, and an infrared spectrometer, performs chemical reactions, analyzes the products, and then uses real-time data processing and feedback mechanisms to predict the reactivity of chemical compounds. Combining machine learning, robotics, and real-time feedback with the information provided by human experts, who initially labeled the reactivity of 72 mixtures to train the support vector machine algorithm and manually performed the reactions, the system was able to autonomously explore the chemical reactivity landscape of about 1000 reaction combinations.111 As a result of this study, the researchers were able to reveal four new chemical reactions. Discovery of molecules with targeted properties requires algorithms that allow for searching outside the local chemical space. Using a curiosity-driven algorithm and by investigating the behavior of oil-in-water proto-cell droplets in a closed-loop cycle, the autonomous dropfactory system112 was able to identify and classify modes of proto-cell droplet motion among other behaviors. In the same research group, the Chemputer,113 a modular system driven by a chemical programming language, synthesized the three pharmaceutical compounds Nytol, rufinamide, and sildenafil.

(2) Researchers from the group of Steven Ley developed a flow system controlled by the LeyLab software that allows monitoring and control of chemical reactions, automation of synthetic procedures, and autonomous self-optimization of reaction parameters. The system was able to optimize a three-dimensional heterogeneous catalytic reaction and a five-dimensional Appel reaction.107 Further, the group demonstrated a cloud-based solution that allows scientific collaborations across the globe. For the synthesis of the active pharmaceutical ingredients Tramadol, Lidocaine, and Bupropion, researchers in the USA remotely initiated, monitored, and controlled the experimental setup stationed in a laboratory in the U.K., including the chemicals and a self-optimized continuous IR flow system, as shown in Figure 4b.

(3) The autonomous self-optimizing reactor designed in the group of Klavs F. Jensen, with its design-of-experiment (DOE)-based adaptive response surface algorithm, allows for the simultaneous optimization of discrete variables, such as types of catalysts or solvents, and continuous variables, such as temperature, reaction time, and concentration. The reactor was used for precatalyst selection in Suzuki−Miyaura cross-couplings and for the optimization of an alkylation reaction. Among ten different solvents and three continuous variables,98,99 the reactor identified solvents and reaction conditions that maximized the yields of the monoalkylated product. In the same research group, a major drawback of continuous-flow chemical synthesis systems has been addressed, i.e., that such systems are often built for very specific chemical reactions and/or targets.100 The researchers addressed this problem by designing a plug-and-play system with interchangeable modules, as shown in Figure 4c. This reconfigurable system allows optimization of a variety of different chemical reactions, including high-yielding implementations of C−C and C−N cross-couplings, olefinations, reductive aminations, nucleophilic aromatic substitutions (SNAr), photoredox catalysis, and multistep sequences.

(4) Researchers in the group of François-Xavier Felpin110 developed an autonomous self-optimizing flow reactor that is controlled by a custom-made optimization algorithm derived from the Nelder−Mead and golden section searches with a flexible monitoring system. The system was able to perform a multistep total synthesis of carpanone.110

(5) Autonomous closed-loop platforms have also been designed for the synthesis of single-walled carbon nanotubes by the team of Benji Maruyama at the Air Force Research Laboratory (AFRL). The Autonomous Research System (ARES) autonomously learned to target growth rates.117

Figure 4. Examples of modern chemical synthesis systems with autonomous control. (a) Chemputer designed in the group of Lee Cronin, from ref 113. Reprinted with permission from AAAS. (b) Remotely controlled FlowIR developed in the group of Steven Ley. Figure adapted from ref 115 and used under CC BY 4.0. (c) The plug-and-play system designed in the group of Klavs F. Jensen. Figure adapted from ref 100. Reprinted with permission from AAAS.

Table 2. Autonomous Discovery Systems

Adam89,91−93: Autonomous system for biomedical research to analyze gene and enzyme functions of yeast.
Eve88: Autonomous system for early-stage screening and design of drugs targeting Third World diseases.
Organic synthesis system111: Autonomous organic synthesis system to predict the reactivity of chemical compounds.
Dropfactory system112: Autonomous system to explore the behavior of oil-in-water proto-cell droplets.
Chemputer113: Modular system that synthesized Nytol, rufinamide, and sildenafil.
Flow system LeyLab107: Autonomous flow system that optimized catalytic reactions.
FlowIR114,115: Cloud-based flow system that synthesized Tramadol, Lidocaine, and Bupropion.
Reactor98,99: Self-optimizing reactor that allows optimization of discrete and continuous parameters.
Plug-and-play system100: Plug-and-play system with interchangeable modules for a variety of different chemical reactions.
Flow reactor110: Autonomous self-optimizing flow reactor that performed a multistep synthesis of carpanone.
ChemOS116: Software package to handle the workflow of autonomous platforms that explored color and cocktail spaces.
ARES117: Autonomous Research System for the synthesis of single-walled carbon nanotubes.

These are just a few selected examples of specialized systems with autonomous control. Additional discussions of autonomous systems and their specific applications for energy and drug discovery processes can be found in refs 12, 110, and 118. One needs to note that the boundary between the automated systems with optimization loops described in the previous section and the autonomous systems discussed here is very thin. Most importantly, an optimization or a search algorithm lies at the heart of each autonomous system. This algorithm allows for the learning of an objective function, i.e., a function that describes the experimental outcome, e.g., the yield of a chemical reaction, as a function of a set of external parameters.119 By feeding the experimental data into supervised or unsupervised machine learning algorithms, the dependency of the desired properties on the set of experimental conditions can be learned in real time, thus allowing the autonomous system to explore the parameter space of the experiment it is designed for. The ML algorithms can also be used for understanding the general energy landscape of the studied systems.120−122 While supervised algorithms use labeled data to train the model, unsupervised models have successfully proven to be able to unravel nontrivial patterns in complex data. Promising unsupervised algorithms to operate autonomous platforms comprise Bayesian deep learning, Bayesian conditional generative adversarial networks,123 and deep Bayesian optimization,119 among others. For instance, ChemOS,116 a software package able to handle the workflow of autonomous platforms, combines Bayesian optimization119 with laboratory instruments such as high-performance liquid chromatography (HPLC) to learn the color and cocktail chemical spaces.116
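A minimal version of the Bayesian-optimization loop that such platforms build on can be sketched as follows; the one-parameter "experiment" is a synthetic stand-in, the Gaussian-process surrogate and expected-improvement acquisition are generic choices, and this is not the ChemOS implementation.

```python
# Hedged sketch of Bayesian optimization of a one-dimensional "experimental"
# objective with a Gaussian process and expected improvement (synthetic data).
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def experiment(x):
    # Fake measured response (e.g., reaction yield) as a function of one knob.
    return np.exp(-(x - 0.7) ** 2 / 0.05) + 0.05 * np.random.randn()

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(3, 1))            # a few initial experiments
y = np.array([experiment(x[0]) for x in X])
grid = np.linspace(0, 1, 200).reshape(-1, 1)

for iteration in range(10):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X, y)
    mu, sigma = gp.predict(grid, return_std=True)
    improvement = mu - y.max()
    z = improvement / np.maximum(sigma, 1e-9)
    ei = improvement * norm.cdf(z) + sigma * norm.pdf(z)   # expected improvement
    x_next = grid[int(np.argmax(ei))]         # next condition to test
    X = np.vstack([X, x_next])
    y = np.append(y, experiment(x_next[0]))
print("best conditions:", X[np.argmax(y)], "best response:", y.max())
```

In a real platform, experiment() would dispatch a synthesis and characterization run, and the loop above is exactly where the experimental feedback enters the decision-making process.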

Each synthesis and characterization step performed by such an autonomous discovery system adds new information that is then used within the feedback loop as a decision-making process for the next synthesis step, guiding the experimental search for new molecules. The vision is to move away from specialized autonomous systems toward the more general concept of autonomous molecular discovery platforms that target a broad range of different chemical reactions. Therefore, to synthesize specific molecules with desired properties, an autonomous platform should be as flexible as possible in design on both ends, i.e., in hardware and in software. This demands modular, interchangeable monitoring and synthesis systems. In addition, the algorithm that controls the system should be robust enough to optimize an objective function within a chemical search space spanned by discrete and continuous variables as well as flexible enough to explore the molecular design space. All of the autonomous systems outlined above demonstrate how combining robotics and machine learning algorithms can lead to new scientific understanding and discovery. Each of these systems tackles different aspects that bring us closer to a generalized autonomous molecular discovery platform and, ultimately, to building platforms that will be able to autonomously dream up new materials at the push of a button.

6. LOOKING AHEAD

The general trend of chemistry automation has not been one of continuous progress but rather has involved periods of stagnation and exploration of various designs. Automation in chemistry still lacks the desired versatility. Currently, most of the automated systems are limited to specific sets of chemical reactions, and adjusting them between reactions may require substantial effort. Synthetic platforms should be reaction agnostic and easily transformable. Systems focused on product versatility are useful for generating large molecular libraries. However, to push the envelope of smart synthesis systems, process versatility should also be explored in depth.

The ability of the software to flexibly access the devices is the heart of any autonomous system. Many characterization and synthesis devices exist on the market as modular tools. Unfortunately, so far, not all of them offer easy access for third-party software control. Ideally, chemical synthesis and characterization systems should adopt a general robotic standard for their interface with external devices. The field of robotics is entering a phase where robotic systems can cooperate with humans in a safe way or even learn from them. This standard has yet to be used by chemistry automation and comes with a lot of opportunities and challenges. On the one hand, graphical user interfaces (GUIs) are practical for human researchers operating the devices or trying hypotheses but become impractical for automated control systems. On the other hand, application program interfaces (APIs) are practical for the software control of devices but are not flexible enough to use for single experiments. The balance between integration and modularity should also be addressed in future autonomous systems. Highly integrated characterization tools such as NMR spectrometers and HPLC/MS are commonly used in automated experimentation. However, the entire system should be modular, where each unit can be easily added or removed without reprogramming the entire software.
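One way to express such modularity in software is a thin device-abstraction layer; in the sketch below, all class and method names (InstrumentModule, configure, run) are invented examples, not an existing robotic standard or vendor API.

```python
# Hedged sketch of a modular device-abstraction layer: each instrument exposes
# the same minimal programmatic interface so modules can be swapped without
# rewriting the control software. Names are invented examples.
from abc import ABC, abstractmethod

class InstrumentModule(ABC):
    """Common API that every synthesis or characterization module implements."""

    @abstractmethod
    def configure(self, **settings) -> None: ...

    @abstractmethod
    def run(self, sample_id: str) -> dict: ...

class HPLCModule(InstrumentModule):
    def configure(self, **settings) -> None:
        self.settings = settings                 # e.g., flow rate, gradient

    def run(self, sample_id: str) -> dict:
        # Placeholder for a real driver call; returns a fake chromatogram summary.
        return {"sample": sample_id, "purity": 0.95, "settings": self.settings}

# The orchestrator only talks to the abstract interface, so modules are swappable.
modules: list[InstrumentModule] = [HPLCModule()]
for module in modules:
    module.configure(flow_rate_ml_min=1.0)
    print(module.run("sample-001"))
```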

Theoretical/computational tests of deep ML methods, especially of generative models, show multiple advantages of these methods as compared to the ML tools used previously. However, the deep learning techniques still have to be implemented and tested in complete discovery pipelines. One of the issues here can be the general hype about deep learning. Most likely, deep learning is not a magical tool that will eventually solve all problems of materials discovery. However, it is applicable to some intermediate steps, providing solutions, e.g., on how to perform a proper and sufficiently complete sampling of the chemical space. The overall goal here is to increase the hit rate, rather than just to increase the size of the libraries. One related question to be addressed here is how to evaluate and compare the performance of the deep learning methods.

In addition to the questions raised above, the question remains of how fast our deep learning models for autonomous systems/platforms can learn from the experiments. The conventional paradigm of deep learning is that the models are trained on a tremendous amount of data. This is not an issue for data from social networks or for image processing. However, in the case of automated chemical synthesis, each data point can be expensive, and its generation is limited by the throughput of the synthesis and characterization systems. How much experimental data is enough to learn from experimental measurements with active feedback? Also, these data can be obtained with different precision. The active learning algorithms should automatically decide what precision to use in order to balance the cost of the experiment and the value it brings to the training of the algorithm.

One of the main advantages of deep learning is the ability to train a model on raw, unprocessed data. However, this cannot be directly implemented for learning molecular properties. In the latter case, we need to digitize molecular information and write it in the form of molecular descriptors. Therefore, the deep learning model can learn only as much information as is encoded in the descriptors.
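As a small example of this digitization step, the sketch below converts SMILES strings into a short descriptor vector with RDKit (assuming the rdkit package is available); the particular descriptors are an arbitrary choice.

```python
# Sketch: turning SMILES strings into a fixed-length descriptor vector with
# RDKit (assumes the rdkit package is installed); descriptor choice is arbitrary.
from rdkit import Chem
from rdkit.Chem import Descriptors

def featurize(smiles):
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:                        # invalid SMILES string
        return None
    return [Descriptors.MolWt(mol),
            Descriptors.MolLogP(mol),
            Descriptors.TPSA(mol),
            Descriptors.NumHDonors(mol),
            Descriptors.NumHAcceptors(mol)]

for s in ["CCO", "c1ccccc1O", "CC(=O)Nc1ccc(O)cc1"]:
    print(s, featurize(s))
```

Whatever a downstream model can learn about these molecules is bounded by the information carried in such descriptor vectors.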

We are in an exciting era, where human-robot cooperation redefines the scientific discovery process and accelerates the discovery and synthesis of molecules with targeted properties. The generalization of autonomous platforms for molecular discovery will require us to rethink existing platform designs, encompassing several challenges on the hardware and software sides as well as on the interface between them.

■ AUTHOR INFORMATION
Corresponding Author
*E-mail: [email protected]
ORCID
Tanja Dimitrov: 0000-0002-5675-7825
Alán Aspuru-Guzik: 0000-0002-8277-4434
Semion K. Saikin: 0000-0003-1924-3961
Notes
The authors declare no competing financial interest.


■ ACKNOWLEDGMENTS
A.A.-G. thanks Dr. Anders G. Frøseth for his generous support.
