The Dark Reactions Project: A Case Study in Using …...The Dark Reactions Project: A Case Study in...
Transcript of The Dark Reactions Project: A Case Study in Using …...The Dark Reactions Project: A Case Study in...
The Dark Reactions Project: A Case Study in Using Machine Learning to
Accelerate Exploratory Materials Synthesis
Joshua Schrier Haverford College /Fordham University
A Related Effort: The Mission Innovation
Materials Acceleration Platform Workshop Report (Jan 2018)
The workshop (September 2017 in Mexico City) drew 133 attendees: ● 55 professors and scientists from top universities and research institutions; ● 6 keynote speakers and panellists, including Nobel Laureate Dr. Mario Molina; ● 16 MI member governments represented: Australia, Canada, Denmark, Finland,
France, Germany, European Union, India, Italy, Korea, Mexico, Netherlands, Norway, Saudi Arabia, United Kingdom, and United States;
● affiliates of Mexico- and U.S.-based universities, groups, labs, and companies; ● graduate students and postdoctoral researchers; and observers from different
countries
The Workshop
Updating the vision of MGI 1.0
Materials Genome Initiative 2011
Simula'on
Machine
Learning
Robo'c
Synthesis
Mission Innovation Challenge 6 2017
Main Recommendation: Close the Loop!
Automated
Synthesis
Automated
Characteriza'on
Screencomputa'onally(AI+simula'on)
Biggest surprise: Importance of modularity.
SearsTower
BurjKhalifa
MA
P e
lem
ents
1.Closingtheloop
2.AIforMaterials
3.ModularMaterialsRobo'cs
4.InverseDesign
5.BridgingLengthandTimescales
6.DataInfrastructureandExchange
1. Closing the loop Integratepowerful,yetusuallyseparateelementsofmaterialsdesigninaclosedloop
Autonomous closed-loop nanoparticle synthesis system Benji Maruyama Air Force Research Laboratories
2. AI for Materials
Thescaleanddetailsoftheore'cal,computa'onal,synthe'c,andcharacteriza'onevidenceinmaterialsresearchrequiretheestablishmentofthisnewbranchofAI.
Na'onalandinterna'onalresearchorganiza'onsmustfacilitateanintegratedcomputerandmaterialsscienceresearchefforttodevelopalgorithmsthatmimic,andthensupersede,theintellectandintui'onofexpertmaterialsscien'sts.
Toaccommodateevolvingmaterialsdemandsandtheever-expandingbreadthofcleanenergytechnologies,autonomouslaboratoriesmustremainnimbleandmo'vateamodularapproachtothedevelopmentofmaterialsscienceautoma'on.
Trea'ngtechniquesandmaterialsasmodularbuildingblocksfostershuman-machinecommunica'onandsimplifiesthepathtomaterialsexplora'onbeyondtheboundsofknownmaterials.
The Synthesis Machine Marty Burke, University of Illiniois at Urbana Champaign
3. Modular Materials Robotics
Inversedesignenablesautomatedgenera'onofcandidatematerialsdesignedtomeettheperformance,cost,andcompa'bilityrequirementsofagivencleanenergytechnology.
Pro
perty
rang
e FORWARD
INVERSE
4. Inverse Design
The Materials Project
Pitt Quantum Repository
Harvard Clean Energy Project AFLOW-lib and AFLOW-ml
6. Data Infrastructure
So where are we today?
A case study on “The Dark Reactions Project”
Initial Motivations • Most synthesis reactions are “failures”
● Archivedinlaboratorynotebooks● Neverreportedintheliterature● “Darkreac'ons”
• Can we learn from the “dark reactions”? ● Defineboundaryof“successful”reac'ons● Useasadatasetformachinelearning● Predictnewreac'ons?Speedupdiscovery?● Developnew(testable)chemicalhypotheses?
Amine-Templated Metal Oxides…
Norquist et al. Inorg. Chem. 2006, 45, 5529.
0
c
b
Norquist et al. Inorg. Chem. 2009, 48, 11277.
Norquist et al. Inorg. Chem. 2004, 43, 6528; Norquist et al. Cryst. Growth Des. 2005, 5, 1913.
Norquist et al. J. Solid State Chem. 2012, 195, 86.
Norquist et al. J. Solid State Chem. 2011, 184, 1445.
Norquist et al. Inorg. Chem. 2014, 53, 12027.
Norquist et al. Inorg. Chem. 2012, 51, 11040.
…produced by hydrothermal synthesis.
Norquist et al. Inorg. Chem. 2006, 45, 5529.
0
c
b
Norquist et al. Inorg. Chem. 2009, 48, 11277.
Norquist et al. Inorg. Chem. 2004, 43, 6528; Norquist et al. Cryst. Growth Des. 2005, 5, 1913.
Norquist et al. J. Solid State Chem. 2012, 195, 86.
Norquist et al. J. Solid State Chem. 2011, 184, 1445.
Norquist et al. Inorg. Chem. 2014, 53, 12027.
Norquist et al. Inorg. Chem. 2012, 51, 11040.
Hydrothermal syntheses
…described by a simple, modular recipe. • Two metal sources • One amine • (Oxalic acid) • Water • pH • Temperature • Time
The Problem
• Exploratory synthesis—vast chemical space • Can not predict reaction outcome!
• Unanticipated structures are often observed • No predictive model for reaction success
Is this simply a classifier problem?
Inputs: Concentration pH Time Temperature Etc.
Outputs: Big crystal Small crystal Something bad (“tar”) No reaction
“chemistry”
“machine learning”
Including data on unpublished "failed" experiments is essential for reliable ML models.
unreported experimental
details
• To distinguish “success” and “failure”, we need examples of both.
• But publication norms preclude exhaustive detail ● Noreportsof“failure”
● Onlyonereportof“success”percompound
● “Everythingshouldwork?”
• The reactions have been performed and recorded… ● ...instacksofoldlaboratorynotebooks
• Dark Reactions Project ● h\p://darkreac'ons.haverford.edu
Input Property 1
Inpu
t Pro
pert
y 2
Nature 533, 73-76 (2016)
Nature 533, 73-76 (2016)
Nature 533, 73-76 (2016)
Hydrothermal Synthesis • Two metal sources • One amine • (Oxalic acid) • Water • pH • Temperature • Time
Simple database schema!
Nature 533, 73-76 (2016)
You can’t build a classifier on raw data…
Inputs: Concentration pH Time Temperature Etc.
Outputs: Big crystal Small crystal Something bad (“tar”) No reaction
“chemistry”
“machine learning”
…you need an appropriate representation.
Inputs: Concentration pH Time Temperature Etc.
Outputs: Big crystal Small crystal Something bad (“tar”) No reaction
“chemistry”
327-dim reaction description vector
Turning lab data into descriptors • Start with laboratory notebook data
● Reagentnames,masses,reac'oncondi'ons● Categorizedcrystalqualityoutcomes(1..4)
• Use cheminformatics to add physical properties ● [Keepthereac'oncondi'ons,e.g.,temperature,'me]● Stoichiometries,ra'os● Organicmoleculeproper'es
● Solubility,polarsurfacearea,#hbonddonors/acceptors,etc.
● Inorganiccomponentproper'es● Electronega'vity,ioniza'onenergy,radii,posi'oninperiodictable,etc.
• Use these as reaction descriptors (327 per reaction) Nature 533, 73-76 (2016)
Be careful how you test & train…
• Random test & train does not generalize! • “Exploratory” split
1. Puteverysetofreac'onsthatsharetheexactsamereactantsintoeitherthetrainortestset
2. Testshowthemodelperformson“novel”reac'ons● Avoid“hidden”overficng.Seealsorelevantrecent
workbyMaxHutchinson(Google)&IzharWallach(Atomwise)
Nature 533, 73-76 (2016)
• …and recent work in KDD ‘18 “Optimizing a Machine Learning System for Materials Discovery” (and GMN’s thesis)
Nature 533, 73-76 (2016)
Database-assisted Synthesis Planning
• Use databases of all commercially available organics ● Emolecules.com● Keeponlydiamines,eliminate“bad”func'onalgroups● Select34/1681basedonpriceandavailability
• Sample across structural similarity to existing amines • Chemical novelty
● “Thechemistryofthese34aminesisessen'allyunknownintheforma'onoforganicallytemplatedmetaloxides,asindicatedbythealmostcompleteabsenceofsuchcompoundsintheCambridgeStructuralDatabase(CSD).Onaverage,2structureshavebeenreportedforeachofthe34amines,with19notexis'nginanytemplatedmetaloxidestructuretheCSD.Incontrast,anaverageof151uniquestructuresexistforthemostfrequentlyusedamines(piperazine,ethylenediamine,4,4’-dipyridyl,anddabco).”
Nature 533, 73-76 (2016)
Nature 533, 73-76 (2016)
Human versus Machine • Human expert’s best “intuition”:
● Scalethemolequan''esoftheorganictomatchthesuccessful“seedreac'on”
• Model predictions ● Scanthroughthousandsofpossiblecondi'ons
● Computa'onalblackbox(SVM)predictsoutcomes● Doexperimentonmostpromisingoutcome
• Tested with 495 new experiments
Nature 533, 73-76 (2016)
Katie Elbert ’14 (UPenn) Yunwen (‘Helen’) Yang ‘15 Wenjia (‘Scott’) Huang ’15 (JP Morgan/Chase) Aurellio Mollo ’17 (Harvard) Malia Wenny ’17(Harvard)
We made many new compounds!
Human 78% Machine 89%
Fischer exact test: p<0.01 that machine is not better than human
Interpretable ML models can reveal insight into reaction processes and provide new
hypotheses—ML need not be a "black box".
Nature 533, 73-76 (2016)
SVMs are not “human readable”
• SVMs lead to high-quality predictions ● …butwecan’tlearnfromthem!
• Build a “model of the model” ● Surrogatemodels● TrainadecisiontreethatreproducestheSVMpredic'ons
Nature 533, 73-76 (2016)
Nature 533, 73-76 (2016)
Nature 533, 73-76 (2016)
What do the humans learn ? • Low polarizability amines require no Na+, longer reaction times, pH > 3
(green) • Spherical, compact amines (with moderate polarizabilities) require the
presence of S (blue) • Long linear tri- and tetramines (with high polarizabilities) require the
presence of oxalate (red)
Nature 533, 73-76 (2016)
Nature 533, 73-76 (2016)
ML can be used to generate ideas for new experimental plans, and
optimal ways of verifying proposed hypotheses.
• Primary: Reactant concentrations ● Specia'onandrela'veconcentra'ons
• Secondary: Charge density matching ● Fixedaminechargedensi'es
● Variablechargedensi'esontheinorganiccomponents
• Tertiary: Weaker influences ● Sterics
● Hydrogen-bonding
● Symmetry
Ferey, G. J. Fluorine Chem. 1995, 72, 187.
Can we test existing conceptual “models” of crystal formation?
Descriptors represent the theory…
Type Subset Descriptor
Reac'on Stoichiometry Amineamount(moles)
Vanadiumamount(moles)
Seleniumamount(moles)
Condi'ons Ini'alpH
Timeatmaximumtemperature
Maximumtemperature
Amine Aminestructure Chainlength
Molecularweight
Bondcount
Nitrogencount
Primaryammoniumsite(yes/no)
Cyclicstructure(yes/no)
Spherical(yes/no)
Amineacidity MinimumpKa
MaximumpKa
Chargedensity Maximalprojec'onarea/nitrogen
Generalproper'es ReagentisanHClsalt(yes/no)
Inorganics Vanadiumcounterions VanadiumsourcecontainsNH4+(yes/no)
VanadiumsourcecontainsNa+(yes/no) Adapted from Ferey, G. J. Fluorine Chem. 1995, 72, 187.
Polyhedron 114, 184-193 (2016)
…select amines to test the theory. • 14 amines were selected using several criteria
● Shape(linear,cyclic,bicyclic)
● Amineproper'es(1o,2o,3o,mixed)
● pKa
● Chargedensity
Adapted from Ferey, G. J. Fluorine Chem. 1995, 72, 187.
Polyhedron 114, 184-193 (2016)
Send students into the lab to generate data…
Polyhedron 114, 184-193 (2016)
• C4.5 decision tree algorithm uses information gain as the criteria for selecting branches
• Use this to generate the hierarchy of influences supported by the experimental data
Polyhedron 114, 184-193 (2016)
Hierarchy of Influences = Decision Tree
What amine properties govern 2D-layer formations?
Molecular Systems Design & Eng. (2018) DOI: 10.1039/c7me00127d
Historical dataset: 75 experiments
What amine properties govern 2D-layer formations?
Molecular Systems Design & Eng. (2018) DOI: 10.1039/c7me00127d
Historical dataset: 75 experiments
Usethesereactantproper'es +stoichiometryvaria'ons
…todesignafrac'onalfactorialdesignexperiment(128expts)(23amines)thatcreatesthedatasetwewouldneedtobuildamodel
Two new layers types appear…
Molecular Systems Design & Eng. (2018) DOI: 10.1039/c7me00127d
What reactants best test the model?
Molecular Systems Design & Eng. (2018) DOI: 10.1039/c7me00127d
What reactants best test the model?
Iden'fy19newaminesnearthedecisionboundaries
Runtheexperiments!
Refinethedecisionboundary
Molecular Systems Design & Eng. (2018) DOI: 10.1039/c7me00127d
A dream (and work in progress)
“Self-driving Chemistry” Semi-autonomous materials discovery ● Robo'csynthesis,103experimentsperday…
● APIforperformingexperimentsandstoring/retrievingresults(includingexperimentalmetadata)
● …drivenbyAc'veLearning+InterpretableModels● Itera'velychoose“op'mal”experimentsattheexplora'on/exploita'onfron'er,basedonmodeluncertainty
● Gethumaninput/vetoonproposedexperiments● Quan'fy“valueadded”ofhumanexperts
● Outputhumanreadable“rules”● …forasystemwithuncleardesignspace
● e.g.,Organohalideperovskites● fasterexperiments● tremendousstructuraldiversity(byvaryingorganiccomponents)● photovoltaics/sensors
Synergistic Discovery and Design (SD2)
Organohalide hybrid perovskites are an emerging class solution-processable
semiconductor of materials…
Source: DOI: 10.1039/C4EE00942H (Review Article) Energy Environ. Sci., 2014, 7, 2448-2463
…with promising applications to photovoltaics, etc.
“Ordinary” perovskite synthesis
Our approach: Robots
Liana Alves ’18 Alyssa Sherman ‘18 Peter Cruz Parilla ‘19
“Robot ready” perovskite synthesis
88
• DOE VFP @ LBNL • EmoryChan(MolecularFoundry)
• LianaAlves’18
Sample data: Crystal formation
Liana Alves ’18 (!UCSD) Peter Cruz Parilla ‘20 Alyssa Sherman ’18 (!UPenn) Emily Brown ‘19
89
Optical microscopy
Sample data: Crystal quality & properties
~800 rxns in 4 days—discover new crystal phase…story in progress…
Early finding: New retrograde soluble phase of phenethylammonium lead iodide
© 2017 Emerald Cloud Lab. All rights reserved.
The Emerald Cloud Lab provides a software interface to remotely control every aspect of experiments run in ECL’s automated facilities
© 2017 Emerald Cloud Lab. All rights reserved.
Emerald Cloud Lab Overview
Interpretable+Active learning (with oversight) • Interpretable: explainwhyMLdecisionsaremade
• Ac've:selectnewexperimentsthatbestimprovethemodel(itera've)
• Oversight:Expertcanveto“hypothesis”and“proposedexperiment”—quan'fyvalueaddedofhumaninloop
Exampleproposedexperiment
UncertaintyexplanaBon
Richard Philips ’18 (!Cornell)
Lessons learned • “Failed” reactions make better classifiers…
● Findoverlooked“surprises”latentinarchivaldata● Challenge:Needforpublicrepositories/datasets
• …simple models can be useful… ● Increasedlaboratoryproduc'vity/novelty● Challenges:
● meaningfulrepresenta'onsofmaterials&experiments● smallandwidedata,withlotsofhumanbias
• …and you might even learn something! ● Keepushonestaboutour“folk-theories”● Givesusideasofnewideastotest…● …andideasonhowtotestthem.
• Robots! Coming to a lab near you… ● Notjustspeed…reproducibility….● Importanceofmodularity
• Postdoctoral Fellows ● PhilipAdler(2015-2016)(Visa)—”original”darkreac'onsproject
● IanPendeton(2018-)—perovskites
• Undergraduate Students ● PaulRaccuglia’14(Google)—modeldevelopment
● Ka'eElbert’14(UPenn)—tes'ngreac'ons
● Yongjia“Sco\”Huang’15(JPMorgan/Chase)—tes'ngreac'ons
● Yunwen“Helen”Yang‘15—tes'ngreac'ons
● CaseyFalk’16(Amazon)—website/databasedevelopment
● GeoffreyMar'n-Noble’16(Google)—Dataentry,modeldevelopment
● DavidReilley’16(UCLA)—Dataentry
● AurellioMollo’17(Harvard)—tes'ngreac'ons,DFTfornewdescriptors
● MaliaWenny’17(Harvard)—tes'ngreac'ons,DFTfornewdescriptors,ini'alperovskiteswork
● PhilipNega’16(UPenn)—tes'ngreac'ons,experimentdesignproject
● BrianGuggenheimer’16(Amazon)—website/visualiza'ons
● NoraTien’18(Google)—website/visualiza'ons,modeldevelopment
● RosalindXu’18(!Harvard)—tes'ngreac'ons,factorialexptdesignproject;
● XiwenJia’19—structuraloutcomedescriptors;irra'onalityinexperimentplanning
● RichardPhilips’19(!Cornell)—ac'velearning
● LianaAlves’18(!UCSanDiego)—robo'csynthesisofperovskites
● PeterCruzParilla‘20—hypothesisvalida'on,perovskitechemistry
● AlyssaSherman’18(!UPenn)—hypothesisvalida'on,perovskitechemistry,
Acknowledgements
• Collaborators ● EmoryChan(MolecularFoundry,LBNL)—high-throughputperovskitesynthesis
● SorelleFriedler(Haverford)—interpretablemodels
● AlexanderNorquist(Haverford)—hydrothermalsynthesis
● EmeraldCloudLabs—robotarmy
• Funding ● NSFDMR-1307801&DMR-1709351● DARPAHR001118C0036
● WorkatMolecularFoundry:DOEBESDE-AC02-05CH11231
● DOEWorkforceDevelopment(WDTS)Visi'ngFacultyProgram(VFP)
● Dreyfus-TeacherScholarAward
Acknowledgements
Questions? [email protected] http://darkreactions.haverford.edu