ARTIFICIAL INTELLIGENCE IN CHEMISTRY Blaine Berrington
OVERVIEW
• Provide an overview of synthesis and its challenges
• Discuss past AI approaches to synthesis
• Discuss solutions that modern AI offers for synthesis
• Synthesis planning
• Reaction prediction
• Summarize other applications of AI to the field of chemistry
SYNTHETIC ORGANIC CHEMISTRY
Synthesis is the process of creating a designated target molecule through controlled, stepwise chemical reactions.
• Used in a huge array of industries from pharmaceuticals and dyes to superconductors and plastics
• Involves a number of problem solving strategies
• Requires meticulous planning and skill to carry out
https://techtransfer.cancer.gov/aboutttc/successstories/taxol
https://en.wikipedia.org/wiki/Paclitaxel
SYNTHESIS
• Intermediate approach
• Direct Associative approach
• Logic-centered approach
• Reduction of chemical complexity
• Formation of a “tree” of paths
• Time consuming
https://www.organic-chemistry.org/totalsynthesis/totsyn04/quinine-woodward-williams.shtm
LOGIC-CENTERED SYNTHESIS
• Perception of structurally important features within a target molecule: functional groups, stereocenters, and regional reactivity (instability and sensitivity)
• Reductions of molecular complexity (goal): internal connectivity scission, chain/appendage reduction, functionality removal, stereochemistry simplification, and instability removal
• Subgoals: functional group interchange, protecting groups/positional groups, and rearrangement
RETROSYNTHESIS OF A NATURAL PRODUCT (PENICILLIN)
https://chemistonthekeys.wordpress.com/2012/03/12/classic-synthesis-i-penicillin-v/
SYNTHESIS OF A NATURAL PRODUCT (PENICILLIN)
https://www.redbubble.com/people/narwhalfire/works/23613992-penicillin-v-total-synthesis
https://www.youtube.com/watch?v=rh0Tn_oPS30
DEVELOPING NEW MOLECULES
• You guess
• Quantitative structure–activity relationship (QSAR) approach
SOFTWARE FOR SYNTHESIS THEN (1960s)
• DENDRAL, LHASA (Corey and Wipke), and SECS (Wipke) were started nearly 50 years ago
• SECS and LHASA were retrosynthesis-oriented programs for determining reaction routes
• DENDRAL was used in characterization of unknown molecules via spectral data
• How do computers relate?
RETROSYNTHESIS TO COMPUTERS
LHASA
• Programmed and maintained by Corey as a tool of efficiency for synthesis design
• Interactive with chemist to yield reaction pathways of interest
• Chemist input of target molecule
• Boundary conditions and goals are defined
• Inverse synthetic operations that satisfy the goals are computed, and unlikely solutions are deleted
• Chemist then assesses the output precursors
LHASA
• Graphical Module
• Chemist draws a structure
• Represented as a connection table for atoms and bonds
• Coupled with a list of coordinates for atom positions
• Other representations, such as line notation, fall short
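The connection table described above can be pictured with a small Python sketch. This is only an illustration of the idea, not LHASA's actual data structure; the names `Atom`, `Bond`, and `ConnectionTable` are hypothetical, and hydrogens are left implicit.

```python
from dataclasses import dataclass, field

@dataclass
class Atom:
    element: str          # element symbol, e.g. "C"
    x: float = 0.0        # drawing coordinates taken from the sketch
    y: float = 0.0
    charge: int = 0

@dataclass
class Bond:
    a: int                # index of the first atom
    b: int                # index of the second atom
    order: int = 1        # 1 = single, 2 = double, 3 = triple

@dataclass
class ConnectionTable:
    atoms: list = field(default_factory=list)
    bonds: list = field(default_factory=list)

    def add_atom(self, element, x=0.0, y=0.0, charge=0):
        self.atoms.append(Atom(element, x, y, charge))
        return len(self.atoms) - 1

    def add_bond(self, a, b, order=1):
        self.bonds.append(Bond(a, b, order))

    def neighbors(self, i):
        """Indices of the atoms bonded to atom i."""
        out = []
        for bond in self.bonds:
            if bond.a == i:
                out.append(bond.b)
            elif bond.b == i:
                out.append(bond.a)
        return out

# Heavy-atom skeleton of ethanol: C-C-O
ethanol = ConnectionTable()
c1 = ethanol.add_atom("C", 0.0, 0.0)
c2 = ethanol.add_atom("C", 1.0, 0.5)
o = ethanol.add_atom("O", 2.0, 0.0)
ethanol.add_bond(c1, c2)
ethanol.add_bond(c2, o)
print(ethanol.neighbors(c2))
```

Keeping atoms and bonds as indexed records is what lets later modules walk the molecular graph and edit it directly.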
LHASA
• Perception Module
• Recognizes functional groups, chains, rings, symmetry, redundancy, and related atoms
• For rings: A_1 is the origin atom, and a path is grown along the network until it doubles back on itself (A_n has already appeared as A_i). The ring is then added to a list of rings.
Path: A_1, A_2, ..., A_i, ..., A_(n-1), A_n
Sequence (ring): A_i, A_(i+1), ..., A_(n-1)
*Allows for set operations
https://www.onlinemathlearning.com/union-set.html
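The path-growing idea above can be sketched in Python. This is a simplified illustration under stated assumptions, not the LHASA routine: the function name `find_rings` and the adjacency-dictionary input are hypothetical, and rings are stored as sets of atom indices so that set operations apply to them.

```python
def find_rings(adjacency):
    """Return the rings of a molecular graph as frozensets of atom indices.

    adjacency maps each atom index to the list of atoms bonded to it.
    """
    rings = set()

    def grow(path):
        last = path[-1]
        for nxt in adjacency[last]:
            if len(path) > 1 and nxt == path[-2]:
                continue                        # do not step straight back
            if nxt in path:
                i = path.index(nxt)             # path doubled back: A_n is A_i
                rings.add(frozenset(path[i:]))  # ring is A_i ... A_(n-1)
            else:
                grow(path + [nxt])

    for origin in adjacency:
        grow([origin])
    return rings

# Two fused three-membered rings (sharing the 0-2 bond) plus an appendage atom 4
adjacency = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2, 4], 4: [3]}
rings = find_rings(adjacency)

small = [r for r in rings if len(r) == 3]
shared = small[0] & small[1]      # intersection: atoms of the fusion bond
envelope = small[0] | small[1]    # union: the four-membered envelope ring
```

Exhaustive path growth is fine for a toy graph but grows quickly with ring count; it is shown here only to make the "doubles back on itself" step concrete.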
LHASA
• Strategy and control module
• Heuristics applied: introduction of reactive functionalities, mechanistic disconnections, transforms that lead to disconnections
• Knowledge based rules are applied
• Requires a knowledge base of fundamental reactions
LHASA
• Modification module
• Subroutines operating on the connection table are applied to introduce the transforms necessary for generating the precursors
• Making and breaking of bonds, loss/addition of atoms, loss/addition of charge.
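As a rough sketch of how such subroutines might edit a connection table, the Python below breaks and makes bonds and adjusts charges on a copy of the table. The dictionary layout and the function name `apply_transform` are assumptions made for illustration, not LHASA's code.

```python
import copy

def apply_transform(table, break_bonds=(), make_bonds=(), charge_changes=None):
    """Return a new connection table with the requested edits applied.

    table = {"atoms": {index: {"element": str, "charge": int}},
             "bonds": {frozenset({i, j}): bond_order}}
    """
    new = copy.deepcopy(table)                 # leave the target untouched
    for i, j in break_bonds:
        del new["bonds"][frozenset((i, j))]
    for i, j, order in make_bonds:
        new["bonds"][frozenset((i, j))] = order
    for index, delta in (charge_changes or {}).items():
        new["atoms"][index]["charge"] += delta
    return new

# Toy target: C0-C1-O2 skeleton (hydrogens implicit)
target = {
    "atoms": {0: {"element": "C", "charge": 0},
              1: {"element": "C", "charge": 0},
              2: {"element": "O", "charge": 0}},
    "bonds": {frozenset((0, 1)): 1, frozenset((1, 2)): 1},
}

# Heterolytic disconnection of the C1-O2 bond into two charged fragments
precursor = apply_transform(target,
                            break_bonds=[(1, 2)],
                            charge_changes={1: +1, 2: -1})
```

Working on a copy keeps the target available, so several candidate precursors can be generated from the same starting structure.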
LHASA
• Evaluation Module
• Bulk of evaluation is done by the chemist
• Program evaluates valence violations, etc.
• Structural simplicity is evaluated (rings, appendages, etc.)
*ring system simplicity
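A valence check of the kind mentioned above can be sketched in a few lines of Python. The table of maximum valences and the function name `valence_violations` are assumptions for illustration; the sketch covers only neutral atoms with implicit hydrogens.

```python
# Usual maximum valences for neutral atoms (a simplifying assumption)
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1, "Br": 1}

def valence_violations(elements, bonds):
    """Return indices of atoms whose summed bond orders exceed their valence.

    elements: list of element symbols
    bonds: list of (i, j, order) tuples
    """
    used = [0] * len(elements)
    for i, j, order in bonds:
        used[i] += order
        used[j] += order
    return [i for i, el in enumerate(elements) if used[i] > MAX_VALENCE[el]]

# Formaldehyde skeleton C=O is fine
ok = valence_violations(["C", "O"], [(0, 1, 2)])

# A neutral oxygen with three single bonds is flagged
bad = valence_violations(["O", "C", "C", "C"], [(0, 1, 1), (0, 2, 1), (0, 3, 1)])
```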
LHASA
LHASA
DENDRAL (1965)
• Used a heuristic-based approach to determine molecular structure from spectral data
• Isomers, alcohols, and ketones were problematic
DENDRAL
• Heuristic approach
LHASA
• Problems with LHASA and DENDRAL
• Memory limitations
• Difficult to add new reactions
• Backtracking the rationale of solutions is difficult
• Doesn’t scale well
AI AND CHEMISTRY TODAY
• Synthesis planning
• Prediction of organic reaction outcomes
• Robot chemists
• Chemical property prediction
SYNTHESIS PLANNING
• Deep neural networks trained on fundamental organic chemistry retrosynthesis rules
• Trained networks are run in combination with a Monte Carlo tree search (MCTS) algorithm
• Selection
• Expansion
• Exploration
• Updating
• Route predictions reported to be indistinguishable from a human chemist's
SYNTHESIS PLANNING
• Training for fundamental rules of organic synthesis
• Neural networks can be trained to recognize and apply retrosynthetic fundamentals
• Use reaction records as a training basis
SYNTHESIS PLANNING
• Monte Carlo tree search algorithm
• Selection: the child node with the greatest probability of succeeding is selected
• Expansion: successor nodes of the previously selected node are added to the tree
• Exploration: reinforcement learning is used to make random decisions further down from the child nodes; each explored node is assigned a “reward” based on the proximity of its output to the desired solution
• Updating: parent nodes are updated based on the scores of their child nodes; once the “reward” values are updated, a pathway is selected through nodes that satisfy the query
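The four phases above can be made concrete with a minimal Monte Carlo tree search in Python. This is a toy sketch under loud assumptions: the "molecule" is just an integer that each move simplifies toward 1, the UCB1 formula is used for selection, and rollouts are purely random rather than guided by a trained network. All names (`Node`, `moves`, `search`, `best_route`) are hypothetical.

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.depth = 0 if parent is None else parent.depth + 1
        self.children = []
        self.visits = 0
        self.reward = 0.0

def moves(state):
    """Toy 'disconnections': every move makes the integer state simpler."""
    out = {state - 1}
    if state % 2 == 0:
        out.add(state // 2)
    if state % 3 == 0:
        out.add(state // 3)
    return sorted(s for s in out if s >= 1)

def ucb(child, parent_visits, c=1.4):
    if child.visits == 0:
        return float("inf")                      # try unvisited children first
    mean = child.reward / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)

def rollout(node, rng, max_steps=50):
    """Random playout; shorter total routes to state 1 earn a larger reward."""
    state, steps = node.state, 0
    while state != 1 and steps < max_steps:
        state = rng.choice(moves(state))
        steps += 1
    return 1.0 / (1 + node.depth + steps) if state == 1 else 0.0

def search(target, iterations=300, seed=0):
    rng = random.Random(seed)
    root = Node(target)
    for _ in range(iterations):
        node = root
        while node.children:                     # selection
            parent = node
            node = max(parent.children, key=lambda ch: ucb(ch, parent.visits))
        if node.visits > 0 and node.state != 1:  # expansion
            node.children = [Node(s, node) for s in moves(node.state)]
            node = node.children[0]
        reward = rollout(node, rng)              # exploration (random rollout)
        while node is not None:                  # updating
            node.visits += 1
            node.reward += reward
            node = node.parent
    return root

def best_route(root):
    """Follow the most-visited child at each level of the tree."""
    route, node = [root.state], root
    while node.children:
        node = max(node.children, key=lambda ch: ch.visits)
        route.append(node.state)
    return route

root = search(10)
print(best_route(root))
```

In a synthesis planner the integer states would be replaced by sets of precursor molecules, and the random choices by a learned policy over retrosynthetic transforms.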
MONTE CARLO SEARCH
PREDICTION OF REACTION OUTCOMES
• Determine reaction outcome based on reactants and conditions
• The term “reaction” is an abstraction
• Prediction is based on three approaches
• Physical laws
• Rule based expert systems
• Inductive machine learning
PREDICTION OF REACTION OUTCOMES
• Rule based expert systems
• Employs heuristics, graph rewrite patterns, and constraints
• Drawbacks
• Large knowledge base
• Not scalable
• Confined in its ability
PREDICTION OF REACTION OUTCOMES
• Physical laws
• Reactions are modeled as minimum energy paths between stable configurations on a high-dimensional potential energy surface, where saddle points represent transition states.
• The Schrödinger equation cannot be solved exactly for molecules of practical size, so approximations are required
PREDICTION OF REACTION OUTCOMES
• Mechanistic
• Easier to predict
• *preferred representation
PREDICTION OF REACTION OUTCOMES
• A novel approach
• Incorporates idealized graph-based molecular orbitals (MOs)
• Trained on “productive” reactions
• Constructive MO interactions (electron-filled with unfilled MOs) are statistically ranked
PREDICTION OF REACTION OUTCOMES
• Reaction prediction model
• Requires training set of reactions
• Mechanistic construction of MOs
• Ranking of productive mechanisms
PREDICTION OF REACTION OUTCOMES
• Construction of the molecular orbitals
• For a molecule m, a connection graph G_m is generated
• The vertices A_m represent labeled atoms and the edges B_m represent bonds
• Quadruples describing the filled and unfilled orbitals are generated
• Each atom can have multiple MO designations
m = G_m(A_m, B_m)
f := (a, t_f, n_f, c_f)
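A rough Python sketch of these structures is shown below. The slide does not define the components of the quadruple, so reading t as an orbital type, n as a neighboring atom, and c as a chain of further orbitals is an assumption, as are the class names and the very simplified enumeration rules (lone pairs on N and O, pi and pi* on multiple bonds only).

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Orbital:
    atom: int                          # a: index of the main atom
    kind: str                          # t: orbital type label (assumed meaning)
    neighbor: Optional[int]            # n: bonded neighbor atom, if any (assumed)
    chain: Tuple["Orbital", ...] = ()  # c: chained orbitals (assumed meaning)
    filled: bool = True                # filled (source) or unfilled (sink)

@dataclass
class Molecule:
    atoms: dict   # A_m: atom index -> element label
    bonds: dict   # B_m: frozenset({i, j}) -> bond order

LONE_PAIR_ELEMENTS = {"N", "O"}        # simplification for this sketch

def enumerate_orbitals(mol):
    """List idealized filled and unfilled orbitals for each atom."""
    orbitals = []
    for a, element in mol.atoms.items():
        if element in LONE_PAIR_ELEMENTS:
            orbitals.append(Orbital(a, "lone_pair", None, filled=True))
    for pair, order in mol.bonds.items():
        i, j = sorted(pair)
        if order >= 2:
            for a, n in ((i, j), (j, i)):
                orbitals.append(Orbital(a, "pi", n, filled=True))
                orbitals.append(Orbital(a, "pi*", n, filled=False))
    return orbitals

# Formaldehyde skeleton: C=O
formaldehyde = Molecule(atoms={0: "C", 1: "O"}, bonds={frozenset((0, 1)): 2})
orbitals = enumerate_orbitals(formaldehyde)
on_oxygen = [f for f in orbitals if f.atom == 1]
```

Here the oxygen atom carries three designations (a lone pair, a filled pi orbital, and an unfilled pi* orbital), which illustrates how one atom can have several MO entries.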
PREDICTION OF REACTION OUTCOMES
• Trained neural network: Reaction site filtering
• The Reaction Explorer system provided the training data
• Reactivity l of each (a, c) tuple is assessed with the learned model, where l = 1 (reactive) or l = 0 (unreactive)
• Trained neural networks using sigmoidal activation functions in a single hidden layer and a single output node
• Gradients on the weights of the neural network are calculated with standard back-propagation
• Provides a probabilistic prediction of an (a, c) tuple being labeled reactive
• Determines the possible reactions based on electron sources and sinks
• Possibilities involving unfilled unreactive MOs are disregarded
• Orbital interaction computed
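The network shape described above (one sigmoidal hidden layer, one sigmoidal output node, weights fitted by back-propagation) can be sketched in plain Python. The two-number feature vectors and the labels below are made-up toy data, not Reaction Explorer output, and the class name `ReactivityNet` is hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class ReactivityNet:
    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        p = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, p

    def predict(self, x):
        """Probability that the site described by x is labeled reactive."""
        return self.forward(x)[1]

    def train_step(self, x, label, lr=0.5):
        h, p = self.forward(x)
        d_out = p - label                  # cross-entropy gradient at the output
        for j, hj in enumerate(h):
            d_hidden = d_out * self.w2[j] * hj * (1.0 - hj)  # back-propagated
            self.w2[j] -= lr * d_out * hj
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * d_hidden * xi
            self.b1[j] -= lr * d_hidden
        self.b2 -= lr * d_out

def mean_loss(net, data):
    total = 0.0
    for x, label in data:
        p = min(max(net.predict(x), 1e-12), 1.0 - 1e-12)
        total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return total / len(data)

# Toy data: (features of an (a, c) tuple, label l)
data = [([0.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 0.0], 1), ([1.0, 1.0], 1)]
net = ReactivityNet(n_in=2, n_hidden=3)
loss_before = mean_loss(net, data)
for _ in range(500):
    for x, label in data:
        net.train_step(x, label)
loss_after = mean_loss(net, data)
```

Sites whose predicted probability falls below a chosen threshold would be filtered out before orbital interactions are enumerated.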
PREDICTION OF REACTION OUTCOMES
• Orbital interaction is computed
PREDICTION OF REACTION OUTCOMES
• Orbital interaction ranking
• Trained on ordered pairs of productive and unproductive orbital interactions
• Uses a pair of shared-weight artificial neural networks, each with a single sigmoidal hidden layer and a linear output node
• The two outputs feed a sigmoidal output layer with fixed weights of +1 and -1
• Yields a ranked set of rational reaction outcomes based on reactants and conditions
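The pairwise ranking setup above can be sketched as follows: one scoring network is applied to both members of a pair (shared weights), and the two linear scores are combined with fixed weights +1 and -1 before a sigmoid. The feature vectors are invented toy values and the class name `PairwiseRanker` is hypothetical; this is a sketch of the general technique, not the authors' implementation.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class PairwiseRanker:
    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]

    def hidden(self, x):
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def score(self, x):
        """Linear output node on top of the sigmoidal hidden layer."""
        return sum(w * hj for w, hj in zip(self.w2, self.hidden(x)))

    def prefer(self, a, b):
        """Probability that a outranks b: sigmoid(+1*score(a) - 1*score(b))."""
        return sigmoid(self.score(a) - self.score(b))

    def train_pair(self, good, bad, lr=0.5):
        p = self.prefer(good, bad)
        g_w1 = [[0.0] * len(good) for _ in self.w2]
        g_b1 = [0.0] * len(self.w2)
        g_w2 = [0.0] * len(self.w2)
        # Gradient of -log(p) with respect to each score, through shared weights
        for x, d_score in ((good, p - 1.0), (bad, 1.0 - p)):
            for j, hj in enumerate(self.hidden(x)):
                d_hidden = d_score * self.w2[j] * hj * (1.0 - hj)
                g_w2[j] += d_score * hj
                g_b1[j] += d_hidden
                for i, xi in enumerate(x):
                    g_w1[j][i] += d_hidden * xi
        for j in range(len(self.w2)):
            self.w2[j] -= lr * g_w2[j]
            self.b1[j] -= lr * g_b1[j]
            for i in range(len(good)):
                self.w1[j][i] -= lr * g_w1[j][i]

# Toy ordered pairs: (productive interaction, unproductive interaction)
pairs = [([1.0, 0.0], [0.0, 1.0]), ([0.9, 0.2], [0.1, 0.8])]
ranker = PairwiseRanker(n_in=2, n_hidden=3)
before = ranker.prefer(*pairs[0])
for _ in range(200):
    for good, bad in pairs:
        ranker.train_pair(good, bad)
after = ranker.prefer(*pairs[0])

candidates = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
ranked = sorted(candidates, key=ranker.score, reverse=True)
```

Because both inputs pass through the same weights, the trained scorer can rank any number of candidate interactions by sorting on `score`, not just the pairs it was trained on.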
PREDICTION OF REACTION OUTCOMES
• Conclusion
• Huge amounts of data exist for training
• Scalable
• Accurate
OTHER AI APPLICATIONS IN CHEMISTRY
• Robot chemists
• Property prediction
• Electron density prediction
• A lot more...
FUTURE OF AI IN CHEM
• AI will not replace chemists but will be a tool added to the chemist's toolkit
• Automated reactions
REFERENCES
• Yanaka, M.; Nakamura, K.; Kurumisawa, A.; Wipke, W. T. Automatic Knowledge Base Building for the Organic Synthesis Design Program (SECS). Tetrahedron Computer Methodology 1990, 3 (6), 359–375.
• Proceedings of the 2019 Workshop on Network Meets AI & ML - NetAI19. 2019.
• Wipke, W.; Ouchi, G. I.; Krishnan, S. Simulation and Evaluation of Chemical Synthesis—SECS: An Application of Artificial Intelligence Techniques. Artificial Intelligence 1978, 11(1-2), 173–193.
• Peiretti, F.; Brunel, J. M. Artificial Intelligence: The Future for Organic Chemistry? ACS Omega 2018, 3 (10), 13263–13266.
• Kayala, M. A.; Azencott, C.-A.; Chen, J. H.; Baldi, P. Learning to Predict Chemical Reactions. Journal of Chemical Information and Modeling 2011, 51 (9), 2209–2222.
QUESTIONS