Introduction to Chemoinformatics

Dr. Igor V. Tetko

Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH) Institute of Bioinformatics & Systems Biology (HMGU)

Kyiv, 13 August 2010, V Summer School AACIMP 2010

Helmholtz Zentrum München statistics

•  Part of Helmholtz Network (€3 billion, 30,000 people)

•  Leading center for Environmental Health in Germany

•  25 institutes (1789 people, 569 scientists & 280 PhD students)*

•  70 contracts with EU

•  Disciplines

•  Biology 40%

•  Chemistry/biochemistry 15%

•  Physics 10%

•  Medicine 8%

•  Chemoinformatics group, Institute for Bioinformatics & Systems Biology

  9 people, strong expertise in in silico data analysis, machine learning methods, chemoinformatics software development, data dissemination

*January 2010

Layout

•  Chemoinformatics??? What does it mean?

•  Role of chemoinformatics

•  in drug discovery

•  Chemical Biology, NIH Roadmaps

•  REACH, chemical safety

•  Definitions of chemoinformatics

•  OCHEM

•  Overview of the demonstration course

Cheminformatics & Bioinformatics keywords in SCOPUS

[Figure: SCOPUS keyword counts, chemoinformatics/cheminformatics vs. bioinformatics, up to 2009]

about 10x lower if journals and references are excluded

There was no dedicated journal, and many authors do not explicitly use the word chemoinformatics

Dmitrii Ivanovich Mendeleev, 1834-1907

Discoverer of the Periodic Table — An Early “Chemoinformatician”

Why Mendeleev?

Faced with a large amount of data, with many gaps, Mendeleev:

•  Sought patterns where none were obvious,

•  Made predictions about properties of unknown chemical substances, based on observed properties of known substances,

•  Created a great visualization tool!

Why him? Why at that time?

“The Law of transformation of Quantity into Quality” by Karl Marx

Quantity aspects of chemoinformatics

Major challenges of Chemoinformatics

Millions of structures, thousands of publications

•  Storage, organization and search of information

•  QSAR/QSPR studies: prediction of properties and activities

QSAR/QSPR: Quantitative Structure-Activity (Property) Relationship studies

•  Chemical biology & drug discovery

•  In silico environmental toxicology, REACH

Goal of chemistry: Synthesis of Properties

The most fundamental and lasting objective of synthesis is

not production of new compounds but

production of properties

George S. Hammond, Norris Award Lecture, 1968

Where is chemoinformatics required?

Pharma companies and public organisations

•  data collection and handling

•  model development QSAR/QSPR, in silico design

•  In vitro, in vivo data analysis and interpretation

•  > 140,000 compounds

•  Development of in silico methods to predict toxicity of chemical compounds

Chemical industry

•  Design of new properties; chemical synthesis

•  Prediction of toxicity of chemicals BEFORE they enter market

Stages: Target Discovery, Lead Discovery, Lead Optimisation (ADME/T), Lead Profiling (ADME/T), Preclinics, Clinics, Approval

Compound numbers: 1,000,000 Cpds, < 5000 Cpds, < 500 Cpds, < 5 Cpds, 1 drug

Up to 15 years

In vitro and in vivo property determination: millions of screens for solubility, stability, absorption, metabolism, transport, reactive products, drug interactions, etc.

Costs: > $300m PER COMPOUND to reach approval

Pharma R&D: Cost and Productivity issues Compound numbers

Pharma: Declining R&D productivity in the drug discovery industry

Approved medicine

See http://www.frost.com/prod/servlet/market-insight-top.pag?docid=128394740

Pharma R&D Cost and productivity: Reasons for compound failure

The top four reasons are connected to compound Absorption, Distribution, Metabolism and Excretion (all of which may contribute to lack of efficacy) and Toxicity: ADME/T issues

Experimental measurements or computational predictions?

Pharmacokinetics

Animal toxicity

Adverse effects

Lack of efficacy

Commercial reasons

Miscellaneous

Chemoinformatics!

Public organizations: NIH Roadmaps

Dr. Elias Zerhouni, former NIH director (2002-2008)

$29.5 billion in 2008 for 27 Institutes

Ukraine GDP is ca $127 billion; Belarus GDP is ca $48 billion

Chemical Biology

Public organizations: NIH Roadmaps

New pathways to discovery

•  understand complex biological systems

•  understand their connections

•  build better "toolbox" for researchers

Research teams of the future

Re-engineering the clinical research team

Molecular Libraries Screening Center Network (MLSCN)

•  10 centers were created

•  250 assays to screen >200,000 molecules

Chemoinformatics!

•  PubChem database to handle data

Registration, Evaluation, Authorisation and Restriction of Chemical substances

European Chemicals Agency (ECHA) in Helsinki

REACH and Environmental Chemoinformatics

> 140,000 chemicals to be registered

It is expensive to measure all of them ($200,000 per compound) and requires a lot of animal testing => €24 billion over the next 10 years

QSAR models can be used to prioritize compounds

•  Compound is predicted to be toxic

•  Biological testing will be done to prove/disprove the models

•  Compound is predicted to be not toxic

•  tests can be avoided, saving money, animals

•  but ... only if we are confident in the predictions
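
The prioritization rule in the bullets above can be sketched as a small decision function. This is only an illustration: the function name, the record fields and the boolean "confident" flag are assumptions made for the example, not part of REACH or of any specific QSAR tool.

```python
from typing import Dict, List


def needs_testing(predicted_toxic: bool, confident: bool) -> bool:
    """Decide whether a compound still goes to biological testing.

    - predicted toxic: test, to prove or disprove the model
    - predicted non-toxic: the test can be skipped, but only if the
      prediction is considered reliable
    """
    if predicted_toxic:
        return True
    return not confident


def prioritize(compounds: List[Dict]) -> List[str]:
    """Return the names of the compounds that should be tested."""
    return [
        c["name"]
        for c in compounds
        if needs_testing(c["predicted_toxic"], c["confident"])
    ]


# Hypothetical compounds, used only to exercise the rule.
example = [
    {"name": "cpd-1", "predicted_toxic": True, "confident": True},
    {"name": "cpd-2", "predicted_toxic": False, "confident": True},
    {"name": "cpd-3", "predicted_toxic": False, "confident": False},
]
to_test = prioritize(example)
print(to_test)
```

Only the compound that is predicted non-toxic with a reliable prediction is skipped, which is why the confidence estimate matters as much as the prediction itself.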

Kozma Prutkov

"One cannot embrace the unembraceable."

Possible: 10^60 - 10^100 molecules theoretically exist

Achievable: 10^20 - 10^24 can be synthesized now by companies (weight of the Moon is ca 10^23 kg)

Available: 2*10^7 molecules are on the market

Measured: 10^2 - 10^5 molecules with ADME/T data

Problem: To predict ADME/T properties of just molecules on the market we must extrapolate data from one to 1,000 - 100,000 molecules!

10^80 atoms in the Universe

Applicability domain: Key points

There are NO universal computational models that work well on the whole chemical space

If x is small, sin(x) ≈ x
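
The sin(x) ≈ x analogy can be checked numerically. The sketch below is only an illustration of the analogy (the function name and the sample points are chosen for this example): the approximation is a good "model" near zero and degrades as x leaves that region, just as a QSAR model degrades outside its applicability domain.

```python
import math


def relative_error(x: float) -> float:
    """Relative error of the approximation sin(x) ≈ x (x in radians)."""
    return abs(x - math.sin(x)) / abs(math.sin(x))


# Sample points moving away from the region where the approximation holds.
for x in (0.01, 0.1, 0.5, 1.0, 2.0):
    print(f"x = {x:4.2f}  sin(x) = {math.sin(x):.4f}  "
          f"relative error = {relative_error(x):.2%}")
```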

[Figure panels: performance on the test set; performance on a dissimilar set]

Tetko et al, Chem. & Biodiver., 2009, 6(11), 197-202.

Making predictions with the experimental accuracy: case study of 96,000 Pfizer molecules

[Figure: predictions split into non-reliable and reliable]

60% of molecules are predicted with average accuracy of 0.3 log units

Definitions of Chemoinformatics

Chemoinformatics is a generic term that encompasses the design, creation, organization, management, retrieval, analysis, dissemination, visualization, and use of chemical information. G. Paris, 1988

Chemoinformatics is the mixing of those information resources to transform data into information and information into knowledge for the intended purpose of making better decisions faster in the area of drug lead identification and optimization. F.K. Brown, 1998

Chemoinformatics is the application of informatics methods to solve chemical problems. J. Gasteiger, 2004

Chemoinformatics is a field dealing with molecular objects (graphs, vectors) in multidimensional chemical space. A. Varnek & A. Tropsha, 2007.

Chemoinformatics: data to knowledge

What is required to create a good (chemoinformatics) model?

•  Data is the most essential part!

•  Appropriate representation of molecules (choice of descriptors is crucial)

•  Good statistical methods
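
The three ingredients above can be illustrated with a deliberately tiny QSPR sketch. Everything here is an assumption made for illustration: the data are approximate literature boiling points of n-alkanes, the only descriptor is the carbon count read from a SMILES string, and the statistical method is ordinary least squares. Real models use far richer descriptors and methods.

```python
# 1. Data: approximate boiling points (deg C) of n-alkanes, keyed by SMILES.
training = {"C": -161.5, "CC": -88.6, "CCC": -42.1, "CCCC": -0.5, "CCCCC": 36.1}


# 2. Representation: one crude descriptor, the number of carbon atoms.
#    (Counting the character "C" is only valid for these plain alkane SMILES.)
def carbon_count(smiles: str) -> int:
    return smiles.count("C")


# 3. Statistical method: ordinary least squares for a straight line.
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


xs = [carbon_count(s) for s in training]
ys = list(training.values())
slope, intercept = fit_line(xs, ys)


def predict(smiles: str) -> float:
    return slope * carbon_count(smiles) + intercept


# Hexane lies outside the training range; its measured value is about 68.7.
print(f"predicted bp of hexane: {predict('CCCCCC'):.1f} deg C (measured: ca 68.7)")
```

The straight-line fit overshoots for hexane because the true trend is curved, a small reminder that the choice of descriptors and method limits how far a model can be extrapolated.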

>100,000 lines of code in Java

Database schema: simplified overview

Data Point

•  Property: logP = 0.5, Melting Point = 100

•  Condition: temperature, pH, species, tissue, method

•  Article: Garberg, P. "In vitro models for …"

•  Structure: Benzene, Urea, ...

•  Introducer: Bill G., Sergey B.

•  Date of modification

•  Tags: Toxicology, Biology, Partition coefficient

Information system

•  Filtering

•  Manipulation: editing

•  Organization: working sets
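
The data point at the centre of the schema overview could be modelled roughly as the record below. This is a hypothetical sketch in Python for illustration only: the class name, field names and the filter helper are assumptions, not the actual OCHEM schema (which the slide says is implemented in Java).

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class DataPoint:
    """One measured value, linked to its structure, source and context."""
    property_name: str                 # e.g. "logP", "Melting Point"
    value: float
    structure: str                     # molecule name or SMILES
    article: str                       # literature source of the measurement
    conditions: Dict[str, str] = field(default_factory=dict)  # temperature, pH, ...
    tags: List[str] = field(default_factory=list)
    introducer: str = ""               # user who entered the record
    modified: Optional[date] = None    # date of last modification


def filter_by_tag(points: List[DataPoint], tag: str) -> List[DataPoint]:
    """Filtering: keep only the data points that carry the given tag."""
    return [p for p in points if tag in p.tags]


# Placeholder records; the values are not real measurements.
points = [
    DataPoint("logP", 0.5, "molecule-1", "example article A",
              conditions={"pH": "7.4"}, tags=["Partition coefficient"]),
    DataPoint("Melting Point", 100.0, "molecule-2", "example article B",
              tags=["Toxicology", "Biology"]),
]
working_set = filter_by_tag(points, "Toxicology")
print([p.structure for p in working_set])
```

Keeping conditions and the source article on every record is what allows later filtering and organization into working sets without losing the experimental context.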

Practical course on using OCHEM and data modeling on-line

Tomorrow 10:00-12:00 auditorium 235-6

Advice: familiarize yourself with the on-line tutorials available at http://ochem/tutorials

If you have data (a SMILES/SDF file and molecular activities), bring it with you: we can try to build a model.

HMGU Team

Interested in Short-Term stays (3-12 months)? Apply!!!

http://www.eco-itn.eu
