Interactive Datamining of Large-Scale Screening Datasets

© Oellien, Ihlenfeldt, Engel, Ertl, C3 MMWS 2002


Page 1: Interactive Datamining of Large-Scale Screening Datasets


Interactive Datamining of Large-Scale Screening Datasets

Klaus Engel, Thomas Ertl, Visualization and Interactive Systems Group, University of Stuttgart

Frank Oellien, Wolf D. Ihlenfeldt, Computer-Chemie-Centrum, University of Erlangen-Nuremberg

Page 2: Interactive Datamining of Large-Scale Screening Datasets


Overview

Multi-variate and multi-dimensional datasets

• Motivation

• Information Visualization Techniques

• Examples (ChemCodes Inc., NCI)

• Demo

Page 3: Interactive Datamining of Large-Scale Screening Datasets

Overview

Page 4: Interactive Datamining of Large-Scale Screening Datasets


Chemical data

[Chart: "Current datasets", sizes on a scale from 0 to 18,000,000 for Merck Katalog, Synopsys PG, ACX, NCI DTP, ChemInform, Spresi, Beilstein, and CAS]

Page 5: Interactive Datamining of Large-Scale Screening Datasets


Multi-Variate and Multi-Dimensional Numeric Datasets Today

Change in chemical synthesis technology

• new technologies (HTS, combinatorial synthesis): experiments generate terabytes of data per year

• development of data mining and visualization tools could not keep pace

• the most critical bottleneck in R&D today!

tools for interactive mining and information visualization are needed

Page 6: Interactive Datamining of Large-Scale Screening Datasets


Tools for Interactive Visualization of Multi-Variate and Multi-Dimensional Data

Standard applications
• bar charts, 2D and pseudo-3D scatter plots, molecular spreadsheets
• limited to small subsets
• platform-dependent

Our goal: applications that
• are simple to use
• allow straightforward interpretation of results
• provide generalized access to tabular numeric data
• are platform-independent

Page 7: Interactive Datamining of Large-Scale Screening Datasets

Overview

Page 8: Interactive Datamining of Large-Scale Screening Datasets


3D Tools for Interactive Information Visualization

Information visualization applications that use the 3D capabilities of modern clients

• Glyph-based InfVis approaches

• Volume-based InfVis approaches

Page 9: Interactive Datamining of Large-Scale Screening Datasets


Glyph-based InfVis Tools

• 3 orthogonal axes

• color

• shape

• size

• transparency

• surface effects

• animation

• up to ~100 Glyphs
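The slide lists the visual channels a glyph offers but not how data columns are assigned to them. As a minimal sketch, assuming each channel displays one numeric column scaled linearly to [0, 1], the mapping could look like this in Python. The names `Glyph`, `normalize`, `blue_to_red`, and `build_glyphs` are hypothetical and not taken from the authors' Java/Java3D applet; shape, surface effects, and animation are left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Glyph:
    x: float
    y: float
    z: float
    color: Tuple[float, float, float]  # RGB components in [0, 1]
    size: float
    transparency: float  # 0 = opaque, 1 = fully transparent


def normalize(values: List[float]) -> List[float]:
    """Linearly rescale a column to [0, 1]; a constant column maps to 0.5."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def blue_to_red(t: float) -> Tuple[float, float, float]:
    """Simple color ramp: t = 0 gives blue, t = 1 gives red."""
    return (t, 0.0, 1.0 - t)


def build_glyphs(rows: List[Dict[str, float]], mapping: Dict[str, str]) -> List[Glyph]:
    """Map six data columns onto six visual channels of a glyph (hypothetical helper).

    `mapping` must contain the keys x, y, z, color, size and transparency;
    each value names the column shown through that channel.
    """
    scaled = {channel: normalize([row[column] for row in rows])
              for channel, column in mapping.items()}
    return [
        Glyph(
            x=scaled["x"][i],
            y=scaled["y"][i],
            z=scaled["z"][i],
            color=blue_to_red(scaled["color"][i]),
            size=0.2 + 0.8 * scaled["size"][i],  # minimum size keeps every glyph visible
            transparency=scaled["transparency"][i],
        )
        for i in range(len(rows))
    ]
```

A call such as `build_glyphs(rows, {"x": "mw", "y": "logp", "z": "psa", "color": "activity", "size": "hbd", "transparency": "rotb"})` would place compounds by three descriptors and encode three more in color, size, and transparency; the column names here are illustrative only. Because every glyph is its own geometric object, this approach stays interactive only for modest record counts.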

Page 10: Interactive Datamining of Large-Scale Screening Datasets


Java/Java3D InfVis Applet

Tool Panel (filters, selection tools, details)

Java3D Canvas

Control Panel

Page 11: Interactive Datamining of Large-Scale Screening Datasets


Java/Java3D InfVis Applet: 3D Render Panel

3D Barchart

3D Glyphs

Page 12: Interactive Datamining of Large-Scale Screening Datasets


Java/Java3D InfVis Applet: 3D Tool Panel

Dynamic Filter Tools

Selection Tools

Detail Tools
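The slide names dynamic filter, selection, and detail tools without describing how they work. A common way to build dynamic filters is a dynamic query: one range slider per column, with the visible set recomputed on every slider move. The sketch below assumes that design; `DynamicFilter` and its methods are hypothetical names, and selection tools are not modeled.

```python
from typing import Dict, List, Tuple

Row = Dict[str, float]


class DynamicFilter:
    """Range-slider style dynamic query over a table of numeric records.

    Each active filter is a closed interval [low, high] on one column; a record
    stays visible only if it lies inside every active interval.
    """

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows
        self.ranges: Dict[str, Tuple[float, float]] = {}

    def set_range(self, column: str, low: float, high: float) -> List[int]:
        """Move one slider and return the new visible set, as a GUI would on every drag."""
        self.ranges[column] = (low, high)
        return self.visible()

    def clear(self, column: str) -> List[int]:
        """Deactivate the filter on one column."""
        self.ranges.pop(column, None)
        return self.visible()

    def visible(self) -> List[int]:
        """Indices of the records that pass all active filters (all records if none are set)."""
        return [
            i for i, row in enumerate(self.rows)
            if all(low <= row[col] <= high for col, (low, high) in self.ranges.items())
        ]

    def details(self, index: int) -> Row:
        """Detail tool: a copy of the full record behind a picked glyph."""
        return dict(self.rows[index])
```

Filters combine conjunctively, so narrowing a second slider can only shrink the visible set, and clearing one restores the records it alone had hidden.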

Page 13: Interactive Datamining of Large-Scale Screening Datasets


Java/Java3D InfVis Applet: 3D Control Panel

Page 14: Interactive Datamining of Large-Scale Screening Datasets


Advantages of Volume-based InfVis Tools

Databases with millions of data points:

– Glyph-based InfVis approaches
  • produce millions of geometric primitives
  • interactive visualization not possible

– Volume-based InfVis approaches
  • can handle a large number of data points
  • interactive visualization using low-cost graphics hardware is possible
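Volume-based approaches scale because, once the points are accumulated into a fixed voxel grid, the renderer draws n³ voxels no matter how many data points were binned. A minimal sketch, assuming a plain count histogram over a regular grid and a linear density-to-opacity transfer function; the slide does not say how the authors build their volumes, and `bin_to_volume` and `density_to_opacity` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Bounds = Tuple[Tuple[float, float], Tuple[float, float], Tuple[float, float]]


def bin_to_volume(points: Sequence[Tuple[float, float, float]],
                  bounds: Bounds, n: int) -> List[int]:
    """Count points per voxel of an n x n x n grid.

    The grid is a flat list indexed by (i * n + j) * n + k. Points outside the
    bounds are ignored; a value equal to the upper bound falls into the last cell.
    """
    volume = [0] * (n * n * n)
    for point in points:
        cell = []
        for value, (lo, hi) in zip(point, bounds):
            if not lo <= value <= hi:
                break  # outside the volume, skip this point
            cell.append(min(int((value - lo) / (hi - lo) * n), n - 1))
        else:  # runs only if no coordinate was out of bounds
            i, j, k = cell
            volume[(i * n + j) * n + k] += 1
    return volume


def density_to_opacity(volume: List[int]) -> List[float]:
    """Linear transfer function: densest voxel fully opaque, empty voxels transparent."""
    peak = max(volume)
    return [count / peak if peak else 0.0 for count in volume]
```

The opacity grid would then be uploaded as a 3D texture for hardware-accelerated volume rendering (not shown); millions of records reduce to, say, 64³ voxels, which is what keeps the view interactive on commodity graphics cards.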

Page 15: Interactive Datamining of Large-Scale Screening Datasets

Overview

Page 16: Interactive Datamining of Large-Scale Screening Datasets


ChemCodes Reaction Database

• 100 most important functional groups (FGs), ~75% of chemistry
• 100 standard reactions
• Limits of standard reactions
• Functional Group Compatibility
• Generating Rules

Goal: Analysis of the reaction space

Page 17: Interactive Datamining of Large-Scale Screening Datasets


ChemCodes - Reaction Optimization I

• Goal: reaction optimization, > 95% yield

• 7 dimensions: reagent, solvent, time, temperature, stoichiometry, reagent order, FG compatibility
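Framed as a query over the seven-dimensional experiment table, the optimization goal becomes: which parameter values most often lead to yields above 95%? The sketch below assumes one record per experiment with a numeric `yield` field; the key names and the functions `high_yield`, `enrichment`, and `profile` are hypothetical and not part of any ChemCodes software.

```python
from collections import Counter
from typing import Dict, List, Union

Value = Union[str, float]
Experiment = Dict[str, Value]

# The seven dimensions listed on the slide (key names are hypothetical).
DIMENSIONS = ["reagent", "solvent", "time", "temperature",
              "stoichiometry", "reagent_order", "fg_compatibility"]


def high_yield(experiments: List[Experiment], threshold: float = 95.0) -> List[Experiment]:
    """Keep only the experiments whose yield exceeds the optimization goal."""
    return [e for e in experiments if float(e["yield"]) > threshold]


def enrichment(experiments: List[Experiment], dimension: str,
               threshold: float = 95.0) -> Dict[Value, float]:
    """For each value of one dimension, the fraction of its experiments above the threshold."""
    total = Counter(e[dimension] for e in experiments)
    hits = Counter(e[dimension] for e in high_yield(experiments, threshold))
    return {value: hits[value] / count for value, count in total.items()}


def profile(experiments: List[Experiment], threshold: float = 95.0) -> Dict[str, Dict[Value, float]]:
    """Enrichment for every one of the seven dimensions present in all records."""
    return {d: enrichment(experiments, d, threshold) for d in DIMENSIONS
            if all(d in e for e in experiments)}
```

For example, `enrichment(experiments, "solvent")` returns, for each solvent, the share of its runs that reached the 95% goal; a value of 1.0 marks a setting that never failed in the recorded runs.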

Page 18: Interactive Datamining of Large-Scale Screening Datasets


ChemCodes - Reaction Optimization II

Page 19: Interactive Datamining of Large-Scale Screening Datasets


ChemCodes - Reaction Planning

Functional Group Compatibility Check


Page 20: Interactive Datamining of Large-Scale Screening Datasets


Example 2: NCI Anti-tumor / Anti-viral Database

• Initiated in April 1990 (modified 1994)
• ~250,000 compounds
• ~30,000 with anti-tumor screening data

Enhanced NCI Database Browser
• > 30 different molecular properties
• up to 23 3D conformers per compound

Page 21: Interactive Datamining of Large-Scale Screening Datasets


Lead Compound Discovery I

Page 22: Interactive Datamining of Large-Scale Screening Datasets


Lead Compound Discovery II

Page 23: Interactive Datamining of Large-Scale Screening Datasets

Overview

Page 24: Interactive Datamining of Large-Scale Screening Datasets


Acknowledgment

• Prof. Johann Gasteiger, Computer-Chemie-Centrum, University of Erlangen-Nuremberg

• Prof. Thomas Ertl, Dipl.-Inf. Klaus Engel, Visualization and Interactive Systems, University of Stuttgart

• Dr. Patrick Kiser, Dr. Gary Eichenbaum, ChemCodes Inc.

• Marc Nicklaus, Laboratory of Medicinal Chemistry, NCI, NIH

• Deutsche Forschungsgemeinschaft