Primetime for KNIME · 2017-05-23 · Primetime for KNIME: Towards an Integrated Analysis and...

Primetime for KNIME: Towards an Integrated Analysis and Visualization Environment for RNAi Screening Data. F. Oliver Gathmann, Ph.D., Director IT, Cenix BioScience. Presentation for: KNIME User Group Meeting 2011, Zürich, March 3rd 2011


Primetime for KNIME:

Towards an Integrated Analysis and Visualization Environment for RNAi Screening Data

F. Oliver Gathmann, Ph.D.
Director IT, Cenix BioScience

Presentation for: KNIME User Group Meeting 2011

Zürich, March 3rd 2011


Overview

• Explain “RNAi Screening”

• IT infrastructure for HT-HCS (High-Throughput, High-Content Screening)

• Workflow software evolution at Cenix: past, present, and future


How RNAi works

[Diagram: siRNA → unwinding of siRNA → RISC → target mRNA recognition → degradation of target mRNA]

Explain “RNAi Screening”

• First Take Home Message: RNAi allows you to investigate the function of genes by knocking them down selectively


The Drug Discovery Pipeline

[Diagram: Target Discovery → Target Validation (in vitro) → Target Validation (in vivo) → Lead Identification → Lead Optimization → ADME/Tox → Clinical Phase I → Clinical Phase II → Clinical Phase III → Registration]

• Target Discovery (in vitro): Direct LoF Screens, Modifier Screens

• Target Validation (in vitro): Phenotypic Profiling, Phenotypic Titration


• Second Take Home Message: “Early In The Drug Discovery Pipeline” means high-throughput and lots of data


Information Layers

• Gene: Sequence, species, pathway annotations, transcripts

• Silencing Reagent: Structure, targeted gene(s), stock and order information

• Experiment: Metadata (sample and control positions), production data

• Phenotype: Cell images, morphology data

• Hit: Phenotype annotations, knockdown, reproducibility, significance

• Metabolic Pathway: Gene network, disease conditions

• Last Take Home Message: “High-Content” means complex data structures
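To make the "complex data structures" point concrete, here is a minimal Python sketch of how some of the information layers might be linked to one another. All class names, field names, and sample values are illustrative assumptions; the slides do not describe the actual Cenix data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Gene:
    # Hypothetical fields for the "Gene" layer.
    symbol: str
    species: str
    pathways: List[str] = field(default_factory=list)


@dataclass
class SilencingReagent:
    # A reagent may target one or more genes.
    reagent_id: str
    targets: List[Gene]


@dataclass
class WellMeasurement:
    # "Experiment" and "Phenotype" layers collapsed into one record.
    plate: str
    well: str
    reagent: SilencingReagent
    is_control: bool
    morphology: Dict[str, float]


def targeted_symbols(measurement: WellMeasurement) -> List[str]:
    """Walk from a measurement through its reagent to the targeted genes."""
    return [gene.symbol for gene in measurement.reagent.targets]


# Illustrative values only.
gene = Gene(symbol="GENE1", species="Homo sapiens", pathways=["pathway A"])
reagent = SilencingReagent(reagent_id="R-0001", targets=[gene])
measurement = WellMeasurement(
    plate="P1", well="B3", reagent=reagent,
    is_control=False, morphology={"cell_count": 512.0},
)
```

Even this reduced sketch needs two hops to answer "which gene does this well belong to?", which hints at why flat tables alone are awkward for high-content data.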


IT Infrastructure for HT-HCS at Cenix

[Diagram components: Database Server, File Server, Farm, Automated Microscope, Pipetting Robot, Tube Handler, LDAP Server, LIMS Server, Scientist Workstations]


Terminology: “Workflows”

• Process-centric: mapping a work process in the physical world; focused on data acquisition

• Data-centric: mapping an algorithm; focused on data processing

• Not always clear-cut, but still a useful distinction

Workflow software evolution

“Process-centric Workflows” vs. “Data-centric Workflows”
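The distinction can be illustrated with a small Python sketch. The state names and transformation steps below are hypothetical and are not taken from the Cenix systems: the process-centric part tracks where an experiment stands in a physical work process, while the data-centric part chains transformations over values.

```python
from functools import reduce

# Process-centric: which step of the physical work process has been reached?
# The states and allowed transitions are illustrative assumptions.
ALLOWED = {
    "designed": {"submitted"},
    "submitted": {"imaged"},
    "imaged": {"processed"},
    "processed": set(),
}


def advance(state, new_state):
    """Move an experiment to the next state, rejecting illegal jumps."""
    if new_state not in ALLOWED[state]:
        raise ValueError(f"cannot go from {state!r} to {new_state!r}")
    return new_state


# Data-centric: an algorithm expressed as a chain of transformations.
def drop_missing(values):
    return [v for v in values if v is not None]


def scale_to_max(values):
    top = max(values)
    return [v / top for v in values]


def run_pipeline(data, steps):
    """Apply each step to the output of the previous one."""
    return reduce(lambda acc, step: step(acc), steps, data)
```

For example, `advance("designed", "submitted")` succeeds while `advance("designed", "processed")` raises an error, and `run_pipeline([2, None, 4], [drop_missing, scale_to_max])` yields the scaled values without the missing entry.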


Primordial Process Workflows: Design


Primordial Process Workflows: Implementation


Data Analysis Workflows: Excel

• In the beginning, there was Excel.

+ Advantages:
  • Ubiquitous and easy to use
  • Full flexibility for the end user (in theory, anyway)

– Disadvantages:
  • Hard to debug
  • Nightmarish version control
  • Slow and cumbersome


Data Analysis Workflows: Excel (architecture diagram)

[Diagram stages: Experiment Design, Data Acquisition, Image Processing, Data Analysis, Storage]

[Diagram components: LIMS Client, LIMS Server, qPCR, Autoscope, Plate reader, Img. Analysis Client, Job Server, Engines, File Server, Database Server, Excel]

Steps shown:
• Design experiment
• Submit experiment
• Store experiment data; track experiment; wait for image data
• Post image data
• Store image data
• Submit image processing job
• Run image processing job; store phenotype data
• Load phenotype data files; run analysis; generate graphs


Data Analysis Workflows: Web Tools

• Next: Web tools with tabular data as input and output.

+ Advantages:
  • Encapsulation of complex functionality
  • Centralized administration
  • Executed on server

– Disadvantages:
  • Low flexibility
  • Frugal web interface


Data Analysis Workflows: Web Tools (architecture diagram)

[Diagram stages: Experiment Design, Data Acquisition, Image Processing, Data Analysis, Storage]

[Diagram components: LIMS Client, LIMS Server, qPCR, Autoscope, Plate reader, Img. Analysis Client, Job Server, Engines, File Server, Database Server, Web Tools Server, Browser, Excel, Spotfire]

Steps shown:
• Upload phenotype and design data files
• Run analysis
• Download result data files
• Load result data files; generate graphs


Data Analysis Workflows: KNIME!

• KNIME: A giant leap forward
  – Flexible and easy to use and yet robust, scalable, performant and extensible!

• Current KNIME infrastructure:
  – Centrally administered Windows and Mac installations, configured to point to a user-specific workspace on the file server
  – Workflow curation policy: Versioned reference workflows for each project, owned by power users
  – Experiment meta data provided through database nodes, raw data through files
  – Complex statistics implemented with (remote) R scripting nodes
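The split between metadata from a database and raw data from files can be sketched outside KNIME as follows. This is a plain Python stand-in, not KNIME node code and not the R scripts mentioned above; the table name, column names, CSV layout, and the percent-of-control calculation are all assumptions made for illustration.

```python
import csv
import io
import sqlite3
from statistics import mean


def load_layout(conn):
    """Read the plate layout (well -> role) from the metadata database."""
    rows = conn.execute("SELECT well, role FROM plate_layout").fetchall()
    return dict(rows)


def load_raw(csv_text):
    """Read raw per-well values from a CSV file's contents."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["well"]: float(row["cell_count"]) for row in reader}


def percent_of_control(layout, raw):
    """Express each sample well relative to the mean negative control."""
    neg = mean(raw[w] for w, role in layout.items() if role == "negative_control")
    return {w: 100.0 * raw[w] / neg for w, role in layout.items() if role == "sample"}


# Hypothetical metadata: control and sample positions on one plate.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plate_layout (well TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO plate_layout VALUES (?, ?)",
    [("A1", "negative_control"), ("A2", "negative_control"),
     ("B1", "sample"), ("B2", "sample")],
)

# Hypothetical raw data as it might arrive in a file.
RAW_CSV = "well,cell_count\nA1,900\nA2,1100\nB1,500\nB2,1000\n"

result = percent_of_control(load_layout(conn), load_raw(RAW_CSV))
```

The point of the sketch is the join: the raw file knows nothing about which wells are controls, so the analysis only becomes meaningful once the layout from the database is attached.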


Data Analysis Workflows: KNIME! (architecture diagram)

[Diagram stages: Experiment Design, Data Acquisition, Image Processing, Data Analysis, Storage]

[Diagram components: LIMS Client, LIMS Server, qPCR, Autoscope, Plate reader, Img. Analysis Client, Job Server, Engines, File Server, Database Server, KNIME, Excel, Spotfire]

Steps shown:
• Run workflow on phenotype data and experiment design
• Load result data files; generate graphs


Primetime: Requirements

• Streamlining the Screening Pipeline
  – Analysis has become the bottleneck: Potential for 10-20 % increase in overall throughput

• Even “Higher” Content:
  – More parameters using advanced analysis methods
  – Single object rather than population data
  – Integrate gene annotations and pathway data

• Enable customers to explore and (re-)analyze delivered data sets
  – Selecting/weighting parameters
  – Tight integration with Spotfire, including raw data
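One way to read "selecting/weighting parameters" is a weighted average over a chosen subset of phenotype parameters. The sketch below shows that reading in Python; the parameter names, the values, and the scoring formula itself are assumptions, since the slides do not say how parameters are combined.

```python
def weighted_score(measurements, weights):
    """Combine selected parameters into one score.

    Parameters absent from `weights` are ignored (deselected); the rest
    contribute in proportion to their weight.
    """
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one parameter needs a non-zero weight")
    return sum(measurements[name] * w for name, w in weights.items()) / total


# Hypothetical normalized values for one well.
well = {"cell_count": -2.0, "nuclear_area": 1.0, "intensity": 5.0}

# The user selects two parameters and weights cell count three times higher.
score = weighted_score(well, {"cell_count": 3.0, "nuclear_area": 1.0})
```

Changing the weights and rescoring is cheap, which is what makes this kind of re-analysis plausible for an end user working on a delivered data set.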


Primetime: IRIS

“Integrated computational environment for high throughput RNA Interference Screening”

[Diagram stages: Experiment Design, Data Acquisition, Image Processing, Data Analysis, Storage]

[Diagram components: LIMS Client, LIMS Server, qPCR, Autoscope, Plate reader, Job Server, Engines, File Server, Database Server, KNIME Server, KNIME, Excel, Spotfire]

Steps shown:
• Store image data; launch image processing workflow
• Submit image analysis job; wait for phenotype data
• Post phenotype data
• Post phenotype data; run workflow on phenotype data and experiment design; post result data
• Retrieve result data; run Spotfire


Primetime: Beyond IRIS

• Use KNIME for process-centric workflows as well

• This would require
  – Standard interface to the LIMS server to drive the “business logic” (REST)
  – Easily configurable User Interfaces to parameterize processing steps (something like RGG?)
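As a rough idea of what driving LIMS "business logic" over REST could look like from a workflow step, the following Python sketch builds (but does not send) a status update request. The base URL, the resource path, and the JSON payload are hypothetical; the slides do not specify the LIMS interface.

```python
import json
from urllib.request import Request


def build_status_update(base_url, experiment_id, status):
    """Prepare a PUT request that reports a new experiment status.

    The `/experiments/<id>/status` resource is an assumed layout, not a
    documented LIMS endpoint.
    """
    url = f"{base_url.rstrip('/')}/experiments/{experiment_id}/status"
    body = json.dumps({"status": status}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Placeholder host; passing the request to urllib.request.urlopen would send it.
req = build_status_update("http://lims.example/api/", 42, "analysis_done")
```

Keeping the interaction to plain HTTP verbs on resources is what would let a generic workflow node talk to the LIMS without bespoke client code.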


Primetime: Beyond IRIS

• KNIME “solutions”: Hide complexity of workflows by exposing only a few “knobs” to the end user

• Features:
  – Again, a User Interface generator to make it easy for non-IT power users to create new solutions
  – Ideally, a way to publish the “solution” to a server and run it remotely
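The "few knobs plus a User Interface generator" idea can be sketched in Python by declaring the exposed parameters once and deriving a form description from that declaration. The knob names, defaults, and widget vocabulary are invented for illustration and do not correspond to any KNIME or RGG feature.

```python
from dataclasses import dataclass, fields


@dataclass
class HitCallingKnobs:
    # The only parameters an end user is meant to see (hypothetical).
    zscore_threshold: float = 2.0
    min_replicates: int = 2
    use_median: bool = True


# Map each Python type to a generic widget kind (assumed vocabulary).
WIDGETS = {float: "number", int: "integer", bool: "checkbox"}


def form_spec(knobs_cls):
    """Derive a UI form description from the knob declaration."""
    return [
        {"name": f.name, "widget": WIDGETS[f.type], "default": f.default}
        for f in fields(knobs_cls)
    ]


spec = form_spec(HitCallingKnobs)
```

A power user would only edit the knob declaration; the form follows from it, which is the property that makes such a generator attractive for non-IT authors.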


Conclusions

• KNIME has quickly become an integral part of the HT-HCS screening pipeline at Cenix

• Current work on the data analysis infrastructure around KNIME is focused on tight integration with the LIMS server, with Definiens for image processing, and with Spotfire for data visualization

• Further down the road, we plan to use KNIME for all workflows at Cenix and to build pre-packaged “solutions”


Thank you! Any questions?