Primetime for KNIME · 2017-05-23 · Primetime for KNIME: Towards an Integrated Analysis and...
Primetime for KNIME:
Towards an Integrated Analysis and Visualization Environment for RNAi Screening Data
F. Oliver Gathmann, Ph.D., Director IT, Cenix BioScience
Presentation for: KNIME User Group Meeting 2011
Zürich, March 3rd 2011
Overview
• Explain “RNAi Screening”
• IT infrastructure for HT-HCS (High-Throughput, High-Content Screening)
• Workflow software evolution at Cenix: past, present, and future
How RNAi works
[Figure: RNAi mechanism. Labels: siRNA, unwinding of siRNA, RISC, target mRNA recognition, target mRNA, degradation of mRNA]
Explain “RNAi Screening”
• First take-home message: RNAi allows you to investigate the function of genes by selectively knocking them down
The Drug Discovery Pipeline
Target Discovery → Target Validation (in vitro) → Target Validation (in vivo) → Lead Identification → Lead Optimization → ADME/Tox → Clinical Phase I → Clinical Phase II → Clinical Phase III → Registration
• Target Discovery (in vitro): direct loss-of-function (LoF) screens, modifier screens
• Target Validation (in vitro): phenotypic profiling, phenotypic titration
• Second take-home message: “Early in the drug discovery pipeline” means high throughput and lots of data
Information Layers
• Gene: sequence, species, pathway annotations, transcripts
• Silencing reagent: structure, targeted gene(s), stock and order information
• Experiment: metadata (sample and control positions), production data
• Phenotype: cell images, morphology data
• Hit: phenotype annotations, knockdown, reproducibility, significance
• Metabolic pathway: gene network, disease conditions
• Last take-home message: “High-content” means complex data structures
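The information layers form a linked data model: a hit points to a silencing reagent, which targets genes, which in turn belong to pathways. The following minimal Python sketch uses dataclasses to make that linkage concrete. All class and field names are hypothetical, only a few attributes per layer are modelled, and it is not meant to describe Cenix's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical data model: class and field names are illustrative, not a Cenix schema.


@dataclass
class Gene:
    symbol: str
    species: str
    transcripts: List[str] = field(default_factory=list)


@dataclass
class SilencingReagent:
    reagent_id: str
    targets: List[Gene]  # a reagent may target more than one gene


@dataclass
class Experiment:
    plate_id: str
    # Plate layout metadata: well -> reagent, or None for a control well.
    layout: Dict[str, Optional[SilencingReagent]]


@dataclass
class Phenotype:
    plate_id: str
    well: str
    measurements: Dict[str, float]  # morphology parameters from image analysis
    image_path: str = ""


@dataclass
class Hit:
    reagent: SilencingReagent
    annotation: str      # phenotype annotation, e.g. "reduced cell count"
    knockdown: float     # measured knockdown of the target, as a fraction
    significance: float  # e.g. a p-value across replicates


@dataclass
class Pathway:
    name: str
    genes: List[Gene]


def pathways_for_hit(hit: Hit, pathways: List[Pathway]) -> List[str]:
    """Walk Hit -> SilencingReagent -> Gene -> Pathway and return matching pathway names."""
    targeted = {gene.symbol for gene in hit.reagent.targets}
    return [p.name for p in pathways if targeted & {gene.symbol for gene in p.genes}]
```

Even this toy query has to cross three layers, which is the practical meaning of “complex data structures” in the take-home message.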
IT Infrastructure for HT-HCS
[Diagram: Cenix IT infrastructure]
Components: database server, file server, farm, automated microscope, pipetting robot, tube handler, LDAP server, scientist workstations, LIMS server
Terminology: “Workflows”
• Process-centric: mapping a work process in the physical world; focused on data acquisition
• Data-centric: mapping an algorithm; focused on data processing
• Not always clear-cut, but still a useful distinction
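A data-centric workflow, in the sense above, is an algorithm expressed as a graph of processing steps, which is also how KNIME presents workflows: nodes through which data tables flow. The core execution idea can be sketched with the Python standard library's `graphlib` (Python 3.9+): run each node once all of its upstream nodes have produced output. This is a toy model with hypothetical node names, not KNIME's execution engine.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Tuple

# A node is a function plus the names of the upstream nodes whose outputs it consumes.
Node = Tuple[Callable[..., Any], List[str]]


def run_workflow(nodes: Dict[str, Node]) -> Dict[str, Any]:
    """Execute every node after all of its inputs exist; return each node's output."""
    graph = {name: set(inputs) for name, (_, inputs) in nodes.items()}
    results: Dict[str, Any] = {}
    for name in TopologicalSorter(graph).static_order():  # raises CycleError on cycles
        func, inputs = nodes[name]
        results[name] = func(*(results[i] for i in inputs))
    return results


# Hypothetical three-node workflow: two readers feeding a join.
workflow: Dict[str, Node] = {
    "read_raw": (lambda: {"A1": 120.0, "A2": 80.0}, []),
    "read_layout": (lambda: {"A1": "siRNA-001", "A2": "neg_ctrl"}, []),
    "join": (lambda raw, layout: {layout[w]: v for w, v in raw.items()},
             ["read_raw", "read_layout"]),
}
```

A process-centric workflow, by contrast, would be driven by events in the lab (a plate arriving, an image set being posted) rather than by data dependencies alone.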
Workflow software evolution
“Process-centric Workflows” vs. “Data-centric Workflows”
Primordial Process Workflows: Design
Primordial Process Workflows: Implementation
Data Analysis Workflows: Excel
• In the beginning, there was Excel.
+ Advantages:
  • Ubiquitous and easy to use
  • Full flexibility for the end user (in theory, anyway)
– Disadvantages:
  • Hard to debug
  • Nightmarish version control
  • Slow and cumbersome
[Diagram: Data Analysis Workflows: Excel]
Swimlanes: Experiment Design/Data Acquisition, Storage, Image Processing, Data Analysis
Components: qPCR, Autoscope, plate reader, LIMS client, LIMS server, file server, database server, job server, engines, image analysis client, Excel
Steps shown: design experiment; submit experiment; store experiment data, track experiment, wait for image data; post image data; store image data; submit image processing job; run image processing job, store phenotype data; load phenotype data files, run analysis, generate graphs
Data Analysis Workflows: Web Tools
• Next: Web tools with tabular data as input and output.
+ Advantages:
  • Encapsulation of complex functionality
  • Centralized administration
  • Execution on the server
– Disadvantages:
  • Low flexibility
  • Spartan web interface
[Diagram: Data Analysis Workflows: Web Tools]
Swimlanes: Experiment Design/Data Acquisition, Storage, Image Processing, Data Analysis
Components: qPCR, Autoscope, plate reader, LIMS client, LIMS server, file server, database server, job server, engines, image analysis client, web tools server, browser, Excel, Spotfire
Steps shown: upload phenotype and design data files; run analysis; download result data files; load result data files, generate graphs
Data Analysis Workflows: KNIME!
• KNIME: A giant leap forward
  – Flexible and easy to use, yet robust, scalable, performant, and extensible!
• Current KNIME infrastructure:
  – Centrally administered Windows and Mac installations, configured to point to a user-specific workspace on the file server
  – Workflow curation policy: versioned reference workflows for each project, owned by power users
  – Experiment metadata provided through database nodes, raw data through files
  – Complex statistics implemented with (remote) R scripting nodes
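To illustrate what such a workflow computes, here is a minimal Python sketch (neither KNIME nor R) of the metadata-from-database, raw-data-from-file pattern. It assumes a plate-layout table in SQL, a CSV export of per-well readouts, and a simple z-score against the plate's negative-control wells. The table, column, and control names are hypothetical, and the talk does not say which normalization Cenix actually used.

```python
import csv
import io
import sqlite3
import statistics
from typing import Dict


def load_layout(conn: sqlite3.Connection, plate_id: str) -> Dict[str, str]:
    """Experiment metadata from the database: well -> content (reagent ID or control label)."""
    rows = conn.execute("SELECT well, content FROM layout WHERE plate_id = ?", (plate_id,))
    return {well: content for well, content in rows}


def load_readouts(csv_text: str) -> Dict[str, float]:
    """Raw per-well values from a file export with 'well' and 'value' columns."""
    return {row["well"]: float(row["value"]) for row in csv.DictReader(io.StringIO(csv_text))}


def zscores_vs_negative_controls(layout: Dict[str, str], readouts: Dict[str, float],
                                 neg_label: str = "neg_ctrl") -> Dict[str, float]:
    """Score each sample well against the mean and SD of the plate's negative-control wells."""
    neg = [readouts[w] for w, content in layout.items() if content == neg_label]
    mu, sd = statistics.mean(neg), statistics.stdev(neg)  # needs >= 2 control wells
    return {w: (v - mu) / sd for w, v in readouts.items() if layout[w] != neg_label}
```

In the setup described on the slide, anything beyond this kind of simple statistic would run in (remote) R scripting nodes; a more outlier-tolerant variant would replace mean and standard deviation with median and MAD.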
[Diagram: Data Analysis Workflows: KNIME!]
Swimlanes: Experiment Design/Data Acquisition, Storage, Image Processing, Data Analysis
Components: qPCR, Autoscope, plate reader, LIMS client, LIMS server, file server, database server, job server, engines, image analysis client, KNIME, Excel, Spotfire
Steps shown: run workflow on phenotype data and experiment design; load result data files, generate graphs
Primetime: Requirements
• Streamlining the screening pipeline
  – Analysis has become the bottleneck: potential for a 10–20% increase in overall throughput
• Even “higher” content:
  – More parameters using advanced analysis methods
  – Single-object rather than population data
  – Integration of gene annotations and pathway data
• Enable customers to explore and (re-)analyze delivered data sets
  – Selecting/weighting parameters
  – Tight integration with Spotfire, including raw data
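Letting customers select and weight parameters amounts to recomputing a combined score from per-parameter results. The following minimal sketch assumes per-well z-scores already exist for each parameter and uses a weighted mean as the combination rule. Both the rule and the parameter names are hypothetical choices made for illustration.

```python
from typing import Dict, List


def weighted_score(zscores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted mean of one well's per-parameter z-scores; zero weight deselects a parameter."""
    selected = {p: w for p, w in weights.items() if w != 0}
    total = sum(abs(w) for w in selected.values())
    if total == 0:
        raise ValueError("select at least one parameter with a non-zero weight")
    return sum(w * zscores[p] for p, w in selected.items()) / total


def rank_wells(per_well: Dict[str, Dict[str, float]], weights: Dict[str, float]) -> List[str]:
    """Order wells by combined score, strongest negative effect first."""
    scores = {well: weighted_score(z, weights) for well, z in per_well.items()}
    return sorted(scores, key=lambda well: scores[well])
```

Because the ranking is a cheap function of stored per-parameter results, a customer can change the weights and re-rank interactively without re-running image analysis.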
[Diagram: Workflow software evolution, Primetime stage]
Swimlanes: Experiment Design/Data Acquisition, Storage, Image Processing, Data Analysis
Components: qPCR, Autoscope, plate reader, LIMS client, LIMS server, KNIME server, file server, database server, job server, engines, KNIME, Excel, Spotfire
Steps shown: store image data, launch image processing workflow; submit image analysis job, wait for phenotype data; post phenotype data; post phenotype data, run workflow on phenotype data and experiment design, post result data; retrieve result data, run Spotfire
Primetime: IRIS
“Integrated computational environment for high throughput RNA Interference Screening”
Primetime: Beyond IRIS
• Use KNIME for process-centric workflows as well
• This would require:
  – A standard interface to the LIMS server to drive the “business logic” (REST)
  – Easily configurable user interfaces to parameterize processing steps (something like RGG?)
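The slide names REST as the interface style but gives no endpoints. The following Python sketch, built on the standard library's `urllib.request`, shows how a workflow step might prepare calls to such a LIMS. The base URL, resource paths, and JSON payload shape are hypothetical placeholders, not an existing Cenix or LIMS API.

```python
import json
import urllib.request
from typing import Any, Dict, List

# Hypothetical LIMS REST API: base URL and resource paths are assumptions.
LIMS_BASE = "https://lims.example.org/api"


def experiment_request(experiment_id: str) -> urllib.request.Request:
    """GET the design (plate layout, controls) of one experiment."""
    return urllib.request.Request(f"{LIMS_BASE}/experiments/{experiment_id}",
                                  headers={"Accept": "application/json"}, method="GET")


def post_phenotype_request(plate_id: str, rows: List[Dict[str, Any]]) -> urllib.request.Request:
    """POST per-well phenotype data for a plate as a JSON document."""
    body = json.dumps({"plate_id": plate_id, "rows": rows}).encode("utf-8")
    return urllib.request.Request(f"{LIMS_BASE}/plates/{plate_id}/phenotype-data", data=body,
                                  headers={"Content-Type": "application/json"}, method="POST")


def send(req: urllib.request.Request) -> Any:
    """Perform the network call and decode a JSON response (needs a reachable server)."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from `send` lets the business-logic calls be inspected or tested without a live LIMS.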
Primetime: Beyond IRIS
• KNIME “solutions”: Hide complexity of workflows by exposing only a few “knobs” to the end user
• Features:
  – Again, a user interface generator that makes it easy for non-IT power users to create new solutions
  – Ideally, a way to publish the “solution” to a server and run it remotely
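As described here, a KNIME “solution” wraps a full workflow and exposes only a few knobs. One way to model this is sketched in Python below, with hypothetical class and field names (this is not a KNIME API): a spec fixes most settings, declares the exposed knobs, and derives both a UI description and the final parameter set from them.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

# Hypothetical "solution" spec; names and structure are illustrative only.


@dataclass(frozen=True)
class Knob:
    name: str
    label: str
    default: Any
    choices: Tuple[Any, ...] = ()  # empty means free-form input


@dataclass
class Solution:
    workflow: str
    hidden_settings: Dict[str, Any]  # fixed by the power user who built the solution
    knobs: List[Knob]                # the only settings the end user sees

    def form(self) -> List[Dict[str, Any]]:
        """Describe the UI to generate: one widget per exposed knob."""
        return [{"name": k.name, "label": k.label, "default": k.default,
                 "widget": "dropdown" if k.choices else "text", "choices": list(k.choices)}
                for k in self.knobs]

    def settings(self, user_input: Dict[str, Any]) -> Dict[str, Any]:
        """Merge user choices into the full parameter set; reject unknown or invalid input."""
        known = {k.name: k for k in self.knobs}
        unknown = set(user_input) - set(known)
        if unknown:
            raise KeyError(f"not exposed by this solution: {sorted(unknown)}")
        values = {k.name: user_input.get(k.name, k.default) for k in self.knobs}
        for name, value in values.items():
            if known[name].choices and value not in known[name].choices:
                raise ValueError(f"{name}: {value!r} not in {known[name].choices}")
        return {**self.hidden_settings, **values}
```

Since the form is generated from the same spec that validates input, a power user who adds a knob automatically gets both the widget and the check.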
Conclusions
• KNIME has quickly become an integral part of the HT-HCS screening pipeline at Cenix
• Current work on the data analysis infrastructure around KNIME is focused on tight integration with the LIMS server, with Definiens for image processing, and with Spotfire for data visualization
• Further down the road, we plan to use KNIME for all workflows at Cenix and to build pre-packaged “solutions”
Thank you! Any questions?