Code camp 2015 visual programming mm

Visual Programming Environments for Science and Business MITCH MILLER SCIENTIFIC THINKING CODE CAMP 2015 SEPTEMBER 19, 2015

Page 1: Code camp 2015 visual programming mm

Visual Programming Environments for Science and Business

MITCH MILLER

SCIENTIFIC THINKING

CODE CAMP 2015

SEPTEMBER 19, 2015

Page 2: Code camp 2015 visual programming mm

Disclaimer

This talk represents my opinion and personal experience using 2 fine software systems developed by third parties.

The software systems shown are very complex and have hundreds of components. I have only worked with a small number.

Every task shown today can be accomplished in multiple ways. I’m only showing one of those ways.

Page 3: Code camp 2015 visual programming mm

Overview

Introduction: first demo

What is a ‘visual programming environment’

The two systems we’ll look at today

What are these systems capable of?

Second set of demos (in-depth)

Page 4: Code camp 2015 visual programming mm

Demo 1: set-up

Task: produce a report of all compounds registered during January

Page 5: Code camp 2015 visual programming mm

Visual Programming: informal definition

Drag functional components onto canvas to create program

Configure most components by setting parameters

Connect components to route data from one to another

Run and observe data traveling down the lines
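The four steps above can be sketched in ordinary code. This is only an illustration of the dataflow idea, not how KNIME or Pipeline Pilot work internally: each function stands in for a component, its keyword arguments stand in for configured parameters, and nesting the calls stands in for drawing the connecting lines. The sample records, field names, and function names are all hypothetical, loosely modeled on the Demo 1 task.

```python
from datetime import date

# Hypothetical sample data standing in for a compound registry.
COMPOUNDS = [
    {"id": "CMP-001", "name": "aspirin", "registered": date(2015, 1, 12)},
    {"id": "CMP-002", "name": "caffeine", "registered": date(2015, 2, 3)},
    {"id": "CMP-003", "name": "ibuprofen", "registered": date(2015, 1, 28)},
]


def read_rows(rows):
    """Source component: emits one record at a time."""
    yield from rows


def filter_month(records, month):
    """Filter component: `month` is its configurable parameter."""
    for record in records:
        if record["registered"].month == month:
            yield record


def select_columns(records, columns):
    """Column-selection component: keeps only the named fields."""
    for record in records:
        yield {column: record[column] for column in columns}


def write_report(records):
    """Sink component: renders each record as one line of text."""
    return [", ".join(str(value) for value in record.values()) for record in records]


# "Connecting" components: the output of each call feeds the next one.
pipeline = select_columns(
    filter_month(read_rows(COMPOUNDS), month=1),
    columns=["id", "name"],
)
report = write_report(pipeline)
print("\n".join(report))
```

Because the components are generators, records travel through the chain one at a time, which is roughly what you watch happen on the canvas when a visual program runs.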

Page 6: Code camp 2015 visual programming mm

Component types

File I/O

Read/write text files

Read/write MS Office documents

XML

JSON

PDF

Database access

Connect

Query

Update
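For comparison, the three database steps (connect, query, update) look roughly like this when written by hand. The sketch uses Python's built-in sqlite3 module with an in-memory database; the table name, columns, and rows are hypothetical and are not taken from either product or from the demos.

```python
import sqlite3

# Connect: an in-memory database stands in for a real database server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE compounds (id TEXT PRIMARY KEY, name TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO compounds VALUES (?, ?, ?)",
    [("CMP-001", "aspirin", "new"), ("CMP-002", "caffeine", "new")],
)

# Query: fetch every compound that still has the status "new".
rows = conn.execute(
    "SELECT id, name FROM compounds WHERE status = ? ORDER BY id", ("new",)
).fetchall()

# Update: mark one compound as reviewed and record how many rows changed.
cursor = conn.execute(
    "UPDATE compounds SET status = ? WHERE id = ?", ("reviewed", "CMP-001")
)
conn.commit()
updated = cursor.rowcount

# Query again to confirm the update took effect.
remaining = conn.execute(
    "SELECT id FROM compounds WHERE status = 'new'"
).fetchall()
conn.close()
```

In a visual environment each of these steps would typically be its own component, with the connection details and SQL text entered as parameters.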

Page 7: Code camp 2015 visual programming mm

Component types (continued)

Web service consumption

Domain-specific processing

Chemical structure I/O

Chemical structure processing and analysis

Sequence processing

Extensibility

Add your own libraries for more sophisticated processing

Page 8: Code camp 2015 visual programming mm

Component types (continued)

Visualization

Graphing

Statistical calculations

Scripting

Tip: aim for brief scripts

Data transformation

If/else processing

Filtering

Column selection

And many more…

Page 9: Code camp 2015 visual programming mm

KNIME

Originally developed at the University of Konstanz, Germany (2004)

Currently produced by KNIME.com AG, a company in Zurich, Switzerland

KNIME stands for KoNstanz Information MinEr

Pronounced “Nighm”

A general purpose data analytics platform

Free version available for download

For-sale version available with added extensions

Page 10: Code camp 2015 visual programming mm

KNIME (continued)

Java based

Written in Java

Scriptable and extensible in Java

URL: https://www.knime.org/

Page 11: Code camp 2015 visual programming mm

Pipeline Pilot

Developed and sold by BIOVIA, San Diego, CA

Originally developed by SciTegic, San Diego, in 1999

Designed for scientists to “rapidly create, test and publish scientific services that automate the process of accessing, analyzing and reporting scientific data” (http://accelrys.com/products/collaborative-science/biovia-pipeline-pilot/)

Client-server system

Commercial product

Extensible using .NET and Java

Scripted using an original language, ‘PilotScript’

Page 12: Code camp 2015 visual programming mm

KNIME Terminology

Components are called “Nodes”

Programs are “Workflows”

Reusable sets of Nodes are “Metanodes”

Groups of related Nodes are “Extensions”

Page 13: Code camp 2015 visual programming mm

Pipeline Pilot Terminology

Components are called “Components”

Programs are “Protocols”

Reusable sets of Components are “Subprotocols”

Groups of related Components are “Packages”

Different protocols can be combined

One protocol provides the initial UI, including a Web form

A second protocol handles form data processing (‘work protocol’)

Page 14: Code camp 2015 visual programming mm

Different systems shown today serve different populations

KNIME can be used ad hoc on the desktop of a power user. It is also used by companies in a variety of industries.

Pipeline Pilot is geared towards scientists. It is part of an enterprise system and requires a server installation.

Page 15: Code camp 2015 visual programming mm

Programs can be deployed outside the development client

Give users a URL to access your program

Users of BIOVIA Electronic Lab Notebook and other software can access Pipeline Pilot protocols outside the Pipeline Pilot UI

Users access a Web application that shows them the data they’re looking for in a purpose-built user interface

The application does not look like the system with which it was built

For-sale version of KNIME Server provides similar functionality

Page 16: Code camp 2015 visual programming mm

Server Features

User access configuration

Shared data sources

Automatic jobs

Etc.

Page 17: Code camp 2015 visual programming mm

Second demo

Exploration of data set using KNIME and Pipeline Pilot

Data set comes from the National Cancer Institute (NCI)’s Developmental Therapeutics Program (DTP)

Results of laboratory tests for activity against 60 human cancer cell lines

Data freely available:

https://dtp.cancer.gov/discovery_development/nci-60/default.htm

Page 18: Code camp 2015 visual programming mm

Additional demos

Pipeline Pilot Web Port sample

Page 19: Code camp 2015 visual programming mm

Suggestions for getting started

Download the KNIME software (knime.org)

Install on your computer

Look at the sample workflows

Start simple; build up

Page 20: Code camp 2015 visual programming mm

Types of applications

Reporting

Data set comparisons

ETL

Data Analysis

Page 21: Code camp 2015 visual programming mm

References

Scholarly article on KNIME and Pipeline Pilot

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3414708/

www.knime.org

https://www.youtube.com/user/KNIMETV

http://accelrys.com/products/collaborative-science/biovia-pipeline-pilot/

https://dtp.cancer.gov/

Page 22: Code camp 2015 visual programming mm

Who is your speaker?

Mitch Miller, Ph.D. in Chemistry and 20+ years of IT experience

Independent consultant: Scientific Thinking, LLC

Some recent projects

Ongoing custodian of one chemical database implementation for the ChemIDplus project within the National Library of Medicine

Upgraded a 10-year-old Java Servlet lab workflow application to the latest version of the JDK and Internet Explorer 11, and implemented enhancements

Windows service to handle communication between 2 legacy applications

Import wizard for chemical array designer

Merged a set of chemical databases and harmonized data