USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem

Instant JChem - enabling new ways of working with data and access to new data to work with. Dana Vanderwall, Bristol-Myers Squibb, Research Information Technology & Automation. Chemaxon US UGM, Sept 2014

description

The introduction of Instant JChem and underlying ChemAxon technologies, along with a new data infrastructure designed with analytics in mind, has provided a platform with significantly more flexibility in bringing chemistry and data to the scientist’s desktop. We will discuss the architecture we evolved to and the myriad of new use cases supported by an improved data flow and new ways of looking at the data that have improved decision making, design, and collaboration in drug discovery.

Transcript of USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem

Page 1: USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem

Instant JChem - enabling new ways of working with data and access to new data to work with

Dana Vanderwall
Bristol-Myers Squibb
Research Information Technology & Automation

Chemaxon US UGM, Sept 2014

Page 2: USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem

Initial State in Chemistry Analytics

[Workflow diagram. Labels: CDR; SI Forms; Export; Manual copy & paste, typing; Excel; Master Spreadsheet (Excel, Word); Visualization; Additional data; Knowledge (Annotation, Folks-onomies); compound structures, IDs]

[Example chart: scatter plot "HPLC log P vs. rat Vds"; x-axis Vds (0.00 to 40.00), y-axis HPLC log P (2.00 to 5.50); trend line y = 0.0344x + 3.886, R2 = 0.2737. A second label reads "Rat Pct Bound".]

Additional chemical structure analyses:
• SAR R-group analysis
• Clustering (CADD and in-house solutions)
• Predictive models (HERG, Solubility, Permeability; FACT)

Page 3: USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem

The DARE Project (Data & Analytics for Research)

Simplify
• Replace legacy app/workflow… with integrated tools for analytics
• Decrease stand-alone docs/reports
• Put any needed calculations & predicted properties where they’re needed

Modernize
• A new product that maintains the functionality of form view…
• Plus a richer set of views, tables, in-place conditional formatting, graphs, & more chemistry functionality

Phased approach to dev. & migration
• Learn by doing; established base camp in 1st yr, then ramped up
• Gradually phasing in IJC over 2013-2014

Page 4: USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem

DARE technology map

[Architecture diagram. Labels:]
• User interface
• Drill down: web service for conc. response curves & secondary results
• SOLR Index for text queries (IBM Patent DB only)
• Data Alerts
• Annotations
• Data Marts: New data layer for access & integration
  – Data common to most DWGs: Lead Evaluation Profiling: Enzymes; PAMPA; Cellular; CYP Inh; MetStab; CYP Induction
  – Data unique to a DWG: DWG A Enzyme; DWG A Cellular; DWG A selectivity; DWG B Receptor; DWG B Cellular; DWG B Selectivity
• Informatica
• Web services
• Operational Screening; BioBook; Chemical structures, properties, calculated fields; Meta Data Annotation
• Central Data Repository (CDR)

Page 5: USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem

DARE Timelines

[Timeline chart spanning H1-2012, H2-2012, H1-2013, H2-2013, H1-2014, H2-2014]
• Phase 1: Informatica & ChemAxon Set-up, Prototyping and Build
• Phase 2: GUI & Datamart – Prototyping and Build
• Phase 3: GUI & Mart – Deployment
• Phase 4: GUI & Mart – Deployment
• Decommissions
• Deliverables, in order along the timeline: 5 DWGs deployed; ~20 DWGs deployed; ~40 DWGs deployed

Page 6: USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem

Start with the basics & build up

Foundation
• Program Specific Forms and use cases
• Universal Forms (profiling platforms or compilations of data commonly used)

Extended use cases
• Use cases requiring bespoke data structures, scripting, or visualization
• Unique data sources, combinations of data, all biological data

Extended functionality
• Hooks into internal web services: drill down for curves/secondary data
• Query to SOLR index
• Data Alerts

Page 7: USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem

Datamart Infrastructure

General
• ETL from primarily CDR, some additional sources
• Provides environment to create tables & other data structures for IJC
• Tables in IJC not enormously popular with users
• Comfort and orientation with data in text box, fixed in position on form

‘Cell Factory’
• Cell = entity in Oracle that effectively provides the data for one assay; CDR queries sometimes require complex set of conditions
• Captures metadata associated with cell creation, keeps them unique, etc.

Incremental updates
• Via Informatica PowerCenter, 15-30 min incremental updates
• Gentle failure in face of long running jobs
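
The incremental-update bullets can be illustrated with a small sketch. It assumes a timestamp watermark and an in-memory target, and it interprets "gentle failure" as skipping a cycle while the previous one is still running; the slide does not say how PowerCenter is configured, so the class and field names (`IncrementalLoader`, `result_id`, `modified`) are hypothetical and this is not the Informatica API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class IncrementalLoader:
    """Hypothetical watermark-based loader (illustration only)."""
    watermark: datetime                      # upper bound of the last loaded window
    running: bool = False                    # True while a previous cycle is in flight
    target: dict = field(default_factory=dict)

    def run_cycle(self, source_rows, now):
        """Copy rows modified since the watermark into the target.

        Returns the number of rows loaded, or None when the cycle was skipped.
        """
        if self.running:
            # Assumed meaning of "gentle failure": skip rather than stack jobs.
            return None
        self.running = True
        try:
            changed = [r for r in source_rows
                       if self.watermark < r["modified"] <= now]
            for row in changed:
                self.target[row["result_id"]] = row   # upsert keyed on result id
            self.watermark = now
            return len(changed)
        finally:
            self.running = False


if __name__ == "__main__":
    start = datetime(2014, 9, 1, 8, 0)
    loader = IncrementalLoader(watermark=start)
    rows = [{"result_id": 1, "modified": start + timedelta(minutes=10)}]
    print(loader.run_cycle(rows, start + timedelta(minutes=15)))
```

Because the watermark only advances after a completed cycle, a skipped cycle loses nothing: the next run picks up the whole window since the last success.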

Page 8: USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem

Data management v1

• BA catalogs data required for new project team
• Passes it to DB developer to define new ETL
• Manually:
  • New tables/entities promoted to IJC
  • New data tree created
  • Build edges
  • Build Form
  • Add new cells/columns to form

[Diagram labels: CDR; ETL; DARE Data mart; IJC Schema; IJC Forms]

• Manual coding/scripting
• Rate determining step
• DB development not self documenting

Page 9: USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem

Automated data management

Metadata UI
• UI to search/define/create cells, tables, calc. fields
• Consumes metadata & creates meta data ‘cell’ definition

Auto Promotion
• Promotes the new table / new fields into IJC
• If it’s a new entity then
  o Creates a new data tree using a data tree template
  o Adds the new table to the new data tree
  o Creates a new form on the new data tree
• Creates edges

ETL
• Creates tables & columns immediately upon cell ‘activation’

[Diagram labels: User creates cell/table definition; Metadata UI; Metadata Repository; Promote Queue; Auto Promotion; ETL; CDR; DARE Data mart; IJC Schema; IJC Forms]
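
The promotion steps on this slide can be sketched as a queue consumer. This is only an illustration under stated assumptions: `IJCSchema` is a hypothetical in-memory stand-in, the cell dictionary keys (`entity`, `table`, `fields`) are invented for the example, and none of it is the Instant JChem API or the BMS implementation.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class IJCSchema:
    """Hypothetical in-memory stand-in for an IJC schema (illustration only)."""
    data_trees: dict = field(default_factory=dict)   # entity -> list of tables
    forms: dict = field(default_factory=dict)        # entity -> form name
    fields: dict = field(default_factory=dict)       # table -> list of field names
    edges: list = field(default_factory=list)        # (entity, table) pairs


def promote(schema, cell):
    """Apply the promotion steps from the slide to one cell definition."""
    entity, table = cell["entity"], cell["table"]
    # Promote the new table / new fields.
    known = schema.fields.setdefault(table, [])
    known.extend(f for f in cell["fields"] if f not in known)
    if entity not in schema.data_trees:
        # New entity: create a data tree (from a template) and a form on it.
        schema.data_trees[entity] = []
        schema.forms[entity] = f"{entity} form"
    if table not in schema.data_trees[entity]:
        schema.data_trees[entity].append(table)
    # Create the edge linking the entity to the table, once.
    edge = (entity, table)
    if edge not in schema.edges:
        schema.edges.append(edge)


def drain(schema, queue):
    """Process queued cell definitions in arrival order."""
    while queue:
        promote(schema, queue.popleft())


if __name__ == "__main__":
    schema = IJCSchema()
    queue = deque([{"entity": "DWG_A", "table": "enzyme_ic50", "fields": ["ic50"]}])
    drain(schema, queue)
    print(schema.data_trees, schema.edges)
```

Every step checks for existing objects first, so a cell that is queued twice leaves the schema unchanged the second time.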

Page 10: USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem

Scale

Instant JChem
• 1455 forms + Grids
• 288 saved queries
• 474 saved lists
• 10 scripts

Data
• 211 data trees
• 526 ‘entities’
• 8 schema
• 2400 assays
• 41,571 ‘fields’

Traffic
• 631 users (to date)
• 1000-2000 db connections daily

Page 11: USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem

The flexibility of [datamart + IJC] has enabled solutions well beyond the standard ‘program’ form

[Hub diagram centered on IJC and Datamarts.]

Data sources shown:
• IBM Patent DB
• HT Metabolite
• Mutagenesis DB
• Chiral Separations
• Alliance Data
• Drug Safety Warehouse

Capabilities shown:
• External data source
• Novel & multiple data structures & presentations
• Visualizations; integration of custom scripts & calculations
• Data access
• Integration of active & historical BMS data
• Integration of BMS data not in the CDR

Page 12: USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem

High-Throughput Mutagenesis: SAR, but different

• Lead Evaluation, Applied Genomics, Research IT & Automation, Computer Aided Drug Design designed & built cloning and screening platform
• >150 mutants, testing >30 compounds

Page 13: USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem

A Different Data Scale

• For each cmpd compare WT to 150 mutants
• For 30 compounds

Page 14: USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem

Endpoint variation over mutants by compound

Datamart
• Mutagenesis datamart created drawing on data from 2 operational data sources
• ‘Mart generation automated & refreshed as new data is available
• DataMart structure is heavily augmented based on the need of Instant JChem (IJC)
• Utilize IJC’s flexible entity relationship model & charting fxns to aid data visualization

Page 15: USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem

All compounds per endpoint variation over mutants

• Offered summary birds-eye view on all compounds by each individual result type (EC50, WTRATIO, KBWTRATIO etc) to identify trends
• Compound as column header - a novel pivot
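
The "compound as column header" pivot can be sketched as follows, for one result type at a time. The row layout (`mutant`, `compound`, `result_type`, `value`) is an assumption made for the example; the slides do not show the actual datamart columns, and the real pivot may well be built in the datamart rather than in a script.

```python
from collections import defaultdict


def pivot_by_compound(rows, result_type):
    """Return (compound_headers, table): one row per mutant, one column per compound."""
    selected = [r for r in rows if r["result_type"] == result_type]
    compounds = sorted({r["compound"] for r in selected})
    by_mutant = defaultdict(dict)
    for r in selected:
        by_mutant[r["mutant"]][r["compound"]] = r["value"]
    table = {
        mutant: [values.get(c) for c in compounds]   # None marks a missing measurement
        for mutant, values in sorted(by_mutant.items())
    }
    return compounds, table


if __name__ == "__main__":
    # Illustrative values only, not data from the talk.
    demo = [
        {"mutant": "WT", "compound": "C1", "result_type": "EC50", "value": 1.0},
        {"mutant": "M1", "compound": "C1", "result_type": "EC50", "value": 5.0},
    ]
    headers, table = pivot_by_compound(demo, "EC50")
    print(["mutant"] + headers)
    for mutant, values in table.items():
        print([mutant] + values)
```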

Page 16: USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem

Shift workload from query & discovering to alerting & reporting

• Define what the teams want to monitor
• Automate the delivery of new data packages

Base case: Go find the data & construct analysis
[Flow diagram: Open SI Forms → Open form query → Select data → Export data → Import data → Table or visualization. The question "Is my data there yet?" repeats alongside the flow.]

New capability: Push data alerts
[Flow diagram labels: User data alert parameters; Query data source (Datamart); Automated email to user when new data; Spreadsheet & link to open project-form-list in IJC; Instant JChem with new data]

Page 17: USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem

Alert manager (internal GWT), 2-way integration with IJC

• Grab active data tree ID and bring it to alert tool
• Take all the assays under the data tree as selection source for data alert
• Store this information and match it against DMART
• Create hit list using compound ID/lot ID as ‘permanent list’.
• Send the link to subscribers
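
The matching and hit-list steps above might look like the sketch below. It assumes a subscription stores the data tree ID plus the assays under that tree, and that a result counts as "new" when its (assay, compound, lot) key has not been seen before. The function names, dictionary keys and the example URL are hypothetical; the internal GWT tool's design is not described in the slides.

```python
def build_hit_list(subscription, datamart_rows, already_seen):
    """Return sorted (compound_id, lot_id) pairs with new results for subscribed assays."""
    hits = set()
    for row in datamart_rows:
        key = (row["assay"], row["compound_id"], row["lot_id"])
        if row["assay"] in subscription["assays"] and key not in already_seen:
            hits.add((row["compound_id"], row["lot_id"]))
            already_seen.add(key)   # remember it so the next run does not re-alert
    return sorted(hits)


def alert_message(subscription, hits, list_url):
    """Compose the notification text; returns None when nothing is new."""
    if not hits:
        return None
    return (f"{len(hits)} compound lot(s) with new data for data tree "
            f"{subscription['data_tree_id']}: {list_url}")


if __name__ == "__main__":
    sub = {"data_tree_id": 42, "assays": {"A1", "A2"}}
    rows = [{"assay": "A1", "compound_id": "CMPD-1", "lot_id": "01"}]
    hits = build_hit_list(sub, rows, set())
    print(alert_message(sub, hits, "http://example.invalid/list/1"))
```

Keying the hit list on compound ID and lot ID means several new results for the same lot produce a single list entry.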

Page 18: USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem

What do the users think about all this?

• Change is never easy
• Sub-populations are attracted to new capabilities and adopt new tools and practices
• Others need more encouragement; stability is critical
• Maintaining the capabilities of the familiar and well understood in the new environment a pre-requisite for complete migration
• We’re getting there

Page 19: USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem

Legacy application usage vs. Instant JChem

[Chart: Unique Users per Month, Mar 2014 to Jun 2014; series DARE and SI; y-axis 0 to 900; annotation "Announcement of SI Forms retirement"]

Page 20: USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem

Reduced number of data sets exported for analysis

[Chart: Number of Data Exports per month, Mar 2014 to Jun 2014; y-axis 1900 to 2700]

Page 21: USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem

Monitor URL Sharing in IJC

[Chart: Launched URLs, months 1 to 6 of 2014; legend labels: Total Form URL, List URLs, Query URLs; y-axis 0 to 80]

Page 22: USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem

a moment for reflection

cause for dancing
• Conditional formatting!
• Grid view
• Query builder
• Query/browse performance!
• Tabbed panes
• URL sharing*
• Help from CXN!!

what we learned
• Train just in time
• Listen; listen some more
• STOP the presses if it’s not right - they’d rather wait
• Simple >> rich
• Provide a thread of continuity to lead through new tools
• Don’t disrupt the workflow, let it evolve

coaching
• More thorough regression testing
• Clearer release notes
• Login/start-up performance
• List query result retains original order
• Cleaner Excel export, keep structure orientation
• More conversations!
• Web services
• Plexus!

Page 23: USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem

The DARE team

Core Team
Heather Artman, Dawn Cohen, John Duncan, Ramesh Durvasula, James Ewen, Lisa Johnson, Sangeet Khullar, Dong Li, Mark Manfredi, Minimol Mathew, Christa Musial, Matthias Nolte, Anusha Ramanathan, David Vanderbrooke, Dana Vanderwall

Acknowledgements
Scientific Computing Services
Ray Reichard, Padma Vellanki, Thomas Curneal, Mike Beluch, Nelly Masias, Mahesh Nawade

Page 24: USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem

End user support

Support email group

User Community SharePoint
• Training and reference, FAQs, External links, contact info
• All reported issues and status
  – [open, in progress, scheduled fix/improvement, resolved]

Internal BMS User Group Meeting
• 1 hr. monthly session to cover demonstration & training for 2-3 special topics or features
• Topics drawn from suggestions and requests for more info or training; topics covered to date:
  – IJC: Query Builder; Visualization; Sharing by URL; exporting; working list (pick list); R group decomposition; Markush draw/search
  – JChem4XL - patent doc creation
  – IBM Patent Database; Metabolite Database

Page 25: USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem

Assay meta data

• Describe assay protocol & conditions in controlled vocabulary
• Protocols would have a minimum set of fields that would have to be populated before going into production
• Opportunity for business rules that guide the protocol registration
• All downstream systems would utilize the same framework & meta-data
• Propose adopting established standard, aligning/collaborating with NIH BARD Project & BioAssay Ontology (BAO)
• Requires process & roles for maintaining up to date dictionaries and governance

Example fields:
• Biological description
  – Target
  – Gene name (look-up, and capture locus link)
  – Species
  – Cell type
• Assay description
  – Assay type
  – Assay mode
  – Detection method
• Results
  – Result type
  – Modifier
  – Units
• etc
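
The "minimum set of fields" rule and the controlled vocabulary can be sketched as a small pre-production check. The slide does not define the minimum set, so treating every field it lists as required is an assumption, as are the snake_case field names and the sample vocabulary values; none of this is taken from BAO or from a BMS system.

```python
# Assumption: every field named on the slide is treated as required.
REQUIRED_FIELDS = {
    "target", "gene_name", "species", "cell_type",
    "assay_type", "assay_mode", "detection_method",
    "result_type", "modifier", "units",
}


def missing_fields(protocol):
    """Return required fields that are absent or blank, sorted by name."""
    return sorted(f for f in REQUIRED_FIELDS
                  if not str(protocol.get(f) or "").strip())


def vocabulary_violations(protocol, vocabulary):
    """Return fields whose value is outside the controlled vocabulary for that field."""
    return sorted(f for f, allowed in vocabulary.items()
                  if protocol.get(f) and protocol[f] not in allowed)


def can_go_to_production(protocol, vocabulary):
    """A protocol passes when nothing is missing and every controlled term is valid."""
    return not missing_fields(protocol) and not vocabulary_violations(protocol, vocabulary)


if __name__ == "__main__":
    vocab = {"species": {"human", "rat"}, "units": {"nM", "uM"}}   # sample terms only
    draft = {"target": "kinase X", "species": "mouse"}
    print(missing_fields(draft), vocabulary_violations(draft, vocab))
```

Returning the list of missing or invalid fields, rather than a bare pass/fail, is what would let a registration form guide the user, in the spirit of the "business rules" bullet.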

Page 26: USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem

BAO scope and purpose

• BAO to describe assays and screening results
  – Defines relevant assays and result annotations
  – Provides controlled terminology
  – Formalizes knowledge of assays and screening results
  – Describes and formalizes screening campaigns, i.e. relationship between assays in terms of their use
• BAO addresses problems with using data and facilitates
  – Leveraging existing data in discovery projects
  – Global analysis across diverse data sets
  – Integration of data from different resources

Page 27: USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem

What do we need to describe assays
