HARNESSING COMPREHENSIVE SMALL MOLECULE PROJECT DATA … · PRINCIPAL SCIENTIST STEPHANIE...
PRINCIPAL SCIENTIST
STEPHANIE GEUNS-MEYER
HARNESSING COMPREHENSIVE SMALL MOLECULE PROJECT DATA TO INFORM DESIGN AND DECISIONS
Symposium: “Streamlining Drug Discovery and Development: Leveraging data analysis and modelling for design,” Cambridge, MA, April 11, 2016
2
HARNESSING PROJECT DATA: TOPICS
• Assembly and architecture of the project tables
• Managing pivoted and unpivoted data in a single table
• Multiparameter optimization using the dose equation and ADME models
• Value for external collaborations
• Examples of how scientists interact with the data
• Exhortation
3
TEMPLATE FOR BUILDING PROJECT SPOTFIRE FILES
• Project-specific inputs
– Project compounds, plus (optionally) other compounds by assay
– Assays/result types from RG* project view
– Chemotype identification by SMARTS queries
• Cross-project inputs
– “Non-project” assays from RG lists
– In vivo PK data
– ADME predictions (à la carte)
– Calculated physical properties
– Registration details, inventory
• Data table (.txt)
– Data mostly pivoted (one compound = one row)
– Exception: in vivo PK (extra row for each experiment)
Pipeline Pilot protocol runs daily
*RG = “Research Gateway,” a web application that provides access to multiple data sources
**An active link is maintained between the source table files (.txt, .xlsx, .csv) and the .dxp file; data are pulled from the sources when the Spotfire file is opened or the data are reloaded
• Other tables**
– Collaborator data
– PD data
– Assay queues
– Other notes
Spotfire file
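The mixed layout of the data table (pivoted assay data, but one row per in vivo PK experiment) can be sketched in Python. This is an illustration under stated assumptions, not the Pipeline Pilot protocol itself: the compound IDs and column names are hypothetical, and averaging replicate results is an assumed rule that the slide does not specify.

```python
from collections import defaultdict

def pivot(rows):
    """Long (compound, result type, value) rows -> {compound: {result type: mean}}.

    Assumption: replicate results are averaged; the source does not state the rule.
    """
    values = defaultdict(lambda: defaultdict(list))
    for compound, result_type, value in rows:
        values[compound][result_type].append(value)
    return {
        cpd: {rt: sum(v) / len(v) for rt, v in by_type.items()}
        for cpd, by_type in values.items()
    }

def build_table(assay_rows, pk_experiments):
    """One row per compound, except that a compound with in vivo PK data gets
    one row per PK experiment. Each PK row repeats the pivoted columns so the
    full compound profile is visible on every row."""
    pivoted = pivot(assay_rows)
    pk_by_cpd = defaultdict(list)
    for exp in pk_experiments:
        pk_by_cpd[exp["COMPOUND"]].append(exp)
    table = []
    for cpd in sorted(set(pivoted) | set(pk_by_cpd)):
        base = {"COMPOUND": cpd, **pivoted.get(cpd, {})}
        for exp in pk_by_cpd.get(cpd) or [{}]:  # no PK data -> a single row
            table.append({**base, **exp})
    return table

# Hypothetical example data.
assay_rows = [
    ("AMG-1", "Enzyme IC50", 0.010),
    ("AMG-1", "Enzyme IC50", 0.014),
    ("AMG-1", "HLM", 45.0),
    ("AMG-2", "Enzyme IC50", 0.20),
]
pk_experiments = [
    {"COMPOUND": "AMG-1", "Animal Species": "Rat", "CL": 1.2},
    {"COMPOUND": "AMG-1", "Animal Species": "Dog", "CL": 0.4},
]
table = build_table(assay_rows, pk_experiments)
```

Repeating the pivoted columns on every PK row is what makes the calculated-column tricks on the next slides (OVER and Rank) necessary.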
4
MODULAR PIPELINE PILOT PROTOCOL FOR PROJECT FILES
Lei Jia
[Flowchart of the modular protocol, with several inputs drawn from RG. Boxes include the assay list (name, result type, description), in vivo assays, description dictionaries, updated assay list, assay data (non-in vivo), project compounds, external compounds and chemotypes, compound matching, chemotype identification, property calculation, ADME prediction, clean-up (ordering, formatting, data type assignment), and project-specific operations, ending in the Spotfire data table. Legend: Assay, Assembly, Chemistry, Prediction, Database.]
5
MODULAR PIPELINE PILOT PROTOCOL FOR PROJECT FILES
Pipe 1: Inclusion and naming of project assays/result types
Pipe 2: Retrieval of assay data, compound registration info, and predictions; data assembly
Pipe 3: Data formatting for Spotfire
Lei Jia
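The three-pipe split can be illustrated with a minimal Python sketch in which each pipe is a plain function and the output of one feeds the next. The result-type codes, the "Notebook" registration field, and the averaging of replicates are hypothetical; the real pipes run in Pipeline Pilot, and data type assignment is left out of this sketch.

```python
from statistics import mean

# Pipe 1 input: hypothetical mapping from RG result types to project column names.
PROJECT_ASSAYS = {"ENZ_IC50_uM": "Enzyme IC50", "HU_CELL_IC50_uM": "Hu cell"}

def pipe1_select(raw_rows, assay_names):
    """Pipe 1: keep only project assays/result types and rename them."""
    return [(cpd, assay_names[rt], val) for cpd, rt, val in raw_rows if rt in assay_names]

def pipe2_assemble(rows, registration):
    """Pipe 2: combine results per compound (mean of replicates, an assumption)
    and attach registration details."""
    grouped = {}
    for cpd, name, val in rows:
        grouped.setdefault(cpd, {}).setdefault(name, []).append(val)
    return [
        {"COMPOUND": cpd, **registration.get(cpd, {}),
         **{name: mean(vals) for name, vals in results.items()}}
        for cpd, results in grouped.items()
    ]

def pipe3_format(records, columns):
    """Pipe 3: order columns and write a tab-delimited table; missing values become empty cells."""
    header = "\t".join(columns)
    body = ["\t".join("" if r.get(c) is None else str(r[c]) for c in columns) for r in records]
    return "\n".join([header, *body])

raw = [
    ("AMG-1", "ENZ_IC50_uM", 0.5),
    ("AMG-1", "ENZ_IC50_uM", 1.5),
    ("AMG-1", "SOLUBILITY", 12.0),  # not a project result type: dropped by pipe 1
    ("AMG-2", "HU_CELL_IC50_uM", 0.5),
]
registration = {"AMG-1": {"Notebook": "NB-001"}}
txt = pipe3_format(
    pipe2_assemble(pipe1_select(raw, PROJECT_ASSAYS), registration),
    ["COMPOUND", "Notebook", "Enzyme IC50", "Hu cell"],
)
```

Keeping the pipes separate means a project can change its assay list (pipe 1) without touching the retrieval or formatting logic.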
6
DEALING WITH MULTIPLE ROWS PER COMPOUND: KEY SPOTFIRE CALCULATED COLUMN FUNCTIONS
• “If” (or “case”) functions: conditional use of column values in a calculated column
– “Dose-normalized rat IV”: If (([CL] is not null) and ([Animal Species]="Rat"),[AUCinf] / Real([dose]),null)
• “Over” function: cascade values down through all the rows that share a compound ID
– “Rat CL”: Avg(If(([Animal Species]="Rat") and ([Route of Administration]="IV"),[CL],null)) OVER ([COMPOUND])
• “Rank” function: remove duplicate rows (since cross-row data is extracted into new columns)
– “Row rank for AMG ID (set = 1 to remove duplicate rows)”: Rank(RowId(),"asc",[COMPOUND])
[Screenshot legend: calculated columns vs. imported columns]
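The three calculated-column patterns can be mirrored in plain Python, which may help when checking what a Spotfire expression should return. This sketch uses hypothetical rows with the column names from the expressions above; it reproduces the logic only, not Spotfire's engine, and it skips missing values when averaging.

```python
# Hypothetical rows: several rows per compound, one per PK experiment.
rows = [
    {"COMPOUND": "AMG-1", "Animal Species": "Rat", "Route of Administration": "IV",
     "CL": 1.0, "AUCinf": 2.0, "dose": 1.0},
    {"COMPOUND": "AMG-1", "Animal Species": "Rat", "Route of Administration": "IV",
     "CL": 3.0, "AUCinf": 0.5, "dose": 0.5},
    {"COMPOUND": "AMG-1", "Animal Species": "Dog", "Route of Administration": "IV",
     "CL": 0.4, "AUCinf": 5.0, "dose": 1.0},
    {"COMPOUND": "AMG-2", "Animal Species": "Rat", "Route of Administration": "PO",
     "CL": None, "AUCinf": 4.0, "dose": 2.0},
]

# "If": row-wise conditional; None plays the role of null.
for r in rows:
    r["Dose-normalized rat IV"] = (
        r["AUCinf"] / float(r["dose"])
        if r["CL"] is not None and r["Animal Species"] == "Rat"
        else None
    )

# "Avg(...) OVER ([COMPOUND])": aggregate within each compound, copy to every row.
rat_iv_cl = {}
for r in rows:
    if r["Animal Species"] == "Rat" and r["Route of Administration"] == "IV":
        rat_iv_cl.setdefault(r["COMPOUND"], []).append(r["CL"])
for r in rows:
    vals = [v for v in rat_iv_cl.get(r["COMPOUND"], []) if v is not None]
    r["Rat CL"] = sum(vals) / len(vals) if vals else None

# "Rank(RowId(), "asc", [COMPOUND])": number rows within each compound in row order.
seen = {}
for r in rows:
    seen[r["COMPOUND"]] = seen.get(r["COMPOUND"], 0) + 1
    r["Row rank"] = seen[r["COMPOUND"]]

# Keeping rank 1 leaves one row per compound, with the cascaded values intact.
deduplicated = [r for r in rows if r["Row rank"] == 1]
```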
7
STANDING VISUALIZATIONS IN PROJECT SPOTFIRE FILES POPULATE WITH NEW DATA EVERY DAY
All Amgen medicinal chemists have access to Spotfire desktop software with Lead Discovery permissions (current version 6.5)
8
COMBINING PARAMETERS USING THE DOSE EQUATION
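[Plot: in vivo response (% inhibition) vs. plasma conc_unb / human cell IC50; “efficacy” is marked at Concavg_unb / Cell IC50 ≈ 3. Panel: Quantitative pharmacology (QP)]

$$\mathrm{Dose} = \frac{C_{avg,unb} \cdot Cl_{int,u} \cdot t}{f_a}$$

Assume t = 1 day (once-daily dosing) and $f_a = 1$*
*t = dosing interval (day/dose); $f_a$ = fraction absorbed

$$\mathrm{Dose} \approx C_{avg,unb} \cdot Cl_{int,u} \approx 3 \cdot \mathrm{Cell\ IC_{50}} \cdot Cl_{int,u} \approx 3 \cdot \mathrm{Cell\ IC_{50}} \cdot \mathrm{Hu\ hep}\ Cl_{int,u}$$

Based on the QP relationship: $C_{avg,unb} \approx 3 \cdot \mathrm{Cell\ IC_{50}}$
Based on a good IVIVC using hepatocytes: $Cl_{int,u} \approx \mathrm{Hu\ (scaled)\ hep}\ Cl_{int,u}$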
Angel Guzman-Perez
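The simplified dose equation can be turned into a quick estimate per compound. This is a sketch under assumptions that are not on the slide: cell IC50 in µM, scaled human hepatocyte Clint,u in mL/min/kg, a 70 kg body weight, and once-daily dosing (t = 24 h). The function name and defaults are hypothetical.

```python
def estimated_daily_dose_mg(cell_ic50_uM, mw_g_per_mol, hep_clint_u_mL_min_kg,
                            body_weight_kg=70.0, tau_h=24.0, fa=1.0, multiple=3.0):
    """Dose = Cavg,unb * Clint,u * t / fa, with target Cavg,unb = multiple * cell IC50.

    Assumed units: IC50 in uM, scaled hepatocyte Clint,u in mL/min/kg,
    70 kg human, tau in hours. Returns mg per dosing interval.
    """
    c_avg_unb_mg_per_L = multiple * cell_ic50_uM * mw_g_per_mol / 1000.0     # uM -> mg/L
    clint_u_L_per_h = hep_clint_u_mL_min_kg * body_weight_kg * 60.0 / 1000.0  # -> L/h
    return c_avg_unb_mg_per_L * clint_u_L_per_h * tau_h / fa

# Hypothetical compound: IC50 0.01 uM, MW 500, Clint,u 100 mL/min/kg -> about 151 mg/day.
dose = estimated_daily_dose_mg(cell_ic50_uM=0.01, mw_g_per_mol=500.0,
                               hep_clint_u_mL_min_kg=100.0)
```

Because the dose is proportional to the product of IC50 and Clint,u, halving clearance offsets a twofold loss in potency.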
9
PROPERTY-BASED IN SILICO MODELS: ESTIMATING HUMAN HEPATOCYTE DATA
Property-based in silico (PBIS) models:
• Are most effective when trained with many structurally diverse compounds
• Predict with high confidence only for compounds related to the training set
• Work best for data driven by bulk properties, and poorly for target potency
Models may be general or project-specific and are included in the project Spotfire files à la carte; modeling approach: Hua Gao et al., Drug Metab. Dispos. 2008, 2130-2135.
[Plots: Hu hepatocyte Fu (actual vs. predicted), Hu Hep Fu plotted against Predicted_project_hepatocyte_Fu; Hu Hep CL_unbound (actual vs. predicted), Hu Hep CL_unb (based on pred_hep_Fu) plotted against Predicted_project_human_Hep_CLu. Both panels show the Y = X line with 2-fold (* 2, / 2) bounds and distinguish training-set from naïve data.]
Hua Gao
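The Y = X, * 2 and / 2 lines suggest a simple summary of model quality: the share of compounds predicted within 2-fold of the measured value. A minimal sketch, assuming hypothetical Fu values; in practice it would be computed separately for the training set and the naïve data.

```python
def fold_error(actual, predicted):
    """Fold difference between prediction and measurement (always >= 1)."""
    return max(actual / predicted, predicted / actual)

def fraction_within_fold(actual, predicted, fold=2.0):
    """Share of points lying between the Y = X / fold and Y = X * fold lines."""
    pairs = list(zip(actual, predicted))
    hits = sum(1 for a, p in pairs if fold_error(a, p) <= fold)
    return hits / len(pairs)

# Hypothetical hepatocyte Fu values for a naive (non-training) set.
actual_fu = [0.10, 0.50, 0.05, 0.80]
predicted_fu = [0.15, 0.20, 0.05, 0.30]
share = fraction_within_fold(actual_fu, predicted_fu)  # 2 of 4 within 2-fold
```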
10
ROBUST PBIS MODELS CAN PROVIDE EARLY DOSE ESTIMATES FOR COMPOUND DESIGN AND RANKING
• Experimental data for Hu hep CL, hep Fu, and Hu plasma protein Fu are limited and/or take time to generate
• Different combinations of IC50_unbound and Hu hep Clint_unb can give comparable estimated doses
• Caution is needed in diverse chemical space (the models here worked well for lipophilic acids)
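For ranking, one plausible approach is to use a measured value when it exists and fall back to the PBIS prediction otherwise. This sketch assumes that rule and uses hypothetical compounds; with t and fa held constant, the relative dose is just IC50_unbound × Clint,u.

```python
def best_available(measured, predicted):
    """Prefer an experimental value; fall back to the PBIS prediction."""
    return measured if measured is not None else predicted

def relative_dose(ic50_unb, clint_u):
    """Dose is proportional to IC50_unbound * Clint,u when t and fa are held constant."""
    return ic50_unb * clint_u

# (name, IC50_unbound, measured Clint,u, predicted Clint,u): hypothetical values.
compounds = [
    ("A", 0.010, None, 200.0),
    ("B", 0.100, 15.0, 30.0),
    ("C", 0.020, None, 50.0),
]

# Lowest estimated dose first.
ranked = sorted(compounds, key=lambda c: relative_dose(c[1], best_available(c[2], c[3])))
```

Here the most potent compound (A) ranks last because its predicted clearance is high, which is the trade-off the second bullet describes.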
11
Provides a crucial foundation for a shared understanding of the data
MED CHEM COLLABORATION EXPEDITED BY SPOTFIRE INTEGRATION OF EXTERNAL PKDM DATA
On a weekly basis, our external collaborator:
• Sends an .xlsx file with all of their project PKDM data
• Receives an up-to-date Spotfire file with all combined data embedded, plus the visualizations
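The weekly merge of collaborator data can be sketched as an outer join on compound ID. This assumes the collaborator's .xlsx has already been read into one dict per compound (reading it is not shown) and uses a hypothetical "Ext " prefix so external columns never overwrite internal results.

```python
def merge_external(internal, external, prefix="Ext "):
    """Outer-join collaborator PKDM rows onto the project table by compound ID.

    Assumes one external row per compound. Collaborator columns get a prefix;
    compounds that appear only in the collaborator file are appended as new rows.
    """
    ext_by_cpd = {row["COMPOUND"]: row for row in external}

    def prefixed(ext):
        return {prefix + k: v for k, v in ext.items() if k != "COMPOUND"}

    merged = [{**row, **prefixed(ext_by_cpd.get(row["COMPOUND"], {}))} for row in internal]
    known = {row["COMPOUND"] for row in internal}
    merged += [{"COMPOUND": cpd, **prefixed(ext)}
               for cpd, ext in ext_by_cpd.items() if cpd not in known]
    return merged

internal = [{"COMPOUND": "AMG-1", "HLM": 40}, {"COMPOUND": "AMG-2", "HLM": 90}]
external = [{"COMPOUND": "AMG-2", "Rat CL": 1.1}, {"COMPOUND": "AMG-9", "Rat CL": 0.3}]
merged = merge_external(internal, external)
```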
12
ORGANIZED FILTERS IN A TEXT VISUALIZATION: HELPING SCIENTISTS FILTER TO THE COMPOUNDS THEY WANT
*For example, “WB assay candidate” combines an empty Hu WB result with limits based on six assays or calculations that are also represented in the slider filters (Enzyme IC50 < 0.05; Hu cell < 0.03; est dose fr enz < 1000; HLM < 100 or empty; Hu PXR < 40 or empty; CYP ratio < 300 or empty)
• Every column in Spotfire is available as a filter in the filter panel; it helps to mirror the key ones in a text view
• Calculated columns that combine multiple filter limits can expedite triage*
• By default, filters affect all visualizations; it is also possible to set visualization-specific filters or to filter based on marking
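The “WB assay candidate” flag from the footnote can be written as a single boolean function, which makes the “or empty” distinction explicit. A sketch, assuming None stands for an empty cell and that limits without “or empty” require a measured value; units follow the slide, which does not state them.

```python
def below(value, limit):
    """Limit that requires a measured value."""
    return value is not None and value < limit

def below_or_empty(value, limit):
    """Limit that also passes compounds without a value ("or empty")."""
    return value is None or value < limit

def wb_assay_candidate(c):
    """True when a compound meets every "WB assay candidate" limit; None = empty cell."""
    return (
        c.get("Hu WB") is None
        and below(c.get("Enzyme IC50"), 0.05)
        and below(c.get("Hu cell"), 0.03)
        and below(c.get("est dose fr enz"), 1000)
        and below_or_empty(c.get("HLM"), 100)
        and below_or_empty(c.get("Hu PXR"), 40)
        and below_or_empty(c.get("CYP ratio"), 300)
    )

# Hypothetical compound with no HLM or CYP ratio data yet.
candidate = {"Enzyme IC50": 0.01, "Hu cell": 0.02, "est dose fr enz": 500, "Hu PXR": 12}
```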
13
ColorBrewer diverging scheme “PiYG” is colorblind-friendly
THE COLOR OF MULTIPARAMETER SAR
The example is a CNS target. The color scheme for the leftmost columns is qualitatively based on the CNS MPO parameter distributions described in:
Wager et al., ACS Chem. Neurosci. 2010, 420-434; Wager et al., ACS Chem. Neurosci. 2010, 435-449; Gunaydin, ACS Med. Chem. Lett. 2016, 89-93.
ColorBrewer 2.0 “Color advice for cartography”: http://colorbrewer2.org
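One way to turn a property into a PiYG color is to score it with a desirability function and bin the score. This is a sketch, not the scheme used on the slide: the MW limits of 360 and 500 are taken from the CNS MPO desirability function of Wager et al., and the hex codes are the 5-class PiYG palette as listed on colorbrewer2.org (worth verifying there before use).

```python
# ColorBrewer PiYG, 5 classes, pink (less desirable) to green (more desirable).
PIYG_5 = ["#d01c8b", "#f1b6da", "#f7f7f7", "#b8e186", "#4dac26"]

def desirability(value, best_below, worst_above):
    """Monotonic decreasing desirability: 1 at or below best_below, 0 at or
    above worst_above, linear in between."""
    if value <= best_below:
        return 1.0
    if value >= worst_above:
        return 0.0
    return (worst_above - value) / (worst_above - best_below)

def piyg_color(score):
    """Bin a 0-1 desirability score into one of the five PiYG classes."""
    index = min(int(score * len(PIYG_5)), len(PIYG_5) - 1)
    return PIYG_5[index]

# MW example with the CNS MPO limits (assumption): 360 -> green, 500 -> pink.
mw_color = piyg_color(desirability(430, best_below=360, worst_above=500))  # neutral
```

A diverging palette fits this use because the midpoint (neutral gray) marks “acceptable but not ideal,” while the two hues separate good from bad without relying on red/green.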
14
PAIRWISE ANALYSIS: COMPREHENSIVE SAR OF ONE CHANGE
*The compound list can include thousands of compounds unrelated to the .rxn query (a mapped ChemDraw reaction saved in ISIS/Draw format), for example, all compounds assigned to a project.
Hua Gao
1. The compound list and .rxn file are inputs for the Pairwise Webport tool*
2. A .csv file is e-mailed to the user; it contains columns for compound ID, pair assignment, and class
3. The user imports the columns into the Spotfire file
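Once the pair and class columns are in Spotfire, the SAR of one change reduces to per-pair differences. A sketch, assuming the class column labels each compound as the “reactant” or “product” side of the .rxn transformation; these labels, the pIC50 column, and the values are hypothetical.

```python
from statistics import mean

# Hypothetical Pairwise Webport rows joined to project data.
pair_rows = [
    {"COMPOUND": "AMG-1", "pair": 1, "class": "reactant", "pIC50": 7.1},
    {"COMPOUND": "AMG-2", "pair": 1, "class": "product", "pIC50": 7.6},
    {"COMPOUND": "AMG-3", "pair": 2, "class": "reactant", "pIC50": 6.0},
    {"COMPOUND": "AMG-4", "pair": 2, "class": "product", "pIC50": 6.3},
]

def pair_deltas(rows, prop):
    """Product minus reactant value for every pair that has both sides."""
    sides = {}
    for r in rows:
        sides.setdefault(r["pair"], {})[r["class"]] = r[prop]
    return {
        pair: s["product"] - s["reactant"]
        for pair, s in sides.items()
        if "product" in s and "reactant" in s
    }

deltas = pair_deltas(pair_rows, "pIC50")
average_shift = mean(deltas.values())  # typical effect of the transformation
```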
15
SPOTFIRE TABLE VIEW WITH CUSTOM CONCATENATED COLUMNS FOR SWIFT SLIDE GENERATION
TRPA1 program SAR: Schenkel et al., J. Med. Chem. 2016, 2794–2809
Two-minute compound table slides:
1. Filter to the compounds of interest and mark the table cells (Ctrl-A).
2. Right-click, copy, and paste into Excel.
3. Bring in structures from the database (Isentris for Excel).
4. Copy the Excel table, transpose it into a new sheet, and paste the cells into a .ppt template.
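The two ingredients of this workflow, a concatenated summary column and a transposed table, can be sketched in Python. The column names and the “name: value” layout are hypothetical; in Spotfire the concatenation would be a calculated column, and the transpose happens in Excel.

```python
def concatenated_label(row, fields):
    """Join selected columns into one multi-line cell, skipping empty values."""
    return "\n".join(f"{name}: {row[name]}" for name in fields if row.get(name) is not None)

def transpose(table):
    """Swap rows and columns so that compounds become columns."""
    return [list(col) for col in zip(*table)]

rows = [
    {"COMPOUND": "AMG-1", "Enzyme IC50": 0.01, "HLM": 45},
    {"COMPOUND": "AMG-2", "Enzyme IC50": 0.2, "HLM": None},
]
table = [[r["COMPOUND"], concatenated_label(r, ["Enzyme IC50", "HLM"])] for r in rows]
slide_table = transpose(table)  # row 1: compound IDs; row 2: data labels
```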
16
Big picture and the ability to zoom in on specific compounds
TRACKING SERIES RESULTS VS. NUMBER OF COMPOUNDS
Syntax for calculated column “Cmpd submission order within series”: DenseRank([COMPOUND],"asc",[series1])
• Desired compounds fall below the line in the top graph and above the line in the bottom graph
• Marked compounds appear orange in both graphs
• Shape formatting provides additional information on the counterassay result
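The DenseRank expression can be mirrored in Python to see why it suits a table with several rows per compound: duplicate rows of one compound share a rank, and ranks have no gaps. A sketch, assuming compound numbers are assigned sequentially so that ascending ID order equals submission order; the IDs and series names are hypothetical.

```python
def dense_rank_within(rows, key, partition):
    """Mirror of DenseRank([key], "asc", [partition]): equal keys share a rank,
    ranks have no gaps, and numbering restarts at 1 in each partition."""
    distinct = {}
    for r in rows:
        distinct.setdefault(r[partition], set()).add(r[key])
    ranks = {
        part: {k: i for i, k in enumerate(sorted(keys), start=1)}
        for part, keys in distinct.items()
    }
    return [ranks[r[partition]][r[key]] for r in rows]

rows = [
    {"COMPOUND": 1010, "series1": "Series A"},
    {"COMPOUND": 1012, "series1": "Series A"},
    {"COMPOUND": 1010, "series1": "Series A"},  # second PK row for the same compound
    {"COMPOUND": 1011, "series1": "Series B"},
    {"COMPOUND": 1015, "series1": "Series A"},
]
order = dense_rank_within(rows, "COMPOUND", "series1")
```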
17
SAR MATRICES USING SPOTFIRE LEAD DISCOVERY R-GROUP DECONSTRUCTION: POTENTIAL TO BECOME VERY USEFUL
• Current version at Amgen is Lead Discovery 6.5
• Version 7.0 has a solution for the non-canonical SMILES string issue that may work in most cases. Structures in trellis headers are limited by the lack of a TIBCO API; they can be built in JavaScript as a custom visualization (via Josh Bishop, PerkinElmer)
[Figure: six-column plot in which low values are always good]
• Can show ChemDraw structures on the cross-table axes of an SAR matrix
– Not great for multiparameter SAR
– R-groups are not captured as unique SMILES strings
• Can trellis by R-group substituents to create a multiparameter SAR matrix
– No current option to render trellis header SMILES as chemical structures
18
INSTANT ACCESS TO ALL DATA AND TOOLS FOR EVERYONE
http://www.sas.com/en_us/insights/articles/analytics/how-to-find-and-equip-citizen-data-scientists.html?utm_source=TWITTER&utm_medium=social&utm_campaign=Analytics&postid=374238923
“You democratize analytics when you give people access to data and the tools to work
with it to transform the discovery process. With more people actively looking for new
answers, discovery becomes more widespread in the organization and a bigger part of
the mindset. It is practiced by people in all roles at all levels…
“Citizen data scientists also place new and different demands on the IT organization.
They want more data, including more unfiltered data…IT must recognize and cultivate
this new class of power user…
“Business leaders should embrace the democratization of analytics. It’s happening, it’s
going to be pervasive, and it’s good. But it’s not something that you’re going to
control. So don’t try the top-down approach.”
- Bernard Blais, Senior Manager, SAS Global Technology Practice
19
• Lei Jia
• Hua Gao
• Yax Sun
• Angel Guzman-Perez
• Margaret Chu-Moyer
• Data enthusiasts in med chem, molecular engineering, PKDM, and therapeutic areas
ACKNOWLEDGEMENTS
20
EXTRAS
21
THE DOSE EQUATION -- ASSUMING HEPATIC CLEARANCE AS THE ROUTE OF ELIMINATION
$$\mathrm{Dose} = \frac{C_{avg,unb} \cdot Cl_{b,u} \cdot t}{F} = \frac{C_{avg,unb} \cdot Cl_{int,u} \cdot (1 - Cl_b/Q_h) \cdot t}{f_a \cdot (1 - Cl_b/Q_h)} = \frac{C_{avg,unb} \cdot Cl_{int,u} \cdot t}{f_a}$$
• Where:
– Cavg,unb = free (unbound) average blood concentration
– Clb,u = free (unbound) blood clearance
– t = dosing interval (day/dose)
– F = oral bioavailability
• Since:
– F = fa • (1 – Clb/Qh)
– Clb,u = Clint,u • (1 – Clb/Qh)
• Where:
– fa = fraction of dose absorbed
– Clb = total blood clearance
– Qh = hepatic blood flow
– Clint,u = free (unbound) intrinsic hepatic clearance
Angel Guzman-Perez
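A quick numerical check shows that the (1 – Clb/Qh) terms cancel, so the blood-clearance and intrinsic-clearance forms of the dose equation agree. The input values are hypothetical and only need consistent units; the check verifies the algebra, not a physiological model.

```python
def dose_from_blood_clearance(c_avg_unb, clb_u, tau, f):
    """Dose = Cavg,unb * Clb,u * t / F."""
    return c_avg_unb * clb_u * tau / f

def dose_from_intrinsic_clearance(c_avg_unb, clint_u, tau, fa):
    """Dose = Cavg,unb * Clint,u * t / fa."""
    return c_avg_unb * clint_u * tau / fa

# Hypothetical inputs in consistent units.
c_avg_unb, tau = 0.1, 1.0
clint_u, clb, qh, fa = 50.0, 6.0, 20.0, 0.8

hepatic_term = 1.0 - clb / qh        # (1 - Clb/Qh)
clb_u = clint_u * hepatic_term       # Clb,u = Clint,u * (1 - Clb/Qh)
f = fa * hepatic_term                # F = fa * (1 - Clb/Qh)

d_blood = dose_from_blood_clearance(c_avg_unb, clb_u, tau, f)
d_intrinsic = dose_from_intrinsic_clearance(c_avg_unb, clint_u, tau, fa)  # both 6.25
```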