U.S. EPA’s ToxCast, Tox21, and COSMOS Projects

Office of Research and Development U.S. EPA’s ToxCast, Tox21, and COSMOS Projects: Cheminformatics Approaches to Creating Data Linkages and Synergies Ann Richard U.S. EPA, National Center for Computational Toxicology Office of Research and Development The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA Society of Toxicology, Phoenix, AZ, Mar 24-27, 2014

Page 1

Office of Research and Development

U.S. EPA’s ToxCast, Tox21, and COSMOS Projects: Cheminformatics Approaches to Creating Data Linkages and Synergies

Ann Richard U.S. EPA, National Center for Computational Toxicology Office of Research and Development

The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA

Society of Toxicology, Phoenix, AZ, Mar 24-27, 2014

Page 2

ToxCast & Tox21: Chemicals, Data and Release Timelines

Set                 Chemicals   Assays   Endpoints   Completion      Available
ToxCast Phase I     293         ~600     ~700        2011            Now
ToxCast Phase II    767         ~600     ~700        03/2013         Now
ToxCast E1K         800         ~50      ~120        03/2013         Now
Tox21               ~8300       ~80      ~150        Ongoing         Ongoing
ToxCast Phase III   ~900        ~100     ~100        Just starting   2014-2015

Chemical categories: pesticides, antimicrobials, food additives, green alternatives, HPV, MPV, endocrine reference cmpds, tox reference cmpds, NTP in vivo, FDA GRAS, FDA PAFA, EDSP, water contaminants, exposure data, industrial, failed drugs, marketed drugs, fragrances, flame retardants, etc.

[Figure: chemicals-by-assays coverage diagram; axes "Chemicals" and "Assays"; labels ~600 and ~9000]

Page 3

ToxCast PhII Data Release: http://www.epa.gov/ncct/toxcast/data.html

• ToxCast Assay Summary Activity Files

• ToxCast Assay Annotation Files

• ToxCast Chemical Library & Structure Files (DSSTox)

• ToxCast Concentration Response Data Files

• ToxRefDB Effect & Endpoint Data Files

Page 4

ToxCast PhI&PhII 1060: # Compounds per Inventory

[Bar chart: number of compounds per inventory. Inventory labels, in extracted order: PesticideInerts, Water, Consumer, Antimicrobials, Green Chemistry, HPV, MPV, TRI, IRIS, EDSP, GRAS, AIR. Bar values, in extracted order: 243, 217, 210, 91, 85, 232, 83, 216, 240, 130, 26, 90.]

[Bar chart: overlap with in vivo inventories. Labels, in extracted order: Total In vivo, FDA CFSAN, NTP In Vivo, Donated Pharmaceuticals, PesticideActives. Bar values, in extracted order: 580, 94, 202, 135, 329.]

Excellent coverage of multiple high-interest inventories

Many chemicals appear on many lists

Broad diversity of chemical-use categories

Large overlap with data-rich in vivo inventories

Page 5

Synergies: Tox21/ToxCast Chemical Overlaps with COSMOS

[Venn diagram. Sets: Tox21, ~8300 (unique structures); COSMOS, ~5500 (unique structures); ToxRef DB, in vivo animal studies; ToxCast: PhI (300), PhII (1060), E1k & PhIII (2300). Overlap counts shown: 714, 936, 166, 1478.]

Significant CASRN overlap: increased shared data & knowledge resources for these chemicals

What about non-overlapping chemicals?

How do we utilize the full chemical-data landscape?
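The overlap counts on this slide come from matching inventories on CASRN. A minimal sketch of that kind of comparison is shown here, assuming each inventory is available as a plain set of CASRN strings. The two short lists are made up for illustration and do not reflect actual Tox21 or COSMOS membership; `overlap_summary` is a hypothetical helper name.

```python
# Illustrative only: these CASRN sets are invented for the example and are
# NOT the real Tox21 or COSMOS inventories.
inventory_a = {"80-05-7", "50-00-0", "71-43-2", "58-08-2"}
inventory_b = {"80-05-7", "58-08-2", "64-17-5"}


def overlap_summary(a, b):
    """Split two identifier sets into shared and inventory-specific parts."""
    return {
        "shared": sorted(a & b),   # chemicals with data in both inventories
        "only_a": sorted(a - b),   # non-overlapping chemicals in the first set
        "only_b": sorted(b - a),   # non-overlapping chemicals in the second set
    }


summary = overlap_summary(inventory_a, inventory_b)
for key, casrns in summary.items():
    print(key, len(casrns), casrns)
```

Exact-string matching like this only finds chemicals that share an identifier; the non-overlapping remainder is where structure-based comparison becomes necessary.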

Page 6

Chemical Elements to Data Integration:

Chemical representations and uses

Structure (SMILES, InChI): Structure searching & modeling

Generic Substance (Chemical Name, CASRN): Public toxicity datasets

Test Sample (Supplier, Lot/Batch, physical description): Experimental Endpoint Data

Features, Properties (Chemotypes, fingerprints, phys-chem properties, ...): Chemical analogs, Read-across, SAR modeling
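Because CASRN is the identifier that links generic substances to public toxicity datasets, a quick format and check-digit test can catch transcription errors before records are joined. The sketch below uses the standard CAS check-digit rule (the final digit equals the position-weighted sum of the preceding digits, counted from the right, modulo 10). The function name `is_valid_casrn` is a hypothetical helper, not part of any tool named in the slides.

```python
import re


def is_valid_casrn(casrn: str) -> bool:
    """Return True if the string has CASRN format and a correct check digit."""
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", casrn)
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight digits 1, 2, 3, ... starting from the rightmost body digit.
    total = sum(pos * int(d) for pos, d in enumerate(reversed(body), start=1))
    return total % 10 == check_digit


print(is_valid_casrn("80-05-7"))   # bisphenol A
print(is_valid_casrn("80-05-8"))   # same digits with a wrong check digit
```

A passing check digit only shows the number is well formed; it does not confirm that the CASRN is assigned to the intended substance.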

Page 7

ToxCast & Tox21 Property Space

[Figure: 3D scatter plot of chemical property space; axes VAR(1), VAR(2), VAR(3). Legend: Tox21 (7324 unique), ToxCast e1k (+800), ToxCast PhaseII (767), Donated Pharma (135), ToxCast PhaseI (293).]

LOG P = Octanol/Water partition coefficient

TPSA = log (Total Polar Surface Area)

Complexity = log (complexity based on paths, branching, atoms)

Chemical properties computed using ADRIANA.Code software by Molecular Networks (P. Volarath)

Page 8

Estimating Toxicity Mechanism Coverage: DEREK (LHASA) Predictions for ToxCast PhII (1060)

[Bar chart of DEREK alert counts; horizontal axis 0 to 90. Alerts shown:]

Halogenated benzene

Polyhalogenated aromatic

Alkylating agent

Phenol or precursor

Organophosphorus ester

Alkyl ester of phosphoric or phosphonic acid

Substituted pyrimidine or purine

Aromatic primary or secondary amine

1,2-Dihalogenated hydrocarbon

beta-O/S-Substituted carboxylic acid or…

Polyhalogenated benzene

Alkyl aldehyde or precursor

Alkylphenol

Hydrazine or precursor

Simple aniline or precursor

Di- to poly-halogenated alkane or cycloalkane

HERG Pharmacophore I

Organophosphorus di- or tri-ester

1,2-Ethyleneglycol or derivative

Aromatic nitro compound

328/450 unique DEREK alerts fired across entire dataset

128 alerts fired 5 or more times across dataset

DEREK predicts 1 or more toxicity endpoints for 80% of chemicals

DEREK predicts 3 or more endpoints for 40% of chemicals

Page 9

Chemistry: What’s needed?

Incorporate chemical information into usable tools for chemical prioritization & safety assessments

Publicly available data & computational tools & resources for chemists, toxicologists & modelers to access & utilize chemical information

Harvesting of existing chemical activity (in vitro, in vivo) data into databases & computational forms

Integration of available data resources (HTS, in vivo)

Cheminformatics foundation to enable structure modeling

Ability to “look across” data (HTS, in vivo, chemical) to form hypotheses, guide analog selection, and improve prediction models

Data!

Public availability

Transparency

Tools

Usability

Page 10

Public Resources: EPA ToxCast On-line Resources

>300K structures >16K structures

Data Integration Chemicals

HTS assay results

In vivo data

Product categories

Analysis tools

iCSS Dashboard

>2K structures

Page 11

Public Resources: Tox21 Chemical & Bioassay Data

DSSTox: TOX21S structures

Tox21 assays x ToxCast cmpds

PubChem: Tox21, 88 bioassays, 9762 compounds

Page 12

Public Resources: COSMOS DB v1.0

http://www.cosmostox.eu/

• >12K toxicity studies across 27 endpoints for more than 1,600 compounds

• US FDA PAFA content donated by US FDA Office of Food Additive Safety (OFAS) and oRepeatToxDB compiled by COSMOS Consortium

• Endpoints include both repeat-dose toxicity studies and genetic toxicity data

• Toxicity data searchable by endpoint, test system, route of exposure, sites, or other details

• >80K records, 40K unique structures

• Searchable by name, CAS, structure, structure-similarity

Page 13

Public Resources: KNIME Chemistry Data Analytics

https://www.knime.org/

• Workflows can be freely published & shared

  – reproducible & transparent

  – promotes quality standards

• Scripting for “non-programmers”

• Being used to improve quality of structures in ACToR and efficiency of DSSTox curation

• KNIME chemotyper implemented in multiple COSMOS projects & workflows

KNIME Workflow developed by Kamel Mansouri, ORISE PostDoc, NCCT

[Workflow diagram labels: Structure processing (SMILES, InChI); Chemical properties; Fingerprinting; Structure similarity; Statistics; Visualization]
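The fingerprinting and structure-similarity steps named in the workflow can be illustrated with the Tanimoto coefficient, a widely used similarity measure for binary fingerprints. This is a minimal sketch in plain Python, not the KNIME workflow itself; the slides do not state which similarity measure the workflow uses, and the two short bit vectors are invented for the example.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two equal-length 0/1 fingerprints.

    Defined as (# bits set in both) / (# bits set in either).
    Returns 0.0 when neither fingerprint has any bit set.
    """
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for a, b in zip(fp_a, fp_b) if a and b)
    either = sum(1 for a, b in zip(fp_a, fp_b) if a or b)
    return both / either if either else 0.0


# Hypothetical 4-bit fingerprints, for illustration only.
chem_1 = [1, 1, 0, 1]
chem_2 = [1, 0, 0, 1]
print(tanimoto(chem_1, chem_2))
```

Here the two vectors share two set bits out of three set in either, so the similarity is 2/3.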

Page 14

Public Resources: Chemotyper & ToxPrint Chemotypes

Developed by Altamira & Molecular Networks, Funded by US FDA

Chemotyper allows visualization of chemotypes in an imported structure inventory (e.g., ToxCast)

Chemotyper “fingerprint” files generated for ToxCast & Tox21 inventories

ToxPrint feature set designed to capture important structural frameworks, fragments and elements spanning inventories of toxicological & regulatory interest to EPA, FDA.

Page 15

ToxCast ToxPrint Chemotype “Fingerprints”

Fingerprint table columns, left to right (DSSTox_GSID followed by 52 ToxPrint chemotypes):

DSSTox_GSID
bond:C=O_carbonyl_generic
chain:aromaticAlkane_Ph-C1_acyclic_generic
bond:COH_alcohol_generic
bond:NC=O_aminocarbonyl_generic
bond:C(=O)N_carboxamide_generic
ring:hetero_[6]_Z_generic
bond:CX_halide_aromatic-X_generic
bond:COH_alcohol_aliphatic_generic
bond:CN_amine_aliphatic_generic
bond:CN_amine_aromatic_generic
bond:C(=O)O_carboxylicAcid_generic
bond:CX_halide_alkyl-X_generic
chain:alkeneCyclic_ethene_generic
chain:alkeneLinear_mono-ene_ethylene_generic
bond:S=O_sulfonyl_generic
ring:hetero_[6]_N_pyridine_generic
bond:CN_amine_ter-N_generic
bond:CC(=O)C_ketone_generic
ring:hetero_[5_6]_Z_generic
bond:CN_amine_pri-NH2_generic
bond:CX_halide_generic-X_dihalo_(1_2-)
bond:CN_amine_sec-NH_generic
ring:hetero_[6_6]_Z_generic
bond:CC(=O)C_ketone_aliphatic_generic
bond:CN_amine_alicyclic_generic
bond:C=O_carbonyl_ab-unsaturated_generic
bond:S~N_generic
bond:S(=O)O_sulfonicAcid_generic
bond:CC(=O)C_ketone_alkene_cyclic_2-en-1-one_generic
ring:hetero_[6]_O_pyran_generic
ring:hetero_[6]_N_diazine_(1_3-)_generic
bond:CC(=O)C_ketone_alkene_generic
bond:NC=O_urea_generic
ring:hetero_[5]_N_pyrrole_generic
bond:C#N_nitrile_generic
ring:hetero_[6]_N_triazine_generic
chain:aromaticAlkene_Ph-C2_acyclic_generic
bond:CX_halide_alkenyl-X_generic
group:aminoAcid_aminoAcid_generic
bond:CX_halide_alkyl-X_ethyl_generic
bond:P~S_generic
bond:C=O_aldehyde_generic
ring:fused_steroid_generic_[5_6_6_6]
bond:N=N_azo_generic
bond:metal_metalloid_Si_generic
bond:quatN_generic
bond:C=S_carbonyl_thio_generic
bond:S(=O)O_sulfuricAcid_generic
bond:C=N_carboxamidine_generic
ring:hetero_[7]_generic_1-Z
chain:alkyne_ethyne_generic
ring:hetero_[3]_Z_generic

Rows (one chemical per row, 1 = chemotype present, 0 = absent):

20197 1 0 1 1 1 1 1 1 1 0 0 0 1 0 1 1 1 0 1 0 0 0 1 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0

47368 1 1 0 1 1 1 0 0 1 0 0 0 1 0 1 1 1 0 1 0 0 0 1 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0

47271 1 1 0 1 1 1 0 0 1 1 0 0 0 0 1 1 1 0 1 0 0 0 1 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

47305 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

47346 1 0 0 1 1 1 0 0 1 1 0 1 0 0 0 1 1 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0

47375 1 1 0 0 0 1 1 0 0 1 1 0 1 0 0 1 1 1 0 0 0 1 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

21244 1 1 1 1 1 1 0 1 1 0 0 0 1 0 0 0 1 1 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

22519 1 1 0 1 1 1 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0

24107 1 1 0 0 0 1 0 0 1 0 0 1 1 0 0 1 1 1 0 0 0 0 0 1 1 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

47254 1 1 0 1 1 1 0 0 0 1 0 0 1 0 0 1 0 1 0 0 0 1 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

47289 1 0 0 1 1 1 0 0 1 1 0 0 0 0 1 1 1 0 1 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0

47311 1 1 0 1 1 1 1 0 1 0 0 0 0 0 1 1 1 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

47355 1 0 0 1 1 1 1 0 1 1 0 0 1 0 0 1 1 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0

48507 1 0 1 1 1 1 1 1 1 1 0 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

48511 1 1 0 1 1 1 0 0 1 1 0 1 0 0 0 1 1 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

20822 1 1 0 1 1 1 0 0 1 1 1 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0

21097 1 0 1 1 1 0 0 1 1 0 0 0 1 0 0 0 1 1 0 0 0 0 0 1 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

21233 1 1 0 0 0 1 1 0 0 0 1 0 1 0 0 0 0 1 0 0 0 0 1 1 0 0 0 0 1 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0

21777 1 1 1 1 1 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0

22588 1 1 0 1 1 1 0 0 1 1 1 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0

23322 1 0 1 0 0 0 0 1 1 1 0 0 1 0 0 0 1 1 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0

23412 1 0 1 1 1 0 0 1 1 0 0 0 1 0 0 0 1 1 0 0 0 0 0 1 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

23645 1 0 1 1 1 0 0 1 1 0 0 0 1 0 0 0 1 1 0 0 0 0 0 1 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

25234 1 1 0 0 0 1 1 0 0 0 1 0 1 0 0 0 0 1 0 0 0 0 1 1 0 0 0 0 1 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0

34260 1 0 1 1 1 0 0 1 1 0 0 0 1 0 0 0 1 1 0 0 0 0 0 1 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

47316 1 1 1 1 1 1 0 1 1 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0

47325 1 1 0 1 1 0 1 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0

47339 1 1 0 1 1 1 1 0 1 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

47347 1 1 0 1 1 1 0 0 1 1 0 0 0 0 0 0 1 1 0 0 0 1 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

47351 1 1 0 1 1 1 0 0 1 0 0 0 0 0 1 1 1 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0

“toxprint_v2_vs_TOX21S_v4a_8599_03Dec2013.csv”

Excellent Coverage (# chemicals with chemotypes):

Tox21: 8599 chemicals x 729 chemotypes

all 8454 structures have ≥ 1 chemotype

95% have ≥ 5 chemotypes each

65% have ≥ 10 chemotypes each

Diversity (# chemotypes present):

ToxCast (1860): 500/729 (68%)

Tox21 (8599): 627/729 (86%)
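Coverage and diversity figures like those above can be derived directly from a chemicals-by-chemotypes fingerprint table. The following sketch assumes a comma-separated file whose first column is the chemical ID and whose remaining columns are 0/1 chemotype flags; the actual layout of the released fingerprint file is not shown in the slides, so treat the format, the tiny `SAMPLE` table, and the `summarize` helper as assumptions for illustration.

```python
import csv
import io

# Hypothetical miniature fingerprint table (4 chemicals x 4 chemotypes).
SAMPLE = """DSSTox_GSID,bond:C=O_carbonyl_generic,bond:COH_alcohol_generic,bond:N=N_azo_generic,chain:alkyne_ethyne_generic
1001,1,0,0,0
1002,1,1,0,0
1003,0,0,0,0
1004,1,1,1,0
"""


def summarize(csv_text, thresholds=(1, 2)):
    """Return (coverage by threshold, # chemotypes present, # chemotypes total)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    chemotypes = header[1:]
    rows = [[int(v) for v in row[1:]] for row in reader if row]
    if not rows:
        raise ValueError("no chemical rows found")
    # Coverage: fraction of chemicals carrying at least t chemotypes.
    per_chemical = [sum(r) for r in rows]
    coverage = {
        t: sum(1 for count in per_chemical if count >= t) / len(rows)
        for t in thresholds
    }
    # Diversity: number of chemotype columns set for at least one chemical.
    present = sum(1 for i in range(len(chemotypes)) if any(r[i] for r in rows))
    return coverage, present, len(chemotypes)


coverage, present, total = summarize(SAMPLE)
print(coverage, f"{present}/{total} chemotypes present")
```

With the real file, thresholds of 1, 5, and 10 would correspond to the coverage lines on the slide.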

Page 16

EPA ToxCast iCSS Dashboard: http://actor.epa.gov/dashboard/

Filter 1892 ToxCast chemicals by ToxPrint_Chemotype

Filter by ToxPrint_Chemotype: bond.C..O.N_carboxamide_.NH2

Export all ToxCast Assay data

ToxPrint Chemotype

Chemical Use Category, phys-chem properties, assay hits…

Page 17

• Refine or expand chemotype subgroup of interest

• Are there HTS assay hits enriched within this chemotype subgroup?
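One common way to ask whether assay hits are enriched within a chemotype subgroup is a one-sided Fisher's exact test, computed as a hypergeometric tail probability. The slides do not say which statistic was used, so this is only a sketch of one reasonable choice; the function name and the counts in the usage line are hypothetical.

```python
from math import comb


def enrichment_p_value(hits_in_group, group_size, hits_total, n_total):
    """One-sided Fisher's exact test for over-representation.

    Probability of seeing at least `hits_in_group` assay hits in a subgroup
    of `group_size` chemicals drawn at random from `n_total` chemicals, of
    which `hits_total` are hits (hypergeometric upper tail).
    """
    max_hits = min(group_size, hits_total)
    denominator = comb(n_total, group_size)
    tail = sum(
        comb(hits_total, k) * comb(n_total - hits_total, group_size - k)
        for k in range(hits_in_group, max_hits + 1)
    )
    return tail / denominator


# Hypothetical counts: 12 hits among 40 subgroup chemicals, with 60 hits
# among 1000 chemicals overall.
p = enrichment_p_value(hits_in_group=12, group_size=40, hits_total=60, n_total=1000)
print(f"one-sided p-value: {p:.3g}")
```

When many chemotypes and assays are screened this way, the resulting p-values would normally need a multiple-testing correction.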

Page 18

e.g. ToxCast (1860) “Bisphenol A” chemotype search

Can export list of chemotypes for selected chemicals

Can export structures containing chemotypes

Use in iCSS Dashboard to explore ToxCast HTS results

Page 19

Synergies: Tox21/ToxCast Chemical Overlaps with COSMOS

[Venn diagram. Sets: Tox21, ~8300 (unique structures); COSMOS, ~5500 (unique structures); ToxRef DB, in vivo animal studies; ToxCast: PhI (300), PhII (1060), E1k & PhIII (2300). Overlap counts shown: 714, 936, 166, 1478.]

Significant overlap: increased shared data & knowledge resources per overlapping chemical

What about non-overlapping chemicals?

How do we utilize the full chemical-data landscape?

Page 20

ToxPrint Profiling: Synergies across inventories

How representative is the Tox21-COSMOS overlap (1478) of the remainder of COSMOS structures (5540-1478)?

[Bar chart: ToxPrint chemotype counts (axis 0 to 2000) and proportional percentages (axis 0% to 100%) per chemotype. Series: COSMOS-only (5540-1478), Tox21 Overlap (1478). Chemotypes shown:]

bond:C(=O)O_carboxylicEster_aromatic
bond:C=O_aldehyde_generic
bond:CC(=O)C_ketone_generic
bond:CN_amine_pri-NH2_aromatic
bond:CN_amine_pri-NH2_generic
bond:CN_amine_sec-NH_generic
bond:CN_amine_ter-N_generic
bond:COC_ether_aliphatic
bond:COC_ether_aliphatic__aromatic
bond:COC_ether_alkenyl
bond:COC_ether_aromatic
bond:COH_alcohol_aromatic_phenol
bond:COH_alcohol_diol_(1_1-),(1_2-),(1_3-)
bond:COH_alcohol_generic
bond:CS_sulfide
bond:CX_halide_alkyl-X_generic
bond:CX_halide_aromatic-X_generic
bond:CX_halide_generic-X_dihalo_(1_2-)
bond:N=N_azo_generic
bond:NC=O_urea_generic
bond:quatN_alkyl_acyclic
bond:S(=O)O_sulfonate
bond:S(=O)O_sulfonicEster_acyclic_(S-C(ring))
bond:metal_metalloid_Si_generic
bond:metal_metalloid_Si_organo
chain:alkaneLinear_octyl_C8
chain:alkaneLinear_decyl_C10
chain:alkaneLinear_dodedyl_C12
chain:alkaneLinear_tetradecyl_C14
chain:alkaneLinear_hexadecyl_C16
chain:alkaneLinear_stearyl_C18
ring:hetero_[5]_Z_1_2-Z, 2_3-Z,2_4_1_3_4-Z
ring:hetero_[5]_Z_1_3-Z
ring:hetero_[6]_N_pyridine_generic
ring:hetero_[6]_O_pyran_generic

Reinforcement of major COSMOS ToxPrint chemotypes.

Page 21

ToxPrint Profiling: Synergies across inventories

In what areas of chemotype space can ToxCast & Tox21 chemical-assay data inform COSMOS?

[Bar chart: chemotype count (axis 0 to 2000) by ToxPrint chemotype category*. Series: ToxCast not in COSMOS, Tox21 not in COSMOS, COSMOS_5540.]

* 60 ToxPrint chemotypes mapped to 20 categories

Page 22

ToxPrint Profiling: Synergies across inventories

Chemotype distribution across chemicals with ToxRefDB Developmental study (all species) compared to COSMOS & Tox21

[Bar chart: Log (Chemical count), axis 0.0 to 3.5, per chemotype. Legend: COSMOS (55…; Tox21 (w/o ToxRef); ToxRef DEV (all species). Chemotypes shown:]

bond:C(=O)N_carbamate
bond:CC(=O)C_ketone_generic
bond:CN_amine_sec-NH_generic
bond:COC_ether_aliphatic__aromatic
bond:COH_alcohol_aromatic_phenol
bond:COH_alcohol_generic
bond:CX_halide_alkyl-X_generic
bond:N=N_azo_generic
bond:quatN_alkyl_acyclic
bond:metal_group_III_other_Sn_organo
chain:alkaneLinear_octyl_C8
chain:alkaneLinear_tetradecyl_C14
chain:alkeneLinear_diene_linoleic_(C18)
ring:hetero_[5]_Z_1_2_3-Z
ring:hetero_[5]_Z_1_3-Z

Page 23

ToxPrint Profiling: e.g. Modeling in vivo activity subsets

ToxCast Phase I (291 total) Rat Carcinogenicity Study using ToxRefDB & Meteor:Derek workflow, Volarath et al.

[Bar chart: ToxPrint chemotype* counts (axis 0 to 14) for two subsets:]

S1: Metabolically Activated (134 cmpds)

S2: Direct acting & inactives (157 cmpds)

Propose use of S1 feature set to predict chemical space in PhII & Tox21 more likely to require metabolic activation for Rat Carcinogenicity

*Altamira beta version of ToxPrint Chemotypes

Page 24

ToxPrint Profiling: e.g. Modeling in vitro to in vivo endpoint

ToxCast Phase I (291 total) Rat Carcinogenicity Study using ToxRefDB & Meteor:Derek workflow, Volarath et al.

ToxCast Phase I Assay          Assay Hits in S1   Assay Hits in S2   Fraction of total Hits in S1
BSK_SM3C_MCP1_up               16                 7                  0.7
BSK_hDFCGF_IL8_up              14                 4                  0.78
BSK_BE3C_MIG_up                6                  1                  0.86
ATG_PPARa_TRANS                5                  2                  0.71
CLM_Hepat_LysosomalMass_1hr    5                  2                  0.71
CLM_Hepat_LysosomalMass_48hr   5                  2                  0.71
CLM_NuclearSize_24hr           5                  2                  0.71
NVS_NR_hPR                     5                  1                  0.83

[Bar chart: assay hits per assay (axis 0 to 16). Series: # Chemicals in 159-Dataset, # Chemicals in 134-Dataset.]

Subset of ToxCast assays that differentiate metabolically activated RatCarc chemicals (S1) from the remainder (S2)

HTS activity profile sensitive to chemical features!
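The "Fraction of total Hits in S1" column in the assay table on this slide is simply the S1 hit count divided by the combined S1 and S2 hit counts. A short sketch reproducing that arithmetic from the tabulated counts is given here; the dictionary and function names are chosen for illustration.

```python
# (hits in S1, hits in S2) per assay, copied from the slide's table.
ASSAY_HITS = {
    "BSK_SM3C_MCP1_up": (16, 7),
    "BSK_hDFCGF_IL8_up": (14, 4),
    "BSK_BE3C_MIG_up": (6, 1),
    "ATG_PPARa_TRANS": (5, 2),
    "CLM_Hepat_LysosomalMass_1hr": (5, 2),
    "CLM_Hepat_LysosomalMass_48hr": (5, 2),
    "CLM_NuclearSize_24hr": (5, 2),
    "NVS_NR_hPR": (5, 1),
}


def fraction_in_s1(hits_s1, hits_s2):
    """Share of an assay's total hits that fall in subset S1."""
    return hits_s1 / (hits_s1 + hits_s2)


fractions = {
    assay: round(fraction_in_s1(s1, s2), 2)
    for assay, (s1, s2) in ASSAY_HITS.items()
}
for assay, value in fractions.items():
    print(f"{assay}: {value}")
```

Because S1 (134 compounds) and S2 (157 compounds) differ in size, a raw hit fraction slightly understates how strongly an assay favors S1; dividing each count by its subset size first would give per-subset hit rates.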

Page 25

ToxPrint: e.g. Data mining & QSAR models

1. Data Mining: Tox21 cleft palate actives (ToxRef, public, CERES) significantly enriched within triazole/imidazole chemotype groups

2. QSAR model: Further differentiation of cleft palate actives by HTS assay results (TGFb) & partial pi- and sigma- charges yields predictive model within chemotype subgroups

C Yang et al., Altamira

Page 26

QSAR using biologically informed chemical features

[Diagram labels: Toxicity; Biological features; HTS Assays; In vitro; In vivo; ToxPrint “Chemotypes”]

HTS results are used to inform feature selection, linking chemical features to putative toxicity mechanism

Page 27

Building a public chemotype “knowledge-base”

[Diagram labels: Chemicals (Cosmos, CERES, ToxRef, ToxCast, Tox21); Use categories; Fate & Transport; ADME; Reactivity; Biotransformation; Phys-chem properties; Biological activities]

Page 28

Chemistry: What’s needed?

Incorporate chemical information into usable tools for chemical prioritization & safety assessments

Publicly available data & computational tools & resources for chemists, toxicologists & modelers to access & utilize chemical information

Harvesting of existing chemical activity (in vitro, in vivo) data into databases & computational forms

Integration of available data resources (HTS, in vivo)

Cheminformatics foundation to enable structure modeling

Ability to “look across” data (HTS, in vivo, chemical) to form hypotheses, guide analog selection, and improve prediction models

Data!

Public availability

Transparency

Tools

Usability

[Slide labels: ToxCast; ToxRefDB; iCSS Dashboard; DSSTox; ACToR; KNIME; ToxPrint & Chemotyper; FDA CERES]

Page 29

Acknowledgements:

EPA NCCT ToxCast Team

Richard Judson (ACToR)

Keith Houck (HTS)

Matt Martin (ToxRefDB, Dashboard)

Lisa Truong

Tox21 leadership & consortium

External Collaborators:

Altamira: Chihae Yang, Jim Rathman

Molecular Networks: Aleksey Tarkhov, Christof Schwab

COSMOS: Mark Cronin

U.S. FDA: Kirk Arvidson, Patra Volarath (formerly EPA Post Doc)

This work was reviewed by EPA and approved for publication but does not necessarily reflect official Agency policy.

Page 30

Questions?