Bridging cheminformatics and bioinformatics using protein structures

Edith Chan, Inpharmatica, London, 10 April 2001


Page 1: Bridging  cheminformatics and bioinformatics using  protein structures

Bridging cheminformatics and bioinformatics

using protein structures

Edith Chan

Inpharmatica

London, 10 April 2001

Page 2: Bridging  cheminformatics and bioinformatics using  protein structures

Bioinformatics Cheminformatics

SELECTING THE BEST TARGETS

Disease-association doesn’t make a protein a target - requires validation as point of intervention in pathway

Having good biological rationale doesn’t make a protein tractable to chemistry (drugable)

Genomics, HTS and Combichem have increased numerical throughput many hundred fold - overload of poorly integrated data, shortfall in productivity

[Diagram: the Target Validation Process (Disease, Target, Target Selection) followed by the Drug Discovery Process (Leads, Clinic)]

Inpharmatica’s protein structure focus - uniquely placed to assess both parameters

High Validity and Drugability Requires a Unifying Informatics Framework

Page 3: Bridging  cheminformatics and bioinformatics using  protein structures

BIOPENDIUM AND CHEMATICA

[Diagram: Genome Data, Target Structure, Lead Hypotheses. Biopendium covers protein target validation and selection; Chematica covers drug discovery. Illustrated with a DNA sequence fragment and 2-D chemical structures.]

Page 4: Bridging  cheminformatics and bioinformatics using  protein structures

[Figure: % sequence identity to a test sequence (AHHLDRPGHNMCEAGFWQPILL), on a scale marked 100%, 30% and 0, with regions labelled "Standard Approaches" and "Advanced Approaches"]

STRUCTURE-BASED METHODS FIND MANY HOMOLOGUES (AND PUTATIVE TARGETS) NOT DETECTABLE FROM SEQUENCE SIMILARITY

Biochemical function and drugability defined by 3D structure, not sequence - structure is better conserved
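
The figure on this slide is expressed in % sequence identity. As a minimal sketch of how that quantity is usually computed for two already-aligned sequences, the function below counts matching positions over the aligned, non-gap columns. The function name, the gap character and the choice of denominator are assumptions for illustration (several conventions exist); this is not Inpharmatica's method.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# The test sequence shown on the slide, compared with itself.
test_sequence = "AHHLDRPGHNMCEAGFWQPILL"
identity = percent_identity(test_sequence, test_sequence)
```

Pairs that fall below the 30% mark on the slide's scale are the ones for which structure-based methods are presented as the alternative.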

Inpharmatica

Page 5: Bridging  cheminformatics and bioinformatics using  protein structures

BIOPENDIUM

Inputs: all public (or proprietary) protein data

Proprietary methods: Genome-Threader, QBI--Blast, Reverse Search Maximisation

Massive computation: a 1 million CPU-hour set of calculations employing the most advanced algorithms (1100-processor farm)

Applied to 600,000 sequences, 14,000 structures + bound ligands

Yields 670m precalculated protein relationships

Results held in an Oracle database

Query results in 15 minutes vs. two weeks with traditional bioinformatics

Protein information: structures, sequences, bound ligands, families, functions
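
To put the quoted figures in proportion, the short calculation below converts the 1 million CPU hours into wall-clock time on the 1100-processor farm and works out the average number of precalculated relationships per sequence. It assumes ideal parallel scaling and an even spread of relationships, neither of which the slide states.

```python
cpu_hours = 1_000_000        # from the slide
processors = 1100            # from the slide
relationships = 670_000_000  # from the slide
sequences = 600_000          # from the slide

# Wall-clock time if the farm were kept fully busy (assumption: ideal scaling).
wall_hours = cpu_hours / processors
wall_days = wall_hours / 24

# Average number of precalculated relationships per input sequence.
per_sequence = relationships / sequences

print(f"{wall_hours:.0f} h, about {wall_days:.1f} days")
print(f"about {per_sequence:.0f} relationships per sequence")
```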

Page 6: Bridging  cheminformatics and bioinformatics using  protein structures

THE INPHARMATICA BIOPENDIUM

Link complementary data in the 7 resources: Genbank, PDB, Prosite, Prints, Enzyme, Swissprot, Ligplot

Additional inputs: proprietary sequences, ORF prediction, proprietary structures

Processing: mask sequences; processed PDB to XMAS data; taxonomy

Precalculated data for 600,000 protein sequences (scores and alignments for each hit): pairwise sequence searches, profile-based searches, threading-based approaches

Relational database

Inpharmatica Workbench: Ligplot ligand interaction editor, Inpharmatica-enhanced RasMol 3D viewer, interactive sequence alignment editor

Page 7: Bridging  cheminformatics and bioinformatics using  protein structures
Page 8: Bridging  cheminformatics and bioinformatics using  protein structures

DRUGABLE TARGET DISCOVERY

Finding a novel brain metalloprotease

BIOPENDIUM: novel brain protein identified

CHEMATICA: drugable site identified

Page 9: Bridging  cheminformatics and bioinformatics using  protein structures

CHEMATICA IS….

Database of putative/known binding sites

Site mapping and pharmacophore generation: site identification, site mapping, fragment mapping, pharmacophore generation

Similarity searching/clustering of sites

Large-scale virtual screening resource

Gene family data views: gene family structures, consensus family analysis

Chemical annotation of PDB ‘real’ ligand structures: ligand 2-D structures

Page 10: Bridging  cheminformatics and bioinformatics using  protein structures

a. A sphere is placed between the VDW surfaces of each atom pair.

b. Any neighbouring atoms penetrating the sphere cause its size to be reduced.

c. Repeat for all possible atom pairs.

d. Generate a surface around the surviving spheres to define the site region.

SURFNET: A program for visualizing molecular surfaces, cavities and intermolecular interactions.

Laskowski R A (1995), J. Mol. Graph., 13, 323-330.

Site identification: how are sites in a protein structure delineated?
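
Steps a to c can be sketched in a few lines. This is a simplified illustration of the gap-sphere idea, not the SURFNET code: the function name, the minimum and maximum sphere radii (`r_min`, `r_max`) and the tuple layout of the atoms are assumptions, and step d (contouring a surface around the surviving spheres) is left out.

```python
import math
from itertools import combinations


def gap_spheres(atoms, r_min=1.0, r_max=4.0):
    """Return (centre, radius) gap spheres for atoms given as (x, y, z, vdw_radius)."""
    spheres = []
    for i, j in combinations(range(len(atoms)), 2):   # step c: every atom pair
        pa, ra = atoms[i][:3], atoms[i][3]
        pb, rb = atoms[j][:3], atoms[j][3]
        d = math.dist(pa, pb)
        gap = d - ra - rb
        if gap <= 0:
            continue  # VDW surfaces touch or overlap: no room for a sphere
        # Step a: centre the sphere midway between the two VDW surfaces.
        t = (ra + gap / 2) / d
        centre = tuple(a + t * (b - a) for a, b in zip(pa, pb))
        radius = min(gap / 2, r_max)
        # Step b: shrink the sphere until no neighbouring atom penetrates it.
        for k, atom in enumerate(atoms):
            if k in (i, j):
                continue
            radius = min(radius, math.dist(centre, atom[:3]) - atom[3])
        if radius >= r_min:  # spheres that shrink too far are discarded
            spheres.append((centre, radius))
    return spheres


# Two atoms 8 Å apart, plus a third atom that intrudes on the sphere between them.
pair = [(0.0, 0.0, 0.0, 1.5), (8.0, 0.0, 0.0, 1.5)]
crowded = pair + [(4.0, 3.0, 0.0, 1.5)]
```

The double loop makes this quadratic in the number of atom pairs and cubic overall; a real implementation would restrict both loops to nearby atoms with a spatial grid.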

Page 11: Bridging  cheminformatics and bioinformatics using  protein structures

Volume, hydrophobic content, polar content, surface accessibility, ……

In total - 20 parameters calculated.

Physical Parameters of the clefts

8 largest sites are stored together with their physical parameters

Page 12: Bridging  cheminformatics and bioinformatics using  protein structures

Prediction of binding/active sites

Rule driven: use of neural nets (a) on a training set of 100 ligand/protein PDBs

Validation: success rate = 90% on an extended set of 500 PDBs

(a) A backpropagation net with a 7-5-1 architecture
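
For readers unfamiliar with the notation, a 7-5-1 network takes 7 inputs, passes them through 5 hidden units and produces 1 output. The forward pass of such a network can be sketched as below. The sigmoid activation, the function names and the placeholder weights are assumptions; the slide does not say which 7 site parameters are used as inputs or how the output is thresholded.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(features, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a 7-5-1 network: 7 inputs, 5 hidden units, 1 output."""
    if len(features) != 7 or len(w_hidden) != 5 or len(w_out) != 5:
        raise ValueError("expected a 7-5-1 layout")
    # Hidden layer: one weighted sum plus bias per hidden unit.
    hidden = [
        sigmoid(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    # Output layer: a single unit, read here as a binding-site score in (0, 1).
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


# Placeholder weights (all zero) just to show the shapes involved.
site_features = [0.0] * 7
w_hidden = [[0.0] * 7 for _ in range(5)]
score = forward(site_features, w_hidden, [0.0] * 5, [0.0] * 5, 0.0)
```

In a trained network the weights would come from backpropagation on the 100-structure training set mentioned above.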

Page 13: Bridging  cheminformatics and bioinformatics using  protein structures

• 3-D distributions of 20 different atom types about the 20 amino acids are calculated.

• No assumption of energy terms.

How is the XSITE potential derived?

X-SITE: use of empirically derived atomic packing preferences to identify favourable interaction regions in the binding sites of proteins.

Laskowski R A, Thornton J M, Humblet C & Singh J (1996), Journal of Molecular Biology, 259, 175-201.
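
The idea of an empirical distribution with no energy terms can be illustrated as a plain counting exercise: observed contacts are binned into voxels around each residue type and normalised to frequencies. The sketch below assumes the contact offsets are already expressed in a common reference frame for each residue type; the function name, the voxel size and the atom type labels are hypothetical, and this is not the published X-SITE procedure.

```python
import math
from collections import defaultdict


def build_distributions(contacts, voxel=1.0):
    """Bin observed contacts into voxels and normalise per (residue, atom type).

    contacts: iterable of (residue, atom_type, (dx, dy, dz)), with the offset
    given in the residue's own reference frame (assumed precomputed).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for residue, atom_type, (dx, dy, dz) in contacts:
        cell = (math.floor(dx / voxel),
                math.floor(dy / voxel),
                math.floor(dz / voxel))
        counts[(residue, atom_type)][cell] += 1
    distributions = {}
    for key, cells in counts.items():
        total = sum(cells.values())
        # Frequencies only: no energy function is fitted or assumed.
        distributions[key] = {cell: n / total for cell, n in cells.items()}
    return distributions


observed = [
    ("ASP", "N.am", (2.1, 0.2, 0.3)),
    ("ASP", "N.am", (2.7, 0.9, 0.1)),
    ("ASP", "N.am", (-1.5, 0.0, 0.0)),
    ("PHE", "C.ar", (0.5, 3.5, 0.5)),
]
dists = build_distributions(observed)
```

With 20 atom types and 20 amino acids, the full table would hold up to 400 such distributions.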

Page 14: Bridging  cheminformatics and bioinformatics using  protein structures

Data set used

(1) 521 non-homologous protein chains* from the PDB that satisfy:

• no two sequences have identity > 20%
• resolution < 1.8 Å
• R factor < 0.2

AND

(2) 376 protein-ligand PDB structures for studying additional atom types other than those from peptides and proteins, such as Cl, F.

Note: The PDB has about 14K entries!

*cullpdb_pc20_res1.8_R0.2_d001130_chains521 (R. Dunbrack, Jr.)

U. Hobohm, M. Scharf, R. Schneider, C. Sander, "Selection of representative protein data sets." Protein Science, 1, 409-417 (1992).
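
The three criteria for data set (1) can be read as a filter followed by a greedy cull. The sketch below is an illustration under assumptions, not the procedure used to build the cited list: the function name, the dictionary keys and the `identity` callback are hypothetical, and chains are ranked by resolution before culling.

```python
def cull_chains(chains, identity, max_identity=20.0,
                max_resolution=1.8, max_r_factor=0.2):
    """Greedy selection of a non-redundant, high-quality set of chains.

    chains: list of dicts with keys "id", "resolution", "r_factor".
    identity: function (id_a, id_b) -> percent sequence identity.
    """
    # Quality filter: resolution < 1.8 Å and R factor < 0.2.
    candidates = sorted(
        (c for c in chains
         if c["resolution"] < max_resolution and c["r_factor"] < max_r_factor),
        key=lambda c: (c["resolution"], c["r_factor"]),
    )
    kept = []
    for chain in candidates:
        # Keep a chain only if it is <= 20% identical to everything kept so far.
        if all(identity(chain["id"], k["id"]) <= max_identity for k in kept):
            kept.append(chain)
    return [c["id"] for c in kept]


example = [
    {"id": "A", "resolution": 1.2, "r_factor": 0.15},
    {"id": "B", "resolution": 1.5, "r_factor": 0.18},
    {"id": "C", "resolution": 2.0, "r_factor": 0.19},
    {"id": "D", "resolution": 1.7, "r_factor": 0.25},
    {"id": "E", "resolution": 1.6, "r_factor": 0.19},
]
close_pairs = {frozenset(("A", "B")): 90.0}
selected = cull_chains(
    example, lambda a, b: close_pairs.get(frozenset((a, b)), 10.0))
```

Here C and D fail the quality filter, and B is dropped because it is 90% identical to the better-resolved chain A.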

Page 15: Bridging  cheminformatics and bioinformatics using  protein structures

Application of XSITE distributions to side-chains making up the calculated protein binding site

Projecting XSITE distributions onto the predicted binding site

Page 16: Bridging  cheminformatics and bioinformatics using  protein structures

How is the pharmacophore generated?

a. Compare the XSITE predictions generated for the different probe atoms on a 3D grid of densities encompassing the region of the binding site.

b. The higher the value at a given grid-point, the higher the likelihood of finding that type of atom at that location.

c. A “best” map is derived for each probe atom.

d. The net result is a new set of 3D grid maps, one per probe atom, holding only those regions where that atom scored higher than the others.
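
Steps a to d amount to a per-grid-point comparison across probes. A minimal sketch follows, assuming each probe's map is stored as a flat list of values over the same grid points; the function name and the probe labels are hypothetical, and ties are resolved in favour of the first probe listed.

```python
def best_maps(maps):
    """Keep, for each probe atom, only the grid points where it outscores the rest.

    maps: dict mapping probe name -> list of predicted values, one per
    grid point, with all lists sharing the same grid ordering.
    """
    probes = list(maps)
    n_points = len(maps[probes[0]])
    if any(len(maps[p]) != n_points for p in probes):
        raise ValueError("all probe maps must cover the same grid")
    best = {p: [0.0] * n_points for p in probes}
    for g in range(n_points):
        # The probe with the highest value wins this grid point.
        winner = max(probes, key=lambda p: maps[p][g])
        best[winner][g] = maps[winner][g]
    return best


predictions = {
    "O": [0.9, 0.1, 0.3],
    "N": [0.2, 0.8, 0.3],
    "C.ar": [0.1, 0.2, 0.7],
}
pharmacophore_maps = best_maps(predictions)
```

Each output map is zero everywhere except where its probe scored highest, which is what makes the retained regions usable as pharmacophore points.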

Page 17: Bridging  cheminformatics and bioinformatics using  protein structures

What is fragment mapping?

a. In-built database of more than 100 small-molecule fragments: the most common functional groups, representing the common drug-like building blocks used in chemistry.

b. Privileged structures from companies.

Example fragments (shown as 2-D structures on the slide):

t-butyl, ethyl, tBoc

phenyl, naphthyl, di-phenyl, bi-phenyl

carbonyl, carboxyl, acetic acid, acetamide, methylamine

furan, thiophene, oxazole, thiazole, pyrrole, imidazole, triazine

cyclohexyl, thiazolidine, piperazine, thiadiazole

sulfonyl, sulfonamide, cyano, mercapto, methol

Page 18: Bridging  cheminformatics and bioinformatics using  protein structures

How is fragment mapping done?

• Each atom in a fragment is assigned one of the 20 atom types.

• Each fragment is placed at every grid-point within the binding site and subjected to 300 rotations.

• At each rotation a score is calculated using the appropriate X-SITE predictions for the atom types that the fragment contains.
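
The three bullets describe an exhaustive search over positions and orientations. The sketch below shows the shape of that loop under stated assumptions: `xsite_score` is a hypothetical lookup standing in for the X-SITE predictions, `z_rotations` generates rotations about a single axis purely for demonstration (the slide does not say how the 300 orientations are chosen), and the score is a plain sum over the fragment's atoms.

```python
import math


def rotate(matrix, vector):
    """Apply a 3x3 rotation matrix (tuple of rows) to a 3-vector."""
    return tuple(sum(m * c for m, c in zip(row, vector)) for row in matrix)


def z_rotations(n):
    """n evenly spaced rotations about the z axis (demonstration only)."""
    rotations = []
    for k in range(n):
        angle = 2 * math.pi * k / n
        c, s = math.cos(angle), math.sin(angle)
        rotations.append(((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0)))
    return rotations


def map_fragment(fragment, grid_points, rotations, xsite_score):
    """Return (score, grid_point, rotation) of the best-scoring placement.

    fragment: list of (atom_type, (x, y, z)) in the fragment's local frame.
    xsite_score: function (atom_type, position) -> predicted value.
    """
    best = None
    for origin in grid_points:          # place the fragment at every grid point
        for rotation in rotations:      # and try every orientation
            total = 0.0
            for atom_type, local in fragment:
                x, y, z = rotate(rotation, local)
                position = (origin[0] + x, origin[1] + y, origin[2] + z)
                total += xsite_score(atom_type, position)
            if best is None or total > best[0]:
                best = (total, origin, rotation)
    return best


# Toy scoring function: each atom type prefers one location (hypothetical).
preferred = {"O": (0.0, 1.0, 0.0), "N": (0.0, -1.0, 0.0)}
toy_score = lambda atom_type, pos: -math.dist(pos, preferred[atom_type])
toy_fragment = [("O", (1.0, 0.0, 0.0)), ("N", (-1.0, 0.0, 0.0))]
rotations = z_rotations(4)
result = map_fragment(toy_fragment, [(0.0, 0.0, 0.0), (5.0, 5.0, 5.0)],
                      rotations, toy_score)
```

In the toy example the best placement is at the origin with the 90-degree rotation, which puts each atom on its preferred location.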

[Figure: fragment atoms labelled with their atom types, e.g. C.ar]

Page 19: Bridging  cheminformatics and bioinformatics using  protein structures

CHEMATICA

Curated, high-quality annotation and presentation of important ‘drugable’ gene families

NHRs, kinases, caspases, GPCRs,….

Contains ligand structure information

Contains crystal environment classification

Automatic alerts for newly released structures

Multiple structure comparison options

Gene Family Data Views

Page 20: Bridging  cheminformatics and bioinformatics using  protein structures

Consensus Family Analysis

MMP-1 MMP-8 MMP-13 MMP-3

Size and topology of binding sites for MMP-1 & MMP-8 are similar, but detailed interactions differ

Spheres signify a negative charge requirement in different areas of the binding pockets, which provides potential for specificity

CHEMATICA

Page 21: Bridging  cheminformatics and bioinformatics using  protein structures

Validation Study

Two sets of data taken from the literature:

1) GOLD (Jones, Willett, Glen, Leach and Taylor): Genetic Optimization for Ligand Docking. 71% success rate in ligand binding mode in 100 PDBs; our method: 70%.

2) SUPERSTAR (Verdonk, Cole and Taylor): empirical method for interactions in proteins. 67% success rate for the original 4 probes, ~67% in 122 PDBs; our method: 84%.

1. Jones et al. J. Mol. Biol. (1997) 267, 727-748
2. Verdonk et al. J. Mol. Biol. (1999) 289, 1093-1108

Page 22: Bridging  cheminformatics and bioinformatics using  protein structures

Acknowledgements

Inpharmatica: Alex Michie, John Overington, Simon Skidmore

UCL: Roman Laskowski, Adrian Shepherd, Janet Thornton