Datamining and Drug Design - University of...

64
Computational Modeling of Proteins Digital Biology 2016 1/64 Datamining and Drug Design L. Ridgway Scott Departments of Computer Science and Mathematics, Computation Institute, and Institute for Biophysical Dynamics, University of Chicago This talk is based on joint work with Ariel Fernd ´ andez (Argentine Institute of Mathematics “Alberto Pedro Calder ´ on” and Collegium Basilea, Institute for Advanced Study, Basel, Switzerland), Chris Fraser, Bioanalytical Computing Inc. (formerly UChicago), Harold Scheraga (Cornell), and Kristina Rogale Plazonic (Princeton); and at U. Chicago: Steve Berry, Tatiana Orlova.

Transcript of Datamining and Drug Design - University of...

Page 1: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Computational Modeling of Proteins

Digital Biology 2016 1/64

Datamining and Drug Design

L. Ridgway ScottDepartments of Computer Science and Mathematics,Computation Institute, and Institute for Biophysical Dynamics,University of Chicago

This talk is based on joint work with Ariel Ferndandez (Argentine Instituteof Mathematics “Alberto Pedro Calderon” and Collegium Basilea, Institutefor Advanced Study, Basel, Switzerland), Chris Fraser, BioanalyticalComputing Inc. (formerly UChicago), Harold Scheraga (Cornell), andKristina Rogale Plazonic (Princeton); and at U. Chicago: Steve Berry,Tatiana Orlova.

Page 2: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Computational Models

Digital Biology 2016 2/64

dynamics

femtosec picosec nanosec microsec msecattosec

10 picometers

Angstrom

nanometer

10 nanometers

100 nanometers

quantummechanics

electrostatics

continuum

molecular

Figure 1: Computational models at multiple scales

Nonbonded interactions involved in all scales.

Page 3: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Failure of drug designs in clinical trials [2]

Digital Biology 2016 3/64

Figure 2: FDA approvals of new molecule entities (NME), numbers ofphase III clinical trials completed and pharmaceutical R&D spending from1990 to 2005. NME approvals reached a peak in the period from 1996 to1999 and are now dropping in spite of continually increasing R&D spendingand phase III clinical trials.

Page 4: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Aligned backbones for two paralog kinases

Digital Biology 2016 4/64

Dehydrons for Chk1 marked in green and those for Pdk1 in red [8].

Page 5: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

HIV-1 protease with ‘dehydron wrapper’ inhibitor

Digital Biology 2016 5/64

Detail of the protease cavity, pattern of packing defects,and inhibitor positioned as dehydron wrapper.

Page 6: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Desolvation spheres for flap

Digital Biology 2016 6/64

Desolvation spheres for flap Gly-49–Gly-52 dehydroncontaining nonpolar groups of the wrapping inhibitor.

Page 7: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Body composition

Digital Biology 2016 7/64

Page 8: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Proteins as digital components

Digital Biology 2016 8/64

Proteins are the essential components of life:• used to build complexes, e.g., viruses

(bricks and mortar)• involved in signalling (information transmission)• enzymes essential in catalysis (chemical machines)

In all these cases, protein-ligand interaction isessential.

These interactions are deterministic (alwaysthe same).

Proteins function as discrete components not as analogdevices.

Page 9: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

The hydrophobic effect

Digital Biology 2016 9/64

Hydrophobic effect crucial in protein-ligand association.Water is essential to life as we know it, but hostile toproteins.

The role of water in protein biophysics: to modulateelectric forces via the dielectric effect.

Hydrophobicity fosters water removal and supportsprotein-ligand interaction, but it also modulates thedielectric effect.

Water is a strong dielectric, and protein sidechains are acomplex mix of charged, polar, and hydrophobic parts.

But the hydrophobic effect is non-specific in action.What makes proteins interact in a repeatable way?

Page 10: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

What sidechains are found at interfaces?

Digital Biology 2016 10/64

By examining interfaces in PDB structures, we can see

which residues are most likely to be found at interfaces.

�� ❅❅❅❅

CH2

C

NH2 O

Asparagine

C

CH3

H OH

Threonine

H

Glycine

C

H

H OH

Serine

�� ❅❅❅❅

CH2

C

O− O

Asparticacid

CH3

Alanine

CH2

SH

Cysteine

Sidechains most likely to be involved in interactions,ordered from the left (asparagine), are not hydrophobic.

Page 11: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Electronic forces

Digital Biology 2016 11/64

The only force of significance in biochemistry is theelectric force.

— But often modulated by indirection or induction.In terrestrial biology, water plays a significant role as adielectric which mediates non-covalent interactions(hydrogen bonds, salt bridges, cation-pi interactions).

But the dielectric effect of water is modulated byhydrophobic components of proteins.

Moreover, a ligand can change the hydrophobicenvironment upon binding.

In protein-ligand interactions, this makes intramolecularbonds as important as intermolecular interactions.

Page 12: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Our technology

Digital Biology 2016 12/64

Interaction between physical chemistry anddata mining in biophysical data bases.

• Data mining can lead to new results in physicalchemistry that are significant in biology.

• Using physical chemistry to look at data providesinsights regarding function.

We review recent results regarding protein-ligandinteraction based on insights about hydrophobic effects.We show that sidechain configurations modulatedielectric effect.

We discuss how these can be used to understanda novel factor that supports protein-ligand binding.

Page 13: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Competing effects: why this is so hard

Digital Biology 2016 13/64

Protein sidechains have large electrostatic gradientsWater is a strong dielectricHydrophobic groups modify the water structure

Large electrostatic gradients Screening by dielectric effect

Modulation of dielectric strength by hydrophobic effect

Figure 3: Three competing effects that determine protein behavior. Theseconspire to weaken interactive forces, making biological relationships moretenuous and amenable to mutation.

Page 14: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Protein basics

Digital Biology 2016 14/64

Proteins are sequences of amino acids which arecovalently bonded along a “backbone.”

Proteins of biological significance fold into athree-dimensional structure by adding hydrogen bondsbetween carbonyl and amide groups on the backbone ofdifferent amino acids.

In addition, other bonds, such as a salt bridge or adisulfide bond can form between particular amino acids(Cysteine has sulfur atoms in its sidechain).

But the hydrogen bond is the primary modeof structure formation in proteins.

Page 15: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Chains of amino acid residues

Digital Biology 2016 15/64

Proteins are chains of amino acid residueswhose basic unit is the peptide group.

(a)

✧✧❜❜

✧✧ ❜❜

N+

C

C i

α

Ri

❅❅ O−

C i+1α

Ri+1❅❅

H

(b)

❜❜ ✧✧

✧✧ ❜❜

N+

C

C i

α

Ri

❅❅ O−

C i+1α

Ri+1

✟✟ H

Figure 4: The rigid state of the peptide bond: (a) trans form, (b) cis form.The double bond between the central carbon and nitrogen keeps the pep-tide bond planar.

Page 16: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Linear (primary) structure of proteins

Digital Biology 2016 16/64

RR

R

R

R

R

Figure 5: Cartoon of peptide sequence where all peptides are in trans form(cf. Figure 4). Small boxes represent C-alpha carbons, arrow heads rep-resent amide groups NH, arrow tails represent carbonyl groups CO, andthin rectangular boxes are double bond between backbone C and N. Thedifferent residues are indicated by R’s. The numbering scheme is increas-ing from left to right, so that the arrow formed by the carbonyl-amide pairpoints in the direction of increasing residue number. The three-dimensionalnature of the protein is left to the imagination.

Page 17: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Hydrogen bonds and secondary structure

Digital Biology 2016 17/64

Proteins have a hierarchy of structure, the next being secondarystructure consisting of two primary types: alpha-helices and beta-sheets(a.k.a., α-helices and β-sheets).Alpha helices are helical arrangements of the subsequent peptidecomplexes with a distinctive hydrogen bond arrangement between theamide (NH) and carbonyl (OC) groups in peptides separated by k steps inthe sequence, where primarily k = 4 but with k = 3 and k = 5 alsooccurring less frequently:

(a)

Page 18: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Beta sheets

Digital Biology 2016 18/64

Beta sheets represent different hydrogen bond arrangements: (b) is theanti-parallel arrangement and (c) is the parallel.

(b) (c)

Both structures are essentially flat, in contrast to the helical structure in(a).

Page 19: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Dehydrons in human hemoglobin

Digital Biology 2016 19/64

Dehydronsin human hemoglobinFrom PNAS 100: 6446-6451 (2003)Ariel Fernandez, Jozsef Kardos,L. Ridgway Scott, Yuji Goto,and R. Stephen Berry.Structural defects and thediagnosis of amyloidogenic propensity.

Well-wrapped hydrogen bonds aregrey, and dehydrons are green.

The standard ribbon modelof “structure” lacks indicatorsof electrostatic environment.

Page 20: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

A quote

Digital Biology 2016 20/64

from Nature’s Robots ....

“The exact and definite determination of lifephenomena which are common to plants and animalsis only one side of the physiological problem of today.

The other side is theconstruction of a mental picture ofthe constitution of living matterfrom these general qualities. In this portion of our workwe need the aid of physical chemistry.”Jacques Loeb, The biological problems of today: physiology.Science 7, 154-156 (1897).

so our theme is not so new ....

Page 21: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Data mining definition

Digital Biology 2016 21/64

WHATIS.COM: Data mining is sorting through data toidentify patterns and establish relationships.Data mining parameters include:

• Association - looking for patterns where one event isconnected to another event

• Sequence or path analysis - looking for patternswhere one event leads to another later event

• Classification - looking for new patterns (May resultin a change in the way the data is organized butthat’s ok)

• Clustering - finding and visually documenting groupsof facts not previously known

Conclusion: Data mining involves looking at data.

Page 22: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Data mining lens

Digital Biology 2016 22/64

If data mining is looking at data then☛

✠What type of lens do we use?

• All of these have chemical representations, e.g.,

C400H620N100O120P1S1

• Alphabetic sequences describe much of biology:DNA, RNA, proteins.

• All of these have three-dimensional structure.• But structure alone does not explain how they

function.

Physical chemistry clarifies the picture andallows function to be more easily interpreted.

Page 23: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Sequences can tell a story

Digital Biology 2016 23/64

Protein sequences

aardvarkateatavisticallyacademicianaccelerativeacetylglycineachievementacidimetricallyacridityactressadamantadhesivenessadministrativelyadmitafflictiveafterdinneragrypniaaimlessnessairlift

and DNA sequences

actcatatactagagtacttagacttatactagagcattacttagat

can be studied using automatically determined lexicons.

Joint work with John Goldsmith, Terry Clark, Jing Liu.

Page 24: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Sequences can tell a story

Digital Biology 2016 24/64

Protein sequences (a linguistic lens)

aardvarkateatavisticallyacademicianaccelerativeacetylglycineachievementacidimetricallyacridityactressadamantadhesivenessadministrativelyadmitafflictiveafterdinneragrypniaaimlessnessairlift

and DNA sequences

actcatatactagagtacttagacttatactagagcattacttagat

can be studied using automatically determined lexicons.

Joint work with John Goldsmith, Terry Clark, Jing Liu.

Page 25: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

What sidechains are found at interfaces?

Digital Biology 2016 25/64

By examining interfaces in PDB structures, we can seewhich residues are most likely to be found at interfaces.

�� ❅❅❅❅

CH2

C

NH2 O

Asparagine

C

CH3

H OH

Threonine

H

Glycine

C

H

H OH

Serine

�� ❅❅❅❅

CH2

C

O− O

Asparticacid

CH3

Alanine

CH2

SH

Cysteine

Sidechains most likely to be involved in interactions,ordered from the left (asparagine), are not hydrophobic.

Page 26: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Amino acid sidechains have different properties

Digital Biology 2016 26/64

Carbonaceous groups on sidechains are hydrophobic:

�� ❅❅

CH2

CH2CH2

Valine

�� ❅❅

CH2

CH

CH3 CH3

Leucine

CH CH3

CH2

CH3

Isoleucine

��❅❅

CH2CH2

CH2

Proline

✟✟✟✟❍❍

❍❍ ✟✟❍❍

CH2

Phenyl-alanine

Amino acid residues (sidechains only shown)having only carbonaceous groups.

Page 27: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Charges in a dielectric

Digital Biology 2016 27/64

Charges in a dielectric are like lights in a fog.

Page 28: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Wrapping modifies dielectric effect

Digital Biology 2016 28/64

Hydrophobic (CHn) groups remove water locally.

This causes a reduction in ε locally.

(Resulting increase in φ makes dehydrons sticky.)

This can be quantified and used to predict binding sites.

The placement of hydrophobic groups near anelectrostatic bond is called wrapping.

Like putting insulation on an electrical wire.

We can see this effect on a single hydrogen bond.

Page 29: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Unit of hydrophobicity

Digital Biology 2016 29/64

A single carbonaceous group CHn can enhancethe strength and stability of a hydrogen bond.

Consider the effect of such a group in

• methyl alchohol versus ethyl alchohol• ethylene glycol versus propylene glycol• (deadly versus drinkable)

Can we see a molecular-level effect analogous to thechange in dielectric permittivity?

What can a simple model of dielectric modulationpredict?

Page 30: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Wrapping protects hydrogen bond from water

Digital Biology 2016 30/64

C

OHHO

H

H

CHn

CHn CHn

CHn

CHn

NHO

C

OH

CHn

CHnCHn

O HH

H

HON

Well wrapped hydrogen bond Underwrapped hydrogen bond

Page 31: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Extent of wrapping changes nature of hydrogen bond

Digital Biology 2016 31/64

Hydrogen bonds (B) not protected from water do not persist.

From De Simone, et al., PNAS 102 no 21 7535-7540 (2005)

Page 32: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Dynamics of hydrogen bonds and wrapping

Digital Biology 2016 32/64

0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.50

100

200

300

400

500

600

700

800

900

Figure 6: Distribution of bond lengths for two hydrogen bonds formed in astructure of the sheep prion [4]. Horizontal axis measured in nanometers,vertical axis represents numbers of occurrences taken from a simulationwith 20, 000 data points with bin widths of 0.1 Angstrom. Distribution for thewell-wrapped hydrogen bond (H3) has smaller mean value but a longer (ex-ponential) tail, whereas distribution for the underwrapped hydrogen bond(H1) has larger mean but Gaussian tail.

Page 33: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Ligand binding removes water

Digital Biology 2016 33/64

L IG A N D

������������������������������

������������������������������

OH

CHn

CHnCHn

O HH

H

HONC C O H N

CHn

CHn

CHn

Binding of ligand changes underprotected hydrogen

bond (high dielectric) to strong bond (low dielectric)

No intermolecular bonds needed!

Page 34: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Inter- versus intra-molecular

Digital Biology 2016 34/64

Intermolecular bonds are like the power cord on my computer.

Figure 7: Wireless Charging (from Technology Review).

Intramolecular bonds are like the charger on electric toothbrush.

Page 35: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Intermolecular versus intramolecular H bonds

Digital Biology 2016 35/64

C

CHn

CHnCHn

HN C O H N

CHn

CHn

CHn

L IG A N DO

Energetic contribution to binding comparable

but can be better for intramolecular.

Page 36: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Dehydrons in human hemoglobin

Digital Biology 2016 36/64

Dehydronsin human hemoglobinFrom PNAS 100: 6446-6451 (2003)Ariel Fernandez, Jozsef Kardos,L. Ridgway Scott, Yuji Goto,and R. Stephen Berry.Structural defects and thediagnosis of amyloidogenic propensity.

Well-wrapped hydrogen bonds aregrey, and dehydrons are green.

The standard ribbon modelof “structure” lacks indicatorsof electrostatic environment.

Page 37: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Wrapping made quantitative

Digital Biology 2016 37/64

Wrapping made quantitative by counting carbonaceous groups inthe neighborhood of a hydrogen bond.

Page 38: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Distribution of wrapping

Digital Biology 2016 38/64

Distribution of wrapping for an antibody complex.

0

2

4

6

8

10

12

0 5 10 15 20 25 30 35 40

freq

uenc

y of

occ

urre

nce

number of noncarbonaceous groups in each desolvation sphere: radius=6.0 Angstroms

PDB file 1P2C: Light chain, A, dotted line; Heavy chain, B, dashed line; HEL, C, solid line

line 2line 3line 4

0

2

4

6

8

10

12

0 5 10 15 20 25 30 35 40

freq

uenc

y of

occ

urre

nce

number of noncarbonaceous groups in each desolvation sphere: radius=6.0 Angstroms

PDB file 1P2C: Light chain, A, dotted line; Heavy chain, B, dashed line; HEL, C, solid line

line 2line 3line 4

Page 39: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Stickiness of dehydrons

Digital Biology 2016 39/64

Attractive force of dehydrons predicted and measured inAriel Fernandez and L. Ridgway Scott. Adherence of packing defects insoluble proteins. Phys. Rev. Lett. 2003 91:18102(4)by considering rates of adhesion to phospholipid (DLPC) bilayer.

Deformation of phospholipid bilayer by dehydrons measured inAriel Fernandez and L. Ridgway Scott. Under-wrapped soluble proteinsas signals triggering membrane morphology. Journal of Chemical Physics119(13), 6911-6915 (2003).

Single molecule measurement of dehydronic force inAriel Fernandez. Direct nanoscale dehydration of hydrogen bonds.Journal of Physics D: Applied Physics 38, 2928-2932, 2005.

Fine print: careful definition of dehydron requires assessing modificationof dielectric enviroment by test hydrophobe. That is, geometry of carbongroups matters, although counting gets it right ≈ 90% of the time [7].

Page 40: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Charge-force relationship

Digital Biology 2016 40/64

Here’s the math....Charges ρ induce an electric field e = ∇φ given by

∇· (ε∇φ) = ∇· (εe) = ρ, (1)

where ε is permittivity of medium. Energy =∫ρφ dx.

In vacuum, ε is permittivity of free space, ε0.

In other media (e.g., water) the value of ε is much larger.

ε measures strength of dielectric enviroment.

Water removal decreases ε in (1), and increases φ.

Hydrophilic groups contribute to right-hand side ρ in (1).

Page 41: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

HIV protease dehydron

Digital Biology 2016 41/64

TheHIV proteasehas a dehydronat an antibodybinding site.

When theantibody bindsat the dehydron,it wraps itwith hydrophobicgroups.

Page 42: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

A model for protein-protein interaction

Digital Biology 2016 42/64

Foot-and-mouth disease virus assembly from small proteins.

Page 43: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Dehydrons guide binding

Digital Biology 2016 43/64

Dehydrons guide binding of component proteins VP1, VP2 andVP3 of foot-and-mouth disease virus.

Page 44: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Extreme interaction: amyloid formation

Digital Biology 2016 44/64

Standard application of bioinformatics: look at distribution tails.If some is good, more may be better, but too many may be bad.Too many dehydrons signals trouble: the human prion.

From PNAS 100: 6446-6451 (2003) Ariel Fernandez, Jozsef Kardos, L.Ridgway Scott, Yuji Goto, and R. Stephen Berry. Structural defects andthe diagnosis of amyloidogenic propensity.

Page 45: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Dehydrons as indicators of protein interactivity

Digital Biology 2016 45/64

If dehydrons provide mechanism for proteins to interact, then moreinteractive proteins should have more dehydrons, and vice versa.

We only expect a correlation since there are (presumably) otherways for proteins to interact.

The DIP database collects information about protein interactions, basedon individual protein domains: can measure interactivity of differentregions of a given protein.

Result: Interactivity of proteins correlates strongly withnumber of dehydrons.

PNAS 101(9):2823-7 (2004)The nonconserved wrapping of conserved protein folds reveals a trendtoward increasing connectivity in proteomic networks.Ariel Fernandez, L. R. Scott and R. Steve Berry

Page 46: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Dehydron counts correlate with interactivity

Digital Biology 2016 46/64

Page 47: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Dehydron variation over different species

Digital Biology 2016 47/64

Species (common name) peptides H bonds dehydronsAplysia limacina (mollusc) 146 106 0

Chironomus thummi thummi (insect) 136 101 3Thunnus albacares (tuna) 146 110 8Caretta caretta (sea turtle) 153 110 11Physeter catodon (whale) 153 113 11

Sus scrofa (pig) 153 113 12Equus caballus (horse) 152 112 14

Elephas maximus (Asian elephant) 153 115 15Phoca vitulina (seal) 153 109 16H. sapiens (human) 146 102 16

Number of dehydrons in Myoglobin of different species

1ECA (insect) [4]1MYT (yellow−fin tuna) [8]1LHT (sea turtle) [11]

1MBS (seal) [16]1BZ6 (sperm whale) [11]

1DRW (horse) [14]1MWC (wild boar) [12]

2MM1 (human) [16]

1MBA (mollusc) [0]

Page 48: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Anecdotal evidence

Digital Biology 2016 48/64

Anecdotal evidence:basic structure is similar,dehydron number increases.

SH3 domains are fromnematode C. elegans (a)H. sapiens (b);

ubiquitin from E. coli (c)and H. sapiens (d);

hemoglobinfrom Paramecium (e)and H. sapiens-subunit (f).

Page 49: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Wrapping technology in drug design

Digital Biology 2016 49/64

Synopsis of “Modulating drug impact by wrapping target proteins” by ArielFernandez and L. Ridgway Scott, Expert Opinion on Drug Discovery2007.

Drug ligands bind to proteins near dehydrons,enhancing wrapping upon attachment.

Drug side effects often caused by binding toproteins with structure similar to target.

We can exploit the differences in dehydronpatterns in homologous proteins to makedrugs more specific.

Page 50: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Drug ligand non-polar groups

Digital Biology 2016 50/64

Drug ligand provides additional non-polar carbonaceous group(s) indesolvation domain, enhancing wrapping of hydrogen bond.

Page 51: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

HIV-1 protease with ‘dehydron wrapper’ inhibitor

Digital Biology 2016 51/64

Detail of the protease cavity, pattern of packing defects,and inhibitor positioned as dehydron wrapper.

Page 52: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Desolvation spheres for flap

Digital Biology 2016 52/64

Desolvation spheres for flap Gly-49–Gly-52 dehydroncontaining nonpolar groups of the wrapping inhibitor.

Page 53: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Drug specificity

Digital Biology 2016 53/64

Tyrosine kinases: a family of proteins with very similarstructure.

• Called paralogous because they are similar proteinswithin a given species.

• Presumed to have evolved from a common source.• Crucial target of cancer drug therapy.

Gleevec targets particular tyrosine kinases and hasbeen one of the most successful cancer drugs.However, it also targets similar proteins and can causeunwanted side effects (it is cardiotoxic).Differences between the dehydron patters in similarproteins can be used to differentiate them and guide there-design of drug ligands.

Page 54: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Aligned paralog kinases

Digital Biology 2016 54/64

Aligned backbones for two paralog kinases; dehydrons for Chk1 aremarked in green and those for Pdk1 are in red.

Page 55: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Dehydron in C-Kit

Digital Biology 2016 55/64

Dehydron Cys673-Gly676 in C-Kit not conserved in paralogs Bcr-Abl, Lck,Chk1 and Pdk1. By methylating Gleevec at para position (1), inhibitorbecomes selective wrapper of C-Kit dehydron.

Page 56: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Wrapping technology: WBZ drugs

Digital Biology 2016 56/64

Two variants of imatinib, WBZ-4 and WBZ-7, weredeveloped using wrapping technology:

“WBZ-4, unlike imatinib, targets C-Kit but not Bcr-Abl....potential for cardiotoxicity is even lower, researchersfound in laboratory, animal, and computer testing.WBZ-4 appears to be as effective against GIST asimatinib”

“Gleevec is far more effective against a drug-resistantstrain of cancer when the drug wraps the target ....modified version of the drug, WBZ-7, ... seals out waterfrom a critical area.”

Page 57: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Genetic code

Digital Biology 2016 57/64

Genetic code minimizes changes of polarity due to single-letter codonmutations, but it facilitates changes in wrapping due to single-letter codonmutations.

+ −uuauug

uuuuuc

Leu

Phe 7

4

gucuucc

ucguca

cuucuccuacug

Leu 4

Ser 0 + −

auuaucauaaug

4Ile

Met + −guugucguagug

Val 3

cagu

u

u

c

c

c

a

a

a

g

g

g

ccu

ccgccaccc

Pro 2

acuaccacaacg

Thr + −1

gcugccgcagcg

Ala 1

uacuaauag

uau

caucaccaacag

+ −

+ −

aauaacaaaaag

+ −

+ +

gacgaagag

gau − −

− −

1

Tyr 6

His 1

Gln 2

Asn 1

Lys 3

Asp

Glu 2

1

+ − ugcugaugg

ugu Cys + −0

Trp 7cgucgccgacgg

2 + +

| |

| |

| |

| |

| |

| |

| |

aguagcagaagg

Ser 0 + −

2 + +

Arg

Arg

ggcggaggg

ggu

c

a

g

u stopstopstop

Gly 0 + −

u

Second Position

Firs

t Pos

ition

Third P

osition

u c a

First digit after residue name is amount of wrapping. Second indicator ispolarity; | |: nonpolar, +−: polar, −−: negatively charged, ++: positivelycharged.

Page 58: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Determining dehydrons without structure

Digital Biology 2016 58/64

Can use disorder scores as proxy for dehydrons.Dehydrons correspond to changes in score.

Page 59: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Dehydron regions identified by sequence alone

Digital Biology 2016 59/64

Changes in disorder score relate to (a) protein binding and (b)catalytic regions (dehydrons facilitate catalysis) [5, 6].

Technology used to find peptides for treating heart disease [9].

Ariel Fernandez and L. Ridgway Scott. Drug leads for interactive protein targetswith unknown structure. Drug Discovery Today, October 2015.

Page 60: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Peptides for treating heart disease

Digital Biology 2016 60/64

U.S. Patent No. 9,051,387 B2, 9 June 2015

Treating Heart Failure by Inhibiting Myosin Interactionwith a Regulatory Myosin Binding Protein [MyBP-C]

INVENTORS - Richard Moss, Ariel Fernandez

Inhibitor peptide: FSSLLKKRDAFRRDAK

“The Wisconsin Alumni Research Foundation (WARF) isseeking commercial partners interested in developingand using peptides that promote cardiac musclecontraction by disrupting the binding of MyBP-C tomyosin.”

Page 61: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Oppositely charged residues without a counter ion?

Digital Biology 2016 61/64

Lysines become deprotonated by nearby dehydrons [10]?

Page 62: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Conclusions

Digital Biology 2016 62/64

Data mining in the Protein Data Bank...and other protein data bases

can yield insights regarding a

“mental picture of the constitution of livingmatter” Jacques Loeb, Science (1897).

Yields technologies that can aid

• understanding of disease,• drug design, and• fundamental protein biophysics.

Page 63: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

Thanks

Digital Biology 2016 63/64

This talk is based on joint work with (among others)

Ariel Ferndandez (Argentine Institute of Mathematics“Alberto Pedro Calderon” and Collegium Basilea,Institute for Advanced Study, Basel, Switzerland),

Chris Fraser, Bioanalytical Computing Inc.,

Harold Scheraga (Cornell), and Kristina RogalePlazonic (Princeton); and at U. Chicago: Steve Berry,Tatiana Orlova.We acknowledge partial support from Institute forBiophysical Dynamics at the University of Chicago andNSF grant DMS-1226019.

We thank the MBI at Ohio State for support during fall2015.

Page 64: Datamining and Drug Design - University of Chicagopeople.cs.uchicago.edu/~ridg/digbio16/digbio16ug.pdfDatamining and Drug Design L. Ridgway Scott Departments of Computer Science and

References

Digital Biology 2016 64/64

[1] Kevin Cahill and V. Adrian Parsegian. Computational consequences of neglected first-order van der Waals forces. arXiv preprintq-bio/0312005, 2003.

[2] Yugong Cheng, Tanguy LeGall, Christopher J. Oldfield, James P. Mueller, Ya-Yue J. Van, Pedro Romero, Marc S. Cortese,Vladimir N. Uversky, and A. Keith Dunker. Rational drug design via intrinsically disordered protein. Trends in Biotechnology,24(10):435 – 442, 2006.

[3] T. C. Choy. van der Waals interaction of the hydrogen molecule: An exact implicit energy density functional. Phys. Rev. A,62:012506, Jun 2000.

[4] Alfonso De Simone, Guy G. Dodson, Chandra S. Verma, Adriana Zagari, and Franca Fraternali. Prion and water: Tight anddynamical hydration sites have a key role in structural stability. Proceedings of the National Academy of Sciences, USA,102:7535–7540, 2005.

[5] Ariel Fernandez. Packing defects functionalize soluble proteins. FEBS letters, 589(9):967–973, 2015.

[6] Ariel Fernandez. Quantum theory of interfacial tension quantitatively predicts spontaneous charging of nonpolar aqueousinterfaces. Physics Letters A, 2015.

[7] Ariel Fernandez and L. Ridgway Scott. Dehydron: a structurally encoded signal for protein interaction. Biophysical Journal,85:1914–1928, 2003.

[8] Ariel Fernandez and L. Ridgway Scott. Modulating drug impact by wrapping target proteins. Expert Opinion on Drug Discovery,2:249–259, 2007.

[9] Richard Moss and Ariel Fernandez. Inhibition of mybp-c binding to myosin as a treatment for heart failure, June 2015.

[10] L. Ridgway Scott and Ariel Fernandez Stigliano. Mismatched ions indicate quantum effects in proteins. Research Report UC/CSTR-2015-10, Dept. Comp. Sci., Univ. Chicago, 2015.