Bioinformatics: Protein Structure Brian Y. Chen Lehigh University Dept. Computer Science & Engineering


A Quantitative Universe, Nov. 16, 2012 Brian Y. Chen

Bioinformatics: Protein Structure

Brian Y. Chen

Lehigh University

Dept. Computer Science & Engineering

Informatics is the science of designing methodologies for gathering, analyzing, integrating, and visualizing data used to form an opinion.


Word Lens, iPhone App

released: Dec 16, 2010

Translation Off Translation On

What is Bioinformatics?

The science of designing methodologies for gathering, analyzing, integrating, and visualizing data used to form an opinion on Biological Systems.


GenBank – Global public repository of DNA sequences

Protein Data Bank – Global public repository of protein structures

Biological systems are nested and interacting machines

Individual molecules are the foundation of all systems

Structural biology studies molecular machines

Structural biology connects structure with function

Timeline of Nobel Prizes in Structural Biology

1946: Sumner
1962: Crick, Watson, Wilkins
1962: Perutz, Kendrew
1964: Hodgkin
1972: Anfinsen
1982: Klug
1988: Deisenhofer, Huber, Michel
1991: Ernst
1997: Walker
2002: Wuthrich
2003: MacKinnon
2006: Kornberg
2009: Ramakrishnan, Steitz, Yonath

Structural biology has become rich in data

[Figure: Number of Entries in the Protein Data Bank by year, 1975 to 2009 (total and annual), vertical axis up to 60000 entries, with the NIH Protein Structure Initiative marked]

Source: www.pdb.org

Structural bioinformatics connects structure with function at scale and with precision

Structural Bioinformatics
• Structure Prediction
• Integrative Methods
• Molecular Simulation
• Structure Alignment
• Functional Site Comparison
• Docking


The General Problem:

Gather, analyze, integrate, and visualize data used to form an opinion on Biological Systems.

Proteins are chains of amino acids

HAWPFMVSLQL-AGG----HFCGATLIAPNFV-----MSAAHCVANVNV

[Figure: atomic structure of an amino acid chain, with atoms labeled by element]


HAWPFMVSLQL-AGG----HFCGATLIAPNFVMSAAHCVANVNV

HAWPFMVSLQL-RGG----HFCGATLIAPNFVMSAAHCVANVK-

HSWPWQISLQY-SKNDAWGHTCGGTLIASNYVLTAAHCISNAKT

HSRPYMVSLQV-Q---G-NHFCGGTLIHPQFVMTAAHCIDKINP

LA-PYIASLQRNRGG----HFCGGTLIHQQFVMTAAHCINSRNV

Similar sequences imply similar function

FASTA: Mackey et al., Mol. Cell. Prot., 2002.
CLUSTALW: Larkin et al., Bioinformatics, 2007.
BLAST: Altschul et al., Nuc. Acid. Res., 1997.
ConSurf: Glaser et al., Bioinformatics, 2003.
Evolutionary Trace: Mihalek et al., Proteins, 2006.
HMAP: Tang et al., J. Mol. Biol., 2003.
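One simple way to put a number on "similar sequences" is the fraction of aligned columns that hold the same amino acid. The sketch below uses the five aligned sequences from this slide. The function name `fraction_identical` and the choice to skip every column that contains a gap are assumptions made for illustration; the tools cited above use more elaborate scoring.

```python
# Aligned sequences copied from the slide; '-' marks an alignment gap.
ALIGNED = [
    "HAWPFMVSLQL-AGG----HFCGATLIAPNFVMSAAHCVANVNV",
    "HAWPFMVSLQL-RGG----HFCGATLIAPNFVMSAAHCVANVK-",
    "HSWPWQISLQY-SKNDAWGHTCGGTLIASNYVLTAAHCISNAKT",
    "HSRPYMVSLQV-Q---G-NHFCGGTLIHPQFVMTAAHCIDKINP",
    "LA-PYIASLQRNRGG----HFCGGTLIHQQFVMTAAHCINSRNV",
]


def fraction_identical(a: str, b: str) -> float:
    """Fraction of gap-free alignment columns in which both residues match."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # assumption: columns with a gap are not scored
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


if __name__ == "__main__":
    reference = ALIGNED[0]
    for other in ALIGNED[1:]:
        print(f"{fraction_identical(reference, other):.2f}  {other}")
```

Values close to 1 suggest close relatives; how high a value must be before similar function can be inferred is a separate question that this sketch does not answer.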

Similar functional sites imply similar function

MASH: Chen et al., J. Comput. Biol., 2007.
Combinatorial Extension: Jia et al., J. Comput. Biol., 2004.
Geometric Hashing: Nussinov et al., Proteins, 2001.
pevoSOAR: Tseng et al., J. Mol. Biol., 2009.
Ska: Petrey et al., Methods Enzymol., 2003.
Geometric Sieving: Chen et al., J. Bioinf. Comput. Biol., 2007.
PINTS: Stark et al., Nucleic Acids Res., 2003.
JESS: Barker et al., Bioinformatics, 2003.
Dali: Holm et al., Bioinformatics, 2008.

[Figure labels: Motif, Target, Match, Known function, Unknown function]

Protein surfaces reveal functional sites

GRASP2: Petrey et al., Methods Enzymol., 2003.
SURFNET: Glaser et al., Proteins, 2006.
CASTp: Dundas et al., Nucleic Acids Res., 2006.
SCREEN: Nayal et al., Proteins, 2006.
eF-seek: Kinoshita et al., Nucleic Acids Res., 2007.
APROPOS: Peters et al., J. Mol. Biol., 1996.


Similarity doesn’t tell us everything

How does this protein fit in the system?

What parts of the protein make it work?


Specificity is preferential binding

Specificity is an aspect of function


Cavity shape influences specificity

Proteins with the same function can have different specificity

VASP: Volumetric Analysis of the Surfaces of Proteins
• Identify amino acids that alter cavity shape
• Identify subcavities that alter cavity shape

VASP isolates differences in cavity shape

Results: VASP finds influences on specificity

The VASP procedure

Input: A Protein Family
Align Structures
Define Cavities
Volumetric Comparison of Cavities
Output: Volumetric Differences

The VASP procedure: Align Structures

Ska: Petrey D, Honig B. GRASP2: visualization, surface properties, and electrostatics of macromolecular structures and sequences. Methods Enzymol. 374:492-509. 2003.

The VASP procedure

Computational Solid Geometry


Computational Solid Geometry (CSG)


CSG was originally for modeling parts

Computational Solid Geometry (CSG)

Boolean Set Operations on two solids, A and B:
• Union
• Intersection
• Difference
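The three Boolean operations can be tried out on a very coarse stand-in for a solid: the set of unit grid cells (voxels) it covers. This is only a sketch under that assumption. The discussion slide describes VASP's own representation as smooth solid volumes, which is not what is shown here, and the helper `ball` is a hypothetical name introduced for the example.

```python
from itertools import product

Voxel = tuple[int, int, int]


def ball(center: Voxel, radius: int) -> set[Voxel]:
    """Voxels whose centers lie within `radius` of `center` (a crude solid sphere)."""
    cx, cy, cz = center
    span = range(-radius, radius + 1)
    return {
        (cx + dx, cy + dy, cz + dz)
        for dx, dy, dz in product(span, span, span)
        if dx * dx + dy * dy + dz * dz <= radius * radius
    }


# Two overlapping solids, A and B.
A = ball((0, 0, 0), 3)
B = ball((2, 0, 0), 3)

union = A | B          # everything in A or in B
intersection = A & B   # only the region shared by A and B
difference = A - B     # the part of A that lies outside B

if __name__ == "__main__":
    # With unit voxels, the voxel count is the volume in grid units.
    print(len(A), len(B), len(union), len(intersection), len(difference))
```

Counting voxels gives a volume, so each result is itself a solid that can be measured or fed into another Boolean operation.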


Using CSG with protein structures


Begin with the molecular surface

Compute an envelope surface

Find the interior surface

Identify nearby amino acids

Compute the convex hull

Barber, C.B., Dobkin, D.P., and Huhdanpaa, H.T., ACM T Math Software, 22(4):469-483

CSG hull minus molecular surface

CSG intersection with the envelope surface

Remove disconnected pieces
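The last three steps above can be sketched with solids stored as sets of voxels. The hull, molecule, and envelope are taken as given inputs here, and two choices are assumptions that the slides do not state: voxels count as connected when they share a face, and the piece that is kept is the largest one. The function names are hypothetical.

```python
from collections import deque

# Face-sharing neighbors (6-connectivity); an assumption for this sketch.
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def connected_components(voxels):
    """Split a voxel set into face-connected pieces using breadth-first search."""
    remaining = set(voxels)
    components = []
    while remaining:
        seed = remaining.pop()
        component = {seed}
        queue = deque([seed])
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in NEIGHBORS:
                neighbor = (x + dx, y + dy, z + dz)
                if neighbor in remaining:
                    remaining.remove(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


def define_cavity(hull, molecule, envelope):
    """(hull minus molecule) intersected with envelope, keeping the largest piece."""
    candidate = (hull - molecule) & envelope
    pieces = connected_components(candidate)
    return max(pieces, key=len) if pieces else set()


if __name__ == "__main__":
    # Toy one-dimensional example embedded in 3D.
    hull = {(x, 0, 0) for x in range(7)}
    molecule = {(3, 0, 0)}
    envelope = {(x, 0, 0) for x in range(6)}
    print(sorted(define_cavity(hull, molecule, envelope)))
```

In the toy example the molecule voxel splits the candidate region in two, and only the larger piece is returned as the cavity.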

The VASP procedure

Input: A Protein Family
Align Structures
Define Functional Cavities
Volumetric Comparison of Cavities
Output: Volumetric Differences

• Amino Acids affecting cavity shape
• Subcavities affecting cavity shape

Finding amino acids that affect cavity shape

[Figure: schematic with labels 1 to 10; chart with horizontal axis 1 2 3 4 5 6 7 8 9 10 … and vertical axis in Å³ starting at 0]
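A chart of volume per amino acid could be produced roughly as follows. The sketch assumes that the quantity plotted is the volume each amino acid of one protein shares with the cavity of another, aligned protein; the slides do not spell this out, so treat it as one plausible reading. Solids are again stored as voxel sets, and `overlap_volumes`, the residue labels, and the `voxel_size` parameter are hypothetical.

```python
def overlap_volumes(residue_voxels, other_cavity, voxel_size=1.0):
    """Volume (in cubic units of voxel_size) each residue shares with a cavity.

    residue_voxels: dict mapping a residue label to the set of voxels it occupies
    other_cavity:   set of voxels of the cavity from the other, aligned protein
    """
    voxel_volume = voxel_size ** 3
    return {
        label: len(voxels & other_cavity) * voxel_volume
        for label, voxels in residue_voxels.items()
    }


def rank_residues(volumes):
    """Residue labels sorted from largest to smallest overlap."""
    return sorted(volumes, key=volumes.get, reverse=True)


if __name__ == "__main__":
    residues = {
        "residue 1": {(0, 0, 0), (1, 0, 0)},
        "residue 2": {(5, 5, 5)},
    }
    cavity = {(1, 0, 0), (2, 0, 0)}
    volumes = overlap_volumes(residues, cavity, voxel_size=0.5)
    for label in rank_residues(volumes):
        print(label, volumes[label])
```

Under this reading, a residue with a large overlap fills space that is empty in the other protein, which makes it a candidate for influencing cavity shape.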


What makes A cavities different from B?

What is common in A? Take the intersection of the A cavities.

What is the maximum extent of B? Take the union of the B cavities.

Output: the difference between the intersection of A and the union of B, that is, all parts of A that are not in any part of B.
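The comparison of two groups of cavities reduces to three set operations. The sketch below assumes each cavity is already aligned and stored as a set of voxels (or any hashable cells); the function names are hypothetical.

```python
def common_region(cavities):
    """Intersection: cells present in every cavity of the group."""
    cavities = list(cavities)
    if not cavities:
        return set()
    return set.intersection(*cavities)


def maximum_extent(cavities):
    """Union: cells present in at least one cavity of the group."""
    return set().union(*cavities)


def distinguishing_region(a_cavities, b_cavities):
    """Cells common to all A cavities that are not in any B cavity."""
    return common_region(a_cavities) - maximum_extent(b_cavities)


if __name__ == "__main__":
    group_a = [{1, 2, 3, 4}, {2, 3, 4, 5}]
    group_b = [{4}, {5, 6}]
    print(sorted(distinguishing_region(group_a, group_b)))
```

Using the intersection for A and the union for B makes the result conservative: a cell is reported only if every A cavity contains it and no B cavity does.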

Results

• Serine Proteases: Same function, different specificity
– Trypsins
– Elastases
– Chymotrypsins
• Experiments
– VASP identifies amino acids that influence specificity
– VASP identifies subcavities that influence specificity

The serine protease family

Serine Proteases
• Chymotrypsin Clan (catalytic triad: His-Asp-Ser): Chymotrypsins, Trypsins, Elastases
• Subtilisin Clan (Asp-His-Ser): Subtilisins
• Other clans (not used): Oligopeptidases (Asp-Ser-His), Carboxypeptidases (Ser-Asp-His), Others..

Rawlings ND, Barrett AJ. Evolutionary families of peptidases. Biochem J. 1993 Feb 15; 290(pt.1):205-18.

Serine proteases break up other proteins

[Figure: a serine protease, with its catalytic triad, hydrolyzes peptide bonds in a substrate drawn from N to C terminus with residues P4 P3 P2 P1 P1’ P2’ P3’]

Blow DM, Birktoft JJ, Hartley BS. 1969. Role of a buried acid group in the mechanism of action of chymotrypsin. Nature 221:337-340

A serine protease up close

[Figure: 1a0j, Atlantic Salmon Trypsin, showing the catalytic triad (His 57, Asp 102, Ser 195) and an Ala-Arg peptide substrate (from 1fn8)]

Structural Alignment

Elastases (EC 3.4.21.36)
Trypsins (EC 3.4.21.4)
Chymotrypsins (EC 3.4.21.1)

Alignment by catalytic triad + S1 residue (Cα and Cβ atoms)

Serine proteases have specificity for different sequences of amino acids

[Figure: serine protease subsites S4 S3 S2 S1 S1’ S2’ S3’ facing substrate residues P4 P3 P2 P1 P1’ P2’ P3’, drawn from N to C terminus]

Schechter I and Berger A. On the size of the active site in proteases. I. Papain. Biochem. Biophys. Res. Commun. 1967 Apr 20;27(2):157-62

Chymotrypsins prefer big amino acids

[Figure: 1ab9, Bovine Chymotrypsin, showing the triad and Ser 189; the S1 subsite binds P1 residues { Tyr, Phe, Trp }]

Wesolowska, Krokoszynska, Krowarsch, Otlewski. Enhancement of Chymotrypsin-inhibitor/substrate interactions by 3M NaCl. Biochimica et Biophysica Acta 1545 (2001) 78-85

Trypsins bind positively charged residues

[Figure: 1a0j, Atlantic Salmon Trypsin, showing the triad and Asp 189 (-); the S1 subsite binds P1 residues { Arg, Lys } (+)]

Lesner, Kupryszewski, and Rolka. Chromogenic Substrates of Bovine beta-trypsin: The influence of an Amino Acid Residue in P1 Position on their Interaction with the Enzyme. Biochemical and Biophysical Research Communications, 285, pp 1350-1353 (2001)

Elastases prefer small amino acids

[Figure: 1b0e, Porcine Pancreatic Elastase, showing the triad and Ser 189; the S1 subsite binds P1 residues { Ala, Gly, Val, .. }]
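The P1 preferences on the last three slides can be written down as a lookup table and used to flag candidate cut positions in a peptide. This is a deliberately simplified sketch: it looks only at the P1 residue and ignores every other subsite, the elastase entry holds just the three residues the slide names (its list ends with ".."), and the names `P1_PREFERENCES` and `candidate_cleavage_sites` are hypothetical.

```python
# One-letter codes for the P1 residues listed on the slides.
P1_PREFERENCES = {
    "chymotrypsin": {"Y", "F", "W"},  # Tyr, Phe, Trp
    "trypsin": {"R", "K"},            # Arg, Lys
    "elastase": {"A", "G", "V"},      # Ala, Gly, Val (the slide's list continues)
}


def candidate_cleavage_sites(peptide: str, enzyme: str) -> list[int]:
    """Positions where the enzyme might cut, based on the P1 residue alone.

    A returned value k means the bond between residue k and residue k + 1
    (1-based, N to C terminus), so residue k sits in P1 and residue k + 1 in P1'.
    """
    preferred = P1_PREFERENCES[enzyme]
    # The last residue has no P1' partner, so it cannot be a P1 position.
    return [i + 1 for i, residue in enumerate(peptide[:-1]) if residue in preferred]


if __name__ == "__main__":
    peptide = "AKFGR"
    for enzyme in P1_PREFERENCES:
        print(enzyme, candidate_cleavage_sites(peptide, enzyme))
```

The same peptide yields different candidate sites for each enzyme, which is the sense in which the three families share a function but differ in specificity.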

The data is filtered for noise and bias

Protein Data Bank          Elastases   Trypsins   Chymotrypsins
Individual structures      58          371        91
Non-mutants                45          290        91
< 90% Sequence Identity    2           11         2
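The two filters on this slide can be sketched as one pass over the structures: drop mutants, then keep a structure only if it is less than 90% identical to everything already kept. The greedy order-dependent selection, the `Structure` record, and the identity measure over pre-aligned sequences are all assumptions made for illustration; the slides do not say how the filtering was carried out.

```python
from dataclasses import dataclass


@dataclass
class Structure:
    pdb_id: str            # hypothetical identifiers are used in the example below
    aligned_sequence: str  # assumed to be pre-aligned, '-' marks a gap
    is_mutant: bool


def identity(a: str, b: str) -> float:
    """Fraction of gap-free aligned columns with matching residues."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def filter_structures(structures, max_identity=0.90):
    """Drop mutants, then greedily keep structures below the identity cutoff."""
    kept = []
    for candidate in structures:
        if candidate.is_mutant:
            continue
        if all(
            identity(candidate.aligned_sequence, other.aligned_sequence) < max_identity
            for other in kept
        ):
            kept.append(candidate)
    return kept


if __name__ == "__main__":
    family = [
        Structure("xxx1", "AAAAAAAAAA", False),
        Structure("xxx2", "AAAAAAAAAT", False),  # 90% identical to xxx1: dropped
        Structure("xxx3", "AAAAACCCCC", False),  # 50% identical to xxx1: kept
        Structure("xxx4", "GGGGGGGGGG", True),   # mutant: dropped
    ]
    print([s.pdb_id for s in filter_structures(family)])
```

Because the selection is greedy, the representatives that survive depend on the input order; sorting by resolution or another quality measure first would be one way to make the choice less arbitrary.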

VASP finds amino acids in elastase that influence specificity

[Figure: Chymotrypsin and Elastase; horizontal axis: Amino Acid Sequence Number]

Shotton D.M., Watson H.C. Three-dimensional structure of tosyl-elastase. Nature 225(5235): 811-816. 1970.

VASP finds amino acids in trypsins that influence specificity

[Figure: Chymotrypsin and Trypsin; horizontal axis: Amino Acid Sequence Number]

Steitz T.A., Henderson R., Blow D.M. Structure of crystalline alpha-chymotrypsin. 3. Crystallographic studies of substrates and inhibitors bound to the active site of alpha-chymotrypsin. J. Mol. Biol. 46(2): 337-348. 1969.

VASP finds subcavities in trypsins and elastases that influence specificity

[Figure labels: Trypsin Intersection, Elastase Union]

Discussion

• VASP can identify:
– Amino acids that influence specificity
– Subcavities that influence specificity
• Contributions
– The first unsupervised analysis of protein structures that identifies active components of functional sites
– The first algorithm to isolate the basis for specificity in protein structures
– The first representation of proteins using smooth solid volumes
• What can we use VASP for?
– Identify amino acids that might change specificity in drug resistance
– Influential subcavities point to drug designs that bind more specifically, and thus reduce side effects


Questions