Bioinformaticspedagogix-tagc.univ-mrs.fr/courses/bioinfo_intro/pdf...1-cut 2-cuts 4-cuts Slide...

Macromolecular structure Bioinformatics


Page 1

Macromolecular structure

Bioinformatics

Page 2

Contents

Determination of protein structure
Structure databases
Secondary structure elements (SSE)
Tertiary structure
Structure analysis
  Structure alignment
  Domain recognition
Structure prediction
  Homology modelling
  Threading/fold recognition
  Secondary structure prediction
  ab initio prediction

Page 3

Jacques van [email protected]

Determination of protein structure

Structure

Page 4

Crystal

Hanging drop method / vapour diffusion method

Microscope slide

2-Concentrated salt solution

1-Dilute protein solution

Microscope

Many different conditions of 1 & 2 must be tried

Crystallisation

Slide courtesy from Shoshana Wodak

Page 5

Diffraction pattern Atomic model

Determination of protein structure

Slide courtesy from Shoshana Wodak

Page 6

A high resolution protein structure : 1.5 - 2.0 Å resolution


The resolution problem

Slide courtesy from Shoshana Wodak

Page 7

Nuclear Magnetic Resonance (NMR)

Source: Branden & Tooze (1991)

Page 8

Interatomic forces

Covalent interactions
Hydrogen bonds
Hydrophobic/hydrophilic interactions
Ionic interactions
van der Waals force
Repulsive forces

Page 9

Structure databases

Structure

Page 10

Structure databases

PDB (Protein Data Bank): official structure repository.

SCOP (Structural Classification Of Proteins): structure classification. The top level reflects structural classes. The second level, called Fold, includes topological and similarity criteria.

CATH (Class, Architecture, Topology and Homologous superfamily)

Page 11

PDB entry header

HEADER TRANSCRIPTION REGULATION 06-MAR-92 1D66 1D66 2

COMPND GAL4 (RESIDUES 1 - 65) COMPLEX WITH 19MER DNA 1D66 3

SOURCE (SACCHAROMYCES $CEREVISIAE) OVEREXPRESSED IN (ESCHERICHIA 1D66 4

SOURCE 2 $COLI) 1D66 5

AUTHOR R.MARMORSTEIN,S.HARRISON 1D66 6

REVDAT 1 15-APR-93 1D66 0 1D66 7

JRNL AUTH R.MARMORSTEIN,M.CAREY,M.PTASHNE,S.C.HARRISON 1D66 8

JRNL TITL /DNA$ RECOGNITION BY /GAL4$: STRUCTURE OF A 1D66 9

JRNL TITL 2 PROTEIN(SLASH)/DNA$ COMPLEX 1D66 10

JRNL REF NATURE V. 356 408 1992 1D66 11

JRNL REFN ASTM NATUAS UK ISSN 0028-0836 006 1D66 12

REMARK 1 1D66 13

REMARK 2 1D66 14

REMARK 2 RESOLUTION. 2.7 ANGSTROMS. 1D66 15

REMARK 3 1D66 16

REMARK 3 REFINEMENT. 1D66 17

REMARK 3 PROGRAM CORELS;TNT;XPLOR 1D66 18

REMARK 3 AUTHORS J.SUSSMAN;D.TRONRUD;A.BRUNGER 1D66 19

REMARK 3 R VALUE 0.230 1D66 20

REMARK 3 RMSD BOND DISTANCES 0.015 ANGSTROMS 1D66 21

REMARK 3 RMSD BOND ANGLES 2.9 DEGREES 1D66 22

REMARK 4 1D66 23

REMARK 4 THERE ARE TWO DNA CHAINS WHICH HAVE BEEN ASSIGNED CHAIN 1D66 24

REMARK 4 INDICATORS *D* AND *E*. THERE ARE TWO PROTEIN CHAINS 1D66 25

REMARK 4 WHICH HAVE BEEN ASSIGNED CHAIN INDICATORS *A* AND *B*. 1D66 26

REMARK 4 EACH PROTEIN - DNA COMPLEX CONTAINS FOUR BOUND CD IONS. 1D66 27

...
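The header records above lend themselves to simple programmatic reading. The following is a minimal sketch, assuming whitespace-collapsed lines as they appear in this transcript (the official PDB format is fixed-column) and with the trailing "1D66 n" line tags stripped from the sample. The function name `parse_header` and the dictionary keys are illustrative choices, not part of any library.

```python
import re

# A few records copied from the 1D66 header shown above,
# without the trailing "1D66 n" line tags.
HEADER_TEXT = """\
HEADER TRANSCRIPTION REGULATION 06-MAR-92 1D66
COMPND GAL4 (RESIDUES 1 - 65) COMPLEX WITH 19MER DNA
REMARK 2 RESOLUTION. 2.7 ANGSTROMS.
REMARK 3 R VALUE 0.230
"""

def parse_header(text):
    """Collect a few fields from PDB header lines (illustrative helper)."""
    info = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        record = fields[0]
        if record == "HEADER":
            # The last token is the entry identifier, preceded by the date;
            # everything in between is the classification.
            info["id"] = fields[-1]
            info["date"] = fields[-2]
            info["classification"] = " ".join(fields[1:-2])
        elif record == "REMARK":
            m = re.search(r"RESOLUTION\.\s+([\d.]+)\s+ANGSTROMS", line)
            if m:
                info["resolution"] = float(m.group(1))
            m = re.search(r"R VALUE\s+([\d.]+)", line)
            if m:
                info["r_value"] = float(m.group(1))
    return info

print(parse_header(HEADER_TEXT))
```

For real files, a fixed-column parser or an established structure library would be a safer choice than whitespace splitting.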

Page 12

Class

Architecture

Topology

Figure from Shoshana Wodak

CATH - A protein domain classification

In CATH, protein domains are classified according to a tree with 4 hierarchical levels: Class, Architecture, Topology, Homology
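CATH identifiers are commonly written as four dot-separated numbers, one per level of the tree. The sketch below splits such a code into its levels and reports the deepest level shared by two domains. The function names are hypothetical, the example codes are only illustrative inputs, and the class names cover the four main CATH classes.

```python
CATH_LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")

# Names of the four main CATH classes.
CATH_CLASSES = {
    1: "Mainly Alpha",
    2: "Mainly Beta",
    3: "Alpha Beta",
    4: "Few Secondary Structures",
}

def parse_cath_code(code):
    """Map each level name to the code prefix that identifies it."""
    parts = code.split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError("expected four dot-separated numbers, got %r" % code)
    return {level: ".".join(parts[: i + 1]) for i, level in enumerate(CATH_LEVELS)}

def deepest_shared_level(code_a, code_b):
    """Return the name of the deepest level two codes share, or None."""
    a, b = parse_cath_code(code_a), parse_cath_code(code_b)
    shared = None
    for level in CATH_LEVELS:
        if a[level] != b[level]:
            break
        shared = level
    return shared

print(parse_cath_code("3.40.50.720"))
print(deepest_shared_level("3.40.50.720", "3.40.50.300"))  # same topology
```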

Page 13

CATH: Structural classification of proteins [http://www.biochem.ucl.ac.uk/bsm/cath/]
SCOP: Structural classification of proteins [http://scop.mrc-lmb.cam.ac.uk/scop/]
FSSP: Fold classification based on structure alignments [http://www.sander.ebi.ac.uk/fssp/]
HSSP: Homology-derived secondary structure assignments [http://www.sander.ebi.ac.uk/hssp/]
DALI: Classification of protein domains [http://www.ebi.ac.uk/dali/domain/]
VAST: Structural neighbours by direct 3D structure comparison [http://www.ncbi.nlm.nih.gov:80/Structure/VAST/vast.shtml]
CE: Structure comparisons by Combinatorial Extension [http://cl.sdsc.edu/ce.html]

Classifications of protein structures (domains)

Slide courtesy from Shoshana Wodak

Page 14

Books

Branden, C. & Tooze, J. (1991). Introduction to protein structure. 1st ed., Garland Publishing Inc., New York and London.

Westhead, D.R., Parish, J.H. & Twyman, R.M. (2002). Bioinformatics. BIOS Scientific Publishers, Oxford.

Mount, D.W. (2001). Bioinformatics: Sequence and Genome Analysis. 1st ed., Cold Spring Harbor Laboratory Press, New York.

Gibas, C. & Jambeck, P. (2001). Developing Bioinformatics Computer Skills. O'Reilly.

Page 15

Secondary structure elements

Structure

Page 16

Secondary structure - α-helix

Source: Branden & Tooze (1991)

3.6 residues

hydrogen bond

Carbon, Nitrogen, Oxygen

Page 17

Hydrophobicity of side-chain residues in helices

Source: Branden & Tooze (1999)

Blue: polar

Red: basic or acidic
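The figure of 3.6 residues per turn quoted for the α-helix implies a rotation of 360 / 3.6 = 100 degrees per residue, which is why residues spaced 3, 4 or 7 positions apart tend to lie on the same face of a helix. A small sketch of that arithmetic follows; the function names and the 60-degree tolerance are assumptions made for illustration.

```python
RESIDUES_PER_TURN = 3.6                           # value quoted for the alpha-helix
DEGREES_PER_RESIDUE = 360.0 / RESIDUES_PER_TURN   # 100 degrees per residue

def wheel_angle(index):
    """Angular position (degrees) of residue `index` on a helical wheel."""
    return (index * DEGREES_PER_RESIDUE) % 360.0

def same_face(i, j, tolerance=60.0):
    """True if residues i and j point in roughly the same direction.

    The tolerance is an arbitrary illustrative threshold.
    """
    diff = abs(wheel_angle(i) - wheel_angle(j))
    diff = min(diff, 360.0 - diff)  # shortest way around the circle
    return diff <= tolerance

for k in (1, 2, 4, 7):
    print(k, round(wheel_angle(k), 1), same_face(0, k))
```

Colouring each position by residue type (for example polar versus charged, as in the figure legend) then shows whether one face of the helix is predominantly hydrophobic.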

Page 18

Secondary structure - β sheets

Antiparallel Parallel

Source: Branden & Tooze (1991)

Page 19

Secondary structure - twist of β sheets

Mixed β sheet

Source: Branden & Tooze (1991)

Page 20

Angles of rotation

Each dipeptide unit is characterized by two angles of rotation:
  Phi, around the N-Cα bond
  Psi, around the Cα-C bond

Image from Branden & Tooze (1999)
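Both angles are dihedral (torsion) angles defined by four consecutive backbone atoms: C(i-1), N, Cα, C for phi, and N, Cα, C, N(i+1) for psi. The sketch below computes a signed dihedral from four 3D points using only the standard library. The helper names are illustrative, and the sign convention is chosen so that a cis arrangement gives 0 and a trans arrangement gives 180 degrees.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)   # normal of the plane (p0, p1, p2)
    n2 = _cross(b2, b3)   # normal of the plane (p1, p2, p3)
    x = _dot(n1, n2)
    y = _dot(b2, _cross(n1, n2)) / math.sqrt(_dot(b2, b2))
    return math.degrees(math.atan2(y, x))

def phi(c_prev, n, ca, c):
    """Rotation around the N-CA bond."""
    return dihedral(c_prev, n, ca, c)

def psi(n, ca, c, n_next):
    """Rotation around the CA-C bond."""
    return dihedral(n, ca, c, n_next)

# Toy coordinates (not from a real structure): a quarter-turn twist.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```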

Page 21

Dipeptide unit

The Ramachandran map

Slide courtesy from Shoshana Wodak


Page 22

Tertiary structure

Structure

Page 23

Combinations of secondary structures

loop

α-helix

β-sheet

Retinol binding protein (PDB:1rbp)

Page 24

Analysis of structure

Bioinformatics

Page 25

Question: Is structure A similar to structure B ?

Structure A

Structure B

Approach: structure alignments

Structure-structure alignment and comparison

Slide courtesy from Shoshana Wodak
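One simple way to put a number on the question "is A similar to B?" is to compare the internal distances of the two structures, which avoids having to superimpose them first. The sketch below computes a distance-matrix RMSD between two equal-length lists of coordinates. It assumes the residue correspondence is already known, which is precisely the part that structure alignment programs have to work out, so it illustrates the comparison step only and is not any particular published method.

```python
import math
from itertools import combinations

def distance_matrix_rmsd(coords_a, coords_b):
    """RMS difference between corresponding intra-structure distances.

    coords_a and coords_b are equal-length lists of (x, y, z) points,
    position i in one matching position i in the other.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of points")
    pairs = list(combinations(range(len(coords_a)), 2))
    if not pairs:
        raise ValueError("need at least two points")
    total = 0.0
    for i, j in pairs:
        d_a = math.dist(coords_a[i], coords_a[j])
        d_b = math.dist(coords_b[i], coords_b[j])
        total += (d_a - d_b) ** 2
    return math.sqrt(total / len(pairs))

# Toy coordinates: b is a rotated and translated copy of a, c is stretched.
a = [(0, 0, 0), (1, 0, 0), (1, 1, 0)]
b = [(5, 2, 1), (5, 3, 1), (4, 3, 1)]
c = [(0, 0, 0), (2, 0, 0), (2, 1, 0)]
print(distance_matrix_rmsd(a, b), distance_matrix_rmsd(a, c))
```

Because it uses only distances, this measure cannot tell a structure from its mirror image; methods based on superposition do not have that limitation.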

Page 26

Open form Closed form

Citrate synthase, ligand-induced conformational changes

Domain motion and small structural distortions

Analyzing conformational changes

Slide courtesy from Shoshana Wodak

Page 27

Defining Domains: What for?

Link between domain structure and function

Different structural domains can be associated with different functions.

Enzyme active sites are often at domain interfaces; domain movements play a functional role.

Cathepsin D

DNA Methyltransferase

Slide courtesy from Shoshana Wodak

Page 28

Figure: polypeptide chains (N- to C-terminus) divided into domains by 1 cut, 2 cuts, or 4 cuts.

Slide courtesy from Shoshana Wodak

Methods for Identifying Domains

Underlying principle: domain limits are defined by identifying groups of residues such that the number of contacts between groups is minimized.
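The principle can be illustrated for the simplest case, a single cut of the chain. The sketch below builds a residue contact map from coordinates and scans all cut positions for the one with the fewest contacts crossing it. The distance cutoff, the minimum segment size and the function names are assumptions made for this example; multi-cut partitions and any normalisation by domain size are left out.

```python
import math

def contact_map(coords, cutoff=8.0):
    """Boolean matrix: True where two distinct residues lie within `cutoff`."""
    n = len(coords)
    return [[i != j and math.dist(coords[i], coords[j]) <= cutoff
             for j in range(n)]
            for i in range(n)]

def best_single_cut(contacts, min_size=2):
    """Return (cut, crossing_contacts) for the best 1-cut partition.

    Residues [0, cut) form one group and [cut, n) the other. Returns None
    if the chain is too short for two groups of at least `min_size` residues.
    """
    n = len(contacts)
    best = None
    for cut in range(min_size, n - min_size + 1):
        crossing = sum(contacts[i][j]
                       for i in range(cut)
                       for j in range(cut, n))
        if best is None or crossing < best[1]:
            best = (cut, crossing)
    return best

# Toy chain: two compact groups of four residues, far apart from each other.
coords = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0),
          (20, 0, 0), (21, 0, 0), (22, 0, 0), (23, 0, 0)]
print(best_single_cut(contact_map(coords, cutoff=4.0)))
```

The minimum segment size is there because, without it, cutting off a single terminal residue would often give the trivially smallest number of contacts.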

Page 29

Domains From Contact Map

Lactate dehydrogenase

Slide courtesy from Shoshana Wodak

Page 30

Structure prediction

Structure

Page 31

Methods for structure prediction

Homology modelling: building a 3D model on the basis of similar sequences.

Threading: threading the sequence on all known protein structures, and testing the consistency.

Secondary structure prediction

ab initio prediction of tertiary structure
  For proteins of normal size, it is almost impossible to predict structures ab initio.
  Some results have been obtained in the prediction of oligopeptide structures.

Page 32

Homology modelling - steps

Similarity search
Modelling of backbone
  Secondary structure elements
  Loops
Modelling of side chains
Refinement of the model
Verification
  Steric compatibility of the residues

Page 33

Homology modelling - similarity search

Starting from a query sequence, search for similar sequences with known structure.
  Search for similar sequences in a database of protein structures.
  Multiple alignment.
  A weight can be assigned to each matching protein (higher score to more similar proteins).

The higher the sequence similarity, the more accurate the predicted structure.
  When structures are available for proteins with >70% similarity to the query, a good model can be expected.
  When the similarity is <40%, homology modelling gives poor results.

The lack of available structures constitutes one of the main limitations to homology modelling.
  In 2004, PDB contains
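The thresholds and the weighting idea above can be sketched in a few lines. This example uses percent identity over aligned, non-gap positions as a stand-in for "similarity", which is an assumption, since the slide does not say how similarity is measured. The function names, the "intermediate" label for the 40-70% range, and the proportional weighting scheme are likewise illustrative.

```python
def percent_identity(aligned_query, aligned_template):
    """Percent of identical residues over aligned, gap-free positions."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == "-" or t == "-":
            continue  # skip positions with a gap in either sequence
        compared += 1
        matches += (q == t)
    return 100.0 * matches / compared if compared else 0.0

def expected_quality(similarity):
    """Rule of thumb from the slide: >70% good, <40% poor."""
    if similarity > 70:
        return "good"
    if similarity < 40:
        return "poor"
    return "intermediate"

def template_weights(similarities):
    """Give each template a weight proportional to its similarity."""
    total = sum(similarities.values())
    return {name: value / total for name, value in similarities.items()}

identity = percent_identity("ACDE-F", "ACDQGF")
print(identity, expected_quality(identity))
print(template_weights({"template_1": 80.0, "template_2": 20.0}))
```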

Page 34

Homology modelling - Backbone modelling

Modelling of secondary structure elements
  α-helices
  β-sheets
  For each secondary structure element of the template, align the backbone of query and template.

Loop modelling
  Databases of loop regions
  Loop main chain depends on the number of amino acids and on the neighbouring elements (α-α, α-β, β-α, β-β)

Page 35

Homology modelling - Side chain modelling

Side-chain conformation (model building and energy refinement)
  Conserved side chains take the same coordinates as in the template.
  For non-conserved side chains, use rotamer libraries to determine the most favourable conformation.
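Choosing among rotamers can be sketched as a small selection problem. In the example below, each candidate rotamer carries a library probability and a set of side-chain atom positions, and the preferred one is the candidate with the fewest steric clashes against the surrounding atoms, with ties broken by probability. The clash count is a crude stand-in for the energy function a real modelling program would use, and the 3.0 Å threshold, the data layout and the toy coordinates are all assumptions.

```python
import math

def clash_count(atoms, environment, min_dist=3.0):
    """Number of (atom, environment atom) pairs closer than `min_dist`."""
    return sum(1 for a in atoms for e in environment
               if math.dist(a, e) < min_dist)

def pick_rotamer(rotamers, environment):
    """Pick the rotamer with the fewest clashes, then the highest probability.

    `rotamers` is a list of (name, probability, atom_coords) tuples.
    """
    return min(rotamers,
               key=lambda r: (clash_count(r[2], environment), -r[1]))

# Toy data: one neighbouring atom at the origin, three candidate rotamers.
environment = [(0.0, 0.0, 0.0)]
rotamers = [
    ("r1", 0.6, [(1.0, 0.0, 0.0)]),   # most probable, but clashes
    ("r2", 0.3, [(5.0, 0.0, 0.0)]),
    ("r3", 0.1, [(0.0, 6.0, 0.0)]),
]
print(pick_rotamer(rotamers, environment)[0])
```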

Page 36

Homology modelling - refinement

After the steps above have been completed, the model can be refined by modifying the positions of some atoms in order to reduce the energy.
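The idea of moving atoms to lower the energy can be shown with a toy steepest-descent loop. The energy used here is a sum of harmonic bond-length terms only, with a numerically estimated gradient. Real refinement relies on full force fields with many more terms, so the energy function, the parameters and the function names below are assumptions for illustration.

```python
import math

def energy(coords, bonds, k=1.0):
    """Toy energy: sum of k * (distance - ideal_length)^2 over bonds.

    `bonds` is a list of (i, j, ideal_length) tuples.
    """
    return sum(k * (math.dist(coords[i], coords[j]) - d0) ** 2
               for i, j, d0 in bonds)

def minimize(coords, bonds, step=0.05, iterations=200, h=1e-5):
    """Steepest descent using a central-difference gradient."""
    coords = [list(p) for p in coords]
    for _ in range(iterations):
        gradient = []
        for atom in range(len(coords)):
            g = []
            for axis in range(3):
                original = coords[atom][axis]
                coords[atom][axis] = original + h
                e_plus = energy(coords, bonds)
                coords[atom][axis] = original - h
                e_minus = energy(coords, bonds)
                coords[atom][axis] = original
                g.append((e_plus - e_minus) / (2 * h))
            gradient.append(g)
        # Move every atom a small step against its gradient.
        for atom, g in enumerate(gradient):
            for axis in range(3):
                coords[atom][axis] -= step * g[axis]
    return [tuple(p) for p in coords]

# Toy model: one bond stretched to 2.5 with an ideal length of 1.5.
start = [(0.0, 0.0, 0.0), (2.5, 0.0, 0.0)]
bonds = [(0, 1, 1.5)]
refined = minimize(start, bonds)
print(energy(start, bonds), energy(refined, bonds))
```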