Computational Biology Tools

Lecture 15: Protein Structure Prediction/Analysis

Brian Kidd

November 23, 2010

*Slides from David Bernick and Carol Rohl



Questions/Concerns from Last Time


Overview

1. Structure alignments

• methods and applications

2. Protein structure prediction

• methods and applications

3. Case study

4. 3D structure visualization


Structure more conserved than sequence

Why Examine Protein Structures?

Similar folds often share similar function

Remote similarities may only be detectable at structure level

Interpret experimental data

Locate sites of interesting mutations

Locate splice sites

Design experiments

In silico mutagenesis


Structure Analysis

Identify interesting sites on a protein

Homologs

Mutants

With and without ligand (or binding partner)

Measure geometry (distances, angles, ...)

Examine surface properties (shape, charge)

Compare two structures
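As a minimal sketch of the "measure geometry" item above, the snippet below computes a distance and an angle from atomic coordinates. The function names and the coordinates are made up for illustration; in practice the coordinates would come from a PDB file.

```python
import math

def distance(a, b):
    """Euclidean distance between two atoms given as (x, y, z) tuples."""
    return math.dist(a, b)

def angle_deg(a, b, c):
    """Angle in degrees at the middle atom b, formed by atoms a-b-c."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    # Clamp to [-1, 1] to guard against rounding error before acos.
    cos_theta = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos_theta))

# Hypothetical coordinates in Angstroms (not from a real structure).
atom1, atom2, atom3 = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)
print(distance(atom1, atom2), angle_deg(atom1, atom2, atom3))
```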


Comparing Protein Structures

Defined alignment

Mutant v. wildtype, model v. experimental, i.e. two different conformations

Unique solution exists – we know the true alignment

Derived alignment

Query is an unknown protein

Known parent (assumed homolog)

Calculate an “optimal” alignment computationally

Infer annotation from parent to query


What do we want from an alignment?

Optimal alignment

Important parts of protein should associate (align) with each other

Catalytic residues and their positions

Important structures (hinges, binding sites, etc.)

Protein interface residues and their positions

Evolutionary history

Natural selection only selects for successful function

Sequences (and alignments) are assumed to be sequential


What do we want from an alignment?

Sequence alignments can be improved when we have structural information

No unique solution (more residues or closer match?)

Structural alignment implies a sequence alignment


Tools and Databases

NCBI Structure (VAST and MMDB)

http://www.ncbi.nlm.nih.gov/Structure/

Molecular Modeling Database

Experimentally derived structures from PDB (not theoretical)

FSSP (DALI)

http://www2.embl-ebi.ac.uk/dali/fssp/

http://ekhidna.biocenter.helsinki.fi/dali

Families of structurally similar proteins

Maintains database of protein neighbors organized by PDB code

Fully automated using the DALI algorithm (Holm & Sander)

No internal node annotations

Structural similarity search using DALI

CE

http://cl.sdsc.edu/

Combinatorial extension

Maintains database of protein neighbors by PDB code


Tools and Databases

Structure classification by domain

Classification based on secondary structure

SCOP – Structural Classification of Proteins

http://scop.berkeley.edu/

Class-fold-superfamily-family

Manual assembly by inspection (last release June 2009)

CATH – Class-Architecture-Topology-Homologous Superfamily

http://www.biochem.ucl.ac.uk/bsm/cath/

Manual classification at Architecture level

Automated topology classification using SSAP (Orengo & Taylor)

Last release July 2009

CEMC – Multiple Structure Alignment

http://bioinformatics.albany.edu/~cemc/


How Structure Alignments Work

Methods

Structal – Gerstein group at Yale

DALI – Holm group at Helsinki

VAST – NCBI resource

Structure similarity measures

RMSD – similarity metric

P-values – significance measure


Iterative Dynamic Programming

Algorithm

1. Make initial guess for the superposition

2. Calculate all pairwise Cα–Cα distances and generate a scoring matrix

3. Find optimal alignment according to this scoring matrix by dynamic programming

4. Re-superimpose structures using this alignment

5. Repeat steps 2–4 until converged

No guarantee of an optimal solution; the final result depends on the initial alignment selected

Structal: Subbiah et al., Curr. Biol. 3:141 (1993)
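A rough sketch of one pass through steps 2 and 3 of the loop above, assuming the two structures are already superimposed (steps 1 and 4 are not shown). The scoring form `m / (1 + (d / d0)**2)`, the parameter values, and the linear gap penalty are assumptions chosen for illustration, not details taken from the slides.

```python
import math

def score_matrix(coords_a, coords_b, m=20.0, d0=2.24):
    """Step 2: score every C-alpha pair; closer pairs score higher.

    m and d0 (Angstroms) are illustrative assumptions.
    """
    return [[m / (1.0 + (math.dist(p, q) / d0) ** 2) for q in coords_b]
            for p in coords_a]

def align(scores, gap=10.0):
    """Step 3: global dynamic programming over the scoring matrix.

    Returns the matched residue index pairs (i, j) in order.
    """
    n, k = len(scores), len(scores[0])
    best = [[0.0] * (k + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = -gap * i
    for j in range(1, k + 1):
        best[0][j] = -gap * j
    for i in range(1, n + 1):
        for j in range(1, k + 1):
            best[i][j] = max(best[i - 1][j - 1] + scores[i - 1][j - 1],
                             best[i - 1][j] - gap,
                             best[i][j - 1] - gap)
    # Trace back from the bottom-right corner to recover the pairs.
    pairs, i, j = [], n, k
    while i > 0 and j > 0:
        if best[i][j] == best[i - 1][j - 1] + scores[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif best[i][j] == best[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]

# Toy, made-up coordinates: chain B overlaps the last two residues of A.
chain_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
chain_b = [(3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
print(align(score_matrix(chain_a, chain_b)))
```

In the full procedure, the returned pairs would be used to re-superimpose the structures before the scoring matrix is rebuilt.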


Structural Alignment

Many methods other than dynamic programming are used

Most methods use heuristics to speed things up and to make a good initial guess

Sheba – sequence alignment

Mammoth – local structural alignment

VAST – aligns secondary structure element vectors

DALI – distance matrix alignment


Distance Matrix Alignment

Matrix of all pairwise distances

Characteristic patterns:

Runs along the main diagonal correspond to helices (i.e. local contacts)

Hairpins – start on main diagonal, run perpendicular

Parallel pairs run parallel to main diagonal

Others are long range contacts

Converts 3D alignment problem into a 2D problem

Find best subset of rows and columns such that the distance matrices of two proteins are optimally similar
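A small sketch of the distance matrix idea, assuming one coordinate (for example Cα) per residue. The 8 Å contact cutoff and the function names are illustrative assumptions. Because the matrix depends only on internal distances, it is unchanged when the structure is moved, which is why no superposition is needed.

```python
import math

def distance_matrix(coords):
    """All pairwise distances between residues (one point per residue)."""
    return [[math.dist(p, q) for q in coords] for p in coords]

def contact_map(dmat, cutoff=8.0):
    """Boolean matrix: True where two residues are within the cutoff."""
    return [[d <= cutoff for d in row] for row in dmat]

# Made-up coordinates in Angstroms.
coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
dmat = distance_matrix(coords)

# Translating the whole structure leaves the distance matrix unchanged.
shifted = [(x + 1.0, y - 2.0, z + 3.0) for x, y, z in coords]
same = all(abs(a - b) < 1e-9
           for row_a, row_b in zip(dmat, distance_matrix(shifted))
           for a, b in zip(row_a, row_b))
print(contact_map(dmat), same)
```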

Myoglobin



Contact Map Comparison

[Figure: contact maps for myoglobin and Protein G, with α-helix, β-hairpin, and parallel (//) strand patterns labeled]


Similarity Measure

RMSD – root mean square deviation

$\mathrm{RMSD} = \sqrt{\left\langle \lVert X_i^A - X_i^B \rVert^2 \right\rangle}$

1. Superimpose optimally

2. Pair up residues

3. Calculate RMSD


Sensitive to outliers

Depends on number of pairs compared

A better measure is the significance of this RMSD for similar sized matches
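A minimal sketch of step 3, assuming the structures are already superimposed and the residues already paired (steps 1 and 2). The coordinates are made up; the second example illustrates the outlier sensitivity noted above.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation over paired, superimposed coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

# Ten made-up pairs, each displaced by 1 Angstrom along x.
ref = [(float(i), 0.0, 0.0) for i in range(10)]
close = [(x + 1.0, y, z) for x, y, z in ref]
print(rmsd(ref, close))

# Same pairs, but one residue is displaced by 10 Angstroms instead.
outlier = close[:9] + [(ref[9][0] + 10.0, 0.0, 0.0)]
print(rmsd(ref, outlier))
```

A single badly placed residue roughly triples the RMSD here, even though nine of the ten pairs are unchanged.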


z-scores and p-values


z-score: number of standard deviations above/below the mean

± 1 sd ~ 68%

± 2 sd ~ 95%

If we have a histogram, we can just count, or integrate a function fitted to the histogram

p-value: probability of obtaining a score ≥ this one under the null model (normally distributed data, i.e. “by chance”)

Histogram of scores for random matches
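A short sketch of both routes to a p-value, assuming a list of scores from random matches is available (the numbers below are made up). `p_value_normal` integrates a fitted normal curve; `p_value_empirical` simply counts in the histogram.

```python
import math
from statistics import mean, stdev

def z_score(score, random_scores):
    """Number of standard deviations the score lies above the random mean."""
    return (score - mean(random_scores)) / stdev(random_scores)

def p_value_normal(z):
    """P(Z >= z) for a standard normal null model."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def p_value_empirical(score, random_scores):
    """Fraction of random scores at least as large as the observed one."""
    return sum(s >= score for s in random_scores) / len(random_scores)

# Made-up scores for random matches and one observed score.
random_scores = [1, 2, 3, 4, 5]
observed = 4
z = z_score(observed, random_scores)
print(z, p_value_normal(z), p_value_empirical(observed, random_scores))

# Fraction of a normal distribution within +/- 1 and +/- 2 sd.
print(1 - 2 * p_value_normal(1.0), 1 - 2 * p_value_normal(2.0))
```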


Meaning of Structural Alignments

Two proteins are clearly structurally similar

Mammoth identifies similar substructures, but the alignment is not entirely “correct”

Opportunistically matched residues

Misses some analogous elements

1ubq 4fxc


Why Predict Protein Structures?


Test models and theories about structural biochemistry

Why Predict Protein Structures?

Identify drug targets for medicine

Experimental structure determination is still slow, and not all structures are easily solved

Explore states that are difficult to examine experimentally


Challenges for Structure Prediction

Search space is astronomical – need an efficient sampling algorithm

Actual proteins tend to be in energy minima – need a scoring system for discriminating between models
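To give a feel for "astronomical", here is a back-of-the-envelope count in the spirit of the classic Levinthal argument. The numbers are assumptions for illustration only: 3 backbone states per residue, a 100-residue chain, and 10^13 conformations sampled per second.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def conformations(n_residues, states_per_residue=3):
    """Number of backbone conformations if each residue is independent."""
    return states_per_residue ** n_residues

def years_to_enumerate(n_residues, states_per_residue=3, rate_per_sec=1e13):
    """Years needed to visit every conformation at the assumed rate."""
    total = conformations(n_residues, states_per_residue)
    return total / rate_per_sec / SECONDS_PER_YEAR

print(f"{conformations(100):.2e} conformations")
print(f"{years_to_enumerate(100):.2e} years to enumerate")
```

Exhaustive enumeration is out of reach even for a small protein, hence the need for efficient sampling.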


CASP

Critical Assessment of Structure Prediction

Community effort to improve predictions

Forced scientists to start learning what actually works in prediction


Types of Predictions

Comparative Model

Ab initio or de novo


Stage I. Fragment Assembly

Baker Method

*Slides from Rhiju Das


Stage II. All-atom refinement

Baker Method

*Slides from Rhiju Das


Example


Native Model

2.0 Å over 61 residues

CASP7 target T0316 (domain 3)

*Slides from Rhiju Das


Case Study

PAZ domain of Pf_Ago

Ji-Joon Song [PMID: 15284453] asserts on p. 1435 that the following are functionally equivalent:

Pf_Ago (1u04)    hAgo1 (1si2/1si3)
Y212             Y309 (Y90)
Y216             Y314 (Y95)
H217             H269 (H49)
Y190             Y277 (Y57)

What do you think?


Case Study Continued

1u04 – PAZ domain, chain A 152-275

1si2 – PAZ domain, chain A 4-128

http://www.pdb.org

Get the above structures

http://www.ebi.ac.uk/DaliLite/

Align 1U04 with 1SI2:A (h_Ago)

What is the z-score? Is it significant?

What is the RMSD? Is this a reasonable alignment?

How many residues aligned?


PyMol In Class

Load both molecules (aligned) into PyMol

Action-preset-pretty for both molecules

For 1u04, delete everything you don’t need

select-rename object 1u04

chain B; select-remove atoms

chain A and resi 1-151; select-remove atoms

chain A and resi 276-770; select-remove atoms

color red

Load 1si2, color it yellow

chain B is a small RNA; show spheres, chain B; color blue

select 1u04 and resi 212; show as sticks

repeat for 190, 216, 217

select 1si2 and resi 309; show as sticks

repeat for 269, 277, 314
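The menu steps above can also be written as a PyMOL command script. The sketch below only builds the command lines as strings so they can be inspected or saved to a `.pml` file; the file names `1u04.pdb` and `1si2.pdb` and the helper names are hypothetical, and the files are assumed to be the aligned structures already saved locally.

```python
def show_sticks(obj, residues):
    """PyMOL command showing the listed residues of one object as sticks."""
    resi = "+".join(str(r) for r in residues)
    return f"show sticks, {obj} and resi {resi}"

def build_script(pf_file="1u04.pdb", h_file="1si2.pdb"):
    """Return the in-class steps as a list of PyMOL command lines."""
    return [
        f"load {pf_file}, 1u04",
        # Keep only the PAZ domain of 1u04 (chain A, residues 152-275).
        "remove 1u04 and chain B",
        "remove 1u04 and chain A and resi 1-151",
        "remove 1u04 and chain A and resi 276-770",
        "color red, 1u04",
        f"load {h_file}, 1si2",
        "color yellow, 1si2",
        # Chain B of 1si2 is the small RNA.
        "show spheres, 1si2 and chain B",
        "color blue, 1si2 and chain B",
        show_sticks("1u04", [190, 212, 216, 217]),
        show_sticks("1si2", [269, 277, 309, 314]),
    ]

script = build_script()
print("\n".join(script))
```

Saving the printed lines to a file and running it from the PyMOL command line should reproduce the view, under the naming assumptions above.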


In Class Summary

So, who’s correct?

Is J.J. Song correct?

Is Dali?

Is Vast?


Essentials at this Point

Accessing literature and sequence information from various databases (NCBI and UCSC)

BLAST (all variants)

Pairwise sequence analysis tools and algorithms

Single sequence analysis tools (DNA): EMBOSS, ORFs, restriction enzymes, and primers

Protein databases and analysis tools

PSI and PHI BLASTs

Multiple sequence alignments

Phylogeny

RNA structure (basics and analytical tools)

Protein structure (basics and analytical tools)

This is everything!


For Next Time

Reading

Problem set

Review

Finish up PS #3 (due Tuesday, November 23)

Start working on PS #4 (due Friday, December 3)

http://www.soe.ucsc.edu/classes/bme110/Fall10/calendar.html