
Brian Kidd

November 23, 2010

Computational Biology Tools

Lecture 15:

Protein Structure Prediction/Analysis

*Slides from David Bernick and Carol Rohl

Questions/Concerns from Last Time

Overview

1. Structure alignments

• methods and applications

2. Protein structure prediction

• methods and applications

3. Case study

4. 3D structure visualization

Why Examine Protein Structures?

Structure more conserved than sequence

Similar folds often share similar function

Remote similarities may only be detectable at structure level

Interpret experimental data

• Locate sites of interesting mutations

• Locate splice sites

Design experiments

• In silico mutagenesis

Structure Analysis

Identify interesting sites on a protein

Homologs

Mutants

With and without ligand (or binding partner)

Measure geometry (distances, angles, ...)

Examine surface properties (shape, charge)

Compare two structures
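As a minimal sketch of the "measure geometry" item above, the functions below compute a distance and an angle from atomic coordinates. The function names and the coordinates are hypothetical and chosen only for illustration; real coordinates would come from a PDB file.

```python
import math

def distance(a, b):
    """Euclidean distance between two points given as (x, y, z) tuples."""
    return math.dist(a, b)

def angle(a, b, c):
    """Angle in degrees at vertex b, formed by the points a-b-c."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    # Clamp to [-1, 1] to guard against rounding error before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

# Hypothetical coordinates in Å, for illustration only.
p1, p2, p3 = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)
print(distance(p1, p2))
print(angle(p1, p2, p3))
```

Here the three points form a right angle at p2, and p1 lies one unit from p2.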

Comparing Protein Structures

Defined alignment

• Mutant v. wildtype, model v. experimental, i.e. two different conformations

• Unique solution exists – we know the true alignment

Derived alignment

• Query is an unknown protein

• Known parent (assumed homolog)

• Calculate an “optimal” alignment computationally

• Infer annotation from parent to query

What do we want from an alignment?

Optimal alignment

• Important parts of protein should associate (align) with each other

• Catalytic residues and their positions

• Important structures (hinges, binding sites, etc.)

• Protein interface residues and their positions

Evolutionary history

• Natural selection only selects for successful function

• Sequences (and alignments) are assumed to be sequential


Sequence alignments can be improved when we have structural information

No unique solution (more residues or closer match?)

Structural alignment implies a sequence alignment

Tools and Databases

NCBI Structure (VAST and MMDB)

• http://www.ncbi.nlm.nih.gov/Structure/

• Molecular Modeling Database

• Experimentally derived structures from PDB (not theoretical)

FSSP (DALI)

• http://www2.embl-ebi.ac.uk/dali/fssp/

• http://ekhidna.biocenter.helsinki.fi/dali

• Families of structurally similar proteins

• Maintains database of protein neighbors organized by PDB code

• Fully automated using the DALI algorithm (Holm & Sander)

• No internal node annotations

• Structural similarity search using DALI

CE

• http://cl.sdsc.edu/

• Combinatorial extension

• Maintains database of protein neighbors by PDB code

Tools and Databases

Structure classification by domain

Classification based on secondary structure

SCOP – Structural Classification of Proteins

• http://scop.berkeley.edu/

• Class-fold-superfamily-family

• Manual assembly by inspection (last release June 2009)

CATH – Class-Architecture-Topology-Homologous Superfamily

• http://www.biochem.ucl.ac.uk/bsm/cath/

• Manual classification at Architecture level

• Automated topology classification using SSAP (Orengo & Taylor)

• Last release July 2009

CEMC – Multiple Structure Alignment

• http://bioinformatics.albany.edu/~cemc/

How Structure Alignments Work

Methods

• Structal – Gerstein group at Yale

• DALI – Holm group at Helsinki

• VAST – NCBI resource

Structure similarity measures

• RMSD – similarity metric

• P-values – significance measure

Iterative Dynamic Programming

Algorithm

1. Make initial guess for the superposition

2. Calculate all pairwise Cα-Cα distances and generate scoring matrix

3. Find optimal alignment according to this scoring matrix by dynamic programming

4. Re-superimpose structures using this alignment

5. Repeat steps 2–4 until converged

No guarantee of an optimal solution; the final result depends on the initial alignment selected

Structal: Subbiah et al., Curr. Biol. 3:141 (1993)
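Steps 2 and 3 of the algorithm above can be sketched as follows. This is a minimal illustration, not the Structal implementation: it assumes the two coordinate sets are already superimposed (steps 1 and 4 are omitted), and the scoring function, the parameters M and d0, the gap penalty, and the function names are all assumptions chosen for the sketch.

```python
import math

def score_matrix(A, B, M=20.0, d0=2.24):
    """Step 2: turn pairwise Cα-Cα distances into similarity scores.
    S[i][j] = M / (1 + (d_ij / d0)^2); M and d0 are illustrative values."""
    return [[M / (1.0 + (math.dist(a, b) / d0) ** 2) for b in B] for a in A]

def dp_align(S, gap=10.0):
    """Step 3: global dynamic programming with a linear gap penalty.
    Returns the aligned index pairs (i, j)."""
    n, m = len(S), len(S[0])
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = -gap * i
    for j in range(1, m + 1):
        F[0][j] = -gap * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + S[i - 1][j - 1],
                          F[i - 1][j] - gap,
                          F[i][j - 1] - gap)
    # Trace back from the bottom-right corner to recover the matched pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if F[i][j] == F[i - 1][j - 1] + S[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]

# Hypothetical Cα traces in Å: B is A with one extra residue looped out.
A = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
B = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.7, 5.0, 0.0),
     (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
print(dp_align(score_matrix(A, B)))  # [(0, 0), (1, 1), (2, 3), (3, 4)]
```

In a full iteration, the returned pairs would be used to re-superimpose the structures before the scoring matrix is recomputed. The extra residue in B is left unmatched because a gap costs less than pairing it with a distant residue.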

Structural Alignment

Many methods other than dynamic programming are used

Most methods use some sort of heuristic to speed things up and make a good initial guess

Sheba – sequence alignment

Mammoth – local structural alignment

VAST – aligns secondary structure element vectors

DALI – distance matrix alignment

Distance Matrix Alignment

Matrix of all pairwise distances

Characteristic patterns:

Main diagonal runs correspond to helix (i.e. local contacts)

Hairpins – start on main diagonal, run perpendicular

Parallel pairs run parallel to main diagonal

Others are long range contacts

Converts 3D alignment problem into a 2D problem

Find best subset of rows and columns such that the distance matrices of two proteins are optimally similar
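To make the "matrix of all pairwise distances" concrete, here is a small sketch that builds a distance matrix and thresholds it into a contact map. It does not implement the DALI comparison itself. The 8 Å cutoff, the function names, and the straight-chain coordinates are assumptions for illustration.

```python
import math

def distance_matrix(coords):
    """All pairwise distances between residues (e.g. Cα atoms)."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def contact_map(dmat, cutoff=8.0):
    """Binary map: 1 where two residues are closer than `cutoff` Å."""
    return [[1 if d < cutoff else 0 for d in row] for row in dmat]

# Hypothetical extended chain with 3.8 Å spacing between consecutive Cα atoms.
coords = [(3.8 * i, 0.0, 0.0) for i in range(5)]
for row in contact_map(distance_matrix(coords)):
    print(row)
```

For this extended chain only residues up to two positions apart fall under the cutoff, so the contacts hug the main diagonal. A folded protein adds the off-diagonal patterns described above.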

[Figure: myoglobin distance matrix]

Contact Map Comparison

[Figure: contact maps of myoglobin and protein G, with parallel (//) strands, α-helix, and β-hairpin patterns labeled]

Similarity Measure

RMSD – root mean square deviation

$$\mathrm{RMSD} = \sqrt{\left\langle \lVert X_i^A - X_i^B \rVert^2 \right\rangle}$$

1. Superimpose optimally

2. Pair up residues

3. Calculate RMSD
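Step 3 can be sketched as below, assuming steps 1 and 2 are already done, so that the two coordinate lists are superimposed and A[i] is paired with B[i]. The function name and the example coordinates are hypothetical.

```python
import math

def rmsd(A, B):
    """RMSD over paired coordinates. Assumes A and B are already
    superimposed and that A[i] is paired with B[i]."""
    if len(A) != len(B) or not A:
        raise ValueError("need two equal-length, non-empty coordinate lists")
    sq = sum(math.dist(a, b) ** 2 for a, b in zip(A, B))
    return math.sqrt(sq / len(A))

# Hypothetical example: nine identical pairs and one pair displaced by 10 Å.
A = [(float(i), 0.0, 0.0) for i in range(10)]
B = list(A)
B[9] = (9.0, 10.0, 0.0)
print(round(rmsd(A, B), 2))  # 3.16
```

A single displaced pair out of ten gives an RMSD above 3 Å even though the other nine pairs match exactly.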


Sensitive to outliers

Depends on number of pairs compared

A better measure is the significance of this RMSD for similar sized matches

z-scores and p-values


z-score: number of standard deviations above/below the mean

± 1 sd ~ 68%

± 2 sd ~ 95%

If we have a histogram, we can just count, or integrate a function fitted to the histogram

p-value: probability of obtaining a score ≥ this one under the null model (normally distributed data, i.e. “by chance”)

Histogram of scores for random matches
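The two routes to a p-value mentioned above (counting in the histogram, or integrating a fitted function) can be sketched as follows. This is an illustration under assumptions: the null model is taken to be normal, the sample standard deviation is used, and the function names and scores are hypothetical.

```python
import math
import statistics

def z_score(score, random_scores):
    """Number of standard deviations the score lies from the mean of
    the random-match scores (sample standard deviation)."""
    mu = statistics.mean(random_scores)
    sd = statistics.stdev(random_scores)
    return (score - mu) / sd

def p_value_normal(z):
    """One-sided P(Z >= z), integrating a standard normal null model."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def p_value_empirical(score, random_scores):
    """Fraction of random-match scores >= the observed score (counting)."""
    return sum(s >= score for s in random_scores) / len(random_scores)

# Hypothetical scores for random matches, for illustration only.
random_scores = [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 4.0, 5.0]
z = z_score(6.0, random_scores)
print(z, p_value_normal(z), p_value_empirical(6.0, random_scores))

# Fraction of a normal distribution within ± 1 and ± 2 standard deviations.
print(1 - 2 * p_value_normal(1.0), 1 - 2 * p_value_normal(2.0))
```

With few random matches the counting estimate is coarse and can come out as zero, which is one reason to fit a function to the histogram.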

Meaning of Structural Alignments

[Figure: residue-level alignment of the two proteins compared on this slide]

Two proteins are clearly structurally similar

Mammoth identifies similar substructures, but the alignment is not entirely “correct”

• Opportunistic matched residues

• Misses some analogous elements

1ubq 4fxc

Why Predict Protein Structures?

Test models and theories about structural biochemistry


Identify drug targets for medicine

Experimental structure determination is still slow, and not all structures are easily solved

Explore states that are difficult to examine experimentally

Challenges for Structure Prediction

Search space is astronomical – need an efficient sampling algorithm

Actual proteins tend to be in energy minima – need a scoring system for discriminating between models
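A rough back-of-the-envelope count shows why the search space is called astronomical. The figure of three conformations per residue is an assumption made for this sketch (a Levinthal-style estimate), not a number from the slides.

```python
def conformations(n_residues, states_per_residue=3):
    """Number of backbone conformations if every residue independently
    adopts one of `states_per_residue` states (a simplifying assumption)."""
    return states_per_residue ** n_residues

# A hypothetical 100-residue protein: on the order of 10^47 conformations.
print(f"{conformations(100):.2e}")
```

Exhaustive enumeration is out of reach even for a small protein, so prediction methods have to sample the space selectively.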

CASP

Critical Assessment of Structure Prediction

Community effort to improve predictions

Forced scientists to start learning what actually works in prediction

Types of Predictions

Comparative Model

Ab initio or de novo

Baker Method

Stage I. Fragment assembly

Stage II. All-atom refinement

*Slides from Rhiju Das

Example

Native vs. model: 2.0 Å over 61 residues

CASP7 target T0316 (domain 3)


Case Study

PAZ domain of Pf_Ago

Ji-Joon Song [PMID: 15284453] asserts on p. 1435 that the following are functionally equivalent:

Pf_Ago (1u04)    hAgo1 (1si2/1si3)
Y212             Y309 (Y90)
Y216             Y314 (Y95)
H217             H269 (H49)
Y190             Y277 (Y57)

What do you think?

Case Study Continued

1u04 – PAZ domain, chain A 152-275

1si2 – PAZ domain, chain A 4-128

http://www.pdb.org

• Get the above structures

http://www.ebi.ac.uk/DaliLite/

• Align 1U04 with 1SI2:A (h_Ago)

What is the z-score? Is it significant?

What is the RMSD? Is this a reasonable alignment?

How many residues aligned?

PyMol In Class

Load both molecules (aligned) into PyMol

Action-preset-pretty for both molecules

For 1u04, delete everything you don’t need

select-rename object 1u04

chain B; select-remove atoms

chain A and resi 1-151; select-remove atoms

chain A and resi 276-770; select-remove atoms

color red

Load 1si2, color it yellow

chain B is a small RNA; show spheres, chain B; color blue

select 1u04 and resi 212; show as sticks

repeat for 190, 216, 217

select 1si2 and resi 309; show as sticks

repeat for 269, 277, 314
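The menu steps above can also be written as a PyMOL command script. The sketch below only generates the command lines (for saving to a .pml file); it does not call PyMOL. The file names are hypothetical placeholders for the aligned structures saved from DaliLite, and the residue numbers are copied from the slides, so they may need adjusting if the numbering in the loaded file differs (the case study also lists Y90, Y95, H49, and Y57 in parentheses).

```python
def build_pml(pairs, pf_file="1u04_aligned.pdb", h_file="1si2_aligned.pdb"):
    """Return PyMOL command-language lines for the in-class steps.
    `pairs` holds (Pf_Ago residue, hAgo1 residue) numbers; the file names
    are hypothetical placeholders."""
    pf_resi = "+".join(str(a) for a, _ in pairs)
    h_resi = "+".join(str(b) for _, b in pairs)
    return [
        f"load {pf_file}, 1u04",
        f"load {h_file}, 1si2",
        # Trim 1u04 down to the PAZ domain (chain A, residues 152-275).
        "remove 1u04 and chain B",
        "remove 1u04 and chain A and resi 1-151",
        "remove 1u04 and chain A and resi 276-770",
        "color red, 1u04",
        "color yellow, 1si2",
        # Chain B of 1si2 is the small RNA.
        "show spheres, 1si2 and chain B",
        "color blue, 1si2 and chain B",
        f"show sticks, 1u04 and resi {pf_resi}",
        f"show sticks, 1si2 and resi {h_resi}",
    ]

# Residue pairs proposed as functionally equivalent in the case study.
pairs = [(212, 309), (216, 314), (217, 269), (190, 277)]
print("\n".join(build_pml(pairs)))
```

The printed lines can be saved to a file and run from the PyMOL command line with `@filename.pml`.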

In Class Summary

So, who’s correct?

Is J.J. Song correct?

Is Dali?

Is Vast?

Essentials at this Point

Accessing literature and sequence information from various databases (NCBI and UCSC)

BLAST (all variants)

Pairwise sequence analysis tools and algorithms

Single sequence analysis tools (DNA): EMBOSS, ORFs, restriction enzymes, and primers

Protein databases and analysis tools

PSI and PHI BLASTs

Multiple sequence alignments

Phylogeny

RNA structure (basics and analytical tools)

Protein structure (basics and analytical tools)

This is everything!

For Next Time

Reading

Problem set

Review

Finish up PS #3 (due Tuesday, November 23)

Start working on PS #4 (due Friday, December 3)

http://www.soe.ucsc.edu/classes/bme110/Fall10/calendar.html