Modelling Genome Structure and Function Ram Samudrala University of Washington.

Page 1

Modelling Genome Structure and Function

Ram Samudrala

University of Washington

Page 2

Rationale for understanding protein structure and function

Protein sequence

- large numbers of sequences, including whole genomes

Protein function

- rational drug design and treatment of disease
- protein and genetic engineering
- build networks to model cellular pathways
- study organismal function and evolution

?

structure determination, structure prediction

homology, rational mutagenesis, biochemical analysis

model studies

Protein structure

- three dimensional
- complicated
- mediates function

Page 3

Comparative modelling of protein structure

KDHPFGFAVPTKNPDGTMNLMNWECAIP
KDPPAGIGAPQDN----QNIMLWNAVIP
** * *   *  *     * * *   **

… …

scan, align

refine

physical functions

build initial model

minimum perturbation

construct non-conserved side chains and main chains

graph theory, semfold

de novo simulation
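The example alignment on this slide marks identical columns with asterisks. A minimal sketch of how such a match line and a percent identity could be computed is given below; the function names are illustrative, and dividing by the number of non-gap columns is an assumption (other conventions divide by the full alignment length or by the shorter sequence).

```python
def match_line(seq_a: str, seq_b: str, gap: str = "-") -> str:
    """Return a line with '*' under every column where both residues agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return "".join(
        "*" if a == b and a != gap else " "
        for a, b in zip(seq_a, seq_b)
    )


def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Identical columns divided by columns where neither sequence has a gap."""
    identical = match_line(seq_a, seq_b, gap).count("*")
    aligned = sum(1 for a, b in zip(seq_a, seq_b) if a != gap and b != gap)
    return 100.0 * identical / aligned


# The two aligned sequences shown on the slide.
top = "KDHPFGFAVPTKNPDGTMNLMNWECAIP"
bottom = "KDPPAGIGAPQDN----QNIMLWNAVIP"

print(top)
print(bottom)
print(match_line(top, bottom))
print(f"{percent_identity(top, bottom):.1f}% identity over aligned columns")
```

The number of asterisks produced this way agrees with the number of asterisks on the slide.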

Page 4

CASP4: overall model accuracy ranging from 1 Å to 6 Å for 50-10% sequence identity

**T112/dhso – 4.9 Å (348 residues; 24%)
**T92/yeco – 5.6 Å (104 residues; 12%)
**T128/sodm – 1.0 Å (198 residues; 50%)
**T125/sp18 – 4.4 Å (137 residues; 24%)
**T111/eno – 1.7 Å (430 residues; 51%)
**T122/trpa – 2.9 Å (241 residues; 33%)

Comparative modelling at CASP

         alignment   side chain   short loops   longer loops
BC       excellent   ~80%         1.0 Å         2.0 Å
CASP1    poor        ~50%         ~3.0 Å        >5.0 Å
CASP2    fair        ~75%         ~1.0 Å        ~3.0 Å
CASP3    fair        ~75%         ~1.0 Å        ~2.5 Å
CASP4    fair        ~75%         ~1.0 Å        ~2.0 Å
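The slide does not name the accuracy measure behind the Å values. Figures of this kind are commonly root-mean-square deviations (RMSD) between model and experimental coordinates, and that is assumed here. The sketch below computes RMSD for two coordinate lists that are already superimposed and matched atom by atom; the superposition step itself is left out, and the coordinates are made-up toy values.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD in the input units for two equal-length lists of (x, y, z) tuples.

    Assumes the structures are already superimposed and that atom i in
    coords_a corresponds to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy coordinates (hypothetical, in Å): the third atom is displaced by 3 Å.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
experiment = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 3.0, 0.0)]

print(f"RMSD = {rmsd(model, experiment):.2f} Å")
```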

Page 5

Ab initio prediction of protein structure

sample conformational space such that native-like conformations are found

astronomically large number of conformations: 5 states/100 residues = 5^100 ≈ 10^70

select

hard to design functions that are not fooled by non-native conformations (“decoys”)
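The size estimate on this slide can be checked directly, since Python integers are arbitrary precision. The variable names below are illustrative only.

```python
import math

states_per_residue = 5
residues = 100

# Exact count of conformations if every residue independently takes one of 5 states.
conformations = states_per_residue ** residues

# Same quantity as a power of ten: 100 * log10(5).
exponent = residues * math.log10(states_per_residue)

print(f"5^100 has {len(str(conformations))} decimal digits")
print(f"5^100 is about 10^{exponent:.1f}")
```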

Page 6

Semi-exhaustive segment-based folding

EFDVILKAAGANKVAVIKAVRGATGLGLKEAKDLVESAPAALKEGVSKDDAEALKKALEEAGAEVEVK

generate: fragments from database; 14-state φ,ψ model

… …

minimise: Monte Carlo with simulated annealing; conformational space annealing, GA

… …

filter: all-atom pairwise interactions, bad contacts; compactness, secondary structure
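The minimisation step names Monte Carlo with simulated annealing. The sketch below shows the general technique on a toy problem and is not the method used in the talk: each residue takes one of 14 discrete states (loosely mirroring the 14-state model above), and `toy_energy` is a made-up score standing in for a real scoring function. All names and parameter values are assumptions.

```python
import math
import random


def toy_energy(states):
    """Hypothetical score: penalise differences between neighbouring states."""
    return sum(abs(a - b) for a, b in zip(states, states[1:]))


def anneal(n_residues=20, n_states=14, steps=5000,
           t_start=5.0, t_end=0.05, seed=0):
    """Metropolis Monte Carlo with a geometric cooling schedule."""
    rng = random.Random(seed)
    current = [rng.randrange(n_states) for _ in range(n_residues)]
    e_current = toy_energy(current)
    e_initial = e_current
    best, e_best = list(current), e_current

    for step in range(steps):
        # Temperature falls geometrically from t_start to t_end.
        fraction = step / max(steps - 1, 1)
        temperature = t_start * (t_end / t_start) ** fraction

        # Propose a move: reassign the state of one random residue.
        pos = rng.randrange(n_residues)
        old_state = current[pos]
        current[pos] = rng.randrange(n_states)
        delta = toy_energy(current) - e_current

        # Metropolis criterion: always accept downhill, sometimes uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            e_current += delta
            if e_current < e_best:
                best, e_best = list(current), e_current
        else:
            current[pos] = old_state  # reject and restore

    return e_initial, e_best, best


e_initial, e_best, best = anneal()
print(f"energy: {e_initial} -> {e_best}")
```

Keeping track of the best state seen, rather than only the final one, means the returned energy can never be worse than the starting energy.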

Page 7

Ab initio prediction at CASP

Before CASP (BC): “solved” (biased results)

CASP1: worse than random

CASP2: worse than random with one exception

CASP3: consistently predicted correct topology - ~6.0 Å for 60+ residues

CASP4: consistently predicted correct topology - ~4.0-6.0 Å for 60-80+ residues

**T110/rbfa – 4.0 Å (80 residues; 1-80)
*T114/afp1 – 6.5 Å (45 residues; 36-80)
**T97/er29 – 6.0 Å (80 residues; 18-97)
**T106/sfrp3 – 6.2 Å (70 residues; 6-75)
*T98/sp0a – 6.0 Å (60 residues; 37-105)
**T102/as48 – 5.3 Å (70 residues; 1-70)

Page 8

Application of prediction methods to Invb

Page 9

Computational aspects of structural genomics

[Figure: diagram with labels A. sequence space; B. comparative modelling; C. fold recognition; D. ab initio prediction; E. target selection (targets); F. analysis]

(Figure idea by Steve Brenner.)

Page 10

Computational aspects of functional genomics

structure based methods: microenvironment analysis (zinc binding site?), structure comparison (homology function?)

sequence based methods: sequence comparison, motif searches, phylogenetic profiles, domain fusion analyses

+ experimental data

G. assign function: assign function to entire protein space
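Of the sequence based methods listed, phylogenetic profiles lend themselves to a short illustration. The general idea is that proteins present and absent in the same set of genomes may be functionally linked. The sketch below assumes binary presence/absence profiles compared by Hamming distance; the protein names and profiles are invented toy data, not results from the talk.

```python
def profile_distance(profile_a, profile_b):
    """Hamming distance between two presence/absence profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    return sum(a != b for a, b in zip(profile_a, profile_b))


def linked_pairs(profiles, max_distance=0):
    """Pairs of proteins whose profiles differ in at most max_distance genomes."""
    names = sorted(profiles)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if profile_distance(profiles[a], profiles[b]) <= max_distance
    ]


# Hypothetical profiles: 1 = homolog present in that genome, 0 = absent.
profiles = {
    "protA": (1, 0, 1, 1, 0, 1),
    "protB": (1, 0, 1, 1, 0, 1),
    "protC": (0, 1, 0, 0, 1, 0),
    "protD": (1, 0, 1, 0, 0, 1),
}

print(linked_pairs(profiles))                  # identical profiles only
print(linked_pairs(profiles, max_distance=1))  # allow one mismatch
```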

Page 11

Modelling structure and function of the Oryza sativa (rice) genome

Most common functions (from PROSITE)

- ATP/GTP-binding site motif A (P loop)
- Serine/Threonine protein kinase active site
- EF-hand (Calcium binding)
- Cytochrome C Heme binding site

Most common functions (from annotations)

- Reverse transcriptase
- Nucleotide Binding Site (NBS)
- Serine/Threonine protein kinase
- Chitinase
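As an illustration of a PROSITE-style motif search, the sketch below converts a small subset of PROSITE pattern syntax to a regular expression and scans a sequence with it. The pattern used is the PROSITE P-loop pattern `[AG]-x(4)-G-K-[ST]`. The converter handles only the elements needed here, the helper names are illustrative, and the test sequence is invented.

```python
import re

# One PROSITE element: a residue, a [set], an {exclusion} or x, with optional (n) or (n,m).
ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into Python regex syntax."""
    pieces = []
    for element in pattern.split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # any residue except these
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        pieces.append(core)
    return "".join(pieces)


def find_motif(sequence: str, pattern: str):
    """Return (1-based start, matched text) for non-overlapping matches."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]


P_LOOP = "[AG]-x(4)-G-K-[ST]"
toy_sequence = "MAAGESGSGKSTLL"  # hypothetical sequence for illustration

print(prosite_to_regex(P_LOOP))
print(find_motif(toy_sequence, P_LOOP))
```

Short patterns like this one match by chance fairly often, so a hit is better read as a hint than as a function assignment.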

~30% with known homologs in the PDB

6813 coding sequences

Category                             Count   Share   Chart label
without a product annotation         3149    47%     Annotation?
classified as hypothetical protein   816     12%     Protein?
with a hypothetical function         1187    17%     Function?
(remaining sequences)                        24%     Assigned
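The shares on this slide can be recomputed from the counts. The sketch below assumes that the three counted categories do not overlap and that the "Assigned" share is simply the remainder; neither is stated on the slide. Under those assumptions the first category comes out nearer 46% than the 47% shown, with the other three matching after rounding.

```python
total = 6813  # coding sequences

# Counts as given on the slide.
counts = {
    "without a product annotation": 3149,
    "classified as hypothetical protein": 816,
    "with a hypothetical function": 1187,
}

# Assumption: whatever is left over is the "Assigned" slice.
counts["assigned (remainder)"] = total - sum(counts.values())

for label, count in counts.items():
    print(f"{label:<36} {count:>5}  {100 * count / total:5.1f}%")
```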

Page 12

Bioverse webserver

sequence

structure summary

summary

function summary

- see another variant
- open/close subgroup
- list links (or follow)
- mapping to sequence

http://bioverse.compbio.washington.edu

Page 13

Bioverse webserver

sequence

structure summary

secondary structure

tertiary structure

summary

sequence

evidence for sequence

evidence for tertiary structure

structural similarity to another protein (×3)

evidence for similarity

Page 14

Bioverse webserver

sequence

structure summary

summary

function summary

function 1

function 2

evidence for function 2

functional similarity to another protein (×3)

evidence for similarity

Page 15

Take home message

Prediction of protein structure and function can be used to model whole genomes to understand organismal function and evolution

Acknowledgements

Jason McDermott
Yi-Ling Chen
Levitt and Moult groups