Bologna Winter School 2007 Protein Function. Basic questions: How do proteins evolve changed or...


Page 1: Bologna Winter School 2007 Protein Function. Basic questions: How do proteins evolve changed or novel functions? Given the amino acid sequences of proteins.

Bologna Winter School 2007

Protein Function

Page 2

Basic questions:

How do proteins evolve changed or novel functions?

Given the amino acid sequences of proteins inferred from genomic sequences, how can we assign functions to them?

Page 3

Genomics gives us many new protein sequences

Often there is little experimental information about the proteins themselves

What can we deduce about proteins from their amino acid sequences?

… from the amino acid sequence of one protein alone?

… from comparisons of amino acid sequences of related proteins from different species?

Page 4

What properties of proteins do we want to learn about, and how do we measure and analyse them?

amino acid sequence

three-dimensional structure

FUNCTION

expression pattern

regulation

Page 5

Can we learn these properties by studying purified proteins in isolation?

amino acid sequence – yes, in principle

three-dimensional structure -- certainly

FUNCTION -- ??????

expression pattern – yes if we had to

regulation – probably not

Page 6

How do we learn these?

amino acid sequence – genomic sequences

three-dimensional structure – X-ray, NMR, ... modelling

FUNCTION – experiment? inference?

expression pattern -- microarrays

regulation – ChIP-chip experiments

Page 7

Does knowledge about related proteins help?

amino acid sequence – possibly

three-dimensional structure – molecular replacement (MR), modelling

FUNCTION – YES! BUT, HOW??

expression pattern – maybe

regulation -- maybe

Page 8

Function is difficult

Sequence determines structure determines function

From knowing sequence and structure of one protein alone, can we deduce its function?

Identify binding site?

Identify catalytic residues?

Identify ligand?

Analogy to drug-design problem.

Page 9

Given a protein structure can we predict function directly?

Sometimes… To some extent …

What are reasonable goals?

Sometimes structure gives general idea, guiding laboratory work to pin it down

Some examples from H. influenzae structural genomics project

Page 10

HI1679

α/β-hydrolase fold, putative remote homology to L-2-haloacid dehalogenases

Several substrates tried.

HI1679 cleaved 6-phosphogluconate, phosphotyrosine

Page 11

HI1434

related to a region in tRNA synthetases.

contains putative binding site, likely to bind nucleotide

no specific ligand has yet been identified

Page 12

Nuclear Transport Factor-2

• Protein known to be involved in trafficking across nuclear membrane

• Crystal structure determined

• Mechanism of function not obvious

• ???

Page 13

NTF-2 homologous to scytalone dehydratase

• Alexei Murzin spotted a similarity of fold between NTF-2 and scytalone dehydratase

• This structure shows scytalone dehydratase binding an inhibitor

Page 14

Scytalone dehydratase

Scytalone dehydratase is an enzyme in the pathway for melanin synthesis

Page 15

NTF-2 Superposition

Page 16

Search for ligands

On the basis of the structural similarity, many ligands were designed and tested

So far, none has shown any binding or catalytic activity

Conclusion: structural similarity is a useful guide to hypotheses about function, but doesn’t always work …

Page 17

But many similar proteins have similar functions, don't they?

In many cases closely-related proteins have closely-related functions.

Example: human and horse haemoglobin

43 residue differences out of 287 (α+β chains: 141 + 146 residues)

85% residue identity

SAME FUNCTION

Page 18

Function assignment from homology?

OK, if the sequences differ greatly then the function may differ

But if the sequences are similar, the functions will be the same – WON'T THEY?

Well, sometimes ...

Page 19

'Homology modelling' of function?

Sequence determines structure determines function

Small changes in sequence produce small changes in structure

BUT:

dependence of function on sequence (and even on structure) doesn't have simple ‘topology’

Page 20

Similar sequences produce similar structures

Page 22

Recruitment

In many cases, similar proteins retain similar functions (example: mammalian globins)

Distantly-related proteins can retain function or diverge in function

But closely-related proteins can have very different functions

Even identical proteins can carry out different functions

Page 23

Avian eye-lens proteins

In the duck, some lens crystallins have sequences identical to liver enolase and lactate dehydrogenase

They never see the substrates in the eye

In other birds, sequences have changed enough to lose catalytic activity. This shows that enzymatic activity is not necessary in the eye

Page 24

Proteinase do = DegP

Chaperone at low temperatures

Proteinase at high temperatures

Logic: moderate stress – try to rescue proteins

more extreme stress – give up and recycle

Page 25

Function annotation in databases

Proteins appear in databases when their sequences are known

Annotation of function?

Experimental evidence for function

Transfer of function from homologue

How well does this work?

How can we tell?

Requires measure of distance between functions

Page 26

Two goals of this kind of work

1. To study how protein function diverges as amino acid sequence diverges

2. To evaluate the accuracy of transfer of annotation among homologous proteins

Problems associated with goal 2 make goal 1 harder

Page 27

How do proteins change function as their sequences diverge?

Divergence v. recruitment

Divergence:

Change in specificity (chymotrypsin, trypsin)

Change in regulation (myoglobin, haemoglobin)

Related functions with similar mechanisms (adaptation of catalytic site) (Gerlt & Babbitt)

Page 28

Gene duplication and divergence

General way to develop new functions

Very old theory about how metabolic pathways developed – new protein developed to provide substrate for current initial step:

Now growing on B (B → C → D … → ATP)

Medium runs out of B.

B → C enzyme duplicates, diverges to catalyze A → B

Now you can grow on A (A → B → C → D … → ATP)

Attractive because:

B → C enzyme has binding site for B

explains gene organization in operon

WRONG: mechanism of A → B in general different from B → C, needs different structure, catalytic residues

Page 29

Derivation of function from coordinated analysis of sequence and structure

Homologous proteins may have diverged in sequence and function (leave aside recruitment)

Assume no strong sequence similarity to protein of known function

Align sequences

Use structure to get better alignments

Check for conservation of binding site, catalytic residues

Page 30

Structure-based function assignment

Extract functional residues from structures of known function

Residues contributing to function of entire homologous family conserved in whole family

Residues contributing to specific function of subfamily conserved only in subfamily
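
The family-versus-subfamily distinction above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the published Evolutionary Trace method or any other group's method: it assumes the sequences are already aligned, treats a column as conserved only if every sequence has the same residue there, and uses made-up toy sequences and subfamily names.

```python
def conserved_columns(seqs):
    """Indices of alignment columns where every sequence has the same (non-gap) residue."""
    return {
        i
        for i, column in enumerate(zip(*seqs))
        if len(set(column)) == 1 and column[0] != "-"
    }


def classify_positions(subfamilies):
    """Split conserved positions into family-wide and subfamily-specific sets.

    subfamilies: dict mapping a (hypothetical) subfamily name to its aligned sequences.
    Returns (family_wide, specific) where specific maps subfamily name -> positions
    conserved inside that subfamily but not across the whole family.
    """
    all_seqs = [s for seqs in subfamilies.values() for s in seqs]
    family_wide = conserved_columns(all_seqs)
    specific = {
        name: conserved_columns(seqs) - family_wide
        for name, seqs in subfamilies.items()
    }
    return family_wide, specific


# Toy alignment: columns 0 and 1 are invariant, columns 2 and 4 differ between subfamilies.
toy = {"A": ["GDSAK", "GDSCK"], "B": ["GDTAR", "GDTLR"]}
family_wide, specific = classify_positions(toy)
```

In the toy data, positions 0 and 1 would be candidates for the shared function of the family, and positions 2 and 4 candidates for subfamily specificity.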

Page 31

Several groups have applied these ideas

Cohen & Lichtarge, ‘Evolutionary Trace Method’ (J. Mol. Biol. 1996)

Irving, Whisstock, Lesk (Proteins 2001)

Hannenhalli & Russell (J. Mol. Biol. 2000)

Sternberg and coworkers (PNAS 2004, Phil. Trans. Roy. Soc. 2006)

See also: Automated Function Prediction, ISMB Special Interest Group Meeting, 2005

Page 32

How could we test predictions of function?

Page 33

How to measure distance between functions?

For sequences and structures, there are natural measures of divergence

Sequence: count identical residues

Structures: r.m.s.d. of well-fitting parts

(Specialists may argue about details, or propose alternatives, but basically the answers aren't too different.)

Function: no natural measure of difference
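
As a rough illustration of the two "natural" measures named above, here is a minimal Python sketch. It assumes the sequences are already aligned (gaps written as "-") and that the two coordinate sets are already superposed and restricted to the well-fitting parts; the function names are illustrative only.

```python
import math


def percent_identity(a, b, gap="-"):
    """Percent identical residues over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


def rmsd(p, q):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) points.

    No superposition is performed here; the points are assumed to be fitted already.
    """
    if len(p) != len(q) or not p:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum((a - b) ** 2 for u, v in zip(p, q) for a, b in zip(u, v))
    return math.sqrt(total / len(p))
```

Nothing comparably simple exists for function, which is the point of the last line of the slide.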

Page 34

Enzyme Commission / EC numbers

(EC numbers NOT European Commission)

Authorized by International Union of Biochemistry and Commission on Enzyme Nomenclature

EC set up by International Union of Biochemistry in 1955.

Report in 1961, modified 1964, several supplements since then.

Published as book, now available on web

Page 35

What does EC classify?

Enzyme nomenclature

Classification of reactions catalysed by enzymes

NOT a set of assignments of function to proteins – That is a different task

(Note that Gene Ontology – another classification scheme – also does not assign functions to proteins)

Page 36

Enzyme Commission numbers

Four-level hierarchy

Example: isopentenyl-diphosphate ∆-isomerase, EC number 5.3.3.2:

5 = general category (isomerases)

5.3 = intramolecular oxidoreductases

5.3.3 = enzymes that transpose C=C bonds

5.3.3.2 = specific reaction

EC classifies reactions, names enzymes that catalyse reactions, does not name proteins.
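
Because the EC number is a four-level hierarchy, a simple way to compare two enzyme-catalysed reactions is to count how many leading levels they share. The sketch below is an illustration of that idea only; the function name is an assumption, and a "-" field is treated as an unassigned level.

```python
def shared_ec_levels(ec_a, ec_b):
    """Number of leading EC levels (0 to 4) that two EC numbers have in common."""
    levels = 0
    for x, y in zip(ec_a.split("."), ec_b.split(".")):
        # Stop at the first mismatch, or at an unassigned level written as "-".
        if x != y or x == "-":
            break
        levels += 1
    return levels
```

For example, comparing 5.3.3.2 with 5.3.3.1 gives three shared levels: the same kind of C=C transposition, but a different specific reaction.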

Page 37

Gene Ontology

EC limited to enzymes

Gene Ontology consortium produced new, more general classification of protein function

Three independent categories: Molecular function (overlaps EC)

Biological process

Subcellular location

GO: not tree structure, directed acyclic graph

Page 38

Gene Ontology project

Initiated by Michael Ashburner (early 1990’s).

Has since grown, become de facto standard

References:

Lewis, S.E. (2004). Gene Ontology: looking backwards and forwards. Genome Biology 6:103.

Ashburner, M. (2006). Won for All: How the Drosophila Genome Was Sequenced. Cold Spring Harbor Laboratory Press.

Page 39

What is an ontology?

Specification of how to describe a body of knowledge

Nomenclature (fixed vocabulary)

Rules of syntax of terms

Types of relationships among entities:

‘Is a’: for instance: ‘A cat is a mammal.’

‘Part of’: for instance: ‘A tail is part of a cat.’

Page 40

What is an ontology?

Types of relationships among entities:

‘Is a’: for instance: ‘A cat is a mammal.’

‘Part of’: for instance: ‘A tail is part of a cat.’

Note that ‘A cat is a mammal. A mammal is an animal’ implies that ‘A cat is an animal’

But ‘A tail is part of a cat. A cat is a mammal.’ does NOT imply that a tail is a mammal.
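
The difference between the two relationship types can be made concrete with a small Python sketch. It stores ‘is a’ and ‘part of’ edges separately and follows only ‘is a’ edges when inferring what a term is; the dictionaries and function name are illustrative assumptions, not part of any ontology toolkit.

```python
# Toy ontology from the slide: each term maps to its direct parents.
IS_A = {"cat": {"mammal"}, "mammal": {"animal"}}
PART_OF = {"tail": {"cat"}}


def is_a_closure(term, is_a=IS_A):
    """All terms reachable from `term` by following only 'is a' edges."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in is_a.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen
```

Here `is_a_closure("cat")` contains both "mammal" and "animal", while `is_a_closure("tail")` is empty, because the only edge leaving "tail" is a ‘part of’ edge.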

Page 44

GO classification of isopentenyl-diphosphate ∆-isomerase

Page 45

Several groups have measured relationship between sequence divergence and functional divergence using EC classification

Example: Todd, Orengo & Thornton, JMB 2001

For enzymes with sequence identity > 40%, all four levels of the EC number are conserved

With sequence identity > 30%, three levels of the EC number are conserved for 70% of pairs

How can this work be extended to GO classification?

Page 46

Several groups have measured relationship between sequence divergence and functional divergence using EC classification

How to define metric on functions?

Distal GO-IDs

How to measure distance between SETS of GO-IDs

Page 47

How to define metric on functions?

Page 48

Distal GO-IDs

Page 49

How to measure distance between SETS of GO-IDs
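
The metric used in the lecture is not recoverable from this transcript. Purely as an illustration of what a distance between sets of GO-IDs could look like, the Python sketch below expands each set to include all ancestors in the graph and takes one minus the Jaccard overlap. This choice of metric, the toy graph and the term names are all assumptions, not the lecturer's method.

```python
def ancestors(term, parents):
    """The term itself plus everything reachable by following parent edges in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def go_set_distance(set_a, set_b, parents):
    """1 - Jaccard overlap of the ancestor-closed term sets (0 = identical, 1 = disjoint)."""
    closed_a = set().union(*(ancestors(t, parents) for t in set_a))
    closed_b = set().union(*(ancestors(t, parents) for t in set_b))
    union = closed_a | closed_b
    if not union:
        return 0.0
    return 1.0 - len(closed_a & closed_b) / len(union)


# Hypothetical toy graph: child -> set of direct parents.
TOY_PARENTS = {
    "binding": {"root"},
    "catalysis": {"root"},
    "ion_binding": {"binding"},
    "calcium_binding": {"ion_binding"},
}
```

With this toy graph, two sets that differ only in how specific their terms are come out close together, while sets on different branches share only the root and come out far apart.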

Page 50

Dependence of function divergence on sequence divergence: the EF-hand family

[Figure: fraction of pairs plotted against GO distance]

Page 51

GO: Sources of annotation

GO categories of sources of annotation:

IDA: Inferred from direct assay

TAS: Traceable author statement

IMP: Inferred from mutant phenotype

IGI: Inferred from genetic interaction

IPI: Inferred from physical interaction

ISS: Inferred from sequence similarity

IEA: Inferred from electronic annotation

NAS: Non-traceable author statement

Page 52

Sources of Annotation: Experiment / Inferred

From: Thomas, P.D., Mi, H. & Lewis, S. (2007). Curr. Opin. Chem. Biol. 11, 4-11.

Page 53

To study accuracy of annotation transfer, use experimental annotation only?

Obviously.

But there are problems.

Many fewer data

Inconsistencies

Sometimes annotation correct, but source of annotation incorrect
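
Restricting a study to experimental annotation amounts to filtering on the evidence code. The Python sketch below assumes annotations are held as (protein, term, evidence code) tuples and treats IDA, IMP, IGI and IPI as the experimental codes; the record layout, protein names and term labels are illustrative assumptions.

```python
# Evidence codes treated here as experimental (an assumption of this sketch).
EXPERIMENTAL_CODES = {"IDA", "IMP", "IGI", "IPI"}


def experimental_only(annotations, codes=EXPERIMENTAL_CODES):
    """Keep only annotations whose evidence code is in the experimental set."""
    return [record for record in annotations if record[2] in codes]


# Hypothetical records: (protein, GO term label, evidence code).
example = [
    ("protein_1", "term_A", "IDA"),
    ("protein_2", "term_A", "ISS"),
    ("protein_3", "term_B", "IEA"),
    ("protein_4", "term_C", "IMP"),
]
kept = experimental_only(example)
```

Running the filter on the four example records keeps two of them, which illustrates the "many fewer data" problem noted above.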

Page 54

Conclusions

It is possible to define a statistical distribution describing the relationship between divergence of sequence and divergence of function

General rule: as sequences diverge, function diverges

But: exceptions exist

Threshold at about 50% sequence identity, below which function starts to diverge more radically

Databases contain many errors and omissions; annotation is still a human, labour-intensive activity

Page 55

Errors in databases

1. Keep them out – But how?

2. natural language processing by computer?

(Automatic: literature → database)???

3. If you find them correct them (you = WHO?)

4. Correct them where?

Master copy of database?

What about copies? Errors propagate?

How to propagate corrections?

Page 56

Correction of Errors in Databases?

Eternal vigilance at each installation?????

Community involvement – curation by experts?

Open source idea – bulletin board?

‘Knowbots’ running around web? Security?

Distribute programs for ‘health checks’?

Page 57

Inconsistencies

Different databases use different versions of GO

Different versions of different databases

Downloaded versions of different databases may not be updated to reflect changes in parent databases

What can be done?

Page 58

Distributed updating of databases

Park, Park & Kim (2004). Bioinformatics Appl. Note.

Gene Ontology classification provides basis for database annotations

Updates to GO include: new terms, new obsoletions, term name changes, new definitions, new term merges, term movements

Require updating of annotations
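
A minimal Python sketch of what "updating of annotations" involves for two of the change types listed above, merges and obsoletions. It is not GOChase and does not use any real GO file format: the merge table, the obsolete set and all identifiers are placeholders invented for illustration.

```python
def update_annotations(annotations, merged_into, obsolete):
    """Bring (protein, go_id) pairs up to date.

    merged_into: dict mapping an old GO ID to the ID it was merged into.
    obsolete:    set of GO IDs that were made obsolete with no replacement.
    Returns (updated, flagged): current annotations, and ones needing a curator.
    """
    updated, flagged = [], []
    for protein, go_id in annotations:
        seen = set()
        # Follow the chain of merges to the current ID, guarding against cycles.
        while go_id in merged_into and go_id not in seen:
            seen.add(go_id)
            go_id = merged_into[go_id]
        if go_id in obsolete:
            flagged.append((protein, go_id))
        else:
            updated.append((protein, go_id))
    return updated, flagged


# Placeholder identifiers, not real GO IDs.
annotations = [("protA", "GO:old1"), ("protB", "GO:cur2"), ("protC", "GO:dead3")]
updated, flagged = update_annotations(
    annotations, merged_into={"GO:old1": "GO:cur1"}, obsolete={"GO:dead3"}
)
```

Merged terms can be rewritten automatically; obsolete terms can only be flagged, since choosing a replacement is a curation decision.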

Page 59

GOChase (Park, Park & Kim)

Recommend updates (security considerations require local file changes)

Web-based interfaces:

GOChase-History: evolution of GO ID

GOChase-Correct: suggests change

Health check of your database: flag problems

Submit GO ID: report its use in annotation in a list of common databases

http://www.strubi.org/software/GOChase/

Page 61

What other relationships among properties of organisms are useful in assigning function?

Page 62

What are we looking for?

We might try to identify proteins that have similar functions in same or different species

Human and horse haemoglobin

We may be able to find these if they are homologues

We might try to identify proteins that have coordinated functions in same or different species

Two or more proteins in same metabolic pathway, or part of same macromolecular complex

These may in general NOT be homologues

Page 63

Various clues that proteins have coordinated activities

Linked on genome? (Best for bacteria, not for archaea; occasionally for eukaryotes)

Appear as separate (monomeric) proteins in one species, and as single multidomain protein in other species

Often separate proteins in prokaryotes are fused in eukaryotes (but some examples of opposite are known)

Page 64

Function assignment by reconstruction of metabolic pathways

Page 65

Shikimate kinase in Methanococcus jannaschii

In E. coli, shikimate kinase is an enzyme in the pathway of synthesis of chorismate from erythrose-4-phosphate

Chorismate is a branch-point compound for the synthesis of aromatic amino acids

The tryptophan synthesis pathway is one of the best worked-out in E. coli, in terms of enzymology and regulation

Page 66

Pathway of synthesis of shikimate from erythrose-4-P in E. coli

From: Daugherty et al., J Bacteriol. 2001 January; 183(1): 292–300.

Page 67

Cross-table of metabolic steps and genes

Match up known genes and known metabolic steps

No recognized protein for metabolic step?

Maybe metabolic step is missing from that organism

No recognized function for some gene?

Maybe can match up missing function with gene missing function assignment
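
The cross-table idea can be sketched in Python as follows. The sketch assumes each gene has either one assigned pathway step or none; the gene names are hypothetical, and the step names are ordinary enzyme names from the shikimate pathway used only as labels.

```python
def cross_table(pathway_steps, gene_functions):
    """Match known genes to known metabolic steps.

    pathway_steps:  list of step names expected in the pathway.
    gene_functions: dict mapping gene name -> assigned step name, or None if unassigned.
    Returns (step_to_genes, missing_steps, orphan_genes).
    """
    step_to_genes = {
        step: [g for g, f in gene_functions.items() if f == step]
        for step in pathway_steps
    }
    # Steps with no recognized protein: absent from the organism, or not yet identified.
    missing_steps = [s for s in pathway_steps if not step_to_genes[s]]
    # Genes with no recognized function: candidates for the missing steps.
    orphan_genes = [g for g, f in gene_functions.items() if f is None]
    return step_to_genes, missing_steps, orphan_genes


steps = ["shikimate dehydrogenase", "shikimate kinase", "EPSP synthase"]
genes = {  # hypothetical gene names
    "gene_1": "shikimate dehydrogenase",
    "gene_2": "EPSP synthase",
    "gene_3": None,
}
table, missing, orphans = cross_table(steps, genes)
```

The output pairs one missing step with one orphan gene, which is a hypothesis to test, not an assignment.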

Page 68

Matching gene with function

Check for homologues

Maybe find several

Maybe find none

Look in genome for operons containing succession of genes for steps in pathway

Usually works in bacteria

Less common in archaea

Page 69

Aromatic amino acid biosynthesis

R. Boyer

Page 70

E. coli trp operon

From: Garrett, R.H. & Grisham, C.M. (1999) Biochemistry. 2nd ed. (Thomson Higher Education, Belmont, CA)

Note collinearity of genes with order of reactions in pathway

Page 71

Shikimate kinase in Methanococcus jannaschii

In M. jannaschii, the steps of the shikimate pathway are NOT catalysed by enzymes encoded consecutively in the genome in an operon

Sequence similarity identified most enzymes but not shikimate kinase

In another archaeon, A. pernix, the genes in this pathway ARE collinear.

From this it was possible to identify the A. pernix shikimate kinase, and from that the M. jannaschii homologue.

Reference: Daugherty et al., J. Bacteriol. (2001). 183, 292–300.

Page 72

From: Daugherty et al., J Bacteriol. 2001 January; 183(1): 292–300.

Mapping of genes in shikimate synthesis pathway in several prokaryotic genomes

Page 73

Mapping of genes for shikimate synthesis in several prokaryotic genomes

From: Daugherty et al., J Bacteriol. 2001 January; 183(1): 292–300.

Page 74

From: Daugherty et al., J Bacteriol. 2001 January; 183(1): 292–300.

Page 75

Why didn’t homology search work?

Archaeal shikimate kinase is NOT related to bacterial or eukaryotic shikimate kinases.

It is distantly related to homoserine kinases of the GHMP kinase superfamily.

M. jannaschii homoserine kinase IS identifiable by homology

The two enzymes are substrate-specific

Page 77

Phylogenetic profiles

Clues to function from genes shared among different organisms

Different groups of organisms need different sets of genes

For instance, some bacteria have flagella

Genes found in bacteria that have flagella but not in other bacteria or other groups of organisms: involved in flagellar function

Page 78

Phylogenetic Profiles

Developed by Marcotte, Eisenberg et al. (PNAS 96, 4285-4288, 1999 and elsewhere)

Tabulate homologues of E. coli proteins in 16 other genomes

(Note: assume homologues share function – this is input to method, not result)

Table: column = organism, row = gene

Put a mark if organism has gene

Page 79

From: Pellegrini et al. (1999). Proc. Natl. Acad. Sci. U.S.A. 96, 4285-4288

Page 80

Phylogenetic profile

Pattern of row = barcode of which organisms a gene occurs in

Result: Genes that share patterns are ‘functionally linked’

Functionally linked = participate in some coordinated way in some structure or process

Note: proteins can be functionally linked even if they are not homologous
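
The "barcode" idea can be sketched in Python: each gene's row becomes a tuple of 0s and 1s over a fixed list of organisms, and genes with the same tuple are grouped as candidates for functional linkage. This is a toy illustration with hypothetical gene and organism names, not the published implementation; in practice near-identical profiles are also of interest, so a Hamming distance is included.

```python
from collections import defaultdict


def profile(gene_organisms, organisms):
    """Presence/absence barcode of one gene over an ordered list of organisms."""
    return tuple(1 if org in gene_organisms else 0 for org in organisms)


def group_by_profile(presence, organisms):
    """Group genes that have exactly the same barcode.

    presence: dict mapping gene name -> set of organisms containing a homologue.
    """
    groups = defaultdict(list)
    for gene in sorted(presence):
        groups[profile(presence[gene], organisms)].append(gene)
    return dict(groups)


def hamming(p, q):
    """Number of organisms at which two barcodes differ."""
    return sum(a != b for a, b in zip(p, q))


organisms = ["org1", "org2", "org3", "org4"]  # hypothetical genomes
presence = {
    "gene_a": {"org1", "org3"},
    "gene_b": {"org1", "org3"},
    "gene_c": {"org2", "org3", "org4"},
}
groups = group_by_profile(presence, organisms)
```

Here gene_a and gene_b share a barcode and would be flagged as possibly linked, with no use made of their sequences beyond detecting homologues.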

Page 81

Example: ribosomal proteins

Homologues of E. coli protein RL7 are found in 10 bacterial genomes and yeast, not in archaea

Those that match phylogenetic profile have functions associated with ribosome

Have pulled out sets of ribosomal proteins on basis of phylogenetic profile

Linked proteins need not be homologues nor be co-localized in the genome

Page 82

Combine phylogenetic profiling with matching ‘orphans’

Create metabolic network for an organism

Assign functions by homology when possible

Missing enzymes in pathway?

Genes that lack assignment?

Try to match these up (recall archaeal shikimate kinase)

Phylogenetic profiles can assist in this

Page 83

From: Chen & Vitkup (2006). Genome Biol. 7, R17

Page 84

Phylogenetic profiles / orphan assignment

Chen & Vitkup (2006). Genome Biol. 7, R17

Phylogenetic profiles can link proteins in a metabolic pathway

Even more, better fit of profile implies closer in metabolic network

Test, using yeast:

remove gene from network

try to recover it from pool of ~6000 genes

results: 22.8% top prediction correct (37.3% correct answer in top 10)

Page 85

Conclusion

Inferring protein function from knowledge of the function of a close relative is like solving a clue in an American crossword puzzle. Finding the precise word is difficult, but the task is in principle straightforward

Inferring function a priori from structure is like a British crossword puzzle. Which clues are real? Which clues are misleading?

Page 86

State of the art in function assignment

We have a ‘bag of tricks’ – that is, many methods, all of which work sometimes and fail sometimes.

In some cases, no method works except go back to the lab and work it out.

We do not have a unified framework or a systematic approach to function assignment