Automate Function Prediction



Outline

• Goal
• How function is defined
• Why Gene Ontology
• Methods for protein function prediction
• End points


GOAL

• A) You find a new protein
• B) You sequence the whole genome of your favorite organism
• The obtained gene(s) should be annotated

• Case A can be solved manually; case B needs automatic tools


How function is defined

• Functional description as text
• Linking the gene to keywords (UniProt)
• Linking the gene to Gene Ontology (GO) classes
• Linking the gene to signalling pathways or biochemical pathways (KEGG)


Why Gene Ontology (GO)

• GO is currently a popular standard in gene annotation

• GO defines categories that describe gene function

• Groups together genes involved in the same process
• Gives an easy summary of genes with similar function


Why Gene Ontology (GO)

• 3 sub-parts: Biological Process, Molecular Function, Cellular Component
  – Molecular Function => biochemical activity
  – Biological Process => larger biological or cellular process
  – Cellular Component => location of the gene product in the cell

• Hierarchical structure (a directed acyclic graph, so a class can have several parents)
  – Categories with very precise function
  – Categories with less precise function
  – Categories with very broad function
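Because of this hierarchy, a gene annotated to a precise class is implicitly annotated to all of that class's broader ancestors. A minimal Python sketch of that propagation, assuming a made-up toy graph (the class names and edges are hypothetical, not taken from the real ontology):

```python
# Hypothetical toy hierarchy: each class maps to its parent classes.
# Like GO, it is a directed acyclic graph, so "precise_X" has two parents.
PARENTS: dict[str, list[str]] = {
    "root": [],
    "broad_A": ["root"],
    "broad_B": ["root"],
    "medium_A1": ["broad_A"],
    "precise_X": ["medium_A1", "broad_B"],
}


def ancestors(go_class: str) -> set[str]:
    """Return every class reachable upwards from go_class (itself excluded)."""
    found: set[str] = set()
    stack = list(PARENTS[go_class])
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(PARENTS[parent])
    return found


def propagate(annotations: set[str]) -> set[str]:
    """Extend precise annotations with all of their broader ancestor classes."""
    full = set(annotations)
    for go_class in annotations:
        full |= ancestors(go_class)
    return full


print(sorted(propagate({"precise_X"})))
# → ['broad_A', 'broad_B', 'medium_A1', 'precise_X', 'root']
```

A real tool would load the full ontology file instead of a hard-coded dictionary.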


How GO helps

• End users: summary categories for genes with various functions

• Computer programs: classifier algorithms can be trained to predict the categories for genes
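One common way to set up such training (an assumption here; the slides do not specify it) is one binary classifier per GO class. A small Python sketch with hypothetical gene and class names that turns annotations into per-class 0/1 target vectors:

```python
# Hypothetical annotations: gene identifier -> set of GO classes.
gene_to_go = {
    "gene1": {"GO_A", "GO_B"},
    "gene2": {"GO_B"},
    "gene3": {"GO_C"},
}


def label_matrix(annotations: dict[str, set[str]]):
    """Return (genes, classes, rows) where rows[i][j] == 1 if gene i has class j."""
    genes = sorted(annotations)
    classes = sorted(set().union(*annotations.values()))
    rows = [[int(c in annotations[g]) for c in classes] for g in genes]
    return genes, classes, rows


def targets_for_class(annotations: dict[str, set[str]], go_class: str) -> list[int]:
    """0/1 target vector for training a single one-vs-rest classifier."""
    return [int(go_class in annotations[g]) for g in sorted(annotations)]


genes, classes, rows = label_matrix(gene_to_go)
print(classes)  # ['GO_A', 'GO_B', 'GO_C']
print(rows)     # [[1, 1, 0], [0, 1, 0], [0, 0, 1]]
print(targets_for_class(gene_to_go, "GO_B"))  # [1, 1, 0]
```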


Understanding GO

• AmiGO server (http://amigo.geneontology.org/cgi-bin/amigo/go.cgi)


Function Prediction: What can we use to predict function?

• Sequence homology (BLAST result list)
• Phylogenetic tree of sequences
• Protein domains (PFAM domains)
• Short sequence patterns (motifs)
• Sequence features (secondary structure, low-complexity regions)


Sequence Homology Methods

• Do a BLAST search with a query sequence
• Collect the GO classes of the genes in the BLAST result list
• Give a weight to each BLAST hit, often -log(E-value), so that more significant hits weigh more

• Combine the scores from the genes that belong to the same GO class

• Report the best-scoring / significant GO classes
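The steps above can be sketched in Python as follows. This assumes the BLAST hits are already parsed into (subject ID, E-value) pairs and that a GO lookup for each subject exists; the IDs, the E-value floor and the `top_n` cutoff are hypothetical choices, not taken from any of the programs listed on the next slide.

```python
import math
from collections import defaultdict


def score_go_classes(hits, subject_go, min_evalue=1e-180, top_n=5):
    """Rank GO classes by the summed -log10(E-value) of the hits carrying them.

    hits: list of (subject_id, evalue) pairs from a parsed BLAST result list.
    subject_go: dict mapping subject_id -> set of GO classes.
    min_evalue: hypothetical floor, because BLAST can report E-values of 0.0.
    """
    scores = defaultdict(float)
    for subject, evalue in hits:
        weight = -math.log10(max(evalue, min_evalue))
        if weight <= 0:  # E-value >= 1: the hit carries no useful signal
            continue
        for go_class in subject_go.get(subject, ()):
            scores[go_class] += weight
    # Highest combined score first; ties broken alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


hits = [("P1", 1e-50), ("P2", 1e-20), ("P3", 0.5)]
subject_go = {"P1": {"GO_A", "GO_B"}, "P2": {"GO_A"}, "P3": {"GO_C"}}
for go_class, score in score_go_classes(hits, subject_go):
    print(go_class, round(score, 1))
# prints GO_A 70.0, then GO_B 50.0, then GO_C 0.3 (one per line)
```

Summing means a class supported by several moderately good hits can outrank a class backed by a single strong hit.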


Sequence Homology Methods

• Simple methods
• Programs:
  – BLAST2GO (http://www.blast2go.com/b2ghome)
  – GOTCHA (http://www.compbio.dundee.ac.uk/gotcha/gotcha.php)
  – ARGOT (http://www.medcomp.medicina.unipd.it/Argot2/form.php)
  – PFP (http://kiharalab.org/web/pfp.php)


Phylogenetic tree methods

• Compute the pairwise distances for the set of genes
• Do a hierarchical clustering of the genes
• Map the known GO functions onto the cluster tree
• Look for unknown genes in clusters that contain many genes from the same GO class
• Report the best-scoring / significant GO classes

• More: http://genome.cshlp.org/content/8/3/163.full
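A rough Python sketch of these steps, assuming the pairwise distances are already computed (for example from a multiple alignment). It uses single-linkage clustering cut at a fixed distance threshold as a simple stand-in for building a real tree, so it is not the method of any specific program; the gene names, distances, GO labels and threshold are all hypothetical.

```python
from collections import Counter


def single_linkage_clusters(names, dist, threshold):
    """Cut a single-linkage tree at `threshold`: genes joined by any chain of
    pairs with distance <= threshold end up in the same cluster."""
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if dist(a, b) <= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return list(clusters.values())


def predict_from_clusters(clusters, known_go):
    """For each unannotated gene, report its cluster's majority GO class and
    the fraction of annotated cluster members that carry it."""
    predictions = {}
    for cluster in clusters:
        votes = Counter(known_go[g] for g in cluster if g in known_go)
        n_known = sum(votes.values())
        if n_known == 0:
            continue
        best, count = votes.most_common(1)[0]
        for g in cluster:
            if g not in known_go:
                predictions[g] = (best, count / n_known)
    return predictions


# Hypothetical distances: listed pairs are close, all other pairs are far (0.9).
CLOSE = {frozenset(p): d for p, d in [
    (("g1", "g2"), 0.1), (("g1", "u1"), 0.2), (("g3", "g4"), 0.1), (("g4", "u2"), 0.2),
]}


def dist(a, b):
    return CLOSE.get(frozenset((a, b)), 0.9)


names = ["g1", "g2", "g3", "g4", "u1", "u2"]
known_go = {"g1": "GO_A", "g2": "GO_A", "g3": "GO_B", "g4": "GO_B"}
clusters = single_linkage_clusters(names, dist, threshold=0.3)
print(predict_from_clusters(clusters, known_go))
# → {'u1': ('GO_A', 1.0), 'u2': ('GO_B', 1.0)}
```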


Phylogenetic tree methods

• These should outperform sequence homology methods (CAFA 2011?)

• Require a set of related genes
• Often computationally much heavier
• Programs:
  – SIFTER (http://genome.cshlp.org/content/early/2011/07/22/gr.104687.109)


Prediction with Protein domains

• Find which protein domains (PFAM) occur in the query protein

• Map the functions that are linked to these domains to your query sequence
  – PFAM2GO

• Programs: InterProScan + PFAM2GO
• Drawbacks:
  – The mapping is the same in plants, mammals and bacteria
  – Many domains cannot be linked to a specific function
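A minimal Python sketch of the domain-to-GO mapping step. It assumes the query's PFAM accessions were already extracted (for example from InterProScan output) and that the mapping file uses the pfam2go line layout `Pfam:<accession> <name> > GO:<term name> ; <GO ID>`. The sample lines are illustrative only and should be checked against the current pfam2go release.

```python
from collections import defaultdict

# Illustrative lines in the pfam2go layout (check the current release for real mappings).
PFAM2GO_SAMPLE = """\
! illustrative sample, not a full mapping file
Pfam:PF00069 Pkinase > GO:protein kinase activity ; GO:0004672
Pfam:PF00069 Pkinase > GO:ATP binding ; GO:0005524
Pfam:PF00069 Pkinase > GO:protein phosphorylation ; GO:0006468
Pfam:PF00001 7tm_1 > GO:G protein-coupled receptor activity ; GO:0004930
"""


def parse_pfam2go(text: str) -> dict[str, set[str]]:
    """Map each PFAM accession to the set of GO IDs linked to it."""
    mapping: dict[str, set[str]] = defaultdict(set)
    for line in text.splitlines():
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and comment/header lines
        left, right = line.split(" > ", 1)
        accession = left.split()[0].split(":", 1)[1]  # "Pfam:PF00069" -> "PF00069"
        go_id = right.rsplit(" ; ", 1)[1].strip()     # "... ; GO:0004672" -> "GO:0004672"
        mapping[accession].add(go_id)
    return mapping


def annotate_from_domains(domain_hits: list[str], mapping: dict[str, set[str]]) -> set[str]:
    """Transfer the GO IDs of every domain found in the query to the query."""
    go_ids: set[str] = set()
    for accession in domain_hits:
        go_ids |= mapping.get(accession, set())
    return go_ids


mapping = parse_pfam2go(PFAM2GO_SAMPLE)
print(sorted(annotate_from_domains(["PF00069"], mapping)))
# → ['GO:0004672', 'GO:0005524', 'GO:0006468']
```

Nothing in the lookup depends on the organism the query comes from, which is exactly the first drawback listed on the slide.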


Prediction with Protein domains

• Benefits:
  – Annotation can be built from separate domains
  – Similar sequences do not have to be in the database

• Programs: InterProScan (http://www.ebi.ac.uk/InterProScan/)



Prediction with patterns and motifs

• Same principle as before, but we look for sequence patterns and motifs

• Map the functions that are linked to the patterns to your query sequence

• Programs:
  – InterProScan
  – IBM BioDictionary (http://cbcsrv.watson.ibm.com/Tpa.html)

• Drawbacks and benefits approximately the same as for protein domains
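Motif matching can be sketched in Python too. The code below converts a PROSITE-style pattern into a regular expression and reports every match, including overlapping ones, using the PROSITE N-glycosylation site pattern PS00001 (N-{P}-[ST]-{P}) as the example. Only the basic pattern syntax is handled, the test sequence is made up, and mapping the hits to GO classes would work as in the domain case.

```python
import re

# One PROSITE pattern element: optional '<' anchor, a residue / 'x' / [..] / {..},
# optional repeat count (n) or (n,m), optional '>' anchor.
_ELEMENT = re.compile(r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)")


def prosite_to_regex(pattern: str) -> str:
    """Convert a basic PROSITE pattern such as 'N-{P}-[ST]-{P}' to a Python regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        start, residue, low, high, end = m.groups()
        if residue == "x":
            core = "."                           # any residue
        elif residue.startswith("{"):
            core = "[^" + residue[1:-1] + "]"    # any residue except these
        else:
            core = residue                       # single residue or [..] class
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start else "") + core + ("$" if end else ""))
    return "".join(parts)


def find_motifs(sequence: str, patterns: dict[str, str]) -> list[tuple[str, int, str]]:
    """Return (pattern name, 1-based start, matched text) for every match."""
    hits = []
    for name, pattern in patterns.items():
        # Wrapping the regex in a lookahead lets overlapping matches be reported.
        regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
        for m in regex.finditer(sequence):
            hits.append((name, m.start() + 1, m.group(1)))
    return hits


print(prosite_to_regex("N-{P}-[ST]-{P}"))                        # N[^P][ST][^P]
print(find_motifs("MNRSAKNPTF", {"PS00001": "N-{P}-[ST]-{P}"}))  # [('PS00001', 2, 'NRSA')]
```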


Prediction with sequence features

• Again the same principle as before
• We look at sequence features (see picture)
• These are given as input to a classifier algorithm (Support Vector Machine)
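A compact Python sketch of the idea, under stated assumptions: only two toy features (fraction of hydrophobic residues, and fraction of low-complexity windows judged by Shannon entropy) plus a constant bias term, a hand-picked hydrophobic residue set and entropy cutoff, made-up training sequences, and a linear SVM trained with the Pegasos sub-gradient method instead of a library solver. Real predictors use far richer feature sets.

```python
import math
from collections import Counter

HYDROPHOBIC = set("AILMFVW")  # assumed residue set, for illustration only


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (bits) of the residue composition of a segment."""
    counts = Counter(segment)
    n = len(segment)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def sequence_features(seq: str, window: int = 12, low_entropy: float = 2.2) -> list[float]:
    """[hydrophobic fraction, low-complexity window fraction, constant bias]."""
    hydrophobic = sum(aa in HYDROPHOBIC for aa in seq) / len(seq)
    windows = [seq[i:i + window] for i in range(max(1, len(seq) - window + 1))]
    low_complexity = sum(shannon_entropy(w) < low_entropy for w in windows) / len(windows)
    return [hydrophobic, low_complexity, 1.0]


def train_linear_svm(X, y, lam=0.1, epochs=500):
    """Pegasos: stochastic sub-gradient descent on the regularized hinge loss.
    Examples are visited in a fixed order so the result is deterministic."""
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y_i * sum(wj * xj for wj, xj in zip(w, x_i))
            w = [(1 - eta * lam) * wj for wj in w]  # shrink (regularization)
            if margin < 1:                          # hinge loss is active
                w = [wj + eta * y_i * xj for wj, xj in zip(w, x_i)]
    return w


def predict(w, x) -> int:
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


# Made-up training set: +1 = "has the hypothetical GO class", -1 = "does not".
train = [("MLLAVILFWLAIVLMF", 1), ("AILVFMWAILKVLFAL", 1),
         ("DEKRSTNQDEKRSTNQ", -1), ("SKDETRNQGSPDEKAR", -1)]
X = [sequence_features(s) for s, _ in train]
y = [label for _, label in train]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])
```

The learned weights act as the feature weighting mentioned among the benefits on the following slide; here the bias is regularized along with the other weights, as in the basic Pegasos formulation.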



Prediction with sequence features

• Benefits:
  – No actual sequence similarity needed
  – Information is collected from weak similarities
  – Using a classifier gives feature weighting

• Program: FFPred (http://bioinf.cs.ucl.ac.uk/ffpred/)

• Drawbacks:
  – Calculations probably quite heavy
  – Does not use close sequence similarities (domains etc.)


Our contribution: PANNZER

• Uses the BLAST result list
• Adds taxonomic information
• Scores GO classes with a score that takes the frequency of each GO class in the sequence database into account
• The method is used to predict:
  – GO classes
  – Description lines
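The exact PANNZER score is not described on these slides, so the following Python is only a hypothetical illustration of correcting for how common a GO class is in the database: each class is scored by how surprising its count among the BLAST hits is under a binomial model, with the class's database frequency as the background rate. The class names, counts and frequencies are made up, and the taxonomic part of the method is not modelled.

```python
import math
from collections import Counter


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def frequency_aware_scores(hit_go_sets, background_freq):
    """Score = -log10 P(at least the observed number of hits carry the class)."""
    n = len(hit_go_sets)
    counts = Counter(go for go_set in hit_go_sets for go in go_set)
    scores = {}
    for go_class, k in counts.items():
        p_value = binomial_tail(k, n, background_freq[go_class])
        scores[go_class] = -math.log10(max(p_value, 1e-300))  # guard against log(0)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# 10 hypothetical BLAST hits: a very common class appears in 8, a rare one in 4.
hits = [{"GO_common", "GO_rare"}] * 4 + [{"GO_common"}] * 4 + [set()] * 2
background = {"GO_common": 0.5, "GO_rare": 0.01}
for go_class, score in frequency_aware_scores(hits, background):
    print(go_class, round(score, 2))
```

By raw counts GO_common (8 hits) would win, but against its background frequency it is unremarkable, while 4 hits for GO_rare are very unlikely by chance, so GO_rare ranks first.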


Our contribution: PANNZER

• Benefits:
  – Takes the species taxonomy into account
  – Improved use of statistics

• Not public yet


Our contribution: No Name Yet

• Take PFAM domain predictions, BLAST similarities and taxonomic information

• Feed these to feature selection and to a classifier algorithm

• …Wait…
• The method is used to predict GO classes
• Not public; testing is ongoing
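This method is unpublished, so the Python below is only a hypothetical sketch of the pipeline shape: PFAM hits, BLAST-derived GO scores and the taxon are merged into one named-feature dictionary per protein, and a simple filter ranks features by how much their mean differs between proteins with and without one GO class. All names, the score scaling and the filter criterion are assumptions; the top-ranked features would then go to a classifier.

```python
def build_features(pfam_hits, blast_go_scores, taxon):
    """Merge the three information sources into one sparse feature dict."""
    features = {f"pfam:{acc}": 1.0 for acc in pfam_hits}
    # Hypothetical scaling: BLAST-derived scores (e.g. -log10 E) capped to [0, 1].
    features.update({f"blast:{go}": min(s / 100.0, 1.0) for go, s in blast_go_scores.items()})
    features[f"taxon:{taxon}"] = 1.0
    return features


def rank_features(samples, labels):
    """Rank features by |mean in positives - mean in negatives| (missing = 0)."""
    names = sorted(set().union(*samples))
    pos = [s for s, y in zip(samples, labels) if y == 1]
    neg = [s for s, y in zip(samples, labels) if y == 0]

    def mean(group, name):
        return sum(s.get(name, 0.0) for s in group) / len(group)

    return sorted(names, key=lambda n: (-abs(mean(pos, n) - mean(neg, n)), n))


samples = [
    build_features(["PF_A"], {"GO_X": 40}, "bacteria"),
    build_features(["PF_A"], {"GO_X": 30}, "plant"),
    build_features(["PF_B"], {"GO_Y": 50}, "bacteria"),
    build_features(["PF_B"], {}, "plant"),
]
labels = [1, 1, 0, 0]  # 1 = protein carries the (hypothetical) target GO class
selected = rank_features(samples, labels)[:2]
print(selected)  # ['pfam:PF_A', 'pfam:PF_B']
```

The taxon features score zero here because both taxa occur equally often in both groups; a univariate filter like this cannot see features that only matter in combination, which wrapper or embedded selection methods can pick up.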


Conclusion

• These methods are increasingly needed
• Some methods exist
• Unfortunately, there is no clear evaluation (my opinion)
• Remember: these are predictions. Nothing is certain until they are tested in the wet lab…