Tyler functional annotation thurs 1120

Functional Annotation and the Gene Ontology Brett Tyler Virginia Bioinformatics Institute


Page 1: Tyler functional annotation thurs 1120

Functional Annotation and the Gene Ontology

Brett Tyler Virginia Bioinformatics Institute

Page 2: Tyler functional annotation thurs 1120

What is Annotation?

• Comments, notes, explanations, or other types of external remarks that can be attached to a document……

• For genomics, functional annotation means attaching biological information to sequences

Presenter
Presentation Notes
Find out the advantage of one tool over another. BLAST is used to compare a novel sequence with those contained in nucleotide and protein databases by aligning the novel sequence with previously characterised genes. The emphasis of this tool is on finding regions of sequence similarity, which yield functional and evolutionary clues about the novel sequence. Alignments of this type can be either local, where the similarity is confined to one region of otherwise unrelated sequences, or global, where the similarity extends across the full length of the sequences.
Page 3: Tyler functional annotation thurs 1120

Functional Annotation (overview diagram)

• Structural Annotation

• Automated Searches: Nucleotide/Protein Databases, Domain/Motifs

• EC Number

• GO Assignments

• Metabolic Pathways

• Manual curation

Page 4: Tyler functional annotation thurs 1120


Page 5: Tyler functional annotation thurs 1120

Automated Searches

• Search programs can be downloaded and run locally on a Unix system

• Graphical user interfaces are available but normally accept only a limited number of sequences

Page 6: Tyler functional annotation thurs 1120

Homology or similarity based searches

• Local pairwise alignment tools look for any regions of similarity within the proteins that score well.

– BLAST

– Fast

• Global pairwise alignment tools take two sequences and attempt to find an alignment of the two over their full lengths.

– Needleman-Wunsch

– Finds the best-scoring of all possible alignments

• Multiple alignment tools try to align 3 or more proteins so that the maximal number of amino acids from each protein are matched in the alignment; this may or may not include the full length of some or all of the proteins.

– ClustalW
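
The difference between global and local alignment can be sketched with two small scoring functions. This is a minimal illustration, assuming a simple match/mismatch/gap scheme (+1, -1, -2) chosen only for the example. The global function follows Needleman-Wunsch; the local function follows Smith-Waterman, which is the exact dynamic-programming counterpart of what BLAST approximates with faster heuristics.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch: best score aligning a and b over their full lengths."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix costs i gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman: best score over any pair of substrings of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0, so an alignment can restart anywhere.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best
```

For a short motif embedded in a longer unrelated sequence, the local score stays high because only the matching region is scored, while the global score is pulled down by the gaps needed to span the full lengths.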

Page 7: Tyler functional annotation thurs 1120

BLAST Programs

• blastn: Search a nucleotide database using a nucleotide query

• blastp: Search a protein database using a protein query

• blastx: Search a protein database using a translated nucleotide query

• tblastn: Search a translated nucleotide database using a protein query

• tblastx: Search a translated nucleotide database using a translated nucleotide query
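
The program list above amounts to a lookup on the query type and the database type. The sketch below encodes that table; the function name `choose_blast_program` and the `translate_both` flag are hypothetical names used only for illustration and are not part of BLAST itself.

```python
# (query type, database type) -> BLAST program, as listed on the slide.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",
    ("protein", "nucleotide"): "tblastn",
}


def choose_blast_program(query_type, database_type, translate_both=False):
    """Return the BLAST program name for a query/database combination.

    translate_both selects tblastx, which translates a nucleotide query
    and a nucleotide database and compares them at the protein level.
    """
    if translate_both:
        if (query_type, database_type) != ("nucleotide", "nucleotide"):
            raise ValueError("tblastx needs a nucleotide query and a nucleotide database")
        return "tblastx"
    return BLAST_PROGRAMS[(query_type, database_type)]
```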

Page 8: Tyler functional annotation thurs 1120

Example of BLAST output

• Top row is the search protein (query) and the bottom row is the match protein (subject)

• Middle row is the consensus

• + indicates similar amino acids

• Numbers indicate amino acid position in the sequence
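
How the middle row is derived from the query and subject rows can be sketched as follows. This is a simplified illustration: BLAST marks a pair with + when its substitution-matrix score is positive, whereas the `SIMILAR_GROUPS` list here is a hand-picked assumption standing in for a real matrix.

```python
# Illustrative groups of similar amino acids (an assumption for this sketch,
# not the substitution matrix that BLAST actually uses).
SIMILAR_GROUPS = ["ILVM", "FYW", "KRH", "DE", "NQ", "ST"]


def midline(query, subject):
    """Build the middle row for two aligned rows of equal length."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    out = []
    for q, s in zip(query, subject):
        if q == "-" or s == "-":
            out.append(" ")   # gap in either row
        elif q == s:
            out.append(q)     # identical residue is repeated
        elif any(q in group and s in group for group in SIMILAR_GROUPS):
            out.append("+")   # similar residue
        else:
            out.append(" ")   # dissimilar residue
    return "".join(out)
```

For example, `midline("MKVLA", "MRIL-")` gives `"M++L "`: M and L are identical, K/R and V/I are similar, and the last column is a gap.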

Page 9: Tyler functional annotation thurs 1120


Page 10: Tyler functional annotation thurs 1120

Domain Search

Hidden Markov Models

• Statistical models of the primary structure consensus of a sequence family
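
The idea of a statistical model of a family's consensus can be sketched with a position-specific profile. This is a deliberate simplification: a real profile HMM also has insert and delete states, which are omitted here. The sketch assumes an ungapped alignment of equal-length sequences, a uniform background distribution, and a pseudocount of 1; all of these are assumptions for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned_seqs, pseudocount=1.0):
    """Estimate per-column amino acid probabilities from an ungapped alignment."""
    length = len(aligned_seqs[0])
    total = len(aligned_seqs) + pseudocount * len(ALPHABET)
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned_seqs)
        profile.append({aa: (counts[aa] + pseudocount) / total for aa in ALPHABET})
    return profile


def log_odds(profile, seq):
    """Score a sequence against the profile relative to a uniform background."""
    background = 1.0 / len(ALPHABET)
    return sum(math.log2(column[aa] / background) for column, aa in zip(profile, seq))
```

A sequence that resembles the family consensus scores above zero, while an unrelated sequence scores below zero, which is the basis for deciding whether a new protein belongs to the family.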

Page 11: Tyler functional annotation thurs 1120

Pfam: http://pfam.sanger.ac.uk/

• Large collection of protein families represented by multiple sequence alignments and HMMs

• Analyze protein sequences for Pfam matches

• Look at multiple alignments of members of the gene family

Page 12: Tyler functional annotation thurs 1120

InterPro: http://www.ebi.ac.uk/interpro/

• Database of protein families, domains and sites identified in known proteins, which can be applied to new protein sequences

• Collects protein families from other databases such as Pfam, UniProtKB and TIGRFAMs

• Sequence search is done with InterProScan

– Downloadable (runs faster on your own server, suited to large sets)

– GUI (limited number of sequences)

Page 13: Tyler functional annotation thurs 1120

Subcellular localization

• SignalP: Predicts the presence and location of signal peptide cleavage sites in amino acid sequences

• TMHMM: Predicts transmembrane helices

• TargetP: Predicts subcellular location based on chloroplast transit peptide and mitochondrial targeting sequence
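
To give a feel for what a transmembrane predictor looks for, the sketch below scans a protein for strongly hydrophobic stretches using the Kyte-Doolittle hydropathy scale. This is a much simpler technique than the hidden Markov model used by TMHMM and is shown only as a stand-in. The window of 19 residues and the cutoff of 1.6 are conventional choices for this scale and should be treated as assumptions.

```python
# Kyte-Doolittle hydropathy values per amino acid.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(seq, window=19, threshold=1.6):
    """Return (start, mean hydropathy) for every window at or above the threshold."""
    hits = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        mean = sum(KYTE_DOOLITTLE[aa] for aa in segment) / window
        if mean >= threshold:
            hits.append((start, round(mean, 2)))
    return hits
```

Windows that pass the threshold are only candidates for membrane-spanning segments; a dedicated tool such as TMHMM should be used for actual annotation.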

Page 14: Tyler functional annotation thurs 1120

SignalP Search: http://www.cbs.dtu.dk/services/SignalP/

Page 15: Tyler functional annotation thurs 1120

Sample SignalP Output

CRN2… confirmed with proteomics

Page 16: Tyler functional annotation thurs 1120

Sample SignalP Output

CRN2… confirmed with proteomics

Page 17: Tyler functional annotation thurs 1120

Search EC numbers: http://ca.expasy.org/enzyme/

Page 18: Tyler functional annotation thurs 1120


Page 19: Tyler functional annotation thurs 1120

Metabolic Pathways

• Help improve annotation by showing missing genes in essential pathways

• Useful for comparative genomics

KEGG: http://www.genome.jp/kegg/pathway.html

Reactome: http://www.reactome.org

MetaCyc: http://www.metacyc.org

Many other pathway databases are available.
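
Finding missing genes in a pathway comes down to comparing the enzyme steps a pathway requires with the EC numbers already assigned in the genome. The sketch below shows that comparison; the function name `missing_steps` and the gene identifiers in the usage note are hypothetical, and real pathway definitions would come from a database such as KEGG or MetaCyc.

```python
def missing_steps(pathway, genome_annotations):
    """Return pathway steps whose EC number is not assigned to any gene.

    pathway: dict mapping EC number -> enzyme name
    genome_annotations: dict mapping gene ID -> set of EC numbers
    """
    found = set()
    for ec_numbers in genome_annotations.values():
        found.update(ec_numbers)
    return {ec: name for ec, name in pathway.items() if ec not in found}
```

For instance, with the first four steps of glycolysis as the pathway (EC 2.7.1.1 hexokinase, 5.3.1.9 glucose-6-phosphate isomerase, 2.7.1.11 6-phosphofructokinase, 4.1.2.13 fructose-bisphosphate aldolase) and a genome in which only three of those EC numbers are assigned, the function returns the one unassigned step. That step is a candidate for a gene that was missed or mis-annotated.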

Page 20: Tyler functional annotation thurs 1120

KEGG: Kyoto Encyclopedia of Genes and Genomes

http://www.genome.jp/kegg/pathway.html

Page 21: Tyler functional annotation thurs 1120


Page 22: Tyler functional annotation thurs 1120
Page 23: Tyler functional annotation thurs 1120
Page 24: Tyler functional annotation thurs 1120
Page 25: Tyler functional annotation thurs 1120

First set of terms

These processes are general to all associations

Some initial PAMGO Biological Process terms (included in the initial 35 terms added Jan 2005)

Page 26: Tyler functional annotation thurs 1120

GO:0052048 interaction with host via secreted substance

GO:0052044 induction by symbiont of host programmed cell death

oomycete

bacterium

Page 27: Tyler functional annotation thurs 1120

GO:0052048 interaction with host via secreted substance

GO:0052044 induction by symbiont of host programmed cell death

oomycete

bacterium

GO:0009405 pathogenesis

Page 28: Tyler functional annotation thurs 1120


Page 29: Tyler functional annotation thurs 1120

Why Manual Annotation?

• Combine all search information and evidence

• Manually look through all information

• Add experimental data from literature when available

• Approach conservatively

Setback: time-consuming and more expensive.
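
A conservative way of combining evidence can be sketched as a small decision rule. The rules below are assumptions made for illustration and are not the curation procedure described in the talk: experimental evidence from the literature takes priority, a predicted function is accepted only when at least two independent sources agree, and anything else stays a hypothetical protein.

```python
def curate(evidence, min_sources=2):
    """Pick a function conservatively from (source, function, is_experimental) tuples."""
    experimental = {function for _, function, is_exp in evidence if is_exp}
    if len(experimental) == 1:
        return next(iter(experimental))   # experimental data wins
    if len(experimental) > 1:
        return "needs manual review"      # experiments disagree

    # Count distinct sources per predicted function, so that repeated hits
    # from the same tool do not count as independent support.
    support = {}
    for source, function, _ in evidence:
        support.setdefault(function, set()).add(source)
    candidates = [f for f, sources in support.items() if len(sources) >= min_sources]
    if len(candidates) == 1:
        return candidates[0]
    return "hypothetical protein"         # too little or conflicting support
```

A rule like this only triages genes; the cases it cannot settle are exactly the ones where a curator has to look through all the information by hand.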
