Functional Annotation & Comparative Genomics
Lavanya Rishishwar
February 26th, 2014
26th Feb 2014
Outline
Functional annotation
• What is functional annotation?
• Importance of functional annotation
• Approaches to functional annotation
• Pros/cons of available approaches
Comparative genomics
• What is comparative genomics?
• Importance of comparative genomics
• Approaches and tools
THE ‘WHAT?’
Functional Annotation
Genome Assembly
Assemble the Pieces Right
Gene Prediction
When on board HMS Beagle, as naturalist, I was much struck with certain facts in the distribution of the inhabitants of South America, and in the geological relations of the present to the past inhabitants of that continent. These facts seemed to me to throw some light on the origin of species - that mystery of mysteries, as it has been called by one of our greatest philosophers .
Identify the words
Functional Annotation
When on board HMS Beagle, as naturalist, I was much struck with certain facts in the distribution of the inhabitants of South America, and in the geological relations of the present to the past inhabitants of that continent. These facts seemed to me to throw some light on the origin of species - that mystery of mysteries, as it has been called by one of our greatest philosophers .
nat·u·ral·ist [nach-er-uh-list, nach-ruh-] noun
1. a person who studies or is an expert in natural history, especially a zoologist or botanist.
2. an adherent of naturalism in literature or art.
Origin: 1580–90; natural + -ist
Origin of Species, The noun
(On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life) a treatise (1859) by Charles Darwin setting forth his theory of evolution.
Identify the function (i.e., meaning) of each word
DATABASES
PROFILES
Comparative Genomics
When on board HMS Beagle, as naturalist, I was much struck with certain facts in the distribution of the inhabitants of South America, and in the geological relations of the present to the past inhabitants of that continent. These facts seemed to me to throw some light on the origin of species - that mystery of mysteries, as it has been called by one of our greatest philosophers .
When on board RMS Titanic, as painter, I was much struck with certain facts in the distribution of the inhabitants of the United Kingdom, and in the socioeconomic relations of the present to the past inhabitants of that continent. These facts seemed to me to throw some light on the origin of capitalism - that mystery of mysteries, as it has been called by one of our greatest philosophers.
THE GRAVITY OF THE ANNOTATION PROCESS
Not just Newtonian
“Ultimately, one wishes to determine how genes—and the proteins they
encode—function in the intact organism.”
Alberts B, et al. (2002) Molecular Biology of the Cell. New York: Garland Science.
function
Function? What is it?
• To a cell biologist, function might refer to the network of interactions in which the protein participates or to its localization to a certain cellular compartment.
• To a biochemist, function refers to the metabolic process in which a protein is involved or to the reaction catalyzed by an enzyme.
Functional Annotation
Functional annotation consists of attaching biological information to genomic elements.
• Biochemical function
• Biological function
• Regulation and interactions involved
• Expression
What needs to be annotated?
• Proteins
– Domains/motifs
– Signal peptide
– Transmembrane region
• Coding and non-coding RNAs
• Operons
Domain/Motif
• Domain: A discrete structural unit that is assumed to fold independently of the rest of the protein and to have its own function. Commonly ~50-250 aa, though sizes vary widely.
• Motif: A short, conserved region, frequently the most conserved region of a domain. Motifs are critical for the domain to function.
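Because a motif is a short conserved pattern, it can often be written as a regular expression. The sketch below scans a protein for the PROSITE-style N-glycosylation pattern N-{P}-[ST]-{P}; the function name and the toy sequence are invented for illustration, and real annotation tools use curated pattern and profile libraries rather than a single regex.

```python
import re

# PROSITE-style pattern N-{P}-[ST]-{P} written as a regular expression.
# The lookahead lets overlapping occurrences be reported as well.
N_GLYC = re.compile(r"(?=(N[^P][ST][^P]))")

def find_motif(protein, pattern=N_GLYC):
    """Return (1-based start, matched residues) for every motif occurrence."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(protein)]

# Toy sequence, invented for illustration only.
print(find_motif("MKNGSAANPSTNVTQ"))
```

A hit only says the pattern is present; whether the site is functional still needs supporting evidence.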
Coding and non-coding RNAs
Protein coding: enzymes, structural, regulatory, signal transduction, receptors, toxins, virulence factors, membrane/transmembrane
Non-coding: riboswitches, CRISPRs, sRNAs
Pathway Prediction
How do genes perform their function? Operons
• Operon: Several genes with related functions that are regulated together, because one piece of mRNA codes for several related proteins.
• Polycistronic mRNA, an mRNA coding for more than one polypeptide, is found mainly in prokaryotes
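One common heuristic for spotting candidate operons is to group neighbouring genes that lie on the same strand with only a short intergenic gap. The sketch below illustrates that idea only; the `Gene` record, the function name and the 50 bp default gap are assumptions made for this example, not values taken from the slides or from any particular tool.

```python
from typing import NamedTuple

class Gene(NamedTuple):
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"

def candidate_operons(genes, max_gap=50):
    """Group adjacent same-strand genes separated by at most max_gap bp.

    max_gap is a placeholder threshold; tune it for your organism.
    """
    groups = []
    for gene in sorted(genes, key=lambda g: g.start):
        last = groups[-1][-1] if groups else None
        if last and gene.strand == last.strand and gene.start - last.end <= max_gap:
            groups[-1].append(gene)
        else:
            groups.append([gene])
    # Only groups with more than one gene are operon candidates.
    return [[g.name for g in grp] for grp in groups if len(grp) > 1]

# Hypothetical coordinates, for illustration only.
genes = [
    Gene("a", 100, 400, "+"),
    Gene("b", 420, 900, "+"),
    Gene("c", 1500, 2000, "+"),
    Gene("d", 2010, 2500, "-"),
]
print(candidate_operons(genes))
```

Proximity and strand are weak evidence on their own; conservation of gene order across genomes or expression data would strengthen a call.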
APPROACHES TO FUNCTIONAL ANNOTATION
An overview
Approaches to functional annotation
• Ab initio
Based on intrinsic characteristics of gene/protein features
– Signal peptides (SignalP, LipoP)
– Transmembrane domains (TMHMM)
• Homology based
Information transfer from an experimentally characterized system
– BLAST
– InterPro
Ab initio approaches
• Fairly standard – TM regions and signal peptides have distinct patterns of sequence composition
• TM proteins include membrane-bound receptors and channels that are of particular pharmacological relevance (therapeutic or vaccine targets)
• Signal peptides direct proteins to their proper cellular or extracellular location
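To make "distinct pattern of sequence composition" concrete, here is a minimal sliding-window hydropathy sketch using the Kyte-Doolittle scale. It is not how TMHMM works (TMHMM is a hidden Markov model); the function names are invented, and the 19-residue window with a 1.6 cutoff is the commonly quoted rule of thumb, used here as an assumption.

```python
# Kyte-Doolittle hydropathy values per amino acid.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_windows(protein, window=19):
    """Mean hydropathy of every full-length window along the protein."""
    scores = [KD[aa] for aa in protein]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]

def has_candidate_tm(protein, window=19, threshold=1.6):
    """True if any window is hydrophobic enough to suggest a TM helix."""
    return any(s >= threshold for s in hydropathy_windows(protein, window))

# Toy sequences, invented for illustration only.
print(has_candidate_tm("MK" + "LIVLAVLLIVGALLVAIFL" + "RKDE"))
print(has_candidate_tm("MKDEDRKNQSTDEKRNQDSTEDKR"))
```

Sequences shorter than the window simply produce no windows and return False.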
Homology based approaches
• Significant sequence similarity implies homology (shared ancestry), which often leads to shared function
• Assumptions
– Genes/proteins evolved to perform some function will retain that function
– Deleterious mutations will be weeded out by purifying selection
– Evolution is mostly dominated by divergence
– Significant similarity will thus entail a high chance of shared origin and function
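In practice, homology-based transfer usually starts from a similarity search whose hits are then filtered. The sketch below reads BLAST tabular output (the default 12 columns of `-outfmt 6`) and keeps the best-scoring hit per query under an E-value cutoff. The function name, the cutoff of 1e-5 and the example rows are assumptions for illustration.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]

def best_hits(handle, max_evalue=1e-5):
    """Map each query to its highest-bitscore subject below max_evalue."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        evalue, bitscore = float(hit["evalue"]), float(hit["bitscore"])
        if evalue > max_evalue:
            continue
        query = hit["qseqid"]
        if query not in best or bitscore > best[query][1]:
            best[query] = (hit["sseqid"], bitscore)
    return {query: subject for query, (subject, _) in best.items()}

# Invented rows, for illustration only.
example = io.StringIO(
    "orf1\tprotA\t98.0\t300\t6\t0\t1\t300\t1\t300\t1e-150\t550\n"
    "orf1\tprotB\t45.0\t280\t150\t4\t5\t284\t10\t290\t1e-20\t95\n"
    "orf2\tprotC\t30.0\t90\t60\t3\t1\t90\t1\t88\t0.5\t28\n"
)
print(best_hits(example))
```

Queries with no hit under the cutoff are absent from the result, which is the signal to fall back to a less specific name.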
Homology based approaches
• Databases:
– NCBI
  • GenBank
  • RefSeq
– EBI
  • SwissProt
  • UniProt
– DDBJ
– KEGG
• Tools
– BLAST
– InterProScan
– GO-based
Databases
Primary vs derivative sequence databases
[Figure: primary vs derivative sequence databases. Labels: Sequence Data, From Sequencing Labs, GenBank; RefSeq, Genomes, UniGene, Assemblies; Curators, Computational Algorithms, PGAAP.]
Database choices
• RefSeq, SwissProt and UniProt all offer
– High reliability
– A high level of annotation
– Minimal redundancy
– Integration with other databases
Gene Ontology
Shulaev, V., Sargent, D. J., Crowhurst, R. N., Mockler, T. C., Folkerts, O., Delcher, A. L., ... & Salama, D. Y. (2010). The genome of woodland strawberry (Fragaria vesca). Nature genetics, 43(2), 109-116.
Analysis Tools - BLAST
God help you if you do this here.
Analysis Tools - InterProScan
Analysis Tools – GO Based
• Blast2GO
• GOMiner
• …?
Criteria for selecting methods
1. Currently being maintained
2. Applicable to prokaryotic sequences
3. Can be installed locally (supports batch jobs if GUI-based)
OR
Can be included in a pipeline, i.e., has a command-line interface
Gene naming
• You need to have a clear logic and support for assigning names to the predicted proteins
• A generally accepted scheme is as follows:
– High confidence match – function and annotation can be transferred
– Multiple high confidence matches – assign a less specific name, e.g. ABC transporter
– Low confidence match – assign function as putative
– Match to a hypothetical protein – conserved hypothetical protein
– No match in the database – hypothetical protein
• How high is high? Depends on your data.
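The naming scheme above can be sketched as a small decision function. Everything specific here is an assumption for illustration: hits are represented as (description, percent identity) pairs, "high confidence" is a caller-supplied identity threshold (90 is only a placeholder, since the right value depends on your data), and the "less specific name" is approximated by the words shared at the end of the competing descriptions.

```python
def common_suffix(names):
    """Words shared at the end of all names, e.g. 'ABC transporter'."""
    suffix = []
    for column in zip(*(reversed(name.split()) for name in names)):
        if len(set(column)) != 1:
            break
        suffix.append(column[0])
    return " ".join(reversed(suffix))

def assign_name(hits, high=90.0):
    """hits: list of (description, percent identity); high is a placeholder."""
    if not hits:
        return "hypothetical protein"
    top_desc, top_score = max(hits, key=lambda h: h[1])
    if "hypothetical" in top_desc.lower():
        return "conserved hypothetical protein"
    if top_score < high:
        return "putative " + top_desc
    confident = sorted(
        {d for d, s in hits if s >= high and "hypothetical" not in d.lower()}
    )
    if len(confident) == 1:
        return confident[0]
    # Several confident but different names: fall back to what they share.
    return common_suffix(confident) or "putative " + top_desc

# Invented hits, for illustration only.
print(assign_name([("maltose ABC transporter", 95.0),
                   ("ribose ABC transporter", 93.0)]))
```

Keeping the threshold as a parameter makes the "how high is high?" decision explicit and easy to revisit per dataset.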
Automated Pipelines
Take in a whole genome assembly and output annotations. E.g.:
• PGAAP – Prokaryotic Genome Automatic Annotation Pipeline
• CG-Pipeline – Computational Genomics Pipeline
• RAST – Rapid Annotation using Subsystem Technology
• KEGG – Kyoto Encyclopedia of Genes and Genomes
• More come out each year, each with a specific focus and capability
CAUTION!
PROS AND CONS OF CONVENTIONAL APPROACHES
Choosing The Right Function Prediction Tool
“Perutz et al. showed in 1960 that myoglobin and hemoglobin, the first two protein structures to be solved at atomic resolution using X-ray crystallography, have similar structures even though their sequences differ.”
Pros and Cons: There are no free lunches!
• Homology is useful but is not the same as “same” function
– It simply implies common ancestry
Punta, M., & Ofran, Y. (2008). The rough guide to in silico function prediction, or how to use sequence and structure information to predict protein function. PLoS computational biology, 4(10), e1000160.
• Again: the quality of a prediction is only as good as the quality of annotation in the database
• A function predictor built for eukaryotes generally should not be used for prokaryotes, and vice versa
• Building pan-genomes is a good strategy for finding more confident matches
COMPARATIVE GENOMICS
Comparative Genomics
Ciccarelli, F. D., Doerks, T., Von Mering, C., Creevey, C. J., Snel, B., & Bork, P. (2006). Toward automatic reconstruction of a highly resolved tree of life. Science, 311(5765), 1283-1287.
Comparative Genomics
• In a nutshell – it’s comparing similarities and differences in the genomes (proteins/genes/SNPs) of multiple organisms from the same or different species.
• Helps in answering questions about the
– Present: lifestyle (virulent vs avirulent); horizontally acquired segments
– Past: evolution
Comparative Genomics
• Biological questions of general interest:
– Are there rearrangements?
– Is the region(s) of interest syntenic across species?
– Are there gene gain/loss events leading to a specific trait?
– Which organisms are most similar? Which are most distant?
– What factors in the genome confer virulence?
– In our case: capsule switching? What, why and how?
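Gene gain/loss questions like those above often reduce to set operations once genes have been clustered into families. The sketch below assumes that clustering has already been done and that each genome is given as a set of gene-family identifiers; the function name and the toy data are invented for illustration.

```python
def core_and_accessory(genomes):
    """genomes: dict mapping genome name -> set of gene-family identifiers.

    Returns (pan genome, core genome, families unique to each genome).
    """
    families = list(genomes.values())
    pan = set().union(*families)        # present in at least one genome
    core = set.intersection(*families)  # present in every genome
    unique = {
        name: fams - set().union(
            *(f for other, f in genomes.items() if other != name)
        )
        for name, fams in genomes.items()
    }
    return pan, core, unique

# Toy presence/absence data, invented for illustration only.
genomes = {
    "A": {"f1", "f2", "f3"},
    "B": {"f1", "f2", "f4"},
    "C": {"f1", "f5"},
}
pan, core, unique = core_and_accessory(genomes)
print(sorted(pan), sorted(core), unique)
```

Families unique to isolates that share a trait (for example, virulence) are natural candidates for follow-up, though presence alone does not show causation.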
Comparative Genomics
Darling, Aaron E., István Miklós, and Mark A. Ragan. "Dynamics of genome rearrangement in bacterial populations." PLoS Genetics 4.7 (2008): e1000128.
Genomic Rearrangement
Comparative Genomics
Krause, A., Ramakumar, A., Bartels, D., Battistoni, F., Bekel, T., Boch, J., ... & Goesmann, A. (2006). Complete genome of the mutualistic, N2-fixing grass endophyte Azoarcus sp. strain BH72. Nature biotechnology, 24(11).
Synteny
Comparative Genomics
http://textbookofbacteriology.net/HorizontalTransfer.gif
Horizontal Gene Transfer
Comparative Genomics
• You are going to hear more about your specific goals next week
• Remember: The focus here is not on the tools but on (1) identifying the biological question, (2) your approach to answering it and (3) your results and their interpretation
Databases
• As before – there are a number of sequence databases available
– You need to decide which subset of a database you should take into consideration
– E.g.: what organism/serogroup/sequence type should your database be focused on?
• If we are also looking for virulence factors – VFDB
• If we are interested in pathways – KEGG, Pathway Tools
Analysis Tools
• Homology based – BLAST, Protein Clusters, pathway analysis
• Phylogenetics – MEGA, T-Coffee
• Virulence – VFDB
• Horizontal/lateral gene transfer – DarkHorse, Alien_hunter, phylogeny-based methods
• Visualization
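Besides phylogeny-based methods, horizontally acquired regions are often flagged by atypical sequence composition. The sketch below shows only the simplest version of that idea, a sliding-window GC content scan; it is not the algorithm used by DarkHorse or Alien_hunter, and the window size, step and z-score cutoff are placeholder assumptions.

```python
from statistics import mean, pstdev

def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def atypical_windows(genome, window=1000, step=500, z_cutoff=2.0):
    """Return (start, GC fraction) for windows whose GC content deviates
    from the mean of all windows by more than z_cutoff standard deviations."""
    starts = range(0, len(genome) - window + 1, step)
    gc = [gc_fraction(genome[s:s + window]) for s in starts]
    if len(gc) < 2:
        return []
    mu, sigma = mean(gc), pstdev(gc)
    if sigma == 0:
        return []  # perfectly uniform composition, nothing stands out
    return [
        (s, round(g, 3))
        for s, g in zip(starts, gc)
        if abs(g - mu) / sigma > z_cutoff
    ]

# Synthetic sequence with a GC-rich island, for illustration only.
genome = "AT" * 5000 + "GC" * 500 + "AT" * 5000
print(atypical_windows(genome))
```

Atypical composition is a hint, not proof of transfer: highly expressed genes and rRNA operons can also deviate from the genome average.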
Phylogenetic Analysis
• There are a number of ways you can compare organisms/genomes:
– 16S rRNA tree
– MLST-based methods
– ANI-based methods
• All three can be visualized as a tree to assess the relatedness between the organisms
• ANI (average nucleotide identity) has been shown to correlate well with DDH (DNA–DNA hybridization) by Konstantinidis et al.
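Each of these methods ends in a pairwise distance matrix that a tree-building method can cluster. As a small illustration for the MLST case, the sketch below computes the fraction of loci at which two allelic profiles differ; the function names and the seven-locus profiles are invented for this example.

```python
from itertools import combinations

def allelic_distance(profile_a, profile_b):
    """Fraction of loci at which two MLST allelic profiles differ."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same loci")
    differing = sum(a != b for a, b in zip(profile_a, profile_b))
    return differing / len(profile_a)

def distance_matrix(profiles):
    """Pairwise distances keyed by (isolate, isolate) in sorted order."""
    return {
        (x, y): allelic_distance(profiles[x], profiles[y])
        for x, y in combinations(sorted(profiles), 2)
    }

# Invented seven-locus allelic profiles, for illustration only.
profiles = {
    "isolate1": (1, 2, 3, 4, 5, 6, 7),
    "isolate2": (1, 2, 3, 4, 5, 6, 8),
    "isolate3": (9, 9, 3, 4, 9, 9, 9),
}
for pair, dist in distance_matrix(profiles).items():
    print(pair, round(dist, 3))
```

The resulting matrix can be passed to any distance-based clustering method to draw the tree.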
More traditional
Konstantinidis, K. T., Ramette, A., & Tiedje, J. M. (2006). The bacterial species definition in the genomic era. Philosophical Transactions of the Royal Society B: Biological Sciences, 361(1475), 1929-1940.
Goris, J., Konstantinidis, K. T., Klappenbach, J. A., Coenye, T., Vandamme, P., & Tiedje, J. M. (2007). DNA–DNA hybridization values and their relationship to whole-genome sequence similarities. International journal of systematic and evolutionary microbiology, 57(1), 81-91.
Same phenotype, different evolutionary lineages
• Phenotypic concordance need not imply the same ancestral lineage
• Different species sometimes independently gain a set of mutations, in the same or different gene(s), that leads to the same phenotype
• Acquiring antibiotic resistance is one such example
• Such cases must be investigated case by case; the underlying causes range from SNPs, gene gain/loss and indels to plasmid uptake
Visualization
• It’s more important than you think
• A plethora of visualization tools are available today for various purposes
• E.g.:
– Circos
– CGView
– BRIG
– Artemis
– IGV
– Mauve
– VISTA, etc.
Visualization
Rishishwar, L., Katz, L. S., Sharma, N. V., Rowe, L., Frace, M., Thomas, J. D., ... & Jordan, I. K. (2012). Genomic Basis of a Polyagglutinating Isolate of Neisseria meningitidis. Journal of bacteriology, 194(20), 5649-5656.
Capsule switching breakpoint resolution
Rishishwar, L., Katz, L. S., Sharma, N. V., Rowe, L., Frace, M., Thomas, J. D., ... & Jordan, I. K. (2012). Genomic Basis of a Polyagglutinating Isolate of Neisseria meningitidis. Journal of bacteriology, 194(20), 5649-5656.