1-Month Practical Master Course Genome Analysis Jaap Heringa Centre for Integrative Bioinformatics...
![Page 1: 1-Month Practical Master Course Genome Analysis Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) Vrije Universiteit Amsterdam The Netherlands](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681681f550346895dddaf46/html5/thumbnails/1.jpg)
1-Month Practical Master Course
Genome Analysis
Jaap Heringa
Centre for Integrative Bioinformatics VU (IBIVU)
Vrije Universiteit Amsterdam
The Netherlands
![Page 2: 1-Month Practical Master Course Genome Analysis Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) Vrije Universiteit Amsterdam The Netherlands](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681681f550346895dddaf46/html5/thumbnails/2.jpg)
Mathematics, Statistics
Computer Science, Informatics
Biology, Molecular biology
Medicine
Chemistry
Physics
Bioinformatics
![Page 3: 1-Month Practical Master Course Genome Analysis Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) Vrije Universiteit Amsterdam The Netherlands](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681681f550346895dddaf46/html5/thumbnails/3.jpg)
Biological Sequence Analysis
• Pair-wise sequence alignment
• Residue exchange matrices
• Multiple sequence alignment
• Phylogeny
![Page 4: 1-Month Practical Master Course Genome Analysis Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) Vrije Universiteit Amsterdam The Netherlands](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681681f550346895dddaf46/html5/thumbnails/4.jpg)
.....acctc ctgtgcaaga acatgaaaca nctgtggttc tcccagatgg gtcctgtccc aggtgcacct gcaggagtcg ggcccaggac tggggaagcc tccagagctc aaaaccccac ttggtgacac aactcacaca tgcccacggt gcccagagcc caaatcttgt gacacacctc ccccgtgccc acggtgccca gagcccaaat cttgtgacac acctccccca tgcccacggt gcccagagcc caaatcttgt gacacacctc ccccgtgccc ccggtgccca gcacctgaac tcttgggagg accgtcagtc ttcctcttcc ccccaaaacc caaggatacc cttatgattt cccggacccc tgaggtcacg tgcgtggtgg tggacgtgag ccacgaagac ccnnnngtcc agttcaagtg gtacgtggac ggcgtggagg tgcataatgc caagacaaag ctgcgggagg agcagtacaa cagcacgttc cgtgtggtca gcgtcctcac cgtcctgcac caggactggc tgaacggcaa ggagtacaag tgcaaggtct ccaacaaagc aaccaagtca gcctgacctg cctggtcaaa ggcttctacc ccagcgacat cgccgtggag tgggagagca atgggcagcc ggagaacaac tacaacacca cgcctcccat gctggactcc gacggctcct tcttcctcta cagcaagctc accgtggaca agagcaggtg gcagcagggg aacatcttct catgctccgt gatgcatgag gctctgcaca accgctacac gcagaagagc ctctc.....
DNA sequence
![Page 5: 1-Month Practical Master Course Genome Analysis Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) Vrije Universiteit Amsterdam The Netherlands](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681681f550346895dddaf46/html5/thumbnails/5.jpg)
Genome size

Organism                   Number of base pairs
φX-174 virus               5,386
Epstein-Barr virus         172,282
Mycoplasma genitalium      580,000
Haemophilus influenzae     1.8 × 10^6
Yeast (S. cerevisiae)      12.1 × 10^6
Human                      3.2 × 10^9
Wheat                      16 × 10^9
Lilium longiflorum         90 × 10^9
Salamander                 100 × 10^9
Amoeba dubia               670 × 10^9
![Page 6: 1-Month Practical Master Course Genome Analysis Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) Vrije Universiteit Amsterdam The Netherlands](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681681f550346895dddaf46/html5/thumbnails/6.jpg)
Three main principles
• DNA makes RNA makes Protein
• Structure more conserved than sequence
• Sequence → Structure → Function
![Page 7: 1-Month Practical Master Course Genome Analysis Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) Vrije Universiteit Amsterdam The Netherlands](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681681f550346895dddaf46/html5/thumbnails/7.jpg)
TERTIARY STRUCTURE (fold)
Genome
Expressome
Proteome
Metabolome
Functional Genomics
Regulation, signalling cascades, chaperonins, compartmentalisation
![Page 8: 1-Month Practical Master Course Genome Analysis Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) Vrije Universiteit Amsterdam The Netherlands](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681681f550346895dddaf46/html5/thumbnails/8.jpg)
How to go from DNA to protein sequence
A piece of double stranded DNA:
5’ attcgttggcaaatcgcccctatccggc 3’
3’ taagcaaccgtttagcggggataggccg 5’
DNA direction is from 5’ to 3’
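The relation between the two strands above can be sketched in a few lines of Python. This is only an illustration: the names `COMPLEMENT` and `reverse_complement` are chosen here and do not come from the slides. Because both strands are read 5’ to 3’, the lower strand is obtained by complementing the upper strand and then reversing it.

```python
# Complement table for the four DNA bases (lower- and upper-case).
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written in the 5' -> 3' direction."""
    return seq.translate(COMPLEMENT)[::-1]


top = "attcgttggcaaatcgcccctatccggc"        # upper strand, 5' -> 3'
bottom_3_to_5 = top.translate(COMPLEMENT)   # lower strand as drawn (3' -> 5')
bottom_5_to_3 = reverse_complement(top)     # same lower strand, read 5' -> 3'

print(bottom_3_to_5)
print(bottom_5_to_3)
```

Applying `reverse_complement` twice returns the original strand, which is a quick sanity check for any such helper.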
![Page 9: 1-Month Practical Master Course Genome Analysis Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) Vrije Universiteit Amsterdam The Netherlands](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681681f550346895dddaf46/html5/thumbnails/9.jpg)
How to go from DNA to protein sequence
6-frame translation using the codon table (last lecture):
5’ attcgttggcaaatcgcccctatccggc 3’
3’ taagcaaccgtttagcggggataggccg 5’
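A 6-frame translation of this fragment could look like the sketch below. It assumes the standard genetic code (the slide only says "the codon table"), writes stop codons as `*`, and uses `X` for any codon containing a letter other than A, C, G or T. The names `CODON_TABLE`, `translate` and `six_frame` are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (an assumption:
# the slides do not say which codon table is meant).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq: str) -> str:
    """Translate complete codons only; trailing bases are ignored."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame(seq: str) -> dict:
    """Return the three forward (+1..+3) and three reverse (-1..-3) frames."""
    forward = seq.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(forward[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


frames = six_frame("attcgttggcaaatcgcccctatccggc")
for name, protein in sorted(frames.items()):
    print(name, protein)
```

Which of the six frames (if any) is biologically meaningful cannot be decided from the translation alone; long stretches without `*` are the usual first hint.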
![Page 10: 1-Month Practical Master Course Genome Analysis Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) Vrije Universiteit Amsterdam The Netherlands](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681681f550346895dddaf46/html5/thumbnails/10.jpg)
Dean, A. M. and G. B. Golding: Pacific Symposium on Biocomputing 2000
Evolution and three-dimensional protein structure information
Isocitrate dehydrogenase:
The distance from the active site (in yellow) determines the rate of evolution (red = fast evolution, blue = slow evolution)
![Page 11: 1-Month Practical Master Course Genome Analysis Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) Vrije Universiteit Amsterdam The Netherlands](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681681f550346895dddaf46/html5/thumbnails/11.jpg)
Protein Sequence-Structure-Function
Sequence
Structure
Function
Threading
Homology searching (BLAST)
Ab initio prediction and folding
Function prediction from structure
![Page 12: 1-Month Practical Master Course Genome Analysis Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) Vrije Universiteit Amsterdam The Netherlands](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681681f550346895dddaf46/html5/thumbnails/12.jpg)
Widely used tool for homology detection: PSI-BLAST
• Heuristic tool to cut down computations required for database searching (~1M sequences in DB)
• Sensitivity gained by iteratively finding hits (local alignments) and repeating search
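(Diagram labels: Q, DB, T, hits, PSSM; the hits found by searching Q against DB build a PSSM that drives the next search round.)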
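To make the PSSM step more concrete, here is a much simplified sketch of how a position-specific scoring matrix could be derived from a set of aligned hits. It is not PSI-BLAST's actual procedure: it assumes a uniform background frequency and a fixed pseudocount, and it skips sequence weighting. The hit sequences and the names `build_pssm` and `pssm_score` are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 amino acids, one-letter code


def build_pssm(aligned_hits, pseudocount=1.0):
    """Build log-odds scores per alignment column (uniform background assumed)."""
    background = 1.0 / len(ALPHABET)
    pssm = []
    for column in zip(*aligned_hits):
        counts = Counter(c for c in column if c in ALPHABET)  # gaps are skipped
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return pssm


def pssm_score(pssm, seq):
    """Score an ungapped sequence of the same length against the PSSM."""
    return sum(column[aa] for column, aa in zip(pssm, seq))


hits = ["MKV", "MKI", "MRV"]  # toy aligned hits, purely illustrative
pssm = build_pssm(hits)
print(round(pssm_score(pssm, "MKV"), 2), round(pssm_score(pssm, "AAA"), 2))
```

In an iterative search, a profile like this would replace the plain query in the next round, so that residues conserved among the hits weigh more heavily than variable ones.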
![Page 13: 1-Month Practical Master Course Genome Analysis Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) Vrije Universiteit Amsterdam The Netherlands](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681681f550346895dddaf46/html5/thumbnails/13.jpg)
Threading
Query sequence
Template sequence
+
Template structure
Compatibility score
![Page 15: 1-Month Practical Master Course Genome Analysis Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) Vrije Universiteit Amsterdam The Netherlands](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681681f550346895dddaf46/html5/thumbnails/15.jpg)
Fold recognition by threading
Query sequence
Compatibility scores
Fold 1
Fold 2
Fold 3
Fold N
![Page 16: 1-Month Practical Master Course Genome Analysis Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) Vrije Universiteit Amsterdam The Netherlands](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681681f550346895dddaf46/html5/thumbnails/16.jpg)
“Nothing in Biology makes sense except in the light of evolution” (Theodosius Dobzhansky (1900-1975))
“Nothing in bioinformatics makes sense except in the light of Biology”
Bioinformatics
![Page 17: 1-Month Practical Master Course Genome Analysis Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) Vrije Universiteit Amsterdam The Netherlands](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681681f550346895dddaf46/html5/thumbnails/17.jpg)
Divergent evolution
Ancestral sequence: ABCD
Descendants: ACCD (B → C, mutation) and ABD (C → ø, deletion)
Pairwise alignment:
ACCD      ACCD
AB─D  or  A─BD
![Page 18: 1-Month Practical Master Course Genome Analysis Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) Vrije Universiteit Amsterdam The Netherlands](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681681f550346895dddaf46/html5/thumbnails/18.jpg)
Divergent evolution
Ancestral sequence: ABCD
Descendants: ACCD (B → C, mutation) and ABD (C → ø, deletion)
Pairwise alignment:
ACCD      ACCD
AB─D  or  A─BD
True alignment: the first one (ACCD over AB─D), because its gap sits where C was deleted.
![Page 19: 1-Month Practical Master Course Genome Analysis Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) Vrije Universiteit Amsterdam The Netherlands](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681681f550346895dddaf46/html5/thumbnails/19.jpg)
Mutations under divergent evolution
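An ancestral sequence diverges into Sequence 1 and Sequence 2:
1: ACCTGTAATC
2: ACGTGCGATC
     *  **
D = 3/10 (fraction of different sites (nucleotides))

(a) One substitution, one visible: ancestral G; descendants G and C
(b) Two substitutions, one visible: ancestral G; descendants A and C
(c) Two substitutions, none visible: ancestral G; descendants A and A
(d) Back mutation, not visible: ancestral G; descendants G and G, one of them via an intermediate A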
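The observed distance D from this slide is simply the fraction of aligned sites that differ. A minimal sketch (the function name `p_distance` is chosen here for illustration):

```python
def p_distance(seq1: str, seq2: str) -> float:
    """Fraction of sites that differ between two equal-length aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(x != y for x, y in zip(seq1, seq2))
    return differences / len(seq1)


d = p_distance("ACCTGTAATC", "ACGTGCGATC")
print(d)
```

As cases (b) to (d) show, D only counts visible differences, so it underestimates the number of substitutions that actually happened.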
![Page 20: 1-Month Practical Master Course Genome Analysis Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) Vrije Universiteit Amsterdam The Netherlands](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681681f550346895dddaf46/html5/thumbnails/20.jpg)
Convergent evolution
• Often with shorter motifs (e.g. active sites)
• Motif (function) has evolved more than once independently, e.g. starting with two very different sequences adopting different folds
• Sequences and associated structures remain different, but (functional) motif can become identical
• Classical example: the serine proteinases subtilisin and chymotrypsin
![Page 21: 1-Month Practical Master Course Genome Analysis Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) Vrije Universiteit Amsterdam The Netherlands](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681681f550346895dddaf46/html5/thumbnails/21.jpg)
Serine proteinase (subtilisin) and chymotrypsin
• Different evolutionary origins, no sequence similarity
• Similarities in the reaction mechanisms. Chymotrypsin, subtilisin and carboxypeptidase C have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base.
• The geometric orientations of the catalytic residues are similar between families, despite different protein folds.
• The linear arrangements of the catalytic residues reflect different family relationships. For example the catalytic triad in the chymotrypsin clan (SA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC).
![Page 22: 1-Month Practical Master Course Genome Analysis Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) Vrije Universiteit Amsterdam The Netherlands](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681681f550346895dddaf46/html5/thumbnails/22.jpg)
A protein sequence alignment
MSTGAVLIY--TSILIKECHAMPAGNE-----
---GGILLFHRTHELIKESHAMANDEGGSNNS
   *  *    *  **** ***

A DNA sequence alignment
attcgttggcaaatcgcccctatccggccttaa
att---tggcggatcg-cctctacgggcc----
***   ****  **** **    * ****
![Page 23: 1-Month Practical Master Course Genome Analysis Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) Vrije Universiteit Amsterdam The Netherlands](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681681f550346895dddaf46/html5/thumbnails/23.jpg)
What can sequence tell us about structure (HSSP)
Sander & Schneider, 1991
![Page 24: 1-Month Practical Master Course Genome Analysis Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) Vrije Universiteit Amsterdam The Netherlands](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681681f550346895dddaf46/html5/thumbnails/24.jpg)
Searching for similarities
What is the function of the new gene?
The “lazy” investigation (i.e., no biological experiments, just bioinformatics techniques):
– Find a set of similar protein sequences to the unknown sequence
– Identify similarities and differences
– For long proteins: identify domains first
![Page 25: 1-Month Practical Master Course Genome Analysis Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) Vrije Universiteit Amsterdam The Netherlands](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681681f550346895dddaf46/html5/thumbnails/25.jpg)
Evolutionary and functional relationships
Reconstruct evolutionary relation:
• Based on sequence
– Identity (simplest method)
– Similarity
• Homology (common ancestry: the ultimate goal)
• Other (e.g., 3D structure)
Functional relation: Sequence → Structure → Function
![Page 26: 1-Month Practical Master Course Genome Analysis Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) Vrije Universiteit Amsterdam The Netherlands](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681681f550346895dddaf46/html5/thumbnails/26.jpg)
Searching for similarities
Common ancestry is more interesting: it makes it more likely that genes share the same function.
Homology: sharing a common ancestor
– a binary property (yes/no)
– it’s a nice tool: when (an unknown) gene X is homologous to (a known) gene G, we gain a lot of information on X: what we know about G can be transferred to X as a good suggestion.
![Page 27: 1-Month Practical Master Course Genome Analysis Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) Vrije Universiteit Amsterdam The Netherlands](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681681f550346895dddaf46/html5/thumbnails/27.jpg)
Biological definitions for related sequences
Homologues are similar sequences that have been derived from a common ancestor sequence. Homologues can be described as either orthologues or paralogues.
Orthologues are similar sequences in two different organisms that have arisen due to a speciation event. Orthologues typically retain identical or similar functionality throughout evolution.
Paralogues are similar sequences within a single organism that have arisen due to a gene duplication event.
Xenologues are similar sequences that do not share the same evolutionary origin, but rather have arisen out of horizontal transfer events through symbiosis, viruses, etc.
![Page 28: 1-Month Practical Master Course Genome Analysis Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) Vrije Universiteit Amsterdam The Netherlands](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681681f550346895dddaf46/html5/thumbnails/28.jpg)
How to evolve
Important distinction:
• Orthologues: homologous proteins in different species (all deriving from same ancestor)
• Paralogues: homologous proteins in same species (internal gene duplication)
• In practice: to recognise orthology, bi-directional best hit is used in conjunction with database search program (this is called an operational definition)
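The bi-directional best hit criterion can be sketched as follows, assuming the search scores have already been collected into nested dictionaries. The protein names and scores below are invented toy data, and the function names are illustrative; in practice the scores would come from running a database search program in both directions.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject: {query: subject}."""
    return {q: max(hits, key=hits.get) for q, hits in scores.items() if hits}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is the best hit of a AND a is the best hit of b."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical search scores between proteins of species A and species B.
a_vs_b = {"A1": {"B1": 250, "B2": 90}, "A2": {"B1": 120, "B2": 80}}
b_vs_a = {"B1": {"A1": 245, "A2": 118}, "B2": {"A1": 88, "A2": 79}}

pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

Here A2 also finds B1 as its best hit, but B1 points back to A1, so only the pair (A1, B1) qualifies as putative orthologues under this operational definition.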
![Page 29: 1-Month Practical Master Course Genome Analysis Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) Vrije Universiteit Amsterdam The Netherlands](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681681f550346895dddaf46/html5/thumbnails/29.jpg)
Source: http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/Orthology.html
So this means …
![Page 30: 1-Month Practical Master Course Genome Analysis Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) Vrije Universiteit Amsterdam The Netherlands](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681681f550346895dddaf46/html5/thumbnails/30.jpg)
Example today: Pairwise sequence alignment needs sense of evolution
Global dynamic programming
Input sequences:
MDAGSTVILCFVG
MDAASTILCGS
Amino Acid Exchange Matrix
Gap penalties (open, extension)
Search matrix
Resulting alignment:
MDAGSTVILCFVG-
MDAAST-ILC--GS
Evolution
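As a rough illustration of global dynamic programming, the sketch below fills a search matrix and traces back one optimal alignment. It deliberately simplifies the setup on the slide: a plain match/mismatch score stands in for the amino acid exchange matrix, and a single linear gap penalty stands in for the open/extension pair. All parameter values are arbitrary assumptions.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty; returns (score, row_a, row_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,     # gap in b
                              score[i][j - 1] + gap)     # gap in a
    # Trace back from the bottom-right corner to recover one optimal alignment.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + s:
            row_a.append(a[i - 1]); row_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1]); row_b.append("-"); i -= 1
        else:
            row_a.append("-"); row_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))


best, top_row, bottom_row = needleman_wunsch("MDAGSTVILCFVG", "MDAASTILCGS")
print(best)
print(top_row)
print(bottom_row)
```

With these toy parameters the alignment need not match the one on the slide; the result depends on the exchange matrix and gap penalties, which is exactly where the "sense of evolution" enters.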
![Page 31: 1-Month Practical Master Course Genome Analysis Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) Vrije Universiteit Amsterdam The Netherlands](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681681f550346895dddaf46/html5/thumbnails/31.jpg)
How to determine similarity
Frequent evolutionary events at the DNA level:
1. Substitution
2. Insertion, deletion
3. Duplication
4. Inversion
We will restrict ourselves to these events
![Page 32: 1-Month Practical Master Course Genome Analysis Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) Vrije Universiteit Amsterdam The Netherlands](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681681f550346895dddaf46/html5/thumbnails/32.jpg)
A DNA sequence alignment (nucleotide one-letter code)
attcgttggcaaatcgcccctatccggccttaa
att---tggcggatcg-cctctacgggcc----
***   ****  **** **    * ****

A protein sequence alignment (amino acid one-letter code)
MSTGAVLIY--TSILIKECHAMPAGNE-----
---GGILLFHRTHELIKESHAMANDEGGSNNS
   *  *    *  **** ***
![Page 33: 1-Month Practical Master Course Genome Analysis Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) Vrije Universiteit Amsterdam The Netherlands](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681681f550346895dddaf46/html5/thumbnails/33.jpg)
– Substitution (or match/mismatch)
• DNA
• proteins
– Gap penalty
• Linear: gp(k)=ak
• Affine: gp(k)=b+ak
• Concave, e.g.: gp(k)=log(k)
The score for an alignment is the sum of the scores over all alignment columns
Dynamic programming: Scoring alignments
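The three gap penalty shapes listed on this slide can be compared with a short sketch. The parameter values for a and b are arbitrary assumptions, and the function names are illustrative.

```python
import math


def linear_gap(k, a=1.0):
    """gp(k) = a*k: every gap position costs the same."""
    return a * k


def affine_gap(k, b=10.0, a=1.0):
    """gp(k) = b + a*k: a fixed cost b for the gap plus a per-position cost a."""
    return b + a * k


def concave_gap(k):
    """gp(k) = log(k): each extra position costs less than the previous one."""
    return math.log(k)


for k in (1, 2, 5, 10):
    print(k, linear_gap(k), affine_gap(k), round(concave_gap(k), 2))
```

With an affine penalty, one gap of length 2 costs b + 2a, while two separate gaps of length 1 cost 2b + 2a, so a few long gaps are favoured over many short ones.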
![Page 34: 1-Month Practical Master Course Genome Analysis Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) Vrije Universiteit Amsterdam The Netherlands](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681681f550346895dddaf46/html5/thumbnails/34.jpg)
Dynamic programming: Scoring alignments
$$S_{a,b} = \sum_{l} s(a_i, b_j) - \sum_{k} N_k \cdot gp(k)$$
$$gp(k) = \mathit{gapinit} + k \cdot \mathit{gapextension} \quad \text{(affine gap penalties)}$$
![Page 35: 1-Month Practical Master Course Genome Analysis Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) Vrije Universiteit Amsterdam The Netherlands](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681681f550346895dddaf46/html5/thumbnails/35.jpg)
DNA: define a score for match/mismatch of letters

Simple:
     A    C    G    T
A    1   -1   -1   -1
C   -1    1   -1   -1
G   -1   -1    1   -1
T   -1   -1   -1    1

Used in genome alignments:
       A     C     G     T
A     91  -114   -31  -123
C   -114   100  -125   -31
G    -31  -125   100  -114
T   -123   -31  -114    91
![Page 36: 1-Month Practical Master Course Genome Analysis Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) Vrije Universiteit Amsterdam The Netherlands](https://reader036.fdocuments.in/reader036/viewer/2022081604/5681681f550346895dddaf46/html5/thumbnails/36.jpg)
Dynamic programmingScoring alignments
Amino Acid Exchange Matrix (20 × 20)
Affine gap penalties (open, extension): 10, 1
T D W V T A L K
T D W L - - I K
Score: s(T,T)+s(D,D)+s(W,W)+s(V,L)-Po-2Px+s(L,I)+s(K,K)
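The score expression for this alignment can be evaluated with a small sketch. Two things are assumed here and are not stated on the slide: the exchange scores are taken from BLOSUM62 (the slide does not name a matrix), and the "10 1" on the slide is read as Po = 10 (open) and Px = 1 (extension). The names `EXCHANGE`, `pair_score` and `score_alignment` are illustrative.

```python
# Assumed exchange scores (BLOSUM62 entries for the pairs in this example only).
EXCHANGE = {
    ("T", "T"): 5, ("D", "D"): 6, ("W", "W"): 11,
    ("L", "V"): 1, ("I", "L"): 2, ("K", "K"): 5,
}


def pair_score(x, y):
    """Symmetric lookup: s(x, y) == s(y, x)."""
    return EXCHANGE[tuple(sorted((x, y)))]


def score_alignment(row_a, row_b, gap_open=10, gap_extend=1):
    """Sum of column scores minus an affine penalty (open + k*extend) per gap."""
    score = 0
    gap_side = None  # which row currently has a run of '-' (None if no gap)
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            side = "a" if x == "-" else "b"
            if side != gap_side:      # a new gap starts: pay the opening cost once
                score -= gap_open
            score -= gap_extend       # every gap position pays the extension cost
            gap_side = side
        else:
            score += pair_score(x, y)
            gap_side = None
    return score


total = score_alignment("TDWVTALK", "TDWL--IK")
print(total)
```

Under these assumptions the sum is 5 + 6 + 11 + 1 - 10 - 2·1 + 2 + 5; a different exchange matrix or different gap penalties would change the total but not the way it is assembled.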