Immunological bioinformatics

Immunological bioinformatics Ole Lund, Center for Biological Sequence Analysis (CBS) Denmark.


Page 1: Immunological bioinformatics

Immunological bioinformatics

Ole Lund, Center for Biological Sequence Analysis (CBS), Denmark.

Page 2: Immunological bioinformatics

World-wide Spread of SARS
Status as of July 11, 2003: 8437 Infected, 813 Dead

Page 3: Immunological bioinformatics

SARS

First severe infectious disease to emerge in the post-genomic era
Modern societies are vulnerable to epidemics
Classical containment strategies have been successful in controlling the epidemic, but
– SARS may resurface (e.g. be seasonal)
– The suggested existence of an animal reservoir could compromise the containment strategy
Need to develop a vaccine strategy
Biotechnology has provided new tools to analyze genome/proteome information and guide vaccine development
The causative virus, the SARS coronavirus (SARS CoV), has been isolated and sequenced in full length

 

Page 4: Immunological bioinformatics

Main scientific achievements

Discovery of causative agent
Genome(s)
3D structure of main proteinase
Origin
– Similar viruses found in Himalayan palm civets and other animals, including a raccoon-dog, and in humans working at an animal market in Guangdong, China (Guan et al., Sep 4, 2003).

 

Himalayan (Masked) palm civet

Ferret-Badger

Raccoon-dog

http://biobase.dk/~david-c/uk-dk-mammmal-list.htm

Page 5: Immunological bioinformatics

New corona viruses

1978 Porcine epidemic diarrhea virus (PEDV), probably from humans
1984 Porcine respiratory coronavirus
1987 Porcine reproductive and respiratory syndrome (PRRS)
1993 Bovine coronavirus
2003 SARS

 

Source: Michael Buchmeier, Beijing June, 2003

Page 6: Immunological bioinformatics

Will it be back?

When?
– Every year? Like the flu.
– Every few years? Like measles used to.
– Sporadic? Like Ebola.
– Never?

Lab safety: The patient, a 27-year-old virologist, worked on the West Nile virus in a biosafety level 3 lab at the Environmental Health Institute, where the SARS coronavirus was also studied (Enserink, 2003).

 

Page 7: Immunological bioinformatics

How does the immune system “see” a virus? 

Page 8: Immunological bioinformatics

The immune system

The innate immune system
– Found in animals and plants
– Fast response
– Complement, Toll-like receptors

The adaptive immune system
– Found in vertebrates
– Stronger response the 2nd time
– B lymphocytes
    Produce antibodies (Abs), which recognize 3D shapes
    Neutralize virus/bacteria outside cells
– T lymphocytes
    Cytotoxic T lymphocytes (CTLs) - MHC class I
      – Recognize foreign protein sequences in infected cells
      – Kill infected cells
    Helper T lymphocytes (HTLs) - MHC class II
      – Recognize foreign protein sequences presented by immune cells
      – Activate cells

Page 9: Immunological bioinformatics

 

Page 10: Immunological bioinformatics

Weight matrices (Hidden Markov models)

YMNGTMSQV
GILGFVFTL
ALWGFFPVV
ILKEPVHGV
ILGFVFTLT
LLFGYPVYV
GLSPTVWLS
WLSLLVPFV
FLPSDFFPS
CVGGLLTMV
FIAGNSAYE

A2 Logo

Page 11: Immunological bioinformatics

Protein sequence information content

Entropy
– Average uncertainty in the random variable
– $H = -\sum_i p_i \log_2 p_i$, range: 0 to $\log_2(20) = 4.3$
– Logo height $I = \log_2(20) - H$

Relative entropy (Kullback-Leibler distance)
– $D = \sum_i p_i \log_2(p_i/q_i)$, range: 0 to infinity

Mutual information
– Reduction in uncertainty due to knowledge of another random variable (corresponds to correlation)
– $M = \sum_{i,j} p_{ij} \log_2\left(p_{ij}/(p_i p_j)\right)$
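The three quantities on this slide can be computed directly from a set of aligned peptides. The following is a minimal sketch, not the code behind the slides: the function names are made up for illustration, the four peptides are taken from the A2 binder list on the previous slide, the logo height is taken as log2(20) minus the entropy, and the flat background q_i = 1/20 is an assumption.

```python
import math
from collections import Counter

def column_frequencies(peptides, position):
    """Observed amino acid frequencies p_i at one alignment position."""
    counts = Counter(p[position] for p in peptides)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}

def entropy(p):
    """Shannon entropy H = -sum_i p_i log2 p_i, in bits."""
    return -sum(x * math.log2(x) for x in p.values() if x > 0)

def logo_height(p, alphabet_size=20):
    """Information content I = log2(20) - H."""
    return math.log2(alphabet_size) - entropy(p)

def relative_entropy(p, q):
    """Kullback-Leibler distance D = sum_i p_i log2(p_i / q_i)."""
    return sum(x * math.log2(x / q[aa]) for aa, x in p.items() if x > 0)

def mutual_information(peptides, a, b):
    """M = sum_ij p_ij log2(p_ij / (p_i p_j)) for alignment positions a and b."""
    n = len(peptides)
    pa = column_frequencies(peptides, a)
    pb = column_frequencies(peptides, b)
    pairs = Counter((p[a], p[b]) for p in peptides)
    return sum((c / n) * math.log2((c / n) / (pa[x] * pb[y]))
               for (x, y), c in pairs.items())

peptides = ["GILGFVFTL", "ALWGFFPVV", "ILKEPVHGV", "LLFGYPVYV"]
flat = {aa: 1 / 20 for aa in "ACDEFGHIKLMNPQRSTVWY"}  # assumed background

p2 = column_frequencies(peptides, 1)  # motif position 2 (index 1)
print("H  =", round(entropy(p2), 3))
print("I  =", round(logo_height(p2), 3))
print("D  =", round(relative_entropy(p2, flat), 3))
print("M  =", round(mutual_information(peptides, 1, 8), 3))
```

With a flat background the relative entropy equals the logo height, which is a quick consistency check on both functions.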

 

Page 12: Immunological bioinformatics

Prediction of MHC binding specificity

Simple motifs
– Allowed (non-allowed) amino acids

Extended motifs
– Amino acid preferences

Structural models
– Limitations: precision of force field, and speed of calculations

Neural networks
– Can take correlations into account

 

Page 13: Immunological bioinformatics

Log odds ratios

Used for scoring alignments (BLAST), HMMs, matrix methods

Odds ratio of observing given amino acids
– Relative probability of observing amino acid $aa_i$ in motif position $j$
– $O_j = p(aa_i \text{ at pos } j)/p(aa_i)$

Assumption of independence =>
– Odds for observing sequence $= O_1 O_2 \cdots O_n$

Log odds ratio
– $LO = \log(O_1 O_2 \cdots O_n) = \log(O_1) + \log(O_2) + \cdots + \log(O_n)$
– $LO$ in half bits $= 2\,LO/\log(2)$
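A weight matrix built this way can be sketched in a few lines. This is an illustrative example under stated assumptions rather than the method used for the results in this talk: the background p(aa) is assumed flat (1/20), a pseudocount of 1 is added so unseen amino acids do not give log(0), and the training peptides are four of the A2 binders listed earlier.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def odds_matrix(peptides, pseudocount=1.0):
    """O_j(aa) = p(aa at position j) / p(aa), one dict per motif position."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: flat background
    matrix = []
    for j in range(len(peptides[0])):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for peptide in peptides:
            counts[peptide[j]] += 1
        total = sum(counts.values())
        matrix.append({aa: (counts[aa] / total) / background
                       for aa in AMINO_ACIDS})
    return matrix

def log_odds(matrix, peptide):
    """LO = log(O_1) + log(O_2) + ... + log(O_n), natural logarithm."""
    return sum(math.log(column[aa]) for column, aa in zip(matrix, peptide))

def half_bits(lo):
    """Convert a natural-log score to half bits: 2 * LO / log(2)."""
    return 2 * lo / math.log(2)

binders = ["GILGFVFTL", "ALWGFFPVV", "ILKEPVHGV", "LLFGYPVYV"]
matrix = odds_matrix(binders)
for peptide in ["GILGFVFTL", "DDDDDDDDD"]:
    print(peptide, round(half_bits(log_odds(matrix, peptide)), 1))
```

Because of the independence assumption, the score of a peptide is just a sum over positions, which is why matrix methods are fast but cannot capture correlations between positions.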

 

Page 14: Immunological bioinformatics

[Figure labels: A, F, C, G]

Page 15: Immunological bioinformatics

Evaluation of prediction accuracy 

Coverage = TP/actual_positive

Reliability = TP/predicted_positive
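Coverage as defined here is what is elsewhere called sensitivity or recall, and reliability is the positive predictive value or precision. A small sketch of both measures at a given score threshold follows; the scores and labels are invented for illustration only.

```python
def coverage_and_reliability(scores, is_binder, threshold):
    """Return (TP / actual_positive, TP / predicted_positive)."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, b in zip(predicted, is_binder) if p and b)
    actual_positive = sum(is_binder)
    predicted_positive = sum(predicted)
    coverage = tp / actual_positive if actual_positive else float("nan")
    reliability = tp / predicted_positive if predicted_positive else float("nan")
    return coverage, reliability

# Hypothetical prediction scores and measured binder labels.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
is_binder = [True, True, False, True, False, False]

for threshold in (0.75, 0.5, 0.35):
    cov, rel = coverage_and_reliability(scores, is_binder, threshold)
    print(f"threshold {threshold}: coverage {cov:.2f}, reliability {rel:.2f}")
```

Lowering the threshold raises coverage at the cost of reliability, so methods are best compared at a fixed value of one of the two measures.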

Page 16: Immunological bioinformatics

A*1101 performance
154 peptides, 9 binders

[Figure: three bar charts comparing prediction methods]

Prediction method                      SYFPEITHI   Bimas   HMM    NN
Pearson correlation coefficient        0.35        0.45    0.5    0.76
True positive ratio at 50% coverage    0.11        0.18    0.33   0.43
Coverage at 95% reliability            0           0       0      0.44

Page 17: Immunological bioinformatics

From Bill Paul, "Fundamental Immunology", 4th Ed.

The MHC gene region

Page 18: Immunological bioinformatics

Human leukocyte antigen (HLA = MHC in humans) polymorphism - alleles

A total of
    229 HLA-A
    464 HLA-B
    111 HLA-C
class I alleles have been named; a total of
    2 HLA-DRA, 364 HLA-DRB
    22 HLA-DQA1, 48 HLA-DQB1
    20 HLA-DPA1, 96 HLA-DPB1
class II sequences have also been assigned.

As of October 2001 (http://www.anthonynolan.com/HIG/index.html)

Page 19: Immunological bioinformatics

HLA polymorphism - supertypes

• Each HLA molecule within a supertype essentially binds the same peptides
• Nine major HLA class I supertypes have been defined
    • HLA-A1, A2, A3, A24, B7, B27, B44, B58, B62

Sette et al, Immunogenetics (1999) 50:201-212

Page 20: Immunological bioinformatics

HLA polymorphism - frequencies

Phenotype frequencies

Supertypes        Caucasian   Black   Japanese   Chinese   Hispanic   Average
A2, A3, B27       83 %        86 %    88 %       88 %      86 %       86 %
+A1, A24, B44     100 %       98 %    100 %      100 %     99 %       99 %
+B7, B58, B62     100 %       100 %   100 %      100 %     100 %      100 %

Sette et al, Immunogenetics (1999) 50:201-212

Page 21: Immunological bioinformatics

 

Page 22: Immunological bioinformatics

 

Page 23: Immunological bioinformatics

 

Page 24: Immunological bioinformatics

 

Page 25: Immunological bioinformatics

 

Page 26: Immunological bioinformatics

Conclusions

We suggest
– splitting some of the alleles in the A1 supertype into a new A26 supertype
– splitting some of the alleles in the B27 supertype into a new B39 supertype
– that the B8 alleles may define their own supertype
– that the specificities of the class II molecules can be clustered into nine classes, which only partly correspond to the serological classification

 

Lund O, Nielsen M, Kesmir C, Petersen AG, Lundegaard C, Worning P, Sylvester-Hvid C, Lamberth K, Roder G, Justesen S, Buus S, Brunak S. Definition of supertypes for HLA molecules using clustering of specificity matrices. Immunogenetics. 2004 Feb 13 [Epub ahead of print]

Page 27: Immunological bioinformatics

MHC class I binding of SARS peptides

Predictions for all supertypes
– Broad population coverage

Allele-specific neural networks
– Peptides with associated measured binding affinity
– A1 (A0101), A2 (A0204), A3 (A1101+A0301), B7 (B0702)

Weight matrices
– Peptides from public databases (SYFPEITHI, MHCpep)
– A24, B27, B44, B58 and B62

Page 28: Immunological bioinformatics

Supertype weight matrices

[Figure panels: B27, B62, B58, B44]

Page 29: Immunological bioinformatics

Proteasomal cleavage

Page 30: Immunological bioinformatics
Page 31: Immunological bioinformatics

Epitope predictions

Binding to MHC class I
High probability for C-terminal proteasomal cleavage
No sequence variation
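The three criteria can be read as a simple filter over candidate peptides. The sketch below only illustrates that idea: the `Candidate` record, the 0 to 1 score scales, the 0.5 thresholds and the placeholder peptide names are all hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    peptide: str
    binding: float    # predicted MHC class I binding score (hypothetical 0-1 scale)
    cleavage: float   # predicted probability of C-terminal proteasomal cleavage
    variable: bool    # True if sequence variation was observed in the peptide

def select_epitopes(candidates, min_binding=0.5, min_cleavage=0.5):
    """Keep conserved peptides passing both thresholds, best binders first."""
    kept = [c for c in candidates
            if c.binding >= min_binding
            and c.cleavage >= min_cleavage
            and not c.variable]
    return sorted(kept, key=lambda c: c.binding, reverse=True)

# Placeholder candidates with invented scores.
candidates = [
    Candidate("PEPTIDE_A", 0.90, 0.8, False),
    Candidate("PEPTIDE_B", 0.70, 0.2, False),   # fails the cleavage criterion
    Candidate("PEPTIDE_C", 0.60, 0.9, True),    # fails the conservation criterion
    Candidate("PEPTIDE_D", 0.55, 0.7, False),
]

for c in select_epitopes(candidates):
    print(c.peptide, c.binding, c.cleavage)
```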

Page 32: Immunological bioinformatics
Page 33: Immunological bioinformatics

Inside out:
1. Position in RNA
2. Translated regions (blue)
3. Observed variable spots
4. Predicted proteasomal cleavage
5. Predicted A1 epitopes
6. Predicted A*0204 epitopes
7. Predicted A*1101 epitopes
8. Predicted A24 epitopes
9. Predicted B7 epitopes
10. Predicted B27 epitopes
11. Predicted B44 epitopes
12. Predicted B58 epitopes
13. Predicted B62 epitopes

Page 34: Immunological bioinformatics

Strategy for the quantitative ELISA assay

• Step I: Folding of MHC class I molecules in solution
• Step II: Detection of de novo folded MHC class I molecules by ELISA

[Figure labels: heavy chain, β2m, peptide, incubation, peptide-MHC complex, development]

C. Sylvester-Hvid, et al., Tissue Antigens, 2002: 59:251

Christina Sylvester-Hvid, University of Copenhagen, July 2003

Page 35: Immunological bioinformatics

Summary of peptide binding assays

Supertype   #tested   #binding <500 nM
A1          15        13
A2          15        12
A3          15        14
A24         0         -
B7          15        10
B27         13        2
B44         0         -
B58         15        13
B62         14        12
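The fraction of predicted peptides confirmed as binders follows directly from the counts above. A short worked calculation, using only the numbers on this slide (A24 and B44 are left out because no peptides were tested for them):

```python
# (tested, binding < 500 nM) per supertype, copied from the slide.
assays = {
    "A1": (15, 13), "A2": (15, 12), "A3": (15, 14), "B7": (15, 10),
    "B27": (13, 2), "B58": (15, 13), "B62": (14, 12),
}

for supertype, (tested, binding) in assays.items():
    print(f"{supertype}: {binding}/{tested} = {binding / tested:.0%}")

total_tested = sum(tested for tested, _ in assays.values())
total_binding = sum(binding for _, binding in assays.values())
print(f"All: {total_binding}/{total_tested} = {total_binding / total_tested:.0%}")
```

Overall 76 of the 102 tested peptides bound, with B27 as the clear outlier.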

 

Page 36: Immunological bioinformatics

Initial polytope (19 HIV epitopes)

• New epitopes 12
• Poor C-term cleavage 8
• Cleavage within 31
• Linker length 12

Page 37: Immunological bioinformatics

Optimized polytope

• New epitopes 1
• Weak C-term cleavage 3
• Cleavage within 7
• Linker length 37

Page 38: Immunological bioinformatics
Page 39: Immunological bioinformatics
Page 40: Immunological bioinformatics

MHC class II Molecule

Page 41: Immunological bioinformatics

Virtual matrices

HLA-DR molecules sharing the same pocket amino acid pattern are assumed to have identical amino acid binding preferences.

Page 42: Immunological bioinformatics

MHC Class II binding

Virtual matrices
– TEPITOPE: Hammer, J., Current Opinion in Immunology 7, 263-269, 1995
– PROPRED: Singh H, Raghava GP, Bioinformatics 2001 Dec;17(12):1236-7

Web interface: http://www.imtech.res.in/raghava/propred

Prediction results

Page 43: Immunological bioinformatics

MHC class II prediction

Complexity of problem
– Peptides of different length
– Weak motif signal

Alignment crucial
Gibbs Monte Carlo sampler

RFFGGDRGAPKRGYLDPLIRGLLARPAKLQVKPGQPPRLLIYDASNRATGIPAGSLFVYNITTNKYKAFLDKQSALLSSDITASVNCAKPKYVHQNTLKLATGFKGEQGPKGEPDVFKELKVHHANENISRYWAIRTRSGGITYSTNEIDLQLSQEDGQTIE
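To make the alignment idea concrete, here is a minimal textbook-style Gibbs site sampler that picks one 9-residue core per peptide. It is a sketch under assumptions, not the sampler used for the results in this talk: it uses a flat background, a fixed pseudocount and no temperature schedule, and the function names are invented. The input peptides are seven of the fragments shown separated by spaces on the next slide.

```python
import math
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def log_odds_matrix(cores, pseudocount=1.0):
    """Position-specific log odds (against a flat background) from aligned cores."""
    total = len(cores) + pseudocount * len(AMINO_ACIDS)
    matrix = []
    for j in range(len(cores[0])):
        counts = Counter(core[j] for core in cores)
        matrix.append({aa: math.log((counts[aa] + pseudocount) / total * len(AMINO_ACIDS))
                       for aa in AMINO_ACIDS})
    return matrix

def core_score(matrix, core):
    return sum(column[aa] for column, aa in zip(matrix, core))

def gibbs_align(peptides, width=9, sweeps=200, seed=0):
    """Sample one core offset per peptide; return the best offsets seen and their score."""
    rng = random.Random(seed)
    offsets = [rng.randrange(len(p) - width + 1) for p in peptides]
    best, best_total = list(offsets), -math.inf
    for _ in range(sweeps):
        for i, pep in enumerate(peptides):
            # Build the matrix from every peptide except the one being resampled.
            others = [q[o:o + width]
                      for k, (q, o) in enumerate(zip(peptides, offsets)) if k != i]
            matrix = log_odds_matrix(others)
            starts = list(range(len(pep) - width + 1))
            # Sample the new offset with probability proportional to its odds.
            weights = [math.exp(core_score(matrix, pep[s:s + width])) for s in starts]
            offsets[i] = rng.choices(starts, weights=weights)[0]
        cores = [p[o:o + width] for p, o in zip(peptides, offsets)]
        matrix = log_odds_matrix(cores)
        total = sum(core_score(matrix, c) for c in cores)
        if total > best_total:
            best, best_total = list(offsets), total
    return best, best_total

peptides = ["RFFGGDRGAPKRG", "GSLFVYNITTNKYKAFLDKQ", "SALLSSDITASVNCAK",
            "PKYVHQNTLKLAT", "GFKGEQGPKGEP", "DVFKELKVHHANENI", "SRYWAIRTRSGGI"]
offsets, total = gibbs_align(peptides, sweeps=50)
for pep, o in zip(peptides, offsets):
    print(" " * (11 - o) + pep, "core:", pep[o:o + 9])
```

With so few peptides the result mainly shows the mechanics; a weak motif needs many more sequences before the sampler converges on a meaningful core.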

Page 44: Immunological bioinformatics

Class II binding motif

RFFGGDRGAPKRG YLDPLIRGLLARPAKLQVKPGQPPRLLIYDASNRATGIPA GSLFVYNITTNKYKAFLDKQ SALLSSDITASVNCAK PKYVHQNTLKLAT GFKGEQGPKGEP DVFKELKVHHANENI SRYWAIRTRSGGI TYSTNEIDLQLSQEDGQTI

Random ClustalW

Gibbs sampler

Alignment by Gibbs sampler

Page 45: Immunological bioinformatics

MHC class II predictions
Allele DRB1_0401

[Figure: bar chart of accuracy (axis 0 to 0.9) for Tepitope and Gibbs on the data sets MHCbench1 to MHCbench8, Southwood and Geluk]

Page 46: Immunological bioinformatics

 

Page 47: Immunological bioinformatics

 

Page 48: Immunological bioinformatics

Polytope construction

[Figure labels: NH2, COOH, M, Epitope, Linker, C-terminal cleavage, Cleavage within epitopes, New epitopes, cleavage]

Page 49: Immunological bioinformatics
Page 50: Immunological bioinformatics
Page 51: Immunological bioinformatics

Prediction of Antibody epitopes

Linear
– Hydrophilicity scales (averaged over a window of ~7 residues)
    Hopp and Woods (1981)
    Kyte and Doolittle (1982)
    Parker et al. (1986)
– Other scales & combinations
    Pellequer and van Regenmortel
    Alix

Discontinuous
– Protrusion (Novotny, Thornton, 1986)

Neural networks (in preparation)
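The scale-based methods all reduce to a sliding-window average over a per-residue table. The sketch below uses the Kyte and Doolittle hydropathy values, where positive numbers are hydrophobic, so candidate hydrophilic stretches show up as minima of the profile. The window of 7 follows the slide; the function name and the test sequence are invented for illustration.

```python
# Kyte and Doolittle (1982) hydropathy index; positive = hydrophobic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def window_average(sequence, scale, window=7):
    """Average the scale over a centred window; one value per full window."""
    half = window // 2
    return [sum(scale[aa] for aa in sequence[i - half:i + half + 1]) / window
            for i in range(half, len(sequence) - half)]

# Invented sequence: a hydrophobic stretch followed by a charged one.
sequence = "IIIIIIIKKKKKKK"
profile = window_average(sequence, KYTE_DOOLITTLE)

# The most hydrophilic window is the minimum of the hydropathy profile.
centre = min(range(len(profile)), key=profile.__getitem__) + 3
print([round(v, 2) for v in profile])
print("most hydrophilic window centred at residue", centre + 1)
```

Swapping in another scale only requires a different dictionary; for a hydrophilicity scale such as Hopp and Woods the maxima, not the minima, mark candidate epitopes.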

Page 52: Immunological bioinformatics

Secondary structure in epitopes

Sec struct:       H      T      B      E      S      G      I      .
Log odds ratio:  -0.19   0.30   0.21  -0.27   0.24  -0.04   0.00   0.17

H: Alpha-helix (hydrogen bond from residue i to residue i+4)

G: 3_10 helix (hydrogen bond from residue i to residue i+3)

I: Pi helix (hydrogen bond from residue i to residue i+5)

E: Extended strand

B: Beta bridge (one residue short strand)

S: Bend (five-residue bend centered at residue i)

T: H-bonded turn (3-turn, 4-turn or 5-turn)

. : Coil

Page 53: Immunological bioinformatics

Amino acids in epitopes

Amino acid   G      A      V      L      I      M      P      F      W      S
e/E          0.09   0.07   0.05   0.08   0.04   0.02   0.06   0.03   0.01   0.08
.            0.07   0.08   0.07   0.10   0.06   0.03   0.05   0.05   0.02   0.07

Amino acid   C      T      Q      N      H      Y      E      D      K      R
e/E          0.03   0.08   0.04   0.04   0.02   0.04   0.06   0.07   0.07   0.04
.            0.03   0.06   0.04   0.05   0.02   0.03   0.04   0.04   0.05   0.04
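The two rows can be turned into per-residue log odds in the same way as the secondary structure table. This sketch assumes that the "e/E" row holds frequencies inside epitopes and the "." row frequencies outside them, and it uses base-2 logarithms; neither is stated on the slide. The frequencies are rounded to two decimals, so the resulting values are only rough.

```python
import math

# Frequencies copied from the slide: (e/E row, "." row) per amino acid.
frequencies = {
    "G": (0.09, 0.07), "A": (0.07, 0.08), "V": (0.05, 0.07), "L": (0.08, 0.10),
    "I": (0.04, 0.06), "M": (0.02, 0.03), "P": (0.06, 0.05), "F": (0.03, 0.05),
    "W": (0.01, 0.02), "S": (0.08, 0.07), "C": (0.03, 0.03), "T": (0.08, 0.06),
    "Q": (0.04, 0.04), "N": (0.04, 0.05), "H": (0.02, 0.02), "Y": (0.04, 0.03),
    "E": (0.06, 0.04), "D": (0.07, 0.04), "K": (0.07, 0.05), "R": (0.04, 0.04),
}

# Assumed reading: log2(frequency in epitopes / frequency outside epitopes).
log_odds = {aa: math.log2(inside / outside)
            for aa, (inside, outside) in frequencies.items()}

for aa, value in sorted(log_odds.items(), key=lambda item: item[1], reverse=True):
    print(f"{aa} {value:+.2f}")
```

Under these assumptions the charged residues D, E and K come out enriched, while the hydrophobic residues W, F and I are depleted.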


Page 54: Immunological bioinformatics

Dihedral angles in epitopes

Z-scores for number of dihedral angle combinations in epitopes vs. non epitopes

Phi\Psi 1 2 3 4 5 6 7 8 9 10 11 12

1 -0.47 0.44 -0.58 0.45 0.46 0.00 0.00 -0.73 -0.79 0.00 -0.83 1.42

2 -0.01 -0.12 -1.82 0.52 1.75 0.00 0.00 0.00 1.42 -0.82 0.00 0.00

3 1.82 -2.26 -1.57 0.48 0.10 0.00 -0.77 0.45 1.77 0.00 -0.82 0.99

4 1.76 1.15 -0.34 0.75 0.00 0.00 0.97 0.16 0.38 1.03 0.00 0.00

5 -0.85 0.45 -1.09 0.57 0.00 0.00 0.00 0.13 1.52 0.00 1.02 -0.79

6 0.60 1.28 1.30 1.73 0.00 0.00 0.00 0.00 1.32 -0.89 -0.76 0.00

7 0.27 -0.91 1.67 -0.51 0.00 0.00 0.00 0.00 -1.02 -1.09 0.00 0.00

8 0.93 1.21 -0.23 -3.63 0.49 0.00 0.00 0.00 0.00 -0.19 0.31 -0.82

9 0.00 0.28 -0.67 0.33 0.01 -0.83 0.00 0.00 0.87 0.23 0.00 0.00

10 0.00 0.95 1.71 -0.70 0.00 0.00 0.00 1.29 1.08 0.00 1.00 0.00

11 0.00 0.00 1.02 0.00 0.00 0.00 0.00 0.86 -0.75 0.00 0.00 0.00

12 0.42 0.83 0.28 1.68 0.00 0.00 0.00 0.00 1.03 -0.21 -0.79 0.93

Page 55: Immunological bioinformatics

Immunological bioinformatics

Classical experimental research
– Few data points
– Data recorded by pencil and paper/spreadsheet

New experimental methods
– Sequencing
– DNA arrays
– Proteomics

Need to develop new methods for handling these large data sets

Immunological Bioinformatics/Immunoinformatics

 

Page 56: Immunological bioinformatics

Acknowledgements

CBS, Technical University of Denmark

Søren Brunak (Director of CBS)

Morten Nielsen (Epitope prediction)

Peder Worning (Genome atlases)

Claus Lundegaard (Data bases)

Mette Børgesen (CTL prediction)

Jesper Schantz (Polytope optimization)

IMMI, University of Copenhagen

Søren Buus (Professor)

Christina Sylvester-Hvid (Experimental coordinator)

Kasper Lamberth (Peptide bank, Quality control)

Erland Johansson, Jeanette Nielsen (Preparations of peptides)

Hanne Møller (ELISA binding assay)