Immunological Bioinformatics: Prediction of epitopes in pathogens Ole Lund.


Data-driven predictions

List of peptides that have a given biological feature

Mathematical model (neural network, hidden Markov model)

Search databases for other biological sequences with the same feature/property

YMNGTMSQV GILGFVFTL ALWGFFPVV ILKEPVHGV ILGFVFTLT LLFGYPVYV GLSPTVWLS WLSLLVPFV FLPSDFFPS CVGGLLTMV FIAGNSAYE

>polymerase

MERIKELRDLMSQSRTREILTKTTVDHMAIIKKYTSGRQEKNPALRMKWMMAMKYPITAD

KRIMEMIPERNEQGQTLWSKTNDAGSDRVMVSPLAVTWWNRNGPTTSTVHYPKVYKTYFE

KVERLKHGTFGPVHFRNQVKIRRRVDINPGHADLSAKEAQDVIMEVVFPNEVGARILTSE

SQLTITKEKKEELQDCKIAPLMVAYMLERELVRKTRFLPVAGGTSSVYIEVLHLTQGTCW

EQMYTPGGEVRNDDVDQSLIIAARNIVRRATVSADPLASLLEMCHSTQIGGIRMVDILRQ

NPTEEQAVDICKAAMGLRISSSFSFGGFTFKRTNGSSVKKEEEVLTGNLQTLKIKVHEGY

EEFTMVGRRATAILRKATRRLIQLIVSGRDEQSIAEAIIVAMVFSQEDCMIKAVRGDLNF

...
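The slide names neural networks and hidden Markov models as the mathematical model. As a deliberately simpler stand-in (a swapped-in technique, not the method behind the CBS servers), the Python sketch below builds a position-specific log-odds scoring matrix from the eleven example 9mers above and uses it to rank every 9mer window of the first polymerase line. The uniform 1/20 background and the pseudocount of 1 are illustrative assumptions.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(peptides, pseudocount=1.0):
    """Position-specific log2-odds matrix against a uniform background.

    Assumption: uniform 1/20 background and a flat pseudocount; these are
    illustrative choices, not parameters of any published predictor.
    """
    length = len(peptides[0])
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for pep in peptides:
            counts[pep[pos]] += 1
        total = sum(counts.values())
        pssm.append({aa: math.log2((counts[aa] / total) / background)
                     for aa in AMINO_ACIDS})
    return pssm

def score(pssm, peptide):
    """Sum of per-position log-odds values."""
    return sum(column[aa] for column, aa in zip(pssm, peptide))

def scan(pssm, sequence):
    """Score every overlapping window of the matrix length, best first."""
    k = len(pssm)
    windows = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    return sorted(((score(pssm, w), w) for w in windows), reverse=True)

training = ("YMNGTMSQV GILGFVFTL ALWGFFPVV ILKEPVHGV ILGFVFTLT LLFGYPVYV "
            "GLSPTVWLS WLSLLVPFV FLPSDFFPS CVGGLLTMV FIAGNSAYE").split()
pssm = build_pssm(training)
query = "MERIKELRDLMSQSRTREILTKTTVDHMAIIKKYTSGRQEKNPALRMKWMMAMKYPITAD"
for s, pep in scan(pssm, query)[:3]:
    print(f"{pep}  {s:.2f}")
```

With only eleven training peptides the matrix is very noisy; it illustrates the train-then-search loop on the slide, not realistic prediction accuracy.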

Prediction algorithms

MHC binding data

Prediction algorithms

Genome scans

Influenza A virus (A/Goose/Guangdong/1/96(H5N1))

>polymerase

MERIKELRDLMSQSRTREILTKTTVDHMAIIKKYTSGRQEKNPALRMKWMMAMKYPITAD

KRIMEMIPERNEQGQTLWSKTNDAGSDRVMVSPLAVTWWNRNGPTTSTVHYPKVYKTYFE

KVERLKHGTFGPVHFRNQVKIRRRVDINPGHADLSAKEAQDVIMEVVFPNEVGARILTSE

SQLTITKEKKEELQDCKIAPLMVAYMLERELVRKTRFLPVAGGTSSVYIEVLHLTQGTCW

EQMYTPGGEVRNDDVDQSLIIAARNIVRRATVSADPLASLLEMCHSTQIGGIRMVDILRQ

NPTEEQAVDICKAAMGLRISSSFSFGGFTFKRTNGSSVKKEEEVLTGNLQTLKIKVHEGY

EEFTMVGRRATAILRKATRRLIQLIVSGRDEQSIAEAIIVAMVFSQEDCMIKAVRGDLNF

...

and 9 other proteins

MERIKELRD

ERIKELRDL

RIKELRDLM

IKELRDLMS

KELRDLMSQ

ELRDLMSQS

LRDLMSQSR

RDLMSQSRT

DLMSQSRTR

LMSQSRTRE

and 4376 other 9mers

Proteins

9mer peptides

>Segment 1

agcaaaagcaggtcaattatattcaatatggaaagaataaaagaactaagagatctaatg

tcgcagtcccgcactcgcgagatactaacaaaaaccactgtggatcatatggccataatc

aagaaatacacatcaggaagacaagagaagaaccctgctctcagaatgaaatggatgatg

gcaatgaaatatccaatcacagcagacaagagaataatggagatgattcctgaaaggaat

and 13350 other nucleotides on 8 segments

Genome
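The genome scan reduces each protein to its overlapping 9mers: a protein of length L yields L - 8 of them. A minimal Python sketch of that sliding window (the function name `kmers` is illustrative), applied to the first line of the H5N1 polymerase sequence:

```python
def kmers(sequence, k=9):
    """Return all overlapping peptides of length k, in sequence order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

polymerase_start = "MERIKELRDLMSQSRTREILTKTTVDHMAIIKKYTSGRQEKNPALRMKWMMAMKYPITAD"
peptides = kmers(polymerase_start)
print(peptides[:3])   # ['MERIKELRD', 'ERIKELRDL', 'RIKELRDLM']
print(len(peptides))  # 52 = 60 - 8
```

The first ten windows reproduce the 9mers listed on the slide (MERIKELRD through LMSQSRTRE).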

Arms race between humans and microbes

Recognize

Escape

Peptides from microbes

HLA molecules in humans

Figure by Anne Mølgaard, peptide (KVDDTFYYV) used as vaccine by Snyder et al. J Virol 78, 7052-60 (2004).

Human MHC: ~1000 variants distributed over 12 types

Peptide: up to 20^9 variants

HLA A and B diversity

Nielsen M, Lundegaard C, Blicher T, Lamberth K, Harndahl M, Justesen S, Roder G, Peters B, Sette A, Lund O, Buus S., NetMHCpan, a method for quantitative predictions of peptide binding to any HLA-A and -B locus protein of known sequence. PLoS ONE. 2007 2:e796.

Binding affinity vs antigenicity

A quantitative analysis of the variables affecting the repertoire of T cell specificities recognized after vaccinia virus infection. Assarsson E, Sidney J, Oseroff C, Pasquetto V, Bui HH, Frahm N, Brander C, Peters B, Grey H, Sette A. J Immunol. 2007 Jun 15;178(12):7890-901.

Prediction of MHC I epitopes

Major histocompatibility complex class I binding predictions as a tool in epitope discovery. Lundegaard C, Lund O, Buus S, Nielsen M. Immunology. 2010 Jul;130(3):309-18. Epub 2010 May 26. Review.

Recent benchmark studies

• Class I
– Peters B, Bui HH, Frankild S et al. A community resource benchmarking predictions of peptide binding to MHC-I molecules. PLoS Comput Biol 2006; 2:e65.
– Lin HH, Ray S, Tongchusak S, Reinherz EL, Brusic V. Evaluation of MHC class I peptide binding prediction servers: applications for vaccine research. BMC Immunol 2008; 9:8.

• Class II
– Wang P, Sidney J, Dow C, Mothe B, Sette A, Peters B. A systematic assessment of MHC class II peptide binding predictions and evaluation of a consensus approach. PLoS Comput Biol 2008; 4:e1000048.
– Lin HH, Zhang GL, Tongchusak S, Reinherz EL, Brusic V. Evaluation of MHC-II peptide binding prediction servers: applications for vaccine research. BMC Bioinformatics 2008; 9(Suppl. 12):S22.
– Zhang L, Udaka K, Mamitsuka H, Zhu S. Toward more accurate pan-specific MHC-peptide binding prediction: a review of current methods and tools. Brief Bioinform 2011 (Sep); DOI: 10.1093/bib/bbr060.

Validation of binding predictions

Response diversity

Hoof et al., JI, 2010

TB epitope discovery strategy

Tang ST et al., submitted.

Mtb H37Rv genome sequence

Selection of peptides predicted to bind to HLA supertypes (NetCTL, ProtFun, SubCell):

A2 (A*0201), A3 (A*0301), B7 (B*0702) (coverage approx. 80% of the world population)

Synthesis of selected peptides

Measuring peptide/MHC binding affinity in vitro

Screening for peptide recognition in an in vitro CD8+ T cell assay in healthy PPD+ donors

Direct ex vivo determination of frequencies of peptide/tetramer+ CD8+ T cells in TB patients

(Multi) functionality of peptide responsive CD8+ T cells in TB patients

Genome-Based In Silico Identification of New Mycobacterium tuberculosis Antigens Activating Polyfunctional CD8+ T Cells in Human Tuberculosis. Tang ST, van Meijgaarden KE, Caccamo N, Guggino G, Klein MR, van Weeren P, Kazi F, Stryhn A, Zaigler A, Sahin U, Buus S, Dieli F, Lund O, Ottenhoff TH. J Immunol. 2011 Jan 15;186(2):1068-80. Epub 2010 Dec 17.

TB


Tetramer and cytokine staining of 10 cured TB patients and 10 healthy controls


Figure: tetramer (TM) and cytokine staining in TB patients (TB, donors 1-10) and healthy controls (HC, donors 1-10) for peptides pMtb1 to pMtb18 and the control peptides pCMV and pEbola; shading indicates the percentage of positive CD8+ T cells (0.05-0.10%, 0.10-1.00%, > 1.00%).

The challenge of rational epitope selection

• We have more than 2500 MHC molecules
• We often have more than 500 different pathogenic strains
• How can we design a method to select a small pool of peptides that covers both the MHC polymorphism and the pathogen diversity?
– No peptide will bind to all MHC molecules, and few (maybe even no) peptides will be present in all pathogenic strains

Vaccine discovery - HIV case story

• 10 HIV proteins
– > 2,000,000 different peptides exist within the known HIV clades

• Patient diversity
– More than 2500 different MHC molecules

• The challenge
– Select 100 (0.005%) peptides with optimal genomic and HLA coverage

HIV Gag phylogenetic tree (clades A, B, C, D and AE)

Few peptides conserved between all viral strains

Epitope identification

56 (1.5%) of the 9mers are conserved among all 15 clade A Gag sequences
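Here a 9mer counts as conserved when it occurs in every sequence of the set. A Python sketch, assuming the sequences are already loaded as strings (the three toy "strains" below are invented for illustration, not real Gag sequences): intersect the per-sequence 9mer sets and report the conserved fraction of all distinct 9mers.

```python
def kmer_set(sequence, k=9):
    """Distinct overlapping k-mers of one sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}

def conserved_kmers(sequences, k=9):
    """k-mers present in every sequence, and their fraction of all distinct k-mers."""
    sets = [kmer_set(s, k) for s in sequences]
    conserved = set.intersection(*sets)
    all_kmers = set.union(*sets)
    return conserved, len(conserved) / len(all_kmers)

# Hypothetical toy strains differing by single substitutions (positions 6 and 14).
strains = [
    "MGARASVLSGGKLDAWEKIRLRPGG",
    "MGARASVLSGGKLDKWEKIRLRPGG",
    "MGARASILSGGKLDAWEKIRLRPGG",
]
conserved, fraction = conserved_kmers(strains)
print(sorted(conserved), f"{fraction:.1%}")
```

Two point mutations are enough to leave only two conserved 9mers out of 33 distinct ones, which is the same effect the slide reports for real Gag sequences.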

Polyvalent vaccines

• Select epitopes so that together they cover all strains.

Figure: epitopes mapped onto strain 1 and strain 2. Uneven coverage: average coverage = 2. Even coverage: average coverage = 2.
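The point of the figure is that average coverage alone cannot tell a good selection from a bad one. A small Python sketch with hypothetical presence matrices (rows are selected epitopes, columns are strains) reproduces both cases: each has an average coverage of 2, but only the even selection covers every strain.

```python
def coverage_per_strain(presence):
    """presence[j][i] is 1 if epitope j occurs in strain i; returns hits per strain."""
    n_strains = len(presence[0])
    return [sum(row[i] for row in presence) for i in range(n_strains)]

uneven = [[1, 0], [1, 0], [1, 0], [1, 0]]  # four epitopes, all from strain 1
even = [[1, 0], [1, 0], [0, 1], [0, 1]]    # two epitopes from each strain

for name, selection in (("uneven", uneven), ("even", even)):
    cov = coverage_per_strain(selection)
    print(name, cov, "average", sum(cov) / len(cov), "minimum", min(cov))
```

Tracking the minimum (or the full per-strain distribution) rather than only the average is what motivates the EpiSelect-style selection below.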

EpiSelect. Pathogen diversity

Selected West Nile virus epitopes shown relative to NC001563/M12296

Mette Voldby Larsen

Use of EpiSelect: CTL Epitopes with Maximum HIV-1 Coverage

Problem: The high mutation rate of HIV-1 makes it difficult to identify CTL epitopes that are conserved among all subtypes.

Possible solution: Choose a number of predicted and experimentally identified epitopes that together provide broad coverage of the HIV-1 strains examined.

Data: 300+ fully sequenced HIV-1 strains of subtypes A (A1 and A2), B, C and D, and CRF01_AE

Methods: Prediction of CTL epitopes restricted by A1, A2, A3, A24, B7, B44 or B58; selection of the epitopes that give the broadest coverage

The algorithm chooses epitopes found in as many strains as possible, while giving priority to epitopes from strains with few already-selected epitopes.

Results: The final set consists of 180 epitopes. On average, each strain is covered by 54 epitopes (minimum 29). Ongoing work by Annika Karlsson: the ability of the chosen epitopes to elicit CTL responses will be examined using PBMCs from HIV-1-infected patients.
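The selection rule described above can be sketched as a greedy loop. The exact EpiSelect scoring function is not reproduced in this transcript, so the Python sketch assumes a score in which each strain i containing candidate j contributes 1 / (C_i + 1), where C_i counts already-selected epitopes present in strain i. That weighting is an illustrative assumption, not necessarily the published formula; the presence matrix is invented.

```python
def episelect_greedy(presence, n_select):
    """Greedy selection over presence[j][i] = 1 if epitope j is in strain i.

    Assumed score for candidate j: sum_i presence[j][i] / (C_i + 1), so strains
    already hit by many selected epitopes count for less.
    """
    n_strains = len(presence[0])
    hits = [0] * n_strains  # C_i: selected epitopes found in strain i
    chosen = []
    candidates = set(range(len(presence)))
    for _ in range(min(n_select, len(presence))):
        # sorted() makes ties resolve to the lowest epitope index
        best = max(sorted(candidates),
                   key=lambda j: sum(presence[j][i] / (hits[i] + 1)
                                     for i in range(n_strains)))
        chosen.append(best)
        candidates.remove(best)
        for i in range(n_strains):
            hits[i] += presence[best][i]
    return chosen, hits

presence = [
    [1, 1, 1, 0],  # epitope 0: strains 1-3
    [1, 1, 0, 0],  # epitope 1: redundant with epitope 0
    [0, 0, 1, 1],  # epitope 2: the only one reaching strain 4
]
chosen, hits = episelect_greedy(presence, 2)
print(chosen, hits)  # [0, 2] [1, 1, 2, 1]
```

After epitope 0 is picked, epitope 2 outscores the redundant epitope 1 (1.5 vs 1.0) because it reaches the uncovered strain 4.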

Annika Karlsson and Carina Perez

Supertypes: phenotype frequencies

Supertypes        Caucasian  Black  Japanese  Chinese  Hispanic  Average
A2, A3, B7           83%      86%     88%       88%      86%       86%
+A1, A24, B44       100%      98%    100%      100%      99%       99%
+B27, B58, B62      100%     100%    100%      100%     100%      100%
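Phenotype frequencies like these are commonly derived from gene (allele) frequencies assuming Hardy-Weinberg equilibrium: if the selected alleles at one locus have summed gene frequency g, the fraction of people carrying none of them is (1 - g)^2, and independent loci multiply. A Python sketch under those assumptions; the gene frequencies used are invented placeholders, not the values from Sette et al. (1999).

```python
def population_coverage(gene_freqs_by_locus):
    """Fraction of individuals carrying at least one selected allele.

    gene_freqs_by_locus maps a locus name to the summed gene frequency of the
    selected, non-overlapping alleles at that locus. Assumes Hardy-Weinberg
    equilibrium and independence between loci.
    """
    uncovered = 1.0
    for g in gene_freqs_by_locus.values():
        uncovered *= (1.0 - g) ** 2  # neither chromosome carries a selected allele
    return 1.0 - uncovered

# Hypothetical gene frequencies, for illustration only.
print(f"{population_coverage({'HLA-A': 0.5, 'HLA-B': 0.2}):.1%}")  # 84.0%
```

Because coverage saturates quickly, adding a few more supertypes pushes phenotype coverage close to 100%, as the table shows.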

HLA polymorphism - frequencies

Sette et al., Immunogenetics (1999) 50:201-212

Perez et al., JI, 2008

Annika Karlsson

Karolinska Institute

Carina Perez

Response of 31 HIV-infected patients to 184 predicted HIV epitopes

All HIV-responsive patients respond to at least one of nine peptides

Perez et al., JI, 2008

PopCover – 2D searching

• > 2,000,000 different peptides exist within the known HIV clades

• 227,091 peptides with predicted binding affinity stronger than 500 nM to any MHC molecule
– tat 5,608; nef 20,961; gag 31,848; pol 42,748; env 125,926

• No Gag peptides are found in all clades, and 92% of all Gag peptides are shared by only 0-5% of the clades

• The challenge
– Select 64 (less than 0.005%) peptides with optimal genomic and HLA coverage: tat (4), nef (15), gag (15), pol (15), env (15)
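A few lines of Python confirm that the slide's numbers add up: the per-protein binder counts sum to 227,091 and the per-protein quotas sum to 64. Since 2,000,000 is given as a lower bound on the number of peptides, the printed fraction is an upper bound.

```python
binders = {"tat": 5608, "nef": 20961, "gag": 31848, "pol": 42748, "env": 125926}
quota = {"tat": 4, "nef": 15, "gag": 15, "pol": 15, "env": 15}

total_binders = sum(binders.values())
total_selected = sum(quota.values())
print(total_binders)   # 227091
print(total_selected)  # 64
print(f"{total_selected / 2_000_000:.4%} of 2,000,000 peptides")  # 0.0032% of 2,000,000 peptides
print(f"{total_selected / total_binders:.3%} of predicted binders")
```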

EpiSelect and PoPCover

• EpiSelect

The sum is over all genomes i. P_ji is 1 if epitope j is present in genome i. C_i is the number of times genome i has been targeted by the already selected set of epitopes.

• PopCover

The sum is over all genomes i and HLA alleles k. R_jki is 1 if epitope j is present in genome i and is presented by allele k; E_ki is the number of times allele k has been targeted by epitopes in genome i in the already selected set of epitopes; f_k is the frequency of allele k in a given population, and g_i is the frequency of genome i.
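A Python sketch of the PopCover idea using the quantities defined above (R_jki, E_ki, f_k, g_i). The transcript does not show the formula itself, so the sketch assumes the score S_j = sum over i and k of R_jki * f_k * g_i / (E_ki + 1), so that allele-genome pairs already targeted count for less. Treat that functional form as an assumption; the toy data are invented.

```python
def popcover_greedy(R, allele_freq, genome_freq, n_select):
    """Greedy PopCover-style selection.

    R[j][i][k] = 1 if epitope j is present in genome i and presented by allele k.
    Assumed score: sum_{i,k} R[j][i][k] * f_k * g_i / (E[i][k] + 1).
    """
    n_genomes, n_alleles = len(genome_freq), len(allele_freq)
    E = [[0] * n_alleles for _ in range(n_genomes)]  # E_ki, stored as E[i][k]
    remaining = list(range(len(R)))
    chosen = []

    def score(j):
        return sum(R[j][i][k] * allele_freq[k] * genome_freq[i] / (E[i][k] + 1)
                   for i in range(n_genomes) for k in range(n_alleles))

    while remaining and len(chosen) < n_select:
        best = max(remaining, key=score)
        chosen.append(best)
        remaining.remove(best)
        for i in range(n_genomes):
            for k in range(n_alleles):
                E[i][k] += R[best][i][k]
    return chosen

R = [
    [[1, 0], [0, 0]],  # epitope 0: genome 0, allele 0
    [[1, 0], [1, 0]],  # epitope 1: both genomes, allele 0
    [[0, 1], [0, 1]],  # epitope 2: both genomes, allele 1
]
print(popcover_greedy(R, allele_freq=[0.6, 0.4], genome_freq=[0.5, 0.5], n_select=2))  # [1, 2]
```

The second pick goes to epitope 2 (score 0.4) rather than epitope 0 (0.15 after discounting), because it covers the not-yet-targeted allele even though that allele is rarer.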

18 april 2023 Marcus Buggert 33

An average of 4.79 recognized peptides per patient

Figure panels: Tat, Nef, Gag, Pol, Env

Marcus Buggert et al., In preparation

Experimental validation of HIV class II epitopes

Experimental validation

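Vaccine design: polytope construction

Figure: a polytope running from the NH2 to the COOH terminus, starting with M and built from epitopes joined by linkers; labels mark C-terminal cleavage, cleavage within epitopes, and new epitopes.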

Polytope starting configuration

Immunological Bioinformatics, The MIT Press.

Polytope optimal configuration

Immunological Bioinformatics, The MIT Press.
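The two polytope figures contrast a starting and an optimized ordering of the same epitopes. A brute-force Python sketch of that optimization, assuming a hypothetical junction score for each ordered pair of epitopes (higher meaning better cleavage at the C terminus of the first epitope and fewer new junctional epitopes). The scores are invented, linkers are not modelled, and in practice such values might come from a proteasome predictor such as NetChop, whose interface is not modelled here.

```python
from itertools import permutations

def best_order(epitopes, junction_score):
    """Try every order and return the one with the highest summed junction score.

    junction_score[(a, b)] is a hypothetical score for placing epitope b
    directly after epitope a.
    """
    best, best_total = None, float("-inf")
    for order in permutations(epitopes):
        total = sum(junction_score[(a, b)] for a, b in zip(order, order[1:]))
        if total > best_total:
            best, best_total = order, total
    return best, best_total

# Invented scores for three example peptides.
scores = {
    ("GILGFVFTL", "ILKEPVHGV"): 0.9, ("ILKEPVHGV", "GILGFVFTL"): 0.1,
    ("GILGFVFTL", "FLPSDFFPS"): 0.2, ("FLPSDFFPS", "GILGFVFTL"): 0.5,
    ("ILKEPVHGV", "FLPSDFFPS"): 0.8, ("FLPSDFFPS", "ILKEPVHGV"): 0.3,
}
order, total = best_order(["GILGFVFTL", "ILKEPVHGV", "FLPSDFFPS"], scores)
print(order, round(total, 2))  # ('GILGFVFTL', 'ILKEPVHGV', 'FLPSDFFPS') 1.7
```

Exhaustive search grows factorially with the number of epitopes, so it is only practical for a handful of them; larger polytopes need a heuristic search.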

Prediction servers at CBS

Web servers

CTL epitopes: www.cbs.dtu.dk/services/NetCTL

MHC binding: www.cbs.dtu.dk/services/NetMHC, www.cbs.dtu.dk/services/NetMHCII, www.cbs.dtu.dk/services/NetMHCpan, www.cbs.dtu.dk/services/NetMHCcons, www.cbs.dtu.dk/services/NetMHCIIpan, www.cbs.dtu.dk/services/HLArestrictor

MHC Motif Viewer: www.cbs.dtu.dk/biotools/MHCMotifViewer/Home.html

Proteasome processing: www.cbs.dtu.dk/services/NetChop-3.0

B-cell epitopes: www.cbs.dtu.dk/services/BepiPred/, www.cbs.dtu.dk/services/DiscoTope

Plotting of epitopes relative to a reference sequence: www.cbs.dtu.dk/services/EpiPlot-1.0

Analysis of human immunoglobulin VDJ recombination: www.cbs.dtu.dk/services/VDJsolver

Genotype-phenotype association-based mapping of binding sites: www.cbs.dtu.dk/services/SigniSite

PhD/master course in Immunological Bioinformatics, June 2012: www.cbs.dtu.dk/courses/27685.imm

Peters B et al. Immunogenetics 2005; 57:326-36. PLoS Biol 2005; 3:e91.

Immune Epitope Database (IEDB)

Cross-reactivity

• Cross-reactivity is predictable (Pearson's r = 0.35-0.6)
– Rule of thumb: each mutation halves the response

Frankild et al., PLoS ONE 3(3): e1831; Hoof et al., JI, 2010
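The rule of thumb amounts to an expected relative response of 0.5^m for a variant that differs from the original epitope at m positions. A Python sketch that counts mismatches between equal-length peptides and applies that heuristic; it encodes only the slide's rule of thumb, not a fitted cross-reactivity model.

```python
def mismatches(a, b):
    """Number of positions at which two equal-length peptides differ."""
    if len(a) != len(b):
        raise ValueError("peptides must have equal length")
    return sum(x != y for x, y in zip(a, b))

def expected_relative_response(original, variant):
    """Rule of thumb: each substitution halves the response."""
    return 0.5 ** mismatches(original, variant)

print(expected_relative_response("GILGFVFTL", "GILGFVFTL"))  # 1.0
print(expected_relative_response("GILGFVFTL", "GILGYVFTV"))  # 0.25
```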

Pilot study of immunogenicity based on DrugBank

• www.drugbank.ca
• Records corresponding to 123 FDA-approved biotech (protein/peptide) drugs were downloaded
• Sequences were compared to the human proteome (sequences from "Homo sapiens" in NR, the non-redundant database from NCBI) using BLAST

• Sequences found in DrugBank and NR need to be manually validated/curated

Types of proteins
• Human proteins (identical to a human protein sequence)
• Modified/allelic human proteins
• Non-human proteins
• Antibodies

– Non-human
– Human-murine chimeric
– Humanized
– Human

• Who – allelic differences of VDJ genes
• How much – break tolerance
• Tolerance to own B cell receptors?

Proposed application in assessment of protein drugs

1 Compare the amino acid sequence of the drug with the human proteome

2 Predict epitopes in regions that differ from the human proteome

3 Select representative HLA alleles

4 Verify binding experimentally

5 Assess predicted immunogenicity using blood from treated patients, transgenic animals or naïve donors

6 Compare with clinical findings of immunogenicity, adverse effects or lack of effect
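Steps 1 and 2 can be approximated by asking which 9mers of the drug occur nowhere in the human proteome; only those windows can contain non-self epitopes. A Python sketch that uses exact 9mer matching as a simplification of the BLAST comparison on the slide; the two sequences below are invented toy stand-ins for a real drug and proteome.

```python
def kmer_set(sequence, k=9):
    """Distinct overlapping k-mers of one sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}

def foreign_kmers(drug, proteome, k=9):
    """Drug k-mers absent from every proteome sequence, in drug order, without duplicates."""
    self_kmers = set().union(*(kmer_set(p, k) for p in proteome))
    seen, result = set(), []
    for i in range(len(drug) - k + 1):
        pep = drug[i:i + k]
        if pep not in self_kmers and pep not in seen:
            seen.add(pep)
            result.append(pep)
    return result

# Hypothetical data: the "drug" differs from a "human" protein by one substitution (Q -> W).
human_proteome = ["MKTAYIAKQRQISFVKSHFSRQ"]
drug = "MKTAYIAKQRWISFVKSHF"
novel = foreign_kmers(drug, human_proteome)
print(len(novel), novel[0], novel[-1])  # 9 TAYIAKQRW WISFVKSHF
```

A single substitution creates up to nine new 9mers (every window spanning it); these are the candidates to pass to MHC binding prediction in step 2.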

Data acquired: data on 33 approved therapeutic proteins

Julie Serritslev, Jens Vindahl Kringelum, et al., in preparation

• Immunogenicity: percent of recipients in a clinical study that had detectable antibodies against the therapeutic protein
• The primary source of immune response data was the reviewed data presented in Meyler's Side Effects of Drugs and FDA labels.

Alleles representative of HLA-A, HLA-B, HLA-DRB and HLA-DQB1, and of four HLA-DPB1 supertypes

Julie Serritslev, Jens Vindahl Kringelum, et al., in preparation; Nielsen et al., 2008

Prediction of epitopes

• MHC class I and II binders can be predicted for all known alleles (AROC ~ 0.8-0.9)
• Binding correlates with likelihood of response
• No epitope gives a response in all individuals
• Cross-reactivity correlates with epitope similarity
• B cell epitopes are hard to predict (AROC ~ 0.6-0.7)
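AROC is the area under the ROC curve: the probability that a randomly chosen positive (for example a true binder or epitope) receives a higher prediction score than a randomly chosen negative, with ties counting one half. A Python sketch computing it directly from that definition; the pairwise loop is O(n·m), which is fine for small benchmark sets, and the scores are invented.

```python
def auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney formulation)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

print(auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))  # 8 of 9 pairs ranked correctly
```

An AROC of 0.5 corresponds to random ranking, so 0.8-0.9 for MHC binding versus 0.6-0.7 for B cell epitopes is a substantial difference in predictive power.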