IMGT/HighV-QUEST for NGS antibody repertoire I analysis MGT › IMGTposters ›...

1
Biological Context AGENCE NATIONALE DE LA RECHERCHE ©2013 Alamyar Eltaf, Géraldine Folch, Joumana Jabado-Michaloud and Marie-Paule Lefranc IMGT ® founder and director: Marie-Paule Lefranc ([email protected]) Bioinformatics manager: Véronique Giudicelli ([email protected]) Computer manager: Patrice Duroux ([email protected]) © Copyright 1995-2013 IMGT ® , the international ImMunoGeneTics information system ® [1] Giudicelli V and Lefranc M-P, Front Genet, 3:79, 2012. [2] Alamyar E et al. Mol Biol 882:569-604, 2012. [3] Alamyar E et al. Immunome Res 8(1):26, 2012. [4] Li S et al. Nat. Commun. 4:2333 doi: 10.1038/ncomms3333 (2013). Acknowledgments: this work was granted access to the HPC resources of CINES under the allocation 036029-(2010-2013) made by GENCI (Grand Equipement National de Calcul Intensif). Eltaf Alamyar, Véronique Giudicelli, Géraldine Folch, Joumana Jabado-Michaloud, Patrice Duroux, and Marie-Paule Lefranc IMGT®, the international ImMunoGeneTics information system®, Laboratoire d'ImmunoGénétique Moléculaire LIGM, Institut de Génétique Humaine IGH, UPR CNRS 1142, 141 rue de la Cardonille, Montpellier, 34396, France I m M uno G ene T ics Information system® http://www.imgt.org IMGT/HighV-QUEST for NGS antibody repertoire analysis The adaptive immune response is characterized by an extreme diversity of the specific antigen receptors that comprise the immunoglobulins (IG) or antibodies and the T cell receptors (TR) (10¹² different IG and 10¹² different TR per individual, in humans). The complex molecular mechanisms (DNA rearrangements, N-diversity, and for IG, somatic hypermutations) that occur in B cells and T cells are at the origin of that huge diversity. 10¹² IG (antibodies) per individual IMGT-ONTOLOGY concepts L-PART1 FR1-IMGT FR2-IMGT FR3-IMGT L-PART2 CDR1-IMGT CDR2-IMGT CDR3-IMGT V-EXON V-GENE L-V-GENE-UNIT V-GENE-UNIT L-INTRON-L V-RS 1st-CYS CONSERVED-TRP 2nd-CYS V-REGION V-INTRON ACCEPTOR-SPLICE DONOR-SPLICE INIT-CODON V-SPACER 3’V-REGION V-HEPTAMER V-NONAMER 5'UTR 3'UTR V-GENE 3’D-RS 5’D-RS 3’D-SPACER 3’D-HEPTAMER 5’D-HEPTAMER 3’D-NONAMER 5’D-SPACER 5’D-NONAMER D-REGION D-GENE D-GENE-UNIT 3'UTR 5'UTR D-GENE J-RS J-HEPTAMER 5’J-REGION J-SPACER J-NONAMER 3'UTR 5'UTR J-REGION J-GENE J-GENE-UNIT J-TRP or J-PHE DONOR-SPLICE J-GENE Prototypes of IG and TR V, D, J genes DESCRIPTION The concepts of description correspond to IMGT® standar- dized labels. They are more than 560 standardizerd labels (available in the IMGT Scientific chart), 277 for the nucleo- tide sequences and 285 for the 3D structures. CLASSIFICATION The concepts of classification allowed to classify and name the human IG and TR genes and alleles which were approved by HGNC and endorsed by WHO-IUIS. They provide the frame for the standardized IG and TR nomenclature of jawed vertebrates. NUMEROTATION The concepts of numerotation comprise the ‘IMGT unique numbering’ and ‘IMGT Collier de Perles’. Prototypes are graphical representation based on the concepts of description Since the availability of IMGT/HighV-QUEST in October 2010, more than 1.4 billions of sequences (from external users) have been submitted. They required more than 720,000 hours of computational resources. About 32 terabytes of results were generated. 303 of IMGT/HighV-QUEST users are from USA (43%), 266 are from EU (17 countries) (38%), and 133 from other countries (19%). Other countries’ users Users and Analyses IMGT-ONTOLOGY Concepts IMGT ® , the international ImMunoGeneTics information system® , created in 1989 at Montpellier, France, by Marie-Paule Lefranc (CNRS and Université Montpellier 2), is at the birth of immunoinformatics. IMGT ® manages the immunogenetics data, and more particularly the sequences, genes and structures of immunoglobulins (IG) or antibodies and T cell receptors (TR). Standardization and data integration are obtained through the IMGT-ONTOLOGY [1] concepts of identification (IMGT standardized keywords), classification (IMGT standardized nomenclature: IMGT gene and allele names approved by HGNC and used by NCBI Gene), description (IMGT standardized labels) and numerotation (IMGT unique numbering and IMGT Colliers de Perles: widely used for antibody engineering and humanization). IMGT ® comprises seven databases (including IMGT/mAb-DB), seventeen tools and more than 15,000 pages of Web resources. To answer the needs of high throughput and Next Generation Sequencing (NGS) data, the IMGT/HighV-QUEST tool [2-4] was developed which analyses up to 500,000 long 454 sequences by run. The results, based on IMGT-ONTOLOGY, include identification of the closest germline genes and alleles for genotype and haplotype analysis, and standardized characterization of the 'IMGT clonotypes (AA)' for antibody clonal diversity and expression and achieve, for the first time, a degree of resolution for NGS verifiable by the user at the sequence level. Amino acid frequency can be determined at each CDR-IMGT and FR-IMGT positions. This tool provides a paradigm for IG clonal diversity and expression repertoire analysis from NGS and high resolution results for antibody engineering and combinatorial library construction. TAKE HOME MESSAGE: * IMGT/HighV-QUEST analyses antibody NGS data at a sequence level verifiable by the user. * Standardized characterization of IMGT clonotypes, based on the IMGT-ONTOLOGY concepts, identifies clonal diversity and expression, * Amino acid frequency can be determined at each CDR-IMGT and FR-IMGT position. VH clonal diversity 61 59 29 23 13 12 9 9 9 9 7 5 5 5 4 33 1 Germany United Kingdom Netherlands France Switzerland Italy Belgium Spain Norway Austria Sweden Finland Greece Luxembourg Denmark Portugal Czech Republic Ireland 34 27 12 10 9 7 6 6 3 3 3 3 2 111 11 111 Japan China Canada Taiwan Australia Israel Russian Federation Mexico Korea Singapore Uruguay Hong Kong South Africa Brazil Iran Pakistan India New Zealand Venezuela Argentina Colombia EU users 303 266 133 US EU Other Users 19% 38% 43% IMGT Clonotypes (AA) VH clonal expression 0 200 400 600 800 1000 1200 1400 Homsap IGHV1-c Homsap IGHV1-f Homsap IGHV2-70D Homsap IGHV3-30-3 Homsap IGHV3-43D Homsap IGHV3-64D Homsap IGHV3-NL1 Homsap IGHV4-30-2 Homsap IGHV4-30-4 Homsap IGHV4-b Homsap IGHV5-a Homsap IGHV7-4-1 Homsap IGHV7-81 Homsap IGHV5-78 Homsap IGHV3-74 Homsap IGHV3-73 Homsap IGHV3-72 Homsap IGHV3-71 Homsap IGHV2-70 Homsap IGHV1-69 Homsap IGHV1-68 Homsap IGHV3-66 Homsap IGHV3-64 Homsap IGHV3-63 Homsap IGHV3-62 Homsap IGHV4-61 Homsap IGHV4-59 Homsap IGHV1-58 Homsap IGHV4-55 Homsap IGHV3-54 Homsap IGHV3-53 Homsap IGHV3-52 Homsap IGHV5-51 Homsap IGHV3-49 Homsap IGHV3-48 Homsap IGHV3-47 Homsap IGHV1-46 Homsap IGHV1-45 Homsap IGHV3-43 Homsap IGHV7-40 Homsap IGHV4-39 Homsap IGHV3-38 Homsap IGHV3-35 Homsap IGHV7-34-1 Homsap IGHV4-34 Homsap IGHV3-33 Homsap IGHV3-32 Homsap IGHV4-31 Homsap IGHV3-30 Homsap IGHV4-28 Homsap IGHV2-26 Homsap IGHV3-25 Homsap IGHV1-24 Homsap IGHV3-23 Homsap IGHV3-22 Homsap IGHV3-21 Homsap IGHV3-20 Homsap IGHV3-19 Homsap IGHV1-18 Homsap IGHV3-16 Homsap IGHV3-15 Homsap IGHV3-13 Homsap IGHV3-11 Homsap IGHV2-10 Homsap IGHV3-9 Homsap IGHV1-8 Homsap IGHV3-7 Homsap IGHV2-5 Homsap IGHV4-4 Homsap IGHV1-3 Homsap IGHV1-2 Homsap IGHV6-1 0 1000 2000 3000 4000 5000 6000 Homsap IGHJ1P Homsap IGHJ1 Homsap IGHJ2 Homsap IGHJ3 Homsap IGHJ4 Homsap IGHJ5 Homsap IGHJ3P Homsap IGHJ6 The filtered-in sequences comprise the ‘1 copy’ and the ‘More than 1’. The ‘1 copy’ includes the ‘single allele’ and ‘several alleles (or genes)’. A high proportion of ‘single allele’ is a good indicator of sequence length and quality. In IMGT®, the clonotype designated as ‘IMGT clonotype (AA)’ is defined among the ‘1 copy’ ‘single allele’ (for V and J) by a unique V-(D)-J rearrangement (IMGT genes and alleles determined at the nt level), conserved anchors (C104, W or F 118), and a unique CDR3-IMGT AA in frame junction [4]. Each ‘IMGT clonotype (AA)’ is characterized by a selected unique representative sequence. 0 200 400 600 800 1000 1200 Homsap IGHD1-1 Homsap IGHD2-2 Homsap IGHD3-3 Homsap IGHD4-11 Homsap IGHD4-4 Homsap IGHD5-5 Homsap IGHD6-6 Homsap IGHD1-7 Homsap IGHD2-8 Homsap IGHD3-9 Homsap IGHD3-10 Homsap IGHD5-12 Homsap IGHD6-13 Homsap IGHD1-14 Homsap IGHD2-15 Homsap IGHD3-16 Homsap IGHD4-17 Homsap IGHD5-18 Homsap IGHD6-19 Homsap IGHD1-20 Homsap IGHD2-21 Homsap IGHD3-22 Homsap IGHD4-23 Homsap IGHD5-24 Homsap IGHD6-25 Homsap IGHD1-26 Homsap IGHD7-27 Analysis of the CDR3-IMGT can be performed for clonal diversity and clonal expression. Amino acid frequency can be determined, with their IMGT AA physicochemical classes (Pommié C et al. 2004), at each CDR-IMGT and FR-IMGT position. For clonal expression,the number of sequences for each ‘IMGT clonotypes (AA)’ is added, stepwise: (1) the ‘1 copy’ representative sequence. (2) the ‘1 copy’ ‘single allele’ sequences not selected as representative. (3) the ‘1 copy’ ‘several alleles (or genes)’ with the same CDR3-IMGT (AA) and the same V and J alleles of the representative sequence among those proposed by IMGT/HighV-QUEST. (4) the ‘More than 1’ corresponding to the ‘1 copy’ of steps 1-3. ‘single allele’ ‘several alleles (or genes)’ IMGT/HighV-QUEST based on IMGT ® standard Clonal diversity is the number of ‘IMGT clonotypes (AA)’ per V, D and J gene. Clonal expression is the number of sequences assigned to ‘IMGT clonotypes (AA)’ per V, D and J gene. IMGT clonotypes (nt) # Exp. ID V gene and allele D gene and allele J gene and allele CDR3- IMGT length (AA) CDR3-IMGT sequence (AA) Anchors 104,118 V % Sequence length Sequence ID Total nb of '1 copy' Total nb of 'More than 1' Total Sequences le ('1 copy') 1 137-mid5 Homsap IGHV1-2*02 F Homsap IGHD2-2*01 F Homsap IGHJ6*03 F 22 AA ARDLYCSSTSCYGGWYYYYMDV C,F 95.14 425 GJNZTB402H8X9K _length=425 1 0 1 Sequences le 2 157-mid5 Homsap IGHV1-2*02 F Homsap IGHD6-6*01 F Homsap IGHJ6*03 F 22 AA ARERVGRSIAARRAPDYYYMDV C,F 97.92 426 GJNZTB402H4DK W_length=426 1 0 1 Sequences le 3 305-mid5 Homsap IGHV1-2*02 F Homsap IGHD3-22*01 F Homsap IGHJ4*02 F 21 AA ARGPYHRPTYYYDSSGYYGDY C,F 96.15 374 GJNZTB402FSHFL _length=374 1 0 1 Sequences le 4 331-mid5 Homsap IGHV1-2*02 F Homsap IGHD3-10*02 F Homsap IGHJ3*02 F 21 AA ARNVGHRPGSSDAWEADAFDI C,F 99.31 422 GJNZTB402JFJ2O_ length=422 1 0 1 Sequences le 5 374-mid5 Homsap IGHV1-2*02 F Homsap IGHD3-3*01 F Homsap IGHJ3*02 F 21 AA ATTHEPAITIFGVVINDAFDI C,F 96.18 422 GJNZTB402H3QEP _length=422 1 0 1 Sequences le 6 647-mid5 Homsap IGHV1-2*02 F Homsap IGHD6-19*01 F Homsap IGHJ6*03 F 19 AA AKGAIAVAGTNYYYYYMDV C,F 97.74 330 GJNZTB402HVTXP _length=330 1 0 1 Sequences le 7 693-mid5 Homsap IGHV1-2*02 F Homsap IGHD3-22*01 F Homsap IGHJ5*02 F 19 AA ARDGSTYYYDSSGYYWFDP C,F 99.65 417 GJNZTB402FI7QZ _length=417 1 0 1 Sequences le 8 709-mid5 Homsap IGHV1-2*02 F Homsap IGHD4-23*01 ORF Homsap IGHJ2*01 F 19 AA ARDMGRYGGNLRRYWYFDL C,F 97.57 416 GJNZTB402G7EN4 _length=416 1 0 1 Sequences le ID IMGT clonotype (AA) denion IMGT clonotype (AA) representave sequence Nb Homsap IGHV1-2*02 F

Transcript of IMGT/HighV-QUEST for NGS antibody repertoire I analysis MGT › IMGTposters ›...

Page 1: IMGT/HighV-QUEST for NGS antibody repertoire I analysis MGT › IMGTposters › PosterAntibodyengineering201312.pdf · Biological Context AGENCE NATIONALE DE LA RECHERCHE ©2013 Alamyar

Biological Context

AGENCENATIONALEDE LARECHERCHE

©2013 Alamyar Eltaf, Géraldine Folch, Joumana Jabado-Michaloud and Marie-Paule Lefranc

IMGT® founder and director: Marie-Paule Lefranc ([email protected])Bioinformatics manager: Véronique Giudicelli ([email protected])Computer manager: Patrice Duroux ([email protected])

© Copyright 1995-2013 IMGT®, the international ImMunoGeneTics information system®

[1] Giudicelli V and Lefranc M-P, Front Genet, 3:79, 2012. [2] Alamyar E et al. Mol Biol 882:569-604, 2012. [3] Alamyar E et al. Immunome Res 8(1):26, 2012. [4] Li S et al. Nat. Commun. 4:2333 doi: 10.1038/ncomms3333 (2013).

Acknowledgments: this work was granted access to the HPC resources of CINES under the allocation 036029-(2010-2013) made by GENCI (Grand Equipement National de Calcul Intensif).

Eltaf Alamyar, Véronique Giudicelli, Géraldine Folch, Joumana Jabado-Michaloud, Patrice Duroux, and Marie-Paule LefrancIMGT®, the international ImMunoGeneTics information system®, Laboratoire d'ImmunoGénétique Moléculaire LIGM, Institut de Génétique Humaine IGH, UPR CNRS 1142, 141 rue de la Cardonille, Montpellier, 34396, France

ImMunoGeneTics

Informationsystem®

http://www.imgt.org

IMGT/HighV-QUEST for NGS antibody repertoire analysis

The adaptive immune response is characterized by an extreme diversity of the specific antigen receptors that comprise the immunoglobulins (IG) or antibodies and the T cell receptors (TR) (10¹² different IG and 10¹² different TR per individual, in humans). The complex molecular mechanisms (DNA rearrangements, N-diversity, and for IG, somatic hypermutations) that occur in B cells and T cells are at the origin of that huge diversity.

10¹² IG (antibodies)per individual

IMGT-ONTOLOGY concepts

L-PART1

FR1-IMGT FR2-IMGT FR3-IMGT

L-PART2 CDR1-IMGT CDR2-IMGT CDR3-IMGT

V-EXON

V-GENE

L-V-GENE-UNIT

V-GENE-UNIT

L-INTRON-L

V-RS

1st-CYS CONSERVED-TRP 2nd-CYS

V-REGION

V-INTRON

ACCEPTOR-SPLICEDONOR-SPLICEINIT-CODON

V-SPACER3’V-REGION

V-HEPTAMERV-NONAMER

5'UTR 3'UTR

V-GENE

3’D-RS5’D-RS3’D-SPACER

3’D-HEPTAMER5’D-HEPTAMER 3’D-NONAMER

5’D-SPACER

5’D-NONAMER

D-REGION

D-GENED-GENE-UNIT

3'UTR5'UTR

D-GENE

J-RS

J-HEPTAMER

5’J-REGIONJ-SPACER

J-NONAMER

3'UTR5'UTR

J-REGION

J-GENE

J-GENE-UNIT

J-TRP or J-PHE DONOR-SPLICE

J-GENE

Prototypes of IG and TR V, D, J genesDESCRIPTIONThe concepts of description correspond to IMGT® standar-dized labels. They are more than 560 standardizerd labels (available in the IMGT Scientific chart), 277 for the nucleo-tide sequences and 285 for the 3D structures.

CLASSIFICATIONThe concepts of classification allowed to classify and name the human IG and TR genes and alleles which were approved by HGNC and endorsed by WHO-IUIS. They provide the frame for the standardized IG and TR nomenclature of jawed vertebrates.

NUMEROTATIONThe concepts of numerotation comprise the ‘IMGT unique numbering’ and ‘IMGT Collier de Perles’.

Prototypes are graphical representation based on the concepts of description

Since the availability of IMGT/HighV-QUEST in October 2010, more than 1.4 billions of sequences (from external users) have been submitted. They required more than 720,000 hours of computational resources. About 32 terabytes of results were generated.

303 of IMGT/HighV-QUEST users are from USA (43%), 266 are from EU (17 countries) (38%), and 133 from other countries (19%).

Other countries’ users

Users and Analyses

IMGT-ONTOLOGY Concepts

IMGT®, the international ImMunoGeneTics information system®, created in 1989 at Montpellier, France, by Marie-Paule Lefranc (CNRS and Université Montpellier 2), is at the birth of immunoinformatics. IMGT® manages the immunogenetics data, and more particularly the sequences, genes and structures of immunoglobulins (IG) or antibodies and T cell receptors (TR). Standardization and data integration are obtained through the IMGT-ONTOLOGY [1] concepts of identification (IMGT standardized keywords), classification (IMGT standardized nomenclature: IMGT gene and allele names approved by HGNC and used by NCBI Gene), description (IMGT standardized labels) and numerotation (IMGT unique numbering and IMGT Colliers de Perles: widely used for antibody engineering and humanization). IMGT® comprises seven databases (including IMGT/mAb-DB), seventeen tools and more than 15,000 pages of Web resources. To answer the needs of high throughput and Next Generation Sequencing (NGS) data, the IMGT/HighV-QUEST tool [2-4] was developed which analyses up to 500,000 long 454 sequences by run. The results, based on IMGT-ONTOLOGY, include identification of the closest germline genes and alleles for genotype and haplotype analysis, and standardized characterization of the 'IMGT clonotypes (AA)' for antibody clonal diversity and expression and achieve, for the first time, a degree of resolution for NGS verifiable by the user at the sequence level. Amino acid frequency can be determined at each CDR-IMGT and FR-IMGT positions. This tool provides a paradigm for IG clonal diversity and expression repertoire analysis from NGS and high resolution results for antibody engineering and combinatorial library construction. TAKE HOME MESSAGE: * IMGT/HighV-QUEST analyses antibody NGS data at a sequence level verifiable by the user. * Standardized characterization of IMGT clonotypes, based on the IMGT-ONTOLOGY concepts, identifies clonal diversity and expression, * Amino acid frequency can be determined at each CDR-IMGT and FR-IMGT position.

VH clonal diversity

61

592923

1312

99

99 7 5 5 5 4 3 3 1

GermanyUnited KingdomNetherlandsFranceSwitzerlandItalyBelgiumSpainNorwayAustriaSwedenFinlandGreeceLuxembourgDenmarkPortugalCzech RepublicIreland

34

271210

9

7

66 3 3 3 3 2 1111 1 1 1 1

JapanChinaCanadaTaiwanAustraliaIsraelRussian FederationMexicoKoreaSingaporeUruguayHong KongSouth AfricaBrazilIranPakistanIndiaNew ZealandVenezuelaArgentinaColombia

EU users

303

266

133

USEUOther

Users

19%

38%

43%

IMGT Clonotypes (AA)

VH clonal expression

0 200 400 600 800 1000 1200 1400

Homsap IGHV1-cHomsap IGHV1-f

Homsap IGHV2-70DHomsap IGHV3-30-3Homsap IGHV3-43DHomsap IGHV3-64DHomsap IGHV3-NL1Homsap IGHV4-30-2Homsap IGHV4-30-4

Homsap IGHV4-bHomsap IGHV5-a

Homsap IGHV7-4-1Homsap IGHV7-81Homsap IGHV5-78Homsap IGHV3-74Homsap IGHV3-73Homsap IGHV3-72Homsap IGHV3-71Homsap IGHV2-70Homsap IGHV1-69Homsap IGHV1-68Homsap IGHV3-66Homsap IGHV3-64Homsap IGHV3-63Homsap IGHV3-62Homsap IGHV4-61Homsap IGHV4-59Homsap IGHV1-58Homsap IGHV4-55Homsap IGHV3-54Homsap IGHV3-53Homsap IGHV3-52Homsap IGHV5-51Homsap IGHV3-49Homsap IGHV3-48Homsap IGHV3-47Homsap IGHV1-46Homsap IGHV1-45Homsap IGHV3-43Homsap IGHV7-40Homsap IGHV4-39Homsap IGHV3-38Homsap IGHV3-35

Homsap IGHV7-34-1Homsap IGHV4-34Homsap IGHV3-33Homsap IGHV3-32Homsap IGHV4-31Homsap IGHV3-30Homsap IGHV4-28Homsap IGHV2-26Homsap IGHV3-25Homsap IGHV1-24Homsap IGHV3-23Homsap IGHV3-22Homsap IGHV3-21Homsap IGHV3-20Homsap IGHV3-19Homsap IGHV1-18Homsap IGHV3-16Homsap IGHV3-15Homsap IGHV3-13Homsap IGHV3-11Homsap IGHV2-10Homsap IGHV3-9Homsap IGHV1-8Homsap IGHV3-7Homsap IGHV2-5Homsap IGHV4-4Homsap IGHV1-3Homsap IGHV1-2Homsap IGHV6-1

0 1000 2000 3000 4000 5000 6000

Homsap IGHJ1P

Homsap IGHJ1

Homsap IGHJ2

Homsap IGHJ3

Homsap IGHJ4

Homsap IGHJ5

Homsap IGHJ3P

Homsap IGHJ6

The filtered-in sequences comprise the ‘1 copy’ and the ‘More than 1’.The ‘1 copy’ includes the ‘single allele’ and ‘several alleles (or genes)’. A high proportion of ‘single allele’ is a good indicator of sequence length and quality.

In IMGT®, the clonotype designated as ‘IMGT clonotype (AA)’ is defined among the ‘1 copy’ ‘single allele’ (for V and J) by a unique V-(D)-J rearrangement (IMGT genes and alleles determined at the nt level), conserved anchors (C104, W or F 118), and a unique CDR3-IMGT AA in frame junction [4]. Each ‘IMGT clonotype (AA)’ is characterized by a selected unique representative sequence.

0 200 400 600 800 1000 1200

Homsap IGHD1-1Homsap IGHD2-2Homsap IGHD3-3

Homsap IGHD4-11Homsap IGHD4-4Homsap IGHD5-5Homsap IGHD6-6Homsap IGHD1-7Homsap IGHD2-8Homsap IGHD3-9

Homsap IGHD3-10Homsap IGHD5-12Homsap IGHD6-13Homsap IGHD1-14Homsap IGHD2-15Homsap IGHD3-16Homsap IGHD4-17Homsap IGHD5-18Homsap IGHD6-19Homsap IGHD1-20Homsap IGHD2-21Homsap IGHD3-22Homsap IGHD4-23Homsap IGHD5-24Homsap IGHD6-25Homsap IGHD1-26Homsap IGHD7-27

Analysis of the CDR3-IMGT can be performed for clonal diversity and clonal expression. Amino acid frequency can be determined, with their IMGT AA physicochemical classes (Pommié C et al. 2004), at each CDR-IMGT and FR-IMGT position.

For clonal expression,the number of sequences for each ‘IMGT clonotypes (AA)’ is added, stepwise: (1) the ‘1 copy’ representative sequence. (2) the ‘1 copy’ ‘single allele’ sequences not selected as representative. (3) the ‘1 copy’ ‘several alleles (or genes)’ with the same CDR3-IMGT (AA) and the same V and J alleles of the representative sequence among those proposed by IMGT/HighV-QUEST. (4) the ‘More than 1’ corresponding to the ‘1 copy’ of steps 1-3.

‘single allele’ ‘several alleles (or genes)’

IMGT/HighV-QUEST based on IMGT® standard

Clonal diversity is the number of ‘IMGT clonotypes (AA)’ per V, D and J gene.

Clonal expression is the number of sequences assigned to ‘IMGT clonotypes (AA)’ per V, D and J gene.

IMGT clonotypes (nt)

# Exp. ID V gene and allele D gene and allele J gene and allele

CDR3-IMGT length (AA)

CDR3-IMGT sequence (AA)Anchors 104,118

V %Sequence length

Sequence ID

Total nb of '1 copy'

Total nb of 'More than 1'

TotalSequences file ('1 copy')

1 137-mid5 Homsap IGHV1-2*02 F Homsap IGHD2-2*01 F Homsap IGHJ6*03 F 22 AA ARDLYCSSTSCYGGWYYYYMDV C,F 95.14 425GJNZTB402H8X9K_length=425

1 0 1Sequences file

2 157-mid5 Homsap IGHV1-2*02 F Homsap IGHD6-6*01 F Homsap IGHJ6*03 F 22 AA ARERVGRSIAARRAPDYYYMDV C,F 97.92 426GJNZTB402H4DKW_length=426

1 0 1Sequences file

3 305-mid5 Homsap IGHV1-2*02 F Homsap IGHD3-22*01 F Homsap IGHJ4*02 F 21 AA ARGPYHRPTYYYDSSGYYGDY C,F 96.15 374GJNZTB402FSHFL_length=374

1 0 1Sequences file

4 331-mid5 Homsap IGHV1-2*02 F Homsap IGHD3-10*02 F Homsap IGHJ3*02 F 21 AA ARNVGHRPGSSDAWEADAFDI C,F 99.31 422GJNZTB402JFJ2O_length=422

1 0 1Sequences file

5 374-mid5 Homsap IGHV1-2*02 F Homsap IGHD3-3*01 F Homsap IGHJ3*02 F 21 AA ATTHEPAITIFGVVINDAFDI C,F 96.18 422GJNZTB402H3QEP_length=422

1 0 1Sequences file

6 647-mid5 Homsap IGHV1-2*02 F Homsap IGHD6-19*01 F Homsap IGHJ6*03 F 19 AA AKGAIAVAGTNYYYYYMDV C,F 97.74 330GJNZTB402HVTXP_length=330

1 0 1Sequences file

7 693-mid5 Homsap IGHV1-2*02 F Homsap IGHD3-22*01 F Homsap IGHJ5*02 F 19 AA ARDGSTYYYDSSGYYWFDP C,F 99.65 417GJNZTB402FI7QZ_length=417

1 0 1Sequences file

8 709-mid5 Homsap IGHV1-2*02 FHomsap IGHD4-23*01 ORF

Homsap IGHJ2*01 F 19 AA ARDMGRYGGNLRRYWYFDL C,F 97.57 416GJNZTB402G7EN4_length=416

1 0 1Sequences file

ID IMGT clonotype (AA) definitionIMGT clonotype (AA) representative sequence

Nb

Homsap IGHV1-2*02 F