Characterising the human immunoglobulin heavy chain locus by ultra-deep sequencing of rearranged immunoglobulin genes

Bruno Gaëta, School of Computer Science and Engineering, UNSW

description

The study of inherited variation in the immunoglobulin heavy chain (IGH) locus has lagged behind that of other loci. This locus undergoes recombination during B-lymphocyte differentiation, as well as somatic hypermutation after antigen challenge, and the resulting variation is difficult to distinguish from inherited polymorphisms. In addition, most large-scale human genomics projects (including the Human Genome Project and the 1000 Genomes Project) have ignored the IGH locus as they are based on sequencing DNA from lymphoblastoid cells in which the IGH locus has been recombined. As an alternative, our group has pioneered the use of ultra-deep sequencing of rearranged immunoglobulin genes to understand inherited variation in the germline locus. By sampling and comparing tens of thousands of rearranged sequences from an individual it is possible to identify the patterns of variation that are consistent with inherited polymorphisms instead of resulting from somatic mutation. It is also possible to genotype, and in some cases haplotype, the IGH loci for this individual. This approach has required the development of a whole new range of bioinformatics algorithms tailored to immunoglobulin genes, and has resulted in the discovery of several new polymorphisms as well as providing the basis for in-depth population analysis of the IGH locus. In this presentation I will outline the difficulties in applying standard genomic techniques to immunoglobulin genes and describe the bioinformatics methods we developed to study this unusual locus.

Transcript

Page 1:

Characterising the human immunoglobulin heavy chain locus by ultra-deep sequencing of rearranged immunoglobulin genes

Bruno Gaëta, School of Computer Science and Engineering, UNSW

Page 2:

Immunoglobulin Rearrangement

Katherine Jackson

Page 3:

So what do we know about the process?

• Combinatorial diversity (VDJ)

• P-addition

• N-addition

• Exonuclease action

• Somatic hypermutation
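
The processes listed above can be illustrated with a toy simulation. The sketch below is not from the talk: the segment sequences and names are made-up placeholders, the trimming and insertion lengths are arbitrary, and P-addition is left out for brevity. It only shows how combinatorial choice, exonuclease trimming, N-addition and somatic hypermutation stack on top of one another.

```python
import random

# Hypothetical toy germline segments (real genes are much longer).
V_GENES = {"V_a": "CAGGTGCAGCTGGTG", "V_b": "GAGGTGCAGCTGTTG"}
D_GENES = {"D_a": "GGTATAGTGGG", "D_b": "AGGATATTGTA"}
J_GENES = {"J_a": "ACTACTTTGACTAC", "J_b": "ACTGGTTCGACCCC"}
BASES = "ACGT"


def trim(seq, rng, five_prime, three_prime, max_trim=3):
    """Exonuclease action: remove 0..max_trim bases from the chosen ends."""
    left = rng.randint(0, max_trim) if five_prime else 0
    right = rng.randint(0, max_trim) if three_prime else 0
    return seq[left:len(seq) - right]


def n_addition(rng, max_len=6):
    """N-addition: a short run of untemplated random nucleotides."""
    return "".join(rng.choice(BASES) for _ in range(rng.randint(0, max_len)))


def hypermutate(seq, rng, rate):
    """Somatic hypermutation, modelled as independent point substitutions."""
    out = []
    for base in seq:
        if rng.random() < rate:
            base = rng.choice([b for b in BASES if b != base])
        out.append(base)
    return "".join(out)


def rearrange(rng, shm_rate=0.02):
    """Pick one V, D and J segment, trim the joining ends, add N regions, mutate."""
    v_name = rng.choice(sorted(V_GENES))
    d_name = rng.choice(sorted(D_GENES))
    j_name = rng.choice(sorted(J_GENES))
    v = trim(V_GENES[v_name], rng, five_prime=False, three_prime=True)
    d = trim(D_GENES[d_name], rng, five_prime=True, three_prime=True)
    j = trim(J_GENES[j_name], rng, five_prime=True, three_prime=False)
    naive = v + n_addition(rng) + d + n_addition(rng) + j
    return {
        "genes": (v_name, d_name, j_name),
        "naive": naive,
        "mutated": hypermutate(naive, rng, shm_rate),
    }


example = rearrange(random.Random(0))
print(example["genes"], example["mutated"])
```

Passing a seeded `random.Random` keeps each simulated rearrangement reproducible, which is convenient when testing downstream analysis code against known answers.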

Page 4:

Human heavy chain immunoglobulin variable region genes (IGH)

• IGHV

About 46 functional genes (7 families)

Up to 20 (?) reported alleles per gene

• IGHD

About 24 genes (7 families), 1-3 known alleles/gene

• IGHJ

6 functional genes, 1-4 known alleles/gene

Still very controversial!
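
As a rough illustration of combinatorial diversity alone, the approximate gene counts on this slide can simply be multiplied. The figures are the slide's own estimates, and the calculation counts gene combinations only: it ignores allelic variants, junctional diversity and somatic hypermutation.

```python
import math

# Approximate functional gene counts quoted on the slide.
gene_counts = {"IGHV": 46, "IGHD": 24, "IGHJ": 6}

# Number of distinct V-D-J gene combinations, one gene of each type.
vdj_combinations = math.prod(gene_counts.values())
print(vdj_combinations)  # 6624
```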

Page 5:

The Immunoglobulin FactsBook (Lefranc and Lefranc 2001)

Page 6:

Page 7:

Characterizing variation in the IGH locus

• The Human Genome Project, HapMap and the 1000 Genomes Project have all ignored the IGH locus

• Conventional methods are difficult to apply to this locus

• Our approach focuses on mass sequencing of rearranged sequences

Page 8:

Data generation

Blood sample → Multiplex PCR → Sequencing (454) → Rearranged IGH gene sequences (VDJ)

Data from Stanford University (Lyndon Zhang, Katherine Jackson, Scott D. Boyd, Andrew Z. Fire)

Page 9:

Bioinformatics analysis

Rearranged IGH gene sequences (VDJ) → Identify germline genes → Genotype → Haplotype

Draft genotype

iHMMune-align model

Page 10:

iHMMune-align

• Hidden Markov model of immunoglobulin rearrangement and diversification processes

• Designed to identify the most likely germline gene segments in a rearranged Ig gene sequence and partition the sequence

• Can also be used to calculate the probability of a sequence originating from a specific germline gene
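
The third point can be made concrete with a much simpler model than iHMMune-align. The sketch below is not the HMM described in the talk: it assumes the read is already aligned to each candidate germline segment without gaps, and that every position mutates independently at one fixed rate. The sequences, the gene names and the `mutation_rate` value are hypothetical.

```python
import math


def log_prob_given_germline(read, germline, mutation_rate=0.05):
    """log P(read | germline) under an independent-substitution model."""
    if len(read) != len(germline):
        raise ValueError("toy model expects an ungapped, equal-length alignment")
    log_match = math.log(1.0 - mutation_rate)
    log_mismatch = math.log(mutation_rate / 3.0)  # three alternative bases
    return sum(
        log_match if r == g else log_mismatch
        for r, g in zip(read, germline)
    )


def most_likely_germline(read, germlines, mutation_rate=0.05):
    """Return the name of the germline segment with the highest log-probability."""
    return max(
        germlines,
        key=lambda name: log_prob_given_germline(read, germlines[name], mutation_rate),
    )


# Hypothetical germline segments and one read carrying a single substitution.
toy_germlines = {
    "toy_V1": "CAGGTGCAGCTGGTG",
    "toy_V2": "CAGGTCCAGCTTGTA",
}
read = "CAGGTGCAGCTGGTA"
print(most_likely_germline(read, toy_germlines))
```

Working in log space avoids numerical underflow once reads reach realistic lengths of several hundred bases.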

Page 11:

iHMMune-align HMM topology

Gaëta et al (2007) Bioinformatics 23:1580

Page 12:

Genotyping

• Find the combination of alleles most likely to generate the observed data

Page 13:

IGHV Genotyping

• Pre-align sequences (Vmatch) with the IGHV repertoire to filter out unlikely alleles (draft genotype)

• Calculate P(s_i | g_n) using the iHMMune-align model

• Calculate likelihood of sequence set for each combination of alleles in the draft genotype

• Select most likely genotype
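
The four steps above amount to a small maximum-likelihood search. The sketch below assumes the per-sequence probabilities P(s_i | g_n) have already been computed (in the talk, by iHMMune-align) and uses made-up values for them. It also assumes that each sequence is equally likely to derive from any allele in a candidate genotype, and that a gene carries one or two alleles. Both are simplifications introduced here, not details taken from the slides.

```python
import math
from itertools import combinations


def genotype_log_likelihood(genotype, seq_probs):
    """Sum over sequences of log(mean over alleles in genotype of P(s_i | allele))."""
    total = 0.0
    for probs in seq_probs:
        mixture = sum(probs[allele] for allele in genotype) / len(genotype)
        total += math.log(mixture)  # assumes every probability is > 0
    return total


def most_likely_genotype(draft_alleles, seq_probs, max_alleles=2):
    """Score every combination of 1..max_alleles draft alleles and keep the best."""
    candidates = [
        combo
        for size in range(1, max_alleles + 1)
        for combo in combinations(sorted(draft_alleles), size)
    ]
    return max(candidates, key=lambda g: genotype_log_likelihood(g, seq_probs))


# Made-up P(s_i | g_n) values: four rearranged sequences, three draft alleles.
draft = ["IGHV1-2*01", "IGHV1-2*02", "IGHV1-2*04"]
seq_probs = [
    {"IGHV1-2*01": 0.9, "IGHV1-2*02": 0.001, "IGHV1-2*04": 0.01},
    {"IGHV1-2*01": 0.9, "IGHV1-2*02": 0.001, "IGHV1-2*04": 0.01},
    {"IGHV1-2*01": 0.01, "IGHV1-2*02": 0.001, "IGHV1-2*04": 0.9},
    {"IGHV1-2*01": 0.01, "IGHV1-2*02": 0.001, "IGHV1-2*04": 0.9},
]
print(most_likely_genotype(draft, seq_probs))
```

With these toy values the heterozygous pair IGHV1-2*01 / IGHV1-2*04 scores highest, because no single allele explains all four sequences well. Exhaustive enumeration is only practical because the draft genotype has already pruned the candidate alleles.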

Page 14:

IGHD Genotyping

• IGHD genes very short and difficult to identify unambiguously: use a combination of iHMMune-align (with only IGHV alleles present in the genotype) and specific pattern searches

• Calculate P(s_i | g_n) using a simplified iHMMune-align model

Page 15:

IGHJ Genotyping

• Similar to IGHV genotyping but with a simplified iHMMune-align model

Page 16:

Page 17:

Genotyping - evaluation

Page 18:

Once the genotype is determined…

• Re-identify germline genes in the sequence set, using only germline genes present in the genotype (iHMMune-align)

Page 19:

Determination of phased haplotypes

Only possible for subjects heterozygous at the IGHJ4 or IGHJ6 loci

IGHV1-2*01 IGHJ6*02

IGHV1-2*01 IGHJ6*02

IGHV1-2*04 IGHJ6*03

IGHV1-2*04 IGHJ6*03

IGHV1-2*04 IGHJ6*03
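
Because a V(D)J rearrangement joins segments from a single chromosome, the J allele seen in each rearranged sequence tags the chromosome its V allele came from. The sketch below applies that idea by counting V/J allele pairs. The `min_fraction` noise threshold and the input format are assumptions made for illustration, and the counts are toy data built around the allele pairs shown on the slide.

```python
from collections import Counter, defaultdict


def phase_by_j_allele(pairs, min_fraction=0.2):
    """Group V alleles by the IGHJ allele they are rearranged with.

    pairs: iterable of (v_allele, j_allele), one per rearranged sequence.
    A V allele is assigned to a J allele's chromosome when at least
    min_fraction of its rearrangements carry that J allele, so an allele
    present on both chromosomes can appear in both haplotypes.
    """
    counts = defaultdict(Counter)
    for v_allele, j_allele in pairs:
        counts[v_allele][j_allele] += 1

    haplotypes = defaultdict(list)
    for v_allele in sorted(counts):
        total = sum(counts[v_allele].values())
        for j_allele, n in counts[v_allele].items():
            if n / total >= min_fraction:
                haplotypes[j_allele].append(v_allele)
    return dict(haplotypes)


# Toy input mirroring the slide: one (V allele, J allele) pair per sequence.
pairs = (
    [("IGHV1-2*01", "IGHJ6*02")] * 2
    + [("IGHV1-2*04", "IGHJ6*03")] * 3
)
print(phase_by_j_allele(pairs))
```

On this input, IGHV1-2*01 is placed on the IGHJ6*02 chromosome and IGHV1-2*04 on the IGHJ6*03 chromosome. A subject homozygous at the J locus gives a single group, which is why phasing needs a heterozygous IGHJ gene.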

Page 20:

Automated classification

5. Multinomial Logistic Regression for the Identification of Immunoglobulin Haplotypes

Figure 5.4: Classification error rates of different algorithms for IGHD haplotyping. The error rates of using ‘Counts of Sequences’ (CoS), ‘Binomial Probabilities’ (BP) and ‘Counts of Sequences Plus Binomial Probabilities’ (CoSBP) as attributes in the classifications were also compared.

… respectively. The classification correctness given by different algorithms using different attributes is shown in Table 5.5.

Figure 5.4 compares the performance of Logistic Regression, Linear Regression, SVM and Decision Tree. The logistic regression using ‘Binomial Probabilities’ (BP) as classification attributes gave the best classification.

Table 5.6 shows the difference of haplotypes identified by manual and automatic haplotyping. Excellent agreement was observed between manually and …

Figure 5.3: Classification error rates of different algorithms for IGHV haplotyping. Logistic regression, linear regression and J48 decision tree’s performances were compared. The error rates of using ‘Counts of Sequences’ (CoS), ‘Binomial Probabilities’ (BP) and ‘Counts of Sequences Plus Binomial Probabilities’ (CoSBP) as attributes in the classifications were also compared.

Page 21:

Page 22:

IGHD Haplotypes

Page 23:

IGHV Haplotypes

Ambiguity

Duplication

D Deletion

Page 24:

The team…

• BABS, UNSW

– Marie Kidd

– Yan Wang

– Mark Tanaka

– Andrew Collins

• CSE, UNSW

– Zhiliang Chen

– Bruno Gaëta

• Pathology, Stanford

– Lyndon Zhang

– Katherine Jackson

– Scott Boyd

– Andrew Fire