Identification and characterization of copy number variation in Indian population and its association with disease Pankaj Kumar CAS-MPG Presentation 07 May 2012


Identification and characterization of copy number variation in Indian population

and its association with disease

Pankaj Kumar

CAS-MPG Presentation

07 May 2012


Introduction

CNVs are

- variations in the number of copies of genomic regions

- insertions, deletions or duplications

- from > 1 kb to several Mb in size

CNV vs. SNPs

Measure | CNV | SNP
Total number | 38,406 | 14,708,752
% of reference genome | 29.74% | <1%


Introduction (contd.)

[Figure: CNV schematic on genomic segments A to F. Labels: Deletion, Duplication, Polymorphism, Mutation, Frequency, Origin, Types, Occurrence, Phenotypic variability, Disease susceptibility.]


Introduction (contd.): Consequences of CNVs

- Unmask recessive alleles

- Disrupt genes

- Alter regulation

- Cumulative effects

Scherer et al., Nature Reviews Genetics, 2006


Objectives:

1. To identify CNVs in diverse Indian populations

2. To map CNV regions with disease susceptibility

3. To study consequence of CNV in disease

4. To explore the role of CNV in Spinocerebellar Ataxia


CNV & Diseases

Proof-of-concept study


APOBEC3B: insertion/deletion polymorphism

Cytidine deaminase family of proteins

29 kb insertion/deletion polymorphism

Kidd et al., PLoS Genetics, 2007


Spectrum of APOBEC3B deletion frequency in Indian populations studied


APOBEC3B insertion/deletion polymorphism & malaria endemicity

[Figure legend: white = insertion, dark = deletion]


Significant association of APOBEC3B with falciparum malaria

Malaria cohort | Comparison (Fisher's test) | Genotypes | Odds ratio (95% CI) | P value
Endemic | Non-severe vs. control | AB & AA | 7.11 (3.20 to 15.97) | 1 x 10^-7
Endemic | Severe vs. control | AB & AA | 8.13 (2.62 to 26.59) | 1.7 x 10^-5
Endemic | Severe vs. non-severe | AB & AA | 1.14 (0.37 to 3.81) | 0.8
Non-endemic | Severe vs. control | AB & AA | 0.39 (0.16 to 0.93) | 0.0211
Non-endemic | Severe vs. control | BB & AB | 6.44 (1.76 to 24.99) | 0.0012
Non-endemic | Severe vs. control | BB & (AA+AB) | 3.17 (1.10 to 10.32) | 0.0177

A = insertion allele; B = deletion allele

Insertion allele of APOBEC3B seems to be protective for malaria
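
The odds ratios and Fisher's test P values in the table above come from 2x2 tables of genotype by case/control status. A minimal Python sketch of that calculation follows. The function names are hypothetical, and the Woolf (log-scale) confidence interval is an assumption, since the slide does not say how its intervals were computed. The example counts are the pooled endemic genotype counts from the HWE slide in this deck (AB vs. AA, all endemic cases vs. controls), not the severe/non-severe strata, so the output is not expected to reproduce the table.

```python
from math import comb, exp, log, sqrt


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-scale) CI for the 2x2 table [[a, b], [c, d]].

    Assumes all four cells are non-zero; the CI method is an assumption,
    not necessarily the one used for the slide.
    """
    odds_ratio = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return odds_ratio, exp(log(odds_ratio) - z * se), exp(log(odds_ratio) + z * se)


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value.

    Sums the hypergeometric probabilities of every table (with the same
    margins) that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c

    def weight(x):
        # Unnormalised probability of a table with x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x)

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    observed = weight(a)
    tail = sum(weight(x) for x in range(lowest, highest + 1) if weight(x) <= observed)
    return tail / comb(row1 + row2, col1)


# Endemic cases: AB = 41, AA = 29; endemic controls: AB = 18, AA = 64
# (genotype counts taken from the HWE slide of this deck).
or_value, ci_low, ci_high = odds_ratio_ci(41, 29, 18, 64)
p_value = fisher_two_sided(41, 29, 18, 64)
print(f"OR = {or_value:.2f} ({ci_low:.2f} to {ci_high:.2f}), P = {p_value:.2e}")
```

Integer binomial coefficients keep the tail comparison exact, which avoids the floating-point ties that can affect two-sided Fisher P values.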


Positive selection for the APOBEC3B locus in malaria?

EHH and haplotype analysis

[Figure: markers spanning 500 kb upstream (5') and 500 kb downstream (3') of APOBEC3B, used to test for positive selection]


Haplotype-based analysis for larger linkage disequilibrium

[Figure panels: Endemic case; Endemic control; Non-endemic case; Non-endemic control]

Selection for the APOBEC3B region has not been observed in malaria


Schematic representation of APOBEC gene cluster and segmental duplication region

Segmental duplication regions

Because of the large number of segmental duplication regions at this locus, selection for APOBEC3B was not observed


Conclusions

• The insertion allele of APOBEC3B seems to be protective for malaria

• The APOBEC3B locus has not shown a signature of positive selection by conventional methods, possibly because of frequent recombination events

• Since this gene is expressed in liver and spleen, this might provide a new mechanism of host protective response


Identification of CNVs in the Indian population

A basal Database


Identification of large CNVs (>100 kb) in the Indian population: Methodology

Sampling of IGV populations: 477 samples, 26 populations

[Figure: IGV populations grouped into Cluster 1 to Cluster 5. Population labels: IE-W-IP2, IE-E-LP2, IE-N-LP1, IE-N-LP9, IE-N-LP18, TB-N-IP1, TB-N-SP1, IE-W-LP3, IE-W-LP1, IE-W-LP2, IE-E-IP1, IE-NE-IP1, AA-NE-IP1, TB-NE-LP1, IE-N-IP2, IE-N-LP10, IE-N-SP4, AA-E-IP3, AA-C-IP5, DR-S-LP, IE-W-LP4, OG-W-IP, DR-S-LP3, DR-S-LP, IE-N-LP5, IE-E-LP4, IE-NE-LP1, DR-C-IP2]

Workflow:

- Affy 50k array (~58,000 SNPs with average inter-marker distance of 50 kb)

- Raw intensity files

- Retrieve segments >100 kb in length with a minimum of 10 probes using G-Console

- CNV calling and QC (Genotyping Console + SVS7)

- Validation using Sequenom MassARRAY QGE assay
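
The size and probe-count filter in the workflow above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds (>100 kb, at least 10 probes); the `Segment` record, its field names and the example values are hypothetical and do not reflect the Genotyping Console export format.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One copy-number segment for one sample (hypothetical record layout)."""
    sample: str
    chrom: str
    start: int      # bp
    end: int        # bp
    n_probes: int   # number of array markers supporting the segment
    cn_state: int   # called copy number


def passes_filter(seg, min_length_bp=100_000, min_probes=10):
    """Keep segments longer than 100 kb that are supported by >= 10 probes."""
    return (seg.end - seg.start) > min_length_bp and seg.n_probes >= min_probes


# Made-up segments, for illustration only.
segments = [
    Segment("S1", "chr1", 1_000_000, 1_250_000, 12, 1),   # long enough, enough probes
    Segment("S1", "chr2", 5_000_000, 5_050_000, 15, 3),   # too short
    Segment("S2", "chr5", 32_000_000, 32_400_000, 6, 3),  # too few probes
]
kept = [seg for seg in segments if passes_filter(seg)]
print([(seg.sample, seg.chrom) for seg in kept])
```

With an average inter-marker distance of about 50 kb, a minimum of 10 probes usually implies segments far longer than 100 kb, so the probe criterion is likely to be the stricter of the two on this array.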


Results

Instances of genomic segment prone to CNVs

Raw CNV deletion = 70174 (<1Mb segment size) and 212 (>1Mb segment size)

Raw CNV duplication = 73580 (<1Mb segment size) and 60 (>1Mb segment size)

Total CNVRs deletions = 1425

Total CNVRs duplications = 1337
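
The step from raw CNV calls to CNV regions (CNVRs) is not described on the slide. A common convention, assumed here, is to take the union of overlapping calls across samples, separately for deletions and duplications. The sketch below shows that merge; the function name and the example coordinates are hypothetical.

```python
from collections import defaultdict


def merge_into_cnvrs(calls):
    """Merge overlapping CNV calls into CNV regions.

    calls: iterable of (chrom, start, end) tuples, pooled across samples.
    Returns a list of merged (chrom, start, end) tuples sorted by position.
    Assumption: a CNVR is the union of calls that overlap by at least 1 bp.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in calls:
        by_chrom[chrom].append((start, end))

    regions = []
    for chrom in sorted(by_chrom):
        merged = []
        for start, end in sorted(by_chrom[chrom]):
            if merged and start <= merged[-1][1]:
                # Overlaps the current region: extend it.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        regions.extend((chrom, s, e) for s, e in merged)
    return regions


# Made-up deletion calls from several samples.
deletion_calls = [
    ("chr1", 100, 200),
    ("chr1", 150, 300),
    ("chr1", 400, 500),
    ("chr2", 100, 200),
]
print(merge_into_cnvrs(deletion_calls))
```

Merging of this kind would explain how tens of thousands of raw calls collapse into on the order of a thousand regions, although the exact rule used for the slide is not stated.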


Results (contd.): Extent of CNVs in IGV populations


Chromosomal landscape of common CNV regions in all the populations pooled together


Results (contd.): Concordance of dataset using two independent algorithms

GTC 3.0.2 (Deletion / Duplication chart): 5750 (65%), 2048 (23%), 1006 (11%)

SVS 7 (Deletion / Duplication chart): 2986 (50%), 1461 (25%), 1515 (25%)

~60% of copy number variable regions showed both deletion and duplication

Comparison of the two software packages showed 50% concordance in regions prone to CNVs
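
One simple way to quantify concordance between two CNV callers is the fraction of regions from one tool that overlap any region from the other. The sketch below assumes that definition; the slide does not state the overlap criterion actually used, and the function names and coordinates are hypothetical.

```python
def overlaps(a, b):
    """True if regions a and b, given as (chrom, start, end), share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def concordance(regions_a, regions_b):
    """Fraction of regions in A overlapped by at least one region in B."""
    if not regions_a:
        return 0.0
    hits = sum(any(overlaps(a, b) for b in regions_b) for a in regions_a)
    return hits / len(regions_a)


# Made-up regions standing in for the two call sets.
gtc_regions = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200), ("chr3", 1, 50)]
svs_regions = [("chr1", 150, 250), ("chr2", 150, 400)]
print(concordance(gtc_regions, svs_regions))
```

The measure is asymmetric: swapping the two arguments changes the denominator, so a concordance figure should be reported together with the tool used as the reference set.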


Results (contd.): CNV validation and heterogeneity

Validation using Sequenom MassARRAY QGE

[Figure panels: Amplification; Deletion]

Lower validation rate due to heterogeneity in CNV boundaries

Selection of the probe for validation is also a key factor


Results (contd.): CNVs and population structure

[Figure clusters: TB populations and isolated Himalayan populations; AA and DR isolated populations; IE large populations]

Populations clustered according to genetic and linguistic affinity


CNVs present in IGV map to genes that are associated with diseases

SN | Gene symbol | Disorder name | Class
1 | KDR | Hemangioma, capillary infantile, somatic | Cancer
2 | IRF4 | Multiple myeloma | Cancer
3 | BRAF | Adenocarcinoma of lung, somatic | Cancer
4 | KCNE2 | Atrial fibrillation, familial; Long QT syndrome-6 | Cardiovascular
5 | AGT, AGTR1 | Hypertension, essential; Renal tubular dysgenesis | Cardiovascular
6 | ADRB1 | Congestive heart failure, susceptibility to; Resting heart rate | Cardiovascular
7 | KRT6A | Pachyonychia congenita, Jadassohn-Lewandowsky type | Dermatological
8 | GTF2H5 | Trichothiodystrophy, complementation group A | Dermatological
9 | PRSS2 | Pancreatitis, chronic | Gastrointestinal
10 | IL23R | Crohn disease | Gastrointestinal
11 | ABCG5 | Sitosterolemia | Metabolic
12 | HGD | Alkaptonuria | Metabolic
13 | PPM2C | Pyruvate dehydrogenase phosphatase deficiency | Metabolic
14 | A2M, APP | Alzheimer disease, susceptibility to; Emphysema due to alpha-2-macroglobulin deficiency | Neurological
15 | ATXN8OS | Spinocerebellar ataxia 8 | Neurological
16 | ATXN1 | Spinocerebellar ataxia-1 | Neurological
17 | PRKCH | Cerebral infarction | Neurological
18 | BFSP1 | Cataract, cortical, juvenile-onset | Ophthalmological
19 | HTRA1 | Macular degeneration, age-related, 7; Macular degeneration, age-related, neovascular type | Ophthalmological
20 | HMCN1 | Macular degeneration, age-related, 1; Posterior column ataxia with retinitis pigmentosa | Ophthalmological
21 | PTGDR, IL12B, HNMT, PTGER2 | Asthma | Respiratory


Conclusions

• Observed CNVs cover 0.05% to 1.46% of the genome per individual.

• A set of genes encompassed by CNVRs are novel and not reported in DGV (Database of Genomic Variants).

• Validation of individual CNVs showed substantial heterogeneity in CNV boundaries within a gene.

• CNVs can be shared between genetically related populations.

• The study provides baseline data on genomic regions prone to CNVs in the Indian population.

• CNV regions predispose to many diseases in Indian populations.


Role of CNVs as a genetic modifier in SCA12 phenotype


Investigating the involvement of CNV in sub-phenotypes of SCA12

SCA12

- Neurodegenerative disorder

- CAG repeat expansion in the 5' UTR of the PPP2R2B gene

- Two distinct sub-phenotypes have been observed: tremor dominant and gait dominant

Could CNVs be involved?


Workflow of CNV Identification

SCA12 (CAG repeat in PPP2R2B): 10 index cases of gait, 14 index cases of tremor

IE large populations

- Affymetrix 6.0 SNP array

- CNV calling (PennCNV)

- Gene annotation

- Validation (Real-Time method)

- Data QC

- Functional annotation clustering


Copy number state distribution in SCA12 and IE population

CN state | Count in SCA12 | Count in IE
0 | 987 | 389
1 | 2697 | 1226
3 | 257 | 465
4 | 158 | 257
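
The counts above are easier to compare as proportions. The short Python sketch below computes, for each group, the share of CNV calls that are copy-number losses (CN state 0 or 1) as opposed to gains (CN state 3 or 4). Treating states below 2 as losses assumes a diploid baseline; the function name is hypothetical.

```python
# CN-state counts copied from the slide.
cn_counts = {
    "SCA12": {0: 987, 1: 2697, 3: 257, 4: 158},
    "IE": {0: 389, 1: 1226, 3: 465, 4: 257},
}


def loss_fraction(by_state):
    """Share of calls with copy number below the diploid baseline of 2."""
    total = sum(by_state.values())
    losses = sum(count for cn, count in by_state.items() if cn < 2)
    return losses / total


for group, by_state in cn_counts.items():
    print(f"{group}: {sum(by_state.values())} calls, {loss_fraction(by_state):.1%} losses")
```

On these counts, losses make up about 90% of calls in SCA12 and about 69% in IE. Whether that difference is biological or reflects array and calling differences between the two data sets cannot be judged from the slide alone.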

Case control association analysis between gait and tremor groups

Chr | CNV start | CNV end | Size (kb) | Genes | Gait Del | Gait Dup | HT Del | HT Dup | P value | Odds ratio (OR)
chr1 | 105820728 | 105823898 | 3.17 | Non-genic | 1 | 4 | 2 | 0 | 0.0172 | Inf
chr14 | 105609468 | 105641621 | 32.1 | Non-genic | 6 | 1 | 1 | 1 | 0.0044 | 25.1442
chr5 | 32142841 | 32208250 | 51 | GOLPH3 | 0 | 5 | 0 | 0 | 0.0048 | Inf


Amplification of chr5p13.3 region in Gait Ataxia

GOLPH3 amplification: Real-Time validation

5/8 of gait samples; 0/14 of HT samples


GOLPH3 (golgi phosphoprotein 3 (coat-protein))

A Golgi-localized protein

Has a regulatory role in Golgi trafficking

Identified as a potent oncogene

Modulates mTOR signaling

Inhibition of mTOR induces autophagy and reduces toxicity of polyglutamine expansions in fly and mouse models of Huntington disease

Brinda Ravikumar et al. Nature Genetics (2004)

Autophagy induction reduces mutant ataxin-3 levels and toxicity in a mouse model of spinocerebellar ataxia type 3

Fiona M. Menzies et al. Brain (2009)


Functional annotation clustering of genes under CNV specific to SCA12

Term | Count | % | P value | Bonferroni | Benjamini | Fold enrichment
GO:0005216~ion channel activity | 18 | 6.593 | 3.74E-05 | 0.0172 | 0.0172 | 3.2549
GO:0022838~substrate specific channel activity | 18 | 6.593 | 5.48E-05 | 0.0252 | 0.0084 | 3.1568
GO:0015267~channel activity | 18 | 6.593 | 8.39E-05 | 0.0383 | 0.0097 | 3.0495
GO:0022803~passive transmembrane transporter activity | 18 | 6.593 | 8.64E-05 | 0.0394 | 0.0080 | 3.0421

Significant enrichment of ion channel activity processes in SCA12
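
The Bonferroni and Benjamini columns are multiple-testing corrections of the raw enrichment P values. The sketch below shows the textbook forms of both corrections in Python. The corrected values on the slide depend on the full list of terms tested by the annotation tool, which is not shown, and the tool's exact formulas may differ, so the example uses hypothetical P values rather than trying to reproduce the table.

```python
def bonferroni(pvalues):
    """Bonferroni correction: multiply each P value by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P values for four tested terms.
raw = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(raw))
print(benjamini_hochberg(raw))
```

Bonferroni controls the chance of any false positive and is the stricter of the two; Benjamini-Hochberg controls the expected proportion of false positives among the reported terms.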


A multigene enrichment analysis for dissection of biological system

Biological process

Molecular functions


Cellular components

CNVs in ion channel genes, and their involvement in different biological, molecular and cellular functions, suggest physiological impairment in SCA12


Future direction


Conclusions

• Although SCA12 is a monogenic disorder, phenotypic variability could be due to other genetic factors.

• Amplification of the GOLPH3 gene could act as a genetic modifier that leads to the gait ataxia feature.

• GOLPH3 influences the autophagy pathway through the mTOR pathway, which ultimately affects autophagolysis of inclusion bodies.

• GOLPH3 could be a good intervention target for SCA12 pathogenesis.

• The involvement of ion channel genes, which are implicated in different neurological diseases, suggests physicochemical abnormalities in SCA12.


Conclusion of my PhD work

“Any two individual genomes taken from nature, in any species, will have dozens to hundreds of differences in their total number of functional genes.”

[Daniel R. Schrider and Matthew W. Hahn, Proc. R. Soc. B; 2010]

In conclusion, our genome is less static than often assumed, and CNVs could play an important role in genome dynamics, facilitating evolution, adaptation and selection in populations and contributing to disease through dosage effects of functional genes/regions.


Jha P, Sinha S, Kanchan K, Qidwai T, Narang A, Singh PK, Pati SS, Mohanty S, Mishra SK, Sharma SK, Awasthi S, Venkatesh V, Jain S, Basu A, Xu S; Indian Genome Variation Consortium, Mukerji M, Habib S. Deletion of the APOBEC3B gene strongly impacts susceptibility to falciparum malaria. Infect Genet Evol. 2012 Jan;12(1):142-8.

Datta S, Chowdhury A, Ghosh M, Das K, Jha P, Colah R, Mukerji M, Majumder PP. A Genome-Wide Search for Non-UGT1A1 Markers Associated with Unconjugated Bilirubin Level Reveals Significant Association with a Polymorphic Marker Near a Gene of the Nucleoporin Family. Ann Hum Genet. 2012 Jan;76(1):33-41.

Abhimanyu, Indian Genome variation consortium, Jha P and Mridula Bose. Footprints of genetic susceptibility to pulmonary tuberculosis: Cytokine gene variants in north Indians. Indian J Med Res., 2011 (accepted)

Lall M, Thakur S, Puri R, Verma I, Mukerji M, Jha P. A 54 Mb 11qter duplication and 0.9 Mb 1q44 deletion in a child with laryngomalacia and agenesis of corpus callosum. Mol Cytogenet. 2011 Sep 21;4:19.

Publications


Gautam P*, Jha P*, Kumar D, Tyagi S, Varma B, Dash D, Mukhopadhyay A; Indian Genome Variation Consortium, Mukerji M. Spectrum of large copy number variations in 26 diverse Indian populations: potential involvement in phenotypic diversity. Hum Genet. 2011 Jul 9. * Equal contributing authors.

Ankita Narang*, Jha P*, Vimal Rawat, Arijit Mukhopadhayay, Debasis Dash, Analabha Basu, Mitali Mukerji. Recent admixture in an Indian population of African ancestry. Am. J. Hum. Genet. 2011 Jul 5. * Equal contributing authors.

Jha P, Suri V, Sharma V, Singh G, Sharma MC, Pathak P, Chosdol K, Jha P, Suri A, Mahapatra AK, Kale SS, Sarkar C. IDH1 mutations in gliomas: First series from a tertiary care centre in India with comprehensive review of literature. Exp Mol Pathol. 2011 May 3;91(1):385-393.

Abhimanyu, Jha P, Jain A, Arora K, Bose M. Genetic association study suggests a role for SP110 variants in lymph node tuberculosis but not pulmonary tuberculosis in north Indians. Hum Immunol. 2011 Apr 20.

Abhimanyu, Mangangcha IR, Jha P, Arora K, Mukerji M, Banavaliker JN, Consortium IG, Brahmachari V, Bose M. Differential serum cytokine levels are associated with cytokine gene polymorphisms in north Indian populations with active pulmonary tuberculosis. Infect Genet Evol. 2011 Apr 1.


Jha P, Suri V, Jain A, Sharma MC, Pathak P, Jha P, Srivastava A, Suri A, Gupta D, Chosdol K, Chattopadhyay P, Sarkar C. O6-methylguanine DNA methyltransferase gene promoter methylation status in gliomas and its correlation with other molecular alterations: first Indian report with review of challenges for use in customized treatment. Neurosurgery. 2010 Dec;67(6):1681-91.

Jha P, Jha P, Pathak P, Chosdol K, Suri V, Sharma MC, Kumar G, Singh M, Mahapatra AK, Sarkar C. TP53 polymorphisms in gliomas from Indian patients: Study of codon 72 genotype, rs1642785, rs1800370, and 16 base pair insertion in intron-3. Exp Mol Pathol. 2011 Apr;90(2):167-72. (2010) Nov 27.

Aggarwal S, Negi S, Jha P, Singh PK, Stobdan T, Pasha MA, Ghosh S, Agrawal A; Indian Genome Variation Consortium, Prasher B, Mukerji M. EGLN1 involvement in high-altitude adaptation revealed through genetic analysis of extreme constitution types defined in Ayurveda. Proc Natl Acad Sci U S A. (2010) Nov 2;107(44):18961-6.

HUGO Pan-Asian SNP Consortium, Mapping human genetic diversity in Asia. Science. (2009) Dec 11;326(5959):1541-5

Indian Genome Variation Consortium. Genetic landscape of the people of India: a canvas for disease gene exploration. J Genet. (2008) Apr;87(1):3-20.


Acknowledgements

TCGA for Genotyping Facility

Indian Genome Variation Consortium

CSIR


Thank you


Extra slides


Copy Number Variation in Indian Population

547 healthy individuals from 26 reference populations of the Indian Genome Variation Consortium

Affymetrix 50k Xba 240 array (raw intensity file)

CNV calling and QC (Genotyping Console + SVS7)

≥ 10 probes, ≥ 100 kb segment

Reference samples (30), test samples (447)

Common CNV (> 5% of samples)

Rare CNV (< 5% of samples)

Validation using Sequenom MassARRAY QGE assay (a subset of 12 genes)

Functional enrichment analysis

Mapping with disease-associated regions

Genotype QC


Test for HWE

Group | Ins homozygote | Heterozygote | Del homozygote | HWE test P value | Note
Endemic case | 29 | 41 | 3 | 0.018 | Too many heterozygotes
Endemic control | 64 | 18 | 0 | 0.586 |
Non-endemic case | 56 | 11 | 17 | 7.95 × 10^-9 | Too few heterozygotes
Non-endemic control | 51 | 25 | 5 | 0.508 |

After data quality control for genotyping error and population stratification, Hardy-Weinberg disequilibrium (HWD) generally indicates some kind of natural selection
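
The slide does not name the HWE test that was used. The sketch below implements an exact test conditional on the observed allele counts, which is one standard choice for small genotype counts; that choice and the function name are assumptions. Applied to the two control rows, it gives P values close to those on the slide (about 0.59 and 0.51).

```python
from math import factorial


def hwe_exact_p(n_aa, n_ab, n_bb):
    """Exact Hardy-Weinberg test P value from genotype counts.

    Conditions on the observed allele counts and sums the probabilities of
    all heterozygote counts that are no more likely than the observed one.
    """
    n = n_aa + n_ab + n_bb
    n_a = 2 * n_aa + n_ab   # copies of allele A
    n_b = 2 * n_bb + n_ab   # copies of allele B

    def weight(het):
        # Unnormalised probability of observing `het` heterozygotes.
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return 2 ** het * factorial(n) // (
            factorial(hom_a) * factorial(het) * factorial(hom_b)
        )

    # The heterozygote count must have the same parity as the allele counts.
    weights = [weight(het) for het in range(n_a % 2, min(n_a, n_b) + 1, 2)]
    observed = weight(n_ab)
    return sum(w for w in weights if w <= observed) / sum(weights)


# Genotype counts (insertion homozygote, heterozygote, deletion homozygote)
# copied from the slide.
groups = {
    "Endemic case": (29, 41, 3),
    "Endemic control": (64, 18, 0),
    "Non-endemic case": (56, 11, 17),
    "Non-endemic control": (51, 25, 5),
}
for name, counts in groups.items():
    print(f"{name}: P = {hwe_exact_p(*counts):.3g}")
```

All weights are exact integers, so ties between equally likely configurations are handled without floating-point error.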


Future direction

SCA12 modifier genes

[Figure: GOLPH3 -> mTOR pathway -> autophagy. Proposed sequence: GOLPH3 amplification -> induction of mTOR pathway -> autophagy inhibition -> aggregate formation]