Exploring complex diseases using genome-wide association: challenges and strategies
![Page 1: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/1.jpg)
Exploring complex diseases using genome-wide association: challenges and strategies
Li Jin, Ph.D.
Fudan University
CAS-MPG Partner Institute for Computational Biology
HGM2006, Helsinki
![Page 2: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/2.jpg)
Positional Cloning
[Figure: codon AGC (Ser) versus codon GGC (Gly)]
![Page 3: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/3.jpg)
Linkage Disequilibrium
Linkage
![Page 4: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/4.jpg)
Daly et al. Nature Genetics, 2001
![Page 5: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/5.jpg)
Genome-wide Association Study
Candidate Gene/Region Association Study
Select tagSNPs → Genotyping tagSNPs → Association analysis
![Page 6: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/6.jpg)
Challenges
• Adjustment for multiple testing and power
• Portability of tagging SNPs between populations
• Population stratification
• Mapping the mutation
• Exploring gene-gene interaction
![Page 7: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/7.jpg)
Challenges
• Adjustment for multiple testing and power
• Portability of tagging SNPs between populations
• Population stratification
• Mapping the mutation
• Exploring gene-gene interaction
![Page 8: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/8.jpg)
Multiple Testing
• Large number of SNPs
– The number of tagging SNPs remains large ($10^{6}$)
• Multiple testing problem:
– Stringent p-value ($10^{-6}$ – $10^{-7}$)
– Freimer and Sabatti (2004)
– Sample size and power
• Association:
– Linear transformation: T is invariant
– Nonlinear transformation
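$T = (P^A - P)^{T}\,\Sigma^{-1}\,(P^A - P)$ (a quadratic form in $P^A - P$; the weighting-matrix symbol is not legible here, so $\Sigma^{-1}$ is assumed)
$P_{gw} = 5 \times 10^{-7}$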
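To make the multiple-testing arithmetic concrete, here is a minimal sketch of a Bonferroni-style per-SNP threshold. The slide does not say how its threshold was derived; the family-wise level of 0.05 and the SNP counts below are assumptions chosen for illustration.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Hypothetical numbers of tagging SNPs; alpha = 0.05 is an assumed family-wise level.
for n_snps in (10**5, 10**6):
    threshold = bonferroni_threshold(0.05, n_snps)
    print(f"{n_snps:>9,d} SNPs -> per-SNP threshold {threshold:.1e}")
```

Bonferroni treats the tests as independent, so with SNPs in LD it is conservative; it is shown here only because it is the simplest correction to reason about.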
![Page 9: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/9.jpg)
Motivation
$P^A \neq P$, $h(P^A) \neq h(P)$
Statistics based on $h(P^A) - h(P)$: Higher Power?
Statistics based on $P^A - P$: Low Power
![Page 10: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/10.jpg)
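Nonlinear Transformations
| Transformation | Function | Derivative |
|---|---|---|
| Entropy | $-x \log x$ | $-(1 + \log x)$ |
| Exponential | $e^{x}$ | $e^{x}$ |
| Polynomial | $x^{2} + x + 1$ | $2x + 1$ |
| Sigmoid | $\dfrac{1}{1 + e^{-x}}$ | $\dfrac{e^{-x}}{(1 + e^{-x})^{2}}$ |
| Gaussian | $e^{-\frac{(x - c)^{2}}{2\sigma^{2}}}$ | $-\dfrac{x - c}{\sigma^{2}}\, e^{-\frac{(x - c)^{2}}{2\sigma^{2}}}$ |
| Reciprocal | $\dfrac{1}{x}$ | $-\dfrac{1}{x^{2}}$ |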
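The function/derivative pairs on this slide can be checked numerically. The sketch below is not from the talk: it compares each derivative with a central finite difference at a few allele frequencies. It assumes the entropy transform is $-x \log x$ (signs are not legible in the transcript) and uses arbitrary Gaussian parameters `C` and `SIGMA`.

```python
import math

SIGMA = 1.0  # assumed Gaussian width (symbol not legible in the transcript)
C = 0.5      # assumed Gaussian centre


def gaussian(x):
    return math.exp(-((x - C) ** 2) / (2 * SIGMA ** 2))


# name -> (transformation h, derivative h')
TRANSFORMS = {
    "entropy": (lambda x: -x * math.log(x), lambda x: -(1.0 + math.log(x))),
    "exponential": (math.exp, math.exp),
    "polynomial": (lambda x: x * x + x + 1.0, lambda x: 2.0 * x + 1.0),
    "sigmoid": (
        lambda x: 1.0 / (1.0 + math.exp(-x)),
        lambda x: math.exp(-x) / (1.0 + math.exp(-x)) ** 2,
    ),
    "gaussian": (gaussian, lambda x: -(x - C) / SIGMA ** 2 * gaussian(x)),
    "reciprocal": (lambda x: 1.0 / x, lambda x: -1.0 / (x * x)),
}


def max_derivative_error(h, dh, points, step=1e-6):
    """Largest gap between dh and a central finite difference of h."""
    return max(abs((h(x + step) - h(x - step)) / (2 * step) - dh(x)) for x in points)


FREQS = [0.1, 0.3, 0.5, 0.7, 0.9]  # allele frequencies used for the check
for name, (h, dh) in TRANSFORMS.items():
    err = max_derivative_error(h, dh, FREQS)
    print(f"{name:<12s} max |finite difference - derivative| = {err:.2e}")
```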
![Page 11: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/11.jpg)
Power (Case-Control)
[Figure: expected noncentrality parameter (y-axis, 0–1.5) versus allele frequency (x-axis, 0.1–0.9) for seven statistics: Entropy, χ², Exp, Quadratic, Sigmoid, Gaussian, Reciprocal]
Expected noncentrality parameters of the nonlinear test statistics
$N_A = N_G = 100$, $P_D = 0.5$
![Page 12: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/12.jpg)
Association Studies
Association test of MMP-2 gene with esophageal carcinoma
| Statistic | entropy | exponential | polynomial | sigmoid | reciprocal | χ² |
|---|---|---|---|---|---|---|
| P value | $3.2 \times 10^{-8}$ | $2.3 \times 10^{-7}$ | $1.9 \times 10^{-7}$ | $2.0 \times 10^{-7}$ | $5.1 \times 10^{-6}$ | $7.0 \times 10^{-6}$ |

Yu C, et al. Cancer Res 2004, 64: 7622-7628
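For reference, the χ² column is the conventional baseline. Below is a minimal sketch of an allelic χ² test on a 2×2 table of allele counts. The counts are hypothetical and are not the MMP-2 data; the nonlinear statistics from the talk are not reproduced here.

```python
import math


def allelic_chi2(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square (1 df) on a 2x2 table of allele counts, with its p-value.

    Assumes every row and column total is positive.
    """
    n = case_a + case_b + ctrl_a + ctrl_b
    row = (case_a + case_b, ctrl_a + ctrl_b)
    col = (case_a + ctrl_a, case_b + ctrl_b)
    observed = ((case_a, case_b), (ctrl_a, ctrl_b))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical allele counts: 120/80 in cases, 90/110 in controls.
chi2, p = allelic_chi2(120, 80, 90, 110)
print(f"chi2 = {chi2:.3f}, p = {p:.2e}")
```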
![Page 13: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/13.jpg)
Challenges
• Adjustment for multiple testing and power
• Portability of tagging SNPs between populations
• Population stratification
• Mapping the mutation
• Exploring gene-gene interaction
![Page 14: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/14.jpg)
How are LD patterns compared between populations?
• Step 1: Infer haplotype blocks for each population
• Step 2: Compare the boundaries of LD blocks between populations
[Figure: Pop A and Pop B around a target SNP]
![Page 15: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/15.jpg)
![Page 16: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/16.jpg)
Factors Influencing Block Inferences
• Sample size
• Criterion and thresholds
• Genotyping error
• Gene flow
• Search algorithm
![Page 17: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/17.jpg)
[Figure: diagram labeled Af, As, and Eu; the placement of Daic (Thai) is marked "?"]
![Page 18: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/18.jpg)
Samples
| Population | Samples |
|---|---|
| Uighur | 45 |
| Han | 50 |
| Wa | 45 |
| Zhuang | 44 |
| Hmong | 46 |
| European | 40 |
| African American | 48 |
| Samoan | 50 |
![Page 19: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/19.jpg)
SNP Selection and Genotyping
• Selected from dbSNP (build 117)
• Most of them are double-hits
• 26,112 SNPs on chromosome 21
• 1 SNP for every 1.3 kb (Golden Path b.34)
• Illumina BeadLab platform
• 17 oligonucleotide primer sets
• Three QA criteria
– Samples
– SNP: trios & duplicates
– SNP: Hardy-Weinberg Expectation
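The Hardy-Weinberg QA criterion can be illustrated with a simple goodness-of-fit test. This is a sketch under assumptions: the slide does not say which test or cutoff was used, and the genotype counts below are hypothetical.

```python
import math


def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of genotype counts against Hardy-Weinberg expectation."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip genotype classes with zero expectation (monomorphic SNPs).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square survival function, 1 df
    return chi2, p_value


# Hypothetical genotype counts (AA, AB, BB).
for counts in [(25, 50, 25), (40, 20, 40)]:
    chi2, p = hwe_chi2(*counts)
    print(counts, f"chi2 = {chi2:.2f}, p = {p:.2e}")
```

With sample sizes in the 40 to 50 range, as in this study, an exact test is often preferred over the χ² approximation; the χ² version is used here only for brevity.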
![Page 20: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/20.jpg)
[Figure: one panel per population: Zhuang, Han, Hmong, Samoan, Uighur, Wa, European, African American]
![Page 21: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/21.jpg)
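Phylogeny of Human Populations
[Figure: population tree based on Genetic Distance ($F_{ST}$); the value 0.01 is likely the scale bar]
Tip labels, paired with the legend in the order listed: HMJ = Hmong, CCY = Zhuang, HAN = Han, WBM = Wa, UIG = Uyghur, EUR = European, AA = African
Branch lengths shown in the figure (their assignment to branches is not recoverable from the transcript): 0.0684, 0.0372, 0.0093, 0.0133, 0.0093, 0.0202, 0.0039, 0.0103, 0.0341, 0.0016, 0.0023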
![Page 22: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/22.jpg)
Measurement of LD Sharing
• SNPs present in both Pop A & Pop B
• SNPs with MAF ≥ 0.1 were included
• In LD, if $r^2 \geq c$ ($c$ = 0.1, 0.5, 0.8)
[Figure: Pop A and Pop B, with a 200 kb window around the target SNP]
a = # LD in A
c = # LD in A & B (a count, distinct from the $r^2$ threshold $c$ above)
b = # LD in B
$S_{AB} = c/a$
$S_{BA} = c/b$
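The counts a, b, c and the ratios $S_{AB}$, $S_{BA}$ can be sketched for a single target SNP as follows. This is an illustration, not the authors' code: the SNP identifiers and r² values are hypothetical, and the MAF filter and 200 kb window are assumed to have been applied before the dictionaries are built.

```python
def ld_sharing(r2_a, r2_b, threshold=0.5):
    """Return (S_AB, S_BA) for one target SNP.

    r2_a, r2_b: dicts mapping SNP id -> r^2 with the target SNP in Pop A / Pop B.
    Only SNPs present in both populations are considered.
    """
    shared_snps = r2_a.keys() & r2_b.keys()
    in_ld_a = {s for s in shared_snps if r2_a[s] >= threshold}
    in_ld_b = {s for s in shared_snps if r2_b[s] >= threshold}
    a = len(in_ld_a)            # number of SNPs in LD with the target in A
    b = len(in_ld_b)            # number of SNPs in LD with the target in B
    c = len(in_ld_a & in_ld_b)  # number in LD with the target in both
    s_ab = c / a if a else float("nan")
    s_ba = c / b if b else float("nan")
    return s_ab, s_ba


# Hypothetical r^2 values around one target SNP.
pop_a = {"snp1": 0.9, "snp2": 0.6, "snp3": 0.2, "snp4": 0.7}
pop_b = {"snp1": 0.8, "snp2": 0.3, "snp3": 0.1, "snp4": 0.55, "snp5": 0.9}
print(ld_sharing(pop_a, pop_b, threshold=0.5))
```

Whether the talk averaged per-target ratios or pooled the counts across target SNPs is not stated; the function above handles one target at a time and leaves aggregation to the caller.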
![Page 23: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/23.jpg)
[Figure: LD sharing S (y-axis, 0.5–0.9) versus Fst (x-axis, 0.00–0.12), with series for r² > 0.1 and r² > 0.5]
$S_{AB}$ ~ $F_{ST}$
$F_{ST}$ increases with time after divergence (t)
In non-Africans
![Page 24: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/24.jpg)
[Figure: Pop A and Pop B, with a 200 kb window around the target SNP]
a = # LD in A
c = # LD in A & B
b = # LD in B
$S_{AB} = c/a$
$S_{BA} = c/b$
Correlation of LD between Populations = corr(a,b)
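One plausible reading of corr(a,b) is a Pearson correlation, taken across target SNPs, between the per-target counts a and b. The sketch below follows that reading, which is an assumption; the counts are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical counts of SNPs in LD with each of four target SNPs.
a_counts = [3, 5, 2, 8]  # in Pop A
b_counts = [2, 6, 1, 7]  # in Pop B
print(f"corr(a, b) = {pearson(a_counts, b_counts):.3f}")
```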
![Page 25: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/25.jpg)
Correlation of LD Between Populations and Genetic Distance ($F_{ST}$)
[Figure: correlation of LD between populations (y-axis, 0.5–1) versus $F_{ST}$ (x-axis, 0–0.15)]
![Page 26: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/26.jpg)
Portability of tagging SNPs ($R_{AB}$)
$R_{AB} = \dfrac{\text{Number of SNPs captured by tagSNPs}}{\text{Total number of SNPs}}$
[Figure: Pop A and Pop B]
Portability from A to B = $R_{AB}$
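A minimal sketch of the portability ratio: pick tagSNPs in Pop A, then count how many SNPs they capture in Pop B. The greedy selection rule, the r² ≥ 0.8 capture threshold, and the r² matrices are assumptions for illustration; the talk does not specify its tagging algorithm.

```python
def select_tag_snps(r2, threshold=0.8):
    """Greedy tagging: repeatedly pick the SNP that captures the most uncaptured SNPs.

    r2 is a symmetric matrix (list of lists) with 1.0 on the diagonal.
    """
    uncaptured = set(range(len(r2)))
    tags = []
    while uncaptured:
        best = max(
            sorted(uncaptured),
            key=lambda i: sum(1 for j in uncaptured if r2[i][j] >= threshold),
        )
        tags.append(best)
        uncaptured -= {j for j in uncaptured if r2[best][j] >= threshold}
    return tags


def portability(tags, r2_b, threshold=0.8):
    """R_AB: fraction of SNPs in Pop B captured by tags chosen in Pop A."""
    n = len(r2_b)
    captured = sum(1 for j in range(n) if any(r2_b[t][j] >= threshold for t in tags))
    return captured / n


# Hypothetical pairwise r^2 matrices for the same four SNPs in two populations.
r2_pop_a = [
    [1.0, 0.9, 0.85, 0.1],
    [0.9, 1.0, 0.9, 0.1],
    [0.85, 0.9, 1.0, 0.2],
    [0.1, 0.1, 0.2, 1.0],
]
r2_pop_b = [
    [1.0, 0.9, 0.4, 0.1],
    [0.9, 1.0, 0.5, 0.1],
    [0.4, 0.5, 1.0, 0.3],
    [0.1, 0.1, 0.3, 1.0],
]
tags_a = select_tag_snps(r2_pop_a)
print("tagSNPs chosen in A:", tags_a)
print("R_AB:", portability(tags_a, r2_pop_b))
```

In this sketch a tagSNP counts as capturing itself; whether the talk's ratio includes the tagSNPs in the numerator is not stated.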
![Page 27: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/27.jpg)
[Figure: Rab (y-axis, 0.05–0.25; a separate "1-" label also appears in the figure) versus Fst (x-axis, 0.00–0.12), with series for r² > 0.1 and r² > 0.5]
$R_{AB}$ ~ $F_{ST}$
• R can be estimated using $F_{ST}$
• $F_{ST}$ can be estimated using a small number of SNPs
• Conclusion: R can be approximately estimated by typing a small number of SNPs
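As an illustration of estimating $F_{ST}$ from a handful of SNPs, the sketch below uses Hudson's estimator in its ratio-of-averages form. The talk does not say which estimator was used, so this choice is an assumption, as are the allele frequencies and sample sizes.

```python
def hudson_fst(freqs_1, freqs_2, n1, n2):
    """Hudson's F_ST (ratio of averages) from per-SNP allele frequencies.

    freqs_1, freqs_2: sample allele frequencies in populations 1 and 2.
    n1, n2: number of sampled chromosomes in each population.
    """
    numerator = denominator = 0.0
    for p1, p2 in zip(freqs_1, freqs_2):
        numerator += (
            (p1 - p2) ** 2
            - p1 * (1 - p1) / (n1 - 1)
            - p2 * (1 - p2) / (n2 - 1)
        )
        denominator += p1 * (1 - p2) + p2 * (1 - p1)
    return numerator / denominator


# Hypothetical allele frequencies at five SNPs, 100 chromosomes per population.
pop1 = [0.10, 0.35, 0.50, 0.72, 0.90]
pop2 = [0.20, 0.30, 0.65, 0.60, 0.85]
print(f"F_ST = {hudson_fst(pop1, pop2, 100, 100):.4f}")
```

The estimate from so few SNPs is noisy; the sample-size correction terms can even push it slightly below zero when the two populations have identical frequencies.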
![Page 28: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/28.jpg)
[Figure: diagram relating t, $R_{AB}$, and $F_{ST}$]
![Page 29: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/29.jpg)
Conclusions
• Substantial LD sharing between populations: ancestral LD
• tagSNPs are generally portable between populations, at least within Asia
• Portability of tagSNPs from one population to another can be estimated empirically using a small set of SNPs
![Page 30: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/30.jpg)
Challenges
• Adjustment for multiple testing and power
• Portability of tagging SNPs between populations
• Population stratification
• Mapping the mutation
• Exploring gene-gene interaction
![Page 31: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/31.jpg)
Population Stratification
• 209 languages belonging to 6 linguistic families
• Consistent observation of south-north differentiation
• Affects the power of association studies and can produce false positives
• Different loci show different levels of differentiation: is there an adequate adjustment?
![Page 32: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/32.jpg)
Individual tree
Chromosome 21
20,288 SNPs
![Page 33: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/33.jpg)
Cluster Decomposition of Chinese Populations
![Page 34: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/34.jpg)
Geographic Genetic Clines Based on Principal Components
• Y chromosomes: 143 populations
• mtDNA: 91 populations
• CODIS STRs: 79 populations
• HLA-A: 107 populations
![Page 35: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/35.jpg)
Distributions of mtDNA Haplogroups
![Page 36: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/36.jpg)
Distributions of Y Haplogroups
![Page 37: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/37.jpg)
All haplogroups
All haplogroups
Major haplogroups
![Page 38: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/38.jpg)
Uyghurs
![Page 39: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/39.jpg)
Uyghurs
![Page 40: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/40.jpg)
Population Stratification
• Different loci show different levels of differentiation
• Admixture does exist in at least some of the populations
• Adjustment for population stratification using average differentiation is not adequate
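To see why a single average correction can fall short, consider genomic control, a widely used adjustment that rescales every test statistic by one genome-wide inflation factor. The talk does not name the method it has in mind, so using genomic control here is an assumption; the χ² values are hypothetical.

```python
import statistics

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 degree of freedom


def genomic_control(chi2_stats):
    """Return (lambda, adjusted statistics) using one inflation factor for all loci."""
    lam = max(1.0, statistics.median(chi2_stats) / CHI2_1DF_MEDIAN)
    return lam, [x / lam for x in chi2_stats]


# Hypothetical 1-df chi-square statistics from null SNPs plus one strong signal.
stats = [0.2, 0.5, 0.9, 1.1, 1.6, 2.4, 25.0]
lam, adjusted = genomic_control(stats)
print(f"lambda = {lam:.3f}")
print([round(x, 2) for x in adjusted])
```

Every locus is divided by the same factor, so loci that are more differentiated than average remain under-corrected and less differentiated loci are over-corrected, which is the concern raised on the slide.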
![Page 41: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/41.jpg)
Challenges
• Adjustment for multiple testing and power
• Portability of tagging SNPs between populations
• Population stratification
• Mapping the mutation
• Exploring gene-gene interaction
![Page 42: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/42.jpg)
Perfect Phylogeny Approach
• No recombination and no recurrent mutation
• No loop in the network
• Not necessarily continuous
• Objective: Group SNPs into PP sets
[Figure: PP(A), PP(B), PP(C)]
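Grouping SNPs into perfect-phylogeny (PP) sets can be sketched with the four-gamete test: two biallelic sites fit a tree without recombination or recurrent mutation unless all four allele combinations are observed. The greedy grouping rule and the haplotype data below are assumptions for illustration, not the authors' algorithm, and phased haplotypes are assumed.

```python
def compatible(site_x, site_y):
    """Four-gamete test: True unless all four allele pairs (00, 01, 10, 11) occur."""
    return len(set(zip(site_x, site_y))) < 4


def group_into_pp_sets(sites):
    """Greedily place each site in the first group whose members are all compatible with it.

    sites: list of tuples, one per SNP, giving the 0/1 allele on each haplotype.
    Returns a list of groups of site indices; groups need not be contiguous.
    """
    groups = []
    for idx, site in enumerate(sites):
        for group in groups:
            if all(compatible(site, sites[other]) for other in group):
                group.append(idx)
                break
        else:
            groups.append([idx])
    return groups


# Hypothetical phased data: five haplotypes (columns) at five SNPs (rows).
snps = [
    (0, 0, 0, 1, 1),
    (0, 1, 1, 0, 0),
    (0, 0, 0, 1, 0),
    (0, 1, 0, 0, 0),
    (1, 0, 1, 1, 0),  # shows all four gametes with the first SNP
]
print(group_into_pp_sets(snps))
```

Pairwise compatibility of binary sites is enough for the whole group to fit one tree, which is why the check only compares sites two at a time.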
![Page 43: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/43.jpg)
Inference of Phylogeny
| Site | Partition |
|---|---|
| site 1 | (1, 2, 3) and (4, 5) |
| site 2 | (2, 3) and (1, 4, 5) |
| site 3 | (1, 2, 3, 5) and (4) |
| site 4 | (2) and (1, 3, 4, 5) |

[Figure: tree with nodes labeled 1–5]
![Page 44: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/44.jpg)
Comparison of Different Algorithms
| Sample size | HaploTree accuracy | HaploTree run time | PHASE 2.0.2 accuracy | PHASE 2.0.2 run time | PPH accuracy | PPH run time |
|---|---|---|---|---|---|---|
| 25 | 94.81% | 0.36 s | 94.55% | 12.23 s | 92.50% | 0.14 s |
| 50 | 97.44% | 0.58 s | 97.37% | 14.37 s | 96.48% | 0.23 s |
| 100 | 98.78% | 0.82 s | 98.74% | 18.42 s | 98.07% | 0.62 s |
![Page 45: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/45.jpg)
![Page 46: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/46.jpg)
Identification of Disease Mutation
• For each PP, a stepwise search localizes the most likely branch (edge) of the mutation.
• The best PP can be determined based on the likelihood (adjusted for degrees of freedom).
[Figure: PP(A), PP(B), PP(C)]
![Page 47: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/47.jpg)
Challenges
• Adjustment for multiple testing and power
• Portability of tagging SNPs between populations
• Population stratification
• Mapping the mutation
• Exploring gene-gene interaction
![Page 48: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/48.jpg)
A Study of CAD
• Coronary Atherosclerosis in Chinese Populations
• 123 candidate genes belonging to several pathways, including antioxidant, inflammation, and coagulation
• 1,518 tagSNPs typed
• 916 samples (492 cases and 424 controls)
![Page 49: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/49.jpg)
![Page 50: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/50.jpg)
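[Figure: gene-gene interaction network covering the Anti-oxidation Pathway and the Inflammatory Pathway; the legend distinguishes within-pathway (PW) and between-pathway interactions]
Genes shown: CD36, MMP8, PDGFC, DSCR1, ITGB1, ITGA2, PDGFB, SELL, CCR2, ITGA6, LAMA4, EDN1, SELE, TGFB3, VEGF, MSR1, NFKB1, MMP9, IL1B, ACE, PON2, PON3, PON1, GPX3, SOD2, TXN, HMOX1, GSR, GCLM, NOS3, GSS, NPR3 (TXN and MMP9 each appear twice in the figure)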
![Page 51: Exploring complex diseases using genome-wide association: challenges and strategies](https://reader035.fdocuments.in/reader035/viewer/2022081513/56814d81550346895dbae1a7/html5/thumbnails/51.jpg)
Credits
• University of Texas – Houston
– Momiao Xiong, Jinying Zhao
• Chinese Human Genome Center at Shanghai
– Wei Huang, Haifeng Wang, Ying Wang, Zhu Chen, Guoping Zhao
• Fudan University & CAS-MPG Institute of Computational Biology
– Shuhua Xu, Fuzhong Xue, Yungang He, Yi Wang, Ming Lu, Ji Qian, Bo Wen, Hui Li, Wenqing Fu, Li Jin