Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs
description
Transcript of Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs
RECOMB Satellite Workshop, 2007
Algorithms for Association Mapping of Complex Diseases With
Ancestral Recombination Graphs
Yufeng WuUC Davis
2
Association (or LD) Mapping
• Given a subset of SNPs from unrelated individuals, find unobserved genetic variations that strongly discriminate individuals with the trait (cases) and those without the trait (controls)
• Complex Diseases: difficult to map
3
Illustration (Zollner and Pritchard, Genetics, 2005)
Cases
ControlsSNP markers
1: 0011012: 1100003: 0011104: 0010005: 0000106: 1111017: 1000118: 1100019: 11001010: 10001111: 01000012: 101101
4
Some Challenges in Association Mapping
1 2
5
The Genealogy Approach• “..the best information that we could
possibly get about association is to know the full coalescent genealogy…” – Zollner and Pritchard
• Goal: infer genealogy from marker data with recombination– Approximation (e.g. in Zollner and Pritchard)
6
Ancestral Recombination Graph (ARG)
10 01 00
S1 = 00S2 = 01S3 = 10S4 = 10
MutationsS1 = 00S2 = 01S3 = 10S4 = 11
10 01 0011
Recombination
Assumption:
at most one mutation per site
1 0 0 1
1 1
7
Full-ARG Approaches• First full ARG mapping method (Minichiello and
Durbin)– Use full plausible ARG, but heuristic– Less complex disease model
• Our results (Wu, 2007)– Sampling full ARGs with provable property, and work on
more complex disease model– Focus on parsimonious history
• minARGs: ARGs that use the minimum number of recombinations
• Near minimum ARGs– Uniform sampling of minARGs
8
Special Case: ARG with Only Input Sequences
• Self-derivability (SD) Problem: construct an ARG with only the input sequences
• In fact, such ARG, if exits, must be a minARG
• Runs in O(2n) time• Heuristics to extend to non-self-
derivable data
9
00000
01000
01100
01101
11000
00010
11011
00011 1 2
00000
01000
01100
01101
11000
00010
00011
11011
N1=16400000
01000
01100
11000
00010
11011
00011
01101
N2=76N = 164*1 + 76*2
= 316
Counting Self-derived ARGs
00000
01000
01100
01101
11000
00010
11011
00011 1 2
00000
01000
01100
01101
11000
00010
00011
11011
16400000
01000
01100
11000
00010
11011
00011
01101
76
1. Random value Rnd = 0.3 < 0.52
316
Select 11011 with prob = 164/316 = 0.52, and 01101 with prob = 76*2/316 = 0.48
2. Pick seq = 11011 as last row to derive3. Move to reduced matrix
11
ARGs Represents a Set of Marginal Trees
• Clear separation of cases/controls: NOT expected for complex diseases!
12
Disease Model (Zollner & Pritchard)
Disease mutations: Poisson Process
Two alleles: wild-type and mutant
0.05
0.05
0.05 0.05
0.1
0.1
0.050.05
13
Disease Penetrance (Zollner & Pritchard)
PA,1: probability of a mutant sequence becomes a casePC,1 = 1.0 - PA,1
PA,0: probability of a wild-type sequence becomes a casePC,0 = 1.0 - PA,0
0.05
0.05
0.05 0.05
0.1
0.1
0.050.05
CaseControl
14
Phenotype Likelihood (Zollner and Pritchard)
• Given a tree Tx at position x and case/control phenotype of its leaves, what is the probability Pr( | Tx) of observing on Tx? (Zollner & Pritchard)– Sum over all subset of mutated edges
• Adopted in this work
15
Expected Phenotype Likelihood
• Need for assessing statistical significance.• Null model: randomly permute case/control
labels.• Our result: O(n3) algorithm for computing
expected value of phenotype likelihood.– Exact, fully deterministic method.
16
Diploid Penetrance
Diploid: two sequences per individualDiploid enetrance:PA,00: prob. Individual with two wild-type sequences becomes a casePA,01 : …, PA,11: …
CaseControl
Efficient computation of phenotype likelihood: stated but unresolved in Zollner and PritchardOur result (Wu, 2007): computing phenotype likelihood with diploid penetrance is NP-hard
17
Simulation Results
Comparison: TMARG (uniform), TMARG (pathway), LATAG, MARGARITA
50 ARGs per data
0.1
0.12
0.14
0.16
0.18
0.2
0.22
0.24
Uniform Pathway LATAG MARGARITA
50/5000 ARGs per data
0.1
0.12
0.14
0.16
0.18
0.2
0.22
0.24
n50 n5000 LATAG MARGRITA
18
Acknowledgement
• Software available at: http://wwwcsif.cs.ucdavis.edu/~wuyu
• I want to thank– Dan Gusfield– Dan Brown– Chuck Langley– Yun S. Song