Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs

18
RECOMB Satellite Workshop , 2007 Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs Yufeng Wu UC Davis

description

Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs. Yufeng Wu UC Davis. Association (or LD) Mapping. - PowerPoint PPT Presentation

Transcript of Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs

Page 1: Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs

RECOMB Satellite Workshop, 2007

Algorithms for Association Mapping of Complex Diseases With

Ancestral Recombination Graphs

Yufeng WuUC Davis

Page 2: Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs

2

Association (or LD) Mapping

• Given a subset of SNPs from unrelated individuals, find unobserved genetic variations that strongly discriminate individuals with the trait (cases) and those without the trait (controls)

• Complex Diseases: difficult to map

Page 3: Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs

3

Illustration (Zollner and Pritchard, Genetics, 2005)

Cases

ControlsSNP markers

1: 0011012: 1100003: 0011104: 0010005: 0000106: 1111017: 1000118: 1100019: 11001010: 10001111: 01000012: 101101

Page 4: Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs

4

Some Challenges in Association Mapping

1 2

Page 5: Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs

5

The Genealogy Approach• “..the best information that we could

possibly get about association is to know the full coalescent genealogy…” – Zollner and Pritchard

• Goal: infer genealogy from marker data with recombination– Approximation (e.g. in Zollner and Pritchard)

Page 6: Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs

6

Ancestral Recombination Graph (ARG)

10 01 00

S1 = 00S2 = 01S3 = 10S4 = 10

MutationsS1 = 00S2 = 01S3 = 10S4 = 11

10 01 0011

Recombination

Assumption:

at most one mutation per site

1 0 0 1

1 1

Page 7: Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs

7

Full-ARG Approaches• First full ARG mapping method (Minichiello and

Durbin)– Use full plausible ARG, but heuristic– Less complex disease model

• Our results (Wu, 2007)– Sampling full ARGs with provable property, and work on

more complex disease model– Focus on parsimonious history

• minARGs: ARGs that use the minimum number of recombinations

• Near minimum ARGs– Uniform sampling of minARGs

Page 8: Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs

8

Special Case: ARG with Only Input Sequences

• Self-derivability (SD) Problem: construct an ARG with only the input sequences

• In fact, such ARG, if exits, must be a minARG

• Runs in O(2n) time• Heuristics to extend to non-self-

derivable data

Page 9: Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs

9

00000

01000

01100

01101

11000

00010

11011

00011 1 2

00000

01000

01100

01101

11000

00010

00011

11011

N1=16400000

01000

01100

11000

00010

11011

00011

01101

N2=76N = 164*1 + 76*2

= 316

Counting Self-derived ARGs

Page 10: Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs

00000

01000

01100

01101

11000

00010

11011

00011 1 2

00000

01000

01100

01101

11000

00010

00011

11011

16400000

01000

01100

11000

00010

11011

00011

01101

76

1. Random value Rnd = 0.3 < 0.52

316

Select 11011 with prob = 164/316 = 0.52, and 01101 with prob = 76*2/316 = 0.48

2. Pick seq = 11011 as last row to derive3. Move to reduced matrix

Page 11: Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs

11

ARGs Represents a Set of Marginal Trees

• Clear separation of cases/controls: NOT expected for complex diseases!

Page 12: Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs

12

Disease Model (Zollner & Pritchard)

Disease mutations: Poisson Process

Two alleles: wild-type and mutant

0.05

0.05

0.05 0.05

0.1

0.1

0.050.05

Page 13: Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs

13

Disease Penetrance (Zollner & Pritchard)

PA,1: probability of a mutant sequence becomes a casePC,1 = 1.0 - PA,1

PA,0: probability of a wild-type sequence becomes a casePC,0 = 1.0 - PA,0

0.05

0.05

0.05 0.05

0.1

0.1

0.050.05

CaseControl

Page 14: Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs

14

Phenotype Likelihood (Zollner and Pritchard)

• Given a tree Tx at position x and case/control phenotype of its leaves, what is the probability Pr( | Tx) of observing on Tx? (Zollner & Pritchard)– Sum over all subset of mutated edges

• Adopted in this work

Page 15: Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs

15

Expected Phenotype Likelihood

• Need for assessing statistical significance.• Null model: randomly permute case/control

labels.• Our result: O(n3) algorithm for computing

expected value of phenotype likelihood.– Exact, fully deterministic method.

Page 16: Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs

16

Diploid Penetrance

Diploid: two sequences per individualDiploid enetrance:PA,00: prob. Individual with two wild-type sequences becomes a casePA,01 : …, PA,11: …

CaseControl

Efficient computation of phenotype likelihood: stated but unresolved in Zollner and PritchardOur result (Wu, 2007): computing phenotype likelihood with diploid penetrance is NP-hard

Page 17: Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs

17

Simulation Results

Comparison: TMARG (uniform), TMARG (pathway), LATAG, MARGARITA

50 ARGs per data

0.1

0.12

0.14

0.16

0.18

0.2

0.22

0.24

Uniform Pathway LATAG MARGARITA

50/5000 ARGs per data

0.1

0.12

0.14

0.16

0.18

0.2

0.22

0.24

n50 n5000 LATAG MARGRITA

Page 18: Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs

18

Acknowledgement

• Software available at: http://wwwcsif.cs.ucdavis.edu/~wuyu

• I want to thank– Dan Gusfield– Dan Brown– Chuck Langley– Yun S. Song