Epidemiologisk Fredagsmøde (Epidemiological Friday Meeting), 15 February 2008


Page 1: Epidemiologisk FredagsmøDe 15 2 2008

Association Mapping Through Local Genealogies

Bioinformatics Research Center

http://www.birc.au.dk/

Thomas Mailund

Page 2: Epidemiologisk FredagsmøDe 15 2 2008

“Genetic” Diseases

Gunshot wounds
Car accidents
Smoking induced lung cancer
Cardiovascular disease
Obesity
Diabetes 2
Alzheimer
Schizophrenia
BRCA1 breast cancer
Cystic fibrosis
Haemophilia

Page 3: Epidemiologisk FredagsmøDe 15 2 2008

Disease Mapping...

--A--------C--------A----G---X----T---C---A----
--T--------G--------A----G---X----C---C---A----
--A--------G--------G----G---X----C---C---A----
--A--------C--------A----G---X----T---C---A----
--T--------C--------A----G---X----T---C---A----
--T--------C--------A----T---X----T---A---A----

--A--------C--------A----G---X----T---C---A----
--A--------C--------A----G---X----T---C---A----
--A--------C--------A----G---X----T---C---G----
--T--------C--------A----T---X----T---C---A----
--A--------C--------A----G---X----T---C---A----
--A--------C--------G----T---X----C---A---A----
--A--------C--------A----G---X----C---C---G----

Locate disease-affecting polymorphism

Cases (affected)

Controls (unaffected)

Page 4: Epidemiologisk FredagsmøDe 15 2 2008

Unrealistic Assumptions

-C- -A---G--

-A----G---T-- -C-

-A---C--T--G--

-A----C---A--

We only measure “unphased” data

Page 5: Epidemiologisk FredagsmøDe 15 2 2008

Unrealistic Assumptions

-C- -A---G--

-A----G---T-- -C-

-A---C--T--G--

-A----C---A--

--T--------G--------A----G--------C---C---A----

--A--------C--------A----G--------T---C---A----

We only measure “unphased” data

We first need to infer the phase

Page 6: Epidemiologisk FredagsmøDe 15 2 2008

Unrealistic Assumptions

-C- -A---G--

-A----G---T-- -C-

-A---C--T--G--

-A----C---A--

--T--------G--------A----G--------C---C---A----

--A--------C--------A----G--------T---C---A----

--T--------G--------A----G--------T---C---A----

--A--------C--------A----G--------C---C---A----

We only measure “unphased” data

We first need to infer the phase

Page 7: Epidemiologisk FredagsmøDe 15 2 2008

Unrealistic Assumptions

-C- -A---G--

-A----G---T-- -C-

-A---C--T--G--

-A----C---A--

--T--------G--------A----G--------C---C---A----

--A--------C--------A----G--------T---C---A----

--T--------G--------A----G--------T---C---A----

--A--------C--------A----G--------C---C---A----

--T--------C--------A----G--------T---C---A----

--A--------G--------A----G--------C---C---A----

We only measure “unphased” data

We first need to infer the phase

Page 8: Epidemiologisk FredagsmøDe 15 2 2008

Unrealistic Assumptions

-C- -A---G--

-A----G---T-- -C-

-A---C--T--G--

-A----C---A--

--T--------G--------A----G--------C---C---A----

--A--------C--------A----G--------T---C---A----

--A--------G--------A----G--------C---C---A----

--T--------C--------A----G--------T---C---A----

--T--------C--------A----G--------T---C---A----

--A--------G--------A----G--------C---C---A----?

We only measure “unphased” data

We first need to infer the phase
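
The ambiguity on these slides can be made concrete with a minimal sketch. It assumes a genotype is given as one unordered allele pair per marker; the function name `enumerate_phasings` and the example genotype (chosen to resemble the slide, with three heterozygous markers) are illustrative and not taken from the talk. With k heterozygous markers there are 2^(k-1) distinct haplotype pairs, and the data alone cannot tell them apart.

```python
from itertools import product


def enumerate_phasings(genotype):
    """Return all distinct haplotype pairs consistent with an unphased genotype.

    `genotype` is a list of 2-tuples, one unordered allele pair per marker.
    """
    het = [i for i, (a, b) in enumerate(genotype) if a != b]
    phasings = []
    # Fix the orientation of the first heterozygous marker so that each
    # unordered pair of haplotypes is produced exactly once.
    n_free = max(len(het) - 1, 0)
    for flips in product([False, True], repeat=n_free):
        flip_at = dict(zip(het[1:], flips))
        hap1, hap2 = [], []
        for i, (a, b) in enumerate(genotype):
            if flip_at.get(i, False):
                a, b = b, a
            hap1.append(a)
            hap2.append(b)
        phasings.append(("".join(hap1), "".join(hap2)))
    return phasings


# Hypothetical genotype: markers 1, 2 and 5 are heterozygous.
genotype = [("T", "A"), ("G", "C"), ("A", "A"), ("G", "G"),
            ("C", "T"), ("C", "C"), ("A", "A")]
phasings = enumerate_phasings(genotype)
for h1, h2 in phasings:
    print(h1, h2)
```

Phase inference methods choose among these candidates using population-level information; the enumeration only shows why the choice is needed.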

Page 9: Epidemiologisk FredagsmøDe 15 2 2008

Disease Mapping...

[Figure: the case and control haplotypes shown on page 3]

Cases (affected)

Controls (unaffected)

Markers are locally correlated

Page 10: Epidemiologisk FredagsmøDe 15 2 2008

Disease Mapping...

[Figure: the case and control haplotypes shown on page 3]

Cases (affected)

Controls (unaffected)

Search for indirect signals

Page 11: Epidemiologisk FredagsmøDe 15 2 2008

Marker Relatedness

Linkage disequilibrium (LD)

[Plots: LD (r²) against recombination rate. Empirical results: Clark et al. 2003, AJHG 73:285-300. Theoretical results: Hein et al. 2005]
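
As a minimal sketch of the LD measure plotted here, r² between two biallelic markers can be computed directly from phased haplotypes. The function name `r_squared` and the tiny example data are illustrative assumptions, not part of the talk.

```python
def r_squared(haplotypes, i, j):
    """r^2 between biallelic markers i and j over a list of haplotype strings."""
    n = len(haplotypes)
    a = haplotypes[0][i]  # arbitrary reference allele at marker i
    b = haplotypes[0][j]  # arbitrary reference allele at marker j
    p_a = sum(h[i] == a for h in haplotypes) / n
    p_b = sum(h[j] == b for h in haplotypes) / n
    p_ab = sum(h[i] == a and h[j] == b for h in haplotypes) / n
    d = p_ab - p_a * p_b  # the disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")  # undefined when a marker is monomorphic
    return d * d / denom


linked = ["AC", "AC", "TG", "TG"]       # alleles always travel together
unlinked = ["AC", "AG", "TC", "TG"]     # all four combinations equally common
print(r_squared(linked, 0, 1), r_squared(unlinked, 0, 1))
```

The value does not depend on which allele is picked as the reference, since swapping alleles only changes the sign of D.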

Page 12: Epidemiologisk FredagsmøDe 15 2 2008

Indirect Association

[Figure: the case and control haplotypes shown on page 3]

Cases (affected)

Controls (unaffected)

“Tag” markers Unobserved marker

Page 17: Epidemiologisk FredagsmøDe 15 2 2008

Indirect Multi-Marker Association

[Figure: the case and control haplotypes shown on page 3]

Cases (affected)

Controls (unaffected)

Page 18: Epidemiologisk FredagsmøDe 15 2 2008

The Ancestral Recombination Graph

Hudson 1990, Griffiths and Marjoram 1996

Page 19: Epidemiologisk FredagsmøDe 15 2 2008

The Coalescent Process

Page 53: Epidemiologisk FredagsmøDe 15 2 2008

A Reasonable Local Model

Copyright © 2007 by the Genetics Society of America
DOI: 10.1534/genetics.107.071126

On Recombination-Induced Multiple and Simultaneous Coalescent Events

Joanna L. Davies,1 Frantisek Simančík, Rune Lyngsø, Thomas Mailund and Jotun Hein

Department of Statistics, University of Oxford, Oxford, OX1 3TG, United Kingdom

Manuscript received January 18, 2007
Accepted for publication October 2, 2007

ABSTRACT

Coalescent theory deals with the dynamics of how sampled genetic material has spread through a population from a single ancestor over many generations and is ubiquitous in contemporary molecular population genetics. Inherent in most applications is a continuous-time approximation that is derived under the assumption that sample size is small relative to the actual population size. In effect, this precludes multiple and simultaneous coalescent events that take place in the history of large samples. If sequences do not recombine, the number of sequences ancestral to a large sample is reduced sufficiently after relatively few generations such that use of the continuous-time approximation is justified. However, in tracing the history of large chromosomal segments, a large recombination rate per generation will consistently maintain a large number of ancestors. This can create a major disparity between discrete-time and continuous-time models and we analyze its importance, illustrated with model parameters typical of the human genome. The presence of gene conversion exacerbates the disparity and could seriously undermine applications of coalescent theory to complete genomes. However, we show that multiple and simultaneous coalescent events influence global quantities, such as total number of ancestors, but have negligible effect on local quantities, such as linkage disequilibrium. Reassuringly, most applications of the coalescent model with recombination (including association mapping) focus on local quantities.

Kingman (1982) models the ancestry of a sample of sequences with a continuous-time Markov process referred to as the Kingman coalescent. Lineages collide or coalesce after random exponential waiting times with rate dependent upon the population and sample size. This means that the probability of multiple (i.e., three or more sequences coalescing into a common ancestor in a single coalescent event) and simultaneous (i.e., two or more coalescent events happening at exactly the same time) coalescent events is zero. The derivation of the process can be obtained by scaling the discrete-time Wright–Fisher model and taking the limit as the population size tends to infinity. This model is extended by Hudson (1983) to incorporate recombination. The derivation of Hudson’s continuous-time approximation to the Wright–Fisher model with recombination is discussed later in more detail but is valid provided only that the set of ancestors to the sample of extant sequences remains small relative to the effective population size. In such situations it is justified to assume that multiple and simultaneous coalescent events do not occur in the evolutionary history of the sample and that ancestral sequences can recombine only with nonancestral sequences and never with each other. As the sample size increases relative to the population size, the probability of such events occurring becomes nonnegligible and consequently in these instances the rate of coalescence is underestimated by Hudson’s continuous-time model. Hudson’s model is widely used in population genetics to describe ancestries of sequences that can recombine. Consequently it is of interest to question to what extent the rate of coalescence is underestimated and how this influences other features of the coalescent.

Fu (2006) shows the Kingman coalescent (Kingman 1982) provides a good approximation to the discrete-time Wright–Fisher model in most cases, even when the sample size is not small relative to the population size. This study is performed in the absence of recombination and any large sample will quickly coalesce to a small sample such that the assumption soon becomes valid and the corresponding results are accurate. In the presence of recombination this is not the case; the process tracking the number of sequences ancestral to the extant sample can be shown to reach an equilibrium distribution in which the number of sequences remains large for a significant amount of time.

Pitman (1999), Sagitov (1999), Schweinsberg (2000), and Sagitov (2003) derive continuous-time exact coalescent processes allowing for coalescents with multiple collisions, simultaneous multiple collisions, and simultaneous and multiple collisions, respectively, although none of these processes incorporate recombination. Wiuf and Hein (1997) derive analytical results

1 Corresponding author: Department of Statistics, University of Oxford, 1 S. Parks Rd., Oxford, OX1 3TG, United Kingdom. E-mail: [email protected]

Genetics 177: 2151–2160 (December 2007)
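
The exponential waiting times described in the excerpt can be sketched in a few lines. This is an illustration of the standard Kingman coalescent without recombination, under the assumption that time is measured in coalescent units so that k lineages coalesce at rate k(k-1)/2; the function name and the sample size are made up for the example.

```python
import random


def simulate_coalescent_times(n, rng):
    """Waiting times between successive coalescent events for n lineages.

    With k lineages left, the time to the next pairwise coalescence is
    exponential with rate k*(k-1)/2 (coalescent time units).
    """
    times = []
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2
        times.append(rng.expovariate(rate))
    return times


rng = random.Random(1)
n = 10
reps = 20000
# Average time to the most recent common ancestor over many replicates.
mean_tmrca = sum(sum(simulate_coalescent_times(n, rng)) for _ in range(reps)) / reps
expected = 2 * (1 - 1 / n)  # analytical expectation of the tree height
print(mean_tmrca, expected)
```

Only one pair coalesces at a time in this process, which is exactly the "no multiple, no simultaneous events" property the paper examines for large samples.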

Page 54: Epidemiologisk FredagsmøDe 15 2 2008

A Reasonable Local Model

• The “back in time” approach (in general) means we ignore selection

• Implicit assumption that the disease is selectively neutral

• Which may or may not be reasonable...

• Might be okay for late onset diseases...

Page 55: Epidemiologisk FredagsmøDe 15 2 2008

The ARG as a Statistical Model

[The symbols in this equation were images on the slides, built up step by step over pages 55-62. Assuming G denotes the ARG, D the data and \theta the model parameters, the final form reads:]

\mathrm{lhd}(\theta) = P(D \mid \theta) = \int P(D \mid G, \theta)\, P(G \mid \theta)\, dG

Integration by magic (that is, by statistical sampling)
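
To illustrate what integrating over genealogies by sampling means, here is a toy sketch rather than anything from the talk. It assumes the simplest possible genealogy: two sequences whose coalescence time T is exponential with rate 1, with the number of differences S between them Poisson with mean theta*T. The likelihood is then the average of P(S | T, theta) over sampled T, and for this toy case the classical closed form theta^S / (1 + theta)^(S + 1) is available as a check. All function names are illustrative.

```python
import math
import random


def poisson_pmf(k, mean):
    """Probability of k events under a Poisson distribution with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def mc_likelihood(s, theta, n_samples, rng):
    """Monte Carlo estimate of P(S = s | theta), averaging over sampled genealogies."""
    total = 0.0
    for _ in range(n_samples):
        t = rng.expovariate(1.0)            # sample a genealogy (here: one height)
        total += poisson_pmf(s, theta * t)  # probability of the data given it
    return total / n_samples


def exact_likelihood(s, theta):
    """Closed form for the two-sequence toy model (geometric distribution)."""
    return theta ** s / (1 + theta) ** (s + 1)


rng = random.Random(42)
estimate = mc_likelihood(3, 2.0, 50000, rng)
print(estimate, exact_likelihood(3, 2.0))
```

For full ARGs no closed form exists and almost all sampled genealogies are incompatible with the data, which is why the methods on the following slides condition the sampling on the data.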

Page 63: Epidemiologisk FredagsmøDe 15 2 2008

ARG Methods

• Sampling ARGs from the coalescence process

• Sampling ARGs conditional on the data (importance sampling)

• Sampling parsimonious ARGs conditional on the data

Page 64: Epidemiologisk FredagsmøDe 15 2 2008

ARG Methods

• Sampling ARGs from the coalescence process

    • This is a no-go: you would never sample an ARG that can explain the data

• Sampling ARGs conditional on the data (importance sampling)

• Sampling parsimonious ARGs conditional on the data

Page 65: Epidemiologisk FredagsmøDe 15 2 2008

ARG Methods

• Sampling ARGs from the coalescence process

• Sampling ARGs conditional on the data (importance sampling)

    • Larribe, Lessard and Schork 2002: scales to tens of individuals and tens of markers

• Sampling parsimonious ARGs conditional on the data

Page 66: Epidemiologisk FredagsmøDe 15 2 2008

ARG Methods

• Sampling parsimonious ARGs conditional on the data

    • Lyngsø, Song & Hein 2005 (calculates parsimonious ARGs; a 2008 paper on sampling is in press)

    • Minichiello & Durbin 2006 (samples parsimonious ARGs and scores local genealogies)

    • Both preferentially select mutation and coalescence events over recombinations

    • Scales to thousands of individuals and hundreds of markers

Page 67: Epidemiologisk FredagsmøDe 15 2 2008

Local Phylogenies

For each “point” on the chromosome, the ARG determines a (local) tree:

Page 71: Epidemiologisk FredagsmøDe 15 2 2008

Changing Phylogenies

Type 1: No change

Type 2: Change in branch lengths

Type 3: Change in topology

From Hein et al. 2005

Page 72: Epidemiologisk FredagsmøDe 15 2 2008

Trees and LD

[Plots: tree similarity against recombination rate; LD (r²) against recombination rate]

Page 73: Epidemiologisk FredagsmøDe 15 2 2008

Can we use just the trees?

Page 74: Epidemiologisk FredagsmøDe 15 2 2008

Clustering on a Tree

Disease affecting mutation

Page 75: Epidemiologisk FredagsmøDe 15 2 2008

Clustering on a Tree

Complete penetrance

Incomplete penetrance

Spurious disease

Page 76: Epidemiologisk FredagsmøDe 15 2 2008

Clustering on a Tree

60%

40%

25%

75%

Case/control clustering is not random on the tree...

Page 77: Epidemiologisk FredagsmøDe 15 2 2008

Sampling Trees (with recombination)

Zöllner & Pritchard 2005

Page 94: Epidemiologisk FredagsmøDe 15 2 2008

Sampling Trees (with recombination)

Zöllner & Pritchard 2005

We only sample the process on the left: far fewer events

Page 95: Epidemiologisk FredagsmøDe 15 2 2008

Using “Perfect Phylogenies”

Use the four-gamete test to find regions that can be explained by a tree with no recurrent mutations

Mailund, Besenbacher & Schierup 2006
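
A minimal sketch of the four-gamete test, assuming biallelic markers and haplotypes given as equal-length strings. The greedy left-to-right partition in `compatible_regions` is an illustrative choice and not necessarily how Blossoc selects its regions; both function names are hypothetical.

```python
def four_gamete_compatible(haplotypes, i, j):
    """True if markers i and j show at most three of the four possible gametes."""
    gametes = {(h[i], h[j]) for h in haplotypes}
    return len(gametes) < 4


def compatible_regions(haplotypes):
    """Greedy scan for runs of markers that are pairwise four-gamete compatible.

    Returns half-open (start, end) intervals of marker indices.
    """
    n_markers = len(haplotypes[0])
    regions = []
    start = 0
    for j in range(1, n_markers):
        # Close the current region as soon as marker j conflicts with any
        # marker already in it.
        if not all(four_gamete_compatible(haplotypes, i, j) for i in range(start, j)):
            regions.append((start, j))
            start = j
    regions.append((start, n_markers))
    return regions


tree_like = ["0000", "0100", "1100", "1101", "1110"]
with_recombinant = tree_like + ["0011"]
print(compatible_regions(tree_like))
print(compatible_regions(with_recombinant))
```

Two biallelic markers that show all four gametes cannot both sit on one tree without a recurrent mutation or a recombination between them, so every region returned can be explained by a single tree.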

Page 96: Epidemiologisk FredagsmøDe 15 2 2008

Using “Perfect Phylogenies”

Build trees for each such region

Mailund, Besenbacher & Schierup 2006

Page 97: Epidemiologisk FredagsmøDe 15 2 2008

Using “Perfect Phylogenies”

Each marker splits a sub-tree in two

Mailund, Besenbacher & Schierup 2006
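
The splitting idea can be sketched as a short recursion. This is a hedged illustration, not the published algorithm: it assumes biallelic markers that already passed the four-gamete test, represents the tree as nested tuples, and uses the hypothetical name `build_tree`. Leaves are lists of haplotype indices that no remaining marker separates.

```python
def build_tree(haplotypes, indices, markers):
    """Recursively split a set of haplotypes using the first informative marker.

    Returns a nested tuple (left, right) for internal nodes and a sorted list
    of haplotype indices for leaves.
    """
    for pos, m in enumerate(markers):
        groups = {}
        for idx in indices:
            groups.setdefault(haplotypes[idx][m], []).append(idx)
        if len(groups) == 2:  # this marker splits the current sub-tree in two
            rest = markers[pos + 1:]
            left, right = groups.values()
            return (build_tree(haplotypes, left, rest),
                    build_tree(haplotypes, right, rest))
    return sorted(indices)  # no marker separates these haplotypes


haps = ["00", "01", "11", "11"]
tree = build_tree(haps, [0, 1, 2, 3], [0, 1])
print(tree)
```

Markers that are monomorphic within a sub-tree are skipped there, so each marker ends up splitting only the sub-tree in which it varies.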

Page 100: Epidemiologisk FredagsmøDe 15 2 2008

Using “Perfect Phylogenies”

Much faster (and much cruder)

Catches the essential tree structure

Mailund, Besenbacher & Schierup 2006

Page 101: Epidemiologisk FredagsmøDe 15 2 2008

Scoring the Clustering

Red = cases
Green = controls

Are the case chromosomes significantly over-represented in some clusters?

Page 102: Epidemiologisk FredagsmøDe 15 2 2008
Page 103: Epidemiologisk FredagsmøDe 15 2 2008

Mutation

We can place “mutations” on the tree edges and partition chromosomes into “mutants” and “wild-types” and test for different distributions of cases and controls

Mutants

Wild-types
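
One simple way to test for different distributions of cases and controls across such a partition is a chi-square test on the 2x2 table of mutant/wild-type against case/control counts. The sketch below is an assumption about a reasonable test, not necessarily the score Blossoc uses; the counts are invented for illustration.

```python
import math


def chi_square_2x2(case_mut, case_wt, ctrl_mut, ctrl_wt):
    """Pearson chi-square statistic (1 degree of freedom) for a 2x2 table."""
    n = case_mut + case_wt + ctrl_mut + ctrl_wt
    row_case = case_mut + case_wt
    row_ctrl = ctrl_mut + ctrl_wt
    col_mut = case_mut + ctrl_mut
    col_wt = case_wt + ctrl_wt
    denom = row_case * row_ctrl * col_mut * col_wt
    if denom == 0:
        return 0.0  # degenerate table: one row or column is empty
    return n * (case_mut * ctrl_wt - case_wt * ctrl_mut) ** 2 / denom


def chi_square_p_value(stat):
    """Upper tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


# Hypothetical counts: 60% cases among mutants, 25% cases among wild-types.
stat = chi_square_2x2(case_mut=30, case_wt=25, ctrl_mut=20, ctrl_wt=75)
print(stat, chi_square_p_value(stat))
```

A mutation placed on each edge of the tree gives one such table, so a tree with many edges yields many statistics to combine.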

Page 104: Epidemiologisk FredagsmøDe 15 2 2008

Mutation

Use average or maximum to score the tree

Average is kosher Bayesian stats; maximum needs to be corrected for over-fitting.

Mutants

Wild-types
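
The difference between the two tree scores can be sketched as follows. As assumptions for illustration only: each edge is represented by the set of chromosomes below it (the "mutants"), a chi-square statistic stands in for the per-edge score, and the over-fitting of the maximum is corrected with a label permutation test, which is one common correction rather than a method confirmed by the slides.

```python
import random


def chi_square(labels, mutants):
    """Chi-square statistic for case/control labels (1/0) against one edge split."""
    n = len(labels)
    a = sum(labels[i] for i in mutants)  # case mutants
    b = len(mutants) - a                 # control mutants
    c = sum(labels) - a                  # case wild-types
    d = n - len(mutants) - c             # control wild-types
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def tree_scores(labels, edges):
    """Return (average, maximum) of the per-edge scores."""
    scores = [chi_square(labels, e) for e in edges]
    return sum(scores) / len(scores), max(scores)


def permutation_p_value(labels, edges, n_perm, rng):
    """P-value for the maximum score, obtained by shuffling case/control labels."""
    observed = tree_scores(labels, edges)[1]
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if tree_scores(shuffled, edges)[1] >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


labels = [1] * 6 + [0] * 6  # six case and six control chromosomes
edges = [{0, 1, 2, 3, 4, 5}, {0, 1, 2}, {6, 7, 8}, {3, 4}]
average, maximum = tree_scores(labels, edges)
p_value = permutation_p_value(labels, edges, 999, random.Random(7))
print(average, maximum, p_value)
```

Because the maximum is taken over all edges, its null distribution shifts upward as the tree grows, which the shuffling accounts for by recomputing the maximum on every permuted data set.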

Page 105: Epidemiologisk FredagsmøDe 15 2 2008

Blossoc (BLOck aSSOCiation)

Homepage: www.birc.au.dk/~mailund/Blossoc

Command line and graphical user interface (with limited functionality)

Page 106: Epidemiologisk FredagsmøDe 15 2 2008

Blossoc (BLOck aSSOCiation)

Homepage: www.birc.au.dk/~mailund/Blossoc

Fast enough to analyse tens of thousands of individuals with hundreds of thousands of markers in a day or two on a desktop computer...

Page 107: Epidemiologisk FredagsmøDe 15 2 2008
Page 108: Epidemiologisk FredagsmøDe 15 2 2008
Page 109: Epidemiologisk FredagsmøDe 15 2 2008

Localisation Accuracy

A single causal mutation

Max BF / min p-value used as point estimate

Page 110: Epidemiologisk FredagsmøDe 15 2 2008

Localisation Accuracy

Two causal mutations

Max BF / min p-value used as point estimate

Page 111: Epidemiologisk FredagsmøDe 15 2 2008

Thank you!

More information at http://www.birc.au.dk/~mailund/association-mapping/