1 Genes and MS in Tasmania, cont. Lecture 5, Statistics 246 February 3, 2004.


Mapping genes contributing to complex diseases


MS susceptibility genes are difficult to map

MS is a complex disease. Analyses with traditional methods such as single-marker association studies and standard linkage approaches (affected sib-pairs, pedigrees, etc.) have failed to agree on genomic regions other than the HLA region.

There are a variety of possible reasons for this:
• Allelic and locus heterogeneity (no single gene model fits all)
• Significant environmental influences
• Imprecise phenotyping


Linkage vs Association studies

• Linkage mapping: tests for cosegregation of a marker allele with the disease within families

• Association mapping: seeks a marker allele that is present more frequently in cases than in controls; all affected individuals are treated as distant relatives
– Case/control studies
– Transmission disequilibrium test (needs triads)

We will do a quick review of association mapping before turning to our MS study.


Linkage disequilibrium

Suppose that we have a marker with just two alleles, M and m say, having frequencies p and 1-p, and a (not necessarily linked) disease locus with alleles D and d, having frequencies q and 1-q. A (haploid) gamete must carry one of the four combinations (haplotypes) DM, Dm, dM or dm. Let the frequencies of these four haplotypes in a population be x1, x2, x3 and x4.

Under independence, we would have x1 = pq, x2 = (1-p)q, x3 = p(1-q) and x4 = (1-p)(1-q). Deviations of the observed haplotype frequencies from these products are termed linkage disequilibrium (LD), or, better, gametic association.

If inheriting the allele D at the disease locus increases the chance of getting the disease, and the disease and marker loci are in LD, then the frequencies of the marker alleles M and m will differ between diseased and non-diseased individuals. This observation is the basis of association studies.
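The definitions above can be made concrete with a short sketch. It assumes the haplotype order DM, Dm, dM, dm used above, recovers p and q from the haplotype frequencies, and reports how far each observed frequency deviates from its product under independence. It also computes D = x1x4 - x2x3, a standard single-number measure of LD that the slide does not name; each of the four deviations equals plus or minus D. The frequencies in the usage line are made up.

```python
from typing import Sequence


def ld_summary(x: Sequence[float]) -> dict:
    """Summarise gametic association from haplotype frequencies.

    `x` holds the frequencies of DM, Dm, dM and dm (in that order) and must sum to 1.
    """
    x1, x2, x3, x4 = x
    if abs(x1 + x2 + x3 + x4 - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p = x1 + x3  # frequency of marker allele M (haplotypes DM and dM)
    q = x1 + x2  # frequency of disease allele D (haplotypes DM and Dm)
    expected = (p * q, (1 - p) * q, p * (1 - q), (1 - p) * (1 - q))
    d = x1 * x4 - x2 * x3  # standard LD coefficient
    return {
        "p": p,
        "q": q,
        "expected_under_independence": expected,
        # Deviations follow the sign pattern +D, -D, -D, +D.
        "deviations": tuple(obs - exp for obs, exp in zip(x, expected)),
        "D": d,
    }


summary = ld_summary((0.30, 0.10, 0.20, 0.40))  # made-up haplotype frequencies
print(summary["D"], summary["deviations"])
```

In the made-up example p = 0.5 and q = 0.4, so independence would predict x1 = 0.2; the observed 0.30 gives D = 0.1.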


Case-control studies in genetic epidemiology

Case-control studies compare case and control allele frequencies at markers or candidate genes (the “exposure” variables). All the standard potential drawbacks of such studies apply, the most critical here being the similarity of the two base populations. It is thought that samples from racially mixed populations can easily differ in allele frequencies, and that this is hard to deal with in the genetic context. Key term: population structure.

If our cases are MS patients, who are our controls? It would be rare for a study to be able to afford or get ethics approval to carry out random sampling of the relevant background population. More commonly, controls are people such as blood donors, whose blood (DNA) has been collected for other purposes. How close will they be to a random sample from the case population?

In an effort to deal with this, the TDT, described next, in effect uses untransmitted parental alleles as controls, bypassing any population structure.


The transmission-disequilibrium test

The TDT, in its simplest form, starts with two parents and an affected child and considers a biallelic marker locus at which all three are typed and at which we can determine which maternal and paternal alleles were transmitted and which were not.

For example, if the parents were a1/a2 and a1/a1, and the affected offspring was a1/a2, then a2 was transmitted and a1 was not transmitted by the first parent.

From a random sample of such trios (called triads), a 2×2 table can be built up giving the number of times a1 and a2 were transmitted and were not transmitted, respectively, and a simple test can be derived. Many generalizations of this procedure now exist; see the notes for Stat 260, 1998, Week 5.
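The slide leaves the "simple test" implicit. A common form is the McNemar-type TDT statistic: if b counts heterozygous parents who transmitted a1 (and not a2), and c counts those who transmitted a2 (and not a1), then (b - c)^2/(b + c) is compared with a chi-square distribution on 1 degree of freedom. The sketch below assumes transmissions have already been resolved into one (transmitted, untransmitted) pair per parent, as in the example above; the counts in the usage line are invented for illustration.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def tdt(pairs: Iterable[Tuple[str, str]], a1: str = "a1", a2: str = "a2"):
    """Transmission-disequilibrium test for a biallelic marker.

    `pairs` holds one (transmitted, untransmitted) allele pair per parent.
    Returns the 2x2 table of counts, the TDT statistic and its p-value.
    """
    table = Counter(pairs)  # keys: (transmitted, untransmitted); missing cells count as 0
    b = table[(a1, a2)]  # heterozygous parents transmitting a1
    c = table[(a2, a1)]  # heterozygous parents transmitting a2
    # Homozygous parents fall on the diagonal and carry no information about transmission.
    if b + c == 0:
        return table, float("nan"), float("nan")
    statistic = (b - c) ** 2 / (b + c)
    # Upper tail of chi-square with 1 df: P(Z^2 > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return table, statistic, p_value


# Invented data: 30 parents transmit a1, 10 transmit a2, 5 are a1/a1 homozygotes.
pairs = [("a1", "a2")] * 30 + [("a2", "a1")] * 10 + [("a1", "a1")] * 5
table, statistic, p_value = tdt(pairs)
print(dict(table), statistic, p_value)
```

Only the two off-diagonal cells of the 2×2 table enter the statistic, which is why homozygous parents can be kept in the data without affecting the test.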

What is a haplotype?

• A collection of alleles derived from the same chromosome

[Figure: Haplotype re-construction, genotypes vs. haplotypes; panels "Chromosome phase is known" and "Chromosome phase is unknown"]

Haplotype mapping

If alleles at a disease locus are associated with alleles at one nearby marker locus on gametes, they are likely to be associated with alleles at other nearby marker loci, and hence with marker haplotypes.

A potentially more powerful way to locate disease genes is to search for associations between marker haplotypes and disease.

There are two possible problems here stemming from the fact that there can be a very large number of marker haplotypes: we may have to deal with very small frequencies, and we have a multiple testing problem.
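Some rough arithmetic shows why both problems arise. The sketch assumes, purely for illustration, windows of three adjacent markers with 10 alleles each, 200 case haplotypes, and a Bonferroni correction for the multiple testing problem (one standard choice among several); none of these numbers come from the study, and the function names are hypothetical.

```python
from math import prod
from typing import Sequence


def n_possible_haplotypes(alleles_per_marker: Sequence[int]) -> int:
    """Number of distinct haplotypes that can be formed across a window of markers."""
    return prod(alleles_per_marker)


def mean_count_per_haplotype(n_sampled_haplotypes: int, n_possible: int) -> float:
    """Average number of sampled haplotypes per possible haplotype (if all were equally common)."""
    return n_sampled_haplotypes / n_possible


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level keeping the family-wise error rate at most alpha."""
    return alpha / n_tests


# Hypothetical numbers: 3 adjacent markers with 10 alleles each, 200 case haplotypes.
n_hap = n_possible_haplotypes([10, 10, 10])
print(n_hap)  # 1000
print(mean_count_per_haplotype(200, n_hap))  # 0.2
print(bonferroni_threshold(0.05, n_hap))
```

Even this small window gives far more possible haplotypes than sampled ones, so most observed haplotype counts are tiny, and testing each of them separately forces a very stringent per-test threshold.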


LD, haplotype mapping and background LD

Searching for common or rare haplotypes in cases alone is one form of association mapping. It has been successful, as very substantial LD can arise around disease loci. In general, however, controls are necessary, as background LD can be large.

That is, there can be substantial LD between putative disease gene alleles and alleles of nearby markers without there being any causal link between the gene and the disease. We call this background LD.

Background LD can be large
– when the population is young
– when the number of founders is small (bottlenecks)
– through admixture of populations
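The admixture point can be checked numerically. The sketch assumes two source populations, each in linkage equilibrium, with made-up allele frequencies, mixed in proportions w and 1 - w; it uses the haplotype order DM, Dm, dM, dm and the LD coefficient D = x1x4 - x2x3. A little algebra gives D = w(1 - w)(pA - pB)(qA - qB) for the mixture, which is nonzero whenever both allele frequencies differ between the source populations.

```python
from typing import Tuple

Freqs = Tuple[float, float, float, float]  # haplotypes DM, Dm, dM, dm


def le_haplotypes(p: float, q: float) -> Freqs:
    """Haplotype frequencies under linkage equilibrium (marker allele M: p, disease allele D: q)."""
    return (p * q, (1 - p) * q, p * (1 - q), (1 - p) * (1 - q))


def ld_coefficient(x: Freqs) -> float:
    x1, x2, x3, x4 = x
    return x1 * x4 - x2 * x3


def mix(pop_a: Freqs, pop_b: Freqs, w: float) -> Freqs:
    """Haplotype frequencies in a population made of fraction w from A and 1 - w from B."""
    return tuple(w * a + (1 - w) * b for a, b in zip(pop_a, pop_b))


# Made-up allele frequencies for two source populations, each in LE.
pop_a = le_haplotypes(p=0.9, q=0.8)
pop_b = le_haplotypes(p=0.1, q=0.2)
mixed = mix(pop_a, pop_b, w=0.5)
print(ld_coefficient(pop_a), ld_coefficient(pop_b), ld_coefficient(mixed))
```

Both source populations have D equal to zero (up to rounding), while the 50:50 mixture has D = 0.25 × 0.8 × 0.6 = 0.12.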


Exercises on LD

1. Show that, under a random mating assumption, the long-term values of the frequencies x1, x2, x3 and x4 defined in the linkage disequilibrium section above are pq, (1-p)q, p(1-q) and (1-p)(1-q) (see Week 5, Stat 260, 1998).

2. Demonstrate that a mixture (e.g. 50:50) of two populations initially in linkage equilibrium at two loci will typically not be in LE.

3. Explain why a single mutant arising by chance will initially be in strong LD with alleles at loci near the locus on which it arises.
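For exercise 1, the standard recursion can be iterated numerically. The sketch assumes a recombination fraction r between the two loci (r = 1/2 for unlinked loci), random mating in an infinite population, and no selection or mutation. Under these assumptions allele frequencies stay fixed and D shrinks by a factor (1 - r) each generation, so each haplotype frequency approaches its product value. The starting frequencies are made up.

```python
from typing import Tuple

Freqs = Tuple[float, float, float, float]  # haplotypes DM, Dm, dM, dm


def ld_coefficient(x: Freqs) -> float:
    x1, x2, x3, x4 = x
    return x1 * x4 - x2 * x3


def next_generation(x: Freqs, r: float) -> Freqs:
    """One generation of random mating with recombination fraction r (no selection, infinite population)."""
    d = ld_coefficient(x)
    x1, x2, x3, x4 = x
    # Recombination moves each frequency toward its equilibrium value by r * D.
    return (x1 - r * d, x2 + r * d, x3 + r * d, x4 - r * d)


def iterate(x: Freqs, r: float, generations: int) -> Freqs:
    for _ in range(generations):
        x = next_generation(x, r)
    return x


start = (0.5, 0.0, 0.0, 0.5)  # made-up: complete LD, p = q = 0.5
for t in (0, 1, 5, 20):
    x = iterate(start, r=0.5, generations=t)
    print(t, ld_coefficient(x), x)
```

Here D starts at 0.25 and halves each generation, while all four frequencies approach pq = 0.25.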

Mapping MS genes in Tasmania

Tasmania

Area: 67,800 km²

Population: 470,000

Capital city: Hobart (~200,000)

Tasmanian Population Growth

[Figure: Tasmanian population over time, with these events marked]

1: First settled by Europeans (1803)

2: 24,000 free settlers, 19,000 convicts (1836)

3: Civil registration of births and marriages (1838)

4: End of convict transportation (1853)

5: “The Gold Rush” (1860s)

Mapping with haplotype sharing

[Figure: timeline from 1800-1850s to 2000, spanning 6-8 generations]

Premise: Tasmanians share large(ish) segments of haplotypes because they are distantly related. Similarly, our MS patients should share these large(ish) segments, but even more so (in size and in number) in regions around MS susceptibility genes.

Haplotypes are “eroded” by recombination

[Figure: an ancestral chromosome broken into shorter shared segments among MS cases over time (generations/meioses); label: 25 cM (SD = 18)]

Recombination events can help to map genes with precision, but they erode haplotypes, making them more difficult to detect.

What might have happened in the population?

• A mutation arises in, or is introduced to, a population leading to disease (say MS) in those individuals

• The mutation arises on the background of a unique haplotype

• As this mutation spreads through the population (by chance, or inbreeding) so do remnants of this original haplotype by hitchhiking (linkage disequilibrium)

[Figure: over time, the ancestral susceptibility haplotype is passed down to present-day MS cases]

Design of the Tasmanian MS study


What strategy could be used to map MS susceptibility genes in Tasmania?

• Too few affected sib pairs/multiplex families for a conventional linkage approach
• Prefer a model-free (non-parametric) approach

A haplotype-based case-control study design seemed appropriate.


MS study in Tasmania: design

• Collect as many MS cases with ancestral links to Tasmania as possible, and a suitable (not necessarily equal) number of similar, socioeconomically and geographically matched unrelated controls

• Around each case and each control, collect a constellation of ~ 4 close relatives for (probabilistic) haplotype reconstruction

• Infer genome-wide haplotypes for all cases and controls

• Carry out a case/control study with the haplotypes, seeking regions of the genome shared more by the cases, in comparison with the controls

Analysis options

• Transmitted case haplotypes
• Untransmitted case haplotypes
• Transmitted control haplotypes
• Untransmitted control haplotypes

Green: hope to find signal. Red: hope to find nothing.

First mathematical questions

• Resolution of genome-wide scan (length of likely shared chromosomal segments)

• Nature and number of relatives needed to permit the reconstruction of accurate haplotypes with high probability

Average length of shared chromosomal segments

Exercise. Assume the Poisson model for crossovers along a chromosome. What are the mean and variance of the length in cM of the chromosomal segments shared by individuals with a common ancestor 7 generations back?
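A simulation can be used to check an answer to this exercise. The sketch assumes that two individuals whose common ancestor lived 7 generations back are separated by 14 meioses (7 down each line of descent), that crossovers in each meiosis follow a Poisson process at rate 1 per Morgan (no interference), and it ignores chromosome ends. Pooling the meioses gives a Poisson process of rate m per Morgan, so the shared segment around a fixed point is the sum of two independent Exponential(m) distances, with mean 2/m Morgans and variance 2/m² Morgans². The function names are hypothetical.

```python
import random
import statistics
from typing import Tuple


def shared_segment_cm(meioses: int, rng: random.Random) -> float:
    """Simulated length (cM) of the shared segment around a fixed point.

    Pooling `meioses` independent Poisson(rate 1 per Morgan) crossover processes gives
    rate `meioses` per Morgan, so the distances to the nearest crossover on the left and
    on the right are independent Exponential(meioses). Chromosome ends are ignored.
    """
    left = rng.expovariate(meioses)   # Morgans
    right = rng.expovariate(meioses)  # Morgans
    return 100.0 * (left + right)     # convert Morgans to cM


def analytic_mean_var_cm(meioses: int) -> Tuple[float, float]:
    """Mean (cM) and variance (cM^2) of the sum of two Exponential(meioses) lengths."""
    mean = 200.0 / meioses           # 2/m Morgans, in cM
    var = 20000.0 / meioses ** 2     # 2/m^2 Morgans^2, in cM^2
    return mean, var


m = 14  # assumption: 7 meioses down each line from the common ancestor
rng = random.Random(2004)
samples = [shared_segment_cm(m, rng) for _ in range(50_000)]
print(statistics.mean(samples), statistics.variance(samples))
print(analytic_mean_var_cm(m))
```

Changing `m` shows how quickly shared segments shorten as the common ancestor recedes: the mean scales as 1/m and the standard deviation as 1/m.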

Nature and number of relatives needed to give accurate haplotypes

Exercise. Explain why it is that when we have both sets of parental genotypes, and the markers are reasonably polymorphic, we can reconstruct an individual’s haplotypes with high probability. What are the difficult cases?

If we have no parents, or just one parent, and grandparents’, siblings’ or offspring’s genotypes are available, which are most informative for an individual’s haplotype reconstruction?


Reconstructing haplotypes from genotypes

• Observe genotyping data for an individual
At marker 1: (1,3)
At marker 2: (b,d)

• Reconstruct the haplotype by inferring recombination events from genotypes of relatives
At marker 1: Mum (1,2), Dad (3,4)
At marker 2: Mum (a,b), Dad (c,d)

[Figure: reconstructed haplotypes across Marker 1 and Marker 2: 1-b (from Mum) and 3-d (from Dad)]
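The parent-of-origin logic in this example can be sketched in a few lines. It is a minimal sketch, assuming error-free genotypes, no mutation and both parents typed. At each marker it keeps the unique assignment of the child's two alleles to mother and father that is consistent with the parental genotypes, and writes "?" when no assignment, or more than one, fits. The function names are hypothetical, and the sketch does not model recombination; reconstruction from other relatives is probabilistic, as in the study design.

```python
from typing import List, Optional, Sequence, Tuple

Genotype = Tuple[str, str]


def phase_marker(child: Genotype, mum: Genotype, dad: Genotype) -> Optional[Tuple[str, str]]:
    """Return (maternal allele, paternal allele) at one marker, or None if not determined."""
    x, y = child
    # Try both ways of assigning the child's alleles to the two parents.
    options = {(m, d) for m, d in ((x, y), (y, x)) if m in mum and d in dad}
    # Exactly one consistent assignment fixes the phase; zero means a Mendelian
    # inconsistency, two means the marker is ambiguous (e.g. all three are 1/2).
    return options.pop() if len(options) == 1 else None


def reconstruct_haplotypes(
    child: Sequence[Genotype], mum: Sequence[Genotype], dad: Sequence[Genotype]
) -> Tuple[List[str], List[str]]:
    """Return the child's maternal and paternal haplotypes, with '?' where phase is unknown."""
    maternal: List[str] = []
    paternal: List[str] = []
    for c, m, d in zip(child, mum, dad):
        phased = phase_marker(c, m, d)
        maternal.append(phased[0] if phased else "?")
        paternal.append(phased[1] if phased else "?")
    return maternal, paternal


# The two-marker example above, with alleles written as strings.
child = [("1", "3"), ("b", "d")]
mum = [("1", "2"), ("a", "b")]
dad = [("3", "4"), ("c", "d")]
print(reconstruct_haplotypes(child, mum, dad))  # (['1', 'b'], ['3', 'd'])
```

A marker at which the child and both parents share the same heterozygous genotype cannot be phased this way, which is one of the difficult cases the earlier exercise asks about.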


Genotyping

Use STR (short tandem repeat) markers, also known as microsatellite markers

…AGCTAGCGCGC….GCGCGGCATTA…

…AGCTAGCGCGC….GCGCGGCGCATTA…

Eventual plan: 5 cM genome-wide scan (~800 markers) with dinucleotide STRs
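As the two sequences above suggest, STR alleles differ in the number of copies of a short repeat unit (here, one extra GC). The sketch below is a hypothetical helper that counts the longest uninterrupted run of a given motif. Because the slide's sequences are elided, the flanking sequence in the example is invented.

```python
def longest_repeat_run(sequence: str, motif: str) -> int:
    """Length, in copies of `motif`, of the longest uninterrupted tandem run in `sequence`."""
    if not motif:
        raise ValueError("motif must be non-empty")
    best = 0
    for start in range(len(sequence)):
        copies = 0
        pos = start
        # Count back-to-back copies of the motif beginning at `start`.
        while sequence.startswith(motif, pos):
            copies += 1
            pos += len(motif)
        best = max(best, copies)
    return best


# Invented flanking sequence; the two alleles differ by one GC repeat unit.
allele_short = "AGCT" + "GC" * 6 + "ATTA"
allele_long = "AGCT" + "GC" * 7 + "ATTA"
print(longest_repeat_run(allele_short, "GC"), longest_repeat_run(allele_long, "GC"))  # 6 7
```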