Linkage Analysis: An Introduction Pak Sham Twin Workshop 2001.


Linkage Analysis: An Introduction

Pak Sham

Twin Workshop 2001

Linkage Mapping

Compares inheritance pattern of trait with the inheritance pattern of chromosomal regions

First gene-mapping in 1913 (Sturtevant)

Uses naturally occurring DNA variation (polymorphisms) as genetic markers

>400 Mendelian (single gene) disorders mapped

Current challenge is to map QTLs

Linkage = Co-segregation

[Figure: pedigree labelled with marker genotypes A2A4, A3A4, A1A3, A1A2, A2A3, A1A2, A1A4, A3A4 and A3A2.]

Marker allele A1 cosegregates with dominant disease

Recombination

[Figure: parental genotypes at a marker locus (alleles A1, A2) and a trait locus (alleles Q1, Q2), with the gametes they produce.]

Parental genotypes

Likely gametes (non-recombinants)

Unlikely gametes (recombinants)

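Recombination of three linked loci

[Figure: three linked loci, with recombination fraction $\theta_1$ in the first interval and $\theta_2$ in the second. Gamete class probabilities:]

$(1-\theta_1)(1-\theta_2)$

$(1-\theta_1)\theta_2$

$\theta_1(1-\theta_2)$

$\theta_1\theta_2$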
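The four gamete class probabilities for three linked loci can be checked numerically. This is a minimal sketch that assumes no interference (recombination in the two intervals is independent); the function names and the example values are hypothetical.

```python
def gamete_class_probs(theta1, theta2):
    """Probabilities of the four gamete classes for three linked loci,
    assuming recombination in the two intervals is independent."""
    return {
        "no recombination": (1 - theta1) * (1 - theta2),
        "interval 2 only": (1 - theta1) * theta2,
        "interval 1 only": theta1 * (1 - theta2),
        "both intervals": theta1 * theta2,
    }


def outer_recombination_fraction(theta1, theta2):
    """The two outer loci recombine when exactly one interval recombines."""
    probs = gamete_class_probs(theta1, theta2)
    return probs["interval 1 only"] + probs["interval 2 only"]


if __name__ == "__main__":
    print(gamete_class_probs(0.1, 0.2))
    print(outer_recombination_fraction(0.1, 0.2))
```

Under this assumption the outer recombination fraction is $\theta_1 + \theta_2 - 2\theta_1\theta_2$, which is less than the sum of the two interval fractions.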

Map distance

Map distance between two loci (Morgans)

= Expected number of crossovers per meiosis

Note: Map distances are additive

Recombination & map distance

Haldane map function

$\theta = \frac{1 - e^{-2m}}{2}$

[Figure: recombination fraction (0 to 0.5) plotted against map distance (0 to 1 M).]
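The Haldane map function and its inverse are short enough to write out. This is a sketch with hypothetical function names; map distances are in Morgans.

```python
import math


def haldane_theta(m):
    """Recombination fraction for a map distance of m Morgans."""
    return 0.5 * (1.0 - math.exp(-2.0 * m))


def haldane_map_distance(theta):
    """Inverse map function: map distance in Morgans for 0 <= theta < 0.5."""
    return -0.5 * math.log(1.0 - 2.0 * theta)


if __name__ == "__main__":
    # Map distances add, recombination fractions do not.
    m1, m2 = 0.10, 0.20
    print(haldane_theta(m1), haldane_theta(m2), haldane_theta(m1 + m2))
```

Because map distances are additive, the recombination fraction across two adjacent intervals is obtained from `haldane_theta(m1 + m2)`, not by adding the two interval fractions.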

Methods of Linkage Analysis

Model-based lod scores: assume an explicit trait model

Model-free allele sharing methods: affected sib pairs, affected pedigree members

Quantitative trait loci: variance-components models

Double Backcross: Fully Informative Gametes

Grandparents: AABB x aabb

Parents: AaBb x aabb

Offspring: AaBb, aabb (non-recombinant); Aabb, aaBb (recombinant)

Linkage Analysis: Fully Informative Gametes

Count data

Recombinant gametes: R

Non-recombinant gametes: N

Parameter

Recombination fraction: $\theta$

Likelihood

$L(\theta) = \theta^R (1-\theta)^N$

Parameter estimate

$\hat{\theta} = \frac{R}{N+R}$

Chi-square

$\chi^2 = 2\left[R \log \hat{\theta} + N \log(1-\hat{\theta}) - (N+R)\log(0.5)\right]$
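The estimate and the chi-square statistic for counted gametes can be computed directly. This is a minimal sketch with hypothetical function names; capping the estimate at 0.5 is an assumption added here, since recombination fractions above 0.5 are not meaningful.

```python
import math


def theta_hat(R, N):
    """Maximum likelihood estimate of the recombination fraction."""
    return R / (R + N)


def log_likelihood(theta, R, N):
    """log L(theta) = R log(theta) + N log(1 - theta), with 0 * log(0) = 0."""
    total = 0.0
    if R > 0:
        total += R * math.log(theta)
    if N > 0:
        total += N * math.log(1.0 - theta)
    return total


def linkage_chi_square(R, N):
    """Likelihood ratio chi-square against theta = 0.5.

    Assumption: the estimate is capped at 0.5."""
    estimate = min(theta_hat(R, N), 0.5)
    return 2.0 * (log_likelihood(estimate, R, N) - (R + N) * math.log(0.5))


if __name__ == "__main__":
    # Hypothetical counts: 2 recombinant and 8 non-recombinant gametes.
    print(theta_hat(2, 8), linkage_chi_square(2, 8))
```

Dividing the chi-square by $2 \ln 10$ gives the same evidence on the lod scale.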

Phase Unknown Meioses

Parents: AaBb x aabb

Offspring: AaBb, aabb and Aabb, aaBb

Either: AaBb, aabb are non-recombinant and Aabb, aaBb are recombinant

Or: AaBb, aabb are recombinant and Aabb, aaBb are non-recombinant

Linkage Analysis: Phase-unknown Meioses

Count data

Either: Recombinant gametes: X; Non-recombinant gametes: Y

Or: Recombinant gametes: Y; Non-recombinant gametes: X

Likelihood

$L(\theta) = \theta^X (1-\theta)^Y + \theta^Y (1-\theta)^X$

An example of incomplete data: mixture distribution likelihood function
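The phase-unknown likelihood has no closed-form maximum in general, so a grid search is one simple way to explore it. This is a sketch with hypothetical names; the likelihood is written as on the slide, without the constant 1/2 for each phase, which cancels in the lod score.

```python
import math


def phase_unknown_likelihood(theta, X, Y):
    """Sum over the two possible phases of the doubly heterozygous parent."""
    return theta**X * (1 - theta) ** Y + theta**Y * (1 - theta) ** X


def max_lod(X, Y, steps=500):
    """Grid search over theta in [0, 0.5]; returns (best theta, lod)."""
    grid = [0.5 * k / steps for k in range(steps + 1)]
    best_theta = max(grid, key=lambda t: phase_unknown_likelihood(t, X, Y))
    ratio = phase_unknown_likelihood(best_theta, X, Y) / phase_unknown_likelihood(0.5, X, Y)
    return best_theta, math.log10(ratio)


if __name__ == "__main__":
    # Hypothetical sibship: the 10 offspring split 0 / 10 between the two classes.
    print(max_lod(0, 10))
```

With a 0 / 10 split the lod is log10(512), which is log10(2) less than the phase-known value of log10(1024): not knowing the phase costs information.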

Parental genotypes unknown

Likelihood will be a function of

allele frequencies (population parameters)

$\theta$ (transmission parameter)

[Figure: offspring genotypes AaBb, aabb, Aabb, aaBb, with parental genotypes unobserved.]

Trait phenotypes

Penetrance parameters

Genotype    Phenotype
            Disease    Normal
AA          f2         1 - f2
Aa          f1         1 - f1
aa          f0         1 - f0

Each phenotype is compatible with multiple genotypes.

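General Pedigree Likelihood

Likelihood is a sum of products (mixture distribution likelihood)

$L = \sum_{G} \prod_{i=1}^{n} pen(x_i \mid g_i) \prod_{i=1}^{f} pop(g_i) \prod_{i=f+1}^{n} trans(g_i \mid g_{f_i}, g_{m_i})$

where individuals 1 to f are founders, individuals f+1 to n are non-founders, and $f_i$ and $m_i$ index the father and mother of individual i

number of terms = $(m_1 m_2 \cdots m_k)^{2n}$

where $m_j$ is number of alleles at locus j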
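For a very small pedigree the sum of products can be evaluated by brute-force enumeration. This sketch covers a father-mother-child trio at one diallelic trait locus; the function names, the Hardy-Weinberg assumption for founders, and the example parameter values are assumptions made for illustration. Genotypes are coded as the number of A alleles (unordered), so the sum has 3^3 terms rather than the ordered-genotype count given above.

```python
from itertools import product


def pop(g, p):
    """Founder genotype probability under Hardy-Weinberg; p = frequency of A."""
    return [(1 - p) ** 2, 2 * p * (1 - p), p**2][g]


def trans(g_child, g_father, g_mother):
    """Mendelian transmission probability of the child's genotype."""
    tf, tm = g_father / 2, g_mother / 2  # chance each parent transmits A
    return [(1 - tf) * (1 - tm), tf * (1 - tm) + (1 - tf) * tm, tf * tm][g_child]


def pen(affected, g, f):
    """Penetrance: f = (f0, f1, f2) are disease risks for aa, Aa, AA."""
    return f[g] if affected else 1 - f[g]


def trio_likelihood(father, mother, child, p, f):
    """Sum over all joint genotypes of pen x pop x trans."""
    total = 0.0
    for gf, gm, gc in product(range(3), repeat=3):
        total += (
            pen(father, gf, f) * pen(mother, gm, f) * pen(child, gc, f)
            * pop(gf, p) * pop(gm, p) * trans(gc, gf, gm)
        )
    return total


if __name__ == "__main__":
    # Fully penetrant recessive disease, unaffected parents, affected child.
    print(trio_likelihood(False, False, True, p=0.1, f=(0.0, 0.0, 1.0)))
```

Enumeration grows exponentially with pedigree size, which is why peeling algorithms are used for larger pedigrees.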

Elston-Stewart algorithm

Reduces computations by peeling:

Step 1: Condition likelihoods of family 1 on genotype of X.

[Figure: pedigree in which individual X links families 1 and 2.]

Step 2: Joint likelihood of families 2 and 1

Lod Score: Morton (1955)

$Lod(\theta) = \log_{10} \frac{L(\theta)}{L(\theta = 0.5)}$

Lod > 3: conclude linkage

Prior odds    Likelihood ratio    Posterior odds
1:50          1000                20:1

Lod < -2: exclude linkage
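The arithmetic behind the Lod > 3 threshold is a single multiplication of odds. This is a small sketch with a hypothetical function name, using exact fractions.

```python
from fractions import Fraction


def posterior_odds(prior_odds, lod):
    """Posterior odds of linkage = prior odds x likelihood ratio (10 ** lod)."""
    return prior_odds * Fraction(10) ** lod


if __name__ == "__main__":
    odds = posterior_odds(Fraction(1, 50), 3)
    print(odds, float(odds / (1 + odds)))  # odds and posterior probability
```

With prior odds of 1:50, a lod of 3 (likelihood ratio 1000) gives posterior odds of 20:1.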

Linkage Analysis: Admixture Test

Model

Probability of linkage in family = $\alpha$

Likelihood

$L(\alpha, \theta) = \alpha L(\theta) + (1-\alpha) L(\theta = 1/2)$
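The admixture likelihood applies to each family, and a joint analysis multiplies the family likelihoods together. The sketch below assumes independent families with phase-known counts of recombinant (R) and non-recombinant (N) gametes, and maximises over a grid; the function names and the example data are hypothetical.

```python
import math


def family_lr(theta, R, N):
    """Family likelihood ratio L(theta) / L(1/2) for phase-known counts."""
    return theta**R * (1 - theta) ** N / 0.5 ** (R + N)


def admixture_lod(alpha, theta, families):
    """Sum over families of log10[alpha * LR(theta) + (1 - alpha)]."""
    total = 0.0
    for R, N in families:
        mix = alpha * family_lr(theta, R, N) + (1 - alpha)
        if mix <= 0.0:
            return float("-inf")  # parameter values impossible for this family
        total += math.log10(mix)
    return total


def maximise_admixture_lod(families, steps=50):
    """Grid search over alpha in [0, 1] and theta in [0, 0.5]."""
    best = (0.0, 0.0, 0.5)  # (lod, alpha, theta) at the null
    for a in range(steps + 1):
        for t in range(steps + 1):
            alpha, theta = a / steps, 0.5 * t / steps
            value = admixture_lod(alpha, theta, families)
            if value > best[0]:
                best = (value, alpha, theta)
    return best


if __name__ == "__main__":
    # Hypothetical data: three linked-looking and three unlinked-looking families.
    families = [(0, 10)] * 3 + [(5, 5)] * 3
    print(maximise_admixture_lod(families))
```

For these made-up counts the maximum sits near alpha = 0.5 at theta = 0, whereas a homogeneity analysis (alpha fixed at 1) would have to accommodate the recombinants in the unlinked families.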

Allele sharing (non-parametric) methods

Penrose (1935): Sib Pair linkage

For rare disease, IBD sharing at a linked locus by pair type:

Concordant affected: strongly increased

Concordant normal: only slightly increased

Discordant: decreased

Therefore: affected sib pair design

Test H0: Proportion of alleles IBD = 1/2
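The test of H0 can be illustrated with a simple one-sided z test on the mean proportion of alleles shared. This sketch assumes IBD status is known with certainty for every affected pair; under H0 a pair shares 0, 1 or 2 alleles with probabilities 1/4, 1/2, 1/4, so the proportion shared has mean 1/2 and variance 1/8. The function name and counts are hypothetical.

```python
import math


def asp_mean_test(n0, n1, n2):
    """One-sided test of mean IBD proportion = 1/2 in affected sib pairs.

    n0, n1, n2 are the numbers of pairs sharing 0, 1 and 2 alleles IBD."""
    n = n0 + n1 + n2
    mean_sharing = (0.5 * n1 + n2) / n
    z = (mean_sharing - 0.5) / math.sqrt(0.125 / n)
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper tail of the normal
    return mean_sharing, z, p_value


if __name__ == "__main__":
    print(asp_mean_test(10, 50, 40))
```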

Affected sib pairs: incomplete marker information

Parameters: IBD sharing probabilities

$Z = (z_0, z_1, z_2)$

Marker Genotype Data M: Finite Mixture Likelihood

$L(Z) = \sum_{i=0}^{2} z_i \, P(M \mid IBD = i)$

SPLINK, ASPEX
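One way to maximise a finite mixture likelihood of this form over the sharing probabilities is an EM iteration. The sketch below is unconstrained and uses hypothetical names and inputs; it is not a description of how SPLINK or ASPEX are implemented.

```python
def em_ibd_sharing(pair_probs, iterations=200):
    """Estimate (z0, z1, z2) from per-pair marker likelihoods.

    pair_probs holds one triple (P(M|IBD=0), P(M|IBD=1), P(M|IBD=2))
    per affected sib pair."""
    z = [0.25, 0.5, 0.25]  # start at the null hypothesis values
    for _ in range(iterations):
        totals = [0.0, 0.0, 0.0]
        for probs in pair_probs:
            weights = [z[i] * probs[i] for i in range(3)]
            norm = sum(weights)
            for i in range(3):
                totals[i] += weights[i] / norm  # posterior P(IBD = i | M)
        z = [t / len(pair_probs) for t in totals]
    return z


if __name__ == "__main__":
    # Hypothetical pairs: two fully informative, one partially informative.
    print(em_ibd_sharing([(0, 0, 1), (0, 1, 0), (0.2, 0.5, 0.3)]))
```

When every pair is fully informative the estimates reduce to the observed proportions of pairs sharing 0, 1 and 2 alleles.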

Joint distribution of Pedigree IBD

IBD of relative pairs are not independent

e.g. if IBD(1,2) = 2 and IBD(1,3) = 2

then IBD(2,3) = 2

Inheritance vector gives joint IBD distribution

Each element indicates whether

paternally inherited allele is transmitted (1)

or maternally inherited allele is transmitted (0)

Vector of 2N elements (N = # of non-founders)
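Enumerating inheritance vectors makes the joint IBD distribution concrete. This sketch handles a sibship with both parents as founders, so each sib contributes two elements (one per parental meiosis); the layout of the vector and the function names are assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def sib_ibd(vector, a, b):
    """IBD count for sibs a and b.

    vector holds two bits per sib: the father's meiosis, then the mother's."""
    paternal_match = vector[2 * a] == vector[2 * b]
    maternal_match = vector[2 * a + 1] == vector[2 * b + 1]
    return int(paternal_match) + int(maternal_match)


def ibd_distribution():
    """IBD distribution for a sib pair over all 2^4 equally likely vectors."""
    counts = Counter(sib_ibd(v, 0, 1) for v in product((0, 1), repeat=4))
    total = sum(counts.values())
    return {ibd: counts[ibd] / total for ibd in sorted(counts)}


def sharing_is_transitive():
    """Check over all 2^6 vectors for three sibs that IBD(1,2) = 2 and
    IBD(1,3) = 2 always imply IBD(2,3) = 2."""
    for v in product((0, 1), repeat=6):
        if sib_ibd(v, 0, 1) == 2 and sib_ibd(v, 0, 2) == 2 and sib_ibd(v, 1, 2) != 2:
            return False
    return True


if __name__ == "__main__":
    print(ibd_distribution())
    print(sharing_is_transitive())
```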

Pedigree allele-sharing methods

Method                                        Problem
APM: Affected Pedigree Member                 Uses IBS
ERPA: Extended Relative Pairs Analysis        Dodgy statistic
Genehunter NPL: Non-Parametric Linkage        Conservative
Genehunter-PLUS: Likelihood (“tilting”)

• All these methods consider affected members only

Convergence of parametric and non-parametric methods

Curtis and Sham (1995)

MFLINK: Treats penetrance as parameter

Terwilliger et al (2000)

Complex recombination fractions

Parameters with no simple biological interpretation

Quantitative Sib Pair Linkage

X, Y standardised to mean 0, variance 1

r = sib correlation

VA = additive QTL variance

$\hat{\pi}$ = estimated proportion of alleles shared IBD

Haseman-Elston Regression (1972)

$(X-Y)^2 = 2(1-r) - 2V_A(\hat{\pi} - 0.5) + \epsilon$

Haseman-Elston Revisited (2000)

$XY = r + V_A(\hat{\pi} - 0.5) + \epsilon$
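Both regressions are ordinary least squares of a pair-level quantity on the IBD proportion, so a small stdlib sketch is enough to show how VA is read off the slope. The function names are hypothetical, and the inputs are assumed to be already standardised trait values with one IBD proportion per pair.

```python
def ols(x, y):
    """Intercept and slope of the least squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def haseman_elston(x, y, pi_hat, revisited=False):
    """Estimate VA from sib-pair trait values x, y and IBD proportions pi_hat."""
    if revisited:
        # Cross-product regression: the slope estimates VA.
        _, slope = ols(pi_hat, [a * b for a, b in zip(x, y)])
        return slope
    # Squared-difference regression: the slope estimates -2 VA.
    _, slope = ols(pi_hat, [(a - b) ** 2 for a, b in zip(x, y)])
    return -slope / 2


if __name__ == "__main__":
    # Tiny made-up data set, for illustration only.
    x = [1.2, -0.4, 0.3, -1.1, 0.8, 0.1]
    y = [-0.5, 0.6, 0.2, -0.9, 1.0, -0.2]
    pi_hat = [0.0, 0.0, 0.5, 0.5, 1.0, 1.0]
    print(haseman_elston(x, y, pi_hat), haseman_elston(x, y, pi_hat, revisited=True))
```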

Improved Haseman-Elston

Sham and Purcell (2001)

Use as dependent variable

$\frac{(X+Y)^2}{(1+r)^2} - \frac{(X-Y)^2}{(1-r)^2}$

Gives equivalent power to variance components model for sib pair data

Variance components linkage

Models trait values of pedigree members jointly

Assumes multivariate normality conditional on IBD

Covariance between relative pairs

$= Vr + V_A[\hat{\pi} - E(\pi)]$

Where V = trait variance

r = correlation (depends on relationship)

VA = QTL additive variance

$E(\pi)$ = expected proportion IBD
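For a sib pair the variance components model reduces to a bivariate normal with an IBD-dependent covariance. This is a minimal sketch with hypothetical function names; it assumes mean-centred trait values, a common variance V for both sibs, and an expected IBD proportion of 0.5.

```python
import math


def pair_covariance(V, r, VA, pi_hat, expected_pi):
    """Covariance between a relative pair: V r + VA (pi_hat - E(pi))."""
    return V * r + VA * (pi_hat - expected_pi)


def sib_pair_loglik(x, y, V, r, VA, pi_hat):
    """Bivariate normal log-likelihood of mean-centred sib trait values."""
    c = pair_covariance(V, r, VA, pi_hat, 0.5)
    det = V * V - c * c  # determinant of the 2 x 2 covariance matrix
    quad = (V * x * x - 2 * c * x * y + V * y * y) / det
    return -math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * quad


if __name__ == "__main__":
    # A concordant pair is better explained when the sibs share more alleles IBD.
    for pi_hat in (0.0, 0.5, 1.0):
        print(pi_hat, sib_pair_loglik(1.5, 1.5, V=1.0, r=0.4, VA=0.2, pi_hat=pi_hat))
```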

QTL linkage model for sib-pair data

[Figure: path diagram. Phenotypes PT1 and PT2 each load on latent factors Q, S and N through paths q, s and n. The S factors correlate 1; the Q factors correlate [0 / 0.5 / 1].]

No linkage

Under linkage

Incomplete Marker Information

IBD sharing cannot be deduced from marker genotypes with certainty

Obtain probabilities of all possible IBD values

Finite mixture likelihood

$L = \sum_i Z_i \, L(X \mid IBD = i; V_A)$

Pi-hat likelihood

$L = L(X \mid IBD = 2\hat{\pi}; V_A)$
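The two likelihoods can be compared for a single sib pair. This sketch uses hypothetical names and assumes standardised traits (variance 1) and a bivariate normal model in which the sib covariance is r + VA (IBD/2 - 0.5).

```python
import math


def pair_density(x, y, cov, V=1.0):
    """Bivariate normal density with common variance V and covariance cov."""
    det = V * V - cov * cov
    quad = (V * x * x - 2 * cov * x * y + V * y * y) / det
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))


def cov_given_ibd(ibd, r, VA):
    """Sib covariance when the pair shares `ibd` alleles (may be fractional)."""
    return r + VA * (ibd / 2 - 0.5)


def mixture_likelihood(x, y, Z, r, VA):
    """Weight the likelihood at each IBD value by its probability Z[i]."""
    return sum(Z[i] * pair_density(x, y, cov_given_ibd(i, r, VA)) for i in range(3))


def pi_hat_likelihood(x, y, Z, r, VA):
    """Evaluate one likelihood at the expected sharing, IBD = 2 * pi_hat."""
    pi_hat = 0.5 * Z[1] + Z[2]
    return pair_density(x, y, cov_given_ibd(2 * pi_hat, r, VA))


if __name__ == "__main__":
    Z = (0.1, 0.3, 0.6)  # made-up IBD probabilities for one pair
    print(mixture_likelihood(1.5, 1.2, Z, 0.4, 0.2), pi_hat_likelihood(1.5, 1.2, Z, 0.4, 0.2))
```

The two agree exactly when IBD is known with certainty or when VA = 0, and differ otherwise.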

QTL linkage model for sib-pair data

[Figure: path diagram with phenotypes PT1 and PT2, latent factors Q, S and N, and paths q, s and n; the S factors correlate 1.]

Conditioning on Trait Values

Usual test

$\ln LR = \max_{V_A} \left[ \ln \sum_i Z_i \, L(X \mid IBD = i; V_A) - \ln L(X; V_A = 0) \right]$

Conditional test

$\ln LR = \max_{V_A} \left[ \ln \sum_i Z_i \, L(X \mid IBD = i; V_A) - \ln \sum_i P_i \, L(X \mid IBD = i; V_A) \right]$

Zi = IBD probability estimated from marker genotypes

Pi = IBD probability given relationship

QTL linkage: some problems

Sensitivity to misspecification of marker allele frequencies and positions

Sensitivity to non-normality / phenotypic selection

Heavy computational demand for large pedigrees or many marker loci

Sensitivity to marker genotype and relationship errors

Low power and poor localisation for minor QTL