Workshop Biostatistics in relationship testing ESWG-meeting 2014

32
Workshop Biostatistics in relationship testing ESWG-meeting 2014 Daniel Kling Norwegian Institute of Public Health and Norwegian University of Life Sciences, Norway Andreas Tillmar National Board of Forensic Medicine, Sweden

description

Workshop Biostatistics in relationship testing ESWG-meeting 2014. Daniel Kling Norwegian Institute of Public Health and Norwegian University of Life Sciences, Norway Andreas Tillmar National Board of Forensic Medicine, Sweden. http://www.famlink.se/BiostatWorkshopESWG2014.html. - PowerPoint PPT Presentation

Transcript of Workshop Biostatistics in relationship testing ESWG-meeting 2014

Page 1: Workshop Biostatistics in relationship testing ESWG-meeting 2014

Workshop

Biostatistics in relationship testingESWG-meeting 2014

Daniel Kling

Norwegian Institute of Public Health and Norwegian University of Life Sciences, Norway

Andreas Tillmar

National Board of Forensic Medicine, Sweden

Page 2: Workshop Biostatistics in relationship testing ESWG-meeting 2014

http://www.famlink.se/BiostatWorkshopESWG2014.html

Page 3: Workshop Biostatistics in relationship testing ESWG-meeting 2014

Outline of this first session• Likelihood ratio principle• Theoretical: paternity issues

– Calculation of LR “by hand”.– Accounting for:

• Mutations• Null/silent alleles• Linkage and linkage disequilibrium (allelic association)• Co-ancestry

Page 4: Workshop Biostatistics in relationship testing ESWG-meeting 2014

Is Mike the father of Brian?

DNA-testing to solve relationship issues

DNA-data

vWA D12S391 D21S11 ….

Mike: 12,13 22,22 31,32.2 ….

Mother of Brian: 12,14 23,23.2 30,30 ….

Brian: 13,14 22,23 30,31 ….

Can we use the information from the DNA-data to answer the question?

YES!We estimate the “weight of evidence” by biostatistical calculations.

How much more (or less) probable is it that M is the father of B, compared to not being the father of B?

We will follow the recommendations given by Gjertson et al., 2007

“ISFG: Recommendations on biostatistics in paternity testing.”

Page 5: Workshop Biostatistics in relationship testing ESWG-meeting 2014

H1: The alleged father is the father of the child

H2: The alleged father is unrelated to the child

We wish to find the probability for H1 to be true, given the evidence

= Pr(H1|E)Via Baye’s theorem:

)Pr(

)Pr(

)|Pr(

)|Pr(

)|Pr(

)|Pr(

2

1

2

1

2

1

H

H

HE

HE

EH

EH

Posterior odds LR (or PI) Prior odds

Today we focus on the likelihood ratio (LR)

)|Pr(

)|Pr(

2

1

HDNA

HDNA

Calculating the “Weight of evidence” in a case with DNA-data

Page 6: Workshop Biostatistics in relationship testing ESWG-meeting 2014

)|Pr(

)|Pr(

2

1

HDNA

HDNALR=

C

M AF Founders

Child Transmission probabilities

Mendelian factor

(1, 0.5, µ, etc)

Genotype probabilities

2*pi*pj*(1-Fst)

Fst*pi+ (1-Fst)*pi*pi

2*pi*pj

pi*pi

Assuming HWE

GC

GMGAF

Page 7: Workshop Biostatistics in relationship testing ESWG-meeting 2014

C

M AF

GM=a,b GAF=c,d

GC=a,c

ba pp 2

dc pp 2 0.5 0.5

ba pp 2

dc pp 2 0.5 C

M AF

GM=a,bGAF=c,d

GC=a,c

dc pp 2

Paternity trio – An example

Page 8: Workshop Biostatistics in relationship testing ESWG-meeting 2014

ba pp 2

dc pp 2 0.5 0.5

ba pp 2

dc pp 2 0.5 dc pp 2LR==

=

=0.5

dc pp 2

Paternity trio – An example

1

1

N

xp cC

Probability to observe a certain allele in the population (e.g. population frequency)

Page 9: Workshop Biostatistics in relationship testing ESWG-meeting 2014

C

M AF

GM=10,11 GAF=13,14.2

GC=10,14.2

0.5 0.5

C

M AF

GM=10,11 GAF=13,14.2

GC=10,14.2

11102 pp 2.14132 pp

0.5 11102 pp 2.14132 pp

2.14132 pp

LR=0.5

2.14132 pp

Paternity trio – real data

Page 10: Workshop Biostatistics in relationship testing ESWG-meeting 2014

C

M AF

GM=10,11 GAF=13,15.2

GC=10,14.2

0.5 11102 pp 2.15132 pp

Paternity trio - Mutation

Mutation?

Page 11: Workshop Biostatistics in relationship testing ESWG-meeting 2014

Mutations

Brinkmann et al (1998) found 23 mutations in 10,844 parent/child offsprings. Out of these 22 were single step and 1 were two-step mutations.

AABB summary report covering ~4000 mutations showed approx 90-95% single step, 5-10% two-step and 1% or less more than two steps.

Mutation rate depend on marker, sex (female/male), age of individual, allele size

Note that these are the observed mutation rate, not the actual mutation rate (hidden mutations, assumption of mutation with the smallest step.

“The possibility of mutation shall be taken into account whenever a genetic inconsistency is observed”

Page 12: Workshop Biostatistics in relationship testing ESWG-meeting 2014

Mutations cont

Approaches to calculate LR accounting for mutations:

LR~(µtot*adj_steps)/p(paternal allele) e.g “stepwise mutation model”

LR~µ/A (A=average PE)

LR~µ

LR~µ/p(paternal allele)

0 +1 +2 +3-1 +2 +3-2-31- µtot

µ+1µ-1 µ+2 µ+3µ-2µ-3

12 13 14 1511109Allele:

µtot=µ+1+µ-1+µ+2+µ-2+…..

Page 13: Workshop Biostatistics in relationship testing ESWG-meeting 2014

C

M AF

GM=10,11 GAF=13,15.2

GC=10,14.2

)( 2.142.152.1413 mutmut

5.09.05.02.142.15 Totµmut “Stepwise decreasing with range”

50% for 15.2 to be transmitted

Total mutation rate

90% of mutations are 1 step

50% are loss of fragment size

Mutation model decreasing with range

Paternity trio - Mutation

Page 14: Workshop Biostatistics in relationship testing ESWG-meeting 2014

C

M AF

GM=10,11 GAF=13,15.2

GC=10,14.2

0.5

C

M AF

GM=10,11 GAF=13,15.2

GC=10,14.2

11102 pp

0.5 11102 pp 2.14132 pp

LR= 2.14132 pp

2.15132 pp

2.15132 pp

)( 2.142.152.1413 mutmut

Paternity trio - Mutation

Page 15: Workshop Biostatistics in relationship testing ESWG-meeting 2014

C

M AF

GM=10,11 GAF=17.2

GC=10

GAF=17.2,17.2

GAF=17.2, s

GC=10,10

GC=10,s

Paternity trio – Silent allele

,s

,s

Page 16: Workshop Biostatistics in relationship testing ESWG-meeting 2014

Null/silent allele“STR null alleles mainly occur when a primer in the PCR reaction fails to hybridize to the template DNA...”

“Laboratories should try to resolve this situation with different pairs of STR primers”

“…no data from which to estimate the null allele frequency exists, consider a generous bracket of plausible values for the frequency and compute a corresponding range of values for the PI. If the resulting range of case-specific PI values are all large (as large as a laboratory’s threshold value for issuing non-exclusion paternity reports), then report that the PI is ‘‘at least greater than the smallest value’’ in the range”

Impact of null/silent allele may vary on the pedigree and the genotype constellations

Page 17: Workshop Biostatistics in relationship testing ESWG-meeting 2014

C

M AF

GM=10,11 GAF=17.2

GC=10

C

M AF

GM=10,11 GAF=17.2

GC=10

11102 pp

11102 pp

LR=

GAF=17.2,17.2

GAF=17.2, s

GC=10,10

GC=10,s

5.05.02 2.17 spp

)(5.0)2( 102.172.172.17 ss pppppp

Paternity trio – Silent allele

Page 18: Workshop Biostatistics in relationship testing ESWG-meeting 2014

Linkage and Linkage disequilibrium

• Linkage– Can be described as the co-segregation of closely

located loci within a family or pedigree.– Effects the transmission probabilities!

• Linkage disequilibrium (LD)– Allelic association.– Two alleles (at two different markers) which is

observed more often/less often than can be expected.– Effects the founder genotype probabilities, not the

transmission probabilities!

Page 19: Workshop Biostatistics in relationship testing ESWG-meeting 2014

Mother: 16,18 21,24

Chr3:1 Chr3:2

18

24

16

21

Chr3:1 Chr3:2

16

24

18

21

16

21

18

24

18

21

16

24

Distance R, Theta

Marker 1 Marker 2

If LD, different probabilities for each phase

If linkage, different probability of transmission

Page 20: Workshop Biostatistics in relationship testing ESWG-meeting 2014

Mother: 16,18 21,24

Child1: 16,25 21,29

Child2: 16,22 24,30

Chr3:1 Chr3:2

25 paternal

29 paternal

16

21

Chr3:1 Chr3:2

16

29

25

21

Chr3:1 Chr3:2

22 paternal

30 paternal

16

24

Chr3:1 Chr3:2

16

30

22

24

Page 21: Workshop Biostatistics in relationship testing ESWG-meeting 2014

Linkage and LD when it comes to forensic DNA-markers?

• 14 pair of autosomal STRs < 50cM– Phillips et al 2012– vWA-D12S391 (around 10cM, i.e. 10% recomb rate)

• Linkage will have an effect on LR for some pedigrees– Gill et al 2012, O’Connor & Tillmar 2012, Kling et al, 2012.– Incest.– Sibling, uncle-nephew.– Impact of linkage will depend on

• Pedigree• Markers typed• Individuals’ DNA data

• LE for vWA-D12S391!– Gill et al 2012, and O’Connor et al 2011.

Page 22: Workshop Biostatistics in relationship testing ESWG-meeting 2014

Linkage cont.

• Assuming LE (no allelic association)– Gill et al 2012: "...linkage has no effect at all in a

pedigree unless:• at least one individual in the pedigree (the central individual) is

involved in at least two transmissions of genetic material…, and• that individual is a double heterozygote at the loci in question."

• Thus…– For standard paternity cases (father vs unrelated),

linkage has no impact on LR– BUT for paternity vs uncle (or other close related

alternative), linkage has impact on LR

Page 23: Workshop Biostatistics in relationship testing ESWG-meeting 2014

Presenting the evidence

• Likelihood ratio (LR) or Paternity index (PI)• Posterior probability (must set priors)

Simplifies to LR/LR+1, assuming two hypotheses with equal priors

Li=Likelihood under hypothesis iπi=Prior probability for hypothesis i

Page 24: Workshop Biostatistics in relationship testing ESWG-meeting 2014

Subpopulation correction

24

What is subpopulation effects?

First developed by Wright in (1965)

Balding (1994)

Hardy Weinberg equilibrium

How to estimate it?

Reasonable values (0.01-0.05)

Sampling formula (Fst = θ)

* (1 )* ( )'( )

1 ( 1)*ia i

iObs

Fst n Fst p ap a

N Fst

Page 25: Workshop Biostatistics in relationship testing ESWG-meeting 2014

Example 1. Genotype probabilities

25

HWE

Homozygouts: p(A)p(A)

Heterozygots: 2p(A)p(B)

HWD

Homozygouts: Fst*p(A) + (1-Fst)*p(A)p(A)

Heterozygouts: (1-Fst)*p(A)p(B)

2*0 (1 ) ( ) *1 (1 ) ( )( , ) (Sampling two A:s)= * ( ) (1 ) ( )

1 (0 1) 1 (1 1)

p A p AP A A P p A p A

*0 (1 ) ( ) *0 (1 ) ( )( , ) (Sampling one A and one B)= * (1 ) ( ) ( )

1 (0 1) 1 (1 1)

p A p BP A B P p A p B

Page 26: Workshop Biostatistics in relationship testing ESWG-meeting 2014

Example 2. Random Match Prob.

26

H1: The profiles G1 and G2 are from the same individual

H2: The profiles G1 and G2 are from different individuals

Let G1=A,A and G2=A,A

HWE:

HWD:

12

2

( | ) ( 1) 11/

( | ) ( 1)* ( 2) ( )

P Data H P GRMP

P Data H P G P G p A

1

2

( | ) ( 1) (Sampling two A:s)1/

( | ) ( 1, 2) (Sampling four A:s)

(1 )(1 2 )

2 (1 ) ( ) 3 (1 ) ( )

P Data H P G PRMP Sampling

P Data H P G G P

p A p A

Page 27: Workshop Biostatistics in relationship testing ESWG-meeting 2014

Example 3. Paternity

27

H1: The alleged father (AF) is the real father

H2: AF and the child are unrelated.

p(A) = 0.05

AFA/A

motherB/B

childA/B

Standard paternity case. First marker

Page 28: Workshop Biostatistics in relationship testing ESWG-meeting 2014

28

probability of data father

probability of data unrelated

( | , ) 1 10 20.

( | ) 0.05A

given AFLR

given AF

P child mother AF

P child mother p

AFA/A

motherB/B

childA/B

Standard paternity case. First marker

( ) ( ) ( | , ) (Sampling two A:s and two B:s)0

( ) ( ) ( | ) (Sampling three A:s and two B:s)

1 1 1 32 (1 )(Sampling the third A) 2 (1 )

1 (4 1)A A

P mother P AF P child mother AF PLR

P AF P mother P child mother P

pP p

Page 29: Workshop Biostatistics in relationship testing ESWG-meeting 2014

29

( ) ( ) ( | , ) (Sampling two A:s and two B:s)0

( ) ( ) ( | ) (Sampling three A:s and two B:s)

1 1 1 32 (1 )(Sampling the third A) 2 (1 )

1 (4 1)A A

P mother P AF P child mother AF PLR

P AF P mother P child mother P

pP p

Page 30: Workshop Biostatistics in relationship testing ESWG-meeting 2014

Example 4. Siblings

30

H0: Two persons are full siblings

H1: The same two persons are unrelated

The two persons are homozygous A,A (p(A)=0.05)

We use the “IBD method”( | 0)

0( | 1)

(Sampling two A:s)*P(IBD=2|H0)+ (Sampling three A:s)*P(IBD=1|H0)+ (Sampling four A:s)*P(IBD=0|H0)

(Sampling four A:s)

0.25 0.5 (Sampling the third A) 0.25 (Sampling the

P Data HLR

P Data H

P P P

P

P P

last two A:s)

(Sampling the last two A:s)

2 (1 ) 2 (1 ) 3 (1 )0.25 0.5 0.25 *

1 (2 1) 1 (2 1) 1 (3 1)2 (1 ) 3 (1 )

*1 (2 1) 1 (3 1)

A A A

A A

P

p p p

p p

Page 31: Workshop Biostatistics in relationship testing ESWG-meeting 2014

Example 4. Siblings

31

Page 32: Workshop Biostatistics in relationship testing ESWG-meeting 2014

Example 5. General relationships

32

We have N number of founder genotypes

If Fst!=0 these are not independent

We use the sampling formula to calculate updated allele frequencies

It is more complex to do by hand

In simple pairwise relationships use the IBD method (See Example 4)

Otherwise, use validated software!