Grouping loci
Grouping loci: criteria
• Maximum two-point recombination fraction
– Example: rij ≤ 0.40
• Minimum LOD score, Zij
– For n loci, there are n(n-1)/2 possible pairwise combinations that will be tested
– Expect a nonzero probability of false positives
• Significant probability value, pij
– Example: pij ≤ 0.00001
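The pairwise-test count grows quickly with the number of loci, which is why a strict threshold is used. The following is a minimal sketch, assuming (for illustration only) that every pair of loci is unlinked and that each test is declared significant at the nominal threshold; the function names are hypothetical.

```python
def pairwise_tests(n_loci: int) -> int:
    """Number of two-point comparisons among n loci: n(n-1)/2."""
    return n_loci * (n_loci - 1) // 2


def expected_false_positives(n_loci: int, p_threshold: float) -> float:
    """Expected number of falsely declared linkages, assuming all pairs
    are unlinked and each test has false-positive rate p_threshold."""
    return pairwise_tests(n_loci) * p_threshold


for n in (10, 100, 1000):
    print(n, pairwise_tests(n), expected_false_positives(n, 0.00001))
```

With 1000 loci there are 499,500 pairwise tests, so even at pij ≤ 0.00001 a few false linkages would be expected under these assumptions.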
Locus ordering
• Ideally, we would estimate the likelihoods for all possible orders and take the one that is most probable by comparing log likelihoods
• That is computationally inefficient when there are more than ~10 loci
• Several methods have been proposed for producing a preliminary order
Locus ordering: number of orders and triplets among k loci
Number of orders among k loci: $n_k = \frac{k!}{2} = \frac{k(k-1)(k-2)\cdots(3)(2)(1)}{2}$
Number of triplets among k loci: $n_{triplets} = \frac{k(k-1)(k-2)}{6}$
No. of loci (k)   Possible orders   No. of triplets
2    1    0
3    3    1
5    60    10
10    1,814,400    120
20    1.22 × 10^18    1,140
40    4.08 × 10^47    9,880
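The counts in the table can be checked directly from the two formulas (k!/2 orders, k(k-1)(k-2)/6 triplets). A small sketch using only the Python standard library; the function names are illustrative.

```python
from math import comb, factorial


def n_orders(k: int) -> int:
    """Unique locus orders among k loci: k!/2 (an order and its mirror count once)."""
    return factorial(k) // 2


def n_triplets(k: int) -> int:
    """Number of three-locus subsets among k loci: k(k-1)(k-2)/6."""
    return comb(k, 3)


for k in (2, 3, 5, 10, 20, 40):
    print(k, n_orders(k), n_triplets(k))
```

The number of orders grows factorially while the number of triplets grows only cubically, which is the motivation for three-point analysis as a building block.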
Three-point Analysis
Number of unique orders among k loci: $n_k = \frac{k!}{2}$
For three loci (k = 3): $n_3 = \frac{3!}{2} = \frac{(3)(2)(1)}{2} = 3$
Order   Mirror Order
ABC   CBA
ACB   BCA
BAC   CAB
Non-additivity of recombination frequencies
[Figure: loci A, B and C in that order; rAB and rBC label the two adjacent intervals and rAC labels the whole interval A-C]
The recombination frequency over the interval A-C (rAC) is less than the sum of rAB and rBC: rAC < rAB + rBC. This is because (rare) double recombination events (a recombination in both A-B and B-C) do not contribute to recombination between A and C.
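Non-additivity of recombination frequencies: derivation
[Figure: four A-B-C diagrams, one for each combination of recombination (1) or no recombination (0) in the intervals A-B and B-C]
P00 = (1 - rAB)(1 - rBC)    no recombination in either interval
P10 = rAB(1 - rBC)    recombination in A-B only
P01 = (1 - rAB)rBC    recombination in B-C only
P11 = rAB rBC    recombination in both intervals (double recombinant)
rAC = P10 + P01 = rAB(1 - rBC) + (1 - rAB)rBC
rAC = rAB + rBC - 2 rAB rBC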
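As a numerical check of the derivation, the sketch below computes rAC from the two single-recombinant probabilities, assuming recombination in the two intervals is independent. The input values 0.20 and 0.10 are illustrative, not taken from the slides.

```python
def r_outer(r_ab: float, r_bc: float) -> float:
    """Recombination fraction across A-C for loci ordered A-B-C,
    assuming crossovers in the two intervals occur independently."""
    p10 = r_ab * (1 - r_bc)  # recombination in A-B only
    p01 = (1 - r_ab) * r_bc  # recombination in B-C only
    return p10 + p01         # algebraically r_ab + r_bc - 2*r_ab*r_bc


r_ac = r_outer(0.20, 0.10)
print(round(r_ac, 4), "<", 0.20 + 0.10)
```

The result is smaller than the simple sum by exactly twice the double-recombinant probability.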
Interference
• Interference means that recombination events in adjacent intervals interfere with one another. The occurrence of an event in a given interval may reduce or enhance the occurrence of an event in its neighbourhood.
• Positive interference refers to the ‘suppression’ of recombination events in the neighbourhood of a given one.
• Negative interference refers to the opposite: enhancement of clusters of recombination events.
• Positive interference results in fewer double recombinants (over adjacent intervals) than expected on the basis of independence of recombination events.
rAC = rAB + rBC - 2C rAB rBC
C = coefficient of coincidence
Interference I = 1 - C
[Figure: parental haplotypes A B C and a b c]
$C = \frac{\text{Observed number of double crossovers}}{\text{Expected number of double crossovers}}$
Expected number of double crossovers = rAB rBC N
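Example: DH population, N = 100, locus order ABC
Observed counts for the eight genotype classes: 22 24 16 14 8 10 2 4
$\hat{r}_{AB} = \frac{16 + 14 + 2 + 4}{100} = 0.36$
$\hat{r}_{BC} = \frac{8 + 10 + 2 + 4}{100} = 0.24$
$C = \frac{\text{Observed number of DCs}}{\text{Expected number of DCs}} = \frac{2 + 4}{0.36 \times 0.24 \times 100} = \frac{6}{8.64} = 0.69$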
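The worked example above can be reproduced in a few lines. This sketch assumes the eight observed counts are listed as two non-recombinant classes, two single-crossover classes for interval A-B, two for interval B-C, and two double-crossover classes, which is the grouping implied by the estimates of rAB and rBC; the function name is hypothetical.

```python
def coefficient_of_coincidence(nr: int, sc1: int, sc2: int, dc: int) -> float:
    """C = observed double crossovers / expected double crossovers.

    nr  : non-recombinant count
    sc1 : single crossovers in interval 1 only
    sc2 : single crossovers in interval 2 only
    dc  : double crossovers
    """
    n = nr + sc1 + sc2 + dc
    r1 = (sc1 + dc) / n            # recombination fraction, interval 1
    r2 = (sc2 + dc) / n            # recombination fraction, interval 2
    expected_dc = r1 * r2 * n      # expected count if intervals are independent
    return dc / expected_dc


c = coefficient_of_coincidence(nr=22 + 24, sc1=16 + 14, sc2=8 + 10, dc=2 + 4)
print(round(c, 2), round(1 - c, 2))  # C and interference I = 1 - C
```

A value of C below 1 corresponds to positive interference (I > 0).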
Interference: values of C
• No interference: C = 1 and Interference = 1 - C = 0
• Complete interference: C = 0 and Interference = 1 - C = 1
• Negative interference: C > 1 and Interference = 1 - C < 0
• Positive interference: C < 1 and Interference = 1 - C > 0
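Three locus analysis, DH population, for the ABC locus order
Genotypes   Class   Observed count   Expected frequency without interference   Expected frequency with interference
ABC/ABC   NR   f1   0.5(1 - r1)(1 - r2)   0.5(1 - r1 - r2 + C r1 r2)
ABc/ABc   SC2   f2   0.5(1 - r1) r2   0.5(r2 - C r1 r2)
AbC/AbC   DC12   f3   0.5 r1 r2   0.5 C r1 r2
Abc/Abc   SC1   f4   0.5(1 - r2) r1   0.5(r1 - C r1 r2)
aBC/aBC   SC1   f5   0.5(1 - r2) r1   0.5(r1 - C r1 r2)
aBc/aBc   DC12   f6   0.5 r1 r2   0.5 C r1 r2
abC/abC   SC2   f7   0.5(1 - r1) r2   0.5(r2 - C r1 r2)
abc/abc   NR   f8   0.5(1 - r1)(1 - r2)   0.5(1 - r1 - r2 + C r1 r2)
NR = non-recombinant; SC1 = single crossover in interval 1 (A-B); SC2 = single crossover in interval 2 (B-C); DC12 = double crossover in intervals 1 and 2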
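The expected-frequency columns for the ABC order can be generated from r1, r2 and C. A minimal sketch, assuming a DH population with parental haplotypes ABC and abc; the function name and the example values of r1, r2 and C are illustrative.

```python
def expected_frequencies(r1: float, r2: float, c: float = 1.0) -> dict:
    """Expected genotype frequencies for locus order A-B-C in a DH population.

    c is the coefficient of coincidence; c = 1 gives the
    'without interference' column.
    """
    dc = c * r1 * r2                   # double-crossover probability
    nr = 0.5 * (1 - r1 - r2 + dc)      # each non-recombinant class
    sc1 = 0.5 * (r1 - dc)              # each single crossover, interval 1
    sc2 = 0.5 * (r2 - dc)              # each single crossover, interval 2
    dc12 = 0.5 * dc                    # each double-crossover class
    return {
        "ABC": nr, "abc": nr,
        "Abc": sc1, "aBC": sc1,
        "ABc": sc2, "abC": sc2,
        "AbC": dc12, "aBc": dc12,
    }


freqs = expected_frequencies(0.22, 0.30, c=1.0)
print(round(sum(freqs.values()), 6))
```

Setting c = 1 reduces the non-recombinant term to 0.5(1 - r1)(1 - r2), so both columns of the table come from the same expression.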
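MLE of two-locus recombination fractions
$\hat{r}_{AB} = \frac{f_3 + f_4 + f_5 + f_6}{N}$
$\hat{r}_{AC} = \frac{f_2 + f_4 + f_5 + f_7}{N}$
$\hat{r}_{BC} = \frac{f_2 + f_3 + f_6 + f_7}{N}$
In terms of the interval parameters for the ABC locus order:
$r_{AB} = r_1$
$r_{AC} = r_1 + r_2 - 2 C r_1 r_2$
$r_{BC} = r_2$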
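Example (expected frequencies are for the ABC locus order, with interference)
Genotypes   Observed count   Expected frequency
ABC/ABC   f1 = 34   0.5(1 - r1 - r2 + C r1 r2)
ABc/ABc   f2 = 5   0.5(r2 - C r1 r2)
AbC/AbC   f3 = 11   0.5 C r1 r2
Abc/Abc   f4 = 0   0.5(r1 - C r1 r2)
aBC/aBC   f5 = 1   0.5(r1 - C r1 r2)
aBc/aBc   f6 = 10   0.5 C r1 r2
abC/abC   f7 = 4   0.5(r2 - C r1 r2)
abc/abc   f8 = 35   0.5(1 - r1 - r2 + C r1 r2)
Regardless of locus order the MLEs of r are
$\hat{r}_{AB} = \frac{11 + 10 + 0 + 1}{100} = 0.22$
$\hat{r}_{AC} = \frac{5 + 4 + 0 + 1}{100} = 0.10$
$\hat{r}_{BC} = \frac{11 + 10 + 5 + 4}{100} = 0.30$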
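The two-point estimates do not depend on locus order because each one only counts the genotypes in which the two loci carry alleles from different parents. A minimal sketch using the observed counts above, assuming parental haplotypes ABC and abc (upper case from one parent, lower case from the other); the function name is illustrative.

```python
counts = {
    "ABC": 34, "ABc": 5, "AbC": 11, "Abc": 0,
    "aBC": 1, "aBc": 10, "abC": 4, "abc": 35,
}


def two_point_r(counts: dict, i: int, j: int) -> float:
    """MLE of the recombination fraction between the loci at string
    positions i and j: recombinant genotypes / total."""
    total = sum(counts.values())
    recombinant = sum(
        n for hap, n in counts.items()
        if hap[i].isupper() != hap[j].isupper()  # alleles from different parents
    )
    return recombinant / total


r_ab = two_point_r(counts, 0, 1)
r_ac = two_point_r(counts, 0, 2)
r_bc = two_point_r(counts, 1, 2)
print(r_ab, r_ac, r_bc)
```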
Ordering Loci by Minimizing Double Crossovers
Genotypes Observed count
ABC/ABC f1 = 34
ABc/ABc f2 = 5
AbC/AbC f3 = 11
Abc/Abc f4 = 0
aBC/aBC f5 = 1
aBc/aBc f6 = 10
abC/abC f7 = 4
abc/abc f8 = 35
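Pooled genotypes   Observed count
ABC + abc   f1 + f8 = 34 + 35 = 69
ABc + abC   f2 + f7 = 5 + 4 = 9
AbC + aBc   f3 + f6 = 11 + 10 = 21
Abc + aBC   f4 + f5 = 0 + 1 = 1
Rarest genotypes (Abc + aBC) are double recombinants; the locus whose allele is switched relative to the parental types (A) lies in the middle
[Figure: parental haplotypes B A C and b a c with a crossover (X) in each interval, giving the double recombinants B a C and b A c]
The order of loci is BAC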
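The rule "rarest pooled class = double recombinants" can be sketched as follows. This assumes three loci, parental haplotypes that form the most frequent pooled class, and a unique rarest class; the function names are hypothetical.

```python
counts = {
    "ABC": 34, "ABc": 5, "AbC": 11, "Abc": 0,
    "aBC": 1, "aBc": 10, "abC": 4, "abc": 35,
}


def pool_complements(counts: dict) -> dict:
    """Pool each haplotype with its complement (e.g. Abc with aBC)."""
    pooled = {}
    for hap, n in counts.items():
        key = min(hap, hap.swapcase())  # one canonical label per pair
        pooled[key] = pooled.get(key, 0) + n
    return pooled


def middle_locus(counts: dict, loci: str = "ABC") -> str:
    """Locus that differs between the parental and double-recombinant classes."""
    pooled = pool_complements(counts)
    parental = max(pooled, key=pooled.get)  # most frequent class
    rarest = min(pooled, key=pooled.get)    # taken as the double recombinants
    differs = [i for i in range(3) if parental[i] != rarest[i]]
    if len(differs) == 1:
        return loci[differs[0]]
    # Two differences: the complement of `rarest` differs at the remaining locus.
    same = [i for i in range(3) if parental[i] == rarest[i]]
    return loci[same[0]]


print(pool_complements(counts), middle_locus(counts))
```

With A identified as the middle locus, the order is B-A-C (or its mirror, C-A-B).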
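Ordering Loci by using recombination fractions
MLEs of r are
$\hat{r}_{AB} = \frac{11 + 10 + 0 + 1}{100} = 0.22$
$\hat{r}_{AC} = \frac{5 + 4 + 0 + 1}{100} = 0.10$
$\hat{r}_{BC} = \frac{11 + 10 + 5 + 4}{100} = 0.30$
Largest r is rBC = 0.3: B and C are the outer loci (B _ C)
Smallest r is rAC = 0.1: A and C are adjacent (A C)
Order: B A C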
Minimum Sum of Adjacent Recombination Frequencies (SARF) (Falk 1989)
$SARF = \sum_{i=1}^{l-1} \hat{r}_{a_i a_j}$, with $j = i + 1$
r = recombination frequency between adjacent loci ai and aj for a given order: 1, 2, 3, …, l - 1, l
MLEs of r: $\hat{r}_{AB} = 0.22$, $\hat{r}_{AC} = 0.10$, $\hat{r}_{BC} = 0.30$
Order   SARF
ABC   0.22 + 0.30 = 0.52
BAC   0.22 + 0.10 = 0.32
ACB   0.10 + 0.30 = 0.40
The B-A-C order gives MIN[SARF] and the minimum distance (MD) map
Simulations have shown that SARF is a reliable method to obtain marker orders for large datasets
Minimum Product of Adjacent Recombination Frequencies (PARF) (Wilson 1988)
$PARF = \prod_{i=1}^{l-1} \hat{r}_{a_i a_j}$, with $j = i + 1$
r = recombination frequency between adjacent loci ai and aj for a given order: 1, 2, 3, …, l - 1, l
MLEs of r: $\hat{r}_{AB} = 0.22$, $\hat{r}_{AC} = 0.10$, $\hat{r}_{BC} = 0.30$
Order   PARF
ABC   0.22 x 0.30 = 0.066
BAC   0.22 x 0.10 = 0.022
ACB   0.10 x 0.30 = 0.030
The B-A-C order gives MIN[PARF] and the minimum distance (MD) map
SARF and PARF are equivalent methods to obtain marker orders for large datasets
Maximum Sum of Adjacent LOD Scores (SALOD)
$SALOD = \sum_{i=1}^{l-1} Z_{a_i a_j}$, with $j = i + 1$
Z = LOD score for recombination frequency between adjacent loci ai and aj for a given order: 1, 2, 3, …, l - 1, l
$\hat{r}_{AB} = 0.22$, $Z_{AB} = 3.135$; $\hat{r}_{AC} = 0.10$, $Z_{AC} = 6.942$; $\hat{r}_{BC} = 0.30$, $Z_{BC} = 1.551$
Order   SALOD
ABC   3.135 + 1.551 = 4.686
BAC   3.135 + 6.942 = 10.077
ACB   6.942 + 1.551 = 8.493
The B-A-C order gives MAX[SALOD]
SALOD is sensitive to locus informativeness
Minimum Count of Crossover Events (COUNT) (Van Os et al. 2005)
$COUNT = \sum_{i=1}^{l-1} X_{a_i a_j}$, with $j = i + 1$
X = simple count of recombination events between adjacent loci ai and aj for a given sequence: 1, 2, 3, …, l - 1, l
MLEs of r: $\hat{r}_{AB} = 0.22$, $\hat{r}_{AC} = 0.10$, $\hat{r}_{BC} = 0.30$ (N = 100, so the counts are 22, 10 and 30)
Order   COUNT
ABC   22 + 30 = 52
BAC   22 + 10 = 32
ACB   10 + 30 = 40
The B-A-C order gives MIN[COUNT]
COUNT is equivalent to SARF and PARF with perfect data.
COUNT is superior to SARF with incomplete data.
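The four criteria (SARF, PARF, SALOD, COUNT) differ only in which pairwise statistic is combined over adjacent loci and whether the result is minimized or maximized. The sketch below evaluates all four by exhaustive enumeration for the three-locus example, using the pairwise values quoted on the slides as inputs; the variable and function names are illustrative, and exhaustive enumeration is only practical for small numbers of loci.

```python
from itertools import permutations
from math import prod

# Pairwise statistics from the worked example (keys are unordered locus pairs).
R = {frozenset("AB"): 0.22, frozenset("AC"): 0.10, frozenset("BC"): 0.30}
Z = {frozenset("AB"): 3.135, frozenset("AC"): 6.942, frozenset("BC"): 1.551}
X = {frozenset("AB"): 22, frozenset("AC"): 10, frozenset("BC"): 30}


def unique_orders(loci: str) -> list:
    """All orders of the loci, keeping one of each order/mirror pair."""
    kept = []
    for p in permutations(loci):
        if p[::-1] not in kept:
            kept.append(p)
    return ["".join(p) for p in kept]


def adjacent_values(order: str, table: dict) -> list:
    """Values of a pairwise statistic for each adjacent pair in the order."""
    return [table[frozenset(pair)] for pair in zip(order, order[1:])]


rows = {
    order: {
        "SARF": sum(adjacent_values(order, R)),
        "PARF": prod(adjacent_values(order, R)),
        "SALOD": sum(adjacent_values(order, Z)),
        "COUNT": sum(adjacent_values(order, X)),
    }
    for order in unique_orders("ABC")
}

best = {
    "SARF": min(rows, key=lambda o: rows[o]["SARF"]),
    "PARF": min(rows, key=lambda o: rows[o]["PARF"]),
    "SALOD": max(rows, key=lambda o: rows[o]["SALOD"]),  # SALOD is maximized
    "COUNT": min(rows, key=lambda o: rows[o]["COUNT"]),
}
print(rows)
print(best)
```

For this data set all four criteria select the same order.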
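Locus Order: Likelihood Approach
$L(r_1, r_2, C) = \sum_{i=1}^{k} f_i \log p_i$
$Z(r_1, r_2, C) = \sum_{i=1}^{k} f_i \log(4 p_i)$
r1 = Recombination fraction in interval 1
r2 = Recombination fraction in interval 2
C = Coefficient of coincidence
fi = Observed count of the ith pooled phenotypic class
pi = Expected frequency of the ith pooled phenotypic class (pi = fi / n under the unconstrained model)
i = 1, 2, …, k
k = No. of pooled phenotypic classes
Logarithms are base 10 in the worked examples.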
ABC ORDER (r1 = rAB = 0.22, r2 = rBC = 0.30)
Haplotypes   Obs. No.   Freq. (C = 3.18)   Exp. freq.   Exp. freq. (C = 0)   Exp. freq. (C = 1)
ABC + abc   f1 = 69   0.69   1 - r1 - r2 + C r1 r2   1 - 0.22 - 0.30 = 0.48   1 - 0.22 - 0.30 + 0.066 = 0.546
ABc + abC   f2 = 9   0.09   r2 - C r1 r2   0.30   0.30 - 0.066 = 0.234
AbC + aBc   f3 = 21   0.21   C r1 r2   0.00   0.066
Abc + aBC   f4 = 1   0.01   r1 - C r1 r2   0.22   0.22 - 0.066 = 0.154
BAC ORDER (r1 = rAB = 0.22, r2 = rAC = 0.10)
Haplotypes   Obs. No.   Freq. (C = 0.45)   Exp. freq.   Exp. freq. (C = 0)   Exp. freq. (C = 1)
ABC + abc   f1 = 69   0.69   1 - r1 - r2 + C r1 r2   1 - 0.22 - 0.10 = 0.68   1 - 0.22 - 0.10 + 0.022 = 0.702
ABc + abC   f2 = 9   0.09   r2 - C r1 r2   0.10   0.10 - 0.022 = 0.078
AbC + aBc   f3 = 21   0.21   r1 - C r1 r2   0.22   0.22 - 0.022 = 0.198
Abc + aBC   f4 = 1   0.01   C r1 r2   0.00   0.022
ACB ORDER (r1 = rAC = 0.10, r2 = rBC = 0.30)
Haplotypes   Obs. No.   Freq. (C = 3.00)   Exp. freq.   Exp. freq. (C = 0)   Exp. freq. (C = 1)
ABC + abc   f1 = 69   0.69   1 - r1 - r2 + C r1 r2   1 - 0.10 - 0.30 = 0.60   1 - 0.10 - 0.30 + 0.03 = 0.63
ABc + abC   f2 = 9   0.09   C r1 r2   0.00   0.03
AbC + aBc   f3 = 21   0.21   r2 - C r1 r2   0.30   0.30 - 0.03 = 0.27
Abc + aBC   f4 = 1   0.01   r1 - C r1 r2   0.10   0.10 - 0.03 = 0.07
ABC ORDER
Haplotypes   Obs. No.   pi (C = 3.18)   pi (C = 1)
ABC + abc   f1 = 69   0.69   0.546
ABc + abC   f2 = 9   0.09   0.234
AbC + aBc   f3 = 21   0.21   0.066
Abc + aBC   f4 = 1   0.01   0.154
$L(0.22, 0.30, 3.18) = 69 \log 0.69 + 9 \log 0.09 + 21 \log 0.21 + 1 \log 0.01 = -36.764$
$L(0.22, 0.30, 1.0) = 69 \log 0.546 + 9 \log 0.234 + 21 \log 0.066 + 1 \log 0.154 = -49.413$
$Z(0.22, 0.30, 3.18) = 69 \log(4 \times 0.69) + 9 \log(4 \times 0.09) + 21 \log(4 \times 0.21) + 1 \log(4 \times 0.01) = 23.441$
$Z(0.22, 0.30, 1.0) = 69 \log(4 \times 0.546) + 9 \log(4 \times 0.234) + 21 \log(4 \times 0.066) + 1 \log(4 \times 0.154) = 10.793$
BAC ORDER
Haplotypes   Obs. No.   pi (C = 0.45)   pi (C = 1)
ABC + abc   f1 = 69   0.69   0.702
ABc + abC   f2 = 9   0.09   0.078
AbC + aBc   f3 = 21   0.21   0.198
Abc + aBC   f4 = 1   0.01   0.022
$L(0.22, 0.10, 0.45) = 69 \log 0.69 + 9 \log 0.09 + 21 \log 0.21 + 1 \log 0.01 = -36.764$
$L(0.22, 0.10, 1.0) = 69 \log 0.702 + 9 \log 0.078 + 21 \log 0.198 + 1 \log 0.022 = -37.002$
$Z(0.22, 0.10, 0.45) = 69 \log(4 \times 0.69) + 9 \log(4 \times 0.09) + 21 \log(4 \times 0.21) + 1 \log(4 \times 0.01) = 23.441$
$Z(0.22, 0.10, 1.0) = 69 \log(4 \times 0.702) + 9 \log(4 \times 0.078) + 21 \log(4 \times 0.198) + 1 \log(4 \times 0.022) = 23.205$
ACB ORDER
Haplotypes   Obs. No.   pi (C = 3.00)   pi (C = 1)
ABC + abc   f1 = 69   0.69   0.63
ABc + abC   f2 = 9   0.09   0.03
AbC + aBc   f3 = 21   0.21   0.27
Abc + aBC   f4 = 1   0.01   0.07
$L(0.10, 0.30, 3.0) = 69 \log 0.69 + 9 \log 0.09 + 21 \log 0.21 + 1 \log 0.01 = -36.764$
$L(0.10, 0.30, 1.0) = 69 \log 0.63 + 9 \log 0.03 + 21 \log 0.27 + 1 \log 0.07 = -40.648$
$Z(0.10, 0.30, 3.0) = 69 \log(4 \times 0.69) + 9 \log(4 \times 0.09) + 21 \log(4 \times 0.21) + 1 \log(4 \times 0.01) = 23.441$
$Z(0.10, 0.30, 1.0) = 69 \log(4 \times 0.63) + 9 \log(4 \times 0.03) + 21 \log(4 \times 0.27) + 1 \log(4 \times 0.07) = 19.558$
Likelihood method
Order   C   Unconstrained: Likelihood   Unconstrained: LOD   Constrained (C = 1): Likelihood   Constrained (C = 1): LOD
ABC   3.18   -36.764   23.441   -49.413   10.793
BAC   0.45   -36.764   23.441   -37.001   23.204
ACB   3.00   -36.764   23.441   -40.648   19.558
The B-A-C order gives the highest likelihood and LOD under a no-interference (C = 1) model.
Most multipoint ML mapping algorithms use no-interference models.
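The comparison in the table can be reproduced from the pooled counts and the pairwise recombination fractions. The following is a sketch, assuming base-10 logarithms (which match the tabulated values) and parental haplotypes ABC and abc; all function names are illustrative. Results agree with the table to within rounding in the last digit.

```python
from math import log10

# Pooled haplotype counts (each key stands for the haplotype and its complement).
POOLED = {"ABC": 69, "ABc": 9, "AbC": 21, "Abc": 1}
R = {frozenset("AB"): 0.22, frozenset("AC"): 0.10, frozenset("BC"): 0.30}


def crossovers(hap: str, order: str) -> tuple:
    """(recombinant in interval 1, recombinant in interval 2) for this order."""
    from_parent1 = {locus: hap["ABC".index(locus)].isupper() for locus in "ABC"}
    left, mid, right = order
    return (from_parent1[left] != from_parent1[mid],
            from_parent1[mid] != from_parent1[right])


def interval_r(order: str) -> tuple:
    left, mid, right = order
    return R[frozenset((left, mid))], R[frozenset((mid, right))]


def c_hat(order: str) -> float:
    """Observed / expected double crossovers for this order."""
    r1, r2 = interval_r(order)
    n = sum(POOLED.values())
    dc_obs = sum(f for h, f in POOLED.items() if crossovers(h, order) == (True, True))
    return dc_obs / (r1 * r2 * n)


def expected(order: str, c: float) -> dict:
    """Expected pooled-class frequencies for this order and coincidence c."""
    r1, r2 = interval_r(order)
    dc = c * r1 * r2
    by_class = {(False, False): 1 - r1 - r2 + dc, (True, False): r1 - dc,
                (False, True): r2 - dc, (True, True): dc}
    return {h: by_class[crossovers(h, order)] for h in POOLED}


def log_likelihood(p: dict) -> float:
    return sum(f * log10(p[h]) for h, f in POOLED.items())


def lod(p: dict) -> float:
    return sum(f * log10(4 * p[h]) for h, f in POOLED.items())


results = {}
for order in ("ABC", "BAC", "ACB"):
    c = c_hat(order)
    free, fixed = expected(order, c), expected(order, 1.0)
    results[order] = (c, log_likelihood(free), lod(free),
                      log_likelihood(fixed), lod(fixed))
    print(order, [round(v, 3) for v in results[order]])
```

The unconstrained fit reproduces the observed frequencies for every order, so it cannot discriminate among orders; only the constrained (C = 1) likelihood separates them.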
Ordering Loci: software
• GMENDEL (Liu and Knapp 1990) minimizes SARF (Minimum Sum of Adjacent Recombination Frequencies)
• PGRI (Lu and Liu 1995) minimizes SARF (Minimum Sum of Adjacent Recombination Frequencies) or maximizes the likelihood
• RECORD (Van Os et al. 2005) minimizes COUNT (Minimum Count of Crossover Events)
• JoinMap 4 (Van Ooijen, 2005)
– Regression mapping: least-squares locus order found by a stepwise search
– Monte Carlo maximum likelihood (ML): very fast computation of high density maps