;    1                10                  20        25
;    |                 |                   |         |
GM01 A A A A A A A A A A A A A A A A B B B B B B B B B
GM02 A A A A A A A A A A A A A A A B B B B B B B B B B
GM03 A A A A A A A A A A A A A B B B B B B B B B B B B
GM04 A A A A A A A A A A A B B B B B B B B B B B B B B
GM05 A A A A A A A A A A B B B B B B B B B B B B B B B
GM06 A A A A A A A A A B B B B B B B B B B B B B B B B
GM07 A A A A A A A A A B B B B B B B B B B B B B B A A
GM08 A A A A A A A A A B B B B B B B B B B B B B A A A
GM09 A A A A A A A A A B B B B B B B B B B B A A A A A
GM10 B A A A A A A A A A B B B B B B B B B A A A A A A
GM11 B B A A A A A A A A B B B B B B B B A A A A A A A
GM12 B B B A A A A A A A B B B B B B B A A A A A A A A
Locus file
Mapping using Recombinant Inbred Lines
Genetic Cross
Genotyping
Raw Marker Scores
Mapping – Inference of linear order of markers using raw scores
MadMapper_RECBIT – Quality control of genetic markers and group analysis
MadMapper_XDELTA – Inference of linear order of markers on linkage groups
CheckMatrix (py_matrix_2D_V248_RECBIT.py) – Visualization and validation of genetic maps using two-dimensional heat plots and graphical genotyping
MadMapper and CheckMatrix are multi-platform Python programs that can be used on UNIX, Windows, and Mac OS X.
Detailed analysis (quality control and clustering) can be done on a set of ~2,000 markers.
Map construction works in a reasonable timeframe with up to ~500 markers.
Large images (up to 10,000 x 10,000 pixels) can visualize up to ~2 million pairwise scores simultaneously.
MadMapper_RECBIT input and output files
Group Info: [ *.group_info ]
one file per iteration; 16 iterations with different cutoff values
Adjacency List: [ *.adj_list ]
one file per iteration; 16 iterations with different cutoff values
Recombination Distance Scores: [ *.pairs_all ]
...................
GM01 GM07 0.36
GM01 GM08 0.40
GM01 GM09 0.48
GM01 GM10 0.52
GM01 GM11 0.60
GM01 GM12 0.68
GM02 GM01 0.04
GM02 GM02 0.00
GM02 GM03 0.08
GM02 GM04 0.16
GM02 GM05 0.20
GM02 GM06 0.24
...................
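The *.pairs_all values above can be reproduced from the locus file at the top of this section. For each pair of RIL markers, count the RILs that carry different alleles and divide by the number of RILs scored for both markers. For example, GM01 and GM07 differ in 9 of 25 RILs, which gives 0.36. The sketch below assumes a whitespace-separated locus layout (marker ID followed by one score per RIL, ';' for comment lines) and treats '-' as a missing score. `parse_locus`, `rec_fraction` and `pairs_all` are hypothetical names, not MadMapper functions.

```python
# Hypothetical sketch (not MadMapper code): reproduce *.pairs_all-style
# recombination fractions from a locus file with 'A'/'B' scores.
# Assumption: '-' marks a missing score.
LOCUS = """\
; example locus file: marker ID followed by one score per RIL
GM01 A A A A A A A A A A A A A A A A B B B B B B B B B
GM02 A A A A A A A A A A A A A A A B B B B B B B B B B
GM03 A A A A A A A A A A A A A B B B B B B B B B B B B
GM04 A A A A A A A A A A A B B B B B B B B B B B B B B
GM05 A A A A A A A A A A B B B B B B B B B B B B B B B
GM06 A A A A A A A A A B B B B B B B B B B B B B B B B
GM07 A A A A A A A A A B B B B B B B B B B B B B B A A
GM08 A A A A A A A A A B B B B B B B B B B B B B A A A
GM09 A A A A A A A A A B B B B B B B B B B B A A A A A
GM10 B A A A A A A A A A B B B B B B B B B A A A A A A
GM11 B B A A A A A A A A B B B B B B B B A A A A A A A
GM12 B B B A A A A A A A B B B B B B B A A A A A A A A
"""


def parse_locus(text):
    """Return {marker_id: [score, ...]}, skipping blank and ';' comment lines."""
    markers = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue
        name, *scores = line.split()
        markers[name] = scores
    return markers


def rec_fraction(a, b, missing="-"):
    """Share of RILs with different alleles among RILs scored for both markers."""
    pairs = [(x, y) for x, y in zip(a, b) if missing not in (x, y)]
    if not pairs:
        return None  # no RIL is scored for both markers
    return sum(x != y for x, y in pairs) / len(pairs)


def pairs_all(markers):
    """Yield (marker1, marker2, rec) for every ordered pair, including self-pairs."""
    for m1, s1 in markers.items():
        for m2, s2 in markers.items():
            yield m1, m2, rec_fraction(s1, s2)


markers = parse_locus(LOCUS)
for m1, m2, rec in pairs_all(markers):
    if m1 == "GM02" and m2 in ("GM01", "GM02", "GM03"):
        print(m1, m2, f"{rec:.2f}")
# GM02 GM01 0.04
# GM02 GM02 0.00
# GM02 GM03 0.08
```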
Group Info Summary: file [ *.x_tree_clust ]
Summary of clustering results for all 16 iterations
Distinct linkage groups can be inferred by analysis of this clustering / grouping information
Non-Redundant Marker Scores: [ *.z_nr_scores.loc ]
locus file with non-redundant raw marker scores
[ example content: the same scores for markers GM01 to GM12 as in the input locus file shown above ]
INPUT: Locus file
Python_MadMapper_V248_RECBIT_012.py
Marker Summary: [ *.z_marker_sum ]
for each marker, a ‘quality class’ is assigned, which is useful for the selection of ‘core’ markers
Marker Scores Info: [ *.x_scores_stat ]
detailed information about scores and linkage
Trio Analysis: [ *.z_trio_good ] [ *.z_trio_best ] [ *.z_trios_bad ]
analysis of all trios (triplets) for the non-redundant set of markers
LOG file: ( *.x_log_file )
information about run parameters
one input file - locus file with raw marker scores
82 output files
The MadMapper BIT scoring system is used as an alternative to LOD scores to quantify the confidence of linkage between markers
JoinMap LOD scores JoinMap REC scores
MadMapper BIT scores MadMapper REC scores
Arabidopsis Genetic Map (Dean and Lister), five linkage groups: Comparison of Different Scoring Systems
MadMapper_RECBIT Clustering: Group Info Summary [ *.x_tree_clust file ] provides information about marker grouping, i.e. the assignment of each marker to a specific linkage group
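The exact clustering rule is not spelled out here. To illustrate how a marker's group membership can change from one cutoff iteration to the next, the sketch below assumes single-linkage grouping: it takes the connected components of the graph that joins markers whose recombination fraction is at or below the cutoff. This is an assumption. MadMapper may cluster on BIT scores or use a different rule, and `linkage_groups` is a hypothetical name. On the example locus file, this sketch gives 8 groups at a cutoff of 0.05, two groups (GM01 to GM09 and GM10 to GM12) at 0.10, and a single group at 0.15.

```python
# Hypothetical sketch: single-linkage grouping of markers at increasing cutoffs.
# Assumption: two markers are joined when their recombination fraction is at or
# below the cutoff; MadMapper's own clustering criterion may differ.
LOCUS = {
    "GM01": "A A A A A A A A A A A A A A A A B B B B B B B B B",
    "GM02": "A A A A A A A A A A A A A A A B B B B B B B B B B",
    "GM03": "A A A A A A A A A A A A A B B B B B B B B B B B B",
    "GM04": "A A A A A A A A A A A B B B B B B B B B B B B B B",
    "GM05": "A A A A A A A A A A B B B B B B B B B B B B B B B",
    "GM06": "A A A A A A A A A B B B B B B B B B B B B B B B B",
    "GM07": "A A A A A A A A A B B B B B B B B B B B B B B A A",
    "GM08": "A A A A A A A A A B B B B B B B B B B B B B A A A",
    "GM09": "A A A A A A A A A B B B B B B B B B B B A A A A A",
    "GM10": "B A A A A A A A A A B B B B B B B B B A A A A A A",
    "GM11": "B B A A A A A A A A B B B B B B B B A A A A A A A",
    "GM12": "B B B A A A A A A A B B B B B B B A A A A A A A A",
}


def rec_fraction(a, b, missing="-"):
    """Share of RILs with different alleles among RILs scored for both markers."""
    pairs = [(x, y) for x, y in zip(a, b) if missing not in (x, y)]
    return sum(x != y for x, y in pairs) / len(pairs) if pairs else None


def linkage_groups(markers, cutoff):
    """Connected components of the graph joining markers with rec <= cutoff."""
    parent = {m: m for m in markers}

    def find(m):
        # Walk up to the component root, halving the path on the way.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    names = list(markers)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            rec = rec_fraction(markers[a], markers[b])
            if rec is not None and rec <= cutoff:
                parent[find(a)] = find(b)

    groups = {}
    for m in names:
        groups.setdefault(find(m), []).append(m)
    return sorted(groups.values())


markers = {name: scores.split() for name, scores in LOCUS.items()}
for cutoff in (0.05, 0.10, 0.15):  # one "iteration" per cutoff
    groups = linkage_groups(markers, cutoff)
    print(f"cutoff {cutoff:.2f}: {len(groups)} group(s)", [g[0] for g in groups])
```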
MadMapper_RECBIT BIN Analysis distinguishes true bins from linked groups
M_1 A A A B B B A A A A B B B B A A - A A B B B B A B B A A B A A A B B B B
M_2 A A A B B B A - A A B B B B A A A A A B B B B A B B A A B A A A B B B B
M_3 A A A B B B A A A A B B - B A A A A A B B B B A B B A A B A A A B B B B
M_4 A A A B B B A A A A B B A B A A A A A B B - B A B B A A B A A A B B B B
[ Figure: markers M_1, M_2, M_3 and M_4 drawn as graph nodes; labels: Linked Group, Saturated Node, Diluted Node ]
Example of Complete Graph: all nodes are ‘saturated’
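One way to read the BIN example is as a graph. Markers are nodes, and two markers are joined when their scores never conflict at RILs scored for both of them (missing '-' scores are ignored). M_4 carries an 'A' at the 13th RIL, where M_1 and M_2 carry 'B'. The four markers therefore form a linked group rather than a complete graph, even though every marker is compatible with M_3. In the sketch below, the edge rule (zero conflicts) and the reading of ‘saturated’ as "connected to all other nodes" are assumptions inferred from the figure labels, not MadMapper's documented definitions.

```python
from itertools import combinations

# Scores of the four markers from the BIN analysis example above.
SCORES = {
    "M_1": "A A A B B B A A A A B B B B A A - A A B B B B A B B A A B A A A B B B B",
    "M_2": "A A A B B B A - A A B B B B A A A A A B B B B A B B A A B A A A B B B B",
    "M_3": "A A A B B B A A A A B B - B A A A A A B B B B A B B A A B A A A B B B B",
    "M_4": "A A A B B B A A A A B B A B A A A A A B B - B A B B A A B A A A B B B B",
}


def compatible(a, b, missing="-"):
    """True if two markers never disagree at RILs scored for both of them."""
    return all(x == y for x, y in zip(a, b) if missing not in (x, y))


def bin_graph(scores):
    """Adjacency sets: an edge joins every pair of compatible markers."""
    markers = {name: s.split() for name, s in scores.items()}
    edges = {name: set() for name in markers}
    for a, b in combinations(markers, 2):
        if compatible(markers[a], markers[b]):
            edges[a].add(b)
            edges[b].add(a)
    return edges


def classify_bin(edges):
    """Saturated nodes link to all other nodes; a true bin is a complete graph."""
    n = len(edges)
    saturated = sorted(m for m, nb in edges.items() if len(nb) == n - 1)
    diluted = sorted(m for m, nb in edges.items() if len(nb) < n - 1)
    return len(saturated) == n, saturated, diluted


edges = bin_graph(SCORES)
is_true_bin, saturated, diluted = classify_bin(edges)
print(is_true_bin, saturated, diluted)
# False ['M_3'] ['M_1', 'M_2', 'M_4']
```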
MadMapper_RECBIT Marker Summary [ *.z_marker_sum file ] provides info about redundancy of scores, marker qualities, and allele distortion
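Two of the per-marker quantities mentioned for *.z_marker_sum are easy to illustrate:

- **Allele composition:** how far the A:B ratio departs from the 1:1 expected at each locus in RILs.
- **Redundancy:** markers with identical scores, which collapse into one entry of the non-redundant set.

This is a sketch under assumptions, not the file's actual column definitions. `allele_summary` and `redundancy_groups` are hypothetical names, and GM12_copy is an invented marker added only to show a redundant pair.

```python
from collections import Counter

# GM01 and GM12 from the example locus file, plus an invented marker
# (GM12_copy) with scores identical to GM12 to illustrate redundancy.
MARKERS = {
    "GM01": "A A A A A A A A A A A A A A A A B B B B B B B B B".split(),
    "GM12": "B B B A A A A A A A B B B B B B B A A A A A A A A".split(),
    "GM12_copy": "B B B A A A A A A A B B B B B B B A A A A A A A A".split(),
}


def allele_summary(scores, missing="-"):
    """Return (#A, #B, #missing, fraction of B among scored RILs)."""
    counts = Counter(scores)
    scored = counts["A"] + counts["B"]
    frac_b = counts["B"] / scored if scored else None
    return counts["A"], counts["B"], counts[missing], frac_b


def redundancy_groups(markers):
    """Map the first marker of each identical-score group to its duplicates."""
    groups = {}
    for name, scores in markers.items():
        groups.setdefault(tuple(scores), []).append(name)
    return {names[0]: names[1:] for names in groups.values()}


for name, scores in MARKERS.items():
    print(name, allele_summary(scores))
print(redundancy_groups(MARKERS))
# {'GM01': [], 'GM12': ['GM12_copy']}
```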
MARKER_Flank_1 REC1 BIT1 D_FR1 MARKER_Middle REC2 BIT2 D_FR2 MARKER_Flank_2 REC_Flank BIT_F D_FR_F D_REC D_REC_Flank
COR47 0.1 336 0.6931 CAT3 0.0857 348 0.6931 G2395 *** 0.0253 450 0.7822 *** 5 + 0
G2395 0.0857 348 0.6931 CAT3 0.1 336 0.6931 COR47 *** 0.0253 450 0.7822 *** 5 + 0
LK141 0.1522 192 0.4554 GUT15 0.193 210 0.5644 MI238 *** 0.1231 294 0.6436 *** 4 + 0
MI238 0.193 210 0.5644 GUT15 0.1522 192 0.4554 LK141 *** 0.1231 294 0.6436 *** 4 + 0
MI204 0.051 528 0.9703 MI51 0.0806 312 0.6139 SGCSNP41 *** 0.0161 360 0.6139 *** 4 + 0
SGCSNP41 0.0806 312 0.6139 MI51 0.051 528 0.9703 MI204 *** 0.0161 360 0.6139 *** 4 + 0
M336 0.0494 438 0.802 COR15 0.0385 432 0.7723 VE018 *** 0.0879 450 0.901 *** 0 + 0
VE018 0.0385 432 0.7723 COR15 0.0494 438 0.802 M336 *** 0.0879 450 0.901 *** 0 + 0
ARR7 0.0115 510 0.8614 COR47 0 282 0.4653 F15571 *** 0.0179 324 0.5545 *** 0 + 0
F15571 0 282 0.4653 COR47 0.0115 510 0.8614 ARR7 *** 0.0179 324 0.5545 *** 0 + 0
PAP3 0.0227 504 0.8713 COR78 0.0577 276 0.5149 PDC2 *** 0.0588 270 0.505 *** 0 + 0
PDC2 0.0577 276 0.5149 COR78 0.0227 504 0.8713 PAP3 *** 0.0588 270 0.505 *** 0 + 0
[ Bad Trios: first six rows above; Good Trios: last six rows ]
MadMapper_RECBIT Trio (Triplet) Analysis
Number of double crossovers should be low for ‘good’ trios
Number of double crossovers is high for ‘bad’ trios
M_1 A A A B B B A A A A B B B B A A - A A B B B B A B B A A B A A A B B B B
            X                     X           X                         X
M_M A A A B A B A - A A B B B B A B A A A B B A B A B B A A B A A A B B A B
            X                     X           X                         X
M_2 A A A B B B A A A A B B - B A A A A A B B B B A B B A A B A A A B B B B
M_M: ‘middle’ marker; M_1: flanking marker 1; M_2: flanking marker 2
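The X marks in the trio example can be counted directly. A RIL indicates a double crossover when the two flanking markers carry the same allele but the middle marker carries the other one. The minimal sketch below assumes that RILs with a missing score at any of the three markers are skipped. For the M_1 / M_M / M_2 trio it finds the four double crossovers marked with X. `double_crossovers` is a hypothetical name.

```python
# Trio from the example above: flanking marker 1, 'middle' marker, flanking marker 2.
M_1 = "A A A B B B A A A A B B B B A A - A A B B B B A B B A A B A A A B B B B".split()
M_M = "A A A B A B A - A A B B B B A B A A A B B A B A B B A A B A A A B B A B".split()
M_2 = "A A A B B B A A A A B B - B A A A A A B B B B A B B A A B A A A B B B B".split()


def double_crossovers(flank1, middle, flank2, missing="-"):
    """1-based RIL positions where both flanks agree but the middle marker differs."""
    positions = []
    for i, (a, m, b) in enumerate(zip(flank1, middle, flank2), start=1):
        if missing in (a, m, b):
            continue  # skip RILs with a missing score at any of the three markers
        if a == b and m != a:
            positions.append(i)
    return positions


dco = double_crossovers(M_1, M_M, M_2)
print(len(dco), dco)  # 4 [5, 16, 22, 35]
```

A high count for a trio suggests that the middle marker is mis-scored or misplaced, which matches the ‘bad’ trios described above.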
MadMapper_XDELTA Usage:
MadMapper_XDELTA takes three files as input:
1. Matrix (pairwise distances between markers)
2. List of ‘frame’ markers
3. List of markers to map
First step: finding the best map for ‘frame’ markers by checking all possible orders (up to 10 markers)
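The exhaustive frame search can be sketched with `itertools.permutations`. A map and its mirror image have the same ‘delta’, so only one of each mirror pair is scored, which is why three markers need only three comparisons. The sketch assumes that ‘delta’ is the sum of absolute differences between horizontally adjacent cells of the reordered distance matrix. With recombination fractions computed from GM02, GM06 and GM10 of the example locus file, this reproduces the values 1.52, 1.92 and 1.68 in the example run output, as well as the per-marker values (delta divided by the number of markers). The function names are hypothetical.

```python
from itertools import permutations

# GM02, GM06 and GM10 from the example locus file (no missing scores).
LOCUS = {
    "GM02": "A A A A A A A A A A A A A A A B B B B B B B B B B",
    "GM06": "A A A A A A A A A B B B B B B B B B B B B B B B B",
    "GM10": "B A A A A A A A A A B B B B B B B B B A A A A A A",
}


def rec_fraction(a, b):
    """Share of RILs with different alleles (assumes no missing scores)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def delta(order, dist):
    """Sum of |differences| between horizontally adjacent cells of the
    distance matrix with rows and columns arranged in `order`."""
    total = 0.0
    for r in order:
        row = [dist[r][c] for c in order]
        total += sum(abs(x - y) for x, y in zip(row, row[1:]))
    return total


def score_all_orders(names, dist):
    """Score every order once; an order and its reverse have the same delta."""
    return [
        (delta(order, dist), order)
        for order in permutations(names)
        if order[0] < order[-1]  # keep one of each mirror-image pair
    ]


scores = {m: s.split() for m, s in LOCUS.items()}
dist = {a: {b: rec_fraction(scores[a], scores[b]) for b in scores} for a in scores}
results = score_all_orders(sorted(scores), dist)
best_delta, best_order = min(results)
for d, order in results:
    # delta and delta per marker, as in the example run output
    print(" ".join(order), round(d, 2), round(d / len(order), 4))
```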
Optionally, an unlimited list of ‘frame’ markers with a fixed order can be supplied.
Best-Fit extension:
Take one marker from the list of markers to map and insert it into the two-dimensional matrix of the current best map, checking all possible positions. Calculate ‘delta’ for each position and keep the map with the lowest ‘delta’ value (lowest ‘entropy’).
Move to the next marker to map until all markers are mapped. Optional shuffling (ripple) can be applied after several steps.
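The best-fit extension loop can be sketched as follows, using the same assumed ‘delta’ (the sum of absolute differences between horizontally adjacent matrix cells). Starting from the frame GM02 GM06 GM10 and inserting GM03, GM08 and GM09 in turn, the sketch places them at positions 2, 4 and 5 with deltas 2.0, 2.56 and 3.12. These match the lowest values in the example run output. The optional ripple step is not implemented, and `best_fit_extension` is a hypothetical name.

```python
# Six markers from the example locus file (no missing scores).
LOCUS = {
    "GM02": "A A A A A A A A A A A A A A A B B B B B B B B B B",
    "GM03": "A A A A A A A A A A A A A B B B B B B B B B B B B",
    "GM06": "A A A A A A A A A B B B B B B B B B B B B B B B B",
    "GM08": "A A A A A A A A A B B B B B B B B B B B B B A A A",
    "GM09": "A A A A A A A A A B B B B B B B B B B B A A A A A",
    "GM10": "B A A A A A A A A A B B B B B B B B B A A A A A A",
}


def rec_fraction(a, b):
    """Share of RILs with different alleles (assumes no missing scores)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def delta(order, dist):
    """Sum of |differences| between horizontally adjacent cells of the reordered matrix."""
    total = 0.0
    for r in order:
        row = [dist[r][c] for c in order]
        total += sum(abs(x - y) for x, y in zip(row, row[1:]))
    return total


def best_fit_extension(frame, to_map, dist):
    """Insert markers one at a time at the position with the lowest delta.
    The optional ripple/shuffle step is not included in this sketch."""
    order = list(frame)
    history = []
    for marker in to_map:
        candidates = []
        for pos in range(len(order) + 1):
            trial = order[:pos] + [marker] + order[pos:]
            candidates.append((delta(trial, dist), pos, trial))
        best_delta, best_pos, order = min(candidates)
        history.append((marker, best_pos + 1, round(best_delta, 2)))
    return order, history


scores = {m: s.split() for m, s in LOCUS.items()}
dist = {a: {b: rec_fraction(scores[a], scores[b]) for b in scores} for a in scores}
order, history = best_fit_extension(["GM02", "GM06", "GM10"], ["GM03", "GM08", "GM09"], dist)
print(" ".join(order))
for marker, position, d in history:
    print(f"{marker} inserted at position {position} (delta {d})")
```

Each insertion costs one delta evaluation per candidate position. This is why extension scales far better than the exhaustive search used for the frame.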
Visual Explanation of Minimum Entropy Approach to Infer Linear Order Using MadMapper_XDELTA program
CheckMatrix 2D plot panels: random order (high ‘entropy’); partially wrong order; right order (low ‘entropy’)
MadMapper_XDELTA analyzes two-dimensional matrices of all pairwise scores and finds the best map that has a minimum total sum of differences between adjacent cells (map with the lowest ‘entropy’).
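Written out, the ‘delta’ is the sum of absolute differences between horizontally neighboring cells of the distance matrix d, after its rows and columns are put in the candidate order π. The per-marker value printed in the example run output is Δ/n. This formula is inferred from the example numbers rather than taken from the MadMapper source. It reproduces 1.52 and 0.5067 for GM02 GM06 GM10 when the recombination fractions are computed from the example locus file:

```latex
\Delta(\pi) = \sum_{i=1}^{n} \sum_{j=1}^{n-1}
  \left| d_{\pi_i \pi_j} - d_{\pi_i \pi_{j+1}} \right|,
\qquad \bar{\Delta}(\pi) = \frac{\Delta(\pi)}{n}

% Example: \pi = (GM02, GM06, GM10); d(02,06) = 0.24, d(02,10) = 0.48, d(06,10) = 0.32
\begin{aligned}
\text{row GM02:}\quad & |0 - 0.24| + |0.24 - 0.48| = 0.48 \\
\text{row GM06:}\quad & |0.24 - 0| + |0 - 0.32| = 0.56 \\
\text{row GM10:}\quad & |0.48 - 0.32| + |0.32 - 0| = 0.48 \\
\Delta(\pi) &= 0.48 + 0.56 + 0.48 = 1.52, \qquad \bar{\Delta}(\pi) = 1.52 / 3 \approx 0.5067
\end{aligned}
```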
Visualization of numerical data using CheckMatrix
=============================================
MATRIX (ALL PAIRS) : madmapper_test_small.out.pairs_all
MARKERS TO MAP : madmapper_test_small.list
FRAME MARKERS LIST : madmapper_test_small.frame
OUTPUT MAP FILE : madmapper_test_small.xdelta
MAX FRAME LENGTH : 12
FIXED FRAME ORDER : FALSE
LINKAGE GROUP ID : LG
DUMMY DEBUG : TRUE
=============================================
=======
GM02 GM06 GM10 *** 1.52 *** 0.5067 *** 1
GM02 GM10 GM06 *** 1.92 *** 0.64 *** 2
GM06 GM02 GM10 *** 1.68 *** 0.56 *** 3
=======
GM03 GM02 GM06 GM10 *** 2.16 *** 0.54 *** 1
GM02 GM03 GM06 GM10 *** 2.0 *** 0.5 *** 2
GM02 GM06 GM03 GM10 *** 2.64 *** 0.66 *** 3
GM02 GM06 GM10 GM03 *** 3.2 *** 0.8 *** 4
=======
GM08 GM02 GM03 GM06 GM10 *** 3.64 *** 0.728 *** 1
GM02 GM08 GM03 GM06 GM10 *** 4.32 *** 0.864 *** 2
GM02 GM03 GM08 GM06 GM10 *** 3.28 *** 0.656 *** 3
GM02 GM03 GM06 GM08 GM10 *** 2.56 *** 0.512 *** 4
GM02 GM03 GM06 GM10 GM08 *** 3.16 *** 0.632 *** 5
=======
GM09 GM02 GM03 GM06 GM08 GM10 *** 4.8 *** 0.8 *** 1
GM02 GM09 GM03 GM06 GM08 GM10 *** 5.92 *** 0.9867 *** 2
GM02 GM03 GM09 GM06 GM08 GM10 *** 4.72 *** 0.7867 *** 3
GM02 GM03 GM06 GM09 GM08 GM10 *** 3.76 *** 0.6267 *** 4
GM02 GM03 GM06 GM08 GM09 GM10 *** 3.12 *** 0.52 *** 5
GM02 GM03 GM06 GM08 GM10 GM09 *** 3.52 *** 0.5867 *** 6
Example of the construction of a framework map and Best-Fit Extension for the remaining markers:
map calculated by checking all possible combinations
marker GM03 was inserted
marker GM09 was inserted
marker GM08 was inserted
LG MARKER POS #1# DST1 #2# DST2 #3# DST3 #S# SUMM #D# DIFF STATUS CLASS
2 G4553 0 #1# 0 #2# NNNNNN #3# NNNNNN #S# NNNNNN #D# NNNNNN NNNNNN NNNNN
2 M246 1 #1# 0.043 #2# 0.0213 #3# 0.0778 #S# 0.0643 #D# -0.0135 GOOD __0__
2 MI320 2 #1# 0.0213 #2# 0 #3# 0.0211 #S# 0.0213 #D# 0.0002 GOOD __0__
.. … .. .. … .. … .. … .. … .. … .. …
2 NGA1126 26 #1# 0.0225 #2# 0.0645 #3# 0.0702 #S# 0.087 #D# 0.0168 GOOD __0__
2 SGCSNP135 27 #1# 0.0645 #2# 0.0645 #3# 0.0842 #S# 0.129 #D# 0.0448 GOOD __0__
2 MI54 28 #1# 0.0645 #2# 0.0211 #3# 0.0968 #S# 0.0856 #D# -0.0112 GOOD __0__
2 VE014 29 #1# 0.0211 #2# 0.0532 #3# 0.0745 #S# 0.0743 #D# -0.0002 GOOD __0__
2 M283 30 #1# 0.0532 #2# 0.1803 #3# 0.1833 #S# 0.2335 #D# 0.0502 GOOD __1__
2 SGCSNP333 31 #1# 0.1803 #2# 0.1167 #3# 0.0968 #S# 0.297 #D# 0.2002 GOOD LARGE
2 SGCSNP210 32 #1# 0.1167 #2# 0 #3# 0.1154 #S# 0.1167 #D# 0.0013 GOOD __0__
2 COP1 33 #1# 0 #2# 0.0196 #3# 0.0263 #S# 0.0196 #D# -0.0067 GOOD __0__
2 SPL3 34 #1# 0.0196 #2# 0.04 #3# 0.0339 #S# 0.0596 #D# 0.0257 GOOD __0__
2 C4H 35 #1# 0.04 #2# 0.0227 #3# 0.0303 #S# 0.0627 #D# 0.0324 GOOD __0__
.. … .. .. … .. … .. … .. … .. … .. …
2 M336 54 #1# 0.0519 #2# 0.0625 #3# 0 #S# 0.1144 #D# 0.1144 GOOD __X__
2 UBIQUE 55 #1# 0.0625 #2# 0.0526 #3# 0.0619 #S# 0.1151 #D# 0.0532 GOOD __1__
2 MI79A 56 #1# 0.0526 #2# 0.0781 #3# 0.0645 #S# 0.1307 #D# 0.0662 GOOD __1__
2 ATHB7 57 #1# 0.0781 #2# 0.1579 #3# 0.1698 #S# 0.236 #D# 0.0662 GOOD __1__
2 SGCSNP214 58 #1# 0.1579 #2# 0.1429 #3# 0.1667 #S# 0.3008 #D# 0.1341 GOOD __X__
2 SGCSNP198 59 #1# 0.1429 #2# NNNNNN #3# NNNNNN #S# NNNNNN #D# NNNNNN NNNNNN NNNNN
MadMapper_XDELTA Map Output: text tab-delimited file with ordered markers and detailed info about adjacent recombination scores
Columns for three adjacent markers A, B, C:
A – marker above; B – middle marker; C – marker below
DST1 = Distance[A-B]; DST2 = Distance[B-C]; DST3 = Distance[A-C]
SUMM = [A-B] + [B-C]; DIFF = ([A-B] + [B-C]) - [A-C]
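The SUMM and DIFF columns can be recomputed from the three adjacent distances. For M246, 0.043 + 0.0213 = 0.0643 and 0.0643 - 0.0778 = -0.0135. A DIFF near zero means the middle marker sits consistently between its neighbors. The minimal sketch below assumes a symmetric distance lookup keyed by marker pairs. It returns None for the end markers, which lack a marker above or below, roughly where the output file prints NNNNNN. The thresholds behind the CLASS column (__0__, __1__, __X__, LARGE) are not given here, so the sketch does not assign classes. `adjacency_rows` and `symmetric` are hypothetical names.

```python
def adjacency_rows(order, dist):
    """For every marker B with a marker A above and C below, return
    (B, DST1=d(A,B), DST2=d(B,C), DST3=d(A,C), SUMM=DST1+DST2, DIFF=SUMM-DST3).
    End markers lack A or C, so their distance fields are None."""
    rows = []
    for i, b in enumerate(order):
        if i == 0 or i == len(order) - 1:
            rows.append((b, None, None, None, None, None))
            continue
        a, c = order[i - 1], order[i + 1]
        dst1, dst2, dst3 = dist[a, b], dist[b, c], dist[a, c]
        summ = dst1 + dst2
        rows.append((b, dst1, dst2, dst3, round(summ, 4), round(summ - dst3, 4)))
    return rows


def symmetric(pairs):
    """Build a lookup that answers both d[x, y] and d[y, x]."""
    d = {}
    for (x, y), value in pairs.items():
        d[x, y] = d[y, x] = value
    return d


# Distances taken from the M246 row of the example output above.
dist = symmetric({
    ("G4553", "M246"): 0.043,
    ("M246", "MI320"): 0.0213,
    ("G4553", "MI320"): 0.0778,
})
for row in adjacency_rows(["G4553", "M246", "MI320"], dist):
    print(row)
```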
##################################################################
##
## EXAMPLES OF SCORING:
##
## POSITIVE LINKAGE:
##
##   AAAAAAAAAAAAAAAAAAAA   BIT SCORE = 6*20 = 120
##   AAAAAAAAAAAAAAAAAAAA   REC SCORE = 0 (0.0)
##                     ..
##   AAAAAAAAAAAAAAAAAAAA   BIT SCORE = 6*18 - 6*2 = 96
##   AAAAAAAAAAAAAAAAAABB   REC SCORE = 2 (2/20 = 0.1)
##
##   AAAAAAAAAABBBBBBBBBB   BIT SCORE = 6*10 + 6*10 = 120
##   AAAAAAAAAABBBBBBBBBB   REC SCORE = 0 (0.0)
##            ..
##   AAAAAAAAABABBBBBBBBB   BIT SCORE = 6*18 - 6*2 = 96
##   AAAAAAAAAABBBBBBBBBB   REC SCORE = 2 (2/20 = 0.1)
##
## NO LINKAGE:
##             ..........
##   AAAAAAAAAAAAAAAAAAAA   BIT SCORE = 6*10 - 6*10 = 0
##   AAAAAAAAAABBBBBBBBBB   REC SCORE = 10 (10/20 = 0.5)
##    . . . . . . . . . .
##   BBBAABBAAAAAAABAABBB   BIT SCORE = 6*10 - 6*10 = 0
##   BABBAABBABABABBBAABA   REC SCORE = 10 (10/20 = 0.5)
##
## NEGATIVE LINKAGE:
##     ..................
##   AAAAAAAAAAAAAAAAAAAA   BIT SCORE = 6*2 - 6*18 = -96
##   AABBBBBBBBBBBBBBBBBB   REC SCORE = 18 (18/20 = 0.9)
##     ..................
##   ABABABABABABABABABAB   BIT SCORE = 6*2 - 6*18 = -96
##   ABBABABABABABABABABA   REC SCORE = 18 (18/20 = 0.9)
##
##################################################################
##################################################################
##
## GENOTYPES:
##   A - 1st;  B - 2nd
##   C - NOT A ( H or B )
##   D - NOT B ( H or A )
##   H - A and B
##   - - missing score
##
## SCORING SYSTEM (each cell: BIT score / REC score):
##
##        |    A     |    B     |    C     |    D     |    H     |    -
##  ------+----------+----------+----------+----------+----------+---------
##    A   |   6 / 0  |  -6 / 1  |  -4 / 1  |   4 / 0  | -2 / 0.5 |  0 / 0
##    B   |  -6 / 1  |   6 / 0  |   4 / 0  |  -4 / 1  | -2 / 0.5 |  0 / 0
##    C   |  -4 / 1  |   4 / 0  |   4 / 0  |  -4 / 1  |  0 / 0   |  0 / 0
##    D   |   4 / 0  |  -4 / 1  |  -4 / 1  |   4 / 0  |  0 / 0   |  0 / 0
##    H   | -2 / 0.5 | -2 / 0.5 |   0 / 0  |   0 / 0  |  2 / 0   |  0 / 0
##    -   |   0 / 0  |   0 / 0  |   0 / 0  |   0 / 0  |  0 / 0   |  0 / 0
##
##################################################################
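Both scores are sums of per-RIL cell values from the scoring table, so the examples in the scoring comment block can be reproduced directly. The sketch below hard-codes the two tables as printed, with rows and columns ordered A, B, C, D, H, -. `score_pair` is a hypothetical name. In the examples, the REC fraction divides the REC sum by the 20 RILs. How MadMapper handles the denominator when scores are missing or ambiguous (C, D, H) is not stated here, so the sketch returns only the raw sums.

```python
# Cell values copied from the BIT / REC scoring table above
# (rows and columns in the order A, B, C, D, H, -).
CODES = "ABCDH-"
BIT = [
    [6, -6, -4, 4, -2, 0],
    [-6, 6, 4, -4, -2, 0],
    [-4, 4, 4, -4, 0, 0],
    [4, -4, -4, 4, 0, 0],
    [-2, -2, 0, 0, 2, 0],
    [0, 0, 0, 0, 0, 0],
]
REC = [
    [0, 1, 1, 0, 0.5, 0],
    [1, 0, 0, 1, 0.5, 0],
    [1, 0, 0, 1, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0.5, 0.5, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]


def score_pair(m1, m2):
    """Return (BIT score, REC score) summed over RILs for two score strings."""
    bit = rec = 0
    for x, y in zip(m1, m2):
        i, j = CODES.index(x), CODES.index(y)
        bit += BIT[i][j]
        rec += REC[i][j]
    return bit, rec


# The eight pairs from the scoring examples above.
EXAMPLES = [
    ("AAAAAAAAAAAAAAAAAAAA", "AAAAAAAAAAAAAAAAAAAA"),
    ("AAAAAAAAAAAAAAAAAAAA", "AAAAAAAAAAAAAAAAAABB"),
    ("AAAAAAAAAABBBBBBBBBB", "AAAAAAAAAABBBBBBBBBB"),
    ("AAAAAAAAABABBBBBBBBB", "AAAAAAAAAABBBBBBBBBB"),
    ("AAAAAAAAAAAAAAAAAAAA", "AAAAAAAAAABBBBBBBBBB"),
    ("BBBAABBAAAAAAABAABBB", "BABBAABBABABABBBAABA"),
    ("AAAAAAAAAAAAAAAAAAAA", "AABBBBBBBBBBBBBBBBBB"),
    ("ABABABABABABABABABAB", "ABBABABABABABABABABA"),
]
for m1, m2 in EXAMPLES:
    bit, rec = score_pair(m1, m2)
    print(f"BIT {bit:4d}  REC {rec} ({rec}/{len(m1)} = {rec / len(m1)})")
```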
MadMapper_RECBIT Dataflow: Input and Output files
Genetic Map visualization using CheckMatrix: Two-dimensional heat plot of recombination scores between all pairs of markers
detection of problematic marker
Inference of linear order of markers using MadMapper_XDELTA
MadMapper_RECBIT, MadMapper_XDELTA and CheckMatrix: Python programs to infer orders of genetic markers and for visualization and validation of genetic maps and haplotypes (detailed description of dataflow)
http://cgpdb.ucdavis.edu/XLinkage/MadMapper/
Alexander Kozik and Richard Michelmore. UC Davis Genome Center
General procedure to construct a genetic map using the MadMapper suite:
1 – Grouping of markers using MadMapper_RECBIT
2 – Selection of up to ten core markers per linkage group
3 – Construction of a frame map from the core markers by checking all possible orders
4 – Best-fit extension for the remaining markers (the optional shuffle/ripple function can dramatically improve map quality; however, it increases the time for map construction)
5 – Visualization of the constructed map using CheckMatrix
6 – Examination of the MadMapper_XDELTA text output files
7 – Attempt to re-map markers (if required) that do not fit well into the major framework
8 – Construction, visualization and examination of the final map
Once the large framework map is constructed, adding new markers does not require changing the order of core markers and can be done relatively fast. In this case, the framework map is used with a fixed order to find the best positions for new markers.
Analysis of MadMapper_RECBIT text output files provides:
1 – assignment of markers to particular linkage groups
2 – sorting of markers into different quality groups
3 – detection and discrimination of mis-scored markers
4 – selection of high quality markers to build core map
5 – creation of non-redundant set of markers for further map construction
True Bin
Trio analysis helps reveal markers that were most likely mis-scored and should be dropped from further analysis
Side-by-side comparison of scores (JoinMap LOD, JoinMap recombination, MadMapper BIT and MadMapper haplotype distances – REC)
Best-Fit Extension: on each iteration of the best-fit extension, the proper position for the newly added marker is the one that gives the two-dimensional matrix with the lowest entropy
Building of framework map: the number of comparisons that have to be done to check all possible orders of markers:
# of markers – # of comparisons
3 markers – 3
4 markers – 12
5 markers – 60
6 markers – 360
7 markers – 2,520
8 markers – 20,160
9 markers – 181,440
10 markers – 1,814,400
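The numbers in this table follow n!/2. There are n! permutations of n markers, and each order is scored only once together with its mirror image. Adding an nth marker multiplies the count by n, which is why the exhaustive frame search stops at about ten markers. The short check below computes the closed form and confirms it by enumeration for small n. The helper names are hypothetical.

```python
from itertools import permutations
from math import factorial


def distinct_orders(n):
    """Orders of n markers when a map and its mirror image count once: n!/2."""
    return factorial(n) // 2


def brute_force_count(n):
    """Enumerate permutations and keep one of each mirror-image pair."""
    return sum(1 for p in permutations(range(n)) if p[0] < p[-1])


for n in range(3, 11):
    print(f"{n} markers - {distinct_orders(n):,} comparisons")

# Cross-check the closed form by enumeration for small n.
assert all(distinct_orders(n) == brute_force_count(n) for n in range(3, 8))
```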
Locus file with raw marker scores is used as initial input for MadMapper_RECBIT program
Input files for MadMapper_XDELTA are usually output files from MadMapper_RECBIT
Iterations of clustering with incremental cutoff values
Arbitrary group ID after each round of clustering
Allele composition/distortion [ excess of ‘B’ alleles in this particular case ]
Marker ID
Marker map position or relative order [ order in this particular case ]
High density of markers
Low density of markers
lowest score for the best order of markers is highlighted in red
confidence class for correct marker position [ LARGE is bad ]
small absolute difference is good, large is bad
separation of markers into two distinct linkage groups
information about framework markers
missing scores may create some problems when defining BINs
MadMapper BIT Scoring Matrix
examples of BIT scoring
pairwise distance matrix
… continue until all markers are inserted and ordered
[ plot axis labels: linkage groups LG_1 to LG_5 ]
flanking marker 1
flanking marker 2
middle marker
MadMapper_XDELTA works with non-redundant set of scores
framework markers are highlighted in red
negative linkage between markers
Locus file with raw marker scores: each allele is scored as ‘A’ or ‘B’
Marker ID
Generation of segregating population: Collection (set) of Recombinant Inbred Lines
after several steps of self-pollination
Genotyping – assignment of a particular allele score to each marker
It is a long process from obtaining a set of recombinant inbred lines (RILs) to genotyping them with a thousand or more markers. Managing, processing, and genetically mapping thousands of markers simultaneously is not a trivial task. The MadMapper suite and the CheckMatrix program simplify the manipulation and analysis of genetic marker data, and the suite has some features that other genetic mapping programs may lack. MadMapper and CheckMatrix perform well on large-scale genotyping data sets, such as those derived from SFP (single feature polymorphism) microarray analysis. Only one input file is required to construct a map: the locus file with raw marker scores. However, the MadMapper pipeline has several major steps and produces dozens of output files, and successful genetic mapping requires understanding the purpose of each step and each output file. This poster describes the dataflow in detail.
Data source: http://elp.ucdavis.edu/ West MA, van Leeuwen H, Kozik A, Kliebenstein DJ, Doerge RW, St Clair DA, Michelmore RW. High-density haplotyping with microarray-based expression and single feature polymorphism markers in Arabidopsis. Genome Res. 2006 Jun;16(6):787-795. [ PubMed:16702412 ]
Example Project: Construction of high-density genetic map of Arabidopsis thaliana linkage group 1 based on Affymetrix microarray SFP genotyping data using MadMapper
STEPS 1-2: Marker grouping and selection of framework markers
STEPS 3-4-5: Map construction and visualization with CheckMatrix
Comparison of inferred order with physical location of genes on Arabidopsis genome:
Graphical genotyping:
Graphical genotyping - RILs are grouped and sorted according to their haplotype patterns:
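Grouping and sorting RILs by haplotype amounts to reading the locus file column-wise. Each RIL's alleles along the ordered markers form a pattern string, RILs with identical patterns are grouped together, and the groups are sorted by pattern. The sketch below assumes the marker order GM01 to GM12 of the example locus file. Lexicographic sorting is an assumption, not necessarily the ordering rule CheckMatrix uses, and `haplotype_groups` is a hypothetical name. On these data, RILs 4 to 9 share an all-‘A’ haplotype and RILs 24 and 25 share another.

```python
# Example locus file scores; markers are assumed to be in map order GM01..GM12.
LOCUS = {
    "GM01": "A A A A A A A A A A A A A A A A B B B B B B B B B",
    "GM02": "A A A A A A A A A A A A A A A B B B B B B B B B B",
    "GM03": "A A A A A A A A A A A A A B B B B B B B B B B B B",
    "GM04": "A A A A A A A A A A A B B B B B B B B B B B B B B",
    "GM05": "A A A A A A A A A A B B B B B B B B B B B B B B B",
    "GM06": "A A A A A A A A A B B B B B B B B B B B B B B B B",
    "GM07": "A A A A A A A A A B B B B B B B B B B B B B B A A",
    "GM08": "A A A A A A A A A B B B B B B B B B B B B B A A A",
    "GM09": "A A A A A A A A A B B B B B B B B B B B A A A A A",
    "GM10": "B A A A A A A A A A B B B B B B B B B A A A A A A",
    "GM11": "B B A A A A A A A A B B B B B B B B A A A A A A A",
    "GM12": "B B B A A A A A A A B B B B B B B A A A A A A A A",
}


def haplotype_groups(locus, order):
    """Group RILs (1-based column numbers) with identical haplotypes along
    `order`, and sort the groups by haplotype pattern."""
    scores = {m: locus[m].split() for m in order}
    n_rils = len(scores[order[0]])
    groups = {}
    for ril in range(n_rils):
        pattern = "".join(scores[m][ril] for m in order)
        groups.setdefault(pattern, []).append(ril + 1)
    return sorted(groups.items())


order = sorted(LOCUS)
for pattern, rils in haplotype_groups(LOCUS, order):
    print(pattern, rils)
```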
Framework markers are highlighted in red
sorting and grouping of RILs