Genotyping and Genetic Maps Bas Heijmans Leiden University Medical Centre The Netherlands
Pedigree file in linkage format
family id   person id   father   mother   sex   disease status   marker data (1 marker)
1           1           0        0        1     1                1 2
1           2           0        0        2     1                3 4
1           3           1        2        2     1                2 3
1           4           1        2        1     1                2 4
2           1           0        0        1     1                0 0
2           2           0        0        2     1                0 0
2           3           1        2        1     1                5 8
2           4           1        2        2     1                6 7
2           5           1        2        1     1                5 7
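A minimal sketch of reading such a pedigree file, assuming whitespace-separated columns in the order shown on the slide (family id, person id, father, mother, sex, disease status, then two alleles per marker). The `Person` type and `parse_pedigree_line` function are hypothetical names for illustration and are not part of any linkage package. The coding follows the usual linkage conventions: parent 0 = founder, sex 1 = male and 2 = female, allele 0 = missing.

```python
from typing import NamedTuple, Tuple


class Person(NamedTuple):
    family: str
    person: str
    father: str
    mother: str
    sex: int          # 1 = male, 2 = female
    status: int       # disease status code
    genotypes: Tuple[Tuple[str, str], ...]  # one allele pair per marker


def parse_pedigree_line(line: str) -> Person:
    """Split one linkage-format line into its fixed and marker columns."""
    fields = line.split()
    if len(fields) < 6 or (len(fields) - 6) % 2 != 0:
        raise ValueError(f"unexpected number of columns: {line!r}")
    family, person, father, mother = fields[:4]
    sex, status = int(fields[4]), int(fields[5])
    alleles = fields[6:]
    # Marker data come in pairs: two alleles per marker.
    genotypes = tuple(
        (alleles[i], alleles[i + 1]) for i in range(0, len(alleles), 2)
    )
    return Person(family, person, father, mother, sex, status, genotypes)


# The nine individuals from the slide (two families, one marker).
PEDIGREE = """\
1 1 0 0 1 1 1 2
1 2 0 0 2 1 3 4
1 3 1 2 2 1 2 3
1 4 1 2 1 1 2 4
2 1 0 0 1 1 0 0
2 2 0 0 2 1 0 0
2 3 1 2 1 1 5 8
2 4 1 2 2 1 6 7
2 5 1 2 1 1 5 7
"""

people = [parse_pedigree_line(l) for l in PEDIGREE.splitlines() if l.strip()]
founders = [p for p in people if p.father == "0" and p.mother == "0"]
untyped = [p for p in people if all(g == ("0", "0") for g in p.genotypes)]
```

In this example the two founders of family 2 are untyped (alleles 0 0), while all their children have genotypes.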
Marker choice for genome-wide linkage scans
Short tandem repeats (STR, a.k.a. microsatellites) because:
• High heterozygosity (1 STR ~ 5 SNPs)
• There are more than enough (1 per 30 kb, thus >>1 per cM)
• Reliable genetic maps (Marshfield, Decode)
• Optimized marker sets, spacing down to 5 cM (Marshfield/Applied Biosystems)
• Reasonably automated measurement (2 persons can put 40,000 checked genotypes in the database per week)
• Low cost per genotype (<$0.15 for consumables)
• Reasonable success and error rates (>92% and <0.8%)
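To illustrate why a multi-allelic STR is more heterozygous than a bi-allelic SNP, here is a small sketch using the standard expected-heterozygosity formula H = 1 − Σ pᵢ². The allele frequencies are hypothetical round numbers, not measured values, and the sketch does not derive the "1 STR ~ 5 SNPs" rule of thumb, which concerns linkage information.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity H = 1 - sum(p_i^2) under random mating."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)


# Hypothetical SNP with two equally frequent alleles (the best case for a SNP).
snp_h = expected_heterozygosity([0.5, 0.5])

# Hypothetical STR with eight equally frequent alleles.
str_h = expected_heterozygosity([0.125] * 8)
```

Under these assumed frequencies the SNP cannot exceed H = 0.5, whereas the eight-allele STR reaches H = 0.875.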
Short tandem repeats
Tetranucleotide repeat:
Paternal allele, 4 repeats:
  AACTAACTAACTAACT
  TTGATTGATTGATTGA
Maternal allele, 2 repeats:
  AACTAACT
  TTGATTGA
Dinucleotide repeat:
Paternal allele, 8 repeats:
  CACACACACACACACA
  GTGTGTGTGTGTGTGT
Maternal allele, 3 repeats:
  CACACA
  GTGTGT
And there are also tri- and pentanucleotide repeats.
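The repeat counts on these slides can be checked with a short sketch that finds the longest run of consecutive copies of a repeat unit in a sequence. The function name `count_tandem_repeats` is made up for this illustration; real genotyping measures fragment length rather than reading the sequence.

```python
def count_tandem_repeats(sequence: str, unit: str) -> int:
    """Return the longest run of back-to-back copies of `unit` in `sequence`."""
    if not unit:
        raise ValueError("repeat unit must not be empty")
    best = 0
    for start in range(len(sequence)):
        count = 0
        pos = start
        # Walk forward one unit at a time while the unit keeps matching.
        while sequence.startswith(unit, pos):
            count += 1
            pos += len(unit)
        best = max(best, count)
    return best


# Top strands of the alleles shown on the slides.
tetra_paternal = count_tandem_repeats("AACTAACTAACTAACT", "AACT")
tetra_maternal = count_tandem_repeats("AACTAACT", "AACT")
di_paternal = count_tandem_repeats("CACACACACACACACA", "CA")
di_maternal = count_tandem_repeats("CACACA", "CA")
```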
Principle of genotyping methods
• Short tandem repeats: length differences
• SNPs: only a sequence difference
  • Destruction of a restriction site (RFLP)
  • Hybridization differences (TaqMan)
  • One base-pair sequencing reaction, i.e. primer extension (Sequenom, Orchid)
  • Ligation assay (Illumina)
• VNTR, insertion/deletion polymorphisms (1 bp to ~300 bp for Alu repeat)
Genotyping STRs – step 1: PCR
Reaction mix: genomic DNA + primers + Taq DNA polymerase + dNTPs (A, C, G, T) + buffer
Allele with 2 CA repeats (CACA / GTGT): 20 + 35 + 25 + 4 + 20 = 104 bp
Allele with 4 CA repeats (CACACACA / GTGTGTGT): 20 + 35 + 25 + 8 + 20 = 108 bp
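The length sums on this slide can be written as a small function. This is a sketch under an assumption: the terms 20, 35, 25 and 20 are read here as two 20 bp primers plus 35 bp and 25 bp of flanking sequence, which the slide does not state explicitly. Only the repeat tract (4 bp or 8 bp) differs between the two alleles.

```python
def pcr_product_length(n_repeats, unit_len=2, primers=(20, 20), flanks=(35, 25)):
    """Length in bp of the PCR product for an allele with `n_repeats` units.

    `primers` and `flanks` are an assumed reading of the fixed terms
    in the slide's sum; only the repeat tract varies between alleles.
    """
    return sum(primers) + sum(flanks) + n_repeats * unit_len


def repeats_from_length(length_bp, unit_len=2, primers=(20, 20), flanks=(35, 25)):
    """Invert the sum: recover the repeat count from a measured length."""
    tract = length_bp - sum(primers) - sum(flanks)
    if tract < 0 or tract % unit_len != 0:
        raise ValueError(f"{length_bp} bp does not fit a whole number of repeats")
    return tract // unit_len


short_allele = pcr_product_length(2)   # 2 CA repeats
long_allele = pcr_product_length(4)    # 4 CA repeats
```

Because the fixed part is identical for every allele of a marker, a difference in product length translates directly into a difference in repeat number.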
Genotyping STRs – step 1: PCR in practice
Genotyping STRs – step 2: electrophoresis
Detect length differences
Agarose or polyacrylamide slab gel
• DNA is negatively charged, so it migrates from the − electrode towards the + electrode
• Longer fragments migrate more slowly than shorter ones through the polymer network
To scan the whole human genome…
• 1 short tandem repeat every 10 cM
• makes 400 markers per individual
• Assuming 1000 individuals (preferably 1000s)
• One whole genome scan = 400,000 genotypings
Not like this… but like this: 96-well plates, 384-well plates
Electrophoresis using automated sequencer
• 96 capillaries (no lanes) (ABI3700)
• Put in machine and all goes automatically
• Primers are labelled with fluorescent dye
• Machine detects PCR products through a laser
• Typically 15 markers in one capillary
[Figure: capillary at the start of the run and a bit later (time scale: 2.5 h); PCR products migrate from − to + past the laser and detector]
Throughput
A 384-well plate takes about one night
• 384 samples minus 16 controls = 368
• 15 markers per sample
• makes 5520 genotypes (if the success rate is 100%)
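The throughput arithmetic, combined with the 400,000 genotypings of a whole genome scan, can be sketched as follows. All inputs are the numbers from the slides. The two plate-run estimates are illustrative simplifications, assuming a 100% success rate and ignoring repeats of failed samples.

```python
import math

MARKERS_PER_SCAN = 400        # 1 STR every 10 cM
INDIVIDUALS = 1000
WELLS_PER_PLATE = 384
CONTROLS_PER_PLATE = 16
MARKERS_PER_CAPILLARY = 15

total_genotypings = MARKERS_PER_SCAN * INDIVIDUALS
samples_per_plate = WELLS_PER_PLATE - CONTROLS_PER_PLATE
genotypes_per_plate = samples_per_plate * MARKERS_PER_CAPILLARY

# Lower bound: every plate run is assumed to be completely full.
min_plate_runs = math.ceil(total_genotypings / genotypes_per_plate)

# Layout-based estimate: each set of 15 markers is run on every sample plate.
sample_plates = math.ceil(INDIVIDUALS / samples_per_plate)
marker_sets = math.ceil(MARKERS_PER_SCAN / MARKERS_PER_CAPILLARY)
layout_plate_runs = sample_plates * marker_sets
```

At about one night per plate, either estimate puts a whole genome scan of 1000 individuals at several months of runs on a single machine.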
Tetranucleotide repeat marker (e.g. multiples of AACT)
• Detected length of PCR product depends on machine
• Standards are used to correct this (CEPH DNA samples)
• Take this into account when analysing data from different machines/labs
Dinucleotide repeat marker (e.g. multiples of CA)
• Dinucleotide repeats give less clean pictures but in practice this is no problem as long as pattern is always the same
• However, markers not in standard 10 cM screening sets are often more problematic (different stutter patterns for different samples, non-constant ratio of ‘real peak’ to plus-A peak): increased error rates?
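The result: allele lengths
Allele with 2 CA repeats (CACA / GTGT): 20 + 35 + 25 + 4 + 20 = 104 bp
Allele with 4 CA repeats (CACACACA / GTGTGTGT): 20 + 35 + 25 + 8 + 20 = 108 bp
Pedigree file in linkage format
family id   person id   father   mother   sex   disease status   raw marker data   renumbered data
1           1           0        0        1     1                102 104           1 2
1           2           0        0        2     1                106 110           3 4
1           3           1        2        2     1                104 106           2 3
1           4           1        2        1     1                104 110           2 4
2           1           0        0        1     1                0 0               0 0
2           2           0        0        2     1                0 0               0 0
2           3           1        2        1     1                111 118           5 8
2           4           1        2        2     1                112 114           6 7
2           5           1        2        1     1                111 114           5 7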
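The step from raw marker data to renumbered data can be sketched as below: the observed allele lengths are sorted and numbered consecutively from 1, and 0 stays 0 for missing data. The function name `renumber_alleles` is hypothetical. The genotypes are those of the nine individuals on the slide.

```python
def renumber_alleles(genotypes, missing=0):
    """Map observed allele lengths (bp) to consecutive allele numbers 1..n."""
    lengths = sorted({a for pair in genotypes for a in pair if a != missing})
    code = {length: number for number, length in enumerate(lengths, start=1)}
    code[missing] = missing  # missing genotypes stay coded as 0
    renumbered = [(code[a], code[b]) for a, b in genotypes]
    return renumbered, code


# Raw allele lengths per individual, in the order of the pedigree file.
raw = [
    (102, 104), (106, 110), (104, 106), (104, 110),               # family 1
    (0, 0), (0, 0), (111, 118), (112, 114), (111, 114),           # family 2
]

renumbered, code = renumber_alleles(raw)
```

Keep the `code` mapping alongside the renumbered file, since the allele numbers are only meaningful together with the lengths they stand for.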
Genetic map of measured markers
For IBD estimation using Merlin or other software:
• Pedigree file
• Genetic map
Markers measured on chromosome 19
16 markers: d19s247, d19s1034, d19s391, d19s865, d19s394, d19s588, d19s49, d19s433, d19s47, d19s420, d19s178, apoc2, d19s246, d19s180, d19s210, d19s254
Genetic maps
Available from
• Marshfield Center for Medical Genetics http://research.marshfieldclinic.org/genetics/
• Decode Genetics (most accurate)
  Supplemental data to Kong et al. Nat Genet 2002;31:241-7.
  see F:\Bas\Genotyping&Maps\DecodeMap.xls
Merlin Map File
CHROMOSOME MARKER LOCATION
19 d19s247 9.84
19 d19s1034 20.75
19 d19s391 28.83
19 d19s865 32.39
19 d19s394 34.25
19 d19s588 42.28
19 d19s49 50.81
19 d19s433 51.88
19 d19s47 63.10
19 d19s420 66.30
19 d19s178 68.08
19 apoc2 69.50
19 d19s246 78.08
19 d19s180 87.66
19 d19s210 100.01
19 d19s254 100.61
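As a sanity check before running the analysis, the map file can be read back and inspected for marker order and spacing. This is a minimal sketch that assumes the three whitespace-separated columns shown above, with one marker per line and positions in cM. The `parse_map` helper is a hypothetical name and is not part of Merlin.

```python
MAP_TEXT = """\
CHROMOSOME MARKER LOCATION
19 d19s247 9.84
19 d19s1034 20.75
19 d19s391 28.83
19 d19s865 32.39
19 d19s394 34.25
19 d19s588 42.28
19 d19s49 50.81
19 d19s433 51.88
19 d19s47 63.10
19 d19s420 66.30
19 d19s178 68.08
19 apoc2 69.50
19 d19s246 78.08
19 d19s180 87.66
19 d19s210 100.01
19 d19s254 100.61
"""


def parse_map(text):
    """Return (chromosome, marker, position_cM) tuples, skipping the header."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    return [(chrom, marker, float(pos)) for chrom, marker, pos in rows[1:]]


markers = parse_map(MAP_TEXT)
positions = [pos for _, _, pos in markers]

# Markers should be listed in map order along the chromosome.
in_map_order = positions == sorted(positions)

# Distance in cM between each pair of neighbouring markers.
gaps = [
    (a[1], b[1], round(b[2] - a[2], 2))
    for a, b in zip(markers, markers[1:])
]
largest_gap = max(gaps, key=lambda g: g[2])
```

For these 16 markers the largest interval lies between d19s180 and d19s210, the kind of gap where an extra marker could be considered.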