European Spotfire Users Conference - Versailles, April 25-26, 2002
Quality Control of High Throughput EST & SAGE Sequence Production
Marcel de Leeuw, IT director
Summary
I. Application field
II. Why quality control?
III. EST example
IV. SAGE example
V. Decision making
I. Application field
• Genomic resources for gene expression analysis
  - Expressed Sequence Tags (ESTs)
• Direct gene expression analysis
  - Serial Analysis of Gene Expression (SAGE)
  - Discriminative Analysis of Clone Signature (DACS)
I. Application field
[Figure: production workflow between the Customer and GENOME express. Labels: expressed DNA; plasmid vector with insert (T7 and SP6 sites); hosts containing both host DNA & plasmid DNA; growth medium with antibiotic; resistant colonies; amplified DNA. Pipeline steps: DNA extraction, sequencing reaction, sequencing, trimming, masking, QC, analysis & decision making. Stages are labelled Wet and Dry; plate formats along the pipeline are 96 and 384 wells.]
I. Application field
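[Figure: EST insert between the 5prime vector and the 3prime vector, with the T7 and SP6 sites on either side. Example sequence shown as two complementary strands:
AACGTCTACCAAAAAAAAAAAAAAAA
TTGCAGATGGTTTTTTTTTTTTTTTT
Annotation: mis-oriented insert featuring a polyT.]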
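The mis-orientation case above can be sketched as a simple check for a poly-T run near the start of a read. This is a minimal illustration, not the method used in production: the function name `has_leading_poly_t`, the minimum run length and the search window are all assumptions.

```python
import re


def has_leading_poly_t(read: str, min_run: int = 12, window: int = 50) -> bool:
    """Return True if the first `window` bases contain a run of >= min_run T's.

    min_run and window are illustrative defaults, not values from the slides.
    """
    head = read[:window].upper()
    return re.search("T{%d,}" % min_run, head) is not None


# The two strands from the slide: the poly-T strand is flagged, the poly-A one is not.
print(has_leading_poly_t("TTGCAGATGG" + "T" * 16))
print(has_leading_poly_t("AACGTCTACC" + "A" * 16))
```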
Summary
I. Application field
II. Why quality control and how?
III. EST example
IV. SAGE example
V. Decision making
II. Why quality control and how?
• Keep the production process going
  - spot equipment degradation before impact
  - reject bad reagent lots
• Make the business model possible
  - decision making: test plates before HT sequencing
  - ‘outsourcing’: both customer and sub-contractor need to closely survey quality
  - volume: small efficiency improvements yield €€…
II. Why quality control and how?
• Process measurements
  - dosage
  - signal characteristics
• Basecaller output
  - confidence in base ‘call’
• Knowledge about DNA contents
  - vector parts in case of cloned DNA
  - regularly spaced tags
II. Why quality control and how?
• Strong correlations of content with process QC
  - GC or AT rich DNA needs sequencing reaction tuning
  - hairpins & other secondary structures
  - stretches of A’s in gene transcripts (ESTs)
  - contamination from neighboring clones or previous plates
• Data mining issues
  - volume
  - complexity
  - variation of libraries
Need for an Advanced & Interactive Data Mining Application
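The content-related factors listed above (GC or AT richness, stretches of A’s) can be turned into per-read numbers for plotting against process QC. The following is a minimal sketch under that assumption; the function names `gc_fraction` and `longest_run` are hypothetical and do not come from the presentation.

```python
from itertools import groupby


def gc_fraction(seq: str) -> float:
    """Fraction of G and C among called bases (A, C, G, T); 0.0 if none are called."""
    called = [b for b in seq.upper() if b in "ACGT"]
    if not called:
        return 0.0
    return sum(b in "GC" for b in called) / len(called)


def longest_run(seq: str, base: str = "A") -> int:
    """Length of the longest uninterrupted stretch of `base` in the read."""
    best = 0
    for b, group in groupby(seq.upper()):
        if b == base:
            best = max(best, sum(1 for _ in group))
    return best


print(gc_fraction("GGCCAATT"), longest_run("ACAAAAGAA"))
```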
Summary
I. Application field
II. Why quality control and how?
III. EST sequencing
IV. SAGE sequencing
V. Decision making
IV. EST sequencing
• dashboard
  - provides an overview that makes the most common QC problems detectable
• details-on-demand (HTML)
  - allows inspection of individual reads
[Figure: dashboard panels: source plate bias; 5’ vector masking coherence; read length histogram; machine bias; global pie]
IV. EST sequencing
• main pane
  - in-depth view of QC accept/reject decision
[Figure: main pane scatter plot; x axis: tlen (0 to 800); y axis: -200 to 1000; legend: QC rejected sequence, QC passed sequence; annotated regions: sequencing difficulty, empty inserts, bad quality sequence, long poly T, long poly A; tilt & size indicate double clones]
IV. EST sequencing
• QC inspection example
[Figure: bias in 384 well view]
IV. EST sequencing
• QC inspection example (contd.): dbc (quality ratio before/past 5’ vector end) vs. clone id
[Figure: dbc (0 to 50) vs. clone, one panel per quadrant (quad - 1 to quad - 4); x axis: clone id, from rnt.007A01 to rnt.012E22; the 5’ vector end is marked]
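Splitting a 384-well view into four quadrants, as in the panels above, can be sketched as follows. Two things here are assumptions rather than facts from the slides: that a clone id such as rnt.007A01 reads as library prefix, three-digit plate number, row letter (A to P) and two-digit column (01 to 24), and that quadrants are numbered by interleaving (A01 is quad 1, A02 quad 2, B01 quad 3, B02 quad 4). The name `quadrant` is hypothetical.

```python
import re

# Assumed clone id layout: <library>.<plate:3 digits><row:A-P><column:2 digits>
CLONE_RE = re.compile(r"^(?P<lib>[a-z0-9]+)\.(?P<plate>\d{3})(?P<row>[A-P])(?P<col>\d{2})$")


def quadrant(clone_id: str) -> int:
    """Map a 384-well clone id to a quadrant 1..4 (assumed interleaved numbering)."""
    m = CLONE_RE.match(clone_id)
    if m is None:
        raise ValueError("unrecognised clone id: %r" % clone_id)
    row = ord(m["row"]) - ord("A")   # 0..15
    col = int(m["col"]) - 1          # 0..23 when valid
    if not 0 <= col < 24:
        raise ValueError("column out of range in %r" % clone_id)
    return 1 + 2 * (row % 2) + (col % 2)


for cid in ("rnt.007A01", "rnt.009K12", "rnt.007B02"):
    print(cid, quadrant(cid))
```

Grouping a quality measure by this quadrant number is one way to check whether a bias follows the 96-well source plates rather than the 384-well plate as a whole.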
IV. EST sequencing
• QC example: trimming plot reveals plate bias, despite strong machine influence
  - % low Q bases vs. length of trimmed sequence
  - number of HQ reads strongly varies
[Figure: % low Q bases (0 to 5) vs. tlen (0 to 800), one panel per plate (plate - 7 to plate - 12), with annotated values 25, 79 and 125; a second plot by plate (7 to 12) on a scale of 0.1 to 1]
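The two axes of the trimming plot, trimmed length (tlen) and % low Q bases, can be derived from per-base phred scores. The slides do not say which trimming algorithm is used, so the sketch below swaps in a common maximum-scoring-segment approach: each base scores `cutoff` minus its error probability, and the best-scoring contiguous window is kept. The function names, the 0.05 cutoff and the Q20 threshold are assumptions.

```python
def trim_window(quals, cutoff=0.05):
    """Return (start, end) of the best-scoring window; end is exclusive.

    Each base contributes cutoff - 10**(-q/10); (0, 0) means nothing passes.
    """
    best_sum, best = 0.0, (0, 0)
    run_sum, run_start = 0.0, 0
    for i, q in enumerate(quals):
        run_sum += cutoff - 10 ** (-q / 10)
        if run_sum <= 0:
            # Window no longer pays off: restart after this base.
            run_sum, run_start = 0.0, i + 1
        elif run_sum > best_sum:
            best_sum, best = run_sum, (run_start, i + 1)
    return best


def low_q_percent(quals, start, end, min_q=20):
    """Percentage of bases below min_q inside the trimmed window."""
    n = end - start
    if n <= 0:
        return 0.0
    low = sum(1 for q in quals[start:end] if q < min_q)
    return 100.0 * low / n


quals = [5, 5, 30, 30, 10, 30, 30, 5]
start, end = trim_window(quals)
print(end - start, low_q_percent(quals, start, end))
```

Plotting these two numbers per read, split by plate, gives a view of the kind described on this slide.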
Summary
I. Company & application field
II. Why quality control and how?
III. Approach
IV. EST sequencing
V. SAGE sequencing
VI. Decision making
V. SAGE sequencing
• principle of operation
  - base call, quality trimming
  - vector mask
  - enzyme site matching
  - di-tag extraction
>lsw6.001H06F.010830(M13.F1) 225 (46, 270) 3.7%
NngxxXxxXXXXXXXxxxXXXXXXXXXXXXXXXXXXXXXXXXXXXCATGCCAGGCTGCGTCCCTCCGGCGGAAGCCGCGGTCCATGGGCTCTGAATTAGGAGGGGCTGGGCCTTAAAACATGGGCCGCGTTCGCACCAACTAGCTAGGATTAAACATGGGCTGGGGGCCAGGGCTGTGCCGCCCACCCGGCCATGCCCCCGTGAAGCTGGGCGGAGGAGAATGTTTTCATGTATTACTTTTGTAGTTGGCCCGAGCTCAGgggcatgggcagaaAAgtgcTgaCCtcattctAAAGTGATGACCTTGAACTTTACTGTGGTAGGGCGTCTTCATCTCGCCTtgtcaaGAAA
[Figure: SAGE concatemer layout: 5prime vector, <CATG>, di-tag rank 1, <CATG>, …, <CATG>, di-tag rank N, <CATG>, 3prime vector; <CATG> is the enzyme site]
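The enzyme site matching and di-tag extraction steps can be sketched as a scan for CATG sites, taking the sequence between each pair of consecutive sites as one di-tag and numbering them by rank. This is an illustrative sketch only: it matches exact sites and ignores vector masking and base quality, and the name `extract_ditags` is hypothetical.

```python
def extract_ditags(seq: str, site: str = "CATG"):
    """Return a list of (rank, start, ditag) for sequence between consecutive sites.

    start is the 0-based position of the di-tag in the read; rank starts at 1.
    """
    s = seq.upper()
    positions = []
    i = s.find(site)
    while i != -1:
        positions.append(i)
        i = s.find(site, i + 1)
    ditags = []
    for rank, (a, b) in enumerate(zip(positions, positions[1:]), start=1):
        start = a + len(site)
        ditags.append((rank, start, s[start:b]))
    return ditags


# Toy concatemer, not real data.
print(extract_ditags("GGCATGAAAACCCCCATGTTTTGGGGCATGCC"))
```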
V. SAGE sequencing
• use of enzyme restriction site recognition
  - provides indication of true base calling errors
[Figure: di-tag rank in sequence (5 to 20) vs. di-tag start position in sequence (bp, 100 to 700)]
V. SAGE sequencing
• other specific SAGE QC criteria
  - average phred quality vs. rank
  - distribution of di-tag length
[Figure: histogram of di-tag length (x axis: 12 to 46; y axis: 100 to 500) and plot of di-tag average quality (10 to 50) vs. di-tag rank (2 to 22), titled di-tag quality]
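Both criteria above can be computed per di-tag once its position in the read is known. The sketch below assumes di-tags are given as (rank, start, sequence) tuples alongside the read's per-base phred scores; the accepted length range is a placeholder to be set per library, and the name `ditag_stats` is hypothetical.

```python
from statistics import mean


def ditag_stats(ditags, quals, min_len=20, max_len=26):
    """Summarise each di-tag: rank, length, mean phred quality, length check.

    ditags: iterable of (rank, start, sequence); quals: per-base phred scores.
    min_len and max_len are illustrative bounds, not values from the slides.
    """
    rows = []
    for rank, start, tag in ditags:
        q = quals[start:start + len(tag)]
        rows.append({
            "rank": rank,
            "length": len(tag),
            "mean_q": mean(q) if q else None,
            "length_ok": min_len <= len(tag) <= max_len,
        })
    return rows


# Toy input: one di-tag of plausible length, one over-long and low quality.
ditags = [(1, 0, "A" * 22), (2, 26, "C" * 40)]
quals = [30] * 26 + [10] * 40
for row in ditag_stats(ditags, quals):
    print(row)
```

An over-long di-tag is one possible sign that an enzyme site inside it was miscalled, which ties this check back to site recognition.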
VI. Decision making
• Decision processes
  - library quality
  - library contents
  - process quality
[Figure: decision flowchart between the Customer and GENOME express. Boxes: Generate Expression Library; Create Library Lot or Test Plate; Extract Sequence; Check Library Quality (insert size, polyA size, (redundancy)); Analyze Library Contents; Normalize and/or Subtract (EST); Check Process Quality (success rate, plate bias, machine bias); Post-process; Reprocess Selected Plates or Clones; 3prime Sequencing Demand for (EST) Clones of Interest. Branch labels: OK; insufficient; Diagnostics; Adapt Conditions.]
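The process quality criteria in the flowchart (success rate, plate bias, machine bias) amount to comparing QC pass rates across groups. A minimal sketch, assuming each read is a record with hypothetical fields `plate`, `machine` and `passed`:

```python
from collections import defaultdict


def success_rates(reads, key):
    """Fraction of QC-passed reads per value of `key` (e.g. 'plate' or 'machine')."""
    totals = defaultdict(int)
    passed = defaultdict(int)
    for read in reads:
        group = read[key]
        totals[group] += 1
        if read["passed"]:
            passed[group] += 1
    return {group: passed[group] / totals[group] for group in totals}


# Toy records, not real data.
reads = [
    {"plate": 7, "machine": "m1", "passed": True},
    {"plate": 7, "machine": "m2", "passed": False},
    {"plate": 8, "machine": "m1", "passed": True},
    {"plate": 8, "machine": "m1", "passed": True},
]
print(success_rates(reads, "plate"))
print(success_rates(reads, "machine"))
```

A large gap between groups along one key and not the other suggests whether a problem follows the plate or the machine; any accept/reject threshold would be a business decision not given in the slides.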