European Spotfire Users Conference - Versailles, April 25-26, 2002

Quality Control of High Throughput EST & SAGE Sequence Production

Marcel de Leeuw, IT director



Summary

I. Application field
II. Why quality control?
III. EST example
IV. SAGE example
V. Decision making


I. Application field

• Genomic resources for gene expression analysis
  - Expressed Sequence Tags (ESTs)

• Direct gene expression analysis
  - Serial Analysis of Gene Expression (SAGE)
  - Discriminative Analysis of Clone Signature (DACS)


I. Application field

[Figure: production workflow involving the Customer and GENOME express. Labels: Expressed DNA; Insert; Plasmid vector (T7, SP6); Plasmid vector with insert; Hosts containing both host DNA & plasmid DNA; Growth medium with antibiotic; Resistant colonies; Amplified DNA. Process steps: DNA extraction; Sequencing reaction; Sequencing; Trimming, Masking, QC; Analysis & Decision making. Steps are marked Wet or Dry and annotated with 96- and 384-well plate formats.]


I. Application field

[Figure: EST insert between the 5prime vector and the 3prime vector, with the T7 and SP6 sites at either end. The two strands shown are:

AACGTCTACCAAAAAAAAAAAAAAAA
TTGCAGATGGTTTTTTTTTTTTTTTT

Caption: mis-oriented insert featuring a polyT.]
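The mis-orientation case above can be illustrated with a small Python sketch. It assumes, for illustration only, that a mis-oriented insert shows up as a poly-T run near the start of the read; the function names, the minimum run length and the search window are hypothetical and not taken from the presentation.

```python
import re

# Translation table for complementing DNA; N stays N.
_COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def looks_misoriented(insert: str, min_run: int = 12, window: int = 30) -> bool:
    """Heuristic (assumed thresholds): True if a run of at least `min_run` T's
    occurs within the first `window` bases of the insert."""
    return re.search("T{%d,}" % min_run, insert[:window].upper()) is not None


# The top strand from the slide, and the same insert read from the other strand.
forward = "AACGTCTACC" + "A" * 16
flipped = reverse_complement(forward)

print(looks_misoriented(forward))
print(looks_misoriented(flipped))
```

A read flagged this way could be reverse-complemented before further analysis; whether that is appropriate depends on the library protocol.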



II. Why quality control and how?

• Keep the production process going
  - spot equipment degradation before impact
  - reject bad reagent lots

• Make the business model possible
  - decision making: test plates before HT sequencing
  - ‘outsourcing’: both customer and sub-contractor need to closely survey quality
  - volume: small efficiency improvements yield €€…


II. Why quality control and how?

• Process measurements
  - dosage
  - signal characteristics

• Basecaller output
  - confidence in base ‘call’

• Knowledge about DNA contents
  - vector parts in case of cloned DNA
  - regularly spaced tags
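As an illustration of how basecaller confidence can feed QC, the sketch below assumes the basecaller reports phred-style quality scores, where a score Q corresponds to an error probability of 10^(-Q/10). The cutoff of 20 and the function names are assumptions for this example, not values from the presentation.

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a phred quality score Q into an error probability 10^(-Q/10)."""
    return 10 ** (-q / 10)


def low_quality_fraction(quals, threshold=20):
    """Fraction of bases whose quality is below `threshold` (assumed cutoff).

    Returns 0.0 for an empty read so callers do not divide by zero.
    """
    if not quals:
        return 0.0
    return sum(1 for q in quals if q < threshold) / len(quals)


# Toy per-base qualities for one read (made up for illustration).
read_quals = [8, 12, 25, 30, 40, 40, 35, 18, 9, 30]

print(phred_to_error_prob(20))
print(low_quality_fraction(read_quals))
```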


II. Why quality control and how?

• Strong correlations of content with process QC
  - GC or AT rich DNA needs sequencing reaction tuning
  - hairpins & other secondary structures
  - stretches of A’s in gene transcripts (ESTs)
  - contamination from neighboring clones or previous plates

• Data mining issues
  - volume
  - complexity
  - variation of libraries

Need for an Advanced & Interactive Data Mining Application
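Two of the content properties listed above, base composition and stretches of A's, are simple to compute per read. The following Python sketch is illustrative only; the function names are hypothetical, and any threshold for what counts as "GC rich" or a "long" stretch would have to come from the production process itself.

```python
from itertools import groupby


def gc_fraction(seq: str) -> float:
    """Fraction of G/C among called bases (A, C, G, T); masked or N bases are ignored."""
    called = [b for b in seq.upper() if b in "ACGT"]
    if not called:
        return 0.0
    return sum(1 for b in called if b in "GC") / len(called)


def longest_run(seq: str, base: str) -> int:
    """Length of the longest uninterrupted run of `base` in `seq`."""
    best = 0
    for b, group in groupby(seq.upper()):
        if b == base.upper():
            best = max(best, sum(1 for _ in group))
    return best


read = "AACGTCTACC" + "A" * 16  # toy read ending in a poly-A stretch

print(gc_fraction(read))
print(longest_run(read, "A"))
```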



IV. EST sequencing

• dashboard
  - provides an overview for detecting the most common QC problems

• details-on-demand (HTML)
  - allows inspection of individual reads

[Figure: dashboard panels: source plate bias; 5’ vector masking coherence; read length histogram; machine bias; global pie]


IV. EST sequencing

• main pane
  - in-depth view of QC accept/reject decision

[Figure: main pane scatter plot. x-axis: tlen (0 to 800); y-axis: -200 to 1000 (axis name not recoverable). Legend: QC rejected sequence; QC passed sequence. Annotated regions: sequencing difficulty; empty inserts; bad quality sequence; long poly T; long poly A. Note on the plot: tilt & size indicate double clones.]


IV. EST sequencing

• QC inspection example

[Figure: bias in 384-well view]


IV. EST sequencing

• QC inspection example (contd.)
  - dbc (quality ratio before/past 5’ vector end) vs. clone id

[Figure: dbc vs. clone, one panel per quadrant (quad - 1 to quad - 4). x-axis: clone (rnt.007A01 to rnt.012E22); y-axis: 0 to 50. Annotation: 5’ vector end.]
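Grouping reads by plate quadrant, as in the figure above, requires mapping each clone id to a well position. The sketch below assumes clone ids follow the pattern visible on the slide (library name, a dot, a three-digit plate number, a row letter A to P, a two-digit column), and it assumes one common quadrant convention for 384-well plates built from four 96-well plates (A1 = quadrant 1, A2 = 2, B1 = 3, B2 = 4). Both the parsing pattern and the quadrant numbering are assumptions; the presentation does not state them.

```python
import re
from typing import NamedTuple


class Clone(NamedTuple):
    library: str
    plate: int
    row: str
    col: int


# Assumed id layout: <library>.<3-digit plate><row A-P><2-digit column>
_CLONE_RE = re.compile(r"^(\w+)\.(\d{3})([A-P])(\d{2})$")


def parse_clone_id(clone_id: str) -> Clone:
    """Split a clone id such as 'rnt.007A01' into library, plate, row and column."""
    match = _CLONE_RE.match(clone_id)
    if match is None:
        raise ValueError(f"unrecognized clone id: {clone_id!r}")
    library, plate, row, col = match.groups()
    return Clone(library, int(plate), row, int(col))


def quadrant(row: str, col: int) -> int:
    """Quadrant (1 to 4) of a 384-well position under the assumed convention."""
    row_parity = (ord(row) - ord("A")) % 2  # 0 for rows A, C, E, ...
    col_parity = (col - 1) % 2              # 0 for odd columns
    return 1 + 2 * row_parity + col_parity


clone = parse_clone_id("rnt.007O07")
print(clone, quadrant(clone.row, clone.col))
```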


IV. EST sequencing

• QC example: trimming plot reveals plate bias, despite strong machine influence
  - % low Q bases vs. length of trimmed sequence
  - number of HQ reads strongly varies

[Figure: % low Q bases (0 to 5) vs. tlen (0 to 800), one panel per plate (plate - 7 to plate - 12). Second plot: x-axis plate (7 to 12), y-axis 0.1 to 1. Values 25, 79 and 125 appear as annotations.]
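The per-plate comparison above can be approximated by counting, for each plate, the share of reads that pass a quality rule. The Python sketch below assumes each read is summarized as (plate, trimmed length, % low-quality bases); the thresholds `min_tlen` and `max_low_q_pct` and the toy records are hypothetical and do not come from the presentation.

```python
from collections import defaultdict


def hq_fraction_by_plate(reads, min_tlen=100, max_low_q_pct=2.0):
    """Return {plate: fraction of reads passing the assumed HQ rule}.

    `reads` is an iterable of (plate, tlen, low_q_pct) tuples.
    """
    totals = defaultdict(int)
    passed = defaultdict(int)
    for plate, tlen, low_q_pct in reads:
        totals[plate] += 1
        if tlen >= min_tlen and low_q_pct <= max_low_q_pct:
            passed[plate] += 1
    return {plate: passed[plate] / totals[plate] for plate in sorted(totals)}


# Toy records, made up for illustration.
toy_reads = [
    (7, 650, 0.5),
    (7, 40, 4.0),
    (8, 500, 1.0),
    (8, 620, 0.2),
]

print(hq_fraction_by_plate(toy_reads))
```

A plate whose fraction falls well below that of its neighbors would be a candidate for closer inspection, for instance in the per-plate trimming panels.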


Summary

I. Company & application field
II. Why quality control and how?
III. Approach
IV. EST sequencing
V. SAGE sequencing
VI. Decision making


V. SAGE sequencing

• principle of operation
  - base call, quality trimming
  - vector mask
  - enzyme site matching
  - di-tag extraction

>lsw6.001H06F.010830(M13.F1) 225 (46, 270) 3.7%
NngxxXxxXXXXXXXxxxXXXXXXXXXXXXXXXXXXXXXXXXXXXCATGCCAGGCTGCGTCCCTCCGGCGGAAGCCGCGGTCCATGGGCTCTGAATTAGGAGGGGCTGGGCCTTAAAACATGGGCCGCGTTCGCACCAACTAGCTAGGATTAAACATGGGCTGGGGGCCAGGGCTGTGCCGCCCACCCGGCCATGCCCCCGTGAAGCTGGGCGGAGGAGAATGTTTTCATGTATTACTTTTGTAGTTGGCCCGAGCTCAGgggcatgggcagaaAAgtgcTgaCCtcattctAAAGTGATGACCTTGAACTTTACTGTGGTAGGGCGTCTTCATCTCGCCTtgtcaaGAAA

[Figure: SAGE concatemer layout: 5prime vector, <CATG>, di-tag rank 1, <CATG>, …, <CATG>, di-tag rank N, <CATG>, 3prime vector. <CATG> marks the enzyme site.]
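The enzyme site matching and di-tag extraction steps can be sketched in Python as follows. This is a minimal illustration that treats every stretch between two consecutive CATG sites as one di-tag and assigns ranks in reading order; it ignores quality trimming and any length filtering, and the function name and toy sequence are assumptions rather than the production pipeline.

```python
import re


def extract_ditags(seq: str, site: str = "CATG"):
    """Return a list of (rank, start, ditag) for each stretch between two
    consecutive occurrences of `site`.

    `start` is the 0-based position of the first di-tag base in `seq`.
    Masked vector bases (X) never match the site, so they are skipped.
    """
    s = seq.upper()
    site_starts = [m.start() for m in re.finditer(site, s)]
    ditags = []
    for rank, (left, right) in enumerate(zip(site_starts, site_starts[1:]), start=1):
        begin = left + len(site)
        ditags.append((rank, begin, s[begin:right]))
    return ditags


# Toy concatemer: masked vector, two di-tags separated by CATG sites.
toy = "XXXXCATGAAACCCGGGTTTCATGTTTGGGCCCAAACATGXXXX"

for rank, start, ditag in extract_ditags(toy):
    print(rank, start, ditag)
```

The returned start positions and ranks are the quantities a rank-versus-position plot would use.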


V. SAGE sequencing

• use of enzyme restriction site recognition
  - provides indication of true base calling errors

[Figure: di-tag rank in sequence (5 to 20) vs. di-tag start position in sequence (bp, 100 to 700)]


V. SAGE sequencing

• other specific SAGE QC criteria
  - average phred quality vs. rank
  - distribution of di-tag length

[Figure: two plots. Distribution of di-tag length: x-axis di-tag length (12 to 46), y-axis 100 to 500. Di-tag quality: x-axis di-tag rank (2 to 22), y-axis di-tag average quality (10 to 50).]
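Both criteria above reduce to simple aggregations once di-tags have been extracted. The Python sketch below assumes each di-tag is available as (rank, sequence, per-base phred qualities); the data layout, function names and toy values are hypothetical.

```python
from collections import Counter, defaultdict


def length_distribution(ditags):
    """Count di-tags per length; `ditags` holds (rank, sequence, quals) tuples."""
    return Counter(len(seq) for _rank, seq, _quals in ditags)


def mean_quality_by_rank(ditags):
    """Average, per rank, of each di-tag's mean phred quality."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for rank, _seq, quals in ditags:
        if quals:  # skip di-tags without quality values
            sums[rank] += sum(quals) / len(quals)
            counts[rank] += 1
    return {rank: sums[rank] / counts[rank] for rank in sorted(sums)}


# Toy di-tags, made up for illustration.
toy_ditags = [
    (1, "ACGT", [40, 40, 40, 40]),
    (1, "ACGTAC", [20, 20, 20, 20, 20, 20]),
    (2, "ACGT", [10, 10, 10, 10]),
]

print(length_distribution(toy_ditags))
print(mean_quality_by_rank(toy_ditags))
```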



VI. Decision making

• Decision processes
  - library quality
  - library contents
  - process quality

[Figure: decision flowchart involving the Customer and GENOME express. Steps shown: Generate Expression Library; Create Library Lot or Test Plate; Check Library Quality; Normalize and/or Subtract (EST); Extract Sequence; Analyze Library Contents; 3prime Sequencing Demand for (EST) Clones of Interest; Post-process; Check Process Quality; Reprocess Selected Plates or Clones; Diagnostics; Adapt Conditions. Branch labels: OK; insufficient. Criteria shown: insert size, polyA size, (redundancy); success rate, plate bias, machine bias.]