Automation Challenges in Next
Generation Sequencing
Robin Coope
Group Leader, Instrumentation
BC Cancer Agency Genome Sciences Centre
Outline
• The BCGSC’s Scale of Operations
• Current Automation Projects and Lessons
• Challenges of Small Volume Pipetting
• Automating Fragment Analysis
Calculations
• Automated Gel-Based Size Selection
BCGSC Monthly Total Libraries
14 HiSeqs, 3 MiSeqs, 8 Robots from 3 manufacturers
~ 50 Staff in high throughput lab groups.
BCGSC Monthly Pooled Library Output
12,500 Libraries per year
Biospecimens
RNA Isolation cDNA
Synthesis Normalization &
Shearing
End Repair A Tailing,
Adapter Ligation
PCR (addition of indexes)
Size Selection
Normalization and pooling
Sequencing
QC QC
Shearing and Amplicon Construction.
QC QC
QC
QC
QC
Four Plate PCR-Free WGS
• A new protocol based on Illumina’s but with
more automation friendly labware
• Fully utilizes Biomek FX deck, grippers and
plate stackers to enable library construction
of four plates in six hours (w/o QC)
• One FX program with nine integrated
stages
HL60 – PCR Free
HL60 – Normal (PCR) Library
Construction
MultiMACS and Nimbus for mRNA
RNA isolation with MuliMACS
allows Poly-A mRNA capture
while retaining the flow
through which can be used to
isolate miRNA
We are automating its
operation with a Hamilton
Nimbus. It has 1-1000ul
pipetting capacity and can
work with the MultiMACS
geometry.
Renormalization and Pooling
• Implementing a PE
Janus for up-front
quantification,
normalization and re-
arrays
• A Span-8 has been
acquired from our
clinical partners for post
library construction
pooling and re-arrays
Hard Lessons
• High throughput production protocols need to be robust and easily maintainable.
• This means formal standards in program structure and liquid handling methods must apply.
• Pairing engineers with biologists to develop automation protocols with clear distinctions in mandate but great communication is most successful.
Where Liquid Handler Software
Should Be
Complexity of
User Interface
Most Commercial
Liquid Handlers
Power of Programming Environment
The Real Sweet Spot
Pipetting
or
Things we discovered with difficulty that
may have been obvious to others
QC Relies on Accurate Pipetting
• Liquid transfers below 2ul may depend on:
– Plate geometry
– Tip geometry
– Temperature
– Liquid properties
– Liquid surface angle
– Humidity
The Heartbreak of Backpressure
Aspirate 2ul
Consider the following situation of successive transfers
The Heartbreak of Backpressure Consider the following situation of successive transfers
Dispense 2ul but with an added
dispense motion to “blow out” the tips
The Heartbreak of Backpressure
In air, aspirate extra
displacement for the
“blow out” step
Consider the following situation of successive transfers
But any leftover droplets
may cause a vacuum to
form in the tips
The Heartbreak of Backpressure Consider the following situation of successive transfers
On next aspiration, the
negative pressure from the
liquid seal and pre-aspirate
in some tips will cause
over-aspiration of the
sample
The Heartbreak of Backpressure
2 ul pipetted with a 1 ul blowout
2 ul pipetted with a 2 ul blowout
The Heartbreak of Backpressure
2 ul pipetted with a 4 ul blowout
The Heartbreak of Backpressure
The Heartbreak of Backpressure
2 ul pipetted with a 10 ul blowout
The Heartbreak of Surface
Tension
• Using an Eosyn dye assay, we
observed up to 20% variability in 2ul,
transfers depending on the surface
angle of the liquid. Successive single
transfers were not consistent.
• Using the same protocol with the Artel
MVS QC system, this effect was not
observed.
• It was also not observed with DNA
transfers.
intelligent Fluorescent Units
Illumina Clusters
• Higher molarity (number of
fragments) = denser clusters
• Shorter fragments cluster
more efficiently
• The optimum density is
4.6million clusters/tile with
100 tiles per lane.
• Too high density prevents
base calling
• Runs cost ~$2500/lane
20 MICRONS
Molarity Estimation
• To get the molarity you have
to get the mass and average
size of DNA fragments.
• The mass is obtained by a
fluorescent assay or qPCR
• Average size is generally
determined by visual
estimation from an “Agilent”
Trace”
• But this plot doesn’t show
molarity, it shows mass.
intelligent Fluorescent Units
(iFU)
x
ssnMW
x
smssFU DNA
x
s-2 s-1 s s+1 s+2
l
sv
ds
dv
tds
dv
t
svsvx
11
ds
dv
s
svskFUsn
n(s) is the sample molarity
dsds
dv
s
svsFUkQuantMass
0
so
Consider a capillary sections and DNA of
a mass density of ρ(s). s is for size.
The DNA moves with velocity ν(s) so:
and
To use this you have to determine v(s) from DNA ladder.
An Empirical Method 1. Calculate the area under each ladder band peak
2. Divide each peak’s corresponding mass concentration
by its area and generate a correction factor curve
3. Divide the FU signal of the sample by the correction
curve to generate a corrected mass concentration curve
4. Convert to molarity by dividing by size
The Two Methods Agree
The two methods of calculation have a 94% correlation
Complementary Errors
1. The x-axis is linear in
time not size, so larger
sizes are more bunched
together.
2. Linearizing the X-axis
increases the weight of
larger sizes.
3. Converting to molarity
increases the weight of
smaller sizes.
4. These visual errors
cancel for well shaped
peaks.
An Extreme Example
Correlation to Aligned Sequence
Top: Actual library fragment sizes
found from paired end sequencing and
alignment of an amplicon pool.
Middle: A plot of the same pool
showing mass vs. linearized size.
Bottom: A plot of the same pool
showing molarity calculated by iFU
using the empirical method.
For n=8 amplicons, correlation was 60% for
Agilent and 70% for iFU. For n=51 WGS
samples, correlation was >90% for both
Weighting for Cluster Efficiency
0
0
dssn
dssOLCsn
c
y = 4.8843ln(x) - 15.717 R² = 0.9622
0
2
4
6
8
10
12
14
16
18
20
0 200 400 600 800 1000
OLC
(p
M)
Size [bp]
Optimal Loading Concentration (OLC) vs. Size
The relationship of mean size to clustering efficiency is shown in the
empirically determined curve above. This gives a function OLC(s) which is
then used to find the optimal loading concentration using the following
formula. n(s) is the molarity as a function of size:
Weighting for Cluster Efficiency
3.50E+06
4.00E+06
4.50E+06
5.00E+06
5.50E+06
6.00E+06
6.50E+06C
lust
ers
Pe
r Ti
le
Amplicon Samples
Cluster Density
Cluster Density
iFU Prediction Values
Target Cluster Density
This retrospective prediction assumes a linear relationship between loading
concentration and cluster density. The iFU with OLC(s) weighting reduces the
cluster density error by an average of 60% for these (n=28) amplicons.
qPCR vs. Intercalating Dye
R² = 0.6982
0
20
40
60
80
100
120
140
200 300 400 500 600 700 800
Clu
ster
ing
effi
cien
cy
(K/m
m2
per
pM
)
Average library size (bp)
MiSeq library clustering efficiency - using qPCR quant All types
CTL
WGS
PCR-free WGS
WGBS
RNA-Seq
Amplicon
Direct-amplicon
Linear (All types)
R² = 0.0744
0
20
40
60
80
100
120
140
200 300 400 500 600 700 800
Clu
ster
ing
effi
cien
cy
(K/m
m2
per
pM
)
Average library size (bp)
MiSeq library clustering efficiency - using Qubit quant All types
CTL
WGS
PCR-free WGS
WGBS
RNA-Seq
Amplicon
Direct-amplicon
Linear (All types)
Conclusions
• For “tight” libraries visual estimation of mean
insert size gives good results and cluster errors
are more likely to be from mass quantification
errors than insert size estimation errors.
• For wider library distributions like amplicons,
iFU improves the correlation to the aligned
sequence by ~13% over the mass plot.
• For cluster generation, our prediction method
shows a reduction in cluster density error by
60% in this set (n=28) of amplicons.
Gel Based Size Selection
Why Gel Based Size Selection
• To purify small or large fragments, like
miRNA or kilobase+ fragments
• To mitigate PCR induced biases in
sensitive libraries such as for mRNA
• To rescue libraries where shearing is off
target and beads may miss the “peak of
peaks”
Barracuda
The GSC has built four 96 channel
size selection robots and run over
10,000 samples since 2010.
In Q3-Q4 of 2013, Coastal
Genomics will release The
Ranger based on this technology.
Hole-to-Hole Size Selection
60mm
miRNA Size Selection
N.B. All of the miRNA sequences in The Cancer Genome Atlas Project
were size selected on Barracuda
Mid Length Range
Target: 275-325 bp Target: 175-225bp
Two size selections of NGS libraries prior to adapter ligation
Before size selection
After size selection
Amplicons
A 1kb target before and after Amplicon before and after
10kb Target from HL60
A 9-13kb Target Region
This is an
Agilent artifact!
10kb Target From Sheared
gDNA
18kb Target from Ladder
A: 1kb Extended ladder
B: 2.5kb Molecular Ruler
C: A+B
A2,B2 Target 18-50kb
A3,B3 Target 25-50kb
Barracuda Control Software
Barracuda uses a single pipettor so each sample is loaded and runs
independently.
Performance
• Recovery efficiency is 90% for 10kb
bands.
• Time: Barracuda, with a single channel
can take up to 7.5 hours for a full plate
• Coastal Genomics’ Ranger will have 12
channels so expect full plates in ~1hr
depending on protocol.
Acknowlegements
Instrumentation
Lok Tin Lam
Steven Pleasance
Duane Smailus
Philip Tsao
Quality Assurance
Miruna Bala
Susan Wagner Andrew Mungall
and the Library
Construction
Team
Cluster Density Estimation
Rod Docking
Lateef Yang
Richard Corbett
Aly Karsan
Coastal Genomics
Jared Slobodan
Matt Nesbitt
Tech D, past and present
Martin Hirst
Michelle Moksa
Yongjun Zhao
Pawan Pandoh
Richard Moore
Michael Mayo and
The Sequencing Team
Quality Control in NGS
• You want to have the right amount of DNA
going into the protocol and know it’s not
“degraded”
• You want to know how much and what
size DNA is coming out of the protocol, so
you can get the right cluster density.
• In theory this is simple but sample
diversity and length dependent protocol
steps (shearing, bead cleans) makes it
complex
Declaration
Robin is a co-inventor of the size selection
technology described here and a board
member of Coastal Genomics.
Top Related