Alignment of raw reads in Avadis NGS
© Strand Life Sciences. Strictly Confidential
Pioneering Scientific Intelligence
DNA/Small RNA Alignment in Avadis NGS 1.3
© Strand
Questions we will seek to answer in this presentation
What is an Alignment algorithm?
What issues must an Alignment algorithm consider?
How do Alignment algorithms work?
How does CoBWeb work?
How does CoBWeb compare with other algorithms?
How is CoBWeb exposed in Avadis NGS?
What is the future evolution of CoBWeb?
What is an Alignment algorithm?
Subject’s Genome: AGGCTACGCATTTCCCATAAAGACCCACGCTTAAGTTC
Reference Genome: AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC
The Reference Genome is close to, but not quite the same as, the Subject’s Genome.
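To make the difference between the two sequences above concrete, here is a minimal sketch that compares them position by position. The function name `mismatch_positions` is made up for illustration, and the comparison assumes both sequences have the same length with no gaps, which holds for this example only.

```python
# The two sequences shown on the slide.
subject = "AGGCTACGCATTTCCCATAAAGACCCACGCTTAAGTTC"
reference = "AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC"

def mismatch_positions(a, b):
    """Return the 0-based positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("this sketch only handles equal-length sequences")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

for pos in mismatch_positions(subject, reference):
    print(pos, subject[pos], reference[pos])
```

The two 38-base sequences differ at three positions. A real aligner cannot rely on this kind of direct comparison, because it does not know in advance where in the Reference a Read belongs, and gaps shift the positions.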
What issues must an Alignment algorithm consider?
Mismatches and Gaps
[Figure labels: Reads, Reference Genome, SNP, Deletion]
Handling paired reads
[Figure labels: Subject’s Genome, Reference Genome, Repeat Region, Repeat Region, ×]
A variety of Read Lengths
Short reads: ~50 bases, few mismatches and gaps.
Long reads: a few hundred to thousands of bases, many more mismatches and gaps.
Speed and Memory
Run in 4GB RAM.
Allow use of multiple cores/processors.
Scale speed with more memory.
Billions of reads.
How do Alignment algorithms work?
Indexing the Genome to find Seed Matches
Scanning the Reference for each Read takes too long.
The Index very quickly yields locations in the Reference where some part (seed) of the Read matches.
The Reference Index
This Seed occurs at Reference locations x1, x2…
This Seed occurs at Reference locations x3, x4…
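The idea of an index that maps seeds to Reference locations can be sketched with a plain dictionary of fixed-length seeds (k-mers). This is only an illustration of the lookup step: the names `build_seed_index` and `seed_hits` and the choice of a hash table are assumptions, not the data structure the presentation describes later.

```python
from collections import defaultdict

def build_seed_index(reference, k):
    """Map every k-mer of the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index

def seed_hits(index, read, k):
    """Return (read_offset, reference_position) for every k-mer of the read found in the index."""
    hits = []
    for offset in range(len(read) - k + 1):
        seed = read[offset:offset + k]
        # .get avoids inserting empty entries into the defaultdict.
        for pos in index.get(seed, []):
            hits.append((offset, pos))
    return hits

reference = "AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC"
index = build_seed_index(reference, 4)
print(index["CCCA"])                 # every location of this seed
print(seed_hits(index, "TACGC", 4))  # candidate locations for a short read
```

Building the index costs one pass over the Reference; after that, each seed lookup avoids rescanning the Reference for every Read.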
Detailed Alignment at Seed Match Locations
[Figure labels: Reference, Read, Seed Match]
How many Mismatches and Gaps are needed for the Read to match around the Seed?
Smith-Waterman or Dynamic Programming
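The dynamic programming step can be sketched as a textbook Smith-Waterman local alignment. The scoring values (match 2, mismatch -1, gap -2) and the linear gap penalty are assumptions chosen for illustration; the slides do not state the scoring the product uses.

```python
def smith_waterman(reference, read, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_reference, aligned_read) for a local alignment."""
    rows, cols = len(read) + 1, len(reference) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if read[i - 1] == reference[j - 1] else mismatch
            score[i][j] = max(
                0,                        # start a new local alignment here
                score[i - 1][j - 1] + step,  # match or mismatch
                score[i - 1][j] + gap,       # read base aligned to a gap
                score[i][j - 1] + gap,       # reference base aligned to a gap
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    aligned_ref, aligned_read = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        step = match if read[i - 1] == reference[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + step:
            aligned_ref.append(reference[j - 1])
            aligned_read.append(read[i - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            aligned_ref.append("-")
            aligned_read.append(read[i - 1])
            i -= 1
        else:
            aligned_ref.append(reference[j - 1])
            aligned_read.append("-")
            j -= 1
    return best, "".join(reversed(aligned_ref)), "".join(reversed(aligned_read))

score, ref_row, read_row = smith_waterman("ACGTACGT", "ACGAACGT")
print(score)
print(ref_row)
print(read_row)
```

Counting the differing columns and the `-` characters in the two returned rows answers the slide's question of how many Mismatches and Gaps the Read needs. In practice this step would be run only on the Reference window around each Seed Match, not on the whole Reference.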
The Burrows-Wheeler based Index
The original Reference: C G A C $
All its circular shifts, sorted lexicographically (here $ sorts after A, C and G), with their Circular Shift Indices:
2   A C $ C G
0   C G A C $
3   C $ C G A
1   G A C $ C
4   $ C G A C
The last column (G $ A C C) is the BWT.
The Index comprises the BWT and the Circular Shift Indices, along with some housekeeping data structures.
These can be sampled to fit into reduced memory at the expense of speed, without sacrificing correctness.
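The construction on this slide can be reproduced with a short sketch. It sorts all circular shifts explicitly, which is fine for a five-character example but far too slow and memory-hungry for a genome; real indexes use dedicated construction algorithms. The character order A < C < G < T < $ is an assumption inferred from the row order on the slide, and the function names are made up for illustration.

```python
# Assumed character order; putting $ last reproduces the slide's row order.
ORDER = {"A": 0, "C": 1, "G": 2, "T": 3, "$": 4}

def sorted_shifts(reference):
    """Return the circular shift indices in lexicographic order of their rotations."""
    def key(shift):
        rotation = reference[shift:] + reference[:shift]
        return [ORDER[c] for c in rotation]
    return sorted(range(len(reference)), key=key)

def bwt(reference):
    """Return the last column of the sorted rotation matrix."""
    # The rotation starting at `shift` ends with the character just before it.
    return "".join(reference[shift - 1] for shift in sorted_shifts(reference))

reference = "CGAC$"
print(sorted_shifts(reference))
print(bwt(reference))
```

For `CGAC$` this yields the shift indices 2, 0, 3, 1, 4 (the slide's "20314") and the BWT `G$ACC`.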
The Burrows-Wheeler based Index
All Exact Matches of a Read (NO Mismatches or Gaps) in the Reference can be found in time proportional to the length of the Read and largely independent of the size of the Reference.
[Figure labels: Reference, Read, EXACT Match]
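The claim about exact matching can be illustrated with the standard backward search over a Burrows-Wheeler index. This is a generic textbook sketch, not the product's code: the index is built naively, the occurrence table is stored unsampled, the character order A < C < G < T < $ is an assumption, and reads are assumed to contain only A, C, G and T.

```python
ORDER = {"A": 0, "C": 1, "G": 2, "T": 3, "$": 4}  # assumed character order

def build_index(reference):
    """Build (shift indices, first-column offsets, occurrence counts) for a $-terminated reference."""
    n = len(reference)
    shifts = sorted(range(n), key=lambda s: [ORDER[c] for c in reference[s:] + reference[:s]])
    bwt = "".join(reference[s - 1] for s in shifts)
    # first[c]: how many reference characters sort before c.
    first = {c: sum(1 for x in reference if ORDER[x] < ORDER[c]) for c in ORDER}
    # occ[c][i]: how many times c occurs in bwt[:i].
    occ = {c: [0] * (n + 1) for c in ORDER}
    for i, ch in enumerate(bwt):
        for c in ORDER:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return shifts, first, occ

def exact_matches(index, read):
    """Return sorted reference positions where the read occurs exactly."""
    shifts, first, occ = index
    lo, hi = 0, len(shifts)
    # One step per read character, processed right to left.
    for c in reversed(read):
        if c not in first:
            return []
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(shifts[lo:hi])

index = build_index("CGAC$")
print(exact_matches(index, "C"))
print(exact_matches(index, "AC"))
```

The loop in `exact_matches` runs once per Read character and each step is a table lookup, which is where the "proportional to the length of the Read" behaviour comes from. Reporting the positions at the end still costs time proportional to the number of matches.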
How does CoBWeb work?
Seeding Strategy
Use the BW based index, augmented with additional data structures for speed, to find one or more Long Seed Matches in the Reference.
This 15-mer occurs at locations x1, x2…
This 15-mer occurs at locations x3, x4…
This whole 30-mer occurs at location x5
Justification: Most long Reads do not have Mismatches and Gaps strewn across their length; there are usually long stretches that match exactly. And Long Seeds will have few matching locations.
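The slides do not spell out how CoBWeb picks its Long Seeds, so the following is only a rough sketch of the general idea: greedily extend an exact match along the Read for as long as it still occurs in the Reference, and keep the stretches that are long enough. The greedy strategy, the `min_len` parameter, the function names and the use of plain string search in place of the BW based index are all assumptions made for illustration.

```python
def find_all(reference, seed):
    """All start positions of seed in reference (naive stand-in for an index lookup)."""
    hits, start = [], reference.find(seed)
    while start != -1:
        hits.append(start)
        start = reference.find(seed, start + 1)
    return hits

def long_seeds(reference, read, min_len=15):
    """Greedily collect exact-match stretches of the read that are at least min_len long."""
    seeds = []
    i = 0
    while i < len(read):
        j = i
        # Extend the candidate seed while it still occurs somewhere in the reference.
        while j < len(read) and read[i:j + 1] in reference:
            j += 1
        if j - i >= min_len:
            seed = read[i:j]
            seeds.append((i, seed, find_all(reference, seed)))
        # Continue after the stretch, or skip a base that never occurs in the reference.
        i = j if j > i else i + 1
    return seeds

reference = "AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC"
# A read copied from the reference with one base changed in the middle.
read = reference[4:16] + "T" + reference[17:30]
for offset, seed, locations in long_seeds(reference, read, min_len=8):
    print(offset, seed, locations)
```

Because the seed length falls out of the data, the same loop serves Reads of any length, and each long seed here maps to a single Reference location. Being greedy, this sketch can miss part of a matching stretch that begins inside a shorter one.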
Advantages
Seed length is not specified in advance, so Long and Short reads can be handled seamlessly.
Separating the Smith-Waterman phase from the BW Index search allows an unlimited number of gaps and mismatches.
How does CoBWeb compare with other algorithms?
Comparison with BWA
Read Length 50
Read Length 150
A little faster than BWA with comparable results
CoBWeb: 94% Alignment Score with up to 2 Gaps
BWA: 4% error + 1 gap of possibly multiple length
How is CoBWeb exposed in Avadis NGS?
Entry
Two new experiment types: DNA Alignment and Small-RNA Alignment
The Alignment Workflow
Run Alignment, and then create a DNA Variant or ChIP-Seq Experiment from the results.
Alignment Parameters
Specify number of Mismatches and Gaps, and handling of Multiple Matching.
Specify Adaptor Trimming (only for Small RNA) and 3’ and 5’ trimming based on quality.
Screen against Contaminant Databases.
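The trimming options above can be pictured with a small sketch of what such pre-processing typically does. This is not Avadis NGS code and does not reflect its actual parameters: the function names, the quality threshold of 20, the use of Phred-style integer scores and the adaptor sequence in the example are all hypothetical.

```python
def trim_3prime_by_quality(bases, qualities, min_quality=20):
    """Drop bases from the 3' end while their quality is below min_quality."""
    end = len(bases)
    while end > 0 and qualities[end - 1] < min_quality:
        end -= 1
    return bases[:end], qualities[:end]

def trim_adaptor(bases, adaptor):
    """Cut the read at the first full occurrence of the adaptor, if there is one."""
    cut = bases.find(adaptor)
    return bases if cut == -1 else bases[:cut]

# Hypothetical read, quality scores and adaptor sequence.
print(trim_3prime_by_quality("ACGTACGT", [30, 30, 30, 30, 30, 10, 25, 5]))
print(trim_adaptor("ACGTTTGGAATTC", "TGGAATTC"))
```

Trimming at the 5’ end would mirror the first function from the other side. Production trimmers usually also handle adaptors that are only partially present at the end of a read, which this sketch does not attempt.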
What is the future evolution of CoBWeb?
ToDos
Chimeric Reads
RNA-Seq Alignment
Base Quality recalibration
Affine Gap Costs