Alignment of raw reads in Avadis NGS

© Strand Life Sciences Strictly Confidential Pioneering Scientific Intelligence DNA/Small RNA Alignment in Avadis NGS 1.3

Avadis NGS provides support for aligning raw reads for small RNA, ChIP-Seq and DNA-Seq analysis. The alignment algorithm "CoBWeb" integrated with Avadis NGS is a new proprietary algorithm based on the Burrows-Wheeler Transform.

Transcript of Alignment of raw reads in Avadis NGS

Page 1: Alignment of raw reads in Avadis NGS

© Strand Life Sciences Strictly Confidential

Pioneering Scientific Intelligence

DNA/Small RNA Alignment in Avadis NGS 1.3

Page 2: Alignment of raw reads in Avadis NGS

Questions we will seek to answer in this presentation

What is an Alignment algorithm?

What issues must an Alignment algorithm consider?

How do Alignment algorithms work?

How does CoBWeb work?

How does CoBWeb compare with other algorithms?

How is CoBWeb exposed in Avadis NGS?

What is the future evolution of CoBWeb?

Page 3: Alignment of raw reads in Avadis NGS

What is an Alignment algorithm?

Page 4: Alignment of raw reads in Avadis NGS

Subject’s Genome: AGGCTACGCATTTCCCATAAAGACCCACGCTTAAGTTC

Reference Genome: AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC

The Reference Genome is close to, but not quite the same as, the Subject’s Genome.
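
The difference between the two sequences on this slide can be made concrete with a few lines of Python. This is only an illustrative sketch: the function name `differences` is made up here, and it assumes equal-length sequences that differ by substitutions only (no gaps).

```python
# The two sequences shown on the slide.
subject = "AGGCTACGCATTTCCCATAAAGACCCACGCTTAAGTTC"
reference = "AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC"


def differences(subject, reference):
    """Return (position, subject_base, reference_base) for each mismatch.

    Hypothetical helper: assumes both sequences have the same length,
    so only substitutions (not insertions or deletions) are reported.
    """
    if len(subject) != len(reference):
        raise ValueError("sequences must have equal length")
    return [
        (i, s, r)
        for i, (s, r) in enumerate(zip(subject, reference))
        if s != r
    ]


diffs = differences(subject, reference)
print(diffs)  # three substitutions, at 0-based positions 11, 20 and 28
```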

Page 5: Alignment of raw reads in Avadis NGS

What issues must an Alignment algorithm consider?

Page 6: Alignment of raw reads in Avadis NGS

Mismatches and Gaps

Figure labels: Reads, Reference Genome, SNP, Deletion

Page 7: Alignment of raw reads in Avadis NGS

Handling paired reads

Figure labels: Subject’s Genome, Reference Genome, Repeat Region (twice), ×

Page 8: Alignment of raw reads in Avadis NGS

A variety of Read Lengths

Short reads: ~50 bases, few mismatches and gaps

Long reads: a few hundred to thousands of bases, many more mismatches and gaps

Page 9: Alignment of raw reads in Avadis NGS

Speed and Memory

Run in 4GB RAM

Allow use of multiple cores/processors

Scale speed with more memory

Billions of reads.

Page 10: Alignment of raw reads in Avadis NGS

How do Alignment algorithms work?

Page 11: Alignment of raw reads in Avadis NGS

Indexing the Genome to find Seed Matches

Scanning the Reference for each Read takes too long.

The Index very quickly yields locations in the Reference where some part (seed) of the Read matches.

The Reference Index

This Seed occurs at Reference locations x1, x2…

This Seed occurs at Reference locations x3, x4…
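
The idea of a seed index can be sketched with a plain hash table of fixed-length k-mers. This is a simplified stand-in for illustration, not the Burrows-Wheeler index described later in the deck; the names `build_seed_index` and `seed_hits` and the seed length `k = 4` are assumptions made for this example.

```python
from collections import defaultdict


def build_seed_index(reference, k):
    """Map every k-mer of the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def seed_hits(read, index, k):
    """Look up non-overlapping k-mer seeds of the read.

    Returns (offset_in_read, position_in_reference) pairs.
    """
    hits = []
    for offset in range(0, len(read) - k + 1, k):
        for pos in index.get(read[offset:offset + k], []):
            hits.append((offset, pos))
    return hits


# Toy data: the reference sequence from the earlier slide and a read taken from it.
reference = "AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC"
index = build_seed_index(reference, k=4)
hits = seed_hits("TCCCATAA", index, k=4)
print(hits)  # both seeds point to a read start at reference position 12
```

Each hit suggests a candidate placement of the read at `position - offset`; the index is built once and reused for every read, which is what avoids rescanning the Reference.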

Page 12: Alignment of raw reads in Avadis NGS

Detailed Alignment at Seed Match Locations

Figure labels: Reference, Read, Seed Match

How many Mismatches and Gaps are needed for the Read to match around the Seed?

Smith-Waterman or Dynamic Programming
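
A minimal Smith-Waterman sketch shows the dynamic programming step applied around a seed. The scoring values (match +1, mismatch -1, linear gap penalty -1) and the function name are assumptions chosen for illustration; the deck does not state which scores CoBWeb uses.

```python
def smith_waterman(read, window, match=1, mismatch=-1, gap=-1):
    """Return the best local alignment score of `read` against `window`.

    `window` stands for the stretch of reference around a seed match.
    Scores are illustrative assumptions; gaps use a linear penalty.
    """
    rows, cols = len(read) + 1, len(window) + 1
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (
                match if read[i - 1] == window[j - 1] else mismatch
            )
            score[i][j] = max(
                0,                      # start a new local alignment
                diagonal,               # match or mismatch
                score[i - 1][j] + gap,  # gap in the window
                score[i][j - 1] + gap,  # gap in the read
            )
            best = max(best, score[i][j])
    return best


print(smith_waterman("ACGT", "TTACGTTT"))      # exact match inside the window
print(smith_waterman("ACGTACGT", "ACGAACGT"))  # one mismatch in the middle
```

The table has one cell per (read base, reference base) pair, which is why this step is run only on short windows around seed matches rather than against the whole Reference.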

Page 13: Alignment of raw reads in Avadis NGS

The Burrows-Wheeler based Index

The original Reference: CGAC$

All its circular shifts, sorted lexicographically, each preceded by its Circular Shift Index:

2  AC$CG
0  CGAC$
3  C$CGA
1  GAC$C
4  $CGAC

The last column (G$ACC) is the BWT.

The Index comprises these along with some housekeeping data structures.

These can be sampled to fit into reduced memory at the expense of speed without sacrificing correctness.
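
The slide's small example can be reproduced with a naive construction that sorts all circular shifts explicitly. This is a sketch for illustration only: the function name `bwt_index` is made up, and the character order `ACGT$` (with `$` sorting after the letters) is an assumption inferred from the shift order 2, 0, 3, 1, 4 shown on the slide. Practical indexes are built with far more memory-efficient methods.

```python
def bwt_index(reference, order="ACGT$"):
    """Return (shift_indices, bwt) for `reference` with a '$' terminator appended.

    `order` is an assumed alphabet order; '$' is placed last so that the
    result matches the example on the slide.
    """
    rank = {c: i for i, c in enumerate(order)}
    text = reference + "$"
    # Sort the starting positions of all circular shifts by the shifted text.
    shifts = sorted(
        range(len(text)),
        key=lambda i: [rank[c] for c in text[i:] + text[:i]],
    )
    # The BWT is the last character of each sorted shift.
    bwt = "".join(text[i - 1] for i in shifts)
    return shifts, bwt


shifts, bwt = bwt_index("CGAC")
print(shifts)  # [2, 0, 3, 1, 4]
print(bwt)     # G$ACC
```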

Page 14: Alignment of raw reads in Avadis NGS

The Burrows-Wheeler based Index

All Exact Matches of a Read (NO Mismatches or Gaps) in the Reference can be found in time proportional to the length of the Read and largely independent of the size of the Reference.

Figure labels: Reference, Read, EXACT Match
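
Exact matching on such an index is usually done by backward search, which consumes the Read one character at a time from its last base to its first. The sketch below is a textbook-style illustration under stated assumptions, not CoBWeb's implementation: the names are hypothetical, `$` is again assumed to sort last, and occurrence counts are recomputed by scanning the BWT, whereas a real index would keep precomputed (possibly sampled) count tables so that each step costs constant time.

```python
ORDER = "ACGT$"  # assumed alphabet order, '$' last as in the earlier example
RANK = {c: i for i, c in enumerate(ORDER)}


def build_index(reference):
    """Build shift indices, BWT and the 'first' table for `reference` + '$'."""
    text = reference + "$"
    shifts = sorted(
        range(len(text)),
        key=lambda i: [RANK[c] for c in text[i:] + text[:i]],
    )
    bwt = "".join(text[i - 1] for i in shifts)
    # first[c] = number of characters in the text that sort before c.
    first = {c: sum(1 for x in text if RANK[x] < RANK[c]) for c in ORDER}
    return shifts, bwt, first


def exact_matches(read, shifts, bwt, first):
    """Return sorted reference positions where `read` occurs exactly."""
    lo, hi = 0, len(bwt)  # half-open range of sorted shifts still matching
    for c in reversed(read):
        if c not in first:
            return []
        # Naive occurrence counting; real indexes use precomputed tables.
        lo = first[c] + bwt[:lo].count(c)
        hi = first[c] + bwt[:hi].count(c)
        if lo >= hi:
            return []
    return sorted(shifts[lo:hi])


shifts, bwt, first = build_index("CGAC")
print(exact_matches("AC", shifts, bwt, first))  # [2]
print(exact_matches("C", shifts, bwt, first))   # [0, 3]
```

The loop runs once per base of the Read, which is where the "time proportional to the length of the Read" property comes from once the counting step is made constant-time.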

Page 15: Alignment of raw reads in Avadis NGS

How does CoBWeb work?

Page 16: Alignment of raw reads in Avadis NGS

Seeding Strategy

Use the BW based index, augmented with additional data structures for speed, to find one or more Long Seed Matches in the Reference.

This 15-mer occurs at locations x1, x2…

This 15-mer occurs at locations x3, x4…

This whole 30-mer occurs at location x5

Justification: Most long Reads do not have Mismatches and Gaps strewn across their length; there are usually long stretches that match exactly. And Long Seeds will have few matching locations.
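
CoBWeb itself is proprietary, so the following is only a rough sketch of the general idea of variable-length seeds: start with a minimum-length exact match and keep extending it while it still occurs in the Reference. The function names and the greedy left-to-right scan are assumptions made for illustration, and plain substring search stands in for the BW based index. The toy run uses `min_len=8` on the short sequences from the earlier slide rather than the 15-mers mentioned above.

```python
def occurrences(text, pattern):
    """Return all start positions of `pattern` in `text`."""
    hits, pos = [], text.find(pattern)
    while pos != -1:
        hits.append(pos)
        pos = text.find(pattern, pos + 1)
    return hits


def long_seeds(read, reference, min_len=15):
    """Greedily collect long exact seeds as (read_offset, length, locations).

    Illustrative only: substring search is used instead of an index.
    """
    seeds = []
    start = 0
    while start + min_len <= len(read):
        end = start + min_len
        if read[start:end] not in reference:
            start += 1  # no seed of minimum length starts here
            continue
        # Extend the seed while the longer stretch still matches exactly.
        while end < len(read) and read[start:end + 1] in reference:
            end += 1
        seeds.append((start, end - start, occurrences(reference, read[start:end])))
        start = end
    return seeds


reference = "AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC"
read = "AGGCTACGCATTTCCCATAAAGACCCACGCTTAAGTTC"  # differs at three positions
seeds = long_seeds(read, reference, min_len=8)
print(seeds)
```

In this toy run each seed has a single matching location, which illustrates the point that longer seeds leave fewer candidate locations for the detailed alignment step.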

Page 17: Alignment of raw reads in Avadis NGS

Advantages

Seed length is not specified in advance, so Long and Short reads can be handled seamlessly.

Separating the Smith-Waterman phase from the BW Index search allows an unlimited number of gaps and mismatches.

Page 18: Alignment of raw reads in Avadis NGS

How does CoBWeb compare with other algorithms?

Page 19: Alignment of raw reads in Avadis NGS

Comparison with BWA

Read Length 50

Read Length 150

A little faster than BWA with comparable results

CoBWeb: 94% Alignment Score with up to 2 Gaps

BWA: 4% error + 1 gap of possibly multiple length

Page 20: Alignment of raw reads in Avadis NGS

How is CoBWeb exposed in Avadis NGS?

Page 21: Alignment of raw reads in Avadis NGS

Entry

Two new experiment types, DNA Alignment and Small-RNA Alignment

Page 22: Alignment of raw reads in Avadis NGS

The Alignment Workflow

Run Alignment, and then create a DNA Variant or ChIP-Seq Experiment from the results.

Page 23: Alignment of raw reads in Avadis NGS

Alignment Parameters

Specify number of Mismatches and Gaps, and handling of Multiple Matching.

Specify Adaptor Trimming (only for Small RNA) and 3’ and 5’ trimming based on quality.

Screen against Contaminant Databases.
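
The two trimming steps named above can be sketched as follows. This is a generic illustration and not how Avadis NGS implements them: the function names, the adaptor sequence in the example, the `min_overlap` of 5 and the quality threshold of 20 are all assumptions.

```python
def trim_adaptor(read, adaptor, min_overlap=5):
    """Remove a 3' adaptor: a full copy anywhere, or a prefix of it at the read end."""
    pos = read.find(adaptor)
    if pos != -1:
        return read[:pos]
    # Look for the longest adaptor prefix that ends the read.
    for k in range(min(len(adaptor) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adaptor[:k]):
            return read[:-k]
    return read


def trim_quality(read, quals, threshold=20):
    """Strip bases below `threshold` from the 5' and 3' ends of the read."""
    start, end = 0, len(read)
    while start < end and quals[start] < threshold:
        start += 1
    while end > start and quals[end - 1] < threshold:
        end -= 1
    return read[start:end], quals[start:end]


# Hypothetical adaptor sequence and quality values, for illustration only.
print(trim_adaptor("ACGTACGTTGGAATTC", "TGGAATTCTCGG"))
print(trim_quality("ACGTACGT", [5, 10, 30, 30, 30, 30, 12, 3]))
```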

Page 24: Alignment of raw reads in Avadis NGS

What is the future evolution of CoBWeb?

Page 25: Alignment of raw reads in Avadis NGS

ToDos

Chimeric Reads

RNA-Seq Alignment

Base Quality recalibration

Affine Gap Costs

Page 26: Alignment of raw reads in Avadis NGS

http://www.avadis-ngs.com