Aligning Reads Ramesh Hariharan Strand Life Sciences IISc.
Aligning Reads
Ramesh Hariharan
Strand Life Sciences, IISc
What is Read Alignment?
Subject’s Genome: AGGCTACGCATTTCCCATAAAGACCCACGCTTAAGTTC
Reference Genome: AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC
The reference is close to, but not quite the same as, the subject’s genome. Where do reads from the subject match in the reference?
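The two sequences above have the same length and line up without gaps, so their differences can be listed by a position-by-position comparison. A minimal Python sketch; the function name `snp_differences` is hypothetical and positions are 0-based:

```python
REFERENCE = "AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC"
SUBJECT = "AGGCTACGCATTTCCCATAAAGACCCACGCTTAAGTTC"


def snp_differences(reference: str, subject: str) -> list[tuple[int, str, str]]:
    """Return (0-based position, reference base, subject base) wherever two
    equal-length, gap-free sequences disagree."""
    if len(reference) != len(subject):
        raise ValueError("sequences must have equal length (no gaps)")
    return [(i, r, s) for i, (r, s) in enumerate(zip(reference, subject)) if r != s]


print(snp_differences(REFERENCE, SUBJECT))
# [(11, 'G', 'T'), (20, 'T', 'A'), (28, 'A', 'G')]
```

Three single-base differences separate the subject from the reference here, so any read covering one of these positions cannot match the reference exactly.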
What does “Match” mean?
Reference Genome: AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC
Exact Match: GCTACGCA
With Mismatches: CATAAAGAC
With Gaps: CACTT_AGT (the _ marks a gap)
Why mismatches and gaps?
The subject genome could be different from the reference
(Figure: reads aligned to the reference genome; a SNP and a deletion in the subject show up as mismatches and gaps.)
The reading process could be erroneous
How many mismatches and gaps?
Short reads (~50 bases): few mismatches and gaps
Long reads (~1000 bases): many more mismatches and gaps
How do aligners fare?
BWA: very few mismatches and gaps
CoBWeb
BWA-SW: many mismatches and gaps
BowTie: only mismatches, no gaps
No paired-read handling
No handling of adaptor trimming for small RNA
Separate handling for RNASeq
BowTie2
How does an Aligner work?
For simplicity, assume Exact Match
For each read, scan the entire reference genome sequence
SLOW!!!!
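The brute-force approach can be written in a few lines; this Python sketch (hypothetical name `naive_exact_matches`) makes the cost visible: every read is compared against every start position of the reference, so the work per read grows with the reference length times the read length, and it is repeated for every read in the run.

```python
REFERENCE = "AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC"


def naive_exact_matches(read: str, reference: str) -> list[int]:
    """Try every start position in the reference: up to
    len(reference) * len(read) character comparisons per read."""
    m = len(read)
    return [p for p in range(len(reference) - m + 1) if reference[p:p + m] == read]


print(naive_exact_matches("CA", REFERENCE))        # [8, 15, 25, 27]
print(naive_exact_matches("GCTACGCA", REFERENCE))  # [2]
```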
The Reference: C G A C G
(Figure: a tree-shaped index built over the reference.)
Index the Reference
How can we find Exact Matches of a read quickly with this index?
The Reference: C G A C G
(Figure: the same tree index, with the read C G C traced down from the root.)
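A simple way to picture the tree-shaped index is a suffix trie: insert every suffix of the reference and record at each node which suffixes pass through it. Finding a read is then a walk from the root, one character per step, regardless of the reference length. This Python sketch assumes an uncompressed suffix trie (the slide does not specify the exact tree) and uses hypothetical names; it takes space quadratic in the reference length, far more than a real suffix tree, but keeps the idea visible.

```python
def build_suffix_trie(reference: str) -> dict:
    """Insert every suffix; each node keeps the start positions of the
    suffixes that pass through it (child keys are single bases)."""
    root: dict = {"pos": []}
    for start in range(len(reference)):
        node = root
        for ch in reference[start:]:
            node = node.setdefault(ch, {"pos": []})
            node["pos"].append(start)
    return root


def find_exact(read: str, trie: dict) -> list[int]:
    """Walk down the trie along the read; the node reached lists every match start."""
    node = trie
    for ch in read:
        if ch not in node:
            return []  # the read falls off the tree: no exact match
        node = node[ch]
    return node["pos"]


trie = build_suffix_trie("CGACG")
print(find_exact("CG", trie))   # [0, 3]
print(find_exact("CGC", trie))  # []
```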
The problem: the tree index needs 24GB of memory
Can this structure be compressed?
The Burrows-Wheeler based Index
The Reference: C G A C $
All its circular shifts, sorted lexicographically (here $ sorts after A, C and G):
A C $ C G
C G A C $
C $ C G A
G A C $ C
$ C G A C
The last column (G $ A C C) is the BWT.
The Index: now an array instead of a tree
Sampled to reduce memory at the expense of speed (Ferragina and Manzini)
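The construction, and how an array can stand in for the tree, can be sketched in Python. This is a toy illustration, assuming the slide's ordering ($ after the bases), not a production index: it builds the BWT by sorting all circular shifts (fine only for tiny inputs), stores the full occurrence table where a real FM index would sample it, and counts exact matches with Ferragina and Manzini's backward search. Locating match positions would additionally need a (sampled) suffix array, which is omitted. All names are hypothetical.

```python
ALPHABET = "ACGT$"  # sort order as on the slide: "$" comes after the bases
RANK = {ch: i for i, ch in enumerate(ALPHABET)}


def bwt(text: str) -> str:
    """Last column of the lexicographically sorted circular shifts of text."""
    shifts = sorted((text[i:] + text[:i] for i in range(len(text))),
                    key=lambda s: [RANK[c] for c in s])
    return "".join(s[-1] for s in shifts)


def build_fm_index(last: str) -> tuple[dict, dict]:
    """C[c]: number of text characters that sort before c.
    occ[c][i]: occurrences of c in last[:i], stored for every i here;
    real FM indexes sample these counts to save memory."""
    C, total = {}, 0
    for c in ALPHABET:
        C[c] = total
        total += last.count(c)
    occ = {c: [0] * (len(last) + 1) for c in ALPHABET}
    for i, ch in enumerate(last):
        for c in ALPHABET:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return C, occ


def count_occurrences(pattern: str, C: dict, occ: dict, n: int) -> int:
    """Backward search: extend the match one character to the left at a time,
    keeping the half-open range [lo, hi) of sorted shifts that start with it."""
    lo, hi = 0, n
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo, hi = C[ch] + occ[ch][lo], C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


print(bwt("CGAC$"))  # G$ACC

text = "AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC$"
C, occ = build_fm_index(bwt(text))
print(count_occurrences("CA", C, occ, len(text)))        # 4
print(count_occurrences("GCTACGCA", C, occ, len(text)))  # 1
```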
How about Mismatches and Gaps?
BWA, BWA-SW and BowTie force mismatches and gaps into the BW Index searching procedure
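The slide does not describe how each tool folds mismatches into the search, so the following is only a generic sketch of the idea: a backtracking backward search over a BW index that, at each step, tries all four bases and charges one mismatch whenever the chosen base differs from the read. It allows mismatches only (gaps would need extra insertion and deletion branches), builds the index naively from a full suffix array, and uses hypothetical names.

```python
REFERENCE = "AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC"


def build_index(reference: str):
    """Naive suffix array, BWT, C table and occurrence table (toy sizes only)."""
    text = reference + "$"  # "$" sorts before A, C, G, T in Python
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    last = "".join(text[i - 1] for i in sa)  # text[-1] is "$" when i == 0
    C = {c: sum(x < c for x in text) for c in "ACGT"}
    occ = {c: [0] * (len(last) + 1) for c in "ACGT"}
    for i, ch in enumerate(last):
        for c in "ACGT":
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, C, occ


def search_with_mismatches(read: str, max_mm: int, index) -> list[tuple[int, int]]:
    """Return sorted (start, mismatches) for every placement with at most max_mm
    mismatches, found by backtracking through the BW index."""
    sa, C, occ = index
    hits = {}

    def extend(i: int, lo: int, hi: int, mm: int) -> None:
        if lo >= hi:
            return  # no suffix starts with the string built so far
        if i < 0:
            for row in range(lo, hi):
                hits[sa[row]] = mm  # whole read placed: record its start
            return
        for ch in "ACGT":
            cost = 0 if ch == read[i] else 1
            if mm + cost <= max_mm:  # prune branches that exceed the budget
                extend(i - 1, C[ch] + occ[ch][lo], C[ch] + occ[ch][hi], mm + cost)

    extend(len(read) - 1, 0, len(sa), 0)
    return sorted(hits.items())


index = build_index(REFERENCE)
print(search_with_mismatches("CATAAAGAC", 0, index))  # []
print(search_with_mismatches("CATAAAGAC", 1, index))  # [(15, 1)]
```

The number of branches explored grows quickly with the mismatch budget, so this style of search stays practical only for small numbers of differences.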
CoBWeb uses the BW Index to find a ‘seed’ exact match and does Smith-Waterman around this seed
This 15-mer occurs at locations x1, x2…
This 15-mer occurs at locations x3, x4…
This whole 30-mer occurs at location x5
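The seeding step can be sketched in Python, assuming non-overlapping k-mer seeds and, for brevity, a plain hash table of k-mer positions in place of the BW index (a deliberate swap for readability; the names are hypothetical). A seed at read offset o that hits reference location x votes for the read starting at x - o; two seeds agreeing on one start plays the role of the whole 30-mer occurring at x5.

```python
from collections import Counter, defaultdict

REFERENCE = "AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC"


def kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer of the reference to the positions where it occurs."""
    index = defaultdict(list)
    for p in range(len(reference) - k + 1):
        index[reference[p:p + k]].append(p)
    return index


def candidate_starts(read: str, index: dict[str, list[int]], k: int) -> Counter:
    """Split the read into non-overlapping k-mer seeds; each exact seed hit
    votes for the reference position where the whole read would begin."""
    votes = Counter()
    for offset in range(0, len(read) - k + 1, k):
        for location in index.get(read[offset:offset + k], []):
            votes[location - offset] += 1
    return votes


k = 10
index = kmer_index(REFERENCE, k)
read = REFERENCE[5:25]                       # a 20-base read, two 10-mer seeds
print(candidate_starts(read, index, k))      # Counter({5: 2})
mutated = read[:15] + "A" + read[16:]        # one mismatch inside the second seed
print(candidate_starts(mutated, index, k))   # Counter({5: 1})
```

Each candidate start would then be checked with a full alignment around the seed rather than trusted as is.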
Dynamic Programming
• Given a location in the reference where a read is anchored, how well does the read match there?
(Figure: the read placed against the reference, anchored by a matching 14-mer.)
• Smith-Waterman (optimized for large gaps)
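A compact Smith-Waterman sketch in Python. The slide says CoBWeb's version is optimized for large gaps but not how; this sketch instead shows one standard way to make long gaps affordable, an affine gap penalty (Gotoh's formulation), where opening a gap costs `gap_open` and each further base costs only `gap_extend`. The scoring values are arbitrary assumptions, and only the best local score is returned (no traceback). With the defaults, a 3-base gap costs 5 + 2 = 7, where charging 5 per base would cost 15.

```python
def smith_waterman_affine(read: str, ref: str, match: int = 2, mismatch: int = -3,
                          gap_open: int = 5, gap_extend: int = 1) -> int:
    """Best local alignment score; a gap of length L costs gap_open + (L - 1) * gap_extend."""
    NEG = float("-inf")
    m, n = len(read), len(ref)
    H = [[0] * (n + 1) for _ in range(m + 1)]    # best score ending at (i, j)
    E = [[NEG] * (n + 1) for _ in range(m + 1)]  # ... ending in a gap in the read
    F = [[NEG] * (n + 1) for _ in range(m + 1)]  # ... ending in a gap in the reference
    best = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)
            diag = H[i - 1][j - 1] + (match if read[i - 1] == ref[j - 1] else mismatch)
            H[i][j] = max(0, diag, E[i][j], F[i][j])  # 0 lets a local alignment restart
            best = max(best, H[i][j])
    return best


print(smith_waterman_affine("ACGTTGCA", "ACGTTGCA"))  # 16

X, Y = "ACGGTCAAGC", "CATGGACTTC"
# at least 33: 20 matches * 2, minus 5 + 2 * 1 for the single 3-base gap
print(smith_waterman_affine(X + Y, X + "TTT" + Y))
```

In an aligner, `ref` would be a window of the genome around the anchored position rather than a whole chromosome, which keeps the quadratic table small.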
Comparison with BWA
(Figure panels: read length 50, read length 150)
20% faster than BWA, with comparable results
CoBWeb: 3 mismatches and 2 gaps
BWA: 2 mismatches + 1 gap, possibly several bases long
Comparison with BWA-SW
Read length 400, 8 mismatches plus 10 gaps

| | CoBWeb | BWA-SW |
|---|---|---|
| Reads | 1 million | 1 million |
| Time taken | 1130 s | 2242 s |
| Incorrectly mapped | 12598 | 9819 |

5650 mapped incorrectly by BWA-SW
The remainder have poor BWA mapping quality
Avadis NGS
Avadis NGS: Alignment, DNA Variant Detection, RNASeq, ChIPSeq, Small RNASeq
Thank You