Velvet: Algorithms for De Novo Short Read Assembly Using De Bruijn Graphs
March 12, 2008
Daniel R. Zerbino and Ewan Birney
Presenter: Seunghak Lee
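What Is a de Bruijn Graph?
A de Bruijn graph is a directed graph in which an edge represents an overlap between sequences of symbols
Nodes: all length-$n$ sequences over an alphabet $S = \{s_1, s_2, \dots, s_m\}$, i.e. $V = S^n$
Edges: $E = \{((v_1, \dots, v_n), (w_1, \dots, w_n)) : v_2 = w_1, v_3 = w_2, \dots, v_n = w_{n-1}\}$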
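To make the definition concrete, the sketch below enumerates the nodes and edges of the de Bruijn graph for a small alphabet, following the definition above. The function name `de_bruijn_graph` is hypothetical and used only for illustration.

```python
from itertools import product

def de_bruijn_graph(alphabet, n):
    """Hypothetical helper: nodes and edges of the de Bruijn graph of length-n words."""
    # Nodes: every length-n word over the alphabet (m**n of them).
    nodes = ["".join(p) for p in product(alphabet, repeat=n)]
    # Edges: v -> w whenever v2 = w1, ..., vn = w(n-1), i.e. v[1:] == w[:-1].
    # For each v, appending any alphabet symbol to v[1:] gives such a w.
    edges = [(v, v[1:] + s) for v in nodes for s in alphabet]
    return nodes, edges

nodes, edges = de_bruijn_graph("01", 3)
print(len(nodes), len(edges))  # 8 16
```

Every node has exactly m outgoing and m incoming edges (one per symbol appended or prepended).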
Introduction
New sequencing technologies are commercially available (e.g., 454 Sequencing, Solexa)
454 Sequencing: ~100–200 bp reads
Solexa: ~30 bp reads
Algorithms for whole-genome shotgun (WGS) assembly are not suited to short reads:
An overlap graph with one node per read becomes extremely large
Short reads produce more ambiguous connections in the assembly
Introduction (cont)
The EULER assembler (Pevzner 2001) used k-mers as the nodes of a de Bruijn graph
Reads are mapped as paths through the de Bruijn graph
High redundancy (coverage) does not increase the number of nodes, since repeated k-mers map to the same node
Velvet uses de Bruijn graphs of k-mers to handle experimental errors and repeats effectively
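The claim that redundancy does not grow the graph can be checked on a toy example. The sketch below maps error-free, single-stranded reads to paths of k-mers and counts distinct k-mers (one node per k-mer, a simplification of Velvet's nodes) as the same genome is sampled at increasing coverage. The genome, read length, and helper names are hypothetical.

```python
def kmers(seq, k):
    """All overlapping k-mers of seq, in order: a read's path through the graph."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def tile_reads(genome, read_len, step):
    """Hypothetical sampler: error-free reads starting every `step` bases."""
    return [genome[i:i + read_len] for i in range(0, len(genome) - read_len + 1, step)]

def node_count(reads, k):
    """Number of distinct k-mers (nodes) touched by the reads."""
    return len({km for r in reads for km in kmers(r, k)})

# Hypothetical toy data: 36 bp genome, 12 bp reads, k = 5.
genome = "ATGGCGTACGTTAGCCTAGGCATCGATCGGATTCAC"
k, read_len = 5, 12
for step in (8, 4, 1):  # smaller step -> more reads -> higher coverage
    reads = tile_reads(genome, read_len, step)
    print(len(reads), "reads ->", node_count(reads, k), "distinct k-mers")
```

The read count grows from 4 to 25 while the node count stays fixed once every k-mer of the genome has been seen.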
De Bruijn Graphs – structure
Structure
De Bruijn Graphs – structure (cont)
Within a node, adjacent k-mers overlap by k-1 nucleotides
Each node is attached to a twin node holding the reverse series of reverse-complement k-mers; this captures overlaps between reads from opposite strands
The union of a node and its twin node is called a “block”
The last k-mer of a node overlaps with the first k-mer of its destination node
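A node and its twin can be illustrated with plain strings: the twin's k-mers are the reverse complements of the node's k-mers taken in reverse order, so the twin spells the reverse complement of the node's sequence. This is a sketch with hypothetical helper names; Velvet's actual data structures differ.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def twin_kmers(node_kmers):
    """Hypothetical helper: reverse series of reverse-complement k-mers."""
    return [reverse_complement(km) for km in reversed(node_kmers)]

node_seq, k = "ACCGTAG", 4
node = kmers(node_seq, k)      # ['ACCG', 'CCGT', 'CGTA', 'GTAG']
twin = twin_kmers(node)
# The twin spells the reverse complement of the node's sequence.
assert twin == kmers(reverse_complement(node_seq), k)
print(twin)  # ['CTAC', 'TACG', 'ACGG', 'CGGT']
```

Adjacent k-mers in the twin still overlap by k-1 nucleotides, so a read from the opposite strand traces a valid path through the twin.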
De Bruijn Graphs – construction
Construction
Reads are hashed into k-mers of a predefined length k
Small k → more overlaps are detected (higher connectivity), but more repeats become ambiguous
Large k → higher specificity, but lower connectivity
k must therefore be chosen as a trade-off between sensitivity and specificity
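The specificity side of the trade-off can be seen by counting k-mers that occur more than once in a sequence containing a short repeat; such k-mers become ambiguous nodes. The toy sequence and function name are hypothetical. On the connectivity side, each read of length L yields only L - k + 1 k-mers, so a larger k leaves fewer shared k-mers between overlapping reads.

```python
from collections import Counter

def ambiguous_kmers(seq, k):
    """Hypothetical helper: number of distinct k-mers occurring more than once."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return sum(1 for c in counts.values() if c > 1)

# Toy sequence containing the 8 bp repeat "GATTACAG" twice.
seq = "CCGATTACAGTTTGCAGATTACAGCC"
for k in (3, 5, 7, 9):
    print(k, ambiguous_kmers(seq, k))
```

Once k exceeds the repeat length (here at k = 9), no k-mer is ambiguous.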
De Bruijn Graphs – construction (cont)
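For each k-mer, the hash table records the ID of the first read containing it and the k-mer's position in that read
Each k-mer is recorded together with its reverse complement
A node is created for each uninterrupted run of k-mers between distinct interruption points (where overlaps with other reads begin or end)
Reads are then traced through the graph
A directed arc is created between consecutive nodes wherever one does not already exist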
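A simplified version of these construction steps follows. A dictionary plays the hash table, keyed by a canonical form of each k-mer (the lexicographically smaller of the k-mer and its reverse complement, one common way to record both strands at once) and storing the first read ID and position. Reads are then traced k-mer by k-mer to add arcs, each with its twin arc on the opposite strand. As an assumption for brevity, this sketch uses one node per k-mer rather than Velvet's runs of k-mers between interruption points; all names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def canonical(kmer):
    # Record a k-mer and its reverse complement under a single key.
    return min(kmer, reverse_complement(kmer))

def build_graph(reads, k):
    """Hypothetical sketch: one node per k-mer, arcs between consecutive k-mers."""
    first_seen = {}  # canonical k-mer -> (read ID, position) of first occurrence
    arcs = set()     # a set ignores duplicates: arcs are only created once
    for read_id, read in enumerate(reads):
        prev = None
        for pos in range(len(read) - k + 1):
            kmer = read[pos:pos + k]
            first_seen.setdefault(canonical(kmer), (read_id, pos))
            if prev is not None:
                arcs.add((prev, kmer))
                # Twin arc: the same overlap read on the opposite strand.
                arcs.add((reverse_complement(kmer), reverse_complement(prev)))
            prev = kmer
    return first_seen, arcs

# The third read is the reverse complement of the second one.
reads = ["ACGTTGCA", "GTTGCATT", "AATGCAAC"]
table, arcs = build_graph(reads, 5)
print(len(table), len(arcs))  # 6 10
```

Because k-mers are stored canonically and arcs with their twins, the reverse-complement read adds no new entries.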
De Bruijn Graphs – simplification
Chains of blocks are simplified without any loss of information:
If node A has only one outgoing arc, pointing to node B, and node B has only one incoming arc, A and B are merged
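The merging rule can be sketched on a small graph in which each node stores its full sequence and consecutive nodes overlap by k-1 bases, so merging B into A appends all but the first k-1 bases of B. The data layout and function name are assumptions, and unlike Velvet this sketch does not merge the twin nodes in step.

```python
def merge_chains(seqs, arcs, k):
    """Hypothetical sketch: merge A->B when A has one outgoing arc and B one incoming.

    seqs: dict node -> sequence; arcs: set of (source, destination) pairs.
    """
    seqs, arcs = dict(seqs), set(arcs)
    merged = True
    while merged:
        merged = False
        for a in list(seqs):
            out = [d for (s, d) in arcs if s == a]
            if len(out) != 1 or out[0] == a:
                continue
            b = out[0]
            if [s for (s, d) in arcs if d == b] != [a]:
                continue
            # A's last k-mer overlaps B's first k-mer by k-1 bases.
            seqs[a] += seqs[b][k - 1:]
            arcs.discard((a, b))
            # B's outgoing arcs now leave from the merged node.
            for (s, d) in [arc for arc in arcs if arc[0] == b]:
                arcs.discard((s, d))
                arcs.add((a, d))
            del seqs[b]
            merged = True
            break
    return seqs, arcs

seqs = {"A": "ACGT", "B": "GTCA", "C": "CAGG", "D": "CATT"}
arcs = {("A", "B"), ("B", "C"), ("B", "D")}
merged_seqs, merged_arcs = merge_chains(seqs, arcs, k=3)
print(merged_seqs)  # {'A': 'ACGTCA', 'C': 'CAGG', 'D': 'CATT'}
```

B is absorbed into A, but the branch at B stops the chain, so C and D stay separate nodes.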
De Bruijn Graphs – error removal
Velvet focuses on “topological features” of the graph
First step: remove tips
A tip is a chain of nodes that is disconnected at one end
A tip is removed only if it satisfies two criteria: (1) length and (2) minority count
Length: the tip must be shorter than 2k bp, since two nearby errors can create a tip up to 2k bp long
De Bruijn Graphs – error removal (cont)
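Minority count: the multiplicity m of the arc leading into the tip must be lower than the multiplicity n of an alternative arc at the same junction (m < n)
In other words, starting from node B, going through the tip is an alternative to a more common path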
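Both tip criteria can be sketched in a single pass over a graph whose chains have already been merged, so each tip is one node attached by a single arc. Arc multiplicity is taken to be the number of supporting reads. The data layout and function name are hypothetical; Velvet applies the rule iteratively and through twin nodes.

```python
def find_removable_tips(lengths, arcs, k):
    """Hypothetical sketch: tips shorter than 2k bp whose arc is a minority.

    lengths: dict node -> length in bp
    arcs: dict (source, destination) -> multiplicity (supporting reads)
    """
    removable = set()
    for t, length in lengths.items():
        ins = [(s, m) for (s, d), m in arcs.items() if d == t]
        outs = [(d, m) for (s, d), m in arcs.items() if s == t]
        if not outs and len(ins) == 1:       # dead end: ... B -> t
            b, m = ins[0]
            rivals = [n for (s, d), n in arcs.items() if s == b and d != t]
        elif not ins and len(outs) == 1:     # dangling start: t -> B ...
            b, m = outs[0]
            rivals = [n for (s, d), n in arcs.items() if d == b and s != t]
        else:
            continue
        # Length criterion (< 2k) and minority count (some rival arc has n > m).
        if length < 2 * k and any(n > m for n in rivals):
            removable.add(t)
    return removable

lengths = {"A": 30, "B": 20, "C": 40, "T": 6}            # bp, illustrative values
arcs = {("A", "B"): 10, ("B", "C"): 10, ("B", "T"): 1}   # read support per arc
print(find_removable_tips(lengths, arcs, k=5))  # {'T'}
```

C is also a dead end, but its arc is the majority at B, so it is kept; a 12 bp tip would fail the length criterion for k = 5.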
De Bruijn Graphs – error removal (cont)
Second step: remove bubbles using the Tour Bus algorithm
A bubble consists of redundant paths that start and end at the same nodes
Bubbles are created by errors or by biological variants such as SNPs
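Why a SNP produces a bubble: every k-mer overlapping the variant base differs between the two alleles, so each allele contributes its own run of k k-mers between shared k-mers on either side. The sequences and function name below are invented for illustration.

```python
def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def bubble_branches(ref, alt, k):
    """Hypothetical helper: k-mers unique to each allele (the two sides of a bubble)."""
    ref_k, alt_k = kmers(ref, k), kmers(alt, k)
    ref_set, alt_set = set(ref_k), set(alt_k)
    only_ref = [km for km in ref_k if km not in alt_set]
    only_alt = [km for km in alt_k if km not in ref_set]
    return only_ref, only_alt

ref = "GATTACAGC"
alt = "GATTGCAGC"  # single substitution at position 4 (A -> G)
only_ref, only_alt = bubble_branches(ref, alt, 3)
print(only_ref)  # ['TTA', 'TAC', 'ACA']
print(only_alt)  # ['TTG', 'TGC', 'GCA']
```

Both branches leave the shared k-mer ATT and rejoin at the shared k-mer CAG: redundant paths with the same start and end.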
De Bruijn Graphs – error removal (cont)
Tour Bus:
1. Detect redundant paths
2. Compare their sequences using dynamic programming
3. If they are similar, merge them
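Steps 2 and 3 can be sketched with a standard edit-distance dynamic program: two path sequences are merged when their edit distance, relative to the longer one, is within a threshold. The 20% threshold and function names are assumptions for illustration. The sketch covers only the comparison, not the Tour Bus traversal that finds the paths.

```python
def edit_distance(a, b):
    """Levenshtein distance by dynamic programming, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # match / substitution
        prev = cur
    return prev[-1]

def should_merge(path_a, path_b, max_divergence=0.2):
    """Hypothetical similarity test: merge if the relative edit distance is small."""
    longest = max(len(path_a), len(path_b))
    return edit_distance(path_a, path_b) <= max_divergence * longest

print(should_merge("TTACA", "TTGCA"))  # True: one substitution in 5 bases
```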
De Bruijn Graphs – error removal (cont)
Third step: remove erroneous connections
Erroneous connections that remain after the Tour Bus algorithm are removed with a basic coverage cutoff
Genuine short nodes that cannot be simplified in the graph should have high coverage, so low-coverage ones are treated as errors
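The coverage cutoff reduces to a filter on per-node coverage, taken here as k-mer occurrences divided by node length in k-mers. The data layout, cutoff value, and function name are assumptions for this sketch, which also ignores the arcs attached to removed nodes.

```python
def apply_coverage_cutoff(nodes, cutoff):
    """Hypothetical sketch: keep nodes whose coverage is at least `cutoff`.

    nodes: dict node -> (length_in_kmers, total_kmer_occurrences)
    """
    kept = {}
    for name, (length, occurrences) in nodes.items():
        coverage = occurrences / length
        if coverage >= cutoff:
            kept[name] = (length, occurrences)
    return kept

nodes = {"long": (200, 4000), "short_ok": (3, 60), "short_err": (3, 3)}
print(sorted(apply_coverage_cutoff(nodes, cutoff=5)))  # ['long', 'short_ok']
```

A genuine short node (coverage 20) survives, while a short node seen only once per k-mer (coverage 1) is removed.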
Breadcrumb: resolution of repeats
1. Using read pairs, pair up the long nodes
2. Flag the paired reads anchored in unambiguously paired long nodes
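Step 1 can be sketched by counting, for each long node, which other long nodes its reads' mates land in, and pairing two long nodes only when that partner is unique and supported by enough pairs. The input format (one node pair per read pair), the `min_pairs` threshold, and the function name are hypothetical.

```python
from collections import Counter, defaultdict

def pair_long_nodes(mate_hits, long_nodes, min_pairs=3):
    """Hypothetical sketch: pair long nodes connected by read pairs.

    mate_hits: list of (node_of_read, node_of_mate), one entry per read pair.
    Returns {node: partner} for long nodes with exactly one well-supported partner.
    """
    support = defaultdict(Counter)
    for a, b in mate_hits:
        if a in long_nodes and b in long_nodes and a != b:
            support[a][b] += 1
            support[b][a] += 1
    pairs = {}
    for node, partners in support.items():
        strong = [p for p, n in partners.items() if n >= min_pairs]
        if len(strong) == 1:  # unambiguous: a single supported partner
            pairs[node] = strong[0]
    return pairs

long_nodes = {"A", "B", "C"}
mate_hits = [("A", "B")] * 4 + [("B", "A")] + [("A", "s1")] * 5 + [("C", "A")]
print(pair_long_nodes(mate_hits, long_nodes))  # {'A': 'B', 'B': 'A'}
```

Hits into the short node s1 are ignored at this stage, and the single C-A pair falls below the support threshold.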
Breadcrumb: resolution of repeats (cont)
The long nodes are extended as far as possible using the flagged paired reads
Every node between A and B is paired up to either A or B, which identifies the path connecting them
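Once reads are flagged, the short nodes holding mates of reads anchored in A or B mark the route between them. The sketch below walks forward from A, following at each step the single flagged successor, until it reaches B, and gives up when the choice is ambiguous. The graph layout, flagged set, and function name are assumptions, not Velvet's implementation.

```python
def extend_between(start, end, successors, flagged, max_steps=1000):
    """Hypothetical sketch: walk from start toward end through flagged nodes only.

    successors: dict node -> list of next nodes
    flagged: set of short nodes containing mates of reads anchored in start or end
    Returns the path as a list, or None if the walk is ambiguous or stalls.
    """
    path = [start]
    node = start
    for _ in range(max_steps):
        nxt = successors.get(node, [])
        if end in nxt:
            return path + [end]
        candidates = [n for n in nxt if n in flagged and n not in path]
        if len(candidates) != 1:
            return None
        node = candidates[0]
        path.append(node)
    return None

# R is a repeat node that branches toward B (via y1) and elsewhere (via y2).
successors = {"A": ["x1"], "x1": ["R"], "R": ["y1", "y2"], "y1": ["B"], "y2": ["D"]}
print(extend_between("A", "B", successors, flagged={"x1", "R", "y1"}))
# ['A', 'x1', 'R', 'y1', 'B']
```

Without the flag on y1 the walk stops at R, because the repeat alone cannot tell which branch leads to B.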
Experimental Results
The error-removal pipeline was tested on simulated data
Simulated reads were drawn from the E. coli, S. cerevisiae, C. elegans, and H. sapiens genomes
N50 vs. coverage density for H. sapiens: N50 is limited by the natural repetitiveness of the reference genome
(Figure legend: Ideal, + Error (1%), + SNP; y-axis: N50)
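N50, the statistic plotted here, is the contig length L such that contigs of length L or longer together contain at least half of the total assembled length. A minimal sketch, with a hypothetical function name:

```python
def n50(contig_lengths):
    """Length L such that contigs >= L cover at least half of the total assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input

print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

Here the total is 54, and the three longest contigs (10 + 9 + 8 = 27) reach half of it.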
Experimental Results (cont)
The error-removal pipeline was tested on experimental data
A 173,428 bp human BAC was sequenced on Solexa machines
Reads were 35 bp long, and k = 31
Tour Bus increased sensitivity by correcting errors while preserving the integrity of the graph structure
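With 35 bp reads and k = 31, each read yields only 35 - 31 + 1 = 5 k-mers. A worked example, assuming the usual relation between nucleotide coverage $C$ and k-mer coverage $C_k$ for reads of length $L$ (the same relation is given in the Velvet manual):

$$C_k = C \cdot \frac{L - k + 1}{L} = C \cdot \frac{35 - 31 + 1}{35} = \frac{C}{7}$$

The k-mer coverage is thus one seventh of the nucleotide coverage, which illustrates the connectivity cost of a large k.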
Experimental Results (cont)
Experimental Results (cont)
Conclusions
Velvet is a de Bruijn graph-based assembly method for short reads
Errors are handled by tip removal and the Tour Bus algorithm
Many repeats are resolved by the Breadcrumb algorithm using read pairs
Velvet was evaluated on simulated and real datasets and performed well