Reads must be processed to turn data into ... · Greedy Reconstruction · It was the best...


Transcript of Reads must be processed to turn data into ... · Greedy Reconstruction · It was the best...

Page 1:
Page 2:

Reads must be processed to turn data into information

Assembly

In bioinformatics, sequence assembly refers to aligning and merging fragments of a much longer DNA sequence in order to reconstruct the original sequence.

A contig (from contiguous) is a set of overlapping DNA segments that together represent a consensus region of DNA. In bottom-up sequencing projects, a contig refers to overlapping sequence data (reads).

Page 3:

Generalized NGS analysis

Question
→ Raw reads
→ Pre-processing
→ Assembly: alignment / de novo
→ Application specific: variant calling, count matrix, ...
→ Compare samples / methods
→ Answer?

(Figure axis label: Data size)

Page 4:

What is de novo assembly?

Merge small DNA fragments together so they form a previously unknown sequence.

Merge millions of reads together so they form previously unknown sequences.

Page 5:

de novo assembly

• Assemble reads into longer fragments
• Find overlap between reads
• Many approaches

reads → contigs → scaffolds

Page 6:

de novo assembly

Page 7:

Let's try to assemble some reads!

• Rules:
  • a minimum of 7-bp overlap
  • overlap must not include any N bases
  • same orientation so that the sequence can be read left to right
  • there may be 1-bp differences
  • simplified - no double stranded DNA

Valid assemblies

    ..NNNNGGACTATGATTCG
                |||||||
                TGATTCGAGGCTAANN..

    ..NNNNNNNNCGATTCTGATCCGA
              |||||||
         GTCCTCGATTCTNNNNNNNN..

Invalid assemblies

    ..NNNNCGGACTATGATT
                ||||||
                ATGATTCGAGGCTAANN..

    ..NNNNNNNNCGCTACTGATCCGA
              || | |||
         GTCCTCGATTCTGNNNNNNN..
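The rules above can be turned into a small check. This is only a sketch under stated assumptions: it handles the simple case where the end of one read overlaps the start of the next, and it reads "1-bp differences" as "at most one mismatch" (the slide does not say this explicitly). The function name `valid_overlap` is hypothetical.

```python
MIN_OVERLAP = 7      # rule from the slide: at least 7 bp
MAX_MISMATCHES = 1   # assumption: "1-bp differences" means at most one mismatch


def valid_overlap(left, right, min_len=MIN_OVERLAP, max_mm=MAX_MISMATCHES):
    """Return the longest valid overlap between the end of `left` and the
    start of `right`, or 0 if no overlap satisfies the rules."""
    for n in range(min(len(left), len(right)), min_len - 1, -1):
        suffix, prefix = left[-n:], right[:n]
        # Rule: the overlap must not include any N bases.
        if "N" in suffix or "N" in prefix:
            continue
        mismatches = sum(a != b for a, b in zip(suffix, prefix))
        if mismatches <= max_mm:
            return n
    return 0


# First valid example from the slide: 7-bp overlap TGATTCG.
print(valid_overlap("NNNNGGACTATGATTCG", "TGATTCGAGGCTAANN"))
# First invalid example from the slide: only 6 bp (ATGATT) overlap.
print(valid_overlap("NNNNCGGACTATGATT", "ATGATTCGAGGCTAANN"))
```

The second pair of examples on the slide overlaps in the middle of both reads, so it would need a more general alignment than this suffix/prefix check.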

Page 8:

NGS de novo assembly

• Success depends on:
  • Genome size, genomic repeats(!), ploidy
  • High coverage, long read lengths, PE/MP libraries

Repeats in E. coli

Tomorrow we will see a success story of a genome assembly.

Page 9:

Two bacterial genomes: de Bruijn graphs

Few repeats / “more” repeats

By the end of today you will see not two scribbles, but much more.

Page 10:

Which approaches?

• Greedy (“Simple” approach)
• Overlap-Layout-Consensus (Fewer, longer reads)
• de Bruijn graphs (Many short reads)

Page 11:

Simple approach - Greedy

• Pseudo code:
  1. Pairwise alignment of all reads
  2. Identify fragments that have largest overlap
  3. Merge these
  4. Repeat until all overlaps are used
• Can only resolve repeats smaller than read length
• High computational cost with increasing no. of reads
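The four pseudo-code steps can be sketched in a few lines. This is an illustration, not a production assembler: it uses exact suffix/prefix matches instead of real pairwise alignment, and the names `overlap`, `greedy_assemble` and the `min_len` threshold are assumptions made for the example.

```python
def overlap(a, b, min_len=3):
    """Length of the longest exact match between the end of a and the start of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    reads = list(dict.fromkeys(reads))            # drop exact duplicates, keep order
    while True:
        best = (0, None, None)
        for i, a in enumerate(reads):             # step 1: compare all pairs
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:               # step 2: remember the largest overlap
                        best = (n, i, j)
        n, i, j = best
        if n == 0:                                # step 4: stop when no overlap is left
            return reads
        merged = reads[i] + reads[j][n:]          # step 3: merge the best pair
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]


print(greedy_assemble(["ATGGCGT", "GCGTGCA", "GCAATCC", "AATCCG"]))
```

Every round compares all pairs again, which is where the high computational cost for large read sets comes from.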

Page 12:

Shredded Book Reconstruction

• Dickens accidentally shreds the first printing of A Tale of Two Cities
  – Text printed on 5 long spools
• How can he reconstruct the text?
  – 5 copies x 138,656 words / 5 words per fragment = 138k fragments
  – The short fragments from every copy are mixed together
  – Some fragments are identical

[Figure: five copies of the text "It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, …", shown intact and cut into short fragments at different positions.]

Page 13:

Greedy Reconstruction

It was the best of
of times, it was the
best of times, it was
times, it was the worst
was the best of times,
the best of times, it
it was the worst of
was the worst of times,
worst of times, it was
of times, it was the
times, it was the age
it was the age of
was the age of wisdom,
the age of wisdom, it
age of wisdom, it was
of wisdom, it was the
wisdom, it was the age
it was the age of
was the age of foolishness,
the worst of times, it

The repeated sequence makes the correct reconstruction ambiguous
• It was the best of times, it was the [worst/age]

Model sequence reconstruction as a graph problem.

Page 14:

Graph theory

Graph theory studies graphs: discrete objects that make it possible to model a wide variety of situations and processes, and often to analyse them in quantitative and algorithmic terms.

Graph theory is a way of looking at things.

A graph is a structure made up of:

• simple objects, called vertices or nodes,
• connections between the vertices. The connections can be:
  • directed, in which case they are called arcs or paths, and the graph is called directed
  • undirected, in which case they are called edges, and the graph is called undirected
• optionally, data associated with nodes and/or connections.

Page 15:

The information structure of WikiPedia

Page 16:

The Königsberg bridge problem

Königsberg is crossed by the river Pregel and its tributaries and has two large islands, which are connected to each other and to the two main areas of the city by seven bridges.

Over the centuries the question was raised many times whether it is possible to take a walk that crosses every bridge once and only once and returns to the starting point.

In 1736 Leonhard Euler tackled the problem and proved that the proposed walk was not possible.

Page 17:

The Königsberg bridge problem

Euler deserves the credit for formulating the problem in terms of graph theory, abstracting from the specific situation of Königsberg. First, he removed all incidental aspects except the urban areas bounded by the branches of the river and the bridges connecting them. Second, he replaced each urban area with a point, now called a vertex or node, and each bridge with a line segment, called an edge, arc or link.

Euler represented the layout of the seven bridges by joining the four large areas of the city with as many lines, as in the first image. Note that three bridges leave (and arrive at) each of the nodes A, B and D, while five bridges leave node C. These are the degrees of the nodes: 3, 3, 5 and 3 for A, B, C and D respectively. Before reaching a conclusion, Euler considered different configurations of areas and bridges (nodes and links): with four nodes and four bridges it is possible to start, for example, from A and return to it, crossing every bridge once and only once. In that case the degree of every node is even. If instead one starts from A and ends at D, every node has even degree except two nodes of odd degree (one). On the basis of these observations, Euler stated the following theorem:

Any connected graph can be traversed if and only if all its nodes have even degree, or exactly two of them have odd degree; to traverse a "possible" graph with two nodes of odd degree, one must start from one of them, and the walk will end at the other odd node.
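Euler's criterion is easy to check by counting degrees. The sketch below assumes the graph is connected and is given as a list of edges; the assignment of the seven bridges to the labels A, B, C, D is an assumption chosen to match the degrees quoted above (3, 3, 5, 3), and `euler_walk_kind` is a hypothetical helper name.

```python
from collections import Counter


def euler_walk_kind(edges):
    """Classify a connected multigraph: 'circuit' (all degrees even),
    'path' (exactly two odd-degree nodes) or 'none'. Also returns the odd nodes."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    odd = sorted(node for node, d in degree.items() if d % 2 == 1)
    if not odd:
        return "circuit", odd
    if len(odd) == 2:
        return "path", odd
    return "none", odd


# Seven bridges: C has degree 5; A, B and D have degree 3 (assumed layout).
konigsberg = [("C", "A"), ("C", "A"), ("C", "B"), ("C", "B"),
              ("C", "D"), ("A", "D"), ("B", "D")]
print(euler_walk_kind(konigsberg))

# Four nodes and four bridges in a ring: every degree is even.
print(euler_walk_kind([("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]))
```

With four odd-degree nodes the Königsberg graph falls into the "none" case, which matches Euler's conclusion.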

Page 18:

Graph Theory!!!

• Overlap-Layout-Consensus
• de Bruijn

Page 19:

Overlap-Layout-Consensus

• Create overlap graph by all-vs-all alignment
• Contigs created based on overlap

In the graph each node is a read, edges are overlaps between reads.

Page 20:

Overlap-Layout-Consensus

• Consensus: Hamiltonian path (visit each node exactly once)
• Computationally hard problem
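A toy version of the overlap graph and of the Hamiltonian path search might look like this. It is a sketch under assumptions: overlaps are exact suffix/prefix matches rather than alignments, and the search simply tries every ordering of the reads, which is only feasible for a handful of nodes and illustrates why the problem is computationally hard. All function names are hypothetical.

```python
from itertools import permutations


def overlap(a, b, min_len=3):
    """Longest exact match between the end of a and the start of b."""
    for n in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def overlap_graph(reads, min_len=3):
    """Each node is a read; a directed edge a -> b stores the overlap length."""
    graph = {a: {} for a in reads}
    for a in reads:                      # all-vs-all comparison
        for b in reads:
            if a != b:
                n = overlap(a, b, min_len)
                if n:
                    graph[a][b] = n
    return graph


def hamiltonian_paths(graph):
    """Brute force: keep every node ordering in which consecutive nodes are linked."""
    return [p for p in permutations(graph)
            if all(p[i + 1] in graph[p[i]] for i in range(len(p) - 1))]


reads = ["ATGGCGT", "GCGTGCA", "GCAATCC", "AATCCG"]
graph = overlap_graph(reads)
print(graph)
print(hamiltonian_paths(graph))
```

The number of orderings grows factorially with the number of reads, so real assemblers rely on heuristics rather than exhaustive search.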

Page 21:

Overlap-Layout-Consensus

Assemblers: ARACHNE, PHRAP, CAP, TIGR, CELERA

Overlap: find potentially overlapping reads
Layout: merge reads into contigs and contigs into supercontigs
Consensus: derive the DNA sequence and correct read errors
..ACGATTACAATAGGTT..

Page 22:

Overlap

• Find the best match between the suffix of one read and the prefix of another
• Due to sequencing errors, need to use dynamic programming to find the optimal overlap alignment
• Apply a filtration method to filter out pairs of fragments that do not share a significantly long common substring
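One way to sketch the dynamic-programming step is a semi-global alignment in which the start position in the first read is free and the alignment must run to the end of that read. The scoring values (`match=1`, `mismatch=-1`, `gap=-2`) and the function name are assumptions for illustration; the slides do not specify a scoring scheme.

```python
def overlap_alignment(a, b, match=1, mismatch=-1, gap=-2):
    """Best-scoring alignment of a suffix of a with a prefix of b.
    Returns (score, number of bases of b used in the overlap)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        # b has to be consumed from its first base, so leading gaps cost.
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, rows):
        # score[i][0] stays 0: the overlap may start anywhere in a.
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # The alignment must end at the last base of a; pick the best prefix of b.
    best_j = max(range(cols), key=lambda j: score[-1][j])
    return score[-1][best_j], best_j


print(overlap_alignment("GGACTATGATTCG", "TGATTCGAGGCTAA"))
```

The table has len(a) × len(b) cells, which is why a filtration step is applied first to avoid aligning pairs that cannot overlap.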

Page 23:

Overlapping Reads

• Sort all k-mers in reads (k ~ 24)
• Find pairs of reads sharing a k-mer
• Extend to full alignment – throw away if not >95% similar

[Figure: two reads sharing the k-mer TAGATTACACAGATTAC, with the match extended on both sides into a full alignment that contains a few mismatches.]

What is a k-mer, and what is k?
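The seeding step can be sketched as follows. Note the swapped-in technique: the slide sorts all k-mers, whereas this example uses a hash-based index, which finds the same pairs of reads. The function names and the small `k` in the usage are assumptions for illustration (the slide suggests k ~ 24 for real data).

```python
from collections import defaultdict
from itertools import combinations


def kmers(read, k):
    """All substrings of length k in a read (a k-mer is a string of k bases)."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def candidate_pairs(reads, k):
    """Return pairs of read indices that share at least one k-mer."""
    index = defaultdict(set)
    for read_id, read in enumerate(reads):
        for kmer in kmers(read, k):
            index[kmer].add(read_id)
    pairs = set()
    for read_ids in index.values():
        pairs.update(combinations(sorted(read_ids), 2))
    return pairs


reads = ["TAGATTACACAG", "ACACAGATTAC", "GGGGGGGGGG"]
print(candidate_pairs(reads, k=6))
```

Only the candidate pairs then need to be extended to a full alignment and checked against the similarity threshold.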

Page 24:

Overlapping Reads and Repeats

• A k-mer that appears N times initiates N^2 comparisons
• For an Alu that appears 10^6 times → 10^12 comparisons – too much
• Solution: Discard all k-mers that appear more than t × Coverage (t ~ 10)

Alu elements are the most abundant transposable elements in the human genome.
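A minimal sketch of the filter, assuming k-mers are counted over all reads and that `coverage` is supplied by the caller (how coverage is estimated is outside the slides). The function name and the toy reads are hypothetical.

```python
from collections import Counter


def filter_repetitive_kmers(reads, k, coverage, t=10):
    """Count every k-mer and keep only those seen at most t * coverage times."""
    counts = Counter(read[i:i + k]
                     for read in reads
                     for i in range(len(read) - k + 1))
    limit = t * coverage
    return {kmer: n for kmer, n in counts.items() if n <= limit}


# Toy data: ACGT occurs 5 times, TTTT occurs 50 times (a "repeat").
reads = ["ACGT"] * 5 + ["TTTT"] * 50
print(filter_repetitive_kmers(reads, k=4, coverage=1))

# The quadratic blow-up quoted on the slide: N = 10^6 gives N^2 = 10^12.
print((10 ** 6) ** 2 == 10 ** 12)
```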

Page 25:

Finding Overlapping Reads

Create local multiple alignments from the overlapping reads

TAGATTACACAGATTACTGA
TAGATTACACAGATTACTGA
TAG TTACACAGATTATTGA
TAGATTACACAGATTACTGA
TAGATTACACAGATTACTGA
TAGATTACACAGATTACTGA
TAG TTACACAGATTATTGA
TAGATTACACAGATTACTGA

Page 26:

Finding Overlapping Reads (cont’d)

• Correct errors using multiple alignment

TAGATTACACAGATTACTGA
TAGATTACACAGATTACTGA
TAG TTACACAGATTATTGA
TAGATTACACAGATTACTGA
TAGATTACACAGATTACTGA

Base and quality score per read in the two corrected columns:

C: 20  C: 35  T: 30  C: 35  C: 40   →   C: 20  C: 35  C: 0  C: 35  C: 40
A: 15  A: 25  A: 40  A: 25  -       →   A: 15  A: 25  A: 40  A: 25  A: 0

• Score alignments
• Accept alignments with good scores
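The correction idea can be illustrated with a plain majority vote per alignment column. This is a simplification and an assumption on my part: the slide also tracks quality scores, which this sketch ignores. The gap in the third read is written as "-", and the function name is hypothetical.

```python
from collections import Counter


def consensus(aligned_reads, gap="-"):
    """Majority vote per column; a column whose winner is a gap is dropped."""
    bases = []
    for column in zip(*aligned_reads):
        base, _count = Counter(column).most_common(1)[0]
        if base != gap:
            bases.append(base)
    return "".join(bases)


aligned = [
    "TAGATTACACAGATTACTGA",
    "TAGATTACACAGATTACTGA",
    "TAG-TTACACAGATTATTGA",   # one missing base and one C->T error
    "TAGATTACACAGATTACTGA",
    "TAGATTACACAGATTACTGA",
]
print(consensus(aligned))
```

Both the missing base and the substitution in the third read are outvoted by the other four reads.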

Page 27:

Layout

• Repeats are a major challenge
• Do two aligned fragments really overlap, or are they from two copies of a repeat?
• Solution: repeat masking – hide the repeats!!!
• Masking results in high rate of misassembly (up to 20%)
• Misassembly means a lot more work at the finishing step

Page 28:

Repeats, Errors, and Contig Lengths

• Repeats shorter than read length are OK
• Repeats with more base pair differences than sequencing error rate are OK
• To make a smaller portion of the genome appear repetitive, try to:
  – Increase read length
  – Decrease sequencing error rate

Page 29:

De Bruijn graph assembly

[Figure: the five shredded copies of "It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, …"]

Page 30:

Shredded Book Reconstruction

Page 31:

Greedy Reconstruction

Page 32:

de Bruijn Graph Construction

• D_k = (V, E)
  • V = All length-k subfragments
  • E = Directed edges between consecutive subfragments
    • Nodes overlap by k-1 words
• Locally constructed graph reveals the global sequence structure
  • Overlaps between sequences implicitly computed

Original fragment: It was the best of
Directed edge: It was the best → was the best of

de Bruijn, 1946
Idury and Waterman, 1995
Pevzner, Tang, Waterman, 2001
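The construction can be sketched directly on the word fragments of the shredded book. The function name `de_bruijn_words` is hypothetical, and k = 4 is chosen because each five-word fragment then yields exactly one edge, as in the example above.

```python
from collections import defaultdict


def de_bruijn_words(fragments, k):
    """Nodes are runs of k consecutive words; an edge joins two consecutive
    runs of the same fragment, so linked nodes overlap by k-1 words."""
    graph = defaultdict(set)
    for fragment in fragments:
        words = fragment.split()
        nodes = [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]
        for left, right in zip(nodes, nodes[1:]):
            graph[left].add(right)
    return dict(graph)


fragments = [
    "It was the best of",
    "of times, it was the",
    "times, it was the worst",
    "times, it was the age",
]
for node, successors in de_bruijn_words(fragments, k=4).items():
    print(node, "->", sorted(successors))
```

No fragment is ever compared with another fragment: the shared node "times, it was the" is what links them, and its two outgoing edges show the [worst/age] ambiguity.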

Page 33:

No need to compute overlaps!

• Can this really work?
• How do we choose a value for k?
  – Needs to be big enough to be unique
  – But repeats make it impossible to use such a large k, because entire reads are not unique
  – So pick k to be “big enough”

Page 34:

Shredded Book Reconstruction

Page 35:

de Bruijn Graph Assembly

the age of foolishness
It was the best
best of times, it
was the best of
the best of times,
of times, it was
times, it was the
it was the worst
was the worst of
worst of times, it
the worst of times,
it was the age
was the age of
the age of wisdom,
age of wisdom, it
of wisdom, it was
wisdom, it was the

A unique Eulerian tour of the graph reconstructs the original text.

If a unique tour does not exist, try to simplify the graph as much as possible.
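An Eulerian tour can be found with Hierholzer's algorithm. The sketch below works on DNA k-mers rather than words, treats each k-mer as an edge between its prefix and suffix of length k-1, and assumes a non-empty input for which a tour exists. When the tour is not unique it returns just one of the possible tours. The function name is hypothetical.

```python
from collections import defaultdict


def eulerian_path(kmers):
    """Spell the sequence of one Eulerian path through the de Bruijn graph
    whose edges are the given k-mers (Hierholzer's algorithm)."""
    out_edges = defaultdict(list)
    balance = defaultdict(int)              # out-degree minus in-degree
    for kmer in kmers:
        u, v = kmer[:-1], kmer[1:]
        out_edges[u].append(v)
        balance[u] += 1
        balance[v] -= 1
    # Start at the node with one extra outgoing edge, if there is one.
    start = next((n for n, b in balance.items() if b == 1), next(iter(out_edges)))
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if out_edges[node]:
            stack.append(out_edges[node].pop())   # follow an unused edge
        else:
            path.append(stack.pop())              # dead end: node is finished
    path.reverse()
    return path[0] + "".join(node[-1] for node in path[1:])


kmers = ["GAGG", "GTCG", "GGCT", "TCGA", "AGGC", "CGAG"]   # shuffled 4-mers
print(eulerian_path(kmers))
```

Here every 3-base node occurs once, so the tour is unique and the original sequence is recovered regardless of the order of the k-mers.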

Page 36:

de Bruijn Graph Assembly

the age of foolishness
It was the best of times, it
of times, it was the
it was the worst of times, it
it was the age of
the age of wisdom, it was the

[Figure labels: 1, 2]

A unique Eulerian tour of the graph reconstructs the original text.

If a unique tour does not exist, try to simplify the graph as much as possible.

Page 37:

Example

TAGTCGAGGCTTTAGATCCGATGAGGCTTTAGAGACAG

AGTCGAG  CTTTAGA  CGATGAG  CTTTAGA
GTCGAGG  TTAGATC  ATGAGGC  GAGACAG
GAGGCTC  ATCCGAT  AGGCTTT  GAGACAG
AGTCGAG  TAGATCC  ATGAGGC  TAGAGAA
TAGTCGA  CTTTAGA  CCGATGA  TTAGAGA
CGAGGCT  AGATCCG  TGAGGCT  AGAGACA
TAGTCGA  GCTTTAG  TCCGATG  GCTCTAG
TCGACGC  GATCCGA  GAGGCTT  AGAGACA
TAGTCGA  TTAGATC  GATGAGG  TTTAGAG
GTCGAGG  TCTAGAT  ATGAGGC  TAGAGAC
AGGCTTT  ATCCGAT  AGGCTTT  GAGACAG
AGTCGAG  TTAGATT  ATGAGGC  AGAGACA
GGCTTTA  TCCGATG  TTTAGAG
CGAGGCT  TAGATCC  TGAGGCT  GAGACAG
AGTCGAG  TTTAGATC  ATGAGGC  TTAGAGA
GAGGCTT  GATCCGA  GAGGCTT  GAGACAG

Velvet / Curtain

Page 38:

Velvet / Curtain 09.03.12

Example
Read: GTCGAGG

GTCG (1x)

Page 39:

Example
Read: GTCGAGG

GTCG (1x)
TCGA (1x)

Page 40:

Example
Read: GTCGAGG

GTCG (1x)
TCGA (1x)
CGAG (1x)

Page 41:

Example
Read: GTCGAGG

GTCG (1x)
TCGA (1x)
CGAG (1x)
GAGG (1x)

Page 42:

Example
New read: CGAGGCT

GTCG (1x)
TCGA (1x)
CGAG (2x)
GAGG (1x)

Page 43:

Example
Read: CGAGGCT

GTCG (1x)
TCGA (1x)
CGAG (2x)
GAGG (2x)

Page 44:

Example
Read: CGAGGCT

GTCG (1x)
TCGA (1x)
CGAG (2x)
GAGG (2x)
AGGC (1x)

Page 45:

Example
Read: CGAGGCT

GTCG (1x)
TCGA (1x)
CGAG (2x)
GAGG (2x)
AGGC (1x)
GGCT (1x)

Page 46:

Example
New read: TCGACGC

GTCG (1x)
TCGA (2x)
CGAG (2x)
GAGG (2x)
AGGC (1x)

Page 47:

Example
Read: TCGACGC

GTCG (1x)
TCGA (2x)
CGAG (2x)    CGAC (1x)
GAGG (2x)    GACG (1x)
AGGC (1x)    ACGC (1x)
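The counting shown on the last few pages can be reproduced with a short sketch. It assumes k = 4, as in the example, and simply slides a window over each read; the function name `count_kmers` is hypothetical and this is not Velvet's actual implementation.

```python
from collections import Counter


def count_kmers(reads, k=4):
    """Slide a window of length k over every read and count each k-mer."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


# The three reads added so far in the example.
counts = count_kmers(["GTCGAGG", "CGAGGCT", "TCGACGC"])
for kmer, n in counts.items():
    print(f"{kmer} ({n}x)")
```

The third read shares TCGA with the first one and then diverges (CGAC, GACG, ACGC), which creates the side branch in the graph.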

Page 48:

AGAT (8x)

ATCC (7x)

TCCG (7x)

CCGA (7x)

CGAT (6x)

GATG (5x)

ATGA (8x)

TGAG (9x)

GATC (8x)

GATT (1x)

TAGT (3x)

AGTC (7x)

GTCG (9x)

TCGA (10x)

GGCT (11x)

TAGA (16x)

AGAG (9x)

GAGA (12x)

GACA (8x)

ACAG (5x)

GCTT (8x)

GCTC (2x)

CTTT (8x)

CTCT (1x)

TTTA (8x)

TCTA (2x)

TTAG (12x)

CTAG (2x)

AGAC (9x)

AGAA (1x)

CGAG (8x)

CGAC (1x)

GAGG (16x)

GACG (1x)

AGGC (16x)

ACGC (1x)

Example          

 etc…

Page 49:

Example
After simplification…

TAGTCGA
AGAGA
TAGA
AGAT
GCTTTAG
GCTCTAG
AGACAG
AGAA
CGAG
CGACGC
GAGGCT
GATCCGATGAG
GATT
GGCT

Page 50:

Example
Tips removed…

TAGTCGA
AGAGA
TAGA
AGAT
GCTTTAG
GCTCTAG
AGACAG
CGAG
GAGGCT
GATCCGATGAG
GGCT


Example

 Bubbles removed… by TourBus

TAGTCGA

AGAGA TAGA

AGAT

GCTTTAG AGACAG

CGAG

GAGGCT

GATCCGATGAG

GGCT
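A bubble is a pair of alternative paths that leave the same node and rejoin at the same node. The sketch below is a deliberately simplified stand-in, not the TourBus algorithm itself: it only detects bubbles made of two single nodes that share one predecessor and one successor, and keeps the branch with higher coverage. The function name `pop_simple_bubbles` and the data layout are assumptions; a real implementation would also check that the two paths have similar sequences.

```python
def pop_simple_bubbles(nodes, edges):
    """nodes: {sequence: coverage}; edges: set of (source, target) pairs."""
    succ = {n: sorted(d for s, d in edges if s == n) for n in nodes}
    pred = {n: sorted(s for s, d in edges if d == n) for n in nodes}
    removed = set()
    names = sorted(nodes)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if a in removed or b in removed:
                continue
            same_ends = pred[a] == pred[b] and succ[a] == succ[b]
            if same_ends and len(pred[a]) == 1 and len(succ[a]) == 1:
                # Drop the branch with lower coverage.
                removed.add(a if nodes[a] < nodes[b] else b)
    kept_nodes = {n: c for n, c in nodes.items() if n not in removed}
    kept_edges = {(s, d) for s, d in edges
                  if s not in removed and d not in removed}
    return kept_nodes, kept_edges

# Illustrative coverage values for the GCTTTAG / GCTCTAG bubble.
nodes = {"GGCT": 11, "GCTTTAG": 8, "GCTCTAG": 2, "TAGA": 16}
edges = {("GGCT", "GCTTTAG"), ("GGCT", "GCTCTAG"),
         ("GCTTTAG", "TAGA"), ("GCTCTAG", "TAGA")}
kept_nodes, kept_edges = pop_simple_bubbles(nodes, edges)
print(sorted(kept_nodes))  # ['GCTTTAG', 'GGCT', 'TAGA']
```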


Example

 Final simplification…

TAGTCGAG AGAGACAG

AGATCCGATGAG

GAGGCTTTAGA


Example

 Final simplification…

Graph nodes: TAGTCGAG, AGAGACAG, AGATCCGATGAG, GAGGCTTTAGA

One possible walk through the graph ...

 TAGTCGAG    GAGGCTTTAGA    AGATCCGATGAG    GAGGCTTTAGA    AGAGACAG

Resulting sequence: TAGTCGAGGCTTTAGATCCGATGAGGCTTTAGAGACAG
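Spelling out a walk means gluing consecutive nodes together while counting their shared k-1 bases only once. A minimal sketch, assuming k = 4 (so neighbouring nodes overlap by 3 bases); the function name `merge_walk` is hypothetical.

```python
def merge_walk(walk, k=4):
    """Concatenate the nodes of a walk, collapsing each (k-1)-base overlap."""
    overlap = k - 1
    seq = walk[0]
    for node in walk[1:]:
        if seq[-overlap:] != node[:overlap]:
            raise ValueError(f"{node} does not overlap the current sequence")
        seq += node[overlap:]
    return seq

walk = ["TAGTCGAG", "GAGGCTTTAGA", "AGATCCGATGAG", "GAGGCTTTAGA", "AGAGACAG"]
print(merge_walk(walk))  # TAGTCGAGGCTTTAGATCCGATGAGGCTTTAGAGACAG
```

The node GAGGCTTTAGA is visited twice in this walk, which is how a repeated region ends up appearing twice in the reconstructed sequence.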


We now have a draft assembly made of contigs.

But contigs alone are not sufficient to understand the characteristics of a genome.

(Figure: Reads, 'De Bruijn' assembly, Contigs, Scaffolds)

To go further, we need to talk about paired-end sequencing technology.


Paired-end Sequencing


Scaffolding

(Figure: Reads, 'De Bruijn' assembly, Contigs, Scaffolds (an assembly))

•  Align reads to De Bruijn contigs

•  Join contigs using evidence from paired-end data

•  "Captured" gaps caused by repeats are represented by "NNN" in the assembly
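The scaffolding idea can be sketched as two small steps: estimate the gap between two contigs from a read pair that spans them, then join the contigs with a run of N characters of that length. This is a simplified illustration under stated assumptions: the function names, the parameter names, and the single-pair gap estimate are hypothetical, and real scaffolders combine evidence from many pairs.

```python
def estimate_gap(insert_size, left_span, right_span):
    """Estimate the gap between two contigs from one spanning read pair.

    insert_size: expected distance covered by the whole fragment
    left_span:   bases from the start of read 1 to the end of the left contig
    right_span:  bases from the start of the right contig to the end of read 2
    """
    return insert_size - left_span - right_span

def build_scaffold(contigs, gaps):
    """Join ordered contigs, filling each estimated gap with N characters."""
    if len(gaps) != len(contigs) - 1:
        raise ValueError("need exactly one gap estimate between neighbours")
    parts = [contigs[0]]
    for gap, contig in zip(gaps, contigs[1:]):
        # Keep at least one N so the join stays visible in the sequence.
        parts.append("N" * max(gap, 1))
        parts.append(contig)
    return "".join(parts)

print(estimate_gap(500, 200, 250))             # 50
print(build_scaffold(["ACGT", "TTGA"], [3]))   # ACGTNNNTTGA
```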


Scaffolding  


SUPERSCAFFOLDING!!!


A "real" protocol

1.  Retrieve reads
2.  Quality check of reads
3.  Trimming and filtering
4.  Assembly
5.  Using paired-end reads for scaffolding
6.  Check the genome quality

(Figure labels: Reads, Overlap, Local Multiple Alignment, Contigs, Scaffolding, Alignment Scoring, Finishing)

Assembly Problems:

-Repeats

-Chimerism

-Gaps


How can we assess the quality of a genome?

Important Assembler Metrics

•  Number of large contigs

•  Total size

•  Coverage

•  Average length

•  N50

•  Longest contig

•  % genome assembled
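Of these metrics, N50 is the least self-explanatory: it is the length of the contig (or scaffold) at which the running total, taken from the longest sequence downwards, first reaches half of the total assembly size. A minimal sketch follows; the function name `n50` is hypothetical, and the second returned value is the number of sequences needed to reach that point.

```python
def n50(lengths):
    """Return (N50 length, number of sequences needed to reach it)."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    for rank, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        # Stop at the first sequence that brings us to half the total size.
        if 2 * running >= total:
            return length, rank

contig_lengths = [80, 70, 50, 40, 30, 20]   # total = 290, half = 145
print(n50(contig_lengths))  # (70, 2)
```

Here the two longest contigs (80 + 70 = 150) already cover half of the 290 bases, so N50 is 70.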


How can we tell whether we performed a good assembly?

| Species | Genome size (Mb) | N50 scaffold index | N50 scaffold size (Mb) | # scaffolds | N50 contig size (Kb) | Sequencing technology | Reference |
|---|---|---|---|---|---|---|---|
| Melon | 450 | 26 | 4.678 | 1,594 | 18.2 | 454, Sanger | this report |
| Potato | 844 | 121 | 1.782 | 2,043 | 31.4 | Illumina, 454, Sanger | The Potato Genome Sequencing Consortium 2011 |
| Apple | 743 | 102 | 1.542 | 1,629 | 13.4 | Sanger, 454 | Velasco et al 2010 |
| Fragaria | 240 | n.a. | 1.361 | 3,263 | n.a. | 454, Illumina, SOLiD | Shulaev et al 2011 |
| Cucumber | 367 | 59 | 1.144 | 47,837 | 19.8 | Illumina, Sanger | Huang et al 2009 |
| Brassica rapa | 529 | n.a. | 1.97 | n.a. | 27.3 | Illumina | The Brassica rapa Genome Sequencing Project Consortium 2011 |
| Cacao | 430 | 178 | 0.47 | 4,792 | 19.8 | 454 | Argout et al 2011 |
| Date palm | 658 | n.a. | 0.03 | 57,277 | 6.4 | Illumina | Al-Dous et al 2011 |