Introduction to CLC Main Workbench ILRI Training ...

Joyce Njoki Nzioki
BecA-ILRI Hub, Nairobi, Kenya
http://hub.africabiosciences.org/
http://www.Ilri.org/
[email protected]

Introduction to CLC Main Workbench
ILRI Training / Ethiopia Training
27 August 2015



Getting started with CLC

CLC Main Workbench is a software package that supports analysis of sequence data. Functions include:

✓ Sequence assembly
✓ Primer design
✓ Alignment and phylogeny
✓ BLAST / database searches
✓ Additional plugins


Getting around in CLC

✓ CLC has a main menu with features available as shown above
✓ The File menu has options to manipulate data
✓ The most useful menu is the TOOLBOX, which has various analysis options to manipulate data


Sequenced Data

✓ You can view your sequence data by opening the sequence files (trace files), which have the extension .ab1 or .abi

✓ NOTE: In order to obtain good sequencing results, you MUST download and examine your sequencing chromatogram. If you are using just the text data, you could be publishing data that is completely invalid!

✓ Software used for viewing includes: CLC bio, BioEdit, TracerView


Manipulating Data in CLC

Creating folders

✓ It is best to organize data in the navigation area in folders.
✓ To create a folder go to File | New | Folder
✓ Or click on the new folder icon on the tool bar
✓ Name the folder

   


Manipulating Data in CLC

Importing Data

✓ Importing allows you to bring sequenced data into CLC from where it is stored on your computer.
✓ Go to File | Import or click the import icon on the tool bar.
✓ Navigate to where your sequences are stored on your computer.
✓ Select the file format to import; for sequenced data, select Trace files (.abi/.ab1/.scf/.phd).
✓ Select the folder to save the sequences to.


Troubleshoot sequenced data: “the good”

•  Good quality peaks are smooth, distinct or well formed, evenly spaced and with little baseline noise


Troubleshoot sequenced data: “the bad”

✓ A failed sequencing reaction: the chromatograms look messy and there are many ‘N’s in the sequence.

✓ Non-usable sequenced data: this can be due to a low concentration of DNA template, or to no primer or the wrong primer being added.
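
As a rough illustration of the "many N's" symptom, the fraction of N calls in a read can be computed from the text sequence. This is only a sketch: the function names and the 5% cutoff are assumptions made for the example, not CLC settings, and a text-based check does not replace looking at the chromatogram.

```python
def n_fraction(sequence: str) -> float:
    """Return the fraction of positions called as 'N' (case-insensitive)."""
    if not sequence:
        return 0.0
    return sequence.upper().count("N") / len(sequence)


def looks_failed(sequence: str, max_n_fraction: float = 0.05) -> bool:
    """Flag a read whose N fraction exceeds the cutoff.

    The default cutoff of 0.05 is an arbitrary, illustrative value.
    """
    return n_fraction(sequence) > max_n_fraction
```

A read flagged this way should still be inspected in a trace viewer before it is discarded.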


Troubleshoot sequenced data: “double peaks”

✓ Double peaks: multiple peaks of the same or different height at the same position; this is due to clone contamination, a heterozygous position (SNP), or a contaminated PCR reaction

✓ Can be corrected using degenerate codes, e.g. N (a, c, t or g), Y (c or t), R (a or g)
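
The degenerate codes above are part of the standard IUPAC nucleotide ambiguity alphabet. The sketch below shows one way to look up the code for the set of bases seen at a double-peak position; the names `IUPAC_CODES` and `degenerate_code` are made up for this example.

```python
# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases
# each code stands for.
IUPAC_CODES = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def degenerate_code(bases: str) -> str:
    """Return the IUPAC code for the bases observed at one position."""
    key = frozenset(bases.upper())
    if key not in IUPAC_CODES:
        raise ValueError(f"not a valid set of nucleotides: {bases!r}")
    return IUPAC_CODES[key]
```

For example, a position showing both a C and a T peak would be recorded as `degenerate_code("ct")`, which gives `Y`.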


Troubleshoot sequenced data: “stuttering”

✓ Sequence data quality is poor after stretches of 7 or more nucleotides of the same base. This is due to polymerase slippage during DNA synthesis; it is a limitation of Sanger sequencing.
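
Stretches of 7 or more identical bases can be located in the base-called sequence so that the region after them is examined more carefully. A minimal sketch, assuming a plain A/C/G/T string; `homopolymer_runs` is a hypothetical helper, not a CLC function.

```python
import re


def homopolymer_runs(sequence: str, min_length: int = 7):
    """Find runs of a single base that are at least min_length long.

    Returns a list of (start, end, base) tuples using 0-based,
    end-exclusive coordinates.
    """
    # One base followed by at least (min_length - 1) repeats of itself.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_length - 1))
    return [
        (match.start(), match.end(), match.group(1))
        for match in pattern.finditer(sequence.upper())
    ]
```

Base calls downstream of each reported run are the ones most likely to be affected by stuttering.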


Troubleshoot sequenced data: “drop off”

✓ The DNA sequence suddenly stops or the peak intensity drops off substantially. This is caused by secondary structures like hairpin loops or GC/GT rich regions.


Troubleshoot sequenced data: “mis-called bases”

✓ Nucleotides that have been erroneously inserted into a sequence will appear oddly spaced relative to their neighboring bases



Trim 3’ and 5’ ends

At the 5’ end, sequences do not start off clearly until about base 20-30. This is due to Taq polymerase that is not yet fully activated or to poor termination near the primer.


Trim 3’ and 5’ ends

At the 3’ end, around bases 500-800, the quality degrades as well, due to diminishing bases.


Trimming sequences

✓ After carefully scrutinizing your sequence you can determine where your reliable sequence starts and ends.

✓ You can delete or trim the unreliable sequence from each end of your sequence file.

✓ As a gel run progresses it loses resolution and the reads become more error-prone. Trim sequences when the errors become too frequent for your purpose.


Quality Control using CLC

✓ The first step in sequence analysis is to check the quality of reads and trim sequences where needed to eliminate poor-quality regions or vector contamination.

✓ When the trimming is done, the parts of the sequences that are trimmed are not actually removed; instead, trim annotations are saved to the sequences. These annotated sections are ignored in further analysis.
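
The idea of annotating rather than deleting can be illustrated with a small sketch that returns only the coordinates of the region to keep and leaves the read itself untouched. This is not CLC's trimming algorithm: it assumes per-base Phred-style quality scores are available and simply walks in from each end until a base meets an assumed threshold of 20.

```python
def trim_region(qualities, min_quality=20):
    """Return (start, end) of the region kept after end trimming.

    Coordinates are 0-based and end-exclusive. The read is not modified;
    the result plays the role of a trim annotation. The threshold of 20
    is an illustrative assumption.
    """
    start = 0
    end = len(qualities)
    # Skip low-quality bases at the 5' end.
    while start < end and qualities[start] < min_quality:
        start += 1
    # Skip low-quality bases at the 3' end.
    while end > start and qualities[end - 1] < min_quality:
        end -= 1
    return start, end
```

Downstream steps would then use `sequence[start:end]` while the full read, including the trimmed ends, stays available for inspection.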

 


Assemble sequence

Sequence assembly refers to aligning and merging fragments of a much longer DNA sequence in order to reconstruct the original sequence.

I.  Reference assembly: reference-guided assembly.

II. De novo assembly: assembling without the aid of a reference genome.


De novo assembly

✓ In most cases forward and reverse primers are used, so you obtain both forward and reverse sequences.

✓ Assembling aligns the two sequences at the point where they overlap to give a contiguous sequence called a contig.
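
The overlap step can be sketched in a few lines: reverse-complement the reverse read so that both reads are on the same strand, then join them at the longest exact overlap. This is a deliberately simplified illustration. Real assemblers, including the one in CLC, tolerate mismatches and gaps, and the function names and `min_overlap` default below are assumptions for the example.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_reads(forward: str, reverse: str, min_overlap: int = 10):
    """Merge a forward and a reverse read into a contig.

    Looks for the longest exact match between the end of the forward read
    and the start of the reverse-complemented reverse read. Returns the
    contig, or None if no overlap of at least min_overlap bases exists.
    """
    rev = reverse_complement(reverse)
    max_len = min(len(forward), len(rev))
    for size in range(max_len, min_overlap - 1, -1):
        if forward[-size:] == rev[:size]:
            return forward + rev[size:]
    return None
```

Because only exact overlaps are accepted, a single mis-called base in the overlap makes this sketch return `None`, which is one reason reads are trimmed before assembly.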


Conflicts

The example shows a conflict in which the forward strand shows base call “A” and the reverse strand shows a “gap”.

[Figure: aligned forward (F) and reverse (R) reads]


Resolving conflicts

✓ We assess the quality of reads at this position. The reverse sequence has a low-quality chromatogram (this is often the case towards the ends of the sequence). However, the forward strand clearly has good quality peaks and can be trusted.

[Figure: chromatograms of the forward (F) and reverse (R) reads]


Resolving conflicts

Other conflicts may occur between two nucleotides. Judgment on how to resolve such conflicts should be based on:

✓ Quality of reads on both strands (take data from the most consistent sequence)

✓ The two reads may show differing bases because the position is genuinely a SNP, so judgment should be based on the quality of reads and also on background knowledge of the sequences being analyzed.


Consensus sequences

Once you have assembled the reads and resolved conflicts, you can extract a consensus sequence that is used in further analysis.
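
A quality-based consensus can be sketched as follows, assuming the two reads are already aligned to the same length (with "-" for gaps) and that a per-position quality score is available for each. The rule used here (take the base with the higher quality, call N on a tie) is an illustrative simplification, not CLC's conflict-resolution logic, and it does not replace manual judgment at possible SNP positions.

```python
def consensus(fwd, rev, fwd_qual, rev_qual):
    """Build a consensus from two aligned reads.

    fwd and rev are aligned strings of equal length using '-' for gaps;
    fwd_qual and rev_qual hold one quality score per aligned position
    (use 0 for gap positions).
    """
    if not (len(fwd) == len(rev) == len(fwd_qual) == len(rev_qual)):
        raise ValueError("reads and quality lists must have equal length")
    result = []
    for f_base, r_base, f_q, r_q in zip(fwd, rev, fwd_qual, rev_qual):
        if f_base == r_base:
            base = f_base      # no conflict
        elif f_q > r_q:
            base = f_base      # trust the higher-quality forward call
        elif r_q > f_q:
            base = r_base      # trust the higher-quality reverse call
        else:
            base = "N"         # equal quality: leave unresolved
        if base != "-":        # a winning gap contributes no base
            result.append(base)
    return "".join(result)
```

With a forward read calling a high-quality “A” where the low-quality reverse read shows a gap, as in the conflict example earlier, the sketch keeps the “A”.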


The End