2014 10-01-assembly summaryvariantsoverview

Wednesday Mini Overview


Bioinformatics MSc - Wednesday summary of genome assembly & introduction to variant calling

Transcript of 2014 10-01-assembly summaryvariantsoverview

Page 1: 2014 10-01-assembly summaryvariantsoverview

Wednesday Mini Overview

Page 2: 2014 10-01-assembly summaryvariantsoverview
Page 3: 2014 10-01-assembly summaryvariantsoverview

Congrats!

• first ever genome assembly

• complete approach

• with real-world cutting edge tools

• Some shortcuts:

• we used 2% of a eukaryotic (ant) genome

• only 1 type of paired reads

• only used “one-step” software.

Page 4: 2014 10-01-assembly summaryvariantsoverview

So you want to sequence a genome…

• Sampling? • algorithms prefer low diversity

• Sequencing approach? • paired end? • which sequencer? • what is needed for scaffolding?

Page 5: 2014 10-01-assembly summaryvariantsoverview

Scaffolding

Page 6: 2014 10-01-assembly summaryvariantsoverview

So you want to sequence a genome…

• Sampling? • algorithms prefer low diversity

• Sequencing approach? • paired end? • which sequencer? • what is needed for scaffolding?

• input data QA? • sequencer statistics • FastQC • bio-relevant measurements? (e.g. % mapping to known data)

Unable to detect all errors!
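As a rough illustration of the input-data QA step above, the sketch below decodes a FASTQ quality string into Phred scores and summarises it. It assumes the common Phred+33 encoding; the function names are hypothetical and not taken from any particular tool. Dedicated QA software reports far more than this, and such statistics still cannot reveal every error.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into integer Phred scores.

    Assumes Phred+33 encoding by default; older data may use offset 64.
    """
    return [ord(ch) - offset for ch in quality_line]


def mean_error_probability(quality_line, offset=33):
    """Average per-base error probability, using P = 10^(-Q/10)."""
    scores = phred_scores(quality_line, offset)
    probabilities = [10 ** (-q / 10) for q in scores]
    return sum(probabilities) / len(probabilities)


def fraction_below(quality_line, threshold=20, offset=33):
    """Fraction of bases whose Phred score is below the threshold."""
    scores = phred_scores(quality_line, offset)
    return sum(1 for q in scores if q < threshold) / len(scores)


# Illustrative quality string: 'I' is Q40, '5' is Q20, '+' is Q10.
example = "II5+"
print(phred_scores(example))      # [40, 40, 20, 10]
print(fraction_below(example))    # 0.25
```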

Page 7: 2014 10-01-assembly summaryvariantsoverview

• trimming/deduplicating/filtering • removing excess/redundant data • removing errors

• Which assembler? • used by others? (publications / online list / forum / Assemblathon) • something new?

• assembly result QA • sequence statistics (e.g., QUAST) • bio-relevant measures (e.g., CEGMA)
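One sequence statistic commonly reported for assemblies (including by QUAST) is N50: the length of the shortest contig among the longest contigs that together cover at least half of the total assembly length. A minimal sketch, assuming the contig lengths have already been extracted into a list; the function name is hypothetical:

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths."""
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Illustrative lengths only: total is 54, and 10 + 9 + 8 = 27 reaches half.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

A higher N50 suggests a more contiguous assembly but says nothing about correctness, which is why bio-relevant measures are listed alongside it.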

So you want to sequence a genome…

Page 8: 2014 10-01-assembly summaryvariantsoverview

Perfect parameters

• Instead: need to test many combinations

• of trimming

• of filtering

• different assembly software
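Testing many combinations can be organised by enumerating them up front. The sketch below builds every combination of a few settings; the parameter names, values, and assembler labels are hypothetical placeholders, not recommendations from the course.

```python
from itertools import product


def parameter_grid(options):
    """Return one dict per combination of the supplied option values."""
    names = list(options)
    value_lists = [options[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


# Hypothetical settings to explore; replace with whatever your tools accept.
options = {
    "trim_min_quality": [10, 20, 30],
    "min_read_length": [50, 75],
    "assembler": ["assembler_A", "assembler_B"],
}

runs = parameter_grid(options)
print(len(runs))  # 12
print(runs[0])    # {'trim_min_quality': 10, 'min_read_length': 50, 'assembler': 'assembler_A'}
```

Each dict could then drive one trimming, filtering, and assembly run, with the resulting assemblies compared using the QA measures mentioned earlier.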

Page 9: 2014 10-01-assembly summaryvariantsoverview

Take-home messages

• No “best way”

• Need to install a lot of software

• A lot of work in UNIX - to launch software, to convert formats…

• Need to test many parameters

• Be careful with qualities!

Page 10: 2014 10-01-assembly summaryvariantsoverview

No need to understand everything!

20% effort for 80% result

Page 11: 2014 10-01-assembly summaryvariantsoverview

Calling variants

Page 12: 2014 10-01-assembly summaryvariantsoverview
Page 13: 2014 10-01-assembly summaryvariantsoverview

www.sciencemag.org SCIENCE VOL 331 25 FEBRUARY 2011 1067

Solenopsis invicta fire ants are a big problem! Very well studied!

Ascunce et al. 2011

Page 14: 2014 10-01-assembly summaryvariantsoverview

Solenopsis invicta fire ant: two social forms

Single-queen form:

• 1 large queen

• Independent founding

• Highly territorial

• Many sizes of workers

Multiple-queen form:

• 2-100 smaller queens

• Dependent founding

• No inter-colony aggression

• All workers similar size

Page 15: 2014 10-01-assembly summaryvariantsoverview