Welcome to Introduction to Bioinformatics Wednesday, 10 February Genome Sequencing/Assembly Genome...

Post on 16-Dec-2015

Welcome to Introduction to Bioinformatics

Wednesday, 10 February: Genome Sequencing/Assembly

• Genome sequencing/Assembly

What to do for summer vacation?

Deadline, SUNday Feb 28!

Target, Monday Mar 1!

Deadline, ???

Deadline, FRIday Feb 26!

Global Viral Genome Project

Deadline, whenever!

Learn more about…

HHMI: http://www.vcu.edu/csbc/hhmi/

BBSI: http://www.vcu.edu/csbc/bbsi/

VCU-USF: http://www.research.vcu.edu/vpr/fellowship.htm

GVGP: http://biobike.csbc.vcu.edu (News)

What is the sequence (5' to 3') represented by the gel?

Myers et al SQ2

G A T C

Dideoxy sequencing (= Sanger sequencing)

What is the sequence (5' to 3') represented by the gel?

G A T C

ddC ddC ddC

TCGTGTACATCGTAACACGGTTAAGT

Myers et al SQ2

Sequencing process: Drosophila genome (~100 million nt)

Sequence it

Technical limitation

Reads limited to hundreds of nt

How many possible 500 nt fragments are there?

SAMPLE

How many 500 nt samples are needed to cover 100 million nt? 100,000,000 / 500 = 200,000

Is this enough?

Oversampling … coverage?

1,000,000 reads of 500 nt = 5x coverage
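The arithmetic on these slides can be sketched in a few lines. This is only an illustration: the function names `reads_needed` and `coverage` are hypothetical, and the genome and read lengths are the round figures from the slides (100 million nt, 500 nt), not measured values.

```python
import math

GENOME_NT = 100_000_000  # Drosophila genome size used on the slides (approximate)
READ_NT = 500            # read length used on the slides


def reads_needed(genome_nt: int, read_nt: int, fold: float = 1.0) -> int:
    """Reads whose combined length equals `fold` times the genome length."""
    return math.ceil(genome_nt * fold / read_nt)


def coverage(genome_nt: int, read_nt: int, n_reads: int) -> float:
    """Fold coverage (oversampling): total nt sequenced divided by genome length."""
    return n_reads * read_nt / genome_nt


print(reads_needed(GENOME_NT, READ_NT))          # reads for 1x coverage
print(coverage(GENOME_NT, READ_NT, 1_000_000))   # fold coverage from one million reads
```

Note that 1x coverage only means the total sequenced length equals the genome length; because reads land at random positions, it does not mean every nucleotide has been read.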

Paint the wall

Study Questions 8 & 9: "oversampling"? "coverage"?

Shotgun sequencing?

How long will this take?

Wall: 40 " x 25 "; paint ball: 1 sq "

1000 paint balls?

[Plot: Completeness (0 to 1) versus Oversampling (0 to 10)]

How much is painted with 1x oversampling?

What fraction won't be painted?
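One way to explore this question is to simulate it. The sketch below rests on simplifying assumptions that are not in the slides: the 40 " x 25 " wall is treated as a grid of 1000 cells of 1 sq " each, and every paint ball covers exactly one cell chosen uniformly at random. The function names are hypothetical.

```python
import random


def simulate_unpainted(n_cells: int = 1000, n_balls: int = 1000, seed: int = 0) -> float:
    """Throw n_balls at random cells and return the fraction of cells never hit."""
    rng = random.Random(seed)
    painted = set()
    for _ in range(n_balls):
        painted.add(rng.randrange(n_cells))  # each ball paints one random cell
    return 1 - len(painted) / n_cells


def expected_unpainted(n_cells: int, n_balls: int) -> float:
    """Probability that a given cell is missed by every one of n_balls throws."""
    return (1 - 1 / n_cells) ** n_balls


# 1x oversampling: as many balls as there are cells
print(simulate_unpainted(1000, 1000))
print(expected_unpainted(1000, 1000))

# Completeness at increasing oversampling, as in the plot
for fold in (1, 2, 4, 6, 8, 10):
    print(fold, 1 - expected_unpainted(1000, 1000 * fold))
```

Under these assumptions roughly a third of the wall is still bare after 1x oversampling, which is why the completeness curve approaches 1 only gradually.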

P(TT) = 1/2 x 1/2 = 1/4

Probability that two coins both come up tails

Rule of multiplication: intersection, independent events

Gets T from first AND gets T from second

Intersection of possibilities (Rule of multiplication)

                      Second coin toss
                      H      T
First coin toss   H   HH     HT
                  T   TH     TT

P(at least 1 T) = 1/4 + 1/4 + 1/4 = 3/4

Probability that either of two coins comes up tails

1/2 x 1/2 = 1/4?

1/2 + 1/2 = 1?

Gets HT or TH or TT

Union of possibilities (Rule of addition)

Rule of addition: union, mutually exclusive events

P(at least 1 T) = 1 - P(HH) = 1 - 1/4 = 3/4

Probability that either of two coins comes up tails = 1 - probability that neither comes up tails

Probability(2 T) = 1 - Probability(NOT 2 T)

Complement of possibilities (Rule of complementation)

Rule of complementation: yin-yang, adds to 1
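The three rules can be checked by listing the four equally likely outcomes of two fair coin tosses. This is a small sketch for illustration; the helper `prob` is a hypothetical name, and exact fractions are used so the results match the slide arithmetic.

```python
from fractions import Fraction
from itertools import product

# All outcomes of two tosses: HH, HT, TH, TT, each with probability 1/4
outcomes = list(product("HT", repeat=2))
p_each = Fraction(1, len(outcomes))


def prob(event) -> Fraction:
    """Sum the probabilities of the outcomes for which event(outcome) is true."""
    return sum((p_each for o in outcomes if event(o)), Fraction(0))


# Rule of multiplication: T on the first AND T on the second (independent tosses)
p_tt = prob(lambda o: o == ("T", "T"))

# Rule of addition: HT or TH or TT (mutually exclusive outcomes)
p_at_least_one_t = prob(lambda o: "T" in o)

# Rule of complementation: 1 minus the probability of no tails at all
p_hh = prob(lambda o: o == ("H", "H"))

print(p_tt, p_at_least_one_t, 1 - p_hh)
```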

Sequencing process: Drosophila genome (~100 million nt)

Focus on one nucleotide…

What’s the probability that it’s covered by one read?

What’s the probability that it’s covered by two reads?

What’s the probability that it’s covered by 200,000 reads?
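These questions combine the rules of multiplication and complementation. The sketch below makes assumptions that are not stated on the slides: each 500 nt read starts at an independent, uniformly random position, and effects at the ends of the genome are ignored, so a single read covers a given nucleotide with probability 500 / 100,000,000. The function name `p_covered` is hypothetical.

```python
import math

GENOME_NT = 100_000_000  # approximate genome size from the slides
READ_NT = 500            # read length from the slides


def p_covered(n_reads: int, genome_nt: int = GENOME_NT, read_nt: int = READ_NT) -> float:
    """Probability that one chosen nucleotide is covered by at least one of n_reads."""
    p_one = read_nt / genome_nt            # chance that a single read covers it
    p_missed_by_all = (1 - p_one) ** n_reads  # rule of multiplication
    return 1 - p_missed_by_all             # rule of complementation


for n in (1, 2, 200_000):
    print(n, p_covered(n))

# For many reads the missed fraction is close to exp(-coverage)
print(1 - math.exp(-200_000 * READ_NT / GENOME_NT))
```

With 200,000 reads (1x coverage) the chosen nucleotide is still missed more than a third of the time under these assumptions, which motivates oversampling.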

Problem Set 3, Problem 2: Statistics of mini-plasmid assembly

Why read pairs? Scaffolds?

Myers et al SQ6

DNA: Contig 1, Contig 2

[Diagram labels: plasmid, insert (~2000 nt), primer at each end, G A T C, mates, x 1000's]

Bacterial Artificial CHROMOSOME (~150,000 nt): mates

P1-derived Artificial CHROMOSOME

SQ14. From figures given in the text and in Table 1, check the accuracy of each of the following statements:
a. "We produced 3.156 million reads that yielded 1.76 Gbp of sequence. . ."
b. ". . .trillions of overlaps between reads are examined."
c. ". . .to produce 654,000 of the 2-kbp mates and 497,000 of the 10-kbp mates."

Myers et al (2000)
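Part of this check is plain arithmetic on the numbers quoted in the statements themselves. The sketch below uses only those quoted figures; Table 1 of the paper is not reproduced here, so statement (c) cannot be checked this way, and the variable names are hypothetical.

```python
from math import comb

N_READS = 3_156_000        # "3.156 million reads" from statement (a)
TOTAL_BP = 1_760_000_000   # "1.76 Gbp of sequence" from statement (a)

# (a) Implied average read length; compare with the read length given in the paper
mean_read_nt = TOTAL_BP / N_READS
print(round(mean_read_nt))

# (b) Number of distinct pairs of reads that could in principle be compared
possible_pairs = comb(N_READS, 2)
print(f"{possible_pairs:.2e}")
```

The pair count is an upper bound on all-against-all comparisons; whether the assembler actually examines every pair is a separate question that the paper's text has to answer.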