
CS273a Lecture 9, Aut08, Batzoglou (Fall 2008)

Quality of assemblies—mouse

Terminology: N50 contig length

If we sort contigs from largest to smallest and start covering the genome in that order, N50 is the length of the contig that just covers the 50th percentile.
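The N50 definition above can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function name `n50`, the optional `genome_size` argument, and the example contig lengths are all assumptions. When no genome size is given, the sketch assumes the total assembled length is the target.

```python
def n50(contig_lengths, genome_size=None):
    """Return the N50 contig length.

    Contigs are sorted from largest to smallest; N50 is the length of the
    contig at which the running total first reaches half of the target size.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig")
    # Assumption: without a known genome size, use the total assembled length.
    target = genome_size if genome_size is not None else sum(contig_lengths)
    covered = 0
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if 2 * covered >= target:
            return length
    # Only reachable when the contigs cover less than half of genome_size.
    return None


contigs = [80, 70, 50, 40, 30, 20, 10]  # hypothetical contig lengths, in kb
print(n50(contigs))
```

With these hypothetical contigs the total is 300 kb, and the two largest contigs (80 + 70) reach the 150 kb halfway mark, so the reported N50 is 70 kb.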

7.7X sequence coverage


Quality of assemblies—dog

7.5X sequence coverage


Quality of assemblies—chimp

3.6X sequence coverage

Assisted assembly


History of WGA

• 1982: λ-virus, 48,502 bp

• 1995: H. influenzae, 1 Mbp

• 2000: fly, 100 Mbp

• 2001 – present: human (3 Gbp), mouse (2.5 Gbp), rat*, chicken, dog, chimpanzee, several fungal genomes

Gene Myers: “Let’s sequence the human genome with the shotgun strategy”

Phil Green: “That is impossible, and a bad idea anyway”

1997

$399 Personal Genome Service (November 2007)

$2,500 Health Compass service (April 2008)

$985 deCODEme (November 2007)

$350,000 Whole-genome sequencing (November 2007)

Genetic Information Nondiscrimination Act (May 2008)

Applications

• Whole-genome sequencing
• Comparative genomics
• Genome resequencing
• Structural variation analysis
• Polymorphism discovery
• Metagenomics
• Environmental sequencing
• Gene expression profiling
• Genotyping
• Population genetics
• Migration studies
• Ancestry inference
• Relationship inference
• Genetic screening
• Drug targeting
• Forensics


Sequencing applications

Demand for more sequencing

Sequencing technology improvement

Increase in sequencing data output

New sequencing applications


Sequencing technology: Sanger sequencing

[Figure: timeline of Sanger sequencing, 1975 – 2008, with cost per finished bp on a scale of $10.00, $1.00, $0.10, $0.01]

Read length: 15 – 200 bp early on, 500 – 1,000 bp later

Throughput: “grad-student years” early on, 2 × 10^6 bp/day later

Fred Sanger


Sequencing technology: Sanger sequencing

Genome: 3 × 10^9 bp (1x coverage)

Time: (10x coverage × 3 × 10^9 bp) / (2 × 10^6 bp/day) ≈ 40 years

Cost: 10x coverage × 3 × 10^9 bp × $0.001/bp = $30 million
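The arithmetic on this slide can be checked with a short Python sketch. The function and constant names are assumptions made for illustration, and the cost is written as 1,000 bp per dollar, which is the same rate as $0.001/bp. The sketch also assumes a single sequencing pipeline running every day at the stated throughput.

```python
GENOME_BP = 3 * 10**9        # genome size used on the slide
COVERAGE = 10                # 10x coverage
BP_PER_DAY = 2 * 10**6       # Sanger throughput from the slide
BP_PER_DOLLAR = 1_000        # equivalent to $0.001 per bp


def sequencing_effort(genome_bp, coverage, bp_per_day, bp_per_dollar):
    """Return (days, dollars) needed to sequence a genome at the given coverage."""
    total_bp = genome_bp * coverage
    days = total_bp / bp_per_day
    dollars = total_bp / bp_per_dollar
    return days, dollars


days, dollars = sequencing_effort(GENOME_BP, COVERAGE, BP_PER_DAY, BP_PER_DOLLAR)
print(f"{days:,.0f} days (about {days / 365:.0f} years), ${dollars:,.0f}")
```

The result is 15,000 days, which the slide rounds to 40 years, and $30 million.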


Pyrosequencing on a chip

Mostafa Ronaghi, Stanford Genome Technologies Center

454 Life Sciences


Sequencing technology: Next-generation sequencing

Read length: 250 bp

Throughput: 300 Mb/day

Cost: ~ 10,000 bp/$

De novo: yes

Genome Sequencer / FLX

“short reads”


Single Molecule Array for Genotyping—Solexa


Sequencing technology: Next-generation sequencing

Read length: ~ 35 bp

Throughput: 300 – 500 Mb/day

Cost: ~ 100,000 bp/$

De novo: yes

Genome Analyzer, SOLiD Analyzer

“microreads”


Sequencing technology: Next-generation sequencing

Read length: ~ 50-150 bp

Throughput: 3 Gb/day

Cost: ~ 3,000,000 bp/$

De novo: yes

Genome Analyzer, SOLiD Analyzer

reads


Illumina Projections


Complete Genomics

$5,000 this summer

Quality?...

1,000 genomes in 2009

20,000 genomes in 2010


Pacific Biosciences


So, how fast is cost going down?

• 2006: $10 million
• 2008: $100,000
• 2009: $10,000
• ?: $1,000
• ???: $100
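One way to put a number on that question is to compute the average yearly drop between the dated price points on this slide. The sketch below is an illustration only: the function names are assumptions, the entries marked "?" and "???" are left out because they have no year, and a constant exponential decline between consecutive points is assumed.

```python
import math

# Dated price points from the slide; the "?" and "???" entries have no year.
PRICE_POINTS = [(2006, 10_000_000), (2008, 100_000), (2009, 10_000)]


def annual_fold_decrease(start, end):
    """Average factor by which cost drops per year between two (year, cost) points."""
    (year0, cost0), (year1, cost1) = start, end
    return (cost0 / cost1) ** (1 / (year1 - year0))


def halving_time_months(fold_per_year):
    """Months needed for the cost to halve at a constant yearly fold decrease."""
    return 12 * math.log(2) / math.log(fold_per_year)


for start, end in zip(PRICE_POINTS, PRICE_POINTS[1:]):
    fold = annual_fold_decrease(start, end)
    print(f"{start[0]}-{end[0]}: {fold:.1f}x cheaper per year, "
          f"halving every {halving_time_months(fold):.1f} months")
```

Under these assumptions both intervals work out to roughly a tenfold drop per year, which corresponds to the cost halving every three to four months.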


Molecular Inversion Probes


Illumina Genotype Arrays


Sequencing technology: Next-generation sequencing

Read length: 1 bp

Throughput: 1 – 2 Mb/day

Cost: 5,000 bp/$

De novo: no

Infinium Assay, GeneChip Array

genotypes

“SNP chips”


Nanopore Sequencing

http://www.mcb.harvard.edu/branton/index.htm


Sequencing technology: Next-generation sequencing


Sequencing technology

Technology     Read length (bp)   Throughput (Mb/day)   Cost (bp/$)   De novo
Sanger         1,000              2                     1,000         yes
454            250                300                   10,000        yes
Solexa / ABI   35                 500                   100,000       yes
SNP chip       1                  2                     5,000         no
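To compare the rows of this table on a common task, the sketch below estimates the time and cost of 10x coverage of a 3 Gbp genome for each read-based technology. It is a rough illustration, not part of the lecture: the names are assumptions, 1 Mb is taken as 10^6 bp, a single instrument is assumed, and the SNP chip row is left out because it does not produce reads for de novo sequencing.

```python
# (technology, throughput in Mb/day, cost in bp per dollar), copied from the table.
TECHNOLOGIES = [
    ("Sanger", 2, 1_000),
    ("454", 300, 10_000),
    ("Solexa / ABI", 500, 100_000),
]

TOTAL_BP = 10 * 3 * 10**9  # 10x coverage of a 3 Gbp genome


def project_estimate(total_bp, mb_per_day, bp_per_dollar):
    """Return (days, dollars) for sequencing total_bp on one instrument."""
    days = total_bp / (mb_per_day * 10**6)  # assumes 1 Mb = 10^6 bp
    dollars = total_bp / bp_per_dollar
    return days, dollars


for name, mb_per_day, bp_per_dollar in TECHNOLOGIES:
    days, dollars = project_estimate(TOTAL_BP, mb_per_day, bp_per_dollar)
    print(f"{name:<13} {days:>8,.0f} days  ${dollars:>12,.0f}")
```

These figures cover raw sequencing only; they say nothing about whether reads of a given length can actually be assembled, which is what the application table addresses.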

Application   Sanger   454   Solexa/ABI   SNP chip

Bacterial sequencing   $

Mammalian sequencing   $$$   $$   not likely today

Mammalian resequencing   $$$   $$   $

Metagenomics   $$   $

Genotyping   $$$   $$$   $$$


Multiple Sequence Alignment


Evolution at the DNA level

…ACGGTGCAGTTACCA…

…AC----CAGTCCACCA…

SEQUENCE EDITS

Mutation

Deletion

REARRANGEMENTS

Inversion

Translocation

Duplication


Evolutionary Rates

[Figure labels: OK, OK, OK, X, X, “Still OK?”; next generation]


Orthology, Paralogy, Inparalogs, Outparalogs
