Sequencing and Assembly Cont'd CS273a Lecture 5, Aut08, Batzoglou
CS273a Lecture 9, Fall 2008 (Aut08), Batzoglou
Quality of assemblies—mouse
Terminology: N50 contig length. If we sort contigs from largest to smallest and start covering the genome in that order, N50 is the length of the contig that just covers the 50th percentile.
7.7X sequence coverage
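The N50 definition above can be sketched in a few lines of Python. This is a minimal illustration, assuming contig lengths arrive as a plain list of integers; the function name `n50` and its optional `genome_size` parameter are hypothetical. By default the half-way point is measured against the total assembly length; passing a genome size instead follows the slide's "covering the genome" wording (a variant often called NG50).

```python
def n50(contig_lengths, genome_size=None):
    """Length of the contig at which the running total of contig lengths,
    taken from largest to smallest, first reaches half of the reference size.

    genome_size (hypothetical parameter): if given, measure against the genome
    size instead of the total assembly length (the NG50 variant).
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths) if genome_size is None else genome_size
    covered = 0
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if 2 * covered >= total:  # integer test for covered >= total / 2
            return length
    return None  # the contigs never reach half of genome_size


contigs = [100, 80, 50, 30, 20, 10]
print(n50(contigs))                   # 80: 100 + 80 = 180 >= 290 / 2
print(n50(contigs, genome_size=400))  # 50: 100 + 80 + 50 = 230 >= 400 / 2
```

Sorting dominates the work, so the sketch runs in O(n log n) for n contigs.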
Quality of assemblies—dog
7.5X sequence coverage
Quality of assemblies—chimp
3.6X sequence coverage
Assisted assembly
History of WGA
• 1982: λ-virus, 48,502 bp
• 1995: H. influenzae, 1 Mbp
• 2000: fly, 100 Mbp
• 2001–present: human (3 Gbp), mouse (2.5 Gbp), rat*, chicken, dog, chimpanzee, several fungal genomes
Gene Myers: “Let’s sequence the human genome with the shotgun strategy.”
Phil Green: “That is impossible, and a bad idea anyway.”
(1997)
$399 Personal Genome Service (November 2007)
$2,500 Health Compass service (April 2008)
$985 deCODEme (November 2007)
$350,000 Whole-genome sequencing (November 2007)
Genetic Information Nondiscrimination Act (May 2008)
Applications
Whole-genome sequencing
Comparative genomics
Genome resequencing
Structural variation analysis
Polymorphism discovery
Metagenomics
Environmental sequencing
Gene expression profiling
Genotyping
Population genetics
Migration studies
Ancestry inference
Relationship inference
Genetic screening
Drug targeting
Forensics
Sequencing applications
Demand for more sequencing → sequencing technology improvement → increase in sequencing data output → new sequencing applications
Sequencing technology: Sanger sequencing
[Figure: cost per finished bp, 1975–2008; vertical axis from $10.00 down to $0.01]
Read length: 15–200 bp (early) → 500–1,000 bp (2008)
Throughput: “grad-student years” → 2 × 10^6 bp/day
Fred Sanger
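Sequencing technology: Sanger sequencing
Human genome: $3 \times 10^{9}$ bp (1x coverage); target: 10x coverage
Time: $\dfrac{10 \times 3 \times 10^{9}\ \text{bp}}{2 \times 10^{6}\ \text{bp/day}} = 1.5 \times 10^{4}\ \text{days} \approx 40\ \text{years}$
Cost: $10 \times 3 \times 10^{9}\ \text{bp} \times \$0.001/\text{bp} = \$30\ \text{million}$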
Pyrosequencing on a chip
Mostafa Ronaghi, Stanford Genome Technologies Center
454 Life Sciences
Sequencing technology: Next-generation sequencing
Read length: 250 bp
Throughput: 300 Mb/day
Cost: ~ 10,000 bp/$
De novo: yes
Genome Sequencer / FLX
“short reads”
Single Molecule Array for Genotyping—Solexa
Sequencing technology: Next-generation sequencing
Read length: ~ 35 bp
Throughput: 300 – 500 Mb/day
Cost: ~ 100,000 bp/$
De novo: yes
Genome Analyzer / SOLiD Analyzer
“microreads”
Sequencing technology: Next-generation sequencing
Read length: ~50–150 bp
Throughput: 3 Gb/day
Cost: ~ 3,000,000 bp/$
De novo: yes
Genome Analyzer / SOLiD Analyzer
reads
Complete Genomics
$5,000 this summer. Quality?
1,000 genomes in 2009; 20,000 genomes in 2010
So, how fast is cost going down?
• 2006: $10 million
• 2008: $100,000
• 2009: $10,000
• ?: $1,000
• ???: $100
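One way to answer the slide's question is to compute the average yearly fold-decrease between the listed points. The sketch below assumes smooth exponential decline between consecutive figures and treats the 2009 value as the projection it was in fall 2008; the names `LISTED_COST` and `yearly_fold_decrease` are hypothetical.

```python
# Cost figures from the slide, in dollars; 2009 was a projection at the time.
LISTED_COST = {2006: 10_000_000, 2008: 100_000, 2009: 10_000}


def yearly_fold_decrease(year_a, year_b, costs=LISTED_COST):
    """Geometric-mean factor by which cost falls per year from year_a to year_b,
    assuming exponential decline between the two points."""
    ratio = costs[year_a] / costs[year_b]
    return ratio ** (1 / (year_b - year_a))


for a, b in [(2006, 2008), (2008, 2009)]:
    print(f"{a}->{b}: about {yearly_fold_decrease(a, b):.0f}x cheaper per year")
```

Both intervals work out to roughly a tenfold drop per year.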
Sequencing technology: Next-generation sequencing
Read length: 1 bp
Throughput: 1 – 2 Mb/day
Cost: 5,000 bp/$
De novo: no
Infinium Assay / GeneChip Array
genotypes
“SNP chips”
Nanopore Sequencing
http://www.mcb.harvard.edu/branton/index.htm
Sequencing technology: Next-generation sequencing
Sequencing technology
Technology      Read length (bp)   Throughput (Mb/day)   Cost (bp/$)   De novo
Sanger          1,000              2                     1,000         yes
454             250                300                   10,000        yes
Solexa / ABI    35                 500                   100,000       yes
SNP chip        1                  2                     5,000         no
Application   Sanger   454   Solexa/ABI   SNP chip
Bacterial sequencing $
Mammalian sequencing $$$ $$ not likely today
Mammalian resequencing $$$ $$ $
Metagenomics $$ $
Genotyping $$$ $$$ $$$
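The throughput and cost columns in the first table are enough to repeat the Sanger back-of-the-envelope estimate for the other instruments. The sketch assumes, like the Sanger calculation, a 3 × 10^9 bp genome at 10x coverage, and treats each listed throughput as the output of one machine running continuously. It ignores read length, so it says nothing about whether short reads can actually be assembled de novo (compare the "not likely today" entry), and it leaves out SNP chips, which do not sequence de novo. The names `TECHNOLOGIES` and `sequencing_estimate` are hypothetical.

```python
# Per-technology figures copied from the comparison table:
# name -> (throughput in Mb/day, cost in bp per dollar)
TECHNOLOGIES = {
    "Sanger": (2, 1_000),
    "454": (300, 10_000),
    "Solexa / ABI": (500, 100_000),
}


def sequencing_estimate(genome_bp, coverage, throughput_mb_per_day, bp_per_dollar):
    """Return (days on one machine, dollars) to generate coverage * genome_bp bases.

    Assumes the listed throughput is a single machine's daily output and that
    cost scales linearly with bases sequenced (no library prep or assembly).
    """
    total_bp = coverage * genome_bp
    days = total_bp / (throughput_mb_per_day * 1_000_000)
    dollars = total_bp / bp_per_dollar
    return days, dollars


for name, (mb_per_day, bp_per_dollar) in TECHNOLOGIES.items():
    days, dollars = sequencing_estimate(3_000_000_000, 10, mb_per_day, bp_per_dollar)
    print(f"{name}: {days:,.0f} machine-days, ${dollars:,.0f}")
```

The Sanger row reproduces the 15,000 days and $30 million worked out earlier; the other rows show how much of the gap comes from throughput versus cost per base.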
Evolution at the DNA level
…ACGGTGCAGTTACCA…
…AC----CAGTCCACCA…
SEQUENCE EDITS: mutation, deletion
REARRANGEMENTS: inversion, translocation, duplication
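To make the event types concrete, the toy functions below apply each one to a DNA string. This is an illustrative sketch, not a model of real mutation processes: the function names are hypothetical, positions are 0-based half-open intervals, an inversion is modeled as the reverse complement of a segment (the flipped segment is read from the opposite strand), and "translocation" is simplified to moving a segment within one sequence rather than between chromosomes.

```python
# Toy versions of the DNA-level events listed above; each returns a new string.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def mutate(seq, pos, base):
    """Point mutation: substitute one base at 0-based position pos."""
    return seq[:pos] + base + seq[pos + 1:]


def delete(seq, start, end):
    """Deletion: remove seq[start:end]."""
    return seq[:start] + seq[end:]


def invert(seq, start, end):
    """Inversion: replace seq[start:end] by its reverse complement."""
    segment = seq[start:end].translate(COMPLEMENT)[::-1]
    return seq[:start] + segment + seq[end:]


def duplicate(seq, start, end):
    """Tandem duplication: insert a second copy of seq[start:end] right after it."""
    return seq[:end] + seq[start:end] + seq[end:]


def translocate(seq, start, end, target):
    """Simplified translocation: cut seq[start:end] and reinsert it at
    position `target` of the remaining sequence (hypothetical single-sequence model)."""
    segment = seq[start:end]
    rest = seq[:start] + seq[end:]
    return rest[:target] + segment + rest[target:]


s = "ACGGTGCAGTTACCA"
print(delete(s, 2, 6))   # removes the GGTG block that is gapped in the alignment
print(invert(s, 6, 10))  # flips CAGT onto the other strand
```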
Evolutionary Rates
[Figure: three items labeled "OK", two labeled "X"; next generation labeled "Still OK?"]
Orthology, Paralogy, Inparalogs, Outparalogs