Practical Guide to the $1000 Genome (2014)
A Practical Guide to the $1000 Genome
Michael Heltzen, CEO & Co-Founder
Shawn C. Baker, Ph.D., CSO & Co-Founder
The Sequencing Marketplace
Match researchers with sequencing providers
Neutral stance
Unique perspective
Where to start?
How do I communicate it?
Pick 2, but not all 3…
The lab’s side of the problem
Overcapacity…
What is our value proposition?
What is an optimal customer for us?
Buyers don’t know what they want?
How do I price?
Should I generalize or specialize?
Technologies and needs change all the time…
Lack of standards
Why are standards so hard for us as an industry?
How does AllSeq work?
AllSeq connects researchers who have NGS sequencing needs with the most suitable lab for each case
It works like this
Project design & QA
Offers & Picking a lab
Match & talks
Human and diseases
Virus and Bacteria
Plants and Animals
Over to Shawn and the $1000 Genome
The $1000 genome is here!
(sort of…)
The HiSeq X Ten: What is it?
Data output:
– 600 Gb/day
– 1.8 Tb/run
– ~5 whole human genomes/day
– 1800 genomes per year
Patterned flow cells
Improved optics
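The output figures above can be cross-checked against each other. The sketch below is a minimal sanity check, assuming all four numbers describe a single instrument (the slide does not say so explicitly). The function names are made up for illustration.

```python
# Figures quoted on the slide (assumed to describe one instrument).
GB_PER_DAY = 600         # "600 Gb/day"
TB_PER_RUN = 1.8         # "1.8 Tb/run"
GENOMES_PER_DAY = 5      # "~5 whole human genomes/day"
GENOMES_PER_YEAR = 1800  # "1800 genomes per year"


def run_length_days(tb_per_run, gb_per_day):
    """Days needed to produce one run's worth of data."""
    return tb_per_run * 1000 / gb_per_day


def gb_per_genome(gb_per_day, genomes_per_day):
    """Gigabases of raw data spent on each genome."""
    return gb_per_day / genomes_per_day


def sequencing_days_per_year(genomes_per_year, genomes_per_day):
    """Days of continuous operation implied by the yearly figure."""
    return genomes_per_year / genomes_per_day


run_days = run_length_days(TB_PER_RUN, GB_PER_DAY)
data_per_genome = gb_per_genome(GB_PER_DAY, GENOMES_PER_DAY)
busy_days = sequencing_days_per_year(GENOMES_PER_YEAR, GENOMES_PER_DAY)

print(f"run length: {run_days:.1f} days")
print(f"data per genome: {data_per_genome:.0f} Gb")
print(f"sequencing days per year: {busy_days:.0f}")
```

Read this way, the yearly figure implies the instrument runs nearly every day of the year, which is worth keeping in mind when looking at the cost numbers.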
What’s the catch?
$1000 Genome
= $800 – sequencing
+ $135 – amortization
+ $65 – library prep
$1000 Genome
= $1M
$1000 Genome
= $10M
1 day = $5000
1 year = $1,800,000
1 year = $18,000,000
4 years = $72,000,000
Allseq.com/1000-genome
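The dollar figures on these slides follow from simple multiplication. The sketch below reproduces them under two assumptions that the slides do not state outright: that the larger yearly figure is for ten instruments, and that the four-year total is that yearly spend repeated four times. All constant and function names are illustrative.

```python
PRICE_PER_GENOME_USD = 1000
COST_BREAKDOWN_USD = {"sequencing": 800, "amortization": 135, "library prep": 65}

GENOMES_PER_DAY = 5      # from the data output slide
GENOMES_PER_YEAR = 1800  # from the data output slide
INSTRUMENTS = 10         # assumption: the "X Ten" is a set of ten instruments
YEARS = 4                # assumption: period behind the "4 years" figure


def spend(genomes, price=PRICE_PER_GENOME_USD):
    """Total spend for a number of genomes at a flat per-genome price."""
    return genomes * price


per_day = spend(GENOMES_PER_DAY)
per_year_one_instrument = spend(GENOMES_PER_YEAR)
per_year_all_instruments = per_year_one_instrument * INSTRUMENTS
four_year_total = per_year_all_instruments * YEARS

print(sum(COST_BREAKDOWN_USD.values()))
print(per_day, per_year_one_instrument, per_year_all_instruments, four_year_total)
```

The point of the exercise is that the $1000 price only holds if the instruments are kept full, which means committing to the yearly volumes shown.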
…ACCATGATCTAGCCGATTTCGA…
…TGGTACTAGATCGGCTAAAGCT…
Whole Genome vs Exome
Whole Genome
~2.8 Gb = ~95% coverage
…ACCATGATCTAGCCGATTTCGA…
…TGGTACTAGATCGGCTAAAGCT…
Exome Sequencing
~40 Mb = ~1.3% coverage
…ACCATGATCTAGCCGATTTCGA…
…TGGTACTAGATCGGCTAAAGCT…
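The two percentages can be checked against each other. The sketch below reads "coverage" as the fraction of the genome targeted (an interpretation, not something the slides spell out) and works backwards to the total genome size each figure implies. The function name is made up for illustration.

```python
def implied_genome_size_bp(target_bp, fraction_of_genome):
    """Total genome size implied by a target size and the fraction it covers."""
    return target_bp / fraction_of_genome


# Figures from the slides: ~2.8 Gb at ~95%, ~40 Mb at ~1.3%.
wgs_implied = implied_genome_size_bp(2.8e9, 0.95)
exome_implied = implied_genome_size_bp(40e6, 0.013)

print(f"whole genome figure implies {wgs_implied / 1e9:.2f} Gb")
print(f"exome figure implies {exome_implied / 1e9:.2f} Gb")
```

Both land close to 3 Gb, so the two slides are consistent with each other given the rounding in the quoted numbers.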
Whole Genome vs Exome
WGS Exome
Price ✓
Coverage ✓
Uniformity ✓
Analysis ✓
HiSeq X Ten Dataset
NA12878D and NA12878J – Coriell Cell Repository
Illumina TruSeq Nano, 2x150 bp, 350 bp insert
>120 Gb, 87% >Q30
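For readers new to these metrics, the sketch below shows what the two numbers mean. A Phred quality score Q corresponds to an error probability of 10^(-Q/10), so Q30 is one expected error per 1000 base calls. The depth calculation assumes a genome size of about 3 Gb and that the 120 Gb belongs to a single sample. Neither is stated on the slide, and the function names are illustrative.

```python
def phred_to_error_probability(q):
    """Convert a Phred quality score to the probability of a wrong base call."""
    return 10 ** (-q / 10)


def mean_depth(total_bases, genome_size_bp):
    """Average number of times each genome position is sequenced."""
    return total_bases / genome_size_bp


ASSUMED_GENOME_SIZE_BP = 3.0e9  # assumption, not from the slide

q30_error = phred_to_error_probability(30)
depth = mean_depth(120e9, ASSUMED_GENOME_SIZE_BP)

print(f"Q30 error probability: {q30_error:.4f}")
print(f"mean depth at 120 Gb: about {depth:.0f}x")
```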
Analyzing the Data
Primary
• Base calling
• QC
Secondary
• Assembly
• Alignment
Tertiary
• Annotations
• Visualization
• Statistics
Reporting
• Research
• Clinical
IT Infrastructure/Data Management
Analyzing the Data
fastq file:
@EAS54_6_R1_2_1_413_324
CCCTTCTTGTCTTCAGCGTTTCTCC
+
;;3;;;;;;;;;;;;7;;;;;;;88
@EAS54_6_R1_2_1_540_792
TTGGCAGGCCAAGGCCGATGGATCA
+
;;;;;;;;;;;7;;;;;-;;;3;83
@EAS54_6_R1_2_1_443_348
GTTGCTTCTGGCGTGGGTGGGGGGG
+EAS54_6_R1_2_1_443_348
;;;;;;;;;;;9;7;;.7;393333
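A FASTQ file stores each read as four lines: a header starting with "@", the bases, a separator starting with "+", and one quality character per base. The sketch below is a minimal reader for that layout, not a replacement for an established library. It assumes unwrapped four-line records and Phred+33 quality encoding. The record in the demo is a made-up eight-base read that uses the same quality characters as the slide.

```python
from typing import Iterable, Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from four-line FASTQ text (no line wrapping assumed)."""
    it = iter(line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue  # skip blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header, got: {header!r}")
        sequence = next(it, None)
        separator = next(it, None)
        quality = next(it, None)
        if quality is None:
            raise ValueError(f"truncated record: {header!r}")
        if not separator.startswith("+"):
            raise ValueError(f"expected '+' separator, got: {separator!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch: {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode quality characters to Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality]


# Hypothetical record for illustration only.
EXAMPLE = "@example_read\nACGTACGT\n+\n;;;;7;88\n"

records = list(parse_fastq(EXAMPLE.splitlines()))
scores = phred_scores(records[0].quality)
print(records[0].name, records[0].sequence, scores)
```

Under the Phred+33 assumption, the ";" character that dominates the slide's quality strings decodes to a score of 26.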
Data Analysis & Interpretation
Medical report:
Example from knomeDISCOVERY
Analyzing the Data
Long Reads: PacBio
~2kb ~10kb
Long Reads: Moleculo
Moleculo TruSeq Synthetic Long Reads
10kb ‘synthetic’ reads
Long Reads: Oxford Nanopore
Single Cell/Cell-Free DNA Sequencing
Moving Beyond the Genome
Credits: Darryl Leja (NHGRI), Ian Dunham (EBI)
Topic: Researchers vs. clinical.
Trends: Transition to the Clinic
Researchers: increased output, lower cost, rapid updates
Clinicians: ease of use, quick TAT, stability
Approval trend: Transition to the Clinic
MiSeq Dx
– FDA clearance Nov 2013
– Illumina will also submit the HiSeq 2500 and an NIPT assay
PGM
– Listed with FDA Sept 2014
Opportunities and challenges
What is great
– We are getting there…
– It is getting faster, better, and cheaper
– More and more people are starting to understand
What is not so great
– We are not there yet
– We are not even as far as many people think we are
– Lack of standards (especially for the clinical market)
First: The bad part
Technical error sources:
– Sampling
– Sequencing
– Bioinformatics
– Interpretation
Lack of standards…
Then: The good part
Large steps in the right direction on all fronts. Is it only a matter of time now…
The new genomics technologies are slowly becoming ready for the clinic!
We are collectively making the world a better place!
www.allseq.com