ASHG 2015 Genome in a bottle
Genome in a Bottle: You’ve sequenced. How well did you do?
October 9, 2015
Justin Zook, Marc Salit, and the Genome in a Bottle Consortium
*Nothing to Disclose
Sequencing technologies and bioinformatics pipelines disagree
O’Rawe et al. Genome Medicine 2013, 5:28
Who is right?
Is anyone right?
Genome in a Bottle Consortium (GIAB)
Hosted by the US National Institute of Standards and Technology
Goal: Provide infrastructure to assess confidence in human variant calls
• Appropriately consented, widely available DNA samples, distributed by the Coriell Institute
– Also, QCed Reference Material (RM) versions from controlled lots will be available from NIST
– Also, PGP samples are commercially available
• High-accuracy reference data for these samples
• Tools to facilitate their use
– With the Global Alliance Data Working Group Benchmarking Team
Global Alliance for Genomics and Health: ga4gh.org
Genome in a Bottle: genomeinabottle.org
GIAB Selected Samples
• CEPH/Utah Pedigree 1463
– Grandparents: NA12889, NA12890, NA12891, NA12892
– Parents: NA12877, NA12878 (NA12878 available as NIST RM 8398)
– Children: NA12879, NA12880, NA12881, NA12882, NA12883, NA12884, NA12885, NA12886, NA12887, NA12888, NA12893
• Ashkenazi Jewish Trio (new, Personal Genome Project)
– Parents: NA24149, NA24143
– Child: NA24385
• Asian (Han Chinese) Trio (new, Personal Genome Project)
– Parents: NA24694, NA24695
– Child: NA24631
Note: Illumina and RTG have used data from the pedigree to improve variant calls in the specific GIAB samples.
NGS Validation Process using Genomes in Bottles
Workflow: Sample → gDNA isolation → Library Prep → Sequencing → Alignment/Mapping → Variant Calling → Confidence Estimates → Downstream Analysis
Figure labels: Pre-Analytical Process; Analytical Process; Genome in a Bottle Scope; Clinical Interpretation
GIAB Data
Pilot Genome: NA12878
Integrated 14 datasets from 5 platforms to establish Reference SNP/indel Calls for NA12878
Zook et al., Nature Biotechnology, 2014.
• ~77 % High-confidence
• ~23 % Uncertain
Uses of GIAB NA12878
Oncology – Molecular and Cellular Tumor Markers: “Next Generation” Sequencing (NGS) guidelines for somatic genetic variant detection
www.bioplanet.com/gcat
GeT-RM Browser from NCBI and CDC
• http://www.ncbi.nlm.nih.gov/variation/tools/get-rm/
Global Alliance for Genomics and Health Benchmarking Task Team
• Developed standardized definitions for performance metrics like TP, FP, and FN.
• Developing sophisticated benchmarking tools
– vcfeval (Len Trigg)
– hap.py (Peter Krusche)
– vgraph (Kevin Jacobs)
• Standardized bed files with difficult genome contexts for stratification
Credit: GA4GH, Abby Beeler, Ellie Wood
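As a rough illustration of how TP, FP, and FN counts turn into summary metrics, here is a minimal sketch. It covers only the arithmetic; the standardized definitions themselves (for example, how a call is matched to a benchmark variant) are not spelled out on the slide, and the function name and example counts are hypothetical.

```python
def benchmark_metrics(tp: int, fp: int, fn: int) -> dict:
    """Summarise a comparison against benchmark calls.

    tp: calls that match the benchmark
    fp: calls absent from the benchmark (within the assessed regions)
    fn: benchmark variants that were not called
    """
    nan = float("nan")
    precision = tp / (tp + fp) if (tp + fp) else nan
    recall = tp / (tp + fn) if (tp + fn) else nan  # also called sensitivity
    # F1 is the harmonic mean of precision and recall; it stays NaN when
    # either input is undefined or when both are zero.
    if precision + recall > 0:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = nan
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(benchmark_metrics(tp=90, fp=10, fn=10))
```

Tools such as vcfeval and hap.py report metrics of this kind; the sketch is not a substitute for them, since matching variant representations is the hard part.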
Stratification of FP Rates
Higher FP rates at Tandem Repeats
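One simple way to stratify FP rates by genome context is to count FPs that fall inside a stratification bed file and normalise by the bases that file covers. The sketch below assumes a single chromosome, BED-style half-open intervals that are sorted and non-overlapping, and 0-based FP positions; the function name and the toy intervals are hypothetical.

```python
from bisect import bisect_right


def fp_per_mb(fp_positions, intervals):
    """False positives per megabase inside a set of intervals.

    intervals: sorted, non-overlapping (start, end) pairs, half-open
               like BED records, all on one chromosome.
    fp_positions: 0-based positions of false-positive calls.
    """
    starts = [start for start, _ in intervals]
    total_bp = sum(end - start for start, end in intervals)
    if total_bp == 0:
        return float("nan")
    hits = 0
    for pos in fp_positions:
        # Index of the last interval whose start is <= pos.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < intervals[i][1]:
            hits += 1
    return hits / total_bp * 1e6


if __name__ == "__main__":
    # Toy stratum covering 200 bp, with five made-up FP positions.
    tandem_repeats = [(100, 200), (500, 600)]
    print(fp_per_mb([150, 199, 200, 550, 700], tandem_repeats))
```

Running the same function on each stratum (and on the complement of all strata) gives rates that can be compared directly, which is the comparison the slide summarises for tandem repeats.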
New GIAB Trios from Personal Genome Project
Public, unembargoed data from GIAB AJ PGP Trio
Long reads/“Linked” reads
• ~70/30/30x PacBio
– ~11kb N50
• ~100x BioNano
• ~30x 10X Genomics
• ~20x Moleculo
• Complete Genomics LFR
• ~0.005x Oxford Nanopore
Short reads
• 300x Illumina paired-end
• 15x Illumina 6kb mate-pair
• 100x Complete Genomics
• 60x SOLiD 5500W
• 1000x Ion Proton Exome
http://biorxiv.org/content/early/2015/09/15/026468
GIAB Analysis Group – New Data Sets
Leaders
• Francisco de la Vega
• Chris Mason
• Tina Graves
• Valerie Schneider
• Justin Zook
• Marc Salit
Status
• Analysis Group Responsibilities:
– https://docs.google.com/document/d/10eA0DwB4iYTSFM_LPO9_2LyyN2xEqH49OXHhtNH1uzw/edit?usp=sharing
• Analysis Milestones:
– https://docs.google.com/spreadsheets/d/1Pj4nSzH742g40wJz2fA6f8kFtZYAToZpSZYVPiC5st4/edit?usp=sharing
• Analysis Methods:
– https://docs.google.com/spreadsheets/d/1Je2g85H7oK6kMXbBOoqQ1FMNrvGnFuUJTJn7deyYiS8/edit?usp=sharing
• Analysis Plan:
– https://drive.google.com/file/d/0B7Ao1qqJJDHQdnVEaVdqbWdEdkE/view?usp=sharing
• Collecting Data and analyses on GIAB FTP Site
• Recruiting people to help with the work.
Goal: Establish and distribute a set of authoritative benchmark variant calls of all types and sizes, as well as homozygous reference regions, on GIAB PGP trios
Analysis Progress: AJ Trio
• SNPs/indels
– NIST working on integration
– 10X/Moleculo/PacBio for difficult-to-map regions
• Assembly
– 2 de novo assemblies
– Useful for SV calling
• Structural variants
– Candidate calls being generated by 15+ groups with >20 different algorithms and 6 datasets
– 3+ integration methods
• Long-range Phasing
– 2 phased calls so far (CG LFR and 10X)
– Integration methods needed
• Other analyses
– CpG methylation with PacBio and Illumina
GIAB AJ Trio PacBio-only Assemblies
Input        Algorithm                       # of Contigs   N50       Max       Total
Child        MHAP/Celera (Phillippy Lab)     13,048         4.5 Mb    35.1 Mb   3.0 Gb
Child        Daligner/Falcon (Chin/Bashir)   9,973          7.1 Mb    39.2 Mb   3.0 Gb
Mother       MHAP/Celera (Phillippy Lab)     23,493         1.03 Mb   8.9 Mb    3.0 Gb
Father       MHAP/Celera (Phillippy Lab)     16,326         0.91 Mb   9.8 Mb    3.0 Gb
Merged Trio  Daligner/Falcon (Chin/Bashir)   5,680          9.25 Mb   50.3 Mb   2.9 Gb
Credits: Ali Bashir, Jason Chin, Adam Phillippy, and Serge Koren
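For readers less familiar with the N50 column, the sketch below shows the usual definition: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembly size. The function name and the toy lengths are hypothetical, and the assemblers listed above may report the statistic with slightly different conventions.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to at least half
        # of the total assembly length defines the N50.
        if running >= half_total:
            return length
    raise ValueError("n50() needs at least one contig length")


if __name__ == "__main__":
    # Toy lengths, not taken from any GIAB assembly.
    print(n50([2, 3, 4, 5, 6]))
```

A higher N50 at a similar total size, as in the merged-trio row, indicates that more of the assembly sits in long contigs.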
GIAB AJ Trio Hybrid PacBio/BioNano Assembly
Hybrid (PacBio with BioNano)
Input Assembly (Notes)                          # of Scaffolds   N50       Max       Total
HG002 Falcon                                    248              22.7 Mb   92.8 Mb   2.38 Gb
Trio Falcon                                     210              29.3 Mb   87.6 Mb   2.32 Gb
Two Step Trio (celera (child) + falcon (trio))  187              34.3 Mb   98.0 Mb   2.6 Gb
Credits: Ali Bashir, Jason Chin, Alex Hastie
Pendleton et al., Nature Methods, 2015
Proposed approach to form high-confidence SV (and non-SV) calls
• Generate candidate calls
• Compare/evaluate calls using Parliament/MetaSV/svclassify/others?; manual inspection
• Integrate new and revised calls; manual inspection
• Combine integrated calls; manual inspection; targeted experimental validation?
Timeline milestones: August 30, 2015; Nov 1, 2015; Jan 1, 2016; Jan 26, 2016 and beyond
Very Preliminary Confirmation of SVs
Integration results from AJ son
Parliament: BMC Genomics, 2015, 16:286 (performed by Andrew Carroll, DNAnexus)
MetaSV: Bioinformatics, 2015, 31:2741 (performed by Marghoob Mohiyuddin, Bina/Roche)
• Parliament
– Candidates from Illumina
– Confirmed by PacBio and/or Illumina
– ~50% in both technologies
– ~4.5k deletions, 1k insertions
– 85% of genotypes consistent within trio
• MetaSV
– Multiple types of evidence from Illumina
Venn diagram of the two call sets (50 % reciprocal overlap; some overlap within Parliament calls)
• MetaSV total: 2809, split into 569 (20 %) and 2240 (80 %)
• Parliament total: 5467, split into 977 (18 %) and 4490 (82 %)
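The 50 % reciprocal overlap criterion used to compare the two call sets can be sketched as follows. This is the common definition (the shared span must cover at least half of each call); the exact comparison procedure behind the numbers above is not given on the slide, and the function name and coordinates are hypothetical.

```python
def reciprocal_overlap(a_start, a_end, b_start, b_end, min_frac=0.5):
    """True if two half-open intervals on the same chromosome overlap by
    at least min_frac of the length of each interval."""
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return False  # disjoint or merely adjacent
    a_len = a_end - a_start
    b_len = b_end - b_start
    return overlap >= min_frac * a_len and overlap >= min_frac * b_len


if __name__ == "__main__":
    # Made-up deletion coordinates, for illustration only.
    print(reciprocal_overlap(100, 200, 150, 250))  # 50 bp shared, both 100 bp long
    print(reciprocal_overlap(100, 200, 150, 400))  # 50 bp shared, second is 250 bp long
```

Because the test is pairwise, one call in one set can match several calls in the other, which is one reason matched counts need not be equal between the two sets.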
New GIAB GitHub Site
github.com/genome-in-a-bottle
Credit: Chunlin Xiao, NCBI
WARNINGS
• Easiest to benchmark only within high-confidence bed file
• Benchmark calls/regions tend to be biased towards easier variants and regions
– Some clinical tests are enriched for difficult sites
• Always manually inspect a subset of FPs/FNs
• Stratification by variant type and region is important
• Always calculate confidence intervals
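As one possible way to follow the last warning, the sketch below computes a Wilson score interval for a proportion such as sensitivity (TP out of TP + FN). The slides do not name a specific method, so the choice of the Wilson interval, the function name, and the example counts are all assumptions.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion.

    z=1.96 corresponds to an approximately 95 % two-sided interval.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    )
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


if __name__ == "__main__":
    # Made-up example: 0 false negatives among 100 benchmark variants
    # still leaves real uncertainty about the true miss rate.
    print(wilson_interval(0, 100))
```

The interval matters most for small strata: observing zero errors in a few hundred difficult sites does not demonstrate a zero error rate.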
Acknowledgments
• FDA – Elizabeth Mansfield, Computing staff
• Many members of Genome in a Bottle
– New members welcome!
– Sign up on website for email newsletters
Steering Committee
– Marc Salit
– Justin Zook
– David Mittelman
– Andrew Grupe
– Michael Eberle
– Steve Sherry
– Deanna Church
– Francisco De La Vega
– Christian Olsen
– Monica Basehore
– Lisa Kalman
– Christopher Mason
– Elizabeth Mansfield
– Liz Kerrigan
– Leming Shi
– Melvin Limson
– Alexander Wait Zaranek
– Nils Homer
– Fiona Hyland
– Steve Lincoln
– Don Baldwin
– Robyn Temple-Smolkin
– Chunlin Xiao
– Kara Norman
– Luke Hickey
For More Information
www.genomeinabottle.org - sign up for general GIAB and Analysis Team google group emails
github.com/genome-in-a-bottle – Guide to GIAB data & ftp
www.slideshare.net/genomeinabottle
www.ncbi.nlm.nih.gov/variation/tools/get-rm/ - GeT-RM Browser
Data: http://biorxiv.org/content/early/2015/09/15/026468
Global Alliance Benchmarking Team – ga4gh.org/#/benchmarking-team
Public Meetings! Twice yearly workshop
– Winter: January 28-29, 2016 at Stanford University, California, USA
– Summer at NIST, Maryland, USA
Justin Zook: [email protected]
Marc Salit: [email protected]
Contribute calls or critically evaluate GIAB calls!
NIST/NRC Postdoc Opportunities available!