PR7000-0663
What You’ll Learn Today
Variant Call Format
• Components
• Format types
Recommendations for NGS Interpretation
Publicly available Databases
Somatic versus Germline Interpretation: Tools / Differences / Similarities
Automation
Reporting
Meta-information Lines of a VCF
https://docs.gdc.cancer.gov/Data/File_Formats/VCF_Format/#vcf-file-structure
gdcWorkflow: information on GDC pipelines that were used to generate the VCF file. GDC annotated VCF files contain two gdcWorkflow lines, one representing the variant calling process and the other representing the variant annotation process.
INDIVIDUAL: information about the study participant, including:
• NAME: Submitter ID (barcode) associated with the participant, and
• ID: GDC case UUID
SAMPLE: sample information, including:
• ID: NORMAL or TUMOR
• NAME: Submitter ID (barcode) of the aliquot
• ALIQUOT_ID: GDC aliquot UUID
• BAM_ID: BAM file UUID
INFO: format of additional information fields
• NOTE: GDC Annotated VCFs may contain multiple INFO lines. The last INFO line contains information about annotation fields generated by the Somatic Annotation Workflow (see GDC INFO Fields below).
FILTER: description of filters that have been applied to the data
FORMAT: description of genotype fields
reference: the reference genome used to generate the VCF file
contig: contigs included in the VCF files
• NOTE: Annotated VCFs include contig information for autosomes, sex chromosomes, and mitochondrial DNA. Unplaced, unlocalized, human decoy, and viral genome sequences are not included.
VEP: the VEP command used by the Somatic Annotation Workflow to generate the annotated VCF file.
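As a minimal sketch of how the meta-information lines above can be read programmatically, the following Python splits a `##key=value` line into its key and either a plain string or a dictionary of sub-fields. The function name `parse_meta_line` and the example lines are illustrative assumptions, not taken from a real GDC file.

```python
import re

def parse_meta_line(line):
    """Split one '##key=value' meta-information line into (key, value).

    Structured values such as <ID=DP,Number=1,...> become a dict;
    plain values (for example the file format) stay as strings.
    """
    if not line.startswith("##"):
        raise ValueError("not a meta-information line")
    key, _, value = line[2:].rstrip("\r\n").partition("=")
    if value.startswith("<") and value.endswith(">"):
        # Match sub-field pairs; quoted values may contain commas.
        pairs = re.findall(r'(\w+)=("[^"]*"|[^,]*)', value[1:-1])
        return key, {k: v.strip('"') for k, v in pairs}
    return key, value

# Illustrative lines only (hypothetical identifiers, not real GDC values).
example_lines = [
    '##fileformat=VCFv4.1',
    '##SAMPLE=<ID=TUMOR,NAME=hypothetical-barcode,ALIQUOT_ID=hypothetical-uuid>',
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth, all samples">',
]
for example in example_lines:
    print(parse_meta_line(example))
```

Keeping the sub-fields in a dictionary makes it straightforward to look up, for instance, which SAMPLE line is the TUMOR aliquot.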
Column Header Line in VCF
CHROM: chromosome
POS: position
ID: identifier
REF: reference base(s)
ALT: alternate base(s)
QUAL: quality
FILTER: filter status
INFO: additional information
FORMAT: format of sample genotype data
NORMAL: normal sample genotype data
TUMOR: tumor sample genotype data
See Variant Call Format (VCF) Version 4.1 Specification for details.
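The column header line can be checked and split in a few lines of Python. This is a sketch under the assumption that the header is tab-delimited and begins with `#CHROM`, as in the VCF specification; the function name `parse_header_line` is hypothetical.

```python
FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]

def parse_header_line(line):
    """Return (all column names, sample names) from the '#CHROM ...' line."""
    if not line.startswith("#CHROM"):
        raise ValueError("not a VCF column header line")
    columns = line.lstrip("#").rstrip("\r\n").split("\t")
    if columns[:8] != FIXED_COLUMNS:
        raise ValueError("unexpected fixed columns: %r" % columns[:8])
    # Column 9 (index 8) is FORMAT when genotype data is present;
    # everything after it is one column per sample.
    samples = columns[9:]
    return columns, samples

header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNORMAL\tTUMOR"
columns, samples = parse_header_line(header)
print(samples)
```

For a header like the one listed above, the sample names recovered are NORMAL and TUMOR.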
Data in VCF
The data lines record the called variants. Each line is tab-delimited and describes one variant called by the upstream pipeline.
Example of a variant call line in the VCF:
chr22 17264565 . G T 255.0 Pass DP=132;DP4=41,39,28,24;STDP4=41,39,28,24;AF1=0.39393938;AN=2;MQ=41 GT:PL:GQ 1/0:0,255,255:100
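To make the structure of the example line concrete, here is a minimal Python sketch that splits one data line into the fixed columns, the INFO key/value pairs, and the per-sample genotype fields. The function names and the placeholder sample name `SAMPLE` are assumptions, since the example does not show its header line.

```python
def parse_info(info):
    """Turn 'DP=132;AN=2;SOMEFLAG' into {'DP': '132', 'AN': '2', 'SOMEFLAG': True}."""
    parsed = {}
    if info == ".":
        return parsed
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True  # keys without '=' are flags
    return parsed

def parse_record(line, sample_names=("SAMPLE",)):
    """Parse one tab-delimited VCF data line into a dictionary."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line needs at least 8 columns")
    record = {
        "CHROM": fields[0],
        "POS": int(fields[1]),
        "ID": fields[2],
        "REF": fields[3],
        "ALT": fields[4].split(","),
        "QUAL": None if fields[5] == "." else float(fields[5]),
        "FILTER": fields[6],
        "INFO": parse_info(fields[7]),
        "samples": {},
    }
    if len(fields) > 9:
        keys = fields[8].split(":")  # FORMAT column, e.g. GT:PL:GQ
        for name, column in zip(sample_names, fields[9:]):
            record["samples"][name] = dict(zip(keys, column.split(":")))
    return record

# The example line from the slide, rebuilt with explicit tabs.
line = "\t".join([
    "chr22", "17264565", ".", "G", "T", "255.0", "Pass",
    "DP=132;DP4=41,39,28,24;STDP4=41,39,28,24;AF1=0.39393938;AN=2;MQ=41",
    "GT:PL:GQ", "1/0:0,255,255:100",
])
record = parse_record(line)
print(record["POS"], record["INFO"]["DP"], record["samples"]["SAMPLE"]["GT"])
```

All INFO and genotype values are kept as strings here; converting them to numbers would normally be driven by the Type declared in the matching INFO or FORMAT meta-information line.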
Commonly Used Publicly Available Databases
1000 Genomes - >1000 participants to catalog human genetic variation
ClinVar – medically important variants and phenotypes
COSMIC – Catalogue of Somatic Mutations in Cancer
dbNSFP – based on Ensembl, for functional prediction and annotation of all potential non-synonymous single-nucleotide variants (nsSNVs) in the human genome
dbSNP - database of genetic variation within and across different species, developed and hosted by NCBI in collaboration with NHGRI
ESP6500 - genes and mechanisms contributing to heart, lung and blood disorders
ExAC - >60,000 unrelated individuals sequenced as part of various disease-specific and population genetic studies
HGMD - known (published) gene lesions responsible for human inherited disease
OMIM – comprehensive collection of human genes and genetic phenotypes
RefSeq - annotated and curated collection of publicly available nucleotide sequences (DNA, RNA) and their protein products.
HPO - standardized vocabulary of phenotypic abnormalities encountered in human disease
NCBI GenBank - NIH genetic sequence database
How Databases Can Be Utilized
Population Frequencies
Subpopulation / Cohort Studies
Genotype/Phenotype Correlation
Known Diseases
Known Mutations
Functional Effect Predictions
Inheritance Mode of Disease
Other
Some Analysis Methods/ Tools / Hypotheses
Population Frequencies
Assessing Inheritance Modes using Mendelian Law
Exonic / Protein Investigation
Quality of called variant
Zygosity
Targeted Panels
Splice Sites
Compound Variation
Public database and Lab’s Own Curation
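Two of the checks listed above, zygosity and call quality, can be sketched directly from the genotype and INFO fields. The code below assumes a diploid call and assumes that DP4 follows the samtools/bcftools convention (reference-forward, reference-reverse, alternate-forward, alternate-reverse read counts); both function names are hypothetical.

```python
import re

def zygosity(gt):
    """Classify a diploid GT string such as '0/1', '1|1' or './.'."""
    alleles = re.split(r"[/|]", gt)
    if "." in alleles:
        return "unknown"
    if len(set(alleles)) > 1:
        return "heterozygous"
    return "homozygous reference" if alleles[0] == "0" else "homozygous alternate"

def alt_fraction_from_dp4(dp4):
    """Fraction of high-quality reads supporting the alternate allele.

    Assumes DP4 = ref-forward, ref-reverse, alt-forward, alt-reverse.
    Returns None when there are no reads to count.
    """
    ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in dp4.split(","))
    total = ref_fwd + ref_rev + alt_fwd + alt_rev
    return (alt_fwd + alt_rev) / total if total else None

print(zygosity("1/0"))
print(alt_fraction_from_dp4("41,39,28,24"))
```

Under that DP4 assumption, 52 of 132 reads support the alternate allele in the example variant, which agrees with its AF1 value of about 0.39.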
How is the Information Processed/Saved?
VCF file > Bioinformatics / Excel Files
Tens to thousands of lines of information per patient!
Questions/methods/tools applied to each line:
• chr22 17264565 . G T 255.0 Pass DP=132;DP4=41,39,28,24;STDP4=41,39,28,24;AF1=0.39393938;AN=2;MQ=41 GT:PL:GQ 1/0:0,255,255:100
- What is the population frequency for this?
- Is there a protein change?
- Was this inherited?
- Does either parent display phenotype?
- What is the gene relationship?
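The question "Was this inherited?" can be approached with a simple trio check when parental genotypes are available. The sketch below assumes autosomal, diploid genotypes for child, mother, and father, and ignores sex chromosomes and genotyping error; the function names are hypothetical.

```python
import re

def split_alleles(gt):
    """Split a GT string such as '0/1' or '1|0' into its allele codes."""
    return re.split(r"[/|]", gt)

def is_mendelian_consistent(child, mother, father):
    """True if the child could have received one allele from each parent.

    Returns None when any genotype is missing or the child is not diploid.
    """
    c = split_alleles(child)
    m = set(split_alleles(mother))
    f = set(split_alleles(father))
    if len(c) != 2 or "." in c or "." in m or "." in f:
        return None
    a, b = c
    return (a in m and b in f) or (b in m and a in f)

def is_candidate_de_novo(child, mother, father, alt="1"):
    """True if the child carries the alternate allele and neither parent does."""
    return (alt in split_alleles(child)
            and alt not in split_alleles(mother)
            and alt not in split_alleles(father))

print(is_mendelian_consistent("1/0", "0/0", "0/1"))
print(is_candidate_de_novo("0/1", "0/0", "0/0"))
```

A variant flagged as a de novo candidate this way still needs review, since low coverage in a parent can hide an inherited allele.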
Issues with Manual Analysis
Saving Information – Big Data!
Relying on Interpretation Method (reproducible?)
Audit Trails
Bioinformatics – not always available
No specific standard – only recommendations
Data increasing daily! Storage!
How can we move from a suspected cancer to a report including diagnoses, clinical trials, and therapeutic options?
Annotation sources and versions tag each analysis for your reference and audit trails – what was available at that time in the resources?
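One generic way to think about tagging an analysis with its annotation sources is to store the source versions alongside the filter settings and fingerprint the combination. The Python below is only an illustration of that idea, not a description of how any particular software implements it; the function name, field names, and version strings are all hypothetical placeholders.

```python
import hashlib
import json
from datetime import datetime, timezone

def build_audit_record(analysis_id, sources, filters):
    """Bundle source versions and filter settings with a reproducible fingerprint."""
    payload = {
        "analysis_id": analysis_id,
        "annotation_sources": dict(sorted(sources.items())),
        "filters": list(filters),
    }
    # The fingerprint covers only the payload, so the same inputs
    # give the same digest regardless of when the record is created.
    digest = hashlib.sha256(
        json.dumps(payload, sort_keys=True).encode("utf-8")
    ).hexdigest()
    record = dict(payload)
    record["created_utc"] = datetime.now(timezone.utc).isoformat()
    record["fingerprint"] = digest
    return record

# Placeholder version labels, not real release identifiers.
sources = {"ClinVar": "hypothetical-release-A", "COSMIC": "hypothetical-release-B"}
record = build_audit_record("case-001", sources, ["QUAL >= 30"])
print(record["fingerprint"])
```

Re-running the same case against a newer release of a source produces a different fingerprint, which shows at a glance that the underlying annotations changed.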
Easy-to-use drag-and-drop method to build your SOP hypothesis
Underlying annotation sources are embedded in the software
Select criteria for your patient’s variant filtration with as few or as many filter patterns as you see fit
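Conceptually, a filtration step of this kind is a chain of criteria applied in order, with the surviving count tracked after each one. The following sketch shows that pattern in plain Python. The field names, thresholds, and toy variants are illustrative assumptions and do not come from real patient data or from the software itself.

```python
def apply_filters(variants, filters):
    """Apply named predicates in order; return survivors and per-step counts."""
    log = [("input", len(variants))]
    for name, keep in filters:
        variants = [v for v in variants if keep(v)]
        log.append((name, len(variants)))
    return variants, log

# Hypothetical criteria; real thresholds belong in the lab's SOP.
filters = [
    ("passed caller filters", lambda v: v["FILTER"].upper() == "PASS"),
    ("QUAL >= 30", lambda v: v["QUAL"] >= 30),
    ("population AF < 1%", lambda v: v.get("pop_af", 0.0) < 0.01),
]

# Toy variants made up for illustration.
variants = [
    {"id": "var1", "FILTER": "PASS", "QUAL": 255.0, "pop_af": 0.0005},
    {"id": "var2", "FILTER": "PASS", "QUAL": 12.0, "pop_af": 0.0},
    {"id": "var3", "FILTER": "LowQual", "QUAL": 80.0, "pop_af": 0.0},
    {"id": "var4", "FILTER": "PASS", "QUAL": 99.0, "pop_af": 0.25},
]

kept, log = apply_filters(variants, filters)
for step, count in log:
    print(step, count)
```

Logging the count after every step makes it easy to see which criterion removed the most variants and to reproduce the triage later.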
Filter through ClinVar and the Cancer Gene Census accessing specific clinical significances
Filter Using Clinical Interpretations of Variants in Cancer (CIViC) for Precision Medicine Applications
Your list of 150 variants has now been reduced to 3 for final review
150 variants → 3 variants
Moving from the triage tab to the review tab, your pathologist can view all annotations, labels, and more on one screen
Your historical curated information is also readily accessible – is this related to the disease? Is it actionable?
Categorize your variant with Clinical Relevance = Uncertain, Low, Moderate, High and Very High
If you haven’t filtered against this in your SOP tree, CIViC annotations are available in final review
Report your actionable variants with your final report
Save time by sending out these templated reports from automated analyses!
Reduce errors with these auto-populated report templates!