Workshop on Tools for DNA sequencing and resequencing in ...
Eshg sequencing workshop
Transcript of Eshg sequencing workshop
![Page 1: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/1.jpg)
Dr. Mike Evans — Chief Executive
A unique targeted sequencing service providing meaningful results, not insurmountable data
![Page 2: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/2.jpg)
Outline of presentation
• Delivering a unique next generation sequencing service — Dr Mike Evans, CEO
• Optimised bait design for targeted sequencing — Dr Jolyon Holdstock, Senior Computational Biologist
• Adding value through analysis — Dr Volker Brenner, Head of Computational Biology
• Summary
• Q&A
• Lunch
![Page 3: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/3.jpg)
OGT
• Provides advanced clinical genetics solutions
• Develops innovative molecular diagnostics
• Founded by Ed Southern in 1995
• 64 people
OGT Begbroke: Corporate offices and high-throughput labs
OGT Southern Centre: Biomarker discovery
![Page 4: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/4.jpg)
OGT’s key businesses
Technologies for Molecular Medicine
• IP Licensing: 40 licence relationships
• Clinical and Genomic Solutions: cytogenetics products and genomic services
• Diagnostic Biomarkers: genomic- and protein-based diagnostics
![Page 5: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/5.jpg)
Clinical and Genomic Solutions
Addressing the challenges of high-throughput, high-resolution molecular technologies:
• High equipment and staff training costs
• Short equipment lifespan
• Complex study design and processes (e.g. platform evaluation & selection)
• Vast amounts of data
• Extensive computing infrastructure
• Data analysis expertise and resource
The solution: Genefficiency Genomic Services
![Page 6: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/6.jpg)
High-quality data & complete reassurance
• Experimental and array design expertise
• High-throughput processing (>2000 samples / week)
• Applications: aCGH-CNV, methylation, miRNA, gene expression analysis
• Comprehensive data analysis services
• >40 QC checks on each sample to ensure high-quality data
Genefficiency™ — World’s Leading aCGH Service
![Page 7: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/7.jpg)
Independent Accreditations
• First Agilent High-Throughput Microarray Certified Service Provider
• ISO 9001:2008 — Quality management systems
• ISO 27001:2005 — Information security
• ISO 17025:2005 — aCGH Laboratory services
FS 561156
IS 561157
4593
![Page 8: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/8.jpg)
Customer Satisfaction…
20,000 samples. 1,000 samples / week
“In order to characterise genetic variants, reproducible performance and reliable processing of the high resolution microarrays is essential. We were pleased with OGT’s responsive approach and attention to producing high quality data to tight deadlines.”
Dr Matt Hurles, Wellcome Trust Sanger Institute
![Page 9: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/9.jpg)
OGT Collaborators and Customers
![Page 10: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/10.jpg)
A World-class Team
Our expert team deliver:
• Excellent project management and customer service
  • >600 projects to date
  • >50,000 samples
• Unparalleled expertise in study and probe design
• Advanced data analysis through a dedicated team of bioinformaticians
• Rapid turnaround times
• A wealth of experience of clinical and translational research projects
![Page 11: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/11.jpg)
Delivering Discovery
Genefficiency Targeted Sequencing Services — designed to be different:
• Comprehensive — taking you from genomic DNA to filtered, qualified results
• Rigorously designed — project and probe design expertise maximises your likelihood of discovery
• Expert support — experienced team of biologists and bioinformaticians
• Dedication to quality — from sample to result, delivering reliable results every time
![Page 12: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/12.jpg)
Delivering an Integrated, Comprehensive Service
1. Selection of most appropriate genomic regions for enrichment
2. Capture, sample multiplexing and sequencing
3. Data analysis and advanced filtering of variants
![Page 13: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/13.jpg)
Delivering Expert Project Design
Step 1: Selection of most appropriate genomic regions for your project and budget
Whole exome
Pre-designed, validated whole exome capture probes
Coding regions are “most likely” candidates for many disorders
Custom genomic regions
Expert custom design of capture probes for your regions of interest
Flexibility to focus on regions of clinical significance or GWAS regions
![Page 14: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/14.jpg)
Delivering Class-leading Technology
We have fully optimised the DNA capture and sequencing methodologies, so you don’t have to!
Step 2: Performing the capture, sample multiplexing, library preparation and sequencing
• Options for sample indexing and multiplexing to minimise sequencing cost
• Depth of sequencing coverage to suit your samples and project
• Paired-end sequencing on the industry-leading Illumina HiSeq 2000
![Page 15: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/15.jpg)
OGT Delivers Discovery, not just Data
Step 3: Data analysis and advanced filtering of variants
•OGT’s dedicated analysis pipeline brings you beyond data, to a filtered list of variants relevant to your study
SEQUENCE FILTER DISCOVER
![Page 16: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/16.jpg)
OGT Genefficiency Targeted Sequencing Services
The PLATFORM
• Core sequencing platform: Illumina HiSeq 2000
• Core sequence capture technology: Agilent SureSelect
The PEOPLE
• Team of highly skilled molecular biologists and bioinformaticians
• Core expertise in probe design
• Successful development of advanced analysis solutions
![Page 17: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/17.jpg)
![Page 18: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/18.jpg)
Agenda
• Important Definitions and Terminologies
• Introduction to Targeted Enrichment
• Custom Bait Design
![Page 19: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/19.jpg)
Definitions and Terminologies
• Read length – The number of bases sequenced in a fragment
• Capture efficiency
• Paired end sequencing
• Read depth - How many times has a base been sequenced?
[Figure: a region of interest with reads labelled on target and off target]
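To make the terms above concrete, here is a minimal sketch that computes per-base read depth and the on-target fraction for one region of interest. The half-open `(start, end)` coordinates, the function names and the rule that a read counts as on target when it overlaps the region at all are assumptions made for illustration; they are not taken from the slides.

```python
from collections import Counter


def per_base_depth(reads, roi):
    """Return the read depth at each base of the region of interest.

    reads: list of (start, end) intervals, half-open (assumed convention).
    roi:   (start, end) interval, half-open.
    """
    roi_start, roi_end = roi
    depth = Counter()
    for start, end in reads:
        # Only count the part of the read that falls inside the region.
        for pos in range(max(start, roi_start), min(end, roi_end)):
            depth[pos] += 1
    return [depth[pos] for pos in range(roi_start, roi_end)]


def on_target_fraction(reads, roi):
    """Fraction of reads overlapping the region by at least one base."""
    roi_start, roi_end = roi
    if not reads:
        return 0.0
    on_target = sum(1 for start, end in reads if start < roi_end and end > roi_start)
    return on_target / len(reads)


# Hypothetical toy data: four short reads and one region of interest.
reads = [(0, 5), (3, 8), (6, 11), (20, 25)]
roi = (4, 10)
depths = per_base_depth(reads, roi)
fraction = on_target_fraction(reads, roi)
```

Real pipelines derive these numbers from the alignment file rather than from hand-written intervals; the sketch only shows what is being counted.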
![Page 20: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/20.jpg)
![Page 21: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/21.jpg)
Read Depth Will Vary Across a Region of Interest
*Sequence Depth >20x:
• ~82% for Single End
• ~90% for Paired End
How many times has a base been sequenced?
*Agilent. 5990-4928EN
![Page 22: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/22.jpg)
Read Depth Required for Mutation Detection
Assuming no allelic bias, the theoretical read depth required to detect heterozygous variation with a given accuracy can be calculated using a binomial distribution.
• Minimum capacity required = region of interest (ROI) × required depth
• Q30 variant detection for a 15 kb ROI requires 210 kb of sequencing capacity
Calculations are based on the variant being seen in at least 2 reads:
• Should not be just one read, as this could be ‘noise’
• Required observations could be a percentage of reads

| Depth Required | Het. Call Accuracy | Probability of Error | Quality |
|---|---|---|---|
| 11 | 99% | 1:100 | Q20 |
| 14 | 99.9% | 1:1000 | Q30 |
| 18 | 99.99% | 1:10000 | Q40 |
| 25 | 99.999% | 1:100000 | Q50 |
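The binomial reasoning behind the table can be sketched as follows. The model is an assumption: each read samples the variant allele with probability 1/2 (no allelic bias), and a heterozygous call is missed when fewer than 2 reads carry the variant. Under that model the sketch reproduces the depths of 11, 14 and 18 for Q20 to Q40 but yields 22 rather than 25 for Q50, so the slide's exact calculation is probably more conservative or uses a different criterion. Function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def miss_probability(depth, min_reads=2):
    """P(variant allele seen in fewer than min_reads of `depth` reads), p = 1/2."""
    return sum(Fraction(comb(depth, k), 2 ** depth) for k in range(min_reads))


def required_depth(max_error, min_reads=2):
    """Smallest depth whose miss probability does not exceed max_error."""
    depth = min_reads
    while miss_probability(depth, min_reads) > max_error:
        depth += 1
    return depth


def required_capacity(roi_bases, depth):
    """Minimum sequencing capacity in bases: ROI size times required depth."""
    return roi_bases * depth


q30_depth = required_depth(Fraction(1, 1000))
q30_capacity = required_capacity(15_000, q30_depth)  # 15 kb ROI from the slide
```

Exact fractions are used instead of floats so that the comparison against the error threshold is not affected by rounding.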
![Page 23: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/23.jpg)
![Page 24: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/24.jpg)
Why use Targeted Enrichment?
Flexibility in choice of genomic loci
• Allows capture of specific regions of interest for SNP and Indel detection
Cost Effectiveness
• Ideal for clinical applications
• Specific candidate genes are targeted
• Fine mapping post-GWAS
• Cost Benefits
• Enables multiplexing to fill capacity
Streamlined Data Analysis
• Reduced noise due to targeted specificity
![Page 25: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/25.jpg)
Targeted Approaches Introduce Bias
There are significant imbalances in the sequence coverage achieved, particularly with targeted approaches
E.g. Agilent SureSelect*
• 3.3 Mb ROI
• 10M reads
• ~80% of targeted bases covered at ≥ 20x depth
• < 4% of targeted bases missed
*Ernani F. and LeProust E, Agilent. 5990-3532EN
![Page 26: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/26.jpg)
Example of Design Bias - Insufficient Coverage
Targeted gene sequencing can leave some targets without the required depth of coverage.
[Figure*: coverage plot marking the 14x (Q30) depth level and targets with inadequate coverage]
*Data kindly provided by C. Mattocks, National Genetics Reference Lab, Salisbury, UK
![Page 27: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/27.jpg)
Solution: Intelligent Design to Improve Coverage
Option 1:
• Increase coverage by increasing depth of sequencing
• Coverage of all targets proportionally increased
• Increased cost of sequencing
• Some bases still missed
Option 2:
• Intelligent design of capture probes increases under-represented loci
• More even coverage of entire region, no loci missed (more likely to find mutations present)
• No need to increase sequence depth overall (more cost effective)
[Figure: coverage plots against the Q30 depth level]
![Page 28: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/28.jpg)
![Page 29: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/29.jpg)
Problems Facing Users
• Design tools not user friendly
• Design tools only good for draft design
• Potential sources of bias
• Regions of interest too short
• Bait thermodynamic behaviour
• GC content
• Melting Temperature
• Risk of Design Errors
• OGT’s extensive experience in designing probes for microarrays allows us to minimise bias and ensure evenness of coverage giving the best chance to identify mutations
![Page 30: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/30.jpg)
OGT’s Design Pipeline – what we need from you:
• Regions of Interest
  • Gene lists
  • Chromosomal locations
• Genome build version
• Data file format
  • Text, Excel, etc.
  • Consistent, e.g. chr1: 2247628-2248537
Pipeline steps: 1. Data → 2. Draft Design → 3. Singletons → 4. Thermodynamics → 5. Report
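A consistent region format matters because the regions are parsed by software. As a rough illustration, the sketch below parses strings shaped like the slide's example `chr1: 2247628-2248537`. The regular expression, the function names and the 1-based inclusive length convention are assumptions, not OGT's actual input handling.

```python
import re

# Accepts e.g. "chr1: 2247628-2248537", with optional spaces and thousands commas.
REGION_RE = re.compile(r"^\s*(chr\w+)\s*:\s*([\d,]+)\s*-\s*([\d,]+)\s*$")


def parse_region(text):
    """Parse 'chrN: start-end' into (chrom, start, end); raise ValueError if malformed."""
    match = REGION_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognised region: {text!r}")
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start > end:
        raise ValueError(f"start is after end in region: {text!r}")
    return chrom, start, end


def region_length(region):
    """Length in bases, assuming 1-based inclusive coordinates."""
    _, start, end = region
    return end - start + 1


region = parse_region("chr1: 2247628-2248537")
length = region_length(region)
```

Rejecting malformed lines up front is one way to reduce the design errors that inconsistent input can cause.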
![Page 31: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/31.jpg)
Run Draft Design
• Assess the output:
  • Coverage
  • Bait distribution
  • Repeatmasking
[Figure: draft baits across a region of interest]
![Page 32: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/32.jpg)
![Page 33: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/33.jpg)
Correction for Singleton Baits
• Review the draft design and identify any regions covered by a single bait
  • These regions span less than 120 bases
• Add additional singleton baits to the design
• This ensures that small regions are captured as well as large regions
• Advantage: improves evenness of capture across the design
[Figure: bait coverage before and after the correction]
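The singleton correction can be sketched as a small list transformation. The data layout (a list of `(region_id, bait_id)` pairs), the function name and the choice to replicate the existing bait once are assumptions for illustration; the slides do not say how many extra baits are added or whether they are exact copies.

```python
from collections import Counter


def boost_singletons(baits, extra_copies=1):
    """Return a new bait list in which baits of single-bait regions are replicated.

    baits: list of (region_id, bait_id) pairs from a draft design (hypothetical layout).
    """
    baits_per_region = Counter(region for region, _ in baits)
    boosted = list(baits)
    for region, bait_id in baits:
        if baits_per_region[region] == 1:
            # Region is covered by one bait only, so add extra copies of it.
            boosted.extend([(region, bait_id)] * extra_copies)
    return boosted


# Hypothetical draft design: exon1 has two baits, exon2 is a singleton.
draft = [("exon1", "b1"), ("exon1", "b2"), ("exon2", "b3")]
final = boost_singletons(draft)
```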
![Page 34: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/34.jpg)
Correction for Bait Thermodynamics
GC content
• Calculate GC content for all baits
• Identify those baits where GC content is extreme (for instance >65% or <40%)
• Add additional copies of these baits
Melting temperature (Tm)
• Calculate the Tm for all baits
• Identify those baits where Tm is extreme (e.g. >75 °C)
• Add additional copies of these baits
[Figure: region of interest with GC-extreme and Tm-extreme baits marked]
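The GC part of this correction might look like the sketch below, which uses the slide's example thresholds of 40% and 65%. The function names and the single extra copy per extreme bait are assumptions; the Tm check is left out because the slides do not state which melting-temperature formula is used.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a bait sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def is_gc_extreme(seq, low=0.40, high=0.65):
    """True when GC content falls outside the [low, high] window."""
    gc = gc_fraction(seq)
    return gc < low or gc > high


def boost_extreme_baits(baits, extra_copies=1):
    """Return a new list in which GC-extreme baits appear extra_copies more times."""
    boosted = []
    for seq in baits:
        copies = 1 + extra_copies if is_gc_extreme(seq) else 1
        boosted.extend([seq] * copies)
    return boosted


# Hypothetical, very short "baits" to keep the example readable.
design = boost_extreme_baits(["ATGC", "GGCC", "ATAT"])
```

A Tm-based check could reuse the same structure, with `is_gc_extreme` replaced by a predicate built on whatever Tm model the design uses.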
![Page 35: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/35.jpg)
Customer Report
• Design Parameters
  • Depth of Coverage
  • On target / Off target
  • Regions not covered – and why not
• Bait Details
  • Singletons
  • GC distribution
  • Tm distribution
• Library Design
  • Baits generated
![Page 36: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/36.jpg)
Advantages of OGT’s Approach
• Better ‘evenness’ of coverage helps ensure no regions are missed and maximises the likelihood of variant detection
• Improved overall capture efficiency and on-target performance mean more cost-effective sequencing downstream
• Increased capture efficiency of SNPs and Indels means an increased likelihood of detection
• Reduction of risk
![Page 37: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/37.jpg)
Summary
• Custom design of regions for targeted sequencing offers significant flexibility for many applications
• Expert probe design will ensure:
  • Evenness of coverage across the entire region
  • Maximum likelihood of discovery of variants
  • Efficient and cost effective use of sequencer capacity
• Overall these modifications make OGT’s capture perform better
![Page 38: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/38.jpg)
![Page 39: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/39.jpg)
Adding Value Through Analysis
• Introduction
• NGS data analysis
  • Primary analysis
    • Mapping and assembly
    • Q score re-calibration
    • NGS sequencing QC
    • NGS alignment QC
  • Secondary analysis
    • SNP and Indel calling
    • Annotation and evaluation pipeline
    • SIFT and PolyPhen
• Deliverables
• Case study
• Summary
![Page 40: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/40.jpg)
The Analysis Challenge
[Figure: Sequencer → Hard drive with ~4 Gb per exome → Publication]
![Page 41: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/41.jpg)
Raw Data: FASTQ (standard text representation of short reads)
FASTQ uses four lines per sequence.
• Line 1: '@' followed by a sequence identifier
• Line 2: raw sequence letters
• Line 3: '+' (and optional sequence identifier)
• Line 4: quality values for the sequence in Line 2. Must contain the same number of symbols as letters in the sequence. (The letters encode Phred Quality Scores from 0 to 93 using ASCII 33 to 126)
Example
@SEQ_ID
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
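The four-line layout can be read with a few lines of code. This is a minimal sketch that assumes unwrapped records (exactly four lines each) and the ASCII-33 quality offset described above; the function names are hypothetical.

```python
def parse_fastq(lines):
    """Parse unwrapped FASTQ text lines into (identifier, sequence, quality) tuples."""
    lines = [line.rstrip("\n") for line in lines if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain four lines per record")
    records = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence and quality lengths differ for {header}")
        records.append((header[1:], seq, qual))
    return records


def decode_phred33(quality):
    """Convert a quality string to Phred scores using the ASCII-33 offset."""
    return [ord(char) - 33 for char in quality]


example = [
    "@SEQ_ID",
    "GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT",
    "+",
    "!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65",
]
name, sequence, quality = parse_fastq(example)[0]
scores = decode_phred33(quality)
```

In the example record the first base has score 0 (`!`) and the last has score 20 (`5`).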
![Page 42: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/42.jpg)
Phred Quality Scores
• Phred is an accurate base-caller used for capillary traces (Ewing et al Genome Research 1998)
• Each called base is given a quality score Q
• Quality based on simple metrics (such as peak spacing) calibrated against a database of hand-edited data
• QPhred = -10 * log10(estimated probability call is wrong)
Q30 often used as a threshold for useful sequence data
| Phred Quality Score | Probability of incorrect base call | Base call accuracy |
|---|---|---|
| 10 | 1 in 10 | 90 % |
| 20 | 1 in 100 | 99 % |
| 30 | 1 in 1000 | 99.9 % |
| 40 | 1 in 10000 | 99.99 % |
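The relationship Q = -10 * log10(P) and its inverse can be checked directly against the table. A minimal sketch, with hypothetical function names:

```python
import math


def phred_to_error_prob(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred score for an estimated error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


def base_call_accuracy(q):
    """Base call accuracy as a fraction, i.e. 1 minus the error probability."""
    return 1 - phred_to_error_prob(q)


# Rebuild the rows of the table above: (score, error probability, accuracy).
table = [(q, phred_to_error_prob(q), base_call_accuracy(q)) for q in (10, 20, 30, 40)]
```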
![Page 43: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/43.jpg)
File Formats: FASTQ, SAM, BAM, VCF
• FASTQ is FASTA with quality scores added. Standard output format of NGS basecalling
• SAM and BAM are equivalent formats for describing alignments of reads to a reference genome
  • SAM: text file
  • BAM: compressed binary, indexed, so it is possible to access reads mapping to a segment without decompressing the entire file
  • BAM is used by IGV and other software
  • Current standard binary format containing:
    • Meta information (read groups, algorithm details)
    • Sequence and quality scores
    • Alignment information
• VCF file: text file that lists all called variants (= differences to reference genome)
![Page 44: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/44.jpg)
NGS Data Analysis: A rose is a rose is a rose
*) Kevin Rose (born Robert Kevin Rose, February 21, 1977) is an American Internet entrepreneur
• Just FASTQ files
• Data mapping and assembly (vs. genome or exome? De-duplicated? Locally re-aligned? Indexed?)
• All of the above plus VCF file
• Annotation of variants against genes, exons, transcripts...
• Links to external resources
• Sequence alignments for visual inspection of variant calls
• Filtered and prioritised data
• Multi-genome analysis
Example VCF content:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
A_36_B100184 65 . T C 6.2 . DP=27;AF1=0.4999;CI95=0.5,0.5;DP4=7,12,5,3;MQ=44;FQ=8.65;PV4=0.4,4.2
A_36_B100224 48 . G A 225 . DP=80;AF1=0.5;CI95=0.5,0.5;DP4=32,4,38,3;MQ=56;FQ=8.65
A_36_B100255 42 . A C 22 . DP=32;AF1=0.5;CI95=0.5,0.5;DP4=23,2,4,3;MQ=20;FQ=25;PV4=0.057,1.9e-06,1,0.004
A_36_B100333 76 . G A 225 . DP=50;AF1=0.5;CI95=0.5,0.5;DP4=10,9,18,9;MQ=57;FQ=225;PV4=0.3...
![Page 45: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/45.jpg)
![Page 46: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/46.jpg)
Primary Analysis - Mapping and Alignment
[Pipeline figure: Raw sequence files (FASTQ format) → Mapping (BWA/Bowtie) → Raw alignment files (SAM/BAM format) → Local realignment around InDels → Duplicate marking → Quality score re-calibration → Analysis-ready alignment (SAM/BAM format)]
Tools shown for the post-mapping steps: GATK, Picard
![Page 47: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/47.jpg)
Why Mark Duplicates and Realignment around Indels?
![Page 48: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/48.jpg)
Why Mark Duplicates and Realignment around Indels?
3 incorrect calls within 40bp!
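Duplicate marking matters because PCR duplicates are copies of one original fragment, so counting them as independent evidence can inflate confidence in a wrong call. The sketch below illustrates the general idea only and is not Picard's actual algorithm: reads sharing chromosome, start position and strand are treated as one duplicate set, and the read with the highest summed base quality is kept. The dictionary layout and function name are assumptions.

```python
def mark_duplicates(reads):
    """Return a list of booleans, True where a read is flagged as a duplicate.

    reads: list of dicts with keys 'chrom', 'pos', 'strand' and 'quals'
    (a list of Phred scores). This layout is hypothetical.
    """
    best = {}  # (chrom, pos, strand) -> (summed quality, index of best read)
    for index, read in enumerate(reads):
        key = (read["chrom"], read["pos"], read["strand"])
        score = sum(read["quals"])
        if key not in best or score > best[key][0]:
            best[key] = (score, index)
    kept = {index for _, index in best.values()}
    return [index not in kept for index in range(len(reads))]


# Hypothetical reads: the first two start at the same place on the same strand.
reads = [
    {"chrom": "chr1", "pos": 100, "strand": "+", "quals": [30, 30]},
    {"chrom": "chr1", "pos": 100, "strand": "+", "quals": [40, 50]},
    {"chrom": "chr1", "pos": 200, "strand": "+", "quals": [20, 20]},
]
flags = mark_duplicates(reads)
```

Flagged reads stay in the file but can be ignored by the variant caller.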
![Page 49: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/49.jpg)
![Page 50: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/50.jpg)
NGS Variant Calling Methods
Option 1 - Hard filtering
Example: a SNP can only be called if
• read depth >10
• >35% of reads carry the SNP
+ Effective filtering
+ Transparent to user
– Simplistic approach
– Will miss high quality calls that don’t pass threshold
Option 2 - Statistical analysis
Based on quality scores of individual base pairs, the alignment and statistical probability models
+ Robust
+ Optimum balance of sensitivity and specificity due to the use of statistical models
+ Fewer false positive and false negative SNP calls
– Requires correctly pre-processed data with reliable quality scores
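The hard-filtering option can be sketched against the INFO column of a VCF record. This example assumes samtools/bcftools-style fields, where `DP` is the raw read depth and `DP4` holds the high-quality ref-forward, ref-reverse, alt-forward and alt-reverse counts; taking the variant fraction from `DP4` is a choice made for illustration. The DP and DP4 values are those of the four example VCF rows shown earlier in the deck, with the other INFO fields omitted.

```python
def parse_info(info):
    """Split a VCF INFO string such as 'DP=27;DP4=7,12,5,3' into a dict."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def passes_hard_filter(info, min_depth=10, min_alt_fraction=0.35):
    """Apply the slide's example rule: depth >10 and >35% of reads carry the SNP."""
    fields = parse_info(info)
    depth = int(fields["DP"])
    ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in fields["DP4"].split(","))
    alt = alt_fwd + alt_rev
    total = ref_fwd + ref_rev + alt
    if total == 0:
        return False
    return depth > min_depth and alt / total > min_alt_fraction


infos = [
    "DP=27;DP4=7,12,5,3",
    "DP=80;DP4=32,4,38,3",
    "DP=32;DP4=23,2,4,3",
    "DP=50;DP4=10,9,18,9",
]
results = [passes_hard_filter(info) for info in infos]
```

The first and third rows fail on variant fraction alone, which shows how transparent, and how blunt, a fixed threshold is.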
![Page 51: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/51.jpg)
Base Quality Score Re-Calibration
[Figure: base quality scores before recalibration and after recalibration]
Source: The Broad Institute, http://www.broadinstitute.org/files/shared/mpg/nextgen2010/nextgen_poplin.pdf
![Page 52: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/52.jpg)
![Page 53: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/53.jpg)
Primary Analysis – Raw data and assembly QC
[Pipeline figure: Raw sequence files (FASTQ format) → Mapping (BWA/Bowtie) → Raw alignment files (SAM/BAM format) → Local realignment around InDels, duplicate marking and quality score re-calibration (GATK, Picard) → Analysis-ready alignment (SAM/BAM format)]
Two QC checks are added to the pipeline:
• Sequence QC check (FastQC) → Raw data QC Report
• Alignment QC check (Picard) → Alignment QC Report
![Page 54: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/54.jpg)
Secondary Analysis: SNP and Indel calling, annotation and filtering
[Pipeline figure: Analysis-ready alignment (SAM/BAM format) → GATK Unified Genotyper → SNPs and InDels (VCF format) → Variant Evaluation (OGT) → Comprehensive interactive OGT Report, alongside the Sequence QC Report and Alignment QC Report]
Variant Evaluation
• Known variant?
• Impact on gene expression?
• Splicing affected?
• Non-synonymous or frameshift mutation?
• Impact on protein function?
• How confident are we in the call?
• Zygosity?
![Page 55: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/55.jpg)
SNP/Indel Classification (standard analysis)
We check and annotate every single detected SNP and Indel against all human Ensembl genes and transcripts and dbSNP
dbSNP annotation:
• Is the variant known?
• Obtain allele frequency
Does it affect any of the following?
• Promoter region
• UTR
• Splice sites or intronic region
• CDS
  • Synonymous mutation
  • Non-synonymous mutation
  • Frameshift mutation
  • Stop codon (truncated/elongated protein sequence)
  • Overlap with protein domain
  • Consequence on protein function predicted (SIFT & PolyPhen)
![Page 56: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/56.jpg)
SIFT – SORTS INTOLERANT FROM TOLERANT MUTATIONS
SIFT predicts whether an amino acid substitution affects protein function based on
• sequence homology (phylogenetic conservation)
• the physical properties of amino acids
SIFT can be applied to naturally occurring non-synonymous polymorphisms and laboratory-induced mutations.
![Page 57: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/57.jpg)
PolyPhen: Prediction of Functional Effect of nsSNPs
PolyPhen (=Polymorphism Phenotyping) is an automatic tool for prediction of possible impact of an amino acid substitution on the structure and function of a human protein. This prediction is based on straightforward empirical rules which are applied to the sequence, phylogenetic and structural information characterizing the substitution
![Page 58: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/58.jpg)
OGT Processing Overview
• Individual Genome Analysis (Standard Level)
• Multi Genome Analysis, Data Gathering and Comparison (Advanced Level)
  • Perform pairwise genome analysis
  • Filter out variants present in any “baseline” exome (e.g. somatic tissue, healthy sibling) AND not in all “case” exomes
• Tailored analysis based on client’s individual requirements (Expert Level)
  • Study-specific additional in-depth filtering and analysis
[Figure: arrow from Data to Information across the three levels]
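The multi-genome filter reduces to set operations. This sketch reads the rule as: keep only variants that appear in every "case" exome and in no "baseline" exome. That reading, the `(chrom, pos, ref, alt)` tuple representation and the function name are assumptions made for illustration.

```python
def filter_candidates(case_exomes, baseline_exomes):
    """Return variants shared by all case exomes and absent from every baseline exome.

    Each exome is an iterable of hashable variant keys, e.g. (chrom, pos, ref, alt).
    """
    if not case_exomes:
        return set()
    shared_by_cases = set.intersection(*(set(exome) for exome in case_exomes))
    seen_in_baselines = set().union(*(set(exome) for exome in baseline_exomes))
    return shared_by_cases - seen_in_baselines


# Hypothetical variants for two affected individuals and one healthy sibling.
case_a = {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"), ("chr3", 300, "G", "A")}
case_b = {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T")}
sibling = {("chr2", 200, "C", "T")}
candidates = filter_candidates([case_a, case_b], [sibling])
```

Here one variant survives: it is shared by both cases and absent from the sibling.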
![Page 59: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/59.jpg)
NGS Data Delivery
[Figure: data delivery workflow. Labels: Hard drive (or FTP); ship data; copy data to shared drive or local hard drive and...; browse; Double click!]
![Page 60: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/60.jpg)
![Page 61: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/61.jpg)
NGS Data Delivery
[Figure: data delivery workflow. Labels: Hard drive (or FTP); ship data; copy data to shared drive or local hard drive and...; browse; Comprehensive HTML analysis report; File location & share results]
![Page 62: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/62.jpg)
Analysis Report: Summary Section
![Page 63: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/63.jpg)
Analysis Report: Summary Section
![Page 64: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/64.jpg)
Analysis Report: Summary Section
![Page 65: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/65.jpg)
Analysis Report: Summary Section
![Page 66: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/66.jpg)
Analysis Report: QC Section – Read QC
![Page 67: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/67.jpg)
Analysis Report: QC Section – Read QC
![Page 68: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/68.jpg)
Analysis Report: QC Section – Read QC
![Page 69: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/69.jpg)
Analysis Report: QC Section – Read QC
![Page 70: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/70.jpg)
Analysis Report: QC Section – Read QC
![Page 71: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/71.jpg)
Analysis Report: QC Section – Alignment QC
![Page 72: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/72.jpg)
Analysis Report: QC Section – Alignment QC
![Page 73: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/73.jpg)
Analysis Report: QC Section – Alignment QC
![Page 74: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/74.jpg)
Analysis Section - Overview
![Page 75: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/75.jpg)
Analysis Section - Overview
![Page 76: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/76.jpg)
The Variant Table View
Filter Interface
![Page 77: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/77.jpg)
The Variant Table View
Data display
Data export
![Page 78: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/78.jpg)
The Variant Table View – External Links
![Page 79: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/79.jpg)
The Variant Table View – External Links
![Page 80: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/80.jpg)
The Detailed Variant View
![Page 81: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/81.jpg)
The Detailed Variant View
![Page 82: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/82.jpg)
Predicted Consequences on Protein Function
![Page 83: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/83.jpg)
Predicted Consequences on Protein Function
![Page 84: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/84.jpg)
Predicted Consequences on Protein Function
![Page 85: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/85.jpg)
Alignment View of Selected Variant in IGV
![Page 86: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/86.jpg)
Alignment View of Selected Variant in IGV
![Page 87: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/87.jpg)
Alignment View of Selected Variant in IGV
![Page 88: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/88.jpg)
Interactive Data Filtering
![Page 89: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/89.jpg)
Interactive Data Filtering
![Page 90: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/90.jpg)
Case Study: a published exome study
Multi-exome study reveals causative mutation of monogenic disorder
Standard Analysis
Advanced Analysis
![Page 91: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/91.jpg)
Analysis Report: Supplementary Section
![Page 92: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/92.jpg)
Summary
OGT offers fast, accurate & powerful NGS analysis
Standard Analysis
• Robust statistical data analysis
• Comprehensive variant annotation
• Interactive filtering and prioritisation of data based on:
  • chromosomal region
  • allele frequency / novelty
  • zygosity
  • confidence score
  • severity of mutation
Advanced Analysis
• Multi-genome comparison
Bespoke Analysis
• Tailored to your specific requirements
Let us help you with your workload
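The filtering criteria listed under Standard Analysis (chromosomal region, allele frequency / novelty, zygosity, confidence score, severity of mutation) can be illustrated with a minimal sketch. This is not OGT's software: the `Variant` record, its field names, the `SEVERITY` ranking and the thresholds are all hypothetical, chosen only to show how such filters might combine.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Variant:
    """Hypothetical variant record; field names are illustrative only."""
    chrom: str
    pos: int
    allele_freq: Optional[float]  # None means novel (no known population frequency)
    zygosity: str                 # "het" or "hom"
    confidence: float             # caller-specific confidence score
    consequence: str              # predicted effect on the protein


# Assumed severity ranking; a real pipeline would define its own.
SEVERITY = {"synonymous": 0, "missense": 1, "frameshift": 2, "stop_gained": 2}


def filter_variants(
    variants: Iterable[Variant],
    *,
    chrom: Optional[str] = None,
    start: Optional[int] = None,
    end: Optional[int] = None,
    max_allele_freq: Optional[float] = None,
    zygosity: Optional[str] = None,
    min_confidence: Optional[float] = None,
    min_severity: Optional[int] = None,
) -> List[Variant]:
    """Keep variants passing every supplied criterion, most severe first."""
    kept = []
    for v in variants:
        if chrom is not None and v.chrom != chrom:
            continue
        if start is not None and v.pos < start:
            continue
        if end is not None and v.pos > end:
            continue
        # Novel variants (allele_freq is None) always pass the frequency filter.
        if (max_allele_freq is not None and v.allele_freq is not None
                and v.allele_freq > max_allele_freq):
            continue
        if zygosity is not None and v.zygosity != zygosity:
            continue
        if min_confidence is not None and v.confidence < min_confidence:
            continue
        if min_severity is not None and SEVERITY.get(v.consequence, 0) < min_severity:
            continue
        kept.append(v)
    # Prioritise by severity, then by confidence score.
    return sorted(kept, key=lambda v: (-SEVERITY.get(v.consequence, 0), -v.confidence))


# Made-up example data, not taken from any real study.
variants = [
    Variant("chr1", 1000, None, "hom", 99.0, "stop_gained"),
    Variant("chr1", 2000, 0.30, "het", 80.0, "missense"),
    Variant("chr2", 500, 0.001, "hom", 60.0, "missense"),
    Variant("chr1", 3000, 0.002, "hom", 20.0, "synonymous"),
]

rare_hom = filter_variants(
    variants, max_allele_freq=0.01, zygosity="hom",
    min_confidence=50.0, min_severity=1,
)
in_region = filter_variants(variants, chrom="chr1", start=1500, end=2500)
```

Here `rare_hom` keeps only rare or novel homozygous variants with a protein-altering consequence, which is one plausible starting filter for a recessive monogenic disorder; every criterion is optional so filters can be combined interactively.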
![Page 93: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/93.jpg)
Outline of Presentation
• Delivering a unique next generation sequencing service — Dr Mike Evans, CEO
• Optimised bait design for targeted sequencing — Dr Jolyon Holdstock, Senior Computational Biologist
• Adding value through analysis — Dr Volker Brenner, Head of Computational Biology
• Summary
• Q&A
• Lunch
![Page 94: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/94.jpg)
Please Enjoy Your Lunch!
Come and visit us at Booth #562
• Complete a survey for the chance to win a Kindle* eBook Reader
• Come to our wine reception tomorrow (Sunday) at 17:00 at our booth
*For full Terms and Conditions please visit www.ogt.co.uk/genefficiency/ESHGsurvey.html
![Page 95: Eshg sequencing workshop](https://reader036.fdocuments.in/reader036/viewer/2022062303/554e8d5db4c90573338b4bba/html5/thumbnails/95.jpg)
Thank you
www.ogt.co.uk