Science Webinar Series: DNA Target Enrichment Strategies

Post on 30-Aug-2020


Science Webinar Series

DNA Target Enrichment Strategies

10 June, 2009

Brought to you by the Science/AAAS Business Office

Participating Experts:

Daniel Turner, Ph.D., Wellcome Trust Sanger Institute, Cambridge, UK

Kelly Frazer, Ph.D., Scripps Genomic Medicine, San Diego, CA

www.opengenomics.com/SureSelect

DNA Target Enrichment Strategies – bringing efficiencies to genome sequencing

Daniel J Turner

Head of Sequencing Technology Development

Wellcome Trust Sanger Institute

Target enrichment strategies

PCR

on array

in solution

Target enrichment strategies: PCR

• Design primers that are specific for the region of interest

• Amplify

• Sequence

Population sequencing of ACTN3

1,438 samples

57 populations

The α-actinin-3 deficiency trade-off:

Compared to R577 homozygotes, R577X homozygotes have:

• lower muscle strength and mass

• reduced capacity for rapid energy generation

• increased endurance capacity

• increased fatigue recovery

• enhanced muscle metabolic efficiency

MacArthur et al. 2007. Nature Genetics 39:1261-1265

MacArthur et al. 2008. Hum Mol Genet 17:1076-86

96-well library prep

• Acoustic shearing

• SPRI bead clean-ups

• Custom adapters and barcoded PCR primers

[Figure: ACTN3 and CTSF, 25 kb]

Quail et al. (2008) Nat. Methods 5, 1005-1010

Sequencing Strategy

Lanes 1, 3, 5, 7; lanes 2, 6, 8

Uniformity of coverage

[Figure: coverage plotted against position for Fragment 2, Fragment 8 and Fragment 10]

Results

• Uniformity is governed by the accuracy of pooling

• 80% with a coverage within 2-fold range of the median

• 99.9% accuracy for genotype calling

• 63 high-confidence SNPs identified, 27 of them novel and 23 rare.

• Analysis of non-European HapMap samples and HGDP samples ongoing.
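The 2-fold uniformity figure can be illustrated with a minimal Python sketch. It assumes that "within 2-fold range of the median" means between half and twice the median, which the slide does not spell out. The function name and the example coverages are hypothetical.

```python
from statistics import median


def fraction_within_fold_of_median(coverages, fold=2.0):
    """Fraction of values lying between median/fold and median*fold (inclusive)."""
    if not coverages:
        raise ValueError("coverages must not be empty")
    mid = median(coverages)
    low, high = mid / fold, mid * fold
    inside = sum(1 for c in coverages if low <= c <= high)
    return inside / len(coverages)


# Hypothetical mean coverages for ten pooled samples (not data from the talk).
example = [900, 1000, 1100, 1200, 400, 2500, 1050, 950, 1000, 1000]
print(fraction_within_fold_of_median(example))
```

Because the median is used rather than the mean, a few badly over- or under-pooled samples do not shift the reference point.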

Target enrichment strategies: PCR

• Limit of 5–20 kb per PCR

• Difficult to multiplex, optimise and normalise

• Uses a lot of DNA

• Expensive if multiplexing

• But very effective

Target enrichment strategies: on array

• Hybridise sample DNA to target-specific probes on a microarray

• Wash to remove background

• Elute

• Sequence

Target enrichment strategies: in solution

• Hybridise sample DNA to target-specific probes in solution

• Capture probe / target

• Wash to remove background

• Elute

• Sequence

gDNA Fragmentation

Target size: 100-300 bp

Target size: 100-400 bp

• Shorter fragments hybridize more efficiently

• Optimized settings give tighter distribution of fragment sizes

Library purification

SPRI beads:

• easily automated

• allow elution in a wider variety of buffers

PCR and GC bias

[Figure, panels a and b: coverage shown without PCR prior to hybridization and with PCR prior to hybridization; x-axes: GC content (%) and percentile of unique sequence ordered by GC content]

Evaluation parameters

• Completeness: % of target bases covered by >= 1 sequence read

• Specificity: % of sequences mapping to target regions

• Uniformity: variation in coverage
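The three evaluation parameters can be sketched in Python as follows. This is an illustration rather than the speaker's pipeline: the function names and example numbers are hypothetical, and the coefficient of variation is only one assumed way to summarise "variation in coverage".

```python
from statistics import mean, pstdev


def completeness(depths, min_reads=1):
    """Percent of target bases covered by at least min_reads reads."""
    covered = sum(1 for d in depths if d >= min_reads)
    return 100.0 * covered / len(depths)


def specificity(on_target_reads, total_mapped_reads):
    """Percent of mapped reads that fall in target regions."""
    return 100.0 * on_target_reads / total_mapped_reads


def coverage_cv(depths):
    """Coefficient of variation of depth (assumed uniformity summary)."""
    return pstdev(depths) / mean(depths)


# Hypothetical per-base read depths across a small target.
depths = [0, 12, 30, 28, 35, 31, 29, 33, 30, 32]
print(completeness(depths))      # percent of bases with >= 1 read
print(specificity(800, 1000))    # percent of reads on target
print(coverage_cv(depths))       # lower means more uniform
```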

Completeness

On array: ~98.6% of targeted bases

In solution: ~99.5% of targeted bases

PCR: <= 100% of targeted bases

Specificity

On array: up to 70% on target

In solution: up to 80% on target

PCR: up to 100% on target

Sequence uniformity

On array: 90% of CTR at 30x

In solution: 95% of CTR at 30x

[Figure: % of CTR bases against coverage (-fold), 0 to 30; series labelled 14M, 7.5M, 6.8M, 6.5M, 6.2M, 5.8M and Array 6.5M]
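A curve of this kind can be derived from per-base depths. The sketch below assumes the plot reports the percentage of target bases covered at or above each depth, which is an interpretation of the axis labels rather than something the slide states. The function name and depths are hypothetical.

```python
def percent_bases_at_depth(depths, thresholds=(1, 10, 20, 30)):
    """Map each depth threshold to the percent of bases covered at >= that depth."""
    if not depths:
        raise ValueError("depths must not be empty")
    n = len(depths)
    return {t: 100.0 * sum(1 for d in depths if d >= t) / n for t in thresholds}


# Hypothetical per-base depths for ten target bases.
depths = [5, 12, 25, 31, 40, 33, 30, 45, 50, 38]
for threshold, pct in percent_bases_at_depth(depths).items():
    print(f">= {threshold}x: {pct:.0f}% of target bases")
```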

%GC vs %Coverage

Target enrichment strategies: on array

• enables large-scale projects, which would not be realistic with PCR

• Not easily scalable

• Requires expensive hardware

Target enrichment strategies: in solution

• enables large-scale projects, which would not be realistic with PCR

• Simple & relatively rapid to perform

• Scalable & easily automated

• Uses least DNA

• Requires expensive hardware

• No whole exome set available commercially

Acknowledgements

Wellcome Trust Sanger Institute: Lira Mamanova, Carol Scott, Iwanka Kozarewa, Daniel MacArthur, Chris Tyler-Smith, Qasim Ayub, Liz Huckle, Alison Coffey, Eleanor Howard, Aarno Palotie

Agilent Technologies: Emily LeProust, Fred Ernani

Nimblegen: Tom Albert, Heike Fiegler, Greg McGuiness


Enrichment of sequencing targets from the human genome

Kelly A Frazer, PhD

Director, Genomic Biology

Scripps Genomic Medicine

June 10, 2009

What is targeted sequencing?

• Define sequence targets

• Target enriched samples

• Sequence

[Figure labels: genomic DNA, select regions]

Next-Gen Sequencing

• Low cost per nucleotide of raw sequence ($0.00001 per base).

• Best suited for generating large amounts of raw sequence data per sample (10⁹ nucleotides per day).

Still too costly and too low-throughput to perform whole-genome sequencing on many different DNA samples

Why perform targeted sequencing?

To efficiently use current technologies for population‐based sequencing studies, it is necessary to enrich for specific loci in the human genome.
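A rough worked example shows why enrichment helps. The sketch below uses the per-base cost quoted above. The genome size, target size, mean depth and on-target fraction are illustrative assumptions, and the cost of the enrichment step itself is ignored.

```python
COST_PER_BASE_USD = 0.00001  # raw per-base cost quoted on the slide


def raw_sequencing_cost(target_bp, mean_depth, on_target_fraction=1.0,
                        cost_per_base=COST_PER_BASE_USD):
    """Raw sequencing cost to reach mean_depth over target_bp.

    on_target_fraction is the share of sequenced bases that land on target;
    off-target bases still have to be paid for.
    """
    if not 0 < on_target_fraction <= 1:
        raise ValueError("on_target_fraction must be in (0, 1]")
    raw_bases = target_bp * mean_depth / on_target_fraction
    return raw_bases * cost_per_base


# Assumptions: ~3.1 Gb genome, 3.6 Mb target, 30x mean depth, 40% on target.
whole_genome = raw_sequencing_cost(3.1e9, 30)
targeted = raw_sequencing_cost(3.6e6, 30, on_target_fraction=0.4)
print(f"whole genome: ${whole_genome:,.0f} per sample")
print(f"targeted:     ${targeted:,.0f} per sample")
```

Even with more than half of the reads falling off target, the targeted design needs far less raw sequence per sample under these assumptions.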

Population Sequence Studies

• Sequence-based association studies

– Healthy elderly cohort versus individuals with age-related diseases

• Functional annotation of genomic intervals

– 9p21 interval associated with CAD and T2D

Sample enrichment methods

• PCR – enriches target sequences with high specificity but difficult to scale

• Hybridization based methods – long oligonucleotides in solution allow for efficient capture of ~3.5 Mb of sequence targets

• Microdroplet PCR – encapsulation of PCR reactions allows for simultaneous amplification of ~4,000 targeted elements

Important parameters

• Efficiency of assay design

– The fraction of targeted base pairs for which an assay can be designed

• Specificity of target enrichment

– The fraction of high quality reads that map directly on the targeted sequences

• Coverage uniformity across targeted sequences

– If coverage differs greatly then one has to sequence deeply to adequately cover underrepresented bases

• Reproducibility across technical replicates & samples

• Systematic allelic biases resulting in drop-out effects

– Errors of this nature result in high rates of incorrectly called heterozygous variant sites
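One possible check for the allelic bias mentioned in the last point is to look at read counts at sites already known to be heterozygous, where each allele should receive about half of the reads. The Python sketch below is an assumed diagnostic, not a method described in the talk. The function names and read counts are hypothetical.

```python
def ref_allele_fraction(ref_reads, alt_reads):
    """Share of reads supporting the reference allele at one site."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("site has no reads")
    return ref_reads / total


def mean_ref_fraction(het_sites):
    """Average reference-allele fraction over known heterozygous sites.

    het_sites is a list of (ref_reads, alt_reads) pairs. A mean far from 0.5
    would hint at a systematic bias towards one allele.
    """
    fractions = [ref_allele_fraction(ref, alt) for ref, alt in het_sites]
    return sum(fractions) / len(fractions)


balanced = [(20, 20), (30, 10), (10, 30)]   # hypothetical counts
skewed = [(38, 2), (36, 4)]                 # hypothetical counts
print(mean_ref_fraction(balanced))
print(mean_ref_fraction(skewed))
```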

Target Enrichment by Solution Hybridization

Make Genomic DNA Fragment Libraries

Agilent Microarray

- synthesize 120-mer oligonucleotides

- convert to biotinylated RNA capture probes

- hybridize with DNA

- capture and wash

- elute and PCR amplify

- sequence targeted sequences

3.6 Mb of Targeted Sequences

• 624 genes

– 9,215 exons

– 4,886 evolutionarily conserved sequences (ECS)

– total 3.2 Mb of sequence

• 3 Contiguous Regions

– 9p21: 196 kb

– APOE: 100 kb

– 8q24.21: 125 kb

Probe design efficiency

[Figure, panels (a) and (b): probe placement relative to genes and repeat-masked sequence. Chr9, 21,950,000-22,130,000 with an enlarged view of 21,960,000-22,000,000: genes (CDKN2A, CDKN2B, CDKN2BAS, C9orf53), Repeat Mask and Probes tracks. Chr13, FOXO1 gene: Repeat Mask, ECS Block, ECS Signal and Probes tracks.]

• 622 genes – CDS (97%), UTR (88%), ECS (86%)

• Three genomic intervals – 37% to 55%

Specificity of target enrichment

38.6% map directly on target

47.8% map on or near target (+/- 150 bp)

Percent of base pairs corresponding to filtered reads
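The on-target and near-target percentages can be illustrated with a small Python sketch that counts read base pairs overlapping each target, with optional flanking. It assumes half-open coordinates on a single chromosome and targets that do not overlap each other even after padding. The function names and coordinates are hypothetical.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Number of positions shared by two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def percent_bases_on_target(reads, targets, flank=0):
    """Percent of read base pairs falling inside targets padded by flank bp.

    Assumes the padded targets do not overlap one another; otherwise
    bases would be counted twice.
    """
    total = sum(end - start for start, end in reads)
    if total == 0:
        raise ValueError("no read bases")
    hits = 0
    for r_start, r_end in reads:
        for t_start, t_end in targets:
            hits += overlap(r_start, r_end, t_start - flank, t_end + flank)
    return 100.0 * hits / total


# Hypothetical 36 bp reads around one 200 bp target.
targets = [(1000, 1200)]
reads = [(1000, 1036), (1190, 1226), (1300, 1336), (5000, 5036)]
print(percent_bases_on_target(reads, targets))             # directly on target
print(percent_bases_on_target(reads, targets, flank=150))  # on or near target
```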

Coverage uniformity across targeted sequences

Normalized coverage – the observed coverage of each base divided by the mean coverage of all targeted bases

88.4% of all bases fell within ¼ to 4 times the mean coverage

98.3% of all bases covered by at least one read
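The normalization and the ¼ to 4 times window can be written out in a few lines of Python. This is a minimal sketch with hypothetical function names and depths. It assumes the window bounds are inclusive.

```python
def normalized_coverage(depths):
    """Divide each base's depth by the mean depth of all targeted bases."""
    if not depths:
        raise ValueError("depths must not be empty")
    mean_depth = sum(depths) / len(depths)
    if mean_depth == 0:
        raise ValueError("mean coverage is zero")
    return [d / mean_depth for d in depths]


def fraction_in_window(normalized, low=0.25, high=4.0):
    """Fraction of bases whose normalized coverage lies in [low, high]."""
    return sum(1 for x in normalized if low <= x <= high) / len(normalized)


# Hypothetical per-base depths; the mean is 40.
depths = [0, 10, 40, 50, 100, 40]
norm = normalized_coverage(depths)
print(norm)
print(fraction_in_window(norm))
```

After normalization the mean is 1 by construction, so runs with different sequencing yields can be compared on the same scale.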

Reproducibility of coverage

Technical replicates r² ~0.95
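The sketch below shows how a replicate-to-replicate value like this could be computed, assuming it is the squared Pearson correlation between the coverage values of two runs. The slide does not say at what resolution (per base or per target) the comparison was made. The function name and data are hypothetical.

```python
def pearson_r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("one sequence has no variation")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical coverage of the same five targets in two replicate captures.
replicate_1 = [12.0, 30.0, 45.0, 8.0, 60.0]
replicate_2 = [10.0, 33.0, 41.0, 9.0, 66.0]
print(pearson_r_squared(replicate_1, replicate_2))
```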

Variant calling accuracy: comparison to microarray genotypes

~ 4,100 SNPs

QS >= 30: detection rate = 93%, concordance rate = 99.3%

No systematic allelic biases
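The two rates can be illustrated with a short Python sketch. The definitions used here are assumptions, since the slide does not give them: detection rate is taken as the share of array-genotyped sites with a sequence call at or above the quality threshold, and concordance rate as the share of those detected sites where the two genotypes agree. Site names and genotypes are hypothetical.

```python
def detection_and_concordance(array_genotypes, seq_calls, min_qs=30):
    """Return (detection_rate, concordance_rate) under the assumed definitions.

    array_genotypes: {site: genotype}
    seq_calls:       {site: (genotype, quality_score)}
    """
    if not array_genotypes:
        raise ValueError("no array genotypes supplied")
    detected = {
        site: genotype
        for site, (genotype, quality) in seq_calls.items()
        if site in array_genotypes and quality >= min_qs
    }
    detection_rate = len(detected) / len(array_genotypes)
    if not detected:
        return detection_rate, float("nan")
    matches = sum(1 for site, g in detected.items() if g == array_genotypes[site])
    return detection_rate, matches / len(detected)


array_genotypes = {"site1": "AG", "site2": "GG", "site3": "AA", "site4": "CT"}
seq_calls = {
    "site1": ("AG", 45),  # detected, concordant
    "site2": ("AG", 60),  # detected, discordant
    "site3": ("AA", 33),  # detected, concordant
    "site4": ("CT", 12),  # below the quality threshold
}
print(detection_and_concordance(array_genotypes, seq_calls))
```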

The solution hybridization-based method is well suited for the enrichment of loci at the megabase-pair scale from the human genome for population sequence studies

Microdroplet PCR Workflow

Primer library – up to 4000 different elements

Fragmented genomic DNA template

Primer design efficiency

• 47 genes – 435 exons

– 29 from ENCODE intervals

– 8 TRP channel superfamily

– 11 deep venous thrombosis

• 457 amplicons of varying sizes (119-956 bp) and GC content (33-74%)

Successfully designed PCR assays for all exons

Specificity of target enrichment 

• 78% of filtered reads successfully mapped to a targeted amplicon

• Off-target reads aligned across the genome in a random fashion, suggesting that background sequence is due to non-specific genomic DNA carryover rather than off-target amplification

Coverage uniformity across targeted sequences

Normalized coverage – the observed coverage of each base divided by the mean coverage of all targeted bases

89.6% of all bases fell within ¼ to 4 times the mean coverage

99.6% of all bases covered by at least one read

Only one amplicon completely failed

Reproducibility of coverage

Sample to sample r² ~0.96

Variant calling accuracy: comparison to microarray genotypes

~ 450 SNPs

QS >= 30: detection rate = 97.6%, concordance rate = 99.1%

Accuracy was similar in ENCODE versus non-ENCODE interval variants and between samples of African and European ancestry, indicating that allelic biases are minimal

The microdroplet PCR process is extremely efficient with almost 100% of all primer pairs successful.  The data generated is well suited for performing population‐based sequence studies.

Selecting a method

• Study design

– Known functional elements or entire intervals

– Total amount of targeted sequences

– Number of samples

• Sequencing Technology

Acknowledgements

STSI/Scripps Genomic Medicine

Ryan Tewhey

Kazu Nakano

Wendy Wang

Sarah Murray

Olivier Harismendy

Eric Topol


Look out for more webinars in the series at:

www.sciencemag.org/webinar

For related information on this webinar topic, go to:

www.opengenomics.com/SureSelect

To provide feedback on this webinar, please e-mail your comments to webinar@aaas.org
