Linking genetic and epidemiological datasets · • Bench-top platform • At The Pirbright...


Page 1: Linking genetic and epidemiological datasets · • Bench-top platform • At The Pirbright Institute, located “inside” containment • 7GB/run (up to 250 nt paired-end reads)

Donald King [email protected]

Vesicular Disease Reference Laboratory Group

Linking genetic and epidemiological datasets: the challenges of reconstructing transmission trees for livestock viral diseases

FMD and SVD Reference Laboratories

Page 2

BBSRC National Virology Centre: The Plowright Building

• 2015: Occupied new high containment laboratory

• Houses all work with “live” FMD and International Reference Laboratories for FMD, BT, PPR, ASF, AHS, Capripox

Page 3

Foot-and-Mouth Disease Virus

• Family Picornaviridae, genus Aphthovirus

• Causes a highly contagious disease of cloven-hoofed livestock (cattle, sheep, goats and pigs)

• Positive-sense ssRNA genome, ~8300 nt

• 7 serotypes (O, A, C, SAT1, SAT2, SAT3 and Asia-1), including numerous subtypes

• VP1 sequence data widely used for strain characterisation

[Figure: FMDV genome map, scale 0 to 8 kilobases. 5'UTR with genome-linked VPg and poly(C) tract; coding regions L, 1A (VP4), 1B (VP2), 1C (VP3), 1D (VP1), 2A, 2B, 2C, 3A, 3B, 3C, 3D; 3'UTR with poly(A) tail. Functional annotations: protease, capsid, carboxy-terminal self-cleaving, membrane-binding, NTP binding, genome-linked (VPg), protease, polymerase. Primary cleavages: L, 2A, 3C. Secondary cleavages: 1B/RNA?, 3C.]

Page 4

Why do we sequence FMDV?

• Multiple virus serotypes/topotypes/strains

• Monitoring global patterns of virus distribution

• Tracing sources of outbreaks (who-infected-who?)

• Early recognition of the emergence of new lineages

• Antigenic prediction and vaccine selection

Page 5

2015: Changing epidemiological patterns

Outbreaks reported to the OIE (change of epidemiological status):

http://www.oie.int/wahid-prod/public.php?page=home

[Map labels: O: North Africa; Multiple lineages: East Asia]

Page 6

Example: Molecular Epidemiology

[Figure: phylogenetic tree of serotype O viruses with bootstrap support values at the nodes and a scale bar of 0.02. The Ind-2001 lineage (groups labelled a to d) contains 2013-2014 isolates from Algeria, Tunisia (O/TUN), Libya (O/LIB), Saudi Arabia (O/SAU), the UAE (O/UAE), Nepal (O/NEP), Bhutan (O/BHU) and India (O/IND), together with older reference viruses such as O/KUW/3/97, O/OMN/7/2001, O/UAE/4/2008, O/BHU/3/2009 and O/IND/102/2010. Sequences marked * or with a laboratory name (IZSLER, PD-FMD) or accession number were obtained from other sources. Other labelled groups: PanAsia, PanAsia-2, and the topotypes ME-SA, SEA, EA-1, EA-2, EA-3, EA-4, WA, CATHAY, ISA-1, ISA-2 and EURO-SA.]

• O/ME-SA/Ind-2001

• FMD outbreaks in Libya, Tunisia and Algeria

• New lineage introduced into North Africa

• Source: Indian sub-continent

• Increased onward threats to Morocco and Europe?

Knowles et al. (2015) TED

Page 7

Limitations of VP1 sequence data

• VP1 nucleotide sequences can be used to provide evidence to support transboundary movements of FMDV

• Useful for regional and country-level epidemiology

• VP1 not typically useful to resolve transmission trees within outbreak clusters (relationships between infected farms)

• Can we use full genome sequence data to increase the resolution of analysis and trace the spread of FMDV during an outbreak?

• Sequencing improvements make rapid full-genome sequencing achievable (Sanger methods, NGS approaches)

Page 8

Complete genome sequencing work:

Caroline Wright

Guido König

Begoña Valdazo-González

Dan Haydon

Eleanor Cottam

Nick Knowles

Marco Morelli

Richard Orton

Graham Freimanis

David King

Kasia Bankowska

Antonello Di Nardo

Müge Fırat-Saraç

Faizah Hamid

Jemma Wadsworth

Acknowledgements:

Jan Kim

Page 9

Cottam et al., (2006), J. Virol

Increased number of informative sites: retrospective analysis of the 2001 outbreak in the UK

[Figure: positions of nucleotide substitutions along the genome (L, VP4, VP2, VP3, VP1, 2A, 2B, 2C, 3A, 3B, 3C, 3D, poly(A) tail); x-axis: genome position, 0 to 8000]

• Analysis of 23 complete genome sequences (consensus)

• 197 sites with nt substitutions

Page 10

Practical uses of full-genome sequences: farm-to-farm level resolution (UK 2007)

[Figure: TCS representation of sequences recovered from farms, with putative intermediates; nodes labelled IAH2, MAH, IP1b(1), IP1b(2), IP2b, IP2c, IP3b, IP3c, IP4b, IP5, IP6b, IP7 and IP8]

• Discriminates between viruses recovered from infected farms

• Data were provided rapidly (in real time) to support the UK eradication programme

• Expected changes for each farm-to-farm transmission link: 4.3 ± 2.1 nt for 2001 (Cottam et al. (2008) Proc. Roy. Soc. B)

• Provided evidence for the existence of IP5 (a farm with FMD serology-positive cattle and sheep), which bridges the gap between two phases of the outbreak

Cottam et al. (2008) PLoS Pathogens
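Farm-to-farm discrimination of this kind rests on counting nucleotide differences between consensus genomes. The following is a minimal sketch of that count, not the TCS method used in the cited work; the function names, farm labels and 12-nt toy sequences are all invented for illustration, and the sequences are assumed to be already aligned.

```python
from itertools import combinations


def nt_differences(seq_a, seq_b):
    """Count differing sites between two aligned, equal-length sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in valid and b in valid and a != b
    )


def pairwise_differences(consensus_by_farm):
    """Return {(farm_a, farm_b): nt differences} for every pair of farms."""
    return {
        (farm_a, farm_b): nt_differences(
            consensus_by_farm[farm_a], consensus_by_farm[farm_b]
        )
        for farm_a, farm_b in combinations(sorted(consensus_by_farm), 2)
    }


# Hypothetical farms with invented toy sequences (real genomes are ~8300 nt).
toy = {
    "farm_A": "ACGTACGTACGT",
    "farm_B": "ACGTACGTACGA",
    "farm_C": "ACGAACGTNCGA",
}
distances = pairwise_differences(toy)
for pair, n_diff in distances.items():
    print(pair, n_diff)
```

With real data, a matrix of this kind is the starting point for a network or tree; pairs separated by only a few changes are candidates for direct links.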

Page 11

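Resolution at the level of individual-to-individual transmission (cattle)

[Figure: network of sequences from an experimental infection chain in cattle, starting from IAH2*; sample labels include B1.2D.V, B2.6D.V, B3.3D.V, B4.9D.V, B5.9D.V, B2.2D.P, B2.4D.P, B2.6D.P and B2.32D.P; branches annotated with 1, 2 or 3 changes]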

Juleff et al., (2013) J. Gen. Virol.

• Experimental infection chain

• Full genome sequences can resolve down to individual transmission events (direct and indirect contact)

• However, interpretation can be sensitive to the particular sample type analysed (acute vs carrier samples)

Page 12

Impact of within-herd genetic variation upon inferred transmission trees

• 45 complete genomes from UK 2007

• Challenging data set due to long branch length on one farm

• Random selection of a single sequence from each farm

• 6% of tree topologies were identical

• 85% of tree topologies differed by only one edge

• Cost effective approach

Valdazo-Gonzalez et al (2015) Infec. Gen. Evol.

[Figure: histogram; x-axis: distance, y-axis: frequency]

Page 13

New era for sequencing?

• MiSeq (Illumina)

• Bench-top platform

• At The Pirbright Institute, located “inside” containment

• 7GB/run (up to 250 nt paired-end reads)

• High Q-score for data quality

Brown and Underwood (1982) Characterisation of Danish and German FMDV isolates using ribonuclease T1 fingerprinting
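For readers unfamiliar with the Q-score mentioned above: a Phred quality score Q corresponds to a base-call error probability p through Q = -10 log10(p), so Q30 means one expected error in 1000 calls. A minimal sketch of the conversion follows; the function names are illustrative, and the decoder assumes the Phred+33 FASTQ encoding.

```python
import math


def phred_to_error_probability(q):
    """Error probability implied by a Phred quality score."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Phred quality score implied by an error probability."""
    return -10 * math.log10(p)


def decode_phred33(quality_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality_string]


for q in (10, 20, 30, 40):
    print(f"Q{q}: error probability {phred_to_error_probability(q):.4g}")
```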

Page 14

EpiSeq project

Epi-Seq aims to exploit NGS technologies to:

• Generate improved tools for use in real-time monitoring of epidemics

• Collaborators: Belgium, Germany, UK, Italy and Sweden (and Denmark)

Target important RNA/DNA viruses:

• Causing epidemic disease (FMDV/AIV)

• Causing endemic disease (CSFV)

• 2 DNA viruses (ASFV and Poxviruses)

Results will bring novel insights into:

• Field epidemiology: monitor trans-boundary movements

• Evolutionary ecology: genetic determinants underpinning phenotypic traits

Page 15

NGS (Illumina) protocol for FMDV

[Figure: coverage per site along the genome]

• PCR-free protocol

o Eliminates requirements for extensive primer panels

• Can be applied to “any” RNA virus with a poly(A) tail

• High coverage suitable for consensus and deep-sequencing

• Multiplexing (up to 96 samples/run) is possible

Logan et al. (2014) BMC Genomics
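Coverage per site, as plotted on this slide, is the number of reads overlapping each genome position. The sketch below shows one simple way to compute it from read alignment intervals; it is not taken from the cited protocol, and the function name and the 0-based, end-exclusive `(start, end)` interval format are assumptions made for illustration.

```python
def coverage_per_site(genome_length, read_intervals):
    """Return a list with the read depth at each genome position.

    read_intervals: iterable of (start, end) alignment coordinates,
    0-based with end exclusive. Parts of reads that fall outside the
    genome are clipped.
    """
    # Record +1 where a read starts and -1 where it ends, then take a
    # running sum; this avoids looping over every base of every read.
    delta = [0] * (genome_length + 1)
    for start, end in read_intervals:
        start = max(start, 0)
        end = min(end, genome_length)
        if start < end:
            delta[start] += 1
            delta[end] -= 1

    coverage = []
    depth = 0
    for change in delta[:genome_length]:
        depth += change
        coverage.append(depth)
    return coverage


# Toy example: a 10-nt "genome" and three invented reads.
print(coverage_per_site(10, [(0, 5), (3, 8), (7, 12)]))
```

In practice the intervals would come from an alignment file, and low-coverage regions would be flagged before calling a consensus.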

Page 16

NGS application: UK 2001 FMD Outbreak

• First case 20/02/01 – Abattoir, Essex

• >2000 infected farms

• ~7 month period

• Epidemiological links between local farms are not well understood

• Sequencing of representative viruses from an archive from ~1500 farms is underway

Page 17

Generating transmission trees: limitations and challenges

Considering a simple TCS tree:

[Figure: TCS tree linking Source, Farm A, Farm B and Farm C, with the transmission trees inferred from it]

Assigning ownership to ancestral nodes

– cul-de-sacs where sequenced material does not normally represent the material that is transmitted to a down-stream farm

Consequence: Sequence-based analysis is compatible with a large number of fine-scale transmission trees
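To give a feel for how quickly the number of candidate trees grows, the sketch below enumerates every way of assigning an infector to each farm such that all farms trace back to a single known source. It applies no sequence or epidemiological constraints, so it is only an upper bound on the trees a real analysis would consider; the function names are hypothetical and the farm labels are taken from the figure.

```python
from itertools import product


def reaches_source(farm, infector_of, source):
    """True if following infectors from `farm` ends at `source` without a cycle."""
    seen = set()
    node = farm
    while node != source:
        if node in seen:
            return False
        seen.add(node)
        node = infector_of[node]
    return True


def enumerate_transmission_trees(source, farms):
    """Yield {farm: infector} dicts for every tree rooted at `source`."""
    nodes = [source] + list(farms)
    # Each farm may be infected by the source or by any other farm.
    choices = [[n for n in nodes if n != farm] for farm in farms]
    for infectors in product(*choices):
        infector_of = dict(zip(farms, infectors))
        if all(reaches_source(farm, infector_of, source) for farm in farms):
            yield infector_of


trees = list(enumerate_transmission_trees("Source", ["Farm A", "Farm B", "Farm C"]))
print(len(trees))  # 16 trees for three farms and one source
for tree in trees[:3]:
    print(tree)
```

Brute-force enumeration like this is only feasible for a handful of farms, which is one reason additional data are needed to narrow the set.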

Page 18

Linking datasets: using field-epidemiological data as a framework for the sequence-based trees

SEQUENCE DATA

• Relationship between sequences

FIELD EPI DATA

• Date of cull

• Location

• Number of animals

• Est. age of oldest lesion

• Proposed epidemiological links with other cases – tracing exercises

Page 19

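Integrating temporal data: reduces the number of plausible transmission trees

Cottam et al. (2008) Proc. Roy. Soc. B

[Figure: timeline for farm i from the start of the outbreak to the day of cull, showing the infection-date distribution I_i(t) with its mode marked (*) and the incubation-period distribution L(k), with 2 d, 5 d and 12 d marked]

I_i(t): probability that the ith farm was infected at time t (discrete beta distribution)

* Most likely infection date (mode): date of confirmation minus age of oldest lesion minus 5 d incubation

L(k): probability of incubation for k days prior to becoming infectious (gamma distribution, 95% probability between 2 and 12 days)

Probability that the ith farm is a source of infection at time t (C_i is taken here to be the day of cull of farm i; the inner sum is empty when j ≥ t):

\[ F_i(t) = \sum_{j=0}^{C_i} \sum_{k=1}^{t-j} I_i(j)\, L(k) \]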

• 20 farms from the 2001 epidemic

• Within a single cluster

• Full-genome sequence data alone were consistent with >41,000 transmission trees

• Simple “Infected” and “Infectious” windows applied

• 4 trees represent >95% of total likelihood
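The probability that a farm is a source of infection at time t combines the infection-date distribution I_i with the incubation distribution L: the farm must have been infected on some day j and have completed an incubation of at most t - j days. The sketch below evaluates that double sum for toy inputs. The two probability tables are invented for illustration (they are not the beta and gamma distributions fitted in the cited paper), the function name is hypothetical, and the sketch does not model the farm ceasing to be infectious after cull.

```python
def infectious_probability(infection_pmf, incubation_pmf, t):
    """F_i(t) = sum_j I_i(j) * sum_{k=1}^{t-j} L(k).

    infection_pmf[j]  : I_i(j), probability farm i was infected on day j
                        (j = 0 .. C_i, the last index being the cull day)
    incubation_pmf[k] : L(k), probability of a k-day incubation
                        (index 0 is unused and should be 0)
    """
    total = 0.0
    for j, p_infected in enumerate(infection_pmf):
        # Incubation must be complete by day t, so k runs from 1 to t - j.
        max_k = min(t - j, len(incubation_pmf) - 1)
        p_incubated = sum(incubation_pmf[1:max_k + 1]) if max_k >= 1 else 0.0
        total += p_infected * p_incubated
    return total


# Invented toy distributions; each sums to 1.
toy_infection = [0.0, 0.2, 0.5, 0.3]                       # days 0..3
toy_incubation = [0.0, 0.0, 0.1, 0.2, 0.3, 0.2, 0.1, 0.1]  # k = 0..7 days

for day in range(0, 11):
    f = infectious_probability(toy_infection, toy_incubation, day)
    print(day, round(f, 3))
```

Multiplying such per-farm windows along each candidate tree is one way to weight trees against each other, which is how a large set of sequence-compatible trees can collapse to a few likely ones.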

Page 20

Using evolutionary rates?

Wright et al., (2014) Infec. Gen. Evol.

[Figure: number of nt substitutions plotted against outbreak timeline (days)]

• Remarkably consistent substitution rate: ~2 × 10^-5 nt substitutions/site/day

• Retrospective analysis of the 1968-9 epidemic in the UK

• Evolutionary standstill for some viruses indicative of role of fomites (transmission via inanimate objects)?
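As a rough worked example of what this rate implies, the sketch below converts it to expected substitutions per genome, using the ~8300 nt genome length given earlier in the talk. The probability of seeing no change at all over a given interval is then computed under the added assumption that substitutions accumulate as a Poisson process; that assumption, and the function names, are mine rather than the cited paper's.

```python
import math

GENOME_LENGTH_NT = 8300        # approximate genome length from the talk
RATE_PER_SITE_PER_DAY = 2e-5   # approximate rate quoted on the slide


def expected_substitutions(days,
                           rate=RATE_PER_SITE_PER_DAY,
                           genome_length=GENOME_LENGTH_NT):
    """Expected number of nt substitutions per genome after `days` days."""
    return rate * genome_length * days


def probability_no_change(days):
    """P(zero substitutions in `days` days), assuming a Poisson process."""
    return math.exp(-expected_substitutions(days))


for days in (1, 7, 14, 28):
    print(
        f"{days:>2} d: expected {expected_substitutions(days):.2f} nt, "
        f"P(no change) = {probability_no_change(days):.3f}"
    )
```

Under these assumptions roughly one substitution per genome is expected every six days, so identical genomes sampled weeks apart would be unexpected, consistent with the slide's suggestion that a standstill may point to fomite transmission.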

Page 21

Bayesian models: reconstructing transmission trees from different datasets

Morelli et al, (2012) PLoS Comp. Biol.

• Framework leading to a Bayesian inference scheme that combines genetic and epidemiological data

• Based on a dynamic model of pathogen transmission between source and receptor premises

• Accommodates spatial (GIS) and temporal data

• Work in progress!

Page 22

Applications: estimating what we do not see?

• Assuming:

o polymerase error rates are clock-like and can be estimated

o population structures can be modelled

• Can we use the differences between sequences to model the extent of un-sampled sequences (“dark matter”) between samples received for analysis?

• At the regional scale, can this be used as a proxy for FMD prevalence (calibrated with real data)?

Andrew Rambaut, University of Edinburgh

Page 23

[Figure: “dark” matter (un-sampled sequences) shown at two scales, intra-host viral pathways (with transmission to another animal) and regional pathways]

Page 24

Summary and future priorities

• Future will deliver new platforms and increased capacity to generate sequence data

• Requires close relationships between molecular virologists, bioinformaticians and informaticians

• Improved pipelines to reduce process-error and/or development of models to accommodate error in our data

• Approaches to translate genetic relationships into transmission trees (also using epidemiological data)

• Reliable (statistical) measures of the likelihood of transmission links

Page 25

Acknowledgements:

• The FMD Reference Laboratory

• Nick Juleff

• David Paton

• Jan Kim

• John Hammond

• Partners on EpiSeq project

Work supported by:

Photo courtesy of HDR Architecture, Inc.; © 2014 James Brittain