Donald King
Vesicular Disease Reference Laboratory Group
Linking genetic and epidemiological datasets: the challenges of reconstructing transmission trees for livestock viral diseases
FMD and SVD Reference Laboratories
BBSRC National Virology Centre: The Plowright Building
• 2015: Occupied new high containment laboratory
• Houses all work with “live” FMD and International Reference Laboratories for FMD, BT, PPR, ASF, AHS, Capripox
Foot-and-Mouth Disease Virus
• Family Picornaviridae, genus Aphthovirus
• Causes a highly contagious disease of cloven-hoofed livestock (cattle, sheep, goats and pigs)
• Positive-sense, single-stranded RNA genome of ~8,300 nt
• 7 serotypes (O, A, C, SAT1, SAT2, SAT3 and Asia-1), including numerous subtypes
• VP1 sequence data widely used for strain characterisation
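[Figure: FMDV genome organisation (scale 0–8 kb). 5′UTR with poly(C) tract; polyprotein L | 1A 1B 1C 1D (capsid proteins VP4, VP2, VP3, VP1) | 2A 2B 2C | 3A 3B 3C 3D; 3′UTR and poly(A) tail. Labelled functions: L protease, 2A carboxy-terminal self-cleaving, 2C NTP binding, 3A membrane-binding, 3B genome-linked protein (VPg), 3C protease, 3D polymerase. Primary cleavages by L, 2A and 3C; secondary cleavages by 3C, plus the 1B cleavage marked "RNA?"]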
Why do we sequence FMDV?
• Multiple virus serotypes/topotypes/strains
• Monitoring global patterns of virus distribution
• Tracing sources of outbreaks (who-infected-who?)
• Early recognition of the emergence of new lineages
• Antigenic prediction and vaccine selection
2015: Changing epidemiological patterns
Outbreaks reported to the OIE (change of epidemiological status):
http://www.oie.int/wahid-prod/public.php?page=home
[Map: serotype O in North Africa; multiple lineages in East Asia]
[Figure: phylogenetic tree of serotype O sequences (scale bar 0.02; bootstrap values at nodes) placing 2013–2014 isolates from Libya, Tunisia, Algeria, Saudi Arabia, UAE, Nepal, Bhutan and India (PD-FMD) in the Ind-2001 lineage, alongside reference strains (e.g. O/IND/R2/75, O1/Manisa/TUR/69), the PanAsia and PanAsia-2 lineages, and the topotypes ME-SA, SEA, EA-1 to EA-4, WA, CATHAY, ISA-1, ISA-2 and EURO-SA]
Example: Molecular Epidemiology
• O/ME-SA/Ind-2001: FMD outbreaks in Libya, Tunisia and Algeria
• New lineage introduced into North Africa
• Source: Indian sub-continent
• Increased onward threats to Morocco and Europe?
Knowles et al., (2015) TED
Limitations of VP1 sequence data
• VP1 nucleotide sequences can be used to provide evidence to support transboundary movements of FMDV
• Useful for regional and country-level epidemiology
• VP1 not typically useful to resolve transmission trees within outbreak clusters (relationships between infected farms)
• Can we use full genome sequence data to increase the resolution of analysis and trace the spread of FMDV during an outbreak?
• Sequencing improvements make rapid full-genome sequencing achievable (Sanger methods, NGS approaches)
Acknowledgements (complete genome sequencing work):
Caroline Wright
Guido König
Begoña Valdazo-González
Dan Haydon
Eleanor Cottam
Nick Knowles
Marco Morelli
Richard Orton
Graham Freimanis
David King
Kasia Bankowska
Antonello Di Nardo
Müge Fırat-Saraç
Faizah Hamid
Jemma Wadsworth
Jan Kim
Increased number of informative sites: retrospective analysis of the 2001 outbreak in the UK
Cottam et al., (2006), J. Virol.
[Figure: positions of nucleotide substitutions along the genome (genome position 0–8,000 nt; regions L, VP4, VP2, VP3, VP1, 2A, 2B, 2C, 3A, 3B, 3C, 3D, poly(A))]
• Analysis of 23 complete genome sequences (consensus)
• 197 sites with nt substitutions
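Counting the sites that carry nucleotide substitutions across a set of consensus genomes amounts to scanning an alignment column by column. A minimal Python sketch, assuming the genomes are already aligned to equal length; the toy sequences are hypothetical, and ambiguous bases (N) and gaps are simply ignored rather than modelled:

```python
def variable_sites(alignment, ignore="N-"):
    """Return 0-based positions where aligned sequences carry more than one nucleotide."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    sites = []
    for pos in range(length):
        # Distinct bases in this column, skipping ambiguity codes and gaps
        bases = {seq[pos].upper() for seq in alignment} - set(ignore)
        if len(bases) > 1:
            sites.append(pos)
    return sites


# Hypothetical consensus sequences from three farms
genomes = ["ACGTACGTAC", "ACGTACGTTC", "ACCTACGTAN"]
print(variable_sites(genomes))  # [2, 8]
```

Position 9 is not reported because the only disagreement there is an N, which the sketch treats as missing data.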
Practical uses of full-genome sequences: farm-to-farm level resolution (UK 2007)
• Discriminates between viruses recovered from infected farms
• Data were provided rapidly (in real time) to support the UK eradication programme
Cottam et al (2008) PLoS Pathogens
[Figure: TCS representation of sequences recovered from farms (IAH2, IP1b(1), IP1b(2), IP2b, IP2c, IP3b, IP3c, IP4b, IP5, IP6b, IP7, IP8 and MAH), with putative intermediates]
Expected changes for each farm-to-farm transmission link: 4.3 ± 2.1 nt for 2001 (Cottam et al., (2008) Proc. Roy. Soc. B)
• Provided evidence for the existence of IP5 (a farm with FMD-seropositive cattle and sheep), which bridges the gap between the two phases of the outbreak
Resolution at the individual-to-individual level (cattle)
[Figure: transmission chain linking sequences from individual cattle (e.g. B1.2D.V, B2.6D.V, B2.4D.P, B2.32D.P, B3.3D.V, B4.9D.V, B5.9D.V) to IAH2*]
Juleff et al., (2013) J. Gen. Virol.
• Experimental infection chain
• Full genome sequences can resolve down to individual transmission events (direct and indirect contact)
• However, interpretation can be sensitive to the particular sample type analysed (acute vs carrier samples)
Impact of within-herd genetic variation upon inferred transmission trees
• 45 complete genomes from UK 2007
• Challenging data set due to long branch length on one farm
• Random selection of a single sequence from each farm
• 6% of tree topologies were identical
• 85% of tree topologies differed by only one edge
• Cost-effective approach
Valdazo-González et al (2015) Infec. Gen. Evol.
[Figure: histogram of frequency against distance]
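The resampling experiment above can be sketched as: pick one sequence at random per farm, build a tree, and compare edge sets between replicates. This Python sketch uses a minimum spanning tree on Hamming distances (Prim's algorithm) as a stand-in tree-building method; that choice is an assumption, not necessarily the method used by Valdazo-González et al., and the farm names and sequences are hypothetical.

```python
import random
from itertools import combinations


def hamming(a, b):
    """Number of differing positions between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def mst_edges(seqs):
    """Prim's algorithm on the complete graph of farms; edges are frozensets of farm names."""
    names = sorted(seqs)
    in_tree = {names[0]}
    edges = set()
    while len(in_tree) < len(names):
        # Cheapest edge from the tree to a farm not yet in it (ties broken by name order)
        _dist, u, v = min(
            (hamming(seqs[u], seqs[v]), u, v)
            for u in in_tree
            for v in names
            if v not in in_tree
        )
        in_tree.add(v)
        edges.add(frozenset((u, v)))
    return edges


def resample_topologies(farm_seqs, replicates, seed=0):
    """One tree per replicate, each built from a single randomly chosen sequence per farm."""
    rng = random.Random(seed)
    trees = []
    for _ in range(replicates):
        picked = {farm: rng.choice(seqs) for farm, seqs in farm_seqs.items()}
        trees.append(mst_edges(picked))
    return trees


def edge_differences(trees):
    """For each pair of replicate trees, the number of edges in one but not the other."""
    return [len(a - b) for a, b in combinations(trees, 2)]


# Hypothetical farms, one with within-herd variation
farm_seqs = {"IP1": ["AAAA", "AAAT"], "IP2": ["AATA"], "IP3": ["ATTA"]}
diffs = edge_differences(resample_topologies(farm_seqs, replicates=20))
print(diffs.count(0) / len(diffs), diffs.count(1) / len(diffs))
```

The two printed shares correspond to "identical" and "differed by only one edge"; because every spanning tree on the same farms has the same number of edges, the count of edges missing from one tree equals the count missing from the other.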
New era for sequencing?
• MiSeq (Illumina)
• Bench-top platform
• At The Pirbright Institute, located "inside" containment
• 7 Gb/run (up to 250 nt paired-end reads)
• High Q-scores (data quality)
Brown and Underwood (1982): characterisation of Danish and German FMDV isolates using ribonuclease T1 fingerprinting
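As a back-of-the-envelope check on what ~7 × 10⁹ bases per run means for FMDV, the output can be divided across a multiplexed run and the ~8,300 nt genome. The sketch below assumes the 96-sample maximum quoted for the FMDV NGS protocol, an even split of output between samples, and a hypothetical fraction of reads that derive from the virus (host RNA usually dominates, so this fraction is a placeholder to be measured, not a known value):

```python
def mean_genome_coverage(run_bases, n_samples, genome_length, viral_fraction):
    """Expected mean depth per genome if run output is split evenly across samples."""
    bases_per_sample = run_bases / n_samples
    viral_bases = bases_per_sample * viral_fraction
    return viral_bases / genome_length


# ~7e9 bases per run, 96 samples, ~8,300 nt genome
upper_bound = mean_genome_coverage(7e9, 96, 8300, viral_fraction=1.0)
# Hypothetical: only 1% of reads derive from the virus
with_host = mean_genome_coverage(7e9, 96, 8300, viral_fraction=0.01)
print(round(upper_bound), round(with_host))  # 8785 88
```

Even under the pessimistic assumption, mean depth stays well above what a consensus call needs, which is the practical argument for multiplexing.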
EpiSeq project
EpiSeq aims to exploit NGS technologies to:
• Generate improved tools for use in real-time monitoring of epidemics
• Collaborators: Belgium, Germany, UK, Italy and Sweden (and Denmark)
Target important RNA/DNA viruses:
• Causing epidemic disease (FMDV/AIV)
• Causing endemic disease (CSFV)
• 2 DNA viruses (ASFV and poxviruses)
Results will bring novel insights into:
• Field epidemiology: monitoring trans-boundary movements
• Evolutionary ecology: genetic determinants underpinning phenotypic traits
NGS (Illumina) protocol for FMDV
[Figure: coverage per site across the genome]
Logan et al, (2014) BMC Genomics
• PCR-free protocol
  o Eliminates the requirement for extensive primer panels
• Can be applied to "any" RNA virus with a poly(A) tail
• High coverage suitable for consensus and deep sequencing
• Multiplexing (up to 96 samples/run) is possible
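Calling a consensus from high-coverage reads comes down to tallying bases at each genome position and taking the majority where depth is sufficient. A minimal Python sketch, assuming reads have already been mapped and are given as hypothetical (start position, aligned bases) pairs without indels; the minimum depth of 3 is an arbitrary illustrative threshold, not a value from the protocol:

```python
from collections import Counter


def pileup(reads, genome_length):
    """Tally the bases observed at each position from (start, bases) read alignments."""
    counts = [Counter() for _ in range(genome_length)]
    for start, bases in reads:
        for offset, base in enumerate(bases):
            pos = start + offset
            if 0 <= pos < genome_length:
                counts[pos][base] += 1
    return counts


def consensus(counts, min_depth=3):
    """Majority base per position; 'N' where coverage is below min_depth."""
    seq = []
    for column in counts:
        depth = sum(column.values())
        if depth < min_depth:
            seq.append("N")
        else:
            seq.append(column.most_common(1)[0][0])
    return "".join(seq)


reads = [(0, "ACGT"), (0, "ACGA"), (1, "CGT"), (2, "GTT")]
counts = pileup(reads, 5)
print([sum(c.values()) for c in counts])  # [2, 3, 4, 4, 1]
print(consensus(counts))  # NCGTN
```

For deep sequencing, the same per-site `Counter` objects give minority-variant frequencies directly (here the A at position 3 is present at 1/4).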
NGS application: UK 2001 FMD Outbreak
• First case 20/02/01: abattoir, Essex
• >2000 infected farms
• ~7-month period
• Epidemiological links between local farms are not well understood
• Sequencing of representative viruses from an archive of ~1500 farms is underway
Generating transmission trees: limitations and challenges
• Assigning ownership to ancestral nodes
– cul-de-sacs, where the sequenced material does not normally represent the material that is transmitted to a downstream farm
• Consequence: sequence-based analysis is compatible with a large number of fine-scale transmission trees
Considering a simple TCS tree:
[Figure: TCS tree of sequences from Source Farm A, Farm B and Farm C, alongside the transmission trees inferred from it]
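To see how quickly candidate transmission trees multiply, they can be enumerated directly: every farm other than the source picks one infector, and only assignments that lead back to the source without cycles are kept. A minimal Python sketch under those assumptions (each farm infected exactly once, source fixed, no sequence constraint applied), using the three farms from the example:

```python
from itertools import product


def transmission_trees(farms, source):
    """Enumerate infector assignments that form a tree rooted at the source farm."""
    others = [f for f in farms if f != source]
    # Each non-source farm may be infected by any other farm
    choices = [[g for g in farms if g != f] for f in others]
    trees = []
    for parents in product(*choices):
        infector = dict(zip(others, parents))
        if all(_reaches_source(f, infector, source) for f in others):
            trees.append(infector)
    return trees


def _reaches_source(farm, infector, source):
    """Follow infectors upwards; revisiting a farm means a cycle, so not a tree."""
    seen = set()
    while farm != source:
        if farm in seen:
            return False
        seen.add(farm)
        farm = infector[farm]
    return True


for tree in transmission_trees(["Farm A", "Farm B", "Farm C"], "Farm A"):
    print(tree)
```

Without any constraint the count follows Cayley's formula, n^(n−2) for n farms: 3 trees for 3 farms, 16 for 4, and 20^18 (about 2.6 × 10^23) for 20. Sequence and temporal data are what prune this space.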
Linking datasets: using field-epidemiological data as a framework for the sequence-based trees
• Sequence data: relationship between sequences
• Field epi data: date of cull; location; number of animals; estimated age of oldest lesion; proposed epidemiological links with other cases (tracing exercises)
Cottam et al, (2008) Proc. Roy. Soc. B
[Figure: timeline from the start of the outbreak to the day of cull, marking the most likely infection date (mode)*, 5 d, and the incubation window between 2 d and 12 d]
* Most likely infection date (mode) = date of confirmation minus age of oldest lesion minus 5 d incubation
$I_i(t)$: probability that the ith farm was infected at time t (discrete beta distribution)
$L(k)$: probability of incubation for k days prior to becoming infectious (gamma distribution, 95% probability between 2 and 12 days)
Probability that the ith farm is a source of infection at time t:
$$F_i(t) = \sum_{j=0}^{C_i} I_i(j) \sum_{k=1}^{t-j} L(k)$$
where $C_i$ is the day of cull of the ith farm.
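The source probability can be evaluated by combining the infection-date distribution with the cumulative incubation distribution, reading the expression as F_i(t) = Σ_{j=0}^{C_i} I_i(j) Σ_{k=1}^{t−j} L(k). This Python sketch assumes I_i and L are supplied as discrete day-indexed probabilities; the toy values are hypothetical, not fitted beta and gamma distributions. `most_likely_infection_day` applies the mode rule described above (confirmation date minus age of oldest lesion minus 5 d):

```python
def most_likely_infection_day(confirmation_day, oldest_lesion_age, incubation=5):
    """Mode of the infection-date distribution: confirmation minus lesion age minus incubation."""
    return confirmation_day - oldest_lesion_age - incubation


def source_probability(t, infection_probs, incubation_probs, cull_day):
    """F_i(t): probability that farm i has become infectious (a potential source) by day t.

    infection_probs[j]  -- I_i(j), probability the farm was infected on day j
    incubation_probs[k] -- L(k), probability it becomes infectious k days after infection
    """
    total = 0.0
    for j in range(min(cull_day, len(infection_probs) - 1) + 1):
        # Probability that incubation has finished by day t (k = 1 .. t - j)
        cumulative_l = sum(incubation_probs.get(k, 0.0) for k in range(1, t - j + 1))
        total += infection_probs[j] * cumulative_l
    return total


# Hypothetical farm: infected on days 0-3, culled on day 3; incubation of 1 or 2 days
I = [0.0, 0.2, 0.5, 0.3]
L = {1: 0.5, 2: 0.5}
print([round(source_probability(t, I, L, cull_day=3), 2) for t in range(5)])  # [0.0, 0.0, 0.1, 0.45, 0.85]
print(most_likely_infection_day(20, 3))  # 12
```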
Integrating temporal data reduces the number of plausible transmission trees
• 20 farms from the 2001 epidemic
• Within a single cluster
• Full-genome sequence data alone were consistent with >41,000 transmission trees
• Simple "infected" and "infectious" windows applied
• 4 trees represent >95% of total likelihood
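Two steps in this analysis lend themselves to a short sketch: rejecting links where the source's "infectious" window does not overlap the recipient's "infected" window, and ranking the surviving trees to find how few carry 95% of the total likelihood. In this Python sketch the windows are inclusive (first day, last day) pairs, and the farms, windows and tree likelihoods are all hypothetical, not values from the 2001 analysis:

```python
def windows_overlap(infectious_window, infected_window):
    """True if the source's infectious window overlaps the recipient's infected window."""
    (s_start, s_end), (r_start, r_end) = infectious_window, infected_window
    return s_start <= r_end and r_start <= s_end


def tree_is_plausible(tree, infectious, infected):
    """tree maps each recipient farm to its source; every link must pass the window test."""
    return all(windows_overlap(infectious[src], infected[dst]) for dst, src in tree.items())


def trees_for_mass(likelihoods, mass=0.95):
    """Smallest number of top-ranked trees whose normalised likelihood reaches `mass`."""
    total = sum(likelihoods)
    cumulative = 0.0
    for n, value in enumerate(sorted(likelihoods, reverse=True), start=1):
        cumulative += value / total
        if cumulative >= mass:
            return n
    return len(likelihoods)


infectious = {"A": (3, 10), "B": (12, 18)}
infected = {"B": (5, 8), "C": (9, 11)}
print(tree_is_plausible({"B": "A", "C": "A"}, infectious, infected))  # True
print(tree_is_plausible({"B": "A", "C": "B"}, infectious, infected))  # False
print(trees_for_mass([50, 30, 10, 6, 2, 1, 1]))  # 4
```

The second tree fails because farm B only becomes infectious after farm C's infected window has closed.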
Using evolutionary rates?
Wright et al., (2014) Infec. Gen. Evol.
[Figure: number of nt substitutions against outbreak timeline (days)]
• Remarkably consistent substitution rate of ~2 × 10⁻⁵ nt substitutions/site/day
• Retrospective analysis of the 1968-9 epidemic in the UK
• Evolutionary standstill for some viruses: indicative of a role for fomites (transmission via inanimate objects)?
Bayesian models: reconstructing transmission trees from different datasets
Morelli et al, (2012) PLoS Comp. Biol.
• Framework leading to a Bayesian inference scheme that combines genetic and epidemiological data
• Based on a dynamic model of pathogen transmission between source and receptor premises
• Accommodates spatial (GIS) and temporal data
• Work in progress!
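The core idea of combining the two data types can be illustrated, in a much simplified form, by weighting each candidate source for one infected farm by an epidemiological term and a genetic term. This Python sketch is not the Morelli et al. model. The epidemiological term is a hypothetical probability that the candidate was infectious when the recipient was infected. The genetic term is a Poisson probability of the observed nucleotide differences, with a mean set by the ~2 × 10⁻⁵ substitutions/site/day rate over ~8,300 nt and the days between samples. A uniform prior over candidates is assumed, and all inputs are hypothetical:

```python
import math


def poisson_pmf(k, lam):
    """Probability of k events under a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def source_posterior(candidates, rate=2e-5, genome_length=8300):
    """Normalised weights over candidate sources for a single recipient farm.

    candidates maps a source name to a tuple of
    (probability the source was infectious at the recipient's infection time,
     nucleotide differences between the two sequences,
     days between the two samples).
    """
    weights = {}
    for name, (p_infectious, differences, days_apart) in candidates.items():
        expected = rate * genome_length * max(days_apart, 1)  # avoid a zero Poisson mean
        weights[name] = p_infectious * poisson_pmf(differences, expected)
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


candidates = {"Farm A": (0.8, 2, 12), "Farm B": (0.9, 9, 10), "Farm C": (0.2, 1, 8)}
posterior = source_posterior(candidates)
print(max(posterior, key=posterior.get))  # Farm A
```

Farm B is temporally the best-placed source but is almost excluded genetically, which is the kind of conflict a joint model is meant to resolve.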
Applications: estimating what we do not see?
• Assuming:
  – polymerase error rates are clock-like and can be estimated
  – population structures can be modelled
• Can we use differences between sequences to model the extent of unsampled sequences ("dark matter") between the samples received for analysis?
• At the regional scale, can this be used as a proxy for FMD prevalence, calibrated with real data?
Andrew Rambaut, University of Edinburgh
[Figure: "dark" matter (unsampled sequences) along intra-host viral pathways, transmission to another animal, and regional pathways]
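One crude way to frame the "dark matter" question is to ask how many transmission links best explain the genetic distance between two sampled viruses. This Python sketch assumes, hypothetically, that each link adds an independent Poisson-distributed number of changes with mean 4.3 nt (the per-link expectation quoted for 2001); the number of unsampled intermediates is then one less than the most likely number of links. It ignores within-host diversity and sampling effects and is illustrative only:

```python
import math

MEAN_CHANGES_PER_LINK = 4.3  # expected nt changes per farm-to-farm link (2001 estimate)


def link_log_likelihood(distance, n_links, per_link=MEAN_CHANGES_PER_LINK):
    """Poisson log-likelihood (up to a constant) of the observed distance given n links."""
    lam = per_link * n_links
    return distance * math.log(lam) - lam


def most_likely_links(distance, max_links=20):
    """Number of transmission links (1..max_links) that best explains the distance."""
    return max(range(1, max_links + 1), key=lambda n: link_log_likelihood(distance, n))


def unsampled_intermediates(distance, max_links=20):
    """Estimated number of unsampled infected premises between two sampled viruses."""
    return most_likely_links(distance, max_links) - 1


print(unsampled_intermediates(13))  # 2
```

Summed over many sample pairs in a region, estimates like this are one possible route to the prevalence proxy raised above, provided they are calibrated against data.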
Summary and future priorities
• The future will deliver new platforms and increased capacity to generate sequence data
• Requires close relationships between molecular virologists, bioinformaticians and informaticians
• Improved pipelines to reduce process error and/or development of models to accommodate error in our data
• Approaches to translate genetic relationships into transmission trees (also using epidemiological data)
• Reliable (statistical) measures of the likelihood of transmission links
Acknowledgements:
• The FMD Reference Laboratory
• Nick Juleff
• David Paton
• Jan Kim
• John Hammond
• Partners on EpiSeq project
Work supported by:
Photo courtesy of HDR Architecture, Inc.; © 2014 James Brittain