Hetman immem xi final March 2016
![Page 1: Hetman immem xi final March 2016](https://reader037.fdocuments.in/reader037/viewer/2022103010/5885bafd1a28ab6f168b599f/html5/thumbnails/1.jpg)
IMMEM XI
Navigating Microbial Genomes: Insights from the Next Generation
9-12 March 2016, Estoril, Portugal
The EpiQuant framework for assessing genetic and epidemiologic concordance: towards improved use of genomic data in epidemiological applications
Ben Hetman 1,2; Steven Mutschall 1; Vic Gannon 1; James Thomas 2; and Eduardo Taboada 1
1 National Microbiology Laboratory at Lethbridge, Public Health Agency of Canada, Lethbridge AB, Canada
2 Department of Biological Sciences, University of Lethbridge, Lethbridge AB, Canada
![Page 2: Hetman immem xi final March 2016](https://reader037.fdocuments.in/reader037/viewer/2022103010/5885bafd1a28ab6f168b599f/html5/thumbnails/2.jpg)
2
[Bar chart: value per disease, with the post-correction estimate (*) in parentheses]
Campylobacteriosis: 29.3 (*447)
Salmonellosis: 19.67 (*269)
Giardiasis: 11.12 (*24)
Shigellosis: 3.08 (*4)
Verotoxigenic E. coli (VTEC): 1.94 (*39)
Cryptosporidiosis: 1.56 (*7)
Listeriosis: 0.36 (*.55)
Cyclosporiasis: 0.32 (*7.5)
* Post-correction estimate
Sources: Thomas et al. (2013), doi:10.1089/fpd.2012.1389; FoodNet Canada Short Report 2013
Campylobacter is a public health challenge
#1 bacterial gastrointestinal disease in Canada and a leading foodborne pathogen worldwide (300-500 million cases)
Self-limiting illness, highly under-reported, largely sporadic
![Page 3: Hetman immem xi final March 2016](https://reader037.fdocuments.in/reader037/viewer/2022103010/5885bafd1a28ab6f168b599f/html5/thumbnails/3.jpg)
3
The epidemiology of campylobacteriosis is daunting
Source: Julie Arsenault (PhD Thesis), papyrus.bib.umontreal.ca/jspui/handle/1866/4625
Widespread in “farm-to-fork” and “source-to-tap”
• high prevalence in most major livestock species
• found in many wild animal species, insects, surface waters
Difficult to establish sources of exposure and routes of transmission
Crisis = Opportunity
WGS to the rescue!!!
![Page 4: Hetman immem xi final March 2016](https://reader037.fdocuments.in/reader037/viewer/2022103010/5885bafd1a28ab6f168b599f/html5/thumbnails/4.jpg)
4
Can rapidly generate different clusters of isolates at an almost unlimited number of thresholds
E.g.: Do groups formed by genomic relationships agree with those formed by epidemiologic relationships?
- OR -
What is the optimal threshold for forming clusters that agree with epidemiologic relationships?
Genomic data…so many options for thresholding
WGS based analyses still require knowledge of the epidemiology to guide clustering of genomic data into “epidemiologically relevant clusters”
![Page 5: Hetman immem xi final March 2016](https://reader037.fdocuments.in/reader037/viewer/2022103010/5885bafd1a28ab6f168b599f/html5/thumbnails/5.jpg)
5
“Those who make many species are the 'splitters' and those who make few are the 'lumpers'…” Charles Darwin (1857)
Clustering thresholds have been with us forever…
Need to calibrate our analyses to ensure our results exploit the high resolution of WGS data while remaining epidemiologically relevant
![Page 6: Hetman immem xi final March 2016](https://reader037.fdocuments.in/reader037/viewer/2022103010/5885bafd1a28ab6f168b599f/html5/thumbnails/6.jpg)
6
Building a model for quantifying epidemiological similarity
“Essentially, all models are wrong, but some are useful.”
George E.P. Box (1919-2013)
![Page 7: Hetman immem xi final March 2016](https://reader037.fdocuments.in/reader037/viewer/2022103010/5885bafd1a28ab6f168b599f/html5/thumbnails/7.jpg)
7
How to relate epidemiologic and genomic clustering?
1. Adjusted Wallace Coefficient (AWC): Carriço et al. (Comparing Partitions)
The directional likelihood that two isolates clustered together by one method will also be grouped together by the second method
[Diagram: Strain 1 and Strain 2 shown in WGS clusters and in Epi clusters, linked by the AWC]
2. Intra-cluster cohesion (ICC)
A measure of the genomic and/or epidemiologic homogeneity of the isolates within a cluster
[Diagram: High ICC vs. Low ICC]
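As a minimal sketch of the first metric, the block below computes the Wallace coefficient and its chance-corrected form, AW = (W - Wi) / (1 - Wi), where Wi is the value expected if the two partitions were independent. The function names and the toy cluster labels are illustrative assumptions, not taken from the talk or from the Comparing Partitions tool.

```python
from collections import Counter
from math import comb


def wallace(a, b):
    """Wallace coefficient W(A->B): of the isolate pairs clustered together
    in partition A, the fraction that are also together in partition B."""
    pairs_a = sum(comb(n, 2) for n in Counter(a).values())
    if pairs_a == 0:
        raise ValueError("partition A has no multi-isolate clusters")
    pairs_ab = sum(comb(n, 2) for n in Counter(zip(a, b)).values())
    return pairs_ab / pairs_a


def adjusted_wallace(a, b):
    """Adjusted Wallace A->B: Wallace corrected for chance agreement."""
    n = len(a)
    # Expected Wallace under independence: probability that a random pair
    # of isolates shares a cluster in partition B.
    expected = sum(comb(k, 2) for k in Counter(b).values()) / comb(n, 2)
    if expected == 1:
        raise ValueError("partition B is a single cluster; AWC is undefined")
    return (wallace(a, b) - expected) / (1 - expected)


# Toy labels (hypothetical): six isolates, two ways of clustering them.
wgs = ["g1", "g1", "g1", "g2", "g2", "g2"]
epi = ["e1", "e1", "e2", "e2", "e3", "e3"]
awc_wgs_to_epi = adjusted_wallace(wgs, epi)
awc_epi_to_wgs = adjusted_wallace(epi, wgs)
```

The two directions give different values on the toy data, which is the "directional" property the slide refers to.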
![Page 8: Hetman immem xi final March 2016](https://reader037.fdocuments.in/reader037/viewer/2022103010/5885bafd1a28ab6f168b599f/html5/thumbnails/8.jpg)
8
Comparing epidemiology vs. genomics
Need a model to assess strain to strain relationships based on isolate epidemiology so we can directly compare them against the WGS data
[Workflow diagram]
Isolate Selection
Genomics Workflow: Sequencing, Assembly, Annotation, In-Silico Typing (MIST, Core Analysis)
Epidemiology Workflow: Metadata Curation (Source, Location, Date), Quantify Epi-Similarities
Cluster Analysis & Analysis of concordance
![Page 9: Hetman immem xi final March 2016](https://reader037.fdocuments.in/reader037/viewer/2022103010/5885bafd1a28ab6f168b599f/html5/thumbnails/9.jpg)
The challenge with epidemiological data
Source, Spatial, Temporal
• Surveillance data is inherently less comprehensive than outbreak data
• Metadata is generally qualitative/categorical, not quantitative
![Page 10: Hetman immem xi final March 2016](https://reader037.fdocuments.in/reader037/viewer/2022103010/5885bafd1a28ab6f168b599f/html5/thumbnails/10.jpg)
Source, Spatial, Temporal
Establish a metric that summarizes the relationships between isolates based on basic epidemiologic metadata
Clustering of isolates based on epidemiological metadata
Our proposed approach: A model for quantifying epidemiological similarity between strains based on three primary factors: source, space, time
EpiSym = σ(source) + γ(geospatial) + τ(temporal)
Where
• σ = coefficient for Source
• γ = coefficient for Geospatial
• τ = coefficient for Temporal
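The weighted sum above can be sketched as a small function. Two things here are assumptions made for illustration and are not stated on the slide: that each component similarity is already scaled to the range 0 to 1, and that the three coefficients sum to 1. The function name `epi_sym` and the example numbers are likewise hypothetical.

```python
def epi_sym(source, geospatial, temporal, sigma=1 / 3, gamma=1 / 3, tau=1 / 3):
    """Weighted sum of the source, geospatial and temporal similarities.

    Assumption: every component lies in [0, 1] and the coefficients
    sigma, gamma and tau sum to 1, so the result also lies in [0, 1].
    """
    components = {"source": source, "geospatial": geospatial, "temporal": temporal}
    for name, value in components.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} similarity must lie in [0, 1], got {value}")
    if abs(sigma + gamma + tau - 1.0) > 1e-9:
        raise ValueError("coefficients are assumed to sum to 1")
    return sigma * source + gamma * geospatial + tau * temporal


# Hypothetical pair of isolates, with source weighted twice as heavily
# as each of the other two factors.
score = epi_sym(source=0.658, geospatial=1.0, temporal=0.5,
                sigma=0.5, gamma=0.25, tau=0.25)
```

Changing the coefficients shifts how much each factor drives the clustering, which is the tuning knob the model exposes.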
Building a model for epidemiological similarity
![Page 11: Hetman immem xi final March 2016](https://reader037.fdocuments.in/reader037/viewer/2022103010/5885bafd1a28ab6f168b599f/html5/thumbnails/11.jpg)
11
Spatial = [equation not preserved in the transcript]
Where
• dist_ab is given by the Haversine formula
Temporal = [equation not preserved in the transcript]
Where
• x, y = sampling dates
Quantifying epi-similarities: Spatial and Temporal
‘Spatial’ and ‘Temporal’ factors required for the EpiSym coefficient are relatively simple to build into the equation
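The raw inputs to those two factors can be sketched as follows: the Haversine great-circle distance between two sampling locations and the number of days between two sampling dates. How the talk converts these into similarity scores is not recoverable from the transcript, so the sketch stops at the distance and the date gap. The function names and the 6371 km mean Earth radius are assumptions.

```python
from datetime import date
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat_a, lon_a, lat_b, lon_b):
    """Great-circle distance in km between points a and b (decimal degrees)."""
    phi_a, phi_b = radians(lat_a), radians(lat_b)
    d_phi = phi_b - phi_a
    d_lambda = radians(lon_b - lon_a)
    h = sin(d_phi / 2) ** 2 + cos(phi_a) * cos(phi_b) * sin(d_lambda / 2) ** 2
    # min() guards against rounding pushing the square root just above 1.
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(h)))


def days_apart(x, y):
    """Absolute number of days between two sampling dates x and y."""
    return abs((y - x).days)


same_place = haversine_km(49.7, -112.8, 49.7, -112.8)
half_way_round = haversine_km(0.0, 0.0, 0.0, 180.0)
gap = days_apart(date(2016, 3, 9), date(2016, 3, 12))
```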
![Page 12: Hetman immem xi final March 2016](https://reader037.fdocuments.in/reader037/viewer/2022103010/5885bafd1a28ab6f168b599f/html5/thumbnails/12.jpg)
12
• Identify all available sources
• Identify core epidemiological attributes
• Assess each source independently and completely for each attribute
• Score the pairwise similarity between any two sources based on their shared epidemiological attributes
Quantifying Source-Source Similarities
$$\text{Source}_{\text{EpiSym}} = \frac{\sum {}^{*}(i + j)}{n}$$
Where
• i, j = two sources being compared
• *(i + j) = number of matching attributes
• n = maximum possible score
![Page 13: Hetman immem xi final March 2016](https://reader037.fdocuments.in/reader037/viewer/2022103010/5885bafd1a28ab6f168b599f/html5/thumbnails/13.jpg)
13
An example: ‘faecal cow’ vs. ‘retail chicken’
$$\text{Similarity} = \frac{\sum \text{Pairwise Matches}}{\text{Maximum Possible Score}} = \frac{12.5}{19} = 0.658$$
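A minimal sketch of this scoring, assuming each source is described by the same set of attributes and each attribute contributes one point when the two sources agree. The attribute names and values below are hypothetical toy data, not the rubric used in the talk. A fractional total such as 12.5 suggests the original rubric also allows partial matches; that rule is not given in the transcript, so it is not modelled here.

```python
def source_similarity(attrs_i, attrs_j):
    """Matching attributes divided by the maximum possible score.

    Assumption: both sources are scored on the same attributes and an
    exact match is worth one point, so the maximum score is the number
    of attributes.
    """
    if attrs_i.keys() != attrs_j.keys():
        raise ValueError("both sources must be scored on the same attributes")
    if not attrs_i:
        raise ValueError("at least one attribute is required")
    matches = sum(1 for key in attrs_i if attrs_i[key] == attrs_j[key])
    return matches / len(attrs_i)


# Hypothetical attribute sheets for two sources.
source_a = {"animal_origin": True, "food_product": False,
            "water": False, "retail": False}
source_b = {"animal_origin": True, "food_product": True,
            "water": False, "retail": True}
toy_similarity = source_similarity(source_a, source_b)

# The arithmetic from the slide example: 12.5 matches out of 19.
slide_example = round(12.5 / 19, 3)
```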
Once source similarity is quantified, we can compute overall EpiSym
We can systematically compute EpiSym across large datasets to form epi clusters, which can then be compared to genomic clusters using cluster concordance metrics
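One way to turn a pairwise EpiSym matrix into epi clusters is sketched below. The talk does not say which clustering algorithm it uses, so single-linkage grouping at a fixed similarity threshold is a stand-in chosen for brevity; the function name, the matrix and the thresholds are hypothetical.

```python
def threshold_clusters(sim, threshold):
    """Single-linkage clusters: isolates i and j end up in the same cluster
    when a chain of pairs with similarity >= threshold connects them.

    `sim` is a symmetric n x n matrix given as a list of lists.
    Returns one integer cluster label per isolate.
    """
    n = len(sim)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] >= threshold:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[max(root_i, root_j)] = min(root_i, root_j)

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]


# Hypothetical pairwise EpiSym values for four isolates.
epi_matrix = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
clusters_strict = threshold_clusters(epi_matrix, 0.7)
clusters_loose = threshold_clusters(epi_matrix, 0.25)
```

The resulting label lists are in the form that partition-comparison metrics such as the Adjusted Wallace Coefficient take as input.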
![Page 14: Hetman immem xi final March 2016](https://reader037.fdocuments.in/reader037/viewer/2022103010/5885bafd1a28ab6f168b599f/html5/thumbnails/14.jpg)
14
[Heatmap: epidemiological clusters A to P, with isolates grouped by source class (Clinical, Animal, Environmental)]
• Major clusters based on source factor
• Subclusters further refined by spatial and temporal factors
Results: epidemiological clustering of C. jejuni isolates
![Page 15: Hetman immem xi final March 2016](https://reader037.fdocuments.in/reader037/viewer/2022103010/5885bafd1a28ab6f168b599f/html5/thumbnails/15.jpg)
15
[Heatmap: epidemiological clusters A to P, with isolates grouped by source class (Clinical, Animal, Environmental); five regions are numbered 1 to 5]
Clusters of secondary heat correspond to isolates with similar geography and temporal data, but different sources
Results: epidemiological clustering of C. jejuni isolates
![Page 16: Hetman immem xi final March 2016](https://reader037.fdocuments.in/reader037/viewer/2022103010/5885bafd1a28ab6f168b599f/html5/thumbnails/16.jpg)
16
Calibrating WGS typing for epidemiologic investigations
We can identify the clusters obtained at varying thresholds and compare them to epidemiological clusters to look for ‘best-fit’
An advantage of WGS is the flexibility in thresholding that is possible
![Page 17: Hetman immem xi final March 2016](https://reader037.fdocuments.in/reader037/viewer/2022103010/5885bafd1a28ab6f168b599f/html5/thumbnails/17.jpg)
17
Calibrating WGS typing for epidemiologic investigations
[Plot: Genomic cluster homogeneity vs. Epidemiologic cluster homogeneity]
Calculate point of highest genomic cohesion while maintaining:
• Multi-isolate clusters
• High epidemiologic validity
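The calibration step can be sketched as a sweep over candidate thresholds. The talk does not give a formula for cohesion or a selection rule, so both are assumptions here: cohesion is taken as the mean pairwise similarity within multi-isolate clusters, and the chosen threshold is the one with the highest genomic cohesion among those whose epidemiologic cohesion reaches a minimum (`min_epi`, a hypothetical parameter). All data are toy values.

```python
from collections import defaultdict
from itertools import combinations


def multi_isolate_clusters(labels):
    """Group isolate indices by cluster label, keeping clusters of size >= 2."""
    groups = defaultdict(list)
    for index, label in enumerate(labels):
        groups[label].append(index)
    return [members for members in groups.values() if len(members) > 1]


def cohesion(labels, sim):
    """Mean pairwise similarity within multi-isolate clusters (assumed
    definition); None when every cluster is a singleton."""
    pairs = [sim[i][j]
             for members in multi_isolate_clusters(labels)
             for i, j in combinations(members, 2)]
    return sum(pairs) / len(pairs) if pairs else None


def best_threshold(partitions, genomic_sim, epi_sim, min_epi=0.7):
    """Return (threshold, genomic cohesion, epi cohesion) for the partition
    with the highest genomic cohesion that still meets min_epi."""
    best = None
    for threshold, labels in sorted(partitions.items()):
        g = cohesion(labels, genomic_sim)
        e = cohesion(labels, epi_sim)
        if g is None or e < min_epi:
            continue  # no multi-isolate clusters, or epi validity too low
        if best is None or g > best[1]:
            best = (threshold, g, e)
    return best


# Hypothetical similarity matrices for four isolates.
genomic = [[1.0, 0.95, 0.6, 0.5], [0.95, 1.0, 0.65, 0.55],
           [0.6, 0.65, 1.0, 0.9], [0.5, 0.55, 0.9, 1.0]]
epi = [[1.0, 0.9, 0.3, 0.2], [0.9, 1.0, 0.4, 0.3],
       [0.3, 0.4, 1.0, 0.8], [0.2, 0.3, 0.8, 1.0]]
# Genomic cluster labels at three hypothetical thresholds.
partitions = {0.5: [0, 0, 0, 0], 0.8: [0, 0, 1, 1], 0.99: [0, 1, 2, 3]}
chosen = best_threshold(partitions, genomic, epi)
```

In the toy data the loosest threshold lumps everything together and fails the epi requirement, the strictest leaves only singletons, and the middle threshold is selected.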
![Page 18: Hetman immem xi final March 2016](https://reader037.fdocuments.in/reader037/viewer/2022103010/5885bafd1a28ab6f168b599f/html5/thumbnails/18.jpg)
18
Epi vs. Genomic clustering: examining the outliers
Strains with similar epidemiology aren’t necessarily similar genomically (and vice-versa!)
By overlaying the two methods, we can identify clusters that group together significantly more strongly via genomic or epidemiologic relationships
“Epi-Clustering” “Genomic-Clustering”
![Page 19: Hetman immem xi final March 2016](https://reader037.fdocuments.in/reader037/viewer/2022103010/5885bafd1a28ab6f168b599f/html5/thumbnails/19.jpg)
19
White = high congruence
Green = stronger similarity via epi
Blue = stronger similarity via genotype
“Generalist genotype”
“Generalist source”
‘Generalist’ genotypes persist across many combinations of source, temporal and spatial parameters
‘Generalist’ reservoirs support the persistence of a broad range of genotypes
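The overlay and its colour legend can be sketched as an element-wise comparison of the two similarity matrices. This assumes both matrices are on the same 0 to 1 scale and uses a hypothetical `tolerance` to decide when the two methods count as congruent; neither detail is stated in the talk.

```python
def overlay(epi_sim, genomic_sim, tolerance=0.1):
    """Classify each isolate pair by which method rates it as more similar.

    white = high congruence (difference within tolerance)
    green = stronger similarity via epi
    blue  = stronger similarity via genotype
    """
    n = len(epi_sim)
    colours = []
    for i in range(n):
        row = []
        for j in range(n):
            diff = epi_sim[i][j] - genomic_sim[i][j]
            if abs(diff) <= tolerance:
                row.append("white")
            elif diff > 0:
                row.append("green")
            else:
                row.append("blue")
        colours.append(row)
    return colours


# Hypothetical similarity matrices for three isolates.
epi_example = [[1.0, 0.9, 0.2], [0.9, 1.0, 0.8], [0.2, 0.8, 1.0]]
genomic_example = [[1.0, 0.4, 0.25], [0.4, 1.0, 0.95], [0.25, 0.95, 1.0]]
colours = overlay(epi_example, genomic_example)
```

Runs of green or blue along a row are the kind of pattern the slide labels as a "generalist" source or genotype.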
Epi vs. Genomic clustering: examining the outliers
![Page 20: Hetman immem xi final March 2016](https://reader037.fdocuments.in/reader037/viewer/2022103010/5885bafd1a28ab6f168b599f/html5/thumbnails/20.jpg)
20
Summary
• We have developed a model to help guide our analysis of Campylobacter WGS data for practical public health purposes
• Systematic examination of the relationship between the genomic and epidemiological similarity of sets of isolates enables optimization of clustering for epidemiologic relevance
• Calculate point of highest genomic cohesion while maintaining high epidemiologic cohesion and multi-isolate clusters
• Interactive web application under development (Check it out!): https://hetmanb.shinyapps.io/EpiQuant/
![Page 21: Hetman immem xi final March 2016](https://reader037.fdocuments.in/reader037/viewer/2022103010/5885bafd1a28ab6f168b599f/html5/thumbnails/21.jpg)
21
Acknowledgements
People
• Supervisors: Ed Taboada + Jim Thomas
• Lab: Steven Mutschall (PHAC), Peter Kruczkiewicz (PHAC), Dillon Barker (PHAC/ULeth)
Funding
• University of Lethbridge
• Public Health Agency of Canada A-base
• Gov’t of Canada: Genomics Research and Development Initiative