Context is Everything: Integrating Genomics ...genepio.org/wp-content/uploads/2016/10/Gen...Context...

28
Context is Everything: Integrating Genomics, Epidemiological and Clinical Data Using GenEpiO On behalf of the GenEpiO Development Team Will Hsiao & Damion Dooley (BC Public Health Lab), Emma Griffiths & Fiona Brinkman (SFU) 1

Transcript of Context is Everything: Integrating Genomics ...genepio.org/wp-content/uploads/2016/10/Gen...Context...

Page 1: Context is Everything: Integrating Genomics ...genepio.org/wp-content/uploads/2016/10/Gen...Context is Everything: Integrating Genomics, Epidemiological and Clinical Data Using GenEpiO

Context is Everything: Integrating Genomics,

Epidemiological and Clinical Data Using GenEpiO

On behalf of the GenEpiO Development TeamWill Hsiao & Damion Dooley (BC Public Health Lab),

Emma Griffiths & Fiona Brinkman (SFU)

1

Page 2: Context is Everything: Integrating Genomics ...genepio.org/wp-content/uploads/2016/10/Gen...Context is Everything: Integrating Genomics, Epidemiological and Clinical Data Using GenEpiO

Sequencing & Bioinformatics

• Sequencing, Assembly Pipeline Parameters

• QA/QC Metrics• Tree Construction Details

Isolate Source

• Food, Clinical, Environment• Food category, Body Product• Dates, Location

Clinical and Epi Details

• Demographics• Host disease, Symptoms • Lab Test Results• Exposures

2

Genomic sequences don’t mean much without contextual information.

Page 3: Context is Everything: Integrating Genomics ...genepio.org/wp-content/uploads/2016/10/Gen...Context is Everything: Integrating Genomics, Epidemiological and Clinical Data Using GenEpiO

When requisitioning metadata, you need to anticipate the needs of downstream users.

Descriptive – Organized - Standardized

FAIR Principles of Data Management:

F – FindableA – AccessibleI – InteroperableR – Reusable

Published in Nature, March 2016 (BioScience experts, ELIXIR)

3

Page 4: Context is Everything: Integrating Genomics ...genepio.org/wp-content/uploads/2016/10/Gen...Context is Everything: Integrating Genomics, Epidemiological and Clinical Data Using GenEpiO

Collection of isolates, metadata, Result reporting

Lab Information Management

Systems

Sequence submission, Distribution of

sequences

Public Repositories

Downstream Users –Public Health agencies, Surveillance Programs, Researchers

Minimal Metadata Requirements in

Public Repositories

Downstream Food Safety and Public Health Activities

Requiring Integrated Contextual

Information:• Outbreak

Investigation• Source Attribution• Risk Assessment• More…

SampleSource

Metadata Silos Barriers to

Data Sharing

Minimal Metadata

RequirementsBarriers to

Data Integration

Variable Metadata Collection At

Source

Disconnect between Upstream Sequence

Providers and Needs of Downstream Users

• Information siloed• Data loss• Lack of Standardization

The metadata collected at the source affects information propagation and downstream use.

4

Page 5: Context is Everything: Integrating Genomics ...genepio.org/wp-content/uploads/2016/10/Gen...Context is Everything: Integrating Genomics, Epidemiological and Clinical Data Using GenEpiO

The

of Contextual Information

Isn’tSTANDARDIZED

•Free text, short hand, granularity, misspelling, paper format 5

Page 6: Context is Everything: Integrating Genomics ...genepio.org/wp-content/uploads/2016/10/Gen...Context is Everything: Integrating Genomics, Epidemiological and Clinical Data Using GenEpiO

When Words Can Mean Different Things.

Lack of standardization results in semantic ambiguity.

6

Page 7: Context is Everything: Integrating Genomics ...genepio.org/wp-content/uploads/2016/10/Gen...Context is Everything: Integrating Genomics, Epidemiological and Clinical Data Using GenEpiO

• Ontologies help resolve issues of taxonomy, granularity and specificity

• Reduces time consuming manual processing and mining

Leafy Greens “has_disposition” Transmission Vehicle (E.coli)

Spinach Lettuce

Endive “is_a” Lettuce TypeIcebergSpinaciaoleracea

Amaranthushybridus

S. Africa

7

Ontology, A Way of Structuring Information

• Standardized, well-defined hierarchy of terms

• Interconnected with logical relationships e.g. Endive “is_a” Lettuce type

Integrates Epi Exposure data

Page 8: Context is Everything: Integrating Genomics ...genepio.org/wp-content/uploads/2016/10/Gen...Context is Everything: Integrating Genomics, Epidemiological and Clinical Data Using GenEpiO

Ontology Acts Like A Mapping Tool.

• Humans AND computers can read it

• Mapping allows interoperability AND customization

• Facilitates data sharing, reproducibility

8http://eil.stanford.edu/supply_chain/case/mapping_example.JPG

Page 9: Context is Everything: Integrating Genomics ...genepio.org/wp-content/uploads/2016/10/Gen...Context is Everything: Integrating Genomics, Epidemiological and Clinical Data Using GenEpiO

Ontologies are different from program interfaces.

• How information is structured and linked • How information is presented to users

Ontologies: Interfaces:

9

Page 10: Context is Everything: Integrating Genomics ...genepio.org/wp-content/uploads/2016/10/Gen...Context is Everything: Integrating Genomics, Epidemiological and Clinical Data Using GenEpiO

10

Data standards and ontologies help sort data at the source.

Sorting at the source… benefits from clarity

MetadataChallenges:• Collection• Organization• Storage/Archiving

Page 11: Context is Everything: Integrating Genomics ...genepio.org/wp-content/uploads/2016/10/Gen...Context is Everything: Integrating Genomics, Epidemiological and Clinical Data Using GenEpiO

Prospective Metadata Collection is Easier and More Efficient than

Retrospective Retrieval.

11

Page 12: Context is Everything: Integrating Genomics ...genepio.org/wp-content/uploads/2016/10/Gen...Context is Everything: Integrating Genomics, Epidemiological and Clinical Data Using GenEpiO

12

Standardization of Contextual Information

Facilitate Reporting and Quality Control

• Reproducibility• Reproducibility• Reproducibility• Reproducibility

Page 13: Context is Everything: Integrating Genomics ...genepio.org/wp-content/uploads/2016/10/Gen...Context is Everything: Integrating Genomics, Epidemiological and Clinical Data Using GenEpiO

To develop a useful ontology, engaging the end users is your TOP priority.

Medical & Environmental Microbiologists

Bioinformaticians

Surveillance Analysts & Lab Personnel

EpidemiologistsSoftware and Work Flows

Investigation ToolsInstrumentation

+ =

Interview users Examine resources

GenEpiO(Genomic

Epidemiology Application Ontology)

13

Existing Standards MIxSSNOMED-CTLOINC

Page 14: Context is Everything: Integrating Genomics ...genepio.org/wp-content/uploads/2016/10/Gen...Context is Everything: Integrating Genomics, Epidemiological and Clinical Data Using GenEpiO

Public Health Surveillance

Case Cluster Analysis

Result Reporting

Infectious Disease Epidemiology (from case to Intervention)Lab Surveillance (from sample to strain typing results)

Evidence Collection& Outbreak

Investigation

Sample Collection& Processing

Sequence Data Generation &

Processing

Bioinformatics Analysis

Result Reporting

Whole Genome Sequencing (SO, ERO, OBI etc)

Quality Control (OBI, ERO)

LegendPrimary Focus

GenEpiO (OBO)

Other

Anatomy (FMA)

Environment (Envo)

Food (FoodOn)

Clinical Sampling (OBI)

Custom LIMS

Quality Control (OBI, ERO)

AMR (ARO)

Virulence (PATO)

Phylogenetic Clustering (EDAM)

Mobile Elements (MobiO)

Quality Control (OBI, ERO)

Nomenclature & Taxonomy (NCBItaxon)

AMR (ARO) LOINC

Surveillance (SurvO)

Demographics (SIO)

Patient History (SIO)

Symptoms (SYMP)

Exposures (ExO)

Source Attribution (IDO)

Travel (IDO)

Transmission (TRANS)

Food (FoodOn)

Geography (OMRSE)

Outbreak Protocols

Surveillance (SurvO)

Food (FoodOn)

Surveillance (SurvO)

Mobile Elements (MobiO)

Infectious Disease (IDO)

Typing (TypON)

14

Page 15: Context is Everything: Integrating Genomics ...genepio.org/wp-content/uploads/2016/10/Gen...Context is Everything: Integrating Genomics, Epidemiological and Clinical Data Using GenEpiO

GenEpiO: Combining Different Epi, Lab, Genomics and Clinical Data Fields.

Lab AnalyticsGenomics, PFGE

Serotyping, Phage typingMLST, AMR

Sample MetadataIsolation Source (Food, Host

Body Product, Environmental), BioSample

Epidemiology InvestigationExposures

Clinical DataPatient demographics, Medical

History, Comorbidities, Symptoms, Health Status

ReportingCase/Investigation Status

GenEpiO(Genomic Epidemiology Application Ontology)

15

See draft version at https://github.com/GenEpiO/genepio/wiki

Page 16: Context is Everything: Integrating Genomics ...genepio.org/wp-content/uploads/2016/10/Gen...Context is Everything: Integrating Genomics, Epidemiological and Clinical Data Using GenEpiO

Current Ontology Development Focuses On 3 Key Areas

Food Antimicrobial Resistance

Disease Surveillance

16

(FoodON)(SurvO) (ARO)

Page 17: Context is Everything: Integrating Genomics ...genepio.org/wp-content/uploads/2016/10/Gen...Context is Everything: Integrating Genomics, Epidemiological and Clinical Data Using GenEpiO

Resources:• LanguaL (USDA)• FoodEx2 (EFSA)• Codex Alimentarius (WHO)• USDA Nutrient Database• Painter Classification (CDC)• FoodO/FooDB• AGROVOC• Food Safety Information Network• Compendium of Analytical Methods (HC)• More…

FoodON: A Farm-to-Fork Food Ontology

Aim:To provide food descriptors for food items, ingredients, production environments to facilitate outbreak investigations, risk assessments, source attribution etc

See draft version at https://github.com/FoodOntology/foodon

17

Page 18: Context is Everything: Integrating Genomics ...genepio.org/wp-content/uploads/2016/10/Gen...Context is Everything: Integrating Genomics, Epidemiological and Clinical Data Using GenEpiO

Ontologies are commonly encoded using OWL (Web Ontology Language).

• Markup language for sharing ontologies on the web • Machine and human-readable• OWL statements written in RDF (XML syntax)

• Protégé (editor)• Ontology lookup services:

18

Explore GenEpiO with Proofsheethttp://tinyurl.com/uiproofsheet

Page 19: Context is Everything: Integrating Genomics ...genepio.org/wp-content/uploads/2016/10/Gen...Context is Everything: Integrating Genomics, Epidemiological and Clinical Data Using GenEpiO

GenEpiO and FoodON are now part of the OBO Foundry library of ontologies.

• Prescribes best practices for ontology development

• Committed to common use, interoperability, collaborative development

• Common relations and syntax• Accessible definitions, good

documentation

Open Biomedical Ontologies - http://www.obofoundry.org/

144 ontologies accepted or under development Describing genes and phylogenies to diseases and anatomy

Ontology for the description of life-science and clinical investigations

19

Page 20: Context is Everything: Integrating Genomics ...genepio.org/wp-content/uploads/2016/10/Gen...Context is Everything: Integrating Genomics, Epidemiological and Clinical Data Using GenEpiO

GenEpiO will be implemented in different interfaces.

• Creates BioSample-Compliant Genome Submission Forms. 20

Metadata Manager: Data entry portal

• Metadata Requisition• Implements GenEpiO terms• Facilitates descriptive metadata• Sorts at point-of-entry

Page 21: Context is Everything: Integrating Genomics ...genepio.org/wp-content/uploads/2016/10/Gen...Context is Everything: Integrating Genomics, Epidemiological and Clinical Data Using GenEpiO

Use computers to identify common exposures, symptoms etc among genomics clusters

Example: Automating Case Definition generationCorrelate Genomics Salmonella Cluster A cases between 01 Mar 2015- 15 Mar 2015 with High-Risk Food Types Spinach Leafy Greens and Geographical Location of Vancouver

XXXXXXXXXXXXXXGenEpiO Will Help Integrate Genomics and

Epidemiological Data.

21

Page 22: Context is Everything: Integrating Genomics ...genepio.org/wp-content/uploads/2016/10/Gen...Context is Everything: Integrating Genomics, Epidemiological and Clinical Data Using GenEpiO

Line List Visualizations of Selectable Data Based on GenEpiO Fields.

1. Line List View

2. Timeline View

Hideable cases

Selectable fields

Travel

Symptoms and Onset

Exposure Types

Hospitalization

22

Page 23: Context is Everything: Integrating Genomics ...genepio.org/wp-content/uploads/2016/10/Gen...Context is Everything: Integrating Genomics, Epidemiological and Clinical Data Using GenEpiO

Summary of the Advantages Genomic Epidemiology Ontology offers Public Health.

Improved Public Health

Investigation power!

1. Eliminates semantic ambiguity

2. Term-mapping allows customization

3. Faster data integration

4. Standardized quality control and result reporting trigger actionable events in same way

5. Reproducibility (accreditation, validation)

23

Page 24: Context is Everything: Integrating Genomics ...genepio.org/wp-content/uploads/2016/10/Gen...Context is Everything: Integrating Genomics, Epidemiological and Clinical Data Using GenEpiO

Genomic Epidemiology Ontology is like instrumentation for your contextual information…

it needs maintenance and continual improvement

To achieve consensus and uptake International GenEpiO and FoodON Consortia

Join us!E-mail: [email protected]

24

(>60 members from 15 countries)

GenEpiO facilitates data sharing!

Page 25: Context is Everything: Integrating Genomics ...genepio.org/wp-content/uploads/2016/10/Gen...Context is Everything: Integrating Genomics, Epidemiological and Clinical Data Using GenEpiO

Metadata management is crucial, but tricky.

GenEpiO helps sort it out. 25

Page 26: Context is Everything: Integrating Genomics ...genepio.org/wp-content/uploads/2016/10/Gen...Context is Everything: Integrating Genomics, Epidemiological and Clinical Data Using GenEpiO

Context is everything in foodborne outbreak investigations.

GenEpiO helps fit the data together. 26

Page 27: Context is Everything: Integrating Genomics ...genepio.org/wp-content/uploads/2016/10/Gen...Context is Everything: Integrating Genomics, Epidemiological and Clinical Data Using GenEpiO

27

Contact the GenEpiO Dev Team at: [email protected]

Page 28: Context is Everything: Integrating Genomics ...genepio.org/wp-content/uploads/2016/10/Gen...Context is Everything: Integrating Genomics, Epidemiological and Clinical Data Using GenEpiO

28

Project LeadersFiona Brinkman – SFUWill Hsiao – PHMRLGary Van Domselaar – NML

Simon Fraser University (SFU)Emma GriffithsGeoff WinsorJulie ShayMatthew LairdBhav Dhillon

McMaster UniversityAndrew McArthurDaim Sardar

European Food Safety AgencyLeibana Criado ErnestoVernazza FrancescoRizzi Valentina

[email protected]

National Microbiology Laboratory (NML)Franklin BristowAaron PetkauThomas MatthewsJosh AdamAdam OlsenTara LynchShaun TylerPhilip MabonPhilip AuCeline NadonMatthew Stuart-EdwardsMorag GrahamChrystal BerryLorelee TschetterEduardo ToboadaPeter KruczkiewiczChad LaingVic GannonMatthew WhitesideRoss DuncanSteven Mutschall

University of LisbonJoᾶo Carriҫo

European Bioinformatics InstituteMelanie CourtotHelen Parkinson

BC Public Laboratory and BC Centre for Disease Control (BCCDC)Damion DooleyJudy Isaac-RentonPatrick TangNatalie PrystajeckyJennifer GardyLinda HoangKim MacDonaldYin ChangEleni GalanisMarsha TaylorJennifer Law

University of MarylandLynn Schriml

Canadian Food Inspection Agency (CFIA)Adam KoziolBurton BlaisCatherine Carrillo

Dalhousie UniversityRob BeikoAlex Keddy

www.irida.ca