INTEGRATING GENOMICS, EPIDEMIOLOGICAL AND CLINICAL DATA USING ONTOLOGY William Hsiao, PhD BC Centre for Disease Control and University of BC IRIDA and GenEpiO Consortia GloPID-R Zika Workshop 2016, São Paulo


Page 1

INTEGRATING GENOMICS, EPIDEMIOLOGICAL AND CLINICAL DATA USING ONTOLOGY

William Hsiao, PhD

BC Centre for Disease Control and University of BC

IRIDA and GenEpiO Consortia

GloPID-R Zika Workshop 2016, São Paulo

Page 2

BIG DATA IS CHANGING PUBLIC HEALTH PRACTICES

• Big Data: Increasing digitization of biomedical data (digital objects) requires computers for processing and management to turn the data into information and knowledge

• Molecular/Genomic Epidemiology: high-throughput DNA sequencing provides high-resolution evidence for epidemiological investigations

• Raw data (GBytes) -> Processed data (MB) -> Interpreted data (KB) -> Decision (Bytes; subtyping results)

• Localized data processing reduces the data transfer bottleneck

• Data harmonization and sharing serve both human and, more critically, computer consumption

Page 3

When Same Words Can Mean Different Things

SEMANTIC AMBIGUITY

Nuts

Page 4

IRIDA platform components (diagram labels): Sequencing Instruments, Web Application, Data management, Built-in Analytical Tools, External Galaxy, Command-line Tools

Project Information: http://www.irida.ca

• Open Source and Free

• Modular Design

• User friendly web interface

• Robust analysis pipeline engine

• Data management for genomic data

• Secure authentication and authorization to access data and system

• Audit trail (data provenance)

IRIDA: GENOMIC DATA ANALYSIS PLATFORM

Page 5

Project Information: http://www.irida.ca

IRIDA: FEDERATION AND DATA SHARING

• Federation: Multiple local instances able to communicate with each other

• Allow on-site data generation and analysis

• Sharing: Allow data sharing securely via standard API for enhanced analysis (using 3rd party analysis tools)

• Eventually cultivate a culture of openness, responsible data sharing and collaborative analysis tool development

Page 6

Sequencing & Bioinformatics

• Sequencing, Assembly Pipeline Parameters

• QA/QC Metrics

• Tree Construction Details

Sample Information

• Isolation source

• Food, Clinical, Environment

• Food category, Body Product

• Dates, Location

Clinical and Epi Details

• Demographics

• Host disease, Symptoms

• Lab Test Results

• Exposures

GENOMIC SEQUENCES NEED TO BE INTERPRETED IN CONTEXT
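
The three groups of contextual data on this slide can be pictured as one record attached to each sequence. The following is only a minimal sketch: the class names, field names and the `missing_context` helper are hypothetical and do not come from IRIDA or GenEpiO. It shows how a record could be checked for contextual fields that were never filled in.

```python
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass
class SampleInfo:
    isolation_source: Optional[str] = None  # e.g. food, clinical, environment
    collection_date: Optional[str] = None   # ISO 8601 date string
    location: Optional[str] = None


@dataclass
class SequencingInfo:
    pipeline_parameters: dict = field(default_factory=dict)
    qc_metrics: dict = field(default_factory=dict)


@dataclass
class ClinicalEpiInfo:
    symptoms: list = field(default_factory=list)
    exposures: list = field(default_factory=list)


@dataclass
class GenomeRecord:
    """Hypothetical container linking a sequence to its context."""
    sequence_id: str
    sample: SampleInfo = field(default_factory=SampleInfo)
    sequencing: SequencingInfo = field(default_factory=SequencingInfo)
    clinical_epi: ClinicalEpiInfo = field(default_factory=ClinicalEpiInfo)


def missing_context(record: GenomeRecord) -> list:
    """Return dotted names of contextual fields that are still empty."""
    missing = []
    for section, values in asdict(record).items():
        if isinstance(values, dict):  # skip plain fields such as sequence_id
            for name, value in values.items():
                if value in (None, {}, []):
                    missing.append(f"{section}.{name}")
    return missing


record = GenomeRecord("seq-001", sample=SampleInfo(isolation_source="food"))
print(missing_context(record))
```

A check like this could run when data is requisitioned, so that gaps are visible before downstream users need the fields.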

Page 7

Descriptive – Organized - Standardized

FAIR Principles of Digital Data Management:

F – Findable

A – Accessible

I – Interoperable

R – Reusable

Published in Nature Scientific Data, March 2016

WHEN REQUISITIONING CONTEXTUAL DATA, WE NEED TO ANTICIPATE THE NEEDS OF DOWNSTREAM USERS

It is difficult to foresee data integration needs during an evolving public health (PH) emergency such as the Zika or Ebola outbreaks

Page 8
Page 9

ONTOLOGY

• A mechanism to specify and express a body of knowledge

• Standardized, well-defined hierarchy of terms

• Each term has a unique universal ID

• Terms interconnected with logical relationships

• Have formats that are Human AND computer readable

• This internally coherent tool can act as a universal translator between different standards
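
The properties listed above (unique IDs, a hierarchy of terms, logical relationships, machine readability) can be illustrated with a toy example. This is a minimal sketch, not GenEpiO itself: the `EX:` identifiers, labels and helper functions are invented for illustration. It shows how two meanings of "nut" stay distinct once each has its own ID and its own place in an is_a hierarchy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    term_id: str         # unique ID (hypothetical "EX:" prefix, not a real ontology)
    label: str           # human-readable label
    parents: tuple = ()  # IDs of direct is_a parents


# Toy hierarchy; every ID and label here is an illustrative assumption.
TERMS = {
    t.term_id: t
    for t in [
        Term("EX:0000001", "food product"),
        Term("EX:0000002", "plant food product", ("EX:0000001",)),
        Term("EX:0000003", "nut (food)", ("EX:0000002",)),
        Term("EX:0000004", "hardware"),
        Term("EX:0000005", "nut (fastener)", ("EX:0000004",)),
    ]
}


def ancestors(term_id, terms=TERMS):
    """Return the IDs of all transitive is_a ancestors of term_id."""
    seen = set()
    stack = list(terms[term_id].parents)
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(terms[current].parents)
    return seen


def is_a(term_id, ancestor_id, terms=TERMS):
    """True if term_id is ancestor_id or one of its descendants."""
    return term_id == ancestor_id or ancestor_id in ancestors(term_id, terms)


print(is_a("EX:0000003", "EX:0000001"))  # is the food nut a food product?
print(is_a("EX:0000005", "EX:0000001"))  # is the fastener a food product?
```

Because the query runs on IDs rather than on the word "nut", a computer can answer it without guessing which sense was meant.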

Page 10

• Lab Analytics: Genomics, PFGE, Serotyping, Phage typing, MLST, AMR

• Sample Metadata: Isolation Source (Food, Host Body Product, Environmental), BioSample

• Epidemiology Investigation: Exposures

• Clinical Data: Patient demographics, Medical History, Comorbidities, Symptoms, Health Status

• Reporting: Case/Investigation Status

GenEpiO (Genomic Epidemiology Application Ontology)

GEN-EPI-O: COMBINING EPI, LAB, GENOMICS AND CLINICAL DATA FIELDS

Page 11

GEN-EPI-O INITIAL DEVELOPMENT

• Interview users: Medical & Environmental Microbiologists, Bioinformaticians, Surveillance Analysts & Lab Personnel, Epidemiologists

• Examine resources: Software and Work Flows, Investigation Tools, Instrumentation

Interview users + Examine resources = GenEpiO (Genomic Epidemiology Application Ontology)

Page 12

GenEpiO is part of the OBO Foundry library of ontologies

• Prescribes best practices for ontology development

• Common relations, syntax and data formats

• Re-use terms when possible

• Committed to openness, interoperability and collaboration

• Attributable efforts

Open Biomedical Ontologies - http://www.obofoundry.org/

144 ontologies accepted or under development -> Describing genes and phylogenies to diseases and anatomy

See draft version at https://github.com/GenEpiO/genepio/wiki

Page 13

ADVANTAGES OF ONTOLOGY

• Eliminates semantic ambiguity

• Term-mapping allows customization of displays

• Flexible to allow incorporation of new data sources/types

• Faster data integration

• Triggers actionable events in the same way

• Reproducibility (suitable for organizational accreditation, validation)

• Curator Attribution (giving credit to people working on the resources)

• Formation of GenEpiO consortium to work on a common open resource

• Identify priorities

• Build Consensus

Improved Public Health Investigation power!
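
The term-mapping and data-integration points on this slide can be sketched in a few lines. This is an illustration under stated assumptions: the lab names, local vocabularies, `EX:` identifiers and both helper functions are hypothetical and are not taken from GenEpiO. It shows two labs' local words resolving to one shared ID, and each audience displaying that ID with its own label.

```python
# Hypothetical local vocabularies mapped to shared term IDs.
LOCAL_TO_TERM = {
    "lab_a": {"stool": "EX:0000010", "blood": "EX:0000011"},
    "lab_b": {"faeces": "EX:0000010", "feces": "EX:0000010", "blood": "EX:0000011"},
}

# Hypothetical display labels per language, keyed by the same term IDs.
DISPLAY_LABELS = {
    "en": {"EX:0000010": "feces", "EX:0000011": "blood"},
    "pt": {"EX:0000010": "fezes", "EX:0000011": "sangue"},
}


def harmonize(source, value):
    """Map a lab's local value to a shared term ID, or None if unmapped."""
    return LOCAL_TO_TERM[source].get(value.strip().lower())


def display(term_id, language="en"):
    """Return the label for term_id in the given language, else the raw ID."""
    return DISPLAY_LABELS[language].get(term_id, term_id)


term_a = harmonize("lab_a", "Stool")
term_b = harmonize("lab_b", "faeces")
print(term_a == term_b)       # both labs resolve to the same term ID
print(display(term_a, "pt"))  # same term, shown with a Portuguese label
```

Since stored records carry only the ID, adding a new lab or a new display language means adding a mapping table rather than rewriting existing data.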

Page 14

ACKNOWLEDGEMENTS

IRIDA Project Leaders
Fiona Brinkman – SFU, Will Hsiao – PHMRL, Gary Van Domselaar – NML, Rob Beiko – Dalhousie, Andrew McArthur – McMaster, Leonid Chindelevitch – SFU, Cedric Chauve – SFU

Simon Fraser University (SFU)
Emma Griffiths, Geoff Winsor, Julie Shay, Matthew Laird, Bhav Dhillon

McMaster University
Daim Sardar

European Food Safety Agency
Leibana Criado Ernesto, Vernazza Francesco, Rizzi Valentina

National Microbiology Laboratory (NML)
Franklin Bristow, Aaron Petkau, Thomas Matthews, Josh Adam, Adam Olsen, Tara Lynch, Shaun Tyler, Philip Mabon, Philip Au, Celine Nadon, Matthew Stuart-Edwards, Morag Graham, Chrystal Berry, Lorelee Tschetter, Eduardo Toboada, Peter Kruczkiewicz, Chad Laing, Vic Gannon, Matthew Whiteside, Ross Duncan, Steven Mutschall

University of Lisbon
João Carriço

BC Public Laboratory and BC Centre for Disease Control (BCCDC)
Damion Dooley, Judy Isaac-Renton, Patrick Tang, Natalie Prystajecky, Jennifer Gardy, Linda Hoang, Kim MacDonald, Yin Chang, Eleni Galanis, Marsha Taylor, Cletus D’sousa

University of Maryland
Lynn Schriml

Canadian Food Inspection Agency
Adam Koziol, Burton Blais, Catherine Carrillo

Dalhousie University
Alex Keddy

European Bioinformatics Institute
Melanie Courtot, Helen Parkinson

GenEpiO Project Leaders
Will Hsiao – BCCDC, Fiona Brinkman – SFU, Andrew McArthur – McMaster

Simon Fraser University (SFU)
Emma Griffiths

University of British Columbia
Damion Dooley

McMaster University
Amos Raphenya, Brian Alcock

GenEpiO