IRIDA: Canada’s federated platform for genomic epidemiology, ABPHM 2015 WHsiao


Page 1

IRIDA: Canada’s federated platform for genomic epidemiology

William Hsiao, Ph.D.

[email protected] @wlhsiao

BC Public Health Microbiology and Reference Laboratory and University of British Columbia

ABPHM 2015

Page 2

Genome Canada Bioinformatics Competition: Large-Scale Project

“A Federated Bioinformatics Platform for Public Health Microbial Genomics”

Our Goal

The IRIDA platform (Integrated Rapid Infectious Disease Analysis)

An open source, standards compliant, high quality genomic epidemiology analysis platform to support real-time (food-borne) disease outbreak investigations

www.IRIDA.ca

Page 3

Each year, one in eight Canadians (or four million people) get sick with a domestically acquired food-borne illness.

Page 4

Partnership among public health agencies and academic institutes to bridge the gaps between advancements in genomic epidemiology and application to real-life and real-time use cases in public health agencies

- Project Team has direct access to state of the art research in academia
- Project Team is directly embedded in user organization

Page 5

Interviews with key personnel to identify barriers to implementing genomic epidemiology in public health agencies

Page 6

GAP 1: PUBLIC HEALTH PERSONNEL LACK TRAINING IN GENOMICS

Page 7

Solution 1a: Build a User Friendly, high quality analysis platform to process genomics data

• Carefully designed and engineered software platform is just the starting point…

[Architecture diagram components: User Interface, Security, File system, Metadata Storage, Application logic, REST API, Workflow Execution Manager, Continuous Integration, Documentation]

Page 8

Solution 1a: Build a User Friendly, high quality analysis platform to process genomics data

• Easy to use interface hiding the technical details

Page 9

Solution 1a: Build a User Friendly, high quality analysis platform to process genomics data

Page 10

Solution 1b: Build Portable and Transparent Pipelines

• Use Galaxy as workflow engine – large community support
• Retooled to address usability, security, and other limitations
• Version Controlled Pipeline Templates
• Input files, parameters, and workflow are sent to IRIDA-specific Galaxy for execution
• Results and provenance information are copied from Galaxy

[Diagram components: IRIDA UI/DB, REST API, Shared File System, Galaxy (Assembly Tools, Variant Calling Tools), Workers]

1. Input files sent to Galaxy
2. Tools executed on Galaxy workers
3. Results downloaded from Galaxy
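The three numbered steps on this slide can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: `PipelineTemplate`, `WorkflowEngine`, and `run_pipeline` are hypothetical names, not IRIDA's or Galaxy's actual API, and the "tool" is a stand-in that merely counts bytes.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PipelineTemplate:
    """Hypothetical version-controlled pipeline description."""
    name: str
    version: str
    parameters: tuple  # (key, value) pairs, kept immutable


@dataclass
class WorkflowEngine:
    """Stand-in for a remote workflow engine (not the real Galaxy API)."""
    uploads: dict = field(default_factory=dict)

    def upload(self, filename: str, content: bytes) -> str:
        # Step 1: input files are sent to the engine.
        file_id = hashlib.sha256(content).hexdigest()[:12]
        self.uploads[file_id] = (filename, content)
        return file_id

    def execute(self, template: PipelineTemplate, file_ids: list) -> dict:
        # Step 2: tools run on the engine's workers.
        # The placeholder "tool" reports the size of each input file.
        return {self.uploads[fid][0]: len(self.uploads[fid][1]) for fid in file_ids}


def run_pipeline(engine, template, inputs):
    """Send inputs, execute, then copy back results plus provenance."""
    file_ids = [engine.upload(name, data) for name, data in inputs.items()]
    results = engine.execute(template, file_ids)
    # Step 3: results and provenance are copied back to the caller.
    provenance = {
        "pipeline": template.name,
        "version": template.version,
        "parameters": dict(template.parameters),
        "input_checksums": sorted(file_ids),
    }
    return results, provenance


template = PipelineTemplate("assembly", "0.1", (("min_coverage", 10),))
results, provenance = run_pipeline(
    WorkflowEngine(), template, {"sample1.fastq": b"ACGT" * 5}
)
```

Recording the template version and parameters next to the results is what would make a run reproducible later; the exact provenance fields here are assumptions.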

Page 11

Solution 1c: Start the training NOW!

• Canada’s National Microbiology Laboratory has hosted genomic workshops for partners and collaborators

• IRIDA Project has dedicated funding for hosting workshops in 4Q of 2015 and 2016

• We would like to hear about other training initiatives and share experience and training material

Page 12

GAP 2: INFORMATION SHARING IS INEFFICIENT AND AD-HOC

Page 13

Many Players in surveillance and outbreak – ineffective information sharing

[Diagram of players: Cases, Physicians, Frontline lab, Local public health dept., Provincial laboratory, Provincial public health dept., National laboratory. Diagram labels: Information; Bioinformatics and Analytical Capacities]

Source: M. Taylor, BCCDC

Page 14

Many Systems used in Reporting Diseases – require data re-entry and re-coding

[Diagram of reporting flow among: Cases, Physicians, Local laboratory, Local public health dept., Provincial laboratory, Provincial public health dept., National laboratory, National Ministry of Health]

Reporting channels shown: Fax/Electronic; Fax; Phone/Fax; Electronic/Paper; Electronic/Fax/Phone; Mailing of Samples/Fax/Electronic

Source: M. Taylor, BCCDC

Page 15

IRIDA is designed with these dilemmas in mind

• Solutions:
– 2a: Localized Instance of federated databases
– 2b: Permission Control – authentication/authorization for information sharing
– 2c: User role-based display of information

Page 16

Solution 2a: Local/Cloud Instances and Data Federation

• Data processing capacity pushed to data generating labs
• Allow data sharing securely for enhanced analysis
• Eventually cultivating a culture of openness of data sharing and collaborative development of tools

Page 17

Solution 2b: Security

Authorization

• Local authorization per instance.
• Method-level authorization.
• Object-level authorization.
• Allow secure, fine grained and flexible information sharing
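The difference between method-level and object-level authorization can be illustrated with a minimal sketch. Everything here (`User`, `requires_role`, `read_project`, the role and project names) is hypothetical and is not taken from IRIDA's code base; it only shows the two kinds of check side by side.

```python
from dataclasses import dataclass, field
from functools import wraps


class AccessDenied(Exception):
    """Raised when either authorization check fails."""


@dataclass
class User:
    name: str
    roles: set
    project_ids: set = field(default_factory=set)


def requires_role(role):
    """Method-level check: may this user call the method at all?"""
    def decorator(func):
        @wraps(func)
        def wrapper(user, *args, **kwargs):
            if role not in user.roles:
                raise AccessDenied(f"{user.name} lacks role {role!r}")
            return func(user, *args, **kwargs)
        return wrapper
    return decorator


# Illustrative data store local to one instance.
PROJECTS = {1: "Outbreak A samples", 2: "Outbreak B samples"}


@requires_role("analyst")
def read_project(user, project_id):
    # Object-level check: may this user see this particular project?
    if project_id not in user.project_ids:
        raise AccessDenied(f"{user.name} cannot read project {project_id}")
    return PROJECTS[project_id]


alice = User("alice", {"analyst"}, {1})
guest = User("guest", set(), {1})
```

In this sketch `alice` can read project 1 but not project 2 (object-level denial), while `guest` is rejected before any project lookup happens (method-level denial).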

Page 18

Solution 2c: Role-based Dynamic Display driven by Ontology

• Ontologies often lack a content management system (CMS)
• An Interface Model Ontology (IFM) can define a CMS for an ontology

Page 19

IFM Interface View Permissions

[Figure panels: Detailed View, Restricted View]

E.g. User role permissions control visibility and editing of content
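A role-based display of this kind can be sketched as a filter over record fields. The slide describes the permissions as driven by an ontology (the IFM); the sketch below substitutes a plain dictionary for that, and the role names, field names, and `render_view` function are all assumptions made for illustration.

```python
# Hypothetical view definitions: which fields each role may see or edit.
VIEW_PERMISSIONS = {
    "epidemiologist": {
        "visible": {"sample_id", "collection_date", "patient_age", "region"},
        "editable": {"region"},
    },
    "lab_technologist": {
        "visible": {"sample_id", "collection_date"},
        "editable": set(),
    },
}


def render_view(record, role):
    """Return (field, value, editable) rows the given role is allowed to see."""
    perms = VIEW_PERMISSIONS.get(role, {"visible": set(), "editable": set()})
    return [
        (name, value, name in perms["editable"])
        for name, value in record.items()
        if name in perms["visible"]
    ]


record = {
    "sample_id": "S-001",
    "collection_date": "2015-04-08",
    "patient_age": 42,
    "region": "BC",
}
detailed = render_view(record, "epidemiologist")      # detailed view
restricted = render_view(record, "lab_technologist")  # restricted view
```

An unknown role falls back to an empty view, so nothing is shown by default.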

Page 20

GAP 3: INFORMATION REPRESENTATION IS INCONSISTENT

Page 21

Solution 3a: Use Ontology

• Ontology: a way to describe types of entities and relations between them
• Why use ontology
– Ontology is flexible and expandable
– Lower levels of expressivity (e.g. controlled vocabulary, data dictionary) are heavy handed and show low levels of compliance and adoption
– Free text, used as an alternative, is not computing friendly
– Ontology and semantic web technologies may be a solution
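To make "types of entities and relations between them" concrete, here is a toy sketch of a single `is_a` relation and a query over it. The terms are invented for illustration and are not drawn from any published ontology; real ontologies carry many relation types and are usually expressed in OWL or OBO rather than a Python dictionary.

```python
# Toy ontology: each term maps to its direct parent via an "is_a" relation.
IS_A = {
    "romaine lettuce": "leafy green",
    "spinach": "leafy green",
    "leafy green": "vegetable",
    "vegetable": "food product",
    "ground beef": "meat",
    "meat": "food product",
}


def ancestors(term):
    """Walk is_a links upward and return all broader terms, nearest first."""
    found = []
    while term in IS_A:
        term = IS_A[term]
        found.append(term)
    return found


def is_a(term, broader):
    """True if `term` equals `broader` or is a descendant of it."""
    return term == broader or broader in ancestors(term)


# Free-text matching could not group these exposures; the hierarchy can.
exposures = ["spinach", "ground beef", "romaine lettuce"]
leafy_exposures = [e for e in exposures if is_a(e, "leafy green")]
```

Adding a new food term only requires one new `is_a` entry, which is the kind of flexibility a fixed controlled vocabulary lacks.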

Page 22

Many Domains of Knowledge are needed to describe an outbreak investigation

Build On, Work With:
• OBI
• TypON
• NGSOnto
• NIAID-GSC-BRC core metadata
• MIxS Ontology
• NCBI Biosample etc
• TRANS – Pathogen Transmission
• EPO Exposure Ontology
• Infectious Disease Ontology
• CARD, ARO for AMR
• USDA Nutrient DB
• EFSA Comp. Food Consump. DB

Example gaps to be filled: Expand food ontology; expand CARD AMR data with others.

Page 23

Lab Checklist/Ontology

• Currently finishing a lab/genomics checklist and starting an epidemiology checklist
• Metadata Domains:
– Sample Collection
– Sample Source
– Environmental
– Lab Analytics
– Sequencing Process / QC
– Sequencing Run / QC
– Assembly Process / QC
– Others overlapping with Epi: Demographic / Geographic / etc.
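A checklist organised by metadata domain lends itself to a simple completeness check. The sketch below is an assumption-laden illustration: the domain names follow the slide, but the field names and the `missing_fields` helper are invented and do not reflect the project's actual checklist.

```python
# Hypothetical checklist: required fields grouped by metadata domain.
CHECKLIST = {
    "Sample Collection": ["collection_date", "collected_by"],
    "Sample Source": ["source_type"],
    "Sequencing Run / QC": ["instrument", "mean_coverage"],
}


def missing_fields(metadata):
    """Return {domain: [missing or empty field names]} for incomplete domains."""
    report = {}
    for domain, fields in CHECKLIST.items():
        missing = [f for f in fields if metadata.get(f) in (None, "")]
        if missing:
            report[domain] = missing
    return report


sample = {"collection_date": "2015-04-08", "source_type": "food", "instrument": ""}
report = missing_fields(sample)
```

Grouping the report by domain tells the submitter which part of the workflow (collection, sequencing, and so on) still needs to supply information.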

Page 24

GAP 4: GENOMIC DATA INTERPRETATION IS COMPLEX AND TECHNOLOGY IS EVOLVING

Page 25

Solution 4a: Use of QA/QC in IRIDA

• Software Engineering
– High quality software that meets regulatory guidelines
– Open Source product to ensure “white box” testing
– Ontology driven software development
– Follow proper software development cycle
• Data Quality
– Built-in modules to check for input data quality
– Warnings and feedback during pipeline execution to laboratory technologists
– Use of Ontology to check the quality of metadata (non-genomic data)
• Analytic Tool Quality
– Utilize validation datasets
– Use of abstract pipeline description – with version control
– Periodic analysis of exceptions and boundary cases to assess tool accuracy
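One way a validation dataset might be used to assess an analytic tool is to compare its calls against a known truth set. The sketch below assumes a variant-calling tool whose output is a set of positions; `ValidationCase`, `evaluate`, `passes`, and the thresholds are placeholders chosen for illustration, not values or methods stated in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationCase:
    """One entry of a hypothetical validation dataset."""
    sample_id: str
    expected_snvs: frozenset  # variant positions treated as the truth set


def evaluate(called_snvs, case):
    """Compare a tool's calls against the truth set for one sample."""
    true_pos = len(called_snvs & case.expected_snvs)
    precision = true_pos / len(called_snvs) if called_snvs else 0.0
    recall = true_pos / len(case.expected_snvs) if case.expected_snvs else 0.0
    return {"sample": case.sample_id, "precision": precision, "recall": recall}


def passes(result, min_precision=0.9, min_recall=0.9):
    # Thresholds are placeholders for whatever a lab's QA policy requires.
    return result["precision"] >= min_precision and result["recall"] >= min_recall


case = ValidationCase("val-001", frozenset({101, 250, 399, 512}))
result = evaluate({101, 250, 399, 777}, case)  # one miss, one false call
```

Re-running such a check whenever a pipeline version changes is one possible way to connect the validation datasets to the version-controlled pipeline descriptions.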

Page 26

Solution 4b: Generation of validation datasets

To participate, contact:
Rene Hendriksen [email protected]
or Errol Strain [email protected]

http://www.globalmicrobialidentifier.org/Workgroups#work-group-4

Page 27

Solution 4c: Exploratory tools can access certain data via REST API securely

IslandViewer: http://pathogenomics.sfu.ca/islandviewer
Dhillon and Laird et al. 2015, Nucleic Acids Research

GenGIS: http://kiwi.cs.dal.ca/GenGIS
Parks et al. 2013, PLoS One

Page 28

Availability

• Jun 1 2015: IRIDA 1.0 beta Internal Release
– Release to collaborators for installation and full test
• Jul 1 2015: IRIDA 1.0 beta1
– Announce Beta release, download, documentation available on website – www.irida.ca
• Aug 1 2015: IRIDA 1.0 beta2
– Cloud installer, with documentation
– Additional pipelines as available
– Visualization as available

Page 29

Acknowledgements

Project Leaders: Fiona Brinkman – SFU; Will Hsiao – PHMRL; Gary Van Domselaar – NML

University of Lisbon: João Carriço

National Microbiology Laboratory (NML): Franklin Bristow, Aaron Petkau, Thomas Matthews, Josh Adam, Adam Olson, Tarah Lynch, Shaun Tyler, Philip Mabon, Philip Au, Celine Nadon, Matthew Stuart-Edwards, Morag Graham, Chrystal Berry, Lorelee Tschetter, Aleisha Reimer

Laboratory for Foodborne Zoonoses (LFZ): Eduardo Taboada, Peter Kruczkiewicz, Chad Laing, Vic Gannon, Matthew Whiteside, Ross Duncan, Steven Mutschall

Simon Fraser University (SFU): Melanie Courtot, Emma Griffiths, Geoff Winsor, Julie Shay, Matthew Laird, Bhav Dhillon, Raymond Lo

BC Public Health Microbiology & Reference Laboratory (PHMRL) and BC Centre for Disease Control (BCCDC): Judy Isaac-Renton, Patrick Tang, Natalie Prystajecky, Jennifer Gardy, Damion Dooley, Linda Hoang, Kim MacDonald, Yin Chang, Eleni Galanis, Marsha Taylor, Cletus D’Souza, Ana Paccagnella

University of Maryland: Lynn Schriml

Canadian Food Inspection Agency (CFIA): Burton Blais, Catherine Carrillo, Dominic Lambert

Dalhousie University: Rob Beiko, Alex Keddy

McMaster University: Andrew McArthur, Daim Sardar

European Nucleotide Archive: Guy Cochrane, Petra ten Hoopen, Clara Amid

European Food Safety Agency: Leibana Criado Ernesto, Vernazza Francesco, Rizzi Valentina

Page 30

IRIDA Annual General Meeting Winnipeg, April 8-9, 2015

Page 31

The IRIDA platform (Integrated Rapid Infectious Disease Analysis)

An open source, standards compliant, high quality genomic epidemiology analysis platform to support real-time (food-borne) disease outbreak investigations

Contacts:

[email protected] @wlhsiao

www.IRIDA.ca