
Planning a national bioinformatics infrastructure investment


Infrastructure?


Research infrastructure?


What is ‘Bioinformatics Infrastructure’?


The research infrastructure we need to remain competitive in the global, rapidly changing research environment of the biosciences

Biosciences: understanding the research community

● 30,000 health/biosciences researchers

● 18,000 health/biosciences research higher degree (RHD) students

● 48,000 health/biosciences postgraduate coursework students

● ~200,000 health/biosciences undergraduate students (163,000 + 40,000 ≈ 203,000)

● 1,000 to 1,500 bioinformaticians/computational biologists

Estimated number of Australian biology researchers in 2018: 30,000

● 20,000 (→ 15,000) biology-focussed bioscience researchers: occasional users of bioinformatics web services (e.g. BLAST, Ensembl)

● 7,000 (→ 12,000) data-intensive bioscience researchers: ‘omics data analysis is a critical contributor to the research outcomes (e.g. RNA-seq analysis to identify upregulated genes in a broader research program)

● 2,000 (→ 3,000) bioinformatics-intensive bioscience researchers: research is fully dependent on advanced use of bioinformatics (e.g. genomic cancer research, population genomics/agricultural genomics programs)

● 1,000 (in 5 years → 1,500) bioinformaticians: research into, and application of, techniques and tool development (e.g. research generating a new tool or statistical method; core facilities applying complex analyses)

Important transitions: the projected changes (→) in each group above.

The rapidly evolving international context

Europe: National infrastructures + ELIXIR + EBI

MISSION:

● Provide, expand and improve a repertoire of specialized bioinformatics tools

● Provide access to computing and storage capacities

● Provide regular training events

● Maintain and develop specific high-quality data resources

MISSION:

● Providing the national and international life science community with a state-of-the-art bioinformatics infrastructure, including resources, expertise and services

● Federating world-class researchers and delivering training in bioinformatics

The ELIXIR Platforms comprise:

● Data: sustaining Europe’s life-science data infrastructure

● Tools: services and connectors to drive access and exploitation

● Interoperability: supporting the discovery, integration and analysis of biological data

● Compute: storage, compute and authentication/access services

● Training: professional skills for managing and exploiting data

Four Use Cases serve domain-specific research communities:

● Human data: developing long-term strategies for managing and accessing sensitive human data

● Rare diseases: supporting the development of new therapies for rare diseases

● Marine metagenomics: developing a sustainable metagenomics infrastructure to nurture research and innovation in marine science

● Plant science: developing an infrastructure to facilitate genotype-phenotype analyses for crop and tree species

US National Institutes of Health: Data Commons

Four main components:

● A computing environment, such as the cloud or HPC (High Performance Computing) resources, which supports access, utilization and storage of digital objects.

● Publicly available datasets that adhere to a Commons digital object compliance model.

● Software services and tools to facilitate access to and use of data, whether in the Commons or elsewhere.

● A digital object compliance model that describes the properties of digital objects that enable them to be findable, accessible, interoperable and reusable (FAIR).

US National Science Foundation: CyVerse

Vision: Transforming science through data-driven discovery

Mission: Design, develop, deploy, and expand a national cyberinfrastructure for life science research, and train scientists in its use

● Platforms, tools, datasets

● Storage and compute

● Training and support

World: NCBI + EBI

Planning an Australian Bioinformatics Infrastructure investment

National research infrastructure roadmap

National Reference Group

● Prof Tony Bacic (Director, La Trobe Institute of Agriculture and Food)

● Prof Jacquie Batley (Plant Genetics & Breeding, UWA)

● Prof Dave Burt (Director, Genomics, UQ)

● Prof Peter Cameron (Academic Director, Alfred Emergency & Trauma Centre)

● Prof Joanne Daly (CSIRO Honorary Fellow)

● Prof Frank Gannon (Director, QIMR Berghofer)

● Prof Rob Henry (Director, QAAFI, UQ)

● Prof Ary Hoffmann (Biosciences, University of Melbourne)

● Prof Dean Jerry (Deputy Director, JCU Centre for Tropical Fisheries & Aquaculture)

● Prof Ryan Lister (Head, Epigenetics & Genomics, Harry Perkins Institute, UWA)

● Prof John Mattick (Director, Garvan Institute/Director, Genomics England)

● Prof Kathryn North (Director, MCRI)

● Prof Nicki Packer (Macquarie University & Institute for Glycomics, Griffith University)

● Prof Tony Papenfuss (President, ABACBS; Computational Biology, WEHI/Peter Mac)

● Dr Maurizio Rossetto (NSW Royal Botanic Gardens)

● Prof Eric Stone (Director, ANU-CSIRO Centre for Genomics, Metabolomics & Bioinformatics)

● Dr Jen Taylor (Group Leader, Bioinformatics, CSIRO)

● Prof Steve Wesselingh (Director, SAHMRI)

● Prof James Whisstock (Monash, EMBL-Australia)

● Prof Marc Wilkins (Director, Ramaciotti Centre, UNSW)

Paul Flicek (Lead: Vertebrate Genomics & Ensembl)

Jaap Heringa (Head: ELIXIR-NL)

Jason Williams (Lead: Education, Outreach and Training)

Tony Papenfuss (Head, Computational Biology, WEHI, VIC)

Mark Walker (Director, Australian Infectious Diseases Research Centre, UQ, QLD)

Delphine Fleury (Australian Centre for Plant Functional Genomics, SA)

Sean Grimmond (Director, University of Melbourne Centre for Cancer Research, VIC)

Rebecca Johnson (Director, Australian Museum Research Institute, NSW)

International Scientific Advisory Group

Vivien Bonazzi (Program Leader for NIH Data Commons)

Rochelle Tractenberg (Founder: Collaborative for Research on Outcomes and Metrics)

Community consultations:

● Brisbane, 8 Aug

● Perth, 10-11 Oct

● Canberra, 30 Oct

● Sydney, 3 Nov

● Melbourne, 8 Nov

● ABACBS, 14 Nov

● Melbourne, 17 Nov

● Adelaide, 20 Nov

Timeline

Phase 1 (Jun - Aug 2017): Concept development and project planning

Phase 2 (Sep - Dec 2017): Elaboration of requirements and options, and consensus building

Phase 3 (Jan - Sep 2018): Engagement with NCRIS's expected planning of its investments

Phase 4 (Oct 2018 - ? 2019): Steps to implement results or engage further as needed

Three Capabilities

Capability I: A Biologist to Bioinformatics Bridge

A national omics analysis service:

● Providing a means to use standardised bioinformatics techniques through high-level interfaces;

● Integrated with a regionally accessible support and training network; and

● Providing direct access to the underlying infrastructure for developers of new techniques.

Options (not mutually exclusive):

1. Operate the service on a transaction-based model: bring your data, take your data.

2. Include a long-term data retention and publication function.

3. Include a user workflow retention and publication function.

Capability II: Data Integration Facilities

Providing:

● A facility for data-intensive computing on bioscience data and tools;

● Coupled with a critical mass of data science expertise versed in omics; and

● Assigned by merit to large research teams for extended periods.

Options:

1. A single facility supporting integration of all data types (e.g. EBI).

2. Multiple facilities, each supporting a specialised domain of data types (e.g. de.NBI).

Capability III: An Australian Biomolecular Data Consortium

A joining together of the leadership in the biosciences to address long-term systemic challenges:

● Policy development around rapidly emerging data asset issues;

● The changing requirements for undergraduate and postgraduate training; and

● Engagement with large-scale ‘omic resources onshore and offshore.


Beyond tools, data & compute: workforce training


Workforce development is a high-priority, ‘ready to go’ opportunity

Analysis of global efforts shows:

● Significant resources are being committed

● Substantial training material exists

● As expected, there is a lot of commonality