Planning a national bioinformatics infrastructure...
What is ‘Bioinformatics Infrastructure’?
Research infrastructure we need to remain
competitive in the global, rapidly changing
research environment of biosciences
Biosciences: understanding the research community
30,000 health/biosciences researchers
18,000 health/biosciences RHD students
48,000 health/biosciences PG course work students
(163,000 + 40,000 ≈) 200,000 health/biosciences UG students
1,000 to 1,500 bioinformaticians/computational biologists
Biosciences: understanding the research community
Estimated # Australian biology researchers in 2018: 30,000
● 20,000 (→ 15,000) biology-focussed bioscience researchers: occasional users of bioinformatics web services. Eg. BLAST, Ensembl
● 7,000 (→ 12,000) data-intensive bioscience researchers: ‘omics data analysis is a critical contributor to the research outcomes. Eg. RNAseq analysis to identify upregulated genes in broader research program
● 2,000 (→ 3,000) bioinf-intensive bioscience researchers: research is fully dependent on advanced use of bioinformatics. Eg. Genomic cancer research, population genomics/agricultural genomics programs
● 1,000 (In 5 years → 1,500) bioinformaticians: research into/application of techniques & tool development. Eg. research generating new tool or statistical method; core facilities applying complex analyses
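The segment estimates above can be cross-checked with a little arithmetic. The sketch below is illustrative only: it copies the four pairs of numbers from the slide and assumes every arrow (→) means "in 5 years", which the slide states explicitly only for bioinformaticians. The five-year total and the percentage shares are derived here and do not appear on the slide.

```python
# Segment estimates copied from the slide: (2018 estimate, estimate in 5 years).
# Assumption: every arrow on the slide refers to the same 5-year horizon.
segments = {
    "biology-focussed bioscience researchers": (20_000, 15_000),
    "data-intensive bioscience researchers": (7_000, 12_000),
    "bioinf-intensive bioscience researchers": (2_000, 3_000),
    "bioinformaticians": (1_000, 1_500),
}


def totals(segments):
    """Return (total in 2018, total in 5 years) across all segments."""
    now = sum(current for current, _ in segments.values())
    later = sum(projected for _, projected in segments.values())
    return now, later


def shares(segments, index):
    """Return each segment's share of the total (index 0 = 2018, 1 = in 5 years)."""
    total = sum(values[index] for values in segments.values())
    return {name: values[index] / total for name, values in segments.items()}


now, later = totals(segments)
print(f"2018 total: {now:,}")
print(f"Total in 5 years: {later:,}")

shares_now = shares(segments, 0)
shares_later = shares(segments, 1)
for name in segments:
    print(f"{name}: {shares_now[name]:.1%} -> {shares_later[name]:.1%}")
```

The 2018 segments sum to the 30,000 quoted in the slide heading. Under the stated assumption, the data-intensive group grows from roughly a quarter to well over a third of all researchers, which is the transition the slide highlights.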
Important Transitions
MISSION:
● Provide, expand and improve a repertoire of specialized bioinformatics tools
● Provide access to computing and storage capacities
● Provide regular training events
● Maintain and develop specific high-quality data resources
MISSION:
● Providing the national and international life science community with a state-of-the-art bioinformatics infrastructure, including resources, expertise and services
● Federating world-class researchers and delivering training in bioinformatics
Europe: National infrastructures + ELIXIR + EBI
The ELIXIR Platforms comprise:
● Data: Sustaining Europe’s life-science data infrastructure
● Tools: Services and connectors to drive access and exploitation
● Interoperability: Supporting the discovery, integration and analysis of biological data
● Compute: Storage, compute and authentication/access services
● Training: Professional skills for managing and exploiting data
Four Use Cases service domain-specific research communities:
● Human data: Developing long-term strategies for managing and accessing sensitive human data
● Rare diseases: Supporting the development of new therapies for rare diseases
● Marine metagenomics: Developing a sustainable metagenomics infrastructure to nurture research and innovation in marine science
● Plant science: Developing an infrastructure to facilitate genotype-phenotype analyses for crop and tree species
US National Institutes of Health: Data Commons
Four main components:
● A computing environment, such as the cloud or HPC (High Performance Computing) resources, which supports access, utilization and storage of digital objects.
● Publicly available datasets that adhere to a Commons digital object compliance model.
● Software services and tools to facilitate access to and use of data, whether the data is in the Commons or elsewhere.
● A digital object compliance model that describes the properties of digital objects that enable them to be findable, accessible, interoperable and reusable (FAIR).
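To make the idea of a digital object compliance model more concrete, here is a minimal sketch of what a FAIR check on a metadata record might look like. It is not the NIH Data Commons model: the `DigitalObject` class, its field names, the `FAIR_FIELDS` mapping and the example values are all hypothetical, and it uses the usual expansion of FAIR (findable, accessible, interoperable, reusable). A real compliance model would test much more than whether a field is filled in.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DigitalObject:
    """Hypothetical metadata record; field names are illustrative only."""
    identifier: Optional[str] = None   # persistent identifier -> findable
    access_url: Optional[str] = None   # where the object can be retrieved -> accessible
    data_format: Optional[str] = None  # community-standard format -> interoperable
    license: Optional[str] = None      # usage terms -> reusable


# Assumed mapping from each FAIR property to the field that evidences it.
FAIR_FIELDS = {
    "findable": "identifier",
    "accessible": "access_url",
    "interoperable": "data_format",
    "reusable": "license",
}


def fair_report(obj: DigitalObject) -> Dict[str, bool]:
    """Return {property: True/False}; True when the matching field is non-empty."""
    return {prop: bool(getattr(obj, field)) for prop, field in FAIR_FIELDS.items()}


def is_compliant(obj: DigitalObject) -> bool:
    """An object passes this toy check only if all four properties are satisfied."""
    return all(fair_report(obj).values())


# Example: a record with no licence fails the 'reusable' check.
record = DigitalObject(
    identifier="example-id-001",
    access_url="https://example.org/data/example-id-001",
    data_format="VCF",
)
print(fair_report(record))
print(is_compliant(record))
```

Keeping the property-to-field mapping in one dictionary means extra checks can be added without changing the reporting functions.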
US National Science Foundation: CyVerse
Vision: Transforming science through data-driven discovery
Mission: Design, develop, deploy, and expand a national cyberinfrastructure for life science research, and train scientists in its use
● Platforms, tools, datasets
● Storage and compute
● Training and support
National Reference Group
Prof Tony Bacic (Director, La Trobe Institute of Agriculture and Food)
Prof Jacquie Batley (Plant Genetics & Breeding, UWA)
Prof Dave Burt (Director Genomics, UQ)
Prof Peter Cameron (Acad Director, Alfred Emerg & Trauma Centre)
Prof Joanne Daly (CSIRO Honorary Fellow)
Prof Frank Gannon (Director, QIMR Berghofer)
Prof Rob Henry (Director, QAAFI, UQ)
Prof Ary Hoffmann (Biosciences, Melbourne U)
Prof Dean Jerry (Dep Director, JCU Ctr Tropical Fisheries & Aquaculture)
Prof Ryan Lister (Head, Epigntcs & Genomics, Harry Perkins Inst, UWA)
Prof John Mattick (Director, Garvan Institute/Director, Genome England)
Prof Kathryn North (Director, MCRI)
Prof Nicki Packer (Macquarie U & Inst for Glycomics, Griffith U)
Prof Tony Papenfuss (President ABACBS, Comp Biol WEHI/Petermac)
Dr Maurizio Rossetto (NSW Royal Bot Gardens)
Prof Eric Stone (Director, ANU-CSIRO Ctr Genmcs, Metablmcs & Bioinf)
Dr Jen Taylor (Group leader Bioinformatics, CSIRO)
Prof Steve Wesselingh (Director, SAHMRI)
Prof James Whisstock (Monash, EMBL-Australia)
Prof Marc Wilkins (Director, Ramaciotti Centre, UNSW)
International Scientific Advisory Group
Paul Flicek (Lead: Vertebrate Genomics & ENSEMBL)
Jaap Heringa (Head: ELIXIR-NL)
Jason Williams (Lead: Education, Outreach and Training)
Tony Papenfuss (Head, Computational Biology, WEHI, VIC)
Mark Walker (Director, Aust Infectious Disease Res Centre, UQ, QLD)
Delphine Fleury (Aus Centre for Plant Functional Genomics, SA)
Sean Grimmond (Director, Centre for UoM Cancer Research, VIC)
Rebecca Johnson (Director, Australian Museum Research Institute, NSW)
Vivien Bonazzi (Program Leader for NIH Data Commons)
Rochelle Tractenberg (Founder: Collaborative for Research on Outcomes and Metrics)
Community consultations:
● Brisbane 8/Aug
● Perth 10-11/Oct
● Canberra 30/Oct
● Sydney 3/Nov
● Melbourne 8/Nov
● ABACBS 14/Nov
● Melbourne 17/Nov
● Adelaide 20/Nov
Timeline
Phase   Period              Activity
1       Jun - Aug 2017      Concept development and project planning
2       Sep - Dec 2017      Elaboration of requirements, options and consensus building
3       Jan - Sep 2018      Engagement with expected NCRIS planning of its investments
4       Oct 2018 - ? 2019   Steps to implement results or engage further as needed
Three Capabilities
Capability I: A Biologist to Bioinformatics Bridge
A national omics analysis service providing:
● A means to use standardised bioinformatics techniques through high-level interfaces;
● Integrated with a regionally accessible support and training network; and
● Providing direct access to underlying infrastructure for new technique developers
Options (which are not mutually exclusive):
1. Operate the service on a transaction-based model - bring your data - take your data.
2. Include a long-term data retention and publication function.
3. Include a user workflow retention and publication function.
Capability II: Data Integration Facilities
Providing:
● A facility for data intensive computing on bioscience data and tools;
● Coupled with a critical mass of data science expertise versed in omics; and
● Assigned by merit to large research teams for extended periods
Options:
1. A single facility supporting integration of all data types (eg EBI).
2. Multiple facilities, each providing support for a domain of specialization in data types (eg deNBI).
Capability III: An Australian Biomolecular Data Consortium
A joining together of the leadership in bioscience to address long term systemic challenges:
● Policy development around rapidly emerging data asset issues;
● The changing requirements on undergraduate and postgraduate training; and
● Engagement with large scale -omic resources onshore and offshore.