EMBL Australian Bioinformatics Resource AHM - Data Commons
BD2K and why bioinformatics matters
relevance to Australia
EMBL - Australia AHM 2016
Vivien Bonazzi, Senior Advisor for Data Science Technologies
ADDS (Associate Director for Data Science) Office, Office of the Director (OD), National Institutes of Health (NIH)
The NIH Data Commons Digital Ecosystems for using and sharing FAIR Data
What’s driving the need for a Data Commons?
Convergence of factors
Mountains of Data
Increasing need and support for Data sharing
Availability of digital technologies and infrastructures that support Data at scale
NIH Genomic Data Sharing (GDS) Policy: https://gds.nih.gov/ (went into effect January 25, 2015)
NCI guidance: http://www.cancer.gov/grants-training/grants-management/nci-policies/genomic-data
Requires public sharing of genomic data sets
Recommendation #4: A national cancer data ecosystem for sharing and analysis.
Create a National Cancer Data Ecosystem to collect, share, and interconnect a broad array of large datasets so that researchers, clinicians, and patients will be able to both contribute and analyze data, facilitating discovery that will ultimately improve patient care and outcomes.
Challenges with Biomedical Data
The Journal Article is the end goal
• Data is a means to an end (low value)
• Data is not FAIR: Findable, Accessible, Interoperable, Reusable
• Limited e-infrastructures to support FAIR data
What’s Changing?
Digital ecosystems
Development of the NIH Data Commons
• How do we find data, software, standards?
• How can we make (large) data, annotations, software, metadata accessible?
• How do we reuse data, tools and standards?
• How do we make more data machine readable?
• How do we leverage existing digital technologies, systems and infrastructures?
• How do we collaborate?
• How do we enable a digital ecosystem?
Changing the conversation around Data sharing and access
NIH Data Commons
Data Commons enabling data driven science
Enable investigators to leverage all possible data and tools in the effort to accelerate biomedical discoveries, therapies and cures
by driving the development of data infrastructure and data science capabilities through collaborative research and robust engineering
Matthew Trunnell, FHC
Data Commons
Developing a Data Commons
Treats products of research – data, methods, papers etc. – as digital objects
These digital objects exist in a shared virtual space
• Find, Deposit, Manage, Share, and Reuse data, software, metadata and workflows
Digital object compliance through FAIR principles:
• Findable
• Accessible (and usable)
• Interoperable
• Reusable
The Data Commons is a framework that supports
FAIR data access and sharing and fosters the development of a digital ecosystem
https://datascience.nih.gov/commons
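One way to picture digital object compliance is as a record plus a checklist. The sketch below is illustrative only and is not part of the NIH Data Commons: the `DigitalObject` class, its fields, and the rules in `fair_report` are assumptions chosen to mirror the four FAIR principles listed above, and the identifier and URL in the usage lines are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DigitalObject:
    """Hypothetical record for a data set, tool, workflow or paper."""
    identifier: Optional[str] = None     # persistent unique ID (e.g. a DOI)
    metadata: Dict[str, str] = field(default_factory=dict)
    access_url: Optional[str] = None     # where the object can be retrieved
    data_format: Optional[str] = None    # open or community-standard format
    license: Optional[str] = None        # terms under which it may be reused


def fair_report(obj: DigitalObject) -> Dict[str, bool]:
    """Apply simplified, assumed checks for each FAIR principle."""
    return {
        "findable": bool(obj.identifier) and bool(obj.metadata),
        "accessible": bool(obj.access_url),
        "interoperable": bool(obj.data_format),
        "reusable": bool(obj.license) and "provenance" in obj.metadata,
    }


def missing_principles(obj: DigitalObject) -> List[str]:
    """List the principles the object does not yet satisfy."""
    return [name for name, ok in fair_report(obj).items() if not ok]


# Usage: an object with an ID, metadata, URL and format, but no license.
example = DigitalObject(
    identifier="doi:10.0000/example",          # placeholder identifier
    metadata={"title": "Example data set"},
    access_url="https://example.org/data/1",   # placeholder URL
    data_format="VCF",
)
print(missing_principles(example))
```

A real compliance service would need richer rules (controlled vocabularies, access protocols, machine-readable licenses); the point here is only that each principle can be turned into a testable property of the object.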
The Data Commons Framework (diagram)
• Compute Platform: Cloud
• Services: APIs, Containers, Indexing
• Software: Services & Tools (scientific analysis tools/workflows)
• Data: “Reference” Data Sets, User defined data
• Digital Object Compliance
• App store/User Interface
• Service models shown: IaaS, PaaS, SaaS
https://datascience.nih.gov/commons
Current Data Commons Pilots
Reference Data Sets
• Making large and/or high impact NIH funded data sets and tools accessible in the cloud
Commons Framework Pilots
• Explore feasibility of the Commons Framework
• Facilitate collaboration and interoperability
Cloud Credit Model
• Provide access to cloud (IaaS) and PaaS/SaaS via credits
• Connecting credits to the grants system
Resource Search & Index
• Developing Data and Software indexing methods
• Leveraging BD2K efforts: bioCADDIE and others
• Collaborating with external groups
Reference Data Sets Pilot: Large, High-Impact Datasets in the Cloud
Commons Framework Pilots
Software and Services
Commons Framework
• FAIRness Metrics
• Data-object registry
• Interoperability of APIs
• Workflow sharing and docker registry
• Commons Framework Publications
Resource Search & Indexing
Discoverability of data and software
Cloud Credits Model
$ denominated NIH credits to use cloud resources (IaaS) and services (PaaS/SaaS)
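The credits idea can be sketched as a simple ledger. This is a toy model under stated assumptions, not a description of how the NIH pilot works: the `CreditAccount` class, the grant identifier, the service names, and the dollar amounts are all hypothetical.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List, Tuple


@dataclass
class CreditAccount:
    """Hypothetical dollar-denominated credit balance tied to a grant."""
    grant_id: str
    balance: Decimal
    history: List[Tuple[str, Decimal]] = field(default_factory=list)

    def spend(self, service: str, cost: Decimal) -> Decimal:
        """Debit the cost of a cloud service and return the new balance."""
        if cost <= 0:
            raise ValueError("cost must be positive")
        if cost > self.balance:
            raise ValueError("insufficient credits")
        self.balance -= cost
        self.history.append((service, cost))
        return self.balance


# Usage: credits awarded against a (made-up) grant are spent on IaaS and SaaS.
account = CreditAccount(grant_id="GRANT-0001", balance=Decimal("1000.00"))
account.spend("IaaS: compute hours", Decimal("250.00"))
account.spend("SaaS: analysis workflow", Decimal("100.50"))
print(account.grant_id, account.balance)
```

Keeping a per-grant history of spending is one way such a model could connect credits to the grants system and feed usage metrics; `Decimal` is used so that currency amounts are not subject to floating-point rounding.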
The Data Commons Framework (diagram repeated, with additions)
• Indexing
• Authorization/authentication layer
• Digital Ecosystem
Considerations and Concluding Thoughts
Considerations
Metrics – Understanding and accounting of data usage patterns
Cost
• Cloud Storage
• Pay for use cloud compute (NIH credits pilot)
• Indirect costs for cloud
Hybrid Clouds – Institution (private) and commercial (public) clouds
Managing Open vs Controlled access data
• Auth: single sign on - dreams/nightmares?
Archive vs Working and versioning Copies of data
Interoperability with other Commons (clouds)
Standards – Metadata, UIDs, APIs
Discoverability – Finding digital objects across clouds
Interfaces – For users with different needs and capabilities
Consent – Reconsenting data, Dynamic consents?
Policies
• Data sharing policies that are useful and effective
• Keep pace with use of technology (e.g. dbGaP data in the Cloud)
Incentives
• Access to, and shareability of FAIR Data as part of NIH grant review criteria
Governance – Community involvement in governance models
Sustainability – Long term support
Relevance to Australia?
Relevance to Australia
The value of Australian Data *
Unique flora and fauna, e.g. marsupials
Indigenous Australians
Understanding of genomic structure – health & disease
Medicinal products
Making this data (securely) available
• With high quality annotation and metadata
• Attributions to original authors
• On the cloud
• Via open standard APIs
Aggregation of data via an Australia-wide Commons?
Indexing
Authorization/authentication layer
Oz Digital Ecosystem
Summary
We need an unprecedented level of convergence and collaboration to drive biomedical science to the next level.
Supporting this model of data-intensive collaborative science requires a shift in academic research culture and new investments in data infrastructure and capabilities.
Matthew Trunnell, FHC
Acknowledgments
• ADDS Office: Jennie Larkin, Phil Bourne, Michelle Dunn, Mark Guyer, Allen Dearry, Sonynka Ngosso, Tonya Scott, Lisa Dunneback, Vivek Navale (CIT/ADDS)
• NCBI: George Komatsoulis
• NHGRI: Valentina di Francesco
• NIGMS: Susan Gregurick
• CIT: Andrea Norris, Debbie Sinmao
• NIH Common Fund: Jim Anderson, Betsy Wilder, Leslie Derr
• NCI Cloud Pilots/GDC: Warren Kibbe, Tony Kerlavage, Tanja Davidsen
• Commons Reference Data Set Working Group: Weiniu Gan (HL), Ajay Pillai (HG), Elaine Ayres (BITRIS), Sean Davis (NCI), Vinay Pai (NIBIB), Maria Giovanni (AI), Leslie Derr (CF), Claire Schulkey (AI)
• RIWG Core Team: Ron Margolis (DK), Ian Fore (NCI), Alison Yao (AI), Claire Schulkey (AI), Eric Choi (AI)
• OSP: Dina Paltoo, Kris Langlais, Erin Luetkemeier, Agnes Rooke
• Research and Industry: Matthew Trunnell (FHC), Bob Grossman (Chicago), Toby Bloom (NYGC)
Stay in Touch
QR Business Card
@Vivien.Bonazzi
Slideshare
Blog (Coming soon!)