The Data Commons: Digital Ecosystems for Sharing and Analyzing Biomedical Big Data
Vivien Bonazzi, Ph.D.
Senior Advisor for Data Science
Office of Data Science (ADDS)
National Institutes of Health
Let's Talk About Biomedical Big Data
What Makes Big Data Big?
VOLUME, VELOCITY, VARIETY, VERACITY
It's a signal of the coming digital economy
DATA has VALUE
DATA is CENTRAL to the digital economy
But it's more than this...
An economy characterized by using data to gain a business advantage
(yes, institutions are businesses)
Organizations that are not born digital will be at a disadvantage in the new economy
Organizations will be defined by their digital assets
Scientific digital assets: data, software, workflows, documentation, journal articles
The most successful organizations of the future will be those that can leverage their digital assets and transform them into a digital enterprise
Make data
The currency of an organization
Usable in a digital ecosystem: the Data Commons
The problem with biomedical data
Digital assets include data
Challenges: Biomedical Data
The journal article is the end goal
Data is a means to an end (low value)
Data is not FAIR (Findable, Accessible, Interoperable, Reusable)
Limited e-infrastructures to support FAIR data
The Problem with Biomedical DATA
https://www.youtube.com/watch?v=N2zK3sAtr-4
What's Changing?
FAIR principles drive data to become the currency
Policies that promote data sharing via FAIR help change the culture
Currencies don't exist in a vacuum
They are used to buy and sell goods
We also need a digital ecosystem that allows transactions to occur on FAIR data at scale
The Data Commons is a platform that fosters the development of a digital ecosystem
Treats the products of research (data, software, methods, papers, etc.) as digital assets (objects)
Digital objects need to conform to FAIR principles
Digital objects exist in a shared virtual space where users can find, deposit, manage, share, and reuse digital assets
Enables interactions between Producers and Consumers of digital assets
Gives currency to digital assets and the people who develop and support them
Is the Data Commons a platform that fosters the development of a digital ecosystem?
A nascent platform
A platform is a plug-and-play model that allows multiple participants (producers and consumers) to connect to it, interact with each other, and create value
Source: Sangeet Paul Choudary, Platform Scale
A lot of what we see today uses a platform approach
Source: Sangeet Paul Choudary, Platform Scale
Platforms that use data as a central currency enable transactions between producers and consumers
The goal of a Data Commons platform is to enable interactions between producers and consumers
Source: Sangeet Paul Choudary, Platform Scale
Producers create digital objects (data, tools, workflows) that are used by consumers
The platform enables these transactions
It accommodates both bioinformatics and non-bioinformatics users
To understand the Data Commons platform (and how it works for biomedical data), we can use a platform stack to help visualize the concept
A framework helps visualize the concept of the platform
Source: Sangeet Paul Choudary, Platform Scale
Platforms have 3 layers
NIH Data Commons - Platform Stack (https://datascience.nih.gov/commons)
Technology
Data
Network/marketplace
NIH Data Commons: Digital Asset Compliance - Making Things FAIR
Initial phase:
Unique digital object identifiers resolvable to the original authoritative source
Machine readable
A minimal set of searchable metadata
Clear access rules (especially important for human subjects data)
An entry (with metadata) in one or more indices
Future phases:
Standard, community-based unique digital object identifiers
Conform to community-approved standard metadata and ontologies for enhanced searching
Digital objects accessible via open standard APIs
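The initial-phase compliance criteria read like a checklist, so a small sketch may help make them concrete. This is a hypothetical illustration, not an NIH tool or schema: the class name, the required metadata fields, the list of "machine readable" formats, and the "open"/"controlled" access values are all assumptions chosen for the example. It reports which criteria a digital object does not yet meet.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumption: a hypothetical "minimal set of searchable metadata".
# The slide does not name specific fields.
REQUIRED_METADATA = {"title", "creator", "date", "description"}

# Assumption: hypothetical formats treated as machine readable.
MACHINE_READABLE_FORMATS = {"text/csv", "application/json", "application/xml"}


@dataclass
class DigitalObject:
    identifier: str                    # unique digital object identifier
    resolves_to: Optional[str]         # URL of the original authoritative source
    media_type: str                    # e.g. "text/csv"
    metadata: dict = field(default_factory=dict)
    access_rule: Optional[str] = None  # hypothetical values: "open" or "controlled"
    indices: List[str] = field(default_factory=list)  # indices listing this object


def initial_phase_gaps(obj: DigitalObject) -> List[str]:
    """Return the initial-phase criteria this object does not yet meet."""
    gaps = []
    if not obj.identifier or not obj.resolves_to:
        gaps.append("identifier not resolvable to authoritative source")
    if obj.media_type not in MACHINE_READABLE_FORMATS:
        gaps.append("not machine readable")
    missing = REQUIRED_METADATA - set(obj.metadata)
    if missing:
        gaps.append("missing metadata: " + ", ".join(sorted(missing)))
    if obj.access_rule not in {"open", "controlled"}:
        gaps.append("no clear access rule")
    if not obj.indices:
        gaps.append("not listed in any index")
    return gaps


# Usage example with a hypothetical object and URL
obj = DigitalObject(
    identifier="obj:0001",
    resolves_to="https://example.org/obj/0001",
    media_type="text/csv",
    metadata={"title": "Example", "creator": "Lab A", "date": "2016-01-01"},
    access_rule="controlled",
)
print(initial_phase_gaps(obj))
# ['missing metadata: description', 'not listed in any index']
```

The future phases would replace these ad hoc checks with community-approved identifier schemes, metadata standards, and ontologies.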
Data Commons Platform drives digital ecosystem
The NIH Data Commons Pilot
Co-location of large and/or highly utilized NIH-funded data with storage and computing infrastructure, plus commonly used tools for analyzing and sharing digital objects, to create an interoperable resource for the research community.
Investigators will be able to collaborate and share digital objects within this environment and connect with others
NIH Nascent Commons Pilots
An NIH Wide Data Commons Pilot
Indexing
Authorization/authentication layer
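One way to picture an index sitting behind an authorization layer is a search that only returns objects the requesting user may see. The sketch below is hypothetical: the class names, the approval table, and the open/controlled flag are assumptions for illustration, and authentication (verifying who the user is) is assumed to have already happened upstream.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class IndexEntry:
    object_id: str
    keywords: Set[str]
    controlled: bool  # True for controlled-access data (e.g. human subjects)


@dataclass
class CommonsIndex:
    entries: Dict[str, IndexEntry] = field(default_factory=dict)
    # Hypothetical authorization table: user -> object IDs they are approved for
    approvals: Dict[str, Set[str]] = field(default_factory=dict)

    def add(self, entry: IndexEntry) -> None:
        self.entries[entry.object_id] = entry

    def approve(self, user: str, object_id: str) -> None:
        self.approvals.setdefault(user, set()).add(object_id)

    def is_authorized(self, user: str, entry: IndexEntry) -> bool:
        # Open objects are visible to everyone; controlled ones need approval
        return not entry.controlled or entry.object_id in self.approvals.get(user, set())

    def search(self, user: str, keyword: str) -> List[str]:
        """Find objects by keyword, returning only those the user may access."""
        return sorted(
            e.object_id
            for e in self.entries.values()
            if keyword in e.keywords and self.is_authorized(user, e)
        )


index = CommonsIndex()
index.add(IndexEntry("obj:rna-001", {"rna-seq", "mouse"}, controlled=False))
index.add(IndexEntry("obj:wgs-042", {"wgs", "human"}, controlled=True))
index.add(IndexEntry("obj:wgs-043", {"wgs", "mouse"}, controlled=False))
index.approve("alice", "obj:wgs-042")

print(index.search("alice", "wgs"))  # ['obj:wgs-042', 'obj:wgs-043']
print(index.search("bob", "wgs"))    # ['obj:wgs-043']
```

Keeping the access check inside the search path, rather than only at download time, means controlled-access objects are not revealed to unapproved users at all; whether a real commons should hide them or merely show them as "access required" is a design choice this sketch does not settle.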
Considerations
Metrics - understanding and accounting of data usage patterns
Cost - Cloud Storage, pay for use cloud compute (NIH credits)
Hybrid Clouds - Mix of research and commercial clouds
Connecting - Interoperability with other Commons, clouds
Consent - Reconsenting data, dynamic consent
Standards - Metadata, UIDs, APIs
An Australian Commons Experiment?
A Garvan Data Commons Platform?
Garvan DATA
NCI + Cloud
Analysis tools (incl. 3rd party)
Apps Store
Community: Research, Clinical, Public
API connectivity with other Commons
* All Garvan data and tools in an authorized, access-controlled environment that allows access to approved users
* Hybrid clouds: NCI (National Computational Infrastructure) + commercial (AWS)
* Allow approved users (Garvan or others, including commercial vendors such as DNAnexus) to develop tools (SaaS) on top of the Garvan data
* API connections to other Commons (NY Genome Center)
* Beacon projects - variation
An Australian Data Commons?
Australian DATA - flora and fauna
Cloud (NCI + commercial)
Analysis tools (incl. 3rd party)
Apps Store
Community: Research, Clinical, Public
API connectivity with other Commons
Develop an Australian Data Commons
Make ALL Australian data (flora and fauna, including human clinical data) available in a data commons cloud (a mix of NCI and commercial cloud)
Encourage tool development by bioinformatics research or commercial groups
Make the commons interoperable with other cloud Commons
Use NCBI and EBI as an archive: learn their annotation methods for metadata, their data distribution methods, and their cloud access
Embed postdocs within NCBI and EBI to learn these methods and bring them back to Australia
Develop a team approach
Use this as a way to train the next generation of scientists (bioinformatics and non-bioinformatics)
To achieve great things, two things are needed: a plan and not quite enough time
Leonard Bernstein
Thank you
ADDS Office: Phil Bourne, Michelle Dunn, Jennie Larkin, Mark Guyer, Sonynka Ngosso
NCBI: Jim Ostell, David Lipman, George Komatsoulis
NHGRI: Valentina di Francesco, Kevin Lee, Eric Green
NIGMS: John Lorsch, Susan Gregurik
CIT: Andrea Norris, Debbie Sinmao, Stacy Charland
NCI: Warren Kibbe, Tony Kerlavage, Lou Staudt, Tanja Davidsen, Ian Fore
NIAID: JJ McGowan, Nick Weber, Darrell Hurt, Maria Giovanni
The NIH Common Fund: Betsy Wilder, Jim Anderson, Leslie Derr
Trans-NIH BD2K Executive Committee & Working Groups
Many biomedical researchers, cloud providers, IT professionals
John Mattick and the Garvan Institute
Stay in Touch
Slideshare
Blog (coming soon!)
Vivien Bonazzi