Position Paper for Data Fabric IG Interoperability, Infrastructures and Virtuality Gary Berg-Cross,...

Page 1

Position Paper for Data Fabric IG: Interoperability, Infrastructures and Virtuality

https://rd-alliance.org/group/data-fabric-ig.html

Gary Berg-Cross, Keith Jeffery, Reagan Moore

What principles & methods are needed to guide the interaction between services (interfaces, protocols)?

What managed processes & services?

What basic & flexible infrastructure machinery?

See https://rd-alliance.org/group/data-fabric-ig/post/re-rda-datafabric-ig-data-fabric-position-paper-broaden-discussion-berg-1 for discussion & attached file.

Page 2

1. Much of the current DF discussion focuses on a data management & lifecycle view. It lacks a focus on other important topics, such as the standards & federation mechanisms needed to assemble collaborations that span institutions and data management environments.

2. Interoperability needs to be a first-class concept in the DF conversation; it is fundamentally important for federation & for overcoming problems generated by data silos.

3. There are multiple benefits to developing federated & virtualized mechanisms & mathematical descriptions that assist the sharing of DOs & (digital) knowledge procedures.

Key Points

Page 3

Initial ideas for DF IG: Implied Framework is Data Lifecycle

New groups emerged, so publications are part of the Data Fabric & related analysis

View with a Data Management Focus that emerged from the discussions amongst various RDA WG chairs

Where is interoperability in a raw-to-citable data view?

Page 4

Data Fabric Analysis, e.g. components & services in the lifecycle (LC) via use cases

How do we arrive at the essential components & services (guided by use scenarios that need to include collaboration)?

This scenario doesn’t show analysis or data sharing via DO manipulation!

Sharing

Page 5

Concept of Interoperability: the extent to which systems and devices can routinely exchange data and services, and interpret that shared data through the shared services.

A stronger type of data exchange can include knowledge of the meaning of the data content, usage constraints, and the underlying assumptions.

Bring interoperability aspects (within and across domains) into the DF discussions as a ‘first-class citizen’ alongside all the other aspects of the research data lifecycle in a domain.

Make Interoperability a First Class View
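To make the distinction above concrete, the following minimal Python sketch contrasts plain data exchange with the stronger, meaning-aware kind. It assumes that meaning, usage constraints and assumptions travel with the data in a wrapper object; the names `DataEnvelope`, `Receiver` and `can_interpret`, and the sample values, are hypothetical and do not come from any RDA specification.

```python
from dataclasses import dataclass, field


@dataclass
class DataEnvelope:
    """Hypothetical wrapper: the payload plus the context a receiver needs to interpret it."""
    payload: list[float]
    units: str  # meaning of the values
    usage_constraints: list[str] = field(default_factory=list)  # e.g. licence terms
    assumptions: list[str] = field(default_factory=list)  # carried for humans, not checked here


@dataclass
class Receiver:
    """A consuming system that understands some units and accepts some usage constraints."""
    known_units: set[str]
    accepted_constraints: set[str]

    def can_interpret(self, env: DataEnvelope) -> bool:
        # Plain exchange would accept any payload; the stronger form also checks that
        # the meaning (units) is understood and that every usage constraint is acceptable.
        return env.units in self.known_units and all(
            c in self.accepted_constraints for c in env.usage_constraints
        )


sea_level = DataEnvelope(
    payload=[0.12, 0.15, 0.11],
    units="m",
    usage_constraints=["CC-BY-4.0"],
    assumptions=["relative to local tide-gauge datum"],
)
ocean_group = Receiver(known_units={"m", "cm"}, accepted_constraints={"CC-BY-4.0"})
print(ocean_group.can_interpret(sea_level))  # prints True
```

In this sketch the assumptions are only carried along for a human reader; checking them automatically would need richer, declared semantics.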

Page 6

When an enterprise implements a data management solution, it typically chooses one of several types of DF infrastructure:

Data management – the enterprise builds a data repository, manages an information catalog, & enforces management & curation policies (but also)

Data analysis – the enterprise processes a data collection, applies analysis & visualization tools, and automates a processing pipeline (but also)

Data preservation – the enterprise builds reference collections and knowledge bases that comprise its intellectual capital, while managing technology evolution

Data publication – the enterprise provides descriptive information and arrangement for the discovery of, and access to, data collections

Data sharing – controlled sharing of a data collection, shared analysis workflows, and information catalogs (interoperability)

Multiple types of DF infrastructure

Page 7

Interoperability mechanisms required for sharing data, information, & knowledge.

Composition – how separately developed components can be made to work together.

Minimal set of infrastructure mechanisms & service requirements

Gaps, obstacles and possible incompatibilities

Different suites of components will have different data fabrics.

Enable reproducible research

Brokers
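One way to picture composition through a broker is sketched below, under our own assumptions: two separately developed components use different record formats, and a broker holds adapters so that neither component needs to know the other. The `Broker` class, the format labels `station-v1` and `analysis-v2`, and the field names are hypothetical; a missing adapter surfaces as an error, standing in for the gaps and incompatibilities listed above.

```python
from typing import Any, Callable

Record = dict[str, Any]
Adapter = Callable[[Record], Record]


class Broker:
    """Minimal hypothetical broker: translates records between component formats."""

    def __init__(self) -> None:
        self._adapters: dict[tuple[str, str], Adapter] = {}

    def register(self, src: str, dst: str, adapter: Adapter) -> None:
        self._adapters[(src, dst)] = adapter

    def deliver(self, record: Record, src: str, dst: str) -> Record:
        if src == dst:
            return record  # same format, nothing to translate
        try:
            adapter = self._adapters[(src, dst)]
        except KeyError:
            # No adapter: a gap or incompatibility between the two components.
            raise LookupError(f"no adapter from {src!r} to {dst!r}") from None
        return adapter(record)


def station_to_analysis(rec: Record) -> Record:
    # Component A reports Fahrenheit under its own keys; component B expects Celsius.
    return {"station_id": rec["stn"], "air_temperature_c": round((rec["temp_f"] - 32) * 5 / 9, 2)}


broker = Broker()
broker.register("station-v1", "analysis-v2", station_to_analysis)
print(broker.deliver({"stn": "KDCA", "temp_f": 77.0}, "station-v1", "analysis-v2"))
# prints {'station_id': 'KDCA', 'air_temperature_c': 25.0}
```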

Page 8

EUDAT & DataNet Federation Consortium use cases help illustrate the interoperability mechanisms needed for sharing DOs & (digital) knowledge procedures. One implication is that researchers can re-execute trusted procedures to obtain identical results, making reproducible data-driven research possible.

Community-driven research collaborations between research groups:

Seismology – share seismic data, tsunami prediction workflows

Climate change – share oceanography environmental data, coastal storm surge analyses, hydrology flood analyses, satellite environmental data

Genomics – build a cohort of genomes, predictive models for humans, plants, animals, diseases

Data Sharing Use Cases
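The reproducibility implication (re-executing a trusted procedure to obtain identical results) can be sketched as a provenance record that stores digests of the inputs and outputs. This is an illustrative assumption rather than the EUDAT or DataNet Federation Consortium mechanism; the registry `TRUSTED`, the `mean-surge` procedure and its parameters are hypothetical, and the check only works for deterministic procedures.

```python
import hashlib
import json
from typing import Any, Callable


def digest(obj: Any) -> str:
    """Stable SHA-256 digest of a JSON-serialisable object (keys sorted for determinism)."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode("utf-8")).hexdigest()


# Hypothetical registry of trusted procedures, keyed by (name, version).
TRUSTED: dict[tuple[str, str], Callable[..., Any]] = {}


def trusted(name: str, version: str):
    """Decorator that registers a function as a trusted, versioned procedure."""
    def wrap(fn):
        TRUSTED[(name, version)] = fn
        return fn
    return wrap


@trusted("mean-surge", "1.0")
def mean_surge(levels: list[float], threshold: float) -> float:
    above = [x for x in levels if x > threshold]
    return sum(above) / len(above) if above else 0.0


def run_and_record(name: str, version: str, inputs: Any, params: dict) -> tuple[dict, Any]:
    """Run a trusted procedure and return a provenance record plus the result."""
    result = TRUSTED[(name, version)](inputs, **params)
    record = {"procedure": name, "version": version, "params": params,
              "input_digest": digest(inputs), "output_digest": digest(result)}
    return record, result


def reproduce(record: dict, inputs: Any) -> bool:
    """Re-execute the recorded procedure and check that the result is identical."""
    if digest(inputs) != record["input_digest"]:
        return False  # not the same input data
    result = TRUSTED[(record["procedure"], record["version"])](inputs, **record["params"])
    return digest(result) == record["output_digest"]


record, value = run_and_record("mean-surge", "1.0", [0.4, 1.3, 1.7, 0.9], {"threshold": 1.0})
print(value, reproduce(record, [0.4, 1.3, 1.7, 0.9]))
```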

Page 9

1. Shared name spaces for users, files, and services.
   1. Besides single sign-on, a shared name space provides users with federated services; we also want to offer services for virtual collections that span administrative domains.
   2. A shared name space for services enables re-use of procedures across researchers' resources.

2. Shared services for manipulating digital objects.
   1. For example, a shared service reached through a broker via its access protocol, or a service encapsulated in a virtual machine environment that is moved to the local research resources for execution.

3. Third-party (service) access.
   1. Requests are posted to a third party, such as a message queue, which eliminates direct communication between the federated system components.

Expanded Ideas of Federation: 3 versions of federated systems
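The third-party access pattern in item 3 can be sketched with Python's standard-library `queue.Queue` standing in for a real message broker. Both components are hypothetical: a portal in one administrative domain posts replication requests and a data grid worker in another domain serves them, and neither holds a reference to the other. The logical paths are made up for illustration.

```python
import queue
import threading

# The queues are the third party: the two components only ever talk to them.
request_queue: queue.Queue = queue.Queue()
reply_queue: queue.Queue = queue.Queue()


def data_grid_worker() -> None:
    """Hypothetical component in domain B: serves replication requests from the queue."""
    while True:
        msg = request_queue.get()
        if msg is None:  # sentinel value: shut down
            break
        reply_queue.put({"request_id": msg["request_id"], "status": "replicated",
                         "object": msg["object"]})


def research_portal(objects: list[str]) -> list[dict]:
    """Hypothetical component in domain A: posts requests, then collects the replies."""
    for i, obj in enumerate(objects):
        request_queue.put({"request_id": i, "op": "replicate", "object": obj})
    return [reply_queue.get(timeout=5) for _ in objects]


worker = threading.Thread(target=data_grid_worker)
worker.start()
results = research_portal(["/zone-a/seismic/2014-03.mseed", "/zone-a/seismic/2014-04.mseed"])
request_queue.put(None)  # ask the worker to stop
worker.join()
print(results)
```

In a real federation the queue would be an external messaging service reachable from both domains; the in-process queue only shows the decoupling.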

Page 10

Virtual machines, such as in a cloud or grid environment: required to manage dynamic resource allocation, scalability, distributed parallelism, energy efficiency and other aspects.

Virtual collections: required to build research collaboration environments.

We need the appropriate level of abstraction for optimum computing environment/middleware behavior. Too low or prescriptive a level constrains the environment; too high or abstract a level does not clearly indicate the user's requirements. See Triple-I Computing as a concept (the Information-Intention-Incentive model proposed by [Schubert and Jeffery, 2014]); research projects are already addressing the challenges therein.

Enhanced Use of Virtualization
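Virtual collections built on a shared name space (item 1 of the federation list) can be sketched as follows. The sketch assumes a federation-wide catalogue that maps logical names to replicas held in different administrative domains; `SharedNamespace`, `VirtualCollection`, the domain names and all paths and URLs are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Replica:
    domain: str    # administrative domain that holds this copy
    location: str  # physical path or URL inside that domain


class SharedNamespace:
    """Hypothetical federation-wide catalogue: logical name -> physical replicas."""

    def __init__(self) -> None:
        self._entries: dict[str, list[Replica]] = {}

    def register(self, logical_name: str, replica: Replica) -> None:
        self._entries.setdefault(logical_name, []).append(replica)

    def resolve(self, logical_name: str, preferred_domain: Optional[str] = None) -> Replica:
        replicas = self._entries[logical_name]  # KeyError if the name is unknown
        for r in replicas:
            if r.domain == preferred_domain:
                return r  # prefer a copy held by the requesting domain
        return replicas[0]  # otherwise fall back to the first registered copy


class VirtualCollection:
    """A named set of logical names; it owns no data, so it can span domains."""

    def __init__(self, name: str, namespace: SharedNamespace) -> None:
        self.name = name
        self.namespace = namespace
        self.members: list[str] = []

    def add(self, logical_name: str) -> None:
        self.members.append(logical_name)

    def locate(self, requesting_domain: str) -> dict[str, Replica]:
        return {m: self.namespace.resolve(m, requesting_domain) for m in self.members}


ns = SharedNamespace()
ns.register("/fed/climate/surge-2012.nc", Replica("univ-a", "/data/surge/2012.nc"))
ns.register("/fed/climate/surge-2012.nc",
            Replica("lab-b", "https://lab-b.example.org/archive/surge-2012.nc"))
ns.register("/fed/climate/flood-model.py", Replica("lab-b", "/code/flood-model.py"))

coastal = VirtualCollection("coastal-hazards", ns)
coastal.add("/fed/climate/surge-2012.nc")
coastal.add("/fed/climate/flood-model.py")
for name, rep in coastal.locate("lab-b").items():
    print(name, "->", rep.domain, rep.location)
```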

Page 11

We have made a useful start, but the DF vision needs to be expanded (and also focused, for maximum benefit) beyond a domain of registered DOs stored in well-managed repositories.

Frame the DF & its services broadly, as data use & applications that take into account the available context or environment. This doesn’t minimize good data management practices and services, which are necessary and deserve support but are not sufficient to address the interoperability challenge.

For enhanced, semi-automated interoperability we need to consider:

Improved metadata for data in context, with enhanced semantics

Leveraging the emerging mathematical foundation for federation of data management systems (e.g. work by Hao Xu).

Analysis and Preliminary Conclusions

Page 12

Including datasets, SW services, resources (computers, detectors…), users

Composed as workflows documented mathematically and ideally created autonomically

Achieved through metadata describing the elements of bullet 1:

Discovery

Contextualisation (relevance, quality, … through relations to organisations, persons, projects, publications etc. and provenance, rights)

Detailed, application-specific (to connect software to data at a resource for a user), i.e. schema-level

The key technologies to achieve interoperability, as recognised by researchers, are:

AAAI

PID

Metadata with formal syntax and declared semantics

So… the DF & VREs (virtual research environments) must be integrated.
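The three metadata levels listed above (discovery, contextualisation, and detailed schema-level metadata) can be illustrated with a minimal record type. This is a sketch under our own assumptions, not a standard schema: the field names, the placeholder PID `example-pid:0001` and the simple type-matching rule in `can_connect` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    pid: str  # persistent identifier (placeholder format)
    # Discovery level: enough to find the object.
    title: str
    keywords: list[str]
    # Contextualisation level: relations, provenance and rights for judging relevance and quality.
    relations: dict[str, list[str]] = field(default_factory=dict)
    provenance: list[str] = field(default_factory=list)
    rights: str = ""
    # Detailed, application-specific level: the schema software needs (field name -> type).
    schema: dict[str, str] = field(default_factory=dict)


def discover(records: list[MetadataRecord], keyword: str) -> list[MetadataRecord]:
    """Discovery: select records whose keywords include the search term."""
    return [r for r in records if keyword in r.keywords]


def can_connect(software_inputs: dict[str, str], record: MetadataRecord) -> bool:
    """Schema level: every input the software expects is declared with the same type."""
    return all(record.schema.get(name) == typ for name, typ in software_inputs.items())


records = [
    MetadataRecord(
        pid="example-pid:0001",
        title="Tide gauge levels 2012",
        keywords=["sea level", "storm surge"],
        relations={"project": ["coastal-hazards"]},
        provenance=["derived from raw gauge logs"],
        rights="CC-BY-4.0",
        schema={"time": "datetime", "level_m": "float"},
    )
]
hits = discover(records, "storm surge")
print([r.pid for r in hits], can_connect({"time": "datetime", "level_m": "float"}, hits[0]))
# prints ['example-pid:0001'] True
```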