NIST BIG DATA WG Reference Architecture Subgroup Agenda for the Subgroup Call Co-chairs: Orit Levin...

NIST BIG DATA WG Reference Architecture Subgroup Agenda for the Subgroup Call Co-chairs: Orit Levin (Microsoft) James Ketner (AT&T) Don Krapohl (Augmented Intelligence) September 5, 2013


NIST BIG DATA WG
Reference Architecture Subgroup

Agenda for the Subgroup Call

Co-chairs:
Orit Levin (Microsoft)
James Ketner (AT&T)
Don Krapohl (Augmented Intelligence)

September 5, 2013


NIST Big Data WG / Ref Arch Subgroup 2

Agenda

• Synchronization with other subgroups
• Deliverable #1
• Deliverable #2
• Open Issues to be resolved TODAY

09/04/2013


NIST Big Data WG / Ref Arch Subgroup 3

Deliverable #1: The White Paper (outline posted as M0151)

• Status: no real progress so far
• The situation will be discussed tomorrow, Thu, during the weekly subgroup call
• Call for volunteers to describe different approaches
  • Especially for the authors of proposed architectures


NIST Big Data WG / Ref Arch Subgroup 4

Deliverable #2: The Common RA

• Great progress has been made
  • The two proposals under discussion are very close
  • Both are based on previous contributions and the work of all subgroups
• The main open issue is the role of the “Data Sources”
  • We need to resolve this issue before we can move forward
• Today
  • Clarify the process for making the decision / reaching consensus
  • Make the decision
• The table of additional open issues is provided
  • Less critical, but they need to be resolved ASAP
  • Close as many open issues as possible today


NIST Big Data WG / Ref Arch Subgroup 5

Open Issue: The role of Data Sources in the NIST BD Reference Architecture

• Proposal I:
  • “Data Manager” uses the “Data Service Abstraction” provided by the Transformation to drive the system
  • Data Sources are passive and therefore are included in the Data Manager
• Proposal II:
  • “Data Producer” abstracts and introduces the data sources into the system by exposing the “Data Service Abstraction” to be used by other components
  • “System Manager” is a new fifth main component that drives the system (i.e., assumes part of the Data Manager functionality per above)


NIST Big Data WG / Ref Arch Subgroup 6

M0189v.1 (+ most recent feedback) M0126v.4


NIST Big Data WG / Ref Arch Subgroup 7

Data Manager

• Is responsible for processing given data using certain methodologies and placing the processed results into certain formats.
• Main responsibility is to provide data source description (location of data which can be centralized or distributed, data at rest or data in motion, file types and attributes, etc.) to the TF and request one or more data services for the given data.
• Data services can include
  • (a) collecting data from one or more data sources,
  • (b) deciding what curation process should be performed,
  • (c) determining how data should be analyzed,
  • (d) choosing what methods and formats the processed results should be stored as, and
  • (e) deciding how and where processed results should be pushed to (end users or systems, if any)
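
The Data Manager role described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of either proposal: the class names, the fields of `DataSourceDescription`, the `request_services` method, and the example location string are all hypothetical, and `TransformationStub` merely stands in for the TF.

```python
from dataclasses import dataclass
from enum import Enum


class DataService(Enum):
    """The five data services (a) to (e) listed on the slide."""
    COLLECT = "collect"
    CURATE = "curate"
    ANALYZE = "analyze"
    STORE = "store"
    DELIVER = "deliver"


@dataclass(frozen=True)
class DataSourceDescription:
    """Hypothetical shape for a data source description."""
    location: str          # where the data lives
    distributed: bool      # centralized (False) or distributed (True)
    in_motion: bool        # data at rest (False) or in motion (True)
    file_types: tuple = ()


class TransformationStub:
    """Hypothetical stand-in for the TF; it only records what was asked."""

    def __init__(self):
        self.requests = []

    def request_services(self, source, services):
        self.requests.append((source, tuple(services)))
        # Return one handle per requested service (illustrative format).
        return [f"{s.value}:{source.location}" for s in services]


class DataManager:
    """Drives the system by describing sources and requesting services."""

    def __init__(self, transformation):
        self.transformation = transformation

    def drive(self, source, services):
        return self.transformation.request_services(source, services)


if __name__ == "__main__":
    tf = TransformationStub()
    manager = DataManager(tf)
    source = DataSourceDescription("hdfs://example/logs", True, False, ("csv",))
    print(manager.drive(source, [DataService.COLLECT, DataService.ANALYZE]))
```

In this sketch the data source is passive, as Proposal I assumes: it is only a description handed to the transformation, and the Data Manager initiates every request.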


Proposal I


NIST Big Data WG / Ref Arch Subgroup 8

System Manager (or Vertical Application)

• Responsible for the supply chain from a variety of
  • Data Producers
  • Data Capabilities Providers
• Defines and integrates the required data transformation components into the vertical system
• Uses service abstractions both internal and external to the system
• Scope depends on the Vertical’s business environment:
  • Tightly-coupled Enterprise: System Manager is a central functional entity
  • Loosely-coupled Vertical: every independent stakeholder implements its own System Manager

Data Producer

• Introduces new information feeds into the big data system for discovery, access and use by the BD system:
  • Creates the metadata describing the information source(s), usage policies, and other relevant attributes
  • Using the metadata, publishes the availability of the information and the means to access it through a well-defined interface.
  • Using various technology means, makes the information accessible by the BD system components. These technology means allow subscription to events, listening to data feeds, querying for specific data properties or content, and the ability to upload and host software tools to process the data in situ.
• Note: New feeds are distinct from the data already in use by the system and residing in the various system repositories (including memory, databases, etc.), although similar technologies can be used to access both.
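
As a rough illustration of the Data Producer responsibilities listed above (create metadata, publish availability, allow subscription and querying), the following Python sketch may help. Everything in it is an assumption made for the example: the `FeedMetadata` fields, the method names, the in-memory storage, and the sample feed are hypothetical and are not defined by the proposal. Uploading tools to process data in situ is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class FeedMetadata:
    """Hypothetical metadata describing one information feed."""
    name: str
    usage_policy: str
    access_endpoint: str   # the "means to access" the feed


class DataProducer:
    """Publishes feeds and exposes subscribe/query access to them."""

    def __init__(self):
        self._catalog: Dict[str, FeedMetadata] = {}
        self._records: Dict[str, List[dict]] = {}
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = {}

    def publish(self, meta: FeedMetadata) -> None:
        # Announce that a feed exists and how it can be reached.
        self._catalog[meta.name] = meta
        self._records.setdefault(meta.name, [])
        self._subscribers.setdefault(meta.name, [])

    def discover(self) -> List[FeedMetadata]:
        # Let other components find the published feeds.
        return list(self._catalog.values())

    def subscribe(self, feed: str, callback: Callable[[dict], None]) -> None:
        self._subscribers[feed].append(callback)

    def emit(self, feed: str, record: dict) -> None:
        # Store the new record and notify every subscriber of the feed.
        self._records[feed].append(record)
        for callback in self._subscribers[feed]:
            callback(record)

    def query(self, feed: str, predicate: Callable[[dict], bool]) -> List[dict]:
        # Query for records with specific properties or content.
        return [r for r in self._records[feed] if predicate(r)]


if __name__ == "__main__":
    producer = DataProducer()
    producer.publish(FeedMetadata("sensors", "internal-use", "memory://sensors"))
    producer.subscribe("sensors", print)
    producer.emit("sensors", {"id": 1, "temp": 20})
    producer.emit("sensors", {"id": 2, "temp": 35})
    print(producer.query("sensors", lambda r: r["temp"] > 30))
```

Unlike the passive data source of Proposal I, the producer here is an active component: other parts of the system reach data only through the interface it exposes.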

Proposal II


NIST Big Data WG / Ref Arch Subgroup 9

Open Issues

Abstract: Data Sources role
Alternative Approaches:
  1. Data Sources are included in “Data Manager”, which uses the “services” provided by the “transformation” to drive the system.
  2. Data Sources are a stand-alone Level 1 component, which provides “data services” to other RA components.
Resolution: TBD
Notes: Blocking the progress

Abstract: Terminology and Taxonomy used for the Level 1 and Level 2 RA blocks
Alternative Approaches:
  1. Collector, Manager, Producer, Consumer, etc.
  2. Collection, Management, Sources, Use, etc.
Resolution: TBD
Notes: Relates to the “actors” vs. “roles” vs. “activities” open issue in Def&Tax subgroup

Abstract: Granularity of interfaces shown in terms of data flow and tools flows
Alternative Approaches:
  1. Shown together by a single arrow
  2. Shown separately
Resolution: TBD

Abstract: Placement of Management and “Security & Privacy” blocks
Alternative Approaches:
  1. Inside the “Management Capabilities” block
  2. Each block (Capabilities, Security&Privacy, Management) is stand alone
Resolution: TBD

Abstract: “Management” block resolution
Alternative Approaches:
  1. Management only, no additional details on the figure
  2. Management block contains System Management and Lifecycle Management sub-blocks
  3. System Management and Lifecycle Management blocks are stand alone
Resolution: TBD

Abstract: “Capabilities” sub-block vocabulary
Alternative Approaches: TBD: Needs targeted input and explicit discussion from the Roadmap subgroup
Resolution: TBD
Notes: Pending sync with the Roadmap subgroup
