Industrial-Driven Big Data as a Self-Service Solution
Dusan Jakovetic University of Novi Sad, Faculty of Sciences, Serbia Virtual BenchLearning Webinar July 8, 2020 This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 780787 Industrial-Driven Big Data as a Self-Service Solution

Page 1

Dusan Jakovetic

University of Novi Sad, Faculty of Sciences, Serbia

Virtual BenchLearning Webinar

July 8, 2020

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 780787

Industrial-Driven Big Data as a Self-Service Solution

Page 2

Outline

• Brief introduction to I-BiDaaS
• I-BiDaaS pipeline, architecture & technologies
• I-BiDaaS & BDVA reference model
• Benchmarking landscape and opportunities @ I-BiDaaS

Page 3

Identity card

http://www.ibidaas.eu/
@Ibidaas
https://www.linkedin.com/in/i-bidaas/

Page 4

Consortium

1. FOUNDATION FOR RESEARCH AND TECHNOLOGY HELLAS (FORTH)

2. BARCELONA SUPERCOMPUTING CENTER - CENTRO NACIONAL DE SUPERCOMPUTACION (BSC)

3. IBM ISRAEL - SCIENCE AND TECHNOLOGY LTD (IBM)

4. CENTRO RICERCHE FIAT SCPA (CRF)

5. SOFTWARE AG (SAG)

6. CAIXABANK, S.A (CAIXA)

7. THE UNIVERSITY OF MANCHESTER (UNIMAN)

8. ECOLE NATIONALE DES PONTS ET CHAUSSEES (ENPC)

9. ATOS SPAIN SA (ATOS)

10. AEGIS IT RESEARCH LTD (AEGIS)

11. INFORMATION TECHNOLOGY FOR MARKET LEADERSHIP (ITML)

12. UNIVERSITY OF NOVI SAD FACULTY OF SCIENCES SERBIA (UNSPMF)

13. TELEFONICA INVESTIGACION Y DESARROLLO SA (TID)


Page 5

Key messages

• A complete and safe environment for methodological big data experimentation
• Tools and services to increase the quality of data analytics
• A Big Data as a Self-Service solution that helps in breaking silos and boosts the EU's data-driven economy
• Tools and services for fast ingestion and consolidation of both realistic and fabricated data
• Tools and services for the management of heterogeneous infrastructures
• Increases impact in the research community and contributes to industrial innovation capacity

Page 6

I-BiDaaS pipeline

Users
• Expert mode
• Self-service mode
• Co-develop mode

Data
• Import your data
• Fabricate Data
• Tokenize data

Analyze your Data
• Stream & Batch Analytics
• Expert: Upload your code
• Self-service: Select an algorithm from the pool
• Co-develop: custom end-to-end application

Results
• Visualize the results
• Share models

Do it yourself, in a flexible manner

Benefits of using I-BiDaaS
• Break data silos
• Safe environment
• Interact with Big Data technologies
• Increase speed of data analysis
• Cope with the rate of data asset growth
• Intra- and inter-domain data-flow
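The "Tokenize data" step of the pipeline replaces sensitive values with stand-in tokens before analysis. The slides do not say which scheme I-BiDaaS uses, so the following is only a minimal sketch that assumes a keyed hash (HMAC-SHA256). The key, the field names, and the helpers `tokenize_value` and `tokenize_record` are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret key (an assumption); it would be stored apart from the data.
SECRET_KEY = b"example-key-not-for-production"


def tokenize_value(value: str, key: bytes = SECRET_KEY) -> str:
    """Map a sensitive value to a deterministic token using a keyed hash."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened only to keep the example readable


def tokenize_record(record: dict, sensitive_fields: set) -> dict:
    """Return a copy of the record with the listed fields tokenized."""
    return {
        field: tokenize_value(str(value)) if field in sensitive_fields else value
        for field, value in record.items()
    }


# Illustrative record; the field names are not taken from the slides.
record = {"ip_address": "192.0.2.10", "amount": 125.0}
tokenized = tokenize_record(record, {"ip_address"})
```

Because the mapping is deterministic, equal inputs produce equal tokens, so relationships between records can still be analyzed without exposing the original values.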

Page 7

I-BiDaaS pipeline

[Pipeline diagram repeated from Page 6, with the callout: Flexible solution]

Page 8

I-BiDaaS pipeline

[Pipeline diagram repeated from Page 6, with the callout: Data sharing & breaking silos]


Page 11

The I-BiDaaS solution: Architecture & technologies

[Architecture diagram; labels recovered from the figure]

Application layer
• User Interface (AEGIS): Pre-defined Queries; SQL-like interface; Domain Language; Programming API
• Advanced Visualisation (AEGIS+SAG)
• Programming Interface / Sequential Programming
• Advanced IT services for Big Data processing tasks; Open source pool of ML algorithms

Distributed large scale layer
• Batch Processing: Advanced ML (UNSPMF); COMPSs Programming Model (BSC); COMPSs Runtime (BSC); Query Partitioning; Hecuba tools (BSC)
• Streaming Analytics: GPU-accelerated Analytics (FORTH); Apama Complex Event Processing (SAG)

Infrastructure layer: Private cloud; Commodity cluster; GPUs
• Resource management and orchestration (ATOS)

Data ingestion; Universal Messaging (SAG); Test Data Fabrication (IBM); Data Fabrication Platform (IBM); Meta-data; Data description; Refined specifications for data fabrication

Medium to long term business decisions; Short term decisions, real time alerts; Model structure improvements; Learned patterns, correlations

Page 12

[Architecture diagram repeated from Page 11, with the callout below]

WP2: Data, user interface, visualization

Technologies:
• IBM TDF
• SAG UM
• AEGIS AVT

http://ibidaas.eu/tools

Page 13

[Architecture diagram repeated from Page 11, with the callout below]

WP3: Batch analytics

Technologies:
• BSC COMPSs
• BSC Hecuba
• BSC Qbeast
• Advanced ML (UNSPMF)

http://ibidaas.eu/tools
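The deck describes COMPSs as a "Sequential programming model for distributed architectures". The general idea, code that reads sequentially while independent calls run in parallel, can be illustrated with standard-library futures. This is not the COMPSs API: the thread pool is only a stand-in for a distributed runtime, and the functions `partial_sum` and `chunks` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    """An independent unit of work (a 'task' in this sketch)."""
    return sum(chunk)


def chunks(data, size):
    """Split data into consecutive blocks of at most `size` items."""
    for i in range(0, len(data), size):
        yield data[i:i + size]


data = list(range(1_000))

# The loop reads like sequential code; each submit() may run in parallel.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(partial_sum, c) for c in chunks(data, 250)]
    # result() waits for each task, so the final sum sees all partial results.
    total = sum(f.result() for f in futures)
```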

Page 14

[Architecture diagram repeated from Page 11, with the callout below]

WP4: Streaming analytics

Technologies:
• SAG Apama CEP
• FORTH GPU-accel. analytics

http://ibidaas.eu/tools

Page 15

[Architecture diagram repeated from Page 11, with the callout below]

WP5: Resource mgmt & integration

Technologies:
• ATOS Resource mgmt
• ITML integration services

http://ibidaas.eu/tools

Page 16

BDVA Reference model

BDV SRIA: European Big Data Value Strategic Research and Innovation Agenda

Page 17

BDVA reference model horizontal concerns & I-BiDaaS

BDV reference model horizontal concern | I-BiDaaS module or platform as a whole
Data visualization and user interaction | Advanced visualization module (AEGIS+SAG); User interface (AEGIS)
Data analytics | Batch processing module (UNSPMF+BSC); Streaming analytics module (SAG+FORTH)
Data processing architectures (expected advances according to BDVA SRIA) | I-BiDaaS platform
Data protection | FORTH commodity cluster privacy preservation through commodity hardware (Intel SGX); TDF (IBM) for generation of realistic synthetic data when real data cannot be uploaded to cloud or similar systems
Data management | COMPSs runtime (BSC); ATOS resource management and orchestration module
The Cloud and HPC (efficient usage of Cloud) | ATOS resource management and orchestration module


Page 19

Benchmarking: Technology level

I-BiDaaS partner | Technology name | Big Data pipeline element | Current benchmarks
FORTH | GPU accelerator technology | Data pre-processing, Streaming Analytics | Custom benchmark (throughput, latency)
BSC | COMPSs | Sequential programming model for distributed architectures | Applications (Own use cases)
BSC | Hecuba | Data management framework with easy interface | Applications (Own use cases)
BSC | Qbeast | Multidimensional indexing and storage | TPC-H
IBM | Test Data Fabrication | Synthetic test data fabrication | Several open source + commercial products (e.g., Grid tools of CA) / No known benchmarks yet
SAG | Apama Streaming Analytics Platform | Streaming Analytics | Custom benchmark (throughput)
SAG | Universal Messaging | Message Broker | Custom benchmark (throughput)
SAG | WebMethods Integration Platform | Integration | N/A
SAG | MashZone | Visualization | N/A
AEGIS | Advanced visualization and monitoring | Visualization and interface | N/A
UNSPMF | Pool of ML algorithms in COMPSs/Python | Batch analytics | Respective MPI implementation; Sklearn
ATOS | Resource management and orchestration module | Resource management | N/A
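Several entries above list a "Custom benchmark (throughput, latency)". The slides do not describe how those benchmarks are built, so the following is only a minimal sketch of how the two metrics could be measured for a per-event processing function. The `benchmark` helper and the workload are hypothetical.

```python
import statistics
import time


def benchmark(process, events):
    """Run `process` on each event; report throughput and per-event latency.

    Assumes `events` contains at least one item.
    """
    latencies = []
    start = time.perf_counter()
    for event in events:
        t0 = time.perf_counter()
        process(event)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return {
        "events": len(latencies),
        # Throughput: events completed per second of wall-clock time.
        "throughput_events_per_s": (
            len(latencies) / elapsed if elapsed > 0 else float("inf")
        ),
        # Latency: time spent processing a single event.
        "mean_latency_s": statistics.mean(latencies),
        "max_latency_s": max(latencies),
    }


# Hypothetical workload: a trivial computation per event.
result = benchmark(lambda e: e * e, range(10_000))
```

Reporting both numbers matters because a system can sustain high throughput while individual events still see long delays.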

Page 20

Benchmarking: Business, data & analytics level

Sector | Business Objectives | Data Sets | Data Size | Processing Type | Type of Analysis
Telecoms | improve and optimize current operations | Anonymized mobility data (structured); Anonymized call center data (unstructured) | TB | batch & streaming | predictive; descriptive / diagnostic
Finance | improve decision making; improve efficiency of Big Data solutions | Tokenized online banking control data (structured); Tokenized bank transfer data (structured); Tokenized IP address data (structured) | PB | batch; batch & streaming | descriptive / diagnostic
Manufacturing | improve and optimise current operations; improve the quality of the process and product | Anonymized SCADA/MES data (structured); Anonymized Aluminum Die-casting (structured) | GB | batch; batch & streaming | predictive; diagnostic

Page 21

Benchmarking: Business level

I-BiDaaS Partner | Use Cases | Most relevant business KPIs
TID | Accurate location prediction with high traffic and visibility; Optimization of placement of telecommunication equipment; Quality of service in Call Centers | Acquisition of insights on the dynamics of cellular sectors; Processing costs (cost reduction); Customer satisfaction
CAIXA | Enhanced control on online banking; Advanced analysis of bank transfer payment in financial terminal; Analysis of relationships through IP addresses | Cost reduction; Data accessibility; Time efficiency; End-to-end execution time (from data request to data provision)
CRF | Production process of aluminium die-casting; Maintenance and monitoring of production assets | 3 Product quality levels (High, Medium, Low); Overall Equipment Effectiveness (OEE); Maintenance cost; Cost reduction
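Overall Equipment Effectiveness (OEE) appears among the CRF KPIs. The slides do not state how CRF computes it. The sketch below assumes the commonly used definition, OEE = availability × performance × quality. The function name, parameter names, and shift figures are illustrative, not project data.

```python
def oee(planned_time_min, run_time_min, ideal_cycle_time_min,
        total_count, good_count):
    """Compute OEE as availability x performance x quality (assumed definition)."""
    # Availability: share of planned production time the equipment actually ran.
    availability = run_time_min / planned_time_min
    # Performance: actual output relative to the ideal output for the run time.
    performance = (ideal_cycle_time_min * total_count) / run_time_min
    # Quality: share of produced parts that are good.
    quality = good_count / total_count
    return availability * performance * quality


# Illustrative shift: 480 min planned, 400 min running,
# 1 min ideal cycle time, 360 parts produced, 342 of them good.
value = oee(480, 400, 1.0, 360, 342)  # 0.7125, up to floating-point rounding
```

With these figures availability is about 0.833, performance is 0.9 and quality is 0.95, which gives an OEE of roughly 71%.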