BigDataEurope @BDVA Summit2016 2: Societal Pilots

BIG DATA EUROPE AND THE 7 SOCIETAL PILOTS

BDVA Summit 2016, Valencia, 1 December 2016


Talk outline

• The BigDataEurope Project & Mission [2 slides]
• The Big Data Integrator (BDI) platform [3 slides]
• 7 Pilots for the 7 Societal Challenge Domains
o Overview
o SC4 (Transport: Traffic Conditions Estimation)
o SC7 (Security: Event Detection) [DEMO]

6 Dec 2016, www.big-data-europe.eu

Supporting the Societal Domains with Big Data Technology

BigDataEurope Project


BigDataEurope Action

• EC Horizon 2020 Coordination & Support Action
o ~€5 million, 2015-2017
• Lower the barrier for using Big Data technologies
o Setting up & deploying use-case workflows, lack of expertise
• Show the societal value of Big Data
o Across the 7 H2020 societal challenges
o Establish data value chains across domains & organisations


Data Value Chain Evolution

[Figure: the data value chain plotted against TIME. Value chain steps: Extraction, Curation, Quality, Linking, Integration, Publication, Visualization, Analysis. Domains: Health, Transport, Security, Food, Societies, Climate, Energy. Other labels: Data Repositories; Linked Open Data; Proprietary, ‘locked-in’ solutions; OS Solutions, Big Data Stacks.]

A flexible, generic platform for (Big) Data Value Chain Deployment

Big Data Integrator


• Must be considered at: data acquisition, data processing and data display level

• A need to find a solution to accommodate all 3 levels

• It is an important concern to most SCs

• Common feeling: “a better integration solution for a wider variety of data leads to better statistics”

• Most help in this direction is needed by SC1 and SC5; it remains an important aspect for all SCs

• Decisions depend on statistical results, which are only as good as the quality of the data used

[Chart axis: SC1, SC2, SC3, SC4, SC5, SC6, SC7]

Societal Perception of the 4 V’s

Platform Requirements

Big Data Integrator: Architecture Key points

o Stacks Open Source solutions (Free)

o Dockerization

o Facilitates integration and deployment

o Plug-and-play BD Platform

o Cloud-deployment ready

Key BDE additions

o Support layer: integrated UI

o Semantification layer

Big Data Integrator: In-Use

Big Data Integrator: https://github.com/big-data-europe

Wiki: extensive documentation, information on supported components, instructions, etc.


Demonstrating the Societal Value through 7 Pilot ‘Real-world’ use-cases

1. Overview

BigDataEurope Pilots


Pilots: Overview

SC1: Health & Pharm.

SC2: Food & Agr.

SC3: Energy

SC4: Transport


SC5: Climate

SC6: Social Sciences

SC7: Security

7 Pilots

◎ BDI Platform Instantiations
o Allow end-users to easily deploy functionality in their own system environment
o Modularized Docker approach - easier to replace components
o Reduces effort to keep 3rd-party software updated & integrated

◎ 7 Societal Challenge Pilots
o Aligned with the 7 European Commission H2020 Societal Challenges
o Real-world use-cases (Data, Objectives, Solutions)
o Some pilots have different data & objectives but a similar solution


SC1: Pharmacology research


Life Sciences & Health

• Query a large number of datasets, some large

• Existing elaborate ingestion and homogenization by OpenPHACTS

• Extensive toolset developed by OPF and others

Objective: Large-scale heterogeneous pharma-research data linking & integration

SC1: Architecture & Components


• Replicate Open PHACTS functionality on the BDE infrastructure using OS solutions
• Based on Virtuoso, proprietary distributed database
• Apply to other domains (e.g. Agriculture)
• Porting to BDI gives flexibility and enables new functionalities
• Logging & system health monitoring

SC2: Viticulture resources


Food and Agriculture

Objective: Automate publication ingestion and thematic classification

• AgInfra is a major infrastructure for agriculture researchers, serving cross-linked bibliography, data, and processing services


SC2: Architecture & Components

• BDI deployed as an external infrastructure for processing text (viticulture publications)

• Storing and processing text at a larger scale than AgInfra can currently manage

SC3: Predictive maintenance


Energy

• Wind turbine monitoring applies computational models to sensor data streams

• Models are re-parameterized weekly using the week’s data from multiple turbines

Objective: Real-time turbine monitoring stream processing and analytics


• Existing in-house non-scalable solution for model parameterization
• Reliable Fortran software for data analysis
• Efficient, but not scalable to data volume

• Developing a BDI orchestrator
• Re-uses existing software unmodified
• Makes it easy to apply in parallel to many datasets and manage the outputs
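The orchestrator idea (run the existing program unmodified, in parallel, over many datasets, and collect the outputs) might be sketched as follows. This is a minimal illustration, not the pilot's implementation: the function names, the dataset file names and the stand-in command are all hypothetical, and a small Python one-liner takes the place of the Fortran executable so the sketch runs anywhere.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_analysis(command, dataset):
    """Run the unmodified external program on one dataset and capture its output."""
    result = subprocess.run(
        command + [dataset], capture_output=True, text=True, check=True
    )
    return dataset, result.stdout.strip()


def orchestrate(command, datasets, max_workers=4):
    """Apply the same external program to many datasets in parallel."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(lambda d: run_analysis(command, d), datasets))


# Hypothetical stand-in for the existing analysis binary: echoes its argument.
STAND_IN = [sys.executable, "-c", "import sys; print('fitted ' + sys.argv[1])"]

if __name__ == "__main__":
    outputs = orchestrate(STAND_IN, ["turbine_01.dat", "turbine_02.dat"])
    for name, out in sorted(outputs.items()):
        print(name, "->", out)
```

Threads are enough here because the heavy work happens in the child processes; the Python side only waits for them and gathers their output.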

SC3: Architecture & Components

SC4: Traffic conditions estimation


Transport

• Combines:
• Traffic modelling from historical data
• Current measurements from a taxi fleet of 1200 vehicles

Objective: Estimation of real-time traffic conditions in Thessaloniki


• New Flink implementations of map matching and traffic prediction algorithms

• BDI provides access to varied data sources
• PostGIS database with city map
• ElasticSearch database of historical data
• Kafka stream of real-time data

SC4: Architecture & Components

SC5: Climate modelling


Climate

• Preparing modelling experiments
• Slicing, transforming, combining datasets
• Submission and retrieval from modelling infrastructure
• Discovering and re-using previously computed derivatives
• Lineage annotation: derivatives computed from datasets and model parameters
• Finding appropriate past runs avoids repeating weeks-long modelling runs
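One way to picture lineage-based re-use: derive a deterministic key from the input datasets and model parameters, and look that key up before launching a new run. The sketch below is an assumption-laden illustration, not the pilot's metadata model: `lineage_key`, `DerivativeCatalog`, the dataset identifiers and the parameter names are hypothetical, and it assumes the order of the input datasets does not matter.

```python
import hashlib
import json


def lineage_key(dataset_ids, model_params):
    """Deterministic key: the same datasets and parameters give the same key."""
    payload = json.dumps(
        {"datasets": sorted(dataset_ids), "params": model_params}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class DerivativeCatalog:
    """In-memory stand-in for a metadata store of previously computed derivatives."""

    def __init__(self):
        self._runs = {}

    def register(self, dataset_ids, model_params, output_location):
        self._runs[lineage_key(dataset_ids, model_params)] = output_location

    def find(self, dataset_ids, model_params):
        """Return the stored output location, or None if no matching run exists."""
        return self._runs.get(lineage_key(dataset_ids, model_params))


if __name__ == "__main__":
    catalog = DerivativeCatalog()
    catalog.register(["d2", "d1"], {"resolution": 0.5}, "runs/001")
    print(catalog.find(["d1", "d2"], {"resolution": 0.5}))
```

A lookup that returns a location means the long modelling run can be skipped; `None` means the experiment has to be submitted.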

Objective: Supporting data-intensive climate research

• BDI offers:
• Hive for managing data in a way that can be retrieved and manipulated, rather than file blocks
• Cassandra stores structured and textual metadata for searching headers and lineage

• Existing infrastructure; stable, reliable software for parallel computation of models
• BDI is deployed as an external infrastructure for preparing and managing datasets

SC5: Architecture & Components

SC6: Municipality budgets


Social Sciences

• Ingestion of budget and budget execution data

• Multiple municipalities in varied formats and data models

Objective: Homogenized Budgetary data made available for analysis and comparison


• BDI deployed as ingestion and storage infrastructure for external tools
• Homogenizes variety of data (JSON, CSV, XML, etc.)

• Exposes data as SPARQL endpoint serving homogenized data

• Existing analytics and visualization tools
• Use SPARQL queries to retrieve only the relevant slices of the overall data
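Homogenization of JSON, CSV and XML inputs into one data model might look roughly like the sketch below. Everything specific here is an assumption for illustration: the field names (`municipality`, `year`, `amount`), the XML layout and the function names are hypothetical, and the `select` helper is only a plain-Python stand-in for the slice retrieval that the pilot performs with SPARQL queries.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def normalize(raw):
    """Map one raw record onto the common (hypothetical) budget model."""
    return {
        "municipality": str(raw["municipality"]),
        "year": int(raw["year"]),
        "amount": float(raw["amount"]),
    }


def parse(fmt, text):
    """Parse one source document in the given format into raw records."""
    if fmt == "json":
        return json.loads(text)
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(text)))
    if fmt == "xml":
        # Assumes <item municipality=".." year=".." amount=".."/> elements.
        return [dict(item.attrib) for item in ET.fromstring(text).iter("item")]
    raise ValueError(f"unsupported format: {fmt}")


def homogenize(sources):
    """sources: iterable of (format, text) pairs; returns records in one model."""
    return [normalize(raw) for fmt, text in sources for raw in parse(fmt, text)]


def select(records, **criteria):
    """Stand-in for a query that retrieves only the relevant slice of the data."""
    return [r for r in records if all(r[k] == v for k, v in criteria.items())]
```

Keeping parsing (per format) separate from normalization (one shared model) means a new input format only needs a new branch in `parse`.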

SC6: Architecture & Components

SC7: Change detection & verification


Secure Societies

• Events are extracted from text published by news agencies and on social networking sites

• Events are geo-located and relevant changes are detected by comparing current and previous satellite images
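Comparing a current image with a previous one can be illustrated with simple pixel differencing. The slides do not say which change detection algorithms the pilot uses, so this is a swapped-in, deliberately naive technique; the function names, the threshold and the pixel-radius check around an event location are all hypothetical.

```python
def detect_changes(previous, current, threshold=50):
    """Return (row, col) positions where two grayscale images differ by more than threshold."""
    if len(previous) != len(current) or any(
        len(p) != len(c) for p, c in zip(previous, current)
    ):
        raise ValueError("images must have the same dimensions")
    return [
        (r, c)
        for r, (prow, crow) in enumerate(zip(previous, current))
        for c, (p, q) in enumerate(zip(prow, crow))
        if abs(q - p) > threshold
    ]


def changed_near_event(changes, event_pixel, radius=1):
    """True if any changed pixel lies within `radius` pixels of the event location."""
    er, ec = event_pixel
    return any(abs(r - er) <= radius and abs(c - ec) <= radius for r, c in changes)


if __name__ == "__main__":
    before = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
    after = [[10, 10, 10], [10, 200, 10], [10, 10, 12]]
    changes = detect_changes(before, after)
    print(changes, changed_near_event(changes, (0, 0)))
```

An event extracted from text would first be geo-located and mapped to image coordinates; that step is outside this sketch.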

Objective: Detect and Verify Events based on Satellite Imagery, News and Social Media


Event Detection

Change Detection

• Re-implementation of change detection algorithms for Spark

• Parallel orchestrator for text analytics
• Re-uses existing software
• Scales to many input streams

• BDI provides:
• Cassandra for text content and metadata
• Strabon GIS store for detected change location
• Homogeneous access to both for analysis and visualization

SC7: Architecture & Components

Demonstrating the Societal Value through 7 Pilot ‘Real-world’ use-cases

2. In-depth look at the Transport Pilot

BigDataEurope Pilots


Transport Pilot: Architecture & Objectives

“A scalable, fault-tolerant and flexible platform based on open source frameworks that can process unbounded data sets and graphs.”

Message Broker: Kafka Cluster

L. Selmi - BDE - Tech. Workshop

Apache Kafka is a high-throughput distributed durable messaging system

Apache Kafka

Stream and Batch Processor: Flink Cluster


Apache Flink is an open source platform for distributed stream and batch data processing.

Apache Flink

Storage and Indexing: Elasticsearch Cluster


Elasticsearch is a distributed open source document database built on top of Apache Lucene

Map-Matching & Prediction: Rserve


R is a free software environment for statistical computing. It is used in the pilot to run the map-matching and the prediction algorithms.

The R Project

Transport Pilot: Architecture (High-level)


Transport Pilot: BDE Components in Docker Swarm


Transport Pilot: The BDE Platform Stack


Visualization


SC4 Pilot 1 can process real-time floating car data (FCD) for map-matching and simple classification of road segments (normal/congested)
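Map-matching followed by a normal/congested classification can be sketched in a few lines. The pilot runs its own algorithms in R and Flink; the version below is a simplified stand-in that snaps each GPS point to the nearest road segment and labels a segment by its mean observed speed. The segment geometry, the 20 km/h threshold and all function names are hypothetical.

```python
import math

# Hypothetical road segments: id -> ((x1, y1), (x2, y2)) in metres on a local grid.
SEGMENTS = {
    "seg_a": ((0.0, 0.0), (100.0, 0.0)),
    "seg_b": ((0.0, 50.0), (100.0, 50.0)),
}


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the line segment a-b."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter, clamped so the closest point stays on the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def match_to_segment(point, segments=SEGMENTS):
    """Nearest-segment map-matching: return the id of the closest segment."""
    return min(segments, key=lambda s: point_segment_distance(point, *segments[s]))


def classify_segments(records, congested_below_kmh=20.0):
    """records: iterable of (x, y, speed_kmh); returns segment id -> label."""
    speeds = {}
    for x, y, speed in records:
        speeds.setdefault(match_to_segment((x, y)), []).append(speed)
    return {
        seg: "congested" if sum(v) / len(v) < congested_below_kmh else "normal"
        for seg, v in speeds.items()
    }
```

Nearest-segment matching ignores heading and route continuity, so it is only a starting point; it is used here because it keeps the two steps easy to follow.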

Demonstrating the Societal Value through 7 Pilot ‘Real-world’ use-cases

3. Demonstration of the Security Pilot

BigDataEurope Pilots


Architecture for SC 7

Stack

Security Pilot in Practice

Demonstration


Free Workshops, Hangouts & Webinars

BigDataEurope Activities


2nd round of Societal Workshops


Transport: 22 September 2016, Brussels. Collocated with Big Data for Transport, Tisa workshop

Food & Agri: 30 September 2016, Brussels. Collocated with DG AGRI WP2018-20 stakeholder consultation

Energy: 4 October 2016, Brussels. Collocated with EC H2020 Info Day on “Smart Grids and Storage”

Climate: 11 October 2016, Brussels. Collocated with Melodies Project Event – Exploiting Open Data

Security: 18 October 2016, Brussels. Standalone workshop

Societies: 5 December 2016, Cologne. Collocated with EDDI16, 8th Annual European DDI User Conference

Health: 9 December 2016, Brussels. Standalone workshop

Other Activities

Fresh set (7) of Societal Workshops in 2017

Various SC-focussed and general hangouts, follow!
o Apache Flink & BDE (20 Oct) – available online
o BDVA & BDE Webinar planned early next year
o Keep track on BDE Website (Events)


WEB: www.big-data-europe.eu EMAIL: [email protected]

BIG DATA INTEGRATOR www.github.com/big-data-europe

PROJECT COORDINATION (Fraunhofer IAIS)
Prof. Sören Auer, auer © cs.uni-bonn · de
Dr. Simon Scerri, scerri © cs.uni-bonn · de
EIS Department/Group, Fraunhofer IAIS & CS Department Uni-Bonn, Bonn, Germany

Questions & Contacts


#BigDataEurope

leads the Fraunhofer Big Data Alliance