BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

BIG DATA EUROPE, H2020 CSA (2015-17). SC1 – HEALTH CHALLENGE WEBINAR: Integrating Big Data, Software & Communities for Addressing Europe’s Societal Challenges. April 4th 2017. Kiera McNeice, Ronald Siebes, Hajira Jabeen and Nick Lynch

Transcript of BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

Page 1: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

BIG DATA EUROPE

H2020 CSA (2015-17)

SC1 – HEALTH CHALLENGE WEBINAR

Integrating Big Data, Software & Communities for Addressing Europe’s Societal Challenges

April 4th 2017

Kiera McNeice, Ronald Siebes, Hajira Jabeen and Nick Lynch

Page 2: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

BigDataEurope

5 Apr 2017, www.big-data-europe.eu

The 7 Societal Challenges and their first pilots

Page 3: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

SC1: Life Sciences & Health

Page 6: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

SC2: Food & Agriculture

Page 7: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

SC2: Food & Agriculture

Partners:

FAO, the largest autonomous agency within the United Nations system and one of the main players in the agricultural information community.

Semantic Web Company (SWC) is a technology provider headquartered in Vienna (Austria). SWC supports organizations from all industrial sectors worldwide to improve their information management. Their core product extracts meaning from big data by making use of linked data technologies.

Agroknow is a company that captures, organizes and adds value to the rich information available in agricultural and food sciences, in order to make it universally accessible, useful and meaningful.

Big Data Focus area: Large-scale distributed agricultural data integration

Selected Key Data assets: INFOODS, AQUASTAT, Green Learning Network (GLN), Agricultural Bibliography Network (ABN), AgroVoc, AquaMaps, Fishbase

Page 8: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

SC2: Food & Agriculture

Pilot focus area:

Viticulture (from the Latin word for vine) is the science, production, and study of grapes. It deals with the series of events that occur in the vineyard.

Page 9: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

SC2: Food & Agriculture

Pilot 2: Support advanced crop data discovery, processing, combining and visualization from distributed and heterogeneous data repositories

Reasons:

Vine and wine sector: emerging market in the EU

Sustainability and biodiversity challenges: local varieties are being lost

Exploitation of new grapevine varieties and clones in terms of climate change adaptation

Quality and health status of viticultural products

Contribution to human health (antioxidants, prevention of heart diseases etc.)

Wide variety of heterogeneous (and big) data from various information sources

Page 10: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

SC3: Energy

Page 11: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

SC3: Energy

Partners:

A public entity supervised by the Ministry of Environment, Energy and Climate Change in Greece, founded in September 1987, active in the fields of Renewable Energy Sources (RES), Rational Use of Energy (RUE) and Energy Saving (ES).

NCSR "Demokritos", the largest multidisciplinary research centre of Greece, hosts significant scientific research, technological development and educational activities, coordinated by eight Institutes.

Big Data Focus area: Real-time turbine monitoring stream processing and analytics

Selected Key Data assets: European Energy Exchange Data, smart meter sensor data, gas/fuels market/price data, consumption statistics, stratigraphic model data (geology, geophysics)

Page 12: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

SC3: Energy

Pilot focus area: System monitoring in energy production units.

Page 13: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

SC3: Energy

Pilot 3: Operation, maintenance and production forecasting for wind turbines on real-time sensor data.

Reasons:

Current technology is not able to deal with the full amount of available valuable data

Economic benefit of predicting output and preventing damage (if the imminent failure of one part can be predicted, damage to other parts can be prevented)

Large continuous stream of sensor data, perfect to test our platform

Page 14: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

SC4: Transport

Page 15: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

SC4: Transport

Partners:

The Fraunhofer Society is a German research organization with 67 institutes spread throughout Germany, each focusing on different fields of applied science.

The Centre for Research and Technology-Hellas (CERTH), founded in 2000, is one of the leading research centres in Greece. CERTH includes the Hellenic Institute of Transport (HIT): Land, Sea and Air Transportation as well as Sustainable Mobility services.

ERTICO - ITS Europe is a partnership of around 100 companies and institutions involved in the production of Intelligent Transport Systems (ITS).

Big Data Focus area: Streaming sensor network & geo-spatial data integration

Selected Key Data assets: GTFS data, OSM/LinkedGeoData, MobilityMaps, Transport sensor data, ROSATTE Road safety attributes, European Road Data Infrastructure - EuroRoadS

Page 16: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

SC4: Transport

Pilot focus area: Info-mobility and traffic planning

Page 17: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

SC4: Transport

Pilot 4: Multisource data collection for the provision of accurate info-mobility and advanced transport planning service in Thessaloniki, Greece

Reasons:

Congestion is a major problem in Europe, especially in urban areas.

Utilizing real-time probe data for the provision of accurate info-mobility services and advanced transport planning leads to better decisions.

The use of mobility data coming from multiple sources presents significant challenges, especially due to the different nature of the datasets, both in content and in spatio-temporal terms, and because the data should be collected and processed in real time.

Page 18: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

SC5: Climate

Page 19: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

SC5: Climate

Partners:

A public entity supervised by the Ministry of Environment, Energy and Climate Change in Greece, founded in September 1987, active in the fields of Renewable Energy Sources (RES), Rational Use of Energy (RUE) and Energy Saving (ES).

NCSR "Demokritos", the largest multidisciplinary research centre of Greece, hosts significant scientific research, technological development and educational activities, coordinated by eight Institutes.

Big Data Focus area: Enormous simulation time. Extremely complicated computing model.

Selected Key Data assets: European Grid Infrastructure (EGI). Access to several data centres hosted at CNRS-Lyon, NCSR-D Athens, INFN-Milan, NIKhEF-Amsterdam.

Page 20: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

SC5: Climate

Pilot focus area: Supporting data-intensive climate research

Page 21: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

SC5: Climate

Pilot 5: Downscaling and retrieval process on (raw) climate data via user-defined parameters (e.g. geographical areas, time period, physical variables, computational grids, time steps)

Reasons:

The provision of climate model data satisfies an important objective: assessing the potential impacts of climate change on well-being for adaptation, prevention and mitigation measures, and supporting other policy-making decisions.

The awareness led to the availability of huge datasets

Downscaling is a computationally intensive process

Page 22: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

SC6: Social Sciences

Page 23: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

SC6: Social Sciences

Partners:

CESSDA provides large scale, integrated and sustainable data services to the social sciences. CESSDA is organised as a limited company under Norwegian law, owned and financed by the individual EU member states’ ministry of research or a delegated institution.

NCSR "Demokritos", the largest multidisciplinary research centre of Greece, hosts significant scientific research, technological development and educational activities, coordinated by eight Institutes.

Big Data Focus area: Statistical and research data linking & integration

Selected Key Data assets: Federated social sciences data catalogs, statistical data from public data portals and statistical offices (e.g. Eurostat, UNESCO, WorldBank)

Page 24: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

SC6: Social Sciences

Pilot focus area: Citizens' budget spending at municipal level

Page 25: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

SC6: Social Sciences

Pilot 6: Citizens' budget at municipal level

Reasons:

Budget: the most important document of public policy

Budget execution affects everyday lives

Citizens are more involved at city level

Having a platform that integrates heterogeneous budget data (many municipalities have their own data formats) and calculates infographics would benefit the citizens, the research community and policy makers

Page 26: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

SC7: Security

Page 27: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

SC7: Security

Partners:

The Centre supports the decision making of the European Union in the field of the Common Foreign and Security Policy (CFSP), by providing products and services resulting from the exploitation of relevant space assets and collateral data, including satellite imagery and aerial imagery, and related services.

NCSR "Demokritos", the largest multidisciplinary research centre of Greece, hosts significant scientific research, technological development and educational activities, coordinated by eight Institutes.

Page 28: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

SC7: Security

Big Data Focus area: Image data analysis

Selected Key Data assets: Earth Observation data (e.g. Very High Resolution Satellite Imagery acquired from commercial providers and governmental systems) and collateral data for supporting CFSP/CSDP missions and operations

Page 29: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

SC7: Security

Pilot focus area: Getting insight into man-made surface changes triggered by automatic detection, news, or social media information

Page 30: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

SC7: Security

Pilot 7: Ingestion of remote sensing images and social sensing data to detect and verify man-made changes on the Earth surface for security applications

Evacuation route planning

Monitoring of critical infrastructures

Border security

Reasons:

Satellite image data is HUGE and computationally intensive to compare

Smart ‘focus’ algorithms are needed to prioritize the analysis jobs
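
One simple way to picture the prioritization of analysis jobs mentioned on this slide is a priority queue over candidate image tiles. The sketch below assumes each job already carries a relevance score (for example derived from news or social media signals); the job names, the scores and the prioritize helper are hypothetical and are not the pilot's actual algorithm.

```python
import heapq


def prioritize(jobs):
    """Return job ids ordered from most to least urgent.

    jobs is an iterable of (job_id, score) pairs; a higher score means the
    area is assumed more likely to contain a relevant change.
    """
    # heapq is a min-heap, so the score is negated to pop the highest first.
    # The insertion index breaks ties in first-come order.
    heap = [(-score, index, job_id) for index, (job_id, score) in enumerate(jobs)]
    heapq.heapify(heap)
    ordered = []
    while heap:
        _, _, job_id = heapq.heappop(heap)
        ordered.append(job_id)
    return ordered


if __name__ == "__main__":
    # Hypothetical tiles and scores, for illustration only.
    queue = [("tile-a", 0.2), ("tile-b", 0.9), ("tile-c", 0.5), ("tile-d", 0.9)]
    print(prioritize(queue))
```

A heap keeps the next job cheap to retrieve even when new jobs keep arriving, which matters when comparing each image pair is expensive.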

Page 31: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

Big Data Europe Integrator Platform

Dr Hajira Jabeen, University of Bonn

SC1 Webinar

Page 32: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

Platform Goals

◎Opensource

◎Simple to get started with Big Data

◎Support a variety of use cases

◎Embrace emerging Big Data technologies

◎Simple integration with custom components

Page 33: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

Key actors

Page 34: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

Platform Architecture

Page 37: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

Platform Architecture

Support Layer: Init Daemon, GUIs, Monitor

App Layer: Traffic Forecast, Satellite Image Analysis, Real-time Stream Monitoring, ...

Platform Layer: Spark, Flink, Kafka, Semantic Layer (Ontario, SANSA, Semagrow), ...

Resource Management Layer (Swarm)

Hardware Layer: Premises, Cloud (AWS, GCE, MS Azure, …)

Data Layer: Hadoop, NOSQL Store, Cassandra, Elasticsearch, RDF Store, ...

Page 38: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

Supported Frameworks

Search/indexing: Apache Solr

Data acquisition: Apache Flume

Message passing: Apache Kafka

Data storage: Hue, Apache Cassandra, ScyllaDB, Apache Hive, Postgis

Data processing: Apache Spark, Apache Flink

Semantic Components: Strabon, Sextant, GeoTriples, Silk, SEMAGROW, LIMES, 4Store, OpenLink Virtuoso

Page 39: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

BDI Stack Lifecycle

Page 40: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

BDI Stack Lifecycle

Deploy BDE Platform/Stack to the Cluster

Page 41: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

BDI Stack Lifecycle

Stack/Cluster Monitor

Page 42: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

BDI Stack Lifecycle

Developing Custom Applications

Page 43: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

BDI Stack Lifecycle

Docker Images

Page 44: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

BDI Stack Lifecycle

BDI Stack (workflow) builder

Page 45: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

BDI Stack Lifecycle

Custom Components

*Init Daemon

*Integrator UI

Page 46: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

Platform development

◎ High level picture

o docker-compose.yml describes pipeline topology

◎ BDE provided components

o extend template image with your code

◎ New components

o build a Docker image for your component

o this is your own little Virtual Machine for your component

◎ Sharing

o publish topology as git repository

o publish new components on docker hub

Page 47: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

Actors

◎Cluster Setup

◎Developer

◎Packaging

◎Stack Composition / Integration

◎Deployment

◎Monitoring

Page 49: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

Development

◎Base Docker images

o Serve as a template for a (Big Data) technology

o Easily extendable with custom algorithm/data

◎Published components

o Image repositories on GitHub

o Automated builds on DockerHub

o Documentation on BDE Wiki

Page 50: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

Deploying a Big Data Stack

◎ Stack

o collection of communicating components

o to solve a specific problem

◎ Described in Docker Compose

o Component configuration

o Application topology
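
As a rough illustration of what "component configuration" and "application topology" mean on this slide, the sketch below models a stack as a Python dictionary shaped like a Compose file and derives its dependency edges. The service names, image names and the TOPIC variable are hypothetical and not taken from any BDE pilot; depends_on is the standard Compose key for declaring that one service relies on another.

```python
import json

# Hypothetical three-component stack; names and images are illustrative only.
stack = {
    "services": {
        "kafka": {
            "image": "example/kafka",
            "environment": {"TOPIC": "sensor-readings"},  # component configuration
        },
        "spark-master": {
            "image": "example/spark-master",
            "depends_on": ["kafka"],  # application topology
        },
        "spark-worker": {
            "image": "example/spark-worker",
            "depends_on": ["spark-master"],
        },
    }
}


def topology_edges(compose):
    """Return sorted (dependency, dependent) pairs declared via depends_on."""
    edges = []
    for name, service in compose["services"].items():
        for dependency in service.get("depends_on", []):
            edges.append((dependency, name))
    return sorted(edges)


if __name__ == "__main__":
    print(json.dumps(stack, indent=2))
    print(topology_edges(stack))
```

Keeping configuration and topology in one declarative description is what makes a stack shareable as a single file in a git repository.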

Page 51: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

Enhancing the Component

◎ Orchestrator required for initialization process (init_daemon)

o Components may depend on each other

o Components may require manual intervention

◎ User Interface Integration

o Standard Interfaces from components

o Combine and align the interfaces
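
The ordering problem that arises when components depend on each other can be sketched as a topological sort. This is only an illustration using Python's standard graphlib module (Python 3.9 or later), not the init daemon's actual implementation; the component names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def startup_order(dependencies):
    """Return components in an order where every dependency starts first.

    dependencies maps each component to the set of components that must be
    ready before it. A circular dependency raises graphlib.CycleError.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical pipeline: each component waits for the one before it.
CHAIN = {
    "hdfs-datanode": {"hdfs-namenode"},
    "spark-master": {"hdfs-datanode"},
    "spark-worker": {"spark-master"},
}

if __name__ == "__main__":
    print(startup_order(CHAIN))
    try:
        startup_order({"a": {"b"}, "b": {"a"}})
    except CycleError:
        print("circular dependency detected")
```

A static order like this does not cover steps that need manual intervention, which is one reason the slide calls for a dedicated orchestrator.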

Page 52: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

User Interfaces

◎Target: Facilitate use of the platform

o User Interface Adaption

◎Available interfaces

o Workflow UIs

❖Workflow Builder

❖Workflow Monitor

o Swarm UI

o Integrator UI

Page 53: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

BDE Workflow Builder

Page 54: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

BDE Workflow Monitor

Page 55: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

Swarm UI

Page 57: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

Integrator UI

Page 58: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

Beyond the state of the art ...

Smart Big Data

Increase the value of Big Data by adding meaning to it!

Page 59: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

Semantic Data Lake (Ontario)

◎Data Swamp

o Repository of data in its raw format

o Structured, semi-structured, unstructured

o Schema-less

◎Data Lake

o Add a Semantic layer on top of the source datasets

o The data is semantically lifted using existing ontology terms
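
A minimal sketch of semantic lifting, assuming a flat raw record and a hand-written mapping from field names to existing vocabulary terms (here rdfs:label and dcterms:identifier). The record, the subject IRI and the lift helper are hypothetical and do not reflect Ontario's API.

```python
# Mapping from raw field names to existing ontology terms (assumed for this sketch).
MAPPING = {
    "id": "http://purl.org/dc/terms/identifier",
    "label": "http://www.w3.org/2000/01/rdf-schema#label",
}


def lift(record, subject_iri, mapping):
    """Return (subject, predicate, object) triples for the mapped fields.

    Fields without a mapping are skipped: they stay in the raw source and
    are simply not exposed through the semantic layer.
    """
    triples = []
    for field, value in record.items():
        predicate = mapping.get(field)
        if predicate is not None:
            triples.append((subject_iri, predicate, value))
    return triples


if __name__ == "__main__":
    # Hypothetical raw record, as it might sit in a schema-less store.
    raw = {"id": "X1", "label": "example compound", "scratch": 1}
    for triple in lift(raw, "http://example.org/compound/X1", MAPPING):
        print(triple)
```

The source data is left untouched; only the mapping carries the meaning, so different raw formats can be lifted to the same terms.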

Page 61: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

SANSA Stack

Page 62: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

Thank you

https://github.com/big-data-europe

[email protected]

Page 64: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

BDE vs Hadoop distributions

Feature | Hortonworks | Cloudera | MapR | Bigtop | BDE
File System | HDFS | HDFS | NFS | HDFS | HDFS
Installation | Native | Native | Native | Native | Lightweight virtualization
Plug & play components (no rigid schema) | no | no | no | no | yes
High Availability | Single failure recovery (yarn) | Single failure recovery (yarn) | Self healing, mult. failure rec. | Single failure recovery (yarn) | Multiple failure recovery
Cost | Commercial | Commercial | Commercial | Free | Free
Scaling | Freemium | Freemium | Freemium | Free | Free
Addition of custom components | Not easy | No | No | No | Yes
Integration testing | yes | yes | yes | yes | --
Operating systems | Linux | Linux | Linux | Linux | All
Management tool | Ambari | Cloudera manager | MapR Control system | - | Docker swarm UI + Custom

Page 65: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

BDE vs Hadoop distributions

◎BDE is not built on top of existing distributions

◎Targets

o Communities

o Research institutions

◎Bridges scientists and open data

◎Multi Tier research efforts towards Smart Data

Page 66: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

Stian Soiland-Reyes, University of Manchester

Nick Lynch, CTO Open PHACTS Foundation

4 Apr 2017

Page 68: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

Summary

• Update on Docker and Open PHACTS

• Learnings & transition to AWS

• Next Steps & Future Releases

Page 75: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

Open PHACTS @dockerhub

https://hub.docker.com/r/openphacts/

Page 89: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

Open PHACTS Next Steps

• Data Refresh planned API 2.2:

–Phase 1: ChEMBL, WikiPathways, Uniprot + Chemistry Refreshed (RDF and linksets)

–Phases 2 & 3: Remaining data sources

–Build data refresh processes

• Wider Architecture Review

• Science and Open PHACTS Webinar

–Science and Open PHACTS: Workflow tools for Life Science Research

–https://register.gotowebinar.com/register/2550359383420450817

Page 90: BDE-SC1 Webinar: OpenPHACTS Re-engineered with Big Data Europe

Open PHACTS

• Custom Data Staging:

–Different licensing options to cover Annotated SureChEMBL for members/non members

• MicroServices?

–Part of Architecture review to discuss future services/API

–Interested in experiences of this

• Workflow

–BioExcel Workflow blocks in development

–See Bio.tools
