Dockerizing a multi-component Open Data app

Athens Docker Meetup, June 2016

Dimitris Negkas, Stergios Tsiafoulis


Description and Scope

LinkedEconomy (http://linkedeconomy.org/) is a publicly available web platform and linked data repository.

Its scope is to transform, curate, aggregate, interlink and publish economic data in machine-readable format, in order to enable:

citizens' awareness

research with unprecedented data

evidence-based policy

Data Sources

Sources currently used:

Transparency – DIAVGEIA

Central Electronic Registry of Public Procurement - E-Procurement

National Strategic Reference Framework (NSRF)

Central Market of Thessaloniki (CMT)

e-Prices

Fuel Prices

Municipality of Athens, Municipality of Thessaloniki

Government of Australia

Data growth

We use OpenLink Virtuoso to store nearly 1 billion triples from 15 different sources.

We host 27 datasets from 15 organizations in CKAN.

The data grow accordingly each month.

Data processing

Each data source is handled and processed separately, as its available data are not provided uniformly or in machine-readable format.

Diavgeia, “NSRF” and the Observatories for product and fuel prices provide a rich API that can easily be queried to obtain machine-readable data in JSON format.
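A minimal sketch of how a paginated JSON API of this kind could be harvested. The endpoint URL, the `page` and `size` parameters and the `items` key are assumptions made for illustration; the real Diavgeia, NSRF and price APIs define their own URLs and response shapes.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://api.example.org/decisions"


def build_page_url(base_url, page, size=100):
    """Return the URL of one result page (parameter names are assumptions)."""
    return f"{base_url}?{urlencode({'page': page, 'size': size})}"


def parse_page(payload):
    """Extract the list of records from one JSON page (assumed key: 'items')."""
    return json.loads(payload).get("items", [])


def harvest(base_url, fetch=None, max_pages=1000):
    """Yield records page by page until an empty page comes back."""
    # The default fetcher performs a real HTTP GET; tests can pass a stub instead.
    fetch = fetch or (lambda url: urlopen(url, timeout=30).read().decode("utf-8"))
    for page in range(max_pages):
        records = parse_page(fetch(build_page_url(base_url, page)))
        if not records:
            break
        yield from records
```

Passing the fetcher as a parameter keeps the paging logic testable without network access.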

In the cases of E-Procurement, “CMT” and “Municipalities of Athens and Thessaloniki” there is no API available. Thus, we have developed a software module, which gathers online information in an automated way, storing it in a machine-readable format.
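The talk does not show the harvesting module itself. As an illustration only, the sketch below turns an HTML table into machine-readable records using Python's standard `html.parser`; the sample markup and the column names are invented for the example.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return the table as a list of dicts keyed by the header row."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Invented sample page standing in for a source without an API.
SAMPLE = """
<table>
  <tr><th>product</th><th>price</th></tr>
  <tr><td>Tomatoes</td><td>1.20</td></tr>
  <tr><td>Oranges</td><td>0.85</td></tr>
</table>
"""
records = scrape_table(SAMPLE)
```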

General Architecture

Process model

Open economic data related to public budgeting, spending and prices are characterized by high volume, velocity, variety and veracity.

We have to build custom components under the common logic of transforming static data to linked open data streams.

Process model: Nucleus

The nucleus of our approach is semantic modelling, data enrichment and interconnections.

Data are stored in raw form (as harvested from the sources), in RDF and in JSON.
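To make the three storage forms concrete, here is a small sketch that takes one raw harvested record and derives an RDF (N-Triples) and a JSON representation from it. The `http://example.org/` namespace, the field names and the one-literal-per-field mapping are assumptions; the actual LinkedEconomy vocabulary is not shown in the slides.

```python
import json

BASE = "http://example.org/"  # hypothetical namespace, not the project's vocabulary


def escape_literal(value):
    """Escape a value for use inside a double-quoted N-Triples literal."""
    text = str(value)
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def to_ntriples(record, id_key="id"):
    """Emit one triple per field; assumes the id is safe to embed in an IRI."""
    subject = f"<{BASE}resource/{record[id_key]}>"
    lines = []
    for key, value in sorted(record.items()):
        if key == id_key:
            continue
        lines.append(f'{subject} <{BASE}vocab/{key}> "{escape_literal(value)}" .')
    return "\n".join(lines)


# Raw form, exactly as it might arrive from a source.
raw = '{"id": "d-001", "amount": 1500.5, "subject": "Road repair"}'
record = json.loads(raw)

rdf = to_ntriples(record)                     # RDF form (N-Triples)
as_json = json.dumps(record, sort_keys=True)  # JSON form
```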

Process model: Data distribution

Enriched data are distributed through five channels:

1. Data dumps (CKAN)

2. SPARQL queries

3. Web

4. Social media

5. Structured inputs to Business Intelligence (BI) systems

Additionally, data can be further analysed and exchanged with relevant platforms (e.g. SPARQL to R).
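As an example of the SPARQL channel, the sketch below builds a SPARQL protocol request and flattens the JSON result bindings into plain rows that an analysis tool can consume. The endpoint address is an assumption (a local Virtuoso on port 8890); the query is a generic placeholder, not one of the project's queries.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "http://localhost:8890/sparql"  # assumed local Virtuoso endpoint

QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
"""


def build_request(endpoint, query):
    """Build a GET request that asks for SPARQL JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def bindings_to_rows(results_json):
    """Flatten a SPARQL JSON result document into a list of dicts."""
    doc = json.loads(results_json)
    variables = doc["head"]["vars"]
    return [{v: b[v]["value"] for v in variables if v in b}
            for b in doc["results"]["bindings"]]


def run_query(endpoint, query):
    """Send the query and return the rows (needs a reachable endpoint)."""
    with urlopen(build_request(endpoint, query), timeout=30) as resp:
        return bindings_to_rows(resp.read().decode("utf-8"))
```

With a running endpoint, `run_query(ENDPOINT, QUERY)` would return up to ten rows, each a dict with the keys `s`, `p` and `o`.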

Process model: Validation and messenger

The validation component runs throughout the whole process in order to safeguard high data quality by detecting errors.

The messaging component works as an internal messaging and alert system for all components.
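The slides do not list the validation rules. The sketch below assumes three hypothetical ones (required fields, a non-negative numeric amount, an ISO date) to show the general shape: each record yields a list of errors, and the rejected records are what a messaging component could forward as alerts.

```python
from datetime import datetime

REQUIRED = ("id", "amount", "date")  # hypothetical required fields


def validate(record):
    """Return a list of error messages; an empty list means the record is valid."""
    errors = []
    for field in REQUIRED:
        if record.get(field) in (None, ""):
            errors.append(f"missing field: {field}")

    amount = record.get("amount")
    if amount not in (None, ""):
        try:
            if float(amount) < 0:
                errors.append("negative amount")
        except (TypeError, ValueError):
            errors.append(f"amount is not a number: {amount!r}")

    date = record.get("date")
    if date not in (None, ""):
        try:
            datetime.strptime(date, "%Y-%m-%d")
        except (TypeError, ValueError):
            errors.append(f"date is not YYYY-MM-DD: {date!r}")
    return errors


def split_valid(records):
    """Separate valid records from rejected ones (kept with their errors)."""
    valid, rejected = [], []
    for record in records:
        errors = validate(record)
        if errors:
            rejected.append((record, errors))
        else:
            valid.append(record)
    return valid, rejected
```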

Process flow

Infrastructure

Host      Functionalities / Components       Services / Data sources

VM1       linkedeconomy.org                  apache, php, mysql, drupal
VM2       SPARQL endpoint, demo site         OLV, apache, php, mysql, drupal
VM3       Harvester                          CouchDB, Lucene, apache, mysql / CKAN (Greek Datasets)
VM4       Harvester, Messenger               mysql, LinkedEconomy dropbox
VM5       Storage - Secondary triplestore    CouchDB, OLV, CouchDB-Lucene, docker
VM6       Harvester                          apache, php, mysql, drupal / CKAN (Foreign Datasets)
VM7       SPARQL endpoint                    OLV (Foreign graphs)
VM8       Management                         JIRA, mysql, tomcat
VM9       Dashboard                          front-end, CMS, INSPINIA
VM10      System administration              VPN, firewalls, etc.
Physical  Storage - Core triplestore         OLV (Greek graphs)

As core infrastructure we use ~okeanos, which is an established cloud-based service provided for the Greek research and academic community.

LinkedEconomy

CKAN

“Hottest” Prices per municipality

Supermarkets Geoinformation

Application System

[Diagram of the application system. Data sources: Di@vgeia, KHMDHS, ePrices, fuelPrices. Components: Virtuoso, CouchDB, Drupal, MySql, CKAN, QGIS, Small Applications (Java, PHP and UNIX scripts).]

Dockerize the System

[Diagram of the dockerized components: Di@vgeia, KHMDHS, ePrices, Virtuoso, Drupal, MySql, CouchDB, QGIS Desktop, QGIS Server, CKAN, Small Applications.]

With Compose 2

Docker MySQL

version: '2'
services:
  mysql:
    # Will build the image from your directory
    build: ./mysql-docker/5.6
    container_name: eLodDrupalmySQL
    # Save your data!!
    volumes:
      - /mysql_drupal:/var/lib/mysql
    environment:
      - MYSQL_DATABASE=drupalelod
      - MYSQL_ROOT_PASSWORD=eLodmysqlpass
    # Do not use the flag "always" in your development environment!
    restart: on-failure

Docker Drupal

  drupal:
    build: ./docker-drupal
    command:
      - /start.sh
    # Will start the service only after the MySQL service has been started
    # (start order only; it does not wait for MySQL to be ready)
    depends_on:
      - mysql
    container_name: eLodDrupal
    #image: eLodDrupal
    ports:
      - "8081:80"
    volumes:
      - "/data_drupal:/var/www/html"
    # Will link the container with the MySQL container
    links:
      - "mysql"
    environment:
      - MYSQL_DATABASE=drupalelod
      - MYSQL_USER=root
      - MYSQL_PASSWORD=eLodmysqlpass
      - DRUPAL_ADMIN_PW=eLODDR
      - DRUPAL_ADMIN=admin
      - MYSQL_HOST=eLodDrupalmySQL
      # - [email protected]  (e-mail entry, obfuscated in the source)
    restart: on-failure

Docker Virtuoso

  virtuoso:
    build: ./docker-virtuoso
    container_name: eLodVirtuoso
    ports:
      - "8890:8890"
    volumes:
      - /virtuoso/db:/var/lib/virtuoso/db
    environment:
      - DBA_PASSWORD=eLodVir
      - SPARQL_UPDATE=true
      - DEFAULT_GRAPH=http://localhost:8890/DAV
    restart: on-failure

Docker QGIS

  qgisdesktop:
    #image: kartoza/qgis-desktop:2.14
    build: ./qgis-desktop/2.14
    hostname: qgis-server
    volumes:
      #Wherever you want to mount your data from
      - ./gis:/gis
      #Unix socket for X11
      - "/tmp/.X11-unix:/tmp/.X11-unix"
    links:
      - db:db
    environment:
      - DISPLAY=unix:1
    command: /usr/bin/qgis

Build the system

Clone the repository from GitHub: https://github.com/stetsiafoulis/eLOD

Create the directories where you are going to link your data.

Enter docker-compose up -d and that’s it!!
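The directory step can be scripted. The sketch below creates the host-side directories named in the volume entries of the compose file (`/mysql_drupal`, `/data_drupal`, `/virtuoso/db`) and wraps the `docker-compose up -d` call. The `root` parameter is an addition for illustration, so the script can be tried in a scratch directory first; creating the real paths under `/` typically needs elevated privileges.

```python
import os
import subprocess
import tempfile

# Host-side paths taken from the volume entries of the compose file.
HOST_DIRS = ["/mysql_drupal", "/data_drupal", "/virtuoso/db"]


def create_host_dirs(dirs, root="/"):
    """Create each directory under `root` and return the resulting paths."""
    created = []
    for directory in dirs:
        path = os.path.join(root, directory.lstrip("/"))
        os.makedirs(path, exist_ok=True)
        created.append(path)
    return created


def compose_up(project_dir):
    """Run `docker-compose up -d` in the cloned repository (requires Docker)."""
    subprocess.run(["docker-compose", "up", "-d"], cwd=project_dir, check=True)


# Dry run in a scratch directory, so nothing outside it is touched.
scratch = tempfile.mkdtemp()
created = create_host_dirs(HOST_DIRS, root=scratch)
```

For a real deployment one would call `create_host_dirs(HOST_DIRS)` and then `compose_up()` with the path of the cloned repository.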

Why Docker ?

o Portable

o Lightweight

o Move to different cloud infrastructures and to physical servers

o Run on virtual machines for development and testing

o Easily scale

o Easy delivery and deployment

o Run anywhere (regardless of host distro, physical, cloud or not)

o Run anything

What’s Next ??

Scaling per Source

[Diagram: a separate stack per source (Di@vgeia, KHMDHS), each containing Virtuoso, Drupal, MySql, CouchDB, QGIS Server, QGIS Desktop and Small Applications.]

Run Small Apps through Docker API

Small Applications
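One way this could look, sketched with the Python standard library only: talk to the Docker Engine API over the daemon's Unix socket, create a container for a small application and start it. The image name, command and bind mount in the usage note are hypothetical, and the socket path assumes a default local Docker installation.

```python
import http.client
import json
import socket
from urllib.parse import quote


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP connection that talks to the Docker daemon's Unix socket."""

    def __init__(self, socket_path="/var/run/docker.sock"):
        super().__init__("localhost")
        self.socket_path = socket_path

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self.socket_path)


def create_body(image, cmd, binds=()):
    """Build the JSON body for POST /containers/create."""
    return {"Image": image, "Cmd": list(cmd), "HostConfig": {"Binds": list(binds)}}


def run_small_app(name, image, cmd, binds=()):
    """Create and start a container; returns its id (requires a running daemon)."""
    conn = UnixHTTPConnection()
    try:
        conn.request("POST", f"/containers/create?name={quote(name)}",
                     body=json.dumps(create_body(image, cmd, binds)),
                     headers={"Content-Type": "application/json"})
        resp = conn.getresponse()
        data = json.loads(resp.read())
        if resp.status != 201:
            raise RuntimeError(data.get("message", f"HTTP {resp.status}"))

        conn.request("POST", f"/containers/{data['Id']}/start")
        started = conn.getresponse()
        started.read()
        if started.status not in (204, 304):
            raise RuntimeError(f"start failed with HTTP {started.status}")
        return data["Id"]
    finally:
        conn.close()
```

A call such as `run_small_app("harvest-eprices", "elod/harvester", ["python", "harvest.py"], ["/data:/data"])` would then launch one small application on demand; all four arguments here are invented for the example.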

Next Steps - Swarm

[Diagram: Virtuoso, Drupal, MySql, CouchDB, QGIS Server.]

Cluster management

Scaling

State reconciliation

Multi-host networking

Service discovery

Load balancing

Next Steps - Consul

Health Checking

Service Discovery

Multi Datacenter support
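As a rough idea of what health checking and service discovery could involve, the sketch below prepares a registration request for the local Consul agent that announces the Virtuoso container and attaches an HTTP health check. The agent address, the service name and the choice of the SPARQL endpoint as health URL are assumptions; nothing in the slides says how the authors plan to configure Consul.

```python
import json
from urllib.request import Request

CONSUL = "http://localhost:8500"  # assumed address of a local Consul agent


def service_definition(name, port, health_url, interval="10s"):
    """Describe a service with an HTTP health check polled at `interval`."""
    return {
        "Name": name,
        "Port": port,
        "Check": {"HTTP": health_url, "Interval": interval},
    }


def register_request(definition, consul=CONSUL):
    """Build the PUT request for the agent's service registration endpoint."""
    return Request(
        f"{consul}/v1/agent/service/register",
        data=json.dumps(definition).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Port 8890 is the one published for Virtuoso in the compose file.
virtuoso = service_definition("virtuoso", 8890, "http://localhost:8890/sparql")
request = register_request(virtuoso)
```

Sending the request with `urllib.request.urlopen(request)` would register the service, provided a Consul agent is listening at that address.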

Any Questions ??

Appendix - Data Sources links

LinkedEconomy (http://linkedeconomy.org/)

Sources currently used:

Transparency - DIAVGEIA: https://diavgeia.gov.gr

Central Electronic Registry of Public Procurement - E-Procurement (KHMDHS): http://www.eprocurement.gov.gr

National Strategic Reference Framework (NSRF): https://www.espa.gr/en

Central Market of Thessaloniki (CMT): http://www.kath.gr/

e-Prices: http://www.e-prices.gr/

Fuel Prices: http://www.fuelprices.gr/

Municipality of Athens: https://www.cityofathens.gr/khe/proypologismos

Municipality of Thessaloniki: http://www.thessaloniki.gr/portal/page/portal/DioikitikesYpiresies/GenDnsiDioikOikonYpiresion/DnsiDiafanEksipirDimoton/TmimaDiafaneias/AnoiktiDdiathesiDedomenon/DimosiefsiEktelesisProipologismou/ektelesi-proypologismou

Government of Australia: http://data.gov.au/