Big Data Europe - FOT-Netfot-net.eu/.../van-Nuffelen-Big-Data-Europe-fotnet.pdf · stakeholders...

BIG DATA EUROPE Integrating Big Data, Software & Communities for Addressing Europe’s Societal Challenges

Page 1:

BIG DATA EUROPE Integrating Big Data, Software & Communities for Addressing Europe’s Societal Challenges

Page 2:

Partners

Page 3:

Mission

Lower the barrier to using big data technologies

o Required effort and resources

o Required data science skills

Assist in establishing cross-lingual, cross-organizational and cross-domain Data Value Chains

Show the societal value of Big Data

16 March 2015 www.big-data-europe.eu

Page 4:

cross-lingual / cross-organizational / cross-domain

Societal Domain: Life Sciences & Health
Preliminary Big Data Focus area: Heterogeneous data linking & integration; biomedical semantic indexing & QA
Selected Key Data assets: ACD Labs / ChemSpider, ChEBI, ChEMBL, ConceptWiki, DrugBank, ENZYME, Gene Ontology, GO Annotation, SwissProt, UniProt, WikiPathways, PubMed, MeSH, Disease Ontology (DO), Joint Chemical Dictionary (Jochem), BioASQ datasets

Societal Domain: Food & Agriculture
Preliminary Big Data Focus area: Large-scale distributed data integration
Selected Key Data assets: INFOODS, AQUASTAT, Green Learning Network (GLN), Agricultural Bibliography Network (ABN), AGRIS, AquaMaps, Fishbase

Societal Domain: Energy
Preliminary Big Data Focus area: Real-time monitoring, stream processing, data analytics, and decision support
Selected Key Data assets: European Energy Exchange Data, smart meter measurement data, gas/fuels/energy market/price data, consumption statistics, equipment condition monitoring data

Societal Domain: Transport
Preliminary Big Data Focus area: Streaming sensor network & geo-spatial data integration
Selected Key Data assets: GTFS data, OSM / LinkedGeoData, MobilityMaps, Transport sensor data, ROSATTE Road safety attributes, European Road Data Infrastructure - EuroRoadS

Societal Domain: Climate
Preliminary Big Data Focus area: Real-time monitoring, stream processing, and data analytics
Selected Key Data assets: European Grid Infrastructure (EGI), databases hosting atmospheric data, several software frameworks for simulation, calibration and reconstruction

Societal Domain: Social Sciences
Preliminary Big Data Focus area: Statistical and research data linking & integration
Selected Key Data assets: Federated social sciences data catalogs, statistical data from public data portals and statistical offices (e.g. EuroStats, UNESCO, WorldBank)

Societal Domain: Security
Preliminary Big Data Focus area: Real-time monitoring, stream processing, and data analytics; image data analysis
Selected Key Data assets: Earth Observation data (e.g. Very High Resolution Satellite Imagery acquired from commercial providers and governmental systems) and collateral data for supporting CFSP/CSDP missions and operations, databases hosting atmospheric data, experimental and simulation data concerning dispersion of hazardous substances

Page 5:

Project Summary

Two clearly defined coordination and support measures:

Coordination: Engaging with a diverse range of stakeholder groups representing particularly the Horizon 2020 societal challenges Health, Food & Agriculture, Energy, Transport, Climate, Social Sciences and Security; collecting requirements for the ICT infrastructure needed by data-intensive science practitioners tackling a wide range of societal challenges; covering all aspects of publishing and consuming semantically interoperable, large-scale data and knowledge assets;

Support: Designing, realizing and evaluating a Big Data Aggregator platform infrastructure that meets requirements, minimises disruption to current workflows, and maximises the opportunities to take advantage of the latest European RTD developments (incl. multilingual data harvesting, data analytics & visualisation).

BigDataEurope will implement and apply two main instruments to successfully realize these measures:

Build Societal Big Data Interest Groups modelled on the W3C interest group scheme, involving a large number of stakeholders from the Horizon 2020 societal challenges as well as technical Big Data experts;

Design, integrate and deploy a cloud-deployment-ready Big Data aggregator platform comprising key open-source Big Data technologies for real-time and batch processing, such as Hadoop, Cassandra and Storm.

Page 6:

Orthogonal Dimensions of Big Data Ecosystems

Generic Big Data Enabling Technologies

Data Value Chain

Data Generation & Acquisition

Data Analysis & Processing

Data Storage & Curation

Data Visualization & Usage

Data-driven Services

Societal Challenges

Domain Specific Data Assets & Technology

Healthcare

Food Security

Energy

Intelligent Transport

Climate & Environment

Inclusive & Reflective Societies

Secure Societies

Page 7:

BigDataEurope Platform


Page 8:

Work Packages & Implementation Phases

Community Building

M1-M12 M13-M24 M25-M36

Enabling Technologies

Component Integration

Uptake

Integrator Deployment

Community Assessment

WP2 – Community Building & Requirements

WP3 – Big Data Generic Enabling Technologies & Architecture

WP4 – Big Data Integrator Platform

WP5 – Big Data Integrator Instances

WP6 – Real-life Deployment & User Evaluation

WP7 – Dissemination & Communication

Page 9:

BDE platform covers the complete data landscape

Data processing with human-organized information

Similar data processing steps applied to a large quantity of data

Similar data processing steps applied to a stream of data
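One way to read the last two points is that a single processing step can be reused for both bulk and streaming input. The sketch below is a minimal illustration under that assumption; the `normalise` step, the record layout and the function names are hypothetical and are not taken from the BDE platform.

```python
from typing import Dict, Iterable, Iterator, List


def normalise(record: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical processing step: trim and lower-case the 'name' field."""
    return {**record, "name": record["name"].strip().lower()}


def process_batch(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Apply the step to a complete collection that is already in memory."""
    return [normalise(r) for r in records]


def process_stream(records: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Apply the same step lazily, one record at a time, as records arrive."""
    for r in records:
        yield normalise(r)


sample = [{"name": "  Sensor A "}, {"name": "SENSOR b"}]
batch_result = process_batch(sample)
stream_result = list(process_stream(iter(sample)))
```

Both paths call the same `normalise` function, so `batch_result` and `stream_result` hold identical records; only the way the input is consumed differs.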

Page 10:

Blueprint BDE platform

Dissemination API

aggregated data

Search index

Dataset Meta data

SPARQL

JSON

LOD

search

Dissemination storage

JSON-LD

Real time aggregator

Bulk data aggregator

Background aggregator

Reporting API

Bulk database

Background knowhow
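The blueprint names JSON-LD alongside dataset metadata but gives no schema. As a rough illustration only, a dataset metadata record could be serialised as JSON-LD along the following lines. The choice of the DCAT and Dublin Core vocabularies is an assumption, not something the slide specifies, and the identifier, title and keywords are invented.

```python
import json
from typing import Dict, List


def dataset_metadata(identifier: str, title: str, keywords: List[str]) -> Dict[str, object]:
    """Build a JSON-LD style record for one dataset.

    Assumption: DCAT / Dublin Core terms are used; the slide names no vocabulary.
    """
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@id": identifier,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dcat:keyword": list(keywords),
    }


# Hypothetical dataset; the URL and values are placeholders.
record = dataset_metadata(
    "http://example.org/dataset/transport-sensors",
    "Transport sensor data",
    ["transport", "sensor"],
)
serialised = json.dumps(record, indent=2)
print(serialised)
```

Because the record is plain JSON with an `@context`, it can be stored or indexed like any JSON document while still being interpretable as linked data.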

Page 11:

Blueprint BDE platform

[Diagram repeated from the previous page, with one added label: Deployment]

Page 12:

Coordination


Page 13:

Networking partners


Health, demographic change and wellbeing

Food, Agriculture, Forestry, Water and Bioeconomy

Inclusive, innovative and Reflective Societies

Secure, clean and efficient energy

Climate, environment, resource efficiency and raw materials

Smart, green and integrated transport

Secure Societies

Page 14:

Envisioned societal stakeholder engagement cycle

Page 15:

Community building and supporting

◎ Establish 7 Societal Big Data Interest Groups

o modelled after the W3C interest groups

o involving a large number of stakeholders from the H2020 societal challenges as well as technical Big Data experts

o each group has a domain and a technical chair

◎ Build a European network and multiplier organization per societal challenge to

o engage with stakeholders in the particular societal challenge area and raise awareness

o support the requirements elicitation, definition and prioritization

o assemble a library of data sources and datasets

o provide a comprehensive test bed for the evaluation of the BDE Aggregator Platform

o select pilot use cases across different domains

o promote the showcase developed for the societal domain and support the dissemination of the BDE results

o provide appropriate academic and training curricula for training future researchers and practitioners

27 February 2015 www.big-data-europe.eu

Page 16:

Workshops

◎ 7 × 3 workshops (at least 3 per Societal Challenge)

◎ First series of workshops in the next months will focus on requirements definition

o analyse workshop results and create a 1st draft per societal challenge

o also examine the use of other tools such as

❖ surveys (asking a broad audience about (big) data management needs)

❖ interviews with Big Data experts

❖ interviews with an EC representative per societal challenge

◎ Second series of workshops in the 2nd year will focus on a review of the architecture and first prototype implementation

◎ Third series of workshops in the 3rd year will focus on the platform evaluation and showcases for the societal domains

Page 17:

Big Data Europe

Page 18:

OPEN PHACTS - BIG DATA AND DRUG DISCOVERY

BRYN WILLIAMS-JONES, CEO THE OPEN PHACTS FOUNDATION

Big Data Europe

Page 19:

Literature, PubChem, Genbank, Patents, Databases, Downloads

Data Integration, Data Analysis, Firewalled Databases

Repeat @ each company x

Lowering industry firewalls: pre-competitive informatics in drug discovery

Nature Reviews Drug Discovery (2009) 8, 701-708 doi:10.1038/nrd2944

Pre-competitive Informatics:

Pharma companies are all accessing, processing, storing & re-processing external open research data

Page 20:

The Innovative Medicines Initiative

• EC funded public-private partnership for pharmaceutical research

• Focus on key problems

– Efficacy, Safety, Education & Training, Knowledge Management

The Open PHACTS Project

• Create a semantic integration hub (“Open Pharmacological Space”)…

• Runs 2011-2014, ENSO till 2016

• Deliver services to support on-going drug discovery programs in pharma and public domain

• Leading academics in semantics, pharmacology and informatics, driven by solid industry business requirements

• 10 EFPIA companies, 15 academics, 6 SMEs

• Focus on sustainability and long term impact of the Open PHACTS infrastructure

Page 21:

Open PHACTS Mission

Integrate Multiple Research Biomedical Data Resources Into A Single Open & Free Access Point

Page 22:

What do research scientists want to know?

ChEMBL, DrugBank, Gene Ontology, Wikipathways, UniProt, ChemSpider, UMLS, ConceptWiki, ChEBI, TrialTrove, GVKBio, GeneGo, TR Integrity

Page 23:

‘Business Questions’

Number | Sum | Nr of 1 | Question

15 | 12 | 9 | All oxidoreductase inhibitors active <100nM in both human and mouse

18 | 14 | 8 | Given compound X, what is its predicted secondary pharmacology? What are the on- and off-target safety concerns for a compound? What is the evidence and how reliable is that evidence (journal impact factor, KOL) for findings associated with a compound?

24 | 13 | 8 | Given a target find me all actives against that target. Find/predict polypharmacology of actives. Determine ADMET profile of actives.

32 | 13 | 8 | For a given interaction profile, give me compounds similar to it.

37 | 13 | 8 | The current Factor Xa lead series is characterised by substructure X. Retrieve all bioactivity data in serine protease assays for molecules that contain substructure X.

38 | 13 | 8 | Retrieve all experimental and clinical data for a given list of compounds defined by their chemical structure (with options to match stereochemistry or not).

41 | 13 | 8 | A project is considering Protein Kinase C Alpha (PRKCA) as a target. What are all the compounds known to modulate the target directly? What are the compounds that may modulate the target directly? i.e. return all compounds active in assays where the resolution is at least at the level of the target family (i.e. PKC) both from structured assay databases and the literature.

44 | 13 | 8 | Give me all active compounds on a given target with the relevant assay data

46 | 13 | 8 | Give me the compound(s) which hit most specifically the multiple targets in a given pathway (disease)

59 | 14 | 8 | Identify all known protein-protein interaction inhibitors
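To make the first question concrete (oxidoreductase inhibitors active below 100 nM in both human and mouse), here is a minimal sketch of how it reduces to a filter plus a per-compound check once activity data sits in one place. The record layout, field names and all values are invented for illustration; this is not the Open PHACTS API or its data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# Hypothetical activity records; compounds and values are made up.
RECORDS = [
    {"compound": "cmpd-1", "target_class": "oxidoreductase", "species": "human", "activity_nM": 40.0},
    {"compound": "cmpd-1", "target_class": "oxidoreductase", "species": "mouse", "activity_nM": 85.0},
    {"compound": "cmpd-2", "target_class": "oxidoreductase", "species": "human", "activity_nM": 12.0},
    {"compound": "cmpd-2", "target_class": "oxidoreductase", "species": "mouse", "activity_nM": 450.0},
    {"compound": "cmpd-3", "target_class": "kinase", "species": "human", "activity_nM": 5.0},
    {"compound": "cmpd-3", "target_class": "kinase", "species": "mouse", "activity_nM": 7.0},
]


def active_in_all_species(
    records: Iterable[dict],
    target_class: str,
    species: Iterable[str],
    threshold_nM: float,
) -> List[str]:
    """Return compounds below the threshold for the target class in every listed species."""
    hits: Dict[str, Set[str]] = defaultdict(set)
    for r in records:
        if r["target_class"] == target_class and r["activity_nM"] < threshold_nM:
            hits[r["compound"]].add(r["species"])
    required = set(species)
    return sorted(c for c, seen in hits.items() if required <= seen)


result = active_in_all_species(RECORDS, "oxidoreductase", ["human", "mouse"], 100.0)
print(result)
```

In this toy data only `cmpd-1` qualifies: `cmpd-2` misses the threshold in mouse and `cmpd-3` belongs to a different target class. The hard part in practice is getting the records from many sources into one consistent form, which is what the integration platform addresses.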

Page 24:

The Open PHACTS Discovery Platform

http://dx.doi.org/10.1016/j.websem.2014.03.003

[Architecture diagram; recoverable labels listed below]

Apps

Linked Data API (RDF/XML, TTL, JSON)

Domain Specific Services

Core Platform

Semantic Workflow Engine

Identity Resolution Service

Identifier Management Service

Chemistry Registration Normalisation & Q/C

Indexing

Data Cache (Virtuoso Triple Store)

Nanopub, Db, VoID (labels repeated across the diagram)

Example identifiers: P12374, EC2.43.4, CS4532, “Adenosine receptor 2a”

Public Content, Commercial, Public Ontologies, User Annotations

Page 25:

Sustaining Impact

“Software is free like puppies are free - they both need money for maintenance”

…and more resource for future development

Page 26:

Page 27:

How do we move data about and integrate it?

Page 28:

http://imgs.xkcd.com/comics/standards.png

Data Standardisation is vital

Page 29:

P12047 X31045

GB:29384

Yet the bioscience world really struggles to agree on names
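A common way to cope with competing names is to record pairwise "same as" links between identifiers and resolve them into equivalence groups. The sketch below shows that idea with a small union-find structure. Treating the three identifiers from this slide as aliases of one entity is an assumption made purely for illustration, and the class is hypothetical, not the Open PHACTS Identity Resolution Service.

```python
from typing import Dict, List


class IdentityResolver:
    """Groups identifiers connected by 'same as' links (union-find)."""

    def __init__(self) -> None:
        self._parent: Dict[str, str] = {}

    def _find(self, x: str) -> str:
        # Register unseen identifiers as their own group, then walk to the root.
        self._parent.setdefault(x, x)
        while self._parent[x] != x:
            self._parent[x] = self._parent[self._parent[x]]  # path halving
            x = self._parent[x]
        return x

    def link(self, a: str, b: str) -> None:
        """Record that identifiers a and b name the same entity."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same(self, a: str, b: str) -> bool:
        return self._find(a) == self._find(b)

    def aliases(self, x: str) -> List[str]:
        """All known identifiers in the same group as x, sorted."""
        root = self._find(x)
        return sorted(k for k in list(self._parent) if self._find(k) == root)


resolver = IdentityResolver()
# Assumed links, for illustration only.
resolver.link("P12047", "X31045")
resolver.link("X31045", "GB:29384")
print(resolver.aliases("GB:29384"))
```

Links are transitive here, so connecting `P12047` to `X31045` and `X31045` to `GB:29384` puts all three in one group. Real mappings are often context-dependent, which a simple transitive closure like this does not capture.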

Page 30:

@Open_PHACTS

Open PHACTS Practical Semantics

Acknowledgements

GlaxoSmithKline – Coordinator

Universität Wien – Managing entity

Technical University of Denmark

University of Hamburg, Center for Bioinformatics

BioSolveIT GmbH

Consorci Mar Parc de Salut de Barcelona

Leiden University Medical Centre

Royal Society of Chemistry

Vrije Universiteit Amsterdam

Novartis

Merck Serono

H. Lundbeck A/S

Eli Lilly

Netherlands Bioinformatics Centre

Swiss Institute of Bioinformatics

ConnectedDiscovery

EMBL-European Bioinformatics Institute

Janssen

Esteve

Almirall

OpenLink

Scibite

The Open PHACTS Foundation

Spanish National Cancer Research Centre

University of Manchester

Maastricht University

Aqnowledge

University of Santiago de Compostela

Rheinische Friedrich-Wilhelms-Universität Bonn

AstraZeneca

Pfizer

Page 31:

Big Data Europe