Webinar@AIMS: Big Data challenges and solutions in agricultural and environmental research

Big Data challenges and solutions in agricultural and environmental research Big Data Europe – AIMS webinar, 17 December 2015 Rob Lokers Alterra, Wageningen UR The Netherlands


Page 1: Webinar@AIMS: Big Data challenges and solutions in agricultural and environmental research

Big Data challenges and solutions in agricultural and environmental research

Big Data Europe – AIMS webinar, 17 December 2015

Rob Lokers, Alterra, Wageningen UR, The Netherlands

Page 2: Webinar@AIMS: Big Data challenges and solutions in agricultural and environmental research

Outline

• Historic perspective (agricultural & environmental modelling)
• Expectations for the (near) future
• Some Big Data examples from the agri-food domain
• Big Data challenges in agri-environmental research
• Expectations versus reality in 2015

Page 3: Webinar@AIMS: Big Data challenges and solutions in agricultural and environmental research

[Figure: timeline of agricultural and environmental modelling across four disciplines: Crop science, Animal science, Food Science, Economics]

Periods shown: 1960-1980, 1980-2000, 2000-2010, 2010-2015

Stage labels, in order of appearance:
• Institutional data collection (per discipline)
• First computer models, institutional applications (per discipline)
• Integrated modelling frameworks
• Open data across sectors; IT improvements (metadata, semantics) (per discipline)

Page 4: Webinar@AIMS: Big Data challenges and solutions in agricultural and environmental research

[Figure: timeline continued, for Crop science, Animal science, Food Science and Economics]

2010-2015: Open data across sectors; IT improvements (metadata, semantics) in each discipline

2015-2020: BIG DATA: one massive linked data pool across disciplines and strong computational capabilities

Computational capabilities:
• Amazon
• Microsoft Azure
• Google Earth Engine
• EC research infrastructures

New data sources:
• Remote sensing
• Crowd sourcing
• Rapid phenotyping / Omics
• Social media

Potential to solve problems on agriculture, nutrition, food security, climate change?

Page 5: Webinar@AIMS: Big Data challenges and solutions in agricultural and environmental research

Process of data based value creation and roles involved

[Figure: pyramid from raw data to wisdom, with the domains and roles involved]

Pyramid levels, top to bottom:
• Wisdom
• Knowledge: info + application
• Information: data + added meaning
• (Big) Data: raw material

Policy makers / industry / societal stakeholders

Decision domain (policy/industry):
● Interests (economic, social, environmental), values, preferences, trade-offs, risks, intangibles, ethics, ...
● Policy options, Products, Services, Costs, Benefits, Scenarios, Impact Assessments, Decision Support Systems, Integrated models, ...

Knowledge domain (science / consultants):
● Data analysis and integration, Models, Artificial Intelligence, Linked Open Data, Semantic web technologies, ...

Data sources: Databases, Satellites, Sensor networks, Social media, Citizen Observatories, ...

Along the side of the pyramid: Open (data) Standards, (meta)data repositories, Business development, Visualization tools and methods, Contextualization, Knowledge Brokerage, ...

Page 6: Webinar@AIMS: Big Data challenges and solutions in agricultural and environmental research

Food Security example: Monitoring Agricultural ReSources (MARS)

[Figure: pyramid with the levels Wisdom, Knowledge, Information, Data]

• Owned and operated by EC-JRC
• Crop forecasts at EU level needed to take rapid decisions on Common Agricultural Policy instruments during the year
• Provide information on vulnerability in specific food-insecure areas
• In support of:
  ● European Common Agricultural Policy on commodities & subsidies (focus on Europe, Asia)
  ● Food aid (focus on Africa)
• Monitoring weather and crop conditions of current growing season (early warning, extreme events)

Page 7: Webinar@AIMS: Big Data challenges and solutions in agricultural and environmental research

Example: Monitoring Agricultural ReSources (MARS)

[Figure: pyramid with the levels Wisdom, Knowledge, Information, Data]

Inputs shown: weather archives, live data streams, crop and soil databases, models

Page 8: Webinar@AIMS: Big Data challenges and solutions in agricultural and environmental research

Example: Monitoring Agricultural ReSources (MARS)

[Figure: pyramid with the levels Wisdom, Knowledge, Information, Data]

Inputs shown: weather archives, live data streams, crop and soil databases, models

Processing shown: rescaling, interpolations, GIS, crop models
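The slides do not say which interpolation methods are used for the "rescaling, interpolations" step. As one possible illustration, the sketch below applies inverse distance weighting (IDW), a simple and widely used technique, to estimate a value at a grid cell centre from nearby weather stations. The function name `idw_estimate`, the coordinates and the temperature values are hypothetical and are not taken from MARS.

```python
import math


def idw_estimate(stations, target, power=2.0):
    """Inverse-distance-weighted estimate at `target` (x, y).

    `stations` is a list of (x, y, value) tuples in the same planar
    coordinate system as `target`. All names here are illustrative.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in stations:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            # Target coincides with a station: use its observation directly.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("at least one station is required")
    return weighted_sum / weight_total


# Hypothetical daily mean temperatures (degrees Celsius) at three stations.
stations = [(0.0, 0.0, 10.0), (10.0, 0.0, 14.0), (0.0, 10.0, 12.0)]
grid_cell_centre = (5.0, 5.0)
estimate = idw_estimate(stations, grid_cell_centre)
print(f"Estimated temperature at {grid_cell_centre}: {estimate:.2f}")
```

In this toy setup all three stations are equidistant from the grid cell centre, so the estimate reduces to their plain mean. Operational systems typically use more elaborate methods that also account for elevation and distance to the coast.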

Page 9: Webinar@AIMS: Big Data challenges and solutions in agricultural and environmental research

Example: Monitoring Agricultural ReSources (MARS)

[Figure: pyramid with the levels Wisdom, Knowledge, Information, Data]

Inputs shown: weather archives, live data streams, crop and soil databases, models

Analysis shown: statistical tools, decision support, data mining & reporting

Page 10: Webinar@AIMS: Big Data challenges and solutions in agricultural and environmental research

Example: Monitoring Agricultural ReSources (MARS)

[Figure: pyramid with the levels Wisdom, Knowledge, Information, Data]

Inputs shown: weather archives, live data streams, crop and soil databases, models

Outcome shown: policy & decision making

Page 11: Webinar@AIMS: Big Data challenges and solutions in agricultural and environmental research

Big Data technologies

Technologies currently used (agri-environmental research)

• RDBMS, geo-databases
  ● but also file-based, Excel etc.
• Various “old & proven” programming languages (esp. for modelling, data processing)
  ● Fortran, C/C++, Java etc.
• Remote sensing: dedicated tools & environments for processing and analysis
  ● ENVI, R, GDAL etc.
• GIS & spatial analysis packages
• Harmonized information / data models (but still per discipline)
• Local, optimized solutions for computing and storage

Page 12: Webinar@AIMS: Big Data challenges and solutions in agricultural and environmental research

Big Data technologies

Experimental technologies (ICT research for agriculture):

• High Performance clusters / grids
  ● E.g. parallelization of modelling and analysis software
• RDF databases
  ● Linked Data applications linking sources of metadata, bibliographical data, statistical data
• Vocabularies and ontologies
  ● Annotation of (meta)data for improved discovery
• Semantic technologies
• NLP algorithms

However: agro-environmental research seems to be a “wicked domain” with specific challenges
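To make "annotation of (meta)data for improved discovery" more concrete, here is a minimal sketch in plain Python, with no RDF library involved. Dataset metadata is held as (subject, predicate, object) triples, and a search for a broad concept also finds datasets annotated with narrower concepts. The two Dublin Core predicate URIs are real; every `example.org` dataset and concept URI, and the `broader` links between concepts, are hypothetical.

```python
# Minimal sketch: dataset metadata annotated with vocabulary concept URIs,
# stored as (subject, predicate, object) triples in plain Python.
DCT_SUBJECT = "http://purl.org/dc/terms/subject"  # Dublin Core "subject"
DCT_TITLE = "http://purl.org/dc/terms/title"      # Dublin Core "title"

# Hypothetical dataset and concept URIs, for illustration only.
triples = [
    ("http://example.org/dataset/1", DCT_TITLE, "Daily weather observations"),
    ("http://example.org/dataset/1", DCT_SUBJECT, "http://example.org/vocab/air-temperature"),
    ("http://example.org/dataset/2", DCT_TITLE, "Wheat yield statistics"),
    ("http://example.org/dataset/2", DCT_SUBJECT, "http://example.org/vocab/crop-yield"),
    ("http://example.org/dataset/3", DCT_TITLE, "Soil temperature profiles"),
    ("http://example.org/dataset/3", DCT_SUBJECT, "http://example.org/vocab/soil-temperature"),
]

# Hypothetical hierarchy between concepts (narrower concept -> broader concept).
broader = {
    "http://example.org/vocab/air-temperature": "http://example.org/vocab/temperature",
    "http://example.org/vocab/soil-temperature": "http://example.org/vocab/temperature",
}


def concept_matches(concept, wanted):
    """Return True if `concept` equals `wanted` or has it as an ancestor."""
    while concept is not None:
        if concept == wanted:
            return True
        concept = broader.get(concept)
    return False


def discover(wanted_concept):
    """Return sorted dataset URIs annotated with the concept or a narrower one."""
    return sorted({
        subject for subject, predicate, obj in triples
        if predicate == DCT_SUBJECT and concept_matches(obj, wanted_concept)
    })


found = discover("http://example.org/vocab/temperature")
print(found)
```

The discovery gain comes entirely from the vocabulary hierarchy: without the `broader` links, a search for "temperature" would match neither dataset, because each is annotated only with a more specific term.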

Page 13: Webinar@AIMS: Big Data challenges and solutions in agricultural and environmental research

Challenge - Variety

[Figure: three Wisdom / Knowledge / Information / Data pyramids, one each for Climatology, Agronomy and Soil Science, annotated with Volume, Velocity and Variety]

Page 14: Webinar@AIMS: Big Data challenges and solutions in agricultural and environmental research

Challenge - Variety

• Agro-environmental research = Interdisciplinary:
  ● different targeted objectives
  ● different data formats
  ● different schemas, vocabularies etc.
  ● different levels of standardization
  ● different granularities
• Example: relevant domains for agricultural impact assessments
  ● Agronomy
  ● Climate
  ● Water/irrigation management
  ● Economy etc.

Page 15: Webinar@AIMS: Big Data challenges and solutions in agricultural and environmental research

Challenge - Variety

Semantic alignment can be problematic

• Different domains use different semantics to describe the same knowledge
• Semantics may be expressed in different, non-recognized standards, or may not exist at all
• Ontology alignment tools usually do not work
• Manual alignment is resource-consuming and requires multidisciplinary experts
• No fitting vocabularies and ontologies to effectively annotate datasets
• Example: climate data, temperature
  ● Modelled differently in different vocabularies/ontologies
  ● Not specific enough to characterize data
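The temperature example can be illustrated with a small, hand-made crosswalk. The sketch below assumes three hypothetical sources that each name and encode temperature differently. The source names, variable names and shared terms are all invented for illustration. It shows why a bare "temperature" term is not enough: air and soil temperature must stay separate, and units have to be converted before values are comparable.

```python
# Hypothetical crosswalk from source-specific variable names to a shared
# term, with the unit conversion needed to make values comparable.
CROSSWALK = {
    ("climate_archive", "tas"): ("air_temperature_celsius", lambda kelvin: kelvin - 273.15),
    ("agronomy_trial", "Tmean"): ("air_temperature_celsius", lambda celsius: celsius),
    ("soil_survey", "temp"): ("soil_temperature_celsius", lambda celsius: celsius),
}


def harmonize(record):
    """Map one source record onto the shared term, or fail loudly."""
    key = (record["source"], record["variable"])
    if key not in CROSSWALK:
        # An unmapped variable is reported rather than guessed at.
        raise KeyError(f"no alignment defined for {key}")
    shared_term, convert = CROSSWALK[key]
    return {"term": shared_term, "value": round(convert(record["value"]), 2)}


# Invented sample records from the three hypothetical sources.
records = [
    {"source": "climate_archive", "variable": "tas", "value": 288.15},
    {"source": "agronomy_trial", "variable": "Tmean", "value": 15.0},
    {"source": "soil_survey", "variable": "temp", "value": 9.5},
]

harmonized = [harmonize(record) for record in records]
for row in harmonized:
    print(row)
```

Each crosswalk entry encodes a decision that a domain expert had to make by hand, which is the resource-consuming manual alignment the slide refers to.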

Page 16: Webinar@AIMS: Big Data challenges and solutions in agricultural and environmental research

Challenge - Variety

Commonly used vocabularies usually do not fit scientific requirements

• Many seem to be designed for annotation of bibliographic data
• Often not complete, extended in response to requirements of the owner / maintainer
• Unbalanced, e.g. level of detail and granularity
• Example: tree species vs. climate variables in Agrovoc

[Figure: two sets of Agrovoc terms compared]
Tree species terms: Fagaceae, Fagus, Quercus, Quercus Robur, Quercus Ilex
Terms shown around temperature: temperature, Air temperature, Body temperature, Measure, Interest rate

Page 17: Webinar@AIMS: Big Data challenges and solutions in agricultural and environmental research

Food production example: Smart Farming: Monitoring, planning & control

[Figure: data items grouped by level]

Crop or Animal level: Genome sequences, Feed uptake, Performance, Manure, Temperature, Activity, Heart rate, pH, Antibodies, Biomarkers, Medicine use, ...

Farm level: Size, Location, Performance, Manure, Water, Energy, Nutrition, Health management, ...

Environmental level: Distance to ..., Public health, Living environment, Mineral cycles, Healthy products, Disease risks, Economic figures, Environmental issues, ...

Market level: Market prices, Logistics, Regulations, ...

Supporting sustainable food production and contributing to the realization of (inter)national policy agendas.

Page 18: Webinar@AIMS: Big Data challenges and solutions in agricultural and environmental research

Challenge - Veracity

[Figure: three Wisdom / Knowledge / Information / Data pyramids, one each for Climatology, Agronomy and Soil Science, annotated with Volume, Velocity and Variety]

Page 19: Webinar@AIMS: Big Data challenges and solutions in agricultural and environmental research

Challenge - Veracity

[Figure: three Wisdom / Knowledge / Information / Data pyramids, one each for Climatology, Agronomy and Soil Science, annotated with Volume, Velocity and Variety, with Veracity now added to each pyramid]

Page 20: Webinar@AIMS: Big Data challenges and solutions in agricultural and environmental research

Challenge - Veracity

• In science, trust is essential, in two ways:
  ● Trust of end users in the quality of (meta)data
  ● Trust of data providers regarding end users
• Data accessibility is generally low
  ● stored in silos
  ● only exchanged among peers in known networks
  ● hardly documented
  ● This has a lot to do with culture and policies in research organisations regarding data management

Page 21: Webinar@AIMS: Big Data challenges and solutions in agricultural and environmental research

Challenge - Veracity

• Publication of datasets
  ● A lot of data is inaccessible, for various reasons, whether intentionally or unintentionally.
  ● Some data is accessible, but might be
    ● not provided through standardized interfaces
    ● not easily discoverable
• Example: weather data (station observations)

Page 22: Webinar@AIMS: Big Data challenges and solutions in agricultural and environmental research

Challenge - Veracity

• Documentation of datasets
  ● No incentive to provide metadata
  ● End of project activity, with no benefits for scientist
  ● Boring work
  ● No clear perspective on end user requirements
  ● Metadata schemas are generally considered too complex
• Metadata quality is generally poor
  ● Minimal amount of metadata provided
  ● No, non-standardized or irrelevant annotations
• Example: Forestry Clearinghouse
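One way to make "metadata quality is generally poor" measurable is a simple completeness check, sketched below. The list of required fields is a hypothetical minimum chosen for illustration. It is not taken from the Forestry Clearinghouse or from any particular metadata schema, and the sample record is invented.

```python
# Hypothetical minimum set of fields a dataset description should fill in.
REQUIRED_FIELDS = ["title", "description", "creator", "date", "license", "keywords"]


def completeness(metadata):
    """Return (score, missing): the fraction of required fields that are
    filled in, and the names of the fields that are absent or empty."""
    missing = [
        field for field in REQUIRED_FIELDS
        # None, "", whitespace-only strings and empty lists all count as missing.
        if not str(metadata.get(field) or "").strip()
    ]
    score = (len(REQUIRED_FIELDS) - len(missing)) / len(REQUIRED_FIELDS)
    return score, missing


# Invented example of a sparsely documented dataset.
record = {
    "title": "Forest inventory plots",
    "description": "",
    "creator": "Unknown",
    "keywords": [],
}

score, missing = completeness(record)
print(f"completeness: {score:.0%}, missing: {', '.join(missing)}")
```

A check like this only detects absent fields. It cannot tell whether an annotation is relevant or standardized, which the slide names as a separate problem.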

Page 23: Webinar@AIMS: Big Data challenges and solutions in agricultural and environmental research

Expectations versus reality in 2015...

Promise: new technologies will make our life much easier
• RDF databases
• Semantic technologies
• Grid computing, cloud storage solutions etc.

Reality: we keep working with the old stuff
• Alignment and integration are hard to accomplish
• New technologies prove to be too immature for the real world
• Production systems are still developed using what we know works well (e.g. RDBMS, legacy models and data formats)
• Successful innovative initiatives use hybrid solutions, often built on “proven” technologies
  ● Limited to metadata level or small and medium-sized data, with limited domain coverage

Page 24: Webinar@AIMS: Big Data challenges and solutions in agricultural and environmental research

Expectations versus reality in 2015...

Promise: “Googlification” of scientific data provision
• Transparent access to big, distributed, heterogeneous datasets
• “Magical” semantic (and linguistic) query processing
• Tools seamlessly transform heterogeneous data to model data input, information and knowledge for decision making

Reality: we struggle to get ourselves into shape
• Attempts are mainly successful on metadata level and bibliographic sources (genetics might be an exception)
• Cumbersome first attempts to harmonize big heterogeneous data streams
• Custom-built data collection and processing chains still remain dominant

Page 25: Webinar@AIMS: Big Data challenges and solutions in agricultural and environmental research

Source: Gartner (August 2015)


Page 26: Webinar@AIMS: Big Data challenges and solutions in agricultural and environmental research

Thank you for your attention