Webinar@AIMS: Big Data challenges and solutions in agricultural and environmental research
Big Data challenges and solutions in agricultural and environmental research
Big Data Europe – AIMS webinar, 17 December 2015
Rob Lokers, Alterra, Wageningen UR, The Netherlands
Outline
• Historic perspective (agricultural & environmental modelling)
• Expectations for the (near) future
• Some Big Data examples from the agri-food domain
• Big Data challenges in agri-environmental research
• Expectations versus reality in 2015
[Figure: historic perspective for crop science, animal science, food science and economics over the periods 1960–1980, 1980–2000, 2000–2010 and 2010–2015: per-discipline institutional data collection, first computer models and institutional applications, followed by integrated modelling frameworks, IT improvements (metadata, semantics) and open data across sectors.]
[Figure: the same four disciplines in 2010–2015 (open data across sectors, IT improvements in metadata and semantics) and the expectation for 2015–2020: BIG DATA, one massive linked data pool across disciplines and strong computational capabilities.]
Computational capabilities:
• Amazon
• Microsoft Azure
• Google Earth Engine
• EC research infrastructures
New data sources:
• Remote sensing
• Crowd sourcing
• Rapid phenotyping/Omics
• Social media
Potential to solve problems in agriculture, nutrition, food security and climate change?
Process of data-based value creation and roles involved
Wisdom
Knowledge: information + application
Information: data + added meaning
(Big) Data: raw material
• Decision domain (policy/industry): policy makers, industry, societal stakeholders
  Interests (economic, social, environmental), values, preferences, trade-offs, risks, intangibles, ethics, ...
• Knowledge domain (science/consultants)
  Data analysis and integration, models, artificial intelligence, Linked Open Data, semantic web technologies, ...
  Policy options, products, services, costs, benefits, scenarios, impact assessments, decision support systems, integrated models, ...
• Data sources: databases, satellites, sensor networks, social media, citizen observatories, ...
• Open (data) standards, (meta)data repositories, business development, visualization tools and methods, contextualization, knowledge brokerage, ...
Food security example: Monitoring Agricultural ReSources (MARS)
• Owned and operated by EC-JRC
• Crop forecasts at EU level are needed to take rapid decisions on Common Agricultural Policy instruments during the year
• Provides information on vulnerability in specific food-insecure areas
• In support of:
  ● the European Common Agricultural Policy on commodities & subsidies (focus on Europe, Asia)
  ● food aid (focus on Africa)
• Monitors weather and crop conditions of the current growing season (early warning, extreme events)
Example: Monitoring Agricultural ReSources (MARS)
• Data: weather archives, live data streams, crop and soil databases, models
• Information: rescaling, interpolations, GIS, crop models
• Knowledge: statistical tools, decision support, data mining & reporting
• Wisdom: policy & decision making
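The "rescaling, interpolations" step turns scattered weather station observations into values on a regular grid that crop models can use. The slide does not say which interpolation method MARS applies; the sketch below uses plain inverse distance weighting (IDW) as a stand-in, with hypothetical station coordinates in kilometres and a hypothetical 25 km grid.

```python
import math


def idw(stations, target, power=2.0):
    """Estimate a value at `target` by inverse distance weighting.

    `stations` is a list of (x, y, value) tuples in projected coordinates
    (e.g. km); `target` is an (x, y) tuple. A station located exactly at the
    target returns its own observation.
    """
    if not stations:
        raise ValueError("need at least one station")
    tx, ty = target
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in stations:
        distance = math.hypot(x - tx, y - ty)
        if distance == 0.0:
            return value
        weight = 1.0 / distance ** power  # closer stations count more
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical stations: (x_km, y_km, daily mean air temperature in degC).
stations = [(0.0, 0.0, 12.0), (30.0, 0.0, 16.0), (0.0, 40.0, 10.0)]

# Interpolate onto the centres of a small hypothetical grid with 25 km spacing.
grid = {(cx, cy): idw(stations, (cx, cy)) for cx in (12.5, 37.5) for cy in (12.5, 37.5)}
for cell, value in sorted(grid.items()):
    print(cell, round(value, 2))
```

Plain IDW ignores elevation, which matters strongly for temperature, so an operational chain would need to account for such effects on top of this.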
Big Data technologies
Technologies currently used (agri-environmental research):
• RDBMS, geo-databases
  ● but also file-based storage, Excel etc.
• Various “old & proven” programming languages (especially for modelling and data processing)
  ● Fortran, C/C++, Java etc.
• Remote sensing: dedicated tools & environments for processing and analysis
  ● ENVI, R, GDAL etc.
• GIS & spatial analysis packages
• Harmonized information/data models (but still per discipline)
• Local, optimized solutions for computing and storage
Experimental technologies (ICT research for agriculture):
• High-performance clusters/grids
  ● e.g. parallelization of modelling and analysis software
• RDF databases
  ● Linked Data applications linking sources of metadata, bibliographic data and statistical data
• Vocabularies and ontologies
  ● annotation of (meta)data for improved discovery
• Semantic technologies
• NLP algorithms
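Linked Data applications of this kind connect metadata, bibliographic and statistical records through shared vocabulary concepts. Below is a minimal pure-Python sketch of the underlying idea (subject, predicate, object triples queried by pattern matching) rather than a real RDF database with SPARQL. The `dct:` and `skos:` prefixes follow Dublin Core Terms and SKOS naming; all `ex:` identifiers are hypothetical.

```python
from collections.abc import Iterator

Triple = tuple[str, str, str]

# Hypothetical triples: a dataset and a publication annotated with the same concept.
TRIPLES: list[Triple] = [
    ("ex:dataset/soil-moisture-2014", "dct:subject", "ex:concept/soil-moisture"),
    ("ex:dataset/soil-moisture-2014", "dct:creator", "ex:org/example-institute"),
    ("ex:paper/42", "dct:subject", "ex:concept/soil-moisture"),
    ("ex:paper/42", "dct:title", "Soil moisture and crop yield"),
    ("ex:concept/soil-moisture", "skos:prefLabel", "soil moisture"),
]


def match(triples: list[Triple], s=None, p=None, o=None) -> Iterator[Triple]:
    """Yield the triples that fit the pattern; None acts as a wildcard."""
    for t in triples:
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
            yield t


def linked_resources(triples: list[Triple], concept: str) -> list[str]:
    """Return every subject (dataset, publication, ...) annotated with `concept`."""
    return sorted(t[0] for t in match(triples, p="dct:subject", o=concept))


for resource in linked_resources(TRIPLES, "ex:concept/soil-moisture"):
    print(resource)  # prints ex:dataset/soil-moisture-2014, then ex:paper/42
```

The value lies in the shared concept: once a dataset and a publication point at the same identifier, either can be discovered from the other, which is the "annotation for improved discovery" idea listed above.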
However: agro-environmental research seems to be a “wicked domain” with specific challenges
Challenge - Variety
[Figure: separate Data–Information–Knowledge–Wisdom pyramids for climatology, agronomy and soil science, with the Big Data dimensions volume, velocity and variety (variety marked between the disciplines).]
• Agro-environmental research is interdisciplinary, which means:
  ● different targeted objectives
  ● different data formats
  ● different schemas, vocabularies etc.
  ● different levels of standardization
  ● different granularities
• Example: relevant domains for agricultural impact assessments
  ● Agronomy
  ● Climate
  ● Water/irrigation management
  ● Economy etc.
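Different granularities are one concrete source of this variety: daily weather observations, for example, may have to be combined with an indicator that is only available per month. A minimal sketch, assuming hypothetical daily values keyed by `datetime.date` and a hypothetical monthly dataset keyed by `(year, month)`, aggregates the finer data before joining:

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_means(daily: dict[date, float]) -> dict[tuple[int, int], float]:
    """Aggregate daily observations to one mean value per (year, month)."""
    groups: dict[tuple[int, int], list[float]] = defaultdict(list)
    for day, value in daily.items():
        groups[(day.year, day.month)].append(value)
    return {key: mean(values) for key, values in sorted(groups.items())}


def join_on_month(daily, monthly):
    """Pair monthly means of `daily` with `monthly` values for months present in both."""
    means = monthly_means(daily)
    return {key: (means[key], monthly[key]) for key in sorted(means.keys() & monthly.keys())}


# Hypothetical data: daily mean temperature (degC) and a monthly indicator.
daily_temp = {date(2015, 1, 1): 2.0, date(2015, 1, 2): 4.0, date(2015, 2, 1): 6.0}
monthly_indicator = {(2015, 1): 610.0, (2015, 3): 640.0}

print(join_on_month(daily_temp, monthly_indicator))  # {(2015, 1): (3.0, 610.0)}
```

The mean is only the right aggregate for some variables; precipitation, for instance, would be summed, so choosing the aggregation is itself a piece of domain knowledge.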
Semantic alignment can be problematic
• Different domains use different semantics to describe the same knowledge
• Semantics may be expressed in different, unrecognized standards, or may not exist at all
• Ontology alignment tools usually do not work
• Manual alignment is resource-consuming and requires multidisciplinary experts
• There are no suitable vocabularies and ontologies to annotate datasets effectively
• Example: climate data – temperature
  ● modelled differently in different vocabularies/ontologies
  ● not specific enough to characterize the data
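Because alignment tools usually fail, alignment often ends up as a hand-curated table. A minimal sketch of such a table for the temperature example maps each source's local variable to one common concept and converts units; the source and variable names are hypothetical, and the common name borrows the CF conventions' `air_temperature` standard name.

```python
# Hand-curated alignment: (source, local variable) -> (common concept, unit).
# Source and variable names are hypothetical examples.
MAPPING = {
    ("climate_model_run", "tas"): ("air_temperature", "K"),
    ("station_network", "T_air"): ("air_temperature", "degC"),
    ("field_trial", "temp_f"): ("air_temperature", "degF"),
}


def to_celsius(value: float, unit: str) -> float:
    """Convert a temperature in K, degC or degF to degrees Celsius."""
    if unit == "degC":
        return value
    if unit == "K":
        return value - 273.15
    if unit == "degF":
        return (value - 32.0) * 5.0 / 9.0
    raise ValueError(f"unsupported unit: {unit}")


def harmonize(source: str, variable: str, value: float) -> tuple[str, float]:
    """Map a source value to (common concept, value in degC).

    Raises KeyError when no expert has mapped the variable yet.
    """
    concept, unit = MAPPING[(source, variable)]
    return concept, to_celsius(value, unit)


print(harmonize("climate_model_run", "tas", 293.15))
print(harmonize("field_trial", "temp_f", 212.0))  # ('air_temperature', 100.0)
```

Note what the table cannot say: whether a value is a daily mean or a maximum, or at which height it was measured. That is the kind of detail generic vocabularies are not specific enough to capture.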
Commonly used vocabularies usually do not fit scientific requirements
• Many seem to be designed for annotating bibliographic data
• Often incomplete; extended in response to the requirements of the owner/maintainer
• Unbalanced, e.g. in level of detail and granularity
• Example: tree species vs. climate variables in Agrovoc
[Figure: Agrovoc fragments. Tree species: Fagaceae, Fagus, Quercus, Quercus robur, Quercus ilex. Climate variables: temperature, air temperature, body temperature, measure, interest rate.]
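A naive lexical matcher shows why alignment tools struggle with such vocabularies. The sketch below scores candidate labels by token overlap (Jaccard similarity), a deliberately simple stand-in for real alignment tools, using the labels from the Agrovoc fragment above as plain strings; no Agrovoc service is queried.

```python
LABELS = ["temperature", "Air temperature", "Body temperature", "Measure", "Interest rate"]


def tokens(label: str) -> set[str]:
    """Lower-cased word tokens of a label."""
    return set(label.lower().split())


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two labels, from 0.0 to 1.0."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def rank_candidates(query: str, labels: list[str]) -> list[tuple[float, str]]:
    """Return (score, label) pairs, best match first, ties broken alphabetically."""
    return sorted(((jaccard(query, label), label) for label in labels),
                  key=lambda pair: (-pair[0], pair[1]))


# A climate dataset column labelled just "temperature":
for score, label in rank_candidates("temperature", LABELS):
    print(f"{score:.2f}  {label}")
```

"Air temperature" and "Body temperature" tie at 0.50, so the matcher has no basis for choosing the meteorological concept; only a more specific label (such as "mean air temperature") or a domain expert resolves it.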
Food production example: Smart Farming: monitoring, planning & control
• Crop or animal level: genome sequences, feed uptake, performance, manure, temperature, activity, heart rate, pH, antibodies, biomarkers, medicine use, ...
• Farm level: size, location, performance, manure, water, energy, nutrition, health management, ...
• Environmental level: distance to . ., public health, living environment, mineral cycles, healthy products, disease risks, economic figures, environmental issues, ...
• Market level: market prices, logistics, regulations, ...
Supporting sustainable food production and contributing to the realization of (inter)national policy agendas.
Challenge - Veracity
[Figure: the Data–Information–Knowledge–Wisdom pyramids for climatology, agronomy and soil science (volume, velocity, variety), with veracity added as a further challenge.]
• In science, trust is essential, in two ways:
  ● trust of end users in the quality of (meta)data
  ● trust of data providers regarding end users
• Data accessibility is generally low; data are:
  ● stored in silos
  ● only exchanged among peers in known networks
  ● hardly documented
• This has much to do with the culture and data management policies of research organisations
• Publication of datasets
  ● Much data is inaccessible, for various reasons, intentional or unintentional
  ● Some data is accessible, but might be:
    ● not provided through standardized interfaces
    ● not easily discoverable
• Example: weather data (station observations)
• Documentation of datasets
  ● No incentive to provide metadata
  ● An end-of-project activity, with no benefits for the scientist
  ● Boring work
  ● No clear perspective on end-user requirements
  ● Metadata schemas are generally considered too complex
• Metadata quality is generally poor
  ● only a minimal amount of metadata is provided
  ● missing, non-standardized or irrelevant annotations
• Example: Forestry Clearinghouse
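Poor metadata quality can at least be detected automatically in its simplest form: missing fields. A minimal sketch, assuming a hypothetical required-field profile named after Dublin Core elements; it checks presence only and cannot judge whether an annotation is relevant or uses a standard vocabulary.

```python
# Hypothetical minimal profile; the field names follow Dublin Core elements.
REQUIRED = ("title", "creator", "description", "date", "coverage", "rights")


def is_filled(value) -> bool:
    """Treat None and blank strings as missing."""
    return value is not None and str(value).strip() != ""


def completeness(record: dict, required=REQUIRED) -> tuple[float, list[str]]:
    """Return (share of required fields filled, sorted list of missing fields)."""
    missing = sorted(field for field in required if not is_filled(record.get(field)))
    return 1 - len(missing) / len(required), missing


# A hypothetical record with only a minimal amount of metadata.
record = {"title": "Forest inventory plots", "creator": " ", "date": "2012"}
score, missing = completeness(record)
print(f"{score:.0%} complete, missing: {', '.join(missing)}")
```

Run over a whole catalogue, such a score could flag the records that most need attention; judging relevance still requires a person.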
Expectations versus reality in 2015...
Promise: new technologies will make our life much easier
• RDF databases
• Semantic technologies
• Grid computing, cloud storage solutions etc.
Reality: we keep working with the old stuff
• Alignment and integration are hard to accomplish
• New technologies prove to be too immature for the real world
• Production systems are still developed using what we know works well (e.g. RDBMS, legacy models and data formats)
• Successful innovative initiatives use hybrid solutions, often built on “proven” technologies
  ● limited to the metadata level or to small & medium-sized data, with limited domain coverage
Promise: “Googlification” of scientific data provision
• Transparent access to big, distributed, heterogeneous datasets
• “Magical” semantic (and linguistic) query processing
• Tools seamlessly transform heterogeneous data into model input data, information and knowledge for decision making
Reality: we struggle to get ourselves into shape
• Attempts are mainly successful at the metadata level and for bibliographic sources (genetics might be an exception)
• Cumbersome first attempts to harmonize big, heterogeneous data streams
• Custom-built data collection and processing chains remain dominant
Source: Gartner (August 2015)
Thank you for your attention