Page 1: Moving beyond preservation: Developing a platform to enable complex data reuse Dr. David Turner (d.turner@adelaide.edu.au) Eco-informatics facility, Terrestrial.

Moving beyond preservation: Developing a platform to enable complex data reuse

Dr. David Turner (d.turner@adelaide.edu.au)

Eco-informatics facility, Terrestrial Ecosystem Research Network, University of Adelaide, Australia

www.aekos.org.au

Page 2

Data services

• Data & information management
• Knowledge modelling
• Data relationship management
• Licensing, citations and condition of use
• Informatics and community practices
• User support
• Usage statistics to support our data contributors

[Figure labels: plot; complex; well-described; integrated; ecology]

Page 3

ÆKOS’s Niche

[Figure labels: ÆKOS Data; Primary; ecology; sites, surveys, plots; complex; well-described; integrated]

Page 4

The data revolution

Page 5

An emerging consensus?

Free of financial barriers

• for any researcher to contribute to
• for any user to access immediately on publication

Made available without restriction on reuse for any purpose

• subject to proper attribution

Quality-assured and published in a timely manner

Archived and made available in perpetuity

International Council for Science (ICSU), 2 September 2014

Published data should be independently understandable

Peer (2014) International Journal of Digital Curation

Page 6

Are there unique challenges in ecology?

“Our extensive experience … collecting empirical data is that large data sets are often nuanced and complex, and appropriate analysis of them requires intimate knowledge of their context and substance to avoid making serious mistakes in interpretation.”

David Lindenmayer and Gene E. Likens 2013. Benchmarking Open Access Science Against Good Science. Bulletin of the Ecological Society of America 94:338–340. http://dx.doi.org/10.1890/0012-9623-94.4.338

Page 7

Ecological complexity

Page 8

Reusing data

Identify problem

Draft approach

Search for data

Acquire data

Assess suitability

Modify approach

Prepare data

Conduct analysis

Interpret results

Search for data

Acquire data

Assess suitability

Prepare data

Page 9

Barriers to reuse

Identify problem

Draft approach

Search for data

Acquire data

Assess suitability

Modify approach

Prepare data

Conduct analysis

Interpret results

Dispersed: Data is stored in many storage locations and formats

Source: Forestcheck, www.dec.wa.gov.au

Complex: Data usually needs explanation and context before it can be accurately used

RecID  Species    Xcoord    Ycoord    Height  dbh
1      E obliqua  56.22506  137.3208  34      36
2      E obliqua  34.45058  137.3557  22      33
3      E obliqua  34.25678  136.1189  54      79
4      E obliqua  35.77208  136.785   66      68
5      E obliqua  35.97997  136.8556  43      27
6      E baxteri  37.03322  138.71    56      77
7      E baxteri  34.61981  136.8554  33      20
8      E baxteri  36.0738   139.8762  22      101
9      A brownii  35.1474   138.6559  25      71
10     A brownii  37.81432  136.2933  62      42
11     A brownii  35.95443  138.5847  23      22
12     A brownii  35.51555  139.868   42      93
13     A marina   35.78676  139.8709  23      103
14     A marina   37.70242  136.0484  34      76
15     A marina   34.00839  137.3669  43      33
16     A marina   36.74387  137.9251  34      91
17     A marina   37.92455  136.7602  43      55
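One way to make the "needs explanation and context" point concrete is to check the sample table's columns against a data dictionary. The sketch below is illustrative only: the `DATA_DICTIONARY` entries and the `undocumented` helper are hypothetical, and the descriptions are assumptions, since the slide itself states no units, datum or coordinate system.

```python
# Column names as they appear in the sample table on the slide.
HEADER = ["RecID", "Species", "Xcoord", "Ycoord", "Height", "dbh"]

# Hypothetical data dictionary. The descriptions are assumptions: the slide
# gives no units or datum, which is the missing context a reuser would need.
DATA_DICTIONARY = {
    "RecID": "record identifier",
    "Species": "abbreviated species name",
    "Height": "tree height (unit not stated)",
    "dbh": "diameter at breast height (unit not stated)",
}


def undocumented(columns, dictionary):
    """Return the columns that have no entry in the data dictionary."""
    return [column for column in columns if column not in dictionary]


missing = undocumented(HEADER, DATA_DICTIONARY)
print("Columns lacking a description:", missing)
```

Here the coordinate columns are flagged, which mirrors the practical problem: without a stated datum or axis order, the positions cannot be used safely.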

www.nswrail.net

Diverse and fragmented: Ecological data covers a wide range of topics and there are many different ways of measuring, observing and expressing different concepts

• Rapidly evolving with few measurement standards

Page 10

Empowering researchers

Discovery

Comprehension

Extraction

Access

Integration

Publication
- Article
- Data and citation

to AEKOS

Identify problem

Draft approach

Search for data

Acquire data

Assess suitability

Modify approach

Prepare data

Conduct analysis

Interpret results

Consider your users

Page 11

Discovery through traditional metadata

An example of a textual abstract for a data set:

• Otway Ranges Orchid Recovery Program

The aim of the project is to compile and implement recovery plans for nationally threatened native orchids occurring in the Otway Ranges region of Victoria. Populations are monitored to gauge the current threats or causes of decline and the effectiveness of recovery actions.

Species studied include: Caladenia argocalla, Pterostylis bryophylla, Thelymitra cyanapicata and Caladenia rigida.

This dataset contains records collected from 1966 to present.

The fields in this dataset include: Species name, GPS reading and datum, start and end dates, historical records, population size, key threatening processes, number of flowering individuals, number of flowers and the number of individual plants aborted, grazed, pollinated, hand pollinated, damaged, spent and caged. Images of the species and recovery activities are also available.

Source: Metadata from Flora Information System, Information Services Section (ISS) of the Victorian Department of Sustainability and Environment.

Page 12

Creating structure

Collection: Otway Ranges Orchid Recovery Program

Subject Keywords: EARTH SCIENCE - BIOLOGICAL CLASSIFICATION - PLANTS - ORCHIDACEAE

Subject Keywords: EARTH SCIENCE - BIOSPHERE - ECOLOGICAL DYNAMICS - COMMUNITY DYNAMICS

Subject Keywords: EARTH SCIENCE - BIOSPHERE - ECOLOGICAL DYNAMICS - SPECIES_THREATENING PROCESS

ISO Keywords: BIOTA

READE, J. (2010) Population Trends and Key Management Actions for Otway Ranges Threatened Orchid Species. DSE, 265pp.

Organisation: http://www.dse.vic.gov.au/dse/

URL: http://www.viridans.com/FISVFD/VICFIS1.HTM

Person: Joe Reade [mail:[email protected]]

Full Description: {abstract – as before}

[Diagram edge labels: rights; spatial coverage; citation; location; subject (four times); is managed by; is owned by; description]

Spatial Coverage: (38.4S – 38.9S, 143E – 144E)

Derived from: ANDS RIF-CS format metadata
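The step from a free-text abstract to typed, labelled fields can be sketched in code. The following is a minimal illustration, assuming a hypothetical `CollectionRecord` class; its field names are shorthand for the labels on the slide and are not the RIF-CS schema or the ÆKOS data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CollectionRecord:
    """Hypothetical structured metadata record (not the RIF-CS schema)."""

    title: str
    subjects: List[str] = field(default_factory=list)
    iso_keywords: List[str] = field(default_factory=list)
    citation: str = ""
    location_url: str = ""

    def has_subject(self, term: str) -> bool:
        """Case-insensitive match against any part of a subject keyword path."""
        needle = term.lower()
        return any(needle in subject.lower() for subject in self.subjects)


# Values transcribed from the slide.
otway = CollectionRecord(
    title="Otway Ranges Orchid Recovery Program",
    subjects=[
        "EARTH SCIENCE - BIOLOGICAL CLASSIFICATION - PLANTS - ORCHIDACEAE",
        "EARTH SCIENCE - BIOSPHERE - ECOLOGICAL DYNAMICS - COMMUNITY DYNAMICS",
        "EARTH SCIENCE - BIOSPHERE - ECOLOGICAL DYNAMICS - SPECIES_THREATENING PROCESS",
    ],
    iso_keywords=["BIOTA"],
    citation="READE, J. (2010) Population Trends and Key Management Actions "
             "for Otway Ranges Threatened Orchid Species. DSE, 265pp.",
    location_url="http://www.viridans.com/FISVFD/VICFIS1.HTM",
)

print(otway.has_subject("orchidaceae"))
```

Because each field is separate, a search can target subjects without also matching words that merely appear somewhere in the abstract.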

Page 13

Incorporate observations and description

Collection: Otway Ranges Orchid Recovery Program

Observed Entity: Organism

Measure Coverage: Organism Absence

Measure Coverage: Organism Presence

Measure Coverage: Organism Population

Method Coverage: Visual Observation

Time Coverage: Jun 1966 - 2011

Spatial Coverage: Polygon: «sub-coastal area around Cape Otway»

Species Target: Spider Orchid

Family Target: Orchidaceae

Species Coverage: Caladenia argocalla

Species Coverage: Pterostylis bryophylla

Species Coverage: Thelymitra cyanapicata

Species Coverage: Caladenia rigida

[Diagram edge labels: entity coverage; measure coverage; method coverage; time coverage; spatial coverage; species target; species coverage]
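Coverage statements like those above lend themselves to faceted search. As a rough sketch, assuming a made-up collection identifier (`otway-orchids`) and hypothetical `build_index` and `search` helpers, an inverted index over (facet, value) pairs could look like this; it is not a description of how ÆKOS implements discovery.

```python
from collections import defaultdict

# Coverage facets transcribed from the slide; the collection id is invented.
FACETS = {
    "otway-orchids": [
        ("observed_entity", "Organism"),
        ("measure", "Organism Absence"),
        ("measure", "Organism Presence"),
        ("measure", "Organism Population"),
        ("method", "Visual Observation"),
        ("species", "Caladenia argocalla"),
        ("species", "Pterostylis bryophylla"),
        ("species", "Thelymitra cyanapicata"),
        ("species", "Caladenia rigida"),
    ],
}


def build_index(facets):
    """Map each (facet, lower-cased value) pair to the collections declaring it."""
    index = defaultdict(set)
    for collection_id, pairs in facets.items():
        for facet, value in pairs:
            index[(facet, value.lower())].add(collection_id)
    return index


def search(index, **criteria):
    """Return the collections that match every facet=value criterion given."""
    hits = [index.get((facet, value.lower()), set())
            for facet, value in criteria.items()]
    return set.intersection(*hits) if hits else set()


index = build_index(FACETS)
print(search(index, species="Caladenia rigida", method="visual observation"))
```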

Page 14

Define relationships and vocabularies

Common Name: Orchid

Common Name: Spider Orchid

Genus: Caladenia

Genus: Pterostylis

Genus: Thelymitra

equates to

Species: Caladenia argocalla

equates to

Spatial Coverage: Polygon: «sub-coastal area around Cape Otway»

Place: Otway Ranges

covers place

Species Target: Orchid

Family Target: Orchidaceae

Organisation: Department of Sustainability and Environment, VIC.

Person: Joe Reade[mailto:[email protected]]

Person: Dr. Joe Reade

member of

Land Use: Forestry
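The value of "equates to" links is that a search on a common name can be expanded to the taxa it stands for. The sketch below is an illustration under stated assumptions: the slide does not show which nodes each "equates to" edge joins, so the pairings in `EQUATES_TO`, the `FAMILY_OF` table and the `expand` helper are all hypothetical.

```python
# Hypothetical "equates to" links. The pairings are assumed for illustration;
# the slide does not show which nodes each edge connects.
EQUATES_TO = {
    "orchid": "Orchidaceae",
    "spider orchid": "Caladenia",
}

# Genus-to-family membership, illustrative rather than a curated vocabulary.
FAMILY_OF = {
    "Caladenia": "Orchidaceae",
    "Pterostylis": "Orchidaceae",
    "Thelymitra": "Orchidaceae",
}

# Species names from the collection's species coverage.
SPECIES = [
    "Caladenia argocalla",
    "Pterostylis bryophylla",
    "Thelymitra cyanapicata",
    "Caladenia rigida",
]


def expand(term):
    """Resolve a common name or taxon to the species names it covers."""
    taxon = EQUATES_TO.get(term.lower(), term)
    matches = []
    for name in SPECIES:
        genus = name.split()[0]
        if taxon in (name, genus, FAMILY_OF.get(genus)):
            matches.append(name)
    return matches


print(expand("Spider Orchid"))
```

A user who types a common name then finds records described only by scientific names, without the underlying records being altered.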

Page 15

AEKOS discovery

Page 16

Page 17

Access

Page 18

The comprehension challenge

Page 19

Data entropy

Page 20

The information landscape

RecID  Species    Xcoord    Ycoord    Height  dbh
1      E obliqua  56.22506  137.3208  34      36
2      E obliqua  34.45058  137.3557  22      33
3      E obliqua  34.25678  136.1189  54      79
4      E obliqua  35.77208  136.785   66      68
5      E obliqua  35.97997  136.8556  43      27
6      E baxteri  37.03322  138.71    56      77
7      E baxteri  34.61981  136.8554  33      20
8      E baxteri  36.0738   139.8762  22      101
9      A brownii  35.1474   138.6559  25      71
10     A brownii  37.81432  136.2933  62      42
11     A brownii  35.95443  138.5847  23      22
12     A brownii  35.51555  139.868   42      93
13     A marina   35.78676  139.8709  23      103
14     A marina   37.70242  136.0484  34      76
15     A marina   34.00839  137.3669  43      33
16     A marina   36.74387  137.9251  34      91
17     A marina   37.92455  136.7602  43      55

Plants, Birds, Bats

© eResearchSA


Page 21

Embedding context

Observation

Observed Entity

Observation Process

observed under

part of, part of

related to, related to

self-observed

measurements of targeted things + observation of contextual things

measurements of effort + description of method context

‘document’ of observed things + context

Observation Set (Collection)

in

data set with associated description metadata
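The entities named on this slide can be read as a small object model in which each observation carries its own context. The sketch below is one possible reading, not the ÆKOS schema: the class and field names are hypothetical, the way the slide's annotations are attached to each class is an interpretation, and the effort values are invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ObservedEntity:
    """The thing observed, with measurements of targeted things."""
    kind: str
    measurements: Dict[str, object] = field(default_factory=dict)


@dataclass
class ObservationProcess:
    """How the observation was made: method context plus effort."""
    method: str
    effort: Dict[str, object] = field(default_factory=dict)


@dataclass
class Observation:
    """A 'document' of observed things together with their context."""
    entities: List[ObservedEntity]
    process: ObservationProcess


@dataclass
class ObservationSet:
    """A collection: a data set with associated description metadata."""
    title: str
    observations: List[Observation] = field(default_factory=list)

    def methods_used(self) -> List[str]:
        """List the distinct methods under which observations were made."""
        return sorted({obs.process.method for obs in self.observations})


# Illustrative values only; the effort figures are invented.
tree = ObservedEntity("individual tree", {"Tree height": 8, "DBH": 45})
process = ObservationProcess("Visual Observation", {"observers": 1})
collection = ObservationSet("Example collection", [Observation([tree], process)])
print(collection.methods_used())
```

Keeping the process alongside each observation means the method travels with the values when a subset is extracted.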

Page 22

Page 23

Page 24

Integration

ÆKOS applies a flexible knowledge representation approach

Collection    Book
Graph         Chapter
Observation   Section
Entity        Paragraph on common subject
Statement     Sentence
Value         Object of a Sentence
Metadata      Front Matter/Edition Notice
Ontology      Grammar
Vocabulary    Dictionary + Thesaurus

(Data) Collection

Graph

Observation

Entity

Statement

Value

Page 25

Concept alignment

Overlapping concepts

Measurement standards

Classification systems

Preserve complexity in the data

Squash variation for discovery
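"Preserve complexity in the data" and "squash variation for discovery" can coexist if the original label and value are stored untouched and a normalised key is added purely for searching. A minimal sketch, assuming a hypothetical synonym table (`CANONICAL`) and helper (`index_entry`); treating "Tree height" and "Height" as the same concept is itself an assumption made for the example.

```python
# Hypothetical synonym table; real alignment would rely on curated vocabularies.
CANONICAL = {
    "dbh": "diameter_at_breast_height",
    "diameter at breast height": "diameter_at_breast_height",
    "height": "height",
    "tree height": "height",
}


def index_entry(label, value):
    """Keep the original label and value; add a normalised key for discovery."""
    search_key = CANONICAL.get(label.strip().lower())
    return {"label": label, "value": value, "search_key": search_key}


entry = index_entry("DBH", 45)
print(entry)
```

Labels with no known equivalent get a `search_key` of `None` rather than being forced into a category, so nothing in the source data is lost or misrepresented.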

Page 26

Study location

Sampled area

Landscape features

Sampling unit

Organism group (vegetation association)

Organism group (individual tree)

Entity        Attribute    Value
Org_gp 0001   Tree height  8
Org_gp 0001   Species      E. camaldulensis
Org_gp 0001   DBH          45
Org_gp 0001   Life stage   Mature
Org_gp 0001   Condition    Good
Org_gp 0001   Floristics   Flowering
Org_gp 0001   Shape        C

Representing data as information
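The entity-attribute-value layout on this slide stores one fact per row, so new attributes can be added without changing a schema. As a small sketch (the `pivot` helper is hypothetical and the values are kept as strings, exactly as shown on the slide), the rows can be grouped back into one record per entity:

```python
from collections import defaultdict

# Entity-attribute-value triples transcribed from the slide.
ROWS = [
    ("Org_gp 0001", "Tree height", "8"),
    ("Org_gp 0001", "Species", "E. camaldulensis"),
    ("Org_gp 0001", "DBH", "45"),
    ("Org_gp 0001", "Life stage", "Mature"),
    ("Org_gp 0001", "Condition", "Good"),
    ("Org_gp 0001", "Floristics", "Flowering"),
    ("Org_gp 0001", "Shape", "C"),
]


def pivot(rows):
    """Group entity-attribute-value triples into one dict per entity."""
    records = defaultdict(dict)
    for entity, attribute, value in rows:
        records[entity][attribute] = value
    return dict(records)


records = pivot(ROWS)
print(records["Org_gp 0001"]["Species"])
```

The trade-off of this layout is that units and value types are not carried by the table itself and have to come from accompanying vocabularies or metadata.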

Page 27

Page 28

Integration

Page 29

Agile data management

Page 30

ÆKOS’s coverage

Page 31

ÆKOS in the data landscape

[Figure. Vertical axis: Level of Data Complexity (Richness). Horizontal axis: Level of data integration, ending at "Full integration". Labels: No data; DataOne; Nature; ALA (Species Data); ANDS - RDA, TDDP; Vegbank; Pangaea; Other Atlases; ÆKOS Researcher Datasets (SHaRED); ÆKOS (Site Data)]

Page 32

ÆKOS in the data landscape

[Figure. Vertical axis: Level of Description. Horizontal axis: Level of data integration, ending at "Full integration". Labels: No data; DataOne; Nature; ALA (species data); ANDS - RDA, TDDP; Vegbank; Pangaea; Other Atlases; ÆKOS Researcher Datasets (SHaRED); ÆKOS Integrated Site Data]

Page 33

Science impact

Page 34

Infrastructure uptake

International National

Page 35

Feedback and collaborators wanted

Website: www.aekos.org.au

[email protected]