Moving beyond preservation: Developing a platform to enable complex data reuse Dr. David Turner...

Post on 15-Jan-2016


Moving beyond preservation: Developing a platform to enable complex data reuse

Dr. David Turner (d.turner@adelaide.edu.au)

Eco-informatics Facility, Terrestrial Ecosystem Research Network, University of Adelaide, Australia

www.aekos.org.au

Data services

• Data & information management
• Knowledge modelling
• Data relationship management
• Licensing, citations and conditions of use
• Informatics and community practices
• User support
• Usage statistics to support our data contributors

ÆKOS’s Niche

[Figure: ÆKOS data is primary ecology data from sites, surveys and plots that is complex, well-described and integrated]

The data revolution

An emerging consensus?

• Free of financial barriers
  - for any researcher to contribute to
  - for any user to access immediately on publication
• Made available without restriction on reuse for any purpose
  - subject to proper attribution
• Quality-assured and published in a timely manner
• Archived and made available in perpetuity

International Council for Science (ICSU), 2 September 2014

Published data should be independently understandable

Peer (2014) International Journal of Digital Curation

Are there unique challenges in ecology?

“Our extensive experience … collecting empirical data is that large data sets are often nuanced and complex, and appropriate analysis of them requires intimate knowledge of their context and substance to avoid making serious mistakes in interpretation.”

David Lindenmayer and Gene E. Likens 2013. Benchmarking Open Access Science Against Good Science. Bulletin of the Ecological Society of America 94:338–340. http://dx.doi.org/10.1890/0012-9623-94.4.338

Ecological complexity

Reusing data

• Identify problem
• Draft approach
• Search for data
• Acquire data
• Assess suitability
• Modify approach
• Prepare data
• Conduct analysis
• Interpret results

Highlighted: Search for data, Acquire data, Assess suitability, Prepare data

Barriers to reuse

Dispersed: Data is stored in many storage locations and formats

Source: Forestcheck, www.dec.wa.gov.au

Complex: Data usually needs explanation and context before it can be accurately used

RecID  Species    Xcoord    Ycoord    Height  dbh
1      E obliqua  56.22506  137.3208  34      36
2      E obliqua  34.45058  137.3557  22      33
3      E obliqua  34.25678  136.1189  54      79
4      E obliqua  35.77208  136.785   66      68
5      E obliqua  35.97997  136.8556  43      27
6      E baxteri  37.03322  138.71    56      77
7      E baxteri  34.61981  136.8554  33      20
8      E baxteri  36.0738   139.8762  22      101
9      A brownii  35.1474   138.6559  25      71
10     A brownii  37.81432  136.2933  62      42
11     A brownii  35.95443  138.5847  23      22
12     A brownii  35.51555  139.868   42      93
13     A marina   35.78676  139.8709  23      103
14     A marina   37.70242  136.0484  34      76
15     A marina   34.00839  137.3669  43      33
16     A marina   36.74387  137.9251  34      91
17     A marina   37.92455  136.7602  43      55
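The tree table is a good example of why raw data needs context: nothing in it says what reference system Xcoord and Ycoord use, or what units Height and dbh are in. The Python sketch below pairs each column with a data-dictionary entry. The descriptions and units in `DATA_DICTIONARY` are hypothetical placeholders, not facts about the source survey; the point is only that this context has to travel with the values.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ColumnInfo:
    description: str
    unit: Optional[str]  # None for unitless columns


# HYPOTHETICAL data dictionary: the source table gives no definitions or units,
# so every entry here is an illustrative assumption.
DATA_DICTIONARY: Dict[str, ColumnInfo] = {
    "RecID": ColumnInfo("record identifier", None),
    "Species": ColumnInfo("abbreviated species name", None),
    "Xcoord": ColumnInfo("x coordinate, reference system unknown", None),
    "Ycoord": ColumnInfo("y coordinate, reference system unknown", None),
    "Height": ColumnInfo("tree height", "m"),
    "dbh": ColumnInfo("diameter at breast height", "cm"),
}

# First three rows of the tree table.
RAW = """RecID Species Xcoord Ycoord Height dbh
1 E obliqua 56.22506 137.3208 34 36
2 E obliqua 34.45058 137.3557 22 33
3 E obliqua 34.25678 136.1189 54 79"""


def parse(raw: str) -> List[Dict[str, str]]:
    """Split the whitespace-delimited table into one dict per record."""
    lines = raw.splitlines()
    header = lines[0].split()
    records = []
    for line in lines[1:]:
        parts = line.split()
        # Species names contain a space ("E obliqua"), so rejoin tokens 1 and 2.
        fields = [parts[0], " ".join(parts[1:3]), *parts[3:]]
        records.append(dict(zip(header, fields)))
    return records


def annotate(record: Dict[str, str], dictionary: Dict[str, ColumnInfo]) -> Dict[str, str]:
    """Attach the (assumed) meaning and unit to each raw value."""
    out = {}
    for column, value in record.items():
        info = dictionary.get(column)
        if info is None:
            out[column] = f"{value} (undocumented)"
        elif info.unit:
            out[column] = f"{value} {info.unit} ({info.description})"
        else:
            out[column] = f"{value} ({info.description})"
    return out
```

A column with no dictionary entry is flagged as undocumented rather than silently passed through, which is exactly the gap a data reuser would otherwise have to fill by contacting the original collector.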

www.nswrail.net

Diverse and fragmented: Ecological data covers a wide range of topics, and there are many different ways of measuring, observing and expressing concepts.
• Rapidly evolving, with few measurement standards

Empowering researchers

• Discovery
• Comprehension
• Extraction
• Access
• Integration
• Publication
  - Article
  - Data and citation to AEKOS

Consider your users

Discovery through traditional metadata

An example of a textual abstract for a data set:

• Otway Ranges Orchid Recovery Program

The aim of the project is to compile and implement recovery plans for nationally threatened native orchids occurring in the Otway Ranges region of Victoria. Populations are monitored to gauge the current threats or causes of decline and the effectiveness of recovery actions.

Species studied include: Caladenia argocalla, Pterostylis bryophylla, Thelymitra cyanapicata and Caladenia rigida.

This dataset contains records collected from 1966 to present.

The fields in this dataset include: Species name, GPS reading and datum, start and end dates, historical records, population size, key threatening processes, number of flowering individuals, number of flowers and the number of individual plants aborted, grazed, pollinated, hand pollinated, damaged, spent and caged. Images of the species and recovery activities are also available.

Source: Metadata from the Flora Information System, Information Services Section (ISS), Victorian Department of Sustainability and Environment.

Creating structure

Collection: Otway Ranges Orchid Recovery Program
• subject: Subject Keywords: EARTH SCIENCE - BIOLOGICAL CLASSIFICATION - PLANTS - ORCHIDACEAE
• subject: Subject Keywords: EARTH SCIENCE - BIOSPHERE - ECOLOGICAL DYNAMICS - COMMUNITY DYNAMICS
• subject: Subject Keywords: EARTH SCIENCE - BIOSPHERE - ECOLOGICAL DYNAMICS - SPECIES_THREATENING PROCESS
• subject: ISO Keywords: BIOTA
• citation: READE, J. (2010) Population Trends and Key Management Actions for Otway Ranges Threatened Orchid Species. DSE, 265pp.
• is managed by: Organisation: http://www.dse.vic.gov.au/dse/
• is owned by: Person: Joe Reade [mailto:joe.reade@dse.vic.gov.au]
• location: URL: http://www.viridans.com/FISVFD/VICFIS1.HTM
• description: Full Description: {abstract – as before}
• spatial coverage: Spatial Coverage: (38.4S – 38.9S, 143E – 144E)
• rights

Derived from: ANDS RIF-CS format metadata
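The structured record can be thought of as a set of typed links from the collection to other things. A minimal Python sketch, assuming a simple (subject, relationship, object) triple layout: the relationship names come from the slide, but the layout and the helpers `objects` and `collections_about` are illustrative and are not the RIF-CS schema. Once keywords are separate objects, finding collections about a subject becomes a query over one field rather than a search through the whole abstract.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

COLLECTION = "Collection: Otway Ranges Orchid Recovery Program"

# Relationship names follow the slide; the triple layout is an assumption.
TRIPLES: List[Triple] = [
    (COLLECTION, "subject", "EARTH SCIENCE - BIOLOGICAL CLASSIFICATION - PLANTS - ORCHIDACEAE"),
    (COLLECTION, "subject", "EARTH SCIENCE - BIOSPHERE - ECOLOGICAL DYNAMICS - COMMUNITY DYNAMICS"),
    (COLLECTION, "subject", "ISO Keywords: BIOTA"),
    (COLLECTION, "is managed by", "Organisation: http://www.dse.vic.gov.au/dse/"),
    (COLLECTION, "is owned by", "Person: Joe Reade"),
    (COLLECTION, "location", "URL: http://www.viridans.com/FISVFD/VICFIS1.HTM"),
]


def objects(triples: List[Triple], subject: str, relationship: str) -> List[str]:
    """Everything linked from `subject` by `relationship`, in insertion order."""
    return [o for s, r, o in triples if s == subject and r == relationship]


def collections_about(triples: List[Triple], keyword: str) -> Set[str]:
    """Collections with a subject keyword containing `keyword` (case-insensitive)."""
    keyword = keyword.lower()
    return {s for s, r, o in triples if r == "subject" and keyword in o.lower()}
```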

Incorporate observations and description

Collection: Otway Ranges Orchid Recovery Program
• entity coverage: Observed Entity: Organism
• measure coverage: Organism Presence; Organism Absence; Organism Population
• time coverage: Jun 1966 - 2011
• spatial coverage: Polygon: «sub-coastal area around Cape Otway»
• species target: Species Target: Spider Orchid; Family Target: Orchidaceae
• species coverage: Caladenia argocalla; Pterostylis bryophylla; Thelymitra cyanapicata; Caladenia rigida
• method coverage: Visual Observation
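Because each coverage facet is a separate field, a search can combine them, for example asking for absence data on any Caladenia species recorded in 1990. The textual abstract earlier never mentions that absences were recorded, so a free-text search could not answer that question. A minimal Python sketch: the `Coverage` class, its field names and the `matches` helper are hypothetical; the values are copied from the slide.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple


@dataclass
class Coverage:
    """Coverage facets for one collection (field names are hypothetical)."""
    collection: str
    entities: Set[str] = field(default_factory=set)
    measures: Set[str] = field(default_factory=set)
    species: Set[str] = field(default_factory=set)
    methods: Set[str] = field(default_factory=set)
    years: Tuple[int, int] = (0, 0)  # inclusive first and last year


def matches(cov: Coverage, *, measure: Optional[str] = None, genus: Optional[str] = None,
            method: Optional[str] = None, year: Optional[int] = None) -> bool:
    """True if the collection satisfies every facet that was supplied."""
    if measure is not None and measure not in cov.measures:
        return False
    # Genus is taken as the first word of each species name.
    if genus is not None and not any(s.split()[0] == genus for s in cov.species):
        return False
    if method is not None and method not in cov.methods:
        return False
    if year is not None and not cov.years[0] <= year <= cov.years[1]:
        return False
    return True


# Values copied from the slide; the month in "Jun 1966" is dropped for simplicity.
OTWAY = Coverage(
    collection="Otway Ranges Orchid Recovery Program",
    entities={"Organism"},
    measures={"Organism Presence", "Organism Absence", "Organism Population"},
    species={"Caladenia argocalla", "Pterostylis bryophylla",
             "Thelymitra cyanapicata", "Caladenia rigida"},
    methods={"Visual Observation"},
    years=(1966, 2011),
)
```

For example, `matches(OTWAY, measure="Organism Absence", genus="Caladenia", year=1990)` returns `True`, while asking for a year after 2011 returns `False`.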

Define relationships and vocabularies

• Common Name: Orchid
• Common Name: Spider Orchid
• Genus: Caladenia; Genus: Pterostylis; Genus: Thelymitra
• Species: Caladenia argocalla
• Spatial Coverage: Polygon: «sub-coastal area around Cape Otway»
• Place: Otway Ranges
• Species Target: Orchid; Family Target: Orchidaceae
• Organisation: Department of Sustainability and Environment, VIC.
• Person: Joe Reade [mailto:joe.reade@dse.vic.gov.au]; Person: Dr. Joe Reade
• Land Use: Forestry

Relationships shown: equates to, covers place, member of.
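Vocabulary links let a query written in one user's terms find data described in another's. A minimal Python sketch, assuming that "Spider Orchid" equates to the genus Caladenia and that the two Joe Reade records equate to each other; the slide names the "equates to" relationship, but this pairing, the `BROADER` hierarchy and the helper functions are illustrative assumptions.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Assumed "equates to" pairs; the relationship name comes from the slide.
EQUATES_TO: List[Tuple[str, str]] = [
    ("Common Name: Spider Orchid", "Genus: Caladenia"),
    ("Person: Joe Reade", "Person: Dr. Joe Reade"),
]

# Assumed narrower -> broader links within the taxonomic vocabulary.
BROADER: Dict[str, str] = {
    "Species: Caladenia argocalla": "Genus: Caladenia",
    "Species: Caladenia rigida": "Genus: Caladenia",
    "Species: Pterostylis bryophylla": "Genus: Pterostylis",
}


def equivalents(term: str) -> Set[str]:
    """All terms reachable via "equates to", treating each link as symmetric."""
    graph: Dict[str, Set[str]] = {}
    for a, b in EQUATES_TO:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    seen, queue = {term}, deque([term])
    while queue:  # breadth-first search over the equivalence links
        for nxt in graph.get(queue.popleft(), set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def expand_query(term: str) -> Set[str]:
    """Equivalent terms plus any term whose direct broader term is one of them."""
    eq = equivalents(term)
    return eq | {child for child, parent in BROADER.items() if parent in eq}
```

With these assumed links, a search for "Common Name: Spider Orchid" expands to the genus Caladenia and its two listed species, so a collection that only records scientific names is still found.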

AEKOS discovery

Access

The comprehension challenge

Data entropy

The information landscape

[Figure: the example tree data table shown alongside separate Plants, Birds and Bats data sets. © eResearchSA]

Embedding context

• Observation: 'document' of observed things + context
• Observed Entity: measurements of targeted things + observation of contextual things
• Observation Process: measurements of effort + description of method context
• Observation Set (Collection): data set with associated description metadata

Relationships shown: observed under, part of, related to, self-observed, in.
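The observation model can be sketched as a handful of Python dataclasses. This is a hypothetical reading of the diagram: it assumes observed entities are part of an observation, an observation is observed under one process, and observations sit in an observation set; the "self-observed" link is not modelled. The `flatten` helper shows the payoff: every measurement comes out with its method and collection context attached instead of as a bare number. The class names mirror the slide, but the fields and the example values are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class ObservedEntity:
    name: str
    measurements: Dict[str, object] = field(default_factory=dict)  # targeted things
    context: Dict[str, object] = field(default_factory=dict)       # contextual things
    related_to: List["ObservedEntity"] = field(default_factory=list)


@dataclass
class ObservationProcess:
    method: str
    effort: Dict[str, object] = field(default_factory=dict)  # measurements of effort


@dataclass
class Observation:
    observed_under: ObservationProcess
    entities: List[ObservedEntity] = field(default_factory=list)  # "part of"


@dataclass
class ObservationSet:
    description: Dict[str, str]  # collection-level description metadata
    observations: List[Observation] = field(default_factory=list)  # "in"


def flatten(obs_set: ObservationSet) -> Iterator[Dict[str, object]]:
    """Yield one row per measurement, carrying its method and collection."""
    for obs in obs_set.observations:
        for entity in obs.entities:
            for attribute, value in entity.measurements.items():
                yield {
                    "collection": obs_set.description.get("title"),
                    "method": obs.observed_under.method,
                    "entity": entity.name,
                    "attribute": attribute,
                    "value": value,
                }


# Hypothetical example values.
EXAMPLE = ObservationSet(
    description={"title": "Otway Ranges Orchid Recovery Program"},
    observations=[
        Observation(
            observed_under=ObservationProcess("Visual Observation", {"search minutes": 30}),
            entities=[ObservedEntity("Caladenia argocalla population",
                                     measurements={"flowering individuals": 12})],
        )
    ],
)
```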

Integration

ÆKOS applies a flexible knowledge representation approach

ÆKOS concept        Book analogy
(Data) Collection   Book
Graph               Chapter
Observation         Section
Entity              Paragraph on a common subject
Statement           Sentence
Value               Object of a sentence
Metadata            Front matter / edition notice
Ontology            Grammar
Vocabulary          Dictionary + thesaurus

Concept alignment

• Overlapping concepts
• Measurement standards
• Classification systems

Preserve complexity in the data; squash variation for discovery.
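One way to read "preserve complexity in the data, squash variation for discovery" is to store each value exactly as recorded and derive a normalised value that is used only for searching. A minimal Python sketch, using height units as a stand-in for differing measurement standards; the records and the `HeightRecord` and `taller_than` names are hypothetical, and this illustrates the idea rather than ÆKOS's implementation.

```python
from dataclasses import dataclass
from typing import List

# Conversion factors to metres (1 ft = 0.3048 m exactly).
TO_METRES = {"m": 1.0, "cm": 0.01, "ft": 0.3048}


@dataclass(frozen=True)
class HeightRecord:
    source: str
    value: float
    unit: str  # unit exactly as recorded; never overwritten

    def discovery_value(self) -> float:
        """Value squashed onto one common unit, used only for search."""
        return self.value * TO_METRES[self.unit]


def taller_than(records: List[HeightRecord], metres: float) -> List[HeightRecord]:
    # Filter on the normalised value, but return the original, unmodified records.
    return [r for r in records if r.discovery_value() > metres]


# Hypothetical records from three surveys that used different units.
RECORDS = [
    HeightRecord("survey A", 34, "m"),
    HeightRecord("survey B", 60, "ft"),
    HeightRecord("survey C", 850, "cm"),
]
```

`taller_than(RECORDS, 10)` finds surveys A and B, and survey B still reports 60 ft: the search used the squashed value, but the user gets the data as it was collected.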

• Study location
• Sampled area
• Landscape features
• Sampling unit
• Organism group (vegetation association)
• Organism group (individual tree)

Entity       Attribute    Value
Org_gp 0001  Tree height  8
Org_gp 0001  Species      E. camaldulensis
Org_gp 0001  DBH          45
Org_gp 0001  Life stage   Mature
Org_gp 0001  Condition    Good
Org_gp 0001  Floristics   Flowering
Org_gp 0001  Shape        C

Representing data as information
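The entity-attribute-value rows can be sketched in Python as plain triples. Recording a new kind of attribute means appending a row rather than changing a table schema, and the rows can be pivoted back into one record per entity when an analysis needs a wide table. The `pivot` and `add_statement` helpers are illustrative, and `Org_gp 0002` is a hypothetical entity.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, str, str]  # (entity, attribute, value)

# Rows copied from the entity-attribute-value table.
EAV_ROWS: List[Row] = [
    ("Org_gp 0001", "Tree height", "8"),
    ("Org_gp 0001", "Species", "E. camaldulensis"),
    ("Org_gp 0001", "DBH", "45"),
    ("Org_gp 0001", "Life stage", "Mature"),
    ("Org_gp 0001", "Condition", "Good"),
    ("Org_gp 0001", "Floristics", "Flowering"),
    ("Org_gp 0001", "Shape", "C"),
]


def pivot(rows: List[Row]) -> Dict[str, Dict[str, str]]:
    """Turn EAV rows into one attribute->value dict per entity."""
    entities: Dict[str, Dict[str, str]] = defaultdict(dict)
    for entity, attribute, value in rows:
        entities[entity][attribute] = value  # a later row overwrites an earlier one
    return dict(entities)


def add_statement(rows: List[Row], entity: str, attribute: str, value: str) -> List[Row]:
    """Return a new row list with one more statement; no schema change needed."""
    return rows + [(entity, attribute, value)]
```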

Integration

Agile data management

ÆKOS in the data landscape

[Figure: ÆKOS’s coverage, plotting level of data complexity (richness) against level of data integration, up to full integration. Sources shown: DataOne, Nature, ALA (species data), ANDS (RDA, TDDP), Vegbank, Pangaea, Other Atlases, ÆKOS Researcher Datasets (SHaRED) and ÆKOS (Site Data); one region is labelled "No data".]

[Figure: the same sources plotted by level of description against level of data integration, with ÆKOS site data labelled ÆKOS Integrated Site Data.]

Science impact / Infrastructure uptake

[Figure panels: International; National]

Feedback and collaborators wanted

Website: www.aekos.org.au

d.turner@adelaide.edu.au