Moving beyond preservation: Developing a platform to enable complex data reuse
Dr. David Turner ([email protected])
Eco-informatics facility, Terrestrial Ecosystem Research Network, University of Adelaide, Australia
www.aekos.org.au
Data services
• Data & information management
• Knowledge modelling
• Data relationship management
• Licensing, citations and condition of use
• Informatics and community practices
• User support
• Usage statistics to support our data contributors
ÆKOS’s Niche
[Figure: ÆKOS data as primary ecology data from sites, surveys and plots: complex, well-described, integrated]
The data revolution
An emerging consensus?
Free of financial barriers
• for any researcher to contribute to
• for any user to access immediately on publication
Made available without restriction on reuse for any purpose
• subject to proper attribution
Quality-assured and published in a timely manner
Archived and made available in perpetuity
International Council for Science (ICSU), 2 September 2014
Published data should be independently understandable
Peer (2014) International Journal of Digital Curation
Are there unique challenges in ecology?
“Our extensive experience … collecting empirical data is that large data sets are often nuanced and complex, and appropriate analysis of them requires intimate knowledge of their context and substance to avoid making serious mistakes in interpretation.”
David Lindenmayer and Gene E. Likens 2013. Benchmarking Open Access Science Against Good Science. Bulletin of the Ecological Society of America 94:338–340. http://dx.doi.org/10.1890/0012-9623-94.4.338
Ecological complexity
Reusing data
Identify problem
Draft approach
Search for data
Acquire data
Assess suitability
Modify approach
Prepare data
Conduct analysis
Interpret results
Search for data
Acquire data
Assess suitability
Prepare data
Barriers to reuse
Identify problem
Draft approach
Search for data
Acquire data
Assess suitability
Modify approach
Prepare data
Conduct analysis
Interpret results
Dispersed: Data is stored in many storage locations and formats
Source: Forestcheck, www.dec.wa.gov.au
Complex: Data usually needs explanation and context before it can be accurately used
RecID  Species    Xcoord    Ycoord    Height  dbh
1      E obliqua  56.22506  137.3208  34      36
2      E obliqua  34.45058  137.3557  22      33
3      E obliqua  34.25678  136.1189  54      79
4      E obliqua  35.77208  136.785   66      68
5      E obliqua  35.97997  136.8556  43      27
6      E baxteri  37.03322  138.71    56      77
7      E baxteri  34.61981  136.8554  33      20
8      E baxteri  36.0738   139.8762  22      101
9      A brownii  35.1474   138.6559  25      71
10     A brownii  37.81432  136.2933  62      42
11     A brownii  35.95443  138.5847  23      22
12     A brownii  35.51555  139.868   42      93
13     A marina   35.78676  139.8709  23      103
14     A marina   37.70242  136.0484  34      76
15     A marina   34.00839  137.3669  43      33
16     A marina   36.74387  137.9251  34      91
17     A marina   37.92455  136.7602  43      55
www.nswrail.net
Diverse and fragmented: Ecological data covers a wide range of topics and there are many different ways of measuring, observing and expressing different concepts
* Rapidly evolving with few measurement standards
Empowering researchers
Discovery
Comprehension
Extraction
Access
Integration
Publication
- Article
- Data and citation
to AEKOS
Identify problem
Draft approach
Search for data
Acquire data
Assess suitability
Modify approach
Prepare data
Conduct analysis
Interpret results
Consider your users
Discovery through traditional metadata
An example of a textual abstract for a data set:
• Otway Ranges Orchid Recovery Program
The aim of the project is to compile and implement recovery plans for nationally threatened native orchids occurring in the Otway Ranges region of Victoria. Populations are monitored to gauge the current threats or causes of decline and the effectiveness of recovery actions.
Species studied include: Caladenia argocalla, Pterostylis bryophylla, Thelymitra cyanapicata and Caladenia rigida.
This dataset contains records collected from 1966 to present.
The fields in this dataset include: Species name, GPS reading and datum, start and end dates, historical records, population size, key threatening processes, number of flowering individuals, number of flowers and the number of individual plants aborted, grazed, pollinated, hand pollinated, damaged, spent and caged. Images of the species and recovery activities are also available.
Source: Metadata from Flora Information System, Information Services Section (ISS) of the Victorian Department of Sustainability and Environment.
Creating structure
Collection: Otway Ranges Orchid Recovery Program
Full Description: {abstract – as before}
Spatial Coverage: (38.4S – 38.9S, 143E – 144E)
Subject Keywords: EARTH SCIENCE - BIOLOGICAL CLASSIFICATION - PLANTS - ORCHIDACEAE
Subject Keywords: EARTH SCIENCE - BIOSPHERE - ECOLOGICAL DYNAMICS - COMMUNITY DYNAMICS
Subject Keywords: EARTH SCIENCE - BIOSPHERE - ECOLOGICAL DYNAMICS - SPECIES_THREATENING PROCESS
ISO Keywords: BIOTA
READE, J. (2010) Population Trends and Key Management Actions for Otway Ranges Threatened Orchid Species. DSE, 265pp.
Organisation: http://www.dse.vic.gov.au/dse/
Person: Joe Reade[mail:[email protected]]
URL: http://www.viridans.com/FISVFD/VICFIS1.HTM
Relationship labels in the diagram: rights, spatial coverage, citation, location, subject (×4), description, is managed by, is owned by
Derived from: ANDS RIF-CS format metadata
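One way to picture the structured record is as a set of (subject, relationship, object) statements. The following is a minimal Python sketch, not the ÆKOS or RIF-CS implementation: the pairing of each relationship label with a value is assumed from the diagram, and the `objects_of` helper is hypothetical.

```python
# A minimal sketch: the collection record as (subject, relationship, object) triples.
# Which label connects to which value is an assumption read off the diagram.
COLLECTION = "Otway Ranges Orchid Recovery Program"

TRIPLES = [
    (COLLECTION, "subject", "ISO Keywords: BIOTA"),
    (COLLECTION, "subject",
     "EARTH SCIENCE - BIOLOGICAL CLASSIFICATION - PLANTS - ORCHIDACEAE"),
    (COLLECTION, "subject",
     "EARTH SCIENCE - BIOSPHERE - ECOLOGICAL DYNAMICS - COMMUNITY DYNAMICS"),
    (COLLECTION, "location", "http://www.viridans.com/FISVFD/VICFIS1.HTM"),
    (COLLECTION, "is managed by", "http://www.dse.vic.gov.au/dse/"),
    (COLLECTION, "is owned by", "Person: Joe Reade"),
]


def objects_of(triples, subject, relationship):
    """Return every object linked to `subject` by `relationship` (hypothetical helper)."""
    return [obj for subj, rel, obj in triples
            if subj == subject and rel == relationship]
```

Calling `objects_of(TRIPLES, COLLECTION, "subject")` returns the three keyword strings, which a search interface could then treat as separate facets rather than as free text inside an abstract.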
Incorporate observations and description
Collection: Otway Ranges Orchid Recovery Program
Observed Entity: Organism
Species Target: Spider Orchid
Family Target: Orchidaceae
Species Coverage: Caladenia argocalla
Species Coverage: Pterostylis bryophylla
Species Coverage: Thelymitra cyanapicata
Species Coverage: Caladenia rigida
Measure Coverage: Organism Presence
Measure Coverage: Organism Absence
Measure Coverage: Organism Population
Method Coverage: Visual Observation
Time Coverage: Jun 1966 - 2011
Spatial Coverage: Polygon: «sub-coastal area around Cape Otway»
Relationship labels in the diagram: entity coverage, species target (×2), species coverage (×4), measure coverage (×3), method coverage, time coverage, spatial coverage
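Once coverage is recorded as discrete values rather than prose, a search can filter on it directly. Below is a minimal Python sketch under stated assumptions: the first record reuses the coverage values from the slide, the second record is invented purely for contrast, and `find_collections` is a hypothetical function, not part of ÆKOS.

```python
# A minimal sketch of coverage-based discovery. The first record uses the
# coverage values from the slide; the second record is invented for contrast.
COLLECTIONS = {
    "Otway Ranges Orchid Recovery Program": {
        "species": {"Caladenia argocalla", "Pterostylis bryophylla",
                    "Thelymitra cyanapicata", "Caladenia rigida"},
        "measure": {"Organism Presence", "Organism Absence",
                    "Organism Population"},
        "method": {"Visual Observation"},
        "years": (1966, 2011),
    },
    "Hypothetical bat survey": {
        "species": {"Chalinolobus gouldii"},
        "measure": {"Organism Presence"},
        "method": {"Acoustic Detection"},
        "years": (2005, 2008),
    },
}


def find_collections(collections, species=None, measure=None, year=None):
    """Return names of collections whose coverage satisfies every given facet."""
    matches = []
    for name, coverage in collections.items():
        if species is not None and species not in coverage["species"]:
            continue
        if measure is not None and measure not in coverage["measure"]:
            continue
        first, last = coverage["years"]
        if year is not None and not first <= year <= last:
            continue
        matches.append(name)
    return matches
```

A query such as `find_collections(COLLECTIONS, measure="Organism Absence", year=1990)` matches only the orchid program, which a keyword search over the textual abstract could not answer reliably.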
Define relationships and vocabularies
Common Name: Orchid
Common Name: Spider Orchid
Genus: Caladenia
Genus: Pterostylis
Genus: Thelymitra
Species: Caladenia argocalla
Species Target: Orchid
Family Target: Orchidaceae
Spatial Coverage: Polygon: «sub-coastal area around Cape Otway»
Place: Otway Ranges
Organisation: Department of Sustainability and Environment, VIC.
Person: Joe Reade[mailto:[email protected]]
Person: Dr. Joe Reade
Land Use: Forestry
Relationship labels in the diagram: equates to (×2), covers place, member of
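Vocabulary relationships let a search for a common name reach records described only by scientific names. The sketch below is illustrative Python, not the ÆKOS vocabulary service: the particular "equates to" links and the `NARROWER` (broader-to-narrower taxon) table are assumptions, and `expand` is a hypothetical helper.

```python
# A minimal sketch of vocabulary-driven query expansion.
# The specific links below are assumptions made for illustration.
EQUATES_TO = {
    "Orchid": {"Orchidaceae"},
    "Spider Orchid": {"Caladenia"},
}

NARROWER = {
    "Orchidaceae": {"Caladenia", "Pterostylis", "Thelymitra"},
    "Caladenia": {"Caladenia argocalla", "Caladenia rigida"},
    "Pterostylis": {"Pterostylis bryophylla"},
    "Thelymitra": {"Thelymitra cyanapicata"},
}


def expand(term):
    """Return the term plus every term reachable via equates-to or narrower links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(EQUATES_TO.get(current, ()))
        stack.extend(NARROWER.get(current, ()))
    return seen
```

With these assumed links, `expand("Spider Orchid")` yields the common name, the genus Caladenia and its two listed species, so a user does not need to know the scientific names in advance.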
AEKOS discovery
Access
The comprehension challenge
Data entropy
The information landscape
Plants, Birds, Bats
© eResearchSA
Embedding context
Observation
Observed Entity
Observation Process
Relationship labels in the diagram: observed under, part of (×2), related to (×2), self-observed
Measurements of targeted things + observation of contextual things
Measurements of effort + description of method context
'Document' of observed things + context
Observation Set (Collection) in a data set with associated description metadata
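The observation model can be sketched as a handful of linked record types. This is a minimal Python illustration, not the ÆKOS schema: the class and field names are hypothetical, and matching each description to a class (targeted measurements to the entity, effort and method to the process) is an assumption based on the diagram.

```python
from dataclasses import dataclass, field


@dataclass
class ObservedEntity:
    """A targeted thing plus the measurements made of it."""
    kind: str
    measurements: dict = field(default_factory=dict)


@dataclass
class ObservationProcess:
    """How the observation was made: method context and effort."""
    method: str
    effort: dict = field(default_factory=dict)


@dataclass
class Observation:
    """A 'document' of observed things together with their context."""
    entities: list
    process: ObservationProcess  # the "observed under" link


@dataclass
class ObservationSet:
    """A collection of observations with its descriptive metadata."""
    description: str
    observations: list = field(default_factory=list)


survey = ObservationSet(description="Otway Ranges Orchid Recovery Program")
survey.observations.append(
    Observation(
        entities=[ObservedEntity("Organism", {"species": "Caladenia argocalla"})],
        process=ObservationProcess("Visual Observation"),
    )
)
```

Because the method travels with every observation, a reuser who extracts a single record still sees how it was collected.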
Integration
ÆKOS applies a flexible knowledge representation approach
ÆKOS term    Book analogy
Collection   Book
Graph        Chapter
Observation  Section
Entity       Paragraph on common subject
Statement    Sentence
Value        Object of a Sentence
Metadata     Front Matter/Edition Notice
Ontology     Grammar
Vocabulary   Dictionary + Thesaurus
(Data) Collection
Graph
Observation
Entity
Statement
Value
Concept alignment
Overlapping concepts
Measurement standards
Classification systems
Preserve complexity in the data
Squash variation for discovery
Study location
Sampled area
Landscape features
Sampling unit
Organism group (vegetation association)
Organism group (individual tree)
Entity       Attribute    Value
Org_gp 0001  Tree height  8
Org_gp 0001  Species      E. camaldulensis
Org_gp 0001  DBH          45
Org_gp 0001  Life stage   Mature
Org_gp 0001  Condition    Good
Org_gp 0001  Floristics   Flowering
Org_gp 0001  Shape        C
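An entity-attribute-value layout stores one statement per row, so a survey that records an unusual attribute needs no new column. The following minimal Python sketch groups the rows above back into one record per entity; `group_by_entity` is a hypothetical helper and values are kept as strings because the slide gives no units or types.

```python
from collections import defaultdict

# Rows copied from the Entity / Attribute / Value table on the slide.
ROWS = [
    ("Org_gp 0001", "Tree height", "8"),
    ("Org_gp 0001", "Species", "E. camaldulensis"),
    ("Org_gp 0001", "DBH", "45"),
    ("Org_gp 0001", "Life stage", "Mature"),
    ("Org_gp 0001", "Condition", "Good"),
    ("Org_gp 0001", "Floristics", "Flowering"),
    ("Org_gp 0001", "Shape", "C"),
]


def group_by_entity(rows):
    """Collect each entity's attribute/value statements into one dictionary."""
    entities = defaultdict(dict)
    for entity, attribute, value in rows:
        entities[entity][attribute] = value
    return dict(entities)
```

`group_by_entity(ROWS)["Org_gp 0001"]` gives a single dictionary of seven attributes for the organism group, and appending a row with a previously unseen attribute simply adds another key.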
Representing data as information
Integration
Agile data management
[Figure: ÆKOS’s coverage. Vertical axis: Level of Data Complexity (Richness). Horizontal axis: Level of data integration. Labels: DataOne, Nature, ALA (Species Data), No data, ANDS - RDA, TDDP, Vegbank, Pangaea, Other Atlases, ÆKOS Researcher Datasets (SHaRED), ÆKOS (Site Data), Full integration]
ÆKOS in the data landscape
[Figure: Vertical axis: Level of Description. Horizontal axis: Level of data integration. Labels: DataOne, Nature, ALA (species data), No data, ANDS - RDA, TDDP, Vegbank, Pangaea, Other Atlases, ÆKOS Researcher Datasets (SHaRED), ÆKOS Integrated Site Data, Full integration]
Science impact / Infrastructure uptake
International National
Feedback and collaborators wanted
Website: www.aekos.org.au