2018 Tarboton HydroShare Data Management...
![Page 2: 2018 Tarboton HydroShare Data Management Tutorialdata.mekongwater.org/static/files/2019-Mekong... · • Reproducibility • Software installation and configuration • Platform dependencies,](https://reader033.fdocuments.in/reader033/viewer/2022050106/5f44dee0300c20033235f4af/html5/thumbnails/2.jpg)
Motivation: Water/hydrology research is a team sport
• requires integration of information from multiple sources
• is data and computationally intensive
• requires collaboration and working as a team/community
Data
Analysis
Models
• Advancing Hydrologic Understanding
CyberInfrastructure Challenges
• The data deluge
– Large datasets, data heterogeneity, inadequate metadata
• Data organization and model input preparation
• Reproducibility
• Software installation and configuration
– Platform dependencies, library dependencies, licensing
• Computational resources
– Memory, disk and processing
Outline
• Data Management 101 (Many slides from Jeff Horsburgh Research Scholar’s presentation)
• HydroShare Overview
• HydroShare Hands on
The Steven Hall Story
1. Steven collected his data in the field and transformed it into a sharable format
2. With a little help, Steven deposited his dataset in the online HydroShare repository
3. Steven verified his data and metadata were correct but kept the data private
4. Steven submitted his paper for publication and responded to reviews
5. Steven published his data in HydroShare and received a DOI
6. Steven published his paper and cited published data in HydroShare
From Jeff Horsburgh
Data Management 101
• How are you managing your data?
• There are simple guidelines to improve data management
• Benefits
– Improved data organization – facilitates analysis
– Improved reproducibility
– Improved capacity for data re-use
Borer, E.T., E.W. Seabloom, M.B. Jones, and M. Schildhauer (2009). Some simple guidelines for effective data management, ESA Bulletin, 90(2):205-214, http://dx.doi.org/10.1890/0012-9623-90.2.205
1. Don’t Mess with the Raw Data
• Always store uncorrected data with all of its “bumps and warts”
• Do not make any corrections to this
– You could change something that was actually correct
– You could make mistakes while correcting other mistakes
• Script QA/QC procedures and write results to a new file/copy of the data
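The last bullet can be sketched as a small script. This is only an illustration, not the procedure used in the original slides: the column names `datetime` and `value`, the valid range, and the flag names are all assumptions chosen for the example.

```python
import csv
from pathlib import Path

# Hypothetical plausible range for the measured variable; adjust per sensor.
VALID_MIN, VALID_MAX = 0.0, 40.0


def qc_value(text):
    """Return (value, flag) for one raw reading without altering the raw text."""
    try:
        value = float(text)
    except ValueError:
        return None, "not_numeric"
    if not (VALID_MIN <= value <= VALID_MAX):
        return None, "out_of_range"
    return value, "ok"


def run_qc(raw_path, qc_path):
    """Read the raw file and write a quality-controlled copy to a new file."""
    raw_path, qc_path = Path(raw_path), Path(qc_path)
    # Refuse to overwrite the raw data: corrections always go to a copy.
    if raw_path.resolve() == qc_path.resolve():
        raise ValueError("QC output must not overwrite the raw data file")
    with raw_path.open(newline="") as src, qc_path.open("w", newline="") as dst:
        reader = csv.DictReader(src)  # assumes 'datetime' and 'value' columns
        writer = csv.DictWriter(dst, fieldnames=["datetime", "value", "qc_flag"])
        writer.writeheader()
        for row in reader:
            value, flag = qc_value(row["value"])
            writer.writerow({
                "datetime": row["datetime"],
                "value": "" if value is None else value,
                "qc_flag": flag,
            })
```

Because every correction lives in the script, the QC step can be rerun or revised at any time while the raw file stays exactly as collected.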
An Example
• Removal of a calibration shift
• Removal of anomalous, out of range values
• Removal of “bad data” – sensor malfunction
2. Use Descriptive File Names
• Use only plain ASCII characters
• Brief, but descriptive of content
• Generally – avoid spaces in file names
• Include a “readme” file when using many files in a directory
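One way to apply these naming rules consistently is to generate file names from a function rather than typing them by hand. The sketch below is an assumption-laden illustration: the `site_variable_start_end.ext` pattern and the function name `make_file_name` are hypothetical choices, not a convention prescribed by the slides.

```python
import re
from datetime import date


def make_file_name(site, variable, start, end, ext="csv"):
    """Build a brief, descriptive, ASCII-only file name without spaces."""
    def clean(part):
        # Drop non-ASCII characters, then replace spaces and punctuation with "-".
        ascii_part = part.encode("ascii", "ignore").decode("ascii")
        return re.sub(r"[^A-Za-z0-9]+", "-", ascii_part).strip("-")

    return "{}_{}_{}_{}.{}".format(
        clean(site), clean(variable), start.isoformat(), end.isoformat(), ext
    )


# Hypothetical site and variable names, used only for illustration.
name = make_file_name("Logan River", "water temp", date(2018, 1, 1), date(2018, 12, 31))
print(name)  # Logan-River_water-temp_2018-01-01_2018-12-31.csv
```

ISO dates in the name also make files sort chronologically in a directory listing.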
This might not be the best system…
How could we make this better?
Streamflow Data from USGS
4. Do Not Mix Data Types in Table Columns
• Numeric, strings, date/time, boolean
• Different software packages will handle mixed data types inconsistently
• Can be more difficult to detect errors in the data
• Can cause erroneous results
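A quick check for mixed types can catch problems such as a text note typed into a numeric column. The following is a minimal sketch under stated assumptions: the four type categories mirror the first bullet, the classification rules are deliberately simple, and the sample data are invented for illustration.

```python
import csv
import io
from collections import Counter
from datetime import datetime


def classify(cell):
    """Classify one cell as 'empty', 'numeric', 'datetime', 'boolean' or 'string'."""
    text = cell.strip()
    if text == "":
        return "empty"
    if text.lower() in ("true", "false"):
        return "boolean"
    try:
        float(text)
        return "numeric"
    except ValueError:
        pass
    try:
        datetime.fromisoformat(text)  # accepts ISO 8601 style dates and date-times
        return "datetime"
    except ValueError:
        return "string"


def mixed_columns(csv_text):
    """Return {column: Counter of types} for columns holding more than one type."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: Counter() for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            kind = classify(row[name] or "")
            if kind != "empty":  # blank cells are not treated as a separate type
                counts[name][kind] += 1
    return {name: c for name, c in counts.items() if len(c) > 1}


# Invented sample: the word "missing" has been typed into a numeric column.
sample = "datetime,flow\n2018-01-01,12.3\n2018-01-02,missing\n2018-01-03,11.9\n"
print(mixed_columns(sample))
```

Here only the `flow` column is reported, because it contains both numeric and string cells.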
5. Archive Data in Non-Proprietary Data Formats
• Microsoft Excel is widely available and used now, but what about in 10 years? 20 years?
• How many other software programs can open your data?
• Will your data disappear if the file format/software becomes obsolete?
6. Preservation/Backup Media
How are you preserving your data now?
• Does your office look like this?
• What are the potential problems?
• What are some potential solutions?
Data Loss
• Natural disaster
• Facilities infrastructure failure
• Storage failure
• Server hardware/software failure
• Application software failure
• External dependencies
• Format obsolescence
• Legal encumbrance
• Human error
• Malicious attack by human or automated agents
• Loss of staffing competencies
• Loss of institutional commitment
• Loss of financial stability
• Changes in user expectations and requirements
CC image by SharynMorrow on Flickr
CC image by momboleum on Flickr
Slide courtesy DataONE.
To the Cloud!
• Convenience
• Accessibility anywhere
• Cross platform
• Enhanced sharing
• Low cost
• But…
– Privacy???
– Delay (slow or non-existent internet)
– Storage, but not much else
– File formats and semantics still matter
– No community of similar experts
Why store your model on HydroShare (where your data is also located)?
• Model creates reproducible results
• Models/code can be shared by simply giving permission (no need to copy)
• Models can be re-executed at any time
Reproducible Visualization in Python
8. Maintain Metadata (Information about Data)
Borer et al.: “Do not underestimate your ability to forget details about a study!”
– WHO created the data?
– WHAT is the content of the data?
– WHEN were the data created?
– WHERE is it geographically?
– WHY were the data developed?
– HOW were the data developed?
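The six questions above can be captured in a small plain-text metadata record stored next to the data. The sketch below is illustrative only: the key names, the function names, and the choice of JSON are assumptions, not a HydroShare or Borer et al. format.

```python
import json

# One key per question on the slide; the key names are an assumption.
REQUIRED_KEYS = ("who", "what", "when", "where", "why", "how")


def build_metadata(**answers):
    """Return a metadata dict, refusing to proceed if any question is unanswered."""
    missing = [key for key in REQUIRED_KEYS if not str(answers.get(key, "")).strip()]
    if missing:
        raise ValueError("missing metadata: " + ", ".join(missing))
    return {key: answers[key] for key in REQUIRED_KEYS}


def to_json(metadata):
    """Serialize metadata as readable, plain-text JSON suitable for a sidecar file."""
    return json.dumps(metadata, indent=2, sort_keys=True)


# Invented example values, used only for illustration.
record = build_metadata(
    who="A. Researcher",
    what="15-minute water temperature observations",
    when="2018-01-01 to 2018-12-31",
    where="Hypothetical monitoring site",
    why="Thesis research on stream temperature",
    how="In-situ sensor, QC applied by script",
)
print(to_json(record))
```

Raising an error on unanswered questions makes it harder to forget a detail while the study is still fresh.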
Sharing Data: The Golden Rule
• When you provide data to someone else, what types of information would you want to include with the data?
• When you receive a dataset from an external source, what types of details do you want to know about the data?
Sharing Data
• Providing data:
– Why were the data created?
– What limitations do the data have?
– What do the data mean?
– How should the data be cited if they are re-used in a new study?
• Receiving data:
– What are the data gaps?
– What processes were used for creating the data?
– Are there any fees associated with the data?
– In what scale were the data created?
– What do the values in the tables mean?
– What software do I need in order to read the data?
– What projection are the data in?
– Can I give these data to someone else?
Necessary Meta/data Structure
The degree of metadata format and structure necessary for different levels of projected secondary data utilization. (adapted from Michener et al., 1997).
Summary
1. Don’t mess with the raw data
2. Use descriptive file names
3. Use descriptive file headers
4. Do not mix data types in table columns
5. Archive data in non-proprietary data formats
6. Consider media
7. Ensure reproducibility
8. Maintain metadata
Data and models used by hydrologists are diverse…
• Time series
• Geographic rasters
• Geographic features
• Multidimensional space/time
• Model programs
• Model instances
• …
[Figure: multidimensional space/time grid of cells indexed along the X, Y and Time axes]
http://www.unidata.ucar.edu
http://www.usgs.gov
http://www.esri.com
HydroShare can hold data in a wide variety of formats, and data in any format as “generic”
How do people share other content now?
• YouTube
• Facebook
• Instagram
• Dropbox
• Google Drive
• ArcGIS Online
• Hydrologic data ?
HydroShare is a platform for sharing Hydrologic Resources and Collaborating
DropBox-ish Functionality (dropbox.com)
• File Storage
Value Added Functionality
• Metadata Descriptions
• Data Access API
• Web Apps
• Social Functions
• DOI Data Publication
The goal of HydroShare is to advance hydrologic science by enabling the scientific community to more easily and freely share products resulting from their research - not just the scientific publication summarizing a study, but also the data and models used to create the scientific publication.
Collaborative data sharing
Add content to HydroShare to share with your colleagues or formally publish to document result reproducibility
Resources (data and models) in HydroShare are objects of collaboration (social objects)
For each resource you can
- Manage who has access
  - To edit
  - To view
- Comment or rate
- Get unique identifier
- Describe with metadata
- Organize into collections
- Formally publish
- Version
- Open with compatible web app
Formal data publication
Resources formally published receive a citable digital object identifier (DOI) and are made immutable to changes
...
Automatic and natural metadata gathering eases some of the pain of metadata entry
For geographic rasters, WGS 84 coverage information is automatically harvested from the GeoTIFF coordinate system information
For multidimensional netCDF data with CF convention metadata, the HydroShare metadata can be fully and automatically completed
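To give a feel for what harvesting coverage from a raster's georeferencing involves, here is a minimal sketch. It is not HydroShare's implementation. It assumes the raster is already in WGS 84 geographic coordinates (so no reprojection is needed), that the georeferencing is supplied as a GDAL-style six-element affine geotransform, and that the output keys `northlimit`, `southlimit`, `eastlimit` and `westlimit` are an acceptable way to name a bounding box; the function name `raster_coverage` is hypothetical.

```python
def raster_coverage(geotransform, width, height):
    """Compute a bounding box from a six-element affine geotransform.

    geotransform = (x_origin, pixel_width, row_rotation,
                    y_origin, column_rotation, pixel_height)
    where pixel_height is negative for north-up rasters.
    width and height are the raster size in pixels.
    """
    x0, dx, rx, y0, ry, dy = geotransform
    # Map the four pixel-space corners into coordinate space.
    corners = [(col, row) for col in (0, width) for row in (0, height)]
    xs = [x0 + col * dx + row * rx for col, row in corners]
    ys = [y0 + col * ry + row * dy for col, row in corners]
    return {
        "westlimit": min(xs),
        "eastlimit": max(xs),
        "southlimit": min(ys),
        "northlimit": max(ys),
    }


# Invented example: a 4 x 6 pixel north-up raster with 0.5 degree cells.
box = raster_coverage((-112.0, 0.5, 0.0, 42.0, 0.0, -0.5), width=4, height=6)
print(box)
```

Using all four corners rather than just the origin and the opposite corner keeps the result correct for rotated rasters as well.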
Summary
1. A new, web-based system for advancing model and data sharing
2. Access multiple types of hydrologic data using standards compliant data formats and interfaces
3. Flexible discovery functionality
4. Model sharing and execution
5. Facilitate and ease access to use of high performance computing
6. Social media and collaboration functionality
7. Links to other data and modeling systems
8. Enable more rapid advances in hydrologic understanding through collaborative data sharing, analysis and modeling
9. Much of the functionality has applicability to other geosciences beyond hydrology