Research data management
![Page 1: Research data management](https://reader036.fdocuments.in/reader036/viewer/2022070513/5883f73c1a28ab34428b768d/html5/thumbnails/1.jpg)
Research data management
PROOF Advanced course Information Literacy and Research Data Management
TU/e, 12-11-2015
[email protected], TU/e IEC/Library
Available under CC BY-SA license, which permits copying and redistributing the material in any medium or format & adapting the material for any purpose, provided the original author and source are credited & you distribute the adapted material under the same license as the original
![Page 2: Research data management](https://reader036.fdocuments.in/reader036/viewer/2022070513/5883f73c1a28ab34428b768d/html5/thumbnails/2.jpg)
Topics part Research data management
1. Usable data (tabular data)
2. Accessible data (DataverseNL)
![Page 3: Research data management](https://reader036.fdocuments.in/reader036/viewer/2022070513/5883f73c1a28ab34428b768d/html5/thumbnails/3.jpg)
Topic #1
1. Usable data (tabular data)
2. Accessible data (DataverseNL)
![Page 4: Research data management](https://reader036.fdocuments.in/reader036/viewer/2022070513/5883f73c1a28ab34428b768d/html5/thumbnails/4.jpg)
What is the nature of the “unusual episode” to which this table refers?
![Page 5: Research data management](https://reader036.fdocuments.in/reader036/viewer/2022070513/5883f73c1a28ab34428b768d/html5/thumbnails/5.jpg)
![Page 6: Research data management](https://reader036.fdocuments.in/reader036/viewer/2022070513/5883f73c1a28ab34428b768d/html5/thumbnails/6.jpg)
Raw data: https://www.amstat.org/publications/jse/datasets/titanic.dat.txt
Documentation of the data:
https://www.amstat.org/publications/jse/datasets/titanic.txt
Size (number of observations and variables)
Description
Provenance
Variable descriptions
Based on:
The "Unusual Episode" Data Revisited / by Robert J. MacG. Dawson, in: Journal of Statistics Education vol. 3(1995), issue 3
![Page 7: Research data management](https://reader036.fdocuments.in/reader036/viewer/2022070513/5883f73c1a28ab34428b768d/html5/thumbnails/7.jpg)
Morphological Measurements of Galapagos Finches
http://dx.doi.org/10.5061/dryad.152
Use of standard names (taxonomy, species)
Variable names clear enough? WingL must be wing length but what is N.Ubkl?
Based on:
Looking after datasets / by Antony Unwin, 01-09-2015, http://blog.revolutionanalytics.com/2015/09/looking-after-datasets.html
![Page 8: Research data management](https://reader036.fdocuments.in/reader036/viewer/2022070513/5883f73c1a28ab34428b768d/html5/thumbnails/8.jpg)
Air crashes
http://bit.ly/KIB_PlaneTruth
meaning of px?
basis for visualizations
Ecological datasets: http://esapubs.org/archive/ecol/E090/118/
excellent metadata including project description, experimental design and license information (copyright)
Sample datasets: http://dx.doi.org/10.6084/m9.figshare.1314459
![Page 9: Research data management](https://reader036.fdocuments.in/reader036/viewer/2022070513/5883f73c1a28ab34428b768d/html5/thumbnails/9.jpg)
Heart rate changes… / by Daniel Lakens, http://dx.doi.org/10.4121/uuid:ab52261c-206b-4bed-a59d-026a16c04144
Excel-file
No documentation
Proteomic Analysis in Type 2 Diabetes Patients … / by Maria A. Sleddering, Albert J. Markvoort et al., http://dx.doi.org/10.1371/journal.pone.0112835
Word.doc
![Page 10: Research data management](https://reader036.fdocuments.in/reader036/viewer/2022070513/5883f73c1a28ab34428b768d/html5/thumbnails/10.jpg)
Lessons learned: table structure
To allow your data to be easily:
  imported by data management systems;
  analyzed by analysis software; and
  combined with other data (interoperability),
make sure that:
  each row represents a single observation (record) and each column a single variable or type of measurement (field)
  every cell contains only a single value
  there is only one column for each type of information
Cross-tab structure / contingency table: different columns contain measurements of the same variable. This is easier to read, but makes it difficult to add data (columns) to the records (rows). See the Titanic table versus the Titanic raw data.
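As a minimal sketch of the difference between the two structures, the following Python snippet turns a small cross-tab into one row per observation type. The table contents, the column names (`group`, `outcome`, `count`) and the helper `crosstab_to_long` are all hypothetical and chosen only for illustration; they are not the Titanic data.

```python
import csv
import io

# Hypothetical cross-tab with made-up counts: the outcome variable is
# spread over two columns ("survived" and "died").
CROSS_TAB = """group,survived,died
A,10,5
B,7,8
"""


def crosstab_to_long(text, id_column, variable_name, value_name):
    """Return one record per (row, column) cell of a cross-tab.

    Every column except id_column is treated as a level of the same
    variable, so its header moves into the variable_name field.
    """
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for record in reader:
        for column, value in record.items():
            if column == id_column:
                continue
            rows.append({
                id_column: record[id_column],
                variable_name: column,
                value_name: value,
            })
    return rows


for row in crosstab_to_long(CROSS_TAB, "group", "outcome", "count"):
    print(row)
```

In the long form, adding a new variable is just a matter of adding one more column to every record, which is what the cross-tab makes difficult.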
![Page 11: Research data management](https://reader036.fdocuments.in/reader036/viewer/2022070513/5883f73c1a28ab34428b768d/html5/thumbnails/11.jpg)
Lessons learned: columns (variables) and rows (records)
columns: use clear, descriptive variable names and avoid special characters (they can cause problems with some software)
rows: if possible, use standard names within cells (derived from a taxonomy, for example)
missing data / null values: the best option is to leave the cell blank
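The advice on column names and missing values can be sketched in a few lines of Python. The set of missing-value markers and the helper names `clean_name` and `clean_cell` are assumptions made for this example; which markers actually occur depends on the dataset.

```python
import re

# Assumption: markers that stand for "no value" in the raw data.
MISSING_MARKERS = {"", "NA", "N/A", "-999", "null"}


def clean_name(name):
    """Lower-case a column name and replace special characters by underscores."""
    name = name.strip().lower()
    name = re.sub(r"[^a-z0-9]+", "_", name)
    return name.strip("_")


def clean_cell(value):
    """Map every missing-value marker to a blank cell."""
    value = value.strip()
    return "" if value in MISSING_MARKERS else value


print(clean_name("Wing L (mm)"))  # wing_l_mm
print(clean_name("N.Ubkl"))       # n_ubkl
print(repr(clean_cell(" NA ")))   # ''
```

Note that removing special characters does not make a name such as N.Ubkl any more descriptive; that still requires a proper variable description.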
![Page 12: Research data management](https://reader036.fdocuments.in/reader036/viewer/2022070513/5883f73c1a28ab34428b768d/html5/thumbnails/12.jpg)
Lessons learned: intelligibility (documentation)
size of the data set: number of observations and variables
explanation of the variables
description of the data: what’s included and excluded, known problems or inconsistencies in the data, units of measurement
provenance (origin) of the data, data manipulation steps
a simple readme file can be enough (see the documentation of the Titanic dataset)
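Part of such a readme (size and variable list) can be derived from the data file itself. The sketch below assumes a CSV file with a header row; the sample data, the variable descriptions and the helper `describe_csv` are hypothetical. Description, provenance and known problems still have to be written by hand.

```python
import csv
import io

# Hypothetical sample data with a header row.
SAMPLE = """class,age,sex,survived
1,1,1,1
1,1,0,1
"""


def describe_csv(text, descriptions):
    """Build the size and variable sections of a readme for CSV text."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    n_rows = sum(1 for _ in reader)
    lines = [
        f"Observations: {n_rows}",
        f"Variables: {len(header)}",
        "",
        "Variable descriptions:",
    ]
    for name in header:
        # Variables without a description are flagged instead of skipped.
        lines.append(f"  {name}: {descriptions.get(name, 'TODO: describe')}")
    return "\n".join(lines)


print(describe_csv(SAMPLE, {"class": "hypothetical description of this variable"}))
```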
![Page 13: Research data management](https://reader036.fdocuments.in/reader036/viewer/2022070513/5883f73c1a28ab34428b768d/html5/thumbnails/13.jpg)
Lessons learned: long-term availability
if possible, use a non-proprietary (open) file format, such as csv for tabular data; open formats are easier to use in a variety of software
if possible, take the preferred formats of a data archive into account: http://datacentrum.3tu.nl/fileadmin/editor_upload/File_formats/Digital_Preservation_Support_levels.pdf
![Page 14: Research data management](https://reader036.fdocuments.in/reader036/viewer/2022070513/5883f73c1a28ab34428b768d/html5/thumbnails/14.jpg)
Tools for working with messy data
Excel: data provenance and documentation of data processing are poor
OpenRefine:
  runs on your computer (not in the cloud), inside the Firefox browser (not in IE); no web connection is needed
  working with OpenRefine: http://www.datacarpentry.org/OpenRefine-ecology/01-working-with-openrefine.html
  captures all steps done to your raw data; the original dataset is not modified; steps are easily reversed
![Page 15: Research data management](https://reader036.fdocuments.in/reader036/viewer/2022070513/5883f73c1a28ab34428b768d/html5/thumbnails/15.jpg)
Topic #2
1. Usable data (tabular data)
2. Accessible data (DataverseNL)
![Page 16: Research data management](https://reader036.fdocuments.in/reader036/viewer/2022070513/5883f73c1a28ab34428b768d/html5/thumbnails/16.jpg)
DataverseNL: log in | creating an account
Test environment: go to https://act.dataverse.nl/
[ Actual website: https://www.dataverse.nl ]
Click ‘Log in’ (at the top right)
Select SURFconext in the ‘Please select your institution’ list and click Continue
Select Eindhoven University of Technology and log on with your TU/e username and password
When asked, give permission to share your data by answering Yes or click this Tab
When asked to create an account, answer Yes or click this Tab
Once you have created an account, your username is: @[prefix of your email address]
![Page 17: Research data management](https://reader036.fdocuments.in/reader036/viewer/2022070513/5883f73c1a28ab34428b768d/html5/thumbnails/17.jpg)
DataverseNL: storage and backup of data
Storage and backup of data through DANS [Data Archiving and Networked Services]
Data transfer: up to 2 GB per dataset
Via 3TU.Datacentrum: up to 50 GB free
![Page 18: Research data management](https://reader036.fdocuments.in/reader036/viewer/2022070513/5883f73c1a28ab34428b768d/html5/thumbnails/18.jpg)
DataverseNL: organization and description of your data
Organization of data in Dataverse: [Dataverse] > Dataset > (Data)file
Before uploading, you have to describe your data (‘metadata’):
+ Discovery metadata
+ Formal metadata (for citation)
+ Substantial metadata (for discovery)
+ Metadata on data collection and methodology
+ …
Version control of datasets, not of (data) files!
![Page 19: Research data management](https://reader036.fdocuments.in/reader036/viewer/2022070513/5883f73c1a28ab34428b768d/html5/thumbnails/19.jpg)
DataverseNL access control by assigning roles and access rights to users #1
Read, edit and access rights are granted by assigning roles to registered users. A role defines the permissions you have:
  Access restricted site: reading rights only (downloading data files)
  Contributor: the previous plus creating and editing own Studies
  Contributor +: all the previous plus editing all Studies in a Dataverse
  Curator: all the previous plus publishing (‘releasing’) Studies & assigning access rights to Studies
  Admin: all the previous plus assigning roles to users in a Dataverse & creating external user accounts
Access rights can be given to specified groups at Dataverse, Study and data file level:
  ‘Unreleased’ Study: only visible to persons who have access rights to that Study
  ‘Released’ Study: public by default; after that, access can be restricted (‘restricted access’)
  Access rights = (1) reading/downloading data files; (2) edit rights = editing metadata, adding or deleting data files [defined by a role]
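Because each role includes everything the previous role allows, the role list can be read as a cumulative hierarchy. The Python sketch below only models that reading of the slide; the role and permission identifiers are hypothetical labels and do not correspond to any DataverseNL API.

```python
# Roles in ascending order, as listed on the slide (labels are hypothetical).
ROLES = ["access_restricted", "contributor", "contributor_plus", "curator", "admin"]

# Permissions each role adds on top of the previous one.
PERMISSIONS_ADDED = {
    "access_restricted": {"download_files"},
    "contributor": {"create_own_studies", "edit_own_studies"},
    "contributor_plus": {"edit_all_studies"},
    "curator": {"release_studies", "assign_access_rights"},
    "admin": {"assign_roles", "create_external_accounts"},
}


def permissions_for(role):
    """Return all permissions of a role, including those of lower roles."""
    granted = set()
    for name in ROLES[: ROLES.index(role) + 1]:
        granted |= PERMISSIONS_ADDED[name]
    return granted


print(sorted(permissions_for("contributor")))
print("release_studies" in permissions_for("contributor"))  # False
print("release_studies" in permissions_for("curator"))      # True
```

This makes explicit why a Contributor has to ask a Curator or Admin to release a Study: releasing is only added at the Curator level.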
![Page 20: Research data management](https://reader036.fdocuments.in/reader036/viewer/2022070513/5883f73c1a28ab34428b768d/html5/thumbnails/20.jpg)
DataverseNL access control by assigning roles and access rights to users #2
![Page 21: Research data management](https://reader036.fdocuments.in/reader036/viewer/2022070513/5883f73c1a28ab34428b768d/html5/thumbnails/21.jpg)
DataverseNL: recognition for and collaborating on your data
Persistent identifier (DOI)
Assigning roles (with edit-rights) to users
[ Jointly / online analysis of data (Stata, SPSS, GraphML) ]
![Page 22: Research data management](https://reader036.fdocuments.in/reader036/viewer/2022070513/5883f73c1a28ab34428b768d/html5/thumbnails/22.jpg)
DataverseNL: practical
Registering via SURFconext:
+ At the start you only have a user account (your email address); after that a Curator may assign you reading rights, or an Admin a particular role (with rights)
+ ‘External’ persons can use DataverseNL but cannot create an account themselves; an Admin has to do this
A Dataverse or Study that has not been released is only visible to persons who have rights to that Dataverse or Study
A Dataverse or Study that has been released with full restriction of access is still accessible to persons who have rights to that Dataverse or Study
Unreleased Studies do not have version control
A Contributor cannot release own Studies or assign access rights; an Admin or Curator has to do this after a request
When assigning rights (Permissions), do not forget to Save changes
![Page 23: Research data management](https://reader036.fdocuments.in/reader036/viewer/2022070513/5883f73c1a28ab34428b768d/html5/thumbnails/23.jpg)
More sharing or collaboration platforms