Project acronym: EDSA
Project full name: European Data Science Academy
Grant agreement no: 643937
D5.6 Updated EDSA Data Management Plan
Deliverable Editor: Emily Vacher (ODI)
Other contributors:
Deliverable Reviewers: R. Brochenin (TU/e) / Shatha Jaradat (KTH)
Deliverable due date: 31/07/2016
Submission date: 29/07/2016
Distribution level: Public
Version: 1.0
This document is part of a research project funded by the Horizon 2020 Framework Programme of the European Union
Change Log
Version | Date | Amended by | Changes
0.1 | 27/05/2016 | Emily Vacher | Created document, added initial plan outline
0.2 | 31/05/2016 | Emily Vacher | Added datasets and WP descriptions
0.3 | 10/06/2016 | Emily Vacher | Incorporated amendments
0.4 | 23/06/2016 | Emily Vacher | Updated with new published data
0.5 | 28/06/2016 | Emily Vacher | Incorporated amendments
0.6 | 27/07/2016 | Elena Simperl | Scientific Review
1.0 | 29/07/2016 | Aneta Tumilowicz | Final QA
2016 © Copyright lies with the respective authors and their institutions.
Table of Contents
Change Log
Table of Contents
List of Tables
List of Figures
1 Executive Summary
  1.1 Lessons learnt
  1.2 Updates from Initial DMP
2 Policy
  2.1 Data standards and metadata policy for EDSA
  2.2 Data sharing policy for EDSA
  2.3 Supporting people who want to use EDSA data
  2.4 Data storage and management policy for EDSA
  2.5 Data preservation and archiving policy for EDSA
3 Challenges and decisions
  3.1 Informed consent
  3.2 Anonymisation of personal data
  3.3 Third party licences
4 Data Management Plan
  4.1 Summary
  4.2 The EDSA Register
    4.2.1 Introduction
  4.3 Work package 1 - Demand analysis and advisory board
    4.3.1 Corpora of crawled web-based adverts from LinkedIn
    4.3.2 Aggregated statistics of European skill demand based on web-based job adverts
    4.3.3 Individual results from demand analysis
    4.3.4 Summary data from surveys and interviews
    4.3.5 De-identified survey responses from demand analysis
    4.3.6 Recordings and transcriptions of interviews
    4.3.7 ideXlab search platform results
  4.4 Work package 2 – Curricula and course development
    4.4.1 Related course data regarding similar modules and training available across the EU
    4.4.2 Dataset for course examples and exercises
    4.4.3 Event log from a municipality process
  4.5 Work package 3 – Training delivery and learning analytics feedback
    4.5.1 Repository statistics on downloads and views of educational resources
    4.5.2 Learning Analytics data generated from the EDSA Online Courses portal
    4.5.3 Internal log of eLearning systems
    4.5.4 Statistics of course registration, participation and completion
    4.5.5 Aggregated statistics of engagement with the developed courses and educational resources
    4.5.6 Recorded behavior of students following the first session of the process mining MOOC
  4.6 Work package 4 – Dissemination and community building
    4.6.1 Web server logs and Google analytics of project website access
    4.6.2 Generated social media engagement data
  4.7 Work package 5 – Exploitation
    4.7.1 List of project exploitation results - collaborations, institutional and geographical beneficiaries
    4.7.2 The EDSA Register
List of Tables
Table 1: Entries in the Data Management Plan
Table 2: Four Levels of Certificates
Table 3: Corpora of crawled Web-based adverts from LinkedIn
Table 4: Aggregated Statistics of European skill demand on web-based job adverts
Table 5: Individual results from demand analysis
Table 6: Summary data from surveys and interviews
Table 7: De-identified survey responses from demand analysis
Table 8: Recordings and transcriptions of interviews
Table 9: IdeXlab search platform results
Table 10: Related course data regarding similar modules and training available across the EU
Table 11: Dataset for course examples and exercises
Table 12: Event log from a municipality process
Table 13: Repository statistics on downloads and views of educational resources
Table 14: Learning analytics data generated from EDSA online courses portal
Table 15: Internal log of elearning systems
Table 16: Statistics of course registration, participation and completion
Table 17: Aggregated statistics of engagement with the developed courses and educational resources
Table 18: Recorded behaviour of students following the first session of the process mining MOOC
Table 19: Web server logs and Google analytics of project website access
Table 20: Generated social media engagement data
Table 21: List of project exploitation results - collaborations, institutional and geographical beneficiaries
Table 22: The EDSA Register
List of Figures
Figure 1 Entries in the Data Management Plan
Figure 2 The Data Spectrum
1 Executive Summary
The European Data Science Academy (EDSA) is participating in the pilot action on open access research data, as defined in the guidelines on Open Access to Scientific Publications and Research Data in Horizon 2020 [1]. EDSA data includes data that is used, generated and collected by the project. The data management plan (DMP) is key to tracking these datasets, and identifying which of them have been or can be published under an open licence. It is not always appropriate to publish data as open data; the data management plan allows us to clearly see what data is not published openly and the reason for that.
This is the second iteration of the DMP. The first was included in D5.5 at Month 6 of the project. The original DMP was an outline of the data we anticipated, whereas this DMP includes data that we have started collecting, generating and using in the project. The final version of the DMP is due at Month 36 of the project.
The EDSA DMP includes the following information for each dataset:
- Dataset reference and name
- Dataset description
- Standards and metadata
- Data sharing
- Archiving and preservation
Specifically, our goals are to:
- Manage and maintain data, where applicable, to ensure quality and to make the data usable.
- Ensure that all data produced by the project is subject to appropriate levels of security and privacy.
- Publish data produced by the project under an open licence, where possible.
At the halfway stage of the project, this DMP addresses some of the key challenges and lessons that the Consortium has learnt. These are outlined below. The plan also outlines how individual datasets will be maintained during and after the project. We will continue to update datasets where appropriate, collect and generate new datasets and publish as much of our data openly as possible, using the lessons that we have learnt.
1.1 Lessons learnt

The management of EDSA data has provided us with some useful lessons for further iterations of the DMP. These lessons can be divided into two categories: licensing and managing risk.
[1] Guidelines on Open Access to Scientific Publications and Research Data in Horizon 2020 (2016) http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf [accessed 29/06/2016]
Licensing
- Check third party licences at regular intervals to ensure that we are adhering to their terms, as licence changes are generally not communicated by data providers.
- Encourage the use of Creative Commons licences [2], specifically CC-BY, within the EDSA project to make it as easy as possible to reuse our data.

Managing risk
- Anticipate changes in data use where possible.
- Get informed consent for use of personal data.

We will add to these lessons as the project continues and we face more challenges with our data use.
1.2 Updates from Initial DMP

The EDSA DMP is not a static log of the project datasets but an evolving resource that reflects the changing nature of the data the project collects or generates. This DMP reflects the progress the project has made in the past 12 months, incorporating further datasets and adhering to our guiding policies on open publication. D5.5 provided an initial snapshot of the data we managed in the early stages of the project and the data that we anticipated would be collected or generated over the coming months. This DMP reflects the current status of the project’s datasets in more extensive detail.
Figure 1 shows the datasets which have been collected or generated by the Consortium between months 6 and 18 of the project. There are two new entries to the DMP:
- De-identified survey responses from the demand analysis research
- Learning Analytics data generated from the EDSA Online Courses portal
The de-identified survey responses have replaced the aggregated results from the online survey dataset, as the way the data is presented has been changed. For a more detailed discussion on this topic see the section on ‘Challenges and decisions’, or D1.4.
Learning Analytics from the EDSA Online Courses portal is a new entry to this DMP. It has been published openly via GitHub [3], for others to benefit from, as there are no restrictions with the third party.
The entries below have been removed from this DMP as they have either been replaced by specific dataset entries, or are no longer expected to be collected or generated as the project has progressed. The removed entries are:
[2] Please note that there are multiple Creative Commons licences, which are outlined on their website: https://creativecommons.org/licenses/
[3] https://alexmikro.github.io/learning-analytics-dataset-from-the-edsa-online-courses-portal/
- Aggregated results from the online survey (as above)
- Aggregated statistics of networking and engagement data (datasets now explicitly stated)
- Linked open data sources, such as the DBLP Computer Science Bibliography [4] and GeoNames Ontology [5] (datasets now explicitly stated)
- Publicly available governmental, financial, network and environmental datasets for each course (datasets now explicitly stated)
[4] http://dblp.uni-trier.de/
[5] http://www.geonames.org/ontology/documentation.html

Table 1: Entries in the Data Management Plan

Work Package | Lead | Dataset | Project Phase | Status | New entry to DMP D5.6
WP1 | ODI | Corpora of crawled web-based adverts from LinkedIn | M6-M18 | Finished | No
WP1 | ODI | Aggregated statistics of European skill demand based on web-based job adverts | M6-M18 | Ongoing | No
WP1 | ODI | Individual results from demand analysis | M2-M18 | Finished | No
WP1 | ODI | Demand Analysis Summary | M18 | Finished | Yes
WP1 | ODI | De-identified data from demand analysis | M2-M18 | Finished | Yes
WP1 | ODI | Recordings and transcriptions of interviews | M2-M18 | Finished | No
WP1 | ODI | ideXlab search platform results | M6-M36 | Ongoing | No
WP2 | ODI | Related course data regarding similar modules and training offerings across the EU | M18 | Finished | No
WP2 | Persontyle | Datasets for course examples and exercises | M6-M36 | Ongoing | No
WP2 | TU/e | Event log from a municipality process | M12-M36 | Ongoing | No
WP3 | OU | Learning Analytics data generated from the EDSA Online Courses portal | M12-M36 | Ongoing | Yes
WP3 | JSI | Repository statistics on downloads and views of educational resources | M12-M36 | Ongoing | No
WP3 | JSI | Internal logs of elearning systems | M12-M36 | Ongoing | No
WP3 | JSI | Statistics of course registration, participation and completion | M12-M36 | Ongoing | No
WP3 | JSI | Aggregated statistics of engagement with the developed courses and educational resources | M12-M36 | Ongoing | No
WP3 | TU/e | Recorded behavior of students following the first session of the process mining MOOC | M12 | Finished | No
WP4 | SOTON | Web server logs and Google analytics of project website access | M12-M36 | Ongoing | No
WP4 | SOTON | Generated social media engagement data | M12-M36 | Ongoing | No
WP5 | ideXlab | List of project exploitation results – collaborations, institutional and geographical beneficiaries | M18-M36 | Ongoing | No
WP5 | ODI | EDSA register | M6-M36 | Ongoing | Yes
2 Policy
In D5.5 we outlined the overall EDSA policies for data standards and metadata standards, data sharing and data preservation, in line with best practice for establishing a DMP [6].

2.1 Data standards and metadata policy for EDSA

Standardising the project’s collection and production of data ensures reusability and interoperability within the project, and externally if openly available. Where possible, data is made available in CSV, JSON or linked data in RDF format.

2.2 Data sharing policy for EDSA

Where possible, open data will be provided so that others are able to access, use and share the data. This data will be made available under a Creative Commons Attribution licence (CC BY 4.0).
The Open Data Institute has produced a data spectrum asset to explain frequently used, but frequently misinterpreted, terms such as open data, closed data, personal data, and big data. The most useful categorisation of data is through the licence and access rights. Data exists on a spectrum [7], which ranges from closed to shared, to open.
Figure 1: The Data Spectrum

CC-BY The Open Data Institute
[6] http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf
[7] https://theodi.org/data-spectrum
The survey and interview data from the demand analysis provides us with a good example of data across the spectrum. The list of names and email addresses of participants is closed. This is currently only held by one member of the Consortium and will only be used if required for audit purposes. The individual recordings and transcripts are an example of shared data. These are only available to members of the Consortium who have been given named access. The de-identified survey results from the demand analysis are open. This data is published on GitHub under a CC-BY licence [8].
2.3 Supporting people who want to use EDSA data
We use the ODI’s Open Data Certificate standard to benchmark each open dataset. This will enable users to see when the data will be updated, what format the data is in, what support is available and where it came from. Where we have published data openly, we have used the Open Data Institute’s certification process to demonstrate to potential reusers that it is quality open data. There are four levels of Certificates [9]:
Table 2: Four Levels of Certificates

Bronze: The data is openly licensed, available with no restrictions, accessible and legally reusable.
Silver: The data satisfies the Bronze requirements, the data is documented in a machine readable format, is reliable and offers ongoing support from the publisher via a dedicated communication channel.
Gold: The data satisfies the Silver requirements, is published in an open standard machine readable format, has guaranteed regular updates, offers greater support, documentation, and includes a machine readable rights statement.
Platinum: The data satisfies the Gold requirements, has machine readable provenance documentation, uses unique identifiers in the data, the publisher has a communications team offering support. This is an exceptional example of an information infrastructure.

CC-BY The Open Data Institute
[8] http://davetaz.github.io/quantitative-data-from-edsa-demand-analysis-/
[9] https://certificates.theodi.org/en/
Currently, not all of the data that has been published has been certified, although this is our aim and is in progress. EDSA datasets that currently have a certificate:
- The EDSA Register has a Bronze certificate because it is:
  o Openly licensed and legally reusable (= ‘open’)
  o Accessible on the web
- De-identified survey responses from the demand analysis has a Silver certificate because it is:
  o Openly licensed and legally reusable (= ‘open’)
  o Accessible on the web
  o Published in a machine readable format
  o Supported on an ongoing basis by the publisher via GitHub
It is important to note that there is no issue in being at Bronze level, as the data is still published openly to a level that meets user needs. A higher level is not always appropriate for the EDSA data, as there is not always a mechanism in place for ongoing discussion of the data (Silver), and it may not be updated regularly, especially beyond the length of the project (Gold). Over 99% of all ODI certificates are to Bronze standard [10].
[10] https://certificates.theodi.org/status

2.4 Data storage and management policy for EDSA
There are currently three main types of repository for EDSA data: open access repositories (for example GitHub), the EDSA project website, and internal institutional and organisational repositories for securely holding data. The criteria for determining where data is stored are as follows:
Open access repositories: We are following a policy of ‘open by default.’ If there is no reason why the data cannot or should not be published openly, then our policy is that it should be published under an open licence. Open data about individuals should be de-identified, and only published with the consent of the individuals concerned. The data should also be unrestricted by terms of use.
The EDSA project website: The aim is that all of the data that is published openly will be made available via the EDSA website. This is to ensure that the data is findable by as wide an audience as possible. Data that is openly licensed but difficult to discover is not widely considered to be open data. The EDSA website also displays data that cannot be published openly, often due to restrictions in terms of use. This allows users to view the data, or aggregations of the data.
Internal institutional and organisational repositories: Some datasets in the data management plan are hosted in repositories of the organisation responsible for that data. While some of these are internal, hosted in Consortium partners’ internal repositories, some datasets used in course materials are hosted on external repositories, in which case the hosting organisation is responsible for maintaining the data. Datasets hosted in internal repositories cannot be published, usually due to restrictions on the use of personal data.
2.5 Data preservation and archiving policy for EDSA
Striving for preservation of data will enable long-term value to be added to the domain beyond the project. It will also prove a valuable resource to a Europe-wide initiative (EDSA) initiated as part of work package 5. Although the aim of the project is to preserve as much of the data as possible, data published in external open repositories is reliant on that system. As a project, EDSA has yet to determine a policy regarding archiving of datasets. This will be decided prior to the final Data Management Plan (M36).
3 Challenges and decisions
Creating and maintaining a DMP for the project has ensured that we have been able to highlight potential data management and usage challenges and make informed decisions within the Consortium on how they should be addressed. In this section, we highlight some of the main topics of consideration when managing project datasets. These provide us with useful lessons for further iterations of the DMP.

3.1 Informed consent
It is a legal requirement to inform people how their personal data is going to be used and to obtain their informed consent. Whilst there are exceptions to this requirement, such as national security or services in the public interest, these exceptions do not apply to this project. This is an area that we have addressed in the project over the last 12 months.
The intended use of the demand analysis survey data changed over the course of the 18-month data collection, due to the length of the study. At the start of the project, we planned to release summary statistics of the quantitative survey data through our skills dashboard. Accordingly, participants were informed that data would be made accessible in an anonymous, aggregated form.
Following discussions during the evaluation of the pilot study, we established that releasing de-identified survey responses would add value to the project’s outputs. This data would provide access to responses on an individual basis, thus adding much greater detail and utility for potential reusers of the data. Consequently, the data would no longer be aggregated. In month 9 of the project, we therefore changed the wording of the informed consent section of the survey, stating that data could later be made publicly available in de-identified formats, using an open licence.
Due to this change in use, we have not used the data collected before the permissions change on the dashboard, or in the open dataset. The data collected from early study participants has been included in the aggregated analysis in D1.4, but is not available in the open data. We have also not published data from people who withheld consent for this use of their data.
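The selection rule described above (exclude responses collected before the consent wording changed, and exclude anyone who withheld consent) can be illustrated with a minimal Python sketch. This is not the project's actual tooling: the cut-off date, the field names `collected_on` and `consent_open_release`, and the sample records are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical cut-off: the day the revised consent wording went live.
# The real date is not stated in this document.
CONSENT_WORDING_CHANGED = date(2015, 11, 1)


def publishable(responses, cutoff=CONSENT_WORDING_CHANGED):
    """Return the responses that may go into the open dataset.

    A response qualifies only if it was collected on or after the cutoff
    (so the participant saw the revised wording) and consent was given.
    """
    return [
        r for r in responses
        if r["collected_on"] >= cutoff and r["consent_open_release"]
    ]


# Illustrative records only; the field names are assumptions.
sample = [
    {"id": 1, "collected_on": date(2015, 6, 10), "consent_open_release": True},
    {"id": 2, "collected_on": date(2015, 12, 2), "consent_open_release": True},
    {"id": 3, "collected_on": date(2016, 1, 15), "consent_open_release": False},
]

open_subset = publishable(sample)
print([r["id"] for r in open_subset])  # prints [2]
```

Record 1 is excluded because it predates the assumed wording change, and record 3 because consent was withheld; both could still feed an aggregated analysis if the original consent covered that use.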
3.2 Anonymisation of personal data

It was important to ensure that the Learning Analytics data was anonymised before it could be published openly and that no individual user could be identified. The Consortium came to an agreed policy which will be applied where appropriate to further datasets. We will publish data openly if the data has been de-identified, and individual users cannot be recognised. De-identified data will also be published alongside a Privacy Impact Assessment which identifies potential risks and how they have been managed.
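As a rough illustration of what a first de-identification pass might look like, the Python sketch below drops direct identifiers and replaces the user key with a salted hash. It is an assumption-laden example, not the procedure the Consortium applied: the field names (`name`, `email`, `ip_address`, `user_id`) are hypothetical, and removing direct identifiers alone does not guarantee that individuals cannot be recognised, which is why a Privacy Impact Assessment is still needed.

```python
import hashlib
import secrets

# Assumed names of fields that directly identify a person.
DIRECT_IDENTIFIERS = {"name", "email", "ip_address"}


def de_identify(records, id_field="user_id", salt=None):
    """Drop direct identifiers and pseudonymise the user key.

    The salt is random per run unless supplied, so pseudonyms cannot be
    recomputed from a known user id without access to the salt.
    """
    salt = salt or secrets.token_hex(16)
    cleaned = []
    for record in records:
        # Keep only fields that are not direct identifiers.
        row = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
        digest = hashlib.sha256(
            (salt + str(record[id_field])).encode("utf-8")
        ).hexdigest()
        row[id_field] = digest[:12]  # shortened pseudonym for readability
        cleaned.append(row)
    return cleaned


# Illustrative record; values are made up.
events = [
    {"user_id": 7, "email": "a@example.org", "course": "process-mining", "views": 3},
]
print(de_identify(events))
```

Quasi-identifiers that remain in the data (for example rare combinations of course and timestamp) would still need to be assessed separately.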
3.3 Third party licences

There are many types of open licences. Several times, the Consortium faced challenges when scouring the terms and conditions of third party sites, such as LinkedIn [11], Adzuna API [12] and Learning Locker [13], to ensure the terms of use are adhered to. For the EDSA data, the Consortium is encouraged to use Creative Commons licences to ensure that people wishing to use our data can clearly find how they can use it. Alternative open licences used in the project are the open source GNU General Public Licence V3 [14] and the 3TU.Datacentrum General Terms of Use [15].
When data is collected from a third party website, it is vital to track the terms of use, as these can change. At the beginning of the project, the Consortium collected and published data from a third party website, LinkedIn. At the time, the terms allowed such use, but during the project the licence provided by LinkedIn changed. We did not keep a record of the original licence. There was debate about whether the project could keep the data openly available, as at the time of collection this was permitted. To avoid risk we decided to remove that data, especially as the links to the licence that we had used specifically stated that we could not use it in the way we had planned.
Another notable challenge was the use of data from Trovit [16], a website that aggregates job advertisements from across Europe. This data populates the jobs dashboard. The terms of the licence did not allow us to use the data; however, after careful consideration we decided as a Consortium that the UK text and data mining (TDM) exception for research purposes [17] allowed the use of the data as long as the data itself was not accessible by others. The UK law follows guidance from the EU Database Directive (96/9/EC), and discussions on an EU-wide TDM exception are likely to take place in 2016.
Restrictions on data use frequently prevent individuals from maximising the value of that data. If the data was open, and anyone could access, use and share it for any purpose, Trovit would ultimately benefit from increased coverage and traffic, via attribution. If a company does not want anyone to benefit financially from their work, a non-commercial licence such as CC-BY-NC 4.0 [18] would still enable others to use the data and link back to Trovit.
[11] https://developer.linkedin.com/legal/api-terms-of-use
[12] https://developer.adzuna.com/
[13] http://learninglocker.net/
[14] http://www.gnu.org/licenses/gpl-3.0.en.html
[15] http://researchdata.4tu.nl/fileadmin/editor_upload/pdf/General_terms_of_use_3TU.Datacentrum.pdf
[16] http://jobs.trovit.co.uk/
[17] http://www.legislation.gov.uk/ukdsi/2014/9780111112755 p6 (accessed 30/06/2016)
[18] https://creativecommons.org/licenses/by-nc/4.0/
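One way to act on the lesson about keeping a record of third party terms is to store a dated snapshot of the terms text at collection time and compare later copies against it. The Python sketch below shows the idea under stated assumptions: the function names `snapshot_terms` and `terms_changed` are hypothetical, the terms text is passed in by the caller (how it is retrieved is left open), and nothing here reflects a process the Consortium actually ran.

```python
import hashlib
import json
from datetime import date


def snapshot_terms(source, terms_text, on=None):
    """Record the terms of use as they stood on a given day.

    The full text is kept so the original licence can be produced later;
    the hash gives a cheap way to detect that the terms have changed.
    """
    on = on or date.today()
    return {
        "source": source,
        "checked_on": on.isoformat(),
        "sha256": hashlib.sha256(terms_text.encode("utf-8")).hexdigest(),
        "text": terms_text,
    }


def terms_changed(previous, current):
    """True if the terms text differs between two snapshots."""
    return previous["sha256"] != current["sha256"]


# Illustrative texts only; these are not real terms of use.
at_collection = snapshot_terms("example-provider", "Reuse permitted.", date(2015, 3, 1))
at_review = snapshot_terms("example-provider", "Reuse requires permission.", date(2016, 3, 1))

if terms_changed(at_collection, at_review):
    # Persisting the snapshot as JSON keeps an auditable record.
    print(json.dumps({k: at_review[k] for k in ("source", "checked_on")}))
```

Running such a check at the regular intervals mentioned under "Lessons learnt" would flag a change without relying on the data provider to announce it.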
4 Data Management Plan

4.1 Summary

When creating the EDSA DMP, we took guidance from best practice and from online tools such as DMPTool [19] and DMPonline [20]. DMPonline allows you to select the specific project category, in this case the Horizon 2020 pilot action on open access research data, which ensured that we captured all the necessary metadata.
The datasets are organised by work package. Each table represents one dataset generated or collected by the EDSA project. Each table includes the following information:
- Dataset reference and name
- Dataset description
- Standards and metadata
- Data sharing
- Archiving and preservation
We created a dataset of all the datasets generated or collected by the EDSA project. Details of this can be found below.
4.2 The EDSA Register

4.2.1 Introduction

The EDSA Register is published under a CC-BY 4.0 Creative Commons licence [21]. It is published on GitHub [22] and can also be accessed via the EDSA website at http://edsa-project.eu/resources/datasets/
The dataset has been certified as Bronze using the Open Data Certificates [23]. This dataset is updated every three months by the ODI with information from the Work Package leads. The next update is due at month 21 of the project.
With the Consortium work package leads, we explored the datasets for each work package, enabling discovery of what data could be published openly. We look to share best practices and to ensure a high quality of open data. Best practices include:
- Publishing in a machine readable format, e.g. CSV
- Providing supporting documentation or metadata
- Using a clear open licence, preferably Creative Commons Attribution 4.0 licence [24] for consistency.
[19] https://dmp.cdlib.org/
[20] https://dmponline.dcc.ac.uk/
[21] https://creativecommons.org/licenses/
[22] https://theodi.github.io/european-data-science-academy-register/
[23] https://certificates.theodi.org
[24] https://creativecommons.org/licenses/
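To make the first two best practices concrete, the following Python sketch serialises one register entry as CSV with an explicit licence column. The row values are taken from Table 1 and the licence stated above; the column names and the `register_to_csv` helper are assumptions for illustration and do not describe the published register's actual schema.

```python
import csv
import io

# Assumed column names; the published register may use different ones.
FIELDS = ["work_package", "lead", "dataset", "project_phase", "status", "licence"]


def register_to_csv(entries):
    """Serialise register entries as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


entries = [
    {
        "work_package": "WP5",
        "lead": "ODI",
        "dataset": "EDSA register",
        "project_phase": "M6-M36",
        "status": "Ongoing",
        "licence": "CC-BY-4.0",
    },
]

print(register_to_csv(entries))
```

Carrying the licence in the data itself means a reuser who finds only the CSV file can still see the terms under which it may be used.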
We used a Google Sheet as our data management tool, which we update every three months. It is this Google Sheet which is embedded on the EDSA website [25]. Any edits made on the Google Sheet show on the website in real time.

4.3 Work package 1 - Demand analysis and advisory board

WP1 has collected and generated data from the demand analysis study. This includes recordings and transcriptions of the interviews, survey responses and anonymised results of the surveys and interviews.
4.3.1 Corpora of crawled web-based adverts from LinkedIn

Table 3: Corpora of crawled Web-based adverts from LinkedIn

Dataset Reference and Name
  Dataset Identifier: WebSiteHarvest
Dataset description
  Generated or collected: Collected
  Origin: LinkedIn
  Scale: 46 terms; 31 languages; 47 countries; 1 harvest per day; 2162 data points per day
  Who is this useful for? Internal demand analysis and to inform curriculum development.
  Similar existing datasets: Many datasets are collected in this area; however, due to the specific nature of this study, collection of new data is required and integration with existing datasets is not viable. The value of this dataset comes from the provision of an up-to-date snapshot of current data science skills needs across the EU.
Standards and metadata
  Methodology for data collection/management: All data collected is translated into CSV format.
  Metadata, supporting material: Data will not be available for reuse or accessible by anyone outside of the project. The data collected will be used for internal analysis to inform the creation of curriculum.
  Status and location of metadata: Metadata is not publicly available
[25] http://edsa-project.eu/datasets
Data sharing
Licensing, data protection, ownership and copyright: Usage of the LinkedIn service is bound by the user agreement.
If the data cannot be published openly, why? The terms of the LinkedIn user agreement now forbid harvesting and collection of data without express permission. When the data was collected, this was not the case.
https://www.linkedin.com/legal/user-agreement?trk=hb_ft_userag
How will the data be shared? Data will not be shared or available for reuse.
Data repository: Internal ODI repository
Dataset Link: There is no external link
Archiving and preservation
How long should the data be preserved? Until the end of the project
Approx end volume: <1 GB
Who is responsible for the data management and curation? ODI lead data management and curation; other WP1 partners will contribute.
Quality assurance including backup procedures: Backed up to an internal ODI repository
Associated costs for data management: Approximately 1 day of effort per month
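The scale figures reported for the WebSiteHarvest dataset are consistent with one data point per search term per country in each daily harvest, since 46 × 47 = 2162. The plan does not state this relationship, so the short Python check below treats it as an assumption; the function name is illustrative only.

```python
# Figures taken from the Scale row of Table 3.
SEARCH_TERMS = 46
COUNTRIES = 47
HARVESTS_PER_DAY = 1
REPORTED_POINTS_PER_DAY = 2162


def points_per_day(terms, countries, harvests):
    """Data points per day, ASSUMING one point per (term, country) pair per harvest."""
    return terms * countries * harvests


computed = points_per_day(SEARCH_TERMS, COUNTRIES, HARVESTS_PER_DAY)
print(computed, computed == REPORTED_POINTS_PER_DAY)  # prints: 2162 True
```

Under this reading the 31 languages do not multiply the daily total, which suggests that language is an attribute of each harvested result rather than a separate search dimension.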
4.3.2 Aggregated statistics of European skill demand based on web-based job adverts
Table 4: Aggregated Statistics of European skill demand on web-based job adverts
Dataset Reference and Name
Dataset Identifier: WebSiteStatistics
Dataset description
Generated or collected: Collected
Origin: Adzuna API [26], Trovit [27]
Scale: Varied
Who is this useful for? Populating the dashboard, internal demand analysis and to inform curriculum development.
Similar existing datasets: Many datasets are collected in this area; however, due to the specific nature of this study, collection of new data is required and integration with existing datasets is not viable. The value of this dataset comes from the provision of an up-to-date snapshot of current data science skills needs across the EU.
Standards and metadata
Methodology for data collection/management: All data collected is converted to CSV format.
Metadata, supporting material: The Adzuna data is accessible via the Adzuna API. The Trovit data will not be available for reuse or accessible by anyone outside of the project.
Status and location of metadata: Metadata is not publicly available.
Data sharing
Licensing, data protection, ownership and copyright: The data will be available for use via the EDSA dashboard. However, it will not be available to download, as this would contravene Trovit's terms and conditions.
If the data cannot be published openly, why? Trovit's terms of use prohibit the use of their data. The research exception allows us to use the data but not to make it available in raw format for others to consume for commercial purposes.
How will the data be shared? Via the EDSA dashboard
Data repository: In an internal JSI repository
Dataset Link: N/A
Archiving and preservation
How long should the data be preserved? Until the end of the project
Approx end volume: <1 GB
[26] https://developer.adzuna.com/
[27] https://www.trovit.com/
Who is responsible for the data management and curation? ODI lead data management and curation; other WP1 partners will contribute.
Quality assurance including backup procedures: Backed up to an internal JSI repository
Associated costs for data management: Approximately 1 day of effort per month
4.3.3 Individual results from demand analysis
Table 5: Individual results from demand analysis
Dataset reference and name
Dataset identifier: IndividualResponses
Dataset description
Generated or collected: Generated
Origin: Guided surveys and online responses
Scale: 584 surveys, 108 interviews
Who is this data useful for? Internal demand analysis.
Similar existing datasets: A number of surveys exist in this domain but their data is not available to this project. This data will enable EDSA to build up a country-by-country view of current capacity and requirements for data science skills.
Standards and metadata
Methodology for data collection/management: Data collection methods are outlined in D1.4. Converted to CSV format.
Metadata, supporting material: Data will not be available for reuse or accessible by anyone outside of the project. The data collected will be used for internal analysis to inform the creation of curriculum. De-identified data will be publicly available, where possible.
Status and location of metadata: Metadata is not publicly available.
Data Sharing
Licensing, ownership and copyright: Raw data will be owned by the project and unlicensed. It will not be available for reuse.
If the data cannot be published openly, why? Data protection of personal data
How will the data be shared? Data will not be shared or available for reuse.
Data repository: Internal ODI repository
Dataset Link: There is no external link
Archiving and preservation
How long should the data be preserved? Until the end of the project
Approximate end volume: <100 MB
Who is responsible for data curation and management? ODI lead data management and curation; other WP1 partners will contribute.
Quality assurance including backup procedures: Backed up to an internal ODI repository
Associated costs for data management: Approximately 1 day of effort per month
4.3.4 Summary data from surveys and interviews
Table 6: Summary data from surveys and interviews
Dataset reference and name
Dataset identifier: DemandAnalysisSummary
Dataset description
Generated or collected: Generated
Origin: Guided surveys and online responses
Scale: 584 surveys, 108 interviews
Who is this data useful for? External analysis of respondents who took the surveys and interviews.
Similar existing datasets: None
Standards and metadata
Methodology for data collection/management: Data collection methods are outlined in D1.4. Converted to CSV format.
Metadata, supporting material: A README.md file is available detailing the data structure and basic usage.
Status and location of metadata: https://theodi.github.io/edsa-demand-analysis-summary-data/
Data Sharing
Licensing, ownership and copyright: Creative Commons Attribution (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
If the data cannot be published openly, why? The data is published openly.
How will the data be shared? Data will be available to access from the EDSA website and the ODI's GitHub repository.
Data repository: GitHub / EDSA website
Dataset Link: https://theodi.github.io/edsa-demand-analysis-summary-data/
Archiving and preservation
How long should the data be preserved? As long as GitHub exists, as a minimum. Beyond that, a value judgement would have to be made.
Approximate end volume: <100 MB
Who is responsible for data curation and management? ODI lead data management and curation; other WP1 partners will contribute.
Quality assurance including backup procedures: Stored in external repositories - EDSA website and GitHub
Associated costs for data management: Stored in external repositories - EDSA website and GitHub
4.3.5 De-identified survey responses from demand analysis
Table 7: De-identified survey responses from demand analysis
Dataset reference and name
Dataset identifier: DeidentifiedResponses
Dataset description
Generated or collected: Generated
Origin: Online survey http://edsa-project.eu/resources/survey/
Scale: 496 survey results
Who is this data useful for? External analysis of results and trends by anyone who wishes to gather survey data in the area of data science.
Similar existing datasets: There are a number of other aggregated surveys that we can compare our results to, and whose results we can use if necessary. This dataset has the same eventual value as others in the area.
Standards and metadata
Methodology for data collection/management: Data collection methods are outlined in D1.4. Converted to CSV format.
Metadata, supporting material: A README.md file is available detailing the data structure and basic usage.
Status and location of metadata: http://davetaz.github.io/quantitative-data-from-edsa-demand-analysis-/
Data Sharing
Licensing, ownership and copyright: Creative Commons Attribution (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
If the data cannot be published openly, why? The data is published openly.
How will the data be shared? Data will be available to view on the EDSA dashboard and accessible for free in the EDSA dashboard GitHub repository.
Data repository: GitHub / EDSA Dashboard on website
Dataset Link: http://davetaz.github.io/quantitative-data-from-edsa-demand-analysis-/
Archiving and preservation
How long should the data be preserved? As long as GitHub exists, as a minimum. Beyond that, a value judgement would have to be made.
Approximate end volume: <100 MB
Who is responsible for data curation and management? ODI lead data management and curation; other WP1 partners will contribute.
Quality assurance including backup procedures: Stored in external repositories - EDSA website and GitHub
Associated costs for data management: Stored in external repositories - EDSA website and GitHub
4.3.6 Recordings and transcriptions of interviews
Table 8: Recordings and transcriptions of interviews
Dataset reference and name
Dataset identifier: InterviewTranscipts
Dataset description
Generated or collected: Generated
Origin: Interviews
Scale: 108 transcripts, 108 recordings
Who is this data useful for? Internal demand analysis
Similar existing datasets: No similar datasets exist that are usable for this project. The interviews provide insights and data points for use in the demand analysis.
Standards and metadata
Methodology for data collection/management: Qualitative and quantitative research methodology for collection is outlined in D1.4.
Metadata, supporting material: Data will not be available for reuse or accessible by anyone outside of the project. The data collected will be used for internal analysis to inform the creation of curriculum.
Status and location of metadata: Metadata is not publicly available.
Data Sharing
Licensing, ownership and copyright: Raw data will be owned by the project and unlicensed. It will not be available for reuse.
If the data cannot be published openly, why? Data protection of personal data
How will the data be shared? Data will not be shared or available for reuse.
Data repository: Internal ODI repository
Dataset Link: There is no external link
Archiving and preservation
How long should the data be preserved? Until the end of the project
Approximate end volume: <3 GB
Who is responsible for data curation and management? ODI lead data management and curation.
Quality assurance including backup procedures: Backed up to an internal ODI repository
Associated costs for data management: As part of the subcontracting costs of WP1
4.3.7 ideXlab search platform results
Table 9: ideXlab search platform results
Dataset Reference and Name
Dataset Identifier: ExpertIdentification
Dataset description
Generated or collected: Collected
Origin: Research publications
Scale: Not yet known, as collection is ongoing
Who is this useful for? Internal demand analysis and to inform curriculum development. Provides insights into the offer side of the skills analysis.
Similar existing datasets: Not in this area. This dataset will provide validation of the demand analysis and form the basis for further insights.
Standards and metadata
Methodology for data collection/management: The ideXlab search engine will use the sampling approach outlined in D1.2 for data collection. CSV data will be created.
Metadata, supporting material: Data will not be available for reuse or accessible by anyone outside of the project. The data collected will be used for internal analysis to inform the creation of curriculum.
Status and location of metadata: Accompanying document to explain the data structure. This will not be made open.
Data sharing
Licensing, data protection, ownership and copyright: Raw data will be owned by the project and unlicensed. It will not be available for reuse.
If the data cannot be published openly, why? Protection of personal data
How will the data be shared? The data will not be shared due to restrictions on the use of personal data.
Data repository: ideXlab search platform
Dataset Link: There is no external link
Archiving and preservation
How long should the data be preserved? Until the end of the project
Approx end volume: Est. 1000 returns
Who is responsible for the data management and curation? ideXlab lead data management and curation; other WP1 partners will contribute.
Quality assurance including backup procedures: Backed up to an internal ideXlab repository
Associated costs for data management: Approx 2 person days per month. No other external costs.
4.4 Work package 2 – Curricula and course development
WP2 has collected data from openly available sources and created subsets of this data to be used in the learning resources produced. Data has also been collected about existing data science courses, as per earlier recommendations.
4.4.1 Related course data regarding similar modules and training available across the EU
Table 10: Related course data regarding similar modules and training available across the EU
Dataset Reference and Name
Dataset Identifier: DataScienceCourses
Dataset description
Generated or collected: Collected
Origin: Course websites
Scale: Not yet known
Who is this useful for? Internal use for development of curricula and learning materials.
Similar existing datasets: None. The data will provide a useful resource as part of the demand analysis.
Standards and metadata
Methodology for data collection/management: Systematic search and review of available data science courses. The search terms were Data Science, Big Data, Data Analytics, Business Analytics, Machine Learning, Distributed Computing, Advanced Computing Data Science Stream, Data Analytics stream.
Metadata, supporting material: Metadata has been published alongside the data.
Status and location of metadata: https://theodi.github.io/data-science-courses-in-europe-2016/
Data sharing
Licensing, data protection, ownership and copyright: The data is licensed under a Creative Commons CC-BY 4.0 licence.
If the data cannot be published openly, why? The data is published openly.
How will the data be shared? GitHub / EDSA website
Data repository: GitHub. Also available via the EDSA website.
Dataset Link: https://theodi.github.io/data-science-courses-in-europe-2016/
Archiving and preservation
How long should the data be preserved? Until the end of the project
Approx end volume: <1 GB
Who is responsible for the data management and curation? ODI lead data management and curation.
Quality assurance including backup procedures: Backed up to an internal ODI repository
Associated costs for data management: As part of the subcontracting costs of WP1. No ongoing costs.
4.4.2 Dataset for course examples and exercises
Table 11: Dataset for course examples and exercises
Dataset Reference and Name
Dataset Identifier: Using namespace notation to specify R packages: sml::poly4, sml::poly4b, sml::kmeans, sml::seeds, car::Duncan, car::Davis, datasets::car, datasets::HairEyeColor, datasets::Airquality, datasets::swiss, bestGLM::zprostate, MASS::menarche
Dataset description
Generated or collected: Both
Origin: Third-party R packages that students download from CRAN. Some are in an author-developed package hosted on CRAN.
Scale: 12 small datasets, <1 MB
Who is this useful for? Students in the "Essentials of Data Analytics and Machine Learning" course.
Similar existing datasets: Datasets are archived in CRAN. Used in course examples and exercises.
Standards and metadata
Methodology for data collection/management: None
Metadata, supporting material: The datasets will be used within learning activities offered as part of the "Essentials of Data Analytics and Machine Learning" course. They are stored in the sml R package.
Status and location of metadata: Package documentation (except, currently, for those in the sml package)
Data sharing
Licensing, data protection, ownership and copyright: GNU GPL v3
http://www.gnu.org/licenses/gpl-3.0.en.html
If the data cannot be published openly, why? The data is published openly.
How will the data be shared? Via R packages, searchable online.
Data repository: CRAN
Dataset Link: https://vincentarelbundock.github.io/Rdatasets/datasets.html
Archiving and preservation
How long should the data be preserved? As long as the owners do not remove them. If the datasets are no longer accessible, other similar datasets will be used in the module.
Approx end volume: <1 MB
Who is responsible for the data management and curation? Persontyle lead data management and curation; third parties for collected data.
Quality assurance including backup procedures: Relying on CRAN
Associated costs for data management: None
4.4.3 Event log from a municipality process
Table 12: Event log from a municipality process
Dataset Reference and Name
Dataset Identifier: a07386a5-7be3-4367-9535-70bc9e77dbe6
Dataset description
Generated or collected: Collected
Origin: Dutch municipality
Scale: 200 KB
Who is this useful for? Users interested in real-life event logs.
Similar existing datasets: Large collection of real-life event logs at http://data.3tu.nl/repository/collection:event_logs_real
Standards and metadata
Methodology for data collection/management: Management through the 3TU data centre
Metadata, supporting material: Includes number of traces, events, attributes, time span, etc.
Status and location of metadata: http://data.3tu.nl/repository/uuid:a07386a5-7be3-4367-9535-70bc9e77dbe6
Data sharing
Licensing, data protection, ownership and copyright: Own licence (Attribution, non-commercial)
http://researchdata.4tu.nl/fileadmin/editor_upload/pdf/General_terms_of_use_3TU.Datacentrum.pdf
If the data cannot be published openly, why? The data is available publicly. As the licence places restrictions on use, this cannot be considered 'open data'.
How will the data be shared? Via 3TU Datacentre
Data repository: 3TU Datacentre
Dataset Link: http://data.3tu.nl/repository/uuid:a07386a5-7be3-4367-9535-70bc9e77dbe6
Archiving and preservation
How long should the data be preserved? Past project end
Approx end volume: 200 KB
Who is responsible for the data management and curation? 3TU
Quality assurance including backup procedures: Reliant on third party. If the dataset becomes unavailable we will use a similar one in the online module.
Associated costs for data management: None
4.5 Work package 3 – Training delivery and learning analytics feedback
WP3 has started collecting data on the training delivered in the project, both face-to-face and online, and will continue to collect data as more training is created and delivered.
This includes data on course registration, participation and student retention rate. We use this data to inform best practices for students and educators, and to improve the curricula and content. There is still a lot to be explored around the learning analytics data, especially as we continue to create more online modules. Different partners have created modules using different software, for example Coursera [28], Tin Can API (xAPI) [29] and Learning Locker [30].
4.5.1 Repository statistics on downloads and views of educational resources
Table 13: Repository statistics on downloads and views of educational resources
Dataset Reference and Name
Dataset Identifier: RepositoryStatistics
Dataset description
Generated or collected: Collected
Origin: videolectures.net
Scale: Views and comments for each video lecture
Who is this useful for? Internal analysis, curriculum development, external demand analysis
[28] https://www.coursera.org
[29] http://tincanapi.com/
[30] http://learninglocker.net/
Similar existing datasets: None. Provides evidence of resource usage and a basis for improving curriculum, content and course structure.
Standards and metadata
Methodology for data collection/management: CSV is used for the Videolectures API
Metadata, supporting material: Videolectures REST API documentation. A Markdown README file is available for download.
Status and location of metadata: https://github.com/innanoval/edsa-videolectures-statistics-dataset-1/tree/gh-pages/data
Data sharing
Licensing, data protection, ownership and copyright: The data is published under a CC-BY licence.
If the data cannot be published openly, why? N/A
How will the data be shared? Available to see at the videolectures website; described as part of WP3 deliverables; published on GitHub
Data repository: GitHub / videolectures repository. Proximity to data source.
Dataset Link: https://github.com/innanoval/edsa-videolectures-statistics-dataset-1/tree/gh-pages/data
Archiving and preservation
How long should the data be preserved? The data will be available after the project ends as part of the project's learning materials.
Approx end volume: <1 GB
Who is responsible for the data management and curation? JSI lead data management and curation. OU contribute.
Quality assurance including backup procedures: videolectures - relying on internal quality assurance and backup procedures
Associated costs for data management: Approximately 1 day per month during the project's lifetime
4.5.2 Learning Analytics data generated from the EDSA Online Courses portal
Table 14: Learning analytics data generated from EDSA online courses portal
Dataset Reference and Name
Dataset Identifier: EDSAOnlineCoursesLA
Dataset description
Generated or collected: Generated
Origin: http://courses.edsa-project.eu
Scale: Not yet known
Who is this useful for? Course producers can get an understanding of how their courses are being used. Learners can monitor their learning progress.
Similar existing datasets: Not many Learning Analytics datasets are publicly available. The OU has recently published a similar dataset: https://analyse.kmi.open.ac.uk/open_dataset
Standards and metadata
Methodology for data collection/management: The xAPI specification is used for expressing the data; the open source Learning Locker software is used for storing and visualising the data.
Metadata, supporting material: Introduction to the xAPI (or Tin Can API): https://tincanapi.com/overview/. Introduction to Learning Locker: https://learninglocker.net
Status and location of metadata:
https://tincanapi.com/overview/
https://learninglocker.net
https://alexmikro.github.io/learning-analytics-dataset-from-the-edsa-online-courses-portal/
Data sharing
Licensing, data protection, ownership and copyright: Creative Commons Attribution (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
If the data cannot be published openly, why? The data is published openly.
How will the data be shared? Via the EDSA website / GitHub
Data repository: We have set up a dedicated EDSA Learning Locker. This was chosen for the reasons outlined in https://learninglocker.net/benefits/
Dataset Link: https://alexmikro.github.io/learning-analytics-dataset-from-the-edsa-online-courses-portal/
Archiving and preservation
How long should the data be preserved? At least until the end of the project
Approx end volume: Not yet known
Who is responsible for the data management and curation? OU lead data management and curation.
Quality assurance including backup procedures: Relying on the backup procedures of the OU, as the dataset is hosted on an OU server.
Associated costs for data management: Server storage has already been purchased. Effort for analysing the data has been allocated in Task 3.4.
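The learning analytics dataset is expressed using the xAPI specification. As a rough illustration only, an xAPI statement is a JSON object built around an actor, a verb and an object. The Python sketch below builds one such statement; the learner address, the activity URL and the helper name `make_statement` are hypothetical and are not taken from the EDSA Learning Locker.

```python
import json


def make_statement(learner_email, verb_id, verb_label, activity_id):
    """Build a minimal xAPI-style statement (actor, verb, object).

    All argument values used below are hypothetical examples.
    """
    return {
        "actor": {"objectType": "Agent", "mbox": f"mailto:{learner_email}"},
        "verb": {"id": verb_id, "display": {"en-US": verb_label}},
        "object": {"objectType": "Activity", "id": activity_id},
    }


statement = make_statement(
    "learner@example.org",                            # hypothetical learner
    "http://adlnet.gov/expapi/verbs/completed",       # verb IRI from the ADL vocabulary
    "completed",
    "http://courses.edsa-project.eu/example-module",  # hypothetical activity URL
)
print(json.dumps(statement, indent=2))
```

A learning record store such as Learning Locker receives statements of this shape; real statements usually carry further fields (for example a timestamp or result) that are omitted here for brevity.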
4.5.3 Internal log of eLearning systems
Table 15: Internal log of eLearning systems
Dataset description
Generated or collected: Collected
Origin: videolectures.net
Scale: 20,000 videos, 17,431 lectures, 12,998 authors, 952 events, 579 categories
Who is this useful for? Internal demand analysis
Similar existing datasets: None. Provides evidence of resource usage and a basis for improving curriculum, content and course structure.
Standards and metadata
Methodology for data collection/management: JSON is used for the Videolectures API
Metadata, supporting material: Videolectures REST API documentation
Status and location of metadata: N/A
Data sharing
Licensing, data protection, ownership and copyright: Raw data will be owned by the project and unlicensed. It will not be available for reuse.
If the data cannot be published openly, why? Privacy. Data requires anonymisation and/or aggregation, and at the moment the use case for anonymised data is not clear.
How will the data be shared? Available to see at the videolectures website; described as part of WP3 deliverables
Data repository: videolectures repository. Proximity to data source.
Dataset Link: There is no external link
Archiving and preservation
How long should the data be preserved? At least until the end of the project
Approx end volume: N/A
Who is responsible for the data management and curation? JSI lead data management and curation. OU contribute.
Quality assurance including backup procedures: Videolectures - relying on internal quality assurance and backup procedures
Associated costs for data management: N/A
4.5.4 Statistics of course registration, participation and completion
Table 16: Statistics of course registration, participation and completion
Dataset Reference and Name
Dataset Identifier: StatisticsForCourses
Dataset description
Generated or collected: Collected
Origin: videolectures.net
Scale: For videolectures - available per video lecture, per viewer
Who is this useful for? Internal demand analysis
Similar existing datasets: None. Provides a basis for improving curriculum, content and course structure.
Standards and metadata
Methodology for data collection/management: JSON is used for the Videolectures API
Metadata, supporting material: Videolectures REST API documentation
Status and location of metadata: N/A
Data sharing
Licensing, data protection, ownership and copyright: Raw data will be owned by the project and unlicensed. It will not be available for reuse.
If the data cannot be published openly, why? Privacy. Data requires anonymisation and/or aggregation. It is intended that this data will be published before the end of the project.
How will the data be shared? Available to see at the videolectures website; described as part of WP3 deliverables
Data repository: videolectures repository. Proximity to data source.
Dataset Link: N/A
Archiving and preservation
How long should the data be preserved? At least until the end of the project
Approx end volume: <1 GB
Who is responsible for the data management and curation? JSI lead data management and curation. OU contribute.
Quality assurance including backup procedures: videolectures - relying on internal quality assurance and backup procedures
Associated costs for data management: N/A
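The course statistics are held per video lecture and per viewer, and the plan notes that they require anonymisation and/or aggregation before publication. The plan does not describe how this would be done, so the following Python sketch is only one possible approach under stated assumptions: the record layout, field names and the `min_count` suppression threshold are all hypothetical.

```python
from collections import Counter

# Hypothetical per-viewer records; field names are illustrative only.
views = [
    {"viewer_id": "v1", "lecture": "lecture-a"},
    {"viewer_id": "v2", "lecture": "lecture-a"},
    {"viewer_id": "v1", "lecture": "lecture-b"},
]


def aggregate_views(records, min_count=1):
    """Count views per lecture and drop viewer identifiers.

    Lectures with fewer than `min_count` views are suppressed, so that
    very small groups are not published.
    """
    counts = Counter(record["lecture"] for record in records)
    return {lecture: n for lecture, n in counts.items() if n >= min_count}


print(aggregate_views(views))               # {'lecture-a': 2, 'lecture-b': 1}
print(aggregate_views(views, min_count=2))  # {'lecture-a': 2}
```

Aggregation of this kind removes direct identifiers but does not by itself guarantee anonymity; whether the result is safe to publish would still need to be assessed case by case.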
4.5.5 Aggregated statistics of engagement with the developed courses and educational resources
Table 17: Aggregated statistics of engagement with the developed courses and educational resources
Dataset Reference and Name
Dataset Identifier: AggregatedStatistics
Dataset description
Generated or collected: Generated
Origin: videolectures.net
Scale: For videolectures - available per video lecture, per viewer
Who is this useful for? Internal analysis, demand analysis
Similar existing datasets: None. Provides evidence of adoption and a basis for improving curriculum, content and course structure.
Standards and metadata
Methodology for data collection/management: JSON is used for the Videolectures API
Metadata, supporting material: Videolectures REST API documentation
Status and location of metadata: N/A
Data sharing
Licensing, data protection, ownership and copyright: Raw data will be owned by the project and unlicensed. It will not be available for reuse.
If the data cannot be published openly, why? Privacy. Data that does not raise privacy issues might be publishable.
How will the data be shared? Available to see at the videolectures website; described as part of WP3 deliverables
Data repository: videolectures repository. Proximity to data source.
Dataset Link: N/A
Archiving and preservation
How long should the data be preserved? At least until the end of the project
Approx end volume: <1 GB
Who is responsible for the data management and curation? JSI lead data management and curation. OU contribute.
Quality assurance including backup procedures: Videolectures - relying on internal quality assurance and backup procedures
Associated costs for data management: Approximately 1 day of effort per month
4.5.6 Recorded behaviour of students following the first session of the process mining MOOC
Table 18: Recorded behaviour of students following the first session of the process mining MOOC
Dataset Reference and Name
Dataset Identifier: CourseraMOOCprocmin001
Dataset description
Generated or collected: Collected
Origin: coursera.org
Scale: Several large tables
Who is this useful for? Learning analytics within EDSA
Similar existing datasets: Every Coursera course has this data recorded
Standards and metadata
Methodology for data collection/management: Data collection is managed by Coursera
Metadata, supporting material: There is no external link to the metadata
Status and location of metadata: There is no external link to the metadata
Data sharing
Licensing, data protection, ownership and copyright: Raw data is managed by TU/e and cannot be shared due to Coursera restrictions of use.
If the data cannot be published openly, why? Restrictions of use from the data provider
How will the data be shared? This data will not be published openly
Data repository: The data is collected by and stored on a Coursera repository.
Dataset Link: There is no external link to the data.
Archiving and preservation
How long should the data be preserved? N/A
Approx end volume: Around 1 GB
Who is responsible for the data management and curation? Joos Buijs
Quality assurance including backup procedures: N/A
Associated costs for data management: N/A
4.6 Work package 4 – Dissemination and community building
WP4 has continued to collect data from web server logs and Google Analytics for the project website, as well as social media engagement data from Twitter and LinkedIn. This allows for monitoring of the project's community building and dissemination. Aggregated statistics of the networking and engagement data will be produced and included in D4.4 and D4.5.
4.6.1 Web server logs and Google Analytics of project website access
Table 19: Web server logs and Google Analytics of project website access
Dataset Reference and Name
Dataset Identifier: WebsiteAnalytics
Dataset description
Generated or collected: Collected
Origin: http://edsa-project.eu
Scale: 1 website
Who is this useful for? Internal analysis for dissemination and community analysis. Secondary use for implicit demand analysis.
Similar existing datasets: None. Provides evidence of engagement and a basis for UX improvement.
Standards and metadata
Methodology for data collection/management: Quantitative recording of website traffic via the Google Analytics dashboard, analysed using a variety of analytic tools.
Metadata, supporting material: Sessions, Pageviews, Demographics, User Flow, Bounce rate.
Status and location of metadata: There is no metadata publicly available, as the data is not openly published. All sections that will be used are within https://analytics.google.com/
Data sharing
Licensing, data protection, ownership and copyright: Raw data will be owned by the project and unlicensed. It will not be available for reuse.
If the data cannot be published openly, why? User privacy. The data can be aggregated and published under an open licence. A judgement call will have to be made on whether this is worth it.
How will the data be shared? Analysed data will be made available through deliverable reports in WP4.
Data repository: Internal institutional Soton/OU repositories
Dataset Link: There is no external link
Archiving and preservation
How long should the data be preserved? At least until the end of the project
Approx end volume: <1 GB
Who is responsible for the data management and curation? OU lead data management and curation. Soton contribute.
Quality assurance including backup procedures: Backed up remotely
Associated costs for data management: Free storage. 0.5 day per month
4.6.2 Generated social media engagement data
Table 20: Generated social media engagement data
Dataset Reference and Name
Dataset Identifier: SocialMediaEngagements
Dataset description
Generated or collected: Collected
Origin: Twitter
Scale: 1 Twitter account
Who is this useful for? Internal analysis for community strength and project dissemination.
Similar existing datasets: None that relate to EDSA. Provides evidence of engagement with the project and of the effectiveness of dissemination activities. Provides a basis for understanding what content users find most engaging.
Standards and metadata
Methodology for data collection/management: Regular access of data from analytics.twitter.com
Metadata, supporting material: Tweets, Impressions, Profile Visits, Followers, Mentions
Status and location of metadata: https://analytics.twitter.com/user/edsa_project/home
Data sharing
Licensing, data protection, ownership and copyright: Data will be licensed in compliance with each social network's terms and conditions.
If the data cannot be published openly, why? Data sharing needs to comply with individual site licences. However, the majority of social networks do not permit collection, harvesting and republication of data.
How will the data be shared? Dashboard on EDSA website. Deliverable reports in WP4.
Data repository: Internal institutional Soton repositories
Dataset Link: There is no external link, as the terms and conditions have not yet been checked.
Archiving and preservation
How long should the data be preserved? Until the end of the project
Approx end volume: <1 GB
Who is responsible for the data management and curation? Soton lead data management and curation.
Quality assurance including backup procedures: Backed up remotely
Associated costs for data management: Free storage. 1 day per month
4.7 Work package 5 – Exploitation
WP5 will generate an on-going list of established collaboration initiatives, institutions benefiting from the project and geographical regions using the project's results. The EDSA Register is an additional dataset that comes under this work package.
4.7.1 List of project exploitation results - collaborations, institutional and geographical beneficiaries
Table 21: List of project exploitation results - collaborations, institutional and geographical beneficiaries
Dataset Reference and Name
Dataset Identifier: ProjectExploitation
Dataset description
Generated or collected: Generated
Origin: Project partners
Scale: Variable
Who is this useful for? Internal analysis for results to be exploited and targets
Similar existing datasets: None. Provides data on dissemination activity, network and results.
Standards and metadata
Methodology for data collection/management: Report detailing results from interviews and exploitation activities
Metadata, supporting material: This data will be internal only
Status and location of metadata: This data will be internal only
Data sharing
Licensing, data protection, ownership and copyright: Raw data will be owned by the project and unlicensed. It will not be available for reuse.
If the data cannot be published openly, why? Confidentiality
How will the data be shared? Deliverable reports in WP5.
Data repository: Google Docs shared document
Dataset link: This data will be internal only
Archiving and preservation
How long should the data be preserved? Until the end of the project
Approx. end volume: <500 MB
Who is responsible for the data management and curation? ideXlab lead data management and curation
Quality assurance including backup procedures: Backed up remotely
Associated costs for data management: Free storage. 1 day per month
4.7.2 The EDSA Register
Table 22: The EDSA Register
Dataset reference and name
Dataset identifier: EDSA Register
Dataset description
Generated or collected: Generated
Origin: Project partners
Scale: <500 KB
Who is this useful for? Anyone interested in understanding the datasets used within the EDSA project. Internal management tool.
Similar existing datasets: None.
Standards and metadata
Methodology for data collection/management: Project partners update every three months until the end of the project. ODI responsible for conversion to CSV and publication as open data.
Metadata, supporting material: A README.md file is available detailing the data structure and basic usage.
Status and location of metadata: https://theodi.github.io/european-data-science-academy-register/
Data sharing
Licensing, data protection, ownership and copyright: This dataset is published on GitHub, under a CC-BY licence.
If the data cannot be published openly, why? N/A
How will the data be shared? Via GitHub and via the EDSA website (http://edsa-project.eu/resources/datasets/)
Data repository: GitHub
Dataset link: https://theodi.github.io/european-data-science-academy-register/
Archiving and preservation
How long should the data be preserved? As long as GitHub exists as a minimum. Beyond that a value judgement would have to be made.
Approx. end volume: <500 KB
Who is responsible for the data management and curation? ODI lead data management and curation; other WP1 partners will contribute
Quality assurance including backup procedures: Stored in external repositories - EDSA website and GitHub
Associated costs for data management: Stored in external repositories - EDSA website and GitHub; approximately 2 days per month effort for maintenance.
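As an illustration of the conversion to CSV mentioned in Table 22, the following is a minimal sketch of how register entries might be serialised and read back. The column names (dataset_id, name, work_package, licence) and the function names are hypothetical and chosen only for this example; the actual structure of the published register is the one documented in its README.md.

```python
import csv
import io

# Hypothetical column names, for illustration only; the published
# register's real structure is documented in its README.md.
FIELDNAMES = ["dataset_id", "name", "work_package", "licence"]


def register_to_csv(records):
    """Serialise a list of register entries (dicts) to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Missing keys are written as empty cells; unknown keys raise ValueError.
        writer.writerow(record)
    return buffer.getvalue()


def csv_to_register(text):
    """Parse CSV text back into a list of dicts, one per dataset."""
    return list(csv.DictReader(io.StringIO(text)))


# Illustrative entry built from values stated in Table 22.
example = [
    {"dataset_id": "EDSARegister", "name": "The EDSA Register",
     "work_package": "WP5", "licence": "CC-BY"},
]
csv_text = register_to_csv(example)
```

Keeping one row per dataset and a fixed header makes the quarterly partner updates easy to compare between versions in a Git repository; whether the real register follows this layout is an assumption.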