Criteria and evaluation of research data repository platforms @ the University of Pretoria, South Africa
Presented by Mr. Johann van Wyk & Mr. Isak van der Walt, Library Services, University of Pretoria
Project team: UP IT: Karin, Yzelle, Herman; UP Library: Isak, Johann, Heila
Agenda
• Project Scope & project team
• Research data lifecycle
• E-Research Framework
• Product Investigation
• Criteria & evaluation
• Recommendations
• Next Steps
• Documents produced
Project Scope
The scope of the project was to evaluate products (commercial and open source) that could be utilised as a Research Data Repository Platform as part of a total Research Data Management (RDM) solution at UP.
A total RDM solution includes all phases of the research data lifecycle; for the repository solution, however, the focus was on identifying a potential solution for the “Dissemination” phase of the research data lifecycle.
RDM Repository Project Team
Business Sponsor - Prof Stephanie Burton (VP: Research)
ITS Sponsor - Andre Kleynhans (Deputy Director: ITS)
Project Team members:
ITS Project Manager and Business Analyst - Karin Meyer
ITS Infrastructure Architect - Dr Yzelle Roets
ITS eResearch Support Manager - Herman Jacobs
Library Services: Senior IT Consultant - Isak van der Walt
Library Services: Assistant Director: RDM - Johann van Wyk
Library Services: Deputy Director: Strategic Innovation - Dr Heila Pienaar
PROCESSES within the RESEARCH DATA LIFECYCLE
[Figure: Research Data Life Cycle, with the stages Creating Data, Processing Data, Analysing Data, Preserving Data, Giving Access to Data and Re-using Data]
Product Investigation Methodology
Finalisation of product evaluation criteria
• Consulted with various stakeholders:
  • Library and ITS staff
  • External stakeholders at the NEDICC workshop held at the CSIR
  • Peer universities
• Used selection criteria from other institutions, e.g. Leeds University, Texas Digital Library and the RDA RPRD IG Matrix (http://tinyurl.com/RPRD-matrix), as a basis and adapted them according to UP-specific requirements.
Product Short Listing
Products were shortlisted based on the following:
• A scan of products being used internationally, and
• The most commonly used products at universities similar to UP (in size and research activity).
Product Evaluation
• UP’s formal Request For Information (RFI) process was followed
• A product evaluation criteria list was compiled and sent to shortlisted vendors together with standard RFI documentation
• The requested information was received from the vendors and prepared for scoring, and
• Products were scored and evaluated.
Evaluation Criteria
• Functional/Business criteria: Deposit and Upload; Re-Usability; Identity and Access Management; Reporting; Discovery; Preservation
• Non-Functional: Repository Architecture; Data Management; Data Governance
• Technical aspects: Back-end Management; Integration; Infrastructure
• Vendor specific: Support, Training, Usage of Product
• Performance requirements
• Integration requirements
Unique ID | Requirement Description | Priority
DU-1 | Offer a customisable metadata schema per research area or discipline (including mandatory fields). | H
DU-2 | Offer indexing of metadata. | H
DU-3 | Offer sufficient support for geospatial and journal article metadata. Support association of single or multiple files with one metadata record. | H
DU-4 | Upload and store metadata at a data object level, where a data object is a folder that contains one or more files. | M
DU-5 | Support multiple file types and formats of data, e.g. MS Excel 2007, MySQL database, raw data file from a Campbell CR10 data logger, any multimedia, etc. | H
DU-6 | The system should have a simple process for uploading large (multi-TB) datasets, potentially consisting of thousands of files. Must have the ability to upload large datasets (e.g. 2 MB, 2 GB, 1 TB). | H
DU-7 | Support controlled lists for some metadata fields, either held locally or drawn from an external source, e.g. subject vocabularies. | H
DU-8 | Support customisation of out-of-the-box help text and provide context-sensitive feedback for the depositor, e.g. highlight missing metadata fields, file upload failure alert. | M
DU-9 | Accommodate a workflow where data needs to be destroyed, with an approval process and audit trail. | L
DU-10 | Researchers must be able to submit data to the repository themselves. | H
DU-11 | Support a process for submitting data to the repository from other systems/instruments. | H
DU-12 | Ability to batch upload data into the repository. | H
DU-13 | A third party must be able to upload a dataset on behalf of a researcher. | H
DU-14 | Support generation/labelling of persistent unique identifiers for datasets, including DOIs. | H
DU-15 | Ability to support submission of data to the repository at any research stage (i.e. Initial Data, Working Data and Final Data stages). | M
DU-16 | Explain how user interface customisation is achieved. | H
DU-17 | Out-of-the-box user interface is intuitive (easy to use) for users. | M
DU-18 | Out-of-the-box user interface meets accessibility requirements, e.g. W3C WCAG 1. | H
DU-19 | Assignment of Intellectual Property (IP) rights and multiple content licensing options is possible, with terms and conditions exposed clearly to human and machine re-users, such as copyright and Creative Commons (CC). | H
Table 1: Deposit and Upload functional criteria
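The results later report a percentage fit per product (85%, 96% and 65% on the requirements criteria), but the slides do not state how the H/M/L priorities in Table 1 fed into those figures. One common approach is a priority-weighted average, sketched below under stated assumptions: the weights H = 3, M = 2, L = 1 and the 0 to 1 score scale are hypothetical, as is the `weighted_fit` helper; none of them come from the UP evaluation.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical numeric weights for the H/M/L priorities in Table 1;
# the evaluation itself did not publish its weighting.
PRIORITY_WEIGHTS: Dict[str, int] = {"H": 3, "M": 2, "L": 1}


@dataclass(frozen=True)
class Requirement:
    uid: str       # unique ID, e.g. "DU-1"
    priority: str  # "H", "M" or "L"


def weighted_fit(requirements: List[Requirement], scores: Dict[str, float]) -> float:
    """Return the priority-weighted percentage fit of one product.

    `scores` maps a requirement ID to a value in [0, 1]
    (0 = not met, 1 = fully met); IDs without a score count as 0.
    """
    total_weight = sum(PRIORITY_WEIGHTS[r.priority] for r in requirements)
    if total_weight == 0:
        return 0.0
    achieved = sum(
        PRIORITY_WEIGHTS[r.priority] * scores.get(r.uid, 0.0) for r in requirements
    )
    return 100.0 * achieved / total_weight


if __name__ == "__main__":
    # Three Table 1 requirements with their listed priorities;
    # the product scores are invented for illustration only.
    reqs = [
        Requirement("DU-1", "H"),
        Requirement("DU-4", "M"),
        Requirement("DU-9", "L"),
    ]
    product_scores = {"DU-1": 1.0, "DU-4": 0.5, "DU-9": 0.0}
    print(f"{weighted_fit(reqs, product_scores):.1f}% fit")
```

With these illustrative inputs the script prints `66.7% fit` (3 + 1 + 0 out of a possible 6). Weighting by priority means a missed H requirement lowers the fit three times as much as a missed L requirement, which is the intent of a priority column.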
Shortlisted Products & RFI Feedback
Product | Vendor/Implementation Partner | RFI Feedback
DSpace | Atmire | Received information on criteria list, proposed implementation options and their associated cost.
Figshare | Digital Science | Received information on criteria list, proposed implementation options and their associated cost.
Islandora | discoverygarden | Received information on criteria list, proposed implementation options and their associated cost.
Dataverse | Harvard University | Received insufficient information on criteria list, implementation options and cost.
PURR | Purdue University | Failed to respond to RFI.
Redbox | Queensland Cyber Infrastructure Foundation (QCIF) | Received information on criteria list, but Redbox is only a metadata repository and not a data repository.
Implementation options with the most important advantages/disadvantages: Option 1
Option 1 - Locally hosted (both application and storage are locally hosted at UP)
Advantages:
• UP is not dependent on the internet for access to the application
• UP is able to manage its own data
• Compliance with legal requirements regarding data, i.e. the POPI Act
• Lower security risk (UP controls its own storage)
Disadvantages:
• Resources must be provided (including infrastructure and human resources for the application and storage), which increases cost
• Required skill set (e.g. web skills) is limited or not currently available in ITS
• UP bandwidth will cause restrictions, e.g. on indexing of the site
• Open-source product: no legal entity/responsible company for assistance, support, enhancements, new releases, etc.
Implementation options with the most important advantages/disadvantages: Option 2
Option 2 - Hybrid (the application is cloud hosted, while the storage is locally hosted)
Advantages:
• Collaboration with other institutions in future is easier
• No additional resources (HR or infrastructure) are required for the application
• A legal entity is responsible for the application
• Geographic redundancy
• High availability on the UP front end, with no bandwidth constraints
• Metadata as well as data will always be available, searchable and able to be indexed
• UP will be in control of its IP (UP controls its own storage)
• Security risk will be lower (UP controls its own storage)
Disadvantages:
• Resources must be provided, including infrastructure and human resources for storage as well as RD, backups, access control, cooling, etc.
• Required skill set (e.g. web skills) is limited or not currently available in ITS
• Indexing of the site is dependent on UP’s bandwidth
Implementation options with the most important advantages/disadvantages: Option 3
Option 3 - Fully cloud-based (both the application and storage are cloud hosted through the vendor)
Advantages:
• Collaboration with other institutions in future is easier
• No additional resources (HR or infrastructure) are required for the application
• A legal entity is responsible for the application
• Geographic redundancy
• High availability on the UP front end, with no bandwidth constraints
• Metadata as well as data will always be available, searchable and able to be indexed
Disadvantages:
• UP does not have control of its IP (governance of and accessibility to UP’s data is in the hands of the vendor)
• Possible future sanctions against some countries may result in users in other parts of the world not being able to reach UP’s repository
• Growing running cost, as UP will have to pay for uploading and downloading as well as storage of data
Product Evaluation Results
Criteria | Figshare | Islandora | DSpace
BEE | All products and associated vendors/implementation partners are internationally based; therefore no weight was assigned in the scoring exercise.
Requirements criteria (incl. functional, non-functional, vendor) | 85% fit | 96% fit | 65% fit
Pricing | | |
Preferential criteria: Hybrid option (Option 2) | 100% fit | 10% fit (only available through extensive custom development, which poses huge risks to UP) | 0% fit
Preferential criteria: Consortial pricing | 100% fit | 0% fit | 0% fit
CONFIDENTIAL
Recommendations
The following is recommended for implementing a Research Data Repository platform solution at UP:
• Figshare should be considered as the product of choice
• Implement the Hybrid implementation option, with the application being cloud hosted and local storage of 20 TB to start with
• Local storage can be supplemented in future with cloud storage
• Storage should be investigated in line with the total eResearch initiative and framework of UP
• A business owner needs to be identified to be responsible for a total RDM implementation
• Implementation of a Research Data Repository platform will require a significant increase in human and infrastructure resource components, and
• Consortial pricing can be kept in mind for the future, but was not used as a determining selection criterion.
Next Steps
• Appoint a business owner(s) for a total RDM solution
• Investigate tools that can support the Research-in-Process phase, e.g. myTardis
• Finalise the storage solution (e.g. African Research Cloud)
• Business case to secure resources (financial and human)
• Implementation of the repository solution
• Training of researchers & library staff
Gap analysis: Figshare (obtained 0 on these criteria)
Functional criteria:
• Must be able to change data formats, although Figshare is agnostic to most formats.
• Auto-generate preservation metadata, e.g. PREMIS.
• Ability to migrate files in datasets to new/other formats over time.
• Be compliant with the OAIS (Open Archival Information System) reference model.
Non-functional criteria:
• Offer de-duplication of data and metadata.
Disadvantages:
• The annual subscription fee for Figshare is relatively high
• Customisation is not possible, as it is a proprietary product
• Being proprietary also limits customisation of the product’s look and feel to reflect more of UP’s footprint, and
• No local support exists within South Africa.
Documents
• UP Research Data Repository Evaluation
• UP Research Data Management Business Requirements Specification
• Executive summary
• RDM Project Progress Feedback
• Context Diagram for RDM
• Islandora, Figshare, Redbox, DSpace, Dataverse, PURR requirements criteria feedback documents