April 1, 2016
DIGHUMLAB – A PROFILE PAPER
By Birte Christensen-Dalsgaard
DIGHUMLAB
Draft version 2.0
This is a draft version of a profile paper, which is intended as a basis for a discussion with stakeholders.
The paper contains my reflections formulated during my first three months as project manager for DIGHUMLAB. The objective is two-fold: to identify issues which need to be clarified in relation to the profile of DIGHUMLAB, and to establish a common framework for the discussions between the digital humanities community and myself.
Version history
Version no.  Date        Author  Status                                                           Changes
1.1          2016-02-02  BCD     Draft for management group
1.2          2016-02-23  BCD     Draft for comment by management group and secretaries for deans  Major changes resulting from discussion at management group meeting Feb 8.
1.3          2016-03-01  BCD     Draft to be used at advisory board meeting                       Minor changes based on comments
2.0          2016-04-01  BCD     Draft for general discussion                                     Minor changes after presentation for the Theme leaders, the Advisory Board and the Steering Group
CONTENTS
Introduction
Resumé
Vision, impact and governance
    Impact
    Governance
The political and scholarly landscape
    Digital Humanities and the technological evolution
    Crowdsourcing/citizen science
Data for digital humanities
Service profile
    Completeness of tools
    Services – international inspiration
DIGHUMLAB, a digital ecosystem
    DIGHUMLAB terminology
Existing Services in DIGHUMLAB
    DIGHUMLAB tools
    DIGHUMLAB-Datasets
    Other digital resources
References
INTRODUCTION
Digital humanities is an umbrella term for theories, methodologies, and practices related to humanities scholarship that use the digital computer as an integrated and essential part of its research and teaching activities. The computer can be used for establishing, finding, collecting, and preserving material to study, as an object of study in its own right, as an analytical tool, or for collaborating, and for disseminating results.
(Niels Brügger, 2016)
DIGHUMLAB is the infrastructure supporting digital humanities in Denmark [1]:
The vision for DIGHUMLAB is to rejuvenate fields of research within the humanities and social sciences through:
• Broad access to digital sources and research data
• Development of software-supported analysis methods
• Collaborative forms of work and new research concepts
• Internationalisation of classic specialist research skills and emerging interdisciplinary fields of research
DIGHUMLAB will enhance and facilitate digital humanities in Danish research, thereby contributing to greater interdisciplinary cooperation, widespread knowledge transfer and global orientation and increased internationalisation of both research and education.
This formulation is very broad and, as will be discussed later, would benefit from a reformulation and a sharper focus. The reformulation will be a process, and the present report is a first step towards obtaining greater clarity about, and agreement on, the breadth and depth of activities.
The approach has been to carry out a literature study, not only in digital humanities but also in neighbouring fields like digital scholarship, e-science, digital commons and open science, to hold discussions with stakeholders, and to look into the technological trends and how they influence science. In a second step, labs will be visited, and experience from these will influence the future development.
Many books and articles exist on the topic of digital humanities, and this report is a result of reading some of them. The report aims to be neither a review of existing debates on digital humanities nor a scientific paper. All claims made in this report have been found in what I consider well-established publications; the responsibility for putting the different aspects together is mine.
This report will introduce and cover the important aspects needed to describe DIGHUMLAB. A more in-depth version, where some of the points are expanded and illustrated, will be presented later in a separate document.
[1] http://dighumlab.com/about/vision-mission-goals/
RESUMÉ
DIGHUMLAB is a partnership between institutions that share a common vision of stimulating research and education in digital humanities and are prepared to share the cost of maintaining a platform supporting this vision.
DIGHUMLAB is a digital ecosystem supporting digital humanities. It consists of:
• A technical and content infrastructure of digital objects, software and capture facilities connected via common standards for communication and classification
• Support facilities based on tutorials and training
• Sharing of policies and practices, e.g. via expert knowledge networks
• Community activities stimulating the dialog with, and uptake of, new research questions, as well as broader involvement (like citizen science)
All elements are presented in a context. This is achieved through links between the objects and/or service elements of DIGHUMLAB and the work of research teams in Denmark. The research teams behind the published case studies or the research activities have agreed to assist other users on how to proceed or get started via a common support infrastructure.
The provenance of tools and services referred to as part of the DIGHUMLAB ecosystem varies. Some are developed and maintained by the partners, others are developed and maintained as part of DIGHUMLAB and, finally, some are both developed and maintained as DIGHUMLAB. As an example, libraries maintain much of the content, and many tools are developed and maintained by research communities or commercial companies.
This mix of responsibilities needs to be reflected in the sustainability model for DIGHUMLAB.
The role of DIGHUMLAB as a tool and content promoter/provider is to ensure a relevant, adequate selection set in a context. In particular, to:
• Participate in and supplement the national and international development in selected areas, developing missing components of special interest for the Danish community.
• Promote and mobilise the work of the international communities in a Danish context, e.g. via the case stories.
• Create guides and tutorials to tools and content which are used by students and researchers at the Danish universities.
• Work towards creating a richer content base for digital humanities scholars by exposing existing digital material residing in archives, museums and other content-collecting institutions and/or removing barriers to using it.
As with biological ecosystems, the digital ecosystem needs to adapt to external forces like the changing directions of research activities, new strategic priorities of the universities and a new political situation. The present content and tool selection of DIGHUMLAB reflects the research activities of the present members of DIGHUMLAB. As many have pointed out, there are many areas not addressed by DIGHUMLAB, such as research around smart cities, research in digital history, technology-enhanced learning, etc.
DIGHUMLAB will need to develop mechanisms both for evaluating existing activities around tools, content and services, to ensure that DIGHUMLAB represents the needs of the user community, and for stimulating the development of new ideas. The decision on activities to be terminated or initiated lies with the steering group.
VISION, IMPACT AND GOVERNANCE
It is rare that people openly write about their failures; fortunately Quinn Dombrowski, the project manager of a very ambitious infrastructure project in many ways similar to DIGHUMLAB, has done so.
In a recent paper, “What Ever Happened to Project Bamboo?” (Dombrowski 2014), Dombrowski analyses the different phases of the humanistic cyber-infrastructure initiative Bamboo and offers suggestions as to the cause of its termination. Below are some quotes from the paper's conclusions:
Prior work on social science infrastructure development suggests that Bamboo’s mode of engagement – bringing together people from the scholarly, technology, and library communities after Bamboo had a conceptual and technical trajectory, while nonetheless expecting ‘participatory design’ – would be a source of tension. Indeed, the wide range of responses to the initial technology-oriented proposal put Bamboo in a bind. Technologists and some librarians tended to see it as important and necessary, while many scholars felt that their needs lay elsewhere entirely. Changing scholars’ minds would not be quick; as noted in Ribes and Baker (2007), ‘conceptual innovation is an extended process: one cannot simply make claims about the importance of . . . [e.g. cyberinfrastructure] and expect immediate meaningful community uptake’.
And later:
From the early planning workshops to the Mellon Foundation’s rejection of the project’s final proposal attempt, Bamboo was dogged by its reluctance and/or inability to concretely define itself. In the early days, avoiding a concrete definition was motivated by a desire for the project to remain flexible and responsive to its community.
And finally:
Perhaps, the greatest impediment to Bamboo’s success was the lack of a shared vision among project leaders, development teams, and communications staff. In the beginning, Bamboo had multi-university cross-professional teams whose members faced challenges in communication and culture but helped one another understand Bamboo’s goals in more nuanced ways. During the development phase, teams were formed on the basis of profession and institution, each one working according to their own status quo, with little connection to a bigger picture. The Bamboo planning project asked participants ‘what’s in it for you?’ – an important consideration often overlooked in consortial efforts. Without a shared vision to counterbalance the pull of self-interest, a complex multi-faceted project like Bamboo becomes little more than a funding umbrella for individual initiatives.
Learning from the experience of Bamboo, DIGHUMLAB should make sure to:
- Have a clear vision
- Have a clear definition of its goals and expectations
- Have a clear mandate, and
- Have a continuous dialog between stakeholders, and listen to each other
The response to these points will be very much in focus over the coming six months.
IMPACT
Digital humanities is a relatively new and growing discipline, which will require investments at a time of cuts at the universities, research libraries and memory institutions. DIGHUMLAB therefore needs to articulate its potential in terms of savings (or smaller investments) and how it contributes to social impact.
The collaborative effort of DIGHUMLAB will serve to maximise the relevance of, and return on, the technology investments. By sharing the cost of maintaining the platform, the cost for the individual university will be reduced compared to carrying the operational cost for many uncorrelated services.
A recent report, “Evaluating and monitoring the Socio-Economic Impact of Investment in Research Infrastructures” (technopolis, 2015), points to the potential socio-economic impacts of DIGHUMLAB, such as:
- The combination of digital and humanistic skills in students, which might lead to increased employment
- Joint ventures with companies, which may contribute to innovation activities
And to the scientific impact, such as:
- New knowledge
- Scientific visibility
- Broader research collaborations
With a combined effort and unified voice, DIGHUMLAB can approach companies, innovation networks and/or companies like Alexandra to form partnerships and to facilitate the potential common benefits.
GOVERNANCE
A key to success for DIGHUMLAB is its ability to ensure the relevance of all its activities and to create visible results and value for the research community. To ensure this, DIGHUMLAB needs:
- A mandate for a more strategic role in relation to digital humanities in Denmark
- A mechanism for ensuring the right portfolio of activities
- A commitment from the institutions to implement solutions
- An organization which can support the formulation of e.g. external funding and marketing strategies and can support e.g. the documentation of societal impact.
DIGHUMLAB will need a permanent governance structure; however, many issues still need to be addressed, and I propose to use the program governance model for the activities, which may lead to a permanent organization.
Moderniseringsstyrelsen has defined a program as follows [2] (translated from Danish): A program is a temporary, flexible organisational structure established to coordinate, direct and monitor the execution of a group of mutually dependent projects and activities, in order to deliver a combined business case in the form of results and benefits that support the same vision.
[2] Website of Digitaliseringsstyrelsen: http://www.digst.dk/
The present infrastructure project, DIGHUMLAB, will follow the new governance structure where it makes sense (e.g. the split between management and strategy). New activities, such as DIGHUMLAB2, will fall under the new governance structure.
Acceptance of this proposal will result in few, but significant, immediate changes:
To ensure the relevance and local implementation of DIGHUMLAB solutions, a new governing “science” board will be created with representation from all the stakeholders: the universities, the libraries, the memory institutions and relevant interest organizations. Just as the business board in a company has the responsibility to ensure value, the science board has the responsibility to ensure that DIGHUMLAB addresses relevant problems and develops relevant solutions, and that these are implemented at the institutions.
The present members of the management group could be part of the science board, but it will be supplemented with representatives from the digital humanities strategy boards at the individual universities, with members from e.g. Dariah and Clarin and with representatives from ongoing SIGs.
To ensure progress in the individual projects, the management group will focus on progress, achievement of goals and follow-up on expenditure.
The steering group will remain unchanged.
To facilitate short decision routes and to support strategic activities such as the formulation of and follow-up on societal impact, fundraising activities etc., the members of the steering group will appoint a “deputy”, who will assist the secretariat.
THE POLITICAL AND SCHOLARLY LANDSCAPE
DIGHUMLAB is one of a series of national and international initiatives and large projects initiated to stimulate research in the area of e-science and digital scholarship.
DIGHUMLAB will coordinate its activities with DeiC to ensure coordination of tasks and will respond to initiatives around data management and use of computer facilities. DIGHUMLAB and DeiC will seek common solutions to challenges such as secure deposit of research data, portal solutions and legal challenges.
DIGHUMLAB has a national mandate to ensure coordination with activities in the European ERICs CLARIN ERIC and Dariah ERIC. These coordinating activities will be mobilized in connection with the proposed science board.
DIGHUMLAB is influenced by strategic initiatives and trends such as:
• Open Science, which works towards an open dialog on all levels between scientists. This initiative also involves, among other things, open source, open data and open access. In 2014 the Danish Ministry of research and education published an open access strategy stating that in 2022 all should have access to all public research publications [3]
• The code of conduct (and requirements from some journals) about reproducibility
• Political focus on dialog with the innovation networks and industries
• Collaboration is the norm, which leads to requirements for tools to support the collaboration in terms of communication tools, common access to analysis tools and platforms, authoring tools etc.
DIGHUMLAB will work on being an active player in the open science movement by supporting open access, by requiring software it develops itself to be open source, and by promoting and supporting open data. DIGHUMLAB will collaborate with the actors in this field and, when solutions emerge, will work on implementing these.
DIGITAL HUMANITIES AND THE TECHNOLOGICAL EVOLUTION
As explained by e.g. Niels Brügger (2016), digital humanities has been through different phases, in which computers and digital material have become more and more integrated into the research methodology.
As has been demonstrated in numerous areas, the uptake of new technologies goes through a number of phases: it starts with using the technology to solve problems more efficiently, passes through a phase where the technology is used to augment the processes, and ends where the new technology is totally immersed in the problem formulation and solution.
One area where this development has been studied quite intensely is education. In a UNESCO report [4] one reads: “Studies of ICT development in both developed and developing countries identify at least four broad approaches through which educational systems and individual schools proceed in their adoption and use of ICT. The four approaches, termed emerging, applying, infusion and transformation, represent a continuum ….”
In the accompanying figure [5], the interplay between the technological uptake and the pedagogical principles is highlighted.
[3] “To create free access for all citizens, researchers and companies to all research articles from Danish research institutions financed by public authorities and/or private foundations.” (translated from Danish) from http://ufm.dk/forskning-og-innovation/samspil-mellem-viden-og-innovation/open-access/billeder-og-filer/danmarks-nationale-strategi-for-open-access.pdf
[4] http://unesdoc.unesco.org/images/0012/001295/129538e.pdf
[5] From an OECD report: http://www.oecd.org/edu/ceri/Technology-Rich%20Innovative%20Learning%20Environments%20by%20Jennifer%20Groff.pdf
Another very illustrative presentation has been produced by Jeroen Bosman and Bianca Kramer from Utrecht University Library (Bosman and Kramer, 2015). They have investigated which tools are being used in the research process and have identified a number of workflows, as shown in the figure below.
The above is interesting not only because it illustrates the changing nature of scholarly work but also because some of the tools rely on the wisdom of your peers. Mendeley, which was originally meant for storing references, has become a discovery tool based on the collections of peers, and ResearchGate, which originally was just a site for publishing your results, now has tools actively promoting research and is therefore used as an outreach mechanism.
CROWDSOURCING/CITIZEN SCIENCE
The wisdom of peers and amateurs is behind a growing activity, namely crowdsourcing and citizen science.
On the digital humanities website from Cambridge, one can read the following definition:
Researchers in wide range of academic disciplines have begun to experiment with ‘crowdsourcing’ - creating or mobilising online communities of volunteers to assist them in their research. … academic crowdsourcing projects may have more in common with other kinds of participatory research initiatives. ‘Citizen science’, for example, which is described in a recent report by the UK Environmental Observation Framework, as “the involvement of volunteers in science”, is sometimes used interchangeably with ‘crowdsourcing’ as a label to describe research projects which involve an online, open call for participation by the public.
A pioneer in the area is Zooniverse, www.zooniverse.org, where one might participate in the transcription of the works of Shakespeare or the letters of artists. Activities at Zooniverse have been studied by Sauermann and Franzoni (2015), who analyzed more than 12 million daily observations of users who participated in the seven projects hosted at the citizen science site Zooniverse.org. They found that more than 100,000 citizen scientists contributed over 129,000 hours of unpaid time, with an estimated value of over $1.5 million, for the first six months of just the seven projects studied.
Crowdsourcing is already used a lot in Denmark, e.g. in archives and libraries, to improve data quality. The National Archive uses crowdsourcing to transcribe e.g. census data, and local archives have managed to get a lot of registration cards transcribed. The Royal Library has involved users in accurately locating farms.
As is the topic of many articles, citizen science can really be a win-win scenario, both from an outreach point of view and from an engagement point of view. The scientists get a lot of free, quality assistance, and the population gets insight into problems and ownership of the research outcome.
DIGHUMLAB will look into the use of citizen science.
DATA FOR DIGITAL HUMANITIES
The exponential growth of storage, of computer power, and of information in general will completely change our possibilities. As especially Kurzweil (Ray Kurzweil, The Singularity Is Near: When Humans Transcend Biology, and http://www.kurzweilai.net/the-law-of-accelerating-returns) has pointed out, the exponential growth of the digital world contrasts with the inherently linear growth behind the biological laws governing human evolution. Humans can't grasp the entirety. Technology works here to aid humans in two very different ways:
- Big data, where computer power is used to analyze and visualize massive amounts of data to extract patterns and trends
- Semantic development, where information (and things) understand each other and can act accordingly
Some of the dilemmas between traditional humanities and digital humanities concern methodology. They are best illustrated by contrasting a number of terms from the literature that are fairly similar but are used by different communities:
• Close reading versus distant reading [6]: Traditionally, research in the humanities has been based on so-called “close reading”, referring to the careful reading and interpretation of selected objects, in contrast to so-called distant reading, which relies on analysis of massive amounts of aggregated information.
• Small data versus big data [7]: The two lead to very different research questions and methodologies, which I return to below.
[6] Term introduced by Moretti, see http://www.digitalhumanities.org/dhq/vol/8/1/000171/000171.html
[7] See e.g. blog post by Tim Hitchcock: http://historyonics.blogspot.dk/2014/11/big-data-small-data-and-meaning_9.html
• Structured versus unstructured: The major reason for the growth of data right now is the dramatic increase in so-called unstructured data. One operates with three categories [8]:
  o Well structured (or just “structured”), which amounts to 5-10% of the total amount of data, is defined as data/information which can be stored in classic relational databases, where the intended interpretation of every field is explicitly encoded in the database by column and row headings
  o Semi-structured, accounting for another 5-10%, is information that doesn't reside in a relational database but that does have some organisational properties that make it easier to analyse
  o Unstructured, which is the remaining 80-90%, may be defined as the direct product of human communication. Examples include natural language documents, email, speech, images and video. It is information that was not specifically encoded for machines to process but rather authored by humans for humans to understand
Unstructured data is everywhere. In fact, most individuals and organizations conduct their lives around unstructured data. Just as with structured data, unstructured data is either machine generated or human generated.
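As a minimal sketch of the three categories described above, the same piece of information can be held as a structured table row, as a semi-structured JSON record, or as unstructured free text. The record, the field names and the sentence are invented for illustration and are not taken from any DIGHUMLAB dataset. Only the first two forms can be queried by field name; the free text needs some form of text analysis first, here a deliberately naive regular expression.

```python
import json
import re
import sqlite3

# Structured: a relational table; the meaning of each value is fixed by its column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE census (name TEXT, birth_year INTEGER, parish TEXT)")
conn.execute("INSERT INTO census VALUES (?, ?, ?)", ("Ane Jensen", 1843, "Aarhus"))
structured_year = conn.execute(
    "SELECT birth_year FROM census WHERE name = ?", ("Ane Jensen",)
).fetchone()[0]

# Semi-structured: no fixed schema, but keys still label the values.
record = json.loads('{"name": "Ane Jensen", "born": {"year": 1843}, "notes": ["widow"]}')
semi_structured_year = record["born"]["year"]

# Unstructured: written by humans for humans; the year has to be extracted heuristically.
letter = "Ane Jensen, born in the year 1843, moved to Aarhus after her husband died."
match = re.search(r"\b(1[0-9]{3})\b", letter)
unstructured_year = int(match.group(1)) if match else None

print(structured_year, semi_structured_year, unstructured_year)
```

The regular expression only works because the sentence happens to contain a single four-digit number; real unstructured material would need far more robust language processing, which is part of why it is the hardest category to exploit.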
Working with big data is about reducing the complexity of the research objects and increasing their quality, as illustrated below.
At the other end of the scale, still addressing the potential overflow of data, is “Linked data”, a term coined by the inventor of the web, Tim Berners-Lee, in connection with his introduction of the semantic web. The Semantic Web isn't just about putting data on the web.
[8] The numbers are from https://jeremyronk.wordpress.com/2014/09/01/structured-semi-structured-and-unstructured-data/ and fit with the figure above.
The really important point of linked data is that the data are described in a structured manner, making it possible for machines to access and use them; they are machine actionable.
Using these structured information resources will both reduce redundancy of factual information and allow for a greater degree of automation of certain workflows. An easy-to-understand example is the use of Creative Commons licenses. Here a piece of code associated with e.g. each picture tells crawlers such as Google about the rights associated with the digital object. Google (or others) can then make a search interface where I, as a user, can ask to get only pictures which can be re-used.
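The Creative Commons example can be sketched in a few lines. The sketch assumes that the license is declared with the common HTML convention of a link carrying rel="license"; the page fragment, the class name LicenseLinkParser and the helper functions are hypothetical and only illustrate how a crawler might read such a declaration, not how any particular search engine does it.

```python
from html.parser import HTMLParser


class LicenseLinkParser(HTMLParser):
    """Collect the targets of links marked up with rel="license"."""

    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # rel may hold several space-separated values, or be missing entirely.
        rel_values = (attributes.get("rel") or "").split()
        if tag in ("a", "link") and "license" in rel_values and attributes.get("href"):
            self.licenses.append(attributes["href"])


def find_licenses(html):
    parser = LicenseLinkParser()
    parser.feed(html)
    return parser.licenses


def declares_cc_license(html):
    """True if the page declares any Creative Commons license.

    Simplification: individual licenses differ in what re-use they permit
    (e.g. non-commercial or no-derivatives clauses), which is ignored here.
    """
    return any("creativecommons.org/licenses/" in url for url in find_licenses(html))


# Hypothetical page fragment for illustration only.
page = """
<figure>
  <img src="farm.jpg" alt="Farm">
  <a rel="license" href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a>
</figure>
"""
print(find_licenses(page))
print(declares_cc_license(page))
```

The point of the sketch is that the rights statement is data a machine can act on: a search interface can filter on it without a human ever reading the page.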
Another class of data is the identification/authority information, such as:
- Unique identifiers for research output: DOI, DanPID
- Registry of unique researcher identifiers: ORCID
- Virtual International Authority File (VIAF)
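Identifiers of this kind are designed to be checked by machines. As a small sketch, the last character of an ORCID iD is a check character computed from the first fifteen digits with the ISO 7064 MOD 11-2 algorithm, so a mistyped identifier can often be caught before it is stored. The function names below are illustrative, and the identifiers are sample values used only to exercise the code.

```python
def orcid_check_character(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Check the length, the digits and the check character of an ORCID iD."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_character(compact[:15]) == compact[15]


print(is_valid_orcid("0000-0002-1825-0097"))  # well-formed sample identifier
print(is_valid_orcid("0000-0002-1825-0098"))  # last character altered
```

A passing check only shows that the identifier is well formed; whether it belongs to the intended researcher still has to be confirmed against the ORCID registry.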
Based on the discussions in this section:
- DIGHUMLAB works with big data to investigate the potential for this line of research.
- DIGHUMLAB supports, when possible, the Linked Open Data (LOD) paradigm.
- DIGHUMLAB will encourage activities around small data (databases, 2D, 3D).
SERVICE PROFILE
COMPLETENESS OF TOOLS
A major challenge in defining the activities for DIGHUMLAB is the degree to which software should result in ready, easy-to-use tools and services. The more effort you put into developing a piece of software, the easier and cheaper it is for the audience to use. Often, however, as is the experience from cloud computing, this comes at the price of fewer degrees of freedom. In cloud computing one distinguishes between different levels of cloud service provision, from Infrastructure as a Service (IaaS) as the most primitive, over Platform as a Service (PaaS), to end-user-friendly Software as a Service (SaaS), which normally can only be used in one way and with little freedom to change the behavior.
DIGHUMLAB relies on a technical infrastructure of networks, computers and infrastructures for e.g. authentication and authorisation provided by DeiC and the different IT departments.
The platform and software layer for DIGHUMLAB also consists of software developed in-house and programs for specific tasks. Some of the tools are end-user friendly and can be used by anyone; others require programming expertise. DIGHUMLAB will need to make a choice between using IT competence to develop the tools and using IT competence to assist researchers in building on components from around the world.
I propose the following strategy for developments:
• Tools intended for many users should be made easy to use. Among these tools are content gathering tools and some of the analysis and visualisation tools. No or little IT knowledge is required to use these tools.
• For more specific solutions to research questions, the aim should be well-documented tools with well-described interfaces. To use these, one needs programming skills. To support researchers without the required IT competence, DIGHUMLAB should have a pool of IT developers, who after application can be assigned to a specific project for 3-12 months (see the remark by Donald Waters below).
SERVICES – INTERNATIONAL INSPIRATION
Numerous initiatives exist to support research in “digital humanities”. They have names like digital humanities lab, humanities lab, digital scholarship centers, digital commons, etc. In a report from a workshop arranged by the Coalition for Networked Information (CNI) and attended mainly by librarians, Donald Waters, Senior Program Officer for Scholarly Communications at The Andrew W. Mellon Foundation, provided his observations on the workshop’s presentations and discussions [9]:
His comments focused on the need to view the support of digital scholarship in an institutional context, not narrowly in a library context. In his view, digital humanities centers and faculty institutes are often the places that bring new ideas in – they are at the leading edge of developments; in contrast, the kinds of centers this workshop focused on allow new tools, methods, and infrastructure to move from the edge to the center, making those things available to more individuals and to a broader range of disciplines than the faculty institute serves. As DSCs develop, they will always have more demands on them than resources, and Waters suggested that developing a peer review process that emphasized such factors as the benefit of the project to the institution’s mission, the potential applications of the outcomes of the project to other scholarship, the potential inter-institutional dimensions of the project, and the development of robust tools that could be used in teaching and learning environments as well as in research, would assist in identifying priority projects.
From the European scene I want to highlight three types of initiatives:
CHASE (http://www.chase.ac.uk/), a consortium for the Humanities and Arts in South East England, works on new approaches to doctoral research. It offers training programmes and works on identifying placements for students. Members and partners are universities and memory institutions. They also work with a Network Development Fund, which could be a model for the future of SIGs within DIGHUMLAB.
[9] https://www.cni.org/wp-content/uploads/2014/11/CNI-Digitial-Schol.-Centers-report-2014.web_.pdf
Physical facilities run by universities or libraries, where relevant hardware and software equipment (computers, video equipment, eye tracking, audio recording, big data analysis, etc.) is installed and where researchers and students can experience and be trained in different aspects of digital methodologies. Here both Umeå and Lund give very good examples (see http://www.humlab.umu.se/en/about/humlab/ and http://www.humlab.lu.se/en/about/). What seems to characterise these centers is the presence of a driving person, who ensures the development of lab facilities through deals with the university and through focused use of grant money.
Registries, mainly driven by libraries, which attempt to provide an entry point to the wealth of tools available for digital humanities research, or an entry point for identifying data (such as Clarin) or tools (such as DiRT, http://dirtdirectory.org, maintained by the centerNet organization). Utrecht has developed a registry containing descriptions of tools developed in-house, with examples and information on the developer.
I observe:
- A need to stimulate project creation and exchange of ideas
- A need for a peer review process to govern the use of support resources
- A need to assist students and researchers in exploring new possibilities
- A need to use project money to develop and expand the array of services, to ensure continued relevance of the facilities
For DIGHUMLAB I propose:
• To create a common registry of tools used in digital humanities in Denmark, where the actual use scenario is central: either in the form of a real-life use case, or with tools and/or digital objects shown as part of a workflow. The use case or the workflow can stem from a research question or from a learning situation.
• Together with the science board, to arrange a yearly “dating” event, where ideas are pitched and where panels will challenge these. The objective is to create synergy between similar activities and to identify new areas of activity.
• To establish a peer review process to distribute IT development resources, if it is decided to create a pool of such expertise.
• To work on creating a membership model based on the model of CHASE, where funding organisations, universities and information companies partner to offer support for competence building, e.g. through placement grants in connection with PhD studies.
DIGHUMLAB, A DIGITAL ECOSYSTEM
The EU defines research infrastructure [10] as: facilities, resources and related services used by the scientific community to conduct top-level research in their respective fields, ...
The facilities in DIGHUMLAB are equipment and software, the resources are the digital objects, and the related services are the support, training, outreach and administrative structure built around the facilities.
As has already been described, the technological landscape changes fast, with exponential growth rates in all parameters. This has consequences for the infrastructure, which can't be one big monolithic system; it has to be dynamic and adjust to the changing needs of the research community. This is the idea behind a digital ecosystem, a concept now used for conferences, books and papers and described in detail by Briscoe (Briscoe, 2009) in his PhD thesis “Digital Ecosystems”.
Wikipedia defines a digital ecosystem as a distributed, adaptive, open socio-technical system with properties of self-organisation, scalability and sustainability inspired from natural ecosystems. Digital ecosystem models are informed by knowledge of natural ecosystems, especially for aspects related to competition and collaboration among diverse entities (https://en.wikipedia.org/wiki/Digital_ecosystem).
The components of DIGHUMLAB are digital objects, services and support, developed in response to real research questions. Interoperability is ensured through adherence to exchange standards for metadata, for the objects, for the software and for communication. Usability is ensured through user involvement in future developments, through manuals and/or training material and through an expert support system.
[Figure: Digital Ecosystem, with the six categories Objects, Practice, Community, Experts, Tutorials and Tools]
[10] https://ec.europa.eu/research/infrastructures/index_en.cfm?pg=what
The figure above is inspired by a similar figure with just four categories from the Yale Digital Collections Center (http://ydc2.yale.edu/digital-ecosystem) and captures the components of the DIGHUMLAB digital ecosystem.
Following the six categories in the figure, I therefore propose the following description of DIGHUMLAB:
DIGHUMLAB as a digital ecosystem advancing research in digital humanities and consisting of:
A tool area consisting of
• Software (preferably open source) with well-described APIs and with metadata for discovery
• Selected external (and internal) services, which can be invoked according to standards
• Capture tools
A digital object area consisting of
• Licensed data and their associated metadata
• Open data adhering to standards and with metadata for discovery (preferably adhering to the standards behind Linked (Open) Data)
Tutorials consisting of
• Videos/demonstrators illustrating the use
• Workflow descriptions
• Training courses
A shared policies and practice area consisting of
• Information on ethical questions (DIGETIK)
• Network for best practice in rights and privacy (find correct title)
A community area of
• Special interest groups addressing emerging themes
• Participation in national networks
• Communities on the Web and on Facebook
• Collaboration through international fora
And finally, experts agreeing to support the research (and educational) community through activities such as
• A technical helpdesk
• Advice on similar problems (e.g. purchase of video capture tools)
The individual cells of the ecosystem will be grouped according to properties such as the nature of the dataset, the research activity supported by the tool and the underlying technique used.
An additional concept introduced in DIGHUMLAB is “in context”: no tool and no dataset will be exposed without being set in a context, which can happen via a case story, via a paper based on the dataset and/or tools, or via a description of a workflow.
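The “in context” rule can be read as a simple publication check on catalogue entries. The following is a minimal sketch only, assuming a hypothetical catalogue model: the class names `Context` and `Entry`, the function `can_expose` and the example references are illustrative assumptions, not an existing DIGHUMLAB system.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical context kinds, mirroring the three routes named in the text:
# a case story, a paper, or a workflow description.
CONTEXT_KINDS = {"case_story", "paper", "workflow"}


@dataclass
class Context:
    kind: str        # one of CONTEXT_KINDS
    reference: str   # free-text pointer to the story, paper or workflow

    def __post_init__(self) -> None:
        if self.kind not in CONTEXT_KINDS:
            raise ValueError(f"unknown context kind: {self.kind}")


@dataclass
class Entry:
    name: str
    category: str    # "tool" or "dataset"
    contexts: List[Context] = field(default_factory=list)


def can_expose(entry: Entry) -> bool:
    """An entry may be exposed only if it has at least one context."""
    return len(entry.contexts) > 0


def exposed(entries: List[Entry]) -> List[str]:
    """Names of the entries that pass the in-context check."""
    return [e.name for e in entries if can_expose(e)]


# Illustrative entries; the context reference is invented for the example.
elan = Entry("ELAN", "tool", [Context("workflow", "video transcription workflow")])
bare = Entry("Unnamed dataset", "dataset")
```

With these two entries, `exposed([elan, bare])` keeps only ELAN, because the dataset has not yet been given a case story, paper or workflow.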
In the ecosystem, some data/objects and some processes will be connected to form workflows. These workflows, together with case studies, will constitute the basis for knowledge sharing and can act as the key to connecting interested students/researchers to experts.
As pointed out by, among others, Carol Goble (Goble, 2008), the construction of relevant workflows makes it easier for students and researchers to get started and get results.
DIGHUMLAB TERMINOLOGY
The glue of the digital ecosystem takes many forms, from protocols and frameworks for system-to-system communication, through standards for wrapping the objects, to the way these objects are described.
In the short term, a taxonomy for grouping the tools and objects will be useful.
Since Dariah is a part of DIGHUMLAB, I think we need to explore the adaptability of the TaDiRAH [11] taxonomy, also because it has been used to classify the tools in DIRT.
TaDiRAH operates with three areas: Activities, Objects and Techniques. DIGHUMLAB will investigate the possibility of using the TaDiRAH taxonomy developed by Dariah to describe the activities, and of using this approach to unify the ecosystem.
DIGHUMLAB may investigate the use of NeMO [12] to describe the digital objects.
To get a full classification, a methodology will be needed for techniques; my investigations up to now have not revealed a good candidate.
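To make the three-area classification concrete, the sketch below groups a few of the tools mentioned in this paper by research activity. It is an illustration under stated assumptions: the record type `Classified`, the function `group_by_activity` and the assignment of each tool to an activity and object type are my own examples, not an agreed DIGHUMLAB classification. The technique field is left empty because no scheme for techniques has been chosen.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional


class Classified(NamedTuple):
    name: str
    activity: str             # TaDiRAH-style research activity (assumed label)
    object_type: str          # kind of object the tool works on (assumed label)
    technique: Optional[str]  # None: no technique scheme has been selected


def group_by_activity(items: List[Classified]) -> Dict[str, List[str]]:
    """Group tool names under their research activity, sorted by name."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for item in items:
        groups[item.activity].append(item.name)
    return {activity: sorted(names) for activity, names in groups.items()}


# Illustrative assignments only; a real classification would be agreed
# with the tool owners.
EXAMPLES = [
    Classified("HTTrack", "Capture", "Web pages", None),
    Classified("ELAN", "Capture", "Video", None),
    Classified("CST lemmatiser", "Enrichment", "Text", None),
    Classified("EZArchive", "Storage", "Research output", None),
]

GROUPS = group_by_activity(EXAMPLES)
```

Grouping by a single shared field is deliberately simple; the same records could equally be grouped by object type once a vocabulary such as NeMO has been evaluated.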
EXISTING SERVICES IN DIGHUMLAB
The students and researchers of the participating institutions of DIGHUMLAB can freely use and seek advice on the tools and services described below.
DIGHUMLAB TOOLS
The following tools have been developed as part of, or have been embedded in, DIGHUMLAB:
Tools which have been developed by members of DIGHUMLAB and which DIGHUMLAB has a responsibility to maintain:
CLARIN infrastructure tools for:
• Linguistic annotations (POS-Tagging openNLP, Brill's PosTagger, Name Recogniser, CST lemmatiser, Bohnet's parser) and for
• Conversion (TEI P5 tokenizer, TEI P5 segmenter, CoNLL converter)
LARM infrastructure giving access to more than 1 million radio and television programs and supporting the following features:
• Collaboration
• Shared annotations and enrichment of the material
• Access to all material in Mediestream (newspapers, broadcasts and manuscripts for news broadcasts)
[11] http://tadirah.dariah.eu/vocab/index.php
[12] http://nemo.dcu.gr
Lab facilities to video-capture events and to annotate these:
Mobile video-capture facility, based on GoPro camera
Video Research Lab in Aalborg
Video Editing Lab in Kolding
The following open source and/or commercial tools are used by members of the DHL community:
EZArchive – a tool developed by ChaosInside, which can be used for storing research output (no tutorial available).
A set of tools for gathering web-based material, explained and used in a workshop on creating web archives:
• Single pages: (WebSnapper, Paparazzi)
• Sound and video: (Video DownloadHelper, Musicbox, WireTap Studio, Videobox)
• Screen recording: (SnagIt)
• Collecting entire websites: (HTTrack)
A set of tools for working with video, with tutorials explaining how to use them:
• Transcription tools: (CLAN, ELAN, Praat)
• Video-editing tools: (Final Cut Pro, Premiere Pro)
A tool for document design
• InDesign
The tools of DIGHUMLAB are visualized in the figure below, where the TaDiRAH taxonomy is used as a basis for characterizing the tools:
Supporting tools provided by others:
- DanPID; KB
- Support for publishing journals (national initiative supported by the Ministry of research and education): KB
- WAYF – authentication
- DataCite (DOI for datasets)
- ORCID – identifiers for researchers
Most of the DIGHUMLAB tools have been developed as open source and have been submitted to GitHub.
DIGHUMLAB-DATASETS
Most of the data is open access; however, a fair amount of the contemporary material is closed, as privacy concerns hinder open access to the material.
CLARIN:
- General language corpus, not public (18,893 files)
- DK-CLARIN General language corpus, public part, texts from the Danish Parliament
- DK-CLARIN Language for Special Purposes (LSP) corpus, consisting of texts from seven selected domains. It comprises 11M tokens from the period 2000-2010, complementing the existing Danish general language corpora
- Knowledge for Everyman Corpus of Danish texts from the period 1500-1750 (86 files)
- RAPID, Danish Press Releases from the European Commission 1993-2003 (5329 files)
- JRC-Acquis (2306 files)
- Danish Financial Reports, 2002-2010 (90 files)
- A series of three different material types related to
  o Talkbank conversations (7 sound files, 13 video files and 21 annotated collections)
  o Interviews with young students (14 sound files, 46 video files and 16 annotated collections)
  o Conversation among students (16 sound files, 21 video files and 4 annotated collections)
- Structured data (owl, csv): DanNet
- Jydsk ordbog (TEI P5)
Netarchive
The Netarchive has harvested the Danish Internet since 2005, when the Danish Legal Deposit Law was changed to include this type of material. The task is undertaken by the two legal deposit libraries in Denmark: the State and University Library, Aarhus, and the Royal Library in Copenhagen. The Netarchive contains more than 10 billion documents.
Radio and television collection
The State and University Library, Aarhus, hosts the national media collection. Owing to the library's focus on digitisation, it now contains close to 2 million programs:
• Continuous harvesting of radio and television since Jan. 1, 2006
• Retro-digitisation of television (MPEG-1, MPEG-2, H.264)
• Retro-digitisation of DR radio tapes from ca. 1920 (in collaboration with DR) (WAV, BWF, MP3)
• 52,000 commercials
OTHER DIGITAL RESOURCES
The above are the main resources used in DIGHUMLAB. Many more resources are available, and future work will focus on identifying relevant digital material for the humanities. A full registration of digitised collections can be found at “Dansk Kulturarv digitaliseret”, a portal maintained by the Royal Library providing access to retro-digitised material.
Some of the collections that have been mentioned as being of interest for digital humanities are listed below. However, many more are available and relevant, so the list is in no way exhaustive.
Digitised newspapers (SB)
The State and University Library hosts the newspaper archive. More than 13 million pages have been digitised and OCR’ed up to now and are available via Mediestream.
Digitised Literature (KB)
The Royal Library is responsible for large collections of manuscripts, letters and books. Some of these have been digitised and published on the web, and some have been associated with the MARC record of the library system.
Most of the early material (up to 1700) is available (freely from Denmark) via Early European Books.
Works of 78 major out-of-copyright Danish authors have been published as “Arkiv for dansk litteratur”, a collaboration between KB and DSL. The approximately 2,100 pages are available via www.adl.dk.
Digitised Sheet Music (KB)
The Royal Library is responsible for large collections of sheet music. More than 7,000 pieces of these have been digitised and are searchable via the library system.
Retro-digitised maps and pictures
A large collection of aerial photos has been retro-digitised by the Royal Library and is made available via the portal “Danmark set fra luften – før Google”.
Kierkegaard (Kierkegaardcentret, KB), Holberg (DSL, KB) and Grundtvig (Grundtvigcentret):
The complete works and correspondence of three of the major Danish authors are available as annotated text via three different sites:
Søren Kierkegaards Skrifter, www.sks.dk, the complete works of Kierkegaard,
Ludvig Holbergs Skrifter, www.holberg, the complete works of Holberg and
Grundtvigs værker, www.grundtvigsværker.dk, the complete works of Grundtvig.
Digitised material from Rigsarkivet (National Archive)
REFERENCES:
Bosman, Jeroen and Kramer, Bianca 2015 ‘101 Innovations in Scholarly Communication’, see e.g. https://innoscholcomm.silk.co or https://figshare.com/articles/101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow/1286826
Briscoe, Gerard 2009 PhD thesis “Digital Ecosystems” http://arxiv.org/pdf/0909.3423.pdf
Brügger, Niels 2016 ‘Digital Humanities’, in The International Encyclopedia of Communication Theory and Philosophy, ed. Klaus Bruhn Jensen, Wiley-Blackwell 2016.
Dombrowski, Quinn 2014 ‘What Ever Happened to Project Bamboo?’ Literary and Linguistic Computing, vol. 29, no. 3, September 2014, pp. 326–339. doi: 10.1093/llc/fqu026
Franzoni, Chiara and Sauermann, Henry 2014 ‘Crowd Science: The Organization of Scientific Research in Open Collaborative Projects’, Research Policy, vol. 43, issue 1, pp. 1-20
Goble, Carol 2008 Invited talk at ECDL 2008