This chapter is part of “The SAGE Handbook of Web History”, edited by Niels Brügger and Ian Milligan. This version is subsequent to peer review but before type-setting and proofing, so don’t cite this, cite the version of record, available from https://uk.sagepub.com/en-gb/eur/the-sage-handbook-of-web-history/book252251 This version can be reused under CC-BY-NC-ND 4.0: https://creativecommons.org/licenses/by-nc-nd/4.0/

Collecting Primary Sources from Web Archives: A Tale of Scarcity and Abundance

Federico Nanni

Data and Web Science Group

University of Mannheim

The diversity of historical testimonies is almost infinite.

(Bloch, 1949)

The World Wide Web is the largest collection of human testimonies that we have ever had at our fingertips. Spanning from institutional websites to digital libraries, from personal blogs to Twitter accounts of prominent politicians, from online newspapers to large-scale knowledge bases, an immense number of born-digital testimonies is waiting to be retrieved, selected and studied by future historians. In addition to this, while these new resources are piling up steadily in front of our eyes, they are also rapidly replacing their analogue counterparts, from printed news articles to personal diaries, from letter correspondences to scientific publications.

Acknowledging this sudden transition in production from printed to digital documents, this chapter presents and discusses some of the new methodological issues that arise when these materials are to be employed as primary sources for studying the recent past. Firstly, an overview of the debate on the historian’s craft is offered. Then, two different case studies that have dealt with the difficulties of adopting born-digital materials in historical work will be described: the first is focused on reconstructing the past of university websites as a new way of studying the recent past of academic institutions; the second retrieves materials from large-scale archives of the web in order to study contemporary socio-political events. Through these descriptions, it will be highlighted how a fruitful combination of historical methods with approaches from other research areas, such as internet studies and natural language processing, could support future historians in successfully addressing these issues.

1. The Historical Method: Today and Tomorrow

In order to understand how the transition from analogue to digital sources is about to change the historian’s craft, it is first of all essential to examine how the ‘historical method’ (Shafer, 1974) is generally defined and what its major steps are.

Defining a Subject

In the first part of any historical research, the scholar broadly defines the subject of investigation and - together with it - an initial question. The research question, firstly presented at a coarse-grained level, will be sharpened through the recursive process of collecting sources, interpreting them and by doing so discovering the underlying narrative.

Collecting the Evidence

In order to address the research question, the historian identifies the testimonies upon which she/he builds a narrative through a complex process of collection, analysis and selection of the remains of the past. These testimonies could be physical remains (e.g. buildings, statues), oral memories, printed documents (e.g. chronicles, diaries, articles, census data) and will soon include born-digital documents, such as websites, online forums, email threads, large-scale databases, etc. The process of collecting primary sources has been shaped and sharpened by decades of discussions in historiography both on how to establish the reliability of these materials, for example through source criticism, and on how much ‘true knowledge’ can be derived from them (there are as many interpretations of the same text as there are readers, as Barthes (1967) has taught us).

Interpreting the Evidence

The interpretation of the collected textual sources represents the core of any historical research. For this reason, it has been the central focus of debate across 20th century historiography and has experienced drastic transitions in methodology. As a matter of fact, the analysis and interpretation of sources can be conducted in many different ways: traditional historiographical scholarship has relied strongly on hermeneutics and on the careful qualitative examination of documents, while other approaches - which emerged during the second part of the 20th century inspired by social science methodologies (see the advent of Cliometrics; Greif, 1997) - have employed census data or economic reports in order to conduct large-scale quantitative analyses.

Through the ’70s and the ’80s, postmodern and deconstructionist theories (starting from the works of Barthes, 1967; Derrida, 1967 and Lyotard, 1979, among others) have posed major critiques to the underlying assumption of both traditional and social-science historical scholarship that it is possible to discover a ‘unique truth’ about the past through careful analysis of the remains. The enormous impact of these critiques has been remarked on by many historians (Munslow, 2006; Burke, 2008) and has led to the so-called cultural turn in the profession, which is still reflected strongly today in the community.¹

Presenting a Narrative

The final step of any historical research is to define a narrative and write a history. The creation of a narrative, which is highly connected with the initial definition of the research question, gives the historian the possibility of placing the work she/he is writing as part of a larger contribution to the field. This is achieved in two interconnected ways: first of all, by offering a new/different perspective on the topic under study; in addition to this, by participating in the larger debate in historiography regarding the ways the past can be re-discovered, examined, described and - for certain authors² - even modelled.

1.1 A Computational Turn of the Craft?

History has been part of the so-called digital humanities (Schreibman et al., 2004) since their very beginning.³ In particular, during the second part of the 20th century the potential of computational methods and their impact on the historian's craft have been recurrent topics in historiography. As Thomas III (2004) remarked, already in 1945 Vannevar Bush, in his famous essay ‘As We May Think’, pointed out that technology could be the solution that would enable us to manage the abundance of scientific and humanistic data (Bush, 1945); in his vision, the Memex could become an extremely useful instrument for historians.

The use of the computer in historical research, which grew significantly between the ’60s and the ’70s thanks both to the efforts of the Annales school (see for example Daumard and Furet, 1959) and to its application to the analysis of economic and census data (Greif, 1997), has been strongly related to the adoption of social science practices in historical studies (Evans, 2001). A pioneering work on the use of database technologies for historical research was conducted by Manfred Thaller during the ’80s (Thaller, 1991).

However, as Milligan (2012) and Robertson (2016) have already remarked, a large majority of the historian community has remained skeptical towards the adoption of computational methods in the craft. This attitude has consolidated in contrast to other humanities disciplines: for example, in the last thirty years the field of literary studies has experimented widely with the potential of what it has defined as ‘distant reading’ techniques, in order to extract quantifiable information from large amounts of text (Moretti, 2013). Instead, during the same time, the so-called digital history community (Cohen et al., 2008) has decided to focus primarily on the potentialities of the Web as a platform for the collection, presentation, and dissemination of material (Cohen and Rosenzweig, 2005) and on the more ‘communicative aspects’ of doing research in the humanities (Robertson, 2016). This can be noticed by observing the importance given to digital public history topics (Noiret, 2015), the relevance of teaching in digital history (Cohen et al., 2008) and the tradition of digital history mapping (Knowles and Hillier, 2008).

In the second part of the 2000s, thanks in particular to the prompt availability of digitized historical primary sources and the potentialities of web technologies, this skeptical attitude towards computational methods has slowly changed and a few interdisciplinary teams have developed tools in order to help other traditionally trained historians to employ these methods in their work. As Nelson (2016) remarked, the first fruitful applications of these methods for supporting historical narratives can be found in the works of Wilkens (2013) and Blevins (2014), which are robust examples of the beginning of a mature season of digital history.

While these early works of scholarship based on the use of computational approaches are essential for refreshing the historiographic debate, it is argued in this chapter that the adoption of computational methods cannot be considered per se a revolutionary turning point for the profession. In fact, the use of these approaches is similar to other methodological turning points that historians have already experienced before (Milligan (2012), for example, identifies ‘three waves’ of computational history); moreover, during the last ten years the use of computational methods in humanities research has been strongly sustained and encouraged by public and private institutions (from the NEH Digital Humanities Advancement Grants to the Volkswagen Stiftung on ‘Mixed Methods’ in the Humanities) as well as private companies (e.g., Google’s ‘commitment’ to the Digital Humanities) and often mainstream media sources (Rothman, 2014).

Nevertheless, it is argued in this chapter that historiography is about to experience a new and far more conspicuous turning point and that this will have a very strong impact on a specific step of the historian’s craft, namely the way sources are collected from now on. Born-digital documents shared online, together with their ephemerality, preservation, availability and access, are about to pose a large set of new challenges for future historians. In the next decades, the methodological debate in historiography will not only be centered around qualitative over quantitative, distant versus close, hermeneutics against statistical significance, but it will also address the needs of the community in finding ways of acquiring knowledge on our recent (digital) past.

1.2 The born-digital turn

The transition from analogue to born-digital materials is influencing the way historians study the past: materials such as websites, forums, blogs, tweets and emails are in fact very different compared to traditional analogue and digitized primary sources. Born-digital materials have an extremely short life compared to printed documents as they are significantly more difficult to archive and preserve (LaFrance, 2015). This is due to a vast number of reasons (Brügger, 2005) and the consequence of it has been summarized by Rosenzweig (2003) with the concept of ‘scarcity’ of digital primary sources. Web pages disappear constantly from the live web (because they are removed by the author or by the owner of the platform, for instance due to copyright issues), leaving a familiar trace of 404 status code messages. Several scholars (Rosenzweig, 2003; Brügger, 2012 among others) have already remarked on the great impact that the ephemerality of web materials will have on the sharing and accessibility of the knowledge produced in the digital age for the next generations of historians. As has already been said, in contrast to the fact that ‘paper survives benign neglect for a long time’ (Davis, 2014):

The life cycle of most web pages runs its course in a matter of months. In 1997, the average lifespan of a web page was 44 days; in 2003, it was 100 days. Links go bad even faster. A 2008 analysis of links in 2,700 digital resources – the majority of which had no print counterpart – found that about 8 percent of links stopped working after one year. By 2011, when three years had passed, 30 percent of links in the collection were dead. (LaFrance, 2015)

Moreover, while some types of pages disappear more frequently than others (e.g. social media messages as opposed to official statements on administrative websites), those that do survive tend to change very frequently (Dougherty et al., 2010). For example, articles in newspapers (Nanni, 2013) as well as official administrative pages have often been modified without a specific mention (Owen and Davis, 2008). While initiatives such as the Internet Archive have a long tradition of preserving born-digital materials for future research, several issues still exist and new issues continue to emerge - not least due to constant innovations in web technologies. Therefore, researchers have to deal with the collected materials in a highly critical way, as Brügger (2012) described when he introduced his definition of web archive documents as reborn-digital materials:

One of the main characteristics of web archiving is that the process of archiving itself may change what is archived, thus creating something that is not necessarily identical to what was once online. [...] And, second, that a website may be updated during the process of archiving, just as technical problems may occur whereby web elements which were initially online are not archived. Thus, it can be argued that the process of archiving creates the archived web on the basis of what was once online: the born-digital web material is reborn in the archive. (Brügger, 2012)

The difficulties in the preservation of digital sources present a new set of issues for historians who plan to employ them in their work; however, they remain only part of the overall problem. In fact, already in 2003, Rosenzweig envisioned that future historians will not only deal with a consistent scarcity of primary sources, but will also be challenged by a never-before-experienced abundance of records of our past. The indispensable need for computational methods for processing and retrieving materials from these huge collections of primary sources has been a central topic of Milligan's publications (2012, 2016). From his works it emerges that, now that the community is dealing with the abundance of born-digital sources, the use of computational approaches is no longer merely a choice for the digital humanities researcher. Therefore, it becomes essential that researchers adopt these solutions critically, always knowing their potential and limitations, and learn how to combine them fruitfully with the traditional historical method.

While the consequences of the advent of born-digital sources will be revolutionary for our profession, so far ‘very little attention has been paid to the new digital media as historical sources’ (Brügger, 2012), highlighting the fact that, while ‘new media is not that new anymore’ (Milligan, 2016) for our society, they remain a novelty for historians.

The next sections will remark further on this topic by describing two very different case studies that have dealt with the use of born-digital documents as primary sources for historical research. The first focuses on examining the online presence of the University of Bologna since the early 1990s and remarks on the importance of combining the traditional historian’s craft with approaches from the field of internet studies.

2. Studying the Recent Past of Academic Institutions: A Tale of Scarcity

Multiple historians have considered academic institutions as political, economic and social actors; they have also shown how their power, role and influence changed over time, especially in relation to other actors, such as the city, the church and the national government (Brockliss, 1978). In particular, the comprehensive four-volume book series ‘A History of the University in Europe’, commissioned by the European University Association, edited by Hilde de Ridder-Symoens and Walter Rüegg and published between 1992 and 2011, offers an unprecedented overview on how universities have transformed over centuries: what they have taught and researched, how they have been institutionalized and how they have interacted with society.

Historians of higher education who presented their research in these volumes have adopted a large variety of primary and secondary sources in their works, from university-archive materials such as matriculation and graduation statistics to academic dissertations, from public reports to large-scale statistical analyses. Based on these data, researchers have described and drawn conclusions on the history of universities on a large variety of topics, such as the way universities have managed resources, the way the admission process changed before and after 1970, and how sciences and humanities have been taught and studied.

The current prompt availability of a large variety of born-digital materials such as syllabi (Cohen, 2005), bachelor, master and doctoral theses (Ramage, 2011), academic websites (Holzmann et al., 2016b) and their hyperlinked structure (Hale et al., 2014) is about to become a new relevant component of this field of research (Nanni, 2017b).

An emblematic example of the new challenges that born-digital documents will pose to historians of higher education is a study on reconstructing the recent past of the University of Bologna through its digital sources (Nanni, 2017a).

The University of Bologna's website (Unibo.it), initially created in 1993, represents a new category of relevant resource for historians of higher education. The website collects and offers to the reader a large variety of documents, from descriptions of educational projects to overviews of research groups, from reports of collaboration with international institutions to information on opportunities for interaction with the private sector. In addition, it also shows how different departments, professors and research teams have been adopting the web – especially in its early days. Among the many relevant examples, one that deserves special mention is that the Astronomy Department of the university was already sharing preprints of its publications online in 1994 as HTML pages, in an early attempt to benefit from the potential of the World Wide Web.

Nevertheless, while Unibo.it represents a useful collection of primary sources, the website has been modified several times during its first twenty years and the majority of the pages that have been published in the past are not available anymore on the live web. In particular, the transition to the so-called ‘Portale D’Ateneo’, which started in the early 2000s, required that all department pages change their structure and adopt a common layout and organization of their content. This has often forced the creation of brand-new department subdomains and the removal of the previous versions of the same from the live web. As an additional issue, the team that has managed the website during this entire transition has not consistently archived the previous versions of the website or documented their work.

Given the fact that as of 2017 the National Libraries of Florence and Rome are still not part of the International Internet Preservation Consortium (IIPC) and no coordinated project with the specific purpose of preserving the national web sphere currently exists in Italy, the Internet Archive remains the only resource available for recollecting all the materials that are not available on the University of Bologna website anymore. However, in 2002 a removal request⁴ from the administrative team of Unibo.it was sent to the Internet Archive, and for this reason Unibo.it had been inaccessible through the Wayback Machine for more than thirteen years. This highly complex situation reflects a new level of difficulties that future historians will encounter while attempting to collect born-digital sources. In the next section, an overview of the variety of sources and methods that have been used to deal with this issue and to reconstruct the past of Unibo.it will be presented.

Library and Archive Materials

As an initial step of the research, materials available in the university library and archives were consulted. Among many other documents, a very useful source has been the university yearbook. In the early ’90s only a few pieces of information regarding the website were mentioned in the yearbook; nevertheless, this source offered an initial diachronic overview of the official teams that were managing Unibo.it and was useful for drawing a list of people to interview.

Interviews

In order to capture the rationale and the changing architecture of the website, the different teams who managed the website were interviewed, together with technicians and researchers who worked on the development of the pages of various departments, especially during the ’90s. An interesting finding, presumably highly relevant for future historians, was that many times during the interviews the subjects used public and private backups of emails in order to recollect the memories of their experience in working on Unibo.it and to confirm passages of the historical reconstruction.

Newspapers

As already done in previous work (Brügger, 2011), where printed media were used to retrieve information about the web of the past, information related to Unibo.it and the role of the website for the University of Bologna has been identified in local and national newspaper archives. During the ’90s, newspapers such as La Repubblica and Il Resto del Carlino published a few short articles covering the new functionalities on the website (e.g. free email accounts for all students, online fee payments, etc.). These publications, together with materials collected from the university digital magazines (Alma2000, AlmaNews, UniboMagazine), offered an additional overview on how the university decided to promote the website to its audience.

Online Forums

To get a closer look at the everyday use of the website by students and researchers, other materials have been collected and analyzed, starting from student forums (e.g. UniversiBo) and Usenet discussions preserved by Google. These documents, especially in the ’90s, present the perspective and enthusiasm of a rather small but specific subset of the university community, namely students, researchers and professors in STEM fields, whose departments were among the first ones to offer access to the web.

Live Web Materials

While the website has been restructured multiple times during its first 20 years online, many resources are still available on the live web and can reveal the current role of the website in the university's organization and management (e.g. attracting national and international students and researchers, promoting collaborations with the private sector, etc.). Additionally, the social media pages of the institution (such as Facebook, YouTube and Twitter profiles) are becoming key components of its presence online, showing alternative and more informal ways of interaction with the users.

Presence of Italian Websites in Other National Web Archives

Aside from the Internet Archive, since 1996 national libraries from all around the world have also begun to preserve their national web past. PANDORA, started in 1996 by the National Library of Australia, the UK Web Archive (2004), the Netarkivet (2005) in Denmark and the Portuguese Web Archive (2011) are just a few examples of this international endeavor. Given the complexity of defining and preserving what is called a ‘national web-sphere’ (Brügger, 2009), this research also explored the use of foreign web archives as a proxy for studying Unibo.it. The practice of retrieving primary sources related to an Italian university website in foreign web archives could seem rather odd, as the goal of a national web archive is precisely to preserve the web of its country; however, from time to time part of the non-national web also ends up being preserved, unintentionally, by these digital archives.

For example, to archive national web spheres in an automatic way, archivists could set up crawlers with a specific set of starting points and a maximum number of hyperlinks they can follow from them. A crawler which is set to go at most ten links away from one of these URLs could also end up crawling non-national content, as it will systematically follow all the hyperlinks. For this reason, if the University of Bologna were to organize a Summer School and Aarhus University linked to it from its website, the University of Bologna website (or at least part of it) would be unintentionally preserved in the Danish Web Archive.
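
The depth-limited crawl described above can be sketched as a breadth-first traversal with a hop budget. This is a minimal illustration, not the configuration of any actual national web archive: the link graph, the page names and the limit of three hops are all invented for the example (the text mentions a limit of ten).

```python
from collections import deque


def crawl(link_graph, seeds, max_depth):
    """Breadth-first crawl that follows links at most max_depth hops from any seed.

    Returns a dict mapping each reached page to its hop distance from a seed.
    """
    depth = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        if depth[page] == max_depth:
            continue  # hop budget exhausted: do not follow this page's links
        for target in link_graph.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


# Hypothetical link graph: a Danish seed page leads, via a summer-school
# listing, to pages of an Italian university.
toy_web = {
    "au.dk": ["au.dk/summer-schools"],
    "au.dk/summer-schools": ["unibo.it/summer-school"],
    "unibo.it/summer-school": ["unibo.it"],
    "unibo.it": ["unibo.it/astronomy"],
}

archived = crawl(toy_web, seeds=["au.dk"], max_depth=3)
foreign = sorted(url for url in archived if not url.startswith("au.dk"))
print(foreign)
```

In this toy run the crawler never checks a page's country: two Bologna pages fall within the hop budget and are captured, while a third, one hop further away, is not. A real crawl would capture foreign content in the same partial and accidental way.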

As part of this work, it was found that both the Portuguese (Arquivo) and Danish (Netarkivet) web archives have preserved parts of Unibo.it several times since 2006.

Cloned Versions of the Website

Among the variety of sources available, one deserves a specific mention. In May 2007, a group of activists decided to create a copy of the Unibo.it web interface, as part of a protest against the European Credit Transfer and Accumulation System (ECTS) for the evaluation of the number of hours of study. At the URL http://www.unibologna.eu an identical version of the website was available, with a description of the reasons for the protest.

This source has been important in this study not only because it documented an innovative way of conducting a protest against an academic institution (by targeting its website), but also because the cloned website was preserved by the Internet Archive.

A Critical Combination of Sources and Methods

The combination of traditional archival practices with approaches from the field of internet studies is essential in the attempt to face this emblematic example of scarcity of born-digital primary sources and to reconstruct the past of the University of Bologna website. This new methodology for collecting born-digital evidence has been especially useful in identifying the narrative behind the early years of Unibo.it, which involves the arrival of a Turkish professor from the United States at the university in 1988, the establishment of the second Italian node to the Internet and the creation of arguably one of the most relevant university websites of the country.⁵

While the difficulties in reconstructing the recent past of a university website could surprise the reader, as less than 30 years have passed since its creation, they represent only one part of the new issues that born-digital sources will pose to future historians.

As has been previously remarked and will be expanded on in the next section, future historians will in fact also be challenged by a never-before-experienced abundance of records of our past. The second case study presented in this chapter focuses on obtaining small topic-specific collections from large-scale archives of the web; by presenting the challenges encountered and describing the solutions adopted, it will highlight the importance of fruitfully combining the traditional historical method with approaches from the field of natural language processing.

3. Creating Political Event Collections: A Tale of Abundance

The World Wide Web provides the research community with an unprecedented abundance of primary sources for diachronically tracing, examining and understanding major events and transformations in our society. For two decades, public and private institutions have preserved these born-digital materials for future analysis (Gomes and Costa, 2011). However, these collections are now so large that – in the rare cases when they are fully available for research (Hockx-Yu, 2014) – it is not feasible for scholars to study political and social phenomena by examining them in their entirety.

If we consider for instance the Internet Archive, during its first twenty years it has preserved almost 500 billion web pages, and as of 2017 it has a collection of around 25 petabytes of data. Since 2001, this collection has been available for research through a URL search tool on the Wayback Machine. In recent years, information retrieval systems supporting keyword search over the diachronic layers of web archives have been developed by the research community and employed by institutions such as the UK Web Archive and – since 2017 – also partially by the Internet Archive. In addition to this, out-of-the-box tools such as ArchiveSpark (Holzmann et al., 2016a) and Warcbase (Lin et al., 2017) have been developed by the research community with the specific goal of supporting scholars in gathering information from large-scale web archive collections.

One of the main endeavors of web archive institutions for fostering the use of these new resources is to offer manually curated sub-collections regarding recent socio-political events. On Archive-It – a subscription web archiving service provided by the Internet Archive – a few collections regarding large-scale events such as the Boston Marathon bombing, the Black Lives Matter movement and the Charlie Hebdo terrorist attack are available. The collections are curated by ‘the Archive-It team in conjunction with curators and subject matter experts from institutions around the world’.

In addition to manual selection, another solution employed by digital archivists for creating and sharing these event collections is to adopt a filtering approach that presents to the user only those documents that mention the name of the event. This type of approach is common in event-harvesting from Twitter, where researchers collect all tweets that – for example – mention the hashtag of the event.

While both collecting documents from web archives through manual selection and retrieving materials through name-filtering have already proved their usefulness in supporting researchers in the humanities and social sciences (e.g., Small, 2011), they have a few crucial limitations. On the one hand, manual selection is obviously a painstakingly long process – given the previously mentioned difficulties of retrieving information from web archives. On the other hand, collecting documents using the event-name heuristic presents the crucial limitation of often missing information on background stories as well as premises of the examined events. To give a specific example, let us imagine that the goal is to collect primary sources regarding the 2004 Ukraine Orange Revolution. If the adopted method only retrieves documents that mention the name of the event, it will not collect materials that connect the premises of the revolution to the previous controversial presidential election in the country. The same issue will emerge when studying the first free Algerian elections since the country's independence (1990), which are a premise of the Algerian civil war, or even when investigating the economic crisis behind Fujimori's auto-golpe in Peru, 1992. In this last case, the documents that discuss the adoption of austerity measures will not be part of the collection. Moreover, the name used for referring to an event might change over time or vary between countries and languages: for example, one of the early hashtags used for the 2011 Egyptian Revolution was #jan25, referring to the day it started.
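
A minimal sketch of the name-filtering heuristic may make this limitation concrete. The three sentences below are invented for illustration and the function name is hypothetical; the point is only that a document about the disputed run-off, which is a premise of the event, never mentions the event's name and is therefore dropped.

```python
def mentions_event(text, event_names):
    """Return True if the text contains any of the given event names (case-insensitive)."""
    lowered = text.lower()
    return any(name.lower() in lowered for name in event_names)


# Invented example documents, not taken from any real archive.
documents = [
    "Crowds joined the Orange Revolution in Kyiv.",
    "Observers dispute the results of the Ukrainian presidential run-off.",
    "Protesters share updates under #jan25.",
]

hits = [doc for doc in documents if mentions_event(doc, ["Orange Revolution"])]
missed = [doc for doc in documents if doc not in hits]
print(len(hits), "retrieved;", len(missed), "left out")
```

Adding more names or hashtags to the list widens the filter, but only for labels the researcher already knows; background material that uses none of them stays invisible.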

The second case study presented in this chapter is an interdisciplinary project between computer science and political history focused on building more comprehensive sub-collections regarding events such as elections, protests and political crises from large-scale web archives. As part of this research, a system that employs natural language processing methods and information retrieval approaches has been developed, which is able to gather and organize a highly comprehensive collection of sources describing a specific event (Nanni et al., 2017). The developed approach is inspired by the fact that, when historians are conducting the same task manually (i.e., identifying relevant materials across an entire archive), they do not necessarily search only for documents that mention the name of the event. Historians will also try to collect those documents that talk about related aspects which provide the context, involving for example some of the participants in the event, but not others. If we consider the previous example regarding the Orange Revolution, historians will also be interested in materials from the same period of time discussing the political career of Yulia Tymoshenko or addressing the state of the political relations between Ukraine, Russia and the European Union.

Identifying Related Concepts and Entities

In order to achieve this goal in an automated fashion, the first step is to be able to identify a set of concepts and entities that are relevant to an event. To do so, DBpedia (Auer et al., 2007) has been employed. This is a large-scale knowledge base extracted from Wikipedia, where events (such as the Orange Revolution) are represented by nodes and connected through edges (i.e., hyperlinks in Wikipedia) to other related entities.
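
As a rough illustration of this graph view, the sketch below stores a handful of edges as (subject, predicate, object) triples and collects every node that shares an edge with the event. The triples, the `links_to` predicate and the function name are hypothetical stand-ins, not actual DBpedia data or vocabulary.

```python
def related_entities(triples, event):
    """Collect every node that shares an edge with the event, in either direction."""
    related = set()
    for subject, _predicate, obj in triples:
        if subject == event:
            related.add(obj)
        elif obj == event:
            related.add(subject)
    return related


# Toy edges standing in for hyperlinks between Wikipedia pages.
toy_triples = [
    ("Orange_Revolution", "links_to", "Yulia_Tymoshenko"),
    ("Viktor_Yushchenko", "links_to", "Orange_Revolution"),
    ("Orange_Revolution", "links_to", "European_Union"),
    ("Rose_Revolution", "links_to", "Georgia_(country)"),
]

candidates = related_entities(toy_triples, "Orange_Revolution")
print(sorted(candidates))
```

The result is only a candidate set: every neighbour is returned regardless of how central it was to the event.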

Retrieving Contextual Passages

For each collected entity and concept, a textual passage presenting it in the context of the event was also extracted from Wikipedia (for example: ‘Yulia Tymoshenko co-led the Orange Revolution and was the first woman appointed Prime Minister of Ukraine’). This is a convenient way of finding other terms that could be useful to identify relevant documents.

Ranking Concepts and Entities

Having obtained an initial set of potentially relevant concepts and entities, the goal is to score each of them on how relevant they are to the event. For example, while Yulia Tymoshenko is highly relevant for the Orange Revolution, the European Union played only a marginal role in the event. Different approaches for ranking entities and concepts for relevance were tested and the best performing solution was to compute distances between entities and the event employing out-of-the-box RDF vector-representations (Ristoski and Paulheim, 2016).
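
The idea of scoring entities by their distance to the event in a vector space can be sketched as follows. The three-dimensional vectors are invented for illustration (they are not actual RDF vector representations), and cosine similarity is assumed here as the measure; the chapter does not state which distance was used.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_entities(event_vector, entity_vectors):
    """Return (entity, similarity) pairs, most similar to the event first."""
    scored = [
        (name, cosine_similarity(event_vector, vector))
        for name, vector in entity_vectors.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up vectors: the first entity is placed close to the event on purpose.
event = [0.9, 0.1, 0.3]
entities = {
    "European Union": [0.1, 0.9, 0.2],
    "Yulia Tymoshenko": [0.8, 0.2, 0.4],
}

ranking = rank_entities(event, entities)
for name, similarity in ranking:
    print(f"{name}: {similarity:.2f}")
```

With learned vectors the ordering would come from the data rather than from hand-picked numbers, but the ranking step itself stays the same: compute one score per entity and sort.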

Finding Mentions in Text

With the ranked set of entities and concepts in hand, other documents mentioning them in relevant contexts were retrieved from the web archive. In order to go beyond simple string-matching of concepts that are considered relevant (e.g., ‘protests’, ‘revolution’, ‘crisis’, ‘election’), word-embedding representations (Mikolov et al., 2013) have been adopted. Embedding techniques represent each word, entity or concept (e.g., ‘protest’) as a numeric vector of n dimensions. This makes it possible to measure similarity across different words and to collect relevant materials even if they talk about ‘demonstration’ or ‘crisis’ instead of, for example, mentioning ‘protest’ or ‘revolution’.
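
A small sketch of how embeddings allow matching beyond exact strings. The two-dimensional vectors and the 0.9 threshold are invented for the example; real word embeddings have hundreds of dimensions and are learned from large corpora.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def similar_terms(seed, embeddings, threshold):
    """Return every word whose vector is at least `threshold` similar to the seed's."""
    seed_vector = embeddings[seed]
    return {
        word
        for word, vector in embeddings.items()
        if cosine(seed_vector, vector) >= threshold
    }


# Toy embeddings: 'demonstration' is deliberately placed near 'protest'.
toy_embeddings = {
    "protest": [0.9, 0.1],
    "demonstration": [0.85, 0.2],
    "recipe": [0.05, 0.95],
}

expanded = similar_terms("protest", toy_embeddings, threshold=0.9)

documents = [
    "A large demonstration filled the square",
    "A recipe for borscht",
]
matches = [doc for doc in documents if expanded & set(doc.lower().split())]
print(matches)
```

The first document is retrieved although it never contains the word 'protest', which is the behaviour that plain string matching cannot provide.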

Final Collection Building

It could happen that documents mention relevant entities and concepts out of context, for example as part of a comparison: ‘The popular opposition to Ethiopia's current corrupt regime is comparable to the Orange Revolution in Ukraine.’ In order to filter them out and select only the documents that should be included in the event collection, a machine learning approach called Learning to Rank (Liu, 2009) has been employed, which, given an initial set of relevant and non-relevant documents, learns how to abstract this property and to automate the ranking process.
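
Learning to Rank covers a family of methods, and the chapter does not specify which one was used. The sketch below is a deliberately simplified stand-in: a pairwise perceptron-style ranker that learns weights so that relevant documents score above non-relevant ones. The two features (number of related-entity mentions, presence of a comparison cue such as 'comparable to') and all training values are hypothetical.

```python
def train_pairwise_ranker(relevant, not_relevant, epochs=10):
    """Learn feature weights so that each relevant example outscores each non-relevant one."""
    weights = [0.0] * len(relevant[0])
    for _ in range(epochs):
        for rel in relevant:
            for non in not_relevant:
                diff = [r - n for r, n in zip(rel, non)]
                margin = sum(w * d for w, d in zip(weights, diff))
                if margin <= 0:
                    # The pair is mis-ordered (or tied): move the weights
                    # towards the relevant example.
                    weights = [w + d for w, d in zip(weights, diff)]
    return weights


def score(weights, features):
    """Linear relevance score of one document."""
    return sum(w * f for w, f in zip(weights, features))


# Features per document: [related-entity mentions, comparison cue present (1/0)].
relevant_docs = [[3, 0], [2, 0]]
not_relevant_docs = [[1, 1], [2, 1]]

weights = train_pairwise_ranker(relevant_docs, not_relevant_docs)

# Two unseen documents with the same number of mentions; one is a comparison.
candidates = {"on_topic": [2, 0], "comparison": [2, 1]}
ranked = sorted(candidates, key=lambda name: score(weights, candidates[name]), reverse=True)
print(weights, ranked)
```

On this toy data the learned weight for the comparison cue is negative, so a document that merely compares another situation to the event is pushed below one that discusses the event directly.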

A Critical Combination of Sources and Methods

The combination of traditional practices of historical research with methodologies and approaches from the fields of natural language processing and information retrieval is essential for facing the large abundance of born-digital primary sources. Some of the approaches presented in this chapter have been already adopted in political science research. One of these first studies focuses on retrieving documents which referred to political events (e.g., elections) from institutional web collections of the United States government in order to define a new measure of ‘attention’ of the U.S. Congress and the President to democratization and electoral practices in other countries, from Zimbabwe to Haiti and Egypt (Elshehawy et al., 2017). By doing so, this initial work highlights both the potential and challenges of using born-digital documents and computational methods for obtaining new insights on the recent political past.

The two case studies presented in this chapter reveal the importance of adopting a highly interdisciplinary approach when dealing with born-digital sources; methodologies from the field of internet studies could support historians in reconstructing lost web pages, while natural language processing methods could guide them in retrieving documents from large-scale web archives. The final part of this chapter will remark further on this, by discussing the importance of offering this interdisciplinary preparation to future historians in their educational programs.

4. Conclusion: A New Generation of Historians

In recent years, researchers have argued that history, like other humanities disciplines, is reaching a turning point in its methodology (Scheinfeldt, 2012; Graham, Milligan and Weingart, 2015; Nelson, 2016): sustained by the efforts of many digitization projects, the community has been employing computational methods in order to examine these vast resources and obtain new insights. This change in methodology has reopened a long-standing debate regarding the ways textual evidence of the past can and should be properly interpreted.

While for the historical profession it is of course beneficial to constantly debate and criticize the validity of established practices of acquiring knowledge from sources, it is argued in this chapter that the adoption of digitized datasets and computational methods cannot be considered, by itself, the triggering factor of a fundamental turning point in our profession. In fact, adopting (or not) large-scale datasets of digitized sources, together with computational methods, will always remain a choice for the history scholar: Charles Darwin can still be studied without conducting text mining over the collections presented on Darwin Online, just as eighteenth-century London can be examined without distant reading the Proceedings of the Old Bailey Online.

However, it is also argued that history is in fact about to face a paradigm-shifting transition in its methods, and that the triggering cause of this transition lies in the born-digital nature of the large majority of sources produced by contemporary societies. This change affects any type of document we create and consume in our everyday life, from bureaucratic forms collected by the public sector to newspaper articles, political mail correspondence and university websites, and it is about to present its multifarious consequences for historical research.

Born-digital sources are significantly more complex to archive, collect, analyze and select than traditional materials. Websites (such as Unibo.it) are large and variegated collections of documents, which are often not preserved in their entirety by web archive initiatives and can be reconstructed only through the meticulous combination of various pieces of information from different sources. When a resource, such as the institutional website of an administration, is finally re-created, it is often so vast that computational technologies (i.e., natural language processing methods and information retrieval approaches) are necessary for identifying and retrieving specific documents.

The methodological steps overviewed in this chapter for collecting, analyzing and selecting born-digital documents require strong interdisciplinary competences and a highly critical attitude towards sources and methods. In this complex scenario, this chapter concludes by raising a very pressing question: how can the new generations of historians be prepared to face these new challenges?

In recent years, the digital history community has already offered many educational activities on computational methods to its students. From workshops to panels, from courses to summer schools, from tutorials to hackathons, these initiatives have almost always focused on presenting the potential of new resources, tools and platforms to history students, following an attitude which has been branded as ‘more hack, less yack’ (Nowviskie, 2014). While offering hands-on experience with computational tools is important in order to introduce history students to the digital humanities, a critical approach is strongly needed in order to properly deal with born-digital sources and computational methods.

For this reason, it is essential that students first of all be guided in shaping their research topics and receive, early on in their studies, the preparation necessary to support a critical analysis of the born-digital documents and computational methods at their disposal. This will be imperative for a generation of historians who will be able to go beyond an unquestioning adoption of the new sources and tools at their disposal and will instead critically employ them, in search of new historical perspectives.

References

Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., & Ives, Z. (2007) ‘DBpedia: A Nucleus for a Web of Open Data’, Proceedings of the 6th International and 2nd Asian Conference on Semantic Web: 722-735.

Barthes, R. (1967) ‘Discourse on History’, Social Science Information, 6(4): 65-75.

Blevins, C. (2014) ‘Space, Nation, and the Triumph of Region: A View of the World from Houston’, The Journal of American History, 101(1): 122-147.

Bloch, M. (1949) Apologie pour l'histoire, ou, Métier d'historien, Armand Colin, Paris.

Brockliss, L. W. (1978) ‘Patterns of Attendance at the University of Paris, 1400–1800’, The Historical Journal, 21(3): 503-544.

Brügger, N. (2005) ‘Archiving Websites: General Considerations and Strategies’, The Centre for Internet Research, Aarhus.

Brügger, N. (2009) ‘Website History and the Website as an Object of Study’, New Media & Society, 11(1-2): 115-132.

Brügger, N. (2011) ‘Web Archiving – between Past, Present, and Future’, in M. Consalvo and C. Ess (eds), The Handbook of Internet Studies, Wiley-Blackwell, Oxford.

Brügger, N. (2012) ‘When the Present Web is Later the Past: Web Historiography, Digital History, and Internet Studies’, Historical Social Research/Historische Sozialforschung, 37(4): 102-117.

Burke, P. (2008) What is Cultural History?, Polity, Cambridge (UK).

Bush, V. (1945) ‘As We May Think’, The Atlantic Monthly, 176(1): 101-108.

Cohen, D. J., & Rosenzweig, R. (2005) Digital History: A Guide to Gathering, Preserving, and Presenting the Past on the Web, University of Pennsylvania Press.

Cohen, D. J. (2005) ‘By the Book: Assessing the Place of Textbooks in US Survey Courses’, The Journal of American History, 91(4): 1405-1415.

Cohen, D. J., Frisch, M., Gallagher, P., Mintz, S., Sword, K., Taylor, A. M., & Turkel, W. J. (2008) ‘Interchange: The Promise of Digital History’, The Journal of American History, 95(2): 452-491.

Daumard, A., & Furet, F. (1959) ‘Méthodes de l'histoire sociale: les archives notariales et la mécanographie’, Annales. Histoire, Sciences Sociales, 14(4): 676-693.

Davis, C. (2014) ‘Archiving the Web: A Case Study from the University of Victoria’, code{4}lib Journal, 26 (http://journal.code4lib.org/articles/10015).

Derrida, J. (1967) Of Grammatology, Les Éditions de Minuit, Paris.

Dougherty, M., Meyer, E. T., Madsen, C. M., Van den Heuvel, C., Thomas, A., & Wyatt, S. (2010) ‘Researcher Engagement with Web Archives: State of the Art’, Preprint on SSRN (https://ssrn.com/abstract=1714997).

Elshehawy, A., Marinov, N., & Nanni, F. (2017) ‘Quantifying Attention to Foreign Elections with Text Analysis of US Congress and the Presidency’, Preprint on SSRN (https://ssrn.com/abstract=2981486).

Evans, R. J. (2001) In Defence of History, Granta Books, London.

Fogel, R. W., & Engerman, S. L. (1974) Time on the Cross, University Press of America, Lanham, Maryland.

Gomes, D., Miranda, J., & Costa, M. (2011) ‘A Survey on Web Archiving Initiatives’, Proceedings of the 15th International Conference on Theory and Practice of Digital Libraries: 408-420.

Graham, S., Milligan, I., & Weingart, S. (2015) Exploring Big Historical Data: The Historian's Macroscope, Imperial College Press, London.

Greif, A. (1997) ‘Cliometrics After 40 Years’, The American Economic Review, 87(2): 400-403.

Hale, S. A., Yasseri, T., Cowls, J., Meyer, E. T., Schroeder, R., & Margetts, H. (2014) ‘Mapping the UK Webspace: Fifteen Years of British Universities on the Web’, Proceedings of the 2014 ACM Conference on Web Science: 62-70.

Hockx-Yu, H. (2014) ‘Access and Scholarly Use of Web Archives’, Alexandria: The Journal of National and International Library and Information Issues, 25(1-2): 113-127.

Holzmann, H., Goel, V., & Anand, A. (2016a) ‘Archivespark: Efficient Web Archive Access, Extraction and Derivation’, Proceedings of the 2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL): 83-92.

Holzmann, H., Nejdl, W., & Anand, A. (2016b) ‘The Dawn of Today's Popular Domains: A Study of the Archived German Web Over 18 Years’, Proceedings of the 2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL): 73-82.

Iggers, G. G. (2005) Historiography in the Twentieth Century: From Scientific Objectivity to the Postmodern Challenge, Wesleyan University Press, Middletown (CT).

Knowles, A. K., & Hillier, A. (eds) (2008) Placing History: How Maps, Spatial Data, and GIS are Changing Historical Scholarship, ESRI, New York.

LaFrance, A. (2015) ‘Raiders of the Lost Web’, The Atlantic, 14 (https://www.theatlantic.com/technology/archive/2015/10/raiders-of-the-lost-web/409210/).

Lin, J., Milligan, I., Wiebe, J., & Zhou, A. (2017) ‘Warcbase: Scalable Analytics Infrastructure for Exploring Web Archives’, Journal on Computing and Cultural Heritage (JOCCH), 10(4): 22.

Liu, T. Y. (2009) ‘Learning to Rank for Information Retrieval’, Foundations and Trends in Information Retrieval, 3(3): 225-331.

Lyotard, J. F. (1979) The Postmodern Condition: A Report on Knowledge, Minuit, Paris.

Munslow, A. (2006) Deconstructing History, Routledge, New York.

Milligan, I. (2012) ‘Mining the ‘Internet Graveyard’: Rethinking the Historians’ Toolkit’, Journal of the Canadian Historical Association/Revue de la Société historique du Canada, 23(2): 21-64.

Milligan, I. (2016) ‘Lost in the Infinite Archive: The Promise and Pitfalls of Web Archives’, International Journal of Humanities and Arts Computing, 10(1): 78-94.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013) ‘Distributed Representations of Words and Phrases and their Compositionality’, Proceedings of the 26th International Conference on Neural Information Processing Systems: 3111-3119.

Moretti, F. (2013) Distant Reading, Verso Books, London.

Nanni, F. (2013) ‘L’archiviazione delle pagine dei quotidiani online’, Diacronie. Studi di Storia Contemporanea, 15(3) (http://www.studistorici.com/wp-content/uploads/2013/10/02_NANNI.pdf).

Nanni, F. (2017a) ‘Reconstructing a Website's Lost Past: Methodological Issues Concerning the History of www.unibo.it’, Digital Humanities Quarterly, 11(2) (http://www.digitalhumanities.org/dhq/vol/11/2/000292/000292.html).

Nanni, F. (2017b) ‘The Web as a Historical Corpus: Collecting, Analysing and Selecting Sources on the Recent Past of Academic Institutions’, Ph.D. Dissertation, University of Bologna.

Nanni, F., Ponzetto, S. P., & Dietz, L. (2017) ‘Building Entity-Centric Event Collections’, Proceedings of the 2017 IEEE/ACM Joint Conference on Digital Libraries (JCDL): 199-209.

Nelson, R. K. (2016) ‘Digital Humanities as Appendix’, American Quarterly, 68(1): 131-136.

Noiret, S. (2015) ‘Digital Public History: Bringing the Public Back In’, Public History Weekly, 3(13) (http://hdl.handle.net/1814/38393).

Nowviskie, B. (2014) ‘On the Origin of “Hack” and “Yack”’, in M. K. Gold and L. F. Klein (eds) Debates in Digital Humanities (2nd edn), University of Minnesota Press (http://dhdebates.gc.cuny.edu/debates/text/58).

Owen, D., & Davis, R. (2008) ‘Presidential Communication in the Internet Era’, Presidential Studies Quarterly, 38(4): 658-673.

Ramage, D. R. (2011) ‘Studying People, Organizations, and the Web with Statistical Text Models’, Ph.D. Dissertation, Stanford University.

Ristoski, P., & Paulheim, H. (2016) ‘RDF2vec: RDF Graph Embeddings for Data Mining’, Proceedings of the 2016 International Semantic Web Conference: 498-514.

Robertson, S. (2016) ‘The Differences Between Digital History and Digital Humanities’, in M. K. Gold and L. F. Klein (eds) Debates in Digital Humanities (2nd edn), University of Minnesota Press (http://dhdebates.gc.cuny.edu/debates/text/76).

Rosenzweig, R. (2003) ‘Scarcity or Abundance? Preserving the Past in a Digital Era’, The American Historical Review, 108(3): 735-762.

Rothman, J. (2014) ‘An Attempt to Discover the Laws of Literature’, The New Yorker.

Rüegg, W., & de Ridder-Symoens, H. (eds) (1992) A History of the University in Europe, Cambridge University Press, Cambridge.

Scheinfeldt, T. (2012) ‘Sunset for Ideology, Sunrise for Methodology’, in M. K. Gold and L. F. Klein (eds) Debates in Digital Humanities (1st edn), University of Minnesota Press: 124-127.

Schreibman, S., Siemens, R., & Unsworth, J. (eds) (2004) A Companion to Digital Humanities, Blackwell Publishing, Oxford.

Shafer, R. J. (1974) A Guide to Historical Method, Dorsey Press, Belmont (CA).

Small, T. A. (2011) ‘What the Hashtag? A Content Analysis of Canadian Politics on Twitter’, Information, Communication & Society, 14(6): 872-895.

Thaller, M. (1991) ‘The Historical Workstation Project’, Computers and the Humanities, 25(2): 149-162.

Thomas III, W. G. (2004) ‘Computing and the Historical Imagination’, in Schreibman, S., Siemens, R., & Unsworth, J. (eds) A Companion to Digital Humanities, Blackwell Publishing, Oxford: 56-68.

Wilkens, M. (2013) ‘The Geographic Imagination of Civil War-Era American Fiction’, American Literary History, 25(4): 803-840.

1 It is also important to acknowledge that reactions to postmodern approaches are present as well in the historiographic debate (see for example Evans, 2001).

2 See for example the adoption of social science methodologies in historical research in Fogel and Engerman (1974).

3 However, the relationship between history and computing on the one side and literary and linguistic computing on the other side has always been complicated (see for example Robertson, 2016).

4 As described in the FAQ section of the Internet Archive, a website owner can request that the crawling or archiving of a site be stopped, and the Internet Archive will endeavor to comply with the request. This will be signaled by a ‘blocked site error’ message such as ‘This URL has been excluded from the Wayback Machine’.

5 In 2001 the University of Bologna website won the ‘WWW’ prize from the Italian economic newspaper Il Sole 24 Ore for the best website in the category ‘School, university and research’. Then, for three consecutive years (2005-2007) Unibo.it received the ‘Osc@r del web’ prize as the best Italian public administration website. In 2007 Luigi Nicolais, the Italian Minister of Public Administration, was also present to confer the prize.