
This chapter is part of “The SAGE Handbook of Web History”, edited by Niels Brügger and Ian Milligan. This version is subsequent to peer review but before type-setting and proofing, so don’t cite this, cite the version of record, available from https://uk.sagepub.com/en-gb/eur/the-sage-handbook-of-web-history/book252251 This version can be reused under CC-BY-NC-ND 4.0: https://creativecommons.org/licenses/by-nc-nd/4.0/

Collecting Primary Sources from Web Archives: A Tale of Scarcity and Abundance

Federico Nanni

Data and Web Science Group

University of Mannheim

La diversité des témoignages historiques est presque infinie. [The variety of historical evidence is nearly infinite.]

(Bloch, 1949)

The World Wide Web is the largest collection of human testimonies that we have ever had at our fingertips. Spanning from institutional websites to digital libraries, from personal blogs to Twitter accounts of prominent politicians, from online newspapers to large-scale knowledge bases, an immense number of born-digital testimonies is waiting to be retrieved, selected and studied by future historians. In addition to this, while these new resources are piling up steadily in front of our eyes, they are also rapidly replacing their analogue counterparts, from printed news articles to personal diaries, from letter correspondences to scientific publications.

Acknowledging this sudden transition in production from printed to digital documents, this chapter presents and discusses some of the new methodological issues that arise when these materials are to be employed as primary sources for studying the recent past. Firstly, an overview of the debate on the historian’s craft is offered. Then, two different case studies that have dealt with the difficulties of adopting born-digital materials in historical work are described: the first focuses on reconstructing the past of university websites as a new way of studying the recent past of academic institutions; the second retrieves materials from large-scale archives of the web in order to study contemporary socio-political events. These descriptions highlight how a fruitful combination of historical methods with approaches from other research areas, such as internet studies and natural language processing, could support future historians in successfully addressing these issues.

1. The Historical Method: Today and Tomorrow

In order to understand how the transition from analogue to digital sources is about to change the historian’s craft, it is first of all essential to examine how the ‘historical method’ (Shafer, 1974) is generally defined and what its major steps are.

Defining a Subject

In the first part of any historical research, the scholar broadly defines the subject of investigation and, together with it, an initial question. The research question, first presented at a coarse-grained level, will be sharpened through the recursive process of collecting sources, interpreting them and, by doing so, discovering the underlying narrative.

Collecting the Evidence

In order to address the research question, the historian identifies the testimonies upon which she/he builds a narrative through a complex process of collection, analysis and selection of the remains of the past. These testimonies could be physical remains (e.g. buildings, statues), oral memories or printed documents (e.g. chronicles, diaries, articles, census data), and will soon include born-digital documents such as websites, online forums, email threads and large-scale databases. The process of collecting primary sources has been shaped and sharpened by decades of discussions in historiography both on how to establish the reliability of these materials, for example through source criticism, and on how much ‘true knowledge’ can be derived from them (there are as many interpretations of the same text as there are readers, as Barthes (1967) has taught us).

Interpreting the Evidence

The interpretation of the collected textual sources represents the core of any historical research. For this reason, it has been the central focus of debate across 20th century historiography and has experienced drastic transitions in methodology. As a matter of fact, the analysis and interpretation of sources can be conducted in many different ways: traditional historiographical scholarship has strongly relied on hermeneutics and on the careful qualitative examination of documents, while other approaches, which emerged during the second part of the 20th century inspired by social science methodologies (see the advent of Cliometrics; Greif, 1997), have employed census data or economic reports in order to conduct large-scale quantitative analyses.

Through the ’70s and the ’80s, postmodern and deconstructionist theories (starting from the works of Barthes, 1967; Derrida, 1967 and Lyotard, 1979, among others) posed major critiques of the underlying assumption of both traditional and social-science historical scholarship that it is possible to discover a ‘unique truth’ about the past through careful analysis of the remains. The enormous impact of these critiques has been remarked upon by many historians (Munslow, 2006; Burke, 2008) and has led to the so-called cultural turn in the profession, which is still reflected strongly today in the community.¹

Presenting a Narrative

The final step of any historical research is to define a narrative and write a history. The creation of a narrative, which is highly connected with the initial definition of the research question, gives the historian the possibility of placing the work she/he is writing as part of a larger contribution to the field. This is achieved in two interconnected ways: first of all, by offering a new/different perspective on the topic under study; in addition to this, by participating in the larger debate in historiography regarding the ways the past can be re-discovered, examined, described and - for certain authors² - even modelled.


1.1 A Computational Turn of the Craft?

History has been part of the so-called digital humanities (Schreibman et al., 2004) since their very beginning.³ In particular, during the second part of the 20th century the potential of computational methods and their impact on the historian’s craft have been recurrent topics in historiography. As Thomas III (2004) remarked, already in 1945 Vannevar Bush, in his famous essay ‘As We May Think’, pointed out that technology could be the solution that would enable us to manage the abundance of scientific and humanistic data (Bush, 1945); in his vision, the Memex could become an extremely useful instrument for historians.

The use of the computer in historical research, which grew significantly between the ’60s and the ’70s thanks both to the efforts of the Annales school (see for example Daumard and Furet, 1959) and to its application to the analysis of economic and census data (Greif, 1997), has been strongly related to the adoption of social science practices in historical studies (Evans, 2001). A pioneering work on the use of database technologies for historical research was conducted by Manfred Thaller during the ’80s (Thaller, 1991).

However, as Milligan (2012) and Robertson (2016) have already remarked, a large majority of the historian community has remained skeptical towards the adoption of computational methods in the craft. This attitude has consolidated in opposition to other humanities disciplines: for example, in the last thirty years the field of literary studies has experimented widely with the potential of what it has defined as ‘distant reading’ techniques, in order to extract quantifiable information from large amounts of text (Moretti, 2013). Instead, during the same time, the so-called digital history community (Cohen et al., 2008) has decided to focus primarily on the potentialities of the Web as a platform for the collection, presentation, and dissemination of material (Cohen and Rosenzweig, 2005) and on the more ‘communicative aspects’ of doing research in the humanities (Robertson, 2016). This can be noticed by observing the importance given to digital public history topics (Noiret, 2015), the relevance of teaching in digital history (Cohen et al., 2008) and the tradition of digital history mapping (Knowles and Hillier, 2008).


In the second part of the 2000s, thanks in particular to the prompt availability of digitized historical primary sources and the potentialities of web technologies, this skeptical attitude towards computational methods has slowly changed and a few interdisciplinary teams have developed tools in order to help other traditionally trained historians to employ these methods in their work. As Nelson (2016) remarked, the first fruitful applications of these methods for supporting historical narratives can be found in the works of Wilkens (2013) and Blevins (2014), which are robust examples of the beginning of a mature season of digital history.

While these early works of scholarship based on the use of computational approaches are essential for refreshing the historiographic debate, it is argued in this chapter that the adoption of computational methods cannot be considered per se a revolutionary turning point for the profession. In fact, the use of these approaches is similar to other methodological turning points that historians have already experienced (Milligan (2012), for example, identifies ‘three waves’ of computational history); moreover, during the last ten years the use of computational methods in humanities research has been strongly sustained and encouraged by public and private institutions (from the NEH Digital Humanities Advancement Grants to the Volkswagen Stiftung on ‘Mixed Methods’ in the Humanities) as well as private companies (e.g., Google’s ‘commitment’ to the Digital Humanities) and often mainstream media sources (Rothman, 2014).

Nevertheless, it is argued in this chapter that historiography is about to experience a new and far more conspicuous turning point and that this will have a very strong impact on a specific step of the historian’s craft, namely the way sources are collected from now on. Born-digital documents shared online, together with their ephemerality, preservation, availability and access, are about to pose a large set of new challenges for future historians. In the next decades, the methodological debate in historiography will not only be centered around qualitative over quantitative, distant versus close, hermeneutics against statistical significance, but it will also address the needs of the community in finding ways of acquiring knowledge on our recent (digital) past.


1.2 The born-digital turn

The transition from analogue to born-digital materials is influencing the way historians study the past: materials such as websites, forums, blogs, tweets and emails are in fact very different from traditional analogue and digitized primary sources. Born-digital materials have an extremely short life compared to printed documents, as they are significantly more difficult to archive and preserve (LaFrance, 2015). This is due to a vast number of reasons (Brügger, 2005) and its consequence has been summarized by Rosenzweig (2003) with the concept of ‘scarcity’ of digital primary sources. Web pages disappear constantly from the live web (because they are removed by the author or by the owner of the platform, for instance due to copyright issues), leaving a familiar trace of 404 status code messages. Several scholars (Rosenzweig, 2003; Brügger, 2012 among others) have already remarked on the great impact that the ephemerality of web materials will have on the sharing and accessibility of the knowledge produced in the digital age for the next generations of historians. As has already been said, in opposition to the fact that ‘paper survives benign neglect for a long time’ (Davis, 2014):

The life cycle of most web pages runs its course in a matter of months. In 1997, the average lifespan of a web page was 44 days; in 2003, it was 100 days. Links go bad even faster. A 2008 analysis of links in 2,700 digital resources - the majority of which had no print counterpart - found that about 8 percent of links stopped working after one year. By 2011, when three years had passed, 30 percent of links in the collection were dead. (LaFrance, 2015)

Moreover, while some types of pages disappear more frequently than others (e.g. social media messages as opposed to official statements on administrative websites), those that do survive tend to change very frequently (Dougherty et al., 2010). For example, articles in newspapers (Nanni, 2013) as well as official administrative pages have often been modified without a specific mention (Owen and Davis, 2008). While initiatives such as the Internet Archive have a long tradition of preserving born-digital materials for future research, several issues still exist and new issues continue to emerge, not least due to constant innovations in web technologies. Therefore, researchers have to deal with the collected materials in a highly critical way, as Brügger (2012) described when he introduced his definition of web archive documents as reborn-digital materials:

One of the main characteristics of web archiving is that the process of archiving itself may change what is archived, thus creating something that is not necessarily identical to what was once online. [...] And, second, that a website may be updated during the process of archiving, just as technical problems may occur whereby web elements which were initially online are not archived. Thus, it can be argued that the process of archiving creates the archived web on the basis of what was once online: the born-digital web material is reborn in the archive. (Brügger, 2012)

The difficulties in the preservation of digital sources present a new set of issues for historians who plan to employ them in their work; however, they remain only part of the overall problem. In fact, already in 2003, Rosenzweig envisioned that future historians would not only deal with a consistent scarcity of primary sources, but would also be challenged by a never-before-experienced abundance of records of our past. The indispensable need for computational methods for processing and retrieving materials from these huge collections of primary sources has been a central topic of Milligan’s publications (2012, 2016). From his works it emerges that, now that the community is dealing with the abundance of born-digital sources, the use of computational approaches can no longer be merely a choice for the digital humanities researcher. Therefore, it becomes essential that researchers adopt these solutions critically, always knowing their potential and limitations, and learn how to combine them fruitfully with the traditional historical method.

While the consequences of the advent of born-digital sources will be revolutionary for our profession, so far ‘very little attention has been paid to the new digital media as historical sources’ (Brügger, 2012). This highlights the fact that, while ‘new media is not that new anymore’ (Milligan, 2016) for our society, they remain a novelty for historians.

The next sections will remark further on this topic by describing two very different case studies that have dealt with the use of born-digital documents as primary sources for historical research. The first focuses on examining the online presence of the University of Bologna since the early Nineties, and remarks on the importance of combining the traditional historian’s craft with approaches from the field of internet studies.

2. Studying the Recent Past of Academic Institutions: A Tale of Scarcity

Multiple historians have considered academic institutions as political, economic and social actors; they have also examined how their power, role and influence changed over time, especially in relation to other actors, such as the city, the church and the national government (Brockliss, 1978). In particular, the comprehensive four-volume book series ‘A History of the Universities in Europe’, commissioned by the European University Association, edited by Hilde de Ridder-Symoens and Walter Rüegg and published between 1992 and 2011, offers an unprecedented overview of how universities have transformed over centuries: what they have taught and researched, how they have been institutionalized and how they have interacted with society.

Historians of higher education who presented their research in the volumes have adopted a large variety of primary and secondary sources in their works, from university-archive materials such as matriculation and graduation statistics to academic dissertations, from public reports to large-scale statistical analyses. Based on these data, researchers have described and drawn conclusions on the history of universities across a large variety of topics, such as the way universities have managed resources, the way the admission process changed before and after 1970, and how sciences and humanities have been taught and studied.

The current prompt availability of a large variety of born-digital materials such as syllabi (Cohen, 2005), bachelor, master and doctoral theses (Ramage, 2011), academic websites (Holzmann et al., 2016b) and their hyperlinked structure (Hale et al., 2014) is about to become a new relevant component of this field of research (Nanni, 2017b).

An emblematic example of the new challenges that born-digital documents will pose to historians of higher education is a study on reconstructing the recent past of the University of Bologna through its digital sources (Nanni, 2017a).

The University of Bologna’s website (Unibo.it), initially created in 1993, represents a new category of relevant resource for historians of higher education. The website collects and offers to the reader a large variety of documents, from descriptions of educational projects to overviews of research groups, from reports of collaboration with international institutions to information on opportunities for interaction with the private sector. In addition, it also shows how different departments, professors and research teams have been adopting the web, especially in its early days. Among the many relevant examples, one that deserves special mention is that the Astronomy Department of the university was already sharing preprints of its publications online in 1994 as HTML pages, in an early attempt to benefit from the potential of the World Wide Web.

Nevertheless, while Unibo.it represents a useful collection of primary sources, the website has been modified several times during its first twenty years and the majority of the pages that were published in the past are no longer available on the live web. In particular, the transition to the so-called ‘Portale D’Ateneo’, which started in the early 2000s, required that all department pages change their structure and adopt a common layout and organization of their content. This has often forced the creation of brand-new department subdomains and the removal of their previous versions from the live web. As an additional issue, the team that managed the website during this entire transition has neither consistently archived the previous versions of the website nor documented its work.


Given the fact that as of 2017 the National Libraries of Florence and Rome are still not part of the International Internet Preservation Consortium (IIPC) and no coordinated project with the specific purpose of preserving the national web sphere currently exists in Italy, the Internet Archive remains the only resource available for recovering all the materials that are no longer available on the University of Bologna website. However, in 2002 a removal request⁴ from the administrative team of Unibo.it was sent to the Internet Archive, and for this reason Unibo.it was inaccessible through the Wayback Machine for more than thirteen years. This highly complex situation reflects a new level of difficulty that future historians will encounter while attempting to collect born-digital sources. The following subsections present an overview of the variety of sources and methods that have been used to deal with this issue and to reconstruct the past of Unibo.it.

Library and Archive Materials

As an initial step of the research, materials available in the university library and archives were consulted. Among many other documents, a very useful source has been the university yearbook. In the early ’90s only a few pieces of information regarding the website were mentioned in the yearbook; nevertheless, this source offered an initial diachronic overview of the official teams that were managing Unibo.it and was useful for drawing up a list of people to interview.

Interviews

In order to capture the rationale and the changing architecture of the website, the different teams who managed the website were interviewed, together with technicians and researchers who worked on the development of the pages of various departments, especially during the ’90s. An interesting finding, presumably highly relevant for future historians, was that many times during the interviews the subjects used public and private backups of emails in order to recollect the memories of their experience in working on Unibo.it and to confirm passages of the historical reconstruction.

Newspapers

As already done in previous work (Brügger, 2011), where printed media were used to retrieve information about the web of the past, information related to Unibo.it and the role of the website for the University of Bologna has been identified in local and national newspaper archives. During the ’90s, newspapers such as La Repubblica and Il Resto del Carlino published a few short articles covering the new functionalities on the website (e.g. free email accounts for all students, online fee payments). These publications, together with materials collected from the university digital magazines (Alma2000, AlmaNews, UniboMagazine), offered an additional overview of how the university decided to promote the website to its audience.

Online Forums

To get a closer look at the everyday use of the website by students and researchers, other materials have been collected and analyzed, starting from student forums (e.g. UniversiBo) and Usenet discussions preserved by Google. These documents, especially in the ’90s, present the perspective and enthusiasm of a rather small but specific subset of the university community, namely students, researchers and professors in STEM fields, whose departments were among the first ones to offer access to the web.

Live Web Materials

While the website has been restructured multiple times during its first 20 years online, many resources are still available on the live web and can reveal the current role of the website in the university’s organization and management (e.g. attracting national and international students and researchers, promoting collaborations with the private sector). Additionally, the social media pages of the institution (such as Facebook, YouTube and Twitter profiles) are becoming key components of its presence online, showing alternative and more informal ways of interacting with users.

Presence of Italian Websites in Other National Web Archives

Aside from the Internet Archive, since 1996 national libraries from all around the world have also begun to preserve their national web past. PANDORA, started in 1996 by the National Library of Australia, the UK Web Archive (2004), the Netarkivet (2005) in Denmark and the Portuguese Web Archive (2011) are just a few examples of this international endeavor. Given the complexity of defining and preserving what is called a ‘national web-sphere’ (Brügger, 2009), this research also explored the use of foreign web archives as a proxy for studying Unibo.it. The practice of retrieving primary sources related to an Italian university website in foreign web archives could seem rather odd, as the goal of a national web archive is precisely to preserve the web of its country; however, from time to time part of the non-national web also ends up being preserved, unintentionally, by these digital archives.

For example, to archive national web spheres in an automatic way, archivists could set up crawlers with a specific set of starting points and a maximum number of hyperlinks they can follow from them. A crawler which is set to go at most ten links away from one of these URLs could also end up crawling non-national content, as it will systematically follow all the hyperlinks. For this reason, if the University of Bologna were to organize a Summer School and Aarhus University linked to it from its website, the University of Bologna website (or at least part of it) would be unintentionally preserved in the Danish Web Archive.
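The mechanism described above can be sketched as a breadth-first crawl with a hop limit. This is only an illustration under stated assumptions: the link graph, the URLs and the function names are hypothetical, and the crawl runs over an in-memory dictionary rather than the live web; it is not the configuration of any actual national web archive.

```python
from collections import deque
from urllib.parse import urlparse

def crawl(link_graph, seeds, max_hops):
    """Breadth-first crawl that stops following links `max_hops` away from the seeds."""
    hops = {url: 0 for url in seeds}
    queue = deque(seeds)
    while queue:
        url = queue.popleft()
        if hops[url] == max_hops:
            continue  # hop budget exhausted: archive the page, do not follow its links
        for target in link_graph.get(url, []):
            if target not in hops:
                hops[target] = hops[url] + 1
                queue.append(target)
    return hops

def outside_domain(urls, suffix):
    """Return the crawled URLs whose host does not end with the national suffix."""
    return sorted(u for u in urls if not urlparse(u).hostname.endswith(suffix))

# Hypothetical toy web: a Danish university page links to an Italian summer school.
TOY_WEB = {
    "https://www.au.dk/": ["https://www.au.dk/summer-schools"],
    "https://www.au.dk/summer-schools": ["https://www.unibo.it/summer-school"],
    "https://www.unibo.it/summer-school": ["https://www.unibo.it/"],
    "https://www.unibo.it/": [],
}

archived = crawl(TOY_WEB, ["https://www.au.dk/"], max_hops=2)
foreign = outside_domain(archived, ".dk")
print(foreign)
```

With a limit of two hops, the Italian summer-school page is captured while the Unibo.it home page, one link further away, is not; this is why only parts of a foreign website tend to surface in another country's archive.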

As a part of this work, it was found that both the Portuguese (Arquivo) and Danish (Netarkivet) web archives have preserved parts of Unibo.it several times since 2006.

Cloned Versions of the Website

Among the variety of sources available, one deserves a specific mention. In May 2007, a group of activists decided to create a copy of the Unibo.it web interface, as part of a protest against the European Credit Transfer and Accumulation System (ECTS) for the evaluation of the number of hours of study. At the URL http://www.unibologna.eu an identical version of the website was available, with a description of the reasons for the protest.

This source has been important in this study not only because it documented an innovative way of conducting a protest against an academic institution (by targeting its website), but also because the cloned website was preserved by the Internet Archive.

A Critical Combination of Sources and Methods

The combination of traditional archival practices with approaches from the field of internet studies is essential in the attempt to face this emblematic example of scarcity of born-digital primary sources and to reconstruct the past of the University of Bologna website. This new methodology for collecting born-digital evidence has been especially useful in identifying the narrative behind the early years of Unibo.it, which involves the arrival of a Turkish professor from the United States at the university in 1988, the establishment of the second Italian node to the Internet and the creation of arguably one of the most relevant university websites of the country.⁵

While the difficulties in reconstructing the recent past of a university website could surprise the reader, as less than 30 years have passed since its creation, they only represent one part of the new issues that born-digital sources will pose to future historians.

As previously remarked and as will be expanded in the next section, future historians will in fact also be challenged by a never-before-experienced abundance of records of our past. The second case study presented in this chapter focuses on obtaining small topic-specific collections from large-scale archives of the web; by presenting the encountered challenges and describing the adopted solutions, it remarks on the importance of fruitfully combining the traditional historical method with approaches from the field of natural language processing.

3. Creating Political Event Collections: A Tale of Abundance

The World Wide Web provides the research community with an unprecedented abundance of primary sources for diachronically tracing, examining and understanding major events and transformations in our society. For two decades, public and private institutions have preserved these born-digital materials for future analysis (Gomes and Costa, 2011). However, these collections are now so large that - in the rare cases when they are fully available for research (Hockx-Yu, 2014) - it is not feasible for scholars to study political and social phenomena by examining them in their entirety.

If we consider for instance the Internet Archive, during its first twenty years it has preserved almost 500 billion web pages, and as of 2017 it has a collection of around 25 petabytes of data. Since 2001, this collection has been available for research through a URL search tool on the Wayback Machine. In the most recent years, information retrieval systems supporting keyword search over the diachronic layers of web archives have been developed by the research community and employed by institutions such as the UK Web Archive and - since 2017 - also partially by the Internet Archive. In addition to this, out-of-the-box tools such as ArchiveSpark (Holzmann et al., 2016a) and Warcbase (Lin et al., 2017) have been developed by the research community with the specific goal of supporting scholars in gathering information from large-scale web archive collections.
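To make URL-based access concrete, the following sketch builds the two kinds of address commonly used to look up a known URL in the Wayback Machine: a replay address and a query to the public availability endpoint. It is a minimal illustration, not part of the study; the helper names are hypothetical, the example domain and date are chosen only for illustration, and no request is actually sent.

```python
from urllib.parse import urlencode

def wayback_replay_url(url, timestamp):
    """Address of the capture closest to `timestamp` (YYYYMMDD[hhmmss]) for `url`."""
    return f"https://web.archive.org/web/{timestamp}/{url}"

def wayback_availability_query(url, timestamp):
    """Query string for the availability endpoint, which answers in JSON."""
    params = urlencode({"url": url, "timestamp": timestamp})
    return "https://archive.org/wayback/available?" + params

# Illustrative lookup of a university home page as captured around January 1997.
replay = wayback_replay_url("unibo.it", "19970101")
query = wayback_availability_query("unibo.it", "19970101")
print(replay)
print(query)
```

Both addresses presuppose that the researcher already knows which URL to ask for, which is precisely the limitation that keyword search and tools for processing whole collections are meant to overcome.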

One of the main endeavors of web archive institutions for fostering the use of these new resources is to offer manually curated sub-collections regarding recent socio-political events. On Archive-It - a subscription web archiving service provided by the Internet Archive - a few collections regarding large-scale events such as the Boston Marathon Shooting, the Black Lives Matter movement and the Charlie Hebdo terrorist attack are available. The collections are curated by ‘the Archive-It team in conjunction with curators and subject matter experts from institutions around the world’.

In addition to manual selection, another solution employed by digital archivists for creating and sharing these event collections is to adopt a filtering approach that presents to the user only those documents that mention the name of the event. This type of approach is common in event-harvesting from Twitter, where researchers collect all tweets that - for example - mention the hashtag of the event.

While both collecting documents from web archives through manual selection and retrieving materials through name-filtering have already proved their usefulness in supporting researchers in the humanities and social sciences (e.g., Small, 2011), they have a few crucial limitations. On the one hand, manual selection is obviously a painstakingly long process, given the previously mentioned difficulties of retrieving information from web archives. On the other hand, collecting documents using the event-name heuristic presents the crucial limitation of often missing information on background stories as well as premises of the examined events. To give a specific example, let us imagine that the goal is to collect primary sources regarding the 2004 Ukraine Orange Revolution. If the adopted method only retrieves documents that mention the name of the event, it will not collect materials that connect the premises of the revolution to the previous controversial presidential election in the country. The same issue will emerge when studying the first free Algerian elections since independence (1990), which are a premise of the Algerian civil war, or when investigating the economic crisis behind Fujimori’s auto-golpe in Peru in 1992. In this last case, the documents that discuss the adoption of austerity measures will not be part of the collection. Moreover, the name used for referring to an event might change over time or vary between countries and languages: for example, one of the early hashtags used for the 2011 Egyptian Revolution was #jan25, referring to the day it started.
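The limitations of the event-name heuristic can be seen in a few lines of code. The sketch below is a deliberately naive filter written for illustration; the function name and the four sample documents are invented for this example and do not come from any archive.

```python
def name_filter(documents, event_names):
    """Keep only the documents whose text mentions one of the given event names."""
    names = [name.lower() for name in event_names]
    return [
        doc_id
        for doc_id, text in documents.items()
        if any(name in text.lower() for name in names)
    ]

# Invented sample documents (hypothetical, for illustration only).
DOCS = {
    "d1": "Crowds gather in Kyiv as the Orange Revolution enters its second week.",
    "d2": "Observers dispute the results of the presidential run-off in Ukraine.",
    "d3": "Opposition leaders address supporters on Independence Square.",
    "d4": "Heading to Tahrir Square today #jan25",
}

orange_hits = name_filter(DOCS, ["Orange Revolution"])
egypt_hits = name_filter(DOCS, ["Egyptian Revolution"])
egypt_hits_with_alias = name_filter(DOCS, ["Egyptian Revolution", "#jan25"])
print(orange_hits, egypt_hits, egypt_hits_with_alias)
```

Only the first document is returned for the Orange Revolution, although the second and third describe its premises and protagonists, and the tweet about Tahrir Square is found only once the early hashtag is added by hand to the list of names.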

The second case study presented in this chapter is an interdisciplinary project between computer science and political history focused on building more comprehensive sub-collections regarding events such as elections, protests and political crises from large-scale web archives. As part of this research, a system that employs natural language processing methods and information retrieval approaches has been developed, which is able to gather and organize a highly comprehensive collection of sources describing a specific event (Nanni et al., 2017). The developed approach is inspired by the fact that, when historians are conducting the same task manually (i.e., identifying relevant materials across an entire archive), they do not necessarily search only for documents that mention the name of the event. Historians will also try to collect documents that talk about related aspects which provide the context, involving for example some of the participants in the event, but not others. If we consider the previous example regarding the Orange Revolution, historians will also be interested in materials from the same period of time discussing the political career of Yulia Tymoshenko or addressing the state of the political relations between Ukraine, Russia and the European Union.

Identifying Related Concepts and Entities

In order to achieve this goal in an automated fashion, the first step is to be able to identify a set of concepts and entities that are relevant to an event. To do so, DBpedia (Auer et al., 2007) has been employed. This is a large-scale knowledge base extracted from Wikipedia, where events (such as the Orange Revolution) are represented by nodes and connected through edges (i.e., hyperlinks in Wikipedia) to other related entities.
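The idea of collecting the entities linked to an event node can be sketched as a breadth-first walk over a graph. This is only an illustration under stated assumptions: the adjacency dictionary below is a hand-written toy graph, not an extract from DBpedia, and the function name and the `max_hops` parameter are hypothetical.

```python
from collections import deque

def related_entities(graph, event, max_hops=1):
    """Collect the nodes reachable from the event node within max_hops edges."""
    seen = {event}
    queue = deque([(event, 0)])
    related = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand nodes that are already at the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                related.append(neighbour)
                queue.append((neighbour, depth + 1))
    return related

# Toy graph for illustration only; real DBpedia nodes have many more links.
toy_graph = {
    "Orange Revolution": ["Yulia Tymoshenko", "Ukraine"],
    "Ukraine": ["Russia", "European Union"],
}

one_hop = related_entities(toy_graph, "Orange Revolution")
two_hops = related_entities(toy_graph, "Orange Revolution", max_hops=2)
```

Widening the hop limit quickly enlarges the candidate set, which is one reason why a subsequent relevance-scoring step is needed.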

Retrieving Contextual Passages

For each collected entity and concept, a textual passage presenting it in the context of the event was also extracted from Wikipedia (for example: ‘Yulia Tymoshenko co-led the Orange Revolution and was the first woman appointed Prime Minister of Ukraine’). This is an optimal solution for finding other terms that could be useful to identify relevant documents.

Ranking Concepts and Entities

Having obtained an initial set of potentially relevant concepts and entities, the goal is to score each of them on how relevant it is to the event. For example, while Yulia Tymoshenko is highly relevant for the Orange Revolution, the European Union played only a marginal role in the event. Different approaches for ranking entities and concepts for relevance were tested and the best performing solution was to compute distances between entities and the event employing out-of-the-box RDF vector-representations (Ristoski and Paulheim, 2016).
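Ranking by vector distance can be sketched as follows. The sketch assumes cosine similarity as the measure, which the text above does not specify, and it uses made-up three-dimensional vectors; these are not RDF2vec embeddings, and the function names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_entities(event_vector, entity_vectors):
    """Sort entities from most to least similar to the event vector."""
    scored = [
        (name, cosine_similarity(event_vector, vector))
        for name, vector in entity_vectors.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Made-up vectors for illustration only.
event_vector = [0.9, 0.1, 0.3]
toy_vectors = {
    "European Union": [0.1, 0.9, 0.2],
    "Yulia Tymoshenko": [0.8, 0.2, 0.4],
}

ranking = rank_entities(event_vector, toy_vectors)
```

With these toy values the entity whose vector points in nearly the same direction as the event vector is ranked first, which is the behaviour one would expect for a highly relevant participant.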

Finding Mentions in Text

Given the ranked set of entities and concepts, other documents mentioning them in relevant contexts were retrieved from the web archive. In order to go beyond simple string-matching of concepts that are considered relevant (e.g., ‘protests’, ‘revolution’, ‘crisis’, ‘election’), word-embedding representations (Mikolov et al., 2013) have been adopted. Embedding techniques represent each word, entity or concept (e.g., ‘protest’) as a numeric vector of n dimensions. This makes it possible to measure similarity across different words and to collect relevant materials even if they talk about ‘demonstration’ or ‘crisis’ instead of mentioning, for example, ‘protest’ or ‘revolution’.
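A minimal sketch of such soft matching is given here. It assumes a tiny hand-made vocabulary of two-dimensional vectors and an arbitrary threshold of 0.8; real word embeddings are learned from large corpora and have hundreds of dimensions. The function names, the vectors and the threshold are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mentions_similar_term(text, seed_terms, vectors, threshold=0.8):
    """True if any known word in the text is close to one of the seed terms."""
    for token in text.lower().split():
        if token not in vectors:
            continue  # words without a vector cannot be compared
        for seed in seed_terms:
            if cosine(vectors[token], vectors[seed]) >= threshold:
                return True
    return False

# Made-up vectors for illustration only.
toy_embeddings = {
    "protest": [1.0, 0.0],
    "demonstration": [0.9, 0.1],
    "weather": [0.0, 1.0],
}

matched = mentions_similar_term(
    "A large demonstration filled the square", ["protest"], toy_embeddings
)
unmatched = mentions_similar_term(
    "The weather was mild", ["protest"], toy_embeddings
)
```

The first sentence is retained even though the word ‘protest’ never appears in it, whereas plain string-matching would have discarded it.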

Final Collection Building

It could happen that documents mention relevant entities and concepts out of context, for example as part of a comparison: ‘The popular opposition to Ethiopia's current corrupt regime is comparable to the Orange Revolution in Ukraine.’ In order to filter them out and select only the documents that should be included in the event collection, a machine learning approach called Learning to Rank (Liu, 2009) has been employed, which, given an initial set of relevant and not relevant documents, learns how to abstract this property and to automate the ranking process.
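To give an intuition of how a ranker can be learned from labelled examples, the sketch below trains a pointwise model (a simple logistic regression fitted by gradient descent). This is only one of the families of methods covered by Learning to Rank, and the text above does not state which variant was used. The two features and all numeric values are hypothetical.

```python
import math

def train_pointwise_ranker(examples, epochs=200, learning_rate=0.5):
    """Fit logistic-regression weights on (features, label) pairs by gradient descent."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            z = bias + sum(w * x for w, x in zip(weights, features))
            prediction = 1.0 / (1.0 + math.exp(-z))
            error = prediction - label
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias

def score(model, features):
    """Higher scores mean the document is predicted to be more relevant."""
    weights, bias = model
    return bias + sum(w * x for w, x in zip(weights, features))

# Hypothetical features: [share of related entities mentioned, overlap with contextual passages].
# Label 1 marks a relevant document, label 0 a non-relevant one.
training_examples = [
    ([0.9, 0.8], 1),
    ([0.7, 0.9], 1),
    ([0.2, 0.1], 0),
    ([0.3, 0.2], 0),
]
model = train_pointwise_ranker(training_examples)

candidates = {"in_context": [0.8, 0.7], "passing_comparison": [0.1, 0.3]}
ranked = sorted(candidates, key=lambda name: score(model, candidates[name]), reverse=True)
```

A document that mentions the event only in a passing comparison would typically share few related entities and little contextual vocabulary with it, so a model of this kind would place it near the bottom of the ranking.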

A Critical Combination of Sources and Methods

The combination of traditional practices of historical research with methodologies and approaches from the fields of natural language processing and information retrieval is essential for facing the large abundance of born-digital primary sources. Some of the approaches presented in this chapter have already been adopted in political science research. One of these first studies focuses on retrieving documents which referred to political events (e.g., elections) from institutional web collections of the United States government in order to define a new measure of ‘attention’ of the U.S. Congress and the President to democratization and electoral practices in other countries, from Zimbabwe to Haiti and Egypt (Elshehawy et al., 2017). By doing so, this initial work highlights both the potential and challenges of using born-digital documents and computational methods for obtaining new insights on the recent political past.

The two case studies presented in this chapter reveal the importance of adopting a highly interdisciplinary approach when dealing with born-digital sources; methodologies from the field of internet studies could support historians in reconstructing lost webpages, while natural language processing methods could guide them in retrieving documents from large-scale web archives. The final part of this chapter will remark further on this, by discussing the importance of offering this interdisciplinary preparation to future historians in their educational programs.

4. Conclusion: A New Generation of Historians

In recent years, researchers have argued that history, like other humanities disciplines, is reaching a turning point in its methodology (Scheinfeldt, 2012; Graham, Milligan and Weingart, 2015; Nelson, 2016): sustained by the efforts of many digitization projects, the community has been employing computational methods in order to examine these vast resources and obtain new insights. This change in methodology has reopened a long-term debate regarding the ways textual evidence of the past can and shall be properly interpreted.

While for the historical profession it is of course beneficial to constantly debate and criticize the validity of established practices of acquiring knowledge from sources, it is argued in this chapter that the adoption of digitized datasets and computational methods cannot be considered, by itself, the triggering factor of a fundamental turning point in our profession. In fact, adopting (or not) large-scale datasets of digitized sources, together with computational methods, will always remain a choice for the history scholar: Charles Darwin can still be studied without conducting text mining over the collections presented on Darwin Online, just as eighteenth-century London can be examined without distant reading the Proceedings of the Old Bailey Online.

However, it is also argued that history is in fact about to face a paradigm-shifting transition in its methods, but the triggering cause of this transition lies in the born-digital nature of the large majority of sources produced by contemporary societies. This change affects any type of document we create and consume in our everyday life, from bureaucratic forms collected by the public sector to newspaper articles to political mail correspondence to university websites, and it is about to present its multifarious consequences on historical research.

Born-digital sources are significantly more complex to archive, collect, analyze and select compared to traditional materials. Websites (such as Unibo.it) are large and variegated collections of documents, which are often not preserved in their entirety by web archive initiatives and can be reconstructed only through the meticulous combination of various pieces of information from different sources. When a resource, such as the institutional website of an administration, is finally re-created, it is often so vast that computational technologies (i.e., natural language processing methods and information retrieval approaches) are necessary for identifying and retrieving specific documents.

The methodological steps overviewed in this chapter for collecting, analyzing and selecting born-digital documents require strong interdisciplinary competences and a highly critical attitude towards sources and methods. In this complex scenario, this chapter concludes by raising a very pressing question: how can the new generations of historians be prepared to face these new challenges?

In recent years, the digital history community has already offered many educational activities on computational methods to its students. From workshops to panels, from courses to summer schools, from tutorials to hackathons, these initiatives have almost always been focused on presenting the potential of new resources, tools and platforms to history students, following an attitude which has been branded as ‘more hack, less yack’ (Nowviskie, 2014). While offering hands-on experiences with computational tools is important in order to introduce history students to the digital humanities, a critical approach is strongly needed in order to properly deal with born-digital sources and computational methods.

For this reason, it is essential that students are first of all guided in shaping their research topics and receive early on in their studies the preparation necessary to support a critical analysis of the born-digital documents and computational methods at their disposal. This will be imperative for a generation of historians who will be able to go beyond an unquestioned adoption of the new sources and tools at their disposal and will instead critically employ them, in search of new historical perspectives.

References

Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., & Ives, Z. (2007) ‘DBpedia: A Nucleus for a Web of Open Data’, Proceedings of the 6th International and 2nd Asian Conference on Semantic Web: 722-735.

Barthes, R. (1967) ‘Discourse on History’, Social Science Information, 6(4): 65-75.

Blevins, C. (2014) ‘Space, Nation, and the Triumph of Region: A View of the World from Houston’, The Journal of American History, 101(1): 122-147.

Bloch, M. (1949) Apologie pour l'histoire, ou, Métier d'historien, Armand Colin, Paris.

Brockliss, L.W. (1978) ‘Patterns of Attendance at the University of Paris, 1400–1800’, The Historical Journal, 21(3): 503-544.

Brügger, N. (2005) ‘Archiving Websites: General Considerations and Strategies’, The Centre for Internet Research, Aarhus.

Brügger, N. (2009) ‘Website History and the Website as an Object of Study’, New Media & Society, 11(1-2): 115-132.

Brügger, N. (2011) ‘Web Archiving – between Past, Present, and Future’, in M. Consalvo and C. Ess (eds), The Handbook of Internet Studies, Wiley-Blackwell, Oxford.

Brügger, N. (2012) ‘When the Present Web is Later the Past: Web Historiography, Digital History, and Internet Studies’, Historical Social Research/Historische Sozialforschung, 37(4): 102-117.

Burke, P. (2008) What is Cultural History?, Polity, Cambridge (UK).

Bush, V. (1945) ‘As We May Think’, The Atlantic Monthly, 176(1): 101-108.

Cohen, D.J., & Rosenzweig, R. (2005) Digital History: A Guide to Gathering, Preserving, and Presenting the Past on the Web, University of Pennsylvania Press.

Cohen, D.J. (2005) ‘By the Book: Assessing the Place of Textbooks in US Survey Courses’, The Journal of American History, 91(4): 1405-1415.

Cohen, D.J., Frisch, M., Gallagher, P., Mintz, S., Sword, K., Taylor, A.M., & Turkel, W.J. (2008) ‘Interchange: The Promise of Digital History’, The Journal of American History, 95(2): 452-491.

Daumard, A., & Furet, F. (1959) ‘Méthodes de l'histoire sociale: les archives notariales et la mécanographie’, Annales. Histoire, Sciences Sociales, 14(4): 676-693.

Davis, C. (2014) ‘Archiving the Web: A Case Study from the University of Victoria’, code{4}lib Journal, 26 (http://journal.code4lib.org/articles/10015)

Derrida, J. (1967) Of Grammatology, Les Éditions de Minuit, Paris.

Dougherty, M., Meyer, E.T., Madsen, C.M., Van den Heuvel, C., Thomas, A., & Wyatt, S. (2010) ‘Researcher Engagement with Web Archives: State of the Art’, Preprint on SSRN (https://ssrn.com/abstract=1714997)

Elshehawy, A., Marinov, N., & Nanni, F. (2017) ‘Quantifying Attention to Foreign Elections with Text Analysis of US Congress and the Presidency’, Preprint on SSRN (https://ssrn.com/abstract=2981486)

Evans, R.J. (2001) In Defence of History, Granta Books, London.

Fogel, R.W., & Engerman, S.L. (1974) Time on the Cross, University Press of America, Lanham, Maryland.

Gomes, D., Miranda, J., & Costa, M. (2011) ‘A Survey on Web Archiving Initiatives’, Proceedings of the 15th International Conference on Theory and Practice of Digital Libraries: 408-420.

Graham, S., Milligan, I., & Weingart, S. (2015) Exploring Big Historical Data: The Historian's Macroscope, Imperial College Press, London.

Greif, A. (1997) ‘Cliometrics After 40 Years’, The American Economic Review, 87(2): 400-403.

Hale, S.A., Yasseri, T., Cowls, J., Meyer, E.T., Schroeder, R., & Margetts, H. (2014) ‘Mapping the UK Webspace: Fifteen Years of British Universities on the Web’, Proceedings of the 2014 ACM Conference on Web Science: 62-70.

Hockx-Yu, H. (2014) ‘Access and Scholarly Use of Web Archives’, Alexandria: The Journal of National and International Library and Information Issues, 25(1-2): 113-127.

Holzmann, H., Goel, V., & Anand, A. (2016a) ‘Archivespark: Efficient Web Archive Access, Extraction and Derivation’, Proceedings of the 2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL): 83-92.

Holzmann, H., Nejdl, W., & Anand, A. (2016b) ‘The Dawn of Today's Popular Domains: A Study of the Archived German Web Over 18 Years’, Proceedings of the 2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL): 73-82.

Iggers, G.G. (2005) Historiography in the Twentieth Century: From Scientific Objectivity to the Postmodern Challenge, Wesleyan University Press, Middletown (CT).

Knowles, A.K., & Hillier, A. (eds) (2008) Placing History: How Maps, Spatial Data, and GIS are Changing Historical Scholarship, ESRI, New York.

LaFrance, A. (2015) ‘Raiders of the Lost Web’, The Atlantic, 14 (https://www.theatlantic.com/technology/archive/2015/10/raiders-of-the-lost-web/409210/)

Lin, J., Milligan, I., Wiebe, J., & Zhou, A. (2017) ‘Warcbase: Scalable Analytics Infrastructure for Exploring Web Archives’, Journal on Computing and Cultural Heritage (JOCCH), 10(4): 22.

Liu, T.Y. (2009) ‘Learning to Rank for Information Retrieval’, Foundations and Trends in Information Retrieval, 3(3): 225-331.

Lyotard, J.F. (1979) The Postmodern Condition: A Report on Knowledge, Minuit, Paris.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., & Dean, J. (2013) ‘Distributed Representations of Words and Phrases and their Compositionality’, Proceedings of the 26th International Conference on Neural Information Processing Systems: 3111-3119.

Milligan, I. (2012) ‘Mining the ‘Internet Graveyard’: Rethinking the Historians’ Toolkit’, Journal of the Canadian Historical Association/Revue de la Société historique du Canada, 23(2): 21-64.

Milligan, I. (2016) ‘Lost in the Infinite Archive: The Promise and Pitfalls of Web Archives’, International Journal of Humanities and Arts Computing, 10(1): 78-94.

Moretti, F. (2013) Distant Reading, Verso Books, London.

Munslow, A. (2006) Deconstructing History, Routledge, New York.

Nanni, F. (2013) ‘L’archiviazione delle pagine dei quotidiani online’, Diacronie. Studi di Storia Contemporanea, 15(3) (http://www.studistorici.com/wp-content/uploads/2013/10/02_NANNI.pdf)

Nanni, F. (2017a) ‘Reconstructing a Website's Lost Past: Methodological Issues Concerning the History of www.unibo.it’, Digital Humanities Quarterly, 11(2) (http://www.digitalhumanities.org/dhq/vol/11/2/000292/000292.html)

Nanni, F. (2017b) ‘The Web as a Historical Corpus: Collecting, Analysing and Selecting Sources on the Recent Past of Academic Institutions’, Ph.D. Dissertation, University of Bologna.

Nanni, F., Ponzetto, S.P., & Dietz, L. (2017) ‘Building Entity-Centric Event Collections’, Proceedings of the 2017 IEEE/ACM Joint Conference on Digital Libraries (JCDL): 199-209.

Nelson, R.K. (2016) ‘Digital Humanities as Appendix’, American Quarterly, 68(1): 131-136.

Noiret, S. (2015) ‘Digital Public History: Bringing the Public Back In’, Public History Weekly, 3(13) (http://hdl.handle.net/1814/38393)

Nowviskie, B. (2014) ‘On the Origin of “Hack” and “Yack”’, in M.K. Gold and L.F. Klein (eds) Debates in Digital Humanities (2nd edn), University of Minnesota Press (http://dhdebates.gc.cuny.edu/debates/text/58)

Owen, D., & Davis, R. (2008) ‘Presidential Communication in the Internet Era’, Presidential Studies Quarterly, 38(4): 658-673.

Ramage, D.R. (2011) ‘Studying People, Organizations, and the Web with Statistical Text Models’, Ph.D. Dissertation, Stanford University.

Ristoski, P., & Paulheim, H. (2016) ‘RDF2vec: RDF Graph Embeddings for Data Mining’, Proceedings of the 2016 International Semantic Web Conference: 498-514.

Robertson, S. (2016) ‘The Differences Between Digital History and Digital Humanities’, in M.K. Gold and L.F. Klein (eds) Debates in Digital Humanities (2nd edn), University of Minnesota Press (http://dhdebates.gc.cuny.edu/debates/text/76)

Rosenzweig, R. (2003) ‘Scarcity or Abundance? Preserving the Past in a Digital Era’, The American Historical Review, 108(3): 735-762.

Rothman, J. (2014) ‘An Attempt to Discover the Laws of Literature’, The New Yorker.

Rüegg, W., & de Ridder-Symoens, H. (eds) (1992) A History of the University in Europe, Cambridge University Press, Cambridge.

Scheinfeldt, T. (2012) ‘Sunset for Ideology, Sunrise for Methodology’, in M.K. Gold and L.F. Klein (eds) Debates in Digital Humanities (1st edn), University of Minnesota Press: 124-127.

Schreibman, S., Siemens, R., & Unsworth, J. (eds) (2004) A Companion to Digital Humanities, Blackwell Publishing, Oxford.

Shafer, R.J. (1974) A Guide to Historical Method, Dorsey Press, Belmont (CA).

Small, T.A. (2011) ‘What the Hashtag? A Content Analysis of Canadian Politics on Twitter’, Information, Communication & Society, 14(6): 872-895.

Thaller, M. (1991) ‘The Historical Workstation Project’, Computers and the Humanities, 25(2): 149-162.

Thomas III, W.G. (2004) ‘Computing and the Historical Imagination’, in Schreibman, S., Siemens, R., & Unsworth, J. (eds) A Companion to Digital Humanities, Blackwell Publishing, Oxford: 56-68.

Wilkens, M. (2013) ‘The Geographic Imagination of Civil War-Era American Fiction’, American Literary History, 25(4): 803-840.

1 It is also important to acknowledge that reactions to postmodern approaches are present as well in the historiographic debate (see for example Evans, 2001).

2 See for example the adoption of social science methodologies in historical research in Fogel and Engerman (1974).

3 However, the relationship between history and computing on the one side and literary and linguistic computing on the other side has always been complicated (see for example Robertson, 2016).

4 As described in the FAQ section of the Internet Archive, a website owner can request to stop crawling or archiving a site and the Internet Archive will endeavor to comply with the request. This will be signaled by a 'blocked site error' message such as ‘This URL has been excluded from the Wayback Machine’.

5 In 2001 the University of Bologna website won the ‘WWW’ prize from the Italian economic newspaper Il Sole 24 Ore for the best website in the category ‘School, university and research’. Then, for three consecutive years (2005-2007) Unibo.it received the ‘Osc@r del web’ prize as the best Italian public administration website. In 2007 Luigi Nicolais, the Italian Minister of Public Administration, was also present to confer the prize.