Richard Deswarte: Interrogating the Archived UK Web

Post on 02-Jul-2015


Digital History seminar 4 November 2014 Live Stream: http://ihrdighist.blogs.sas.ac.uk/2014/10/28/tuesday-4-november-interrogating-the-archived-uk-web-historians-and-social-scientists-research-experiences/

The Search for Meaning: Exploring Euroscepticism in the UK Web Archive

IHR Digital History Seminar

IHR, 23 April 2014

Richard Deswarte

School of History, UEA

• Intro to ‘Revealing British Euroscepticism in the UK Web Domain and Archive’

• Searching

• Meaning

• Provisional thoughts so far

Searching

• Eurosceptic, Euroscepticism, UKIP, EU, Referendum Party

• Searched 0.5% of domain; then 12%; then fullish

• Limited but numerous results

• UK Web Archive - Eurosceptic 312 returns; 5604 returns; approx. 14 000

• UK Government Web Archive – Eurosceptic 3420

• Ordering of results – currently crawl date

• Strange ‘false’ returns – Yorkshire Post sports pages, Morning Advertiser

• Results/Filters – crawl year, hosts, suffixes, postcode, sentiment, content type, language
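The return counts above can be put on a common footing with a little arithmetic. The following is a minimal sketch, assuming (the slide does not say so explicitly) that the 312 and 5604 ‘Eurosceptic’ returns correspond, in order, to the 0.5% and 12% samples of the domain. The function name `returns_per_percent` is illustrative only, and the ‘fullish’ search is left out because its coverage is not given.

```python
# Assumed pairing (not stated on the slide): each return count belongs to
# the sample size listed in the same position.
samples = [
    (0.5, 312),    # 0.5% of the domain searched
    (12.0, 5604),  # 12% of the domain searched
]


def returns_per_percent(percent_searched, returns):
    """Return the number of hits per 1% of the domain searched."""
    return returns / percent_searched


for pct, hits in samples:
    rate = returns_per_percent(pct, hits)
    print(f"{pct:>5}% searched: {hits} returns, {rate:.0f} per 1% of domain")
```

Under that assumption the rate differs between the two samples (624 versus 467 returns per 1%), which is one reason to be cautious about extrapolating linearly from a partial search to the whole domain.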

Meaning

• Making sense of and analysing results – research validity

• Dirty data – Yorkshire Post

• Misleading data – UKIP

• Qualitative

• Needle in a haystack

• Added value tools – sentiment analysis

• Quantitative – completeness of data and crawls

• Tools

• Downloading/exporting

Sentiment Analysis

Morning Advertiser forum postings, 14 Feb 2007, IA Wayback Machine

• Neutral (14)

• Very Positive (10)

• Very Negative (5)

• Mildly positive (3)

• Positive (1)
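To see how the sentiment labels above are distributed, the counts can be turned into shares. This is a minimal sketch that uses only the counts shown on the slide. The variable names are illustrative, and nothing here reproduces the archive's own sentiment tool.

```python
from collections import Counter

# Label counts exactly as shown on the slide.
sentiment_counts = Counter({
    "Neutral": 14,
    "Very Positive": 10,
    "Very Negative": 5,
    "Mildly positive": 3,
    "Positive": 1,
})

total = sum(sentiment_counts.values())

# Share of each label among all labelled items.
shares = {label: count / total for label, count in sentiment_counts.items()}

for label, count in sentiment_counts.most_common():
    print(f"{label:16s} {count:3d}  {shares[label]:.1%}")
```

With 33 labelled items in total, even a single relabelled posting shifts a share by about three percentage points, so the distribution is best read as indicative only.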

[Chart: Keyword search returns 1996-2010. Series: Euroscepticism, Eurosceptic, UKIP, EU (x100), Referendum Party. X-axis: year, 1996 to 2009. Y-axis: 0 to 1400.]

Preliminary Thoughts

• Search focus

• Unstructured big data (uncatalogued)

• Access to & understanding ‘full’ data

• Understanding meaning – sampling

• ‘False returns’ & ‘clean data’

• Tools

• Exporting results

• Interpreting results - sampling

• A unique but problematic primary source (anything & everything almost)
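Sampling, raised twice in the list above, could be approached along the following lines. This is a minimal sketch, assuming the search returns can be exported as a list of identifiers. The function `sample_for_close_reading`, the seed value and the stand-in identifiers are all hypothetical and do not describe any export feature of the archive.

```python
import random


def sample_for_close_reading(result_ids, k, seed=2014):
    """Draw a reproducible simple random sample of k results.

    A fixed seed means the same sample can be re-drawn later, so the
    qualitative reading can be checked or repeated.
    """
    rng = random.Random(seed)
    k = min(k, len(result_ids))  # never ask for more than is available
    return sorted(rng.sample(result_ids, k))


# Hypothetical stand-ins for 5604 exported search returns.
result_ids = list(range(1, 5605))

sample = sample_for_close_reading(result_ids, k=50)
print(len(sample), sample[:5])
```

A simple random sample treats every return alike. If ‘false returns’ cluster on a few hosts, stratifying the sample by host or crawl year might give a fairer picture, but that is a design choice beyond this sketch.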

Thank you. Comments and questions welcome.

Richard Deswarte

School of History

UEA

r.deswarte@uea.ac.uk