Dmi12 workshops - crawling and scraping

Crawling and Scraping: The Issuecrawler and the Lippmannian device. Erik Borra, Michael Stevenson

description

The workshop serves as an introduction to two classic digital methods techniques for issue mapping and analysis. A discussion of the Issue Crawler and the Lippmannian device is followed by a short exercise in which we'll study the presence of skeptics among top sources of information related to climate change.

Transcript of Dmi12 workshops - crawling and scraping

Page 1: Dmi12   workshops - crawling and scraping

Crawling and Scraping
The Issuecrawler and the Lippmannian device.

Erik Borra, Michael Stevenson

Page 2: Dmi12   workshops - crawling and scraping

“Reworking method for Internet research”

Page 3: Dmi12   workshops - crawling and scraping

Issuecrawler.

Page 4: Dmi12   workshops - crawling and scraping

CRAWL STARTING POINTS: sites A, B, C

Page 5: Dmi12   workshops - crawling and scraping

CRAWL STARTING POINTS: sites A, B, C

CRAWL DEPTH ONE: follow all starting points' outlinks (sites A, B, C, D)

Page 6: Dmi12   workshops - crawling and scraping

CRAWL STARTING POINTS: sites A, B, C

CRAWL DEPTH ONE: follow all starting points' outlinks (sites A, B, C, D)

CRAWL DEPTH TWO: follow all outlinks from the pages found in the previous depth (sites A, B, C, D, E, F, G, H)
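
The depth-by-depth expansion in these diagrams can be sketched as a breadth-first walk over a link graph. This is an illustration only: the `LINKS` dictionary is a made-up toy graph chosen to reproduce the site letters in the diagrams, and `crawl` is a hypothetical helper, not Issuecrawler's own code.

```python
# Toy link graph (an assumption for illustration): site -> sites it links to.
LINKS = {
    "A": ["B", "D"],
    "B": ["D"],
    "C": ["B"],
    "D": ["E", "F", "G", "H"],
}


def crawl(links, starting_points, depth):
    """Return every site reached within `depth` link-following steps."""
    seen = set(starting_points)
    frontier = list(starting_points)
    for _ in range(depth):
        next_frontier = []
        for site in frontier:
            for target in links.get(site, []):
                if target not in seen:
                    seen.add(target)
                    next_frontier.append(target)
        # Only newly discovered sites are followed in the next iteration.
        frontier = next_frontier
    return seen


if __name__ == "__main__":
    for d in range(3):
        print(d, sorted(crawl(LINKS, ["A", "B", "C"], d)))
```

With this toy graph, depth zero returns only the starting points, depth one adds D, and depth two adds E through H, matching the three stages of the diagram.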

Page 7: Dmi12   workshops - crawling and scraping

ANALYSIS SNOWBALL: retain all links and sites discovered during the crawl (sites A, B, C, D, E, F, G, H)

Page 8: Dmi12   workshops - crawling and scraping

ANALYSIS INTER-ACTOR: retain only links between the starting points (sites A, B, C)

Page 9: Dmi12   workshops - crawling and scraping

ANALYSIS CO-LINK: retain sites that receive links from at least two other sites (sites B, D)
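
The three modes of analysis (snowball, inter-actor, co-link) differ only in which links or sites they keep from the same crawl. A minimal sketch, assuming the crawl result is available as a list of (source, target) pairs; `EDGES` is a made-up example and the function names are hypothetical, not part of Issuecrawler.

```python
from collections import defaultdict

# Hypothetical crawl result: (linking site, linked-to site) pairs.
EDGES = [
    ("A", "B"), ("A", "D"), ("B", "D"), ("C", "B"),
    ("D", "E"), ("D", "F"), ("D", "G"), ("D", "H"),
]
STARTING_POINTS = {"A", "B", "C"}


def snowball(edges):
    """Retain every site that appears in any discovered link."""
    return {site for edge in edges for site in edge}


def inter_actor(edges, starting_points):
    """Retain only links whose source and target are both starting points."""
    return [(s, t) for s, t in edges
            if s in starting_points and t in starting_points]


def co_link(edges, min_inlinks=2):
    """Retain sites linked to by at least `min_inlinks` distinct other sites."""
    linkers = defaultdict(set)
    for source, target in edges:
        if source != target:  # a site linking to itself does not count
            linkers[target].add(source)
    return {site for site, sources in linkers.items()
            if len(sources) >= min_inlinks}
```

On this toy data, `co_link(EDGES)` keeps B and D, the two sites shown in the co-link diagram, while `inter_actor` keeps only the links among A, B and C.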

Page 10: Dmi12   workshops - crawling and scraping

Issuecrawler. Modes of analysis

Page 11: Dmi12   workshops - crawling and scraping

Issuecrawler. Micro-politics of association

Pharmaceutical multinational and environmental NGO link to (inter)governmental organizations, but these do not link back.

Pharmaceutical multinational links to environmental NGO, but NGO does not link back.

(Govcom.org, 1999)

Page 12: Dmi12   workshops - crawling and scraping

Issuecrawler. Micro-politics of association

Clusters of Armenian and international organizations, latter do not link back.

(Audrey Selian, 2004)

Page 13: Dmi12   workshops - crawling and scraping

Issuecrawler. Macro-politics of association

Democratic Presidential Primary Web Campaigns (Betsy Sinclair 2007; 2008)

Page 14: Dmi12   workshops - crawling and scraping

Issuecrawler. Macro-politics of association

Page 15: Dmi12   workshops - crawling and scraping

Issuecrawler. Macro-politics of association

Page 16: Dmi12   workshops - crawling and scraping

Issuecrawler. Network composition over time

Page 17: Dmi12   workshops - crawling and scraping

Issuecrawler.

Micro-politics of association

Macro-politics of association

Network composition over time

However... “Doesn’t do content analysis”

Page 18: Dmi12   workshops - crawling and scraping

Lippmannian device. Modes of analysis

Page 19: Dmi12   workshops - crawling and scraping

Walter Lippmann (1889-1974).

“A Test of the News,” 1920

Public Opinion, 1922

The Phantom Public, 1927

‘The problem is to locate by clear and coarse objective tests the actor in a controversy who is most worthy of public support.’ (p120)

-The Phantom Public

Page 20: Dmi12   workshops - crawling and scraping

Lippmannian device. Showing the partisanship of an actor.

Showing the issue agenda of an organization.

Source cloud Issue cloud

Partisanship or commitment. Which sources mention the expert’s name?

Issue agenda. Which issues are on the agenda of an organization or movement?

Page 21: Dmi12   workshops - crawling and scraping

Lippmannian device. “Source cloud”

Showing the partisanship or commitment of sources to one name

Craig Venter's presence in the Synthetic Biology issue space, March 2008. Top sources on "synthetic biology" according to a Google query, with number of mentions of Venter per source, ordered.

Page 22: Dmi12   workshops - crawling and scraping

Lippmannian device. “Source cloud”

Method for showing the partisanship or commitment of sources to names

1. Gather source list (e.g. through IssueCrawler)
2. Query source list for one or more experts
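
The two steps above can be sketched as follows. This is a rough illustration under stated assumptions: `build_queries` and `source_cloud` are hypothetical helpers, and whether the DMI tools issue exactly these `site:` queries is an assumption. How the counts are retrieved is left out; the sample counts are taken from the Frederick Seitz example later in this deck.

```python
def build_queries(sources, names):
    """One site-restricted search query per (source, name) pair."""
    return [f'site:{host} "{name}"' for host in sources for name in names]


def source_cloud(mentions):
    """Order sources by mention count, highest first, ties alphabetically."""
    ranked = sorted(mentions.items(), key=lambda item: (-item[1], item[0]))
    return [f"{host} ({count})" for host, count in ranked]


# Mentions of "Frederick Seitz" per source, as reported in this deck
# (30 July 2007); only a handful of the listed sources are included here.
MENTIONS = {
    "campaigncc.org": 1,
    "climateark.org": 4,
    "marshall.org": 8,
    "realclimate.org": 35,
    "sourcewatch.org": 21,
    "epa.gov": 0,
}

if __name__ == "__main__":
    print(build_queries(["realclimate.org"], ["Frederick Seitz"]))
    print(source_cloud(MENTIONS))
```

The ordered "host (count)" strings are the raw material for a source cloud, where a higher count would typically be rendered in larger type.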

Page 23: Dmi12   workshops - crawling and scraping

Lippmannian device. “Source cloud”

Showing the partisanship or commitment of sources to names

Climate Change Skeptics: Who recognizes them?

(Digital Methods Initiative, 2007)
https://wiki.digitalmethods.net/Dmi/ClimateChangeSkeptics

Page 24: Dmi12   workshops - crawling and scraping

Lippmannian device. “Making an Issue cloud”

An organization’s issue agenda (or commitment)

Public Knowledge, a digital rights NGO, has issues. Which are they most committed to?

Page 28: Dmi12   workshops - crawling and scraping

Lippmannian device. “Issue cloud”

Showing the issue commitments of the NGO, Public Knowledge

Public Knowledge's issue commitment. Lower six issues on Public Knowledge's issue list, ranked according to number of mentions of issues on publicknowledge.org, 2 October 2009.

Page 30: Dmi12   workshops - crawling and scraping

Lippmannian device. “Making an Issue cloud”

Greenpeace issues, http://www.greenpeace.org/international/campaigns.

Stop climate change
Protect ancient forests
Defending our Oceans
Say no to genetic engineering
Eliminate toxic chemicals
Demand Peace and Disarmament
End the nuclear age
Encourage sustainable trade

Keep most significant issue language.

"climate change"
"ancient forests"
oceans
"genetic engineering"
"toxic chemicals"
disarmament
"nuclear power"
"sustainable trade"

Page 32: Dmi12   workshops - crawling and scraping

Lippmannian device. “Issue cloud”

Greenpeace’s issue agenda (distribution of commitment)

Greenpeace's issue commitment. Greenpeace's campaign issue list, ranked according to number of mentions of issues on greenpeace.org, 11 October 2009.
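
Ranking issues by number of mentions can be sketched as a phrase count over page text. This is a minimal sketch, not the Lippmannian device itself: the issue terms come from the Greenpeace list in this deck, while `PAGES` holds made-up sample sentences and both function names are hypothetical.

```python
import re

# Issue terms as listed in this deck for Greenpeace.
ISSUES = [
    "climate change", "ancient forests", "oceans", "genetic engineering",
    "toxic chemicals", "disarmament", "nuclear power", "sustainable trade",
]

# Made-up page texts, for illustration only.
PAGES = [
    "Stop climate change now. Climate change threatens the oceans.",
    "Nuclear power is not the answer to climate change.",
]


def count_mentions(text, phrase):
    """Count case-insensitive whole-phrase occurrences of `phrase` in `text`."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def issue_cloud(pages, issues):
    """Return (issue, total mentions) pairs, most mentioned first."""
    totals = {
        issue: sum(count_mentions(text, issue) for text in pages)
        for issue in issues
    }
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for issue, count in issue_cloud(PAGES, ISSUES):
        print(f"{issue} ({count})")
```

Matching whole phrases mirrors the quoted terms in the issue list: a quoted term such as "climate change" is counted only when both words appear together.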

Page 33: Dmi12   workshops - crawling and scraping

Lippmannian device. “Making an Issue cloud”

Multiple sources, multiple issues

What is the agenda of the global human rights network?

Which issues are at the top and at the bottom of the agenda?

What is the current level of commitment to a particular issue?

Page 34: Dmi12   workshops - crawling and scraping

Lippmannian device. “Making an Issue cloud”

Multiple sources, multiple issues

This is more complicated, but still doable
(Govcom.org, University of Pittsburgh, UMass Amherst, ongoing)

Page 35: Dmi12   workshops - crawling and scraping

Lippmannian device. “Making an Issue cloud”

Take three good lists of human rights organizations (global south, global north, UN’s)

Page 36: Dmi12   workshops - crawling and scraping

Lippmannian device. “Making an Issue cloud”

Make a list of all issues listed on all Websites
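
Combining the issue lists of many organizations into one list mostly comes down to normalizing and deduplicating terms. A minimal sketch, assuming each organization's issues have already been collected by hand; the organizations and issue lists in `ISSUE_LISTS` are made-up placeholders and the helper names are hypothetical.

```python
def normalize(issue):
    """Lower-case an issue label and collapse repeated whitespace."""
    return " ".join(issue.lower().split())


def merge_issue_lists(issue_lists):
    """Map each normalized issue to the sorted organizations that list it."""
    merged = {}
    for org, issues in issue_lists.items():
        for issue in issues:
            merged.setdefault(normalize(issue), set()).add(org)
    return {issue: sorted(orgs) for issue, orgs in merged.items()}


# Placeholder organizations and issues, for illustration only.
ISSUE_LISTS = {
    "org-one.example": ["Death Penalty", "Torture", "Women's rights"],
    "org-two.example": ["torture", "Child  soldiers"],
    "org-three.example": ["Women's Rights", "Torture"],
}

if __name__ == "__main__":
    for issue, orgs in merge_issue_lists(ISSUE_LISTS).items():
        print(issue, len(orgs))
```

The merged keys form the combined issue list; the number of organizations per issue gives a first, rough sense of how widely each issue is shared before any mention counts are gathered.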

Page 39: Dmi12   workshops - crawling and scraping

Lippmannian device. “Issue cloud”

Showing the issue commitments of global human rights network

Global human rights issue agenda. Global human rights actors' issues, ranked according to the estimated number of Google mentions on a set of global human rights actors' websites, 31 March 2009.

Page 40: Dmi12   workshops - crawling and scraping

Lippmannian device. “Issue cloud”

Showing the issue commitments of global human rights network

Global human rights issue agenda, bottom. Global human rights actors' issues, ranked according to the estimated number of Google mentions on a set of global human rights actors' websites, 31 March 2009.

Page 41: Dmi12   workshops - crawling and scraping

Lippmannian device.

Partisanship check. Which side of the controversy is an actor on?

Use the source cloud

Page 42: Dmi12   workshops - crawling and scraping

Lippmannian device.

1. Check an organization’s issue agenda. What are its current commitments?

2. Check a national or global movement’s issue agenda. What are its current commitments?

Use the issue cloud

Page 43: Dmi12   workshops - crawling and scraping

Questions.

Page 44: Dmi12   workshops - crawling and scraping

Exercise: Sourcing Climate Change Skeptics.

Page 45: Dmi12   workshops - crawling and scraping


Climate Change Sceptics on the Web (Frederick Seitz)

Research Question: To what extent are climate change 'skeptics' present in the climate change spaces on the Web?
Findings: There is distance between the skeptics and the top of the search engine returns.

Source: google.com
Query: “Frederick Seitz”
Method: Search for query “Frederick Seitz” in top 100. Organized in order.
Tools: Google Scraper and Tag Cloud Generator
Date: 30 July 2007

Product of the Digital Methods Initiative, dmi.mediastudies.nl. Analysis by Bram Nijhof, Richard Rogers and Laura van der Vlies. Design: Anne Helmond.

CC BY-NC-SA

campaigncc.org (1)

climateark.org (4)

marshall.org (8)

realclimate.org (35)

sourcewatch.org (21)

abc.net.au (0)

acfonline.org.au (0)

bbc.co.uk (0)

bom.gov.au (0)

cbc.ca (0)

ciel.org (0)

climatechallenge.gov.uk (0)

climatechange.ca.gov (0)

climatechange.com.au (0)

climatechangecentral.com (0)

climatechangecollege.org (0)

climatecrisis.net (0)

climatescience.gov (0)

dar.csiro.au (0)

davidsuzuki.org (0)

defra.gov.uk (0)

dfat.gov.au (0)

ec.gc.ca (0)

ecn.ac.uk (0)

ecokids.ca (0)

ecy.wa.gov (0)

eea.europa.eu (0)

eldis.org (0)

energy.gov (0)

envirolink.org (0)

epa.gov (0)

exploratorium.edu (0)

faqs.org (0)

foe.co.uk (0)

ft.com (0)

g8.gov.uk (0)

gcrio.org (0)

greenpeace.org (0)

grida.no (0)

guardian.co.uk (0)

iea.org (0)

iisd.org (0)

ipcc.ch (0)

iucn.org (0)

ltscotland.org.uk (0)

metoffice.gov.uk (0)

mfe.govt.nz (0)

mofa.go.jp (0)

nature.com (0)

nature.org (0)

ncdc.noaa.gov (0)

open2.net (0)

panda.org (0)

pewclimate.org (0)

royalsoc.ac.uk (0)

scidev.net (0)

scienceagogo.com (0)

state.gov (0)

theglobeandmail.com (0)

ucar.edu (0)

un.org (0)

unep.org (0)

who.int (0)

whoi.edu (0)

worldwildlife.org (0)

CLIMATE CHANGE SCEPTICS

Page 46: Dmi12   workshops - crawling and scraping

Research Question: Which climate change issue actors mention the skeptics, and what kinds of actors are more likely to mention them?

Method: Comparative Query: skeptics in three source sets (‘top’ sources, climate change blogs and climate change science network), outputting source cloud for each.
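
One simple way to compare the three source sets is the share of sources in each set that mention a skeptic at least once. A minimal sketch of that comparison; the hosts and counts in `SOURCE_SETS` are placeholders and NOT real findings, and `mention_share` is a hypothetical measure rather than one prescribed by the exercise.

```python
def mention_share(mentions):
    """Fraction of sources in a set with at least one mention."""
    if not mentions:
        return 0.0
    mentioning = sum(1 for count in mentions.values() if count > 0)
    return mentioning / len(mentions)


def compare_source_sets(source_sets):
    """Compute the mention share for every named source set."""
    return {name: mention_share(m) for name, m in source_sets.items()}


# Placeholder hosts and counts, NOT real findings: source -> mentions.
SOURCE_SETS = {
    "top sources": {"a.example": 0, "b.example": 0, "c.example": 1, "d.example": 0},
    "blogs": {"e.example": 5, "f.example": 2, "g.example": 0, "h.example": 3},
    "science network": {"i.example": 0, "j.example": 4},
}

if __name__ == "__main__":
    for name, share in compare_source_sets(SOURCE_SETS).items():
        print(f"{name}: {share:.0%} of sources mention the skeptic")
```

The per-set counts would still be drawn as one source cloud each; the share is only a compact way to put the three clouds side by side.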

Page 47: Dmi12   workshops - crawling and scraping

Source Sets:

(1) Top ten Google returns for “climate change” (mix of media as well as governmental organizations)

Page 48: Dmi12   workshops - crawling and scraping

Source Sets:

(2) Climate change blogs network (IssueCrawler results - mix of blogs, social media, traditional media and governmental and non-governmental organizations)

Page 49: Dmi12   workshops - crawling and scraping

Source Sets:

(3) Climate change science network (IssueCrawler results - governmental, non-governmental, educational and media organizations)