Data Journalism (City Online Journalism wk8)

Online Journalism City University Paul Bradshaw Data journalism

description

Week 8 lecture to students on the 8 MAs at City University

Transcript of Data Journalism (City Online Journalism wk8)

Page 1: Data Journalism (City Online Journalism wk8)

Online Journalism
City University
Paul Bradshaw

Data journalism

Page 3: Data Journalism (City Online Journalism wk8)

1. What is it?
2. Where to get it
3. How to get it

Themes

Page 10: Data Journalism (City Online Journalism wk8)

“Each weekday, my computer program goes to the Chicago Police Department's website and gathers all crimes reported in Chicago.”

Adrian Holovaty

Page 13: Data Journalism (City Online Journalism wk8)

Times film genres

Page 14: Data Journalism (City Online Journalism wk8)

• Times Data Blog

Page 17: Data Journalism (City Online Journalism wk8)

”QUOTE”

Now is a good time.

Page 18: Data Journalism (City Online Journalism wk8)

“The Tribune’s more than three dozen interactive databases, collectively have drawn three times as many page views as the site’s stories. [75% of traffic]”

http://bit.ly/dj2dmz

Page 19: Data Journalism (City Online Journalism wk8)


What is data?

Page 20: Data Journalism (City Online Journalism wk8)

• Numbers
• Text
• Live data
• Behavioural data
• Images, audio, video

Anything that a computer can work with

Page 22: Data Journalism (City Online Journalism wk8)

• Start with the data and look for the stories? (MPs’ expenses)
• Or start with a lead and look for the data?

Passive vs active data journalism

Page 23: Data Journalism (City Online Journalism wk8)

Data Journalism Continuum

Page 24: Data Journalism (City Online Journalism wk8)

• Data.gov.uk
• Guardian datastore
• Openlylocal, Open Corporates, Open Charities, Who's Lobbying etc.
• FOI requests (WDTK), disclosure logs
• Books - British Political Facts

Finding

Page 25: Data Journalism (City Online Journalism wk8)

• GetTheData.org
• WDMMG forums
• MySociety mailing lists
• Open Data Cookbook
• Wolfram Alpha forum

Finding – data communities

Page 27: Data Journalism (City Online Journalism wk8)

• Government - national and local
• 'Monitors' - regulators & other bodies
• Charities, pressure groups
• Institutions - academic, scientific, health
• Business, finance
• Media, entertainment, sport

Other secondary sources

Page 28: Data Journalism (City Online Journalism wk8)

• site:gov.uk (etc.)
• filetype:pdf (etc.)
• Imagine the page you hope to find, including jargon etc.
• Database contents are invisible to search engines
• Google News alerts: report OR review

 Advanced search

Page 29: Data Journalism (City Online Journalism wk8)

• "quotes search for exact phrases"
• "disclosure logs" site:nhs.uk
• + ensures page contains word: +logs
• - omits results with word: -wooden
• * wildcard, e.g. "deaths * custody"
• ~ synonyms, e.g. ~deaths

 Advanced search
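The operators on the two advanced search slides can be combined mechanically. The following is a minimal Python sketch, not part of the lecture: the helper name `build_query` and its parameters are hypothetical, and the Google search URL is used only to show how a finished query would be encoded.

```python
from urllib.parse import urlencode

def build_query(phrase=None, site=None, filetype=None, exclude=(), any_of=()):
    """Assemble a search query from the operators listed on the slides."""
    parts = []
    if phrase:
        parts.append(f'"{phrase}"')            # exact phrase in quotes
    if any_of:
        parts.append(" OR ".join(any_of))      # e.g. report OR review
    if site:
        parts.append(f"site:{site}")           # restrict to a domain
    if filetype:
        parts.append(f"filetype:{filetype}")   # restrict to a file type
    parts.extend(f"-{word}" for word in exclude)  # omit results with a word
    return " ".join(parts)

# Exact phrase, limited to NHS sites and PDFs, omitting one word
query = build_query(phrase="disclosure logs", site="nhs.uk",
                    filetype="pdf", exclude=["wooden"])
print(query)

# The same query encoded into a search URL (illustrative only)
search_url = "https://www.google.com/search?" + urlencode({"q": query})
print(search_url)

# An alert-style query: either word, restricted to government sites
alert_query = build_query(any_of=["report", "review"], site="gov.uk")
print(alert_query)
```

Building queries this way makes it easy to run the same search across several domains or file types and compare the results.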

Page 31: Data Journalism (City Online Journalism wk8)

Tip: use overseas sources

• US medicine databases
• EU subsidy databases
• Swedish people data
• International police agency correspondence with UK

Page 32: Data Journalism (City Online Journalism wk8)

• RSS, XML, JSON, RDF - and APIs
• Scraperwiki
• Outwit Hub
• Yahoo! Pipes
• Spreadsheet formulae
(look them up)

Feeds and scrapers
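To show why feed formats such as RSS and JSON count as easy sources, here is a short Python sketch using only the standard library. The two samples are invented for illustration (the titles, links, names and figures are not real data); a real feed would be downloaded first.

```python
import json
import xml.etree.ElementTree as ET

# Invented RSS sample standing in for a real feed
RSS_SAMPLE = """<rss version="2.0">
  <channel>
    <title>Example disclosure log</title>
    <item><title>Request 001</title><link>http://example.org/foi/1</link></item>
    <item><title>Request 002</title><link>http://example.org/foi/2</link></item>
  </channel>
</rss>"""

# Invented JSON sample standing in for an API response
JSON_SAMPLE = ('{"results": [{"name": "Council A", "spend": 1200}, '
               '{"name": "Council B", "spend": 950}]}')

def rss_items(xml_text):
    """Return (title, link) pairs for every <item> in an RSS document."""
    root = ET.fromstring(xml_text)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]

items = rss_items(RSS_SAMPLE)
for title, link in items:
    print(title, link)

# JSON arrives as nested lists and dictionaries, ready to total or sort
rows = json.loads(JSON_SAMPLE)["results"]
total = sum(row["spend"] for row in rows)
print(total)
```

In both cases the structure is already declared by the format, so no pattern matching on the page layout is needed.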

Page 33: Data Journalism (City Online Journalism wk8)

Format? Table? Pattern? URL?

'Structured' data

Page 34: Data Journalism (City Online Journalism wk8)

http://www.eib.org/projects/pipeline/?start=2009&end=2010&status=&region=&country=united+kingdom&sector=
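The URL above is itself structured: each filter on the page appears as a name=value pair in the query string. A minimal Python sketch of taking it apart and rebuilding it follows. The helper `with_params` is hypothetical, and the replacement values (2008, 2009, "france") are assumptions chosen for illustration; whether the site accepts them is not confirmed here.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qs, urlencode

URL = ("http://www.eib.org/projects/pipeline/?start=2009&end=2010"
       "&status=&region=&country=united+kingdom&sector=")

# Split the query string into its named parameters (blank ones included)
params = parse_qs(urlsplit(URL).query, keep_blank_values=True)
for name, values in params.items():
    print(name, "=", values[0])

def with_params(url, **changes):
    """Return the URL with some query parameters replaced."""
    parts = urlsplit(url)
    query = parse_qs(parts.query, keep_blank_values=True)
    for name, value in changes.items():
        query[name] = [str(value)]
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))

# Hypothetical variation: a different date range and country
new_url = with_params(URL, start=2008, end=2009, country="france")
print(new_url)
```

Once the pattern is understood, a loop over years or countries can generate every page of results without visiting the search form.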

Page 35: Data Journalism (City Online Journalism wk8)

'Structured' HTML? (Use Firebug)

<p>      <strong>Case Ref: FS50295557 <br />Date: 04/11/2010 <br />Public Authority: London Borough of Southwark <br />Summary: </strong>The complainant requested a copy of the authorities approved business plan  [...]<br /><strong>Section of Act/EIR &amp; Finding: </strong>FOI 1 - Complaint Upheld , FOI 10 - Complaint Upheld <br /><a title="Opens in new window" href="~/media/documents/decisionnotices/2010/fs_50295557.ashx" target="_blank">View PDF of Decision Notice FS50295557</a></p>
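The snippet above repeats a "Label: value" pattern separated by line breaks, which is what makes it scrapable. As a rough sketch, assuming every record on the page follows the same layout, Python's standard `html.parser` can turn it into a dictionary. The class name `DecisionParser` and the variable names are hypothetical.

```python
from html.parser import HTMLParser

SNIPPET = '''<p>      <strong>Case Ref: FS50295557 <br />Date: 04/11/2010 <br />Public Authority: London Borough of Southwark <br />Summary: </strong>The complainant requested a copy of the authorities approved business plan  [...]<br /><strong>Section of Act/EIR &amp; Finding: </strong>FOI 1 - Complaint Upheld , FOI 10 - Complaint Upheld <br /><a title="Opens in new window" href="~/media/documents/decisionnotices/2010/fs_50295557.ashx" target="_blank">View PDF of Decision Notice FS50295557</a></p>'''

class DecisionParser(HTMLParser):
    """Collect the text (one line per <br />) and any link targets."""
    def __init__(self):
        super().__init__()
        self.chunks = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.chunks.append("\n")      # treat each <br /> as a line break
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.chunks.append(data)

parser = DecisionParser()
parser.feed(SNIPPET)
parser.close()

# Split each "Label: value" line at the first colon
record = {}
for line in "".join(parser.chunks).split("\n"):
    if ":" in line:
        label, value = line.split(":", 1)
        record[label.strip()] = " ".join(value.split())

links = parser.links
print(record)
print(links)
```

The link to the PDF is kept separately because it sits in an attribute rather than in the visible text.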

Page 36: Data Journalism (City Online Journalism wk8)

=ImportHTML("http://bob.com/mytable", "table", 1)
=ImportXML("http://backtweets.com/search.xml?itemsperpage=100&...")
=ImportFeed("http://search.twitter.com/search.atom?rpp=20&page=1&q="&A2)

Spreadsheet formulae
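For comparison with the ImportHTML formula, which pulls the numbered table from a page into a spreadsheet, here is a rough Python sketch of the same idea. It is not how the spreadsheet implements the formula; the names `TableParser` and `import_html_table` are hypothetical, the sample table is invented, and nested tables are not handled.

```python
from html.parser import HTMLParser

class TableParser(HTMLParser):
    """Collect every <table> on a page as a list of rows of cell text."""
    def __init__(self):
        super().__init__()
        self.tables = []
        self._cell = None   # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.tables[-1].append([])
        elif tag in ("td", "th") and self.tables and self.tables[-1]:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            text = " ".join("".join(self._cell).split())
            self.tables[-1][-1].append(text)
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

def import_html_table(html, index=1):
    """Return the index-th table (counting from 1, as the formula does)."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.tables[index - 1]

# Invented sample page standing in for a downloaded one
SAMPLE = """
<table>
  <tr><th>Council</th><th>Spend</th></tr>
  <tr><td>Southwark</td><td>1200</td></tr>
  <tr><td>Lambeth</td><td>950</td></tr>
</table>
"""

rows = import_html_table(SAMPLE, 1)
for row in rows:
    print(row)
```

The spreadsheet formula is quicker for a one-off; a script like this is useful when the same table has to be collected from many pages.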

Page 37: Data Journalism (City Online Journalism wk8)

Fetch Page module Regex

Yahoo! Pipes
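The Fetch Page and Regex modules amount to two steps: download the page, then pull out whatever matches a pattern. A minimal Python sketch of those two steps follows. It is an analogy, not a description of how Pipes works internally; `fetch_page` is a hypothetical helper that is defined but not called here, and the second record in the sample text is invented.

```python
import re
import urllib.request

def fetch_page(url):
    """Download a page and return its text (the 'Fetch Page' step)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")

# Sample text standing in for a fetched page; the second record is invented
PAGE = """
<strong>Case Ref: FS50295557 <br />Date: 04/11/2010 <br />Public Authority: London Borough of Southwark <br />
<strong>Case Ref: FS50000001 <br />Date: 05/11/2010 <br />Public Authority: Example Council <br />
"""

# The 'Regex' step: one named group per field we want to keep
PATTERN = re.compile(
    r"Case Ref:\s*(?P<ref>\w+)\s*<br />"
    r"Date:\s*(?P<date>\d{2}/\d{2}/\d{4})\s*<br />"
    r"Public Authority:\s*(?P<authority>[^<]+?)\s*<br />"
)

records = [match.groupdict() for match in PATTERN.finditer(PAGE)]
for rec in records:
    print(rec["ref"], rec["date"], rec["authority"])
```

Regular expressions are brittle when a site changes its layout, so it is worth checking the number of matches against the number of records visible on the page.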

Page 38: Data Journalism (City Online Journalism wk8)

"A problem for sites who want to provide privacy while allowing new users to join easily. Scraping services may constitute a violation of terms of service; tactics often resemble a denial-of-service attack or a security exploit."

Ethics

Page 39: Data Journalism (City Online Journalism wk8)


Questions?

Page 40: Data Journalism (City Online Journalism wk8)

Links

• OnlineJournalismClasses.tumblr.com
• Delicious.com/paulb/cityoj08
• Delicious.com/paulb/datajournalism
• Delicious.com/paulb/visualisation
• Delicious.com/paulb/data

Page 41: Data Journalism (City Online Journalism wk8)

- Use advanced search to find data
- Use tools to scrape data
- Visualise a politician's speeches using Wordle or Many Eyes
- Read up on some of the tools or technologies before the lab

 Lab

Page 42: Data Journalism (City Online Journalism wk8)

Books

• Darrell Huff - How To Lie With Statistics
• Blastland & Dilnot - The Tiger That Isn't
• Dona Wong - The WSJ Guide to Information Graphics
• Brian Suda - A Practical Guide to Designing with Data

Page 43: Data Journalism (City Online Journalism wk8)


Assignments

Page 44: Data Journalism (City Online Journalism wk8)

Enough time?

• 10 credits = 100 hours
• Lectures = 15 hours
• Group blog = 60 hours (75%)
• Strategy = 20 hours (25%)
• (Some in labs)
• + 5 hours on other issues

Page 45: Data Journalism (City Online Journalism wk8)

Enough time? Blog

Just an example:
• 10 posts ranging from simple links to interviews, analysis, experiment
• 5.5 hours ave per week x 10 weeks = 55 hours
• + 5 hours to write evaluation

Page 46: Data Journalism (City Online Journalism wk8)

Enough time? Strategy

Just an example:
• 12.5 hours researching community
• 30 mins per week x 10 weeks with community (2.5 hours)
• 5 hours analysis & write up

Page 47: Data Journalism (City Online Journalism wk8)

Group blogs

8 areas:
1. Online video
2. Online audio
3. Data
4. UGC
5. Community management
6. Mobile
7. Social media
8. Infographics and photography

Page 48: Data Journalism (City Online Journalism wk8)

Criteria

Assignment 1:
• Newsgathering/research
• Production
• Law, ethics and strategy
Assignment 2:
• Research
• Analysis
• Execution