Webscraping for journalists

Webscraping for journalists CAJ May 13, 2011 “A little Wget magic”

description

From a presentation I gave at the Canadian Association of Journalists on how journalists can learn to web scrape. Most of the presentation consisted of real-time demos, which are not included in this PPT deck.

Transcript of Webscraping for journalists

Page 1: Webscraping for journalists

Webscraping for journalists

CAJ May 13, 2011

“A little Wget magic”

Page 2: Webscraping for journalists

Webscraping

Using software that simulates a web browser to download large quantities of information from a web site.
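
The definition above can be sketched in a few lines of Python. This is only an illustration, not the method demonstrated in the talk (the title suggests the live demos used Wget). The URL pattern, the `USER_AGENT` string, and the function names `page_urls`, `fetch_page`, and `download_all` are all assumptions made for the example. The demonstration at the bottom uses a stand-in fetcher, so running it downloads nothing.

```python
import time
import urllib.request
from typing import Callable, Dict, List

# Hypothetical identifier; a real scraper should name itself honestly.
USER_AGENT = "newsroom-scraper-example/0.1"


def page_urls(base: str, first: int, last: int) -> List[str]:
    """Build URLs for a run of numbered pages (the ?page=N pattern is an assumption)."""
    return [f"{base}?page={n}" for n in range(first, last + 1)]


def fetch_page(url: str) -> bytes:
    """Download one page, sending a User-Agent header the way a browser would."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()


def download_all(
    urls: List[str],
    fetch: Callable[[str], bytes] = fetch_page,
    delay_seconds: float = 1.0,
) -> Dict[str, bytes]:
    """Fetch every URL in turn, pausing between requests to go easy on the server."""
    pages: Dict[str, bytes] = {}
    for index, url in enumerate(urls):
        if index > 0:
            time.sleep(delay_seconds)
        pages[url] = fetch(url)
    return pages


# Offline demonstration: a stand-in fetcher replaces the real download.
urls = page_urls("https://example.org/records", 1, 3)
pages = download_all(urls, fetch=lambda url: b"<html>stub</html>", delay_seconds=0)
print(len(pages))
```

To scrape a real site, call `download_all(urls)` without the `fetch` argument and keep the delay at a second or more.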

Page 3: Webscraping for journalists

Why webscrape?

• Assemble your own copy of online data
• Save time pointing-and-clicking

Page 4: Webscraping for journalists

Why webscrape?

• Data publishers (governments) want you to access data on their terms

Page 5: Webscraping for journalists
Page 6: Webscraping for journalists
Page 7: Webscraping for journalists
Page 8: Webscraping for journalists

Is it legal?

Yes. But.

Do it ethically.

Watch for robots.txt
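
One way to check a site's robots.txt before scraping is Python's standard `urllib.robotparser` module. The sketch below is a minimal illustration under stated assumptions: the sample robots.txt content, the `example.org` URLs, the `newsroom-scraper` user-agent name, and the helper names `build_parser` and `is_allowed` are all made up for the example. It parses the rules from a string so that it runs without a network connection.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Load robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(SAMPLE_ROBOTS_TXT)
agent = "newsroom-scraper"  # hypothetical user-agent name

print(is_allowed(parser, agent, "https://example.org/data/report.csv"))
print(is_allowed(parser, agent, "https://example.org/private/file.csv"))
print(parser.crawl_delay(agent))  # seconds the site asks scrapers to wait
```

Against a live site, `parser.set_url(...)` followed by `parser.read()` would fetch the real file instead of the sample string. Note that robots.txt states the publisher's wishes; honouring it is part of scraping ethically rather than a technical barrier.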