Automatic data collection on the Internet (web …... Wir bewegen Informationen Automatic price...

10
www.statistik.at Wir bewegen Informationen Automatic price collection on the internet (web scraping) Ingolf Boettcher Tokio 20. May 2015 Ottawa Group 2015 – Topic 1 Alternate data sources and Index number formulas Session 2 Online Prices and Web Scraping

Transcript of Automatic data collection on the Internet (web …... Wir bewegen Informationen Automatic price...

Page 1: Automatic data collection on the Internet (web …... Wir bewegen Informationen Automatic price collection on the internet (web scraping) Ingolf Boettcher Tokio 20. May 2015 Ottawa

www.statistik.at Wir bewegen Informationen

Automatic price collection on the internet (web scraping) 

Ingolf Boettcher

Tokio20. May 2015 Ottawa Group 2015 –

Topic 1 Alternate data sources and Index number formulasSession 2 Online Prices and Web Scraping

Page 2: Automatic data collection on the Internet (web …... Wir bewegen Informationen Automatic price collection on the internet (web scraping) Ingolf Boettcher Tokio 20. May 2015 Ottawa

www.statistik.at Folie 2 | 26.05.2015

Web scraping

There is a hugeamount of data on the internet <HTML>

<HEAD><TITLE> DATA </Title></HEAD> </HTML>

How can we bestcollect/scrape/harvestdata from there forstatistical purposes?

Page 3: Automatic data collection on the Internet (web …... Wir bewegen Informationen Automatic price collection on the internet (web scraping) Ingolf Boettcher Tokio 20. May 2015 Ottawa

www.statistik.at Folie 3 | 26.05.2015

Web scraping

Internet data collection –Minimum goal for (Price) Statistics:

Turn website content into a spreadsheet

Page 4: Automatic data collection on the Internet (web …... Wir bewegen Informationen Automatic price collection on the internet (web scraping) Ingolf Boettcher Tokio 20. May 2015 Ottawa

www.statistik.at Folie 4 | 26.05.2015

Web scraping

Internet data collection

Options:

1. Manual price collection2. Develop an API /Web scraper2.1 by writing custom computer code2.2 by using point and click web tools

Page 5: Automatic data collection on the Internet (web …... Wir bewegen Informationen Automatic price collection on the internet (web scraping) Ingolf Boettcher Tokio 20. May 2015 Ottawa

www.statistik.at Folie 5 | 26.05.2015

Web scraping

Reasons for not writing an ownweb scraper

IT‐developer needed, therefore:• Expensive • Inflexible  • Even maintenance cannot be

handled by CPI staff

Page 6: Automatic data collection on the Internet (web …... Wir bewegen Informationen Automatic price collection on the internet (web scraping) Ingolf Boettcher Tokio 20. May 2015 Ottawa

www.statistik.at Folie 6 | 26.05.2015

Web scraping

Reasons to use click and point webtools forweb scraping:

No IT‐developer needed, therefore:

• Cheap• Flexible• No programming skill required

Page 7: Automatic data collection on the Internet (web …... Wir bewegen Informationen Automatic price collection on the internet (web scraping) Ingolf Boettcher Tokio 20. May 2015 Ottawa

www.statistik.at Folie 7 | 26.05.2015

Web scraping

How web scraping with click and point usingimport.io looks like:

• web-platform that allows to structure andextract data from websites

Page 8: Automatic data collection on the Internet (web …... Wir bewegen Informationen Automatic price collection on the internet (web scraping) Ingolf Boettcher Tokio 20. May 2015 Ottawa

www.statistik.at Folie 8 | 26.05.2015

Webscraping

Web scraping with click and point on web‐based platform offers solutions to:

• extract data by point-and-click• record actions on a website • crawl all the data of a webpage

More issues to be considered:• Legality to crawl on websites• Internal IT Security • Training of staff

Page 9: Automatic data collection on the Internet (web …... Wir bewegen Informationen Automatic price collection on the internet (web scraping) Ingolf Boettcher Tokio 20. May 2015 Ottawa

www.statistik.at Folie 9 | 26.05.2015

Contact:Ingolf Boettcher

Guglgasse 13, 1110 WienTel: +43 (1) 71128-7917

Fax: +43 (1) [email protected]

Automatic price collection on the internet (web scraping)

Page 10: Automatic data collection on the Internet (web …... Wir bewegen Informationen Automatic price collection on the internet (web scraping) Ingolf Boettcher Tokio 20. May 2015 Ottawa

www.statistik.at Folie 10 | 26.05.2015

Webscraping with import.io