An introduction to ScraperWiki (for non-developers)

An introduction to ScraperWiki for non-developers. Maurizio Napolitano, Summer School "Data journalism e visualizzazione grafica dei dati" (Data journalism and graphical data visualization), 29 July 2011, Flavon (TN)

Description

A simple introduction to ScraperWiki, aimed at non-developers.

Transcript of An introduction to ScraperWiki (for non-developers)

Page 1

An introduction to ScraperWiki (for non-developers)

Maurizio Napolitano

Summer School "Data journalism e visualizzazione grafica dei dati", 29 July 2011, Flavon (TN)

Page 2

The description is in the name

Source: http://www.modot.org/central/major_projects/July2006photos.htm

Source: http://www.commoncraft.com/video/wikis

SCRAPER

WIKI

Page 3

Wiki, as in Wikipedia. Scraper, as in ???

A scraper extracts data from content.
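Here is a minimal Python sketch that makes the definition concrete. It assumes the content is a short piece of text already held in memory. The sentence, the fictional place names and the pattern are all invented for illustration. The scraper describes where the data sit, here with a regular expression, and turns each match into a structured record.

```python
import re

# Hypothetical content: text written for people, not for machines.
content = ("Atlantis: 1.5 million inhabitants. "
           "Utopia: 2.25 million inhabitants.")

# Describe where the data sit: a word, a colon, a number, then "million".
pattern = re.compile(r"(\w+): ([\d.]+) million")

# Turn every match into a (name, number) record.
records = [(name, float(value)) for name, value in pattern.findall(content)]
print(records)  # [('Atlantis', 1.5), ('Utopia', 2.25)]
```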

Page 4

Legal aspects

Scraper sites may violate copyright law. Even taking content from an open content site can be a copyright violation, if done in a way which does not respect the license. For instance, the GNU Free Documentation License (GFDL) and Creative Commons ShareAlike (CC-BY-SA) licenses require that a republisher inform readers of the license conditions, and give credit to the original author.

http://en.wikipedia.org/wiki/Scraper_site

Page 5

... so ScraperWiki is ...

https://scraperwiki.com/

A place to share scrapers … and data :)

Page 6

ScraperWiki: legal aspects

https://scraperwiki.com/terms_and_conditions/

Use

6. You agree that, in using the ScraperWiki site and services, you will not interfere with the legal rights

[...]

Intellectual Property

9. Subject to the following paragraphs, the source code of the ScraperWiki site, and all other copyrightable materials that form a part of it is released under the GNU Affero General Public License.

10. All scraping code hosted on the site is licensed under the GNU General Public License. You hereby license all scraping code you create using ScraperWiki under the same licence.

11. You agree to assert no additional intellectual property rights, including copyright and database right, in any scraped data other than those which subsisted in the relevant web sites before the running of the relevant scraper and which were held by you at that time.

12. You grant us a non-exclusive, worldwide, licence to use any data that you store on our site, for the purposes of administering the site.

Page 7

ScraperWiki: legal aspects

USE

6. You agree [...] you will not interfere with the legal rights [...]

INTELLECTUAL PROPERTY

9. [...] the source code of the ScraperWiki [...] is released under the GNU Affero General Public License.

10. All scraping code [...] is licensed under the GNU General Public License.

11. You agree to assert no additional intellectual property rights [...]

12. You grant us a non-exclusive, worldwide, licence to use any data that you store on our site, for the purposes of administering the site.

Page 8

HOW TO CREATE A SCRAPER?

Page 9

The non-developers

Page 10

The technical approach

http://unstats.un.org/unsd/demographic/products/socind/education.htm

Page 11

Behind the page

HTML code

Page 12

Where are the data?

There is a structure behind it!!!
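One way to see that structure is to print a page as an outline of nested tags. The sketch below uses Python's standard `html.parser` module. The HTML fragment, the fictional country names and the `OutlineParser` class are invented for illustration; they are not taken from the UN page.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Record an indented outline of the tags and of the text inside them."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # how many tags are currently open
        self.lines = []   # one outline line per tag or piece of text

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * self.depth + "<" + tag + ">")
        # Note: void tags such as <br> or <img> have no end tag, so a real
        # page would need extra handling to keep the depth correct.
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip the whitespace between tags
            self.lines.append("  " * self.depth + text)


# Hypothetical fragment shaped like a data table (not the real UN markup).
html = ("<table><tr><th>Country</th><th>Value</th></tr>"
        "<tr><td>Atlantis</td><td>42</td></tr></table>")

parser = OutlineParser()
parser.feed(html)
print("\n".join(parser.lines))
```

In the printed outline, every value sits at the same depth, under a `<td>` inside a `<tr>`. That regularity is what a scraper relies on to find the data.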

Page 13

The algorithm!!!

Download the web page

Read the information

Find the right position

Extract the data

Create a CSV file

data1;data2;data3
[...]
dataN1;dataN2;dataN3
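The five steps can be sketched in Python using only the standard library. This is a minimal sketch and not the code from the ScraperWiki tutorial. The sample table, the `CellParser` class and the helper names are hypothetical. The sketch assumes the data sit in an ordinary HTML table whose cells use explicit closing tags.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.request import urlopen


def download(url):
    """Step 1: download the web page (needs network access)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class CellParser(HTMLParser):
    """Steps 2-4: read the HTML, find the table cells, extract their text.

    Assumes every <td>/<th> has an explicit closing tag.
    """

    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell = [], None, None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the text fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # keep only rows that contained cells
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_rows(html):
    parser = CellParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def to_csv(rows):
    """Step 5: create CSV text, using ';' as the separator as on the slide."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=";", lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hypothetical sample page, so the sketch runs without a network connection.
sample = """<table>
<tr><th>Country</th><th>Value</th></tr>
<tr><td>Atlantis</td><td>42</td></tr>
<tr><td>Utopia</td><td>17</td></tr>
</table>"""

print(to_csv(extract_rows(sample)), end="")
```

To run the sketch against a real page, you would call `to_csv(extract_rows(download(url)))` and save the result to a file. On a page with several tables, this would collect every row. In practice, the "find the right position" step usually has to be narrowed to the one table of interest.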

Page 14

Example: Python code

https://scraperwiki.com/docs/python/python_intro_tutorial/

Page 15

… and everything runs in the cloud!!!

Page 16

The code in the cloud

https://scraperwiki.com/scrapers/mlb_rosters/

Page 17

Sharing & reuse

Page 18

Enjoy!!!

https://scraperwiki.com/

Page 19

Thanks!

An introduction to ScraperWiki for non-developers by Maurizio Napolitano is licensed under a Creative Commons Attribution 3.0 Unported License.

Created for Summer School "Data journalism e visualizzazione grafica dei dati", 29 July 2011, Flavon (TN)