An introduction to ScraperWiki (for non-developers)
Maurizio Napolitano
Summer School "Data journalism e visualizzazione grafica dei dati", 29 July 2011 – Flavon (TN)
Description in the name
source: http://www.modot.org/central/major_projects/July2006photos.htm
source: http://www.commoncraft.com/video/wikis
SCRAPER
WIKI
Wiki like Wikipedia
Scraper like ???
A scraper extracts data from content
Legal aspects
Scraper sites may violate copyright law. Even taking content from an open content site can be a copyright violation, if done in a way which does not respect the license. For instance, the GNU Free Documentation License (GFDL) and Creative Commons ShareAlike (CC-BY-SA) licenses require that a republisher inform readers of the license conditions, and give credit to the original author.
http://en.wikipedia.org/wiki/Scraper_site
... then ScraperWiki is ...
https://scraperwiki.com/
A place to share scrapers … and data :)
ScraperWiki legal aspects
https://scraperwiki.com/terms_and_conditions/
Use
6. You agree that, in using the ScraperWiki site and services, you will not interfere with the legal rights
[...]
Intellectual Property
9. Subject to the following paragraphs, the source code of the ScraperWiki site, and all other copyrightable materials that form a part of it is released under the GNU Affero General Public License.
10. All scraping code hosted on the site is licensed under the GNU General Public License. You hereby license all scraping code you create using ScraperWiki under the same licence.
11. You agree to assert no additional intellectual property rights, including copyright and database right, in any scraped data other than those which subsisted in the relevant web sites before the running of the relevant scraper and which were held by you at that time.
12. You grant us a non-exclusive, worldwide, licence to use any data that you store on our site, for the purposes of administering the site.
ScraperWiki legal aspects
USE
6. You agree [..] you will not interfere with the legal rights [...]
INTELLECTUAL PROPERTY
9. […] the source code of the ScraperWiki [..] is released under the GNU Affero General Public License.
10. All scraping code […] is licensed under the GNU General Public License.
11. You agree to assert no additional intellectual property rights [...]
12. You grant us a non-exclusive, worldwide, licence to use any data that you store on our site, for the purposes of administering the site.
HOW TO CREATE A SCRAPER?
The non-developers
The technical approach
http://unstats.un.org/unsd/demographic/products/socind/education.htm
Behind the page
HTML code
Where are the data?
There is a structure behind!!!
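To make the "structure behind" a page visible, here is a minimal sketch that prints the nesting of tags in a small piece of HTML. The sample page and the names `SAMPLE_HTML`, `OutlinePrinter` and `outline` are made up for illustration and are not taken from the UN page; the sketch uses only Python's standard `html.parser` module and assumes well-formed HTML without self-closing tags such as `<br>`.

```python
from html.parser import HTMLParser

# Hypothetical sample page, invented for this illustration.
SAMPLE_HTML = """
<table>
  <tr><th>Country</th><th>Year</th></tr>
  <tr><td>Italy</td><td>2011</td></tr>
</table>
"""


class OutlinePrinter(HTMLParser):
    """Collects one indented line per opening tag or text fragment."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # how deeply nested the current tag is
        self.lines = []   # the outline, one entry per line

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * self.depth + "<" + tag + ">")
        self.depth += 1

    def handle_endtag(self, tag):
        # Assumes every opening tag has a matching closing tag.
        self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace between tags
            self.lines.append("  " * self.depth + text)


def outline(html):
    """Return the tag outline of an HTML string as a list of lines."""
    printer = OutlinePrinter()
    printer.feed(html)
    printer.close()
    return printer.lines


if __name__ == "__main__":
    print("\n".join(outline(SAMPLE_HTML)))
```

The indentation shows that every value sits inside a cell, every cell inside a row, and every row inside the table. A scraper relies on exactly this nesting to find the data.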
The algorithm!!!
Download the web page
Read the information
Find the right position
Extract the data
Create a CSV file
data1;data2;data3
[...]
dataN1;dataN2;dataN3
Example: Python code
https://scraperwiki.com/docs/python/python_intro_tutorial/
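The steps of the algorithm (download, read, find, extract, write CSV) can be sketched roughly as follows. This is not the code from the ScraperWiki tutorial and it does not use the ScraperWiki library: it is an illustration built only on Python's standard library. The names `TableRows`, `download`, `extract_rows` and `to_csv` and the sample HTML are assumptions, and the sketch assumes a page with simple tables that are not nested.

```python
import csv
import io
import urllib.request
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows
        self.row = None   # cells of the row being read, or None
        self.cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row = []
        elif tag in ("td", "th") and self.row is not None:
            self.cell = []

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.cell is not None:
            # Join the text pieces and collapse runs of whitespace.
            self.row.append(" ".join("".join(self.cell).split()))
            self.cell = None
        elif tag == "tr" and self.row is not None:
            if self.row:
                self.rows.append(self.row)
            self.row = None
            self.cell = None


def download(url):
    """Step 1: download the web page and return its HTML as text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_rows(html):
    """Steps 2-4: read the HTML, find the table rows, extract the cells."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


def to_csv(rows):
    """Step 5: build CSV text with ';' as separator, as on the slide."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=";", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Invented sample so the sketch runs without network access.
    sample = "<table><tr><td>data1</td><td>data2</td></tr></table>"
    print(to_csv(extract_rows(sample)), end="")
```

For a real page, `download(url)` would supply the HTML passed to `extract_rows`. Real pages usually need extra work to pick the right table and clean the values.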
… and everything runs in the cloud!!!
The code in the cloud
https://scraperwiki.com/scrapers/mlb_rosters/
Sharing & ReUse
Enjoy!!!
httpS://scraperwiki.com/
Thanks!
"An introduction to ScraperWiki for non-developers" by Maurizio Napolitano
is licensed under a Creative Commons Attribution 3.0 Unported License.
Created for the Summer School "Data journalism e visualizzazione grafica dei dati", 29 July 2011 – Flavon (TN)