Scraping express
The art of retrieving data
Serafín Vélez Barrera [email protected] – @seravb
22 February 2013

Introduction
What is scraping?
Scraping is, basically, a technique used to retrieve data from a website or a document.
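
To make the idea concrete, here is a minimal sketch that uses only the Python standard library to pull the links out of a piece of HTML. The sample markup, the paths in it and the names LinkExtractor and extract_links are made up for illustration; a real scraper would download the page first and would usually rely on a framework.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the absolute URL of every <a href="..."> tag it sees."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))


# Hypothetical page content, used instead of a real download.
SAMPLE_HTML = """
<html><body>
  <a href="/estudios">Estudios</a>
  <a href="http://www.ugr.es/investigacion">Investigacion</a>
  <a name="no-href">ignored</a>
</body></html>
"""


def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


if __name__ == "__main__":
    for link in extract_links(SAMPLE_HTML, "http://www.ugr.es"):
        print(link)
```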

How is it done?
There are several methods, for example:
For a website: a framework such as Scrapy or FastCrawl.
For tables in a PDF: a web tool such as Tabula.

Scrapy
Installing Scrapy
Scrapy can be installed in several ways:
Download it from the official Scrapy website.
From the command line:
    easy_install -U Scrapy
    pip install Scrapy
From the software center.

Getting to know Scrapy
Working with Scrapy means creating a project, and each project is made up of:
Items: define the elements to extract.
Spiders: the heart of the project, where the extraction procedure is defined.
Pipelines: the components that process what was obtained: data validation, cleaning up HTML code...
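
As an illustration of the pipeline role, the sketch below shows what a small cleaning and validation step might look like. In Scrapy a pipeline is a plain class with a process_item(item, spider) method; the class name CleanTextPipeline, the choice of fields and the tag-stripping regular expression are assumptions made for this example, and the fallback DropItem class only exists so the sketch runs without Scrapy installed.

```python
import re

try:
    # Real Scrapy exception used to discard an item.
    from scrapy.exceptions import DropItem
except ImportError:
    class DropItem(Exception):
        """Stand-in so the sketch runs without Scrapy installed."""


# Rough tag matcher; good enough for a demo, not a full HTML parser.
TAG_RE = re.compile(r"<[^>]+>")


class CleanTextPipeline:
    """Hypothetical pipeline: strips HTML tags and rejects items with no title."""

    def process_item(self, item, spider):
        for key in ("title", "desc"):
            value = item.get(key) or ""
            # Replace tags with spaces, then collapse runs of whitespace.
            item[key] = " ".join(TAG_RE.sub(" ", value).split())
        if not item["title"]:
            raise DropItem("missing title")
        return item


if __name__ == "__main__":
    pipeline = CleanTextPipeline()
    raw = {"title": "<b>Open  Data</b> Day", "desc": None, "link": "http://www.ugr.es"}
    print(pipeline.process_item(raw, spider=None))
```

In a real project a class like this would be switched on through the ITEM_PIPELINES setting.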

Scrapy internals

First steps - Creating a project
    scrapy startproject OpenDataDayProject

First steps - Defining the information
    from scrapy.item import Item, Field

    class ODDItem(Item):
        # One Field per piece of data the spider will extract.
        title = Field()
        link = Field()
        desc = Field()
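
As an aside, and to the best of my knowledge, Scrapy releases from 2.2 onwards also accept plain dataclass objects as items. Under that assumption the same three fields could be sketched with the standard library alone; the class name ODDRecord and the sample values are hypothetical.

```python
from dataclasses import dataclass, asdict


@dataclass
class ODDRecord:
    """Hypothetical dataclass with the same three fields as the item."""
    title: str
    link: str
    desc: str = ""  # description is optional in this sketch


if __name__ == "__main__":
    record = ODDRecord(title="Universidad de Granada", link="http://www.ugr.es")
    # asdict() gives the plain dictionary form, handy for exporting.
    print(asdict(record))
```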
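
First steps - Writing the Spiders
    from urllib.parse import urlparse

    import scrapy

    # scrapy.Spider was called BaseSpider in the Scrapy releases of 2013.
    class ODDSpider(scrapy.Spider):
        name = "odd"
        allowed_domains = ["ugr.es"]
        start_urls = [
            "http://www.ugr.es"
        ]

        def parse(self, response):
            # Name the file after the host: split("/")[-2] would be empty
            # for a bare host URL such as http://www.ugr.es.
            filename = urlparse(response.url).netloc + ".html"
            # Save the raw page body to disk.
            with open(filename, "wb") as f:
                f.write(response.body)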
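
Deriving a file name from a URL is easy to get wrong: taking the second-to-last "/"-separated piece gives an empty string for a bare host URL such as http://www.ugr.es. The sketch below shows one more defensive way to do it with the standard library. The helper name filename_for, the "index" default and the naming scheme are assumptions for this example, not part of Scrapy.

```python
from urllib.parse import urlparse


def filename_for(url, default="index"):
    """Build a file name from the host and the last non-empty path segment."""
    parsed = urlparse(url)
    host = parsed.hostname or "unknown"
    # Ignore empty pieces produced by leading or trailing slashes.
    segments = [s for s in parsed.path.split("/") if s]
    stem = segments[-1] if segments else default
    return f"{host}_{stem}.html"


if __name__ == "__main__":
    # The naive approach yields an empty name for a bare host URL.
    print(repr("http://www.ugr.es".split("/")[-2]))
    print(filename_for("http://www.ugr.es"))
    print(filename_for("http://www.ugr.es/estudios/"))
```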
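
First steps - Running the project
The crawl command takes the name of the spider ("odd"), not the name of the project, and is run from inside the project directory:
    scrapy crawl odd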
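
First steps - Saving the results
    scrapy crawl odd -o info.json
Recent Scrapy releases infer the output format from the file extension; the releases of 2013 took it from an explicit -t json option.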
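
Once a JSON export exists, it can be read back with the standard library. The sketch below assumes the spider yields items with the title, link and desc fields and that the export is a JSON array of objects; the function name load_items and the sample record are made up, and a temporary file stands in for a real info.json.

```python
import json
import tempfile
from pathlib import Path


def load_items(path):
    """Return the list of item dictionaries stored in a JSON export file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    # Made-up record standing in for real crawl output.
    sample = [{"title": "Example", "link": "http://www.ugr.es", "desc": "placeholder"}]
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "info.json"
        path.write_text(json.dumps(sample), encoding="utf-8")
        for item in load_items(path):
            print(item["title"], item["link"])
```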

Conclusions
1. Think carefully about what you want to find or do (and think about the legal aspects too).
2. Find a framework to work with, or write your own script or program to extract the data.
3. Extract the data.
4. Process it.
5. Store it if you need to.

Bibliography
Official Scrapy website
Scrapy at a glance
Scrapy tutorial
Example on GitHub
Tabula

License
Scraping express - The art of retrieving data by Serafín Vélez Barrera is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.