Post on 10-Aug-2015
jpd15, June 2015
Ignacio Elola @ignacio_elola
Web data? Extracting data from the web
who am I?
web data and import.io
example: text analysis with import.io and MonkeyLearn
summary
import.io?
the Web as a data source
What is import.io?
● Machine reading the web
● Point-and-click UI
● Map the data on a web page
● Algorithms will turn it into structured data
● Real-time through an API
What is import.io? (continued)
● Custom Crawlers
● Auto extraction
● Authenticated APIs
● Cloud scaling
● Wide range of integration options
Structure the web
import.io consists of 4 tools
● Magic
● Extractor
● Crawler
● Connector
and completely free...
import.io Magic
Sometimes we need to train the tool ourselves
import.io Extractor
The import.io Extractor lets you structure a single page of data
● Custom XPaths
● Custom Regex
● Updatable in real-time
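To illustrate what a custom XPath combined with a custom regex achieves, here is a minimal sketch using only the Python standard library. It is not how import.io works internally: the page fragment, the class names, and the `extract_rows` helper are all hypothetical, and `xml.etree.ElementTree` supports only a limited XPath subset on well-formed markup.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment standing in for a real web page.
PAGE = """
<html><body>
  <div class="item"><h2>Blue kettle</h2><span class="price">Price: 24.90 EUR</span></div>
  <div class="item"><h2>Red toaster</h2><span class="price">Price: 31.50 EUR</span></div>
</body></html>
"""

# The regex isolates the number inside the free-text price field.
PRICE_RE = re.compile(r"(\d+\.\d+)")


def extract_rows(page):
    """Turn one page into a list of structured rows (hypothetical helper)."""
    root = ET.fromstring(page.strip())
    rows = []
    # The XPath selects every repeated "item" block on the page.
    for item in root.findall(".//div[@class='item']"):
        name = item.findtext("h2")
        raw_price = item.findtext("span[@class='price']")
        match = PRICE_RE.search(raw_price)
        rows.append({"name": name, "price": float(match.group(1))})
    return rows


rows = extract_rows(PAGE)
for row in rows:
    print(row)
```

The XPath decides which elements become rows and columns; the regex cleans each cell's text into a usable value.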
Sometimes we need to extract data from a lot of URLs
import.io Crawler
● import.io Crawler
● import.io Extractor (bulk queries)
Sometimes we need to extract data from a lot of URLs we don’t know
import.io Crawler
The import.io Crawler relies on minimum input and gives you maximum output
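The general idea of a crawler, a single start URL as input and every reachable page as output, can be sketched as a breadth-first traversal. This is an assumption-laden illustration, not import.io's Crawler: the `SITE` dictionary is a hypothetical in-memory site, and `crawl` and `get_links` are names invented for this example.

```python
from collections import deque

# Hypothetical site: maps each URL to the links found on that page.
SITE = {
    "/": ["/news", "/about"],
    "/news": ["/news/1", "/news/2", "/"],
    "/news/1": ["/news"],
    "/news/2": ["/news", "/news/1"],
    "/about": ["/"],
}


def crawl(start, get_links, max_pages=100):
    """Visit pages breadth-first from `start`, following links once each."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for link in get_links(url):
            if link not in seen:  # never queue the same URL twice
                seen.add(link)
                queue.append(link)
    return order


pages = crawl("/", lambda url: SITE.get(url, []))
print(pages)
```

In a real crawl, `get_links` would fetch each page and parse its links, and each visited URL would then be passed to an extractor.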
Sometimes we need to interact with the website
The import.io Connector uses page interactions, such as searches, and extracts the resulting data.
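As a rough sketch of the "interact, then extract" pattern, the snippet below turns a search term into a query URL and collects the matching results. Everything here is hypothetical: `example.com`, the `q` parameter, the `ARTICLES` list, and the `fake_site` stand-in are assumptions for illustration, and a real Connector records interactions in the browser rather than building URLs by hand.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical article titles served by the stand-in website.
ARTICLES = [
    "Web data for everyone",
    "Scraping newspapers",
    "Text analysis of web data",
]


def build_search_url(base, query):
    """Encode the user's search term as a query string."""
    return base + "?" + urlencode({"q": query})


def fake_site(url):
    """Stand-in for the website: returns titles matching the `q` parameter."""
    query = parse_qs(urlparse(url).query)["q"][0].lower()
    return [title for title in ARTICLES if query in title.lower()]


def connector(query):
    """Run the search interaction and structure the results as rows."""
    url = build_search_url("https://example.com/search", query)
    return [{"query": query, "title": title} for title in fake_site(url)]


results = connector("web data")
for row in results:
    print(row)
```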
Example: analyzing newspapers with import.io and MonkeyLearn
https://github.com/ignacioelola/web-text-analyzer
Thanks!
Q & A