Scraping by examples

Alexandre Gomes Scraping by examples Friday, May 20, 2011

description

Learn how to scrape web pages in Ruby, JavaScript (and others, soon).

Transcript of Scraping by examples

Page 1: Scraping by examples

Alexandre Gomes

Scraping by examples

Friday, May 20, 2011

Page 2: Scraping by examples

http://creativecommons.org/licenses/by-nc/3.0/br/

Page 4: Scraping by examples

Summary of the 2010 Census (Resumo do Censo 2010)

Page 5: Scraping by examples

Summary of the 2010 Census (Resumo do Censo 2010)

Page 8: Scraping by examples

What is the relationship between literacy rates and the proportion of women?

Page 9: Scraping by examples

Example

$$\frac{\text{women in the region}}{\text{total people in the region}} = \frac{7.859.539}{7.859.539 + 8.004.915} = 0.49$$

$$\frac{\text{literate people* in the region}}{\text{total people* in the region}} = \frac{11.326.492}{12.670.041} = 0.89$$

* over 10 years of age
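The arithmetic of this example can be checked with a few lines of Ruby. This is only a sketch: the constant and method names are made up for illustration, and reading 8.004.915 as the male population is an assumption based on the title of the source page.

```ruby
# Figures copied from the example slide (thousands separators dropped).
WOMEN        = 7_859_539
MEN          = 8_004_915   # assumption: the second addend is the male population
LITERATE     = 11_326_492  # literate people over 10 years of age
POP_OVER_10  = 12_670_041  # total people over 10 years of age

# Share of `part` in `whole` as a whole-number percentage, rounded down.
# Integer division floors the result for positive operands.
def percent_floor(part, whole)
  (part * 100) / whole
end

puts "women:    #{percent_floor(WOMEN, WOMEN + MEN)}%"
puts "literate: #{percent_floor(LITERATE, POP_OVER_10)}%"
```

The exact women's ratio is about 0.4954, so the 0.49 shown on the slide matches a truncated value rather than a rounded one, which is why the sketch rounds down.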

Page 10: Scraping by examples

And in the other regions?

Page 11: Scraping by examples

Scraping by Examples

Page 12: Scraping by examples

Nokogiri 鋸

Page 13: Scraping by examples

#1 Access the page that contains the desired data

Page 14: Scraping by examples

test

Page 15: Scraping by examples

test

code

Page 16: Scraping by examples

$ rspec spec/ibge_censo2010_spec.rb:8
Run filtered using {:line_number=>8}

IBGECenso2010
  should open page with "Razão de sexo, população de homens e mulheres"

Finished in 44.4 seconds
1 example, 0 failures
$

Page 17: Scraping by examples

#2 Retrieve the desired data

Page 18: Scraping by examples

First, understand the structure of the page

Page 19: Scraping by examples

<table>
  <thead>...</thead>
  <tfoot>
    <tr>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
      <td>...</td>
    </tr>
  </tfoot>
  <tbody>...</tbody>
</table>

Study the path to the data in the DOM tree

Page 20: Scraping by examples

Look for IDs and CSS classes that may be useful.

Page 22: Scraping by examples

class="td_numeros"

Page 26: Scraping by examples

".td_numeros"

[  0  1  2
   3  4  5
   6  7  8
   9 10 11
  12 13 14
  15 16 17 ]

Page 27: Scraping by examples

[  0  1  2
   3  4  5
   6  7  8
   9 10 11
  12 13 14
  15 16 17 ]

1st datum we need (numerator of the formula).

Page 28: Scraping by examples

[  0  1  2
   3  4  5
   6  7  8
   9 10 11
  12 13 14
  15 16 17 ]

2nd datum we need (for computing the denominator of the formula).

Page 29: Scraping by examples

[  0  1  2
   3  4  5
   6  7  8
   9 10 11
  12 13 14
  15 16 17 ]

$$\frac{\text{women in region N}}{\text{total people in region N}} = \frac{\text{dados}[5]}{\text{dados}[4] + \text{dados}[5]}$$
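The index formula above can be sketched in Ruby as follows. Several things here are assumptions, not taken from the slides: the helper names, the column order within a row (sex ratio, men, women), and the pairing of the earlier example figures with indices 4 and 5. Cells whose values the slides do not show are left as nil.

```ruby
# Converts a Brazilian-formatted integer such as "7.859.539" to 7859539.
def parse_count(text)
  text.delete('.').to_i
end

# Women's share for the row whose three cells start at index row * 3.
# Assumed column order within a row: [sex ratio, men, women].
def women_share(dados, row)
  men   = parse_count(dados[row * 3 + 1])
  women = parse_count(dados[row * 3 + 2])
  women.fdiv(men + women)
end

# Only indices 4 and 5 are filled in, using the figures from the example slide.
DADOS = [nil, nil, nil, nil, '8.004.915', '7.859.539'].freeze

puts format('%.4f', women_share(DADOS, 1)) # prints 0.4954
```

With row = 1 the method reads dados[4] and dados[5], which is exactly the expression on the slide.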

Page 30: Scraping by examples

test

Page 31: Scraping by examples

code

Page 32: Scraping by examples

$ rspec spec

IBGECenso2010
  razao de sexo
    should open page with "Razão de sexo, população de homens e mulheres"
    should get number of women

Finished in 1.78 seconds
2 examples, 0 failures

Page 33: Scraping by examples

test

Page 34: Scraping by examples

code

Page 35: Scraping by examples

#3 Retrieve the rest of the desired data

Page 37: Scraping by examples

...
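One way to extend the single-row calculation to every row is to walk the flat cell list three cells at a time. This is a sketch under assumptions, not the code from the talk: the method names are hypothetical, the column order (sex ratio, men, women) is assumed, and the sample values are made-up round numbers rather than census figures.

```ruby
# Converts a Brazilian-formatted integer such as "1.000" to 1000.
def parse_count(text)
  text.delete('.').to_i
end

# Returns the women's share for each row of three cells.
# Assumed column order within a row: [sex ratio, men, women].
def women_shares(dados)
  dados.each_slice(3).map do |_ratio, men, women|
    m = parse_count(men)
    w = parse_count(women)
    w.fdiv(m + w)
  end
end

# Made-up round numbers for illustration only, NOT census data.
SAMPLE = ['-', '1.000', '1.000', '-', '3.000', '1.000'].freeze

p women_shares(SAMPLE) # prints [0.5, 0.25]
```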

Page 38: Scraping by examples

#4 Web presentation of the scraping

Page 39: Scraping by examples

application.rb

(...)

Page 40: Scraping by examples

application.rb

(...)

Page 41: Scraping by examples

index.erb

(...)

Page 42: Scraping by examples

http://datavisualization.ch/tools/13-javascript-libraries-for-visualizations

Page 43: Scraping by examples

The charm of mashups lies in distinctive data visualization

http://datavisualization.ch/tools/13-javascript-libraries-for-visualizations

Page 44: Scraping by examples

#5 Visualization (still crude) of the scraping

Page 46: Scraping by examples

#6 Distinctive visualization of the information

Page 47: Scraping by examples

?

Page 48: Scraping by examples

Now, the same thing, using only JavaScript

Page 49: Scraping by examples

#1 Access the page that contains the desired data

Page 50: Scraping by examples

test

Page 51: Scraping by examples

code

Page 53: Scraping by examples

#2 Retrieve the desired data

Page 54: Scraping by examples

test

Page 55: Scraping by examples

code

Page 56: Scraping by examples

#3 Retrieve the rest of the desired data

Page 57: Scraping by examples

...

Page 58: Scraping by examples

#4 Web presentation of the scraping

Page 59: Scraping by examples

index.html

Page 60: Scraping by examples

index.html

Page 61: Scraping by examples

index.html

Page 62: Scraping by examples

index.html

Page 63: Scraping by examples

index.html

(...)

Page 64: Scraping by examples

index.html

(...)

Page 65: Scraping by examples

index.html

(...)

Page 66: Scraping by examples

index.html

(...)

Page 67: Scraping by examples

http://chart.apis.google.com/chart?chxt=y&chbh=a&chs=500x300&cht=bvg&chco=A2C180,3D7930&chd=t:49,51,51,50,50|89,82,94,95,93&chdl=Women|Literates&chp=0.033
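A chart URL of this shape can be assembled from the two data series. The sketch below is an illustration in Ruby, not the code from the talk: the method and constant names are hypothetical, and the parameter values are copied verbatim from the URL above. No URL encoding is applied, so that the output matches the slide character for character.

```ruby
CHART_BASE = 'http://chart.apis.google.com/chart'.freeze

# Builds the bar chart URL for two percentage series of equal length.
# Parameter values other than the data are copied as-is from the slide.
def chart_url(women, literates)
  params = {
    'chxt' => 'y',
    'chbh' => 'a',
    'chs'  => '500x300',                                      # image size
    'cht'  => 'bvg',                                          # chart type
    'chco' => 'A2C180,3D7930',                                # series colors
    'chd'  => "t:#{women.join(',')}|#{literates.join(',')}",  # data series
    'chdl' => 'Women|Literates',                              # legend labels
    'chp'  => '0.033'
  }
  # Ruby hashes keep insertion order, so the query string order is stable.
  "#{CHART_BASE}?" + params.map { |key, value| "#{key}=#{value}" }.join('&')
end

puts chart_url([49, 51, 51, 50, 50], [89, 82, 94, 95, 93])
```

Called with the two series from the slide, the method reproduces the URL shown above.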

Page 68: Scraping by examples

code available at...

Page 69: Scraping by examples

Q&A

Page 70: Scraping by examples

http://tinyurl.com/AvaliacaoSOO14
