Web Scraper Shibuya.pm tech talk #8

Post on 13-May-2015

Practical Web Scraping

with Web::Scraper

Tatsuhiko Miyagawa miyagawa@gmail.com

Six Apart, Ltd. / Shibuya Perl Mongers

Shibuya.pm Tech Talks #8

Tatsuhiko Miyagawa 2007/10/01 Shibuya.pm Tech Talk #8

Web pages are built using text-based mark-up languages (HTML and XHTML), and frequently contain a wealth of useful data in text form. However, most web pages are designed for human consumption, and frequently mix content with presentation. Thus, screen scrapers were reborn in the web era to extract machine-friendly data from HTML and other markup.

http://en.wikipedia.org/wiki/Screen_scraping

"Screen-scraping is so 1999!"

RSS is metadata, not a complete HTML replacement

What's wrong with LWP & Regexp?

<td>Current <strong>UTC</strong> (or GMT/Zulu)-time used: <strong id="ctu">Monday, August 27, 2007 at 12:49:46</strong> <br />

> perl -MLWP::Simple -le '$c = get("http://timeanddate.com/worldclock/"); $c =~ m@<strong id="ctu">(.*?)</strong>@ and print $1'
Monday, August 27, 2007 at 12:49:46

It works!

WWW::MySpace 0.70

WWW::Search::Ebay 2.231

WWW::Mixi 0.50

It works …

There are 3 problems (at least)

(1) Fragile

Easy to break even with slight HTML changes (like newlines, order of attributes, etc.)
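
A minimal sketch of that fragility, reusing the pattern from the one-liner shown earlier. The variant markups, the shortened time string, and the helper name extract_time are made up for illustration; each variant is the kind of change a site could make without altering what a browser displays.

```perl
use strict;
use warnings;

# Same pattern as the one-liner: it hard-codes one exact spelling of the tag.
sub extract_time {
    my ($html) = @_;
    return $html =~ m@<strong id="ctu">(.*?)</strong>@ ? $1 : undef;
}

# Hypothetical variants of the same markup (not taken from the real site).
my @variants = (
    [ 'original'        => qq{<strong id="ctu">12:49:46</strong>} ],
    [ 'extra attribute' => qq{<strong class="big" id="ctu">12:49:46</strong>} ],
    [ 'single quotes'   => qq{<strong id='ctu'>12:49:46</strong>} ],
    [ 'newline in text' => qq{<strong id="ctu">12:49:46\n</strong>} ],
);

for my $v (@variants) {
    my ($label, $html) = @$v;
    my $time = extract_time($html);
    printf "%-16s %s\n", $label, defined $time ? "matched: $time" : "no match";
}
```

Only the first variant should match; the other three are still the same element as far as an HTML parser is concerned.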

(2) Hard to maintain

Regular-expression-based scrapers are good only when they're used in write-only scripts

(3) Improper HTML & encoding handling

<span class="message">I &hearts; Shibuya</span>

> perl -e '$c =~ m@<span class="message">(.*?)</span>@ and print $1'
I &hearts; Shibuya

> perl -MHTML::Entities -e '$c =~ m@<span class="message">(.*?)</span>@ and print decode_entities($1)'
I ♥ Shibuya

<span class="message">Perl が大好き! </span>

> perl -MHTML::Entities -MEncode -e '$c =~ m@<span class="message">(.*?)</span>@ and print decode_entities(decode_utf8($1))'
Wide character in print at -e line 1.
Perl が大好き!
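
The warning appears because the decoded string holds characters while STDOUT still expects bytes. One way to keep the two apart is sketched below using only the core Encode module: decode once on input, work on characters, encode once on output. The simulated input bytes and the helper name extract_message are assumptions for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Extract the message from *decoded* HTML (a character string).
sub extract_message {
    my ($html) = @_;
    return $html =~ m@<span class="message">(.*?)</span>@ ? $1 : undef;
}

# Simulate bytes coming off the wire: the sample text encoded as UTF-8.
my $bytes = encode('UTF-8',
    qq{<span class="message">Perl \x{304c}\x{5927}\x{597d}\x{304d}!</span>});

my $chars   = decode('UTF-8', $bytes);   # bytes -> characters, once, on input
my $message = extract_message($chars);

# Encode on output; without this layer, print warns "Wide character in print".
binmode STDOUT, ':encoding(UTF-8)';
print $message, "\n";
```

Doing the decode and encode steps by hand in every script is exactly the kind of bookkeeping that is easy to forget in a regexp one-liner.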

The "right" way of screen-scraping

(1), (2) Maintainable, less fragile

Use XPath and CSS Selectors

XPath

HTML::TreeBuilder::XPath
XML::LibXML

XPath

<td>Current <strong>UTC</strong> (or GMT/Zulu)-time used: <strong id="ctu">Monday, August 27, 2007 at 12:49:46</strong> <br />

use HTML::TreeBuilder::XPath;

my $tree = HTML::TreeBuilder::XPath->new_from_content($content);
print $tree->findnodes('//strong[@id="ctu"]')->shift->as_text;
# Monday, August 27, 2007 at 12:49:46

CSS Selectors

"XPath for HTML coders"
"XPath for people who hate XML"

CSS Selectors

body { font-size: 12px; }

div.article { padding: 1em }

span#count { color: #fff }

XPath: //strong[@id="ctu"]

CSS Selector: strong#ctu

CSS Selectors

<td>Current <strong>UTC</strong> (or GMT/Zulu)-time used: <strong id="ctu">Monday, August 27, 2007 at 12:49:46</strong> <br />

use HTML::TreeBuilder::XPath;
use HTML::Selector::XPath qw(selector_to_xpath);

my $tree = HTML::TreeBuilder::XPath->new_from_content($content);
my $xpath = selector_to_xpath "strong#ctu";
print $tree->findnodes($xpath)->shift->as_text;
# Monday, August 27, 2007 at 12:49:46

Complete Script

#!/usr/bin/perl
use strict;
use warnings;
use Encode;
use LWP::UserAgent;
use HTTP::Response::Encoding;
use HTML::TreeBuilder::XPath;
use HTML::Selector::XPath qw(selector_to_xpath);

my $ua = LWP::UserAgent->new;
my $res = $ua->get("http://www.timeanddate.com/worldclock/");
if ($res->is_error) {
    die "HTTP GET error: ", $res->status_line;
}
my $content = decode $res->encoding, $res->content;

my $tree = HTML::TreeBuilder::XPath->new_from_content($content);
my $xpath = selector_to_xpath("strong#ctu");
my $node = $tree->findnodes($xpath)->shift;
print $node->as_text;

Robust, maintainable, and sane character handling

Example (before)

<td>Current <strong>UTC</strong> (or GMT/Zulu)-time used: <strong id="ctu">Monday, August 27, 2007 at 12:49:46</strong> <br />

> perl -MLWP::Simple -le '$c = get("http://timeanddate.com/worldclock/"); $c =~ m@<strong id="ctu">(.*?)</strong>@ and print $1'
Monday, August 27, 2007 at 12:49:46

Example (after)

#!/usr/bin/perl
use strict;
use warnings;
use Encode;
use LWP::UserAgent;
use HTTP::Response::Encoding;
use HTML::TreeBuilder::XPath;
use HTML::Selector::XPath qw(selector_to_xpath);

my $ua = LWP::UserAgent->new;
my $res = $ua->get("http://www.timeanddate.com/worldclock/");
if ($res->is_error) {
    die "HTTP GET error: ", $res->status_line;
}
my $content = decode $res->encoding, $res->content;

my $tree = HTML::TreeBuilder::XPath->new_from_content($content);
my $xpath = selector_to_xpath("strong#ctu");
my $node = $tree->findnodes($xpath)->shift;
print $node->as_text;

but … long and boring

Practical Web Scraping

with Web::Scraper

Web scraping toolkit inspired by scrapi.rb

DSL-ish

Example (before)

#!/usr/bin/perl
use strict;
use warnings;
use Encode;
use LWP::UserAgent;
use HTTP::Response::Encoding;
use HTML::TreeBuilder::XPath;
use HTML::Selector::XPath qw(selector_to_xpath);

my $ua = LWP::UserAgent->new;
my $res = $ua->get("http://www.timeanddate.com/worldclock/");
if ($res->is_error) {
    die "HTTP GET error: ", $res->status_line;
}
my $content = decode $res->encoding, $res->content;

my $tree = HTML::TreeBuilder::XPath->new_from_content($content);
my $xpath = selector_to_xpath("strong#ctu");
my $node = $tree->findnodes($xpath)->shift;
print $node->as_text;

Example (after)

#!/usr/bin/perl
use strict;
use warnings;
use Web::Scraper;
use URI;

my $s = scraper {
    process "strong#ctu", time => 'TEXT';
    result 'time';
};

my $uri = URI->new("http://timeanddate.com/worldclock/");
print $s->scrape($uri);

Basics

use Web::Scraper;

my $s = scraper {
    # DSL goes here
};

my $res = $s->scrape($uri);

process

process $selector,
    $key => $what,
    …;

$selector: CSS Selector or XPath (start with /)

$key: key for the result hash

append "[]" for looping

$what:
'@attr'
'TEXT'
'RAW'
Web::Scraper
sub { … }
Hash reference

<ul class="sites"><li><a href="http://vienna.openguides.org/">OpenGuides</a></li><li><a href="http://vienna.yapceurope.org/">YAPC::Europe</a></li></ul>

process "ul.sites > li > a",
    'urls[]' => '@href';
# { urls => [ … ] }

process '//ul[@class="sites"]/li/a',
    'names[]' => 'TEXT';
# { names => [ 'OpenGuides', … ] }

process "ul.sites > li",
    'sites[]' => scraper {
        process 'a',
            link => '@href', name => 'TEXT';
    };
# { sites => [ { link => …, name => … },
#              { link => …, name => … } ] };

process "ul.sites > li > a",
    'sites[]' => sub {
        # $_ is HTML::Element
        +{ link => $_->attr('href'), name => $_->as_text };
    };
# { sites => [ { link => …, name => … },
#              { link => …, name => … } ] };

process "ul.sites > li > a",
    'sites[]' => {
        link => '@href', name => 'TEXT',
    };
# { sites => [ { link => …, name => … },
#              { link => …, name => … } ] };
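
The process variants above can be tried without network access, since scrape also accepts HTML content directly. The following is a small sketch run against the sample list from these slides; the variable names are arbitrary, and it assumes Web::Scraper is installed from CPAN.

```perl
use strict;
use warnings;
use Web::Scraper;

# The sample markup used on the slides.
my $html = <<'HTML';
<ul class="sites">
<li><a href="http://vienna.openguides.org/">OpenGuides</a></li>
<li><a href="http://vienna.yapceurope.org/">YAPC::Europe</a></li>
</ul>
HTML

my $flat = scraper {
    # '@href' reads an attribute, 'TEXT' reads the text content;
    # the "[]" suffix collects every match into an array.
    process 'ul.sites > li > a',
        'urls[]'  => '@href',
        'names[]' => 'TEXT';
};

my $nested = scraper {
    # A nested scraper is applied to each matching <li> in turn.
    process 'ul.sites > li', 'sites[]' => scraper {
        process 'a', link => '@href', name => 'TEXT';
    };
};

my $flat_res   = $flat->scrape(\$html);
my $nested_res = $nested->scrape(\$html);

print "$_\n" for @{ $flat_res->{names} };
print "$_->{name} => $_->{link}\n" for @{ $nested_res->{sites} };
```

The flat form gives two parallel arrays, while the nested form keeps each link and its name together in one hash, which is usually easier to consume downstream.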

result

result;        # get stash as hashref (default)
result @keys;  # get stash as hashref containing @keys
result $key;   # get value of stash $key;

my $s = scraper {
    process …;
    process …;
    result 'foo', 'bar';
};
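
To make the three forms concrete, here is a sketch that scrapes one small inline page three ways. The HTML and the key names title, heading, and lead are invented for this illustration, and Web::Scraper is assumed to be installed.

```perl
use strict;
use warnings;
use Web::Scraper;

# Hypothetical page used only for this illustration.
my $html = '<html><head><title>Demo</title></head>'
         . '<body><h1>Hello</h1><p class="lead">First paragraph</p></body></html>';

# No result call: the whole stash comes back as a hashref.
my $all = scraper {
    process 'title',  title   => 'TEXT';
    process 'h1',     heading => 'TEXT';
    process 'p.lead', lead    => 'TEXT';
};

# result with several keys: a hashref restricted to those keys.
my $some = scraper {
    process 'title',  title   => 'TEXT';
    process 'h1',     heading => 'TEXT';
    process 'p.lead', lead    => 'TEXT';
    result 'title', 'lead';
};

# result with one key: the bare value rather than a hashref.
my $one = scraper {
    process 'title', title => 'TEXT';
    result 'title';
};

my $all_res  = $all->scrape(\$html);
my $some_res = $some->scrape(\$html);
my $one_res  = $one->scrape(\$html);

print join(', ', sort keys %$all_res), "\n";
print join(', ', sort keys %$some_res), "\n";
print "$one_res\n";
```

The single-key form is what lets the earlier timeanddate example print the scraped value directly.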

Live Demo

Tools

> cpan Web::Scraper

comes with 'scraper' CLI

> scraper http://example.com/
scraper> process "a", "links[]" => '@href';
scraper> d
$VAR1 = {
    links => [
        'http://example.org/',
        'http://example.net/',
    ],
};
scraper> y
---
links:
  - http://example.org/
  - http://example.net/

> scraper /path/to/foo.html

> GET http://example.com/ | scraper

Recent Updates

0.13: 'c' and 'c all'

WARN in scraper

0.14: automatic absolute URI for link elements (a@href, img@src)

0.14 (cont.): 'RAW' and 'HTML'

0.15: $Web::Scraper::UserAgent

$scraper->user_agent

0.19: support encoding detection w/ META tags

TODO

Web::Scraper needs documentation

More examples to put in eg/ directory

Alternative API inspired by scRUBYt!

OO Backend API if you don't like the DSL

Integrate with WWW::Mechanize and Test::WWW::Declare

XPath auto-suggestion off of DOM + element

DOM + XPath => Element
DOM + Element => XPath?

(Template::Extract?)

Generic XML support (e.g. RSS/Atom feeds)

Extensible text filter: date, geo, hCards (microformats)

<span class="entry-date">October 1st, 2007 17:13:31 +0900</span>

process ".entry-date", date => 'TEXT:rfc822';

Summary

Web::Scraper: inspired by scrapi

Easy, fun, maintainable & less fragile

CSS selector, XPath

Questions?

Thank you

http://search.cpan.org/dist/Web-Scraper

http://www.slideshare.net/miyagawa/webscraper