Growth Hackers Toronto Web Scraping & Lead Generation

Post on 09-Feb-2017



Growth Hacking Toronto

@GrowthCollectiv @Dean_h_wang @FundThrough @TopHat @EventMobi @GuruLink

Connect with us:

N: EventMobi-Guest
P: emguests
#GrowthHackingTO

Web Scraping and Lead Generation

By: Dean Wang

November 17, 2015

About Me
Dean Wang
Data Intelligence Manager at Influitive

@dean_h_wang
linkedin.com/in/deanwang1

Outline
• Data Nightmares
• What is web scraping?
• Learning web scraping
• Demo set up
• Legal stuff and prevention
• More to explore
• Demo results

Data Nightmares
Is your database a strength or weakness?

Dude, where are my prospects?

Distinguishing Apples and Oranges

Hi [firstname],

Lead Generation

Data Refresh/Appending

Incomplete entries
• Contact information
• Segmentation

Outdated entries
• Person has left the company
• Company has grown or shrunk
• Company was acquired

Innovative Marketing Initiatives

Personalize communications with targeted prospects

Target prospects with strong signals
More possibilities for ongoing campaigns

Traditional Solutions

Manual data entry
• Time and resource intensive

List buying
• Quality problems

New Solutions

Web Scraping (the subject of this talk!)

Accessing a website’s API
• Not always available

What is Web Scraping?

The World Wide Web

• Full of information, directories
• Some webpages are semi-structured
• Web scraping can take advantage of this structure

“Web Scraping” as an Intern

What an intern or scraper would do:
• Find a list of useful pages
• Click through each page
• Note down the useful information
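The three steps above map quite directly onto code. The following is a minimal sketch using only Python's standard library; the page contents, the URLs, and the class names `name` and `city` are made up for illustration, and a real scraper would download each page rather than read it from a dictionary.

```python
from html.parser import HTMLParser

# Hypothetical pages standing in for a real directory site; a real scraper
# would download each page (for example with urllib.request) instead.
PAGES = {
    "/companies/acme": "<html><body><h3 class='name'>Acme Inc.</h3><span class='city'>Toronto</span></body></html>",
    "/companies/globex": "<html><body><h3 class='name'>Globex</h3><span class='city'>Waterloo</span></body></html>",
}


class FieldParser(HTMLParser):
    """Collects the text of elements whose class attribute names a wanted field."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.current = None  # field currently being read, if any
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls in self.wanted:
            self.current = cls

    def handle_data(self, data):
        if self.current is not None:
            self.fields[self.current] = data.strip()

    def handle_endtag(self, tag):
        self.current = None


def scrape(pages, wanted=("name", "city")):
    rows = []
    for url in sorted(pages):                       # 1. the list of useful pages
        parser = FieldParser(wanted)
        parser.feed(pages[url])                     # 2. "click through" each page
        parser.close()
        rows.append({"url": url, **parser.fields})  # 3. note down the useful information
    return rows


rows = scrape(PAGES)
for row in rows:
    print(row)
```

Each row is a plain dictionary, so the result could be written to a CSV file or loaded into a CRM import without further reshaping.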

Learning Web Scraping

The Structure of Webpages


HTML tags “mark up” pages

<tag>

<div>

<span>

<a href="www.influitive.com">

The Structure of Webpages
With a few exceptions, each opening tag has a closing tag

<div></div>

<span></span>

<h3></h3>

<a href="www.influitive.com"></a>

The Structure of Webpages
Tags can contain text or even other tags within them

<div>This text is in the div element</div>

<nested>
  <tag></tag>
</nested>

<a href="www.influitive.com">Influitive</a>
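To see how a program reads this structure, the sketch below feeds a small piece of markup to Python's standard `html.parser` module and logs each opening tag, closing tag, and piece of text in order. The markup combines two of the examples above into one snippet, which is an assumption made for illustration; the `TagLogger` class is likewise a hypothetical helper.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Records every opening tag, closing tag, and piece of text it sees."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs, e.g. [("href", "...")]
        self.events.append(("open", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("close", tag))

    def handle_data(self, data):
        if data.strip():
            self.events.append(("text", data.strip()))


logger = TagLogger()
logger.feed('<div>This text is in the div element <a href="www.influitive.com">Influitive</a></div>')
logger.close()

for event in logger.events:
    print(event)
```

The log shows the link opening and closing inside the div, which is the nesting a scraper relies on to find the right piece of text.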

The Structure of Webpages

CSS Selectors

CSS Selectors enable you to communicate exactly what tags you want to select
An advanced technique that gives you more control over scraping
e.g. div, div a, h3.class
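As a rough illustration of what those three selector forms mean, here is a toy matcher built on Python's standard `html.parser`. It is a sketch, not a CSS engine: it only understands a tag name, `tag.class`, and the space (descendant) combinator, it assumes every opening tag has a closing tag, and it does not report a match nested inside another match separately. The sample markup and the names `SelectorMatcher` and `select` are hypothetical. Libraries such as Beautiful Soup provide much fuller selector support.

```python
from html.parser import HTMLParser


def parse_selector(selector):
    """Turn 'div a' or 'h3.title' into a list of (tag, class-or-None) steps."""
    steps = []
    for part in selector.split():
        tag, _, cls = part.partition(".")
        steps.append((tag, cls or None))
    return steps


def step_matches(step, element):
    tag, cls = step
    el_tag, el_classes = element
    return tag == el_tag and (cls is None or cls in el_classes)


class SelectorMatcher(HTMLParser):
    """Collects the text of elements matching a simplified CSS selector."""

    def __init__(self, selector):
        super().__init__()
        self.steps = parse_selector(selector)
        self.stack = []     # open elements, outermost first
        self.capturing = 0  # nesting depth inside a matched element
        self.matches = []

    def _selected(self):
        # The last step must match the innermost element; earlier steps
        # must match ancestors in order (the descendant combinator).
        *ancestors, last = self.steps
        if not step_matches(last, self.stack[-1]):
            return False
        remaining = iter(self.stack[:-1])
        return all(any(step_matches(s, el) for el in remaining) for s in ancestors)

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self.stack.append((tag, classes))
        if self.capturing:
            self.capturing += 1
        elif self._selected():
            self.capturing = 1
            self.matches.append("")

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()
        if self.capturing:
            self.capturing -= 1

    def handle_data(self, data):
        if self.capturing:
            self.matches[-1] += data


def select(html, selector):
    matcher = SelectorMatcher(selector)
    matcher.feed(html)
    matcher.close()
    return [text.strip() for text in matcher.matches]


# Made-up markup for illustration only.
HTML = """
<div><h3 class="title">Acme Inc.</h3><a href="/acme">Profile</a></div>
<p><a href="/about">About</a></p>
<h3>Other heading</h3>
"""

print(select(HTML, "h3.title"))  # only the h3 carrying class "title"
print(select(HTML, "div a"))     # only links somewhere inside a div
print(select(HTML, "a"))         # every link
```

The narrower the selector, the less unrelated text a scraper picks up, which is the extra control the slide refers to.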

Structure and Web Scraping
Web scraping takes advantage of HTML tags

The tags must be somewhat regular on every page in order for the web scraping to work

Demo
Fingers crossed!

Legal Stuff and Prevention

Legal Stuff

Is it legal?
• A site’s terms of use often forbid it
• Information may be otherwise publicly available
• Case of QVC v. Resultly
• A gray area in general

Preventing Web Scraping

• robots.txt
• Restrictions on number of page views allowed
• Blacklisting IP addresses
• CAPTCHAs
• Slight variations in website design
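The first two items are ones a well-behaved scraper can respect on its own. The sketch below uses Python's standard `urllib.robotparser` to check which URLs a robots.txt file permits and to pause between requests. The robots.txt content, the URLs, the user agent name, and the `polite_fetch` helper are all assumptions for illustration; a real scraper would load the site's own file with `set_url(...)` and `read()`.

```python
import time
from urllib.robotparser import RobotFileParser

# A made-up robots.txt; a real scraper would load the site's own file with
# RobotFileParser.set_url(...) followed by read().
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())

USER_AGENT = "example-lead-bot"  # hypothetical name for this scraper


def allowed_urls(urls):
    """Keep only the URLs that robots.txt permits for our user agent."""
    return [url for url in urls if robots.can_fetch(USER_AGENT, url)]


def polite_fetch(urls, fetch, delay=None):
    """Call fetch(url) for each permitted URL, pausing between requests."""
    if delay is None:
        # Fall back to one second if the site states no Crawl-delay.
        delay = robots.crawl_delay(USER_AGENT) or 1
    results = []
    for i, url in enumerate(allowed_urls(urls)):
        if i:
            time.sleep(delay)
        results.append(fetch(url))
    return results


urls = [
    "https://example.com/companies/acme",
    "https://example.com/private/admin",
]
print(allowed_urls(urls))
print(robots.crawl_delay(USER_AGENT))
```

Keeping the request rate low also makes it less likely that a site's page-view limits or IP blacklisting are triggered.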

More to Explore
Links and Tools

Some Useful Links

http://www.w3schools.com/html/

http://scraping.pro/

https://blog.hartleybrody.com/web-scraping/

Different Web Scraping Tools

Chrome Extensions
• Data Miner/Scraper
• Webscraper

• ScraperWiki
• Mozenda

Knowledge of Python required
• Scrapy
• Selenium + Beautiful Soup

Demo Results
Fingers crossed!

Questions?

Thank You