Growth Hackers Toronto Web Scraping & Lead Generation


Transcript of Growth Hackers Toronto Web Scraping & Lead Generation

Page 1: Growth Hackers Toronto Web Scraping & Lead Generation

Growth Hacking Toronto

@GrowthCollectiv @Dean_h_wang @FundThrough @TopHat @EventMobi @GuruLink

Connect with us:

N: EventMobi-Guest
P: emguests
#GrowthHackingTO

Web Scraping and Lead Generation

By: Dean Wang

Page 2: Growth Hackers Toronto Web Scraping & Lead Generation

Web Scraping and Lead Generation

November 17, 2015

Page 3: Growth Hackers Toronto Web Scraping & Lead Generation

About Me

Dean Wang
Data Intelligence Manager at Influitive

@dean_h_wang
linkedin.com/in/deanwang1

Page 4: Growth Hackers Toronto Web Scraping & Lead Generation

Outline

• Data Nightmares
• What is web scraping?
• Learning web scraping
• Demo set up
• Legal stuff and prevention
• More to explore
• Demo results

Page 5: Growth Hackers Toronto Web Scraping & Lead Generation

Data Nightmares

Is your database a strength or a weakness?

Page 6: Growth Hackers Toronto Web Scraping & Lead Generation

Dude, where are my prospects?

Page 7: Growth Hackers Toronto Web Scraping & Lead Generation

Distinguishing Apples and Oranges

Page 8: Growth Hackers Toronto Web Scraping & Lead Generation

Hi [firstname],

Page 9: Growth Hackers Toronto Web Scraping & Lead Generation

Lead Generation

Page 10: Growth Hackers Toronto Web Scraping & Lead Generation

Data Refresh/Appending

Incomplete entries
• Contact information
• Segmentation

Outdated entries
• Person has left the company
• Company has grown or shrunk
• Company was acquired

Page 11: Growth Hackers Toronto Web Scraping & Lead Generation

Innovative Marketing Initiatives

Personalize communications with targeted prospects

Target prospects with strong signals

More possibilities for ongoing campaigns

Page 12: Growth Hackers Toronto Web Scraping & Lead Generation

Traditional Solutions

Manual data entry
• Time and resource intensive

List buying
• Quality problems

Page 13: Growth Hackers Toronto Web Scraping & Lead Generation

New Solutions

Web Scraping
• The subject of this talk!

Accessing a website’s API
• Not always available

Page 14: Growth Hackers Toronto Web Scraping & Lead Generation

What is Web Scraping?

Page 15: Growth Hackers Toronto Web Scraping & Lead Generation

The World Wide Web

• Full of information, directories
• Some webpages are semi-structured
• Web scraping can take advantage of this structure

Page 16: Growth Hackers Toronto Web Scraping & Lead Generation

“Web Scraping” as an Intern

What an intern or scraper would do:
• Find a list of useful pages
• Click through each page
• Note down the useful information
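The three steps above can be sketched in Python. This is a minimal illustration, not the demo from the talk: the in-memory `FAKE_SITE`, its paths, the company names, and the choice of `<h3>` as the place where the useful information lives are all assumptions made so the sketch runs offline with only the standard library.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


class HeadingCollector(HTMLParser):
    """Collects the text inside every <h3> tag on a page."""

    def __init__(self):
        super().__init__()
        self.in_h3 = False
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self.in_h3 = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "h3":
            self.in_h3 = False

    def handle_data(self, data):
        if self.in_h3:
            self.headings[-1] += data


# Hypothetical site, held in memory so the sketch runs offline.
FAKE_SITE = {
    "/directory": '<a href="/company/1">One</a><a href="/company/2">Two</a>',
    "/company/1": "<div><h3>Acme Widgets</h3></div>",
    "/company/2": "<div><h3>Globex Corp</h3></div>",
}


def fetch(path):
    """Stand-in for an HTTP request; a real scraper would download the page."""
    return FAKE_SITE[path]


def scrape(start_path):
    # Step 1: find a list of useful pages.
    collector = LinkCollector()
    collector.feed(fetch(start_path))
    results = []
    # Step 2: click through each page.
    for path in collector.links:
        page = HeadingCollector()
        page.feed(fetch(path))
        # Step 3: note down the useful information.
        results.extend(h.strip() for h in page.headings)
    return results


company_names = scrape("/directory")
print(company_names)
```

Swapping `fetch` for a real HTTP call is the only change needed to point the same loop at a live site, subject to the legal and prevention points later in the deck.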

Page 17: Growth Hackers Toronto Web Scraping & Lead Generation

Learning Web Scraping

Page 18: Growth Hackers Toronto Web Scraping & Lead Generation

The Structure of Webpages

Page 19: Growth Hackers Toronto Web Scraping & Lead Generation

The Structure of Webpages

HTML tags “mark up” pages

<tag>

<div>

<span>

<a href="www.influitive.com">

Page 20: Growth Hackers Toronto Web Scraping & Lead Generation

The Structure of Webpages

With a few exceptions, each opening tag has a closing tag

<div></div>

<span></span>

<h3></h3>

<a href="www.influitive.com"></a>

Page 21: Growth Hackers Toronto Web Scraping & Lead Generation

The Structure of Webpages

Tags can contain text or even other tags within them

<div>This text is in the div element</div>

<nested>
  <tag></tag>
</nested>

<a href="www.influitive.com">Influitive</a>
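To make the nesting visible, here is a small sketch using Python's standard `html.parser` module. It walks a snippet built from the slide's own examples and records each tag and text run indented by how deeply it is nested. The `OutlinePrinter` class name is made up for illustration, and the sketch assumes every opening tag has a closing tag, so it would miscount depth on tags such as `<br>`.

```python
from html.parser import HTMLParser


class OutlinePrinter(HTMLParser):
    """Records each tag and text run with its nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        attr_text = "".join(f' {k}="{v}"' for k, v in attrs)
        self.lines.append("  " * self.depth + f"<{tag}{attr_text}>")
        self.depth += 1  # everything until the closing tag is one level deeper

    def handle_endtag(self, tag):
        self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.lines.append("  " * self.depth + text)


html = '<div>Visit <a href="www.influitive.com">Influitive</a></div>'
printer = OutlinePrinter()
printer.feed(html)
outline = printer.lines
print("\n".join(outline))
```

The link text ends up two levels deep because it sits inside the `<a>` tag, which itself sits inside the `<div>`.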

Page 22: Growth Hackers Toronto Web Scraping & Lead Generation

The Structure of Webpages

Page 23: Growth Hackers Toronto Web Scraping & Lead Generation

CSS Selectors

CSS Selectors enable you to communicate exactly what tags you want to select

An advanced technique that gives you more control over scraping

e.g. div, div a, h3.class
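As a rough illustration of what the three example selectors mean, the sketch below implements a deliberately tiny selector matcher on top of Python's standard `html.parser`. It handles only a tag name (`div`), a tag with a class (`h3.name`), and the descendant form (`div a`); real tools support far more. The `select` function, the sample `PAGE`, and the class name `name` are assumptions for this example, and the parser assumes well-formed HTML with no self-closing tags.

```python
from html.parser import HTMLParser


class Element:
    def __init__(self, tag, classes, ancestors):
        self.tag = tag
        self.classes = classes
        self.ancestors = ancestors  # outermost first
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a flat list of elements, each knowing its ancestors."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.elements = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        element = Element(tag, classes, list(self.stack))
        self.elements.append(element)
        self.stack.append(element)

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Text belongs to every element that is currently open.
        for element in self.stack:
            element.text += data


def matches(element, part):
    """Check one simple selector such as 'div', 'h3.name' or '.name'."""
    tag, _, cls = part.partition(".")
    if tag and element.tag != tag:
        return False
    return not cls or cls in element.classes


def select(html, selector):
    """Return elements matching a space-separated descendant selector."""
    builder = TreeBuilder()
    builder.feed(html)
    *ancestor_parts, last = selector.split()
    found = []
    for element in builder.elements:
        if not matches(element, last):
            continue
        # Each earlier part must match some ancestor, in outer-to-inner order.
        remaining = list(ancestor_parts)
        for ancestor in element.ancestors:
            if remaining and matches(ancestor, remaining[0]):
                remaining.pop(0)
        if not remaining:
            found.append(element)
    return found


PAGE = """
<div class="listing">
  <h3 class="name">Acme Widgets</h3>
  <a href="/company/1">Profile</a>
</div>
<h3>Unrelated heading</h3>
<a href="/about">About</a>
"""

divs = [e.tag for e in select(PAGE, "div")]
links_in_divs = [e.text for e in select(PAGE, "div a")]
named_headings = [e.text for e in select(PAGE, "h3.name")]
print(divs, links_in_divs, named_headings)
```

Note how `div a` picks up only the link inside the `<div>` and `h3.name` skips the heading without that class, which is the extra control the slide refers to.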

Page 24: Growth Hackers Toronto Web Scraping & Lead Generation

Structure and Web Scraping

Web scraping takes advantage of HTML tags

The tags must be somewhat regular on every page in order for the web scraping to work
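A short sketch of why regularity matters, using only Python's standard library. The extractor below assumes company names always sit in an `<h3>` tag; the page snippets and names are invented for illustration. It works on the pages that share that layout and silently returns nothing on a page that uses `<h2>` instead.

```python
from html.parser import HTMLParser


class NameExtractor(HTMLParser):
    """Extracts text from <h3> tags, where this sketch expects names to be."""

    def __init__(self):
        super().__init__()
        self.in_h3 = False
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self.in_h3 = True

    def handle_endtag(self, tag):
        if tag == "h3":
            self.in_h3 = False

    def handle_data(self, data):
        if self.in_h3 and data.strip():
            self.names.append(data.strip())


def extract_names(html):
    extractor = NameExtractor()
    extractor.feed(html)
    return extractor.names


# Hypothetical pages: the first two share a layout, the third does not.
regular_pages = [
    "<div><h3>Acme Widgets</h3><span>Toronto</span></div>",
    "<div><h3>Globex Corp</h3><span>Ottawa</span></div>",
]
redesigned_page = "<div><h2>Initech</h2><span>Waterloo</span></div>"

regular_results = [extract_names(page) for page in regular_pages]
redesigned_result = extract_names(redesigned_page)
print(regular_results, redesigned_result)
```

An empty result rather than an error is the typical failure mode, so it is worth checking scraped output for missing rows.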

Page 25: Growth Hackers Toronto Web Scraping & Lead Generation

Demo

Fingers crossed!

Page 26: Growth Hackers Toronto Web Scraping & Lead Generation

Legal Stuff and Prevention

Page 27: Growth Hackers Toronto Web Scraping & Lead Generation

Legal Stuff

Is it legal?
• A site’s terms of use often forbid it
• Information may be otherwise publicly available
• Case of QVC v. Resultly
• A gray area in general

Page 28: Growth Hackers Toronto Web Scraping & Lead Generation

Preventing Web Scraping

• robots.txt
• Restrictions on number of page views allowed
• Blacklisting IP addresses
• CAPTCHAs
• Slight variations in website design
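From the scraper's side, the first two items can be respected with Python's standard `urllib.robotparser` module. The sketch below is one possible approach, not a recommendation from the talk: the robots.txt content, the `example-scraper` user agent, and the example.com URLs are all hypothetical, and the rules are parsed from a string so nothing is downloaded.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real scraper would download it from
# the site's /robots.txt path instead.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

USER_AGENT = "example-scraper"  # hypothetical name

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())


def allowed_urls(urls):
    """Keep only the URLs that robots.txt permits for this user agent."""
    return [url for url in urls if rules.can_fetch(USER_AGENT, url)]


def polite_visit(urls, visit, delay=None):
    """Call visit(url) on each allowed URL, pausing between requests."""
    if delay is None:
        # Fall back to one second if the site states no Crawl-delay.
        delay = rules.crawl_delay(USER_AGENT) or 1
    for index, url in enumerate(allowed_urls(urls)):
        if index:
            time.sleep(delay)
        visit(url)


URLS = [
    "http://example.com/companies",
    "http://example.com/private/data",
]

visited = []
# delay=0 only keeps this demonstration instant; omit it to honour Crawl-delay.
polite_visit(URLS, visited.append, delay=0)
print(visited)
```

Pausing between requests also makes it less likely that a scraper trips page-view limits or gets its IP address blacklisted.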

Page 29: Growth Hackers Toronto Web Scraping & Lead Generation

More to Explore

Links and Tools

Page 30: Growth Hackers Toronto Web Scraping & Lead Generation

Some Useful Links

http://www.w3schools.com/html/

http://scraping.pro/

https://blog.hartleybrody.com/web-scraping/

Page 31: Growth Hackers Toronto Web Scraping & Lead Generation

Different Web Scraping Tools

Chrome Extensions
• Data Miner/Scraper
• Webscraper

• ScraperWiki
• Mozenda

Knowledge of Python required
• Scrapy
• Selenium + Beautiful Soup

Page 32: Growth Hackers Toronto Web Scraping & Lead Generation

Demo Results

Fingers crossed!

Page 33: Growth Hackers Toronto Web Scraping & Lead Generation

Questions?

Page 34: Growth Hackers Toronto Web Scraping & Lead Generation

Thank You