Growth Hackers Toronto Web Scraping & Lead Generation
Growth Hacking Toronto
Connect with us: @GrowthCollectiv @Dean_h_wang @FundThrough @TopHat @EventMobi @GuruLink
#GrowthHackingTO
Wi-Fi network: EventMobi-Guest, password: emguests
Web Scraping and Lead Generation
By: Dean Wang
November 17, 2015
About Me
Dean Wang
Data Intelligence Manager at Influitive
@dean_h_wang
linkedin.com/in/deanwang1
Outline
• Data Nightmares
• What is web scraping?
• Learning web scraping
• Demo setup
• Legal stuff and prevention
• More to explore
• Demo results
Data Nightmares
Is your database a strength or a weakness?
Dude, where are my prospects?
Distinguishing Apples and Oranges
Hi [firstname],
Lead Generation
Data Refresh/Appending
Incomplete entries
• Contact information
• Segmentation
Outdated entries
• Person has left the company
• Company has grown or shrunk
• Company was acquired
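One way to picture the refresh/append step in code: a minimal Python sketch, assuming CRM records and scraped results are plain dictionaries with hypothetical field names (`email`, `title`, `company`, `employees`). Blank fields are filled from the scraped data (incomplete entries); fields where the scraped value disagrees are only flagged (outdated entries), since a mismatch might mean the person left the company or might just be a scraping error.

```python
def refresh_record(crm_record, scraped):
    """Fill gaps in a CRM record from scraped data and list fields that look outdated.

    Blank fields are filled in. Fields whose scraped value differs from the
    stored one are flagged for review rather than overwritten.
    """
    updated = dict(crm_record)  # never mutate the caller's record
    flagged = []
    for field, new_value in scraped.items():
        if new_value in (None, ""):
            continue  # nothing was scraped for this field
        old_value = crm_record.get(field)
        if old_value in (None, ""):
            updated[field] = new_value  # incomplete entry: append the new value
        elif old_value != new_value:
            flagged.append(field)  # possibly outdated entry: needs a human check
    return updated, flagged


# Hypothetical example records
crm = {"email": "jane@acme.example", "name": "Jane Doe", "title": "",
       "company": "Acme", "employees": 50}
scraped = {"name": "Jane Doe", "title": "VP Marketing",
           "company": "Globex", "employees": None}

updated, flagged = refresh_record(crm, scraped)
print(updated["title"], flagged)  # VP Marketing ['company']
```

Flagging instead of overwriting is a deliberate choice here: a changed company is a strong signal worth acting on, but only once someone has confirmed it.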
Innovative Marketing Initiatives
• Personalize communications with targeted prospects
• Target prospects with strong signals
• More possibilities for ongoing campaigns
Traditional Solutions
Manual data entry
• Time and resource intensive
List buying
• Quality problems
New Solutions
Web scraping (the subject of this talk!)
Accessing a website’s API
• Not always available
What is Web Scraping?
The World Wide Web
• Full of information and directories
• Some webpages are semi-structured
• Web scraping can take advantage of this structure
“Web Scraping” as an Intern
What an intern (or a scraper) would do:
• Find a list of useful pages
• Click through each page
• Note down the useful information
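The intern's three steps map directly onto a small crawler. This is a sketch under assumptions, not the demo from the talk: it uses only Python's standard library, treats the text of each page's first `<h1>` as the "useful information", and swaps a dictionary of hypothetical pages in for the network so it runs offline. `scrape`, `link_filter` and the example.com URLs are made-up names.

```python
import time
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class PageParser(HTMLParser):
    """Collects every link (href) and the text of the first <h1> on a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.heading = None
        self._in_h1 = False
        self._h1_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "h1" and self.heading is None:
            self._in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1" and self._in_h1:
            self._in_h1 = False
            self.heading = "".join(self._h1_parts).strip()

    def handle_data(self, data):
        if self._in_h1:
            self._h1_parts.append(data)


def parse(html):
    parser = PageParser()
    parser.feed(html)
    parser.close()
    return parser


def fetch_url(url):
    """Download a page over the network (not used in the offline example below)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def scrape(index_url, link_filter, fetch=fetch_url, delay=1.0):
    # Step 1: find a list of useful pages on an index/directory page
    index = parse(fetch(index_url))
    hrefs = [h for h in index.links if link_filter(h)]
    page_urls = list(dict.fromkeys(urljoin(index_url, h) for h in hrefs))  # dedupe, keep order

    records = []
    for page_url in page_urls:
        time.sleep(delay)  # pause between requests to go easy on the site
        # Step 2: "click through" to each page
        page = parse(fetch(page_url))
        # Step 3: note down the useful information
        records.append({"url": page_url, "heading": page.heading})
    return records


# Offline usage: a dict of hypothetical pages stands in for a real site.
fake_site = {
    "http://example.com/directory": (
        '<a href="/people/1">Jane</a> <a href="/people/2">John</a>'
        ' <a href="/about">About us</a>'
    ),
    "http://example.com/people/1": "<h1>Jane Doe, VP Marketing</h1>",
    "http://example.com/people/2": "<h1>John Roe, Growth Lead</h1>",
}
rows = scrape(
    "http://example.com/directory",
    link_filter=lambda href: href.startswith("/people/"),
    fetch=fake_site.__getitem__,
    delay=0,
)
for row in rows:
    print(row)
```

Passing `fetch` in as a parameter keeps the crawling logic separate from downloading, and the `delay` pause is a simple courtesy that also helps a scraper stay under page-view limits.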
Learning Web Scraping
The Structure of Webpages
HTML tags “mark up” pages
<tag>
<div>
<span>
<a href="http://www.influitive.com">
With a few exceptions (void elements such as <br> and <img>), each opening tag has a matching closing tag
<div></div>
<span></span>
<h3></h3>
<a href="http://www.influitive.com"></a>
Tags can contain text or even other tags within them
<div>This text is in the div element</div>
<nested>
  <tag></tag>
</nested>
<a href="http://www.influitive.com">Influitive</a>
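To see how a parser experiences this nesting, here is a small sketch using Python's built-in `html.parser` module on the snippets above (written with straight ASCII quotes). It lists each tag indented by how deeply it is nested, plus any text it holds. The `VOID_TAGS` set is a partial, hand-picked list of the exceptions that have no closing tag, and `outline` is a made-up helper name.

```python
from html.parser import HTMLParser

# Tags that never take a closing tag (a partial list of HTML void elements).
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class OutlineParser(HTMLParser):
    """Records each tag, indented by its nesting depth, plus any text it holds."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def _record(self, tag, attrs):
        attr_text = "".join(
            f" {name}" if value is None else f' {name}="{value}"'
            for name, value in attrs
        )
        self.lines.append("  " * self.depth + f"<{tag}{attr_text}>")

    def handle_starttag(self, tag, attrs):
        self._record(tag, attrs)
        if tag not in VOID_TAGS:
            self.depth += 1  # everything until the closing tag is nested inside

    def handle_startendtag(self, tag, attrs):
        self._record(tag, attrs)  # self-closing, e.g. <br/>: nothing nested inside

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if data.strip():
            self.lines.append("  " * self.depth + repr(data.strip()))


def outline(html):
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    return parser.lines


snippet = (
    "<div>This text is in the div element</div>"
    "<nested><tag></tag></nested>"
    '<a href="http://www.influitive.com">Influitive</a>'
)
print("\n".join(outline(snippet)))
```

The output shows `<tag>` indented under `<nested>`, and each piece of text indented under the tag that contains it.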
CSS Selectors
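CSS selectors let you state exactly which tags you want to select. They are a more advanced technique that gives you finer control over scraping, e.g.:
• div selects every div element
• div a selects links nested anywhere inside a div
• h3.class selects h3 elements that have the class "class"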
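Libraries such as Beautiful Soup (its `select()` method) and Scrapy accept CSS selectors directly. So that the example runs with only the standard library, the sketch below instead hand-rolls a tiny matcher that supports just the three forms on the slide: a tag name, the descendant combinator (a space), and `tag.class`. It is an illustration of how selectors match, not a full CSS engine; the page markup, class names and helper names (`select`, `matches`) are hypothetical.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Element:
    def __init__(self, tag, attrs, parent):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []
        self.text = ""  # all text inside this element, in document order


class TreeBuilder(HTMLParser):
    """Turns markup into a tree of Element objects (tolerates stray closing tags)."""

    def __init__(self):
        super().__init__()
        self.root = Element("#root", [], None)
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        element = Element(tag, attrs, self.current)
        self.current.children.append(element)
        if tag not in VOID_TAGS:
            self.current = element

    def handle_endtag(self, tag):
        # Walk up to the matching open tag; ignore closing tags with no match.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        node = self.current
        while node is not None:  # text belongs to every enclosing element
            node.text += data
            node = node.parent


def matches_simple(element, simple):
    """Match one compound selector: 'tag', 'tag.cls' or '.cls'."""
    tag, _, cls = simple.partition(".")
    if tag and element.tag != tag:
        return False
    if cls and cls not in (element.attrs.get("class") or "").split():
        return False
    return True


def matches(element, parts):
    """The last part must match the element; earlier parts must match its ancestors, in order."""
    if not matches_simple(element, parts[-1]):
        return False
    ancestor = element.parent
    for simple in reversed(parts[:-1]):
        while ancestor is not None and not matches_simple(ancestor, simple):
            ancestor = ancestor.parent
        if ancestor is None:
            return False
        ancestor = ancestor.parent
    return True


def select(html, selector):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    parts = selector.split()
    found = []

    def walk(element):  # depth-first, so results come back in document order
        for child in element.children:
            if matches(child, parts):
                found.append(child)
            walk(child)

    walk(builder.root)
    return found


page = (
    '<div class="card"><h3 class="name">Jane Doe</h3><a href="/jane">Profile</a></div>'
    '<div class="card"><h3 class="name">John Roe</h3><a href="/john">Profile</a></div>'
    '<h3>Upcoming events</h3><a href="/about">About us</a>'
)
print([e.text for e in select(page, "h3.name")])         # ['Jane Doe', 'John Roe']
print([e.attrs["href"] for e in select(page, "div a")])  # ['/jane', '/john']
```

Note how `h3.name` skips the unclassed "Upcoming events" heading and `div a` skips the "About us" link that sits outside any div: that precision is what selectors add over matching on tag names alone.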
Structure and Web Scraping
Web scraping takes advantage of HTML tags.
The tags must be fairly consistent from page to page for web scraping to work.
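The need for regular tags is easy to demonstrate. In this standard-library Python sketch (the `<span class="title">` template and the pages are hypothetical), one extraction rule works on every page that shares a template, then silently returns `None` once the markup changes. This is also why small variations in a site's design can break a scraper.

```python
from html.parser import HTMLParser


class FieldExtractor(HTMLParser):
    """Captures the text of the first element whose class list contains css_class.

    Assumes no void tags such as <br> appear inside the target element.
    """

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.value = None
        self._depth = 0  # > 0 while inside the target element
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if self._depth:
            self._depth += 1  # a tag nested inside the target
        elif self.value is None:
            classes = (dict(attrs).get("class") or "").split()
            if self.css_class in classes:
                self._depth = 1

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.value = "".join(self._parts).strip()

    def handle_data(self, data):
        if self._depth:
            self._parts.append(data)


def extract_title(html):
    # Hypothetical template rule: job titles live in <span class="title">.
    parser = FieldExtractor("title")
    parser.feed(html)
    parser.close()
    return parser.value


pages = [
    '<div class="profile"><span class="title">VP Marketing</span></div>',
    '<div class="profile"><span class="title">Growth Lead</span></div>',
    # The same site after a redesign: the class name has changed.
    '<div class="profile"><span class="job-title">CMO</span></div>',
]
print([extract_title(p) for p in pages])  # ['VP Marketing', 'Growth Lead', None]
```

Checking for `None` results after each run is a cheap way to notice when a site's template has drifted.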
Demo
Fingers crossed!
Legal Stuff and Prevention
Legal Stuff
Is it legal?
• A site’s terms of use often forbid it
• The information may be publicly available anyway
• The case of QVC v. Resultly
• A gray area in general
Preventing Web Scraping
• robots.txt
• Limits on the number of page views allowed
• Blacklisting IP addresses
• CAPTCHAs
• Slight variations in website design
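From the scraper's side, the first item on this list is the easiest to respect. Python's standard `urllib.robotparser` module checks URLs against a site's robots.txt rules. The sketch below parses a hypothetical robots.txt from a string so it runs offline, and the user-agent name is made up. `Crawl-delay` is a non-standard extension, but where a site sets it, it gives a sensible pause between page views.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a live scraper would instead call
# rules.set_url(...) with the site's robots.txt URL, then rules.read().
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())

AGENT = "lead-scraper-demo"  # hypothetical user-agent name
for url in ["https://example.com/people/1", "https://example.com/private/list"]:
    print(url, rules.can_fetch(AGENT, url))
print("crawl delay:", rules.crawl_delay(AGENT))
```

Here the first URL is allowed, the `/private/` URL is not, and the suggested delay is 5 seconds, which a crawler could pass straight into its sleep between requests.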
More to Explore
Links and Tools
Some Useful Links
http://www.w3schools.com/html/
http://scraping.pro/
https://blog.hartleybrody.com/web-scraping/
Different Web Scraping Tools
Chrome extensions
• Data Miner/Scraper
• Webscraper
Other tools
• ScraperWiki
• Mozenda
Knowledge of Python required
• Scrapy
• Selenium + Beautiful Soup
Demo Results
Fingers crossed!
Questions?
Thank You