CORE SECURITY Don’t try to block out the sun with your fingers!
Information harvesting with Test-driven development tools and understanding how to avoid it
Nicolás Rodriguez ([email protected]) November 20th, 2012
Introduction
• Structure of the Talk
• Origin of the Talk
• Information Harvesting
• Web Scraping
• Techniques and tools
• Legal Issues
• Test-Driven Development
• Abusing a web site
• Code and Demos
• Mitigations and Tradeoffs
• Q&A
Who Am I?
• Programmer since I was 10 years old
• Network Administrator for the last 10 years
• Security Consultant at Core Security since 2006
Information Harvesting
Web Scraping
Web Scraping and Legal Issues
• Web scraping may be against the terms of use of some websites. Read the Terms of Use of the target site!
• The enforceability of these terms is unclear:
  • While outright duplication of original expression will in many cases be illegal, in the U.S. the courts ruled that duplication of facts is allowable.
  • U.S. courts have acknowledged that "scrapers" may be held liable for committing trespass.
  • In Denmark, systematic crawling, indexing and deep linking do not conflict with Danish law or the database directive of the European Union (2006).
  • In Australia, the Spam Act 2003 outlaws some forms of web harvesting, although this only applies to email addresses.
Scraping Techniques
• HTTP GET / POST (wget / curl)
  • Single request per page
  • Cookies and session handling
  • XSRF Token / Authorization Tokens
  • Multiple requests per page
• JavaScript Rendering (e.g. Google Search)
  • Render page using local JavaScript Engine (e.g. V8)
• Test-Driven Development tools (e.g. Selenium)
  • DOM parsing: FindItem / XPath
  • Source code grepping
  • JavaScript Injection
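The plain HTTP technique above (cookies, session handling and an XSRF token spread over more than one request) can be sketched with the Python standard library alone. This is a minimal sketch, not the code shown in the talk: it assumes Python 3 rather than the Python 2.7 used in the demos, and the field name `csrf_token` and the helper names are hypothetical.

```python
import http.cookiejar
import urllib.parse
import urllib.request
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> fields."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "input" and (attr.get("type") or "").lower() == "hidden":
            name = attr.get("name")
            if name:
                self.fields[name] = attr.get("value") or ""


def extract_hidden_fields(html):
    """Return a dict of the hidden form fields found in an HTML string."""
    parser = HiddenInputParser()
    parser.feed(html)
    return parser.fields


def build_session():
    """Return an opener that stores and resends cookies, like a browser session."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def submit_form(opener, form_url, post_url, data, token_field="csrf_token"):
    """Two requests per page: GET the form to obtain cookie and token, then POST."""
    # Request 1: fetch the form; the server sets the session cookie here.
    with opener.open(form_url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    payload = dict(data)
    # token_field is a hypothetical name; real sites choose their own.
    payload[token_field] = extract_hidden_fields(html).get(token_field, "")
    # Request 2: send the data together with the token and the stored cookie.
    body = urllib.parse.urlencode(payload).encode("ascii")
    with opener.open(post_url, data=body) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Reusing one opener for both requests is what keeps the session cookie and the token consistent with each other. This approach stops working once the page builds its content in JavaScript, which is where the rendering and browser-automation techniques come in.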
Test-Driven Development Tools (TDD)
• What is TDD?
• Selenium
  • Remote Automation
  • WebDriver
• Uses
  • Web Application test cases
  • Browser automation
    • Whatever a user is able to do, this tool could reproduce it
  • Web scraping
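As an illustration of browser automation used for scraping, the sketch below drives a real browser through Selenium WebDriver and reads elements by XPath. It is not the demo code from the talk: it assumes Python 3 and the current Selenium Python bindings, and the helper names are hypothetical.

```python
def make_firefox_driver():
    """Start a local Firefox session (assumes Selenium and Firefox are installed)."""
    # Imported lazily so the helpers below can be exercised without Selenium.
    from selenium import webdriver
    return webdriver.Firefox()


def scrape_text(driver, url, xpath):
    """Load url in the automated browser and return the text of matching elements."""
    driver.get(url)
    # "xpath" is the locator strategy string held by selenium's By.XPATH constant.
    elements = driver.find_elements("xpath", xpath)
    return [element.text for element in elements]


def scrape_attribute(driver, xpath, name):
    """Return one attribute (for example "href") from every matching element."""
    elements = driver.find_elements("xpath", xpath)
    return [element.get_attribute(name) for element in elements]


# Typical use (the URL is only a placeholder):
#
#     driver = make_firefox_driver()
#     try:
#         headings = scrape_text(driver, "http://example.com/", "//h1")
#         links = scrape_attribute(driver, "//a", "href")
#     finally:
#         driver.quit()
```

Because the browser runs the page's own JavaScript and keeps its own cookies, the script sees what a user would see, which is the point made on the slide.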
Code and Demos
Initial Setup
• Python 2.7 (http://www.python.org)
• Selenium WebDriver (http://seleniumhq.org/)
• jQuery (http://www.jquery.org)
• Mozilla Firefox (http://www.mozilla.com)
Code and Demos
• Basic Web Automation
  • Element handling
  • DOM parsing
  • Source code grepping
  • Web Sites Screenshots
• Injecting JavaScript libraries into a running site
  • Using jQuery to parse DOM elements
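The demo topics listed above could look roughly like the following with Selenium WebDriver. This is a sketch under assumptions, not the code presented in the talk: it assumes Python 3 and a `driver` created elsewhere, the jQuery CDN URL is an assumed location, and the helper names are hypothetical.

```python
import re

# Assumed CDN location for jQuery; any reachable copy of the library would do.
JQUERY_URL = "https://code.jquery.com/jquery-3.7.1.min.js"

# Runs inside the page. WebDriver passes a completion callback as the last argument.
INJECT_JQUERY_JS = """
var done = arguments[arguments.length - 1];
if (window.jQuery) { done(true); return; }
var script = document.createElement('script');
script.src = arguments[0];
script.onload = function () { done(true); };
script.onerror = function () { done(false); };
document.head.appendChild(script);
"""


def grep_source(driver, pattern):
    """Source code grepping: run a regular expression over the rendered page source."""
    return sorted(set(re.findall(pattern, driver.page_source)))


def take_screenshot(driver, path):
    """Save a PNG screenshot of the current page to path."""
    return driver.save_screenshot(path)


def inject_jquery(driver, url=JQUERY_URL):
    """Load jQuery into the running page and wait until it is available."""
    return bool(driver.execute_async_script(INJECT_JQUERY_JS, url))


def links_via_jquery(driver):
    """Use the injected jQuery to collect the href of every link in the DOM."""
    return driver.execute_script(
        "return jQuery('a').map(function () { return this.href; }).get();"
    )
```

The asynchronous call matters here: a script tag loads in the background, so querying `jQuery` right after appending the tag would usually fail. A site that sets a strict Content Security Policy may refuse the injected script, in which case `inject_jquery` returns `False`.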
Possible Mitigations and Trade-offs
Public Pages: FORGET ABOUT IT!
• Authenticated Pages:
  • Quotas: Limit the number of operations in a given time span
  • Track behavior: any action outside the standard usage triggers a user validation (CAPTCHA, login validation or session removal)
  • Excess traffic filtering
Big Tradeoff: Performance vs Information Protection
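One way to picture the quota mitigation is a sliding-window counter per authenticated user that escalates to a user validation once the limit is exceeded. The sketch below is only an illustration under assumptions: the class name, the "allow"/"challenge" return values and the limits are hypothetical, and a real deployment would keep this state in a shared store rather than in process memory.

```python
import time
from collections import defaultdict, deque


class SlidingWindowQuota:
    """Allow at most max_ops operations per user within window_seconds."""

    def __init__(self, max_ops, window_seconds, clock=time.monotonic):
        self.max_ops = max_ops
        self.window = window_seconds
        self.clock = clock  # injectable so the behavior can be tested without sleeping
        self._events = defaultdict(deque)  # user id -> timestamps of recent operations

    def check(self, user_id):
        """Record one operation and return "allow" or "challenge"."""
        now = self.clock()
        events = self._events[user_id]
        # Drop operations that have fallen out of the time window.
        while events and now - events[0] >= self.window:
            events.popleft()
        if len(events) >= self.max_ops:
            # Over quota: the caller would show a CAPTCHA, ask for the
            # login again, or remove the session.
            return "challenge"
        events.append(now)
        return "allow"


# Example: at most 30 page views per minute for each account.
page_view_quota = SlidingWindowQuota(max_ops=30, window_seconds=60)
```

Every check costs a lookup and an update per request, which is a small instance of the performance versus protection tradeoff named on the slide.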
Things to watch out for…
• Don’t trust IP address
  • Anonymous services (TOR)
  • Multiple IP addresses
• Quotas could be useless
  • Adjust based on the type of information you manage
  • Web scraping could be automated to be as stealthy as possible
• Standard Behavior
  • There’s a fine line between a user and a good “scraper”
  • Avoid damaging site’s usability. BIG ISSUE!
• No information is safe!
  • If a user can see it, it can be scraped!
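A small simulation can make the point about IP addresses concrete: a scraper that rotates through several addresses stays under a per-IP limit, while the same traffic counted per account stands out. This is a sketch under assumptions; the function names, the request format and the numbers are hypothetical, and counting per account only helps on authenticated pages.

```python
from collections import Counter


def over_quota(requests, key, limit):
    """Return the set of key values with more than limit requests in one window."""
    counts = Counter(request[key] for request in requests)
    return {value for value, count in counts.items() if count > limit}


def simulate_rotating_scraper(account, ips, total_requests):
    """Build the request log of one account spreading its requests over many IPs."""
    return [
        {"account": account, "ip": ips[i % len(ips)]}
        for i in range(total_requests)
    ]


# Ten addresses from a documentation range, 100 requests, limit of 20 per window.
ips = ["198.51.100.%d" % n for n in range(1, 11)]
log = simulate_rotating_scraper("alice", ips, 100)
flagged_by_ip = over_quota(log, "ip", 20)
flagged_by_account = over_quota(log, "account", 20)
```

Each address sends only 10 requests, so `flagged_by_ip` stays empty, while `flagged_by_account` contains the single account behind all 100. A patient scraper can still slow down until it looks like a regular user, which is why the slide warns that quotas could be useless.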
Q&A
Thank you