Fun with Google

Fun with Jeremy Rasmussen 9/23/05 Part 1: Power searches and reconnaissance


Page 1: Fun with Google

Fun with Google

Jeremy Rasmussen, 9/23/05

Part 1: Power searches and reconnaissance

Page 2: Fun with Google

I’m feeling lucky

• The Google interface

• Preferences

• Cool stuff

• Power searching

Page 3: Fun with Google

Classic interface

Page 4: Fun with Google

Custom interface

Page 5: Fun with Google

Language prefs

Page 6: Fun with Google

Google in H4x0r

Page 7: Fun with Google

Language

• Proxy server can be used to hide location and identity while surfing Web

• Google sets default language to match country where proxy is

• If your language settings change inexplicably, check proxy settings

• You can manipulate language manually by fiddling directly with URL

Page 8: Fun with Google

Google scholar

Page 9: Fun with Google

Google University search

Page 10: Fun with Google

Google groups

Page 11: Fun with Google

Google freeware

• Web accelerator

• Google earth

• Picasa

• Etc.

Page 12: Fun with Google

Golden rules of searching

• Google is case-insensitive

– Except for the Boolean operator OR, which must be written in uppercase

• Wildcards not handled normally

– * stands for nothing more than a single word in a search phrase; provides no additional stemming

• Google stems automatically

– Tries to expand or contract words automatically, which can lead to unpredictable results

Page 13: Fun with Google

Golden rules of searching

• Google ignores “stop” words

– Who, where, what, the, a, an

– Except when you search on them individually

– Or when you put “quotes” around search phrase

– Or when you +force +it +to +use +all +terms

• Largest possible search?

• Google limits you to a 10-word query

– Get around this by using wildcards for stop words

Page 14: Fun with Google

Boolean operators

• Google automatically ANDs all search terms

• Spice things up with:

– OR (|)

– NOT (-)

• Google evaluates these from left to right

• Search terms don’t even have to be syntactically correct in terms of Boolean logic

Page 15: Fun with Google

Search example

• What does the following search term do:

Intext:password | passcode intext:username | userid | user filetype:xls

• Locates all pages that have either password or passcode in their text. From these, it shows only pages that have username, userid, or user in their text. From these, it shows only .XLS files.

• Google is not confused by the lousy syntax or lack of parentheses.

Page 16: Fun with Google

URL queries

• Everything that can be done through the search box can be done by manually entering a URL

• The only required parameter is q (query)

www.google.com/search?q=foo

• String together parameters with &

www.google.com/search?q=foo&hl=en

(Specifies query on foo and language of English)
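The parameter stringing on this slide can be sketched in a few lines of Python. This is only an illustration: it uses just the q and hl parameters named above, the helper name build_search_url is made up for this example, and the http:// scheme is added so the result is a complete URL.

```python
from urllib.parse import urlencode


def build_search_url(query, **params):
    """Return a search URL; q is the only required parameter."""
    fields = {"q": query}
    fields.update(params)  # optional extras such as hl=en
    # urlencode joins the pairs with & and escapes unsafe characters.
    return "http://www.google.com/search?" + urlencode(fields)


print(build_search_url("foo"))           # http://www.google.com/search?q=foo
print(build_search_url("foo", hl="en"))  # http://www.google.com/search?q=foo&hl=en
```

Using urlencode rather than string concatenation means spaces and punctuation in longer queries get escaped automatically.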

Page 17: Fun with Google

Some advanced operators

• intitle - search text within the title of a page

– URL: as_occt=title

• inurl - search text within a given URL. Allows you to search for specific directories or folders

– URL: as_occt=url

• filetype - search for pages with a particular file extension

– URL: as_ft=i&as_filetype=<some file extension>

• site - search only within the specified sites. Must be valid top-level domain name

– URL: as_dt=i&as_sitesearch=<some domain>

Page 18: Fun with Google

Some advanced operators

• link - search for pages that link to other pages. Must be correct URL syntax; if invalid link syntax is provided, Google treats it like a phrase search

– URL: as_lq

• daterange - search for pages published within a certain date range. The operator uses Julian dates; the URL parameter takes 3 months, 6 months, or a year

– URL: as_qdr=m6 (searches past six months)

• numrange - search for numbers within a range from low to high, e.g., numrange:99-101 will find 100. Alternatively, use 99..101

– URL: as_nlo=<low num>&as_nhi=<high num>

• Note: Google ignores $ and , (makes searching easier)
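Julian dates are awkward to work out by hand, so here is a minimal Python sketch of the conversion. It assumes the operator expects whole Julian day numbers written as daterange:start-end; the function names are invented for this example.

```python
from datetime import date

# Offset between Python's date ordinal (0001-01-01 is day 1)
# and the Julian day number of the same calendar date.
JDN_OFFSET = 1721425


def julian_day(d):
    """Return the Julian day number for a calendar date."""
    return d.toordinal() + JDN_OFFSET


def daterange_term(start, end):
    """Build a daterange: search term (assumed form: daterange:start-end)."""
    return "daterange:%d-%d" % (julian_day(start), julian_day(end))


print(julian_day(date(2000, 1, 1)))  # 2451545
print(daterange_term(date(2005, 9, 1), date(2005, 9, 23)))
```

The resulting term can be typed into the search box alongside the rest of the query.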

Page 19: Fun with Google

Advanced operators

• cache - use Google's cached link of the results page. Passing invalid URL as parameter to cache will submit query as phrase search.– URL:

• info - shows summary information for a site and provides links to other Google searches that might pertain to the site. Same as supplying URL as a search query.

• related - shows sites Google thinks are similar.

– URL: as_rq

Page 20: Fun with Google

Google groups operators

• author - find a Usenet author

• group - find a Usenet group

• msgid - find a Usenet message ID

• insubject - find Usenet subject lines (similar to intitle:)

• These are useful for finding people, NNTP servers, etc.

Page 21: Fun with Google

Hacking Google

• Try to explore how commands work together

• Try to find out why stuff works the way it does

• E.g., why does the following return > 0 hits?

(filetype:pdf | filetype:xls) -inurl:pdf -inurl:xls

Page 22: Fun with Google

Surfing anonymously

• People who want to surf anonymously usually use a Web proxy

• Go to samair.ru/proxy and find a willing, open proxy; then change browser configs

• E.g., proxy to 195.205.195.131:80 (Poland)

– Check it via: http://www.all-nettools.com/toolbox,net

– Resets Google search page to Polish

Page 23: Fun with Google

Google searches for proxies

• inurl:"nph-proxy.cgi" "Start browsing through this CGI-based proxy"

– E.g., http://www.netshaq.com/cgiproxy/nph-proxy.cgi/011100A/

• "this proxy is working fine!" "enter *" "URL***" * visit

– E.g., http://web.archive.org/web/20050922222155/http://davegoorox.c-f-h.com/cgiproxy/nph-proxy.cgi/000100A/http/news.google.com/webhp?hl=en&tab=nw&ned=us&q=

Page 24: Fun with Google

Caching anonymously

• Caching is a good way to see Web content without leaving an entry in the server’s log, right?

• Not necessarily. The cached page still pulls its images from the original site, which creates a connection from you to the server.

• The “cached text only” will allow you to see the page (sans images) anonymously

• Get there by copying the URL from Google cache and appending &strip=1 to the end.
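The URL tweak in the last bullet can be sketched as a small Python helper. The function name is made up, and the cache URL in the example is a placeholder rather than a real cache link.

```python
def cached_text_only(cache_url):
    """Append strip=1 to a cache URL unless it is already there."""
    if "strip=1" in cache_url:
        return cache_url
    # Use & when the URL already has a query string, ? otherwise.
    separator = "&" if "?" in cache_url else "?"
    return cache_url + separator + "strip=1"


# Placeholder cache URL, for illustration only.
print(cached_text_only("http://www.google.com/search?q=cache:www.example.com"))
```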

Page 25: Fun with Google

Using Google as a proxy

• Use Google as a transparent proxy server via its translation service

• Translate English to English:

http://www.google.com/translate?u=http%3A%2F%2Fwww.google.com&langpair=en%7Cen&hl=en&ie=Unknown&oe=ASCII

• Doh! It’s a transparent proxy—Web server can still see your IP address. Oh well.
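For reference, the English-to-English translate URL above can be rebuilt in Python as follows. This sketch reuses only the parameters that appear in that URL; the function name translate_proxy_url is an assumption for this example.

```python
from urllib.parse import urlencode


def translate_proxy_url(target, langpair="en|en"):
    """Build a translate URL that fetches `target` through the translation service."""
    params = [
        ("u", target),           # page to fetch
        ("langpair", langpair),  # source|destination language
        ("hl", "en"),
        ("ie", "Unknown"),
        ("oe", "ASCII"),
    ]
    # urlencode percent-escapes ://, / and | the same way the slide's URL does.
    return "http://www.google.com/translate?" + urlencode(params)


print(translate_proxy_url("http://www.google.com"))
```

The printed URL matches the one on the slide, character for character.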

Page 26: Fun with Google

Finding Web server versions

• It might be useful to get info on server types and versions

• E.g., "Microsoft-IIS/6.0" intitle:index.of

• E.g., "Apache/2.0.52 server at" intitle:index.of

• E.g., intitle:Test.Page.for.Apache it.worked!

– Returns list of sites running Apache 1.2.6 with a default home page.

Page 27: Fun with Google

Traversing directories

• Look for Index directories

– intitle:index.of inurl:"/admin/*"

• Or, try incremental substitution of URLs (a.k.a. “fuzzing”)

– /docs/bulletin/1.xls could be modified to /docs/bulletin/2.xls even if Google didn’t return that file in its search
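Incremental substitution is easy to automate. The Python sketch below only generates candidate paths and does not fetch anything; the function name and the choice to bump the last number in the path are assumptions for this example.

```python
import re


def increment_candidates(url, count=3):
    """Return `count` variants of `url` with its last number incremented."""
    matches = list(re.finditer(r"\d+", url))
    if not matches:
        return []  # nothing numeric to substitute
    last = matches[-1]
    start = int(last.group())
    return [
        url[:last.start()] + str(start + step) + url[last.end():]
        for step in range(1, count + 1)
    ]


print(increment_candidates("/docs/bulletin/1.xls"))
# ['/docs/bulletin/2.xls', '/docs/bulletin/3.xls', '/docs/bulletin/4.xls']
```

Note that this simple version does not preserve zero padding (007 would become 8), which a real tool would probably want to handle.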

Page 28: Fun with Google

Finding PHP source

• PHP script executes on the server and presents HTML to your browser. You can’t do a “View Source” and see the script.

• However, Web servers aren’t too sure what to do with a foo.php.bak file. They treat it as text.

• Search for backup copies of Web files:

– inurl:backup intitle:index.of inurl:admin php

Page 29: Fun with Google

Recon: finding stuff about people

• Intranets

– inurl:intranet intitle:human resources

– inurl:intranet intitle:employee login

• Help desks

– inurl:intranet help.desk | helpdesk

• Email on the Web

– filetype:mbx intext:Subject

– filetype:pst inurl:pst (inbox | contacts)

Page 30: Fun with Google

Recon: Finding stuff about people

• Windows registry files on the Web!

– filetype:reg reg +intext:"internet account manager"

• A million other ways:

– filetype:xls inurl:"email.xls"

– inurl:email filetype:mdb

– (filetype:mail | filetype:eml | filetype:pst | filetype:mbx) intext:password|subject

– …

Page 31: Fun with Google

Recon: Finding stuff about people

• Full emails

– filetype:eml eml +intext:"Subject" +intext:"From" 2005

• Buddy lists

– filetype:blt buddylist

• Résumés

– "phone * * *" "address *" "e-mail" intitle:"curriculum vitae"

• Including SSN’s? Yes…

Page 32: Fun with Google

Recon: Finding stuff about people

Page 33: Fun with Google

Site crawling

• All domain names, different ways

– site:www.usf.edu returns 10 thousand pages

– site:usf.edu returns 2.8 million pages

– site:usf.edu -site:www.usf.edu returns 2.9 million pages

– site:www.usf.edu -site:usf.edu returns nada

Page 34: Fun with Google

Scraping domain names with shell script

trIpl3-H> lynx -dump "http://www.google.com/search?q=site:usf.edu+-www.usf.edu&num=100" > sitejunk.txt

trIpl3-H> sed -n 's/\. http:\/\/[[:alpha:]]*.usf.edu\//& /p' sitejunk.txt >> sites.out

Page 35: Fun with Google

Scraping domain names with shell script

anchin.coedu.usf.edu
catalog.grad.usf.edu
ce.eng.usf.edu
cedr.coba.usf.edu
chuma.cas.usf.edu
comps.marine.usf.edu
etc.usf.edu
facts004.facts.usf.edu
fcit.coedu.usf.edu
fcit.usf.edu
ftp://modis.marine.usf.edu
hsc.usf.edu
https://hsccf.hsc.usf.edu
https://security.usf.edu
isis.fastmail.usf.edu
library.arts.usf.edu
listserv.admin.usf.edu
mailman.acomp.usf.edu
modis.marine.usf.edu
my.usf.edu
nbrti.cutr.usf.edu
nosferatu.cas.usf.edu
planet.blog.usf.edu
publichealth.usf.edu
rarediseasesnetwork.epi.usf.edu
tapestry.usf.edu
usfweb.usf.edu
usfweb2.usf.edu
w3.usf.edu
web.lib.usf.edu
web.usf.edu
web1.cas.usf.edu
www.acomp.usf.edu
www.career.usf.edu
www.cas.usf.edu
www.coba.usf.edu
www.coedu.usf.edu
www.ctr.usf.edu
www.eng.usf.edu
www.flsummit.usf.edu
www.fmhi.usf.edu
www.marine.usf.edu
www.moffitt.usf.edu
www.nelson.usf.edu
www.plantatlas.usf.edu
www.registrar.usf.edu
www.research.usf.edu
www.reserv.usf.edu
www.safetyflorida.usf.edu
www.sarasota.usf.edu
www.stpt.usf.edu
www.ugs.usf.edu
www.usfpd.usf.edu
www.wusf.usf.edu
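A list like the one above can also be pulled out of dumped search results with a few lines of Python. This is an alternative sketch, not the script used in the talk: the function name, the regular expression, and the sample text are all assumptions, and the pattern is deliberately simple.

```python
import re


def extract_hosts(text, domain="usf.edu"):
    """Return the sorted, de-duplicated hostnames under `domain` found in `text`."""
    pattern = re.compile(
        r"(?<![a-z0-9.-])"             # do not start in the middle of a name
        r"(?:[a-z0-9-]+\.)*"           # any number of subdomain labels
        + re.escape(domain)
        + r"(?![a-z0-9-])",            # do not stop in the middle of a label
        re.IGNORECASE,
    )
    return sorted({match.group().lower() for match in pattern.finditer(text)})


# Made-up sample in the style of a text dump of a results page.
sample = """
1. http://www.cas.usf.edu/
2. http://my.usf.edu/page
3. http://www.cas.usf.edu/other
"""
print(extract_hosts(sample))  # ['my.usf.edu', 'www.cas.usf.edu']
```

Collecting into a set before sorting removes the duplicates that a page of results will usually contain.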

Page 36: Fun with Google

Using Google API

• Check out http://www.google.com/apis

• Google allows up to 1000 API queries per day.

• Cool Perl script for scraping domain names at www.sensepost.com: dns-mine.pl

– By using combos of site, web, link, about, etc. it can find a lot more than the previous example

• Perl scripts for “Bi-Directional Link Extractor (BiLE)” and “BiLE Weight” also available.

– BiLE grabs links to sites using Google link query

– BiLE weight calculates relevance of links

Page 37: Fun with Google

Remote anonymous scanning with NQT

• Google query: filetype:php inurl:nqt intext:"Network Query Tool"

• Network Query Tool allows:

– Resolve/Reverse Lookup

– Get DNS Records

– Whois

– Check port

– Ping host

– Traceroute

• The NQT form also accepts input via XSS, and it is still unpatched at this point!

• Using a proxy, perform anonymous scan via the Web

• Even worse, attacker can scan the internal hosts of networks hosting NQT

Page 38: Fun with Google

Other portscanning

• Find PHP port scanner:

– inurl:portscan.php "from Port"|"Port Range"

• Find server status tool:

– "server status" "enter domain below"

Page 39: Fun with Google

Other portscanning

Page 40: Fun with Google

Finding network reports

• Find Looking Glass router info

– "Looking Glass" (inurl:"lg/" | inurl:lookingglass)

• Find Visio network drawings

– filetype:vsd vsd network

• Find CGI bin server info:

– inurl:fcgi-bin/echo

Page 41: Fun with Google

Finding network reports

Page 42: Fun with Google

Default pages

• You’ve got to be kidding!

– intitle:"OfficeConnect Wireless 11g Access Point" "Checking your browser"

Page 43: Fun with Google

Finding exploit code

• Find latest and greatest:

– intitle:"index of (hack |sploit | exploit | 0day)" modified 2005

– Google says it can’t add date modifier, but I can do it manually with as_qdr=m3

• Another way:

– "#include <stdio.h>" "Usage" exploit

Page 44: Fun with Google

Finding vulnerable targets

• Read up on exploits in Bugtraq. They usually tell the version number of the vulnerable product.

• Then, use Google to search for “powered by”

– E.g., “Powered by CubeCart 2.0.1”

– E.g., “Powered by CuteNews v1.3.1”

– Etc.

Page 45: Fun with Google

Webcams

• Blogs and message forums buzzed this week with the discovery that a pair of simple Google searches permits access to well over 1,000 unprotected surveillance cameras around the world -- apparently without their owners' knowledge. – SecurityFocus, Jan. 7, 2005

Page 46: Fun with Google

Webcams

• Thousands of webcams used for surveillance:

– inurl:"ViewerFrame?Mode="

– inurl:"MultiCameraFrame?Mode="

– inurl:"view/index.shtml"

– inurl:"axis-cgi/mjpg"

– intitle:"toshiba network camera - User Login"

– intitle:"NetCam Live Image" -.edu -.gov

– camera linksys inurl:main.cgi

Page 47: Fun with Google

More junk

• Open mail relays (spam, anyone?)

– inurl:xccdonts.asp

• Finger

– inurl:/cgi-bin/finger? "In real life"

• Passwords

– !Host=*.* intext:enc_UserPassword=* ext:pcf

– "AutoCreate=TRUE password=*"

– …

Page 48: Fun with Google

So much to search, so little time…

• Check out the Google Hacking Database (GHDB): http://johnny.ihackstuff.com

Page 49: Fun with Google

OK, one more…

• Search on “Homeseer web control”

Page 50: Fun with Google

How not to be a Google “victim”

• Consider removing your site from Google’s index.

– “Please have the webmaster for the page in question contact us with proof that he/she is indeed the webmaster. This proof must be in the form of a root level page on the site in question, requesting removal from Google. Once we receive the URL that corresponds with this root level page, we will remove the offending page from our index.”

• To remove individual pages from Google’s index

– See http://www.google.com/remove.html

Page 51: Fun with Google

How not to be a Google “victim”

• Use a robots.txt file

– Web crawlers are supposed to follow the robots exclusion standard specified at http://www.robotstxt.org/wc/norobots.html.

• The quick way to prevent search robots from crawling your site is to put these two lines into the /robots.txt file on your Web server:

– User-agent: *

– Disallow: /
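To check what a well-behaved crawler would make of those two lines, Python's standard library includes a robots.txt parser. The sketch below feeds it the rules directly instead of fetching them from a server; the helper name is_allowed and the sample agent and path are assumptions for this example.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(rules, agent, path):
    """Return True if `agent` may fetch `path` under the given robots.txt lines."""
    parser = RobotFileParser()
    parser.parse(rules)  # parse the lines directly, no network access
    return parser.can_fetch(agent, path)


block_all = ["User-agent: *", "Disallow: /"]
print(is_allowed(block_all, "Googlebot", "/index.html"))  # False
```

Keep in mind that this only models crawlers that choose to honor the file; robots.txt is a convention, not an access control.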

Page 52: Fun with Google

Questions