LIS618 lecture 9 Google Thomas Krichel 2011-11-21.

LIS618 lecture 9

Google

Thomas Krichel 2011-11-21

structure

• Google query language
• Google special services and features
  – official
  – unofficial

literature

• Calishain and Dornfest, “Google Hacks”, O'Reilly, 2003

• Schneider et al., “How to Do Everything with Google”, McGraw-Hill Osborne, 2004

• Google web site

• http://www.googleguide.com/advanced_operators.html

interfaces

• The simple interface has command-driven features that make it more advanced than the advanced interface.

• The advanced interface is a form interface to the query language available on the simple interface.

• The Google toolbars for different browsers may be quite useful.

customizing Google

• These are available through the “⚙” link on the Google home page.

• Preferences are only stored as a cookie in the browser.

language preference

• You can set
  – preferences for finding pages in a certain language (set language to German and search for “Krichel”)
  – preferences for the language of the interface
  – both impact search results.

• In fact the language preference is detected automatically from an HTTP header (Accept-Language) that the browser usually sends.

• Thus you can set it in the browser too.
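
The header in question can be sketched in Python. This is only an illustration, assuming the standard Accept-Language header; the helper name build_search_request and the search URL are assumptions made for the example, and nothing is sent over the network.

```python
from urllib.request import Request


def build_search_request(query_url: str, languages: str) -> Request:
    """Construct (but do not send) a request carrying an Accept-Language header.

    build_search_request is a hypothetical helper for this illustration.
    """
    return Request(query_url, headers={"Accept-Language": languages})


# Prefer German, fall back to English with a lower weight.
request = build_search_request(
    "https://www.google.com/search?q=Krichel", "de,en;q=0.5"
)

# urllib normalizes header names, hence the lowercase "l" in the lookup.
print(request.get_header("Accept-language"))
```

A browser does the same thing behind the scenes, which is why changing the language list in the browser settings changes what the search engine sees.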

SafeSearch

• This is a tool for the automatic exclusion of explicit erotic material.

• This was a very controversial topic in libraries in the 1990s and early 2000s.

• There is a way to lock SafeSearch on a browser by logging in to a Google account. But it is still browser dependent and (o/s level) user account dependent, and requires the authorization of third-party cookies.

Google Instant and results

• Google Instant is the feature that produces results while you type the query.

• It differs by language and Google domain.
  – google.com proposes “krichel” as a completion of “kric”
  – google.ru does not ;-(

• If you change the number of results per page from the default, the showing of instant results is disabled.

location preference

• The location preference is normally extracted from the IP address of the browser’s computer.

• It may be set in the preferences to an uncontrolled (free-text) string. But Google only takes account of a location inside the country of the Google domain being searched. So Google ignores “moscow” as your location when searching google.es.

composing the query

• You should phrase your query the way the answer would appear on a web site. E.g. to find the age of N.N., search for “N.N. born”.

• Sometimes Google will attempt to guess the answer, e.g. for “Drupal written in”.

• Try descriptive words; avoid generic ones such as “documentation”.

• For queries that are obvious, like “thomas krichel” use the “I’m feeling lucky” button.

Boolean operators

• Default Boolean AND between terms

• Case insensitive

• Terms can be ORed with “|”, or “OR” in all caps (!), e.g. “krichel | chen”. Lowercase “or” is treated as a normal term.

• Adjacent terms have to be put in double quotes.

• Boolean NOT can be expressed with the exclusion operator “-”, e.g. “krichel -thomas”.
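
The operators on this slide can be combined mechanically. The following Python sketch assembles a query string from them; build_query and its parameter names are hypothetical and only illustrate the syntax described here, not anything Google provides.

```python
def build_query(all_of=(), phrases=(), any_of=(), none_of=()):
    """Assemble a query string from the operators on this slide.

    all_of  : terms joined by the default (implicit) AND
    phrases : adjacent terms, wrapped in double quotes
    any_of  : alternatives joined with "|"
    none_of : terms excluded with a leading "-"
    """
    required = list(all_of) + [f'"{p}"' for p in phrases]
    excluded = [f"-{t}" for t in none_of]
    alternatives = " | ".join(any_of)
    # Parenthesize the OR group only when it sits next to other parts.
    if len(any_of) > 1 and (required or excluded):
        alternatives = f"({alternatives})"
    parts = required + ([alternatives] if alternatives else []) + excluded
    return " ".join(parts)


print(build_query(all_of=["krichel"], none_of=["thomas"]))
print(build_query(any_of=["krichel", "chen"]))
print(build_query(phrases=["classical music"], any_of=["mailman", "listserv"]))
```

The parentheses around the OR group are a cautious choice made in this sketch so that the alternatives are not mixed up with neighbouring terms.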

wildcard and limit

• * is a wildcard for any word

• There is a limit of 10 words, but a * does not count towards the limit.

phrase searching

• Double quotes around more than one term make Google prefer pages where the terms appear next to each other.

• Double quotes around one term require the term to appear as such, without spelling or grammatical variation. Example: “"ralf"” will not search for “ralph”.

similarity operator

• ~ is the similarity operator. It searches for synonyms. Synonyms are gleaned from the web, not from a thesaurus. This seems to include common spelling variants as well. Examples:
  – “~auto repair novosibirsk” vs “auto repair novosibirsk”
  – “~phillipp ~meyr gesis”

query treatment

• Google prefers pages that have the search terms
  – in close proximity
  – in the same order as in the query

• Repeating a query term once adds weight to it

• Repeating it twice has no further effect

spell checking

• Google makes spelling suggestions. They are based on usage of query terms, not on a dictionary.
  – example: “untied stats” suggests “united states”
  – example: “beurocratic”

• But note that these suggestions depend on your interface language.

maps

• Most things that look like a location expression can be entered in the normal search interface and return a map.

• Example: “11 rue Boinod, 75018 Paris”

• Flights also have such a feature, if you give your flight number

map queries [n/a]

• You can compose a map query as
  – point of interest near location
  – point of interest in location

• where point of interest is a destination of interest and location is a location. In the US, a “name, state” combination is sufficient as a location. Examples:
  – “strip clubs near jackson heights”
  – “shoe repair in Paris, France”

math I

• You can enter math in plain English words
  – example: “two times two”
  – example: “half of eleven”
  – example: “five megabytes in bytes”
  – example: “ten gallons in liters”

• It also knows the standard operators “+”, “-”, “*”, “/”, “^” or “**”, “% of”

advanced math

• “!” factorial
• “choose” combination without replacement
• “sqrt” square root
• “log” logarithm (base 10)
• “ln” logarithm (base e)
• “lg” logarithm (base 2)
• “exp” e to the power of
• “mod” modulo (remainder)
• “sin”, “cos”, “tan”, “csc”, “sec”, “ctn”
• “arcsin”, “arccos”, “arctan”, “arccsc”, “arcsec”, “arcctn”
• “sinh”, “cosh”, “tanh”
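
To check what the calculator returns, a few of these operators can be reproduced with Python's standard math module. This is only a cross-check sketch; the variable names are chosen for this example, and the mapping from the slide's operators to Python functions is an assumption of the example, not a description of how Google computes its answers.

```python
import math

# "5!"            factorial
factorial_of_five = math.factorial(5)

# "5 choose 2"    combination without replacement (Python 3.8+)
five_choose_two = math.comb(5, 2)

# "17 mod 5"      remainder
seventeen_mod_five = 17 % 5

# "half of eleven"
half_of_eleven = 11 / 2

# "log", "ln", "lg" correspond to base 10, base e and base 2
log_of_thousand = math.log10(1000)
ln_of_e = math.log(math.e)
lg_of_eight = math.log2(8)

print(factorial_of_five, five_choose_two, seventeen_mod_five, half_of_eleven)
print(log_of_thousand, ln_of_e, lg_of_eight)
```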

site:

• This is an officially documented special syntax to limit searches to a domain.

• It breaks down if a path is included.

• It cannot be used on its own, only with other query expressions.

• This can be used to build site-specific engines using Google. Such engines would use the local query and append the site limiter.

site:

• Example “"shit" krichel site:liu.edu -baseball -nyu”

• You can use longer site names, e.g. “krichel site:www.liu.edu”.

• You can use -site: to exclude a site, e.g. “krichel -site:openlib.org -site:liu.edu”
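
A site-specific engine of the kind mentioned on the previous slide could be sketched as follows in Python. The function name site_search_url is hypothetical, and the use of the “q” parameter on https://www.google.com/search is an assumption of this sketch rather than something taken from the slides.

```python
from urllib.parse import urlencode


def site_search_url(local_query: str, site: str) -> str:
    """Append the site limiter to a local query and return a search URL.

    site_search_url is a hypothetical helper; the "q" parameter of the
    search URL is an assumption of this sketch.
    """
    local_query = local_query.strip()
    if not local_query:
        # site: cannot be used on its own, only with other query expressions.
        raise ValueError("a site: search needs at least one other term")
    query = f"{local_query} site:{site}"
    return "https://www.google.com/search?" + urlencode({"q": query})


print(site_search_url("krichel", "liu.edu"))
```

A search box on a local web site would pass the user's input as local_query and keep site fixed.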

filetype:

• This appears to find files of a certain type.

• But it only looks at the extension.

• Example
  – “copyright filetype:pptx”

related:

• related: can be used with a site URL (without http://) to find related pages. It can be combined with search terms.

• This is the same as the related link next to search results.

• The searches for
  – related:openlib.org/home/krichel
  – related:www.liu.edu
  are very successful.

link:

• link: returns pages that link to a specific page.

• It is mentioned on the Google site.

• It does not seem to work properly. Example: “link:wotan.liu.edu/home/krichel”. Generally wotan is excluded in Google as a duplicate.

• Example: “link:openlib.org -site:openlib.org” seems to lead to nonsense.

link: limitation

• You cannot combine a link: search with a regular keyword search. As soon as you do, the “link” is interpreted as a normal search term.

• Google probably does not return all the pages that match.

• Example: “link:openlib.org -site:openlib.org”

unofficial syntax

• There are special syntaxes that are not documented, but they appear to work.

intitle:

• intitle: finds terms in the HTML <title> only.

• Example: “intitle:lis618” finds old PowerPoint slides. Recent slides can be found with “intitle:powerpoint krichel lis618” because that is the title of the slides at this time. ;-(

• It can be combined with other terms. Examples:
  – “intitle:"Thomas Krichel"”
  – “intitle:lis650 -krichel -thomas -site:wotan.liu.edu”

intext:

• intext: finds terms in the text only. This will exclude occurrences of the search term in anchor or title data.

• Example: “intext:"miserable failure"” will not bring up George W. Bush's official biography, as “miserable failure” does, or did at one time.

allintext:

• This requires all the terms that follow to be in the text. The order of the results seems to depend on the position of the words in the query. But the words are, supposedly, required to be there.

• “allintext:krichel love I thomas”

• “allintext:I love thomas krichel”

• “allintext:krichel novosibirsk 1209”

inanchor:

• inanchor: requests pages for which there is another page that links to them with the anchor text in the query. Examples:
  – “inanchor:"courses information" krichel” finds it now.
  – “inanchor:"nsk 10 hourly"”
  – “inanchor:lis650” finds http://www.docin.com/p-71424881.html

inanchor:

• inanchor: can be used to restrict an otherwise popular term to the instances of something you want. Examples:
  – “inanchor:listserv library”
  – “inanchor:"list of my courses"” finds my courses page because it has a link with that text from an old version of my homepage.

allinanchor:

• This requires all the following terms to be in the anchor.

• Example:
  – “allinanchor:Томас Крихель”

cache:

• cache: shows pages as they are stored in the Google cache. This is useful if a query result seems to have nothing to do with the query terms.

• Example: “cache:openlib.org/home/krichel” will show the cached version of the page.

• If you add further terms, they will be highlighted.

cache:

• This can be used to check if a page is indexed.

• Examples:
  – “cache:3d.openlib.org”
  – “cache:openlib.org”

inurl:

• It finds terms in the URL only.

• It can use a star as a wildcard.

• Examples
  – “inurl:list”
  – “inurl:krichel -thomas”
  – “inurl:*.openlib.org”
  – “inurl:wotan.liu.edu/omeka/ grandmother”

allinurl:

• Requires all the terms that follow it to be in the URL of the page found.

• Examples:
  – “allinurl:visitor krichel house_rules”
  – “allinurl:visitor krichel house_rules site:wotan.liu.edu”
  – “allinurl:visitor krichel house_rules site:openlib.org”

daterange:

• This limits the search to pages indexed between a range of dates. When the crawler visits a page, a changed page is reindexed; an unchanged page is not.

• Dates are expressed as Julian day numbers, i.e. the number of days since noon UT on 1 January 4713 BC in the Julian calendar. This day count is used by astronomers.

• Find a converter with the Google search “julian converter site:nasa.gov”

• example: “daterange:2453051-2453071 krichel”
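
Instead of an online converter, the day numbers can be computed locally. The Python sketch below relies on the fact that Python's date.toordinal() counts days from 0001-01-01 in the Gregorian calendar, so adding the constant 1721425 gives the Julian day number of a date; the function names are made up for this example.

```python
from datetime import date

# Offset between Python's proleptic Gregorian ordinal and the Julian day
# number: date(2000, 1, 1) has ordinal 730120 and Julian day number 2451545.
ORDINAL_TO_JULIAN_DAY = 1721425


def to_julian_day(day: date) -> int:
    """Return the (integer) Julian day number of a calendar date."""
    return day.toordinal() + ORDINAL_TO_JULIAN_DAY


def daterange_query(start: date, end: date, terms: str) -> str:
    """Build a daterange: query; a hypothetical helper for illustration."""
    return f"daterange:{to_julian_day(start)}-{to_julian_day(end)} {terms}"


# The example above corresponds to 2004-02-15 through 2004-03-06.
print(daterange_query(date(2004, 2, 15), date(2004, 3, 6), "krichel"))
```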

info:

• info: shows information about a page. The argument to info must be a real existing page that is in the Google index.

• Example: “info:openlib.org/home/krichel”

mixing special syntax expressions

• The link: syntax does not mix with others.

• Other bad ideas:
  – “site:openlib.org -inurl:openlib”
  – “site:edu site:com”

• Things that work well
  – intitle:search
  – intitle:biology inurl:help

examples

• George Bush site:nytimes.com
• "Copyright * The New York Times" "George Bush"
• intitle:"directory * * trees"
• Botany intitle:"directory of" site:edu
• "powered by blogger" OR site:blogspot.com
• "classical music" (inurl:mailman | inurl:listserv)
• google special syntax -site:google.com

stocks on google

• “stocks:ticker” will look up the ticker symbol ticker at http://finance.yahoo.com

• You can find ticker symbols there.

• Ticker symbols are useful to find financial information about publicly traded companies.

• Example: “stocks:msft”

google images special syntax

• intitle: searches for images with a given string in the file name
  – example: “intitle:novosibirsk”

• inurl: searches for images in pages that have a certain URL
  – example: “inurl:liu.edu”

• site: restricts the search to a certain site. It should be combined with a search term.
  – example: “site:liu.edu koenig”

http://openlib.org/home/krichel

Please shut down the computers when you are done.

Thank you for your attention!