Search engines

Transcript of Search engines

Page 1: Search engines

Search Engines

Searching More Effectively

Page 2: Search engines

What is a search engine?

Multiple servers that run a program called a spider or a crawler
Crawlers build an index of web sites
They follow the links on the website, and crawl those pages
You can search the index by keyword matching
Search engines don’t search the web, they search the index, so current events may not be indexed yet
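The crawl-then-search idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `PAGES` dictionary, its URLs, and the function names are hypothetical stand-ins for what a real crawler would fetch, and real engines use far more elaborate indexes.

```python
from collections import defaultdict

# Hypothetical pages standing in for what a crawler has already fetched.
PAGES = {
    "http://example.com/pie": "apple pie recipe with cinnamon",
    "http://example.com/cider": "hard apple cider",
    "http://example.com/news": "local headlines",
}

def build_index(pages):
    """Map each keyword to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

index = build_index(PAGES)
print(sorted(search(index, "apple")))
print(sorted(search(index, "apple pie")))
# A word that only appears on pages published after the crawl is not
# in the index, so the search comes back empty.
print(search(index, "breaking"))
```

The search never touches the pages themselves, only the index, which is why content newer than the last crawl cannot be found.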

Page 3: Search engines

Surfing the Index

Search engines don’t search the web, they search the index, so current events may not be indexed yet

Page 4: Search engines

Directories can often be more fruitful than search engines

Because directories are created by people and not programs...
They are smaller but reflect evaluated material instead of 'all of the web'
They are even less likely to be up to date
They usually take you to the front door of a website
They are organized in a thoughtful manner so you can browse

Page 5: Search engines

Directory Example

Open Directory Project
It helps you to think logically about the information you need: the structure is already there, you just have to follow it

Page 6: Search engines

Popular Search Engines

Google
AltaVista
Yahoo
Alltheweb
MSN
DogPile

Page 7: Search engines

Number of pages indexed

The more pages indexed, the more likely you are to find what you need
Hard to find a needle in a haystack if the haystack is dumped on your head
Google: 4.3 Billion
Yahoo: 3.2 Billion
Teoma: 1 Billion
AltaVista and AllTheWeb: Acquired by Yahoo and no longer available
Source: Infopeopleproject.org

Page 8: Search engines

Google’s Index

Google indexes about half the searchable web, so perhaps 8 billion searchable pages are out there

Page 9: Search engines

What do search engines not search?

Private Databases
  • Not fixed URLs
  • Professional, academic
  • Example at the end of presentation (MERLOT)
Ask Jeeves only lists customers that pay to have their site indexed

Page 10: Search engines

Market Share

comScore Media Metrix Search Engine Ratings

Page 11: Search engines

Which sites use what engine?

Google uses Google; owned by Google
Yahoo uses Yahoo; owned by Yahoo, but they used to use Google, and they did recently acquire Inktomi
AOL uses Google & Open Directory; owned by AOL
AltaVista uses Open Directory and Yahoo; owned by Yahoo
AlltheWeb uses and is owned by Yahoo
HotBot uses Google; owned by Lycos

Page 12: Search engines

Why do we care?

If two sites use the same engine, you’ll get the same results

comScore Media Metrix Search Engine Ratings

Page 13: Search engines

Two sites, same results

AltaVista, apple pie: AltaVista found 2,410,000 results
Yahoo.com: Results 1 - 10 of about 2,410,000 for apple pie

Page 14: Search engines

What does that mean?

Because there are basically two forces now in the search engine world, based on market share, index size and unique searching technologies, general searches are best done at either Yahoo or Google

Page 15: Search engines

How does Google match websites?

Page Rank
  • Google interprets a link from page A to page B as a vote by page A for page B. It also analyzes the page that casts the vote. If it’s an important page (many links to it), its vote counts more heavily
Text Matching
  • A page has to be both important (Page Rank) and relevant (text-matching) to be at the top of the list
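The "links as votes" idea can be illustrated with a simplified version of the published PageRank iteration. This is a sketch, not Google's production ranking: the link graph is hypothetical, the damping factor of 0.85 is an assumed (commonly cited) value, and pages with no outgoing links are assumed to spread their vote evenly over all pages.

```python
def page_rank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no links votes for everyone equally.
                share = damping * rank[page] / len(pages)
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical graph: B receives three links, so it is "important".
# X gets its single vote from B; Y gets its single vote from E.
LINKS = {"A": ["B"], "C": ["B"], "D": ["B"], "B": ["X"], "E": ["Y"]}

ranks = page_rank(LINKS)
for page, value in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(value, 3))
```

X and Y each receive exactly one link, but X ends up ranked above Y because its one vote comes from a page that itself has many incoming links.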

Page 16: Search engines

Matching (continued)

When engines rank results by text matching, the location and frequency of the text string are taken into account
Pages with the phrase 'apple pie' will rank higher than pages that mention both terms separately
Pages that mention apple pie repeatedly rank higher than pages with fewer occurrences
Pages with apple pie in the title of the page rank higher
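A toy scoring function makes the three rules above concrete. The weights (3 per phrase occurrence, 1 per separate word, 5 for a title match) and the sample pages are arbitrary assumptions chosen for illustration; no real engine is claimed to use them.

```python
def score(page_title, page_text, query):
    """Toy relevance score: phrase match, frequency, and title location."""
    phrase = query.lower()
    title = page_title.lower()
    text = page_text.lower()
    tokens = set(text.split())

    points = 0
    # Frequency: every occurrence of the exact phrase adds to the score.
    points += 3 * text.count(phrase)
    # Separate mentions: each query word found anywhere counts, but for less.
    points += sum(1 for word in phrase.split() if word in tokens)
    # Location: the phrase appearing in the title earns a bonus.
    if phrase in title:
        points += 5
    return points

# Hypothetical pages as (title, text) pairs.
in_title = ("Apple Pie Recipes", "apple pie is easy and this apple pie uses cinnamon")
phrase_once = ("Desserts", "our apple pie is popular")
words_apart = ("Desserts", "an apple a day and we also sell pie")

for title, text in (in_title, phrase_once, words_apart):
    print(score(title, text, "apple pie"), title, "|", text)
```

The page with the phrase in its title and repeated in its text scores highest, and the page that only mentions the two words separately scores lowest.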

Page 17: Search engines

Title Tag

When constructing a web page, the title tag is important
Search engines look at title tags

Page 18: Search engines

Example of Title Tag Code

http://campuslife.wlu.edu
Source code: <title>Orientation Programs--Washington and Lee University</title>
In FrontPage, File/Save As/File Name/Title
Google search for "New Page 1" (the default FrontPage title): 17,600,000 results

Page 19: Search engines

Meta Tag Searching

Google does not search Meta Tags, too much “meta tag spam”
Inktomi was the last major search engine that used it, now they have been bought by Yahoo
Teoma might use meta tags

Page 20: Search engines

Meta Tag

<head>
<TITLE>Revisiting Meta Tags</title>
<META NAME="authors" CONTENT="Danny Sullivan">
<META NAME="date" CONTENT="20021205">
<META NAME="channel" CONTENT="internet technology">
<META NAME="description" CONTENT="Follow up to October 2002 article about the demise of the meta keywords tag.">
</head>
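To see what a program can read out of a page head like the one on this slide, here is a minimal sketch using Python's standard `html.parser` module. The `HeadParser` class name and the shortened sample markup are assumptions made for the example; how any particular search engine actually parses pages is not covered by the slides.

```python
from html.parser import HTMLParser

class HeadParser(HTMLParser):
    """Collect the <title> text and the name/content pairs of <meta> tags."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case.
        if tag == "title":
            self.in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            name = attributes.get("name")
            if name:
                content = attributes.get("content") or ""
                self.meta[name.lower()] = content.strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

# Shortened copy of the slide's markup.
HTML = """<head>
<TITLE>Revisiting Meta Tags</title>
<META NAME="authors" CONTENT="Danny Sullivan">
<META NAME="date" CONTENT="20021205">
</head>"""

parser = HeadParser()
parser.feed(HTML)
print(parser.title)
print(parser.meta)
```

An engine that ignores meta tags would simply use `parser.title` and the page text and discard `parser.meta`.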

Page 21: Search engines

Keyword Searching

Be as specific as you can
Don’t use “car” if you can use “Toyota”
Search engines have a hard time distinguishing between different meanings of a word, e.g., hard exam, hard cider, hard times, hard drive
It can’t think for you: if you put in “heart attack”, it won’t show pages with “cardiac arrest”

Page 22: Search engines

Boolean Searching

George Boole, English mathematician, died 1864; developed a logical combinatorial system
AND, OR, NOT
Used to get more targeted results
Default Operator is AND at all major search engines, so if you type in apple pie, sites assume “apple AND pie”

Page 23: Search engines

Using Boolean Operators at Google

Default Operator is AND
apple pie: 1,710,000
apple AND pie (+pie): 1,690,000, default operator message, but it does take into account word order
Fewer results, perhaps a little more useful

Page 24: Search engines

Boolean Operator OR

apple OR pie: 7,140,000
Use this if you don’t want to rule out too much
Asthma, acute OR chronic

Page 25: Search engines

Boolean Operator NOT

apple NOT pie (-pie)
What will NOT do to the search results? 816,000
Lessened results by half
How could you use NOT to search for information about Bass fishing?
bass NOT guitar (when you want the fish)

Page 26: Search engines

Be Careful with “NOT”

A search for 'apple pie NOT cobbler' may remove useful results such as "Aunt Sarah's Better Than Cobbler Apple Pie"
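The three Boolean operators map directly onto set operations, which also shows why NOT can throw away a good result. The page titles below are hypothetical, and matching on whole words in titles is a simplifying assumption for the sketch.

```python
# Hypothetical page titles standing in for indexed pages.
PAGES = [
    "Classic Apple Pie",
    "Apple Cider Guide",
    "Pumpkin Pie",
    "Aunt Sarah's Better Than Cobbler Apple Pie",
    "Peach Cobbler",
]

def matching(term):
    """Set of page titles that contain the term as a whole word."""
    return {page for page in PAGES if term.lower() in page.lower().split()}

apple_and_pie = matching("apple") & matching("pie")  # AND: both terms
apple_or_pie = matching("apple") | matching("pie")   # OR: either term
apple_not_pie = matching("apple") - matching("pie")  # NOT: exclude a term

# 'apple pie NOT cobbler' also drops Aunt Sarah's pie, which mentions cobbler.
apple_pie_not_cobbler = apple_and_pie - matching("cobbler")

print(sorted(apple_and_pie))
print(sorted(apple_or_pie))
print(sorted(apple_not_pie))
print(sorted(apple_pie_not_cobbler))
```

AND narrows the result set, OR widens it, and NOT removes every page that mentions the excluded word, whether or not the page was otherwise relevant.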

Page 27: Search engines

Synonyms

~apple ~pie (synonyms): 4,520,000

Page 28: Search engines

Domain Restrict

apple pie site:www.allrecipes.com (733 results)
More appropriate example:
admissions information (3,730,000 results)
admissions information site:www.wlu.edu (68 results)
www.wlu.edu, search
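The effect of a site: restriction can be sketched as a filter on the host name of each result. The URLs below are hypothetical, and exact host matching is a simplifying assumption; real engines may apply looser matching rules.

```python
from urllib.parse import urlparse

def restrict_to_site(urls, site):
    """Keep only the URLs whose host name equals the given site."""
    return [url for url in urls if urlparse(url).hostname == site.lower()]

# Hypothetical results for 'admissions information'.
RESULTS = [
    "http://www.wlu.edu/admissions",
    "http://www.example.edu/admissions",
    "http://www.wlu.edu/admissions/visit",
]

for url in restrict_to_site(RESULTS, "www.wlu.edu"):
    print(url)
```

The query terms stay the same; only results hosted on the named site survive, which is why the result count drops so sharply.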

Page 29: Search engines

Exact Search

How do you get results that match exactly?
Use quotation marks, e.g., “apple pie”
696,000 on Google

Page 30: Search engines

AltaVista & Google Cool Feature

Link: find out how many indexed pages link to your page
http://www.altavista.com
link:leechapel.wlu.edu
AltaVista: 92 (searches Yahoo)
Google: 33

Page 31: Search engines

Cached Items

Google “takes a picture” (indexes a site)
As web sites often do, the site goes away
You can still look at the old site through the cache
www.google.com

Page 32: Search engines

Meta Search Engine

What is a Meta Search Engine?
Search Engines that display results from several sites at once
Dogpile: Google · Yahoo · Ask Jeeves · About · LookSmart · Overture · FindWhat
Hmmm... Dogpile inserts sites that have paid for placement into results from various search engines, without telling you
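One way a meta search engine could combine ranked lists from several engines is to interleave them and drop duplicates. This is a sketch of that idea only: the engine names and result lists are hypothetical, and it is not a description of how Dogpile actually merges or orders results.

```python
from itertools import zip_longest

def merge_results(results_by_engine):
    """Interleave ranked lists from several engines, dropping duplicates."""
    merged = []
    seen = set()
    # Take the first result from every engine, then the second, and so on.
    for tier in zip_longest(*results_by_engine.values()):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

# Hypothetical ranked results from two engines.
RESULTS = {
    "engine_a": ["a.com", "b.com", "c.com"],
    "engine_b": ["b.com", "d.com"],
}

print(merge_results(RESULTS))
```

A paid placement would be an extra entry inserted into `merged` that came from none of the underlying engines, which is why it matters whether the site labels it.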

Page 33: Search engines

Safe Search

Google: SafeSearch Filter, preferences
Yahoo: SafeSearch Filter, preferences
AltaVista: Settings, Family Filter, can set a password

Page 34: Search engines

Advanced Settings

Most search sites have a link for advanced settings, so you don’t have to remember the particular syntax for a particular type of search