The Players The Majors Dead Search Engines International Search Engines Metasearch Engines.
Search engines
Uploaded by stefanos-anastasiadis
Search Engines
Searching More Effectively
What is a search engine?
• Multiple servers that run a program called a spider or a crawler
• Crawlers build an index of web sites
• They follow the links on the website, and crawl those pages
• You can search the index by keyword matching
• Search engines don’t search the web, they search the index, so current events may not be indexed yet
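The crawl-then-search idea above can be sketched in a few lines of Python. This is only a toy illustration, not how any real engine is built: the page URLs and text in PAGES are invented, and build_index and search are hypothetical helper names.

```python
from collections import defaultdict

# Made-up "web": page URL -> page text (illustrative data only).
PAGES = {
    "example.org/pie": "apple pie recipe with fresh apple slices",
    "example.org/cider": "hard cider made from apple juice",
    "example.org/fishing": "bass fishing tips",
}

def build_index(pages):
    """Map each word to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the pages that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

index = build_index(PAGES)
print(sorted(search(index, "apple")))
print(sorted(search(index, "apple pie")))

# A page published after the crawl is invisible until the index is rebuilt.
PAGES["example.org/news"] = "breaking news today"
print(search(index, "news"))
print(search(build_index(PAGES), "news"))
```

The last two lines show why current events may be missing: the search only ever consults the index, never the pages themselves.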
Surfing the Index
Search engines don’t search the web, they search the index, so current events may not be indexed yet
Directories can often be more fruitful than search engines
Because directories are created by people and not programs...
• They are smaller but reflect evaluated material instead of 'all of the web'
• They are even less likely to be up to date
• They usually take you to the front door of a website
• They are organized in a thoughtful manner so you can browse
Directory Example
• Open Directory Project
• It helps you to think logically about the information you need: the structure is already there, you just have to follow it
Popular Search Engines
• Google
• Yahoo
• MSN
• AltaVista
• AlltheWeb
• Dogpile
Number of pages indexed
• The more pages indexed, the more likely you are to find what you need
• Hard to find a needle in a haystack if the haystack is dumped on your head
• Google: 4.3 billion
• Yahoo: 3.2 billion
• Teoma: 1 billion
• AltaVista and AlltheWeb: acquired by Yahoo and no longer available
Source: Infopeopleproject.org
Google’s Index
Half the searchable web, so perhaps 8 billion searchable pages are out there
What do search engines not search?
• Private Databases
  • Not fixed URLs
  • Professional, academic
  • Example at the end of presentation (MERLOT)
• Ask Jeeves only lists customers that pay to have their site indexed
Market Share
comScore Media Metrix Search Engine Ratings
Which sites use what engine?
• Google uses Google and is owned by Google
• Yahoo uses Yahoo and is owned by Yahoo, but it used to use Google, and it recently acquired Inktomi
• AOL uses Google and Open Directory and is owned by AOL
• AltaVista uses Open Directory and Yahoo and is owned by Yahoo
• AlltheWeb uses and is owned by Yahoo
• HotBot uses Google and is owned by Lycos
Why do we care?
If two sites use the same engine, you’ll get the same results
comScore Media Metrix Search Engine Ratings
Two sites, same results
• AltaVista, apple pie: AltaVista found 2,410,000 results
• Yahoo.com: Results 1 - 10 of about 2,410,000 for apple pie
What does that mean?
Because there are basically two forces now in the search engine world, based on market share, index size and unique searching technologies, general searches are best done at either Yahoo or Google
How does Google match websites?
• Page Rank
  • Google interprets a link from page A to page B as a vote by page A for page B. It also analyzes the page that casts the vote. If it is an important page (many links to it), its vote counts more heavily
• Text Matching
  • A page has to be both important (Page Rank) and relevant (text-matching) to be at the top of the list
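The "link as a vote" idea can be sketched as a short iterative calculation. This is a simplified textbook-style version, not Google's production ranking: the damping value of 0.85, the iteration count, and the tiny link graph LINKS are all assumptions chosen for illustration.

```python
def page_rank(links, damping=0.85, iterations=50):
    """links maps a page to the pages it links to; returns page -> score."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each outgoing link carries an equal share of the page's score.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Made-up graph: A, C and E all vote for B; B votes for D; F votes for G.
LINKS = {"A": ["B"], "C": ["B"], "E": ["B"], "B": ["D"], "F": ["G"]}
ranks = page_rank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

D and G each receive exactly one link, but D's single vote comes from the heavily linked page B, so D ends up with the higher score. That is the "its vote counts more heavily" effect described above.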
Matching (continued)
• When engines rank results by text matching, the location and frequency of the text string are taken into account
• Pages with the phrase 'apple pie' will rank higher than pages that mention both terms separately
• Pages that mention apple pie repeatedly rank higher than pages with fewer occurrences
• Pages with apple pie in the title of the page rank higher
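The three text-matching rules above (phrase beats separate words, more occurrences beat fewer, a title match helps) can be illustrated with a toy scoring function. The weights 1, 3 and 5 are invented for this sketch, and relevance is a hypothetical helper; real engines do not publish their formulas.

```python
import re

def normalize(text):
    """Lowercase the text and keep only letter runs, joined by single spaces."""
    return " ".join(re.findall(r"[a-z]+", text.lower()))

def relevance(query, title, body):
    """Toy relevance score with made-up weights; higher is better."""
    phrase = normalize(query)
    words = phrase.split()
    if not words:
        return 0
    title_text = normalize(title)
    body_text = normalize(body)
    body_words = body_text.split()
    # 1 point for each query word found anywhere in the body.
    score = sum(1 for word in words if word in body_words)
    # 3 points for every occurrence of the exact phrase in the body.
    score += 3 * body_text.count(phrase)
    # 5 points if the exact phrase appears in the title.
    if phrase in title_text:
        score += 5
    return score

separate = relevance("apple pie", "Desserts", "An apple a day, and pie for dinner")
phrase_once = relevance("apple pie", "Desserts", "Apple pie is great")
repeated = relevance("apple pie", "Desserts", "Apple pie, apple pie, apple pie")
titled = relevance("apple pie", "Apple Pie Recipes", "Apple pie is great")
print(separate, phrase_once, repeated, titled)
```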
Title Tag
• When constructing a web page, the title tag is important
• Search engines look at title tags
Example of Title Tag Code
• http://campuslife.wlu.edu
• Source code: <title>Orientation Programs--Washington and Lee University</title>
• In FrontPage, File/Save As/File Name/Title
• Google search for New Page 1: 17,600,000 results
Meta Tag Searching
• Google does not search meta tags; there is too much “meta tag spam”
• Inktomi was the last major search engine that used them, and it has now been bought by Yahoo
• Teoma might use meta tags
Meta Tag
<head>
<TITLE>Revisiting Meta Tags</title>
<META NAME="authors" CONTENT=" Danny Sullivan">
<META NAME="date" CONTENT="20021205">
<META NAME="channel" CONTENT="internet technology">
<META NAME="description" CONTENT="Follow up to October 2002 article about the demise of the meta keywords tag.">
</head>
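To see what a program can read from a head section like the one above, here is a minimal sketch using Python's standard html.parser module. HeadReader is a hypothetical class name, and the sketch only illustrates reading the title and meta tags; it says nothing about how any particular search engine weighs them.

```python
from html.parser import HTMLParser

HEAD = """<head>
<TITLE>Revisiting Meta Tags</title>
<META NAME="authors" CONTENT=" Danny Sullivan">
<META NAME="date" CONTENT="20021205">
<META NAME="channel" CONTENT="internet technology">
</head>"""

class HeadReader(HTMLParser):
    """Collects the page title and the name/content pairs of meta tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lowercase.
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            name = attributes.get("name")
            if name:
                self.meta[name.lower()] = (attributes.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

reader = HeadReader()
reader.feed(HEAD)
reader.close()
print(reader.title)
print(reader.meta)
```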
Keyword Searching
• Be as specific as you can
• Don’t use “car” if you can use “Toyota”
• Search engines have a hard time telling apart different meanings of the same word, e.g., hard exam, hard cider, hard times, hard drive
• It can’t think for you: if you put in “heart attack”, it won’t show pages with “cardiac arrest”
Boolean Searching
• George Boole, English mathematician, died 1864: logical combinatorial system
• AND, OR, NOT
• Used to get more targeted results
• Default operator is AND at all major search engines, so if you type in apple pie, sites assume “apple AND pie”
Using Boolean Operators at Google
• Default operator is AND
• apple pie: 1,710,000
• apple AND pie (+pie): 1,690,000. Google shows a message that AND is the default operator, but it does take word order into account
• Fewer results, perhaps a little more useful
Boolean Operator OR
• apple OR pie: 7,140,000
• Use this if you don’t want to rule out too much
• Asthma, acute OR chronic
Boolean Operator NOT
• apple NOT pie (–pie)
• What will NOT do to the search results? 816,000 results
• Reduced the results by about half
• How could you use NOT to search for information about bass fishing?
• bass NOT guitar (when you want the fish)
Be Careful with “NOT”
A search for 'apple pie NOT cobbler' may remove useful results such as "Aunt Sarah's Better Than Cobbler Apple Pie"
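AND, OR and NOT behave like set operations on the pages that contain each word. The sketch below uses Python sets over a handful of invented page titles (DOCS) to show this, including the cobbler pitfall; pages_with is a hypothetical helper, and real engines add ranking on top of this filtering.

```python
# Made-up page titles, keyed by a page number (illustrative data only).
DOCS = {
    1: "apple pie recipe",
    2: "apple cider",
    3: "pumpkin pie",
    4: "bass guitar lessons",
    5: "bass fishing on the lake",
    6: "aunt sarah's better than cobbler apple pie",
}

def pages_with(word):
    """Return the numbers of the pages whose text contains the word."""
    return {number for number, text in DOCS.items() if word in text.split()}

apple = pages_with("apple")
pie = pages_with("pie")

print("apple AND pie:", sorted(apple & pie))   # both words must appear
print("apple OR pie:", sorted(apple | pie))    # either word may appear
print("apple NOT pie:", sorted(apple - pie))   # apple without pie
print("bass NOT guitar:", sorted(pages_with("bass") - pages_with("guitar")))

# NOT can throw away a page you wanted: page 6 is an apple pie recipe.
print("apple AND pie NOT cobbler:", sorted((apple & pie) - pages_with("cobbler")))
```

OR widens the result set and NOT narrows it, which matches the direction of the result counts quoted on the slides.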
Synonyms
~apple ~pie (synonyms): 4,520,000
Domain Restrict
• apple pie site:www.allrecipes.com: 733
• More appropriate example:
  • admissions information: 3,730,000
  • admissions information site:www.wlu.edu: 68
  • www.wlu.edu, search
Exact Search
• How do you get results that match exactly?
• Use quotation marks, e.g., “apple pie”
• 696,000 on Google
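The operators covered so far (quotation marks for an exact phrase, a leading minus for NOT, and site: for a domain restrict) are just text typed into the search box. As a small sketch, a query string could be assembled like this; build_query and its parameter names are hypothetical and not part of any search engine's API.

```python
def build_query(words=(), phrase=None, exclude=(), site=None):
    """Assemble a search-box string from the operators shown on the slides."""
    parts = list(words)
    if phrase:
        parts.append('"' + phrase + '"')              # exact phrase
    parts.extend("-" + word for word in exclude)      # NOT, written as a minus
    if site:
        parts.append("site:" + site)                  # domain restrict
    return " ".join(parts)

print(build_query(phrase="apple pie"))
print(build_query(words=["bass"], exclude=["guitar"]))
print(build_query(words=["admissions", "information"], site="www.wlu.edu"))
```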
AltaVista & Google Cool Feature
• Link: find out how many indexed pages link to your page
• http://www.altavista.com
• link:leechapel.wlu.edu
• AltaVista: 92 (searches Yahoo)
• Google: 33
Cached Items
• Google “takes a picture” (indexes a site)
• As web sites often do, the site goes away
• You can still look at the old site through the cache
• www.google.com
Meta Search Engine
• What is a Meta Search Engine?
• Search engines that display results from several sites at once
• Dogpile: Google · Yahoo · Ask Jeeves · About · LookSmart · Overture · FindWhat
• Hmmm… Dogpile inserts sites that have paid for placement into results from various search engines, without telling you
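One simple way to "display results from several sites at once" is to interleave the ranked lists and drop duplicates. The sketch below does exactly that with invented engine names and URLs; it is a generic illustration, since the slides do not describe how Dogpile actually combines or orders its results.

```python
def merge_results(results_by_engine):
    """Interleave ranked result lists round-robin, keeping each URL once.

    Returns a list of (url, engines) pairs, where engines lists every
    source that returned the URL.
    """
    order = []
    sources = {}
    longest = max((len(urls) for urls in results_by_engine.values()), default=0)
    for position in range(longest):
        for engine, urls in results_by_engine.items():
            if position < len(urls):
                url = urls[position]
                if url in sources:
                    sources[url].append(engine)
                else:
                    sources[url] = [engine]
                    order.append(url)
    return [(url, sources[url]) for url in order]

# Made-up ranked results from two hypothetical engines.
RESULTS = {
    "engine_a": ["a.example/pie", "b.example/pie", "c.example/pie"],
    "engine_b": ["b.example/pie", "d.example/pie"],
}

merged = merge_results(RESULTS)
for url, engines in merged:
    print(url, engines)
```

Keeping the list of source engines next to each URL is one way a metasearch page could label where a result came from, which is the kind of disclosure the last bullet above says is missing for paid placements.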
Safe Search
• Google: SafeSearch Filter, in preferences
• Yahoo: SafeSearch Filter, in preferences
• AltaVista: Settings, Family Filter, can set a password
Advanced Settings
Most search sites have a link for advanced settings, so you don’t have to remember the particular syntax for a particular type of search