Search Engines: Needles and Haystacks
Date posted: 20-Dec-2015
E-Commerce
Prof. Sheizaf Rafaeli
News…
In winter 2004, Google jumped up to 4,300,000,000 pages. Still a drop in the bucket.
Yahoo, AJ and others still running
Froogle. Google News. Google Compute. The Deskbar, Cooking with google, ratemyprofessors.com…
Teoma, Kartoo, Vivisimo, Booble, amazon, imdb, VisualThesaurus, Wikipedia, Touchgraph, Grokker
Some concepts…
Manual vs. automatic vs. metasearching
Deep “hidden” web
“Webliographies”, Blogging
Googlewhacking, google bombing, To “be googled”
Web archives, wayback and alexa
Launchpads and toolbars: Microsoft, Google, Clicksearch, Babylon, Alexa
AI in searching (Google, AJ)
How much information is on the web?
35 GB? 300 GB? 3 TB? more?
Mid 1999 estimate: 800 million pages
Mid 2000 estimate: 3 billion (i.e., a thousand million) pages
Mid 2003 estimate: 15 billion pages + “Deep Web”
Google now indexes (only?) well over 4 billion
Early 2001 “Deep Web” estimate: 500 billion
How do you even estimate?
How can you find what you are looking for?
Doesn’t this remind you of going to the library???
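One way to approach "How do you even estimate?" is an overlap (capture-recapture) estimate: if two engines crawl roughly independently, the share of one engine's pages that the other also holds hints at how much of the whole each one covers. The sketch below assumes that independence, which real engines only approximate, and uses made-up page IDs. It is not claimed to be the method behind the figures quoted above.

```python
def estimate_total_pages(index_a, index_b):
    """Lincoln-Petersen style estimate: N ~ |A| * |B| / |A and B|."""
    overlap = len(index_a & index_b)
    if overlap == 0:
        raise ValueError("no overlap between the two indexes; cannot estimate")
    return len(index_a) * len(index_b) / overlap


# Hypothetical toy data: page IDs seen by two independent crawls.
engine_a = set(range(0, 600))     # 600 pages
engine_b = set(range(400, 900))   # 500 pages, 200 of them shared with engine_a
estimated_total = estimate_total_pages(engine_a, engine_b)
print(estimated_total)
```

The smaller the overlap between two large indexes, the larger the estimated total.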
Engines Idling Roughly
Search engines were supposed to be the Grand Central stations of the Internet: a starting point for every venture into an overwhelming world of information. It appears, however, that people are comfortable clicking around on their own. Only 7 percent of Web pages are accessed through a search engine, a portion that has remained almost static since 1999.
While search engines may not drive all that much traffic, they do take up a lot of time. Six out of 10 people online report using search engines more than one hour a week, according to a survey by pollster Roper Starch; more than a third search the Net diligently over two hours every week.
Not surprisingly, many of these surfers are annoyed. Overall, 71 percent of people online say they get frustrated while searching the Net. And it doesn't take them long to lose their cool: about half are frustrated within 15 minutes. Yet despite the Web's enormous size, about 80 percent of people say they usually find what they need when searching.
But even the most comprehensive search engine, Google, captures only 42 percent of indexable Web pages. And that number drops dramatically for the competition. Second-ranked Fast, a Norwegian search technology, and Inktomi index 19 percent and 17 percent of the Web, respectively.
Still, when it comes to searching, less is more. Specialized search sites may be the key to helping people find what they're looking for. "Internet users need relevance when conducting searches," runs the argument, which predicts the emergence of "vertical" search engines for specific user groups.
How do you find things at the library?
Several models:
  Walk around until you find something
  Walk around until you forget what you want
  Walk around until you find a place to nap
  Use the library catalog
  Use the services of someone who knows the collection (Reference Librarian)
Not all are American or even English. Here, e.g., are several Hebrew engines:
  Walla (וואלה): http://www.walla.co.il
  Achla (אחלה): http://www.achla.co.il
  Tapuz (תפוז): http://www.tapuz.co.il
  Nana (נענע): http://www.nana.co.il
  Sababa (סבבה): http://www.sababa.co.il
  Haaretz and IOL; Nadav Harel and iguide
Search Engines Refer Only A Small Percentage Of Traffic To Web Sites Worldwide
Are Search Engines truly so important?
What do Search Engines search?
They do NOT search the Web!
  That is, they do not search the web the very moment you ask for something.
  Rather, they search their databases or indexes.
Search engines store the contents of millions of websites in an index or DB, and your query is matched up against that.
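To make "your query is matched up against that" concrete, here is a minimal sketch of an inverted index: a table built ahead of time that maps each word to the pages containing it. The URLs and page texts are invented for illustration, and real engines add ranking, compression and much more.

```python
from collections import defaultdict

# Hypothetical crawled pages: URL -> page text (made up for illustration).
pages = {
    "http://example.com/fish": "tropical fish care and fish food",
    "http://example.com/olympics": "olympics results and medal tables",
    "http://example.com/fishing": "fishing at the olympics venue lake",
}


def build_index(docs):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in docs.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return URLs containing every query word, using only the stored index."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


index = build_index(pages)
print(sorted(search(index, "olympics")))
```

Note that `search` never touches the pages themselves at query time; it only consults the index built earlier.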
They don’t even catalog the entire contents of the WWW!
  Nowhere near, in fact... you only get what they have!
For the most part, they don’t have the contents of the websites they show you, only links to these sites
How do they find it?
They use
  Spiders, webbots and bots
  Crawlers, worms, and harvesters
  Wanderers, indexers, and sitesuckers
What are they?
  Self-directed browsers which go from link to link, retrieving all or part of the contents of any given site for inclusion in the search engine's database.
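The "go from link to link" behaviour can be sketched as a breadth-first walk over a link graph. To keep the example runnable without a network, the "web" here is a small in-memory dictionary with invented site names; a real spider would fetch pages over HTTP and respect robots.txt, politeness delays and so on.

```python
from collections import deque

# Hypothetical miniature "web": URL -> (page text, outgoing links).
WEB = {
    "a.example": ("home page", ["b.example", "c.example"]),
    "b.example": ("about fish", ["c.example"]),
    "c.example": ("about olympics", ["a.example", "d.example"]),
    "d.example": ("contact", []),
    "e.example": ("nobody links here", ["a.example"]),
}


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl from seed; returns {url: text} for the pages reached."""
    database = {}
    queue = deque([seed])
    seen = {seed}
    while queue and len(database) < max_pages:
        url = queue.popleft()
        page = fetch(url)
        if page is None:  # missing or unreachable page
            continue
        text, links = page
        database[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return database


collected = crawl("a.example", WEB.get)
print(sorted(collected))
```

The page `e.example` never enters the database because no crawled page links to it, which is one reason an engine only ever holds part of the web.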
How do I find what I want?
“Excuse me, do you have anything on fish..?”
“Do you have anything about the Olympics?”
It pays to know how to ask
It pays to understand how collections work
Know the lingo
Boolean OperatorsFalse DropsDirectoriesFull-Text IndexingStemmingWebliographies
HitsRecallPrecisionKeywordsMeta-Search
EnginesPresentation order
Boolean Operators
  Mathematical expressions used to express statements of formal logic. The most common Boolean operators are AND, OR and NOT, with parentheses () for grouping
  Examples:
    icons AND NOT relig*
    free AND pictures AND NOT (nude OR naked)
  Many sites claim to support them, but only a few do it well... expect trial and error
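Under the hood, a Boolean query can be read as set arithmetic over the documents containing each word: AND is intersection, OR is union, AND NOT is set difference. A minimal sketch for the second example above, using a made-up table of document IDs:

```python
# Hypothetical postings table: word -> set of document IDs containing it.
postings = {
    "free": {1, 2, 3, 4},
    "pictures": {2, 3, 4, 5},
    "nude": {3},
    "naked": {4, 6},
}


def docs(word):
    """Documents containing the word (empty set if the word is unknown)."""
    return postings.get(word, set())


# free AND pictures AND NOT (nude OR naked)
boolean_result = (docs("free") & docs("pictures")) - (docs("nude") | docs("naked"))
print(boolean_result)
```

Only document 2 contains both "free" and "pictures" while containing neither excluded word.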
False Drops
  Documents or websites retrieved that are not relevant to the user’s needs
  Examples:
    Let’s do a quick search for XXX
Directories
  A hierarchical search that proceeds through increasingly more specific headings or sub-topics
  Let’s visit
Full-Text Indexing
  An indexing method in which every word in the web page is put into the database, with the exception of prepositions, conjunctions, and the like.
Controlled-language indexing
  How directories are implemented
Both of these are done for you by the Search Engine
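The "every word except prepositions, conjunctions, and the like" rule amounts to filtering a page's words against a stop-word list before indexing. A minimal sketch, assuming a tiny illustrative list; each engine keeps its own, usually longer, list.

```python
import re

# Illustrative stop-word list only; not any particular engine's list.
STOP_WORDS = {"a", "an", "and", "at", "in", "of", "on", "or", "the", "to", "with"}


def index_terms(text):
    """Return the words of a page that a full-text index would keep."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [word for word in words if word not in STOP_WORDS]


terms = index_terms("The care and feeding of fish in the home")
print(terms)
```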
Stemming
  A type of search that uses the common root of a word to include all possible occurrences of that word
  Example: "child*" would yield results that include childhood, childless, children, etc.
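The "child*" example can be sketched as a prefix match against the words an engine has indexed. Strictly speaking this is truncation; a true stemmer would reduce each word to its root algorithmically. The vocabulary below is invented for illustration.

```python
def truncation_match(pattern, vocabulary):
    """Expand a trailing-* pattern such as 'child*' against known index words."""
    if pattern.endswith("*"):
        prefix = pattern[:-1].lower()
        return sorted(w for w in vocabulary if w.lower().startswith(prefix))
    return sorted(w for w in vocabulary if w.lower() == pattern.lower())


# Hypothetical list of indexed words.
vocabulary = ["child", "childhood", "childless", "children", "chili", "chimney"]
matches = truncation_match("child*", vocabulary)
print(matches)
```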
Hits
  Documents, or references to documents, that are returned in response to a query
  Note: a hit is not necessarily relevant
Recall
  The degree to which all the matching documents in a collection are returned, e.g., if a search engine retrieves 80 of 100 available documents, its recall is 80%.
  How do you determine recall on the web?
Precision
  A standard way of measuring the accuracy of an information retrieval system
  The number of relevant documents obtained divided by the total number of documents retrieved
  In other words: (useful stuff / what you got)
  Remember that a hit is not necessarily relevant
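Recall and precision are both simple ratios once you know which documents are relevant. The sketch below reuses the "80 of 100" recall example; the 120 irrelevant hits are an added assumption to show how precision can differ from recall.

```python
def precision(retrieved, relevant):
    """Relevant hits divided by everything retrieved (useful stuff / what you got)."""
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)


def recall(retrieved, relevant):
    """Relevant hits divided by all relevant documents in the collection."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)


relevant_docs = set(range(100))                            # 100 relevant documents exist
retrieved_docs = set(range(80)) | set(range(1000, 1120))   # 80 relevant hits + 120 false drops

print(recall(retrieved_docs, relevant_docs))
print(precision(retrieved_docs, relevant_docs))
```

On the open web the catch is the denominator of recall: nobody knows how many relevant documents exist.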
Keywords
  A search that looks for specific words provided by cataloged sites
  Typically, a search engine agent looks for keywords contained in the <META> tag
  A website developer can manipulate the <META> tag to increase the visibility of his/her site, at the expense of accuracy
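Here is a rough sketch of how an agent might pull keywords out of a page's <META> tag, using Python's standard html.parser module. The sample page is invented, and how much weight any engine actually gives these keywords is not something this sketch can tell you.

```python
from html.parser import HTMLParser


class MetaKeywordParser(HTMLParser):
    """Collects the comma-separated keywords from <meta name="keywords">."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "keywords":
            content = attributes.get("content") or ""
            self.keywords.extend(
                k.strip().lower() for k in content.split(",") if k.strip()
            )


# Hypothetical page source.
sample_html = """<html><head>
<title>Fish shop</title>
<META NAME="keywords" CONTENT="fish, aquarium, Tropical Fish">
</head><body>Welcome</body></html>"""

meta_parser = MetaKeywordParser()
meta_parser.feed(sample_html)
print(meta_parser.keywords)
```

Because the page author writes this list, nothing stops it from naming topics the page does not actually cover.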
Some Search Tips
Use the plus (+) and minus (-) signs in front of words to force their inclusion and/or exclusion in searches.
Use double quotation marks (" ") around phrases to ensure they are searched exactly as is
Put your most important keywords first in the string.
Type keywords and phrases in lower case to find both lower and upper case versions.
Use truncation and wildcards (e.g., *) to look for variations in spelling and word form.
Know whether or not the search engine you are using maintains a stop word list
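To see what the plus, minus and quotation-mark tips mean to the engine, here is a minimal sketch of a query parser that sorts terms into required, excluded and optional groups. The syntax handled is only what the tips describe; actual engines differ in the details, so treat the function and its output format as hypothetical.

```python
import re

# An optional +/- sign followed by either a "quoted phrase" or a single word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')


def parse_query(query):
    """Split a query into required (+), excluded (-) and optional terms."""
    parsed = {"required": [], "excluded": [], "optional": []}
    for sign, phrase, word in TOKEN.findall(query.lower()):
        term = phrase or word
        if not term:
            continue
        if sign == "+":
            parsed["required"].append(term)
        elif sign == "-":
            parsed["excluded"].append(term)
        else:
            parsed["optional"].append(term)
    return parsed


parsed = parse_query('+fish -"deep sea" "tropical fish" Tank')
print(parsed)
```

The query is lower-cased first, in line with the tip about typing keywords in lower case.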
The “Deep Web”
Regular web searches only drag nets across the surface
500 times larger than surface web
95% of it is public and free
Content in deep web 1000+ times better quality
7,500 terabytes (TB) of information
45,000 search engines in “surface web”
Presentation order (1)
Presentation order may be more important than just being mentioned.
Is order affected by commercial fees?
"A page is important if a bunch of important pages point to it," explained Brin. (Google.com) "It's the sum of the pages that point to it."
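Brin's circular-sounding definition can be computed by iteration: start every page with equal importance, then repeatedly let each page pass its importance along its outgoing links. The sketch below is a simplified textbook version of that idea, with an assumed damping factor of 0.85 and an invented four-page link graph; it is not Google's actual ranking code.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration over {page: [pages it links to]}."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no links spreads its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: a, b and d all point to c; c points back to a.
link_graph = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(link_graph)
print(max(ranks, key=ranks.get))
```

Page c ends up on top because three pages point to it, and a outranks b and d because the important page c points to it.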
Presentation order (2)
Location, Location, Location...and Frequency
  Keywords appearing in the title or near the top of a page count as more relevant than others, etc.
Link popularity
Relevancy (person, institution)
Meta tags
Penalty items
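The "location and frequency" idea can be sketched as a score that counts how often the query words appear, with title matches weighted more heavily than body matches. The weight of 3 is an arbitrary assumption for illustration, and the other factors on this slide (link popularity, meta tags, penalties) are left out.

```python
def relevance_score(query_words, title, body, title_weight=3.0):
    """Count query-word occurrences, weighting matches in the title more heavily."""
    title_words = title.lower().split()
    body_words = body.lower().split()
    score = 0.0
    for word in query_words:
        word = word.lower()
        score += title_weight * title_words.count(word)
        score += body_words.count(word)
    return score


# Hypothetical pages: one mentions "fish" in its title, the other only in its body.
score_titled = relevance_score(["fish"], "Fish care", "fish fish tank")
score_untitled = relevance_score(["fish"], "Pets", "fish fish fish")
print(score_titled, score_untitled)
```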
Meta-Search Engines
Use multiple search engines in parallel to provide an answer to a single query
They are front-ends to other search engines and their collections, and typically do not contain their own databases
Examples: Surfwax, Vivisimo, Ask Jeeves, Metacrawler, The Mining Company
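A meta-search front-end can be sketched as: send the same query to several engines in parallel, then merge the ranked lists. The three "engines" below are hypothetical stand-in functions returning canned results, and the merge rule (sum of 1/position) is one simple choice among many, not how any named service works.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for real engines; each returns a ranked list of URLs.
def engine_one(query):
    return ["a.example", "b.example", "c.example"]


def engine_two(query):
    return ["b.example", "d.example"]


def engine_three(query):
    return ["b.example", "a.example", "e.example"]


def meta_search(query, engines):
    """Query all engines in parallel and merge their rankings into one list."""
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    scores = {}
    for results in result_lists:
        for position, url in enumerate(results):
            # Higher-placed results contribute more: 1, 1/2, 1/3, ...
            scores[url] = scores.get(url, 0.0) + 1.0 / (position + 1)
    return sorted(scores, key=lambda url: (-scores[url], url))


merged = meta_search("fish", [engine_one, engine_two, engine_three])
print(merged)
```

Note that `meta_search` holds no index of its own; everything it returns comes from the engines it calls.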
The Best Search Engine is…
Whichever one you can actually find things with
  Sometimes their indexing is a little more “natural” to you
  Some people prefer search engines that use directories (Yahoo! and others) and some prefer simple indexing (Altavista and others)
  Some people prefer the “human touch” (“webliographies”, “about” The Mining Company).
Getting Listed and Noticed (promoting your page)
Have worthwhile content/service
List manually with engines
Submission Services (like www.submitit.com)
Advertise in print, other media
Use graphics, scripts appropriately
Use good keywords
Use <META> tag tricks
Get complimentary links, awards
Join “rings”
Be aware of XML, Ratings and PICS
Disintermediation?
Re-intermediation!
Infomediaries! (portals, agents, consultants, experts)
Hagel and Singer: Net Worth: The emerging role of the infomediary in the race for customer information
Resources
Webhound: www.mcli.dist.maricopa.edu/webhound/
websearch.about.com
Search Engine Watch: www.searchenginewatch.com
www.searchiq.com
The Spider’s Apprentice, at http://www.monash.com/spidap.html
S. Lawrence, C. L. Giles, Accessibility of Information on the Web, Nature, 400, pp. 107-109, 1999.
S. Lawrence, C. L. Giles, Searching the World Wide Web, Science, 280, p. 98, 1998.
BrightPlanet’s “Deep Web White Paper”, 2000, at http://128.121.227.57/download/deepwebwhitepaper.pdf