Google OR Ask OR Gigablast OR Dogpile: A Comparison of Web-search Engines
Robert F. Musco
Southern Connecticut State University
Introduction
Web search engines are currently the primary method used to find information available on
the web, but few people are aware of how they work, or which are more suited to different needs.
The purpose of this paper is to compare the search tools of four different search engines, and to
conduct a sample search, analyzing each site’s results in terms of quantity, overlap, and
relevance. The first three search engines discussed, Google, Ask, and Gigablast, were chosen
because they are popular tools that each use their own proprietary software. The fourth, Dogpile,
was selected because it is a popular example of a metasearch engine, which compiles
information from four other search engines: Google, Ask, MSN Live Search, and
Yahoo.
As the first step in understanding how these search engines operate, the documentation found
at each web site was evaluated. The following section gives an overview of the search tools
offered by each engine, and these findings are schematized in the comparative table in Appendix
2. Google’s and Ask’s documentation was more complete than Gigablast’s, and very little
documentation was found on Dogpile. As a result, this overview is
based on the information each company provided about its search functions. If a function was not
mentioned, however, one cannot assume it is not available, since spot tests carried out
throughout the investigation showed that some important functions not listed were indeed
operational. The instances in which this occurred are mentioned in the paper. The actual
operation of each engine in test searches will be discussed in the third section.
Search Engine Overview: Comparison of Search Tools
Although Google is often viewed as the leader of “user-friendly” search engines, finding and
compiling basic information about how its search function works required visiting several
sections of the website. Beginning in the Advanced Search, one sees that queries can be limited
by “all these words, one or more of these words, this exact word or phrase, (none) of these
unwanted words”, language, format, site name, exact phrases, date created, usage rights, location
of key words, region, and numeric range (Google, 2009a). Two additional limiters return pages
that are “similar to the page” specified, or that “link(ed) to the page” specified.
More specific information is found in the Basic Search Help, such as advice about search
strategies and basic tips on how the engine operates (Google, 2009c). For example, Google
ignores most punctuation, is case-insensitive, and generally counts all words, though it may
ignore a word it considers irrelevant. The “More Search Help” area is the most complete
guide to search syntax (Google, 2009d). Boolean operators are permitted in the main search
box, and a full list of additional operators, such as wildcard and synonym symbols, is given.
Google searches can also be limited to a series of subject categories, or performed over the
entire web.
Google’s main search box uses the AND operator by default when a space is left between
two terms, though a quick test shows that if AND is typed explicitly, the number of hits may
change, particularly if the search terms are common words, such as [cat AND dog]. These results
suggest that unknown criteria come into play when Boolean operators are used. Perhaps this is an
example of Google’s intelligent search overriding some operator commands in an attempt to get
at the user’s “obvious” intent, as mentioned in the documentation.
Google’s “Technology Overview” page, reached through its “Corporate Information”
section, gives a clear, though simple, explanation of the theory behind its search technology
(Google, 2009b). Since the explanation does not enter into technical detail, it would be
inadequate for a searcher with a great deal of technical expertise, but its discussion of strategies
for optimizing searches and using special operators is sufficient for the general user. Google
explains that its robot, which crawls the web on a regular basis, is fully automated, meaning that
the company does not manually adjust page rankings. In fact, Google states that it does not accept
payment for inclusion or placement in its rankings.
The overview also helps explain why search results may not exactly match search terms.
Google uses its “PageRank™ algorithm”, a proprietary ranking methodology, to weigh a number
of factors, including the appearance and position of text on the page, in determining the relative
relevance of web pages. The algorithm calculates a page’s importance relative to similar pages
by counting the pages that link to it. What is innovative about this method is that the “pointer”
pages which link to the retrieved page are themselves weighted by the number of pages that
point to them in turn, so a link from a heavily linked page counts for more. Pages being
evaluated are penalized if they contain links to “link farms”, which are sites created purely for
the purpose of raising other pages’ relevance. Sometimes search results may include pages that
do not actually contain the search terms, but are pointed to by links in other pages whose
descriptive text contains the terms.
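The recursive weighting described above is commonly illustrated with the power-iteration form of PageRank from its original published description. The Python sketch below is a minimal illustration under stated assumptions, not Google’s production ranking, which by Google’s own account also weighs other factors: the four-page link graph is hypothetical, the damping factor of 0.85 is the value commonly cited for the published algorithm, and pages with no outgoing links are assumed to spread their score evenly over all pages.

```python
def pagerank(links, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank over a small link graph.

    `links` maps each page to the pages it links to. Returns a dict of
    scores that sum to 1. A sketch of the published idea only.
    """
    # Collect every page that appears as a source or a target of a link.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Each page starts with the share reserved for a "random jump".
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its own score, split evenly, to the pages it
                # points to, so a link from a highly ranked page is worth more.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its score
                # evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical toy web: C is linked from A, B and D; A is linked from C and D.
toy_web = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C", "A"]}
scores = pagerank(toy_web)
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

In this toy graph C, which has the most incoming links, receives the highest score, and A ranks second largely because C, a highly ranked page, links to it; B and D, which nothing links to, keep only the random-jump share.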
Google is undeniably one of the leaders in enhanced web search features that operate with
shortcuts directly from the search box. These “bells and whistles” are small applications that
give real-time information in response to brief search terms, such as a stock quote when a stock
symbol is entered, or FedEx tracking information when a tracking number is entered
(Google, 2009f).
Ask was founded as AskJeeves.com in 1996. Like Google, it allows searching from various
subject categories, and its Advanced Search page has features similar to Google’s, allowing
a query to be defined by “all the words, at least one of the words, the exact phrase, none
of the words”, language, specific domain, exact phrases, date modified, location of key words,
and region (Ask, 2009a).
Searches are case-insensitive, word order matters, and spelling is automatically corrected
(Ask, 2009c). Though some of the operators listed by Google are not specifically listed in Ask’s
Advanced Search Tips, Ask has developed even more Boolean-like operators than Google,
which make it possible to limit searches not only to specific URLs, but also to specific date
ranges or to pages with the search terms in the titles or in hyperlinked text (Ask, 2009b). Oddly, it
is never mentioned that AND is the default operator in Ask, though spot-testing shows that it is.
Like Google, Ask produces variable results when a blank space between terms is compared to
use of the AND operator, especially when the two terms are very common words.
The Site Features section in Ask also lists a number of enhanced shortcut features, though
their operation can be confusing, since some are reached through menu categories, while others
are activated with a keyword in the search terms (Ask, 2009e).
Ask also uses a proprietary algorithm, here called ExpertRank™, that relies on a “clustering
concept of subject-specific popularity”, ranking hits by the number of pages that link to a
site while weighing which of those pointer pages are more authoritative (Ask, 2009d). The
method is not explained in more detail, though the description sounds very similar to Google’s
PageRank™ technology.
Gigablast offers a stripped-down search engine whose appearance is less commercial than
Google’s. Gigablast does not include any of the automatic shortcut functions for weather, stock
market, etc., found on Google or Ask, but it does offer subject directories, though they were not
functional in late February and early March, while this paper was being researched.
Gigablast’s Advanced Search can restrict queries to “all these words, any
of these words, this exact phrase, none of these words” (Gigablast, 2009a). Searches can also be
limited to a specific URL, a specific site, or pages linked to a specific URL, and one can choose to
enable site clustering.
The “Query Syntax” section of Gigablast explains the Boolean operators permitted, which
are mostly comparable to those in Google and Ask (Gigablast, 2009b). The AND operator is the
default, but it is applied in a very specific way. For example, with two terms, preference is given
to occurrences of the two terms next to each other. If one wants to avoid giving preference to
adjacent terms, the operator [term .. term] can be used.
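Gigablast’s documentation, as summarized here, describes this preference for adjacent terms but not how it is computed. The Python sketch below is a purely hypothetical illustration of the behavior: a page must contain both terms to match, pages where the terms appear side by side receive an invented bonus point, and a `prefer_adjacent=False` flag stands in for the [term .. term] syntax. The function names, weights, and sample pages are all assumptions.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def and_score(text, first, second, prefer_adjacent=True):
    """Hypothetical score for an implicit AND of two terms.

    0 means the page does not match. Matching pages score 1, plus an
    invented bonus of 1 when `first` is immediately followed by `second`
    and prefer_adjacent is True (the flag models [term .. term]).
    """
    words = tokenize(text)
    if first not in words or second not in words:
        return 0
    score = 1
    adjacent = any(a == first and b == second for a, b in zip(words, words[1:]))
    if prefer_adjacent and adjacent:
        score += 1
    return score


pages = {
    "adjacent": "New data on student achievement in rural schools",
    "separated": "Each student was tested, and achievement rose in spring",
    "one term": "Student housing options for the fall term",
}
for flag in (True, False):
    print(flag, {name: and_score(text, "student", "achievement", flag)
                 for name, text in pages.items()})
```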
An OR operator actually gives preference to hits with both terms, which appears to be a
strategy to increase relevance. Parentheses are said to be optional, and indeed, a test shows that
when parentheses are omitted, both AND and a blank space bind the two ANDed terms together
before another operator, such as OR, is applied. For example, a search with [soup AND shoes
OR train] yielded results whose top ten were devoted to national train companies, with no
instances of “soup” or “shoe”, apparently because the algorithm gave greater weight to the OR term
it considered most important. When parentheses are used, however, they define the order in
which the operators are applied, instead of the default left-to-right “AND-first” logic that appears
to be used here. Thus, a sample of the hits from [soup AND (shoes OR train)] included pages
with “soup” and “shoes”, or “soup” and “train”, and even some hits with only the word “soup”.
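The grouping observed above corresponds to a grammar in which AND binds more tightly than OR. The Python sketch below parses queries under that assumed precedence and evaluates them as strict Boolean filters over a page’s words. It models only which pages match, not Gigablast’s ranking, so pages containing only “soup”, which Gigablast also returned, fall outside it. The functions and the sample page are hypothetical.

```python
import re


def parse(query):
    """Parse a Boolean query, giving AND higher precedence than OR.

    Assumed grammar, modeled on the behavior observed in the tests above:
        or_expr  := and_expr ("OR" and_expr)*
        and_expr := atom (["AND"] atom)*   # a blank between terms also means AND
        atom     := WORD | "(" or_expr ")"
    Returns nested tuples such as ("OR", ("AND", "soup", "shoes"), "train").
    """
    tokens = re.findall(r"[()]|[^\s()]+", query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        pos += 1
        return tokens[pos - 1]

    def or_expr():
        node = and_expr()
        while peek() == "OR":
            take()
            node = ("OR", node, and_expr())
        return node

    def and_expr():
        node = atom()
        while peek() not in (None, "OR", ")"):
            if peek() == "AND":
                take()
            node = ("AND", node, atom())
        return node

    def atom():
        token = take()
        if token in (")", "AND", "OR"):
            raise ValueError(f"unexpected {token!r}")
        if token == "(":
            node = or_expr()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return node
        return token.lower()

    return or_expr()


def matches(node, words):
    """Strict Boolean evaluation against the set of words on a page."""
    if isinstance(node, str):
        return node in words
    op, left, right = node
    if op == "AND":
        return matches(left, words) and matches(right, words)
    return matches(left, words) or matches(right, words)


train_page = {"national", "train", "companies"}
print(parse("soup AND shoes OR train"))
print(matches(parse("soup AND shoes OR train"), train_page))
print(matches(parse("soup AND (shoes OR train)"), train_page))
```

Under AND-first grouping, a page about trains satisfies [soup AND shoes OR train] on the strength of “train” alone, which is consistent with the top ten hits observed above; adding parentheses makes “soup” mandatory.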
Gigablast does not provide operators for limiting documents by date as Ask does, but one can
restrict hits by format, such as .doc or .xml. There is one unusual operator worthy of mention,
which searches first by a primary term, then ranks all hits by a second term. Gigablast offers no
explanation of how its algorithm functions.
Dogpile is a metasearch engine that combines results from Google, Yahoo, MSN Live Search,
and Ask. Dogpile includes sponsored links mixed among the results, though each is labeled as
such. Dogpile’s Advanced Search shows search boxes which handle the operators “all these
words, any of these words, the exact phrase, none of these words”, as well as restrictions to a
specific domain and a specific language (Dogpile, 2009a). The main search page provides tabbed
categories for limiting searches to specific formats (images or music), or subjects (news, yellow
pages, and white pages).
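Dogpile does not explain how it merges or ranks results from its sources. One widely used technique for combining ranked lists from several engines is reciprocal rank fusion, and the Python sketch below illustrates it with hypothetical result lists. It is offered only as an example of how a metasearch engine might merge and de-duplicate results, not as Dogpile’s method.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists, k=60):
    """Merge several ranked lists of URLs into one de-duplicated list.

    Each URL earns 1 / (k + rank) from every list it appears in (ranks start
    at 1), so pages ranked highly by several engines rise to the top.
    k=60 is the constant proposed by the method's authors.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical first-page results from three engines.
engine_a = ["a.edu", "b.org", "c.gov"]
engine_b = ["b.org", "d.com", "a.edu"]
engine_c = ["e.org", "b.org"]
print(reciprocal_rank_fusion([engine_a, engine_b, engine_c]))
```

Here b.org, returned by all three hypothetical engines, comes first, and a.edu, returned by two, comes second, while pages seen by only one engine trail behind.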
The “Metasearch 101” section explains the rationale for a metasearch engine (Dogpile,
2009c). It links to a study that Dogpile carried out in collaboration with the University of
Pittsburgh and the Pennsylvania State University in 2007, which found that less than one percent of
first-page results on a given search query overlapped among the four major search engines
(Dogpile, 2008). The implication is that with so little overlap, any claims of highly relevant
results by competitors are suspect. The “InfoSpace” section mentions Dogpile’s InfoSpace
proprietary technology as the software behind the search, but does not explain how the search
engine works, or how it ranks results (Dogpile, 2009b). A quick test shows that AND is at least
partly a default, but in contrast to the other engines discussed here, Dogpile does not appear to
allow Boolean operators, and only provides refined searches through its advanced search page.
Indeed, using AND as an explicit operator with two terms gives inconsistent results, because the
AND appears bolded in the results, suggesting that it is interpreted not as an operator, but at least
sometimes as an actual search term. Using OR in a search with two terms actually reduces hits,
suggesting that OR is not treated as a Boolean operator either. When a search was performed from
the advanced search box, however, the exact phrase then appeared in the general search box with
apostrophes placed around it, and a minus sign appeared before the “none of these words” term.
Subsequent testing shows that apostrophes and minus signs are indeed active operators.
Dogpile is not consistently case-insensitive, in contrast to the other three engines, and does
not always ignore “stop words”. The most notable difference seen when using Dogpile is that the
total number of hits in a search is not indicated on the page, and that the number of hits returned
is frequently several orders of magnitude smaller than with products like Google.
Comparison of Search Results among Search Engines
To analyze how the different search engines handle queries, a fairly limited topic
containing multiple terms was chosen. The goal of the query was to find out whether research shows
that there is a correlation between playing violent video games and achievement among high
school students. The search began with general terms and was refined in four steps. Each
separate search is introduced by a heading giving the search query, set off by brackets to frame
exactly what was entered in the search boxes; the brackets are not to be confused with nesting
parentheses.
Search Query: [student AND achievement]
Searching began with general terms, to give an idea of how each engine handled the
AND operator [student AND achievement]. Google produced more than 38,000,000 hits,
covering broad areas such as technology and student achievement, analysis of student
achievement, and public policy related to student achievement. Of the top 20 hits, almost all
were relevant in that they discussed the general issue of student achievement. Several documents
even addressed the specific issue of factors related to student achievement, such as teacher
quality, poverty, class size, and library use. Most of the sampled results were from the .org,
.gov, and .edu domains, though a few were from .com sites, such as journal databases. Hits
included pages with the terms as a single phrase or separated in the document, and included the
plural “students”, showing that alternate forms of the terms will be returned.
Searching for the same two terms without the AND raised the number to 200,000,000, but
at the same time seemed to favor documents that had occurrences of the terms as a phrase. One
possible explanation for the roughly five-fold increase in hits is that the search default without the
written AND is actually acting partially as an OR, thus including many pages with only one of the
terms. Due to the volume of hits, it was difficult to confirm this theory.
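The reasoning behind this explanation can be made concrete by treating each term’s matches as a set of pages: an AND returns the intersection, which can be no larger than the rarer term’s set, while an OR returns the union, which is at least as large as the more common term’s set. A default that behaves partly like OR would therefore push the count up. The five-page corpus below is hypothetical, and the totals displayed by web engines are generally approximations rather than exact counts like these.

```python
import re
from collections import defaultdict

# Hypothetical five-page corpus.
docs = {
    1: "Student achievement and teacher quality",
    2: "Student housing guide",
    3: "Closing the achievement gap",
    4: "Student achievement in mathematics",
    5: "Campus news",
}

# Inverted index: each word maps to the set of pages that contain it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for word in re.findall(r"[a-z]+", text.lower()):
        index[word].add(doc_id)

and_hits = index["student"] & index["achievement"]  # intersection
or_hits = index["student"] | index["achievement"]   # union
print("AND:", sorted(and_hits), "OR:", sorted(or_hits))
```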
The [student AND achievement] search yielded fewer hits on Ask, roughly 10,000,000.
The sponsored results, which appeared at the top, were irrelevant, but did not affect precision,
because they were not counted in the number of hits retrieved. Since all the sampled hits had to
do with the general topic of student achievement, they were relevant. The hits included the
search terms singly or as a phrase, and when the search was repeated without the AND operator,
the number of results was unchanged, though the ranking shifted a bit.
Gigablast retrieved 505,000 hits, many fewer than the other two search engines, and
included the search terms separately and together. A search without the AND operator increased
the hits slightly, about 7%, and seemed more likely to rank instances of the full phrase first. Most
results could be considered relevant.
Dogpile gave 71 hits, a huge reduction from all other search engines. Because sponsored
results are counted as hits, and these were largely irrelevant, the precision rate dropped to 65%
(of the first 20 surveyed). AND was not tested as an operator, since it had already been determined
that Dogpile does not support Boolean searches.
Search Query: [“student achievement” AND “high school students”]
In an effort to reduce the hits to an order of thousands, the next search query narrowed the
focus using two exact phrases framed in quotes, separated by an AND operator. Google returned
260,000 hits, and all of the first 20 appeared relevant to the query. Ask, by contrast, returned
60,900 hits, whose first 20 could also be considered relevant in light of the search terms.
Though Ask appeared to interpret the exact phrase operator correctly, single words were
highlighted in the results, and looking at some individual hits showed that exact phrases may not
actually be part of the document body, but may be taken from surrounding text or larger page
headings or topics.
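The difference between honoring an exact phrase and merely finding its individual words can be sketched as follows. This is a minimal Python illustration of strict phrase matching on hypothetical text, not any engine’s implementation; the third example shows that strict matching also rejects the singular “high school student”.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_all_words(text, phrase):
    """True if every word of the phrase occurs somewhere in the text."""
    words = set(tokenize(text))
    return all(word in words for word in tokenize(phrase))


def contains_phrase(text, phrase):
    """True only if the phrase's words occur consecutively, in order."""
    words, target = tokenize(text), tokenize(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


exact = "A survey of high school students and their achievement"
scattered = "The school library is open to students during high-stakes testing"
singular = "Each high school student completed the survey"
for text in (exact, scattered, singular):
    print(contains_all_words(text, "high school students"),
          contains_phrase(text, "high school students"))
```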
Gigablast, by contrast, returned 1,100,000 hits, far more than even the previous, more
general search query, implying either that exact phrases are not always respected, or that OR is
somehow in effect. Once again, due to the sheer number of returns, the exact reason for this
anomaly could not be determined.
Carrying out the search in Dogpile required some adjustments in order to use two separate
exact phrases, because the Advanced Search box does not allow more than one exact phrase, and
Boolean operators seem not to work in the general search box. However, it was observed that
when an exact phrase search was done in the Advanced Search box, the terms appeared in the
general search box in apostrophes. A trial was conducted in the general search box using the two
separate exact phrases, each enclosed in apostrophes (without the AND operator), which seemed
to work correctly. It returned 64 hits. Once again, the number of irrelevant sponsored results
clogged the page and lowered the precision rate, but the non-sponsored results surveyed were
appropriate to the query. Further testing showed that quotes appeared to function as a Boolean
operator to create exact phrases as well.
This search brought to light a problem with all the search engines’ Advanced Search
functions. Though all four companies provide boxes in their Advanced Search window to allow
queries with the equivalent of AND, OR, and exact phrase, a single exact phrase box limits
queries to only one exact phrase at a time (except for Gigablast, which has two spaces). In
attempting to find a fix for this issue, the previous search [“student achievement” AND “high
school students”] was repeated in Google. From this results page, the Advanced Search was
opened to show how those terms had been placed. The second phrase appeared in the “all these
words” box, separated by hyphens, and the first phrase appeared in the “this exact wording” box.
The search was then retried in the general search box using these operators (second phrase with
hyphens, first phrase with quotes), as in [“student achievement” AND high-school-students],
with and without nesting parentheses. The search produced double the number of hits, 500,000.
Many of the sampled hits were relevant, though the first results were inconsistent in terms of
respecting the two separate phrases. This outcome shows that words joined by hyphens are not a
useful substitute for quotes as an exact phrase operator in Google. This could lead one to
conclude that more than one exact phrase cannot be used in the Advanced Search area, and that
unless Boolean formulas can be used to AND or OR multiple exact phrases in the general search
box, such a search cannot be done. Additionally, if Boolean operators are used in the general
search box, it is usually not possible to combine them with additional limiters, such as date or
language, from within the Advanced Search box, because searching first from the general search
box, and then repeating the search using the terms as they were automatically placed in the
Advanced Search section, produced inconsistent results.
At this point, it seemed advisable to investigate how other search engines handle the shift
between a search in the general search box, and a search using terms the way they automatically
appear in the Advanced Search box after the search is performed from the general search box. In
Ask, the query in the general search box [“student achievement” AND “high school students”]
appeared in Advanced Search with the Boolean operators and the terms completely intact within
the ALL THE WORDS box. Toggling back to the general search preserved the search results.
In Gigablast, the Advanced Search area is not accessible from the results page, and repeating the
identical search from the Advanced Search page within the “all of these words” box produced a
blank page. Dogpile, as previously mentioned, will reproduce both quotes as well as the
apostrophe operators in the “all of these” box in the Advanced Search Page after an exact phrase
search is done from the general search page.
Returning to Google to test whether the same operators work within the Advanced Search
page, exact phrases with quotes were tested inside the “all these words” box, which gave the
same results as the original search from the general search box, though toggling back a second
time to Advanced Search caused the results to double again. It should be noted that the other
three search engines’ results shifted very slightly when toggling back and forth between general
and Advanced searches.
Search Query: [research AND “student achievement” AND “high school students” AND
“violent video games”]
Some additional terms were added to limit the results to research, or references to research,
about the relationship of playing violent video games to achievement among high school
students. Google returned 228 hits, the first of which were appropriate to the search terms given.
Ask returned 1420 hits, but not all of the top ten respected the exact phrases. Some terms were
found in titles or tabs unrelated to the body text, and others were not found on the page at all,
which suggested that these pages were returned because the phrases were found in linking
documents. Gigablast returned 97 hits, with results similar to those of Ask in terms of its
inconsistency in respecting exact phrases. Gigablast had several broken links in the top 10 alone,
which suggests that it is updated less frequently than other engines. Dogpile returned 47 hits,
and actually seemed to do the best job of respecting exact phrases, when compared to Ask and
Gigablast.
Viewing a sampling of the hits in their entirety in the four search engines showed another
effect of the search using multiple terms. The hits returned were often pages with several articles
listed or abstracted on a single page, each one with one or more of the exact phrases. This makes
it possible for a result to be relevant from the point of view of a search engine, but useless to the
researcher, since the search terms are not always related in a single document.
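The effect can be reproduced in a few lines: a page that merely concatenates unrelated abstracts satisfies a page-level AND of the three exact phrases even though no single abstract contains all of them. The Python sketch below uses hypothetical abstracts; checking each passage separately, as in the last line, is one way such pages could in principle be filtered out.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(text, phrase):
    """True only if the phrase's words occur consecutively, in order."""
    words, target = tokenize(text), tokenize(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def matches_all(text, phrases):
    """Page-level AND: every phrase must appear somewhere in the text."""
    return all(contains_phrase(text, p) for p in phrases)


# Hypothetical list page made of three unrelated abstracts.
abstracts = [
    "A district report links student achievement to class size.",
    "A survey of high school students and part-time jobs.",
    "A lab study of violent video games and short-term arousal.",
]
list_page = " ".join(abstracts)
query = ["student achievement", "high school students", "violent video games"]

print("whole page matches:", matches_all(list_page, query))
print("any single abstract matches:", any(matches_all(a, query) for a in abstracts))
```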
It was also seen that many “relevant” hits tended to focus on the relationship between video
games and aggressive behavior (which may be the predominant connection in the literature), and
not on video games and student achievement. This suggested that terms related to aggression should
be excluded to further limit the field. The risk, of course, is that documents that address the topic
of violent video games as a factor influencing school achievement may also mention aggression,
and some useful documents may be missed, but it seemed a useful experiment.
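The exclusion operator (the minus sign) acts as a set difference, and the trade-off described above can be shown with hypothetical pages: excluding “aggression” removes an aggression-focused page, but it also removes a relevant study that happens to mention aggression. The pages, terms, and relevance judgments in this Python sketch are invented for illustration.

```python
# Hypothetical pages, each reduced to the set of terms it contains.
pages = {
    "study_a": {"violent", "video", "games", "achievement", "aggression"},
    "study_b": {"violent", "video", "games", "achievement", "aggression"},
    "study_c": {"violent", "video", "games", "achievement"},
}
# Hypothetical relevance judgments: study_b is about aggression and
# mentions achievement only in passing.
relevant = {"study_a", "study_c"}


def search(pages, required, excluded=()):
    """Pages containing every required term and none of the excluded terms."""
    return {name for name, terms in pages.items()
            if set(required) <= terms and not (set(excluded) & terms)}


required = ["violent", "video", "games", "achievement"]
before = search(pages, required)
after = search(pages, required, excluded=["aggression"])
print("before:", sorted(before), "after:", sorted(after))
print("relevant pages lost:", sorted((before - after) & relevant))
```

Because study_a and study_b contain the same terms, term-level exclusion cannot keep one and drop the other; only reading the pages reveals the difference.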
Search Query: [research AND “student achievement” AND “high school students” AND
“violent video games” -aggression]
Excluding the term “aggression” in Google returned 118 hits. When the first 30 were analyzed by
viewing each page individually, no more than 4 out of 30 hits could be considered truly relevant
in answering the research query, for a precision rate of 13%. The reason for this is perhaps at the
core of why searching the web is very limited for some purposes. Almost all the pages were
related in some way to education, but they invariably met the search conditions of matching
multiple exact phrases by finding pages in which the phrases appeared in unrelated areas. The
most common kind of page returned was a collection of short news articles, or abstracts and lists
of articles and documents, in which each abstract contained at least one, but rarely all, of the
terms. These hits generally did not contain any information relevant to the intent of the query,
though the pages retrieved met the conditions of the search.
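The precision rates reported in this section are simple ratios of results judged relevant to results examined; they describe only the sample that was viewed (here the first 30), not all 118 hits. A minimal Python sketch of the arithmetic, using the Google figures reported above:

```python
def precision_percent(relevant_found, examined):
    """Share of the examined results judged relevant, as a percentage."""
    if examined <= 0:
        raise ValueError("at least one result must be examined")
    return 100.0 * relevant_found / examined


print(round(precision_percent(4, 30)))  # Google sample above: 13
```

The same function gives 14% for 1 relevant hit out of 7, the Gigablast figure reported below.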
Ask returned 35 hits. Of the first 30, there was a 63% overlap with Google. Viewing each
of the first 30 pages gave a relevance of approximately 10%. When analyzing the sample pages
for relevance individually, several odd results were seen. For example, the abstract of one
particular page seemed to be relevant, since it mentioned the following phrase: “the frequency
and type of video games played appears to parallel risky drug and alcohol use”. When the link
was opened, however, neither the abstract phrase nor the original search terms could be found on
the page itself. Where the text actually came from is unclear; it may have been drawn from the
page metadata, or perhaps from a link on a different page.
Gigablast returned only 7 hits, 4 of which were also returned by Google; with so few hits, a
meaningful overlap rate is difficult to calculate. The relevance rate was approximately 1 out of 7,
or 14%.
Dogpile returned 28 hits, with a 68% overlap with Google, a rate similar to Ask’s. Looking
at each page shows a precision rate of 14%, not very different from the other products.
Conclusion
While all four search engines were able to handle a multi-term query adequately with
AND, NOT, and exact phrase operators, the “intelligent” searching algorithms did not always
result in the best resources being retrieved. Chief among the reasons may be the fact that these
products often are not absolute in respecting exact phrases. The ability to look beyond the strict
exact phrase is a valuable feature, especially when it comes to finding logical alternate forms of
words within exact phrases, as in “high school students” vs. “high school student”, and “violent
video games” vs. “violence in video games”. Without such capability, very few documents, if
any, might match the queries exactly, and the searcher would be required to know in
advance what combinations of search terms actually exist in documents. On the other hand, the
way in which the algorithms return unpredictable blends of the requested terms sometimes leads
to lower precision. Alternatively, perhaps the exact phrases were not respected because
there were very few pages in which they all occurred, and the search engines were
making the next best choice.
It is also likely that a real flaw in this exercise lay in the search query itself, which
contained a few implicit assumptions. The first was that some of the available literature would
express an association between student achievement and violent video games, which may not be
the case. The second was that removing “aggression” from the query might lead to more accurate
hits. The results suggest that many of the current sources about students playing violent video
games discuss the subject in the context of aggressive behavior, with or without any
mention of student achievement. It is entirely possible that excluding this term eliminated useful
material.
A second possible failing is the relationship between the kind of query chosen and the
intrinsic nature of much of the information found by web search engines. As previously
mentioned, web search engines find matching or related terms on a page, in metadata, and in
page links, and do not seem reliable in distinguishing between lists of unrelated articles or
abstracts on a page and a single document in which all the search conditions apply, in spite of
the highly touted algorithms, which purport to carry out an instant linguistic analysis. While one
of the engines does allow one to restrict search terms to within a certain distance of each other,
that option was not available on all engines. One conclusion that can be drawn is that this
particular search was not appropriate to a web search engine, and would have been better suited
to an article database or an OPAC, both of which generally match terms with single sources.
Perhaps the most important lesson learned here is a cautionary tale for any researcher who may
believe that the best information is always found on the web.
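A proximity restriction of the kind mentioned above can be sketched in a few lines. The Python example below is a hypothetical illustration limited to single-word terms; it does not reproduce any engine’s operator syntax, and the distance threshold and sample texts are assumptions. It shows how such a limit separates a sentence that connects two concepts from a list page where they occur in unrelated items.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def within_distance(text, first, second, max_gap):
    """True if some occurrence of `first` lies within `max_gap` word
    positions of some occurrence of `second`, in either order."""
    words = tokenize(text)
    first_positions = [i for i, w in enumerate(words) if w == first]
    second_positions = [i for i, w in enumerate(words) if w == second]
    return any(abs(i - j) <= max_gap
               for i in first_positions for j in second_positions)


near = "Playing violent games was linked to lower achievement in the study"
far = ("Games review roundup for the holiday season with ten new titles and "
       "prices. School board notes: achievement scores rose.")
print(within_distance(near, "games", "achievement", max_gap=8))
print(within_distance(far, "games", "achievement", max_gap=8))
```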
Since the abstracts retrieved generally did not give enough information to make a judgment
about the relevance of hits obtained, each of the top 30 pages had to be viewed individually.
Google showed an approximate relevance rate of 13%, Ask of 10%, Gigablast of 14% (from a
very small sample), and Dogpile of 14%. These relative rates, however, may be inaccurate.
For example, a particular page from the Center on Media and Child Health, returned by Ask, was
not relevant because it did not connect the concepts in the search query, but following a link on
that page for “violent video games” brought up a page of extremely relevant studies discussing
the relationship to school performance. Finding the link seemed more fortuitous than an actual
instance of a relevant result from the search engine, though the general site was a promising
source.
The overlap in results returned among the products is as follows: Google and Ask, 63%;
Google and Gigablast, 57% (unscientifically extrapolated from 4 out of the total 7 results);
Google and Dogpile, 68%; Ask and Gigablast, 43% (extrapolated from 3 out of 7 results); Ask and
Dogpile, 64%; and Gigablast and Dogpile, 43% (extrapolated from 3 out of 7 results).
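The overlap figures are not defined explicitly above; the reading most consistent with them is the share of one engine’s results that also appear in the other engine’s results, which reproduces 57% for 4 of 7. The Python sketch below assumes that definition and uses placeholder URLs. Because the denominator changes with the direction of comparison, the same four shared pages give 57% relative to Gigablast’s 7 hits but only 13% relative to a hypothetical list of 30 Google hits.

```python
def overlap_percent(results, reference):
    """Percentage of `results` that also appear in `reference`.

    The measure is asymmetric: the denominator is the length of `results`.
    """
    if not results:
        raise ValueError("results must not be empty")
    reference_set = set(reference)
    shared = sum(1 for url in results if url in reference_set)
    return 100.0 * shared / len(results)


# Placeholder URLs: 7 Gigablast hits, 4 of them among 30 hypothetical Google hits.
gigablast = [f"page{i}" for i in range(7)]
google = [f"page{i}" for i in range(4)] + [f"other{i}" for i in range(26)]
print(round(overlap_percent(gigablast, google)))  # 57
print(round(overlap_percent(google, gigablast)))  # 13
```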
This particular exercise does not make it possible to designate one search engine as
preferable to another, since the precision was rather low overall. What does seem clear to this
researcher is that all the engines tested use complicated strategies for determining relevance,
making it difficult to decipher the exact functioning of the algorithm from these “black box”
results. Doing so would require analyzing in detail the entire contents of each page returned. The
fairly high rates of overlap between Google, Ask, and Gigablast, and their comparable precision,
suggest that these products have access to many of the same resources. Google is somewhat
faster to respond than the other engines tested, especially if Google’s Chrome browser is used.
Dogpile’s inability to use many Boolean operators sets it a bit outside the mainstream in terms of
ease of use, while Gigablast’s behavior with exact phrase terms makes it quirky and somewhat
unpredictable.
References
Ask. (2009a). Advanced search. Retrieved 11 March, 2009, at
http://www.ask.com/?o=0&l=dir
Ask. (2009b). Advanced search tips. Retrieved 11 March, 2009, at
http://about.ask.com/en/docs/about/adv_search_tips.shtml
Ask. (2009c). Ask.com search tips. Retrieved 11 March, 2009, at
http://about.ask.com/en/docs/about/search_tips.shtml
Ask. (2009d). Ask search technology. Retrieved 11 March, 2009, at
http://about.ask.com/en/docs/about/webmasters.shtml
Ask. (2009e). Site Features. Retrieved 11 March, 2009, at
http://about.ask.com/en/docs/about/site_features_a11.shtml
Dogpile. (2009a). Advanced search. Retrieved 11 March, 2009, at
www.dogpile.com
Dogpile. (2008). Different Engines, Different Results Web: Searchers Not Always Finding What
They’re Looking for Online: A Research Study by Dogpile.com. Retrieved February 28, 2009 at:
http://www.infospaceinc.com/onlineprod/Overlap-DifferentEnginesDifferentResults.pdf
Dogpile. (2009c). Infospace. Retrieved 11 March, 2009, at
http://www.infospaceinc.com/ourstory/default.aspx
Dogpile. (2009d). Metasearch 101. Retrieved 11 March, 2009, at
http://www.dogpile.com/rescuefctb/ws/metasearch/_iceUrlFlag=11?_IceUrl=true
Gigablast. (2009a). Advanced search. Retrieved March 11, 2009, from http://gigablast.com/adv.html
Gigablast. (2009b). Query syntax. Retrieved March 11, 2009, from http://gigablast.com/help.html
Google. (2009a). Advanced search. Retrieved March 11, 2009, from http://www.google.com/advanced_search?hl=en
Google. (2009b). Corporate information: Technology overview. Retrieved March 11, 2009, from http://www.google.com/corporate/tech.html
Google. (2009c). Google search basics: Basic search help. Retrieved March 11, 2009, from http://www.google.com/support/websearch/bin/answer.py?answer=134479&topic=351
Google. (2009d). Google search basics: More search help. Retrieved March 11, 2009, from http://www.google.com/support/websearch/bin/answer.py?hl=en&answer=136861
Google. (2009f). Search features. Retrieved March 11, 2009, from http://www.google.com/intl/en/help/features.htm
Appendix 1
Comparison of Search Results of Four Search Engines
Search query 1: [student AND achievement]
Number of hits: Google.com 38,700,000; Ask.com 9,480,000; Gigablast.com 429,404; Dogpile.com 71
Notes:
Google.com: Without the AND operator, up to 200,000,000 hits; favors the whole phrase; plural form found.
Ask.com: Without the AND operator, little difference.
Gigablast.com: Without the AND operator, up to 538,000 hits; favors the whole phrase.
Dogpile.com: AND not used, since it is not an operator; sponsored hits are irrelevant and are included in the count.
Search query 2: [“student achievement” AND “high school students”]
Number of hits: Google.com 260,000; Ask.com 60,900; Gigablast.com 9,586; Dogpile.com 64
Notes:
Google.com: Quotes work correctly for exact phrases; without the AND operator, little difference.
Ask.com: Quotes work correctly, but single words are also highlighted, and phrases may not be part of the document body.
Gigablast.com: Quotes work correctly (though some attempts were a little buggy), but single words are also highlighted.
Dogpile.com: AND not used; apostrophes used to include two exact phrases, because the advanced search has only one exact-phrase box. Search results not exact (plurals changed); sponsored results lower precision.
Search query 3: [research AND “student achievement” AND “high school students” AND "violent video games"]
Number of hits: Google.com 228; Ask.com 1,420; Gigablast.com 97; Dogpile.com 47
Notes:
Google.com: Noted that results link student achievement with aggression.
Ask.com: Quotes work correctly, but the exact phrase is not respected in all results; single words are also highlighted, and phrases may not be part of the document body. Tends to show more exact phrases in abstracts.
Gigablast.com: Quotes work correctly, but phrases may not be part of the document body, and single words are also highlighted. Some results do not contain all phrases; several links are broken.
Dogpile.com: AND not used; apostrophes used to include the exact phrases (see previous search). Appeared more consistent in respecting exact phrases. Sponsored results lower precision.
Search query 4: [research AND “student achievement” AND “high school students” AND "violent video games" -aggression]
Number of hits: Google.com 117; Ask.com 35; Gigablast.com 7; Dogpile.com 28
Relevance of top 30 hits: Google.com 4/30 (13%); Ask.com 3/30 (10%); Gigablast.com 1/7 (14%); Dogpile.com 4/28 (14%)
Notes:
Google.com, Ask.com, and Gigablast.com: Many hits are not single documents, but rather lists of articles, abstracts, blog headings, etc.
Gigablast.com: MANY fewer results than the other products.
Dogpile.com: Quotes used, no AND. In the general search, when the minus sign is used to exclude the term “aggression,” the exclusion does not appear in the Advanced Search form. However, when the entire query is entered from within the Advanced Search, with quotes for exact phrases and no AND operator, toggling back to the general search shows the excluded term with a minus sign.
Overlap:
Google/Ask 17/30 (63%)
Google/Gigablast 4/7
Google/Dogpile 19/28 (68%)
Ask/Gigablast 3/7
Ask/Dogpile 18/28 (64%)
Gigablast/Dogpile 3/7
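The relevance and overlap figures above are simple ratios. The sketch below shows one way the two measures could be computed. It assumes that each engine's top hits are recorded as a list of URLs and that each hit is judged relevant or not by hand. The function names, the sample URLs, and the choice of the shorter list as the overlap denominator are hypothetical. That denominator choice is how a figure such as 4/7 would arise when Gigablast returned only seven hits. This is not necessarily the exact procedure used in this study.

```python
def relevance_ratio(judgments: list[bool]) -> tuple[int, int, int]:
    """Return (relevant hits, hits examined, percentage rounded to a whole number)."""
    relevant = sum(judgments)  # True counts as 1, False as 0
    examined = len(judgments)
    percent = round(100 * relevant / examined) if examined else 0
    return relevant, examined, percent


def overlap(results_a: list[str], results_b: list[str], k: int = 30) -> tuple[int, int]:
    """Return (shared URLs, comparison size) for the top-k hits of two engines.

    Assumption: the comparison size is the shorter of the two top-k lists,
    e.g. 7 when one engine returned only 7 hits. URLs are compared as exact
    strings; a real comparison would first normalize them.
    """
    top_a, top_b = results_a[:k], results_b[:k]
    shared = set(top_a) & set(top_b)
    return len(shared), min(len(top_a), len(top_b))


if __name__ == "__main__":
    # Hypothetical judgments: 4 relevant hits among the top 30 examined.
    print(relevance_ratio([True] * 4 + [False] * 26))  # (4, 30, 13)
    # Hypothetical result lists for two engines.
    engine_a = ["a.example/1", "b.example/2", "c.example/3"]
    engine_b = ["b.example/2", "c.example/3", "d.example/4", "e.example/5"]
    print(overlap(engine_a, engine_b))  # (2, 3)
```

Rounding to the nearest whole percent reproduces the relevance figures in the table; for example, 1/7 is reported as 14%.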
Appendix 2
Comparison of Functionality of Four Search Engines
The information in the chart below was culled from the four search engines’ documentation. Individual functions have not been tested to determine whether they work, except for the search queries carried out and analyzed in the body of this paper.
Legend for chart:
Normal text indicates that the feature is explicitly listed as operable.
(Grayed-out) Indicates that the feature is not explicitly listed as being operable, and has not been tested.
* Indicates that the feature is not explicitly listed as being operable, but has been determined to be operable through a trial.
[ ] Brackets surround exact search query as entered in search box.
___ Blanks indicate the search query terms (operands) used with Boolean operators.
1. Default is AND, but preference is given to hits in which the two terms appear next to each other. To avoid this preference, separate the operands with two periods: [term .. term]. When used without parentheses, both AND and a blank group the two AND-ed terms together before any following operator, such as OR, is applied. The documentation says parentheses are optional, but omitting them is confusing in some cases.
2. The OR operator gives preference to hits containing both terms.
3. Default is AND, but if the AND operator is actually typed, the number of search results may change.
Search engine→ Google.com Ask.com Gigablast.com Dogpile.com
Features
documented ↓
Search categories
Google.com: Includes categories to limit searches (e.g., Web, Video, Image, Groups, Products, Blogs, News, and MANY more)
Ask.com: Includes categories to limit searches (e.g., Web, Video, Image, Groups, Products, Blogs, News, and MANY more)
Gigablast.com: Includes categories to limit searches (Arts, Games, Kids and Teens, etc.)
Dogpile.com: Includes categories to limit searches (e.g., Web, Video, Image, Music, News, Yellow Pages, White Pages)
Boolean search?
Google.com:
Allows special operators (see below)
AND operator by default
Ask.com:
-Allows special operators (see below)
-AND operator by default*
Gigablast.com:
-Allows special operators (see below)
-AND operator by default (note 1)
Dogpile.com:
Does not appear to consistently allow special operators (note 5)
-AND operator by default*
Special query operators allowed
Google.com:
[ ___ AND ___ ] to show both terms (note 3)
[+__ ] to include commonly ignored words, and to search word precisely (no space, precede by space)
related:(name of website)
[ -___ ] to exclude words (no space, precede by space)
[ OR ___ ] to show at least one of two separated terms (include space)
[~__ ] synonym
[__*__ ] wildcard, whole words only
[ “ __”] only exact word (same as +), in phrase, exact words in exact order
[ ___ site:(site name)] term appears in specific site
Ask.com:
-[ ___ AND ___ ] to show both terms* (note 3)
-[+__ ] to include commonly ignored words, and to search word precisely (no space, precede by space)
related:(name of website)
-[ -___ ] to exclude words (no space, precede by space)
-[ OR ___ ] to show at least one of two separated terms (include space)
[~__ ] synonym
[__*__ ] wildcard, whole words only
-[ “ __”] only exact word (same as +), in phrase, exact words in exact order
-[ ___ site:(site name)] term appears in specific site
-[___ intitle:(title name)] term must appear in page title
-[___ inurl:(url name)] term must appear in the URL
[___ last:(period of time)] term appears in pages during the specified time period (subject to limitations)
[___ afterdate:yyyymmdd] term appears in pages after the specified date
[___ beforedate:yyyymmdd] term appears in pages before the specified date
[___ betweendate:yyyymmdd, yyyymmdd] term appears in pages between the specified dates
[___ inlink:(link address)] term must appear in anchor link
Gigablast.com:
-[ ___ AND ___ ] to show both terms (note 1)
[+__ ] to include commonly ignored words, and to search word precisely (no space, precede by space)
related:(name of website)
-[ -___ ] to exclude words (no space, precede by space)
[AND NOT ___ ] to exclude words
-[ OR ___ ] to show at least one of two separated terms (include space) (note 2)
[~__ ] synonym
[__*__ ] wildcard, whole words only
-[ “ __”] only exact word (same as +), in phrase, exact words in exact order
-[ ___ site:(site name)] term appears in specific site
-[ ___ title:(title name)] term must appear in page title
-[ ___ suburl:(url name)] term must appear in the URL
[___ last:(period of time)] term appears in pages during the specified time period (subject to limitations)
[___ afterdate:yyyymmdd] term appears in pages after the specified date
[___ beforedate:yyyymmdd] term appears in pages before the specified date
[___ betweendate:yyyymmdd, yyyymmdd] term appears in pages between the specified dates
-[___ link:(link address)] search results link to the webpage
-[___ type:(doc, xls, etc.)] search results must be in the given format
-[___ | ___] searches for first term, then for second, ranks according to second
-[ ___ ip:___] searches for term in numerical IP address
Dogpile.com:
[ ___ AND ___ ] to show both terms (note 3)
[+__ ] to include commonly ignored words, and to search word precisely (no space, precede by space)
related:(name of website)
[ -___ ] to exclude words (no space, precede by space)*
[ OR ___ ] to show at least one of two separated terms (include space)
[~__ ] synonym
[__*__ ] wildcard, whole words only
[ “ __”] exact words in order*
[ ‘ __’] exact words in order*
[ ___ site:(site name)] term appears in specific site
Advanced search form
Google.com: Allows search to be defined by AND, OR, NOT, language, format, site name, exact phrases, date created, usage rights, location of key words, region, and numeric range. Also “similar to the page” and “link to the page”.
Ask.com: -Allows search to be defined by AND, OR, NOT, language, specific domain, exact phrases, date modified, location of key words, and region.
Gigablast.com: -Allows search to be defined by AND, OR, NOT, site name, exact phrase, a URL, pages linked to a URL, and site clustering.
Dogpile.com: Allows search to be defined by AND, OR, NOT, exact phrase, specific domain, and specific language.
Method of treating words (truncating, case, plurals, spelling, stop words, etc.)
Google.com:
All words count, though some terms are ignored if results are deemed relevant
Case insensitive
Punctuation ignored with certain exceptions; signs with “obvious” meaning in a term, such as:
[$__ ] dollar sign indicates price
[__-__ ] hyphen joins two closely related words
[__ _ __ ] underscore can connect two words
Uses synonyms automatically
Stop words (a, for, the, etc.) sometimes ignored (logic applied)
Will offer corrected spelling
Ask.com:
All words count, though some terms are ignored if results are deemed relevant
-Case insensitive
Punctuation ignored with certain exceptions; signs with “obvious” meaning in a term, such as:
[$__ ] dollar sign indicates price
[__-__ ] hyphen joins two closely related words
[__ _ __ ] underscore can connect two words
Uses synonyms automatically
Stop words (a, for, the, etc.) sometimes ignored (logic applied)
-Will offer corrected spelling
-Word order matters (should follow natural language)
“Natural language technology” offers suggestions
Gigablast.com:
All words count, though some terms are ignored if results are deemed relevant
Case insensitive
Punctuation ignored with certain exceptions; signs with “obvious” meaning in a term, such as:
[$__ ] dollar sign indicates price
[__-__ ] hyphen joins two closely related words
[__ _ __ ] underscore can connect two words
Uses synonyms automatically
Stop words (a, for, the, etc.) sometimes ignored (logic applied)
-Will offer corrected spelling
Dogpile.com:
-All words count, though some terms are ignored if results are deemed relevant*
-Case sensitive (in some cases)*
Punctuation ignored with certain exceptions; signs with “obvious” meaning in a term, such as:
[$__ ] dollar sign indicates price
[__-__ ] hyphen joins two closely related words
[__ _ __ ] underscore can connect two words
Uses synonyms automatically
-Stop words (a, for, the, etc.) NOT always ignored (logic applied)
-Will offer corrected spelling
Additional features (Bells and Whistles)
Google.com:
Spell-checking
Current weather
Current stock quotes
Current time
Current scores
Book search
Unit conversion
Synonyms
Definitions
Business by location
Movies by location
Real estate by location
Flight info
Currency conversion
Maps
Package tracking
Patent numbers
Area code locations
Etc.
Ask.com:
-Spell-checking*
-Current weather
-Current stock quotes
-Current time*
-Current scores*
Book search
-Unit conversion
-Synonyms
-Definitions
-Business by location
-Movies by location
-Real estate by location*
Flight info
-Currency conversion
-Maps
-Package tracking*
Patent numbers
Area codes
Natural language questions
-TV listings
-Local events
Etc.
Gigablast.com:
-Spell-checking
Current weather
Current stock quotes
Current time
Current scores
Book search
Unit conversion
Synonyms
Definitions
Business by location
Movies by location
Real estate by location
Flight info
Currency conversion
Maps
Package tracking
Patent numbers
Area code locations
Etc.
Dogpile.com:
-Spell-checking
Current weather
Current stock quotes
Current time
Current scores
Book search
Unit conversion
Synonyms
Definitions
Business by location
Movies by location
Real estate by location
Flight info
Currency conversion
Maps
Package tracking
Patent numbers
Area code locations
Etc.
Search Engine
advice
Keep it simple, start with fewer
words
Use words you guess most likely
to appear on page
Choose more descriptive words
for specific needs
“Search is rarely absolute … a
variety of techniques is used”
Use words you guess most likely
to appear on page
Choose more descriptive words
for specific needs
Use natural word order
Search one question at a time
Use spaces between words (test
if unsure)
Use the most direct words
possible.
Refine searches (use refined
categories offered)
Use natural word order
Use spaces between words (test
if unsure)
Use search categories next to
search box.
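Several operators in the chart combine in similar ways across the engines. A blank or AND requires both terms, OR accepts either term, a leading minus sign excludes a term, and quotation marks require an exact phrase in exact order. The Python sketch below evaluates such queries against a small, hypothetical in-memory set of documents. It is an illustration, not any engine's implementation, and it makes these assumptions:

- It follows the grouping described in Note 1: AND binds tighter than OR, so [a b OR c] is read as (a AND b) OR c. Other engines may group AND and OR differently.
- It supports only straight double quotes.
- It ignores parentheses and the site:, date, and other field operators.
- It does no ranking.

```python
import re

# A token is an optional minus sign plus a double-quoted phrase, or any run of non-spaces.
TOKEN = re.compile(r'-?"[^"]*"|\S+')


def words(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric words (case-insensitive matching)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def parse(query: str) -> list[list[tuple[bool, list[str]]]]:
    """Split a query into OR-groups of (negated, words) clauses.

    A blank and an explicit AND both mean AND; OR starts a new group, so AND
    binds tighter than OR: [a b OR c] is read as (a AND b) OR c.
    """
    groups: list[list[tuple[bool, list[str]]]] = [[]]
    for token in TOKEN.findall(query):
        if token == "OR":
            groups.append([])
        elif token == "AND":
            continue  # same meaning as the default blank
        else:
            negated = token.startswith("-")
            terms = words(token[1:] if negated else token)
            if terms:  # skip tokens that are pure punctuation
                groups[-1].append((negated, terms))
    return [group for group in groups if group]


def contains_phrase(doc_words: list[str], phrase: list[str]) -> bool:
    """True if the phrase's words occur consecutively and in order (one word is a 1-word phrase)."""
    n = len(phrase)
    return any(doc_words[i:i + n] == phrase for i in range(len(doc_words) - n + 1))


def search(query: str, docs: dict[str, str]) -> list[str]:
    """Return the ids of documents that satisfy at least one OR-group of the query."""
    groups = parse(query)
    hits = []
    for doc_id, text in docs.items():
        doc_words = words(text)
        # A clause holds when presence differs from negation:
        # present and not negated, or absent and negated.
        if any(all(contains_phrase(doc_words, terms) != negated for negated, terms in group)
               for group in groups):
            hits.append(doc_id)
    return hits


if __name__ == "__main__":
    docs = {  # hypothetical documents
        "d1": "Violent video games and student achievement among high school students",
        "d2": "Student achievement and aggression in high school students",
        "d3": "Research on violent video games",
    }
    print(search('"student achievement" -aggression', docs))  # ['d1']
    print(search('"student achievement" AND aggression OR research', docs))  # ['d2', 'd3']
    print(search('"games video violent"', docs))  # [] because phrase order matters
```

Treating OR as a group separator keeps the parser to a single pass over the tokens. Supporting parentheses, or the opposite precedence, would require a small recursive parser instead.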