Search

Search John Brissenden 19.01.10


Reading

Halavais (2009), esp. chapter 3
Brin and Page (1998)
Hargittai, E. (2004) Do you "google"? Understanding search engine use beyond the hype. First Monday, volume 9, number 3 (March 2004). URL: http://firstmonday.org/issues/issue9_3/hargittai/index.html

What we will cover today

•What is search?
•What is a search engine?
•How do search engines work?
•Search Engine Optimisation
•Tensions, problems, issues

Searches (millions)

                   July 2008   July 2009   Change (%)   Share (%)
Total internet        80,554     113,685           41       100
Google sites          48,666      76,684           58        67.5
Yahoo! sites           8,689       8,898            2         7.8
Baidu.com              7,413       7,976            8         7
Microsoft sites        2,349       3,317           41         2.9
eBay                   1,223       1,723           41         1.5
NHN Corp.              1,243       1,526           23         1.3
Ask Network              929       1,291           39         1.1
Yandex                   663       1,290           94         1.1
AOL                    1,148       1,023          -11         0.9
Facebook                 743         879           18         0.7

Notes: Audience includes Internet users, ages 15 and older, at home and work. It excludes Internet activity from public computers, such as Internet cafes, and access from mobile phones or PDAs.
Source: comScore qSearch, 2009
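The Change (%) and Share (%) columns can be recomputed from the two monthly totals. The sketch below does this for a few rows of the comScore table; the function names are ours, and the rounding (whole percentages for change, one decimal place for share) is an assumption about how the published figures were produced.

```python
# Figures copied from the comScore qSearch table (millions of searches).
# Only a few rows are included to keep the example short.
searches = {
    "Total internet": (80_554, 113_685),
    "Google sites": (48_666, 76_684),
    "Yahoo! sites": (8_689, 8_898),
    "Baidu.com": (7_413, 7_976),
    "AOL": (1_148, 1_023),
}


def change_pct(before, after):
    """Year-on-year change as a percentage of the earlier figure."""
    return 100 * (after - before) / before


def share_pct(part, total):
    """One provider's searches as a percentage of all searches."""
    return 100 * part / total


total_2009 = searches["Total internet"][1]
for name, (july_2008, july_2009) in searches.items():
    change = change_pct(july_2008, july_2009)
    share = share_pct(july_2009, total_2009)
    print(f"{name:15s} change {change:6.1f}%   share {share:5.1f}%")
```

Note that a large percentage change does not imply a large share: growth is measured against a provider's own earlier figure, share against the whole market.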

“Like many kinds of statistics, search engine popularity is very hard to measure reliably, and interpretations of available data vary... More confusing is the difference in how popularity is understood. Popularity can mean, at the most basic level, two very distinct things: a) percentage of users who turn to a search engine for their search needs; and, b) percentage of all search queries that are run on a particular search engine. Depending on one’s interest, this distinction is important.”

Hargittai (2004)
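To see why the two meanings of popularity can diverge, here is a minimal sketch using an invented query log. The users, the engines "A" and "B", and the function names are all hypothetical; no real measurement is implied.

```python
from collections import Counter

# Hypothetical query log: one (user, engine) pair per query. Invented data:
# one heavy user runs six queries on engine A, three light users run one each on B.
log = [("ann", "A")] * 6 + [("bob", "B"), ("cat", "B"), ("dan", "B")]


def user_share(log):
    """Meaning (a): fraction of users who turn to each engine."""
    all_users = {user for user, _ in log}
    users_by_engine = {}
    for user, engine in log:
        users_by_engine.setdefault(engine, set()).add(user)
    return {e: len(users) / len(all_users) for e, users in users_by_engine.items()}


def query_share(log):
    """Meaning (b): fraction of all queries run on each engine."""
    counts = Counter(engine for _, engine in log)
    return {e: n / len(log) for e, n in counts.items()}


print("share of users:  ", user_share(log))
print("share of queries:", query_share(log))
```

In this toy log engine B is used by three of the four users, yet engine A handles two thirds of the queries, so which engine is "more popular" depends on the measure chosen.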

Library

Switchboard

Filing system

[Figure: Conceptual organisation of the typical search engine. Halavais (2009): 15. Components shown: URL list, crawlers, raw archive, indexing and ranking, database, “front end”, query form, results.]

Gather information from web pages

Determine relevance to search query

Accept search query and present results

CRAWLER:
•Compiles list of URLs (pages) to be visited
•Saves copy of pages
•Looks through for links to other pages
•Adds new links to the bottom of the list
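The crawler loop can be sketched in a few lines. This is a minimal illustration, not how any real search engine is built: the three-page TOY_WEB dictionary is invented and stands in for fetching pages over HTTP, and links are found with a simple regular expression rather than a proper HTML parser.

```python
import re
from collections import deque

# Hypothetical miniature "web": URL -> HTML. It stands in for real HTTP fetches.
TOY_WEB = {
    "http://a.example/": '<a href="http://b.example/">B</a> <a href="http://c.example/">C</a>',
    "http://b.example/": '<a href="http://c.example/">C</a>',
    "http://c.example/": '<a href="http://a.example/">A</a>',
}

LINK_RE = re.compile(r'href="([^"]+)"')


def crawl(seed_urls, fetch=TOY_WEB.get):
    to_visit = deque(seed_urls)  # the URL list
    seen = set(seed_urls)        # URLs already on the list, to avoid repeats
    archive = {}                 # raw archive: URL -> saved copy of the page
    while to_visit:
        url = to_visit.popleft()          # take the next URL from the top
        html = fetch(url)
        if html is None:                  # page could not be retrieved
            continue
        archive[url] = html               # save a copy of the page
        for link in LINK_RE.findall(html):  # look through for links
            if link not in seen:
                seen.add(link)
                to_visit.append(link)     # add new links to the bottom
    return archive


archive = crawl(["http://a.example/"])
print(list(archive))
```

Because new links go to the bottom of the list and URLs are taken from the top, pages are visited in the order they were discovered.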

ARCHIVE:
•Created by crawlers
•Allows for further processing to obtain information about page, eg extraction and indexing of key terms
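One common form of "extraction and indexing of key terms" is an inverted index, which maps each term to the pages that contain it. The sketch below assumes a tiny invented archive and a deliberately crude notion of a term (lower-case runs of letters after tags are stripped); real indexers do far more.

```python
import re
from collections import defaultdict

# Hypothetical raw archive: URL -> saved HTML, as a crawler might leave it.
archive = {
    "http://a.example/": "<h1>Search engines</h1><p>How search engines rank pages</p>",
    "http://b.example/": "<p>Crawlers gather pages</p>",
}


def extract_terms(html):
    """Strip tags, then return the lower-cased words that remain."""
    text = re.sub(r"<[^>]+>", " ", html)
    return re.findall(r"[a-z]+", text.lower())


def build_index(archive):
    """Build an inverted index: term -> set of URLs containing that term."""
    index = defaultdict(set)
    for url, html in archive.items():
        for term in extract_terms(html):
            index[term].add(url)
    return index


index = build_index(archive)
print(sorted(index["pages"]))
```

With the index in place, answering a one-word query is a single dictionary lookup rather than a scan of every saved page.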

DATABASE:
•Ranks pages according to relevance to query
•Google uses PageRank, based on incoming links, to infer authority
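The idea of inferring authority from incoming links can be illustrated with a simplified, textbook-style PageRank calculation. This is a sketch under stated assumptions, not Google's actual system: it uses the normalised variant in which ranks sum to 1, a damping factor of 0.85, a fixed number of iterations, and an invented four-page link graph in which every link target is also listed as a page.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. `links` maps each page to the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1 / n for page in pages}  # start with equal rank everywhere
    for _ in range(iterations):
        # Every page receives a small baseline share regardless of links.
        new_rank = {page: (1 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page passes its rank on, split equally among its links.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: a, b and d all link to c; c links back to a.
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
for page, score in sorted(pagerank(links).items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

Page c ranks highest because three pages link to it, and page a outranks b and d because its single incoming link comes from the highly ranked c: a link counts for more when it comes from an authoritative page.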

Preferential attachment

New nodes prefer to attach to well-attached nodes.

Barabasi & Albert (1999)
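A small simulation makes the idea concrete. This is a minimal sketch in the spirit of the Barabasi and Albert model rather than a reproduction of it: each new node makes exactly one link, the network starts from two linked nodes, and the function name and parameters are our own.

```python
import random
from collections import Counter


def grow_network(n_nodes, seed=0):
    """Grow a network one node at a time; return a Counter of node -> link count."""
    rng = random.Random(seed)
    # Every link adds both of its endpoints to this list, so a node appears
    # once for each link it has. Start with nodes 0 and 1 linked to each other.
    endpoints = [0, 1]
    for new_node in range(2, n_nodes):
        # Picking uniformly from the endpoint list selects an existing node
        # with probability proportional to its current number of links.
        target = rng.choice(endpoints)
        endpoints += [new_node, target]
    return Counter(endpoints)


degrees = grow_network(1000)
print("most-linked nodes:", degrees.most_common(5))
print("nodes with a single link:", sum(1 for d in degrees.values() if d == 1))
```

Running it typically shows a handful of heavily linked nodes, most of them early arrivals, alongside a long tail of nodes with only one link.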

Implications

The more popular you are, the more popular you become

Niches are important

Older nodes (sites) tend to be more popular than new ones, but only on average

Money alone is not enough to guarantee future popularity or growth, but relevance and connection to already popular nodes can be

?

Different kinds of search

•Learning
•Discovery
•Re-finding

•Horizontal
•Vertical
•Mobile

Attention is a finite resource.

“The most important change the web brings us is not this increase of information. The real change on the web is in the technologies of attention, the ways in which individuals come to attend to particular content.”

Halavais (2009): 69

Search Engine Optimisation (SEO)

Good design Spam

Glossary (Halavais, 2009: 196-7)

Google bowling: Making a competitor look like a search spammer by employing obvious spam techniques on their behalf

Google dance: Reordering of PageRank after Google completes a new crawl

Googlebomb: An attempt to associate a key phrase with a given website by collectively using that phrase in links to that site

Googlejuice: An imaginary representation of the reputational currency provided by linking from one site to another, thereby improving PageRank

Keyword stuffing: Hiding many unrelated keywords, or a large number of the same keyword, on a page to improve its representation in search results

Link farming: Creation of large numbers of pages with the single intent of linking to a page and thus increasing its apparent popularity

Link slutting/whoring: Creating specific content for a site etc with the aim of collecting inbound links from other sites

Link spamming: Use of links to deceive search engines as to the reputation of a target site