. democratizing search
. 1 / 43 Title
Bernhard Rieder
Université de Paris VIII - Vincennes Saint-Denis
Laboratoire Paragraphe
Democratizing Search: Concepts and Challenges
Deep Search
World-Information Institute
8 / 11 / 2008
. 2 / 43 I - Search engine basics
Search engine basics rehashed
Search engines have emerged as the dominant pathways into
the depths of the Web for 1.5 billion Internet users.
After email, search is the second most frequent activity online.
Search engines play an important role in shaping which sites
are visible on the Web and which sites are not.
. 3 / 43 I - The problem with search
The problem with search
"Search is broken." [ Jimmy Wales 2008 ]
The most common points of critique:
• Crawling and ranking are not transparent ( black box )
• Might favor "commercial" sites
• Smaller sites have little visibility
• Susceptible to manipulation ( SEO )
• Results are "read only"
• Google as monopoly gatekeeper
A "quick fix" is quite improbable.
. 4 / 43 I - Search as a strange object
Search as a strange object
Web search is a phenomenon that is not easy to categorize
and conceptualize.
• The Web is an information space unlike any other
• Search can be done using different techniques
• It is part of a variety of practices
What is the closest antecedent? The library catalogue? Mass
media? Guidebooks? Domain experts?
. 5 / 43 I - Searching or Filtering?
Search is not search!?
Web search is part of a larger shift from information scarcity to
abundance.
"The task is not to design information-distributing systems but
intelligent information-filtering systems." [ H. Simon 1969 ]
Search engines are not systems of classification, they are
machines that make judgments on the importance of pieces of
information relative to a query.
. 6 / 43 I - SCREEN: Yahoo 1997
. 7 / 43 I - SCREEN: AltaVista 1996
. 8 / 43 I - The search pipeline
The search pipeline
A search engine includes several distinct stages:
Crawler -> Index -> Search & Rank -> GUI
. 9 / 43 I - Some basic ranking principles
Some basic ranking principles ( content ranking )
query: "house" rank by: number of occurrences
query: "house AND hill" rank by: closeness
  "there is a house on the hill"
  "from my house I can see a beautiful hill"
query: "house" rank by: location in document
  "<title>house</title>"
  "<p>house</p>"
query: "house" rank by: URL
  "http://www.house.com"
  "http://www.villa.com"
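The four content signals above can be sketched in a few lines of Python. This is only an illustration: the function names and the weight of 2.0 for a title match are assumptions made here, not part of any real engine.

```python
import re

def tokens(text):
    """Lowercase word tokens of a text (markup and punctuation are split off)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def occurrence_score(query, text):
    """Rank by number of occurrences of the query term."""
    return tokens(text).count(query.lower())

def closeness_score(term_a, term_b, text):
    """Rank by closeness: a smaller word distance gives a higher score."""
    words = tokens(text)
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    if not pos_a or not pos_b:
        return 0.0
    distance = min(abs(i - j) for i in pos_a for j in pos_b)
    return 1.0 / max(distance, 1)

def location_score(query, html):
    """Rank by location in document: a title match outweighs a body match."""
    title = re.search(r"<title>(.*?)</title>", html, re.I | re.S)
    if title and query.lower() in tokens(title.group(1)):
        return 2.0  # assumed weight, chosen only for illustration
    return 1.0 if query.lower() in tokens(html) else 0.0

def url_score(query, url):
    """Rank by URL: the query term appears in the address."""
    return 1.0 if query.lower() in tokens(url) else 0.0
```

With these definitions, "there is a house on the hill" scores higher on closeness than "from my house I can see a beautiful hill", and http://www.house.com scores higher on URL than http://www.villa.com.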
. 10 / 43 I - Link analysis
The dominant paradigm: recursive link analysis
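"Recursive" means that a page's score depends on the scores of the pages linking to it. A minimal sketch, assuming a PageRank-style iteration (the slide does not name a specific algorithm; the damping value of 0.85 and the toy graph are assumptions made here):

```python
def link_scores(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    scores = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page keeps a small base score, the rest flows along links.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * scores[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for p in pages:
                    new[p] += damping * scores[page] / n
        scores = new
    return scores

# Hypothetical four-page Web: everybody links to "hub".
toy_web = {
    "hub": ["a", "b"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
}
scores = link_scores(toy_web)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph "hub" ends up first and "c", which nobody links to, ends up last.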
. 11 / 43 II - The Web as scale-free network
The Web as scale-free network
. 12 / 43 II - Link analysis and the logic of the hit
Link analysis and the logic of the hit
Link analysis projects the hypertext graph as a hierarchical
list that strongly favors hubs and networks of hubs.
Growth principle: "preferential attachment"
• "cumulative advantage"
• "the rich get richer"
• "logic of the hit"
"We will have to realize that hierarchies fulfill a semantic
function and that semantic systems are hierarchic by
principle." [ Hartmut Winkler 1997 ]
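Preferential attachment can be simulated in a few lines. The sketch below is a simplified growth model assumed here for illustration (one link per new node, undirected), not a model of the actual Web:

```python
import random

def grow_network(n_nodes, seed=0):
    """Grow a graph where each new node links to one existing node,
    chosen with probability proportional to that node's current degree."""
    rng = random.Random(seed)
    degree = {0: 1, 1: 1}      # start with two nodes joined by one link
    endpoints = [0, 1]         # each node appears once per link it has
    for new_node in range(2, n_nodes):
        target = rng.choice(endpoints)   # degree-proportional choice
        degree[target] += 1
        degree[new_node] = 1
        endpoints.extend([target, new_node])
    return degree

degree = grow_network(2000)
ordered = sorted(degree.values(), reverse=True)
```

Sorting the degrees shows the pattern the slide describes: a few early nodes collect many links while the typical node keeps only the one it started with.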
. 13 / 43 II - CITATION: Best friend
"So what’s our straightforward
definition of the ideal search engine?
Your best friend with instant access to
all the world’s facts and a photographic
memory of everything you’ve seen and
know. That search engine could tailor
answers to you based on your
preferences, your existing knowledge
and the best available information."
- Marissa Mayer, Google VP
. 14 / 43 II - Current guiding principles
Current guiding principles
The two dominant guiding principles currently are:
• popularity ( the logic of the hit )
• convenience ( personalization )
. 15 / 43 II - Where to look for alternative principles?
Where can we look for alternative principles?
Web search is a new phenomenon; it can nonetheless be
compared to adjacent domains.
• Libraries and documentation ( freedom of access )
• Media and journalism ( neutrality, plurality )
• Cultural policy - "exception culturelle" ( diversity )
• Community organization ( participation )
• Liberal democracy ( transparency, accountability )
. 16 / 43 III - CITATION: Democracy!
"Democracy! Bah! When I hear that
word I reach for my feather Boa!"
- Allen Ginsberg
. 17 / 43 III - Two concepts of democracy: community
Democracy as community
"The second big element of Web 2.0 is democracy. We now
have several examples to prove that amateurs can surpass
professionals, when they have the right kind of system to
channel their efforts. [ … ] Another place democracy seems to
win is in deciding what counts as news. I never look at any
news site now except Reddit." [ Paul Graham 2005 ]
"Democratizing search" would mean letting users rank results.
The community decides which information is best ( markers:
votes, clicks, pageviews, etc. ).
. 18 / 43 III - CITATION: Wales on bias
"The idea that all 'selection' is equally
'biased' is fallacious. We intuitively
understand this when we talk about
other forms of writing or journalism;
we need to understand it for *this*
form of journalism as well."
- Jimmy Wales
. 19 / 43 III - Wikia Search
Wikia Search
Wikia Search tries to apply the Wikipedia principle to ranking
search results, following the NPOV ( neutral point of view ) principle.
• All technology is open source
• Crawling is distributed using GRUB
• Currently in an experimental stage
Wikia Search follows a series of explicit principles:
• Transparency
• Community
• Quality
• Privacy
. 20 / 43 III - SCREEN: Wikia Search Abortion
. 21 / 43 III - SCREEN: Wikia Search McCain
. 22 / 43 III - Two concepts of democracy: society
Democracy as society
Large-scale collective governance based on bureaucratic
institutions limited by checks and balances.
"Democratizing search" could mean adapting search to the
requirements of liberal democracy.
Web search would serve the goal of informing citizens on the
different courses of action.
. 23 / 43 III - What should we strive for?
What should we strive for?
Reforming the search landscape is a normative project that
would produce winners and losers.
• Transparency => Plurality of opinion
• Community => Society
• Quality
• Privacy
The goal would be having a variety of high-quality search
applications that deliver different sets of results.
. 24 / 43 III - Democratizing search: main challenges
Democratizing search: main challenges
Entry into the search market has become difficult.
• Cost for infrastructure / datacenter
• Difficulty finding quantifiable markers for ranking
• Changing user habits / software defaults
Every part of the search pipeline has specific costs and
specific engineering challenges. In order to have very fast
end-user performance, there has to be sophisticated load
balancing and an elaborate datacenter architecture.
. 25 / 43 III - Overview
Democratizing search: overview
User side
• education
Provider side
• antitrust measures
• financial aid
Interaction between user and service
• interface / algorithm additions
• search APIs
• search sandbox
. 26 / 43 III - CITATION: Mind of god
"The perfect search engine would be
like the mind of God."
- Sergey Brin
. 27 / 43 III - A: User side
User side: education
Information access is driven by informational practices as
much as by technology itself. User education can include:
• General information on search engines and how they work
• Using a search engine to its full potential
• Learning about alternatives to the dominant player
• Understanding that linking is not an innocent practice
• General informational ecology
These points could easily be included in teaching curricula.
. 28 / 43 III - A: SCREEN: Cheat sheet
. 29 / 43 III - B: SCREEN: Monopoly
comScore European Search Properties March 08
. 30 / 43 III - B: Antitrust measures
Provider side: antitrust measures
Ownership is commonly an issue in the world of media.
Google is politically quite active.
But how to split up
http://google.com?
. 31 / 43 III - B: Financial aid
Provider side: financial aid
A number of countries grant direct or indirect subsidies to
newspapers.
France taxes cinema tickets and redistributes the money to
level the playing field.
Countries can offer targeted R&D grants ( e.g. Quaero ).
There could be public search engines or a public datacenter
infrastructure.
. 32 / 43 III - B: A public infrastructure
Provider side: building a public infrastructure
Crawler -> Index -> Search & Rank -> GUI
. 33 / 43 III - B: SCREEN: exalead
. 34 / 43 III - C: Empower the user through interaction
Between user and provider: interaction possibilities
Crawler -> Index -> Search & Rank -> GUI
. 35 / 43 III - C: SCREEN: exalead
. 36 / 43 III - C: SCREEN: exalead
. 37 / 43 III - C: SCREEN: msn sliders
. 38 / 43 III - C: SCREEN: clusty
. 39 / 43 III - C: Opening the results
Between user and provider: better Web APIs
Search APIs allow external applications to download a limited
number of results ( Google ~8, Yahoo BOSS 50, Live API 50 ).
With larger result sets, effective reranking or more powerful
user interaction would be possible.
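What client-side reranking could look like, as a minimal sketch: the result list below is hypothetical and stands in for whatever a search API returns, and the "one result per host first" rule is just one assumed example of an alternative ordering.

```python
from urllib.parse import urlparse

def rerank_for_plurality(results, max_per_host=1):
    """Move results beyond max_per_host for a host to the end of the list,
    otherwise keeping the provider's order."""
    seen = {}
    front, back = [], []
    for url in results:
        host = urlparse(url).netloc
        seen[host] = seen.get(host, 0) + 1
        (front if seen[host] <= max_per_host else back).append(url)
    return front + back

# Hypothetical result list, standing in for what a search API might return.
api_results = [
    "http://big.example/a",
    "http://big.example/b",
    "http://big.example/c",
    "http://small.example/x",
    "http://other.example/y",
]
reranked = rerank_for_plurality(api_results)
```

Here the two smaller hosts move up to positions two and three. With only a handful of results per request there is little to reorder, which is why the size of the result set matters.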
. 40 / 43 III - C: Opening the index
Between user and provider: the search sandbox
Crawler -> Index -> Search & Rank -> GUI
. 41 / 43 III - C: Opening the index
Between user and provider: the search sandbox
A search sandbox would have the following elements:
• Run on corporate infrastructure
• A safe execution environment for untrusted code
• A limited set of API calls to access the index
• Users and institutions could propose alternative ranking methods
• Quota rules for processing time
This might allow an ecosystem of search methods to develop
in a situation that is both technically and economically viable.
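A rough sketch of how the limited API and the quota rules could fit together. Everything here is hypothetical (the IndexAPI class, its two calls, the quota values), and plain Python does not provide a safe execution environment for untrusted code; only the quota and the narrow API are illustrated.

```python
import time

class QuotaExceeded(Exception):
    """Raised when a ranking method uses up its allowance."""

class IndexAPI:
    """Hypothetical, deliberately small interface to a provider's index."""

    def __init__(self, postings, max_calls=100, max_seconds=1.0):
        self._postings = postings          # term -> {doc_id: term count}
        self._calls_left = max_calls
        self._deadline = time.monotonic() + max_seconds

    def _charge(self):
        # Every API call costs one unit and is checked against the time limit.
        self._calls_left -= 1
        if self._calls_left < 0 or time.monotonic() > self._deadline:
            raise QuotaExceeded("quota for this query is used up")

    def documents(self, term):
        self._charge()
        return list(self._postings.get(term, {}))

    def term_count(self, term, doc_id):
        self._charge()
        return self._postings.get(term, {}).get(doc_id, 0)

def run_in_sandbox(ranker, query, postings, **quota):
    """Run a user-proposed ranker against the limited API.
    Note: this enforces quotas only; it does not isolate untrusted code."""
    api = IndexAPI(postings, **quota)
    try:
        return ranker(api, query)
    except QuotaExceeded:
        return []

def count_ranker(api, query):
    """Example alternative ranking method: order by term count."""
    docs = api.documents(query)
    return sorted(docs, key=lambda d: api.term_count(query, d), reverse=True)

postings = {"house": {"doc1": 1, "doc2": 5, "doc3": 2}}
ranked = run_in_sandbox(count_ranker, "house", postings)
starved = run_in_sandbox(count_ranker, "house", postings, max_calls=2)
```

The first run returns a ranking; the second runs out of quota after two calls and returns nothing, which is the kind of rule that would keep shared infrastructure affordable for the provider.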
. 42 / 43 Conclusions
Conclusions
We will have to put humans "back into the loop" and render
search configurations hybrid and more complex.
In order to open up the search landscape and get closer to
the goal of plurality, we will have to combine all three levels.
We need more large-scale empirical data on search habits
and the consequences of ranking.
Without a better conceptual grasp of search engines,
regulatory efforts are highly improbable.
. 43 / 43 The End
Thank you for your attention!
http://bernhard.rieder.fr
http://thepoliticsofsystems.net