Searching the Clouds
Presented by Kajal Miyan
Slides courtesy:
UC Berkeley RAD Lab
http://cis.poly.edu/westlab/odissea/
Above the Clouds
A Berkeley View of Cloud Computing
Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy Katz, Andy Konwinski, Gunho Lee, David Patterson, Ariel Rabkin, Ion Stoica, and Matei Zaharia
Seminar "Peer-to-peer Information Systems" 3
Outline
What is it? Why now?
Cloud killer apps
Economics
Challenges and opportunities
Implications
ODISSEA
What is Cloud Computing?
Old idea: Software as a Service (SaaS)
Def: delivering applications over the Internet
Recently: “[Hardware, Infrastructure, Platform] as a service”
Poorly defined, so we avoid all “X as a service”
Utility Computing: pay-as-you-go computing
Illusion of infinite resources
No up-front cost
Fine-grained billing (e.g. hourly)
Why Now?
Experience with very large datacenters
Advent of technologies like Web 2.0 and Google AdSense
Other factors:
Pervasive broadband Internet
Pay-as-you-go billing model
Standard software stack (Amazon VM)
Gray's analysis: keep the data near the application
Spectrum of Clouds
Instruction Set VM (Amazon EC2, 3Tera)
Bytecode VM (Microsoft Azure)
Framework VM (Google AppEngine, Force.com)
The spectrum runs from lower-level, less management (EC2, Azure) to higher-level, more management (AppEngine, Force.com)
Cloud Killer Apps
Interactive real-time applications
Rise of analytics
Extensions of desktop software (Matlab, Mathematica)
Parallel batch processing (Oracle at Harvard, Hadoop at NY Times)
Economics of Cloud Users
• Pay by use instead of provisioning for peak
Figure: demand and provisioned capacity over time, for a static datacenter (capacity fixed at peak demand) and a datacenter in the cloud (capacity tracks demand)
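The pay-by-use argument can be made concrete with a toy cost model. All numbers below are hypothetical, chosen only to illustrate bursty demand; they are not from the paper.

```python
# Hypothetical hourly demand (in servers) over one day; the burst around
# midday is what makes peak provisioning expensive.
demand = [10, 8, 6, 5, 5, 7, 12, 30, 55, 70, 80, 85,
          82, 78, 75, 70, 60, 50, 45, 40, 35, 25, 18, 12]

static_cost_per_server_hour = 1.0   # assumed amortized cost of owned capacity
cloud_cost_per_server_hour = 1.4    # assumed pay-as-you-go premium

# Static datacenter: provision (and pay for) peak demand every hour.
static_cost = max(demand) * len(demand) * static_cost_per_server_hour

# Cloud: pay only for the capacity actually used in each hour.
cloud_cost = sum(demand) * cloud_cost_per_server_hour

print(static_cost, cloud_cost)  # the cloud wins despite a 1.4x unit premium
```

Even with a 40% per-hour price premium, paying for the area under the demand curve beats paying for the peak rectangle whenever demand is bursty.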
Adoption Challenges
Challenge → Opportunity
Availability (DDoS) → multiple providers & datacenters
Data lock-in → standardization
Data confidentiality and auditability → encryption, VLANs, firewalls; geographical data storage
Growth Challenges
Challenge → Opportunity
Data transfer bottlenecks → FedEx-ing disks; data backup/archival
Performance unpredictability → improved VM support, flash memory, VM scheduling
Scalable storage → invent a scalable store
Bugs in large distributed systems → invent a debugger that relies on distributed VMs
Scaling quickly → invent an auto-scaler that relies on ML; snapshots
Policy and Business Challenges
Challenge → Opportunity
Reputation fate sharing → offer reputation-guarding services like those for email
Software licensing → pay-for-use licenses; bulk-use sales
Implications
Startups and prototyping
Cost associativity for scientific applications
Research at scale
ODISSEA: Open DIStributed Search Engine Architecture
A Peer-to-Peer Architecture for Scalable Web Search and Information Retrieval
Torsten Suel, Chandan Mathur, Jo-Wen Wu, Jiangong Zhang, Alex Delis, Mehdi Kharrazi, Xiaohui Long, Kulesh Shanmugasundaram
“If distributed systems can be used to search for aliens, then why not for words?” :D
Search Engine Revision
Figure: a classic crawl-based search engine. A crawler fetches pages (URL1–URL4) from the Web, an indexer builds the database, and the search engine answers a browser query (“Eggs?”) with ranked results: Eggs 90%, Eggo 81%, Ego 40%, Huh? 10% (e.g. the page “All About Eggs” by S. I. Am).
Motivation
Today, the main part of the web search infrastructure is supplied by only a few large crawl-based search engines
There has been strong research in the field of P2P systems over the last few years
This raises two issues:
The vast data in P2P networks requires the ability to search within these networks
The significant computing resources provided by a P2P system could be used to search content residing inside or outside the system
ODISSEA: a distributed global indexing and query execution service
Design Overview
ODISSEA differs from many other approaches to P2P search:
It assumes a two-layered search engine architecture
It has a global index structure distributed over the nodes of the system (in a global index, in contrast to a local index, a single node holds the entire inverted list for a particular term)
Two Layer Approach
The lower layer provides:
Maintenance of the global index structure under document insertions and updates
Handling of node joins and failures
Efficient execution of simple search queries
The upper layer interacts with the P2P-based lower layer via two classes of clients:
Update clients (e.g. crawler, web server)
Query clients (which implement an optimized query execution plan)
Figure: the WWW feeds a crawler (an update client); search servers submit queries to the ODISSEA lower layer.
Target Applications
Full-text search in large document collections located within P2P communities
Search in large intranet environments
Web search: a powerful API supports the anticipated shift towards client-based search tools that better exploit the resources of today's desktop machines
Two Layer Approach
Enables a large variety of (client-based) search tools that more fully exploit client computing resources.
Those tools could share the same lower-layer web search infrastructure.
Tools are developed against an open API that accesses the search infrastructure
When processing a query, this could in the most general case (i.e. where no pre-evaluation is done on the server side) result in large amounts of data being transferred to the query client
Global vs. Local index
posting = [DocID, position, additional information]
Local index: each node creates its own index for all documents that are locally stored
Global index: each node holds the complete global postings list for a certain group of words
Suppose a query “chair AND table”. With a global index, the node holding the complete list for “chair” and the node holding the list for “table” exchange and intersect their postings; with a local index, the query would have to be sent to every node.
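Under a global index, the two-term AND query boils down to intersecting two sorted docID lists after one of them has been shipped across the network. A minimal sketch of that merge-intersection (the docIDs below are made up):

```python
def intersect_postings(list_q0, list_q1):
    """Merge-intersect two sorted docID lists, as a node would do after
    receiving the other term's postings over the network."""
    result, i, j = [], 0, 0
    while i < len(list_q0) and j < len(list_q1):
        if list_q0[i] == list_q1[j]:
            result.append(list_q0[i])  # doc contains both terms
            i += 1
            j += 1
        elif list_q0[i] < list_q1[j]:
            i += 1
        else:
            j += 1
    return result

# Hypothetical postings for "chair" and "table"
print(intersect_postings([2, 5, 8, 13, 21], [3, 5, 13, 34]))  # → [5, 13]
```

The merge is linear in the combined list lengths, which is exactly why transmitting a long list to the other node (rather than scanning it) becomes the dominant cost in the distributed setting.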
Global vs. Local index
Local index organization is very inefficient in very large networks (e.g. the web) if result quality is the major concern, because the query has to be transmitted to all nodes and all of them have to respond (as the data is unclustered)
But in a global index organization, large amounts of data need to be transmitted between nodes when:
initially building the index (adding new nodes)
evaluating a query → bad response time
This can be overcome with smart algorithmic techniques
The choice depends on the types of queries and the frequency of document updates, as well as on how dynamic the system is
Crawling and Fault Tolerance
Crawling approach:
Non-P2P crawlers have the advantage that they can easily be altered in case some website operators have complaints about the bot
Smart crawling strategies beyond BFS are hard to implement in a P2P environment unless there is a centralized scheduler
P2P systems and fault tolerance:
The system design relies on the assumption of a fairly stable P2P environment, since otherwise administration (insert, update, replication) would be too expensive
Implementation Details
Currently implemented in Java, using Pastry as the P2P substrate (lower layer) and a DHT mapping for hashing IDs to the appropriate IP address (Pastry is an overlay and routing network for implementing a DHT; the key-value pairs are stored in a redundant P2P network of connected Internet hosts)
Each node runs an indexer that stores inverted lists in compressed form in a Berkeley DB
Using MD5, all documents and term lists are hashed to an 80-bit ID that is used for lookups in the system
Implementation Details
Parsing and Routing Postings
New or updated documents are parsed at the node where they reside, as determined by the DHT mapping
The parser generates for each term a posting that is routed via several intermediate nodes, as determined by the topology of the Pastry network, until it reaches its destination node
The index structure of a node is split into a small structure residing in main memory that is eventually merged with a bigger structure on disk, avoiding disk accesses during inserts/updates → lower amortized complexity
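The small-in-memory/large-on-disk split can be sketched as follows. This is a minimal sketch, not ODISSEA's implementation: the on-disk structure is faked with a dict, and `memory_limit` is an illustrative knob.

```python
class TwoLevelIndex:
    """Sketch of a two-level index: new postings go to a small in-memory
    structure, which is merged into the larger on-disk structure once it
    grows past a limit. The bulk merge amortizes the disk-write cost."""

    def __init__(self, memory_limit=4):
        self.memory = {}   # term -> recent postings (in-memory structure)
        self.disk = {}     # term -> sorted postings (stand-in for on-disk lists)
        self.memory_limit = memory_limit

    def insert(self, term, posting):
        self.memory.setdefault(term, []).append(posting)
        if sum(len(p) for p in self.memory.values()) >= self.memory_limit:
            self._merge()

    def _merge(self):
        # One sequential pass merges the memory structure into the disk
        # structure, instead of one random disk write per posting.
        for term, postings in self.memory.items():
            self.disk[term] = sorted(self.disk.get(term, []) + postings)
        self.memory.clear()

    def lookup(self, term):
        # A lookup must consult both levels.
        return sorted(self.disk.get(term, []) + self.memory.get(term, []))
```

Deferring the merge is what gives the lower amortized complexity the slide mentions: each posting is written to disk once per merge pass rather than once per insert.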
Implementation Details
Groups and Splits
Initially, all objects (documents, indexes) whose first w bits (here w=16) coincide are placed into a common group identified by this w-bit string
Locally, each group maintains a Berkeley DB with all objects it contains
When a group (of documents) becomes too large (here >1 GB), it is split into two groups identified by (w+1)-bit strings, leaving a stub structure pointing to the new groups, which are assigned to new nodes
If the index structures for a term become too large (here >100 MB), they are split into two lists according to the document IDs they contain
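The prefix-group scheme can be sketched directly. The deck says IDs are MD5-based and 80 bits wide; truncating the 128-bit MD5 digest is our assumption here, and the function names are illustrative.

```python
import hashlib

W = 16  # initial width of the group prefix, as in the deck

def object_id(name, bits=80):
    """Hash a document or term name to an 80-bit ID (assumption: the
    MD5 digest is truncated to the top `bits` bits)."""
    digest = int.from_bytes(hashlib.md5(name.encode("utf-8")).digest(), "big")
    return digest >> (128 - bits)

def group_label(obj_id, width=W, id_bits=80):
    """An object belongs to the group named by the first `width` bits of its ID."""
    return format(obj_id >> (id_bits - width), "0{}b".format(width))

def split_group(label):
    """A group that grows past the size limit is split into the two groups
    named by the (w+1)-bit extensions of its label."""
    return label + "0", label + "1"
```

Because the new labels extend the old one by a single bit, every object in the old group falls into exactly one of the two new groups without rehashing.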
Implementation Details
Replication
Performed at the group level by attaching “/0”, “/1”, etc. to the group label (e.g. 0100101/2)
This new label is then what is actually presented to Pastry/DHT during lookups
All replicas of a group form a “clique” that communicates periodically to update its status
If a group replica fails, the others are in charge of detecting this and, if necessary, performing repair
Each node can contain several distinct group replicas and therefore participate in several cliques
Postings are first routed to only one replica, which is then in charge of forwarding them to the others over a period of a few minutes
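The label scheme itself is a one-liner; a replication degree of 3 below is illustrative, not a figure from the paper:

```python
def replica_labels(group_label, degree=3):
    """Each replica of a group gets the group label with '/0', '/1', ...
    attached; the combined string is what is presented to the Pastry/DHT
    lookup, so the replicas hash to different nodes."""
    return ["{}/{}".format(group_label, i) for i in range(degree)]

print(replica_labels("0100101"))  # → ['0100101/0', '0100101/1', '0100101/2']
```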
Implementation Details
Faults, Unavailability and Synchronization
When a node leaves the system, its group replicas eventually have to be replaced to maintain the desired degree of replication
A node is considered failed if it has been unavailable for an extended period of time
New replicas are created for a failed node, or when a certain number of nodes are unavailable
Formerly unavailable nodes have to synchronize their index structures using logs of missed updates
Efficient Query Processing
Information-Theoretic Background
Let d be a document, q = q0 … q(m-1) a query consisting of m terms, and F a function that assigns to d (depending on q) a value F(d,q). Such a function is called a ranking function.
The top-k ranking problem for a query q is finding the k documents with the highest values F(d,q).
A common form of such a function is

F(d,q) = Σ_{i=0}^{m-1} f(d, q_i)

Since queries typically have at most 2 search terms, the following algorithms focus on the top-k ranking problem for queries with exactly 2 search terms (for one-term queries, there is in fact nothing to do)
Efficient Query Processing
Fagin's Algorithm (FA)
Intuitively, an item that is ranked at the top overall is likely to be ranked very high in at least one of the contributing subcategories
Assume a query q = q0 AND q1 and postings of the form (d, f(d,qi)) that are sorted by the second component, highest values on top
Also assume that the inverted lists for q0 and q1 are located on the same machine, so that no network communication is required
Goal: compute the top-k documents as fast as possible
Seminar "Peer-to-peer Information Systems" 32
Efficient Query Processing
Example lists (sorted by score):
A: (d1, 0.9), (d2, 0.8), (d4, 0.7), (d3, 0.69), (d5, 0.67)
B: (d8, 0.6), (d6, 0.5), (d5, 0.4), (d1, 0.3), (d3, 0.2), (d7, 0.1)
1. Scan both lists from the beginning, reading one element from each list in every step, until k documents have been encountered in both lists (here assume k=2)
2. Compute the scores of these k documents. Also, for each document encountered in only one of the lists, perform a lookup into the other list to determine the document's score.
3. Return the k documents with the highest score (here d1, d5)
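The three steps can be turned into a short sketch. The postings values below are illustrative (in the spirit of the deck's example), and a document absent from one list simply contributes 0 there:

```python
def fagin_top_k(list_a, list_b, k):
    """Fagin's Algorithm for a two-term query.
    list_a, list_b: postings [(doc_id, score), ...] sorted by score, descending.
    Returns the k (doc_id, total_score) pairs with the highest summed score."""
    a_scores, b_scores = dict(list_a), dict(list_b)  # random-access lookups
    seen_a, seen_b = set(), set()
    # Step 1: scan both lists in lockstep until k docs occur in both.
    for (da, _), (db, _) in zip(list_a, list_b):
        seen_a.add(da)
        seen_b.add(db)
        if len(seen_a & seen_b) >= k:
            break
    # Step 2: score every encountered doc, looking up the other list;
    # a doc missing from one list contributes 0 there.
    totals = {d: a_scores.get(d, 0.0) + b_scores.get(d, 0.0)
              for d in seen_a | seen_b}
    # Step 3: return the k docs with the highest total score.
    return sorted(totals.items(), key=lambda t: -t[1])[:k]

# Illustrative postings, sorted by score descending
A = [("d1", 0.9), ("d2", 0.8), ("d4", 0.7), ("d3", 0.69), ("d5", 0.67)]
B = [("d8", 0.6), ("d6", 0.5), ("d5", 0.4), ("d1", 0.3), ("d3", 0.2), ("d7", 0.1)]
print(fagin_top_k(A, B, 2))  # top-2 documents: d1 and d5
```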
Efficient Query Processing
Threshold Algorithm (TA)
Scan both lists simultaneously, reading (d, f(d,q0)) from the first and (d′, f(d′,q1)) from the second list
Compute t = f(d,q0) + f(d′,q1)
For each d in one of the lists, immediately perform a lookup into the other list to compute its complete score
The algorithm terminates when k documents have been found whose scores are at least as high as the current value of t
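A sketch of TA under the same conventions as before (illustrative postings; the standard "score at least t" stopping rule is used):

```python
def threshold_top_k(list_a, list_b, k):
    """Threshold Algorithm (TA) for a two-term query over two postings
    lists sorted by score descending. Each step reads one posting from
    each list and immediately completes that doc's total score via a
    lookup into the other list."""
    a_scores, b_scores = dict(list_a), dict(list_b)
    best = {}  # doc_id -> total score, for all docs seen so far
    for (da, sa), (db, sb) in zip(list_a, list_b):
        t = sa + sb  # threshold: best total any still-unseen doc could reach
        for d in (da, db):
            if d not in best:
                best[d] = a_scores.get(d, 0.0) + b_scores.get(d, 0.0)
        top = sorted(best.values(), reverse=True)[:k]
        # Terminate once k docs score at least as high as the threshold t.
        if len(top) == k and top[-1] >= t:
            break
    return sorted(best.items(), key=lambda x: -x[1])[:k]

A = [("d1", 0.9), ("d2", 0.8), ("d3", 0.7), ("d4", 0.69), ("d5", 0.67)]
B = [("d6", 0.6), ("d5", 0.5), ("d3", 0.4), ("d1", 0.3), ("d7", 0.2), ("d8", 0.1)]
print(threshold_top_k(A, B, 2))  # top-2: d1 and d5
```

Because t shrinks as the scan moves down the sorted lists, TA can often stop after reading only a short prefix of each list.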
Because it does not make sense to scan two lists simultaneously when they are distributed across a P2P network, the above techniques have to be adapted. This leads to the following protocol, which aims to minimize the amount of data transferred.
Efficient Query Processing
A simple distributed pruning protocol (DPP)
Node A (holding the shorter list) sends the first x postings to node B. Let rmin be the smallest value f(d,q0) transmitted.
Node B receives the postings from A and performs lookups into its own list to compute the total scores. It retains the k documents with the highest scores; let rk be the smallest score among these.
Node B then transmits to A all postings among its first x postings with f(d,q1) > rk − rmin, together with the total scores of the k documents from the previous step.
Node A performs lookups into its own list for the postings received from B and determines the overall top-k documents.
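The four steps can be sketched end to end; the network transfers are simulated with plain local variables, and the list values match the deck's worked DPP example:

```python
def dpp_top_k(list_a, list_b, k, x):
    """Distributed pruning protocol (DPP) sketch for "q0 AND q1".
    Node A holds the shorter list; the protocol's network transfers are
    simulated here as local variables."""
    a_scores, b_scores = dict(list_a), dict(list_b)

    # Step 1: A sends its first x postings to B; rmin is the smallest
    # transmitted value f(d, q0).
    sent = list_a[:x]
    r_min = min(s for _, s in sent)

    # Step 2: B completes the scores of the received postings and keeps
    # the k best; rk is the smallest of those k totals.
    totals = {d: s + b_scores.get(d, 0.0) for d, s in sent}
    top_at_b = sorted(totals.items(), key=lambda t: -t[1])[:k]
    r_k = top_at_b[-1][1]

    # Step 3: B returns every posting among its own first x postings with
    # f(d, q1) > rk - rmin, plus the k partial totals from step 2.
    returned = [(d, s) for d, s in list_b[:x] if s > r_k - r_min]

    # Step 4: A completes the returned postings against its own list and
    # picks the overall top k.
    candidates = dict(top_at_b)
    for d, s in returned:
        candidates.setdefault(d, s + a_scores.get(d, 0.0))
    return sorted(candidates.items(), key=lambda t: -t[1])[:k]

A = [("d1", 0.9), ("d2", 0.8), ("d3", 0.7), ("d4", 0.69), ("d5", 0.67)]
B = [("d6", 0.6), ("d5", 0.5), ("d3", 0.4), ("d1", 0.3), ("d7", 0.2), ("d8", 0.1)]
print(dpp_top_k(A, B, k=2, x=3))  # top-2: d1 and d5
```

Only the pruned B-postings and the k partial totals cross the network in step 3, which is the point of the pruning bound rk − rmin.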
Efficient Query Processing
DPP example for k=2 and x=3:
A, containing term q0: (d1, 0.9), (d2, 0.8), (d3, 0.7), (d4, 0.69), (d5, 0.67)
B, containing term q1: (d6, 0.6), (d5, 0.5), (d3, 0.4), (d1, 0.3), (d7, 0.2), (d8, 0.1)
A → B: (d1, 0.9), (d2, 0.8), (d3, 0.7); rmin = 0.7
B computes: (d1, 0.9 + 0.3), (d2, 0.8 + ----), (d3, 0.7 + 0.4); rk = 1.1; rk − rmin = 1.1 − 0.7 = 0.4
B → A: (d6, 0.6), (d5, 0.5), because f(d,q1) > 0.4, together with (d1, 1.2), (d3, 1.1)
A computes: (d6, 0.6 + ----), (d5, 0.5 + 0.67)
Top 2 documents: 1. (d1, 1.2), 2. (d5, 1.17)
Efficient Query Processing
Problems with the DPP
Works only for queries containing 2 search terms
Random lookups can cause disk accesses, since large index structures reside on hard disk → bad response time
How should the value of x be chosen? (x should be the number of postings transmitted by A and B such that the DPP works correctly without an extra round trip; it depends on k and on the lengths of the inverted lists)
By deriving appropriate formulae based on extensive testing
By sampling-based methods that estimate the number of documents appearing in both lists
Experimental Results
Efficient Query Processing
Evaluation of DPP 900 two-term queries selected form a set of over 1 million Testing corpora: 120 million web pages (1.8TB) that were
crawled by their own crawler Value of x determined by experiments on TA Computation within nodes are not taken into account Commmunication costs and estimated times of DPP for the
top-10 documents and standard cosine measure:shortest 20% shorter 20% middle 20% longer 20% longest 20%
Shorter lists 10.401 63.853 222.948 666.717 3.371.176 # postings A B 2.057 4.083 2.904 4.417 3.745 # postings B A 1.486 4.084 2.891 4.413 3.745 Total bytes transferred 28.344 65.336 46.360 70.640 59.920 Total com time (400Kbps) 1.052 1.477 1.216 1.550 1.405 Total com time (2Mbps) 833 1.368 1.107 1.441 1.295
Future Work
Bloom filters
New algorithmic techniques for the index synchronization problem
New strategies for load balancing and rebuilding of lost replicas
More experimental evaluation concerning different types of queries
Questions?
Can we use this architecture to solve our hardware and processing problems?
How much data and programming parallelization will be needed to make this possible?