Fundamentals of MapReduce
Jimmy LinThe iSchoolUniversity of Maryland
Monday, March 30, 2009
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License. See http://creativecommons.org/licenses/by-nc-sa/3.0/us/ for details.
Some material adapted from slides by Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google Distributed Computing Seminar, 2007 (licensed under Creative Commons Attribution 3.0 License)
This afternoon...
Introduction to MapReduce
MapReduce “killer app” #1: text retrieval
MapReduce “killer app” #2: graph algorithms
Introduction to MapReduce
Introduction to MapReduce: Topics
Functional programming
MapReduce
Distributed file system
Functional Programming
MapReduce = functional programming meets distributed processing on steroids
Not a new idea… dates back to the 50’s (or even 30’s)
What is functional programming?
Computation as application of functions
Theoretical foundation provided by lambda calculus
How is it different?
Traditional notions of “data” and “instructions” are not applicable
Data flows are implicit in the program
Different orders of execution are possible
Exemplified by LISP and ML
Overview of Lisp
Lisp ≠ Lost In Silly Parentheses
We’ll focus on a particular dialect: “Scheme”
Lists are primitive data types:
'(1 2 3 4 5)
'((a 1) (b 2) (c 3))
Functions written in prefix notation
(+ 1 2) → 3
(* 3 4) → 12
(sqrt (+ (* 3 3) (* 4 4))) → 5
(define x 3) → x
(* x 5) → 15
Functions
Functions = lambda expressions bound to variables

(define foo
  (lambda (x y)
    (sqrt (+ (* x x) (* y y)))))

Syntactic sugar for defining functions; the above expression is equivalent to:

(define (foo x y)
  (sqrt (+ (* x x) (* y y))))

Once defined, the function can be applied:

(foo 3 4) → 5
Other Features
In Scheme, everything is an s-expression
No distinction between “data” and “code”
Easy to write self-modifying code
Higher-order functions
Functions that take other functions as arguments
(define (bar f x) (f (f x)))
(define (baz x) (* x x))
(bar baz 2) → 16
Doesn’t matter what f is, just apply it twice.
Recursion is your friend
Simple factorial example:

(define (factorial n)
  (if (= n 1)
      1
      (* n (factorial (- n 1)))))

(factorial 6) → 720
Even iteration is written with recursive calls!

(define (factorial-iter n)
  (define (aux n top product)
    (if (= n top)
        (* n product)
        (aux (+ n 1) top (* n product))))
  (aux 1 n 1))

(factorial-iter 6) → 720
Lisp → MapReduce?
What does this have to do with MapReduce?
After all, Lisp is about processing lists
Two important concepts in functional programming:
Map: do something to everything in a list
Fold: combine results of a list in some way
Map
Map is a higher-order function
How map works:
A function is applied to every element in a list
The result is a new list
[figure: a function f applied to each of five list elements, producing a new list]
Fold
Fold is also a higher-order function
How fold works:
Accumulator set to initial value
Function applied to list element and the accumulator
Result stored in the accumulator
Repeated for every item in the list
Result is the final value in the accumulator
[figure: f folds an initial value through five list elements to a final value]
Map/Fold in Action
Simple map example:

(map (lambda (x) (* x x)) '(1 2 3 4 5)) → '(1 4 9 16 25)

Fold examples:

(fold + 0 '(1 2 3 4 5)) → 15
(fold * 1 '(1 2 3 4 5)) → 120

Sum of squares:

(define (sum-of-squares v)
  (fold + 0 (map (lambda (x) (* x x)) v)))

(sum-of-squares '(1 2 3 4 5)) → 55
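The same map/fold pattern exists in most languages; a minimal Python sketch, where `functools.reduce` plays the role of fold:

```python
from functools import reduce

# map: apply a function to every element, producing a new list
squares = list(map(lambda x: x * x, [1, 2, 3, 4, 5]))  # [1, 4, 9, 16, 25]

# fold: combine list elements into an accumulator, starting from an initial value
total = reduce(lambda acc, x: acc + x, [1, 2, 3, 4, 5], 0)    # 15
product = reduce(lambda acc, x: acc * x, [1, 2, 3, 4, 5], 1)  # 120

# sum of squares = fold over a mapped list, as in the Scheme example
def sum_of_squares(v):
    return reduce(lambda acc, x: acc + x, map(lambda y: y * y, v), 0)

print(sum_of_squares([1, 2, 3, 4, 5]))  # 55
```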
Lisp → MapReduce
Let’s assume a long list of records: imagine if...
We can parallelize map operations
We have a mechanism for bringing map results back together in the fold operation
That’s MapReduce!
Observations:
No limit to map parallelization since maps are independent
We can reorder folding if the fold function is commutative and associative
Typical Problem
Iterate over a large number of records
Extract something of interest from each
Shuffle and sort intermediate results
Aggregate intermediate results
Generate final output
Key idea: provide an abstraction at the point of these two operations
MapReduce
Programmers specify two functions:
map (k, v) → *
reduce (k’, v’) → *
All v’ with the same k’ are reduced together
Usually, programmers also specify:
partition (k’, number of partitions) → partition for k’
Often a simple hash of the key, e.g., hash(k’) mod n
Allows reduce operations for different keys in parallel
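A default partitioner of this kind can be sketched in a few lines of Python (the function name and the use of the built-in `hash` are illustrative, not any particular framework’s API):

```python
def partition(key: str, num_partitions: int) -> int:
    """Assign a key to one of num_partitions reducers via hash(k') mod n.

    Note: Python salts str hashes per process, so assignments are stable
    within a run but may differ across runs (a real framework would use
    a deterministic hash).
    """
    return hash(key) % num_partitions

# All occurrences of the same key land in the same partition,
# so one reducer sees every value for that key.
assert partition("apple", 4) == partition("apple", 4)
```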
Implementations:
Google has a proprietary implementation in C++
Hadoop is an open-source implementation in Java
[figure: four mappers consume (k, v) pairs (k1 v1 … k6 v6) and emit intermediate pairs; Shuffle and Sort aggregates values by key (a → 1 5, b → 2 7, c → 2 3 6 9); three reducers each process one key group and emit results (r1 s1, r2 s2, r3 s3)]
Recall these problems?
How do we assign work units to workers?
What if we have more work units than workers?
What if workers need to share partial results?
How do we aggregate partial results?
How do we know all the workers have finished?
What if workers die?
MapReduce Runtime
Handles scheduling
Assigns workers to map and reduce tasks
Handles “data distribution”
Moves the process to the data
Handles synchronization
Gathers, sorts, and shuffles intermediate data
Handles faults
Detects worker failures and restarts
Everything happens on top of a distributed FS (later)
“Hello World”: Word Count
Map(String input_key, String input_value):
  // input_key: document name
  // input_value: document contents
  for each word w in input_value:
    EmitIntermediate(w, "1");

Reduce(String key, Iterator intermediate_values):
  // key: a word, same for input and output
  // intermediate_values: a list of counts
  int result = 0;
  for each v in intermediate_values:
    result += ParseInt(v);
  Emit(AsString(result));
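The same word count can be expressed as a runnable Python sketch that simulates the map, shuffle-and-sort, and reduce phases in-process (a toy model of the flow above, not the Hadoop API):

```python
from collections import defaultdict

def map_fn(doc_name, doc_contents):
    # Emit (word, 1) for every word in the document
    return [(w, 1) for w in doc_contents.split()]

def shuffle(pairs):
    # Group all values by key, as the runtime's shuffle-and-sort would
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return groups

def reduce_fn(word, counts):
    # Sum all the partial counts for one word
    return (word, sum(counts))

docs = {"d1": "the quick brown fox", "d2": "the lazy dog the fox"}
intermediate = [kv for name, text in docs.items() for kv in map_fn(name, text)]
result = dict(reduce_fn(k, vs) for k, vs in shuffle(intermediate).items())
print(result["the"])  # 3
```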
[figure: MapReduce execution overview — (1) the user program forks a master and workers; (2) the master assigns map and reduce tasks; (3) map workers read input splits 0–4; (4) map output is written to intermediate files on local disk; (5) reduce workers remotely read the intermediate data; (6) reduce output is written to output files 0 and 1]
Redrawn from Dean and Ghemawat (OSDI 2004)
Bandwidth Optimization
Issue: large number of key-value pairs
Solution: use “Combiner” functions
Executed on the same machine as the mapper
Results in a “mini-reduce” right after the map phase
Reduces key-value pairs to save bandwidth
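A combiner can be sketched as a local reduce run over each mapper’s output before the shuffle; a minimal illustration in the word-count setting (toy functions, not a framework interface):

```python
from collections import Counter

def map_fn(text):
    # Mapper output: one (word, 1) pair per word occurrence
    return [(w, 1) for w in text.split()]

def combine(pairs):
    # Mini-reduce on the mapper's machine: pre-sum counts per word
    return list(Counter(w for w, _ in pairs).items())

pairs = map_fn("to be or not to be")
combined = combine(pairs)
# 6 key-value pairs shrink to 4 before crossing the network
print(len(pairs), len(combined))  # 6 4
```

This only works because word-count’s reduce (summation) is commutative and associative, so partial sums can be merged again at the reducer.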
Skew Problem
Issue: reduce is only as fast as the slowest map
Solution: redundantly execute map operations, use results of the first to finish
Addresses hardware problems... but not issues related to inherent distribution of data
How do we get data to the workers?
[figure: compute nodes pulling data over the network from NAS and SAN storage]
What’s the problem here?
Distributed File System
Don’t move data to workers… move workers to the data!
Store data on the local disks of nodes in the cluster
Start up the workers on the node that has the data local
Why?
Not enough RAM to hold all the data in memory
Disk access is slow, but disk throughput is good
A distributed file system is the answer:
GFS (Google File System)
HDFS for Hadoop
GFS: Assumptions
Commodity hardware over “exotic” hardware
High component failure rates
Inexpensive commodity components fail all the time
“Modest” number of HUGE files
Files are write-once, mostly appended to (perhaps concurrently)
Large streaming reads over random access
High sustained throughput over low latency
GFS slides adapted from material by Dean et al.
GFS: Design Decisions
Files stored as chunks
Fixed size (64MB)
Reliability through replication
Each chunk replicated across 3+ chunkservers
Single master to coordinate access, keep metadata
Simple centralized management
No data caching
Little benefit due to large data sets, streaming reads
Simplify the API
Push some of the issues onto the client
[figure: GFS architecture — the application sends (file name, chunk index) through the GFS client to the GFS master, which holds the file namespace (e.g., /foo/bar → chunk 2ef0) and replies with (chunk handle, chunk location); the client then sends (chunk handle, byte range) directly to GFS chunkservers, which store chunks in the Linux file system and return chunk data; the master exchanges instructions and chunkserver state with the chunkservers]
Redrawn from Ghemawat et al. (SOSP 2003)
Single Master
We know this is a:
Single point of failure
Scalability bottleneck
GFS solutions:
Shadow masters
Minimize master involvement
• Never move data through it; use it only for metadata (and cache metadata at clients)
• Large chunk size
• Master delegates authority to primary replicas in data mutations (chunk leases)
Simple, and good enough!
Master’s Responsibilities (1/2)
Metadata storage
Namespace management/locking
Periodic communication with chunkservers
Give instructions, collect state, track cluster health
Chunk creation, re-replication, rebalancing
Balance space utilization and access speed
Spread replicas across racks to reduce correlated failures
Re-replicate data if redundancy falls below threshold
Rebalance data to smooth out storage and request load
Master’s Responsibilities (2/2)
Garbage collection
Simpler, more reliable than traditional file delete
Master logs the deletion, renames the file to a hidden name
Lazily garbage collects hidden files
Stale replica deletion
Detect “stale” replicas using chunk version numbers
Metadata
Global metadata is stored on the master
File and chunk namespaces
Mapping from files to chunks
Locations of each chunk’s replicas
All in memory (64 bytes / chunk)
Fast
Easily accessible
Master has an operation log for persistent logging of critical metadata updates
Persistent on local diskReplicatedCheckpoints for faster recovery
Mutations
Mutation = write or append
Must be done for all replicas
Goal: minimize master involvement
Lease mechanism:
Master picks one replica as primary; gives it a “lease” for mutations
Primary defines a serial order of mutations
All replicas follow this order
Data flow decoupled from control flow
Parallelization Problems
How do we assign work units to workers?
What if we have more work units than workers?
What if workers need to share partial results?
How do we aggregate partial results?
How do we know all the workers have finished?
What if workers die?
How is MapReduce different?
Questions?

MapReduce “killer app” #1: Text Retrieval
Text Retrieval: Topics
Introduction to information retrieval (IR)
Boolean retrieval
Ranked retrieval
Text retrieval with MapReduce
The Information Retrieval Cycle
[figure: the information retrieval cycle — resource/source selection → query formulation (producing a query) → search (producing results) → examination → document delivery, with feedback loops for source reselection and for system, vocabulary, concept, and document discovery]
The Central Problem in Search
Searcher ↔ Author
Concepts ↔ Concepts
Query terms ↔ Document terms
Do these represent the same concepts?
e.g., “tragic love story” vs. “fateful star-crossed romance”
Architecture of IR Systems
[figure: offline, documents pass through a representation function to produce document representations, which are stored in an index; online, the query passes through a representation function to produce a query representation; a comparison function matches the query representation against the index and returns hits]
How do we represent text?
Remember: computers don’t “understand” anything!
“Bag of words”
Treat all the words in a document as index terms for that document
Assign a “weight” to each term based on “importance”
Disregard order, structure, meaning, etc. of the words
Simple, yet effective!
Assumptions:
Term occurrence is independent
Document relevance is independent
“Words” are well-defined
What’s a word?
天主教教宗若望保祿二世因感冒再度住進醫院。這是他今年第二度因同樣的病因住院。 الناطق باسم -وقال مارك ريجيف
إن شارون قبل -الخارجية اإلسرائيلية الدعوة وسيقوم للمرة األولى بزيارةتونس، التي آانت لفترة طويلة المقر
1982الرسمي لمنظمة التحرير الفلسطينية بعد خروجها من لبنان عام .
Выступая в Мещанском суде Москвы экс-глава ЮКОСа заявил не совершал ничего противозаконного в чемзаявил не совершал ничего противозаконного, в чем обвиняет его генпрокуратура России.
भारत सरकार ने आिथर्क सवेर्क्षण में िवत्तीय वषर् 2005-06 में सात फ़ीसदीिवकास दर हािसल करने का आकलन िकया है और कर सुधार पर ज़ोर िदया है
日米連合で台頭中国に対処…アーミテージ前副長官提言
조재영기자= 서울시는 25일이명박시장이 `행정중심복합도시'' 건설안에대해 `군대라도동원해막고싶은심정''이라고말했다는일부언론의보도를부인했다.
-
8
Sample Document
McDonald's slims down spuds
Fast-food chain to reduce certain types of fat in its french fries with new cooking oil.
NEW YORK (CNN/Money) - McDonald's Corp. is cutting the amount of "bad" fat in its french fries nearly in half, the fast-food chain said Tuesday as it moves to make all its fried menu items healthier. But does that mean the popular shoestring fries won't taste the same? The company says no. "It's a win-win for our customers because they are getting the same great french-fry taste along with an even healthier nutrition profile," said Mike Roberts, president of McDonald's USA.
But others are not so sure. McDonald's will not specifically discuss the kind of oil it plans to use, but at least one nutrition expert says playing with the formula could mean a different taste. Shares of Oak Brook, Ill.-based McDonald's (MCD: down $0.54 to $23.22, Research, Estimates) were lower Tuesday afternoon. It was unclear Tuesday whether competitors Burger King and Wendy's International (WEN: down $0.80 to $34.91, Research, Estimates) would follow suit. Neither company could immediately be reached for comment.
…

“Bag of Words”
16 × said
14 × McDonalds
12 × fat
11 × fries
8 × new
6 × company, french, nutrition
5 × food, oil, percent, reduce, taste, Tuesday
…
Boolean Retrieval
Users express queries as a Boolean expression
AND, OR, NOT
Can be arbitrarily nested
Retrieval is based on the notion of sets
Any given query divides the collection into two sets: retrieved, not-retrieved
Pure Boolean systems do not define an ordering of the results
Representing Documents

Document 1: The quick brown fox jumped over the lazy dog’s back.
Document 2: Now is the time for all good men to come to the aid of their party.

[figure: a term-document incidence matrix — stopwords (the, is, for, to, of) are removed via a stopword list; each remaining term (quick, brown, fox, over, lazy, dog, back, now, time, all, good, men, come, jump, aid, their, party) is marked 0/1 per document]
Inverted Index

[figure: the term-document incidence matrix for eight documents is transposed into an inverted index — for each term, a sorted postings list of the documents containing it, e.g., fox → 3 5 7, dog → 3 5]
Boolean Retrieval
To execute a Boolean query:
Build the query syntax tree: for ( fox OR dog ) AND quick, the tree is AND(OR(fox, dog), quick)
For each clause, look up postings:
fox → 3 5 7
dog → 3 5
Traverse postings and apply the Boolean operator:
fox OR dog = union → 3 5 7
Efficiency analysis:
Postings traversal is linear (assuming sorted postings)
Start with the shortest posting first
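The linear traversal of two sorted postings lists can be sketched in Python (a textbook merge, not taken from any particular system; the postings for quick below are hypothetical, while fox and dog match the example):

```python
def postings_and(p1, p2):
    """Intersect two sorted postings lists in O(len(p1) + len(p2))."""
    i = j = 0
    out = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i]); i += 1; j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out

def postings_or(p1, p2):
    """Union of two sorted postings lists."""
    return sorted(set(p1) | set(p2))

# ( fox OR dog ) AND quick, with fox/dog postings from the example
fox, dog = [3, 5, 7], [3, 5]
quick = [1, 3, 5]  # hypothetical postings for illustration
print(postings_and(postings_or(fox, dog), quick))  # [3, 5]
```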
Extensions
Implementing proximity operators
Store word offsets in postings
Handling term variations
Stem words: love, loving, loves … → lov
Strengths and Weaknesses
Strengths:
Precise, if you know the right strategies
Precise, if you have an idea of what you’re looking for
Implementations are fast and efficient
Weaknesses:
Users must learn Boolean logic
Boolean logic is insufficient to capture the richness of language
No control over the size of the result set: either too many hits or none
When do you stop reading? All documents in the result set are considered “equally good”
What about partial matches? Documents that “don’t quite match” the query may be useful also
Ranked Retrieval
Order documents by how likely they are to be relevant to the information need:
Estimate relevance(q, d_i)
Sort documents by relevance
Display sorted results
User model:
Present hits one screen at a time, best results first
At any point, users can decide to stop looking
How do we estimate relevance?
Assume a document is relevant if it has a lot of query terms
Replace relevance(q, d_i) with sim(q, d_i)
Compute similarity of vector representations
Vector Space Model

[figure: documents d1…d5 plotted as vectors in a term space with axes t1, t2, t3; the angle θ between two vectors measures their similarity]

Assumption: Documents that are “close together” in vector space “talk about” the same things
Therefore, retrieve documents based on how close the document is to the query (i.e., similarity ~ “closeness”)
Similarity Metric
How about |d1 – d2|?
Instead of Euclidean distance, use the “angle” between the vectors
It all boils down to the inner product (dot product) of vectors:

cos(θ) = (d_j · d_k) / (|d_j| |d_k|)

sim(d_j, d_k) = (d_j · d_k) / (|d_j| |d_k|)
             = Σ_{i=1..n} w_{i,j} w_{i,k} / ( sqrt(Σ_{i=1..n} w_{i,j}²) · sqrt(Σ_{i=1..n} w_{i,k}²) )
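The cosine similarity above translates directly into Python (vectors as lists of term weights; a minimal illustration, not a production implementation):

```python
import math

def cosine_sim(dj, dk):
    # Dot product over the normalized vector lengths
    dot = sum(a * b for a, b in zip(dj, dk))
    norm_j = math.sqrt(sum(a * a for a in dj))
    norm_k = math.sqrt(sum(b * b for b in dk))
    return dot / (norm_j * norm_k)

# Vectors pointing the same direction score ~1.0; orthogonal vectors score 0.0
print(round(cosine_sim([1, 2, 0], [2, 4, 0]), 6))  # 1.0
print(cosine_sim([1, 0], [0, 1]))                  # 0.0
```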
Term Weighting
Term weights consist of two components:
Local: how important is the term in this document?
Global: how important is the term in the collection?
Here’s the intuition:
Terms that appear often in a document should get high weights
Terms that appear in many documents should get low weights
How do we capture this mathematically?
Term frequency (local)
Inverse document frequency (global)
TF.IDF Term Weighting

w_{i,j} = tf_{i,j} · log(N / n_i)

where
w_{i,j} = weight assigned to term i in document j
tf_{i,j} = number of occurrences of term i in document j
N = number of documents in the entire collection
n_i = number of documents containing term i
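The weighting formula translates directly into Python (log base 10, matching the style of idf values such as 0.301 and 0.602 used in the worked example that follows):

```python
import math

def tfidf(tf, df, n_docs):
    """w_ij = tf_ij * log10(N / n_i): term frequency times inverse document frequency."""
    return tf * math.log10(n_docs / df)

# A term appearing twice in a document, but in only 1 of 4 documents:
print(round(tfidf(2, 1, 4), 3))  # 1.204
# A term appearing in every document carries no discriminating power:
print(tfidf(5, 4, 4))  # 0.0
```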
TF.IDF Example

[figure: a worked example over four documents — term frequencies for the terms complicated, contaminated, fallout, information, interesting, nuclear, retrieval, siberia; the corresponding idf values (0.301, 0.125, 0.125, 0.125, 0.602, 0.301, 0.000, 0.602); and the resulting tf.idf postings per term]
Sketch: Scoring Algorithm
Initialize accumulators to hold document scores
For each query term t in the user’s query:
Fetch t’s postings
For each document d in the postings, score[d] += w_{t,d} × w_{t,q}
Apply length normalization to the scores at the end
Return top N documents
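The accumulator sketch can be written out as a short Python illustration (the query and index are toy structures, and the term weights are assumed to be precomputed; length normalization is omitted):

```python
from collections import defaultdict

def score(query_weights, postings):
    """Term-at-a-time scoring with per-document accumulators.

    query_weights: {term: w_tq}
    postings: {term: [(doc_id, w_td), ...]}
    """
    acc = defaultdict(float)
    for t, w_tq in query_weights.items():
        for doc, w_td in postings.get(t, []):
            acc[doc] += w_td * w_tq  # score[d] += w_td * w_tq
    # Return documents sorted by descending score
    return sorted(acc.items(), key=lambda kv: -kv[1])

postings = {"fox": [(1, 0.5), (3, 0.2)], "dog": [(3, 0.4)]}
ranked = score({"fox": 1.0, "dog": 1.0}, postings)
print(ranked[0][0])  # 3  (doc 3 matches both terms and scores highest)
```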
MapReduce it?
The indexing problem:
Must be relatively fast, but need not be real time
For the Web, incremental updates are important
The retrieval problem:
Must have sub-second response
For the Web, only need relatively few results
Indexing: Performance Analysis
Inverted indexing is fundamental to all IR models
Fundamentally, a large sorting problem:
Terms usually fit in memory
Postings usually don’t
How is it done on a single machine?
How large is the inverted index?
Size of vocabulary
Size of postings
Vocabulary Size: Heaps’ Law

V = K n^β

V is vocabulary size
n is corpus size (number of documents)
K and β are constants
Typically, K is between 10 and 100, and β is between 0.4 and 0.6
When adding new documents, the system is likely to have seen most terms already… but the postings keep growing
Postings Size: Zipf’s Law
George Kingsley Zipf (1902–1950) observed the following relation between frequency and rank:

f · r = c, or equivalently f = c / r

f = frequency
r = rank
c = constant

In other words:
A few elements occur very frequently
Many elements occur very infrequently

Zipfian distributions:
English words
Library book checkout patterns
Website popularity (almost anything on the Web)
Word Frequency in English
Frequency of the 50 most common words in English (sample of 19 million words):

the  1130021    from    96900    or     54958
of    547311    he      94585    about  53713
to    516635    million 93515    market 52110
a     464736    year    90104    they   51359
in    390819    its     86774    this   50933
and   387703    be      85588    would  50828
that  204351    was     83398    you    49281
for   199340    company 83070    which  48273
is    152483    an      76974    bank   47940
said  148302    has     74405    stock  47401
it    134323    are     74097    trade  47310
on    121173    have    73132    his    47116
by    118863    but     71887    more   46244
as    109135    will    71494    who    42142
at    101779    say     66807    one    41635
mr    101679    new     64456    their  40910
with  101210    share   63925

Does it fit Zipf’s Law? The following shows r·f ÷ 1000, where r is the rank of word w in the sample and f is the frequency of word w in the sample:

the   59    from     92    or     101
of    58    he       95    about  102
to    82    million  98    market 101
a     98    year    100    they   103
in   103    its     100    this   105
and  122    be      104    would  107
that  75    was     105    you    106
for   84    company 109    which  107
is    72    an      105    bank   109
said  78    has     106    stock  110
it    78    are     109    trade  112
on    77    have    112    his    114
by    81    but     114    more   114
as    80    will    117    who    106
at    80    say     113    one    107
mr    86    new     112    their  108
with  91    share   114
MapReduce: Index Construction
Map over all documents:
Emit term as key, (docid, tf) as value
Emit other information as necessary (e.g., term position)
Reduce:
Trivial: each value represents a posting!
Might want to sort the postings (e.g., by docid or tf)
MapReduce does all the heavy lifting!
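The mapper and reducer can be sketched as plain Python functions, with the shuffle simulated in-process (a toy model; a real Hadoop job would implement the framework’s Mapper/Reducer interfaces):

```python
from collections import Counter, defaultdict

def map_doc(docid, text):
    # Emit (term, (docid, tf)) for each distinct term in the document
    return [(term, (docid, tf)) for term, tf in Counter(text.split()).items()]

def reduce_term(term, values):
    # Each value is already a posting; just sort the list by docid
    return (term, sorted(values))

# Simulate the shuffle: group mapper output by term
pairs = map_doc(1, "a b a") + map_doc(2, "b c")
grouped = defaultdict(list)
for term, posting in pairs:
    grouped[term].append(posting)

index = dict(reduce_term(t, vs) for t, vs in grouped.items())
print(index["b"])  # [(1, 1), (2, 1)]
```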
Query Execution
MapReduce is meant for large-data batch processing
Not suitable for lots of real-time operations requiring low latency
The solution: “the secret sauce”
Most likely involves document partitioning
Lots of system engineering: e.g., caching, load balancing, etc.
MapReduce: Query Execution
High-throughput batch query execution:
Instead of sequentially accumulating scores per query term:
Have mappers traverse postings in parallel, emitting partial score components
Reducers serve as the accumulators, summing contributions for each query term
MapReduce does all the heavy lifting:
Replace random access with sequential reads
Amortize over lots of queries
Examine multiple postings in parallel
Questions?
MapReduce “killer app” #2: Graph Algorithms
Graph Algorithms: Topics
Introduction to graph algorithms and graph representations
Single Source Shortest Path (SSSP) problem
Refresher: Dijkstra’s algorithm
Breadth-First Search with MapReduce
PageRank
What’s a graph?
G = (V, E), where:
V represents the set of vertices (nodes)
E represents the set of edges (links)
Both vertices and edges may contain additional information
Different types of graphs:
Directed vs. undirected edges
Presence or absence of cycles
Graphs are everywhere:
Hyperlink structure of the Web
Physical structure of computers on the Internet
Interstate highway system
Social networks
Some Graph Problems
Finding shortest paths
Routing Internet traffic and UPS trucks
Finding minimum spanning trees
Telco laying down fiber
Finding max flow
Airline scheduling
Identifying “special” nodes and communities
Breaking up terrorist cells, spread of avian flu
Bipartite matching
Monster.com, Match.com
And of course... PageRank
Graphs and MapReduce
Graph algorithms typically involve:
Performing computation at each node
Processing node-specific data, edge-specific data, and link structure
Traversing the graph in some manner
Key questions:
How do you represent graph data in MapReduce?
How do you traverse a graph in MapReduce?
Representing Graphs
G = (V, E)
A poor representation for computational purposes
Two common representations:
Adjacency matrix
Adjacency list
Adjacency Matrices
Represent a graph as an n × n square matrix M:
n = |V|
M_ij = 1 means a link from node i to j

     1  2  3  4
1    0  1  0  1
2    1  0  1  1
3    1  0  0  0
4    1  0  1  0

[figure: the corresponding directed graph on nodes 1–4]
Adjacency Matrices: Critique
Advantages:
Naturally encapsulates iteration over nodes
Rows and columns correspond to inlinks and outlinks
Disadvantages:
Lots of zeros for sparse matrices
Lots of wasted space
Adjacency Lists
Take adjacency matrices… and throw away all the zeros:

1: 2, 4
2: 1, 3, 4
3: 1
4: 1, 3
Adjacency Lists: Critique
Advantages:
Much more compact representation
Easy to compute over outlinks
Graph structure can be broken up and distributed
Disadvantages:
Much more difficult to compute over inlinks
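Converting between the two representations is mechanical; a small Python sketch using the 4-node example from the slides:

```python
def matrix_to_adjlist(m):
    """Drop the zeros: keep, per row, the 1-indexed columns containing a 1."""
    return {i + 1: [j + 1 for j, bit in enumerate(row) if bit]
            for i, row in enumerate(m)}

# The adjacency matrix from the example
M = [[0, 1, 0, 1],
     [1, 0, 1, 1],
     [1, 0, 0, 0],
     [1, 0, 1, 0]]
print(matrix_to_adjlist(M))  # {1: [2, 4], 2: [1, 3, 4], 3: [1], 4: [1, 3]}
```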
Single Source Shortest Path
Problem: find the shortest path from a source node to one or more target nodes
First, a refresher: Dijkstra’s Algorithm
Dijkstra’s Algorithm Example

[figure: a five-node weighted directed graph; the source has distance 0 and all other nodes start at ∞; edge weights are 10, 5, 1, 2, 3, 2, 9, 7, 4, 6]

Example from CLR
[figure: successive snapshots of Dijkstra’s algorithm on the same graph — at each step the minimum-distance unsettled node is settled and its outgoing edges relaxed; the tentative distances evolve from (10, ∞, 5, ∞) through (8, 14, 5, 7) and (8, 13, 5, 7) to the final (8, 9, 5, 7)]
Single Source Shortest Path
Problem: find the shortest path from a source node to one or more target nodes
Single-processor machine: Dijkstra’s Algorithm
MapReduce: parallel Breadth-First Search (BFS)
Finding the Shortest Path
First, consider equal edge weights
Solution to the problem can be defined inductively
Here’s the intuition:
DistanceTo(startNode) = 0
For all nodes n directly reachable from startNode, DistanceTo(n) = 1
For all nodes n reachable from some other set of nodes S, DistanceTo(n) = 1 + min(DistanceTo(m), m ∈ S)
From Intuition to Algorithm
A map task receives:
Key: node n
Value: D (distance from start), points-to (list of nodes reachable from n)
∀p ∈ points-to: emit (p, D+1)
The reduce task gathers possible distances to a given p and selects the minimum one
Multiple Iterations Needed
This MapReduce task advances the “known frontier” by one hop
Subsequent iterations include more reachable nodes as the frontier advances
Multiple iterations are needed to explore the entire graph
Feed output back into the same MapReduce task
Preserving graph structure:
Problem: Where did the points-to list go?
Solution: Mapper emits (n, points-to) as well
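One iteration of parallel BFS can be simulated in Python; a minimal sketch (node values carry both a distance and the adjacency list, tagged so the reducer can keep the graph structure, per the slide):

```python
INF = float("inf")

def bfs_map(node, dist, points_to):
    # Emit candidate distances for neighbors, plus the graph structure itself
    out = [(node, ("dist", dist)), (node, ("graph", points_to))]
    if dist < INF:
        out += [(p, ("dist", dist + 1)) for p in points_to]
    return out

def bfs_reduce(node, values):
    # Select the minimum candidate distance; recover the adjacency list
    dist = min(d for kind, d in values if kind == "dist")
    graph = next((g for kind, g in values if kind == "graph"), [])
    return node, dist, graph

# Two iterations reach the whole chain 1 -> 2 -> 3 from source node 1
state = {1: (0, [2]), 2: (INF, [3]), 3: (INF, [])}
for _ in range(2):
    emitted = [kv for n, (d, pts) in state.items() for kv in bfs_map(n, d, pts)]
    groups = {}
    for k, v in emitted:
        groups.setdefault(k, []).append(v)
    state = {n: (d, g) for n, d, g in (bfs_reduce(k, vs) for k, vs in groups.items())}
print(state)  # {1: (0, [2]), 2: (1, [3]), 3: (2, [])}
```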
Visualizing Parallel BFS

[figure: a graph whose nodes are labeled with their BFS distance from the source — 1, then 2, 3, and 4 as the known frontier expands one hop per iteration]
Termination
Does the algorithm ever terminate?
Eventually, all nodes will be discovered and all edges will be considered (in a connected graph)
When do we stop?
Weighted Edges
Now add positive weights to the edges
Simple change: the points-to list in the map task includes a weight w for each pointed-to node
Emit (p, D+w_p) instead of (p, D+1) for each node p
Does this ever terminate?
Yes! Eventually, no better distances will be found. When the distances stop changing, we stop
Mapper should emit (n, D) to ensure that the “current distance” is carried into the reducer
Comparison to Dijkstra
Dijkstra’s algorithm is more efficient:
At any step it only pursues edges from the minimum-cost path inside the frontier
MapReduce explores all paths in parallel:
Divide and conquer
Throw more hardware at the problem
General Approach
MapReduce is adept at manipulating graphs:
Store graphs as adjacency lists
Graph algorithms with MapReduce:
Each map task receives a node and its outlinks
Map task computes some function of the link structure, emits value with target as the key
Reduce task collects keys (target nodes) and aggregates
Iterate multiple MapReduce cycles until some termination condition
Remember to “pass” graph structure from one iteration to the next
Random Walks Over the Web
Model:
User starts at a random Web page
User randomly clicks on links, surfing from page to page
What’s the amount of time that will be spent on any given page?
This is PageRank
PageRank: Defined
Given page x with in-bound links t1…tn, where
C(t) is the out-degree of t
α is the probability of a random jump
N is the total number of nodes in the graph

PR(x) = α (1/N) + (1 − α) Σ_{i=1..n} PR(t_i) / C(t_i)

[figure: pages t1…tn each linking to page x]
Computing PageRank
Properties of PageRank:
Can be computed iteratively
Effects at each iteration are local
Sketch of algorithm:
Start with seed PR_i values
Each page distributes PR_i “credit” to all pages it links to
Each target page adds up “credit” from multiple in-bound links to compute PR_{i+1}
Iterate until values converge
PageRank in MapReduce

[figure: Map — distribute PageRank “credit” to link targets; Reduce — gather up PageRank “credit” from multiple sources to compute the new PageRank value; iterate until convergence]
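An iteration of this scheme can be sketched in Python following the PR formula above (α, the iteration count, and the three-node toy graph are illustrative assumptions; the graph has no dangling nodes, which a real implementation would have to handle):

```python
def pagerank_map(node, pr, outlinks):
    # Pass along graph structure, and distribute credit to link targets
    out = [(node, ("graph", outlinks))]
    out += [(t, ("credit", pr / len(outlinks))) for t in outlinks]
    return out

def pagerank_reduce(node, values, alpha, n):
    # PR(x) = alpha/N + (1 - alpha) * sum of incoming credit
    credit = sum(c for kind, c in values if kind == "credit")
    outlinks = next((g for kind, g in values if kind == "graph"), [])
    return node, alpha / n + (1 - alpha) * credit, outlinks

# Toy graph: 1 -> {2, 3}, 2 -> {3}, 3 -> {1}; seed PR = 1/3 each
graph = {1: (1 / 3, [2, 3]), 2: (1 / 3, [3]), 3: (1 / 3, [1])}
alpha, n = 0.15, 3
for _ in range(20):  # fixed iteration count stands in for a convergence test
    emitted = [kv for node, (pr, links) in graph.items()
               for kv in pagerank_map(node, pr, links)]
    groups = {}
    for k, v in emitted:
        groups.setdefault(k, []).append(v)
    graph = {node: (pr, links) for node, pr, links in
             (pagerank_reduce(k, vs, alpha, n) for k, vs in groups.items())}
print(sorted((node, round(pr, 3)) for node, (pr, links) in graph.items()))
```

Because every node’s mass is fully redistributed each round, the PR values continue to sum to 1, and node 3 (with two in-bound links) ends up ranked above node 2.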
PageRank: Issues
Is PageRank guaranteed to converge? How quickly?
What is the “correct” value of α, and how sensitive is the algorithm to it?
What about dangling links?
How do you know when to stop?

Questions?