Basics of Cloud Computing –Lecture 3 Introduction to MapReducekodu.ut.ee › ~srirama › cloud...
Transcript of Basics of Cloud Computing –Lecture 3 Introduction to MapReducekodu.ut.ee › ~srirama › cloud...
![Page 1: Basics of Cloud Computing –Lecture 3 Introduction to MapReducekodu.ut.ee › ~srirama › cloud › L3_MapReduce.pdf · Basics of Cloud Computing –Lecture 3 Introduction to MapReduce](https://reader034.fdocuments.in/reader034/viewer/2022042406/5f2034031262b463af7747b4/html5/thumbnails/1.jpg)
Basics of Cloud Computing – Lecture 3
Introduction to MapReduce
Satish Srirama
Some material adapted from slides by Jimmy Lin, Christophe Bisciglia, Aaron Kimball,
& Sierra Michels-Slettvet, Google Distributed Computing Seminar, 2007 (licensed
under Creation Commons Attribution 3.0 License)
![Page 2: Basics of Cloud Computing –Lecture 3 Introduction to MapReducekodu.ut.ee › ~srirama › cloud › L3_MapReduce.pdf · Basics of Cloud Computing –Lecture 3 Introduction to MapReduce](https://reader034.fdocuments.in/reader034/viewer/2022042406/5f2034031262b463af7747b4/html5/thumbnails/2.jpg)
Outline
• Functional programming review
• MapReduce
• Hadoop
• HDFS (Hadoop Distributed File System)• HDFS (Hadoop Distributed File System)
19.04.2010 2Satish Srirama
![Page 3: Basics of Cloud Computing –Lecture 3 Introduction to MapReducekodu.ut.ee › ~srirama › cloud › L3_MapReduce.pdf · Basics of Cloud Computing –Lecture 3 Introduction to MapReduce](https://reader034.fdocuments.in/reader034/viewer/2022042406/5f2034031262b463af7747b4/html5/thumbnails/3.jpg)
Economics of Cloud Providers –
Failures - Recap
• Cloud Computing providers bring a shift from high reliability/availability servers to commodity servers– At least one failure per day in large datacenter
• Caveat: User software has to adapt to failures• Caveat: User software has to adapt to failures
• Solution: Replicate data and computation– MapReduce & Distributed File System
• MapReduce = functional programming meets distributed processing on steroids – Not a new idea… dates back to the 50’s (or even 30’s)
19.04.2010 3/30 Satish Srirama
![Page 4: Basics of Cloud Computing –Lecture 3 Introduction to MapReducekodu.ut.ee › ~srirama › cloud › L3_MapReduce.pdf · Basics of Cloud Computing –Lecture 3 Introduction to MapReduce](https://reader034.fdocuments.in/reader034/viewer/2022042406/5f2034031262b463af7747b4/html5/thumbnails/4.jpg)
Functional programming
• What is functional programming?
– Computation as application of functions
– Theoretical foundation provided by lambda calculus
• How is it different?• How is it different?
– Traditional notions of “data” and “instructions” are
not applicable
– Data flows are implicit in program
– Different orders of execution are possible
• Exemplified by LISP and ML
19.04.2010 4Satish Srirama
![Page 5: Basics of Cloud Computing –Lecture 3 Introduction to MapReducekodu.ut.ee › ~srirama › cloud › L3_MapReduce.pdf · Basics of Cloud Computing –Lecture 3 Introduction to MapReduce](https://reader034.fdocuments.in/reader034/viewer/2022042406/5f2034031262b463af7747b4/html5/thumbnails/5.jpg)
Functional Programming Review
• Lists are primitive data types
• Functions = lambda expressions bound to variables
– f = lambda x : 2 * x– f = lambda x : 2 * x
print f(2)
• Higher-order functions
– Functions that take other functions as arguments
• Recursion is your friend
19.04.2010 5Satish Srirama
![Page 6: Basics of Cloud Computing –Lecture 3 Introduction to MapReducekodu.ut.ee › ~srirama › cloud › L3_MapReduce.pdf · Basics of Cloud Computing –Lecture 3 Introduction to MapReduce](https://reader034.fdocuments.in/reader034/viewer/2022042406/5f2034031262b463af7747b4/html5/thumbnails/6.jpg)
Functional Programming - features
• Functional operations do not modify data structures: They always create new ones
• Original data still exists in unmodified form
• Data flows are implicit in program design• Data flows are implicit in program design
• Order of operations does not matter
19.04.2010 6Satish Srirama
![Page 7: Basics of Cloud Computing –Lecture 3 Introduction to MapReducekodu.ut.ee › ~srirama › cloud › L3_MapReduce.pdf · Basics of Cloud Computing –Lecture 3 Introduction to MapReduce](https://reader034.fdocuments.in/reader034/viewer/2022042406/5f2034031262b463af7747b4/html5/thumbnails/7.jpg)
FP features - continued
• fun foo(l: int list) =
sum(l) + mul(l) + length(l)– Order of sum() and mul(), etc does not matter – they do not
modify l
• Functional Updates Do Not Modify Structures– fun append(x, lst) = – fun append(x, lst) =
let lst' = reverse lst in
reverse ( x :: lst' )
– The append() function reverses a list, adds a new element to the front, and returns all of that, reversed, which appends an item.
– But it never modifies lst!
• Functions Can Be Used As Arguments– fun DoDouble(f, x) = f (f x)
19.04.2010 Satish Srirama 7
![Page 8: Basics of Cloud Computing –Lecture 3 Introduction to MapReducekodu.ut.ee › ~srirama › cloud › L3_MapReduce.pdf · Basics of Cloud Computing –Lecture 3 Introduction to MapReduce](https://reader034.fdocuments.in/reader034/viewer/2022042406/5f2034031262b463af7747b4/html5/thumbnails/8.jpg)
Functional Programming ->
MapReduce
• Two important concepts in functional
programming
– Map: do something to everything in a list
– Fold: combine results of a list in some way– Fold: combine results of a list in some way
19.04.2010 Satish Srirama 8
![Page 9: Basics of Cloud Computing –Lecture 3 Introduction to MapReducekodu.ut.ee › ~srirama › cloud › L3_MapReduce.pdf · Basics of Cloud Computing –Lecture 3 Introduction to MapReduce](https://reader034.fdocuments.in/reader034/viewer/2022042406/5f2034031262b463af7747b4/html5/thumbnails/9.jpg)
Map
• Map is a higher-order function
• How map works:– Function is applied to every element in a list
– Result is a new list
• map f lst: (’a->’b) -> (’a list) -> (’b list)• map f lst: (’a->’b) -> (’a list) -> (’b list)– Creates a new list by applying f to each element of the
input list; returns output in order.
19.04.2010 Satish Srirama 9
f f f f f f
![Page 10: Basics of Cloud Computing –Lecture 3 Introduction to MapReducekodu.ut.ee › ~srirama › cloud › L3_MapReduce.pdf · Basics of Cloud Computing –Lecture 3 Introduction to MapReduce](https://reader034.fdocuments.in/reader034/viewer/2022042406/5f2034031262b463af7747b4/html5/thumbnails/10.jpg)
Fold
• Fold is also a higher-order function
• How fold works:– Accumulator set to initial value
– Function applied to list element and the accumulator
– Result stored in the accumulator– Result stored in the accumulator
– Repeated for every item in the list
– Result is the final value in the accumulator
19.04.2010 Satish Srirama 10
![Page 11: Basics of Cloud Computing –Lecture 3 Introduction to MapReducekodu.ut.ee › ~srirama › cloud › L3_MapReduce.pdf · Basics of Cloud Computing –Lecture 3 Introduction to MapReduce](https://reader034.fdocuments.in/reader034/viewer/2022042406/5f2034031262b463af7747b4/html5/thumbnails/11.jpg)
Map/Fold in Action
• Simple map example:
• Fold examples:
(map (lambda (x) (* x x))
'(1 2 3 4 5))
→ '(1 4 9 16 25)
• Fold examples:
• Sum of squares:
(fold + 0 '(1 2 3 4 5)) → 15
(fold * 1 '(1 2 3 4 5)) → 120
(define (sum-of-squares v)
(fold + 0 (map (lambda (x) (* x x)) v)))
(sum-of-squares '(1 2 3 4 5)) → 55
19.04.2010 11Satish Srirama
![Page 12: Basics of Cloud Computing –Lecture 3 Introduction to MapReducekodu.ut.ee › ~srirama › cloud › L3_MapReduce.pdf · Basics of Cloud Computing –Lecture 3 Introduction to MapReduce](https://reader034.fdocuments.in/reader034/viewer/2022042406/5f2034031262b463af7747b4/html5/thumbnails/12.jpg)
Implicit Parallelism In map
• In a purely functional setting, elements of a list being computed by map cannot see the effects of the computations on other elements
• If order of application of f to elements in list is commutative, we can reorder or parallelize execution
• Let’s assume a long list of records: imagine if...– We can parallelize map operations
– We have a mechanism for bringing map results back together in the – We have a mechanism for bringing map results back together in the fold operation
• This is the “secret” that MapReduce exploits
• Observations:– No limit to map parallelization since maps are independent
– We can reorder folding if the fold function is commutative and associative
19.04.2010 Satish Srirama 12
![Page 13: Basics of Cloud Computing –Lecture 3 Introduction to MapReducekodu.ut.ee › ~srirama › cloud › L3_MapReduce.pdf · Basics of Cloud Computing –Lecture 3 Introduction to MapReduce](https://reader034.fdocuments.in/reader034/viewer/2022042406/5f2034031262b463af7747b4/html5/thumbnails/13.jpg)
Typical Large-Data Problem
• Iterate over a large number of records
• Extract something of interest from each
• Shuffle and sort intermediate results
• Aggregate intermediate results• Aggregate intermediate results
• Generate final outputKey idea: provide a functional abstraction for
these two operations – MapReduce
19.04.2010 13Satish Srirama
![Page 14: Basics of Cloud Computing –Lecture 3 Introduction to MapReducekodu.ut.ee › ~srirama › cloud › L3_MapReduce.pdf · Basics of Cloud Computing –Lecture 3 Introduction to MapReduce](https://reader034.fdocuments.in/reader034/viewer/2022042406/5f2034031262b463af7747b4/html5/thumbnails/14.jpg)
MapReduce
• Programmers specify two functions:
map (k, v) → <k’, v’>*
reduce (k’, v’) → <k’, v’>*
– All values with the same key are sent to the same – All values with the same key are sent to the same reducer
• The execution framework handles everything else…
19.04.2010 14Satish Srirama
![Page 15: Basics of Cloud Computing –Lecture 3 Introduction to MapReducekodu.ut.ee › ~srirama › cloud › L3_MapReduce.pdf · Basics of Cloud Computing –Lecture 3 Introduction to MapReduce](https://reader034.fdocuments.in/reader034/viewer/2022042406/5f2034031262b463af7747b4/html5/thumbnails/15.jpg)
mapmap map map
k1 k2 k3 k4 k5 k6v1 v2 v3 v4 v5 v6
ba 1 2 c c3 6 a c5 2 b c7 8
MapReduce
Shuffle and Sort: aggregate values by keys
reduce reduce reduce
ba 1 2 c c3 6 a c5 2 b c7 8
a 1 5 b 2 7 c 2 3 6 8
r1 s1 r2 s2 r3 s3
19.04.2010 15Satish Srirama
![Page 16: Basics of Cloud Computing –Lecture 3 Introduction to MapReducekodu.ut.ee › ~srirama › cloud › L3_MapReduce.pdf · Basics of Cloud Computing –Lecture 3 Introduction to MapReduce](https://reader034.fdocuments.in/reader034/viewer/2022042406/5f2034031262b463af7747b4/html5/thumbnails/16.jpg)
MapReduce
• Programmers specify two functions:
map (k, v) → <k’, v’>*
reduce (k’, v’) → <k’, v’>*
– All values with the same key are sent to the same – All values with the same key are sent to the same reducer
• The execution framework handles everything else…
What’s “everything else”?
19.04.2010 16Satish Srirama
![Page 17: Basics of Cloud Computing –Lecture 3 Introduction to MapReducekodu.ut.ee › ~srirama › cloud › L3_MapReduce.pdf · Basics of Cloud Computing –Lecture 3 Introduction to MapReduce](https://reader034.fdocuments.in/reader034/viewer/2022042406/5f2034031262b463af7747b4/html5/thumbnails/17.jpg)
MapReduce “Runtime”• Handles scheduling
– Assigns workers to map and reduce tasks
• Handles “data distribution”– Moves processes to data
• Handles synchronizationGathers, sorts, and shuffles intermediate data– Gathers, sorts, and shuffles intermediate data
• Handles errors and faults– Detects worker failures and automatically restarts
• Handles speculative execution– Detects “slow” workers and re-executes work
• Everything happens on top of a distributed FS (later)
Sounds simple, but many challenges!19.04.2010 17Satish Srirama
![Page 18: Basics of Cloud Computing –Lecture 3 Introduction to MapReducekodu.ut.ee › ~srirama › cloud › L3_MapReduce.pdf · Basics of Cloud Computing –Lecture 3 Introduction to MapReduce](https://reader034.fdocuments.in/reader034/viewer/2022042406/5f2034031262b463af7747b4/html5/thumbnails/18.jpg)
MapReduce - extended
• Programmers specify two functions:map (k, v) → <k’, v’>*reduce (k’, v’) → <k’, v’>*– All values with the same key are reduced together
• The execution framework handles everything else…• Not quite…usually, programmers also specify:• Not quite…usually, programmers also specify:
partition (k’, number of partitions) → partition for k’– Often a simple hash of the key, e.g., hash(k’) mod n– Divides up key space for parallel reduce operationscombine (k’, v’) → <k’, v’>*– Mini-reducers that run in memory after the map phase– Used as an optimization to reduce network traffic
19.04.2010 18Satish Srirama
![Page 19: Basics of Cloud Computing –Lecture 3 Introduction to MapReducekodu.ut.ee › ~srirama › cloud › L3_MapReduce.pdf · Basics of Cloud Computing –Lecture 3 Introduction to MapReduce](https://reader034.fdocuments.in/reader034/viewer/2022042406/5f2034031262b463af7747b4/html5/thumbnails/19.jpg)
combinecombine combine combine
ba 1 2 c 9 a c5 2 b c7 8
mapmap map map
k1 k2 k3 k4 k5 k6v1 v2 v3 v4 v5 v6
ba 1 2 c c3 6 a c5 2 b c7 8
ba 1 2 c 9 a c5 2 b c7 8
partition partition partition partition
Shuffle and Sort: aggregate values by keys
reduce reduce reduce
a 1 5 b 2 7 c 2 9 8
r1 s1 r2 s2 r3 s3
c 2 3 6 8
19.04.2010 19Satish Srirama
![Page 20: Basics of Cloud Computing –Lecture 3 Introduction to MapReducekodu.ut.ee › ~srirama › cloud › L3_MapReduce.pdf · Basics of Cloud Computing –Lecture 3 Introduction to MapReduce](https://reader034.fdocuments.in/reader034/viewer/2022042406/5f2034031262b463af7747b4/html5/thumbnails/20.jpg)
Two more details…
• Barrier between map and reduce phases
– But we can begin copying intermediate data
earlier
• Keys arrive at each reducer in sorted order• Keys arrive at each reducer in sorted order
– No enforced ordering across reducers
19.04.2010 20Satish Srirama
![Page 21: Basics of Cloud Computing –Lecture 3 Introduction to MapReducekodu.ut.ee › ~srirama › cloud › L3_MapReduce.pdf · Basics of Cloud Computing –Lecture 3 Introduction to MapReduce](https://reader034.fdocuments.in/reader034/viewer/2022042406/5f2034031262b463af7747b4/html5/thumbnails/21.jpg)
split 0
worker
Master
User
Program
output
(1) submit
(2) schedule map (2) schedule reduce
MapReduce Overall Architecture
split 0
split 1
split 2
split 3
split 4
worker
worker
worker
worker
output
file 0
output
file 1
(3) read(4) local write
(5) remote read (6) write
Input
files
Map
phase
Intermediate files
(on local disk)
Reduce
phase
Output
files
Adapted from (Dean and Ghemawat, OSDI 2004)
19.04.2010 21Satish Srirama
![Page 22: Basics of Cloud Computing –Lecture 3 Introduction to MapReducekodu.ut.ee › ~srirama › cloud › L3_MapReduce.pdf · Basics of Cloud Computing –Lecture 3 Introduction to MapReduce](https://reader034.fdocuments.in/reader034/viewer/2022042406/5f2034031262b463af7747b4/html5/thumbnails/22.jpg)
“Hello World” Example: Word Count
Map(String docid, String text):
for each word w in text:
Emit(w, 1);
Reduce(String term, Iterator<Int> values):
int sum = 0;
for each v in values:
sum += v;
Emit(term, value);
19.04.2010 22Satish Srirama
![Page 23: Basics of Cloud Computing –Lecture 3 Introduction to MapReducekodu.ut.ee › ~srirama › cloud › L3_MapReduce.pdf · Basics of Cloud Computing –Lecture 3 Introduction to MapReduce](https://reader034.fdocuments.in/reader034/viewer/2022042406/5f2034031262b463af7747b4/html5/thumbnails/23.jpg)
MapReduce can refer to…
• The programming model
• The execution framework (aka “runtime”)
• The specific implementation
Usage is usually clear from context!
19.04.2010 23Satish Srirama
![Page 24: Basics of Cloud Computing –Lecture 3 Introduction to MapReducekodu.ut.ee › ~srirama › cloud › L3_MapReduce.pdf · Basics of Cloud Computing –Lecture 3 Introduction to MapReduce](https://reader034.fdocuments.in/reader034/viewer/2022042406/5f2034031262b463af7747b4/html5/thumbnails/24.jpg)
MapReduce Implementations
• Google has a proprietary implementation in C++
– Bindings in Java, Python
• Hadoop is an open-source implementation in Java
– Development led by Yahoo, used in production– Development led by Yahoo, used in production
– Now an Apache project
– Rapidly expanding software ecosystem, but still lots of
room for improvement (e.g., OSDI 2008, Nexus)
• Lots of custom research implementations
– For GPUs, cell processors, etc.
19.04.2010 24Satish Srirama
![Page 25: Basics of Cloud Computing –Lecture 3 Introduction to MapReducekodu.ut.ee › ~srirama › cloud › L3_MapReduce.pdf · Basics of Cloud Computing –Lecture 3 Introduction to MapReduce](https://reader034.fdocuments.in/reader034/viewer/2022042406/5f2034031262b463af7747b4/html5/thumbnails/25.jpg)
Cloud Computing Storage, or how do
we get data to the workers?NAS
Compute Nodes
SAN
What’s the problem here?
19.04.2010 25Satish Srirama
![Page 26: Basics of Cloud Computing –Lecture 3 Introduction to MapReducekodu.ut.ee › ~srirama › cloud › L3_MapReduce.pdf · Basics of Cloud Computing –Lecture 3 Introduction to MapReduce](https://reader034.fdocuments.in/reader034/viewer/2022042406/5f2034031262b463af7747b4/html5/thumbnails/26.jpg)
Distributed File System
• Don’t move data to workers… move workers to the data!– Store data on the local disks of nodes in the cluster
– Start up the workers on the node that has the data local
• Why?• Why?– Network bisection bandwidth is limited
– Not enough RAM to hold all the data in memory
– Disk access is slow, but disk throughput is reasonable
• A distributed file system is the answer– GFS (Google File System) for Google’s MapReduce
– HDFS (Hadoop Distributed File System) for Hadoop
19.04.2010 26Satish Srirama
![Page 27: Basics of Cloud Computing –Lecture 3 Introduction to MapReducekodu.ut.ee › ~srirama › cloud › L3_MapReduce.pdf · Basics of Cloud Computing –Lecture 3 Introduction to MapReduce](https://reader034.fdocuments.in/reader034/viewer/2022042406/5f2034031262b463af7747b4/html5/thumbnails/27.jpg)
GFS: Assumptions
• Choose commodity hardware over “exotic” hardware
– Scale “out”, not “up”
• High component failure rates
– Inexpensive commodity components fail all the time
• “Modest” number of huge files• “Modest” number of huge files
– Multi-gigabyte files are common, if not encouraged
• Files are write-once, mostly appended to
– Perhaps concurrently
• Large streaming reads over random access
– High sustained throughput over low latency
GFS slides adapted from material by (Ghemawat et al., SOSP 2003)
19.04.2010 27Satish Srirama
![Page 28: Basics of Cloud Computing –Lecture 3 Introduction to MapReducekodu.ut.ee › ~srirama › cloud › L3_MapReduce.pdf · Basics of Cloud Computing –Lecture 3 Introduction to MapReduce](https://reader034.fdocuments.in/reader034/viewer/2022042406/5f2034031262b463af7747b4/html5/thumbnails/28.jpg)
GFS: Design Decisions
• Files stored as chunks
– Fixed size (64MB)
• Reliability through replication
– Each chunk replicated across 3+ chunkservers
• Single master to coordinate access, keep metadata• Single master to coordinate access, keep metadata
– Simple centralized management
• No data caching
– Little benefit due to large datasets, streaming reads
• Simplify the API
– Push some of the issues onto the client (e.g., data layout)
HDFS = GFS clone (same basic ideas implemented in Java)
19.04.2010 28Satish Srirama
![Page 29: Basics of Cloud Computing –Lecture 3 Introduction to MapReducekodu.ut.ee › ~srirama › cloud › L3_MapReduce.pdf · Basics of Cloud Computing –Lecture 3 Introduction to MapReduce](https://reader034.fdocuments.in/reader034/viewer/2022042406/5f2034031262b463af7747b4/html5/thumbnails/29.jpg)
From GFS to HDFS
• Terminology differences:
– GFS master = Hadoop namenode
– GFS chunkservers = Hadoop datanodes
• Functional differences:• Functional differences:
– No file appends in HDFS (planned feature)
– HDFS performance is (likely) slower
For the most part, we’ll use the Hadoop terminology…
19.04.2010 29Satish Srirama
![Page 30: Basics of Cloud Computing –Lecture 3 Introduction to MapReducekodu.ut.ee › ~srirama › cloud › L3_MapReduce.pdf · Basics of Cloud Computing –Lecture 3 Introduction to MapReduce](https://reader034.fdocuments.in/reader034/viewer/2022042406/5f2034031262b463af7747b4/html5/thumbnails/30.jpg)
(file name, block id)
(block id, block location)
HDFS namenode
File namespace/foo/bar
block 3df2
Application
HDFS Client
HDFS Architecture
Adapted from (Ghemawat et al., SOSP 2003)
instructions to datanode
datanode state(block id, byte range)
block data
HDFS datanode
Linux file system
…
HDFS datanode
Linux file system
…
19.04.2010 30Satish Srirama
![Page 31: Basics of Cloud Computing –Lecture 3 Introduction to MapReducekodu.ut.ee › ~srirama › cloud › L3_MapReduce.pdf · Basics of Cloud Computing –Lecture 3 Introduction to MapReduce](https://reader034.fdocuments.in/reader034/viewer/2022042406/5f2034031262b463af7747b4/html5/thumbnails/31.jpg)
Namenode Responsibilities
• Managing the file system namespace:
– Holds file/directory structure, metadata, file-to-block mapping, access permissions, etc.
• Coordinating file operations:
– Directs clients to datanodes for reads and writes– Directs clients to datanodes for reads and writes
– No data is moved through the namenode
• Maintaining overall health:
– Periodic communication with the datanodes
– Block re-replication and rebalancing
– Garbage collection
19.04.2010 31Satish Srirama
![Page 32: Basics of Cloud Computing –Lecture 3 Introduction to MapReducekodu.ut.ee › ~srirama › cloud › L3_MapReduce.pdf · Basics of Cloud Computing –Lecture 3 Introduction to MapReduce](https://reader034.fdocuments.in/reader034/viewer/2022042406/5f2034031262b463af7747b4/html5/thumbnails/32.jpg)
Putting everything together…
namenode
namenode daemon
job submission node
jobtracker
datanode daemon
Linux file system
…
tasktracker
slave node
datanode daemon
Linux file system
…
tasktracker
slave node
datanode daemon
Linux file system
…
tasktracker
slave node
19.04.2010 32Satish Srirama
![Page 33: Basics of Cloud Computing –Lecture 3 Introduction to MapReducekodu.ut.ee › ~srirama › cloud › L3_MapReduce.pdf · Basics of Cloud Computing –Lecture 3 Introduction to MapReduce](https://reader034.fdocuments.in/reader034/viewer/2022042406/5f2034031262b463af7747b4/html5/thumbnails/33.jpg)
MapReduce/GFS Summary
• Simple, but powerful programming model
• Scales to handle petabyte+ workloads
– Google: six hours and two minutes to sort 1PB (10
trillion 100-byte records) on 4,000 computerstrillion 100-byte records) on 4,000 computers
– Yahoo!: 16.25 hours to sort 1PB on 3,800 computers
• Incremental performance improvement with
more nodes
• Seamlessly handles failures, but possibly with
performance penalties
19.04.2010 33Satish Srirama
![Page 34: Basics of Cloud Computing –Lecture 3 Introduction to MapReducekodu.ut.ee › ~srirama › cloud › L3_MapReduce.pdf · Basics of Cloud Computing –Lecture 3 Introduction to MapReduce](https://reader034.fdocuments.in/reader034/viewer/2022042406/5f2034031262b463af7747b4/html5/thumbnails/34.jpg)
Next Lectures
• Deeper look at Hadoop
• MapReduce in different domains
• Let us have a look at some algorithms
19.04.2010 Satish Srirama 34
![Page 35: Basics of Cloud Computing –Lecture 3 Introduction to MapReducekodu.ut.ee › ~srirama › cloud › L3_MapReduce.pdf · Basics of Cloud Computing –Lecture 3 Introduction to MapReduce](https://reader034.fdocuments.in/reader034/viewer/2022042406/5f2034031262b463af7747b4/html5/thumbnails/35.jpg)
References
• Hadoop wiki http://wiki.apache.org/hadoop/
• Cloudera – Hadoop training http://www.cloudera.com/developers/learn-hadoop/training/
• J. Dean and S. Ghemawat, “MapReduce: • J. Dean and S. Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters”, OSDI'04: Sixth Symposium on Operating System Design and Implementation, Dec, 2004.
• Todo:– Work with SciCloud Hadoop setup and Cloudera
virtual machine
19.04.2010 Satish Srirama 35