
MapReduce: Simplified Data Processing on Large Clusters

Jeffrey Dean and Sanjay Ghemawat (OSDI '04)

Seong Hoon Seo, Hyunji Choi
December 1st, 2020

Distributed Systems, 2020 Fall

Contents


● Introduction and Motivation

● Programming Model

● Execution Flow

● Implementation

● Details and Refinements

● Performance

● Experience

● Conclusion

Introduction and Motivation

● Computation at Google: derived data = F(large raw data)

○ Input: crawled documents, web request logs

○ Output: inverted indices, set of most frequent queries

● Example: Inverted Index

Source: Lucidworks, https://www.slideshare.net/erikhatcher/introduction-to-solr-9213241

Introduction and Motivation

● Characteristics of the computation

○ Conceptually straightforward

○ Distributed computation is necessary

○ Complex implementation in a distributed environment

● Challenges of Distributed Computation

○ Parallelization

○ Fault-tolerance

○ Data distribution

○ Load balancing

Introduction and Motivation

Solution: MapReduce Programming Model

● Interface

○ Enables automatic parallelization and distribution

● Implementation

○ Resolves the challenges of distributed computation and achieves performance

Programming Model

● Input and Output: Set of key/value pairs (i.e., (k, v))

● Map: (k1, v1) → list of (k2, v2)

● Reduce: (k2, list of (v2)) → list of (v2)

*k1, k2, v1, v2 are types (e.g., Int, String)

● Implementation Details

○ How intermediate values associated with a given key are grouped
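As a hedged illustration of these signatures, here is a minimal sequential sketch in Python (not the paper's C++ interface; the names are ours) showing how the library groups intermediate values by key before calling Reduce:

    from collections import defaultdict

    def run_mapreduce(map_fn, reduce_fn, inputs):
        """Sequential reference semantics: map, group values by key, reduce."""
        intermediate = defaultdict(list)
        # Map: each (k1, v1) produces a list of (k2, v2) pairs.
        for k1, v1 in inputs:
            for k2, v2 in map_fn(k1, v1):
                intermediate[k2].append(v2)
        # Reduce: each unique k2 sees the full list of its v2 values.
        return {k2: reduce_fn(k2, values) for k2, values in intermediate.items()}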

Programming Model

Map: (k1, v1) → list of (k2, v2)

Reduce: (k2, list of (v2)) → list of (v2)

● Example 1: Word Count

○ Map: (document name, contents) → list of (word, 1)

○ Reduce: (word, list of (“1”)) → (word, Count)
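A hedged Python version of the two user-supplied word-count functions (the paper's pseudocode is C++-like; the tiny local driver below stands in for the distributed library):

    from collections import defaultdict

    def wordcount_map(document_name, contents):
        # Emit (word, 1) for every word in the document.
        return [(word, 1) for word in contents.split()]

    def wordcount_reduce(word, counts):
        # Sum the partial counts collected for one word.
        return sum(counts)

    intermediate = defaultdict(list)
    for word, one in wordcount_map("doc1", "a red dog and a red cat"):
        intermediate[word].append(one)
    print({w: wordcount_reduce(w, vs) for w, vs in intermediate.items()})
    # {'a': 2, 'red': 2, 'dog': 1, 'and': 1, 'cat': 1}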

Programming Model

Map: (k1, v1) → list of (k2, v2)

Reduce: (k2, list of (v2)) → list of (v2)

● Example 2: Inverted Index

○ Map: (document, words) → list of (word, document ID)

○ Reduce: (word, list of (document IDs)) → (word, sorted list of (document IDs))
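And a comparable hedged sketch for the inverted index (function and variable names are illustrative):

    from collections import defaultdict

    def index_map(doc_id, contents):
        # Emit (word, document ID) for every word occurrence.
        return [(word, doc_id) for word in contents.split()]

    def index_reduce(word, doc_ids):
        # Sort (and deduplicate) the document IDs for one word.
        return sorted(set(doc_ids))

    intermediate = defaultdict(list)
    for doc_id, text in [(1, "red dog"), (2, "blue dog")]:
        for word, d in index_map(doc_id, text):
            intermediate[word].append(d)
    print({w: index_reduce(w, ds) for w, ds in intermediate.items()})
    # {'red': [1], 'dog': [1, 2], 'blue': [2]}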

Execution Flow

# of Map tasks: M = 5

# of Reduce tasks: R = 2

Execution Flow

● Step 1: Input Split

○ M pieces, usually 16 ~ 64 MB per piece (configurable)

○ Each piece corresponds to a “map task”

Execution Flow

● Step 2: Master and Worker Generation

○ One copy of the program becomes the single master node; the rest become workers

Execution Flow

● Two types of tasks

○ M pieces → M map tasks

○ Intermediate key space partitioned into R pieces → R reduce tasks

■ e.g., hash(key) mod R

Execution Flow

● Step 3: Map Phase

○ Parse key/value pairs out of the input split and pass them to the user's Map function

○ Intermediate key/value pairs buffered in memory

Execution Flow

● Step 4: Periodic Store

○ Buffered pairs written to local disk

○ The written pairs are partitioned into R regions on the local disk

○ The locations of these regions are passed back to the master
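A hedged Python sketch of this step; R and the hash-mod-R partitioning come from the paper, while the buffering, file naming, and JSON format are purely illustrative:

    import json
    import zlib

    R = 2  # number of reduce tasks (intermediate partitions)

    def partition(key):
        # Deterministic default partitioning: hash(key) mod R.
        return zlib.crc32(key.encode()) % R

    def spill_to_disk(map_task_id, buffered_pairs):
        """Write buffered intermediate pairs into R region files on local disk."""
        regions = [[] for _ in range(R)]
        for key, value in buffered_pairs:
            regions[partition(key)].append([key, value])
        locations = []
        for r, pairs in enumerate(regions):
            path = f"/tmp/mr-{map_task_id}-{r}.json"   # illustrative naming scheme
            with open(path, "w") as f:
                json.dump(pairs, f)
            locations.append(path)
        return locations  # these locations are what gets reported to the master

    print(spill_to_disk(0, [("a", 1), ("red", 1), ("dog", 1)]))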

Execution Flow

● Step 5: Reduce Phase - Read

○ The master notifies reduce workers of the locations of the buffered pairs

○ Reduce workers use remote procedure calls (RPCs) to read the data from the map workers' local disks

Execution Flow

● Step 6: Reduce Phase - Process

○ Sort and group the data by intermediate key

○ Apply the Reduce function to each unique key and its list of values

○ Append the results to the final output file for this reduce partition
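A hedged Python sketch of the reduce-side processing, assuming the intermediate pairs have already been fetched into memory (the paper notes an external sort is used when the data is too large to fit in memory):

    from itertools import groupby
    from operator import itemgetter

    def run_reduce_task(fetched_pairs, reduce_fn, output_path):
        """Sort by intermediate key, group, apply reduce_fn, append to the output file."""
        fetched_pairs.sort(key=itemgetter(0))            # sort by intermediate key
        with open(output_path, "a") as out:              # append to this partition's output
            for key, group in groupby(fetched_pairs, key=itemgetter(0)):
                values = [value for _, value in group]
                out.write(f"{key}\t{reduce_fn(key, values)}\n")

    run_reduce_task([("dog", 1), ("a", 1), ("a", 1)], lambda k, vs: sum(vs), "/tmp/mr-out-0")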

Implementation

A. Master Data Structures

● State of each map and reduce task (idle / in-progress / completed)

○ Assigned worker node identity (for non-idle tasks)

● Location and size of intermediate file regions for each map task

B. Task Granularity

● Factors to Consider

○ Scheduling decisions: the master makes O(M + R) of them

○ Master state: O(M * R) state kept in memory (roughly one entry per map/reduce task pair)

○ User preference for the number of output files (R)
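A hedged sketch of the master's bookkeeping as Python dataclasses (class and field names are ours, not Google's):

    from dataclasses import dataclass, field
    from enum import Enum, auto
    from typing import List, Optional, Tuple

    class State(Enum):
        IDLE = auto()
        IN_PROGRESS = auto()
        COMPLETED = auto()

    @dataclass
    class Task:
        state: State = State.IDLE
        worker: Optional[str] = None   # identity of the assigned worker (non-idle tasks)

    @dataclass
    class MapTask(Task):
        # Location and size of each of the R intermediate file regions,
        # recorded when the map task completes: O(M * R) state overall.
        regions: List[Tuple[str, int]] = field(default_factory=list)

    @dataclass
    class Master:
        map_tasks: List[MapTask] = field(default_factory=list)   # M entries
        reduce_tasks: List[Task] = field(default_factory=list)   # R entries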

Implementation

C. Fault tolerance

1. Worker Failure

● Detection: periodic ping

● Recovery: Reset task to idle and reassign

2. Master Failure

● Abort and retry the entire MapReduce operation (failure of the single master is unlikely)

● Alternative: have the master write periodic checkpoints of its data structures so a new copy can resume

Reset required?   Map task   Reduce task
In-progress       O          O
Completed         O          X

(O) Intermediate pairs stored on the local disk of the failed machine are no longer accessible.

(X) The output of a completed Reduce task is stored in a global file system, so it survives the failure.
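A hedged, self-contained Python sketch of the reset rule in the table above (the task representation and names are illustrative):

    IDLE, IN_PROGRESS, COMPLETED = "idle", "in-progress", "completed"

    def handle_worker_failure(tasks, failed_worker):
        """Reset tasks affected by a worker that stopped answering pings."""
        for task in tasks:
            if task["worker"] != failed_worker:
                continue
            # Completed map tasks must be redone: their intermediate output lived on
            # the failed machine's local disk. Completed reduce output is in the
            # global file system and stays readable, so it is left alone.
            needs_reset = (task["state"] == IN_PROGRESS or
                           (task["state"] == COMPLETED and task["kind"] == "map"))
            if needs_reset:
                task["state"], task["worker"] = IDLE, None

    cluster = [{"kind": "map", "state": COMPLETED, "worker": "w1"},
               {"kind": "reduce", "state": COMPLETED, "worker": "w1"},
               {"kind": "reduce", "state": IN_PROGRESS, "worker": "w1"}]
    handle_worker_failure(cluster, "w1")
    print([t["state"] for t in cluster])   # ['idle', 'completed', 'idle']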

Implementation Details - Locality

● Locality Optimization

○ GFS divides each file into 64 MB blocks and stores several copies (typically three replicas) on different machines.

○ The master attempts to schedule a map task on a machine that contains a replica of the corresponding input data, or near one (e.g., on the same network switch).

○ Conserve network bandwidth.
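A hedged sketch of locality-preferring assignment (Python; the replica set and the switch-topology callback are illustrative stand-ins for GFS metadata and cluster topology):

    def pick_worker_for_map_task(idle_workers, replica_hosts, same_switch):
        """Prefer a worker that stores a replica of the split, then one nearby."""
        local = [w for w in idle_workers if w in replica_hosts]
        if local:
            return local[0]
        # Otherwise prefer a worker near a replica, e.g., behind the same network switch.
        nearby = [w for w in idle_workers
                  if any(same_switch(w, h) for h in replica_hosts)]
        if nearby:
            return nearby[0]
        return idle_workers[0] if idle_workers else None

    print(pick_worker_for_map_task(
        idle_workers=["rack1-w3", "rack2-w1"],
        replica_hosts={"rack2-w5", "rack3-w9"},
        same_switch=lambda a, b: a.split("-")[0] == b.split("-")[0],  # toy topology
    ))  # -> 'rack2-w1' (same rack/switch as a replica host)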

Implementation Details - Backup Tasks

● Problem: “Straggler” workers

○ Workers that take an unusually long time to complete one of the last few tasks

● Solution: schedule "backup" executions of the remaining in-progress tasks (see the sketch after this list)

○ When a MapReduce operation is close to completion

● Gain: significant on large operations

○ The sort benchmark takes 44% longer without backup tasks
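A hedged Python sketch of the backup-task policy (the 95% completion threshold is an assumption for illustration; the paper only says backups are scheduled when the operation is close to completion):

    def maybe_schedule_backups(tasks, idle_workers, completion_threshold=0.95):
        """When almost done, launch duplicate executions of still-in-progress tasks."""
        done = sum(1 for t in tasks if t["state"] == "completed")
        if done / len(tasks) < completion_threshold:
            return []                                    # not close enough to completion
        backups = []
        for task in tasks:
            if task["state"] == "in-progress" and idle_workers:
                backups.append((task["id"], idle_workers.pop()))
        # A task is marked completed when either its primary or its backup finishes.
        return backups

    stragglers = [{"id": i, "state": "completed"} for i in range(19)]
    stragglers.append({"id": 19, "state": "in-progress"})
    print(maybe_schedule_backups(stragglers, idle_workers=["w7"]))  # [(19, 'w7')]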

Refinements - M splits to R outputs

M = 5, R = 2

Refinements - M splits to R outputs

Input Split → Map → Reduce → Output (M = 3, R = 2)

Input: "A red dog and a blue cat and a blue dog and a red cat"

  Split 1: "A red dog and a"     → Map → (A,1) (Red,1) (Dog,1) (And,1) (A,1)
  Split 2: "Blue cat and a blue" → Map → (Blue,1) (Cat,1) (And,1) (A,1) (Blue,1)
  Split 3: "Dog and a red cat"   → Map → (Dog,1) (And,1) (A,1) (Red,1) (Cat,1)

  Reduce → Output 1: (A,4) (And,3) (Blue,2)
  Reduce → Output 2: (Cat,2) (Dog,2) (Red,2)

Refinements - M splits to R outputs

Map → Partition → Combiner → Shuffle → Sort → Reduce

Map outputs (15 k-v pairs):
  Split 1: (A,1) (Red,1) (Dog,1) (And,1) (A,1)
  Split 2: (Blue,1) (Cat,1) (And,1) (A,1) (Blue,1)
  Split 3: (Dog,1) (And,1) (A,1) (Red,1) (Cat,1)

Partition (hash(key) mod R):
  Partition 1: (A,1) (And,1) (A,1) | (Blue,1) (And,1) (A,1) (Blue,1) | (And,1) (A,1)
  Partition 2: (Red,1) (Dog,1) | (Cat,1) | (Dog,1) (Red,1) (Cat,1)

Combiner (local partial reduce, 15 → 13 k-v pairs):
  Partition 1: (A,2) (And,1) | (Blue,2) (And,1) (A,1) | (And,1) (A,1)
  Partition 2: (Red,1) (Dog,1) | (Cat,1) | (Dog,1) (Red,1) (Cat,1)

Shuffle + Sort (at the reduce workers):
  Reduce task 1: (A,2) (A,1) (A,1) (And,1) (And,1) (And,1) (Blue,2)
  Reduce task 2: (Cat,1) (Cat,1) (Dog,1) (Dog,1) (Red,1) (Red,1)

Reduce → Output:
  Output 1: (A,4) (And,3) (Blue,2)
  Output 2: (Cat,2) (Dog,2) (Red,2)

Refinements - M splits to R outputs

● Partitioning Function

○ Default: hash(key) mod R

○ Custom: e.g., hash(Hostname(urlkey)) mod R, so that all URLs from the same host end up in the same output file

● Ordering Guarantees

○ Within each partition, intermediate key/value pairs are processed in increasing key order.

● Combiner Function

○ A partial Reduce applied on the map workers before the data is sent (see the sketch after this list).

○ Reduces the amount of intermediate data and the network overhead.
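A hedged Python sketch of a custom partitioner and a combiner (urlparse stands in for the paper's Hostname(urlkey); R and the names are illustrative):

    from collections import defaultdict
    from urllib.parse import urlparse
    import zlib

    R = 4

    def default_partition(key):
        return zlib.crc32(key.encode()) % R               # hash(key) mod R

    def host_partition(url_key):
        host = urlparse(url_key).hostname or ""            # Hostname(urlkey)
        return zlib.crc32(host.encode()) % R               # same host -> same output file

    def combine_counts(pairs):
        """Combiner: partially sum (word, 1) pairs on the map worker before sending."""
        partial = defaultdict(int)
        for word, count in pairs:
            partial[word] += count
        return list(partial.items())

    print(host_partition("http://example.com/a") == host_partition("http://example.com/b"))  # True
    print(combine_counts([("a", 1), ("a", 1), ("red", 1)]))  # [('a', 2), ('red', 1)]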

Refinements - Interaction with Master

Skipping Bad Records

[Figure: each Map/Reduce worker runs the user code over records 34, 35, 36, ... with a signal handler installed. When the code crashes on a particular record (here, record 35), the handler reports that record's sequence number to the master. The master keeps a per-record failure count (record 34: 0, record 35: 2, record 36: 0) and, once it has seen more than one failure for a record, sends a skip signal so the record is omitted on re-execution.]
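A hedged Python sketch of the skipping mechanism. An exception handler stands in for the C++ signal handler, and the master's and worker's state are merged into one process purely for illustration:

    from collections import defaultdict

    failure_counts = defaultdict(int)   # master side: failures seen per record sequence number
    records_to_skip = set()

    def report_failure(record_seq):
        """Master side: after more than one failure, mark the record to be skipped."""
        failure_counts[record_seq] += 1
        if failure_counts[record_seq] > 1:
            records_to_skip.add(record_seq)

    def run_map_over_records(map_fn, records):
        """Worker side: guard each record and report crashes instead of dying silently."""
        for seq, record in enumerate(records, start=34):   # numbering mirrors the figure
            if seq in records_to_skip:
                continue                                   # known-bad record: skip it
            try:
                map_fn(record)
            except Exception:
                report_failure(seq)                        # "last gasp" report to the master
                return                                     # this attempt of the task dies here

    crash_on_poison = lambda r: 1 / 0 if r == "poison" else None
    for _ in range(3):                                     # third attempt skips record 35
        run_map_over_records(crash_on_poison, ["ok", "poison", "ok"])
    print(sorted(records_to_skip))                         # [35]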

Status Information

Counters
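In the paper, the counter facility lets user code count occurrences of events; per-worker counts are periodically propagated to the master and aggregated there. A hedged Python sketch, loosely modeled on the paper's C++ example of counting uppercase words inside the word-count Map:

    from collections import Counter

    class Counters:
        """Per-worker counter objects; totals are aggregated at the master."""
        def __init__(self):
            self.counts = Counter()
        def increment(self, name, amount=1):
            self.counts[name] += amount

    def wordcount_map(counters, contents):
        for word in contents.split():
            if word.isupper():
                counters.increment("uppercase")   # count an event of interest
            yield word, 1

    worker_counters = Counters()
    list(wordcount_map(worker_counters, "MAP reduce MAP"))
    master_totals = Counter()
    master_totals.update(worker_counters.counts)   # propagated with the worker's ping response
    print(master_totals["uppercase"])              # 2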

Refinements

● Input/Output Types

○ e.g., "text" mode input: <offset, contents of line> pairs

○ Users can define a custom reader/writer interface (see the sketch after this list).

● Side-effects

○ Produce auxiliary files as additional outputs.

● Local Execution

○ An alternative implementation sequentially executes all of the work on the local machine.

○ Makes it easy to use debugging and testing tools.
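A hedged Python sketch of a "text" mode reader that produces (offset, contents of line) pairs (the real reader interface is C++ and operates on an input split, not a whole file; everything here is illustrative):

    def text_reader(path):
        """Yield (byte offset, contents of line) pairs for a text-mode input."""
        offset = 0
        with open(path, "rb") as f:
            for raw_line in f:
                yield offset, raw_line.decode("utf-8", errors="replace").rstrip("\n")
                offset += len(raw_line)

    with open("/tmp/mr-input.txt", "w") as f:
        f.write("a red dog\na blue cat\n")
    print(list(text_reader("/tmp/mr-input.txt")))
    # [(0, 'a red dog'), (10, 'a blue cat')]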

Performance

● Cluster Configuration

○ 1800 nodes.

○ Two 2 GHz Intel Xeon processors with Hyper-Threading and 2.5-3 GB of available memory per machine.

○ 100-200 Gbps of aggregate bandwidth.

● Benchmarks

○ Grep: search for a rare pattern (~92K matching records) in 10^10 100-byte records.

○ Sort: sort 10^10 100-byte records (modeled after the TeraSort benchmark).

Performance - Grep

[Figure: aggregate input scan rate over time. The rate rises as 1764 workers are assigned and drops to zero once the read is done; the initial delay is startup overhead*.]

* Copying the program to all workers and interacting with GFS for the locality optimization.

Performance - Sort

Text line → (key, text line) → (sorted) text line

● Input rate is lower than for Grep

○ Intermediate data is larger (every text line is emitted, vs. only the matching records for Grep).

● Input > Shuffle > Output rate

○ Input rate benefits from locality optimization.

○ Output rate is low because GFS keeps two copies of each output block for reliability.

Performance - Effect of Backup Tasks

5 stragglers → 44% increase in elapsed time

Performance - Machine failures

Completed map work on the killed machines must be re-executed (their intermediate files are lost)

Only a 5% increase in total execution time

Experience in Google

● Broadly applicable including

○ Large-scale machine learning problems.

○ Extraction of properties of web pages.

● Rewrote the production indexing system using MapReduce

○ Code is simpler, hiding details regarding fault tolerance and parallelization.

○ Keep conceptually unrelated computations separate.

○ Easy to operate and scale.

Conclusion

● MapReduce programming model

○ Is easy to use.

○ A large variety of problems are easily expressible as MapReduce computations.

○ Scales to large clusters of machines.