MapReduce - USTH Moodle
What Why How
MapReduce
Tran Giang Son, [email protected]
ICT Department, USTH
MapReduce Tran Giang Son, [email protected] 1 / 44
What?
• A simple programming model that applies to many large-scale computing problems
  • Parallel computation
  • Workload distribution
  • Load balancing
  • Fault tolerance
• Not a language
• Not a library
What?
• Example: count the number of students inside the USTH building at the moment
  • Traditional way?
  • Smart way?
  • Smarter way?
What?
• Traditional way:
  • Place a counting table at the parking entrance of USTH
  • Announce to everyone to go down there, make a queue, and be counted
• Problem: slow, bottleneck at the counting table
What?
• Smart way: place counting tables at every possible exit of the USTH building
  • Emergency exit near the museum
  • Parking exits at the ground floor (2)
  • Stair exits on the second floor (2)
• Hit the fire alarm
• Wait and count
• Still a bottleneck at the counting tables
What?
• Smarter way:
  • Come to each classroom
  • Ask the class monitor to count
  • Aggregate the results in a second step
• Less intrusive, more work done, can be better parallelized
What?
• Two operations
  • map(): “one to one” transform of each element in a set
    $\mathrm{map}_f\,S = \{\, f(x) \mid x \in S \,\}$
  • reduce(): “many to one” transform of a set of elements
    $\mathrm{reduce}_f\,S = f(\{\, x \mid x \in S \,\})$
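As a rough illustration of the two definitions above, the following Python sketch uses the built-in `map` and `functools.reduce`. The list `S` and the functions `square` and `add` are made-up examples, not part of the slides.

```python
from functools import reduce

# Hypothetical example data; any collection of elements would do.
S = [1, 2, 3, 4]

def square(x):
    """One-to-one transform, applied to each element independently."""
    return x * x

def add(acc, x):
    """Pairwise combiner used to fold many elements into one value."""
    return acc + x

# map: one output element per input element.
mapped = list(map(square, S))

# reduce: the whole collection collapses into a single value.
total = reduce(add, mapped, 0)

print(mapped)
print(total)
```

Because `square` looks at one element at a time, the map step can be split across workers freely; the reduce step is where results have to come together.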
map()
• Pre-map()
  • Reads data from source
• Transform
Data explosion
• A lot of data
  • 130+ trillion webpages (2016)
  • 20 KB each
  • 2,600,000+ TB
Data explosion
• Hard drive: 100 MB/s sequential read
  • ~824 years to read on a single drive
• SSD
  • SATA3, 500 MB/s sequential read: ~165 years
  • M.2, 3500 MB/s sequential read: ~23.5 years
• Processing this data
  • Sorting / Searching / Indexing / Classification
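The read times can be rechecked with a few lines of arithmetic. This is a back-of-the-envelope sketch that assumes decimal units (1 KB = 10^3 bytes, 1 MB = 10^6 bytes, 1 TB = 10^12 bytes), a 365-day year, and a single device reading sequentially; none of these assumptions are stated on the slides. It prints the data volume in TB and the single-device read time in years for each throughput.

```python
# Assumptions (not from the slides): decimal units and a 365-day year.
PAGES = 130e12                      # 130 trillion webpages
PAGE_SIZE_BYTES = 20e3              # 20 KB each
SECONDS_PER_YEAR = 365 * 24 * 3600

total_bytes = PAGES * PAGE_SIZE_BYTES
total_tb = total_bytes / 1e12

def years_to_read(throughput_mb_per_s):
    """Time for one device to read the whole data set sequentially, in years."""
    seconds = total_bytes / (throughput_mb_per_s * 1e6)
    return seconds / SECONDS_PER_YEAR

print(f"total: {total_tb:,.0f} TB")
for name, speed in [("HDD", 100), ("SATA3 SSD", 500), ("M.2 SSD", 3500)]:
    print(f"{name}: {years_to_read(speed):,.1f} years")
```

Whatever the exact unit convention, one device is far too slow, which is the motivation for spreading the reads over many machines.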
Why MapReduce?
• Traditional programming is serial
• Break processing into independent batches
• Process concurrently
• Aggregate result
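The three steps (break into batches, process concurrently, aggregate) can be sketched on one machine with Python's standard `concurrent.futures` module. The helper names `split_into_batches` and `process_batch`, the input data, and the batch size are all illustrative assumptions; threads here only stand in for the workers of a real cluster.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_batches(items, batch_size):
    """Break the input into independent, contiguous batches."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

def process_batch(batch):
    """Work done on one batch; here, a partial sum of squares."""
    return sum(x * x for x in batch)

data = list(range(1, 101))              # hypothetical input: 1..100
batches = split_into_batches(data, 25)

# Process the batches concurrently; each call only sees its own batch.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_results = list(pool.map(process_batch, batches))

# Aggregate the partial results into the final answer.
result = sum(partial_results)
print(result)
```

In standard CPython, threads do not speed up CPU-bound work like this because of the global interpreter lock; the point of the sketch is the structure, not the speedup.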
Parallelization
• Multi-core
• Multi-CPU
• Cluster
• Grid
Parallelization
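| Key | Value |
| --- | --- |
| Name | Sunway TaihuLight |
| Nodes | 40,960 |
| CPU | SW26010, 260 cores (256 compute + 4 management) at 1.45 GHz per node |
| Cores | 10,649,600 |
| Memory | 1.31 PB (1310 TB) |
| Storage | 20 PB (20000 TB) |
| Peak | 125 PFLOPS |
| Linpack | 93.01 PFLOPS |
| Power | 15 MW |
| Location | National Supercomputer Center, Wuxi, China |
| Active | June 2016 |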
Why MapReduce?
Challenges:
• Breaking the problem into smaller tasks?
• Assigning tasks to machines?
• Partitioning and distributing data?
• Sharing intermediate data?
• Coordinating synchronization? Scheduling? Fault-tolerance?
Why MapReduce?
• Scale “out”, not scale “up”
  • E.g. more workers, not more levels of management
• Failures are common
• Process data sequentially and not randomly
Implementations
• Google
  • Internal
  • Proprietary
• Apache Hadoop MapReduce
  • Most common open source implementation
• Amazon Elastic MapReduce
  • On EC2
Who does what?
• Implement two methods
  • map(): Mapper
  • reduce(): Reducer
MapReduce architecture
[Figure: MapReduce architecture. Input splits 0 to 5 (HDFS) feed the Map phase (three Map tasks); intermediate results are kept local; Shuffle & Sort passes them to the Reduce phase (two Reduce tasks), which writes output 0 and output 1 (HDFS).]
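The three stages of the architecture (map phase, shuffle & sort, reduce phase) can be mimicked in a single process. The sketch below is only a toy model: `run_mapreduce` is a hypothetical helper, not the API of Hadoop or any other framework, and the "maximum temperature per city" job and its data are invented for illustration.

```python
from collections import defaultdict

def run_mapreduce(splits, mapper, reducer):
    """Run map, shuffle & sort, and reduce in one process (hypothetical helper)."""
    # Map phase: each split is processed independently and emits (key, value) pairs.
    intermediate = []
    for split in splits:
        intermediate.extend(mapper(split))

    # Shuffle & sort: group all values that share a key, then order the keys.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Reduce phase: one reducer call per key.
    return [(key, reducer(key, groups[key])) for key in sorted(groups)]

# Hypothetical job: maximum temperature per city.
def max_temp_mapper(split):
    for city, temp in split:
        yield city, temp

def max_temp_reducer(city, temps):
    return max(temps)

splits = [
    [("Hanoi", 31), ("Hue", 29)],
    [("Hanoi", 35), ("Dalat", 21)],
    [("Hue", 33)],
]
output = run_mapreduce(splits, max_temp_mapper, max_temp_reducer)
print(output)
```

In a real deployment each split would live on a different machine and the shuffle would move data over the network; here everything is a plain list so the data flow is easy to follow.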
Execution Framework
• The execution framework (runtime) handles everything else
  • Scheduling: who does map()? who does reduce()?
  • Data distribution: move data to processes (workers)
  • Synchronization: gathers, sorts
  • Fault-tolerance: detects failures, restarts
  • Distributed file system
Who does what?
• A “master” controls execution of “slaves”
• Mappers are put near their input block
  • Minimize network usage
• Mappers persist outputs to disk before passing them to the reducer
  • For fault tolerance
Fault Tolerance
• Task crashes
  • Retry on another node
    • map()? No dependencies
    • reduce()? Its inputs (map outputs) are saved on disk
• Important: task independence
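Retrying a crashed task on another node only works because a task depends on nothing but its input. A minimal sketch of that idea, assuming a task is a plain function of `(node, input)`: `run_with_retry`, `flaky_word_count`, and the node names are hypothetical and do not correspond to any real framework's API.

```python
def run_with_retry(task, task_input, nodes, max_attempts=3):
    """Run a task, moving to the next node whenever an attempt crashes."""
    last_error = None
    for attempt in range(max_attempts):
        node = nodes[attempt % len(nodes)]
        try:
            # The task only depends on its input, so re-running it is safe.
            return task(node, task_input)
        except RuntimeError as error:
            last_error = error
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error

def flaky_word_count(node, line):
    """Hypothetical task that always crashes on the node named 'node-a'."""
    if node == "node-a":
        raise RuntimeError(f"{node} crashed")
    return len(line.split())

result = run_with_retry(flaky_word_count, "three witches watch", ["node-a", "node-b"])
print(result)
```

If the task had side effects or depended on state held by the crashed node, a blind retry like this would not be safe.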
Fault Tolerance
• Node crashes
  • Start tasks on a new node
    • map()? restart
    • reduce()? nothing else
Fault Tolerance
• Task becomes slow
  • Launch the same task on another node
  • Use the result of whoever finishes first
  • Kill the slower one
• Popular in large clusters
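This technique is commonly called speculative execution. The sketch below imitates it with two threads standing in for two nodes; `run_task`, the node names, and the delays are invented for illustration. Note one deliberate simplification: Python threads cannot be killed, so the slower attempt is merely left to finish, whereas a real framework would terminate it.

```python
import time
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

def run_task(node, delay_seconds, value):
    """Hypothetical task: same work on every node, with a per-node delay."""
    time.sleep(delay_seconds)
    return node, value * 2

with ThreadPoolExecutor(max_workers=2) as pool:
    # Launch the same task on two "nodes"; node-a simulates a straggler.
    attempts = [
        pool.submit(run_task, "node-a", 0.5, 21),
        pool.submit(run_task, "node-b", 0.05, 21),
    ]
    # Take the result of whichever attempt finishes first.
    done, pending = wait(attempts, return_when=FIRST_COMPLETED)
    winner_node, result = next(iter(done)).result()

    # Best effort only: cancel() stops attempts that have not started yet.
    for future in pending:
        future.cancel()

print(winner_node, result)
```

Both attempts compute the same answer, so it does not matter which one wins; this again relies on tasks being independent and repeatable.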
Extras
• Extra optional supporting functions
  • partition(): divide key space for parallelization
  • combine(): mini reducers to combine after map
• Barriers
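A small sketch of what `partition()` and `combine()` might look like for a word-count job. The function bodies are assumptions made for illustration: real frameworks define their own signatures, and CRC32 is just one convenient stable hash.

```python
from collections import Counter
import zlib

def partition(key, num_reducers):
    """Assign a key to a reducer; every pair with the same key lands together."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def combine(pairs):
    """Mini reducer: pre-sum the counts emitted by a single mapper."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return list(totals.items())

# Output of one hypothetical mapper for the line "three witches watch three".
mapper_output = [("three", 1), ("witches", 1), ("watch", 1), ("three", 1)]
combined = combine(mapper_output)

# Route each combined pair to one of two reducers.
num_reducers = 2
buckets = {r: [] for r in range(num_reducers)}
for key, count in combined:
    buckets[partition(key, num_reducers)].append((key, count))

print(combined)
print(buckets)
```

Python's built-in `hash()` is avoided on purpose: string hashes are randomized per process by default, so two workers could disagree on where a key belongs. Combining before the shuffle reduces the number of pairs sent over the network.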
Example: Word Count
• The classic example for MapReduce
  • Input: a large text file
  • Output: number of occurrences of each word
Example: Word Count
• map(): count occurrences of each word in a single line
1. three witches watch three swatch watches
<three, 1> <witches, 1> <watch, 1> <three, 1> <swatch, 1> <watches, 1>
Example: Word Count
• map(): count occurrences of each word in a single line
2. which witch watches which swatch watch
<which, 1> <witch, 1> <watches, 1> <which, 1> <swatch, 1> <watch, 1>
Example: Word Count
• map(): count occurrences of each word in a single line
<three, 1> <witches, 1> <watch, 1> <three, 1> <swatch, 1> <watches, 1> <which, 1> <witch, 1> <watches, 1> <which, 1> <swatch, 1> <watch, 1>
Example: Word Count
• Group pairs that have the same key K
<three, 1> <witches, 1> <watch, 1> <three, 1> <swatch, 1> <watches, 1> <which, 1> <witch, 1> <watches, 1> <which, 1> <swatch, 1> <watch, 1>
<three, 1> <three, 1> <witches, 1> <watch, 1> <watch, 1> <swatch, 1> <swatch, 1> <watches, 1> <watches, 1> <which, 1> <which, 1> <witch, 1>
Example: Word Count
• reduce(): combine the occurrences of each word
<three, 1> <three, 1> <witches, 1> <watch, 1> <watch, 1> <swatch, 1> <swatch, 1> <watches, 1> <watches, 1> <which, 1> <which, 1> <witch, 1>
<three, 2> <witches, 1> <watch, 2> <swatch, 2> <watches, 2> <which, 2> <witch, 1>
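The whole word-count walk-through (map, group by key, reduce) fits in a few lines of plain Python. This is a single-process sketch of the data flow on the two example lines, not the code of any MapReduce framework; the function names `map_line`, `group_by_key`, and `reduce_counts` are made up for this example.

```python
from collections import defaultdict

lines = [
    "three witches watch three swatch watches",
    "which witch watches which swatch watch",
]

def map_line(line):
    """map(): emit <word, 1> for every word in one line."""
    return [(word, 1) for word in line.split()]

def group_by_key(pairs):
    """Shuffle step: collect the values of pairs that share a key."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups

def reduce_counts(word, counts):
    """reduce(): sum the occurrences recorded for one word."""
    return word, sum(counts)

pairs = [pair for line in lines for pair in map_line(line)]
groups = group_by_key(pairs)
word_counts = dict(reduce_counts(word, counts) for word, counts in groups.items())
print(word_counts)
```

Each call to `map_line` touches a single line and each call to `reduce_counts` touches a single word, which is what lets a framework run them on different machines.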
Example: Word Count
1. three witches watch three swatch watches
2. which witch watches which swatch watch
<three, 1> <witches, 1> <watch, 1> <three, 1> <swatch, 1> <watches, 1>
<which, 1> <witch, 1> <watches, 1> <which, 1> <swatch, 1> <watch, 1>
<three, 1> <three, 1> <witches, 1> <watch, 1> <watch, 1> <swatch, 1> <swatch, 1> <watches, 1> <watches, 1> <which, 1> <which, 1> <witch, 1>
<three, 2> <witches, 1> <watch, 2> <swatch, 2> <watches, 2> <which, 2> <witch, 1>
Example: Word Count Extra
three swiss witch-bitches, which wished to be switched swiss witch-bitches, watch three swiss swatch watch switches. which swiss witch-bitch, which wishes to be a switched witch-bitch, wishes to watch which swiss swatch watch switch?
<swiss, 5> <witch, 4> <watch, 4> <three, 2> <bitches, 2> <switched, 2> <swatch, 2> <bitch, 2> <wishes, 2> <wished, 1>
Practical work 4: Word Count
• Create a new directory named «WordCount»
• Use any MapReduce framework of your choice to implement the Word Count example
  • Java is OK
  • C/C++ is still preferred
    • No MapReduce framework for C/C++ at the moment
    • Invent one yourself
Practical work 4: Word Count
• Write a short report in LaTeX:
  • Name it « 04.word.count.tex »
  • Why you chose your specific MapReduce implementation
  • How your Mapper and Reducer work. Figure.
  • Who does what.
• Work in your group, in parallel
• Push your report to the corresponding forked GitHub repository
Practical work 5: The Longest Path
• Use any MapReduce framework of your choice to implement the LongestPath toy project
  • Input: set of files, one for each of your laptops
    • Each line contains one full path of a file
    • find /
  • Output: longest path(s)
• Write a short report in LaTeX:
  • Name it « 05.word.count.tex »
  • How your Mapper and Reducer work. Figure.
  • Who does what.
• Work in your group, in parallel
• Push your report to the corresponding forked GitHub repository