MapReduce
CS-4513 Distributed Computing Systems, D-term 2008
(Slides include materials from Operating System Concepts, 7th ed., by Silberschatz, Galvin, & Gagne; Distributed Systems: Principles & Paradigms, 2nd ed., by Tanenbaum and Van Steen; and Modern Operating Systems, 2nd ed., by Tanenbaum)
Why MapReduce
• An important new model of parallel and distributed computing
• Particularly for problems dealing with “big data”
• An abstraction to automate the mechanics of data handling and to let the programmer concentrate on semantics of the problem
From Operating System course
• Three fundamental models of parallel computing
  – Data Parallelism
  – Task Parallelism
  – Pipelined Parallelism
• Each requires a different set of tools
• Each requires a different mode of thinking
MapReduce
• A new model
• Fundamentally different from previous models
• Shares some elements with each one
• Promise (hope?) of solving new classes of problems that were previously very tedious to solve
• Not in textbooks
• Not in previous Distributed Systems courses at WPI
Learning about MapReduce
• Partition class into four teams
• Each team responsible for understanding and teaching the rest of the class about one subtopic
• 30-40 minutes of class time per team
• Two teams on April 4
• Two teams on April 8
MapReduce subtopics
• The abstraction itself and its algorithms
• Distributed MapReduce
• Class of problems that MapReduce can help solve
• Google File System to support MapReduce
MapReduce abstraction
• Explain the abstraction, what it does, etc.
• Explain the algorithms
• Show non-trivial programming examples
• Focus on how to think about a problem
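As a starting point for thinking about the abstraction, here is a minimal single-process sketch of word counting, the usual introductory example. The names `map_fn`, `reduce_fn`, and `map_reduce` are hypothetical and chosen only for illustration; they are not taken from any particular framework, and the driver runs in one process where a real system would spread the work over many machines.

```python
from collections import defaultdict

def map_fn(doc_name, text):
    """Map: emit an intermediate (word, 1) pair for every word in one document."""
    for word in text.lower().split():
        yield word, 1

def reduce_fn(word, counts):
    """Reduce: combine all values emitted for one key into a single result."""
    return word, sum(counts)

def map_reduce(inputs, map_fn, reduce_fn):
    """Hypothetical driver: run map, group values by key, then run reduce."""
    groups = defaultdict(list)
    for key, value in inputs:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)  # "shuffle": group values by key
    return dict(reduce_fn(k, v) for k, v in sorted(groups.items()))

docs = [("a.txt", "the quick brown fox"), ("b.txt", "the lazy dog the end")]
word_counts = map_reduce(docs, map_fn, reduce_fn)
print(word_counts["the"])
```

The programmer supplies only `map_fn` and `reduce_fn`; the grouping step in between is the part the abstraction takes over.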
Distributed MapReduce
• Show how it is naturally distributable and scalable
• Up to terabytes of data and more
• Show how mechanics of distribution and parallelization are automated
• Focus on
  – Performance, Reliability, Fault-tolerance, Failure recovery
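One way to picture the automated mechanics is the rough sketch below, which makes several simplifying assumptions: threads stand in for worker machines, a CRC32 hash assigns keys to reduce partitions, and failure recovery is reduced to re-running a failed task. The names `NUM_REDUCERS`, `partition`, `run_with_retry`, and `distributed_word_count` are hypothetical and do not describe how any real implementation is built.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import zlib

NUM_REDUCERS = 3  # assumed number of reduce partitions

def partition(key, num_reducers=NUM_REDUCERS):
    """Assign a key to a reduce partition with a deterministic hash."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def map_task(split):
    """Emit (word, 1) for one input split, bucketed by reduce partition."""
    buckets = defaultdict(list)
    for word in split.split():
        buckets[partition(word)].append((word, 1))
    return buckets

def reduce_task(pairs):
    """Sum the counts for every word routed to one reduce partition."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

def run_with_retry(task, arg, attempts=3):
    """Re-execute a failed task: a simplified stand-in for failure recovery."""
    for attempt in range(attempts):
        try:
            return task(arg)
        except Exception:
            if attempt == attempts - 1:
                raise

def distributed_word_count(splits):
    with ThreadPoolExecutor(max_workers=4) as pool:  # threads play the workers
        map_outputs = list(pool.map(lambda s: run_with_retry(map_task, s), splits))
        # Shuffle: route each bucket to the reducer that owns its partition.
        per_reducer = defaultdict(list)
        for buckets in map_outputs:
            for r, pairs in buckets.items():
                per_reducer[r].extend(pairs)
        reduce_outputs = list(
            pool.map(lambda p: run_with_retry(reduce_task, p), per_reducer.values())
        )
    result = {}
    for part in reduce_outputs:
        result.update(part)
    return result

splits = ["a b a", "b c", "a c c"]
totals = distributed_word_count(splits)
```

Because every occurrence of a word hashes to the same partition, each reducer can sum its words independently, which is what lets the scheduling, routing, and retries stay out of the user's map and reduce code.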
Classes of problems
• Identify classes of problems on which to use MapReduce
• Characterize them
• Why were they difficult before?
• Why are people so excited about MapReduce?
• Why did Google rewrite 10,000 existing programs in MapReduce form?
Google File System
• What is so special about it?
• How is it different from traditional file systems?
• How does it help MapReduce?
• Focus on
  – Performance, Reliability, Fault-tolerance, Failure recovery
Action items today
• Form teams (one for each subtopic)
• Roster to professor
• Get organized to
  – Do reading
  – Prepare topic
References
• See e-mails
• See course web page