Transcript: Memory Management for Spark (ScaDS)
Memory Management for Spark
Ken Salem, Cheriton School of Computer Science, University of Waterloo
Where I’m From
What We’re Doing

● Non-Relational Database Design
● DBMS-Managed Energy Efficiency
● Flexible Transactional Persistence
Talk Plan
● Part 1: Spark overview
○ What does Spark do?
○ Spark system architecture
○ Spark programs
○ Program execution: sessions, jobs, stages, tasks
● Part 2: Memory and Spark
○ How does Spark use memory?
● Part 3: Memory-Oriented Research
○ External caches
○ Cache sharing
○ Cache management
Michael Mior
Part 1: Spark Overview
What Is Spark?
● Popular Apache platform for scalable data processing
● Analytical programs in Scala, Java, or Python, embedding Spark parallel operators
● Higher-level frameworks for ETL (Spark SQL), graph processing (GraphX), streaming (Spark Streaming), and machine learning (MLlib)
● Data in HDFS, NoSQL stores, Hive, and others
A Spark Cluster

[Diagram: several servers, each running an Executor with multiple workers (W); all servers connect to Cluster Storage.]

● Driver: one per application; runs the sequential part and farms the parallel part out to Executors.
● Application input/output uses a cluster-wide storage system (e.g., HDFS, Cassandra, Hive).
● Local secondary storage is available for temporary/intermediate results.
A Spark Cluster

[Diagram: the same cluster, now running two applications, each with its own Driver and Executors.]

Each application has its own private driver and executors.
Spark Executors

[Diagram: an Executor containing workers (W) that share a cache.]

● Executor = JVM
● Workers in an executor share access to a cache.
● Application specifies:
○ (memory) size of executors
○ number of workers per executor
○ number of executors
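These knobs are usually set when the application is submitted. A sketch of a spark-submit invocation (the application file name and the sizes here are illustrative assumptions, not recommendations):

```shell
# Hypothetical submission: 4 executors, each a 4 GB JVM running
# 2 worker threads (cores). File and path names are made up.
spark-submit \
  --num-executors 4 \
  --executor-memory 4g \
  --executor-cores 2 \
  kmeans.py hdfs:///data/points.txt
```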
k-Means Clustering
https://elki-project.github.io/tutorial/hierarchical_clustering
A Spark Program
lines = spark.read.text(sys.argv[1]).rdd.map(lambda r: r[0])
data = lines.map(parseVector).cache()
kPoints = data.takeSample(False, K, 1)
tempDist = 1.0
while tempDist > convergeDist:
closest = data.map(lambda p: (closestPoint(p, kPoints), (p, 1)))
pointStats = closest.reduceByKey(lambda p1_c1, p2_c2:
(p1_c1[0] + p2_c2[0], p1_c1[1] + p2_c2[1]))
newPoints = pointStats.map(lambda st: (st[0], st[1][0] / st[1][1])).collect()
tempDist = sum(np.sum((kPoints[iK] - p) ** 2) for (iK, p) in newPoints)
for (iK, p) in newPoints:
kPoints[iK] = p
print("Final centers: " + str(kPoints))
Define an RDD of data points, from a file.
Each element of the RDD is a data point.
RDDs
● RDD = Resilient Distributed Dataset
○ Set of elements on which parallel operations can be performed
● RDDs can be defined from:
○ Collections defined in the application program
○ External data sources (e.g., files)
○ Other RDDs
● Spark defines operations that can be performed on RDDs.
○ RDD operations are how Spark apps expose parallelism
lines = spark.read.text(sys.argv[1]).rdd.map(lambda r: r[0])
data = lines.map(parseVector).cache()
kPoints = data.takeSample(False, K, 1)
tempDist = 1.0
while tempDist > convergeDist:
closest = data.map(lambda p: (closestPoint(p, kPoints), (p, 1)))
pointStats = closest.reduceByKey(lambda p1_c1, p2_c2:
(p1_c1[0] + p2_c2[0], p1_c1[1] + p2_c2[1]))
newPoints = pointStats.map(lambda st: (st[0], st[1][0] / st[1][1])).collect()
tempDist = sum(np.sum((kPoints[iK] - p) ** 2) for (iK, p) in newPoints)
for (iK, p) in newPoints:
kPoints[iK] = p
print("Final centers: " + str(kPoints))
Spark Actions
● takeSample is an example of a Spark action
● Actions generate ordinary program values (in this case, a Python list) from RDDs
● kPoints is the initial set of cluster centers
Spark Program Graph

[Diagram: input file → lines RDD → data RDD → takeSample action.]
A Spark Program
lines = spark.read.text(sys.argv[1]).rdd.map(lambda r: r[0])
data = lines.map(parseVector).cache()
kPoints = data.takeSample(False, K, 1)
tempDist = 1.0
while tempDist > convergeDist:
closest = data.map(lambda p: (closestPoint(p, kPoints), (p, 1)))
pointStats = closest.reduceByKey(lambda p1_c1, p2_c2:
(p1_c1[0] + p2_c2[0], p1_c1[1] + p2_c2[1]))
newPoints = pointStats.map(lambda st: (st[0], st[1][0] / st[1][1])).collect()
tempDist = sum(np.sum((kPoints[iK] - p) ** 2) for (iK, p) in newPoints)
for (iK, p) in newPoints:
kPoints[iK] = p
print("Final centers: " + str(kPoints))
map and reduceByKey are Spark transformations: they define a new RDD from an existing RDD. Transformations are evaluated lazily.

● Elements of closest: (index, (point, 1))
● Elements of pointStats: (index, (pointsum, count))
● Elements of newPoints: (index, point)
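Lazy evaluation can be illustrated with a toy model (plain Python, not Spark): transformations only record a lineage of functions, and nothing is computed until an action forces evaluation.

```python
class ToyRDD:
    """Toy stand-in for an RDD: records transformations, evaluates lazily."""
    def __init__(self, data, lineage=None):
        self._data = data
        self._lineage = lineage or []   # pending per-element functions

    def map(self, f):
        # Transformation: no work happens yet, just extend the lineage.
        return ToyRDD(self._data, self._lineage + [f])

    def collect(self):
        # Action: apply the whole recorded lineage now.
        out = list(self._data)
        for f in self._lineage:
            out = [f(x) for x in out]
        return out

rdd = ToyRDD([1, 2, 3]).map(lambda x: x + 1).map(lambda x: x * 10)
# Nothing has run yet; collect() triggers evaluation of the chain.
print(rdd.collect())  # → [20, 30, 40]
```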
Spark Program Graph

[Diagram: lines → data → takeSample; data also feeds closest → pointStats → newPoints → collect.]
A Spark Program
lines = spark.read.text(sys.argv[1]).rdd.map(lambda r: r[0])
data = lines.map(parseVector).cache()
kPoints = data.takeSample(False, K, 1)
tempDist = 1.0
while tempDist > convergeDist:
closest = data.map(lambda p: (closestPoint(p, kPoints), (p, 1)))
pointStats = closest.reduceByKey(lambda p1_c1, p2_c2:
(p1_c1[0] + p2_c2[0], p1_c1[1] + p2_c2[1]))
newPoints = pointStats.map(lambda st: (st[0], st[1][0] / st[1][1])).collect()
tempDist = sum(np.sum((kPoints[iK] - p) ** 2) for (iK, p) in newPoints)
for (iK, p) in newPoints:
kPoints[iK] = p
print("Final centers: " + str(kPoints))
● Regular Python code to compute (squared) distance between newPoints and current cluster centers (in kPoints)
● Iteration terminates when distance is small enough
Spark Program Graph

[Diagram: lines → data → takeSample, then one closest → pointStats → newPoints → collect chain for each loop iteration.]
Another Example Program
tc = spark.sparkContext.parallelize(generateGraph(), partitions).cache()
edges = tc.map(lambda x_y: (x_y[1], x_y[0]))
oldCount = 0
nextCount = tc.count()
while True:
oldCount = nextCount
# Perform the join, obtaining an RDD of (y, (z, x)) pairs,
# then project the result to obtain the new (x, z) paths.
new_edges = tc.join(edges).map(lambda __a_b: (__a_b[1][1], __a_b[1][0]))
tc = tc.union(new_edges).distinct().cache()
nextCount = tc.count()
if nextCount == oldCount:
break
print("TC has %i edges" % tc.count())
The slide annotations highlight actions (count), unary transformations (map, distinct), and binary transformations (join, union).
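The fixed point this program computes can be checked with a plain-Python version of the same join-and-union loop (a semantic sketch of what the Spark program computes, not Spark code):

```python
def transitive_closure(edges):
    """Repeatedly join known paths (x, y) with edges (y, z) to add
    paths (x, z), until no new paths appear -- the same fixed point
    the Spark program reaches."""
    tc = set(edges)
    while True:
        new_paths = {(x, z) for (x, y) in tc for (y2, z) in edges if y == y2}
        if new_paths <= tc:          # no new paths: done
            return tc
        tc |= new_paths

print(transitive_closure({(1, 2), (2, 3), (3, 4)}))
```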
Spark Jobs

[Diagram: the program graph partitioned by action: takeSample forms one job; each loop iteration’s collect forms another.]

One Spark job per action.
Spark Jobs
lines = spark.read.text(sys.argv[1]).rdd.map(lambda r: r[0])
data = lines.map(parseVector).cache()
kPoints = data.takeSample(False, K, 1)
tempDist = 1.0
while tempDist > convergeDist:
closest = data.map(lambda p: (closestPoint(p, kPoints), (p, 1)))
pointStats = closest.reduceByKey(lambda p1_c1, p2_c2:
(p1_c1[0] + p2_c2[0], p1_c1[1] + p2_c2[1]))
newPoints = pointStats.map(lambda st: (st[0], st[1][0] / st[1][1])).collect()
tempDist = sum(np.sum((kPoints[iK] - p) ** 2) for (iK, p) in newPoints)
for (iK, p) in newPoints:
kPoints[iK] = p
print("Final centers: " + str(kPoints))
Job 1: triggered by takeSample.
Jobs 2...N: triggered by collect, one per loop iteration.
Spark Jobs
● One job per Spark action.
● Spark transformations are evaluated lazily.
● Jobs are created sequentially, as actions are encountered during program execution.
● Spark driver does not know in advance how many jobs a program will produce.
RDD Partitioning

file → lines → data (3 partitions of data points):
  [2,3] [1,3] [7,6] [3,3] [2,5]
  [4,2] [3,2] [1,1] [7,2] [8,2] [5,1]
  [6,3] [1,2] [2,3] [4,7] [6,1]

closest (same partitions; each point keyed by its nearest center, with count 1):
  (1,([2,3],1)) (1,([1,3],1)) (2,([7,6],1)) (1,([3,3],1)) (2,([2,5],1))
  (2,([4,2],1)) (2,([3,2],1)) (1,([1,1],1)) (2,([7,2],1)) (2,([8,2],1)) (2,([5,1],1))
  (2,([6,3],1)) (1,([1,2],1)) (1,([2,3],1)) (2,([4,7],1)) (1,([6,1],1))

shuffle (repartition by key for reduceByKey):
  key 1: ([2,3],1) ([1,3],1) ([3,3],1) ([1,1],1) ([1,2],1) ([2,3],1) ([6,1],1)
  key 2: ([7,6],1) ([2,5],1) ([4,2],1) ([3,2],1) ([7,2],1) ([8,2],1) ([5,1],1) ([6,3],1) ([4,7],1)

pointStats (per-key sum of points and counts):
  (1,([16,16],7))
  (2,([46,30],9))

newPoints (sum / count), gathered by collect:
  (1,[2.28,2.28])
  (2,[5.11,3.33])
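The partition-and-shuffle step on this slide can be mimicked in plain Python (a toy hash-partitioned reduceByKey over a subset of the slide’s data, not Spark internals):

```python
from collections import defaultdict
from functools import reduce

def reduce_by_key(partitions, reducer, num_out=2):
    # Shuffle: route each (key, value) pair to an output partition by key hash.
    buckets = [defaultdict(list) for _ in range(num_out)]
    for part in partitions:
        for key, value in part:
            buckets[hash(key) % num_out][key].append(value)
    # Reduce each key's values within its output partition.
    return [{k: reduce(reducer, vs) for k, vs in b.items()} for b in buckets]

# Keyed (point, count) pairs as in the closest RDD (data abbreviated):
parts = [[(1, ([2, 3], 1)), (2, ([7, 6], 1))],
         [(1, ([1, 3], 1)), (2, ([2, 5], 1))]]

# Sum points component-wise and add counts, as the k-means program does.
stats = reduce_by_key(
    parts,
    lambda a, b: ([a[0][0] + b[0][0], a[0][1] + b[0][1]], a[1] + b[1]))
```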
Stages and Tasks

[Diagram: the same partitioned execution, with one task per partition. The file → data columns form Job 1; the data → closest map tasks form Job 2, Stage 1; the post-shuffle pointStats → newPoints tasks form Job 2, Stage 2.]
lines = spark.read.text(sys.argv[1]).rdd.map(lambda r: r[0])
data = lines.map(parseVector).cache()
kPoints = data.takeSample(False, K, 1)
tempDist = 1.0
while tempDist > convergeDist:
closest = data.map(lambda p: (closestPoint(p, kPoints), (p, 1)))
pointStats = closest.reduceByKey(lambda p1_c1, p2_c2:
(p1_c1[0] + p2_c2[0], p1_c1[1] + p2_c2[1]))
newPoints = pointStats.map(lambda st: (st[0], st[1][0] / st[1][1])).collect()
tempDist = sum(np.sum((kPoints[iK] - p) ** 2) for (iK, p) in newPoints)
for (iK, p) in newPoints:
kPoints[iK] = p
print("Final centers: " + str(kPoints))
Job Execution and Scheduling
1. Local execution at Driver until takeSample triggers Job 1.
2. Scheduler determines Job 1 stages and tasks, and assigns tasks to Executors.
Spark Scheduler

[Diagram: the Driver’s scheduler dispatches Job 1 tasks to the Executors; input comes from Cluster Storage.]
lines = spark.read.text(sys.argv[1]).rdd.map(lambda r: r[0])
data = lines.map(parseVector).cache()
kPoints = data.takeSample(False, K, 1)
tempDist = 1.0
while tempDist > convergeDist:
closest = data.map(lambda p: (closestPoint(p, kPoints), (p, 1)))
pointStats = closest.reduceByKey(lambda p1_c1, p2_c2:
(p1_c1[0] + p2_c2[0], p1_c1[1] + p2_c2[1]))
newPoints = pointStats.map(lambda st: (st[0], st[1][0] / st[1][1])).collect()
tempDist = sum(np.sum((kPoints[iK] - p) ** 2) for (iK, p) in newPoints)
for (iK, p) in newPoints:
kPoints[iK] = p
print("Final centers: " + str(kPoints))
Job Execution and Scheduling
1. Local execution at Driver until collect triggers Job 2.
2. Scheduler determines Job 2 stages and tasks, and assigns tasks to Executors.
Spark Scheduler

[Diagram: the Driver’s scheduler dispatches Job 2 tasks to the Executors.]
Part 2: Spark and Memory
How Does Spark Use Memory?

● Memory for task execution
○ Shuffles, joins, sorts, aggregations, ...
● Memory for caching
○ Saved RDD partitions
● Indirect memory use, e.g., file caching
Spark Executor Memory

[Diagram: a container holding the executor JVM; inside it, a unified Spark memory pool split into storage memory and execution memory, with a lower bound reserved for storage.]

● The boundary between storage and execution can adjust dynamically.
● Execution can evict stored RDDs, down to the storage lower bound.
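With the defaults in recent Spark versions (spark.memory.fraction = 0.6 of the heap after a roughly 300 MB reserve, and spark.memory.storageFraction = 0.5 as the storage floor), the region sizes for a given heap can be computed directly. A sketch with an illustrative 4 GB heap:

```python
def spark_memory_regions(heap_mb, fraction=0.6, storage_fraction=0.5,
                         reserved_mb=300):
    """Approximate unified-memory sizing using recent Spark defaults."""
    usable = heap_mb - reserved_mb                 # heap minus reserved memory
    spark_mem = usable * fraction                  # shared execution + storage pool
    storage_floor = spark_mem * storage_fraction   # storage protected from eviction
    return spark_mem, storage_floor

spark_mem, floor = spark_memory_regions(4096)      # a 4 GB executor heap
print(round(spark_mem, 1), round(floor, 1))        # → 2277.6 1138.8
```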
Executor Out-of-Memory Failures
From: M. Kunjir, S. Babu. Understanding Memory Management In Spark For Fun And Profit. Spark Summit 2016. Used with permission.
Performance Depends on Memory

[Chart: job runtime versus executor memory size; the run fails with an out-of-memory error at 512 MB.]
Caching in Spark

[Diagram: the program graph; the data RDD feeds takeSample and every iteration’s closest → pointStats → newPoints → collect chain.]

● Job 1 computes the data RDD and takes a sample.
● Job 2 must re-compute the data RDD, unless it has been cached!
● Job 3 also needs the data RDD.
What to Cache?
Spark can cache some or all of the partitions of any RDD.
Q: How does Spark know which RDDs to cache?
A: The programmer must tell it!
Application-Managed Caching
lines = spark.read.text(sys.argv[1]).rdd.map(lambda r: r[0])
data = lines.map(parseVector).cache()
kPoints = data.takeSample(False, K, 1)
tempDist = 1.0
while tempDist > convergeDist:
closest = data.map(lambda p: (closestPoint(p, kPoints), (p, 1)))
pointStats = closest.reduceByKey(lambda p1_c1, p2_c2:
(p1_c1[0] + p2_c2[0], p1_c1[1] + p2_c2[1]))
newPoints = pointStats.map(lambda st: (st[0], st[1][0] / st[1][1])).collect()
tempDist = sum(np.sum((kPoints[iK] - p) ** 2) for (iK, p) in newPoints)
for (iK, p) in newPoints:
kPoints[iK] = p
print("Final centers: " + str(kPoints))
Dear Spark:
Please cache this RDD.
Thanks, T. Programmer
persist()
● Spark programmers use persist(storage_level) to identify RDDs that should be cached
● cache() means persist at the default storage level
● storage_level?
○ MEMORY_ONLY
○ MEMORY_ONLY_SER
○ DISK_ONLY
○ MEMORY_AND_DISK
○ and more...
● Also: unpersist()
Effect of Caching
Spark Caching Advice
Don’t cache all RDDs.
Do cache RDDs that are re-used.
Unless there are too many of them.
In which case, just cache some of them
Or maybe try a DISK or SER level, depending on I/O capacity, recomputation cost, and serialization effectiveness.
Part III: Research
Topics
● Exploit Re-Use of Inputs
● Cache Sharing Across Programs
● Better Cache Management
Research: Quartet
● Input skew: 80% of accesses go to 10% of files
● Reuse happens quickly: 90% of re-use occurs within one hour
● Key idea: run tasks with cached file inputs first
○ Tasks in each stage are independent
○ Maximize “memory local” task input
Deslauriers et al. Quartet: Harmonizing task scheduling and caching for cluster computing. Proc. USENIX HotStorage 2016.
● Exploits Duet: in-kernel framework for exposing contents of file cache to applications
In-Kernel File Cache

[Diagram: two applications’ Drivers and Executors on several servers; each server’s in-kernel file cache sits between the Executors and Cluster Storage.]
Quartet Example
lines = spark.read.text(sys.argv[1]).rdd.map(lambda r: r[0])
data = lines.map(parseVector).cache()
kPoints = data.takeSample(False, K, 1)
tempDist = 1.0
while tempDist > convergeDist:
closest = data.map(lambda p: (closestPoint(p, kPoints), (p, 1)))
pointStats = closest.reduceByKey(lambda p1_c1, p2_c2:
(p1_c1[0] + p2_c2[0], p1_c1[1] + p2_c2[1]))
newPoints = pointStats.map(lambda st: (st[0], st[1][0] / st[1][1])).collect()
tempDist = sum(np.sum((kPoints[iK] - p) ** 2) for (iK, p) in newPoints)
for (iK, p) in newPoints:
kPoints[iK] = p
print("Final centers: " + str(kPoints))

Job 1 (which reads the input file) runs faster if the input is already (partially) in the file cache. The later jobs do not benefit.
Research: Cache Sharing
● Quartet can exploit data sharing across programs, since file system cache is external to Spark executors.
● Other ways to achieve this:
○ External caches, e.g., Tachyon/Alluxio
○ Concurrent Spark applications with shared Spark contexts
External Caches

[Diagram: two applications’ Drivers and workers; an Alluxio cache on each server sits between the workers and Cluster Storage.]
Alluxio
● Distributed, shared in-memory file system
● Caching, or tiered with underlying persistent file system
Sharing Spark Contexts

[Diagram: one Driver running threads T1 and T2, sharing Executors on several servers connected to Cluster Storage.]

● Concurrent threads within a single Spark application
● Each thread can run Spark jobs
● All jobs share the same Executors (and executor caches)
Research: Robus
● Goal: Manage a shared cache to improve performance of a multi-tenant workload, while maintaining fairness.
● Approach: (1) Batch incoming jobs, (2) make caching decision for batch, (3) load cache, (4) run batch, (5) repeat.
● Enables sharing within batch, not across batches.
Kunjir et al. ROBUS: Fair Cache Allocation for Data-parallel Workloads. Proc. SIGMOD 2017.
Robus

[Diagram: a ROBUS-enhanced driver feeds the Executors from a job queue; each queued job carries a weight (relative importance).]
Research: What to Cache?
● Spark applications identify RDDs to be cached
● Executors attempt to cache what the application identifies.
● LRU eviction
● Cache granularity = RDD partition
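The eviction policy can be sketched with an OrderedDict-based LRU cache (a toy model of the policy only, not Spark’s block store):

```python
from collections import OrderedDict

class LRUCache:
    """Evict the least-recently-used entry when capacity is exceeded."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()

    def get(self, key):
        if key in self.store:
            self.store.move_to_end(key)     # mark as recently used
            return self.store[key]
        return None

    def put(self, key, value):
        if key in self.store:
            self.store.move_to_end(key)
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the LRU entry

cache = LRUCache(2)
cache.put("rdd1-part0", "...")
cache.put("rdd1-part1", "...")
cache.get("rdd1-part0")         # touch part0, so part1 is now LRU
cache.put("rdd2-part0", "...")  # evicts rdd1-part1
```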
Application-Managed Caching: Best Practice

What to cache?
● RDDs that will be re-used

How to cache it?
● MEMORY_ONLY if everything fits
● MEMORY_ONLY_SER if it helps things fit
● Avoid disk unless computation is expensive or selective
Difficult Advice to Follow!
Can we automate this?
Spark Caching Example

[Diagram: an RDD DAG with nodes A through M, spanning Jobs 1 through 4.]

The application says to cache C, E, and J.
Is this the best choice? Can Spark make it automatically?
Caching Issue #1: Partial Info

[Diagram: the same DAG of RDDs A through M across Jobs 1 through 4.]

● The amount of re-use of C and E is unknown after Job 1.
● Unpersist after Job 2?
Caching Issue #2: Job Structure

[Diagram: the same DAG of RDDs A through M across Jobs 1 through 4.]

The cost of recalculating E depends on whether its ancestors are cached.
Caching Issue #3: Unhinted Candidates

[Diagram: the same DAG of RDDs A through M across Jobs 1 through 4.]

● There are alternatives to caching C, E, and J.
● Is D smaller than E?
● Is the D → E transformation expensive?
A Larger Example
9 jobs, 16 stages, 55 RDDs
Research: Neutrino
● Goal: Eliminate programmer-specified storage levels.
Xu et al. Neutrino: Revisiting Memory Caching for Iterative Data Analytics. Proc. USENIX HotStorage 2016.
● Approach:
○ Pilot run(s) to learn the complete program graph
○ Schedule cache/convert/discard actions on RDD partitions at each stage to achieve minimum-cost execution
Our Take on the Caching Problem
Goal: application-managed caching ⇒ application-assisted caching

Approach:
● The application identifies re-use (avoiding pilot runs and program analysis)
● Spark maximizes the effectiveness of the available space
○ Key challenge: size and cost estimation!
○ Approach: on-the-fly estimation, conditional plans
Summary
● Spark applications expose parallelism using RDD operations
● Spark provides distributed, fault-tolerant, in-memory execution
● Memory is a critical resource for Spark
○ For program execution and for result caching
● Related Research
○ Exploiting re-use of input data and query results across programs
○ What, where, and how to cache results