Spark Tuning for Enterprise System Administrators
Transcript of Spark Tuning for Enterprise System Administrators
![Page 1: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/1.jpg)
Spark Tuning for Enterprise System Administrators
Anya T. Bida, PhD; Rachel B. Warren
![Page 2: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/2.jpg)
Don't worry about missing something...
Video: https://www.youtube.com/watch?v=DNWaMR8uKDc&feature=youtu.be
Presentation: http://www.slideshare.net/anyabida
Cheat-sheet: http://techsuppdiva.github.io/
Anya: https://www.linkedin.com/in/anyabida
Rachel: https://www.linkedin.com/in/rachelbwarren
![Page 3: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/3.jpg)
About Anya: Operations Engineer
About Rachel: Spark & Scala Enthusiast / Data Engineer
Alpine Data, alpinenow.com
![Page 4: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/4.jpg)
About You: Spark practitioners
mySparkApp Success: Intermittent, Reliable, Optimal
![Page 5: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/5.jpg)
![Page 6: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/6.jpg)
Default != Recommended
Example: By default, spark.executor.memory = 1g. 1g allows small jobs to finish out of the box. Spark assumes you'll increase this parameter.
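One way to act on this is to pass an explicit value at submit time instead of relying on the default. The sketch below is a hypothetical helper: `conf_flags`, `my_spark_app.py`, and the 4g value are illustrative and not from the talk. It layers overrides on top of the 1g default quoted above and renders them as `spark-submit --conf` arguments.

```python
# Default quoted on the slide; everything else here is illustrative.
DEFAULTS = {"spark.executor.memory": "1g"}


def conf_flags(overrides):
    """Return spark-submit arguments for DEFAULTS updated with overrides."""
    merged = {**DEFAULTS, **overrides}
    flags = []
    for key in sorted(merged):
        flags.extend(["--conf", f"{key}={merged[key]}"])
    return flags


if __name__ == "__main__":
    # 4g is an arbitrary example value, not a recommendation.
    command = (
        ["spark-submit"]
        + conf_flags({"spark.executor.memory": "4g"})
        + ["my_spark_app.py"]
    )
    print(" ".join(command))
```

The same key can also live in spark-defaults.conf; the command-line form is shown only because it makes the override visible per job.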
![Page 7: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/7.jpg)
Which parameters are important?
How do I configure them?
![Page 8: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/8.jpg)
The Spark Tuning Cheat-Sheet
Filter* data before an expensive reduce or aggregation
consider* coalesce(
Use* data structures that require less memory
Serialize*
PySpark: serializing is built-in
Scala/Java? persist(StorageLevel.[*]_SER)
Recommended: KryoSerializer *
* tuning.html#tuning-data-structures
* See "Optimize partitions."
* See "GC investigation."
* See "Checkpointing."
![Page 9: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/9.jpg)
[Roadmap figure: mySparkApp Success moves from Intermittent to Reliable to Optimal; labeled steps: Initial config, Memory trouble]
![Page 10: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/10.jpg)
![Page 11: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/11.jpg)
How many in the audience have their own cluster?
![Page 12: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/12.jpg)
![Page 13: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/13.jpg)
Fair Schedulers
YARN
<allocations>
  <queue name="sample_queue">
    <minResources>4000 mb,0vcores</minResources>
    <maxResources>8000 mb,8vcores</maxResources>
    <maxRunningApps>10</maxRunningApps>
    <weight>2.0</weight>
    <schedulingPolicy>fair</schedulingPolicy>
  </queue>
</allocations>
SPARK
<allocations>
  <pool name="sample_queue">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
</allocations>
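When checking what a queue actually allows, it can help to read the limits out of the YARN allocation file programmatically. The following is a minimal sketch using only the Python standard library; the function names `parse_resources` and `queue_limits` are hypothetical, and the XML is the sample queue from the slide.

```python
import re
import xml.etree.ElementTree as ET

# Sample YARN fair-scheduler allocation from the slide.
ALLOCATIONS_XML = """
<allocations>
  <queue name="sample_queue">
    <minResources>4000 mb,0vcores</minResources>
    <maxResources>8000 mb,8vcores</maxResources>
    <maxRunningApps>10</maxRunningApps>
    <weight>2.0</weight>
    <schedulingPolicy>fair</schedulingPolicy>
  </queue>
</allocations>
"""


def parse_resources(text):
    """Parse a string like '8000 mb,8vcores' into (memory_mb, vcores)."""
    match = re.fullmatch(r"\s*(\d+)\s*mb\s*,\s*(\d+)\s*vcores\s*", text)
    if match is None:
        raise ValueError(f"unrecognized resource string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def queue_limits(xml_text, queue_name):
    """Return the min and max resources configured for one queue."""
    root = ET.fromstring(xml_text)
    for queue in root.iter("queue"):
        if queue.get("name") == queue_name:
            return {
                "min": parse_resources(queue.findtext("minResources")),
                "max": parse_resources(queue.findtext("maxResources")),
            }
    raise KeyError(queue_name)


if __name__ == "__main__":
    print(queue_limits(ALLOCATIONS_XML, "sample_queue"))
```

The parser only handles the "N mb,Mvcores" form shown on the slide; other resource formats would need their own handling.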
![Page 14: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/14.jpg)
![Page 15: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/15.jpg)
![Page 16: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/16.jpg)
![Page 17: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/17.jpg)
Use these parameters!
![Page 18: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/18.jpg)
Fair Schedulers
YARN
<allocations>
  <user name="sample_user">
    <maxRunningApps>6</maxRunningApps>
  </user>
  <userMaxAppsDefault>5</userMaxAppsDefault>
</allocations>
![Page 19: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/19.jpg)
![Page 20: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/20.jpg)
What is the memory limit for mySparkApp?
![Page 21: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/21.jpg)
Sidebar: Spark Architecture
[Figure: Driver, Cluster Manager, and two Executors]
Mark Grover: http://www.slideshare.net/SparkSummit/top-5-mistakes-when-writing-spark-applications-by-mark-grover-and-ted-malaska
![Page 22: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/22.jpg)
Max Memory in "pool" x 3/4 = mySparkApp_mem_limit
![Page 23: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/23.jpg)
![Page 24: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/24.jpg)
Limitation: <maxResources>___mb</maxResources>
![Page 25: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/25.jpg)
Reserve 25% for overhead (hence the factor of 3/4).
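As a worked instance of the rule of thumb, the sketch below applies the 3/4 factor to a pool maximum. The function name `app_mem_limit_mb` is hypothetical, and 8000 mb is borrowed from the sample queue's maxResources purely as an illustrative input.

```python
def app_mem_limit_mb(pool_max_memory_mb):
    """Apply the slide's rule: keep 3/4 of the pool maximum, reserve 1/4 for overhead."""
    return pool_max_memory_mb * 3 // 4


if __name__ == "__main__":
    # 8000 mb is the sample queue's maxResources value, used here as an example.
    print(app_mem_limit_mb(8000))
```

Integer arithmetic is used so the result stays in whole megabytes.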
![Page 26: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/26.jpg)
![Page 27: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/27.jpg)
![Page 28: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/28.jpg)
mySparkApp_mem_limit > driver.memory + (executor.memory x dynamicAllocation.maxExecutors)
![Page 29: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/29.jpg)
![Page 30: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/30.jpg)
Limitation: Driver must not be larger than a single node.
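The inequality and the driver limitation can be checked together before submitting a job. This is a sketch under stated assumptions: the function names are hypothetical, all sizes are in megabytes, and the numbers in the example (a 6000 mb limit, 1024 mb driver and executors, an 8192 mb node) are made up for illustration. Like the slide's formula, it ignores per-container overhead.

```python
def fits_budget(mem_limit_mb, driver_mb, executor_mb, max_executors):
    """True if driver memory plus all executors stays under the app memory limit."""
    return mem_limit_mb > driver_mb + executor_mb * max_executors


def driver_fits_node(driver_mb, node_memory_mb):
    """True if the driver is no larger than the memory of a single node."""
    return driver_mb <= node_memory_mb


if __name__ == "__main__":
    # Illustrative values only.
    print(fits_budget(6000, 1024, 1024, 4))   # 1024 + 4 * 1024 = 5120
    print(fits_budget(6000, 1024, 1024, 5))   # 1024 + 5 * 1024 = 6144
    print(driver_fits_node(1024, 8192))
```

If the first check fails, lowering dynamicAllocation.maxExecutors or executor.memory are the two knobs the formula exposes.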
![Page 31: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/31.jpg)
[Figure: spark.driver.memory sits inside the Driver Container, which sits inside yarn.nodemanager.resource.memory-mb]
![Page 32: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/32.jpg)
![Page 33: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/33.jpg)
![Page 34: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/34.jpg)
Verify my calculations respect this limitation.
![Page 35: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/35.jpg)
![Page 36: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/36.jpg)
![Page 37: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/37.jpg)
![Page 38: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/38.jpg)
mySparkApp memory issues
![Page 39: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/39.jpg)
Reduce the memory needed for mySparkApp. How?
Gracefully handle memory limitations. How?
![Page 40: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/40.jpg)
![Page 41: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/41.jpg)
Here, let's talk about one scenario.
![Page 42: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/42.jpg)
![Page 43: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/43.jpg)
Scala/Java: persist(StorageLevel.[*]_SER)
![Page 44: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/44.jpg)
![Page 45: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/45.jpg)
Recommended: KryoSerializer
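Switching the JVM-side serializer to Kryo is a configuration change. As a rough sketch, the hypothetical helper below renders settings as spark-defaults.conf lines (one whitespace-separated key and value per line). The property name `spark.serializer` and the class name are Spark's; the helper itself is illustrative only.

```python
# Fully qualified class name of Spark's Kryo serializer.
KRYO = "org.apache.spark.serializer.KryoSerializer"


def spark_defaults_lines(settings):
    """Render a dict of Spark properties as spark-defaults.conf lines."""
    return [f"{key} {value}" for key, value in settings.items()]


if __name__ == "__main__":
    print("\n".join(spark_defaults_lines({"spark.serializer": KRYO})))
```

This affects Scala/Java serialization; as noted on the cheat-sheet, PySpark serializes Python objects on its own.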
![Page 46: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/46.jpg)
![Page 47: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/47.jpg)
![Page 48: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/48.jpg)
![Page 49: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/49.jpg)
![Page 50: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/50.jpg)
Spark 1.1-1.5, Recommendation: Increase spark.storage.memoryFraction
![Page 51: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/51.jpg)
Spark 1.6, Recommendation: UnifiedMemoryManager
Alexey Grishchenko: https://0x0fff.com/spark-memory-management/
![Page 52: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/52.jpg)
Alexey Grishchenko: https://0x0fff.com/spark-memory-management/
Sandy Ryza: http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
[Figure: yarn.nodemanager.resource.memory-mb contains the Executor Container, which holds spark.executor.memory plus spark.yarn.executor.memoryOverhead]
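The container picture translates into simple arithmetic: each executor container needs spark.executor.memory plus spark.yarn.executor.memoryOverhead, and a node can host only as many containers as fit in yarn.nodemanager.resource.memory-mb. The sketch below assumes all three values are given in megabytes; the function names and the example numbers are hypothetical, and the overhead is passed in explicitly because its default depends on the Spark version.

```python
def executor_container_mb(executor_memory_mb, memory_overhead_mb):
    """Container size = executor heap plus the YARN memory overhead."""
    return executor_memory_mb + memory_overhead_mb


def executors_per_node(node_memory_mb, executor_memory_mb, memory_overhead_mb):
    """How many whole executor containers fit in one node's YARN memory."""
    container = executor_container_mb(executor_memory_mb, memory_overhead_mb)
    return node_memory_mb // container


if __name__ == "__main__":
    # Illustrative values: 16384 mb node, 4096 mb executors, 512 mb overhead.
    print(executor_container_mb(4096, 512))
    print(executors_per_node(16384, 4096, 512))
```

Sizing executors so that the container total divides the node memory with little remainder avoids stranding memory that no executor can use.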
![Page 53: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/53.jpg)
Sidebar: Spark Architecture
[Figure: Driver, Cluster Manager, and Executors; each Executor is drawn inside its YARN container (yarn.nodemanager.resource.memory-mb, spark.yarn.executor.memoryOverhead, spark.executor.memory)]
![Page 54: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/54.jpg)
![Page 55: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/55.jpg)
Instead of 2.5 hours, myApp completes in 1 hour.
![Page 56: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/56.jpg)
Cheat-sheet: techsuppdiva.github.io/
![Page 57: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/57.jpg)
HighPerformanceSpark.com
![Page 58: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/58.jpg)
Further Reading:
• Spark Tuning Cheat-sheet: techsuppdiva.github.io
• Apache Spark Documentation: https://spark.apache.org/docs/latest
• Checkpointing: http://spark.apache.org/docs/latest/streaming-programming-guide.html#checkpointing and https://github.com/jaceklaskowski/mastering-apache-spark-book/blob/master/spark-rdd-checkpointing.adoc
• Learning Spark, by H. Karau, A. Konwinski, P. Wendell, M. Zaharia, 2015
![Page 59: Spark Tuning for Enterprise System Administrators](https://reader031.fdocuments.in/reader031/viewer/2022030316/587332f11a28ab596c8b6f3b/html5/thumbnails/59.jpg)
More Questions?
Thanks!