Performance Analysis
Marco Serafini
COMPSCI 590S Lecture 14
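Blocked-Time Analysis
• Two steps
  1. Measure the time tasks spend blocked while doing X (e.g., disk or network I/O)
  2. Simulate the scheduling of the job as if that blocking had not happened
• Why simulation?
  • Want to reproduce how the tasks would have been scheduled
• Why not measure “CPU blocking” time?
  • Much more complex to disentangle
• Gains are relative: simulated time without blocking vs. simulated original time (simulation/simulation)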
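The two steps can be sketched in Java, assuming a simple greedy scheduler that runs tasks in order on a fixed number of slots, each task starting on the slot that frees up first. The task durations, blocked times, and the names `simulate` and `improvement` are hypothetical; the point is that both completion times come from the same simulator, so the gain is a ratio of simulation to simulation.

```java
import java.util.PriorityQueue;

public class BlockedTimeSketch {
    // Simulates running tasks in order on `slots` parallel slots
    // and returns the job completion time (makespan).
    static double simulate(double[] durations, int slots) {
        PriorityQueue<Double> freeAt = new PriorityQueue<>(); // when each slot becomes free
        for (int i = 0; i < slots; i++) freeAt.add(0.0);
        double makespan = 0.0;
        for (double d : durations) {
            double start = freeAt.poll();   // earliest free slot
            double end = start + d;
            freeAt.add(end);
            makespan = Math.max(makespan, end);
        }
        return makespan;
    }

    // Relative gain from removing the blocked time: both sides are simulated,
    // so the simulator's own inaccuracies affect numerator and denominator alike.
    static double improvement(double[] durations, double[] blocked, int slots) {
        double[] unblocked = new double[durations.length];
        for (int i = 0; i < durations.length; i++) {
            unblocked[i] = durations[i] - blocked[i];
        }
        double original = simulate(durations, slots);
        double withoutBlocking = simulate(unblocked, slots);
        return 1.0 - withoutBlocking / original;
    }

    public static void main(String[] args) {
        double[] durations = {10, 8, 6, 4}; // hypothetical task durations
        double[] blocked = {2, 1, 0, 1};    // hypothetical time each task spent blocked
        System.out.println(improvement(durations, blocked, 2));
    }
}
```

With the sample data in `main`, the simulated completion time drops from 14 to 13 time units, a relative gain of 1/14 (about 7%).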
Measuring a System
• Measuring a system is a scientific experiment
• Needed
  • A good definition of the metric to be measured
  • A good model of the behavior of the system
  • A good setup that eliminates external factors
• Existing systems did not always measure correctly
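One way to reduce the effect of external factors when timing Java code is to warm up the JVM first and report the median of repeated runs. This is a minimal sketch under those assumptions (`medianNanos` and the warm-up and run counts are illustrative); for serious JVM microbenchmarks a dedicated harness such as JMH is the usual choice.

```java
import java.util.Arrays;

public class MeasureSketch {
    // Runs `work` a few times without measuring (warm-up), then measures
    // `runs` repetitions and returns the median run time in nanoseconds.
    static long medianNanos(Runnable work, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) work.run();
        long[] times = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            work.run();
            times[i] = System.nanoTime() - start;
        }
        return median(times);
    }

    // Median of the values (upper median for even lengths); the input is not modified.
    static long median(long[] values) {
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    public static void main(String[] args) {
        long ns = medianNanos(() -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) sum += i;
            if (sum < 0) System.out.println(sum); // makes the result observable
        }, 5, 11);
        System.out.println("median ns: " + ns);
    }
}
```

Warm-up matters on the JVM because early iterations run in the interpreter before JIT compilation; the median is less sensitive than the mean to an occasional GC pause.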
Disclaimer
• Performance measurements are hard to generalize
• Results depend on the combination of
  • Workload (and its tuning, if synthetic)
  • System design
  • System configuration and tuning
  • Hardware configuration
Background: Spark SQL
• SQL-like queries → Spark jobs
• Ports some DBMS concepts into Spark
  • E.g., query optimizer
  • We will come back to that later in the course
Why Disk is not so Important
• Blocked time measures pure I/O time
• Before being used, data also needs
  • Compression/decompression
  • Serialization/deserialization
  • Memory allocation/deallocation
  • All these are CPU costs!
• Surprise: Spark's speedup is due to serialization, not I/O
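To illustrate that decompression and deserialization are CPU work even when no disk is involved, the sketch below encodes an int array with `DataOutputStream` over `GZIPOutputStream` entirely in memory and times only the decoding. The format is an arbitrary choice for illustration, not Spark's actual on-disk or in-memory format.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class DecodeCostSketch {
    // Serializes and compresses an int array in memory (no disk involved).
    static byte[] encode(int[] values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(new GZIPOutputStream(bytes))) {
            out.writeInt(values.length);
            for (int v : values) out.writeInt(v);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // in-memory streams should not fail
        }
        return bytes.toByteArray();
    }

    // Decompresses and deserializes: CPU work plus allocation of the result array.
    static int[] decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new BufferedInputStream(
                new GZIPInputStream(new ByteArrayInputStream(data))))) {
            int[] values = new int[in.readInt()];
            for (int i = 0; i < values.length; i++) values[i] = in.readInt();
            return values;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] values = new int[1_000_000];
        for (int i = 0; i < values.length; i++) values[i] = i % 1000;
        byte[] data = encode(values);
        long start = System.nanoTime();
        int[] decoded = decode(data);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(decoded.length + " ints decoded in " + elapsedMs + " ms");
    }
}
```

Every millisecond reported here is spent on the CPU, since the compressed bytes are already in memory.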
Why Network is not so Important
• Little intermediate data to be shuffled
  • Analytics workloads are mostly read-only
  • Analytics workloads compute aggregates
  • Valid for the workloads they considered (queries)
• Input data is read directly from disk
  • Workers typically run where the data is
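Aggregation is why little data needs to be shuffled: each worker can pre-aggregate its records before sending them, so at most one record per distinct key leaves the node. A minimal sketch, assuming a SUM grouped by a string key (the keys and values in `main` are made up):

```java
import java.util.HashMap;
import java.util.Map;

public class LocalAggregationSketch {
    // Pre-aggregates (key, value) records on one worker before the shuffle:
    // the output holds one partial sum per distinct key.
    static Map<String, Long> combineLocally(String[] keys, long[] values) {
        Map<String, Long> partial = new HashMap<>();
        for (int i = 0; i < keys.length; i++) {
            partial.merge(keys[i], values[i], Long::sum); // add to the running sum for this key
        }
        return partial;
    }

    public static void main(String[] args) {
        String[] keys = {"us", "eu", "us", "asia", "eu", "us"};
        long[] values = {3, 5, 2, 7, 1, 4};
        Map<String, Long> partial = combineLocally(keys, values);
        System.out.println(keys.length + " input records -> " + partial.size() + " records to shuffle");
    }
}
```

Here 6 input records shrink to 3 records to shuffle, one per distinct key; the reduction grows with the number of records per key.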
Why Stragglers are not so Important
• Spark uses small tasks
• Multiple waves of allocation
• A single, static wave of allocation would lead to much larger straggler overhead
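The effect of task size on stragglers can be sketched with a greedy simulation, assuming one of four slots runs at half speed (a hypothetical straggler) and the same total work is split either into one wave of 4 tasks or into 200 small tasks:

```java
public class StragglerSketch {
    // Greedy scheduler: each task runs on whichever slot becomes free first
    // (ties go to the lowest slot index). speeds[s] is the relative speed of slot s.
    static double makespan(double totalWork, int numTasks, double[] speeds) {
        double taskSize = totalWork / numTasks;
        double[] freeAt = new double[speeds.length];
        for (int t = 0; t < numTasks; t++) {
            int slot = 0;
            for (int s = 1; s < speeds.length; s++) {
                if (freeAt[s] < freeAt[slot]) slot = s;
            }
            freeAt[slot] += taskSize / speeds[slot]; // slower slot -> longer task
        }
        double end = 0;
        for (double f : freeAt) end = Math.max(end, f);
        return end;
    }

    public static void main(String[] args) {
        double[] speeds = {1.0, 1.0, 1.0, 0.5}; // slot 3 is the straggler
        System.out.println("4 tasks (1 wave):  " + makespan(400, 4, speeds));   // 200.0
        System.out.println("200 small tasks:   " + makespan(400, 200, speeds)); // 116.0
    }
}
```

With one static wave, the job finishes only when the straggler's large task does (200 time units); with small tasks the healthy slots absorb most of the work and the job finishes at 116, close to the ideal 400 / 3.5 ≈ 114.3.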
Garbage Collection
Traditional Languages (C/C++)
• Basic heap management
• Must manually allocate/deallocate regions (malloc/free)
• Must manage pointers
• Problem: difficult to tell if a region is still in use
  • Memory leaks: memory is never deallocated
  • Memory of objects that are still in use may be incorrectly reclaimed
  • Hard to debug: memory accesses may return “junk”
• Advantage: control
Memory-Managed Languages (Java)
• “No pointers”, although object variables are effectively pointers (references)
• Garbage collection deallocates memory for you
• Advantage: never read “junk”
• Disadvantage: performance bottlenecks
  • Overhead can be mitigated by understanding how GC works!
• Goals of GC
  • Maximize memory utilization
  • Minimize memory fragmentation
  • Minimize GC overhead
Generational Garbage Collection
• Standard approach used by Java and other languages
• Assumptions
  • Most instantiated objects are short-lived
  • Few references between long-lived and short-lived objects
• Approach: keep two generations
  • Young generation: recently instantiated objects
  • Old generation: objects that were instantiated less recently
• The following discussion targets Java, but the concepts are general
Determining “Live” Objects
• Called “marking”
• Start from the root objects
• Traverse all references
• Objects that are never reached are no longer referenced (garbage)
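Marking is a graph traversal. A minimal sketch over a toy object graph, where the hypothetical `Obj` class stands in for heap objects and the list passed to `mark` stands in for the root set (in a real JVM, roots include references on thread stacks and in static fields):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MarkSketch {
    // A toy heap object: a name plus outgoing references.
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    // Returns every object reachable from the roots; all other objects are garbage.
    static Set<Obj> mark(List<Obj> roots) {
        Set<Obj> marked = new HashSet<>();
        Deque<Obj> worklist = new ArrayDeque<>(roots);
        while (!worklist.isEmpty()) {
            Obj o = worklist.pop();
            if (marked.add(o)) {          // false if already marked, so cycles terminate
                worklist.addAll(o.refs);  // visit everything this object references
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c"), d = new Obj("d");
        a.refs.add(b);
        b.refs.add(a); // a cycle reachable from the root: still live
        c.refs.add(d); // c references d, but no root reaches either of them
        Set<Obj> live = mark(Arrays.asList(a));
        for (Obj o : Arrays.asList(a, b, c, d)) {
            System.out.println(o.name + (live.contains(o) ? ": live" : ": garbage"));
        }
    }
}
```

Reachability from the roots decides liveness: `c` still holds a reference to `d`, yet both are garbage because nothing reachable points to them.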
Young Generation
• Divided into three sections
  • Eden
  • Two survivor spaces, S0 and S1
• New objects are instantiated in Eden
• Survivor spaces are labeled “From” and “To”
  • From contains objects, To is empty
• When Eden fills up: minor garbage collection
Minor Garbage Collection
• “Stop-the-world”: the JVM stops while collecting garbage
• Steps
  • Scan Eden and the From survivor space
  • Copy all live objects to the To space
  • Logically clear Eden and the From space
  • Switch the From and To labels
• After n copies, objects are promoted to the Old generation
• Logic
  • Eden and From areas are always compact
  • When scanning, few objects will be live, so few are copied
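The Eden / From / To mechanics can be simulated with plain lists. In this sketch, liveness is passed in as a predicate instead of being computed by marking, and `promotionAge` is a hypothetical stand-in for the slide's “n copies” (HotSpot exposes a similar knob as `-XX:MaxTenuringThreshold`):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class MinorGcSketch {
    static final class Obj {
        final String name;
        int age = 0; // number of minor GCs this object has survived
        Obj(String name) { this.name = name; }
    }

    final int promotionAge; // hypothetical promotion threshold
    final List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>();
    List<Obj> to = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();

    MinorGcSketch(int promotionAge) { this.promotionAge = promotionAge; }

    // One stop-the-world minor collection; isLive stands in for the marking phase.
    void minorGc(Predicate<Obj> isLive) {
        for (List<Obj> space : Arrays.asList(eden, from)) {
            for (Obj o : space) {
                if (!isLive.test(o)) continue;          // dead objects are simply not copied
                o.age++;
                if (o.age >= promotionAge) old.add(o);  // promote to the Old generation
                else to.add(o);                         // copy into the To survivor space
            }
        }
        eden.clear();          // logically clear Eden ...
        from.clear();          // ... and the From space
        List<Obj> tmp = from;  // switch the From and To labels
        from = to;
        to = tmp;
    }

    public static void main(String[] args) {
        MinorGcSketch heap = new MinorGcSketch(2);
        Obj cursor = new Obj("cursor"); // long-lived
        heap.eden.add(cursor);
        for (int i = 0; i < 5; i++) heap.eden.add(new Obj("tmp" + i)); // short-lived
        heap.minorGc(o -> o == cursor);
        System.out.println("after GC 1: from=" + heap.from.size() + ", old=" + heap.old.size());
        heap.eden.add(new Obj("tmp5"));
        heap.minorGc(o -> o == cursor);
        System.out.println("after GC 2: from=" + heap.from.size() + ", old=" + heap.old.size());
    }
}
```

The long-lived `cursor` is copied once into a survivor space and promoted on the second collection (`from=1, old=0`, then `from=0, old=1`), while the short-lived objects are never copied at all, which is why minor GCs are cheap when few objects survive.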
Major Garbage Collection
• Runs less frequently
• “Stop-the-world”
• Steps
  • Scan the Old generation
  • Remove objects that are no longer referenced
  • Compact
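Compaction can be sketched as sliding live objects to the start of the Old generation, so free space becomes one contiguous region. The array of strings and the `live` flags are hypothetical inputs (the result of marking); a real collector must also update every reference to a moved object, which this sketch omits:

```java
import java.util.Arrays;

public class CompactSketch {
    // Slides live entries to the front, preserving their order, and returns
    // the index where the contiguous free space now begins.
    static int compact(String[] oldGen, boolean[] live) {
        int next = 0;
        for (int i = 0; i < oldGen.length; i++) {
            if (live[i]) oldGen[next++] = oldGen[i];
        }
        Arrays.fill(oldGen, next, oldGen.length, null); // reclaimed space
        return next;
    }

    public static void main(String[] args) {
        String[] oldGen = {"A", "B", "C", "D", "E"};
        boolean[] live = {true, false, true, false, true};
        int free = compact(oldGen, live);
        System.out.println(Arrays.toString(oldGen) + " free from index " + free);
    }
}
```

This prints `[A, C, E, null, null] free from index 3`: after compaction, new allocations can use one contiguous block instead of scattered holes, which addresses the fragmentation goal listed earlier.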
Permanent Generation
• Used by the JVM to store metadata (classes, etc.)
• Limited control over it
GC and Data Processing
• Large datasets: potentially LOTS of objects
  • Eden can fill up very quickly
  • Very frequent minor GCs
• GC at one node can slow down the whole system
• GC adds further load to the CPU
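To see how often collections happen in a JVM program, the standard `java.lang.management` API reports per-collector counts and accumulated collection time. The sketch below (the allocation loop and its sizes are arbitrary) measures GC activity around a burst of short-lived allocations; running with `-verbose:gc` prints individual collections instead.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcStatsSketch {
    // Sums collection counts and accumulated collection time (ms) over all collectors.
    static long[] gcCountAndMillis() {
        long count = 0, millis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            count += Math.max(0L, gc.getCollectionCount()); // -1 means "undefined"
            millis += Math.max(0L, gc.getCollectionTime());
        }
        return new long[] {count, millis};
    }

    public static void main(String[] args) {
        long[] before = gcCountAndMillis();
        Object[] recent = new Object[16]; // keeps only the last 16 arrays reachable
        for (int i = 0; i < 200_000; i++) {
            recent[i % recent.length] = new long[128]; // ~1 KB of short-lived garbage each
        }
        long[] after = gcCountAndMillis();
        System.out.println("GCs: " + (after[0] - before[0])
                + ", GC time: " + (after[1] - before[1]) + " ms");
    }
}
```

The exact counts depend on the heap size and the collector in use.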
Workarounds
• Reuse objects
  • Say that you are scanning a table of objects
  • Instantiate a single cursor object and reassign its fields
  • Eventually this object becomes old and is no longer copied by minor GCs
• Use primitive data types
  • Boxing: Integer vs. int
  • int[] vs. ArrayList<Integer>
  • Beware: String, Integer, etc. are immutable, so they cannot be reused by reassigning their contents
  • Specialized libraries (fastutil, koloboke, …)
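The first two workarounds combined: a hypothetical table stored column-wise in primitive arrays (`int[]`, `double[]`, no boxed `Integer` or `Double`) and scanned through a single reusable cursor whose fields are reassigned for every row, so the scan itself allocates only one object. `Table`, `RowCursor`, and `sumPrices` are illustrative names, not from any library.

```java
public class CursorScanSketch {
    // Hypothetical table stored in primitive arrays: no per-row objects, no boxing.
    static final class Table {
        final int[] ids;
        final double[] prices;
        Table(int[] ids, double[] prices) { this.ids = ids; this.prices = prices; }
    }

    // A single mutable cursor; its fields are overwritten for each row.
    static final class RowCursor {
        int id;
        double price;
    }

    // Sums the prices of rows with id >= minId.
    static double sumPrices(Table t, int minId) {
        RowCursor row = new RowCursor(); // allocated once per scan, not once per row
        double sum = 0;
        for (int i = 0; i < t.ids.length; i++) {
            row.id = t.ids[i];           // reassign fields instead of allocating
            row.price = t.prices[i];
            if (row.id >= minId) sum += row.price;
        }
        return sum;
    }

    public static void main(String[] args) {
        Table t = new Table(new int[] {1, 2, 3, 4}, new double[] {10.0, 20.0, 30.0, 40.0});
        System.out.println(sumPrices(t, 3)); // 70.0
    }
}
```

Reassigning fields works because `RowCursor` is mutable; an immutable `Integer` or `String` could not be reused this way.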