![Page 1: BigDebug:)Debugging)Primitives)for) Interactive)Big)Data ...web.cs.ucla.edu/~gulzar/assets/pdf/icse2016-bigdebug-slides.pdf · BigDebug:)Debugging)Primitives)for) Interactive)Big)Data)Processing)in)Spark](https://reader035.fdocuments.in/reader035/viewer/2022081401/5f0733127e708231d41bcd69/html5/thumbnails/1.jpg)
BigDebug: Debugging Primitives for Interactive Big Data Processing in Spark
Muhammad Ali Gulzar, Matteo Interlandi, Seunghyun Yoo, Sai Deep Tetali, Tyson Condie, Todd Millstein, Miryung Kim
University of California, Los Angeles
Developing Big Data Analytics
• Big Data Analytics is becoming increasingly important.
• Big Data Analytics applications are built on data-intensive computing platforms such as MapReduce, Hadoop, and Apache Spark.
Apache Pig
Map Reduce
Apache Spark: Next-Generation MapReduce
• Apache Spark is up to 100X faster than Hadoop MapReduce
• It is open source, and over 800 developers have contributed to its development
• 200+ companies are currently using Spark
• Spark also provides libraries such as Spark SQL and MLlib
Running a MapReduce Job on a Cluster
A user submits a job.
The job is distributed to workers in the cluster.
Each worker performs pipelined transformations on a partition with millions of records.
Pipeline: Filter → Map → Reduce
Motivating Scenario: Election Record Analysis
• Alice writes a Spark program that runs correctly on her local machine (100MB of data) but crashes on the cluster (1TB).
• Alice cannot see the crash-inducing intermediate result.
• Alice cannot identify which input in the 1TB dataset is causing the crash.
• When a crash occurs, all intermediate results are thrown away.
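
    // Count votes per candidate cast after a given timestamp.
    // Assumes whitespace-separated fields; `spark` is the SparkContext.
    val log = "s3n://poll.log"
    val text_file = spark.textFile(log)
    val count = text_file
      .filter(line => line.split("\\s+")(3).toInt > 1440012701)
      .map(line => (line.split("\\s+")(1), 1))
      .reduceByKey(_ + _)
      .collect()
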
| VoterID | Candidate | State | Time |
|---|---|---|---|
| 9213 | Sanders | TX | 1440023087 |
Task 31 failed 3 times; aborting job
ERROR Executor: Exception in task 31 in stage 0 (TID 31)
java.lang.NumberFormatException
BigDebug: Interactive Debugger Features
1. Simulated Breakpoint
2. Guarded Watchpoint
3. Crash Culprit Identification
4. Backward and Forward Tracing
Crashing at transformation 2
Crashing Record: “Sanders”
ArrayIndexOutOfBoundsException
$> Crash inducing input records:
9K23 Cruz TX 1440023645
2FSD Cruz KS 1440026456
9909 Cruz KS 1440023768
Outline
• Interactive Debugging Primitives
1. Simulated Breakpoint
2. On-Demand Watchpoint
3. Crash Culprit Identification
4. Backward and Forward Tracing
5. Fine Grained Latency Alert
• Performance Evaluation
Why Do Traditional Debugging Primitives Not Work for Apache Spark?
Enabling interactive debugging requires us to rethink the features of a traditional debugger such as GDB.
• Pausing the entire computation on the cloud could reduce throughput.
• It is clearly infeasible for a user to inspect billions of records through a regular watchpoint.
• Even launching remote JVM debuggers on individual worker nodes cannot scale for big data computing.
Spark Program with Transformations
FlatMap → Map → ReduceByKey → Map → ReduceByKey → Filter
Spark Program Scheduled as Stages
Stage 1: FlatMap → Map
Stage 2: ReduceByKey → Map
Stage 3: ReduceByKey → Filter
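The way this pipeline falls into three stages can be sketched in plain Scala. This is a simplified illustration rather than Spark's scheduler: the `StageSketch` object and its names are hypothetical, and it assumes that `ReduceByKey` is the only transformation here that needs a shuffle and therefore opens a new stage.

```scala
object StageSketch {
  // Assumption: ReduceByKey is the only wide (shuffle) transformation in this pipeline.
  val wide: Set[String] = Set("ReduceByKey")

  // Groups a linear pipeline into stages, opening a new stage at each wide transformation.
  def splitIntoStages(pipeline: List[String]): List[List[String]] = {
    // Stages are accumulated newest-first, and each stage is built in reverse order.
    val reversed = pipeline.foldLeft(List.empty[List[String]]) { (stages, t) =>
      stages match {
        case current :: done if !wide(t) => (t :: current) :: done
        case _                           => List(t) :: stages
      }
    }
    reversed.map(_.reverse).reverse
  }
}
```

For the six transformations on this slide, `splitIntoStages` returns three groups: FlatMap and Map, then ReduceByKey and Map, then ReduceByKey and Filter.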
Materialization Points in Spark
[Figure: the staged pipeline with stored data records at its materialization points]
1. Simulated Breakpoint
[Figure: a breakpoint placed in the staged pipeline, with stored data records at the materialization points]
A simulated breakpoint replays computation from the latest materialization point, where data is stored in memory.
1. Simulated Breakpoint – Realtime Code Fix
BigDebug allows a user to fix the code after the breakpoint.
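A minimal sketch of the replay idea in plain Scala collections, not BigDebug's implementation. The names `runAndMaterialize` and `replayFrom` are hypothetical, and keeping every stage output in a `Vector` is an assumption that stands in for Spark's materialization points.

```scala
object BreakpointSketch {
  type Records = Vector[String]
  type Stage = Records => Records

  // Runs the stages in order and keeps every intermediate result.
  // Element i of the returned vector is the input of stage i.
  def runAndMaterialize(input: Records, stages: Vector[Stage]): Vector[Records] =
    stages.scanLeft(input)((records, stage) => stage(records))

  // Replays stages `from` onward, starting from the stored records at that
  // point instead of recomputing everything from the original input.
  def replayFrom(materialized: Vector[Records], stages: Vector[Stage], from: Int): Records =
    stages.drop(from).foldLeft(materialized(from))((records, stage) => stage(records))
}
```

A code fix after the breakpoint then amounts to calling `replayFrom` with a patched stage list, for example `stages.updated(2, fixedStage)`, so that only the stages after the stored records are executed again.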
2. On-Demand Guarded Watchpoint
Stage 1: FlatMap → Map
Stage 2: ReduceByKey → Map → Watchpoint
A watchpoint captures individual data records matching a user-provided guard.
Guard: state.equals("TX") || state.equals("CA")
Guard: state.equals("CA")
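A guarded watchpoint can be sketched as a pass-through observer with a replaceable predicate. This is an illustration in plain Scala, not BigDebug's API: the `Vote` record type, the `Watchpoint` class, and `updateGuard` are hypothetical names, and the ability to swap the guard while records are flowing is an assumption about what "on-demand" means here.

```scala
object WatchpointSketch {
  // Hypothetical record type mirroring the poll-log columns.
  final case class Vote(voterId: String, candidate: String, state: String, time: Long)

  // Records pass through unchanged; only those matching the current guard
  // are copied into a buffer for the user to inspect.
  final class Watchpoint[A](initialGuard: A => Boolean) {
    private var guard: A => Boolean = initialGuard
    private val buffer = scala.collection.mutable.ArrayBuffer.empty[A]

    // Replaces the guard for all records observed from now on.
    def updateGuard(newGuard: A => Boolean): Unit = {
      guard = newGuard
    }

    def observe(record: A): A = {
      if (guard(record)) buffer += record
      record
    }

    def captured: List[A] = buffer.toList
  }
}
```

A watchpoint created with `v => v.state == "TX" || v.state == "CA"` can later be narrowed with `updateGuard(_.state == "CA")`, and the records it has captured so far remain available through `captured`.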
Crash in Apache Spark
To recover from a crash, a user needs to find the input causing the crash and re-execute the whole job.
A job failure in Spark throws away the intermediate results of correctly computed stages.
Task 31 failed 3 times; aborting job
ERROR Executor: Exception in task 31 in stage 0 (TID 31)
java.lang.NumberFormatException
3. Crash Culprit Identification
A user can see the crash-causing intermediate record and trace the original inputs leading to the crash.
Crash occurred at transformation 3
Crashing Record: “Sanders”
ArrayIndexOutOfBoundsException
Skipping the record.
Continuing processing.
3. Crash Culprit Remediation
A user can either correct the crashed record, skip the crash culprit, or supply a code fix to repair the crash culprit.
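The "skip the crash culprit" option can be sketched with `scala.util.Try`. This is a plain-Scala illustration under stated assumptions, not BigDebug's mechanism: `Culprit` and `mapSkippingCulprits` are hypothetical names, and correcting a record or supplying a code fix is not modeled.

```scala
import scala.util.{Failure, Success, Try}

object CrashCulpritSketch {
  // One crash report: the record that failed and the exception it raised.
  final case class Culprit[A](record: A, error: Throwable)

  // Applies `f` to every record. Records on which `f` throws are reported
  // and skipped instead of aborting the whole computation.
  def mapSkippingCulprits[A, B](records: Seq[A])(f: A => B): (Seq[B], Seq[Culprit[A]]) = {
    val attempts = records.map(r => r -> Try(f(r)))
    val results  = attempts.collect { case (_, Success(b)) => b }
    val culprits = attempts.collect { case (r, Failure(e)) => Culprit(r, e) }
    (results, culprits)
  }
}
```

Applied to a parser such as `line => line.split(" ")(3).toInt`, a one-field record like `Sanders` ends up in the culprit list with an `ArrayIndexOutOfBoundsException`, while well-formed records are still processed.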
4. Backward and Forward Tracing
A user can also issue tracing queries on intermediate records in real time.
Titian: Data Provenance for Spark [PVLDB2016]
Step 1: Titian instruments Spark jobs with tracing agents to generate fine-grained tracing tables.
Step 2: Titian logically reconstructs the mapping from output to input records by recursively joining the provenance tables.
[Figure: Tracing Tables 1, 2, and 3, each with Input and Output columns, joined to map output records back to input records]
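Backward tracing by joining per-stage tables can be sketched as follows. This is a plain-Scala illustration, not Titian's API or storage format: the `TracingTable` alias and `traceBackward` are hypothetical, and each table is assumed to map an output identifier of a stage to the input identifiers it was derived from.

```scala
object TracingSketch {
  // Assumption: one table per stage, keyed by output id, valued by contributing input ids.
  type TracingTable = Map[String, Set[String]]

  // `tables` is ordered from the first stage to the last. Starting at one final
  // output id, each table is joined in turn, last stage first, until only ids
  // of original input records remain.
  def traceBackward(output: String, tables: List[TracingTable]): Set[String] =
    tables.foldRight(Set(output)) { (table, ids) =>
      ids.flatMap(id => table.getOrElse(id, Set.empty[String]))
    }
}
```

Forward tracing would be the same walk over inverted tables, processed from the first stage to the last.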
5. Fine Grained Latency Alert
A latency alert is issued if the processing time is greater than k standard deviations above the moving average
Latency alert is enabled
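The alert rule can be sketched in a few lines of Scala. The slide does not say how the moving average is computed, so the fixed-size window over the preceding samples, the population standard deviation over that same window, and the names `alerts`, `window`, and `k` are all assumptions made for illustration.

```scala
object LatencyAlertSketch {
  // Returns the indices of samples whose processing time exceeds the moving
  // average of the previous `window` samples by more than k standard deviations.
  def alerts(latenciesMs: Vector[Double], window: Int, k: Double): Vector[Int] =
    latenciesMs.indices.toVector.filter { i =>
      if (i < window) false // not enough history yet
      else {
        val history = latenciesMs.slice(i - window, i)
        val mean = history.sum / window
        val variance = history.map(x => (x - mean) * (x - mean)).sum / window
        latenciesMs(i) > mean + k * math.sqrt(variance)
      }
    }
}
```

With hypothetical samples of 10, 12, 11, 13, 400, and 12 ms, `window = 4`, and `k = 3`, only the 400 ms sample (index 4) triggers an alert.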
[Figure: example processing times of 44 ms, 400 ms, 10 ms, and 20 ms]
Evaluation
• Q1 : How does BigDebug scale to massive data?
• Q2 : What is the performance overhead of instrumentation and communication for debugging primitives?
• Q3 : How much time does BigDebug save through its runtime crash remediation, in comparison to an existing replay debugger?
Q1 : How does BigDebug scale to massive data?
[Figure: BigDebug Scale Up. Time (s), with axis ticks from 1 to 10000, versus Dataset Size (GB) from 0.5 to 1000, for BigDebug and Spark]
BigDebug retains the scale-up property of Spark. This property is critical for Big Data processing frameworks.
[Figure: BigDebug Scale Out. Time (s) versus Number of Workers (2, 4, 6, 8), for BigDebug and Spark on 10GB, 30GB, and 50GB datasets]
BigDebug retains the scale-out property of Spark. This property is critical for Big Data processing frameworks.
Q2 : What is the performance overhead of debugging primitives?
| Program | Dataset size (GB) | Max | Max w/o Latency Alert | Watchpoint | Crash Culprit | Tracing |
|---|---|---|---|---|---|---|
| WordCount | 0.5 - 1000 | 2.5X | 1.34X | 1.09X | 1.18X | 1.22X |
| Grep | 20 - 90 | 1.76X | 1.07X | 1.05X | 1.04X | 1.05X |
| PigMix-L1 | 1 - 200 | 1.38X | 1.29X | 1.03X | 1.19X | 1.24X |

Max: all the features of BigDebug are enabled.
BigDebug poses at most 2.5X overhead with the maximum instrumentation setting.
Q3 : How much time does BigDebug save when resolving a crash?
Suppose that a user wants to skip or correct the crash-causing inputs.
Arthur (an existing replay debugger):
1. The first run crashes.
2. The second run instruments all records leading to the crash.
3. The third run removes the crash-inducing records from the inputs.
BigDebug: a single run can detect and remove the crash culprit and resume the job.
Q3 : How much time does BigDebug save?
BigDebug finds a crash-inducing record with 100% accuracy and achieves up to 100% time saving through runtime crash remediation.
[Figure: Time saving. Time (s) versus Location of crash (Stage S1 to S4), for BigDebug and Arthur]
Conclusion
• Debugging big data applications is painstaking and expensive
• BigDebug provides interactive debugging primitives for high-performance in-memory processing in Spark
• BigDebug offers simulated breakpoints and guarded watchpoints with little performance overhead
• It scales to massive data on the order of terabytes; its record-level tracing poses 25% overhead, and its crash remediation provides up to 100% time saving
• BigDebug is publicly available at https://sites.google.com/site/sparkbigdebug/