Morticia: Visualizing And Debugging Complex Spark Workflows

Morticia: Visualize and Debug Complex Spark Workflows, Jacob Perkins, Stitchfix

Transcript of Morticia: Visualizing And Debugging Complex Spark Workflows

Page 1

Morticia: Visualize and Debug Complex Spark Workflows

Jacob Perkins, Stitchfix

Page 2

Who am I?

Page 3

What do we do?

● Developer enablement platform

Page 4

Morticia?

● Self-service debugging of Spark workflows

Page 5

Why bother?

● Data scientists are not going to become Spark experts

Page 6

Current state of the universe...

● Here's a simple query:

select count(distinct source) from test.marvel_social_graph
where target != 'CAPTAIN AMERICA'

Page 7

Current state of the universe...

● Questions:
  ● How many input records?
  ● What is the parallelism throughout the job?
  ● How did my query get mapped to actual work?
  ● Logs?
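The first two questions (input records and parallelism) can in principle be answered from per-task metrics that Spark already reports to listeners. As a rough illustration only, the sketch below models task-end events as plain Python records and aggregates them per stage. The `TaskEnd` record and its fields are hypothetical stand-ins; in Spark's Scala API the closest counterparts are `SparkListenerTaskEnd` (whose task metrics include `inputMetrics.recordsRead`) and `StageInfo.numTasks`. This is not Morticia's implementation.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical stand-in for a task-end event; field names are illustrative,
# loosely modeled on Spark's SparkListenerTaskEnd and its task metrics.
@dataclass(frozen=True)
class TaskEnd:
    stage_id: int
    task_id: int
    records_read: int  # records this task read from its input source


def summarize(events):
    """Return {stage_id: (num_tasks, total_records_read)} from task-end events.

    Assumes one event per task (no retries); records from retried task
    attempts would be counted more than once.
    """
    tasks = defaultdict(set)
    records = defaultdict(int)
    for e in events:
        tasks[e.stage_id].add(e.task_id)      # distinct tasks = parallelism
        records[e.stage_id] += e.records_read  # sum of input records
    return {s: (len(tasks[s]), records[s]) for s in sorted(tasks)}


if __name__ == "__main__":
    events = [
        TaskEnd(stage_id=0, task_id=0, records_read=1200),
        TaskEnd(stage_id=0, task_id=1, records_read=800),
        TaskEnd(stage_id=1, task_id=2, records_read=0),
    ]
    for stage, (n, recs) in summarize(events).items():
        print(f"stage {stage}: {n} tasks, {recs} input records")
```

With these made-up events, stage 0 reports 2 tasks and 2000 input records; the point is that both numbers fall out of a single pass over events rather than manual counting in the UI.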

Page 8

Current state of the universe...

Input record count?

Page 9

Current state of the universe...

Input record count? Not here.

Page 10

Current state of the universe...

← Input record count!

Page 11

Current state of the universe...

Number of tasks? Count!

Page 12

Current state of the universe...

Number of tasks? Count!

Page 13

Current state of the universe...

How did my query map to actual work? Uhhh

Page 14

Current state of the universe...

● Logs?
● You're on your own

Page 15

Enter Morticia...

● Interactive, coherent, unified view
● Logical information
● Status
● Archival

Page 16

Morticia

Page 17

Morticia

Page 18

Morticia

Page 19

Morticia

Page 20

Morticia

Page 21

Morticia

Page 22

Page 23

Page 24

How?

● Public SparkListener interface + AspectJ pointcuts to access internal state
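The public half of this approach is Spark's listener bus: a class extending `org.apache.spark.scheduler.SparkListener` can be registered via `SparkContext.addSparkListener` or the `spark.extraListeners` setting, and it then receives callbacks such as `onJobStart`, `onStageCompleted` and `onTaskEnd`. The Python sketch below only models that observer pattern, combined with the "Status" and "Archival" ideas from the "Enter Morticia..." slide: one listener tracks job status, another writes every event as a JSON line for post-mortem replay. `ListenerBus`, `ArchivingListener`, `StatusListener`, `replay` and the event dictionaries are hypothetical names chosen for illustration; the AspectJ half, which reaches into Spark internals, is not shown.

```python
import io
import json


class ListenerBus:
    """Toy event bus: delivers every posted event to each registered listener."""

    def __init__(self):
        self._listeners = []

    def add_listener(self, listener):
        self._listeners.append(listener)

    def post(self, event):
        for listener in self._listeners:
            listener.on_event(event)


class ArchivingListener:
    """Writes each event as one JSON line so a run can be inspected later."""

    def __init__(self, sink):
        self.sink = sink  # any writable text file-like object

    def on_event(self, event):
        self.sink.write(json.dumps(event, sort_keys=True) + "\n")


class StatusListener:
    """Tracks the latest known status of each job (hypothetical event types)."""

    def __init__(self):
        self.jobs = {}

    def on_event(self, event):
        if event["type"] == "job_start":
            self.jobs[event["job_id"]] = "RUNNING"
        elif event["type"] == "job_end":
            self.jobs[event["job_id"]] = event["result"]


def replay(lines):
    """Reload archived events from JSON lines for post-mortem analysis."""
    return [json.loads(line) for line in lines if line.strip()]


if __name__ == "__main__":
    archive = io.StringIO()
    status = StatusListener()
    bus = ListenerBus()
    bus.add_listener(ArchivingListener(archive))
    bus.add_listener(status)

    bus.post({"type": "job_start", "job_id": 0})
    bus.post({"type": "job_end", "job_id": 0, "result": "SUCCEEDED"})

    print(status.jobs)                                   # {0: 'SUCCEEDED'}
    print(len(replay(archive.getvalue().splitlines())))  # 2
```

Keeping the archive as append-only JSON lines means a finished or crashed run can be reloaded and inspected without the live application, which matches the post-mortem use case mentioned later in the deck.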

Page 25

Please help!

● Public interface for logical and physical planning events

Page 26

Btw, why Morticia?

● Morticia Addams is inspiring and powerful
● Initially a tool for post-mortem analysis
● AspectJ pointcuts == basically witchcraft; Morticia is a witch
● Amidst chaos and complexity, Morticia remains calm and incisive

Page 27

THANK YOU.