Spark Summit EU talk by Chris Pool and Jeroen Vlek
Making the switch
Spark Summit 2016
Why: Act the best way at the right moment.
How: Thinking radically differently and innovatively about generating insights.
What: We are experts in data excellence, delivering solutions in the field of (big) data, data science and artificial intelligence.
Data Integration
Data Processing
Data Science
Artificial Intelligence
Advice • Custom solutions • Training
Anchormen
30-10-16 3
• We specialize in data excellence:
  • Consumer 360
  • 24/7 Business
  • Search, Match & Find
• anchormen.nl/careers
8-4-2015 4
About us
Jeroen Vlek
• Lead data engineer
• Struggling with Bloodborne (PS4)
Chris Pool
• Data scientist
• Struggling with diapers
Dutch railways
• Most heavily used rail network in Europe
• 3.3 million journeys
• 1,157,260 daily travellers
What does Strukton Rail do?
Predictive Maintenance @ Strukton
• Fewer delays and train cancellations
• Making Strukton the leading company in rail maintenance
• Cost reduction
• Better preparation for repair personnel
Switch Failures
Switch Failure Causes
Frequently obstructed movements due to:
• Poor adjustment of rolling construction
• Lack of grease on slide chairs
• Bent blades
• Electrical problems (worn-out brushes, motor, etc.)
Goal: Predict switch failure
(Figure: motor current (amperage) against measurement number for a single switch flip, with the phases "Engine start", ‘Flipping’ and "Locked" marked.)
Challenges
• Labeling
• Skewed
• Black box
• Non-intrusive
Problem Definition
Learn the deviations in the data that indicate an upcoming malfunction
Data
• ~1,500 switches with sensors
• ~21 million flips
• 100 to 1,000 data points per flip
• 50 GB of data per year
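A rough sense of scale from the figures above. The slide does not say over which period the 21 million flips were collected, so this count is not directly comparable to the yearly data volume:

$$
2.1 \times 10^{7}\ \text{flips} \times \left(10^{2} \text{ to } 10^{3}\right)\ \frac{\text{points}}{\text{flip}} \approx 2.1 \times 10^{9} \text{ to } 2.1 \times 10^{10}\ \text{points}
$$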
Segments
Derived features
• Features that represent the curve (per segment):
  • Min
  • Max
  • Average
  • Length
  • Difference compared to previous flip
• Features for the entire flip:
  • Days since last failure
  • Temperature
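The slide lists the features but not how they are computed. Below is a minimal plain-Python sketch, assuming each flip has already been split into named segments of amperage readings. The segment names (`engine_start`, `flipping`, borrowed from the phase labels on the flip-curve figure), the helper names and the dictionary layout are all hypothetical. In a Spark job the same statistics would more likely come from DataFrame aggregations, with the previous-flip difference computed through a window function such as `lag`.

```python
from statistics import mean


def segment_features(points):
    """Min, max, average and length of one segment's amperage readings."""
    return {
        "min": min(points),
        "max": max(points),
        "avg": mean(points),
        "length": len(points),
    }


def flip_features(segments, days_since_failure, temperature, previous=None):
    """Feature dict for one flip (hypothetical layout).

    segments: mapping of segment name -> list of amperage readings.
    previous: feature dict of the previous flip of the same switch, if any;
              used for the "difference compared to previous flip" features.
    """
    # Flip-level features.
    features = {
        "days_since_last_failure": days_since_failure,
        "temperature": temperature,
    }
    # Per-segment curve features, plus their change since the previous flip.
    for name, points in segments.items():
        for stat, value in segment_features(points).items():
            key = f"{name}_{stat}"
            features[key] = value
            if previous is not None and key in previous:
                features[f"{key}_diff"] = value - previous[key]
    return features


prev = flip_features({"engine_start": [2.0, 4.0], "flipping": [3.0, 3.0, 3.0]}, 10, 12.5)
cur = flip_features({"engine_start": [2.0, 5.0], "flipping": [3.0, 4.0, 5.0]}, 11, 13.0, previous=prev)
print(cur["flipping_avg"], cur["flipping_avg_diff"], cur["engine_start_max_diff"])
```

Keeping the features in a flat dictionary makes each flip a single row, which is the shape a decision tree expects as input.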
Normalization and Aggregation
• Normalize data using a sliding window
• Aggregate per day:
  • Min
  • Max
  • First
  • Last
  • Variance
  • Average
  • Count
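The slide does not specify the normalization. One possible reading, sketched here purely as an assumption: each reading is z-scored against the mean and population standard deviation of the readings that precede it within a sliding window, and the normalized values are then grouped per calendar day into the seven listed statistics. The window size, the 0.0 fallback for early readings or a flat window, and the use of population variance are illustrative choices, not the speakers' method.

```python
from collections import deque
from datetime import date
from statistics import mean, pstdev, pvariance


def normalize_sliding(values, window=30):
    """Z-score each reading against the `window` readings that precede it.

    Assumption: readings with fewer than two predecessors, or whose window
    is flat (std == 0), are mapped to 0.0.
    """
    history = deque(maxlen=window)
    normalized = []
    for v in values:
        sd = pstdev(history) if len(history) >= 2 else 0.0
        normalized.append((v - mean(history)) / sd if sd > 0 else 0.0)
        history.append(v)
    return normalized


def aggregate_per_day(rows):
    """Group (day, value) rows, given in time order, into daily statistics."""
    by_day = {}
    for day, value in rows:
        by_day.setdefault(day, []).append(value)
    return {
        day: {
            "min": min(vs),
            "max": max(vs),
            "first": vs[0],
            "last": vs[-1],
            "variance": pvariance(vs),  # population variance (defined for a single value)
            "average": mean(vs),
            "count": len(vs),
        }
        for day, vs in by_day.items()
    }


days = [date(2016, 10, 1)] * 2 + [date(2016, 10, 2)] * 2
raw = [1.0, 3.0, 2.0, 5.0]
daily = aggregate_per_day(zip(days, normalize_sliding(raw, window=2)))
print(daily[date(2016, 10, 2)])
```

In Spark, the daily step maps naturally onto `groupBy(...).agg(...)` with `min`, `max`, `first`, `last`, `variance`, `avg` and `count` from `pyspark.sql.functions`. Note that Spark's `variance` is the sample variance, and that `first`/`last` depend on row order, which is not guaranteed after a shuffle.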
Model
• Decision tree: will it break within the next 3 weeks or not?
• Strukton: “keep it simple and explainable”
• Convert days until failure into classes:
  • 0-2 days
  • 2-7 days
  • 7-21 days
  • 21-55 days
  • >55 days
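A small sketch of turning days until the next failure into the listed classes and into the binary three-week target. The slide does not say whether the bin edges are inclusive; this version assumes each class includes its upper bound (so 2 days falls in 0-2 and 55 days in 21-55), and that a switch with no recorded upcoming failure is passed as `float("inf")`. The function and constant names are hypothetical.

```python
# Inclusive upper bounds (an assumption) of the classes listed on the slide.
CLASS_BOUNDS = [(2, "0-2 days"), (7, "2-7 days"), (21, "7-21 days"), (55, "21-55 days")]


def failure_class(days_until_failure):
    """Map days until the next failure to one of the five classes."""
    if days_until_failure < 0:
        raise ValueError("days_until_failure must be non-negative")
    for upper, label in CLASS_BOUNDS:
        if days_until_failure <= upper:
            return label
    return ">55 days"


def breaks_within_three_weeks(days_until_failure):
    """Binary target: does the switch fail within 21 days?"""
    return days_until_failure <= 21


print([failure_class(d) for d in (1, 5, 30, float("inf"))])
```

With these bounds the binary target is positive exactly for the first three classes, so the two labelings stay consistent.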
Architecture (current)
(Architecture diagram; the only recoverable labels are two MS-SQL databases.)
Why Spark?
• Lots of data preparation and feature computation
• More switches to be added in the future
• Streaming scenarios:
  • Short-term failures
  • Optimizing personnel’s routes
Results
| | Actual negative | Actual positive | Class precision |
|---|---|---|---|
| Predicted negative | 798 | 23 | 97.20% |
| Predicted positive | 1 | 64 | 98.46% |
| Class recall | 99.87% | 73.56% | |
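The table is a confusion matrix whose columns are the actual classes and whose rows are the predictions, so the 23 are missed failures (false negatives) and the 1 is a false alarm (false positive). Recomputing per-class precision and recall from the four counts reproduces the reported percentages (97.20%, 98.46%, 99.87% and 73.56%); the helper name is hypothetical.

```python
def class_metrics(tn, fn, fp, tp):
    """Per-class precision and recall from binary confusion-matrix counts."""
    return {
        "precision_negative": tn / (tn + fn),  # of all "won't break" predictions
        "precision_positive": tp / (tp + fp),  # of all "will break" predictions
        "recall_negative": tn / (tn + fp),     # of all switches that did not fail
        "recall_positive": tp / (tp + fn),     # of all switches that did fail
    }


m = class_metrics(tn=798, fn=23, fp=1, tp=64)
for name, value in m.items():
    print(f"{name}: {value:.2%}")
```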
Precision vs Recall
• Precision and recall are easy to explain
• Sending a mechanic is cheaper than paying a fine
• Recall is therefore more important: a missed failure costs more than an unnecessary visit
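One standard way to fold "recall is more important" into a single number is the F-beta score with beta > 1, which weights recall beta times as heavily as precision. This metric is not used in the talk; the sketch only applies it to the positive-class counts from the results table (TP = 64, FP = 1, FN = 23), and the function name is illustrative.

```python
def f_beta(tp, fp, fn, beta=2.0):
    """F-beta score from counts; beta > 1 favours recall over precision."""
    b2 = beta * beta
    return (1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp)


f1 = f_beta(64, 1, 23, beta=1.0)
f2 = f_beta(64, 1, 23, beta=2.0)
print(round(f1, 3), round(f2, 3))
```

Because recall (73.56%) is well below precision (98.46%), F2 (about 0.775) comes out lower than F1 (about 0.842), reflecting the heavier penalty on missed failures.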
Future work
• Deep learning
• Predict the number of days until failure (regression)
• Predict the type of failure:
  • Low voltage
  • Too disorderly
  • Not locking: too frequent
  • Up/down movement
Next steps
• Production
• Lambda architecture
• Nationwide roll-out