Spark Summit EU talk by Chris Pool and Jeroen Vlek

Page 1: Spark Summit EU talk by Chris Pool and Jeroen Vlek

Making the switch: Spark Summit 2016

Page 2: Spark Summit EU talk by Chris Pool and Jeroen Vlek

Why: Act the best way at the right moment.

How: Thinking radically differently and innovatively about generating insights.

What: We are experts in data excellence, delivering solutions in the field of (big) data, data science and artificial intelligence.

• Data Integration
• Data Processing
• Data Science
• Artificial Intelligence

Advice, Custom solutions, Training

Page 3: Spark Summit EU talk by Chris Pool and Jeroen Vlek

Anchormen

30-10-16 3

• We specialize in data excellence:
  • Consumer 360
  • 24/7 Business
  • Search, Match & Find

• anchormen.nl/careers

Page 4: Spark Summit EU talk by Chris Pool and Jeroen Vlek

About us

Jeroen Vlek
• Lead data engineer
• Struggling with Bloodborne (PS4)

Chris Pool
• Data scientist
• Struggling with diapers

Page 5: Spark Summit EU talk by Chris Pool and Jeroen Vlek

Dutch railways

• Most-used network in Europe
• 3.3 million journeys
• 1,157,260 daily travellers

Page 6: Spark Summit EU talk by Chris Pool and Jeroen Vlek

What does Strukton Rail do?

Page 7: Spark Summit EU talk by Chris Pool and Jeroen Vlek

Predictive Maintenance @ Strukton

• Fewer delays and cancelled trains
• Making Strukton the leading company in the field of rail maintenance
• Cost reduction
• Better preparation for repair personnel

Page 8: Spark Summit EU talk by Chris Pool and Jeroen Vlek

Switch Failures

Page 9: Spark Summit EU talk by Chris Pool and Jeroen Vlek

Switch Failure Causes

Frequently obstructed movements due to:
• Poor adjustment of rolling construction
• Lack of grease on slide chairs
• Bent blades
• Electrical problems (worn-out brushes, motor, etc.)

Page 10: Spark Summit EU talk by Chris Pool and Jeroen Vlek

Goal: Predict switch failure

[Figure: curve of one switch flip. Axes: Amperage, #Measurements. Labelled phases: Engine start, ‘Flipping’, Locked.]

Page 11: Spark Summit EU talk by Chris Pool and Jeroen Vlek

Challenges

Labeling

Skewed

Black box

Non-Intrusive

Page 12: Spark Summit EU talk by Chris Pool and Jeroen Vlek

Problem Definition

Learn the deviations in the data that indicate an upcoming malfunction

Page 13: Spark Summit EU talk by Chris Pool and Jeroen Vlek

Data

~1500 switches with sensors

~21 million flips

100-1000 points / flip

50 GB data / year

Page 14: Spark Summit EU talk by Chris Pool and Jeroen Vlek

Segments

Page 15: Spark Summit EU talk by Chris Pool and Jeroen Vlek

Derived features

• Features that represent the curve (per segment):
  • Min
  • Max
  • Average
  • Length
  • Difference compared to previous flip

• Features for entire flip:
  • Days since last failure
  • Temperature
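
The per-segment features listed above could be computed roughly as follows. This is a minimal sketch, not the speakers' implementation: the segment names, the function names `segment_features` and `flip_features`, and the sample readings are all assumptions made for illustration.

```python
from statistics import mean


def segment_features(readings):
    """Summarize one segment (a list of amperage readings) of a flip."""
    return {
        "min": min(readings),
        "max": max(readings),
        "avg": mean(readings),
        "length": len(readings),
    }


def flip_features(segments, previous=None):
    """Per-segment features for one flip, plus differences to the previous flip.

    `segments` maps a segment name to its readings. `previous` is the result
    of an earlier call for the preceding flip of the same switch, or None.
    """
    features = {}
    for name, readings in segments.items():
        for stat, value in segment_features(readings).items():
            key = f"{name}_{stat}"
            features[key] = value
            if previous is not None and key in previous:
                features[f"{key}_diff"] = value - previous[key]
    return features


# Hypothetical readings, invented for illustration only.
flip_1 = flip_features(
    {"start": [4.0, 6.0], "flipping": [2.0, 2.0, 2.0], "locked": [1.0]}
)
flip_2 = flip_features(
    {"start": [5.0, 7.0], "flipping": [2.0, 3.0, 2.5, 2.5], "locked": [1.0]},
    previous=flip_1,
)
```

Flip-level features such as days since last failure and temperature would be joined onto this dictionary from other sources; they are left out here because the slides do not say where they come from.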

Page 16: Spark Summit EU talk by Chris Pool and Jeroen Vlek

Normalization and Aggregation

• Normalize data using sliding window
• Aggregate per day:
  • Min
  • Max
  • First
  • Last
  • Variance
  • Average
  • Count
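
A small sketch of these two steps, under stated assumptions: the slides do not say how the sliding-window normalization is defined, so this version simply subtracts the mean of the preceding `window` values, and it uses population variance so that a day with a single flip still gets a value. The names `normalize` and `aggregate_per_day` and the sample data are hypothetical.

```python
from collections import defaultdict, deque
from datetime import date
from statistics import mean, pvariance


def normalize(values, window=3):
    """Subtract the mean of the preceding `window` values (assumed scheme)."""
    recent = deque(maxlen=window)
    out = []
    for value in values:
        # The very first value has no history, so it is its own baseline.
        baseline = mean(recent) if recent else value
        out.append(value - baseline)
        recent.append(value)
    return out


def aggregate_per_day(rows):
    """Aggregate (day, value) pairs, given in chronological order, per day."""
    by_day = defaultdict(list)
    for day, value in rows:
        by_day[day].append(value)
    return {
        day: {
            "min": min(vals),
            "max": max(vals),
            "first": vals[0],
            "last": vals[-1],
            "variance": pvariance(vals),  # population variance (assumption)
            "average": mean(vals),
            "count": len(vals),
        }
        for day, vals in by_day.items()
    }


# Hypothetical feature values for four flips on two days.
days = [date(2016, 10, 1), date(2016, 10, 1), date(2016, 10, 2), date(2016, 10, 2)]
normalized = normalize([10.0, 12.0, 11.0, 14.0], window=2)
daily = aggregate_per_day(zip(days, normalized))
```

In Spark the same per-day aggregation would typically be a group-by on switch and date; the pure-Python version is only meant to make the listed aggregates concrete.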

Page 17: Spark Summit EU talk by Chris Pool and Jeroen Vlek

Model

• Decision tree: Will it break within the next 3 weeks or not?
• Strukton: “keep it simple and explainable”
• From days until failure to classes:
  • 0-2 days
  • 2-7 days
  • 7-21 days
  • 21-55 days
  • >55 days
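
The mapping from days until failure to classes might look like the sketch below. The slide's ranges share their boundary values (2, 7, 21, 55) and do not say which side is inclusive, so this version assumes the upper bound of each class is inclusive. The names `failure_class` and `breaks_within_three_weeks` are hypothetical.

```python
from bisect import bisect_left

# Upper bounds of the first four classes, in days (inclusive by assumption).
UPPER_BOUNDS = [2, 7, 21, 55]
LABELS = ["0-2 days", "2-7 days", "7-21 days", "21-55 days", ">55 days"]


def failure_class(days_until_failure):
    """Map a non-negative number of days until failure to a class label."""
    return LABELS[bisect_left(UPPER_BOUNDS, days_until_failure)]


def breaks_within_three_weeks(days_until_failure):
    """Binary target for the 'within the next 3 weeks or not' question."""
    return days_until_failure <= 21
```

For example, `failure_class(3)` falls in the "2-7 days" class, and the binary target groups the first three classes together as positive.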

Page 18: Spark Summit EU talk by Chris Pool and Jeroen Vlek

Architecture (current)

[Architecture diagram. Recoverable labels: MS-SQL, MS-SQL.]

Page 19: Spark Summit EU talk by Chris Pool and Jeroen Vlek

Why Spark?

• Lots of data prep and feature computation
• More switches to be added in the future
• Streaming scenarios:
  • Short term failures
  • Optimize personnel’s routes

Page 20: Spark Summit EU talk by Chris Pool and Jeroen Vlek

Results

                     True negative   True positive   Class precision
Predicted negative             798              23            97.20%
Predicted positive               1              64            98.46%
Class recall                99.87%          73.56%
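
The precision and recall figures follow directly from the four counts in the table. The short check below reproduces them; the function name `class_metrics` is hypothetical, and only the counts come from the slide.

```python
def class_metrics(tn, fn, fp, tp):
    """Per-class precision and recall, as percentages rounded to 2 decimals."""
    def pct(part, whole):
        return round(100 * part / whole, 2)

    return {
        "precision_negative": pct(tn, tn + fn),  # correct among predicted negative
        "precision_positive": pct(tp, tp + fp),  # correct among predicted positive
        "recall_negative": pct(tn, tn + fp),     # found among truly negative
        "recall_positive": pct(tp, tp + fn),     # found among truly positive
    }


# Counts from the results table: rows are predictions, columns are truth.
metrics = class_metrics(tn=798, fn=23, fp=1, tp=64)
```

The positive-class recall is the lowest of the four numbers: 23 of the 87 actual failures in this sample were predicted as negative.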

Page 21: Spark Summit EU talk by Chris Pool and Jeroen Vlek

Precision vs Recall

• Precision and recall are easily explained
• Sending a mechanic is cheaper than a fine
• Recall is more important

Page 22: Spark Summit EU talk by Chris Pool and Jeroen Vlek

Future work

• Deep learning
• Predict the number of days (regression)
• Predict type of failures:
  • Less voltage
  • Too disorderly
  • Not locking: Too frequent
  • Up/down movement

Page 23: Spark Summit EU talk by Chris Pool and Jeroen Vlek

Next steps

• Production
• Lambda architecture
• Nationwide roll-out

Page 24: Spark Summit EU talk by Chris Pool and Jeroen Vlek

Questions?

• [email protected]
• www.anchormen.nl
• @anchormenBDS
