Predicting Truck Breakdowns - Rory Woods


Transcript of Predicting Truck Breakdowns - Rory Woods

Page 1: Predicting Truck Breakdowns - Rory Woods

Big Data for Big Rigs

Predicting Truck Breakdowns

Rory Woods

Lead Data Scientist

9-24-2016

Page 2

05/01/2023

Overview

• What is Preteckt?

• Getting and working with the data

• Making predictions work

• Conclusion

Page 3

What Does Preteckt Do?

Prevent on-the-road breakdowns by identifying them many days in advance.

• Take data from truck sensors

• Analyze and compare them to other trucks

• Monitor trucks in real time to identify breakdowns in advance

Page 4

Preteckt’s Data Science Team

Rory Woods – Lead Data Scientist. PhD in Computational Astrophysics with experience in high-performance computing.

Bertrand Brelier – Data Scientist. Former research scientist at IBM and data scientist at Numeris. PhD in Physics.

Mikhail Klassen – Chief Data Scientist at Paradigm Knowledge Solutions. PhD in Computational Astrophysics.

Ben Keller – PhD student in Computational Astrophysics.

Jim Reilly – Professor of ECE. Interests in signal processing and machine learning techniques.

Ken Sills – CTO. 15 years of experience in data analytics; Master of Electrical and Computer Engineering.

Page 5

We use proprietary hardware, with a built-in microcomputer, to gain access to the data generated on a truck.

• Use small computer with cellular access

• Sniff ECU bus on truck

• Record and sync all data to servers
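The slides do not name the bus protocol. On heavy-duty trucks the ECU network is commonly a CAN bus carrying SAE J1939 messages, so the sketch below assumes J1939: it splits a 29-bit CAN identifier into priority, parameter group number (PGN) and source address, then converts one documented parameter, engine speed (SPN 190 in PGN 61444, EEC1, 0.125 rpm per bit), into rpm. The function names are hypothetical and this is not Preteckt's hardware code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class J1939Id:
    priority: int
    pgn: int
    source_address: int


def parse_j1939_id(can_id: int) -> J1939Id:
    """Split a 29-bit extended CAN identifier into J1939 fields (assumes J1939 traffic)."""
    priority = (can_id >> 26) & 0x7
    edp_dp = (can_id >> 24) & 0x3          # extended data page bit + data page bit
    pdu_format = (can_id >> 16) & 0xFF
    pdu_specific = (can_id >> 8) & 0xFF
    source_address = can_id & 0xFF
    if pdu_format < 240:
        # PDU1: the PDU-specific byte is a destination address, not part of the PGN
        pgn = (edp_dp << 16) | (pdu_format << 8)
    else:
        # PDU2: the PDU-specific byte is a group extension and belongs to the PGN
        pgn = (edp_dp << 16) | (pdu_format << 8) | pdu_specific
    return J1939Id(priority, pgn, source_address)


EEC1_PGN = 61444  # Electronic Engine Controller 1 (0xF004)


def engine_speed_rpm(data: bytes) -> Optional[float]:
    """Hypothetical conversion function: SPN 190 from an EEC1 payload, None if not available."""
    raw = data[3] | (data[4] << 8)   # bytes 4-5 (1-indexed), little-endian
    if raw > 0xFAFF:                  # 0xFB00-0xFFFF are reserved/error/not-available codes
        return None
    return raw * 0.125                 # 0.125 rpm per bit, offset 0


frame = parse_j1939_id(0x0CF00400)   # EEC1 from source address 0x00
if frame.pgn == EEC1_PGN:
    rpm = engine_speed_rpm(bytes([0xFF, 0xFF, 0xFF, 0x80, 0x25, 0xFF, 0xFF, 0xFF]))
```

Rejecting the reserved range at decode time is one way to stop "not available" codes from ever reaching the data store as fake readings.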

Page 6

Data Flow Within Preteckt

Page 7

Finding Useful Sensors

O(10⁴) – All sensors

→ Drop proprietary, undocumented sensors

O(10³) – Documented sensors

→ Drop unavailable sensors; write conversion functions by hand

O(500) – Available on any one truck

→ Drop “bad” sensors (garbage data, constant values)

O(100) – Good sensors

→ Method-specific feature selection

O(50) – Relevant sensors
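The first three filtering stages of the funnel can be sketched for a single truck as below. The `documented` registry, the sensor names, the plausible ranges and the 20% garbage cutoff are all hypothetical; the slides only say that undocumented, unavailable, garbage and constant sensors are dropped and that conversion functions are written by hand.

```python
from typing import Callable, Dict, Tuple

import pandas as pd

# Hypothetical registry entry: (hand-written conversion function, plausible (min, max) range)
SensorSpec = Tuple[Callable[[pd.Series], pd.Series], Tuple[float, float]]


def select_sensors(raw: pd.DataFrame,
                   documented: Dict[str, SensorSpec],
                   max_garbage_frac: float = 0.2) -> pd.DataFrame:
    """Narrow one truck's raw sensor columns down to 'good' sensors, in engineering units."""
    kept = {}
    for name in raw.columns:
        # Drop proprietary/undocumented sensors: no conversion function is known
        if name not in documented:
            continue
        # Drop sensors that are unavailable on this truck (never report a value)
        if not raw[name].notna().any():
            continue
        convert, (lo, hi) = documented[name]
        values = convert(raw[name].dropna().astype(float))
        # Drop "bad" sensors: too many readings outside the plausible range...
        if ((values < lo) | (values > hi)).mean() > max_garbage_frac:
            continue
        # ...or constant readings, which carry no information
        if values.nunique() <= 1:
            continue
        kept[name] = convert(raw[name].astype(float))
    return pd.DataFrame(kept, index=raw.index)


# Toy example (hypothetical sensor names and ranges)
raw = pd.DataFrame({
    "engine_speed": [9600, 9680, 9760, 9840],   # raw counts, 0.125 rpm per count
    "pressure": [300, 300, 300, 300],           # constant
    "vendor_0x1234": [1, 2, 3, 4],              # undocumented
    "oil_temp": [None, None, None, None],       # never reported on this truck
    "speed": [90, 512, 520, 518],               # mostly garbage
})
documented = {
    "engine_speed": (lambda s: s * 0.125, (0.0, 8031.875)),
    "pressure": (lambda s: s, (0.0, 1000.0)),
    "oil_temp": (lambda s: s, (-40.0, 210.0)),
    "speed": (lambda s: s, (0.0, 250.0)),
}
good = select_sensors(raw, documented)
```

Only `engine_speed` survives here. The last stage, method-specific feature selection, depends on the model and is left out.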

Page 8

Data Attributes

Time  Voltage  Engine Speed  Fuel  Speed  Pressure
1     14.1     1200          120   90     300
2     14.0     -             -     92     300
3     14.1     -             -     512    300
4     13.9     1230          -     92     300
5     14.1     -             -     -      300
6     14.1     -             -     520    300
7     14.0     -             -     92     300
8     14.1     -             119   518    300
9     13.9     1260          -     90     300

A dash means the sensor reported no value at that time step.

Irregular

High Frequency

Low Frequency

Bad Readings

Constant Readings

Page 9

Typical Pre-processing

Time  Voltage  Engine Speed  Fuel  Speed   Pressure
1     14.1     1200          120   90      300
2     14.0     1210          120   92      300
3     14.1     1220          120   512→92  300
4     13.9     1230          120   92      300
5     14.1     1235          120   92      300
6     14.1     1240          120   520→92  300
7     14.0     1245          120   92      300
8     14.1     1250          119   518→91  300
9     13.9     1255          119   90      300

• Drop garbage

• Drop zero-variance sensors

• Interpolate, OR forward fill
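A minimal pandas sketch of these pre-processing steps, run on the readings from the Data Attributes slide. The choice of which columns to interpolate versus forward-fill, and the 0 to 200 plausible range used to catch garbage Speed readings, are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def preprocess(df, valid_ranges, interpolate_cols, ffill_cols):
    """Clean a time-indexed frame of sensor readings (NaN = no reading)."""
    out = df.copy()
    # Drop garbage: readings outside a plausible range become missing values
    for col, (lo, hi) in valid_ranges.items():
        out[col] = out[col].where(out[col].between(lo, hi))
    # Drop zero-variance sensors: a column with one distinct value carries no information
    out = out.loc[:, out.nunique() > 1]
    # Fill gaps: interpolate in time for smoothly varying signals...
    interp = [c for c in interpolate_cols if c in out.columns]
    out[interp] = out[interp].interpolate(method="index")
    # ...and forward fill (carry the last reading) for slowly changing ones
    ffill = [c for c in ffill_cols if c in out.columns]
    out[ffill] = out[ffill].ffill()
    return out


nan = np.nan
df = pd.DataFrame(
    {
        "Voltage": [14.1, 14.0, 14.1, 13.9, 14.1, 14.1, 14.0, 14.1, 13.9],
        "Engine Speed": [1200, nan, nan, 1230, nan, nan, nan, nan, 1260],
        "Fuel": [120, nan, nan, nan, nan, nan, nan, 119, nan],
        "Speed": [90, 92, 512, 92, nan, 520, 92, 518, 90],
        "Pressure": [300] * 9,
    },
    index=pd.Index(range(1, 10), name="Time"),
)
clean = preprocess(
    df,
    valid_ranges={"Speed": (0, 200)},        # assumed plausible range
    interpolate_cols=["Engine Speed", "Speed"],
    ffill_cols=["Fuel"],
)
```

On this data the sketch drops Pressure and reproduces the cleaned Speed (92, 92, 91) and Fuel columns shown above; `method="index"` weights the interpolation by the actual timestamps, which matters once sampling is irregular.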

Page 10

Unlabeled Data

Time  Voltage  Engine Speed  Fuel  Speed  LABEL
1     14.1     1200          120   90     0
2     14.0     1210          120   92     0
3     14.1     1220          120   92     0
4     13.9     1230          120   92     0
5     14.1     1235          120   92     0
6     14.1     1240          120   92     0
7     14.0     1245          120   92     0
8     14.1     1250          119   91     1
9     13.9     1255          119   90     1

Truck breaks down here

In practice, the LABEL value is unknown (?) for all nine rows.

Page 11

Unlabeled Data

Labeling breakdowns is currently the biggest bottleneck!

1. Create labels from sensors
   - Sensor a = 1 if part x is not functioning correctly
   - Sensor a > threshold = bad

2. Use unsupervised learning techniques
   - Clustering
   - Anomaly detection

Start With This
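Both labeling routes can be sketched in a few lines. The threshold rule is option 1 as stated on the slide. For anomaly detection the sketch swaps in one simple technique, the median/MAD-based modified z-score with the commonly used 3.5 cutoff; the slides do not say which detector Preteckt used, and both function names are hypothetical.

```python
import numpy as np
import pandas as pd


def threshold_labels(sensor: pd.Series, threshold: float) -> pd.Series:
    """Option 1: label a row 1 ('bad') when sensor a exceeds a threshold."""
    return (sensor > threshold).astype(int)


def modified_zscore_anomalies(x: pd.Series, z_max: float = 3.5) -> pd.Series:
    """Option 2 (one simple anomaly detector): flag readings whose
    median/MAD-based modified z-score exceeds z_max."""
    med = x.median()
    mad = (x - med).abs().median()
    if mad == 0:
        # At least half the readings equal the median: flag anything that differs
        return x != med
    z = 0.6745 * (x - med) / mad
    return z.abs() > z_max


speed = pd.Series([90, 92, 91, 92, 93, 520, 92, 91, 90])
flags = modified_zscore_anomalies(speed)       # only the 520 reading is flagged
labels = threshold_labels(pd.Series([0.1, 0.2, 0.9, 0.95]), threshold=0.8)
```

The median and MAD are used instead of the mean and standard deviation so that the outliers being hunted do not inflate the yardstick used to find them.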

Page 12

Predicting Rates of Change

Goal: Predict the time derivative of sensor X.

Preprocessing:

1. Apply the data cleaning described above

2. Smooth X using a rolling window

3. Take the derivative of X

4. Smooth dX/dt using a rolling window

[Figure: sensor X and dX/dt vs. time (s)]
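Steps 2 to 4 can be sketched as below, assuming a cleaned series indexed by time in seconds. The centered window of 5 samples is an arbitrary placeholder; the slides do not give a window length.

```python
import numpy as np
import pandas as pd


def smoothed_derivative(x: pd.Series, window: int = 5) -> pd.Series:
    """Steps 2-4 for a cleaned series x indexed by time in seconds."""
    # Step 2: centered rolling mean (window counts samples, not seconds)
    x_smooth = x.rolling(window, center=True, min_periods=1).mean()
    # Step 3: dX/dt by central differences, using the actual (possibly uneven) timestamps
    t = x.index.to_numpy(dtype=float)
    dxdt = pd.Series(np.gradient(x_smooth.to_numpy(dtype=float), t), index=x.index)
    # Step 4: smooth the derivative with the same window
    return dxdt.rolling(window, center=True, min_periods=1).mean()


t = np.arange(20, dtype=float)
x = pd.Series(2.0 * t + 5.0, index=t)    # a straight line with slope 2
d = smoothed_derivative(x)                # about 2.0 away from the edges
```

Differentiating amplifies high-frequency noise, which is why the signal is smoothed both before and after the derivative. Near the ends the truncated windows bias the estimate, so edge values should be treated with caution.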

Page 13

Predicting Rates of Change

Method                    R score
Ordinary Least Squares    ~0.05
Lasso, Ridge, LARS        ~0.02–0.15
Partial Least Squares     ~0.2

Avoid Predicting Continuous Variables!

Page 14

Predicting Events

Label “events” as points where sensor Y = 1.

1. Pre-process data (scaling, etc.)

2. Create N label columns representing “Event occurs within the next x hours = True”

3. Choose N lead times (we used 3, 6, 12, 24, 48, and 72 hours)

4. Do feature selection to reduce the number of sensors (PCA, mRMR)

5. Run classifiers to predict each lead-time label (good results with logistic regression and SVM)
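Steps 2 and 3 can be sketched as follows: for every timestamp t and lead time h, the label is True when Y = 1 somewhere in (t, t + h]. The lead times are the ones listed above; the function name, the column naming scheme and the requirement of a sorted, timezone-naive DatetimeIndex are assumptions of this sketch.

```python
import numpy as np
import pandas as pd

LEAD_TIMES_H = [3, 6, 12, 24, 48, 72]  # lead times listed on the slides


def make_lead_time_labels(y: pd.Series, lead_times_h=LEAD_TIMES_H) -> pd.DataFrame:
    """One boolean column per lead time h: True if y == 1 at some time in (t, t + h]."""
    times = y.index.to_numpy()                        # assumes a sorted, tz-naive DatetimeIndex
    event_times = y.index[y.to_numpy() == 1].to_numpy()
    # Number of events at or before each t
    n_up_to_t = np.searchsorted(event_times, times, side="right")
    labels = {}
    for h in lead_times_h:
        # Number of events at or before t + h; a larger count means an event in (t, t + h]
        horizon = times + np.timedelta64(int(h), "h")
        n_up_to_horizon = np.searchsorted(event_times, horizon, side="right")
        labels[f"event_within_{h}h"] = n_up_to_horizon > n_up_to_t
    return pd.DataFrame(labels, index=y.index)


idx = pd.date_range("2016-09-01", periods=100, freq="60min")
y = pd.Series(0, index=idx)
y.iloc[50] = 1                          # a single event, 50 hours in
labels = make_lead_time_labels(y, [3, 24])
```

The window is open at t, so the row where the event itself happens is labeled False. The classifier only ever learns to anticipate events, not to detect one already in progress.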

Page 15

Predicting Events

Lead time (hours)  F1, R (roughly the same for all)
3                  0.96
6                  0.95
12                 0.81
24                 0.70
48                 0.70
72                 0.75

Note: Frequency of Y = 1 is very roughly once every 48-72 hours.
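Steps 4 and 5 with an F1 score per lead time can be sketched as a scikit-learn pipeline: scaling, then PCA, then logistic regression, trained on the earlier part of the record and scored on the later part. The 70/30 chronological split, 10 PCA components and balanced class weights are assumptions, and the data below is synthetic, so the scores it produces say nothing about the table above.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def f1_by_lead_time(X: pd.DataFrame, labels: pd.DataFrame,
                    train_frac: float = 0.7, n_components: int = 10):
    """Fit scaler -> PCA -> logistic regression per label column; return held-out F1."""
    split = int(len(X) * train_frac)  # chronological split: train on the past, test on the future
    scores = {}
    for col in labels.columns:
        y = labels[col].astype(int)
        model = make_pipeline(
            StandardScaler(),
            PCA(n_components=min(n_components, X.shape[1])),
            # class_weight="balanced" compensates for events being rare
            LogisticRegression(max_iter=1000, class_weight="balanced"),
        )
        model.fit(X.iloc[:split], y.iloc[:split])
        scores[col] = f1_score(y.iloc[split:], model.predict(X.iloc[split:]))
    return scores


# Synthetic stand-in: 12 "sensors", labels driven by two of them
rng = np.random.default_rng(1)
X = pd.DataFrame(rng.normal(size=(500, 12)), columns=[f"sensor_{i}" for i in range(12)])
signal = X["sensor_0"] + 0.5 * X["sensor_1"]
labels = pd.DataFrame({"event_within_3h": signal > 1.0, "event_within_24h": signal > 0.0})
scores = f1_by_lead_time(X, labels)
```

Replacing `LogisticRegression` with `sklearn.svm.SVC` covers the SVM variant, and calling `predict_proba(...)[:, 1]` on a fitted logistic pipeline yields a probability that an event occurs within the lead time.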

Page 16

Probability of y = 1 in the next 24 hours

[Figure: target and predicted P(y = 1 within 24 hr) vs. time (s); annotation: “Truck shuts down, y = 1 here”]

Note: the model was trained only on data where y ≠ 1.

Page 17

Future Plans

• Identify other sensors to repeat the above process

• Once we have enough breakdowns, apply the above procedure to breakdowns

• Recurrent neural networks

• With a large number of labels, we can do survival analysis
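Survival analysis treats time-to-breakdown as the outcome and counts trucks that have not broken down yet as censored observations rather than discarding them. A minimal starting point is the Kaplan-Meier estimator, S(t) = ∏ over event times tᵢ ≤ t of (1 − dᵢ / nᵢ), where dᵢ is the number of breakdowns at tᵢ and nᵢ the number of trucks still at risk. The sketch below computes it with NumPy; the durations and outcomes are hypothetical.

```python
import numpy as np


def kaplan_meier(durations, observed):
    """Kaplan-Meier estimate of S(t) = P(no breakdown by time t).

    durations: how long each truck was followed (e.g. hours)
    observed:  1 if the truck broke down at that time, 0 if censored (still running)
    """
    durations = np.asarray(durations, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    event_times = np.unique(durations[observed])      # distinct breakdown times, sorted
    survival = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(durations >= t)               # trucks still followed just before t
        breakdowns = np.sum((durations == t) & observed)
        s *= 1.0 - breakdowns / at_risk
        survival.append(s)
    return event_times, np.array(survival)


# Hypothetical example: five trucks, one still running at t = 6
times, surv = kaplan_meier([5, 6, 6, 2, 4], [1, 0, 1, 1, 1])
```

A truck censored at the same time as a breakdown is still counted as at risk at that time, which is the usual convention.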