Post on 10-Feb-2017
ML Little Data
Vincent Tang, Lead ML Engineer
SAMSUNG ACCELERATOR
EMBEDDED ML
BIG DATA, BIG COMPUTE
STANDARD DATA PIPELINE + LEARNING
[Diagram: many data sources (D) in the wild funneling into centralized compute nodes (X)]
DEVICES IN THE WILD
Move ML Compute to the Data Edge
MOVE ML TO THE EDGE
[Diagram: compute distributed across the device nodes (D) at the edge, with no central hub]
COMPARISON

|            | Traditional                  | Embedded                          |
|------------|------------------------------|-----------------------------------|
| Resources  | MOAR GPUs                    | Each thread counts; small buffers |
| Power      | 60–130 watts / server        | 0.18 mW for 32 bytes/second       |
| Updates    | Commit + push                | OTA (sometimes)                   |
| Languages  | Python & R FTW!              | C, C++, Java                      |
| Parameters | Stationarity                 | Non-stationarity                  |
| Cycle      | Batch                        | Online, up to 1600 Hz             |
| Type       | Supervised                   | Unsupervised                      |
| Variance   | "Napoleon Dynamite" problem  | Unreliable sensors                |
| Metric     | arg max (accuracy)           | arg max (accuracy / big-O)        |
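The "Online, up to 1600 Hz" and "small buffers" rows imply per-sample updates in constant memory. A minimal sketch of what that looks like in C (illustrative, not code from the talk): an online SGD update for a one-feature linear model, holding only three floats of state and doing O(1) work per incoming sensor sample.

```c
#include <stddef.h>

/* Sketch: online linear model y = w*x + b, updated one sample at a time.
 * No sample buffer is kept -- state is three floats, so the update can run
 * inside a sensor loop at rates like 1600 Hz on a small device. */
typedef struct {
    float w;   /* weight */
    float b;   /* bias */
    float lr;  /* learning rate */
} online_lm;

/* One SGD step per incoming (x, y) sample: O(1) time, O(1) memory. */
static void online_lm_step(online_lm *m, float x, float y)
{
    float err = (m->w * x + m->b) - y;  /* prediction error */
    m->w -= m->lr * err * x;
    m->b -= m->lr * err;
}

static float online_lm_predict(const online_lm *m, float x)
{
    return m->w * x + m->b;
}
```

Because each step only folds in the latest sample, the model also tracks slow drift in the data, which is one way to cope with the non-stationarity row above.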
PIPELINE
Acquisition (20%) Feature Engineering (60%) Learning (10%) Deploy (10%)
DEEP NETS
Acquisition (20%) Feature Engineering (60%) Learning (10%) Deploy (10%)
Feature Engineering & Learning for the price of one!
Tighter Feedback & Cleaner Code!
● More data > smarter algorithm
● Start with simple learners, then increase complexity as needed
● Cast a wide net, then prune
● Reject hypotheses early and often
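One hedged illustration of "start with simple learners" (names and data are made up, not from the talk): a two-class nearest-centroid classifier. It fits in one pass, its state is four floats, and it gives you a baseline to beat before reaching for anything more complex.

```c
#include <stddef.h>

/* Sketch: nearest-centroid classifier for two classes on one feature.
 * Training is a single pass that accumulates per-class sums and counts;
 * prediction picks the class whose mean is closest to the input. */
typedef struct {
    float sum[2];
    int   count[2];
} centroid_clf;

static void centroid_fit(centroid_clf *c, const float *x, const int *y, int n)
{
    c->sum[0] = c->sum[1] = 0.0f;
    c->count[0] = c->count[1] = 0;
    for (int i = 0; i < n; ++i) {
        c->sum[y[i]] += x[i];  /* accumulate per-class totals */
        c->count[y[i]]++;
    }
}

static int centroid_predict(const centroid_clf *c, float x)
{
    float m0 = c->sum[0] / (float)c->count[0];  /* class-0 mean */
    float m1 = c->sum[1] / (float)c->count[1];  /* class-1 mean */
    float d0 = x - m0, d1 = x - m1;
    return (d1 * d1 < d0 * d0) ? 1 : 0;         /* nearer mean wins */
}
```

If this baseline already meets the accuracy / big-O budget from the comparison table, there is no need to pay for anything deeper.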
ADVICE FOR PRACTITIONERS
SAMSUNG
CASE STUDY: UNCLIP
Thank you!