Real-Time Machine Learning at Industrial scale (University of Oxford, 9th Oct 2012)
![Page 1: Real-Time Machine Learning at Industrial scale (University of Oxford, 9th Oct 2012)](https://reader035.fdocuments.in/reader035/viewer/2022070322/5591af961a28ab21518b4616/html5/thumbnails/1.jpg)
Real-Time Machine Learning at Industrial scale
9th October 2012
Michael Cutler @cotdp
tumra.com @tumra
TUMRA LTD, Building 3, Chiswick Park, 566 Chiswick High Road, W4 5YA
... the battle of accuracy vs latency
![Page 2: Real-Time Machine Learning at Industrial scale (University of Oxford, 9th Oct 2012)](https://reader035.fdocuments.in/reader035/viewer/2022070322/5591af961a28ab21518b4616/html5/thumbnails/2.jpg)
$ whoami
Michael Cutler (@cotdp)
● Previously at British Sky Broadcasting
  ○ Last 7 years in R&D
  ○ Created several patented systems & algorithms
  ○ Kicked off 'Big Data' initiative at Sky in 2008
● Co-founder CTO @ TUMRA in March '12
  ○ Real-time big data science platform
  ○ Alpha-testing with selected clients
![Page 3: Real-Time Machine Learning at Industrial scale (University of Oxford, 9th Oct 2012)](https://reader035.fdocuments.in/reader035/viewer/2022070322/5591af961a28ab21518b4616/html5/thumbnails/3.jpg)
Agenda
● Background
● Real-Time vs Batch processing
● Accuracy vs Latency
● Use Cases
  ○ eCommerce
  ○ Financial Services
  ○ Media
● Questions
![Page 4: Real-Time Machine Learning at Industrial scale (University of Oxford, 9th Oct 2012)](https://reader035.fdocuments.in/reader035/viewer/2022070322/5591af961a28ab21518b4616/html5/thumbnails/4.jpg)
Background
Big Data is "in vogue", but what does it mean?
● Distributed processing
● Massively scalable
● Commodity hardware
Apache Hadoop is the "kernel" of the Big Data OS:
● Distributed Filesystem (HDFS)
● Parallel Processing (Map/Reduce, YARN)
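As a rough illustration of the Map/Reduce model named above, the sketch below runs the three phases (map, shuffle, reduce) in a single Python process. It is not Hadoop code; the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are made up for this example, and a real cluster would spread each phase across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


counts = word_count(["big data is big", "data science"])
```

Because each call to `map_phase` and `reduce_phase` only sees its own line or key, the work can be partitioned freely, which is what makes the model scale on commodity machines.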
![Page 5: Real-Time Machine Learning at Industrial scale (University of Oxford, 9th Oct 2012)](https://reader035.fdocuments.in/reader035/viewer/2022070322/5591af961a28ab21518b4616/html5/thumbnails/5.jpg)
Background (cont'd)
Solving problems with Big Data is hard:
● Tools are all low-level (Pig, Hive etc.)
● Skills are hard to find
What is "Data Science":
● Understanding data & solving problems
● Applies the following skills:
  ○ Statistical Analysis
  ○ Machine Learning
  ○ Communicating Results
![Page 6: Real-Time Machine Learning at Industrial scale (University of Oxford, 9th Oct 2012)](https://reader035.fdocuments.in/reader035/viewer/2022070322/5591af961a28ab21518b4616/html5/thumbnails/6.jpg)
Real-Time vs Batch processing
![Page 7: Real-Time Machine Learning at Industrial scale (University of Oxford, 9th Oct 2012)](https://reader035.fdocuments.in/reader035/viewer/2022070322/5591af961a28ab21518b4616/html5/thumbnails/7.jpg)
Credit: http://bit.ly/Q71u4W
Batch - Hoppers, Bins, Buckets
![Page 8: Real-Time Machine Learning at Industrial scale (University of Oxford, 9th Oct 2012)](https://reader035.fdocuments.in/reader035/viewer/2022070322/5591af961a28ab21518b4616/html5/thumbnails/8.jpg)
Real-Time - Flows & Streams
Credit: http://bit.ly/NOslqf
![Page 9: Real-Time Machine Learning at Industrial scale (University of Oxford, 9th Oct 2012)](https://reader035.fdocuments.in/reader035/viewer/2022070322/5591af961a28ab21518b4616/html5/thumbnails/9.jpg)
Real-Time vs Batch processing
Similarities to the Industrial Revolution:
● From handicraft to Batch & Real-Time
● Complexity increases
Need for "Real-Time":
● Wherever the variation can change faster than you can retrain models
● When you can't pre-compute everything ahead of time
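One way to picture the difference: a batch job recomputes a statistic from the full history, while a real-time component folds each new observation into a running value. The sketch below uses Welford's online algorithm for mean and variance as an example of the second style. It is an illustration chosen for this transcript, not something shown in the talk, and the class name `RunningStats` is hypothetical.

```python
import math


class RunningStats:
    """Mean and variance updated one observation at a time (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold a single new observation into the running statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance; defined as 0.0 until two observations have arrived."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0

    @property
    def stddev(self):
        return math.sqrt(self.variance)


stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.update(value)
```

Each update costs constant time and memory, so the statistic stays current however quickly the underlying data shifts, with no retraining pass over the history.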
![Page 10: Real-Time Machine Learning at Industrial scale (University of Oxford, 9th Oct 2012)](https://reader035.fdocuments.in/reader035/viewer/2022070322/5591af961a28ab21518b4616/html5/thumbnails/10.jpg)
Accuracy vs Latency
![Page 11: Real-Time Machine Learning at Industrial scale (University of Oxford, 9th Oct 2012)](https://reader035.fdocuments.in/reader035/viewer/2022070322/5591af961a28ab21518b4616/html5/thumbnails/11.jpg)
Accuracy vs Latency
Netflix Prize winning entry :-
● Ensemble of 100s of models
● Massively compute-intensive solution
● Marginally better than much simpler models
IBM won the KDD Cup 2009 (Orange) :-
● IBM Watson team won by sheer brute force
● Used a "one of everything" approach, generating hundreds of models
![Page 12: Real-Time Machine Learning at Industrial scale (University of Oxford, 9th Oct 2012)](https://reader035.fdocuments.in/reader035/viewer/2022070322/5591af961a28ab21518b4616/html5/thumbnails/12.jpg)
Accuracy vs Latency (cont'd)
Mathematical navel-gazing:
● Often the factor we're optimising for isn't the thing we measure improvement in:
  ○ User ratings vs. customer longevity/value
  ○ Overfitting outliers vs. missing clear fraud
Given the choice between a "best guess" now, and a "marginally better" answer later, I'd take the "best guess" every time.
![Page 13: Real-Time Machine Learning at Industrial scale (University of Oxford, 9th Oct 2012)](https://reader035.fdocuments.in/reader035/viewer/2022070322/5591af961a28ab21518b4616/html5/thumbnails/13.jpg)
However, that doesn't mean...
![Page 14: Real-Time Machine Learning at Industrial scale (University of Oxford, 9th Oct 2012)](https://reader035.fdocuments.in/reader035/viewer/2022070322/5591af961a28ab21518b4616/html5/thumbnails/14.jpg)
Accuracy vs Latency (cont'd)
It's a trade-off:
● Sometimes "best guess" is good enough,
● Other times we can wait for the accuracy,
● And of course, occasionally we want both!
Key objective:
● Most appropriate solution for the use-case
● Hybrid solutions: part batch, part real-time
![Page 15: Real-Time Machine Learning at Industrial scale (University of Oxford, 9th Oct 2012)](https://reader035.fdocuments.in/reader035/viewer/2022070322/5591af961a28ab21518b4616/html5/thumbnails/15.jpg)
Use Case
eCommerce
![Page 16: Real-Time Machine Learning at Industrial scale (University of Oxford, 9th Oct 2012)](https://reader035.fdocuments.in/reader035/viewer/2022070322/5591af961a28ab21518b4616/html5/thumbnails/16.jpg)
Use Case - eCommerce
Objective - Increase profits
How:
● Match potential customers to the right products
● Personalise user experience on web & email
● Customer lifecycle management
Method:
● Ensemble of real-time models
● Collect lots of implicit feedback data
![Page 17: Real-Time Machine Learning at Industrial scale (University of Oxford, 9th Oct 2012)](https://reader035.fdocuments.in/reader035/viewer/2022070322/5591af961a28ab21518b4616/html5/thumbnails/17.jpg)
Use Case - eCommerce (cont'd)
Detail:
● Clustering - behaviour, demographics
● Simple predictors - keywords to products
● Bayesian Bandit - blend the output
Requirements:
● Predictions in < 50 ms
● Online learning models
● Occasional batch updates are OK
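The slides do not say how the Bayesian Bandit was built. A common reading is Thompson sampling over Beta posteriors: each predictor is an "arm", every request is served by the arm whose sampled click rate is highest, and the click (or lack of one) updates that arm online. The sketch below follows that reading; the arm names, the click rates and the class `BayesianBandit` are hypothetical.

```python
import random


class BayesianBandit:
    """Thompson sampling with a Beta posterior per arm."""

    def __init__(self, arms, seed=None):
        self.rng = random.Random(seed)
        # Beta(1, 1) is a uniform prior; stored as [alpha, beta] per arm.
        self.params = {arm: [1, 1] for arm in arms}

    def choose(self):
        """Sample a plausible click rate for each arm and pick the best sample."""
        samples = {
            arm: self.rng.betavariate(alpha, beta)
            for arm, (alpha, beta) in self.params.items()
        }
        return max(samples, key=samples.get)

    def update(self, arm, clicked):
        """Online update: a click raises alpha, no click raises beta."""
        self.params[arm][0 if clicked else 1] += 1


# Hypothetical click-through rates for two predictors, used only to simulate feedback.
TRUE_CLICK_RATE = {"clustering": 0.05, "keywords": 0.30}

bandit = BayesianBandit(TRUE_CLICK_RATE, seed=42)
feedback = random.Random(7)
pulls = {arm: 0 for arm in TRUE_CLICK_RATE}
for _ in range(2000):
    arm = bandit.choose()
    pulls[arm] += 1
    bandit.update(arm, feedback.random() < TRUE_CLICK_RATE[arm])
```

Choosing an arm is a handful of random draws, which fits comfortably inside a 50 ms budget, and the per-click update means the blend keeps learning without a batch retrain.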
![Page 18: Real-Time Machine Learning at Industrial scale (University of Oxford, 9th Oct 2012)](https://reader035.fdocuments.in/reader035/viewer/2022070322/5591af961a28ab21518b4616/html5/thumbnails/18.jpg)
![Page 19: Real-Time Machine Learning at Industrial scale (University of Oxford, 9th Oct 2012)](https://reader035.fdocuments.in/reader035/viewer/2022070322/5591af961a28ab21518b4616/html5/thumbnails/19.jpg)
When eCommerce #FAILs
![Page 20: Real-Time Machine Learning at Industrial scale (University of Oxford, 9th Oct 2012)](https://reader035.fdocuments.in/reader035/viewer/2022070322/5591af961a28ab21518b4616/html5/thumbnails/20.jpg)
I've only ever bought Cat food...
![Page 21: Real-Time Machine Learning at Industrial scale (University of Oxford, 9th Oct 2012)](https://reader035.fdocuments.in/reader035/viewer/2022070322/5591af961a28ab21518b4616/html5/thumbnails/21.jpg)
... wait there's more, no Cat food
![Page 22: Real-Time Machine Learning at Industrial scale (University of Oxford, 9th Oct 2012)](https://reader035.fdocuments.in/reader035/viewer/2022070322/5591af961a28ab21518b4616/html5/thumbnails/22.jpg)
Even Amazon can #FAIL
![Page 23: Real-Time Machine Learning at Industrial scale (University of Oxford, 9th Oct 2012)](https://reader035.fdocuments.in/reader035/viewer/2022070322/5591af961a28ab21518b4616/html5/thumbnails/23.jpg)
Use Case
Financial Services
![Page 24: Real-Time Machine Learning at Industrial scale (University of Oxford, 9th Oct 2012)](https://reader035.fdocuments.in/reader035/viewer/2022070322/5591af961a28ab21518b4616/html5/thumbnails/24.jpg)
Use Case - Financial Services
Objective - Reduce Fraud
How:
● Compute patterns/predictors for individuals
● Cluster individuals and recompute for clusters
● Compute baselines across all data
Method:
● Hybrid and Hierarchical Clustering models
● Simple predictors for individuals, clusters & baseline
![Page 25: Real-Time Machine Learning at Industrial scale (University of Oxford, 9th Oct 2012)](https://reader035.fdocuments.in/reader035/viewer/2022070322/5591af961a28ab21518b4616/html5/thumbnails/25.jpg)
Use Case - Financial Services
Detail:
● CHEAT!!! ... Cluster to nearest centroid
  ○ will degrade over time (Hunchback Clusters)
● Use simple metrics to alert (stddev)
Requirements:
● Ability to alert/intervene near real-time < 1 second
● Adapt to rapid changes (within baseline & clusters)
● Periodic batch processing to recompute clusters
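The "cheat" can be sketched as follows: instead of re-clustering on every transaction, assign the individual to the nearest centroid from the last batch run, then flag a transaction when it sits too many standard deviations from that cluster's mean. The feature layout, centroid values, cluster statistics and the 3-sigma threshold below are all hypothetical, chosen only to make the example runnable.

```python
import math


def nearest_centroid(point, centroids):
    """Return the label of the centroid closest to the point (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))


def is_anomalous(amount, mean, stddev, threshold=3.0):
    """Flag an amount lying more than `threshold` standard deviations from the mean."""
    if stddev == 0:
        return amount != mean
    return abs(amount - mean) / stddev > threshold


# Centroids from the last batch clustering run: (average spend, transactions per week).
CENTROIDS = {"low_spender": (20.0, 2.0), "high_spender": (500.0, 10.0)}
# Per-cluster (mean, stddev) of transaction amounts, also produced by the batch job.
CLUSTER_STATS = {"low_spender": (20.0, 5.0), "high_spender": (500.0, 120.0)}

customer = (25.0, 3.0)
cluster = nearest_centroid(customer, CENTROIDS)
mean, stddev = CLUSTER_STATS[cluster]
alert_large = is_anomalous(400.0, mean, stddev)
alert_normal = is_anomalous(28.0, mean, stddev)
```

Both steps are simple arithmetic, so the sub-second alerting requirement is easy to meet. The cost is that stale centroids drift away from the data, which is why the periodic batch recompute is still needed.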
![Page 26: Real-Time Machine Learning at Industrial scale (University of Oxford, 9th Oct 2012)](https://reader035.fdocuments.in/reader035/viewer/2022070322/5591af961a28ab21518b4616/html5/thumbnails/26.jpg)
Use Case - Financial Services
![Page 27: Real-Time Machine Learning at Industrial scale (University of Oxford, 9th Oct 2012)](https://reader035.fdocuments.in/reader035/viewer/2022070322/5591af961a28ab21518b4616/html5/thumbnails/27.jpg)
Use Case
Media
![Page 28: Real-Time Machine Learning at Industrial scale (University of Oxford, 9th Oct 2012)](https://reader035.fdocuments.in/reader035/viewer/2022070322/5591af961a28ab21518b4616/html5/thumbnails/28.jpg)
Use Case - Media
Objective - Generating Metadata
Why:
● Drive second screen applications
● Create new streams of information for resale
How:
● Video / Audio analysis
● Closed caption or subtitle text processing
● Knowledgebase :- People, Places, Products & Things
![Page 29: Real-Time Machine Learning at Industrial scale (University of Oxford, 9th Oct 2012)](https://reader035.fdocuments.in/reader035/viewer/2022070322/5591af961a28ab21518b4616/html5/thumbnails/29.jpg)
Use Case - Media (cont'd)
Method:
● Natural Language Processing
  ○ Named Entity Recognition
  ○ Topic Extraction & Disambiguation
● Graph databases & algorithms
Requirements:
● Responses in < 1 second
● Ability to learn new 'Things'
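The talk does not describe the entity recogniser itself. As a deliberately simple stand-in, the sketch below tags entities in subtitle text by longest-match lookup against a small in-memory knowledgebase. The entries, the name `KNOWLEDGEBASE` and the function `tag_entities` are hypothetical, and a production system would need proper tokenisation and disambiguation.

```python
# Hypothetical knowledgebase: lower-cased token tuples mapped to an entity type.
KNOWLEDGEBASE = {
    ("david", "cameron"): "Person",
    ("oxford",): "Place",
    ("university", "of", "oxford"): "Place",
}


def tag_entities(text, kb=KNOWLEDGEBASE):
    """Return (surface form, type) pairs, preferring the longest match at each position."""
    tokens = text.lower().split()
    max_len = max(len(key) for key in kb)
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in kb:
                found.append((" ".join(key), kb[key]))
                i += n
                break
        else:
            i += 1  # no entity starts at this token
    return found


entities = tag_entities("David Cameron visited the University of Oxford")

# Learning a new 'Thing' is just adding an entry at runtime.
KNOWLEDGEBASE[("chiswick", "park")] = "Place"
```

Lookups are dictionary hits, so response time stays well under a second, and new entities become recognisable as soon as they are added.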
Example of 12,000 entities from our Knowledgebase...
![Page 30: Real-Time Machine Learning at Industrial scale (University of Oxford, 9th Oct 2012)](https://reader035.fdocuments.in/reader035/viewer/2022070322/5591af961a28ab21518b4616/html5/thumbnails/30.jpg)
![Page 31: Real-Time Machine Learning at Industrial scale (University of Oxford, 9th Oct 2012)](https://reader035.fdocuments.in/reader035/viewer/2022070322/5591af961a28ab21518b4616/html5/thumbnails/31.jpg)
![Page 32: Real-Time Machine Learning at Industrial scale (University of Oxford, 9th Oct 2012)](https://reader035.fdocuments.in/reader035/viewer/2022070322/5591af961a28ab21518b4616/html5/thumbnails/32.jpg)
![Page 33: Real-Time Machine Learning at Industrial scale (University of Oxford, 9th Oct 2012)](https://reader035.fdocuments.in/reader035/viewer/2022070322/5591af961a28ab21518b4616/html5/thumbnails/33.jpg)
Summary
![Page 34: Real-Time Machine Learning at Industrial scale (University of Oxford, 9th Oct 2012)](https://reader035.fdocuments.in/reader035/viewer/2022070322/5591af961a28ab21518b4616/html5/thumbnails/34.jpg)
Summary
Key points:
● Clear move towards distributed algorithms
● Low latency is often more valuable than extra accuracy
● Trade-offs are dependent on the use-cases
Further reading:
● Apache Mahout - http://mahout.apache.org/
● Storm Project - http://storm-project.net/
● Data Science London - http://datasciencelondon.org/
● Machine Learning Meetup - http://bit.ly/w8V8f6
![Page 35: Real-Time Machine Learning at Industrial scale (University of Oxford, 9th Oct 2012)](https://reader035.fdocuments.in/reader035/viewer/2022070322/5591af961a28ab21518b4616/html5/thumbnails/35.jpg)
Almost finished!
![Page 36: Real-Time Machine Learning at Industrial scale (University of Oxford, 9th Oct 2012)](https://reader035.fdocuments.in/reader035/viewer/2022070322/5591af961a28ab21518b4616/html5/thumbnails/36.jpg)
Introducing TUMRA Labs
API access to some of our real-time models:
● Probabilistic Demographics
Coming Soon:
● Language detection
● Sentiment analysis
● Metadata Generation
Free to sign up and easy to get started!
http://labs.tumra.com/
![Page 37: Real-Time Machine Learning at Industrial scale (University of Oxford, 9th Oct 2012)](https://reader035.fdocuments.in/reader035/viewer/2022070322/5591af961a28ab21518b4616/html5/thumbnails/37.jpg)
Questions?
Work: tumra.com @tumra
Personal: cotdp.com @cotdp