An Architecture for Agile Machine Learning in Real-Time Applications
Johann Schleier-Smith, if(we) Inc.
[email protected] | @jssmith | github.com/ifwe
August 11, 2015, KDD, Sydney, Australia
• Profitable startup actively pursuing big opportunities in social apps
• Millions of users on existing products
• Thousands of social contacts per second
Overview
• Agile machine learning can be difficult, but it brings big benefits
• Key challenges lie in deployment and feature engineering
• Solution: a single path to data
[Diagram, built up over several slides: a cycle spanning production (data collection, serve personalized recommendations) and development (study & understand, design new models & features, train & backtest), with model updates linking the two.]
[Diagram: the steps that pile up between train & backtest and model updates: write spec, e-mail model to engineers, request engineering, "why did we want this?", QA, bug fixes, meetings, wait, export to Excel, check parameters, Java development, new database schema.]
[Diagram: train & backtest and model updates]
• Shared path to data
• Shared feature definition code
Recommendation Engine for Dating Product
• >10 million candidates
• >1000 updates/sec
• Must be responsive to current activity
• Users expect instant query results
Model
• Decompose likelihood of match between vote outcomes and vote occurrence
• Logistic regression
• Real-time personalization through feature vector evolution
• Model parameters trained offline by data scientists
• Consider 1000s of features, select 50-100
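To make the model bullets concrete, here is a minimal Scala sketch of logistic-regression scoring over a feature vector. The feature names, weights, and the `score` helper are hypothetical and chosen only for illustration; the talk does not give the actual model. The point is that the weights stay fixed between offline training runs while the feature values change as events arrive, so the score changes with them.

```scala
// Hypothetical sketch: logistic regression scoring for one candidate.
// Weights are assumed to be trained offline; feature values evolve in real time.
object LogisticScoring {
  // Standard logistic (sigmoid) function.
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Score = sigmoid(bias + sum of weight * feature value).
  // Features missing from the vector contribute zero.
  def score(bias: Double,
            weights: Map[String, Double],
            features: Map[String, Double]): Double = {
    val z = bias + weights.iterator.map { case (name, w) =>
      w * features.getOrElse(name, 0.0)
    }.sum
    sigmoid(z)
  }

  def main(args: Array[String]): Unit = {
    // Illustrative weights and feature names, not taken from the talk.
    val weights = Map("recentYesVotes" -> 0.8, "profileViews" -> 0.1)
    val before = score(-1.0, weights, Map("recentYesVotes" -> 0.0, "profileViews" -> 2.0))
    val after = score(-1.0, weights, Map("recentYesVotes" -> 3.0, "profileViews" -> 2.0))
    println(s"score before new votes: $before, after: $after")
  }
}
```

With positive weights, the second score is higher than the first even though no retraining took place; only the feature vector moved.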
[Diagram, built up over several slides: Application APIs & Business Logic with an RDBMS; then a Data Warehouse / Hadoop and Streaming Logs; then a production/development split with Exploratory Analysis and Training & Backtesting; then Batch Predictions and Predictive Services / Ranking.]
[Diagram, built up over several slides: events along a time axis feed aggregations (first(), last(), count(), sum(), max(), avg(), min()), which produce the machine learning input.]
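The aggregation functions named on the slide can be sketched in Scala as follows. The `Event` shape (a timestamp plus a numeric value), the `Summary` record, and the half-open time window are assumptions made for this illustration and do not come from the talk.

```scala
// Hypothetical sketch: aggregations over time-ordered events.
object Aggregations {
  // Illustrative event shape: a timestamp and a numeric value.
  final case class Event(time: Long, value: Double)

  // One value per aggregation named on the slide.
  final case class Summary(first: Double, last: Double, count: Int,
                           sum: Double, max: Double, min: Double, avg: Double)

  // Aggregate the events with start <= time < end, taken in time order.
  // Returns None when the window holds no events.
  def summarize(events: Seq[Event], start: Long, end: Long): Option[Summary] = {
    val window = events.filter(e => e.time >= start && e.time < end).sortBy(_.time)
    if (window.isEmpty) None
    else {
      val values = window.map(_.value)
      Some(Summary(values.head, values.last, values.size,
        values.sum, values.max, values.min, values.sum / values.size))
    }
  }
}
```

Each field of `Summary` could serve as one feature in the machine learning input.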
Event History API
    trait EventHistory {
      def publishEvent(e: Event): Unit

      def getEvents(
        startTime: Date,
        endTime: Date,
        eventFilter: EventFilter,
        eventHandler: EventHandler
      ): Unit
    }
+∞ for real-time streaming
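As a rough illustration of how this trait might be implemented, the Scala sketch below keeps events in memory and replays them in time order. The slide does not define `Event`, `EventFilter`, or `EventHandler`, so the definitions here are hypothetical stand-ins, and the class covers historical replay only; real-time streaming is left out.

```scala
import java.util.Date
import scala.collection.mutable.ArrayBuffer

object EventHistorySketch {
  // Hypothetical stand-ins for types the slide leaves undefined.
  final case class Event(time: Date, kind: String, user: String)
  type EventFilter = Event => Boolean
  type EventHandler = Event => Unit

  trait EventHistory {
    def publishEvent(e: Event): Unit
    def getEvents(startTime: Date, endTime: Date,
                  eventFilter: EventFilter, eventHandler: EventHandler): Unit
  }

  // In-memory implementation covering historical replay only.
  class InMemoryEventHistory extends EventHistory {
    private val log = ArrayBuffer.empty[Event]

    def publishEvent(e: Event): Unit = { log += e; () }

    // Deliver matching events with startTime <= time < endTime, in time order.
    def getEvents(startTime: Date, endTime: Date,
                  eventFilter: EventFilter, eventHandler: EventHandler): Unit =
      log.toSeq
        .filter(e => !e.time.before(startTime) && e.time.before(endTime))
        .filter(eventFilter)
        .sortBy(_.time.getTime)
        .foreach(eventHandler)
  }
}
```

Because callers receive events only through the handler, the same feature code could in principle consume a finite historical range or an open-ended stream.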
[Diagram, built up over several slides: events along a time axis update online feature state, which supplies the machine learning input.]
Events, in time order:
• Alice updates profile
• Bob opens app
• Bob sees Alice in recommendations
• Bob swipes yes on Alice
• Alice receives push notification
• Alice sees Bob in recommendations
• Alice sends message to Bob
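The idea of online feature state can be sketched in Scala as a fold over the event timeline. The event types and the three per-user features below are hypothetical, picked to mirror a few of the Alice and Bob events; the talk does not specify the real feature set.

```scala
// Hypothetical sketch: per-user feature state updated one event at a time.
object OnlineFeatureState {
  // Illustrative event types mirroring part of the timeline.
  sealed trait Event { def time: Long }
  final case class OpensApp(time: Long, user: String) extends Event
  final case class SwipesYes(time: Long, from: String, to: String) extends Event
  final case class SendsMessage(time: Long, from: String, to: String) extends Event

  // Illustrative features kept current as events arrive.
  final case class UserState(lastActive: Long = 0L, yesReceived: Int = 0, messagesSent: Int = 0)

  type State = Map[String, UserState]

  private def get(state: State, user: String): UserState =
    state.getOrElse(user, UserState())

  // Apply a single event to the state and return the new state.
  def update(state: State, event: Event): State = event match {
    case OpensApp(t, u) =>
      state.updated(u, get(state, u).copy(lastActive = t))
    case SwipesYes(t, from, to) =>
      val s1 = state.updated(from, get(state, from).copy(lastActive = t))
      val target = get(s1, to)
      s1.updated(to, target.copy(yesReceived = target.yesReceived + 1))
    case SendsMessage(t, from, _) =>
      val s = get(state, from)
      state.updated(from, s.copy(lastActive = t, messagesSent = s.messagesSent + 1))
  }
}
```

Folding `update` over a recorded history rebuilds the state for any past moment, and applying it to each incoming event keeps the state current.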
[Diagram, built up over several slides: RDBMS, Application APIs & Business Logic, and an Event History Repository; Real-Time State Updates and Ranking; State Updates with Exploratory Analysis and Training & Backtesting; a production/development split; Monitoring appears in the final build.]
Event History API
• Single path to data for real-time streaming and history
• Shared feature engineering code for development and production
• Team shares access to code and data
• Fine-grained alignment of feature state and prediction outcomes
• Temporally accurate modeling ensured (no looking ahead)
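The "no looking ahead" property can be illustrated with a small Scala sketch that builds training rows by replaying events in time order. The event types, the single `priorYesVotes` feature, and the labeling rule are assumptions made for this example, not the method described in the talk. Features are snapshotted at the moment a recommendation is shown, and later events are consulted only to assign the label.

```scala
// Hypothetical sketch: temporally accurate training rows from an event replay.
object Backtest {
  sealed trait Event { def time: Long }
  final case class Shown(time: Long, viewer: String, candidate: String) extends Event
  final case class Vote(time: Long, viewer: String, candidate: String, yes: Boolean) extends Event

  // One training row: the feature as it stood when the recommendation
  // was shown, plus the outcome observed afterwards.
  final case class Example(viewer: String, candidate: String, priorYesVotes: Int, label: Boolean)

  def buildExamples(events: Seq[Event]): Seq[Example] = {
    val ordered = events.sortBy(_.time)
    var yesCounts = Map.empty[String, Int] // feature state, advanced one event at a time
    val rows = Seq.newBuilder[Example]
    ordered.foreach {
      case Shown(t, v, c) =>
        // The label looks forward in time; the feature does not.
        val label = ordered.exists {
          case Vote(t2, v2, c2, true) => t2 > t && v2 == v && c2 == c
          case _ => false
        }
        rows += Example(v, c, yesCounts.getOrElse(c, 0), label)
      case Vote(_, _, c, true) =>
        yesCounts = yesCounts.updated(c, yesCounts.getOrElse(c, 0) + 1)
      case _: Vote => () // "no" votes leave this feature unchanged
    }
    rows.result()
  }
}
```

A row built for an early recommendation therefore carries only the yes votes cast before it, even if many more arrive later in the log.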
15 new models released and tested within 6 months
>30% cumulative improvement in usage shown in A/B testing
[Chart: daily unique users (matchers and voters), Apr 2013 to Apr 2014, y-axis from 0 to 2,000,000, with markers for "New model released" and "A/B test updated".]
• Open source implementation derived from if(we)’s proprietary platform
• Provides Scala DSL for building online features from event history
• Examples include dating recommendations and product search with learning to rank
• Not yet ready for scale or production
• Seeking collaborators
Antelope Open Source Vision
[Diagram: a layered stack. Top: Production Serving (Ranking) and Data Science (R, Matlab, Python). Middle: Feature Engineering, then the Event History API. Bottom: streaming data (Kafka, Storm) and historical data (S3, NFS, HDFS).]
Agile Machine Learning with Event History
• Solving deployment yields quick product cycles
  • All data saved and retrieved as time-ordered events
  • Single path to data for both historical and real-time access
  • Same feature engineering code used in development and production
• Agile success
  • Team shares access to code and data
  • Production product iterations measured in days rather than months
github.com/ifwe/antelope | @jssmith