Building Intelligent Data Products
stephen whitworth
what actually is fraud
architecting flexible data ‘plumbing’
building solid data products on top of them
stephen whitworth
2 years at Hailo straight out of university as a data scientist/jack of some trades
product and marketplace analytics, agent based modelling, data engineering, ‘ML’ services
data science/engineering at ravelin, specifically focused on our detection capabilities
what is ravelin?
online fraud detection and prevention platform
stream application/server data to our events API
we give fraud probability + beautiful data visualisation
backed by techstars/passion/playfair/amadeus/indeed.com founder/wonga founder amongst other great investors
fraud?
$14B: a dollar for every year the universe has existed
same-day delivery
on-demand services
‘victimless crime’
police ill-equipped to handle it
low barrier to entry via the dark net
3D Secure: a conversion killer
traditional: human-generated rules, born of deep expertise
order-centric view of the world
hybrid: augment expertise by learning rules from data
cards don’t commit fraud, people do
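To make "people, not cards" concrete, here is a minimal sketch of pivoting from an order-centric view to a customer-centric one: orders are grouped by customer and summarised into per-customer features. The field names (`customer_id`, `card_fingerprint`, `amount`) and the three features are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Order:
    # Hypothetical order record; real events would carry many more fields.
    order_id: str
    customer_id: str
    card_fingerprint: str
    amount: float


def customer_features(orders: list[Order]) -> dict[str, dict[str, float]]:
    """Aggregate orders into one feature dict per customer."""
    grouped: dict[str, list[Order]] = defaultdict(list)
    for order in orders:
        grouped[order.customer_id].append(order)

    features = {}
    for customer_id, customer_orders in grouped.items():
        features[customer_id] = {
            "order_count": len(customer_orders),
            # Many distinct cards behind one person is a signal that an
            # order-by-order view cannot see.
            "distinct_cards": len({o.card_fingerprint for o in customer_orders}),
            "total_amount": sum(o.amount for o in customer_orders),
        }
    return features


orders = [
    Order("o1", "alice", "card-a", 12.0),
    Order("o2", "alice", "card-a", 8.0),
    Order("o3", "mallory", "card-x", 30.0),
    Order("o4", "mallory", "card-y", 45.0),
    Order("o5", "mallory", "card-z", 25.0),
]
print(customer_features(orders)["mallory"])
# {'order_count': 3, 'distinct_cards': 3, 'total_amount': 100.0}
```

Each of mallory's orders looks unremarkable on its own; only the per-person view shows three cards in play.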
building good plumbing
receive firehose through API
decode arbitrary data and store it
extract hundreds of features
run through N models and a rule engine to get a probability
send an HTTP/Slack/whatever notification to the customer
all in 100-300ms (ish)
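The stages above can be sketched as a single scoring function. This is a hedged, in-process illustration only: the talk describes an event-driven microservice system, and every name here (`decode_event`, `extract_features`, `rule_engine`, `score`), the two features, the rule, and the way model and rule outputs are combined (average the models, then take the maximum with a firing rule) are assumptions made for the example.

```python
import json
import time
from typing import Callable, Optional

Features = dict[str, float]
Model = Callable[[Features], float]


def decode_event(raw: bytes) -> dict:
    # Decode arbitrary JSON from the API (the storage step is omitted here).
    return json.loads(raw)


def extract_features(event: dict) -> Features:
    # A real system extracts hundreds of features; two hypothetical ones here.
    return {
        "amount": float(event.get("amount", 0.0)),
        "new_account": 1.0 if event.get("account_age_days", 0) < 1 else 0.0,
    }


def rule_engine(features: Features) -> Optional[float]:
    # Hypothetical expert rule: returns a probability if it fires, else None.
    if features["new_account"] and features["amount"] > 500:
        return 0.95
    return None


def score(raw: bytes, models: list[Model], notify: Callable[[dict], None]) -> float:
    start = time.perf_counter()
    event = decode_event(raw)
    features = extract_features(event)
    # Average the N model probabilities, then let a firing rule raise the score.
    probability = sum(m(features) for m in models) / len(models)
    rule_probability = rule_engine(features)
    if rule_probability is not None:
        probability = max(probability, rule_probability)
    elapsed_ms = (time.perf_counter() - start) * 1000
    notify({"event_id": event.get("id"), "probability": probability, "elapsed_ms": elapsed_ms})
    return probability


sent: list[dict] = []
p = score(b'{"id": "e1", "amount": 900, "account_age_days": 0}',
          models=[lambda f: 0.2, lambda f: 0.4], notify=sent.append)
print(p)  # 0.95 (the rule fired)
```

Passing `notify` in as a callable keeps the delivery channel (HTTP, Slack or anything else) out of the scoring logic.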
BUZZWORDS ABOUND
go
postgres
AWS
microservices
zookeeper
NSQ
python
event-driven
elasticsearch
bigquery
dynamodb
redis
instrumentation
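With a 100-300ms target, instrumentation largely means measuring latency at every stage. A minimal sketch, assuming an in-memory store (a real deployment would ship measurements to a metrics system); the `timed` decorator, `p99` helper and stage name are hypothetical.

```python
import functools
import statistics
import time
from collections import defaultdict

# Stage name -> observed latencies in milliseconds.
LATENCIES_MS: dict[str, list[float]] = defaultdict(list)


def timed(stage: str):
    """Decorator recording how long each call to the wrapped function takes."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record even when the call raises, so slow failures still show up.
                LATENCIES_MS[stage].append((time.perf_counter() - start) * 1000)
        return wrapper
    return decorator


def p99(stage: str) -> float:
    """99th-percentile latency for a stage (needs at least two samples)."""
    return statistics.quantiles(LATENCIES_MS[stage], n=100)[98]


@timed("extract_features")
def extract_features(event: dict) -> dict:
    return {"amount": float(event["amount"])}


for i in range(200):
    extract_features({"amount": i})
print(f"p99 extract_features: {p99('extract_features'):.3f} ms")
```

Tracking a high percentile rather than the mean matters here, since the customer-facing budget is blown by the slow tail, not the average.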
different databases for different needs
kudos if you get The Office reference
postgres: solid, start here
dynamodb: very high throughput, low latency data
bigquery: to answer any question you could possibly have
elasticsearch: rich querying in a reasonable amount of time
graph db: haven’t decided, recommendations?
asynchronous systems/firehoses
nice deployment patterns
‘lambda architecture’: the append-only log
services store their own interpretation of events
services are almost entirely decoupled
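The append-only log idea, where every service reads the same events and keeps its own interpretation, can be sketched in-process as below. This is an illustration under assumptions: an in-memory list stands in for the real queue, and the two consumer services and the event fields are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """Append-only log: events are appended, never updated or deleted."""
    events: list[dict] = field(default_factory=list)

    def append(self, event: dict) -> int:
        self.events.append(event)
        return len(self.events) - 1  # offset of the new event

    def read_from(self, offset: int) -> list[dict]:
        return self.events[offset:]


class OrderCounter:
    """One service's interpretation: number of orders per customer."""
    def __init__(self) -> None:
        self.offset = 0  # each consumer tracks its own position in the log
        self.counts: Counter = Counter()

    def consume(self, log: EventLog) -> None:
        for event in log.read_from(self.offset):
            if event["type"] == "order":
                self.counts[event["customer_id"]] += 1
            self.offset += 1


class LastSeenIP:
    """An independent interpretation of the same events: latest IP per customer."""
    def __init__(self) -> None:
        self.offset = 0
        self.last_ip: dict[str, str] = {}

    def consume(self, log: EventLog) -> None:
        for event in log.read_from(self.offset):
            if "ip" in event:
                self.last_ip[event["customer_id"]] = event["ip"]
            self.offset += 1


log = EventLog()
log.append({"type": "login", "customer_id": "c1", "ip": "10.0.0.1"})
log.append({"type": "order", "customer_id": "c1", "ip": "10.0.0.2"})
orders, ips = OrderCounter(), LastSeenIP()
orders.consume(log)
ips.consume(log)
print(orders.counts["c1"], ips.last_ip["c1"])  # 1 10.0.0.2
```

Because each consumer owns its offset, a new service can be added or replayed from offset 0 without touching the others. The same property means the log itself has no idea who is reading it.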
asynchronous systems/firehoses: the downsides
error propagation is challenging
no SLA guarantees: at least as slow as your queue
hard to know who or what is consuming your data
building data products
‘a random forest is like a room full of experts who have seen different
cases of fraud from different perspectives’
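To unpack the analogy: each "expert" is a decision tree trained on a bootstrap resample of the cases (seeing different cases) and restricted to a random subset of features (different perspectives), and the forest's fraud probability is the share of experts voting fraud. The sketch below is a deliberately tiny pure-Python toy using depth-1 trees (stumps), where sampling features per tree coincides with the usual per-split sampling; it is not a production implementation, and the data is made up.

```python
import random

Row = list[float]
Stump = tuple[int, float, int]  # (feature index, threshold, polarity)


def predict_stump(stump: Stump, row: Row) -> int:
    """Polarity 1 flags fraud above the threshold; -1 flags it at or below."""
    feature, threshold, polarity = stump
    return 1 if (row[feature] > threshold) == (polarity == 1) else 0


def fit_stump(X: list[Row], y: list[int], features: list[int]) -> Stump:
    """Pick the single split, on the allowed features only, with fewest errors."""
    best: Stump = (features[0], X[0][features[0]], 1)
    best_errors = len(y) + 1
    for f in features:
        for threshold in sorted({row[f] for row in X}):
            for polarity in (1, -1):
                stump = (f, threshold, polarity)
                errors = sum(predict_stump(stump, row) != label for row, label in zip(X, y))
                if errors < best_errors:
                    best, best_errors = stump, errors
    return best


def fit_forest(X: list[Row], y: list[int], n_trees: int = 25,
               n_features: int = 1, seed: int = 0) -> list[Stump]:
    rng = random.Random(seed)
    n_rows, n_cols = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        # Different cases: each expert trains on a bootstrap resample of the rows.
        rows = [rng.randrange(n_rows) for _ in range(n_rows)]
        # Different perspectives: each expert only sees a random subset of features.
        features = rng.sample(range(n_cols), n_features)
        forest.append(fit_stump([X[i] for i in rows], [y[i] for i in rows], features))
    return forest


def predict_proba(forest: list[Stump], row: Row) -> float:
    """Fraud probability = fraction of experts voting fraud."""
    return sum(predict_stump(stump, row) for stump in forest) / len(forest)


# Made-up data: [order amount, distinct cards used]; label 1 = fraud.
X = [[10, 1], [20, 1], [15, 1], [12, 2], [300, 4], [250, 5], [400, 3], [280, 4]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(X, y)
print(predict_proba(forest, [350, 4]), predict_proba(forest, [10, 1]))
```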
precision: of all of my predictions, what % was I correct?
recall: out of all of the fraudsters, what % did I catch?
implicit tradeoff between conversion and fraud loss
‘accuracy’ a useless metric for fraud
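The definitions above translate directly into code. This sketch, on made-up scores and labels, computes precision and recall at a few decision thresholds; in this example, raising the threshold blocks fewer good customers (higher precision, better conversion) but lets more fraud through (lower recall).

```python
def precision_recall(labels: list[int], scores: list[float],
                     threshold: float) -> tuple[float, float]:
    """Precision and recall when every score >= threshold is flagged as fraud."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and l == 1 for p, l in zip(predicted, labels))
    fp = sum(p and l == 0 for p, l in zip(predicted, labels))
    fn = sum((not p) and l == 1 for p, l in zip(predicted, labels))
    # Convention used here: a metric is 0 when its denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


labels = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.1, 0.2, 0.3, 0.4, 0.55, 0.6, 0.7, 0.8, 0.9]
for t in (0.3, 0.5, 0.75):
    p, r = precision_recall(labels, scores, t)
    print(f"threshold={t}: precision={p:.2f} recall={r:.2f}")
# threshold=0.3: precision=0.43 recall=1.00
# threshold=0.5: precision=0.60 recall=1.00
# threshold=0.75: precision=1.00 recall=0.67
```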
99.8% ACCURATE
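Why accuracy misleads: if, say, 0.2% of transactions are fraudulent (a made-up rate for illustration), a "model" that approves everything is 99.8% accurate while catching no fraud at all.

```python
def accuracy(labels: list[int], predictions: list[int]) -> float:
    return sum(l == p for l, p in zip(labels, predictions)) / len(labels)


def recall(labels: list[int], predictions: list[int]) -> float:
    # Predictions made on the actual fraud cases only.
    on_fraud = [p for l, p in zip(labels, predictions) if l == 1]
    return sum(on_fraud) / len(on_fraud) if on_fraud else 0.0


# 1,000 transactions, 2 of them fraudulent (0.2%).
labels = [1] * 2 + [0] * 998
approve_everything = [0] * len(labels)

print(accuracy(labels, approve_everything))  # 0.998
print(recall(labels, approve_everything))    # 0.0
```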
keep model interfaces simple
hide arbitrarily complex transformations behind it
blend global and client specific models
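One way to read these three points together, sketched under assumptions: every model exposes the same one-method interface (an event in, a probability out), feature transformations live behind that method, and a blender combines a global model with a client-specific one. The `FraudModel` protocol, the stand-in scorers and the fixed blending weight are hypothetical; the talk does not say how the blend is actually chosen.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class FraudModel(Protocol):
    def predict(self, event: dict) -> float:
        """Return a fraud probability in [0, 1] for a raw event."""
        ...


@dataclass
class PipelineModel:
    """Hides arbitrarily complex feature transformations behind predict()."""
    transform: Callable[[dict], dict[str, float]]
    scorer: Callable[[dict[str, float]], float]

    def predict(self, event: dict) -> float:
        return self.scorer(self.transform(event))


@dataclass
class BlendedModel:
    """Weighted blend of a global model and a client-specific model."""
    global_model: FraudModel
    client_model: FraudModel
    client_weight: float = 0.5  # hypothetical; could grow with client data volume

    def predict(self, event: dict) -> float:
        w = self.client_weight
        return (1 - w) * self.global_model.predict(event) + w * self.client_model.predict(event)


def to_features(event: dict) -> dict[str, float]:
    return {"amount": float(event["amount"])}


global_model = PipelineModel(to_features, lambda f: min(f["amount"] / 1000, 1.0))
client_model = PipelineModel(to_features, lambda f: 0.9 if f["amount"] > 200 else 0.1)
model = BlendedModel(global_model, client_model, client_weight=0.25)
print(model.predict({"amount": 400}))  # roughly 0.525
```

Because `BlendedModel` itself satisfies `FraudModel`, blends can be nested or swapped without the calling service changing.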
building and training statistical models
currently batch
will combine with online
RANDOM FORESTS
RANDOM FORESTS
MONITORING
probabilistic, not deterministic
dogfood: use live robot customers
run models in ‘dark mode’ to determine performance
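Running a model in "dark mode" can be sketched as shadow scoring: every event is scored by the live model and by candidates, only the live score is returned to the customer, and candidate scores are logged so their performance can be judged once fraud labels arrive. The `ShadowScorer` class and its log format are hypothetical.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable

logger = logging.getLogger("dark_mode")

Model = Callable[[dict], float]


@dataclass
class ShadowScorer:
    live: Model
    candidates: dict[str, Model]
    # (event id, candidate name, live score, candidate score) for later evaluation.
    shadow_log: list[tuple[str, str, float, float]] = field(default_factory=list)

    def score(self, event: dict) -> float:
        live_score = self.live(event)
        for name, candidate in self.candidates.items():
            try:
                shadow_score = candidate(event)
            except Exception:
                # A broken candidate must never affect the live response.
                logger.exception("candidate %s failed", name)
                continue
            self.shadow_log.append((event["id"], name, live_score, shadow_score))
        return live_score  # customers only ever see the live model's output


def broken(event: dict) -> float:
    raise ValueError("bad model")


scorer = ShadowScorer(live=lambda e: 0.2, candidates={"v2": lambda e: 0.7, "broken": broken})
print(scorer.score({"id": "e1"}))  # 0.2
print(scorer.shadow_log)           # [('e1', 'v2', 0.2, 0.7)]
```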
why not deep learning? ...yet
ability to debug random forests
had nice results with keras
serialisation and deployment: an unsolved problem
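Part of the difficulty is that a pickled Python model is tied to the exact code and library versions that produced it. One partial mitigation, sketched here as an assumption rather than a solution, is to export a model's learned parameters to a plain, versioned JSON document that a service in any language could load. The format, field names and `FORMAT_VERSION` are hypothetical, shown for a toy model made of decision stumps stored as (feature, threshold, polarity).

```python
import json

FORMAT_VERSION = 1  # hypothetical schema version, bumped on breaking changes

Stump = tuple[int, float, int]  # (feature index, threshold, polarity)


def export_forest(stumps: list[Stump], feature_names: list[str]) -> str:
    """Serialise decision stumps as language-neutral, versioned JSON."""
    return json.dumps({
        "format_version": FORMAT_VERSION,
        "feature_names": feature_names,
        "trees": [{"feature": f, "threshold": t, "polarity": p} for f, t, p in stumps],
    })


def load_forest(document: str) -> tuple[list[Stump], list[str]]:
    data = json.loads(document)
    if data["format_version"] != FORMAT_VERSION:
        # Fail loudly rather than silently scoring with a misread model.
        raise ValueError(f"unsupported model format {data['format_version']}")
    stumps = [(t["feature"], t["threshold"], t["polarity"]) for t in data["trees"]]
    return stumps, data["feature_names"]


stumps = [(0, 20.0, 1), (1, 2.0, 1)]
document = export_forest(stumps, ["amount", "distinct_cards"])
print(load_forest(document)[1])  # ['amount', 'distinct_cards']
```

This only works for models whose parameters are easy to write down, and the serving side still has to reimplement prediction exactly, which is why it narrows the problem rather than solving it.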
in beta and signing up clients
looking for on-demand services/marketplaces
talk to me afterwards
obligatory: we are hiring!
senior machine learning engineers/data scientists
[email protected] or talk to me after