Extracting information from images using deep learning and transfer learning — Pierre Gutierrez...

Extracting information from images using deep learning and transfer learning. Application to personalized recommendations.

Transcript of Extracting information from images using deep learning and transfer learning — Pierre Gutierrez...

Page 1: Extracting information from images using deep learning and transfer learning — Pierre Gutierrez (dataiku labs) @PAPIs Connect — São Paulo 2017

Extracting information from images using deep learning and transfer learning. Application to personalized recommendations.

Page 2:

Outline

Introduction and Context

Iterative building of a recommender system

Labeling Images: Pragmatic deep learning for dummies

Post Processing, AKA: Image for BI on steroids

Results: More images !

Page 3:

Dataiku

•  Founded in 2013
•  90+ employees, 100+ clients
•  Paris, New York, London, San Francisco, Singapore

Data Science Software Editor of Dataiku DSS

DESIGN

PREPARE: Load and prepare your data
MODEL: Build your models
ANALYSE: Visualize and share your work

PRODUCTION

AUTOMATE: Re-execute your workflow at ease
MONITOR: Follow your production environment
SCORE: Get predictions in real time

Page 4:

Client Key Figures

E-business vacation retailer. Negotiate the best price for their clients. Discount luxury.

Sale: Image is paramount. Purchase is impulsive.

18 million clients. Hundreds of sales opened every day.

Page 5:

Specificities

Highly temporary sales -> Classical recommender systems fail -> Time event linked (Christmas, ski, summer)

Expensive product -> Few recurrent buyers -> Appearance counts a lot

Page 6:

Iterative Building of a Recommender System

Page 7:

Basic Recommendation Engines

Page 8:

Other Factors

Page 9:

One Meta Model to Rule Them All

Recommenders as features

Machine learning to optimize purchasing probability

Combine

Recommend

Describe

Page 10:

Recommender system for Home Page Ordering

Inputs: Customer visits, Purchases, Sales Images, Sales information

Cleaning, combining and enrichment of data: the application automatically runs and compiles heterogeneous data

Recommendation Engines: Generation of recommendations based on user behaviour

Meta Model: Meta model combines recommendations to directly optimize purchasing probability

Optimization of home display: Every customer is shown the 10 sales he is the most likely to buy

Batch Scoring every night

+7% revenue (A/B testing)

Page 11:

Why use Image ?

We want to distinguish

« Sun and Beach »

« Ski »

A picture is worth a thousand words

Page 12:

Integrating Image Information

Sales Images

Labelling Model

Pool + Palm Trees
Hotel + Mountains
Pool + Forest + Hotel + Sea
Sea + Beach + Forest + Hotel

Sales descriptions vector

CONTENT BASED Recommender System

Page 13:

Image Labelling For Recommendation Engine

Pragmatic Deep learning for “Dummies”

Page 14:

Using Deep Learning models

Common Issues

“I don’t have a GPU server”

“I don’t have a deep learning expert”

“I don’t have labelled data” (or too few)

“I don’t have the time to wait for model training”

“I don’t want to pay for private APIs” / “I’m afraid their labelling will change over time”

Page 15:

Solution 1 : Pre trained models

“I don’t have labelled data (or too few)” -> Is there similar data ?

PLACES DATABASE US SUN DATABASE

Places: 205 categories, 2.5 M images
SUN: 307 categories, 110 K images

Page 16:

Solution 1 : Pre trained models

If there is open data, there is an open pre trained model !
•  Kudos to the community
•  Check the licensing

Example with Places (Caffe Model Zoo) :

tower: 0.53 skyscraper: 0.26

swimming_pool/outdoor: 0.65 inn/outdoor: 0.06

Page 17:

Solution 2 : Transfer Learning

Credit : Fei-Fei Li & Andrej Karpathy & Justin Johnson http://cs231n.stanford.edu/slides/winter1516_lecture11.pdf

Page 18:

Solution 2 : Transfer Learning

Use the network as a feature extractor

Fine tune the model

Keras makes it super easy ! (Possible to freeze layers to train just the top)

https://github.com/PGuti/Deep_Learning_tuto
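The idea of using the network as a fixed feature extractor and training only the top can be sketched without any deep learning library. This is not the Keras code from the linked tutorial: `frozen_features` is a hypothetical stand-in for the pre-trained layers, and the four toy samples are invented for illustration. Only the small logistic "top" is trained.

```python
import math

def frozen_features(x):
    """Hypothetical stand-in for the frozen pre-trained layers (never updated)."""
    x0, x1 = x
    return [x0, x1, x0 * x1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(samples, labels, epochs=200, lr=0.5):
    """Train only the new top layer (logistic regression) on frozen features."""
    feats = [frozen_features(x) for x in samples]  # extracted once, then reused
    weights = [0.0] * len(feats[0])
    bias = 0.0
    n = len(feats)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for f, y in zip(feats, labels):
            err = sigmoid(sum(w * v for w, v in zip(weights, f)) + bias) - y
            for k, v in enumerate(f):
                grad_w[k] += err * v
            grad_b += err
        # Gradient step on the head only; the feature extractor is untouched.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict(x, weights, bias):
    f = frozen_features(x)
    return sigmoid(sum(w * v for w, v in zip(weights, f)) + bias)

# Toy task: the raw inputs are not linearly separable, the frozen features are.
samples = [(1, 1), (-1, -1), (1, -1), (-1, 1)]
labels = [1, 1, 0, 0]
weights, bias = train_head(samples, labels)
predictions = [round(predict(x, weights, bias)) for x in samples]
```

Because the features are computed once and never change, they can be cached, which is what makes this cheaper than training the whole network.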

Page 19:

Solution 2 : Transfer Learning

Leverage existing knowledge !

PLACES DATABASE US SUN DATABASE

Training (optional)

Pre-trained model VGG16

Caffe Model Zoo

tower: 0.53 skyscraper: 0.26

Transfer

Transferred Data : Last convolutional layer features

Re-Training

Re-trained model TensorFlow

2 fully connected layers

GPU, CPU, GPU

Accuracy: 72%, Top-5 Acc: 90% > state of the art on dataset alone

Page 20:

Post Treatment & Results

(Or how we transfer the labelling information)

Using Images information for BI on steroids

Page 21:

Labels post-processing

Issue with our approach:

Complementary information

Redundant information

Solution : Matrix Factorization

Page 22:

Topic extraction with Non-Negative Matrix Factorization

•  Non-Negative Matrix Factorization (NMF): X = WH
•  X : image x tag, non negative
•  W : image x theme
•  H : theme x tag
(scikit-learn implementation)

•  Most represented themes
•  Swimming-pool_Apartment_Putting-green
•  Ocean_Coast_SandBar
•  Coast_SeaCliff_RockArch
•  Beach_Coast_BoardWalk
•  Bridge_Viaduc_River
•  Palace_BuildingFacade-Mansion
•  Castle_Mansion_Monastery
•  HotelRoom_Bedroom_DormRoom

•  Dimension reduction: 200x200 pixels -> 600 tags => 30 themes. Faster content based filtering

•  Image often sparse combination of themes. Faster content based filtering

•  Each theme has the same explanatory power. Balanced vector for content based

•  Explainability: each theme corresponds to a few labels
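The factorization X = WH can be illustrated with plain multiplicative updates. This is a library-free sketch, not the scikit-learn implementation the slide mentions; the `nmf` helper, the toy 4 x 4 matrix, and the tag names in the comments are hypothetical.

```python
import random

def matmul(a, b):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def frobenius_error(x, w, h):
    """Squared Frobenius norm of X - WH."""
    wh = matmul(w, h)
    return sum((xv - av) ** 2 for xr, ar in zip(x, wh) for xv, av in zip(xr, ar))

def nmf(x, n_themes, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative image x tag matrix into W (image x theme)
    and H (theme x tag) with multiplicative updates."""
    rng = random.Random(seed)
    n_images, n_tags = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(n_themes)] for _ in range(n_images)]
    h = [[rng.random() + 0.1 for _ in range(n_tags)] for _ in range(n_themes)]
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[hv * nv / (dv + eps) for hv, nv, dv in zip(hr, nr, dr)]
             for hr, nr, dr in zip(h, num, den)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[wv * nv / (dv + eps) for wv, nv, dv in zip(wr, nr, dr)]
             for wr, nr, dr in zip(w, num, den)]
    return w, h

# Hypothetical tag probabilities; columns: beach, coast, hotel_room, bedroom.
x = [
    [0.9, 0.8, 0.0, 0.0],
    [0.7, 0.9, 0.0, 0.1],
    [0.0, 0.0, 0.8, 0.7],
    [0.1, 0.0, 0.9, 0.9],
]
w, h = nmf(x, n_themes=2)
```

Each row of `w` is then a short theme vector for one image, and each row of `h` shows which tags make up a theme.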

Page 23:

Image content detection

Topic scores determine the importance of topics in an image

TOPIC | TOPIC SCORE (%)
Golf course – Fairway – Putting green | 31
Hotel – Inn – Apartment building outdoor | 30
Swimming pool – Lido Deck – Hot tub outdoor | 22
Beach – Coast – Harbor | 17

TOPIC | TOPIC SCORE (%)
Tower – Skyscraper – Office building | 62
Bridge – River – Viaduct | 38

Page 24:

Note on model performance

•  Image labels are used for similarity. Calling a grass field “putting green”:
•  Is not important if all grass fields are called this way.
•  Would be if we had lots of golf trip sales.

•  Improving the NN performance ?
•  Labels are used in NMF and reduced to themes
•  Themes are used to calculate similarities for CB recommenders
•  CB recommenders are used as a feature in the meta model
•  Meta model gives probabilities of purchase = order
•  Users only check 10 sales…

-> what is the change in online performance for 1% accuracy ?

Page 25:

Results

Page 26:

Results ?

1) Visits :
•  France and Morocco
•  Pool displayed

2) First Recommendation
•  Mostly France & Mediterranean
•  Fails to display pools

3) Only Images recommendation
•  Pool all around the world
•  Does not respect budget

4) Third column = Right Mix

Page 27:

Results ?

1) Visits :
•  Spain
•  Sun & Beach
•  Pool displayed

2) First Recommendation
•  Displays nature…

3) Only Images ?
•  Pool all around the world

4) Third = Right Mix
•  Get the bungalow feature !

Page 28:

Learned along the way

What’s next ?

Attractiveness = % visits with tag / % sales with tag - 1

For ski sales, indoor pictures perform better!
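The attractiveness ratio can be written as a small function. A minimal sketch, assuming per-tag counts of visits and sales are available; the function name, argument names, and example counts are hypothetical.

```python
def attractiveness(visits_with_tag, total_visits, sales_with_tag, total_sales):
    """Attractiveness = % visits with tag / % sales with tag - 1."""
    visit_share = visits_with_tag / total_visits
    sale_share = sales_with_tag / total_sales
    return visit_share / sale_share - 1

# Hypothetical counts: a tag on 25% of sales that draws 50% of visits.
over_visited = attractiveness(500, 1000, 250, 1000)
# Hypothetical counts: a tag on 20% of sales that draws only 10% of visits.
under_visited = attractiveness(100, 1000, 200, 1000)
```

A positive value means the tag attracts more visits than its share of the catalogue would suggest, and a negative value means fewer.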

Page 29:

What’s Next ?

Kenya

Prague

Berlin

Cambodia

Page 30:

Conclusion

Do iterative data science !
•  Start simple and grow
•  Evaluate at each step
•  Image labelling = BI on steroids

Transfer Learning
•  Kick-start your project
•  Gain time and money
•  Any Data Scientist can do it

Deep Learning
•  Don’t start from scratch !
•  Is there existing data ?
•  Is there a pre-trained model ?

Page 31:

Thank you for your attention !

Page 32:

Basic Recommendation Engines

•  Implementation
•  Everything in SQL
-  Vertica
-  Then Impala
-  Then Spark

•  Collaborative filtering
•  Score(user, future sale j) = Sum_{i:visited sale} sim(i,j)

•  Content based filtering
•  Sale profile: (sale_id, feature, value)
•  User profile: (user_id, feature, value)
•  Sparse coding + join does the trick

Join on user_id

Join on user_id + sales id
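Both scores can be sketched in a few lines. The talk implements them in SQL; this Python version is only an illustration, and it assumes the content-based score is a sum of products over the features shared by the two sparse profiles (the slide does not spell this out). All sale names, feature names, and values are hypothetical.

```python
def collaborative_score(visited_sales, future_sale, sim):
    """Score(user, j) = sum over visited sales i of sim(i, j)."""
    return sum(sim.get((i, future_sale), 0.0) for i in visited_sales)

def content_based_score(user_profile, sale_profile):
    """Sparse dot product: only features present in both profiles contribute,
    which mirrors joining the two (id, feature, value) tables on feature."""
    return sum(value * sale_profile[feature]
               for feature, value in user_profile.items()
               if feature in sale_profile)

# Hypothetical similarities between already-visited sales and a future sale.
sim = {("sale_a", "sale_c"): 0.8, ("sale_b", "sale_c"): 0.5}
cf_score = collaborative_score(["sale_a", "sale_b"], "sale_c", sim)

# Hypothetical sparse profiles stored as feature -> value.
user_profile = {"pool": 2.0, "beach": 1.0}
sale_profile = {"pool": 0.5, "ski": 1.0}
cb_score = content_based_score(user_profile, sale_profile)
```

Storing only the non-zero features keeps both profiles small, so the join touches few rows per user and sale.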

Page 33:

One Meta Model to Rule Them All

•  Negative sampling
•  Take all purchase tuples : (user, product, timestamp) -> 1
•  Select 5 sales open at the same date that the user did not buy -> 0
•  The model directly optimizes purchasing probability

•  Machine learning model
•  Features : recommender systems.
•  Logistic Regression. Regularizing effect : we don’t want to overfit leaks.

•  Reranking approach. Similar to Google or Yandex (Kaggle challenge)
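The negative sampling step can be sketched as follows. This is an illustration under assumptions, not the production code: the function name, the `open_sales_by_date` mapping, and the example data are hypothetical, and a sale counts as "not bought" if the user never purchased it.

```python
import random

def build_training_rows(purchases, open_sales_by_date, n_negatives=5, seed=0):
    """Label each purchase 1, then add up to n_negatives sales open on the
    same date that the user did not buy, labelled 0."""
    rng = random.Random(seed)
    bought = {}
    for user, sale, _date in purchases:
        bought.setdefault(user, set()).add(sale)
    rows = []
    for user, sale, date in purchases:
        rows.append((user, sale, date, 1))
        candidates = sorted(s for s in open_sales_by_date[date]
                            if s not in bought[user])
        k = min(n_negatives, len(candidates))
        for negative in rng.sample(candidates, k):
            rows.append((user, negative, date, 0))
    return rows

# Hypothetical data: one purchase, eight sales open that day.
purchases = [("u1", "s1", "2017-01-01")]
open_sales_by_date = {"2017-01-01": ["s%d" % i for i in range(1, 9)]}
rows = build_training_rows(purchases, open_sales_by_date)
```

Each row would then be joined with the recommender scores for that (user, sale) pair before fitting the logistic regression.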

Page 34:

Labels post-processing

Classification and not Detection problem…
•  Only have probabilities of each class
•  Selecting based on probability threshold fails
•  Keeping all information is not sparse

Our images

Caffe

Deep/Transfer Learning

5-10 tags per image
2s/image with CPU
x20 speed up with GPU

-> we keep 5 labels with probabilities per image
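Keeping the five most probable labels, instead of applying a probability threshold, can be sketched as below. The helper name is hypothetical, and apart from the two scores quoted earlier in the deck (swimming_pool/outdoor and inn/outdoor) the example probabilities are invented.

```python
def top_k_labels(probabilities, k=5):
    """Return the k most probable (label, probability) pairs, best first."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]

# Classifier output for one image: label -> probability.
image_probabilities = {
    "swimming_pool/outdoor": 0.65,
    "inn/outdoor": 0.06,
    "hotel/outdoor": 0.05,  # hypothetical
    "beach": 0.04,          # hypothetical
    "coast": 0.03,          # hypothetical
    "patio": 0.02,          # hypothetical
    "tower": 0.01,          # hypothetical
}
kept = top_k_labels(image_probabilities)
```

Every image ends up with the same number of labels, so the image x tag matrix stays sparse without tuning a threshold.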

Page 35:

Solution 3 : What about APIs ?

Page 36:

Solution 3 : What about APIs ?

•  Price
•  Their cost is often rather low. Ex: 100 K requests for less than $300
•  VS the cost of redeveloping (probably not as well)

•  Full database scoring
•  APIs often limit the number of queries per month.
•  Make sure to be able to avoid the cold start problem

•  Stability
•  Use model versioning
•  Avoid covariate shift, distribution drift

Page 37:

What about APIs ? Use for generating labels !

How to steal a model:
•  1) Score part of the database for training
•  2) Train a model
•  3) Score your entire database !

(Or don’t, it’s illegal)

But I have only 5000 requests ? -> Use Transfer Learning !

Tramèr, Florian, et al. "Stealing Machine Learning Models via Prediction APIs." arXiv preprint arXiv:1609.02943 (2016).

Page 38:

What about APIs ? Use for generating labels !

Experiment:
•  5000 requests on API (4500/500 split)
•  Transfer learning with MIT Places pre-trained model
•  Scikit-learn multilabel model

(Or don’t, it’s illegal) (demo, not used in any real project)

Page 39:

What about APIs ? Results

Precision 75, Recall 80, Accuracy 95

Label | Probability | Label | Probability
landscape | 1.0000 | sunset | 0.9998
sky | 1.0000 | no person | 0.9996
outdoors | 1.0000 | water | 0.9990
nature | 1.0000 | park | 0.9849
rock | 1.0000 | river | 0.9678
travel | 1.0000 | scenic | 0.8031

Label | Probability | Label | Probability
beach | 1.0000 | ocean | 1.0000
summer | 1.0000 | relaxation | 1.0000
sand | 1.0000 | island | 1.0000
tropical | 1.0000 | idyllic | 1.0000
travel | 1.0000 | seashore | 0.9998
seascape | 1.0000 | water | 0.9997

(demo, not used in any real project)
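Precision, recall, and accuracy for a multilabel model can be computed by counting over every (image, label) pair. The slide does not say how its figures were averaged, so this micro-averaged version is an assumption, and the label sets below are hypothetical.

```python
def multilabel_metrics(true_sets, predicted_sets, all_labels):
    """Micro-averaged precision, recall and accuracy over (image, label) pairs."""
    tp = fp = fn = tn = 0
    for truth, predicted in zip(true_sets, predicted_sets):
        for label in all_labels:
            in_truth, in_pred = label in truth, label in predicted
            if in_truth and in_pred:
                tp += 1
            elif in_pred:
                fp += 1
            elif in_truth:
                fn += 1
            else:
                tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy

# Hypothetical API labels (truth) and model predictions for two images.
all_labels = ["beach", "sand", "sky", "rock"]
true_sets = [{"beach", "sand"}, {"sky", "rock"}]
predicted_sets = [{"beach", "sky"}, {"sky", "rock"}]
precision, recall, accuracy = multilabel_metrics(true_sets, predicted_sets, all_labels)
```

With many possible labels and few per image, accuracy is dominated by true negatives, which is one reason it can sit well above precision and recall.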