Automatic Feature Engineering The manual approach
Pierre Gutierrez, Leo Dreyfus-Schmidt, Du Phan
CONTENTS
MOTIVATION
Dealing with vertical business problems.
Accelerating the feature engineering process.
INITIAL SOLUTION
A general-purpose human-powered feature generation pipeline.
DEEP FEATURE SYNTHESIS
Leveraging the relational nature between tables to aggregate features.
CONCLUSION
What about Deep Learning?
Where does the solution fit in a general data science workflow?
MOTIVATION
How can we provide a general solution for these problems?
There is always a structure in our data waiting to be exploited
[Chart: Feature Engineering process (sample size: 1). Labels: "Fun!" and "Meh."; one slice is marked 37 %.]
How can we accelerate the boring parts?
INITIAL SOLUTION
OBJECTIVE
Leveraging human knowledge for automatic feature engineering
Build a general-purpose feature generation pipeline
Create expressive features based on the user's data model
Versatility, modularity and interpretability
Most problems can be aggregated with some primary keys
Example primary keys:
user_id
user_id + event_timestamp
user_id + product_id
Most features belong to a “general” feature family
Frequency: how often does the client do a specific action?
Recency: when did the client last do this action?
Monetary: what are the client's spending habits?
Distribution: what type of client is this?
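The four families can be illustrated on a toy event log. This is a minimal sketch, not the pipeline presented in the talk: the event tuples, the `ref_date` value, the function name `feature_families` and the output feature names are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical event log: (user_id, event_type, event_date, amount)
events = [
    ("u1", "buy_order", date(2017, 1, 10), 20.0),
    ("u1", "view", date(2017, 2, 1), 0.0),
    ("u1", "buy_order", date(2017, 3, 5), 40.0),
    ("u2", "view", date(2017, 3, 20), 0.0),
]
ref_date = date(2017, 4, 1)  # assumed reference date for recency


def feature_families(events, ref_date):
    """Compute one feature per family for each user_id."""
    by_user = defaultdict(list)
    for user_id, event_type, event_date, amount in events:
        by_user[user_id].append((event_type, event_date, amount))

    features = {}
    for user_id, rows in by_user.items():
        buys = [r for r in rows if r[0] == "buy_order"]
        type_counts = Counter(r[0] for r in rows)
        features[user_id] = {
            # Frequency: how often the client performs the action
            "buy_count": len(buys),
            # Recency: days since the client last performed it (None if never)
            "days_since_last_buy": (ref_date - max(r[1] for r in buys)).days if buys else None,
            # Monetary: how much the client spends
            "total_spent": sum(r[2] for r in buys),
            # Distribution: share of each event type in the client's activity
            "event_type_share": {t: c / len(rows) for t, c in type_counts.items()},
        }
    return features


rfm = feature_families(events, ref_date)
print(rfm["u1"])
```

Each family maps to a different aggregation over the same grouped rows, which is what makes a generic pipeline plausible.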
[Diagram labels: fit, transform]
Feature: frequency of the buying event in the last 6 months
Primary key: user_id
Time window: last 6 months
Filter: event_type is buy_order
DROP TABLE IF EXISTS frequency_feature_last_6_month_group_1;
CREATE TABLE frequency_feature_last_6_month_group_1 AS (
    SELECT *,
           -- divide by 6.0 rather than 6: COUNT returns an integer, and integer division would truncate the mean
           group_1_frequency_last_6_month / 6.0 AS mean_frequency_group_1_per_month_last_6_month
    FROM (
        SELECT user_id,
               COUNT(event_timestamp) AS group_1_frequency_last_6_month
        FROM (
            -- keep only buy_order events in the 6 months up to ref_date
            SELECT *
            FROM "events_complete"
            WHERE event_timestamp::timestamp >= (ref_date - INTERVAL '6 month')
              AND event_timestamp::timestamp <= ref_date
              AND event_type IN ('buy_order')
        ) AS table_layer_2
        GROUP BY user_id
    ) AS table_layer_3
);
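A query of this shape could plausibly be generated from the four parameters of the feature (table, primary key, filter, time window). The sketch below is one hypothetical way to do it; the `FrequencyFeature` class and `render_sql` function are illustrative names, not part of the tool shown in the talk. The query is flattened to a single level and divides by `6.0` so the monthly mean is not truncated by integer division.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrequencyFeature:
    """Hypothetical spec holding the four parameters of a frequency feature."""
    table: str
    primary_key: str
    event_type: str
    months: int


def render_sql(spec: FrequencyFeature) -> str:
    """Render a PostgreSQL-style query counting matching events per primary key.

    Values are interpolated directly for readability; real code should
    validate identifiers and bind literals as query parameters.
    """
    return (
        f"SELECT {spec.primary_key}, "
        f"COUNT(event_timestamp) AS frequency_last_{spec.months}_month, "
        f"COUNT(event_timestamp) / {spec.months}.0 AS mean_frequency_per_month\n"
        f'FROM "{spec.table}"\n'
        f"WHERE event_timestamp::timestamp >= (ref_date - INTERVAL '{spec.months} month')\n"
        f"  AND event_timestamp::timestamp <= ref_date\n"
        f"  AND event_type IN ('{spec.event_type}')\n"
        f"GROUP BY {spec.primary_key};"
    )


spec = FrequencyFeature("events_complete", "user_id", "buy_order", 6)
query = render_sql(spec)
print(query)
```

Changing the spec (for example a 3-month window or a different event type) yields a new feature without hand-writing another query, which is the point of describing features declaratively.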
DEEP FEATURE SYNTHESIS
Max Kanter, Kalyan Veeramachaneni
Features are often derived using relationships in the dataset
Across datasets, many features are derived using similar mathematical operations
New features are composed using previously derived features
Customers: CustomerID, Age, Churned
Orders: CustomerID, OrderID, Date
OrderProduct: OrderID, ProductID, Product.Price
Step 1: SUM(Product.Price) GROUP BY OrderID
Step 2: AVG(Orders.SUM(Product.Price)) GROUP BY CustomerID
-> average expense per order per customer
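The two stacked aggregations can be traced by hand on a few rows. This is a plain-Python sketch of the idea only, not the Deep Feature Synthesis implementation; the table contents and variable names are made up for the example.

```python
from collections import defaultdict
from statistics import mean

# Toy rows mirroring the Orders and OrderProduct tables (values are made up).
orders = [  # (OrderID, CustomerID)
    (1, "alice"), (2, "alice"), (3, "bob"),
]
order_products = [  # (OrderID, ProductID, Product.Price)
    (1, "p1", 10.0), (1, "p2", 5.0),
    (2, "p1", 10.0),
    (3, "p3", 30.0), (3, "p2", 5.0),
]

# Step 1: SUM(Product.Price) GROUP BY OrderID
order_total = defaultdict(float)
for order_id, _product_id, price in order_products:
    order_total[order_id] += price

# Step 2: AVG(Orders.SUM(Product.Price)) GROUP BY CustomerID
totals_by_customer = defaultdict(list)
for order_id, customer_id in orders:
    totals_by_customer[customer_id].append(order_total[order_id])

avg_order_value = {c: mean(t) for c, t in totals_by_customer.items()}
print(avg_order_value)  # {'alice': 12.5, 'bob': 35.0}
```

The output of step 1 becomes a new column on Orders, and step 2 aggregates that column one relationship further up, which is how new features are composed from previously derived ones.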
Limits?
Brute-force nature -> Feature selection needs to be considered
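The slides do not say which selection method was used. As one illustrative option only, a simple unsupervised filter can drop generated features that are constant or nearly collinear with a feature already kept. The function names, the 0.95 threshold and the sample values below are assumptions for this sketch.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def select_features(columns, max_abs_corr=0.95):
    """Greedy filter: skip constant columns and columns nearly collinear with a kept one."""
    kept = []
    for name, values in columns.items():
        if len(set(values)) <= 1:
            continue  # constant column carries no signal
        if any(abs(pearson(values, columns[k])) > max_abs_corr for k in kept):
            continue  # redundant with an already kept feature
        kept.append(name)
    return kept


# Hypothetical generated features, one list of values per feature name.
generated = {
    "SUM(price)": [15.0, 10.0, 35.0, 20.0],
    "SUM(price)*2": [30.0, 20.0, 70.0, 40.0],  # perfectly correlated duplicate
    "COUNT(orders)": [2, 1, 2, 3],
    "constant": [1, 1, 1, 1],
}
selected = select_features(generated)
print(selected)  # ['SUM(price)', 'COUNT(orders)']
```

This filter ignores the prediction target, so in practice it would likely be combined with a supervised criterion such as model-based feature importance.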
CONCLUSION
AlexNet (2012)
Feature Engineering vs Representation Learning: the chess game metaphor
Feature Engineering: same game, different forms
Representation Learning: different game
Interpretability: do you need it?
Where does this method fit in the data science workflow?
Thank you for your attention! Question time