Big data market research
The Big Data Challenges of Computational Market Research
Frank Smadja
[email protected] (@FrankieMbaye)
EVP Engineering, Toluna
April 2014
Table of Contents
1. What is a Market Research Study?
2. The Main Challenge: Targeting
3. Machine Learning Problem and Model
4. Some Experiments
5. Current and Future Work
What is a Market Research Study?
Market Research Goal: Answering Questions for Brands
Customer/Employee Satisfaction:
• Are my customers happy?
• What can I do better for them?
• Am I getting better or worse?
Concept testing:
• Would dog owners buy my organic dog food?
• What should be my target market?
Ad testing:
• Is my advertising campaign effective?
Brand positioning:
• How is my brand doing compared to the competition?
• What are my perceived strong features?
• Where should I invest more?
And many more types of questions
Output Example: ‘Positioning survey’ for Hilton Garden Inn.
Example: Positioning survey for Beyonce
Market Research Main Challenge: Targeting
Select a segment of respondents (a sample) that is:
• Relevant to the question (dog owners who have one big dog and one small dog, smokers who are trying to stop, etc.)
• Representative and balanced (not biased).
The tougher and more restrictive the targeting, the more expensive the study.
The Targeting Pipeline and Incidence Rate
Targeting process, from Start to Complete: Demographics, then Behavioral, then Study.
• Demographics: select the right population based on simple demographic attributes: age, gender, region, ethnicity, income, etc. This is a fixed set of attributes known beforehand.
• Behavioral: further select based on behavioral and custom attributes: flies more than 5 times a year, uses aspirin on a daily basis, etc. These are free-style attributes, usually unknown.
Incidence Rate:
IR = Completes / Starts
Cost is a decreasing function of IR: the lower the incidence rate, the more starts each complete requires.
Why is targeting hard?
Looking for 1,000 people in the UK who “smoke,” “tried to stop in the past,” “live around London,” “age 24-50.”
Data on UK population:
• 18% of the UK adults smoke
• 40% of smokers tried to stop
• 15% of the population is in the London area
• 30% is between 24 and 50
Incidence rate: 0.18 * 0.4 * 0.15 * 0.3 ≈ 0.3%
Sample size: 333,333
Segments (diagram): UK, London, Adults, Smokers, Tried to stop
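The arithmetic on this slide can be sketched in a few lines. This is a minimal illustration, not the author's tooling: the function names `incidence_rate` and `starts_needed` are made up here, and multiplying the rates assumes the four criteria are independent, which the slide does implicitly. Note that the unrounded rate is 0.324%, which gives roughly 308,642 starts; dividing by the rounded 0.3% gives the slide's figure of about 333,333.

```python
import math

def incidence_rate(rates):
    """Multiply per-criterion pass rates (assumes the criteria are independent)."""
    ir = 1.0
    for rate in rates:
        ir *= rate
    return ir

def starts_needed(completes, ir):
    """Respondents who must start the study to expect `completes` completions."""
    return math.ceil(completes / ir)

# Rates from the slide: smokers, tried to stop, London area, age 24-50.
uk_rates = [0.18, 0.40, 0.15, 0.30]
ir = incidence_rate(uk_rates)

print(f"incidence rate: {ir:.3%}")
print("starts needed (unrounded rate):", starts_needed(1000, ir))
print("starts needed (rate rounded to 0.3%):", starts_needed(1000, 0.003))
```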
State of the Art: Use Known Demographic Features
• Basic demographics are known: 100% incidence.
  o Age and London
• Smokers: 18%
• Tried to stop: 40%
Incidence rate: 1 * 0.18 * 0.4 = 7.2%
Sample size: about 13,900
Segments (diagram): Adults in the London area, Smokers, Tried to stop
New Direction: Use Known Features and Predict Unknown Features
• Basic demographics are known: 100% incidence.
  o Age and London
• Suppose we could predict smokers with 85% accuracy, so that 85% of the people predicted to be smokers actually smoke.
• Tried to stop is still unknown: 40%
Incidence rate: 1 * 0.85 * 0.4 = 34%
Sample size: about 2,900
Segments (diagram): Adults in the London area who are predicted to be smokers, Smokers, Tried to stop
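The 34% figure treats the 85% as the share of predicted smokers who really smoke (the predictor's precision). The sketch below, under assumptions that are not from the slides, shows how precision depends on the base rate: the sensitivity and specificity values are hypothetical, and `precision` and `targeted_incidence` are illustrative names.

```python
def precision(prevalence, sensitivity, specificity):
    """Share of predicted positives that are true positives (Bayes' rule)."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)

def targeted_incidence(predictor_precision, downstream_rates):
    """Incidence rate when only predicted positives are invited."""
    ir = predictor_precision
    for rate in downstream_rates:
        ir *= rate
    return ir

# Slide scenario: 85% of predicted smokers smoke, 40% of smokers tried to stop.
print(f"slide scenario: {targeted_incidence(0.85, [0.40]):.1%}")

# Hypothetical classifier: 85% sensitivity and specificity, 18% of adults smoke.
p = precision(prevalence=0.18, sensitivity=0.85, specificity=0.85)
print(f"precision: {p:.1%}, incidence: {targeted_incidence(p, [0.40]):.1%}")
```

With these hypothetical values, a classifier that is right 85% of the time on both classes yields a precision well below 85%, so the gain in incidence rate depends on which notion of accuracy the predictor achieves.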
How to Predict Features? The Space Model
Sparse matrix containing all the attributes (integer answers to questions) we have ever asked.
Rows: users (User 1, User 2, User 3, User 4, ...), 10^9 users
Columns: features, 10^7 features
• Demographic attributes: Sex, Age, Region, etc.
• Behavioral attributes: Shirt color (Red / Blue), Smokes? (Yes / No)
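One way to picture a sparse user-by-feature matrix is a store that keeps only the answers that exist. The sketch below is an in-memory illustration, not the storage design from the talk; the class name `SparseAnswers`, the feature names, and the integer answer codes are all assumptions.

```python
from collections import defaultdict

class SparseAnswers:
    """User x feature matrix that stores only the answers actually collected."""

    def __init__(self):
        self._rows = defaultdict(dict)  # user_id -> {feature: answer}
        self._cols = defaultdict(dict)  # feature -> {user_id: answer}

    def set(self, user_id, feature, answer):
        # Keep both orientations so row and column lookups are cheap.
        self._rows[user_id][feature] = answer
        self._cols[feature][user_id] = answer

    def get(self, user_id, feature):
        # None means the question was never asked or never answered.
        return self._rows.get(user_id, {}).get(feature)

    def users_with(self, feature, answer):
        return {u for u, a in self._cols.get(feature, {}).items() if a == answer}

# Hypothetical integer codes: smokes 1 = yes, 0 = no; shirt_color 0 = red, 1 = blue.
answers = SparseAnswers()
answers.set("user1", "smokes", 1)
answers.set("user1", "shirt_color", 0)
answers.set("user2", "shirt_color", 1)

print(answers.get("user1", "smokes"))   # stored answer
print(answers.get("user2", "smokes"))   # missing cell
print(answers.users_with("smokes", 1))
```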
The Learning Task - The Model
Try to predict the answer to the “Smokes?” attribute based on other attributes.
Example attributes: Smokes? (the target), Dog owner?, Jogger?, Overweight?
The Learning Task - Collaborative Filtering
User correlation or Feature correlation
User correlation: High-level features [William Cohen]
• If Josie and Bob both have the X feature, then if Josie has the Y feature, Bob is more likely to have the Y feature as well.
• Dog owners
• Political inclination, Taste, Lifestyle
Feature correlation:
• If Josie has the X feature, Josie is more likely to also have the Y feature.
• Joggers (y) and Smokers (n)
• Favorite sports and Race/Ethnicity
• Income level and Education level
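The two flavors of correlation can be sketched as follows. This is a toy illustration under stated assumptions, not the model from the talk: users are plain dictionaries of integer answers, the similarity measure (share of co-answered features with the same answer) is one simple choice among many, and all names and data are made up.

```python
from collections import defaultdict

def user_similarity(a, b, exclude=None):
    """Share of co-answered features on which two users gave the same answer."""
    shared = [f for f in a if f in b and f != exclude]
    if not shared:
        return 0.0
    return sum(a[f] == b[f] for f in shared) / len(shared)

def predict_user_based(target_user, others, feature):
    """User correlation: similarity-weighted vote among users who answered `feature`."""
    votes = defaultdict(float)
    for other in others:
        if feature in other:
            votes[other[feature]] += user_similarity(target_user, other, exclude=feature)
    if not votes or max(votes.values()) == 0.0:
        return None  # nobody similar enough to borrow an answer from
    return max(votes, key=votes.get)

def conditional_rate(users, given, given_value, target, target_value):
    """Feature correlation: estimate P(target == target_value | given == given_value)."""
    matching = [u for u in users if u.get(given) == given_value and target in u]
    if not matching:
        return None
    return sum(u[target] == target_value for u in matching) / len(matching)

josie = {"dog_owner": 1, "jogger": 0, "smokes": 1}
ann = {"dog_owner": 0, "jogger": 1, "smokes": 0}
bob = {"dog_owner": 1, "jogger": 0}  # "smokes" is unknown for Bob

print(predict_user_based(bob, [josie, ann], "smokes"))
print(conditional_rate([josie, ann, bob], "jogger", 1, "smokes", 0))
```

Bob agrees with Josie on every shared feature, so the user-based vote borrows her answer; the feature-based estimate instead asks how often joggers in the data are non-smokers.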
First Experiments with Multiple Imputation
Smaller task: complete missing data on a single survey for a single customer.
Example: On a specific survey, some respondents skip some of the questions, such as the income level question. Use the answers provided by other respondents to impute the missing data.
Imputation: Complete missing data with substituted values, with more or less sophistication: mean, nearest neighbor, multiple imputation, etc. [Andridge & Little 2011], [Rubin 1987], ...
Implementation: IBM SPSS Missing Values module, which uses an iterative Markov chain Monte Carlo (MCMC) method and multiple imputation.
Used by the US Census Bureau.
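To make the simpler end of the imputation spectrum concrete, here is a sketch of two single-imputation strategies: filling with the most common answer, and copying the answer of the most similar respondent. It is not the MCMC-based multiple imputation of the SPSS module; the function names, the similarity count, and the sample rows are assumptions for illustration.

```python
from collections import Counter

def impute_mode(rows, feature):
    """Fill missing `feature` values with the most common observed answer."""
    observed = [r[feature] for r in rows if r.get(feature) is not None]
    if not observed:
        return [dict(r) for r in rows]
    mode = Counter(observed).most_common(1)[0][0]
    return [dict(r, **{feature: mode}) if r.get(feature) is None else dict(r)
            for r in rows]

def impute_nearest(rows, feature):
    """Copy the answer of the respondent who agrees on the most other answers."""
    donors = [r for r in rows if r.get(feature) is not None]

    def closeness(a, b):
        return sum(1 for k, v in a.items()
                   if k != feature and v is not None and b.get(k) == v)

    filled_rows = []
    for r in rows:
        filled = dict(r)
        if r.get(feature) is None and donors:
            best = max(donors, key=lambda d: closeness(r, d))
            filled[feature] = best[feature]
        filled_rows.append(filled)
    return filled_rows

# Hypothetical survey rows; None marks a skipped income question.
rows = [
    {"age_band": 2, "education": 3, "income_band": 3},
    {"age_band": 2, "education": 3, "income_band": 3},
    {"age_band": 1, "education": 1, "income_band": 1},
    {"age_band": 1, "education": 1, "income_band": None},
]

print(impute_mode(rows, "income_band")[3])
print(impute_nearest(rows, "income_band")[3])
```

On this toy data the two strategies disagree: the most common income band overall is 3, while the respondent most similar to the one with the missing answer reported band 1.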
First Experiments with Multiple Imputation: Some Results
Where it does not work:
• Too much missing data (over 10%)
• Too many possible answers (what are the names of your children? what is your home city? etc.)
• Not enough data overall (fewer than 1,000)
Examples of features that work well: Dog owners, Smokers, Income level, Age (3 bands), etc.
Accuracy: 85% using blind tests.
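A blind test of this kind can be sketched as: hide a share of the known answers, predict them from the remaining data, and count how many predictions match. The exact protocol used in the experiments is not described in the slides, so the hide fraction, the seed, and the baseline `mode_predictor` below are assumptions.

```python
import random
from collections import Counter

def blind_test(rows, feature, predictor, hide_fraction=0.1, seed=0):
    """Hide some known answers, predict them, and return the share predicted correctly."""
    known = [i for i, r in enumerate(rows) if r.get(feature) is not None]
    if not known:
        raise ValueError("no known answers to test against")
    rng = random.Random(seed)
    hidden = rng.sample(known, max(1, int(len(known) * hide_fraction)))
    masked = [dict(r) for r in rows]
    for i in hidden:
        masked[i][feature] = None  # the predictor never sees the true answer
    correct = sum(predictor(masked, i, feature) == rows[i][feature] for i in hidden)
    return correct / len(hidden)

def mode_predictor(rows, index, feature):
    """Baseline predictor: always answer with the most common observed value."""
    observed = [r[feature] for r in rows if r.get(feature) is not None]
    return Counter(observed).most_common(1)[0][0] if observed else None

# Hypothetical data: 18 non-smokers and 2 smokers.
rows = [{"smokes": 0} for _ in range(18)] + [{"smokes": 1} for _ in range(2)]
print(blind_test(rows, "smokes", mode_predictor))
```

Any imputation method can be plugged in as `predictor`, which makes it straightforward to compare a candidate against the majority-answer baseline on the same hidden cells.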
Current Work
Currently working on the storage component on AWS using HBase, Elasticsearch, and Hadoop.
Some queries:
• Find people who smoke, have a red shirt, and are between 22 and 34.
• Compute and store the similarity or correlation between any pair of users.
• Compute and store the similarity between features.
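The queries above can be illustrated with an in-memory stand-in. This sketch does not use the HBase or Elasticsearch APIs; the data layout, the function names, and the choice of Jaccard overlap as the feature similarity are assumptions made for illustration.

```python
def find_users(users, **criteria):
    """Return ids of users matching every criterion.

    A criterion is either an exact value or a (low, high) inclusive range.
    """
    def matches(attrs):
        for name, wanted in criteria.items():
            value = attrs.get(name)
            if value is None:
                return False  # unknown answers never match
            if isinstance(wanted, tuple):
                low, high = wanted
                if not (low <= value <= high):
                    return False
            elif value != wanted:
                return False
        return True

    return sorted(uid for uid, attrs in users.items() if matches(attrs))

def feature_similarity(users, feature_a, feature_b):
    """Jaccard overlap between the sets of users answering 1 (yes) to each feature."""
    has_a = {u for u, attrs in users.items() if attrs.get(feature_a) == 1}
    has_b = {u for u, attrs in users.items() if attrs.get(feature_b) == 1}
    union = has_a | has_b
    return len(has_a & has_b) / len(union) if union else 0.0

# Hypothetical users.
users = {
    "u1": {"smokes": 1, "jogger": 0, "shirt_color": "red", "age": 25},
    "u2": {"smokes": 1, "jogger": 0, "shirt_color": "blue", "age": 30},
    "u3": {"smokes": 0, "jogger": 1, "shirt_color": "red", "age": 40},
    "u4": {"smokes": 1, "jogger": 0, "shirt_color": "red", "age": 50},
}

print(find_users(users, smokes=1, shirt_color="red", age=(22, 34)))
print(feature_similarity(users, "smokes", "jogger"))
```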
Future Work
• Define the model: binary features (smokes), integer features (number of children, income), string features (city, diseases, etc.).
• Experiment on a large scale with collaborative filtering algorithms and others.
• Experiment with user-based and feature-based filtering (blend? Slope One?)
• Integrate this into the targeting methodology.
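Since Slope One is named as a candidate, here is a minimal sketch of the weighted Slope One predictor applied to ordinal survey answers. It illustrates the general algorithm only, not anything implemented at Toluna; the feature names and answer values are invented.

```python
from collections import defaultdict

def slope_one_fit(answers):
    """answers: {user: {feature: value}}.

    Returns the average deviation dev[j][i] = mean(value_j - value_i) over users
    who answered both, and the number of such users count[j][i].
    """
    diff_sum = defaultdict(lambda: defaultdict(float))
    count = defaultdict(lambda: defaultdict(int))
    for features in answers.values():
        for j, vj in features.items():
            for i, vi in features.items():
                if i != j:
                    diff_sum[j][i] += vj - vi
                    count[j][i] += 1
    dev = {j: {i: diff_sum[j][i] / count[j][i] for i in diff_sum[j]}
           for j in diff_sum}
    return dev, count

def slope_one_predict(user_features, target, dev, count):
    """Weighted Slope One: average (dev + known value), weighted by co-answer counts."""
    numerator, denominator = 0.0, 0
    for i, vi in user_features.items():
        n = count.get(target, {}).get(i, 0)
        if i != target and n:
            numerator += (dev[target][i] + vi) * n
            denominator += n
    return numerator / denominator if denominator else None

# Hypothetical ordinal answers on a 1-5 scale.
answers = {
    "john": {"likes_travel": 5, "likes_sports": 3, "likes_cooking": 2},
    "mark": {"likes_travel": 3, "likes_sports": 4},
    "lucy": {"likes_sports": 2, "likes_cooking": 5},
}
dev, count = slope_one_fit(answers)
print(slope_one_predict(answers["lucy"], "likes_travel", dev, count))
```

Slope One suits ordinal or numeric answers; binary and string features from the model definition above would need a different treatment, such as the vote-based approaches.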
Q&A
Suggestions? Ideas? Comments? Questions?