UIUC Reflections 2015: Data Science and Your Financial Journey
Data Science and Your Financial Journey
10/3/2015
Winnie Cheng, Chief Data Scientist
@Bankrate.com
What is Bankrate.com?
EMPOWERS YOU TO MANAGE $ FOR YOUR LIFE GOALS
Image Source: http://www.blue-fs.co.uk/
Credit Cards
Mortgage
Auto Loans
Investments
Insurance
Retirement
Faces behind Bankrate.com
HQ in sunny Palm Beach, Florida
Offices in NYC, SF and major US Cities
500+ Employees
Where does Data Science come in?
EVERYWHERE.
1. Information Relevancy: what you want to know, when you want to know it
2. Search & Reachability: get the word out there
3. Marketplace Intelligence: connect what you need to who we know
SVM
Anomaly Detection
User Intent and Segmentation: Who are you?
PREDICTING YOUR LIFE STAGE
• Determine: Are you looking to buy a home or plan for retirement?
• User segmentation divides visitors into groups with similar characteristics
• What do we know?
– For each user, the articles read on our site
• Group users with similar preferences into a segment
– Machine learning can help here
• Compute user-to-user similarity
• Apply a hierarchical clustering algorithm
Distance = 1 - similarity
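The segmentation steps on this slide can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the slides do not say which similarity measure or linkage was used, so Jaccard similarity over articles read and average linkage are assumptions, and the user ids, article ids and the 0.6 threshold are hypothetical.

```python
from itertools import combinations

# Hypothetical reading history: user -> set of article ids read on the site.
reads = {
    "u1": {"mortgage-101", "first-home", "closing-costs"},
    "u2": {"first-home", "closing-costs", "home-inspection"},
    "u3": {"401k-basics", "ira-vs-roth", "social-security"},
    "u4": {"ira-vs-roth", "social-security", "annuities"},
}

def jaccard(a, b):
    """Similarity in [0, 1]: shared articles over all articles read by either user."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def distance(u, v):
    # The slide's definition: distance = 1 - similarity.
    return 1.0 - jaccard(reads[u], reads[v])

def average_linkage(c1, c2):
    """Mean pairwise distance between the members of two clusters."""
    return sum(distance(u, v) for u in c1 for v in c2) / (len(c1) * len(c2))

def cluster(users, threshold):
    """Agglomerative clustering: merge the closest pair until none is within threshold."""
    clusters = [frozenset([u]) for u in users]
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: average_linkage(*p))
        if average_linkage(c1, c2) > threshold:
            break
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return sorted(sorted(c) for c in clusters)

segments = cluster(list(reads), threshold=0.6)
print(segments)
```

With this toy data the home-related readers (u1, u2) and the retirement-related readers (u3, u4) end up in separate segments. At production scale a library implementation would be the more likely choice.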
Visualizing User Segments
Nodes: Users
Edges: User-User Similarity
Visualizing User Segments
Car Buyers
Tax
Retirement
Home Purchase
Finding a specific user segment
TURNING THE PROBLEM AROUND
• The previous approach is useful for seeing how users naturally form groups
• But what if we want to find a very specific group of users?
– E.g., those who are looking to buy a home for the first time?
• Find a list of relevant articles with item-to-item similarity
– Start with a few articles on first-time home buying
– The similarity model helps identify more articles relevant to first-time home buying
• People who read X also read Y
• If users read any of the relevant articles, they are in the group
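A minimal sketch of this "turn the problem around" approach: grow a seed list of articles by item-to-item similarity, then put every user who read any of them into the segment. The similarity measure (Jaccard overlap of readerships), the 0.5 cutoff, and all ids are assumptions for illustration, not details from the slides.

```python
# Hypothetical reading history: user -> set of article ids.
reads = {
    "u1": {"first-home", "closing-costs"},
    "u2": {"first-home", "closing-costs", "fha-loans"},
    "u3": {"fha-loans", "closing-costs"},
    "u4": {"401k-basics", "ira-vs-roth"},
    "u5": {"ira-vs-roth"},
}

def readers_of(article):
    return {u for u, articles in reads.items() if article in articles}

def item_similarity(a, b):
    """'People who read X also read Y': Jaccard overlap of the two readerships."""
    ra, rb = readers_of(a), readers_of(b)
    union = ra | rb
    return len(ra & rb) / len(union) if union else 0.0

def expand_seeds(seeds, min_similarity):
    """Add every article that is similar enough to at least one seed article."""
    all_articles = set().union(*reads.values())
    related = {
        a for a in all_articles - seeds
        if any(item_similarity(a, s) >= min_similarity for s in seeds)
    }
    return seeds | related

def segment(relevant_articles):
    """A user is in the segment if they read any relevant article."""
    return sorted(u for u, articles in reads.items() if articles & relevant_articles)

relevant = expand_seeds({"first-home"}, min_similarity=0.5)
first_time_buyers = segment(relevant)
print(sorted(relevant), first_time_buyers)
```

Starting from the single seed "first-home", the sketch adds "closing-costs" and places u1, u2 and u3 in the segment. The cutoff controls how far the list spreads from the seeds.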
Moneyball: Site Optimization Framework
ALGO-DRIVEN IMPROVEMENT OF SITE
• Product Managers are constantly improving the look-and-feel of our site, and Digital Content teams work hard to produce informative articles and engaging videos.
Moneyball – Decide what works, what doesn’t
• How to determine whether this headline is better than another variation?
• How to serve the ‘best’ headline or design variation in a timely manner?
• How to do this with thousands of components changing at the same time?
Or “Rates are going up with more home buyers”?
Multi-Armed Bandit Problem
• Want to pick and serve the variations that have the highest user engagement (click-through rate, CTR)
• Analogous to an abstract class of math problem: the Multi-Armed Bandit
• Slot machines in a casino
– Which machine (arm) to pull next to maximize my chance of hitting the jackpot without going broke?
Casino to Site Optimization
Slot Machines -> Design Variations
Pull It -> Show It (Page Impression)
Jackpot -> User Click
Cost to Pull -> Cost of showing a ‘bad’ variation and losing the click
Bayesian Bandit
• Algorithm for the multi-armed bandit problem that minimizes regret
• Key idea:
– For each variation, estimate and model the distribution of its CTR
– Pick the next variation by sampling from these distributions and taking the one with the highest sampled CTR
[Figure: probability distributions over CTR for two variations; x-axis: CTR, y-axis: Probability]
W1: 56.37%, W2: 43.63%
W1: 9.82%, W2: 90.18%
Steady state CTR
Variation 1: 25%, Variation 2: 30%
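The key idea can be illustrated with a small simulation. This is a sketch, not the production system: it assumes a Beta distribution with a uniform prior as the CTR model for each variation (a common choice usually called Thompson sampling; the slides do not name the distribution), and it simulates user clicks using the steady-state CTRs of 25% and 30% from the slide. The function name and parameters are hypothetical.

```python
import random

def thompson_sampling(true_ctrs, impressions, seed=0):
    """Simulate a Bayesian bandit with a Beta posterior per variation."""
    rng = random.Random(seed)
    clicks = [0] * len(true_ctrs)
    shows = [0] * len(true_ctrs)
    for _ in range(impressions):
        # Sample a plausible CTR for each variation from Beta(1 + clicks, 1 + misses).
        samples = [
            rng.betavariate(1 + clicks[i], 1 + shows[i] - clicks[i])
            for i in range(len(true_ctrs))
        ]
        # Show the variation whose sampled CTR is highest.
        chosen = samples.index(max(samples))
        shows[chosen] += 1
        # Simulated user: clicks with the variation's true CTR (unknown to the algorithm).
        if rng.random() < true_ctrs[chosen]:
            clicks[chosen] += 1
    return shows, clicks

shows, clicks = thompson_sampling(true_ctrs=[0.25, 0.30], impressions=20000)
weights = [s / sum(shows) for s in shows]  # share of impressions per variation
print(shows, clicks, weights)
```

Early on, both variations are shown often because their posteriors overlap. As evidence accumulates, the share of impressions typically shifts toward the variation with the higher CTR, which is how the cost of showing a ‘bad’ variation is kept low.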
Big Data Stack to Support Real-Time Streaming
Full-feedback tech stack
• Designed a low-latency big data platform from the ground up
• Worked closely with data engineering team
• Ensure 24/7 availability and scalability to multiple data centers
Search Engine Optimization (SEO)
• Great algorithms and technology to provide relevant information to users
• But how do users find Bankrate.com?
– A portion of traffic comes from major search engines
• The SEO team makes our site easily reachable from these engines
• Data Science helps the SEO team:
– Understand the connectivity of pages within our site
– Assess the PageRank of URLs relative to each other
– Perform path analysis and identify dead links
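As a rough illustration of two of these analyses, the sketch below finds dead links and computes PageRank by power iteration over a toy internal link graph. It is plain Python rather than a graph database, and the URLs, damping factor and iteration count are assumptions chosen for the example.

```python
# Hypothetical internal link graph: page -> pages it links to.
links = {
    "/": ["/mortgages", "/credit-cards"],
    "/mortgages": ["/", "/mortgages/rates"],
    "/credit-cards": ["/", "/old-promo"],
    "/mortgages/rates": ["/mortgages"],
}

def dead_links(links):
    """Link targets that are not known pages of the site."""
    return sorted({t for targets in links.values() for t in targets if t not in links})

def pagerank(links, damping=0.85, iterations=50):
    """Power iteration over live pages; links to unknown pages are ignored."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, targets in links.items():
            live = [t for t in targets if t in links]
            if not live:
                # Dangling page: spread its rank evenly over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                for t in live:
                    new_rank[t] += damping * rank[page] / len(live)
        rank = new_rank
    return rank

broken = dead_links(links)
ranks = pagerank(links)
print(broken)
print(sorted(ranks.items(), key=lambda kv: -kv[1]))
```

Here "/old-promo" is reported as a dead link, and the home page and "/mortgages" rank above the pages that receive fewer internal links.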
Network Graph with Neo4j
Content Tagging
IMPROVE CATEGORIZATION OF ARTICLES
• Having accurate tags on articles is important, as they serve as inputs for content recommendation, other machine learning algorithms, and internal data reporting and visualization
• Approach:
– Apply supervised learning to suggest which tags should be associated with a new article based on the words in it
– Text is messy; several Natural Language Processing (NLP) techniques are used to get a good set of words
• Stemming, Term Frequency Filtering, etc.
• Topic modeling to visualize processed text
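The text clean-up step can be sketched as follows. The stemmer here is a deliberately crude suffix stripper standing in for a real one (such as the Porter stemmer), and the frequency filter is interpreted as a document-frequency filter that drops terms that are too rare or too common. Both choices, the thresholds and the sample headlines are assumptions for illustration.

```python
import re
from collections import Counter

# Hypothetical mini-corpus standing in for site articles.
articles = [
    "The best mortgage rates for home buyers",
    "The retirement savings plan for investing",
    "The investing guide for retirees",
    "The cost of buying a home",
]

SUFFIXES = ("ing", "s")  # crude stand-in for a real stemmer

def stem(word):
    """Strip one common suffix, keeping at least three leading characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def tokenize(text):
    return [stem(w) for w in re.findall(r"[a-z]+", text.lower())]

def filter_by_document_frequency(docs, min_docs=2, max_fraction=0.5):
    """Keep terms that are neither too rare nor present in too many documents."""
    tokenized = [tokenize(d) for d in docs]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    keep = {
        t for t, df in doc_freq.items()
        if df >= min_docs and df / len(docs) <= max_fraction
    }
    return [[t for t in tokens if t in keep] for tokens in tokenized]

processed = filter_by_document_frequency(articles)
print(processed)
```

On this toy corpus the filter removes words that appear everywhere ("the", "for") and words that appear only once, leaving "home" and "invest" as the distinguishing terms.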
Topic Modeling
LATENT DIRICHLET ALLOCATION (LDA) CLUSTERING ALGORITHM (# TOPICS=10)
Tag Suggestion as Classification Problem
SHOULD WE TAG THIS ARTICLE WITH KEYWORD X?
• Train a classification algorithm (RandomForest, SGD)
– Present it with articles and how each is tagged
Post-Processed Text of Article
addit isnt mile particularli knock increasingli fear suffer 24 whose privat group monitor monitor monitor monitor monitor expos mother had state better certainli deviat trauma must senat norm woman woman around familiar watch watch solut know fall fall shadow surviv requir choic enabl mother-in-law $59 benefit she she quick bone still old mental replac $100000 see cost cost video home home home home home home home 80 genworth said clock pattern away abl abl abl abl figur figur estim health testifi told plu patient situat base let tub stay sinc care care care care comparison last testimoni my technolog technolog surround fell place unabl committe long-term chang think first first live point schedul independ elder elder alreadi famili famili famili nurs nurs nurs nurs save save fee commun univers regist system system system system system system system long by stuck reli privaci privaci privaci monthli devast immedi devis tell tell friend friend door big hundr electr ridden phone broke $9 instant $3 he 10 october insignific hour possibl provid older bed guilt didnt aliv featur comput almost certain deliv stroke lie engin floor polic other frail imag arent insur normal assist assist reach someon alert alert healthi moment aros medicaid physic billion billion wife professor no spent issu issu issu medicar missouri
Actual Tags: retired, investing
Predicted Tags: retired, investing
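One way to read "should we tag this article with keyword X?" is as one binary classifier per tag. The sketch below trains such a classifier for a single tag with stochastic gradient descent (SGD) on a bag of stemmed words. It assumes a logistic (log-loss) model, which the slides do not specify, and the tiny training set, learning rate and epoch count are hypothetical.

```python
import math
import random

# Hypothetical training data: (post-processed tokens, 1 if tagged "retired" else 0).
training = [
    (["retir", "save", "medicar", "nurs"], 1),
    (["elder", "care", "retir", "insur"], 1),
    (["medicar", "elder", "save"], 1),
    (["mortgag", "rate", "home", "loan"], 0),
    (["credit", "card", "rate", "reward"], 0),
    (["auto", "loan", "rate"], 0),
]

def predict_proba(weights, bias, tokens):
    """Logistic model over a bag of words: sigmoid of the summed token weights."""
    score = bias + sum(weights.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-score))

def train_sgd(examples, epochs=200, learning_rate=0.1, seed=0):
    """Stochastic gradient descent on log loss, one example at a time."""
    rng = random.Random(seed)
    weights, bias = {}, 0.0
    examples = list(examples)
    for _ in range(epochs):
        rng.shuffle(examples)
        for tokens, label in examples:
            # Gradient of the log loss with respect to the score.
            error = predict_proba(weights, bias, tokens) - label
            bias -= learning_rate * error
            for t in tokens:
                weights[t] = weights.get(t, 0.0) - learning_rate * error
    return weights, bias

weights, bias = train_sgd(training)
new_article = ["elder", "nurs", "care", "medicar"]
should_tag = predict_proba(weights, bias, new_article) >= 0.5
print(should_tag)
```

Repeating this for each candidate tag gives a set of suggested tags for a new article, which can then be compared with the tags editors actually assigned.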
Bankrate.com a Trusted Marketplace
CLICK FRAUD DETECTION – HOW TO ENSURE LENDERS ARE PROTECTED?
• Due diligence in onboarding new lenders
How to ensure lenders are protected from malicious traffic and click bots?
• Fraud detection models as semi-supervised machine learning
– Identify outliers
– Construct initial training set
– Train model to predict whether a given click is fraudulent or not
• Some observations:
– Foreign countries, site crawl
– Intent inconsistencies
Users (Home Buyers)
Banks & Lenders
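The three semi-supervised steps can be sketched end to end on toy data. Everything specific here is an assumption made for illustration: the two features (clicks per minute and pages viewed before clicking), the median/MAD outlier rule with a cutoff of 5, and the nearest-centroid model used in place of whatever classifier was actually trained.

```python
from statistics import median

# Hypothetical per-visitor features: (clicks per minute, pages viewed before clicking).
clicks = {
    "v1": (1.0, 5), "v2": (0.8, 7), "v3": (1.2, 4), "v4": (0.9, 6),
    "v5": (1.1, 5), "v6": (30.0, 1), "v7": (25.0, 1),
}

def robust_outliers(values, cutoff=5.0):
    """Flag keys whose value is far from the median, measured in MAD units."""
    med = median(values.values())
    mad = median(abs(v - med) for v in values.values()) or 1e-9
    return {k for k, v in values.items() if abs(v - med) / mad > cutoff}

# Step 1: identify outliers on click rate.
suspects = robust_outliers({k: rate for k, (rate, _) in clicks.items()})

# Step 2: construct an initial training set, treating outliers as fraud seeds.
labeled = [(features, k in suspects) for k, features in clicks.items()]

def centroid(rows):
    return tuple(sum(col) / len(rows) for col in zip(*rows))

# Step 3: train a very simple model (nearest centroid) on the seed labels.
fraud_center = centroid([f for f, is_fraud in labeled if is_fraud])
legit_center = centroid([f for f, is_fraud in labeled if not is_fraud])

def is_fraudulent(features):
    """Predict fraud if the click is closer to the fraud centroid than the legit one."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return sq_dist(features, fraud_center) < sq_dist(features, legit_center)

print(sorted(suspects), is_fraudulent((20.0, 1)), is_fraudulent((1.0, 6)))
```

In practice the seed labels would be reviewed before training, and signals such as the observations listed on the slide (traffic origin, crawl-like behavior, intent inconsistencies) would feed a richer feature set.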
Assessing Lead Quality and Demand
TAKING IT FURTHER
Model-driven insights:
Can we assess how likely a user is to get a mortgage approval? Suggest remedial actions?
Can we predict CTR and conversion from market conditions and site dynamics?
Can we anticipate demand for specific financial products? (e.g. refinance season)
… and more, to improve your financial journey
Users (Home Buyers)
Banks & Lenders