Analyzing Large-Scale User Data with Hadoop and HBase
WibiData, Inc.
Aaron Kimball – CTO
We can now collect more data than at any time in history.
Yesterday’s engineering challenge: Fitting the problem into the hardware.
Today’s constrained resource is understanding.
How do we best apply data to better serve our users?
The best products are user-centric
• Intuitive UI
• Continuously learning
– Guided search
– Smarter recommendations
• More effective service
What are we building toward?
Requirements
1. Understand the user population
2. Respond to users in real time
3. Support graceful data evolution
Large-scale data science is hard
• What does a user look like?
– What data is available about the user?
– Which features are important?
– Which features are correlated?
• How do I model this in MapReduce?
• How do I serve results in a timely fashion?
Tools of the trade
• Store all data about a user in one place
• Support real-time get/put, as well as MapReduce
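The two bullets above can be illustrated with a toy, in-memory sketch. It is not the HBase API: the class name UserTable, the column names, and the timestamps are all hypothetical, chosen only to show one row per user, versioned cells, single-row get/put, and a full scan of the kind a batch job would consume.

```python
from collections import defaultdict


class UserTable:
    """Toy in-memory stand-in for a wide-row store keyed by user ID (hypothetical)."""

    def __init__(self):
        # row key -> column name -> list of (timestamp, value), newest first
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, user_id, column, value, timestamp):
        """Write one timestamped cell into a user's row."""
        cells = self._rows[user_id][column]
        cells.append((timestamp, value))
        cells.sort(key=lambda cell: cell[0], reverse=True)

    def get(self, user_id, column, max_versions=1):
        """Return up to max_versions of the newest values in one user's column."""
        cells = self._rows.get(user_id, {}).get(column, [])
        return [value for _, value in cells[:max_versions]]

    def scan(self):
        """Yield (user_id, columns) for every row in key order, as a batch job would read it."""
        for user_id in sorted(self._rows):
            yield user_id, {col: list(cells) for col, cells in self._rows[user_id].items()}


# Everything about one user lives in a single row.
table = UserTable()
table.put("user-42", "info:plan", "premium", timestamp=1)
table.put("user-42", "history:watched", "show-a", timestamp=2)
table.put("user-42", "history:watched", "show-b", timestamp=3)

latest_two = table.get("user-42", "history:watched", max_versions=2)
all_rows = list(table.scan())
```

Keeping every column for a user under one row key is what lets the same data serve both a single-user lookup and a scan over the whole population.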
Tools of the trade
• Use complex data types to model complex data
• Support extending data models over time
• Retain support for legacy systems using older models
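A minimal sketch of what graceful evolution could look like, assuming each reader declares the fields it expects along with a default for each. The schemas, field names, and the resolve helper are hypothetical; the idea is similar in spirit to the reader/writer schema resolution found in serialization systems such as Apache Avro, though the slide does not name the tool.

```python
import copy

# Each schema version maps field name -> default value (hypothetical fields).
SCHEMA_V1 = {"name": None, "zip_code": None}
SCHEMA_V2 = {"name": None, "zip_code": None, "favorite_genres": []}


def resolve(record, reader_schema):
    """Project a stored record onto the schema the reader expects.

    Fields the reader wants but the record lacks get the reader's default;
    fields the reader does not know about are ignored.
    """
    resolved = {}
    for field, default in reader_schema.items():
        if field in record:
            resolved[field] = record[field]
        else:
            # Copy the default so callers never share a mutable value.
            resolved[field] = copy.deepcopy(default)
    return resolved


old_record = {"name": "Ada", "zip_code": "94105"}
new_record = {"name": "Ada", "zip_code": "94105", "favorite_genres": ["drama"]}

# A newer application reading data written under the old model.
upgraded = resolve(old_record, SCHEMA_V2)
# A legacy application reading data written under the new model.
downgraded = resolve(new_record, SCHEMA_V1)
```

Because resolution happens at read time, old records do not need to be rewritten when the model grows, and legacy readers keep working against newer data.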
Tools of the trade
• Abstract computational model away from MapReduce
• Support computation over all users… or one user at a time
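One way to read these two bullets: write the analysis once as a function of a single user's data, then let the framework decide whether to run it over every row or over just one. The sketch below assumes that reading; the function names and the sample data are hypothetical, and a plain dict stands in for the user table.

```python
from collections import Counter


def recommend_genre(columns):
    """Per-user computation: the most frequently watched genre, or None."""
    watched = columns.get("watched_genres", [])
    if not watched:
        return None
    counts = Counter(watched)
    # Sort first so ties resolve to the alphabetically first genre.
    return max(sorted(counts), key=counts.get)


def run_for_one(table, user_id, compute):
    """Real-time path: apply the computation to a single user's row."""
    return compute(table.get(user_id, {}))


def run_for_all(table, compute):
    """Batch path: apply the same computation to every row.

    In a real deployment this loop would be the map phase of a MapReduce job.
    """
    return {user_id: compute(columns) for user_id, columns in table.items()}


users = {
    "alice": {"watched_genres": ["drama", "comedy", "drama"]},
    "bob": {"watched_genres": ["sports"]},
    "carol": {},
}

one_result = run_for_one(users, "alice", recommend_genre)
all_results = run_for_all(users, recommend_genre)
```

The analysis code never mentions mappers or reducers, so the same function can back a nightly batch run and an on-demand request for one user.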
For set-top boxes
• Viewing/recording history
• Libraries, Device and User Analysis
• Personalized offers and recommendations
• Analysis for product roadmap
• Tech support portal
• Improved reports for advertisers
The future
• More personalization
• Adaptive UIs (self-arranging dashboards)
• Targeted content and ads
• More effective customer service
Conclusions
• Applications are becoming increasingly user-centric
• Data drives this capability, but harnessing it requires a new distributed architecture
• The biggest challenge is allowing data scientists to effectively leverage the data
www.wibidata.com / @wibidata Aaron Kimball – [email protected]