Analyzing Large-Scale User Data with Hadoop and HBase

Post on 21-Nov-2014

WibiData's presentation on personalization and large-scale user data at Structure:Data 2012

WibiData, Inc.

Aaron Kimball – CTO

We can now collect more data than at any time in history.

Yesterday’s engineering challenge: Fitting the problem into the hardware.

Today’s constrained resource is understanding.

How do we best apply data to better serve our users?

The best products are user-centric

• Intuitive UI

• Continuously learning

– Guided search

– Smarter recommendations

• More effective service

What are we building toward?

Requirements

1. Understand the user population

2. Respond to users in real time

3. Support graceful data evolution

Large-scale data science is hard

• What does a user look like?

– What data is available about the user?

– Which features are important?

– Which features are correlated?

• How do I model this in MapReduce?

• How do I serve results in a timely fashion?

Tools of the trade

• Store all data about a user in one place

• Support real-time get/put, as well as MapReduce
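As a rough illustration of these two bullets, the sketch below models a table with one row per user that serves both single-row get/put and a full scan of the kind a batch job would consume. It is a toy in-memory stand-in, not the HBase API; the class name `UserTable`, the column names, and the sample values are all hypothetical.

```python
from collections import defaultdict


class UserTable:
    """Toy in-memory stand-in for a row-per-user table (not the HBase API)."""

    def __init__(self):
        # row key (user id) -> column name -> list of (timestamp, value) cells
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, user_id, column, timestamp, value):
        """Real-time write: append one timestamped cell to the user's row."""
        self._rows[user_id][column].append((timestamp, value))

    def get(self, user_id, column):
        """Real-time read: newest value of one column for one user, or None."""
        cells = self._rows.get(user_id, {}).get(column, [])
        return max(cells, key=lambda cell: cell[0])[1] if cells else None

    def scan(self):
        """Batch read: yield every (user_id, row) pair, as a batch job's input would."""
        for user_id, row in self._rows.items():
            yield user_id, row


table = UserTable()
table.put("user-1", "info:email", 1, "a@example.com")
table.put("user-1", "activity:page_view", 2, "/home")
table.put("user-1", "activity:page_view", 5, "/pricing")
table.put("user-2", "activity:page_view", 3, "/home")

# Real-time path: one user, one column.
latest = table.get("user-1", "activity:page_view")

# Batch path: the same rows, visited for every user.
views_per_user = {
    user_id: len(row.get("activity:page_view", []))
    for user_id, row in table.scan()
}
```

Because everything known about a user lives under one row key, the real-time path and the batch path read the same data; only the access pattern differs.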

Tools of the trade

• Use complex data types to model complex data

• Support extended data models over time

• Retain support for legacy systems using older models
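One way to picture graceful data evolution is a reader that projects each stored record onto the schema version it understands, filling in defaults for fields that older writers never produced. The sketch below is a plain-Python approximation of that idea, in the spirit of schema-resolution systems such as Avro; it does not use any serialization library, and the schemas, field names, and `resolve` helper are assumptions made for illustration.

```python
# Hypothetical schema versions for a user record. Every field carries a
# default, so a reader can fill in fields the writer did not know about.
SCHEMA_V1 = {"name": None, "email": None}
SCHEMA_V2 = {"name": None, "email": None, "zip_code": "unknown"}


def resolve(record, reader_schema):
    """Project a stored record onto the reader's schema.

    Fields missing from the record take the reader's default; fields the
    reader does not know about are dropped.
    """
    return {
        field: record.get(field, default)
        for field, default in reader_schema.items()
    }


# A record written under v1 and a record written under v2.
old_record = {"name": "Ada", "email": "ada@example.com"}
new_record = {"name": "Bo", "email": "bo@example.com", "zip_code": "94107"}

upgraded = resolve(old_record, SCHEMA_V2)  # newer reader, older data
legacy = resolve(new_record, SCHEMA_V1)    # legacy reader, newer data
```

Here `upgraded` gains a `zip_code` of `"unknown"`, while `legacy` simply never sees the new field, so neither the old data nor the old reader has to be rewritten when the model is extended.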

Tools of the trade

• Abstract computational model away from MapReduce

• Support computation over all users… or one user at a time
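The idea of writing a computation once and running it over all users or over a single user can be sketched as a per-user function plus two thin drivers. This is only an illustration in plain Python, not WibiData's programming model or the MapReduce API; `favorite_genre`, `run_for_one`, `run_for_all`, and the sample rows are hypothetical.

```python
def favorite_genre(user_row):
    """Per-user computation: most frequent genre in one user's viewing history."""
    counts = {}
    for genre in user_row.get("views", []):
        counts[genre] = counts.get(genre, 0) + 1
    # Ties are broken alphabetically so the result is deterministic.
    return min(counts, key=lambda g: (-counts[g], g)) if counts else None


def run_for_one(rows, user_id, fn):
    """Real-time path: apply fn to a single user's row."""
    return fn(rows[user_id])


def run_for_all(rows, fn):
    """Batch path: apply the same fn to every user (a map-only job in spirit)."""
    return {user_id: fn(row) for user_id, row in rows.items()}


rows = {
    "user-1": {"views": ["drama", "comedy", "drama"]},
    "user-2": {"views": ["news"]},
    "user-3": {"views": []},
}

one = run_for_one(rows, "user-1", favorite_genre)
everyone = run_for_all(rows, favorite_genre)
```

The analyst writes only `favorite_genre`; whether it runs inside a batch job or in response to one request is a deployment decision rather than a rewrite.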

For set-top boxes

• Viewing/recording history

• Improved reports for advertisers

• Personalized offers and recommendations

• Libraries Device and User Analysis

• Analysis for product roadmap

• Tech support portal

The future

• More personalization

• Adaptive UIs (self-arranging dashboards)

• Targeted content, ads

• More effective customer service

Conclusions

• Applications are becoming increasingly user-centric

• Data drives this capability, but harnessing it requires a new distributed architecture

• The biggest challenge is allowing data scientists to effectively leverage the data

www.wibidata.com / @wibidata Aaron Kimball – aaron@wibidata.com