Analyzing Large-Scale User Data with Hadoop and HBase

WibiData's presentation on personalization and large-scale user data at Structure:Data 2012

WibiData, Inc.

Aaron Kimball – CTO

We can now collect more data than at any time in history.

Yesterday’s engineering challenge: Fitting the problem into the hardware.

Today’s constrained resource is understanding.

How do we best apply data to better serve our users?

The best products are user-centric

• Intuitive UI

• Continuously learning

– Guided search

– Smarter recommendations

• More effective service

What are we building toward?

Requirements

1. Understand the user population

2. Respond to users in real time

3. Support graceful data evolution

Large-scale data science is hard

• What does a user look like?

– What data is available about the user?

– Which features are important?

– Which features are correlated?

• How do I model this in MapReduce?

• How do I serve results in a timely fashion?

Tools of the trade

• Store all data about a user in one place

• Support real-time get/put, as well as MapReduce
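A minimal sketch of the "one row per user" idea on this slide, in Python. It is an in-memory stand-in, not the HBase client API: the UserTable class, its put/get/scan methods, and the column name used below are all hypothetical. The point is only that a single store keyed by user ID can serve both single-user lookups (get/put) and a full pass over every user (the kind of scan a MapReduce job would consume).

```python
from collections import defaultdict


class UserTable:
    """Toy in-memory table with one row per user (hypothetical, not HBase)."""

    def __init__(self):
        # user ID -> column name -> list of (timestamp, value), newest first
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, user_id, column, value, timestamp):
        """Write one timestamped cell into the user's row."""
        cells = self._rows[user_id][column]
        cells.append((timestamp, value))
        cells.sort(key=lambda cell: cell[0], reverse=True)

    def get(self, user_id, column):
        """Return the newest value for a column, or None if absent."""
        row = self._rows.get(user_id)
        if row is None or column not in row:
            return None
        return row[column][0][1]

    def scan(self):
        """Yield (user_id, row) for every user in key order, for batch jobs."""
        for user_id in sorted(self._rows):
            row = self._rows[user_id]
            yield user_id, {col: list(cells) for col, cells in row.items()}


if __name__ == "__main__":
    table = UserTable()
    table.put("user-001", "views:channel", "news", timestamp=1)
    table.put("user-001", "views:channel", "sports", timestamp=2)
    print(table.get("user-001", "views:channel"))  # single-user lookup
    for user_id, row in table.scan():               # full pass over all users
        print(user_id, row)
```

In a real deployment the row store would be distributed and the scan would be split across workers; this sketch keeps everything in one process to show the access pattern only.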

Tools of the trade

• Use complex data types to model complex data

• Support extended data models over time

• Retain support for legacy systems using older models
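One way to picture the data-evolution points on this slide is a reader that resolves stored records against its own version of the schema. The sketch below is an assumption for illustration, not the mechanism the talk describes: the schemas, field names, and the resolve function are hypothetical, and real systems usually delegate this to a serialization framework. Newer readers fill in defaults for fields that old records lack, and legacy readers ignore fields they do not know.

```python
import copy

REQUIRED = object()  # sentinel: field has no default and must be present

# Hypothetical schemas: field name -> default value (or REQUIRED).
SCHEMA_V1 = {"user_id": REQUIRED, "recordings": REQUIRED}
SCHEMA_V2 = {"user_id": REQUIRED, "recordings": REQUIRED, "favorite_genres": []}


def resolve(record, reader_schema):
    """Project a stored record onto the schema the reader understands."""
    resolved = {}
    for field, default in reader_schema.items():
        if field in record:
            resolved[field] = record[field]
        elif default is REQUIRED:
            raise KeyError(f"record is missing required field {field!r}")
        else:
            # Copy the default so readers never share a mutable value.
            resolved[field] = copy.deepcopy(default)
    # Fields in the record but not in reader_schema are dropped.
    return resolved


if __name__ == "__main__":
    old_record = {"user_id": "user-001", "recordings": ["show-a"]}
    new_record = {"user_id": "user-002", "recordings": [], "favorite_genres": ["drama"]}
    print(resolve(old_record, SCHEMA_V2))  # new reader, old data: default filled in
    print(resolve(new_record, SCHEMA_V1))  # legacy reader, new data: extra field ignored
```

The design choice here is that compatibility lives in the reader, so stored data never has to be rewritten when the model is extended.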

Tools of the trade

• Abstract computational model away from MapReduce

• Support computation over all users… or one user at a time
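The idea of writing a computation once and running it over all users or one user at a time can be sketched as follows. Everything here is hypothetical: the row layout, the top_channel function, and the two runner helpers are illustrative names, and the plain loop in run_for_all stands in for what would be a distributed MapReduce job in practice.

```python
from collections import Counter


def top_channel(row):
    """Per-user computation: the most-watched channel in one user's row."""
    counts = Counter(event["channel"] for event in row["views"])
    return counts.most_common(1)[0][0] if counts else None


def run_for_user(table, user_id, compute):
    """Real-time path: apply the computation to a single user's row."""
    return compute(table[user_id])


def run_for_all(table, compute):
    """Batch path: apply the same computation to every user's row."""
    return {user_id: compute(row) for user_id, row in table.items()}


if __name__ == "__main__":
    table = {
        "user-001": {"views": [{"channel": "news"}, {"channel": "news"},
                               {"channel": "sports"}]},
        "user-002": {"views": []},
    }
    print(run_for_user(table, "user-001", top_channel))
    print(run_for_all(table, top_channel))
```

Because top_channel only ever sees one row, the author of the computation does not need to know whether it is being called from a batch job or from a request handler.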

For set-top boxes

Viewing/recording history

Improved reports for advertisers

Personalized offers and recommendations

Libraries Device and User Analysis

Analysis for product roadmap

Tech support portal

The future

• More personalization

• Adaptive UIs (self-arranging dashboards)

• Targeted content, ads

• More effective customer service

Conclusions

• Applications are becoming increasingly user-centric

• Data drives this capability, but harnessing it requires a new distributed architecture

• The biggest challenge is allowing data scientists to effectively leverage the data

www.wibidata.com / @wibidata Aaron Kimball – [email protected]