Challenges and Opportunities in Data Mining: Big...

Challenges and Opportunities in Data Mining: Big Data, Predictive User Modeling, and Personalization. Bamshad Mobasher, Center for Web Intelligence, School of Computing, DePaul University, Chicago, Illinois, USA. April 20, 2012

Transcript of Challenges and Opportunities in Data Mining: Big...

Page 1: Challenges and Opportunities in Data Mining: Big … facweb.cs.depaul.edu/mobasher/Sagence-April20-2012.pdf

Challenges and Opportunities in Data Mining: Big Data, Predictive User Modeling, and Personalization

Bamshad Mobasher
Center for Web Intelligence
School of Computing, DePaul University
Chicago, Illinois, USA

April 20, 2012

Page 2:

Google Trends: Data Mining vs. Analytics

Page 3:

The Big Question?

Will data mining remain relevant? If so, how?

Quick survey: Do you think the amount of data available in the digital world
will decrease in the future?
will become less complex?

Where is the Life we have lost in living?
Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information?

-- T.S. Eliot, “The Rock”

Page 4:

How much data?
Google: ~20-30 PB a day
Wayback Machine has ~4 PB + 100-200 TB/month
Facebook: ~3 PB of user data + 25 TB/day
eBay: ~7 PB of user data + 50 TB/day
CERN’s Large Hadron Collider generates 15 PB a year
In 2010, enterprises stored 7 Exabytes = 7,000,000,000 GB

640K ought to be enough for anybody.

Page 5:

The Data Tsunami

McKinsey Global Institute Report: “Big Data: the next frontier for innovation, competition and productivity”

Page 6:

Big Data Value

McKinsey Global Institute Report: “Big Data: the next frontier for innovation, competition and productivity”


Page 9:

What’s Seen the Most Growth in 2008-2011

Types of Data
• Location / Geo / Mobile Data
• Music / Audio
• Social Media / Social Networks
• Time Series
• Images / Video
• User Profile data
• Text feeds / Micro-blog data

Types of Activities/Areas
• Search / Web content mining
• Text mining / opinion analysis
• Personalization / recommendation
• Social network / Social media analysis
• Topic modeling / micro-blog analysis
• Health informatics

Much of this growth is driven by end user mobile or Web-based applications
users are inundated with huge volume of complex information
need for more personalized intelligent applications

Page 10:

Personalization

The Problem
Dynamically serve customized content (pages, products, recommendations, etc.) to users based on their profiles, preferences, or expected interests

Why we need it?
Information spaces are becoming much more complex for users to navigate (huge online repositories, social networks, mobile applications, blogs, ….)
For businesses: need to grow customer loyalty / increase sales
Industry Research: successful online retailers are generating as much as 35% of their business from recommendations

Page 11:

Data Mining and Personalization

“Killer App” for data mining?
Tangible successes both in research and in industrial applications
recommender systems
personalized Web agents
user adaptive systems
Web marketing and eCRM
personalized search

Sophisticated modeling approaches based on both predictive and unsupervised DM techniques

Page 12:

Personalization: Common Approaches

Collaborative Filtering
Give recommendations to a user based on preferences of “similar” users

Content-Based Filtering
Give recommendations to a user based on items with “similar” content in the user’s profile

Rule-Based (Knowledge-Based) Filtering
Provide recommendations to users based on predefined (or learned) rules
age(x, 25-35) and income(x, 70-100K) and children(x, >=3) => recommend(x, Minivan)

Combined or Hybrid Approaches
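
The rule-based approach can be sketched in a few lines of Python. This is only an illustration of the minivan rule quoted on the slide, not the speaker's implementation: the `User` fields, the `minivan_rule` and `recommend` names, and the reading of the income range as thousands per year are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class User:
    age: int
    income_k: float  # annual income in thousands (assumed unit)
    children: int


def minivan_rule(x: User):
    """Encode: age(x, 25-35) and income(x, 70-100K) and children(x, >=3)."""
    if 25 <= x.age <= 35 and 70 <= x.income_k <= 100 and x.children >= 3:
        return "Minivan"
    return None


# Hypothetical rule base: each rule maps a user to an item or to None.
RULES = [minivan_rule]


def recommend(x: User):
    """Return the items recommended by every rule that fires for user x."""
    return [item for rule in RULES if (item := rule(x)) is not None]
```

Keeping each rule as a plain function makes it easy to mix predefined rules with rules learned offline, since both only need to return an item or nothing.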

Page 13:

The Recommendation Task

Basic formulation as a prediction problem

Given a profile P_u for a user u, and a target item i_t, predict the preference score of user u on item i_t

Typically, the profile P_u contains preference scores by u on some other items, {i_1, …, i_k} different from i_t

preference scores on i_1, …, i_k may have been obtained explicitly (e.g., movie ratings) or implicitly (e.g., time spent on a product page or a news article)
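
One way to picture the profile of user u is as a mapping from items to preference scores, filled from both explicit and implicit signals. The sketch below is a hedged illustration: the function names, the 1-5 score range, and the 300-second cap used to turn time on a page into a score are all assumptions, not something stated in the talk.

```python
def implicit_score(seconds_on_page, cap=300.0, lo=1.0, hi=5.0):
    """Map time spent on a page to a score in [lo, hi] (illustrative rule only)."""
    fraction = min(max(seconds_on_page, 0.0), cap) / cap
    return lo + fraction * (hi - lo)


def build_profile(explicit_ratings, dwell_times):
    """Build the profile: item -> preference score; explicit ratings take precedence."""
    profile = {item: implicit_score(t) for item, t in dwell_times.items()}
    profile.update(explicit_ratings)
    return profile


def target_items(profile, catalog):
    """Items that are not in the profile and so still need a predicted score."""
    return [item for item in catalog if item not in profile]
```

The recommendation task is then to estimate a score for each element returned by `target_items`.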

Page 14:

The Recommendation Task

Content-Based Recommendation
Predictions for unseen (target) items are computed based on their similarity (in terms of content) to items in the user profile

Collaborative Recommendation
Predictions for unseen (target) items are computed based on other users with similar interest scores on items in user u’s profile
i.e. users with similar tastes (aka “nearest neighbors”)
requires computing correlations between user u and other users according to interest scores or ratings
k-nearest-neighbor (knn) strategy
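
The k-nearest-neighbor strategy can be sketched as follows. This is a minimal, assumed variant (Pearson correlation over co-rated items, mean-centered weighted average over positively correlated neighbors); the slides name the strategy but do not fix these details, and the function names are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two profiles over their co-rated items."""
    common = [i for i in a if i in b]
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = sqrt(sum((a[i] - mean_a) ** 2 for i in common)) * sqrt(
        sum((b[i] - mean_b) ** 2 for i in common)
    )
    return num / den if den else 0.0


def predict(profiles, u, target, k=2):
    """Predict u's score on target from the k most similar users who rated it."""
    neighbors = sorted(
        ((pearson(profiles[u], p), v) for v, p in profiles.items()
         if v != u and target in p),
        reverse=True,
    )[:k]
    # Keep only neighbors whose tastes correlate positively with u's.
    neighbors = [(s, v) for s, v in neighbors if s > 0]
    if not neighbors:
        return None
    mean_u = sum(profiles[u].values()) / len(profiles[u])
    num = sum(
        s * (profiles[v][target] - sum(profiles[v].values()) / len(profiles[v]))
        for s, v in neighbors
    )
    return mean_u + num / sum(s for s, _ in neighbors)
```

Here `profiles` is assumed to map each user to a dictionary of item scores. Returning `None` when no positively correlated neighbor has rated the target leaves the fallback choice to the caller.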

Page 15:

Content-Based Recommender Systems

Page 16:

Content-Based Recommenders: Personalized Search

How can the search engine determine the “user’s intent”?

Query: “Madonna and Child”

Need to “learn” the user profile:
User is an art historian?
User is a pop music fan?
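
A learned profile can disambiguate such a query by re-ranking results according to their content similarity to the profile. The sketch below assumes a bag-of-words profile and cosine similarity; the function names and the sample texts in the usage note are hypothetical and do not describe any particular search engine.

```python
from collections import Counter
from math import sqrt


def term_vector(text):
    """Bag-of-words term counts for a piece of text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rerank(results, profile_text):
    """Order search results by similarity to the text in the user's profile."""
    profile = term_vector(profile_text)
    return sorted(results, key=lambda r: cosine(term_vector(r), profile), reverse=True)
```

With a profile such as "renaissance painting fresco museum", a result about a Renaissance painting would rank above one about a pop album; a profile built from pop music terms would reverse the order.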

Page 17:

Content-Based Recommenders:: more examples

Music recommendations
Play list generation

Example: Pandora

Page 18:

Collaborative Recommender Systems

Page 19:

Collaborative Recommender Systems

Page 20:

Collaborative Recommender Systems

Page 21:

Personalization Based on User Behavior Data: Data Mining Approach

Typically an Offline Process

Data Preparation / Modeling Phase
Implicit or explicit user preference data (clickthroughs, ratings, purchases, reviews)
Content & Structure
Domain Knowledge
Data Preprocessing: Data Cleaning, Data Integration, Data Transformation, Event Model Generation, Sessionization
User Transaction / Preference Database

Pattern Discovery Phase
Data Mining: User Segmentation, Item Clustering / Similarity, User/Item Classification, Correlation Analysis, Association Rule Mining, Sequential Pattern Mining
Pattern Analysis: Pattern Filtering, Aggregation, Characterization
Patterns
Aggregate User Models
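
As an illustration of one preprocessing step named in the diagram, sessionization can be sketched as grouping each user's events into sessions separated by periods of inactivity. The event layout `(user, timestamp_seconds, item)` and the 30-minute timeout are assumptions for the example, not details given in the talk.

```python
def sessionize(events, timeout=1800):
    """Split (user, timestamp_seconds, item) events into per-user sessions.

    A new session starts when the gap since the user's previous event
    exceeds `timeout` seconds (30 minutes here, an assumed threshold).
    """
    sessions = []
    last_seen = {}  # user -> (timestamp of last event, index of open session)
    for user, ts, item in sorted(events, key=lambda e: (e[0], e[1])):
        prev = last_seen.get(user)
        if prev is None or ts - prev[0] > timeout:
            sessions.append((user, [item]))
            index = len(sessions) - 1
        else:
            index = prev[1]
            sessions[index][1].append(item)
        last_seen[user] = (ts, index)
    return sessions
```

The resulting `(user, items)` pairs are the kind of transaction records that the pattern discovery phase would mine.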

Page 22:

Personalization Based on User Behavior Data: Data Mining Approach

Online Process

Diagram components: Client Application, Web Server, Active Session <user, item1, item2, …>, Stored User Profile, Integrated User Profile, Aggregate User Models, Domain Knowledge, Recommendation Engine, Recommendations / Predictions
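
One possible reading of the online step is that the recommendation engine matches the active session against patterns mined offline. The sketch below assumes those patterns are association rules stored as `(antecedent_items, consequent_item, confidence)` triples; that representation and the function name are hypothetical, chosen only to make the idea concrete.

```python
def recommend_online(active_session, rules, top_n=3):
    """Score candidate items using rules whose antecedent the session satisfies.

    `rules` is a list of (antecedent_items, consequent_item, confidence)
    triples, assumed to have been mined offline.
    """
    seen = set(active_session)
    scores = {}
    for antecedent, consequent, confidence in rules:
        # A rule fires when all its antecedent items are in the session
        # and it would recommend something the user has not already seen.
        if consequent not in seen and set(antecedent) <= seen:
            scores[consequent] = max(scores.get(consequent, 0.0), confidence)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]
```

Because the expensive mining happens offline, the online step reduces to a scan over the stored rules for each active session.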

Page 23:

New Challenges

Context-Awareness
Can systems understand user’s context, situation, current intentions?
Need to understand “task” being performed; user’s environment, domain knowledge/characteristics; short-term and long-term preferences

Integrating Domain Knowledge
Most current modeling approaches focus on the discovery of “shallow” patterns
DM + Domain Knowledge (DM + AI) => intelligent apps that can reason about / explain patterns

Page 24:

New Challenges

Security / Trust / Reputation
Many user adaptive systems vulnerable to malicious manipulation (e.g., “shilling”)
Need more robust algorithms and ways to detect malicious profiles
In social systems the notion of “reputation” becomes critical

Serendipity
Most predictive models not necessarily the best
Need the ability to “surprise” or provide novelty

Big Data Challenges
Questions of scale require new frameworks and algorithms
Wide variation in user behaviors requires more sophisticated models (e.g., matrix factorization, hybrid / ensemble models)
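
Matrix factorization, named above as one of the more sophisticated models, can be sketched as learning a small vector of latent factors per user and per item so that their dot product approximates the observed ratings. The version below is a minimal stochastic gradient descent sketch; the hyperparameters (`k`, `lr`, `reg`, `epochs`) and function names are assumptions for illustration, not values from the talk.

```python
import random


def rmse(ratings, P, Q):
    """Root-mean-square error of dot(P[u], Q[i]) against observed ratings."""
    err = [(r - sum(a * b for a, b in zip(P[u], Q[i]))) ** 2 for u, i, r in ratings]
    return (sum(err) / len(err)) ** 0.5


def factorize(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=500, seed=0):
    """Fit user factors P and item factors Q by stochastic gradient descent."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            e = r - sum(a * b for a, b in zip(P[u], Q[i]))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (e * qi - reg * pu)
                Q[i][f] += lr * (e * pu - reg * qi)
    return P, Q
```

Ratings are assumed to be `(user_index, item_index, score)` triples. A predicted score for an unseen pair is the dot product of the corresponding rows of `P` and `Q`.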

Page 25:

Challenges:: Problems of Scale

Page 26:

New Opportunities:: Social Annotation Systems

Page 27:

Amazon Example: Tags describe the Resource

• Tags can describe
• The resource (genre, actors, etc)
• Organizational (toRead)
• Subjective (awesome)
• Ownership (abc)
• etc

Page 28:

Tag Recommendation
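
Tag recommendation can be illustrated with a simple popularity baseline: suggest the tags that other users most often applied to the same resource. This is an assumed baseline for illustration, not the method presented in the talk, and the `(user, resource, tag)` triple layout and function name are hypothetical.

```python
from collections import Counter


def recommend_tags(posts, user, resource, top_n=3):
    """Suggest tags for (user, resource) from (user, resource, tag) triples.

    Tags that other users applied to the same resource are ranked by
    frequency; tags the user already applied to it are skipped.
    """
    already = {t for u, r, t in posts if u == user and r == resource}
    counts = Counter(
        t for u, r, t in posts if r == resource and u != user and t not in already
    )
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:top_n]]
```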

Page 29:

Example: Tags describe the user

These systems are “collaborative.”
Recommendation / Analytics based on the “wisdom of crowds.”

Rai Aren's profile
co-author “Secret of the Sands”

Page 30:

New Opportunities:: Social Recommendation

A form of collaborative filtering using social network data
User profiles represented as sets of links to other nodes (users or items) in the network
Prediction problem: infer a currently non-existent link in the network
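
The link prediction view can be sketched with a common-neighbors score: a node not yet linked to the user is ranked by how many of the user's neighbors link to it. Common neighbors is a standard baseline chosen here as an assumption; the slides do not name a specific scoring method, and the graph layout is hypothetical.

```python
def common_neighbor_scores(graph, user):
    """Score nodes not yet linked to `user` by the number of shared neighbors.

    `graph` maps each node (user or item) to the set of nodes it links to.
    """
    linked = graph.get(user, set())
    scores = {}
    for neighbor in linked:
        for candidate in graph.get(neighbor, set()):
            if candidate != user and candidate not in linked:
                scores[candidate] = scores.get(candidate, 0) + 1
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

The highest-scoring candidates are the currently non-existent links the system would predict, whether they point to other users or to items.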

Page 31:

Conclusions

Personalization and Recommendation Technologies
The killer app for predictive data analytics
Will drive the next generation of Web applications

Lots of new (and old) challenges
New: Social media and social networks provide new challenges and opportunities; big data challenges scalability and effectiveness of old algorithms
Old: scalability, sparsity, scrutability, serendipity

Promising new work:
New approaches to hybridization
Social media analytics
Context-aware recommendation / personalization