Recommender Systems
Sumir Chandra
The Applied Software Systems Laboratory, Rutgers University

Introduction

- Information overload makes decisions hard: too many domains, too little experience, too much data (books, movies, music, websites, articles, etc.)
- A recommender system provides recommendations to users based on the opinions and behaviors of others: efficient use of attention, better matches, non-obvious connections, and a reason for users to keep coming back
- Examples: e-commerce (Reel.com, Levi’s, eBay, Excite); commerce (call centers, direct marketing)

Introduction (contd.)

- Data sources: purchase data, browsing and searching data, feedback by users, text comments, expert recommendations
- Taxonomy:
  - text comments (expert/user reviews)
  - attribute-based (this author also wrote ...)
  - item-to-item correlation (people who bought this item also bought ...)
  - people-to-people correlation (users like you ...)
- Primary transformation: recommendation aggregation, or good matching between recommender and seeker

Correlations

Item-to-item correlation
- Connects users to items they may be unaware of
- Based on keywords or features of the object
- Key statistic (high/low): # people who bought A & B / # people who bought A
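
The key statistic above can be computed directly from purchase records. The following is a minimal Python sketch, assuming each user's purchases are available as a set of item identifiers; the function name `co_purchase_ratio` and the sample data are illustrative and do not come from the slides.

```python
def co_purchase_ratio(baskets, a, b):
    """Return (# people who bought a and b) / (# people who bought a).

    `baskets` maps a user id to the set of items that user bought.
    Returns 0.0 when nobody bought `a`.
    """
    bought_a = [items for items in baskets.values() if a in items]
    if not bought_a:
        return 0.0
    bought_both = sum(1 for items in bought_a if b in items)
    return bought_both / len(bought_a)


# Hypothetical purchase data, for illustration only.
baskets = {
    "u1": {"A", "B"},
    "u2": {"A"},
    "u3": {"A", "B", "C"},
    "u4": {"B"},
}
# Two of the three buyers of A also bought B.
ratio = co_purchase_ratio(baskets, "A", "B")
```

A high ratio suggests recommending B to buyers of A; note that the statistic is asymmetric, so the ratio for (B, A) is generally different.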

People-to-people correlation
- Collaborative filtering (CF)
- Assumes the user will
  - prefer what like-minded users prefer
  - prefer what dissimilar users dislike
- Objects are ranked by users
- CF methods: majority rules, nearest neighbor, weighted averages (prediction, S.D., covariance), +ve or -ve
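
One common way to realize the weighted-average idea is to weight each other user's opinion by the Pearson correlation between their ratings and the target user's. The sketch below is an assumption about how this could look, not the method used in any system named in the slides; the function names and the rating data are hypothetical.

```python
from math import sqrt


def pearson(ratings_u, ratings_v):
    """Pearson correlation over the items both users rated (0.0 if undefined)."""
    common = sorted(set(ratings_u) & set(ratings_v))
    if len(common) < 2:
        return 0.0
    mean_u = sum(ratings_u[i] for i in common) / len(common)
    mean_v = sum(ratings_v[i] for i in common) / len(common)
    cov = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    var_u = sum((ratings_u[i] - mean_u) ** 2 for i in common)
    var_v = sum((ratings_v[i] - mean_v) ** 2 for i in common)
    if var_u == 0 or var_v == 0:
        return 0.0
    return cov / sqrt(var_u * var_v)


def predict(ratings, user, item):
    """Predict `user`'s rating of `item`: the user's own mean rating plus a
    correlation-weighted average of other users' deviations from their means."""
    own = ratings[user]
    own_mean = sum(own.values()) / len(own)
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        w = pearson(own, theirs)
        other_mean = sum(theirs.values()) / len(theirs)
        num += w * (theirs[item] - other_mean)
        den += abs(w)
    return own_mean if den == 0 else own_mean + num / den


# Hypothetical ratings on a 1-5 scale.
ratings = {
    "ann": {"x": 5, "y": 1, "z": 4},
    "bob": {"x": 4, "y": 2, "z": 5, "w": 5},
    "cat": {"x": 1, "y": 5, "w": 1},
}
prediction = predict(ratings, "ann", "w")
```

Here `bob` correlates positively with `ann` and liked `w`, while `cat` correlates negatively and disliked it, so both push the prediction above `ann`'s own average. This is the sense in which a correlation can contribute whether it is positive or negative.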

Design Issues

Technical Design Space
- Content of evaluation: from a single bit to unstructured textual annotations; trade-off between ease of use and computational overhead
- Explicit/implicit evaluation: nature of the recommendation
- User identity: real names, pseudonyms, anonymous
- Evaluation aggregation: an open research area (weighted voting, content analyses, referral chains, etc.)
- Evaluation usage: filtering out negatives, sorting items according to numeric evaluations, display

Design Issues (contd.)

Domain space: characteristics of the items evaluated
- Domain to which the items belong
- Sheer volume: variable
- Lifetime: rate of gathering and distributing evaluations
- Cost structure: missing a good item, sampling a bad one, costs of incorrect decisions

Domain space: characteristics of participants and evaluations
- Set of recommenders
- Recommendation density: do recommenders tend to evaluate many items in common?
- Set of consumers
- Consumer taste variability: taste matching works better for a larger set; personalized aggregation works better when tastes differ

Design Issues (contd.)

Social Implications
- Free riders: take but do not give; mandatory evaluations, monetary incentives; weighted voting to avoid unfair evaluation; discourage the “vote early and often” phenomenon
- Privacy: information vs. privacy; privacy blends; attributed credit for recommendation efforts; blind refereeing as in the peer review system
- Advertisers: charge recipients through subscription or pay-per-use; advertiser support; charge owners of the evaluated media

Recommender System Types

- Collaborative/social-filtering system: aggregates consumers’ preferences and makes recommendations to other users based on similarity in behavioral patterns
- Content-based system: uses supervised machine learning to induce a classifier that discriminates between items that are interesting and uninteresting to the user
- Knowledge-based system: uses knowledge about users and products to reason about what meets the user’s requirements, using discrimination trees, decision support tools, or case-based reasoning (CBR)

Content-based Collaborative Information Filtering

- Research Assistant Agent Project (RAAP), Nagoya Institute of Technology, Japan
- Registration, research profile: bookmark database
- Interesting page -> agent suggestion -> classification -> reconfirm or change
- In parallel, the agent checks for newly classified bookmarks -> recommends to other users -> accept/reject on login
- Text categorization: positive/negative examples; the most similar classifier for the candidate class is found using term weighting with the TF-IDF scheme from Information Retrieval
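
To make the term-weighting step concrete, here is a minimal Python sketch of one common TF-IDF variant (raw term count times the log of inverse document frequency). The slides do not give RAAP's exact weighting formula, so this variant, the whitespace tokenizer, and the sample documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    weight = (count of term in document) * log(N / number of documents containing term)
    """
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors


# Hypothetical bookmark texts.
docs = [
    "agents recommend pages",
    "agents classify bookmarks",
    "users rate pages",
]
vectors = tf_idf(docs)
```

Terms that appear in fewer documents receive larger weights, and a term that appears in every document receives weight zero, which is why TF-IDF vectors are useful for telling classes of pages apart.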

Content-based Collaborative Information Filtering (contd.)

- Relevance feedback: positive/negative prototypes; the similarity measure is $\mathrm{sim}_t(c, D) = (Q_t^{+} \cdot D_t) - (Q_t^{-} \cdot D_t)$
- Feature selection: removal of non-informative terms using Information Gain (IG), based on the probability that a term is present
- Learning to recommend: the agent relies on two matrices, a user vs. category matrix (for successful classification) and each user’s confidence factor (0.1 to 1) with respect to other users, to compute correlation
- Circular references avoided: verify that the recommended document is not already registered in the target’s database
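
The relevance-feedback similarity can be sketched as the difference of two dot products, one against a class's positive prototype and one against its negative prototype. In the Python sketch below, sparse vectors are plain dictionaries; the helper names, the class labels, and the weights are hypothetical, and the subscript from the slide's formula is dropped for brevity.

```python
def dot(u, v):
    """Dot product of two sparse vectors stored as {term: weight} dicts."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def sim(q_pos, q_neg, doc):
    """sim(c, D) = (Q+ . D) - (Q- . D) for the prototypes of one class c."""
    return dot(q_pos, doc) - dot(q_neg, doc)


def classify(prototypes, doc):
    """Pick the class whose prototypes give the highest similarity to `doc`.

    `prototypes` maps class name -> (positive prototype, negative prototype).
    """
    return max(prototypes, key=lambda c: sim(*prototypes[c], doc))


# Hypothetical prototypes and document vector.
prototypes = {
    "ai": ({"agent": 1.0, "learning": 0.8}, {"recipe": 0.9}),
    "cooking": ({"recipe": 1.0, "oven": 0.7}, {"agent": 0.5}),
}
doc = {"agent": 0.6, "learning": 0.4}
best = classify(prototypes, doc)
```

Subtracting the negative-prototype term penalizes documents that resemble examples the user previously rejected for that class.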

Knowledge-based Systems

- FindMe technique: knowledge-based similarity retrieval
- User selects a source item -> requests similar items
- “Tweak” application: same, but the candidate set is filtered prior to sorting, leaving only candidates that satisfy the tweak
- Car Navigator: conversational interaction/navigation focused around high-level responses
- PickAFlick: multiple task-specific retrieval strategies
- RentMe: query set through menus; NLP used to generate the database
- Recommender Personal Shopper (RPS): a domain-independent implementation of the FindMe algorithm

Knowledge-based Systems (contd.)

- Similarity measures: goal-based, with priorities for goals
- Sorting algorithm: metric-based bucket sorting
- Retrieval algorithm: priority-ordered metric constraints, plus tweaks, forming an SQL query
- Product data: creation of a product database in which unique items are associated with sets of features
- Metrics: similarity metrics and directional metrics with preference
- Hybrid system: knowledge-based system combined with collaborative filtering
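
The retrieval flow described above (filter by tweak, then bucket-sort by metrics in priority order) can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the SQL query step is replaced by in-memory filtering, higher metric scores are treated as better, and the catalog, metric functions, and tweak are all hypothetical.

```python
from collections import defaultdict


def bucket_sort(candidates, metrics):
    """Order candidates by the first metric; break ties inside each bucket
    with the remaining, lower-priority metrics. Higher scores rank first."""
    if not metrics or len(candidates) <= 1:
        return list(candidates)
    first, rest = metrics[0], metrics[1:]
    buckets = defaultdict(list)
    for item in candidates:
        buckets[first(item)].append(item)
    ordered = []
    for score in sorted(buckets, reverse=True):
        ordered.extend(bucket_sort(buckets[score], rest))
    return ordered


def retrieve(catalog, source, metrics, tweak=None):
    """Filter by the tweak (if any), then bucket-sort by similarity to `source`."""
    candidates = [c for c in catalog if c is not source]
    if tweak is not None:
        candidates = [c for c in candidates if tweak(c, source)]
    # Bind the source item so each metric becomes a one-argument scorer.
    return bucket_sort(candidates, [lambda c, m=m: m(c, source) for m in metrics])


# Hypothetical product data: each item is associated with a set of features.
catalog = [
    {"name": "src", "cuisine": "thai", "price": 3},
    {"name": "a", "cuisine": "thai", "price": 2},
    {"name": "b", "cuisine": "thai", "price": 5},
    {"name": "c", "cuisine": "french", "price": 1},
]
source = catalog[0]


def same_cuisine(c, s):
    """Similarity metric: 1 if the cuisines match, else 0."""
    return int(c["cuisine"] == s["cuisine"])


def price_closeness(c, s):
    """Similarity metric: smaller price gaps score higher."""
    return -abs(c["price"] - s["price"])


def cheaper(c, s):
    """Directional tweak: keep only candidates cheaper than the source."""
    return c["price"] < s["price"]


similar = [c["name"] for c in retrieve(catalog, source, [same_cuisine, price_closeness])]
tweaked = [c["name"] for c in retrieve(catalog, source, [same_cuisine, price_closeness], cheaper)]
```

Because the tweak is applied before sorting, the tweaked result keeps the same priority ordering but contains only candidates that satisfy the constraint.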

Recommender Tradeoffs

| Technique | Pluses | Minuses |
| --- | --- | --- |
| Knowledge-based | A. No ramp-up required; B. Detailed qualitative preference feedback; C. Sensitive to preference changes | H. Knowledge engineering; I. Suggestion ability is static |
| Collaborative filtering | D. Can identify niches precisely; E. Domain knowledge not needed; F. Quality improves over time; G. Personalized recommendations | J. Quality dependent on large historical data set; K. Subject to statistical anomalies in data; L. Insensitive to preference changes |
| Ideal hybrid | A, B, C, D, F, G | H |

ARMaDA Recommender

- No single partitioning scheme performs best for all types of applications and systems
- The optimal partitioning technique depends on input parameters and application runtime state
- Partitioning behavior is characterized by the tuple {partitioner, application, computer system} (PAC)
- PAC quality is characterized by a 5-component metric: communication, load imbalance, data migration, partitioning time, partitioning overhead
- The octant approach characterizes application/system state
- Adaptive meta-partitioner -> fully dynamic PAC
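
As a rough illustration of how a 5-component quality metric could drive partitioner selection, the Python sketch below stores the five components in a small record and picks the partitioner with the lowest weighted sum. The weighted-sum rule, the assumption that lower is better for every component, the partitioner names `P1` and `P2`, and all numbers are hypothetical; the slides do not specify how ARMaDA combines the components.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class PacQuality:
    """The five metric components named above (assumed: lower is better)."""
    communication: float
    load_imbalance: float
    data_migration: float
    partitioning_time: float
    partitioning_overhead: float


def pick_partitioner(measured, weights):
    """Return the partitioner name with the lowest weighted sum of components.

    `measured` maps partitioner name -> PacQuality for one application/system state.
    `weights` is a PacQuality holding the assumed importance of each component.
    """
    def cost(quality):
        return sum(w * x for w, x in zip(astuple(weights), astuple(quality)))

    return min(measured, key=lambda name: cost(measured[name]))


# Made-up measurements for two hypothetical partitioners.
measured = {
    "P1": PacQuality(10.0, 5.0, 2.0, 1.0, 1.0),
    "P2": PacQuality(4.0, 9.0, 6.0, 1.0, 1.0),
}
communication_bound = PacQuality(1.0, 0.0, 0.0, 0.0, 0.0)
load_bound = PacQuality(0.0, 1.0, 0.0, 0.0, 0.0)

choice_comm = pick_partitioner(measured, communication_bound)
choice_load = pick_partitioner(measured, load_bound)
```

Changing the weights changes the winner, which mirrors the point that the best partitioner depends on the current application and system state.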

Dynamic Characterization

RM-3D Switching Test

- Richtmyer-Meshkov fingering instability in 3 dimensions
- The application trace has 51 time-step iterations
- RM-3D has more localized adaptation and lower activity dynamics
- Depending on the computer system, the RM-3D application resides in octants I and III for most of its execution
- Partitioning schemes pBD-ISP and G-MISP+SP are suited to these octants
- Application trace -> Partitioner -> Output trace -> Simulator -> metric measurements

RM-3D Switching Test (contd.)

Test Runs
- CGD: complete run
- pBD-ISP: complete run
- CGD+pBD-ISP_load (for improved load balance)
  - 0 – 12 -> CGD
  - 13 – 22 -> pBD-ISP
  - 23 – 26 -> CGD
  - 27 – 36 -> pBD-ISP
  - 37 – 48 -> CGD
  - 49 – 51 -> pBD-ISP
- CGD+pBD-ISP_data (for reduced data migration)
  - 0 – 10 -> CGD
  - 11 – 28 -> pBD-ISP
  - 29 – 34 -> CGD
  - 35 – 51 -> pBD-ISP
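
The two switching schedules above can be encoded as lookup tables keyed by the first time step of each interval. The Python sketch below is only an illustration of that encoding; the names `LOAD_SCHEDULE`, `DATA_SCHEDULE`, and `partitioner_at` are hypothetical, while the interval boundaries are transcribed from the test runs listed above.

```python
from bisect import bisect_right

# (first time step of interval, partitioner), transcribed from the test runs.
LOAD_SCHEDULE = [(0, "CGD"), (13, "pBD-ISP"), (23, "CGD"),
                 (27, "pBD-ISP"), (37, "CGD"), (49, "pBD-ISP")]
DATA_SCHEDULE = [(0, "CGD"), (11, "pBD-ISP"), (29, "CGD"), (35, "pBD-ISP")]
LAST_STEP = 51  # last time step listed in the schedules


def partitioner_at(schedule, step):
    """Return the partitioner that is active at time step `step`."""
    if not 0 <= step <= LAST_STEP:
        raise ValueError(f"step {step} is outside 0..{LAST_STEP}")
    starts = [start for start, _ in schedule]
    # The active interval is the last one whose start is <= step.
    return schedule[bisect_right(starts, step) - 1][1]


active = partitioner_at(LOAD_SCHEDULE, 13)
```

A driver replaying the application trace could call `partitioner_at` once per time step and invoke the corresponding partitioner, switching exactly at the listed boundaries.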

RM-3D Switching Test (contd.)

| Metric | CGD | pBD-ISP | CGD+pBD-ISP_load | CGD+pBD-ISP_data |
| --- | --- | --- | --- | --- |
| Avg. max. load imbalance | 18.9048 % | 37.9821 % | 34.749 % | 39.3693 % |
| Avg. avg. data movement | 127.275 | 18.3137 | 187.431 | 110.216 |
| Avg. avg. intra-level comm. | 1063.43 | 429.804 | 691.608 | 723.569 |
| Avg. avg. inter-level comm. | 451.49 | 0 | 265.882 | 127.667 |
| Avg. max. no. of boxes | 210.333 | 2.98039 | 16.9804 | 84.8824 |

Conclusions

- YES! Experimental results conform to theoretical observations
- Recommender systems in ARMaDA can result in performance optimization
- Future work
  - a more robust rule set and switching policies
  - partitioner/hierarchy optimization at switch points
  - integration of the recommender engine within ARMaDA
  - partitioner and application characterization research to form a policy rule base