Recommender Systems
Sumir Chandra
The Applied Software Systems Laboratory
Rutgers University
Introduction
- Information overload makes decisions hard: too many domains, less experience, too much data (books, movies, music, websites, articles, etc.)
- A recommender system provides recommendations to users based on the opinions/behaviors of others: efficient attention, better matches, non-obvious connections, keeping users coming back for more
- E.g.:
  - E-commerce: Reel.com, Levi's, eBay, Excite
  - Commerce: call centers, direct marketing
Introduction (contd.)
- Data sources: purchase data, browsing & searching data, feedback by users, text comments, expert recommendations
- Taxonomy:
  - text comments (expert/user reviews)
  - attribute based (this author also wrote ...)
  - item-to-item correlation (people who bought this item also bought ...)
  - people-to-people correlation (users like you ...)
- Primary transformation: recommendation aggregation, or good matching between recommender and seeker
Correlations
Item-to-item correlation
- Connects users to items they may be unaware of
- Based on keywords or features of the object
- Key statistic (high/low): # people who bought A & B / # people who bought A
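The key statistic above can be computed directly from purchase records. The following is a minimal sketch, assuming purchases are available as a mapping from buyer to the set of items bought; the function name `co_purchase_ratio` and the sample baskets are hypothetical.

```python
from collections import defaultdict
from itertools import permutations

def co_purchase_ratio(baskets):
    """Return {(a, b): (# who bought a and b) / (# who bought a)}."""
    bought = defaultdict(set)              # item -> set of buyer ids
    for buyer, items in baskets.items():
        for item in items:
            bought[item].add(buyer)
    ratios = {}
    for a, b in permutations(bought, 2):   # ordered pairs: the ratio is not symmetric
        both = len(bought[a] & bought[b])
        if both:
            ratios[(a, b)] = both / len(bought[a])
    return ratios

# Hypothetical purchase data: buyer -> set of items bought.
baskets = {
    "u1": {"A", "B"},
    "u2": {"A", "B", "C"},
    "u3": {"A"},
    "u4": {"B", "C"},
}
ratios = co_purchase_ratio(baskets)
print(ratios[("A", "B")])   # share of A's buyers who also bought B
```

A high ratio for (A, B) suggests recommending B to someone who has just bought A; a low ratio suggests the items are unrelated.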
People-to-people correlation
- Collaborative filtering (CF)
- Assumes the user will:
  - prefer what like-minded users prefer
  - prefer what dissimilar users dislike
- Object ranking by users
- CF techniques: majority rules, nearest neighbor, weighted averages (prediction, S.D., covariance); correlations may be positive or negative
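One common way to realize the weighted-average idea is to weight each neighbor's deviation from their own mean rating by their Pearson correlation with the target user. The sketch below assumes that formulation (the slides do not specify one); the ratings and user names are hypothetical.

```python
from math import sqrt

def pearson(ra, rb):
    """Pearson correlation over items both users rated; 0.0 if undefined."""
    common = set(ra) & set(rb)
    if len(common) < 2:
        return 0.0
    ma = sum(ra[i] for i in common) / len(common)
    mb = sum(rb[i] for i in common) / len(common)
    cov = sum((ra[i] - ma) * (rb[i] - mb) for i in common)
    va = sum((ra[i] - ma) ** 2 for i in common)
    vb = sum((rb[i] - mb) ** 2 for i in common)
    if va == 0 or vb == 0:
        return 0.0
    return cov / sqrt(va * vb)

def predict(ratings, user, item):
    """Predict a rating as the user's mean plus correlation-weighted deviations."""
    mean_u = sum(ratings[user].values()) / len(ratings[user])
    num = den = 0.0
    for other, r in ratings.items():
        if other == user or item not in r:
            continue
        w = pearson(ratings[user], r)      # may be positive or negative
        mean_o = sum(r.values()) / len(r)
        num += w * (r[item] - mean_o)
        den += abs(w)
    return mean_u if den == 0 else mean_u + num / den

# Hypothetical ratings on a 1-5 scale: user -> {item: rating}.
ratings = {
    "ann": {"x": 5, "y": 1, "z": 4},
    "bob": {"x": 4, "y": 2, "z": 5, "w": 5},
    "cat": {"x": 1, "y": 5, "w": 1},
}
print(predict(ratings, "ann", "w"))
```

Here "cat" is negatively correlated with "ann", so cat's low rating of "w" pushes ann's prediction up, just as bob's high rating does.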
Design Issues
Technical Design Space
- Content of evaluation: from a single bit to unstructured textual notations; ease of use, computation overload
- Explicit/implicit evaluation: nature of recommendation
- User identity: real names, pseudonyms, anonymous
- Evaluation aggregation: research area; weighted voting, content analyses, referral chains, etc.
- Evaluation usage: filtering out negatives, sorting of items according to numeric evaluations, display
Design Issues (contd.)
Domain-Space Characteristics of items evaluated
- Domain to which items belong
- Sheer volume (variable)
- Lifetime: rate of gathering and distributing evaluations
- Cost structure: missing a good item, sampling a bad one, costs of incorrect decisions
Domain-Space Characteristics of participants and evaluations
- Set of recommenders
- Recommendation density: do recommenders tend to evaluate many items in common?
- Set of consumers
- Consumer taste variability: taste matching works better for a larger set; personalized aggregation works better when tastes differ
Design Issues (contd.)
Social Implications
- Free riders: take but do not give; remedies include mandatory evaluation or monetary incentives; weighted voting to avoid unfair evaluation; discourage the "vote early and often" phenomenon
- Privacy: information vs. privacy; privacy blends; attributed credit for recommendation efforts; blind refereeing as in the peer review system
- Advertisers: charge recipients through subscription or pay-per-use; advertiser support; charge owners of the evaluated media
Recommender System Types
- Collaborative/social-filtering system: aggregates consumers' preferences and makes recommendations to other users based on similarity in behavioral patterns
- Content-based system: supervised machine learning is used to induce a classifier that discriminates between interesting and uninteresting items for the user
- Knowledge-based system: knowledge about users and products is used to reason about what meets the user's requirements, using discrimination trees, decision support tools, and case-based reasoning (CBR)
Content-based Collaborative Information Filtering
- Research Assistant Agent Project (RAAP), Nagoya Institute of Technology, Japan
- Registration, research profile: bookmark database
- Interesting page -> agent suggestion -> classification -> reconfirm or change
- In parallel, agent checks for newly classified bookmarks -> recommend to other users -> accept/reject on login
- Text categorization: positive/negative examples; most similar classifier for the candidate class using term weighting, with the TF-IDF scheme from Information Retrieval
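TF-IDF term weighting, as mentioned for text categorization, can be sketched as follows. This uses the textbook variant (raw term frequency times log(N/df)); the exact variant used in RAAP is not stated in the slides, and the sample documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights

# Hypothetical bookmark descriptions.
docs = [
    "agent recommends web pages",
    "agent classifies bookmarks",
    "users rate web pages",
]
w = tf_idf(docs)
print(w[0])
```

Terms that appear in few documents ("recommends") receive a higher weight than terms shared across documents ("agent"), which is what makes the weights useful for discriminating between classes.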
Content-based Collaborative Information Filtering (contd.)
- Relevance feedback: positive/negative prototypes; the similarity measure is $\mathrm{sim}_t(c, D) = (Q_t^{+} \cdot D_t) - (Q_t^{-} \cdot D_t)$
- Feature selection: removal of non-informative terms using Information Gain (IG), computed from the probability that a term is present
- Learning to recommend: the agent works with 2 matrices; a user vs. category matrix (for successful classification) and the user's confidence factor (0.1 to 1) w.r.t. other users, used to compute correlation
- Circular reference avoided: verify that the recommended document is not already registered in the target's database
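The relevance-feedback similarity (match against the positive prototype minus match against the negative prototype) can be sketched with sparse term-weight vectors. The class names, prototype weights, and document below are hypothetical, and the time index t is dropped for simplicity.

```python
def dot(u, v):
    """Dot product of two sparse vectors stored as {term: weight} dicts."""
    return sum(w * v[t] for t, w in u.items() if t in v)

def sim(q_pos, q_neg, d):
    """Similarity of document d to a class: (Q+ . D) - (Q- . D)."""
    return dot(q_pos, d) - dot(q_neg, d)

# Hypothetical prototypes per class: (positive prototype, negative prototype).
classes = {
    "recsys": ({"filtering": 1.0, "rating": 0.5}, {"compiler": 1.0}),
    "systems": ({"compiler": 1.0, "kernel": 0.8}, {"rating": 0.6}),
}
doc = {"filtering": 0.7, "rating": 0.4, "kernel": 0.1}

scores = {c: sim(q_pos, q_neg, doc) for c, (q_pos, q_neg) in classes.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

The document is assigned to the class with the highest score; terms that match a class's negative prototype actively count against it.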
Knowledge-based Systems
- FindMe technique: knowledge-based similarity retrieval
- User selects source item -> requests similar items
- "Tweak" application: same, but the candidate set is filtered prior to sorting, leaving only candidates satisfying the tweak
- Car Navigator: conversational interaction/navigation focused around high-level responses
- PickAFlick: multiple task-specific retrieval strategies
- RentMe: query menus set, NLP to generate database
- Recommender Personal Shopper (RPS): a domain-independent implementation of the FindMe algorithm
Knowledge-based Systems (contd.)
- Similarity measures: goal-based, priorities for goals
- Sorting algorithm: metric-based bucket sorting
- Retrieval algorithm: priority-ordered metric constraints, plus tweaks, forming an SQL query
- Product data: creation of a product database in which unique items are associated with sets of features
- Metrics: similarity, directional metrics with preference
- Hybrid system: knowledge-based system with collaborative filtering
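The retrieval flow described above (filter by the tweak, then rank by priority-ordered metrics) can be illustrated roughly as follows. This is a sketch, not the FindMe implementation: it uses Python's tuple-keyed sort as a stand-in for metric-based bucket sorting, works in memory instead of generating SQL, and all data, metric functions, and names are hypothetical.

```python
def retrieve(source, candidates, metrics, tweak=None):
    """Filter candidates by an optional tweak, then rank by priority-ordered metrics."""
    pool = [c for c in candidates if c is not source]
    if tweak is not None:
        pool = [c for c in pool if tweak(source, c)]   # filtering happens before sorting
    # Lexicographic key: the first metric dominates; later metrics only break ties.
    return sorted(pool, key=lambda c: tuple(m(source, c) for m in metrics))

def cuisine_distance(source, candidate):
    """0 if the cuisines match, 1 otherwise (smaller means more similar)."""
    return 0 if source["cuisine"] == candidate["cuisine"] else 1

def price_distance(source, candidate):
    """Absolute difference in price level."""
    return abs(source["price"] - candidate["price"])

def cheaper(source, candidate):
    """Directional tweak: keep only candidates cheaper than the source."""
    return candidate["price"] < source["price"]

# Hypothetical product data: each unique item has a set of features.
restaurants = [
    {"name": "Roma", "cuisine": "italian", "price": 3},
    {"name": "Napoli", "cuisine": "italian", "price": 2},
    {"name": "Kyoto", "cuisine": "japanese", "price": 2},
    {"name": "Lucca", "cuisine": "italian", "price": 1},
]
source = restaurants[0]
ranked = retrieve(source, restaurants, [cuisine_distance, price_distance], tweak=cheaper)
names = [r["name"] for r in ranked]
print(names)
```

Because cuisine has higher priority than price, both Italian candidates rank ahead of the Japanese one even though one of them is further away in price.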
Recommender Tradeoffs
Technique                Pluses                                   Minuses
Knowledge-based          A. No ramp-up required                   H. Knowledge engineering
                         B. Detailed qualitative preference       I. Suggestion ability is static
                            feedback
                         C. Sensitive to preference changes
Collaborative filtering  D. Can identify niches precisely         J. Quality dependent on large historical data set
                         E. Domain knowledge not needed           K. Subject to statistical anomalies in data
                         F. Quality improves over time            L. Insensitive to preference changes
                         G. Personalized recommendations
Ideal Hybrid             A, B, C, D, F, G                         H
ARMaDA Recommender
- No single partitioning scheme performs the best for all types of applications and systems
- Optimal partitioning technique depends on input parameters and application runtime state
- Partitioning behavior is characterized by the tuple {partitioner, application, computer system} (PAC)
- PAC quality is characterized by a 5-component metric: communication, load imbalance, data migration, partitioning time, partitioning overhead
- Octant approach characterizes application/system state
- Adaptive meta-partitioner -> fully dynamic PAC
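To make the 5-component PAC metric concrete, the sketch below stores the five components per partitioner and picks the partitioner with the lowest weighted cost. The weighted sum is an assumption made for illustration, not the policy ARMaDA uses; the partitioner names, measurements, and weights are all hypothetical.

```python
from dataclasses import dataclass, astuple

@dataclass(frozen=True)
class PacMetric:
    """The five PAC quality components (lower is better; values assumed normalized)."""
    communication: float
    load_imbalance: float
    data_migration: float
    partitioning_time: float
    partitioning_overhead: float

def cost(metric, weights):
    """Weighted sum of the five components (an assumed scoring rule)."""
    return sum(w * m for w, m in zip(weights, astuple(metric)))

def recommend(measurements, weights):
    """Return the partitioner name with the lowest weighted cost."""
    return min(measurements, key=lambda name: cost(measurements[name], weights))

# Hypothetical measurements for one application/system state.
measurements = {
    "partitioner_a": PacMetric(0.9, 0.2, 0.7, 0.3, 0.4),
    "partitioner_b": PacMetric(0.3, 0.6, 0.1, 0.2, 0.2),
}
balance_first = (0.1, 0.6, 0.1, 0.1, 0.1)      # emphasize load imbalance
migration_first = (0.1, 0.1, 0.6, 0.1, 0.1)    # emphasize data migration

print(recommend(measurements, balance_first))
print(recommend(measurements, migration_first))
```

Changing the weights changes the recommendation, which mirrors the point that no single partitioner is best for every application and system state.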
Dynamic Characterization
RM-3D Switching Test
- Richtmyer-Meshkov fingering instability in 3 dimensions
- Application trace has 51 time-step iterations
- RM-3D has more localized adaptation and lower activity dynamics
- Depending on the computer system, the RM-3D application resides in octants I and III for most of its execution
- Partitioning schemes pBD-ISP and G-MISP+SP are suited for these octants
- Application trace -> Partitioner -> Output trace -> Simulator -> metric measurements
RM-3D Switching Test (contd.)
Test Runs
- CGD: complete run
- pBD-ISP: complete run
- CGD+pBD-ISP_load (for improved load balance):
  0-12 -> CGD, 13-22 -> pBD-ISP, 23-26 -> CGD, 27-36 -> pBD-ISP, 37-48 -> CGD, 49-51 -> pBD-ISP
- CGD+pBD-ISP_data (for reduced data migration):
  0-10 -> CGD, 11-28 -> pBD-ISP, 29-34 -> CGD, 35-51 -> pBD-ISP
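The two switching schedules above can be written down as data and queried per time step. The step ranges are copied from the test runs; the function names `partitioner_at` and `switch_points` are hypothetical helpers, not part of ARMaDA.

```python
# Switching schedules from the test runs: (first step, last step, partitioner).
SCHEDULES = {
    "CGD+pBD-ISP_load": [
        (0, 12, "CGD"), (13, 22, "pBD-ISP"), (23, 26, "CGD"),
        (27, 36, "pBD-ISP"), (37, 48, "CGD"), (49, 51, "pBD-ISP"),
    ],
    "CGD+pBD-ISP_data": [
        (0, 10, "CGD"), (11, 28, "pBD-ISP"),
        (29, 34, "CGD"), (35, 51, "pBD-ISP"),
    ],
}

def partitioner_at(run, step):
    """Return the partitioner active at a given time step of a run."""
    for first, last, name in SCHEDULES[run]:
        if first <= step <= last:
            return name
    raise ValueError(f"step {step} is outside the schedule for {run}")

def switch_points(run):
    """Return the time steps at which the run switches partitioner."""
    return [first for first, _, _ in SCHEDULES[run][1:]]

print(partitioner_at("CGD+pBD-ISP_load", 24))
print(switch_points("CGD+pBD-ISP_data"))
```

The load-oriented run switches five times and the data-oriented run three times, so the two policies differ in how often a switch occurs as well as in which partitioner is active.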
RM-3D Switching Test (contd.)
Metric                        CGD        pBD-ISP    CGD+pBD-ISP_load  CGD+pBD-ISP_data
Avg. max. load imbalance      18.9048 %  37.9821 %  34.749 %          39.3693 %
Avg. avg. data movement       127.275    18.3137    187.431           110.216
Avg. avg. intra-level comm.   1063.43    429.804    691.608           723.569
Avg. avg. inter-level comm.   451.49     0          265.882           127.667
Avg. max. no. of boxes        210.333    2.98039    16.9804           84.8824
Conclusions
- YES!!! Experimental results conform to theoretical observations
- Recommender systems in ARMaDA can result in performance optimization
- Future work:
  - more robust rule-set and switching policies
  - partitioner/hierarchy optimization at switch-points
  - integration of recommender engine within ARMaDA
  - partitioner and application characterization research to form policy rule base