NoTube: Pattern-based Recommendations (part 3)
notubeproject
WP3: User profiling & Recommendation (Part 3)
BBC, Pro-netics, VUA
Wednesday, March 28, 12
Contents
26-27 March 2012, NoTube 3rd Review
Overview
User profiling
– General goal & approach
– From activity streams to profile
– Issues
– Analytics
– Beancounter
Recommendations
– General goal & approach
– Semantic recommendation
– Statistical recommendation
– Hybrid recommendation
Exploitation
Conclusions
Overview
[Architecture diagram. Components shown: EPG Metadata (BBC); TV Program Enrichment; RDF Graph of TV Programs; Semantic Content Patterns for TV Programs; Semantic Pattern-based Recommendation Strategy; User Ratings & Demographics (BBC EPG Data); User Data Analysis; Similarity Clusters of Programs; Statistical Similarity-based Recommendation Strategy; Hybrid Recommendation Strategy; Recommendation Service; Beancounter; End Users]
Statistical recommendations
• We had privileged access to two bulk user-ratings datasets from the BBC
• From these, we used the Apache Mahout toolkit to derive "item-to-item" similarity measures between each pair of items
• With the larger dataset (20k users) this worked well; with the smaller (1k) dataset, less well
• With the BBC, we are investigating publication of these behaviour-derived similarity measures
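The idea of deriving "item-to-item" similarity from user ratings can be sketched in a few lines of Python. This is an illustration only, not the Mahout pipeline used in the project: the toy ratings, the function names, and the choice of cosine similarity are all assumptions made for the example.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy ratings (user -> {programme: rating}); not BBC data.
ratings = {
    "u1": {"news": 5, "drama": 3, "quiz": 4},
    "u2": {"news": 4, "quiz": 5},
    "u3": {"drama": 5, "quiz": 1},
}

def item_vectors(ratings):
    """Invert user -> item ratings into item -> {user: rating}."""
    items = {}
    for user, prefs in ratings.items():
        for item, score in prefs.items():
            items.setdefault(item, {})[user] = score
    return items

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(a[u] * b[u] for u in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def item_similarities(ratings):
    """Return {(item_a, item_b): similarity} for every unordered item pair."""
    vectors = item_vectors(ratings)
    return {
        (a, b): cosine(vectors[a], vectors[b])
        for a, b in combinations(sorted(vectors), 2)
    }

sims = item_similarities(ratings)
for pair, score in sorted(sims.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

With so few users, each score rests on one or two shared raters, which is one plausible reading of why a smaller dataset gives weaker results.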
Hybrid models:
factual paths and statistical similarity
(and not to mention ‘@wossy’ is on Twitter with 1 million followers...)
Statistical recommendation
[Figure: example ratings matrix, labelled 20k and 12k]
Statistical recommendation
[Figure: the same ratings matrix with most entries shown as 0]
TV Preference Data is very sparse
• Even for a single service (e.g. Netflix), data is ‘overwhelmingly sparse’
• For NoTube’s open systems, challenges multiply:
– often no global view, only per-user data
– many ways of identifying the same content item
– many ways of identifying the same user
– never mind other entities (actors, directors, ...)
• Q: Can we tell a story about how organizations with such privileged overviews can contribute, in a privacy-respecting way, to the public commons of linked data? (A: yes! see WP4)
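To make "sparse" concrete, the share of missing entries in a user × programme matrix can be computed directly. The sketch below uses invented toy data and a hypothetical `sparsity` helper; it says nothing about the actual BBC or Netflix figures.

```python
def sparsity(ratings, n_users, n_items):
    """Fraction of the user x item matrix that has no rating."""
    observed = sum(len(prefs) for prefs in ratings.values())
    return 1.0 - observed / (n_users * n_items)

# Hypothetical toy data: 3 users, 4 programmes, 5 observed ratings.
ratings = {
    "u1": {"p1": 9, "p4": 9},
    "u2": {"p2": 8},
    "u3": {"p2": 9, "p4": 8},
}

value = sparsity(ratings, n_users=3, n_items=4)
print(f"{value:.1%} of the matrix is empty")
```

Real catalogues have far more programmes per user than this toy case, so the empty share is correspondingly larger.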
Fragmentation by site
Statistical recommendation: Process
• Build on best-in-class open-source code rather than reinvent
• Big-data ready (Hadoop-based)
• Of the various options, LogLikelihoodSimilarity generally gave the best results (using the standard "withhold some ratings" evaluation strategy)
• Other explorations included large-scale Twitter analysis (half a billion tweets), spectral clustering, use of demographics, ...
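For readers unfamiliar with log-likelihood similarity, the following is a minimal standalone sketch of the underlying statistic (Dunning's log-likelihood ratio over a 2×2 co-occurrence table). It is not Mahout's code: the function names are hypothetical, and the final mapping `1 - 1/(1 + LLR)` into the range [0, 1) is stated here as an assumption about how such a score is commonly normalised.

```python
from math import log

def llr(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio (G^2) for a 2x2 contingency table.

    k11: users who rated both items    k12: users who rated only item A
    k21: users who rated only item B   k22: users who rated neither
    """
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    cells = (
        (k11, rows[0], cols[0]), (k12, rows[0], cols[1]),
        (k21, rows[1], cols[0]), (k22, rows[1], cols[1]),
    )
    # Empty cells contribute nothing (the limit of k * log k as k -> 0).
    g2 = 2.0 * sum(k * log(k * n / (r * c)) for k, r, c in cells if k > 0)
    return max(g2, 0.0)  # guard against tiny negative rounding errors

def llr_similarity(users_a, users_b, n_users):
    """Similarity of two items from the sets of users who rated each."""
    k11 = len(users_a & users_b)
    k12 = len(users_a - users_b)
    k21 = len(users_b - users_a)
    k22 = n_users - k11 - k12 - k21
    return 1.0 - 1.0 / (1.0 + llr(k11, k12, k21, k22))

# Hypothetical audiences drawn from 4 users in total.
same_audience = llr_similarity({1, 2}, {1, 2}, n_users=4)
chance_overlap = llr_similarity({1, 2}, {2, 3}, n_users=4)
print(same_audience, chance_overlap)
```

Note that this measure uses only whether a user rated an item, not the rating value, so it ignores how much each viewer liked the programme.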
Exploitation & Further Development
Beancounter:
• Pronetics’ user profiling SaaS
• integration in the e-commerce technological solution
• making it more general purpose
• making it capable of big data management
• a SaaS playground for Semantic Web researchers
• open source licensing
• community extensions
Exploitation & Further Development
Recommendations:
• explore further the combination of demographic stereotypes & semantics in a hybrid approach to learn a prediction model for the shows a user is most likely interested in
• integrate in personalized semantic search frameworks
• extend with additional LOD sources
• test further the measures for diversity, serendipity and predictability
• open source licensing
• community extensions
Acknowledgements