Project ZieOok - Berlin Buzzwords 2011


Presentation given during Berlin Buzzwords 2011. Talk on project ZieOok: building a generic recommendation platform on top of Mahout and Hadoop.

Transcript of Project ZieOok - Berlin Buzzwords 2011

Page 1: Project ZieOok - Berlin Buzzwords 2011
Page 2: Project ZieOok - Berlin Buzzwords 2011

ZieOok (‘AlsoSee’): building a generic recommendation framework for the cultural heritage field

by Siem Vaessen - managing partner @ Zimmerman & Zimmerman

Berlin Buzzwords 2011

Page 3: Project ZieOok - Berlin Buzzwords 2011

About Images for the Future

Preserving audiovisual heritage of the Netherlands through conservation and digitization;

Seven-year project, started in 2007, will end in 2014;

Budget of €154 million;

During the project, a total of 137,200 hours of video, 22,510 hours of film, 123,900 hours of audio, and 2.9 million photos from these archives will be restored, preserved, digitized, and disclosed through various services.

So what to do with all this data? Besides digitization...

Page 4: Project ZieOok - Berlin Buzzwords 2011

Current status

more info @ http://imagesforthefuture.com/en/

+ loads of interfaces, applications and tools built on top of this content

Page 5: Project ZieOok - Berlin Buzzwords 2011

Main purpose ZieOok (‘AlsoSee’)

“To create meaningful relations between assets and users by means of a recommendation engine” (June 2009)

Build an API which will fully function based on REST calls on top of the Mahout/Hadoop setup;

Develop a recommendation framework based on an existing framework;

Develop an administrator dashboard: a central hub for controlling main components of the recommendation framework (GUI);

Code developed within ZieOok needs to become open-source.

Page 6: Project ZieOok - Berlin Buzzwords 2011

Long tail

Bringing niche content to users

Page 7: Project ZieOok - Berlin Buzzwords 2011

The ‘market-analysis’

Identify codebase that is suitable for the project;

Make sure that codebase is sustainable.

Question: can a semantic correlation be established within the project?

1. Lexicon- or ontology-based (connecting thesauri);

2. A trust-network-based system built on the FOAF (Friend of a Friend) specification;

3. Context-adaptable system that extracts additional information from the lexicon or the ontology.

Two frameworks identified

“Duine Framework is a (collection of) software libraries that allows developers to create prediction engines.”

Telematica Instituut/Novay / version 4.0.0.0 RC1 (17/2/09)

Page 8: Project ZieOok - Berlin Buzzwords 2011

Apache Lucene Mahout (fka: Taste)

At that time version 0.2;

An Apache Software Foundation project;

Licensed under the Apache License, version 2.0.

Choice made! And now for the actual work...

Page 9: Project ZieOok - Berlin Buzzwords 2011

Core concept ZieOok

Page 10: Project ZieOok - Berlin Buzzwords 2011

Technical architecture ZieOok

Page 11: Project ZieOok - Berlin Buzzwords 2011

Rails ‘front-end’ structure


Page 12: Project ZieOok - Berlin Buzzwords 2011

ZieOok datamodel: FOAF

Friend of a Friend specification. (http://www.foaf-project.org/)


<foaf:Person>
  <foaf:gender />
  <foaf:age />
  <foaf:knows />
  <foaf:based_near />
  <foaf:made rdf:resource="some-rating-uri" />
</foaf:Person>

<zieook:rating>
  <foaf:maker rdf:resource="some-user-uri" />
  <foaf:Document rdf:resource="item-uri" />
  <rdf:DateTime />
  <zieook:value />
  <zieook:range />
  <zieook:source rdf:resource="source-uri" />
  <zieook:recom rdf:resource="recommender-uri" />
</zieook:rating>
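To make the rating record above concrete, here is a minimal sketch that builds one with Python's standard library. The FOAF and RDF namespace URIs are the published ones; the `zieook` namespace URI, the function name, and the argument names are assumptions for illustration only, since the slides do not state them.

```python
import xml.etree.ElementTree as ET

FOAF = "http://xmlns.com/foaf/0.1/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Hypothetical namespace URI: the slides do not give the real one.
ZIEOOK = "http://example.org/zieook#"

# Register prefixes so the serialized XML uses foaf:, rdf: and zieook:.
ET.register_namespace("foaf", FOAF)
ET.register_namespace("rdf", RDF)
ET.register_namespace("zieook", ZIEOOK)


def build_rating(user_uri, item_uri, timestamp, value, value_range,
                 source_uri, recommender_uri):
    """Build a zieook:rating element with the fields listed on the slide."""
    resource = f"{{{RDF}}}resource"
    rating = ET.Element(f"{{{ZIEOOK}}}rating")
    ET.SubElement(rating, f"{{{FOAF}}}maker", {resource: user_uri})
    ET.SubElement(rating, f"{{{FOAF}}}Document", {resource: item_uri})
    ET.SubElement(rating, f"{{{RDF}}}DateTime").text = timestamp
    ET.SubElement(rating, f"{{{ZIEOOK}}}value").text = str(value)
    ET.SubElement(rating, f"{{{ZIEOOK}}}range").text = str(value_range)
    ET.SubElement(rating, f"{{{ZIEOOK}}}source", {resource: source_uri})
    ET.SubElement(rating, f"{{{ZIEOOK}}}recom", {resource: recommender_uri})
    return rating
```

Calling `ET.tostring(build_rating(...), encoding="unicode")` yields the XML text; how ZieOok itself stores or serializes ratings is not shown in the slides.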

Page 13: Project ZieOok - Berlin Buzzwords 2011

ZieOok Dashboard: central hub

Import and train collections of content providers;

Grant access to Dashboard for content-providers;

Create recommenders;

Create templates for recommenders;

Provide statistics;

Provide an HTML widget for simple usage on blogs etc.;

Provide a REST API to build GUIs and recommendations;

Set filters on recommendations (date limit, use subparts of collections only).

Page 14: Project ZieOok - Berlin Buzzwords 2011

Collections, users and ratings

Two ways to import:

1. Using OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting)

http://anyplace.org/OAI?verb=GetRecord&identifier=oai:arXiv.org:hep-th/9901001&metadataPrefix=oai_czp

+ collections are updated by the content provider;

+ a variety of connectors is available (oai_dc - Dublin Core);

- no user information is stored in OAI, however; this needs a specific ZieOok job;

- cold start problem: no information on ratings, nor on users.

2. Using the MovieLens format

add collection file; add ratings file; add user file;

+ ‘ideal start’: all data available from collection, users and ratings;

- static: updates need to arrive from the content platform itself, no harvesting mechanism available.
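Both import routes can be sketched in a few lines of Python. This is an illustration under assumptions, not ZieOok's importer: the function and class names are hypothetical, the default `oai_dc` prefix is just the standard Dublin Core prefix, and the ratings parser assumes the `UserID::MovieID::Rating::Timestamp` layout of the MovieLens 1M `ratings.dat` file, which the slides do not specify.

```python
from typing import NamedTuple
from urllib.parse import urlencode


def oai_get_record_url(base_url, identifier, metadata_prefix="oai_dc"):
    """Build an OAI-PMH GetRecord request URL (special characters are percent-encoded)."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


class Rating(NamedTuple):
    user_id: str
    item_id: str
    value: float
    timestamp: int


def parse_movielens_ratings(lines, sep="::"):
    """Parse rating lines of the form user::item::rating::timestamp, skipping blanks."""
    ratings = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        user_id, item_id, value, timestamp = line.split(sep)
        ratings.append(Rating(user_id, item_id, float(value), int(timestamp)))
    return ratings
```

For example, `oai_get_record_url("http://anyplace.org/OAI", "oai:arXiv.org:hep-th/9901001")` produces a request equivalent to the one on the slide, and `parse_movielens_ratings(open("ratings.dat"))` would read a ratings file line by line.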

Page 15: Project ZieOok - Berlin Buzzwords 2011

Recommendations

Two ways to render recommendations

1. Simple HTML widget

ZieOok-created recommendation renders unstyled HTML:

top 5 recommendations;

like/dislike;

2. Call on the ZieOok REST API

Get full access from the ZieOok API to build custom recommenders

import/analyse/train data;

use REST calls to the ZieOok framework;

real-time;
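As a rough illustration of how a top-5 list could be derived from like/dislike feedback, the sketch below uses plain item-based collaborative filtering with cosine similarity. It is not Mahout's API and not necessarily the algorithm ZieOok runs; the function name and the +1/-1 encoding of like/dislike are assumptions.

```python
from collections import defaultdict
from math import sqrt


def top_n(ratings, user, n=5):
    """Recommend up to n unseen items for user.

    ratings maps user -> {item: +1 (like) or -1 (dislike)}.
    """
    # Invert the data: item -> {user: value}.
    item_vectors = defaultdict(dict)
    for u, prefs in ratings.items():
        for item, value in prefs.items():
            item_vectors[item][u] = value

    def cosine(a, b):
        shared = set(a) & set(b)
        dot = sum(a[u] * b[u] for u in shared)
        norm = (sqrt(sum(v * v for v in a.values()))
                * sqrt(sum(v * v for v in b.values())))
        return dot / norm if norm else 0.0

    seen = ratings.get(user, {})
    scores = {}
    for candidate, vector in item_vectors.items():
        if candidate in seen:
            continue
        # Weight each similarity by the user's own like/dislike.
        scores[candidate] = sum(
            cosine(vector, item_vectors[item]) * value
            for item, value in seen.items()
        )

    # Keep only positively scored items, best first; ties broken by item id.
    ranked = sorted((i for i in scores if scores[i] > 0),
                    key=lambda i: (-scores[i], i))
    return ranked[:n]
```

A user with no ratings gets an empty list here, which is the cold-start situation; a production system would need a fallback such as popular items.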

Page 16: Project ZieOok - Berlin Buzzwords 2011

Usecase

Connect Dutch Broadcasting Organisation (NPO) to ZieOok (on-demand).

Front-end:

Recommend items

Rate items (like/dislike)

See similar users & connect

Back-end (Dashboard):

Set linear recommenders (between 16:00-18:00, 18:00-20:00, 20:00-00:00)

Filters (limit date on content, or only show categories ‘sports, news’ within the collection)
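The back-end settings above amount to picking a recommender per time slot and filtering the candidate items. A minimal sketch, assuming hypothetical slot names and a hypothetical `Item` record (the slides give only the slot boundaries and the two filter types):

```python
from datetime import date, time
from typing import NamedTuple, Optional


class Item(NamedTuple):
    title: str
    category: str
    published: date


# Slot boundaries from the slide; the names are made up for illustration.
# An end of None means "until midnight" (the 20:00-00:00 slot).
SLOTS = [
    ("afternoon", time(16, 0), time(18, 0)),
    ("early-evening", time(18, 0), time(20, 0)),
    ("evening", time(20, 0), None),
]


def recommender_for(now: time) -> Optional[str]:
    """Return the name of the recommender slot covering now, or None outside all slots."""
    for name, start, end in SLOTS:
        if now >= start and (end is None or now < end):
            return name
    return None


def apply_filters(items, categories=None, not_before=None):
    """Keep items in the allowed categories and published on or after not_before."""
    result = []
    for item in items:
        if categories is not None and item.category not in categories:
            continue
        if not_before is not None and item.published < not_before:
            continue
        result.append(item)
    return result
```

For instance, `apply_filters(items, categories={"sports", "news"}, not_before=date(2011, 1, 1))` would mirror the ‘sports, news’ category filter combined with a date limit.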

Page 17: Project ZieOok - Berlin Buzzwords 2011

Quality of recommendations

So what defines quality?

Quality set by a gold standard;

Currently an editorial process.

But also define non-quality, such as:

Also see:

X

Page 18: Project ZieOok - Berlin Buzzwords 2011

Roadmap

Short term

1. Bring ZieOok onstream: end of this month (June 2011);

2. Release ZieOok REST API to the community (under discussion);

3. Connect content-platforms.

Long term

1. Maintain ZieOok cluster for a 3-year period;

2. Hybrid recommender (recommend cross-platform);

3. Identify risks in development and upgrades: Mahout API changes, Hadoop changes, etc.

Page 19: Project ZieOok - Berlin Buzzwords 2011

End of presentation / Q&A

by Siem Vaessen - managing partner @ Zimmerman & Zimmerman