Ufa State Aviation Technical University

Post on 28-Jan-2016


Distributed Collaborative Filtering System as a Prototype of a New Information Messaging Media

Ufa, 2007

Paranoia: a web-based blog and RSS aggregation system

Grigory A. Makeev

2

Information messaging

A person, being an element of a social system, needs adequate information to interact with others. We therefore suppose that every person wishes to receive information messages that are:

• important from his own point of view (selectivity);
• delivered in time (operativeness);
• as complete as possible, covering most of the important messages that exist (pervasion).

However, natural limitations are evident:

• importance can be estimated only by the user himself;
• there are too many messages to handle in time;
• there are too many messages to process them all.

3

Hypothesis: collaboration

At least until the semantics of natural languages can be processed effectively, the importance of a message will always be estimated initially by hand, by a human user.

• A single user has to process messages manually.
• Many collaborating users can effectively process a large set of messages, exchanging the important messages they find.
• Can the importance of a message be estimated only once?
• Can a user use (trust) an estimation made by arbitrary other users?

4

Collaborative filtering problem

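[Diagram: users U1, U2, ..., Un select messages from a common set M = {m1, m2, m3, m4, ..., mj}. The example selections shown are {m1, m3}, {m1, m2} and {m4, m2}, labelled P1(mk), P2(mk), ..., Pn(mk). The selection Pi(mk) of user Ui is unknown: { ? }.]

Building a recommendation

[Diagram: given the set M, the models and methods of recommender systems, and the restrictions, find for user Ui a subset M' ⊆ M.]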

5

Recommender systems

Search engines:
• Google

Web-based recommender systems:
• GroupLens
• IOwl

Online stores:
• Amazon
• eBay

Resources with elements of social networks

General drawbacks of existing collaborative filtering systems:
• recommendations are built using data from all users, so the result has poor selectivity;
• centralization;
• vulnerability on the logical and physical layers;
• users lack control over the process;
• users get no explanation of the results;
• the systems do not allow an objective estimation of efficiency.

Approaches to recommender systems:
• Content analysis
• Recommendation support systems
• Social data-mining
• Collaborative filtering

6

An approach to collaborative filtering

• Users U1, U2, …, Ui;
• Every Ui controls a peer of a p2p network, identified by a pair of security keys;
• Every Ui manages a set of messages Mi;
• If a message is in Mi, Ui is said to recommend this message;
• Only user Ui may manage the messages of the set Mi;
• Other users may retrieve Mi, thereby receiving the recommendations of Ui.

Data structures: messages

[Diagram: data held by user Ui]

User Ui:
• Public key
• Private key
• User name
• Profile fields: UserName: Ivanov I.I.; Location: Ufa; Language: Russian; ...

Channel (example: Climate):
• Messages
  • Message: "February 13th, a strong hurricane approached central Antarctica"
  • Sgntr (signature)
  • ...

7

An approach to collaborative filtering

• Users U1, U2, …, Ui;
• Every Ui controls a set of rates Ri: pairs (Uj, vij) with vij ∈ [0,1], which may carry additional information, such as a channel;
• Only user Ui may manage the rates in Ri;
• Other users may retrieve Ri.

Data structures: rates

[Diagram: rates held by user Ui, grouped by channel]

Channel    User   Rate
Climate    Uj     0.9
...
Society    Uk     0.7
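The two per-user data structures described on these slides (messages and rates, both grouped by channel) could be modelled roughly as follows. This is only a sketch: the class and field names (Peer, Message, publish, rate) are illustrative assumptions, not taken from the Paranoia source, and the signature is an opaque placeholder string rather than a real cryptographic signature.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Message:
    text: str
    signature: str = ""  # placeholder: a real peer would sign with its private key


@dataclass
class Peer:
    """Data controlled by one user Ui (hypothetical layout)."""
    user_name: str
    public_key: str
    # channel -> messages recommended by this user (the set Mi)
    messages: Dict[str, List[Message]] = field(default_factory=dict)
    # channel -> {rated user -> rate in [0, 1]} (the set Ri)
    rates: Dict[str, Dict[str, float]] = field(default_factory=dict)

    def publish(self, channel: str, message: Message) -> None:
        """Add a message to Mi, i.e. recommend it."""
        self.messages.setdefault(channel, []).append(message)

    def rate(self, channel: str, user: str, value: float) -> None:
        """Set the rate of another user within one channel."""
        if not 0.0 <= value <= 1.0:
            raise ValueError("a rate must lie in [0, 1]")
        self.rates.setdefault(channel, {})[user] = value


ui = Peer(user_name="Ui", public_key="<public key>")
ui.publish("Climate", Message("February 13th, a strong hurricane approached central Antarctica"))
ui.rate("Climate", "Uj", 0.9)
ui.rate("Society", "Uk", 0.7)
```

Only the owner calls publish and rate; other peers would read the two dictionaries, which mirrors the rule that only Ui manages Mi and Ri while anyone may retrieve them.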

8

An approach to collaborative filtering

Extending the rates set and message aggregation

• Every user directly rates a limited number of users: those he knows, or those he is reasonably sure of;
• Transitivity allows us to extend the set of users included in collaborative filtering for a particular user;
• A transitive rate is computed with a special function TRF(Ui,Uj,Uk), which is to be found;
• Messages retrieved from all users included in the filtering process are sorted according to how many users recommended them and what the rates of those users are;
• An aggregation function AMF(m, R*i) is also to be found.

[Diagram: a chain of users Ui, Uj, Uk with the rates 0.9 and 0.8 on its two links]

Example: TRF(Ui,Uj,Uk) = 0.8 * 0.9 = 0.72
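The worked example on this slide multiplies the two rates along the chain. A minimal sketch of that particular choice follows; the slides leave the final form of TRF open, so the product is only the candidate used in the example, and the function name trf_product is hypothetical.

```python
import math


def trf_product(*rates_along_path: float) -> float:
    """Candidate transitive rate: the product of the rates along a path."""
    for r in rates_along_path:
        if not 0.0 <= r <= 1.0:
            raise ValueError("each rate must lie in [0, 1]")
    return math.prod(rates_along_path)


# Rates 0.8 and 0.9 along the chain Ui, Uj, Uk, as in the slide's example.
value = trf_product(0.8, 0.9)
print(round(value, 2))  # 0.72
```

Since every rate lies in [0, 1], a product can never exceed the smallest rate on the path, so with this choice the transitive rate can only decay as the chain gets longer.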

9

A proposed scheme of collaborative filtering

Stage 1

1. The user evaluates an extended rates set of a sufficient depth.

[Diagram: rate graph of the users UA, UB, UC, UD, UE with the direct rates 0.9, 0.8, 0.8, 0.7]

Resulting extended rates set: (UD, 0.72, 1), (UB, 0.9, 0), (UC, 0.63, 1), (UE, 0.8, 0)
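Stage 1 can be sketched as a breadth-first walk over the rate graph. The sketch makes several assumptions that the slides do not confirm: it uses the product form of TRF from the earlier 0.8 * 0.9 example, it reads the third element of each tuple as the number of intermediate users, and it picks one set of direct rates (UA rates UB 0.9 and UE 0.8; UB rates UD 0.8 and UC 0.7) because that set reproduces the listed tuples. The names extended_rates and max_depth are hypothetical.

```python
from collections import deque

# Direct rates: rater -> {rated user: rate}. Assumed edges, see the lead-in.
RATES = {
    "UA": {"UB": 0.9, "UE": 0.8},
    "UB": {"UD": 0.8, "UC": 0.7},
}


def extended_rates(rates, start, max_depth):
    """Return {user: (value, depth)} for users reachable from `start`.

    depth 0 means a direct rate; a transitive value is the product of the
    rates along the path. When several paths reach the same user, the
    highest value is kept (an assumption).
    """
    best = {}
    queue = deque([(start, 1.0, -1)])
    while queue:
        user, value, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand beyond the requested depth
        for rated, rate in rates.get(user, {}).items():
            if rated == start:
                continue  # the starting user never rates himself
            candidate = value * rate
            if rated not in best or candidate > best[rated][0]:
                best[rated] = (candidate, depth + 1)
                queue.append((rated, candidate, depth + 1))
    return best


for user, (value, depth) in extended_rates(RATES, "UA", max_depth=1).items():
    print(user, round(value, 2), depth)
```

Limiting max_depth keeps the walk local: with max_depth=0 only the directly rated users UB and UE remain in the result.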

10

A proposed scheme of collaborative filtering

Stages 2-3

2. Retrieving messages from many peers, the user evaluates an extended messages set M*i: the unsorted result of collaborative filtering;
3. Calculating the value of every message, the user evaluates an extended messages set MR*i: the sorted result of collaborative filtering.

[Diagram: UA and the users of its extended rates set, with their rates and recommended messages]

User   Rate   Messages
UD     0.72   {m1, m2}
UB     0.9    {m1, m3}
UC     0.63   {m4, m5}
UE     0.8    {m1, m4}

Unsorted result M*i:

m    U    v
m1   UD   0.72
m2   UD   0.72
m1   UB   0.9
m3   UB   0.9
m4   UC   0.63
m5   UC   0.63
m1   UE   0.8
m4   UE   0.8

Sorted result MR*i:

m    U            v
m1   UD, UB, UE   2.42
m4   UC, UE       1.43
m3   UB           0.9
m2   UD           0.72
m5   UC           0.63
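Stages 2 and 3 can be sketched as follows, assuming that the aggregation function AMF simply sums the rates of all users who recommend a message. That assumption reproduces the values 2.42 and 1.43 of the sorted table, but the slides leave the final choice of AMF open, and the name rank_messages is hypothetical.

```python
from collections import defaultdict

# Extended rates set of UA and the message sets retrieved from each peer.
EXTENDED_RATES = {"UD": 0.72, "UB": 0.9, "UC": 0.63, "UE": 0.8}
RECOMMENDED = {
    "UD": ["m1", "m2"],
    "UB": ["m1", "m3"],
    "UC": ["m4", "m5"],
    "UE": ["m1", "m4"],
}


def rank_messages(extended_rates, recommended):
    """Return [(message, recommenders, value)] sorted by descending value."""
    recommenders = defaultdict(list)
    for user, messages in recommended.items():
        if user in extended_rates:  # ignore peers outside the rates set
            for m in messages:
                recommenders[m].append(user)
    ranked = [
        (m, users, sum(extended_rates[u] for u in users))
        for m, users in recommenders.items()
    ]
    ranked.sort(key=lambda row: row[2], reverse=True)
    return ranked


for message, users, value in rank_messages(EXTENDED_RATES, RECOMMENDED):
    print(message, ",".join(users), round(value, 2))
```

Under the summing rule a message recommended only by UD, such as m2, receives UD's rate of 0.72 as its value.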

11

A proposed scheme of collaborative filtering

Stages 4-5

4. The user corrects his own set of messages Mi;
5. The user corrects his own set of rates.

Sorted result of collaborative filtering:

m    U            v
m1   UD, UB, UE   2.42
m4   UC, UE       1.43
m3   UB           0.9
m2   UD           0.72
m5   UC           0.63

[Diagram: the data of user Ui after the correction. Rates (per channel): UB 0.4, ..., UE 0.8. Messages (per channel): m1, ..., m4, ..., m6, each stored with its signature (Sgntr).]

12

Advantages of the approach

Features of a system implementing the proposed approach:
• Decentralization
• Anonymity of authors
• Authors can prove their identity and their ownership of a message
• Selectivity
• Controllability
• Explainability
• Flood resistance
• Antagonistic societies can co-exist and even collaborate

13

Results of the formal analysis and experiments

• Criteria of controllability and persistency with respect to users and messages were found and formalized;
• Several transitivity functions TRF and message aggregation functions AMF were found and examined for conformance to these criteria, and the best one was chosen;
• A system of virtual users that seek and exchange important messages was created:
  • messages were represented by numbers;
  • every user had a favourite number;
  • users built up their sets of trusted neighbours as they went, starting from either a random rates set or a preset one;
  • users aimed at collecting the most favourable messages;
  • an objective efficiency of the system was calculated;
  • dependencies of the efficiency on many factors were investigated.

14

Proposed prototype implementation

• HTTP instead of p2p network protocols
• DNS routing instead of ad-hoc p2p naming and routing protocols
• A web server instead of a p2p node
• Users sharing common web servers instead of users on p2p nodes
• RSS as the message delivery protocol

A web-based RSS aggregator

It looks like a web-based RSS aggregator, but a typical one
• does not actually "aggregate", it merely "collects".

It looks like a typical web-based collaborative filtering system, but most of them
• use a "general" reputation, influenced by everyone;
• are server-based and centralized;
• are not customizable.

As a working prototype we propose an open-source (GNU GPL) web-based RSS aggregator, Paranoia, available at

http://greg.southural.ru/paranoia/

15

Proposed prototype implementation
An open-source web-based RSS aggregator - Paranoia

[Diagram: a Paranoia server hosts Blog 1, a list of messages published both as HTML and as RSS. The RSS output is consumed by another Paranoia server, by LiveJournal (syndicated feeds) and by RSS aggregators.]

• A Paranoia blog is accessible both in browsers and in RSS aggregators;
• A Paranoia blog is accessible in LiveJournal through the Syndicated feeds feature;
• A Paranoia blog is accessible on another Paranoia server;
• A Paranoia blog is accessible in any other RSS aggregator.

16

Proposed prototype implementation
An open-source web-based RSS aggregator - Paranoia

[Diagram: the "My news" page on a Paranoia server collects RSS feeds from Blog 2, from a Paranoia server and from LiveJournal.]

Paranoia can aggregate messages from different sources: users of the same system, users of a remote Paranoia system, users and communities of LiveJournal, and arbitrary RSS feeds.
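Because RSS is the delivery protocol, retrieving the messages of a source amounts to reading the items of a feed. The sketch below parses a hand-written RSS 2.0 document with Python's standard library. The feed content, the example.org links and the function name read_items are made up for illustration; this is not the Paranoia code, and a real aggregator would first download the feed over HTTP.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written RSS 2.0 document standing in for a downloaded feed.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Blog 2</title>
    <item><title>message 1</title><link>http://example.org/1</link></item>
    <item><title>message 2</title><link>http://example.org/2</link></item>
  </channel>
</rss>"""


def read_items(rss_text):
    """Return (title, link) pairs for every <item> of an RSS 2.0 feed."""
    root = ET.fromstring(rss_text)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


for title, link in read_items(SAMPLE_FEED):
    print(title, link)
```

The same reader works for any source that exposes RSS, which is what lets local users, remote servers and LiveJournal be treated alike.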

17

Proposed prototype implementation
An open-source web-based RSS aggregator - Paranoia

[Diagram: a Paranoia server with two pages, "My news" (a list of news) and "My messages" (a list of messages), each available both as RSS and as HTML.]

18

Proposed prototype implementation
An open-source web-based RSS aggregator - Paranoia

[Diagram: rates given to the sources A, B and LJ. Rate labels shown in the diagram:]
• Channel: politics, Rate: 0.5
• Channel: handiwork, Rate: 0.2
• Channel: handiwork, Rate: 0.8
• Channel: main, Rate: 0.2

The rate "Channel: politics, Rate: 0.5" means the following:

"I want to receive messages from user A in the channel "Politics", and I value him at 0.5 in this channel."

19

Proposed prototype implementation
An open-source web-based RSS aggregator - Paranoia

A news item ranks higher the more users it has come from and the higher you value those users.

[Diagram: the sources A, B, C, D, E, located on Paranoia servers and in LiveJournal, feed the "My news" page, where each news item is shown with its value: 0.5, 0.4, 0.4, ... Rate labels shown in the diagram:]
• Channel: politics, Rate: 0.5
• Channel: Handiwork, Rate: 0.2
• Channel: Handiwork, Rate: 0.8
• Channel: Knitwork, Rate: 0.2
• Channel: main, Rate: 0.5
• Channel: politics, Rate: 0.5
• Channel: politics, Rate: 0.5
• Channel: main, Rate: 0.5

20

Proposed prototype implementation
An open-source web-based RSS aggregator - Paranoia

[Diagram: users A, B, C, D, E, F of a Paranoia server and a News feed, without collaboration]

A and B process the news feed on their own, so they cannot use each other's labour.

[Diagram: the same users and News feed, with collaboration between A and B]

A and B collaborate in processing news feeds, so a message coming from both a feed and a fellow user receives a higher rank in the news result set.

21

Proposed prototype implementation
Non-trivial features

The environment turns out to be very flexible, and many tasks can be solved trivially within it:

1. Administrator notifications: every user automatically rates the local administrator in the channel 'system'.

[Diagram: a user rates the administrator (Channel: system, Rate: 0.5). "Admin messages: system" on the Paranoia server (notification1, notification2, notification3, ...) appear in "User news: system".]

2. User feedback: the local administrator automatically rates every user in the channel 'feedback'.

[Diagram: the administrator rates the users A, B, C, D, ... (Channel: feedback, Rate: 0.1). "User A messages: feedback" contains feedback1 and feedback2; "User B messages: feedback" contains feedback1 and feedback3. "Admin news: feedback" shows feedback1 with the value 0.2, feedback2 with 0.1 and feedback3 with 0.1.]
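The feedback values on this slide (0.2 for a message sent by two users rated 0.1 each, 0.1 for the others) are consistent with summing, per channel, the rates of the users who published each message. A minimal sketch under that assumption follows; the summing rule and the name channel_news are assumptions, not the documented Paranoia algorithm.

```python
from collections import defaultdict


def channel_news(rates, published, channel):
    """Rank one channel's messages by the summed rates of their publishers.

    rates:     {channel: {user: rate}} as set by the reader
    published: {user: {channel: [messages]}} as retrieved from the peers
    """
    totals = defaultdict(float)
    for user, rate in rates.get(channel, {}).items():
        for message in published.get(user, {}).get(channel, []):
            totals[message] += rate
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# The administrator rates every user 0.1 in the channel 'feedback'.
admin_rates = {"feedback": {"A": 0.1, "B": 0.1}}
published = {
    "A": {"feedback": ["feedback1", "feedback2"]},
    "B": {"feedback": ["feedback1", "feedback3"]},
}
for message, value in channel_news(admin_rates, published, "feedback"):
    print(message, round(value, 2))
```

Feedback reported by several users therefore rises to the top of the administrator's news without any dedicated feedback mechanism.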

22

Proposed prototype implementation
Non-trivial features

3. Comments on messages are merely one's own messages, stored in a special channel:
• Comments do not leave their creator's peer;
• Comments are retrieved when needed, following the same rules as any other message.

[Diagram: user A rates user B, C, D, ... (Channel: politics, Rate: 0.5). On the Paranoia server, "User B messages: politics" contains message2 and "User B messages: comments" contains comment1. "User A news: politics" shows message1 followed by comment1 and comment2, then message2, ...]

4. If comments are retrieved only from trusted peers and are not stored locally:
• No one (except trusted peers) can spam the discussion;
• Different groups, with rates among group fellows, can discuss the same message without interfering!
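Point 4 can be illustrated with a small sketch: comments are pulled only from peers that the reader rates, so comments published by unrated peers never reach the discussion. The data layout and the name visible_comments are hypothetical; in particular, linking a comment to its message by an identifier is an assumption.

```python
def visible_comments(reader_rates, peer_comments, message_id):
    """Collect the comments on `message_id` from trusted peers only.

    reader_rates:  {user: rate} set by the reader for the comments channel
    peer_comments: {user: [(message_id, comment_text)]} held on each peer
    """
    result = []
    for user, rate in reader_rates.items():
        if rate <= 0:
            continue  # a zero rate is treated as "not trusted" (assumption)
        for target, text in peer_comments.get(user, []):
            if target == message_id:
                result.append((user, text))
    return result


peer_comments = {
    "B": [("message1", "comment1")],
    "C": [("message1", "comment2")],
    "spammer": [("message1", "buy now")],
}
# Two groups rate different peers and therefore see different discussions.
group_one = visible_comments({"B": 0.5, "C": 0.5}, peer_comments, "message1")
group_two = visible_comments({"B": 0.5}, peer_comments, "message1")
print(group_one)
print(group_two)
```

Nobody in either group rates the spammer, so its comment is never retrieved, and each group sees only the discussion among the peers it trusts.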

23

Conclusion

In our opinion, messaging systems (news messaging or any other kind) will gradually evolve:

• to be distributed among many storages;
• to have many initial sources of information,
  • with an emphasis on direct witnesses;
• to implement collaborative filtering that is
  • specific to every user;
  • controllable by every user;
  • resistant to most types of malicious behaviour.

Thank you!