© December 1999 George Paliouras, All Rights Reserved 1

Learning Communities of Users on the Internet

George Paliouras

Christos Papatheodorou

Vangelis Karkaletsis

Constantine D. Spyropoulos

NCSR “Demokritos”

Email: [email protected]

WWW: http://www.iit.demokritos.gr/skel


Structure of the talk

• Services on the Internet
• Personalization of Internet services
• Learning communities from usage data
• Three case studies
  – Information broker (filtering)
  – Digital library (retrieval)
  – Web-site (navigation)
• Conclusions


WWW: the new face of the Net

Once upon a time, the Internet was a forum for exchanging information. Then … came the Web.

The Web introduced new capabilities …

…and attracted many more people …

…increasing commercial interest …

…and turning the Net into a real forum …


Services on the Internet

Information providers are still the majority…

• Commercial: CNN, Reuters, Times, Yahoo
• Non-commercial: CORDIS, NCSTRL, MLNET, Library


Information access

• The Web has introduced new ways to access information …

[Figure: modes of information access, ranging from passive to active]

• … covering the majority of today’s information services.
• We have looked at three different types:
  – Information filtering (profiles)
  – Information retrieval (queries)
  – Navigation


Personalized information access

• Adaptation of the system to the user.
• Social motivation:
  – Better service for the citizen (reduction of the information overload).
• Commercial motivation:
  – Customer relationship management (targeted advertisement, customer retention, increased sales, etc.)


Personalized information access

“The Quantity of People Visiting Your Site Is Less Important Than the Quality of Their Experience”

Evan I. Schwartz, Webonomics, Broadway Books, 1997


Personalized information access

[Figure: diagram connecting the information sources, the server and the receivers]


User modeling

• The process of constructing models that can be used to adapt the system to the user’s requirements.
• Types of user requirement:
  – Interests (e.g. sports and finance articles)
  – Knowledge level (e.g. novice – expert)
  – Preferences (e.g. appearance of GUI)
  – etc.


User Models

• User model (type A): [PERSONAL]

User x -> sports, stock market

• User model (type B): [PERSONAL]

User x, Age 26, Male -> sports, stock market

• User community: [GENERIC]

Users {x,y,z} -> sports, stock market

• User stereotype: [GENERIC]

Users {x,y,z}, Age [20..30], Male -> sports, stock market
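The four kinds of model above can be sketched as plain Python data structures. The class and field names are illustrative assumptions, not taken from the talk, and the `matches` rule (a user fits a stereotype when both age and sex agree) is only our reading of the example, included to make the personal/generic distinction concrete.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class UserModel:
    """Personal model: type A uses only `interests`; type B adds demographics."""
    user: str
    interests: set[str]
    age: int | None = None
    sex: str | None = None


@dataclass
class Community:
    """Generic model: a group of users sharing the same interests."""
    users: set[str]
    interests: set[str]


@dataclass
class Stereotype(Community):
    """Generic model: a community whose members are also described demographically."""
    age_range: tuple[int, int] = (0, 200)
    sex: str | None = None

    def matches(self, model: UserModel) -> bool:
        # Hypothetical rule: a user fits only when both demographic attributes agree.
        if model.age is None or model.sex is None:
            return False
        low, high = self.age_range
        return low <= model.age <= high and model.sex == self.sex


# The four examples from the slide, written as instances.
type_a = UserModel("x", {"sports", "stock market"})
type_b = UserModel("x", {"sports", "stock market"}, age=26, sex="male")
community = Community({"x", "y", "z"}, {"sports", "stock market"})
stereotype = Stereotype({"x", "y", "z"}, {"sports", "stock market"},
                        age_range=(20, 30), sex="male")
```

Here `stereotype.matches(type_b)` holds, so under this assumed rule the stereotype's interests could be suggested to any user with the same demographics, while the type A model carries too little information to be matched.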


Machine Learning / Data Mining

• Acquisition of models from usage data.
• Types of learning:
  – Supervised learning: requires manually tagged examples.
  – Unsupervised learning: clusters untagged examples, according to similarity.


Learning user models

Observation of the users interacting with the system.

[Figure: User 1 to User 5 → user models → user communities (Community 1, Community 2)]


Collaborative filtering

• Memory-based “learning” (e.g. k-NN):
  – Given a group of users…
  – …and a new user…
  – …find similar users.
• Already in commercial use (e.g. Firefly, amazon.com).
• Problem: it gives no insight into how the system is used.
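A memory-based recommender of the kind described above can be sketched in a few lines. Cosine similarity, the value of k and the `recommend` step are our assumptions, since the talk does not specify them; user models are assumed to be sparse dictionaries mapping features to weights. Nothing is learned beforehand: the neighbours are found at query time, which is exactly why the approach yields no explicit description of groups of users.

```python
import math


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse interest vectors."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def nearest_neighbours(new_user, users, k=2):
    """Names of the k stored users most similar to new_user (no model is built)."""
    ranked = sorted(users, key=lambda name: cosine(new_user, users[name]), reverse=True)
    return ranked[:k]


def recommend(new_user, users, k=2):
    """Features read by the neighbours but not yet by the new user, best first."""
    scores = {}
    for name in nearest_neighbours(new_user, users, k):
        for feature, weight in users[name].items():
            if feature not in new_user:
                scores[feature] = scores.get(feature, 0.0) + weight
    return sorted(scores, key=scores.get, reverse=True)


# Toy data, not taken from the case studies.
users = {
    "u1": {"sports": 1, "finance": 1},
    "u2": {"sports": 1, "finance": 1, "internet": 1},
    "u3": {"computers": 1, "internet": 1},
}
new_user = {"sports": 1, "finance": 1}
```

With this toy data the two neighbours of `new_user` are u1 and u2, and the only suggestion is "internet"; the neighbour set is recomputed for every new user and never summarized.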


Clustering users into communities

• Clustering methods:
  – Conceptual clustering (COBWEB, ITERATE)
  – Graph-based clustering (Cluster mining)
  – Statistical clustering (AutoClass)
  – Neural clustering (Self-organising Maps)


Conceptual clustering

• COBWEB generates a hierarchy of concepts.

• Each concept is a cluster of objects.

• Our concepts are the communities.

• Our objects are “user models”.

• Similarity metric: category utility.

• Each user belongs to exactly one community.
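COBWEB scores a partition C1, …, CK of the objects with category utility, CU = (1/K) Σk P(Ck) Σi Σj [P(Ai = Vij | Ck)² − P(Ai = Vij)²], i.e. how much knowing the community improves the prediction of attribute values. The sketch below only evaluates CU for a given partition; it does not implement COBWEB’s incremental operators (adding an object to a node, creating, merging and splitting nodes). User models are assumed to be dictionaries of nominal attributes, for example whether a user reads a given news category.

```python
from collections import Counter


def sum_squared_probs(objects):
    """Sum over attributes i and values j of P(A_i = V_ij)^2 within `objects`."""
    n = len(objects)
    counts = Counter(item for obj in objects for item in obj.items())
    return sum((count / n) ** 2 for count in counts.values())


def category_utility(partition):
    """Category utility of a partition given as a list of clusters of objects.

    Assumes every object is a dict of nominal attributes over the same attribute set.
    """
    everyone = [obj for cluster in partition for obj in cluster]
    n = len(everyone)
    baseline = sum_squared_probs(everyone)  # expected correct guesses without clusters
    gain = sum(len(cluster) / n * (sum_squared_probs(cluster) - baseline)
               for cluster in partition)
    return gain / len(partition)


# Two sports readers and two finance readers (toy data).
a = {"sports": 1, "finance": 0}
b = {"sports": 1, "finance": 0}
c = {"sports": 0, "finance": 1}
d = {"sports": 0, "finance": 1}
```

Grouping the two sports readers together, `category_utility([[a, b], [c, d]])`, scores 0.5, whereas the mixed partition `[[a, c], [b, d]]` scores 0.0, so a CU-driven clusterer prefers the first.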


Meaningful communities

• Question: What are the characteristics of a community?
• Answer: Community characterization, by measuring how much the frequency of each feature increases within the community.
• Example: How frequently do the users of a community read sports news, compared with the whole set of users?
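Characterization by frequency increase can be sketched as follows. The talk does not give the exact formula, so the relative increase (fc − f) / f, where fc is a feature’s frequency inside the community and f its frequency over all users, and the 0.5 reporting threshold are assumptions made for illustration. User models are sets of features.

```python
def frequency(models, feature):
    """Fraction of the user models (sets of features) that contain `feature`."""
    return sum(feature in m for m in models) / len(models)


def characterize(community, all_users, threshold=0.5):
    """Features whose relative frequency increase in the community reaches `threshold`.

    Assumed definition: increase = (f_c - f) / f, with f_c the frequency inside the
    community and f the frequency over all users; the threshold is illustrative.
    """
    description = {}
    for feature in set().union(*community):
        f_all = frequency(all_users, feature)  # > 0, since the community is a subset
        f_com = frequency(community, feature)
        increase = (f_com - f_all) / f_all
        if increase >= threshold:
            description[feature] = round(increase, 2)
    return dict(sorted(description.items(), key=lambda item: -item[1]))


# Toy data: the first two users form the community.
all_users = [{"sports"}, {"sports", "finance"}, {"finance"}, {"computers"}]
community = all_users[:2]
```

Here every community member reads sports (frequency 1.0 against 0.5 overall), so `characterize(community, all_users)` returns `{"sports": 1.0}`; finance is read by half the community and half of all users, so it does not characterize the group.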


Cluster mining

• Searches for cliques in a graph of the following form:

[Figure: example graph whose nodes are ACM categories (hardware, mathematics of computing, software, computing milieux, computing methodologies) and whose edges are labelled with co-occurrence frequencies]


Cluster mining

• Nodes: features in the user model.
• Edge labels: frequency with which the two nodes appear together in the data.
• Edge reduction: edges whose frequency is below a threshold are removed.
• Clique: a commonly occurring pattern in the behavior of the users.
• Each user can belong to several communities.
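The cluster-mining steps above (build the co-occurrence graph, drop weak edges, search for cliques) can be sketched as below. The clique search uses the basic Bron–Kerbosch algorithm, which is our choice; the talk does not name an algorithm. The rule that a user joins every community whose whole pattern appears in the user’s model is also an assumption, used here to show why, unlike COBWEB, a user can end up in several communities.

```python
from itertools import combinations


def cooccurrence_graph(models, threshold):
    """Adjacency sets linking features that co-occur in at least `threshold` of the models."""
    n = len(models)
    features = sorted(set().union(*models))
    graph = {f: set() for f in features}
    for a, b in combinations(features, 2):
        freq = sum(a in m and b in m for m in models) / n
        if freq >= threshold:  # edge reduction: weaker edges are never added
            graph[a].add(b)
            graph[b].add(a)
    return graph


def maximal_cliques(graph):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion (no pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            if len(r) > 1:  # a lone feature is not a pattern
                cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques


def communities(models, cliques):
    """Hypothetical membership rule: a user joins every clique fully contained in its model."""
    return [[i for i, m in enumerate(models) if clique <= m] for clique in cliques]


# Toy user models (sets of ACM categories), not data from the case studies.
models = [
    {"hardware", "software"},
    {"hardware", "software", "computing milieux"},
    {"software", "computing milieux"},
    {"theory of computation", "mathematics of computing"},
    {"theory of computation", "mathematics of computing"},
]
graph = cooccurrence_graph(models, threshold=0.3)
patterns = maximal_cliques(graph)
members = communities(models, patterns)
```

With a threshold of 0.3 the hardware–computing milieux edge (frequency 0.2) is removed, leaving three patterns; user 1 belongs to both the hardware/software and the software/computing milieux communities.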


Case studies

• Information broker (filtering)
• Digital library (retrieval): NCSTRL
• Web-site (navigation): ACAI99


Criteria for the communities

• We evaluate the quality of community descriptions (behavioral patterns) by:
  – Coverage: proportion of characteristics appearing in the descriptions.
  – Overlap: extent of overlap between descriptions.
  – Meaningfulness: do the descriptions make sense? Are they interesting?
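Coverage and overlap can be computed directly from the community descriptions. Coverage below follows the definition above; for overlap no formula is given, so we assume one hypothetical measure: the average number of descriptions in which each covered feature appears (1.0 means the descriptions are disjoint). Descriptions are sets of features, and the toy input is illustrative, not data from the case studies.

```python
def coverage(descriptions, all_features):
    """Proportion of the available features that appear in at least one description."""
    covered = set().union(*descriptions)
    return len(covered & set(all_features)) / len(set(all_features))


def overlap(descriptions):
    """Hypothetical measure: mean number of descriptions per covered feature (1.0 = disjoint)."""
    covered = set().union(*descriptions)
    if not covered:
        return 0.0
    return sum(len(d) for d in descriptions) / len(covered)


# Toy descriptions and feature set.
descriptions = [
    {"Telecom", "Computers", "Networks"},
    {"Hardware", "Software"},
    {"Telecom", "Economic ind.", "Economics/Finance"},
]
all_features = {"Telecom", "Computers", "Networks", "Hardware", "Software",
                "Economic ind.", "Economics/Finance", "Sport"}
```

Seven of the eight features are described, giving a coverage of 0.875, and only "Telecom" is shared, so the overlap is 8/7, slightly above 1.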


I: Profile-based filtering

• User models: profiles of news categories for each user.

• User communities: users with common news-reading interests.

• Community descriptions: news categories for each community.


I: COBWEB

Community hierarchy (community sizes in parentheses):

A (1078)
B (681), C (397)
D (328), E (353), F (98), G (181), H (118)
J (104), K (161), L (95), M (102), N (156), O (38), P (17), Q (43), R (36), S (96), I (63), W (28), V (62), U (28), T (49)


I: COBWEB

[Figure: coverage (0 to 1) and overlap (0 to 8) plotted against the pruning parameter (0 to 1), for COBWEB level 2 and level 3]


I: COBWEB

Community descriptions:

D: –
E: Internet (0.55)
F: Economic ind. (0.73), Economics & Finance (0.68), Computers (0.6), Transport (0.53), Financial ind. (0.5)
G: Economic ind. (0.58), Economics & Finance (0.61)
H: Computers (0.53)


I: Cluster mining

Behavioral patterns:

Telecom, Computers, Internet, Industries, Economics/Finance

Telecom, Computers, Networks

Telecom, Economic ind., Economics/Finance

Hardware, Software

Financial ind., Economic ind., Economics/Finance

Financial ind., Economic ind., Financial markets

Sport, Entertainment electronics


I: Comparison

[Figure: overlap (0 to 8) plotted against the connectivity threshold (cluster mining) and the pruning parameter (COBWEB), both from 0 to 1; series: cluster mining, COBWEB (level 2)]


II: Query-based retrieval

• User models: processed queries.
• User communities: user queries with common keywords.
• Community descriptions: characteristic keywords for each community.
• Pre-processing:
  – Lemmatization and synonyms (WordNet).
  – Generalization to top ACM categories.
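The pre-processing of queries can be sketched as a pipeline of lookups. The small dictionaries below are hypothetical stand-ins: in the study, lemmas and synonyms came from WordNet and generalization used the top ACM categories, whose full mapping is not reproduced here. Dropping terms that map to no category is one possible choice, assumed for the sketch.

```python
# Hypothetical lookup tables standing in for WordNet and the ACM classification.
LEMMAS = {"networks": "network", "compilers": "compiler",
          "circuits": "circuit", "databases": "database"}
SYNONYMS = {"dbms": "database"}  # maps a synonym to a canonical term
TOP_ACM = {
    "network": "Computer Systems Organisation",
    "compiler": "Software",
    "circuit": "Hardware",
    "database": "Information Systems",
}


def preprocess(query: str) -> set[str]:
    """Map a raw query to the set of top-level ACM categories it mentions."""
    categories = set()
    for token in query.lower().split():
        term = LEMMAS.get(token, token)   # lemmatization
        term = SYNONYMS.get(term, term)   # synonym normalization
        if term in TOP_ACM:               # generalization; unmapped terms are dropped
            categories.add(TOP_ACM[term])
    return categories
```

For example, "Compilers for DBMS" becomes {"Software", "Information Systems"}: the plural is lemmatized, "dbms" is normalized to "database", and "for" is discarded. The resulting category sets are the user models that COBWEB and cluster mining then group.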


II: COBWEB

Community descriptions:

Computer Systems Organisation (1.0)

Software (1.0)

Hardware (1.0)

Information Systems (1.0), Computing milieux (0.63), Computing methodologies (0.28)

Information Systems (1.0)

Computing methodologies (1.0), Hardware (1.0)

Computing methodologies (1.0), Software (1.0)

Computing methodologies (1.0)


II: Cluster mining

Behavioral patterns:

Hardware, Software, Computing Milieux, Computing Methodologies

Hardware, Software, Computing Milieux, Maths of Computing

Hardware, Computer Systems Organisation

Theory of Computation, Maths of Computing

Information Systems, Software, Computing Milieux, Computing Methodologies

Information Systems, Software, Computing Milieux, Maths of Computing


III: Web-site navigation

• User models: access sessions, represented as sets of pages or sets of page transitions.
• User communities: users with common navigation behavior.
• Community descriptions: pages or page transitions for each community.
• Pre-processing:
  – Identification of sessions in the access logs (based on duration).
  – Dimensionality reduction by feature selection.
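Session extraction, transition features and feature selection can be sketched as follows. The talk only says that sessions were derived from access logs using duration; the 30-minute inactivity timeout, the (user, timestamp, page) record layout and the minimum-support rule for feature selection are assumptions. Transitions are written as a>b, matching the notation in the community descriptions.

```python
from collections import Counter, defaultdict

TIMEOUT = 30 * 60  # hypothetical: a gap of more than 30 minutes starts a new session


def sessions(log):
    """Split (user, timestamp_in_seconds, page) records into per-user page sequences."""
    by_user = defaultdict(list)
    for user, ts, page in sorted(log, key=lambda r: (r[0], r[1])):
        by_user[user].append((ts, page))
    result = []
    for hits in by_user.values():
        current = [hits[0][1]]
        for (prev_ts, _), (ts, page) in zip(hits, hits[1:]):
            if ts - prev_ts > TIMEOUT:  # long pause: close the current session
                result.append(current)
                current = []
            current.append(page)
        result.append(current)
    return result


def transitions(session):
    """Set of page transitions in the a>b notation."""
    return {f"{a}>{b}" for a, b in zip(session, session[1:])}


def select_features(models, min_support=2):
    """Keep only features that occur in at least `min_support` sessions."""
    counts = Counter(f for m in models for f in m)
    keep = {f for f, c in counts.items() if c >= min_support}
    return [m & keep for m in models]


# Toy log: user "a" returns after a long pause, so two sessions are produced for "a".
log = [("a", 0, 1), ("a", 60, 22), ("a", 120, 31), ("a", 5000, 1), ("a", 5060, 30),
       ("b", 10, 1), ("b", 70, 22), ("b", 130, 31)]
models = [transitions(s) for s in sessions(log)]
reduced = select_features(models)
```

The toy log yields the sessions [1, 22, 31], [1, 30] and [1, 22, 31]; after feature selection only the transitions 1>22 and 22>31, seen in two sessions, survive.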


III: COBWEB

Community descriptions (a>b denotes a transition from page a to page b):

24>25, 23>24, 1>24, 1>19, 19>23

1>22, 22>20, 20>31, 31>27, 27>7, 19>23

22>31, 1>22

22>27, 1>22

1>30

1>30, 8>1, 1>8

30>31, 1>30


III: Cluster mining

Behavioral patterns

1>19, 19>23, 23>24, 24>25

1>24, 24>25

1>22, 22>31

1>22, 22>20

1>30, 30>31

22>20, 20>31, 31>27

22>20, 20>27

1>8, 8>1

1>9, 9>2

20>31, 31>27, 27>7

19>23, 23>14

23>14, 27>7

1>2, 2>11

2>11, 11>12

1>23, 23>24


Conclusions

• Community construction gives insight into the usage of information services.
• Unsupervised learning can do the job.
• Characterization makes the results useful.
• Substantial data engineering is needed for different types of information access.


A paradox

High commercial demand for a research product!

Solutions need to be simple and efficient!