
Personalization in Digital Library:

An Intelligent Service based on Semantic User Profiles

Department of Computer Science, University of Bari, Italy

3rd Italian Research Conference on Digital Library Systems, Padova, Italy, 29-30 January 2007

Giovanni Semeraro, Pasquale Lops, Marco Degemmis, Pierpaolo Basile, Annalisa Gentile

2

Outline

- Motivation: information overload in a scientific congress scenario
- Conference Participant Advisor service
- Profile-driven paper recommending
  - User profiles as Bayesian text classifiers
  - User profiles learned from documents semantically indexed through a WSD procedure [*]
- Empirical evaluation
- Conclusions and future work

[*] Combining Learning and Word Sense Disambiguation for Intelligent User Profiling - IJCAI 2007

3

Motivation

Information overload in the scientific congress scenario

4

Motivation

Information overload in the scientific congress scenario

5

Web Personalization

- Personalized systems adapt their behavior to individual users by learning user profiles
  - User profile: a structured model of the user's interests
  - Exploitable for providing personalized content and services
- Personalization is usually done automatically, based on the user profile and possibly on the profiles of other users with similar interests (collaborative approach)
- How can personalization be used in the scientific congress scenario?

6

Web Personalization in the scientific congress scenario

- Learn the research interests of participants from the papers they rated
- Store the research interests in personal profiles
  - Used to build personalized programs delivered to participants

7

Learning User Profiles as a Text Categorization problem

OUR STRATEGY: content-based recommendations by learning from TEXT and USER FEEDBACK on items

8

Keyword-based profiles: problems

doc1: AI is a branch of computer science
doc2: the 2007 International Joint Conference on Artificial Intelligence will be held in India
doc3: apple launches a new product…

USER PROFILE: artificial 0.02, intelligence 0.01, apple 0.13, AI 0.15

Problem: MULTI-WORD CONCEPTS

9

Keyword-based profiles: problems

Same documents (doc1, doc2, doc3) and USER PROFILE as in the previous slide

Problem: SYNONYMY

10

Keyword-based profiles: problems

Same documents (doc1, doc2, doc3) and USER PROFILE as in the previous slides

Problem: POLYSEMY
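A minimal sketch of why a keyword-based profile runs into these problems. The documents and weights are the ones on the slides; the tokenizer and the scoring rule (sum the profile weights of the keywords a document contains) are our own illustrative assumptions, not the system's actual scoring.

```python
import re

# Keyword-based USER PROFILE from the slides (keys lowercased to match the tokenizer).
PROFILE = {"artificial": 0.02, "intelligence": 0.01, "apple": 0.13, "ai": 0.15}

DOCS = {
    "doc1": "AI is a branch of computer science",
    "doc2": "the 2007 International Joint Conference on Artificial Intelligence "
            "will be held in India",
    "doc3": "apple launches a new product…",
}


def tokenize(text):
    """Lowercase the text and keep alphabetic runs only (digits and symbols are dropped)."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_score(text, profile=PROFILE):
    """Hypothetical scoring rule: sum of the weights of the profile keywords in the text."""
    return sum(profile.get(tok, 0.0) for tok in tokenize(text))


for name, text in DOCS.items():
    print(name, round(keyword_score(text), 2))
```

doc1 and doc2 talk about the same concept yet score 0.15 and 0.03, because "AI" and the two-word "Artificial Intelligence" are unrelated keys (multi-word concepts, synonymy). doc3 scores 0.13 simply because the string "apple" matches, whatever sense of apple the profile was learned from (polysemy).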

11

ITem Recommender (ITR)

- Advanced NLP techniques are used to represent documents
- Naïve Bayes text classification assigns a score (level of interest) to items according to the user preferences
- Result: a semantic user profile, i.e. a binary text classifier (user-likes vs. user-dislikes) containing the probabilistic model of user preferences

12

ITem Recommender (ITR)

13

Word Sense Disambiguation (WSD)

- Process of deciding which sense of a word is used in a specific context
- WordNet as sense inventory: nouns, verbs, adverbs and adjectives organized into SYNonym SETs (synsets), each representing an underlying lexical concept
- Change of text representation: from vectors (bags) of words (BOW) to vectors (bags) of synsets (BOS)
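To make the BOW-to-BOS change concrete, here is a minimal sketch with a hypothetical three-entry sense inventory (the synset labels are made up, they are not WordNet offsets). Each entry has exactly one sense, so no disambiguation happens here; the point is only that doc1 ("AI") and doc2 ("Artificial Intelligence") share no word but do share a synset. Keeping unknown words as plain tokens is our assumption.

```python
import re
from collections import Counter

# Hypothetical mini sense inventory: each (multi-)word expression maps to ONE
# synset label. A real system would disambiguate among several candidates.
LEXICON = {
    ("artificial", "intelligence"): "syn:artificial_intelligence",
    ("ai",): "syn:artificial_intelligence",
    ("computer", "science"): "syn:computer_science",
}
STOPWORDS = {"a", "be", "in", "is", "of", "on", "the", "will"}
MAX_NGRAM = max(len(key) for key in LEXICON)


def tokenize(text):
    """Lowercase, keep alphabetic runs, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def bag_of_words(text):
    return Counter(tokenize(text))


def bag_of_synsets(text):
    """Greedy longest match against LEXICON; unknown words are kept as plain tokens."""
    tokens, bag, i = tokenize(text), Counter(), 0
    while i < len(tokens):
        for n in range(min(MAX_NGRAM, len(tokens) - i), 0, -1):
            synset = LEXICON.get(tuple(tokens[i:i + n]))
            if synset is not None:
                bag[synset] += 1
                i += n
                break
        else:  # no expression matched at position i
            bag[tokens[i]] += 1
            i += 1
    return bag


doc1 = "AI is a branch of computer science"
doc2 = "the 2007 International Joint Conference on Artificial Intelligence will be held in India"
print(set(bag_of_words(doc1)) & set(bag_of_words(doc2)))      # shared words
print(set(bag_of_synsets(doc1)) & set(bag_of_synsets(doc2)))  # shared concepts
```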

14

JIGSAW WSD algorithm

- Three different strategies: one for nouns, one for verbs, one for adjectives and adverbs
- The effectiveness of WSD is strongly influenced by the POS tag of the target word
- Input: document d = {w1, w2, …, wh}
- Output: X = {s1, s2, …, sk} (k ≤ h)
  - Each si is obtained by disambiguating wi based on its context
  - k ≤ h because some words are not recognized by WordNet and groups of words may be recognized as a single concept

15

JIGSAWnouns: the idea

- Adaptation of the Resnik algorithm
- The semantic similarity between synsets is inversely proportional to their distance in the WordNet IS-A hierarchy
- The path-length similarity between synsets is used to assign scores to the candidate synsets of a polysemous word

16

Synset Semantic Similarity

SINSIM(cat,mouse) = -log(5/32) = 0.806 (Leacock-Chodorow similarity)

Path in the WordNet IS-A hierarchy (figure): Cat (feline mammal) → Feline, felid → Carnivore → Placental mammal → Rodent → Mouse (rodent), with the steps numbered 1 to 5
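A minimal sketch of the Leacock-Chodorow formula used on the slide. Reading 5/32 as a path length of 5 over 2·D with D = 16 is our interpretation, and the base-10 logarithm is chosen because it is the one that reproduces the slide's value (-log10(5/32) ≈ 0.806).

```python
import math


def leacock_chodorow(path_length, max_depth):
    """Leacock-Chodorow similarity: -log(path_length / (2 * max_depth)).

    Uses log base 10, which matches the 0.806 reported for cat/mouse.
    Shorter paths give higher similarity.
    """
    if path_length <= 0 or max_depth <= 0:
        raise ValueError("path_length and max_depth must be positive")
    return -math.log10(path_length / (2 * max_depth))


print(round(leacock_chodorow(5, 16), 3))
```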

17

JIGSAWnouns: example

"The white cat is hunting the mouse"
w = cat, C = {mouse}

Candidate synsets of the context noun mouse: T = {02244530, 03651364}
- 02244530: any of numerous small rodents…
- 03651364: a hand-operated electronic device…

Candidate synsets of the target word cat: Wcat = {02037721, 00847815}
- 02037721: feline mammal…
- 00847815: computerized axial tomography…

18

JIGSAWnouns: example (continued)

Same sentence, w = cat, C = {mouse}, T = {02244530, 03651364} and Wcat = {02037721, 00847815} as in the previous slide.

Similarities between the candidate synsets of cat and those of mouse: 0.806, 0.107, 0.0, 0.0. The highest, 0.806, is the one between 02037721 (feline mammal) and 02244530 (rodent), i.e. SINSIM(cat,mouse).
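A simplified sketch of the JIGSAWnouns scoring idea on the cat/mouse example. Only the 0.806 value (feline mammal vs. rodent) is attributed on the slides; the other values reuse the slide's numbers, but their pairing here is our guess. The crediting rule (for each context noun, the target synset in the most similar pair receives that similarity as support) is a simplification and omits any further weighting the actual algorithm may apply.

```python
def disambiguate_noun(target_senses, context_senses, sim):
    """Simplified JIGSAWnouns-style scoring (illustrative sketch).

    target_senses: candidate synset ids of the target noun.
    context_senses: {context noun: [its candidate synset ids]}.
    sim: {(target synset, context synset): similarity}; missing pairs count as 0.
    """
    support = {s: 0.0 for s in target_senses}
    for senses in context_senses.values():
        best_sense, best_sim = None, 0.0
        for t in target_senses:
            for c in senses:
                value = sim.get((t, c), 0.0)
                if value > best_sim:
                    best_sense, best_sim = t, value
        if best_sense is not None:
            support[best_sense] += best_sim  # credit the best-matching target synset
    return max(support, key=support.get), support


W_CAT = ["02037721", "00847815"]               # feline mammal, computerized axial tomography
CONTEXT = {"mouse": ["02244530", "03651364"]}  # rodent, hand-operated electronic device
SIM = {
    ("02037721", "02244530"): 0.806,  # from the slide: feline mammal vs. rodent
    ("02037721", "03651364"): 0.0,    # pairing assumed
    ("00847815", "02244530"): 0.0,    # pairing assumed
    ("00847815", "03651364"): 0.107,  # pairing assumed
}

sense, support = disambiguate_noun(W_CAT, CONTEXT, SIM)
print(sense, support)
```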

19

JIGSAWverbs: synset description

Description of synset si = gloss + example phrases in WordNet for si

Glosses

20

JIGSAWverbs: synset description

Description of synset si = gloss + example phrases in WordNet for si

Example phrases

21

JIGSAWverbs: the idea

- Tries to establish a relation between verbs and nouns, which are not directly linked in WordNet
- Verb w is disambiguated using:
  - the nouns in the context of w
  - the nouns in the description of each candidate synset for w

22

JIGSAWverbs: Example (1/4)

"I play basketball and soccer"
w = play, N = {basketball, soccer}

Candidate synsets of play:
1. (70) play -- (participate in games or sport; "We played hockey all afternoon"; "play cards"; "Pele played for the Brazilian teams in many important matches")
2. (29) play -- (play on an instrument; "The band played all night long")
3. …

nouns(play,1): game, sport, hockey, afternoon, card, team, match
nouns(play,2): instrument, band, night
…
nouns(play,35): …

23

JIGSAWverbs: Example (2/4)

w = play, N = {basketball, soccer}
nouns(play,1): game, sport, hockey, afternoon, card, team, match

Each noun has several candidate synsets (game1, game2, …, gamek; sport1, sport2, …, sportk; basketball1, …, basketballh):

MAXbasketball = max over wi ∈ nouns(play,1) of SinSim(wi, basketball)

24

JIGSAWverbs: Example (3/4)

w = play, N = {basketball, soccer}
nouns(play,1): game, sport, hockey, afternoon, card, team, match

Each noun has several candidate synsets (game1, game2, …, gamek; sport1, sport2, …, sportk; soccer1, …, soccerh):

MAXsoccer = max over wi ∈ nouns(play,1) of SinSim(wi, soccer)

25

JIGSAWverbs: Example (4/4)

Φ(play,1) = weighted average of MAXbasketball and MAXsoccer, computed from nouns(play,1), taking into account the position of each word in the context with respect to the verb

The same is done for every nouns(play,i), giving Φ(play,i)

Synset assigned to "play" = argmax_i Φ(play,i)
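A rough sketch of the Φ computation, under stated assumptions: SinSim is replaced by a hypothetical word-level similarity table (the slides compute it over synsets), and the position weight 1 / (1 + distance from the verb) is our own choice, since the slides only say that the position of each word is taken into account. The sense nouns are the ones listed for play.

```python
def max_similarity(sense_nouns, context_noun, sim):
    """MAX for one context noun: best similarity to any noun in nouns(play, i)."""
    return max((sim.get((w, context_noun), 0.0) for w in sense_nouns), default=0.0)


def phi(sense_nouns, context_positions, verb_position, sim):
    """Weighted average of the MAX values; closer context nouns weigh more.

    The weight 1 / (1 + distance) is a hypothetical choice.
    """
    num = den = 0.0
    for noun, pos in context_positions.items():
        weight = 1.0 / (1 + abs(pos - verb_position))
        num += weight * max_similarity(sense_nouns, noun, sim)
        den += weight
    return num / den if den else 0.0


NOUNS_PLAY = {
    1: ["game", "sport", "hockey", "afternoon", "card", "team", "match"],
    2: ["instrument", "band", "night"],
}
# "I play basketball and soccer": token positions of the verb and the context nouns.
VERB_POS = 1
CONTEXT = {"basketball": 2, "soccer": 4}
# Hypothetical word-level similarities standing in for SinSim.
SIM = {
    ("sport", "basketball"): 0.9, ("hockey", "basketball"): 0.8,
    ("sport", "soccer"): 0.9, ("game", "soccer"): 0.7,
    ("instrument", "basketball"): 0.1,
}

scores = {i: phi(nouns, CONTEXT, VERB_POS, SIM) for i, nouns in NOUNS_PLAY.items()}
best = max(scores, key=scores.get)  # argmax_i Φ(play, i)
print(best, scores)
```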

26

JIGSAWothers

- Based on the Lesk algorithm
- Similarity between the glosses of each candidate sense of the target word and the glosses of the words in the context

27

JIGSAWothers: Example (1/5)

"I bought a bottle of aged wine"
w = aged, N = {bottle, wine}

Candidate synsets for the target word:
1. {01703749} aged, elderly, older, senior -- (advanced in years; "aged members of the society"; "elderly residents could remember the construction of the first skyscraper"; "senior citizen")
2. {01546830} aged, ripened -- (of wines, fruit, cheeses; having reached a desired or final condition; "mature well-aged cheeses")

28

JIGSAWothers: Example (2/5)

"I bought a bottle of aged wine", w = aged, N = {bottle, wine}

Keep the glosses of the candidate synsets {01703749} and {01546830}

29

JIGSAWothers: Example (2/5)

"I bought a bottle of aged wine", w = aged, N = {bottle, wine}

Keep the glosses of each word in the context:
1. {02848798} bottle -- (a glass or plastic vessel used for storing drinks or other liquids; typically cylindrical without handles and with a narrow neck that can be plugged or capped)
2. {13584548} bottle, bottleful -- (the quantity contained in a bottle) …

30

JIGSAWothers: Example (2/5)

"I bought a bottle of aged wine", w = aged, N = {bottle, wine}

Glosses of the context words, bottle ({02848798}, {13584548}, as above) and wine:
1. {07784932} wine, vino -- (fermented juice (of grapes especially))
2. {04907195} wine, wine-colored -- (a red as dark as red wine)

31

JIGSAWothers: Example (3/5)

"I bought a bottle of aged wine", w = aged, N = {bottle, wine}

The glosses of bottle ({02848798}, {13584548}) and wine ({07784932}, {04907195}) are joined into the gloss of the whole context:

"a glass or plastic vessel used for storing drinks or other liquids typically cylindrical without handles and with a narrow neck that can be plugged or capped the quantity contained in a bottle fermented juice (of grapes especially) a red as dark as red wine"

32

JIGSAWothers: Example (4/5)

"I bought a bottle of aged wine", w = aged, N = {bottle, wine}

Overlap between the gloss of candidate 1 ({01703749} aged, elderly, older, senior) and the gloss of the whole context: no overlap

33

JIGSAWothers: Example (4/5)

"I bought a bottle of aged wine", w = aged, N = {bottle, wine}

Overlap between the gloss of candidate 2 ({01546830} aged, ripened) and the gloss of the whole context: overlap found

34

JIGSAWothers: Example (5/5)

"I bought a bottle of aged wine", w = aged, N = {bottle, wine}

Selected synset: 01546830 (aged, ripened)
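The gloss-overlap step of this example can be sketched as below, using the glosses (with example phrases) from the slides. Stopword removal and the plural-stripping rule are our assumptions, a crude stand-in for proper lemmatization, added so that "wines" in the gloss of 01546830 matches "wine" in the context gloss; the overlap is the number of shared content words.

```python
import re

STOPWORDS = {"a", "an", "and", "as", "be", "can", "for", "in", "of", "or",
             "that", "the", "to", "with", "without"}


def content_words(text):
    """Lowercase, drop stopwords, strip a plural -s (crude lemmatization stand-in)."""
    words = set()
    for tok in re.findall(r"[a-z]+", text.lower()):
        if tok in STOPWORDS:
            continue
        if len(tok) > 3 and tok.endswith("s") and not tok.endswith("ss"):
            tok = tok[:-1]
        words.add(tok)
    return words


def lesk_select(candidates, context_gloss):
    """Pick the candidate synset whose gloss shares most content words with the context gloss."""
    context = content_words(context_gloss)
    overlaps = {sid: len(content_words(gloss) & context) for sid, gloss in candidates.items()}
    return max(overlaps, key=overlaps.get), overlaps


CANDIDATES = {  # glosses + example phrases of the candidate synsets of "aged"
    "01703749": "advanced in years; aged members of the society; elderly residents "
                "could remember the construction of the first skyscraper; senior citizen",
    "01546830": "of wines, fruit, cheeses; having reached a desired or final condition; "
                "mature well-aged cheeses",
}
CONTEXT_GLOSS = (  # gloss of the whole context (bottle + wine)
    "a glass or plastic vessel used for storing drinks or other liquids typically "
    "cylindrical without handles and with a narrow neck that can be plugged or capped "
    "the quantity contained in a bottle fermented juice (of grapes especially) "
    "a red as dark as red wine"
)

sense, overlaps = lesk_select(CANDIDATES, CONTEXT_GLOSS)
print(sense, overlaps)
```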

35

Paper Recommending

Each instance (paper) is split into three slots: Title, Authors, Abstract

- Keyword-based representation (BOW): tokenization + stopword removal + stemming
- Sense-based representation (BOS): tokenization + stopword removal + POS tagging + disambiguation

Content-based recommendations by learning from TEXT and USER RATINGS (1-5) on papers
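A minimal sketch of the keyword-based (BOW) pipeline for one paper, keeping the three slots separate. The stopword list and the suffix-stripping "stemmer" are hypothetical stand-ins for a real stopword list and stemmer, and the paper in the example is made up.

```python
import re
from collections import Counter

SLOTS = ("title", "authors", "abstract")
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def crude_stem(token):
    """Hypothetical suffix stripper standing in for a real stemmer."""
    if token.endswith("ing") and len(token) > 5:
        return token[:-3]
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token


def bow_slot(text):
    """Tokenization + stopword removal + stemming for one slot."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(crude_stem(t) for t in tokens if t not in STOPWORDS)


def paper_to_bow(paper):
    """One bag of words per slot; slots are never merged."""
    return {slot: bow_slot(paper.get(slot, "")) for slot in SLOTS}


paper = {  # made-up example paper
    "title": "Semantic user profiles for paper recommending",
    "authors": "A. Author, B. Author",
    "abstract": "User profiles are learned from rated papers.",
}
for slot, bag in paper_to_bow(paper).items():
    print(slot, dict(bag))
```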

36

An example of BOS-generated Profile

37

Conference Participant Advisor: Login

Conference Participant Advisor service

38

Conference Participant Advisor: Selecting Papers to train the system

39

Conference Participant Advisor: Query disambiguation

40

Conference Participant Advisor: Rating Retrieved Papers

41

Conference Participant Advisor: Getting the Personalized Program

42

Personalized Program delivered by mail

1 - personalized conference program

2 - details about recommended papers

43

Conference Participant Advisor: Personalized Program + Paper details

44

Experimental Evaluation

- Experiments: BOW-generated profiles vs. BOS-generated profiles
- ISWC dataset: 100 papers accepted at ISWC 2002-2003; 288 ratings collected from 11 users
- 5-fold stratified cross-validation; metrics: Precision, Recall, F-measure, NDPM
- A paper is relevant if its rating is > 3; it is classified as liked if the probability of class "likes" is > 0.5
- Wilcoxon signed rank test: the classification for each user is a trial (low number of independent trials); significance level p < 0.05
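The relevance rules above can be sketched as follows. The thresholds (rating > 3, P(likes) > 0.5) come from the slide; the NDPM function follows the common pairwise formulation (2C⁻ + Cᵘ) / (2Cⁱ), which is our assumption about the exact variant used in the experiments. The ratings and probabilities in the usage lines are made up.

```python
from itertools import combinations


def precision_recall_f1(ratings, probs_likes, rating_threshold=3, prob_threshold=0.5):
    """Relevant: rating > 3. Predicted relevant: P(likes) > 0.5."""
    tp = fp = fn = 0
    for r, p in zip(ratings, probs_likes):
        relevant, predicted = r > rating_threshold, p > prob_threshold
        tp += relevant and predicted
        fp += predicted and not relevant
        fn += relevant and not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def ndpm(user_ratings, system_scores):
    """NDPM = (2 * C_minus + C_u) / (2 * C_i) over pairs of items.

    C_i: pairs the user orders strictly; C_minus: of those, pairs the system
    orders the opposite way; C_u: of those, pairs the system ties.
    0 means the same ordering as the user, 1 the reversed ordering.
    """
    c_minus = c_u = c_i = 0
    for a, b in combinations(range(len(user_ratings)), 2):
        du = user_ratings[a] - user_ratings[b]
        if du == 0:
            continue
        c_i += 1
        ds = system_scores[a] - system_scores[b]
        if ds == 0:
            c_u += 1
        elif (du > 0) != (ds > 0):
            c_minus += 1
    return (2 * c_minus + c_u) / (2 * c_i) if c_i else 0.0


# Made-up ratings (1-5) and predicted P(likes) for four papers.
print(precision_recall_f1([5, 4, 2, 1], [0.9, 0.3, 0.6, 0.1]))
print(ndpm([5, 4, 2, 1], [0.9, 0.3, 0.6, 0.1]))
```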

45

Results of Semantic Profiles Evaluation

User    Precision     Recall        F1            NDPM
        BOW   BOS     BOW   BOS     BOW   BOS     BOW   BOS
1       0.57  0.55    0.47  0.50    0.51  0.53    0.60  0.56
2       0.73  0.55    0.70  0.83    0.72  0.67    0.43  0.46
3       0.60  0.57    0.35  0.35    0.44  0.43    0.55  0.59
4       0.60  0.53    0.30  0.43    0.40  0.48    0.47  0.47
5       0.58  0.67    0.65  0.53    0.61  0.59    0.39  0.59
6       0.93  0.96    0.83  0.83    0.88  0.89    0.46  0.36
7       0.55  0.90    0.60  0.60    0.58  0.72    0.45  0.48
8       0.74  0.65    0.63  0.62    0.68  0.63    0.37  0.33
9       0.60  0.54    0.63  0.73    0.62  0.62    0.31  0.27
10      0.50  0.70    0.37  0.50    0.42  0.58    0.51  0.48
11      0.55  0.45    0.83  0.70    0.67  0.55    0.38  0.33
Mean    0.63  0.64    0.58  0.60    0.59  0.61    0.45  0.45

Difference of the means, BOS vs. BOW: Precision +1%, Recall +2%, F1 +2%, NDPM = (for NDPM, lower is better)
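The Mean row can be recomputed from the per-user values as a plain average over the 11 users. This sketch does so and shows that the differences on the slide correspond to differences of the rounded means, read as percentage points (our reading of the "+1%" / "+2%" notation).

```python
from statistics import mean

# Per-user values (users 1-11) copied from the results table: (BOW, BOS).
RESULTS = {
    "Precision": ([0.57, 0.73, 0.60, 0.60, 0.58, 0.93, 0.55, 0.74, 0.60, 0.50, 0.55],
                  [0.55, 0.55, 0.57, 0.53, 0.67, 0.96, 0.90, 0.65, 0.54, 0.70, 0.45]),
    "Recall": ([0.47, 0.70, 0.35, 0.30, 0.65, 0.83, 0.60, 0.63, 0.63, 0.37, 0.83],
               [0.50, 0.83, 0.35, 0.43, 0.53, 0.83, 0.60, 0.62, 0.73, 0.50, 0.70]),
    "F1": ([0.51, 0.72, 0.44, 0.40, 0.61, 0.88, 0.58, 0.68, 0.62, 0.42, 0.67],
           [0.53, 0.67, 0.43, 0.48, 0.59, 0.89, 0.72, 0.63, 0.62, 0.58, 0.55]),
    "NDPM": ([0.60, 0.43, 0.55, 0.47, 0.39, 0.46, 0.45, 0.37, 0.31, 0.51, 0.38],
             [0.56, 0.46, 0.59, 0.47, 0.59, 0.36, 0.48, 0.33, 0.27, 0.48, 0.33]),
}


def summarize(results):
    """Return {metric: (mean BOW, mean BOS, BOS - BOW)}, each rounded to 2 decimals."""
    summary = {}
    for metric, (bow, bos) in results.items():
        m_bow, m_bos = round(mean(bow), 2), round(mean(bos), 2)
        summary[metric] = (m_bow, m_bos, round(m_bos - m_bow, 2))
    return summary


for metric, (m_bow, m_bos, diff) in summarize(RESULTS).items():
    print(f"{metric:9s} BOW={m_bow:.2f} BOS={m_bos:.2f} diff={diff:+.2f}")
```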

46

Conclusions & Future Work

- Conference Participant Advisor: an intelligent service relying on concept-based profiles
- WSD based on a linguistic ontology
- Future work: integration of
  - domain-specific ontologies in the process of semantic representation and indexing of documents
  - social networks of conference participants as an additional source of information

47

Service details

Service deployed in the VIKEF project at: http://193.204.187.223:8080/iswc_rebuild/

48

49

Backup slides

50

Bag of Words vs. Bag of Synsets

Bag of Words

Doc_id   Word form      Occurrences
31       artificial     1
31       intelligence   1
…        …              …
1134     WWW            3
1134     web            2
…        …              …

Bag of Synsets

- Reduction of features
- Recognition of bigrams
- Synonyms represented by the same synset

Doc_id   Word form                 Synset_id   Occurrences
31       artificial intelligence   6712568     1
…        …                         …           …
1134     roll                      2051720     3
1134     wheel                     2051720     2
…        …                         …           …
1134     WWW, web                  04425517    5

51

Classification Phase

- Each document is represented as a vector of BOS, one for each slot
- Each slot is independent of the others

$$P(c_j \mid d_i) = \frac{P(c_j)}{P(d_i)} \prod_{m=1}^{|S|} \prod_{k=1}^{|b_{im}|} P(t_k \mid c_j, s_m)^{n_{kim}}$$

where:
- S = {s1, s2, …, s|S|} is the set of slots
- b_im is the BOS in slot s_m of instance d_i
- t_k is the k-th token (occurring n_kim times in BOS b_im)
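A minimal sketch of the classification rule above, computed in log space to avoid underflow on long documents. Normalizing over the two classes stands in for dividing by P(d_i); the probability table in the example is hypothetical.

```python
import math


def classify(doc, priors, cond_prob):
    """Posterior P(c | d) for a multi-slot document.

    doc: {slot: {token: count n_kim}}; priors: {class: P(c)};
    cond_prob(token, c, slot) -> P(t_k | c_j, s_m), assumed > 0 (smoothed).
    """
    log_scores = {}
    for c, prior in priors.items():
        score = math.log(prior)
        for slot, bag in doc.items():          # slots are independent
            for token, n in bag.items():       # token probability raised to n_kim
                score += n * math.log(cond_prob(token, c, slot))
        log_scores[c] = score
    top = max(log_scores.values())
    unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(unnorm.values())                   # plays the role of P(d_i)
    return {c: v / z for c, v in unnorm.items()}


# Hypothetical conditional probabilities for a one-token, one-slot document.
TABLE = {("+", "title", "semantic"): 0.4, ("-", "title", "semantic"): 0.1}
posterior = classify({"title": {"semantic": 1}}, {"+": 0.5, "-": 0.5},
                     lambda t, c, s: TABLE[(c, s, t)])
print(posterior)  # "likes" if posterior["+"] > 0.5
```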

52

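Training Phase

- Classes C = {c+, c-}: c+ = likes (ratings 4-5), c- = dislikes (ratings 1-2); rating 3 is neutral
- User ratings r_i are turned into weighted instances w_i, where MAX is the maximum rating:

$$w_i^{+} = \frac{r_i - 1}{MAX - 1}, \qquad w_i^{-} = 1 - w_i^{+}$$

- Prior of class c_j over the training set TR:

$$\hat{P}(c_j) = \frac{1 + \sum_{i=1}^{|TR|} w_i^{j}}{|TR| + 2}$$

- Weighted count of token t_k in slot s_m for class c_j:

$$N(t_k, c_j, s_m) = \sum_{i=1}^{|TR|} w_i^{j} \, n_{kim}$$

- Conditional probability of t_k (Witten-Bell smoothing), where V is the vocabulary and V_{c_j} the number of distinct tokens observed in class c_j:

$$\hat{P}(t_k \mid c_j, s_m) = \begin{cases} \dfrac{N(t_k, c_j, s_m)}{V_{c_j} + \sum_{h=1}^{|V|} N(t_h, c_j, s_m)} & \text{if } N(t_k, c_j, s_m) \neq 0 \\[2ex] \dfrac{V_{c_j}}{V_{c_j} + \sum_{h=1}^{|V|} N(t_h, c_j, s_m)} \cdot \dfrac{1}{|V| - V_{c_j}} & \text{otherwise} \end{cases}$$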
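A sketch of the training formulas above, assuming documents are given as per-slot token counts and MAX = 5. Counts and vocabularies are kept per slot, which is our reading of the s_m index; the uniform fallback for a class with no evidence in a slot and the guard on |V| - V_cj are added safety checks, not part of the slide.

```python
from collections import Counter

CLASSES = ("+", "-")


def instance_weights(rating, max_rating=5):
    """w+ = (r - 1) / (MAX - 1), w- = 1 - w+."""
    w_pos = (rating - 1) / (max_rating - 1)
    return {"+": w_pos, "-": 1.0 - w_pos}


def train(training_set, max_rating=5):
    """training_set: list of (doc, rating), doc = {slot: Counter(token -> n_kim)}.

    Returns Laplace-smoothed priors, weighted counts N[c][slot][token]
    and the per-slot vocabulary.
    """
    mass = {c: 0.0 for c in CLASSES}
    counts = {c: {} for c in CLASSES}
    vocab = {}
    for doc, rating in training_set:
        w = instance_weights(rating, max_rating)
        for slot, bag in doc.items():
            vocab.setdefault(slot, set()).update(bag)
            for c in CLASSES:
                slot_counts = counts[c].setdefault(slot, Counter())
                for token, n in bag.items():
                    slot_counts[token] += w[c] * n  # N(t_k, c_j, s_m)
        for c in CLASSES:
            mass[c] += w[c]
    priors = {c: (1 + mass[c]) / (len(training_set) + 2) for c in CLASSES}
    return priors, counts, vocab


def witten_bell(token, c, slot, counts, vocab):
    """P(t_k | c_j, s_m) with Witten-Bell smoothing (per-slot vocabulary)."""
    slot_counts = counts[c].get(slot, Counter())
    total = sum(slot_counts.values())
    v_c = sum(1 for v in slot_counts.values() if v > 0)  # distinct tokens seen in class c
    v = len(vocab.get(slot, ()))
    if total == 0:
        return 1.0 / max(v, 1)  # no evidence for this class: uniform fallback
    n = slot_counts.get(token, 0.0)
    if n > 0:
        return n / (v_c + total)
    return v_c / (v_c + total) / max(v - v_c, 1)


TS = [({"title": Counter({"semantic": 2, "web": 1})}, 5),
      ({"title": Counter({"apple": 1, "web": 1})}, 1)]
priors, counts, vocab = train(TS)
print(priors, {t: witten_bell(t, "+", "title", counts, vocab) for t in sorted(vocab["title"])})
```

With these two made-up papers, the probabilities of the three title tokens for class c+ sum to 1: seen tokens share total / (V_cj + total) of the mass, and the unseen "apple" gets the remaining V_cj / (V_cj + total).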

53

Evaluation

- JIGSAW evaluated on the SENSEVAL-3 English Lexical Sample task: 37.6% precision
- JIGSAW evaluated on the SENSEVAL-3 English All Words task: 52% precision

SENSEVAL-3 English Lexical Sample task

Algorithm                  Precision
Lesk-based (nouns)         0.246
Lesk-based (verbs)         0.295
Lesk-based (adjectives)    0.403
JIGSAWnouns                0.319
JIGSAWverbs                0.405
JIGSAWothers               0.403