
Selected New Training Documents to Update User Profile

Abdulmohsen Algarni and Yuefeng Li and Yue Xu

CIKM 2010

Hao-Chin Chang

Department of Computer Science & Information Engineering, National Taiwan Normal University

2011/06/07


Outline

• Introduction

• Pattern-based model

• Adaptive information filtering

• Experiment

• Conclusion

Introduction

• The filtering task indicates to the user which documents might be of interest

– Determining which ones are really relevant is left entirely to the user

• The aim of an Information Filtering (IF) model is to re-rank the incoming set of documents based on a user profile for the user's topic

• Major studies in this area fall into two main groups

– The first extracts knowledge from user feedback to build the user's profile

– The second deals with how effectively and efficiently a user profile can be updated with new feedback, in order to follow changes in the user's interests and improve the quality of the filtering system


Introduction

• Relevance Feature Discovery (RFD)

• High-level features (patterns) and low-level features (terms) are extracted from the initial training documents

• The high-level features include both positive and negative patterns

• The low-level term weights are evaluated according to both their specificity and their distributions in the high-level features


Introduction

• To deal with adaptive issues in an IF model, there are two main areas of focus

– The first involves updating the user's profile with new information to follow changes in the user's interests

– The second involves updating the user profile to solve nonmonotonic problems

Example: given training documents about "Agent", an IF system may return information objects such as "Intelligent Agent", "Property Agent", and "Software Agent". Previous matching decisions (e.g., considering "Property Agent" as relevant) may conflict with the user's actual information need (e.g., the user is only interested in "Software Agent", so "Property Agent" is non-relevant).

• How to solve nonmonotonic problems

– The first issue is how to select documents that contain new knowledge the system does not yet have

– The second issue is how to evaluate the base knowledge and update it with the new knowledge efficiently


Introduction

• Adaptive Relevance Features Discovery (ARFD)

• First, the ability of an IF system to extract different knowledge for different users with different topics of interest

• Second, the ability to update and revise the weights of features in the hypothesis space model when new feedback is received


Pattern-based model

• We used a pattern-based model to extract features from relevance feedback

• This is different from the usual definition, where a pattern consists of distinct terms and duplicate terms are removed.

• coverset({t3, t4, t6}, d) = {dp2, dp3, dp4}

• sup_a({t3, t4, t6}, d) = 3

• sup_r({t3, t4, t6}, d) = 3/6 = 0.5

• Closed patterns: <t3, t4, t6>, <t1, t2>, <t6>
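The coverset and support definitions above can be checked with a small brute-force sketch. The six paragraphs below form a hypothetical document d (the slide's own example table is not reproduced here), chosen so that the numbers agree with the slide; the minimum support of 3 paragraphs (50%) used to select frequent patterns is also an assumption. A frequent pattern is treated as closed when no proper superset has the same absolute support.

```python
from itertools import combinations

# Hypothetical paragraphs dp1..dp6 of one document d (illustrative data only,
# chosen so that the results match the numbers on the slide).
PARAGRAPHS = {
    "dp1": {"t1", "t2"},
    "dp2": {"t3", "t4", "t6"},
    "dp3": {"t3", "t4", "t5", "t6"},
    "dp4": {"t3", "t4", "t5", "t6"},
    "dp5": {"t1", "t2", "t6", "t7"},
    "dp6": {"t1", "t2", "t6", "t7"},
}


def coverset(termset, paragraphs):
    """Names of the paragraphs that contain every term of the termset."""
    return {name for name, terms in paragraphs.items() if termset <= terms}


def sup_a(termset, paragraphs):
    """Absolute support: number of covering paragraphs."""
    return len(coverset(termset, paragraphs))


def sup_r(termset, paragraphs):
    """Relative support: fraction of paragraphs that cover the termset."""
    return sup_a(termset, paragraphs) / len(paragraphs)


def closed_patterns(paragraphs, min_sup):
    """Brute-force closed frequent patterns (fine for a toy vocabulary)."""
    vocab = sorted(set().union(*paragraphs.values()))
    frequent = {}
    for k in range(1, len(vocab) + 1):
        for combo in combinations(vocab, k):
            x = frozenset(combo)
            s = sup_a(x, paragraphs)
            if s >= min_sup:
                frequent[x] = s
    # Closed: no proper superset with the same support (such a superset
    # would itself be frequent, so checking `frequent` is enough).
    return {x: s for x, s in frequent.items()
            if not any(x < y and frequent[y] == s for y in frequent)}


if __name__ == "__main__":
    x = frozenset({"t3", "t4", "t6"})
    print(sorted(coverset(x, PARAGRAPHS)))  # ['dp2', 'dp3', 'dp4']
    print(sup_a(x, PARAGRAPHS), sup_r(x, PARAGRAPHS))  # 3 0.5
    for pattern, support in closed_patterns(PARAGRAPHS, min_sup=3).items():
        print(sorted(pattern), support)
```

With this data, {t3} or {t1} are frequent but not closed, because {t3, t4, t6} and {t1, t2} have the same support; {t6} stays closed because it covers five paragraphs while every superset covers fewer.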


Pattern-based model

• For a given term t, its weight is computed from the patterns discovered in the positive training documents

• The specificity of a given term t is computed over the training set D = D+ ∪ D−

• Finally, the initial weights of the terms are revised


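$$w(t) = w(t, D^{+}) = \sum_{i=1}^{|D^{+}|} \sum_{t \in p \subseteq SP_i} \frac{sup_r(p, d_i)}{|p|}$$

$$coverage^{+}(t) = \{d \in D^{+} \mid t \in d\}, \qquad coverage^{-}(t) = \{d \in D^{-} \mid t \in d\}$$

$$spe(t) = \frac{|coverage^{+}(t)| - |coverage^{-}(t)|}{|D|}$$

$$T_1 = \{t \in T \mid spe(t) > \theta_2\}, \qquad G = \{t \in T \mid \theta_1 \le spe(t) \le \theta_2\}, \qquad T_2 = \{t \in T \mid spe(t) < \theta_1\}$$

$$weight(t) = \begin{cases} w(t) + w(t)\,spe(t) & \text{if } t \in T_1 \\ w(t) & \text{if } t \in G \\ w(t) - |w(t)\,spe(t)| & \text{if } t \in T_2 \end{cases}$$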
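A minimal sketch of this term weighting, under stated assumptions: w(t) sums sup_r(p, d_i)/|p| over every discovered pattern p in SP_i (the patterns of positive document d_i) that contains t; spe(t) = (|coverage+(t)| − |coverage−(t)|)/|D|, with D+ and D− assumed disjoint so that |D| = |D+| + |D−|; and each term falls into T1 (spe(t) > θ2, boosted), G (unchanged), or T2 (spe(t) < θ1, penalised). The pattern supports, documents, and thresholds θ1 = 0 and θ2 = 0.3 are hypothetical toy values, not values from the paper.

```python
from typing import Dict, FrozenSet, List, Set

Pattern = FrozenSet[str]


def pattern_weight(t: str, sp: List[Dict[Pattern, float]]) -> float:
    """w(t, D+): for each positive document d_i, add sup_r(p, d_i) / |p|
    for every discovered pattern p in SP_i that contains t."""
    return sum(sup / len(p)
               for sp_i in sp
               for p, sup in sp_i.items()
               if t in p)


def specificity(t: str, d_pos: List[Set[str]], d_neg: List[Set[str]]) -> float:
    """spe(t) = (|coverage+(t)| - |coverage-(t)|) / |D|, assuming D+ and D-
    are disjoint so that |D| = |D+| + |D-|."""
    cov_pos = sum(1 for d in d_pos if t in d)
    cov_neg = sum(1 for d in d_neg if t in d)
    return (cov_pos - cov_neg) / (len(d_pos) + len(d_neg))


def revised_weight(w: float, spe: float, theta1: float, theta2: float) -> float:
    """Revise w(t) according to the group of t: T1, G, or T2."""
    if spe > theta2:   # T1: positive specific term, boosted
        return w + w * spe
    if spe < theta1:   # T2: negative specific term, penalised
        return w - abs(w * spe)
    return w           # G: general term, unchanged


# Hypothetical toy data (not from the paper).
SP = [
    {frozenset({"a", "b"}): 0.5, frozenset({"b"}): 0.75},  # SP_1 for d_1
    {frozenset({"b", "c"}): 0.5},                          # SP_2 for d_2
]
D_POS = [{"a", "b"}, {"b", "c"}]
D_NEG = [{"c", "d"}, {"c"}]
THETA1, THETA2 = 0.0, 0.3

if __name__ == "__main__":
    for term in ("a", "b", "c"):
        w = pattern_weight(term, SP)
        spe = specificity(term, D_POS, D_NEG)
        print(term, w, spe, revised_weight(w, spe, THETA1, THETA2))
```

With this toy data, term b (spe = 0.5) is boosted from 1.25 to 1.875, term a (spe = 0.25) falls in G and keeps 0.25, and term c (spe = −0.25) is penalised from 0.25 to 0.1875.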

Adaptive information filtering

• Document Selection

– New feedback can be categorized into two main categories

• First, documents that further explain what the user needs on the same topic

• Second, documents that cover a new area or topic, indicating that the user has changed topics of interest; this category is out of our scope

• The system uses the following ranking function:


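$$rank(d) = \sum_{t \in T} w(t)\,\tau(t, d), \qquad \tau(t, d) = \begin{cases} 1 & \text{if } t \in d \\ 0 & \text{if } t \notin d \end{cases}$$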
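The ranking function simply adds up the weights of the profile terms that occur in a document (τ(t, d) is 1 when t appears in d and 0 otherwise). A minimal sketch, using hypothetical term weights for the "Agent" example from the introduction; the weights are illustrative, not values from the paper.

```python
from typing import Dict, List, Set


def rank(doc: Set[str], weights: Dict[str, float]) -> float:
    """rank(d) = sum over profile terms t of w(t) * tau(t, d),
    where tau(t, d) = 1 if t occurs in d, else 0."""
    return sum(w for t, w in weights.items() if t in doc)


def rank_documents(docs: List[Set[str]], weights: Dict[str, float]) -> List[int]:
    """Indices of the documents, ordered from highest to lowest rank."""
    return sorted(range(len(docs)), key=lambda i: rank(docs[i], weights), reverse=True)


# Hypothetical profile weights and incoming documents (illustrative only).
WEIGHTS = {"software": 2.0, "agent": 1.0, "property": 0.25}
DOCS = [
    {"property", "agent"},
    {"software", "agent"},
    {"intelligent", "agent"},
]

if __name__ == "__main__":
    print([rank(d, WEIGHTS) for d in DOCS])  # [1.25, 3.0, 1.0]
    print(rank_documents(DOCS, WEIGHTS))     # [1, 0, 2]
```

Terms outside the profile (here "intelligent") contribute nothing, so the ordering depends entirely on how well the profile weights separate the user's real interest.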

Adaptive information filtering

• Knowledge Extraction and Merging in the Adaptive RFD Model (ARFD)

(1) Mining features and functions from the initial (or base) training set Db

(2) SNTSelect: selecting some target documents Ds from a new training set Dn in order to remove redundant documents

(3) Mining features and functions from the target documents Ds

(4) Merging the features and functions discovered from both the initial training set and the selected target documents
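The four steps can be outlined as a pipeline skeleton. This is not the ARFD algorithm: `mine_features` is a plain document-frequency count standing in for RFD pattern mining, `select_new_documents` is a hypothetical stand-in for SNTSelect that keeps only new documents containing at least one term missing from the base profile (treating the rest as redundant), and `merge` simply adds weights. Only the control flow of steps (1) to (4) is meant to carry over.

```python
from collections import Counter
from typing import Dict, List, Set


def mine_features(docs: List[Set[str]]) -> Dict[str, float]:
    """Stand-in for RFD feature mining: document frequency of each term."""
    counts = Counter(t for d in docs for t in d)
    return {t: float(c) for t, c in counts.items()}


def select_new_documents(new_docs: List[Set[str]],
                         base: Dict[str, float]) -> List[Set[str]]:
    """Hypothetical stand-in for SNTSelect: keep a document only if it
    contains some term the base profile does not have yet."""
    return [d for d in new_docs if any(t not in base for t in d)]


def merge(base: Dict[str, float], extra: Dict[str, float]) -> Dict[str, float]:
    """Stand-in merge: add the new weights to the base weights."""
    merged = dict(base)
    for t, w in extra.items():
        merged[t] = merged.get(t, 0.0) + w
    return merged


def adaptive_update(d_base: List[Set[str]], d_new: List[Set[str]]) -> Dict[str, float]:
    base = mine_features(d_base)               # step (1): mine Db
    d_s = select_new_documents(d_new, base)    # step (2): select Ds from Dn
    extra = mine_features(d_s)                 # step (3): mine Ds
    return merge(base, extra)                  # step (4): merge both


if __name__ == "__main__":
    profile = adaptive_update([{"a", "b"}, {"b"}],
                              [{"a"}, {"b", "c"}, {"c", "d"}])
    print(sorted(profile.items()))  # [('a', 1.0), ('b', 3.0), ('c', 2.0), ('d', 1.0)]
```

In this toy run the new document {"a"} adds no unseen term and is dropped as redundant, so only the two documents that bring in c or d contribute to the update.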


Experiment


Experiment


Conclusion

• We proposed an adaptive information filtering system called Adaptive Relevance Features Discovery (ARFD)

• The main aim of this method is to efficiently revise and update the weights of extracted features in the vector space using new training documents, in order to solve the nonmonotonic problem

• The combined knowledge will be tested to verify that it helps to solve the nonmonotonic problem

P-value

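• By definition, the p-value is the smallest significance level at which the null hypothesis H0 can be rejected, given the current sample data

• The significance level is the upper bound on the Type I error probability we are willing to tolerate when performing a test. Therefore, the smaller the significance level, the smaller the rejection region. So, if H0 can be rejected at a given significance level based on the current data, the significance level can be lowered; but if it is lowered too far, the current data point may be pushed outside the rejection region, i.e., H0 can no longer be rejected. The p-value is the significance level obtained by relaxing it until H0 can be rejected and then tightening it as far as possible, to the boundary where H0 can only just be rejected.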
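As a worked illustration of this definition, the sketch below computes an exact one-sided sign-test p-value for hypothetical data: one system wins 8 of 10 paired comparisons, and H0 says each comparison is a fair coin flip. The p-value is 56/1024 ≈ 0.0547, so H0 is rejected at α = 0.06 but not at α = 0.05, and 0.0547 is exactly the smallest significance level at which these data reject H0. The sign test is chosen only for illustration; the slides do not state which test the experiments used.

```python
from math import comb


def sign_test_p_value(wins: int, n: int) -> float:
    """Exact one-sided sign test: P(X >= wins) where X ~ Binomial(n, 0.5) under H0."""
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n


def reject_h0(p_value: float, alpha: float) -> bool:
    """H0 is rejected whenever the significance level alpha is at least the p-value."""
    return p_value <= alpha


if __name__ == "__main__":
    p = sign_test_p_value(wins=8, n=10)  # hypothetical: 8 wins out of 10 pairs
    print(p)                    # 0.0546875
    print(reject_h0(p, 0.06))   # True
    print(reject_h0(p, 0.05))   # False
```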


SPMining


HLFmining


NRevision


SNTselect
