Extending facet search to the general web

Date: 2014/11/27

Author: Weize Kong, James Allan

Source: CIKM’14

Advisor: Jia-ling Koh

Speaker: LIN, CI-JIE

Outline

Introduction

Method

Evaluation

Experiment

Conclusion

Introduction

Faceted search helps users by offering drill-down options as a complement to the keyword input box

[Figure labels: facet, facet term, multiple selections]

Introduction

However, this idea is not well explored for general web search, where the web poses two challenges:

Big data

Heterogeneous nature

[Figure: Google default facets]

Introduction

Goal:

Query-dependent automatic facet generation

Incorporate user feedback on these query facets into document ranking

[Figure: a query-dependent facet with the terms "All routes", "International routes", "Domestic routes"]

Flow chart

Query → Search result → Extracting candidates → Candidate facets → Refining candidates → Facets → Selecting facets → Facet feedback terms

Facet feedback terms + Top-ranked documents → Facet feedback model → Ranking documents

Extracting candidate

Apply both textual and HTML patterns to the top search results to extract candidate facets

Extracting candidate example

Query: “mars landing”

Search result: “Mars rovers such as Curiosity, Opportunity and Spirit”

Candidate facet:

C: {Curiosity, Opportunity, Spirit}
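
The example above can be sketched in Python. This is only an illustration of one textual pattern ("... such as A, B and C"). The pattern, the function name `extract_candidate_facet`, and the splitting rule are assumptions made for this sketch, not the patterns used in the paper.

```python
import re

# Hypothetical textual pattern: everything after "such as" is treated as a list.
LIST_PATTERN = re.compile(r"such as\s+(.+)", re.IGNORECASE)

# Split on commas (optionally followed by "and"/"or") or on a bare "and"/"or".
SEPARATOR = re.compile(r"\s*,\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+")


def extract_candidate_facet(snippet):
    """Return the list items found after 'such as', or an empty list."""
    match = LIST_PATTERN.search(snippet)
    if match is None:
        return []
    listed = match.group(1).rstrip(" .")
    return [item.strip() for item in SEPARATOR.split(listed) if item.strip()]


facet = extract_candidate_facet(
    "Mars rovers such as Curiosity, Opportunity and Spirit"
)
print(facet)
```

A real extractor would need many more patterns (including HTML lists and tables) and would run over every top-ranked result, which is one reason the raw candidates are noisy.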

Cleaning candidate query facets

Converting text to lowercase

Removing non-alphanumeric characters

Removing stopwords and duplicate terms

Removing all candidate facets that contain

only one item or more than 200 items
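
The cleaning steps above can be sketched as follows. The stopword list is a tiny placeholder, the function names are hypothetical, and replacing non-alphanumeric characters with a space (rather than deleting them) is an assumption.

```python
import re

# Placeholder stopword list for illustration only.
STOPWORDS = {"a", "an", "and", "of", "the", "or", "in", "on", "for", "to"}


def clean_term(term):
    """Lowercase, strip non-alphanumeric characters, and drop stopwords."""
    term = term.lower()
    term = re.sub(r"[^a-z0-9 ]+", " ", term)
    words = [w for w in term.split() if w not in STOPWORDS]
    return " ".join(words)


def clean_facets(candidate_facets, min_items=2, max_items=200):
    """Clean every term, remove duplicates, and drop facets with a single
    item or with more than max_items items."""
    cleaned = []
    for facet in candidate_facets:
        terms = []
        for term in facet:
            t = clean_term(term)
            if t and t not in terms:
                terms.append(t)
        if min_items <= len(terms) <= max_items:
            cleaned.append(terms)
    return cleaned


print(clean_facets([["Curiosity", "curiosity!", "The Spirit"], ["NASA"], ["2012", "2013"]]))
```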

Extracting candidate

The candidate query facets extracted are usually noisy

Non-relevant to the issued query

Terms are not members of the same class

Incomplete

[Figure: four candidate facets for the query “mars landing”]

Refining Candidate

Re-cluster the query facets or their facet terms into higher-quality query facets

Refining Candidate

Topic models: pLSA, LDA

Unsupervised clustering method: QDMiner (QDM)

Supervised methods based on a graphical model: QF-I, QF-J

Refining Candidate example

Input: {sets of noisy terms}

Refining candidates

Output: {pure facets}

Year: {2001, 2012, 2013}

Lab: {nasa, bell lab, mars science lab}

Facet feedback model

Gives a score for each document

Input: document, query, facet feedback terms

Models:

Boolean filtering model

Soft ranking model

Output: the score of each document

Boolean Filtering Model

𝐹𝑢 denotes the set of feedback facets selected by a user

Condition B can be AND, OR, or A+O

S(D, Q) is the score returned by the original retrieval model
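
A minimal Python sketch of Boolean filtering under stated assumptions: a document that satisfies condition B keeps its original score S(D, Q), and any other document is filtered out (scored as negative infinity here). A+O is read as OR within a facet and AND across facets, which is an assumption of this sketch. All function and parameter names are hypothetical.

```python
import math


def satisfies(doc_terms, feedback_facets, condition):
    """Check whether a document meets Boolean condition B over the feedback facets."""
    doc = set(doc_terms)
    facets = [set(f) for f in feedback_facets if f]
    if condition == "AND":
        # Every selected term must occur in the document.
        return all(t in doc for f in facets for t in f)
    if condition == "OR":
        # At least one selected term must occur in the document.
        return any(t in doc for f in facets for t in f)
    if condition == "A+O":
        # Assumed reading: OR inside each facet, AND across facets.
        return all(any(t in doc for t in f) for f in facets)
    raise ValueError(f"unknown condition: {condition}")


def boolean_filter_score(base_score, doc_terms, feedback_facets, condition="OR"):
    """Keep the original score if the document passes the filter, else drop it."""
    if satisfies(doc_terms, feedback_facets, condition):
        return base_score
    return -math.inf


doc = ["mars", "curiosity", "2012"]
selected = [["curiosity", "spirit"], ["2012"]]
for cond in ("AND", "OR", "A+O"):
    print(cond, boolean_filter_score(1.5, doc, selected, cond))
```

The AND case shows why a hard filter can be strict: one missing feedback term removes the document entirely.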

Soft Ranking Model

λ is a parameter for adjusting the weight between the two parts

𝑆𝐸(D, 𝐹𝑢) is the expansion model, which captures the relevance between the document D and the feedback facets 𝐹𝑢
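
A minimal sketch of soft ranking, assuming the two parts are combined by linear interpolation with λ on the original score S(D, Q) and 1 − λ on the expansion score. Which part λ weights, the default value, and the function names are assumptions of this sketch. The expansion score is passed in as a number so that either expansion model could supply it.

```python
def soft_ranking_score(base_score, expansion_score, lam=0.5):
    """Interpolate the original retrieval score with the expansion score."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    return lam * base_score + (1.0 - lam) * expansion_score


def rerank(docs, lam=0.5):
    """docs: list of (doc_id, base_score, expansion_score) tuples.
    Returns doc ids sorted by the interpolated score, best first."""
    scored = [(soft_ranking_score(b, e, lam), doc_id) for doc_id, b, e in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored]


print(rerank([("a", 1.0, 0.0), ("b", 0.5, 2.0)], lam=0.5))
```

Unlike a Boolean filter, no document is removed here: feedback only shifts scores, so a relevant document that lacks the selected terms can still be ranked.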

Expansion model

Term expansion model

Facet expansion model

Evaluation

Intrinsic Evaluation

Extrinsic Evaluation

Intrinsic Evaluation

Ground truth: query facets are constructed by human annotators

The ground truth is compared with the facets generated by different systems

Facets generated by the different systems are pooled

Annotators are asked to group or re-group the terms in the pool into preferred query facets

Intrinsic Evaluation

[Diagram: Query → Search result → Extracting candidates → Candidate facets → Refining candidates → Facets: {terms}. Facets generated by different systems → Pool → annotators re-group → Facets: {terms}]

Extrinsic Evaluation

Evaluate a system based on an interactive search task that incorporates Faceted Web Search (FWS)

The gain can be measured by the improvement of the re-ranked results

The cost can be measured by the time users spend giving facet feedback

Extrinsic Evaluation

Oracle feedback and annotator feedback

Oracle feedback selects only the effective terms as feedback

For annotator feedback, the annotator is asked to select all the terms from the facets that would help address the information need

Extrinsic Evaluation

User model

Based on the user model, we can estimate the time cost for the user:

Time for scanning facets

Time for selecting terms
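
A rough sketch of how such a time cost could be estimated, assuming the user reads every term in the presented facets and that scanning and selecting each cost a constant time per term. The constants and the function name `feedback_time` are placeholders for illustration, not values or definitions from the paper.

```python
def feedback_time(facets, selected_terms,
                  scan_time_per_term=1.0, select_time_per_term=2.0):
    """Estimate the user's time cost as scanning time plus selecting time.

    facets: list of facets, each a list of terms shown to the user.
    selected_terms: set of terms the user picks as feedback.
    """
    shown = [term for facet in facets for term in facet]
    # Scanning cost: every presented term is read once (assumption).
    scan_cost = scan_time_per_term * len(shown)
    # Selecting cost: only terms that were actually presented can be selected.
    picked = [term for term in shown if term in selected_terms]
    select_cost = select_time_per_term * len(picked)
    return scan_cost + select_cost


print(feedback_time([["a", "b", "c"], ["d", "e"]], {"a", "d"}))
```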

Extrinsic Evaluation

[Diagram labels: Facet generation model; Facets: {terms}; User model; Simulated feedback; Time; Selected terms; Facet feedback; Performance; Evaluation]

Experiment Settings

Data set: for the document corpus, we use the ClueWeb09 Category-B collection, with 196 queries and 678 query subtopics

Facet generation models: pLSA, LDA, QDM, QF-I and QF-J

Facet feedback models: Boolean filtering model, soft ranking model

Baseline retrieval model: SDM

Experiment

The experiments testify to the potential of Faceted Web Search

Conclusion

Proposed Faceted Web Search, an extension of faceted search to the general Web

Boolean filtering models are too strict for Faceted Web Search and are less effective than soft ranking models

Thanks for listening.
