
Learning to Suggest Questions in Online Forums

Tom Chao Zhou, Chin-Yew Lin, Irwin King, Michael R. Lyu, Young-In Song, Yunbo Cao

Chinese University of Hong Kong
Microsoft Research Asia
AT&T Labs Research

August 11, 2011 @ AAAI
San Francisco, USA


• Background
• Motivation
• Related Work
• Our Approach
• Experiments
• Conclusions and Future Work


Background

• Online forum
  – Web application
  – Interactive, domain-specific
  – E.g. travel, sports, programming


Background

• Threads
  – Each thread contains a discussion topic


Background

• Questions are the focus of forum discussions
  – [Shrestha and McKeown 2004]
• Mining knowledge, question-answer pairs
  – [Cong et al. 2008][Bian et al. 2008]
• Question search
  – How is Orange Beach in Alabama?
  – Any idea about Orange Beach in Alabama?
• Limitation
  – Unaware that a query captures only one aspect of a topic


Motivation

• Suggest semantically related questions
  – How is Orange Beach in Alabama?
  – Is the water pretty clear this time of year on Orange Beach?
  – Do they have chair and umbrella rentals on Orange Beach?
  – Topic: “Travel in Orange Beach”
  – beach, water, chair, umbrella, rental…


Motivation

• Benefits
  – Explore information needs from different aspects
    • “Travel”: beach, water, chair, umbrella
  – Increase page views
    • Enticing users’ clicks on suggested questions
  – Relevance feedback mechanism
    • Mining users’ click-through logs on suggested questions


Related Work

• Question search
  – Translation model
    • [Jeon, Croft and Lee 2005][Duan et al. 2008]
  – Translation-based language model
    • [Xue, Jeon and Croft 2008]
• Question recommendation
  – MDL-based tree cut model
    • [Cao et al. 2008]
• Differences
  – Fuse both lexical and latent semantic information
  – Utilize the interactive nature of online forums


Our Approach

• Document representation
  – Bag-of-words
    • Independent
    • Fine-grained representation
    • Lexically similar
  – Topic model
    • Assign a set of latent topic distributions to each word
    • Capturing important relationships between words
    • Coarse-grained representation
    • Semantically related


Our Approach

• TopicTRLM
  – Topic-enhanced Translation-based Language Model


Our Approach

• TopicTRLM
  – q: a query, D: a candidate question
  – w: a word in the query
  – A parameter balances the weights of the BoW and topic model scores
  – Jelinek-Mercer smoothing
  – TRLM score: BoW
  – LDA score: topic model
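
The definitions above describe a linear (Jelinek-Mercer style) mixture of a bag-of-words score and a topic model score. As a sketch only, assuming the balance parameter is written \gamma and the two component scores are named P_trlm and P_lda (these symbols are assumptions, not taken from the slide), the scoring function could be written as:

```latex
P(q \mid D) = \prod_{w \in q} P(w \mid D), \qquad
P(w \mid D) = \gamma \, P_{\mathrm{trlm}}(w \mid D) + (1 - \gamma) \, P_{\mathrm{lda}}(w \mid D)
```

With \gamma = 1 the ranking would rely on the TRLM score alone, and with \gamma = 0 on the LDA score alone.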


Our Approach

• TRLM
  – C: question corpus
  – Dirichlet smoothing parameter
  – T(w|t): word-to-word translation probabilities
• Use of LDA
  – K: number of topics, z: a topic
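
To make the roles of C, T(w|t), K and z concrete, here is a minimal Python sketch of how the two component scores might be computed and combined. It assumes the usual translation-based language model form (a mix of exact and translated matches, Dirichlet-smoothed against the corpus) and the usual LDA word probability (a sum over topics). The names `lam`, `beta` and `gamma`, their default values, and all toy data are hypothetical.

```python
import math
from collections import Counter

def trlm_prob(w, doc, corpus_counts, trans, lam, beta):
    """P_trlm(w|D): Dirichlet-smoothed mix of exact and translated matches."""
    n = len(doc)
    doc_counts = Counter(doc)
    p_direct = doc_counts[w] / n
    # Sum over t in D of T(w|t) * P_ml(t|D); trans is keyed as (w, t).
    p_trans = sum(trans.get((w, t), 0.0) * c / n for t, c in doc_counts.items())
    p_mx = (1 - beta) * p_direct + beta * p_trans
    p_corpus = corpus_counts[w] / sum(corpus_counts.values())
    return n / (n + lam) * p_mx + lam / (n + lam) * p_corpus

def lda_prob(w, doc_topics, topic_words):
    """P_lda(w|D) = sum over z of P(w|z) * P(z|D), with K = len(doc_topics)."""
    return sum(p_z * topic_words[z].get(w, 0.0) for z, p_z in enumerate(doc_topics))

def topic_trlm_log_score(query, doc, doc_topics, corpus_counts, trans,
                         topic_words, gamma=0.7, lam=1.0, beta=0.5):
    """log P(q|D) under a gamma-weighted mix of the TRLM and LDA scores."""
    score = 0.0
    for w in query:
        p = (gamma * trlm_prob(w, doc, corpus_counts, trans, lam, beta)
             + (1 - gamma) * lda_prob(w, doc_topics, topic_words))
        score += math.log(max(p, 1e-12))  # floor avoids log(0) on unseen words
    return score

# Toy data: every value below is invented for illustration.
corpus = [["beach", "water", "clear"],
          ["chair", "umbrella", "rental", "beach"],
          ["hotel", "price", "cheap"]]
corpus_counts = Counter(w for d in corpus for w in d)
trans = {("beach", "beach"): 0.5, ("water", "beach"): 0.3}
topic_words = [{"beach": 0.4, "water": 0.3, "chair": 0.1, "umbrella": 0.1, "rental": 0.1},
               {"hotel": 0.4, "price": 0.3, "cheap": 0.3}]
query = ["beach", "water"]
beach_doc_score = topic_trlm_log_score(query, corpus[1], [0.9, 0.1],
                                       corpus_counts, trans, topic_words)
hotel_doc_score = topic_trlm_log_score(query, corpus[2], [0.1, 0.9],
                                       corpus_counts, trans, topic_words)
print(beach_doc_score > hotel_doc_score)  # True: the beach question ranks higher
```

In this toy run the candidate about chair and umbrella rentals never mentions "water", yet it still outranks the hotel question because both the translation table and the topic distribution connect it to the query.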


Our Approach

• Estimate T(w|t)
  – IBM model 1, monolingual parallel corpus
  – Questions are the focus of forum discussions; questions posted by a thread starter (TS) during the discussion are very likely to explore different aspects of a topic
• Build parallel corpus
  – Extract questions posted by the TS into a question pool Q
  – Form question-question pairs by enumerating combinations in Q
  – Aggregate all q-q pairs from each forum thread
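
The corpus-building steps above can be sketched in a few lines of Python. This is an illustration under assumptions: the function name `build_parallel_corpus` and the `both_directions` flag are hypothetical, the slides do not say whether each pair is used in both directions, and the grouping of the example questions into threads is invented.

```python
from itertools import combinations

def build_parallel_corpus(question_pools, both_directions=True):
    """Turn per-thread question pools into question-question pairs.

    question_pools: one list per thread, holding the questions posted by
    that thread's starter (the pool Q).
    """
    pairs = []
    for pool in question_pools:
        for q1, q2 in combinations(pool, 2):
            pairs.append((q1, q2))
            if both_directions:  # assumption: pairs are used in both directions
                pairs.append((q2, q1))
    return pairs

pools = [
    ["Can you go fishing right from the shore on Orange Beach?",
     "What kinds of rods, and bait is needed for fishing down there?"],
    ["How is Orange Beach in Alabama?",
     "Is the water pretty clear this time of year on Orange Beach?",
     "Do they have chair and umbrella rentals on Orange Beach?"],
    ["Any idea about Orange Beach in Alabama?"],  # a single question yields no pairs
]
pairs = build_parallel_corpus(pools)
print(len(pairs))  # 8
```

A pool of n questions contributes n(n-1)/2 unordered pairs, so threads where the starter asks only one question add nothing to the corpus.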


Experiments

• Data set
  – Crawled from TripAdvisor
  – TST_LABEL: labeled data for 268 questions
  – TST_UNLABEL: 10,000 threads with at least 2 questions posted by thread starters
  – TRAIN_SET: 1,976,522 questions, 971,859 threads
    • Parallel corpus to learn T(w|t)
    • LDA training data
    • Question repository
• Question detector
  – Labeled sequential pattern mining [Cong et al. 2008]


Experiments

• Data analysis
  – Post level
    • Forum discussions are quite interactive
    • Power law

# Threads | # Threads that have replied posts from TS | Average # replied posts from TS
1,412,141 | 566,256 | 1.9


Experiments

• Data analysis
  – Question level
    • 68.8% of thread starters asked questions
    • On average 2 questions are asked by thread starters in each thread
    • Question is a focus of forum discussions

# Threads | # Threads TSs’ posts contain questions | Average # questions in TSs’ posts
1,412,141 | 971,859 | 2.0
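
The 68.8% figure can be checked against the table counts. The short sketch below assumes the percentage is the share of threads whose starter's posts contain questions, and it also derives the corresponding share for the post-level table; the variable names are hypothetical.

```python
total_threads = 1_412_141
threads_with_ts_replies = 566_256    # post-level table
threads_with_ts_questions = 971_859  # question-level table

share_replies = threads_with_ts_replies / total_threads
share_questions = threads_with_ts_questions / total_threads

print(f"{share_replies:.1%}")    # 40.1%
print(f"{share_questions:.1%}")  # 68.8%
```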


Experiments

• Word translation

• IBM 1: semantic relationships of words from semantically related questions

• LDA: co-occurrence relations in a question
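
As a rough illustration of how IBM Model 1 can pick up word relationships from pairs of related questions, here is a minimal EM sketch in Python. It is not the GIZA++ implementation: it omits the NULL word, the function name `train_ibm1` is hypothetical, and the stemmed toy pairs are invented.

```python
from collections import defaultdict

def train_ibm1(pairs, iterations=10):
    """Estimate T(w|t) by EM from (source_tokens, target_tokens) pairs.

    Returns a dict keyed as (w, t). No NULL word is modelled, which is a
    simplification of the full IBM Model 1.
    """
    target_vocab = {w for _, target in pairs for w in target}
    uniform = 1.0 / len(target_vocab)
    t_prob = defaultdict(lambda: uniform)  # uniform starting point
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of w aligned to t
        total = defaultdict(float)  # expected count of t being aligned
        for source, target in pairs:
            for w in target:
                # E-step: spread one unit of w over the source words.
                norm = sum(t_prob[(w, t)] for t in source)
                for t in source:
                    frac = t_prob[(w, t)] / norm
                    count[(w, t)] += frac
                    total[t] += frac
        # M-step: renormalise so that T(.|t) sums to 1 for every t.
        t_prob = defaultdict(float, {wt: c / total[wt[1]] for wt, c in count.items()})
    return dict(t_prob)

# Toy question pairs, already stemmed and stop-word filtered (invented data).
question_pairs = [
    (["fish", "shore", "orang", "beach"], ["rod", "bait", "fish"]),
    (["rod", "bait", "fish"], ["fish", "shore", "orang", "beach"]),
    (["water", "clear", "orang", "beach"], ["chair", "umbrella", "rental", "orang", "beach"]),
    (["chair", "umbrella", "rental", "orang", "beach"], ["water", "clear", "orang", "beach"]),
]
T = train_ibm1(question_pairs)
top_for_beach = sorted(((p, w) for (w, t), p in T.items() if t == "beach"), reverse=True)[:3]
print(top_for_beach)
```

Because the pairs come from different questions in the same thread, the table can relate words that never co-occur inside a single question, which is the contrast with LDA drawn above.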


Experiments

• Labeled question
• LDA performs the worst (coarse-grained)
• TRLM > TR > QL
• TopicTRLM outperforms other approaches


Experiments

• Topics’ joint probability distribution
  – For each q, consider its first subsequent question q’ posted by the TS as relevant
  – For 10,000 q, use LDA to infer the most probable topic, and aggregate the counts of topic transitions
  – K × K topic transition matrix as ground truth
  – KL divergence: the smaller, the better
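
The evaluation above can be sketched as follows. This is an interpretation under assumptions: each method is taken to produce its own topic transition counts (topic of q, topic of its top suggestion), the additive smoothing `eps` and the direction of the divergence (ground truth against method) are choices made here, and the toy transitions are invented.

```python
import math

def transition_distribution(topic_pairs, k, eps=1e-6):
    """Joint distribution over (topic of q, topic of q') from transition counts."""
    counts = [[eps] * k for _ in range(k)]  # eps keeps every cell non-zero
    for z_q, z_next in topic_pairs:
        counts[z_q][z_next] += 1.0
    total = sum(sum(row) for row in counts)
    return [[c / total for c in row] for row in counts]

def kl_divergence(p, q):
    """KL(p || q) over all cells of two K x K joint distributions."""
    return sum(p_ij * math.log(p_ij / q_ij)
               for p_row, q_row in zip(p, q)
               for p_ij, q_ij in zip(p_row, q_row))

K = 3
ground_truth_pairs = [(0, 1)] * 5 + [(1, 2)] * 3 + [(2, 0)] * 2
method_a_pairs = [(0, 1)] * 4 + [(1, 2)] * 4 + [(2, 0)] * 2  # close to ground truth
method_b_pairs = [(0, 0)] * 5 + [(1, 1)] * 3 + [(2, 2)] * 2  # always stays on topic

truth = transition_distribution(ground_truth_pairs, K)
kl_a = kl_divergence(truth, transition_distribution(method_a_pairs, K))
kl_b = kl_divergence(truth, transition_distribution(method_b_pairs, K))
print(kl_a < kl_b)  # True: method A follows the observed topic transitions more closely
```

A method that only ever suggests questions on the query's own topic, like the toy method B, is penalised heavily because the thread starters' real follow-up questions move between topics.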


Conclusions and Future Work

• Summary
  – Propose a question suggestion application in forums
  – Propose a method to build a parallel corpus of related questions
  – Propose TopicTRLM, which fuses lexical knowledge with latent semantic knowledge
• Future work
  – How to measure and diversify the suggested questions?
  – How could question suggestion help long query suggestion?


• Thanks!
• Q & A


FAQ

• Q: Which tools do you use?
• A:
  – GIZA++ [Och and Ney 2003] to train IBM model 1.
  – GibbsLDA++ [Phan, Nguyen and Horiguchi 2008] to conduct LDA training and inference.
  – Porter Stemmer to stem question words.
  – Stop word list from the SMART system, with the 5W1H words removed.


FAQ

• Q: Which metrics do you use?
• A:
  – P@R: Precision at Rank R
  – MAP: Mean average precision
  – MRR: Mean reciprocal rank
  – KL-divergence: Kullback-Leibler divergence
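
For reference, the three ranking metrics can be sketched in Python as below. Each ranked list is represented as 0/1 relevance labels in rank order. One assumption to note: average precision is normalised here by the number of relevant items that appear in the list, whereas some definitions divide by the total number of relevant items instead. The example runs are invented.

```python
def precision_at(ranked_rel, r):
    """P@R: fraction of the top-r results that are relevant."""
    return sum(ranked_rel[:r]) / r

def average_precision(ranked_rel):
    """Mean of the precision values at each rank that holds a relevant result."""
    hits, score = 0, 0.0
    for rank, rel in enumerate(ranked_rel, start=1):
        if rel:
            hits += 1
            score += hits / rank
    return score / hits if hits else 0.0

def reciprocal_rank(ranked_rel):
    """1 / rank of the first relevant result, or 0 if there is none."""
    for rank, rel in enumerate(ranked_rel, start=1):
        if rel:
            return 1.0 / rank
    return 0.0

# Relevance labels of the suggestions returned for two queries (invented).
runs = [[1, 0, 1, 0, 0],
        [0, 1, 0, 0, 1]]
map_score = sum(average_precision(r) for r in runs) / len(runs)
mrr_score = sum(reciprocal_rank(r) for r in runs) / len(runs)
print(precision_at(runs[0], 5))  # 0.4
print(mrr_score)                 # 0.75
```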


FAQ

• Q: How to tune parameters?
• A: We used 20 queries from TST_LABEL and employed MAP to tune parameters


FAQ

• Q: Aligned monolingual questions
• A:

– Has anyone had an experiences with the Eden Condos in Perdido Key?

– Does anyone know how the beaches are there in Perdido key?

– Can you go fishing right from the shore on Orange Beach?

– What kinds of rods, and bait is needed for fishing down there?


FAQ

• Query likelihood language model using Dirichlet smoothing (QL)
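
As a reminder of what this baseline computes, the standard query likelihood model with Dirichlet smoothing can be written as below. The symbol \lambda for the smoothing parameter and the subscript ml for maximum-likelihood estimates are assumptions about notation; q, D, w and C are as defined for TopicTRLM.

```latex
P(q \mid D) = \prod_{w \in q} P(w \mid D), \qquad
P(w \mid D) = \frac{|D|}{|D| + \lambda} \, P_{ml}(w \mid D)
            + \frac{\lambda}{|D| + \lambda} \, P_{ml}(w \mid C)
```

This form only rewards exact word matches between q and D, which is why it serves as the purely lexical baseline.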


FAQ

• Translation model using Dirichlet smoothing (TR)
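
A sketch of how this baseline is commonly formulated: the exact-match term of the query likelihood model is replaced by a sum over translated matches. The symbol \lambda and the exact form are assumptions, not taken from the slide; T(w|t) is the word-to-word translation probability.

```latex
P(w \mid D) = \frac{|D|}{|D| + \lambda} \sum_{t \in D} T(w \mid t) \, P_{ml}(t \mid D)
            + \frac{\lambda}{|D| + \lambda} \, P_{ml}(w \mid C)
```

Under this reading, TRLM sits between QL and TR by interpolating the exact-match term with the translated term.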