A Hierarchical Nonparametric Bayesian Approach to Statistical Language Model Domain Adaptation Frank Wood and Yee Whye Teh AISTATS 2009 Presented by: Mingyuan Zhou Duke University, ECE December 18, 2009


Outline

• Background

• Pitman-Yor Process

• Hierarchical Pitman-Yor Process Language Models

• Doubly Hierarchical Pitman-Yor Process Language Model

• Inference

• Experimental results

• Summary


Background: Language modeling and n-Gram models

• “A language model is usually formulated as a probability distribution p(s) over strings s that attempts to reflect how frequently a string s occurs as a sentence”.

• n-Gram (n=2: bigram, n=3: trigram)

• Smoothing:
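The slide's own formulas did not survive extraction. As a rough illustration of the two bullets above, the sketch below trains a bigram model (n=2) on a toy corpus and applies additive (add-delta) smoothing, one of the simpler techniques surveyed in the cited report. The corpus, the function names `train_bigram` and `bigram_prob`, and the choice of additive smoothing are all assumptions made for illustration, not the presenter's material.

```python
from collections import Counter


def train_bigram(sentences):
    """Count contexts and bigrams, padding each sentence with <s> and </s>."""
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens[1:])  # every token that can be predicted
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts, vocab


def bigram_prob(word, prev, context_counts, bigram_counts, vocab, delta=1.0):
    """Additively smoothed estimate of p(word | prev)."""
    numerator = bigram_counts[(prev, word)] + delta
    denominator = context_counts[prev] + delta * len(vocab)
    return numerator / denominator


# Toy corpus (hypothetical, for illustration only).
corpus = [
    "john read a book",
    "mary read a different book",
    "she read a book by cher",
]
ctx, bi, vocab = train_bigram(corpus)

p_seen = bigram_prob("a", "read", ctx, bi, vocab)       # bigram seen 3 times
p_unseen = bigram_prob("cher", "read", ctx, bi, vocab)  # bigram never seen
print(p_seen, p_unseen)
```

Without smoothing, the unseen bigram would receive probability zero, and so would any sentence containing it; the added pseudo-count `delta` keeps every conditional distribution strictly positive while still summing to one over the vocabulary.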

Reference: S. F. Chen and J. T. Goodman. 1998. An Empirical Study of Smoothing Techniques for Language Modeling. Technical Report TR-10-98, Computer Science Group, Harvard University.


• Example

• Smoothing


• Evaluation

• Train the n-Gram model:

• Calculate:

• Cross-entropy:

• Perplexity:
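The equations for these steps were lost in extraction. Under the standard definitions, the cross-entropy of a model on test data with $W$ words is $H = -\frac{1}{W} \log_2 p(T)$ and the perplexity is $2^{H}$. The sketch below computes both from a list of per-word model probabilities; the function names and the example probabilities are hypothetical.

```python
import math


def cross_entropy(word_probs):
    """Average negative log2 probability per word: H = -(1/W) * sum(log2 p_i)."""
    return -sum(math.log2(p) for p in word_probs) / len(word_probs)


def perplexity(word_probs):
    """Perplexity is 2 raised to the cross-entropy (in bits per word)."""
    return 2.0 ** cross_entropy(word_probs)


# Hypothetical per-word probabilities assigned by some trained model
# to a four-word test set.
probs = [0.25, 0.25, 0.25, 0.25]
print(cross_entropy(probs))  # 2.0 bits per word
print(perplexity(probs))     # 4.0
```

A perplexity of 4 here means the model is, on average, as uncertain as if it were choosing uniformly among four words at each position; lower values indicate a better fit to the test data.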


Dirichlet Process and Pitman-Yor Process

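• Dirichlet Process

Number of unique words grows as $O(\theta \log n)$ for $n$ words drawn, with concentration parameter $\theta$

• Pitman-Yor Process

Number of unique words grows as $O(\theta n^d)$, a power law in $n$ governed by the discount parameter $d$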

• When d=0, Pitman-Yor Process reduces to DP

• Both can be understood through the Chinese Restaurant process

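Seating probabilities in the Chinese restaurant process for $G \sim \mathrm{DP}(\theta, G_0)$ and for the Pitman-Yor process with discount $d$, where $c_k$ is the number of customers at table $k$ and $t$ is the number of occupied tables:

• DP, sitting at table $k$: $c_k / (\theta + \sum_{k'=1}^{t} c_{k'})$

• DP, sitting at a new table: $\theta / (\theta + \sum_{k'=1}^{t} c_{k'})$

• Pitman-Yor, sitting at table $k$: $(c_k - d) / (\theta + \sum_{k'=1}^{t} c_{k'})$

• Pitman-Yor, sitting at a new table: $(\theta + d t) / (\theta + \sum_{k'=1}^{t} c_{k'})$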
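The seating rule on this slide can be written as a short function. This is a minimal sketch, assuming the usual parameterization with concentration `theta` and discount `d`; the function name `seating_probabilities` and the example table counts are hypothetical. Setting `d=0` recovers the Dirichlet process case.

```python
def seating_probabilities(counts, theta, d=0.0):
    """Chinese restaurant process seating probabilities.

    counts: customers at each occupied table (c_1, ..., c_t)
    theta:  concentration parameter
    d:      discount parameter (d = 0 gives the Dirichlet process)

    Returns (probabilities for each existing table, probability of a new table).
    """
    t = len(counts)
    denom = theta + sum(counts)
    existing = [(c - d) / denom for c in counts]
    new_table = (theta + d * t) / denom
    return existing, new_table


# Two occupied tables with 3 and 1 customers (hypothetical state).
dp_existing, dp_new = seating_probabilities([3, 1], theta=1.0, d=0.0)
py_existing, py_new = seating_probabilities([3, 1], theta=1.0, d=0.5)
print(dp_existing, dp_new)
print(py_existing, py_new)
```

Comparing the two calls shows the effect of the discount: each occupied table gives up mass `d`, and the total `d * t` is transferred to the new-table option, so the chance of opening a new table grows with the number of tables already in use.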


Power-law properties of the Pitman-Yor Process

[Figure, two panels: number of unique words versus number of words, and proportion of words appearing once versus number of words, each with curves for d = 0, d = 0.5, and d = 0.9.]
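The qualitative behaviour in the figure can be reproduced by simulating the Chinese restaurant process directly. The sketch below assumes a base distribution without repeated atoms, so each table corresponds to one unique word; the function name `simulate_crp`, the seed, and the parameter values are hypothetical choices for illustration.

```python
import random


def simulate_crp(n_customers, theta, d, seed=0):
    """Simulate Pitman-Yor CRP seating; return table counts and table growth."""
    rng = random.Random(seed)
    counts = []         # customers per table
    unique_growth = []  # number of tables after each customer
    for n in range(n_customers):
        # Probability of opening a new table with n customers already seated.
        p_new = (theta + d * len(counts)) / (theta + n)
        if rng.random() < p_new:
            counts.append(1)
        else:
            # Existing table k is chosen with weight c_k - d.
            weights = [c - d for c in counts]
            k = rng.choices(range(len(counts)), weights=weights)[0]
            counts[k] += 1
        unique_growth.append(len(counts))
    return counts, unique_growth


for d in (0.0, 0.5, 0.9):
    counts, growth = simulate_crp(2000, theta=1.0, d=d)
    singletons = sum(1 for c in counts if c == 1)
    print(d, growth[-1], singletons / len(counts))
```

With `d=0` the number of tables grows slowly, roughly logarithmically, while larger discounts produce many more tables and a larger share of tables holding a single customer, which is the power-law behaviour the slide refers to.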


Hierarchical Pitman-Yor Process Language Models


Doubly Hierarchical Pitman-Yor Process Language Model


Inference

• Dirichlet Process, Chinese Restaurant Process

• Hierarchical Dirichlet Process, Chinese Restaurant Franchise

• Pitman-Yor Process, Chinese Restaurant Process

• Hierarchical Pitman-Yor Process, Chinese Restaurant Franchise

• Doubly Hierarchical Pitman-Yor Language Model, Graphical Pitman-Yor Process, Multi-floor Chinese Restaurant Process, Multi-floor Chinese Restaurant Franchise


Experimental results (HPYLM)


Experimental results (DHPYLM)


Summary

• DHPYLM achieves encouraging domain adaptation results.

• A graphical Pitman-Yor process is constructed, and a multi-floor Chinese restaurant representation is proposed for sampling.

• DHPYLM may be integrated into topic models to eliminate “bag-of-words” assumptions.