
Empirical Development of an Exponential Probabilistic Model

Using Textual Analysis to Build a Better Model

Jaime Teevan & David R. Karger
CSAIL (LCS+AI), MIT

Goal: Better Generative Model

• Generative vs. discriminative models
• Applies to many applications
  Information retrieval (IR)
  Relevance feedback
  Using unlabeled data
  Classification
• Assumptions explicit

Using a Model for IR

1. Define model
2. Learn parameters from query
3. Rank documents

Hyper-learn ← the step this talk adds

• A better model improves applications
  Trickles down to improve retrieval
  Classification, relevance feedback, …
• Corpus-specific models

Overview

Related work
Probabilistic models
  Example: Poisson Model
  Compare model to text
Hyper-learning the model
  Exponential framework
  Investigate retrieval performance

Conclusion and future work

Related Work

Using text to derive the retrieval algorithm [Jones, 1972], [Greiff, 1998]

Using text to model text [Church & Gale, 1995], [Katz, 1996]

Learning model parameters [Zhai & Lafferty, 2002]

Hyper-learn the model from text!

Probabilistic Models

Rank documents by RV = Pr(rel|d)

Naïve Bayesian models:

RV = Pr(rel|d) ∝ Pr(d|rel) = Π over features t of Pr(dt|rel)

Open assumptions define the model:
  Feature definition (e.g., words)
  Feature distribution family (e.g., # occurrences in doc)

Using a Naïve Bayesian Model

1. Define model
2. Learn parameters from query
3. Rank documents

Step 1: Define the model. The Poisson Model:

Pr(dt|rel) = θ^dt · e^(−θ) / dt!

θ: specifies the term's distribution

Example Poisson Distribution

[Figure: Poisson distribution with θ = 0.0006. X-axis: term occurs exactly dt times (0–5); y-axis: Pr(dt|rel), log scale from 1E-19 to 0.1. The marked point has Pr(dt|rel) ≈ 1E-15.]
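Since the deck walks through the Poisson Model concretely, here is a minimal sketch of it in Python (our choice of language; poisson_pr is a hypothetical helper name, and θ = 0.0006 is the slide's example parameter):

```python
import math

def poisson_pr(d_t, theta):
    """Pr(dt|rel) under the Poisson Model: theta^dt * e^(-theta) / dt!"""
    return theta ** d_t * math.exp(-theta) / math.factorial(d_t)

# The slide's example parameter:
theta = 0.0006
for d_t in range(6):
    print(d_t, poisson_pr(d_t, theta))   # probabilities fall off very fast
```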

Using a Naïve Bayesian Model

1. Define model
2. Learn parameters from query
3. Rank documents

Step 2: Learn a θ for each term
  Maximum likelihood θ: the term's average number of occurrences
  Incorporate prior expectations
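A sketch of step 2. Maximum-likelihood θ is the average count, as the slide says; the pseudo-count prior below is our illustrative assumption for "incorporate prior expectations":

```python
def learn_theta(term_counts, prior_mean=0.01, prior_weight=1.0):
    """Maximum-likelihood theta for the Poisson Model is the term's
    average number of occurrences over the labeled documents; the
    pseudo-count prior (our illustrative assumption) pulls the estimate
    toward prior_mean when there is little data."""
    n = len(term_counts)
    return (sum(term_counts) + prior_weight * prior_mean) / (n + prior_weight)

# e.g., a term seen 0, 1, and 2 times in three relevant documents:
# learn_theta([0, 1, 2]) -> roughly 0.75
```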

Using a Naïve Bayesian Model

1. Define model
2. Learn parameters from query
3. Rank documents

Step 3: Rank documents

For each document, find RV = Π over words t of Pr(dt|rel)
Sort documents by RV
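A sketch of step 3. Multiplying many tiny probabilities underflows in practice, so this sketch sums log-probabilities instead; rank_documents is hypothetical and reuses poisson_pr from the sketch above:

```python
import math

def rank_documents(docs, thetas):
    """docs: list of {term: count} maps; thetas: {term: theta} learned
    in step 2. Ranks by log RV = sum over modeled terms t of
    log Pr(dt|rel), with dt = 0 for terms absent from the document."""
    def log_rv(doc):
        return sum(math.log(poisson_pr(doc.get(t, 0), theta))
                   for t, theta in thetas.items())
    return sorted(docs, key=log_rv, reverse=True)
```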

Using a Naïve Bayesian Model

1. Define model
2. Learn parameters from query
3. Rank documents

Which step goes wrong?

How Good is the Model?

The problem is in step 1: does the defined model match real text?

Pr(dt|rel) = θ^dt · e^(−θ) / dt!

[Figure: empirical data vs. the Poisson Model (θ = 0.0006), with Pr(dt|rel) on a log scale from 1E-19 to 0.1 against the number of times a term occurs. For a term occurring 15 times, the Poisson prediction falls far below the data. Misfit!]

Hyper-learning a Better Fit Through Textual Analysis

Using an Exponential Framework

Hyper-Learning Framework

Need a framework for hyper-learning

Goal: same benefits as the Poisson Model
  One parameter
  Easy to work with (e.g., prior)

Candidate families: Bernoulli, Poisson, Normal, Mixtures

Chosen framework: one-parameter exponential families (which include Bernoulli, Poisson, and Normal)

Hyper-Learning Framework

Well understood, learning easy [Bernardo & Smith, 1994], [Gous, 1998]

Pr( dt | rel ) = f(dt) g(θ) e

Functions f(dt) and h(dt) specify family E.g., Poisson: f(dt) = (dt!)-1, h(dt) = dt

Parameter θ term’s specific distribution

Exponential Framework

θ h(dt)
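The exponential form is easy to state in code. Note that for the Poisson instance, the θ in this formula must be read as the natural parameter (the log of the Poisson rate) for the algebra to work out, which makes the normalizer g(θ) = exp(−e^θ); that reading is our inference from the formula, not stated on the slides:

```python
import math

def expfam_pr(d_t, theta, f, h, g):
    """Pr(dt|rel) = f(dt) * g(theta) * e^(theta * h(dt))."""
    return f(d_t) * g(theta) * math.exp(theta * h(d_t))

# Poisson as an instance: f(dt) = 1/dt!, h(dt) = dt, and with theta the
# natural parameter log(rate), the normalizer is g(theta) = exp(-exp(theta)).
poisson = dict(f=lambda d: 1 / math.factorial(d),
               h=lambda d: d,
               g=lambda th: math.exp(-math.exp(th)))

rate = 0.0006
print(expfam_pr(2, math.log(rate), **poisson))  # equals rate^2 * e^(-rate) / 2!
```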

Using a Hyper-learned Model

1. Hyper-learn model
2. Learn parameters from query
3. Rank documents

Step 1: Hyper-learn the model

Want the "best" f(dt) and h(dt)
  Iterative hill climbing
  Local maximum
  Poisson starting point

Data: TREC query result sets
  Past queries to learn about future queries
  Hyper-learn and test with different sets
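A heavily simplified sketch of the hyper-learning step. The slides specify only "iterative hill climbing from a Poisson starting point"; everything else here is our illustrative assumption: h is represented by its values at counts 0..K, perturbed coordinate-wise, and candidates are scored by training log-likelihood with f folded into a uniform base measure:

```python
import math

def log_likelihood(h_vals, data):
    """Score a candidate h (its values at counts 0..K) by training
    log-likelihood. Illustrative simplifications: f is folded into a
    uniform base measure, the count support is truncated at K, and each
    term's theta is fit by a coarse grid search."""
    total = 0.0
    for counts in data:                       # counts: one term's dt values
        best = -math.inf
        for theta in (-9.0, -7.0, -5.0, -3.0, -1.0):
            log_z = math.log(sum(math.exp(theta * h) for h in h_vals))
            ll = sum(theta * h_vals[min(c, len(h_vals) - 1)] - log_z
                     for c in counts)
            best = max(best, ll)
        total += best
    return total

def hyper_learn(data, K=5, steps=200, eps=0.05):
    """Coordinate-wise hill climbing over h, from the Poisson start h(dt) = dt."""
    h = [float(d) for d in range(K + 1)]      # Poisson starting point
    score = log_likelihood(h, data)
    for _ in range(steps):
        improved = False
        for i in range(len(h)):
            for delta in (eps, -eps):
                cand = h[:i] + [h[i] + delta] + h[i + 1:]
                s = log_likelihood(cand, data)
                if s > score:
                    h, score, improved = cand, s, True
        if not improved:                      # stop at a local maximum
            break
    return h
```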

Recall the Poisson Distribution

[Figure: Data vs. Poisson vs. New Model; x-axis: term occurs exactly dt times (0–5); y-axis: Pr(dt|rel), log scale from 1E-19 to 0.1; marked point at a term occurring 15 times.]

Poisson Starting Point vs. Hyper-learned Model - h(dt)

Pr(dt|rel) = f(dt) g(θ) e^(θ h(dt))

[Figure: h(dt) plotted for dt = 0–5 (y-axis −2 to 6): the Poisson starting point h(dt) = dt and the hyper-learned h.]


Hyper-learned Distribution

[Figures: Data vs. Poisson vs. New Model, with Pr(dt|rel) on a log scale from 1E-19 to 0.1 against the number of times a term occurs; shown for terms occurring 15, 5, 30, and 300 times.]

Performing Retrieval

1. Hyper-learn model
2. Learn parameters from query
3. Rank documents

Step 2: Learn a θ for each term from the labeled docs

Pr(dt|rel) = f(dt) g(θ) e^(θ h(dt))

Learning θ

Sufficient statistics summarize all the observed data:
  τ1: # of observations
  τ2: Σ over observations d of h(dt)

Incorporating a prior is easy
Map τ1 and τ2 to θ

(Experiments use 20 labeled documents.)
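A sketch of learning θ from the sufficient statistics. The slides name τ1 and τ2 and say a prior is easy to incorporate; the pseudo-observation prior and the grid-search mapping from (τ1, τ2) to θ below are our illustrative assumptions:

```python
import math

def learn_theta_from_stats(counts, h, prior_tau1=1.0, prior_tau2=0.0,
                           support=range(6)):
    """tau1 = # of observations; tau2 = sum of h(dt) over observations.
    The conjugate prior just adds pseudo-observations to (tau1, tau2).
    theta is chosen to maximize the exponential-family log-likelihood
    tau2*theta - tau1*log Z(theta); f(dt) drops out as a constant in theta.
    The coarse grid search and truncated support are illustrative."""
    tau1 = len(counts) + prior_tau1
    tau2 = sum(h(c) for c in counts) + prior_tau2
    def objective(theta):
        log_z = math.log(sum(math.exp(theta * h(d)) for d in support))
        return tau2 * theta - tau1 * log_z
    return max((t / 10.0 for t in range(-100, 1)), key=objective)
```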

Performing Retrieval

1. Hyper-learn model
2. Learn parameters from query
3. Rank documents

Results: Labeled Documents

[Figure: precision-recall curves (precision axis 0–0.8) comparing the Poisson Model and the New Model on the labeled-documents task.]

Performing Retrieval

1. Hyper-learn model
2. Learn parameters from query
3. Rank documents

Retrieval: Query

Short query: the query is treated as a single labeled document
This gives a vector space-like equation:

RV = Σ over terms t in the doc of a(t, d) + Σ over terms q in the query of b(q, d)

Problem: the document portion dominates the short query
Solution: use only the query portion
Another solution: normalize
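A sketch of the query-time scoring and the "use only the query portion" fix. The weight functions a and b are whatever the hyper-learned model induces; passing them in as callables is our assumption for illustration:

```python
def retrieval_value(doc_terms, query_terms, a, b, query_only=True):
    """RV = sum over t in doc of a(t, d) + sum over q in query of b(q, d).
    A long document contributes many a-terms, so the first sum dominates
    the short query; per the slides, one fix is to keep only the query
    portion (another would be to normalize the document portion)."""
    doc_part = 0.0 if query_only else sum(a(t, doc_terms) for t in doc_terms)
    query_part = sum(b(q, doc_terms) for q in query_terms)
    return doc_part + query_part
```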

Retrieval: Query

[Figure: precision-recall curves (precision axis 0–0.6) comparing the Poisson Model, the New Model, and TF.IDF on the query task.]

Conclusion

Probabilistic models
  Example: Poisson Model - easy to work with, but a bad text model (real text is heavy tailed!)

Hyper-learning the model
  Exponential framework
  Learned a better model
  Investigated retrieval performance - better …

Future Work

Use the model better
Use it for other applications
  Other IR applications
  Classification
Correct for document length
Hyper-learn on different corpora
  Test if the learned model generalizes
  Different for genre? Language? People?
Hyper-learn the model better

Questions?

Contact us with questions:

Jaime Teevan - teevan@ai.mit.edu
David Karger - karger@theory.lcs.mit.edu