
+Detecting Genre Shift

Mark Dredze, Tim Oates, Christine Piatko

Paper to appear at EMNLP-10

+Natural Language Processing and Machine Learning

Extracting findings from scientific papers

•Genetic epidemiology (development domain)

•PubMed search produces thousands of papers

•Manually reviewed to extract findings

•Findings determine relevant papers/studies

•Automate this process with ML/NLP methods

•Create searchable database of findings

•Allow machine inference over findings

•Suggest new scientific hypotheses

+Genre Shift in Statistical NLP

… told that John Paul Stevens is retiring this summer …

Named Entity Recognition

… President Barack Obama is urging members to …

+Supervised Machine Learning for Named Entity Recognition

Today the Atlantic Ocean is in an uproar and North Carolina remains in a state of anxiety.

Windowed Text                       Label
Today the Atlantic Ocean is         B
the Atlantic Ocean is in            I
Atlantic Ocean is in an             O
Ocean is in an uproar               O
is in an uproar and                 O
in an uproar and North              O
an uproar and North Carolina        O
uproar and North Carolina remains   B
and North Carolina remains in       I
North Carolina remains in a         O
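
As a concrete illustration of the windowing above, here is a minimal Python sketch that slides a five-token window over a BIO-tagged sentence and pairs each window with the tag of its centre token. The centre-token convention and the window size of five are assumptions that match the example rows, not details stated on the slide.

```python
# Sketch: turn a BIO-tagged sentence into windowed training examples.
# Assumption: each window of five tokens takes the BIO label of its
# centre token, which is consistent with the rows in the table above.

def make_windows(tokens, bio_labels, size=5):
    """Return (window, label) pairs; the label belongs to the centre token."""
    half = size // 2
    examples = []
    for i in range(half, len(tokens) - half):
        window = tokens[i - half:i + half + 1]
        examples.append((window, bio_labels[i]))
    return examples

if __name__ == "__main__":
    sentence = ("Today the Atlantic Ocean is in an uproar and "
                "North Carolina remains in a state of anxiety .").split()
    labels = ["O", "O", "B", "I", "O", "O", "O", "O", "O",
              "B", "I", "O", "O", "O", "O", "O", "O", "O"]
    for window, label in make_windows(sentence, labels)[:3]:
        print(" ".join(window), "->", label)
    # Today the Atlantic Ocean is -> B
    # the Atlantic Ocean is in -> I
    # Atlantic Ocean is in an -> O
```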

+Supervised Machine Learning for Named Entity Recognition

Windowed Text                 Label
Today the Atlantic Ocean is   B
the Atlantic Ocean is in      I
Atlantic Ocean is in an       O

Feature Vector                                     Label
[today, the, atlantic, ocean, is, U, L, U, U, L]   B
[the, atlantic, ocean, is, in, L, U, U, L, L]      I
[atlantic, ocean, is, in, an, U, U, L, L, L]       O
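
A small sketch of the feature extraction implied by the vectors above: the lowercased window words plus one case flag per token. Reading U/L as "starts upper-case" vs. "lower-case" is an assumption that reproduces the example vectors.

```python
# Sketch: map a five-token window to the representation shown above:
# lowercased words followed by one case flag per token ("U" if the token
# starts with an upper-case letter, "L" otherwise). The U/L reading is an
# assumption that matches the example vectors.

def window_features(window):
    words = [w.lower() for w in window]
    case_flags = ["U" if w[0].isupper() else "L" for w in window]
    return words + case_flags

if __name__ == "__main__":
    print(window_features(["Today", "the", "Atlantic", "Ocean", "is"]))
    # ['today', 'the', 'atlantic', 'ocean', 'is', 'U', 'L', 'U', 'U', 'L']
```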

+Genre Shift in Statistical NLP

… told that John Paul Stevens is retiring this summer …

Named Entity Recognition

… PRESIDENT BARACK OBAMA IS URGING MEMBERS TO…

???

+This is a Pervasive Problem

Extracting regulatory pathways from online bioinformatics journals using a parser trained on the WSJ

Finding faces in images of disaster victims using a model trained on “mug shot” images

Identifying RNA sequences that regulate gene expression in a lab in Baltimore using a model trained on data gathered in a lab in Germany

When things change in a way that’s harmful, we’d like to know!

+Data Streams Change Over Time

Natural drift

Users unaware of system limitations

Sentiment classification from movie reviews

+Detecting Genre Shift

Two problems:

1) Detect changes in a stream of numbers (A-distance)

2) Convert the document stream into a stream of informative numbers (margin)

Genre shift hurts system performance (accuracy)

+Detecting Genre Shift

Measure accuracy directly

•Requires labeled examples!

Look for changes in feature distributions

•Words become more/less common

•New words appear

Genre shift hurts system performance (accuracy)

+Measuring Changes in Streams: The A-Distance

A nonparametric, distribution-independent measure of changes in univariate, real-valued data streams (Kifer, Ben-David, and Gehrke, 2004)

[Figure: two distributions P and P’; a change is flagged when the probability of some interval A differs between them by more than ε]
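
A rough Python sketch of this idea: tile the real line, estimate the probability of each tile in a reference window and in a sliding recent window, and flag a change when some tile's probability differs by more than ε. The equal-width tiling and the window handling are illustrative choices, not the exact construction of Kifer, Ben-David, and Gehrke (2004).

```python
# Sketch of an A-distance-style change test on a univariate stream:
# compare per-tile empirical probabilities between a reference window and
# the most recent window, and flag a change when any tile differs by more
# than epsilon. Tiling and window handling are illustrative assumptions.

def tile_probs(window, edges):
    """Empirical probability of each tile [edges[i], edges[i+1])."""
    n = len(window)
    return [sum(1 for x in window if lo <= x < hi) / n
            for lo, hi in zip(edges, edges[1:])]

def a_distance(window1, window2, edges):
    p, q = tile_probs(window1, edges), tile_probs(window2, edges)
    return max(abs(a - b) for a, b in zip(p, q))

def detect_change(stream, n=200, epsilon=0.2, edges=None):
    """Return the index at which a change is flagged, or None."""
    if edges is None:
        edges = [i / 10 for i in range(-50, 51)]  # tiles of width 0.1 on [-5, 5)
    reference = stream[:n]
    for t in range(2 * n, len(stream) + 1):
        recent = stream[t - n:t]
        if a_distance(reference, recent, edges) > epsilon:
            return t
    return None

if __name__ == "__main__":
    import random
    random.seed(0)
    stream = ([random.gauss(0, 1) for _ in range(500)] +
              [random.gauss(2, 1) for _ in range(500)])
    print(detect_change(stream))  # flags a change some time after index 500
```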

+Changes in Document Streams

… President Barack Obama is urging members to …

Feature counts X and learned weights W (example):

Feature   X   W
Obama     4   1.6
embassy   1   0.1

WX = 1.6 * 4 + 0.1 * 1 + … = 3.7

• WX = margin
• sign of WX is class label (+/-)
• magnitude of WX is “certainty” in label
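
A minimal sketch of the margin computation in the example above, with sparse dictionaries standing in for X and W. Only the two features shown are included; the "…" terms from the slide are omitted, so the number printed here covers just these two features.

```python
# Sketch: the margin is the dot product W.X of the weight vector with the
# document's feature counts; its sign is the predicted class and its
# magnitude a rough certainty. Only the two example features are included,
# so the total differs from the slide's 3.7, which sums over all features.

def margin(weights, features):
    """Dot product of a sparse weight vector and sparse feature counts."""
    return sum(weights.get(f, 0.0) * count for f, count in features.items())

if __name__ == "__main__":
    W = {"obama": 1.6, "embassy": 0.1}
    X = {"obama": 4, "embassy": 1}
    m = margin(W, X)                      # 1.6 * 4 + 0.1 * 1 = 6.5
    label = "+" if m >= 0 else "-"
    print(m, label)
```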

+Why Margins?

We have an easy way of producing them from unlabeled examples!

We want to track feature changes:

• Margins are linear combinations of feature values

• Removing important features yields smaller margins

• Only track features that matter: features with zero (small) weight don’t affect the margin (much)

Spoiler alert! Tracking margins works really well for unsupervised detection of genre shifts.

+Accuracy vs. Margins

[Figure: accuracy vs. average margin for the DVD to Electronics shift; curves show the average within each block and the average over the last 100 instances]

+Confidence Weighted Margins

Margins can be viewed as measure of confidence

We detect when confidence in classifications drops

Confidence Weighted (CW) learning refines this idea:

• Gaussian distribution over weight vectors

• Mean of weight vector: μ ∈ R^N

• Diagonal covariance matrix: σ ∈ R^(N×N)

• Low variance → high confidence

Normalized margin: μ·x / (xᵀσx)^(1/2)

Called VARIANCE in the slides that follow

[Figure: example weight vector with μ = (1.6, 0.1) and per-feature variances σ = 0.02 and σ = 1.74]
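
A small sketch of the normalized margin used as the VARIANCE signal, assuming a diagonal covariance so xᵀσx reduces to a per-feature sum. The numbers echo the example above; the default variance of 1.0 for unseen features is an assumption.

```python
# Sketch of the Confidence Weighted normalized margin mu.x / sqrt(x^T sigma x)
# with a diagonal covariance. Feature values echo the example above; the
# prior variance of 1.0 for unseen features is an assumption.

import math

def cw_normalized_margin(mu, sigma_diag, x):
    """mu, sigma_diag, x are dicts keyed by feature name."""
    mean_margin = sum(mu.get(f, 0.0) * v for f, v in x.items())
    variance = sum(sigma_diag.get(f, 1.0) * v * v for f, v in x.items())
    return mean_margin / math.sqrt(variance)

if __name__ == "__main__":
    mu = {"obama": 1.6, "embassy": 0.1}
    sigma = {"obama": 0.02, "embassy": 1.74}
    x = {"obama": 4, "embassy": 1}
    print(cw_normalized_margin(mu, sigma, x))
```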

+Experiments

Datasets

• Sentiment classification between domains (Blitzer et al., 2007): DVDs, electronics, books, kitchen appliances

• Spam classification between users (Jiang and Zhai, 2007)

• Named entity classification between genres (ACE 2005): news articles, broadcast news, telephone, blogs, etc.

Algorithms

• Baselines: SVM, MIRA, CW

• Our method: VARIANCE

+Experiments

Simulated domain shifts between each pair of genres

• 38 pairs, 10 trials each with different random instance orderings

• 500 source examples, 1500 target examples

False change

• 11 datasets with no shift, 10 trials with different random instance orderings

If no shift is found, the detection point is recorded as the end of the target examples when computing averages.
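
A short sketch of how such a simulated-shift stream can be assembled: a random ordering of source-domain examples followed by a random ordering of target-domain examples, with the true shift point recorded for evaluation. The function and argument names are illustrative.

```python
# Sketch of the simulated-shift protocol above: 500 randomly ordered source
# examples followed by 1500 randomly ordered target examples, so the true
# shift point is known. Assumes each domain has at least that many examples.

import random

def simulated_shift_stream(source_examples, target_examples,
                           n_source=500, n_target=1500, seed=0):
    rng = random.Random(seed)
    source = rng.sample(source_examples, n_source)   # random instance ordering
    target = rng.sample(target_examples, n_target)
    shift_point = len(source)
    return source + target, shift_point
```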

+Comparing Algorithms

[Figure: instances from the point of shift until detection, annotated “Good for our approach!” and “Good for baseline”]

+SVM vs. VARIANCE

+Summary of Results Thus Far

VARIANCE detected shifts faster than …

• SVM: 34 times out of 38

• MIRA: 26 times out of 38

• CW: 27 times out of 38

+Gradual Shifts

+What if you have labels?

STEPD: a Statistical Test of Equal Proportions to Detect concept drift (Nishida and Yamauchi, 2007)

Monitors accuracy of classifier from stream of labeled examples

Parameters: window size, W, and threshold, α
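
For comparison, here is a sketch in the spirit of STEPD: test whether the accuracy on the most recent W labeled examples differs from the accuracy on the earlier examples using a two-proportion test, and signal drift when the p-value drops below α. The continuity-corrected z-test used here is a standard choice and may differ in detail from the paper's exact statistic.

```python
# Sketch in the spirit of STEPD (Nishida and Yamauchi, 2007): compare recent
# accuracy (last W labeled examples) with earlier accuracy using a
# two-proportion z-test with continuity correction; signal drift when the
# p-value falls below alpha. The exact statistic is an assumption.

import math

def normal_sf(z):
    """Upper-tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def stepd_drift(correct_flags, W=30, alpha=0.003):
    """correct_flags: 0/1 per prediction, 1 if the classifier was right."""
    n = len(correct_flags)
    if n < 2 * W:
        return False
    older, recent = correct_flags[:-W], correct_flags[-W:]
    r_o, n_o = sum(older), len(older)
    r_r, n_r = sum(recent), len(recent)
    p_hat = (r_o + r_r) / (n_o + n_r)
    num = abs(r_o / n_o - r_r / n_r) - 0.5 * (1 / n_o + 1 / n_r)
    den = math.sqrt(p_hat * (1 - p_hat) * (1 / n_o + 1 / n_r))
    if den == 0:
        return False
    p_value = 2 * normal_sf(num / den)   # two-sided test
    return p_value < alpha
```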

+Comparison to STEPD

+What about false positives?

+The A-Distance: Choosing Parameters

[Figure: a tile A over the distribution P; the parameters are the window size n and the detection threshold ε]

• The A-distance paper gives bounds on FPs and FNs

• Bounds depend on n and ε

• Bounds do not depend on the tiling!

• So loose as to be meaningless

• No guidance on how to choose the tiling

• What if tiles lie outside the support of the data?

+Better Bounds

P_A = true probability of a point falling in tile A

h = number of points that actually fell in A

p_A = h / n = ML estimate of P_A

Define P'_A, h', and p'_A for the second window

Suppose P_A = P'_A; then any change detected is a false positive

What is the probability that |p_A − p'_A| > ε/2?

+Posterior Over PA

B(α, β) is the Beta distribution over α + β Bernoulli trials

α trials have one outcome (the point lands in tile A)

β trials have the other (the point lands in some other tile)
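
One way to get at the question on the previous slide without the analytic derivation: draw P_A from its Beta posterior given the observed counts, simulate two windows of n points that share that probability, and count how often |p_A − p'_A| exceeds ε/2. The uniform prior (giving Beta(h+1, n−h+1)) and the Monte Carlo approach are assumptions for illustration; the talk works this out analytically.

```python
# Monte Carlo sketch of the false-positive probability discussed above:
# assuming both windows share the same true tile probability P_A (drawn from
# a Beta posterior under an assumed uniform prior), how often do the two
# empirical estimates differ by more than eps/2?

import random

def false_positive_prob(h, n, eps, trials=10000, seed=0):
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        p_true = rng.betavariate(h + 1, n - h + 1)   # posterior over P_A
        p1 = sum(rng.random() < p_true for _ in range(n)) / n
        p2 = sum(rng.random() < p_true for _ in range(n)) / n
        if abs(p1 - p2) > eps / 2:
            hits += 1
    return hits / trials

if __name__ == "__main__":
    print(false_positive_prob(h=20, n=200, eps=0.1))
```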

+False Positives: Two Cases

+Don’t worry, I’m not going to explain this (much)

+Probability of a FP (n = 200)

+Probability of FN

+Minimizing Expected Loss

+Moving Forward

[Diagram: a genre classifier routes incoming text to genre-specific models for Newswire, Transcribed Broadcast News, and Twitter]
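
A tiny sketch of the routing idea in the diagram, with the genre classifier and the per-genre models left as hypothetical placeholders.

```python
# Sketch of the architecture above: classify the genre of an incoming
# document, then hand it to a model trained for that genre. Both the genre
# classifier and the per-genre models are hypothetical placeholders.

def route(document, genre_classifier, models):
    """models maps a genre name (e.g. "newswire", "broadcast", "twitter")
    to a genre-specific tagger."""
    genre = genre_classifier(document)
    return models[genre](document)
```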

+Genre Shift “Fix”

… told that John Paul Stevens is retiring this summer …

Named Entity Recognition

… PRESIDENT BARACK OBAMA IS URGING MEMBERS TO…

… President Barack Obama is urging members to …
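
A minimal sketch of what such a “fix” could look like in code: restore case in the all-caps input before tagging. The dictionary-lookup approach and its contents are purely illustrative; the slide does not specify how the case is restored.

```python
# Sketch of a case-restoration "fix": map all-caps tokens back to known
# cased forms before running the news-trained NER model. The lookup table
# here is a hypothetical illustration, not the method from the talk.

def truecase(text, known_forms):
    restored = [known_forms.get(tok.lower(), tok.lower()) for tok in text.split()]
    return " ".join(restored)

if __name__ == "__main__":
    known = {"president": "President", "barack": "Barack", "obama": "Obama"}
    print(truecase("PRESIDENT BARACK OBAMA IS URGING MEMBERS TO", known))
    # President Barack Obama is urging members to
```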

+Conclusion

Changes in margins convey useful information about changes in classification accuracy. No need for labeled examples!

The A-distance applied to margin streams finds genre shifts with few false positives/negatives

Confidence weighted margins normalized by variance detect shifts faster than SVM, MIRA, or (non-normalized) CW margins

Our approach even works with gradual shifts and compares favorably to shift detectors that use labeled examples

+Thank you!