1 Identifying Subjective Language Janyce Wiebe University of Pittsburgh.

1

Identifying Subjective Language

Janyce Wiebe, University of Pittsburgh

2

Overview

General area: acquire knowledge of evaluative and speculative language and use it in NLP applications

Primarily corpus-based work

Today: results of exploratory studies

3

Collaborators

Rebecca Bruce, Vasileios Hatzivassiloglou, Joseph Phillips, Matthew Bell, Melanie Martin, Theresa Wilson

4

Subjectivity Tagging

Recognizing opinions and evaluations (Subjective sentences) as opposed to material objectively presented as true (Objective sentences)

Banfield 1982, Fludernik 1993, Wiebe 1994, Stein & Wright 1995

5

Examples

At several different levels, it’s a fascinating tale. subjective

Bell Industries Inc. increased its quarterly to 10 cents from 7 cents a share. objective

6

Subjectivity

“Complained” “You Idiot!” “Terrible product”

“Speculated” “Maybe”

“Enthused” “Wonderful!” “Great product”

7

Examples

Strong addressee-oriented negative evaluation

Recognizing flames (Spertus 1997)

Personal e-mail filters (Kaufer 2000)

I had in mind your facts, buddy, not hers.

Nice touch. “Alleges” whenever facts posted are not in your persona of what is “real.”

8

Examples

Opinionated, editorial language

IR, text categorization (Kessler et al. 1997)

Do the writers purport to be objective?

Look, this is a man who has great numbers.

We stand in awe of the Woodstock generation’s ability to be unceasingly fascinated by the subject of itself.

9

Examples

Belief and speech reports

Information extraction, summarization, intellectual attribution (Teufel & Moens 2000)

Northwest Airlines settled the remaining lawsuits, a federal judge said.

“The cost of health care is eroding our standard of living and sapping industrial strength”, complains Walter Maher.

10

Other Applications

Review mining (Terveen et al. 1997)

Clustering documents by ideology (Sack 1995)

Style in machine translation and generation (Hovy 1987)

11

Potential Subjective Elements

"The cost of health care is eroding standards of living and sapping industrial strength,” complains Walter Maher.

Sap: potential subjective element

Subjective element

12

Subjectivity

Multiple types, sources, and targets

We stand in awe of the Woodstock generation’s ability to be unceasingly fascinated by the subject of itself.

Somehow grown-ups believed that wisdom adhered to youth.

13

Outline

Data and annotation

Sentence-level classification

Individual words

Collocations

Combinations

14

Annotations

Three levels: expression level, sentence level, document level

Manually tagged + existing annotations

15

Expression Level Annotations

[Perhaps you’ll forgive me] for reposting his response

They promised [e+ 2 yet] more for [e+ 3 really good] [e? 1 stuff]

16

Expression Level Annotations

Difficult for manual and automatic tagging: detailed; no predetermined classification unit

To date: used for training and bootstrapping

Probably the most natural level

17

Document Level Annotations

Manual: flames in Newsgroups

Existing: opinion pieces in the WSJ: editorials, letters to the editor, arts & leisure reviews

* to ***** reviews

+ More directly related to applications, but …

18


Opinion pieces contain objective sentences, and non-opinion pieces contain subjective sentences

Editorials contain facts supporting the argument

News reports present reactions (van Dijk 1988) “Critics claim …” “Supporters argue …”

Reviews contain information about the product

19


In a WSJ data set:

opinion pieces: subj 74%, obj 26%

non-opinion pieces: subj 43%, obj 57%

20

Data in this Talk

Sentence level: 1000 WSJ sentences; 3 judges reached good agreement after rounds; used for training and evaluation

Expression level: 1000 WSJ sentences (2 judges); 462 newsgroup messages (2 judges) + 15413 words (1 judge); single round, results promising; used to generate features, and not for evaluation

21

Data in this Talk

Document level: Existing opinion-piece annotations used to generate features

Manually refined classifications used for evaluation: identified editorials not marked as such; only clear instances labeled; to date: 1 judge

Distinct from the other data

3 editions, each more than 150K words

22

Sentence Level Annotations

A sentence is labeled subjective if any significant expression of subjectivity appears

“The cost of health care is eroding our standard of living and sapping industrial strength,” complains Walter Maher.

“What an idiot,” the idiot presumably complained.

23

Sentence Classification

Binary features: pronoun, adjective, number, modal other than “will”, adverb other than “not”, new paragraph

Lexical feature: good for subj; good for obj; good for neither

Probabilistic classifier

10-fold cross-validation; 51% baseline

72% average accuracy across folds

82% average accuracy on sentences rated certain
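The slide does not say which probabilistic classifier was used. As a minimal sketch only, assuming a Bernoulli naive Bayes model over the binary features listed above (the feature names, the smoothing constant, and the function names are illustrative, not taken from the talk):

```python
from math import log

def train(examples, smoothing=1.0):
    """Fit a Bernoulli naive Bayes model (an assumed stand-in classifier).

    examples: list of (set_of_active_binary_features, label) pairs.
    Returns {label: (prior, {feature: P(feature is on | label)})}.
    """
    labels = {label for _, label in examples}
    features = set().union(*(active for active, _ in examples))
    model = {}
    for label in labels:
        rows = [active for active, lab in examples if lab == label]
        prior = len(rows) / len(examples)
        # Add-one style smoothing keeps every probability strictly inside (0, 1).
        p_on = {
            f: (sum(f in row for row in rows) + smoothing) / (len(rows) + 2 * smoothing)
            for f in features
        }
        model[label] = (prior, p_on)
    return model

def classify(model, active):
    """Return the label with the highest log posterior for the active feature set."""
    def score(label):
        prior, p_on = model[label]
        return log(prior) + sum(
            log(p if f in active else 1.0 - p) for f, p in p_on.items()
        )
    return max(model, key=score)
```

Under 10-fold cross-validation, a model like this would be trained on nine folds and scored on the held-out fold, with accuracy averaged over the ten runs.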

24

Identifying PSEs

There are few high precision, high frequency potential subjective elements

25

Identifying Individual PSEs

Classifications correlated with adjectives

Good subsets:

Dynamic adjectives (Quirk et al. 1985)

Positive, negative polarity; gradability, automatically identified in corpora (Hatzivassiloglou & McKeown 1997)

Results from distributional similarity

26

Distributional Similarity

Word similarity based on distributional pattern of words

Much work in NLP (see Lee 99, Lee and Pereira 99)

Purposes: improve estimates of unseen events; thesaurus and dictionary construction from corpora

27

Lin’s Distributional Similarity

Lin 1998

Example sentence: I have a brown dog (its words are linked by dependency relations R1 to R4)

Word    R    W
I       R1   have
have    R2   dog
brown   R3   dog
. . .

28

Lin’s Distributional Similarity

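Each word has a set of (R, W) pairs statistically correlated with it: RWw1 for Word1 and RWw2 for Word2. RWint is the set of pairs correlated with both words.

$$\mathrm{sim}(Word1, Word2) = \frac{\sum_{RW_{int}} \left[ I(Word1, RW_{int}) + I(Word2, RW_{int}) \right]}{\sum_{RW_{w1}} I(Word1, RW_{w1}) + \sum_{RW_{w2}} I(Word2, RW_{w2})}$$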
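The similarity ratio above can be sketched in a few lines. This is an illustration under stated assumptions, not the implementation used in the talk: I is taken to be a pointwise mutual information score estimated from counts of (word, relation, word) triples, only positive scores count as "statistically correlated", and the function names are hypothetical.

```python
from collections import Counter
from math import log

def information(triples):
    """Score each (word, R, W) triple and keep the positively correlated pairs.

    triples: list of (word, relation, other_word) dependency triples.
    Returns {word: {(relation, other_word): score}}.
    """
    c_wrw = Counter(triples)
    c_r = Counter(r for _, r, _ in triples)
    c_wr = Counter((w, r) for w, r, _ in triples)
    c_rw = Counter((r, w2) for _, r, w2 in triples)
    feats = {}
    for (w, r, w2), n in c_wrw.items():
        # Assumed definition of I: log of observed count over expected count.
        score = log(n * c_r[r] / (c_wr[(w, r)] * c_rw[(r, w2)]))
        if score > 0:
            feats.setdefault(w, {})[(r, w2)] = score
    return feats

def similarity(feats, word1, word2):
    """Information in the shared (R, W) pairs over the information in all pairs."""
    f1, f2 = feats.get(word1, {}), feats.get(word2, {})
    shared = f1.keys() & f2.keys()
    numerator = sum(f1[k] + f2[k] for k in shared)
    denominator = sum(f1.values()) + sum(f2.values())
    return numerator / denominator if denominator else 0.0
```

Two words with identical (R, W) pairs score 1, and two words with no pair in common score 0.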

29

Bizarre

strange, similar, scary, unusual, fascinating, interesting, curious, tragic, different, contradictory, peculiar, silly, sad, absurd, poignant, crazy, funny, comic, compelling, odd

30

Bizarre


31

Bizarre


32

Filtering

Seed words -> Words + clusters -> Filtered set

A word + cluster is removed if its precision on the training set < threshold
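A minimal sketch of this filtering step, assuming that the precision of a seed word plus its cluster is the fraction of training sentences containing any of those words that are labeled subjective (the slide does not spell out the unit of counting, and all names here are hypothetical):

```python
def filter_clusters(seed_clusters, training_sentences, threshold):
    """Keep each seed word + cluster whose training precision reaches the threshold.

    seed_clusters: {seed_word: [distributionally similar words]}
    training_sentences: list of (tokens, is_subjective) pairs
    Returns the set of words that survive filtering.
    """
    kept = set()
    for seed, cluster in seed_clusters.items():
        words = {seed, *cluster}
        # Labels of the training sentences in which any word of the group occurs.
        labels = [
            is_subjective
            for tokens, is_subjective in training_sentences
            if words & set(tokens)
        ]
        if labels and sum(labels) / len(labels) >= threshold:
            kept |= words
    return kept
```

Groups that never occur in the training data are dropped here, which is one possible design choice; the talk does not say how such cases were handled.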

33

Parameters

Seed words

Words+Clusters

Cluster size

Threshold

34

Seeds from Annotations

1000 WSJ sentences with sentence level and expression level annotations

They promised [e+ 2 yet] more for [e+ 3 really good] [e? 1 stuff].

"It's [e? 3 really] [e- 3 bizarre]," says Albert Lerman, creative director at the Wells agency.

35

Experiments

1/10 used for training, 9/10 for testing

Parameters:

Cluster size fixed at 20

Filtering threshold: precision of the baseline adjective feature on the training data

+7.5% average, 10-fold cross-validation

[More improvements with other adj features]

36

Opinion Pieces

3 WSJ data sets, over 150K words each

Skewed distribution: 13-17% words in opinions

Baseline for comparison: # words in opinions / total # words

For measuring precision: Prec(S) = # instances of S in opinions / total # instances of S
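Both ratios on this slide are simple counts. The sketch below assumes each document is a token list paired with an opinion-piece flag and that a feature S is a set of words; these representations and the function names are illustrative only.

```python
def opinion_baseline(docs):
    """Baseline: # words in opinion pieces / total # words."""
    total = sum(len(tokens) for tokens, _ in docs)
    in_opinions = sum(len(tokens) for tokens, is_opinion in docs if is_opinion)
    return in_opinions / total

def precision(feature_words, docs):
    """Prec(S): # instances of S in opinion pieces / total # instances of S."""
    total = in_opinions = 0
    for tokens, is_opinion in docs:
        n = sum(1 for token in tokens if token in feature_words)
        total += n
        if is_opinion:
            in_opinions += n
    return in_opinions / total if total else 0.0
```

A feature is informative to the extent that its precision exceeds the baseline, which is how the "+prec" figures later in the talk can be read.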

37

Parameters

Seed words

Words + clusters

Cluster size: 2-40

Threshold: 1-70%

38

Results

Varies with parameter settings, but there are smooth regions of the space

Here: training/validation/testing

39

Low Frequency Words

Single instance in a corpus ~ low frequency

Analysis of expression level annotations: there are many more single-instance words in subjective elements than outside them

40

Unique Words

Replace all words that appear once in the test data with “UNIQUE”

+5-10% points
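The replacement step can be sketched as follows, assuming the data is a list of tokenized sentences and that counts are taken over exactly that data (the function name is hypothetical):

```python
from collections import Counter

def mark_unique(sentences, placeholder="UNIQUE"):
    """Replace every word that occurs exactly once in the data with a placeholder."""
    counts = Counter(word for sentence in sentences for word in sentence)
    return [
        [placeholder if counts[word] == 1 else word for word in sentence]
        for sentence in sentences
    ]
```

After this mapping, all single-instance words share one token, so their combined frequency is high enough to measure precision.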

41

Collocations

here we go again; get out of here; what a; well and good; rocket science; for the last time; just as well; … !

Start with the observation that low precision words often compose higher precision collocations

42

Collocations

Identify n-gram PSEs as sequences whose precision is higher than the maximum precision of their constituents

W1,W2 is a PSE if prec(W1,W2) > max (prec(W1),prec(W2))

W1,W2,W3 is a PSE if prec(W1,W2,W3) > max(prec(W1,W2),prec(W3)) or prec(W1,W2,W3) > max(prec(W1),prec(W2,W3))
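The two tests above translate directly into code. In this sketch, prec is assumed to be a lookup table from word tuples to precision values computed beforehand; the table layout and function names are illustrative.

```python
def is_pse_bigram(prec, w1, w2):
    """W1,W2 is a PSE if prec(W1,W2) > max(prec(W1), prec(W2))."""
    return prec[(w1, w2)] > max(prec[(w1,)], prec[(w2,)])

def is_pse_trigram(prec, w1, w2, w3):
    """W1,W2,W3 is a PSE if it beats either way of splitting it into two parts."""
    whole = prec[(w1, w2, w3)]
    return (
        whole > max(prec[(w1, w2)], prec[(w3,)])
        or whole > max(prec[(w1,)], prec[(w2, w3)])
    )
```

The trigram test only needs one of the two splits to be beaten, so a trigram can qualify even when one of its bigrams is already fairly precise.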

43

Collocations

Moderate improvements: +3-10% points

But with all unique words mapped to “UNIQUE”: +13-24% points

44

Example Collocations with Unique

highly||adverb UNIQUE||adj

highly unsatisfactory

highly unorthodox

highly talented

highly conjectural

highly erotic

45

Example Collocations with Unique

UNIQUE||verb out||IN

farm out, chuck out, ruling out, crowd out, flesh out, blot out, spoken out, luck out

46

Collocations

UNIQUE||adj to||TO UNIQUE||verb: impervious to reason, strange to celebrate, wise to temper

UNIQUE||noun of||IN its||pronoun: sum of its, usurpation of its, proprietor of its

they||pronoun are||verb UNIQUE||noun: they are fools, they are noncontenders

47

Opinion Results: Summary

             Best (baseline 17%)   Worst (baseline 13%)
             +prec/freq            +prec/freq
Adjs         +21/373               +09/2137
Verbs        +16/721               +07/3193
2-grams      +10/569               +04/525
3-grams      +07/156               +03/148
1-U-grams    +10/6065              +06/6045
2-U-grams    +24/294               +14/288
3-U-grams    +27/138               +13/144

Disparate features have consistent performance

Collocation sets largely distinct

48

Does it add up?

Good preliminary results classifying opinion pieces using density and feature count features.

49

Future Work

Mutual bootstrapping (Riloff & Jones 1999)

Co-training (Collins & Singer 1999) to learn both PSEs and contextual features

Integration into a probabilistic model

Text classification and review mining

50

References

Banfield, A. (1982). Unspeakable Sentences. Routledge and Kegan Paul.
Collins, M. & Singer, Y. (1999). Unsupervised Models for Named Entity Classification. EMNLP-VLC-99.
van Dijk, T.A. (1988). News as Discourse. Lawrence Erlbaum.
Fludernik, M. (1993). The Fictions of Language and the Languages of Fiction. Routledge.
Hovy, E. (1987). Generating Natural Language Under Pragmatic Constraints. PhD dissertation.
Kaufer, D. (2000). Flaming. www.eudora.com
Kessler, B., Nunberg, G., & Schütze, H. (1997). Automatic Detection of Genre. ACL-EACL-97.
Riloff, E. & Jones, R. (1999). Learning Dictionaries for Information Extraction by Multi-level Bootstrapping. AAAI-99.

51

References

Stein, D. & Wright, S. (1995). Subjectivity and Subjectivisation. Cambridge.
Terveen, L., Hill, W., Amento, B., McDonald, D., & Creter, J. (1997). Building Task-Specific Interfaces to High Volume Conversational Data. CHI-97.
Teufel, S. & Moens, M. (2000). What’s Yours and What’s Mine: Determining Intellectual Attribution in Scientific Texts. EMNLP-VLC-00.
Wiebe, J. (2000). Learning Subjective Adjectives from Corpora. AAAI-00.
Wiebe, J. (1994). Tracking Point of View in Narrative. Computational Linguistics (20) 2.
Wiebe, J., Bruce, R., & O’Hara, T. (1999). Development and Use of a Gold Standard Data Set for Subjectivity Classifications. ACL-99.

52

References

Hatzivassiloglou, V. & McKeown, K. (1997). Predicting the Semantic Orientation of Adjectives. ACL-EACL-97.
Hatzivassiloglou, V. & Wiebe, J. (2000). Effects of Adjective Orientation and Gradability on Sentence Subjectivity. COLING-00.
Lee, L. (1999). Measures of Distributional Similarity. ACL-99.
Lee, L. & Pereira, F. (1999). ACL-99.
Lin, D. (1998). Automatic Retrieval and Clustering of Similar Words. COLING-ACL-98.
Quirk, R., Greenbaum, S., Leech, G., & Svartvik, J. (1985). A Comprehensive Grammar of the English Language. Longman.
Sack, W. (1995). Representing and Recognizing Point of View. AAAI Fall Symposium on Knowledge Navigation and Retrieval.

53

Sentence Annotations

Average pairwise Kappa scores:

all data: .69

certain data: .88 (60% of the corpus)

Case study of analyzing and improving intercoder reliability:

if there is symmetric disagreement resulting from bias, assessed by fitting probability models (Bishop et al. 1975, CoCo)

bias: marginal homogeneity

symmetric disagreement: quasi-symmetry

use the latent class model to correct disagreements
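For readers unfamiliar with the agreement figures on this slide, here is a sketch of an average pairwise Kappa. It assumes Cohen's kappa for each pair of judges, which the slide does not state explicitly, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cohen_kappa(a, b):
    """Cohen's kappa for two judges' label sequences over the same items."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each judge's own label distribution.
    expected = sum(
        count_a[label] * count_b[label] for label in count_a.keys() | count_b.keys()
    ) / (n * n)
    # Undefined (division by zero) if both judges always use one identical label.
    return (observed - expected) / (1 - expected)

def average_pairwise_kappa(judges):
    """Mean of Cohen's kappa over every pair of judges."""
    pairs = list(combinations(judges, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)
```

With three judges there are three pairs, so the reported figure would be the mean of three kappa values under this reading.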

54

Test for Bias: Marginal Homogeneity

The worse the fit, the greater the bias

[Figure: 4×4 contingency table over categories C1 to C4, with row totals Xi+ and column totals X+i]

$$p_{i+} = p_{+i} \quad \text{for all } i$$

55

Test for Symmetric Disagreement: Quasi-Symmetry

[Figure: 4×4 contingency table over categories C1 to C4, with the off-diagonal cells marked]

Tests relationships among the off-diagonal counts

The better the fit, the higher the correlation

56

(Potential) Subjective Elements

Same word, different types:

“Great majority”: objective

“Great!”: positive evaluative

“Just great.”: negative evaluative

57

Review Mining

From: Hoodoo <[email protected]>
Newsgroups: rec.gardens
Subject: Re: Garden software

I bought a copy of Garden Encyclopedia from Sierra. Well worth the time and money.
