1 Identifying Subjective Language Janyce Wiebe University of Pittsburgh.
1
Identifying Subjective Language
Janyce Wiebe, University of Pittsburgh
2
Overview
General area: acquire knowledge of evaluative and speculative language and use it in NLP applications
Primarily corpus-based work
Today: results of exploratory studies
3
Collaborators
Rebecca Bruce, Vasileios Hatzivassiloglou, Joseph Phillips, Matthew Bell, Melanie Martin, Theresa Wilson
4
Subjectivity Tagging
Recognizing opinions and evaluations (Subjective sentences) as opposed to material objectively presented as true (Objective sentences)
Banfield 1982, Fludernik 1993, Wiebe 1994, Stein & Wright 1995
5
Examples
At several different levels, it’s a fascinating tale. subjective
Bell Industries Inc. increased its quarterly to 10 cents from 7 cents a share. objective
6
Subjectivity
“Complained” “You Idiot!”
“Terrible product”
“Speculated” “Maybe”
“Enthused” “Wonderful!”
“Great product”
7
Examples
Strong addressee-oriented negative evaluation
Recognizing flames (Spertus 1997)
Personal e-mail filters (Kaufer 2000)
I had in mind your facts, buddy, not hers.
Nice touch. “Alleges” whenever facts posted are not in your persona of what is “real.”
8
Examples
Opinionated, editorial language
IR, text categorization (Kessler et al. 1997)
Do the writers purport to be objective?
Look, this is a man who has great numbers.
We stand in awe of the Woodstock generation’s ability to be unceasingly fascinated by the subject of itself.
9
Examples
Belief and speech reports
Information extraction, summarization, intellectual attribution (Teufel & Moens 2000)
Northwest Airlines settled the remaining lawsuits, a federal judge said.
“The cost of health care is eroding our standard of living and sapping industrial strength”, complains Walter Maher.
10
Other Applications
Review mining (Terveen et al. 1997)
Clustering documents by ideology (Sack 1995)
Style in machine translation and generation (Hovy 1987)
11
Potential Subjective Elements
"The cost of health care is eroding standards of living and sapping industrial strength,” complains Walter Maher.
Sap: potential subjective element
Subjective element
12
Subjectivity
Multiple types, sources, and targets
We stand in awe of the Woodstock generation’s ability to be unceasingly fascinated by the subject of itself.
Somehow grown-ups believed that wisdom adhered to youth.
13
Outline
Data and annotation
Sentence-level classification
Individual words
Collocations
Combinations
14
Annotations
Three levels: expression level, sentence level, document level
Manually tagged + existing annotations
15
Expression Level Annotations
[Perhaps you’ll forgive me] for reposting his response
They promised [e+ 2 yet] more for [e+ 3 really good] [e? 1 stuff]
16
Expression Level Annotations
Difficult for manual and automatic tagging: detailed; no predetermined classification unit
To date: used for training and bootstrapping
Probably the most natural level
17
Document Level Annotations
Manual: flames in Newsgroups
Existing: opinion pieces in the WSJ: editorials, letters to the editor, arts & leisure reviews
* to ***** reviews
+ More directly related to applications, but …
18
Document Level Annotations
Opinion pieces contain objective sentences and non-opinion pieces contain subjective sentences
Editorials contain facts supporting the argument
News reports present reactions (van Dijk 1988): “Critics claim …”, “Supporters argue …”
Reviews contain information about the product
19
Document Level Annotations
In a WSJ data set:
opinion pieces: subj 74%, obj 26%
non-opinion pieces: subj 43%, obj 57%
20
Data in this Talk
Sentence level:
1000 WSJ sentences
3 judges reached good agreement after rounds
Used for training and evaluation
Expression level:
1000 WSJ sentences (2J)
462 newsgroup messages (2J) + 15413 words (1J)
Single round; results promising
Used to generate features, and not for evaluation
21
Data in this Talk
Document level: Existing opinion-piece annotations used to generate features
Manually refined classifications used for evaluation
Identified editorials not marked as such
Only clear instances labeled
To date: 1 judge
Distinct from the other data
3 editions, each more than 150K words
22
Sentence Level Annotations
A sentence is labeled subjective if any significant expression of subjectivity appears
“The cost of health care is eroding our standard of living and sapping industrial strength,’’ complains Walter Maher.
“What an idiot,’’ the idiot presumably complained.
23
Sentence Classification
Binary features: pronoun, adjective, number, modal other than “will”, adverb other than “not”, new paragraph
Lexical feature: good for subj; good for obj; good for neither
Probabilistic classifier
10-fold cross validation; 51% baseline
72% average accuracy across folds
82% average accuracy on sentences rated certain
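The binary features listed on this slide could be extracted roughly as follows. This is a minimal sketch and not the classifier used in the talk: it assumes each sentence arrives as (word, tag) pairs with Penn Treebank part-of-speech tags, that the modal and adverb features exclude “will” and “not” respectively, and the function name `binary_features` is hypothetical.

```python
def binary_features(tagged_sentence, starts_paragraph=False):
    """Return the slide's binary features for one POS-tagged sentence.

    tagged_sentence: list of (word, tag) pairs; Penn Treebank tags are assumed.
    starts_paragraph: True if the sentence opens a new paragraph.
    """
    feats = {
        "pronoun": False,
        "adjective": False,
        "number": False,
        "modal": False,
        "adverb": False,
        "new_paragraph": starts_paragraph,
    }
    for word, tag in tagged_sentence:
        lower = word.lower()
        if tag in ("PRP", "PRP$"):
            feats["pronoun"] = True
        if tag.startswith("JJ"):
            feats["adjective"] = True
        if tag == "CD":
            feats["number"] = True
        # Assumption: "will" does not count as a modal for this feature.
        if tag == "MD" and lower != "will":
            feats["modal"] = True
        # Assumption: "not" does not count as an adverb for this feature.
        if tag.startswith("RB") and lower != "not":
            feats["adverb"] = True
    return feats
```

Each dictionary can then be fed, together with the lexical feature, to whatever probabilistic classifier is being evaluated under cross validation.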
24
Identifying PSEs
There are few high precision, high frequency potential subjective elements
25
Identifying Individual PSEs
Classifications correlated with adjectives
Good subsets
Dynamic adjectives (Quirk et al. 1985)
Positive, negative polarity; gradability automatically identified in corpora (Hatzivassiloglou & McKeown 1997)
Results from distributional similarity
26
Distributional Similarity
Word similarity based on distributional pattern of words
Much work in NLP (see Lee 99, Lee and Pereira 99)
Purposes:
Improve estimates of unseen events
Thesaurus and dictionary construction from corpora
27
Lin’s Distributional Similarity
Lin 1998
[Figure: “I have a brown dog”, with its words linked by relations R1 to R4]
Word   R   W
I      R1  have
have   R2  dog
brown  R3  dog
. . .
28
Lin’s Distributional Similarity
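[Figure: the (R, W) pairs of Word1 and of Word2, with the pairs they share overlapping]
Pairs statistically correlated with Word1
$$\mathrm{sim}(\text{Word1}, \text{Word2}) = \frac{\sum_{RW_{int}} \left[ I(\text{Word1}, RW_{int}) + I(\text{Word2}, RW_{int}) \right]}{\sum_{RW_{w1}} I(\text{Word1}, RW_{w1}) + \sum_{RW_{w2}} I(\text{Word2}, RW_{w2})}$$
where $RW_{int}$ ranges over the pairs shared by both words, $RW_{w1}$ over the pairs of Word1, and $RW_{w2}$ over the pairs of Word2.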
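The ratio on this slide can be sketched in a few lines. This is an illustration under stated assumptions: the mutual information values I(Word, RW) are taken as already computed and stored in dictionaries keyed by (R, W) pairs, and the function name `lin_similarity` is hypothetical.

```python
def lin_similarity(info1, info2):
    """Similarity of two words from their (R, W) mutual-information profiles.

    info1, info2: dicts mapping an (R, W) pair to I(Word1, RW) / I(Word2, RW).
    Only pairs statistically correlated with a word are assumed to be present.
    """
    shared = info1.keys() & info2.keys()  # the pairs in both profiles (RWint)
    numerator = sum(info1[rw] + info2[rw] for rw in shared)
    denominator = sum(info1.values()) + sum(info2.values())
    return numerator / denominator if denominator else 0.0


# Hypothetical profiles, for illustration only.
bizarre = {("R3", "story"): 2.0, ("R3", "twist"): 1.0}
strange = {("R3", "story"): 1.0, ("R2", "noise"): 2.0}
print(lin_similarity(bizarre, strange))
```

A word compared with itself scores 1, and two words with no shared pairs score 0.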
29
Bizarre
strange similar scary unusual fascinating interesting curious tragic different contradictory peculiar silly sad absurd poignant crazy funny comic compelling odd
32
Filtering
Seed Words
Words + Clusters
Filtered Set
Word + cluster removed if precision on training set < threshold
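The filtering step might look like the sketch below. It assumes that the precision of a seed word plus its cluster is the share of their training-set instances that fall in subjective sentences, and that clusters are given as a mapping from each seed to its similar words; the name `filter_clusters` and both data layouts are hypothetical.

```python
def filter_clusters(clusters, training, threshold):
    """Keep each seed word and its cluster only if precision >= threshold.

    clusters: dict mapping a seed word to the list of words in its cluster.
    training: list of (tokens, is_subjective) pairs.
    threshold: minimum precision on the training set.
    """
    kept = set()
    for seed, similar in clusters.items():
        members = {seed, *similar}
        total = in_subjective = 0
        for tokens, is_subjective in training:
            hits = sum(1 for tok in tokens if tok in members)
            total += hits
            if is_subjective:
                in_subjective += hits
        # Groups that never occur in training are dropped as well.
        if total and in_subjective / total >= threshold:
            kept |= members
    return kept
```

The two parameters of the process are then the cluster size (how many similar words each seed brings along) and the threshold.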
33
Parameters
Seed Words
Words+Clusters
Cluster size
Threshold
34
Seeds from Annotations
1000 WSJ sentences with sentence level and expression level annotations
They promised [e+ 2 yet] more for [e+ 3 really good] [e? 1 stuff].
"It's [e? 3 really] [e- 3 bizarre]," says Albert Lerman, creative director at the Wells agency.
35
Experiments
1/10 used for training, 9/10 for testing
Parameters:
Cluster size fixed at 20
Filtering threshold: precision of baseline adjective feature on the training data
+7.5% ave 10-fold cross validation
[More improvements with other adj features]
36
Opinion Pieces
3 WSJ data sets, over 150K words each
Skewed distribution: 13-17% words in opinions
Baseline for comparison: # words in opinions / total # words
For measuring precision: Prec(S) = # instances of S in opinions / total # instances of S
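Both quantities are simple counts. A minimal sketch, assuming documents are given as (tokens, is_opinion) pairs and S is a set of words; the function names and the toy data are hypothetical.

```python
def baseline(documents):
    """# words in opinions / total # words."""
    in_opinions = sum(len(tokens) for tokens, is_opinion in documents if is_opinion)
    total = sum(len(tokens) for tokens, _ in documents)
    return in_opinions / total


def precision(feature_set, documents):
    """Prec(S) = # instances of S in opinions / total # instances of S."""
    in_opinions = total = 0
    for tokens, is_opinion in documents:
        hits = sum(1 for tok in tokens if tok in feature_set)
        total += hits
        if is_opinion:
            in_opinions += hits
    return in_opinions / total if total else 0.0


# Toy data, for illustration only.
docs = [
    (["a", "bizarre", "and", "odd", "plan"], True),
    (["shares", "rose", "odd", "lots", "traded", "up", "again", "today", "in", "tokyo"], False),
]
print(baseline(docs), precision({"bizarre", "odd"}, docs))
```

A feature set is informative to the extent that its precision exceeds the baseline, which is what the "+prec" figures report.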
37
Parameters
Seed Words
Words + Clusters
Cluster size: 2-40
Threshold: 1-70%
38
Results
Varies with parameter settings, but there are smooth regions of the space
Here: training/validation/testing
39
Low Frequency Words
Single instance in a corpus ~ low frequency
Analysis of expression level annotations: there are many more single-instance words in subjective elements than outside them
40
Unique Words
Replace all words that appear once in the test data with “UNIQUE”
+5-10% points
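The replacement itself is a single counting pass. A minimal sketch, assuming the data is a list of tokenized sentences and that counts are case-sensitive; the function name `mark_unique` is hypothetical.

```python
from collections import Counter


def mark_unique(sentences):
    """Replace every word that occurs exactly once in the data with "UNIQUE"."""
    counts = Counter(tok for sent in sentences for tok in sent)
    return [
        [tok if counts[tok] > 1 else "UNIQUE" for tok in sent]
        for sent in sentences
    ]


data = [["highly", "unorthodox", "move"], ["highly", "talented", "move"]]
print(mark_unique(data))
```

After this step "UNIQUE" behaves like any other word, so it can be scored on its own or as part of a collocation.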
41
Collocations
here we go again
get out of here
what a
well and good
rocket science
for the last time
just as well
… !
Start with the observation that low precision words often compose higher precision collocations
42
Collocations
Identify n-gram PSEs as sequences whose precision is higher than the maximum precision of their constituents
W1,W2 is a PSE if prec(W1,W2) > max (prec(W1),prec(W2))
W1,W2,W3 is a PSE if prec(W1,W2,W3) > max(prec(W1,W2),prec(W3)) or prec(W1,W2,W3) > max(prec(W1),prec(W2,W3))
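The two rules translate directly into code. A minimal sketch, assuming precisions have already been measured and stored in a dictionary keyed by word tuples; the function names and the precision values are hypothetical.

```python
def is_bigram_pse(prec, w1, w2):
    """W1,W2 is a PSE if prec(W1,W2) > max(prec(W1), prec(W2))."""
    return prec[(w1, w2)] > max(prec[(w1,)], prec[(w2,)])


def is_trigram_pse(prec, w1, w2, w3):
    """W1,W2,W3 is a PSE if it beats either way of splitting it in two."""
    whole = prec[(w1, w2, w3)]
    return (
        whole > max(prec[(w1, w2)], prec[(w3,)])
        or whole > max(prec[(w1,)], prec[(w2, w3)])
    )


# Hypothetical precision values, for illustration only.
prec = {
    ("rocket",): 0.20, ("science",): 0.15, ("rocket", "science"): 0.60,
    ("here",): 0.20, ("we",): 0.15, ("go",): 0.25,
    ("here", "we"): 0.30, ("we", "go"): 0.35, ("here", "we", "go"): 0.50,
}
print(is_bigram_pse(prec, "rocket", "science"))
print(is_trigram_pse(prec, "here", "we", "go"))
```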
43
Collocations
Moderate improvements: +3-10% points
But with all unique words mapped to “UNIQUE”: +13-24% points
44
Example Collocations with Unique
highly||adverb UNIQUE||adj
highly unsatisfactory
highly unorthodox
highly talented
highly conjectural
highly erotic
45
Example Collocations with Unique
UNIQUE||verb out||IN
farm out
chuck out
ruling out
crowd out
flesh out
blot out
spoken out
luck out
46
Collocations
UNIQUE||adj to||TO UNIQUE||verb
impervious to reason
strange to celebrate
wise to temper
UNIQUE||noun of||IN its||pronoun
sum of its
usurpation of its
proprietor of its
they||pronoun are||verb UNIQUE||noun
they are fools
they are noncontenders
47
Opinion Results: Summary
             Best (baseline 17%)   Worst (baseline 13%)
             +prec/freq            +prec/freq
Adjs         +21/373               +09/2137
Verbs        +16/721               +07/3193
2-grams      +10/569               +04/525
3-grams      +07/156               +03/148
1-U-grams    +10/6065              +06/6045
2-U-grams    +24/294               +14/288
3-U-grams    +27/138               +13/144
Disparate features have consistent performance
Collocation sets largely distinct
48
Does it add up?
Good preliminary results classifying opinion pieces using density and feature count features.
49
Future Work
Mutual bootstrapping (Riloff & Jones 1999)
Co-training (Collins & Singer 1999) to learn both PSEs and contextual features
Integration into a probabilistic model
Text classification and review mining
50
References
Banfield, A. (1982). Unspeakable Sentences. Routledge and Kegan Paul.
Collins, M. & Singer, Y. (1999). Unsupervised models for named entity classification. EMNLP-VLC-99.
van Dijk, T.A. (1988). News as Discourse. Lawrence Erlbaum.
Fludernik, M. (1993). The Fictions of Language and the Languages of Fiction. Routledge.
Hovy, E. (1987). Generating Natural Language Under Pragmatic Constraints. PhD dissertation.
Kaufer, D. (2000). Flaming. www.eudora.com
Kessler, B., Nunberg, G., & Schutze, H. (1997). Automatic Detection of Genre. ACL-EACL-97.
Riloff, E. & Jones, R. (1999). Learning Dictionaries for Information Extraction by Multi-level Bootstrapping. AAAI-99.
51
References
Stein, D. & Wright, S. (1995). Subjectivity and Subjectivisation. Cambridge.
Terveen, W., Hill, W., Amento, B., McDonald, D., & Creter, J. (1997). Building Task-Specific Interfaces to High Volume Conversational Data. CHI-97.
Teufel, S. & Moens, M. (2000). What’s Yours and What’s Mine: Determining Intellectual Attribution in Scientific Texts. EMNLP-VLC-00.
Wiebe, J. (2000). Learning Subjective Adjectives from Corpora. AAAI-00.
Wiebe, J. (1994). Tracking Point of View in Narrative. Computational Linguistics (20) 2.
Wiebe, J., Bruce, R., & O’Hara, T. (1999). Development and Use of a Gold Standard Data Set for Subjectivity Classifications. ACL-99.
52
References
Hatzivassiloglou, V. & McKeown, K. (1997). Predicting the Semantic Orientation of Adjectives. ACL-EACL-97.
Hatzivassiloglou, V. & Wiebe, J. (2000). Effects of Adjective Orientation and Gradability on Sentence Subjectivity. COLING-00.
Lee, L. (1999). Measures of Distributional Similarity. ACL-99.
Lee, L. & Pereira, F. (1999). ACL-99.
Lin, D. (1998). Automatic Retrieval and Clustering of Similar Words. COLING-ACL-98.
Quirk, R., Greenbaum, S., Leech, G., & Svartvik, J. (1985). A Comprehensive Grammar of the English Language. Longman.
Sack, W. (1995). Representing and Recognizing Point of View. AAAI Fall Symposium on Knowledge Navigation and Retrieval.
53
Sentence Annotations
Ave pair-wise Kappa scores: all data: .69 certain data: .88 (60% of the corpus)
Case study of analyzing and improving intercoder reliability:
if there is symmetric disagreement resulting from bias
assessed by fitting probability models (Bishop et al. 1975, CoCo)
• bias: marginal homogeneity
• symmetric disagreement: quasi-symmetry
use the latent class model to correct disagreements
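For reference, an average pairwise Kappa like the one reported on this slide can be computed as sketched below. The slide does not say which Kappa variant was used; this sketch assumes Cohen's kappa for each pair of judges, and the function names and toy labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two judges' label sequences of equal length.

    Undefined (division by zero) when chance agreement is exactly 1.
    """
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    labels = count_a.keys() | count_b.keys()
    expected = sum(count_a[l] * count_b[l] for l in labels) / (n * n)
    return (observed - expected) / (1 - expected)


def average_pairwise_kappa(judges):
    """Mean of Cohen's kappa over every pair of judges."""
    pairs = list(combinations(judges, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Toy labels for three judges, for illustration only.
j1 = ["subj", "subj", "obj", "obj"]
j2 = ["subj", "obj", "obj", "obj"]
j3 = ["subj", "subj", "obj", "obj"]
print(average_pairwise_kappa([j1, j2, j3]))
```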
54
Test for Bias: Marginal Homogeneity
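The worse the fit, the greater the bias
[Figure: 4x4 contingency table of the two judges' classifications C1 to C4, with row totals X1+ to X4+ and column totals X+1 to X+4]
X1+ = X+1
X2+ = X+2
X3+ = X+3
X4+ = X+4
$p_{i+} = p_{+i}$ for all $i$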
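The marginals that this hypothesis equates are easy to inspect directly. The sketch below only computes the row and column proportions of a two-judge contingency table and their largest gap; it is not the model-fitting test used in the talk, and the function names and the toy table are hypothetical.

```python
def marginals(table):
    """Row and column proportions of a square contingency table."""
    n = sum(sum(row) for row in table)
    row_props = [sum(row) / n for row in table]
    col_props = [sum(col) / n for col in zip(*table)]
    return row_props, col_props


def max_marginal_gap(table):
    """Largest |p_i+ - p_+i|; zero when the marginals are homogeneous."""
    rows, cols = marginals(table)
    return max(abs(r - c) for r, c in zip(rows, cols))


# Toy table: rows are judge 1's categories, columns are judge 2's.
table = [[40, 10], [20, 30]]
print(marginals(table), max_marginal_gap(table))
```

A large gap suggests that one judge favors a category more than the other does, which is the kind of bias the test on this slide is meant to detect.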
55
Test for Symmetric Disagreement: Quasi-Symmetry
[Figure: 4x4 contingency table over C1 to C4 with the off-diagonal cells marked]
Tests relationships among the off-diagonal counts
The better the fit, the higher the correlation
56
(Potential) Subjective Elements
Same word, different types
“Great majority”: objective
“Great!”: positive evaluative
“Just great.”: negative evaluative
57
Review Mining
From: Hoodoo>[email protected]>
Newsgroups: rec.gardens
Subject: Re: Garden software
I bought a copy of Garden Encyclopedia from Sierra. Well worth the time and money.