Frank Reichartz, Gerhard Paaß
Knowledge Discovery, Fraunhofer IAIS, St. Augustin, Germany
Estimating Supersenses with Conditional Random Fields
© Fraunhofer IAIS. Gerhard Paaß, 19 Sept 08, HLIE Workshop at ECML/PKDD’08
Agenda
1. Introduction
2. Models for Supersenses
3. Conditional Random Fields
4. Lumped Observations
5. Summary
Use Case Contentus
Digitize a Multimedia Collection of German National Library
• Music of former GDR
• Digitize
• Quality control
• Meta data collection
• Semantic indexing
• Semantic search engine
Target
• Provide content: text, score sheets, video, images, speech
• Generate meta data: composers, premiere, director, artists, …
• Extract entities: dates, places, relations, composers, pieces of music, …
• Assign meanings to words and phrases: use ontology
Wordnet as Ontology
• WordNet is a fine-grained word sense hierarchy
• The same word may have different senses: bank = financial institution; bank = river boundary
• Defines senses (synsets) for
• Verbs
• Common & proper nouns
• Adjectives
• Adverbs
Target: assign each word to a synset
• Easy semantic indexing & retrieval
Fine-Grained Word Senses
• Example: senses of noun "blow"
Very subtle differences between senses
Hierarchy of Hypernyms
• Supersense level
• Fewer distinctions
• Retains main differences
Target: assign verbs / nouns to a supersense
List of Supersenses
• 15 verb supersenses
• 26 noun supersenses
Supersenses discriminate between many synsets
Noun blow:
• 7 synsets
• 5 supersenses
Verb blow:
• 22 synsets
• 9 supersenses
Sufficient for coarse disambiguation
Training Data: SemCor Dataset
Word        Lemma       POS    Synset     Supersense
A           A           DT
compromise  compromise  NN     1190419    noun.communication
will        will        MD
leave       leave       VB     2610151    verb.stative
both        both        DT
sides       side        NN     8294366    noun.group
without     without     IN
the         the         DT
glow        glow        NN     13864852   noun.state
of          of          IN
triumph     triumph     NN     7425691    noun.feeling
,           ,           PUNC
but         but         CC
it          it          PRP
will        will        MD
save        save        VB     2526596    verb.social
Berlin      location    NNP    26074      noun.location
.           .           PUNC
Input: Word, Lemma, POS
Output: Synset, Supersense
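Reading such rows can be sketched in a few lines of Python. This is only an illustration of the input / output split shown on the slide: the whitespace-separated row layout, the `Token` type, and the "O" label for unannotated words are assumptions, not the actual SemCor file format.

```python
from typing import NamedTuple, Optional


class Token(NamedTuple):
    word: str
    lemma: str
    pos: str
    synset: Optional[int]       # None if the word carries no annotation
    supersense: Optional[str]   # None if the word carries no annotation


def parse_row(row: str) -> Token:
    """Parse one whitespace-separated row (assumed layout, see lead-in)."""
    fields = row.split()
    word, lemma, pos = fields[:3]
    if len(fields) == 5:
        return Token(word, lemma, pos, int(fields[3]), fields[4])
    return Token(word, lemma, pos, None, None)


rows = [
    "A A DT",
    "compromise compromise NN 1190419 noun.communication",
    "will will MD",
    "leave leave VB 2610151 verb.stative",
]
tokens = [parse_row(r) for r in rows]
# Hypothetical convention: unannotated words get the label "O".
labels = [t.supersense or "O" for t in tokens]
```

The first three fields form the input of the sequence model; the supersense column is the label to be predicted.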
Prior Work: Classifier
• Bag-of-words is not sufficient: encode relative positions
• Use classifiers
• MaxEnt
• SVM
• Naive Bayes
• kNN
• Proc. SemEval 2007: Coarse-Grained English All-Words Task
Prior Work: Sequence Modelling
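Ciaramita & Altun 06: Use perceptron-trained HMM
\[ F(x, y; w) = \langle w, \Phi(x, y) \rangle = \sum_{i=1}^{d} w_i \, \Phi_i(x, y), \qquad \Phi_i(x, y) = \sum_{j=1}^{|y|} \phi_i(y_{j-1}, y_j, x) \]
• \Phi(x, y): feature counts for a sequence (y, x)
• y_i, x_i: the i-th label and the i-th word / feature in a sequence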
• Maximize predictive performance on training set
• Ignore ambiguity: use only most frequent sense
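A minimal sketch of the linear sequence score used by such a model: global feature counts are accumulated over the sequence and combined with a weight vector. The feature templates ("transition", "word"), the start symbol, and the toy weights are hypothetical and only illustrate the form of the score, not the feature set of Ciaramita & Altun.

```python
from collections import Counter


def global_features(labels, words):
    """Feature counts over a whole sequence, summed from local features."""
    phi = Counter()
    prev = "<start>"  # assumed start symbol for the label before position 1
    for label, word in zip(labels, words):
        phi[("transition", prev, label)] += 1
        phi[("word", label, word.lower())] += 1
        prev = label
    return phi


def score(weights, labels, words):
    """Inner product of the weight vector with the global feature counts."""
    phi = global_features(labels, words)
    return sum(weights.get(name, 0.0) * count for name, count in phi.items())


words = ["save", "Berlin"]
weights = {  # toy weights, chosen by hand for illustration
    ("word", "noun.location", "berlin"): 2.0,
    ("transition", "verb.social", "noun.location"): 0.5,
    ("word", "noun.person", "berlin"): 1.0,
}
good = score(weights, ["verb.social", "noun.location"], words)
bad = score(weights, ["verb.social", "noun.person"], words)
```

With these toy weights the labelling that tags "Berlin" as noun.location receives the higher score; a perceptron would adjust the weights whenever the best-scoring sequence differs from the annotated one.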
Deschacht & Moens 06: Use Conditional Random Field
• Exploit hierarchy to model many classes
• Apply to fine grained word sense: good results
Definition of Conditional Random Fields
observed words / features X1,…,Xn
states Y1,…,Yn
each state Yt may be influenced by many of the X1,…,Xn features
Definition: Let G=(V,E) be a graph such that Y=(Y_t)_{t∈V}, so that Y is indexed by the vertices of G. Then (X,Y) is a conditional random field if, when conditioned on X, the random variables Y_t obey the Markov property with respect to the graph:
\[ p(Y_t \mid Y_1, \ldots, Y_{t-1}, Y_{t+1}, \ldots, Y_n, X) = p(Y_t \mid Y_w, w \in \mathrm{neigh}(Y_t), X) \]
Hammersley-Clifford theorem: the probability can be written as a product of potential functions
Each potential function is defined over the variables of a clique of the neighborhood graph
Simplification: Sequential Chain
observed words / features X1,…,Xn
states Y1,…,Yn
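\[ p(Y_1, \ldots, Y_n \mid X) = \frac{1}{Z(X)} \exp\left( \sum_{t=1}^{n} \left[ \sum_{k=1}^{N_1} \lambda_k f_k(Y_{t-1}, Y_t, X) + \sum_{k=1}^{N_2} \mu_k g_k(Y_t, X) \right] \right) \]
with feature weights \lambda_k, \mu_k and normalization constant Z(X)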
features may involve two consecutive states and all observed words
examples
• feature has value 1 if Yt-1="other" and Yt="location" and Xt has POS tag "proper name" and (Xt-2,Xt-1)="arrived in"; otherwise the value is 0
• estimated POS tags, noun phrase tags, weekday, amounts, etc.
• prefixes, suffixes; matching regular expressions for capitalization, etc.
• information from lexica, lists of proper names
[Lafferty, McCallum, Pereira 01]
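The first example feature can be written down directly as a binary function of two consecutive states and the observed sequence. This is a sketch: the function name, the argument layout, and the use of the tag "NNP" for "proper name" are assumptions made for illustration.

```python
from typing import Sequence


def arrived_in_location_feature(
    y_prev: str,
    y_cur: str,
    words: Sequence[str],
    pos_tags: Sequence[str],
    t: int,
) -> int:
    """Binary feature on (Y_{t-1}, Y_t, X) following the slide's example."""
    if t < 2:  # the two preceding words must exist
        return 0
    fires = (
        y_prev == "other"
        and y_cur == "location"
        and pos_tags[t] == "NNP"  # assumed tag for "proper name"
        and (words[t - 2].lower(), words[t - 1].lower()) == ("arrived", "in")
    )
    return int(fires)


words = ["She", "arrived", "in", "Berlin"]
pos_tags = ["PRP", "VBD", "IN", "NNP"]
value_match = arrived_in_location_feature("other", "location", words, pos_tags, 3)
value_miss = arrived_in_location_feature("other", "person", words, pos_tags, 3)
```

The feature fires only when the label pair and the observed context agree, so its learned weight expresses how strongly "arrived in" followed by a proper name supports the transition from "other" to "location".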
Derivative of Likelihood
estimate the optimal parameter vector for the training set (Xd,Yd), d=1,…,D
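\[ \frac{\partial L}{\partial \lambda_k} = \underbrace{\sum_{d=1}^{D} f_k(Y_d, X_d)}_{\text{observed feature value}} - \underbrace{\sum_{d=1}^{D} \sum_{Y} p(Y \mid X_d) \, f_k(Y, X_d)}_{\text{expected feature value}} \]
where f_k(Y, X) denotes the feature summed over all positions of the sequence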
how can we calculate the expected feature values?
• need for every document d and state Yd,t the probability p(Yd,t=i|Xd)
• need for every d and states Yd,t, Yd,t+1 the probability p(Yd,t=i,Yd,t+1=j|Xd)
• use the forward-backward algorithm, as for the hidden Markov model
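The two kinds of probabilities can be obtained with one forward and one backward pass. The sketch below uses plain Python and unscaled potentials, so it is only suitable for short sequences; the names `node_pot` and `edge_pot` (already exponentiated state and transition scores) and the toy numbers are assumptions, and the transition potentials are taken to be position-independent for brevity.

```python
def forward_backward(node_pot, edge_pot):
    """Return Z, state marginals and pairwise marginals of a linear chain.

    node_pot[t][i]: potential of state i at position t (non-negative)
    edge_pot[i][j]: potential of the transition i -> j (non-negative)
    """
    n, m = len(node_pot), len(node_pot[0])
    alpha = [[0.0] * m for _ in range(n)]
    beta = [[1.0] * m for _ in range(n)]
    alpha[0] = list(node_pot[0])
    for t in range(1, n):  # forward pass
        for j in range(m):
            incoming = sum(alpha[t - 1][i] * edge_pot[i][j] for i in range(m))
            alpha[t][j] = node_pot[t][j] * incoming
    for t in range(n - 2, -1, -1):  # backward pass
        for i in range(m):
            beta[t][i] = sum(
                edge_pot[i][j] * node_pot[t + 1][j] * beta[t + 1][j]
                for j in range(m)
            )
    z = sum(alpha[n - 1])
    # p(Y_t = i | X)
    node_marg = [[alpha[t][i] * beta[t][i] / z for i in range(m)] for t in range(n)]
    # p(Y_t = i, Y_{t+1} = j | X)
    pair_marg = [
        [
            [
                alpha[t][i] * edge_pot[i][j] * node_pot[t + 1][j] * beta[t + 1][j] / z
                for j in range(m)
            ]
            for i in range(m)
        ]
        for t in range(n - 1)
    ]
    return z, node_marg, pair_marg


# Toy chain with 3 positions and 2 states (numbers chosen arbitrarily).
node_pot = [[1.0, 2.0], [0.5, 1.5], [2.0, 1.0]]
edge_pot = [[2.0, 1.0], [1.0, 3.0]]
z, node_marg, pair_marg = forward_backward(node_pot, edge_pot)
```

The expected value of a feature is then the sum of its values weighted by these marginals. A practical implementation would work in log space or rescale `alpha` and `beta` to avoid underflow.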
Optimization Procedure: Gradient Ascent
Regularization term: use small weight values if possible, giving a smaller generalization error
Bayesian prior distribution: Gaussian, Laplace, etc.
use gradient-based optimizer: e.g. conjugate gradient, BFGS (approximate quadratic optimization)
use stochastic gradient
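\[ \frac{\partial L}{\partial \lambda_k} = \underbrace{\sum_{d=1}^{D} f_k(Y_d, X_d)}_{\text{observed feature value}} - \underbrace{\sum_{d=1}^{D} \sum_{Y} p(Y \mid X_d) \, f_k(Y, X_d)}_{\text{expected feature value}} - \underbrace{\frac{\lambda_k}{v}}_{\text{regularization}} \]
with v the variance of a Gaussian prior on the weights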
[Sutton, McCallum 06]
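One update of such an optimizer can be sketched as follows, assuming a Gaussian prior so that the regularization contributes the weight divided by the prior variance. The function names, the learning rate, and the toy feature values are hypothetical; in practice the expected values would come from forward-backward.

```python
def regularized_gradient(observed, expected, weights, variance):
    """Gradient per weight: observed - expected - weight / variance."""
    return [o - e - w / variance for o, e, w in zip(observed, expected, weights)]


def gradient_ascent_step(weights, observed, expected, variance=10.0, learning_rate=0.1):
    """Move every weight a small step in the direction of the gradient."""
    grad = regularized_gradient(observed, expected, weights, variance)
    return [w + learning_rate * g for w, g in zip(weights, grad)]


weights = [0.0, 1.0]
observed = [3.0, 1.0]   # toy observed feature values
expected = [2.0, 1.5]   # toy expected feature values under the current model
new_weights = gradient_ascent_step(weights, observed, expected)
```

A feature that is observed more often than the model expects has its weight increased, one that is over-predicted has it decreased, and the regularization term pulls every weight towards zero. For a stochastic gradient, the observed and expected values would be computed on a single training sequence or a small batch.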
Features
• Lemmas of verbs of previous position
• Part-of-Speech of lags -2|-1|0|1|2
• Coarse POS lags -2|-1|0|1|2
• Three letter prefixes lag -1,0,1
• Three letter suffixes lag -1,0,1
• INITCAP -1|0|1
• ALLDIGITS -1|0|1
• ALLCAPS -1|0|1
• MIXEDCAPS -1|0|1
• CONTAINSDASH -1|0|1
• Class-ID of unsupervised LDA topic model with 50 classes
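The orthographic and affix features with lags -1, 0, 1 can be sketched as below. The exact definitions behind names such as INITCAP or MIXEDCAPS are not given on the slide, so the tests used here are assumptions, as are the function names and the string encoding of the features.

```python
def shape_features(token: str) -> dict:
    """Orthographic indicators for one token (assumed definitions)."""
    return {
        "INITCAP": token[:1].isupper(),
        "ALLDIGITS": token.isdigit(),
        "ALLCAPS": token.isupper(),
        "MIXEDCAPS": any(c.isupper() for c in token[1:])
        and any(c.islower() for c in token),
        "CONTAINSDASH": "-" in token,
    }


def lagged_features(tokens, t, lags=(-1, 0, 1)):
    """Collect affix and shape features of the tokens around position t."""
    feats = set()
    for lag in lags:
        i = t + lag
        if not 0 <= i < len(tokens):
            continue  # no feature for positions outside the sentence
        tok = tokens[i]
        feats.add(f"PREFIX3[{lag}]={tok[:3].lower()}")
        feats.add(f"SUFFIX3[{lag}]={tok[-3:].lower()}")
        for name, active in shape_features(tok).items():
            if active:
                feats.add(f"{name}[{lag}]")
    return feats


tokens = ["it", "will", "save", "Berlin", "."]
feats = lagged_features(tokens, 3)  # features for "Berlin"
```

Each string in `feats` corresponds to one binary feature of the CRF; the lag in brackets records the relative position of the token the feature was computed from.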
SemCor training set: ~ 20k sentences, 5-fold cross validation
Results for nouns
Different F1-values
• event: 67.0%
• Tops: 98.2%
Micro-Average: 83.5%
Macro-Average: 77.9%
Different frequencies of examples
• motive: 133
• artifact: 8894
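The gap between micro- and macro-average follows from the unequal class frequencies: the micro-average pools all decisions, while the macro-average gives every supersense the same weight. A small sketch with invented counts (not the SemCor results) illustrates the effect.

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from true positives, false positives and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(counts: dict) -> tuple:
    """counts maps each class to a (tp, fp, fn) triple."""
    macro = sum(f1(*c) for c in counts.values()) / len(counts)
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1(tp, fp, fn), macro


# Hypothetical counts: one frequent, well-recognized class and one rare class.
counts = {"frequent": (900, 50, 50), "rare": (10, 10, 10)}
micro, macro = micro_macro_f1(counts)
```

Because the rare class is recognized less reliably, it lowers the macro-average much more than the micro-average, which is the same pattern as on the slide.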
Comparison to Prior Result
Micro-Average:
• Ciaramita & Altun 06: 77.18%
• CRF: 83.5%
• ~ 28% reduction of error
Supersense    Ciaramita & Altun 06      CRF
              Rec.   Prec.  F           Rec.   Prec.  F
n.person      92.0   87.9   89.9        96.0   96.9   96.4
n.group       75.4   79.6   77.4        84.4   85.1   84.7
n.location    77.2   75.4   76.3        85.1   80.8   82.9
n.time        88.4   84.3   86.3        92.6   90.5   91.5
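The error reduction can be checked with a short calculation, assuming that "error" means 100% minus the micro-averaged F-value:

```latex
\[
\frac{(100 - 77.18) - (100 - 83.5)}{100 - 77.18}
  = \frac{22.82 - 16.5}{22.82}
  = \frac{6.32}{22.82}
  \approx 0.277 \approx 28\%
\]
```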
Need for More Data
WordNet covers more than 100,000 synsets
Few examples per supersense: higher training error
Many examples required to train each synset
SemCor: ~20k sentences
Manual labelling is costly
Exploit restrictions in WordNet
Each word has only a subset of possible supersenses
• Blow: n.act, n.event, n.phenomenon, n.artifact, n.act
Unlabeled data: assign possible supersenses to each word
Specialized CRF required
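Assigning the possible supersenses to unlabeled text amounts to a lexicon lookup per word. In the sketch below the small `LEXICON` stands in for WordNet and contains only a few entries taken from this talk's example sentence; the key format, the "O" label for words without supersense, and the function name are assumptions.

```python
from math import prod

# Hypothetical excerpt: (lemma, POS) -> supersenses allowed by the ontology.
LEXICON = {
    ("compromise", "NN"): {"n.communication", "n.act"},
    ("glow", "NN"): {"n.state", "n.attribute", "n.phenomenon", "n.feeling"},
    ("triumph", "NN"): {"n.feeling", "n.event"},
}
NO_SUPERSENSE = "O"  # assumed label for words outside the lexicon


def candidate_sets(tagged_lemmas, lexicon=LEXICON):
    """Return, for every position, the set of admissible supersenses."""
    return [set(lexicon.get(item, {NO_SUPERSENSE})) for item in tagged_lemmas]


sentence = [("the", "DT"), ("glow", "NN"), ("of", "IN"), ("triumph", "NN")]
candidates = candidate_sets(sentence)
# Number of label sequences compatible with the candidate sets.
n_sequences = prod(len(c) for c in candidates)
```

Even for this four-word fragment the candidate sets are compatible with several label sequences, which is why a standard CRF that expects exactly one label per position cannot be trained on such data directly.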
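Conditional Random Field with Lumped Supersenses
observed words / features X1,…,Xn
states Y1,…,Yn
Observations: subsets of supersenses $\mathcal{Y}_1,\ldots,\mathcal{Y}_n$ with $Y_t \in \mathcal{Y}_t$
An observation $(X_1,\ldots,X_n; \mathcal{Y}_1,\ldots,\mathcal{Y}_n)$ contains a large number of sequences $(Y_1,\ldots,Y_n)$
Adapt likelihood computation: sum over all sequences compatible with the observed subsets
\[ p(\mathcal{Y}_1, \ldots, \mathcal{Y}_n \mid X) = \frac{1}{Z(X)} \sum_{(Y_1, \ldots, Y_n) \in \mathcal{Y}_1 \times \cdots \times \mathcal{Y}_n} \exp\left( \sum_{t=1}^{n} \left[ \sum_{k=1}^{N_1} \lambda_k f_k(Y_{t-1}, Y_t, X) + \sum_{k=1}^{N_2} \mu_k g_k(Y_t, X) \right] \right) \]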
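The probability of a lumped observation can be sketched as the ratio of two forward passes: one restricted to the admissible supersenses at each position, one unrestricted. This is a minimal illustration under assumptions: `node_pot` and `edge_pot` are already exponentiated, position-independent transition potentials are used, states are integer indices, and no scaling is applied, so it only suits short sequences.

```python
def forward_sum(node_pot, edge_pot, allowed=None):
    """Sum of unnormalized sequence weights, restricted to allowed[t] per position."""
    n, m = len(node_pot), len(node_pot[0])
    if allowed is None:
        allowed = [range(m)] * n  # unrestricted: every state is admissible
    alpha = [node_pot[0][i] if i in allowed[0] else 0.0 for i in range(m)]
    for t in range(1, n):
        alpha = [
            node_pot[t][j] * sum(alpha[i] * edge_pot[i][j] for i in range(m))
            if j in allowed[t]
            else 0.0
            for j in range(m)
        ]
    return sum(alpha)


def lumped_probability(node_pot, edge_pot, allowed):
    """Probability that the state sequence lies inside the given subsets."""
    return forward_sum(node_pot, edge_pot, allowed) / forward_sum(node_pot, edge_pot)


# Toy chain: 3 positions, 3 states; the numbers are arbitrary.
node_pot = [[1.0, 2.0, 0.5], [0.5, 1.5, 1.0], [2.0, 1.0, 1.0]]
edge_pot = [[2.0, 1.0, 0.5], [1.0, 3.0, 1.0], [0.5, 1.0, 2.0]]
allowed = [{0, 1}, {2}, {0, 1, 2}]
p_lumped = lumped_probability(node_pot, edge_pot, allowed)
```

If every subset contains a single state, the result reduces to the usual probability of one fully annotated sequence; if every subset contains all states, the probability is 1 and the example carries no information.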
Training Data: SemCor Dataset
Word        Lemma       POS    Possible Supersenses
A           A           DT
compromise  compromise  NN     n.communication, n.act
will        will        MD
leave       leave       VB     v.stative, v.motion, v.cogn., v.change, v.social, v.possession
both        both        DT
sides       side        NN     n.group, n.location, n.body, n.artifact, n.cogn., n.food, n.communication, n.object, n.event
without     without     IN
the         the         DT
glow        glow        NN     n.state, n.attribute, n.phenomenon, n.feeling
of          of          IN
triumph     triumph     NN     n.feeling, n.event
,           ,           PUNC
but         but         CC
it          it          PRP
will        will        MD
save        save        VB     v.social, v.possession, v.change,
Berlin      location    NNP    n.location, n.person, n.artifact
.           .           PUNC
Results for Lumped Supersenses: SemCor
Simulate lumped supersenses
Determine possible supersenses for SemCor
Use different fractions of annotated / possible supersenses
Method                 Fraction annotated - possible   Precision   Recall   F1
Ciaramita & Altun 06   3 - 0                           76.6        77.7     77.1
CRF                    3 - 0                           83.4        83.6     83.5
CRF                    2 - 0                           83.1        83.2     83.1
CRF                    2 - 1                           83.7        83.9     83.8
CRF                    0 - 3 (nothing observed)        82.6        82.7     82.7
Work in progress
Supersenses estimated without annotations: the F-value drops by only 0.8 percentage points
Summary
• Sequence models are able to extract supersenses
• New features like topic models help
• We may use non-annotated texts by exploiting restrictions in the ontology
• Chance to improve classifiers considerably
• May enhance higher order IE and information retrieval
Todo
• Apply to lower levels of hierarchy
• Detect new senses / supersenses of words in WordNet