Pastra et al., LREC 2002
How feasible is the reuse of grammars for Named Entity Recognition?
Katerina Pastra, Diana Maynard, Oana Hamza,
Hamish Cunningham and Yorick Wilks
Department of Computer Science, Natural Language Processing
Group, University of Sheffield, U.K.
The paradox
NER results: close to human performance
Reuse of NER resources: minimal
We will focus on:
Traditional rule-based NER systems
NER in text
Reuse of grammars for NER
Manual adaptation of grammars
What is it that hinders grammar reuse?
1) Grammar Formalism
2) Application Domain
3) Natural Language
The use of Flexible System Architectures guarantees reusability of resources
>>> But is this a “sine qua non” solution? Does the lack of such architectures render reusability simply “not feasible”?
Grammar Formalism (1)
>> Current Practice: No standardised formalism
>> Traditional pattern-matching languages:
inappropriate for NER
>> Norm: Use of attribute-value (AV) notations, which allow reference to token attributes from multiple analysis levels.
• Translating formalisms: a time-effective solution?
• Time gained, information lost: is there a trade-off?
Grammar Formalism (2)
The need: NER for SOCIS (not main task – limited time)
The problem: Existing grammar in another formalism
>> NEA – JAPE Similarities: Declarative, context-sensitive, non-det PM…
>> NEA – JAPE Differences:
• BU rule invocation – FST cascades
• Appelt control mechanism – Appelt, First, Brill
• Rules augmented with PROLOG – JAVA
• Wildcards, “don’t care sequ”: not common
• Iterations, (!=): different mechanisms
Grammar Formalism (3)
The experiment: From the NEA notation to JAPE
NEA notation: A => B\C/D
JAPE: (B) (C):label (D) --> :label.EntityType = {attr}
• one’s LHS another’s RHS
• same things handled in different ways
• differences in modules run before NER affect rules
STILL:
Original set in 2 months – SOCIS set in 1 week
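The structural part of the NEA-to-JAPE mapping can be sketched in a few lines of Python. This is only an illustration under assumptions: it reads `A => B\C/D` as "entity type A, matched span C, left context B, right context D", and the helper names `parse_nea` and `to_jape` are hypothetical. It does not translate the patterns B, C and D themselves, which is where most of the manual effort noted above would go.

```python
def parse_nea(rule: str):
    """Split an NEA-style rule 'A => B\\C/D' into (type, left, match, right).

    Assumed reading of the slide notation: A is the entity type, C the span
    to annotate, B and D the optional left and right context.
    """
    entity_type, arrow, body = rule.partition("=>")
    if not arrow:
        raise ValueError("expected '=>' in NEA rule")
    left, has_left, rest = body.partition("\\")
    if not has_left:
        # No left context given: the whole body is 'C' or 'C/D'.
        left, rest = "", body
    match, _, right = rest.partition("/")
    return entity_type.strip(), left.strip(), match.strip(), right.strip()


def to_jape(rule_name: str, rule: str, label: str = "label") -> str:
    """Emit a JAPE-style rule skeleton; patterns are copied verbatim."""
    entity_type, left, match, right = parse_nea(rule)
    lhs = []
    if left:
        lhs.append(f"({left})")
    lhs.append(f"({match}):{label}")
    if right:
        lhs.append(f"({right})")
    # The entity type sits on the NEA left-hand side but on the JAPE
    # right-hand side ("one's LHS another's RHS").
    rhs = f':{label}.{entity_type} = {{rule = "{rule_name}"}}'
    return "\n".join([f"Rule: {rule_name}", *lhs, "-->", rhs])


print(to_jape("Example", r"A => B\C/D"))
```

The output is a skeleton only: a real port would still have to rewrite each pattern against the annotations produced by the modules that run before NER.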
Application Domain (1)
Is there a core set of grammar rules that are always domain independent ?
General purpose NER grammars:
• Developed to serve grammar reuse, but originated
themselves from specific applications
• They separate specific from general information.
• MUSE: automatic resource switches ~ text features
• HaSIE: company reports on health and safety issues
Application Domain (2)
The experiment: From newswire text on Biotechnology to … Crime Scene Police Reports
• The gazetteers were enriched with police and crime related information
• All original domain-specific rules were deleted
• Original results with no modifications to the grammar: close to 90%
• Only 1 change to the core set and addition of rules
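The porting steps above (keep the core rules, drop the domain-specific ones, enrich the gazetteers, add new rules) can be sketched as follows. This is a minimal illustration, not the system used in the experiment: the `Rule` and `Grammar` classes, the `port_grammar` helper, and all rule names and gazetteer entries are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Rule:
    name: str
    entity_type: str
    domain: Optional[str] = None  # None marks a core, domain-independent rule


@dataclass
class Grammar:
    rules: List[Rule]
    gazetteer: Dict[str, Set[str]] = field(default_factory=dict)


def port_grammar(grammar: Grammar,
                 new_entries: Dict[str, Set[str]],
                 new_rules: List[Rule]) -> Grammar:
    """Return a new grammar for another domain; the input is left untouched."""
    # 1. Keep only the core rules (all domain-specific rules are dropped).
    core = [r for r in grammar.rules if r.domain is None]
    # 2. Copy the gazetteer lists and enrich them with the new domain's entries.
    gaz = {name: set(entries) for name, entries in grammar.gazetteer.items()}
    for name, entries in new_entries.items():
        gaz.setdefault(name, set()).update(entries)
    # 3. Add rules written for the new domain.
    return Grammar(rules=core + list(new_rules), gazetteer=gaz)


# Hypothetical data, for illustration only.
original = Grammar(
    rules=[
        Rule("PersonWithTitle", "Person"),
        Rule("DateNumeric", "Date"),
        Rule("BiotechCompany", "Organisation", domain="biotech"),
    ],
    gazetteer={"person_titles": {"Dr", "Mr"}},
)
ported = port_grammar(
    original,
    new_entries={"crime_terms": {"exhibit", "scene of crime"}},
    new_rules=[Rule("OffenceLocation", "Location", domain="crime")],
)
print([r.name for r in ported.rules])
```

Tagging each rule with its domain is one way to make the "separate specific from general information" idea explicit, so that the core set can be reused without editing it.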
Natural Language (1)
NER Grammar in language (A) + linguistic knowledge of NE in (B) = NER grammar for (B)?
Parameters to consider:
• The relation of A and B (closely related or not) determines the extent of reuse
• Nature of NEs (formation, syntagmatic relations)
  unpredictable behaviour and structure
  finite set
Natural Language (2)
Romanian NE (compared to English):
• Rich inflection
• Flexible word order
• Different word order (e.g. modifier follows noun)
The experiment:
Run NER grammar for English on Romanian text
Natural Language (3)
1st experiment: Romanian Gaz + English grammar
>> Overall Results: P = 0.82, R = 0.67
• Low recall even for entity types recognised with high precision (e.g. Organisation: P = 0.75, R = 0.39)
2nd experiment: Romanian Gaz + Adapted grammar
>> Overall Results: P = 0.95, R = 0.94
Corpus: 1MB of Romanian newspaper texts
Manual marking of NEs – Romanian NER (3 weeks)
Natural Language (3)
1st experiment (Romanian Gaz + English grammar)
Entity Type     Precision   Recall
Address         0.81        0.81
Date            0.67        0.77
Location        0.88        0.96
Money           0.82        0.47
Organisation    0.75        0.39
Percent         1.00        0.82
Person          0.68        0.78
Identifier      0.94        0.38
Overall         0.82        0.67

2nd experiment (Romanian Gaz + Adapted grammar)
Entity Type     Precision   Recall
Address         0.96        0.93
Date            0.95        0.94
Location        0.92        0.97
Money           0.98        0.92
Organisation    0.95        0.89
Percent         1.00        0.99
Person          0.88        0.92
Identifier      0.99        0.96
Overall         0.95        0.94
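To compare the two runs with a single figure, precision and recall can be combined into an F-measure. The slides report only P and R, so the F values printed below are derived here for illustration and are not results from the original work; the function name `f_measure` is hypothetical, and the formula is the standard weighted harmonic mean.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Overall P and R as reported for the two experiments.
overall = {
    "Romanian Gaz + English grammar": (0.82, 0.67),
    "Romanian Gaz + Adapted grammar": (0.95, 0.94),
}

for name, (p, r) in overall.items():
    print(f"{name}: P = {p:.2f}, R = {r:.2f}, F1 = {f_measure(p, r):.2f}")
```

Because the harmonic mean is pulled towards the lower of the two values, the low recall of the unadapted grammar weighs on its combined score more than a simple average would suggest.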
Conclusions
Reuse of existing NER grammars is time effective and should be attempted even when the formalisms, applications and languages involved are different
Further issues to be addressed:
• Reuse of NER grammars for spoken NEs
• Reuse in statistical/ML NER approaches
• Automating grammar reuse