Towards a semantic extraction of named entities Diana Maynard, Kalina Bontcheva, Hamish Cunningham University of Sheffield, UK

Towards a semantic extraction of named entities

Diana Maynard, Kalina Bontcheva, Hamish Cunningham

University of Sheffield, UK

Introduction

• Challenges posed by progression from traditional IE to a more semantic representation of NEs

• What techniques are best for the deeper level of analysis necessary?

• Can traditional rule-based methods cope with such a transition, or does the future lie solely with machine learning?

The ACE program

“A program to develop technology to extract and characterise meaning from human language”

Aims:

• produce structured information about entities, events and the relations that hold between them

• promote design of more generic systems rather than those tuned to a very specific domain and text type (as with MUC)

The ACE tasks

• Identification of entities and classification into semantic types (Person, Organisation, Location, GPE, Facility)

• Identification and coreference of all mentions of each entity in the text (name, pronominal, nominal)

• Identification of relations holding between such entities

<entity ID="ft-airlines-27-jul-2001-2" GENERIC="FALSE" entity_type="ORGANIZATION">
  <entity_mention ID="M003" TYPE="NAME" string="National Air Traffic Services"> </entity_mention>
  <entity_mention ID="M004" TYPE="NAME" string="NATS"> </entity_mention>
  <entity_mention ID="M005" TYPE="PRO" string="its"> </entity_mention>
  <entity_mention ID="M006" TYPE="NAME" string="Nats"> </entity_mention>
</entity>
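The entity record above groups every mention of one organisation under a single entity ID. A minimal sketch of reading such a record, assuming it is available as a standalone XML string (the function name `mentions_by_type` is hypothetical and not part of any ACE tooling):

```python
import xml.etree.ElementTree as ET

# The entity record from the slide, held as a string for illustration.
ACE_ENTITY = """
<entity ID="ft-airlines-27-jul-2001-2" GENERIC="FALSE" entity_type="ORGANIZATION">
  <entity_mention ID="M003" TYPE="NAME" string="National Air Traffic Services"> </entity_mention>
  <entity_mention ID="M004" TYPE="NAME" string="NATS"> </entity_mention>
  <entity_mention ID="M005" TYPE="PRO" string="its"> </entity_mention>
  <entity_mention ID="M006" TYPE="NAME" string="Nats"> </entity_mention>
</entity>
"""


def mentions_by_type(xml_text):
    """Return the entity type and its mention strings grouped by mention TYPE."""
    root = ET.fromstring(xml_text.strip())
    grouped = {}
    for mention in root.iter("entity_mention"):
        grouped.setdefault(mention.get("TYPE"), []).append(mention.get("string"))
    return root.get("entity_type"), grouped


entity_type, grouped = mentions_by_type(ACE_ENTITY)
print(entity_type)
print(grouped)
```

The name mentions and the pronominal mention "its" all end up under the same entity, which is the coreference result the ACE tasks ask for.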

The MACE System

• Rule-based NE system developed within GATE, adapted from ANNIE

• PRs (processing resources): tokeniser, sentence splitter, POS tagger, gazetteer, semantic tagger, orthomatcher, pronominal and nominal coreferencer

• Also: genre ID, switching controller to select different PRs automatically
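The idea of a switching controller can be sketched as a lookup from a detected genre to a variant of the pipeline. This is only an illustration in Python, not GATE code: the genre labels, the variant name, and the choice of which PR gets swapped are all assumptions; only the PR names come from the slide.

```python
# PR names as listed on the slide, in pipeline order.
BASE_PIPELINE = [
    "tokeniser",
    "sentence splitter",
    "POS tagger",
    "gazetteer",
    "semantic tagger",
    "orthomatcher",
    "pronominal coreferencer",
    "nominal coreferencer",
]

# Hypothetical genre-specific replacements (assumed for illustration only).
GENRE_OVERRIDES = {
    "broadcast": {"semantic tagger": "semantic tagger (broadcast)"},
}


def select_pipeline(genre):
    """Return the PR sequence for a genre, falling back to the base pipeline."""
    overrides = GENRE_OVERRIDES.get(genre, {})
    return [overrides.get(pr, pr) for pr in BASE_PIPELINE]


print(select_pipeline("broadcast"))
print(select_pipeline("newswire"))
```

Keeping the genre-specific parts in a separate table mirrors the point that only some resources need to change per text type.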

Differences between ANNIE and MACE

• Locations are split into Location and GPE

• GPEs have roles (GPE, Per, Org, Loc)

• New type Facility (subsumes some Orgs)

• Metonymy means context is necessary for disambiguation (e.g. England cricket team vs England country)

• No Date, Time, Money, Percent, Address, Identifier
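The metonymy point can be illustrated with a toy context rule. This is a sketch in Python rather than a JAPE grammar, and the cue words and window size are assumptions chosen to match the "England cricket team" example; the real MACE rules are not shown in the slides.

```python
# Assumed cue words suggesting a GPE name is being used for an organisation.
ORG_CUES = {"team", "squad", "side"}


def gpe_role(tokens, i, window=2):
    """Guess the role of the GPE name at tokens[i] from the words that follow it."""
    following = tokens[i + 1 : i + 1 + window]
    if any(tok.lower() in ORG_CUES for tok in following):
        return "Org"
    return "GPE"


print(gpe_role("The England cricket team won".split(), 1))
print(gpe_role("She lives in England".split(), 3))
```

The same gazetteer entry ("England") yields different roles depending on its context, which is why the grammar rules, not the gazetteers, had to change.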

What does this mean in practical terms?

• Separation of specific from general information makes adaptation easier

• Reclassification of gazetteers unnecessary

• Changes mainly to semantic grammars to

- use different gazetteer lookups

- use more contextual information

- group rules together differently

Semantic Grammars

• ANNIE uses 21 phases, 187 rules, 9 entity types (av. 20.8 rules per entity type)

• MACE uses 15 phases, 180 rules, 5 entity types (av. 36 rules per entity type)

• The important factor is the increased complexity of new rules, rather than the number

• Rules may be hand-crafted, but an experienced JAPE user can write several rules per minute

• 6 weeks for adaptation
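The averages quoted for the two grammars follow directly from the rule and entity-type counts. A short check, using only the figures on the slide (the helper name `rules_per_type` is hypothetical):

```python
# (phases, rules, entity types) as given on the slide.
SYSTEMS = {
    "ANNIE": (21, 187, 9),
    "MACE": (15, 180, 5),
}


def rules_per_type(rules, entity_types):
    """Average number of rules per entity type, rounded to one decimal place."""
    return round(rules / entity_types, 1)


for name, (_phases, rules, entity_types) in SYSTEMS.items():
    print(name, rules_per_type(rules, entity_types))
```

MACE has slightly fewer rules in total but considerably more per entity type, consistent with the remark about increased rule complexity.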

Evaluation (1)

Text              Precision  Recall  Fmeasure
ACE               82.4       82      82.2
MUC ENAMEX only   89         90      89.5
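The F-measure figures are consistent with the balanced harmonic mean of precision and recall. The slides do not state the weighting, so equal weighting is an assumption here; the function name is illustrative.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall values from the evaluation table.
print(round(f_measure(82.4, 82), 1))  # ACE
print(round(f_measure(89, 90), 1))    # MUC ENAMEX only
```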

Evaluation (2)

• NEWS – 92 articles (business news)

• ACE – 86 broadcast news texts from the September 2002 evaluation

• Difference on ACE task

• MACE on MUC-style annotations

- GPEs are left as GPE (so count as errors)

- GPEs are mapped to Locations
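The effect of the two scoring options can be shown with a toy example. The annotations below are invented for illustration and are not taken from the evaluation corpora; the function names are hypothetical.

```python
def precision_recall(gold, predicted):
    """Exact-match precision and recall over (text, label) pairs."""
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


def map_gpe_to_location(annotations):
    """Relabel every GPE annotation as Location, leaving other labels untouched."""
    return {
        (text, "Location" if label == "GPE" else label)
        for text, label in annotations
    }


# Invented MUC-style gold standard (no GPE type) and MACE-style output.
gold = {("London", "Location"), ("NATS", "Organization"), ("Blair", "Person")}
system = {("London", "GPE"), ("NATS", "Organization"), ("Blair", "Person")}

print(precision_recall(gold, system))                       # GPEs left as GPE
print(precision_recall(gold, map_gpe_to_location(system)))  # GPEs mapped to Locations
```

Left unmapped, every GPE counts as both a spurious and a missed annotation; mapping removes that penalty but cannot recover cases where the gold standard distinguishes the two types.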

Comparison of ANNIE vs MACE

[Figure: bar chart of Precision, Recall and Fmeasure (scale 0 to 100) for four system/corpus pairs: ANNIE-Ace, ANNIE-News, MACE-Ace, MACE-News]

72% Precision, 84% Recall if GPEs mapped to Locations

Conclusions

• MACE is a rule-based NE system, in contrast with most systems which use ML.

• Advantages: it doesn’t require much training data, and is fast to adapt because of its robust design

• If large amounts of training data are available, HMM-based systems tend to perform slightly better

• Rule-based systems tend to be good at recall but sometimes low on precision unless supported additionally by ML methods