GATE: an AKT success story [GATE: open source language technology component architecture and many tools, with a number of AKT roles] http://gate.ac.uk/ http://nlp.shef.ac.uk/ Hamish Cunningham Kalina Bontcheva Yorick Wilks Southampton, January 2004 1. New GATE-related projects 2. Current state of the system 3. Future plans

Page 1: GATE: an AKT success story [GATE: open source language technology component architecture and many tools, with a number of AKT roles] //gate.ac.uk

GATE: an AKT success story
[GATE: open source language technology component architecture and many tools, with a number of AKT roles]

http://gate.ac.uk/ http://nlp.shef.ac.uk/

Hamish Cunningham
Kalina Bontcheva
Yorick Wilks

Southampton, January 2004

1. New GATE-related projects
2. Current state of the system
3. Future plans

Page 2:

New Projects

• SEKT: €9m IP with BT, AIFB, JSI, Empolis, SAI, OntoPrise, ISOCO, UB, Kea-Pro

• PrestoSpace – €9m IP with BBC, RAI, ORF, INA, ...: preservation of audio-visual media

• KnowledgeWeb – NoE successor to OntoWeb

• ETCSL – GATE for humanities scholars

• hTechSight – petrochem tech oversight

• SWAN – large-scale semantic annotation

Page 3:

[Diagram labels: Human Language; Formal Knowledge (ontologies and instance bases); (MI)IE; CLIE; (M)NLG; Controlled Language; OBIE; Semantic Web; Semantic Grid; Semantic Web Services]

KEY
MNLG: Multilingual Natural Language Generation
OBIE: Ontology-Based Information Extraction
(MI)IE: Mixed-Initiative IE
CLIE: Controlled Language IE

SEKT: large-scale DM + robust HLT for NGKM

Page 4:

SEKT: Evaluating Semantic Tagging

• Need for new metrics when evaluating hierarchy/ontology-based NE tagging

• Need to take into account distance in the hierarchy

• Tagging a company as a charity is less wrong than tagging it as a person

• Several SEKT-related initiatives (w/s at ECAI; Pascal network)
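
One way to read the "distance in the hierarchy" point is as partial credit that shrinks as the predicted class moves further from the gold class. The sketch below is only an illustration under stated assumptions: the toy hierarchy, the function names and the `1 / (1 + distance)` scoring rule are all hypothetical and are not the metric proposed in SEKT.

```python
# Hypothetical toy hierarchy (child -> parent); not taken from SEKT.
PARENT = {
    "Entity": None,
    "Organisation": "Entity",
    "Person": "Entity",
    "Company": "Organisation",
    "Charity": "Organisation",
}


def ancestors(cls):
    """Return the path from cls up to the root, starting with cls itself."""
    path = []
    while cls is not None:
        path.append(cls)
        cls = PARENT[cls]
    return path


def tree_distance(a, b):
    """Number of edges on the path between classes a and b."""
    path_b = ancestors(b)
    steps_from_b = {cls: i for i, cls in enumerate(path_b)}
    # Walk up from a until we meet a class that is also an ancestor of b.
    for steps_from_a, cls in enumerate(ancestors(a)):
        if cls in steps_from_b:
            return steps_from_a + steps_from_b[cls]
    raise ValueError("classes share no common ancestor")


def partial_credit(gold, predicted):
    """Assumed scoring rule: 1.0 for an exact match, less for distant classes."""
    return 1.0 / (1.0 + tree_distance(gold, predicted))
```

With this toy hierarchy, `partial_credit("Company", "Charity")` is higher than `partial_credit("Company", "Person")`, which matches the slide's example that confusing a company with a charity is the smaller error.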

Page 5:

PrestoSpace

• Cultural Heritage / Digital Libraries IP
• BBC, RAI, ORF, INA, B&G, USFD, and 23 others (!)
• 20th Century Rot: rapid disappearance of audio-visual media
• Preservation and digitisation is high cost
• Therefore we need rich metadata and semantic access
• Little training data, open domain: FSTs for users
• Follows MUMIS and other projects
• Evaluation: TRECVID, OBIE

Page 6:

GATE Status (version 2½)

• Stable core since end 2002

• Increasing numbers of users (next slide)

• Increasing numbers of languages (most recently: Chinese, Arabic, Russian, German system from DotKom)

• Increasing numbers of 3rd party components (e.g. Medline and UMLS work, OBIE/KIM, QA, summarisation, ...)

• Embedded in KM applications

Page 7:

A bit of a nuisance (GATE users)

GATE team projects.

Past:
• MUMIS: semantic index of sports video
• MUSE: cross-genre entity finder
• HSL: health-and-safety IE
• Old Bailey: collaboration with HRI on 17th century court reports
• Multiflora: plant taxonomy text analysis for biodiversity research e-science
• EMILLE: S. Asian languages corpus
• ACE / TIDES: Arabic, Chinese NE

Present:
• Advanced Knowledge Technologies
• SEKT: next-generation KM
• PrestoSpace: audiovisual preservation
• KnowledgeWeb: semantic web network
• h-TechSight: technology oversight
• ETCSL: Sumerian language corpus
• SWAN: Semantic Web Annotator
• MiAKT: medical informatics KM

Thousands of users at hundreds of sites (based on survey of 4,700 downloaders). A representative sample:

• the American National Corpus project
• the Perseus Digital Library project, Tufts University, US
• Greenstone digital library, NZ
• Longman Pearson publishing, UK
• Merck KGaA, Germany
• Canon Europe, UK
• Knight Ridder, US
• BBN (leading HLT research lab), US
• SMEs inc. Sirma AI Ltd., Bulgaria
• Imperial College, London, the University of Manchester, UMIST, Vassar College, the University of Southern California and a large number of other UK, US and EU Universities
• UK and EU projects inc. MyGrid, CLEF, DotKom, AMITIES, Cub Reporter, EMILLE, Poesia...

Page 8:

Some new stuff

• Johns Hopkins w/s on Semantic Annotation: BNC-based corpus, ME expts
• WEKA 2 release (JSI library integration soon)
• papers: RANLP, ISWC, Journal of Digital Libraries, Journal of Data and Knowledge Eng.
• JWS editorial board; co-editor JNLE special
• RANLP IE tutorial, tutorial on HLT/SW at ESWS
• HLT/SW evaluation workshop at ECAI
• OBIE in Multiflora, hTechSight
• SW NLG in MiAKT (below)

Page 9:

MIAKT – NLG for SW

RDF input from image annotation GUI...

...generated text

MIAKT has important productivity and accuracy implications

Page 10:

hTechSight tech oversight

• Ontology-Based IE (OBIE) for semantic tagging of job adverts, news and reports in chemical engineering domain

• Aim is to track technological change over time

• Centred around domain-specific ontology

• Terminological gazetteer lists are linked to classes in the ontology

• Rules classify the mentions in the text wrt. the domain ontology

• Annotations output to DB or RDF
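
The gazetteer-to-ontology link and the RDF output step can be illustrated with a very small sketch. Everything named here is an assumption for illustration: the gazetteer entries, the class names, the `http://example.org/` vocabulary and both functions are hypothetical, and the rule-based classification step is reduced to a direct lookup. It is not the hTechSight or GATE implementation.

```python
import re

# Hypothetical gazetteer: term -> ontology class (names invented for illustration).
GAZETTEER = {
    "heat exchanger": "Equipment",
    "distillation": "Process",
    "process engineer": "JobRole",
}


def annotate(text, gazetteer):
    """Return (start, end, mention, ontology_class) for each gazetteer hit."""
    annotations = []
    for term, cls in gazetteer.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            annotations.append((match.start(), match.end(), match.group(0), cls))
    return sorted(annotations)  # ordered by position in the text


def to_ntriples(annotations, base="http://example.org/"):
    """Serialise annotations as N-Triples-style lines (hypothetical vocabulary).

    Mentions are assumed to contain no quotes or backslashes, so no
    literal escaping is done here.
    """
    lines = []
    for i, (_start, _end, mention, cls) in enumerate(annotations):
        subject = f"<{base}mention/{i}>"
        lines.append(f'{subject} <{base}text> "{mention}" .')
        lines.append(f"{subject} <{base}instanceOf> <{base}{cls}> .")
    return lines
```

Calling `to_ntriples(annotate("Process engineer needed for distillation unit.", GAZETTEER))` yields two lines per mention: one carrying the matched text and one linking the mention to its ontology class. Tracking change over time would then be a matter of counting such class links per document date.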

Page 11:

OBIE in MultiFlora 2 Combining Information Extraction and Knowledge Representation for Biodiversity Informatics

BBSRC project led by Mary McGee Wood, U. Mcr.

[Diagram labels: Varying plant taxa; Merged RDF]

Page 12:

GATE 4: the Final Conflict

• (GATE 3 release happening soonish)
• Continuity guaranteed for AKT phase 2 (€2 million GATE-related work 2004-2007)
• Some future elements:
  – more and better OBIE, inc. cross-doc co-reference
  – pluggable OWL repository support (now only Sesame; soon 3Store, KAON)
  – large- and huge-scale processing
  – standardisation of the component integration model (ECLIPSE)
  – service-based integration (“SDK” SW API)

• This talk: http://gate.ac.uk/sale/talks/akt-jan04.ppt

• What else? You tell us...