University of Sheffield, NLP

GATE: Bridging the Gap between Terminology and Linguistics

Diana Maynard

University of Sheffield, UK

Why do terminologists need GATE?

• Terminologists lack suitable tools to process their data

• There are lots of in-house tools for doing individual things

• But there is a lack of common tools that can be used collaboratively and across different systems and domains

• Tools must be flexible, robust and able to adapt to different processing tasks and languages

• GATE and its components are a key tool in today's world of information and data overload

• They enable users to perform tasks such as document management, business intelligence, information retrieval, question answering, and knowledge indexing, modelling and conceptualisation.

GATE can help terminologists:

• Save time and money on management of text and data from multiple sources

• Find hidden links scattered across huge volumes of diverse information

• Integrate structured data from a variety of sources

• Interlink text and data

• Collect information and extract new facts

A vision for text mining

• It is difficult to access unstructured information efficiently

• Information extraction (IE) automates the extraction of facts from text at reasonable accuracy and cost, increasing the value and utility of unstructured content

• Interlinking of text and data enables more efficient search, navigation and querying

• Text analysis is a matter of engineering: GATE offers practical solutions able to match specific requirements

Threat tracking application

Text mining and semantic annotation

• Extract structured data from text by

– Linking references to entities
– Linking entities to their semantic descriptions

• Automatic semantic annotation based on IE technology

• Attaches metadata to documents, which can be used for searching and hyperlinking

• Adds value to the content of libraries, enabling user interaction with that content

• Enhanced capability for cross-referencing and dynamic document classification

Semantic Annotation

Semantic Annotation of Entities

• Recognition of the type of the entities in the text from a rich taxonomy of classes

• Reference to their semantic description.

• Traditional named entity (NE) recognition results in: <Person>Lama Ole Nydahl</Person>

• Semantic annotation of NEs results in: <ReligiousPerson ID="http://..kim/Person111111">Lama Ole Nydahl</ReligiousPerson>
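The difference between the two outputs can be sketched as a small Java record: a traditional NE annotation carries only a coarse type, while a semantic annotation carries an ontology class plus a reference to a specific instance. The record and method names are illustrative, and the URI in the demo is a hypothetical example.org placeholder rather than the knowledge-base identifier on the slide.

```java
import java.util.Optional;

/** A text span labelled with a type; a semantic annotation also links to an instance URI. */
record EntityAnnotation(String type, String text, Optional<String> instanceUri) {

    /** Traditional NE annotation: a coarse type such as Person, with no instance link. */
    static EntityAnnotation namedEntity(String type, String text) {
        return new EntityAnnotation(type, text, Optional.empty());
    }

    /** Semantic annotation: a fine-grained ontology class plus a reference to an instance. */
    static EntityAnnotation semantic(String ontologyClass, String text, String uri) {
        return new EntityAnnotation(ontologyClass, text, Optional.of(uri));
    }

    /** Render as inline XML in the style shown above. */
    String toXml() {
        String id = instanceUri.map(u -> " ID=\"" + u + "\"").orElse("");
        return "<" + type + id + ">" + text + "</" + type + ">";
    }
}

class SemanticAnnotationDemo {
    public static void main(String[] args) {
        EntityAnnotation ne = EntityAnnotation.namedEntity("Person", "Lama Ole Nydahl");
        // Hypothetical placeholder URI, not a real knowledge-base identifier.
        EntityAnnotation sa = EntityAnnotation.semantic(
                "ReligiousPerson", "Lama Ole Nydahl", "http://example.org/kim/Person111111");
        System.out.println(ne.toXml());
        System.out.println(sa.toXml());
    }
}
```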

GATE: the Swiss Army Knife of NLP

• Has an attachment for almost every eventuality

• Some are hard to prise open

• Some are useful, but you might have to put up with a bit of clunkiness in practice

• Some will only be useful once in a lifetime, but you're glad to have them just in case.

• There are many imitations, but nothing like the real thing.

History of GATE

• early 1990s: you want me to write that all over again?

• 1995-7: first GATE (and "large-scale IE") project

• 1996: GATE 1: Tcl/Tk, Perl, C++, ...

• 2002: release of completely rewritten version 2, 100% Java

• 2009: mature ecosystem with established community
– Tens of thousands of research users
– 25,000 downloads per year
– Commercial users getting serious

GATE is very eco-friendly!

GATE commercial users

Typical commercial uses:

• dynamic search and indexing of repositories

• finding relations between elements in distributed repositories

• aggregating information from different text sources

• populating repositories

• fact finding from distributed knowledge sources

Typical users:

• Pharmaceuticals, news, intelligence (business, competitor, government, etc.), manufacturing, telecommunications

So what exactly is GATE?

An architecture: A macro-level organisational picture for human language technology (HLT) software systems.

A framework: For programmers, GATE is an object-oriented class library that implements the architecture.

A development environment: For language engineers, computational linguists et al., GATE provides a graphical development environment.

A community of users and contributors

Architectural principles

• Non-prescriptive, theory neutral (strength and weakness)

• Re-use, interoperation, not reimplementation (e.g. diverse XML support, integration of Protégé, Jena, Yale...)

• (Almost) everything is a component, and component sets are user-extendable

• (Almost) all operations are available both from API and GUI

In short... GATE includes:

• components for language processing, e.g. parsers, machine learning tools, stemmers, IR tools, IE components for various languages...

• tools for visualising and manipulating text, annotations, ontologies, parse trees, etc.

• various information extraction tools

• evaluation and benchmarking tools

Algorithms + Data + GUI = Applications

GATE components are one of three types:
• Language Resources (LRs), e.g. lexicons, corpora, ontologies
• Processing Resources (PRs), e.g. parsers, generators, taggers
• Visual Resources (VRs), i.e. visualisation and editing components

Algorithms are separated from the data, which means:
– the two can be developed independently by users with different expertise.
– alternative resources of one type can be used without affecting the other, e.g. a different visual resource can be used with the same language resource
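The separation of algorithms from data can be illustrated with a toy Java sketch: a Document plays the role of a Language Resource, a tokeniser the role of a Processing Resource, and two interchangeable views the role of Visual Resources. These classes are simplified stand-ins invented for illustration and do not reproduce GATE's own resource interfaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Language Resource stand-in: pure data, a text plus the annotations added to it. */
class Document {
    final String text;
    final List<String> annotations = new ArrayList<>();

    Document(String text) { this.text = text; }
}

/** Processing Resource stand-in: an algorithm that reads and enriches a Document. */
interface ProcessingResource {
    void execute(Document doc);
}

/** A trivial PR that adds one Token annotation per whitespace-separated word. */
class WhitespaceTokeniser implements ProcessingResource {
    @Override
    public void execute(Document doc) {
        for (String word : doc.text.trim().split("\\s+")) {
            if (!word.isEmpty()) doc.annotations.add("Token:" + word);
        }
    }
}

/** Visual Resource stand-ins: two interchangeable views over the same Document. */
class Views {
    static final Function<Document, String> LIST_VIEW = d -> String.join("\n", d.annotations);
    static final Function<Document, String> COUNT_VIEW = d -> d.annotations.size() + " annotations";
}

class ResourceDemo {
    public static void main(String[] args) {
        Document doc = new Document("GATE is eco-friendly");   // the data (LR)
        new WhitespaceTokeniser().execute(doc);               // the algorithm (PR)
        System.out.println(Views.COUNT_VIEW.apply(doc));      // one view (VR)...
        System.out.println(Views.LIST_VIEW.apply(doc));       // ...swapped for another, same LR
    }
}
```

Because the tokeniser never refers to either view, either side can be replaced without touching the other.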

But isn't GATE just about IE?

• Many people think of GATE as an IE tool

• IE is its primary function, but it also does a lot more

• Pretty much any kind of linguistic processing can be done in GATE

• The only field we really don't cover is Machine Translation, but you could easily add components for that if you wanted

• More about the other functionality later, but now back to IE...

Two Approaches to IE

Knowledge Engineering
• rule based
• developed by experienced language engineers
• make use of human intuition
• obtain marginally better performance
• development could be very time consuming
• some changes may be hard to accommodate

Learning Systems
• use statistics or other machine learning
• developers do not need LE expertise
• require large amounts of annotated training data
• some changes may require re-annotation of the entire training corpus

Named Entity Recognition

• Named Entity recognition is the cornerstone of IE

• Identification of proper names in texts, and classification into a set of predefined categories of interest.

• Three universally accepted categories: person, location and organisation

• Other common tasks: recognition of date/time expressions, measures (percent, money, weight etc), email addresses etc.

• Other domain-specific entities: names of drugs, medical conditions, names of ships, bibliographic references etc.

ANNIE

• ANNIE is GATE's rule-based IE system

• It uses the language engineering approach (though we also have tools in GATE for ML)

• Distributed as part of GATE

• Uses a finite-state pattern-action rule language, JAPE

• More on JAPE later...

• ANNIE contains a reusable and easily extendable set of components:
– generic preprocessing components for tokenisation, sentence splitting etc.
– components for performing NE on general open domain text

ANNIE Modules

Unicode Tokeniser

• Bases tokenisation on Unicode character classes

• Language-independent tokenisation

• Declarative token specification language, e.g.:

  "UPPERCASE_LETTER" "LOWERCASE_LETTER"* > Token;orth=upperInitial;kind=word;

• Identifies words, numbers, spaces, different classes of punctuation, orthography

• Recognition deliberately basic so that
– more powerful tools (JAPE) can be used for finer distinctions
– greater reuse possibilities
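The rule above can be illustrated with a small Java sketch that classifies characters by their Unicode general category (Character.getType) and assigns kind and orth features. This is not GATE's tokeniser: only upperInitial comes from the rule shown; the other orth values, the number/punctuation kinds and the skipping of whitespace are simplifying assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** A token annotation with kind and orth features (orth is null for non-words). */
record Token(String string, String kind, String orth) {}

class UnicodeTokeniser {

    private static boolean isType(int codePoint, int unicodeCategory) {
        return Character.getType(codePoint) == unicodeCategory;
    }

    /** UPPERCASE_LETTER LOWERCASE_LETTER* gives upperInitial; the other values are assumptions. */
    static String orth(String word) {
        int first = word.codePointAt(0);
        String rest = word.substring(Character.charCount(first));
        if (isType(first, Character.UPPERCASE_LETTER)
                && rest.codePoints().allMatch(cp -> isType(cp, Character.LOWERCASE_LETTER))) {
            return "upperInitial";
        }
        if (word.codePoints().allMatch(cp -> isType(cp, Character.UPPERCASE_LETTER))) return "allCaps";
        if (word.codePoints().allMatch(cp -> isType(cp, Character.LOWERCASE_LETTER))) return "lowercase";
        return "mixedCaps";
    }

    /** Split text into word, number and punctuation tokens by Unicode character class. */
    static List<Token> tokenise(String text) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int start = i;
            int cp = text.codePointAt(i);
            if (Character.isLetter(cp)) {
                i = skipWhile(text, i, Character::isLetter);
                String word = text.substring(start, i);
                tokens.add(new Token(word, "word", orth(word)));
            } else if (Character.isDigit(cp)) {
                i = skipWhile(text, i, Character::isDigit);
                tokens.add(new Token(text.substring(start, i), "number", null));
            } else if (Character.isWhitespace(cp)) {
                i = skipWhile(text, i, Character::isWhitespace);   // spaces are skipped for brevity
            } else {
                i += Character.charCount(cp);                       // one punctuation or symbol character
                tokens.add(new Token(text.substring(start, i), "punctuation", null));
            }
        }
        return tokens;
    }

    /** Advance past consecutive code points that satisfy the predicate. */
    private static int skipWhile(String s, int i, IntPredicate p) {
        while (i < s.length() && p.test(s.codePointAt(i))) {
            i += Character.charCount(s.codePointAt(i));
        }
        return i;
    }
}

class TokeniserDemo {
    public static void main(String[] args) {
        for (Token t : UnicodeTokeniser.tokenise("\u00C9lodie met GATE in 2009.")) {
            System.out.println(t);
        }
    }
}
```

Because the classification relies on Unicode categories rather than ASCII ranges, an accented capital such as É is treated exactly like A.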

Gazetteer

• Set of lists compiled into Finite State Machines

• 60k entries in 80 types

• List entries are matched in the text as Lookup annotations

• Each list has some pre-defined features, which enable different kinds of matches to be identified

• Additional arbitrary features and values can be added to individual list entries

• Entries can be matched according to root forms, or more flexibly based on e.g. edit distance
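As a rough illustration of list matching, the Java sketch below stores multi-word entries in a token-level trie (a simple finite-state structure) and emits the longest entry starting at each position as a Lookup carrying the list's majorType and minorType. The example entries and types are hypothetical, matching is case-insensitive for simplicity, and GATE's actual gazetteer implementation differs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** A Lookup annotation over token positions [start, end) with the list's types. */
record Lookup(int start, int end, String majorType, String minorType) {}

class Gazetteer {
    /** One trie node per token; a node that ends an entry records its list's types. */
    private static final class Node {
        final Map<String, Node> next = new HashMap<>();
        String[] types;   // {majorType, minorType}, or null if no entry ends here
    }

    private final Node root = new Node();

    /** Add a (possibly multi-word) list entry. */
    void add(String entry, String majorType, String minorType) {
        Node node = root;
        for (String tok : entry.toLowerCase().split("\\s+")) {
            node = node.next.computeIfAbsent(tok, k -> new Node());
        }
        node.types = new String[] {majorType, minorType};
    }

    /** Scan left to right, emitting the longest entry that starts at each position. */
    List<Lookup> annotate(List<String> tokens) {
        List<Lookup> result = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            Node node = root;
            int bestEnd = -1;
            String[] bestTypes = null;
            for (int j = i; j < tokens.size(); j++) {
                node = node.next.get(tokens.get(j).toLowerCase());
                if (node == null) break;                          // no entry continues this way
                if (node.types != null) { bestEnd = j + 1; bestTypes = node.types; }
            }
            if (bestTypes != null) {
                result.add(new Lookup(i, bestEnd, bestTypes[0], bestTypes[1]));
                i = bestEnd;                                      // resume after the match
            } else {
                i++;
            }
        }
        return result;
    }
}

class GazetteerDemo {
    public static void main(String[] args) {
        Gazetteer g = new Gazetteer();
        g.add("Sheffield", "location", "city");   // hypothetical list entries
        g.add("New York", "location", "city");
        System.out.println(g.annotate(List.of("From", "New", "York", "to", "Sheffield")));
    }
}
```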

Limitations of gazetteers

• Gazetteer lists are designed for annotating simple, regular features

• Some flexibility is provided, but this is not enough for most tasks

• Recognising e-mail addresses using just a gazetteer would be impossible

• But combined with other linguistic pre-processing results, we have a whole lot of annotations and features

• POS tags, capitalisation, punctuation, lookup features, etc can all be combined to form patterns suggesting more complex information

• Luckily, we have JAPE to take care of this.

What is JAPE?

• a Jolly and Pleasant Experience

• Specially developed pattern matching language for GATE

• Each JAPE rule consists of
– LHS which contains patterns to match
– RHS which details the annotations (and optionally features) to be created

• JAPE rules combine to create a phase

• Rule priority based on pattern length, rule status and rule ordering

• Phases combine to create a grammar
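Rule priority can be made concrete for the appelt control style: when several rules match from the same point, the longest match wins, ties go to the rule with the higher priority, and any remaining tie goes to the rule declared first. The sketch below encodes only that ordering as a Java Comparator, reading "rule status" as the rule's declared priority; Match and AppeltResolver are hypothetical names, and other JAPE control styles resolve matches differently.

```java
import java.util.Comparator;
import java.util.List;

/** A candidate match: the rule, its declaration order, annotations covered, and priority. */
record Match(String rule, int order, int length, int priority) {}

class AppeltResolver {
    /** Preferred first: longest match, then highest priority, then earliest-declared rule. */
    static final Comparator<Match> PREFERENCE =
            Comparator.comparingInt(Match::length).reversed()
                    .thenComparing(Comparator.comparingInt(Match::priority).reversed())
                    .thenComparingInt(Match::order);

    /** Pick the single rule that fires among candidates matching from the same position. */
    static Match choose(List<Match> candidates) {
        return candidates.stream().min(PREFERENCE).orElseThrow();
    }
}

class AppeltDemo {
    public static void main(String[] args) {
        List<Match> candidates = List.of(
                new Match("ShortRule", 0, 2, 10),
                new Match("LongRule", 1, 3, 0),
                new Match("LongHighPriority", 2, 3, 5));
        System.out.println(AppeltResolver.choose(candidates).rule());
    }
}
```

Note that a high priority cannot rescue a shorter match: length is compared first.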

Named Entity Grammars

• Hand-coded rules written in JAPE are applied to annotations to identify NEs

• Phases run sequentially and constitute a cascade of finite-state transducers (FSTs) over annotations

• Annotations come from format analysis, tokeniser, splitter, POS tagger, morphological analysis, gazetteer etc.

• Because phases are sequential, annotations can be built up over a series of phases, as new information is gleaned

• Standard named entities: persons, locations, organisations, dates, addresses, money

• Basic NE grammars can be adapted for new applications, domains and languages
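To illustrate why phase order matters in a cascade, here is a toy Java sketch in which a first phase marks titles and a second phase builds Person annotations on top of them. The phase contents, the title list and the names Ann, Cascade and Phase are hypothetical stand-ins for JAPE grammars, not GATE code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** An annotation over token positions [start, end). */
record Ann(String type, int start, int end) {}

class Cascade {
    /** A phase reads all annotations so far and returns them plus any new ones. */
    interface Phase {
        List<Ann> apply(List<String> tokens, List<Ann> in);
    }

    private static final Set<String> TITLES = Set.of("Mr", "Mrs", "Dr");   // toy title list

    /** Phase 1: mark titles (standing in for gazetteer-driven rules). */
    static List<Ann> titles(List<String> tokens, List<Ann> in) {
        List<Ann> out = new ArrayList<>(in);
        for (int i = 0; i < tokens.size(); i++) {
            if (TITLES.contains(tokens.get(i))) out.add(new Ann("Title", i, i + 1));
        }
        return out;
    }

    /** Phase 2: Title + capitalised token becomes a Person; depends on phase 1's output. */
    static List<Ann> persons(List<String> tokens, List<Ann> in) {
        List<Ann> out = new ArrayList<>(in);
        for (Ann a : in) {
            int next = a.end();
            if (a.type().equals("Title") && next < tokens.size()
                    && Character.isUpperCase(tokens.get(next).charAt(0))) {
                out.add(new Ann("Person", a.start(), next + 1));
            }
        }
        return out;
    }

    /** Run phases in sequence, feeding each one the output of the previous phases. */
    static List<Ann> run(List<String> tokens, List<Phase> phases) {
        List<Ann> anns = new ArrayList<>();
        for (Phase p : phases) anns = p.apply(tokens, anns);
        return anns;
    }
}

class CascadeDemo {
    public static void main(String[] args) {
        List<String> tokens = List.of("Yesterday", "Dr", "Brown", "spoke");
        List<Cascade.Phase> phases = List.of(Cascade::titles, Cascade::persons);
        System.out.println(Cascade.run(tokens, phases));
    }
}
```

Running the same two phases in the opposite order produces no Person annotation, because the second phase would look for Title annotations that do not exist yet.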

JAPE example: University of Sheffield

Rule: namedUniversity
(
  {Token.string == "University"}
  {Token.string == "of"}
  ({Lookup.minorType == city} | ({Token.category == NNP})+)
):orgName
-->
:orgName.Organisation = {kind = "university", rule = "namedUniversity"}

• Looks for specific words "University of" followed by:
– city name from gazetteer, or
– one or more proper nouns
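The namedUniversity rule can be mimicked in plain Java to show what the pattern matches. This sketch assumes single-token city lookups and an already POS-tagged token list; when both alternatives match it keeps the longer one, in the spirit of appelt-style longest match. Tok, Organisation and NamedUniversityRule are illustrative names, not part of GATE.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** A token with its string and POS category. */
record Tok(String string, String category) {}

/** The Organisation annotation the rule creates, over tokens [start, end). */
record Organisation(int start, int end, String kind, String rule) {}

class NamedUniversityRule {
    private final Set<String> cities;   // stand-in for Lookup annotations with minorType == city

    NamedUniversityRule(Set<String> cities) { this.cities = cities; }

    List<Organisation> apply(List<Tok> toks) {
        List<Organisation> out = new ArrayList<>();
        int i = 0;
        while (i + 2 < toks.size()) {   // room for "University", "of" and one more token
            if (toks.get(i).string().equals("University") && toks.get(i + 1).string().equals("of")) {
                int j = i + 2;
                // Alternative 1: a (single-token) city from the gazetteer.
                int cityLen = cities.contains(toks.get(j).string()) ? 1 : 0;
                // Alternative 2: one or more proper nouns (NNP).
                int nnpLen = 0;
                while (j + nnpLen < toks.size() && toks.get(j + nnpLen).category().equals("NNP")) nnpLen++;
                int len = Math.max(cityLen, nnpLen);   // keep the longer alternative
                if (len > 0) {
                    out.add(new Organisation(i, j + len, "university", "namedUniversity"));
                    i = j + len;
                    continue;
                }
            }
            i++;
        }
        return out;
    }
}

class NamedUniversityDemo {
    public static void main(String[] args) {
        NamedUniversityRule rule = new NamedUniversityRule(Set.of("Sheffield"));
        List<Tok> toks = List.of(new Tok("the", "DT"), new Tok("University", "NNP"),
                new Tok("of", "IN"), new Tok("Sheffield", "NNP"), new Tok("is", "VBZ"));
        System.out.println(rule.apply(toks));
    }
}
```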

Combining existing annotations

Associate a company with a share price, e.g. Whitbread shares closed up 2p at 645p.

Phase: Shares
Input: Token Organization Lookup Money Percent
Options: control = appelt

Rule: ShareChange
(
  {Organization}
  ({Token})[0,3]
  {Lookup.majorType == "change"}
  ({Token})[0,3]
  ({Money} | {Percent})
):change
-->
:change.ShareChange = {rule = "ShareChange"}
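One way to see what the ShareChange pattern does is to flatten the input annotations into a single sequence and express the LHS as a regular expression over one-letter symbols. This is a simplification: real JAPE matches over a graph of possibly overlapping annotations, whereas this sketch assumes one non-overlapping annotation per position, and it treats "up" as the change Lookup purely for illustration. Span, ShareChangeRule and the "Lookup:change" type label are hypothetical.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** One annotation in a simplified, non-overlapping annotation sequence. */
record Span(String type, String text) {}

class ShareChangeRule {
    // {Organization} ({Token})[0,3] {Lookup.majorType == "change"} ({Token})[0,3] ({Money} | {Percent})
    private static final Pattern SHARE_CHANGE = Pattern.compile("OT{0,3}CT{0,3}M");

    /** Map each annotation to one symbol so the LHS can be written as a regex. */
    private static char symbol(Span s) {
        return switch (s.type()) {
            case "Organization" -> 'O';
            case "Lookup:change" -> 'C';       // hypothetical label for Lookup.majorType == "change"
            case "Money", "Percent" -> 'M';
            default -> 'T';                    // any other annotation is treated as a Token
        };
    }

    /** Return the text covered by the first ShareChange match, or null if there is none. */
    static String match(List<Span> spans) {
        StringBuilder symbols = new StringBuilder();
        for (Span s : spans) symbols.append(symbol(s));
        Matcher m = SHARE_CHANGE.matcher(symbols);
        if (!m.find()) return null;
        StringBuilder covered = new StringBuilder();
        for (int i = m.start(); i < m.end(); i++) {
            if (covered.length() > 0) covered.append(' ');
            covered.append(spans.get(i).text());
        }
        return covered.toString();
    }
}

class ShareChangeDemo {
    public static void main(String[] args) {
        List<Span> spans = List.of(
                new Span("Organization", "Whitbread"), new Span("Token", "shares"),
                new Span("Token", "closed"), new Span("Lookup:change", "up"),
                new Span("Money", "2p"), new Span("Token", "at"),
                new Span("Money", "645p"), new Span("Token", "."));
        System.out.println(ShareChangeRule.match(spans));
    }
}
```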

Orthomatcher

• Orthographic coreference between annotations in the same document, e.g. Mr Brown, James Brown

• Matching rules are invoked between annotations of the same type, or between an existing annotation and an "Unknown" annotation

• The latter is the only case where an annotation type can be changed

• Lookup tables of aliases and exceptions (i.e. overriding of matching rules)

• Also PRs for pronominal and nominal coreference
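A drastically simplified Java sketch of the two matching cases above: mentions of the same type match when they share their last word (so Mr Brown and James Brown corefer), and an Unknown mention that matches an existing chain is retyped to that chain's type. The real orthomatcher uses many more rules plus the alias and exception tables; the last-word heuristic and the names Mention and OrthoSketch are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** A mention of an entity: its annotation type and covered text. */
record Mention(String type, String text) {}

class OrthoSketch {
    /** Toy matching key: the last word of the mention (Mr Brown, James Brown -> Brown). */
    static String lastWord(String text) {
        String[] words = text.trim().split("\\s+");
        return words[words.length - 1];
    }

    /** Build coreference chains keyed by "type:lastWord". */
    static Map<String, List<Mention>> chains(List<Mention> mentions) {
        Map<String, List<Mention>> chains = new LinkedHashMap<>();
        // Case 1: typed mentions only match mentions of the same type.
        for (Mention m : mentions) {
            if (m.type().equals("Unknown")) continue;
            chains.computeIfAbsent(m.type() + ":" + lastWord(m.text()), k -> new ArrayList<>()).add(m);
        }
        // Case 2: an Unknown mention joins a matching chain and takes on that chain's type.
        for (Mention m : mentions) {
            if (!m.type().equals("Unknown")) continue;
            for (Map.Entry<String, List<Mention>> e : chains.entrySet()) {
                String key = e.getKey();
                int colon = key.indexOf(':');
                if (key.substring(colon + 1).equals(lastWord(m.text()))) {
                    e.getValue().add(new Mention(key.substring(0, colon), m.text()));
                    break;
                }
            }
        }
        return chains;
    }
}

class OrthoDemo {
    public static void main(String[] args) {
        List<Mention> mentions = List.of(
                new Mention("Person", "James Brown"), new Mention("Person", "Mr Brown"),
                new Mention("Unknown", "Brown"), new Mention("Organization", "Whitbread"));
        OrthoSketch.chains(mentions).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```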

What about other languages?

• Since we're based in Sheffield, you can't blame us for developing GATE primarily for English

• But contrary to popular belief about the British, we don't hate all foreigners!

• And we have lots of capabilities for processing in other languages

• Currently systems for English, French, German, Romanian, Bulgarian, Russian, Cebuano, Hindi, Chinese, Arabic

• You have a POS tagger for Swahili? Just add it as a plugin and combine it with existing tokeniser etc.

It's all Chinese to me....

Processing multiple languages

• If you have a language identifier PR, you can combine processing of texts in different languages in a single application

• The system will choose the right PRs for each document or document section

• Conditional application fires a PR if some condition is met
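The idea of conditional processing can be sketched in Java as a pipeline whose steps each carry a condition on a document feature set by a language identifier. Everything here is hypothetical: the Doc and ConditionalPR classes, the PR names and especially the toy language test (looking for a German article), which stands in for a real language identifier PR. It is not GATE's conditional controller API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** A document with features (e.g. "language") and a log of which PRs actually ran. */
class Doc {
    final String text;
    final Map<String, String> features = new HashMap<>();
    final List<String> ran = new ArrayList<>();

    Doc(String text) { this.text = text; }
}

/** A processing step that fires only if its condition holds for the document. */
record ConditionalPR(String name, Predicate<Doc> condition) {
    void run(Doc d) {
        if (condition.test(d)) d.ran.add(name);   // a real PR would add annotations here
    }
}

class ConditionalPipeline {
    /** Toy language identifier: a crude placeholder for a real language identification PR. */
    static void identifyLanguage(Doc d) {
        boolean german = d.text.contains(" die ") || d.text.contains(" der ");
        d.features.put("language", german ? "german" : "english");
    }

    static Predicate<Doc> languageIs(String lang) {
        return d -> lang.equals(d.features.get("language"));
    }

    static Doc process(String text) {
        Doc d = new Doc(text);
        identifyLanguage(d);
        List<ConditionalPR> pipeline = List.of(
                new ConditionalPR("Tokeniser", doc -> true),            // language-independent
                new ConditionalPR("EnglishPOSTagger", languageIs("english")),
                new ConditionalPR("GermanPOSTagger", languageIs("german")));
        for (ConditionalPR pr : pipeline) pr.run(d);
        return d;
    }
}

class ConditionalDemo {
    public static void main(String[] args) {
        System.out.println(ConditionalPipeline.process("The cat sat on the mat").ran);
        System.out.println(ConditionalPipeline.process("Heute scheint die Sonne").ran);
    }
}
```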

Other plugins

• Parsers (Stanford, MiniPar, RASP, SUPPLE)

• More flexible gazetteers

• Specialised NE (Chemistry, Biomedicine, etc)

• PRs for other languages, Alignment

• Lemmatisers, morphological analyser, NP and VP chunkers

• Machine Learning

• Evaluation toolkit including inter-annotator agreement (IAA)

• IR, Google and Yahoo search engines, web crawlers

• WordNet

• Whole host of ontology-based tools

Alignment plugin

GATE in use

• We have dozens of applications, and not just research projects!

• A few examples...

Semantic Annotation

• Adding information to documents that is usable by machines to enable better presentation, navigation or searching, e.g. Perseus:

Indexing news at the BBC

BBC Archives: 'Newsnight' archiving time is 8 hours per hour of programme

• Automatic transcription to extract some potential indexing terms
– Result: temporally precise, but very noisy data

• Partial solution: search the web, intranet, digital library for related pages, and process with IE/SA
– Result: less noisy but temporally imprecise

• So we merge this information with the speech signal data
– Result: works well for easy stuff (high precision, low recall)

Ontology linking at FAO

• FAO have sets of fisheries-related ontologies, e.g. gear, species, fishing areas

• There is no way to link between them using ontology alignment techniques, because we require information external to the ontology (e.g. that a fish lives in a particular area)

• NLP techniques make use of information from documents which provide this missing link

• There is not always an exact match between text and the ontology elements, e.g. Mummichogs vs. Fundulus heteroclitus

• Use techniques such as headword matching, noun phrase chunking, synonym and acronym finding, etc.

• Find relations in the text to link the entities together

Ontology linking at FAO

[Diagram: the Fishing Gear, Fishing Area, Species and Commodities ontologies, linked by the relations caught_by, found_in and basis_of]

Matching text descriptions

• Find NPs and terms; use OntoRootGazetteer to find morphological variants of ontology elements, perform headword and synonym matching etc.

• "Pelagic species, mainly fish and cephalopods, northern shrimp (also small crustaceans, krill"

• Match text span to ontology instance, retaining URIs

• Create annotations and features, e.g. caught_by = {gear_type = midwater otter trawls, target_species = cephalopods}

• Convert to RDF triples
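The last two steps can be sketched as follows: normalise a text span, look it up among ontology labels to recover its URI, and emit an N-Triples statement for the caught_by relation. The namespace http://example.org/fao#, the label-to-URI map and the plural-stripping normalisation are all assumptions; the normalisation is a much cruder stand-in for OntoRootGazetteer's morphological matching and would, for instance, wrongly turn "species" into "specie".

```java
import java.util.Map;
import java.util.Optional;

class CaughtByToRdf {
    /** Hypothetical namespace; the slides do not give the real FAO ontology URIs. */
    static final String NS = "http://example.org/fao#";

    /** Crude normalisation: lower-case, collapse spaces, strip a final plural "s". */
    static String normalise(String phrase) {
        String p = phrase.trim().toLowerCase().replaceAll("\\s+", " ");
        return p.endsWith("s") ? p.substring(0, p.length() - 1) : p;
    }

    /** Match a text span to an ontology instance URI via its normalised label. */
    static Optional<String> lookup(Map<String, String> labelToUri, String span) {
        return Optional.ofNullable(labelToUri.get(normalise(span)));
    }

    /** Build one N-Triples statement: <species> <NS caught_by> <gear> . */
    static Optional<String> caughtBy(Map<String, String> labelToUri, String gearText, String speciesText) {
        Optional<String> gear = lookup(labelToUri, gearText);
        Optional<String> species = lookup(labelToUri, speciesText);
        if (gear.isEmpty() || species.isEmpty()) return Optional.empty();   // no ontology match
        return Optional.of("<" + species.get() + "> <" + NS + "caught_by> <" + gear.get() + "> .");
    }
}

class CaughtByDemo {
    public static void main(String[] args) {
        // Hypothetical label-to-URI map standing in for the ontology.
        Map<String, String> labels = Map.of(
                "midwater otter trawl", CaughtByToRdf.NS + "MidwaterOtterTrawl",
                "cephalopod", CaughtByToRdf.NS + "Cephalopoda");
        CaughtByToRdf.caughtBy(labels, "midwater otter trawls", "cephalopods")
                .ifPresent(System.out::println);
    }
}
```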

Using ANNIC to view results

Outsmarting our competitors

If you can't beat 'em, join 'em

• UIMA

• OpenCalais

• Lingpipe

All integrated into GATE as plugins

UIMA

• UIMA is an NL engineering platform developed by IBM

• Shares some functionality with GATE, but is complementary in most respects.

• Interoperability layer has been developed to allow UIMA applications to be run within GATE, and vice versa, in order to combine elements of both.

• Emphasis is on architectural support, including asynchronous scaleout (deploying many copies of an application in parallel)

• Much narrower range of resources provided than GATE

http://incubator.apache.org/uima/

OpenCalais

• Web service for semantic annotation of text.

• The user submits a document to the web service, which returns entity and relation annotations in RDF, JSON or some other format.

• Typically, users integrate OpenCalais annotation into their web pages to provide additional links and 'semantic functionality'.

• OpenCalais annotates both relations and entities, although the GATE plugin only supports entities.

http://www.opencalais.com

LingPipe

• Provides a set of largely ML-based IE and data mining tools, with models trained for particular tasks/corpora.

• Limited ontology support: can connect entities found to databases and ontologies

• Advantage: ML models can suggest more than one output, ranked by confidence. The user can choose the number of suggestions generated.

• Disadvantage: ML models only apply to specific tasks and domains.

http://alias-i.com/lingpipe/index.html

In summary...

• We like to think GATE is the best thing since sliced bread for most NLP and terminology tasks

• You can use it for plenty of other things too: don't let us stop you being creative!

• It incorporates a huge number of plugins, and is easily extendable and highly customisable

• The only limit is your imagination...

• So if you're now convinced you can't live without GATE, there are two possibilities:
– ask us to get involved with a project
– try GATE yourself

Get your own hands dirty

• We run training courses three times a year in Sheffield and other selected locations

• Different tracks available

• GATE certification available

More info, contact details, demos, publications: http://gate.ac.uk

Now it's time to nudge your neighbour if they are asleep....

Or ask that burning question about GATE.