Machine Learning in GATE
Valentin Tablan
Machine Learning in GATE
• Uses classification: [Attr1, Attr2, Attr3, … Attrn] → Class
• Classifies annotations. (Documents can be classified as well using a simple trick.)
• Annotations of a particular type are selected as instances.
• Attributes refer to instance annotations.
• Attributes have a position relative to the instance annotation they refer to.
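The idea of attributes positioned relative to an instance can be sketched as follows. This is a minimal illustration and not GATE code: the class `PositionSketch`, the method `attributesFor`, and the use of a plain list of POS tags in place of annotations are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration; none of these names come from GATE. */
class PositionSketch {
    /**
     * Collects the POS tags found at the given offsets relative to the
     * instance token at index i (0 = the instance itself, -1 = the token before).
     */
    static List<String> attributesFor(List<String> posTags, int i, int[] positions) {
        List<String> attrs = new ArrayList<>();
        for (int p : positions) {
            int j = i + p;
            // Assumption: an offset that falls outside the document yields a missing value.
            attrs.add(j >= 0 && j < posTags.size() ? posTags.get(j) : null);
        }
        return attrs;
    }

    public static void main(String[] args) {
        List<String> tags = Arrays.asList("DT", "NN", "VBZ", "JJ");
        // Instance = token 1; attributes = tags at positions -1 and +1.
        System.out.println(attributesFor(tags, 1, new int[] {-1, 1})); // prints [DT, VBZ]
    }
}
```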
Attributes
Attributes can be:
– Boolean
  The [lack of] presence of an annotation of a particular type [partially] overlapping the referred instance annotation.
– Nominal
  The value of a particular feature of the referred instance annotation. The complete set of acceptable values must be specified a priori.
– Numeric
  The numeric value (converted from String) of a particular feature of the referred instance annotation.
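The three attribute kinds can be illustrated with a small sketch. It is not taken from GATE: the class and method names are hypothetical, and rejecting an undeclared nominal value with an exception is an assumption of this example, since the slides do not say how such values are handled.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of the three attribute kinds; not GATE code. */
class AttributeKindsSketch {
    /** Boolean: is an annotation of the wanted type among those overlapping the instance? */
    static boolean booleanValue(List<String> overlappingTypes, String wantedType) {
        return overlappingTypes.contains(wantedType);
    }

    /** Nominal: the feature value must belong to the set declared in advance. */
    static int nominalIndex(List<String> allowedValues, String featureValue) {
        int idx = allowedValues.indexOf(featureValue);
        if (idx < 0) {
            // Assumption: this sketch simply rejects values that were not declared.
            throw new IllegalArgumentException("Value not declared: " + featureValue);
        }
        return idx;
    }

    /** Numeric: the feature is stored as a String and converted to a number. */
    static double numericValue(String featureValue) {
        return Double.parseDouble(featureValue);
    }

    public static void main(String[] args) {
        System.out.println(booleanValue(Arrays.asList("Token", "Lookup"), "Lookup"));
        System.out.println(nominalIndex(Arrays.asList("NN", "NNP", "NNPS"), "NNP"));
        System.out.println(numericValue("4"));
    }
}
```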
Implementation
Machine Learning PR in GATE. Has two functioning modes:
– training
– application
Uses an XML file for configuration:
  <?xml version="1.0" encoding="windows-1252"?>
  <ML-CONFIG>
    <DATASET> … </DATASET>
    <ENGINE> … </ENGINE>
  </ML-CONFIG>
<DATASET>
  <DATASET>
    <INSTANCE-TYPE>Token</INSTANCE-TYPE>
    <ATTRIBUTE>
      <NAME>POS_category(0)</NAME>
      <TYPE>Token</TYPE>
      <FEATURE>category</FEATURE>
      <POSITION>0</POSITION>
      <VALUES>
        <VALUE>NN</VALUE>
        <VALUE>NNP</VALUE>
        <VALUE>NNPS</VALUE>
        …
      </VALUES>
      [<CLASS/>]
    </ATTRIBUTE>
    …
  </DATASET>
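To show how such a dataset definition might be read, here is a minimal sketch that uses only the standard Java XML parser. It is not how GATE itself loads the file (the interface shown in these slides receives its options as a JDom element). The class `DatasetConfigSketch`, its helper methods, and the `SAMPLE` string are assumptions, and the sample keeps only the three values visible on the slide, where the original list is elided.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical reader for a DATASET fragment; not part of GATE. */
class DatasetConfigSketch {
    // Shortened sample: the value list on the slide is longer (elided there).
    static final String SAMPLE =
        "<DATASET>"
        + "<INSTANCE-TYPE>Token</INSTANCE-TYPE>"
        + "<ATTRIBUTE>"
        + "<NAME>POS_category(0)</NAME><TYPE>Token</TYPE>"
        + "<FEATURE>category</FEATURE><POSITION>0</POSITION>"
        + "<VALUES><VALUE>NN</VALUE><VALUE>NNP</VALUE><VALUE>NNPS</VALUE></VALUES>"
        + "<CLASS/>"
        + "</ATTRIBUTE>"
        + "</DATASET>";

    /** Parses the XML text and returns its root element. */
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse configuration", e);
        }
    }

    /** Text of the first descendant element with the given tag name. */
    static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    /** All VALUE entries declared for a nominal attribute. */
    static List<String> values(Element attribute) {
        List<String> out = new ArrayList<>();
        NodeList nodes = attribute.getElementsByTagName("VALUE");
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getTextContent().trim());
        }
        return out;
    }

    /** An attribute carrying a CLASS element is the one to be learnt. */
    static boolean isClass(Element attribute) {
        return attribute.getElementsByTagName("CLASS").getLength() > 0;
    }

    public static void main(String[] args) {
        Element dataset = parse(SAMPLE);
        Element attr = (Element) dataset.getElementsByTagName("ATTRIBUTE").item(0);
        System.out.println(text(dataset, "INSTANCE-TYPE") + " " + text(attr, "NAME")
            + " " + values(attr) + " class=" + isClass(attr));
    }
}
```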
<ENGINE>
  <ENGINE>
    <WRAPPER>gate.creole.ml.weka.Wrapper</WRAPPER>
    <OPTIONS>
      <CLASSIFIER>weka.classifiers.j48.J48</CLASSIFIER>
      <CLASSIFIER-OPTIONS>-K 3</CLASSIFIER-OPTIONS>
      <CONFIDENCE-THRESHOLD>0.85</CONFIDENCE-THRESHOLD>
    </OPTIONS>
  </ENGINE>
Attributes Position
Instances type: Token
Machine Learning PR
• Can save a learnt model to an external file for later use. Saves the actual model and the collected dataset.
• Can export the collected dataset in .arff format.
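For readers unfamiliar with the .arff layout, the sketch below writes a tiny all-nominal dataset in that format: a `@relation` line, one `@attribute` line per attribute with its allowed values in braces, then a `@data` section with one comma-separated row per instance. The class `ArffSketch`, the relation name and the attribute names are hypothetical, and this is not the export code used by the ML PR.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical writer for a small all-nominal ARFF file; not GATE's exporter. */
class ArffSketch {
    static String toArff(String relation, List<String> attrNames,
                         List<List<String>> nominalValues, List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        // One declaration per attribute, listing its acceptable values.
        for (int i = 0; i < attrNames.size(); i++) {
            sb.append("@attribute ").append(attrNames.get(i))
              .append(" {").append(String.join(",", nominalValues.get(i))).append("}\n");
        }
        sb.append("\n@data\n");
        // One line per instance, values in attribute order.
        for (List<String> row : rows) {
            sb.append(String.join(",", row)).append("\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(toArff(
            "pos",
            Arrays.asList("prev_pos", "pos"),
            Arrays.asList(Arrays.asList("DT", "NN"), Arrays.asList("NN", "NNP")),
            Arrays.asList(Arrays.asList("DT", "NN"), Arrays.asList("NN", "NNP"))));
    }
}
```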
Standard Use Scenario
Training
• Prepare training data by enriching the documents with annotations for attributes (e.g. run Tokeniser, POS tagger, Gazetteer, etc.).
• Run the ML PR in training mode.
• Export the dataset as .arff and perform experiments using the WEKA interface in order to find the best attribute set / algorithm / algorithm options.
• Update the configuration file accordingly.
• Run the ML PR again to collect the actual data.
• [ Save the learnt model. ]
Application
• Prepare data by enriching the documents with annotations for attributes (e.g. run Tokeniser, POS tagger, Gazetteer, etc.).
• [ Load the previously saved model. ]
• Run the ML PR in application mode.
• [ Save the learnt model. ]
An Example
Learn POS category from POS context.
Using Other ML Libraries
The MLEngine Interface
Method Summary
• void addTrainingInstance(List attributes)
  Adds a new training instance to the dataset.
• Object classifyInstance(List attributes)
  Classifies a new instance.
• void init()
  This method will be called after an engine is created and has its dataset and options set.
• void setDatasetDefinition(DatasetDefintion definition)
  Sets the definition for the dataset used.
• void setOptions(org.jdom.Element options)
  Sets the options from an XML JDom element.
• void setOwnerPR(ProcessingResource pr)
  Registers the PR using the engine with the engine.
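As a rough idea of what an engine behind such an interface does, here is a toy majority-class learner that offers the same three core operations (`init`, `addTrainingInstance`, `classifyInstance`). It deliberately does not implement the real GATE interface, so that it compiles without GATE on the classpath. The class name `MajorityClassEngine` and the constructor argument `classIndex` are assumptions: a real engine would presumably learn which attribute is the class from the dataset definition rather than from a constructor.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy engine: always predicts the most frequent training class. */
class MajorityClassEngine {
    private final Map<Object, Integer> counts = new HashMap<>();
    private final int classIndex; // assumption: position of the class value in each instance

    MajorityClassEngine(int classIndex) {
        this.classIndex = classIndex;
    }

    /** Resets the engine before any instances are added. */
    void init() {
        counts.clear();
    }

    /** Records the class value of one training instance. */
    void addTrainingInstance(List<?> attributes) {
        counts.merge(attributes.get(classIndex), 1, Integer::sum);
    }

    /** Ignores the attributes and returns the majority class, or null if untrained. */
    Object classifyInstance(List<?> attributes) {
        Object best = null;
        int bestCount = 0;
        for (Map.Entry<Object, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        MajorityClassEngine engine = new MajorityClassEngine(1);
        engine.init();
        engine.addTrainingInstance(Arrays.asList("DT", "NN"));
        engine.addTrainingInstance(Arrays.asList("JJ", "NN"));
        engine.addTrainingInstance(Arrays.asList("NN", "VBZ"));
        System.out.println(engine.classifyInstance(Arrays.asList("DT", null))); // prints NN
    }
}
```

A wrapper around another ML library would replace the counting logic with calls into that library while keeping the same call sequence: set up, add instances during training, classify during application.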