Word Tagging using Max Entropy Model and Feature selection
NLP Final Project
Advanced NLP Pre-PhD Course
Submitted to:
Prof. Dr. Ali Fahmy
Prof. Dr. Ali Farghali
Submitted by:
Eman Negm
Marwa Mostafa
Wessam Sayed
Yomna Mahmoud
Yosr Eman
Contents
Project Introduction and Motivation
  What is the Maximum Entropy model?
  Why Maximum Entropy in NLP?
  Why are we concerned with POS tagging?
Tools
Methodology
  Corpus Selection:
  Part-of-Speech Tags Selection:
  Indicators Definition:
  MaxEntropy learning module:
Results and Analysis
  For the Noun Phrase tag:
    Sample of Selected Features:
  For the Verb Phrase tag:
    Sample of Selected Features:
  For the Adjective tag:
    Sample of Selected Features:
  For the Adverb tag:
    Sample of Selected Features:
  For the Pronoun tag:
    Sample of Selected Features:
  For all tags
References
Project Introduction and Motivation
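What is the Maximum Entropy model?
Maximum entropy (MaxEnt) modeling is an information-theoretic
technique for constructing a model from partially available data.
Among all the probability distributions that are consistent with
what we have observed, we choose the one with the highest
entropy, that is, the one that makes the fewest additional
assumptions.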
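As a small illustration of the principle (not part of the original
project code), the sketch below computes the Shannon entropy of a
few candidate distributions over four outcomes. With no constraint
other than summing to one, the uniform distribution has the highest
entropy, so it is the one a maximum entropy approach would prefer.
The function name shannon_entropy and the candidate distributions
are chosen here purely for illustration.

```python
import math


def shannon_entropy(probs):
    """Return the Shannon entropy (in bits) of a discrete distribution."""
    # Terms with p == 0 contribute nothing and are skipped to avoid log2(0).
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Three hypothetical distributions over the same four outcomes.
candidates = {
    "uniform": [0.25, 0.25, 0.25, 0.25],
    "skewed": [0.70, 0.10, 0.10, 0.10],
    "certain": [1.00, 0.00, 0.00, 0.00],
}

# Pick the candidate with the highest entropy.
best = max(candidates, key=lambda name: shannon_entropy(candidates[name]))
print(best)
```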
Why Maximum Entropy in NLP?
MaxEnt has been applied successfully in various fields, including
NLP. Work similar to ours has been published over the years
[1, 2, 8].
Why are we concerned with POS tagging?
Part-of-Speech (POS) tagging is the task of assigning each word
in a sentence its grammatical category based on its definition
and context. POS tagging helps the computer distinguish words
grammatically and understand sentences correctly.
Tools
In this project we used NLTK (Natural Language Toolkit), a
Python-based toolkit. Our work was done on Windows, with
Python 3 as the base for our code.
Methodology
Our work was divided into five main steps:
1. Corpus Selection.
2. Part-of-Speech tags selection.
3. Indicators definition.
4. MaxEntropy learning module.
5. Running the algorithms on different selected features and
performing analysis (this step is covered in detail in the
Results and Analysis section).
We will go through each step and describe it in detail.
Corpus Selection:
Our focus was to select a tagged corpus so that we could compare
our results against the existing tags to validate our work. We
selected part of a well-known corpus that has been used heavily
in Natural Language research, “The Brown Corpus” [3], and split
it into 90% training data and 10% test data.
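The 90/10 split can be sketched as follows. This is a minimal
illustration rather than the project's actual code: the helper name
split_train_test, the choice to hold out the last 10% of the items,
and the toy tagged words are all assumptions. With NLTK installed
and the corpus downloaded, the (word, tag) pairs could instead come
from nltk.corpus.brown.tagged_words().

```python
def split_train_test(items, test_fraction=0.1):
    """Split a sequence into (train, test), holding out the final test_fraction."""
    # int() truncates, so 18438 items give a test set of 1843 items.
    test_size = int(len(items) * test_fraction)
    cut = len(items) - test_size
    return items[:cut], items[cut:]


# Toy stand-in for a tagged corpus, using Brown-style tags.
tagged = [("The", "AT"), ("dog", "NN"), ("barks", "VBZ"), ("loudly", "RB"),
          ("and", "CC"), ("she", "PPS"), ("runs", "VBZ"), ("very", "QL"),
          ("fast", "RB"), (".", ".")]

train, test = split_train_test(tagged)
print(len(train), len(test))
```

Truncating the test size in this way is consistent with the sizes
reported in the results (for example, 18438 items and 1843 test
items), although the original splitting code is not shown.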
Part-of-Speech Tags Selection:
After selecting the tagged corpus, we selected the tags to work
with. We selected the tags for the most common words; the main
selected tags are Nouns, Verbs, Pronouns, Adjectives and
Adverbs. Each main tag has a set of subtags. Descriptions of the
subtags are available on the University of Leeds website [4].
Indicators Definition:
For each of the tags defined above, we defined a set of
indicators that characterize words carrying that tag. For
example, a verb in the present tense, 3rd person singular, ends
with s or es, and the word before it is an adverb or a noun.
This set of indicators was collected from various online
resources and from our knowledge of English grammar. The
indicators were matched using regular expressions.
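The verb example above can be sketched as a pair of binary
indicators. This is an illustrative sketch, not the project's code:
the function name vbz_indicators, the feature names, and the exact
regular expression are assumptions. The previous-word check assumes
Brown-style tags, where noun tags start with NN or NP and adverb
tags start with RB.

```python
import re

# Hypothetical indicator pattern: a word ending in "s" (this also covers "es").
ENDS_WITH_S = re.compile(r"\w+s$", re.IGNORECASE)


def vbz_indicators(word, previous_tag):
    """Return binary indicators for a present-tense, 3rd person singular verb."""
    return {
        "ends_with_s": bool(ENDS_WITH_S.match(word)),
        # str.startswith accepts a tuple of prefixes.
        "prev_is_noun_or_adverb": previous_tag.startswith(("NN", "NP", "RB")),
    }


print(vbz_indicators("plays", "NP"))
```

Each indicator on its own is noisy (many plural nouns also end in
s), which is why the indicators are passed to a learned model
instead of being used as hard rules.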
MaxEntropy learning module:
We used the classify package in NLTK (nltk.classify), which
provides a classifier based on the maximum entropy modeling
framework. To learn the weights of the model, we used the
Improved Iterative Scaling (IIS) algorithm.
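A minimal sketch of this step with NLTK is shown below. It assumes
NLTK and NumPy are installed. The toy training set, the feature
names and the labels are invented for illustration and do not come
from the project. MaxentClassifier.train accepts algorithm="iis" to
select Improved Iterative Scaling and max_iter to cap the number of
iterations.

```python
from nltk.classify import MaxentClassifier
from nltk.classify.util import accuracy

# Toy (feature dict, label) pairs; the project extracted its features
# from the Brown Corpus instead.
train_set = [
    ({"ends_with_ly": True, "ends_with_s": False}, "ADV"),
    ({"ends_with_ly": True, "ends_with_s": False}, "ADV"),
    ({"ends_with_ly": False, "ends_with_s": True}, "VERB"),
    ({"ends_with_ly": False, "ends_with_s": True}, "VERB"),
    ({"ends_with_ly": False, "ends_with_s": False}, "NOUN"),
    ({"ends_with_ly": False, "ends_with_s": False}, "NOUN"),
]

# trace=0 silences the per-iteration log. The report used 100 iterations;
# a smaller cap keeps this toy example fast.
classifier = MaxentClassifier.train(
    train_set, algorithm="iis", trace=0, max_iter=10
)

print(classifier.classify({"ends_with_ly": True, "ends_with_s": False}))
print(accuracy(classifier, train_set))
```

In a real run the accuracy would be measured on the held-out 10%
test data, not on the training set as in this toy example.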
Results and Analysis
We trained each tag alone, and all tags together, using 100
iterations. The features were selected according to our research
on the features of each tag and our knowledge of the English
language. The following sections describe the results for the
implemented tags:
For the Noun Phrase tag:
The noun phrase tag has many features; we implemented 15 of them
in our project. The results for 100 iterations were as follows:
Length of features set: 18438
Testing data size: 1843
Total Accuracy: 0.9696147585458491
Sample of Selected Features:
1. Nouns have determiners before them like: a, an, the, this,
that, these, those, some, many, their, one, two, three,
several.
2. Nouns may be singular or plural.
One book five books
One map several maps
One tooth three teeth
One box six boxes
One girl many girls
One child eight children
3. Nouns can own or be owned (can be possessive).
Frank’s bike is a ten-speed.
The window’s pane was frosted.
The duck’s pond was cloudy with muck.
The dog’s fur was curly and coarse.
4. Nouns can be formed from other words by using noun
suffixes such as:
-ation imagine + ation = imagination (information,
creation, suffocation, inspiration).
-ism capital + ism = capitalism (Mormonism, Catholicism,
idealism, realism, pessimism).
-ment assign + ment = assignment (arrangement,
encampment, enlargement, judgement)
-ness lonely + ness = loneliness (sadness, happiness,
painlessness, graciousness)
-ance accept + ance = acceptance (distance, penance,
repentance, romance)
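The noun cues listed above could be turned into binary features
along the following lines. This is a sketch under stated
assumptions: the function name noun_features, the feature names and
the use of a trailing s as a rough plural cue are illustrative, and
the project's own 15 features are not reproduced here.

```python
DETERMINERS = {"a", "an", "the", "this", "that", "these", "those",
               "some", "many", "their", "one", "two", "three", "several"}
NOUN_SUFFIXES = ("ation", "ism", "ment", "ness", "ance")


def noun_features(word, previous_word):
    """Return binary features for the determiner, possessive and suffix cues."""
    lower = word.lower()
    return {
        "prev_is_determiner": previous_word.lower() in DETERMINERS,
        # Accept both the straight and the typographic apostrophe.
        "is_possessive": lower.endswith(("'s", "\u2019s")),
        "has_noun_suffix": lower.endswith(NOUN_SUFFIXES),
        # Rough plural cue; irregular plurals such as "teeth" are missed.
        "ends_with_s": lower.endswith("s"),
    }


print(noun_features("assignment", "the"))
```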
For the Verb Phrase tag:
The verb phrase tag consists of four main sections: the DO
verbs, the BE verbs, the HAVE verbs, and a fourth section that
covers every other verb. The results for 100 iterations were as
follows:
Length of features set: 8287
Testing data size: 828
Total Accuracy: 0.9975845410628019
Sample of Selected Features:
1. Some verbs end with s or es, and the word before them is an
adverb or a noun.
E.g. Mohammed plays football; Mona amazingly handled the
situation.
2. Some verbs in the infinitive form end or start with certain
morphemes, such as the endings ate, ify, en, ize and the
prefixes en, em, re, over, sub, mis, un.
E.g. criticize, modify.
For the Adjective tag:
The adjective tag has many features; we implemented 10 of them
in this project. The results for 100 iterations were as follows:
Length of features set: 4686
Testing data size: 468
Total Accuracy: 0.9807692307692307
The features were selected according to our research on
adjective features and our knowledge of the English language
[5].
Sample of Selected Features:
1. Words ending in “-able” or “-ible” with a verb base are
tagged as adjectives.
Example: adorable, agreeable
2. Another good indicator of an adjective is a comparative or
superlative form. We test this by checking whether the word
ends in “-er” or “-est”.
Example: warmer, warmest, harder, shortest, smallest
3. Words ending in “-ful” are tagged as adjectives.
Example: awful, beautiful, colorful.
For the Adverb tag:
Adverbs are divided into many categories [6]. The largest
category is “manner adverbs”; most of the words in this category
are derived -ly adverbs (e.g. quickly, bravely, happily). Other
categories include the comparative category (e.g. earlier,
better, later, higher), the superlative category (e.g. highest,
uppermost, nearest) and the particle category (e.g. over, on,
in, about, through). The Brown Corpus uses 10 tags to cover the
different adverb categories. We implemented 10 features in this
project to recognize these tags. The results for 100 iterations
were as follows:
Length of features: 2742
Testing data size: 274
Total Accuracy: 0.9416058394160584
Sample of Selected Features:
1. Feature that represents manner adverbs by recognizing words
that end with ‘-ly’.
Example: quickly, happily.
2. Feature that represents comparative adverbs by recognizing
words that end with ‘-er’.
Example: earlier, better.
3. Feature that represents superlative adverbs by recognizing
words that end with ‘-est’.
Example: highest, uppermost.
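The suffix cues above can be sketched as a small feature function.
The names SUFFIX_FEATURES and suffix_features are hypothetical and
the sketch is not the project's code. It also shows why these cues
are treated as weighted features and not as rules: the adverb
“earlier” and the adjective “warmer” produce exactly the same
suffix features, so other features or context are needed to tell
them apart.

```python
SUFFIX_FEATURES = {
    "ends_with_ly": "ly",    # manner adverbs: quickly, happily
    "ends_with_er": "er",    # comparatives: earlier, better
    "ends_with_est": "est",  # superlatives: highest, uppermost
}


def suffix_features(word):
    """Map each suffix cue to True/False for the given word."""
    lower = word.lower()
    return {name: lower.endswith(suffix)
            for name, suffix in SUFFIX_FEATURES.items()}


for word in ("quickly", "earlier", "warmer", "highest"):
    print(word, suffix_features(word))
```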
For the Pronoun tag:
Pronouns can be divided into several categories: personal,
indefinite, reflexive, reciprocal, possessive, demonstrative,
interrogative and relative [7]. We used 24 features in this
project. The results for 100 iterations were as follows:
Length of features: 3400
Testing data size: 340
Total Accuracy: 0.8147058823529412
Sample of Selected Features:
1. Feature that represents singular, reflexive pronoun.
Example: itself, himself, myself, yourself and ownself.
2. Feature that represents plural, reflexive pronoun.
Example: themselves, ourselves and yourselves.
3. Feature that represents personal, accusative pronoun.
Example: them, it, him, me, us, you, 'em, her and we'uns.
4. Feature that represents personal, nominative, 3rd person
singular pronoun.
Example: he, she and thee.
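Because pronouns form a small closed class, features like the four
above can be sketched as simple word-list lookups. The word lists
below are copied from the examples in this section. The names
PRONOUN_LEXICON and pronoun_features and the feature names are
assumptions made for illustration, and the remaining features of
the project are not shown.

```python
PRONOUN_LEXICON = {
    "singular_reflexive": {"itself", "himself", "myself", "yourself", "ownself"},
    "plural_reflexive": {"themselves", "ourselves", "yourselves"},
    "personal_accusative": {"them", "it", "him", "me", "us", "you",
                            "'em", "her", "we'uns"},
    "personal_nominative_3sg": {"he", "she", "thee"},
}


def pronoun_features(word):
    """Return one binary feature per pronoun word list."""
    lower = word.lower()
    return {name: lower in words for name, words in PRONOUN_LEXICON.items()}


print(pronoun_features("Himself"))
```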
For all tags
The table below summarizes the results for each tag:
Tag         Total feature    Testing feature set    Accuracy
            set length       length (10%)
Noun        18438            1843                   96.96%
Verb        8287             828                    99.75%
Adjective   4686             468                    98.07%
Adverb      2742             274                    94.16%
Pronoun     3400             340                    81.47%
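The per-tag figures can be combined into a single summary number in
more than one way. The sketch below is not part of the original
analysis. It uses the test sizes and accuracies reported in the
sections above to compute a macro average (each tag weighted
equally) and a micro average (each test item weighted equally). The
number of correct items per tag is recovered by rounding size times
accuracy, which is an assumption about how the accuracies were
computed.

```python
# (tag, test set size, reported accuracy) from the results above.
results = [
    ("Noun", 1843, 0.9696147585458491),
    ("Verb", 828, 0.9975845410628019),
    ("Adjective", 468, 0.9807692307692307),
    ("Adverb", 274, 0.9416058394160584),
    ("Pronoun", 340, 0.8147058823529412),
]

# Macro average: every tag counts equally.
macro = sum(acc for _, _, acc in results) / len(results)

# Micro average: every test item counts equally.
correct = sum(round(size * acc) for _, size, acc in results)
total = sum(size for _, size, _ in results)
micro = correct / total

print(f"macro={macro:.4f} micro={micro:.4f}")
```

The micro average is higher than the macro average because the
largest test sets (Noun and Verb) are also among the most accurate,
while the weakest tag (Pronoun) has a small test set.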
References
[1] Nugues, Pierre M. "Part-of-Speech Tagging Using Statistical
Techniques." Language Processing with Perl and Prolog. Springer
Berlin Heidelberg, 2014. 223-251.
[2] Ratnaparkhi, Adwait. "A maximum entropy model for
part-of-speech tagging." Proceedings of the Conference on
Empirical Methods in Natural Language Processing. Vol. 1. 1996.
[3] Francis, W. Nelson, and Henry Kucera. "Brown corpus manual."
Brown University Department of Linguistics (1979).
[4] The Brown Corpus tag-set. Available at:
http://www.scs.leeds.ac.uk/ccalas/tagsets/brown.html
http://www.uefap.com/writing/feature/complex_noun.htm
[5] S. Malik. "Parsing Java Method Names for Improved Software
Analysis." Spring 2011.
[6] Nancarrow, Owen, and Eric Atwell. "A comparative study of the
tagging of adverbs in modern English corpora." Proceedings of
Corpus Linguistics 2007 (2007).
[7] Börjars, Kersti, and Kate Burridge. "Introducing English
Grammar (2nd ed.)." London: Hodder Education, 2010. pp. 50–57.
ISBN 978-1444109870.
[8] Malecha, Gregory, and Ian Smith. "Maximum Entropy
Part-of-Speech Tagging in NLTK." Unpublished course-related
report: http://www.people.fas.harvard.edu/gmalecha (2010).