BY TSHISHONGA AW 2859268
11/04/08
• Co-Supervisor: Mr Reg Dodds
• Supervisor: Professor I.M. Venter
APPLYING VENDA TEXT TOWARDS THE DEVELOPMENT OF AN INTELLIGENT PARTS-OF-SPEECH TAGGER
Part-of-speech (POS) tagging is the process of assigning each word its part-of-speech tag.
A part-of-speech tag is a label, e.g. noun, verb or adjective.
Tagging is done by looking at a word's relationship with its adjacent words.
A simplified form is taught to school children.
The Venda language has unique diacritics.
INTRODUCTION
A Venda translator: very ambitious
A generic tagger: still ambitious
Best solution: a parts-of-speech tagger that lets the user change tags to resolve tag ambiguity.
• Compute initial Hidden Markov Models (HMMs)
• Compute test data
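As a rough illustration of what computing an initial HMM can involve, the sketch below estimates start, transition and emission probabilities by counting over a small hand-tagged corpus and then tags a new sentence with the Viterbi algorithm. It is not the project's code: the function names `train_hmm` and `viterbi`, the tag set, the `floor` smoothing value and the English stand-in sentences are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Toy hand-tagged sentences: English stand-ins, NOT Venda data (assumption).
TAGGED = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("dogs", "NOUN"), ("bark", "VERB")],
]


def _normalise(counts):
    """Turn raw counts into relative frequencies."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


def train_hmm(sentences):
    """Estimate start, transition and emission probabilities by counting."""
    start = Counter()
    trans = defaultdict(Counter)
    emit = defaultdict(Counter)
    for sentence in sentences:
        prev = None
        for word, tag in sentence:
            if prev is None:
                start[tag] += 1
            else:
                trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    return (
        _normalise(start),
        {tag: _normalise(c) for tag, c in trans.items()},
        {tag: _normalise(c) for tag, c in emit.items()},
    )


def viterbi(words, start, trans, emit, floor=1e-6):
    """Return the most probable tag sequence; unseen events get `floor`."""
    tags = list(emit)
    # Each column maps tag -> (best path probability, previous tag).
    columns = [
        {t: (start.get(t, floor) * emit[t].get(words[0], floor), None) for t in tags}
    ]
    for word in words[1:]:
        column = {}
        for t in tags:
            prob, prev = max(
                (columns[-1][s][0] * trans.get(s, {}).get(t, floor), s) for s in tags
            )
            column[t] = (prob * emit[t].get(word, floor), prev)
        columns.append(column)
    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: columns[-1][t][0])
    path = [tag]
    for column in reversed(columns[1:]):
        tag = column[tag][1]
        path.append(tag)
    return list(reversed(path))
```

Calling `train_hmm(TAGGED)` and passing the three resulting tables to `viterbi` with a list of words gives one tag per word. A user correction could then be applied by overwriting individual tags in that list.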
THE DEVELOPMENT PROCESS
Abney, S. Part-of-speech tagging and partial parsing.
Brill, E. A simple rule-based part-of-speech tagger.
Prez, L. C. IEEE information theory society newsletter.
Samuelsson, C., and Voutilainen, A. Comparing a linguistic and a stochastic tagger. pp. 246–253.
Shannon, C. E. A mathematical theory of communication.
Research
User Requirements
Prototype
REQUIREMENT ANALYSIS
• For all the code
• For all the databases
IMPLEMENTATION TOOLS
GUI was criticized
• Change the GUI
Displaying diacritics on the GUI
• Use DejaVu fonts
MySQL database not writing diacritics
• Change the character encoding
• Use a flat file
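The flat-file workaround can be sketched as follows, assuming the tagged words are stored one per line as a tab-separated word and tag in a UTF-8 file. The function names `save_tagged` and `load_tagged` and the file layout are hypothetical; the slides do not describe the actual format used.

```python
import unicodedata


def save_tagged(path, tagged_words):
    """Write one 'word<TAB>tag' pair per line, encoded as UTF-8."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for word, tag in tagged_words:
            # NFC stores a letter and its diacritic as a single code point
            # where Unicode defines one, e.g. d + U+032D becomes U+1E13.
            handle.write(f"{unicodedata.normalize('NFC', word)}\t{tag}\n")


def load_tagged(path):
    """Read the pairs back as a list of (word, tag) tuples."""
    with open(path, encoding="utf-8") as handle:
        return [tuple(line.rstrip("\n").split("\t")) for line in handle if line.strip()]
```

Stating the encoding explicitly on both the write and the read avoids depending on the platform default, which is a common reason diacritics are lost.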
PROBLEMS WITH THE PROTOTYPE
USER OCCUPATION
USER 1 MSc STATS
USER 2 BSc Honors
USER 3 BSc Microbiology
USER 4 UCT MSc Sociology
USABILITY TESTING
First Screen
File Menu
◦ Open a file
◦ Exit
View Menu
◦ Word frequency
◦ Count words
Edit
◦ Clear
Help
Second Screen
File Menu
◦ Save a file
◦ Exit
View Menu
◦ Word frequency
◦ Count words
Edit
◦ Word model
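The two View Menu items could be backed by something like the sketch below. It is only an illustration of counting words and word frequencies in text that contains diacritics; the names `tokenize`, `count_words` and `word_frequency` are hypothetical and not taken from the prototype.

```python
import re
import unicodedata
from collections import Counter


def tokenize(text):
    """Split text into lower-cased words, keeping letters with diacritics."""
    # Normalising to NFC first means a letter plus combining mark is seen
    # as one letter, so the Unicode-aware \w does not split the word.
    normalised = unicodedata.normalize("NFC", text)
    return re.findall(r"\w+", normalised.lower())


def count_words(text):
    """Total number of words in the text."""
    return len(tokenize(text))


def word_frequency(text):
    """Map each distinct word to the number of times it occurs."""
    return Counter(tokenize(text))
```

Lower-casing before counting treats capitalised and lower-case forms of a word as the same entry; a tagger that needs case as a feature would skip that step.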
USER’S GUIDE
[1] Abney, S. Part-of-speech tagging and partial parsing. In Corpus-Based Methods in Language and Speech (Dordrecht, 1996), K. Church, S. Young, and G. Bloothooft, Eds., Kluwer Academic Publishers.
[2] Brill, E. A simple rule-based part-of-speech tagger. In Proceedings of ANLP-92, 3rd Conference on Applied Natural Language Processing (Trento, IT, 1992), pp. 152–155.
REFERENCES
[3] Prez, L. C. IEEE Information Theory Society Newsletter. ISSN 105 53, 04 (2003), pp. 1–10.
[4] Samuelsson, C., and Voutilainen, A. Comparing a linguistic and a stochastic tagger. In Proceedings of the Eighth Conference on European Chapter of the Association for Computational Linguistics (Morristown, NJ, USA, 1997), Association for Computational Linguistics, pp. 246–253.
[5] Shannon, C. E. A mathematical theory of communication. The Bell System Technical Journal (1948), pp. 1–12.
Open a file
◦ View user manual
Tag a file
◦ Search for multiple occurrences of a word
◦ Insert a diacritic
◦ Copy and paste
◦ Save a file
Exit the system
THE DEMO