BY TSHISHONGA AW 2859268

11/04/08

Co-Supervisor: Mr Reg Dodds

Supervisor: Professor I.M. Venter

APPLYING VENDA TEXT TOWARDS THE DEVELOPMENT OF AN INTELLIGENT PARTS-OF-SPEECH TAGGER

Part-of-speech (POS) tagging is the process of assigning each word its part-of-speech tag.

A part-of-speech tag is a label such as noun, verb, or adjective.

POS tagging is done by looking at each word's relationship with adjacent words.

A simplified form is taught to school children.

The Venda language has unique diacritics.

INTRODUCTION

• A Venda translator: very ambitious

• A generic tagger: still ambitious

• Best solution: a parts-of-speech tagger that allows the user to change tags to resolve tag ambiguity. Compute initial Hidden Markov Models (HMMs). Compute test data.
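To make the HMM step concrete, here is a minimal sketch of how an initial model could be computed from tagged text and then used to tag a new sentence with the Viterbi algorithm. It is an illustration under stated assumptions, not the project's implementation: the tiny English corpus, the tag names, the add-one smoothing, and the function names `train`, `log_prob`, and `viterbi` are all hypothetical, and real training data would be hand-tagged Venda text.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus of (word, tag) pairs, used only for illustration.
TAGGED = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]

START = "<s>"  # pseudo-tag marking the start of a sentence


def train(sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(Counter)  # trans[previous_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    for sentence in sentences:
        prev = START
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    return trans, emit


def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability of `key` under `counter`."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def viterbi(words, trans, emit):
    """Return the most probable tag sequence for `words`."""
    if not words:
        return []
    tags = list(emit)
    n_tags = len(tags)
    # Vocabulary size plus one slot reserved for unknown words.
    n_words = len({w for counts in emit.values() for w in counts}) + 1

    # best[tag] = (log probability, path) of the best sequence ending in tag.
    best = {
        t: (log_prob(trans[START], t, n_tags)
            + log_prob(emit[t], words[0], n_words), [t])
        for t in tags
    }
    for word in words[1:]:
        new_best = {}
        for t in tags:
            score, path = max(
                (best[p][0] + log_prob(trans[p], t, n_tags), best[p][1])
                for p in tags
            )
            new_best[t] = (score + log_prob(emit[t], word, n_words), path + [t])
        best = new_best
    return max(best.values())[1]


trans, emit = train(TAGGED)
```

With this toy model, `viterbi(["the", "dog", "sleeps"], trans, emit)` returns `["DET", "NOUN", "VERB"]`. A user-corrected tag could be fed back in by adding the corrected sentence to the training data and recomputing the counts.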

THE DEVELOPMENT PROCESS

Research User Requirements

Prototype

REQUIREMENT ANALYSIS

• For all the code

• For all the databases

IMPLEMENTATION TOOLS

GUI was criticized
• Change the GUI

Displaying diacritics on the GUI
• Use DejaVu fonts

MySQL database not writing diacritics
• Change the character encoding
• Use a flat file
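The flat-file workaround for diacritics can be sketched as follows. This is a minimal example, assuming the text is stored as UTF-8 and normalised to Unicode NFC so that each Venda letter is a single code point; the function names `save_text`, `load_text`, and `round_trip` and the file name `sample.txt` are hypothetical and not taken from the prototype.

```python
import os
import tempfile
import unicodedata

# Venda letters with diacritics, written as escapes so the code points are explicit:
# d, l, n, t with circumflex below, and n with dot above.
VENDA_LETTERS = "\u1e13\u1e3d\u1e4b\u1e71\u1e45"


def save_text(path, text):
    """Write text to a flat file as UTF-8, normalised to NFC."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(unicodedata.normalize("NFC", text))


def load_text(path):
    """Read the flat file back, decoding it as UTF-8."""
    with open(path, encoding="utf-8") as f:
        return f.read()


def round_trip(text):
    """Save text to a temporary flat file and return what is read back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt")
        save_text(path, text)
        return load_text(path)
```

Passing the encoding explicitly on both write and read avoids depending on the platform default. A letter typed as a base character plus a combining mark, such as `"d\u032d"`, comes back as the single code point `"\u1e13"` because of the NFC step.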

PROBLEMS WITH THE PROTOTYPE

USER      OCCUPATION
USER 1    MSc STATS
USER 2    BSc Honors
USER 3    BSc Microbiology
USER 4    UCT MSc Sociology

USABILITY TESTING

First Screen

File Menu
◦ Open a file
◦ Exit

View Menu
◦ Word frequency
◦ Count words

Edit
◦ Clear

Help

Second Screen

File Menu
◦ Save a file
◦ Exit

View Menu
◦ Word frequency
◦ Count words

Edit
◦ Word model
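The two View Menu items, word frequency and count words, could be computed along these lines. This is only a sketch: the function names `tokenize`, `count_words`, and `word_frequency` are hypothetical, the tokenisation rule (runs of word characters, lowercased) is an assumption, and the sample tokens in the usage note are arbitrary strings rather than checked Venda.

```python
import re
import unicodedata
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens."""
    # Normalise to NFC first: a combining mark on its own is not a word
    # character, so a decomposed letter would otherwise split a word in two.
    text = unicodedata.normalize("NFC", text)
    return re.findall(r"\w+", text.lower())


def count_words(text):
    """Total number of word tokens in the text."""
    return len(tokenize(text))


def word_frequency(text):
    """Map each distinct word to the number of times it occurs."""
    return Counter(tokenize(text))
```

For the arbitrary sample `"t\u032dari \u1e71ari mu\u1e71a"`, `count_words` gives 3 and `word_frequency` counts the first two tokens as the same word, since both normalise to `"\u1e71ari"`.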

USER’S GUIDE

Open a file
◦ View user manual

Tagging a file
◦ Search for multiple occurrences of a word
◦ Insert a diacritic
◦ Copy and paste
◦ Save a file

Exit the system

THE DEMO