BY TSHISHONGA AW 2859268

11/04/08

Co-Supervisor: Mr Reg Dodds

Supervisor: Professor I.M. Venter

APPLYING VENDA TEXT TOWARDS THE DEVELOPMENT OF AN INTELLIGENT PARTS-OF-SPEECH TAGGER

Part-of-speech (POS) tagging is the process of assigning each word its part-of-speech tag.

A part-of-speech tag is a label such as noun, verb, or adjective.

POS tagging is done by looking at each word's relationship with adjacent words.

A simplified form is taught to school children. The Venda language has unique diacritics.

INTRODUCTION

A Venda translator: very ambitious

A generic tagger: still ambitious

Best solution: a parts-of-speech tagger that lets the user change tags to resolve tag ambiguity.

• Compute initial Hidden Markov Models (HMMs)
• Compute test data
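
The Hidden Markov Model step mentioned on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the toy English corpus, the tag names (DET, NOUN, VERB), the add-one smoothing, and the function names train_hmm and viterbi are all introduced here for the example.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # hypothetical start-of-sentence marker


def train_hmm(tagged_sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    transitions = defaultdict(Counter)
    emissions = defaultdict(Counter)
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            transitions[prev][tag] += 1
            emissions[tag][word.lower()] += 1
            prev = tag
    return transitions, emissions


def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability of key under the given counts."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def viterbi(words, transitions, emissions):
    """Return the most probable tag sequence for words."""
    if not words:
        return []
    tags = list(emissions)
    n_tags = len(tags)
    # Vocabulary size plus one slot for unseen words.
    n_words = len({w for c in emissions.values() for w in c}) + 1

    # best[i][tag] = (log score of best path ending in tag, previous tag)
    best = [{}]
    for tag in tags:
        score = (log_prob(transitions[START], tag, n_tags)
                 + log_prob(emissions[tag], words[0].lower(), n_words))
        best[0][tag] = (score, None)

    for i in range(1, len(words)):
        best.append({})
        for tag in tags:
            emit = log_prob(emissions[tag], words[i].lower(), n_words)
            best[i][tag] = max(
                (best[i - 1][p][0] + log_prob(transitions[p], tag, n_tags) + emit, p)
                for p in tags
            )

    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


# Toy training data (an assumption for illustration only).
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
]
transitions, emissions = train_hmm(corpus)
print(viterbi(["the", "cat", "barks"], transitions, emissions))
```

In a tagger that lets the user change tags, the corrected sentences could be fed back into train_hmm so that the counts improve over time; that feedback loop is a design suggestion rather than something the slides describe.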

THE DEVELOPMENT PROCESS

Abney, S. Part-of-speech tagging and partial parsing.

Brill, E. A simple rule-based part-of-speech tagger.

Prez, L. C. IEEE information theory society newsletter.

Samuelsson, C., and Voutilainen, A. Comparing a linguistic and a stochastic tagger. pp. 246–253.

Shannon, C. E. A mathematical theory of communication.

Research User Requirements

Prototype

REQUIREMENT ANALYSIS

• For all the code

• For all the databases

IMPLEMENTATION TOOLS

GUI was criticized
• Change the GUI

Displaying diacritics on the GUI
• Use DejaVu fonts

MySQL database not writing diacritics
• Change the character encoding
• Use a flat file
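
The flat-file option for storing diacritics can be sketched as follows. This is only an illustration under assumptions: the tab-separated layout, the function names save_tagged and load_tagged, and the NOUN tag are hypothetical and do not come from the prototype. The sketch writes the file as UTF-8 and normalizes each word to NFC so that a letter typed as a base character plus a combining mark is stored the same way as its precomposed form.

```python
import os
import tempfile
import unicodedata


def save_tagged(path, pairs):
    """Write (word, tag) pairs to a UTF-8, tab-separated flat file."""
    with open(path, "w", encoding="utf-8") as f:
        for word, tag in pairs:
            # NFC turns "d" + U+032D (combining circumflex below) into U+1E13.
            f.write(f"{unicodedata.normalize('NFC', word)}\t{tag}\n")


def load_tagged(path):
    """Read the (word, tag) pairs back from the flat file."""
    with open(path, encoding="utf-8") as f:
        return [tuple(line.rstrip("\n").split("\t")) for line in f if line.strip()]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "tagged.txt")
    # "Tshivenda" spelled with a decomposed d + combining circumflex below.
    save_tagged(path, [("Tshivend\u032da", "NOUN")])
    print(load_tagged(path))
```

Passing the encoding explicitly to open avoids depending on the platform default, which is a common reason diacritics are lost when text is written out.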

PROBLEMS WITH THE PROTOTYPE

USER      OCCUPATION
USER 1    MSc Statistics
USER 2    BSc Honors
USER 3    BSc Microbiology
USER 4    UCT MSc Sociology

USABILITY TESTING

First Screen
File Menu
◦ Open a file
◦ Exit
View Menu
◦ Word frequency
◦ Count words
Edit
◦ Clear
Help

Second Screen
File Menu
◦ Save a file
◦ Exit
View Menu
◦ Word frequency
◦ Count words
Edit
◦ Word model
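
The "Word frequency" and "Count words" items in the View menu could be computed along the following lines. This is a rough sketch, not the tool's actual code: the function names and the tokenization rule (runs of letters, digits, and underscores after NFC normalization and lowercasing) are assumptions made for the example.

```python
import re
import unicodedata
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens, keeping diacritics intact."""
    # Normalize first so a base letter plus combining mark becomes one
    # precomposed letter; \w does not match a bare combining mark.
    normalized = unicodedata.normalize("NFC", text).lower()
    return re.findall(r"\w+", normalized)


def count_words(text):
    """Total number of word tokens in text."""
    return len(tokenize(text))


def word_frequency(text):
    """List of (word, count) pairs, most frequent first."""
    return Counter(tokenize(text)).most_common()


sample = "Tshiven\u1e13a text and more Tshivend\u032da text"
print(count_words(sample))
print(word_frequency(sample))
```

Normalizing before counting means the same word typed in two different Unicode forms is counted as a single entry.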

USER’S GUIDE

[1] Abney, S. Part-of-speech tagging and partial parsing. In Corpus-Based Methods in Language and Speech (Dordrecht, 1996), K. Church, S. Young, and G. Bloothooft, Eds., Kluwer Academic Publishers.

[2] Brill, E. A simple rule-based part-of-speech tagger. In Proceedings of ANLP-92, 3rd Conference on Applied Natural Language Processing (Trento, IT, 1992), pp. 152–155.

[3] Prez, L. C. IEEE Information Theory Society Newsletter. ISSN 105 53, 04 (2003), pp. 1–10.

[4] Samuelsson, C., and Voutilainen, A. Comparing a linguistic and a stochastic tagger. In Proceedings of the Eighth Conference on European Chapter of the Association for Computational Linguistics (Morristown, NJ, USA, 1997), Association for Computational Linguistics, pp. 246–253.

[5] Shannon, C. E. A mathematical theory of communication. The Bell System Technical Journal (1948), pp. 1–12.

REFERENCES

Open a file
◦ View user manual

Tagging a file
◦ Search for multiple occurrences of a word
◦ Insert a diacritic
◦ Copy and paste
◦ Save a file

Exit the system

THE DEMO