Designing POS Tag Set for Kannada - Vijayalaxmi - LDC-IL
DESIGNING POS TAG SET FOR KANNADA
Presented by: Vijayalaxmi F. Patil
LDC-IL
CONTENTS
Introduction
Dravidian Languages
Tag set : Meaning and Structure
Kannada Tag set : Category, Type, Attribute
Conclusion
INTRODUCTION
This paper presents the importance and structure of a part-of-speech (POS) tag set for Kannada, one of the major languages of the Dravidian language family.
POS tagging is the process of marking up each word in a text or corpus with a particular part of speech, based on both its definition and its context, i.e. its relationship with adjacent and related words in a phrase, sentence or paragraph.
POS tagging is often the first stage of natural language processing, after which further processing such as chunking and parsing is done. Tags play a vital role in speech recognition, information retrieval and information extraction.
Recent machine learning techniques make use of corpora to acquire high-level language knowledge. This knowledge is estimated from corpora that are usually tagged with the correct part-of-speech labels. Many words occurring in natural language texts are not listed in any catalog or lexicon.
DRAVIDIAN LANGUAGES
The South Indian languages derive from a common source, and these cognate languages constitute a single family known as the Dravidian family. There are about 23 languages in the Dravidian language family, which appears to be unrelated to any other known language family. There are more than 200 million speakers of Dravidian languages. Dravidian languages are classified on the basis of geography, shared innovations and characteristic features into three subgroups, namely:
South Dravidian
Central Dravidian
North Dravidian
South Dravidian languages: As the name suggests, these are the languages spoken in the southern part of India. They are eight in number, viz. Kannada, Malayalam, Tamil, Tulu, Kodagu, Badaga, Toda and Kota.
Central Dravidian languages: These are the languages spoken in the central part of India. They are 12 in number, viz. Telugu, Gondi, Konda, Kui, Kuvi, Pengo, Manda, Kolami, Naiky, Parji, Gadaba Ollari and Gadaba Sillur.
North Dravidian languages: These are the languages spoken in the northern part of India. They are three in number, viz. Kurukh, Malto and Brahui.
Kannada is spoken predominantly in the state of Karnataka; its native speakers are called Kannadigas (ಕನ್ನಡಿಗರು, Kannadigaru). It is the 27th most spoken language in the world. It is one of the scheduled languages of India and the official and administrative language of the state of Karnataka.
Based on the recommendations of the Committee of Linguistic Experts appointed by the Ministry of Culture, the Government of India officially recognized Kannada as a classical language. Over later centuries, Kannada, along with other Dravidian languages such as Telugu, Tamil and Malayalam, has been greatly influenced by Sanskrit in vocabulary, grammar and literary style.
Tag set : Meaning and Structure
What is a tag set?
A set of defined tags, i.e. a set of word categories to be applied to the word tokens of a text.
Types of tag set
Flat tag set
Hierarchical tag set
Fine-grained tag set
A flat tag set simply lists the categories applicable to a particular language, without any provision for modularity or feature reusability.
A hierarchical tag set is one in which the categories are structured relative to one another, rather than forming a large number of independent categories. A hierarchical tag set contains a small number of Categories; each Category contains a number of Types, and each Type contains Attributes, and so on, in a tree-like structure.
A fine-grained tag set is one in which minute distinctions are considered, which makes it accurate in syntactic analysis.
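The tree-like Category, Type, Attribute structure described above can be sketched as nested dictionaries. This is only an illustrative sketch: the names `TAGSET`, `types_of`, `attributes_of` and `flat_tags` are assumptions, and the entries are a small excerpt of the Kannada tag set presented later in these slides.

```python
from typing import Dict, List

# Category -> Type -> ordered attribute names.
# Illustrative excerpt only; the layout as nested dicts is an assumption.
TAGSET: Dict[str, Dict[str, List[str]]] = {
    "N": {  # Noun
        "NC": ["Gender", "Number", "Case Marker", "Adverbial suffix",
               "Adjectival suffix", "Post-position", "Negative", "Clitic"],
        "NV": ["Case Marker", "Post-position", "Negative", "Clitic"],
    },
    "D": {  # Demonstrative
        "DAB": ["Dimension"],
        "DWH": [],
    },
}


def types_of(category: str) -> List[str]:
    """Return the Type codes that belong to one Category."""
    return list(TAGSET[category])


def attributes_of(type_code: str) -> List[str]:
    """Return the ordered attribute names of a Type, searching all Categories."""
    for types in TAGSET.values():
        if type_code in types:
            return types[type_code]
    raise KeyError(type_code)


def flat_tags() -> List[str]:
    """Flatten the hierarchy into a plain list of Type codes (a 'flat' view)."""
    return [type_code for types in TAGSET.values() for type_code in types]


print(types_of("N"))
print(attributes_of("DAB"))
print(flat_tags())
```

Flattening the tree with `flat_tags()` discards exactly the information that a flat tag set lacks: which Types share a Category and which Attributes can be reused across Types.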
The present paper is based on a hierarchical tag set.
Preprocessing: The process of normalizing text before tokenization.
Part of speech: A category that groups lexical items which perform similar grammatical functions.
Lexicon: A list of possible tags for the root forms of all the valid words in a given language.
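The three terms above can be tied together in a minimal sketch: normalize the text, split it into tokens, then look each token up in a lexicon of possible tags. Everything here is an assumption made for illustration: the function names, the choice of Unicode NFC as the normalization, the toy `LEXICON` entries (including the ambiguous entry for ಬಹಳ), and the fallback to `UNK`. A real lexicon is keyed on root forms, which would require morphological analysis that this sketch skips.

```python
import string
import unicodedata
from typing import Dict, List

# Hypothetical toy lexicon: word form -> possible tags.
LEXICON: Dict[str, List[str]] = {
    "ಆ": ["DAB"],            # 'that'
    "ಯಾವ": ["DWH"],          # 'which'
    "ಬಹಳ": ["JINT", "JQ"],   # assumption: listed as ambiguous for illustration
}


def preprocess(text: str) -> str:
    """Normalize to Unicode NFC and collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", text)
    return " ".join(text.split())


def tokenize(text: str) -> List[str]:
    """Split on whitespace and strip ASCII punctuation from token edges."""
    tokens = []
    for chunk in text.split():
        word = chunk.strip(string.punctuation)
        if word:
            tokens.append(word)
    return tokens


def possible_tags(token: str) -> List[str]:
    """Look up candidate tags; tokens missing from the lexicon get UNK."""
    return LEXICON.get(token, ["UNK"])


for token in tokenize(preprocess("  ಆ   ಬಹಳ, xyz ")):
    print(token, possible_tags(token))
```

Whitespace splitting is used instead of a `\w+` regular expression because Python's `\w` does not match Kannada vowel signs, so a regex tokenizer would break words apart.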
KANNADA TAG SET
Category
Noun (N)
Pronoun (P)
Demonstrative (D)
Nominal Modifier (J)
Verb (V)
Adverb (A)
Participle (L)
Particle (C)
Numeral (NUM)
Reduplication (RDP)
Residual (RD)
Unknown (UNK)
Punctuation (PU)
NOUN
| Category | Type | Attributes |
|---|---|---|
| Noun (N) | Common (NC) | Gender, Number, Case Marker, Adverbial suffix, Adjectival suffix, Post-position, Negative, Clitic |
| | Proper (NP) | Gender, Number, Case Marker, Adverbial suffix, Adjectival suffix, Post-position, Negative, Clitic |
| | Verbal (NV) | Case Marker, Post-position, Negative, Clitic |
| | Spatio-temporal (NST) | Dimension, Case Marker, Post-position, Clitic |

E.g.
(1) ಮನುಷ್ಯರೇ \NC.hum.pl.nom.0.0.0.0.emp ‘people’
(2) ರಮೇಶನೊಡನೆ \NP.mas.sg.gen.0.0.pp.0.0 ‘with Ramesh’
(3) ಮಾಡುವುದನ್ನೇ \NV.acc.0.0.emp ‘doing’
(4) ಅಲ್ಲಿಯವರೆಗೂ \NST.dis.gen.pp.incl ‘till there’
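The noun examples follow the pattern word, backslash, Type code, then one dot-separated value per attribute in the order listed for that Type. A minimal sketch of reading such an annotation back into named attributes is given below. The function name `parse_annotation` and the table `NOUN_ATTRIBUTES` are assumptions, as is the reading of `0` as "attribute not present"; `WORD` stands in for the Kannada word form.

```python
from typing import Dict, List, Tuple

# Ordered attribute names per noun Type, as listed on the slide.
NOUN_ATTRIBUTES: Dict[str, List[str]] = {
    "NC": ["Gender", "Number", "Case Marker", "Adverbial suffix",
           "Adjectival suffix", "Post-position", "Negative", "Clitic"],
    "NP": ["Gender", "Number", "Case Marker", "Adverbial suffix",
           "Adjectival suffix", "Post-position", "Negative", "Clitic"],
    "NV": ["Case Marker", "Post-position", "Negative", "Clitic"],
    "NST": ["Dimension", "Case Marker", "Post-position", "Clitic"],
}


def parse_annotation(annotated: str) -> Tuple[str, str, Dict[str, str]]:
    """Split 'word\\TYPE.v1.v2...' into (word, type code, attribute -> value)."""
    word, tag = annotated.rsplit("\\", 1)
    type_code, *values = tag.split(".")
    names = NOUN_ATTRIBUTES[type_code]
    if len(values) != len(names):
        raise ValueError(
            f"{type_code} expects {len(names)} values, got {len(values)}"
        )
    # Assumption: a value of "0" means the attribute is absent on this word.
    return word.strip(), type_code, dict(zip(names, values))


word, type_code, features = parse_annotation(r"WORD \NST.dis.gen.pp.incl")
print(type_code, features)
```

Because the values are positional, the length check matters: a tag with a missing or extra value would otherwise be silently misaligned with the attribute names.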
PRONOUN

| Category | Type | Attributes |
|---|---|---|
| Pronoun (P) | Pronominal (PRP) | Gender, Number, Person, Case Marker, Dimension, Adverbial suffix, Adjectival suffix, Post-position, Negative, Clitic |
| | Reflexive (PRF) | Gender, Number, Person, Case Marker, Adverbial suffix, Post-position, Negative, Clitic |
| | Reciprocal (PRC) | Gender, Number, Person, Case Marker, Adverbial suffix, Post-position, Negative, Clitic |
| | Wh-Pronoun (PWH) | Gender, Number, Person, Case Marker, Adverbial suffix, Adjectival suffix, Post-position, Negative, Clitic |

E.g.
(5) ಅವಳು \PRP.fem.sg.3rd.nom.dis.0.0.0.0.0 ‘she’
(6) ತಾವೇ \PRF.hum.pl.nom.0.0.0.emp ‘yourself’
(7) ಪರಸ್ಪರ \PRC.hum.pl.0.nom.0.0.0.0.0 ‘reciprocal’
(8) ಯಾರು \PWH.hum.0.0.nom.0.0.0.0.0 ‘who’
DEMONSTRATIVE
| Category | Type | Attributes |
|---|---|---|
| Demonstrative (D) | Absolute (DAB) | Dimension |
| | Wh-demonstrative (DWH) | |

E.g.
(9) ಆ \DAB.dis ‘that’
(10) ಯಾವ \DWH ‘which’
NOMINAL MODIFIER
| Category | Type | Attributes |
|---|---|---|
| Nominal Modifier (J) | Adjective (JJ) | Negative, Adjectival suffix, Clitic |
| | Quantifier (JQ) | Gender, Number, Numeral, Case Marker, Adverbial suffix, Adjectival suffix, Post-position, Dimension, Negative, Clitic |
| | Intensifier (JINT) | Clitic |

E.g.
(11) ಸುಂದರವಾದ \JJ.0.adj.0 ‘beautiful’
(12) ಅಷ್ಟನ್ನೇ \JQ.nue.0.nnm.acc.0.0.0.dis.0.emp ‘that much’
(13) ಬಹಳ \JINT.0 ‘much’
VERB
| Category | Type | Attributes |
|---|---|---|
| Verb (V) | | Gender, Number, Person, Tense, Causative, Aspect, Mood, Finiteness, Negative, Defective verb, Clitic |
E.g.
(14) ಬರುತ್ತಾಳೆಯೇ \V.fem.sg.3rd.fut.n.prg.intr.nfn.n.n.intr ‘will she come?’
(15) ತಿನ್ನುತ್ತಿದ್ದಳು \V.fem.sg.3rd.pst.n.prg.0.nfn.n.n.0 ‘she was eating’
ADVERB

| Category | Type | Attributes |
|---|---|---|
| Adverb (A) | Manner (AMN) | Clitic |

E.g.
(16) ನಿಧಾನವಾಗಿಯೇ \AMN.emp ‘slowly’
PARTICIPLE

| Category | Type | Attributes |
|---|---|---|
| Participle (L) | Relative (LRL) | Tense, Negative, Adjectival suffix, Post-position, Clitic |
| | Verbal (LV) | Tense, Negative, Clitic |
| | Nominal (LN) | Gender, Number, Tense, Negative, Case Marker, Adverbial suffix, Adjectival suffix, Post-position, Clitic |
| | Conditional (LC) | Adjectival suffix, Negative, Clitic |

E.g.
(17) ಬಂದ \LRL.pst.0.0.0.emp ‘which has come’
(18) ಹೋಗಿ \LV.pst.0 ‘go’
(19) ಬರದವರು \LN.hum.pl.pst.y.nom.0.0.0.0 ‘those who have not come’
(20) ಹೇಳದಿದ್ದರೆ \LC.0.y.0 ‘if not tell’
PARTICLE

| Category | Type | Attributes | Examples |
|---|---|---|---|
| Particle (C) | Co-ordinating (CCD) | Clitic | (21) ಮತ್ತು (‘and’), ಆದರೂ (‘but’) |
| | Subordinating (CSB) | | (22) ಅಥವಾ (‘or’) |
| | Interjection (CIN) | | (23) ಓ (‘oh’), ಅಯ್ಯೋ (‘alas’) |
| | (Dis)Agreement (CAGR) | | (24) ಹೌದು (‘yes’), ಅಲ್ಲ (‘no’) |
| | Confirmative (CCON) | | (25) ಹೌದಲ್ವಾ, ಅಲ್ವಾ (‘isn’t it’) |
| | Delimitive (CDLIM) | Clitic | (26) ಮಾತ್ರ, ಕೇವಲ (‘only’) |
| | Dubitative (CDUB) | | (27) ಬಹುಶಃ (‘probably’) |
| | Inclusive (CINCL) | | (28) ಕೂಡ (‘also’) |
| | Others (CX) | | |
NUMERAL
| Category | Type | Attributes | Examples |
|---|---|---|---|
| Numeral (NUM) | Real (NUMR) | Case marker, Clitic, Adverbial suffix, Postposition | (29) 10, 20, 30, 40 |
| | Serial (NUMS) | | (30) 10.5, 25.02 |
| | Calendric (NUMC) | | (31) |
| | Ordinal (NUMO) | | (32) 3rd, 4th, 20th |
REDUPLICATION

| Category | Type | Attributes |
|---|---|---|
| Reduplication (RDP) | | Gender, Number, Person, Case Marker, Post-position, Adverbial suffix, Clitic |

E.g.
(33) ಒಬ್ಬೊಬ್ಬರಾಗಿ \RDP.hum.pl.0.nom.0.adv.0 ‘one by one’
(34) ಅವರವರೊಡನೆ \RDP.hum.pl.3rd.gen.pp.0.0 ‘with them’
RESIDUAL
| Category | Type | Attributes |
|---|---|---|
| Residual (RD) | Foreign Word (RDF) | |
| | Symbol (RDS) | |

E.g.
(35) काम ‘work’
(36) Ink
(37) @ # $ & %
UNKNOWN
Category
Unknown (UNK)
E.g. (38) ಯದಾಯದಾ ಧರ್ಮಸ್ಯ ‘Sanskrit shloka’
PUNCTUATION
Category
Punctuation(PU) (39), . / ? “ : ; } [ \ | = + _ /
ATTRIBUTES AND THEIR VALUES

| Attribute | Values |
|---|---|
| Person \PER | First\1, Second\2, Third\3 |
| Number \NUM | Singular\sg, Plural\pl |
| Gender \GEN | Masculine\mas, Feminine\fem, Neuter\neu, Human\hum |
| Case Marker \CSM | Nominative\nom, Accusative\acc, Instrumental\ins, Dative\dat, Ablative\abl, Genitive\gen, Locative\loc |
| Tense \TNS | Present\prs, Past\pst, Future\fut |
| Aspect | Imperfect\ipfv, Perfect\prf, Progressive\prog |
| Mood \MOOD | Interrogative\int, Habitual\hab, Imperative\imp, Optative\opt, Hortative\hort, Debitive\debt, Potential\potn |
| Finiteness \FIN | Finite\fin, Non-finite\nfn, Infinitive\inf |
| Dimension \DIM | Proximal\prx, Distal\dst |
| Clitic \CL | Interrogative\int, Inclusive\incl, Indefiniteness\ind, Emphatic\emp, Comparative\com, Hearsay\hers |
| Numeral \NML | Cardinal\crd, Ordinal\ord, Non-numeral\nnm |
| Negative \NEG | Yes\y, No\n |
| Adverbial suffix | \adv |
| Adjectival suffix | \adj |
| Defective verb \DEF | Yes\y, No\n |
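A value table like the one above lends itself to a simple consistency check on annotated data. The sketch below is illustrative only: the names `ALLOWED`, `ABSENT` and `invalid_values` are assumptions, only some attributes are included, and treating `0` as "attribute not present" is inferred from the examples rather than stated on the slides.

```python
from typing import Dict, Set

# Allowed value codes per attribute, copied from the table (partial).
ALLOWED: Dict[str, Set[str]] = {
    "Person": {"1", "2", "3"},
    "Number": {"sg", "pl"},
    "Gender": {"mas", "fem", "neu", "hum"},
    "Case Marker": {"nom", "acc", "ins", "dat", "abl", "gen", "loc"},
    "Tense": {"prs", "pst", "fut"},
    "Dimension": {"prx", "dst"},
    "Clitic": {"int", "incl", "ind", "emp", "com", "hers"},
    "Negative": {"y", "n"},
}

ABSENT = "0"  # assumption: "0" marks an attribute that does not apply


def invalid_values(features: Dict[str, str]) -> Dict[str, str]:
    """Return the attribute/value pairs that are not listed in ALLOWED.

    Attributes missing from ALLOWED are skipped rather than rejected.
    """
    return {
        name: value
        for name, value in features.items()
        if value != ABSENT and name in ALLOWED and value not in ALLOWED[name]
    }


print(invalid_values({"Gender": "hum", "Number": "pl", "Clitic": "emp"}))
print(invalid_values({"Dimension": "dis", "Clitic": "emp"}))
```

A check of this kind would flag spellings that differ from the table; for instance, some examples in these slides write `dis` where the table lists `dst` for Distal.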
CONCLUSION
The use of morphological features is especially helpful in developing a reasonable POS tagger when tagged resources are limited. In POS tagging, one word may have more than one part-of-speech label. Syntactic and semantic parsing of natural language sentences is generally influenced by adequate part-of-speech tagging.