NooJ Conference, Inalco, Saarbruecken, June 5th, 2013


Russian Module for NooJ: Semantic annotation. Design and implementation of semantic tags for the Russian language in Max Silberztein’s NooJ software. NooJ Conference, Inalco, Saarbruecken, June 5th, 2013. Vincent BÉNET, INALCO, CREE, Recherche assistée par ordinateur (computer-assisted research).


1

NooJ Conference, Inalco, Saarbruecken

June 5th, 2013

Vincent BÉNET, INALCO

CREE, Recherche assistée par ordinateur (computer-assisted research)

Design and implementation of semantic tags for the Russian language in Max Silberztein’s NooJ software

Russian Module for NooJ: Semantic annotation

ORDIDOM

2

- one main dictionary (95,000 entries)
- two annex dictionaries:
  - one for proper nouns
  - one for noun-adjectives

Russian Module for NooJ: design and implementation of lexical and grammatical resources

3

How?
- by adding tags to the general dictionary
- by writing grammars

Semantic tagging or annotation?

Russian Module for NooJ: design and implementation of basic semantic resources

4

Writing semantic resources for the Russian language

The semantic tags of the Russian National Corpus:

- Taxonomy (a lexeme's thematic class): for nouns, verbs, adjectives and adverbs
- Mereology (“part – whole” and “element – aggregate” relationships): for concrete and abstract nouns
- Topology: for concrete nouns
- Causation: for verbs
- Evaluation: for abstract and concrete nouns, adjectives and adverbs

5

Writing semantic resources for the Russian language

27 semantic taxonomic tags for verbs

- t:move: movement (бежать, дергаться, бросить, нести)
- t:be: sphere of existence (жить, возникнуть, убить)
- t:loc: location (лежать, стоять, положить)
- t:poss: sphere of possession (иметь, дать, подарить, приобрести)
- t:ment: mental sphere (знать, верить, догадаться, помнить)
- t:perc: perception (смотреть, слышать, нюхать, чуять)
- t:speech: speech (говорить, советовать, спорить, каламбурить)
- t:sound: sounds (гудеть, шелестеть)
- t:light: light (гаснуть, лучиться)
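To make the structure of this taxonomy concrete, here is a minimal Python sketch that stores the nine tags listed above (out of 27) as a plain table and inverts it into a lemma-to-tags index. The names `VERB_TAXONOMY` and `build_index` are hypothetical; this is not how the Russian National Corpus or NooJ actually store their tags.

```python
from collections import defaultdict

# The nine RNC verb tags listed on the slide (the full inventory has 27),
# each with its English gloss and the slide's example lemmas.
VERB_TAXONOMY = {
    "t:move": ("movement", ["бежать", "дергаться", "бросить", "нести"]),
    "t:be": ("sphere of existence", ["жить", "возникнуть", "убить"]),
    "t:loc": ("location", ["лежать", "стоять", "положить"]),
    "t:poss": ("sphere of possession", ["иметь", "дать", "подарить", "приобрести"]),
    "t:ment": ("mental sphere", ["знать", "верить", "догадаться", "помнить"]),
    "t:perc": ("perception", ["смотреть", "слышать", "нюхать", "чуять"]),
    "t:speech": ("speech", ["говорить", "советовать", "спорить", "каламбурить"]),
    "t:sound": ("sounds", ["гудеть", "шелестеть"]),
    "t:light": ("light", ["гаснуть", "лучиться"]),
}


def build_index(taxonomy):
    """Invert tag -> examples into lemma -> list of tags (a lemma may have several)."""
    index = defaultdict(list)
    for tag, (_gloss, examples) in taxonomy.items():
        for lemma in examples:
            index[lemma].append(tag)
    return dict(index)


INDEX = build_index(VERB_TAXONOMY)
print(INDEX["слышать"])         # ['t:perc']
print(INDEX.get("читать", []))  # [] (not among the slide's examples)
```

A list rather than a single tag is kept per lemma because a verb can, in principle, belong to more than one thematic class.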

6

Semantic information in the Russian National Corpus (verbs)

7

Writing semantic resources for the Russian language

khodit’,V+Mvt+Indet+ipf+intr+FLX=ходить
Idti,V+Mvt+Det+ipf+intr+FLX=идти

Vkhodit’,V+Mvt+Pvb+ipf+intr+FLX=ходить
Vojti,V+Mvt+Pvb+pf+intr+FLX=идти
Vykhodit’,V+Mvt+Pvb+ipf+intr+FLX=ходить
Priezzhat’,V+Mvt+Pvb+ipf+intr+FLX=акать
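Each entry above follows the pattern `lemma,CATEGORY+code+...+FLX=paradigm`. The sketch below parses one such line into its parts, assuming only this two-field form (NooJ also accepts other entry layouts that it does not handle). The `Entry` class and `parse_entry` function are hypothetical helpers, not part of NooJ.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    lemma: str
    category: str
    flags: set = field(default_factory=set)    # bare codes such as Mvt, ipf
    attrs: dict = field(default_factory=dict)  # key=value codes such as FLX


def parse_entry(line):
    """Parse a two-field 'lemma,CATEGORY+code+...+KEY=value' dictionary line."""
    lemma, _, info = line.partition(",")
    category, *codes = [part.strip() for part in info.split("+")]
    entry = Entry(lemma.strip(), category)
    for code in codes:
        if "=" in code:
            key, _, value = code.partition("=")
            entry.attrs[key] = value
        elif code:
            entry.flags.add(code)
    return entry


verb = parse_entry("khodit’,V+Mvt+Indet+ipf+intr+FLX=ходить")
print(verb.category, sorted(verb.flags), verb.attrs)
# V ['Indet', 'Mvt', 'intr', 'ipf'] {'FLX': 'ходить'}
```

Splitting only on the first comma keeps multiword lemmas such as `pered tem kak` intact, on the assumption that lemmas themselves contain no commas.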

8

Grammar to locate the verbs of motion
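The grammar itself (a NooJ graph) is not preserved in this transcript. As a rough, dictionary-level approximation only, the Python sketch below keeps the entries coded `V+Mvt` and groups them by the `Det`, `Indet` and `Pvb` codes. It is not the author's grammar and does no syntactic matching; `split_entry` and `motion_verbs_by_subtype` are hypothetical names.

```python
from collections import defaultdict

# Dictionary entries from the motion-verb slide, plus one non-motion verb.
ENTRIES = [
    "khodit’,V+Mvt+Indet+ipf+intr+FLX=ходить",
    "Idti,V+Mvt+Det+ipf+intr+FLX=идти",
    "Vkhodit’,V+Mvt+Pvb+ipf+intr+FLX=ходить",
    "Vojti,V+Mvt+Pvb+pf+intr+FLX=идти",
    "Vykhodit’,V+Mvt+Pvb+ipf+intr+FLX=ходить",
    "Priezzhat’,V+Mvt+Pvb+ipf+intr+FLX=акать",
    "zelenet’,V+intr+ipf+Color+FLX=belet’",  # not coded +Mvt, so filtered out
]


def split_entry(line):
    """Return (lemma, set of codes) for a 'lemma,CAT+code+...' line."""
    lemma, _, info = line.partition(",")
    return lemma.strip(), {code.strip() for code in info.split("+")}


def motion_verbs_by_subtype(lines, subtypes=("Det", "Indet", "Pvb")):
    """Keep V+Mvt entries and group their lemmas by motion subtype."""
    groups = defaultdict(list)
    for line in lines:
        lemma, codes = split_entry(line)
        if {"V", "Mvt"} <= codes:
            for subtype in subtypes:
                if subtype in codes:
                    groups[subtype].append(lemma)
    return dict(groups)


for subtype, lemmas in motion_verbs_by_subtype(ENTRIES).items():
    print(subtype, lemmas)
```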

9

Searching for « verbs of motion » with NooJ

10

Searching for « verbs of motion » with NooJ

11

Writing semantic resources for the Russian language

- concrete nouns (девочка, стол, молоко)
- abstract nouns (вождение, яркость, время)
- proper names (Иван, Эйнштейн, Петроград)

- person (человек, учитель)
- ethnonyms (эфиоп, итальянка)
- kinship terms (брат, бабушка)
- supernatural creatures (русалка, инопланетянин)
- animals (корова, жираф, сорока, ящерица, муравей)
- plants (береза, роза, трава)

etc.

12

Semantic information in the Russian National Corpus (nouns)

13

Semantic information in the Russian National Corpus (adjectives)

14

Semantic information in the Russian National Corpus (adverbs)

15

Writing basic semantic resources for the Russian language

NooJ properties.def file

N_Genre = m | f | n ;
N_SGenr = an | inan ;
N_Nombre = s | p ;
N_Cas = Im | Vi | Ro | R2 | Da | Tv | Pr | P2 | Zv ;
…
V_Type = Mvt ;
V_Morph = Pref | Suff ;
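Each declaration above names a category and a property (`N_Genre` is property `Genre` of category `N`) and lists its allowed values separated by `|`. The Python sketch below reads such declarations into a nested table, assuming one declaration per line as in this excerpt; the real properties.def syntax may allow more than this simplified reading. `parse_properties` is a hypothetical helper.

```python
SAMPLE = """\
N_Genre = m | f | n ;
N_SGenr = an | inan ;
N_Nombre = s | p ;
N_Cas = Im | Vi | Ro | R2 | Da | Tv | Pr | P2 | Zv ;
…
V_Type = Mvt ;
V_Morph = Pref | Suff ;
"""


def parse_properties(text):
    """Map category -> property -> list of allowed values."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if "=" not in line:
            continue  # skips blank lines and the elision "…"
        name, _, values = line.rstrip(";").partition("=")
        category, _, prop = name.strip().partition("_")
        props.setdefault(category, {})[prop] = [
            v.strip() for v in values.split("|") if v.strip()
        ]
    return props


PROPS = parse_properties(SAMPLE)
print(PROPS["N"]["Cas"])  # ['Im', 'Vi', 'Ro', 'R2', 'Da', 'Tv', 'Pr', 'P2', 'Zv']
print(PROPS["V"])         # {'Type': ['Mvt'], 'Morph': ['Pref', 'Suff']}
```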

16

Writing basic semantic resources for the Russian language

NooJ properties.def file

A_Sem = Animal; Color (Hum = App)

N_Sem = Hum | Prof | Parents | Body | Conc | Abstr | Org | Text | Animal | Food | Health | Arts | Lit | Music | Sports | Topo | Country | River | City | Mount | Lake | Posit | Time | Color ;

ADV_Sem = Time | Topo | Modal ;

V_Sem = Color | Topo | Posit | Modal ;
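With the allowed values declared per category, a dictionary entry can be checked against them. The Python sketch below maps each bare code of an entry to the declared property that accepts it and reports codes that none accepts. `DECLARED` copies only the values visible in the two properties.def excerpts, so anything declared elsewhere in the real file would be reported as unknown; `check_entry` and the misspelled `Citty` example are hypothetical.

```python
# Declared values copied from the two properties.def excerpts
# (only the parts shown on the slides).
DECLARED = {
    "N": {
        "Genre": {"m", "f", "n"},
        "SGenr": {"an", "inan"},
        "Nombre": {"s", "p"},
        "Sem": {"Hum", "Prof", "Parents", "Body", "Conc", "Abstr", "Org",
                "Text", "Animal", "Food", "Health", "Arts", "Lit", "Music",
                "Sports", "Topo", "Country", "River", "City", "Mount",
                "Lake", "Posit", "Time", "Color"},
    },
    "ADV": {"Sem": {"Time", "Topo", "Modal"}},
    "V": {"Sem": {"Color", "Topo", "Posit", "Modal"}},
}


def check_entry(line, declared=DECLARED):
    """Return (known, unknown) for the bare codes of a dictionary entry.

    known maps each code to the properties that declare it; unknown lists
    codes no declared property of the category accepts. KEY=value codes
    such as FLX=... are skipped.
    """
    _lemma, _, info = line.partition(",")
    category, *codes = [part.strip() for part in info.split("+")]
    properties = declared.get(category, {})
    known, unknown = {}, []
    for code in codes:
        if not code or "=" in code:
            continue
        owners = [name for name, values in properties.items() if code in values]
        if owners:
            known[code] = owners
        else:
            unknown.append(code)
    return known, unknown


print(check_entry("Moskva,N+f+inan+City+FLX=Москва"))
# ({'f': ['Genre'], 'inan': ['SGenr'], 'City': ['Sem']}, [])
print(check_entry("Moskva,N+f+inan+Citty+FLX=Москва"))  # hypothetical typo
# ({'f': ['Genre'], 'inan': ['SGenr']}, ['Citty'])
```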

17

Writing semantic resources for the Russian language

mal’chik,N+an+Hum+FLX=bul’dog
pered tem kak,CONJ+UNAMB+Time

Moskva,N+f+inan+City+FLX=Москва
Don,N+m+inan+River+FLX=Дон
Katar,N+Country+m+s+FLX=Ленинград

Nora,N+Forename+Hum+f+an+FLX=Лена

18

Writing semantic resources for the Russian language

zelënyj,A+Color+FLX=novyj
zelenovatyj,A+Color+FLX=
zelënen’kij,A+Color+FLX=novyj
temno-zelënyj,A+Color+FLX=novyj
zelen’,N+f+inan+Color+FLX=smes’
zelenet’,V+intr+ipf+Color+FLX=belet’
zazelenet’,V+intr+pf+Color+FLX=belet’
zazelenet’sja,V+sja+pf+Color+FLX=…
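Because the same `Color` code is attached to adjectives, nouns and verbs, one tag retrieves a whole word family across parts of speech. The Python sketch below shows this at the dictionary level by grouping the entries above by category; it is an illustration, not NooJ's own search, and `group_by_category` is a hypothetical name. One unrelated entry is added to show the filtering.

```python
from collections import defaultdict

LINES = [
    "zelënyj,A+Color+FLX=novyj",
    "zelenovatyj,A+Color+FLX=",             # paradigm name missing on the slide
    "zelënen’kij,A+Color+FLX=novyj",
    "temno-zelënyj,A+Color+FLX=novyj",
    "zelen’,N+f+inan+Color+FLX=smes’",
    "zelenet’,V+intr+ipf+Color+FLX=belet’",
    "zazelenet’,V+intr+pf+Color+FLX=belet’",
    "zazelenet’sja,V+sja+pf+Color+FLX=…",   # elided on the slide
    "mal’chik,N+an+Hum+FLX=bul’dog",        # not a colour word
]


def group_by_category(lines, tag="Color"):
    """Group the lemmas of entries carrying `tag` by their category code."""
    groups = defaultdict(list)
    for line in lines:
        lemma, _, info = line.partition(",")
        category, *codes = [part.strip() for part in info.split("+")]
        if tag in codes:
            groups[category].append(lemma.strip())
    return dict(groups)


COLOR_WORDS = group_by_category(LINES)
for category, lemmas in COLOR_WORDS.items():
    print(category, lemmas)
```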

19

Writing basic semantic resources for the Russian language

Prof = 900
Parent = 160 items
Forenames = 2280
Animal = 370
Food = 280 (Liquid = 25)
Body = 285
Health = 175
Arts = 65
Lit = 40
Music = 155
Sport = 65

Topo = 40
Country = 180
River = 15
City = 175
Mount = 5
Lake = 5

Posit = 25
Time = 135
Modal = 15
Color = 275
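The figures above can be totalled per group. The Python sketch below does the arithmetic, assuming that `Liquid = 25` is a subset of `Food` (so it is not added again) and using group labels of my own. The sum counts tag assignments, so an entry carrying two of these tags would be counted twice; the total is not necessarily the number of distinct entries.

```python
# Counts per semantic tag, grouped as on the slide.
# Group labels are illustrative, not the author's.
# "Liquid = 25" is left out, assumed to be a subset of Food.
COUNTS = {
    "thematic": {"Prof": 900, "Parent": 160, "Forenames": 2280, "Animal": 370,
                 "Food": 280, "Body": 285, "Health": 175, "Arts": 65,
                 "Lit": 40, "Music": 155, "Sport": 65},
    "toponymic": {"Topo": 40, "Country": 180, "River": 15, "City": 175,
                  "Mount": 5, "Lake": 5},
    "other": {"Posit": 25, "Time": 135, "Modal": 15, "Color": 275},
}

subtotals = {group: sum(tags.values()) for group, tags in COUNTS.items()}
total = sum(subtotals.values())

print(subtotals)  # {'thematic': 4775, 'toponymic': 420, 'other': 450}
print(total)      # 5645 tag assignments
```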

20

Searching for « colors » with NooJ

21

Searching for « body parts » with NooJ

22

Searching for « parents (relatives) » words with NooJ

23

Writing basic semantic resources for the Russian language

NEXT WORK TO BE DONE…

- Completion of the concrete-noun dictionary using thematic dictionaries

- A new dictionary parameter, +Translation=, so that NooJ can be used as a resource for building basic dictionaries for parallel corpora.
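The `+Translation=` parameter is only proposed here, so its exact form is not known. Assuming it would be written as an ordinary `KEY=value` code inside an entry, the Python sketch below collects lemma/translation pairs, the raw material for a basic bilingual dictionary. The entry layout and the English values are illustrative assumptions, and `translation_pairs` is a hypothetical helper.

```python
# Hypothetical entries: the slide only proposes a +Translation= parameter,
# so its position among the codes and the English values are illustrative.
ENTRIES = [
    "Moskva,N+f+inan+City+Translation=Moscow+FLX=Москва",
    "Katar,N+Country+m+s+Translation=Qatar+FLX=Ленинград",
    "Don,N+m+inan+River+FLX=Дон",  # no translation yet
]


def translation_pairs(lines, key="Translation"):
    """Collect lemma -> list of translations from KEY=value codes."""
    pairs = {}
    for line in lines:
        lemma, _, info = line.partition(",")
        for code in info.split("+")[1:]:  # skip the category code
            name, sep, value = code.strip().partition("=")
            if sep and name == key and value:
                pairs.setdefault(lemma.strip(), []).append(value)
    return pairs


print(translation_pairs(ENTRIES))
# {'Moskva': ['Moscow'], 'Katar': ['Qatar']}
```

Keeping a list per lemma leaves room for several translations of the same lemma, which a parallel-corpus dictionary would likely need.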

24

NooJ Conference, Inalco, Saarbruecken, June 5th, 2013

vincent.benet@inalco.fr

Thank you for your attention
