Detection of Links between Words in the Task of Syntactic-Semantic Analysis of Russian Texts

Dmitry V. Merkuryev
Saint-Petersburg State University, Russia
Mathematics and Mechanics Faculty
Department of Computer Science

Petrozavodsk, May 21st, 2008



Contents

1. Introduction. The task of Syntactic-Semantic Analysis of Russian Texts.
2. Syntactic and semantic analyzers.
3. Main principles of V.A. Tuzov’s theory.
4. Sentence analysis.
5. The detection of links between words.
6. Examples.
7. Conclusions.


1. Introduction. The task of Syntactic-Semantic Analysis of Russian Texts.

Natural Language Processing (NLP) is one of the most topical tasks of modern computer science. Professor V.A. Tuzov's functional model [1], [2] is an adequate approach to natural language formalization.

The syntactic-semantic analyzer is the only working system based on this theory.

It produces a syntactic structure for a Russian sentence that matches the sentence's semantic structure.

The analyzer is able to solve the word sense disambiguation problem for most sentences of journalistic and even literary Russian texts.

The detection of links between words is one of the most significant operations of the syntactic-semantic analyzer. This operation makes it possible to select the right semantic alternative of a word in the context of a sentence.


2. Syntactic and semantic analyzers.

Some current NLP parsers:

DictaScope (Russian language syntactic parser) [3]

The program automatically builds a word subordination tree. It also determines the grammatical values of the words in a sentence.

AOT (automatic text processing, Russian language) [4]

The program builds a semantic graph and performs initial semantic analysis of a text.

Link Grammar Parser (syntactic parser of English) [5]

The system assigns to a sentence a syntactic structure that consists of a set of labeled links connecting pairs of words.

All of these parsers are limited by the word sense disambiguation problem.

Therefore, Professor Tuzov’s syntactic-semantic analyzer is a unique system.


3. Main principles of V.A. Tuzov’s theory.

Thesis 1.

A language is an algebraic system {f1, f2, ..., fn, M},

where each fi is a basic function and M is the data structure (the basic concepts) of the given language.

Thesis 2.

Every word of a language is the name of a function. This function allows us to evaluate the semantics of the word. Each sentence is a superposition of these functions.

Thesis 3.

Grammar is linked with the semantics of a language and is represented by the semantic dictionary.


A function that corresponds to a word has semantic arguments and semantic-grammar types.

Semantic arguments and grammar types consist of semantic classes and prepositional-case forms.

Examples:

$16~!Вин ($16~! “Accusative”)
$15~!Где ($15~! “Where”)

$<number>: notation of a semantic class
!Вин, !наВин (“on Accusative”), !Дат (“Dative”), etc.: notations of prepositional-case forms
!Куда (“Where to”), !Где, !Кому (“Whom”), etc.: notations of generalized grammar types

Semantic-grammar types define the links in which the word attaches to another word as its argument. Semantic arguments define the links in which the word takes other words as its arguments (matched by their semantic-grammar types).
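The notation above can be made concrete with a small parser. The following Python sketch is illustrative only: the names `TypeSpec` and `parse_spec` are hypothetical, and it covers just the simple `$<number>~!<form>` shapes listed in the examples, not alternatives joined by `\` or the `@` markers that appear in analyzer output.

```python
import re
from typing import NamedTuple, Optional


class TypeSpec(NamedTuple):
    """One parsed notation item (hypothetical helper type)."""
    sem_class: Optional[str]  # e.g. "$16"; None if no class is given
    form: Optional[str]       # e.g. "Вин"; None if no form is given


# Optional "$<digits and slashes>", optional "~", optional "!<form name>".
_SPEC = re.compile(r"^(\$[0-9/]+)?~?(?:!(\w+))?$")


def parse_spec(text: str) -> TypeSpec:
    """Split a notation item into its semantic class and its form."""
    match = _SPEC.match(text.strip())
    if match is None or (match.group(1) is None and match.group(2) is None):
        raise ValueError(f"unrecognized notation: {text!r}")
    return TypeSpec(match.group(1), match.group(2))


for item in ("$16~!Вин", "$15~!Где", "!наВин", "$1"):
    print(item, "->", parse_spec(item))
```

Keeping the class and the form as separate fields makes it possible to compare them independently when two words are matched.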


Example (results from the analyzer):

Он едет в город (“He is going into the city”).

Syntax tree of the sentence:

@Глагол едет<X002.002> (@Им Он<X001.002><+МестГлаг3/2/.Шаг=1+>, @Куда в<X003.202><+ГлагОбст1/2/.Шаг=3+> (@Вин город<X004.001><+ПредлСуществ5/2/.Шаг=2+>) )

Semantic values of each word and links between them:

Он (“He”)

** <X001.002> ОН {Мест._Муж @ОНЪ$17@Им} $17()
semantics: <X001.002> ОН () \\ <2>
links: Z1: @ОНЪ$17 <= <X002.002>


едет (“is going”)

** <X002.002> ЕХАТЬ {Глагол. $15402~@Глагол} N%~ПОЕЗДКА$15402(Z1: !ОНЪ$17\!ОНА$17\!ОНО$17,Z2: ПРИЧИНА$1/37/05\ПРИКАЗ$1526031~!Почему,Z3: НЕЧТО$1~!поДат,Z4: НЕЧТО$1~!Откуда,Z5: НЕЧТО$1~!Куда,Z6: ТРАНСПОРТ$121324~!Тв\!наПред)
semantics: <X002.002> ЕХАТЬ Oper01(Z1,ПОЕЗДКА$15402(ПОЧЕМУ:Z2,ПОДАТ:Z3,ОТКУДА:Z4,КУДА:Z5,ТВ:НАПРЕД:Z6)) \\ <2>
links: Z1: @ОНЪ$17 => <X001.002> Z5: $1~@Куда => <X003.202>

в (“into”)

** <X003.202> В {Предлог. $12314~@Куда} (Z0:y> @Куда,Z1: ПОСЕЛЕНИЕ$123~!Вин)
semantics: <X003.202> В Y1>Direkt(Y1:,ВНУТРИ$12/313/05(ВИН:Z1)) \\ <200>
links: Z1: $123~@Вин => <X004.001> Z5: $1~@Куда <= <X002.002>


город (“the city”)

** <X004.001> ГОРОД {Сущв._Муж_Неодуш $12314~@ОНЪ$17@Вин} $12314(Z1: СТРАНА$1231~!Род)
semantics: <X004.001> ГОРОД (РОД:Z1) \\ <1>
links: Z1: $123~@Вин <= <X003.202>

Classifier of basic concepts.

A basic concept is a word whose meaning cannot be expressed through simpler concepts. The semantic dictionary contains more than 20000 basic concepts (nouns and adjectives). The remaining words, more than 90000 derived words, are expressed as superpositions of basic concepts and basic functions. Basic concepts are organized in a hierarchical tree (the classifier).

Main rules:

- All words of a class inherit the same semantic properties from the parent class.
- Words of a class also have their own specific characteristics.
- The name of the root class is НЕЧТО (“SOMETHING”).
- There are more than 1500 classes.


Examples:

$1         Noun                                  НЕЧТО (“SOMETHING”), СУЩЕСТВИТЕЛЬНОЕ (“NOUN”), …
$110       Noun AO (Abstract Object) Idea        ПОНЯТИЕ (“CONCEPT”), …
$1100/01   Noun AO Idea => Abstract-Concrete     АБСТРАКТНЫЙ (“ABSTRACT”), КОНКРЕТНЫЙ (“CONCRETE”), …
$12        Noun PO (Physical Object)             МАТЕРИЯ (“SUBSTANCE”), ПРОСТРАНСТВО (“SPACE”), ТЕЛО (“BODY”), …
$122       Noun PO Nature                        ПРИРОДА (“NATURE”), …
$122/1     Noun PO Nature Weather                ПОГОДА (“WEATHER”), …
$12211     Noun PO Nature Plants Trees           ДЕРЕВО (“TREE”), ДУБ (“OAK”), СОСНА (“PINE”), …
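In the class codes listed above, a more specific class appears to extend the code of its parent (for example, $12211 extends $122, which extends $12 and $1). Assuming this prefix convention holds throughout the classifier, which the slides do not state explicitly, inheritance can be checked with a plain string comparison. The names `CLASSES`, `is_subclass` and `ancestors` below are hypothetical.

```python
# Class codes copied from the examples; the descriptions are abbreviated.
CLASSES = {
    "$1": "Noun",
    "$110": "Noun AO Idea",
    "$1100/01": "Noun AO Idea => Abstract-Concrete",
    "$12": "Noun PO",
    "$122": "Noun PO Nature",
    "$122/1": "Noun PO Nature Weather",
    "$12211": "Noun PO Nature Plants Trees",
}


def is_subclass(code: str, ancestor: str) -> bool:
    """Assumption: a class belongs to `ancestor` if its code starts with it."""
    return code.startswith(ancestor)


def ancestors(code: str) -> list:
    """Known classes that `code` inherits from, from the root downwards."""
    found = [c for c in CLASSES if c != code and is_subclass(code, c)]
    return sorted(found, key=len)


print(ancestors("$12211"))
print(is_subclass("$12314", "$123"))
```

Under the same assumption, a noun of class $12314 would satisfy an argument that requires class $123, since it inherits the properties of that class.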

Basic functions.

A basic function describes a relationship between its arguments. The formal meaning of each derived word can be expressed as a superposition of basic concepts and basic functions.


Examples:

And(x,y)       x and y
Caus(x,y)      x causes y
Cont(x)        x is continuing
Content(x,y)   x contains y
Control(x,y)   x controls y
Func(x)        x occurs
Hab(x,y)       x has y
Incep(x)       x is starting
Lab(x,y)       x exposes y
Loc(x,y)       x is situated in y
Magn(x)        x is higher than the norm
Mult(x)        multiset of x
Ne(x)          negation of x
Oper(x,y)      x performs y
Rel(x,y)       x has a relation to y
etc.

ЛЕСНОЙ A1>Rel(A1:НЕЧТО$1,ЛЕС$122412)
(“forest”, adjective, “something has a relation to a forest”)

КОНСТРУИРОВАТЬ Caus(Z1,IncepFunc(КОНСТРУКЦИЯ$1/422(ВИН:Z2)))
(“construct”, verb, “Z1 causes the appearance of a construction”)
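One way to picture a superposition of basic functions and basic concepts is as a nested term. The Python sketch below is only an illustration: the classes `Concept`, `Var` and `Apply` and the function `substitute` are hypothetical names, and the dictionary formulas are simplified (the type annotation `A1:НЕЧТО$1` and the case label `ВИН:` are dropped).

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Concept:
    """A basic concept together with its semantic class code."""
    name: str
    sem_class: str

    def __str__(self) -> str:
        return f"{self.name}{self.sem_class}"


@dataclass(frozen=True)
class Var:
    """An argument slot such as A1 or Z1."""
    name: str

    def __str__(self) -> str:
        return self.name


@dataclass(frozen=True)
class Apply:
    """A function applied to argument terms."""
    func: str
    args: Tuple["Term", ...]

    def __str__(self) -> str:
        return f"{self.func}({','.join(str(a) for a in self.args)})"


Term = Union[Concept, Var, Apply]


def substitute(term: Term, bindings: Dict[str, Term]) -> Term:
    """Fill argument slots with terms; unbound slots are left in place."""
    if isinstance(term, Var):
        return bindings.get(term.name, term)
    if isinstance(term, Apply):
        return Apply(term.func, tuple(substitute(a, bindings) for a in term.args))
    return term


# Simplified versions of the two dictionary formulas above.
LESNOY = Apply("Rel", (Var("A1"), Concept("ЛЕС", "$122412")))
KONSTR = Apply("Caus", (Var("Z1"),
               Apply("IncepFunc", (Apply("КОНСТРУКЦИЯ$1/422", (Var("Z2"),)),))))

print(LESNOY)
print(KONSTR)
# Illustrative binding: the adjective applied to the noun ДУБ ("oak").
print(substitute(LESNOY, {"A1": Concept("ДУБ", "$12211")}))
```

Binding a slot to another word's term is the sense in which a phrase or sentence becomes a superposition of the functions named by its words.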


Semantic dictionary.

It contains more than 100000 Russian words. The dictionary can be divided into two main parts: syntactic and semantic.

Examples:

ПОЛУЧИТЬ (“get”, verb)
Syntactic: ПОЛУЧИТЬ N%~ПОЛУЧЕНИЕ$15310/0/04({Z1: НЕЧТО$1~!Им,Z2: НЕЧТО$1~!Откуда\!Изо\!Ото\!сРод,Z3: !заВин,Z4: ПИЩА$101/0\НЕЧТО$1~!Вин})
Semantic: ПОЛУЧИТЬ N%~ПОЛУЧЕНИЕ $15310/0/04(PerfCaus(Oper01(Uzor(Z1,ОТКУДА:Z2),Z3),Hab(Z1,РОД:Z4))) \\ <4>

НАГРАДА (“reward”, noun)
Syntactic: НАГРАДА $1241/131/03({Z1: !Дат\!Род,Z2: !Тв,Z3: !заВин,Z4: !наВин})
Semantic: НАГРАДА $1241/131/03(ДАТ:РОД:Z1,ТВ:Z2,ЗАВИН:Z3,НАВИН:Z4) \\ <1>


4. Sentence analysis.

The processing of natural language texts includes morphological, word-by-word and syntactic-semantic analysis. The syntactic-semantic analyzer solves two main problems:

- the selection of the right semantic alternative of a word
- the binding of the selected alternatives into an integrated construction.

The system is implemented as a set of recursive functions. Each function handles a specific part of speech: verb, noun, preposition, adjective, etc.

5. The detection of links between words.

The detection of links is the main operation of the analyzer. It binds words or assembled constructions.

There are two main types of interaction between two constructions:

- the semantic arguments of the incorporating construction interact with the semantic-grammar types of the construction being attached (control link, e.g., verb and noun).
- the semantic-grammar types of one construction interact with the semantic-grammar types of another (agreement link, e.g., adjective and noun).


Examples of links:

- by case (pronoun and noun): его успех (“his success”)
  links: @Им (“nominative”), @Вин (“accusative”)

- by semantic class, case, gender, number (adjective and noun): красивый лес (“beautiful forest”)
  links: $1~@Онъ$17@Им, $1~@Онъ$17@Вин

Further examples are given in item 6 of the presentation.


After the first steps of text processing, the dictionary articles of two neighboring words have the following structure:

<word1>
< semantic alternative 1>
< semantic alternative 2>
< semantic alternative 3>
...
< semantic alternative n1>

<word2>
< semantic alternative 1>
< semantic alternative 2>
< semantic alternative 3>
...
< semantic alternative n2>

< semantic alternative>::= < { morphologic information, semantic-grammar types} (syntactic-semantic information, semantic arguments) <<additional arguments>> >

The link detection procedure checks every argument of every semantic alternative of word1 for a match against every argument of every semantic alternative of word2. This procedure can be optimized substantially by using more complex data structures (the optimization is the subject of current investigation).
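The exhaustive comparison described above can be sketched in Python as follows. This is a simplified illustration, not the analyzer's actual procedure: it covers control links only, the names `Alternative`, `spec_matches`, `control_links` and `detect_links` are hypothetical, and the matching rule (equal forms, and the type's class code starting with the class code required by the argument) is an assumption inferred from the analyzer output shown in the examples. The data is reduced from the entries for едет, в and город; the second alternative of город is invented for contrast.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Spec = Tuple[Optional[str], str]  # (semantic class or None, form)
Link = Tuple[str, str, str]       # (head alternative, argument, attached alternative)


@dataclass
class Alternative:
    ident: str
    types: List[Spec]  # semantic-grammar types
    arguments: Dict[str, List[Spec]] = field(default_factory=dict)


def spec_matches(arg: Spec, typ: Spec) -> bool:
    """Assumed rule: same form, and the type's class lies under the argument's."""
    arg_class, arg_form = arg
    typ_class, typ_form = typ
    if arg_form != typ_form:
        return False
    if arg_class is None:
        return True
    return typ_class is not None and typ_class.startswith(arg_class)


def control_links(heads: List[Alternative], deps: List[Alternative]) -> List[Link]:
    """Compare every argument of every head with every type of every dependent."""
    found = []
    for head in heads:
        for dep in deps:
            for name, specs in head.arguments.items():
                for arg in specs:
                    if any(spec_matches(arg, typ) for typ in dep.types):
                        found.append((head.ident, name, dep.ident))
                        break  # one link per argument and dependent is enough
    return found


def detect_links(word1: List[Alternative], word2: List[Alternative]) -> List[Link]:
    """Try both words as the head, since either may govern the other."""
    return control_links(word1, word2) + control_links(word2, word1)


EDET = [Alternative("X002.002", [("$15402", "Глагол")],
                    {"Z5": [("$1", "Куда")],
                     "Z6": [("$121324", "Тв"), ("$121324", "наПред")]})]
V = [Alternative("X003.202", [("$12314", "Куда")], {"Z1": [("$123", "Вин")]})]
GOROD = [Alternative("X004.001", [("$12314", "Вин")], {"Z1": [("$1231", "Род")]}),
         Alternative("nominative (hypothetical)", [("$12314", "Им")])]

print(detect_links(EDET, V))
print(detect_links(V, GOROD))
```

In this sketch only the accusative alternative of город is linked to the preposition, which is how link detection narrows down the semantic alternatives. The nested loops make the cost proportional to the product of the numbers of alternatives and arguments, which is the part the slide says is still being optimized.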


6. An example of an analyzed sentence.

Люди любят отдыхать на природе (“People like to rest in nature”).

Syntax tree of the sentence:

@Глагол любят<X002.003> (@Им Люди<X001.001><+СущГлаг3/2/.Шаг=2+>, @Инфин отдыхать<X003.001><+ГлагИнфин6/2/.Шаг=1+> (@Где на<X004.090><+ГлагОбст1/2/.Шаг=4+> (@Пред природе<X005.001><+ПредлСуществ5/2/.Шаг=3+>) ) )

Semantic values of each word and links between them:

Люди (“People”)

** <X001.001> ЧЕЛОВЕК {Сущв._Муж_Одуш $1241~@ОНИ$17@Им} $1241(Z1: ВРЕМЯ$16\ЧЕЛОВЕК$1241\ПЛАНЕТА$12271~!Род)
semantics: <X001.001> ЧЕЛОВЕК (РОД:Z1) \\ <1>
links: Z2: $124~@ОНИ$17 <= <X002.003>


любят (“like”)

** <X002.003> ЛЮБИТЬ {Глагол. $1241/40113/05~@Глагол} N%~ЛЮБОВЬ$1241/40113/05(Z1: !Инфин,Z2: ЖИВОЙ$124~!ОНИ$17,Z3: !заВин)
semantics: <X002.003> ЛЮБИТЬ Caus(ИНФИН:Z1,Oper02(ИМ:Z2,ПРИЯТНОСТЬ$1241/40012/03(ЗАВИН:Z3))) \\ <3>
links: Z1: @Инфин => <X003.001> Z2: $124~@ОНИ$17 => <X001.001>

отдыхать (“to rest”)

** <X003.001> ОТДЫХАТЬ {Глагол. $15308~@Инфин} N%~ОТДЫХ$15308(Imperf Z1: #,Z2: !Ото,Z3: НЕЧТО$1~!Где)
semantics: <X003.001> ОТДЫХАТЬ Oper01(Z1,ОТДЫХ$15308(ОТО:Z2,ГДЕ:Z3)) \\ <1>
links: Z3: $1~@Где => <X004.090> Z1: @Инфин <= <X002.003>


на (“in”)

** <X004.090> НА {Предлог. $122~@Где} (Z0:y> @Где,Z1: ПРИРОДА$122\ГРАНИЦА$12/15/16\РАССТОЯНИЕ$12/32\ПЛОЩАДЬ$12316~!Пред)
semantics: <X004.090> НА Y1>Loc(Y1:,ПРЕД:Z1) \\ <105>
links: Z1: $122~@Пред => <X005.001> Z3: $1~@Где <= <X003.001>

природе (“nature”)

** <X005.001> ПРИРОДА {Сущв._Жен_Неодуш $122~@ОНА$17@Пред} $122(Z1: !Род)
semantics: <X005.001> ПРИРОДА (РОД:Z1) \\ <1>
links: Z1: $122~@Пред <= <X004.090>


7. Conclusions.

The syntactic-semantic analyzer based on V.A. Tuzov’s theory is a unique system. The detection of links between words makes it possible to select the right semantic alternative of a word in a sentence. The accuracy of text processing is more than 95%.


Bibliography, internet resources:

[1] Tuzov V.A. Mathematical Model of Language. Saint-Petersburg State University Publishing House, 1984, p. 176 (in Russian).

[2] Tuzov V.A. Computer Semantics of Russian Language. Saint-Petersburg State University Publishing House, 2004, p. 400 (in Russian).

[3] http://www.dictum.ru/

[4] http://www.aot.ru/

[5] http://www.link.cs.cmu.edu/link/