Detection of Links between Words in the Task of Syntactic-Semantic Analysis of Russian Texts.
Dmitry V. Merkuryev
Saint-Petersburg State University, Russia
Mathematics and Mechanics Faculty, Department of Computer Science
Petrozavodsk, May 21st, 2008
Content
1. Introduction. The task of Syntactic-Semantic Analysis of Russian Texts.
2. Syntactic and semantic analyzers.
3. Main principles of V.A. Tuzov’s theory.
4. Sentence analysis.
5. The detection of links between words.
6. Examples.
7. Conclusions.
1. Introduction. The task of Syntactic-Semantic Analysis of Russian Texts.
Natural Language Processing (NLP) is one of the most topical tasks of modern computer science. Professor V.A. Tuzov's functional model [1], [2] offers an adequate approach to the formalization of natural language.
The syntactic-semantic analyzer is the only working system based on this theory.
It produces a syntactic structure for Russian sentences that matches their semantic structure.
The analyzer is able to solve the word sense disambiguation problem for most sentences of Russian journal and even literary texts.
The detection of links between words is one of the most significant operations of the syntactic-semantic analyzer. This operation makes it possible to select the right semantic alternative of a word in the context of a sentence.
2. Syntactic and semantic analyzers.
Some of the most relevant current NLP parsers:
DictaScope (syntactic parser for Russian) [3]
The program automatically builds a word subordination tree. It also determines the grammatical values of the words in a sentence.
AOT (automatic text processing, Russian language) [4]
This program builds a semantic graph and performs an initial semantic analysis of a text.
Link Grammar Parser (syntactic parser for English) [5]
The system assigns to a sentence a syntactic structure, which consists of a set of labeled links connecting pairs of words.
All of these parsers are limited by the word sense disambiguation problem.
In this respect, Professor Tuzov’s syntactic-semantic analyzer is a unique system.
3. Main principles of V.A. Tuzov’s theory.
Thesis 1.
A language is an algebraic system {f1, f2, ... , fn, M},
where fi is a basic function and M is the data structure (the basic concepts) of the given language.
Thesis 2.
Every word of a language is the name of a function. This function allows us to evaluate
the semantics of the given word. Each sentence is a superposition of these functions.
Thesis 3.
Grammar is linked with the semantics of a language and is represented by the semantic dictionary.
A function that corresponds to a word has semantic arguments and semantic-grammar types.
Semantic arguments and grammar types consist of semantic classes and prepositional-case forms.
Examples:
$16~!Вин ($16~! “Accusative”)
$15~!Где ($15~! “Where”)
$<number> - notation of a semantic class
!Вин, !наВин (“on Accusative”), !Дат (“Dative”), etc. - notations of prepositional-case forms
!Куда (“Where to”), !Где, !Кому (“Whom”), etc. - notations of generalized grammar types
Semantic-grammar types define the links by which this word attaches to other words as an argument.
Semantic arguments determine the links by which this word attaches other words as its arguments (according to their semantic-grammar types).
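The argument notation above can be read mechanically. The following is a minimal sketch, assuming that `~` separates the semantic classes from the prepositional-case forms and that `\` separates alternatives, as the dictionary entries in this presentation suggest. The names `ArgumentSlot` and `parse_slot` are hypothetical and are not part of the analyzer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArgumentSlot:
    """One semantic argument: allowed semantic classes and allowed forms."""
    classes: tuple  # (name, code) pairs, e.g. ("ТРАНСПОРТ", "$121324")
    forms: tuple    # prepositional-case forms without "!", e.g. "Тв"


def parse_slot(text: str) -> ArgumentSlot:
    """Parse a slot such as 'ТРАНСПОРТ$121324~!Тв\\!наПред' (assumed syntax)."""
    text = text.strip()
    if "~" in text:
        class_part, form_part = text.split("~", 1)
    else:
        # Slots like '!заВин' carry only forms and no semantic class.
        class_part, form_part = "", text
    classes = []
    for item in class_part.split("\\"):
        item = item.strip()
        if not item:
            continue
        name, _, code = item.partition("$")
        classes.append((name, "$" + code if code else ""))
    forms = tuple(f.strip().lstrip("!") for f in form_part.split("\\") if f.strip())
    return ArgumentSlot(tuple(classes), forms)


slot = parse_slot(r"ТРАНСПОРТ$121324~!Тв\!наПред")
print(slot.classes, slot.forms)
```

Keeping classes and forms as separate tuples makes the later matching step a pair of simple membership checks.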
Example (results from the analyzer):
Он едет в город (“He is going into the city”).
Syntax tree of the sentence:
@Глагол едет<X002.002> (@Им Он<X001.002><+МестГлаг3/2/.Шаг=1+>, @Куда в<X003.202><+ГлагОбст1/2/.Шаг=3+> (@Вин город<X004.001><+ПредлСуществ5/2/.Шаг=2+>) )
Semantic values of each word and links between them:
Он (“He”) ** <X001.002> ОН {Мест._Муж @ОНЪ$17@Им} $17() semantics: <X001.002> ОН () \\ <2> links: Z1: @ОНЪ$17 <= <X002.002>
едет (“is going”) ** <X002.002> ЕХАТЬ {Глагол. $15402~@Глагол} N%~ПОЕЗДКА$15402(Z1: !ОНЪ$17\!ОНА$17\!ОНО$17, Z2: ПРИЧИНА$1/37/05\ПРИКАЗ$1526031~!Почему, Z3: НЕЧТО$1~!поДат, Z4: НЕЧТО$1~!Откуда, Z5: НЕЧТО$1~!Куда, Z6: ТРАНСПОРТ$121324~!Тв\!наПред) semantics: <X002.002> ЕХАТЬ Oper01(Z1,ПОЕЗДКА$15402(ПОЧЕМУ:Z2,ПОДАТ:Z3,ОТКУДА:Z4,КУДА:Z5,ТВ:НАПРЕД:Z6)) \\ <2> links: Z1: @ОНЪ$17 => <X001.002> Z5: $1~@Куда => <X003.202>
в (“into”) ** <X003.202> В {Предлог. $12314~@Куда} (Z0:y> @Куда, Z1: ПОСЕЛЕНИЕ$123~!Вин) semantics: <X003.202> В Y1>Direkt(Y1:,ВНУТРИ$12/313/05(ВИН:Z1)) \\ <200> links: Z1: $123~@Вин => <X004.001> Z5: $1~@Куда <= <X002.002>
город (“the city”) ** <X004.001> ГОРОД {Сущв._Муж_Неодуш $12314~@ОНЪ$17@Вин} $12314(Z1 :СТРАНА$1231~!Род) semantics: <X004.001> ГОРОД (РОД:Z1) \\ <1> links: Z1: $123~@Вин <= <X003.202>
Classifier of basic concepts.
A basic concept is a word whose meaning cannot be expressed through simpler concepts. There are more than 20,000 basic concepts (nouns and adjectives) in the semantic dictionary. More than 90,000 other words (derived words) are expressed as superpositions of basic concepts and basic functions. Basic concepts are organized in a hierarchical tree (the classifier).
Main rules:
All words of a class inherit the same semantic properties from the parent class. The words of a class also have their own specific characteristics. The name of the root class is НЕЧТО (“SOMETHING”). There are more than 1,500 classes.
Examples:
$1 Noun: НЕЧТО (“SOMETHING”), СУЩЕСТВИТЕЛЬНОЕ (“NOUN”), …
$110 Noun AO (Abstract Object) Idea: ПОНЯТИЕ (“CONCEPT”), …
$1100/01 Noun АО Idea => Abstract-Concrete: АБСТРАКТНЫЙ (“ABSTRACT”), КОНКРЕТНЫЙ (“CONCRETE”), …
$12 Noun PO (Physical Object): МАТЕРИЯ (“SUBSTANCE”), ПРОСТРАНСТВО (“SPACE”), ТЕЛО (“BODY”), …
$122 Noun PO Nature: ПРИРОДА (“NATURE”), …
$122/1 Noun PO Nature Weather: ПОГОДА (“WEATHER”), …
$12211 Noun PO Nature Plants Trees: ДЕРЕВО (“TREE”), ДУБ (“OAK”), СОСНА (“PINE”), …
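The class codes in these examples look prefix-encoded: $12211 (Trees) extends $122 (Nature), which extends $12 (Physical Object), which extends the root $1. The sketch below rests on that reading, which is an assumption drawn from the examples and not a rule stated in the source. The helper names `is_subclass` and `ancestors` are hypothetical.

```python
# Class codes taken from the classifier examples in this presentation.
KNOWN_CLASSES = {
    "$1": "НЕЧТО",
    "$110": "ПОНЯТИЕ",
    "$12": "МАТЕРИЯ",
    "$122": "ПРИРОДА",
    "$122/1": "ПОГОДА",
    "$12211": "ДЕРЕВО",
}


def is_subclass(code: str, ancestor: str) -> bool:
    """True if `code` equals `ancestor` or lies below it (assumed prefix rule)."""
    return code.startswith(ancestor)


def ancestors(code: str, known=KNOWN_CLASSES) -> list:
    """Known classes strictly above `code`, from the root downwards."""
    found = [c for c in known if c != code and code.startswith(c)]
    return sorted(found, key=len)


print(ancestors("$12211"))         # classes that ДЕРЕВО would inherit from
print(is_subclass("$122", "$12"))  # Nature is a Physical Object
```

Under this assumption, inheritance of semantic properties from a parent class reduces to a string prefix test, which is cheap enough to run for every candidate link.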
Basic functions.
Basic functions describe relationships between their arguments. The formal meaning of each derived word can be expressed as a superposition of basic concepts and basic functions.
Examples:
And(x,y) - x and y
Caus(x,y) - x causes y
Cont(x) - x is continuing
Content(x,y) - x contains y
Control(x,y) - x controls y
Func(x) - x occurs
Hab(x,y) - x has y
Incep(x) - x is starting
Lab(x,y) - x exposes y
Loc(x,y) - x is situated in y
Magn(x) - x is above the norm
Mult(x) - multiset of x
Ne(x) - negation of x
Oper(x,y) - x performs y
Rel(x,y) - x has a relation to y
etc.
ЛЕСНОЙ A1>Rel(A1:НЕЧТО$1,ЛЕС$122412)
(“forest”, adjective, “something has a relation to a forest”)
КОНСТРУИРОВАТЬ Caus(Z1,IncepFunc(КОНСТРУКЦИЯ$1/422(ВИН:Z2)))
(“construct”, verb, “Z1 causes the appearance of a construction”)
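A superposition of basic functions and basic concepts is a nested term, so it can be held in a small tree structure. The following is a minimal sketch that rebuilds the formula for КОНСТРУИРОВАТЬ from the example above. The `Term` class and the helpers `render` and `slots` are hypothetical illustrations and do not describe the analyzer's internal representation.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Term:
    """A basic function or basic concept applied to arguments."""
    name: str
    args: Tuple["Node", ...] = ()


Node = Union[Term, str]  # a plain string stands for a slot such as "Z1" or "ВИН:Z2"


def render(node: Node) -> str:
    """Print a term back in the notation used on the slides."""
    if isinstance(node, str):
        return node
    if not node.args:
        return node.name
    return f"{node.name}({','.join(render(a) for a in node.args)})"


def slots(node: Node) -> list:
    """Collect the argument slots (Z1, Z2, ...) that the term still expects."""
    if isinstance(node, str):
        return [node.split(":")[-1]]
    found = []
    for arg in node.args:
        found.extend(slots(arg))
    return found


construct = Term("Caus", (
    "Z1",
    Term("IncepFunc", (Term("КОНСТРУКЦИЯ$1/422", ("ВИН:Z2",)),)),
))
print(render(construct))
print(slots(construct))
```

Treating slots as leaves of the tree means that filling an argument during analysis amounts to replacing a leaf with another term.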
Semantic dictionary.
It contains more than 100,000 Russian words. The dictionary can be divided into 2 main parts: syntactic and semantic.
Examples:
ПОЛУЧИТЬ (“get”, verb)
Syntactic: ПОЛУЧИТЬ N%~ПОЛУЧЕНИЕ$15310/0/04({Z1: НЕЧТО$1~!Им,Z2: НЕЧТО$1~!Откуда\!Изо\!Ото\!сРод,Z3: !заВин,Z4: ПИЩА$101/0\НЕЧТО$1~!Вин})
Semantic: ПОЛУЧИТЬ N%~ПОЛУЧЕНИЕ $15310/0/04(PerfCaus(Oper01(Uzor(Z1,ОТКУДА:Z2),Z3),Hab(Z1,РОД:Z4))) \\ <4>
НАГРАДА (“reward”, noun)
Syntactic: НАГРАДА $1241/131/03({Z1: !Дат\!Род,Z2: !Тв,Z3: !заВин,Z4: !наВин})
Semantic: НАГРАДА $1241/131/03(ДАТ:РОД:Z1,ТВ:Z2,ЗАВИН:Z3,НАВИН:Z4) \\ <1>
4. Sentence analysis.
The processing of natural language texts includes morphological, word-by-word and syntactic-semantic analysis. The syntactic-semantic analyzer solves 2 main problems:
- the selection of the right semantic alternative of a word
- the binding of the selected alternatives into an integrated construction.
The system is implemented as a set of recursive functions. Each function handles a specific part of speech: verb, noun, preposition, adjective, etc.
5. The detection of links between words.
The detection of links is the main operation of the analyzer. It binds words or assembled constructions.
There are 2 main types of interaction between 2 constructions:
- the semantic arguments of the incorporating construction interact with the semantic-grammar types of the construction being attached (control link, e.g., verb and noun).
- the semantic-grammar types of one construction interact with the semantic-grammar types of another (agreement link, e.g., adjective and noun).
Examples of links:
- by case: pronoun and noun: его успех (“his success”)
links: @Им (“nominative”), @Вин (“accusative”)
- by semantic class, case, gender, number: adjective and noun: красивый лес (“beautiful forest”)
links: $1~@Онъ$17@Им, $1~@Онъ$17@Вин
Other examples are given in item 6 of the presentation.
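An agreement link can be pictured as the overlap between the semantic-grammar types of two words. The sketch below rests on several assumptions: each type is written as a (class code, gender-number marker, case) triple, two class codes are compatible when one is a prefix of the other, and the link carries the more general code. The type lists are illustrative. The class $1 for the adjective is hypothetical, while ЛЕС$122412 and the type of природе are taken from entries in this presentation.

```python
def agreement_links(types_a, types_b) -> set:
    """Types shared by two words; each type is (class_code, marker, case)."""
    links = set()
    for code_a, marker_a, case_a in types_a:
        for code_b, marker_b, case_b in types_b:
            if (marker_a, case_a) != (marker_b, case_b):
                continue  # gender/number or case disagree
            if code_a.startswith(code_b):
                general = code_b
            elif code_b.startswith(code_a):
                general = code_a
            else:
                continue  # semantic classes are unrelated
            links.add(f"{general}~@{marker_a}@{case_a}")
    return links


# красивый: masculine singular, nominative or (inanimate) accusative.
krasivyj = [("$1", "ОНЪ$17", "Им"), ("$1", "ОНЪ$17", "Вин")]
# лес: nominative and accusative singular coincide.
les = [("$122412", "ОНЪ$17", "Им"), ("$122412", "ОНЪ$17", "Вин")]
# природе: feminine prepositional, as in the example sentence of item 6.
prirode = [("$122", "ОНА$17", "Пред")]

print(agreement_links(krasivyj, les))
print(agreement_links(krasivyj, prirode))  # no shared type, so no link
```

For красивый лес this yields the two links listed above, one for the nominative and one for the accusative reading.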
After the first steps of text processing, the dictionary articles of two neighboring words have the following structures:
<word1>
< semantic alternative 1>
< semantic alternative 2>
< semantic alternative 3>
...
< semantic alternative n1>
<word2>
< semantic alternative 1>
< semantic alternative 2>
< semantic alternative 3>
...
< semantic alternative n2>
< semantic alternative>::= < { morphologic information, semantic-grammar types} (syntactic-semantic information, semantic arguments) <<additional arguments>> >
The link-detection procedure checks for matches between all arguments of all semantic alternatives of word1 and all arguments of all semantic alternatives of word2. This procedure can be optimized substantially by using more complex data structures (this optimization is the subject of current investigations).
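The exhaustive check can be sketched as a loop over every pair of alternatives, testing each argument slot of one against each semantic-grammar type of the other. This is a simplified illustration of a control link, not the analyzer's actual code. The `Alternative` class, the prefix test on class codes and the alternative named `hypothetical-Где` are assumptions, while the other values come from the sentence Он едет в город.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Alternative:
    ident: str
    types: list                                    # [(class_code, form)]
    arguments: dict = field(default_factory=dict)  # slot -> (class_codes, forms)


def slot_accepts(slot, sg_type) -> bool:
    """A slot accepts a type if the form matches and the class is compatible."""
    class_codes, forms = slot
    code, form = sg_type
    class_ok = not class_codes or any(code.startswith(c) for c in class_codes)
    return form in forms and class_ok


def detect_links(word1, word2) -> list:
    """Brute force: every alternative pair, both directions, every slot."""
    links = []
    for a, b in product(word1, word2):
        for head, dep in ((a, b), (b, a)):
            for slot_name, slot in head.arguments.items():
                if any(slot_accepts(slot, t) for t in dep.types):
                    links.append((head.ident, slot_name, dep.ident))
    return links


edet = Alternative("X002.002", [("$15402", "Глагол")],
                   {"Z4": (["$1"], ["Откуда"]), "Z5": (["$1"], ["Куда"])})
v_kuda = Alternative("X003.202", [("$12314", "Куда")], {"Z1": (["$123"], ["Вин"])})
# Hypothetical locative alternative of "в", added only to show a rejected candidate.
v_gde = Alternative("hypothetical-Где", [("$12314", "Где")], {"Z1": (["$123"], ["Пред"])})

links = detect_links([edet], [v_kuda, v_gde])
print(links)
```

Only the directional alternative of в fills slot Z5 of едет, so the link itself selects the semantic alternative. The cost grows with the product of the numbers of alternatives and slots, which is presumably why the source mentions optimization as ongoing work.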
6. An example of an analyzed sentence.
Люди любят отдыхать на природе (“People like to rest in nature”).
Syntax tree of the sentence:
@Глагол любят<X002.003> (@Им Люди<X001.001><+СущГлаг3/2/.Шаг=2+>, @Инфин отдыхать<X003.001><+ГлагИнфин6/2/.Шаг=1+> (@Где на<X004.090><+ГлагОбст1/2/.Шаг=4+> (@Пред природе<X005.001><+ПредлСуществ5/2/.Шаг=3+>) ) )
Semantic values of each word and links between them:
Люди (“People”) ** <X001.001> ЧЕЛОВЕК {Сущв._Муж_Одуш $1241~@ОНИ$17@Им} $1241(Z1: ВРЕМЯ$16\ЧЕЛОВЕК$1241\ПЛАНЕТА$12271~!Род) semantics: <X001.001> ЧЕЛОВЕК (РОД:Z1) \\ <1> links: Z2: $124~@ОНИ$17 <= <X002.003>
любят (“like”)
** <X002.003> ЛЮБИТЬ {Глагол. $1241/40113/05~@Глагол} N%~ЛЮБОВЬ$1241/40113/05(Z1: !Инфин,Z2: ЖИВОЙ$124~!ОНИ$17,Z3: !заВин)semantics: <X002.003> ЛЮБИТЬ Caus(ИНФИН:Z1,Oper02(ИМ:Z2,ПРИЯТНОСТЬ$1241/40012/03(ЗАВИН:Z3))) \\ <3> links: Z1: @Инфин => <X003.001> Z2: $124~@ОНИ$17 => <X001.001>
отдыхать(“to rest”)
** <X003.001> ОТДЫХАТЬ {Глагол. $15308~@Инфин} N%~ОТДЫХ$15308(Imperf Z1 : #,Z2: !Ото,Z3: НЕЧТО$1~!Где) semantics: <X003.001> ОТДЫХАТЬ Oper01(Z1,ОТДЫХ$15308(ОТО:Z2,ГДЕ:Z3)) \\ <1> links: Z3: $1~@Где => <X004.090> Z1: @Инфин <= <X002.003>
на (“in”) ** <X004.090> НА {Предлог. $122~@Где} (Z0:y> @Где, Z1: ПРИРОДА$122\ГРАНИЦА$12/15/16\РАССТОЯНИЕ$12/32\ПЛОЩАДЬ$12316~!Пред) semantics: <X004.090> НА Y1>Loc(Y1:,ПРЕД:Z1) \\ <105> links: Z1: $122~@Пред => <X005.001> Z3: $1~@Где <= <X003.001>
природе (“nature”) ** <X005.001> ПРИРОДА {Сущв._Жен_Неодуш $122~@ОНА$17@Пред} $122(Z1 : !Род) semantics: <X005.001> ПРИРОДА (РОД:Z1) \\ <1> links: Z1: $122~@Пред <= <X004.090>
7. Conclusions.
The syntactic-semantic analyzer based on V.A. Tuzov’s theory is a unique system. The detection of links between words makes it possible to select the right semantic alternative of a word in a sentence. The correctness of text processing is more than 95%.
Bibliography, internet resources:
[1] Tuzov V.A. Mathematical Model of Language. Saint-Petersburg State University Publishing House, 1984, p. 176 (in Russian).
[2] Tuzov V.A. Computer Semantics of Russian Language. Saint-Petersburg State University Publishing House, 2004, p. 400 (in Russian).
[3] http://www.dictum.ru/
[4] http://www.aot.ru/
[5] http://www.link.cs.cmu.edu/link/