Towards Parsing Croatian Complex Sentences: Dependent Noun Clauses
Vanja Štefanec, Kristina Vučković, Zdravko Dovedan
University of Zagreb, Faculty of Humanities and Social Sciences
{vstefane, kvuckovi, zdovedan}@ffzg.hr
NooJ2010 Komotini, Greece
2010-05-28
Our goal
- to determine the boundaries of dependent clauses within the complex sentence
  - focusing the parser
  - performing disambiguation of chunks
  - improving the chunker
- to test the adequacy of this model as a pre-parsing method for complex sentences
Overview of the work
- a grammar that can recognize the dependent noun clause (object clause) in the complex sentence
  - both a simple object clause and a coordination of object clauses
- by defining the co-text in which an object clause can occur
  - NOT by describing its structure
- relying on
  - the output of the chunker
  - conjunctions, complementizers, punctuation, ...
Object clauses in Croatian
- very frequent
- refer to the predicate of their superordinate clause as a direct object
- three types (according to grammars)
  - relative (odnosne)
  - interrogative (zavisnoupitne)
  - declarative (izrične)
Relative object clauses
- introduced by relative pronouns and adjectives

Jeste li našli [što ste tražili]?
Have you found [what you’ve been looking for]?
Kupit ću [kakvog nađem].
*I will buy [of the kind I’ll find].
Interrogative object clauses
1. general (općeupitne)
   introduced by the interrogative conjunctions ‘li’, ‘da li’ or by interrogative pronouns (‘tko’, ‘koji’, ‘čiji’, ‘što’, …)
   Još ne shvaćaš [što se dogodilo].
   You still don’t understand [what happened].
   Zaboravio sam [koji je danas dan].
   I forgot [what day it is today].
Interrogative object clauses
2. of place (mjesne)
   introduced by interrogative adverbs of place
   Recite [kamo ste se zaputili].
   Tell us [where you are headed].
3. of time (vremenske)
   introduced by interrogative adverbs of time
   Nisu rekli [kad će doći].
   They didn’t say [when they’ll be coming].
Interrogative object clauses
4. of manner (načinske)
   introduced by the interrogative adverb ‘kako’
   Još nismo saznali [kako se to dogodilo].
   We still haven’t found out [how that happened].
5. qualitative (kvalitativne)
   introduced by the interrogative adjectives ‘kakav’, ‘kakva’, ‘kakvo’
   Ne znam [kakav si ti to čovjek].
   I don’t know [what kind of a person you are].
Interrogative object clauses
6. of amount (količinske)
   introduced by the interrogative adverb ‘koliko’
   Znaš li [koliko si već popio]?
   Do you know [how much you have already drunk]?
7. of cause (uzročne)
   introduced by interrogative adverbs of cause or prepositional expressions (‘zašto’, ‘zbog čega’, …)
   Ne razumijem [zašto si zakasnio].
   I don’t understand [why you are late].
Declarative object clauses
- introduced by conjunctions
  - ‘da’ (most common)
  - ‘kako’ (less frequent; stylistic variant of ‘da’)
  - ‘gdje’ (extremely rare; very stylistically marked)

Obećao si [da ćeš doći].
You promised [that you’ll come].
Rekli su [kako ga nije briga].
They said [that he doesn't care].
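
The introducers listed on the preceding slides can be collected into a small lookup table, which makes one point visible: the same word can introduce more than one clause type. The sketch below is only an illustration in Python, not the authors' NooJ resources. The type labels are made up for this example, and where a slide names a word class without listing members (relative, place, time), the entries are taken from the slide's example sentences only.

```python
# Hypothetical lookup of clause introducers, built only from the words
# that appear on the slides; real lexicons would be far larger.
INTRODUCERS = {
    "relative": {"što", "kakvog"},                      # from the examples only
    "interrogative/general": {"li", "da li", "tko", "koji", "čiji", "što"},
    "interrogative/place": {"kamo"},                    # from the example only
    "interrogative/time": {"kad"},                      # from the example only
    "interrogative/manner": {"kako"},
    "interrogative/qualitative": {"kakav", "kakva", "kakvo"},
    "interrogative/amount": {"koliko"},
    "interrogative/cause": {"zašto", "zbog čega"},
    "declarative": {"da", "kako", "gdje"},
}


def candidate_types(introducer):
    """Return every clause type the given introducer could signal."""
    word = introducer.lower()
    return sorted(t for t, words in INTRODUCERS.items() if word in words)


print(candidate_types("kako"))  # ['declarative', 'interrogative/manner']
print(candidate_types("što"))   # ['interrogative/general', 'relative']
```

Because ‘kako’ and ‘što’ each map to two types, the introducer alone does not settle what kind of clause follows.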
Object clauses in Croatian
- have to be preceded by a transitive verb in an active voice form
- it is impossible to predict their function by observing only the structure

(Vidio sam)PRED ([da se igra])OBJ.
I saw that he’s playing. (object clause)
(Vidio sam)PRED (ga)OBJ ([da se igra])ATTR.
I saw him playing. (adjective clause)
(Izišao je)PRED (van)ADV ([da se igra])ADV.
He went out to play. (purpose clause)
Object clauses in Croatian
- can be easily confused with subject clauses
- subject clauses refer either to a nominal predicate or to a verbal predicate in a passive voice form

(Poznato je)PRED ([da pušenje uzrokuje rak])SUBJ.
It is well known that smoking causes cancer.
(Kaže se)PRED ([da je bolje spriječiti nego liječiti])SUBJ.
It is said that prevention is better than cure.
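
The examples on the last two slides all contain the same clause, [da ...], yet its function changes with the predicate and its other arguments. A minimal Python sketch of that decision follows. It is a deliberate simplification, not the authors' grammar: the `Predicate` record and its fields are hypothetical, and the rule that an already present direct object excludes an object clause ignores the verbs with two accusative arguments mentioned under Problems.

```python
from dataclasses import dataclass


@dataclass
class Predicate:
    # All fields are illustrative assumptions about what a chunker
    # plus a valency lexicon might provide.
    nominal: bool = False            # e.g. "Poznato je"
    passive: bool = False            # e.g. "Kaže se"
    transitive: bool = False         # verb valency, assumed known
    has_direct_object: bool = False  # e.g. "ga" in "Vidio sam ga"


def clause_function(pred):
    """Guess the function of a clause from its superordinate predicate."""
    if pred.nominal or pred.passive:
        return "subject"
    if pred.transitive and not pred.has_direct_object:
        return "object"
    return "other"  # e.g. adjective clause or purpose clause


print(clause_function(Predicate(transitive=True)))                          # object
print(clause_function(Predicate(transitive=True, has_direct_object=True)))  # other
print(clause_function(Predicate(nominal=True)))                             # subject
```

The clause itself never enters the function, which is the point of the slides: the decision rests on the co-text.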
The model
can be divided into four parts:
1. the predicate
2. what can appear between the predicate and the object clause
3. the object clause
4. what can appear after the object clause
Grammar graphs for the four parts of the model (the figures are not preserved in this transcript):
1. the predicate
2. between the predicate and the clause
3. object clause - conjunctions
3. object clause - body
4. after the object clause
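
Since the grammar graphs themselves are not available here, the four-part model can only be illustrated loosely. The Python sketch below walks a list of chunks and applies the four parts in order. Everything in it is an assumption made for illustration: the `(text, tag)` chunk format, the tag names `VP`, `CONJ` and `PUNCT`, the short introducer list, and the simplification that only a comma may stand between predicate and clause and that only sentence-final punctuation closes the clause.

```python
SENTENCE_END = {".", "?", "!"}


def find_object_clause(chunks, introducers=("da", "kako", "što")):
    """Return (start, end) chunk indices of a candidate clause, or None.

    chunks is a list of (text, tag) pairs; the tags are hypothetical.
    """
    for i, (_, tag) in enumerate(chunks):
        # Part 1: the predicate.
        if tag != "VP":
            continue
        # Part 2: material between predicate and clause (here: optional comma).
        j = i + 1
        if j < len(chunks) and chunks[j] == (",", "PUNCT"):
            j += 1
        # Part 3: the clause must open with a known introducer.
        if j >= len(chunks) or chunks[j][0].lower() not in introducers:
            continue
        # Part 4: the right co-text closes the clause.
        for k in range(j + 1, len(chunks)):
            text, ktag = chunks[k]
            if ktag == "PUNCT" and text in SENTENCE_END:
                return (j, k - 1)
        return (j, len(chunks) - 1)
    return None


sentence = [("Obećao si", "VP"), ("da", "CONJ"), ("ćeš doći", "VP"), (".", "PUNCT")]
print(find_object_clause(sentence))  # (1, 2)
```

Note that the clause body is never described: the span is found purely from what precedes and follows it, in line with the approach stated in the overview.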
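Examples
Dodao je ([da približavanje Hrvatske EU ima dvije faze]).
He added ([that Croatia's approach to the EU has two phases]).
Pretpostavimo ([da imate visoke demokratske standarde], [da manjine imaju puna prava], [da su medijske slobode savršene])...
Let us suppose ([that you have high democratic standards], [that minorities have full rights], [that media freedoms are perfect])...
Zato savjetuje svima koji namjeravaju podići kredite ([da malo pričekaju, ako to mogu]).
He/she therefore advises everyone who intends to take out a loan ([to wait a little, if they can]).
Odgovarajući na pitanje hoće li na dogovore iz Mokrica djelovati skorašnji slovenski lokalni izbori, Maštruko je rekao ([kako u to ne vjeruje] te [da bi u slučaju kad bi države svaki put čekale ([da prođu izbori]), pregovaranje bilo nemoguće]).
Answering the question of whether the upcoming Slovenian local elections would affect the Mokrice agreements, Maštruko said ([that he does not believe so] and [that, if states waited every time ([for elections to pass]), negotiation would be impossible]).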
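
In the examples, square brackets mark individual clauses and parentheses group them, so nesting in the notation corresponds to the level of subordination. As a small aid for reading such annotations, the Python sketch below extracts every bracketed clause together with its nesting depth. It is a helper written for this transcript's notation only and assumes the brackets are balanced; it is not part of the described system.

```python
def bracketed_clauses(text):
    """Return (depth, clause_text) for every [...] span, in closing order.

    Depth 1 is an outermost clause; deeper values are clauses nested in it.
    """
    stack, found = [], []
    for pos, ch in enumerate(text):
        if ch == "[":
            stack.append(pos)
        elif ch == "]" and stack:
            start = stack.pop()
            found.append((len(stack) + 1, text[start + 1:pos]))
    return found


example = "rekao je ([kako u to ne vjeruje] te [da bi ([da prođu izbori]) bilo nemoguće])"
for depth, clause in bracketed_clauses(example):
    print(depth, clause)
```

For the shortened sentence above, the clause "da prođu izbori" is reported at depth 2, inside the second coordinated clause.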
Problems
- the chunker cannot identify the whole VP
- undisambiguated chunks
- subject clauses
- some verbs can take two arguments in the accusative case
  - ‘pitati’ (to ask), ‘učiti’ (to teach), ...
- adjective clauses, purpose clauses
- identifying the level of subordination
  - often a problem beyond syntax
- rules of orthography
  - proper use of punctuation marks (comma, dash)
Evaluation
- performed in ideal circumstances
  - the predicate is correctly identified (i.e. chunked)
  - information about verb valency is present
- the corpus consists of 174 sentences with 215 object clauses

PRECISION   RECALL   F-MEASURE
0.46        0.82     0.59
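
The reported F-measure is consistent with the balanced harmonic mean of the reported precision and recall. The slide does not say which variant of the F-measure was used, so the formula F = 2PR / (P + R) is an assumption here; the short Python check below shows that it reproduces the published figure.

```python
def f_measure(precision, recall):
    """Balanced F-measure (harmonic mean of precision and recall)."""
    return 2 * precision * recall / (precision + recall)


# Values taken from the evaluation table.
print(round(f_measure(0.46, 0.82), 2))  # 0.59
```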
Evaluation
- low precision BUT
  - correct identification in 91% of the cases
  - the average number of results per clause is 2.15 (disambiguation!)
- high recall confirms the adequacy of the model AND
  - we have identified the critical cases, so improvements can also be expected
Thank you for your attention.
The research within the project ACCURAT leading to these results has received funding from the
European Union Seventh Framework Programme
(FP7/2007-2013), grant agreement no 248347.
www.accurat-project.eu