Arabic Syntactic Trees Zdeněk Žabokrtský Otakar Smrž Center for Computational Linguistics...


Page 1: Arabic Syntactic Trees Zdeněk Žabokrtský Otakar Smrž Center for Computational Linguistics Faculty of Mathematics and Physics Charles University in Prague.

Arabic Syntactic Trees: from Constituency to Dependency

Zdeněk Žabokrtský, Otakar Smrž

Center for Computational Linguistics, Faculty of Mathematics and Physics, Charles University in Prague

Page 2: Motivation & Background

April 15, 2003 Arabic Syntactic Trees: from Constituency to Dependency 2

- Linguistic Data Consortium Arabic Treebank
  - Constituent-syntax bracketing
  - ~100k words published
  - Modification from English to Arabic
- Prague Arabic Dependency Treebank
  - Dependency approach to syntax
  - ~50k words in progress
  - Pre-step to tectogrammatical description
- Motivation: co-operation and resource exchange
- Our goal: transform the data from one annotation scheme to the other

Page 3: Constituency vs. Dependency

Constituency (Linguistic Data Consortium, University of Pennsylvania)
- Non-terminal nodes + text tokens
- Constituent labeling on non-terminals
- Slots and traces

Dependency (CCL & IFAL & ICL, Charles University in Prague)
- Sentence root node + text tokens
- Analytical function for every tree node
- Government and roles

Page 4: Model Arabic Phrase I

- Trace of the antecedent subject
- Compound function of the head of the clause – outer and inner perspectives
- Free word-order compliant

Page 5: Outline of the Transformation

1. Build temporary dependency tree
   - Contraction of the input phrase-structure tree
   - Uniquely determined by the head selection function
   - Implementation: simple recursive procedure
2. Create analytical tree topology
   - Post-processing (corrections) of the temporary dependency tree, e.g., substituting traces with their coindexed fillers
   - Re-arrangement of special complex constructs
3. Assign analytical functions
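Step 1 of the outline can be illustrated with a minimal sketch. It assumes a hypothetical `Phrase` node type and a stand-in head selector `toy_select_head`; neither comes from the slides, and the bracketing of the example phrase is a toy assumption. Tokens are identified by their form here for readability, which only works while forms are unique; a real implementation would more likely use token positions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Phrase:
    """Node of a phrase-structure tree (hypothetical minimal representation)."""
    label: str                                   # phrase label, or POS tag on a terminal
    children: List["Phrase"] = field(default_factory=list)
    form: Optional[str] = None                   # text token; set only on terminals


def contract(node: Phrase,
             select_head: Callable[[List[Phrase]], int],
             edges: List[Tuple[str, str]]) -> str:
    """Return the lexical head of `node`, appending (dependent, governor) pairs to `edges`."""
    if not node.children:                        # terminal: it is its own head
        return node.form
    # Recursively contract every child to its lexical head.
    heads = [contract(child, select_head, edges) for child in node.children]
    h = select_head(node.children)               # index of the head child
    # Heads of all non-head children become dependents of the head child's head.
    for i, dependent in enumerate(heads):
        if i != h:
            edges.append((dependent, heads[h]))
    return heads[h]


def toy_select_head(children: List[Phrase]) -> int:
    """Stand-in head selector: a PREP child if present, else the rightmost child."""
    for i, child in enumerate(children):
        if child.label == "PREP":
            return i
    return len(children) - 1


# Toy bracketing (an assumption) of "mina 's-sahli" from the model sentence.
pp = Phrase("PP", [
    Phrase("PREP", form="mina"),
    Phrase("NP", [Phrase("NOUN", form="'s-sahli")]),
])
edges: List[Tuple[str, str]] = []
root = contract(pp, toy_select_head, edges)
print(root, edges)
```

Because the head selector is passed in as a parameter, the resulting tree is fully determined by that function, which matches the "uniquely determined" point in the outline.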

Page 6: Head Selection Function

- For each constituent, select the head constituent among its children
- Based on (ordered) handcrafted rules
- Examples:
  - If there is a node with tag=PREP among the children, then it is the head
  - If there is a node with phrase_label=VP among the children, then it is the head
  - ... etc ...
- If nothing was selected by the rules, then the rightmost child is selected
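The ordered rules with a rightmost fallback might be sketched as follows. Only the two rules quoted on the slide are included. The `Child` type, the `HEAD_RULES` list, and the choice of the first matching child when several children satisfy a rule are assumptions made for illustration.

```python
from typing import Callable, List, NamedTuple, Optional


class Child(NamedTuple):
    """Hypothetical view of a child constituent."""
    tag: Optional[str]            # POS tag on terminals, e.g. "PREP"
    phrase_label: Optional[str]   # label on non-terminals, e.g. "VP"


# Ordered rules: an earlier rule takes priority over a later one.
HEAD_RULES: List[Callable[[Child], bool]] = [
    lambda c: c.tag == "PREP",
    lambda c: c.phrase_label == "VP",
]


def select_head(children: List[Child]) -> int:
    """Return the index of the head child."""
    for rule in HEAD_RULES:
        for i, child in enumerate(children):
            if rule(child):       # assumption: the first matching child wins
                return i
    return len(children) - 1      # fallback from the slide: rightmost child


print(select_head([Child("PREP", None), Child(None, "NP")]))
print(select_head([Child("NOUN", None), Child("ADJ", None)]))
```

Keeping the rules in a plain list makes their order explicit, so adding or re-prioritising a rule is a one-line change.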

Page 7: Analytical Function Assignment

- Based on (ordered) handcrafted rules and lexical lists
- Completes the process, does not override previous assignments
- Examples:
  - phrase_label=NP-SBJ → afun=Sb
  - lemma=wa- → afun=Coord
  - pos_tag=CONJ → afun=AuxC
  - ... etc ...
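A rough sketch of such an assignment pass, assuming nodes are plain dictionaries and using only the three example rules from the slide. The names `AFUN_RULES`, `COORD_LEMMAS`, and `assign_afuns` are hypothetical, and the preset `Pred` value in the usage example is an arbitrary stand-in for an assignment made in an earlier step.

```python
from typing import Callable, Dict, List, Optional, Tuple

Node = Dict[str, Optional[str]]

# Hypothetical lexical list; only "wa-" comes from the slide.
COORD_LEMMAS = {"wa-"}

# Ordered rules: the first matching condition decides the afun.
AFUN_RULES: List[Tuple[Callable[[Node], bool], str]] = [
    (lambda n: n.get("phrase_label") == "NP-SBJ", "Sb"),
    (lambda n: n.get("lemma") in COORD_LEMMAS, "Coord"),
    (lambda n: n.get("pos_tag") == "CONJ", "AuxC"),
]


def assign_afuns(nodes: List[Node]) -> List[Node]:
    """Fill in missing afuns in place; never override an existing one."""
    for node in nodes:
        if node.get("afun") is not None:   # assigned earlier: leave untouched
            continue
        for condition, afun in AFUN_RULES:
            if condition(node):
                node["afun"] = afun
                break
    return nodes


nodes = assign_afuns([
    {"lemma": "wa-", "pos_tag": "CONJ"},          # lexical rule precedes the CONJ rule
    {"phrase_label": "NP-SBJ", "afun": "Pred"},   # preset value is kept
    {"phrase_label": "NP-SBJ"},
    {"pos_tag": "NOUN"},                          # no rule matches: stays unassigned
])
print(nodes)
```

The rule order matters in the first example node: it is tagged CONJ, but the lemma rule fires first, so it receives `Coord` rather than `AuxC`.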

Page 8: Model Arabic Phrase II

- Sister-like co-ordination
- Conjunction of co-ordination
- Status constructus

Page 9: Model Arabic Phrase III

- Non-expressed subject (?)
- Complex modality constructs
- Principal discrepancies between descriptions – both in topology and labeling

Page 10: Model Arabic Sentence

Wa lam yakun mina ’s-sahli `alay hi muwāğahatu kāmīrāti ’t-tilfizyūni wa `adasāti ’l-muşawwirīna wa huwa yaş`adu ’l-bāşa.

It was not easy for him to face the television cameras and the lenses of photographers as he was getting on the bus.

Page 11: Constituency Annotation

Page 12: Dependency Annotation

Page 13: Evaluation & Conclusion

- Implementation still in progress, fine-tuning needed
- 10,000 words manually annotated in both styles
- ~60% of correctly aimed dependencies
- 2nd Prague Penn Arabic Treebanking Workshop, May 2003 in Prague
- Transfer from dependency to constituency?
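The "correctly aimed dependencies" figure can be read as the share of tokens whose governor in the converted tree matches the manual dependency annotation. That reading is an assumption, as is the representation below, where each tree is a list giving the governor's position for every token (0 standing for the sentence root). The function name `attachment_accuracy` and the sample data are hypothetical.

```python
from typing import List


def attachment_accuracy(gold_heads: List[int], predicted_heads: List[int]) -> float:
    """Fraction of tokens whose predicted governor equals the gold governor."""
    if len(gold_heads) != len(predicted_heads):
        raise ValueError("both annotations must cover the same tokens")
    if not gold_heads:
        raise ValueError("empty annotation")
    correct = sum(g == p for g, p in zip(gold_heads, predicted_heads))
    return correct / len(gold_heads)


# Made-up five-token example: three of the five governors agree.
gold = [0, 1, 1, 3, 3]
converted = [0, 1, 2, 3, 1]
score = attachment_accuracy(gold, converted)
print(f"{score:.0%}")
```

This measure looks only at the tree topology; it does not check whether the analytical functions agree.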

Page 14: Related Work

- New tool for assignment of analytical functions
  - Based on machine learning (C5-trained decision trees)
  - Error rate 17% (supposing the topology of the tree is correct)
- First experiments with Arabic dependency parser
- Incorporated into the process of annotation of Prague Arabic Dependency Treebank