Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural...

60
Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University – CCLS [email protected]
  • date post

    19-Dec-2015
  • Category

    Documents

  • view

    214
  • download

    0

Transcript of Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural...

Page 1: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Treebanks are Not Naturally Occurring Data

Choices in Treebank Design and What They Mean for Natural Language Processing

Owen RambowColumbia University – CCLS

[email protected]

Page 2: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Goal of this Talk

• In natural language processing, we all use syntactic representations (treebanks)

• In theoretical syntax, they use syntactic representations• Problems in both communities:

– Little reflection on what these syntactic representations mean

– Representation confused with phenomenon which is the object of scientific investigation: natural language syntax

• Goal: increase meta-scientific understanding of syntactic representations for NLP

Page 3: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Overview

• What is a Syntactic Representation?• Dependency and Phrase Structure – Definitions– Syntactic and representational constituency and

dependency– Intermediate projections

• What does this Mean for NLP?• Takehome lessons

Page 4: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Acknowledgements

• Rajesh Bhatt and Fei Xia – Bhatt, Rambow and Xia: Linguistic Phenomena,

Analyses, and Representations: Understanding Conversion between Treebanks, IJCNLP 2011

• Hindi-Urdu Treebank Project– Dipti Misra Sharma (IIIT Hyderabad)– Martha Palmer (Colorado)– NSF Grant

• All errors and infelicitous choices in this presentation are my own

Page 5: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

What is a Syntactic Representation?

1. Syntactic phenomena, e.g.:– Subject of a verb– Relative clause– Small clauseLinguists tend to agree on what phenomena exist

2. Mathematical representation type, e.g.:– Phrase structure tree– Dependency tree– Or something more complicated: LFG, TAG, …

3. Formal syntactic description:a. Mapping from phenomena to representations (in particular type)b. Chosen representation for a specific phenomenon also called

analysisc. Phenomena extracted in representation are the interpretationd. Formal description is a syntactic theory if it makes predictions

Page 6: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Example: Small Clauses

• Hindi– अा��ति�फ ने सी�मा� को� बेवको� फ सीमाझा� – Atif ne Seema ko bewakuuf samjhaa – Atif Erg Seema Acc stupid consider.Pfv – ‘Atif considered Seema stupid.’

• English– Atif considered Seema stupid– Atif considered her stupid

Page 7: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

What is the Phenomenon?

• Syntactically and semantically, consider takes a clausal complement– Atif considered [clause that she is stupid]

– Atif considered [clause her stupid]

• But two problems:– No verb – her is semantically subject of stupid but has accusative

case, which is unusual (subjects are usually nominative)• So:– Atif considered [small clause her stupid]

Page 8: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

What is the Representation Type?

• For this example, we will show dependency trees and phrase structure trees

Page 9: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Analysis 1a for Small Clauses:No Accusative Case Marking

• Structure represents her as subject but not accusative case marking of her

considers

Atif stupid

Subj Obj

her

Subj

Page 10: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Analysis 1b for Small Clauses:Exceptional Case Marking

• Structure represents her as subject and accusative case marking through node label

considers

Atif stupid

Subj Obj-ECM

her

Subj

Page 11: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Analysis 1a for Small Clauses:No Accusative Case Marking

• Structure represents her as subject but not accusative case marking of her

considers

Atif stupid

Subj Obj

her

Subj

S

NP

Atif

VP

considers S

her VP

AdjPstupid

Page 12: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Analysis 1b for Small Clauses:Exceptional Case Marking

• Structure represents her as subject but not accusative case marking of her

considers

Atif stupid

Subj Obj-ECM

her

Subj

S

NP

Atif

VP

considers SC

her VP

AdjPstupidClose to analysis adopted in Chomsky (1981)

Page 13: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Note on DS and PS

• These analyses are intuitively very similar• Formal notion: “consistency” (Fei Xia, see

Bhatt, Rambow & Fei 2011)– Intution: very simple and general algorithm can

transform consistent DS to PS and vice versaI

Page 14: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Analysis 2a for Small Clauses:General Monoclausal Analysis

• Structure represents accusative case marking of her (as object of matrix verb) but not her as semantic subject

considers

Atif stupid

Subj Obj2

her

Obj

Page 15: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Analysis 2b for Small Clauses:Syntactic Monoclausal Analysis

• Structure represents accusative case marking of her (as object of matrix verb) and her as semantic subject using node label

considers

Atif stupid

Subj ObjPred

her

Obj

Page 16: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Analysis 2b for Small Clauses:Syntactic Monoclausal Analysis

• Structure represents accusative case marking of her (as object of matrix verb) and her as semantic subject using node label

considers

Atif stupid

k1 k2s

her

k2

Neo-Paninian analysis from IIIT Hyderabad,Used for DS in Hindi-Urdu Treebank

Page 17: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Analysis 2b for Small Clauses:Syntactic Monoclausal Analysis

• Structure represents accusative case marking of her (as object of matrix verb) and her as semantic subject using node label

सीमाझा�

अा��ति�फ ने बेवको� फ

k1 k2s

सी�मा� को�

k2

Neo-Paninian analysis from IIIT Hyderabad,Used for DS in Hindi-Urdu Treebank

Page 18: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Analysis 2a for Small Clauses:General Monoclausal Analysis

• Structure represents accusative case marking of her (as object of matrix verb) but not her as semantic subject

considers

Atif stupid

Subj Obj2

her

ObjNP

Atif

VP

considers her AdjP

stupid

S

Page 19: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Analysis 2b for Small Clauses:Syntactic Monoclausal Analysis

• Structure represents accusative case marking of her (as object of matrix verb) and her as semantic subject using node label

considers

Atif stupid

k1 k2s

her

k2NP

Atif

VP

considers her

AdjP

stupid

S

SC

Page 20: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Analysis 3 for Small Clauses:Raising to Object

• Structure represents accusative case marking of her and her as semantic subject but requires empty category

considers

Atif stupid

Subj Obj-Pred

her1

Obj

e1

Subj

Page 21: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Analysis 3 for Small Clauses:Raising to Object

• Structure represents accusative case marking of her and her as semantic subject but requires empty category

considers

Atif stupid

Subj Obj-Pred

her1

Obj

e1

Subj

S

NP

Atif

VP

considers S

VP

AdjPstupid

her1

e1

Analysis used for PS in Hindi-Urdu Treebank

Page 22: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Comparison of Representations

• Less Information • Same information

considers

Atif stupid

Subj Obj

her

Subj

considers

Atif stupid

Subj Obj2

her

Obj

considers

Atif stupid

Subj Obj-Pred

her1

Obj

e1

Subj

considers

Atif stupid

Subj ObjPred

her

Obj

considers

Atif stupid

Subj Obj-ECM

her

Subj

Tree 1a

Tree 2a

Tree 1b

Tree 2b

Tree 3

Page 23: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Initial Summary: Syntactic Phenomena, Representation Types, Analyses

• Syntactic phenomena are the empirical data of syntax as part of the science of language– Can be very similar across languages

• There can be several possible analyses– Some have less information– But there can be different analyses that represent

the same information differently• The analyses can be similar in DS and PS• Lots of choices in treebank design!

Page 24: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Overview

• What is a Syntactic Representation?• Dependency and Phrase Structure – Definitions– Syntactic and representational constituency and

dependency– Intermediate projections

• What does this Mean for NLP?• Takehome lessons

Page 25: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Representation Types:Dependency and Phrase Structure

• Dependency Tree (DS):– One label alphabet, words (= words in a sentence)– All nodes labeled with words or empty strings

• Phrase Structure Tree (PS):– Two disjoint label alphabets, terminals (= words in

sentence) and nonterminals– All and only interior nodes are labeled with

nonterminals– Leaves are labeled with terminals or empty strings

Page 26: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Non-Differences Between DS and PS

• WRONG: “DS can’t have empty categories”

considers

Atif stupid

Subj Obj-Pred

her1

Obj

e1

Subj

Page 27: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Non-Differences Between DS and PS

• WRONG: “DS must be unordered, PS must be ordered”

that the dancer a flower the violinist gives

Page 28: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Non-Differences Between DS and PS

• WRONG: “PS can’t be non-projective/have crossing arcs”

Page 29: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Overview

• What is a Syntactic Representation?• Dependency and Phrase Structure – Definitions– Syntactic and representational constituency and

dependency– Intermediate projections

• What does this Mean for NLP?• Takehome lessons

Page 30: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

The Double Life of Dependency and of Constituency

• There are two meanings to “dependency”:– A type of tree (no nonterminal labels)– A type of syntactic phenomenon, namely a

relation between words (e.g., “subjecthood”)• There are two meanings to “constituency”– A type of tree (terminal labels on leaf nodes; = PS

tree)– A type of syntactic phenomenon, namely a

grouping of words into phrases

Mathematical objects = tools for linguists to

represent data

Empirical facts = data for linguists

Don’t confuse data with its representation!

Page 31: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Constituency in PS Representation but not in Syntax

• VP premodifiers in PTB can be either sister to VP or in VP, there is no difference in meaning– (S (NP-SBJ Sandy

(ADVP-TMP often) (VP throws (NP curves)))

– (S (NP-SBJ Sandy (VP (ADVP-TMP often) throws (NP curves)))

• “The variation is in general free” (PTB Guidelines p. 137)

Page 32: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Constituency in Syntax but not in PS Representation

• Flat NP in PTB– Syntax of Adj-N-N sequences: • (green (baby bassinet))• ((small business) administration)

– Representation in Penn Treebank:• (green baby bassinet)• (small business administration)

Conscious decision not to represent NP-internal pre-nominal sturcture

Page 33: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Dependency in DS Representation but not in Syntax

• --- links in CATiB dependency treebank of Arabic– Signal that connected words are not analyzed by

annotators for dependency• Pof link in DS part of Hindi-Urdu Treebank– Used for multiword expressions which form a unit

syntactically (and thus have no structure) but may occur apart in sentence

Page 34: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Dependency in Syntax but not in DS Representation

• Small clauses in DS for Hindi-Urdu Treebank: dependency between embedded predicate and its semantic subject not marked as dependency link in tree

considers

Atif stupid

k1 k2s

her

k2

Page 35: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Aren’t DS and PS Representations Complementary? NO!

• Syntactic dependency can be encoded in DS, and typically is

• Usual convention: attachment in projection shows type of dependency

Page 36: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Aren’t DS and PS Representations Complementary? NO!

• Syntactic constituency is represented in DS• Usual convention: each node is the word, and

the head of the phrase containing it and all descendents

Page 37: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Overview

• What is a Syntactic Representation?• Dependency and Phrase Structure – Definitions– Syntactic and representational constituency and

dependency– Intermediate projections

• What does this Mean for NLP?• Takehome lessons

Page 38: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

But What About Intermediate Projections?

• If PS:– only has preterminals (eg V) and maximal

projections (eg S);– and no attchment happens at preterminal;then we have trivial correspondence to DS (consistency in the formal sense of Xia)

• Then PS and DS must have the exact same expressive power

Page 39: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

No Intermediate Projections: DS and PS Representationally Equivalent

S

N Adv NV

Rajesh happily eats laddus

eats

Rajesh happily laddus

What about the POS tags in the DS?

Page 40: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

No Intermediate Projections: DS and PS Representationally Equivalent

S

N Adv NV

Rajesh happily eats laddus

eatsV

RajeshN happilyAdv laddusN

What about grammatical functions?

Page 41: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

No Intermediate Projections: DS and PS Representationally Equivalent

S

NSubj AdvAdj NObjV

Rajesh happily eats laddus

eatsV

RajeshN happilyAdv laddusN

Subj ObjAdj

Page 42: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Intermediate Projections

• If PS only has preterminals (eg V) and maximal projections (eg S), we have trivial correspondence to DS

• No direct equivalent of intermediate projections (eg VP) in DS

Page 43: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Intermediate Projections: DS and PS Different!

S

NP

Adv NPVRajesh

happily eats laddus

eatsV

RajeshN happilyAdv laddusN

Subj ObjAdj

VP

VP

AdvPN

Note: X– X’—XP = maximal projectionEXCEPT: V—VP—S:

VP = intermediate projection

Page 44: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Intermediate Projections

• If PS only has preterminals (eg V) and maximal projections (eg S), we have trivial correspondence to DS

• No direct equivalent of intermediate projections (eg VP) in DS – we have a problem?

• What is important: how intermediate projections are used in the formal syntactic description

• A node is not information about the syntax, only a node and its interpretation

Page 45: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Why Have Intermediate Projections?

• Evidence for IntProj as a constituent– VP fronting in English

• Use of a VP to represent semantic facts– Interpretation of adverbs in English

• Use of IntProj to represent syntactic facts– Subject and object of verbs in English– Genitive versus adjectival modification in Arabic– Also: • Arguments of verbs in Hindi• Verb second in German

Page 46: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Intermediate Projection as Constituent:VP Fronting in English

• VP can be moved as a constituent– Rajesh said he would [VP eat laddus]

and [VP eat laddus] Rajesh did

Page 47: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

VP Fronting in English

S

NP

didRajesh

VP

AuxN

NPV

eat laddus

VP

Sbar

Page 48: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

What can Dependency Do?

ε1

laddus

Rajesh did

Aux

Obj

Subj

eat1

Top

did

laddus

Rajesh

Obj

Subj

eat

Top

did

Rajesh

Subj

laddus

Obj

eat

Comp

eat

did laddus

ObjAux

Rajesh

Subj

If auxiliary did is head(basically, we have a VP!)

If eat is head

Issue: two possible analysesof auxiliary-verb structures in DS

Page 49: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

• Possible claim:– VP adverbs characterize manner of event

• John quickly/happily played chess

– S adverbs characterize speaker attitude towards event:• Happily/fortunately (enough), John lost against me at chess

• Problem: interpretation independent of word order– Quickly/happily John played chess– John happily/oddly (enough) lost against me at chess

• Conflict between using VP to identify subject and using VP to describe semantics of adverbs– Would need traces to show scope

• In fact, Penn Treebank for English does not use VPs for scope

Intermediate Projections and Semantic Scope

Page 50: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Use of VP to Indicate Subject

considers

Atif stupid

Subj Obj-Pred

her1

Obj

e1

Subj

S

NP

Atif

VP

considers S

VP

AdjPstupid

her1

e1

Page 51: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Projections:Noun Modification in Arabic Treebank

• Idafa (genitive) construction: – (NP (N بيت/byt/house)

(NP (N المصري/AlmSry/the-Egyptian)))the house of the Egyptian

• Modification construction:– (NP (N الوزير/Alwzyr/the-minister)

(Adj المصري/AlmSry/the-Egyptian)) the Egyptian minister

• Syntactic difference signaled by projection to X or XP• Can’t do in DS – need labels

Page 52: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Conclusion from Intermediate Projections

• Intermediate projections (eg VP, N’) are specific to PS

• They can be used to encode syntactic or semantic information

• But: DS can express the same information!• Conclusion: the existence of intermediate

projections in PS is not a reason to say that PS is more expressive than DS

Page 53: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Overview

• What is a Syntactic Representation?• Dependency and Phrase Structure – Definitions– Syntactic and representational constituency and

dependency– Intermediate projections

• What does this Mean for NLP?• Takehome lessons

Page 54: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

What Does This Mean for NLP?

• Treebanks are not naturally occurring data • The guidelines are painstakingly produced by linguists and

represent a formal description of the language• Annotators understand a sentence, determine what syntactic

phenomena exist, and use the guidelines to choose an analysis for the sentence (a structure)

• Users of the treebank can use the guidelines to interpret the structures and get back the syntactic phenomena present

• These phenomena, and not their representation in the treebank, can be used for NLP in whatever representation chosen by the researcher!

• There is already lots of linguistics in our resources, we just need to make use of that linguistic information!

Page 55: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Conclusion

• 4 takehome lessons

Page 56: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Lesson 1: There are All Sortsof Trees

Fallacy: DS trees are unordered,have no empty categories,… and PS trees are ordered and …

Truth: DS and PS trees are graphs that can

have all sorts of properties

Page 57: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Lesson 2:Trees Need an Interpretation

Fallacy: Trees have intrinsic syntactic meaning

Truth: Trees only gain syntactic meaning through interpretation;in treebanks: guidelines

Page 58: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Lesson 3: Conversion between Formats Depends on Interpretation

Fallacy: It is easier/harder to convertDS to PS than vice versa

Truth: conversion means preserving interpretation; thus it depends on the two representations between which conversion happens

Page 59: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Lesson 4: There are two Meanings to “Constituency” and “Dependency”

Fallacy: DS trees only show dependency and PS trees only show constituency

Truth: there are distinct mathematical (data structure) notions and linguistic notions of “dependency” and

“phrase structure”; each tree type can show each

Page 60: Treebanks are Not Naturally Occurring Data Choices in Treebank Design and What They Mean for Natural Language Processing Owen Rambow Columbia University.

Thank you!