Thomas Hoffmann (University of Regensburg)


Page 1

Corpus and Experimental Data as Corroborating Evidence: The Case of Preposition Placement in English Relative Clauses

Linguistic Evidence: Empirical, Theoretical, and Computational Perspectives University of Tübingen, 02.02.-04.02.2006

Thomas Hoffmann

(University of Regensburg)

Page 2

1. Introduction: Corpus vs. Introspection

We do not need to use intuition in justifying our grammars, and as scientists, we must not use intuition in this way. (Sampson 2001: 135)

You don’t take a corpus, you ask questions. […] You can take as many texts as you like, you can take tape recordings, but you’ll never get the answer. (Chomsky in Aarts 2000: 5-6)

Which type of data are we left with then?

Page 3

1. Introduction: Corpus vs. Introspection

A corpus and an introspection-based approach to linguistics […] can be gainfully viewed as being complementary.

(McEnery and Wilson 1996: 16)

corpus and introspection data = corroborating evidence

case study: P placement in English relative clauses

Page 4

1. Introduction: What to Expect

1. corpora vs. introspection?

2. categorical corpus data (ICE-GB corpus)

3. Magnitude Estimation experiment

4. variable corpus data (ICE-GB corpus)

5. conclusion

Page 5

2. Corpora and Introspection

Arguments against corpus data:

• “performance” problem:

• “negative data” problem:

• “homogeneity” problem:

“only use introspection”

Page 6

2. Corpora and Introspection

Arguments against corpus data: no corpus

• “performance” problem: yet: performance is a result of competence; modern corpora are representative

• “negative data” problem: yet: only additional (different) data needed

• “homogeneity” problem: yet: empirical claim that needs to be investigated

use corpora + additional data type

Page 7

2. Corpora and Introspection

Arguments against introspection data:

• “unnatural data” problem:

• “irrefutable data” problem:

• “illusion” problem:

• “stability” problem:

“only use corpora”

Page 8

2. Corpora and Introspection

Arguments against introspection data: no introspection

• “unnatural data” problem: yet: only additional (context) data needed

• “irrefutable data” problem: yet: depends only on collection method

• “illusion” problem: yet: only additional (natural) data needed

• “stability” problem: yet: empirical claim that needs to be investigated

use corpora + additional data type

Page 9

2. Corpora and Introspection

Corpora and introspection are corroborating evidence:

introspection (compensates for weaknesses of corpus data):

+ ungrammaticality

+ negative data

+ rare phenomena

corpus (compensates for weaknesses of introspection data):

+ unexpected patterns

+ contextual factors

+ natural language

Page 10

3. Case Study: Preposition Placement

I want a data source ...

(1) a. which I can rely on [stranded preposition]

b. on which I can rely [pied-piped preposition]

driving question: which data source for the empirical analysis of (1a,b)?

Page 11

4. Empirical Study I: Corpus Data

• Corpus used:

International Corpus of English, British component (ICE-GB; Nelson et al. 2002): educated present-day British English (BE), written & spoken

• Analysis tool:

GOLDVARB computer program (logistic regression; Robinson et al. 2001) → relative influence of various contextual factors (factor weights: < 0.5 = inhibiting; > 0.5 = favouring)
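GOLDVARB expresses each factor's influence as a factor weight between 0 and 1. Under the logistic model it fits, a weight w corresponds to a log-odds coefficient log(w / (1 - w)), so 0.5 is neutral, weights above 0.5 push towards the application value and weights below 0.5 push away from it. The short Python sketch below illustrates that reading; the helper names are hypothetical and not part of GOLDVARB, and the two example weights are the restrictiveness weights for pied piping reported in the corpus study.

```python
import math


def weight_to_logodds(weight: float) -> float:
    """Convert a factor weight (0 < w < 1) into a log-odds coefficient."""
    if not 0.0 < weight < 1.0:
        raise ValueError("factor weights must lie strictly between 0 and 1")
    return math.log(weight / (1.0 - weight))


def classify(weight: float) -> str:
    """Label a factor weight using the < 0.5 / > 0.5 convention."""
    if weight > 0.5:
        return "favouring"
    if weight < 0.5:
        return "inhibiting"
    return "neutral"


# Restrictiveness weights for pied piping from the variable corpus data.
for label, w in [("restrictive RC", 0.592), ("nonrestrictive RC", 0.248)]:
    print(f"{label}: weight={w}, log-odds={weight_to_logodds(w):+.3f}, {classify(w)}")
```

Working on the log-odds scale makes the asymmetry visible: 0.248 lies much further from neutral than 0.592 does.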

Page 12

4. Empirical Study I: Corpus Data I

Pstrand/Ppied-piped tokens tested for:

1. finiteness

2. restrictiveness

3. relativizer

4. XP the PP is contained in (V / N, e.g. entrance to sth. / Adj, e.g. afraid of sth.)

5. level of formality

6. X-PP relationship (Vprepositional, PPLoc_Adjunct, PPMan_Adjunct …)

except 2: all factors discussed in the literature before, but not with respect to their interdependence (e.g. Bergh & Seppänen 2000; Trotta 2000)

Page 13

4.1 Categorical corpus data

raw ICE-GB P-placement data:

1074 finite relative clauses

659 (61.4%) tokens: pied-piped

415 (38.6%) tokens: stranded

as expected: many categorical effects

accidental vs. systematic gaps?
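The percentages on this slide follow directly from the two raw counts. A minimal Python check (the helper name `shares` is illustrative):

```python
def shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each variant's share of the total, in per cent."""
    total = sum(counts.values())
    return {variant: 100.0 * n / total for variant, n in counts.items()}


# Raw ICE-GB P-placement counts for finite relative clauses.
ice_gb = {"pied-piped": 659, "stranded": 415}
assert sum(ice_gb.values()) == 1074

for variant, pct in shares(ice_gb).items():
    print(f"{variant}: {ice_gb[variant]} ({pct:.1f}%)")
```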

Page 14

4.2 Categorical corpus data: that/Ø ≠ WH-relatives

1. relativizer:

all that/Ø-tokens in ICE-GB stranded

176 that+Pstranded tokens

(2) *a data source on that I can rely

177 Ø+Pstranded tokens

(3) *a data source on Ø I can rely

ICE-GB result: expected

implications: (2) = (3)? / that ≠ WH-

Page 15

4.3 Categorical corpus data: Constraints on Pstrand

2. X-PP relationship:

Literature (e.g. Bergh & Seppänen 2000; Trotta 2000):

Pstranding favoured with complement PPs

disfavoured with adjunct PPs

ICE-GB data:

Pstranding restricted to PPs which add thematic information to predicates/events

Page 16

4.3 Categorical corpus data: Constraints on Pstrand

2. X-PP relationship:

categorical effect of WH-PPAdjunct tokens:

a) only P+WH and no that/Ø+P in ICE-GB: manner, degree, frequency & respect PPs, e.g.:

(4) a. the ways in which the satire is achieved <ICE-GB:S1B-014 #5:1:A>

b. the ways which/that/Ø the satire is achieved in

Page 17

4.3 Categorical corpus data: Constraints on Pstrand

2. X-PP relationship:

categorical effect of WH-PPAdjunct tokens:

b) only P+WH, but that/Ø+P attested in ICE-GB: subcat. PPs (put sth. in/into/under) & locative, affected locative and direction PP adjuncts

(5) a. … the world that I was working in and studying in <ICE-GB:S1A-001 #35:1B>

b. … the world in which I was working and studying

Page 18

4.3 Categorical corpus data: Constraints on Pstrand

Claim: comparison of WH- vs. that/Ø shows:

P can only be stranded if the PP adds thematic information to predicates/events

• manner & degree adjuncts: compare events “to other possible events of V-ing” (Ernst 2002: 59)

• frequency & respect adjuncts: have scope over temporal information (frequency) and over the truth value of the entire clause (respect)

→ these do not add a thematic participant: absence of Pstrand with these = systematic gap

Page 19

4.3 Categorical corpus data: Constraints on Pstrand

Claim: comparison of WH- vs. that/Ø shows:

P can only be stranded if the PP adds thematic information to predicates/events

• subcat. PPs & locative, affected locative, direction PP adjuncts: add a thematic participant → absence of WH+P with these = accidental gap

Page 20

4.3 Categorical corpus data: Constraints on Pstrand

Claim: comparison of WH- vs. that/Ø shows:

P can only be stranded if the PP adds thematic information to predicates/events

The comparison of WH- vs. that/Ø is good evidence, but the “negative data” problem remains

→ further corroborating evidence needed: introspection (Magnitude Estimation study)

Page 21

5. Empirical Study II: Magnitude Estimation

• relative judgements (against a reference sentence)

• informal, restrictive RCs tested for:

P-PLACEMENT (Pstrand, Ppied-piped)

RELATIVIZER (WH-, that-, Ø-)

X-PP (VPrep, PPTemp/Loc_Adjunct, PPManner/Degree_Adjunct)

• tokens counterbalanced: 6 material groups of 18 tokens each; 18 tokens + 36 fillers = 54 tokens per group

• tokens randomized (WebExp software)

• N = 36 BE native speakers (sex: 18m, 18f / age: 17-64)
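The slide gives the design only in outline. One way to obtain 6 counterbalanced material groups of 18 experimental tokens plus 36 fillers is a cyclic, Latin-square-style rotation, sketched below in Python. Everything beyond the factor levels and the counts is an assumption: the 18 hypothetical items, the rotation offset of 3, and the seeded shuffle standing in for WebExp's randomisation. Because X-PP varies fastest in the condition list and the offset is a multiple of 3, each item keeps its X-PP type and rotates through the six P-PLACEMENT × RELATIVIZER combinations across groups, while every group sees each of the 18 conditions exactly once.

```python
import itertools
import random

# Factor levels as listed on the slide.
P_PLACEMENT = ["Pstrand", "Ppied-piped"]
RELATIVIZER = ["WH", "that", "Ø"]
X_PP = ["VPrep", "PPTemp/Loc", "PPMan/Deg"]

# 2 x 3 x 3 = 18 conditions; X_PP varies fastest in this order.
CONDITIONS = list(itertools.product(P_PLACEMENT, RELATIVIZER, X_PP))

N_GROUPS = 6
N_ITEMS = 18    # hypothetical: 18 lexicalised items
N_FILLERS = 36
STEP = 3        # assumed rotation offset (a multiple of len(X_PP))


def build_group(group: int, seed: int = 0) -> list[tuple]:
    """Return one shuffled presentation list for a hypothetical material group.

    Item i appears in condition (i + STEP * group) mod 18, so each group
    sees every condition exactly once.
    """
    tokens = [("item", i, CONDITIONS[(i + STEP * group) % len(CONDITIONS)])
              for i in range(N_ITEMS)]
    tokens += [("filler", j, None) for j in range(N_FILLERS)]
    random.Random(seed + group).shuffle(tokens)  # stand-in for WebExp randomisation
    return tokens


groups = [build_group(g) for g in range(N_GROUPS)]
print([len(g) for g in groups])
```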

Page 22

5. Empirical Study II: Magnitude Estimation

18 filler sentences: ungrammatical, e.g.

a. That’s a tape I sent them that done I’ve myself (word order violation; original source: <ICE-GB:S1A-033 074>)

b. There was lots of activity that goes on there (subject contact clause; original source: <ICE-GB:S1A-004 #067>)

c. There are so many people who needs physiotherapy (subject-verb agreement error; original source: <ICE-GB:S1A-003 #027>)

Page 23

5. Empirical Study II: Magnitude Estimation

ANOVA: significant effects

• P-PLACEMENT: F(1,33) = 4.536, p < 0.05

• RELATIVIZER: F(2,66) = 17.149, p < 0.001

• P-PLACEMENT*X-PP: F(2,66) = 9.740, p < 0.001

• P-PLACEMENT*RELATIVIZER: F(2,66) = 4.217, p < 0.02

Page 24

5. Empirical Study II: Magnitude Estimation

ANOVA: not significant

• AGE: F(1,33) = 2.760, p > 0.10

• GENDER: F(1,33) = 1.495, p > 0.20

→ indicates homogeneity of subjects
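The p-value bounds on the two ANOVA slides can be re-derived from each F statistic and its degrees of freedom. The upper-tail probability of an F(d1, d2) distribution equals the regularized incomplete beta function I_x(d2/2, d1/2) with x = d2 / (d2 + d1·F). The stdlib-only Python sketch below evaluates it with the standard continued-fraction method (modified Lentz, as in Numerical Recipes); in practice `scipy.stats.f.sf` does the same job. The helper names are illustrative.

```python
import math


def _betacf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 1e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_sf(f: float, d1: int, d2: int) -> float:
    """Upper-tail p-value P(F > f) for an F(d1, d2) distribution."""
    return reg_inc_beta(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


# F statistics and degrees of freedom as reported on the ANOVA slides.
reported = {
    "P-PLACEMENT": (4.536, 1, 33),
    "RELATIVIZER": (17.149, 2, 66),
    "P-PLACEMENT*X-PP": (9.740, 2, 66),
    "P-PLACEMENT*RELATIVIZER": (4.217, 2, 66),
    "AGE": (2.760, 1, 33),
    "GENDER": (1.495, 1, 33),
}
for name, (f, d1, d2) in reported.items():
    print(f"{name}: F({d1},{d2}) = {f}, p = {f_sf(f, d1, d2):.4f}")
```

For d1 = 2 the tail has the closed form (1 + 2F/d2)^(-d2/2), which gives an independent check on the numerical routine.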

Page 25

5. Empirical Study II: Magnitude Estimation

Post-hoc Tukey test: P-PLACEMENT*RELATIVIZER

• Ppied-piped: WH- >> that [p < 0.001]; WH- >> Ø [p < 0.001]; that > Ø [p < 0.010]

• Pstrand: no difference: WH- = that = Ø [p >> 0.100]

Page 26

5. Empirical Study II: Magnitude Estimation

Post-hoc Tukey test: P-PLACEMENT*X-PP

• Ppied-piped: PPMan/Deg > VPrep [p < 0.010]; PPMan/Deg = PPTemp/Loc [p = 0.100]; VPrep = PPTemp/Loc [p > 0.100]

• Pstrand: VPrep > PPTemp/Loc > PPMan/Deg [p < 0.001]

Page 27

(Chart: mean judgements (z-scores), y-axis from -2 to 2; series P+WH, P+That, P+Ø; x-axis: prepositional verbs, temp/loc adjuncts, manner/deg adjuncts)

Fig. 1: Magnitude estimation result for P + relativizer

P+WH >> P+that > P+Ø
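The figures report mean judgements as z-scores. Magnitude estimation ratings are free-scale numbers given relative to a reference sentence, so they are usually normalised per participant before averaging (cf. Bard et al. 1996). The Python sketch below assumes one common pipeline: divide by the participant's rating of the reference sentence, take logarithms, then z-score within the participant. The slides do not say which steps were applied, and the ratings in the usage line are hypothetical, not data from the study.

```python
import math
from statistics import mean, pstdev


def normalise(ratings: list[float], reference: float) -> list[float]:
    """Assumed ME pipeline for one participant: ratio to the reference
    rating, natural log, then z-score within the participant."""
    logged = [math.log(r / reference) for r in ratings]
    mu, sd = mean(logged), pstdev(logged)
    if sd == 0:
        # A participant who gives identical ratings has no spread to scale.
        return [0.0 for _ in logged]
    return [(v - mu) / sd for v in logged]


# Hypothetical ratings from one participant (not data from the study);
# the reference sentence is assumed to have been rated 50.
z = normalise([100, 50, 25, 10, 80], reference=50)
print([round(v, 2) for v in z])
```

Because log(r / reference) = log r - log reference, the division only shifts every logged value by the same constant, which z-scoring removes; it matters only when the logged ratios themselves are analysed.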

Page 28

Fig. 2: Magnitude estimation result for P + relativizer compared with fillers

P+that & P+Ø = ungrammatical fillers → violation of a “hard constraint” (Sorace & Keller 2005)

(Chart: mean judgements (z-scores), y-axis from -2 to 2; series P+WH, P+That, P+Ø, Filler (grammatical), Filler (*Agree), Filler (*ZeroSubj), Filler (*WordOrder); x-axis: prepositional verbs, temp/loc adjuncts, manner/deg adjuncts)

Page 29

(Chart: mean judgements (z-scores), y-axis from -2 to 2; series WH+P, That+P, Ø+P; x-axis: prepositional verbs, temp/loc adjuncts, manner/deg adjuncts)

Fig. 3: Magnitude estimation result for relativizer + P

WH+P = that+P = Ø+P; VPrep > PPTemp/Loc > PPMan/Deg

Page 30

(Chart: mean judgements (z-scores), y-axis from -2 to 2; series X+P, Filler_Good, Filler (*Agree), Filler (*ZeroSubj), Filler (*WordOrder); x-axis: prepositional verbs, temp/loc adjuncts, manner/deg adjuncts)

Fig. 4: Magnitude estimation result for relativizer + P compared with fillers

VPrep > PPTemp/Loc > PPMan/Deg >> ungrammatical fillers → violation of a “soft constraint” (Sorace & Keller 2005)

Page 31

6. Corroborating Evidence

Corroborating evidence:

corpus: man/deg PPs: no Pstranded (not even with that/Ø) → semantic constraint on Pstranded

experiment: man/deg PPs are the worst environment for Pstranded, yet still better than ungrammatical fillers (soft constraint violation)

Page 32

7. Empirical Study III: Corpus Data II

Constraints on variable corpus data (354 finite WH-tokens):

Goldvarb identified 3 independent factors (log likelihood = -88.437, significance = 0.004; fit: X-square(27) = 27.977, accepted, p = 0.2040):

1. level of formality (as expected)

2. type of PP contained in (as expected)

3. restrictiveness (unexpected): restrictive RCs favour pied piping (weight: 0.592); nonrestrictive RCs clearly inhibit pied piping (i.e. favour stranding; weight: 0.248)
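To see what the restrictiveness weights mean in probability terms: the Varbrul logistic model combines an input probability p0 with the factor weights p_i of a token's factors as logit(p) = logit(p0) + Σ logit(p_i). The Python sketch below assumes a hypothetical input probability of 0.6 (GOLDVARB's input value is not reported on the slides) and sets the other two factors to a neutral 0.5, so only the reported restrictiveness weights (0.592 vs. 0.248) vary.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def predicted_probability(input_prob: float, weights: list[float]) -> float:
    """Varbrul-style logistic combination of an input probability and factor weights."""
    return inv_logit(logit(input_prob) + sum(logit(w) for w in weights))


INPUT_PROB = 0.6  # hypothetical; not reported on the slides
NEUTRAL = 0.5     # placeholder weight for formality and PP type

for label, w in [("restrictive", 0.592), ("nonrestrictive", 0.248)]:
    p = predicted_probability(INPUT_PROB, [w, NEUTRAL, NEUTRAL])
    print(f"{label}: P(pied piping) = {p:.3f}")
```

Under these assumptions the predicted rate of pied piping roughly halves in nonrestrictive RCs; the absolute values depend entirely on the assumed input probability.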

Page 33

7. Empirical Study III: Corpus Data II

(6) And uhm he left me there with this packet of Durex which I hadn't got a clue what to do [with] to be totally honest <ICE-GB:S1B-049 #167:1:B>

reasons for restrictiveness effect:

1. weaker semantic ties of non-restrictive clauses with their antecedent (pause/comma)

2. pied-piped P receives a connective function

→ functionalisation of preposition placement in WH-relative clauses

Page 34

8. Conclusion

corpus and introspection data = corroborating evidence:

corpora:

• frequency/context effects (e.g. level of formality)

• unexpected patterns (e.g. restrictiveness)

• categorical data require further investigation

introspection:

• differentiation of accidental gaps (WH+P with PPTemp/Loc) and systematic gaps (X+P with PPMan/Deg)

• detection of degrees of ungrammaticality

Page 35

9. References

Aarts, B. 2000. “Corpus linguistics, Chomsky and Fuzzy Tree Fragments”. In: C. Mair & M. Hundt, eds. Corpus Linguistics and Linguistic Theory. Amsterdam and Atlanta, GA: Rodopi, 5-13.

Bard, E.G. et al. 1996. “Magnitude Estimation of Linguistic Acceptability”. Language 72:32-68.

Bergh, G. & A. Seppänen. 2000. “Preposition stranding with wh-relatives: A historical survey”. English Language and Linguistics 4:295-316.

Cowart, W. 1997. Experimental Syntax: Applying Objective Methods to Sentence Judgements. Thousand Oaks: Sage.

Huddleston, R. et al. 2002. “Relative constructions and unbounded dependencies”. In: R. Huddleston & G.K. Pullum, eds. The Cambridge Grammar of the English Language. Cambridge: Cambridge University Press, 1031-1096.

Jackendoff, R. 2002. Foundations of Language: Brain, Meaning, Grammar, Evolution. Oxford: Oxford University Press.

Levine, R. & I.A. Sag. 2003. “WH-Nonmovement”. <http://www-csli.stanford.edu/~sag>, 04.07.2004.

Page 36

9. References

McEnery, T. & A. Wilson. 1996. Corpus Linguistics. Edinburgh: Edinburgh University Press.

Nelson, G. et al. 2002. Exploring Natural Language: Working with the British Component of the International Corpus of English. Amsterdam, Philadelphia: Benjamins.

Penke, M. & A. Rosenbach. 2004. “What counts as evidence in linguistics? An introduction”. Studies in Language 28,3: 480-526.

Pesetsky, D. 1998. “Some optimality principles of sentence pronunciation”. In: P. Barbosa et al., eds. Is the Best Good Enough? Optimality and Competition in Syntax. Cambridge, MA: MIT Press, 337-383.

Pickering, M. & G. Barry. 1991. “Sentence processing without empty categories”. Language and Cognitive Processes 6:229-259.

Quirk, R. et al. 1985. A Comprehensive Grammar of the English Language. London: Longman.

Robinson, J. et al. 2001. “GOLDVARB 2001: A Multivariate Analysis Application for Windows”. <http://www.york.ac.uk/depts/lang/webstuff/goldvarb/manualOct2001>

Page 37

9. References

Sag, I.A. 1997. “English relative constructions”. Journal of Linguistics 33:431-484.

Sampson, G. 2001. Empirical Linguistics. London, New York: Continuum.

Schütze, C.T. 1996. The Empirical Base of Linguistics: Grammaticality Judgements and Linguistic Methodology. Chicago: University of Chicago Press.

Sorace, A. & F. Keller. 2005. “Gradience in linguistic data”. Lingua 115,11: 1497-1525.

Trotta, J. 2000. Wh-clauses in English: Aspects of Theory and Description. Amsterdam and Atlanta, GA: Rodopi.

Van der Auwera, J. 1985. “Relative that — a centennial dispute”. Journal of Linguistics 21:149-179.