LIN 3098 Corpus Linguistics

LIN 3098 Corpus Linguistics Albert Gatt


Transcript of LIN 3098 Corpus Linguistics

Page 1: LIN 3098 Corpus Linguistics

LIN 3098 Corpus Linguistics

Albert Gatt

Page 2: LIN 3098 Corpus Linguistics

In this lecture

We proceed with our discussion of how corpus-based studies influence the study of grammar.

Focus: lexico-grammar

Page 3: LIN 3098 Corpus Linguistics

Uses of corpora in grammar studies

The use of corpora to study grammar is relatively recent.

With corpora, the unit of analysis tends to be the word (tokens/types).

Studies of lexis therefore a natural application.

The study of grammar has in fact emphasised the role of lexis.

Also aided by recent developments in automatic POS tagging and parsing.

Additional grammatical information enables search and analysis of complex structures.

Page 4: LIN 3098 Corpus Linguistics

Part 1

The relationship between grammar and lexis

Page 5: LIN 3098 Corpus Linguistics

Degrees of abstraction

We have already looked at the use of corpora in studying collocations.

Given sufficient grammatical annotation, we can look at collocational patterns at different degrees of abstraction.

Page 6: LIN 3098 Corpus Linguistics

Degrees of abstraction

Example: all preceding collocates of the noun time in the BNC.

Not all collocates are equally interesting: lots of noise when searching for a single word!

word     frequency
the      266
first    104
this     96
of       72
same     67
a        65
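The kind of count shown in the table can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the token list is invented (it is not BNC data), and `preceding_collocates` is a hypothetical helper name. It counts whatever word sits immediately to the left of the node word, which is why function words such as "the" show up alongside more interesting collocates.

```python
from collections import Counter

# Toy token list invented for illustration; NOT taken from the BNC.
tokens = ("the first time we met was the same time the train left "
          "and by this time the time had gone").split()

def preceding_collocates(tokens, node):
    """Count the word immediately to the left of each occurrence of `node`."""
    counts = Counter()
    for prev, cur in zip(tokens, tokens[1:]):
        if cur.lower() == node:
            counts[prev.lower()] += 1
    return counts

left_of_time = preceding_collocates(tokens, "time")
print(left_of_time.most_common())
```

With no grammatical filter, the article "the" is counted on equal terms with "first" or "same", which is the noise the slide refers to.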

Page 7: LIN 3098 Corpus Linguistics

Practical task 1

Let’s try to make our search more interesting, by focusing on a combination of lexical and grammatical material.

Conduct a search for: any adjective followed by the noun time

Page 8: LIN 3098 Corpus Linguistics

Degrees of abstraction

Example: only adjectival collocates of the noun time in the BNC.

Can make grammatically informed queries: [ADJ + time]

Allows focus on what is truly of interest.

word     frequency
long     38
good     11
spare    7
little   6
present  6
whole    5

Page 9: LIN 3098 Corpus Linguistics

Practical task 2

We can go further in abstracting away from specific lexical material.

Conduct a search for: any adjective followed by any noun

Page 10: LIN 3098 Corpus Linguistics

Degrees of abstraction

Suppose we were interested in all adjective-noun combinations: [ADJ + N]

Given a query language of the right complexity (such as CQL), we can extract grammatically interesting collocations.

ADJ+N               Freq.
prime minister      102
other hand          65
local authorities   44
long time           42
soviet union        41
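To make the idea of querying at different degrees of abstraction concrete, here is a minimal Python sketch of a token-pattern matcher. It is not CQL and not how any corpus tool is implemented; the `match` function, the constraint keys "word" and "tag", and the tagged tokens are all assumptions made for illustration. The tags imitate the CLAWS5 labels used in the BNC (AJ0 adjective, NN1/NN2 nouns), and a tag constraint is treated as a prefix, so "AJ" covers any adjective tag.

```python
from collections import Counter

# Toy (word, tag) pairs invented for illustration; tags imitate CLAWS5.
tagged = [
    ("a", "AT0"), ("long", "AJ0"), ("time", "NN1"),
    ("the", "AT0"), ("prime", "AJ0"), ("minister", "NN1"),
    ("spent", "VVD"), ("a", "AT0"), ("long", "AJ0"), ("time", "NN1"),
    ("in", "PRP"), ("spare", "AJ0"), ("rooms", "NN2"),
]

def match(tagged, pattern):
    """Count every token window that satisfies `pattern`.

    `pattern` is a list of per-token constraints. Each constraint is a dict
    with optional keys "word" (exact word form) and "tag" (tag prefix).
    """
    hits = Counter()
    n = len(pattern)
    for i in range(len(tagged) - n + 1):
        window = tagged[i:i + n]
        ok = all(
            ("word" not in c or w.lower() == c["word"])
            and ("tag" not in c or t.startswith(c["tag"]))
            for (w, t), c in zip(window, pattern)
        )
        if ok:
            hits[" ".join(w.lower() for w, _ in window)] += 1
    return hits

adj_time = match(tagged, [{"tag": "AJ"}, {"word": "time"}])  # [ADJ + time]
adj_noun = match(tagged, [{"tag": "AJ"}, {"tag": "NN"}])     # [ADJ + N]
print(adj_time.most_common())
print(adj_noun.most_common())
```

Dropping the "word" constraint from the second slot is all it takes to move from [ADJ + time] to the more abstract [ADJ + N].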

Page 11: LIN 3098 Corpus Linguistics

Limitations of these approaches

What we’ve done still retains a focus on the word.

The main purpose is to improve lexical research by incorporating a limited amount of grammatical info (usually POS).

Can we go further and really investigate grammar?

Page 12: LIN 3098 Corpus Linguistics

Part 2

Collocational Frameworks

Page 13: LIN 3098 Corpus Linguistics

Does this sound familiar?

Colourless green ideas sleep furiously

Chomsky’s example illustrates an approach to syntax where:

the primary focus is on syntactic rules

rules manipulate lexical items of the right categories

“grammatical” or “legal” is distinct from “sensible” or “meaningful”

syntactic rules operate (semi-)independently of lexical items: if X is of the right category, then X can be slotted into a syntactic position

Page 14: LIN 3098 Corpus Linguistics

Chicken and egg questions

When we formulate an utterance, which comes first? Syntax? Lexical items? Both in parallel?

Do particular syntactic constructions have a meaning (or communicative function)?

E.g. what is the meaning of:

the appositive that-construction: The reason that he gave was…

the extraposed it-construction: It is possible to hire a car if you want one.

Page 15: LIN 3098 Corpus Linguistics

Lexical approaches to grammar

Assumptions:

syntactic structures are highly sensitive to the lexical items that they can select

structures also may have specific communicative functions or meanings

speakers/authors convey meaning, and syntax is used as a resource to convey it

ideally, grammar+lexis should be viewed as part and parcel of the same process

phraseology and co-selection play an important role

in particular constructions, we find that particular words tend to co-occur with great regularity

Page 16: LIN 3098 Corpus Linguistics

The idiom principle

Sinclair (1991):

“a language user has available to him or her a large number of semi-preconstructed phrases that constitute single choices, even though they might appear to be analyzable into segments”

Page 17: LIN 3098 Corpus Linguistics

Implications

The idiom principle suggests that speakers/writers:

don’t just apply abstract rules to build structures;

re-use bits of structure.

It also implies that bits of structure are themselves meaningful.

Page 18: LIN 3098 Corpus Linguistics

The idiom principle vs open choice

This principle contrasts with the “open-choice” principle.

Open choice predicts that:

Syntactic rules operate independently of lexical items.

Structures are constructed by applying rules and “plugging” in lexemes.

Page 19: LIN 3098 Corpus Linguistics

Putting the idiom principle to work

Renouf and Sinclair (1991) introduced collocational frameworks.

Intended as a practical way to investigate the use and meaning of grammatical constructions.

A collocational framework consists of a pattern involving 3 items:

a function word

a content word (specified via POS)

another function word

Example: [a + Noun + of]

Page 20: LIN 3098 Corpus Linguistics

Collocational frameworks

Is a pattern like [a + Noun + of] a linguistic unit?

If it is, we would expect that the grammatical context (a, of) places restrictions on the semantics of the Noun in the middle (not just any noun can be used).

Page 21: LIN 3098 Corpus Linguistics

Practical task 3

Conduct a search for: the collocational framework [a+Noun+of]

In looking at the nouns that occur here, can you spot any semantic commonalities?

What does this tell you about the way the structure itself is used, and what it usually means?

Page 22: LIN 3098 Corpus Linguistics

[a + Noun + of]

Nouns in this construction are often quantities:

a lot of

a number of

...

This suggests that this construction itself places a restriction on the semantics of the content words used in it.
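A rough way to inspect the nouns that fill the middle slot of a framework is to collect them and count them. The sketch below is only an illustration: `framework_fillers` is a hypothetical helper, the tagged tokens are invented rather than drawn from a corpus, and the tags imitate CLAWS5 (AT0 article, NN* nouns, PRF for "of").

```python
from collections import Counter

# Toy (word, tag) pairs invented for illustration; tags imitate CLAWS5.
tagged = [
    ("a", "AT0"), ("lot", "NN1"), ("of", "PRF"), ("people", "NN0"),
    ("bought", "VVD"), ("a", "AT0"), ("number", "NN1"), ("of", "PRF"),
    ("books", "NN2"), ("and", "CJC"), ("a", "AT0"), ("lot", "NN1"),
    ("of", "PRF"), ("paper", "NN1"), ("from", "PRP"), ("a", "AT0"),
    ("shop", "NN1"), ("in", "PRP"), ("town", "NN1"),
]

def framework_fillers(tagged, left, right, slot_tag="NN"):
    """Count words whose tag starts with `slot_tag` between `left` and `right`."""
    fillers = Counter()
    for (w1, _), (w2, t2), (w3, _) in zip(tagged, tagged[1:], tagged[2:]):
        if w1.lower() == left and w3.lower() == right and t2.startswith(slot_tag):
            fillers[w2.lower()] += 1
    return fillers

a_n_of = framework_fillers(tagged, "a", "of")
print(a_n_of.most_common())
```

Reading down the resulting list of fillers is the step where semantic commonalities, such as quantity nouns, would become visible in real data.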

Page 23: LIN 3098 Corpus Linguistics

Collocational frameworks: final remarks

Sinclair and Renouf did not suggest that any string of words or pattern counts as a collocational framework.

Crucially, there has to be evidence for semantic restrictions on content words.

E.g. [Verb in NP] doesn’t count as a good pattern, because practically any verb can occur in the first position.

Page 24: LIN 3098 Corpus Linguistics

Part 3

Colligates

Page 25: LIN 3098 Corpus Linguistics

Colligations

Roughly, a collocation at the level of part of speech.

An idea due to Firth.

The main question is: What are the grammatical environments in which a particular word occurs?

One way of answering this question is to look for a word, and then look at the POS tags to the left and right.

Page 26: LIN 3098 Corpus Linguistics

Practical task 4

Conduct a search for the word consequence, specifying any word to the right and any word to the left.

Make a frequency count of node tags.

What do you observe?

Page 27: LIN 3098 Corpus Linguistics

Some data (Gries 2009)

Left context of consequence:

Article

Adjective

...

Right context:

of

Preposition

...
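Counting the tags on either side of a node word, as in the task above, can be sketched as follows. This is a minimal illustration, not Gries's procedure: `colligates` is a hypothetical helper, the tagged tokens are invented, and the tags imitate CLAWS5 (AT0 article, AJ0 adjective, PRF for "of", PRP other prepositions).

```python
from collections import Counter

# Toy (word, tag) pairs invented for illustration; tags imitate CLAWS5.
tagged = [
    ("a", "AT0"), ("direct", "AJ0"), ("consequence", "NN1"), ("of", "PRF"),
    ("the", "AT0"), ("strike", "NN1"), ("the", "AT0"), ("consequence", "NN1"),
    ("of", "PRF"), ("this", "DT0"), ("was", "VBD"), ("a", "AT0"),
    ("serious", "AJ0"), ("consequence", "NN1"), ("for", "PRP"), ("us", "PNP"),
]

def colligates(tagged, node):
    """Return (left, right) counters of the POS tags adjacent to `node`."""
    left, right = Counter(), Counter()
    for i, (word, _) in enumerate(tagged):
        if word.lower() != node:
            continue
        if i > 0:
            left[tagged[i - 1][1]] += 1
        if i < len(tagged) - 1:
            right[tagged[i + 1][1]] += 1
    return left, right

left_tags, right_tags = colligates(tagged, "consequence")
print(left_tags.most_common())
print(right_tags.most_common())
```

The function looks only one position to each side, so it shares the purely linear view of structure that this approach takes.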

Page 28: LIN 3098 Corpus Linguistics

Observations

This operationalisation of the concept of colligation is closely related to the collocational framework of Renouf/Sinclair.

It’s primarily intended to give an idea of the grammatical environment in which a word occurs.

Page 29: LIN 3098 Corpus Linguistics

Limitations

Both collocational frameworks and colligations have some drawbacks:

They’re still highly word-based.

They focus only on POS (not full syntax).

Their view of grammatical structure is purely linear.

Page 30: LIN 3098 Corpus Linguistics

Part 4

Some case studies

Page 31: LIN 3098 Corpus Linguistics

Example 1: It as object

Components:

non-referential use of it

object of a verb

followed by an NP or AdjP

Examples (from the BNC):

Many people who use drugs regularly find it difficult to exist in a drug-free world .

You can also find it hard to remember things in court

unless they agree to do so , making it difficult for detainees to challenge the validity

Page 32: LIN 3098 Corpus Linguistics

Example 1 continued

Typical analysis: this construction involves extraposition:

People who use drugs find existing in a drug-free world difficult.

People who use drugs find it difficult to exist in a drug-free world.

Some empirical observations on lexis (Francis 1993):

98% of cases involve find and make

some other verbs like think, consider, see to

Possible “meaning”/function of the structure:

a stereotyped way of presenting a situation in terms of how it is evaluated

evaluation is placed after the verb
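The lexical observation about which verbs occur in this structure rests on counting verb lemmas in a fixed environment. A linear approximation of that count is sketched below, purely as an illustration: it is not Francis's method, `it_object_verbs` is a hypothetical helper, the (word, lemma, tag) triples are invented, and the tags imitate CLAWS5. The pattern is a lexical verb, then "it", then an adjective; it does not check that "it" is non-referential or that a clause follows.

```python
from collections import Counter

# Toy (word, lemma, tag) triples invented for illustration; tags imitate CLAWS5.
tokens = [
    ("people", "people", "NN0"), ("find", "find", "VVB"), ("it", "it", "PNP"),
    ("difficult", "difficult", "AJ0"), ("to", "to", "TO0"), ("exist", "exist", "VVI"),
    ("this", "this", "DT0"), ("makes", "make", "VVZ"), ("it", "it", "PNP"),
    ("hard", "hard", "AJ0"), ("to", "to", "TO0"), ("remember", "remember", "VVI"),
    ("she", "she", "PNP"), ("found", "find", "VVD"), ("it", "it", "PNP"),
    ("in", "in", "PRP"), ("court", "court", "NN1"),
]

def it_object_verbs(tokens):
    """Count verb lemmas in the linear pattern [lexical verb + it + adjective]."""
    verbs = Counter()
    for (_, lemma, t1), (w2, _, _), (_, _, t3) in zip(tokens, tokens[1:], tokens[2:]):
        if t1.startswith("VV") and w2.lower() == "it" and t3.startswith("AJ"):
            verbs[lemma] += 1
    return verbs

verb_counts = it_object_verbs(tokens)
print(verb_counts.most_common())
```

Because "found it in court" has no adjective after "it", it is excluded, which shows how the environment narrows the hits before any lexical generalisation is attempted.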

Page 33: LIN 3098 Corpus Linguistics

Example 2: appositive clauses

Apposition: a relation between an NP and another phrase which refers to the same thing (Leech and Svartvik, 1975)

Examples: your daughter, the lawyer, is here

In English, can also occur with that-clauses and to-clauses:

the news that your daughter was here

the plot to assassinate the president

Page 34: LIN 3098 Corpus Linguistics

Example 2: appositive clauses

Distinguished from restrictive relative clauses:

Restrictive relative clause: the dog that I saw yesterday

restricts the reference of the head noun

Appositive clause: the fact that I came

does not restrict the reference of the head noun

“amplifies” or “qualifies” the head noun

Page 35: LIN 3098 Corpus Linguistics

Example 2: Appositives

Appositive that-clauses (BNC):

The fining of airlines plus the fact that the nationals of many refugee-producing countries

as firm as the Emperor Augustus about the principle that a ruler 's actual appearance matters less

Traditional grammars (Leech and Svartvik 1975): “head noun must be an abstract noun”

Question:

what are the lexical restrictions here?

do they have implications for the function of this syntactic structure?

Page 36: LIN 3098 Corpus Linguistics

Levels of stereotypicality in syntax

Phraseological constraints: the co-selection of particular lexical items within a particular syntactic structure

These seem to range on a continuum.

At one extreme: fixed, unchanging constructions (behave like multi-word lexical items)

At the other: complete freedom in lexical selection.

Page 37: LIN 3098 Corpus Linguistics

Phraseology

Completely fixed idioms: it never rains but it pours

Less fixed idioms (some room for lexical manoeuvre):

put on a brave face

putting a brave face on …

put a good face on…

Semi-prepackaged phrases which allow for variation: I haven’t the faintest/foggiest/remotest idea/notion

Highly nebulous lexico-syntactic dependencies: be a case of X

a case of déjà vu

a case of take the money and run

…

Page 38: LIN 3098 Corpus Linguistics

Syntactic “fixedness”

Given the cline from fixed to flexible, some linguists (e.g. Francis 1993) suggest that the distinction between “lexicon” and “syntax” is arbitrary.

This argument is based on phraseological constraints observable only in very large corpora.

This is not too far from recent positions in Generative Grammar:

Jackendoff’s (2002) parallel architecture;

Construction Grammar (e.g. Goldberg, 1995)

Page 39: LIN 3098 Corpus Linguistics

The “item” and the “environment”

Francis proposes that the distinction between “lexical item” and “syntactic environment” only be used for convenience.

Proposed method:

look at a syntactic environment

discover lexical regularities

focus on a subset of the lexical items

discover further generalisations about the grammar of those items

Page 40: LIN 3098 Corpus Linguistics

Case study: Extraposed it-clauses

One of the most frequent adjectives is possible:

it is possible to hire a car

it is possible that it will rain

Proposed interpretations:

that-clause is used for possibility

to-clause is used to express ability

This suggests that possible might have (at least) two different meanings.

Page 41: LIN 3098 Corpus Linguistics

The grammar of possible

Further patterns involving possible:

article + superl. adj. + possible + noun: the best possible start

as … as possible

…

Main idea:

specifications of possible grammatical environments of the item can help specify its range of meanings.

these examples seem to confirm the ability/probability use of possible