LIN 3098 – Corpus Linguistics
Albert Gatt
In this lecture
Corpora for the study of genre/register variation
  revisit the concept of representativeness and balance
  external vs. internal criteria: Biber (1993)
  introduce the multi-dimensional approach to register/genre variation (Biber 1988)
A preliminary example
Compare the following:
  It is hard to resolve this problem.
  I find it hard to resolve this problem.
Is one intuitively more “formal”? Why?
A preliminary example
Extraposed to-clause
It is hard to resolve this problem.
  It (expletive)
  Verb be
  An adjective (hard) or participle (boring)
  Clause starting with to + infinitive verb
Tends to be associated with a formal, “anonymous” style.
Tends to be “static”: the adjective or participle denotes a state, not a dynamic event.
If our intuitions are correct, we would expect the distribution of this clause to vary across genres and registers.
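A rough way to test this expectation is to count candidate extraposed to-clauses in samples from different registers. The sketch below is only a surface approximation, assuming plain untagged text: the adjective list, the pattern and the function name are illustrative assumptions, and a tagged corpus would be needed to identify adjectives and participles reliably and to exclude referential "it".

```python
import re

# Hypothetical, deliberately small adjective/participle list; a real study
# would rely on part-of-speech tags rather than a fixed word list.
ADJECTIVES = ["hard", "easy", "difficult", "possible", "important", "boring"]

# it + a form of "be" + adjective + to + a following word (assumed infinitive)
PATTERN = re.compile(
    r"\bit\s+(?:is|was|will\s+be|would\s+be)\s+(?:%s)\s+to\s+\w+"
    % "|".join(ADJECTIVES),
    re.IGNORECASE,
)

def count_extraposed(text):
    """Count surface matches of 'it BE ADJ to VERB' in a string."""
    return len(PATTERN.findall(text))

formal = "It is hard to resolve this problem."
informal = "I find it hard to resolve this problem."
print(count_extraposed(formal), count_extraposed(informal))  # prints: 1 0
```

Dividing such counts by the size of each sample would make the figures comparable across registers.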
What is a register?
Would you consider the following to be registers?
1. recipe English
2. legal Maltese
3. specialised language used by shipbuilders
What are the crucial characteristics of register?
Defining register
Possible definitions (see overview in Paolillo 2000):
  register = “a field of discourse” or “topic”
  register = “a combination of all the parameters of the communicative situation”
  register = “an occupationally determined variety of language”
Defining genre
In discourse analysis and related fields, genre is given a “sociologically oriented” definition:
  “A socially ratified way of using language in connection with a particular type of social activity”
  suggests “typical” settings in which language is used
  e.g. interview, lecture, story…
Why is this relevant?
Reminder (see lecture 2):
  general-purpose corpora aim for balance and representativeness
  how genre/register are defined affects the structure and the uses of the corpus
  corpus-based studies of variation across/within registers need a well-defined notion of register
Balance and representativeness
Balance:
  refers to the range of types of text in the corpus
  e.g. the BNC’s construction was based on an a priori classification of texts by domain, time and medium
Representativeness:
  refers to the extent to which the corpus contains the full range of variation in the language.
Representativeness depends on balance as a prerequisite
Biber (1993) on achieving balance
Biber distinguishes:
external criteria:
  social and communicative contexts in which a particular sample of text/speech is produced
  external criteria define registers or genres
internal criteria:
  linguistic (e.g. lexico-grammatical) features that distinguish texts
  internal criteria define text types
External vs. internal
Example: academic writing vs. spoken conversation
Some external criteria of differentiation:
  primary channel (spoken/written/…)
  type of addressee
  factuality
Some internal criteria of differentiation:
  more use of personal pronouns in spoken discourse
  more use of passives in academic writing
  …
Which should come first?
Biber’s argument:
“in defining the population for a corpus, register/genre distinctions [i.e. external criteria] take precedence over text-type distinctions. […] identification of the salient text-type distinctions in a language requires a representative corpus of texts…”
Biber’s external criteria
1. Primary channel: written/spoken/scripted
2. Format: published/unpublished
includes various publication formats
3. Setting: institutional/other/private-personal
4. Addressee/receiver
   a. Plurality: unenumerated/plural/individual/self
   b. Presence: present/absent
   c. Interactiveness: none/little/extensive
   d. Shared knowledge: general/specialised/personal
5. Addressor:
   a. Demographic variation: age, sex etc.
   b. Acknowledgement: acknowledged individual/institution
6. Factuality: factual-informational / intermediate / imaginative
7. Purposes: persuade, entertain, edify, inform, instruct…
8. Topics: [cf. the “Domain” definition in BNC texts]
The logic behind genre/register comparison
A priori distinction between different genres/registers
  adequately sampled to be representative
Given these externally-based distinctions, the question is:
  what linguistic features are characteristic of (give rise to) different genres?
Biber (1988, 1995)
Compared twenty-one genres in spoken and written British English
Used a precompiled list of 67 linguistic features, comparing:
  the extent to which these features “cluster together” across genres
    e.g. high relative frequency of personal pronouns => high relative frequency of questions
  the extent to which these clusters are more clearly present in different genres
Primary goals
1. identify the main dimensions (clusters of features) of variation underlying all registers
2. find similarities and differences between different registers
Dimensions
Dimension:
  group of features that are empirically determined to co-occur in text
Functional interpretation:
  given a set of features forming a dimension, e.g. pers. pronouns + questions
  the crucial question is: how do we interpret it functionally?
  e.g. the cluster containing pers. pronouns and questions shows a high level of interpersonal focus in the text
Factor analysis
The MF/MD (multi-feature/multi-dimensional) approach uses factor analysis
  a statistical technique to group together related features based on their co-occurrence
  resulting clusters of features (“factors”) are then interpreted and given a label
  this is the process of identification and functional interpretation of dimensions
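Factor analysis starts from the pattern of correlations among feature frequencies across texts. The sketch below does not perform factor analysis itself; it only computes Pearson correlations over a small table, to show what "co-occurrence" means numerically. All text names and frequencies are invented illustration values, not corpus data.

```python
from math import sqrt

# Invented frequencies per 1,000 words for four hypothetical texts.
texts = {
    "conversation":    {"pers_pronouns": 90.0, "questions": 12.0, "nouns": 140.0},
    "personal_letter": {"pers_pronouns": 70.0, "questions": 8.0,  "nouns": 160.0},
    "press_report":    {"pers_pronouns": 25.0, "questions": 2.0,  "nouns": 230.0},
    "academic_prose":  {"pers_pronouns": 10.0, "questions": 1.0,  "nouns": 260.0},
}

def column(feature):
    """Return one feature's frequencies across all texts, in a fixed order."""
    return [row[feature] for row in texts.values()]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Features that rise and fall together correlate positively;
# features in complementary distribution correlate negatively.
r_pron_quest = pearson(column("pers_pronouns"), column("questions"))
r_pron_nouns = pearson(column("pers_pronouns"), column("nouns"))
print(round(r_pron_quest, 2), round(r_pron_nouns, 2))
```

In this toy table, pronouns and questions pattern together while nouns pattern against them, which is the kind of grouping a factor would summarise.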
Biber’s methodology
1. identify the grammatical features
   based on review of existing literature
2. tag all relevant features in the corpus texts
3. post-edit the texts to ensure accuracy
4. count frequency of each feature in each text
5. apply factor analysis to compute co-occurrence patterns among features
6. interpret the resulting dimensions functionally
7. compare different registers to see how much each dimension is represented in them
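Step 4 can be sketched as follows. Because texts differ in length, raw counts are normally converted to a rate per fixed number of words; this sketch assumes a rate per 1,000 words and a crude regular-expression tokeniser. The feature definitions are simplified word lists invented for illustration and stand in for the tagging and post-editing of steps 2 and 3.

```python
import re

# Hypothetical word-list stand-ins for tagged features.
FEATURES = {
    "first_second_pronouns": {"i", "me", "my", "we", "us", "our", "you", "your"},
    "hedges": {"probably", "possibly", "maybe", "perhaps"},
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def rates_per_1000(text):
    """Return each feature's frequency per 1,000 tokens of the text."""
    tokens = tokenize(text)
    n = len(tokens)
    rates = {}
    for name, words in FEATURES.items():
        count = sum(1 for t in tokens if t in words)
        rates[name] = 1000.0 * count / n if n else 0.0
    return rates

print(rates_per_1000("I think you are probably right"))
```

The resulting per-text rates are the input to the co-occurrence analysis in step 5.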
Types of features
Lexical features
  type-token ratio (the number of different word types divided by the number of tokens)
  word length
Lexical semantic features
  e.g. word classes like hedges (probably, possibly…); speech act verbs (declare), etc.
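The two lexical measures can be sketched in a few lines. This is a minimal illustration assuming a simple regular-expression tokeniser; the function names are not from the lecture. Since the type-token ratio falls as texts get longer, it is best compared across samples of equal length.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def type_token_ratio(tokens):
    """Number of distinct word types divided by the number of tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def mean_word_length(tokens):
    """Average number of characters per token."""
    return sum(len(t) for t in tokens) / len(tokens) if tokens else 0.0

tokens = tokenize("the cat saw the dog")
print(type_token_ratio(tokens), mean_word_length(tokens))  # prints: 0.8 3.0
```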
Types of features
Grammatical feature classes
  nouns, prepositional phrases, attributive and predicative adjectives, etc.
Syntactic features:
  relative clauses, that-complements, pied-piping constructions (Which car does he like?), conditional subordination (should you ever…)
The dimensions identified
1. Involved vs. informational production
2. Narrative vs. non-narrative production
3. Elaborated vs. situation-dependent reference
4. Overt expression of persuasion
5. Abstract vs. non-abstract style
NB. Many of these dimensions define “poles of opposition”
Dimension 1: involved vs. informational
Features (involved pole): 1st & 2nd person pronouns, questions, reductions, stance verbs, hedges, emphatics, adverbial subordination
  Typical of conversations, letters (high personal involvement)
Features (informational pole): nouns, adjectives, prepositional phrases, long words
  Typical of informational exposition, e.g. in official documents and academic writing
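One common way to place texts on such a dimension is to standardise each feature's frequency across texts and then add the scores of the features at one pole and subtract those at the other. The sketch below assumes this scheme with a handful of the Dimension 1 features, counting pronouns, questions and hedges as involved and nouns and prepositional phrases as informational. The text names and frequencies are invented, and the unweighted sum is a simplification.

```python
from statistics import mean, pstdev

INVOLVED = ["pers_pronouns", "questions", "hedges"]
INFORMATIONAL = ["nouns", "prep_phrases"]

# Invented frequencies per 1,000 words for four hypothetical texts.
texts = {
    "conversation":    {"pers_pronouns": 90, "questions": 12, "hedges": 6,
                        "nouns": 140, "prep_phrases": 80},
    "personal_letter": {"pers_pronouns": 70, "questions": 8, "hedges": 4,
                        "nouns": 160, "prep_phrases": 90},
    "press_report":    {"pers_pronouns": 25, "questions": 2, "hedges": 1,
                        "nouns": 230, "prep_phrases": 120},
    "academic_prose":  {"pers_pronouns": 10, "questions": 1, "hedges": 2,
                        "nouns": 260, "prep_phrases": 140},
}

def z_scores(feature):
    """Standardise one feature across all texts (mean 0, std. deviation 1)."""
    values = [row[feature] for row in texts.values()]
    mu, sigma = mean(values), pstdev(values)
    return {name: (row[feature] - mu) / sigma for name, row in texts.items()}

def dimension_scores():
    """Sum z-scores of involved features, subtract informational ones."""
    scores = {name: 0.0 for name in texts}
    for feature in INVOLVED:
        for name, z in z_scores(feature).items():
            scores[name] += z
    for feature in INFORMATIONAL:
        for name, z in z_scores(feature).items():
            scores[name] -= z
    return scores

scores = dimension_scores()
for name in sorted(scores, key=scores.get, reverse=True):
    print(f"{name:16s} {scores[name]:+.2f}")
```

With these toy figures the conversation sample lands at the involved end and the academic sample at the informational end.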
Dimension 2: Narrative vs. non-narrative
Features (narrative pole): past tense, perfect aspect, 3rd person pronouns, speech act verbs
  Typical of fiction
Features (non-narrative pole): present tense, attributive adjectives
  Typical of broadcasts, telephone conversations, professional letters
Dimension 3: elaborated vs. situation-dependent reference
Features (elaborated pole): wh-relative clauses, pied-piping, phrasal coordination
  Typical of “elaborated” text: official documents, professional letters, written exposition
  Typical of “situation-independent language”
Features (situation-dependent pole): time adverbials, place adverbials
  Typical of “situation-dependent language”, e.g. broadcasts, fiction, personal letters
Dimension 4: Overt expression of persuasion
Features: modals, conditional subordination
  Defines an “overt expression of persuasion” type, e.g. editorials, professional letters
Lack of any of the above
  Language which does not overtly seek to persuade
Dimension 5: Abstract vs. non-abstract style
Features: agentless passives, by-passives, …
  An “abstract style”: technical prose, academic prose, official documents
Lack of any of the above
  Language which is typically not abstract: conversation, public speeches, broadcasts…
Biber’s main argument
No one dimension is enough to characterise the properties of a particular register
  dimensions are coherent, correlated groupings of features
  every register could be defined in terms of the relative prominence of all 5 dimensions
Biber’s main argument
Biber finds no evidence of an absolute difference between spoken and written language
  e.g. conversations often display similar characteristics to other non-spoken genres
Better to:
  identify different types of speech (broadcast, scripted, spontaneous)
  view their similarities and differences to different types of writing
Summary
Biber’s MF/MD approach has proved highly influential in the study of register and genre
Crucially, relies on a priori definition of:
  features (“what to look for”)
  registers (“situationally-defined uses of language”)
References
Paolillo, J. C. (2000). Formalising formality. Journal of Linguistics, 36: 215-259.
Biber, D. (1993). Representativeness in corpus design. Literary and Linguistic Computing, 8 (4): 243-258.
Biber, D. (1995). On the role of computational, statistical and interpretive techniques in multi-dimensional analysis of register variation. Text, 15 (3): 314-370.