2 Modern Approaches to Corpus Linguistics
Dominique LONGRÉE, LASLA – Université de Liège et FUSL (Bruxelles)
• automatic taggers as heuristic tools
• multilevel approaches: the “motives”
• what do they have in common?
1. Automatic taggers as heuristic tools
A LASLA research project: testing various automatic recognition programs, known as taggers
Biber (1993), Illouz (1999), etc.: the quality of the output can vary significantly
- from one type of text to another
- from one tagger to another.
Questions:
- are the results better when a tagger trained on one author or on a given text is applied to another text by the same author, or within the same type of discourse?
- what can we deduce from those results regarding
  - the tagger, or
  - the homogeneity of corpora?
The test texts:
- book 3 of The Gallic Wars by Caesar (BGall3, 3673 tokens)
- The Conspiracy of Catilina by Sallust (SalCat, 10688 tokens)
- book 3 of The History of Alexander the Great by Quintus Curtius (QC3, 7261 tokens)
- The First Oration Against Catilina by Cicero (CicCat1, 3333 tokens)
- poem 66 of Catullus (Catu66, 586 tokens)
Varying the nature of the training and evaluation corpora, in order to identify and measure the factors of variation:
- style of the work
- style of the author
- diachrony
- literary genre
- type of discourse
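The cross-testing design described above (train on one text, evaluate on another) could be sketched as follows. This is only an illustrative sketch, not the software tested at LASLA: the tagger here is a deliberately simple most-frequent-tag baseline, and the function names (train_baseline, accuracy, cross_evaluate), the tag labels and the three-token toy texts are all assumptions invented for the example.

```python
from collections import Counter, defaultdict


def train_baseline(tagged_tokens):
    """Learn the most frequent tag of each word form (a unigram baseline).

    tagged_tokens is a list of (form, tag) pairs. This stands in for a real
    tagger; it is an assumption of the sketch, not the software tested.
    """
    counts = defaultdict(Counter)
    for form, tag in tagged_tokens:
        counts[form.lower()][tag] += 1
    lexicon = {form: c.most_common(1)[0][0] for form, c in counts.items()}
    # Unknown forms fall back to the most frequent tag of the training text.
    default_tag = Counter(tag for _, tag in tagged_tokens).most_common(1)[0][0]
    return lexicon, default_tag


def accuracy(model, tagged_tokens):
    """Share of tokens whose predicted tag matches the reference tag."""
    lexicon, default_tag = model
    if not tagged_tokens:
        return 0.0
    correct = sum(
        1 for form, tag in tagged_tokens
        if lexicon.get(form.lower(), default_tag) == tag
    )
    return correct / len(tagged_tokens)


def cross_evaluate(corpora):
    """Train on each text and evaluate on every other text.

    Returns a dict mapping (training_text, test_text) to an accuracy score.
    """
    models = {name: train_baseline(toks) for name, toks in corpora.items()}
    return {
        (train, test): accuracy(models[train], corpora[test])
        for train in corpora
        for test in corpora
        if train != test
    }


if __name__ == "__main__":
    # Invented toy data, for illustration only (not taken from the test texts).
    toy = {
        "text_A": [("Caesar", "NOUN"), ("legatos", "NOUN"), ("mittit", "VERB")],
        "text_B": [("Caesar", "NOUN"), ("uenit", "VERB"), ("mittit", "VERB")],
    }
    for pair, score in sorted(cross_evaluate(toy).items()):
        print(pair, round(score, 3))
```

Read as a matrix of training text by test text, such scores can then be compared along the factors listed above: a low score for a given pair would suggest a gap in author, genre or type of discourse, to be checked against the texts themselves.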
In theoretical terms: taggers appear to have some value as heuristic instruments
For instance, they highlight
- the homogeneity of the historical style, over and above diachronic development
- the gap between narration and discourse (speeches)
- the gap between the styles of Caesar and Cicero
- a smaller gap between Catullus and Cicero, or between Catullus and Quintus Curtius/Tacitus, than between Catullus and Caesar
- etc.
2. Multilevel approaches : the “motives”
Some indicators intuitively catalogued in Latin narrative prose
- sequences of verb tenses
- lexical elements: repente, subito ‘suddenly’, ‘abruptly’
- syntactical structures / ‘linking clichés’:
  Quibus rebus cognitis ‘Those things being known’
  Quod ubi animaduertit ‘When he had noticed that’
Limits
- no real analysis of them as indicators of text structure
- no study of their interaction
- little use of them for characterising text genre and style
The Discourse Modes and Bases Approach
- Kroon, 2007, 2009; Adema, 2007, 2008, 2009
- a priori definition of typical features for each discourse mode
- in order to evaluate text homogeneity
LASLA and BCL approach
- to develop endogenous exploratory methods
- to take text linearity into account
- to specify functional convergences between several indicators
Methods
- calling upon mathematical models (neighborhoods, bursts)
- combining
  - a small-scale qualitative approach
  - a large-scope quantitative analysis
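As a rough illustration of what looking for bursts along the linear order of a text might involve, the sketch below locates the occurrences of a set of indicator forms and reports the stretches where they cluster. It is not the mathematical model used by LASLA and BCL, whose details are not given here: a plain sliding-window count is swapped in, and the function names (indicator_positions, find_bursts) and the parameters window and min_hits are assumptions of the example.

```python
def indicator_positions(tokens, indicators):
    """Return the token indices at which one of the indicator forms occurs."""
    wanted = {form.lower() for form in indicators}
    return [i for i, tok in enumerate(tokens) if tok.lower() in wanted]


def find_bursts(positions, window, min_hits):
    """Find stretches of text where occurrences cluster.

    positions: sorted token indices of the occurrences.
    window:    span in tokens within which occurrences count as close.
    min_hits:  minimum number of occurrences inside one window.

    Returns a list of (first_index, last_index) spans; overlapping
    spans are merged into one.
    """
    bursts = []
    left = 0
    for right in range(len(positions)):
        # Shrink the window until it covers fewer than `window` tokens.
        while positions[right] - positions[left] >= window:
            left += 1
        if right - left + 1 >= min_hits:
            span = (positions[left], positions[right])
            if bursts and span[0] <= bursts[-1][1]:
                # Extend the previous burst instead of opening a new one.
                bursts[-1] = (bursts[-1][0], span[1])
            else:
                bursts.append(span)
    return bursts


if __name__ == "__main__":
    # Invented toy sequence, for illustration only.
    tokens = ["tum", "repente", "hostes", "subito", "repente", "fugiunt"]
    hits = indicator_positions(tokens, {"repente", "subito"})
    print(hits)
    print(find_bursts(hits, window=5, min_hits=3))
```

The same two functions can be run on several indicators in turn (tense sequences, lexical items, linking clichés), and the resulting spans compared to see where indicators converge on the same stretch of text.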
3. What do these approaches have in common?
They take texts and discourses into account in both their dimensions
- the multilevel nature of texts and of languages, from phonetics to pragmatics
- the fact that texts and discourses
  - are organized according to linearity
  - can be considered as topological entities.