
Two Modern Approaches to Corpus Linguistics


Dominique LONGRÉE, LASLA – Université de Liège and FUSL (Brussels)

• automatic taggers as heuristic tools

• multilevel approaches: the “motives”

• what do they have in common?


1. Automatic taggers as heuristic tools

A LASLA research project: testing various automatic recognition programs, known as taggers

Biber, 1993; Illouz, 1999, etc.: the quality of the output can vary significantly
- from one type of text to another
- from one tagger to another.

Questions:
- Are the results better when a tagger trained on one author, or on a given text, is applied to another text by the same author, or to another text within the same discourse?
- What can we deduce from those results regarding the tagger, or regarding the homogeneity of corpora?



The test texts:
- book 3 of The Gallic Wars by Caesar – BGall3 (3673 tokens)
- The Conspiracy of Catilina by Sallust – SalCat. (10688 tokens)
- book 3 of The History of Alexander the Great by Quintus Curtius – QC3 (7261 tokens)
- The First Oration Against Catilina by Cicero – CicCat1 (3333 tokens)
- poem 66 of Catullus – Catu66 (586 tokens)

Varying the nature of the training and evaluation corpora, in order to identify and measure the factors of variation:

- style of the work
- style of the author
- diachrony
- literary genre
- type of discourse
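The experimental design (train on one corpus, evaluate on another) can be sketched as a train/test matrix. This is a minimal sketch under stated assumptions: each corpus is taken to be a list of (word form, tag) pairs, and the tagger is a hypothetical "most frequent tag per form" baseline standing in for the real taggers tested by LASLA, which are not reproduced here. The function names `train_unigram`, `accuracy` and `cross_corpus_matrix` are illustrative only.

```python
from collections import Counter, defaultdict

# Assumption: each corpus is a list of (word form, tag) pairs.
# The unigram "most frequent tag" model below is a stand-in baseline,
# not one of the taggers evaluated in the LASLA project.

def train_unigram(tagged):
    """Map every form to its most frequent tag; unknown forms get the corpus's most frequent tag."""
    counts = defaultdict(Counter)
    for form, tag in tagged:
        counts[form][tag] += 1
    lexicon = {form: c.most_common(1)[0][0] for form, c in counts.items()}
    fallback = Counter(tag for _, tag in tagged).most_common(1)[0][0]
    return lexicon, fallback

def accuracy(model, tagged):
    """Share of tokens whose predicted tag equals the reference tag."""
    lexicon, fallback = model
    correct = sum(lexicon.get(form, fallback) == tag for form, tag in tagged)
    return correct / len(tagged)

def cross_corpus_matrix(corpora):
    """Train on each corpus, evaluate on every corpus: {(train, test): accuracy}.

    Note: diagonal cells are scored on the training data itself; a real
    experiment would hold out part of each corpus for evaluation.
    """
    models = {name: train_unigram(data) for name, data in corpora.items()}
    return {
        (train, test): accuracy(models[train], corpora[test])
        for train in corpora
        for test in corpora
    }
```

Comparing an off-diagonal cell with the corresponding in-domain score shows how much accuracy is lost when moving from one work, author, period, genre or type of discourse to another, which is the variation the experiment sets out to measure.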



In theoretical terms, taggers appear to have some value as heuristic instruments.

For instance, they highlight:
- the homogeneity of the historical style, over and above diachronic development
- the gap between narration and discourse (speeches)
- the gap between the styles of Caesar and Cicero
- a smaller gap between Catullus and Cicero, or between Catullus and Quintus Curtius/Tacitus, than between Catullus and Caesar
- etc.
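The slides speak of larger and smaller "gaps" between texts without giving a formula. One possible way to put a number on such a gap, offered here as an assumption and not as the measure used in the project, is to average the accuracy lost in both directions when a tagger trained on one corpus is applied to the other, relative to the in-domain score. The names `tagging_gap` and `rank_pairs` are hypothetical.

```python
from itertools import combinations

def tagging_gap(acc, a, b):
    """Hypothetical symmetric gap between corpora a and b.

    acc maps (train corpus, test corpus) -> accuracy. The gap is the mean
    accuracy lost, in both directions, compared with training on the
    target corpus itself.
    """
    loss_on_b = acc[(b, b)] - acc[(a, b)]
    loss_on_a = acc[(a, a)] - acc[(b, a)]
    return (loss_on_b + loss_on_a) / 2

def rank_pairs(acc):
    """Return all corpus pairs, from the smallest gap to the largest."""
    names = sorted({name for pair in acc for name in pair})
    return sorted(combinations(names, 2), key=lambda p: tagging_gap(acc, *p))
```

Ranking pairs this way makes statements such as "a smaller gap between Catullus and Cicero than between Catullus and Caesar" checkable, provided the accuracy matrix covers every pair.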


2. Multilevel approaches: the “motives”

Some indicators intuitively catalogued in Latin narrative prose:
- sequences of verb tenses
- lexical elements: repente, subito ‘suddenly’, ‘abruptly’
- syntactic structures / ‘linking clichés’: Quibus rebus cognitis ‘Those things being known’; Quod ubi animaduertit ‘When he had noticed that’
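Locating such indicators along a text is the first step toward studying them as markers of text structure. The sketch below assumes plain Latin text as input and matches surface word forms only, so an inflected variant such as "his rebus cognitis" is missed; a lemmatized corpus would allow more general patterns. The indicator inventory is built solely from the examples on this slide, and `tokenize` and `find_indicators` are hypothetical helpers.

```python
import re

# Hypothetical inventory built from the examples on this slide only.
LEXICAL_MARKERS = {"repente", "subito"}
LINKING_CLICHES = {
    "Quibus rebus cognitis": ("quibus", "rebus", "cognitis"),
    "Quod ubi animaduertit": ("quod", "ubi", "animaduertit"),
}

def tokenize(text):
    """Lower-case, normalise v -> u and j -> i (as in 'animaduertit'), keep alphabetic tokens."""
    text = text.lower().replace("v", "u").replace("j", "i")
    return re.findall(r"[a-z]+", text)

def find_indicators(tokens):
    """Return (token position, indicator) pairs in text order."""
    hits = []
    for i, token in enumerate(tokens):
        if token in LEXICAL_MARKERS:
            hits.append((i, token))
        # Check whether a multi-word linking cliché starts at position i.
        for label, pattern in LINKING_CLICHES.items():
            if tuple(tokens[i:i + len(pattern)]) == pattern:
                hits.append((i, label))
    return hits
```

Keeping token positions, rather than mere counts, preserves the linear order of the indicators, which is what any study of their interaction along the text requires.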

Limits:
- hardly any analysis of them as indicators of text structure
- no study of their interaction
- little use made of them for characterising text genre and style



The Discourse Modes and Bases approach (Kroon, 2007, 2009; Adema, 2007, 2008, 2009):
- a priori definition of typical features for each discourse mode
- in order to evaluate text homogeneity
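The logic of an a priori approach (define the typical features of each mode first, then check how consistently a text matches them) can be sketched as follows. Everything specific here is an assumption: the feature names and mode profiles are placeholders, not the inventories of Kroon or Adema, and the overlap scoring and "share of the dominant mode" homogeneity score are illustrative choices.

```python
from collections import Counter

# Hypothetical a priori profiles: mode -> set of features deemed typical.
# Placeholder names only, NOT the feature inventories of Kroon or Adema.
PROFILES = {
    "narrative": {"perfect_tense", "third_person", "temporal_connective"},
    "speech": {"present_tense", "first_person", "second_person"},
}

def assign_mode(segment_features, profiles=PROFILES):
    """Pick the mode whose profile shares the most features with the segment.

    Ties go to the mode listed first in `profiles`.
    """
    return max(profiles, key=lambda mode: len(profiles[mode] & segment_features))

def homogeneity(segments, profiles=PROFILES):
    """Return the dominant mode and the share of segments assigned to it."""
    modes = [assign_mode(features, profiles) for features in segments]
    top_mode, count = Counter(modes).most_common(1)[0]
    return top_mode, count / len(modes)
```

The contrast with the endogenous methods described next is that here the profiles are fixed before the text is examined.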

LASLA and BCL approach:
- to develop endogenous exploratory methods
- to take into account the linearity of texts
- to specify functional convergences between several indicators

Methods:
- calling upon mathematical models (neighborhoods, bursts)
- combining
  - a small-scale qualitative approach
  - a large-scale quantitative analysis
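The slide names neighborhoods and bursts without defining them, so the sketch below uses two simple working definitions, both assumptions rather than the LASLA/BCL models. Indicator occurrences are given as token positions. A neighborhood is a window of plus or minus k tokens, used to count how often indicator A has indicator B nearby (one crude measure of convergence between indicators). A burst is a run of at least `min_size` successive occurrences separated by gaps of at most `max_gap` tokens; this gap-threshold heuristic is not Kleinberg's burst-detection model. Function names and parameters are hypothetical.

```python
from bisect import bisect_left, bisect_right

def neighborhood_cooccurrences(pos_a, pos_b, k):
    """Count occurrences of indicator A with at least one occurrence of B within k tokens."""
    pos_b = sorted(pos_b)
    count = 0
    for p in pos_a:
        # Index range of B positions falling inside [p - k, p + k].
        lo = bisect_left(pos_b, p - k)
        hi = bisect_right(pos_b, p + k)
        if hi > lo:
            count += 1
    return count

def bursts(positions, max_gap, min_size=3):
    """Return (start, end) positions of runs whose successive gaps are <= max_gap.

    Simple gap-threshold heuristic; runs shorter than min_size are ignored.
    """
    positions = sorted(positions)
    result, current = [], positions[:1]
    for prev, nxt in zip(positions, positions[1:]):
        if nxt - prev <= max_gap:
            current.append(nxt)
        else:
            if len(current) >= min_size:
                result.append((current[0], current[-1]))
            current = [nxt]
    if len(current) >= min_size:
        result.append((current[0], current[-1]))
    return result
```

Both functions work on positions, not counts, so they respect the linearity of the text; the choice of k, `max_gap` and `min_size` is left to the analyst and would need to be calibrated on the corpus.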


3. What do these approaches have in common?

They take texts and discourses into account in both of their dimensions:
- the multilevel nature of texts and of languages, from phonetics to pragmatics
- the fact that texts and discourses
  - are organized according to linearity
  - can be considered as topological entities.