Vocabulary use during conversation: a cross-sectional study of development amongst learners of Spanish and French

WORK IN PROGRESS – PLEASE DO NOT CITE WITHOUT PERMISSION FROM THE AUTHORS

Emma Marsden, University of York

Annabelle David, University of Newcastle

Aims

• Document / describe progression – useful for teaching practice and assessment

Also, indirectly & in long term:
– Begin analysis of use of formulaic language
– Does it interact with learning a generative grammar?
– Begin to explore relationship between learner’s vocabulary and their morphosyntactic development
– L1 and early bilingual literature suggests causal link

Outline

• The task, data and participants
• Results
1. General diversity of types & counts of tokens
2. Use of different word classes
• Nouns versus verbs
3. Diversity of inflections
4. Formulaic language
• Conclusions

Can we measure lexical knowledge from oral corpus data?

• Size of vocabulary - no
– need to give them tests
• Richness, sophistication, rarity – yes, but not yet
– needs a comparison with word lists.
– From relevant corpus i.e. oral, and L2 classroom learners – not available!
• Diversity, or lexical variation - YES!
– when they produce language, how often do they have to repeat the same words?
– what is the balance of nouns, verbs, adjectives?

The task

• Photos task: semi-guided interview / conversation
– Descriptions of photos
– Questions about photos
– Discussion around photos, relating to past, current and future activities.

The Participants

• English speaking learners of French and Spanish

• From years 9 and 13 (approximately 230 and 600 hours classroom instruction respectively)

• twenty learners in each group in each language.
– twenty final year undergraduates in Spanish
– native controls
• 15 age-matched Spanish natives,
• and five adult French natives.
– approximately 120 participants in total

The Data: Which bits of speech are ‘words’?

• Data excluded from the analysis
– Filled pauses (er…)
– Repeated language (with or without corrections)
– Imitations of researcher
– Words in another language (e.g. French & English)
• Data included
– Including made-up words or incorrect words e.g. mi hermano nadar (for nada)
– Final repair
– Some lemma (stem) counts: va, vamos = 1
– Some whole word counts: va, vamos = 2
– Some counts of just the inflections

Types & tokens

• Ojos…ojos
– 1 lexical type
– 2 lexical tokens
• Mira….miran…miras…
– 1 lexical type
– 3 lexical tokens
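The difference between lemma-based and whole-word counts can be sketched in a few lines of Python. This is only an illustration, not the CLAN procedure used in the study: the `LEMMAS` lookup is a hypothetical hand-made mapping that covers just the examples on these slides.

```python
from collections import Counter

# Hypothetical lemma lookup covering only the examples on the slides.
LEMMAS = {
    "ojos": "ojo",
    "mira": "mirar", "miran": "mirar", "miras": "mirar",
    "va": "ir", "vamos": "ir",
}

def count_types_tokens(words, use_lemmas=True):
    """Return (number of types, number of tokens) for a list of word forms."""
    # With use_lemmas=True, inflected forms collapse onto their lemma.
    units = [LEMMAS.get(w, w) if use_lemmas else w for w in words]
    counts = Counter(units)
    return len(counts), sum(counts.values())

print(count_types_tokens(["ojos", "ojos"]))                   # (1, 2)
print(count_types_tokens(["mira", "miran", "miras"]))         # (1, 3)
print(count_types_tokens(["va", "vamos"], use_lemmas=True))   # (1, 2)
print(count_types_tokens(["va", "vamos"], use_lemmas=False))  # (2, 2)
```

The last two calls mirror the "va, vamos = 1" (lemma) and "va, vamos = 2" (whole word) counts on the previous slide.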

Types and tokens based on LEMMAS

Group (n)      Tokens (st.dev)          Different types (st.dev)   TTR
               Sp          Fr           Sp          Fr             Sp             Fr
year 9 (20)    194 (117)   230 (115)    64 (28)     65 (24)        .387* (.122)   .311 (.092)
year 13 (20)   523 (134)   529 (191)    155 (32)    142 (38)       .300* (.028)   .279 (.038)

• The more diverse the speech, the higher the TTR.
• According to TTR, year 9 have a more diverse vocabulary than year 13 (Spanish), and there is no difference in French!
• TTR is problematic, not a valid measure (but it is standard output in CLAN FREQ commands)
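One likely reason for this counter-intuitive result is that TTR falls as a sample gets longer, and the year 13 learners produced far more tokens (over 500 on average) than the year 9 learners (around 200). The sketch below illustrates the effect with a made-up speaker. The word list is hypothetical and is not data from the study.

```python
def ttr(tokens):
    """Type-token ratio: number of distinct tokens divided by total tokens."""
    return len(set(tokens)) / len(tokens)

# Hypothetical speaker who keeps cycling through the same ten lemmas.
vocabulary = ["ser", "tener", "ir", "casa", "chico",
              "foto", "jugar", "amigo", "año", "hermano"]
short_sample = vocabulary * 2    # 20 tokens, 10 types
long_sample = vocabulary * 10    # 100 tokens, the same 10 types

print(ttr(short_sample))  # 0.5
print(ttr(long_sample))   # 0.1
```

The vocabulary is identical in both samples, yet the longer one scores much lower, so raw TTR cannot be compared across groups that talk for different lengths of time.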

Compensating for influence of text length

• Guiraud index
Types/√tokens.
• D
Uses random sampling of tokens in plotting curve of TTR against increasing token size.
Calculated by vocd in CLAN software
– Usually correlates well with Guiraud
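Both measures can be sketched in Python. The Guiraud function follows the formula on the slide directly. The D function is only a rough approximation of the idea behind vocd, not the CLAN implementation: it assumes the curve model TTR = (D/N)(√(1 + 2N/D) − 1) described by Malvern et al., and the sample sizes (35 to 50 tokens), the 100 trials per size and the simple grid search are assumptions of this sketch.

```python
import math
import random

def guiraud(tokens):
    """Guiraud index: types divided by the square root of tokens."""
    return len(set(tokens)) / math.sqrt(len(tokens))

def model_ttr(n, d):
    """TTR predicted for a sample of n tokens by the D curve model."""
    return (d / n) * (math.sqrt(1 + 2 * n / d) - 1)

def estimate_d(tokens, sizes=range(35, 51), trials=100, seed=0):
    """Fit D to the mean TTR of random token samples (needs >= 50 tokens)."""
    rng = random.Random(seed)
    # Empirical curve: mean TTR of `trials` random samples at each size.
    curve = {}
    for n in sizes:
        ttrs = [len(set(rng.sample(tokens, n))) / n for _ in range(trials)]
        curve[n] = sum(ttrs) / trials

    def error(d):
        # Sum of squared differences between model and empirical curve.
        return sum((model_ttr(n, d) - t) ** 2 for n, t in curve.items())

    # Grid search over candidate D values from 0.1 to 300.0.
    candidates = [k / 10 for k in range(1, 3001)]
    return min(candidates, key=error)

# Hypothetical token lists of equal length but different diversity.
repetitive = [f"w{i % 5}" for i in range(100)]    # 5 types
varied = [f"w{i % 50}" for i in range(100)]       # 50 types
print(guiraud(repetitive), guiraud(varied))          # 0.5 5.0
print(estimate_d(repetitive) < estimate_d(varied))   # True
```

Because every sample is drawn at the same fixed sizes, the fitted D does not shrink simply because a learner produced more tokens, which is the property that raw TTR lacks.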

D

Group (n)             D based on words (st. dev)       D based on lemmas (st. dev)
                      Sp              Fr               Sp             Fr
year 9 (17)           35.40 (8.55)    29.40 (11.41)    24.31 (5.81)   20.98 (6.82)
year 13 (20)          56.62 (12.99)   54.57 (13.77)    38.63 (7.92)   35.60 (57.34)
undergraduates (20)
Natives

Results 2: Use of word class types

1. Basic descriptions of use: nouns, verbs, adjectives, interrogative pronouns, adverbs

2. How ‘nouny’ are their productions?
• What proportion of word types belong to a certain class?
• What is the density of different word classes in total productions?

3. Is the diversity of nouns different to the diversity of verbs?

4. Do these give any indication of progression?

Basic description: Adjectives and adverbs

*lemmas, not colours **lemmas, not y/n

Columns (each Sp, Fr): Types of adjectives* | Tokens of adjectives* | Types of adverbs** | Tokens of adverbs**

Year 9 (20):    1.5 (1.4)    1.6 (1.7)    2.3 (2.6)    2.0 (2.3)    4.1 (2.4)    2.6 (2.8)
Year 13 (20):   10.6 (4.4)   8.4 (4.8)    14.6 (6.6)   11.7 (6.7)   12.1 (2.7)   23.6 (9.7)

Basic description: Creo que, el hombre que…, and interrogative pronouns

Columns (each Sp, Fr): Tokens of que as conjunction + relative | Types of interrogative pronouns (lemmas) | Tokens of interrogative pronouns (lemmas)

Year 9 (20):    0.7 (1.3)    1.2 (0.9)    2.6 (2.5)
Year 13 (20):   6.8 (6.6)    1.8 (0.8)    5.9 (3.2)

A nouny style

Year 9 (230 hours instruction)

*P02: two chi eh dos chicos un camisa Southampton.

*MJA: now I would like you to ask me questions about the pictures so...

*P02: hermanos ?

*MJA: eh estos son hermanos sí mmm.

How much speech is nouns & verbs?

*all are based on lemmas

               Noun Types /         Noun Tokens /        Verb Types /         Verb Tokens /
               Total Types*         Total Tokens         Total Types          Total Tokens
               Sp        Fr         Sp        Fr         Sp        Fr         Sp        Fr
Year 9 (20)    34% (11)  28% (5)    28% (11)  17% (4)    12% (4)   12% (4)    15% (5)   16% (5)
Year 13 (20)   28% (4)   25% (4)    18% (2)   12% (2)    15% (2)   15% (2)    18% (3)   19% (3)

Proportion of types out of all types (see e.g. Kauschke and Hofmeister, 2002)
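The two kinds of proportion in this table (types of a class out of all types, tokens of a class out of all tokens) can be sketched as follows. The tagged sample and the tag labels (`n`, `v`, `det`, `num`) are hypothetical; the study's own counts came from CLAN output.

```python
def class_proportions(tagged, word_class):
    """Return (type share, token share) of one word class.

    `tagged` is a list of (lemma, word_class) pairs, one per token.
    """
    all_types = set(tagged)
    class_types = {pair for pair in all_types if pair[1] == word_class}
    class_tokens = [pair for pair in tagged if pair[1] == word_class]
    return len(class_types) / len(all_types), len(class_tokens) / len(tagged)

# Hypothetical lemmatised and tagged learner output.
sample = [("dos", "num"), ("chico", "n"), ("un", "det"),
          ("camisa", "n"), ("chico", "n"), ("jugar", "v")]

noun_type_share, noun_token_share = class_proportions(sample, "n")
print(noun_type_share, noun_token_share)  # 0.4 0.5
```

Here two of the five types and three of the six tokens are nouns, so a speaker who repeats nouns has a higher noun token share than noun type share.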

How much speech is adjectives?

Group (n)            Adj tokens out of all tokens   Adj types out of all types
                     Sp       Fr                    Sp       Fr
year 9 (20)          1.0%     0.7%                  1.9%     2.0%
year 13 (20)         2.7%     2.1%                  6.7%     5.6%
undergraduate (20)
Natives

Comparing diversity of noun types to diversity of verb types

• Malvern et al (2004) propose the ‘Limiting Relative Diversity’ calculation to compare the diversity of different word classes when token samples are different

• Implemented by CLAN vocd software
– Square root of division of diversity of one word class by the diversity of the other
– Needs at least 50 tokens of noun, 50 of verb
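As a minimal sketch of the calculation described above, assuming the "diversity" of each word class is its D value, LRD is the square root of one D divided by the other. The function name and the example values are hypothetical.

```python
import math

def limiting_relative_diversity(d_first, d_second):
    """Square root of the ratio of two word-class D values."""
    if d_first <= 0 or d_second <= 0:
        raise ValueError("D values must be positive")
    return math.sqrt(d_first / d_second)

# Hypothetical D values for one learner's verbs and nouns.
print(limiting_relative_diversity(9.0, 36.0))  # 0.5
```

A value below 1 for verbs / nouns means the learner's verbs are less diverse than their nouns.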

Limiting relative diversity

Group (n)                          LRD (verbs / nouns)
                                   Sp            Fr
year 9 (n Sp = 5) (n Fr = 4)       .366 (.061)   .353 (.095)
year 13 (n Sp = 18) (n Fr = 13)    .425 (.089)   .341 (.073)

• NO statistically significant differences between year 9 and year 13 in the diversity of verbs relative to the diversity of nouns (LRD).
– Unreliable (small sample sizes), or
– new nouns and verbs learnt at the same rate?
• BUT LRD correlates well with:
– proportion of verb types / total types (r = .786**)
– verb tokens / total tokens (r = .862**)
– verb noun ratio (r = .862**)
– And these all DO increase between yr 9 & 13
• Need undergraduate & native data to validate LRD
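Assuming the r values above are Pearson correlation coefficients (the slides do not say which coefficient was used), the calculation can be sketched as below. The per-learner numbers are invented for illustration and are not data from the study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-learner values: LRD and verb types / total types.
lrd = [0.30, 0.35, 0.40, 0.45, 0.50]
verb_type_share = [0.10, 0.13, 0.14, 0.17, 0.18]
print(round(pearson_r(lrd, verb_type_share), 3))
```

With only four or five learners per group, as in the year 9 LRD sample, a coefficient like this is very sensitive to individual speakers.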

Results 3: Inflectional diversity

• Inflectional diversity

total number of words - total stem forms

= number of inflectional variations on stem forms

i.e. how well are the learners manipulating stem forms

• See Malvern et al. (2004).
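In these slides the measure is reported as D based on words minus D based on lemmas. As a quick worked check, the group means from the D table can be subtracted directly. This is a difference of group means, so it only approximates the mean of per-learner differences, and the function name is hypothetical.

```python
def inflectional_diversity(d_words, d_lemmas):
    """D based on whole words minus D based on lemmas."""
    return d_words - d_lemmas

# Group means copied from the D table: (D words, D lemmas).
d_means = {
    ("year 9", "Sp"): (35.40, 24.31),
    ("year 9", "Fr"): (29.40, 20.98),
    ("year 13", "Sp"): (56.62, 38.63),
    ("year 13", "Fr"): (54.57, 35.60),
}

for (group, language), (d_words, d_lemmas) in d_means.items():
    print(group, language, round(inflectional_diversity(d_words, d_lemmas), 2))
# year 9 Sp 11.09
# year 9 Fr 8.42
# year 13 Sp 17.99
# year 13 Fr 18.97
```

These differences agree, to within rounding, with the values reported in the Inflectional Diversity table (11.09, 8.4, 17.99 and 18.98).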

Inflectional Diversity

Group (n)                          Inflectional diversity (D words - D lemmas) (st. dev)
                                   Sp              Fr
year 9 (Sp n = 17) (Fr n = 20)     11.09 (4.38)    8.4 (5.3)
year 13 (20)                       17.99 (5.59)    18.98 (7.49)

Does verb use correlate with inflectional diversity?

• Broeder, Extra, van Hout (1993) found verb use indicates progression
– See also NSF data, and argument by Myles (2004)

• Correlating lexical and inflectional diversity with verb/noun proportions…

Indicators of development?

• As learners use more verb types, they use more inflections (strong positive correlations)

• Inflectional diversity does not seem to correlate with use of other word classes

• Nouns (tokens & types) decrease, verbs increase (strong negative correlations)

Results 4: Formulaic language – lexical items?

• Criteria for a ‘chunk’ (Myles et al, 1998)

– Greater length and complexity of sequence compared with other learner output; usually well-formed

– Often used inappropriately (syntactically, semantically, pragmatically), e.g. overextensions

Formulaic language (chunks)

Asking about people in photos:
*P02: eh dónde vives ?
*MJA: mmm ellos ? …ellos viven en Southampton .
*P02: mmm cuántos años tienes ?
*MJA: eh ellos.
*P02: tú ?
*MJA: tienen doce y trece .

Chunks even when some verbs appear to be manipulated

*P03: come... lleva... están.... hacen ....están jugando...son...jugan...jugo ...tengo...voy...voy a ir

BUT THEN... eh cuánto años tienes ? (for how old is he?)

BUT later: mi hermano tiene once años y mi hermana que se llama Ellie y tiene ocho años

*MJA: qué haces un sábado normal en tu en tu vida ?
*P02: jugar al fútbol en mañana y salgo con mis amigos en tarde

• CONTEXT-DEPENDENT ACCURACY: CHUNKS, or item by item learning?

Conclusions

• The tasks in SPLLOC and FLLOC seemed to elicit broadly similar language

• Greater verb density seems to indicate progression

• The additional classroom instruction between year 9 and year 13 (approximately 230 vs. 600 hours) does make a significant difference
– both for vocabulary diversity and inflectional diversity
– previous comparisons between smaller gaps suggest no gains

• Formulaic language
– Evidence for item by item learning (constructionist)?

Limitations

• Only one measure of lexical knowledge – productive, oral

• This quantitative approach doesn’t tell us about accuracy of lexical or inflectional use (e.g. gastar (spend) time)

• We can report a positive correlation between inflectional and lexical diversity – but this product data does not tell us whether an increase in vocabulary enables processing of morphosyntax

Future directions

• Comparisons with undergrads and native controls

• A richness measure
– will be based on rarity WITHIN our own corpus

• Analysis of closed class items, using CLAN’s list

• Further exploration of relationship of increased lexical knowledge, increased verb types and emerging morphosyntax
