Vocabulary use during conversation: a cross-sectional study of development amongst learners of Spanish and French
WORK IN PROGRESS – PLEASE DO NOT CITE WITHOUT PERMISSION FROM THE AUTHORS
Emma Marsden, University of York
Annabelle David, University of Newcastle
Aims
• Document / describe progression – useful for teaching practice and assessment
Also, indirectly & in long term:
– Begin analysis of use of formulaic language
– Does it interact with learning a generative grammar?
– Begin to explore relationship between learner’s vocabulary and their morphosyntactic development
– L1 and early bilingual literature suggests causal link
Outline
• The task, data and participants
• Results
1. General diversity of types & counts of tokens
2. Use of different word classes
• Nouns versus verbs
3. Diversity of inflections
4. Formulaic language
• Conclusions
Can we measure lexical knowledge from oral corpus data?
• Size of vocabulary - no
– need to give them tests
• Richness, sophistication, rarity – yes, but not yet
– needs a comparison with word lists
– from a relevant corpus, i.e. oral, and L2 classroom learners – not available!
• Diversity, or lexical variation - YES!
– when they produce language, how often do they have to repeat the same words?
– what is the balance of nouns, verbs, adjectives?
The task
• Photos task: semi-guided interview / conversation
– Descriptions of photos
– Questions about photos
– Discussion around photos, relating to past, current and future activities
The Participants
• English speaking learners of French and Spanish
• From years 9 and 13 (approximately 230 and 600 hours classroom instruction respectively)
• twenty learners in each group in each language
– twenty final year undergraduates in Spanish
– native controls
• 15 age-matched Spanish natives
• and five adult French natives
– approximately 120 participants in total
The Data: Which bits of speech are ‘words’?
• Data excluded from the analysis
– Filled pauses (er…)
– Repeated language (with or without corrections)
– Imitations of researcher
– Words in another language (e.g. French & English)
• Data included
– Made-up words or incorrect words, e.g. mi hermano nadar (for nada)
– Final repair
– Some lemma (stem) counts: va, vamos = 1
– Some whole word counts: va, vamos = 2
– Some counts of just the inflections
Types & tokens
• Ojos…ojos
– 1 lexical type
– 2 lexical tokens
• Mira…miran…miras…
– 1 lexical type
– 3 lexical tokens
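The counting rule on this slide can be sketched in a few lines of Python. This is only an illustration: the `LEMMAS` lookup is a hypothetical toy dictionary, not the coding scheme actually used for the corpus, and `count_types_tokens` is a name invented here.

```python
from collections import Counter

# Hypothetical toy lemma lookup (an assumption for illustration only);
# a real analysis would rely on the corpus's own morphological coding.
LEMMAS = {
    "ojos": "ojo",
    "mira": "mirar", "miran": "mirar", "miras": "mirar",
    "va": "ir", "vamos": "ir",
}

def count_types_tokens(words, by_lemma=True):
    """Return (types, tokens) for a list of word forms.

    by_lemma=True  counts inflected forms of one stem as a single type.
    by_lemma=False counts every distinct word form as its own type.
    """
    keys = [LEMMAS.get(w, w) if by_lemma else w for w in words]
    counts = Counter(keys)
    return len(counts), sum(counts.values())

lemma_count = count_types_tokens(["mira", "miran", "miras"])                 # (1, 3)
word_count = count_types_tokens(["mira", "miran", "miras"], by_lemma=False)  # (3, 3)
```

The same utterance therefore gives 1 type under a lemma count and 3 types under a whole-word count, which is why the later slides report D separately for words and for lemmas.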
Types and tokens based on LEMMAS
Group (n)      tokens (st.dev)          different types (st.dev)   TTR (st.dev)
               Sp          Fr           Sp          Fr             Sp             Fr
year 9 (20)    194 (117)   230 (115)    64 (28)     65 (24)        .387* (.122)   .311 (.092)
year 13 (20)   523 (134)   529 (191)    155 (32)    142 (38)       .300* (.028)   .279 (.038)
• The more diverse the speech, the higher the TTR.
• According to TTR, year 9 have a more diverse vocabulary than year 13 (Spanish) and no difference in French!
• TTR is problematic, not a valid measure: it falls as the sample gets longer (but it is standard output in CLAN FREQ commands)
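A minimal sketch of why TTR penalises longer samples, assuming an artificial speaker who cycles through a fixed 50-word vocabulary (the data are invented purely for illustration and `ttr` is a name chosen here):

```python
def ttr(tokens):
    """Type-token ratio: number of distinct tokens divided by total tokens."""
    if not tokens:
        raise ValueError("TTR is undefined for an empty sample")
    return len(set(tokens)) / len(tokens)

# Artificial vocabulary of 50 word types (illustrative only).
vocab = [f"w{i}" for i in range(50)]

# The same vocabulary produces a short and a long sample.
short_sample = [vocab[i % 50] for i in range(100)]
long_sample = [vocab[i % 50] for i in range(500)]

short_ttr = ttr(short_sample)  # 50 types / 100 tokens = 0.5
long_ttr = ttr(long_sample)    # 50 types / 500 tokens = 0.1
```

The vocabulary is identical in both samples, yet the longer one scores far lower. Learners who simply say more will look less diverse on TTR.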
Compensating for influence of text length
• Guiraud index
– Types/√tokens
• D
– Uses random sampling of tokens in plotting curve of TTR against increasing token size
– Calculated by vocd in CLAN software
– Usually correlates well with Guiraud
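Both corrections can be sketched in Python. This is an approximation under stated assumptions, not CLAN's code. `guiraud` follows the formula on the slide. `estimate_d` imitates the general vocd approach as commonly described: mean TTR of random samples of 35 to 50 tokens, fitted to the curve TTR = (D/N)(√(1 + 2N/D) − 1). The coarse grid search, the 100 trials, the seed and the D range of 1 to 200 are simplifications chosen here.

```python
import math
import random

def guiraud(n_types, n_tokens):
    """Guiraud index: types divided by the square root of tokens."""
    return n_types / math.sqrt(n_tokens)

def model_ttr(d, n):
    """TTR predicted for a sample of n tokens at diversity parameter d."""
    return (d / n) * (math.sqrt(1 + 2 * n / d) - 1)

def estimate_d(tokens, sizes=range(35, 51), trials=100, seed=0):
    """Rough D estimate: fit model_ttr to the mean TTR of random subsamples."""
    if len(tokens) < max(sizes):
        raise ValueError("need at least as many tokens as the largest sample size")
    rng = random.Random(seed)
    # Empirical mean TTR at each sample size (sampling without replacement).
    observed = {}
    for n in sizes:
        ttrs = [len(set(rng.sample(tokens, n))) / n for _ in range(trials)]
        observed[n] = sum(ttrs) / trials

    def squared_error(d):
        return sum((model_ttr(d, n) - t) ** 2 for n, t in observed.items())

    # Assumption: a simple grid search over D = 1.0 .. 200.0 in steps of 0.1.
    candidates = [x / 10 for x in range(10, 2001)]
    return min(candidates, key=squared_error)

# Illustration with the Spanish group means from the lemma table
# (group means only, so this is not how the reported figures were derived).
g_year9 = guiraud(64, 194)
g_year13 = guiraud(155, 523)
```

With these group means the raw ratio of types to tokens is lower for year 13 than for year 9, while the Guiraud index is higher for year 13, which is the reordering the slide is after.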
D
Group (n)             D based on words (st. dev)      D based on lemmas (st. dev)
                      Sp              Fr              Sp             Fr
year 9 (17)           35.40 (8.55)    29.40 (11.41)   24.31 (5.81)   20.98 (6.82)
year 13 (20)          56.62 (12.99)   54.57 (13.77)   38.63 (7.92)   35.60 (57.34)
undergraduates (20)
Natives
Results 2: Use of word class types
1. Basic descriptions of use: nouns, verbs, adjectives, interrogative pronouns, adverbs
2. How ‘nouny’ are their productions?
• What proportion of word types belong to a certain class?
• What is the density of different word classes in total productions?
3. Is the diversity of nouns different to the diversity of verbs?
4. Do these give any indication of progression?
Basic description: Adjectives and adverbs
*lemmas, not colours **lemmas, not y/n
Types of adjectives* | Tokens of adjectives* | Types of adverbs** | Tokens of adverbs**
Sp Fr | Sp Fr | Sp Fr | Sp Fr
Year 9 (20): 1.5 (1.4), 1.6 (1.7), 2.3 (2.6), 2.0 (2.3), 4.1 (2.4), 2.6 (2.8)
Year 13 (20): 10.6 (4.4), 8.4 (4.8), 14.6 (6.6), 11.7 (6.7), 12.1 (2.7), 23.6 (9.7)
Basic description: Creo que, el hombre que…, and interrogative pronouns
Tokens of que as conjunction + relative | Types of interrogative pronouns (lemmas) | Tokens of interrogative pronouns (lemmas)
Sp Fr | Sp Fr | Sp Fr
Year 9 (20): 0.7 (1.3), 1.2 (0.9), 2.6 (2.5)
Year 13 (20): 6.8 (6.6), 1.8 (0.8), 5.9 (3.2)
A nouny style
Year 9 (230 hours instruction)
*P02: two chi eh dos chicos un camisa Southampton.
*MJA: now I would like you to ask me questions about the pictures so...
*P02: hermanos ?
*MJA: eh estos son hermanos sí mmm.
How much speech is nouns & verbs?
*all are based on lemmas
Group (n)      Noun Types / Total Types*   Noun Tokens / Total Tokens   Verb Types / Total Types   Verb Tokens / Total Tokens
               Sp          Fr              Sp          Fr               Sp         Fr              Sp         Fr
Year 9 (20)    34% (11)    28% (5)         28% (11)    17% (4)          12% (4)    12% (4)         15% (5)    16% (5)
Year 13 (20)   28% (4)     25% (4)         18% (2)     12% (2)          15% (2)    15% (2)         18% (3)    19% (3)
Proportion of types out of all types (see e.g. Kauschke and Hofmeister, 2002)
How much speech is adjectives?
Group (n)            Adj tokens out of all tokens   Adj types out of all types
                     Sp       Fr                    Sp       Fr
year 9 (20)          1.0%     0.7%                  1.9%     2.0%
year 13 (20)         2.7%     2.1%                  6.7%     5.6%
undergraduate (20)
Natives
Comparing diversity of noun types to diversity of verb types
• Malvern et al (2004) propose the ‘Limiting Relative Diversity’ calculation to compare the diversity of different word classes when token samples are different
• Implemented by CLAN vocd software
– Square root of division of diversity of one word class by the diversity of the other
– Needs at least 50 tokens of noun, 50 of verb
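As a sketch of the calculation described above, assuming D has already been estimated separately for verbs and for nouns: the function name and the example values are hypothetical, and the verbs-over-nouns direction follows the column header of the LRD table.

```python
import math

def limiting_relative_diversity(d_verbs, d_nouns,
                                n_verb_tokens, n_noun_tokens,
                                min_tokens=50):
    """LRD (verbs / nouns): square root of D for verbs divided by D for nouns.

    Returns None when either word class has fewer than min_tokens tokens,
    mirroring the 50-token requirement stated on the slide.
    """
    if n_verb_tokens < min_tokens or n_noun_tokens < min_tokens:
        return None
    return math.sqrt(d_verbs / d_nouns)

# Hypothetical learner: D = 10 for verbs, D = 40 for nouns.
example = limiting_relative_diversity(10.0, 40.0, 60, 80)      # 0.5
too_little = limiting_relative_diversity(10.0, 40.0, 30, 80)   # None
```

The token threshold explains the small group sizes in the LRD results: learners who produce fewer than 50 verb tokens drop out of the calculation.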
Limiting relative diversity
Group (n)                         LRD (verbs / nouns)
                                  Sp            Fr
year 9 (n Sp = 5) (n Fr = 4)      .366 (.061)   .353 (.095)
year 13 (n Sp = 18) (n Fr = 13)   .425 (.089)   .341 (.073)
• NO statistically significant difference in LRD (diversity of verbs relative to diversity of nouns) between year 9 and year 13.
• Unreliable (small sample sizes), or
• new nouns and verbs learnt at same rate??
• BUT LRD correlates well with:
– proportion of verb types / total types (r=.786**)
– verb tokens / total tokens (r=.862**)
– verb noun ratio (r=.862**)
– And these all DO increase between yr 9 & 13
• Need UG & natives to validate LRD
Results 3: Inflectional diversity
• Inflectional diversity
– total number of words − total stem forms = number of inflectional variations on stem forms
– i.e. how well are the learners manipulating stem forms
• See Malvern et al. (2004).
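A worked sketch of the subtraction, using the group means from the D slide. It assumes the operational definition in the results table header (D based on words minus D based on lemmas), and the function name is invented here.

```python
def inflectional_diversity(d_words, d_lemmas):
    """Inflectional diversity: D based on word forms minus D based on lemmas."""
    return d_words - d_lemmas

# Spanish group means taken from the D table.
year9_sp = round(inflectional_diversity(35.40, 24.31), 2)   # 11.09
year13_sp = round(inflectional_diversity(56.62, 38.63), 2)  # 17.99
```

A learner who only ever uses one form per stem has the same D for words and for lemmas, so the difference is zero. The gap widens as more inflected variants of each stem appear.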
Inflectional Diversity
Group (n)                       Inflectional diversity (D words − D lemmas) (st. dev)
                                Sp             Fr
year 9 (Sp n = 17) (Fr n = 20)  11.09 (4.38)   8.4 (5.3)
year 13 (20)                    17.99 (5.59)   18.98 (7.49)
Does verb use correlate with inflectional diversity?
• Broeder, Extra, van Hout (1993) found verb use indicates progression
– See also NSF data, and argument by Myles (2004)
• Correlating lexical and inflectional diversity with verb/noun proportions…
Indicators of development?
• As learners use more verb types, they use more inflections (strong positive correlations)
• Inflectional diversity does not seem to correlate with use of other word classes
• Nouns (tokens & types) decrease, verbs increase (strong negative correlations)
Results 4: Formulaic language – lexical items?
• Criteria for a ‘chunk’ (Myles et al, 1998)
– Greater length and complexity of sequence compared with other learner output; usually well-formed
– Often used inappropriately (syntactically, semantically, pragmatically), e.g. overextensions
Formulaic language (chunks)
asking about people in photos:
*P02: eh dónde vives ?
*MJA: mmm ellos ? …ellos viven en Southampton .
*P02: mmm cuántos años tienes ?
*MJA: eh ellos.
*P02: tú ?
*MJA: tienen doce y trece .
Chunks even when some verbs appear to be manipulated
*P03: come... lleva... están.... hacen ....están jugando...son...jugan...jugo ...tengo...voy...voy a ir
BUT THEN... eh cuánto años tienes ? (for how old is he?)
BUT later: mi hermano tiene once años y mi hermana que se llama Ellie y tiene ocho años
*MJA: qué haces un sábado normal en tu en tu vida ?
*P02: jugar al fútbol en mañana y salgo con mis amigos en tarde
• CONTEXT-DEPENDENT ACCURACY: CHUNKS, or item by item learning?
Conclusions
• The tasks in SPLLOC and FLLOC seemed to elicit broadly similar language
• Greater verb density seems to indicate progression
• 450 more hours instruction does make significant difference
– both for vocabulary diversity and inflectional diversity
– previous comparisons between smaller gaps suggest no gains
• Formulaic language
– Evidence for item by item learning (constructionist)?
Limitations
• Only one measure of lexical knowledge – productive, oral
• This quantitative approach doesn’t tell us about accuracy of lexical or inflectional use (e.g. gastar (spend) time)
• We can report a positive correlation between inflectional and lexical diversity – but these product data do not tell us whether an increase in vocabulary enables processing of morphosyntax
Future directions
• Comparisons with undergrads and native controls
• A richness measure– will be based on rarity WITHIN our own corpus
• Analysis of closed class items, using CLAN’s list
• Further exploration of relationship of increased lexical knowledge, increased verb types and emerging morphosyntax