Post on 19-Jan-2016
A new corpus for Spanish Second Language Acquisition Research
L. Dominguez, R. Mitchell, M. J. Arche (U. of Southampton), E. Marsden (U. of York), F. Myles (Newcastle U.)
A corpus for L2 Acquisition
SLA theory aims to understand the complex mechanisms and conditions behind learner grammars
Access to good quality data is crucial: learner production data + focused comprehension tasks
Increasing interest in the creation of electronic learner corpora:
– sharing data more easily
– automatising some aspects of data analysis through the use of software such as concordancers, part of speech taggers, etc.
Some Existing Learner Corpora
– CHILDES: http://childes.psy.cmu.edu/
– TALKBANK: http://talkbank.org/
– English Corpus Linguistics: http://cecl.fltr.ucl.ac.be/Cecl-Projects/Icle/icle.htm
– L2 FRENCH FLLOC: www.flloc.soton.ac.uk/
– L2 (Written) SPANISH CEDEL 2: www.ugr.es/~cristoballozano/cedel2.htm
SPLLOC “Spanish Language Learner Oral Corpus”
2 year ESRC funded corpus project investigating the development of L2 Spanish
Aims:
– a small scale, high quality cross-sectional database of spoken learner Spanish
– topics being investigated lie at the syntax/discourse interface
Data collected:
– c. 40 hours of audio recordings (native/non-native)
– 80 written focused tests on word order
– 60 computer based tests on clitic comprehension
95% transcribed to date!
Immediate Research Agenda
Syntax/discourse interface as conceptualised in generative linguistics, including:
– The acquisition of Spanish word order
– Clitic pronouns
Verbal morphology
Development of the L2 lexicon
Corpus Design
Balance of spontaneous and focused data (semi-spontaneous oral tasks are complemented by focused judgement and production tasks)
Balance of genres (semi-spontaneous oral tasks include interview, narrative and discussion)
Balance of participants (20 L2 speakers from each of beginner, intermediate and advanced levels + NS speakers)
Flexibility of computer-aided analysis (use of the CHILDES system, plus an XML version)
Free web access to all materials (anonymised sound files, transcripts, analysis files) for all bona fide research users.
Summary of tasks by type, elicitation method and genre
Task type        | Elicitation    | Genre         | Modern Times | Loch Ness | Photos | Paired Discussion | Clitic Production | Clitic Comprehension | Word Order
SEMI-SPONTANEOUS | Oral           | Narrative     | √            | √         | √      | -                 | -                 | -                    | -
SEMI-SPONTANEOUS | Oral           | Interview     | -            | -         | √      | -                 | -                 | -                    | -
SEMI-SPONTANEOUS | Oral           | Discussion    | -            | -         | -      | √                 | -                 | -                    | -
FOCUSED TASKS    | Oral           | Production    | -            | -         | -      | -                 | √                 | -                    | -
FOCUSED TASKS    | Computer based | Comprehension | -            | -         | -      | -                 | -                 | √                    | -
FOCUSED TASKS    | Paper          | Written       | -            | -         | -      | -                 | -                 | -                    | √
Some task samples
Illustrations by Alex Brychta for “A Monster Mistake” by Roderick Hunt (Oxford Reading Tree, 2003) used by permission of Oxford University Press.
Loch Ness
Modern Times
Photos task
Description of states and description of events
Clitic Comprehension (computer based)
The learner hears a sentence with a clitic pronoun and has to click on the object it refers to.
32 screens: combination of number and gender (canonical and non-canonical) plus syntactic collocation.
• Canonical feminine: -a ending (e.g. calculadora ‘calculator’)
• Canonical masculine: -o ending (e.g. teléfono ‘phone’)
• Non-canonical: no -a/-o ending (e.g. lápiz ‘pencil’)
• Collocation: proclitic (as in conjugated verbs) vs. enclitic (as in infinitives).
Clitic Production (computer based)
The learner is asked a question referring to an object based on the sequence of pictures shown.
32 slides; combination of number and gender (canonical and non-canonical) plus syntactic collocation.
Word Order Task (paper & pencil)
Context-dependent word order preference test
• The learner is presented with 28 situations, each followed by a question
• Two types of questions: What happened? (Broad focus) Who did x? (Narrow focus)
• 4 items by 7 syntactic contexts: 4xSVO, 4xVOS, 4xCLLD, 4xUnerg/Narrow, 4xUnerg/Broad, 4xUnacc/Narrow and 4xUnacc/Broad
• Three options: Inverted (VS), non-inverted (SV) and both.
1. You get home and your brother tells you that he has just got an email from your friend Sue and that he has very good news to tell you. You ask your brother “¿Qué ha pasado?” (What happened?)
What could he say?
a. Se ha comprado un coche Sue (Sue has bought a car)
b. Sue se ha comprado un coche (Sue has bought a car)
c. Both sentences
2. Your brother is having some friends over for a get-together at home. When your mother comes she sees some smoke coming out of the bathroom and she asks your brother: “¿Quién está fumando?” (Who’s smoking?)
What could your brother say?
a. Oscar está fumando (Oscar is smoking)
b. Está fumando Oscar (Oscar is smoking)
c. Both sentences
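The 4 x 7 design of the word order test, and the way the three response options could be tallied per syntactic context, can be sketched roughly as follows. This is only an illustrative sketch: the function names (`build_items`, `tally`) and the response labels are assumptions made for the example, not part of the SPLLOC materials, and the sample responses in the usage note are invented.

```python
from collections import Counter
from itertools import product

# The seven syntactic contexts named on the slide.
CONTEXTS = [
    "SVO", "VOS", "CLLD",
    "Unerg/Narrow", "Unerg/Broad",
    "Unacc/Narrow", "Unacc/Broad",
]
ITEMS_PER_CONTEXT = 4

# Hypothetical labels for the three answer options (inverted, non-inverted, both).
OPTIONS = ("VS", "SV", "both")


def build_items():
    """Return one (context, item_number) pair per test situation."""
    return list(product(CONTEXTS, range(1, ITEMS_PER_CONTEXT + 1)))


def tally(responses):
    """Count chosen options per context.

    `responses` is an iterable of (context, choice) pairs.
    """
    counts = {context: Counter() for context in CONTEXTS}
    for context, choice in responses:
        if context not in counts:
            raise ValueError(f"unknown context: {context!r}")
        if choice not in OPTIONS:
            raise ValueError(f"unknown option: {choice!r}")
        counts[context][choice] += 1
    return counts


items = build_items()
print(len(items))  # 4 items x 7 contexts = 28 situations
```

For example, `tally([("Unacc/Broad", "VS"), ("Unacc/Broad", "both")])` returns a per-context counter in which the Unacc/Broad context has one "VS" and one "both" response (invented data).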
Summary of subjects by task (to date)
Task type  | Task name            | University (Final Year) | Sixth Form College (Year 13) | Lower Secondary School (Year 9) | Natives (all ages)
Open-ended | Modern Times         | 20                      | -                            | -                               | 5
Open-ended | Loch Ness            | 20                      | 20                           | 20                              | 15
Open-ended | Photos               | 20                      | 20                           | 20                              | 15
Open-ended | Paired Discussion    | 20                      | 20                           | -                               | 5
Focused    | Clitic Comprehension | 20                      | 20                           | 20                              | 3
Focused    | Picture Sequence     | 20                      | 20                           | 20                              | 10
Focused    | Word Order           | 20                      | 20                           | 19                              | 20
Tools for Data Analysis
CHILDES (The Child Language Data Exchange System)
– CLAN = Computerised Language Analysis
Computer program suite for transcribing, searching and analysing language data
– CHAT = Codes for the Human Analysis of Transcripts
A format for notation and transcription
Types of Analyses:
– FREQ, MLU, COMBO, KWAL
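To give a rough idea of what FREQ, MLU and KWAL analyses compute, here is a minimal Python sketch over a CHAT-style transcript. It is not CLAN and does not reproduce its options: the transcript lines and the speaker codes `LRN` and `INV` are invented for illustration, MLU is counted in words here (an assumption made for simplicity), and COMBO-style Boolean searches are left out.

```python
import re
from collections import Counter

# Invented CHAT-style sample; speaker codes LRN and INV are hypothetical.
SAMPLE = (
    "@Begin\n"
    "*LRN:\tel monstruo vive en el lago .\n"
    "*INV:\ty qué pasa después ?\n"
    "*LRN:\tla niña ve el monstruo .\n"
    "@End\n"
)


def utterances(chat_text, speaker):
    """Return each main-tier utterance of `speaker` as a list of lowercase words."""
    prefix = f"*{speaker}:"
    result = []
    for line in chat_text.splitlines():
        if line.startswith(prefix):
            # Keep runs of letters only, which drops the utterance terminator.
            words = re.findall(r"[^\W\d_]+", line[len(prefix):].lower())
            result.append(words)
    return result


def freq(utts):
    """FREQ-like word frequency count."""
    return Counter(word for utt in utts for word in utt)


def mlu_words(utts):
    """MLU-like measure: mean number of words per utterance."""
    return sum(len(utt) for utt in utts) / len(utts) if utts else 0.0


def kwal(utts, keyword):
    """KWAL-like search: utterances that contain the keyword."""
    return [" ".join(utt) for utt in utts if keyword in utt]


learner = utterances(SAMPLE, "LRN")
print(freq(learner).most_common(2))
print(mlu_words(learner))
print(kwal(learner, "lago"))
```

Restricting every function to one speaker's main tier mirrors the usual practice of analysing learner and investigator turns separately.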
Next Steps
Database will be available for use by the research community via www.splloc.soton.ac.uk (in spring 2008)
Articles & conference papers (in 2007):
– BAAL LLT SIG
– GALA
– BUCLD
– HLS
– SLRF
CHILDES training workshop:
– 25 January 2008, University of Southampton.
Acknowledgments
The SPLLOC project is supported by an ESRC research grant (RES 000231609)
We would like to thank all the participants in the project, including subjects, transcribers and fieldworkers
References
Domínguez, L., Arche, M.J. 2007a. “Deviant optional forms in L2 Spanish: the case of word order variation”. Poster presentation at GALA, Barcelona, 6-8 September.
Domínguez, L., Arche, M.J. 2007b. “Optionality in L2 grammars: the acquisition of SV/VS contrast in Spanish”. To be presented at BUCLD 32, Boston, 1-4 November.
Domínguez, L., Arche, M.J. 2007c. “The L2 Acquisition of SV/VS contrast in Spanish”. To be presented at the Hispanic Linguistic Symposium, Texas, 1-4 November.
Domínguez, L., Arche, M.J., Mitchell, R., Marsden, E. and Myles, F. 2007. “Innovations in Spanish SLA research methodology: introducing the ‘Spanish Learner Language Oral Corpus’”. To be presented at the Hispanic Linguistic Symposium, Texas, 1-4 November.
Granger, S., J. Hung and S. Petch-Tyson (eds.). 2002. Computer Learner Corpora, second language acquisition and foreign language teaching. Amsterdam: John Benjamins.
Lozano, C. & Mendikoetxea, A. (in press). Verb-Subject order in L2 English: new evidence from the ICLE corpus. In: Actas del XXV Congreso Internacional de AESLA. Universidad de Murcia.
Lozano, C. & Mendikoetxea, A. (forthcoming 2007). Postverbal subjects at the interfaces in Spanish and Italian learners of L2 English: a corpus analysis. In: Papp, S., Díez, B. and Gilquin, G. (eds). Linking up contrastive and corpus learner research. Rodopi
Mitchell, R., Marsden, E., Domínguez, L., Arche, M. J. and Myles, F. 2007 “Creation and analysis of a Spanish language learner oral corpus (SPLLOC)”. Poster presentation at BAAL LLT SIG Conference “Towards a Researched Pedagogy”, University of Lancaster, 2-3 July.
Mitchell, R., Dominguez, L., Arche, M.J., Myles, F. and Marsden, E. “Developing a CHILDES-based corpus of L2 oral Spanish”. To be presented at Second Language Research Forum, Urbana-Champaign, 11-14 October.
Myles, F. 2002. Linguistic development in classroom learners of French: a cross-sectional study (No. End of ESRC award report R000223421). Southampton: University of Southampton.
Myles, F. 2005. Interlanguage corpora and second language acquisition research. Second Language Research, 21,4: 373-391.