Corpus Aided Language Learning ELT J 2011 Huang 481 4

key concepts in elt

Corpus-aided language learning

Li-Shih Huang

A ‘corpus’ is a large collection or database of machine-readable texts involving natural discourse in diverse contexts (Bernardini 2000). Such discourses can be spoken, written, computer-mediated, spontaneous, or scripted and may represent a variety of genres (for example everyday conversations, lectures, seminars, meetings, radio and television programmes, and essays). Some readily available corpora include the British National Corpus (BNC, http://www.natcorp.ox.ac.uk), which contains 100 million words from written and spoken language in a variety of contexts, the Michigan Corpus of Academic Spoken English (MICASE, http://micase.elicorpora.info), which features 1.8 million words of speech in various academic contexts, and the Corpus of Contemporary American English (COCA), with 410 million words (http://www.americancorpus.org).¹

Although corpus linguistics (i.e. computer-assisted analysis techniques for studying texts) is a young specialization, its usefulness in teaching and learning has received growing attention and recognition (for example Hunston 2002; Sinclair 2004; Conrad 2005; O’Keeffe, McCarthy, and Carter 2007; Bennett 2010; Reppen 2010). In particular, researchers have identified corpus data as resources that provide descriptive insights relevant to how people use language and as tools that enable students and instructors to analyse both how people use different language forms at various levels of formality and how language fulfills multiple speech functions across contexts. Corpus data suggest that individuals often do not use language as specified in grammar books and that word meanings vary across contexts and users (Biber and Reppen 2002).

Over the past ten years, a growing number of studies have shown how learners can use corpus data to further their language learning (see Hunston op.cit.; Boulton 2010). Numerous corpus linguists (for example Gavioli and Aston 2001) have pointed out that learning activities centred on analysing corpus data are consistent with current principles of language-learning theory, that is, students develop more autonomy when they receive guidance about how to observe language and make generalizations. Such activities promote noticing and grammatical consciousness raising (Schmidt 1990), which can enhance second language learning and development. Despite the growing interest in corpora and corpus-aided learning, however, many teachers believe that incorporating corpora into their teaching would be too technically challenging or time consuming (Boulton 2010). Yet, while some researchers have suggested substantial training is necessary (for example Estling Vannestal and Lindquist 2007), others have provided evidence that only a minimal amount of training is needed (for example Boulton 2008). Some have also recommended using paper-based materials generated from corpora as a viable alternative to accessing corpora via computers (Boulton 2010).

ELT Journal Volume 65/4 October 2011; doi:10.1093/elt/ccr031. © The Author 2011. Published by Oxford University Press; all rights reserved. Advance Access publication May 5, 2011.

A key pedagogical approach for using corpora in language teaching and learning is ‘data-driven learning’ (DDL), which emerged in the mid-1980s. DDL was defined as ‘the use in the classroom of computer generated concordances to get students to explore the regularities of patterning in the target language, and the development of activities and exercises based on concordance output’ (Johns and King 1991: iii). As Johns (1994: 297) stated, ‘what distinguishes the DDL approach is the attempt to cut out the middleman as far as possible and to give direct access to the data so that the learner can take part in building up his or her own profiles of meaning and uses’. Furthermore, corpus data ‘[offer] a unique resource for the stimulation of inductive learning strategies – in particular the strategies of perceiving similarities and differences and of hypothesis formation and testing’ (ibid.). By extension, the corpus-aided discovery learning (CADL) approach entails encouraging learners to take the role of language researchers by systematically engaging in discovery learning (Gavioli 2000) and in learning how to learn through observations, analyses, interpretations, and presentations of language-use patterns in corpus data. In the CADL approach, learning about language use is driven by a process of enquiry that works toward understanding or problem solving, and corpora are used as mediational tools (Vygotsky 1978) rather than as the basis for language teaching and learning. Furthermore, instructors adhering to the CADL approach play a critical role in facilitating or guiding the process of discovery, which depends on the learners’ needs, stages of learning, and levels of proficiency.
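To make the notion of a computer-generated concordance concrete, the following is a minimal sketch of a keyword-in-context (KWIC) display of the kind on which DDL activities are typically based. It is an illustration only and not the software used in any of the studies cited: the function name kwic, the width parameter, and the sample sentences are all invented for this example.

```python
import re


def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per occurrence of `keyword`.

    Hypothetical helper for illustration; `width` is the number of
    characters of context shown on each side of the keyword.
    """
    # Collapse whitespace so line breaks in the source do not disturb alignment.
    text = " ".join(text.split())
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so every keyword starts in the same column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}".rstrip())
    return lines


# Invented sample sentences, used only to show the display format.
sample = (
    "She made a decision quickly. They made an effort to help. "
    "He made a mistake, and then he made a decision to leave."
)

for line in kwic(sample, "made"):
    print(line)
```

Because the keyword is aligned in a single column, learners can scan the words immediately to its right (here, the nouns that follow ‘made’) and form their own hypotheses about recurring patterns, which is the inductive step that Johns describes.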

Researchers have generally agreed that corpus data enrich our understanding of language use and are an important resource for language teaching and learning. The use of corpora in language teaching is not without controversies, however. Among the debates featured in Seidlhofer (2003), for example, some scholars have advocated using ‘real examples only’ in the classroom (for example Sinclair 1997), while others, in contrast, wonder whether the discourse in corpora, taken out of its original context, can still be considered ‘authentic’, ‘real’, or ‘natural’, thereby questioning the efficacy of analysing displaced language that may not be relevant to learners’ linguistic and sociocultural contexts. In response to Widdowson’s (1998) remark that corpora may provide samples of genuine language produced by language users with real communication goals but do not necessarily guarantee that learners can participate in discourse in ways that lead to learning, researchers such as Gavioli and Aston (op.cit.) note that learners can still ‘authenticate’ language samples by adopting an observer’s role to critically analyse the data, which will raise their awareness of lexical, grammatical, and textual issues as they restructure their views about language use in real situations. Similarly, Carter (1998: 50–1) argues that while ‘real’ English from corpora can be ‘unrealistic’ for classroom instruction, and thus modified language used in the classroom that is based on learners’ needs and levels might be more ‘pedagogically viable and realistic’, learners should be provided with opportunities to develop a ‘feel’ for the language through corpus data. The validity of analysing corpora to capture language use across seemingly limitless contexts or to describe the workings of ‘real English’ around the world has also been questioned. Some scholars point out that communicative contexts are not restricted to native speaker discourse, and, as such, language teaching should not be based simply on descriptive facts generated from largely native speaker-oriented corpora (Prodromou 1996).²

Despite these debates, technological advancements have undoubtedly enhanced language learners’ and instructors’ access to corpora, and the plethora of articles and books written for language-teaching researchers and practitioners published during the past five years suggests that attention to and interest in using corpora for teaching and learning purposes will continue for the foreseeable future.

Notes
1 For more examples, visit http://corpus.byu.edu and the International Corpus of English: http://ice-corpora.net/ice.

2 The Vienna-Oxford International Corpus of English (VOICE) (http://www.univie.ac.at/voice) is one such corpus that collects English spoken by non-native language users in various contexts. VOICE comprises one million words of naturally occurring, non-scripted, face-to-face interactions by over 1,200 speakers with 50 different first languages.

References
Bennett, G. 2010. Using Corpora in the Language Learning Classroom. Ann Arbor, MI: Michigan University Press.
Bernardini, S. 2000. ‘Systematising serendipity: proposals for concordancing large corpora with language learners’ in L. Burnard and T. McEnery (eds.). Rethinking Language Pedagogy from a Corpus Perspective. Frankfurt am Main: Peter Lang.
Biber, D. and R. Reppen. 2002. ‘What does frequency have to do with grammar teaching?’ Studies in Second Language Acquisition 24: 199–208.
Boulton, A. 2008. ‘Looking for empirical evidence for DDL at lower levels’ in B. Lewandowska-Tomaszczyk (ed.). Corpus Linguistics, Computer Tools, and Applications: State of the Art. Frankfurt am Main: Peter Lang.
Boulton, A. 2010. ‘Data-driven learning: taking the computer out of the equation’. Language Learning 60/3: 534–72.
Carter, R. 1998. ‘Orders of reality: CANCODE, communications, and culture’. ELT Journal 52/1: 43–56.
Conrad, S. 2005. ‘Corpus linguistics and L2 teaching’ in E. Hinkel (ed.). Handbook of Research in Second Language Teaching and Learning. Mahwah, NJ: Lawrence Erlbaum Associates.
Estling Vannestal, M. and H. Lindquist. 2007. ‘Learning English grammar with a corpus: experimenting with concordancing in a university grammar course’. ReCALL 19/3: 329–50.
Gavioli, L. 2000. ‘The learner as researcher: introducing corpus concordancing in the classroom’ in G. Aston (ed.). Learning with Corpora. Houston, TX: Athelstan/Bologna: CLUEB.
Gavioli, L. and G. Aston. 2001. ‘Enriching reality: language corpora in language pedagogy’. ELT Journal 55/3: 238–46.
Hunston, S. 2002. Corpora in Applied Linguistics. Cambridge: Cambridge University Press.
Johns, T. 1994. ‘From printout to handout: grammar and vocabulary teaching in the context of data-driven learning’ in T. Odlin (ed.). Perspectives on Pedagogical Grammar. Cambridge: Cambridge University Press.
Johns, T. and P. King. (eds.). 1991. ‘Classroom concordancing’. English Language Research Journal 4: 27–45.
O’Keeffe, A., M. McCarthy, and R. Carter. 2007. From Corpus to Classroom: Language Use and Language Teaching. Cambridge: Cambridge University Press.
Prodromou, L. 1996. ‘Correspondence’. ELT Journal 50/1: 88–9.
Reppen, R. 2010. Using Corpora in the Language Classroom. Cambridge: Cambridge University Press.
Schmidt, R. 1990. ‘The role of consciousness in second language learning’. Applied Linguistics 11/2: 129–58.
Seidlhofer, B. 2003. Controversies in Applied Linguistics. Oxford: Oxford University Press.
Sinclair, J. 1997. ‘Corpus evidence in language description’ in A. Wichmann, S. Fligelstone, T. McEnery, and G. Knowles (eds.). Teaching and Language Corpora. New York, NY: Longman.
Sinclair, J. 2004. How to Use Corpora in Language Teaching. Amsterdam: John Benjamins Publishing Company.
Vygotsky, L. S. 1978. Mind in Society: The Development of Higher Psychological Processes. Cambridge, MA: Harvard University Press.
Widdowson, H. G. 1998. ‘Context, community and authentic language’. TESOL Quarterly 32/4: 705–16.

The author
Li-Shih Huang is an Associate Professor of Applied Linguistics and Learning and Teaching Centre Scholar-in-Residence at the University of Victoria, Canada. Her current research examines academic language learning needs and outcomes assessment, corpus-aided discovery learning, and learner strategies in language learning and language testing contexts.
Email: [email protected]
