Learner Corpora and Natural Language Processing


Detmar Meurers
Universität Tübingen

Revision 10781

Contents

1 Introduction
2 Core issues
2.1 Representing learner data and the relevance of target hypotheses
2.2 Annotating learner data
2.2.1 Linguistic annotation
2.2.2 Error annotation
2.2.3 NLP approaches detecting and diagnosing learner errors
3 Case studies
3.1 Rosen et al. (2013): Manual, semi-automatic and automatic annotation of a learner corpus
3.2 Metcalf and Meurers (2006): Word order errors and their different NLP requirements
4 Critical assessment and future directions
4.1 NLP supporting the use of learner corpora for SLA research
4.1.1 Facilitating the analysis of learner language in a broad range of tasks
4.2 Deeper linguistic modelling in NLP and learner language analysis
5 Key readings

To appear in: The Cambridge Handbook of Learner Corpus Research, edited by Sylviane Granger, Gaëtanelle Gilquin and Fanny Meunier, Cambridge University Press, 2015.


1 Introduction

Natural Language Processing (NLP) deals with the representation and the automatic analysis and generation of human language (Jurafsky & Martin, 2009). Learner corpora collect the language produced by people learning a language. The two thus overlap in the representation and automatic analysis of learner language, which constitutes the topic of this chapter. We can distinguish three main uses of NLP involving learner corpora:

First, NLP tools are employed to annotate learner corpora with a wide range of general properties and to gain insights into the nature of language acquisition or typical learner needs on that basis. On the one hand, this includes general linguistic properties from part-of-speech and morphology, via syntactic structure and dependency analysis, to aspects of meaning and discourse, function and style. On the other, there are properties specific to learner language, such as different types of learner errors, again ranging from the lexical and syntactic to discourse, function and usage. The use of NLP tools for annotation can be combined with human post-editing to eliminate potential problems introduced by the automatic analysis. NLP tools can also be integrated into a manual annotation setup to flag annotation that appears to be inconsistent across comparable corpus instances (Boyd et al., 2008), or to automatically identify likely error locations and refine manual annotation (Rosen et al., 2013). The use of NLP for corpus annotation will be an important focus of this chapter. A detailed discussion of the spell- and grammar-checking techniques related to error annotation can be found in Chapter 27.

Second, NLP tools are used to provide specific analyses of the learner language in the corpus. For instance, for the task of Native Language Identification (NLI) discussed in detail in Chapter 29, classifiers are trained to automatically determine the native language of the second language learner who wrote a given essay. In another NLP task, learner essays are analysed to determine the proficiency level of the learner who wrote a given essay (Pendar & Chapelle, 2008; Yannakoudakis et al., 2011; Vajjala & Lõo, 2013; Hancke & Meurers, 2013), a task related to the analysis of developmental sequences and criterial features of different stages of proficiency (Granfeldt et al., 2005; Rahkonen & Håkansson, 2008; Alexopoulou et al., 2011; Tono, 2013; Murakami, 2013) and the popular application domain of automatic essay grading addressed in Chapter 28.

The third type of NLP application in the context of learner corpora is related to the previous two, but unlike them it is not designed to provide insights into the learner corpus as such. Instead, the learner corpus is only used to train NLP tools, specifically their statistical or machine learning components. The trained NLP tools can then be applied to learner language arising in other contexts. A tool trained on a learner corpus to detect particular types of learner errors can be used to provide immediate, individualised feedback to learners who complete exercises in an intelligent tutoring system. Such Computer-Assisted Language Learning (CALL) systems to which NLP has been added are commonly referred to as Intelligent Computer-Assisted Language Learning (ICALL). While traditionally the two fields of Learner Corpus Research and ICALL developed independently and largely unconnected (but cf. Granger et al., 2007), the NLP analysis of learner language for corpus annotation is essentially an offline version of the online NLP analysis of learner language in ICALL, where the learner is waiting for feedback.

Complementing the NLP analysis of learner language, the other use of NLP in the language learning context (Meurers, 2012) analyses the native language to be learned. Examples of the latter include the retrieval of pedagogically appropriate reading materials (e.g. Brown & Eskenazi, 2005; Ott & Meurers, 2010), the generation of exercises (e.g. Aldabe, 2011), or the presentation of texts to learners with visual or other enhancements supporting language learning (Meurers et al., 2010). The standard NLP tools have been developed for native language and thus are directly applicable in that domain. On the other hand, the analysis of learner language as the topic of this chapter raises additional challenges, which figure prominently in the next section on the core issues, from corpus representation as such to linguistic and error annotation.

2 Core issues

2.1 Representing learner data and the relevance of target hypotheses

At the fundamental level, representing a learner corpus amounts to encoding the language produced by the learner and its meta-data, such as information about the learner and the task performed (cf. Granger, 2008:264). For the spoken language constituting the primary data for research on first language acquisition (e.g., CHILDES, MacWhinney, 2000), most work on uninstructed second language acquisition (e.g., ESF, Perdue, 1993), and a small part of the instructed second language acquisition corpora (e.g., NICT-JLE, Tono et al., 2004), this involves the question of how to encode and orthographically transcribe the sound recordings (Wichmann, 2008:195ff). Written language, such as the learner essays typically collected in instructed second language learning contexts (e.g., ICLE, Granger et al., 2009), also requires transcription for hand-written learner texts, though essays typed by learners are increasingly common. The language typed into CALL systems is starting to be systematically collected, supporting the creation of very large learner corpora such as EFCAMDAT (Geertzen et al., 2013). Learner corpora can also be collected from web sites such as Lang-8 (Brooke & Hirst, 2013), where non-native writers can receive feedback from native speakers.

While the fundamental questions around how to represent spoken and written language in corpora are largely independent of the nature of the language being collected, and good general corpus linguistic discussions can be found in McEnery et al. (2006) and Lüdeling & Kytö (2008), there are important representation aspects specific to learner language. Researchers in second language acquisition emphasise the individual, dynamic nature of interlanguage (Selinker, 1972), and focus on characterising its properties as a language system in its own right. At the same time, the analysis of language, be it manual linguistic analysis or automatic NLP analysis, was developed for and trained on well-formed native language. When trying to analyse learner data on that basis, one encounters forms and patterns which cannot be analysed in terms of the targeted native language system.

Consider, for example, the learner sentence in (1), taken from the Non-native Corpus of English (NOCE; Díaz Negrillo, 2007), consisting of essays by intermediate Spanish learners of English.

(1) People who speak another language have more opportunities to be choiced for a job because there is a lot connection between the different countries nowadays.

In line with the native English language system, the -ed suffix of the underlined word choiced can be identified as a verbal suffix and interpreted as past tense, and the distributional slot between to be and for a job is syntactically appropriate for a verb. But the stem choice in English can only be a noun or an adjective. In this example and a systematic set of cases discussed in Díaz Negrillo et al. (2010), it thus is not possible to assign a unique English part-of-speech to learner tokens.


In examples such as (1), it seems straightforward to analyse the sentence as though the learner had written the appropriate native English form chosen in place of the interlanguage form choiced. Yet, even for such apparently clear non-word cases, where a learner used a word that is not part of the target language system, different native language words may be inferred as targets (e.g., selected could be another option for the example above), and the subsequent analysis can differ depending on which target is assumed.

When we go beyond the occurrence of isolated non-words, the question of which level of representation of learner corpora can form the basis for the subsequent analysis becomes more pronounced. For example, consider the sentence in (2) written by a beginning learner of German as found in the Error-Annotated German Learner Corpus (EAGLE, Boyd, 2012:135ff). The sentence only includes well-formed words, but the subject and the verb fail to show the subject-verb agreement required by German grammar.

(2) Du       arbeiten           in   Liechtenstein.
    you.2sg  work.1pl/3pl/inf   in   Liechtenstein

Given that agreement phenomena always involve (at least) two elements, there is systematically an ambiguity in determining grammatical target forms. If we take the second person subject du at face value, the corresponding second person verb form arbeitest is the likely target. Or we keep the verb, assume that it is a finite form and thus postulate the corresponding plural third person sie (they) or first person wir (we) as subject to obtain a well-formed target language sentence. Fitzpatrick & Seegmiller (2004) present a study confirming that it is often difficult to decide on a unique target form.

Considering the difficulty of uniquely determining target forms, one needs to document the reference on which any subsequent analysis is based in order to ensure valid, sustainable interpretations of learner language. Lüdeling (2008) thus argues for explicitly specifying such target hypotheses as a representation level of learner corpora. Rosen et al. (2013:16f) confirm that disagreement in the analysis of learner data often arises from different target hypotheses being assumed. We will discuss their findings for the Czech as Second Language (CzeSL) corpus in the first case study in Section 3.1.

While there seems to be a growing consensus that a replicable analysis of learner data requires the explicit representation of target hypotheses in learner corpora, what constitutes a target hypothesis and how it is obtained needs clarification. There are two pieces of evidence that one can take into account in determining a target hypothesis:

On the one hand, one can interpret the forms produced by the learner bottom-up in terms of a linguistic reference system, such as the targeted native language system codified in the standard corpus annotation schemes. One can then define a target hypothesis which encodes the minimal form change that is required to turn the learner sentence into a sentence which is well-formed in terms of the target language grammar. A good example is the Minimal Target Hypothesis (TH1) made explicit in the annotation manual of the German learner corpus FALKO (Reznicek et al., 2012:42ff). An alternative incremental operationalization of a purely form-based target hypothesis is spelled out in Boyd (2012). Both approaches explicitly define what counts as minimal form change. They do not try to guess what the learner may have wanted to say and how this could have been expressed in a well-formed sentence, but determine the minimal number of explicitly defined form changes that is needed to turn the learner sentence into a grammatical sentence in the target language. While this makes it possible to uniquely identify a single target hypothesis in many cases, for some cases multiple possible target hypotheses are required. This can readily be represented in corpora using multi-layer standoff annotation (Reznicek et al., 2013).
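
To make the idea of counting explicitly defined form changes concrete, here is a minimal sketch (not the FALKO or Boyd operationalization, which define the permitted changes in much more linguistic detail) that compares the learner sentence from example (2) to candidate target hypotheses by token-level edit distance; the candidate targets are the two options discussed above.

```python
# A sketch of ranking candidate target hypotheses by form distance.
# The edit operations (token-level substitute, insert, delete) and the
# candidates are illustrative assumptions, not an official scheme.

def token_edits(learner, target):
    """Token-level Levenshtein distance between two tokenised sentences."""
    m, n = len(learner), len(target)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if learner[i - 1] == target[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # delete a learner token
                          d[i][j - 1] + 1,         # insert a target token
                          d[i - 1][j - 1] + cost)  # keep or substitute
    return d[m][n]

learner = "Du arbeiten in Liechtenstein .".split()
candidates = ["Du arbeitest in Liechtenstein .".split(),   # change the verb
              "Sie arbeiten in Liechtenstein .".split()]   # change the subject
for target in candidates:
    print(token_edits(learner, target), " ".join(target))
# Both candidates are one substitution away: the minimal-form-change
# criterion alone cannot decide between them, which is why corpora may
# need to represent multiple target hypotheses.
```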

On the other hand, one can determine target hypotheses using top-down information about the function of the language and the meaning the learner was trying to express, based on what we know about the particular task and expectations about human communication in general. The top-down, meaning-driven and the bottom-up, form-driven interpretation processes essentially interact in any interpretation of human language. For interpreting learner language, this interaction is particularly relevant given that the interlanguage forms used by language learners cannot fully be interpreted in terms of the established linguistic reference systems developed for the native language. This is particularly evident for learner language such as the Basic Variety that is characteristic of uninstructed second language acquisition (Klein & Perdue, 1997), which lacks most grammatical form marking. The fact that learner language offers limited and hard-to-interpret form-driven information bottom-up makes corpora with explicit task contexts particularly relevant for learner corpus research aimed at drawing valid inferences about the learners' second language knowledge and development.

Consider, for example, the learner sentences in (3) written by Japanese learners of English, as recorded in the Hiroshima English Learners' Corpus (HELC, Miura, 1998).

(3) a. I don't know his lives.
    b. I know where he lives.

Both sentences are grammatically well-formed in English. Given that the form-based target hypothesis is defined to consist of the minimal form change that is needed to obtain a sentence that is grammatical in the target language system, these target hypotheses are identical to the learner sentences in both cases. However, if we go beyond the form of the sentence and take the context and meaning into account, we find that both sentences were produced in a translation task to express the Japanese sentence meaning I don't know where he lives. We can thus provide a second, meaning-based target hypothesis for the two sentences. On that basis, we can analyse the learner sentences and, for example, interpret them in terms of the learners' capabilities to use do support, negation, and to distinguish semantically related words with different parts of speech (cf. Zyzik & Azevedo, 2009).

While the example relies on an explicit task context in which a specific sentence encodes the meaning to be expressed for this translation exercise, the idea to go beyond the forms in the sentence towards meaning and function in context is generally applicable. It is also present in the annotation guidelines used for the learner essays and summaries collected in the FALKO corpus. The Extended Target Hypothesis (TH2) operationalized in Reznicek et al. (2012:51ff) takes into account the overall text, the meaning expressed, the function and information structure, and aspects of the style. While such an extended target hypothesis provides an important reference for a more global, functional analysis, as such it cannot be made explicit in the same formal way as the minimal form change target hypothesis TH1. To ensure sufficient inter-annotator agreement for TH2 annotation, task design arguably requires particular attention. The importance of integrating more task and learner information into the analysis of learner data is confirmed by the prominent evidence-centred design approach in language assessment (Mislevy et al., 2003).

A global, meaning-based target hypothesis may also seem to come closer to an intuitive idea of the target hypothesis as 'what the learner wanted to say', but such a seemingly intuitive conceptualisation of target hypotheses would be somewhat naive and more misleading than helpful. Learners do not simply write down language to express a specific meaning. They employ a broad range of strategies to use language in a way that achieves their communicative or task goals. Indeed, strategic competence is one of the components of language competence distinguished in the seminal work of Canale & Swain (1980), and Bachman & Palmer (1996) explicitly discuss planning how to approach a test task as a good example of such strategic competence. In an instructed second language learning setting, learners know that form errors are one of the aspects they typically are evaluated on, and therefore they strategically produce language in a way that minimises the number of form errors they produce. For example, we found in the Corpus of Reading Comprehension Exercises in German (CREG) that the learners simply lift material from texts or use familiar chunks (Ott et al., 2012:59f), a strategy which, for example, allows the learners to avoid generating the complex agreement patterns within German noun phrases. In a similar English learner corpus, Bailey (2008) found that this strategy was used more frequently by less proficient learners, who made fewer form errors overall (but less frequently answered the question successfully). A second reason for rejecting the idea that a target hypothesis is 'what the learner wanted to say' is that learners do not plan what they want to say in terms of full-fledged target language forms (though learners may access chunks and represent aspects at a propositional level, cf. Kintsch & Mangalath, 2011). Even when learners produce apparently grammatical target language forms, their conceptualisation of the language forms does not necessarily coincide with the analysis in terms of the target language system. For example, Amaral & Meurers (2009) found that learners could not interpret feedback provided by an intelligent tutoring system because they misconceptualised contracted forms.

Summing up this discussion, target hypotheses are intended to provide an explicit representation that can be interpreted in terms of an established linguistic reference system, typically that of the language being acquired. It also is this targeted native language for which the linguistic annotation schemes and NLP tools have been developed. The form-based and the meaning-based target hypotheses discussed above are two systematic options that can serve as a reference for a wide range of language analyses. Conceptually, a target hypothesis needs to make explicit the minimal commitment required to support a specific type of analysis/annotation of the corpus. As such, target hypotheses may only consist of one or a couple of words instead of full sentences, and more abstract target hypothesis representations may help avoid an overcommitment that would be entailed by specifying the full surface forms of sentences, e.g., where multiple word orders are possible in a given context.

2.2 Annotating learner data

The purpose of annotating learner corpora is to provide an effective and efficient index into relevant subclasses of data. As such, linguistic annotation serves essentially the same purpose as the index of a telephone book. A telephone book allows us to efficiently look up the phone number of people by the first letter of the last name. The alternative of doing a linear search by reading through the phone book from beginning to end until one finds the right person would be possible in theory but would generally not be efficient enough in practice. While indexing phone book information by the first letter of the last name is typical, it is only one possible index – one that is well-suited for the typical questions one tries to address using phone books written in an alphabet. For Chinese names, on the other hand, the number of strokes in the name as written in the logographic writing system is typically used instead, with the radical-and-stroke sorting used in Chinese dictionaries being another option.


For other questions which can be addressed using the same telephone book information, we need other indices. For example, consider a situation in which someone called us, we have a phone that displays the number of the caller, and we now want to find out who called us. We would need a phone book that is indexed by phone numbers. Or, to be able to efficiently look up who lives on a particular street, we would need a book that is indexed alphabetically by the first letter of the street name. Taking this running example one important step further, consider what it takes to look up the phone numbers of all the butchers in a given town. Given that a phone book typically does not list professions, we need an additional resource to first determine the names of all the butchers. If we often want to look up people by their profession, we may decide to add that information to the phone book so that we can more readily index the data based on that information. Any such index is an interpretation of the data giving us direct access to specific subsets of data which are relevant under a particular perspective.

Each layer of annotation we add to corpora as collections of language data serves exactly that purpose of providing an efficient way to index language data to retrieve the subclasses of data that help us answer common (research) questions. For example, to pick out occurrences of the main verb can as in Dario doesn't want to can tuna for a living, we need part-of-speech annotation that makes it possible to distinguish such occurrences of can from the frequent uses of can as an auxiliary (Cora can dance) or as a noun (What is Marius doing with that can of beer?), which cannot readily be distinguished by only looking at surface forms in the corpus.
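
As a minimal sketch of annotation as an index, the following snippet builds a lookup from (token, tag) pairs to corpus positions; the toy sentences and the Penn-Treebank-style tags are invented for illustration.

```python
from collections import defaultdict

# A toy annotated corpus: (token, part-of-speech) pairs per sentence.
# Tags loosely follow the Penn Treebank tagset (VB = base-form verb,
# MD = modal, NN = noun).
corpus = [
    [("Dario", "NNP"), ("does", "VBZ"), ("n't", "RB"), ("want", "VB"),
     ("to", "TO"), ("can", "VB"), ("tuna", "NN")],
    [("Cora", "NNP"), ("can", "MD"), ("dance", "VB")],
    [("that", "DT"), ("can", "NN"), ("of", "IN"), ("beer", "NN")],
]

# Build the index once: (lower-cased token, tag) -> list of occurrences.
index = defaultdict(list)
for s_id, sentence in enumerate(corpus):
    for t_id, (token, tag) in enumerate(sentence):
        index[(token.lower(), tag)].append((s_id, t_id))

# Retrieval is then a direct lookup instead of a linear search:
print(index[("can", "VB")])  # main-verb uses only -> [(0, 5)]
print(index[("can", "MD")])  # auxiliary uses only -> [(1, 1)]
```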

Which subclasses are relevant depends on the research question and how corpus data is involved in addressing it. For Foreign Language Teaching and Learning (FLTL), the questions are driven by the desire to identify and exemplify typical student characteristics and needs. For Second Language Acquisition (SLA) research, learner corpora are queried to inform the empirical basis on which theories of the acquisition process and its properties are developed and validated. General linguistic layers of annotation, such as parts-of-speech or syntactic dependencies, are useful for querying the corpus for a wide range of research questions arising in FLTL and SLA – much like annotating telephone book entries with professions allows us to search for people to address different needs, from plumbers to hairdressers. On the other hand, annotating all phone entries with the particular day of the week on which the person was born would not provide access to generally relevant classes of data. Which types of annotation one can and should provide for learner corpora using automatic or manual annotation or a combination of the two is an important research issue at the intersection of learner corpus and NLP research.

2.2.1 Linguistic annotation

A wide range of linguistic corpus annotation schemes have been developed for written and spoken language corpora (cf., e.g., Garside et al., 1997; Leech, 2004), and the NLP tools developed over the past two decades support the automatic identification of a number of language properties, including lexical, syntactic, semantic and pragmatic aspects of the linguistic system.

For learner corpora, the use of NLP tools for annotation is much more recent (de Haan, 2000; de Mönnink, 2000; van Rooy & Schäfer, 2002, 2003; Sagae et al., 2010). Which kinds of annotation schemes are relevant and useful to address which learner corpus research questions is only starting to be discussed. For advanced learner varieties, the annotation schemes and NLP tools originally developed for native language, especially edited news text, can seemingly be applied. At closer inspection, even this requires some leeway when checking whether the definitions in the annotation schemes apply to a given learner corpus example. In NLP, such leeway is generally discussed under the topic of robustness.

Real-life NLP applications such as a machine translation system should, for example, be able to translate sentences even if they contain some spelling mistakes or include words we have not encountered before, such as a particular proper name. Robustness in corpus annotation allows the NLP tools to classify a given learner language instance as a member of a particular class (e.g., a particular part-of-speech) even when the observed properties of those instances differ from what is expected for that class, e.g., when the wrong stem is used, as in the case of choiced we discussed for example (1). At a given level of analysis, robustness thus allows the NLP tools to gloss over those aspects of learner language that differ from the native language for which the annotation schemes and tools were developed and trained. In other words, robustness at a given level of analysis is intended to ignore the differences between the learner and the native language at that level.

In contrast, many of the uses of learner corpora aim to advance our understanding of language acquisition by identifying characteristics of learner language. For such research, the particularities and variability of learner language at the level being investigated are exactly what we want to identify, not gloss over robustly. In Section 2.1 we already discussed a key component for addressing this issue: target hypotheses (specifically the form-based TH1). We can see those as a way of documenting the variation that robust analysis would simply have glossed over. Target hypotheses require the researcher to make explicit where a change is required to be able to analyse the learner language using a standard linguistic annotation scheme. A learner corpus including target hypotheses and linguistic annotation thus makes it possible to identify both the places where the learner language diverges from the native language norm as well as the general linguistic classes needed for retrieval of relevant subsets of learner data.

At the same time, such an approach cannot be the full solution for analysing the characteristics of learner language. It amounts to interpreting learner language in a documented way, but still in terms of the annotation schemes developed for native language instead of annotation schemes defined to systematically reflect the properties of interlanguage itself. This is natural, given that linguistic category systems arose on the basis of a long history of data observations, based on which a consensus of the relevant categories emerged. Such category systems thus are difficult to develop for the individual, dynamic interlanguage of language learners. But if we instead simply use a native language annotation scheme to characterise learner language, we run the danger of committing a comparative fallacy, 'the mistake of studying the systematic character of one language by comparing it to another' (Bley-Vroman, 1983:6).

Pushing the insight from hermeneutics (http://plato.stanford.edu/entries/hermeneutics) that every interpretation is based on a given background to the extreme, it is evident that we can never perceive anything as such. Indeed, as argued in the accessible introduction to hermeneutics and radical constructivism in Winograd & Flores (1986), humans are autopoietic systems evolving in a constant hermeneutic circle. However, for the concrete case at hand, there is a more pragmatic approach which in effect helps us limit the degree of the comparative fallacy entailed by the annotation scheme used. The idea is to annotate learner language as close as possible to the specific dimensions of observable empirical properties. For example, traditional part-of-speech tags encode a bundle of syntactic, morphological, lexical and semantic characteristics of words. For learner language, we proposed in Díaz Negrillo et al. (2010) to instead employ a tripartite representation with three separate parts-of-speech explicitly encoding the actually observable distributional, morphological and lexical stem information. Consider the examples in (4).

(4) a. The intrepid girl smiled.
    b. He ambulated home.
    c. The king of France is bald.

In terms of the distributional evidence for the part-of-speech of the word intrepid in sentence (4a), between a determiner and a noun we are most likely to find an adjective. Illustrating the morphological evidence, in (4b) the word ending in the suffix -ed is most likely to be a verb. Finally, in (4c) the word of is lexically unambiguous so that looking up the isolated word in a dictionary is sufficient for determining that it is a preposition.

For native language, the three sources of empirical evidence converge and can be encoded by one part-of-speech tag. For learner language, these three information sources may diverge, as was illustrated by example (1). To avoid some of the complexity of such multi-dimensional tagsets, Reznicek & Zinsmeister (2013) show that the use of underspecified tags (leaving out some information) and Portmanteau tags (providing richer tagsets, combining information) can lead to an improved part-of-speech analysis of German learner language.
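
A minimal sketch of such a tripartite representation, with illustrative tag values rather than an official tagset, might look as follows.

```python
from dataclasses import dataclass

@dataclass
class TripartitePOS:
    """One tag per evidence source, after Díaz Negrillo et al. (2010).
    The tag values used below are illustrative, not an official tagset."""
    token: str
    distributional: str  # from the slot the token occupies
    morphological: str   # from affixes on the token
    lexical: str         # from a dictionary lookup of the stem

# A native-language token: the three sources converge.
smiled = TripartitePOS("smiled", "VERB", "VERB", "VERB")

# The learner form 'choiced' from example (1): the sources diverge,
# which a single part-of-speech tag cannot express.
choiced = TripartitePOS("choiced",
                        distributional="VERB",  # between 'to be' and 'for'
                        morphological="VERB",   # -ed suffix
                        lexical="NOUN")         # stem 'choice'

def converges(t: TripartitePOS) -> bool:
    return t.distributional == t.morphological == t.lexical

print(converges(smiled), converges(choiced))  # True False
```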

In the syntactic domain, encoding classes close to the empirical observations can be realised by breaking down constituency in terms of a) the overall topology of a sentence, i.e., the sentence-level word order, b) chunks and chunk-internal word order and c) lexical dependencies. What is encoded in the overall topology of a sentence depends on the language and includes grammatical functions and discourse aspects, but the prevalence of notions such as fronting or extraposition in linguistic characterisations of data illustrates the relevance of such a global, topological characterisation of sentences. For some Germanic languages, this characterisation can build on the tradition of Topological Fields analysis, based on which automatic NLP analyses have also been developed for German (Cheung & Penn, 2009). Topological fields are also starting to be employed in the analysis of learner language (Hirschmann et al., 2007).

In terms of the notion of chunks, various chunk concepts are widely discussed in the context of learner language (though often in need of a precise operationalization and corpus-based evaluation), but the dependency analysis requires more elaboration here. To pursue the envisaged analysis close to the specific empirical observations, one must carefully distinguish between morphological, syntactic and semantic dependencies. This is, for example, the case in Meaning Text Theory (Mel'cuk, 1988), and it is reflected in the distinction between the analytical and the tectogrammatical layer of the Prague Dependency Treebank (Böhmová et al., 2003). Against this background, we can distinguish two types of dependency analyses which have been developed for learner language. On the one hand, we find surface-evidence-based approaches that aim at providing a fine-grained record of the morphological and syntactic evidence (Dickinson & Ragheb, 2009; Ragheb & Dickinson, 2012), such as observable case marking or agreement properties. On the other, there are approaches which essentially target a level of semantic dependencies (Rosén & Smedt, 2010; Ott & Ziai, 2010). The goal here is to robustly abstract away from learner-specific forms and constructions where syntax and semantics diverge (such as English case-marking prepositions or the interpretation of the subject of non-finite constructions) to encode the underlying function-argument relations from which the sentential meaning can be derived. For example, dependency parsing is used as part of a content assessment system analysing learner responses to reading comprehension questions (Hahn & Meurers, 2012), and King & Dickinson (2013) report on the NLP analysis of another task-based learner corpus supporting the evaluation of meaning. For learner data from a picture description task, they obtain very high accuracies for the extraction of the core functor-argument relations using shallow semantic analysis.

For any dependency analysis of learner data to be useful for research, the essential question is which kind of dependency distinctions can reliably be identified given the information in the corpus. This is starting to be addressed in recent work (Ragheb & Dickinson, 2013). Relatedly, when using parsers to automatically assign dependency analyses for learner language, one needs to be aware that the particular parsing setup chosen impacts the nature and quality of the dependency analysis that is obtained. Comparing two different computational approaches to dependency parsing learner language for German, for example, in Krivanek & Meurers (2013) we show that a rule-based approach was more reliable in identifying the main argument relations, whereas a data-driven parser was more reliable in identifying adjunct relations. This also is intuitively plausible given that statistical approaches can use the world knowledge encoded in a corpus for disambiguation, whereas the grammar-based approach can rely on high-quality subcategorization information for the arguments.

2.2.2 Error annotation

A second type of annotation of learner corpora, error annotation, targets the nature of the difference between learner data and native language. Given the FLTL interest in identifying, diagnosing and providing feedback on learner errors, and the fact that learner corpora are commonly collected in an FLTL context, error annotation is the most common form of annotation in the context of learner corpora (Granger, 2003; Díaz Negrillo & Fernández Domínguez, 2006). At the same time, error annotation is only starting to be subjected to the rigorous systematisation and inter-annotator agreement testing established for linguistic annotation, which will help determine which distinctions can reliably be annotated based on the evidence available in the corpus. The analyses becoming available in the NLP context confirm that the issue indeed requires scrutiny. Rozovskaya & Roth (2010) find very low inter-annotator agreement for error classification of ESL sentences. Even for the highly focused task of annotating preposition errors, Tetreault & Chodorow (2008a) report that trained annotators failed to reach good agreement. Rosen et al. (2013) provide detailed inter-annotator agreement analyses for their Czech learner corpus, making concrete for which aspects of error annotation good agreement can be obtained and what this requires – a study we discuss in detail in Section 3.1. For corpus annotation to support reliable, replicable access to systematic classes of data in the way explained in Section 2.2, it is essential to reduce annotation schemes to those categories that can reliably be assigned based on the evidence available in the corpus.

2.2.3 NLP approaches detecting and diagnosing learner errors

In terms of tools for detecting errors, learner corpus research has long envisaged automatic approaches (e.g., Granger & Meunier, 1994), but the small community at the intersection of NLP and learner corpus research is only starting to make headway. The mentioned conceptual challenges and the unavailability of gold-standard error-annotated learner corpora hinder progress in this area. Corpora with gold-standard annotation are essential for developing and evaluating current NLP technology, which generally is built using statistical or supervised machine learning components that need to be trained on large, representative gold-standard data sets. Writer's aids for native speakers, such as the standard spell and grammar checkers, may seem like a natural option to fall back on. However, such tools rely on assumptions about typical errors made by native speakers, which are not necessarily applicable to language learners (Flor et al., 2014). For example, Rimrott & Heift (2008) report that 'in contrast to most misspellings by native writers, many L2 misspellings are multiple-edit errors and are thus not corrected by a spell checker designed for native writers.' Examples of such multiple-edit errors include lexical competence errors such as Postkeutzah → Postleitzahl (postal code) or grammatical overgeneralizations as in gegehen → gegangen (went).
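
A small sketch makes the point about multiple-edit errors concrete: a standard character-level edit distance shows that both cited misspellings are several edits away from their targets, beyond the single edit that many spell-checker candidate generators assume.

```python
# Character-level Levenshtein distance, to illustrate why spell checkers
# that only consider single-edit candidates miss many L2 misspellings.

def char_edits(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete
                           cur[j - 1] + 1,             # insert
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]

print(char_edits("Postkeutzah", "Postleitzahl"))  # 3
print(char_edits("gegehen", "gegangen"))          # 3
# A checker that only generates candidates one edit away from the
# misspelling will propose neither target form.
```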

The overall landscape of computational approaches for detecting and diagnosing learner errors can be systematised in terms of the nature of the data that is targeted, from single tokens via local domains to full sentences. Pattern-matching approaches target single tokens or local patterns to identify specific types of errors. Language-licensing approaches attempt to analyse entire learner utterances to diagnose their characteristics. In the following, we provide a conceptual overview, with Chapter 27 spelling the topic out further.

Pattern-matching approaches traditionally employ error patterns explicitly specifying surface forms. For example, an error pattern for English can target occurrences of their immediately preceding is or are to detect learner errors such as (5) from the Chinese Learner English Corpus (CLEC).¹

(5) Their are all kinds of people around us.

Such local error patterns can also be defined in terms of annotations such as parts-of-speech to allow identification of more general patterns. For example, one can target more or less followed by an adjective or adverb, followed by then, an error pattern instantiated by the CLEC learner sentence in (6).

(6) At class, students listen more careful[adj] then any other time.

Error pattern matching is commonly implemented in standard grammar checkers. For example, the open source LanguageTool² provides a general implementation, in which typical learner error patterns can be specified.
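
As a rough illustration (plain regular expressions, not LanguageTool's actual rule language), the two patterns above could be sketched as follows; the token_TAG representation in the second pattern is a hypothetical stand-in for real part-of-speech annotation.

```python
import re

# Surface pattern for errors like (5): 'their' immediately followed by
# a form of 'be'. A sketch only; real grammar checkers express such
# patterns in their own rule formats.
their_be = re.compile(r"\btheir\s+(?:is|are)\b", re.IGNORECASE)
print(bool(their_be.search("Their are all kinds of people around us.")))  # True

# The comparative pattern behind (6) needs annotations, not just surface
# forms: more/less + adjective or adverb + 'then'. Here we match on a
# hypothetical tagged representation 'token_TAG'.
tagged = "students_NNS listen_VBP more_RBR careful_JJ then_RB any_DT"
more_then = re.compile(r"\b(?:more|less)_\w+ \w+_(?:JJ|RB) then_\w+")
print(bool(more_then.search(tagged)))  # True
```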

While directly specifying such error patterns works well for certain clear cases of errors, more advanced pattern matching splits the error identification into two steps. First, a context pattern is defined to identify the contexts in which a particular type of error may arise. Then potentially relevant features are collected, recording all properties which may play a role in distinguishing erroneous from correct usage. A supervised machine learning setup can then be used to learn how to weigh the evidence to accurately diagnose the presence of an error and its type. For example, given that determiner usage is a well-known problem area for learners of English, a context pattern can be used to identify all noun chunks. Properties of the noun and its context can then be used to determine whether a definite, an indefinite, or no determiner is required for this chunk. This general approach is a very common setup for NLP research targeting learner language (e.g., De Felice, 2008; Tetreault & Chodorow, 2008b; Gamon et al., 2009), and it raises important general questions for future work in terms of how much context and which linguistic properties are needed to accurately diagnose which type of errors. Note the interesting connection between these questions and the need to further advance error annotation schemes based on detailed analyses of inter-annotator agreement.
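
The following sketch illustrates this two-step setup for determiner errors under strong simplifying assumptions: the noun chunks are taken as given, and the features and training examples are invented. Real systems such as those cited use far richer feature sets and much more data.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Step 1 (assumed done): a context pattern has identified noun chunks.
# Step 2: each chunk becomes a feature dictionary; the label is the
# determiner the corrected text actually used.
train = [
    ({"head": "sun", "countable": True, "previously_mentioned": True}, "the"),
    ({"head": "book", "countable": True, "previously_mentioned": False}, "a"),
    ({"head": "water", "countable": False, "previously_mentioned": False}, "zero"),
    ({"head": "idea", "countable": True, "previously_mentioned": False}, "a"),
    ({"head": "moon", "countable": True, "previously_mentioned": True}, "the"),
    ({"head": "advice", "countable": False, "previously_mentioned": False}, "zero"),
]
X, y = zip(*train)
vec = DictVectorizer()
clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(X), y)

# At application time, disagreement between the predicted determiner and
# the one the learner wrote is flagged as a candidate error.
chunk = {"head": "sun", "countable": True, "previously_mentioned": True}
print(clf.predict(vec.transform([chunk])))  # e.g. ['the']
```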

¹ Yang Huizhong and Gui Shichun: http://purl.org/icall/clec (last accessed 12.7.2014)
² http://www.languagetool.org (last accessed 12.7.2014)


Language-licensing approaches go beyond characterising local patterns and attempt to analyse complete sentences. These so-called deep NLP approaches are based on fully explicit, formal grammars of the language to be licensed. Grammars essentially are compact representations of the wide range of lexical and syntactic possibilities of a language. To process with such grammars, efficient parsing algorithms are available to license a potentially infinite set of strings based on finite grammars. On the conceptual side, grammars can be expressed in two distinct ways (Johnson, 1994). In a validity-based grammar setup, a grammar is a set of rules. A string is recognised if and only if one can derive that string from the start symbol of the grammar. A grammar without rules licenses no strings, and essentially the more rules are added, the more different types of strings can be licensed. In a satisfiability-based grammar setup, a grammar consists of a set of constraints. A string is grammatical if and only if it satisfies all of the constraints in the grammar. A grammar without constraints thus licenses any string, and the more constraints are added, the fewer types of strings are licensed.

A number of linguistic formalisms have been developed for expressing such grammars, from basic context-free grammars lacking the ability to generalise across categories and rules to the modern lexicalized grammar formalisms for which efficient parsing approaches have been developed, such as Head-Driven Phrase Structure Grammar (HPSG), Lexical-Functional Grammar (LFG), Combinatory Categorial Grammar (CCG) and Tree-Adjoining Grammar (TAG) – cf. the grammar framework overviews in Brown (2006). To use any of these approaches in our context, we need to consider that linguistic theories and grammars are generally designed to license well-formed native language, which raises the question of how they can license learner language (and identify errors as part of the process).

There are essentially two types of approaches for licensing learner language, corresponding to the two types of formal grammars introduced above. In a validity-based setup using a regular parser, so-called mal-rules can be added to the grammar (cf., e.g., Schwind, 1990; Matthews, 1992) to also license and thereby identify ill-formed strings occurring in the learner language. For example, Schwind (1990:575) defines a phrase structure rule licensing German noun phrases in which the adjective follows the noun. This is ungrammatical in German, so the rule is marked as licensing an erroneous structure, i.e., it is a mal-rule.
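
A toy version of such a validity-based grammar with one mal-rule can be written down directly, e.g. with NLTK; the MAL_ prefix marking rules that license erroneous structures is a convention invented here, and the German fragment is deliberately tiny, in the spirit of, but much simpler than, Schwind's rule.

```python
import nltk

# A tiny grammar for German noun phrases, extended with a mal-rule that
# licenses the ungrammatical noun-adjective order.
grammar = nltk.CFG.fromstring("""
NP -> Det AdjP N
NP -> MAL_NP
MAL_NP -> Det N AdjP
AdjP -> Adj
Det -> 'der'
Adj -> 'rote'
N -> 'Ball'
""")
parser = nltk.ChartParser(grammar)

for tokens in (["der", "rote", "Ball"], ["der", "Ball", "rote"]):
    for tree in parser.parse(tokens):
        # If the analysis contains a MAL_ node, an error was diagnosed.
        mal = any(t.label().startswith("MAL_") for t in tree.subtrees())
        print(" ".join(tokens), "-> error" if mal else "-> ok")
```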

A mal-rule approach requires each possible type of learner error to be pre-envisaged and explicitly encoded in every rule in which it may surface. For small grammar fragments, as needed for exercises effectively constraining what the learner is likely to produce (Amaral & Meurers, 2011:9ff), this can be feasible; but for a grammar with any broader coverage of language phenomena and learner error types, writing the mal-rules needed would be very labour-intensive and error-prone.

Some types of errors can arise in a large number of rules; for example, subject-verb agreement errors may need to be accommodated in any rule realising subjects together with a finite verbal projection. Following Weischedel & Sondheimer (1983), meta-rules can be used to express such generalisations over rules. For example, they define a meta-rule that allows subject-verb agreement to be relaxed anywhere, and one allowing articles to be omitted in different types of noun phrases.

Rule-based grammars license the infinite set of possible strings by modularizing the analysis into local trees. A local tree is a tree of depth one, i.e., a mother node and its immediate children. Each local tree in the overall analysis of a sentence is independently licensed by a single rule in the grammar. Thus a mal-rule also licenses a local tree, which in combination with other local trees licensed by other rules ultimately licenses the entire sentence. The mal-rule approach is conceptually simple when the nature of an error can be captured within the local domain of a single rule. For example, a mal-rule licensing the combination of an article and a noun disagreeing in gender (le[masc] table[fem]) can be added to a French grammar. The fact that rules and mal-rules in a grammar interact requires very careful grammar (re)writing to avoid unintended combinations. The situation is complicated further when the domain of an error is larger than a local tree. For example, extending the word orders licensed by a rule S → NP VP by adding the mal-rule S → VP NP makes it possible to license (7a) and (7b).

(7) a. Mary [loves cats].
    b. * [loves cats] Mary.
    c. * loves Mary cats.

The order in (7c), on the other hand, cannot be licensed in this way given that it involves reordering words licensed by two different rules, so that no single mal-rule can do the job – unless one writes an ad-hoc, combined mal-rule for the flattened tree (S → V NP NP), which would require adding such rules for combinations with all other rules licensing VPs (intransitive, ditransitive, etc.) as well. Lexicalized grammar formalisms using richer data structures, such as the typed feature structure representation of signs used in HPSG, make it possible to encode more general types of mal-rules (cf. Heift & Schulze, 2007). Similarly, mildly context-sensitive frameworks such as TAG and CCG provide an extended domain of locality that could in principle be used to express mal-rules encoding errors in those extended domains.

To limit the search space explosion commonly resulting from rule interaction, the use of mal-rules can be limited. One option is to only include the mal-rules in processing when parsing a sentence with the regular grammar fails. However, this only reduces the search space for well-formed strings. If parsing fails, the question of which mal-rules need to be added is not addressed. An intelligent solution to this question was pursued by the ICICLE system (Michaud & McCoy, 2004). It selects groups of rules based on learner modelling. For grammar constructs that the learner has shown mastery of, it uses the native language rule set, but no rules are included for constructs beyond the developmental level of the learner. For structures currently being acquired, both the native rule set and the mal-rules relating to those phenomena are included. In probabilistic grammar formalisms such as probabilistic context-free grammars (PCFGs), or when using LFG grammars with optimality-theoretic mark-up, one can also try to tune the licensing of grammatical and ungrammatical structures to the learner language characteristics (cf. Wagner & Foster, 2009).

The second approach to licensing sentences which go beyond the native language grammars is based on constraint relaxation (Kwasny & Sondheimer, 1981). It is based on a satisfiability-based grammar setup or a rule-based grammar formalism employing complex categories (feature structures, first-order terms) for which the process of combining information (unification) and the enforcement of constraints can be relaxed. Instead of writing complete additional rules as in the mal-rule approach, constraint relaxation makes it possible to eliminate specific requirements of regular rules, thereby admitting additional structures normally excluded. For example, the feature specifications ensuring subject-verb agreement can be eliminated in this way to also license ungrammatical strings.

Relaxation works best when there is a natural one-to-one correspondence between a particular kind of error and a particular specification in the grammar, as in the case of subject-verb agreement errors being directly linked to the person and number specifications of finite verbs and their subject argument. One can also integrate a mechanism corresponding to the meta-rules of the mal-rule setup, in which the specifications of particular features are relaxed for particular sets of rules or constraints, or everywhere in the grammar. In this context, one often finds the claim that constraint relaxation does not require learner errors to be pre-envisaged and therefore should be preferred over a mal-rule approach. Closer inspection makes clear that such a broad claim is incorrect: to effectively parse a sentence with a potentially recursive structure, it is essential to distinguish those constraints which may be relaxed from those that are supposed to be hard, i.e., always enforced. Otherwise either parsing does not terminate, or any learner sentence can be licensed with any structure, so that nothing is gained by parsing.
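
A minimal sketch of the core mechanism, using sets of agreement values in place of real feature structures: unification intersects the values, and a relaxable constraint records a diagnosis instead of failing. The function and its values are illustrative assumptions, not a particular system's implementation.

```python
# Constraint relaxation for subject-verb agreement, sketched with value
# sets. An empty intersection normally blocks the analysis; relaxing the
# constraint lets the analysis go through while recording the violation.

def combine(subj_agr, verb_agr, relaxable=("agreement",)):
    diagnoses = []
    common = subj_agr & verb_agr
    if not common:
        if "agreement" not in relaxable:
            return None, None  # hard constraint: no analysis at all
        diagnoses.append(f"agreement: subject {sorted(subj_agr)} "
                         f"vs verb {sorted(verb_agr)}")
        common = subj_agr | verb_agr  # relaxed: keep going
    return common, diagnoses

# Example (2): 'Du' is 2sg, 'arbeiten' is 1pl, 3pl or infinitive.
agr, errs = combine({"2sg"}, {"1pl", "3pl", "inf"})
print(errs)  # ['agreement: subject [2sg] vs verb [1pl, 3pl, inf]']
```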

Instead of completely eliminating constraints, constraints can also be associated with weights or probabilities with the goal of preferring or enforcing a particular analysis without ruling out ungrammatical sentences. One prominent example is the Weighted Constraint Dependency Grammar (WCDG) approach of Foth et al. (2005). The as yet unsolved question raised by such approaches (and the probabilistic grammar formalisms mentioned above) is how the weights can be obtained in a way that makes it possible to identify the likely error causes underlying a given learner sentence.

The constraint-relaxation research on learner error diagnosis has generally developed hand-crafted formalisms and solutions. At the same time, computer science has studied Constraint Satisfaction Problems (CSP) in general and developed general CSP solvers. In essence, licensing a learner utterance then is dealt with in the same way as solving a Sudoku puzzle or finding a solution to a complex scheduling problem. Boyd (2012) presents an approach that explores this connection and shows how learner error analysis can be compiled into a form that can be handled by general CSP solvers, with the diagnosis of learner errors being handled by general conflict detection approaches.
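
In the same spirit, though far simpler than Boyd's actual compilation, the agreement conflict in example (2) can be phrased as a generic CSP whose unsatisfiable constraints name the diagnosis; the brute-force enumeration below stands in for a real CSP solver.

```python
from itertools import product

# Agreement as a generic constraint satisfaction problem: variables are
# the agreement features of the words, constraints relate them.
domains = {"subj": {"2sg"}, "verb": {"1pl", "3pl", "inf"}}
constraints = [("subj = verb", lambda a: a["subj"] == a["verb"])]

solutions, conflicts = [], set()
for values in product(*domains.values()):
    assignment = dict(zip(domains, values))
    failed = [name for name, ok in constraints if not ok(assignment)]
    if failed:
        conflicts.update(failed)
    else:
        solutions.append(assignment)

# No solution exists: the conflicting constraint names the likely error.
print(solutions, conflicts)  # [] {'subj = verb'}
```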

Finally, while most approaches licensing learner language use standard parsing algorithms with extended or modified grammars to license ill-formed sentences, there also is some work modifying the algorithmic side instead. For instance, Reuer (2003) combines a constraint relaxation technique with a parsing algorithm modified to license strings in which words have been inserted or omitted, an idea which in essence moves generalisations over rules in the spirit of meta-rules into the parsing algorithm.

Let us conclude this discussion with a note on evaluation. Just like the analysis of inter-annotator agreement is an important evaluation criterion for the viability of the distinctions made by an error annotation scheme, the meaningful evaluation of grammatical error detection approaches is an important and underresearched area. One trend in this domain is to avoid the problem of gold-standard error annotation as reference for testing by artificially introducing errors into native corpora (e.g., Foster, 2005). While this may be a good choice to monitor progress during development, such artificially created test sets naturally only reflect the properties of learner data in a very limited sense and do not eliminate the need to ultimately evaluate on authentic learner data with gold-standard annotation. A good overview of the range of issues behind the difficulty of evaluating grammatical error detection systems is provided in Chodorow et al. (2012).

3 Case studies

The following two case studies take a closer look at two representative approaches spelling out some of the general issues introduced above. The first case study focuses on a state-of-the-art learner corpus, for which detailed information on the integration of manual and automatic analysis as well as detailed inter-annotator agreement information is available. The second case study provides a concrete example of error detection in the domain of word order errors, a frequent but underresearched type of error that also allows us to exemplify how the nature of the phenomenon determines the choice of NLP analysis used for error detection.

3.1 Rosen et al. (2013): Manual, semi-automatic and automatic annotation of a learner corpus

To showcase the key components of a state-of-the-art learner corpus annotation project integrating insights and tools from NLP, we take a look at the Czech as Second Language (CzeSL) corpus based on Rosen et al. (2013). The corpus consists of 2.64 million words with written and transcribed spoken components, produced by foreign language learners of Czech at all levels of proficiency and by Roma acquiring Czech as a second language. So far, a sample of 370,000 words from the written portion has been manually annotated.

The corpus is encoded in a multi-tier representation. Tier 0 encodes the learner text as such, tier 1 encodes a first target hypothesis in which all non-words are corrected, and tier 2 is a target hypothesis in which syntax, word order and a few aspects of style are corrected. The differences between tiers 0 and 1 and between tiers 1 and 2 can be annotated with error tags. Depending on the nature of the error, the annotations link individual tokens across two tiers, or they can scope over multiple tokens, including discontinuous units. Figure 1 (Rosen et al., 2013) exemplifies the CzeSL multi-tier representation.

Figure 1: An example for the multi-tier representation of the CzeSL corpus

Tier 0 at the top is the sentence as written by the learner. This learner sentence includes several non-words, of which three require changes to the inflection or stem, and one requires two tokens to be merged into a single word. Tier 2, shown at the bottom of the figure, further corrects an agreement error to obtain the target hypothesis, of which a glossed version is shown in (8).

(8) Myslím,    že    kdybych  byl       se    svým  dítetem,
    think.SG1  that  if.SG1   was.MASC  with  my    child
    'I think that if I were with my child, ...'

The errors in individual word forms treated at tier 1 include misspellings, misplaced words, inflectional and derivational morphology, incorrect word stems and invented or foreign words. The tier thus is closely related to the minimal form change target hypothesis we discussed in Section 2.1, but focuses exclusively on obtaining well-formed individual words rather than full syntactic forms. Such a dedicated tier for individual word forms is well-motivated considering the complex morphology of Czech. The target form encoded in tier 2 addresses errors in agreement, valency, analytical forms, word order, pronominal reference, negative concord, the choice of tense, aspect, lexical item or idiom. The manual annotation process is supported by the annotation tool feat³ developed for this purpose.

The annotation started with a pilot annotation of 67 texts totalling almost 10,000 tokens. Fourteen annotators were split into two groups, with each group annotating the sample independently. The Inter-Annotator Agreement (IAA) was computed using the standard Cohen's kappa metric (cf. Artstein & Poesio, 2008). Since tiers 1 and 2 can differ between annotators, error tags are projected onto tier 0 tokens for computing the IAA. The feedback from the pilot was used to improve the annotation manual and the training of the annotators, and to modify the error taxonomy of the annotation scheme in a few cases. The annotation was then continued by 31 annotators who analysed 1,396 texts totalling 175,234 words.
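
For readers unfamiliar with the metric, here is a minimal sketch of Cohen's kappa computed over error tags projected onto shared tier 0 tokens; the toy tag sequences are invented, not CzeSL data.

```python
from collections import Counter

def cohens_kappa(ann1, ann2):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    assert len(ann1) == len(ann2)
    n = len(ann1)
    p_o = sum(a == b for a, b in zip(ann1, ann2)) / n
    c1, c2 = Counter(ann1), Counter(ann2)
    p_e = sum((c1[l] / n) * (c2[l] / n) for l in set(c1) | set(c2))
    return (p_o - p_e) / (1 - p_e)

# Two annotators' error tags on the same eight tier 0 tokens:
a1 = ["incorInfl", "ok", "ok", "agr", "ok", "lex", "ok", "agr"]
a2 = ["incorInfl", "ok", "ok", "agr", "lex", "lex", "ok", "dep"]
print(round(cohens_kappa(a1, a2), 2))  # 0.66
```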

Both for the pilot and for the second annotation phase, a detailed quantitative and qualitative discussion of IAA results and confusion matrices is provided in Rosen et al. (2013); one of the IAA tables is shown in Figure 2.

Tag        Type of error                                   Pilot sample     All annotated texts
                                                           κ      Tags      κ      Tags
incor*     incorBase/incorInfl                             0.84   1,038     0.88   14,380
incorBase  Incorrect stem                                  0.75     723     0.82   10,780
incorInfl  Incorrect inflection                            0.61     398     0.71    4,679
wbd*       wbdPre/wbdOther/wbdComp                         0.21      37     0.56      840
wbdPre     Incorrect word boundary (prefix/preposition)    0.18      11     0.75      484
wbdOther   Incorrect word boundary                         –          0     0.69      842
wbdComp    Incorrect word boundary (compound)              0.15      13     0.22       58
fw*        fw/fwFab/fwNc                                   0.47      38     0.36      423
fwNc       Foreign/unidentified form                       0.24      12     0.30      298
fwFab      Made-up/unidentified form                       0.14      20     0.09      125
stylColl   Colloquial style at T1                          0.25       8     0.44    1,396
agr        Agreement violation                             0.54     199     0.69    2,622
dep        Syntactic dependency errors                     0.44     194     0.58    3,064
rflx       Incorrect reflexive expression                  0.26      11     0.42      141
lex        Lexical or phraseology error                    0.37     189     0.32    1,815
neg        Incorrectly expressed negation                  0.48      10     0.23       48
ref        Pronominal reference error                      0.16      18     0.16      115
sec        Secondary (consequent) error                    0.12      33     0.26      415
stylColl   Colloquial style at T2                          0.42      24     0.39      633
use        Tense, aspect etc. error                        0.22      84     0.39      696
vbx        Complex verb form error                         0.13      15     0.17      233

Figure 2: Inter-annotator agreement for selected CzeSL error tags

³ http://purl.org/net/feat (last accessed 12.7.2014)


We see that the annotators showed good agreement at tier 1 for incorrect morphology (incor*: κ > 0.8) and improper word boundaries (wbd*: κ > 0.6), and also for agreement errors (agr: κ > 0.6) and syntactic dependency errors (dep: κ = 0.58). Such errors can thus be reliably annotated given the explicit, form-based target hypothesis. On the other hand, pronominal reference (ref), secondary (follow-up) errors (sec), errors in analytical verb forms/complex predicates (vbx) and negation (neg) show a very low IAA level, as do the tags for usage and lexical errors (κ < 0.4).

The authors also conducted a detailed analysis of the relation between target hypotheses and error annotation agreement. They show that whether or not the annotators agreed on a target hypothesis strongly influences the IAA of the error annotation. For example, annotators agreed on agreement errors with a κ of 0.82 when their tier 1 target hypotheses agreed, but only with 0.24 when their target hypotheses differed. The study thus provides clear empirical support for the argument made in Section 2.1 that explicit target hypotheses are essential for reliable corpus annotation.

The manual annotation process is integrated with automatic tools in three ways in the CzeSL project. First, the availability of target hypotheses in the CzeSL corpus makes it possible to systematically assign linguistic analyses. The sentences at tier 2 of CzeSL in principle satisfy the grammatical regularities of native Czech, for which a range of NLP tools have been developed. The authors thus obtain morphosyntactic categories and lemmas for the tier 2 target hypotheses by applying standard Czech taggers and lemmatisers. These annotations can then be projected back onto tier 1 and the original learner forms.
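The projection step can be pictured as follows. This is a minimal sketch assuming simple one-to-one alignment links and hypothetical tagger output; it abstracts away from the discontinuous, many-to-many links the corpus actually supports.

```python
def project_tags(upper_analyses, alignment):
    """Project per-token (POS, lemma) analyses from a higher tier to a lower one.

    upper_analyses: list of (POS, lemma) pairs for the higher tier
    alignment:      dict mapping lower-tier index -> higher-tier index
    """
    return {low: upper_analyses[up] for low, up in alignment.items()}

tier2_analyses = [("V", "myslet"), ("J", "že")]   # tagger output on tier 2 (made-up tags)
alignment = {0: 0, 1: 1}                          # tier 0 token -> tier 2 token
print(project_tags(tier2_analyses, alignment))
```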

Second, some manually assigned error tags can automatically be enriched with further information. For example, for an incorrect base form or inflection that was manually annotated, an automatic process can diagnose the particular nature of the divergence and, e.g., tag it as an incorrect devoicing, a missing palatalisation, or an error in capitalisation.
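An enrichment step of this kind could compare the learner form with its tier 1 correction, along the lines of the sketch below; the heuristics and the word pair are deliberately crude illustrations, not the diagnostics actually used for CzeSL.

```python
# Voiced consonant -> voiceless counterpart (a fragment, for illustration only).
DEVOICED = {"b": "p", "d": "t", "z": "s", "v": "f", "g": "k"}

def diagnose(learner_form, corrected_form):
    """Guess the nature of a form-level divergence (crude illustrative heuristics)."""
    if learner_form.lower() == corrected_form.lower():
        return "capitalisation"
    if (learner_form[:-1] == corrected_form[:-1]
            and learner_form[-1:] == DEVOICED.get(corrected_form[-1:], "")):
        return "devoicing"
    return "other"

print(diagnose("praha", "Praha"))   # capitalisation
print(diagnose("hrat", "hrad"))     # devoicing: final d spelled as t
```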

Third, automatic analysis is used to check the manual annotation. For example, the coverage of the automatic Czech morphological analyser used is very high. Learner forms at tier 0 that cannot be analysed by the morphological analyser thus are very likely to be ill-formed. By flagging all such forms for which no corrections have been specified at tier 1, one can systematically alert the annotators to potentially missing specifications.
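In outline, such a consistency check could look as follows, where the analysable function stands in for a call to a wide-coverage morphological analyser and is purely an assumption of this sketch.

```python
def flag_missing_corrections(tier0, corrected_indices, analysable):
    """Tier 0 positions that look ill-formed but received no tier 1 correction."""
    return [i for i, form in enumerate(tier0)
            if not analysable(form) and i not in corrected_indices]

lexicon = {"myslím", "že", "byl"}                  # stand-in for a real analyser
flags = flag_missing_corrections(
    ["myslím", "že", "kdybyh", "byl"],             # "kdybyh" is a non-word
    corrected_indices=set(),                       # the annotator corrected nothing
    analysable=lambda form: form in lexicon)
print(flags)                                       # [2]
```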

In addition to these three forms of integration of manual and automatic analysis, the authors also explored fully automatic annotation. They use tools originally developed for native Czech: the spell and grammar checker Korektor and two types of part-of-speech taggers. For the spell and grammar checker, the authors compare the auto-correct mode of Korektor with the two target hypotheses spelled out in the CzeSL corpus. The comparison is based on a subset of almost 10,000 tokens from the pilot set of annotated texts. Since Korektor integrates non-word spell checking with some grammar checking using limited context, the nature of the tool's output aligns neither with the purely lexical tier 1 of CzeSL nor with tier 2, which integrates everything from syntax and word order to style. Still, the authors report a precision of 74% and a recall of 71% against agreed-upon tier 1 annotations. For tier 2, the precision drops to 60% and recall to 45%. Overall, the authors interpret the results as sufficiently high to justify integrating the spell checker into the annotation workflow as an additional reference for the annotators.
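For readers who want the evaluation measures spelled out: precision is the proportion of the tool's corrections that match the target hypothesis, and recall the proportion of target hypothesis corrections that the tool found. A minimal token-indexed sketch, with invented forms:

```python
def precision_recall(system_changes, gold_changes):
    """Precision/recall of corrections given as sets of (token index, form) pairs."""
    hits = len(system_changes & gold_changes)
    precision = hits / len(system_changes) if system_changes else 0.0
    recall = hits / len(gold_changes) if gold_changes else 0.0
    return precision, recall

system = {(2, "kdybych"), (5, "dítetem")}              # tool's corrections
gold = {(2, "kdybych"), (5, "dítetem"), (7, "svým")}   # target hypothesis changes
print(precision_recall(system, gold))                  # (1.0, 0.666...)
```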

For the part-of-speech taggers, they used two approaches based on different concepts. Marce (Votrubec, 2006) prioritises morphological diagnostics and information from a lexicon over distributional context, whereas the trigram tagger TnT (Brants, 2000) essentially uses the opposite strategy, extracting the lexicon from the training data. The authors show that the different strategies indeed lead to significantly different results: the two tagging approaches agreed on the same tag for only 28.8% of the ill-formed tokens in a corpus sample of 12,681 tokens. Considered in the context of Section 2.2.1, which argued for the need to linguistically annotate learner language close to the observable evidence, this evaluation of standard part-of-speech tagging tools underlines the relevance of a tripartite encoding of parts-of-speech for learner language, distinguishing morphological, distributional and lexical stem information.
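Such a tripartite encoding can be as simple as recording three possibly conflicting POS values per token, as in the following illustration; the field names and tag values are our own, and the example is the classic mismatch of a verb stem used in a noun slot.

```python
from typing import NamedTuple

class TripartitePOS(NamedTuple):
    """Three possibly diverging POS perspectives on one learner token."""
    morphological: str    # what the form's morphology signals
    distributional: str   # what the syntactic context signals
    lexical_stem: str     # what the lexicon entry of the stem signals

# Learner English "... for the choose of my career": noun slot, verb stem.
token = TripartitePOS(morphological="V", distributional="N", lexical_stem="V")
print(token)
```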

3.2 Metcalf and Meurers (2006): Word order errors and their different NLP requirements

The second case study is intended to illustrate the issues involved in answering the question of which NLP techniques are needed for which kind of learner language analysis. The original goal of Metcalf & Meurers (2006) was to automatically identify word order errors in different types of exercises in an ICALL context.

Language learners are known to produce a range of word order errors: such errors are frequent, and word order differs significantly across languages, so that transfer errors play a role in this context (cf., e.g., Odlin, 1989). It is important for learners to master word order, also since errors in this domain can significantly complicate comprehension. This is exemplified by (9a) from the Hiroshima English Learners' Corpus (HELC, Miura, 1998), which is virtually incomprehensible, whereas the rearranged word order (9b) is already quite close to the target (9c) of this translation activity.

(9)  a. He get to cleaned his son.
     b. He get his son to cleaned.
     c. He got his son to clean the room.

Word order errors are not uniform. For some, the analysis can rely on lexical triggers (one of a finite set of words is known to occur) or indicative patterns such as characteristic sequences of parts-of-speech, whereas for others a deeper linguistic analysis is required. Correspondingly, there essentially are two types of approaches for automatically identifying word order errors: on the one hand, an instance-based, shallow list-and-match approach, and on the other, a grammar-based, deep analysis approach. The issue can be exemplified using two aspects of English grammar with characteristic word order properties: phrasal verbs and adverbs.

For separable phrasal verbs, particles can precede or follow a full NP object (10), but they must follow a pronominal object (11). For inseparable phrasal verbs (also called prepositional verbs), particles always precede the object (12).

(10) a. wrote down the number
     b. wrote the number down

(11) a. * wrote down it
     b. wrote it down

(12) a. ran into {my neighbour, her}
     b. * ran {my neighbour, her} into

Authentic learner sentences instantiating these error patterns are shown in (13), taken from the Chinese Learner English Corpus (CLEC).


(13) a. * so they give up it
     b. * food which will build up him
     c. * rather than speed up it.
     d. * to pick up them.

Complementing such erroneous realisations, we also found patterns of avoidance in the CLEC, such as heavy use of a pattern that is always grammatical (verb < particle < NP), but little use of patterns restricted to certain verb and object types (e.g., verb < pronoun < particle). Related avoidance patterns are discussed in Liao & Fukuya (2002).

The relevant sets of particle verbs and their particles can readily be identified by their surface forms or by introducing a part-of-speech annotation including sufficiently detailed classes for the specific subclasses of particle verbs. As a result, error patterns such as the ones in (14) and (15) (and the alternative avoidance patterns mentioned above) can be identified automatically.

(14) * wrote down it                separable-phrasal-verb < particle < pronoun

(15) a. * ran my neighbour into     inseparable-phrasal-verb < NP/pronoun < particle
     b. * ran her into

Precedence (<) here can be specified using a regular expression allowing any number of words in between. We are only interested in matches within the same sentence, so that a sentence-segmented corpus (Palmer, 2000) is required. In terms of the general landscape of linguistic patterns that can be searched for in a corpus (Meurers, 2005), we are here dealing with regular expression patterns over words and part-of-speech tags within basic domains.
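As a minimal sketch of such a pattern, consider the following; the mini-lexicons, the word/TAG token encoding and the tag names are illustrative assumptions, not the actual setup of Metcalf & Meurers (2006).

```python
import re

# Illustrative mini-lexicons; a real system would draw on a particle-verb lexicon.
SEPARABLE = r"(?:give|pick|build|speed|write)\w*"
PARTICLE = r"(?:up|down)"
PRONOUN = r"(?:it|them|him|her|me|us)"
GAP = r"(?:\S+/\S+\s+)*?"   # precedence (<): any number of intervening tokens

# Pattern (14): separable-phrasal-verb < particle < pronoun
pattern_14 = re.compile(
    rf"\b{SEPARABLE}/VB\w*\s+{GAP}{PARTICLE}/RP\s+{GAP}{PRONOUN}/PRP\b")

tagged = "so/RB they/PRP give/VBP up/RP it/PRP"   # POS-tagged learner sentence
print(bool(pattern_14.search(tagged)))            # True: error pattern (14) found
```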

The strength of such a shallow pattern matching approach is its simplicity and efficiency. The weakness is its lack of generalisation over tokens and patterns: the words or parts-of-speech for which order is to be checked must be known, and all relevant word orders must be pre-envisaged and listed. As such, the approach works well for learner language patterns which are heavily restricted either by the targeted error pattern or through the activities in which the learner language arises, e.g., when analysing learner language from restricted exercises such as "Build a Sentence" or "Translation" in German Tutor (Heift, 2001).

The placement of adverbs in English illustrates an error type for which such a shallow pattern approach is inadequate. English includes a range of different types of adverbs, and the word order possibilities depend on various adverb subclass distinctions. For language learners, the rules governing adverb placement are difficult to notice and master, partly also because many adverb placements are not right or wrong, but more or less natural. As a result, students frequently misplace adverbs, as illustrated by the following examples from the Polish part of the International Corpus of Learner English (PICLE).⁴

(16) a. they cannot already live without the dope.
     b. There have been already several campaigns held by 'Outdoor'.
     c. while any covert action brings rarely such negative connotations.
     d. It seems that the Earth has still a lot to reveal

⁴ http://purl.org/net/PICLE


To detect such errors, shallow pattern matching is inadequate. Many placements throughout a sentence are possible, and the possible error patterns are in principle predictable but very numerous. A compact characterisation of the possible word orders and the corresponding error patterns requires reference to subclasses of adverbs and to syntactic structure. A deep, grammar-based parsing approach can identify the necessary sentence structure, and the lexicon of the grammar can directly encode the relevant adverb subclasses.

Using a language licensing NLP approach (see Section 2.2.2), mal-rules can be added to a grammar so that a parser can license the additional word orders. A downside of such an approach arises from the fact that phrase structure grammars express two things at once at the level of the local syntactic tree: first, the generative potential (i.e., the combinatorics of which language items must co-occur with which other items), and second, the word order regularities. Given that the word order possibilities are directly tied to the combinatorics, licensing more word orders significantly increases the size of the grammar and therefore the search space of the parser. In a lexicalised, constraint-based formalism such as HPSG, the position of an adverb can instead be constrained and recorded using a lexical principle governing the realisation of the entire head domain rather than a local tree. Similarly, a dedicated level recording the topological sentence structure may be used to modularise word order.
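To make the mal-rule idea concrete, the toy sketch below adds a single mal-rule to a context-free grammar fragment and reports an error whenever a parse uses it; the grammar, the VP_MAL label and the diagnosis logic are our own illustration of the general technique, far removed from a realistic grammar.

```python
import nltk

# Toy grammar: the VP_MAL rule licenses the erroneous order V < ADV < NP
# (cf. learner example "the Earth has still a lot").
grammar = nltk.CFG.fromstring("""
    S -> NP VP
    VP -> ADV V NP | V NP ADV | VP_MAL
    VP_MAL -> V ADV NP
    NP -> Det N
    V -> 'has'
    ADV -> 'still'
    Det -> 'the' | 'a'
    N -> 'Earth' | 'lot'
""")
parser = nltk.ChartParser(grammar)

tokens = "the Earth has still a lot".split()
for tree in parser.parse(tokens):
    # A parse that can only be licensed via the mal-rule diagnoses the error.
    if any(subtree.label() == "VP_MAL" for subtree in tree.subtrees()):
        print("word order error: misplaced adverb")
```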

Summing up this second case study on the detection of word order errors, instance-based matching is the right approach when lexical material and erroneous placements are predictable and listable and there is limited grammatical variation. Deep processing, on the other hand, is preferable when the possible correct answers are predictable but not (conveniently) listable for a given activity, or when the predictable erroneous placements occur throughout a recursively built structure. In such a context, lexicalisation or separate topological representations can be an attractive, modular alternative to phrase structure based encodings.

4 Critical assessment and future directions

Wrapping up the chapter with a critical analysis of the role and future directions of NLP in the context of learner corpus research, we start by identifying the clear opportunity NLP provides for connecting learner corpus and SLA research, before turning to trends and emerging applications relating to the intersection of NLP and learner language.

4.1 NLP supporting the use of learner corpora for SLA research

Learner corpus research in the second language learning domain was heavily influenced by FLTL concerns, with limited connection to more theoretical SLA issues. One indication of this disconnect is the emphasis on learner errors in much learner corpus research, which runs counter to the characterisation of learner language as a systematic interlanguage to be studied in its own right, the established basis of SLA research over the past decades. With the corpus collection and representation methods largely established, learner corpus research in the future is well placed to connect to and make important contributions to the SLA mainstream. In support of this perspective, NLP methods analysing and annotating learner data are essential for efficiently identifying the subsets of data of relevance to specific SLA research questions.


4.1.1 Facilitating the analysis of learner language in a broad range of tasks

To be able to analyse a broader, representative range of authentic learner language use, we expect that learner corpora in the future will be designed to record learner language in a broader range of tasks. This seems crucially needed to obtain learner corpora with sufficient representation of the language constructs at issue in current SLA research. Considering where such task-based learner data can be obtained, CALL systems will play an increasingly important role – especially where the integration of NLP supports a broader range of tasks in ICALL systems, involving both form and meaning. This includes dialogue systems (Petersen, 2010; Wilske, 2014), where the collection of learner data becomes directly relevant to SLA research on the effectiveness of interventions and feedback.

To facilitate the interpretation and annotation of learner language across tasks, it is necessary to make explicit and take into account the degree of well-formed and ill-formed variability that is supported by different tasks. Quixal (2012) provides a detailed analysis framework incorporating insights from task-based learning and CALL research. He uses this framework to analyse and specify the capabilities of the NLP tools that are needed to process the learner language arising in those tasks (and, as the practical goal of Quixal (2012), the capabilities needed for a CALL authoring system designed to give teachers the ability to assign feedback to specific sets of learner productions).

The relevance of analysing learner language produced in a range of explicitly controlled tasks strengthens the link between learner corpora and ICALL. In essence, the connection is threefold: i) ICALL systems and the automatic annotation of learner corpora both need NLP for the analysis of learner language, ii) CALL environments facilitate the collection of large, task-based learner corpora, and iii) ICALL systems support the experimental testing of SLA hypotheses.

While more research is needed, results obtained in such ICALL experiments have been shown to generalise. Consider Petersen (2010), who studies the effectiveness of recasts providing implicit negative feedback in the domain of English question formation. He shows that recasts in the human-computer setting using the dialogue system are as effective as the same kind of feedback delivered in dyadic, human-human interaction. Presson et al. (2013) coin the term experimental CALL (eCALL), highlighting the new opportunities arising from linking CALL environments and SLA research. Very large learner corpora collected as part of CALL environments, such as EFCAMDAT (Geerzen et al., 2013), already make it possible to investigate important SLA issues with sufficient statistical power, including both cross-sectional and longitudinal study designs. A good example is the detailed study by Murakami (2013), who raises serious doubts about the conclusion of the so-called morpheme studies, a hallmark of SLA research claiming that the order of acquisition essentially is independent of the native language of the learner.

With the advent of a wider range of meaning-based tasks in learner corpora, when the broader question becomes how a learner uses the linguistic system to express a given meaning or function, the analysis of meaning will become an increasingly important aspect of the analysis of learner language – an area where NLP techniques are increasingly successful (Dzikovska et al., 2013). Tasks grounded in meaning are also attractive in making it possible to analyse the choices learners make in the linguistic system to realise the same meaning, i.e., to study the use of forms under a variationist perspective (Tagliamonte, 2011), connecting linguistic and extra-linguistic variables. Corpora integrating a range of tasks may even require such analysis in order to make it possible to distinguish between those language aspects determined by the task and those which are general characteristics of the learner language and language development.

4.2 Deeper linguistic modelling in NLP and learner language analysis

In terms of the nature of the NLP resources and algorithms employed for the analysis of learner language, where surface-based and where deeper linguistic analysis is needed arguably remains the most important modelling question. For example, as argued in Meurers et al. (2014), a task such as Native Language Identification (see also Chapter 29) provides an exciting experimental sandbox for quantitatively evaluating where a surface-based NLP analysis is successful and sufficiently general to transfer to previously unseen data, new topics and genres, and where deeper abstractions are needed to capture aspects of the linguistic system which are not directly dependent on the topic or genre.

The question of which aspects of linguistic modelling are relevant for which type of application and analysis can also be seen to be gaining ground in the overall NLP and education domain. For example, automatic essay scoring is an application traditionally relying on surface-based comparisons (e.g., Latent Semantic Analysis, relying solely on which words occur in a text) with essays for the same prompt for which gold-standard labels were manually assigned. The recent work on proficiency classification (Pendar & Chapelle, 2008; Yannakoudakis et al., 2011; Vajjala & Lõo, 2013; Hancke & Meurers, 2013), on the other hand, studies lexical, syntactic, semantic and discourse aspects of the linguistic system, with the goal of learning something general about the complexity of language. The trend towards deeper linguistic modelling is particularly strong where only little language material is available. When analysing short answers to reading comprehension questions consisting of only a couple of sentences, the words occurring in the answer are by themselves not sufficiently informative. For the C-Rater system (Leacock & Chodorow, 2003) dealing with such data, very specific grading information is provided by the item designers for each item. Where no such additional, item-specific information is available, approaches must make the most of the limited amount of language in short answers. This amounts to employing a wide range of linguistic analyses for automatic meaning assessment, as exemplified by the range of approaches to the recent shared task on analysing student responses (Dzikovska et al., 2013).

While most of the initial NLP work on learner corpus annotation and error detection focused on English, we are starting to see some NLP-based work on learner corpora for other languages, such as Korean (Israel et al., 2013), German (Ott & Ziai, 2010; Boyd, 2012; Hirschmann et al., 2013), Estonian (Vajjala & Lõo, 2013) or Czech (Rosen et al., 2013). The increased NLP interest in under-resourced and endangered languages is likely to support further growth. Corpora with a broad, representative coverage of target (and native) languages are essential for supporting general claims about SLA.

As a final point emphasising once more the relevance of interdisciplinary collaboration in this domain, there currently is a surprising lack of interaction between NLP research related to first language acquisition and that on second language acquisition. Many of the representation, modelling and algorithmic issues are the same or closely related, both in conceptual terms and in practical terms of jointly developing and using the same tools. There is some precedent, such as the use of CHILDES tools for SLA research (Myles & Mitchell, 2004), but significantly more opportunity for synergy in future work.


5 Key readings

Daniel Jurafsky and James H. Martin. 2009. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition, Second edition. Upper Saddle River, NJ: Prentice Hall.

The book provides an accessible introduction to Natural Language Processing. It covers the key issues, representations and algorithms from both the theory-driven and the data-driven, statistical perspectives. It also includes a discussion of current NLP applications.

Markus Dickinson, Chris Brew and Detmar Meurers. 2013. Language and Computers. Wiley-Blackwell.

This is an introduction for readers interested in exploring the issues of computational linguistics starting from real-life applications, from search engines via dialogue systems to machine translation. It includes a chapter on Language Tutoring Systems which emphasises the motivation of the linguistic modelling involved in learner language analysis and the corresponding NLP techniques.

Trude Heift and Mathias Schulze. 2007. Errors and Intelligence in Computer-Assisted Language Learning: Parsers and Pedagogues. Routledge.

The authors provide a very comprehensive historical perspective on the field of ICALL, situated at the intersection of CALL and NLP. NLP analysis specifically targeting learner language was for the most part developed in this context; an understanding of the needs of CALL and of how NLP has tried to address them is thus of direct relevance to any NLP analysis targeting learner language. Readers interested in zooming out to NLP in the context of language learning in general can find this discussion in Meurers (2012).

Claudia Leacock, Martin Chodorow, Michael Gamon and Joel Tetreault. 2014. Automated Grammatical Error Detection for Language Learners, Second edition. Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers.

Starting with a characterisation of the grammatical constructions that second language learners of English find most difficult to master, the book introduces the techniques used to automatically detect errors in learner writing. It discusses the research issues and approaches, how to report results to ensure sustained progress in the field, and the use of annotated learner corpora in that context.

Ana Díaz-Negrillo, Nicolas Ballier and Paul Thompson (eds). 2013. Automatic Treatment and Analysis of Learner Corpus Data. Studies in Corpus Linguistics 59. John Benjamins.

The book provides a broad collection of current perspectives on the automatic analysis of learner corpora. It discusses corpus representation aspects and conceptual and methodological issues, and presents concrete studies based on a range of different written and spoken language data.

Detmar Meurers. 2005. On the use of electronic corpora for theoretical linguistics. Lingua 115 (11), 1619–1639. http://purl.org/dm/papers/meurers-03.html (last accessed 12.7.2014)


What exactly is involved in using corpora to address linguistic research issues? The article discusses how the linguistic terminology used in formulating research questions can be translated to the annotation found in a corpus, and the conceptual and methodological issues arising in this context. Readers who are particularly interested in syntactically annotated corpora can continue with the discussion in Meurers & Müller (2009), where such treebanks are the main source of data used.

Ron Artstein and Massimo Poesio. 2008. Survey Article: Inter-Coder Agreement for Computational Linguistics. Computational Linguistics 34 (4), 555–596. Extended version: http://cswww.essex.ac.uk/Research/nle/arrau (last accessed 12.7.2014)

Corpus annotation is only useful for scientific work if it provides a reliable, replicable index into the data. Investigating the inter-coder agreement between multiple independent annotators is the most important method for determining whether the classes operationalised and documented in the annotation scheme can reliably be distinguished solely based on the information that is available in the corpus and its meta-information. The article provides the essential methodological background for investigating inter-coder agreement, which in the corpus context typically is referred to as inter-annotator agreement.

Acknowledgements

I would like to thank Heike Zinsmeister, the anonymous reviewers and the editors for the helpful feedback they provided, and the editors for their friendly patience with deadline-challenged authors.


References

Aldabe, I. (2011). Automatic exercise generation based on corpora and natural language processing techniques. Ph.D. thesis, Euskal Herriko Unibertsitatea (Universidad del País Vasco). URL http://purl.org/net/Aldabe-11.pdf (last accessed 12.7.2014).

Alexopoulou, T., H. Yannakoudakis & T. Briscoe (2011). From discriminative features to learner grammars: a data driven approach to learner corpora. Second Language Research Forum, Maryland. URL http://purl.org/net/Alexopoulou.ea-11.pdf (last accessed 12.7.2014).

Amaral, L. & D. Meurers (2009). Little Things With Big Effects: On the Identification and Interpretation of Tokens for Error Diagnosis in ICALL. CALICO Journal 26(3), 580–591. URL http://purl.org/dm/papers/amaral-meurers-09.html (last accessed 12.7.2014).

Amaral, L. & D. Meurers (2011). On Using Intelligent Computer-Assisted Language Learning in Real-Life Foreign Language Teaching and Learning. ReCALL 23(1), 4–24. URL http://purl.org/dm/papers/amaral-meurers-10.html (last accessed 12.7.2014).

Artstein, R. & M. Poesio (2008). Survey Article: Inter-Coder Agreement for Computational Linguistics. Computational Linguistics 34(4), 555–596. URL http://aclweb.org/anthology/J08-4004.pdf (last accessed 12.7.2014).

Bachman, L. F. & A. S. Palmer (1996). Language Testing in Practice: Designing and Developing Useful Language Tests. Oxford University Press.

Bailey, S. (2008). Content Assessment in Intelligent Computer-Aided Language Learning: Meaning Error Diagnosis for English as a Second Language. Ph.D. thesis, The Ohio State University. URL http://purl.org/net/Bailey-08.pdf (last accessed 12.7.2014).

Bley-Vroman, R. (1983). The comparative fallacy in interlanguage studies: The case of systematicity. Language Learning 33(1), 1–17.

Böhmová, A., J. Hajič, E. Hajičová & B. Hladká (2003). The Prague Dependency Treebank. In A. Abeillé (ed.), Treebanks: Building and Using Parsed Corpora, Dordrecht: Kluwer, chap. 7, pp. 103–127.

Boyd, A. (2012). Detecting and Diagnosing Grammatical Errors for Beginning Learners of German: From Learner Corpus Annotation to Constraint Satisfaction Problems. Ph.D. thesis, The Ohio State University. URL http://purl.org/net/Boyd-12.pdf (last accessed 12.7.2014).

Boyd, A., M. Dickinson & D. Meurers (2008). On Detecting Errors in Dependency Treebanks. Research on Language and Computation 6(2), 113–137. URL http://purl.org/dm/papers/boyd-et-al-08.html (last accessed 12.7.2014).

Brants, T. (2000). TnT – A Statistical Part-of-Speech Tagger. In Proceedings of the Sixth Conference on Applied Natural Language Processing, pp. 224–231. URL http://aclweb.org/anthology/A00-1031 (last accessed 12.7.2014).

Brooke, J. & G. Hirst (2013). Native Language Detection with 'Cheap' Learner Corpora. In S. Granger, G. Gilquin & F. Meunier (eds.), Twenty Years of Learner Corpus Research. Looking Back, Moving Ahead. Proceedings of the First Learner Corpus Research Conference (LCR 2011). Louvain-la-Neuve, Belgium: Presses universitaires de Louvain, vol. 1, pp. 37–47.

Brown, J. & M. Eskenazi (2005). Student, text and curriculum modeling for reader-specific document retrieval. In Proceedings of the IASTED International Conference on Human-Computer Interaction. Phoenix, Arizona. URL http://purl.org/net/Brown.Eskenazi-05.pdf (last accessed 12.7.2014).

Brown, K. (ed.) (2006). Encyclopedia of Language and Linguistics. Oxford: Elsevier, 2nd ed.

Canale, M. & M. Swain (1980). Theoretical Bases of Communicative Approaches to Second Language Teaching and Testing. Applied Linguistics 1, 1–47. URL http://applij.oxfordjournals.org/cgi/reprint/I/1/1.pdf (last accessed 12.7.2014).

Chapelle, C. A. (ed.) (2012). Encyclopedia of Applied Linguistics. Oxford: Wiley.

Cheung, J. C. K. & G. Penn (2009). Topological field parsing of German. In ACL-IJCNLP '09: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1. Morristown, NJ, USA: Association for Computational Linguistics, pp. 64–72. URL http://aclweb.org/anthology/P09-1008 (last accessed 12.7.2014).

Chodorow, M., M. Dickinson, R. Israel & J. Tetreault (2012). Problems in Evaluating Grammatical Error Detection Systems. In Proceedings of COLING 2012. Mumbai, India, pp. 611–628. URL http://aclweb.org/anthology/C12-1038 (last accessed 12.7.2014).

De Felice, R. (2008). Automatic Error Detection in Non-native English. Ph.D. thesis, St Catherine's College, University of Oxford. URL http://purl.org/net/DeFelice-08.pdf (last accessed 12.7.2014).

de Haan, P. (2000). Tagging non-native English with the TOSCA-ICLE tagger. In Mair & Hundt (2000), pp. 69–79.

de Mönnink, I. (2000). Parsing a learner corpus. In Mair & Hundt (2000), pp. 81–90.

Dickinson, M. & M. Ragheb (2009). Dependency Annotation for Learner Corpora. In Proceedings of the Eighth Workshop on Treebanks and Linguistic Theories (TLT-8). Milan, Italy. URL http://purl.org/net/Dickinson.Ragheb-09.html (last accessed 12.7.2014).

Dzikovska, M., R. Nielsen et al. (2013). SemEval-2013 Task 7: The Joint Student Response Analysis and 8th Recognizing Textual Entailment Challenge. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013). Atlanta, Georgia, USA: Association for Computational Linguistics, pp. 263–274. URL http://aclweb.org/anthology/S13-2045 (last accessed 12.7.2014).

Díaz Negrillo, A. (2007). A Fine-Grained Error Tagger for Learner Corpora. Ph.D. thesis, University of Jaén, Spain.

Díaz Negrillo, A. & J. Fernández Domínguez (2006). Error Tagging Systems for Learner Corpora. Revista Española de Lingüística Aplicada (RESLA) 19, 83–102. URL http://purl.org/net/DiazNegrillo.FernandezDominguez-06.pdf (last accessed 12.7.2014).

Díaz Negrillo, A., D. Meurers, S. Valera & H. Wunsch (2010). Towards interlanguage POS annotation for effective learner corpora in SLA and FLT. Language Forum 36(1–2), 139–154. URL http://purl.org/dm/papers/diaz-negrillo-et-al-09.html (last accessed 12.7.2014).

Fitzpatrick, E. & M. S. Seegmiller (2004). The Montclair electronic language database project. In U. Connor & T. Upton (eds.), Applied Corpus Linguistics: A Multidimensional Perspective, Amsterdam: Rodopi. URL http://purl.org/net/Fitzpatrick.Seegmiller-04.pdf (last accessed 12.7.2014).

Flor, M., Y. Futagi, M. Lopez & M. Mulholland (2014). Patterns of misspellings in L2 and L1 English: a view from the ETS Spelling Corpus. In A.-K. Helland Gujord (ed.), Proceedings of the Learner Corpus Research Conference (LCR 2013). University of Bergen, Bergen Language and Linguistic Studies (BeLLS). To appear.

Foster, J. (2005). Good Reasons for Noting Bad Grammar: Empirical Investigations into the Parsing of Ungrammatical Written English. Ph.D. thesis, Trinity College Dublin, Department of Computer Science. URL http://purl.org/net/Foster-05.ps (last accessed 12.7.2014).

Foth, K., W. Menzel & I. Schröder (2005). Robust parsing with weighted constraints. Natural Language Engineering 11(1), 1–25.

Gamon, M., C. Leacock, C. Brockett, W. B. Dolan, J. Gao, D. Belenko & A. Klementiev (2009). Using Statistical Techniques and Web Search to Correct ESL Errors. CALICO Journal 26(3), 491–511. URL http://purl.org/calico/Gamon.Leacock.ea-09.pdf (last accessed 12.7.2014).

Garside, R., G. Leech & T. McEnery (eds.) (1997). Corpus annotation: linguistic information from computer text corpora. Harlow, England: Addison Wesley Longman Limited.

Geerzen, J., T. Alexopoulou & A. Korhonen (2013). Automatic linguistic annotation of large scale L2 databases: The EF-Cambridge Open Language Database (EFCAMDAT). In Proceedings of the 31st Second Language Research Forum (SLRF). Cascadilla Press. URL http://purl.org/icall/efcamdat (last accessed 12.7.2014).

Granfeldt, J., P. Nugues, E. Persson, L. Persson, F. Kostadinov, M. Ågren & S. Schlyter (2005). Direkt Profil: A System for Evaluating Texts of Second Language Learners of French Based on Developmental Sequences. In J. Burstein & C. Leacock (eds.), Proceedings of the Second Workshop on Building Educational Applications Using NLP. Ann Arbor, Michigan: Association for Computational Linguistics, pp. 53–60. URL http://aclweb.org/anthology/W05-0209 (last accessed 12.7.2014).

Granger, S. (2003). Error-tagged learner corpora and CALL: A promising synergy. CALICO Journal 20(3), 465–480. URL http://purl.org/calico/Granger03.pdf (last accessed 12.7.2014).

Granger, S. (2008). Learner corpora. In A. Lüdeling & M. Kytö (eds.), Corpus Linguistics. An International Handbook, Berlin, New York: Walter de Gruyter, pp. 259–275.

Granger, S., E. Dagneaux, F. Meunier & M. Paquot (2009). International Corpus of Learner English, Version 2. Presses Universitaires de Louvain, Louvain-la-Neuve.

Granger, S., O. Kraif, C. Ponton, G. Antoniadis & V. Zampa (2007). Integrating learner corpora and natural language processing: A crucial step towards reconciling technological sophistication and pedagogical effectiveness. ReCALL 19(3).

Granger, S. & F. Meunier (1994). Towards a grammar checker for learners of English. In U. Fries, G. Tottie & P. Schneider (eds.), Creating and Using English Language Corpora: Papers from the Fourteenth International Conference on English Language Research on Computerized Corpora, Zurich 1993, Amsterdam: Rodopi, pp. 79–91.

Hahn, M. & D. Meurers (2012). Evaluating the Meaning of Answers to Reading Comprehension Questions: A Semantics-Based Approach. In Proceedings of the 7th Workshop on Innovative Use of NLP for Building Educational Applications (BEA-7) at NAACL-HLT 2012. Montreal, pp. 94–103. URL http://purl.org/dm/papers/hahn-meurers-12.html (last accessed 12.7.2014).

Hancke, J. & D. Meurers (2013). Exploring CEFR classification for German based on rich linguistic modeling. In Learner Corpus Research 2013, Book of Abstracts. Bergen, Norway. URL http://purl.org/dm/papers/Hancke.Meurers-13.html (last accessed 12.7.2014).

Heift, T. (2001). Error-Specific and Individualized Feedback in a Web-based Language Tutoring System: Do They Read It? ReCALL 13(2), 129–142.

Heift, T. & M. Schulze (2007). Errors and Intelligence in Computer-Assisted Language Learning: Parsers and Pedagogues. Routledge.

Hirschmann, H., S. Doolittle & A. Lüdeling (2007). Syntactic annotation of non-canonical linguistic structures. In Proceedings of Corpus Linguistics 2007. Birmingham. URL http://purl.org/net/Hirschmann.Doolittle.ea-07.pdf (last accessed 12.7.2014).

Hirschmann, H., A. Lüdeling, I. Rehbein, M. Reznicek & A. Zeldes (2013). Underuse of Syntactic Categories in Falko. A Case Study on Modification. In S. Granger, G. Gilquin & F. Meunier (eds.), 20 Years of Learner Corpus Research. Looking Back, Moving Ahead. Corpora and Language in Use – Proceedings 1, Louvain-la-Neuve: Presses universitaires de Louvain. URL http://purl.org/net/Hirschmann.Luedeling.ea-13.pdf (last accessed 12.7.2014).

Israel, R., M. Dickinson & S.-H. Lee (2013). Detecting and Correcting Learner Korean Particle Omission Errors. In Proceedings of the 6th International Joint Conference on Natural Language Processing (IJCNLP 2013). Nagoya, Japan. URL http://aclweb.org/anthology/I13-1199.pdf (last accessed 12.7.2014).

Johnson, M. (1994). Two ways of formalizing grammars. Linguistics and Philosophy 17(3), 221–248.

Jurafsky, D. & J. H. Martin (2009). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Upper Saddle River, NJ: Prentice Hall, 2nd ed.

King, L. & M. Dickinson (2013). Shallow Semantic Analysis of Interactive Learner Sentences. In Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications. Atlanta, GA, USA. URL http://aclweb.org/anthology/W13-1702.pdf (last accessed 12.7.2014).

Kintsch, W. & P. Mangalath (2011). The Construction of Meaning. Topics in Cognitive Science 3(2), 346–370.

Klein, W. & C. Perdue (1997). The Basic Variety (or: Couldn't natural languages be much simpler?). Second Language Research 13(4), 301–347.

Krivanek, J. & D. Meurers (2013). Comparing Rule-Based and Data-Driven Dependency Parsing of Learner Language. In K. Gerdes, E. Hajičová & L. Wanner (eds.), Computational Dependency Theory, Amsterdam: IOS Press, pp. 207–225. URL http://purl.org/dm/papers/krivanek-meurers-13.html (last accessed 12.7.2014).

Kwasny, S. C. & N. K. Sondheimer (1981). Relaxation Techniques for Parsing Grammatically Ill-Formed Input in Natural Language Understanding Systems. American Journal of Computational Linguistics 7(2), 99–108. URL http://purl.org/net/Kwasny.Sondheimer-81.pdf (last accessed 12.7.2014).

Leacock, C. & M. Chodorow (2003). C-rater: Automated Scoring of Short-Answer Questions. Computers and the Humanities 37, 389–405.

Leech, G. (2004). Chapter 2. Adding Linguistic Annotation. In M. Wynne (ed.), Developing Linguistic Corpora: a Guide to Good Practice, Oxford: Oxbow Books. URL http://purl.org/net/Leech-04.html (last accessed 12.7.2014).

Liao, Y. D. & Y. J. Fukuya (2002). Avoidance of phrasal verbs: The case of Chinese learners of English. Second Language Studies 20(2), 71–106. URL http://purl.org/net/Liao.Fukuya-02.pdf (last accessed 12.7.2014).

Lüdeling, A. (2008). Mehrdeutigkeiten und Kategorisierung: Probleme bei der Annotation von Lernerkorpora. In M. Walter & P. Grommes (eds.), Fortgeschrittene Lernervarietäten: Korpuslinguistik und Zweitspracherwerbsforschung, Tübingen: Max Niemeyer Verlag, pp. 119–140.

Lüdeling, A. & M. Kytö (eds.) (2008). Corpus Linguistics. An International Handbook, vol. 1 of Handbooks of Linguistics and Communication Science. Berlin: Mouton de Gruyter.

MacWhinney, B. (2000). The CHILDES Project: Tools for Analyzing Talk. Vol. 1: The Format and Programs, Vol. 2: The Database. Mahwah, NJ: Lawrence Erlbaum Associates, 3rd ed.

Mair, C. & M. Hundt (eds.) (2000). Corpus Linguistics and Linguistic Theory. Amsterdam: Rodopi.

Matthews, C. (1992). Going AI: Foundations of ICALL. Computer Assisted Language Learning 5(1), 13–31.

McEnery, T., R. Xiao & Y. Tono (2006). Corpus-based language studies: An advanced resource book. London: Routledge.

Mel'čuk, I. (1988). Dependency Syntax: Theory and Practice. State University of New York Press.

Metcalf, V. & D. Meurers (2006). When to Use Deep Processing and When Not To – The Example of Word Order Errors. Pre-conference Workshop on "NLP in CALL – Computational and Linguistic Challenges", CALICO 2006, May 17, 2006, University of Hawaii. URL http://purl.org/net/icall/handouts/calico06-metcalf-meurers.pdf (last accessed 12.7.2014).

Meurers, D. (2012). Natural Language Processing and Language Learning. In Chapelle (2012). URL http://purl.org/dm/papers/meurers-12.html (last accessed 12.7.2014).

Meurers, D., J. Krivanek & S. Bykh (2014). On the Automatic Analysis of Learner Corpora: Native Language Identification as Experimental Testbed of Language Modeling between Surface Features and Linguistic Abstraction. In Diachrony and Synchrony in English Corpus Studies. Frankfurt am Main: Peter Lang, pp. 285–314.

Meurers, D., R. Ziai, L. Amaral, A. Boyd, A. Dimitrov, V. Metcalf & N. Ott (2010). Enhancing Authentic Web Pages for Language Learners. In Proceedings of the 5th Workshop on Innovative Use of NLP for Building Educational Applications (BEA-5) at NAACL-HLT 2010. Los Angeles: Association for Computational Linguistics, pp. 10–18. URL http://aclweb.org/anthology/W10-1002.pdf (last accessed 12.7.2014).

Meurers, W. D. (2005). On the use of electronic corpora for theoretical linguistics. Case studies from the syntax of German. Lingua 115(11), 1619–1639. URL http://purl.org/dm/papers/meurers-03.html (last accessed 12.7.2014).

Meurers, W. D. & S. Müller (2009). Corpora and Syntax (Article 42). In A. Lüdeling & M. Kytö (eds.), Corpus Linguistics, Berlin: Mouton de Gruyter, vol. 2 of Handbooks of Linguistics and Communication Science, pp. 920–933. URL http://purl.org/dm/papers/meurers-mueller-09.html (last accessed 12.7.2014).

Michaud, L. N. & K. F. McCoy (2004). Empirical Derivation of a Sequence of User Stereotypes for Language Learning. User Modeling and User-Adapted Interaction 14(4), 317–350. URL http://www.springerlink.com/content/lp86123772372646/ (last accessed 12.7.2014).

Mislevy, R. J., R. G. Almond & J. F. Lukas (2003). A Brief Introduction to Evidence-centered Design. ETS Research Report RR-03-16, Educational Testing Service (ETS), Princeton, NJ, USA. URL http://www.ets.org/Media/Research/pdf/RR-03-16.pdf (last accessed 12.7.2014).

Miura, S. (1998). Hiroshima English Learners' Corpus: English learner No. 2 (English I & English II). Department of English Language Education, Hiroshima University. URL http://purl.org/icall/helc (last accessed 12.7.2014).

Murakami, A. (2013). Individual Variation and the Role of L1 in the L2 Development of English Grammatical Morphemes: Insights From Learner Corpora. Ph.D. thesis, University of Cambridge.

Myles, F. & R. Mitchell (2004). Using information technology to support empirical SLA research. Journal of Applied Linguistics 1(2), 169–196.

Odlin, T. (1989). Language Transfer: Cross-linguistic influence in language learning. New York: CUP.

Ott, N. & D. Meurers (2010). Information Retrieval for Education: Making Search Engines Language Aware. Themes in Science and Technology Education. Special issue on computer-aided language analysis, teaching and learning: Approaches, perspectives and applications 3(1–2), 9–30. URL http://purl.org/dm/papers/ott-meurers-10.html (last accessed 12.7.2014).

Ott, N. & R. Ziai (2010). Evaluating Dependency Parsing Performance on German Learner Language. In M. Dickinson, K. Müürisep & M. Passarotti (eds.), Proceedings of the Ninth International Workshop on Treebanks and Linguistic Theories, vol. 9 of NEALT Proceedings Series, pp. 175–186. URL http://hdl.handle.net/10062/15960 (last accessed 12.7.2014).

Ott, N., R. Ziai & D. Meurers (2012). Creation and Analysis of a Reading Comprehension Exercise Corpus: Towards Evaluating Meaning in Context. In T. Schmidt & K. Wörner (eds.), Multilingual Corpora and Multilingual Corpus Analysis, Amsterdam: Benjamins, Hamburg Studies in Multilingualism (HSM), pp. 47–69. URL http://purl.org/dm/papers/ott-ziai-meurers-12.html (last accessed 12.7.2014).

Palmer, D. D. (2000). Tokenisation and Sentence Segmentation. In R. Dale, H. Moisl & H. Somers (eds.), Handbook of Natural Language Processing, New York: Marcel Dekker, pp. 11–35. URL http://www.netLibrary.com/ebook_info.asp?product_id=47610 (last accessed 12.7.2014).

Pendar, N. & C. Chapelle (2008). Investigating the Promise of Learner Corpora: Methodological Issues. CALICO Journal 25(2), 189–206. URL https://calico.org/html/article_689.pdf (last accessed 12.7.2014).

Perdue, C. (ed.) (1993). Adult Language Acquisition. Cross-Linguistic Perspectives. Volume 1: Field Methods. Cambridge, UK: Cambridge University Press.

Petersen, K. (2010). Implicit Corrective Feedback in Computer-Guided Interaction: Does Mode Matter? Ph.D. thesis, Georgetown University. URL http://purl.org/net/Petersen-10.pdf (last accessed 12.7.2014).

Presson, N., C. Davy & B. MacWhinney (2013). Experimentalized CALL for adult second language learners. In J. W. Schwieter (ed.), Innovative Research and Practices in Second Language Acquisition and Bilingualism, John Benjamins, pp. 139–164. URL http://talkbank.org/SLA/pubs/eCALL.pdf (last accessed 12.7.2014).

Quixal, M. (2012). Language Learning Tasks and Automatic Analysis of Learner Language. Connecting FLTL and NLP in the design of ICALL materials supporting effective use in real-life instruction. Ph.D. thesis, Universitat Pompeu Fabra, Barcelona and Eberhard-Karls-Universität Tübingen.

Ragheb, M. & M. Dickinson (2012). Defining Syntax for Learner Language Annotation. In Proceedings of COLING 2012. Mumbai, India, pp. 965–974. URL http://cl.indiana.edu/~md7/papers/ragheb-dickinson12.html (last accessed 12.7.2014).

Ragheb, M. & M. Dickinson (2013). Inter-annotator Agreement for Dependency Annotation of Learner Language. In Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications. Atlanta, GA, USA. URL http://cl.indiana.edu/~md7/papers/ragheb-dickinson13.html (last accessed 12.7.2014).

Rahkonen, M. & G. Håkansson (2008). Production of Written L2-Swedish – Processability or Input Frequencies? In J.-U. Keßler (ed.), Processability Approaches to Second Language Development and Second Language Learning, Cambridge Scholars Publishing, pp. 135–161.

Reuer, V. (2003). Error recognition and feedback with Lexical Functional Grammar. CALICO Journal 20(3), 497–512. URL http://purl.org/calico/Reuer-03.pdf (last accessed 12.7.2014).

Reznicek, M., A. Lüdeling & H. Hirschmann (2013). Competing Target Hypotheses in the Falko Corpus: A Flexible Multi-Layer Corpus Architecture. In A. Díaz-Negrillo, N. Ballier & P. Thompson (eds.), Automatic Treatment and Analysis of Learner Corpus Data, John Benjamins, vol. 59, pp. 101–123.

Reznicek, M., A. Lüdeling, C. Krummes & F. Schwantuschke (2012). Das Falko-Handbuch. Korpusaufbau und Annotationen Version 2.0. URL http://purl.org/net/Falko-v2.pdf (last accessed 12.7.2014).

Reznicek, M. & H. Zinsmeister (2013). STTS-Konfusionsklassen beim Tagging von Fremdsprachlernertexten. Journal for Language Technology and Computational Linguistics (JLCL). URL http://www.jlcl.org/2013_Heft1/4Reznicek.pdf (last accessed 12.7.2014).

Rimrott, A. & T. Heift (2008). Evaluating automatic detection of misspellings in German. Language Learning and Technology 12(3), 73–92. URL http://llt.msu.edu/vol12num3/rimrottheift.pdf (last accessed 12.7.2014).

Rosen, A., J. Hana, B. Štindlová & A. Feldman (2013). Evaluating and automating the annotation of a learner corpus. Language Resources and Evaluation, 1–28.

Rosén, V. & K. D. Smedt (2010). Syntactic Annotation of Learner Corpora. In H. Johansen, A. Golden, J. E. Hagen & A.-K. Helland (eds.), Systematisk, variert, men ikke tilfeldig. Antologi om norsk som andrespråk i anledning Kari Tenfjords 60-årsdag [Systematic, varied, but not arbitrary. Anthology about Norwegian as a second language on the occasion of Kari Tenfjord's 60th birthday], Oslo: Novus forlag, pp. 120–132.

Rozovskaya, A. & D. Roth (2010). Annotating ESL errors: Challenges and rewards. In Proceedings of the 5th Workshop on Innovative Use of NLP for Building Educational Applications (BEA-5) at NAACL-HLT 2010. Los Angeles: Association for Computational Linguistics. URL http://www.aclweb.org/anthology/W10-1004.pdf (last accessed 12.7.2014).

Sagae, K., E. Davis, A. Lavie, B. MacWhinney & S. Wintner (2010). Morphosyntactic annotation of CHILDES transcripts. Journal of Child Language 37(3), 705–729.

Schwind, C. B. (1990). An Intelligent Language Tutoring System. International Journal of Man-Machine Studies 33, 557–579.

Selinker, L. (1972). Interlanguage. International Review of Applied Linguistics 10(3), 209–231.

Tagliamonte, S. A. (2011). Variationist Sociolinguistics: Change, Observation, Interpretation. John Wiley & Sons.

Tetreault, J. & M. Chodorow (2008a). Native Judgments of Non-Native Usage: Experiments in Preposition Error Detection. In Proceedings of the Workshop on Human Judgments in Computational Linguistics at COLING-08. Manchester, UK: Association for Computational Linguistics, pp. 24–32. URL http://www.ets.org/Media/Research/pdf/h4.pdf (last accessed 12.7.2014).

Tetreault, J. & M. Chodorow (2008b). The Ups and Downs of Preposition Error Detection in ESL Writing. In Proceedings of the 22nd International Conference on Computational Linguistics (COLING-08). Manchester, UK: Association for Computational Linguistics, pp. 865–872. URL http://www.ets.org/Media/Research/pdf/r3.pdf (last accessed 12.7.2014).

Tono, Y. (2013). Criterial feature extraction using parallel learner corpora and machine learning. In A. Díaz-Negrillo, N. Ballier & P. Thompson (eds.), Automatic Treatment and Analysis of Learner Corpus Data, John Benjamins, pp. 169–204.

Tono, Y., E. Izumi & E. Kaneko (2004). The NICT JLE Corpus: the final report. In K. Bradford-Watts, C. Ikeguchi & M. Swanson (eds.), Proceedings of the JALT Conference. Tokyo: JALT. URL http://jalt-publications.org/archive/proceedings/2004/E126.pdf (last accessed 12.7.2014).

Vajjala, S. & K. Lõo (2013). Role of Morpho-syntactic Features in Estonian Proficiency Classification. In Proceedings of the 8th Workshop on Innovative Use of NLP for Building Educational Applications (BEA8). Association for Computational Linguistics. URL http://aclweb.org/anthology/W13-1708.pdf (last accessed 12.7.2014).

van Rooy, B. & L. Schäfer (2002). The Effect of Learner Errors on POS Tag Errors during Automatic POS Tagging. Southern African Linguistics and Applied Language Studies 20, 325–335.

van Rooy, B. & L. Schäfer (2003). An Evaluation of Three POS Taggers for the Tagging of the Tswana Learner English Corpus. In D. Archer, P. Rayson, A. Wilson & T. McEnery (eds.), Proceedings of the Corpus Linguistics 2003 Conference, Lancaster University (UK), 28–31 March 2003, vol. 16 of University Centre For Computer Corpus Research On Language Technical Papers, pp. 835–844. URL http://www.corpus4u.org/upload/forum/2005092023174960.pdf (last accessed 12.7.2014).

Votrubec, J. (2006). Morphological tagging based on averaged perceptron. In WDS'06 Proceedings of Contributed Papers. Praha, Czechia: Matfyzpress, Charles University, pp. 191–195. URL http://www.mff.cuni.cz/veda/konference/wds/proc/pdf06/WDS06_134_i3_Votrubec.pdf (last accessed 12.7.2014).

Wagner, J. & J. Foster (2009). The effect of correcting grammatical errors on parse probabilities. In Proceedings of the 11th International Conference on Parsing Technologies (IWPT'09). Paris, France: Association for Computational Linguistics, pp. 176–179. URL http://aclweb.org/anthology/W09-3827 (last accessed 12.7.2014).

Weischedel, R. M. & N. K. Sondheimer (1983). Meta-rules as a Basis for Processing Ill-formed Input. Computational Linguistics 9(3–4), 161–177. URL http://aclweb.org/anthology/J83-3003 (last accessed 12.7.2014).

Wichmann, A. (2008). Speech corpora and spoken corpora. In Lüdeling & Kytö (2008), pp. 187–207.

Wilske, S. (2014). Form and meaning in dialogue-based computer-assisted language learning. Ph.D. thesis, Universität des Saarlandes, Saarbrücken. URL http://purl.org/icall/wilske-thesis (last accessed 12.7.2014).

Winograd, T. & F. Flores (1986). Understanding Computers and Cognition: A New Foundation for Design. Norwood, NJ: Ablex.

Yannakoudakis, H., T. Briscoe & B. Medlock (2011). A new dataset and method for automatically grading ESOL texts. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies – Volume 1. Stroudsburg, PA, USA: Association for Computational Linguistics, HLT '11, pp. 180–189. URL http://aclweb.org/anthology/P11-1019.pdf (last accessed 12.7.2014). Corpus available: http://ilexir.co.uk/applications/clc-fce-dataset (last accessed 12.7.2014).

Zyzik, E. & C. Azevedo (2009). Word Class Distinctions in Second Language Acquisition. SSLA 31(31), 1–29.