James Wilson University of Leeds j.a.wilson@leeds.ac.uk.

ReadingCorp: a corpus-based approach to teaching Russian for Research. James Wilson, University of Leeds, j.a.wilson@leeds.ac.uk


Page 1: James Wilson University of Leeds j.a.wilson@leeds.ac.uk.

ReadingCorp: a corpus-based approach to teaching Russian for Research

James Wilson

University of Leeds

j.a.wilson@leeds.ac.uk

Page 2: James Wilson University of Leeds j.a.wilson@leeds.ac.uk.

Part 1: “The Problem” (How do we teach ab-initio students to read authentic Russian texts in a year?)

Part 2: “A potential corpus-based solution”

The use of corpora and corpus tools to train ab-initio students to read authentic academic texts

ReadingCorp project

Motivated by the demand for specialist PG language training in Russian and the findings of previous research (Russian for Research 2008)

Structure of presentation

Page 3: James Wilson University of Leeds j.a.wilson@leeds.ac.uk.

6-month project funded by the Centre for East European Language-Based Area Studies (CEELBAS) and carried out at the University of Sheffield in 2008

The project aimed to:

◦ build up a profile of what PG language training was offered at CEELBAS institutions and to identify the methods of and problems in teaching languages for research;

◦ identify the demand for language training for research purposes at member departments and to establish what such language training should include;

◦ look at new modes of delivery such as distance- and computer-aided learning and the possibility of sharing of resources.

Russian for Research project

Page 4: James Wilson University of Leeds j.a.wilson@leeds.ac.uk.

Departments of Russian and Slavonic Studies are attracting more PG students who do not know Russian and whose research is therefore restricted (the same situation is true of other languages)

Students are unable to read primary sources, use archives and work with some online packages without Russian

“You simply can’t do Russian-related economic research without Russian”; “Without language skills research is much impaired”

Background information

Page 5: James Wilson University of Leeds j.a.wilson@leeds.ac.uk.

There is a “massive” demand for PG language training across CEELBAS institutions

Potentially good researchers are being lost due to the lack of adequate PG language training

Conventional PG-focused intensive courses are effective but impractical at most institutions; they are not financially sustainable at any institution in the long term

Other methods (“piggy-backing”, non-intensive reading modules, following UG programmes) do not work

It is not possible to offer specialist tuition to the individual student or to cover all research areas

Texts are outdated and/or more suited to some disciplines than others; their content is determined subjectively by linguists

A cost-effective way of delivering shared PG language programmes is necessary

Conclusions

Page 6: James Wilson University of Leeds j.a.wilson@leeds.ac.uk.

Corpora are well suited to LSP learning and teaching for several reasons:

◦ they can inform us of key items of vocabulary and grammar points that require instruction in specific domains;

◦ frequency data shape materials and syllabus design;

◦ breadth of topics: a corpus can be created on any topic, no matter how specialist, for which there is enough available material;

◦ needs of the individual: a corpus can be created from articles directly relevant to an individual student’s research topic;

◦ there is no printing/publication lag: corpora can be created on current events, yesterday’s news stories, etc.;

◦ they can be built within hours.

A corpus-based solution???

Page 7: James Wilson University of Leeds j.a.wilson@leeds.ac.uk.

Corpora can be used directly or indirectly

Corpora can be used in combination with traditional teaching practices (blended learning)

Corpora have been used successfully in language-for-research projects in the past: German for Chemists (Butler) and the Warwick course in Italian Language for PG students of Renaissance Studies

A corpus-based solution??? (2)

Page 8: James Wilson University of Leeds j.a.wilson@leeds.ac.uk.

ReadingCorp

Page 9: James Wilson University of Leeds j.a.wilson@leeds.ac.uk.

2-year project funded by the AHRC (Collaborative Language Skills Training project)

Run at the Department of Russian and Slavonic Studies (Sheffield), GRASS and CTS (Leeds)

Combines knowledge and practice of PG language teaching methods (Sheffield / Leeds) with technological expertise in creating corpus tools for language learning purposes (Leeds)

Project description

Page 10: James Wilson University of Leeds j.a.wilson@leeds.ac.uk.

To explore possibilities for using corpora to achieve reading competence in Russian

To create tools, reference materials (keyword lists, annotated readers, a grammar for researchers) and exercises to support the acquisition of vocabulary from specific and varied domains

To actively engage students in “vocabulary identification” exercises

Aims

Page 11: James Wilson University of Leeds j.a.wilson@leeds.ac.uk.

It may seem “ridiculous” to suggest that a complete beginner with no formal training in linguistics or experience in learning a foreign language can learn Russian in a year

We focus solely on reading skills

Our aim is for students to read authentic texts with the help of dictionaries and our tools and materials; we do not expect them to pick up a text and read it as someone with years of training would

Why within a year?

Putting our goals into perspective

Page 12: James Wilson University of Leeds j.a.wilson@leeds.ac.uk.

Corpus

◦ The Russian Academic Corpus (RAC)

Technology (additions to the IntelliText Interface)

◦ Keyword list generator (single- and multi-words; POS-specific)

◦ Grammar frequency

◦ Advanced options for navigating texts

◦ Vocabulary highlights (general academic, discipline-specific keywords)

◦ Automatic grammar classification

Pedagogy

◦ Readers from 13 academic disciplines

◦ “Cleaned” keyword lists from 13 academic disciplines

◦ Transferable teaching materials

◦ A PG-focused grammar

Corpora, tools and materials

Page 13: James Wilson University of Leeds j.a.wilson@leeds.ac.uk.

Contains approximately 5 million words

Used for compiling frequency lists and in teaching

Made up of 13 sub-corpora (art, criminology, culture, ecology, economics, geography, history, international relations, linguistics, medicine, politics, religion, sociology)

The sub-corpora are roughly equal in size and each contains 50 texts

The “main” corpus is freely available via the IntelliText Interface

Individual sub-corpora are available on demand

The Russian Academic Corpus (RAC)

Page 14: James Wilson University of Leeds j.a.wilson@leeds.ac.uk.

“General academic” and “discipline-specific” keywords were extracted

Single words (discipline-specific) and multi-words (general academic and discipline-specific)

Lists were “cleaned”: anomalies were removed and lemmatised forms were restored to their original form (то не менее > тем не менее, по отношение к > по отношению к)

100 keywords for each subject area

Translations (all lists) and collocations (single words)

Keyword lists
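The slides do not say which statistic the ReadingCorp keyword list generator uses. As an illustration only, the sketch below scores keywords with log-likelihood keyness, a measure commonly used in corpus linguistics, by comparing word frequencies in a target sub-corpus against a reference corpus. The function names and the toy token lists are hypothetical and not part of the project's tools.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood (G2) keyness score.

    a: frequency of the word in the target corpus
    b: frequency of the word in the reference corpus
    c: total tokens in the target corpus
    d: total tokens in the reference corpus
    """
    expected_target = c * (a + b) / (c + d)
    expected_reference = d * (a + b) / (c + d)
    score = 0.0
    if a > 0:
        score += a * math.log(a / expected_target)
    if b > 0:
        score += b * math.log(b / expected_reference)
    return 2 * score


def keywords(target_tokens, reference_tokens, top_n=10):
    """Rank words that are relatively more frequent in the target corpus."""
    target = Counter(target_tokens)
    reference = Counter(reference_tokens)
    c, d = len(target_tokens), len(reference_tokens)
    scored = []
    for word, a in target.items():
        b = reference[word]
        # Keep only words over-represented in the target corpus.
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]


# Toy data (assumed for illustration): an "ecology" sample vs. a general sample.
ecology_sample = "вода загрязнение вода отходы и в вода".split()
general_sample = "и в на и в рынок спрос и в на".split()

for word, score in keywords(ecology_sample, general_sample, top_n=3):
    print(word, round(score, 2))
```

A real list would be built from lemmatised text and then "cleaned" by hand, as described on the slide; this sketch works on raw word forms only.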

Page 15: James Wilson University of Leeds j.a.wilson@leeds.ac.uk.

Phrase | Translation
вместе с тем | moreover; that said
тем не менее | nevertheless
в зависимости от | depending on
состоит в том | is
заключается в том | is
в это время | at the (this / that) time
по отношению к | with regard to
список используемой литературы | bibliography
может привести к | may lead to
один из важных | an important
включает в себя | includes

Academic phrases (three-word keywords)

Page 16: James Wilson University of Leeds j.a.wilson@leeds.ac.uk.

Keyword | Translation
вода | water
загрязнение | pollution
отходы | waste
вещество | substance
атмосфера | atmosphere
энергия | energy
воздух | air
почва | soil
среда | environment
газ | gas

Top 10 one-word keywords from the “Ecology” sub-corpus

Page 17: James Wilson University of Leeds j.a.wilson@leeds.ac.uk.

Keyword | Translation | Key collocations
вода | water | сточные воды "waste water"; пресная вода "fresh water"; морская вода "sea water"; грунтовые воды "ground waters"; качество воды "water quality"
отходы | waste | бытовые отходы "domestic waste"; промышленные отходы "industrial waste"; твёрдые отходы "solid waste"; переработка отходов "waste processing"; размещение отходов "waste disposal"

Keywords and their collocates
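One simple way to surface candidate collocations like those in the table is to count the word pairs in which a form of the keyword appears. The sketch below does this for adjacent words only. The sample text and the hand-written set of inflected forms are assumptions made for illustration; the slides do not describe how the project's collocations were extracted, and a real tool would use a lemmatiser rather than a form list.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and keep runs of Cyrillic letters only."""
    return re.findall(r"[а-яё]+", text.lower())


def collocates(text, keyword_forms, top_n=5):
    """Count adjacent word pairs that contain any form of the keyword.

    Note: pairs are counted across sentence boundaries, which a more
    careful implementation would avoid.
    """
    tokens = tokenize(text)
    pairs = Counter()
    for left, right in zip(tokens, tokens[1:]):
        if left in keyword_forms or right in keyword_forms:
            pairs[(left, right)] += 1
    return pairs.most_common(top_n)


# Hypothetical sample text and hand-listed forms of "вода".
sample_text = (
    "Сточные воды загрязняют реки. "
    "Сточные воды и бытовые отходы опасны. "
    "Пресная вода нужна всем."
)
water_forms = {"вода", "воды", "воде", "воду", "водой"}

for pair, count in collocates(sample_text, water_forms):
    print(" ".join(pair), count)
```

Raw counts favour frequent function words, so in practice the candidates would be filtered or ranked with an association measure before being shown to students.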

Page 18: James Wilson University of Leeds j.a.wilson@leeds.ac.uk.

Lexical bundle | Translation
рынок труда | labour market
национальная экономика | national economy
оплата труда | remuneration of labour
на рынке | on the market
спрос на | demand for
социальная политика | social policy
рабочая сила | work force
цена на | price of
предпринимательский риск | entrepreneurial risk
предпринимательская деятельность | entrepreneurship

Two-word keywords from the “Economics” sub-corpus

Page 19: James Wilson University of Leeds j.a.wilson@leeds.ac.uk.

10 readers from each of the 13 sub-corpora

Each text contains approximately 200 words

The readers may be used to train general academic vocabulary or discipline-specific vocabulary

Manually annotated

Freely available

Readers

Page 20: James Wilson University of Leeds j.a.wilson@leeds.ac.uk.

Криминогенность личности представляет собой качественное выражение соотношения негативной и позитивной направленности личности. А преступление является объективным, реальным показателем криминогенности личности. Криминогенность можно рассматривать с двух позиций. Исходя из первой, «криминогенность рождается и умирает вместе с преступлением». Однако криминогенность можно рассматривать не только как результат, но и как процесс ее становления. Таким образом, можно выделить три стадии генезиса криминогенности личности преступника: Формирование криминогенности личности, которая в этот период совершает аморальные поступки и правонарушения неуголовного характера.

Sample reader

Page 21: James Wilson University of Leeds j.a.wilson@leeds.ac.uk.

Focus on “receptive” not “productive” language skills

Grammar identification: our aim is for users to identify and understand the use of grammatical features, with our notes and tools, not to be able to construct them

Grammar forms were selected on the basis of their frequency in academic texts: participles, gerunds and passive constructions were introduced early; some points of grammar commonly covered in the first year of UG programmes were not included.

Grammar

Page 22: James Wilson University of Leeds j.a.wilson@leeds.ac.uk.

The following information is included for each point of grammar:

◦ an English-language commentary on how and for what purpose it is used;

◦ information on what the form looks like (identification);

◦ lists of other points of grammar that have the same form and notes on how to tell them apart (disambiguation);

◦ an annotated list of common words within the category;

◦ corpus examples and translations.

Grammar 2

Page 23: James Wilson University of Leeds j.a.wilson@leeds.ac.uk.

Use: -ing forms: judging by his comments, I’d say that ...

Looks like: принимая, судя, опираясь

Common exceptions: будучи

Can be confused with: soft feminine nouns (Nom. Sing.) = неделя; hard feminine adjectives (Nom. Sing.) = интересная; soft masculine nouns (Gen. Sing.) = трамвая

Disambiguation: gerunds are very unlikely to be directly preceded by words ending in –ая or –ого; words ending in –a rarely follow gerunds (BUT принимая лекарства)

Example (imperfective gerunds)

Page 24: James Wilson University of Leeds j.a.wilson@leeds.ac.uk.

Gerund | Translation | Notes
говоря | speaking, talking | о "about" + Prep.; не говоря уже о "not to mention"; по-иному / иначе говоря "put another way, in other words"; строго говоря "strictly speaking"
исходя | on the basis of, on the strength of, based on the assumption that | из "from" + Gen.; исходя из этого "on this basis"; исходя из того, что "on the basis of" (+ verb)
начиная | starting | с "from" + Gen.
будучи | being | Instr.
учитывая | considering | Acc.
имея | having | в виду
считая | considering | что "that"; Acc.
опираясь | based, drawing; relying | на "on" + Acc.
рассматривая | viewing, considering | Acc.
стремясь | trying, in an attempt to | with verb infinitives; к + Dat.

Common forms (imperfective gerunds)

Page 25: James Wilson University of Leeds j.a.wilson@leeds.ac.uk.

For texts that are available online or that have been digitised

The ReadingCorp tools allow users to annotate their texts according to vocabulary and grammar

Vocabulary highlights work for any text uploaded to the system, as the list of academic words is stable and our tools automatically classify texts and corpora according to keywords

Automatic grammar classification helps users identify or disambiguate parts of speech

Demo with “Space” corpus

Reading texts with our tools
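To make the idea of vocabulary highlights concrete, the sketch below marks single words in a text according to two word lists, one general academic and one discipline-specific. The marker syntax, function name and tiny word lists are assumptions for illustration and do not reflect how the IntelliText Interface renders highlights. It matches exact word forms only, so inflected forms and multi-word phrases such as тем не менее would need lemmatisation or phrase matching.

```python
import re


def highlight(text, general_academic, discipline_specific):
    """Wrap listed words in markers, leaving all other text untouched.

    Discipline-specific words are marked [[like this]] and general
    academic words <<like this>>; both marker styles are arbitrary.
    """
    def mark(match):
        word = match.group(0)
        key = word.lower()
        if key in discipline_specific:
            return f"[[{word}]]"
        if key in general_academic:
            return f"<<{word}>>"
        return word

    # Match runs of Cyrillic letters (including ё / Ё).
    return re.sub(r"[А-Яа-яЁё]+", mark, text)


# Hypothetical word lists; real lists would come from the keyword lists.
general = {"однако", "является"}
ecology = {"загрязнение", "отходы"}

print(highlight("Однако загрязнение растёт.", general, ecology))
```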

Page 26: James Wilson University of Leeds j.a.wilson@leeds.ac.uk.

Automatic grammar classification

Page 27: James Wilson University of Leeds j.a.wilson@leeds.ac.uk.

Initial corpus training (either one session over an afternoon or two shorter sessions)

Introduction to the Cyrillic alphabet (if necessary)

1 class a week focusing on (1) guided reading and (2) hands-on vocabulary building exercises

Exercises are based around keywords

Teaching methodology and materials

Page 28: James Wilson University of Leeds j.a.wilson@leeds.ac.uk.

verb рынок noun

adj. рынок verb

noun рынок verb

prep. рынок noun

Sample materials 1

регулировать (verb) рынок труда (noun)

внутренний (adj.) рынок характеризуется (verb)

сегментация (noun) рынок является (verb)

на (prep.) рынок сбыта (noun)

Page 29: James Wilson University of Leeds j.a.wilson@leeds.ac.uk.

Adjective спрос

Verb Adjective спрос

Verb Adjective спрос Preposition

Verb Adjective спрос Preposition Noun

Verb спрос Preposition

Verb спрос Preposition (Adjective) Noun

Sample materials 2

Page 30: James Wilson University of Leeds j.a.wilson@leeds.ac.uk.

Combination | Lexical bundle | Translation
Adj. + Search Word (SW) | совокупный спрос | aggregate demand
Verb + Adj. + SW + Noun | отражать платежеспособный спрос населения | to reflect the population’s purchasing power
Verb + Adj. + SW | пользоваться большим спросом | to be in high demand
Verb + SW + Noun | удовлетворить спрос покупателей | to meet customers’ demands
Noun + SW + Prep. | увеличение спроса на | rise in demand for
SW + Verb | спрос падает | demand is decreasing

Results
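The combinations in the table can be thought of as part-of-speech templates with one slot reserved for the search word. The sketch below shows one way such templates could be matched against POS-tagged text. The hand-tagged sample, the tag names and the function are all assumptions for illustration; in the classroom exercise students find these bundles themselves using the corpus interface.

```python
def find_bundles(tagged, search_lemma, pattern):
    """Return word sequences that match a POS template around a search word.

    tagged: list of (form, lemma, pos) tuples
    pattern: list of slots, each either a POS tag or "SW" (the search word)
    """
    size = len(pattern)
    hits = []
    for start in range(len(tagged) - size + 1):
        window = tagged[start:start + size]
        matched = True
        for (form, lemma, pos), slot in zip(window, pattern):
            if slot == "SW":
                if lemma != search_lemma:
                    matched = False
                    break
            elif pos != slot:
                matched = False
                break
        if matched:
            hits.append(" ".join(form for form, _, _ in window))
    return hits


# Hypothetical hand-tagged sample: (form, lemma, part of speech).
tagged_sample = [
    ("совокупный", "совокупный", "Adj"),
    ("спрос", "спрос", "Noun"),
    ("падает", "падать", "Verb"),
    (",", ",", "Punct"),
    ("а", "а", "Conj"),
    ("увеличение", "увеличение", "Noun"),
    ("спроса", "спрос", "Noun"),
    ("на", "на", "Prep"),
    ("нефть", "нефть", "Noun"),
]

for template in (["Adj", "SW"], ["SW", "Verb"], ["Noun", "SW", "Prep"]):
    print(" + ".join(template), find_bundles(tagged_sample, "спрос", template))
```

Matching on the lemma in the SW slot lets inflected forms such as спроса count as hits, which is why the sample carries lemmas alongside word forms.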

Page 31: James Wilson University of Leeds j.a.wilson@leeds.ac.uk.

Tutors working with students whose research is in an area other than those covered by ReadingCorp may:

◦ use our interface to create keyword lists and analyse texts

◦ use the readers for general reading practice

◦ access the RAC

◦ use the grammar

◦ use the keyword lists from the RAC

They will need to:

◦ create keyword lists for the subject by building a small corpus

◦ add their own examples to the material templates

“Transferability” of resources

Page 32: James Wilson University of Leeds j.a.wilson@leeds.ac.uk.

Is/Does a corpus-based approach:

◦ suitable for distance learning?

◦ cover contemporary research topics?

◦ cost-effective and sustainable?

◦ transferable to other languages and domains?

◦ cater for the needs of the individual student?

◦ help structure syllabi?

◦ allow ab-initio students to acquire the necessary reading skills to be able to effectively carry out their research?

How does a corpus approach address the CEELBAS issues?

Page 33: James Wilson University of Leeds j.a.wilson@leeds.ac.uk.

Corpora go beyond the traditional course book and offer exciting possibilities for LSP learning and teaching

A corpus-based approach is particularly well-suited to training reading competence in specific domains

◦ It makes the goal of reading and understanding authentic academic texts in Russian within a year a realistic objective

BUT will advances in machine translation and optical character recognition make specialised reading courses redundant? As machine translation becomes more reliable, as more material is digitised and made available online and as OCR technology becomes more accurate, will students need anything other than a scanner and Google Translate?

Conclusion