A UBC textbook corpus to identify EAP target vocabulary Mike Murphy | BC TEAL Conference | May 22, 2015

Page 2:

What’s in it for you?

• A better understanding of the Academic Word List
• Food for thought about the nature of academic written vocabulary
• Technical tips, if you’re new to corpus research
• An engineering-specific academic word list to use in your teaching

Page 3:

Agenda

• Introduction
  • The academic vocabulary controversy / Research questions and design
• Background
  • Lexical thresholds / GSL and AWL / Previous engineering lists
• Methodology Notes
  • Identifying target texts / Notes on logistics
• Results
  • FEC coverage comparisons / Overlap between FEWL and AWL / Word technicality assessment
• Conclusions
  • Is there an academic vocabulary? / Classroom applications of FEWL

Page 4:

INTRODUCTION

• Is there an academic vocabulary?
• Research questions
• Research design

Page 5:

Is there an academic vocabulary?

Yes.

• Paul Nation (2013): notwithstanding the distinctive linguistic features of texts from different academic disciplines, all scholars share a common vocabulary because they perform common communicative functions, such as evaluating research, describing methods, and presenting and discussing data
• Word lists for EAP vocabulary teaching and learning should be derived from general academic corpora.

Page 6:

Is there an academic vocabulary?

No.

• Hyland and Tse (2007): “It is by no means certain that there is a single academic literacy which university students need to acquire to participate in academic environments, and we believe that a perspective which seeks to identify and teach such a vocabulary … does not correspond with the ways language is actually used in academic writing”
• Word lists for EAP vocabulary teaching and learning should be based on discipline-specific corpora.

Page 7:

Research questions

To what extent can there be said to exist a general, non-discipline-specific academic English?

1. How well will Coxhead’s AWL, an EAP word list derived from a general academic corpus, cover the lexis in a corpus of first-year engineering textbooks?

2. Compared to AWL, how well will a word list that is derived from the first-year engineering corpus cover the lexis in the corpus?

3. If the items in the engineering-specific word list differ from the items in AWL, what proportion of the non-AWL items will be too technical to use in non-engineering-specific EAP contexts?

Page 8:

Research design

• Build a corpus of all textbooks used in first-year UBC engineering courses for the 2014–15 academic year
• From this “First-year Engineering Corpus” (FEC), derive a “First-year Engineering Word List” (FEWL) that resembles AWL in these ways:
  • Consists of the 570 most frequent word families in the corpus, divided into 10 sublists by frequency
  • Excludes items from the General Service List (West 1953)
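The derivation step can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used for FEWL: the function name `derive_word_list`, the toy counts, and the sublist size of 60 (inferred from the 60, 120, ... 570 denominators in the overlap table later in the deck) are all assumptions.

```python
from collections import Counter

def derive_word_list(family_counts, gsl_families, list_size=570, sublist_size=60):
    """Rank non-GSL word families by frequency and split them into sublists."""
    # Drop families that already belong to the General Service List.
    candidates = {fam: n for fam, n in family_counts.items() if fam not in gsl_families}
    # Sort by descending frequency; ties are broken alphabetically for stable output.
    ranked = sorted(candidates, key=lambda fam: (-candidates[fam], fam))[:list_size]
    # Chunk the ranked headwords into consecutive sublists.
    return [ranked[i:i + sublist_size] for i in range(0, len(ranked), sublist_size)]

# Toy data, invented for illustration (not taken from FEC).
counts = Counter({"the": 900, "function": 120, "make": 300, "vector": 80, "equation": 110})
gsl = {"the", "make"}
sublists = derive_word_list(counts, gsl, list_size=3, sublist_size=2)
print(sublists)  # [['function', 'equation'], ['vector']]
```

With the default sizes, 570 headwords yield nine sublists of 60 and a tenth of 30.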

Page 9:

Research design (cont’d)

• Compare the coverage of FEC provided by the general word list (AWL) and the discipline-specific word list (FEWL)
• Assess whether FEWL items that do not overlap with AWL are too technical for certain EAP contexts

Page 10:

BACKGROUND

• Lexical thresholds for effective reading
• The General Service List and Academic Word List
• Previous engineering-specific word lists

Page 11:

Lexical thresholds for effective reading

• Research suggests readers must know 95% of a text’s words to be able to guess the unknown words and have reasonable comprehension of the text overall.
  • Liu and Nation (1985)
  • Hirsh and Nation (1992)
  • Laufer (1992)
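To make the 95% threshold concrete, here is a minimal sketch of a coverage check. It matches raw word forms rather than word families, and the sample sentence and known-word set are invented for illustration, so treat it as a simplification of how coverage is measured in the studies cited.

```python
import re

def coverage(text, known_words):
    """Return the share of running words (tokens) in `text` found in `known_words`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    known = sum(1 for tok in tokens if tok in known_words)
    return known / len(tokens)

# Invented example: the reader knows every word except "loop".
sample = "The force on the loop is equal to the current times the field"
known = {"the", "force", "on", "is", "equal", "to", "current", "times", "field"}
share = coverage(sample, known)
print(f"{share:.1%} known; meets 95% threshold: {share >= 0.95}")
```

Here one unknown token out of 13 already puts the reader below the threshold.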

Page 12:

The General Service List

• Developed in the 1940s, mainly to support the creation of simplified, difficulty-sequenced ESL readers
• 2,000 word families divided into GSL 1 and GSL 2
• Gives an average of 82% coverage of various written texts (Hirsh and Nation, 1992)
• Has been criticized by some due to its age and written focus, but Nation and Waring (1997) claim it is still the best available list of high-frequency English words
• A New General Service List is available

Page 13:

AWL: Profile of the corpus

• Based on a 3.5M-token corpus of academic texts
• 4 broad areas (Arts, Commerce, Law, Science), each encompassing 7 disciplines, for a total of 28 disciplines
• Text types represented: journal articles, textbooks and coursebooks, and texts from the ‘Learned and Scientific’ subcorpora of the Wellington, Brown, and LOB corpora

Page 14:

AWL: Profile of the word list

• Three broad selection criteria for list items
  1. Distinctiveness – vocabulary item does not appear on GSL
  2. Frequency – item appears at least 100 times in the corpus as a whole
  3. Range – item appears…
     a) At least 10 times in each of the 4 macro-discipline subcorpora (Arts, Commerce, Law, Science)
     b) At least once in 15 or more of the 28 subject areas subsumed by the four macro-disciplines
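The three criteria can be read as a simple filter over per-family counts. The sketch below is one possible encoding, assuming the counts are already available as dictionaries; the function name, argument names, and sample numbers are hypothetical.

```python
def meets_awl_criteria(family, counts_by_area, counts_by_subject, gsl_families):
    """Check the three selection criteria for one word family.

    counts_by_area:    {macro-discipline: occurrences}, 4 entries expected
    counts_by_subject: {subject area: occurrences}, 28 entries expected
    """
    distinct = family not in gsl_families                      # 1. Distinctiveness
    frequent = sum(counts_by_area.values()) >= 100             # 2. Frequency
    range_areas = all(n >= 10 for n in counts_by_area.values())                    # 3a. Range
    range_subjects = sum(1 for n in counts_by_subject.values() if n >= 1) >= 15    # 3b. Range
    return distinct and frequent and range_areas and range_subjects

# Invented counts for a hypothetical family "analyse".
areas = {"Arts": 30, "Commerce": 40, "Law": 25, "Science": 20}
subjects = {f"subject_{i}": (1 if i < 16 else 0) for i in range(28)}
print(meets_awl_criteria("analyse", areas, subjects, gsl_families={"make"}))  # True
```

A family that is frequent overall but concentrated in one area (for example, only 5 occurrences in Science) fails criterion 3a.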

Page 15:

Coverage of Coxhead’s academic corpus provided by GSL and AWL (%)

Subcorpus   AWL    GSL 1   GSL 2   Total
Arts         9.3   73.0    4.4     86.7
Commerce    12.0   71.6    5.2     88.8
Law          9.4   75.0    4.1     88.5
Science      9.1   65.7    5.0     79.8

Page 16:

Previous engineering lists: EngList (Ward, 1999)

• Developed at a Thai university for engineering students
• Unlike AWL, overlaps with GSL
• 2,000 word families, derived from a 1M-token corpus
  • Corpus made up of one textbook each from five required first-year engineering courses at his university
• EngList covered 95% of Ward’s engineering corpus
• Ward favours word lists derived from discipline-specific corpora

Page 17:

EngList vs. GSL/UWL (% coverage)

Subject Grouping         Subject                  GSL/UWL         GSL    Ward’s EngList
                                                  (2,836 words)          (2,000 words)
Background Science       Biology                  85.9            76.5   79.5
                         Chemistry                88.7            76.1   89.2
                         Physics                  92.5            81.7   94.3
Background Engineering   Engineering Materials    86.5            76.0   89.4
                         Engineering Mechanics    92.0            83.0   97.4
                         Fluid Mechanics          92.1            79.9   97.4
Specialist Engineering   Mechanical Engineering   80.3            71.4   84.2
                         Chemical Engineering     92.5            80.1   91.7
                         Electrical Engineering   94.1            81.7   96.7
Humanities               Economics                92.6            82.8   86.1
                         Philosophy               95.0            87.2   87.3
                         Psychology               91.0            80.9   84.4

Page 18:

Previous engineering lists: The SEEC list (Mudraya, 2006)

• Also developed in a Thai university context
• 2M-token Student Engineering English Corpus (SEEC)
  • 13 complete textbooks used in required undergraduate engineering courses
• ‘Keyness’ comparison with the reference corpora BoE and BNC
• The most unusually frequent SEEC items that also had good range over the 13 subcorpora were high-frequency or general academic lexis, NOT technical lexis
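For readers new to keyness: the idea is to score how unusually frequent a word is in a study corpus relative to a reference corpus. The slide does not say which statistic Mudraya used, so the sketch below shows log-likelihood, one widely used keyness measure, purely as an illustration; the frequencies in the example are invented.

```python
import math

def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood keyness score for one word in a study vs. a reference corpus."""
    total = size_study + size_ref
    combined = freq_study + freq_ref
    # Expected frequencies if the word were equally common in both corpora.
    expected_study = size_study * combined / total
    expected_ref = size_ref * combined / total
    score = 0.0
    for observed, expected in ((freq_study, expected_study), (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score

# Invented figures: 500 hits in a 2M-token corpus vs. 1,000 hits in a 100M-token corpus.
print(round(log_likelihood(500, 2_000_000, 1000, 100_000_000), 1))
```

The score is zero when the word has the same relative frequency in both corpora and grows as the two diverge; it does not by itself say which corpus over-uses the word, so the relative frequencies need to be compared as well.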

Page 19:

Previous engineering lists: Mudraya’s observations

• Verbs that scored highest on keyness measures and occurred in at least 5 of 13 subcorpora: act, apply, assume, be, become, calculate, consider, correspond to, define, determine, exert, give, illustrate, indicate, locate, obtain, occur, require, show, sketch, solve, substitute, use
• The distinctive lexis in her engineering corpus is not engineering-specific
  • We can identify a general, pan-academic English, and it, not discipline-specific lexis, should be the focus of EAP studies

Page 20:

METHODOLOGY NOTES

• Identifying target texts
• Logistical notes

Page 21:

Identifying target courses/readings

Subject Area   Course
Comp. Sci.     • Introduction to Computation in Engineering Design
Chemistry      • Chemistry for Engineering
English        • Strategies for University Writing
Math           • Differential Calculus with Applications to Physical Science and Engineering
               • Integral Calculus with Applications to Physical Science and Engineering
               • Linear Systems
Physics        • Introductory Physics for Engineers I
               • Introductory Physics for Engineers II
               • Mechanics

Page 22:

Notes on logistics

• Software used to…
  • Convert PDFs of textbook pages to text files: PDF2TXT, v. 3.2
  • Clean text files: EditPad Lite 7
  • Make frequency counts: AntConc concordancing software (v. 3.4)
  • Calculate coverages: Microsoft Excel
  • Obtain electronic copies of AWL and GSL: Paul Nation’s Range program
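If you would rather script the frequency-count step than use a concordancer, something like the following would do a rough equivalent. It is a stand-in, not a reproduction of AntConc’s behaviour: the tokenising pattern is an assumption, word forms are not grouped into families, and the two sample strings stand in for cleaned textbook pages.

```python
import re
from collections import Counter

def frequency_counts(texts):
    """Count lower-cased word forms across an iterable of plain-text strings."""
    counts = Counter()
    for text in texts:
        # Letters only, with an optional internal apostrophe (e.g. "don't").
        counts.update(re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()))
    return counts

# Invented sample "pages" standing in for cleaned text files.
pages = ["The vector sum of the forces is zero.", "Each vector has a magnitude."]
counts = frequency_counts(pages)
print(counts["vector"], counts["the"], counts["magnitude"])
```

In practice each string would be the contents of one cleaned text file, and the resulting counts would then be mapped onto word families before ranking.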

Page 23:

RESULTS

• FEC coverage comparisons
• Overlap between FEWL and AWL
• Word technicality assessment

Page 24:

Coverage of FEC by GSL, AWL, and FEWL (%; figures in parentheses are combined coverage with GSL)

                                          English       Comp. Sci.    Chem.         Math          Physics       Overall
General Service List (GSL)                80.8          78.0          77.0          81.1          80.9          79.5
Academic Word List (AWL)                  10.6 (91.4)   13.0 (91.1)   9.9 (86.9)    9.1 (90.2)    7.9 (88.8)    10.1 (89.7)
First-year Engineering Word List (FEWL)   10.2 (91.0)   18.3 (96.3)   16.9 (93.9)   16.0 (97.1)   14.9 (95.8)   15.3 (94.8)
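The parenthesised figures appear to be GSL coverage plus the coverage of the list in question. The short script below recomputes them from the table and also shows how much FEWL adds over AWL in each subcorpus. The recomputed GSL+AWL sums for Comp. Sci. (91.0) and Overall (89.6) differ from the slide by 0.1, presumably because the slide figures were rounded from unrounded data; that explanation is an assumption.

```python
gsl = {"English": 80.8, "Comp. Sci.": 78.0, "Chem.": 77.0, "Math": 81.1, "Physics": 80.9, "Overall": 79.5}
awl = {"English": 10.6, "Comp. Sci.": 13.0, "Chem.": 9.9, "Math": 9.1, "Physics": 7.9, "Overall": 10.1}
fewl = {"English": 10.2, "Comp. Sci.": 18.3, "Chem.": 16.9, "Math": 16.0, "Physics": 14.9, "Overall": 15.3}

# Combined coverage with GSL, and the gain of FEWL over AWL, per subcorpus.
with_awl = {col: round(gsl[col] + awl[col], 1) for col in gsl}
with_fewl = {col: round(gsl[col] + fewl[col], 1) for col in gsl}
gain = {col: round(fewl[col] - awl[col], 1) for col in gsl}

for col in gsl:
    print(f"{col:<11} GSL+AWL {with_awl[col]:5.1f}   GSL+FEWL {with_fewl[col]:5.1f}   FEWL-AWL {gain[col]:+5.1f}")
```

On these figures, GSL plus FEWL reaches the 95% threshold in Comp. Sci., Math, and Physics, while GSL plus AWL does not reach it in any subcorpus.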

Page 25:

Most frequent FEWL items in each FEC subcorpus

      English       Comp. Sci.    Chem.         Math          Physics
1.    research      function†     react         function†     energy††
2.    scholar       data          atom          equation***   magnet
3.    genre         computer      molecule      vector        potential
4.    community     array         chemical      matrix        magnitude
5.    summary       file          energy††      graph         conduct
6.    abstract      vary          electron†††   linear        constant
7.    define*       define*       equation***   define*       section**
8.    cite          structure     bond          theorem       positive
9.    identify      equation***   equilibrium   integral      electron†††
10.   culture       section**     ion           section**     wavelength

Page 26:

Top 50 FEWL words
(Non-AWL words were bolded on the original slide.)

1. function     11. atom        21. electron    31. positive      41. integral
2. equation     12. matrix      22. potential   32. sum           42. negative
3. react        13. constant    23. area        33. equilibrium   43. ion
4. define       14. molecule    24. array       34. element       44. loop
5. energy       15. research    25. chemical    35. maximum       45. correspond
6. computer     16. structure   26. identify    36. series        46. magnitude
7. section      17. linear      27. magnet      37. initial       47. estimate
8. data         18. process     28. volume      38. require       48. axis
9. vector       19. chapter     29. file        39. occur         49. bond
10. vary        20. graph       30. assume      40. conduct       50. theorem

Page 27:

Overlap between FEWL and AWL

FEWL Sublist   Non-AWL words in FEWL sublist   FEWL headwords that are non-AWL (cumulative)
Sublist 1      20                              20 / 60 (33%)
Sublist 2      19                              39 / 120 (33%)
Sublist 3      24                              63 / 180 (35%)
Sublist 4      33                              96 / 240 (40%)
Sublist 5      31                              127 / 300 (42%)
Sublist 6      27                              154 / 360 (43%)
Sublist 7      35                              189 / 420 (45%)
Sublist 8      34                              223 / 480 (46%)
Sublist 9      35                              258 / 540 (48%)
Sublist 10     17                              275 / 570 (48%)
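The right-hand column is a running total, which is easy to verify. The sketch below rebuilds it from the per-sublist counts; the sublist sizes (nine of 60 and one of 30) are taken from the denominators in the table.

```python
from itertools import accumulate

non_awl_per_sublist = [20, 19, 24, 33, 31, 27, 35, 34, 35, 17]
sublist_sizes = [60] * 9 + [30]  # sizes implied by the denominators in the table

# Running totals of non-AWL headwords and of all headwords.
running_non_awl = list(accumulate(non_awl_per_sublist))
running_size = list(accumulate(sublist_sizes))

for i, (n, size) in enumerate(zip(running_non_awl, running_size), start=1):
    print(f"Sublist {i:>2}: {n:>3} / {size} ({n / size:.1%})")
```

The share of non-AWL headwords rises from about a third in the first sublist to just under half across the full 570.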

Page 28:

Word technicality scale (Coxhead and Nation, 2001)

Category 1

item appears rarely if ever outside a particular field (e.g. in Law, jactitation, per curiam, cloture)

Category 2

item is used both inside and outside a particular field, but the meaning it conveys outside the field is completely different from the meaning it has in the field (e.g. in Law, a caution is a formal warning recited by a police officer when arresting a suspect; in its non-legal usage it means ‘prudence’ or ‘circumspection’)

Category 3

item is used both inside and outside a given field, but most of its uses with a particular meaning occur in the field (e.g. most instances of reconstruction appear in legal discourse, but occurrences with this meaning also appear outside the field of Law). Crucially, the specialized, ‘in-field’ meaning “is readily accessible through its meaning outside the field”

Category 4

item is somewhat more common inside than outside the field, but specialization of meaning is minimal or non-existent (e.g. judge, mortgage, trespass)

Page 29:

Technicality results

Of the 275 FEWL items that do not overlap with AWL,
• 176 (or 64%) were deemed non-technical
• 99 (or 36%) were deemed technical

Page 30:

Top 10 non-AWL FEWL headwords judged 'technical' and 'non-technical'

      Technical    Non-Technical
1.    vector       atom
2.    matrix       linear
3.    molecule     graph
4.    electron     magnet
5.    array        equilibrium
6.    ion          magnitude
7.    loop         dense
8.    axis         radius
9.    theorem      scholar
10.   velocity     wavelength

Page 31:

CONCLUSIONS

Page 32:

Is there an academic vocabulary?

• Yes, to an extent: Coxhead’s claim that AWL covers about 10% of lexis in academic texts holds up in this study, in the case of first-year UBC engineering texts
• However, at 15%, FEWL provides engineering EAP students with a much greater return on their investment in vocabulary study

Page 33:

Applications of FEWL?

• Not suitable for mixed-major, pre-sessional EAP classes
  • Could be given to engineering students in the class to be used for self-study
• Suitable for situations in which all students are taking or preparing to take engineering classes and the instructor has some specialized knowledge of engineering
  • E.g. engineering-specific pre-sessional courses; in-sessional adjunct courses

Page 34:

Contact information

• Mike Murphy, English Language Institute
• [email protected]

Page 35:

References

• Anthony, L. (2014) AntConc
• Bauer, L. and Nation, P. (1993) Word Families. International Journal of Lexicography, 6: 253–279
• Bruce, I. (2011) Theory and concepts of English for academic purposes. Houndmills, Basingstoke, Hampshire; New York: Palgrave Macmillan
• Cobb, T. and Horst, M. (2001) “Reading academic English: Carrying learners across the lexical threshold.” In Flowerdew, J. and Peacock, M. (eds.) Research perspectives on English for academic purposes. Cambridge applied linguistics series. Cambridge, England: Cambridge University Press. pp. 315–329
• Cohen, A.D., Glasman, H., Rosenbaum-Cohen, P.R., et al. (1988) “Reading English for specialised purposes: Discourse analysis and the use of student informants.” In Carrell, P., Devine, J. and Eskey, D.E. (eds.) Interactive Approaches to Second Language Reading. Cambridge: Cambridge University Press. pp. 152–167
• Cooper, M. (1984) “Linguistic competence of practised and unpractised non-native speakers of English.” In Alderson, J.C. and Urquhart, A.H. (eds.) Reading in a foreign language. Applied linguistics and language study. London; New York: Longman. pp. 122–138
• Coxhead, A. (2000) A New Academic Word List. TESOL Quarterly, 34 (2): 213–238
• Coxhead, A. and Hirsh, D. (2007) A pilot science-specific word list. Revue française de linguistique appliquée, Vol. XII (2): 65–78
• Coxhead, A. and Nation, P. (2001) “The specialised vocabulary of English for academic purposes.” In Flowerdew, J. and Peacock, M. (eds.) Research perspectives on English for academic purposes. Cambridge applied linguistics series. Cambridge, England: Cambridge University Press. pp. 252–267
• Engels, L.K. (1968) The fallacy of word counts. IRAL, 6: 213–231
• Farrell, P. (1990) Vocabulary in ESP: a lexical analysis of the English of electronics and a study of semi-technical vocabulary [online]. M.Phil., Trinity College Dublin (University of Dublin) (Ireland). Available from: http://search.proquest.com.ezproxy.library.ubc.ca/docview/301436110?pq-origsite=summon [Accessed 3 November 2014]
• Freebody, P. and Anderson, R.C. (1983) Effects of Vocabulary Difficulty, Text Cohesion, and Schema Availability on Reading Comprehension. Reading Research Quarterly, 18 (3): 277–294
• Ghadessy, M. (1979) Frequency counts, word lists, and materials preparation: A new approach. English Teaching Forum, 17 (1): 24–27

Page 36:

References

• Guhr, D.J., Furtado, N.D. and Villareal, N. (2014) 2014 British Columbia International Education Intelligence Report. San Carlos, CA: Illuminate Consulting Group
• Haahr, M. (1998) RANDOM.ORG [online]. Dublin: Randomness and Integrity Services Ltd. Available from: www.random.org
• Heatley, A. and Nation, I.S.P. (1996) Range [online]. Wellington, New Zealand: Victoria University of Wellington. Available from: http://www.vuw.ac.nz/lals
• Hirsh, D. (2004) A Functional Representation of Academic Vocabulary. PhD, Victoria University of Wellington
• Hirsh, D. and Nation, P. (1992) What Vocabulary Size Is Needed to Read Unsimplified Texts for Pleasure? Reading in a Foreign Language, 8 (2): 689–696
• Hornby, A.S., Phillips, P. and Ashby, M. (eds.) (2010) Oxford advanced learner’s dictionary of current English. 8th ed. Oxford: Oxford University Press
• Hyland, K. and Tse, P. (2007) Is There an “Academic Vocabulary”? TESOL Quarterly, 41 (2): 235–253
• Krishnamurthy, R. and Kosem, I. (2007) Issues in creating a corpus for EAP pedagogy and research. Journal of English for Academic Purposes, 6 (4): 356–373
• Laufer, B. (1992) “How much lexis is necessary for reading comprehension?” In Arnaud, P.J.L. and Bejoint, H. (eds.) Vocabulary and applied linguistics. Houndmills, Basingstoke: Macmillan Academic and Professional. pp. 126–132
• Law, J. and Martin, E.A. (2009) A Dictionary of Law. 7th ed. [online]. Oxford University Press. Available from: http://www.oxfordreference.com/view/10.1093/acref/9780199551248.001.0001/acref-9780199551248 [Accessed 31 October 2014]
• Liu, N. and Nation, I.S.P. (1985) Factors affecting guessing vocabulary in context. RELC journal, 16 (1): 33–42
• Lynn, R.W. (1973) Preparing word lists: A suggested method. RELC journal, 4 (1): 25–32
• McEnery, T. and Hardie, A. (2012) Corpus linguistics: method, theory and practice. Cambridge textbooks in linguistics. Cambridge; New York: Cambridge University Press
• Miller, D. (2011) ESL reading textbooks vs. university textbooks: Are we giving our students the input they may need. Journal of English for Academic Purposes, 10 (1): 32–46
• Moudraia, O. (2003) “The student engineering corpus: analysing word frequency.” In Archer, D., Rayson, P., Wilson, A., et al. (eds.) Proceedings of the corpus linguistics 2003 conference. Lancaster: UCREL, Lancaster University. pp. 552–561

Page 37:

References

• Mudraya, O. (2006) Engineering English: A lexical frequency instructional model. English for Specific Purposes, 25 (2): 235–256
• Nation, I.S.P. (2013) Learning vocabulary in another language. The Cambridge applied linguistics series. Second Edition. Cambridge: Cambridge University Press
• Nation, P. and Waring, R. (1997) “Vocabulary size, text coverage and word lists.” In Schmitt, N. and McCarthy, M. (eds.) Vocabulary: Description, Acquisition and Pedagogy. Cambridge University Press. pp. 6–19
• Nurweni, A. and Read, J. (1999) The English language knowledge of Indonesian university students. English for Specific Purposes, 18 (2): 161–175
• PDF2TXT (2007). VeryPDF
• Richards, J.C. (1974) Word list: Problems and prospects. RELC, 5 (2): 69–84
• Salager, F. (1984) The English of medical literature research project. English for Specific Purposes, 87 (5 July)
• Sinclair, J. (1991) Corpus, concordance, collocation. Describing English language. Oxford: Oxford University Press
• Sinclair, J. (2005) “Corpus and text--Basic principles.” In Wynne, M. (ed.) Developing linguistic corpora: a guide to good practice. AHDS guides to good practice. Oxford [U.K.]: Oxbow. p. Chap. 1
• Valipouri, L. and Nassaji, H. (2013) A corpus-based study of academic vocabulary in chemistry research articles. Journal of English for Academic Purposes, 12 (4): 248–263
• Ward, J. (1999) How Large a Vocabulary Do EAP Engineering Students Need? Reading in a Foreign Language, 12 (2): 309–323
• West, M. (1953) A General Service List of English Words. Longman
• Williams, J. (2013) LEAP Advanced Reading and Writing Student Book with CW+. Pearson Education, Limited
• Xue, G. and Nation, I.S.P. (1984) A University Word List. Language Learning and Communication, 3 (2): 215–229
• Yang, H. (1986) A new technique for identifying scientific/technical terms and describing science texts. Literary and linguistic computing, 1 (2): 93–103