
Comparing TOEFL iBT™ Speaking Tasks with real-world academic speaking tasks: Tasks, performance features and assessment

Progress Report 3

Annie Brown and

Ana Maria Ducasse

8th June, 2016

Overview

The following tasks and deliverables were scheduled for the period leading up to Progress Report 3.

1. Analysis of moderation session data (RQ1)
2. Analysis of high-level international student sample, native speaker samples, and TOEFL samples (RQ2)
3. Analysis of mid- and low-level international student samples (RQ3)

All data analyses have now been completed and work is underway on the preparation of manuscripts. Because of the range of issues addressed in the research questions and the breadth of data collected and analysed, we anticipate preparing more than one journal article in the coming months.

The first data to be analysed were the tutor moderation session data. Inter-coder agreement was 83% for the division into idea units and 82% for coding. The coding categories were developed through a review of a sub-set of idea units.

The analysis of the student data was quite time-consuming, as we had decided that, following segmentation (division into idea units), all student data would be double coded and that all disagreements would be resolved through discussion. This was done iteratively, with inter-coder reliability improving as a result.

Word counts for the data sets were as follows:

RQ2 high university presentations: average word count = 644 words
RQ2 TOEFL presentations: average word count = 141 words
RQ3 low and mid university presentations: average word count = 567 words
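For illustration, the sketch below shows how per-set averages of this kind can be computed from transcript files. The directory layout and file names are hypothetical and not part of the project's tooling.

```python
# A minimal sketch (hypothetical file layout) of computing the average
# word count per data set from plain-text transcripts.
from pathlib import Path

def average_word_count(transcript_dir: str) -> float:
    """Mean number of whitespace-delimited words per transcript file."""
    counts = [len(f.read_text(encoding="utf-8").split())
              for f in Path(transcript_dir).glob("*.txt")]
    return sum(counts) / len(counts) if counts else 0.0

# Directory names are illustrative only.
for label, folder in [
    ("RQ2 high university presentations", "rq2_university_high"),
    ("RQ2 TOEFL presentations", "rq2_toefl"),
    ("RQ3 low and mid university presentations", "rq3_university_low_mid"),
]:
    print(f"{label}: average word count = {average_word_count(folder):.0f}")
```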

Inter-coder agreement for segmentation was 92.3% for the TOEFL data and 88% for the university presentations. Agreement on assignment to the coding categories (based on those identified in the pilot study) was 84% for the TOEFL presentations and 78% for the university presentations.
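The figures above are simple percentage agreement. The following sketch illustrates the underlying computation, assuming the two coders' decisions are aligned in parallel lists; the labels shown are illustrative only.

```python
def percentage_agreement(coder_a: list, coder_b: list) -> float:
    """Proportion of units on which two coders made the same decision,
    expressed as a percentage."""
    if len(coder_a) != len(coder_b):
        raise ValueError("Both coders must have judged the same units.")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100 * matches / len(coder_a)

# Illustrative example: two coders labelling six idea units.
coder_a = ["Background", "Purpose", "Evaluation", "Contrast", "Temporal", "Summary"]
coder_b = ["Background", "Purpose", "Explanation", "Contrast", "Temporal", "Summary"]
print(f"{percentage_agreement(coder_a, coder_b):.1f}%")  # 83.3%
```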

Final coding categories for the RST analysis were as follows:

Background: Background, Circumstance
Condition: Condition, Hypothetical, Contingency
Cause-result: Result, Consequence
Purpose: Purpose
Explanation: Evidence, Explanation, Reason
Manner-means: Manner, Means
Evaluation: Evaluation, Interpretation, Conclusion, Comment
Comparison: Comparison, Preference
Contrast: Contrast, Concession, Antithesis
Topic-Comment: Problem-solution, Question-Answer, Rhetorical-question
Elaboration: Elaboration-additional, Elaboration-general-specific, Elaboration-process-step, Elaboration-object-attribute, Elaboration-set-member, Example, Definition
Temporal: Temporal, Sequence
Summary: Preparation, Heading, Summary, Restatement
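For reference, the grouping of fine-grained RST relations into the final (collapsed) categories listed above can be represented as a simple lookup table. The sketch below is a convenience representation only, not part of the project's actual analysis tooling.

```python
# Final RST coding categories and their constituent fine-grained relations,
# as listed above.
FINAL_CATEGORIES = {
    "Background": ["Background", "Circumstance"],
    "Condition": ["Condition", "Hypothetical", "Contingency"],
    "Cause-result": ["Result", "Consequence"],
    "Purpose": ["Purpose"],
    "Explanation": ["Evidence", "Explanation", "Reason"],
    "Manner-means": ["Manner", "Means"],
    "Evaluation": ["Evaluation", "Interpretation", "Conclusion", "Comment"],
    "Comparison": ["Comparison", "Preference"],
    "Contrast": ["Contrast", "Concession", "Antithesis"],
    "Topic-Comment": ["Problem-solution", "Question-Answer", "Rhetorical-question"],
    "Elaboration": ["Elaboration-additional", "Elaboration-general-specific",
                    "Elaboration-process-step", "Elaboration-object-attribute",
                    "Elaboration-set-member", "Example", "Definition"],
    "Temporal": ["Temporal", "Sequence"],
    "Summary": ["Preparation", "Heading", "Summary", "Restatement"],
}

# Invert the mapping so a fine-grained relation can be collapsed directly.
RELATION_TO_CATEGORY = {
    relation: category
    for category, relations in FINAL_CATEGORIES.items()
    for relation in relations
}

print(RELATION_TO_CATEGORY["Concession"])  # Contrast
```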

As attachments to this report, we include two conference presentations (abstract and slides), based on RQ1 and RQ2 respectively.

Attachment 1

Comparing TOEFL iBT speaking criteria and assessment practices for marking oral presentations by international and local students in three faculties

Paper presented at the 2nd Annual Conference of the Asian Association for Language Assessment, Bangkok, Thailand, April 2015.

The integration of problem-based learning and graduate capabilities such as teamwork and oral communication into the undergraduate curriculum has heightened the importance of spoken language in academic classrooms and academic assessments. The criteria used to assess these classroom-based academic oral assessment tasks are likely to be broader than those used to assess high-stakes language tests such as the TOEFL (cf. McNamara's [1996] distinction between strong and weak tests, and Jacoby and McNamara's [1999] concept of indigenous criteria). Given the increasing role of spoken language in assessed academic performance, research that investigates how tutors assess the academic speaking performance of local and international students on classroom tasks in different disciplines can increase understanding of what constitutes acceptable oral communication at undergraduate level. This study therefore focuses on oral presentation tasks in three disciplines and reports on similarities and differences in the characteristics of speaking that raters focus on in the different subject domains. The research compares (a) the criteria used in the TOEFL and at university across both types of students, and examines (b) whether raters focus on the same features for local and international students.

The samples of performances by local and international students were gathered from three faculties: Health Science, Business, and Science. Students were video-recorded performing an in-class oral presentation task. Each performance was assessed by two tutors: the student’s group tutor and another tutor from the same subject. Next, the tutors were recorded while participating in paired moderation sessions, where they applied and justified their assessments after watching and marking videos of the students from their discipline. The transcriptions of the rater discussions provide the data for a content analysis of the rater focus while applying indigenous assessment criteria to the performances by local and international students. The results hold implications for the relevance of assessment criteria in large-scale assessments and for the development of explicit marking criteria and specific activities to help university students with speaking assessment tasks.

Jacoby, S. and McNamara, T.F. (1999). Locating competence. English for Specific Purposes, 18(3), 213-241.

McNamara, T.F. (1996). Measuring Second Language Performance. London and New York: Longman.


Attachment 2

Comparing TOEFL iBT™ Speaking Tasks with real-world academic speaking tasks

Paper presented at the 3rd Annual Conference of the Asian Association for Language Assessment, Sanur, Bali, Indonesia, May 19-21, 2016.

The integration of problem-based learning into the tertiary curriculum over the last few years, together with the need for university graduates to enter the workforce with highly developed communication skills, has heightened the importance of spoken language in academic classrooms. As a result, graduate capabilities such as teamwork and oral communication are increasingly incorporated into the undergraduate curriculum, and students are increasingly required to demonstrate their learning in oral communication tasks. This study investigates the relationship between TOEFL speaking tasks and real-world academic oral assessment tasks. The study is designed to address the extrapolation inference of the TOEFL iBT™ validity argument, namely that the construct of academic language proficiency as assessed by TOEFL iBT™ tasks accounts for the quality of linguistic performance on similar tasks, i.e., monologic presentations, in an academic context. While TOEFL iBT™ speaking tasks have been carefully designed to replicate as far as possible the demands of real-world academic speaking (see Chapelle, Enright & Jamieson, 2008), the extent to which performance on TOEFL speaking tasks predicts performance on real-world speaking tasks cannot be assumed; it is an empirical question that requires research. The study examines the discourse features of performances by thirty local and international students (15 high-achieving international students and 15 local students) on assessed oral presentation tasks in three core first-year subjects across different faculties. These are then compared with the discourse characteristics of ideal performances by six high-achieving candidates on TOEFL iBT™ speaking tasks. The analysis draws on genre theory and rhetorical structure theory, and focuses primarily on identifying the schematic structure of the oral texts, the range of rhetorical functions, and the conceptual relationships that exist between them. While both types of analysis were originally developed for use with written texts, both are now also applied to oral texts. The study is novel in that it takes as its criterion measure performance on assessed oral academic tasks, whereas previous validation studies have taken as their benchmark speaking performance within university-based English language programs, or on non-assessed university-based in-class and out-of-class speaking activities more broadly. In addition to seeking validation evidence for the TOEFL iBT™ Speaking test, specifically in relation to the extrapolation inference, it is argued that the outcomes of the study also have the potential to inform future refinement of the TOEFL iBT™ test tasks.

References:

Chapelle, C.A., Enright, M.K. and Jamieson, J.M. (2008). Building a Validity Argument for the Test of English as a Foreign Language. New York and Oxford: Routledge.

Note: As the RST analysis was not quite finished in time for the conference presentation, and given the level of detail to be presented in relation to the genre analysis alone, we discussed only the genre analysis in the presentation.
