Spaan V8 Kim


Spaan Fellow Working Papers in Second or Foreign Language Assessment
Copyright 2010

Volume 8: 1–30

English Language Institute

University of Michigan

www.lsa.umich.edu/eli/research/spaan


Investigating the Construct Validity of a Speaking Performance Test

Hyun Jung Kim
Teachers College, Columbia University

ABSTRACT

With the increased demand for the integration of a performance component in second language (L2) testing, speaking performance assessments have focused on eliciting examinees' underlying language ability through their actual oral performance on a given task. Given the nature of performance assessments, many factors other than examinees' speaking ability are necessarily involved in the process of evaluation. Compared with the construct definition of speaking ability, however, relatively little attention has been given to tasks, which are regarded as a vehicle for assessment, although there is growing interest in authentic tasks that elicit real-world language samples for evaluation. Thus, the present study investigates whether a speaking placement test provides empirical evidence that the effect of task, as well as examinees' attributes, should be considered in describing speaking ability in a performance assessment. An understanding of the underlying structure of the speaking placement test not only helps to identify the factors involved in the evaluation process and their relationships but ultimately makes it possible to infer examinees' speaking ability appropriately.

In L2 testing, the notion of performance first emerged in the 1960s in response to practical needs, and since then, the demand to integrate examinees' actual performance into L2 assessment has increased (McNamara, 1996). Early testers who advocated the integration of a performance component focused on whether examinees could successfully fulfill a task in a simulated real-life language use context (e.g., Clark, 1975; Jones, 1985; Morrow, 1979; Savignon, 1972). McNamara (1996) classified this approach as a "strong" sense of performance assessment, in which the definition of the L2 ability construct is limited to examinees' task completion.

In contrast, new theories of communicative competence and communicative language ability in the 1980s and 1990s (e.g., Bachman, 1990; Bachman & Palmer, 1996; Canale, 1983; Canale & Swain, 1980) changed not only the perception of L2 ability but also the role of performance in language testing. They supported a "weak" sense of performance assessment (McNamara, 1996), in which the main interest was examinees' language ability rather than task completion. That is, L2 ability was determined based on various language components derived from the theoretical models of communicative competence and communicative language ability. Examinees' actual performance was elicited for the evaluation of language ability; however, the role of performance was limited to serving as a vehicle to elicit examinees' underlying language ability. This approach to performance assessment, called a construct-centered approach (Bachman, 2002), has been widely accepted by L2 testers for most general-purpose language performance assessments (e.g., Brindley, 1994; Fulcher, 2003; Luoma, 2004; McNamara, 1996; Messick, 1994; Skehan, 1998).

While the construct-centered approach to performance assessment gives priority to definitions of L2 ability, a different perspective has recently been proposed. A task-centered approach focuses on what examinees can do with the language, that is, whether they can fulfill a given task (Brown, Hudson, Norris, & Bonk, 2002; Norris, Brown, Hudson, & Yoshioka, 1998). Although this approach provides more systematic criteria for evaluating examinees' task fulfillment than did the early testers who first argued for the integration of performance in language testing, it basically shares their view of what performance assessments aim to measure (i.e., the strong version of performance assessment). According to the task-centered approach, test contexts or tasks play a crucial role in measuring L2 ability because examinees' performance is evaluated against real-world criteria.

The two approaches to performance assessment appear to be contradictory in nature. Chapelle (1998), however, argued from an interactionalist perspective that both construct definitions and tasks should be considered together in defining L2 ability because the two interact during communication. As reviewed, different perspectives on L2 performance assessment have defined language ability differently, each with its own focus. What is important is not which approach is superior, but whether a test is validated before examinees' language ability is inferred from the test results. In other words, before an inference about an examinee's language ability is made from test scores, test developers and users need to establish what the test aims to measure (e.g., various language components, performance on tasks) and whether the test actually measures what it intends to measure.

Even when a test is designed for its intended purpose (e.g., following construct definitions, task characteristics, or both), many other factors need to be considered in L2 performance assessments in order to understand examinees' performance and define their language ability. Examinees' performance may be affected by factors other than their language ability (McNamara, 1996, 1997). McNamara (1995) elaborated a schematic representation (Figure 1), first presented by Kenyon (1992), to conceptualize the performance dimension of L2 speaking performance tests. As the figure shows, examinees' performance in L2 speaking tests is affected by many factors in the testing phase (i.e., candidates, tasks, interlocutors, and their interactions) as well as in the rating phase (i.e., raters and rating scales). Empirical studies have documented the effects of each of these factors on speaking performance test scores: (1) candidate (Lumley & O'Sullivan, 2005; O'Loughlin, 2002); (2) task (Chalhoub-Deville, 1995; Clark, 1988; Elder, Iwashita, & McNamara, 2002; Farris, 1995; Malabonga, Kenyon, & Carpenter, 2005; Shohamy, 1994; Wigglesworth, 1997); (3) interlocutor (Brown, 2003; O'Sullivan, 2002); (4) rater (Barnwell, 1989; Bonk & Ockey, 2003; Brown, 1995; Eckes, 2005; Elder, 1993; Y. Kim, 2009; Lumley, 1998; Lumley & McNamara, 1995; Lynch & McNamara, 1998; Meiron & Schick, 2000; Orr, 2002; Wigglesworth, 1993); and (5) scale/criteria (M. Kim, 2001). It may be impossible to eliminate the effects of these factors on examinees' speaking performance completely. However, it is important to understand the relative contributions of these factors to examinees' performance and test scores in order to estimate examinees' speaking ability more accurately and to interpret and use the test results more appropriately.



[Figure 1 diagram: Candidate; Task (including other candidate); Interlocutor; Performance; Rater; Scale/Criteria; Score]

Figure 1. Interactions in Performance Assessment of Speaking Skills (McNamara, 1995, p. 173)

To sum up, examinees' speaking ability can be inferred only after a test has been validated with respect to its constructs and the other factors involved in the process of evaluation. The focus of previous studies, however, has often been limited to the effects of individual factors on examinees' test performance. In other words, speaking performance tests have not been examined within a broader framework in which various factors (e.g., examinees' language ability, tasks, and rating criteria) interact with one another. Moreover, performance tests, especially those that do not involve high stakes, are often used without such validation. To address this gap, the current study explores the nature of a speaking placement test that has been used locally in a community English program. To determine whether the speaking test accurately measures speaking ability as intended, the study investigates the underlying structure of the test; that is, it examines whether the hypothesized components of speaking ability (reflected in the scoring rubric) actually function as the operationalized constructs of the test. In addition, to better explain how the test works, the study investigates the effects of other variables, such as rater perceptions and task characteristics. In this way, factors that can affect speaking performance are considered in addition to issues of construct definition.

Research Questions

The current study addresses the following three research questions: (1) What is the factorial structure of the speaking test? (2) To what extent does the speaking test measure the hypothesized constructs of speaking ability? (3) In addition to the measured variables, to what extent do other factors (i.e., raters and tasks) contribute to examinees' speaking performance?



Method

Context of the Current Study

The Community English Program (CEP) is an English as a second language (ESL) program offered by the Teaching English to Speakers of Other Languages (TESOL) and applied linguistics programs at Teachers College. The program targets adult ESL learners who wish to improve their communicative language ability. Accordingly, the CEP curriculum emphasizes not only the various language components (grammar, vocabulary, and pronunciation) but also the different language skills (listening, speaking, reading, and writing). To facilitate effective teaching and learning, all new students are placed into one of 12 proficiency levels based on the results of a placement test consisting of five sections (i.e., listening, grammar, reading, writing, and speaking).

Most CEP teachers are MA students in the TESOL and applied linguistics programs; that is, they are student teachers practicing ESL classroom teaching. Their classes are therefore regularly observed by faculty and colleagues, and follow-up feedback sessions are provided throughout the semester. The teachers also serve as raters of the writing and speaking placement tests. Through this rating experience, they not only become familiar with CEP students' writing and speaking ability levels but also gain hands-on experience in evaluating ESL learners' writing and speaking ability. The CEP thus functions as a teacher education program as well as an adult ESL program.

Participants

Participants in the current study were 215 incoming CEP students who took the CEP speaking placement test. The majority of students in the program were adult immigrants from the surrounding neighborhood or family members of international students in the Columbia University community. Female students (73%) far outnumbered male students (27%). With regard to first language, 70% of the participants spoke one of three languages: Japanese (36%), Korean (19%), or Spanish (15%). With regard to length of residence, most participants (79%) reported having been in English-speaking countries, including the United States, for fewer than three years: less than 6 months (40%), 6 months to 1 year (19%), and 1 to 3 years (20%). As for their motivation for studying English, many participants reported academic and job-related reasons, while over 50 percent gave priority to communicating with friends.

Instruments

The instruments used in the current study were the CEP speaking placement test and an analytic scoring rubric. The speaking test was designed to measure speaking ability in various real-life language use situations. The test had six tasks: complaining about a catering service (Task 1), talking about a favorite movie (Task 2), narrating a story based on a sequence of pictures (Task 3), refusing a request from a landlord (Task 4), summarizing a radio commentary (Task 5), and summarizing a lecture (Task 6). The first three tasks (Tasks 1, 2, and 3) were independent-skills tasks, which required examinees to draw on their background knowledge to perform the tasks. The last three tasks (Tasks 4, 5, and 6) were integrated-skills tasks, which also required examinees to use their listening skills: examinees listened to long or short passages provided as part of the tasks and then formulated responses based on the content of the passages.

The speaking test was a semi-direct, computer-delivered test; that is, there was no interaction between an examinee and an interlocutor. Instead, examinees listened to pre-recorded instructions and prompts delivered by a computer and then recorded their responses. The six tasks and the test format for each task (e.g., preparation time, response time) are given in Appendix A.

An analytic scoring rubric consisting of five rating scales (see Appendix B) was used to score the examinees' recorded oral responses. The five scales were meaningfulness, grammatical competence, discourse competence, task completion, and intelligibility. Each scale was scored on a six-point scale, from 0 (no control) to 5 (excellent control). To analyze each scale in relation to the different tasks, the five scales for each of the six tasks were treated as individual items, yielding a total of 30 items (6 tasks × 5 rating scales). Each cell in Table 1 thus represents one item of the test; for instance, the item MeanT1 represents meaningfulness for Task 1, while MeanT2 represents meaningfulness for Task 2.
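The crossing of tasks and rating scales that yields the 30 items can be sketched in Python as follows. This is an illustrative sketch, not part of the original study; the constant names and the helper `build_item_labels` are hypothetical, and the label prefixes are taken from Table 1.

```python
from itertools import product

# Rating-scale prefixes as used in Table 1 (hypothetical constant names)
SCALES = ["Mean", "Gram", "Disc", "Task", "Intel"]
TASKS = range(1, 7)  # Tasks 1-6

def build_item_labels(scales=SCALES, tasks=TASKS):
    """Return one label per (rating scale, task) cell, e.g. 'MeanT1'."""
    return [f"{scale}T{task}" for scale, task in product(scales, tasks)]

items = build_item_labels()
print(len(items))   # 30
print(items[:3])    # ['MeanT1', 'MeanT2', 'MeanT3']
```

Each rating scale therefore contributes six items (one per task), and each task contributes five items (one per scale).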

Table 1. Taxonomy of Items (Task × Rating Scale) on Speaking Ability

Rating scale             Nr of items   Task 1    Task 2    Task 3    Task 4    Task 5    Task 6
Meaningfulness           6             MeanT1    MeanT2    MeanT3    MeanT4    MeanT5    MeanT6
Grammatical competence   6             GramT1    GramT2    GramT3    GramT4    GramT5    GramT6
Discourse competence     6             DiscT1    DiscT2    DiscT3    DiscT4    DiscT5    DiscT6
Task completion          6             TaskT1    TaskT2    TaskT3    TaskT4    TaskT5    TaskT6
Intelligibility          6             IntelT1   IntelT2   IntelT3   IntelT4   IntelT5   IntelT6
Total                    30

Procedures

Test Administration

The speaking test was administered in a computer lab on the second day of a two-day placement test administration, to groups of approximately 40 students. Each student was seated in front of a computer, listened to the test instructions on a headset, read the instructions on the computer screen, and recorded responses to the test items using a microphone. Because all computers were controlled from a central console, the examinees proceeded at the same pace: the instructions and prompts were delivered at the same time, and the preparation and response times were provided to all examinees simultaneously.

Before the actual test began, the examinees were asked to complete a background survey that asked for demographic information, prior English-learning experience, and plans for future study. Once all examinees in a group had completed the survey, they were given a practice task to familiarize them with the test format. After a short intermission for any questions about the test format, the six tasks were played in sequence. For each task, the examinees first listened to or looked at the instructions and a prompt, then prepared their responses during a short preparation time, and finally recorded their responses during the given response time.

Scoring

Each examinee's performance was scored by two independent raters. The raters were CEP teachers, most of whom were MA or EdD students in the TESOL and applied linguistics programs at Teachers College. Prior to the actual rating, the raters attended a norming session in which the test tasks and the rubric were introduced and sample responses were provided for practice. Time was also given for discussion of analytic scores so that the raters could monitor their decision-making processes by comparing the rationale behind their scores with other raters' opinions. Rating practice and discussion continued until the raters felt familiar with the tasks and confident in assigning scores on the different rating scales. Following the norming session, each rater was assigned a certain number of examinees. Since each examinee's performance on each of the six tasks was scored on the five rating scales, each examinee received 30 analytic ratings, one per item. The maximum score for each item was 5 and the minimum was 0. The scores assigned by the two independent raters were then averaged to determine a speaking score for the placement test.
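A minimal sketch of the score aggregation described above, assuming each rater's ratings are stored as a dictionary mapping item labels to 0–5 scores. The paper states only that the two raters' scores were averaged; treating the placement score as the mean of the averaged item scores is an assumption (it is consistent with the 0–5 metric of the total reported in Table 2), and both helper functions are hypothetical.

```python
def average_raters(rater1, rater2):
    """Average two raters' item-level ratings (dict: item label -> score 0-5)."""
    if rater1.keys() != rater2.keys():
        raise ValueError("Both raters must score the same items")
    for score in list(rater1.values()) + list(rater2.values()):
        if not 0 <= score <= 5:
            raise ValueError(f"Rating out of range: {score}")
    return {item: (rater1[item] + rater2[item]) / 2 for item in rater1}

def speaking_score(averaged):
    """Hypothetical composite: mean of the averaged item scores (0-5 metric)."""
    return sum(averaged.values()) / len(averaged)

# Toy ratings for three of the 30 items (invented for illustration)
r1 = {"MeanT1": 3, "GramT1": 2, "DiscT1": 3}
r2 = {"MeanT1": 4, "GramT1": 2, "DiscT1": 2}
avg = average_raters(r1, r2)
print(avg)                   # {'MeanT1': 3.5, 'GramT1': 2.0, 'DiscT1': 2.5}
print(speaking_score(avg))   # about 2.67
```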

Analyses

The data were analyzed using SPSS version 12.0 (SPSS Inc., 2001) and EQS version 6.1 (Bentler & Wu, 2005). Descriptive statistics (i.e., means, standard deviations, maximum/minimum raw scores, and skewness and kurtosis values) were calculated in SPSS for the entire test, for the 30 individual items, and for each of the five rating scales across the six tasks, in order to examine central tendency and variability. Internal consistency reliability was then estimated with Cronbach's alpha for the 30 items as a whole and for the six items under each of the five rating scales. In addition, the degree of agreement between the two raters (i.e., inter-rater reliability) was examined at several levels: the examinees' total score, each of the six tasks, and each of the five rating scales. Since the composite scores were treated as interval data derived from the original ordinal ratings, inter-rater reliability was estimated with Pearson product-moment correlations.
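For reference, Cronbach's alpha for k items is α = k/(k − 1) · (1 − Σ s_i² / s_X²), where s_i² is the variance of item i and s_X² is the variance of the total score. The sketch below implements that formula in plain Python; it is illustrative only (the study used SPSS), the function name is hypothetical, and the toy data are invented.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha; `rows` holds one list of k item scores per examinee."""
    k = len(rows[0])
    if k < 2 or any(len(row) != k for row in rows):
        raise ValueError("Each row needs the same number (>= 2) of item scores")
    item_variances = [variance(column) for column in zip(*rows)]  # s_i^2 per item
    total_variance = variance([sum(row) for row in rows])          # s_X^2 of totals
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Toy data: four examinees, three items (invented for illustration)
scores = [[2, 3, 3],
          [4, 4, 5],
          [3, 3, 4],
          [5, 5, 5]]
print(round(cronbach_alpha(scores), 3))   # 0.957
```

Because alpha depends only on the ratio of variances, using sample variances (n − 1) for both the items and the total gives the same result as using population variances.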

After the descriptive statistics and reliability estimates were calculated, exploratory factor analyses (EFA) were conducted to determine the extent to which the 30 items clustered together, that is, what patterns of correlations would be observed among the 30 items. The appropriateness of the correlation matrix for factor analysis was first verified using three indicators: (1) Bartlett's test of sphericity, (2) the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy, and (3) the determinant of the correlation matrix. Initial factors were then extracted from the correlation matrix by principal-axis factoring (PAF) and rotated, and solutions were compared to determine the number of underlying factors. Since the factors were assumed to be correlated with one another, a direct oblimin rotation was used, and the factor correlation matrix was checked for each solution.
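The three checks on the correlation matrix can be sketched with NumPy as below, assuming R is a p × p correlation matrix computed from n examinees. Bartlett's statistic is χ² = −(n − 1 − (2p + 5)/6) ln|R| with p(p − 1)/2 degrees of freedom, and the overall KMO compares squared correlations with squared partial (anti-image) correlations. This is a sketch rather than the SPSS implementation, and the function names are hypothetical.

```python
import numpy as np

def bartlett_sphericity(R, n):
    """Bartlett's chi-square for H0: the population correlation matrix is an identity."""
    p = R.shape[0]
    det = np.linalg.det(R)                 # must be positive (third check)
    if det <= 0:
        raise ValueError("Correlation matrix is singular or not positive definite")
    chi_square = -(n - 1 - (2 * p + 5) / 6) * np.log(det)
    df = p * (p - 1) // 2
    return chi_square, df

def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    inv = np.linalg.inv(R)
    scale = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(scale, scale)      # anti-image (partial) correlations
    off_diag = ~np.eye(R.shape[0], dtype=bool)   # ignore the diagonal
    r2 = np.sum(R[off_diag] ** 2)
    a2 = np.sum(partial[off_diag] ** 2)
    return r2 / (r2 + a2)

# Toy 3 x 3 matrix with all correlations equal to 0.5 (invented)
R = np.array([[1.0, 0.5, 0.5],
              [0.5, 1.0, 0.5],
              [0.5, 0.5, 1.0]])
chi_square, df = bartlett_sphericity(R, n=215)
print(round(chi_square, 2), df)   # 147.06 3
print(round(kmo(R), 3))           # 0.692
```

The chi-square is compared with a chi-square distribution on df degrees of freedom; a significant value argues against an identity matrix, that is, there are correlations worth factoring.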

Finally, confirmatory factor analyses (CFA) were performed to establish a model of the speaking test. CFA was used to determine the extent to which the 30 items were measured in relation to the six tasks and the five scoring criteria. Based on a review of the literature, a second-order multitrait-multimethod (MTMM) model was hypothesized first. When the hypothesized model failed to yield an appropriate solution, several other CFA models were tested to find the model that best explained the data. To assess the adequacy of the models, including the hypothesized model, several fit indices were used: the chi-square statistic, the chi-square/df ratio, the comparative fit index (CFI), and the root mean-square error of approximation (RMSEA). In addition, the distribution of standardized residuals was checked, and the results of the Lagrange Multiplier test and the Wald test were examined on each run to identify parameters that should be added to or dropped from a model. In the end, however, the final speaking test model was chosen on substantive grounds while taking parsimony into account. Because the data were multivariate non-normal, the robust maximum likelihood (ML, Robust) method was used in every run.
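The point estimates of the fit indices named above can be sketched as follows. These are the standard maximum likelihood formulas; the robust indices EQS reports under ML, Robust are based on a scaled chi-square, which this sketch does not reproduce, and some programs use N rather than N − 1 in the RMSEA denominator. The function names are hypothetical.

```python
import math

def chi_square_df_ratio(chi_square, df):
    """Chi-square divided by its degrees of freedom."""
    return chi_square / df

def rmsea(chi_square, df, n):
    """Root mean-square error of approximation (point estimate)."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))

def cfi(chi_model, df_model, chi_null, df_null):
    """Comparative fit index relative to the independence (null) model."""
    d_model = max(chi_model - df_model, 0.0)
    d_null = max(chi_null - df_null, d_model)
    return 1.0 if d_null == 0 else 1.0 - d_model / d_null

# Illustrative (made-up) values, not results from the study
print(round(rmsea(300.0, 100, 215), 3))        # 0.097
print(round(cfi(300.0, 100, 5000.0, 276), 3))  # 0.958
```

Both indices depend on the excess of chi-square over its degrees of freedom, so a model whose chi-square does not exceed its df receives an RMSEA of 0 and a CFI of 1.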

Results

Descriptive Statistics

The descriptive statistics calculated at the item level, the rating-scale level, and for the entire 30-item test are presented in Table 2. The item-level means ranged from 2.64 to 3.41 and the standard deviations from 1.01 to 1.57. Although the differences were small, the grammar-related items (GramT1 to GramT6) and the discourse-related items (DiscT1 to DiscT6) had lower means (scale means of 2.87 and 2.86, respectively) than the other groups of items, whereas the task completion-related items (TaskT1 to TaskT6) showed relatively higher means. The grammar scale had the least variability (Std. = 1.04), while the task completion scale had the largest (Std. = 1.23). With regard to tasks, the Task 6 items (MeanT6, GramT6, DiscT6, TaskT6, and IntelT6) had the lowest mean under every rating scale. At the same time, their standard deviations were the largest under each rating scale except task completion, where TaskT4 (1.57) varied more than TaskT6 (1.48). All skewness and kurtosis values were within the acceptable range (absolute values no greater than 1.16), indicating that all 30 items and the five rating scales were approximately normally distributed.
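Skewness and kurtosis can be computed as below. The sketch assumes the bias-adjusted estimators that SPSS reports (G1 for skewness and G2 for excess kurtosis, both built from central moments); the function name is hypothetical.

```python
import math

def skewness_kurtosis(xs):
    """Adjusted sample skewness (G1) and excess kurtosis (G2)."""
    n = len(xs)
    if n < 4:
        raise ValueError("Need at least four observations")
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n   # central moments
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    if m2 == 0:
        raise ValueError("Scores have no variance")
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    G1 = g1 * math.sqrt(n * (n - 1)) / (n - 2)                 # small-sample adjustment
    G2 = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))   # small-sample adjustment
    return G1, G2

print(skewness_kurtosis([1, 2, 3, 4, 5]))   # approximately (0.0, -1.2)
```

Negative skewness values such as those in Table 2 indicate that scores cluster toward the upper end of the 0–5 scale with a longer tail toward low scores.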

Reliability Analyses

Internal consistency reliability estimates were calculated for the five rating scales and for the entire test (see Table 3). The estimate for the entire test was very high (0.991), signifying a high degree of homogeneity among the 30 items. The estimates for the individual rating scales were also high, ranging from 0.936 to 0.963, which suggests that within each rating scale the six tasks measured the same construct with a high degree of consistency.



Table 2. Descriptive Statistics (N=215, K=30)

Variable Minimum Maximum Mean Std. Skewness Kurtosis

1. Meaningfulness (Mean) 0 5.00 3.10 1.14 -.87 .30

MeanT1 0 5.00 3.12 1.28 -.81 .24

MeanT2 0 5.00 3.12 1.18 -.93 .65

MeanT3 0 5.00 3.20 1.11 -.82 .60

MeanT4 0 5.00 3.10 1.30 -.84 -.07

MeanT5 0 5.00 3.20 1.24 -.91 .27

MeanT6 0 5.00 2.85 1.38 -.66 -.45

2. Grammar (Gram) 0 4.58 2.87 1.04 -.95 .52

GramT1 0 5.00 2.83 1.15 -.84 .46

GramT2 0 4.50 2.87 1.04 -1.11 1.16

GramT3 0 4.50 2.92 1.01 -.92 .83

GramT4 0 5.00 2.93 1.20 -.97 .35

GramT5 0 5.00 2.93 1.12 -.92 .46

GramT6 0 4.50 2.73 1.27 -.79 -.29

3. Discourse Competence (Disc) 0 4.50 2.86 1.08 -.91 .33

DiscT1 0 5.00 2.83 1.21 -.78 .17

DiscT2 0 5.00 2.84 1.09 -.91 .59

DiscT3 0 5.00 2.95 1.05 -.83 .75

DiscT4 0 5.00 2.93 1.26 -.85 -.04

DiscT5 0 5.00 2.97 1.18 -.84 .17

DiscT6 0 5.00 2.64 1.32 -.63 -.42

4. Task Completion (Task) 0 5.00 3.18 1.23 -.86 .08

TaskT1 0 5.00 3.07 1.41 -.52 -.44

TaskT2 0 5.00 3.36 1.33 -1.00 .32

TaskT3 0 5.00 3.41 1.23 -.92 .42

TaskT4 0 5.00 3.04 1.57 -.56 -1.01

TaskT5 0 5.00 3.37 1.41 -.82 -.25

TaskT6 0 5.00 2.86 1.48 -.52 -.73

5. Intelligibility (Intel) 0 4.92 3.02 1.09 -.92 .53

IntelT1 0 5.00 2.99 1.20 -.89 .50

IntelT2 0 5.00 3.00 1.15 -.96 .73

IntelT3 0 5.00 3.09 1.05 -.93 .86

IntelT4 0 5.00 3.07 1.25 -.90 .20

IntelT5 0 5.00 3.10 1.16 -.91 .67

IntelT6 0 5.00 2.89 1.30 -.75 -.16

Total (30 items) 0 4.73 3.01 1.10 -.94 .43




Table 3. Reliability Estimates (N=215)

Construct                Items Used          Nr of Items   Reliability Estimate
Meaningfulness           MeanT1 - MeanT6     6             0.960
Grammatical Competence   GramT1 - GramT6     6             0.963
Discourse Competence     DiscT1 - DiscT6     6             0.958
Task Completion          TaskT1 - TaskT6     6             0.936
Intelligibility          IntelT1 - IntelT6   6             0.963
Total                                        30            0.991

Although the average of the two raters' scores was used for the statistical analyses, inter-rater reliability was calculated to determine the degree of agreement between the two raters. The correlation between Rater 1 and Rater 2 was 0.837 for the examinees' total score (see Table 4), 0.71 to 0.81 across the six tasks (see Table 5), and 0.78 to 0.82 across the five rating scales (see Table 6). All correlations were significant at the 0.01 level, indicating that the first rater's scores on each task, each rating scale, and the entire test were significantly correlated with the second rater's scores on the same task, rating scale, and entire test. Together with the fairly high magnitude of the coefficients, these results suggest that the two raters scored the examinees' speaking with similar criteria in mind.
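The inter-rater coefficients are Pearson correlations between the two raters' score vectors. The sketch below computes r and the t statistic used to test H0: ρ = 0, namely t = r √((n − 2)/(1 − r²)) with n − 2 degrees of freedom; the helper names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("Need two equal-length lists with at least two scores")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("A rater's scores have no variance")
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic (df = n - 2) for testing H0: rho = 0."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# The lowest task-level coefficient reported for N = 215
print(round(t_statistic(0.71, 215), 1))   # 14.7
```

Even the lowest task-level coefficient yields t ≈ 14.7 on 213 degrees of freedom, far beyond the 0.01 critical value. With N = 215, significance alone therefore says little, and the magnitude of the coefficients is the more informative evidence of rater agreement.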

Table 4. Inter-rater Reliability for the Entire Speaking Test (N = 215)

Rater 1 (TotR1) Rater 2 (TotR2)

Rater 1 (TotR1) 1.00 0.837**

Rater 2 (TotR2) 0.837** 1.00

**p < 0.01 (2-tailed), R1 = Rater 1, R2 = Rater 2

Table 5. Inter-rater Reliability across Six Tasks (N = 215)

Task     Rater 1 vs. Rater 2
Task 1   0.80** (T1R1, T1R2)
Task 2   0.75** (T2R1, T2R2)
Task 3   0.71** (T3R1, T3R2)
Task 4   0.81** (T4R1, T4R2)
Task 5   0.80** (T5R1, T5R2)
Task 6   0.80** (T6R1, T6R2)

**p < 0.01 (2-tailed); T1–T6: Task 1–Task 6; R1 = Rater 1, R2 = Rater 2



Table 6. Inter-rater Reliability across the Five Constructs (N = 215)

Construct                Rater 1 vs. Rater 2
Meaningfulness           0.78** (MR1, MR2)
Grammatical Competence   0.82** (GR1, GR2)
Discourse Competence     0.80** (DR1, DR2)
Task Completion          0.80** (TR1, TR2)
Intelligibility          0.81** (IR1, IR2)

**p < 0.01 (2-tailed); M: Meaningfulness, G: Grammatical Competence, D: Discourse Competence, T: Task Completion, I: Intelligibility; R1 = Rater 1, R2 = Rater 2

Results of Exploratory Factor Analysis

Once the appropriateness of the correlation matrix for factor analysis had been verified (e.g., a significant chi-square for Bartlett's test and a positive determinant of the correlation matrix), an EFA was conducted as a preliminary step to a CFA in order to develop a factor structure for the 30 observed variables. The initial factor extraction yielded a result very different from the hypothesized design of speaking ability, which assumed five underlying factors (i.e., the five rating scales). Only two factors had eigenvalues greater than 1.0, and together they accounted for 83.7 percent of the variance. All variable communalities were above 0.7, indicating that a large proportion of each variable's variance was accounted for by the common factors. The scree plot also suggested extracting two factors. Because the number of factors obtained from the initial extraction differed so much from the hypothesis, solutions with different numbers of factors were compared, and the three-factor oblique solution was judged the best in terms of parsimony (see Table 7). As Table 7 shows, the 30 items clustered by task rather than by rating scale: items for Tasks 1, 2, and 3 loaded on Factor 1, items for Task 6 loaded on Factor 2, and items for Tasks 4 and 5 loaded on Factor 3. For example, all five Task 6 items (MeanT6, GramT6, DiscT6, TaskT6, and IntelT6) had loadings between 0.804 and 0.962 on Factor 2. The loadings on Factor 3 are negative; because the sign of a rotated factor is arbitrary, this reflects the orientation of the factor rather than an inverse relationship.

Further analysis of the six tasks revealed a possible reason why the items clustered around task-type factors rather than around the rating scales. Since Tasks 1, 2, and 3 required examinees to speak with minimal input, the factor on which the items for these three tasks loaded was interpreted as a Speak factor. Unlike Tasks 1, 2, and 3, Tasks 4 and 5 first required examinees to listen to a long message and then respond to or summarize it. Thus, Factor 3, which included the items for Tasks 4 and 5, was labeled a Listen and Speak factor. While Task 6 was a summary task (as was Task 5), it appeared to require topical knowledge in the process of listening to and summarizing a message; that is, examinees' familiarity with the topic could help them approach the task. Whereas Task 5 was about a topic (an electric car) that might be more commonly discussed in everyday life, the listening prompt in Task 6 was a lecture with highly specialized content (the Barbizon School). Thus, Factor 2 was labeled Listen and Speak with Topical Knowledge. In sum, the items did not cluster around the operationalized constructs of speaking ability (i.e., the rating scales), showing that examinees' speaking performance was better explained by task type than by the five hypothesized constructs of speaking ability. Consequently, the two cross-loadings (IntelT3 and GramT5) were not considered problematic: because the factors were defined by task type, grammar and intelligibility could plausibly be involved in any task. The final three-factor solution is presented in Table 8.
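The eigenvalue criterion and principal-axis extraction used in the EFA can be sketched with NumPy as follows, assuming R is the item correlation matrix. `kaiser_count` applies the eigenvalue-greater-than-1.0 rule to the unreduced matrix, and `principal_axis` iterates communalities starting from squared multiple correlations. The direct oblimin rotation is not shown, the names are hypothetical, and SPSS's convergence settings may differ.

```python
import numpy as np

def kaiser_count(R):
    """Count eigenvalues of R above 1.0 and the share of total variance they explain."""
    eigenvalues = np.linalg.eigvalsh(R)[::-1]          # descending order
    k = int(np.sum(eigenvalues > 1.0))
    return k, eigenvalues[:k].sum() / R.shape[0]

def principal_axis(R, n_factors, max_iter=1000, tol=1e-10):
    """Unrotated principal-axis factoring with iterated communalities."""
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))        # start: squared multiple correlations
    for _ in range(max_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)                   # communalities on the diagonal
        values, vectors = np.linalg.eigh(reduced)
        top = np.argsort(values)[::-1][:n_factors]      # largest eigenvalues first
        loadings = vectors[:, top] * np.sqrt(np.clip(values[top], 0.0, None))
        new_h2 = np.sum(loadings ** 2, axis=1)          # updated communalities
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    return loadings, h2

# Exact one-factor correlation matrix with made-up loadings 0.8, 0.7, 0.6, 0.5
lam = np.array([0.8, 0.7, 0.6, 0.5])
R = np.outer(lam, lam)
np.fill_diagonal(R, 1.0)
loadings, h2 = principal_axis(R, n_factors=1)
print(np.round(np.abs(loadings[:, 0]), 3))   # approximately [0.8 0.7 0.6 0.5]
```

The sign of each extracted column is arbitrary, which is why the example compares absolute loadings. The same arbitrariness explains the negative Factor 3 loadings in Table 7.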

Table 7. Pattern Matrix for Speaking Ability

Item       Factor 1   Factor 2   Factor 3
DiscT2      1.015       .054       .171
GramT2       .944       .142       .163
TaskT2       .890       .017       .014
GramT1       .874       .027      -.037
IntelT1      .873      -.019      -.054
DiscT1       .856      -.006      -.061
IntelT2      .852       .139       .069
MeanT2       .847       .135       .037
MeanT1       .837      -.052      -.138
TaskT3       .795      -.090      -.138
GramT3       .764      -.020      -.207
DiscT3       .720      -.030      -.236
MeanT3       .702       .020      -.212
TaskT1       .670       .028      -.190
IntelT3      .585       .015      -.337
MeanT6       .011       .962      -.004
TaskT6      -.044       .933      -.063
DiscT6       .056       .918      -.014
GramT6       .094       .847      -.053
IntelT6      .033       .804      -.143
MeanT4       .076      -.004      -.897
GramT4       .108       .058      -.809
TaskT4      -.057       .125      -.791
IntelT4      .090       .120      -.769
DiscT4       .135       .100      -.741
TaskT5       .066       .253      -.634
IntelT5      .189       .205      -.584
MeanT5       .224       .182      -.573
DiscT5       .288       .133      -.564
GramT5       .319       .179      -.484

Extraction Method: Principal Axis Factoring.
Rotation Method: Oblimin with Kaiser Normalization.
a. Rotation converged in 13 iterations.



Table 8. Revised Taxonomy of Speaking Ability (Based on Exploratory Factor Analysis)

Factor                                  Nr of Items   Task     Items
Speak                                   15            Task 1   MeanT1, GramT1, DiscT1, TaskT1, IntelT1
                                                      Task 2   MeanT2, GramT2, DiscT2, TaskT2, IntelT2
                                                      Task 3   MeanT3, GramT3, DiscT3, TaskT3, IntelT3
Listen & Speak with Topical Knowledge   5             Task 6   MeanT6, GramT6, DiscT6, TaskT6, IntelT6
Listen & Speak                          10            Task 4   MeanT4, GramT4, DiscT4, TaskT4, IntelT4
                                                      Task 5   MeanT5, GramT5, DiscT5, TaskT5, IntelT5
Total                                   30

Results of Confirmatory Factor Analysis

Bachman (2002) argued that a language test should be designed with task characteristics as well as the construct definition of language ability in mind in order to achieve its intended purpose. In an attempt to understand the speaking test of the current study in terms of both aspects (i.e., construct definition and task characteristics), a first MTMM model was hypothesized in which 24 items loaded on both trait factors (i.e., four rating scales) and method factors (i.e., the six tasks), while the four trait factors loaded on a second-order factor, speaking ability (see Figure 2). The task completion rating scale was not included as a trait factor in the model because it was considered redundant in relation to the other rating scales. As a result, the six task completion items (TaskT1, TaskT2, TaskT3, TaskT4, TaskT5, and TaskT6) were excluded from the analysis, leaving a total of 24 observed variables. Moreover, no correlations among the six tasks were specified in the first model because the six tasks were hypothesized to elicit different aspects of speaking ability.
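The loading pattern of this first MTMM model can be written down explicitly: each of the 24 items loads on one trait factor and one method factor, with all other loadings fixed at zero. The sketch below only builds that 0/1 pattern; it estimates nothing, omits the second-order speaking ability factor, and uses hypothetical names.

```python
TRAITS = ["Mean", "Gram", "Disc", "Intel"]     # task completion excluded, as in the first model
TASKS = [f"T{t}" for t in range(1, 7)]         # method factors: Tasks 1-6

def mtmm_pattern(traits=TRAITS, tasks=TASKS):
    """Return item labels, factor names, and a 0/1 matrix of free loadings.

    Row = observed item (trait x task); 1 = loading estimated, 0 = fixed at zero.
    """
    items = [f"{trait}{task}" for trait in traits for task in tasks]
    factors = list(traits) + list(tasks)
    pattern = [[1 if factor in (trait, task) else 0 for factor in factors]
               for trait in traits for task in tasks]
    return items, factors, pattern

items, factors, pattern = mtmm_pattern()
print(len(items), len(factors))    # 24 10
print(sum(map(sum, pattern)))      # 48 free loadings
print(items[0], pattern[0])        # MeanT1 [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
```

Each trait column has six free loadings (one per task) and each task column has four (one per trait), which is the crossed structure that lets an MTMM model separate trait variance from method (task) variance.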

Several attempts were made to respecify the first model. First, a model was tested in which the four first-order factors (i.e., the four trait factors) loaded on the second-order factor (i.e., speaking ability) without any method factors (see Figure 3). This model did not fit the data and showed problems similar to those of the first model (e.g., condition codes and factor loadings above 1.0). Moreover, it showed a very poor fit, with a CFI of 0.715 and an RMSEA of 0.162. Because the model without task factors did not represent the data, these results confirmed that both constructs (i.e., rating scales) and tasks need to be considered in interpreting test scores. In addition, based on the results of this model, it was decided that the four trait factors should be correlated with one another instead of loading on a second-order factor.

The model-fit evaluation of the hypothesized model indicated an excellent fit, with a very high CFI (0.99) and a very low RMSEA (0.032, with a confidence interval of [0.015, 0.044]). In terms of fit indices, the model was ideal, since a CFI above 0.95 and an RMSEA below 0.05 are considered indications of a well-fitting model (Byrne, 2006). However, the results could not be trusted because of a condition code for the variance of a factor disturbance (Parameter: D2, D2), which produced an improper solution (e.g., a factor loading greater than 1.0 for Grammatical Competence). Such a condition code, which is common with MTMM data, might have resulted from the complexity of the model specification (Byrne, 2006). Thus, the initially hypothesized model was rejected.

Figure 2. The Hypothesized Second-Order MTMM Model of CEP Speaking Placement Test

Mean: Meaningfulness, Gram: Grammatical Competence, Disc: Discourse Competence, Intel: Intelligibility, T1 to T6: Task 1 to Task 6



Figure 3. The Second-Order Model without Method Factors

Mean: Meaningfulness, Gram: Grammatical Competence, Disc: Discourse Competence, Intel: Intelligibility, T1 to T6: Task 1 to Task 6

Another attempt was made before deciding upon a final model. A model was tested with two additional factors: Rating 1 and Rating 2. The model was run both with and without the correlation between the two ratings. However, both models were unsuccessful, which confirmed that such models did not explain the data. Therefore, based on an examination of several possible models, the final MTMM model was established with four trait factors, which were correlated with each other, and six method factors (see Figure 4). This final model was obtained after statistically testing two assumptions made in advance. The first assumption, regarding the deletion of the task completion factor, was confirmed, since adding the task completion factor to the final model lowered the overall fit to the data. To test the other assumption, related to a possible task effect, the final MTMM model was also tested with correlations among the six method factors. Although the overall fit increased, the improvement was very small. Thus, it was concluded that the six tasks measured different aspects of speaking ability, and the correlations among them were not included in the final model. Although all estimates were statistically significant, they are not shown in Figure 4 because they would not be legible amid the many arrows (refer to Table 10 for the estimates).

As shown in Figure 4, there were 24 dependent variables (i.e., 24 observed variables) and 34 independent variables (i.e., 10 factors and 24 error terms). There were also 78 free parameters (i.e., 48 factor loadings, 6 factor covariances, 24 error variances) and 34 fixed nonzero parameters (i.e., 10 factor variances, 24 error regression paths). The structure of these factors and variables as specified in the model was tested based on the covariance matrix. Following the summary of the model, the output confirmed that the model was identified.
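As a check on the counts above, the degrees of freedom of a covariance-structure model equal the number of distinct variances and covariances among the p observed variables, p(p + 1)/2, minus the number of free parameters. The sketch below is illustrative only. It makes three assumptions. First, it uses item labels built from the figure-legend abbreviations, and these labels are hypothetical. Second, it treats factor variances as fixed for identification, as the text reports. Third, it counts only the trait-factor covariances as free. Under these assumptions, it reproduces the 222 and 276 degrees of freedom shown in Table 9.

```python
from itertools import product
from math import comb

# Trait and task abbreviations follow the figure legends; the item labels
# built from them (e.g., "Gram_T1") are hypothetical.
TRAITS = ["Mean", "Gram", "Disc", "Intel"]    # four rating scales (trait factors)
TASKS = ["T1", "T2", "T3", "T4", "T5", "T6"]  # six tasks (method factors)


def mtmm_loading_pattern(traits, tasks):
    """Return (item, trait, method) triples: each observed variable is one
    rating scale scored on one task, so it loads on exactly one trait factor
    and one method factor."""
    return [(f"{t}_{m}", t, m) for t, m in product(traits, tasks)]


def count_free_parameters(traits, tasks, correlate_methods=False):
    """Count free parameters, assuming factor variances are fixed (e.g., to 1)
    for identification and error regression paths are fixed to 1."""
    n_items = len(mtmm_loading_pattern(traits, tasks))
    loadings = 2 * n_items                    # one trait + one method loading per item
    trait_covs = comb(len(traits), 2)         # trait factors are correlated
    method_covs = comb(len(tasks), 2) if correlate_methods else 0
    error_variances = n_items
    return loadings + trait_covs + method_covs + error_variances


def degrees_of_freedom(n_observed, n_free):
    """Distinct variances and covariances, p(p + 1)/2, minus free parameters."""
    return n_observed * (n_observed + 1) // 2 - n_free


p = len(TRAITS) * len(TASKS)                        # 24 observed variables
q_final = count_free_parameters(TRAITS, TASKS)      # 48 + 6 + 24 = 78
df_final = degrees_of_freedom(p, q_final)           # 300 - 78 = 222
df_independence = degrees_of_freedom(p, p)          # only the 24 variances free
df_with_method_covs = degrees_of_freedom(
    p, count_free_parameters(TRAITS, TASKS, correlate_methods=True))

print(p, q_final, df_final, df_independence, df_with_method_covs)
```

Under the same counting, freeing the 15 covariances among the six method factors (the comparison model described in the text) would reduce the degrees of freedom from 222 to 207.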

The model was first assessed as a whole. In terms of residuals, the off-diagonal elements were examined, since they play a major role in determining the Chi-square statistic. The standardized residual values were evenly distributed, and the average off-diagonal absolute standardized residual was also quite small, at 0.0156. In addition, the distribution of standardized residuals was symmetric and centered around zero. As a result, very little discrepancy was found between Σ(θ), the covariance matrix implied by the structure of the hypothesized model, and S, the sample covariance matrix of the observed variable scores.

With regard to the goodness-of-fit statistics, the independence model Chi-square was 5189.140 with 276 degrees of freedom. The Satorra-Bentler scaled Chi-square for the model was 264.750 with 222 degrees of freedom (p = 0.026). Although this statistically significant Chi-square implies an imperfect model-data fit, it was not given much weight because the Chi-square statistic is sensitive to sample size. Instead, fit indices were used for further model-fit evaluation (see Table 9).

Table 9. EQS Output Goodness of Fit Statistics

GOODNESS OF FIT SUMMARY FOR METHOD = ROBUST

ROBUST INDEPENDENCE MODEL CHI-SQUARE = 5189.140 ON 276 DEGREES OF FREEDOM

INDEPENDENCE AIC = 4637.140    INDEPENDENCE CAIC = 3430.844

MODEL AIC = -179.250 MODEL CAIC = -1149.532

SATORRA-BENTLER SCALED CHI-SQUARE = 264.7500 ON 222 DEGREES OF FREEDOM

PROBABILITY VALUE FOR THE CHI-SQUARE STATISTIC IS 0.02603

FIT INDICES

BENTLER-BONETT NORMED FIT INDEX = 0.949

BENTLER-BONETT NON-NORMED FIT INDEX = 0.989

COMPARATIVE FIT INDEX (CFI) = 0.991

BOLLEN'S (IFI) FIT INDEX = 0.991

MCDONALD'S (MFI) FIT INDEX = 0.905

ROOT MEAN-SQUARE ERROR OF APPROXIMATION (RMSEA) = 0.030

90% CONFIDENCE INTERVAL OF RMSEA (0.011, 0.043)
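Several of the statistics in Table 9 can be recomputed from the two Chi-square values the table reports. The sketch below uses the standard textbook formulas for the NFI, NNFI, CFI, IFI, and the Chi-square-based AIC. Applied to the Table 9 inputs, it reproduces the values shown there to three decimals. One assumption applies: EQS's robust computations may differ in later decimals.

The cutoff check uses the criteria cited from Byrne (2006) in the text: a CFI above 0.95 and an RMSEA below 0.05. The RMSEA also requires the sample size, which is not reported in this section. For that reason, the sketch leaves the sample size as a parameter and takes the reported RMSEA directly from the table.

```python
import math


def nfi(chi_m, chi_b):
    """Bentler-Bonett normed fit index."""
    return (chi_b - chi_m) / chi_b


def nnfi(chi_m, df_m, chi_b, df_b):
    """Bentler-Bonett non-normed fit index (Tucker-Lewis index)."""
    return (chi_b / df_b - chi_m / df_m) / (chi_b / df_b - 1.0)


def cfi(chi_m, df_m, chi_b, df_b):
    """Comparative fit index: 1 - noncentrality(model) / noncentrality(baseline)."""
    d_m = max(chi_m - df_m, 0.0)
    d_b = max(chi_b - df_b, d_m)
    return 1.0 - d_m / d_b if d_b > 0 else 1.0


def ifi(chi_m, df_m, chi_b):
    """Bollen's incremental fit index."""
    return (chi_b - chi_m) / (chi_b - df_m)


def aic(chi, df):
    """AIC in the Chi-square minus twice the df form."""
    return chi - 2 * df


def rmsea(chi_m, df_m, n):
    """RMSEA point estimate; n is the sample size (not given in this section)."""
    return math.sqrt(max(chi_m - df_m, 0.0) / (df_m * (n - 1)))


def well_fitting(cfi_value, rmsea_value):
    """Cutoffs cited in the text: CFI above 0.95 and RMSEA below 0.05."""
    return cfi_value > 0.95 and rmsea_value < 0.05


# Inputs from Table 9: scaled model Chi-square and independence model Chi-square.
CHI_M, DF_M = 264.750, 222
CHI_B, DF_B = 5189.140, 276

indices = {
    "NFI": nfi(CHI_M, CHI_B),
    "NNFI": nnfi(CHI_M, DF_M, CHI_B, DF_B),
    "CFI": cfi(CHI_M, DF_M, CHI_B, DF_B),
    "IFI": ifi(CHI_M, DF_M, CHI_B),
}
for name, value in indices.items():
    print(f"{name}: {value:.3f}")
print("Model AIC:", round(aic(CHI_M, DF_M), 3))
print("Independence AIC:", round(aic(CHI_B, DF_B), 3))
print("Meets cutoffs:", well_fitting(indices["CFI"], 0.030))  # RMSEA from Table 9
```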



Figure 4. The Final MTMM Model

Mean: Meaningfulness, Gram: Grammatical Competence, Disc: Discourse Competence, Intel: Intelligibility, T1 to T6: Task 1 to Task 6; F1 to F10: Factor 1 to Factor 10; V2 to V31: Observed Variables 2 to 31




As shown in Table 9, the CFI was 0.991 and the RMSEA was 0.030, with the confidence interval [0.011, 0.043]. Both values indicated an excellent fit. The final indicator of overall model fit was the number of iterations. According to the iterative summary in the output, only five iterations were needed to reach convergence, which suggested that the data fit the model relatively easily. Thus, the analyses of residuals and fit indices revealed that the data for the 24 observed variables fit the 10-factor MTMM model well as a whole.

After confirming the good fit of the model as a whole, the fit of individual parameters was also assessed. The statistical significance of the parameter estimates was first checked based on the unstandardized estimates. All parameter estimates were statistically significant. Therefore, all parameters could be considered important to the model, and none of them needed to be deleted. After the unstandardized estimates, the standardized solution was examined (see Table 10).
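In EQS output, the significance of an unstandardized estimate is typically judged from its test statistic, which is the estimate divided by its standard error. At the .05 level, this statistic is compared against ±1.96. The sketch below shows that check. The estimates, standard errors, and parameter labels are hypothetical placeholders, since the unstandardized values are not reproduced in this section.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    name: str
    value: float  # unstandardized estimate
    se: float     # standard error


def z_statistic(est: Estimate) -> float:
    """Test statistic: estimate divided by its standard error."""
    return est.value / est.se


def is_significant(est: Estimate, critical: float = 1.96) -> bool:
    """Two-tailed test at the .05 level using the normal approximation."""
    return abs(z_statistic(est)) > critical


# Hypothetical values for illustration only.
estimates = [
    Estimate("Gram_T1 on Gram (trait loading)", 0.82, 0.05),
    Estimate("Gram_T1 on T1 (method loading)", 0.21, 0.06),
    Estimate("Gram_T1 error variance", 0.03, 0.02),
]
for est in estimates:
    print(f"{est.name}: z = {z_statistic(est):.2f}, significant = {is_significant(est)}")
```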

As shown in Table 10, the trait factor loadings (i.e., on F1 to F4) ranged from 0.849 to 0.926. These were much higher than the method factor loadings (i.e., on F5 to F10), which ranged from 0.265 to 0.466. This signified that the four traits (i.e., rating scales) were much stronger indicators than the six tasks, although both needed to be considered. The regression coefficients of the error terms were quite small, ranging from 0.207 to 0.309. It can therefore be concluded that the contribution of error to the observed variables was low and that the variables were mainly explained by the factors.

The uniformly high R-squared values give the proportion of each variable's variance accounted for by its related factors, and they confirmed that the model explained all 24 items fairly well. Moreover, as assumed above, the trait factors were correlated, and the correlations were very high, at around 0.98. This result is consistent with the four factors all being operationalizations of a single construct, speaking ability. However, such extremely high correlations are not ideal for analytic scoring, since they indicate that the four rating scales were almost indistinguishable.
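In the final model, each observed variable loads on one trait factor and one method factor, and trait and method factors are not allowed to correlate. As a result, the standardized solution splits each item's variance into three shares (trait, method, and error) that sum to 1. The R-squared equals 1 minus the squared error path. The sketch below illustrates this decomposition with hypothetical loadings chosen inside the reported ranges. It then shows that the reported error paths (0.207 to 0.309) imply R-squared values of roughly 0.905 to 0.957.

```python
import math


def variance_decomposition(trait_loading, method_loading):
    """Split a standardized item's variance when the item loads on one trait
    factor and one method factor that are uncorrelated with each other."""
    trait_share = trait_loading ** 2
    method_share = method_loading ** 2
    r_squared = trait_share + method_share
    if r_squared > 1:
        raise ValueError("loadings imply more than 100% explained variance")
    error_path = math.sqrt(1 - r_squared)  # standardized error coefficient
    return {"trait": trait_share, "method": method_share,
            "r_squared": r_squared, "error_path": error_path}


def r_squared_from_error_path(error_path):
    """R-squared implied by a standardized error regression path."""
    return 1 - error_path ** 2


# Hypothetical item with loadings inside the ranges reported for Table 10.
print(variance_decomposition(0.90, 0.35))
# R-squared bounds implied by the reported error paths (0.309 and 0.207).
print(round(r_squared_from_error_path(0.309), 3),
      round(r_squared_from_error_path(0.207), 3))
```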

Discussion and Conclusion

The present study examined the underlying structure of the CEP speaking placement test based on a confirmatory factor analysis. The analysis was conducted with four trait factors (i.e., meaningfulness, grammatical competence, discourse competence, and intelligibility) and six method factors (i.e., Tasks 1 to 6). In addition, the four trait factors were correlated with one another. These four traits were assumed to be related by virtue of being aspects of the same ability, but correlations over 0.90 were unexpected. These high correlations may indicate either that speaking ability cannot be separated into several analytic aspects or that the raters failed to understand and differentiate among the analytic scoring criteria. For example, raters may have given similar scores on the four rating scales of each task based on their overall impression rather than applying the different criteria carefully. Alternatively, they may not have been accustomed to the different criteria because of the short norming period. Further research on raters' rating processes may be required to explain the relationship among these components of speaking ability.



Table 10. EQS Output Standardized Solution





The final MTMM model explained the current test data very well, as evidenced by the high fit indices. In particular, the four operationalized constructs (i.e., the four rating scales) primarily explained the data, with higher factor loadings than the six tasks. In other words, examinees' performance on the test was mainly explained by the four constructs of speaking ability; however, the characteristics of the six tasks also had a non-negligible effect on examinees' performance. Therefore, the results of the current study empirically support the interactionalist perspective, in which examinees' speaking ability is determined in terms of both constructs (traits) and the task characteristics of the test.

Although the current study contributes to the recent discussion concerning the importance of both construct definitions and test task characteristics in L2 performance assessments, it has a number of limitations. First, because of the limited sample size, it was not possible to include a rating factor as part of the underlying structure of the speaking test, although multiple ratings were available for all examinees' responses. It has been argued that raters are one of the factors that affect examinees' performance (Kenyon, 1992; Linacre, 1989; McNamara, 1995, 1996, 1997). Indeed, previous studies analyzed raters' rating behaviors both quantitatively and qualitatively and showed rater effects on performance assessments (e.g., Bonk & Ockey, 2003; Brown, 2005; Chalhoub-Deville, 1995; Eckes, 2005; Meiron & Schick, 2000; Orr, 2002). Therefore, including a rater/rating factor might change the underlying structure of the speaking test.

The other limitation is that structural equation modeling is a data-specific statistical tool. In other words, the results of the current analyses cannot be assumed to hold for other CEP speaking data that include different participants. Likewise, other data sets might be explained by different factors or different factorial structures. Therefore, in order to generalize the structure of the CEP speaking placement test, repeated analyses of test data with larger samples are required across different test administrations. Only then can the nature of the CEP speaking placement test be understood, and only then can inferences made about examinees' speaking ability be considered reliable.

Acknowledgements

I would like to express my appreciation to the English Language Institute at the University of Michigan for giving me an opportunity to perform this research. I am also very grateful to Professor James Purpura and my colleagues at Teachers College for their insightful comments and suggestions throughout this study.

References

Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford: Oxford University Press.

Bachman, L. F. (2002). Some reflections on task-based language performance assessment. Language Testing, 19(4), 453–476.

Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice: Designing and developing useful language tests. Oxford: Oxford University Press.

Barnwell, D. (1989). Naive native speakers and judgments of oral proficiency in Spanish. Language Testing, 6(2), 152–163.



Bentler, P. M., & Wu, E. (2005). EQS 6.1 for Windows user's guide. Encino, CA: Multivariate Software, Inc.

Bonk, W. J., & Ockey, G. J. (2003). A many-facet Rasch analysis of the second language group oral discussion task. Language Testing, 20(1), 89–110.

Brindley, G. (1994). Task-centred assessment in language learning: The promise and the challenge. In N. Bird, P. Falvey, A. Tsui, D. Allison, & A. McNeill (Eds.), Language and learning: Papers presented at the Annual International Language in Education Conference (Hong Kong, 1993) (pp. 73–94). Hong Kong: Hong Kong Education Department.

Brown, A. (1995). The effect of rater variables in the development of an occupation-specific language performance test. Language Testing, 12(1), 1–15.

Brown, A. (2003). Interviewer variation and the co-construction of speaking proficiency. Language Testing, 20(1), 1–25.

Brown, A. (2005). Interviewer variability in oral proficiency interviews. Frankfurt, Germany: Peter Lang.

Brown, J. D., Hudson, T., Norris, J. M., & Bonk, W. (2002). An investigation of second language task-based performance assessments. Honolulu: University of Hawaii Press.

Byrne, B. M. (2006). Structural equation modeling with EQS. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.

Canale, M. (1983). On some dimensions of language proficiency. In J. W. Oller, Jr. (Ed.), Issues in language testing research (pp. 333–342). Rowley, MA: Newbury House.

Canale, M., & Swain, M. (1980). Theoretical bases of communicative approaches to second language teaching and testing. Applied Linguistics, 1(1), 1–47.

Chalhoub-Deville, M. (1995). Deriving oral assessment scales across different tests and rater groups. Language Testing, 12(1), 16–33.

Chapelle, C. (1998). Construct definition and validity inquiry in SLA research. In L. F. Bachman & A. D. Cohen (Eds.), Interfaces between second language acquisition and language testing research (pp. 32–70). Cambridge: Cambridge University Press.

Clark, J. L. D. (1975). Theoretical and technical considerations in oral proficiency testing. In R. L. Jones & B. Spolsky (Eds.), Testing language proficiency (pp. 10–28). Arlington, VA: Center for Applied Linguistics.

Clark, J. L. D. (1988). Validation of a tape-mediated ACTFL/ILR-scale based test of Chinese speaking proficiency. Language Testing, 5(2), 187–205.

Eckes, T. (2005). Examining rater effects in TestDaF writing and speaking performance assessments: A many-facet Rasch analysis. Language Assessment Quarterly, 2(3), 197–221.

Elder, C. (1993). How do subject specialists construe classroom language proficiency? Language Testing, 10(3), 235–254.

Elder, C., Iwashita, N., & McNamara, T. (2002). Estimating the difficulty of oral proficiency tasks: What does the test-taker have to offer? Language Testing, 19(4), 347–368.

Farris, C. S. (1995). A semiotic analysis of sajiao as a gender marked communication style in Chinese. In M. Johnson & F. Y. L. Chiu (Eds.), Unbound Taiwan: Close-ups from a distance. Selected Papers Vol. 8 (pp. 1–29). Chicago: Center for East Asian Studies, University of Chicago.

Fulcher, G. (2003). Testing second language speaking. London: Longman.




Jones, R. L. (1985). Second language performance testing: An overview. In P. C. Hauptman, R. LeBlanc, & M. B. Wesche (Eds.), Second language performance testing (pp. 15–24). Ottawa: University of Ottawa Press.

Kenyon, D. M. (1992). Introductory remarks at symposium on Development and use of rating scales in language testing, 14th Language Testing Research Colloquium, Vancouver, February 27–March 1.

Kim, M. (2001). Detecting DIF across the different language groups in a speaking test. Language Testing, 18(1), 89–114.

Kim, Y. (2009). An investigation into native and non-native teachers' judgments of oral English performance: A mixed methods approach. Language Testing, 26(2), 187–217.

Linacre, J. M. (1989). Many-facet Rasch measurement. Chicago: MESA Press.

Lumley, T. (1998). Perceptions of language-trained raters and occupational experts in a test of occupational English language proficiency. English for Specific Purposes, 17, 347–367.

Lumley, T., & McNamara, T. F. (1995). Rater characteristics and rater bias: Implications for training. Language Testing, 12(1), 54–71.

Lumley, T., & O'Sullivan, B. (2005). The effect of test-taker gender, audience and topic on task performance in tape-mediated assessment of speaking. Language Testing, 22(4), 415–437.

Luoma, S. (2004). Assessing speaking. Cambridge: Cambridge University Press.

Lynch, B. K., & McNamara, T. F. (1998). Using G-theory and many-facet Rasch measurement in the development of performance assessments of the ESL speaking skills of immigrants. Language Testing, 15(2), 158–180.

Malabonga, V., Kenyon, D. M., & Carpenter, H. (2005). Self-assessment, preparation and response time on a computerized oral proficiency test. Language Testing, 22(1), 59–92.

McNamara, T. F. (1995). Modelling performance: Opening Pandora's box. Applied Linguistics, 16(2), 159–179.

McNamara, T. F. (1996). Measuring second language performance. London: Longman.

McNamara, T. F. (1997). Interaction in second language performance assessment: Whose performance? Applied Linguistics, 18(4), 446–466.

Meiron, B., & Schick, L. (2000). Ratings, raters and test performance: An exploratory study. In A. J. Kunnan (Ed.), Fairness and validation in language assessment. Selected papers from the 19th Language Testing Research Colloquium, Orlando, Florida (pp. 60–81). Cambridge: Cambridge University Press.

Messick, S. (1994). The interplay of evidence and consequences in the validation of performance assessments. Educational Researcher, 23(2), 13–23.

Morrow, K. (1979). Communicative language testing: Revolution or evolution? In C. J. Brumfit & K. Johnson (Eds.), The communicative approach to language teaching (pp. 143–157). Oxford: Oxford University Press.

Norris, J. M., Brown, J. D., Hudson, T., & Yoshioka, J. (1998). Designing second language performance assessments (Technical Report No. 18). Honolulu: University of Hawaii, Second Language Teaching & Curriculum Center.

O'Loughlin, K. K. (2002). The impact of gender in oral proficiency testing. Language Testing, 19(2), 169–192.

Orr, M. (2002). The FCE speaking test: Using rater reports to help interpret test scores. System, 30, 143–154.



Savignon, S. J. (1972). Communicative competence: An experiment in foreign language teaching. Philadelphia: The Center for Curriculum Development.

Shohamy, E. (1994). The validity of direct versus semi-direct oral tests. Language Testing, 11(2), 99–123.

Skehan, P. (1998). A cognitive approach to language learning. Oxford: Oxford University Press.

SPSS Inc. (2001). SPSS Base 12.0 for Windows [Computer software]. Chicago, IL: SPSS Inc.

Wigglesworth, G. (1993). Exploring bias analysis as a tool for improving rater consistency in assessing oral interaction. Language Testing, 10(3), 305–335.

Wigglesworth, G. (1997). An investigation of planning time and proficiency level on oral test discourse. Language Testing, 14(1), 85–106.


23/30


Appendix A. Speaking Test Tasks

Task 1. Catering service

In this task, you need to complain about something. Imagine you have ordered food from Party Planners Inc. for your boss's birthday party. But there was not enough food and it was delivered late. You spent a week planning the party, but it was ruined because of the food. You were extremely upset that it happened. Call the caterer to complain about it. You have 20 seconds to plan.

Prompt (Audio)

[phone ringing] (Answering Machine) Hi! You've reached Party Planners Inc. We're sorry, but we're not available to take your call right now. Please leave a detailed message after the beep, and we'll get back to you as soon as possible. [Beep]

Test-taker: (45 sec response time)

Task 2. Favorite movie

In this task, you will be asked to talk about a movie. Think about a movie that you liked and tell your friend about it. You have 20 seconds to plan.

Prompt (Video)

Your friend: So, what was that movie you liked? What is it about?

Test-taker: (60 sec response time)

Task 3. Fly in soup

In this task, you need to tell the story in the pictures. Look at the pictures (pictures are shown on the screen). Imagine this happened yesterday while you were having dinner at the next table. Tell your friend what you saw. You have 60 seconds to plan your response.

Prompt (Video)

Your friend: So, what happened last night at the restaurant?

Test-taker: (60 sec response time)



Task 4. Moving out

In this task, you need to refuse a request. Imagine you are renting an apartment from a nice old couple in New York City. You have been living there for over a year. Now, listen to a telephone message from the couple.

Hi, this is Mary, your landlady. Tom and I have been trying to contact you, but you never seem to be home. I guess you're really busy these days. Anyway... well, I don't know how to say this, but... our granddaughter is moving to the City next month. She's gonna study at Columbia, and, as you know, living in the city is expensive, and the rents are really high. So, she asked us if she could live in the apartment you have now. I know we just renewed your lease, and we have no right to ask you to move out, and, we really like you, too. But, do you think you can possibly look for a different apartment? We're really sorry about this, but we have to do this for our granddaughter. Since there's not much time, we'd like to hear from you as soon as possible, so we can let our granddaughter know too. Again, we're sorry... Call and let us know, ok? Thanks. (162 words)

(Q) Politely tell your landlady that you can't move out and explain why. You have 30 seconds to plan.

Prompt (Audio)

Landlady: Hi. Come on in. Did you get our message? Have you thought about moving out?

Test-taker: (45 sec response time)

Task 5. Electric cars

In this task, you will be asked to summarize a radio commentary for a friend. Imagine your friend, Jim, is thinking about buying an electric car. Now, listen to the radio commentary.

(Host of the radio commentary) Today, we're talking about electric cars. As you're well aware, the conventional cars we drive every day use a lot of gasoline. You know how the price of gasoline is going up... and, more importantly, there's the issue of global warming: these cars release harmful pollutants, like carbon monoxide. So, in reaction to this, engineers have been working on cars that run on electric batteries, so let's hear about the current state of the technology. We have a pre-recorded commentary by Ben Smith from General Autos.

Well, despite high expectations, the first generation of electric cars turned out to be a complete failure. Why? The first problem is the battery. I mean, current battery technology is still very limited. So electric cars can only travel a short distance before the battery needs recharging. What this means is you can't make long trips without worrying about the battery running out. They're only good for short trips like going to the supermarket or picking up the kids from school. And when you turn the air conditioner or the radio on, the battery is used up even quicker.

Then, you might say, we can just recharge the battery when it's used up. Well... there's a serious problem with recharging, too. To recharge a battery, we need an electric outlet, right? But there aren't many charging stations, which means the driver might get stuck without being able to find a charging station nearby. Well, it gets even more frustrating. Even if you can find a station, it takes up to 3 hours to fully recharge a battery. It's way too long. Well, with these many limitations, does it make sense that anyone would want to buy an electric car, even if it is environmentally friendly?




(Q) Summarize what you heard on the radio for Jim. Be sure to include two main problems with electric cars. You have 30 seconds to plan.

Prompt (Video)

Jim: Did I tell you I'm thinking about buying an electric car?

Test-taker: (60 sec response time)

Task 6. Barbizon school

In this task, you will be asked to summarize a lecture for a classmate. Imagine your classmate, Jennifer, missed today's lecture about the Barbizon school. Now, listen to the lecture.

Today, we'll talk about a group of artists called the Barbizon School. The Barbizon School is a group of French artists who lived in the French town of Barbizon and who developed the genre of landscape painting. So, what are their characteristics?

The Barbizon painters tried to find comfort in nature. I mean, they moved away from all the commotion and disruption happening in then-revolutionary Paris, and sought solace in nature. And nature was the main theme of their paintings: they painted landscapes and scenes of rural life as true to life as possible. And they rejected the idea of manipulating or beautifying nature. Instead, they tried to achieve a true representation of the countryside. OK?

Second, in addition to the efforts to paint nature as realistically as possible, they also tried to establish landscape as an independent, legitimate genre in France. Traditionally, landscape painting wasn't appreciated as a separate genre, but only considered as a background. But Barbizon artists reacted against this convention of classical landscape, and painted landscape for its own sake. With their huge success and recognition, the painters of the Barbizon school established landscape and themes of country life as vital subjects for French artists.

Now, let's look at an example: a painting by Rousseau. This one is called The Forest in Winter at Sunset. [Show the painting on screen]. It shows the ancient forest near the village of Barbizon. Rousseau is the best known member of the group. Each Barbizon painter had his own style and specific interests, and Rousseau's vision was melancholic and sad. Can you feel the depressing mood of the painting? At the top, a tangle of tree limbs, and birds flying into the cloudy, dark, sunset sky. After the sun sets, the forest will be freezing cold. Rousseau worked on this painting off-and-on for twenty years. He considered this his most important painting and refused to sell it during his lifetime.

(Q) Summarize the lecture for Jennifer. Be sure to include two main characteristics of the school and the example shown. You have 30 seconds to plan.

Prompt (Video)

Jennifer: So, what was the lecture about? What did I miss?

Test-taker: (60 sec response time)




Grammatical Competence: Accuracy, Complexity and Range

5 Excellent. The response:
is grammatically accurate.
displays a wide range of syntactic structures and lexical form.
displays complex syntactic structures (relative clause, embedded clause, passive voice, etc.) and lexical form.

4 Good. The response:
is generally grammatically accurate without any major errors (e.g., article usage, subject/verb agreement, etc.) that obscure meaning.
displays a relatively wide range of syntactic structures and lexical form.
displays relatively complex syntactic structures and lexical form.

3 Adequate. The response:
rarely displays major errors that obscure meaning and a few minor errors (but what the speaker wants to say can be understood).
displays a somewhat narrow range of syntactic structures; too many simple sentences.
displays somewhat simple syntactic structures.
displays use of somewhat simple or inaccurate lexical form.

2 Fair. The response:
displays several major errors as well as frequent minor errors, causing confusion sometimes.
displays a narrow range of syntactic structures, limited to simple sentences.
displays use of simple and inaccurate lexical form.

1 Limited. The response:
is almost always grammatically inaccurate, which causes difficulty in understanding what the speaker wants to say.
displays lack of basic sentence structure knowledge.
displays generally basic lexical form.

0 No. The response:
displays no grammatical control.
displays severely limited or no range and sophistication of grammatical structure and lexical form.
contains not enough evidence to evaluate.



Discourse Competence: Organization and Cohesion

5 Excellent. The response:
is completely coherent.
is logically structured: logical openings and closures; logical development of ideas.
displays smooth connection and transition of ideas by means of various cohesive devices (logical connectors, a controlling theme, repetition of key words, etc.).

4 Good. The response:
is generally coherent.
displays generally logical structure.
displays good use of cohesive devices that generally connect ideas smoothly.

3 Adequate. The response:
is occasionally incoherent.
contains parts that display somewhat illogical or unclear organization; however, as a whole, it is in general logically structured.
at times displays somewhat loose connection of ideas.
displays use of simple cohesive devices.

2 Fair. The response:
is loosely organized, resulting in generally disjointed discourse.
often displays illogical or unclear organization, causing some confusion.
displays repetitive use of simple cohesive devices; use of cohesive devices is not always effective.

1 Limited. The response:
is generally incoherent.
displays illogical or unclear organization, causing great confusion.
displays attempts to use cohesive devices, but they are either quite mechanical or inaccurate, leaving the listener confused.

0 No. The response:
is incoherent.
displays virtually non-existent organization.
contains not enough evidence to evaluate.




Task Completion

To what extent does the speaker complete the task?

5 Excellent. The response:
fully addresses the task.
displays completely accurate understanding of the prompt without any misunderstood points.
completely covers all main points with complete details discussed in the prompt.

4 Good. The response:
addresses the task well.
includes no noticeably misunderstood points.
completely covers all main points with a good amount of details discussed in the prompt.

3 Adequate. The response:
adequately addresses the task.
includes minor misunderstanding(s) that does not interfere with task fulfillment.
touches upon all main points, but leaves out details; OR completely covers one (or two) main points with details, but leaves the rest out.

2 Fair. The response:
insufficiently addresses the task.
displays some major incomprehension/misunderstanding(s) that interferes with successful task completion.
touches upon bits and pieces of the prompts.

1 Limited. The response:
barely addresses the task.
displays major incomprehension/misunderstanding(s) that interferes with addressing the task.

0 No. The response:
shows no understanding of the prompt.
contains not enough evidence to evaluate.

Main points (e.g.):
Electric Cars: two problems with the current technology (battery running out quickly and inconvenience in recharging).
Barbizon School: 2 characteristics of the school and one example (painted nature and established landscaping as an independent genre, and the Forest in the sunset as an example).



Intelligibility

Pronunciation and prosodic features (intonation, rhythm, and pacing)

5 Excellent. The response:
is completely intelligible although accent may be there.
is almost always clear, fluid and sustained.
does not require listener effort.

4 Good. The response:
may include minor difficulties with pronunciation or intonation, but is generally intelligible.
is generally clear, fluid and sustained. Pace may vary at times.
does not require much listener effort.

3 Adequate. The response:
may lack intelligibility in places, impeding communication.
exhibits some difficulties with pronunciation, intonation or pacing.
exhibits some fluidity.
may require some listener effort at times.

2 Fair. The response:
often lacks intelligibility, impeding communication.
frequently exhibits problems with pronunciation, intonation or pacing.
may not be sustained at a consistent level throughout.
may require significant listener effort at times.

1 Limited. The response:
generally lacks intelligibility.
is generally unclear, choppy, fragmented or telegraphic.
contains frequent pauses and hesitations.
contains consistent pronunciation and intonation problems.
requires considerable listener effort.

0 No. The response:
completely lacks intelligibility.
contains not enough evidence to evaluate.
