Testing


Transcript of Testing

Language Assessment

1. Which of the following is not an assumption underlying testing and measurement? a. Various approaches to measuring aspects of the same thing can be useful b. Error is rarely present in the measurement process c. Present-day behavior predicts future behavior d. Testing and assessment benefit society

2. What is NOT true about construct validity? A. Construct validity is very much an ongoing process as one refines a theory. B. Construct validity is the degree to which an instrument measures the trait or theoretical construct that it is intended to measure. C. The multitrait-multimethod matrix is a method to establish construct validity. D. Estimate Cronbach's alpha to establish construct validity.

3. Which of the following is a type of criterion-related validity evidence? a. Concurrent evidence b. Predictive evidence c. Internal consistency d. Both a and b are correct answers

4. If a test measures a single construct then: a. The items should correlate with the total score b. The items should not correlate with the total score c. The test should not correlate with other measures of the same construct d. There must be a reliable alternative form.

5. Professor X develops a test of emotional intelligence. Which of the following represents convergent and discriminant evidence? a. The test correlates highly with another test of emotional intelligence and is uncorrelated with self-efficacy b. The test correlates highly with another test of emotional intelligence and is highly correlated with self-efficacy c. The test does not correlate with another test of emotional intelligence, but does correlate with self-efficacy d. The test does not correlate with other tests of emotional intelligence nor with self-efficacy

6. Internal consistency measures include: A. Split-half reliability B. Kuder-Richardson coefficient C. Cronbach's alpha D. All of the above
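Question 6 names Cronbach's alpha as an internal-consistency measure. The following is a minimal sketch of how alpha might be computed, assuming item scores are stored as one row per test-taker. The function name `cronbach_alpha` and the response data are hypothetical and are not taken from the quiz.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """scores: list of rows, each row holding one test-taker's item scores."""
    k = len(scores[0])                        # number of items
    items = list(zip(*scores))                # transpose: one tuple per item
    item_var_sum = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in scores])
    # alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Hypothetical responses of four test-takers to three items
data = [[2, 3, 3], [4, 4, 5], [1, 2, 2], [5, 5, 4]]
alpha = cronbach_alpha(data)
print(round(alpha, 3))
```

Population variance is used for both the items and the totals; using sample variance throughout would give the same ratio and therefore the same alpha.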

7. Which scale is the simplest form of measurement? a. Nominal b. Ordinal c. Interval d. Ratio

8. ______ tests focus on one's predisposition to learn, acquired through the informal learning that goes on in life. a. Personality b. Achievement c. Aptitude d. Intelligence

9. Let's say that a test accurately indicates participants' scores on a future criterion (e.g., the PSAT is used to indicate high-school GPA scores). This test would clearly have which of the following? a. Face validity b. Concurrent validity c. Predictive validity d. Content validity

10. If a basketball coach calculates scores, what scale would be used? a. Interval scale b. Ratio scale c. Nominal scale d. Ordinal scale

11. According to the text, most of the outcome/dependent variable characteristics and attributes measured in educational research probably exist at the ______________ level of measurement. a. Nominal b. Ordinal c. Interval d. Ratio

12. Which of the following is most clearly an example of a construct? a. Anxiety enduring for months or years b. Anxiety over just seeing a spider c. Shyness when meeting a stranger for the first time d. Depression caused by the loss of a ball game

13. Characteristics of content validity include all, except: A. Content of the measure is justified by other evidence. B. Content validity covers the entire range or universe of the construct. C. Content validity is usually evaluated and scored by experts in the content area. D. Content validity is a form of criterion-related validity.

14. Reliability is most simply known as which of the following? a. Consistency or stability b. Appropriateness of interpretations on the basis of test scores c. Ways in which people are the same d. A rank order of participants on some characteristic

15. An ordinal scale is: a. The simplest form of measurement b. A rank-order scale of measurement c. A scale with equal intervals between adjacent numbers d. A categorical scale

16. Which of the following is not a type of reliability? a. Test-retest b. Split-half c. Content d. Internal consistency

17. Which of the following statements accurately describes test-retest reliability? a. Measure of consistency of test scores over time b. Measure of consistency of scores obtained from two equivalent halves of the same test c. Measure of consistency with which a test measures a single construct or concept d. Measure of degree of agreement between two or more scorers, judges, or raters

18. Which of the following types of reliability refers to the consistency of test scores over time? a. Equivalent forms reliability b. Split-half reliability c. Test-retest reliability d. Inter-scorer reliability

19. Identify the following term that most closely refers to a judgment of the extent to which scores from a test can be used to infer, or predict, the examinees' performance in some activity: a. Content reliability b. Face validity c. Criterion-related validity d. Inference validity

20. Which of the following is the correct order of Stevens' four levels of measurement? a. Ordinal, nominal, ratio, interval b. Nominal, ordinal, interval, ratio c. Interval, nominal, ordinal, ratio d. Ratio, interval, nominal, ordinal

21. Which is the process of gathering evidence supporting inferences based on test scores? a. Validation b. Validity c. Reliability d. Prediction

22. When evaluating tests and assessments, reliability refers to asking ourselves which of the following questions? a. Does it measure what it is supposed to measure? b. Are there ways to avoid subjective judgments when measuring something? c. Does it give consistent results? d. Does it measure multiple constructs?

23. Validity of a test designed to measure a construct such as self-esteem is best described by which of the following? a. Scores from the test correlate highly with most intelligence tests b. Scores from the test correlate highly with most tests of different constructs c. Scores from the test are not correlated with anything d. Scores from the test have a relatively strong and positive correlation with other tests of the same construct (i.e., with other measures of self-esteem) but much lower correlations with tests of different constructs

24. Which type of reliability refers to the consistency of a group of individuals' scores on two equivalent forms of a test designed to measure the same characteristic? a. Split-half b. Test-retest c. Split-forms d. Equivalent forms

25. The degree to which a test measures what it purports to measure reflects its: (a) reliability (b) validity (c) objectivity (d) stability

26. If a sample of test items adequately represents the subject matter of the given apprenticeship course, the test is said to have: (a) construct validity (b) predictive validity (c) content validity (d) concurrent validity

27. A test which yields scores that are found to be highly correlated with later performance has: (a) construct validity (b) predictive validity (c) content validity (d) concurrent validity

28. A test which yields scores that are highly correlated with performance now has: (a) construct validity (b) predictive validity (c) content validity (d) concurrent validity

29. The degree to which a test measures a given hypothetical construct is: (a) structural validity (b) predictive validity (c) content validity (d) none of these

30. Cronbach's alpha measures: (a) stability (b) split-half reliability (c) reliability of parallel items (d) internal consistency

31. A measurement instrument can be valid but not reliable: (a) always false (b) always true (c) sometimes true (d) depends on the type of validity

32. The standard error of the measure provides an indication of: (a) the accuracy of the correlation (b) the range of error around an individual value (c) the standard deviation of sampling error (d) none of these
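The standard error of measurement in question 32 is commonly computed as SD × √(1 − r), where r is the reliability coefficient. The sketch below assumes that formula and a 1.96 multiplier for a 95% band. The function names and the numbers are hypothetical and chosen only for illustration.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - r), with r the test's reliability coefficient."""
    return sd * math.sqrt(1.0 - reliability)

def score_band(observed, sd, reliability, z=1.96):
    """Band of +/- z SEMs around an observed score (z = 1.96 for about 95%)."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem

# Hypothetical test: SD of 10, reliability of .91, observed score of 50
low, high = score_band(observed=50, sd=10, reliability=0.91)
print(f"{low:.2f} to {high:.2f}")
```

The band narrows as reliability approaches 1 and widens toward ± z × SD as reliability approaches 0.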

33. If the SD = 6 and r = .64, what is the 95% range of estimate around a score of 100? (a) 103.6 to 96.64 (b) 107 to 93 (c) 101.96 to 98.04 (d) none of the above

34. If a test of 20 items has a reliability of .80, what would its reliability be if it were three times as long? (a) 2.4 (b) 1.00 (c) .80 (d) .92

35. When computing the reliability from a narrow range of performance, would the reliability be higher, the same, or lower than that obtained if the performance range were larger? (a) higher (b) lower (c) the same (d) depends on N

36. Which of the following is true about integrative testing? A) Decontextualization is emphasized in integrative testing. B) Skills are tested in isolation in integrative tests. C) Cloze tests and dictations are examples of integrative tests. D) Integrative tests directly assess students' performance.

37. Some test writers believe that language is a system of separate categories such as phonemes, morphemes, words, etc. Which of the following is most probably practiced by these people? a. integrative approach b. discrete-point approach c. holistic approach d. analytic approach

38. Which of the following is used as a synonym for integrative tests? a. functional tests b. communicative tests c. pragmatic tests d. holistic tests

39. A language teacher has based his evaluation of students on observation and verbal or non-verbal descriptions such as letters of reference. What type of evaluation is this called? a. qualitative evaluation b. formative evaluation c. summative evaluation d. objective evaluation

40. In ---- tests, language skills can be separated and each skill can be tested separately in a successful manner. A) integrative B) dictation C) communicative D) discrete-point

41. Which of the following is one of the components of alternative assessment when compared to traditional assessment? A) Product orientation B) Fostering extrinsic motivation C) Summative assessment D) Norm-referenced scoring E) Continuous assessment

42. Strategic competence is important in ---- testing. A) computer-based B) dictation C) communicative D) discrete-point

43. Which of the following is true about the relationship between testing and evaluation? A) Evaluation is more important for students than testing. B) Evaluation is subordinate to assessment. C) Evaluation is a subset of tests. D) Evaluation is both quantitative and qualitative.

44. What is the correct chronological order of the views on language testing when the historical development of testing is considered? I. Discrete-point testing II. Integrative views on testing III. Communicative testing A) I, II and III B) II, I and III C) II, III and I D) III, I and II

45. Cloze tests and dictation are the representatives of ---- tests. A) norm-referenced B) standardized C) discrete-point D) integrative

46. A periodic achievement test in a course is an example of ---- and ---- assessment. A) formal / formative B) summative / formative C) informal / formative D) formal / summative

47. Real-life tasks make up the test tasks of ---- tests. A) standardized B) discrete-point C) norm-referenced D) communicative

48. Which one of the following is not true about dictation? A) It is an integrative test. B) It requires careful listening. C) It is usually classroom-based. D) It is useful for all four language skills.

49. ---- often means no more than that the assessment is carried out frequently and is planned at the same time as teaching. A) Alternative assessment B) Formal assessment C) Standardized testing D) Norm-referenced tests

50. Which is true about cloze tests? A) They are difficult to score. B) They cannot be used to measure gain. D) The word deletion in cloze testing can be modified. E) Linguistic knowledge is enough to succeed in them.

51. Dictation is a familiar language-teaching technique that is both ---- and ----. A) direct / summative B) indirect / integrative C) direct / discrete point D) indirect / discrete point

52. Dictation with noise is an example of ---- testing. A) integrative B) discrete-point C) pragmatic D) computer-based

53. Which of the following is not true about alternative assessments? A) They are direct. B) They are authentic. C) They foster intrinsic motivation. D) They provide feedback and washback.

54. Which of the following is true about norm-referenced and criterion-referenced tests? A) Norm-referenced tests compare a student's performance to that of his or her classmates. B) The purpose is to place students along a mathematical continuum in rank order in criterion-referenced tests. C) Both norm-referenced and criterion-referenced tests are standardized and large-scale. D) Norm-referenced tests are generally designed to give feedback on specific lesson objectives.

55. Which one of the sentences is not true about alternative assessment? A) It is always formative. B) It fosters intrinsic motivation. C) It focuses on the right answer. D) It is in free-response format.

56. What label is used to refer to the extent to which a test measures what it is supposed to measure? a. reliability b. validity c. practicality d. testability

57. What is the difference between reliability and validity? A) Reliability is concerned with the consistency of measures, whereas validity is concerned with whether a measure of a concept actually measures the concept. B) Validity is concerned with the consistency of measures, whereas reliability is concerned with causality. C) Reliability is concerned with predictability, whereas validity is concerned with causality. D) None of the above

58. Which of the following choices best explains the concept of an operational definition? a. It supplies sufficient conditions for a variable b. It provides necessary conditions for a variable c. It provides necessary and sufficient conditions for a variable d. It relates variables used in a hypothesis to measurable variables

59. Which of the following is a nominal variable? A) education. B) age. C) marital status. D) of the above.

60. Professor Shipley developed a new test to measure IQ. He claimed that using his test, someone with an IQ of 180 would be considered twice as intelligent as someone with an IQ of 90 and that someone with an IQ of 90 was three times as intelligent as someone with an IQ of 30. Professor Shipley's test treats IQ as: A) nominal B) interval C) ratio D) ordinal

61. Reliability involves: A) whether a particular technique applied repeatedly to the same object would yield the same results each time. B) ensuring accuracy. C) ensuring that your measure measures what you think it should measure. D) all of the above.

62. Which of the following is true for a normally distributed set of data? a. mean = median = mode b. mean > median > mode c. mean < median < mode d. mean > median < mode

63. Which of the following is true for a data set that is skewed to the right? a. mean = median = mode b. mean > median > mode c. mean < median < mode d. mean > median < mode

64. Which of the following is true for a data set that is skewed to the left? a. mean = median = mode b. mean > median > mode c. mean < median < mode d. mean > median < mode

65. According to the empirical rule, what percentage of the data set is contained between z = -1 and z = 1? a. 50% b. 68% c. 75% d. 90%
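One way to explore the relationships asked about in questions 62 to 64 is to compute the three measures of central tendency on small data sets and compare them. The sketch below uses Python's standard `statistics` module. The two data sets are hypothetical, with the second built as a mirror image of the first.

```python
from statistics import mean, median, mode

# Hypothetical scores with a long tail of high values
tail_high = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]
# The same scores mirrored, giving a long tail of low values
tail_low = [15 - x for x in tail_high]

for name, data in (("tail_high", tail_high), ("tail_low", tail_low)):
    # Print the three statistics side by side for comparison
    print(name, "mean:", mean(data), "median:", median(data), "mode:", mode(data))
```

Comparing the order of the three values in each printed line shows how the mean is pulled toward the longer tail while the mode stays at the peak.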

66. According to the empirical rule, what fraction of the data values fall between z = 0 and z = -1? a. 0.5 b. 0.68 c. 0.75 d. 0.34

67. According to the empirical rule for a normally distributed set of data, what fraction of the data values lie within 2 standard deviations of the mean? a. 0.98 b. 0.95 c. 0.75 d. 0.68

68. According to the empirical rule, which of the following statements is true about a normally distributed data set? a. All data values must lie within three standard deviations of the mean. b. All data values must lie within two standard deviations of the mean. c. All data values must lie within four standard deviations of the mean. d. Data values that are three standard deviations or more away from the mean could be found.

For questions 69-73, pick the appropriate level of measurement.

69. Letter grades A, B and C awarded for a course. a. Nominal b. Ordinal c. Interval d. Ratio

70. The cost of a Toyota Corolla is $15,000. a. Nominal b. Ordinal c. Interval d. Ratio

71. My Social Security Number is 574 78 1590. a. Nominal b. Ordinal c. Interval d. Ratio

72. Gulf water temperature is 16°C. a. Nominal b. Ordinal c. Interval d. Ratio

73. Colors of cars. a. Nominal b. Ordinal c. Interval d. Ratio

74. What is the label used for the process that allows us to make a judgment about the value of a measure? a. testing b. evaluation c. assessment d. examination

75. Which type of test is designed to predict a person's possible progress in achieving a skill or knowledge? a. proficiency test b. selection test c. prognostic test d. achievement test

76. Which of the following tests helps a teacher to find out about the strong and the weak areas in testees' knowledge? a. diagnostic test b. prognostic test c. progress test d. mastery test

77. Which of the following tests is used to determine the starting point for learners of English as a foreign language? a. achievement test b. power test c. speed test d. placement test

78. Which of the following methods of scoring a cloze test requires the examiner to base his judgment on the degree of response conformity to native-like norms? a. clozentropy b. standard method c. exact-word method d. most-frequent word method

79. Which of the following is most probably obtained if two tests are administered at the same time? a. predictive validity b. construct validity c. criterion-related validity d. face validity

80. Which of the following enable us to compare scores that have been obtained from different tests? a. raw scores b. percentile ranks c. standard scores d. cumulative frequencies

81. The split-half reliability of a certain test has been estimated to be .80. What is the reliability of the whole test? a. less than .80 b. more than .90 c. between .70 and .80 d. between .80 and .90
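Questions 34 and 81 both involve estimating reliability after a test's length changes. A standard tool for this is the Spearman-Brown prophecy formula, r_new = k·r / (1 + (k − 1)·r), where k is the factor by which the test is lengthened. The sketch below assumes that formula. The function name and the sample values are hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is made `length_factor` times as long."""
    k = length_factor
    return (k * reliability) / (1 + (k - 1) * reliability)

# Hypothetical half-test correlation of .60, stepped up to the full test (k = 2)
full_test = spearman_brown(0.60, 2)
print(round(full_test, 2))
```

With k = 2 the formula converts a split-half correlation into a whole-test estimate; the result can approach but never exceed 1.00 for reliabilities below 1.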

82. If we have a normal distribution, what percentage of scores fall between one standard deviation below and one standard deviation above the mean? a. about 34% b. about 50% c. about 68% d. about 95%
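The percentages in the empirical rule come from areas under the standard normal curve. As a sketch, the area between any two z-scores can be computed from the normal cumulative distribution function, written here with `math.erf` from the standard library. The helper names `normal_cdf` and `fraction_between` are hypothetical.

```python
import math

def normal_cdf(z):
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def fraction_between(z_low, z_high):
    """Fraction of a normal distribution lying between two z-scores."""
    return normal_cdf(z_high) - normal_cdf(z_low)

# Example with arbitrary bounds: area between z = -1.5 and z = 1.5
print(round(fraction_between(-1.5, 1.5), 4))
```

Calling `fraction_between` with other bounds, such as one or two standard deviations either side of the mean, gives the figures that the empirical rule rounds off.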

83. ... line? a. about 34% b. about 20% c. about 68% d. about 3%

84. Which of the following tests can provide us with a scorer reliability of 1? a. composition tests b. cloze tests c. integrative tests d. multiple-choice tests

85. Which of the following is most appropriate to be tested by using dictation tests? a. listening comprehension b. speaking fluency c. spelling and grammar d. writing skill

86. Which of the following is increased if we increase the number of reading passages in a reading comprehension test? a. validity b. reliability c. readability d. practicality

87. A teacher believes that reading ability is a unitary skill. Which of the following tests is he more likely to use in his reading class? a. inferential method b. multiple-passage approach c. cloze procedure d. short-context technique

88. Which of the following tests is most useful when there is very little time for scoring? a. cloze b. composition c. true/false d. dictation

89. Which of the following is most appropriate when the number of testees is large and the time for scoring is short but the time for preparation is enough? a. interviews b. dictation c. cloze tests d. multiple-choice tests

90. Which of the following tests is used when the number of applicants is larger than the number of available positions? a. selection test b. competition test c. mastery test d. power test

91. Which of the following is most appropriate if there is very little time for developing a test but plenty of time for scoring? a. composition b. multiple-choice c. cloze d. fill-in-the-blank

92. The following are the IF and ID of four different items. Which one is the most suitable item? a. .55 and .65 b. .75 and .15 c. .95 and .90 d. .22 and .16

93. A normal curve has a number of properties. Which of the following is FALSE as one of them? a. The values of mode, mean, and median are the same. b. Most of the scores pile up in the center of the curve. c. Half of the scores are below and the other half are above the mean. d. Frequency of each score increases as we move away from the mean.

94. One hundred students have taken a test and 15 of them have obtained a score of 94. No other score has a frequency as high as this score. What is this score called? a. median b. range c. mean d. mode

95. Which of the following methods of estimating reliability suffers most from practice effect? a. equivalent forms b. test-retest c. internal consistency d. split-half
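Question 92 refers to item facility (IF) and item discrimination (ID). The sketch below assumes one common classroom definition: IF is the proportion of test-takers who answer the item correctly, and ID is the IF of a high-scoring group minus the IF of a low-scoring group. The text the quiz is based on may define them differently, and the function names and response data are hypothetical.

```python
def item_facility(responses):
    """Proportion of correct answers; responses are 1 (correct) or 0 (wrong)."""
    return sum(responses) / len(responses)

def item_discrimination(upper_group, lower_group):
    """Facility in the high-scoring group minus facility in the low-scoring group."""
    return item_facility(upper_group) - item_facility(lower_group)

# Hypothetical answers to one item from the top and bottom scorers on the whole test
upper = [1, 1, 1, 1, 0]
lower = [1, 0, 0, 0, 0]

item_if = item_facility(upper + lower)
item_id = item_discrimination(upper, lower)
print(round(item_if, 2), round(item_id, 2))
```

Under this definition, an ID near zero means the item does not separate stronger from weaker test-takers, and a negative ID means weaker test-takers outperform stronger ones on it.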

96. Which of the following is the main difficulty in estimating reliability through the alternate-form method? a. We cannot divide a test in half to prepare two forms that are similar. b. It is very difficult to prepare two equivalent forms of the same test. c. We cannot administer the test to the same people twice. d. It is difficult to have an equal number of items in both tests.

97. Which of the following refers to an unannounced, informal test of what has been covered in a previous session of a class? a. exam b. test battery c. assessment d. quiz

98. What is the main function of the stem of a multiple-choice item? a. stating a specific point b. implying a specific point c. expressing a specific point d. stating or implying a specific point

99. Where can we find most of the scores in a normal distribution? a. around the left tail of the curve b. around the right tail of the curve c. around the mean d. scattered evenly

100. One hundred students have taken a test. The cumulative frequency of the score 15 is 60. How many students have scored above this score? a. about 30 b. about 40 c. about 50 d. about 60
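Questions 94 and 100 rely on the ideas of mode and cumulative frequency. As a small sketch, assuming cumulative frequency means the number of scores at or below a given value, both can be computed directly from a list of scores. The score list and function names below are hypothetical.

```python
from collections import Counter

def cumulative_frequency(scores, value):
    """Number of scores at or below `value`."""
    return sum(1 for s in scores if s <= value)

def scored_above(scores, value):
    """Number of scores strictly above `value`."""
    return len(scores) - cumulative_frequency(scores, value)

# Hypothetical scores of ten students
scores = [12, 15, 15, 9, 18, 20, 15, 11, 17, 14]

most_frequent = Counter(scores).most_common(1)[0][0]   # the mode
print(most_frequent, cumulative_frequency(scores, 15), scored_above(scores, 15))
```

Because every score is either at or below the value or above it, the two counts always add up to the total number of test-takers.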