Nguyễn Ái Hoàng Châu
Lữ Thị Ngọc Lan
Võ Duy Minh
Bùi Thị Minh Nguyệt
Nguyễn Hồng Lệ Ngọc
Lê Đức Thịnh
What is reliability?
How to identify the reliability of a test
• stable over time
• consistent in terms of the content sampling
• free from bias
On a reliable test, a test-taker who is re-examined with the same test on different occasions, with different sets of equivalent items, or under variable examining conditions obtains the same score.
• good validity, poor reliability
• poor validity, good reliability
• poor validity, poor reliability
• good validity, good reliability
• a type of validity evidence
• a posteriori validity evidence
• “scoring validity”
• a measure of the stability of test scores
• a prerequisite for measurement validity
Test-retest reliability
Internal consistency
Marker reliability
Parallel forms reliability
Types of scoring validity (methods of estimating reliability)
• the same test is given twice to the same test-takers
• the interval between the two administrations is long enough for test-takers to forget the test, but not too long
• no teaching takes place between the two administrations
Test-retest reliability
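A test-retest estimate is usually reported as the correlation between the two sets of scores. The sketch below assumes Pearson's correlation is the statistic used; the scores and the names `first_sitting` and `second_sitting` are hypothetical and only illustrate the calculation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical scores: the same five test-takers sit the same test twice.
first_sitting = [62, 75, 48, 90, 81]
second_sitting = [65, 73, 50, 88, 84]

test_retest_r = pearson(first_sitting, second_sitting)
print(f"test-retest reliability estimate: {test_retest_r:.2f}")
```

The same function applies to parallel forms reliability: replace the second sitting with the scores from the equivalent form.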
• two different but equivalent forms of the test are given to the same test-takers
• the two forms can be administered in close succession
Parallel forms reliability
Lê Đức Thịnh
Internal consistency
• A variation of parallel forms reliability: the parallel-forms statistic is applied to a single test, either by dividing the test into two halves and correlating them, or by estimating the correlation of each item in the test with the others.
• Focuses on how consistent a test’s internal elements are with each other.
Internal consistency correlation
Split half reliability
Average inter-item correlation
Average item-total correlation
Internal consistency correlation
Split half reliability
[Figure: a six-item measure (Items 01-06) split into two halves, Items 01, 03, 04 and Items 02, 05, 06; the correlation between the two halves is .87]
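The split-half procedure can be sketched as follows. The item grouping (Items 01, 03, 04 against Items 02, 05, 06) follows the slide's figure; the score matrix is hypothetical. The final Spearman-Brown step is not on the slide: it is a commonly used correction, added here as an assumption, that estimates the reliability of the full-length test from the half-test correlation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical item scores: one row per test-taker, columns are Items 01-06.
scores = [
    [4, 5, 4, 4, 5, 4],
    [2, 1, 2, 3, 2, 2],
    [5, 5, 4, 5, 4, 5],
    [3, 3, 3, 2, 3, 3],
    [1, 2, 1, 1, 2, 1],
]

# Same grouping as the slide's figure (0-based column indexes).
half_a = [0, 2, 3]  # Items 01, 03, 04
half_b = [1, 4, 5]  # Items 02, 05, 06

# Each test-taker gets one total per half.
totals_a = [sum(row[i] for i in half_a) for row in scores]
totals_b = [sum(row[i] for i in half_b) for row in scores]

split_half_r = pearson(totals_a, totals_b)

# Spearman-Brown step-up (an addition, not on the slide).
full_length_estimate = 2 * split_half_r / (1 + split_half_r)

print(f"split-half correlation: {split_half_r:.2f}")
print(f"Spearman-Brown estimate: {full_length_estimate:.2f}")
```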
Internal consistency correlation
Average inter-item correlation
[Figure: a six-item measure (Items 01-06) and its inter-item correlation matrix]
        i1    i2    i3    i4    i5    i6
i1    1.00
i2     .89  1.00
i3     .91   .92  1.00
i4     .88   .93   .95  1.00
i5     .84   .86   .92   .85  1.00
i6     .88   .91   .95   .87   .85  1.00
Average inter-item correlation: .90
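The average inter-item correlation is the mean of the correlations between every distinct pair of items. The sketch below takes the 15 pairwise values from the slide's matrix, read column by column (this reading of the extracted figure is an assumption). With these rounded entries the mean comes out at about .89, close to the .90 shown on the slide; the small gap presumably reflects rounding of the individual entries.

```python
# Pairwise correlations from the slide's matrix (diagonal 1.00 values excluded).
inter_item = {
    ("i1", "i2"): 0.89, ("i1", "i3"): 0.91, ("i1", "i4"): 0.88,
    ("i1", "i5"): 0.84, ("i1", "i6"): 0.88,
    ("i2", "i3"): 0.92, ("i2", "i4"): 0.93, ("i2", "i5"): 0.86,
    ("i2", "i6"): 0.91,
    ("i3", "i4"): 0.95, ("i3", "i5"): 0.92, ("i3", "i6"): 0.95,
    ("i4", "i5"): 0.85, ("i4", "i6"): 0.87,
    ("i5", "i6"): 0.85,
}

# Six items give 6 * 5 / 2 = 15 distinct pairs.
pair_count = len(inter_item)
average_inter_item = sum(inter_item.values()) / pair_count

print(f"{pair_count} pairs, average inter-item correlation: {average_inter_item:.3f}")
```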
Internal consistency correlation
Average item-total correlation
[Figure: a six-item measure (Items 01-06) and its correlation matrix, with a Total row added]
        i1    i2    i3    i4    i5    i6  Total
i1    1.00
i2     .89  1.00
i3     .91   .92  1.00
i4     .88   .93   .95  1.00
i5     .84   .86   .92   .85  1.00
i6     .88   .91   .95   .87   .85  1.00
Total  .84   .88   .86   .87   .83   .82  1.00
Average item-total correlation: .85
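For the item-total approach, each item is correlated with the total score and the six correlations are averaged. The sketch below first checks the slide's arithmetic using its Total row (.84, .88, .86, .87, .83, .82), then shows how such correlations could be computed from raw scores. The raw score matrix is hypothetical, and the total used here includes the item itself (the uncorrected form), which is an assumption about the slide's method.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Total row from the slide: correlation of each item with the total score.
slide_item_total = [0.84, 0.88, 0.86, 0.87, 0.83, 0.82]
slide_average = sum(slide_item_total) / len(slide_item_total)

# Hypothetical raw scores: one row per test-taker, columns are Items 01-06.
scores = [
    [4, 5, 4, 4, 5, 4],
    [2, 1, 2, 3, 2, 2],
    [5, 5, 4, 5, 4, 5],
    [3, 3, 3, 2, 3, 3],
    [1, 2, 1, 1, 2, 1],
]

totals = [sum(row) for row in scores]
item_total = [
    pearson([row[i] for row in scores], totals)
    for i in range(len(scores[0]))
]
average_item_total = sum(item_total) / len(item_total)

print(f"slide average: {slide_average:.2f}")
print(f"hypothetical data average: {average_item_total:.2f}")
```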
Internal consistency estimation
Excel correlation
Kuder-Richardson 20 or 21
Cronbach’s alpha
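A minimal sketch of two of these estimates, using their standard textbook formulas: Cronbach's alpha, k/(k-1) × (1 − Σ item variances / total-score variance), and KR-20, which replaces each item variance with p × q for right/wrong items. The answer matrix and function names are hypothetical. Population variance (dividing by n) is used throughout so that the two statistics agree on 0/1 data.

```python
def variance(values):
    """Population variance (divides by n, not n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def cronbach_alpha(scores):
    """Cronbach's alpha for a matrix with one row per test-taker."""
    k = len(scores[0])
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def kr20(scores):
    """Kuder-Richardson 20 for items scored 0 (wrong) or 1 (right)."""
    k = len(scores[0])
    n = len(scores)
    # p = proportion answering the item correctly, q = 1 - p.
    p = [sum(row[i] for row in scores) / n for i in range(k)]
    sum_pq = sum(pi * (1 - pi) for pi in p)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum_pq / total_var)


# Hypothetical right/wrong answers: rows are test-takers, columns are items.
answers = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

alpha = cronbach_alpha(answers)
kr20_value = kr20(answers)
print(f"Cronbach's alpha: {alpha:.2f}, KR-20: {kr20_value:.2f}")
```

On dichotomous items the two coincide, because the variance of a 0/1 item equals p × q; alpha also handles items scored on a scale.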
Internal consistency
Advantages:
• Saves time and expense
• Higher reliability values than with the test-retest and parallel forms methods
Disadvantages:
• No evidence of the temporal stability of the scores, as they result from a single administration of the test
• Not easy to determine the level of difficulty of the items
• The items in one half may not be equivalent to the items in the other half
Threats to test reliability
• environmental factors
• construct, content, and theory-based validity
• defining the level of difficulty/ease of the items
• defining the level of difficulty of reading texts and their questions
Bùi Thị Minh Nguyệt
Marker reliability
• relates chiefly to tests in which samples of writing or speaking are produced
• concerns the consistency of the marker(s)
Marker reliability
intra-rater reliability : each marker needs to be consistent within himself/herself
inter-rater reliability : markers need to be consistent with each other
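The slides do not name a statistic for marker consistency. One common choice, assumed here, is Cohen's kappa, which corrects the raw agreement between two sets of marks for the agreement expected by chance. The band scores and the names `marker_1` and `marker_2` are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length lists of categorical marks."""
    n = len(rater_a)
    # Proportion of scripts given exactly the same mark.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's mark frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical band scores given by two markers to the same eight scripts.
marker_1 = [3, 4, 2, 5, 3, 4, 2, 3]
marker_2 = [3, 4, 3, 5, 3, 4, 2, 4]

kappa = cohen_kappa(marker_1, marker_2)
print(f"inter-rater kappa: {kappa:.2f}")
```

For intra-rater reliability the same function can be applied to two marking rounds by one marker on the same scripts.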
How to improve marker reliability
• Have explicit, agreed criteria for carrying out the marking task
• Analytic scales
• Holistic scales
• Standardization
• Moderation of scores (Multi-faceted Rasch, MFR)
Thank you for listening