· Human beings don't behave in exactly the same way on every occasion, even when circumstances seem identical. The solution is to construct, administer, and score tests in such a way that the results are similar on different days and at different times (Reliability).
· Reliability Coefficient: two administrations of the same test to the same group of students are compared (parallel to the validity coefficient). The ideal coefficient is 1 (a test that would give the same results for a particular group of students regardless of the day/time of the test); 0 means the results are unconnected (the score on one day doesn't predict the score on another day). Acceptable coefficients differ for different skills (around 0.9 for a vocabulary test, around 0.7 for an oral test). Test reliability (consistency of candidates' performance from occasion to occasion) and scorer reliability (the scorer's consistency) are interrelated: if scoring is unreliable, test results are also unreliable. Objective tests (evaluated by computers) almost always give a higher coefficient than subjective tests (marked by a human), since computers are perfectly reliable (consistent) scorers.
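A minimal sketch of how such a coefficient could be computed, assuming the test-retest comparison described above and using the Pearson correlation as the measure of association; the function name and the students' marks are invented for illustration:

```python
from math import sqrt

def reliability_coefficient(first_sitting, second_sitting):
    """Pearson correlation between two sittings of the same test.

    1.0 -> the two sets of scores vary together perfectly;
    0.0 -> the score on one day doesn't predict the score on the other.
    """
    n = len(first_sitting)
    mean_x = sum(first_sitting) / n
    mean_y = sum(second_sitting) / n
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(first_sitting, second_sitting))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in first_sitting))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in second_sitting))
    return cov / (spread_x * spread_y)

# Hypothetical marks for the same five students on two occasions.
day_one = [78, 65, 90, 55, 82]
day_two = [75, 68, 88, 58, 80]

print(f"reliability coefficient: {reliability_coefficient(day_one, day_two):.2f}")
```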
· How to make tests more Reliable:
Test Reliability
Ø Take enough samples of behavior: the more items (questions, passages, etc.) a test contains, the more reliable it is, because many samples of behavior are representative of a person's true behavior. However, tests shouldn't be so long that students become bored or tired (their performance then being unrepresentative of their ability).
Ø Restrict freedom of candidates: students should not be given too many choices, because a very broad subject might produce different results from the same student on different occasions (writing an essay on tourism vs. writing an essay on tourism in Kashmir, where views might change with changing conditions in Kashmir). However, too much restriction might distort the task.
Ø Write unambiguous items: don't ask open-ended questions that allow two interpretations or invite an answer different from the one anticipated by the examiner.
Ø Provide clear/explicit instructions: written/oral tests should have clear instructions so that students attempt the test in accordance with the scorer's requirements.
Ø Ensure legibility of tests: students should not be expected to do extraneous work such as deciphering badly typed/handwritten test papers (variation in font, spacing, print, etc.).
Ø Familiarity of candidates with format/testing techniques: teachers should acquaint students in advance with the test format and the examiner's requirements.
Ø Uniform/non-distracting administration: the greater the differences in administrative conditions, the greater the differences in test results.
Scorer Reliability
Ø Use items that permit more objective marking: fill-in-the-blanks and MCQs; where these are not possible (such as in comprehension tasks), ask direct/unambiguous questions.
Ø Make direct comparison between students: similar to restricting the freedom of candidates. Scoring everyone on one topic is more reliable than giving students the choice to write on any one of four topics and then comparing the results.
Ø Scoring key: the examiner should anticipate all the different answers/approaches students might produce. The scoring key should clearly state the points awarded for totally correct and for partially correct answers.
Ø Train scorers: scorers' scoring patterns should be analyzed from time to time to ensure consistency.
Ø Identify candidates by number: candidates' names/photographs should not be visible while marking, to ensure objective scoring (a scorer may be prejudiced for or against certain names, nationalities, or genders).
Ø Employ multiple/independent scoring: a type of syndicate marking. This way, a senior colleague can investigate discrepancies between scorers (see the sketch after this list).
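As a minimal sketch of the multiple/independent-scoring point above, the snippet below compares two scorers' marks, with candidates identified by number as recommended earlier, and flags scripts whose marks differ by more than a tolerance. The candidate numbers, marks, tolerance of 5 points, and function name are all hypothetical:

```python
# Marks awarded independently by two scorers, keyed by candidate number
# (names are withheld during marking). All figures are hypothetical.
scorer_a = {101: 72, 102: 58, 103: 85, 104: 40, 105: 66}
scorer_b = {101: 70, 102: 59, 103: 71, 104: 42, 105: 65}

TOLERANCE = 5  # assumed maximum acceptable gap between the two marks

def flag_discrepancies(marks_a, marks_b, tolerance):
    """Return candidate numbers whose two marks differ by more than
    tolerance; these scripts go to a senior colleague for a deciding mark."""
    return sorted(num for num in marks_a
                  if abs(marks_a[num] - marks_b[num]) > tolerance)

for candidate in flag_discrepancies(scorer_a, scorer_b, TOLERANCE):
    print(f"candidate {candidate}: A={scorer_a[candidate]}, "
          f"B={scorer_b[candidate]} -> refer to senior colleague")
```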
· Reliability and Validity: a valid test must be reliable, but a reliable test may not be valid. E.g. a writing test requiring students to write down the translation equivalents of 500 words might be reliable (the result will be the same/similar on different occasions) but not valid as a test of writing (it doesn't really measure a student's writing ability).