Reliability is the consistency or stability of a measuring instrument. Kerlinger defined it as “the proportion of the ‘true’ variance to the total obtained variance of the data yielded by a measuring instrument.” Operationally, reliability is 1.00 minus the proportion of error variance in the total variance yielded by a measuring instrument, with an index of 1.00 indicating perfect reliability.

Schwab defined reliability as the ratio of “true” to total variance in a set of parallel measurements obtained on an individual, while Mitchell discussed reliability as the correlation between maximally similar items, which assesses random or chance error.

It is also useful to think of reliability as:

V_O = V_T + V_e (Observed variance = True variance + Error variance)
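This decomposition can be sketched numerically. The variance figures below are hypothetical, chosen only to show that the two common expressions for the reliability coefficient, V_T / V_O and 1 − V_e / V_O, agree:

```python
# Toy variance decomposition: reliability is the share of observed
# variance that is "true" variance. Values are hypothetical.
true_variance = 8.0    # V_T
error_variance = 2.0   # V_e

observed_variance = true_variance + error_variance  # V_O = V_T + V_e

# Two equivalent expressions for the reliability coefficient:
reliability = true_variance / observed_variance             # V_T / V_O
reliability_alt = 1.0 - error_variance / observed_variance  # 1 - V_e / V_O

assert reliability == reliability_alt
print(reliability)  # 0.8
```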

The four methods for assessing reliability are:

- Kuder-Richardson 21 (KR-21): Considered the most conservative reliability estimate for an instrument with binary scoring, i.e., where the response scale is dichotomous (e.g., true or false). The Kuder-Richardson formulas are special cases of Cronbach’s coefficient alpha. Cronbach’s alpha is the most frequently used metric because it is (a) conservative, (b) easier and less time-consuming than test-retest or parallel forms, and (c) does not involve discarding data as in the split-half test.
- Split-half: This involves splitting the items into two halves, with the goal of obtaining two equal or equivalent halves. Each person then has two half-scores, and the two sets of half-scores are compared (typically with a Pearson correlation) to assess internal consistency. This is fairly conservative because it underestimates the true reliability: the correlation is only between two halves of the test, and the reliability estimate rests on fewer items. Splitting the items into truly equivalent halves can also be difficult.
- Parallel forms: This involves creating two equivalent, but not identical, measures (Test A and Test B), which can be very time-consuming for the researcher. Each person completes both instruments, and the two scores are compared for consistency. This measure is less conservative because there is a chance of fatigue or boredom when respondents must complete two measurement instruments.
- Test-retest: Used to measure the stability of a measure over time. This involves administering the same measurement instrument to the same group of people on two different occasions and correlating the two sets of scores. It is not a good way of computing the reliability coefficient when attrition is high or when the organisms being measured go through a dramatic developmental change between time 1 and time 2.