Test Analysis

DOE-HDBK-1205-97

6. TEST ANALYSIS

Because tests are used to qualify trainees to do a job or task, it is important that they are
developed properly. If tests are constructed systematically and administered correctly, they
will have a high degree of reliability. The quality and effectiveness of tests should be
continuously monitored and improved where necessary. Analysis of test results provides
important input for judging and improving the quality and effectiveness of tests. Although
most instructors and test developers are not required to perform complicated statistical
analyses, an understanding of some basic concepts is beneficial in interpreting test results
and refining the testing process.

6.1 Reliability

Reliability is functionally defined as the consistency between two separate measurements
of the same thing. A test that gives perfectly consistent results is perfectly reliable.
Reliability is generally not a problem with performance tests as long as conditions in the
evaluation situation remain constant. Reliability can be a problem with written tests
because test items are difficult to construct well. Reliability can be affected by ambiguous
test items, multiple correct answers, typographical errors, adverse testing conditions,
interruptions, limited time, and complicated answer sheets. Trainee readiness and scoring
errors also affect test reliability.

The following examples illustrate how reliability or unreliability may be indicated as tests
are analyzed.

Example: Ten trainees were given test A on Monday and then again on Tuesday.
Assuming that nobody forgot anything overnight, the Tuesday test results
should be exactly the same as the Monday test results if test A is reliable.

Any significant difference would indicate test unreliability, since nothing changed from
Monday to Tuesday. This is a form of test-retest reliability. The interval between the two
administrations can vary. Longer intervals generally result in greater differences in test
results, but they can also be used to determine the long-term stability of the test.
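
The handbook does not prescribe a statistic for comparing the two administrations. One
common way to summarize test-retest consistency is the Pearson correlation between the
first and second sets of scores, where a value near 1.0 suggests that trainees scored
consistently on both days. The sketch below assumes that approach. The function name
pearson_r and the Monday and Tuesday scores are hypothetical and were invented only for
illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of deviations from each mean.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return cov / sqrt(var_x * var_y)


# Hypothetical percent scores for ten trainees on test A (not real data).
monday = [82, 75, 91, 68, 88, 79, 95, 71, 84, 77]
tuesday = [80, 77, 90, 70, 86, 81, 94, 69, 85, 78]

r = pearson_r(monday, tuesday)
print(f"test-retest correlation: {r:.2f}")
```

A correlation describes how consistently trainees are ranked, not whether their scores are
identical. If every trainee scored five points higher on Tuesday, the correlation would
still be 1.0, so it may be worth comparing the average scores of the two administrations
as well.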