Psychometric Review of the Maryland School Performance Assessment Program (MSPAP)

In 1989, the Governor’s Commission on School Performance (the Sondheim Commission) published a report on what Maryland should do to improve its public schools. One of its recommendations was the establishment of a statewide assessment. The result was an annual statewide assessment, the Maryland School Performance Assessment Program (MSPAP), which covers six content areas and is administered to students in grades 3, 5, and 8. Today, the MSPAP is a major component of the state’s accountability program for schools.

Main Features of the MSPAP

The MSPAP has a number of important features (features less common in other states or unique to Maryland are noted with an asterisk):

Student assessment is based on tasks that are multi-step, multi-question activities built around a common theme.*
Subject matter is commonly measured through activities that integrate several subject areas. MSPAP tasks typically try to assess more than a single subject. Thus, an MSPAP task may try to measure student learning in, for example, mathematics, reading, and science.*
Many of the tasks have pre-assessment activities. These activities are not scored; they are intended to acquaint students with the task itself and with the sort of work required in the scored portions of the task. They are done either by a large group of students or by randomly formed smaller groups of students within each class.*
Students are always asked to construct their answers. This means that multiple-choice and true-false items are not used in the assessments.*
The MSPAP is criterion-referenced: students are judged not against each other, but against standards of performance that are set in relation to well-defined domains of content measured by the assessments. In theory, every student could be placed in the highest performance category, or every student could be placed in the bottom performance category. Placement is based on test performance, and there are no quotas on the number of students who can or should be placed in each performance category.
Scoring is sometimes based on “patterns of student responses” across related sets of activities as opposed to simply summing up points for correct work on individual questions. This means that a score assigned to a student may be based on his/her responses to a number of questions, rather than scoring each question independently of the others.*
Because of the design of the assessments, a student’s scores in the same subject cannot be compared across grades. As a result, gain or change scores for individual students over school years cannot be meaningfully interpreted.
At each grade, 20 tasks are used to assess student and school performance across the six content areas. Each student is randomly assigned to take only six or seven of the tasks. Via “statistical equating,” the 20 tasks from one year’s assessment can be linked to the 20 tasks in the previous year’s assessment so that changes in the level of school, district, and state performance over time can be judged.
Approximately two-thirds of each assessment consists of tasks that have been administered in previous years.*
All schools are working against the same performance targets. Neither race nor family educational background of students, for example, is factored into the scoring process.
The assessment is principally intended to provide a measure of school-wide performance, not individual or classroom performance. Individual scores are available, but the Maryland State Department of Education does not recommend their use.
MSPAP scores can have both positive and negative consequences for individual schools.
MSPAP is intended to provide information for judging school achievement and growth and to provide data to improve classroom instruction.
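Two of the features above, random assignment of six or seven of the 20 tasks to each student and criterion-referenced placement without quotas, can be illustrated with a small sketch. The Python below is hypothetical and is not MSPAP’s actual scoring procedure: the 0 to 100 score scale, the cut scores, and the category labels are invented for illustration, and statistical equating across years is not modeled. It shows one reason school-level results are more dependable than individual ones under this kind of design (often called matrix sampling): each task mean for a school pools many students, while any one student sees only a third of the tasks.

```python
import random
from statistics import mean

NUM_TASKS = 20  # tasks per grade, as described in the feature list

# Hypothetical cut scores on a hypothetical 0-100 scale (highest first).
# These are illustrative only; they are not MSPAP's actual standards.
CUTS = [(80, "Excellent"), (60, "Satisfactory"), (0, "Not met")]


def assign_tasks(student_ids, rng):
    """Randomly give each student 6 or 7 distinct tasks out of the 20."""
    return {
        sid: sorted(rng.sample(range(NUM_TASKS), rng.choice([6, 7])))
        for sid in student_ids
    }


def school_task_means(assignments, scores):
    """Average each task over only the students who were assigned it.

    `scores[(sid, task)]` is a hypothetical score for one student on one task.
    """
    by_task = {}
    for sid, tasks in assignments.items():
        for t in tasks:
            by_task.setdefault(t, []).append(scores[(sid, t)])
    return {t: mean(vals) for t, vals in by_task.items()}


def classify(score):
    """Criterion-referenced placement: compare against fixed cuts, not other schools."""
    for cut, label in CUTS:
        if score >= cut:
            return label
    raise ValueError("score is below the lowest cut")


# Usage with simulated (not real) data for one hypothetical school of 300 students.
rng = random.Random(0)
assignments = assign_tasks(range(300), rng)
scores = {
    (sid, t): rng.uniform(40, 100)
    for sid, tasks in assignments.items()
    for t in tasks
}
task_means = school_task_means(assignments, scores)
school_score = mean(task_means.values())
print(len(task_means), round(school_score, 1), classify(school_score))

# No quotas: any number of schools can land in the top category.
print([classify(s) for s in (85, 90, 95)])
```

Because `classify` looks only at fixed cut points, it never ranks one school against another, which mirrors the "no quotas" point above. A real program would also place task results on a common scale, through the equating step described in the feature list, before comparing performance across years.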

MSPAP is unique among state assessment programs, and it has been in place for 10 years. Many of the features above, especially those marked with an asterisk, are less common in other state assessments around the country. In some states, the focus may be on paper-and-pencil tests to reduce testing program costs or to return scores to schools quickly. In other states, the focus may be on the assessment of basic skills, for which multiple-choice test items are quite adequate to assess the curricula. In many states, state law requires that individual scores be provided. There is no shortage of reasons for differences in assessment programs across states.

The Abell Foundation commissioned two groups, a content panel consisting of subject-matter specialists and a psychometric panel consisting of psychometricians and experts in assessment, to investigate a number of important issues concerning the MSPAP. The questions and recommendations of the psychometric review panel follow in this Abell Report.

Please note that the report does not include the cited appendices. We apologize for any inconvenience this may cause.