Shelving Standardized Tests

Sunday, February 14th, 2021

In previous writings I detailed some of the issues associated with using results from state standardized tests, and from most other commercial standardized testing products, to draw conclusions about student learning, teacher effectiveness, and the overall quality of schools. In this post I restate my rebuttals to four common claims made by proponents of using standardized tests to make important educational decisions.


Standardized tests are diagnostic and the results provide educators with important information about student mastery of academic content.

Claims about the diagnostic properties of the standardized tests used in schools today are not accurate. For a test to be diagnostic at the individual skill level, it must include about 20 to 25 questions per skill (Frisbie, 1988; Tanner, 2001). That is because the reliability of the diagnosis must be high. We, as educators, want to be confident that the results we use to make potentially life-changing decisions about students are reliable. To diagnose a student's achievement of a specific skill, the results must have reliability figures of around 0.80 to 0.90 at the individual skill level.

YES, most standardized tests in use today have OVERALL TEST reliabilities of 0.80 to 0.90, but NOT at the individual skill level. No test currently used by any state for accountability purposes has 20 to 25 questions per skill or curriculum content standard.

For example, to diagnose student learning related to inferential comprehension, there would need to be about 20-25 questions related to that specific skill. The tests do not have enough questions to diagnose student achievement at the individual level in any of the tested skills or standards. Thus, any “diagnostic” decisions made from state test results, or any standardized test that does not have enough questions to be diagnostic at the individual student level, will be potentially flawed.
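The 20-to-25-item figure is consistent with the Spearman-Brown prophecy formula, which projects how reliability grows as parallel items are added. A minimal sketch (the per-item reliability of 0.20 is my illustrative assumption, not a figure from the sources cited):

```python
import math

def spearman_brown(r_item: float, k: int) -> float:
    """Projected reliability of a composite of k parallel items,
    each with reliability r_item (Spearman-Brown prophecy formula)."""
    return k * r_item / (1 + (k - 1) * r_item)

def items_needed(r_item: float, r_target: float) -> int:
    """Smallest number of parallel items whose composite
    reliability reaches r_target."""
    k = (r_target * (1 - r_item)) / (r_item * (1 - r_target))
    return math.ceil(k)

# With an assumed per-item reliability of 0.20, reaching the ~0.85
# reliability needed for skill-level diagnosis takes about two dozen items:
n = items_needed(0.20, 0.85)
print(n)                                   # 23 items
print(round(spearman_brown(0.20, n), 2))   # composite reliability ≈ 0.85
```

With only three or four questions per skill, the same formula projects a skill-level reliability far below 0.80, which is the point of the argument above.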

Another issue that prevents the test results from being diagnostic is the timeframe in which educators receive the results. The tests are SUMMATIVE monitoring devices, not tools to inform classroom instruction. By the time the results are returned to educators the school year is almost over, or their students have already transitioned to the next grade level.  The results from diagnostic testing in other professions, such as medicine or psychology, are generally received the day of the test or within a week. Waiting months for results about students that teachers no longer teach does not provide diagnostic information. At that point, the information is a historical artifact.

A final issue that prevents the results from state standardized tests from providing diagnostic information is that parents, teachers, and students are not able to see every question from the tests. Officials from the various state education agencies have said it is cost prohibitive to release all the items. They make some items available, but for a parent or teacher, “some” items provide only “some” of the information needed to make a thorough determination about the strengths and challenges faced by individual students.

For example, the lack of access to actual standardized test items is similar to a situation in which a child’s classroom teacher sends home only 10% or 20% of the questions from the weekly tests along with the child’s final grade on the test, followed by a determination of his or her overall achievement in the subject as a whole. The parent and student have no way to target their practice to improve: without knowing which questions were answered correctly or incorrectly, they cannot tell which areas are strong and which need work.


PARCC provides valid results of what students know and can do in English Language Arts and Mathematics.

Unfortunately, results from standardized tests most often provide more information about the family and community economic conditions in which a student lives than about how much a student knows or can do (e.g., Maroun, 2018; Tienken et al., 2017). In this way, standardized tests discriminate against students living in poverty and immigrant students. Colleagues and I have completed several studies in states around the country, as have other researchers, in which the results of standardized tests for schools and districts were predicted by factors outside the control of the school. Knowing nothing about the schools themselves, we were generally able to predict the percentage of students passing at the school or district level for more than 70% of our samples.
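As a hedged illustration of this kind of out-of-school prediction (the data below are synthetic and the variable names are my own invention; the actual studies used real district-level demographic data), an ordinary least-squares model can recover pass rates from demographic factors alone:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic district-level data (illustrative only, not the studies' data):
# predictors are out-of-school factors such as the percentage of families
# in poverty and the percentage of adults holding a bachelor's degree.
n = 200
poverty = rng.uniform(0, 40, n)     # % of families below the poverty line
degrees = rng.uniform(10, 60, n)    # % of adults with a bachelor's degree
noise = rng.normal(0, 5, n)
pass_rate = 80 - 1.2 * poverty + 0.4 * degrees + noise  # % passing the test

# Fit ordinary least squares using only the out-of-school predictors.
X = np.column_stack([np.ones(n), poverty, degrees])
beta, *_ = np.linalg.lstsq(X, pass_rate, rcond=None)
pred = X @ beta

# "Accurate" here = predicted within 10 percentage points of the actual rate.
within_10 = np.mean(np.abs(pred - pass_rate) <= 10)
print(f"{within_10:.0%} of districts predicted within 10 points")
```

The sketch is deliberately simple; the point it illustrates is that when demographic variables alone predict most of the variation in pass rates, the test scores carry little independent information about school quality.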

If the results from standardized tests can be predicted with a moderate level of accuracy, how valid are those results for making important decisions about the quality of teaching and learning that occurs in schools and school districts?


Standardized test results can tell parents, students, teachers, and the public whether students in Grades 3–8 and high school are college and career ready.

This is an irresponsible claim, especially given that even the SAT cannot predict very accurately which students will do well their first year of college and beyond (College Board, 2012). In fact, a student’s high school GPA has consistently been a more accurate predictor of first-year college success and completion than the SAT (College Board, 2012). The claim about college readiness would actually take about 13 years to be validated—the amount of time an average general education student spends in public school before attending college.

If state mandated standardized tests can determine college readiness, then why don’t any four-year colleges accept the results for admissions purposes in lieu of SAT or ACT scores? Should students who score proficient on their state standardized tests but are not ultimately accepted to a competitive college be allowed to sue their state education agency for false advertising?


State standardized tests assess important 21st-century skills and knowledge.

Unfortunately, state standardized tests mostly measure 19th-century skills with a 20th-century tool: the computer. Most state tests are still aligned to the Common Core State Standards, and those standards mandate knowledge and skills that are not much different from those taught for the last 150 years (Mathis, 2010). Although some states changed the names of their curriculum content standards, the majority of the content is still based on the Common Core.

A lot of claims have been made that the Common Core State Standards are more rigorous than the curriculum standards that preceded them, but those claims most often come from one privately funded report by a pro-Common Core think tank. Sure, the Common Core State Standards include verbs like “analyze” in some places, but a closer look reveals that students are required only to analyze for one right answer. The Common Core State Standards contain less complex thinking than some states’ previous curriculum content standards (Sforza, Tienken, & Kim, 2016). In fact, many of the Common Core State Standards and almost all of the mathematics and reading questions on state tests require students to find one best answer. To support the claim of increased rigor, state tests inflate the difficulty of their questions through contrived directions and multistep, difficult-to-follow tasks.

Any test based on the Common Core State Standards will not be able to assess the skills associated with success in the knowledge economy because those standards do not include the necessary content (Tienken, 2011, 2017).


It seems as if some state education bureaucrats and so-called education reformers are trying to “teacher-proof” assessment through the use of standardized tests. You can’t take the teacher out of the assessment equation. A machine or a “canned” assessment program cannot replace the teacher. Over time, teacher assessments provide more detailed and actionable information than standardized tests. Teacher assessments result in less time spent on “test prep” and more time spent on learning. Teacher assessments employ an approach known as “assessment for learning” whereas high-stakes standardized tests rest on a mechanistic foundation of “assessment of knowledge acquisition” that is akin to weighing children instead of feeding them.

Large-scale projects like the current New York Performance Assessment Consortium (NYPAC) and the former Nebraska STARS statewide assessment program provide blueprints of how to balance accountability with authentic learning and assessment without inundating children and teachers with standardized tests. State education leaders can easily use the NYPAC and STARS blueprints to apply for changes to their state ESSA plans. The only existing standardized test I have found that assesses problem solving and higher order thinking in authentic contexts is the CWRA+ assessment from the Council for Aid to Education (CAE). The test includes a problem-based assessment portion that requires higher order thinking on the part of the student and provides important feedback about critical thinking skills.

However, I believe strongly that the ultimate assessment system already exists in public school classrooms: the teacher. Assessment should not be outsourced to computers or corporations. State and federal education leaders should invest in developing teachers’ assessment skills instead of spending millions of dollars outsourcing assessments to commercial companies whose tests do not provide usable information for teaching and learning.


College Board. (2012). 2012 college-bound seniors: Total group profile report. New York, NY: Author.

Frisbie, D. A. (1988). Reliability of scores from teacher-made tests. Educational Measurement: Issues and Practice, 7(1), 25–35.

Maroun, J. (2018). The predictive power of out-of-school community and family level demographic factors on district level student performance on the New Jersey PARCC in Algebra 1 and Grade 10 English language arts/literacy (Doctoral dissertation, Seton Hall University).

Mathis, W. J. (2010, July). The “Common Core” Standards Initiative: An effective reform tool? Boulder, CO & Tempe, AZ: Education and the Public Interest Center & Education Policy Research Unit.

Sforza, D., Tienken, C. H., & Kim, E. (2016). A comparison of higher-order thinking between the Common Core State Standards and the 2009 New Jersey Content Standards in high school. AASA Journal of Scholarship and Practice, 12(4), 5–31.

Tanner, D. E. (2001). Assessing academic achievement. Boston, MA: Allyn & Bacon.

Tienken, C. H. (2017). Defying standardization: Creating curriculum for an uncertain future. Lanham, MD: Rowman and Littlefield.

Tienken, C. H. (2011). Common Core State Standards: An example of data-less decision making. AASA Journal of Scholarship and Practice.

Tienken, C. H., Colella, A., Angelillo, C., Fox, M., McCahill, K. R., & Wolfe, A. (2017). Predicting middle level state standardized test results using family and community demographic data. Research in Middle Level Education, 40(1), 1–13.

Post adapted from “Parking PARCC Claims in the Dumpster of Failed Reforms,” 2019, Kappa Delta Pi Record, 55, 57–59.

Christopher H. Tienken ©2012 Copyright. All Rights Reserved.