SWN (1996) — Chapter 6
The author argues that the current campaign against objective testing is a misguided extension of Romantic progressivism that undermines educational excellence and social equity. While acknowledging minor abuses, Hirsch asserts that standardized tests are indispensable for providing incentives, monitoring progress, and ensuring fairness for all students.
Argument Chains (34)
How the chapter's premises build toward conclusions. Each chain shows a line of reasoning from top to bottom.
The Defense of Test Validity strong
The repudiation of objective tests is an integral part of a Romantic progressivism dating back to the 1920s. (7 ev)
↓
Standardized tests effectively measure the ability to analyze, synthesize, and draw generalizations.
↓
The decline in standardized test scores in culturally homogeneous areas like Iowa refutes the claim that the tests are biased toward white middle-class culture.
↓
The objection that standardized tests contain complex language is an admission that schools are failing to teach students how to read complex prose.
↓
The larger, frequently repeated criticisms of objective tests are not valid. (5 ev)
The Reliability Crisis in Grading strong
Teacher-graded student work is highly subjective and inconsistent, often resulting in different grades for the same quality of work.
↓
Experienced teachers grading essays rarely achieve a correlation greater than .40 in their grading.
↓
The agreement achieved between graders in professional performance assessment sessions is arbitrary and unstable.
↓
Graders disagree fundamentally on the relative weights that should be awarded to different elements of a performance, such as ideas versus organization.
↓
An inexpungible arbitrariness lies at the heart of grading performance-based assessments. (1 ca)
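To make the ".40 correlation" claim in this chain concrete, the following is a minimal sketch of how agreement between two graders could be computed as a Pearson correlation. The function name `pearson_r` and the two score lists are hypothetical, invented for illustration only; they are not data from the chapter.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one grader gives identical scores")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores (1-6 scale) that two graders gave the same eight essays.
grader_a = [4, 3, 5, 2, 4, 6, 3, 5]
grader_b = [3, 4, 4, 3, 2, 5, 5, 4]

agreement = pearson_r(grader_a, grader_b)
print(f"inter-grader correlation: {agreement:.2f}")
```

With these invented scores the correlation comes out at roughly 0.3, the kind of weak agreement the chain describes: the graders often rank the same essays differently even though both use the same scale.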
The Technical Superiority of Objective Sampling strong
A test is primarily a sampling device, and its fairness depends on the size and representativeness of the sample taken from the domain.
↓
Objective tests sample a larger and more representative variety of factors within a skill or field of knowledge than performance tests of the same length.
↓
A multiple-choice test is a more valid and reliable measure of writing ability than a performance-based test of equal length.
↓
Multiple-choice sections achieve higher accuracy and fairness at a lower cost than multiple essay readings. (1 ca)
↓
For high-stakes summative tests, objective tests are superior to all other available systems despite their surface-level flaws. (1 ca)
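The sampling argument in this chain can be illustrated with the Spearman-Brown formula, a standard psychometric result that is not named in the chapter summary and is brought in here only as an illustration. It estimates how reliability rises when a measurement is lengthened to k comparable, independent parts (more items or more readings). The sketch assumes the parts are parallel in the psychometric sense; the function name `spearman_brown` is hypothetical, and the 0.40 starting value is borrowed from the grader-correlation figure cited in this summary.

```python
def spearman_brown(single_reliability, k):
    """Predicted reliability when a measure is lengthened to k parallel parts.

    single_reliability: reliability of one part (one item or one reading), in [0, 1].
    k: number of comparable, independent parts combined (must be positive).
    """
    if not 0 <= single_reliability <= 1:
        raise ValueError("single_reliability must lie between 0 and 1")
    if k <= 0:
        raise ValueError("k must be positive")
    r = single_reliability
    return k * r / (1 + (k - 1) * r)


# Assumed starting point: one essay reading with reliability 0.40.
for readings in (1, 2, 4, 10):
    print(readings, round(spearman_brown(0.40, readings), 2))
```

Under these assumptions, reliability climbs from 0.40 for one reading to about 0.57 for two, 0.73 for four, and 0.87 for ten. That is the arithmetic behind the claim that many short, independently scored samples can be more dependable than one or two essay readings, though it says nothing about cost or about whether the parts measure the intended skill.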
Accountability and Reading Success strong
California's decline to the bottom of national reading scores resulted from combining nonaccountable matrix testing with naturalistic reading instruction.
↓
Standardized testing is an absolute necessity for holding individual teachers and students accountable for reading progress. (1 ca)
↓
Every child who is not organically impaired can be brought to reading competence by the end of first grade.
↓
Modern educational systems have a duty to provide ongoing diagnostic tests and a standardized reading test for every child at the end of first grade. (1 ca)
↓
Providing tests without guidance or guidance without tests constitutes an abandonment of adult responsibility to children.
Format Invariance Argument strong
The test format (multiple-choice versus other types) does not significantly influence what a test actually measures.
↓
Tests of mathematical reasoning measure the same attribute regardless of whether the format is multiple-choice or another type.
↓
There is no evidence of 'format factors' in testing; multiple-choice and open formats measure the same attributes. (1 ca)
↓
Research rejects the claim that the multiple-choice format is condemned to probe only surface-level facts rather than deep understanding.
↓
Critics of standardized testing have failed to demonstrate that the multiple-choice format itself causes tests to probe only lower-order skills. (1 ca)
The Fallacy of Test-Driven Decline strong
Since 1970, basic test scores have increased slightly while higher-order thinking skills have declined in virtually all subject areas.
↓
The argument that multiple-choice tests caused the decline in higher-order skills is a 'post hoc, ergo propter hoc' fallacy.
↓
The NAEP's ability to measure the decline in higher skills proves that multiple-choice tests are capable of measuring those skills.
↓
The basic-skills movement neglected higher-order skills because states provided no incentives for teaching them and tests remained focused on lower skills, not because test formats were incapable of measuring higher skills.
↓
The decline in higher-order skills was caused by state tests focusing on lower skills and providing no incentives for teaching higher ones. (1 ca)
The Competence-Justice Chain strong
Standardized multiple-choice tests, such as the AFQT, accurately probe real-world competencies.
↓
Standardized tests are not technically biased against any group because they correlate with actual economic achievement and job performance regardless of race.
↓
The wage gap between races closes dramatically when black-white comparisons are made based on actual educational achievement rather than nominal education level.
↓
Real social justice depends more on real-world competencies and economic improvement than on school grades or adjusted test scores.
↓
Adjusting test scores to achieve equity is a practical injustice to disadvantaged students because high scores without real competency do not lead to social equality. (1 ca)
The Competence Gap Mechanism strong
Students from middle and upper classes become more competent in school because their home-provided 'intellectual capital' allows them to fill in the gaps of a mediocre, repetitive school system.
↓
In a mediocre school system, the competence gap between social classes expands as students progress through school.
↓
American public elementary schools provide neither equality of opportunity nor equality of result.
↓
A coherent school system with definite year-by-year goals for all students makes systematic compensation possible and narrows the competence gap. (1 ca)
↓
Genuine equality of educational opportunity is impossible without effective, early compensatory measures that address the initial differential in intellectual capital.
Critique of Progressive Assessment Alternatives strong
Authentic assessments are claimed by proponents to be worth teaching to and capable of measuring higher-order skills while defeating negative test-preparation effects. (2 ev)
↓
There is an inadequate evidentiary basis for the claims that performance-based assessments yield accuracy, fairness, and educational improvement. (1 ca)
The Empirical Defense of Grading strong
Students who take courses for a grade study harder and learn more than students who take courses for intrinsic interest alone. (1 ca)
↓
The scientific evidence regarding pass-fail grading systems directly contradicts the claim that giving marks inhibits learning.
↓
Psychological research contravenes the progressive claim that learning under external incentives is superficial or short-lived.
↓
Tests and grades strongly contribute to effective teaching.
The Domain-Specific Nature of Literacy strong
Cognitive load constraints in working memory explain why performance declines when a student is unfamiliar with a topic.
↓
Performance quality is primarily determined by a student's level of familiarity with the specific task or subject matter.
↓
Individual performance varies significantly based on the specific task (task variability) rather than just general ability.
↓
There is no such thing as generalized, homogeneous reading or writing ability. (1 ca)
Social Responsibility and Goal Setting strong
Using off-the-shelf tests as primary tools of reform enables policymakers to evade difficult decisions regarding educational goals.
↓
Legislators and the public use test scores as a blunt and ineffective instrument of compulsion because they lack knowledge of what is being tested.
↓
Deciding educational and social goals is the duty of society as a whole, not test makers.
↓
Society—including parents, teachers, and representatives—has the duty to decide educational goals, not test makers.
The Mechanism of Test Corruption strong
The root cause of score inflation is teachers focusing instruction narrowly on the specific items they know will be tested.
↓
There exists a tacit collusion between administrators and teachers to keep test scores inaccurately high.
↓
Familiarity with specific test items leads to scores that significantly overstate a student's actual knowledge and skill.
↓
The only truly convincing educational objection to standardized tests is the practice of teaching narrowly to the specific test content.
The Equity-Testing Chain strong
Both the Taiwanese 'meritocratic' system and the Japanese 'egalitarian' system are fairer than the American system because they achieve higher competency and leave fewer students behind.
↓
No educational system has achieved or can achieve excellence and equity without effective monitoring and high incentives, including high-stakes testing.
↓
High-stakes tests serve functions—gatekeeping, monitoring, and incentives—that are essential to social fairness. (1 ca)
↓
Fairness in testing is inextricably linked to fairness and excellence in schooling. (1 ca)
The Schooling-Testing Fairness Link strong
Fairness in testing is inseparable from fairness in the quality of schooling provided.
↓
The principal unfairness in testing is the failure of the school system to prepare students for the competencies the tests measure.
↓
The charge of cultural bias in tests and schooling is highly misleading or irrelevant because the root of unfairness is historical and economic, not cultural. (1 ca)
The Necessity of Monitoring for Equity moderate
Learning requires effort. (4 ev)
↓
High-consequence tests act as spurs to student effort. (3 ev)
↓
Objective tests function as achievement incentives for students and teachers. (4 ev)
↓
Effective teaching and educational administration are impossible without effective monitoring through testing. (7 ev)
↓
Objective tests are needed for academic fairness and social equity. (4 ev · 1 ca)
↓
Objective tests are necessary in the American context to achieve excellence and fairness. (5 ev · 1 ca)
The Genealogy of Anti-Testing moderate
Progressive educational doctrine has been consistently opposed to testing and grading since the 1920s.
↓
The abolition of tests and grades in progressive schools is a logical extension of discarding a subject-matter curriculum.
↓
Giving number or letter grades to students is considered a fundamental educational mistake in the progressive-Romantic view.
↓
The opposition to grades is rooted in Romantic egalitarianism, which seeks to avoid any system where students are labeled as losers.
↓
Educational reformers are attempting to 're-educate' the public to favor equal results (substantive equity) over meritocratic opportunity (procedural equity).
The Practical Failure of Performance Assessment moderate
An inexpungible arbitrariness lies at the heart of grading performance-based assessments. (1 ca)
↓
A single writing sample is an inaccurate and misleading measure of a student's average ability.
↓
Performance-based assessments graded by raters have a low likelihood of accurately assessing a student’s average ability to perform.
↓
The financial costs of large-scale performance-based testing are sizeable.
↓
The unreliability of scoring in large-scale performance-based systems (like Vermont's) renders them useless for most intended educational purposes.
The Policy Case for Balanced Testing moderate
Multiple-choice sections achieve higher accuracy and fairness at a lower cost than multiple essay readings. (1 ca)
↓
The primary advantage of including an essay in a standardized test is the pedagogical model it provides rather than its statistical validity.
↓
An essay requirement signals to students that actual writing skill is a vital outcome of instruction and to teachers that multiple-choice success is insufficient evidence of effective teaching.
↓
A combination of multiple-choice items plus one essay is the best and fairest test of writing ability.
↓
Effective testing policy should balance performance-based and multiple-choice tests rather than relinquishing either.
The Validity of Objective Testing moderate
The creative and constructive side of writing is not well sampled even in the best performance tests because of time and topic constraints.
↓
Objective English questions do not focus exclusively on superficial aspects of writing ability.
↓
A sequence of objective questions can gauge the depth and breadth of knowledge through ingenuity and wit.
↓
Excellence in editing and excellence in writing are inextricable, as the best writers edit mentally before writing. (1 ca)
↓
The belief that multiple-choice tests inherently impose superficiality and rote memorization is a form of stereotyping that ignores their substance.
The Marketplace Hybridity Chain moderate
The marketplace functions as a commons that erases ethnic distinctions.
↓
Standard written forms of major languages like Koine, Latin, and English are hybrids created by the marketplace to enable communication between strangers.
↓
Mainstream hybrid culture is a tool for communication between strangers rather than an essential, identity-defining culture of a specific group.
↓
Modern American school culture is a hybrid rather than a pure 'Eurocentric' or 'Anglocentric' tradition.
↓
Effective classroom schooling must be monocultural to allow all students to participate in the public marketplace. (1 ca)
The Parallel Form Solution moderate
The use of multiple parallel test forms is financially feasible for any school district because the main cost is grading, not printing.
↓
The use of multiple parallel test forms is an effective solution to the problem of teaching narrowly to a specific test. (1 ca)
↓
Test misuse can be easily prevented by delaying the selection of the specific test form until the last possible minute.
↓
If students are required to be prepared for a large number of parallel test forms, teaching to the test becomes equivalent to teaching the entire subject domain. (1 ca)
Defense of Standardized Testing moderate
Even the lowest-quality standardized reading tests correlate well with high-quality ones and real-world reading abilities. (1 ca)
↓
'Authentic' types of reading tests are uncontrolled, unfair, and unable to replicate real-world reading performance.
↓
No school-based performance can truly reproduce performance in the real world.
↓
Standardized tests are the most effective instruments available for demonstrating and measuring the corrupting influence of test misuse.
Practicality and Fairness of Alternatives moderate
Expertise is not a general trait; performance in one context does not predict performance in other contexts.
↓
Fair and accurate performance tests of productive skills have consistently failed to materialize even in high-stakes fields like medicine.
↓
Proposed performance tests are currently neither reliable nor fair as alternatives to standardized tests.
↓
Abolishing standardized tests would lead to less equity in schools and lower student competence. (1 ca)
The Performance Validity of Multiple Choice moderate
Relevant knowledge acts as a shortcut in real-world problem solving regardless of the test format used.
↓
Standardized reading tests require active production because students must productively read and comprehend passages to answer questions.
↓
Solving complex multiple-choice math problems requires the productive ability to determine which computation is applicable, not just computational skill.
↓
A well-constructed standardized test of math or reading is inherently a performance test requiring information integration and idea generation. (1 ca)
The Economic Reality of Testing moderate
The LSAT is statistically accurate in predicting grades for both Black and White law students.
↓
American public education has a differential effect on social classes, and consequently on ethnic and racial groups who belong disproportionately to disadvantaged classes.
↓
Focusing on equality of test results while ignoring underlying student incompetence and school ineffectiveness is a misguided approach.
↓
Balkanized schooling and testing will fail to lead to social and economic equity.
The Anti-Romantic Motivational Chain moderate
Educational achievement requires extrinsic motivation, discipline, toil, and sweat.
↓
High-stakes tests are effective in motivating students to work hard. (1 ca)
↓
The Romantic idea that learning is natural and motivation is internal is an illusion.
↓
The belief in natural learning is a barrier to social justice because disadvantaged students must be motivated to work harder than advantaged students.
The Democratic Common School Chain moderate
The compensatory equalization of intellectual capital was a core element of the Jefferson-Mann vision of the democratic common school.
↓
Educational principles that elicit high performance from advantaged students also elicit high performance from disadvantaged students.
↓
High average achievement in national systems is explained by bringing all children to a required high level, allowing the class to move forward collectively.
↓
Explicit, high achievement requirements for all students ensure that disadvantaged students gain essential intellectual capital not provided by their homes.
The Compensatory Potential of Schooling moderate
Feasibility of Assessments moderate
The term 'standardized test' should be defined specifically as any test that yields the same score for the same performance regardless of the scorer.
↓
The push for nonstandardized tests is a strategy to ensure that all demographic groups appear to perform the same.
↓
Performance-based assessment is not feasible for large-scale K-12 testing because it cannot be accurate, fair, and cost-effective simultaneously.
The Justification for Standardized Testing moderate
The unreliability of scoring in large-scale performance-based systems (like Vermont's) renders them useless for most intended educational purposes.
↓
Performance-based assessments graded by raters have a low likelihood of accurately assessing a student’s average ability to perform.
↓
Standardized tests have considerable strengths and are superior to performance tests for summative evaluation when properly used. (1 ca)
The Scapegoating Argument moderate
Foundations of Higher-Order Thinking moderate
The Political Critique of Anti-Testing Motives weak
Fair and accurate performance assessments cannot be achieved on a large scale at a reasonable cost.
↓
For high-stakes summative tests, objective tests are superior to all other available systems despite their surface-level flaws. (1 ca)
↓
Antitesters exploit repellent features of objective tests, such as student fatigue and the perception that they don't measure higher-order skills.
↓
The ongoing attack on multiple-choice tests is a smear campaign designed to discredit all reliable means of accountability for students and schools. (1 ca)
Counter-Arguments (33)
empirical challenge (3)
While content familiarity affects performance, statistical 'g' factors and literacy research suggest a substantial underlying general ability that allows proficient readers to decode and analyze even unfamiliar material.
Standardized testing for six-year-olds is unreliable due to the highly variable nature of early childhood development, leading to false negatives and mislabeling of children.
Early compensation within schools cannot overcome the 'Matthew Effect' where children from high-capital homes continue to gain knowledge at a faster rate than the school can compensate for, making the narrowing of the gap a mathematical improbability.
alternative explanation (10)
Standardized tests create a 'washback' effect where teachers narrow the curriculum only to what is tested, effectively reducing educational 'excellence' to test-taking proficiency.
The attack on standardized tests is not scapegoating but a legitimate response to 'curriculum narrowing,' where teachers only teach what is tested, thereby excluding important but non-tested subjects like art or civics.
What the author calls 'arbitrariness' is actually 'expert holistic judgment,' which captures nuances of quality that standardized metrics systematically ignore.
value disagreement (8)
Performance-based assessments (portfolios) provide deep qualitative insight into student thinking that multiple-choice tests fundamentally cannot capture, regardless of inter-rater reliability.
While students may study harder for grades, this 'extrinsic' motivation can displace 'intrinsic' motivation, leading students to stop learning as soon as the grade is awarded, whereas interest-driven learning is self-sustaining.
Cost-effectiveness should not be a primary criterion for fairness if lower-cost tests systematically disadvantage students who think non-linearly or lack specific cultural proxies found in multiple-choice stems.
methodological concern (8)
Standardized tests may be reliable (consistent), but they lack 'construct validity'—they measure the ability to take a test rather than the actual ability to write or think in real-world contexts.
Objective tests measure 'editing' and 'recognition' rather than the 'generative' and 'organizational' skills required for real-world writing, meaning high correlations may be deceptive.
The 'Editing' items in objective tests (like correcting 'we spectators') measure sociolinguistic conformity (standard dialect) rather than the structural and rhetorical 'excellence' associated with great writing.
scope limitation (4)
Even if tests are objectively scored, they reflect the 'opportunity gap'—the unequal distribution of resources and knowledge in society—thereby legitimizing existing social inequalities rather than curing them.
High-stakes testing environments inevitably lead to 'curriculum narrowing' where teachers cut subjects not tested (like art or music) to focus on the parallel forms of tested subjects.
Standardized tests create a 'ceiling effect' where students are only required to reach the level of the most difficult distractor, rather than pushing toward truly novel or creative solutions required in real-world performance.
Logical Gaps (27)
Unstated assumptions required for the arguments to work.
The absence of bias in one geographical context (Iowa) guarantees that the tests are fair and culturally neutral for all disparate subgroups in the United States.
critical
Those who advocate for performance assessment do so while knowing it is unreliable and expensive, rather than being motivated by a sincere belief in its pedagogical value.
critical
A democratically chosen curriculum can only be enforced or monitored through the specific mechanism of standardized testing.
critical
Administrative will to delay test selection is stronger than the institutional incentives for high scores described in the 'collusion' claims.
critical
The presence of a reliable external measure is a necessary condition for maintaining the instructional rigor required to produce student competence.
critical
High-stakes standardized tests have a positive 'washback' effect on classroom pedagogy rather than merely narrowing it.
critical
Teaching a monocultural hybrid curriculum is mutually exclusive with maintaining or validating the diverse home cultures of students.
critical
Even with a coherent curriculum, the 'initial differential in intellectual capital' is small enough to be fully remediated by school-based measures.
critical
American schools possess the pedagogical capacity to meet high requirements once the tests are implemented.
critical
The 'regular track' in a bimodal system can be elevated to the 'elite track' level through curriculum alone, without the resource advantages of elite families.
critical
Individual student effort on tests translates directly into aggregate national economic productivity.
significant
'Studying harder' and 'learning more' (quantitative metrics) equate to the 'deep understanding' that progressives claim is undermined by grades.
significant
Even if the motive for performance-based assessment is to hide group differences, this does not prove that the assessments are inherently inaccurate or unfair in a technical sense.
minor
The flaws found in ETS/College Board essay grading are also present and uncorrected in specific state systems like Vermont's.
significant
Large-scale systems are incapable of implementing the 'multiple topics/dozens of readings' requirement due to logistical or fiscal barriers.
minor
Other Claims Not in Chains (88)