PoC (1977) — Chapter 6
Writing assessment is identified as the primary obstacle to progress in composition pedagogy and research, and it poses a philosophical problem about the nature of good writing. Assessment must align with the standards of literate society, yet empirical research shows that spontaneous holistic judgments by educated professionals are highly inconsistent and correlate only weakly with one another. This variability calls for a move away from forced group conformity in grading and toward a valid theoretical criterion such as relative readability.
Argument Chains (14)
How the chapter's premises build toward conclusions. Each chain shows a line of reasoning from top to bottom.
The Reliability-Validity Dilemma (strong)
Any assessment method intended for teaching and research must possess both reliability and validity. (1 evidence item)
↓
The use of T-units as an assessment method is highly reliable because it relies on objective counting.
↓
T-unit length is an invalid measure of writing quality because it has no necessary connection to the effective communication of meaning. (1 counter-argument)
↓
The average length of T-units indicates syntactical maturity but does not measure writing quality regarding communicative purpose.
↓
As of the current state of writing assessment, no proposed valid method has also proven to be truly reliable.
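The contrast between the reliability and the validity of T-unit counting can be illustrated with a short sketch. It assumes the text has already been segmented by hand into T-units (each a main clause with its attached subordinate clauses); the function name and the sample sentences are hypothetical and are not taken from the chapter.

```python
def mean_t_unit_length(t_units):
    """Return the average number of words per T-unit.

    `t_units` is a list of strings, one per hand-segmented T-unit.
    """
    if not t_units:
        raise ValueError("need at least one T-unit")
    # Words are counted by whitespace splitting, a deliberate simplification.
    word_counts = [len(unit.split()) for unit in t_units]
    return sum(word_counts) / len(word_counts)


# Hypothetical sample: two T-units of 3 and 8 words.
sample = [
    "The dog barked",
    "the cat ran away because it was frightened",
]
score = mean_t_unit_length(sample)
```

Any two counters working from the same segmentation will obtain the same number, which is why the method is reliable. Nothing in the computation examines whether the meaning was communicated effectively, which is why the chain treats it as invalid for judging quality.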
The Reliability Chain (strong)
Aristotelian 'intrinsic evaluation' judges a text based on its success in fulfilling its own implicit intentions or telos.
↓
Intrinsic evaluation is fundamentally a judgment of technical success and skill in accomplishing intentions.
↓
Intrinsic evaluation is the only type of assessment that can yield widespread agreement among readers.
↓
For the purposes of research, writing assessment must restrict itself to intrinsic judgments where agreement can be reached in principle. (1 counter-argument)
↓
Intrinsic evaluation is the only kind of assessment in which anyone should have confidence. (1 counter-argument)
The Conflict Analysis Chain (strong)
The fundamental grounds for reader disagreement in text evaluation have been known since the time of Plato and Aristotle.
↓
Platonic 'extrinsic evaluation' judges writing based on criteria external to the writer's intentions, including the quality of the ideas themselves.
↓
Aristotelian 'intrinsic evaluation' judges a text based on its success in fulfilling its own implicit intentions or telos.
↓
Most readers are 'Platonists' in their actual judgments, prioritizing the quality of ideas (extrinsic) over wording and phrasing (intrinsic).
↓
Variations in reader judgments of student writing reflect the underlying conflict between intrinsic technical assessment and extrinsic evaluation of content.
The Assessment Validation Chain (strong)
It is possible to expertly rewrite most papers while maintaining the synonymy of the original complex intentions. (1 counter-argument)
↓
Audience uptake of tone and implicit attitudes can be accurately compared using a threshold questionnaire.
↓
The laborious methods required to objectively score readability are unusable in the classroom but useful for certifying the reliability of assessors.
↓
Reliable assessors of writing samples can be identified by their ability to consistently give correct scores to heterogeneous test samples.
↓
Agreement between certified assessors is founded on a sound normative standard rather than team conformity.
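The certification step in this chain can be sketched in code. The sketch assumes that each test sample already has a reference score established by the laborious objective method, and that an assessor is certified when enough of their scores fall within a tolerance of those references. The function name, the tolerance, the required hit rate, and all scores are hypothetical illustrations, not values from the chapter.

```python
def is_certified(assessor_scores, reference_scores, tolerance=0, min_hit_rate=0.9):
    """Decide whether an assessor matches the reference scores often enough.

    A score counts as a hit when it lies within `tolerance` of the
    reference score for the same test sample.
    """
    if not reference_scores or len(assessor_scores) != len(reference_scores):
        raise ValueError("score lists must be non-empty and of equal length")
    hits = sum(
        abs(given - reference) <= tolerance
        for given, reference in zip(assessor_scores, reference_scores)
    )
    return hits / len(reference_scores) >= min_hit_rate


# Hypothetical reference scores for ten heterogeneous test samples.
reference = [3, 1, 4, 2, 5, 3, 2, 4, 1, 5]
assessor_a = [3, 1, 4, 2, 5, 3, 2, 4, 1, 4]  # one miss out of ten
assessor_b = [2, 1, 4, 3, 5, 3, 2, 4, 2, 5]  # three misses out of ten

a_passes = is_certified(assessor_a, reference)
b_passes = is_certified(assessor_b, reference)
```

Each assessor is compared with the reference scores and never with other assessors, so agreement among those who pass rests on the external standard and not on conformity to a group.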
The Reliability Chain (strong)
Students show more variation in the quality of their ideas and aims across different topics than they do in their quality of presentation.
↓
Writing is a transferable skill intended to function across a variety of situations beyond the classroom.
↓
The proper object of assessment in composition is writing ability, not the quality of specific writing samples.
↓
A score focusing on intrinsic quality of presentation is a better index of writing ability than a holistic score that includes the quality of ideas. (1 counter-argument)
The Scientific Progress Chain (strong)
The laborious methods required to objectively score readability are unusable in the classroom but useful for certifying the reliability of assessors.
↓
Evaluations of writing results based on relative readability experiments can be duplicated by other experimenters.
↓
Significant progress in both composition teaching and research depends on the adoption of common assessment principles.
↓
Systematic composition research can effectively raise the competence level of both teachers and writers. (1 counter-argument)
The Social Mandate Chain (strong)
Composition is taught at the behest of society, making societal judgment the ultimate authority on writing quality.
↓
Any valid method of writing assessment must be consistent with the judgments of literate society at large. (3 evidence items, 1 counter-argument)
↓
The principle of relative readability must be relevant to the assessment of writing to have practical utility. (2 evidence items)
The Efficiency Chain (strong)
The universal standard by which we judge the relative success of an achieved aim in speech is the standard of least effort.
↓
If the same complex of aims is achieved by a writer with less reader effort, the writing is intrinsically more successful. (1 counter-argument)
↓
Judgments of relative readability are theoretically confirmable by objective methods independent of individual reader judgments.
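The logic of a relative readability comparison can be sketched as follows. The sketch rests on assumptions that are not in the chapter: reader effort is approximated by reading time in seconds, and achievement of the aims is approximated by the fraction of questionnaire items each reader answers correctly. The function name, the margin, and the data are all hypothetical.

```python
from statistics import mean


def more_readable(version_a, version_b, uptake_margin=0.05):
    """Return 'a' or 'b' for the version that needs less reader effort.

    Each version is a dict with per-reader lists under 'uptake'
    (fraction of aims understood) and 'seconds' (reading time).
    Returns None when the versions do not convey the aims equally
    well, or when effort is tied, because no ranking is warranted.
    """
    uptake_a, uptake_b = mean(version_a["uptake"]), mean(version_b["uptake"])
    if abs(uptake_a - uptake_b) > uptake_margin:
        return None  # the same complex of aims was not achieved
    effort_a, effort_b = mean(version_a["seconds"]), mean(version_b["seconds"])
    if effort_a == effort_b:
        return None
    return "a" if effort_a < effort_b else "b"


# Hypothetical data from three readers per version.
original = {"uptake": [0.80, 0.90, 0.85], "seconds": [120, 130, 125]}
revision = {"uptake": [0.85, 0.90, 0.80], "seconds": [95, 100, 105]}
winner = more_readable(original, revision)
```

The comparison is made only after equal uptake has been established, which mirrors the condition that the same aims must be achieved before lower effort counts as greater success.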
The Necessity of the Analytical Approach (moderate)
There is currently no holistic agreement on what constitutes 'good writing' among English teachers or the general public.
↓
Reliability in writing assessment is dependent on achieving widespread societal agreement about the qualities of good writing.
↓
Diederich’s inductive results represent how literate society, acting as the 'court of last resort,' actually evaluates writing quality.
↓
A valid assessment method must adhere to the criteria actually used by society to judge writing. (1 counter-argument)
↓
To achieve reliability in an analytical approach, a uniform system of weighting must be imposed on the scoring categories.
↓
Reliability and validity in writing assessment must be sought through an analytical approach rather than a holistic one. (1 counter-argument)
The Holistic Impasse Chain (moderate)
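The role of a uniform weighting system in this chain can be illustrated with a small sketch. The category names, the weights, and the scores below are hypothetical placeholders and are not Diederich's actual categories or values.

```python
def weighted_analytic_score(category_scores, weights):
    """Combine per-category scores into one score using fixed weights."""
    if set(category_scores) != set(weights):
        raise ValueError("scores and weights must cover the same categories")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(category_scores[name] * weights[name] for name in weights)
    return weighted_sum / total_weight


# One paper, scored identically in every category by two hypothetical readers.
paper = {"ideas": 5, "organization": 4, "wording": 2, "mechanics": 1}

# Each reader privately weights the categories differently.
reader_one_weights = {"ideas": 2, "organization": 2, "wording": 1, "mechanics": 1}
reader_two_weights = {"ideas": 1, "organization": 1, "wording": 2, "mechanics": 2}

score_one = weighted_analytic_score(paper, reader_one_weights)
score_two = weighted_analytic_score(paper, reader_two_weights)
```

Even with identical category scores, the two private weightings yield different totals (3.5 and 2.5 here). Imposing a single weighting removes that source of disagreement, although whether an imposed weighting reflects genuine societal agreement is a separate question.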
Platonic 'extrinsic evaluation' judges writing based on criteria external to the writer's intentions, including the quality of the ideas themselves.
↓
A purely intrinsic evaluation is inadequate for significant human judgment because the value of the intention itself must be weighed.
↓
An A-plus success in achieving a trivial or harmful intention is a trivial or harmful success and ought to be judged as such.
↓
The mixture of basic judicial principles (intrinsic and extrinsic) is an embarrassment for the standardization of judgment.
↓
David Hume’s theory of taste fails to provide an adequate standard of judgment because it cannot solve the logical tangle between the 'sound state of the organ' and the 'intrinsic point of view.'
↓
The problem of holistic assessment is not susceptible of solution. (1 counter-argument)
The Assessment Obstacle Chain (moderate)
There is a very low correlation (approximately .40) among professional groups, including English teachers, regarding the quality of the same writing samples. (5 evidence items)
↓
Current reliable grading of writing tests depends on a socializing process that forces group conformity rather than natural agreement.
↓
The problem of writing assessment is a significant philosophical problem that belongs in a theoretical study of composition. (1 evidence item)
↓
Writing assessment is the single most important obstacle to practical progress in composition teaching and research. (3 evidence items, 1 counter-argument)
Social Consensus as Prerequisite (moderate)
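The correlation figure cited at the top of this chain describes agreement among readers scoring the same samples. A minimal sketch of one way such a figure could be computed is the mean of the Pearson correlations over all pairs of readers. The function names and the ratings are hypothetical, and the chapter does not state that this exact procedure was used.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return covariance / sqrt(var_x * var_y)


def mean_pairwise_correlation(scores_by_reader):
    """Average the Pearson correlation over every pair of readers."""
    pairs = list(combinations(scores_by_reader.values(), 2))
    if not pairs:
        raise ValueError("need at least two readers")
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Hypothetical scores given by three readers to the same three papers.
ratings = {
    "reader_a": [1, 2, 3],
    "reader_b": [2, 4, 6],  # ranks the papers exactly as reader_a does
    "reader_c": [3, 2, 1],  # ranks them in the opposite order
}
agreement = mean_pairwise_correlation(ratings)
```

A value near 1 would indicate that readers rank the papers alike; the value of about .40 reported in the chain is far from that.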
Readers disagree widely in their holistic judgments of writing quality.
↓
There is currently no holistic agreement on what constitutes 'good writing' among English teachers or the general public.
↓
Reliability in writing assessment is dependent on achieving widespread societal agreement about the qualities of good writing.
↓
ETS scoring reliability is localized and contingent rather than universal; it depends on the specific 'conformed group' of readers.
The Consensual Assessment Chain (moderate)
Intrinsic evaluation is the only assessment principle capable of yielding widespread agreement in a heterogeneous world. (1 counter-argument)
↓
Combining scores for mechanical correctness and quality of presentation makes assessment unreliable.
↓
Large-scale tests that separate correctness from quality of presentation are more informative for composition research.
↓
Writing assessment should be categorized into Extrinsic evaluation (Quality of intentions) and Intrinsic evaluation (Relative readability and Correctness).
The Professional Standardization Chain (moderate)
Certified assessors can accurately score relative readability after only a few days of practice.
↓
The talents required to assess relative readability do not exceed the natural capacities of most teachers.
↓
Judging writing according to its communicative effectiveness relative to its aims is already a common practice among many teachers.
↓
The widespread adoption of intrinsic assessments in grading would provide composition teachers with a core of common purposes and principles. (1 counter-argument)
Counter-Arguments (14)
empirical challenge (2)
Readers may derive more value from a text that forces them to exert effort (active processing) than from a text that is 'effortless' but forgettable.
Intentions are often discovered through the act of writing; therefore, an author's 'complex intentions' are not a fixed entity that can be extracted and poured into a more 'readable' vessel by an expert.
alternative explanation (2)
Analytical scoring may capture the 'pieces' of writing but miss the 'gestalt' or emergent properties that actually constitute quality, resulting in high reliability for a construct that is no longer 'writing.'
A student's 'writing ability' is fundamentally their ability to generate and organize ideas for a specific audience; isolating 'presentation' treats writing as a decorative shell rather than a cognitive process.
value disagreement (4)
The goal of education is often to lead or reform societal standards rather than merely reflect the existing biases of the 'court of last resort.'
If we only have confidence in intrinsic assessment, we risk validating and rewarding high-level technical skill in the service of deceitful, harmful, or socially destructive intentions.
The fact that we cannot reach 'widespread agreement' on the value of ideas does not mean we should stop assessing them; it means assessment should be open to debate and local context rather than reduced to technical metrics.
(1 more not shown)
methodological concern (2)
Even if T-unit length has no direct link to meaning, it may serve as a powerful proxy or 'latent variable' that correlates so highly with expert judgment that its invalidity in theory is irrelevant in practice.
Restricting research to intrinsic evaluation creates a 'construct under-representation' where the most vital part of writing—the quality and truth of the ideas—is systematically ignored by the very researchers meant to improve it.
scope limitation (3)
Teaching can still progress by focusing on uncontroversial fundamentals (grammar, logic, clarity) even if the final 'aesthetic' or 'holistic' grade remains subjective among different professions.
While a 'perfect' holistic solution may be theoretically impossible, 'sufficient' reliability can be achieved through 'conformed' reader groups (as seen in ETS), making it a pragmatic success despite its theoretical failure.
The increase in 'competence' defined by readability scores may lead to a homogenization of student writing that lacks individual voice and intellectual risk-taking.
internal inconsistency (1)
Literate society's judgments are often based on 'flavor' or 'personality' (C6), which are idiosyncratic; a valid assessment method should specifically exclude these in favor of objective communicative efficiency.
Logical Gaps (10)
Unstated assumptions required for the arguments to work.
A forced weighting system in an analytical method can effectively substitute for genuine societal agreement. (critical)
Separating extrinsic and intrinsic evaluation does not distort the fundamental nature of the communicative act. (critical)
Societal preferences are sufficiently stable and coherent to provide a 'valid' benchmark for assessment. (significant)
Disagreement in holistic grading cannot be bypassed by using objective linguistic markers (like T-units) to measure progress. (minor)
What society currently values in writing is what society SHOULD value in writing for the purposes of education. (significant)
'Research agreement' must be based on widespread social agreement rather than specialized expert consensus. (minor)
The possibility of agreement (reliability) licenses the conclusion that 'anyone should have confidence' in the method (normative validity). (significant)