PoC (1977) — Chapter 6

The chapter identifies writing assessment as the primary obstacle to progress in composition pedagogy and research, and as a philosophical problem about the nature of good writing. Assessment must align with the standards of literate society, yet empirical research shows that the spontaneous holistic judgments of educated professionals are highly inconsistent and correlate only weakly (approximately .40). This variability calls for a move away from forced group conformity in grading and toward a valid theoretical criterion such as relative readability.

95 claims

14 argument chains

23 evidence

14 counter-arguments

10 logical gaps

Argument Chains (14)

How the chapter's premises build toward conclusions. Each chain shows a line of reasoning from top to bottom.

The Reliability-Validity Dilemma (strong)

Any assessment method intended for teaching and research must possess both reliability and validity. (1 evidence item)

↓

The use of T-units as an assessment method is highly reliable because it relies on objective counting.

↓

T-unit length is an invalid measure of writing quality because it has no necessary connection to the effective communication of meaning. (1 counter-argument)

↓

The average length of T-units indicates syntactical maturity but does not measure writing quality regarding communicative purpose.

↓

In the current state of writing assessment, no method proposed as valid has also proven to be truly reliable.
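
The reliability claimed for T-units in this chain rests on the fact that the measure reduces to counting. A minimal sketch of that arithmetic follows, assuming the text has already been segmented into T-units by hand. The function name and the sample sentences are illustrative assumptions and do not come from the chapter.

```python
def mean_t_unit_length(t_units):
    """Return the average number of words per T-unit.

    Assumes each item is one T-unit (a main clause plus any
    subordinate clauses attached to it), segmented beforehand.
    """
    if not t_units:
        raise ValueError("at least one T-unit is required")
    # Count whitespace-separated words in each T-unit.
    word_counts = [len(unit.split()) for unit in t_units]
    return sum(word_counts) / len(word_counts)


# Hypothetical sample: two T-units of 3 and 10 words.
sample = [
    "The dog barked",
    "and the cat ran away when it heard the noise",
]
mean_length = mean_t_unit_length(sample)
```

Two scorers given the same segmentation will always get the same number, which is the sense in which the method is reliable. Nothing in the calculation looks at whether the meaning was communicated, which is the chain's objection to its validity.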

The Reliability Chain (strong)

Aristotelian 'intrinsic evaluation' judges a text based on its success in fulfilling its own implicit intentions or telos.

↓

Intrinsic evaluation is fundamentally a judgment of technical success and skill in accomplishing intentions.

↓

Intrinsic evaluation is the only type of assessment that can yield widespread agreement among readers.

↓

For the purposes of research, writing assessment must restrict itself to intrinsic judgments where agreement can be reached in principle. (1 counter-argument)

↓

Intrinsic evaluation is the only kind of assessment in which anyone should have confidence. (1 counter-argument)

The Conflict Analysis Chain (strong)

The fundamental grounds for reader disagreement in text evaluation have been known since the time of Plato and Aristotle.

↓

Platonic 'extrinsic evaluation' judges writing based on criteria external to the writer's intentions, including the quality of the ideas themselves.

↓

Aristotelian 'intrinsic evaluation' judges a text based on its success in fulfilling its own implicit intentions or telos.

↓

Most readers are 'Platonists' in their actual judgments, prioritizing the quality of ideas (extrinsic) over wording and phrasing (intrinsic).

↓

Variations in reader judgments of student writing reflect the underlying conflict between intrinsic technical assessment and extrinsic evaluation of content.

The Assessment Validation Chain (strong)

It is possible to expertly rewrite most papers while maintaining the synonymy of the original complex intentions. (1 counter-argument)

↓

Audience uptake of tone and implicit attitudes can be accurately compared using a threshold questionnaire.

↓

The laborious methods required to objectively score readability are unusable in the classroom but useful for certifying the reliability of assessors.

↓

Reliable assessors of writing samples can be identified by their ability to consistently give correct scores to heterogeneous test samples.

↓

Agreement between certified assessors is founded on a sound normative standard rather than team conformity.

The Reliability Chain (strong)

Students show more variation in the quality of their ideas and aims across different topics than they do in their quality of presentation.

↓

Writing is a transferable skill intended to function across a variety of situations beyond the classroom.

↓

The proper object of assessment in composition is writing ability, not the quality of specific writing samples.

↓

A score focusing on intrinsic quality of presentation is a better index of writing ability than a holistic score that includes the quality of ideas. (1 counter-argument)

The Scientific Progress Chain (strong)

The laborious methods required to objectively score readability are unusable in the classroom but useful for certifying the reliability of assessors.

↓

Evaluations of writing results based on relative readability experiments can be duplicated by other experimenters.

↓

Significant progress in both composition teaching and research depends on the adoption of common assessment principles.

↓

Systematic composition research can effectively raise the competence level of both teachers and writers. (1 counter-argument)

The Social Mandate Chain (strong)

Composition is taught at the behest of society, making societal judgment the ultimate authority on writing quality.

↓

Any valid method of writing assessment must be consistent with the judgments of literate society at large. (3 evidence items, 1 counter-argument)

↓

The principle of relative readability must be relevant to the assessment of writing to have practical utility. (2 evidence items)

The Efficiency Chain (strong)

The universal standard by which we judge the relative success of an achieved aim in speech is the standard of least effort.

↓

If the same complex of aims is achieved by a writer with less reader effort, the writing is intrinsically more successful. (1 counter-argument)

↓

Judgments of relative readability are theoretically confirmable by objective methods independent of individual reader judgments.

The Necessity of the Analytical Approach (moderate)

There is currently no holistic agreement on what constitutes 'good writing' among English teachers or the general public.

↓

Reliability in writing assessment is dependent on achieving widespread societal agreement about the qualities of good writing.

↓

Diederich’s inductive results represent how literate society, acting as the 'court of last resort,' actually evaluates writing quality.

↓

A valid assessment method must adhere to the criteria actually used by society to judge writing. (1 counter-argument)

↓

To achieve reliability in an analytical approach, a uniform system of weighting must be imposed on the scoring categories.

↓

Reliability and validity in writing assessment must be sought through an analytical approach rather than a holistic one. (1 counter-argument)
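
The step in this chain about imposing a uniform system of weighting can be made concrete with a small sketch. It assumes each reader rates a paper on the five criteria the chapter attributes to literate society, and that every reader is required to combine those ratings with the same fixed weights. The category keys, the weight values, and the 1-5 rating scale are hypothetical and chosen only for illustration.

```python
# Hypothetical uniform weights, imposed on every reader alike.
WEIGHTS = {
    "ideas": 2,
    "organization": 2,
    "wording": 1,
    "flavor": 1,
    "mechanics": 1,
}


def analytic_score(ratings, weights=WEIGHTS):
    """Combine per-category ratings into one weighted average."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[c] * ratings[c] for c in weights)
    return weighted_sum / total_weight


# Hypothetical ratings on a 1-5 scale for a single paper.
paper = {"ideas": 5, "organization": 3, "wording": 3,
         "flavor": 3, "mechanics": 3}
score = analytic_score(paper)
```

Because the weights are fixed in advance, two readers who agree on the category ratings must agree on the total. Readers who privately weight ideas or mechanics differently can no longer diverge at the combining step.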

The Holistic Impasse Chain (moderate)

Platonic 'extrinsic evaluation' judges writing based on criteria external to the writer's intentions, including the quality of the ideas themselves.

↓

A purely intrinsic evaluation is inadequate for significant human judgment because the value of the intention itself must be weighed.

↓

An A-plus success in achieving a trivial or harmful intention is a trivial or harmful success and ought to be judged as such.

↓

The mixture of basic judicial principles (intrinsic and extrinsic) is an embarrassment for the standardization of judgment.

↓

David Hume’s theory of taste fails to provide an adequate standard of judgment because it cannot solve the logical tangle between the 'sound state of the organ' and the 'intrinsic point of view.'

↓

The problem of holistic assessment is not susceptible of solution. (1 counter-argument)

The Assessment Obstacle Chain (moderate)

There is a very low correlation (approximately .40) among professional groups, including English teachers, regarding the quality of the same writing samples. (5 evidence items)

↓

Current reliable grading of writing tests depends on a socializing process that forces group conformity rather than natural agreement.

↓

The problem of writing assessment is a significant philosophical problem that belongs in a theoretical study of composition. (1 evidence item)

↓

Writing assessment is the single most important obstacle to practical progress in composition teaching and research. (3 evidence items, 1 counter-argument)
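
The correlation figure at the head of this chain is a measure of agreement between readers' scores. As a rough illustration of how such a figure is obtained, the sketch below computes a Pearson correlation between two readers who scored the same set of papers. The summary does not say which statistic the original studies used, so the choice of Pearson's r is an assumption, and the scores are invented for the example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for correlation to be defined")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores from two readers on the same five papers.
reader_a = [1, 2, 3, 4, 5]
reader_b = [2, 1, 4, 3, 5]
agreement = pearson_r(reader_a, reader_b)
```

A value near 1 means the readers rank the papers almost identically. A value around .40, as reported here, means that knowing one reader's score says little about what another reader would give.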

Social Consensus as Prerequisite (moderate)

Readers disagree widely in their holistic judgments of writing quality.

↓

There is currently no holistic agreement on what constitutes 'good writing' among English teachers or the general public.

↓

Reliability in writing assessment is dependent on achieving widespread societal agreement about the qualities of good writing.

↓

ETS scoring reliability is localized and contingent rather than universal; it depends on the specific 'conformed group' of readers.

The Consensual Assessment Chain (moderate)

Intrinsic evaluation is the only assessment principle capable of yielding widespread agreement in a heterogeneous world. (1 counter-argument)

↓

Combining scores for mechanical correctness and quality of presentation makes assessment unreliable.

↓

Large-scale tests that separate correctness from quality of presentation are more informative for composition research.

↓

Writing assessment should be categorized into Extrinsic evaluation (Quality of intentions) and Intrinsic evaluation (Relative readability and Correctness).

The Professional Standardization Chain (moderate)

Certified assessors can accurately score relative readability after only a few days of practice.

↓

The talents required to assess relative readability do not exceed the natural capacities of most teachers.

↓

Judging writing according to its communicative effectiveness relative to its aims is already a common practice among many teachers.

↓

The widespread adoption of intrinsic assessments in grading would provide composition teachers with a core of common purposes and principles. (1 counter-argument)

Counter-Arguments (14)

empirical challenge (2)

Readers may derive more value from a text that forces them to exert effort (active processing) than from a text that is 'effortless' but forgettable.

Targets: If the same complex of aims is achieved by a writer with less reader e...

Intentions are often discovered through the act of writing; therefore, an author's 'complex intentions' are not a fixed entity that can be extracted and poured into a more 'readable' vessel by an expert.

Targets: It is possible to expertly rewrite most papers while maintaining the s...

alternative explanation (2)

Analytical scoring may capture the 'pieces' of writing but miss the 'gestalt' or emergent properties that actually constitute quality, resulting in high reliability for a construct that is no longer 'writing.'

Targets: Reliability and validity in writing assessment must be sought through ...

A student's 'writing ability' is fundamentally their ability to generate and organize ideas for a specific audience; isolating 'presentation' treats writing as a decorative shell rather than a cognitive process.

Targets: A score focusing on intrinsic quality of presentation is a better inde...

value disagreement (4)

The goal of education is often to lead or reform societal standards rather than merely reflect the existing biases of the 'court of last resort.'

Targets: A valid assessment method must adhere to the criteria actually used by...

If we only have confidence in intrinsic assessment, we risk validating and rewarding high-level technical skill in the service of deceitful, harmful, or socially destructive intentions.

Targets: Intrinsic evaluation is the only kind of assessment in which anyone sh...

The fact that we cannot reach 'widespread agreement' on the value of ideas does not mean we should stop assessing them; it means assessment should be open to debate and local context rather than reduced to technical metrics.

Targets: Intrinsic evaluation is the only assessment principle capable of yield...

+ 1 more

methodological concern (2)

Even if T-unit length has no direct link to meaning, it may serve as a powerful proxy or 'latent variable' that correlates so highly with expert judgment that its invalidity in theory is irrelevant in practice.

Targets: T-unit length is an invalid measure of writing quality because it has ...

Restricting research to intrinsic evaluation creates a 'construct under-representation' where the most vital part of writing—the quality and truth of the ideas—is systematically ignored by the very researchers meant to improve it.

Targets: For the purposes of research, writing assessment must restrict itself ...

scope limitation (3)

Teaching can still progress by focusing on uncontroversial fundamentals (grammar, logic, clarity) even if the final 'aesthetic' or 'holistic' grade remains subjective among different professions.

Targets: Writing assessment is the single most important obstacle to practical ...

While a 'perfect' holistic solution may be theoretically impossible, 'sufficient' reliability can be achieved through 'conformed' reader groups (as seen in ETS), making it a pragmatic success despite its theoretical failure.

Targets: The problem of holistic assessment is not susceptible of solution....

The increase in 'competence' defined by readability scores may lead to a homogenization of student writing that lacks individual voice and intellectual risk-taking.

Targets: Systematic composition research can effectively raise the competence l...

internal inconsistency (1)

Literate society's judgments are often based on 'flavor' or 'personality' (C6), which are idiosyncratic; a valid assessment method should specifically exclude these in favor of objective communicative efficiency.

Targets: Any valid method of writing assessment must be consistent with the jud...

Logical Gaps (10)

Unstated assumptions required for the arguments to work.

A forced weighting system in an analytical method can effectively substitute for genuine societal agreement.

critical

Separating extrinsic and intrinsic evaluation does not distort the fundamental nature of the communicative act.

critical

Societal preferences are sufficiently stable and coherent to provide a 'valid' benchmark for assessment.

significant

Disagreement in holistic grading cannot be bypassed by using objective linguistic markers (like T-units) to measure progress.

minor

What society currently values in writing is what society SHOULD value in writing for the purposes of education.

significant

'Research agreement' must be based on 'social widespread agreement' rather than specialized expert consensus.

minor

The step from 'agreement is possible' (reliability) to 'anyone should have confidence' (normative validity) is legitimate.

significant

Other Claims Not in Chains (38)

There are five predominant criteria used by literate society to judge writing: quality of ideas; usage/sentence structure/punctuation/spelling; organization/analysis; wording/phrasing; and flavor/personality.

empirical

Without periodic inter-team conferences to ensure conformity, ETS (Educational Testing Service) scoring teams cannot be depended upon to produce highly correlated results.

empirical

Disagreements in writing assessment can be categorized into clusters of readers who prioritize different internal criteria.

theoretical

Readers of student writing naturally cluster into groups based on the predominant weight they give to specific criteria.

empirical

The criteria used by readers to judge writing quality are independent of language and culture, as evidenced by identical results in English and Italian studies.

empirical

The primary value of Diederich and Remondino's work is the descriptive information they provided about how people judge writing, not the specific assessment methods they proposed.