[A piece that appeared in the Vocabula Review in February, 2006. Feeling lazy, I've made very few changes. – JK]
"When I use a word," Humpty Dumpty said, in rather a scornful tone, "it means just what I choose it to mean — neither more nor less." — Lewis Carroll, Through the Looking-Glass
I knew my surefire, revolutionary cure for grade inflation was in for some tough sledding when I first began to run it past my colleagues here at Soybean State University. Somehow they just couldn't seem to focus. "Grade inflation," I would begin confidently, "has reached crisis proportions everywhere." Encouraged by an eager light in my colleague's eyes, I would take a breath, and that would be my undoing. "Oh, grades," my colleague would say, rushing into the breach, "Well, what I do, is —"
Then I would be in for it. Five minutes, minimum, of hearing about her approach to the delicate problem of sorting, stacking, and rank-ordering her students. How her point system worked. How her attendance policy worked. How letter grades could be assigned to anything as slippery and debatable as a college essay, and how those grades could be translated into numbers and then back again. By the time she finished explaining her procedures for curving the final, eyes shining as she noted that square roots can be used to reshape distributions that feel too brutal, the bitter truth would have dawned: here was no comrade-in-arms in the revolution I am planning, but someone positively in love with the status quo — or, rather, with her own little corner of it, though like everyone else she would join in polite hand-wringing over the larger picture.
Depending on your vantage, that larger picture can seem calamitous and disgraceful, or merely bizarre. The basic numbers are enough to make Spock crook both eyebrows as he reads them off to Kirk during a close flyby. Harvard, for instance, reports that As now account for half of all grades given to undergraduates. Princeton reported similar figures in 2003–2004, with Humanities departments, ever a den of floozies, cheapening themselves at the rate of 56.2 percent As. More recently a heroic intervention, along the lines of an old-fashioned tent revival, has brought that figure all the way down to 45.5 percent, cause for congratulations all around. The social sciences at Princeton meanwhile racked up a mere 38.4 percent As, while the stern chastity of the natural sciences held steady, with 36.4 percent of students "excelling." Yale refuses on principle to release data on grading patterns, but reports calmly that "administrators do not think grade inflation is an issue at the University." Good old Yale. You are on your way out to the parking lot, keys in your hand, before you realize that no one has denied that Yale grades are as inflated as Weimar marks. They have only denied that they are worried about it.
Here at Soybean, fall semester statistics reveal that the College of Arts and Humanities awarded 4694 As, 4963 Bs, 2797 Cs, 515 Ds, and 510 Fs, for a collective average of 2.95 (just under B). Our orgy of high grading appears more restrained than Harvard's or Princeton's, but here the statistical context incorporates a very late course drop deadline (about two-thirds of the way through the semester) and a liberal course retake policy that will later convert perhaps half of those Ds and Fs to passing grades, the new mark replacing the old in the student's GPA calculation. CAH students had seven in ten chances of receiving an A (34.8 percent) or a B (36.8 percent), but only one in five of scoring a C (20.8 percent), while real initiative and marksmanship were required to hit either D or F, both at about 3.8 percent of grades awarded.
In the small Arts and Humanities Department, 83 students received 70 As and 13 Bs, pitching a shutout in the three other columns, for a stratospheric (and perfectly meaningless) departmental average of 3.84. Women's Studies was a fairly distant second at 3.50, with 40 As, 21 Bs, 4 Cs, and an F gotten by someone I would like to meet. At the other end of the distribution was Philosophy, ever the maiden staid and pure, grumping along at a mere 2.49.
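For readers who like to check the bookkeeping, here is a minimal sketch of the arithmetic behind these averages and percentages. It assumes the usual 4.0 scale (A = 4 down to F = 0), which the essay implies but does not spell out; the function and variable names are invented for illustration.

```python
# Grade counts as reported above; point values assume the usual 4.0 scale.
POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

def grade_average(counts):
    """Mean grade points over all grades awarded."""
    total = sum(counts.values())
    return sum(POINTS[g] * n for g, n in counts.items()) / total

def grade_shares(counts):
    """Percentage of all grades that each letter accounts for."""
    total = sum(counts.values())
    return {g: 100 * n / total for g, n in counts.items()}

college = {"A": 4694, "B": 4963, "C": 2797, "D": 515, "F": 510}
arts_humanities_dept = {"A": 70, "B": 13}
womens_studies = {"A": 40, "B": 21, "C": 4, "F": 1}

print(round(grade_average(college), 2))               # 2.95
print(round(grade_average(arts_humanities_dept), 2))  # 3.84
print(round(grade_average(womens_studies), 2))        # 3.5
print({g: round(s, 1) for g, s in grade_shares(college).items()})
```

Note that the retake policy and the late drop deadline are not modeled here; the sketch only reproduces the raw end-of-semester figures.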
Stuart Rojstaczer, a long-time crusader who maintains the website gradeinflation.com, has tracked grading patterns at thirty premier institutions since 1991. Over that period, he finds, the average of all grades given has inched up from 2.94 (slightly under a B) to 3.09 (slightly over a B). The relentless upward creep prophesies a golden day when there truly will be no child left behind, and all children will bang their heads together against the low ceiling of the 4.0, awarded as a matter of course to everyone. Indeed, we seem to have accomplished more than half the journey already; the real question is not how the average crept up .15 points in fifteen years, but why it was at 2.94 to begin with. Isn't C, 2.0, defined as average? What planet is this anyway, Spock?
It is Planet Academe, where just as in Lake Wobegon, all the children are above average, their performance having been measured against the rubber ruler of an entirely uncalibrated, nonstandardized standard. Well, perhaps that isn't quite fair. Grades actually are anchored to something slightly more substantial than the instructor's inner voices: namely, to language, in the form of ritualized verbal "definitions" that with very minor variations prevail everywhere. Here at State, the rubric, gravely set forth in an early page of the Catalog, reads as follows:
A Excellent
B Good
C Average
D Poor but Passing
F Failing
The definitions of A, B, and C, notice, are straightforwardly comparative; grading is imagined as an exercise in determining relative merit, with students ranked in reference to each other — not necessarily with reference to the academic task — in a way that would make "all As" or "all Cs" quite impossible. Success means beating the other fellow, never mind what the game is exactly. D, however, is seized with sudden doubt, and begins to hedge: "Poor" continues the premise of evaluation with respect to an implied reference group (either the students in this class this semester, or all those who have ever taken the course, or some other appropriate cohort); but "Passing" implies a very different kind of evaluation, one that refers directly to the task itself, on the apparent assumption that criteria of success and failure are intrinsically and objectively there.
And then we come to F, that hoary relic, so little heard from these days, to find that D's self-doubting afterthought has become the stuff of an oddly sunny confidence. The paradoxical role of F is to make everyone feel better just because it is there. F sees no need to compare students at all; it believes that the academic task is right there in front of us, like a crossbar set at 2 feet 6, and the student either clears it or doesn't. In fact F doesn't seem to belong here. If it were to continue the logic (the main logic) of the rest of the grading scale, it would need to be glossed by some comparative term, such as "lowest quintile" or "far below average" or "piss poor." But since the definition in fact is "Failing" (with that odd suggestion that F "stands for" this by way of abbreviation rather than alphabetical sequence), the implied logical counterbalance ought to be a single opposite term, such as P for Passing or S for Satisfactory. Instead F shares the seesaw of binary judgment with four different counterparts, none of them symmetrical.
Right at the outset, then, the grading scale (on Planet Academe we call it a scale) seems badly confused as to what it proposes. Are we determining only that students meet minimal criteria of accomplishment, or are we rating them exactingly and competitively, with a determination to see what the differences are? Yes, no, maybe, both. In truth, the underlying logic of the scale is one of emotion, not measurement. In its implicit denial of comparison, F, weakly seconded by D, provides an assurance that was achingly and thrillingly absent from your first swim meet or spelling bee or beauty contest: that no one has to finish last. If everyone gets over the bar, that's that, and we don't have to talk about who is best and worst.
The whole arrangement bespeaks an abiding, deep ambivalence toward assessment. On the one hand, we want to know how students stack up against each other; on the other, we want to spare ourselves exactly that knowledge, resisting cruel competition and odious comparison in favor of a scheme whereby nearly all students can be declared successful.
Does anything on the scale tell us what grade distributions might be reasonable? An intelligent alien might suppose that the five grades corresponded to the five quintiles of the graded cohort, and that, by mathematical necessity, Fs would account for 20 percent of all grades given, Ds for the next 20 percent, and so on. Or perhaps since F inhabits a different universe, speaking a different language, it should be arbitrarily restricted to just 5 or 10 percent; but then the remaining 90 or 95 percent of grades should be evenly distributed among D, C, B, and A. That would make some kind of sense; it would suggest that we are measuring something that is actually there, namely, the position of the students vis-à-vis one another.
In reality, of course, such distributions are never contemplated, and an instructor who implemented them would instantly incur the wrath of students, parents, chairs, deans, vice presidents, and possibly God. Even the most wild-eyed reformers think only in terms of a bell-curve distribution centered over C, tapering sharply toward each extreme. At the end of an unremarkable semester in English Literature for Philistines, you swallow, force a smile, and award 30 percent As, 45 percent Bs, 16 percent Cs, and a handful of Ds and Fs, these last going to students who never showed, never handed in work, or keyed your car. Of course the real world, as we so fancifully call it, sneers at the mathematical absurdity of such numbers. Businesspeople growl about receiving from their academic suppliers whole cases of human widgets with 3.8 GPAs and cum laude degrees who then swiftly prove to be quite ordinary in all of their abilities and attainments. Some of the disappointed go on to write Op-Ed pieces that savage Academe for what they see as its complete loss of standards, paying outside consultants to check their punctuation because the English BA they hired last fall can't seem to do it. But the folkways of the planet are not without their underlying, though largely unstated, rationale. Let me try to explain.
A classroom is not a factory, where merchandise coming down the line must be coolly scrutinized, nor a lab where the researcher collects data on test animals. It is a community, even a family, sustained while it lasts by many subtle and rather delicate human bonds. To the extent that the professor is effective, it is likely to be a community where 80, 90, even 100 percent of the members are enthusiastically contributing, each according to his or her abilities. No two have exactly the same aptitudes or work up to precisely the same standard, and on some level everyone understands this. Nevertheless the students are all in some final sense equal, and the effectiveness of the class — not just its happiness — depends on this being affirmed in various ways. The same ethic of undiscriminating inclusion that we saw at the low end of the grading scale is in many ways the right and instinctive ethic of the classroom, as of any team or community.
Then, too, grades invariably get awarded for "effort," for "encouragement," and for purely procedural ends like enforcing attendance and homework. At the end of the term, when the chits come due, sound pedagogy turns out to have been very unsound measurement, and academic courses universally over-reward diligent incomprehension. But the chief problem with objective, rational grading, with using the whole five-grade ruler to tease out all the bleak truth of human difference, is that it violates the spirit of community, setting in motion all kinds of unproductive jealousies and anxieties. Rather than a step toward wholesome discipline, it tends to be an act of psychological sabotage.
Sabotage against the professor, at least. Good teachers are, in many important ways, like good parents. They can be perfectly objective in their academic fields, but they are not all that objective about their students. True, like doctors, lawyers, priests, and valets, they have the responsibility of delivering a certain amount of unpleasant truth, and most are excellent blusterers. A surprising number really do pride themselves on getting from the student the best the student can do. At the end of the day, nonetheless, they tend to give the benefit of every reasonable doubt and several unreasonable ones, for roughly the same reason that you still own badly focused videotape of your daughter's high school soccer games. But it's doubtful the foible can be corrected without collateral damage.
None of this is to say that grading is unimportant. On the contrary, it is crucial, but at present its importance is mainly symbolic. It gives meaning, shape, direction, and motivation to the student's work and the instructor's; but like any ritual it partakes of the irrational and requires a healthy suspension of disbelief. The D and F grades are not there to be used, so much, but to be evaded again and again, lending a certain zest of danger to the coursework, adding to everyone's feeling of accomplishment. They function like the threat that Santa Claus won't come this year, which allows everyone to wake up to the ritual surprise of presents under the tree and the delighted conviction that it results from all the children having been good. No one is really surprised, of course, or really convinced the children have not been monsters occasionally, but no one is seriously misled, either.
To Spock and other hard-line traditionalists, I fear, all this will sound like a cheap rationale for rewarding laziness and nonachievement. And the traditionalists will have a point, especially if they are childless taxpayers in a state with a lavish flagship university. But the sentimentalism of the classroom does not begin with the professoriate. It comes from everywhere, from the culture at large, but especially perhaps from the students and their parents. Over the past half century or so, while the college diploma has become more and more the prerequisite to bearable employment, government support for college education has paradoxically dwindled. As a result, tuition costs have soared, and parents have increasingly come to regard the diploma as something like a big-ticket household purchase, believing that it cannot fairly be withheld so long as a certain number of bills have been paid, classes taken, and assignments handed in.
Try seeing two or three kids of average ability through to their four-year degrees at the state school, on the salary of a nurse or city worker, and you will find your enthusiasm for rigorous grading sharply curtailed: as long as my kid passes, fine. As costs rise, the emphasis shifts from what the student must do to measure up, to what the school is doing for the student in exchange for those fat tuition checks.
For similar reasons, administrators are constrained to think in terms of yield and retention rates — accepted students who come, enrolled students who stay — of graduation rates, of time-to-degree, of credit hours billed and awarded, of faculty loads carried and completed, and of surveys of alumni satisfaction: concerns that do not absolutely exclude high academic standards or rigorous grading, but generally tug in the opposite direction. The immediate objective is to keep the student busy and satisfied while she proceeds through the system at an acceptable rate; if you manage to teach her something in the process, so much the better. (The students themselves are generally much more likely to demand this of you than anyone else.)
Meanwhile certain unmistakable signals are given that the Catalog definitions are not to be taken literally. Here at Soybean State, one of the clearest of these is our threshold for academic probation and dismissal. A GPA that dips below 2.0 and stays there for more than a semester earns the student a ticket to a spot behind the cash register at McDonald's, for at least one semester, after which he may be reinstated. But of course 2.0 is the numerical equivalent of C, the midpoint of the grading scale, defined as "Average." How can average performance be grounds for dismissal? It can't, and if instructors really did award Cs for average work, the result would be a dismissal rate sufficient to send students, teachers, and administrators all packing. What happens instead, of course, is that grades rise automatically to the point where Soybean has a rate it can tolerate. An average floating around 3.0, or at any rate far above 2.0, has been built into the system to begin with.
Another clear sign to the instructor that the fiction of measurement is only that is the great absence, at SSU and elsewhere, of benchmarking procedures aimed at establishing consistency in grades from course to course, prof to prof, department to department. Instructors are left almost completely alone to develop their own working definitions of "Excellent," "Good," and so on, their right to grade as eccentrically as they please defended in the name of "academic freedom." The result is that grades become a sort of Humpty-Dumpty language that lacks any shared definitions, and "taking a stand" against grade inflation becomes a meaningless proposition: you can give, amid much grief, the false and arbitrary impression that your students are worse than everyone else's, but you cannot unilaterally supply the lack of a defining institutional context.
It is as if, through an exaggerated fear of hurting anyone's feelings, the Bureau of Weights and Measures declined to issue standard definitions of a meter or a peck or a pound. We do, of course, go back after the fact and solemnly endow grades with a certain quantitative, pseudo-scientific cast by computing the student's GPA, to two decimal places no less: but no one should confuse this attempt at sympathetic magic with real measurement.
Finally, we come to the most forceful signal of all that the game is to be played a certain way: the fairly recent, nationwide vogue of student evaluations as a method for evaluating instructors themselves. Student Evaluations of Teachers (SETs) are a monument to that bedrock American principle that, faced with intractable, dark, and complex questions, you can always take a vote. Does God exist? 96 percent of Americans think so, so She must — next question! Here on Academe, the same epistemological insouciance has led to the custom of having students take ten minutes or so at the end of every semester to answer multiple-choice questionnaires about the professor's performance, like patient diners completing their customer satisfaction surveys at Ruby Tuesdays.
These whimsical documents then get sucked into the knowledge vacuum known as the instructor's personnel file, where — since no one has ever satisfactorily defined effective teaching, let alone figured out how to measure it — they become the de facto criterion of success in the classroom. Of course any tenth-grader can spot the problems in this you-grade-me, I-grade-you arrangement. It takes a PhD to argue, as apologists do, that there is a positive correlation between high student evaluations and actual student learning, and no correlation between high evaluations and high grades given. Both notions are decisively refuted by Valen Johnson in a book called Grade Inflation: A Crisis in College Education (New York: Springer-Verlag, 2003), which I heartily recommend to anyone wishing to be seriously depressed.
We professors, then, are not guilty of inventing the Bizarro world of academic pseudo-assessment; we are guilty only of enjoying it, after the institution hands us the grade book with a broad wink, desiring a certain show of objectivity and only that. Is the overall situation really so bad? My perception is no doubt colored by the red lights here at Floozy Central, in creative writing classes offered by the English Department, but I don't think so. Grades have drifted up steadily since 1991, but there was a period in the seventies and eighties, as Rojstaczer notes, when they held steady, and it is always possible that they will retreat spontaneously from their current Dadaist extremes. In any case, the rhetoric of crisis fails to note that grades in the Humanities at least have always been vague and impressionistic, bits of expressive language never meant to place a precise objective valuation in the public spotlight. The term inflation itself is in some ways badly misleading. In real inflation, the economic kind, prices go up in concert while keeping roughly the same comparative relation to the underlying goods and services. The process can go on indefinitely without destroying the system, as you know if you have vacationed in one of those charming countries that have lots of zeroes on their currency, in faint recollection of wars lost and other distant calamities.
Obviously this is not the case with grades that are rising on a scale that goes no higher than 4.0, so observers speak anxiously of "grade compression" and Götterdämmerung; but a better description of the present state might be "nongrading" or "conscientious objection to grading." Much of Planet Academe has changed over to what really amounts to a Pass/Fail system, quietly deciding that it prefers to conduct business without much competitive or quantitative ranking of students. The ethic of undiscriminating inclusion that was always part of the grading dialectic has come to dominate, but this is less than an earthquake. The effects on student motivation and academic achievement are profound, no doubt, but complex and by no means uniformly negative. On the one hand, we sacrifice the great motivational advantages of fear and competitiveness; on the other, we gain those of community solidarity and positive reinforcement. On a quick view, it looks like a toss-up.
The first part of my revolutionary cure for grade inflation, then, is simply this: that we quit worrying about it and quit promising to bring it to a halt. It is after all a reality we have tacitly chosen, and when pressed we should be prepared to admit this.
But my plan also entails a second phase, which I hope will seem less disappointing to reformers. For of course there are many advantages to objective, uniform, and transparent grading. These are so patent and so universally insisted upon that so far I have ignored them completely, and will now cover them in a quick fusillade of bullet points:
• However fancifully they may be derived, grades have enormous real-world consequences, helping to determine who gets the scholarship, who gets into which graduate school, who gets what job. There is therefore a strong moral obligation to achieve as much consistency and fairness as possible.
• Students themselves need bluntly accurate assessments of their work, lest they waste their lives in a vain quest to become hairdressers when their real talents are in Microbiology.
• In ordered, sequential disciplines like Finnish and Differential Calculus and Cooking, early lessons are an indispensable foundation for later ones. Fuzzy grading thus leads to bother, grief, and waste motion when the instructor in the Advanced course realizes she must conduct a crash review for the half of the class that should have flunked Basic.
• Some of our students, and we can never know which ones, are on their way to highly sensitive careers as ship captains, brain surgeons, bomb disassemblers, and American Idol judges. By grading too indulgently, we forfeit a golden opportunity to ambush the dumb ones on their way to making a mess of the world.
• Both the institution and the state need hard information about student learning in order to make decisions about the allocation of scarce resources.
• The diploma once awarded is in many ways an entitlement, and one given at the expense of those who never get to go to college. A student in a night class once told me, "A lot of us adult students are here because if we don't get that piece of paper, some kid who doesn't know anything but got sent to college by his parents is going to take our job." She was too cynical, but right that there is a moral imperative here: we should make sure that, in fact, the kid who takes her job knows more than she does.
Much, much more could be said on this side of the question, and has been said, in Johnson's book for instance, but for our purposes this should do. So here is my remedy: let instructors go on awarding grades exactly as they do now. But then require them, in every class without exception, to rank-order their students, from first to last, with no ties permitted. For me the extra step would take no more than ten minutes per semester, tops, since I already compute an overall "course average" number for every student prior to assigning the letter grade. For other faculty, the time requirements might be a little greater, but after all it's my revolution. The student's A or B would be followed on the transcript by a two-digit number showing her percentile rank in that particular course and section: for example, "A — .56" for the weakest A student in a statistically typical Princeton class awarding 45.5 percent As. Or "D — .38" for the victim of the tough-as-nails professor whose average grade was C-.
The reform would allow the transcript, presently a muddled monologue, to become a kind of dialogue: in one column, traditional expressions of the faculty's esteem and good wishes, with their attestation that the material has basically been learned; in the other column, a hard judgment about just where the student stands. The numbers in the percentile column would be inflation-proof because every time one of them went up, another would have to go down by the same fraction — and vice versa. Knowing this would ease the psychological burden of a low evaluation for both instructor and student, since both would understand that a higher one could be had only at someone else's expense. The letter grades would of course lose some of their present importance, but I don't think they would be completely overshadowed by the ranking numbers. A grade line like "A — .16" would mean something materially different from "D — .16," to wit, "This class was competency-based. The nature of the material does not lend itself to much differentiation of students, so everyone got As, having mastered the subject more or less completely." Of course, a skeptical reader might still mutter, "Gut course."
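The zero-sum claim can be illustrated with a small sketch. The essay does not say exactly how a rank order becomes a percentile, so the code below assumes the common midpoint convention (number of students ranked below, plus one half, divided by class size); the student names and the function name are hypothetical.

```python
def percentile_ranks(ranked_students):
    """Map each student to a percentile rank strictly between 0 and 1.

    `ranked_students` is ordered best first, with no ties, as the proposal requires.
    Assumed convention (not specified in the essay): the midpoint rule,
    (number ranked below + 0.5) / class size.
    """
    n = len(ranked_students)
    return {s: (n - pos - 0.5) / n for pos, s in enumerate(ranked_students)}

section = ["Ana", "Ben", "Cy", "Dee"]                  # hypothetical names, best first
before = percentile_ranks(section)
after = percentile_ranks(["Ana", "Cy", "Ben", "Dee"])  # Cy overtakes Ben

print(before)                                          # Ana 0.875, Ben 0.625, Cy 0.375, Dee 0.125
print({s: after[s] - before[s] for s in section})      # Cy gains 0.25, Ben loses 0.25
print(sum(before.values()), sum(after.values()))       # both 2.0
```

Under this convention the ranks in any section always average .50, so one student's gain is necessarily another's loss. A different percentile convention would shift the individual numbers slightly but not the zero-sum property.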
Teachers, like parents, always think that their own students are smarter, more industrious, and taller than anyone else's, and many instructors in the Humanities, accustomed to having an unlimited number of As and Bs to dole out, can be expected to cry foul at the zero-sum nature of the new system. I will probably have to shoot a few of these. The rest will be appeased when I explain how secondary calculations can be used to remedy several long-standing inequities. Under my system, the GPA, preposterous construct that it is, would be replaced by the student's average percentile rank in all courses taken so that you could peg him, at a glance, as a .54 or .72 performer. Along with its other failings, the GPA is notoriously biased against "hard" majors and courses and instructors, but now we would have data allowing us to generate clear and accurate difficulty indexes for all three. A professor whose average student, based on the last end-of-semester computation, held a ranking of .65 in all classes taken so far would herself have a difficulty ranking of .65, and could proudly report that in her tenure portfolio, adding luster to student evaluations. Courses could likewise have their own difficulty rankings, based on the individual numbers of students enrolling for, let's say, the past three years; and different majors could likewise be compared. A grade line might come to look something like this:
B .44 .62 .38 .79
— with the letter expressing the instructor's basic certification that the task has been done (and done pretty well), the .44 the no longer surprising news that this "good" student was in fact slightly below average in this group, the .62 the compensating information that BioPhysics III typically enrolls above-average students. The .38 would add the surprising information that Professor Daffy typically attracts much less gifted students, and may have been a bit out of his depth with this particular course assignment; and the .79, finally, would remind us that BioPhysics is one of the hardest majors on campus.
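As a rough sketch of how the secondary numbers might be computed: assume each transcript record carries the student's percentile rank in one section, take a student's overall standing to be the plain average of those ranks, and take a course's or professor's difficulty index to be the average overall standing of the students enrolled. The records, names, and functions below are invented for illustration, and the essay leaves details such as weighting and the three-year window open.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (student, course, professor, percentile rank in that section).
records = [
    ("a", "Physics", "Stern", 0.75), ("b", "Physics", "Stern", 0.25),
    ("c", "Poetry", "Quill", 0.75), ("d", "Poetry", "Quill", 0.25),
    ("a", "Seminar", "Stern", 0.875), ("b", "Seminar", "Stern", 0.625),
    ("c", "Seminar", "Stern", 0.375), ("d", "Seminar", "Stern", 0.125),
]

def overall_ranks(records):
    """Each student's average percentile rank across all courses taken."""
    by_student = defaultdict(list)
    for student, _course, _prof, rank in records:
        by_student[student].append(rank)
    return {s: mean(r) for s, r in by_student.items()}

def difficulty_index(records, key_column):
    """Average overall rank of enrolled students, grouped by course (1) or professor (2)."""
    overall = overall_ranks(records)
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key_column]].append(overall[rec[0]])
    return {k: mean(v) for k, v in groups.items()}

print(overall_ranks(records))
print(difficulty_index(records, 1))
print({p: round(x, 2) for p, x in difficulty_index(records, 2).items()})
```

In this toy data, student b's .25 in Physics was earned against a strong field (course index .625), while student c's .75 in Poetry came in a weaker one (.375), which is the kind of context the extra columns are meant to supply. A difficulty index for majors could be built the same way by adding a major column.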
Including such information in the transcript would alleviate the demoralizing pressure students now feel to seek out easy courses, and at least some of the corresponding pressure on teachers to "dumb down." Under my system, you could still locate a gut course filled with underperforming sluggards, and proceed to distinguish yourself with a .89 class ranking; but columns four and five of your transcript would clearly show that the course had a difficulty ranking of .26 and the professor one of .29, and no one would be fooled. At the end of the transcript, we would no doubt want to have a flurry of summary statistics: the average difficulty level for all courses taken, for major courses, and for nonmajor courses; the student's overall percentile ranking, likewise broken down into major and nonmajor components; and a revised percentile ranking adjusted for the difficulty of courses and (why not?) the professors offering them, this once again broken down in terms of major and nonmajor courses.
Thus my vision of the royal road to grading probity, reached by a fairly simple change in our record keeping. The plan might actually work, I think. But there has never been any shortage of such schemes. In this age of computers, in a profession full of people deeply versed in statistical methods, there can be no real difficulty in devising a grading system more fair, consistent, accurate, and far more informative than the one we now have. Once we decide that we really want such a system, that we prefer transparency to privacy in these matters, designing and implementing one will be child's play. But deciding is the hard part, and always has been.

