
Crowdsourcing Grading: Follow-Up

Please don't ever ask me to bet on which topics will be "hot" and which ones not. I managed to blog about an experiment in evaluation (y-a-w-n, so I thought!) during the week our new site was down several times for repair and then when I was off the grid, on my big one-week-of-vacation-fun of the year. Off the grid. I thought posting what is basically a revised version of a very successful course I taught last spring, "This Is Your Brain on the Internet," was a respectable place-holder during absence (the site's and my own). I threw in some new ideas about experimenting with grading methods almost as an afterthought. About 5000 views and dozens of comments later, well, I get it now: Grading is a hot topic!


Here's why I think that is the case. I don't believe anyone really loves grading or that anyone believes that grading, as it is presently configured, is much more than an expedient (not a wonderful) form of measurement. It works because our community has determined it's our normal metric for measuring, certifying, and applying a quick-to-read-and-calculate "summary" that represents a performance in a particular course. I believe we all know it is an imperfect instrument and, well, we settle because it's the system we've inherited and it's easier to just go along with it than to invent something entirely new, profoundly accurate, and true to our Platonic idea of what real assessment might be, especially since no such idealistic or permanent standard exists. The point is that the grounds and circumstances of evaluation change constantly, so we settle for something that works well enough and that is legible to anyone who sees it. An "A" is universally recognized as the top grade, even though we all know one person's A can be much stingier than another's. Grades are a compromise in our pedagogical lives, which is one reason we're so proud when there are other external signs that we're getting it right--the student I think is best gets into a top grad school, wins a prize judged by someone else, and so forth. But we rarely wax nostalgic about the amazing grading experience we had back in 2005! In fact, I've never been around the proverbial departmental water cooler with faculty reminiscing about their best ever grading experience--although I've been in several conversations about my colleagues' best ever teaching experience. Good teachers love to talk about the "aha" moments of teaching: The lightbulb that finally went on for a student who worked so hard to grasp a concept. A great conversation after class. A brilliant insight offered by that shy gal in the last row. An inspired lecture on the Panic of 1873 delivered in October 2008, the day after the announcement that all the world's financial systems were overextended and about to collapse. A seminar that felt like magic every day. Yes, we all wax eloquent about those experiences and their pedagogical opposite. I have never once, in my entire career, heard anyone, young or old or in between, croon about the elegant perfection of assigning a grade. "I knew it had to be a B, and I just nailed it!"


Grades are a compromise to the practical.  And they are a compromise.  We all know that in our hearts and wrestle with it at the end of each semester.   "Is this an A-- or a B++?"  Really now.


I believe every dedicated, experienced, concerned teacher has at least one grading story to report that still is a source of concern long after. (I'm changing details here, but all are based on actual incidents, and you can all fill in your own painful anecdote here.) The brilliant, passionate, on-fire student who was dumped by his girlfriend the week before the final and didn't tell you until five years later that that's why he never turned in the final paper. A+ mind, no final paper, mysterious and disturbing silence. Why? The intrusion of a broken heart, physical exhaustion, depression, an illness not diagnosed until later, a death in the family: How do you calculate the average of life's pains?


Or another example:   the student who was stunning in class participation and couldn't take a timed test to save her life? Now, we make provisions for this; ten years ago, that would have been, what, a D? Maybe a C if you're a pushover?   Maybe you find out later that she carefully planned her college years taking courses without timed exams but she took your course because she heard it was the kind of class students remember twenty years later.  She understood it, she contributed, she messed up on the tests.  Give that a D+?


Probably not in 2009. The point is that we know a grade is an artificial marker of a certain kind of performance under a certain set of circumstances. Our understanding of learning disabilities, for example, and our attention to them has changed so radically in the last generation that students who would have been "failures" even in 1980, because of their test-taking shortcomings, now are required, by universities and by federal disability laws, to be given tests that adequately test their particular kind of intelligence. Bravo, I say! But such a provision already acknowledges that evaluation is a metric that, like all metrics, must be, well, evaluated. Standards of measurement are invented; they do not exist in some universal, unchanging way. We've become obsessed, as a culture, with assessment. Is that the same as being obsessed with the highest standards of excellence? Is excellence the same in all situations? I don't think so. You measure success in a basketball game differently than you measure success at a symphony. There is no one right way for every circumstance, nor is there, in teaching, for every individual, for every course, for every discipline, for every prof or every student.


We also know that we, as teachers, fudge our evaluation of evaluations all the time. We do not live in a perfect world and the drastic underfunding of teaching in the last decades has forced many a prof to make compromises that are anything but fair, respectable, or even defensible. You want decent evaluation? Then make sure your taxes (or your charitable contributions) go to supporting education in the way it should be supported. Compromise your educational system and you will get profs who are forced to compromise, and evaluation, sadly, is one of many areas where such compromise happens. Am I giving away state secrets when I suggest that there are some profs out there who, faced with 300 students in a course, with no TA or maybe only one or two, end up giving multiple choice or short-answer exams, even in subjects where they would admit such exams are a travesty? "Ahab was trying to kill the White Whale because (a) he was a monomaniac (b) the White Whale represented his unfulfilled quest for life (c) . . ." Give it a rest. If I'm pouring out my soul teaching the most conflicted, anguished, soul-wracked writer in America, do I really think answering (a) and not (b) makes my students deserve an A and not a B in my Melville course? I don't think so.


Because I'm writing a book now on cognition and digitality, I have spent a lot of the last decade reading books and articles (probably not just dozens but hundreds) on assessment, evaluation, and grading. I didn't really understand, until my "How To Crowdsource Grading" blog, that others might be as interested in this topic as I am. It is quite clear to me that assessment in the forms now used in K-12 and in colleges and universities too is very much a product of the Machine Age. Historians of grading bicker about whether grading as a practice came about in the late 18th century at Cambridge or really in the mid-nineteenth century, but just about everyone concurs that just about the time that Taylor was taking out his stopwatch to measure how long it took for a worker to fill a wheelbarrow, move it, and dump it out again, there were others (like Galton) who were figuring out how to measure brain productivity. Galton invented and/or perfected (this too has different scholarly adherents) many of the quantitative measures (such as regression toward the mean) that students still learn in Statistics 101. Galton, of course, was Darwin's cousin. He misrepresented his famous relative's evolutionary conclusions, applying a eugenics twist to his "scientific measurement" of brain power. He had lots of ways for evaluating intellect, criminality, sexual tendencies, and so forth but we now tend to discredit head-bumps, fingerprints, and photographic imaging as scientific evidence of deviance or deficiency . . . but we still rely on some of his other metrics, including achievement tests and linear regressions.


We also still rely on the educational performance tests that Binet developed for the French educational ministry and that, in American hands, were transformed into tests of one's Intelligence Quotient. Binet protested vehemently at what he saw as a misapplication of a diagnostic assessment of performance to a measure of innate and unchanging abilities but, never mind, the IQ test was administered, with the willing cooperation of the U.S. government and military, to over a million men who wanted to be soldiers in World War I. The test determined who would be an officer and who lacked even the brain power to be a foot soldier, cannon fodder in the bloodiest of wars. Guess what? We now know that officers were gentlemen, i.e. that IQ test results given to WWI soldiers correlated with social class, education levels, affluence, linguistic abilities, acculturation (for immigrant groups), and not, as was thought in 1917, with one's inherent ethnic "traits."


Oh, I could go on.   Suffice it to say that I'm blogging rapidly, from memory, but the basic point is that evaluation is vexed and ever-changing and often misapplied.  We're constantly going back and revising the test and then coming up with a different final score.   Those holier-than-thou who love to believe standards are objective and that anyone who would "experiment" is really getting out of work or soft in the head or a "relativist" or some other sin are, I believe, being disingenuous.  Push just about anyone and they can come up with an anecdote of a standard that didn't work, was unjust, needed improvement, or was so unfair it needed to be tossed out altogether so we could start all over again.


That is how I feel about assigning grades in a conventional way (whatever that means!) in a class exploring new modes of cognition and digitality. The point of this course is to rethink our model of mind that has been handed down to us from the Machine Age and has about all the subtlety of that age. Which is to say not very much subtlety at all, certainly nothing at all like the complexity of the human mind. We are now at one of the great, transitional, transformative ages in human history when human behaviors are turning out to be quite different from what we would have said even five or ten or twenty years ago they were. No one would have believed "human nature" (remember Rational Choice theory?) would have ever, under any circumstances, allowed for successful (in fact "winning" and "excellent") global collaboration among unanonymous and unpaid self-appointed team members working together toward some goal, without a leader, without predetermined rules and severe penalties for violating those rules. But Linux exists. So does Wikipedia. On the level of pleasure, so do intricate and successful raids in World of Warcraft. On and on. If we have new evidence of human social and intellectual behaviors, new evidence that people actually like to learn and teach and share together, we need more subtle and interesting ways of assessing how individuals and groups work collectively. Digital thinking is a mode of thinking together, online, through a process of peer evaluation and peer contribution, using a form of "participatory learning" that blurs the lines between work and play, intellectual and social life. It's a fascinating phenomenon when considered in historical terms and especially when viewed against theories of what is "humanly possible" as promulgated in the Machine Age.


"Nature red in tooth and claw" is a different conceptual operating system than the "wisdom of crowds."


That's what "This Is Your Brain on the Internet" is about.   I loved teaching the class last spring to an astonishing and wonderful group of Duke's ISIS students (ISIS stands for Information Science + Information Studies).  These students tend to major in wide-ranging subjects like Computer Science and French, or Engineering and Music, or Philosophy, Biological Anthropology, and English.  They deserve a prof who is as thoughtful and demanding and introspective about learning as they are.  Toffler's idea of "learning, unlearning, and relearning" is what this particular course promises and demands. 


So, we'll see how this little experiment works.  The point of an experiment is to try it and, if it doesn't work, then you try something else.   The point of an experiment in our crowdsourcing, interactive age is you can try it along with the students and you can make adjustments along the way if it isn't working.  


Successful collaboration cannot work unless all the team members are able to judge excellence, communicate judgment in a constructive and persuasive way, and then act on that judgment. Assigning a grade based on a pre-existing scale is very different from real-world negotiations which lead to a successful final product, whatever that product may be. Since the whole structure of "This Is Your Brain on the Internet" is working toward collaboration, the course is structured around a range of experiences that stress, demonstrate, and epitomize collaboration. Last year, we had a field trip, for example, first to a rehearsal for Shen Wei's dance performance (he choreographed the opening ceremony of the Beijing Olympics and was an artist-in-residence at Duke last year; he both directs and relies on the improvisation of his dancers) and then to a performance and discussion of that performance. We also had a Level 80 WoW gamer talk about his experiences in the game while others put that experience into a historical and theoretical context. Did every group project succeed? Not at all. The range was incredibly varied, with some near-perfect projects and then some that were mediocre. Afterwards, some students in one of the groups complained privately that some of their teammates had failed them. I regretted that they came to me after the result and began to think, back then, of how to make evaluation, feedback, standards, assessment, communication of difference, and engaged critique a structural part of this course. These too are what Howard Gardner and Howard Rheingold call "twenty-first century literacies," skills necessary for excellence in our digital age. Like all skills these need to be honed. And practiced. Not just graded.


So next year, as the first assignment in "This Is Your Brain on the Internet," students will be reading about evaluation and grading, including my blogs and the articles and the comments that all came rolling in while I was off the grid last week. Some of the comments are so thoughtful, collaborative, and evaluative. Some are positive, some negative, but still productive. Others are just crotchety, cynical, and intellectually dull if not mindless. But I don't have to be the judge. We'll put together a portfolio of comments and students will be able to start off evaluating "evaluations" of these blogs and essays on evaluation. It's a perfect way to begin a class, since it already shows the positive and negative ways that people use the openness of communication online to contribute or just to rant. Both. Maybe I should even ask them to assign letter grades, A or F. No curved grades, please. (That, in case it is lost on screen, is a joke . . . but it's enticing.)

Evaluating evaluation is going to be an integral part of "This Is Your Brain on the Internet."  What an experience!  I can't wait.  And, as I've said, I promise to report on what happens.  Stay tuned!





I think this is a very thoughtful response and quite on the mark. Thanks, Cathy.




Your move to post the problem you are working on to the Internet is a good one, especially appropriate given the title of your course. Washington State University ran an ePortfolio contest in 2008 and we did some case studies of the results. Among them we found the strategy of working problems in public -- that is, the portfolio is a place to work, not just to showcase.

The skills that your course title implies very much include assessing feedback, so by all means have your students assess the feedback your posts are getting. Making the feedback process more transparent, making it possible to gather feedback from a diverse (but interested) audience, and making it possible to gather feedback from work done in contexts on the Internet, among the community of practice where the work is authentic, is the heart of the Harvesting Gradebook concept.

Finally, when we read the Chronicle article it seemed that we ought to offer to collaborate. In keeping with our belief about working problems in public, and portfolios being workspaces, we elected to engage you, and offer to collaborate, in a public forum, with the hope that others may see, and contribute to, the thinking. (Note, too, that the portfolio of this work on harvesting grading now begins to span into another blog -- further evidence for our thinking that assessment needs to be done in situ, in the community of practice where the work is being done.)

I have on my to-do list for next week to write the blog post that will contain the results of the harvested feedback on Erica's article and some other thoughts about the use of these "richer than grades" forms of feedback to enhance the student and faculty learning outcomes.


Hi, Nils, I'd love to collaborate.  And when the academic year starts, I might propose to this year's HASTAC Scholars that we host a whole public forum on evaluation.   Clearly it is a very hot topic.   I'm most excited about beginning my class on digital thinking by having students read all of the various responses to my original course description, the various suggestions people are making, and then evaluating the range of evaluations, comments, snarky remarks, helpful offers to collaborate and so forth.  All of this feeds into a deep thinking about what evaluation means, how we give feedback to our peers---and why!   Thanks so much for writing.


Cathy, I think you're right. Thinking about how we give feedback (feedback as formative assessment) is important.

I went back and re-read your interview, "Participatory Learning and the New Humanities," by Randy Bass and Theresa Schlafly.
I think what you are talking about by having your students look at the Chronicle article, your blog and all the comments, is to observe participatory learning in action.

We read your interview in our Morning Reading Group and were impressed by the idea of 'collaboration by difference.'  In attending to the feedback you got, I hope your students explore the thought that this is all happening in public, and in fact, collaboration by difference almost demands public performance. Howard Rheingold does not quite use these terms when he talks about meta-skills for participatory learning, but I think that it may be what he means by "attention-to-attention."

This working in public is a big deal. Not only can it be uncomfortable, but understanding it (to make use of it) is a new strategy that most of us don't know, so we need to attend to how we get feedback (which might be another meaning of Rheingold's phrase) as well as attending to the feedback we get.

One reason we are exploring the Harvesting Gradebook idea is that it makes a more overt effort to gather feedback. I just finished this reply as promised in my previous comment to you. So far, there are only a couple of people who have completed the survey. Looking at the preliminary data, I'm wondering what you think about the satisfactory/unsatisfactory scale?

You indicate in a response to me that you might use the survey with your students. If you are interested in doing that, let me know so that we can refresh the data for you (it's not dynamic, alas). If you might want to consider using a rubric with more criteria, we have the WSU Critical Thinking Rubric implemented in a survey and could readily make it available to you for harvesting use.

Finally, we are going to do a session for the TLT Group Friday Live on Sept 25, 2009. The working title is "The Harvesting Gradebook: From student feedback to university accreditation." We are interested in the problem of how feedback to an individual piece of student work can be "rolled up" to provide feedback (and evidence of success) to the University. For a preview see this post on Transforming the Gradebook from last August.


Yes, all of these are pieces of the digital thinking public.  I'm very excited to have these additions to what my students will start off with next winter (such a long time from now!).   I think I also mentioned that we're going to do a HASTAC Scholars Forum on evaluation and grading and it would be so great if you and your students participated.   It will probably be our second or third Forum of the year.    I would love my students to have a look at your data/survey and will let you know before the term begins so we can have the refreshed data.  Now THAT is a refreshing way to think about evaluation.


One last thought, more on the topic of the course than our conversation on assessment.

A year ago I wrote a blog post about learning and Web 2.0 for a bright high school student who is homeschooled. I wrote it following a wide-ranging conversation we had about the Internet. At any rate, it might be of interest to your students, but they'll need to do a little translation from my K-12 examples.