For the past six years, the primary focus of my research has been debunking Automated Essay Scoring (AES), also known as robo-grading. As Melanie Mitchell has eloquently noted in a New York Times op-ed, computers do not understand meaning. Communicating meaning is the primary function of writing. To computers, however, meaning is not only invisible but also irrelevant.
All computer grading is based on counting proxies, such as the total number of words, the average number of words per sentence, the average number of characters per word, the frequency of infrequently used words, and the number of sentences in a paragraph. Significantly, most computer grading engines cannot deal with texts over one thousand words and prefer first drafts written in less than an hour.
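To make the point concrete, here is a minimal Python sketch of the kind of surface proxies such engines count. It is illustrative only: real engines use proprietary feature sets and corpus-derived frequency lists, and the word-length stand-in for "infrequently used words" is my own assumption.

```python
import re

def proxy_features(text):
    """Compute the kinds of surface proxies described above.

    An illustrative sketch, not the feature set of any real AES
    engine; the tokenization and thresholds are assumptions.
    """
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)

    # Crude stand-in for "infrequently used words": words longer than
    # 8 characters. Real engines consult corpus frequency lists instead.
    rare = [w for w in words if len(w) > 8]

    return {
        "total_words": len(words),
        "avg_words_per_sentence": len(words) / max(len(sentences), 1),
        "avg_chars_per_word": sum(len(w) for w in words) / max(len(words), 1),
        "rare_word_frequency": len(rare) / max(len(words), 1),
        "sentences_per_paragraph": len(sentences) / max(len(paragraphs), 1),
    }

if __name__ == "__main__":
    sample = ("Communicating meaning is the primary function of writing. "
              "To a computer, however, meaning is invisible and irrelevant.")
    print(proxy_features(sample))
```

Note that nothing in these features touches meaning: a text of well-formed gibberish with long words and long sentences scores exactly like a cogent argument.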
I have argued against AES in “Length, Score, Time, & Construct Validity in Holistically Graded Writing Assessments: The Case against Automated Essay Scoring (AES),” “Critique of Mark D. Shermis & Ben Hamner, ‘Contrasting State-of-the-Art Automated Scoring of Essays: Analysis’,” “When ‘the state of the art’ is counting words,” and “Grammar checkers do not work.”
Much of my work is summarized in this PowerPoint presentation.
My most substantive and successful work on AES was in Australia, in collaboration with the New South Wales Teachers Federation and other teachers unions. I wrote a report that led the National Education Council to override the Federal Education Minister and halt the use of AES for national tests.
The most powerful and dramatic refutation of the validity of AES has been the BABEL Generator, which I created with three undergraduates from MIT and Harvard. It produces complete gibberish that receives perfect scores from most AES engines.