“I’ve been working on this now for about 25 years, and I feel that … the time is right and it’s really starting to be used now,” says Peter Foltz, a research professor at the University of Colorado, Boulder. He’s also vice president for research for Pearson, the company whose automated scoring program graded some 34 million student essays on state and national high-stakes tests last year. “There will always be people who don’t trust it … but we’re seeing a lot more breakthroughs in areas like content understanding, and AI is now able to do things which they couldn’t do really well before.”
Foltz says computers “learn” what’s considered good writing by analyzing essays graded by humans. Then, the automated programs score essays themselves by scanning for those same features.
“We have artificial intelligence techniques which can judge anywhere from 50 to 100 features,” Foltz says. That includes not only basics like spelling and grammar, but also whether a student is on topic, the coherence or the flow of an argument, and the complexity of word choice and sentence structure. “We’ve done a number of studies to show that the scoring can be highly accurate,” he says.
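As a rough illustration of the idea Foltz describes, learning from human-graded essays and then scoring new ones by the same features, here is a minimal sketch. It is not Pearson's system: the three features, the tiny `GRADED_ESSAYS` sample, and the nearest-neighbor rule are all assumptions chosen to keep the example short, whereas real scoring engines use far more features and far more training data.

```python
import math
import re


def extract_features(essay):
    """Turn an essay into three toy features (an assumption, not Pearson's list):
    average sentence length, average word length, and vocabulary variety."""
    words = re.findall(r"[a-z']+", essay.lower())
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    if not words or not sentences:
        return [0.0, 0.0, 0.0]
    return [
        len(words) / len(sentences),             # words per sentence
        sum(len(w) for w in words) / len(words),  # letters per word
        len(set(words)) / len(words),             # share of distinct words
    ]


# Hypothetical essays with hypothetical human scores on a four-point scale.
GRADED_ESSAYS = [
    ("The city should build more parks because green space improves "
     "public health and gives children safe places to play.", 4),
    ("Parks good. I like parks. Parks good for kids.", 2),
]


def predict_score(essay, graded_essays):
    """Return the human score of the graded essay whose features are closest."""
    target = extract_features(essay)
    _, best_score = min(
        graded_essays,
        key=lambda pair: math.dist(target, extract_features(pair[0])),
    )
    return best_score


if __name__ == "__main__":
    print(predict_score("Dogs fun. I like dogs. Dogs run fast.", GRADED_ESSAYS))
```

Short, repetitive sentences land closest to the low-scoring sample, while a long, varied sentence lands closest to the high-scoring one. Because the features are not rescaled, sentence length dominates the comparison here, which is one of many simplifications in this sketch.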
To demonstrate, he takes a not-so-stellar sample essay, rife with spelling mistakes and sentence fragments, and runs it by the robo-grader, which instantly spits back a not-so-stellar score.
“It gives an overall score of two out of four,” Foltz explains. The computer also breaks the result down into several sub-scores, showing, for example, a one on spelling and grammar and a two on task and focus.