Richards, Mike; Waugh, Kevin; Slaymaker, Mark; Petre, Marian; Woodthorpe, John and Gooch, Daniel (2024). Bob or Bot? Exploring ChatGPT's Answers to University Computer Science Assessment. ACM Transactions on Computing Education.
DOI: https://doi.org/10.1145/3633287
Abstract
Cheating has been a long-standing issue in university assessments. However, the release of ChatGPT and other free-to-use generative AI tools has provided a new and distinct method of cheating: students can run many assessment questions through such a tool and generate a superficially compelling answer, which may or may not be accurate. We ran a dual-anonymous “quality assurance” marking exercise on four end-of-module assessments spanning a distance university's CS curriculum. Each marker received five ChatGPT-generated scripts alongside 10 student scripts. A total of 90 scripts were marked; every ChatGPT-generated script for the undergraduate modules received at least a passing grade (>40%), with all of the introductory module CS1 scripts receiving a distinction (>85%). None of the ChatGPT-generated scripts for the taught postgraduate module received a passing grade (>50%). We also present the results of interviewing the markers and of running our sample scripts through a GPT-2 detector and the Turnitin AI detector; both identified every ChatGPT-generated script but differed in the number of false positives. As such, we contribute a baseline understanding of how the public release of generative AI is likely to significantly impact quality assurance processes. Our analysis demonstrates that, in most cases, across a range of question formats, topics, and study levels, ChatGPT is at least capable of producing adequate answers to undergraduate assessments.