Bob or Bot: Exploring ChatGPT’s answers to University Computer Science Assessment

Richards, Mike; Waugh, Kevin; Slaymaker, Mark; Petre, Marian; Woodthorpe, John and Gooch, Daniel (2024). Bob or Bot: Exploring ChatGPT’s answers to University Computer Science Assessment. ACM Transactions on Computing Education, 24(1) pp. 1–32.

DOI: https://doi.org/10.1145/3633287

Abstract

Cheating has been a long-standing issue in university assessments. However, the release of ChatGPT and other free-to-use generative AI tools has provided a new and distinct method for cheating. Students can run many assessment questions through such a tool and generate a superficially compelling answer, which may or may not be accurate. We ran a dual-anonymous “quality assurance” marking exercise across four end-of-module assessments from across a distance university's CS curriculum. Each marker received five ChatGPT-generated scripts alongside 10 student scripts. A total of 90 scripts were marked; every ChatGPT-generated script for the undergraduate modules received at least a passing grade (>40%), with all of the introductory module CS1 scripts receiving a distinction (>85%). None of the ChatGPT-generated scripts for the taught postgraduate module received a passing grade (>50%). We also present the results of interviewing the markers, and of running our sample scripts through a GPT-2 detector and the TurnItIn AI detector, both of which identified every ChatGPT-generated script but differed in the number of false positives. As such, we contribute a baseline understanding of how the public release of generative AI is likely to significantly impact quality assurance processes. Our analysis demonstrates that, in most cases, across a range of question formats, topics and study levels, ChatGPT is at least capable of producing adequate answers for undergraduate assessment.
