The comparison of confirmation bias between human software testers and ChatGPT
De Alwis, Rasanjana Dhanushkha (2025)
Master's thesis
School of Engineering Science, Computer Science
All rights reserved.
The permanent address of this publication is
https://urn.fi/URN:NBN:fi-fe2025060257151
Abstract
As software systems become increasingly complex, ensuring their reliability through rigorous testing is more critical than ever. Functional test case generation is a key activity in this process, often requiring testers to balance confirming expected behaviours with uncovering potential faults. However, human testers are known to exhibit confirmation bias, a tendency to focus on validating expected outcomes rather than challenging them (Salman et al., 2018). With the growing adoption of large language models (LLMs) in software engineering tasks, it becomes essential to understand whether these models inherit similar biases. This thesis investigates whether an LLM, specifically ChatGPT-4, exhibits confirmation bias and how its behaviour compares with that of human software testers. The human test case data used in this study was sourced from prior controlled experiments conducted by Salman et al. (2018, 2023). The study addresses two main research objectives. First, it examines whether ChatGPT-4 demonstrates confirmatory behaviour. Second, it compares the confirmation bias of ChatGPT-4 with that of human testers. The results show a significant difference between the numbers of consistent and inconsistent test cases generated by ChatGPT-4, indicating confirmatory behaviour in its test design. Additionally, no statistically significant difference in confirmation bias was found between the two groups. These findings suggest that ChatGPT-4 behaves similarly to human testers in test case design and exhibits a comparable level of confirmation bias. This thesis lays the groundwork for future research on how prompt design, richer inputs such as graphical user interfaces, or collaborative human-AI testing environments could help reduce bias and improve testing strategies.