ORCID
- Andy J. Wills: 0000-0003-4803-0367
Abstract
Legal decision-making is expected to meet high standards of consistency and rationality, yet human judgments in this domain are known to be influenced by procedural factors such as evidence order and intermediate evaluations. Recent work has shown that even legal professionals, including judges, are susceptible to such biases when assessing criminal cases. This raises a critical question: do large language models, which are increasingly proposed as decision-support tools in legal contexts, exhibit similar procedural biases, and if so, can these biases be mitigated? To address this question, we tested GPT-4o and GPT-5.2 using a controlled legal judgment task adapted from prior human research. The task involved simplified criminal cases in which we systematically manipulated (i) the order of incriminating and exonerating evidence and (ii) whether an intermediate guilt judgment was required before a final decision. Model responses were directly compared to human judgments from the original study. We additionally examined whether prompt engineering strategies, based on current best-practice recommendations, could reduce the observed biases. GPT-4o exhibited robust order effects and a form of evaluation bias, although the latter differed in structure from the human pattern. GPT-5.2 showed similar but attenuated effects. Across both models, prompt engineering had limited and inconsistent impact, failing to reliably eliminate procedural sensitivity. These findings suggest that even advanced large language models remain vulnerable to normatively irrelevant procedural influences. More broadly, they counsel caution against treating large language models as inherently rational or bias-resistant decision-support systems in high-stakes professional domains such as law.
DOI
10.3390/bs16030437
Publication Date
2026-03-17
Publication Title
Behavioral Sciences
Volume
16
Issue
3
ISSN
2076-328X
Acceptance Date
2026-03-12
Deposit Date
2026-06-10
Funding
E.M.P. and T.F.B. were supported by AFOSR grant FA8655-23-1-7220 to E.M.P.
Keywords
bias mitigation, cognitive biases, GPT, large language models, legal decision-making, order effects, prompt engineering
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
Recommended Citation
Bessis, T., Wills, A., Wojciechowski, B., White, L., & Pothos, E. (2026) 'Legal Decision Biases in GPT: A Comparison with Human Judgment', Behavioral Sciences, 16(3). Available at: https://doi.org/10.3390/bs16030437
