Legal Decision Biases in GPT: A Comparison with Human Judgment

ORCID

Andy J. Wills: 0000-0003-4803-0367

Abstract

Legal decision-making is expected to meet high standards of consistency and rationality, yet human judgments in this domain are known to be influenced by procedural factors such as evidence order and intermediate evaluations. Recent work has shown that even legal professionals, including judges, are susceptible to such biases when assessing criminal cases. This raises a critical question: do large language models, which are increasingly proposed as decision-support tools in legal contexts, exhibit similar procedural biases—and if so, can these biases be mitigated? To address this question, we tested GPT-4o and GPT-5.2 using a controlled legal judgment task adapted from prior human research. The task involved simplified criminal cases in which we systematically manipulated (i) the order of incriminating and exonerating evidence and (ii) whether an intermediate guilt judgment was required before a final decision. Model responses were directly compared to human judgments from the original study. We additionally examined whether prompt engineering strategies, based on current best-practice recommendations, could reduce observed biases. GPT-4o exhibited robust order effects and a form of evaluation bias, although the latter differed in structure from the human pattern. GPT-5.2 showed similar but attenuated effects. Across both models, prompt engineering had limited and inconsistent impact, failing to reliably eliminate procedural sensitivity. These findings suggest that even advanced large language models remain vulnerable to normatively irrelevant procedural influences. More broadly, they advise caution in treating large language models as inherently rational or bias-resistant decision-support systems in high-stakes professional domains such as law.

DOI Link

10.3390/bs16030437

Publication Date

2026-03-17

Publication Title

Behavioral Sciences

Volume

16

Issue

3

ISSN

2076-328X

Acceptance Date

2026-03-12

Deposit Date

2026-06-10

Funding

E.M.P. and T.F.B. were supported by AFOSR grant FA8655-23-1-7220 to E.M.P.

Additional Links

https://www.scopus.com/pages/publications/105034091810

Keywords

bias mitigation, cognitive biases, GPT, large language models, legal decision-making, order effects, prompt engineering

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Recommended Citation

Bessis, T., Wills, A., Wojciechowski, B., White, L., & Pothos, E. (2026) 'Legal Decision Biases in GPT: A Comparison with Human Judgment', Behavioral Sciences, 16(3). Available at: 10.3390/bs16030437

School of Psychology
