AI Judges Follow The Law, Human Judges Follow Their Hearts, Study Reveals

Forbes - Mar 20th, 2025

A recent study by University of Chicago Law School researchers Eric A. Posner and Shivam Saran highlights significant differences between AI and human judicial decision-making. The researchers used OpenAI's GPT-4o to replicate an earlier experiment involving 31 U.S. federal judges, simulating international war crimes appeals while varying how sympathetically defendants were portrayed and whether legal precedent favored them. The AI adhered to legal precedent more than 90% of the time, while human judges were swayed by sympathetic portrayals roughly 65% of the time, illustrating human judges' tendency toward legal realism and emotional influence compared with the AI's formalist approach.

This study underscores a longstanding debate in legal philosophy between legal formalism and realism, questioning whether justice should be blind or consider extralegal factors like emotions and social context. Despite attempts to make the AI incorporate emotional factors, it remained focused on precedent. The research raises philosophical questions about the nature of justice and whether AI's consistency or human judges' nuanced understanding better serves justice. As Chief Justice John G. Roberts Jr. noted, human judges may remain essential due to their capacity for compassion, which AI currently lacks.

Story submitted by Fairstory

RATING

7.2
Fair Story
Consider it well-founded

The article provides a well-rounded exploration of the differences between AI and human judicial decision-making, grounded in a recent study by University of Chicago researchers. It effectively communicates the study's findings and their implications for the legal system, raising important questions about the role of AI in justice. The article is timely and relevant, contributing to ongoing debates about technology's impact on society. While it is clear and engaging, it would benefit from additional expert perspectives and more detailed source citations to strengthen credibility and depth. Overall, it presents a thought-provoking analysis likely to interest a wide audience and provoke meaningful discussion.

RATING DETAILS

8
Accuracy

The story accurately reports on the study conducted by Eric A. Posner and Shivam Saran, detailing the comparison between AI and human judicial decision-making. The study's methodology, involving the use of OpenAI's GPT-4o to replicate decisions made by U.S. federal judges, is clearly described. The story correctly identifies the main findings: AI's strict adherence to legal precedent and human judges' susceptibility to sympathy. However, while the story provides a comprehensive overview, it does not offer direct citations or links to the original study or related academic sources, which slightly affects verifiability. The claim about statistical analysis confirming non-random differences is credible but would benefit from more detailed data presentation.

7
Balance

The article presents a balanced view of the debate between legal formalism and realism by highlighting both the AI's adherence to precedent and the human judges' emotional considerations. It acknowledges the potential benefits and drawbacks of each approach without overtly favoring one side. However, the article could enhance balance by including perspectives from legal experts or ethicists who might provide additional insights into the implications of AI in the judiciary. The absence of such voices means that while the article is largely balanced, it misses the depth that could be provided by a wider range of expert opinions.

8
Clarity

The article is well-structured and clearly written, making it accessible to readers without a legal background. The language is straightforward, and the narrative flows logically from the study's background to its findings and implications. The use of examples, such as the hypothetical case of a defendant facing extraordinary circumstances, helps illustrate complex legal concepts. However, the article occasionally uses technical terms, such as "p-value," without sufficient explanation, which might confuse some readers. Overall, clarity is strong, but further simplification of technical jargon would improve comprehension.

6
Source quality

The article relies heavily on the study conducted by University of Chicago Law School researchers, which lends credibility to the findings presented. However, the lack of additional sources or expert commentary limits the depth of the analysis. The article would benefit from referencing peer-reviewed articles, expert interviews, or legal analyses to support its claims further. The reliance on a single study, while credible, restricts the breadth of the source material, potentially impacting the overall reliability and depth of the reporting.

7
Transparency

The article is transparent in explaining the study's methodology and the key findings, providing readers with a clear understanding of how the conclusions were reached. However, it lacks detailed information about the potential limitations of the study or any conflicts of interest that might exist. Greater transparency about the researchers' affiliations and the study's funding sources would enhance the article's credibility. Additionally, the article could improve by disclosing any potential biases in interpreting the study's results.

Sources

  1. https://pmc.ncbi.nlm.nih.gov/articles/PMC11781698/
  2. https://news.harvard.edu/gazette/story/2024/06/does-ai-help-humans-make-better-decisions-artificial-intelligence-law/
  3. https://clsbluesky.law.columbia.edu/2025/02/19/the-role-of-ai-in-judicial-decision-making/
  4. https://publicpolicy.ie/papers/relying-on-ai-in-judicial-decision-making-justice-or-jeopardy/
  5. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5098708