Cards Against LLMs: Benchmarking Humor Alignment in Large Language Models
arXiv:2604.08757v1 (cross-listed announcement)
Abstract
Humor is one of the most culturally embedded and socially significant dimensions of human communication, yet it remains largely unexplored in Large Language Model (LLM) alignment research. In this study, five frontier language models play the same Cards Against Humanity (CAH) games as human players, selecting the funniest response from a slate of ten candidate cards across 9,894 rounds. While all models exceed the random baseline, their alignment with human preference remains modest. More striking, the models agree with each other substantially more often than they agree with humans. We show that this inter-model agreement is partly explained by systematic position biases and content preferences, raising the question of whether LLM humor judgment reflects genuine preference or structural artifacts of inference and alignment.
Introduction
Humor has drawn growing attention in artificial intelligence research as a culturally embedded and socially significant part of human communication. This study evaluates how well Large Language Models (LLMs) align with human humor preferences, using the party game Cards Against Humanity (CAH) as a benchmark.
Methodology
To investigate humor alignment, five frontier LLMs played the same CAH games as human players. In each of 9,894 rounds, a model chose the funniest response from a slate of ten candidate cards, so its choice could be compared directly with the human players' preference in that round.
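The per-round comparison can be sketched as follows. This is a minimal sketch, not the authors' code: the `Round` record, its field names, and the toy data are hypothetical, and it assumes each round stores the index of the card the human players preferred. With ten candidates, a uniform random picker matches the human choice with probability 1/10, which is what the random baseline measures.

```python
from dataclasses import dataclass, field


@dataclass
class Round:
    """One CAH round (hypothetical schema): prompt, candidate slate,
    the human-preferred card, and each model's pick (indices into the slate)."""
    prompt: str
    candidates: list[str]
    human_pick: int
    model_picks: dict[str, int] = field(default_factory=dict)


def human_alignment(rounds: list[Round], model: str) -> float:
    """Fraction of rounds in which `model` picked the human-preferred card."""
    hits = sum(r.model_picks[model] == r.human_pick for r in rounds)
    return hits / len(rounds)


def random_baseline(rounds: list[Round]) -> float:
    """Expected hit rate of a uniform random picker: mean of 1 / slate size."""
    return sum(1 / len(r.candidates) for r in rounds) / len(rounds)


# Toy data (hypothetical): two rounds with ten-card slates.
slate = [f"card{i}" for i in range(10)]
toy = [
    Round("prompt 1", slate, human_pick=3, model_picks={"A": 3, "B": 0}),
    Round("prompt 2", slate, human_pick=7, model_picks={"A": 1, "B": 1}),
]
print(human_alignment(toy, "A"))  # 0.5
print(random_baseline(toy))  # 0.1
```

A model is "above baseline" when its alignment exceeds `random_baseline`; with fixed ten-card slates that threshold is simply 0.1.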
Findings
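- All models exceeded the random baseline of 10%, the expected hit rate when picking uniformly from ten cards.
- However, alignment with human preferences remained modest.
- Notably, the models agreed with each other substantially more often than they agreed with human players.
- This inter-model agreement is partly explained by systematic position biases and content preferences.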
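The comparison between model-model and model-human agreement can be expressed as pairwise agreement rates. The sketch below is illustrative only: `agreement` and `agreement_report` are hypothetical helpers, and picks are assumed to be stored as one list of card indices per model, aligned round by round with the human picks.

```python
from itertools import combinations


def agreement(a: list[int], b: list[int]) -> float:
    """Fraction of rounds in which two pickers chose the same card index."""
    if len(a) != len(b):
        raise ValueError("pick lists must cover the same rounds")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def agreement_report(picks: dict[str, list[int]], human: list[int]) -> tuple[float, float]:
    """Return (mean model-model agreement, mean model-human agreement)."""
    pairs = list(combinations(sorted(picks), 2))
    model_model = sum(agreement(picks[m], picks[n]) for m, n in pairs) / len(pairs)
    model_human = sum(agreement(p, human) for p in picks.values()) / len(picks)
    return model_model, model_human


# Toy data (hypothetical): three models and the human picks over four rounds.
picks = {"A": [0, 0, 2, 5], "B": [0, 0, 2, 1], "C": [0, 4, 2, 5]}
human = [0, 3, 9, 5]
mm, mh = agreement_report(picks, human)
print(round(mm, 3), round(mh, 3))  # 0.667 0.417
```

Raw agreement does not correct for chance, and a bias shared across models (for example, favoring the same slate position) inflates model-model agreement directly; a chance-corrected statistic such as Cohen's kappa would be a natural extension.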
Discussion
These findings raise important questions about how LLMs judge humor. The high inter-model agreement suggests that these systems share underlying biases or preferences that do not necessarily reflect human tastes. Although LLMs select human-preferred cards more often than random chance, their humor judgment may be driven by systematic biases, such as preferences for particular slate positions or content, rather than by genuine preference, reflecting structural artifacts of inference and alignment.
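One simple way to probe position bias is to count how often each slate position is chosen and test those counts against a uniform split with a Pearson chi-square statistic. This is an assumed method for illustration, not necessarily the analysis used in the study. It presumes candidate order varies across rounds so that no position is favored by construction, the helper names are hypothetical, and the hard-coded critical value applies only to ten-card slates (9 degrees of freedom, alpha = 0.05).

```python
from collections import Counter

# Chi-square critical value for 9 degrees of freedom (10 slots - 1) at
# alpha = 0.05, from standard tables; valid only for ten-card slates.
CHI2_CRIT_DF9_P05 = 16.919


def position_counts(picks: list[int], slate_size: int = 10) -> list[int]:
    """Count how often each slate position 0..slate_size-1 was chosen."""
    if any(not 0 <= p < slate_size for p in picks):
        raise ValueError("pick index outside the slate")
    counts = Counter(picks)
    return [counts[i] for i in range(slate_size)]


def chi_square_uniform(counts: list[int]) -> float:
    """Pearson chi-square statistic of observed counts vs. a uniform split."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


# Toy data (hypothetical): 100 picks, heavily skewed toward the first slot.
picks = [0] * 28 + [i for i in range(1, 10) for _ in range(8)]
stat = chi_square_uniform(position_counts(picks))
print(round(stat, 2), stat > CHI2_CRIT_DF9_P05)  # 36.0 True
```

If human winners are themselves unevenly spread across positions, comparing a model's position counts against the human distribution, rather than a uniform one, would separate model bias from properties of the data.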
Implications
This research has implications for how AI systems are developed and deployed in social interactions. Understanding how LLMs judge humor can inform their design for contexts ranging from entertainment to customer service. It also highlights the need for alignment strategies that better capture the nuances of human humor.
Conclusion
The study benchmarks humor alignment in LLMs using a structured game format. As AI continues to evolve, further research in this area will be important for developing models that understand not only language but also the subtleties of human interaction. The results motivate further work on how artificial intelligence can better reflect human humor preferences.
