Cultural Differences in Moral Judgments via Large Language Models


Exploring Cultural Variations in Moral Judgments with Large Language Models

Source: arXiv:2506.12433v2 (announce type: cross)

Abstract

Large Language Models (LLMs) have shown strong performance across many tasks, but their ability to capture culturally diverse moral values remains unclear. In this paper, we examine whether LLMs mirror variations in moral attitudes reported by the World Values Survey (WVS) and the Pew Research Center’s Global Attitudes Survey (PEW). We compare smaller monolingual and multilingual models (GPT-2, OPT, BLOOMZ, and Qwen) with recent instruction-tuned models (GPT-4o, GPT-4o-mini, Gemma-2-9b-it, and Llama-3.3-70B-Instruct).

Methodology

Using log-probability-based moral justifiability scores, we correlate each model’s outputs with survey data covering a broad set of ethical topics. The goal is to measure how closely these models reflect human moral judgments across different cultural contexts.
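
As a rough illustration of this kind of scoring, the sketch below treats a justifiability score as the difference between the log-probability a model assigns to a "justifiable" completion and to a "not justifiable" completion, then correlates those scores with survey values using Pearson's r. The exact prompts and scoring formula are not given in this summary, so the function names, the score definition, the topic labels, and every number below are assumptions made up for illustration, not the paper's method or results.

```python
import math


def justifiability_score(logp_justifiable, logp_not_justifiable):
    """Assumed score: log P('justifiable') minus log P('not justifiable').

    Positive values mean the model leans towards 'justifiable'.
    """
    return logp_justifiable - logp_not_justifiable


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (norm_x * norm_y)


# Placeholder log-probabilities for four hypothetical topics:
# (log P of 'justifiable' completion, log P of 'not justifiable' completion).
toy_logprobs = {
    "topic_a": (-2.0, -1.0),
    "topic_b": (-1.5, -1.5),
    "topic_c": (-1.0, -2.0),
    "topic_d": (-0.5, -2.5),
}

# Placeholder survey values, assumed here to be rescaled to [-1, 1].
toy_survey = {"topic_a": -0.8, "topic_b": -0.1, "topic_c": 0.3, "topic_d": 0.9}

topics = sorted(toy_logprobs)
model_scores = [justifiability_score(*toy_logprobs[t]) for t in topics]
survey_scores = [toy_survey[t] for t in topics]
r = pearson(model_scores, survey_scores)
print(f"Pearson r on toy data: {r:.3f}")
```

In a real run, the log-probabilities would come from the model under test for each country and topic, and the correlation would be computed against the matching WVS or PEW values.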

Key Findings

  • Many earlier or smaller models produce near-zero or negative correlations with human judgments.
  • In contrast, advanced instruction-tuned models achieve substantially higher positive correlations, indicating a better reflection of real-world moral attitudes.
  • A detailed regional analysis reveals that models align better with Western, Educated, Industrialized, Rich, and Democratic (W.E.I.R.D.) nations than with other regions.
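
A regional comparison like the one in the last finding can be sketched by averaging per-country correlations within each group. The grouping labels, country names, function name, and correlation values below are placeholders invented for illustration; they are not figures from the paper, and the paper's actual aggregation may differ.

```python
from collections import defaultdict
from statistics import mean


def mean_correlation_by_group(country_corr, country_group):
    """Average per-country correlations within each group label."""
    grouped = defaultdict(list)
    for country, corr in country_corr.items():
        grouped[country_group[country]].append(corr)
    return {group: mean(values) for group, values in grouped.items()}


# Placeholder per-country correlations (hypothetical, not reported results).
toy_country_corr = {
    "country_a": 0.6,
    "country_b": 0.4,
    "country_c": 0.2,
    "country_d": 0.0,
}

# Placeholder assignment of countries to groups (hypothetical).
toy_country_group = {
    "country_a": "W.E.I.R.D.",
    "country_b": "W.E.I.R.D.",
    "country_c": "other",
    "country_d": "other",
}

group_means = mean_correlation_by_group(toy_country_corr, toy_country_group)
for group, value in sorted(group_means.items()):
    print(f"{group}: mean correlation {value:.2f}")
```

A gap between the group means is what would indicate uneven alignment across regions.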

Discussion

While scaling model size and employing instruction tuning improve alignment with cross-cultural moral norms, challenges remain for certain topics and regions. This disparity raises important questions about training data diversity, potential biases, and the implications of these models for information retrieval.

Implications for Future Research

Our findings suggest several areas for future research and development:

  • Bias Analysis: Further investigation into the biases inherent in LLMs is necessary to ensure that they do not perpetuate harmful stereotypes or cultural insensitivity.
  • Training Data Diversity: Increasing the diversity of training datasets can enhance the models’ ability to understand and reflect varied cultural moral values.
  • Improving Cultural Sensitivity: Strategies are needed to make LLMs applicable and useful across different cultural contexts.

Conclusion

In summary, Large Language Models exhibit varying degrees of alignment with cultural moral norms and align best with W.E.I.R.D. nations. Our research underscores the importance of ongoing efforts to build models that are more attuned to global moral perspectives, fostering better understanding and interaction across cultures.


Lazarus Omolua, https://richlyai.com/blog