Mitigating Propaganda in LLMs: Rhetoric Generation Explained

When Agents Persuade: Rhetoric Generation and Mitigation in LLMs

Source: arXiv:2603.04636v2

Abstract: Despite their wide-ranging benefits, LLM-based agents deployed in open environments can be exploited to produce manipulative material. In this study, we task LLMs with propaganda objectives and analyze their outputs using two domain-specific models: one that classifies text as propaganda or non-propaganda, and another that detects rhetorical techniques of propaganda (e.g., loaded language, appeals to fear, flag-waving, name-calling). Our findings show that, when prompted, LLMs exhibit propagandistic behaviors and use a variety of rhetorical techniques in doing so. We also explore mitigation via Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and ORPO (Odds Ratio Preference Optimization). We find that fine-tuning significantly reduces their tendency to generate such content, with ORPO proving most effective.

Introduction

In recent years, large language models (LLMs) have gained prominence in applications ranging from customer service to content creation. However, deploying them in open environments raises concerns that they can be misused to generate manipulative content. This article examines how readily LLMs produce propaganda when prompted and how fine-tuning can mitigate that behavior.

Understanding Propaganda in LLMs

Propaganda is communication aimed at influencing a community’s attitude toward a cause or position. Because LLMs generate fluent text on demand, they can be used to craft persuasive narratives. In our study, we tasked LLMs with propaganda objectives to measure the extent to which they produce propagandistic content.

Methodology

To explore the propagandistic capabilities of LLMs, we employed two specialized models:

  • Propaganda Classifier: This model distinguishes between propaganda and non-propaganda text.
  • Rhetorical Technique Detector: This model identifies various rhetorical strategies such as:
    • Loaded language
    • Appeals to fear
    • Flag-waving
    • Name-calling
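
The two-model analysis described above can be sketched as a small pipeline: each generated text is first classified as propaganda or not, and flagged texts are then scanned for rhetorical techniques. This is only an illustration under stated assumptions. The keyword rules, the cue phrases, and the function names (`detect_techniques`, `is_propaganda`, `analyze`) are hypothetical placeholders and are not the trained models used in the study.

```python
from collections import Counter

# Hypothetical cue phrases standing in for a trained technique detector.
# A real setup would call a fine-tuned classifier instead of keyword rules.
TECHNIQUE_CUES = {
    "loaded language": ["disgraceful", "heroic"],
    "appeals to fear": ["or else", "will destroy"],
    "flag-waving": ["our nation", "true patriots"],
    "name-calling": ["traitor", "crooks"],
}


def detect_techniques(text: str) -> list[str]:
    """Return the techniques whose cue phrases appear in the text (toy rule)."""
    lowered = text.lower()
    return [
        name
        for name, cues in TECHNIQUE_CUES.items()
        if any(cue in lowered for cue in cues)
    ]


def is_propaganda(text: str) -> bool:
    """Toy binary classifier: flag a text if any technique cue is present."""
    return bool(detect_techniques(text))


def analyze(outputs, classify=is_propaganda, detect=detect_techniques):
    """Return (share of outputs flagged as propaganda, technique counts)."""
    outputs = list(outputs)
    flagged = [text for text in outputs if classify(text)]
    counts = Counter(tech for text in flagged for tech in detect(text))
    rate = len(flagged) / len(outputs) if outputs else 0.0
    return rate, counts


# Made-up sample outputs, for illustration only.
samples = [
    "True patriots stand with our nation against these traitors.",
    "The library opens at nine on weekdays.",
]
rate, counts = analyze(samples)
print(f"propaganda rate: {rate:.2f}")
print(dict(counts))
```

Passing the classifier and detector as arguments keeps the aggregation logic separate from the models, so the toy rules can be swapped for real classifiers without changing `analyze`.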

Findings

Our research showed that, when prompted, LLMs produce content that exhibits propagandistic behavior and draws on a variety of rhetorical techniques. This shows how these models can be exploited to sway public opinion, a significant concern in the context of misinformation and social influence.

Mitigation Strategies

To address the potential for LLMs to generate manipulative content, we explored several mitigation strategies:

  • Supervised Fine-Tuning (SFT): This method fine-tunes the model on labeled examples of desired responses, lowering the likelihood that it generates propaganda.
  • Direct Preference Optimization (DPO): This technique trains the model directly on pairs of preferred and rejected responses, without a separate reward model, so that it favors the preferred response over the propagandistic one.
  • Odds Ratio Preference Optimization (ORPO): This technique adds an odds-ratio penalty on rejected responses to the standard fine-tuning loss, so it needs no separate reference model. Our findings indicated that ORPO was the most effective strategy, significantly decreasing the model’s propensity to generate such content.
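
To make the difference between the two preference objectives concrete, here is a minimal sketch of the per-example DPO and ORPO losses as they are defined in the original method papers, written in plain Python over precomputed log-probabilities. The function names, the `beta` and `lam` defaults, and the example numbers are illustrative assumptions. The source does not state the training configuration used in the study.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are summed sequence log-probabilities under the trained policy
    and under a frozen reference model. beta=0.1 is an assumed default.
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return -log_sigmoid(beta * margin)


def log_odds(avg_logp: float) -> float:
    """log(p / (1 - p)) with p = exp(avg_logp); requires avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(chosen_avg_logp, rejected_avg_logp, lam=0.1):
    """ORPO loss for one preference pair.

    Inputs are length-normalized (per-token average) log-probabilities
    under the trained model only; no reference model is involved.
    lam=0.1 is an assumed weight for the odds-ratio term.
    """
    sft_term = -chosen_avg_logp  # negative log-likelihood of the chosen response
    ratio = log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp)
    return sft_term + lam * -log_sigmoid(ratio)


# Made-up log-probabilities, for illustration only.
baseline = dpo_loss(-12.0, -15.0, -12.0, -15.0)   # policy equals reference
improved = dpo_loss(-10.0, -18.0, -12.0, -15.0)   # policy prefers the chosen response more
orpo_tied = orpo_loss(math.log(0.5), math.log(0.5))
orpo_better = orpo_loss(math.log(0.6), math.log(0.2))
print(baseline, improved, orpo_tied, orpo_better)
```

In both cases the loss falls as the model assigns relatively more probability to the preferred response. The structural difference is that DPO measures this relative to a frozen reference model, while ORPO folds the preference term into the ordinary fine-tuning loss.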

Conclusion

The study highlights the double-edged nature of LLMs: while they offer significant advantages in text generation, they can also be misused for propaganda. In our experiments, fine-tuning significantly reduced the models’ tendency to generate such content, with ORPO proving most effective. As we continue to explore the capabilities of LLMs, ongoing research is needed to balance innovation with ethical considerations.


Lazarus Omolua
https://richlyai.com/blog
