Soft-Label Governance for Safer Multi-Agent AI Systems

Soft-Label Governance for Distributional Safety in Multi-Agent Systems

Summary: arXiv:2604.19752v1 Announce Type: cross

Abstract: Multi-agent AI systems exhibit emergent risks that no single agent produces in isolation. Existing safety frameworks rely on binary classifications of agent behavior, discarding the uncertainty inherent in proxy-based evaluation. We introduce SWARM (System-Wide Assessment of Risk in Multi-agent systems), a simulation framework that replaces binary good/bad labels with soft probabilistic labels p = P(v = +1) ∈ [0, 1], enabling continuous-valued payoff computation, toxicity measurement, and governance intervention.

SWARM implements a modular governance engine with configurable levers such as:

Transaction taxes
Circuit breakers
Reputation decay
Random audits
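
As a rough illustration of how the four levers listed above might act on a single interaction, here is a minimal Python sketch. It is not SWARM's actual API: the names `GovernanceConfig`, `AgentState`, and `apply_governance`, the default parameter values, and the specific update rules (a flat tax on positive payoffs, an exponential moving average for reputation, a breaker that trips on mean toxicity) are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class GovernanceConfig:
    # All values are hypothetical defaults, not taken from the paper.
    tax_rate: float = 0.05           # fraction of positive payoff taken as tax
    breaker_threshold: float = 0.5   # freeze agent if mean toxicity exceeds this
    reputation_decay: float = 0.9    # weight on past reputation
    audit_probability: float = 0.1   # chance that an interaction is audited


@dataclass
class AgentState:
    reputation: float = 1.0
    toxicity_sum: float = 0.0
    interactions: int = 0
    frozen: bool = False


def apply_governance(state, p, payoff, cfg, rng):
    """Apply the levers to one interaction with soft label p.

    Returns (net_payoff, audited). A frozen agent earns nothing.
    """
    if state.frozen:
        return 0.0, False

    # Track running toxicity, using 1 - p as the per-interaction toxicity.
    state.interactions += 1
    state.toxicity_sum += 1.0 - p

    # Reputation decay: old reputation fades, the new soft label blends in.
    state.reputation = (cfg.reputation_decay * state.reputation
                        + (1.0 - cfg.reputation_decay) * p)

    # Transaction tax: levied only on positive payoffs.
    net_payoff = payoff - cfg.tax_rate * max(payoff, 0.0)

    # Random audit: flag the interaction with a fixed probability.
    audited = rng.random() < cfg.audit_probability

    # Circuit breaker: freeze the agent once mean toxicity is too high.
    if state.toxicity_sum / state.interactions > cfg.breaker_threshold:
        state.frozen = True

    return net_payoff, audited


cfg = GovernanceConfig()
state = AgentState()
rng = random.Random(0)
net, audited = apply_governance(state, p=0.2, payoff=10.0, cfg=cfg, rng=rng)
net_after_freeze, _ = apply_governance(state, p=0.9, payoff=10.0, cfg=cfg, rng=rng)
```

In this toy run the first interaction has toxicity 0.8, which exceeds the breaker threshold, so the agent is frozen and the second interaction pays nothing. It shows in miniature how an overly tight threshold can remove value from the system.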

The framework quantifies the effects of these governance mechanisms through probabilistic metrics including:

Expected toxicity: E[1 − p | accepted]
Quality gap: E[p | accepted] − E[p | rejected]
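
The two metrics above can be estimated directly from a batch of soft labels and accept/reject decisions. The following is a minimal sketch, assuming sample means as estimators of the conditional expectations. The function names and the toy data are hypothetical and are not taken from the SWARM codebase.

```python
from statistics import mean


def expected_toxicity(p_values, accepted):
    """Estimate E[1 - p | accepted] as a sample mean over accepted items."""
    acc = [p for p, a in zip(p_values, accepted) if a]
    if not acc:
        raise ValueError("no accepted interactions")
    return mean(1.0 - p for p in acc)


def quality_gap(p_values, accepted):
    """Estimate E[p | accepted] - E[p | rejected] from sample means."""
    acc = [p for p, a in zip(p_values, accepted) if a]
    rej = [p for p, a in zip(p_values, accepted) if not a]
    if not acc or not rej:
        raise ValueError("need at least one accepted and one rejected item")
    return mean(acc) - mean(rej)


# Toy data (illustrative only): soft labels and acceptance decisions.
p_values = [0.9, 0.8, 0.3, 0.6, 0.1]
accepted = [True, True, False, True, False]

toxicity = expected_toxicity(p_values, accepted)
gap = quality_gap(p_values, accepted)
```

A positive quality gap means that accepted interactions carry higher soft labels than rejected ones on average. A gap near zero or below would suggest the acceptance filter is not separating good interactions from bad.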

Experiments across seven scenarios, each replicated over five seeds, show that strict governance reduces welfare by over 40% without improving safety. Similarly, aggressively internalizing system externalities collapses total welfare from a baseline of +262 to -67 while toxicity remains unchanged.

Circuit breakers require careful calibration: overly restrictive thresholds severely diminish system value, while a well-chosen threshold balances moderate welfare against minimized toxicity. Companion experiments show that soft metrics detect proxy gaming by self-optimizing agents that can pass conventional binary evaluations.
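
To see why a binary evaluation can miss proxy gaming that a soft metric catches, consider the following toy sketch. The 0.5 cutoff, the function names, and the two agents' soft labels are invented for illustration and are not results from the paper.

```python
def binary_pass_rate(p_values, cutoff=0.5):
    """Fraction of interactions labeled 'good' under a hard cutoff."""
    return sum(p >= cutoff for p in p_values) / len(p_values)


def soft_toxicity(p_values):
    """Mean of 1 - p, which keeps the uncertainty the hard cutoff discards."""
    return sum(1.0 - p for p in p_values) / len(p_values)


# Hypothetical soft labels: an honest agent versus one that has learned
# to sit just above the binary cutoff.
honest = [0.95, 0.90, 0.92, 0.88]
gamer = [0.55, 0.52, 0.51, 0.54]

honest_pass = binary_pass_rate(honest)
gamer_pass = binary_pass_rate(gamer)
honest_tox = soft_toxicity(honest)
gamer_tox = soft_toxicity(gamer)
```

Both agents pass every binary check, yet the gaming agent's mean toxicity is several times the honest agent's. The continuous metric preserves that gap.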

This governance layer applies to live LLM-backed agents, including Concordia entities, Claude, and GPT-4o Mini, without modification. The results indicate that distributional safety requires continuous risk metrics and that governance levers must be calibrated against quantifiable safety-welfare tradeoffs.

Source code and project resources are publicly available at https://www.swarm-ai.org/.
