Stress-Test Benchmark for Multi-Subject Identity Preservation

When Identities Collapse: A Stress-Test Benchmark for Multi-Subject Personalization

Source: arXiv:2603.26078v1 (announce type: cross)

Abstract: Subject-driven text-to-image diffusion models have achieved remarkable success in preserving single identities, yet their ability to compose multiple interacting subjects remains largely unexplored and highly challenging. Existing evaluation protocols typically rely on global CLIP metrics, which are insensitive to local identity collapse and fail to capture the severity of multi-subject entanglement.

In this paper, we identify a pervasive “Illusion of Scalability” in current models: while they excel at synthesizing 2-4 subjects in simple layouts, they suffer from catastrophic identity collapse when scaled to 6-10 subjects or tasked with complex physical interactions. To systematically expose this failure mode, we construct a rigorous stress-test benchmark comprising 75 prompts distributed across varying subject counts and interaction difficulties.

Key Findings

Our study reports three main findings about existing models:

  • Identity Fidelity: Models' ability to keep each subject's identity distinct degrades sharply as the number of subjects increases.
  • CLIP Metric Limitations: Standard global CLIP-based metrics are inadequate for evaluating multi-subject scenes; they can score an image highly even when individual subjects have lost their distinct identities.
  • Introduction of SCR: We propose the Subject Collapse Rate (SCR), a new metric that quantifies identity collapse by penalizing local attention leakage and homogenization. Higher SCR means more collapse.
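
The summary above does not give the formula for SCR, so the following is only a minimal sketch of a collapse-rate style metric under stated assumptions, not the paper's definition. It assumes a hypothetical similarity matrix sim[i][j] between the generated crop of subject i and the reference image of identity j, and it counts subject i as collapsed when its own similarity falls below an assumed threshold or when another reference matches it at least as well. The second function is a crude stand-in for a global score, included only to show how an aggregate number can stay high while per-subject identities are lost.

```python
from typing import Sequence


def subject_collapse_rate(
    sim: Sequence[Sequence[float]], threshold: float = 0.5
) -> float:
    """Fraction of subjects judged collapsed (illustrative, NOT the paper's SCR).

    sim[i][j] is an assumed similarity between the generated crop of
    subject i and the reference image of identity j (square matrix).
    The threshold value is an arbitrary assumption.
    """
    n = len(sim)
    if n == 0:
        raise ValueError("need at least one subject")
    collapsed = 0
    for i, row in enumerate(sim):
        if len(row) != n:
            raise ValueError("similarity matrix must be square")
        own = row[i]
        # Best match among the *other* identities; -inf when there are none.
        best_other = max(
            (s for j, s in enumerate(row) if j != i), default=float("-inf")
        )
        # Collapsed if the subject no longer resembles its own reference,
        # or if it resembles another identity at least as much.
        if own < threshold or best_other >= own:
            collapsed += 1
    return collapsed / n


def global_score(sim: Sequence[Sequence[float]]) -> float:
    """Crude global proxy: mean of each row's best similarity to ANY identity."""
    return sum(max(row) for row in sim) / len(sim)
```

For example, with the made-up matrix [[0.9, 0.3, 0.3], [0.9, 0.3, 0.3], [0.9, 0.3, 0.3]], every generated subject looks like identity 0. The global proxy stays at about 0.9, while the collapse rate is 2/3, because two of the three subjects no longer match their own references.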

Benchmark Construction

The stress-test benchmark we developed includes:

  • 75 Prompts: These prompts are designed to challenge the models with varying subject counts and complexities.
  • Interaction Categories: We categorized the prompts into three interaction difficulties: Neutral, Occlusion, and Interaction.
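
One way to organize such a benchmark is as a grid of subject count by interaction category. The sketch below is illustrative only: the BenchmarkPrompt class, its field names, and the tally_cells helper are hypothetical, and the summary does not state how the 75 prompts are split across cells, so no distribution is assumed here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Tuple

# The three interaction difficulties named in the benchmark description.
CATEGORIES = ("Neutral", "Occlusion", "Interaction")


@dataclass(frozen=True)
class BenchmarkPrompt:
    """One benchmark entry (hypothetical schema, not the paper's format)."""

    text: str
    num_subjects: int
    category: str

    def __post_init__(self) -> None:
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")
        if self.num_subjects < 1:
            raise ValueError("num_subjects must be at least 1")


def tally_cells(prompts: Iterable[BenchmarkPrompt]) -> Counter:
    """Count prompts per (num_subjects, category) cell of the grid."""
    return Counter((p.num_subjects, p.category) for p in prompts)
```

Tallying the cells before running an evaluation makes it easy to check that every subject count is covered in every interaction category, so that a drop in fidelity can be attributed to count or to interaction difficulty rather than to uneven sampling.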

Evaluation of State-of-the-Art Models

Our extensive evaluation of leading models, including MOSAIC, XVerse, and PSR, reveals alarming trends:

  • As scene complexity increases, identity fidelity drops precipitously.
  • At 10 subjects, SCR scores approach 100%, indicating severe identity collapse.
  • This collapse can be traced back to semantic shortcuts used in global attention routing.
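
A trend like the one above is typically read off by aggregating per-image collapse counts by subject count. The helper below is a minimal sketch of that aggregation; the function name and the (num_subjects, num_collapsed) record format are assumptions, and any numbers fed to it in examples are placeholders, not results from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def scr_by_subject_count(records: Iterable[Tuple[int, int]]) -> Dict[int, float]:
    """Aggregate collapse rate per subject count.

    Each record is (num_subjects, num_collapsed) for one generated image,
    a hypothetical format chosen for this sketch.
    """
    total_subjects: Dict[int, int] = defaultdict(int)
    total_collapsed: Dict[int, int] = defaultdict(int)
    for num_subjects, num_collapsed in records:
        if num_subjects < 1 or not 0 <= num_collapsed <= num_subjects:
            raise ValueError(f"invalid record: {(num_subjects, num_collapsed)}")
        total_subjects[num_subjects] += num_subjects
        total_collapsed[num_subjects] += num_collapsed
    # Pool subjects across all images that share the same subject count.
    return {
        n: total_collapsed[n] / total_subjects[n] for n in sorted(total_subjects)
    }
```

Pooling subjects rather than averaging per-image rates weights every subject equally; either choice is defensible, and the summary does not say which one the authors use.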

Conclusion

The findings from this study underscore the urgent need for advancements in generative architectures that prioritize physical disentanglement of subjects. As AI-driven models become increasingly integrated into various applications, the ability to accurately represent multiple identities will be paramount. Our proposed SCR metric offers a crucial step towards more reliable evaluations in multi-subject scenarios, paving the way for future research and development in the field.

In summary, while current text-to-image diffusion models show promise, they are far from achieving satisfactory performance in multi-subject personalization. Addressing these challenges will be essential for enhancing the quality and reliability of AI-generated images in diverse contexts.


Lazarus Omolua (https://richlyai.com/blog)