IMAGAgent: Advanced Multi-Turn Image Editing Framework

IMAGAgent: Orchestrating Multi-Turn Image Editing via Constraint-Aware Planning and Reflection

Source: arXiv:2603.29602v1 (cross-listed)

Abstract: Existing multi-turn image editing paradigms are often confined to isolated single-step execution. Because they lack context-awareness and closed-loop feedback mechanisms, they are prone to error accumulation and semantic drift during multi-turn interactions, ultimately resulting in severe structural distortion of the generated images. To address this, we propose IMAGAgent, a multi-turn image editing agent framework based on a “plan-execute-reflect” closed-loop mechanism that achieves deep synergy among instruction parsing, tool scheduling, and adaptive correction within a unified pipeline.

Introduction

In recent years, demand for advanced image editing has surged, driven by the growth of social media and digital content creation. Traditional image editing tools, however, treat each edit as an isolated step, so errors made in one turn carry into the next. IMAGAgent addresses this by combining instruction planning, tool scheduling, and reflective correction in a single closed-loop system.

Key Features of IMAGAgent

Constraint-Aware Planning: The foundation of IMAGAgent is its planning module, which utilizes a vision-language model (VLM) to decompose complex instructions into manageable sub-tasks. This process is governed by three key principles: target singularity, semantic atomicity, and visual perceptibility.
Tool-Chain Orchestration: IMAGAgent dynamically constructs execution paths based on the current image and historical context. This capability allows for adaptive scheduling and seamless collaboration among various operation models, including image retrieval, segmentation, detection, and editing.
Multi-Expert Collaborative Reflection: A central large language model (LLM) synthesizes critiques from the VLM into holistic feedback on each edit. This feedback loop triggers fine-grained self-correction and records outcomes so that later decisions can draw on them.
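
The three features above form a plan-execute-reflect loop, which can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: every name here (SubTask, EditState, plan, execute, reflect, run, toy_editor) is hypothetical, the planner is plain string splitting standing in for a VLM, the critic is a substring check standing in for the LLM-led reflection, and images are represented as strings.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SubTask:
    target: str      # the single object or region this step edits
    operation: str   # one atomic edit, e.g. "recolor"
    tool: str        # hypothetical tool name chosen for this step


@dataclass
class EditState:
    image: str                                    # stand-in for real image data
    history: List[str] = field(default_factory=list)


Tool = Callable[[str, SubTask], str]


def plan(instruction: str) -> List[SubTask]:
    """Stub planner: splits on ' and ' so each sub-task has one target and one edit.
    A real system would ask a VLM to do this decomposition."""
    steps = []
    for clause in instruction.split(" and "):
        operation, target = clause.strip().split(" the ", 1)
        steps.append(SubTask(target=target, operation=operation, tool="editor"))
    return steps


def execute(state: EditState, task: SubTask, tools: Dict[str, Tool]) -> str:
    """Dispatch the sub-task to the tool selected for it."""
    return tools[task.tool](state.image, task)


def reflect(candidate: str, task: SubTask) -> bool:
    """Stub critic: accept only if the edit is visible in the candidate image."""
    return f"{task.operation}:{task.target}" in candidate


def run(instruction: str, image: str, tools: Dict[str, Tool],
        max_retries: int = 2) -> EditState:
    state = EditState(image=image)
    for task in plan(instruction):
        for _ in range(max_retries + 1):
            candidate = execute(state, task, tools)
            if reflect(candidate, task):
                state.image = candidate           # commit only accepted edits
                state.history.append(f"ok {task.operation} {task.target}")
                break
            # Rejected edits are recorded but never committed to the state.
            state.history.append(f"retry {task.operation} {task.target}")
    return state


def toy_editor(image: str, task: SubTask) -> str:
    """Toy tool: records the edit by appending it to the image string."""
    return f"{image}|{task.operation}:{task.target}"


result = run("recolor the car and remove the sign", "img", {"editor": toy_editor})
print(result.image)
print(result.history)
```

The point of the design is that a rejected candidate never replaces the current image, so a failed step cannot propagate into later turns, while the history keeps a record of both accepted and rejected attempts.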

Experimental Validation

To evaluate IMAGAgent, extensive experiments were conducted on the newly constructed MTEditBench and the MagicBrush dataset. The reported results show that IMAGAgent significantly outperforms existing methods on three measures:

Instruction Consistency: IMAGAgent maintains high fidelity to user instructions across multiple editing iterations.
Editing Precision: The framework achieves remarkable accuracy in executing complex editing tasks.
Overall Quality: The final images produced exhibit superior quality with fewer distortions and artifacts.

Conclusion

IMAGAgent represents a significant advancement in the field of image editing, offering a robust solution to the challenges posed by multi-turn interactions. By integrating planning, execution, and reflection within a single framework, it sets a new standard for image editing tools. The code for IMAGAgent is publicly available, allowing researchers and developers to build upon this innovative framework. For more information, visit the GitHub repository.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
