Code-in-the-Loop Forensics: Agentic Tool Use for Image Forgery Detection
arXiv:2512.16300v2
Abstract: Existing image forgery detection (IFD) methods either exploit low-level, semantics-agnostic artifacts or rely on multimodal large language models (MLLMs) with high-level semantic knowledge. Although naturally complementary, these two information streams are highly heterogeneous in both paradigm and reasoning, making it difficult for existing methods to unify them or effectively model their cross-level interactions. To address this gap, we propose ForenAgent, a multi-round interactive IFD framework that enables MLLMs to autonomously generate, execute, and iteratively refine Python-based low-level tools around the detection objective, thereby achieving more flexible and interpretable forgery analysis.
Introduction
Image forgery detection (IFD) methods fall into two camps that have proven hard to combine. Traditional techniques rely on low-level, semantics-agnostic artifacts, while newer approaches use multimodal large language models (MLLMs) for high-level semantic understanding. The two information streams are complementary, but they differ so much in paradigm and reasoning that existing methods struggle to unify them or to model how they interact.
Proposed Solution: ForenAgent
To bridge this gap, the authors propose ForenAgent, a multi-round interactive framework for image forgery detection. It lets an MLLM autonomously generate, execute, and iteratively refine low-level Python tools around the detection objective, which makes the forgery analysis more flexible and interpretable.
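To make "low-level Python tool" concrete, here is a minimal sketch of the kind of script a model could write for itself: a block-wise noise-consistency check. It is an illustration only, not a tool from ForenAgent. The function names, the 8-pixel block size, and the 3x-median threshold are all assumptions, and the image is a plain list of grayscale rows so that the example needs only the standard library.

```python
from statistics import median


def highpass_residual(img):
    """Absolute difference between each interior pixel and the mean of its
    four neighbours. Border pixels are left at 0.0."""
    h, w = len(img), len(img[0])
    res = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            neigh = (img[y - 1][x] + img[y + 1][x]
                     + img[y][x - 1] + img[y][x + 1]) / 4.0
            res[y][x] = abs(img[y][x] - neigh)
    return res


def block_noise_map(img, block=8):
    """Mean residual per block x block tile, keyed by (top_row, left_col)."""
    res = highpass_residual(img)
    h, w = len(img), len(img[0])
    out = {}
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            vals = [res[y][x]
                    for y in range(by, by + block)
                    for x in range(bx, bx + block)]
            out[(by, bx)] = sum(vals) / len(vals)
    return out


def suspicious_blocks(img, block=8, factor=3.0):
    """Tiles whose noise level exceeds `factor` times the median tile level.
    The threshold rule is an assumption chosen for illustration."""
    noise = block_noise_map(img, block)
    floor = max(median(noise.values()), 1e-6)  # avoid a zero threshold
    return sorted(k for k, v in noise.items() if v > factor * floor)


def make_demo_image(size=32, block=8, tile=(8, 16)):
    """Synthetic test image: a faint checkerboard everywhere, with one tile
    carrying a much stronger checkerboard to mimic a pasted region."""
    ty, tx = tile
    img = []
    for y in range(size):
        row = []
        for x in range(size):
            sign = 1 if (x + y) % 2 == 0 else -1
            in_tile = ty <= y < ty + block and tx <= x < tx + block
            row.append(100 + sign * (20 if in_tile else 1))
        img.append(row)
    return img
```

On the synthetic image from `make_demo_image()`, `suspicious_blocks` returns `[(8, 16)]`, the tile with the stronger noise pattern. A check like this carries no semantic knowledge on its own; in the framework described above, the MLLM would be the component that decides when to run such a tool and how to interpret what it reports.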
Methodology
ForenAgent employs a two-stage training pipeline that includes:
- Cold Start: An initial phase where the model learns basic tool interaction capabilities.
- Reinforcement Fine-Tuning: A subsequent phase aimed at enhancing the model’s reasoning adaptability through structured feedback.
Dynamic Reasoning Loop
Inspired by human cognitive processes, the framework incorporates a dynamic reasoning loop, which consists of:
- Global Perception: Analyzing the broader context of the image.
- Local Focusing: Concentrating on specific areas of interest.
- Iterative Probing: Repeatedly examining the image for anomalies.
- Holistic Adjudication: Making informed decisions based on gathered insights.
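The four stages above can be sketched as a small control loop in which a policy proposes tool code, the code is executed, and the observation is fed back for the next round. This is a minimal sketch under stated assumptions, not ForenAgent's implementation: `scripted_policy` is a hypothetical stand-in for the MLLM, the tool snippets and the threshold of 50 are invented for illustration, and the plain `exec()` call is not a sandbox.

```python
from dataclasses import dataclass


@dataclass
class Step:
    stage: str
    code: str
    observation: object = None


def run_tool(code, image):
    """Run tool code in a fresh namespace and return its `result` variable.
    NOTE: exec() is NOT a sandbox; a real system would isolate execution."""
    namespace = {"image": image}
    try:
        exec(code, namespace)
        return namespace.get("result")
    except Exception as exc:
        return f"error: {exc!r}"


def scripted_policy(history):
    """Hypothetical stand-in for the MLLM. Returns (stage, code), or
    (stage, None) once it is ready to adjudicate."""
    n = len(history)
    if n == 0:  # global perception: overall brightness
        return "global_perception", (
            "flat = [p for row in image for p in row]\n"
            "result = sum(flat) / len(flat)")
    if n == 1:  # local focusing: location of the brightest pixel
        return "local_focusing", (
            "result = max((p, (y, x)) for y, row in enumerate(image)"
            " for x, p in enumerate(row))[1]")
    if n == 2:  # first probe is deliberately broken: y, x, mean are undefined
        return "iterative_probing", "result = image[y][x] - mean"
    if n == 3 and str(history[-1].observation).startswith("error"):
        # refine the failed tool using earlier observations
        loc, mean = history[1].observation, history[0].observation
        return "iterative_probing", f"y, x = {loc}\nresult = image[y][x] - {mean}"
    return "holistic_adjudication", None


def analyse(image, policy, max_rounds=8, threshold=50):
    """Loop until the policy stops, then adjudicate with a fixed rule
    (an assumption; in the framework the model itself would decide)."""
    history = []
    for _ in range(max_rounds):
        stage, code = policy(history)
        if code is None:
            break
        history.append(Step(stage, code, run_tool(code, image)))
    probes = [s.observation for s in history
              if s.stage == "iterative_probing"
              and isinstance(s.observation, (int, float))]
    verdict = "forged" if probes and max(probes) > threshold else "authentic"
    return verdict, history
```

For a 4x4 image of value 100 with a single pixel set to 250, the first probe fails with a `NameError`, the refined probe reports a deviation of 140.625 from the global mean, and the verdict is "forged". The failed-then-fixed probe is included to mirror the "iteratively refine" behaviour the abstract describes; the actual refinement strategy is learned by the model, not scripted.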
Dataset Construction: FABench
ForenAgent’s capabilities were systematically assessed using FABench, a comprehensive dataset designed for agent-forensics. FABench includes:
- 100,000 images representing various forgery types.
- Approximately 200,000 agent-interaction question-answer pairs to facilitate robust training and evaluation.
Results and Implications
Experiments indicate that ForenAgent exhibits emergent tool-use competence and reflective reasoning on challenging image forgery detection tasks. Access to low-level tools strengthens its analysis, which the authors present as a step toward general-purpose image forgery detection.
Conclusion
ForenAgent combines the semantic reasoning of MLLMs with low-level forensic tools that the model writes, runs, and refines itself. The code for the framework is set to be released once the review process is complete.
