DeepReviewer 2.0: A Traceable Agentic System for Auditable Scientific Peer Review
arXiv:2604.09590v1
Abstract
Automated peer review is often framed as generating fluent critique, but reviewers and area chairs need judgments they can audit: where a concern applies, what evidence supports it, and what concrete follow-up is required. DeepReviewer 2.0 is a process-controlled agentic review system built around an output contract. It produces a traceable review package with anchored annotations, localized evidence, and executable follow-up actions, and it exports that package only after meeting minimum traceability and coverage budgets.
Key Features of DeepReviewer 2.0
DeepReviewer 2.0 is organized around five components:
- Manuscript-only Claim-Evidence-Risk Ledger: The system first constructs a ledger that maps claims made in the manuscript to supporting evidence and associated risks.
- Verification Agenda: DeepReviewer 2.0 creates a verification agenda that guides the review process, ensuring that the critiques are focused and relevant.
- Agenda-driven Retrieval: The system performs targeted retrieval of information based on the verification agenda, enhancing the accuracy and relevance of critiques.
- Anchored Critiques: Critiques are generated with anchored references to evidence, making it easier for reviewers to track and audit the review process.
- Export Gate: The system only exports the review package once it meets predefined traceability and coverage standards, ensuring high-quality outputs.
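The interplay between the ledger, anchored critiques, and the export gate can be sketched in a few lines of Python. This is a minimal illustration, not the system's actual implementation: the class names, field names, the `export_gate` function, and both threshold values are assumptions made up for this example, and the retrieval step is left out entirely.

```python
from dataclasses import dataclass


@dataclass
class LedgerEntry:
    """One row of a hypothetical claim-evidence-risk ledger."""
    claim_id: str
    claim: str
    evidence: str  # where the manuscript supports the claim
    risk: str      # what could undermine the claim


@dataclass
class Critique:
    """A review comment tied to a ledger claim (field names are assumptions)."""
    claim_id: str
    text: str
    anchor: str = ""     # manuscript location, e.g. "Table 2"
    evidence: str = ""   # localized evidence backing the concern
    follow_up: str = ""  # concrete action requested from the authors

    def is_traceable(self) -> bool:
        # Traceable here means anchored, evidenced, and actionable.
        return bool(self.anchor and self.evidence and self.follow_up)


def export_gate(ledger, critiques, min_traceable=1.0, min_coverage=0.5):
    """Return True only if both budgets are met (thresholds are illustrative)."""
    if not ledger or not critiques:
        return False
    # Traceability: share of critiques that carry anchor, evidence, follow-up.
    traceable = sum(c.is_traceable() for c in critiques) / len(critiques)
    # Coverage: share of ledger claims addressed by a traceable critique.
    addressed = {c.claim_id for c in critiques if c.is_traceable()}
    covered = len(addressed & {e.claim_id for e in ledger}) / len(ledger)
    return traceable >= min_traceable and covered >= min_coverage


ledger = [
    LedgerEntry("C1", "Method X improves accuracy", "Table 2", "single seed"),
    LedgerEntry("C2", "Method X is faster", "Fig. 3", "hardware not reported"),
]
critiques = [
    Critique("C1", "Gains may be within noise.", anchor="Table 2",
             evidence="Results are reported for a single seed.",
             follow_up="Report mean and std over at least 3 seeds."),
]
print(export_gate(ledger, critiques))  # True: all critiques traceable, half the claims covered

# An unanchored critique lowers the traceable share and blocks export.
critiques.append(Critique("C2", "Speed claim is unclear."))
print(export_gate(ledger, critiques))  # False
```

The point of the gate in this sketch is that a vague comment cannot slip into the exported package: it either gains an anchor, evidence, and a follow-up action, or the package is held back.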
Performance Analysis
In an evaluation on 134 ICLR 2025 submissions under three fixed protocols, an un-finetuned 196B model running DeepReviewer 2.0 outperformed Gemini-3.1-Pro-preview. Key findings include:
- Improved strict major-issue coverage: 37.26% compared to 23.57% for the competing model Gemini-3.1-Pro-preview.
- Outperformed human review committee: DeepReviewer 2.0 won 71.63% of micro-averaged blind comparisons against a human review committee.
- Ranked first among automatic systems within the tested pool.
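For readers unfamiliar with the term, a micro-averaged win rate pools every individual comparison before dividing, so groups with more comparisons weigh more. The sketch below contrasts it with a macro average under that standard definition. The grouping (for example per paper or per protocol) is an assumption, since the summary does not specify it, and the toy counts are invented for illustration and unrelated to the reported results.

```python
def micro_win_rate(results):
    """results: list of (wins, comparisons) pairs, one pair per group."""
    total_wins = sum(wins for wins, _ in results)
    total_comparisons = sum(n for _, n in results)
    return total_wins / total_comparisons


def macro_win_rate(results):
    """Average of per-group win rates; every group weighs the same."""
    return sum(wins / n for wins, n in results) / len(results)


# Invented toy counts: (wins, comparisons) for three hypothetical groups.
toy = [(9, 10), (1, 2), (20, 40)]

print(round(micro_win_rate(toy), 4))  # 30 wins / 52 comparisons
print(round(macro_win_rate(toy), 4))  # mean of 0.9, 0.5, 0.5
```

The two averages differ whenever groups contribute unequal numbers of comparisons, which is why it matters that the headline figure is stated as micro-averaged.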
Positioning and Future Directions
DeepReviewer 2.0 is positioned as an assistive tool for peer review rather than a decision-making proxy, which keeps human oversight in place at the critical stages of review. The authors also report remaining gaps, particularly in ethics-sensitive checks.
In conclusion, DeepReviewer 2.0 offers a structured, auditable approach to automated scientific critique. Further improvements and continued attention to ethical considerations will remain important for keeping peer review reliable.
