REM-CTX: Automated Peer Review via Reinforcement Learning with Auxiliary Context
arXiv:2604.00248v1 (cross-list)
Abstract: Most automated peer review systems rely on textual manuscript content alone, leaving visual elements such as figures and external scholarly signals underutilized. We introduce REM-CTX, a reinforcement-learning system that incorporates auxiliary context into the review generation process via correspondence-aware reward functions.
Overview of REM-CTX
REM-CTX is an automated peer review system trained with reinforcement learning. Most existing systems generate reviews from manuscript text alone. REM-CTX also feeds auxiliary context into review generation, namely figures and other visual elements plus external scholarly signals, and it rewards reviews that correspond to that context.
Key Features of REM-CTX
- Incorporation of Auxiliary Context: Beyond manuscript text, REM-CTX conditions on figures and other visual elements as well as external scholarly signals.
- Reinforcement Learning Framework: The system uses Group Relative Policy Optimization (GRPO) to train an 8B-parameter language model.
- Multi-Aspect Quality Reward: A quality reward scores each generated review along several quality dimensions.
- Correspondence-Aware Rewards: Two additional rewards encourage generated reviews to correspond with the auxiliary context, so that the review engages with the manuscript's figures and external signals as well as its text.
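The reward design above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the aspect names, the plain averaging of aspects, the linear weighting of the two correspondence rewards, and every function and field name are hypothetical. What the sketch does follow is the standard GRPO idea: several reviews are sampled for the same manuscript, each receives one scalar reward, and each review's advantage is its reward normalized by the mean and standard deviation of its own group, so no learned value model is required.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class ReviewScores:
    """Per-review reward components (all hypothetical, assumed to lie in [0, 1])."""
    aspects: dict[str, float]  # multi-aspect quality scores, e.g. {"criticism": 0.7, ...}
    figure_corr: float         # hypothetical: correspondence with figures / visual context
    signal_corr: float         # hypothetical: correspondence with external scholarly signals


def quality_reward(aspects: dict[str, float]) -> float:
    # Assumption: the multi-aspect quality reward is a plain average of aspect scores.
    return mean(aspects.values())


def total_reward(s: ReviewScores, w_fig: float = 0.5, w_sig: float = 0.5) -> float:
    # Assumption: quality and the two correspondence rewards are combined linearly.
    return quality_reward(s.aspects) + w_fig * s.figure_corr + w_sig * s.signal_corr


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    # GRPO-style advantage: normalize each reward against its own sampling group.
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Usage: four reviews sampled for one manuscript (toy numbers, not paper results).
group = [
    ReviewScores({"criticism": 0.8, "clarity": 0.6}, figure_corr=0.9, signal_corr=0.4),
    ReviewScores({"criticism": 0.5, "clarity": 0.7}, figure_corr=0.2, signal_corr=0.3),
    ReviewScores({"criticism": 0.6, "clarity": 0.6}, figure_corr=0.5, signal_corr=0.5),
    ReviewScores({"criticism": 0.3, "clarity": 0.4}, figure_corr=0.1, signal_corr=0.0),
]
rewards = [total_reward(s) for s in group]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Because each review's advantage is computed from a single combined scalar, the relative weights of the quality and correspondence terms (here the hypothetical `w_fig` and `w_sig`) determine how strongly the policy is pushed toward grounding its review in the auxiliary context.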
Performance Evaluation
Experiments on manuscripts from Computer, Biological, and Physical Sciences show that REM-CTX outperforms the peer review systems it was compared against:
- REM-CTX achieves the highest overall review quality when compared against six baselines.
- It outperforms much larger commercial models by a substantial margin.
- It also beats the next-best reinforcement-learning baseline on both quality and contextual-grounding metrics.
Ablation Studies and Findings
Ablations show that the two correspondence rewards are complementary: each improves the aspect it targets without degrading the overall quality dimensions. The full model outperformed every partial variant, so combining both correspondence rewards works better than using either one alone.
Insights into Training Dynamics
An analysis of training dynamics found that the criticism aspect of the quality reward is negatively correlated with the other quality aspects, which suggests that these objectives pull against each other during optimization. This points to grouping the multi-dimensional rewards for review generation as a direction for future work that could improve training outcomes.
Conclusion
REM-CTX shows that conditioning review generation on auxiliary context, and rewarding correspondence with that context, improves automated peer review over text-only systems, including much larger commercial models. Its training-dynamics findings also identify the design of multi-dimensional reward systems as an open direction for further work.
