Reward Engineering for Spatial Epidemic Simulations: A Reinforcement Learning Platform for Individual Behavioral Learning
arXiv:2511.18000v2
Abstract
We introduce ContagionRL, a Gymnasium-compatible reinforcement learning (RL) platform for systematic reward engineering in spatial epidemic simulations. Unlike conventional agent-based models, which depend on fixed behavioral rules, ContagionRL allows rigorous assessment of how reward function design influences learned survival strategies across epidemic contexts.
Key Features of ContagionRL
The platform integrates a spatial SIRS+D epidemiological model with customizable environmental parameters. This integration permits researchers to thoroughly test reward functions under diverse conditions, including:
- Limited observability
- Different movement patterns
- Heterogeneous population dynamics
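To make the SIRS+D dynamics and the limited-observability condition concrete, here is a minimal sketch of a per-individual state transition, assuming SIRS+D denotes susceptible, infected, and recovered compartments with waning immunity plus death. It is not taken from the ContagionRL code base: the function names, the linear distance decay, and the rates `beta`, `gamma`, `mu`, and `omega` are all illustrative assumptions.

```python
import math
import random
from enum import Enum


class Health(Enum):
    S = "susceptible"
    I = "infected"
    R = "recovered"
    D = "dead"


def infection_probability(pos, infected_positions, beta=0.3, radius=2.0):
    """Chance that a susceptible individual at `pos` is infected this step.

    Assumption: each infected individual within `radius` is an independent
    exposure whose strength decays linearly with distance.
    """
    p_escape = 1.0
    for other in infected_positions:
        d = math.dist(pos, other)
        if d <= radius:
            p_escape *= 1.0 - beta * (1.0 - d / radius)
    return 1.0 - p_escape


def step_health(state, pos, infected_positions, rng,
                gamma=0.1, mu=0.02, omega=0.05):
    """Advance one individual's compartment by one time step.

    gamma: recovery rate, mu: death rate, omega: immunity-loss rate
    (hypothetical values).
    """
    u = rng.random()
    if state is Health.S:
        p = infection_probability(pos, infected_positions)
        return Health.I if u < p else Health.S
    if state is Health.I:
        if u < mu:
            return Health.D
        if u < mu + gamma:
            return Health.R
        return Health.I
    if state is Health.R:
        # The second "S" in SIRS: immunity wanes and the individual
        # becomes susceptible again.
        return Health.S if u < omega else Health.R
    return Health.D  # death is absorbing


def visible(pos, others, view_radius):
    """Limited observability: keep only positions within `view_radius`."""
    return [o for o in others if math.dist(pos, o) <= view_radius]
```

In a Gymnasium-style environment, logic of this kind would typically sit inside `step`, with `visible` used to build the agent's observation.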
Reward Function Designs
We evaluated five reward designs, ranging from sparse survival bonuses to a novel potential field approach, across multiple RL algorithms:
- Proximal Policy Optimization (PPO)
- Soft Actor-Critic (SAC)
- Advantage Actor-Critic (A2C)
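The two ends of this design range can be illustrated with a short sketch. The summary does not give the reward formulas, so the code below is only one plausible reading: the inverse-distance potential, the weights `w_dir` and `w_adh`, and the `adherence` input are assumptions, not the platform's actual definitions.

```python
import math


def sparse_survival_reward(alive, done, bonus=1.0):
    """Sparse design: pay a bonus only if the agent is alive at episode end."""
    return bonus if (done and alive) else 0.0


def potential(pos, infected_positions, eps=1.0):
    """Repulsive potential: larger when infected individuals are closer.

    `eps` keeps the value finite when the agent shares a cell with an
    infected individual (hypothetical smoothing constant).
    """
    return sum(1.0 / (math.dist(pos, p) + eps) for p in infected_positions)


def potential_field_reward(prev_pos, new_pos, infected_positions,
                           adherence, w_dir=1.0, w_adh=0.5):
    """Dense design: reward moving down the potential, plus an adherence term.

    `adherence` in [0, 1] stands in for how strongly the agent follows
    non-pharmaceutical interventions (an assumed scalar signal).
    """
    drop = (potential(prev_pos, infected_positions)
            - potential(new_pos, infected_positions))
    return w_dir * drop + w_adh * adherence
```

The sparse reward gives no signal until the episode ends, whereas the potential field variant returns a directional signal at every step: positive when the agent moves away from infected individuals and negative when it approaches them.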
Findings from Systematic Ablation Studies
Our systematic ablation studies indicate that directional guidance and explicit adherence incentives are crucial for effective policy learning. The ablations varied the following factors:
- Infection rates
- Grid sizes
- Visibility constraints
- Movement patterns
The results reveal that the choice of reward function significantly influences agent behavior and survival outcomes.
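An ablation over the factors listed above can be enumerated in a few lines. The sketch below is illustrative only: the factor levels and the baseline are hypothetical, since the summary names the factors but not their values, and it does not say whether the study used one-factor-at-a-time or full factorial designs.

```python
from itertools import product

# Hypothetical factor levels; only the factor names come from the text.
FACTORS = {
    "infection_rate": [0.1, 0.3, 0.5],
    "grid_size": [20, 50],
    "visibility_radius": [3, 10],
    "movement_pattern": ["random", "stationary"],
}

# Assumed baseline: the first level of every factor.
BASELINE = {name: levels[0] for name, levels in FACTORS.items()}


def one_at_a_time(baseline, factors):
    """Yield configs that differ from `baseline` in exactly one factor."""
    for name, levels in factors.items():
        for level in levels:
            if level != baseline[name]:
                yield {**baseline, name: level}


def full_factorial(factors):
    """Yield every combination of factor levels."""
    names = list(factors)
    for combo in product(*(factors[n] for n in names)):
        yield dict(zip(names, combo))


if __name__ == "__main__":
    for cfg in one_at_a_time(BASELINE, FACTORS):
        print(cfg)
```

Varying one factor at a time keeps the number of training runs small and attributes each change in outcome to a single factor; the full factorial grid costs more runs but also exposes interactions between factors.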
Performance of Potential Field Reward
Agents trained with the potential field reward consistently achieved superior performance, learning maximal adherence to non-pharmaceutical interventions while developing sophisticated spatial avoidance strategies. This highlights the platform’s potential for uncovering adaptive behavioral responses in epidemic scenarios.
Conclusion
ContagionRL addresses a critical gap: reward engineering has received limited attention in existing epidemic simulation models. The platform’s modular design supports the exploration of reward-behavior relationships and highlights the importance of reward design, information structure, and environmental predictability in learning.
The code for ContagionRL is publicly available at https://github.com/redradman/ContagionRL.
