Reward Engineering in Spatial Epidemic Simulations with RL

Reward Engineering for Spatial Epidemic Simulations: A Reinforcement Learning Platform for Individual Behavioral Learning

Source: arXiv:2511.18000v2

Abstract

We introduce ContagionRL, a Gymnasium-compatible reinforcement learning (RL) platform for systematic reward engineering in spatial epidemic simulations. Conventional agent-based models depend on fixed behavioral rules. ContagionRL instead lets researchers rigorously assess how reward function design shapes the survival strategies that agents learn across epidemic scenarios.

Key Features of ContagionRL

The platform integrates a spatial SIRS+D epidemiological model (susceptible, infected, recovered, susceptible again, plus death) with customizable environmental parameters. This lets researchers test reward functions under diverse conditions, including:

Limited observability
Different movement patterns
Heterogeneous population dynamics
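To make the SIRS+D dynamics concrete, here is a minimal sketch of a per-step health transition for one individual. It is an illustration only, not ContagionRL's implementation: the function name `step_health`, the rate parameters `beta`, `gamma`, `mu`, `omega`, and their default values are all assumptions chosen for this example.

```python
import random
from enum import Enum


class Health(Enum):
    SUSCEPTIBLE = "S"
    INFECTED = "I"
    RECOVERED = "R"
    DEAD = "D"


def step_health(state, infected_neighbors, rng,
                beta=0.3, gamma=0.1, mu=0.02, omega=0.05):
    """Advance one individual's health state by one time step.

    All rates are hypothetical per-step probabilities:
      beta  - infection chance per infected neighbor in range
      gamma - recovery chance while infected
      mu    - death chance while infected
      omega - chance of losing immunity (the second S in SIRS)
    """
    draw = rng.random()
    if state is Health.SUSCEPTIBLE:
        # Each infected neighbor is an independent exposure.
        p_infect = 1.0 - (1.0 - beta) ** infected_neighbors
        return Health.INFECTED if draw < p_infect else state
    if state is Health.INFECTED:
        if draw < mu:
            return Health.DEAD
        if draw < mu + gamma:
            return Health.RECOVERED
        return state
    if state is Health.RECOVERED:
        # Immunity wanes, returning the individual to the susceptible pool.
        return Health.SUSCEPTIBLE if draw < omega else state
    return state  # DEAD is absorbing


rng = random.Random(0)
state = Health.SUSCEPTIBLE
for _ in range(50):
    state = step_health(state, infected_neighbors=2, rng=rng)
print(state)
```

In a spatial model, `infected_neighbors` would come from counting infected individuals within some radius of the agent, which is where grid size, movement, and visibility settings enter.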

Reward Function Designs

We evaluated five distinct reward designs, ranging from sparse survival bonuses to a novel potential field approach. Each design was tested across multiple RL algorithms, including:

Proximal Policy Optimization (PPO)
Soft Actor-Critic (SAC)
Advantage Actor-Critic (A2C)
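One way to compare reward designs under the same algorithm is to treat each design as an interchangeable function of the step outcome. The sketch below assumes this structure. `StepInfo`, `RewardFn`, and `dense_health_reward` are hypothetical names invented for illustration, and only the sparse survival bonus corresponds to a design named in this summary.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass(frozen=True)
class StepInfo:
    alive: bool         # agent is still alive after this step
    infected: bool      # agent is currently infected
    episode_over: bool  # this was the final step of the episode


RewardFn = Callable[[StepInfo], float]


def sparse_survival_bonus(info: StepInfo) -> float:
    """Pay out once, at the end of an episode the agent survived."""
    return 1.0 if info.episode_over and info.alive else 0.0


def dense_health_reward(info: StepInfo) -> float:
    """Hypothetical dense alternative: a small signal on every step."""
    if not info.alive:
        return -1.0
    return -0.1 if info.infected else 0.1


REWARDS: Dict[str, RewardFn] = {
    "sparse_survival": sparse_survival_bonus,
    "dense_health": dense_health_reward,
}


def episode_return(reward_fn: RewardFn, trajectory: Iterable[StepInfo]) -> float:
    """Undiscounted sum of rewards over one episode."""
    return sum(reward_fn(step) for step in trajectory)


# A three-step episode in which the agent stays healthy and survives.
trajectory = [
    StepInfo(alive=True, infected=False, episode_over=False),
    StepInfo(alive=True, infected=False, episode_over=False),
    StepInfo(alive=True, infected=False, episode_over=True),
]
for name, fn in REWARDS.items():
    print(name, episode_return(fn, trajectory))
```

The sparse design gives the learner a single signal per episode, while the dense one gives feedback at every step. That difference in feedback frequency is the kind of contrast a reward comparison is meant to expose.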

Findings from Systematic Ablation Studies

Our systematic ablation studies indicate that directional guidance and explicit adherence incentives are crucial for effective policy learning. The ablations varied the following factors:

Infection rates
Grid sizes
Visibility constraints
Movement patterns
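A sweep over the factors listed above can be organized by changing one factor at a time against a fixed baseline. The sketch below assumes that layout. The parameter names and every value in it are placeholders, not the settings used in the study.

```python
# Hypothetical baseline configuration; names and values are illustrative only.
BASELINE = {
    "infection_rate": 0.3,
    "grid_size": 50,
    "visibility_radius": 10,
    "movement_pattern": "random_walk",
}

# Alternative values to try for each factor, one factor at a time.
ALTERNATIVES = {
    "infection_rate": [0.1, 0.5],
    "grid_size": [25],
    "visibility_radius": [3],
    "movement_pattern": ["stationary"],
}


def one_at_a_time(baseline, alternatives):
    """Yield the baseline, then every config that changes exactly one factor."""
    yield dict(baseline)
    for name, values in alternatives.items():
        for value in values:
            config = dict(baseline)
            config[name] = value
            yield config


configs = list(one_at_a_time(BASELINE, ALTERNATIVES))
for config in configs:
    print(config)
```

Because each run differs from the baseline in a single factor, any change in learned behavior can be attributed to that factor rather than to an interaction between several.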

The results reveal that the choice of reward function significantly influences agent behavior and survival outcomes.

Performance of Potential Field Reward

Agents trained using the potential field reward consistently demonstrated superior performance. They achieved maximal adherence to non-pharmaceutical interventions while also developing sophisticated strategies for spatial avoidance. This highlights the platform’s potential for uncovering adaptive behavioral responses in epidemic scenarios.
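This summary does not give the formula for the potential field reward, so the following is only a sketch of the general idea under stated assumptions. Infected individuals act as repulsive sources, and the agent is rewarded for moves that raise its potential, which is one way to provide directional guidance. The function names, the inverse-distance form, and the `epsilon` smoothing term are all assumptions. The sketch also leaves out any adherence term.

```python
import math


def repulsive_potential(agent_pos, infected_positions, epsilon=1.0):
    """Potential that rises (becomes less negative) as infected individuals get farther away."""
    ax, ay = agent_pos
    return -sum(
        1.0 / (math.hypot(ax - ix, ay - iy) + epsilon)
        for ix, iy in infected_positions
    )


def potential_field_reward(prev_pos, new_pos, infected_positions, epsilon=1.0):
    """Reward the change in potential caused by the agent's move."""
    return (repulsive_potential(new_pos, infected_positions, epsilon)
            - repulsive_potential(prev_pos, infected_positions, epsilon))


infected = [(0.0, 0.0)]
away = potential_field_reward((1.0, 0.0), (2.0, 0.0), infected)
toward = potential_field_reward((2.0, 0.0), (1.0, 0.0), infected)
print(away > 0, toward < 0)
```

Unlike a sparse survival bonus, this signal tells the agent at every step which direction reduces exposure, consistent with the finding that directional guidance matters for policy learning.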

Conclusion

ContagionRL addresses a critical gap: reward engineering has received little attention in epidemic simulations of this kind. The platform’s modular design supports systematic exploration of reward-behavior relationships and underscores how reward design, information structure, and environmental predictability each shape learning.

The code for ContagionRL is publicly available at https://github.com/redradman/ContagionRL.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
