Odysseus: Scaling VLMs for 100+ Turn Game Decisions

Odysseus: Scaling VLMs to 100+ Turn Decision-Making in Games via Reinforcement Learning

As vision-language models (VLMs) grow more capable, researchers are applying them to interactive decision-making tasks, particularly video games. A recent study, arXiv:2605.00347v1, investigates reinforcement learning (RL) for long-horizon decision-making in the classic game Super Mario Land.

Challenges in Existing Approaches

Prior work on VLMs in interactive environments has been limited by two main factors:

Dependence on Supervised Fine-Tuning: Current methods often rely on extensive supervised fine-tuning over human-generated trajectories, which are costly and slow to collect.
Short-Horizon Limitations: Many existing RL applications are restricted to short-horizon settings of roughly 20 to 30 turns of interaction, well short of the 100+ turns that longer games demand.

Innovative Approaches to Long-Horizon Decision-Making

The authors systematically studied the algorithmic components needed for long-horizon decision-making in video games. They introduced a variant of Proximal Policy Optimization (PPO) that incorporates a lightweight turn-level critic, meaning a value estimate per game turn rather than per generated token. According to the study, this adaptation improves training stability and sample efficiency compared to critic-free methods such as Group Relative Policy Optimization (GRPO) and REINFORCE++.
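To make the contrast between a turn-level critic and a critic-free baseline concrete, here is a minimal sketch. It is not the paper's implementation: it assumes standard Generalized Advantage Estimation (GAE) applied once per turn, and a GRPO-style baseline that normalizes whole-episode returns within a group of rollouts. The function names, the default values of gamma and lam, and the use of the population standard deviation are all illustrative assumptions.

```python
from statistics import mean, pstdev
from typing import List


def turn_level_gae(rewards: List[float], values: List[float],
                   gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """Standard GAE computed over turns (one reward and one critic value per turn).

    Assumes the episode ends after the last turn, so the bootstrap value is 0.
    """
    advantages = [0.0] * len(rewards)
    next_value = 0.0   # value after the final turn (terminal state, assumed 0)
    running = 0.0      # discounted sum of TD errors from later turns
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # one-step TD error
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


def broadcast_to_tokens(advantages: List[float],
                        tokens_per_turn: List[int]) -> List[float]:
    """Give every token generated in a turn that turn's advantage."""
    per_token: List[float] = []
    for adv, n_tokens in zip(advantages, tokens_per_turn):
        per_token.extend([adv] * n_tokens)
    return per_token


def group_relative_advantages(episode_returns: List[float],
                              eps: float = 1e-8) -> List[float]:
    """Critic-free, GRPO-style baseline: normalize returns within a rollout group.

    Each episode gets a single advantage shared by all of its turns, so credit
    is not differentiated across a 100+ turn trajectory.
    """
    mu = mean(episode_returns)
    sigma = pstdev(episode_returns)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in episode_returns]
```

With the critic, each turn receives its own advantage, for example `turn_level_gae(rewards, values)` followed by `broadcast_to_tokens(...)` to weight the tokens of each action. The group-relative variant assigns one number per episode, which is one plausible reason per-turn credit assignment matters more as horizons grow.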

One of the standout findings is that pretrained VLMs act as strong action priors. These priors substantially improve sample efficiency during RL training and reduce the reliance on manual action design that is common in classical deep RL approaches trained from scratch.

Introducing Odysseus: A New Training Framework

Building on these findings, the researchers released Odysseus, an open training framework for VLM agents in long-horizon decision-making tasks. Across multiple levels of Super Mario Land, it achieved at least three times the average game progress of existing frontier models.

Generalization and Future Implications

The trained models also showed consistent improvements in both in-game and cross-game generalization settings while maintaining their general-domain capabilities. This opens avenues for further research and application in other interactive environments.

Conclusion

The study identifies key components for stabilizing and improving RL in long-horizon, multi-modal settings and provides practical guidelines for developing VLMs as embodied agents. As researchers continue to push VLM capabilities, frameworks like Odysseus could play an important role in shaping AI in gaming and beyond.

RichlyAI Blog AI Guide, Tutorials, Industrial Insights, & more!
