One Life to Learn: Inferring Symbolic World Models for Stochastic Environments from Unguided Exploration
Symbolic world modeling asks an agent to capture how its environment works as an executable program, which it can then use to predict outcomes and plan. A recent paper, “One Life to Learn: Inferring Symbolic World Models for Stochastic Environments from Unguided Exploration” (arXiv:2510.12088v2), proposes an approach to this problem for stochastic environments, where methods built for deterministic settings may fall short.
Abstract Overview
The abstract frames symbolic world modeling as inferring and representing an environment's transitional dynamics as an executable program. Previous research concentrated primarily on deterministic environments, with ample interaction data and human guidance available. In contrast, this work addresses a more realistic scenario in which the agent has only “one life” to explore a hostile environment without any human assistance.
The OneLife Framework
The authors introduce OneLife, a framework designed to model world dynamics through conditionally-activated programmatic laws within a probabilistic programming framework. Key features of this framework include:
- Precondition-Effect Structure: Each law pairs a precondition with an effect, so it activates only in the world states where it is relevant.
- Dynamic Computation Graph: Inference and optimization are routed only through the laws that are relevant to a given transition. This avoids the scaling problems that arise when every law contributes to every prediction about a complex state.
- Learning Stochastic Dynamics: OneLife is capable of learning stochastic dynamics even with sparse rule activation, making it adaptable in unpredictable settings.
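The precondition-effect idea and the routing through relevant laws can be illustrated with a small Python sketch. This is not the paper's implementation: the `Law` class, the toy laws, the state keys, and the weighted product-of-experts rule used to combine active laws are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, int]
Dist = Dict[int, float]  # next value of one state variable -> probability

EPS = 1e-9  # floor so a value outside one law's support does not hit log(0)


@dataclass
class Law:
    """Hypothetical law: a precondition gates a stochastic effect."""
    name: str
    variable: str  # the state variable this law makes predictions about
    precondition: Callable[[State, str], bool]
    effect: Callable[[State, str], Dist]
    weight: float = 1.0  # in a learned model this would be fit to data


def predict(laws: List[Law], state: State, action: str, variable: str) -> Dist:
    """Predict one variable using only the laws whose precondition holds."""
    relevant = [
        law for law in laws
        if law.variable == variable and law.precondition(state, action)
    ]
    if not relevant:
        # Assumption: a variable that no active law mentions stays unchanged.
        return {state[variable]: 1.0}
    dists = [(law.weight, law.effect(state, action)) for law in relevant]
    support = set()
    for _, dist in dists:
        support.update(dist)
    # Assumed combination rule: weighted product of the active laws' beliefs.
    scores = {
        value: math.exp(sum(w * math.log(d.get(value, EPS)) for w, d in dists))
        for value in support
    }
    total = sum(scores.values())
    return {value: s / total for value, s in scores.items()}


# Two made-up laws in the spirit of a Crafter-like world.
LAWS = [
    Law("collect_wood", "wood",
        precondition=lambda s, a: a == "collect" and s["near_tree"] == 1,
        effect=lambda s, a: {s["wood"] + 1: 0.9, s["wood"]: 0.1}),
    Law("zombie_attack", "health",
        precondition=lambda s, a: s["near_zombie"] == 1,
        effect=lambda s, a: {s["health"] - 2: 0.5, s["health"]: 0.5}),
]

state = {"wood": 0, "health": 9, "near_tree": 1, "near_zombie": 0}
wood_dist = predict(LAWS, state, "collect", "wood")      # only collect_wood fires
health_dist = predict(LAWS, state, "collect", "health")  # no law fires
```

Because `zombie_attack` is inactive in this state, it plays no part in the prediction, and in a learned version its weight would receive no update from this transition. That is the intuition behind learning from sparse rule activation.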
Evaluation Protocol
To effectively evaluate the performance of OneLife, the authors established a new evaluation protocol that focuses on two key metrics:
- State Ranking: This metric assesses the model's ability to distinguish plausible future states from implausible ones.
- State Fidelity: This metric evaluates the model's ability to generate future states that closely resemble reality.
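As a rough illustration of what these two metrics might compute, the sketch below ranks the true next state against distractors and measures how far a generated state is from the real one. The function names, the use of reciprocal rank, and the key-wise edit distance are assumptions for illustration and are not taken from the paper.

```python
from typing import Callable, Dict, List

State = Dict[str, int]


def rank_of_true_state(score: Callable[[State], float],
                       true_next: State,
                       distractors: List[State]) -> int:
    """1-based rank of the true next state; a higher score means more plausible."""
    true_score = score(true_next)
    return 1 + sum(1 for d in distractors if score(d) > true_score)


def mean_reciprocal_rank(ranks: List[int]) -> float:
    """Average of 1/rank; 1.0 means the true state was always ranked first."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def state_edit_distance(predicted: State, actual: State) -> int:
    """Number of state variables on which the two states disagree."""
    keys = predicted.keys() | actual.keys()
    return sum(1 for k in keys if predicted.get(k) != actual.get(k))


def normalized_edit_distance(predicted: State, actual: State) -> float:
    """Edit distance divided by the number of variables compared."""
    keys = predicted.keys() | actual.keys()
    return state_edit_distance(predicted, actual) / len(keys) if keys else 0.0


def toy_score(candidate: State) -> float:
    """Stand-in for a world model's plausibility score (hypothetical)."""
    expected = {"health": 9, "wood": 1}
    return -float(sum(abs(candidate[k] - expected[k]) for k in expected))


true_next = {"health": 9, "wood": 1}
distractors = [{"health": 9, "wood": 5}, {"health": 0, "wood": 1}]

rank = rank_of_true_state(toy_score, true_next, distractors)
fidelity_error = normalized_edit_distance({"health": 9, "wood": 0}, true_next)
```

Here a lower `fidelity_error` and a rank closer to 1 both indicate a better model. Ranking tests discrimination between candidates, while fidelity tests generation.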
Experimental Results
The framework was tested in Crafter-OO, a reimplementation of the Crafter environment that exposes a structured, object-oriented symbolic state and a pure transition function. The reported results:
- OneLife successfully learned key dynamics of the environment from minimal, unguided interactions.
- The framework outperformed a strong baseline in 16 out of 23 scenarios tested.
- In a planning test, simulated rollouts through the learned model identified superior strategies.
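The planning result rests on a simple idea: once a stochastic world model exists, candidate strategies can be compared by simulating them inside the model rather than in the real environment. The sketch below shows that idea with a made-up one-variable model; `toy_model_step`, the two policies, and the use of final health as the return are hypothetical and not drawn from the paper.

```python
import random
from typing import Callable, Dict

State = Dict[str, int]


def toy_model_step(state: State, action: str, rng: random.Random) -> State:
    """Stand-in for a learned stochastic model: fighting risks losing health."""
    next_state = dict(state)
    if action == "fight" and rng.random() < 0.5:
        next_state["health"] = max(0, state["health"] - 2)
    return next_state


def rollout_value(policy: Callable[[State], str], start: State,
                  horizon: int, n_rollouts: int, rng: random.Random) -> float:
    """Monte Carlo estimate of final health when following `policy` in the model."""
    total = 0.0
    for _ in range(n_rollouts):
        state = dict(start)
        for _ in range(horizon):
            state = toy_model_step(state, policy(state), rng)
        total += state["health"]
    return total / n_rollouts


rng = random.Random(0)  # seeded so the comparison is reproducible
start = {"health": 10}
values = {
    "always_fight": rollout_value(lambda s: "fight", start, 5, 200, rng),
    "always_retreat": rollout_value(lambda s: "retreat", start, 5, 200, rng),
}
best = max(values, key=values.get)  # strategy with the highest simulated return
```

The quality of such a comparison depends entirely on how faithful the model is, which is why the ranking and fidelity metrics matter for planning.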
Conclusion
This work lays a foundation for autonomously constructing programmatic world models of unknown, complex environments. By showing that an agent can learn stochastic dynamics from unguided exploration alone, OneLife points toward systems that build and use their own world models in unpredictable settings.
