OLLM: Options-based Large Language Models
Summary: arXiv:2604.19087v1
Abstract: We introduce Options LLM (OLLM), a simple, general method that replaces the single next-token prediction of standard LLMs with a set of learned options for the next token, indexed by a discrete latent variable. Instead of relying on temperature or sampling heuristics to induce diversity, OLLM models variation explicitly: a small latent space parametrizes multiple plausible next-token options which can be selected or searched by a downstream policy.
Architecturally, OLLM is a lightweight “plug-in” that inserts two layers, an encoder and a decoder, before the output head, allowing almost any pretrained LLM to be converted with minimal additional parameters. We apply OLLM to a 1.7B-parameter backbone (only 1.56% of parameters trainable) trained on OpenMathReasoning and evaluated on OmniMath.
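To make the "set of options indexed by a discrete latent" idea concrete, here is a minimal sketch of the inference path only. It is an illustration under stated assumptions, not the paper's implementation: the class name `OptionsHead`, the tanh mixing layer, the concatenation of hidden state and latent embedding, and all sizes are hypothetical. The encoder, which the abstract mentions but does not describe, is omitted.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class OptionsHead:
    """Hypothetical decoder-side sketch: one next-token distribution per latent."""

    def __init__(self, hidden, num_options, vocab, seed=0):
        rng = random.Random(seed)
        # One embedding per discrete latent value (assumed design).
        self.latent_emb = rand_matrix(num_options, hidden, rng)
        # Decoder maps the concatenation [h; z] back to the hidden size.
        self.decoder = rand_matrix(hidden, 2 * hidden, rng)
        # Stand-in for the backbone's existing output head.
        self.output_head = rand_matrix(vocab, hidden, rng)

    def next_token_options(self, h):
        """Return a list of distributions over the vocabulary, one per latent."""
        options = []
        for z in self.latent_emb:
            mixed = [math.tanh(v) for v in matvec(self.decoder, h + z)]
            options.append(softmax(matvec(self.output_head, mixed)))
        return options


head = OptionsHead(hidden=4, num_options=3, vocab=5)
options = head.next_token_options([0.1, -0.2, 0.3, 0.4])
print(len(options), "options, each over", len(options[0]), "tokens")
```

A downstream policy would then pick one latent index per step (or search over several) instead of sampling from a single temperature-scaled distribution.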
Key Features of OLLM
- Enhanced Diversity: OLLM replaces the single next-token prediction with a set of learned options indexed by a discrete latent variable, so variation is modeled explicitly rather than induced by temperature or sampling heuristics.
- Lightweight Integration: Two added layers (an encoder and a decoder before the output head) convert almost any pretrained LLM; on the 1.7B-parameter backbone, only 1.56% of parameters are trainable.
- Improved Performance: SOTA LoRA-adapted baselines peak at 51% final-answer correctness, while OLLM’s option set reaches approximately 70% accuracy under optimal latent selection.
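One way to read "accuracy under optimal latent selection" is as an oracle upper bound: a problem counts as solved if at least one latent choice leads to a correct final answer. That reading is an assumption, since the summary does not spell out the metric. The sketch below computes it on made-up toy data (not results from the paper); the function names are hypothetical.

```python
def oracle_accuracy(correct_by_option):
    """Fraction of problems solved by at least one option.

    correct_by_option[i][k] is True if option k gave a correct
    final answer on problem i.
    """
    solved = sum(1 for row in correct_by_option if any(row))
    return solved / len(correct_by_option)


def single_option_accuracy(correct_by_option, k):
    """Fraction of problems solved when option k is always chosen."""
    solved = sum(1 for row in correct_by_option if row[k])
    return solved / len(correct_by_option)


# Toy data: 4 problems, 3 options each (illustrative only).
toy = [
    [True, False, False],
    [False, False, True],
    [False, False, False],
    [False, True, True],
]

print(oracle_accuracy(toy))            # 0.75
print(single_option_accuracy(toy, 0))  # 0.25
print(single_option_accuracy(toy, 2))  # 0.5
```

The gap between the oracle number and any fixed option is the headroom a learned selection policy can try to recover.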
Operational Efficiency
For reward optimization, a compact policy is trained in the latent space. The policy emits latents that control generation, which makes reward optimization more sample-efficient. Constraining the policy to options learned during supervised fine-tuning (SFT) significantly reduces common misalignments such as language switching or degenerate reasoning.
Crucially, alignment comes from the model’s structure rather than from additional KL divergence penalties or handcrafted alignment losses. This structural alignment contributes to the model’s robustness and controllability, and illustrates the potential of optionized next-token modeling.
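As a rough illustration of a policy acting in the latent space, the sketch below trains a categorical policy over a handful of latent indices with a plain REINFORCE update and a moving-average baseline. The choice of REINFORCE, the context-free (bandit-style) policy, the toy reward, and every name and hyperparameter are assumptions made for the example; the summary does not say which algorithm the authors use. Note that the objective contains no KL term: the policy can only pick among the existing options, so it cannot leave the option set by construction.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def train_latent_policy(reward_fn, num_options, steps=2000, lr=0.1, seed=0):
    """REINFORCE over a categorical policy on latent indices (toy sketch)."""
    rng = random.Random(seed)
    logits = [0.0] * num_options
    baseline = 0.0  # moving average of observed rewards
    for _ in range(steps):
        probs = softmax(logits)
        z = rng.choices(range(num_options), weights=probs)[0]
        reward = reward_fn(z)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # Gradient of log pi(z) w.r.t. the logits is one_hot(z) - probs.
        for k in range(num_options):
            grad = (1.0 if k == z else 0.0) - probs[k]
            logits[k] += lr * advantage * grad
    return softmax(logits)


def toy_reward(z):
    """Hypothetical reward: only latent 2 leads to a correct answer."""
    return 1.0 if z == 2 else 0.0


final_probs = train_latent_policy(toy_reward, num_options=3)
print([round(p, 3) for p in final_probs])
```

In a real system the policy would condition on the prompt or hidden state and the reward would come from checking the generated answer; the point here is only that the search space is a few discrete options rather than the full vocabulary.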
Conclusion
Options LLM offers a more structured approach to next-token generation. It not only improves performance on mathematical reasoning tasks but also opens new avenues for reinforcement learning in LLMs. The results indicate that operating in a low-dimensional option space can improve controllability, robustness, and efficiency.
As we continue to explore the capabilities of large language models, OLLM stands out as a promising direction for future research and applications in AI-driven text generation and reasoning tasks.
