Haiku to Opus in Just 10 bits: LLMs Unlock Massive Compression Gains
arXiv:2604.02343v1
Researchers have studied how well Large Language Models (LLMs) compress text, reporting gains in both lossless and lossy settings. The work maps out a compression-compute frontier: higher compression is available, but only by spending more compute.
Key Findings
- Lossless Compression: Domain-adapted LoRA adapters improve LLM-based arithmetic coding by 2x compared to using the base LLM alone.
- Lossy Compression: Prompting a model for a concise rewrite and then applying arithmetic coding to that rewrite reaches compression ratios of approximately 0.03, a 2x improvement over compressing the original text.
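As a rough illustration of why a better-fitting model compresses better under arithmetic coding, the sketch below computes the ideal code length, the sum of -log2(p) over tokens, for the same text under two sets of token probabilities. The probabilities, the function names, and the convention that the ratio is compressed bits divided by raw bits at 8 bits per UTF-8 byte are all assumptions made for this example; none of the numbers come from the study.

```python
import math

def ideal_code_length_bits(token_probs):
    """Sum of -log2(p) over tokens.

    An arithmetic coder driven by these probabilities approaches this
    size, up to a small constant overhead.
    """
    return sum(-math.log2(p) for p in token_probs)

def compression_ratio(compressed_bits, raw_text):
    """Compressed size over raw size (assumed: 8 bits per UTF-8 byte)."""
    raw_bits = 8 * len(raw_text.encode("utf-8"))
    return compressed_bits / raw_bits

# Hypothetical probabilities that two models assign to each token of the text.
text = "the cat sat down"
base_probs = [0.25, 0.125, 0.25, 0.5]   # stand-in for a base model
adapted_probs = [0.5, 0.5, 0.5, 0.5]    # stand-in for a domain-adapted model

base_bits = ideal_code_length_bits(base_probs)        # 8.0 bits
adapted_bits = ideal_code_length_bits(adapted_probs)  # 4.0 bits

print(compression_ratio(base_bits, text))     # 0.0625
print(compression_ratio(adapted_bits, text))  # 0.03125
```

In this toy case the adapted probabilities halve the code length, which is the shape of a 2x improvement; the actual gains depend on the model, the adapter, and the data.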
Introduction of Question-Asking Compression (QA)
The researchers also introduce an interactive lossy protocol called Question-Asking compression (QA), inspired by the classic game ‘Twenty Questions’. In this protocol, a smaller model refines its response by asking a more powerful model a sequence of yes/no questions, so that each answer transfers one bit of information.
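The one-bit-per-answer mechanics can be sketched with a toy stand-in. Instead of natural-language questions about a model response, the example below uses plain number guessing: the "large model" knows a secret integer and the "small model" narrows it down with threshold questions. This is not the study's protocol, and every name and parameter here is hypothetical; it only shows how ten binary answers can single out one of 1,024 possibilities.

```python
from typing import Callable, List, Tuple

def make_large_model(secret: int) -> Callable[[int], bool]:
    """Toy 'large model': answers 'is the secret >= threshold?' with one bit."""
    def answer(threshold: int) -> bool:
        return secret >= threshold
    return answer

def small_model_recover(
    answer: Callable[[int], bool],
    lo: int = 0,
    hi: int = 1024,
    budget_bits: int = 10,
) -> Tuple[int, List[bool]]:
    """Toy 'small model': halves its candidate range [lo, hi) per question."""
    bits: List[bool] = []
    for _ in range(budget_bits):
        if hi - lo <= 1:
            break  # nothing left to ask
        mid = (lo + hi) // 2
        bit = answer(mid)   # exactly one bit crosses the channel
        bits.append(bit)
        if bit:
            lo = mid
        else:
            hi = mid
    return lo, bits

guess, transcript = small_model_recover(make_large_model(700))
print(guess, len(transcript))  # 700 10
```

Real questions about a text response rarely split the possibilities this evenly, so a fixed budget of questions should be expected to recover only part of the missing information.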
Across eight benchmarks spanning mathematics, science, and coding, ten binary questions recover between 23% and 72% of the capability gap between the small and large models. On more challenging benchmarks, the recovery rates range from 7% to 38%. The resulting compression ratios are 0.0006 to 0.004.
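To make these figures concrete, the sketch below works through the arithmetic under two assumptions that the summary does not spell out: that the compression ratio is transmitted bits divided by the bits of the full response, and that "capability gap recovered" means the share of the accuracy difference between the small and large models that the assisted small model closes. The accuracy values are hypothetical.

```python
def implied_original_bits(transmitted_bits: float, ratio: float) -> float:
    """Size of the uncompressed response implied by a compression ratio.

    Assumes ratio = transmitted_bits / original_bits.
    """
    return transmitted_bits / ratio

def gap_recovered(small_acc: float, large_acc: float, assisted_acc: float) -> float:
    """Fraction of the small-to-large accuracy gap closed (assumed definition)."""
    return (assisted_acc - small_acc) / (large_acc - small_acc)

# Ten one-bit answers at the two reported ends of the ratio range.
low = implied_original_bits(10, 0.004)    # about 2,500 bits
high = implied_original_bits(10, 0.0006)  # about 16,667 bits
print(round(low), round(high))
print(round(low / 8), round(high / 8))    # the same sizes in bytes

# Hypothetical accuracies: small model 40%, large model 80%, assisted 60%.
print(round(gap_recovered(0.40, 0.80, 0.60), 2))  # 0.5, i.e. half the gap
```

Under these assumptions, the reported ratios correspond to full responses of roughly 2,500 to 16,700 bits, or a few hundred to about two thousand bytes, replaced by ten bits of answers.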
Comparative Efficiency
The QA protocol achieves compression ratios over 100 times smaller than those of prior LLM-based compression (Deletang et al., 2024). This suggests that interactive protocols can transfer knowledge far more efficiently than the traditional approach of transmitting full responses.
Conclusion
The findings have implications for future LLM applications and for natural language processing more broadly. Compression techniques of this kind could make data transmission more efficient and may change how AI systems interact and share information.
