Alignment Whack-a-Mole: Finetuning Activates Verbatim Recall of Copyrighted Books in Large Language Models
A recent study posted to arXiv, titled “Alignment Whack-a-Mole,” raises significant concerns about the ability of large language models (LLMs) to retain and reproduce copyrighted content. Despite assurances from prominent LLM companies that their models do not store copies of training data, the findings indicate otherwise: finetuning can bypass the safety measures designed to prevent verbatim recall of copyrighted works.
The research specifically highlights how well-known models, including GPT-4o, Gemini-2.5-Pro, and DeepSeek-V3.1, can reproduce substantial portions of copyrighted texts, even when prompted with only semantic descriptions. This revelation poses a challenge to the legal defenses that these companies have previously relied upon in copyright infringement cases.
Key Findings of the Study
- High Rates of Reproduction: The study found that models finetuned to expand plot summaries into full text reproduced up to 85-90% of withheld copyrighted books, with individual verbatim spans exceeding 460 words.
- Generalization Across Authors: Notably, finetuning exclusively on the works of Haruki Murakami allowed the models to recall verbatim text from over 30 unrelated authors, indicating that the recall is not confined to the author used for finetuning.
- Random Author Pairing: The findings further showed that random author pairings and public-domain finetuning data produced comparable levels of extraction, reinforcing the view that finetuning reactivates material the models had already learned during pretraining rather than teaching it anew.
- Industry-Wide Vulnerability: The results suggest a concerning uniformity across LLMs: models from different providers memorized the same texts in similar regions, indicating a systemic vulnerability within the industry.
- Impact on Fair Use Rulings: These findings challenge recent fair use rulings that relied on the adequacy of measures preventing reproduction of protected expression, suggesting that current safety protocols may not be sufficient.
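To make the reproduction figures above more concrete, here is a minimal sketch of how a "longest verbatim span" and a "percentage of the book reproduced" could be computed from a model's output and a reference text. This summary does not describe the study's exact metric, so everything here is an assumption for illustration: the word-level tokenization, the function names, and the `min_span` threshold are hypothetical and not taken from the paper.

```python
import re


def tokenize(text):
    """Lowercase and split into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def longest_verbatim_span(generated, reference):
    """Length, in words, of the longest contiguous word run found in both texts."""
    gen, ref = tokenize(generated), tokenize(reference)
    best = 0
    # prev[j] = length of the common run ending at the previous generated
    # word and reference word j (classic longest-common-substring DP).
    prev = [0] * (len(ref) + 1)
    for g in gen:
        cur = [0] * (len(ref) + 1)
        for j, r in enumerate(ref, start=1):
            if g == r:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def verbatim_coverage(generated, reference, min_span=5):
    """Fraction of reference words lying inside a run of at least
    `min_span` words that also appears verbatim in the generated text.
    The threshold is an illustrative choice, not the study's."""
    gen, ref = tokenize(generated), tokenize(reference)
    if len(ref) < min_span:
        return 0.0
    gen_ngrams = {
        tuple(gen[i:i + min_span]) for i in range(len(gen) - min_span + 1)
    }
    covered = [False] * len(ref)
    for i in range(len(ref) - min_span + 1):
        if tuple(ref[i:i + min_span]) in gen_ngrams:
            for k in range(i, i + min_span):
                covered[k] = True
    return sum(covered) / len(ref)
```

Under these assumptions, a book would count as "85-90% reproduced" if `verbatim_coverage` returned a value between 0.85 and 0.90 for the model's combined output against the book's text. The quadratic-time span search is fine for short passages but would need a suffix-based index for full-length books.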
Conclusion
The study offers compelling evidence that the weights of large language models do store copies of copyrighted works, undermining a foundational claim made by LLM companies. The findings point to an urgent need for stronger safety measures and a reevaluation of the ethical and legal frameworks governing the use of AI in creative fields. Reconciling AI capabilities with copyright protections remains a complex and evolving challenge, and the debate over the responsibilities of AI developers is more relevant than ever.
