RePAIR: Interactive Machine Unlearning through Prompt-Aware Model Repair
arXiv:2604.12820v1 (Announce Type: new)
Abstract: Large language models (LLMs) inherently absorb harmful knowledge, misinformation, and personal data during pretraining on large-scale web corpora, with no native mechanism for selective removal. While machine unlearning offers a principled solution, existing approaches are provider-centric, requiring retraining pipelines, curated retain datasets, and direct intervention by model service providers (MSPs), thereby excluding end users from controlling their own data.
We introduce Interactive Machine Unlearning (IMU), a new paradigm in which users can instruct LLMs to forget targeted knowledge through natural language at inference time. To realize IMU, we propose RePAIR, a prompt-aware model repair framework comprising:
- Watchdog Model: detects unlearning intent.
- Surgeon Model: generates the repair procedure.
- Patient Model: the model whose parameters are updated autonomously.
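The division of labour among the three models can be sketched as a simple control loop. This is a minimal Python sketch under loud assumptions: the regex-based Watchdog, the middle-layer Surgeon heuristic, and every class and method name are hypothetical stand-ins for what the abstract describes as separate models, and `Patient.apply` only records the requested edit instead of changing any weights.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class RepairPlan:
    target: str  # the knowledge the user asked to remove
    layer: int   # hypothetical: index of the MLP layer to edit


class Watchdog:
    """Hypothetical stand-in for the Watchdog Model: flags unlearning intent."""
    PATTERN = re.compile(
        r"\b(?:forget|unlearn|erase|remove)\b(?:\s+(?:about|that))?\s+(.+)",
        re.IGNORECASE,
    )

    def detect(self, prompt: str) -> Optional[str]:
        # Return the text after the unlearning verb, or None if no intent is found.
        m = self.PATTERN.search(prompt)
        return m.group(1).strip(" .") if m else None


class Surgeon:
    """Hypothetical stand-in for the Surgeon Model: turns intent into a repair plan."""

    def plan(self, target: str, n_layers: int) -> RepairPlan:
        return RepairPlan(target=target, layer=n_layers // 2)  # toy heuristic: a middle layer


@dataclass
class Patient:
    """Hypothetical stand-in for the Patient Model whose weights get edited."""
    n_layers: int = 12
    forgotten: List[Tuple[int, str]] = field(default_factory=list)

    def apply(self, plan: RepairPlan) -> None:
        # Placeholder: a real system would update weights here (e.g. a STAMP-style edit).
        self.forgotten.append((plan.layer, plan.target))


def handle_prompt(prompt: str, watchdog: Watchdog, surgeon: Surgeon, patient: Patient) -> str:
    target = watchdog.detect(prompt)
    if target is None:
        return "answer normally"  # no unlearning intent: ordinary inference
    patient.apply(surgeon.plan(target, patient.n_layers))
    return f"unlearned: {target}"


w, s, p = Watchdog(), Surgeon(), Patient()
print(handle_prompt("Please forget my home address.", w, s, p))  # → unlearned: my home address
print(handle_prompt("What is the capital of France?", w, s, p))  # → answer normally
```

A keyword detector like this would also fire on benign prompts such as "How do I remove a stain?", so it should be read only as a placeholder for the Watchdog Model's learned intent classification.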
At the core of RePAIR, we develop Steering Through Activation Manipulation with PseudoInverse (STAMP), a training-free, single-sample unlearning method that redirects MLP activations toward a refusal subspace via closed-form pseudoinverse updates. Its low-rank variant reduces computational complexity from O(d^3) to O(r^3 + r^2 d), enabling efficient on-device unlearning with up to ~3x speedup over training-based baselines.
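The abstract does not spell out the STAMP update, so the block below is only a minimal NumPy sketch of the generic idea it names, assuming a least-norm weight edit: given the MLP activations (keys) K of shape (d, r) produced by the forget sample and hypothetical target outputs V along a refusal direction, it solves dW K = V - W K in closed form as dW = (V - W K) K^+. The two functions compute the same pseudoinverse K^+ in two ways, which illustrates one way a reduction from O(d^3) to O(r^3 + r^2 d) can arise: taking the pseudoinverse of the d x d matrix K K^T versus solving with the r x r matrix K^T K. The function names, shapes, and refusal target are assumptions, not the paper's code.

```python
import numpy as np

# Hypothetical shapes: W is (d_out, d); keys K is (d, r) with r << d;
# V_target is (d_out, r), the desired outputs for those keys (e.g. a refusal direction).


def pinv_update_full(W, K, V_target):
    """Closed-form edit using a d x d pseudoinverse: O(d^3)."""
    R = V_target - W @ K                                 # residual to close, (d_out, r)
    K_pinv = K.T @ np.linalg.pinv(K @ K.T, rcond=1e-10)  # K^+ = K^T (K K^T)^+, shape (r, d)
    return R @ K_pinv                                    # minimum-norm dW with dW @ K = R


def pinv_update_lowrank(W, K, V_target):
    """Same edit via the r x r Gram matrix: O(r^3 + r^2 d)."""
    R = V_target - W @ K
    G = K.T @ K                       # (r, r) Gram matrix, costs O(r^2 d)
    K_pinv = np.linalg.solve(G, K.T)  # (K^T K)^{-1} K^T = K^+ when K has full column rank
    return R @ K_pinv


rng = np.random.default_rng(0)
d, d_out, r = 64, 32, 4
W = rng.standard_normal((d_out, d))       # stand-in for an MLP projection matrix
K = rng.standard_normal((d, r))           # stand-in activations of the forget sample
refusal_dir = rng.standard_normal(d_out)  # hypothetical refusal direction
V_target = np.tile(refusal_dir[:, None], (1, r))

dW_full = pinv_update_full(W, K, V_target)
dW_low = pinv_update_lowrank(W, K, V_target)
print(np.allclose(dW_full, dW_low))             # both routes yield the same update
print(np.allclose((W + dW_low) @ K, V_target))  # edited layer maps the keys onto the target
```

Because K has full column rank here, the pseudoinverse edit maps the forget-sample keys exactly onto the targets and is the smallest such update in Frobenius norm. For scale, with a hypothetical d = 4096 and r = 8, d^3 is about 6.9 × 10^10, whereas r^3 + r^2 d = 512 + 262,144 = 262,656 (constant factors ignored). The ~3x speedup quoted above is measured against training-based baselines, not between these two variants.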
Performance and Effectiveness
Extensive experiments across domains including harmful knowledge suppression, misinformation correction, and personal data erasure show that RePAIR achieves near-zero forget scores (Acc_f = 0.00, F-RL = 0.00) while preserving model utility (Acc_r up to 84.47, R-RL up to 0.88). RePAIR outperforms six state-of-the-art baselines, establishing it as an effective and practical framework for user-driven model editing.
Implications for User Control
RePAIR gives users transparent, on-device control over what a model has learned, letting them actively remove harmful or sensitive information. By shifting unlearning from model service providers to end users, it gives users direct means to protect their privacy and control their own data.
Future Directions
RePAIR could be extended to multimodal foundation models, bringing interactive unlearning beyond text-based interactions. Such an extension would let users remove their data from systems that process other formats, such as images and audio.
Overall, RePAIR turns machine unlearning from a provider-side retraining task into an interactive, user-driven operation, a step toward more responsible and user-centric AI.
