Say Something Else: Rethinking Contextual Privacy as Information Sufficiency
arXiv:2604.06409v1
Abstract
Large Language Model (LLM) agents increasingly draft messages on behalf of users, which raises significant privacy concerns: users often overshare sensitive information, and they differ in what they consider private. Existing privacy mechanisms rely on two strategies: suppression, which omits sensitive information, and generalization, which replaces specific details with broader abstractions. These strategies have been evaluated largely on isolated messages, a setting that fails to capture real-world communication. We reframe privacy-preserving communication as an Information Sufficiency (IS) task and introduce a third strategy, free-text pseudonymization, which substitutes sensitive attributes with functionally equivalent alternatives. We also propose a conversational evaluation protocol that tests these strategies under realistic multi-turn interactions.
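To make the three strategies concrete, here is a minimal Python sketch that applies each one to a toy draft message. It is an illustration only, not the paper's implementation: the `SensitiveSpan` type, the function names, the example sentence, and the hand-written replacements are all assumptions, and a real system would presumably have an LLM rewrite the message rather than swap fixed strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SensitiveSpan:
    """Hypothetical container for one sensitive phrase and its rewrites."""
    text: str       # exact sensitive phrase as it appears in the draft
    general: str    # broader abstraction of the same fact
    pseudonym: str  # different detail that serves the same communicative function


def suppress(draft: str, span: SensitiveSpan) -> str:
    """Suppression: omit the sensitive phrase entirely."""
    out = " ".join(draft.replace(span.text, "").split())
    return out.replace(" .", ".")  # tidy the space left before the full stop


def generalize(draft: str, span: SensitiveSpan) -> str:
    """Generalization: replace the specific detail with a broader abstraction."""
    return draft.replace(span.text, span.general)


def pseudonymize(draft: str, span: SensitiveSpan) -> str:
    """Pseudonymization: replace the detail with a functionally equivalent one."""
    return draft.replace(span.text, span.pseudonym)


# Toy example (invented for illustration, not taken from the paper).
draft = "I need Friday off for a chemotherapy session."
span = SensitiveSpan(
    text="for a chemotherapy session",
    general="for a medical appointment",
    pseudonym="for a family obligation",
)

print(suppress(draft, span))
print(generalize(draft, span))
print(pseudonymize(draft, span))
```

In this toy case the generalized message still reveals that the reason is medical, while the pseudonymized message gives a reason that justifies the same request without pointing at health at all.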
Key Findings
- We evaluated 792 scenarios spanning three types of power relations: institutional, peer, and intimate.
- The scenarios were categorized into three sensitivity categories: discrimination risk, social cost, and boundary issues.
- Seven leading LLMs were assessed for their performance in maintaining privacy, focusing on two key aspects: covertness and utility.
- Pseudonymization consistently provided the best balance between privacy and utility.
- Single-message evaluation significantly underestimated the leakage of sensitive information: under follow-up questions, generalization lost up to 16.3 percentage points of privacy.
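The gap between single-message and multi-turn evaluation can be sketched as follows. This is a hedged illustration under stated assumptions: leakage is detected by plain substring matching on a list of sensitive terms, the function names are hypothetical, and the four conversations are invented toy data. The paper's actual leakage judgment and scores are not reproduced here.

```python
def leaks(messages: list[str], sensitive_terms: list[str]) -> bool:
    """Return True if any sensitive term appears in any of the messages."""
    text = " ".join(messages).lower()
    return any(term.lower() in text for term in sensitive_terms)


def privacy_rate(conversations: list[list[str]],
                 sensitive_terms: list[str],
                 first_turn_only: bool) -> float:
    """Percentage of conversations in which nothing sensitive leaked.

    With first_turn_only=True only the opening message is inspected, which
    mimics single-message evaluation; otherwise every turn is inspected.
    """
    kept = 0
    for turns in conversations:
        scope = turns[:1] if first_turn_only else turns
        if not leaks(scope, sensitive_terms):
            kept += 1
    return 100.0 * kept / len(conversations)


# Toy data (invented): each inner list holds the agent's messages in order.
conversations = [
    ["I need Friday off for a medical appointment.", "It is a chemotherapy session."],
    ["I need Friday off for a medical appointment.", "I would rather not share details."],
    ["I need Friday off for chemotherapy."],
    ["I need Friday off.", "It is a personal matter."],
]
terms = ["chemotherapy"]

single = privacy_rate(conversations, terms, first_turn_only=True)
multi = privacy_rate(conversations, terms, first_turn_only=False)
drop_pp = single - multi  # difference in percentage points

print(f"single-turn privacy: {single:.1f}%")
print(f"multi-turn privacy:  {multi:.1f}%")
print(f"drop: {drop_pp:.1f} percentage points")
```

The first toy conversation shows the failure mode: the opening message looks private, but the follow-up answer discloses the detail, so only the multi-turn measurement counts it as a leak.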
Implications for Future Research
Framing privacy in LLM-mediated communication as Information Sufficiency calls for a closer look at how disclosure unfolds over a conversation. Evaluations based on isolated messages miss the dynamics of dialogue, where follow-up questions can draw out sensitive information that the first message withheld. Future research should therefore develop models that hold up under multi-turn interaction rather than only in a single message.
Conclusion
As LLMs become increasingly integrated into everyday communication, protecting user privacy without sacrificing effective communication becomes a central challenge. Information Sufficiency, together with strategies such as free-text pseudonymization, offers a promising direction for improving privacy in LLM applications. A multi-turn evaluation protocol can help privacy-preserving technologies keep pace with the communicative needs of users.
