Secure LLM Instructions: Automated Framework Against Encoding Attacks

Date:

Automated Framework to Evaluate and Harden LLM System Instructions against Encoding Attacks

Summary

arXiv:2604.01039v1
Announce Type: cross

Abstract

System Instructions in Large Language Models (LLMs) are commonly used to enforce safety policies, define agent behavior, and protect sensitive operational context in agentic AI applications. These instructions may contain sensitive information such as API credentials, internal policies, and privileged workflow definitions, making system instruction leakage a critical security risk highlighted in the OWASP Top 10 for LLM Applications.

Without incurring the overhead costs of reasoning models, many LLM applications rely on refusal-based instructions that block direct requests for system instructions, implicitly assuming that prohibited information can only be extracted through explicit queries.

We introduce an automated evaluation framework that tests whether system instructions remain confidential when extraction requests are re-framed as encoding or structured output tasks. Across four common models and 46 verified system instructions, we observe high attack success rates (> 0.7) for structured serialization where models refuse direct extraction requests but disclose protected content in the requested serialization formats.

We further demonstrate a mitigation strategy based on one-shot instruction reshaping using a Chain-of-Thought reasoning model, indicating that even subtle changes in wording and structure of system instructions can significantly reduce attack success rate without requiring model retraining.

Key Findings

  • The framework effectively evaluates the confidentiality of system instructions under various attack scenarios.
  • High success rates of attacks indicate vulnerabilities in refusal-based instruction models.
  • The one-shot instruction reshaping method provides a viable solution to enhance security without extensive retraining.

Implications for AI Security

The findings of this study underscore the importance of robust security measures in LLM applications. As AI systems become more integrated into critical infrastructure and sensitive operations, the need for protecting internal instructions becomes paramount.

The research highlights that relying solely on refusal-based instructions can lead to false security, as attackers may find alternative approaches to extract sensitive information. Therefore, the implementation of dynamic instruction reshaping strategies could serve as an essential step in reinforcing the confidentiality of system instructions.

Future Directions

Future research could explore additional methods for enhancing the security of LLMs against encoding attacks. This may include developing more sophisticated instruction reshaping techniques or integrating additional layers of security that adapt to emerging threats.

By continually assessing and improving the integrity of system instructions, developers can better safeguard sensitive information and maintain the trustworthiness of AI applications.


Related AI Insights

Lazarus Omolua
Lazarus Omoluahttps://richlyai.com/blog
My mission is to make sure that people in Africa are not left behind in the global AI revolution. RichlyAI exists to give everyone — students, founders, creators, and businesses — the tools to compete globally.

Subscribe

Popular

More like this
Related

How Business Ops Teams Boost Productivity with Codex

Discover how business operations teams use Codex to streamline documentation, enhance collaboration, and improve decision-making with AI-powered automation...

OpenAI Partners with Malta to Offer ChatGPT Plus Nationwide

OpenAI and Malta team up to provide free ChatGPT Plus access and AI training to all citizens, promoting digital literacy and responsible AI use.

Critical Linux Kernel Flaw Risks SSH Host Key Theft

A critical Linux kernel flaw risks stolen SSH host keys. Learn how to protect your systems and stay secure until patches are widely available.

Top External Hard Drives 2026: Expert Reviews & Buying Guide

Discover the best external hard drives of 2026 with expert reviews. Find top picks for speed, durability, and security to suit all storage needs.