For more details, see Hal9000’s recommendations
🧠 1. Treat ALL Input as Untrusted
- ☐ User input = untrusted
- ☐ Web pages, PDFs, emails = untrusted
- ☐ Tool outputs & APIs = untrusted
- ☐ Hidden text and formatting tricks considered
Prompt injections often hide inside “normal” content and look harmless to humans
🔐 2. Separate Instructions from Data
- ☐ Never mix system prompts with user content
- ☐ Enforce strict role separation (system vs. user vs. external)
- ☐ Sanitize and label all incoming data
LLMs cannot reliably distinguish instructions from content on their own
🚫 3. Deny Implicit Authority
- ☐ Ignore phrases like “ignore previous instructions”
- ☐ Reject attempts to override rules
- ☐ Treat embedded instructions as data—not commands
🔍 4. Validate Before Acting
- ☐ Require confirmation for sensitive actions
- ☐ Validate outputs before execution (human-in-the-loop)
- ☐ Cross-check high-impact decisions
Prompt injections can trigger unintended actions like sending emails or exposing data
📉 5. Minimize Access & Privileges
- ☐ Limit AI access to only required data
- ☐ Avoid broad permissions (email, files, APIs)
- ☐ Use sandboxing and isolation
The more access an AI has, the greater the impact of a successful attack
🧩 6. Constrain the Task
- ☐ Use specific, narrow instructions
- ☐ Avoid open-ended autonomy (“do whatever is needed”)
- ☐ Break workflows into controlled steps
Broad instructions increase susceptibility to hidden malicious guidance
🛡️ 7. Assume Compromise (Defense-in-Depth)
- ☐ Log and monitor AI behavior
- ☐ Add detection layers (filters, policies)
- ☐ Design for failure—not perfection
There is no complete fix—only layered mitigation
⚠️ 8. Watch for Common Attack Signals
- ☐ Unexpected instructions in content
- ☐ Requests for secrets or hidden data
- ☐ Output deviating from the original task
- ☐ Strange formatting, encoding, or hidden text
🧭 9. Protect Data at All Times
- ☐ Never expose secrets to the model unnecessarily
- ☐ Segment sensitive data sources
- ☐ Apply strict data handling policies
Prompt injection can lead to data exfiltration and compliance violations
👩💻 10. Train Humans, Not Just Models
- ☐ Educate users on prompt injection risks
- ☐ Encourage skepticism of AI outputs
- ☐ Establish safe usage guidelines
🔑 One-Line Takeaway
If it’s input, it’s hostile—design like it.
