Episode 21 — Separate System, Developer, and User Instructions to Prevent Confused Authority

This episode explains instruction hierarchy as a security control, because SecAI+ scenarios often involve an AI system receiving competing directions from system prompts, developer prompts, user prompts, and untrusted content, and the exam expects you to prevent “confused authority” failures. You will learn what each instruction layer is intended to do, how higher-priority instructions constrain lower-priority requests, and why mixing policy rules with user-provided text creates easy openings for prompt injection and policy bypass. We will work through practical examples where retrieved documents contain embedded commands, where a user attempts to override safety requirements, and where tool outputs include adversarial strings that should never be treated as instructions. You will also learn best practices like separating policy from content, validating instruction boundaries, using explicit allowlists for tool actions, and designing prompts so the model treats external text as data to analyze rather than directives to obey. Produced by BareMetalCyber.com, where you’ll find more cyber audio courses, books, and information to strengthen your educational path. Also, if you want to stay up to date with the latest news, visit DailyCyber.News for a newsletter you can use, and a daily podcast you can commute with.
Episode 21 — Separate System, Developer, and User Instructions to Prevent Confused Authority
Broadcast by