Episode 50 — Use MITRE ATLAS Concepts for AI Threat Modeling and Adversary Behavior

In this episode, we bring a threat-modeling mindset to A I systems by using concepts from MITER Adversarial Threat Landscape for Artificial Intelligence (ATLAS). If you are new to cybersecurity, threat modeling can sound like a fancy exercise, but at its core it is just structured curiosity: you ask what you are protecting, who might attack it, how they might do it, and what you can do to reduce risk. MITER ATLAS helps by cataloging adversary behaviors in a way that feels familiar to anyone who has heard of attacker tactics and techniques in traditional security. The big benefit is that it helps you avoid vague fears and focus on realistic behaviors, like poisoning data, manipulating inputs, stealing models, or abusing integrations. This is especially useful in A I because the system is not only code; it is data, models, pipelines, and human workflows all connected. When you use ATLAS concepts, you gain a vocabulary for describing how an attacker moves through that connected system. That vocabulary makes it easier to communicate risks, design defenses, and plan testing that actually mirrors how abuse happens.

Before we continue, a quick note: this audio course is a companion to our course companion books. The first book is about the exam and provides detailed information on how to pass it best. The second book is a Kindle-only eBook that contains 1,000 flashcards that can be used on your mobile device or Kindle. Check them both out at Cyber Author dot me, in the Bare Metal Study Guides Series.

A good starting point is understanding what threat modeling looks like for an A I system in plain language. You begin by naming assets, which are the things you care about protecting, such as training data, prompts, retrieved documents, model weights, system instructions, user identities, and the tools the model can call. You then identify entry points, which are the places an attacker can interact, such as an endpoint that accepts prompts, an upload feature that ingests documents, a feedback channel that collects user ratings, or a pipeline job that pulls data from a repository. Next you consider adversaries and goals, such as someone trying to extract sensitive data, someone trying to degrade performance, or someone trying to get the model to produce harmful content. Finally, you map possible paths from entry points to assets, which is where the ATLAS mindset becomes helpful. Instead of inventing wild scenarios, you look for known patterns of adversary behavior that match your system’s shape. The result is a clearer picture of where your system is most vulnerable and what controls matter most.

MITER ATLAS is organized around the idea that adversaries use repeatable techniques, and those techniques can be grouped into higher-level goals that resemble tactics. For beginners, you do not need to memorize the catalog, but you should understand the categories of behavior it highlights. Some behaviors focus on the data lifecycle, like poisoning training data or manipulating labels. Some focus on the model itself, like extracting it, stealing it, or causing it to behave incorrectly with crafted inputs. Some focus on the surrounding system, like abusing tool integrations, breaking authentication boundaries, or taking advantage of logging and monitoring gaps. ATLAS encourages you to treat A I attacks as multi-step campaigns rather than single “gotcha” prompts. An attacker might start with probing the endpoint, then discover how the system handles instructions, then escalate to retrieval abuse, and finally cause a leak or an unsafe action. Thinking in steps helps you design layered defenses that interrupt the chain.

One central A I threat behavior is manipulating inputs to cause unsafe or unintended behavior, which includes prompt injection in L L M systems and adversarial examples in broader M L systems. The attacker goal is to shape the model’s output in a way that benefits them, such as bypassing refusal rules, extracting data, or producing harmful content. In ATLAS terms, you can think of this as an influence technique: the attacker is influencing the model’s decision process through crafted input. Threat modeling with this lens leads you to ask practical questions. Where do prompts come from, and how much of the prompt is user-controlled. Does the model receive untrusted text from external sources, like documents or web pages. Are there system instructions that must never be overridden, and how are they protected. What happens if the model receives conflicting instructions. When you answer those, you can propose controls like strict separation of roles, careful handling of retrieved content, and limiting the model’s ability to act on instructions found in data.

Another major category is data poisoning and pipeline manipulation, where the attacker tries to influence the model by corrupting the data it learns from or the processes that produce the model. In ATLAS thinking, the training pipeline is an attack surface, not just a background process. The attacker might insert bad samples into a dataset, alter labels, or compromise a data source the pipeline trusts. The goal might be to degrade accuracy broadly, or to create a targeted backdoor behavior that triggers under specific conditions. Threat modeling prompts you to ask which data sources are trusted, who can write to them, and what validation exists. It also prompts you to examine your retraining cadence, because frequent retraining using untrusted data increases the opportunity for attackers to influence outcomes. Controls include strong access controls for datasets and labeling tools, provenance tracking, sanity checks on data distribution, and review gates before new models are promoted to production. The key idea is that integrity protections belong not just in the app, but in the entire model lifecycle.

Model theft and model extraction are also common adversary goals, because models can be expensive intellectual property and because a stolen model can be used to plan attacks. Extraction often occurs through repeated queries, where an attacker collects inputs and outputs and trains a substitute model. Theft can also occur through compromised storage, leaked artifacts, or overly permissive access to model registries. In ATLAS terms, you can think of this as a form of collection and exfiltration, but applied to A I assets. Threat modeling asks you where model artifacts live, who can access them, and whether access is audited. It also asks how your public endpoints could enable extraction through high-volume querying. Controls include rate limits, anomaly detection for scraping patterns, stronger authentication for valuable endpoints, and segmentation so that internal high-capability models are not exposed like public features. Even if you cannot stop every extraction attempt, you can raise the cost and detect it earlier, which is the practical goal.

ATLAS thinking also draws attention to the surrounding ecosystem: tools, integrations, and privileges. If an A I system can retrieve documents, query databases, or trigger workflows, an attacker may try to use the model as a bridge into those systems. This is not because the model is a skilled hacker, but because the model can be tricked into requesting data or actions it should not. Threat modeling here focuses on the boundary between the model and external capabilities. What credentials does the model service use to call tools. Are those credentials scoped, or do they allow broad access. Does the system require human approval before high-impact actions. Are tool calls logged with enough detail to investigate abuse. Controls include least privilege for tool credentials, strict authorization checks outside the model, and a design where the model suggests actions but cannot execute them without validation. In attacker terms, you are preventing the model from becoming a privilege escalation path.

Another useful ATLAS concept is that adversaries probe and iterate. They do not usually succeed on the first try. They send prompts to learn how the system responds, they look for inconsistencies, and they build a mental map of what is allowed. That probing behavior is an opportunity for defense, because it creates detectable patterns like repeated refusals, repeated rephrasing, and systematic exploration of the endpoint’s limits. Threat modeling should therefore include observability as a control, not as an afterthought. You want to know what normal usage looks like so abnormal usage stands out. You want to capture metadata such as request rates, input sizes, and error patterns, while still protecting privacy. You also want to design user experiences that do not reveal too much about your guardrails, because overly detailed refusal messages can teach attackers how to bypass them. The goal is not secrecy as your main defense, but reducing how much the system helps the attacker learn.

One of the most practical outcomes of using ATLAS concepts is improving how you structure your threat scenarios. Instead of writing a vague scenario like, the model gets hacked, you write a behavior chain like, an attacker submits untrusted text containing embedded instructions, the model follows the instructions and requests restricted documents, the system retrieves those documents without proper authorization checks, and the model outputs sensitive content to the attacker. That chain is specific enough that you can point to controls at each step. You can block the chain by sanitizing input, by enforcing authorization before retrieval, by limiting what is passed into the model, and by filtering outputs. This is what good threat modeling looks like: you map an attacker’s behavior and then decide where to break it. ATLAS helps you write these chains because it encourages you to think in adversary techniques rather than in vague outcomes.

Another practical angle is to use ATLAS as a guide for testing and validation. Once you identify relevant adversary behaviors, you can create evaluation prompts and scenarios that simulate them in a controlled environment. For example, you can test prompt injection attempts that mimic real-world tactics, such as instructions hidden in data, attempts to extract system rules, or attempts to bypass refusals through incremental requests. You can test extraction attempts by simulating high-volume query patterns. You can test poisoning defenses by examining how your pipeline detects unusual changes in data distributions. You can also test tool boundary defenses by attempting to trigger unauthorized tool calls through carefully crafted requests. The beginner lesson is that threat modeling should lead to concrete tests, not just documentation. If you cannot test a control, you do not really know whether it works.

As we wrap up, remember the purpose of MITER ATLAS concepts is to make A I threats feel structured and familiar rather than mysterious. A I systems introduce new assets like models and training data, and new abuse patterns like prompt injection and poisoning, but the security mindset is the same: identify assets, map entry points, understand adversary goals, and design layered defenses that interrupt attacker behavior. ATLAS gives you a shared language for describing those behaviors and a toolkit for thinking in steps, not in slogans. If you carry one key idea forward, let it be that adversaries attack the entire A I system, not just the model. They exploit the data pipeline, the deployment environment, the integrations, and the human workflow around the model. When you threat model with that whole-system view, you build defenses that are realistic, measurable, and resilient even as the technology changes.

Episode 50 — Use MITRE ATLAS Concepts for AI Threat Modeling and Adversary Behavior
Broadcast by