Defense Agentic AI June 18, 2026 · 5 min read

NSA and the Five Eyes Set the Bar for Agentic AI in Defense. The Controls Converge at the Data Boundary.

The U.S. National Security Agency does not co-author cybersecurity guidance often, and it almost never does so on a brand-new technology category. On May 1, 2026, it did both. NSA, the Cybersecurity and Infrastructure Security Agency (CISA), and their counterparts in Australia, Canada, New Zealand and the United Kingdom jointly published "Careful adoption of agentic AI services" — the first Five Eyes guidance written specifically for agentic AI. Its opening sentence sets the stakes: agentic AI systems "increasingly operate across critical infrastructure and defence sectors and support mission-critical capabilities."

For defense contractors, aerospace OEMs and federal mission owners standing up AI agents, this is the closest thing yet to an allied baseline. And read closely, nearly every control it recommends lands in the same place — the boundary where data enters and exits the model.

The headline recommendation is restraint

The guidance is unusually direct about how cautious organizations should be. The authoring agencies recommend adopting agentic AI "with security in mind, assessing its use and never granting it broad or unrestricted access, especially to sensitive data or critical systems," and add that "organisations should only use agentic AI for low-risk and non-sensitive tasks."

That is a high bar for the defense industrial base, where the appeal of agents is precisely that they touch contract repositories, logistics data and classified-adjacent workflows. The guidance frames an agentic system as an LLM wired to external tools, external data sources, memory and planning workflows — and notes that every one of those components "widens the attack surface."

Five risk categories, one common thread

The document organizes agentic risk into five categories: privilege, design and configuration, behaviour, structural and accountability. CyberScoop, reporting on the release, summarized the same five. Privilege risk is the "confused deputy" problem — an over-permissioned agent manipulated into doing what a low-privileged user could not. Behavioural risk includes specification gaming and prompt injection. Accountability risk is the one defense auditors will feel first: agentic systems, as CyberScoop put it, "make decisions through processes that are difficult to inspect and generate logs that are hard to parse."

Different categories, one common thread — controlling the data and instructions crossing into and out of the model.

The controls converge at the data boundary

Among the guidance's design recommendations is a line that reads like a product spec for a governance proxy: "Apply security controls at all points where information enters or exits the system, including user inputs, tool calls, data pre-processing and model inference." It explains why. Agentic systems "insert data from tools and memory bases into the context window of LLM agents, greatly expanding the attack surface," and external data sources such as web search "can insert additional information into the prompt context, enabling indirect prompt injection attacks."

The recommended mitigations are concrete and they all sit on that path. Integrate "prompt injection filters and semantic analysis to detect malicious instructions." Implement "robust input validation and sanitisation for all agent inputs." "Validate context to ensure the system correctly interprets intent before execution." And — the control most enterprises are missing — "implement data loss prevention controls specifically tuned to AI agent behaviours."

None of those live inside the model. They live at the boundary, on the path between a user (or a tool) and the LLM.

Accountability is an audit-log problem

The guidance is equally specific on the accountability gap. It warns that "fragmented logs, opaque agent reasoning and emergent interactions obscure the decision path, making it hard to explain the outcome, assign responsibility, or demonstrate compliance." Its answer is a logging discipline most AI deployments don't have: "integrate unified audit logs for all inter-agent interactions," "log agent tool usage and ensure results are captured in system logs in a human-readable format," and — a control with obvious insider-threat value — "quarantine requests to delete logs or audit records until reviewed and approved by a human."

For a CMMC- or RMF-scoped contractor, "demonstrate compliance" is not a metaphor. It is the evidence an assessor asks for. Human-readable, tamper-resistant logs of what crossed the model boundary are exactly that evidence.

Two complementary layers

It is worth separating two things the guidance treats together. One layer governs what an agent is allowed to do — least privilege, cryptographically anchored identity, human sign-off on high-impact actions. The guidance is blunt that "decisions about when human approval is required are determined by system designers or operators, not delegated to the agentic AI system."

The other layer governs what data crosses the LLM boundary — the prompts, the retrieved context, the tool outputs. The guidance points operators at the OWASP Top 10 for Agentic Applications for 2026 and MITRE ATLAS for threat modelling, and recommends "continuous runtime authentication with centralised policy decision points for each action." These two layers are complementary, not interchangeable. An agent can be perfectly scoped and still leak controlled information through an unfiltered prompt; a data boundary can be airtight and still be abused by an over-privileged agent. Defense programs need both.

What this means for the defense industrial base

The agencies say plainly that agentic AI is "already being deployed in critical infrastructure and defense sectors with insufficient safeguards." The gap is not awareness — it is instrumentation. Most organizations cannot see, filter, or prove what their employees and agents send to a model. The recommended controls — input filtering at the boundary, DLP tuned to model interactions, human-readable unified audit logs, runtime policy decisions at a central point — describe an enforcement layer that has to sit between the user and the model, not bolted on after the fact.

That layer is what Containment.AI builds. We enforce governance policies in real time at the proxy layer, in the browser, and in the admin dashboard — monitoring, enforcing and auditing AI usage before sensitive data leaves the organization. Policies evaluate what crosses the boundary at the moment of send, and every decision is recorded in a form a human reviewer, or an auditor, can actually read. See how it works for defense →

The guidance closes with a warning the defense industrial base should take literally: "Until security practices, evaluation methods and standards mature, organisations should assume that agentic AI systems may behave unexpectedly and plan deployments accordingly, prioritising resilience, reversibility and risk containment over efficiency gains."

Ready to close the gap?

Talk to us about runtime AI governance for regulated environments.

Schedule a Conversation →