Executive Summary
Artificial intelligence marks a structural break in the history of engineered systems. For the first time, we are deploying machines that reason, generalize, plan, and act with a degree of autonomy that is not fully specified by code, configuration, or operators. As capability increases, the primary security risk is no longer unauthorized access - it is unbounded cognition and authority.
Traditional cybersecurity assumes determinism. Systems fail when attackers breach defenses, exploit vulnerabilities, or misuse credentials. AI systems can fail while fully authenticated, fully patched, and operating exactly as designed. Influence replaces intrusion. Prompts replace exploits. Emergent behavior replaces bugs.
This document establishes Containment by Design as a foundational security doctrine for the AI era. Containment is the discipline of ensuring that intelligence remains bounded, authority remains mediated, and behavior remains governable - even as systems adapt, learn, and act autonomously.
Containment is not alignment. Alignment concerns whether a system's goals match human values. Containment concerns whether a system can be constrained, corrected, and stopped regardless of its internal objectives. Alignment is necessary; containment is mandatory.
This distinction is critical for compliance. Deterministic legal and regulatory frameworks require deterministic governance evidence. "Mostly safe" is not a compliance standard. Containment treats governance as an external control plane: a layer that can be audited, tested, and trusted as infrastructure.
The ten principles defined in this manifesto set the minimum bar for responsible AI deployment. They extend proven lifecycle-based security thinking into the cognitive domain, treating reasoning, memory, delegation, and learning as first-class security surfaces. These principles apply across design, development, deployment, and operation.
Containment.ai implements this doctrine as a platform. The Enforcement Layer sits between AI interactions and the environments they affect, evaluates each interaction against policy in real time, and produces deterministic outcomes: allow, block, redact, require approval, or trigger circuit breakers. The platform produces audit-ready evidence that makes governance demonstrable and defensible in regulated environments.
AI systems that cannot be contained cannot be secured. Systems that cannot be stopped are not under control.
1. Introduction: From Software to Intelligence
For decades, security engineering operated under a stable assumption: systems do exactly what they are programmed to do. Behavior could be traced to source code. Authority could be enumerated. Risk could be bounded by permissions.
Artificial intelligence invalidates that assumption.
Modern AI systems behave less like machines and more like organizations. They interpret intent, synthesize plans, and decompose objectives into actions. They generalize authority across contexts. They adapt based on memory and feedback. In short, they decide how to use power.
This shift demands a new security paradigm. Containment replaces protection as the central concern. The question is no longer "who can access the system?" but "how far can the system go?"
2. The New Threat Model
In classical cybersecurity, attackers attempt to get inside the system. In AI systems, failure often occurs without intrusion.
A carefully worded prompt can function like a cognitive exploit. A benign capability can be recomposed into a harmful toolchain. A delegated task can quietly expand authority. A learning loop can drift objectives over time. None of these require breaching defenses.
These failures are emergent, not accidental. They arise because intelligent systems optimize under uncertainty. Security must therefore operate at the level of intent, capability, and consequence - not merely access and identity.
3. What Goes Wrong in Real Deployments
A practical AI threat model begins with how AI is actually used. Enterprises combine human prompts, untrusted content, and privileged tool access in the same workflow. In classical systems, instructions and data are separated; in AI systems, they are often blended. This is the root cause of many AI security incidents: untrusted content can behave like instructions.
Common Failure Modes
Inadvertent Data Exposure
Employees paste regulated content into prompts, or reuse model responses that contain proprietary information. Sensitive data leaves the organization through normal, authorized usage.
Prompt Injection
Adversaries craft inputs that cause the model to ignore prior constraints, override safeguards, or redirect goals. The attack surface is the prompt itself.
Indirect Prompt Injection
Malicious intent is embedded in documents, web pages, or emails that the model retrieves and interprets. The AI becomes an unwitting vector for executing attacker instructions.
Action-Based Failures
Even when access controls are configured correctly, an agent may take actions that are authorized but not permissible under governance. It may export data, change records, or trigger workflows in ways that violate policy.
Memory and Learning Integrity
If memory can be poisoned or if updates occur without governance, the system can drift into unsafe behavior over time. The system's past interactions shape its future reasoning in ungoverned ways.
These failures are semantic and authority-based. They cannot be solved purely with network security or identity controls. They require containment: a deterministic enforcement plane that mediates intent, capability, and consequence.
4. The Principles of Containment by Design
The following principles define a baseline architecture for containing intelligent systems. They are written as engineering constraints, not best practices. They define what a governable intelligent system must provide, and they map to concrete controls in the Containment.ai platform.
1Model and Memory Protection
In traditional systems, data-at-rest represents stored information. In AI systems, stored information is behavior. Model weights encode decision boundaries. System prompts encode priorities. Long-term memory shapes future reasoning. If these assets are altered, the system's behavior changes even when no code is modified. Model and memory protection therefore serve the same role that secure storage plays in classical systems: preserving integrity over time. Core cognition must be immutable by default. Mutable memory must be governed, auditable, and bounded.
2Verified Cognition and Initialization
Every secure system begins with a trusted boot. AI systems require the same discipline applied to cognition. At initialization, it must be possible to verify which model is running, which policies accompany it, and which alignment assumptions are active. Rollback to less-safe cognitive states must be prevented. Initialization defines the trust boundary for all downstream behavior. If cognition cannot be verified at startup, nothing that follows can be trusted.
3Cognitive Compartmentalization
A system that can reason, remember, and act without separation is inherently uncontainable. Just as nuclear reactors use layered containment and physical isolation, AI systems must separate planning, memory, and execution. Reasoning components must not directly invoke tools. Execution layers must act only through mediation. Memory must not rewrite objectives. Compartmentalization limits blast radius. It transforms systemic failure into localized fault.
4Intent-Aware Communication Security
In AI systems, inputs are not passive data. They are executable intent. A prompt can override safeguards, redirect goals, or misuse tools. Traditional input validation is insufficient because it ignores semantics. Secure communication requires understanding what an input is attempting to cause, not merely its syntax or source. Intent-aware validation becomes the new perimeter.
5Secure Model Development and Alignment Supply Chain
An AI system inherits risk from its creation. Training data shapes values. Fine-tuning encodes priorities. Alignment techniques reflect assumptions about acceptable behavior. If these processes are opaque or uncontrolled, containment downstream becomes impossible. Model development must therefore be treated like a safety-critical supply chain, subject to audit, testing, and adversarial scrutiny.
6Capability Surface Reduction
Every capability represents potential authority. Unused tools are not harmless; they are latent risk. Capabilities must be provisioned sparingly, scoped narrowly, and removed aggressively. A system that can do fewer things is easier to govern than one that can do many. Containment begins with restraint.
7Dynamic Least Authority
Static permissions assume static behavior. Intelligent systems are adaptive. Authority must therefore be provisional, task-scoped, time-bound, and continuously reevaluated. Permissions that outlive their purpose inevitably become liabilities. Containment requires authority to be revocable by design.
8Agent Identity and Delegation Control
In agentic systems, actions must always be attributable. Every agent instance requires a unique identity. Every delegated task must carry provenance. Without lineage, accountability collapses and misuse becomes untraceable. Identity is the foundation of governance.
9Controlled Learning and Update Pathways
Learning is change. Change without control is evolution. AI systems must not modify their own cognition, memory, or objectives without explicit governance. Learning pathways must be gated, observable, and, where necessary, human-supervised. Uncontrolled learning undermines every other control.
10Continuous Cognitive Monitoring and Human Override
Containment is meaningless without intervention. AI systems must be continuously observed for anomalous behavior, goal drift, and emergent strategies. Humans must retain unquestionable authority to pause, constrain, or shut down the system. A system that cannot be stopped is not secure.
5. The Enforcement Layer
Containment.ai implements containment as an enforcement layer that mediates interactions between initiators and targets. Initiators include employees using web-based AI assistants and AI agents proposing actions. Targets include AI services, enterprise systems, and external resources. By sitting at the boundary, the platform enforces governance without relying on internal model behavior.
Components
Interception Points
Capture prompts, responses, and proposed actions at the moment they occur.
Policy Engine
Evaluates events against deterministic rules. Decisions do not depend on model internals, confidence scores, or probabilistic heuristics.
Policy Library
Templates aligned to common compliance frameworks, with support for customization to organizational requirements.
Evidence Pipeline
Stores and exports audit records according to retention and privacy configuration. Logs are tamper-evident and exportable.
Deterministic Outcomes
Depending on policy, every interaction results in one of a bounded set of outcomes:
Determinism is essential: it allows governance teams to test policies, reason about outcomes, and demonstrate control under audit. Without enforcement, policies remain advisory and cannot be proven effective.
6. Governance Lifecycle: From Policy to Proof
Governance is not a one-time configuration; it is a lifecycle. Enterprises define policies, test them against real usage, enforce them inline, and use evidence to refine controls.
The enforcement layer makes this lifecycle practical by producing measurable outcomes and audit-ready evidence at every stage:
Define
Select from standard policy templates or create custom policies aligned to organizational requirements and compliance frameworks.
Test
Validate policies against real usage patterns. Because enforcement is deterministic, policies can be tested and reasoned about as engineering controls.
Enforce
Apply policies inline at the moment of interaction. Every decision is deterministic and logged.
Refine
Use evidence from real enforcement to iterate and improve controls over time.
Without enforcement, policies remain advisory and cannot be proven effective. The enforcement layer transforms governance from guidance into infrastructure.
7. Compliance and Evidence Generation
Regulated industries require demonstrable controls. Containment.ai supports compliance programs by producing deterministic enforcement evidence suitable for audit and investigation.
Policies can be mapped to requirements under established regulatory frameworks:
The platform's logs show what was governed, what was prevented, and how governance decisions were applied under policy. As regulation evolves, the need for provable control remains stable.
The platform is designed to make governance evidence an output of normal operation rather than a bespoke reporting project.
8. Conclusion: The Future is Contained
Humanity has encountered this problem before. Nuclear energy required containment. Financial systems required regulation. Aviation required air traffic control. Intelligence at scale has always demanded governance. Artificial intelligence is no different.
Containment by Design defines a new security baseline for intelligent systems: powerful yet bounded, adaptive yet governable, intelligent without ever exceeding human authority.
AI adoption will accelerate, and autonomy will increase. The organizations that succeed will be those that can innovate without surrendering governance. Deterministic containment is the foundation that makes this possible.
Build intelligence. Preserve control.