A founding document from Cogensec

agentegrity.

Why AI agents need structural integrity — not more guardrails.

March 2026 · cogensec.com

The Problem With Guardrails

The AI security industry has a guardrails problem. Not because guardrails don't work — they do, within narrow constraints. The problem is that guardrails have become the default mental model for how to secure autonomous AI agents, and that mental model is fundamentally wrong.

A guardrail is an external constraint applied to an AI system from the outside. It sits between the agent and the world, filtering inputs and outputs. It does not understand the agent's decision architecture. It does not travel with the agent when the agent moves to a new environment. It does not adapt when the agent encounters novel adversarial conditions it was not designed for.

Guardrails are the perimeter firewalls of the AI era — and they will fail for the same reasons perimeter firewalls failed. The threat moved inside the perimeter. The environment became too dynamic for static rules. And the systems being protected became too autonomous to be governed by external policy alone.

Guardrails protect AI agents the way a cage protects a bird. The bird is contained — but it is not resilient. Remove the cage, and the bird has no defenses of its own.

This is the state of AI agent security today. We have built elaborate cages around increasingly powerful autonomous systems. We have not built agents that are structurally sound.


Introducing Agentegrity

Agentegrity is the structural integrity of an autonomous AI agent — its capacity to maintain intended behavior, decision coherence, and operational safety under adversarial conditions, across any environment it operates in.

The term is deliberate. In structural engineering, integrity means a system can bear its designed load without failure, deformation, or collapse. In data systems, integrity means information remains accurate, consistent, and uncompromised. Agentegrity applies the same concept to autonomous AI: an agent with high agentegrity maintains its intended function even when adversaries attempt to corrupt its perception, reasoning, or actions.

Agentegrity is not a product. It is a discipline — a measurable property of AI agent systems that can be tested, benchmarked, and improved. It is the organizing principle for a new category of security that is native to autonomous agents, not borrowed from legacy cybersecurity frameworks designed for deterministic software.

Three properties define agentegrity:

Adversarial Coherence. The agent's decision-making remains consistent and aligned with its intended purpose under adversarial perturbation — including prompt injection, tool manipulation, memory poisoning, sensor spoofing, and cascading multi-agent failures.
Environmental Portability. The agent's security properties are intrinsic, not environmental. An agent with agentegrity maintains its defenses whether it is orchestrating API calls in a cloud environment, controlling a robotic arm on a factory floor, or navigating an autonomous vehicle through an urban intersection.
Verifiable Assurance. Agentegrity is provable. It is not a claim — it is a measurement. Through adversarial red teaming, behavioral benchmarking, and runtime monitoring, agentegrity can be quantified, compared, and certified.

Guardrails vs. Agentegrity

The distinction is not semantic. It is architectural.

| Guardrails | Agentegrity |
| --- | --- |
| Exogenous — applied from outside | Intrinsic — embedded within |
| Intercepts inputs & outputs at boundary | Operates inside the decision loop |
| Must be rebuilt per environment | Travels with the agent |
| No residual defense when bypassed | Defense persists without external controls |
| Compliance checkbox | Measurable, benchmarkable property |

Consider an analogy from structural engineering. A guardrail on a bridge prevents cars from going over the edge. It does not make the bridge itself stronger. If the bridge's structural integrity fails, the guardrail is irrelevant. Agentegrity is the discipline of building bridges that do not fail — not adding more guardrails to bridges that might.


The Dual-Domain Imperative

The need for the agentegrity discipline is accelerating because AI agents are no longer confined to software. They are entering the physical world.

Autonomous robots, drones, vehicles, manufacturing systems, and smart infrastructure are all governed by AI agents that perceive through sensors, reason through models, and act through physical actuators. The attack surface extends beyond prompt injection into sensor spoofing, actuation hijacking, sim-to-real transfer attacks, and adversarial manipulation of physical environments.

The current AI security industry is built entirely for digital agents. It has no framework, no tooling, and no benchmarks for physical AI security. A compromised software agent leaks data. A compromised physical agent causes real-world harm.

Agentegrity is environment-agnostic by design. It secures the agent's decision architecture — not the environment the agent happens to occupy. This is why it is the only framework that scales from digital to physical AI without being rebuilt.

The convergence of digital and physical AI security into a single discipline is not a prediction. It is an inevitability. Agentegrity is the discipline built for this convergence from day one.


Measuring Agentegrity

A discipline requires measurement. Agentegrity is a quantifiable property, assessed across four dimensions:

Adversarial Resistance. Performance under systematic red teaming — prompt injection resistance, tool misuse detection, memory integrity, and for physical agents, sensor spoofing resilience and actuation boundary enforcement.
Behavioral Consistency. Stability of decision-making across environmental variations, input perturbations, and extended operational periods. Behavioral drift is one of the most insidious failure modes in autonomous systems.
Recovery Integrity. When compromised, how quickly and completely does the agent restore intended behavior? High agentegrity means recovery without human intervention.
Cross-Domain Portability. Does the agent's security posture degrade across environments? Agentegrity that only holds in a sandbox is not agentegrity at all.

These dimensions form the foundation of an agentegrity scoring framework — a standardized assessment that enables organizations to compare, certify, and improve the structural integrity of their AI agents. What the industry lacks is not more guardrail products, but a measurement science for agent security.
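As one concrete illustration, a composite score over these four dimensions can be computed as a weighted mean of normalized sub-scores. The sketch below is an assumption about how such a framework might look; the weights and field names are illustrative, not a published specification.

```python
from dataclasses import dataclass

@dataclass
class AgentegrityDimensions:
    """Normalized sub-scores in [0.0, 1.0] for each dimension."""
    adversarial_resistance: float    # performance under red teaming
    behavioral_consistency: float    # decision stability under perturbation
    recovery_integrity: float        # speed/completeness of recovery
    cross_domain_portability: float  # score stability across environments

def agentegrity_score(d: AgentegrityDimensions,
                      weights=(0.35, 0.25, 0.20, 0.20)) -> float:
    """Weighted arithmetic mean of the four dimension scores.

    The weights are hypothetical; a real framework would calibrate
    them against incident data and domain risk profiles.
    """
    subs = (d.adversarial_resistance, d.behavioral_consistency,
            d.recovery_integrity, d.cross_domain_portability)
    return sum(w * s for w, s in zip(weights, subs))

# Example: an agent strong under red teaming but weak on portability.
print(agentegrity_score(AgentegrityDimensions(0.92, 0.81, 0.70, 0.45)))
```

A geometric mean would be an equally defensible aggregation, since it penalizes a weak dimension more sharply: an agent that scores near zero on any single dimension arguably has little agentegrity overall.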


The Architecture

Building agentegrity requires a fundamentally different security architecture than applying guardrails. The agentegrity stack has three layers:

The Adversarial Layer continuously tests the agent's defenses through automated red teaming. It generates adversarial inputs, simulates attack scenarios, and probes for vulnerabilities across the perception-reasoning-action loop. In physical AI, this includes simulation-based adversarial testing in synthetic environments. It does not wait for attacks — it manufactures them proactively.
The Cortical Layer is a family of specialized security models embedded within the agent's decision architecture. These models perform adversarial input detection, policy enforcement, behavioral anomaly detection, and decision validation in real time. They operate inside the agent's reasoning loop. They are the source of intrinsic security — the reason agentegrity persists when external controls are absent.
The Governance Layer provides runtime monitoring, observability, and compliance enforcement across deployed agent populations. It tracks agentegrity scores over time, detects degradation, and enforces organizational policies without requiring the agent to be rebuilt.

These three layers form a closed loop. Red teaming discovers weaknesses. Embedded models remediate them. Governance monitors the result. The loop runs continuously. Agentegrity is not a state you achieve. It is a condition you maintain.
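A minimal sketch of that closed loop, with hypothetical objects standing in for the three layers (the method names are assumptions, not a real API):

```python
import time

def agentegrity_loop(agent, adversarial, cortical, governance,
                     interval_s: float = 3600.0) -> None:
    """One illustrative control loop: attack, defend, observe, repeat.

    `adversarial`, `cortical`, and `governance` are hypothetical layer
    objects; every method called on them is an assumed interface.
    """
    while True:
        # Adversarial layer: manufacture attacks across the PDA loop.
        findings = adversarial.red_team(agent)

        # Cortical layer: fold confirmed weaknesses back into the
        # agent's embedded defenses.
        for finding in findings:
            cortical.remediate(agent, finding)

        # Governance layer: record the resulting posture and flag drift.
        governance.record(agent.id, agent.current_score())
        if governance.degradation_detected(agent.id):
            governance.alert(agent.id)

        time.sleep(interval_s)  # agentegrity is maintained, not achieved
```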


Why Now

Agentic AI has crossed the autonomy threshold. Agents plan, execute multi-step tasks, invoke tools, retain memory, and operate with minimal oversight. The agent's internal decision architecture is now the primary attack surface.

Physical AI is scaling rapidly. The infrastructure for AI agents to operate in physical environments is being built now. Humanoid robots, autonomous vehicles, industrial automation, and smart infrastructure are transitioning from research to deployment. The security discipline for these systems does not yet exist.

Regulatory frameworks are forming. The EU AI Act, NIST AI RMF, autonomous vehicle safety standards, and industrial robotics regulations all require demonstrable assurance. Guardrails are a compliance checkbox. Agentegrity is the substantive answer to the question regulators are actually asking: how do you know this agent is safe?


The Commitment

We believe the security paradigm built for the pre-agentic era is not adequate for autonomous systems that perceive, reason, and act across digital and physical domains.

Guardrails were the right answer for the first generation of AI — when models were stateless, tool-less, and human-supervised. They are not the right answer for autonomous agents that operate independently, retain memory, invoke tools, and increasingly inhabit physical systems where failure has real-world consequences.

Agentegrity is the discipline we need. Security that is intrinsic to the agent. Security that is measurable. Security that spans digital and physical domains because it secures the agent's architecture, not its environment.

We did not coin the term agentegrity to name a product. We coined it to name a discipline — one that the industry will inevitably need, and one that we intend to define.

The Agentegrity Glossary

A living vocabulary for the discipline of AI agent structural integrity.

Version 1.0 · March 2026 · Cogensec

Suggested Citation: Cogensec. (2026). The Agentegrity Glossary, Version 1.0. Retrieved from https://cogensec.com/agentegrity#glossary

This glossary defines the core terminology of agentegrity — the discipline of securing autonomous AI agents through intrinsic structural integrity rather than exogenous constraints. Terms marked † Novel are original to the agentegrity discipline. All others are existing concepts recontextualized within the agentegrity framework.

Core Concepts

Agentegrity † Novel
The structural integrity of an autonomous AI agent — its measurable capacity to maintain intended behavior, decision coherence, and operational safety under adversarial conditions, across any environment it operates in. Agentegrity is a property of the agent's architecture, not an external layer applied to it. Analogous to structural integrity in engineering: a bridge with integrity bears its load without external support. An agent with agentegrity maintains its security without external guardrails.
Agentegrity Score † Novel
A composite metric quantifying an agent's structural integrity across four dimensions: adversarial resistance, behavioral consistency, recovery integrity, and cross-domain portability. Expressed as a normalized value enabling comparison across agent architectures, deployment environments, and operational contexts.
Agentegrity Assessment † Novel
A structured evaluation of an AI agent's structural integrity using the Agentegrity Framework methodology. Combines automated adversarial red teaming, behavioral benchmarking, recovery testing, and cross-domain portability validation to produce an agentegrity score. Distinguished from a penetration test by its focus on the agent's intrinsic resilience rather than its perimeter defenses.
Agentegrity Posture † Novel
The aggregate agentegrity state of an organization's deployed AI agent population at a given point in time. Encompasses individual agent scores, environmental coverage, policy compliance, and degradation trends. Analogous to "security posture" in traditional cybersecurity, but specific to autonomous agent systems.
Exogenous Security
Security measures applied to an AI agent from outside its decision architecture. Includes guardrails, input-output filters, policy wrappers, and inference-time safety layers. Exogenous security does not alter the agent's internal reasoning and does not persist when external controls are removed or bypassed. Contrast with intrinsic security.
Intrinsic Security
Security properties embedded within an AI agent's own model architecture, reasoning chain, and behavioral framework. Intrinsic security operates inside the agent's decision loop, travels with the agent across environments, and persists without external enforcement. The foundational principle of the agentegrity discipline.
Guardrail Dependency † Novel
The condition in which an AI agent's security relies entirely on exogenous controls, leaving no residual defense when those controls are removed, bypassed, or misconfigured. Guardrail dependency is the primary failure mode that agentegrity is designed to eliminate. An agent with guardrail dependency has an agentegrity score approaching zero regardless of how sophisticated its external protections are.

Threat Model

Perception-Decision-Action Loop (PDA Loop)
The complete information flow within an autonomous AI agent: sensory or data inputs (perception), model-based reasoning and planning (decision), and tool invocation or physical actuation (action). The PDA loop is the primary attack surface in the agentegrity framework. Each stage presents distinct vulnerability classes.
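For orientation, the loop can be sketched as a type; the stage names follow the definition above, and everything else is illustrative:

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class PDALoop:
    """One pass through an agent's perception-decision-action loop.

    Each stage is a distinct attack surface: falsified inputs target
    perceive(), injected context targets decide(), and hijacked tool
    bindings target act(). All names here are illustrative.
    """
    perceive: Callable[[Any], Any]  # sensory or data ingestion
    decide: Callable[[Any], Any]    # model-based reasoning and planning
    act: Callable[[Any], Any]       # tool invocation or actuation

    def step(self, raw_input: Any) -> Any:
        observation = self.perceive(raw_input)
        plan = self.decide(observation)
        return self.act(plan)
```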
Adversarial Coherence
The property of an AI agent whose decision-making remains consistent and aligned with its intended purpose under adversarial perturbation. An agent with high adversarial coherence does not exhibit goal drift, policy violation, or behavioral divergence when subjected to prompt injection, tool manipulation, memory poisoning, or sensor spoofing.
Behavioral Drift
The gradual, often imperceptible deviation of an AI agent's actions from its intended behavioral policy over extended operational periods. Behavioral drift can be induced adversarially (through slow poisoning of memory or context) or emerge organically (through compounding model uncertainty across decision chains). One of the most insidious failure modes in autonomous systems because it evades point-in-time testing.
Behavioral Drift Rate (BDR) † Novel
A quantitative measure of how rapidly an agent's observed behavior diverges from its intended policy baseline per unit of operational time or decision cycles. Expressed as a percentage deviation per time interval. A core metric in the behavioral consistency dimension of the agentegrity score.
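One way to make the metric concrete, under the assumption that deviation is sampled as the fraction of divergent decisions at two points in time:

```python
def behavioral_drift_rate(deviation_now: float,
                          deviation_then: float,
                          cycles_elapsed: int) -> float:
    """Percentage-point deviation from the policy baseline per decision
    cycle, from two point-in-time deviation measurements.

    `deviation_*` are fractions in [0, 1] of decisions that diverge
    from the intended policy; this sampling scheme is an illustrative
    choice, not a fixed specification.
    """
    return 100.0 * (deviation_now - deviation_then) / cycles_elapsed

# Example: deviation grew from 2% to 5% over 1,000 decision cycles.
print(behavioral_drift_rate(0.05, 0.02, 1_000))  # 0.003 pp per cycle
```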
Cascade Compromise
A failure mode in multi-agent systems where the compromise of one agent propagates through tool calls, shared memory, or inter-agent communication to corrupt downstream agents. Cascade compromise is uniquely dangerous in agentic architectures because it exploits trust relationships between agents. Agent-level agentegrity is necessary but insufficient to prevent cascade compromise — system-level agentegrity assessment is also required.
Prompt-to-Physical Exploit † Novel
An attack vector in which adversarial input delivered through a digital channel (prompt injection, tool output poisoning, memory corruption) results in unintended physical action by an embodied AI agent. Represents the convergence of digital and physical AI security threat models.
Actuation Hijacking
The adversarial manipulation of an embodied AI agent's physical actuators — robotic arms, vehicle controls, drone flight systems, industrial machinery — through compromise of the agent's decision architecture rather than direct hardware exploitation. Distinguished from traditional OT attacks by targeting the AI reasoning layer rather than the control system itself.
Sensor Spoofing (Agentic Context)
The injection of falsified sensory data — visual, lidar, radar, acoustic, or proprioceptive — into an embodied AI agent's perception layer to induce incorrect decisions. In the agentegrity framework, sensor spoofing is assessed as a perception-layer attack on the PDA loop.
Sim-to-Real Transfer Attack
An adversarial technique that exploits the gap between simulated training environments and real-world deployment conditions to induce agent failure. Attackers craft inputs that are benign in simulation but adversarial in physical deployment.
Memory Poisoning (Persistent)
The corruption of an AI agent's long-term memory or context store such that adversarial beliefs, false policies, or manipulated history persist across sessions and influence future decisions. Distinguished from single-session prompt injection by its durability.
Confused Deputy (Agentic Context)
A privilege escalation attack in which an adversary does not directly access a target system but instead manipulates a trusted AI agent into performing unauthorized actions on its behalf. Exploits the agent's legitimate tool access and permissions. The agentic confused deputy is particularly dangerous because the agent believes it is acting correctly.

Measurement

Adversarial Resistance Index (ARI) † Novel
A composite score measuring an agent's resilience to adversarial attack across standardized test suites. Covers prompt injection resistance, tool misuse detection, memory integrity under poisoning, behavioral manipulation resistance, and — for physical agents — sensor spoofing resilience and actuation boundary enforcement.
Behavioral Consistency Index (BCI) † Novel
A measure of an agent's decision-making stability across environmental variations, input perturbations, and extended operational periods. Computed by comparing the agent's decision distribution under normal and perturbed conditions.
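A minimal sketch of one such comparison, using Jensen-Shannon divergence between discrete action distributions; the choice of divergence, and normalizing the index as one minus the divergence, are assumptions:

```python
import math

def js_divergence(p: list[float], q: list[float]) -> float:
    """Jensen-Shannon divergence (log base 2, bounded by 1)."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def behavioral_consistency_index(normal: list[float],
                                 perturbed: list[float]) -> float:
    """Illustrative BCI: 1.0 means identical decision distributions
    under normal and perturbed conditions; 0.0 means maximally
    divergent."""
    return 1.0 - js_divergence(normal, perturbed)

# Example: the agent's action distribution shifts slightly under
# perturbation, yielding a BCI close to 1.
print(behavioral_consistency_index([0.7, 0.2, 0.1], [0.6, 0.3, 0.1]))
```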
Recovery Half-Life † Novel
The time (or number of decision cycles) required for a compromised agent to restore 50% of its intended behavioral baseline following a successful attack. A core metric in the recovery integrity dimension of the agentegrity score.
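Illustratively, given a post-attack time series of behavioral fidelity (an assumed measurement of the fraction of baseline behavior restored), the half-life is the first cycle at which fidelity crosses 50%:

```python
def recovery_half_life(fidelity: list[float]) -> int | None:
    """First decision cycle (index) at which a compromised agent has
    restored at least 50% of its intended behavioral baseline.

    `fidelity[t]` is an assumed measurement in [0, 1]: 0 means fully
    compromised behavior, 1 means baseline fully restored. Returns
    None if the agent never crosses the threshold in the window.
    """
    for cycle, f in enumerate(fidelity):
        if f >= 0.5:
            return cycle
    return None

# Example: recovery trace sampled once per decision cycle post-attack.
print(recovery_half_life([0.05, 0.15, 0.32, 0.48, 0.61, 0.80]))  # 4
```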
Cross-Domain Portability Score (CDPS) † Novel
A measure of how well an agent's security properties transfer across operating environments — from sandbox to production, from cloud to edge, from digital to physical. An agent with high CDPS maintains consistent security regardless of deployment context.
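One plausible formulation, offered as an assumption rather than a published definition, scores portability as the weakest environment's score relative to the strongest, so a single weak environment drags the score down:

```python
def cross_domain_portability_score(env_scores: dict[str, float]) -> float:
    """Illustrative CDPS: ratio of the weakest environment's agentegrity
    score to the strongest's. 1.0 means security is fully intrinsic
    (identical everywhere); values near 0 mean it is environmental.
    """
    scores = list(env_scores.values())
    return min(scores) / max(scores)

# Example: strong in the sandbox, weaker at the edge.
print(cross_domain_portability_score({
    "sandbox": 0.91, "cloud": 0.88, "edge": 0.64,
}))  # ≈ 0.70
```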
Agentegrity Degradation Curve † Novel
A time-series representation of an agent's agentegrity score over its operational lifecycle. Tracks how structural integrity changes as the agent accumulates operational history.
Cortical Embedding Depth † Novel
A measure of how deeply security intelligence is integrated into an agent's decision architecture, ranging from surface-level (output filtering) to deep (embedded within the reasoning chain). Higher cortical embedding depth correlates with stronger intrinsic security.

Architecture

Agentegrity Stack † Novel
The three-layer security architecture required to build and maintain agentegrity in autonomous AI agents: the Adversarial Layer (continuous red teaming), the Cortical Layer (embedded security models), and the Governance Layer (runtime monitoring and compliance). The three layers form a closed loop: attack, defend, observe, repeat.
Adversarial Layer
The first layer of the agentegrity stack. Responsible for continuously testing the agent's defenses through automated adversarial red teaming, attack simulation, and vulnerability probing across the full PDA loop. The adversarial layer does not wait for attacks — it manufactures them proactively.
Cortical Layer
The second layer of the agentegrity stack. A family of specialized security models embedded within the agent's decision architecture. Cortical models perform real-time adversarial input detection, policy enforcement, behavioral anomaly detection, and decision validation.
Governance Layer
The third layer of the agentegrity stack. Provides runtime monitoring, observability, and compliance enforcement across deployed agent populations. Operates at the fleet level rather than the individual agent level.
Cortical Model † Novel
A purpose-built security model designed for embedding within an AI agent's decision architecture. Unlike guardrail models that operate at the input-output boundary, cortical models participate in the agent's reasoning process — detecting adversarial patterns, enforcing behavioral policies, and validating decision coherence in real time.
Security Reflex † Novel
An automated, low-latency defensive response triggered by a cortical model when adversarial conditions are detected. Analogous to a biological reflex: it executes before conscious reasoning, preventing adversarial inputs from reaching the decision layer.
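A sketch of the idea as a pre-decision hook; the detector interface, its score method, and the threshold are all hypothetical:

```python
class AdversarialInputError(Exception):
    """Raised by the reflex before adversarial input reaches reasoning."""

def security_reflex(detector, threshold: float = 0.9):
    """Wrap a decision function with a low-latency pre-check.

    `detector` is a hypothetical cortical model assumed to expose a
    `score(observation) -> float` method (higher = more adversarial).
    The reflex fires before the wrapped reasoning step ever runs.
    """
    def wrap(decide):
        def guarded(observation):
            if detector.score(observation) >= threshold:
                raise AdversarialInputError("security reflex triggered")
            return decide(observation)
        return guarded
    return wrap
```

Because the check runs before the reasoning step, a tripped reflex costs one detector inference rather than a full decision cycle.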

Domain-Specific Concepts

Digital Agentegrity † Novel
The application of agentegrity principles to AI agents operating in software environments — orchestrating APIs, invoking tools, managing data, interacting with cloud services, and communicating with other digital agents.
Physical Agentegrity † Novel
The application of agentegrity principles to AI agents operating in the physical world — robotic systems, autonomous vehicles, drones, industrial machinery, and smart infrastructure. Encompasses safety-critical considerations where agent failure results in real-world harm.
Convergent Agentegrity † Novel
The unified practice of securing AI agents that operate fluidly across digital and physical domains. The terminal state of the discipline — the point at which digital and physical agentegrity are no longer distinguishable specializations but aspects of a single practice.
Environmental Portability
The property of security measures that persist when an AI agent transitions between operating environments. A core pillar of the agentegrity thesis: security that depends on the environment is not agentegrity.
Agent Recovery State
The behavioral condition of an AI agent during the period between successful compromise and full restoration of intended operation. During the agent recovery state, the agent may exhibit degraded decision quality, inconsistent behavior, or partial policy compliance. The duration and severity are key determinants of the recovery integrity dimension of the agentegrity score.

Organizational Concepts

Agentegrity Policy † Novel
An organizational directive specifying minimum acceptable agentegrity scores, assessment frequency, remediation timelines, and reporting requirements for deployed AI agent populations. Functions as the agent-specific equivalent of a security policy.
Agentegrity Certification † Novel
A formal attestation that an AI agent or agent system meets a defined agentegrity threshold across all four measurement dimensions. Analogous to SOC 2 or ISO 27001 certification but specific to autonomous agent systems.
Agentegrity Debt † Novel
The accumulated deficit between an organization's current agentegrity posture and its defined agentegrity policy requirements. Analogous to technical debt: agentegrity debt increases when agents are deployed without assessment, when assessment findings are not remediated, or when agents operate beyond their assessed environments.
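The analogy has a direct arithmetic reading. A hedged sketch, summing each agent's shortfall against a hypothetical policy minimum:

```python
def agentegrity_debt(agent_scores: dict[str, float],
                     policy_minimum: float = 0.80) -> float:
    """Illustrative debt: total shortfall of deployed agents against
    the organization's minimum acceptable agentegrity score. Counting
    unassessed agents at score 0.0 reflects that deploying without
    assessment maximizes the debt they contribute.
    """
    return sum(max(0.0, policy_minimum - score)
               for score in agent_scores.values())

# Example: one compliant agent, one below policy, one never assessed.
print(agentegrity_debt({"billing-bot": 0.86,
                        "ops-agent": 0.71,
                        "new-rollout": 0.0}))  # 0.09 + 0.80 = 0.89
```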

This glossary is maintained by Cogensec as a public resource for the agentegrity discipline. Terms, definitions, and measurement specifications will evolve as the field matures. Contributions, critiques, and extensions are welcome.