# Agentegrity — Full Documentation Bundle
> Single-file plaintext bundle of the entire Agentegrity framework documentation, intended for LLM crawlers (ChatGPT, Claude, Perplexity, Gemini) and offline ingestion.
Source repository: https://github.com/Cogensec/agentegrity-framework (branch: main, Apache 2.0)
Synced at: 2026-05-07T22:44:18.530Z
Canonical docs site: https://agentegrity.cogensec.com/docs
Founding research paper: https://cogensec.com/research/agentegrity-framework
Maintainer: Cogensec (https://cogensec.com) — NVIDIA Inception Program member.
---
================================================================
# README — Overview
Source: README.md
================================================================
[](https://opensource.org/licenses/Apache-2.0)
[](https://www.python.org/downloads/)
[](pyproject.toml)
[](spec/SPECIFICATION.md)
**Building AI agents capable of securing themselves.**
Every existing AI security tool builds protection that humans apply to agents from the outside. Guardrails filter inputs. Runtime monitors watch outputs. Policy engines enforce rules. These are necessary, and Agentegrity does not replace them. Agentegrity addresses a different question: how do you measure whether the agent itself has the structural integrity to remain coherent when those external controls cannot reach inside its decision process?
Agentegrity (agent + integrity) is the discipline of building AI agents that can defend themselves, stabilize themselves, and recover themselves — and then verifying that they actually can. This repository provides the open specification, the reference architecture, and a Python implementation for that verification.
---
## Why This Matters Now
Frontier model labs ship better base models on a regular cadence. Each new release reduces the rate at which the underlying model produces unsafe outputs in isolated benchmarks. This is real progress, and it does not solve the agent security problem.
Enterprises do not deploy base models. They deploy compositions: a base model wrapped in system prompts, augmented with retrieval over private data, given access to tools that touch customer systems, equipped with persistent memory, orchestrated through planning loops, and embedded in environments that produce inputs the model was never trained against. Every capability gain in the underlying model enables more ambitious compositions with more attack surface. The composition layer is where security failures occur, and the composition layer is not what the model labs are improving.
Agentegrity is positioned at the composition layer specifically. Its measurements are about whether the assembled agent — not the underlying model — has the structural properties required to maintain integrity under adversarial pressure, across deployment contexts, and over time.
---
## The Three Self-Securing Capabilities
A self-securing agent maintains three properties simultaneously. Each property is a capability the agent has, not a control imposed on it from outside. The Agentegrity Framework defines how to verify each one.
| Capability | What The Agent Does | What This Prevents |
|---|---|---|
| **Self-Defense** | Maintains coherent reasoning under adversarial pressure across all input channels | Goal hijacking, prompt injection, indirect injection via retrieved content, tool output poisoning |
| **Self-Stability** | Monitors its own behavioral drift against an established baseline and detects internal state corruption | Slow-drift attacks, memory poisoning, gradual goal redirection, identity erosion |
| **Self-Recovery** | Detects when its integrity has been compromised and restores itself to a known-good state | Persistent compromise, undetected lateral movement, state pollution across sessions |
v0.6.0 ships verification for all three capabilities (self-defense via the adversarial layer, self-stability via the cortical layer with optional LLM-backed semantic checks, self-recovery via the recovery layer with persistable checkpoint round-trip) across **eleven zero-config framework adapters** — five in Python (**Claude Agent SDK**, **LangChain / LangGraph**, **OpenAI Agents SDK**, **CrewAI**, **Google ADK**) and six in TypeScript (the same five, plus **Vercel AI SDK** which has no Python equivalent). All eleven share the `SessionExporter` extension point that lets any subscriber (including the commercial `agentegrity-pro` dashboard) receive live session data without touching the agent, and the same evaluator pipeline and attestation chain — a 2-3 line instrumentation on any of these frameworks produces the same signed audit trail.
---
## The Four Layers
The framework implements verification through four architectural layers. Each layer addresses a different dimension of integrity. Together they form a complete envelope around the agent.
```
┌─────────────────────────────────────────────┐
│ RECOVERY LAYER │
│ Compromise detection · Continuity · │
│ Sustained-degradation tracking │
├─────────────────────────────────────────────┤
│ GOVERNANCE LAYER │
│ Policy enforcement · Human oversight · │
│ Compliance mapping · Audit trails │
├─────────────────────────────────────────────┤
│ CORTICAL LAYER │
│ Reasoning consistency · Memory checks · │
│ Behavioral baselines · Drift detection │
├─────────────────────────────────────────────┤
│ ADVERSARIAL LAYER │
│ Attack surface mapping · Threat │
│ detection · Coherence scoring │
└─────────────────────────────────────────────┘
```
The **Adversarial Layer** verifies self-defense by mapping the agent's attack surface and detecting threats across input channels. The **Cortical Layer** verifies self-stability by monitoring reasoning consistency, memory integrity, and behavioral drift from baseline. The **Governance Layer** enforces organizational policy and produces audit trails so verification results have a place to live in compliance workflows. The **Recovery Layer** verifies self-recovery by tracking the attestation chain for continuity, watching score history for sustained degradation, and confirming the agent declares the recovery capabilities it claims (`state_restore`, `checkpoint`, `rollback`, `session_reset`).
---
## What This Library Does (and Does Not)
We believe in being explicit about what the library is and is not, because a security library that overpromises is worse than one that underdelivers.
**What it does.** It provides a Python implementation of the four-layer verification architecture defined in the [Agentegrity Specification](spec/SPECIFICATION.md). It computes integrity scores from real evaluation runs, generates cryptographically signed attestation records, builds tamper-evident attestation chains, and produces structured audit logs for governance workflows. It runs locally with zero required dependencies and never makes network calls to Cogensec or any other service. It ships with extension points for custom threat detectors, custom policy rules, and custom validators.
**What it does not do.** The adversarial layer ships a regex pattern taxonomy across six attack families (prompt_injection, jailbreak, role_confusion, system_prompt_extraction, data_exfiltration, prompt_obfuscation) — calibrated 1.000 TPR / 0.000 FPR on the in-repo synthetic suite, but **0.000 TPR on the InjecAgent benchmark** (N=2,108) because action-oriented injections embedded in tool responses don't match the regex patterns. Closing that gap requires either an embedding-similarity check or an LLM-backed semantic classifier — both planned for the next release. The cortical layer uses Jensen-Shannon distance with Laplace smoothing for drift detection (replaces the older asymmetric KL approximation) and structural memory-provenance inspection. v0.2.0 introduced optional LLM-backed cortical checks (`pip install agentegrity[llm]`) that use Claude for semantic reasoning-chain validation, memory-provenance analysis, and drift classification; these run alongside the pattern-based checks and fail open on API errors. Production deployments should also register custom detectors with domain-specific logic. As of v0.6.0 the library ships eleven framework adapters — five in Python (Claude Agent SDK, LangChain / LangGraph, OpenAI Agents SDK, CrewAI, Google ADK) and six in TypeScript (the same five plus Vercel AI SDK). Adapters for Semantic Kernel, AutoGen, and AWS Bedrock Agents are on the post-0.6 roadmap.
**What it deliberately is not.** It is not a guardrail. It does not block agent actions on its own — when an action is blocked, that is the result of explicit governance policy, not inferred risk. It is not a runtime enforcement layer trying to compete with WAF-style products. It is not a hosted service. It is a measurement and verification library, and everything it does is in service of producing evidence that an agent has (or lacks) the structural properties of a self-securing system.
---
## Quick Start
### Installation
```bash
pip install "agentegrity[claude]" # Claude Agent SDK
pip install "agentegrity[langchain]" # LangChain + LangGraph
pip install "agentegrity[openai-agents]" # OpenAI Agents SDK
pip install "agentegrity[crewai]" # CrewAI
pip install "agentegrity[google-adk]" # Google Agent Development Kit
```
Other extras: `[crypto]` (Ed25519 attestation signing), `[llm]` (LLM-backed cortical checks via the Anthropic API), `[all]` (everything).
### Instrument an existing Claude Agent SDK agent
Three lines of agentegrity, zero configuration. `hooks()` lazily builds a default adapter with a sensible `AgentProfile`, the full four-layer evaluator, and measure-only semantics (it never blocks tool calls).
```python
from claude_agent_sdk import ClaudeSDKClient, ClaudeAgentOptions
from agentegrity.claude import hooks, report
async with ClaudeSDKClient(options=ClaudeAgentOptions(hooks=hooks())) as sdk:
await sdk.query("Summarize the latest LLM safety papers")
print(report())
```
`report()` returns the session summary — evaluation count, attestation chain length, whether the chain verifies, and enforcement mode. For a cryptographically signed audit trail, pair this with the `[crypto]` extra.
### Instrument LangChain / LangGraph, OpenAI Agents, CrewAI, or Google ADK
Same three-line shape for every supported framework:
```python
# LangChain or LangGraph (one adapter, both frameworks)
from agentegrity.langchain import instrument_chain, instrument_graph, report
chain = instrument_chain(my_chain); chain.invoke({"input": "..."}); print(report())
# OpenAI Agents SDK
from agents import Runner
from agentegrity.openai_agents import run_hooks, report
await Runner.run(agent, input="...", hooks=run_hooks()); print(report())
# CrewAI
from agentegrity.crewai import instrument, report
instrument(); crew.kickoff(); print(report())
# Google ADK
from agentegrity.google_adk import instrument, report
instrument(agent); print(report())
```
Every adapter uses the same default profile, evaluator, and attestation chain as the Claude path — pass `profile=`, `client=`, `enforce=True`, or `api_key=` to override.
Quick sanity check from the terminal:
```bash
python -m agentegrity # version + installed adapters
python -m agentegrity doctor # end-to-end self-check, prints composite score
```
### Export session data to a dashboard or external sink
Every adapter exposes `register_exporter(exporter)`. Implement three async methods — `on_session_start`, `on_event`, `on_session_end` — and every evaluated event streams to your exporter as JSON-ready dicts. Exporter exceptions are caught and logged so a broken sink can never break the agent.
```python
from agentegrity.langchain import register_exporter, instrument_graph
class PrintExporter:
async def on_session_start(self, session_id, adapter_name, profile): ...
async def on_event(self, session_id, event):
print(event["event_type"], event["evaluation_result"])
async def on_session_end(self, session_id, summary): ...
register_exporter(PrintExporter())
graph = instrument_graph(my_graph)
```
This is the integration point the commercial [**`agentegrity-pro`**](https://github.com/cogensec/agentegrity-pro) dashboard listens on. Deploy the pro backend with `docker compose up`, set `AGENTEGRITY_URL` and `AGENTEGRITY_TOKEN` on the agent, and the default adapter streams every session over the published exporter HTTP API — no extra package required.
### Non-Python agents (TypeScript / Bun / Node)
TypeScript agents get the same **2–3 line zero-config** DX as the Python adapters. Install the adapter that matches your framework; each one sets `AGENTEGRITY_URL` / `AGENTEGRITY_TOKEN` from env and streams events through any `SessionExporter` you register.
**Claude Agent SDK:**
```ts
import { query } from "@anthropic-ai/claude-agent-sdk";
import { hooks } from "@agentegrity/claude-sdk";
await query({ prompt: "hi", options: { hooks: hooks() } });
```
**LangChain JS:**
```ts
import { ChatAnthropic } from "@langchain/anthropic";
import { instrument } from "@agentegrity/langchain";
await new ChatAnthropic({ callbacks: [instrument()] }).invoke("hi");
```
**OpenAI Agents SDK:**
```ts
import { Agent, run } from "@openai/agents";
import { runHooks } from "@agentegrity/openai-agents";
await run(agent, "hi", { hooks: runHooks() });
```
**CrewAI JS:**
```ts
import { instrument } from "@agentegrity/crewai";
instrument().attach(crew.events);
```
**Google ADK:**
```ts
import { instrument } from "@agentegrity/google-adk";
const close = instrument(agent);
```
**Vercel AI SDK** (TypeScript-native — no Python equivalent):
```ts
import { streamText } from "ai";
import { instrument } from "@agentegrity/vercel-ai";
await streamText({ model, prompt: "hi", experimental_telemetry: instrument() });
```
Each adapter re-exports `report()`, `reset()`, and `registerExporter()` for the same flow you get in Python. The low-level `@agentegrity/client` reporter is still available for custom frameworks, and the wire format is published as JSON Schema (`schemas/exporter/`) and OpenAPI 3.1 (`schemas/openapi.yaml`). Drift between the Python `to_dict()` output and the schemas is caught in CI by `tests/test_schemas.py`.
### Evaluate an arbitrary agent profile
For agents outside the Claude SDK — or for one-off profile scoring — the high-level `AgentegrityClient` runs the full four-layer evaluation in four lines:
```python
from agentegrity import AgentegrityClient
client = AgentegrityClient()
score = client.evaluate(client.create_profile(name="my-agent"))
print(f"{score.composite:.3f} ({score.action})")
```
### Runtime monitoring with attestation
For long-running agents, wrap actions with `@monitor.guard` to run pre- and post-execution checks and build a tamper-evident attestation chain:
```python
from agentegrity import IntegrityMonitor
monitor = IntegrityMonitor(
profile=client.create_profile(name="my-agent"),
evaluator=client.evaluator,
threshold=0.70,
enable_attestation=True,
)
@monitor.guard
async def agent_action(context=None):
return await agent.execute(context)
result = await agent_action(context={"action": {"type": "tool_call"}})
print(f"Records: {len(monitor.attestation_chain)}")
print(f"Chain valid: {monitor.attestation_chain.verify_chain()}")
```
### Configuring the evaluator
When the defaults aren't enough — custom thresholds, custom layer weights, custom threat detectors — drop down to `IntegrityEvaluator` directly:
```python
from agentegrity import IntegrityEvaluator
from agentegrity.layers import AdversarialLayer, CorticalLayer, GovernanceLayer
evaluator = IntegrityEvaluator(
layers=[
AdversarialLayer(coherence_threshold=0.85),
CorticalLayer(drift_tolerance=0.10),
GovernanceLayer(policy_set="enterprise-default"),
]
)
```
See [`examples/`](examples/) for walkthroughs including custom threat detectors, custom policy rules, and the full explicit-config Claude adapter flow.
---
## Repository Structure
```
agentegrity-framework/
├── MANIFESTO.md # The Agentegrity Manifesto
├── README.md # You are here
├── LICENSE # Apache 2.0
├── pyproject.toml # Package configuration
├── agentegrity-glossary.md # Vocabulary of the discipline
│
├── spec/ # Framework Specification
│ ├── SPECIFICATION.md # Full technical specification
│ ├── properties/ # Property definitions
│ │ ├── adversarial-coherence.md
│ │ ├── environmental-portability.md
│ │ └── verifiable-assurance.md
│ └── layers/ # Layer architecture
│ ├── adversarial-layer.md
│ ├── cortical-layer.md
│ └── governance-layer.md
│
├── src/agentegrity/ # Python Reference Implementation
│ ├── __init__.py
│ ├── __main__.py # `python -m agentegrity` + doctor CLI
│ ├── claude.py # Zero-config Claude Agent SDK surface
│ ├── langchain.py # Zero-config LangChain + LangGraph surface
│ ├── openai_agents.py # Zero-config OpenAI Agents SDK surface
│ ├── crewai.py # Zero-config CrewAI surface
│ ├── google_adk.py # Zero-config Google ADK surface
│ ├── core/ # Core abstractions
│ │ ├── profile.py # AgentProfile (+ .default() factory)
│ │ ├── evaluator.py # IntegrityEvaluator, PropertyWeights
│ │ ├── attestation.py # AttestationRecord, AttestationChain
│ │ └── monitor.py # IntegrityMonitor with @guard decorator
│ ├── layers/ # Layer implementations
│ │ ├── adversarial.py # AdversarialLayer (self-defense)
│ │ ├── cortical.py # CorticalLayer (self-stability)
│ │ ├── governance.py # GovernanceLayer (policy + audit)
│ │ └── recovery.py # RecoveryLayer (self-recovery)
│ ├── adapters/ # Framework integrations
│ │ ├── base.py # _BaseAdapter + FrameworkAdapter Protocol
│ │ ├── claude.py # ClaudeAdapter
│ │ ├── langchain.py # LangChainAdapter (covers LangGraph)
│ │ ├── openai_agents.py # OpenAIAgentsAdapter
│ │ ├── crewai.py # CrewAIAdapter
│ │ └── google_adk.py # GoogleADKAdapter
│ └── sdk/ # High-level convenience wrapper
│ └── client.py # AgentegrityClient
│
├── schemas/ # Cross-language wire contract
│ ├── exporter/ # JSON Schema (Draft 2020-12)
│ │ ├── common.json # Shared $defs (profile, event, score)
│ │ ├── session_start.json
│ │ ├── event.json
│ │ └── session_end.json
│ └── openapi.yaml # OpenAPI 3.1 spec for exporter endpoints
│
├── clients/
│ └── typescript/ # @agentegrity/client — TS/Bun/Node reporter
│ ├── src/ # AgentegrityReporter + types
│ ├── examples/ # basic.ts wiring example
│ ├── package.json
│ └── README.md
│
├── tests/ # Test suite (145 tests, all passing)
│
└── examples/ # Usage examples
├── claude_adapter.py
├── claude_adapter_advanced.py
├── langchain_adapter.py
├── openai_agents_adapter.py
├── crewai_adapter.py
├── google_adk_adapter.py
├── basic_evaluation.py
├── runtime_monitoring.py
└── custom_validator.py
```
---
## Roadmap
**v0.1.0 — Initial release.** Three-layer architecture (adversarial / cortical / governance), pattern-based reference checks, cryptographic attestation, custom validator and policy extension points, three working examples. The recovery layer joined as a default fourth layer in v0.6.0.
**v0.2.0 — Claude Agent SDK, LLM-backed checks, and self-recovery.** First framework adapter targeting the Claude Agent SDK with five integration points (Harness, Tools, Sandbox, Session, Orchestration). Optional LLM-backed cortical checks using Claude for semantic analysis of reasoning chains, memory provenance, and behavioral drift. Recovery integrity layer for self-recovery verification. Async-first evaluator pipeline that runs independent layers in parallel.
**v0.2.1 — Developer experience.** Zero-config `agentegrity.claude` top-level module (`hooks()` / `report()` / `reset()` — three-line instrumentation with no setup), `AgentProfile.default()` factory, `python -m agentegrity` info + `doctor` self-check CLI.
**v0.3.0 — Multi-framework adapters.** Four new framework adapters joining Claude — **LangChain / LangGraph**, **OpenAI Agents SDK**, **CrewAI**, and **Google Agent Development Kit** — each with the same three-line instrumentation surface. A `_BaseAdapter` shared by all five implementations means new frameworks are mostly mechanical to add going forward.
**v0.4.0 — Exporter hook + cross-language contract.** OSS-side `SessionExporter` protocol + `register_exporter()` on every adapter. Session data (session_start, every evaluated event, session_end) streams as JSON-ready dicts to any subscribed exporter, fail-open so a broken sink never breaks the agent. The wire format is published as **JSON Schema** (`schemas/exporter/`) and **OpenAPI 3.1** (`schemas/openapi.yaml`); a first-party **TypeScript client** in `clients/typescript/` (`@agentegrity/client`) lets Bun / Node agents emit the same event stream. The commercial dashboard ships separately as `agentegrity-pro`.
**v0.5.0 — Six TypeScript framework adapters.** Mirror the five Python adapters (Claude Agent SDK, LangChain JS, OpenAI Agents SDK, CrewAI JS, Google ADK) as dedicated npm packages — `@agentegrity/claude-sdk`, `@agentegrity/langchain`, `@agentegrity/openai-agents`, `@agentegrity/crewai`, `@agentegrity/google-adk` — plus a TypeScript-native `@agentegrity/vercel-ai` adapter for the Vercel AI SDK via its OpenTelemetry tracer surface. Every package depends on a shared `createDefaultAdapter()` helper in `@agentegrity/client` and ships the same 2-3 line zero-config DX as Python. Release workflow publishes all seven packages in a matrix; `scripts/check-versions.ts` enforces version parity with `pyproject.toml`.
**v0.5.3 — Release & build polish.** Concrete version pins on TypeScript workspace deps (replacing `workspace:*`) so published packages install cleanly off‑registry, GitHub Actions bumped to checkout@v5 / setup-python@v6 / setup-node@v5, scoped push triggers + concurrency cancellation in CI, repo moved to the `cogensec` org, and an `AGENTEGRITY_OFFLINE` env var so test runs work without a reporter. Adds a Python `scripts/check_versions.py` mirroring the TypeScript one to keep `pyproject.toml`, `src/agentegrity/__init__.py`, and the README badge / claim lines from drifting apart again.
**v0.6.0 — Detection depth + recovery round-trip + conformance + benchmark (current).** The adversarial layer's substring match becomes a 21-pattern regex taxonomy across six attack families. The cortical layer's drift metric becomes Jensen-Shannon distance with Laplace smoothing and a `min_drift_samples` guard. `RecoveryLayer` gains a real `Checkpoint` Protocol with `InMemory` / `File` / `Sqlite` reference backends and a tested `snapshot()` ↔ `restore_to()` round-trip. The cortical layer gains a parallel `BaselineStore` Protocol so behavioural baselines survive process restarts. A cross-adapter conformance suite pins 9 invariants × 5 adapters. A detection benchmark harness (`pytest -m benchmark`) runs the synthetic suite plus loaders for InjecAgent / PINT / AgentDojo; numbers published in `STATUS.md`. Branch coverage gates land on Python (≥85%) and TypeScript (≥80% lines / 70% functions). The recovery layer is promoted to a first-class fourth default layer; `PropertyWeights` defaults rebalanced so RI gets 0.15 of the composite. Full migration notes in `CHANGELOG.md`.
**v0.6.0 — More adapters and compliance output (next).** Adapters for Semantic Kernel, AutoGen, AWS Bedrock Agents. Compliance report generation for EU AI Act, NIST AI RMF, and ISO 42001. Observability exporters (OpenTelemetry, Datadog).
**v1.0.0 — Stable API (when ready).** Declared stable when the public API has been unchanged for a full minor release cycle, when the library has production deployments at three or more external organizations, and when the framework has been cited in at least one peer-reviewed publication. v1.0.0 is not a date — it's a signal that adoption has happened beyond our direct influence.
---
## Documentation
| Document | Description |
|---|---|
| [Manifesto](MANIFESTO.md) | The founding statement of agentegrity as a discipline |
| [Specification](spec/SPECIFICATION.md) | Full technical specification (properties, layers, controls, scoring) |
| [Glossary](agentegrity-glossary.md) | Vocabulary of the discipline, defined precisely |
| [Adversarial Layer](spec/layers/adversarial-layer.md) | Self-defense verification architecture |
| [Cortical Layer](spec/layers/cortical-layer.md) | Self-stability verification architecture |
| [Governance Layer](spec/layers/governance-layer.md) | Policy enforcement and audit architecture |
| [Recovery Layer](spec/layers/recovery-layer.md) | Self-recovery verification architecture |
---
## Design Principles
1. **Self-securing capability is the goal. Verification is the methodology.** The framework exists because agents need to be able to secure themselves. The scoring system is how we prove they can. Without the underlying capability, the score is theater. Without the verification methodology, the capability is unprovable. Both are required.
2. **Composition layer, not model layer.** Better base models do not eliminate the need for agent-level verification. They make compositions more capable and therefore more dangerous when compromised. The framework is positioned at the composition layer specifically because that's the layer model improvements don't close.
3. **Defense-in-depth, not defense-in-replacement.** Guardrails, runtime monitors, and network controls remain essential. Agentegrity adds a layer that sits inside the agent's decision process where exogenous controls cannot reach. The two complement each other.
4. **Cryptographic, not observational.** "We monitored the agent and it looked fine" is not assurance. Attestation records produced by this library are signed, chained, and independently verifiable. Verification means you can prove what the agent's state was at a point in time, not just that someone watched it.
5. **Open standard, plural implementations.** The specification is open. The reference implementation is Apache 2.0. Other implementations are welcome from any vendor, any framework, any deployment context. The integrity of autonomous agents is too important to be proprietary, and a single-vendor standard isn't a standard.
6. **Honest about limitations.** Every claim the library makes is defensible in writing. When checks can't run, they say so. When the implementation is a pattern-based reference rather than semantic analysis, the README says so. The worst possible outcome for this project is a published benchmark showing that our claims are louder than our implementation. We avoid that outcome by being the first to name limitations.
---
## Contributing
We welcome contributions. See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
Priority areas for v0.4 and beyond:
- Additional framework adapters (Semantic Kernel, AutoGen, AWS Bedrock Agents, Agno)
- Minimal web dashboard for session visualization
- Compliance report generation (EU AI Act, NIST AI RMF, ISO 42001)
- Domain-specific validator libraries (healthcare, finance, embodied)
- Language ports (TypeScript, Go, Rust)
- Formal verification of layer interactions
- Cross-framework session merging (multiple adapters sharing one attestation chain)
---
## Citation
If you use the Agentegrity Framework in research or production, please cite:
```bibtex
@misc{agentegrity2026,
title={The Agentegrity Framework: Building and Verifying Self-Securing Autonomous AI Agents},
author={Cogensec Research},
year={2026},
url={https://github.com/cogensec/agentegrity-framework}
}
```
---
## License
Apache License 2.0. See [LICENSE](LICENSE) for details.
---
**Agentegrity is a Cogensec Research initiative.** The discipline is open. The framework is open. The code is open. We invite researchers, practitioners, and organizations building or deploying autonomous AI agents to adopt, implement, extend, and critique it.
[](https://github.com/cogensec/agentegrity-framework)
================================================================
# Quickstart
Source: docs/quickstart.md
================================================================
# Agentegrity Quickstart
Three copy-paste blocks. Pick the one that matches your setup and run it.
## 1. Instrument an existing Claude Agent SDK agent
```bash
pip install "agentegrity[claude]"
```
```python
from claude_agent_sdk import ClaudeSDKClient, ClaudeAgentOptions
from agentegrity.claude import hooks, report
async with ClaudeSDKClient(options=ClaudeAgentOptions(hooks=hooks())) as sdk:
await sdk.query("Summarize the latest LLM safety papers")
print(report())
```
`hooks()` lazily builds a default adapter with a generic `AgentProfile`, the full four-layer evaluator (adversarial, cortical, governance, recovery), and measure-only semantics. It never blocks tool calls unless you pass `enforce=True`. `report()` returns a session summary — evaluation count, attestation chain length, whether the chain verifies.
## 1b. Instrument LangChain / LangGraph, OpenAI Agents, CrewAI, or Google ADK
Same shape, one import per framework. Pick the extra that matches your stack:
```bash
pip install "agentegrity[langchain]" # covers LangChain + LangGraph
pip install "agentegrity[openai-agents]"
pip install "agentegrity[crewai]"
pip install "agentegrity[google-adk]"
```
```python
# LangChain
from agentegrity.langchain import instrument_chain, report
chain = instrument_chain(my_chain); chain.invoke({"input": "..."}); print(report())
# LangGraph
from agentegrity.langchain import instrument_graph, report
graph = instrument_graph(my_compiled_graph); graph.invoke(state); print(report())
# OpenAI Agents SDK
from agents import Runner
from agentegrity.openai_agents import run_hooks, report
await Runner.run(agent, input="...", hooks=run_hooks()); print(report())
# CrewAI
from agentegrity.crewai import instrument, report
instrument(); crew.kickoff(); print(report())
# Google ADK
from agentegrity.google_adk import instrument, report
instrument(agent); runner.run(...); print(report())
```
Each module exposes the same `report()` / `reset()` / `adapter()` surface as `agentegrity.claude`, with `AgentProfile.default()` and the full four-layer evaluator. Pass `profile=`, `client=`, `enforce=True`, or `api_key=` to any of the entry points to override.
## 1c. Export session data to a dashboard or external sink
Every adapter exposes `register_exporter(exporter)` — subscribe anything that implements the `SessionExporter` protocol (`on_session_start`, `on_event`, `on_session_end`) and it receives live session data as JSON-ready dicts:
```python
from agentegrity.langchain import register_exporter, instrument_graph
class PrintExporter:
async def on_session_start(self, session_id, adapter_name, profile):
print(f"[{adapter_name}] session {session_id} started")
async def on_event(self, session_id, event):
print(f" {event['event_type']}")
async def on_session_end(self, session_id, summary):
print(f"[{session_id}] score={summary}")
register_exporter(PrintExporter())
graph = instrument_graph(my_graph)
```
Exporter exceptions are caught and logged — the exporter can never break the instrumented agent. For a production dashboard, deploy the commercial [`agentegrity-pro`](https://github.com/cogensec/agentegrity-pro) backend (`docker compose up`) and point the agent at it by setting `AGENTEGRITY_URL` and `AGENTEGRITY_TOKEN` — the default adapter picks them up automatically and streams every session over the published exporter HTTP API.
**Non-Python agents** use the same contract via one of the six TypeScript adapters, each shipping a 2–3 line zero-config enable. Pick the one that matches your framework:
```bash
npm i @agentegrity/claude-sdk # Claude Agent SDK — options.hooks = hooks()
npm i @agentegrity/langchain # LangChain JS — callbacks: [instrument()]
npm i @agentegrity/openai-agents # OpenAI Agents SDK — run(..., { hooks: runHooks() })
npm i @agentegrity/crewai # CrewAI JS — instrument().attach(crew.events)
npm i @agentegrity/google-adk # Google ADK — instrument(agent)
npm i @agentegrity/vercel-ai # Vercel AI SDK — experimental_telemetry: instrument()
```
Each re-exports `registerExporter()`, `report()`, and `reset()` for the same fan-out contract as the Python adapters. The low-level `@agentegrity/client` reporter is still available for custom frameworks. The wire format is published as JSON Schema in `schemas/exporter/` and OpenAPI 3.1 in `schemas/openapi.yaml`, so any language can produce or consume events.
## 2. Score an arbitrary agent profile
```bash
pip install agentegrity
```
```python
from agentegrity import AgentegrityClient
client = AgentegrityClient()
score = client.evaluate(client.create_profile(name="my-agent"))
print(f"{score.composite:.3f} ({score.action})")
```
Use this for one-off profile scoring, CI gates, or agents outside the Claude SDK.
## 3. Confidence test from the terminal
```bash
python -m agentegrity # version + adapter availability
python -m agentegrity doctor # end-to-end self-check, prints composite score
```
If `doctor` prints a composite score and `OK`, your install is wired correctly.
---
## Next steps
- **Custom thresholds, layer weights, and threat detectors** — drop down to `IntegrityEvaluator` and the individual layers (see the "Configuring the evaluator" section in the [README](../README.md)).
- **Cryptographic attestation signing** — `pip install "agentegrity[crypto]"` and call `AttestationRecord.sign(private_key)` to produce verifiable records.
- **LLM-backed cortical checks** — `pip install "agentegrity[llm]"` and pass `LLMCorticalCheck` instances to `CorticalLayer(llm_checks=[...])` for semantic reasoning-chain validation.
- **Governance policies** — customize `GovernanceLayer(policy_set=...)` or register custom policy rules. See [`spec/layers/governance-layer.md`](../spec/layers/governance-layer.md).
- **Full specification** — [`spec/SPECIFICATION.md`](../spec/SPECIFICATION.md) is the source of truth for the property definitions, layer contracts, and scoring methodology.
================================================================
# Specification v1.0 (draft)
Source: spec/SPECIFICATION.md
================================================================
# Agentegrity Framework Specification
**Version 0.1.0**
**Status: Draft**
**Date: March 2026**
---
## 1. Overview
This specification defines the Agentegrity Framework — a standard for measuring, enforcing, and proving the integrity of autonomous AI agents. It is the technical companion to the [Agentegrity Manifesto](../MANIFESTO.md).
### 1.1 Scope
This specification covers:
- Formal definitions of the three agentegrity properties
- Architecture of the three integrity layers
- Required and optional controls at each layer
- Integrity scoring methodology
- Attestation record format
- Maturity model for progressive adoption
This specification does not cover:
- Agent alignment or value alignment
- Model training safety
- Data governance (except as it intersects agent memory integrity)
- Specific vendor implementations
### 1.2 Conformance Levels
| Level | Name | Requirements |
|---|---|---|
| **Level 1** | Aware | Agent profile defined; adversarial layer active; integrity scoring operational |
| **Level 2** | Managed | All four layers active (adversarial / cortical / governance / recovery); continuous monitoring; policy enforcement |
| **Level 3** | Verified | Cryptographic attestation; formal property verification; automated red teaming |
| **Level 4** | Autonomous | Self-healing integrity; adaptive policy; cross-environment portability proven |
---
## 2. Core Abstractions
### 2.1 Agent Profile
Every agent under agentegrity evaluation is described by an `AgentProfile`:
```
AgentProfile:
agent_id: string # Unique identifier
agent_type: enum # conversational | tool_using | autonomous | multi_agent | embodied
capabilities: list[string] # tool_use, memory_access, multi_agent_comm, code_execution, web_access, physical_actuation
deployment_context: enum # cloud | edge | hybrid | multi_agent | physical
risk_tier: enum # low | medium | high | critical
framework: string # Optional: langchain, crewai, autogen, custom, etc.
model_provider: string # Optional: openai, anthropic, google, open_source, etc.
metadata: dict # Extensible metadata
```
### 2.2 Integrity Score
The composite integrity score is a weighted aggregation of the three property scores:
```
IntegrityScore:
composite: float[0.0, 1.0]
adversarial_coherence: float[0.0, 1.0]
environmental_portability: float[0.0, 1.0]
verifiable_assurance: float[0.0, 1.0]
timestamp: datetime
confidence: float[0.0, 1.0]
evaluator_version: string
```
Default weights: `adversarial_coherence=0.40, environmental_portability=0.25, verifiable_assurance=0.35`. Weights are configurable per deployment.
### 2.3 Attestation Record
An attestation record is the cryptographic proof of an agent's integrity state at a point in time:
```
AttestationRecord:
record_id: string # UUID
agent_id: string # Reference to AgentProfile
timestamp: datetime # UTC timestamp
integrity_score: IntegrityScore
layer_states: dict # Per-layer evaluation results
evidence: list[Evidence] # Supporting evidence chain
signature: bytes # Ed25519 signature over record contents
public_key: bytes # Signing key for verification
chain_previous: string # Hash of previous attestation (chain integrity)
```
---
## 3. Property Specifications
### 3.1 Adversarial Coherence
**Formal definition:** An agent A possesses adversarial coherence at time t if, for a defined set of adversarial perturbations P applied across all input channels C, the agent's decision function D produces outputs within tolerance ε of its baseline behavior B:
```
∀p ∈ P, ∀c ∈ C: distance(D(input + p_c), B) ≤ ε
```
**Input channels (C):**
- Direct prompts and instructions
- Tool call responses
- Retrieved documents (RAG)
- Inter-agent messages
- Environmental signals (for embodied agents)
- Memory reads
**Evaluation methods:**
1. **Baseline establishment:** Record agent behavior across a standardized evaluation suite under clean conditions
2. **Perturbation injection:** Apply adversarial perturbations per channel and measure behavioral deviation
3. **Coherence scoring:** Compute the ratio of perturbation scenarios where behavior remains within tolerance
**Required controls:**
- `AC-01`: Baseline behavioral profile established and versioned
- `AC-02`: Per-channel adversarial evaluation coverage
- `AC-03`: Coherence score computed at minimum daily frequency
- `AC-04`: Threshold alerts on coherence degradation
- `AC-05`: Automated adversarial test suite (red teaming)
**Scoring:**
- 0.90–1.00: Agent maintains coherence under all tested perturbations
- 0.70–0.89: Agent maintains coherence under most perturbations; isolated deviations
- 0.50–0.69: Agent shows measurable coherence degradation under moderate pressure
- Below 0.50: Agent integrity cannot be assured under adversarial conditions
See [Adversarial Coherence property spec](properties/adversarial-coherence.md) for full detail.
### 3.2 Environmental Portability
**Formal definition:** An agent A possesses environmental portability if its integrity score S is equivalent (within tolerance δ) across a defined set of deployment environments E:
```
∀e_i, e_j ∈ E: |S(A, e_i) - S(A, e_j)| ≤ δ
```
**Deployment environments (E):**
- Single-tenant cloud (isolated)
- Multi-tenant cloud (shared infrastructure)
- Edge / on-premise (resource-constrained)
- Multi-agent system (peer agents present)
- Federated (cross-organizational)
- Physical (embodied, real-world actuation)
**Evaluation methods:**
1. **Environment matrix:** Define the set of target deployment environments
2. **Cross-environment evaluation:** Run the full integrity evaluation in each environment
3. **Portability scoring:** Compute the variance of integrity scores across environments
**Required controls:**
- `EP-01`: Target deployment environments enumerated
- `EP-02`: Per-environment integrity evaluation executed
- `EP-03`: Environment-specific threat models maintained
- `EP-04`: Trust boundary definitions per environment
- `EP-05`: Portability variance tracked over time
**Scoring:**
- 0.90–1.00: Integrity scores consistent (δ ≤ 0.05) across all environments
- 0.70–0.89: Minor variance (δ ≤ 0.15) with documented environment-specific mitigations
- 0.50–0.69: Significant variance; integrity degrades in some environments
- Below 0.50: Integrity guarantees do not port across environments
See [Environmental Portability property spec](properties/environmental-portability.md) for full detail.
### 3.3 Verifiable Assurance
**Formal definition:** An agent A possesses verifiable assurance if its integrity state is represented by an attestation record R that satisfies:
```
1. Completeness: R contains sufficient evidence to independently reconstruct the integrity evaluation
2. Tamper-evidence: Any modification to R is detectable via cryptographic verification
3. Non-repudiation: R is signed such that the evaluator cannot deny producing it
4. Chain integrity: R references the previous attestation, forming a verifiable history
```
**Evaluation methods:**
1. **Attestation generation:** Produce a signed attestation record after each integrity evaluation
2. **Independent verification:** A third party can verify the attestation using only the record and the evaluator's public key
3. **Chain validation:** The full attestation history is verifiable as an unbroken, unmodified chain
**Required controls:**
- `VA-01`: Attestation records generated for every integrity evaluation
- `VA-02`: Ed25519 (or equivalent) signing of attestation records
- `VA-03`: Attestation chain integrity maintained (hash linking)
- `VA-04`: Independent verification endpoint or tool available
- `VA-05`: Attestation records retained per organizational policy (minimum 90 days)
**Scoring:**
- 0.90–1.00: Full attestation chain; all records independently verifiable; no gaps
- 0.70–0.89: Attestation operational; minor gaps in coverage or chain
- 0.50–0.69: Partial attestation; some evaluations unattested
- Below 0.50: No cryptographic assurance; integrity claims are observational only
See [Verifiable Assurance property spec](properties/verifiable-assurance.md) for full detail.
---
## 4. Layer Architecture
### 4.1 Adversarial Layer
**Purpose:** Continuously test and validate the agent's resilience to attack.
**Position:** Outermost layer. First line of integrity defense.
**Components:**
> These are design targets for v1.0 of the specification. The v0.1.0 reference implementation does not yet measure or enforce these targets.
| Component | Function | Latency Target |
|---|---|---|
| Attack Surface Mapper | Enumerate all input channels and tool interfaces | Async (background) |
| Threat Detector | Real-time detection of adversarial inputs across channels | < 10ms p99 |
| Coherence Scorer | Compute adversarial coherence score against baseline | < 50ms p99 |
| Red Team Harness | Automated adversarial testing interface | Async (scheduled) |
| Threat Intel Connector | Ingest emerging attack patterns | Async (polling) |
**Required interfaces:**
- `evaluate(agent_profile, context) → AdversarialResult`
- `detect(input, channel) → ThreatAssessment`
- `red_team(agent_profile, scenario) → RedTeamResult`
See [Adversarial Layer spec](layers/adversarial-layer.md) for full detail.
### 4.2 Cortical Layer
**Purpose:** Monitor and validate the agent's internal cognitive integrity.
**Position:** Middle layer. Protects reasoning, memory, and behavioral consistency.
**Components:**
> These are design targets for v1.0 of the specification. The v0.1.0 reference implementation does not yet measure or enforce these targets.
| Component | Function | Latency Target |
|---|---|---|
| Reasoning Validator | Verify reasoning chain consistency and goal alignment | < 20ms p99 |
| Memory Prover | Track memory provenance and detect poisoning | < 15ms p99 |
| Behavioral Monitor | Maintain baselines and detect drift | < 10ms p99 |
| Conflict Detector | Identify contradictions between goals, instructions, memory, actions | < 25ms p99 |
| State Attester | Sign and record internal cognitive state | < 5ms p99 |
**Required interfaces:**
- `evaluate(agent_profile, cognitive_state) → CorticalResult`
- `validate_reasoning(chain) → ReasoningAssessment`
- `check_memory(read_event) → MemoryAssessment`
- `detect_drift(current_behavior, baseline) → DriftAssessment`
See [Cortical Layer spec](layers/cortical-layer.md) for full detail.
### 4.3 Governance Layer
**Purpose:** Enforce organizational policy, oversight, and compliance requirements.
**Position:** Innermost layer. Closest to the agent's action execution.
**Components:**
> These are design targets for v1.0 of the specification. The v0.1.0 reference implementation does not yet measure or enforce these targets.
| Component | Function | Latency Target |
|---|---|---|
| Policy Engine | Evaluate actions against policy-as-code rules | < 5ms p99 |
| Escalation Manager | Route high-risk decisions to human oversight | < 100ms p99 |
| Compliance Mapper | Map integrity evaluations to regulatory frameworks | Async |
| Audit Logger | Produce immutable, signed audit records | < 2ms p99 |
| Break-Glass Controller | Emergency override and agent suspension | < 10ms p99 |
**Required interfaces:**
- `evaluate(agent_profile, action) → GovernanceResult`
- `enforce_policy(action, policy_set) → PolicyDecision`
- `escalate(action, risk_assessment) → EscalationResult`
- `emergency_stop(agent_id, reason) → StopConfirmation`
See [Governance Layer spec](layers/governance-layer.md) for full detail.
---
## 5. Integrity Evaluation Flow
The standard evaluation flow processes an agent action through all four layers:
```
Agent Action Initiated
│
▼
┌─── Adversarial Layer ───┐
│ 1. Threat detection │
│ 2. Input channel scan │
│ 3. Coherence check │
│ │
│ → BLOCK if threat score │
│ exceeds threshold │
└──────────┬───────────────┘
│ PASS
▼
┌─── Cortical Layer ───────┐
│ 4. Reasoning validation │
│ 5. Memory provenance │
│ 6. Behavioral drift │
│ 7. Conflict detection │
│ │
│ → ALERT if drift exceeds │
│ tolerance │
└──────────┬───────────────┘
│ PASS
▼
┌─── Governance Layer ─────┐
│ 8. Policy evaluation │
│ 9. Risk tier assessment │
│ 10. Compliance check │
│ 11. Audit log write │
│ │
│ → ESCALATE if policy │
│ requires human review │
└──────────┬───────────────┘
│ PASS
▼
┌─── Recovery Layer ───────┐
│ 12. Baseline continuity │
│ 13. Sustained-degradation│
│ watch │
│ 14. Recovery capability │
│ 15. Chain integrity │
│ │
│ → ESCALATE if attestation│
│ chain is tampered │
└──────────┬───────────────┘
│ PASS
▼
Action Executed
│
▼
Attestation Record Generated
```
**Total latency budget:** < 100ms p99 for the full evaluation flow. (This is a design target for v1.0 of the specification. The v0.1.0 reference implementation does not yet measure or enforce this target.)
---
## 6. Maturity Model
### Level 1: Aware
The organization has adopted agentegrity vocabulary and tooling. Agents are profiled and basic integrity scoring is operational.
| Requirement | Controls |
|---|---|
| Agent profiles defined for all deployed agents | AgentProfile schema populated |
| Adversarial layer active | AC-01, AC-02 |
| Integrity scores computed | Composite score generated |
| Baseline behaviors established | AC-01 |
### Level 2: Managed
All four layers are active. Continuous monitoring is operational. Policies are enforced.
| Requirement | Controls |
|---|---|
| All four layers deployed | All layer interfaces implemented |
| Continuous integrity monitoring | Runtime evaluation on every action |
| Policy-as-code enforcement | Governance layer policies defined and active |
| Human escalation paths defined | Escalation Manager configured |
| Audit trails operational | Audit Logger active |
### Level 3: Verified
Cryptographic attestation is operational. Properties are formally verified. Automated red teaming runs continuously.
| Requirement | Controls |
|---|---|
| Attestation records generated | VA-01, VA-02, VA-03 |
| Independent verification available | VA-04 |
| Automated red teaming | AC-05 |
| Cross-environment evaluation | EP-01, EP-02 |
| Formal property verification | Mathematical property proofs |
### Level 4: Autonomous
Integrity is self-maintaining. The system adapts to new threats, new environments, and new policies without manual intervention.
| Requirement | Controls |
|---|---|
| Self-healing integrity responses | Automated remediation on degradation |
| Adaptive policy engine | Policies adjust to context and risk |
| Cross-environment portability proven | EP scores consistent across all targets |
| Continuous adversarial adaptation | Threat models update from live data |
| Full attestation chain history | VA-05 with complete chain |
---
## 7. Extensibility
The framework is designed for extension:
- **Custom validators:** Implement the `Validator` interface to add domain-specific integrity checks
- **Custom layers:** Additional layers can be inserted between the three standard layers
- **Custom scoring:** Property weights and scoring functions are configurable
- **Custom attestation:** Attestation format supports extension fields for domain-specific evidence
- **Framework adapters:** Implement the `AgentAdapter` interface to integrate with any agent framework
---
## 8. Versioning
This specification follows semantic versioning:
- **Major:** Breaking changes to core abstractions, properties, or layer architecture
- **Minor:** New controls, validators, or conformance requirements
- **Patch:** Clarifications, typo fixes, editorial changes
---
*Agentegrity Framework Specification v1.0.0 · Cogensec Research · March 2026*
================================================================
# Threat Model (STRIDE)
Source: spec/threat-model.md
================================================================
# Agentegrity Framework — Threat Model
**Status:** Normative
**Version:** 0.6.0
**Last reviewed:** 2026-05-06
---
This document is the [STRIDE](https://en.wikipedia.org/wiki/STRIDE_model)
threat model for the framework itself. Required reading for anyone
deploying agentegrity in production or putting it in front of a
security review board. Companion to [`SECURITY.md`](../SECURITY.md)
(disclosure policy) and [`STATUS.md`](../STATUS.md) (per-component
maturity).
The questions this document answers:
- What can an attacker do to the framework or its outputs?
- Which mitigations exist today, and where in the code they live.
- Which mitigations are explicitly *not* present and why.
- Which residual risk we accept and which we ask the operator to handle.
> The framework is a *measurement* layer. It does not replace
> guardrails, runtime monitors, or network controls. Anything below
> that talks about external mitigations assumes those exist —
> agentegrity is defense in depth, not defense in replacement.
---
## 1. System under analysis
For this threat model the framework is everything in this repository
plus the published `agentegrity` PyPI package and `@agentegrity/*` npm
scope. External systems (the agent's underlying LLM, the framework SDK
it's instrumenting, the operator's exporter backend) are trust
boundaries — we model what they can do *to* agentegrity and what
agentegrity does to protect itself across those boundaries.
### 1.1 Trust boundaries
```
┌─────────────────────────────────────────────────────────────┐
│ User process (Python or Node) │
│ │
│ ┌────────────┐ ┌──────────────┐ ┌──────────────────┐ │
│ │ Framework │←─→│ AgentegrityX │←─→│ AttestationChain │ │
│ │ SDK │ │ Adapter │ │ + signing key │ │
│ └────────────┘ └─────┬────────┘ └──────────────────┘ │
│ ▲ │ │ │
│ │ ▼ ▼ │
│ │ ┌──────────┐ ┌──────────┐ │
│ │ │ Layers │ │ File / │ │
│ │ │ (4) │ │ sqlite │ │
│ │ └──────────┘ │ checkpts │ │
│ │ └──────────┘ │
│ ▼ │
│ ┌────────────┐ │
│ │ LLM │←══ network │
│ └────────────┘ │
└─────────────────────────────────────────────────────────────┘
│ (HTTP, fan-out)
▼
┌───────────────────────┐
│ SessionExporter │
│ backend (operator) │
└───────────────────────┘
```
Trust boundaries (each of these is an attack surface):
| # | Boundary | Description |
|---|---|---|
| TB-1 | Framework SDK ↔ Adapter | The framework SDK can pass arbitrary content into adapter event handlers. |
| TB-2 | LLM ↔ Adapter | LLM output (tool responses, retrieved documents, peer messages) reaches the AdversarialLayer's scanned channels. |
| TB-3 | Adapter ↔ Layers | The four-layer evaluator runs in-process; trusted. |
| TB-4 | Layers ↔ Persistence | Checkpoint and BaselineStore writes go to local disk or sqlite. |
| TB-5 | Adapter ↔ Exporter | Session data (start, events, end) is fanned out to caller-registered exporters and the HTTP reporter. |
| TB-6 | User process ↔ Operator backend | The exporter backend (e.g. `agentegrity-pro`, the reference receiver, a custom backend) sees the wire format. |
| TB-7 | Maintainers ↔ Public registries | PyPI and npm releases are signed by the GitHub release workflow's OIDC identity (in 0.6.0+ when SLSA provenance is re-enabled). |
---
## 2. STRIDE
For each STRIDE category, we list the threats relevant to this
framework, the mitigation in place, and the residual risk.
### 2.1 Spoofing
**T-S1 — Forged AttestationRecord with attacker-chosen content.**
An attacker who can sign records with the framework's private key
can produce records that pass `AttestationRecord.verify()`.
- **Mitigation.** The signing key is supplied by the caller — the
framework never generates or stores a key on the user's behalf.
`AttestationRecord.sign(private_key)` writes the signature and
embedded public key to the record. Verification is via embedded
public key OR an explicit public_key argument so an operator can
pin to a known good key.
- **Operator must.** Generate a fresh Ed25519 key per agent (or per
agent-tier), store it in a KMS or HSM, and never pass key material
through the same trust boundary as agent input. v0.7 will ship a
`KeyProvider` Protocol so this is enforced architecturally rather
than only by convention.
- **Residual risk.** If the operator stores the signing key in the
same process as the LLM that the agent talks to and the LLM
achieves arbitrary code execution (for example via a tool that
exec's a subprocess on attacker-controlled input), the key is
reachable. Use OS-level isolation or a KMS for any deployment
where this matters.
**T-S2 — Imposter SessionExporter.**
A malicious package on PyPI/npm could register itself as a
`SessionExporter` and receive every event flowing through the agent.
- **Mitigation.** Exporter registration is explicit — `register_exporter(...)`
must be called by the user's code. There is no auto-discovery
pathway. Supply-chain policy (lockfiles, allowlists) is the
operator's responsibility.
- **Operator must.** Pin every dependency; review what
`register_exporter` calls live in your codebase.
**T-S3 — Imposter HTTP exporter backend.**
The agent points `AGENTEGRITY_URL` at a backend; an attacker
controlling DNS or the network path can serve a different one.
- **Mitigation.** TLS is the operator's responsibility. The reporter
uses `fetch` (TS) / `httpx` (Python) which honour standard CA
bundles. There is no certificate pinning today.
- **Residual risk.** A network-level attacker between the agent and
the backend can observe (and potentially modify) the wire payload.
Use mTLS or a private network for production deployments.
### 2.2 Tampering
**T-T1 — Tampered AttestationRecord on disk or in transit.**
- **Mitigation.** Each record is Ed25519-signed; `verify()` returns
False on any byte change. The chain links every record to the
SHA-256 of its predecessor's canonical payload, so a single
tampered record breaks the entire downstream chain via
`AttestationChain.verify_chain()`.
- **Cross-link tested in.** `tests/test_recovery_restore.py`,
`tests/test_attestation*.py`.
- **Residual risk.** The verifier needs the legitimate public key. An
attacker who can swap both the record AND the embedded public key
produces a chain that verifies internally — operators MUST verify
against a known-good key, not against the embedded one.
**T-T2 — Tampered Checkpoint or BaselineStore file.**
The reference file backends write JSON to disk. An attacker with
filesystem write access can mutate baselines or checkpoints.
- **Mitigation.** Atomic writes (`NamedTemporaryFile` + `os.replace`)
prevent half-written files. JSON contents are NOT signed today.
- **Operator must.** Restrict filesystem permissions. Treat the
checkpoint directory as a secret-equivalent — anyone who can write
there can roll the agent back to a state of their choosing.
- **Residual risk.** Accepted. v0.7+ will add optional signature
envelope around persisted artifacts via the `KeyProvider` interface.
**T-T3 — Policy rule tampering at rest.**
`GovernanceLayer` reads policy rules from the policy_set name +
custom rules at construction time. If the rule source (a config file,
a database row) is mutable by an attacker, the agent's authorization
behavior changes.
- **Mitigation.** Rules are passed as code, not loaded from disk by
default. The `policy_set="enterprise-default"` is a constant
defined in `src/agentegrity/layers/governance.py`.
- **Operator must.** If you load custom rules from a file, sign them
separately and verify before passing them to `GovernanceLayer`.
Audit log entries are SHA-256-hashed so post-hoc detection of
rule changes is possible — but only if you compare against a
trusted hash record.
**T-T4 — Tampering with the regex detector taxonomy at runtime.**
A malicious dependency could monkey-patch
`agentegrity.layers.adversarial.default_detector_patterns()` to
return a weakened set.
- **Mitigation.** None at runtime. Python and JS both allow
monkey-patching arbitrarily.
- **Operator must.** Pin every dependency and use `pip-audit`/`npm
audit` in CI. Consider running with `agentegrity` in an isolated
process with restricted module imports.
### 2.3 Repudiation
**T-R1 — Operator denies an evaluation occurred.**
- **Mitigation.** Every evaluation produces an `AttestationRecord`
with a UUID, a signed timestamp, and a chain link to the previous
record. The chain is independently verifiable by any third party
with the public key — the framework does not need to be trusted to
prove what it observed.
- **Operator must.** Persist the chain. The library does not write
to durable storage on its own.
**T-R2 — Agent denies it received an injection.**
- **Mitigation.** The AdversarialLayer's `ThreatAssessment` becomes
part of the `AttestationRecord.layer_states`. A signed record with
`threat_count > 0` is permanent evidence that the agent's
evaluation pipeline saw the attack.
- **Residual risk.** If the layer didn't catch the attack (see
STATUS.md detection-quality discussion — the regex taxonomy misses
action-oriented injections), the absence of a threat in the
record only proves the layer didn't fire, not that no attack
occurred. Detection quality is the operator's responsibility to
validate against their own threat model.
### 2.4 Information disclosure
**T-I1 — Sensitive context leaks via SessionExporter fan-out.**
Every adapter event is fanned out to every registered exporter. If
the agent context contains secrets (API keys, PII in tool responses,
private documents in retrieved_documents), they reach every
subscriber.
- **Mitigation.** None at the framework layer. The exporter wire
format is not sanitised.
- **Operator must.** Scrub sensitive fields before they reach the
framework, use a custom validator to redact specific fields, or
put a sanitising proxy in front of the exporter receiver. The
reference receiver in `examples/exporter_receiver/` deliberately
prints the full payload so operators see what their exporters
receive.
**T-I2 — Signing key disclosure via debug logging.**
Any operator code path that `repr()`s an `AttestationRecord` could
inadvertently expose the signature bytes (low risk) or the public
key (informational, not a secret).
- **Mitigation.** `AttestationRecord.__repr__` redacts the signature
to a truncated hex prefix; the full signature is only visible via
`to_dict()` which the user explicitly invokes.
- **Operator must.** Avoid logging private keys. The framework never
receives the private key in a form it would log — only the
public key is embedded in the record.
**T-I3 — Cross-tenant leakage in checkpoint stores.**
Multiple agents sharing a single `FileBaselineStore` directory or
`SqliteBaselineStore` database see each other's baselines.
- **Mitigation.** The store is keyed by `agent_id`, but there is no
ACL — every agent with a handle to the store can read every other
agent's state.
- **Operator must.** Isolate stores per-tenant. The backends are
cheap to instantiate; create one per tenant rather than one per
cluster.
### 2.5 Denial of service
**T-D1 — Resource exhaustion via huge prompts.**
An attacker controls the agent's input. Each AdversarialLayer
evaluation runs every regex pattern against the input text.
- **Mitigation.** Regex patterns are anchored on word boundaries and
use bounded character classes — none use unbounded backreferences
that would enable ReDoS. The default 21-pattern taxonomy runs in
~0.2 ms p95 on representative input (see `tests/test_perf_budget.py`).
- **Residual risk.** The framework does not impose a maximum input
length. An attacker passing a 10 MB string will pay 10 MB worth of
regex CPU. Use the operator's input-size limit at the framework
SDK level.
**T-D2 — Exporter starvation.**
A slow exporter (e.g. a webhook to a backend that's down) could in
principle stall the adapter.
- **Mitigation.** Exporter callbacks are awaited via `_safe_await` /
`safeCall` with a `try/except` that catches and logs every
exception. Slow exporters block the await but cannot crash the
agent. There is no per-exporter timeout in 0.6.0.
- **Operator must.** Use a non-blocking HTTP client or a fire-and-
forget queue if the exporter target is unreliable. v0.7+ will add a
per-exporter timeout knob.
**T-D3 — Chain-verification cost on long-running agents.**
`AttestationChain.verify_chain()` is O(n) over chain length. An
agent running for weeks with millions of records pays linear cost
on each verify.
- **Mitigation.** Verification is opt-in. Operators choose when to
verify; the chain doesn't auto-verify on append.
- **Operator must.** Roll over to a fresh chain at deployment
boundaries. The Checkpoint Protocol gives a clean cut-point.
### 2.6 Elevation of privilege
**T-E1 — Path traversal in FileCheckpoint / FileBaselineStore.**
A malicious or buggy caller passes an `agent_id` like
`../../etc/passwd` — without a guard the store could write outside
its root.
- **Mitigation.** `FileCheckpoint._path_for` and
`FileBaselineStore._path_for` raise `ValueError` on any id
containing `/`, `\\`, or `..`. Tested in `tests/test_checkpoint.py`
and `tests/test_baseline_store.py`.
- **Residual risk.** Accepted as low — would require an attacker to
control `agent_id` and the filesystem behaviour. Both backends
use pure-Python path checks; no shell interpolation.
**T-E2 — Arbitrary code execution via custom validator.**
The framework lets operators register custom threat detectors and
custom policy rules that are arbitrary Python callables. A malicious
dependency that adds a custom detector escalates to full process
control.
- **Mitigation.** None. Custom callables are explicitly an extension
point and their code runs in the agent's process.
- **Operator must.** Treat custom detector / policy registration as
a trust-decision; don't accept callables from untrusted sources.
Code review every custom hook.
**T-E3 — Supply-chain attack on `agentegrity` itself.**
A compromised maintainer, stolen npm token, or compromised CI
credential could publish a malicious release.
- **Mitigation today.** PyPI publishing uses GitHub Actions OIDC
trusted publishing — there are no long-lived API tokens in CI.
Two-factor authentication is enforced on all maintainer accounts.
- **Mitigation v0.7+** SLSA provenance generation in `release.yml`
(currently disabled — the bump went out without it after the repo
moved to public/cogensec; will be re-enabled in v0.7), SBOM
attached to every GitHub release, sigstore signature on release
artifacts.
- **Operator must.** Pin exact versions, verify SLSA provenance once
it's re-enabled, and use `pip install agentegrity==0.6.0
--require-hashes` against a lock file.
---
## 3. Out of scope
The threat model deliberately does not cover:
- **The agent's own LLM provider.** Agentegrity has no opinion on
whether the LLM is hosted by Anthropic, OpenAI, on-prem, etc. Trust
the LLM at whatever level your existing vendor-risk process trusts
it.
- **The framework SDK we adapt.** Issues in Claude Agent SDK,
LangChain, OpenAI Agents SDK, CrewAI, Google ADK, or Vercel AI
SDK are not agentegrity issues. Report them upstream.
- **Detection coverage.** "The regex taxonomy doesn't detect attack
pattern X" is a feature gap (see STATUS.md), not a vulnerability.
- **agentegrity-pro.** The commercial dashboard ships under a separate
security policy.
---
## 4. Mitigations summary
| # | Mitigation | Where |
|---|---|---|
| M-1 | Ed25519 record signatures, chain hash links | `src/agentegrity/core/attestation.py` |
| M-2 | Deterministic canonical payload | `AttestationRecord.canonical_payload` |
| M-3 | Atomic file writes | `FileCheckpoint`, `FileBaselineStore` |
| M-4 | Path-traversal guard | `_path_for(agent_id)` in both backends |
| M-5 | Idempotent CREATE TABLE | `SqliteCheckpoint`, `SqliteBaselineStore` |
| M-6 | Persistent connection for `:memory:` | both sqlite backends |
| M-7 | Fail-open exporter fan-out | `_safe_await` / `safeCall` |
| M-8 | Idempotent register_exporter | reference-equality dedup in both Python and TS |
| M-9 | ReDoS-safe regex taxonomy | bounded character classes, no unbounded backrefs |
| M-10 | OIDC trusted publishing | `.github/workflows/release.yml` |
| M-11 | Cross-adapter conformance suite | `tests/test_adapter_conformance.py`, `clients/typescript/test/cross-package-conformance.test.ts` |
| M-12 | Performance budget | `tests/test_perf_budget.py` |
| M-13 | Detection regression gate | `tests/test_benchmarks.py` (synthetic + InjecAgent) |
| M-14 | Tamper-recovery round trip | `tests/test_recovery_restore.py` |
## 5. Open items (v0.7+)
These items have been triaged; none represent a vulnerability against
v0.6.0 but each closes a residual risk above:
- **`KeyProvider` Protocol** with file / env / KMS-backed reference
impl. Closes T-S1, T-T2 residual risk.
- **JWS / COSE serialization for `AttestationRecord`.** Lets generic
verifiers validate without depending on the framework. Closes
interop gap behind T-R1.
- **Per-exporter timeouts.** Closes T-D2 residual risk.
- **Re-enable SLSA provenance + sigstore signatures + SBOM.**
Disabled when the repo was private; the cogensec move makes them
eligible again. Closes T-E3.
- **Optional signature envelope on persisted Checkpoint /
BaselineStore artifacts.** Closes T-T2 residual risk.
---
## 6. Reporting
Report any threat not covered here, or any bypass of a mitigation
listed above, via the process in [`SECURITY.md`](../SECURITY.md).
================================================================
# Status
Source: STATUS.md
================================================================
# Project Status
A scannable matrix of where each piece of the framework is on the
maturity curve. This is the answer to "what's actually production-ready
versus reference-quality versus experimental."
The README's *What it does and does not* prose explains the philosophy;
this document is the operational version of it.
**Legend**
- ✅ **Hardened** — production-grade, tested against adversarial inputs,
cryptographically grounded, or otherwise carrying load. Safe default.
- 🟡 **Reference** — real working logic with end-to-end test coverage,
but the detection / heuristic content is the published reference set.
Catches obvious cases; production deployments should extend with
custom detectors / rules / providers.
- 🧪 **Experimental** — feature exists, has at least smoke tests, but
the API or behaviour may change before v1.0.
- 🛠 **Planned** — on the roadmap, not yet shipped. Documented for
transparency.
---
## Core (`src/agentegrity/core/`)
| Module | Status | Notes |
|--------------------------------------------|:------:|-------|
| `evaluator.IntegrityEvaluator` | ✅ | Sync four-layer pipeline; composite scoring with configurable `PropertyWeights`; fail-fast on `block`. |
| `evaluator.AsyncIntegrityEvaluator` | ✅ | Runs independent layers via `asyncio.gather` when `fail_fast=False`. Wraps sync layers via `asyncio.to_thread`. |
| `attestation.AttestationRecord` | ✅ | Ed25519 signing via `cryptography`, deterministic JSON canonicalization, SHA-256 content hash. |
| `attestation.AttestationChain` | ✅ | Hash-chained tamper-evident history; `verify_chain()` covers all linked records. |
| `monitor.IntegrityMonitor` | ✅ | `@guard` decorator, violation callbacks, four `ViolationAction` modes. |
| `profile.AgentProfile` | ✅ | Type-safe enums for `AgentType` / `DeploymentContext` / `RiskTier`; `default()` factory. |
## Layers (`src/agentegrity/layers/`)
| Layer | Default? | Status | Detection Quality |
|--------------------|:--------:|:------:|-------------------|
| `AdversarialLayer` | ✅ | ✅ | Regex-pattern taxonomy across six families. 21 default patterns scan direct input + memory_reads + tool_outputs + retrieved_documents + peer_messages; per-pattern severity/confidence; aggregation collapses multiple matches per (channel, threat_type). Custom patterns plug in via `extra_patterns=`. **`EmbeddingSimilarityDetector` (zero-dep n-gram fallback + pluggable embed_fn for Voyage / OpenAI / sentence-transformers)** is the layer-2 defence; **`AdversarialLLMLayer` (Claude-backed semantic classifier, opt-in via `[llm]`)** is the layer-3 defence — composes regex + LLM verdicts conservatively, fail-open on API error. |
| `CorticalLayer` | ✅ | ✅ | Reasoning conflict detection rule-based (🟡). Memory provenance structural (🟡). **Drift: Jensen-Shannon distance with Laplace smoothing** (default) **or 1D Wasserstein behind `[stats]`** — chosen via `metric="js"\|"wasserstein"`. Both symmetric, both bounded in [0, 1], both gated by `min_drift_samples`. |
| `CorticalLLMLayer` (`cortical_llm.py`) | opt-in via `default_layers(prefer_llm=True)` | ✅ | Anthropic-API-backed semantic checks for reasoning + memory + drift. Sync `evaluate()` stays pattern-based — only `aevaluate()` calls Claude — so opting in doesn't penalise sync callers. Fail-open on API error / missing key. Requires `pip install agentegrity[llm]`. |
| `AdversarialLLMLayer` (`adversarial_llm.py`) | opt-in | ✅ | Claude-backed semantic classifier for the adversarial layer. Composes with the regex taxonomy via union (LLM-detected attacks add ThreatAssessments; LLM agreeing with regex deduplicates). Same opt-in pattern as `CorticalLLMLayer`. Requires `pip install agentegrity[llm]`. |
| `GovernanceLayer` | ✅ | ✅ | Real policy engine, `enterprise-default` rule set, custom rule support, audit log with SHA-256 content hash. |
| `RecoveryLayer` | ✅ | ✅ | Capability declaration check, sustained-degradation detection, attestation-chain continuity, **`Checkpoint` Protocol with InMemory / File (atomic write) / Sqlite (idempotent schema) / KMSCheckpoint (envelope encryption + AWS KMS wrapped data keys, `[kms]` extra) reference backends**. `RecoveryLayer.snapshot()` + `restore_to()` round-trip preserve chain link hashes so post-restore `verify_chain()` still passes. KMSCheckpoint binds at-rest secrecy to a KMS-managed CMK and verifies KMS encryption-context at load time so a compromised inner backend can't roll the agent back to attacker-chosen state. |
## Python Adapters (`src/agentegrity/.py`)
| Adapter | Status | Notes |
|-----------------------------------|:------:|-------|
| `agentegrity.claude` (Claude Agent SDK) | ✅ | Five hook points: Harness, Tools, Sandbox, Session, Orchestration. |
| `agentegrity.langchain` (LangChain + LangGraph) | ✅ | Single adapter covers both via callback-handler propagation. |
| `agentegrity.openai_agents` | ✅ | Hooks via the official `openai-agents` Python SDK. |
| `agentegrity.crewai` | ✅ | Event-bus subscription. |
| `agentegrity.google_adk` | ✅ | Google Agent Development Kit. |
All five inherit from `_BaseAdapter` (`adapters/base.py`), share the
seven canonical event types, and feed the same evaluator + attestation
chain.
## TypeScript Packages (`clients/typescript/packages/`)
| Package | Status | Notes |
|----------------------------|:------:|-------|
| `@agentegrity/client` | ✅ | Shared `createDefaultAdapter()`, `AgentegrityReporter`, types, `process.beforeExit` shutdown, exporter fan-out. |
| `@agentegrity/claude-sdk` | ✅ | Mirrors the Python Claude adapter. |
| `@agentegrity/langchain` | ✅ | LangChain JS callback handler. |
| `@agentegrity/openai-agents` | ✅ | OpenAI Agents JS SDK. |
| `@agentegrity/crewai` | ✅ | CrewAI JS event hooks. |
| `@agentegrity/google-adk` | ✅ | Google ADK JS bindings. |
| `@agentegrity/vercel-ai` | 🧪 | TS-native; uses the AI SDK's OpenTelemetry tracer surface. No Python equivalent. |
## Spec & Schemas
| Asset | Status | Notes |
|--------------------------------------|:------:|-------|
| `spec/SPECIFICATION.md` | 🟡 | Currently labeled `v1.0-draft`. Phase 6 plan: lock to v1.0. |
| `spec/layers/adversarial-layer.md` | ✅ | Normative. |
| `spec/layers/cortical-layer.md` | ✅ | Normative. |
| `spec/layers/governance-layer.md` | ✅ | Normative. |
| `spec/layers/recovery-layer.md` | 🧪 | Newly added in v0.5.3-Unreleased; conformance section subject to revision. |
| `spec/properties/*.md` | ✅ | Per-property normative docs (AC / EP / VA). |
| `schemas/exporter/*.json` | ✅ | JSON Schema for `event`, `session_start`, `session_end`, `common`. |
| `schemas/openapi.yaml` | ✅ | OpenAPI 3.1 description of the exporter wire format. |
## Operations & Tooling
| Capability | Status | Notes |
|--------------------------------------|:------:|-------|
| Lint (`ruff`) | ✅ | Clean. |
| Type check (`mypy --strict`) | ✅ | 27 source files, zero issues. |
| Python tests | ✅ | 147 tests, all green. |
| TypeScript build / typecheck / test | ✅ | All 7 packages green via `bun run`. |
| CI matrix (Python 3.10/3.12, Node 18/20/22) | ✅ | `.github/workflows/ci.yml`. |
| Version-parity gate | ✅ | `scripts/check_versions.py` (Python) + `scripts/check-versions.ts` (TS) wired into CI. |
| Release workflow | ✅ | `.github/workflows/release.yml` publishes Python wheel + npm matrix. |
| Conformance test suite (Python adapters) | ✅ | `tests/test_adapter_conformance.py` runs the same canonical event stream + lifecycle assertions across every shipped adapter (51 tests; 9 invariants × 5 adapters + registry sentinel). New adapters add one line to `ADAPTER_CLASSES` and inherit the entire matrix. |
| Conformance test suite (TS packages) | ✅ | `clients/typescript/test/cross-package-conformance.test.ts` is the TS mirror — 49 tests across 6 packages (claude-sdk / langchain / openai-agents / crewai / google-adk / vercel-ai), driving the same shared-core seam (`adapter()`) through the same canonical event stream and pinning the same parity invariants. The new suite caught two real bugs in `@agentegrity/client` on first run: missing `adapterName` field, no `registerExporter` deduplication. Both fixed in the same commit. |
| Performance budget | ✅ | `tests/test_perf_budget.py` (run via `pytest -m benchmark`) measures 200-iteration p95 latency for each layer in isolation and the full default pipeline. Calibrated ceilings: 50 ms per-layer, 100 ms full-pipeline. Currently measured: per-layer p95 0.01-0.20 ms, pipeline p95 0.23 ms (250-5000x cushion before LLM-backed paths land). Per-layer + pipeline budgets pinned in metadata-sentinel tests so a maintainer can't silently raise them. |
| Detection benchmark suite | ✅ | `pytest -m benchmark` runs the in-repo synthetic suite (~30 attacks + ~30 benign across 6 attack families) with calibrated thresholds (TPR ≥ 0.95, FPR ≤ 0.05, F1 ≥ 0.95, plus per-family floor: every family must register at least one TP). Loader stubs for PINT / AgentDojo / InjecAgent auto-skip when their `AGENTEGRITY_BENCH_*` env var is unset, so cron can plug in real datasets without touching CI defaults. `scripts/run_benchmarks.py [--all]` prints a markdown report and exits non-zero on regression. **Real-world numbers published below.** |
| OpenTelemetry instrumentation | 🛠 | Phase 5 plan. |
| Prometheus metrics | 🛠 | Phase 5 plan. |
| SLSA provenance + SBOM + sigstore | 🛠 | Phase 6 plan; provenance was disabled while repo was private and is now eligible to re-enable. |
## Roadmap Items Not Yet Shipped
| Item | Phase | Notes |
|---------------------------------------|:-----:|-------|
| Semantic Kernel adapter (Python + TS) | 4 | `pip install agentegrity[semantic-kernel]` + `@agentegrity/semantic-kernel`. |
| AutoGen adapter (Python) | 4 | `pip install agentegrity[autogen]`. |
| AWS Bedrock Agents adapter (Python) | 4 | `pip install agentegrity[bedrock]`. |
| ~~Reference SessionExporter receiver~~ | ✅ | Shipped in `examples/exporter_receiver/`. FastAPI app implementing all three endpoints (`POST /sessions`, `POST /sessions/{id}/events`, `POST /sessions/{id}/end`); validates each payload against `schemas/exporter/*.json` via `jsonschema.Draft202012Validator`. 11 smoke tests cover happy-path (202) and validation errors (422). Not a production backend — in-memory store, no auth — but unblocks adoption of the exporter wire format without `agentegrity-pro`. |
| JWS / COSE attestation serializations | 6 | Interop with generic verifiers; raw Ed25519 stays the default. |
| Key rotation + KMS interface | 6 | `KeyProvider` Protocol with file / env / AWS KMS impls. |
| ~~Threat model document~~ | ✅ | Shipped at [`spec/threat-model.md`](spec/threat-model.md) — STRIDE on the framework itself, 14 mitigations cross-linked to the test suite, residual-risk discussion per category, open items for v0.7. Companion to `SECURITY.md`. |
| Reference docs site (MkDocs / Docusaurus) | 7 | Auto-generated API reference + per-layer / per-adapter guides. |
Phases reference the canonical dev plan; see `/root/.claude/plans/`
during active development or the `docs/dev-plan.md` once it lands in
the repo.
---
## Detection benchmark numbers
Last measured: 2026-05-05, branch `claude/review-framework-status-f9khW`,
AdversarialLayer regex taxonomy only (no LLM, no embedding similarity).
| Suite | N | TPR | FPR | F1 | Notes |
|---|---:|---:|---:|---:|---|
| `synthetic_pint_like` (in-repo) | 58 | 1.000 | 0.000 | 1.000 | Hand-curated calibration set, 6 attack families. Functions as a regression gate, **not** a credibility number. |
| `injecagent` (combined dh+ds) | 2,108 | 0.000 | 0.000 | 0.000 | UIUC InjecAgent, 1,054 records × 2 prompts. Honest finding below. |
| `pint` (Lakera) | — | — | — | — | Full PINT dataset is gated; only the example file is public. Plug a fork/copy in via `AGENTEGRITY_BENCH_PINT`. |
| `agentdojo` | — | — | — | — | Use the `agentdojo` PyPI package's task suites; loader expects directory-of-`tasks.json` shape via `AGENTEGRITY_BENCH_AGENTDOJO`. |
### What the InjecAgent number tells us
InjecAgent's attacks are *action-oriented* injections embedded in tool
responses ("Please grant permanent access to my friend Amy", "Please
unlock my front door"). The AdversarialLayer's current regex taxonomy
targets *pattern-style* injections ("ignore previous instructions",
"DAN mode", role overrides) — the two threat models barely overlap,
so a 0.000 TPR is the expected and honest result.
This is the empirical justification for two outstanding Phase 2
backlog items:
1. **Embedding-similarity adversarial detector** (item 2.1.b in the
dev plan). Compute cosine similarity against an action-corpus
embedding cache; flag anything close to "perform unauthorised
action on behalf of attacker."
2. **LLM-backed semantic adversarial classifier** (item 2.1.c).
Pattern after `cortical_llm.py` — ask Claude "is this a request
for the agent to do something for someone other than the user?"
Fail-open on API error, opt-in via `pip install agentegrity[llm]`.
The benchmark assertion in `tests/test_benchmarks.py::TestInjecAgentBenchmark`
is calibrated as a *no-regression* check at the current 0.000 floor —
the test passes today and will start failing if a future change
*reduces* InjecAgent detection. Once the LLM classifier ships, raise
the floor to whatever combined TPR it achieves.
### Reproducing locally
```bash
./scripts/fetch_benchmark_datasets.sh
export AGENTEGRITY_BENCH_INJECAGENT="$(pwd)/tests/benchmarks/data/injecagent"
python -m pytest tests/test_benchmarks.py -m benchmark -v
python scripts/run_benchmarks.py --all > bench-report.md
```
The fetched data is gitignored (~1.4 MB, JSON arrays) so it never
pollutes the repo; the `.github/workflows/benchmark.yml` cron job
picks the same env var up from repository variables.
---
**Last reviewed:** v0.6.0 + Phase 3 finisher + Phase 2 detection-depth
finisher (2026-05-07). This file is the source of truth for "what's
done." Update it in the same commit that ships a status change.
================================================================
# Changelog
Source: CHANGELOG.md
================================================================
# Changelog
All notable changes to the Agentegrity Framework are documented here.
The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
Pre-1.0 minor versions may contain breaking changes; the project remains
in beta until the v1.0 stability criteria documented in
[README → Roadmap](README.md#roadmap) are met.
## [Unreleased]
_Nothing yet. v0.6.0 was just cut — open issues for the next batch of
work as they come in._
## [0.6.0] - 2026-05-05
### Changed
- **Default integrity pipeline now has four layers, not three.**
`RecoveryLayer` joins `AdversarialLayer` / `CorticalLayer` /
`GovernanceLayer` in the canonical pipeline used by
`AgentegrityClient` and the framework adapter base class.
- **`PropertyWeights` defaults rebalanced** to give recovery a non-zero
share: AC=0.35, EP=0.20, VA=0.30, RI=0.15 (was AC=0.40, EP=0.25,
VA=0.35, RI=0.0).
- **Adversarial detection upgraded from substring matching to a regex
taxonomy.** `AdversarialLayer` ships 21 default regex patterns
organized into six attack families (prompt_injection, jailbreak,
role_confusion, system_prompt_extraction, data_exfiltration,
prompt_obfuscation). Detection now scans direct input *plus* memory
reads *plus* tool-output content, and per-pattern severity/confidence
drives the aggregate `ThreatAssessment`. Multiple matches in the same
channel collapse to one entry per `threat_type` with `indicators`
listing every pattern that fired. The taxonomy moves the layer from
🟡 *Reference* to ✅ *Hardened* on the STATUS matrix.
- **Cortical drift detector hardened.** Replaced the asymmetric forward
KL approximation with Jensen-Shannon distance under Laplace
smoothing — symmetric, bounded in [0, 1], and a proper metric. New
`min_drift_samples` constructor argument (default 20) guards against
flagging drift on tiny sample sizes; below threshold the dimension
surfaces an `__insufficient_samples` marker instead of a verdict. The
`_kl_divergence_approx` private name is retained as an alias.
- README, MANIFESTO, spec, and glossary updated to describe four layers
consistently. New `spec/layers/recovery-layer.md` normative spec.
### Added
- `agentegrity.layers.default_layers()` factory returning the
canonical four-layer pipeline. Used internally by every zero-config
entry point.
- `RecoveryLayer`, `default_layers`, and `PropertyWeights` are now
re-exported from the top-level `agentegrity` package.
- `scripts/check_versions.py` Python equivalent of the existing
TypeScript version-parity check. Wired into CI to fail the build on
drift between `pyproject.toml`, `src/agentegrity/__init__.py`, the
README shields badge, and present-tense version claims in README
prose.
- New public `DetectorPattern` dataclass + `default_detector_patterns()`
factory. Custom patterns can be appended via
`AdversarialLayer(extra_patterns=[...])` or fully replace the
taxonomy via `AdversarialLayer(patterns=[...])`.
- **`Checkpoint` Protocol + `InMemoryCheckpoint` / `FileCheckpoint`
(atomic write via tempfile + `os.replace`, path-traversal guard) /
`SqliteCheckpoint` (idempotent `CREATE TABLE IF NOT EXISTS`,
`:memory:` supported via persistent connection) reference backends**
in `agentegrity.layers.checkpoint`.
- **`RecoveryLayer.snapshot(agent_id, baseline=, metadata=)` and
`RecoveryLayer.restore_to(checkpoint_id)`** — round-trip the layer
through any conforming backend. Snapshot captures the attestation
chain, score history, optional behavioural baseline, and arbitrary
metadata; restore preserves original link hashes so
`verify_chain()` returns True after a tamper→restore cycle.
- `RecoveryAssessment` now surfaces `checkpoint_count` and
`last_checkpoint_id` for downstream telemetry.
- `AttestationRecord.from_dict` + `AttestationChain.from_records` /
`AttestationChain.from_dict_list` / `AttestationChain.to_records_dict`
for lossless chain serialisation.
- An attached `Checkpoint` backend is now treated as a synthetic
`checkpoint` recovery capability so the score reflects operational
reality, not just the agent profile's declarations.
- 76 new tests covering the regex taxonomy
(`test_adversarial_detectors.py`), the JS-distance drift metric
(`test_drift.py`), checkpoint backend round-trips
(`test_checkpoint.py`), and the tamper→restore cycle
(`test_recovery_restore.py`).
- **`BaselineStore` Protocol + `InMemoryBaselineStore` /
`FileBaselineStore` (atomic writes via tempfile + `os.replace`,
path-traversal guard) / `SqliteBaselineStore` (idempotent
`CREATE TABLE IF NOT EXISTS`, `:memory:` via persistent connection)**
in `agentegrity.layers.baseline_store`. Mirrors the Phase 2c
`Checkpoint` Protocol pattern so behavioural baselines survive
process restarts.
- **`CorticalLayer(baseline_store=...)`** wires the new persistence
surface: on first `evaluate` for an agent the layer reads through
to the store; `update_baseline` writes through after each update.
An explicit `baseline=` argument still wins (rollback-to-known-good
story).
- **Adversarial layer scans two new channels**: `retrieved_documents`
(RAG poisoning) and `peer_messages` (multi-agent injection).
Loose schema accepts `{content, text, body}` / `{content, text, message}`.
Same regex taxonomy applies, same per-channel threat aggregation.
- **Nightly `benchmark` workflow** — daily 04:17 UTC cron + on
workflow_dispatch. Runs `pytest -m benchmark` and uploads
`bench-report.md` as a 30-day artifact. External datasets plug in
via `AGENTEGRITY_BENCH_*` repository variables.
- **Python coverage gate at 85% line+branch**, currently 86.71%.
`pytest-cov` + `coverage[toml]` added to `[dev]` extras; `[tool.coverage]`
block in `pyproject.toml`; new `coverage` CI job uploads
`coverage.xml` for 14 days. CLI `__main__.py` is omitted from
coverage by intent (verified manually via `python -m agentegrity`).
- **TypeScript coverage gate at 80% lines / 70% functions**,
currently 89.99% / 83.40%. New `clients/typescript/scripts/check-coverage.ts`
parses `bun test --coverage` text output and exits non-zero on
threshold breach (works around bun 1.3.11's broken
`coverageThreshold` enforcement). Wired into the CI typescript job.
- **Real-world detection benchmark numbers published in
`STATUS.md`** — InjecAgent dh+ds combined: TPR=0.000, FPR=0.000 on
N=2,108 (regex taxonomy targets pattern-style injections; InjecAgent
attacks are action-oriented and require the unfinished LLM
classifier). The synthetic suite still serves as the calibration
regression gate. `scripts/fetch_benchmark_datasets.sh` automates
the InjecAgent fetch; data files are gitignored.
- **Cross-adapter conformance suite** (`test_adapter_conformance.py`).
Same canonical event stream is driven through every shipped Python
adapter (Claude / LangChain / OpenAI Agents / CrewAI / Google ADK)
and the same 9 invariants are pinned per adapter — base-class
inheritance, evaluation count vs chain length, chain verification,
session-id stability, exporter lifecycle (start/event×N/end),
exporter idempotency, fail-open on broken exporter, multi-exporter
fan-out, summary shape, idempotent close, unknown-event tolerance.
Adding a new adapter requires one line in `ADAPTER_CLASSES`; the
matrix runs against it automatically. 51 tests; a sentinel test
fails loudly if the registry size drifts so adapters can't be
silently dropped.
- **Detection benchmark suite** (`tests/benchmarks/`,
`tests/test_benchmarks.py`, `scripts/run_benchmarks.py`).
`pytest -m benchmark` runs the in-repo synthetic suite (~28 attacks
+ ~30 benign across the six attack families) with calibrated
thresholds (TPR ≥ 0.95, FPR ≤ 0.05, F1 ≥ 0.95, plus per-family
floor: every family must register at least one TP).
`BenchmarkPrompt` / `BenchmarkResult` / `run_suite()` /
`format_markdown_report()` are the harness; loader stubs for PINT /
AgentDojo / InjecAgent auto-skip when their `AGENTEGRITY_BENCH_*`
env var is unset, so a nightly cron can plug in real datasets
without changing CI defaults. The benchmark marker is excluded from
the default `pytest` invocation via `addopts = "-m 'not benchmark'"`
so unit tests stay fast. Calibration baseline:
`synthetic_pint_like` TPR=1.000, FPR=0.000, F1=1.000 on N=58.
- During calibration two regex patterns were tightened to handle
realistic attack phrasings: `ignore_your_role` now allows an
optional adjective between determiner and noun ("abandon your
*assistant* character"); `reveal_system_prompt` now allows an
optional `me` after the verb ("show *me* your hidden instructions")
and the noun alternation accepts "hidden \\w+" so configuration
fishing is captured.
### Migration
- Callers that constructed `PropertyWeights` with three keyword
arguments will now hit the validator. Pass
`recovery_integrity=0.0` explicitly to keep three-property weighting,
or omit the `weights=` argument and adopt the new default.
- Callers that rely on undocumented behaviour of `_kl_divergence_approx`
will see *different numeric values* (the new function returns JS
distance, not forward KL). Public APIs are unchanged. Drift
thresholds calibrated against the old metric should be revalidated.
## [0.5.3] - 2026-04-29
### Changed
- Concrete version pins replace `workspace:*` references in TypeScript
package manifests so published `@agentegrity/*` packages install
cleanly off-registry.
- GitHub Actions bumped to `actions/checkout@v5`,
`actions/setup-python@v6`, `actions/setup-node@v5`.
- CI push triggers scoped to `main` plus concurrency cancellation so
in-flight runs cancel on rapid pushes.
- Repository moved to the `cogensec` org.
### Added
- `AGENTEGRITY_OFFLINE` environment variable so test runs work without
a reporter target.
- Smoke tests for `createDefaultAdapter` in the TypeScript client
package.
## [0.5.0] - 2026-03-?
### Added
- **Six TypeScript framework adapters.** `@agentegrity/claude-sdk`,
`@agentegrity/langchain`, `@agentegrity/openai-agents`,
`@agentegrity/crewai`, `@agentegrity/google-adk`, plus the
TypeScript-native `@agentegrity/vercel-ai` (no Python equivalent;
uses the AI SDK's OpenTelemetry tracer surface).
- `createDefaultAdapter()` shared helper in `@agentegrity/client` that
every framework adapter wraps. Owns lifecycle, exporter fan-out,
fail-open guarantees, and `process.beforeExit` shutdown.
- `clients/typescript/scripts/check-versions.ts` keeps every
`@agentegrity/*` package version aligned with `pyproject.toml`.
- Release workflow publishes the seven npm packages in a matrix.
## [0.4.0] - 2026-?
### Added
- **`SessionExporter` hook + cross-language wire format.**
`register_exporter()` on every Python adapter; live session data
(session_start, every evaluated event, session_end) streams as
JSON-ready dicts to subscribed exporters, fail-open so a broken
exporter never breaks the agent.
- JSON Schema definitions under `schemas/exporter/` and OpenAPI 3.1
under `schemas/openapi.yaml` for the exporter wire format.
- First-party TypeScript client (`@agentegrity/client`) for emitting
the same event stream from Bun / Node agents.
## [0.3.0]
### Added
- **Multi-framework adapters.** LangChain / LangGraph, OpenAI Agents
SDK, CrewAI, and Google Agent Development Kit each ship as a
`agentegrity.` Python module with the same three-line
instrumentation surface as the Claude adapter.
- Shared `_BaseAdapter` so adding a new framework is mostly mechanical.
## [0.2.1]
### Added
- Zero-config `agentegrity.claude` top-level module: `hooks()`,
`report()`, `reset()` — three-line Claude Agent SDK instrumentation
with no setup.
- `AgentProfile.default()` factory.
- `python -m agentegrity` info CLI + `doctor` self-check command.
## [0.2.0]
### Added
- **Claude Agent SDK adapter.** First framework integration with five
hook points (Harness, Tools, Sandbox, Session, Orchestration).
- **LLM-backed cortical checks** (`pip install agentegrity[llm]`):
Claude-powered semantic analysis of reasoning chains, memory
provenance, and behavioral drift, fail-open on API errors.
- **`RecoveryLayer`** (initially opt-in; promoted to a default layer
in v0.5.3-Unreleased).
- **`AsyncIntegrityEvaluator`** running independent layers in parallel
via `asyncio.gather`.
## [0.1.0]
### Added
- Initial public release.
- Three-layer architecture: `AdversarialLayer`, `CorticalLayer`,
`GovernanceLayer`.
- Pattern-based reference detectors (substring matching for prompt
injection indicators, dictionary-based behavioral drift).
- Cryptographic attestation: Ed25519-signed `AttestationRecord`,
hash-chained `AttestationChain`, deterministic JSON canonicalization.
- Custom validator and policy extension points.
- Three working examples (`basic_evaluation.py`,
`runtime_monitoring.py`, `custom_validator.py`).
[Unreleased]: https://github.com/cogensec/agentegrity-framework/compare/v0.6.0...HEAD
[0.6.0]: https://github.com/cogensec/agentegrity-framework/releases/tag/v0.6.0
[0.5.3]: https://github.com/cogensec/agentegrity-framework/releases/tag/v0.5.3
[0.5.0]: https://github.com/cogensec/agentegrity-framework/releases/tag/v0.5.0
[0.4.0]: https://github.com/cogensec/agentegrity-framework/releases/tag/v0.4.0
[0.3.0]: https://github.com/cogensec/agentegrity-framework/releases/tag/v0.3.0
[0.2.1]: https://github.com/cogensec/agentegrity-framework/releases/tag/v0.2.1
[0.2.0]: https://github.com/cogensec/agentegrity-framework/releases/tag/v0.2.0
[0.1.0]: https://github.com/cogensec/agentegrity-framework/releases/tag/v0.1.0