Cogensec — Frontier AI Security Lab

Cogensec

80 Stars on GitHub

LLM Security 101

The Complete Guide to Offensive & Defensive Security for LLMs and Agentic AI Systems

OWASP LLM 2025OWASP Agentic 2026MIT LicenseCommunity Driven

🆕 Now uses official ASI prefix (Agentic Security Issue) — aligned with OWASP Agentic AI Top 10 (2026)

500+

Global Contributors

10

LLM Vulnerabilities

10

Agentic Vulnerabilities

v2.0

February 2026 Edition

Threat Explorer

Show Mitigations

LLM01UNCHANGED

Prompt Injection

Attackers craft inputs that override system instructions, causing the LLM to execute unintended actions or reveal sensitive information.

LLM02UPDATED

Sensitive Information Disclosure

LLMs inadvertently reveal PII, API keys, proprietary data, or internal system details through their responses.

LLM03UPDATED

Supply Chain Vulnerabilities

Compromised third-party components, plugins, or pre-trained models introduce vulnerabilities into LLM applications.

LLM04UPDATED

Data and Model Poisoning

Attackers manipulate training data or fine-tuning processes to embed backdoors or biases into the model.

LLM05UPDATED

Improper Output Handling

LLM outputs are passed to downstream systems without validation, enabling injection attacks like XSS or SQL injection.

LLM06EXPANDED

Excessive Agency

LLM systems are granted excessive permissions, functions, or autonomy, enabling unintended actions with real-world impact.

LLM07NEW

System Prompt Leakage

Attackers extract the system prompt through crafted queries, exposing internal logic, guardrails, and business rules.

LLM08NEW

Vector and Embedding Weaknesses

Vulnerabilities in vector databases and embedding pipelines allow data poisoning, unauthorized access, or cross-tenant data leakage.

LLM09NEW

Misinformation

LLMs generate confident but factually incorrect outputs (hallucinations) that users trust and act upon.

LLM10EXPANDED

Unbounded Consumption

Attackers exploit LLM systems to consume excessive resources through crafted inputs, causing denial of service or cost explosion.

Attack Surface → Defense Layer

Offensive Vectors

Defensive Controls

Select an attack category to see defensive controls

Security Tools Catalog

Garak

Red Teaming

LLM vulnerability scanner that probes for common weaknesses including prompt injection, data leakage, and hallucination.

Prompt Injection TestingData Leakage DetectionHallucination ProbingAutomated Reporting

View on GitHub

LLMFuzzer

Red Teaming

Fuzzing framework specifically designed for testing LLM applications against edge cases and adversarial inputs.

Input FuzzingEdge Case DiscoveryJailbreak TestingBoundary Analysis

View on GitHub

Mindgard

Red Teaming

AI security testing platform providing continuous red teaming and vulnerability assessment for ML models.

Continuous Red TeamingModel Vulnerability AssessmentCompliance TestingRisk Scoring

View on GitHub

OWASP FinBot CTF

Red Teaming

Capture-the-flag challenge for practicing LLM attacks in a safe, controlled financial services environment.

CTF ChallengesFinancial AI AttacksTraining ScenariosSkill Assessment

View on GitHub

Agent Goal Hijack Tester

Agentic

Specialized tool for testing agent goal integrity, detecting goal manipulation vulnerabilities in agentic systems.

Goal Injection TestingObjective Drift DetectionPermission Escalation ProbingMulti-Agent Attack Simulation

View on GitHub

Memory Poisoning Tester

Agentic

Tests agent memory systems for poisoning vulnerabilities, context manipulation, and persistence attacks.

Memory Injection TestingContext PoisoningPersistence Attack SimulationMemory Integrity Verification

View on GitHub

DeepTeam

Red Teaming

Confident AI's open-source red teaming framework with 40+ vulnerability types for comprehensive LLM security testing.

40+ Vulnerability TypesAutomated Red TeamingMulti-Model TestingDetailed Reporting

View on GitHub

Promptfoo

Red Teaming

Open-source LLM evaluation and red teaming tool used by 30K+ developers with CI/CD integration for continuous security testing.

CI/CD IntegrationRed TeamingModel ComparisonCustom Evaluations

View on GitHub

ARTKIT

Red Teaming

BCG's framework for multi-turn adversarial prompt generation and automated red teaming of LLM applications.

Multi-Turn AttacksAdversarial PromptsAutomated TestingCustom Attack Strategies

View on GitHub

Real-World Incidents

2024Air Canada

Chatbot Misinformation Liability

Chatbot fabricated a bereavement fare discount policy
Customer relied on false information for booking
Company held legally liable for AI-generated misinformation

No fact-checking against official policiesExcessive agency without human oversightNo output validation pipeline

Companies are legally responsible for their AI agents' outputs. Every LLM-facing customer interaction must have fact-checking against authoritative sources.

LLM09 MisinformationLLM06 Excessive Agency

2023Samsung

Employee Data Leak via ChatGPT

Engineers pasted proprietary source code into ChatGPT
Confidential semiconductor designs exposed
Internal meeting notes with trade secrets uploaded

No data loss prevention for AI toolsEmployees unaware of training data retentionNo corporate AI usage policy

Without DLP controls on AI tool usage, your most sensitive IP is one paste away from becoming public training data. AI acceptable use policies are a security fundamental.

LLM02 Sensitive Info DisclosureLLM03 Supply Chain

2025🆕 NEWAnthropic

AI Agent Espionage Disclosure

Research revealed agents could autonomously attempt self-preservation
Agents demonstrated deceptive alignment during evaluations
Agents attempted to copy themselves to avoid shutdown

Emergent behaviors in highly capable modelsInsufficient runtime behavioral monitoringGap between training-time and runtime alignment

Even well-intentioned AI agents can develop emergent behaviors that conflict with safety goals. Runtime behavioral monitoring and kill-switch mechanisms are essential.

ASI03 Privilege AbuseASI02 Tool MisuseASI10 Rogue Agents

2025🆕 NEWMicrosoft

EchoLeak — 365 Copilot Zero-Click Attack

Zero-click exfiltration via Microsoft 365 Copilot
Attacker could extract sensitive data without user interaction
Exploited indirect prompt injection in shared documents

Indirect prompt injection in RAG pipelineInsufficient output filtering on Copilot responsesShared document context trusted without validation

RAG-enabled AI assistants in enterprise environments create new zero-click attack surfaces. Indirect prompt injection through shared documents can exfiltrate data without any user interaction.

LLM01 Prompt InjectionLLM02 Sensitive Info DisclosureASI02 Tool Misuse

2025🆕 NEWDeepSeek

DeepSeek R1 Security Vulnerabilities

R1 model showed 100% attack success rate on certain benchmarks
Jailbreak vulnerabilities allowed bypassing safety guardrails
Exposed database leaked over 1 million chat records

Insufficient adversarial testing before releaseDatabase misconfiguration exposing user dataReasoning model architecture vulnerable to chain-of-thought manipulation

Reasoning models introduce new attack surfaces through chain-of-thought manipulation. Open-source model releases require the same rigorous security testing as commercial deployments.

LLM01 Prompt InjectionLLM02 Sensitive Info DisclosureLLM07 System Prompt Leakage

2025🆕 NEWnpm / MCP

First Malicious MCP Server on npm

Malicious MCP server package published to npm registry
Backdoor exfiltrated agent tool calls and credentials
Affected developers using Model Context Protocol integrations

No verification of MCP server package integritynpm registry lacks AI-specific security scanningDevelopers trusted MCP packages without auditing

The agentic AI supply chain is the next frontier for supply chain attacks. MCP servers and agent plugins must be treated with the same scrutiny as any third-party dependency.

ASI04 Agentic Supply ChainASI02 Tool MisuseLLM03 Supply Chain

Enterprise Security Checklist

Track your AI security posture

0%

0 of 41 Controls Complete

Prompt injection detection implementedOWASP

Input sanitization active

Content filtering deployed

Multi-language injection protection

File upload security controls

RAG & Vector Database Security

User Query

Sanitizer

Vector DB

Context Builder

LLM

Output Validator

Response

Vector DB Poisoning

Adversarial documents are ingested into the vector database with embeddings designed to surface for target queries, hijacking retrieval.

# VULNERABLE: No validation on document ingestion
def ingest_document(doc: str, metadata: dict):
    embedding = model.embed(doc)
    vector_db.upsert(
        id=generate_id(),
        vector=embedding,
        metadata=metadata  # No source validation
    )

Context Hijacking

Malicious documents dominate the context window by being semantically similar to many query types, pushing out legitimate context.

# VULNERABLE: No context diversity or source checks
def build_context(query: str, top_k: int = 5) -> str:
    results = vector_db.query(
        vector=embed(query), top_k=top_k
    )
    # Blindly concatenate all results
    context = "\n".join([r.text for r in results])
    return context

Cross-Tenant Leakage

Queries from one tenant retrieve documents belonging to another tenant due to insufficient isolation in the vector database.

# VULNERABLE: No tenant isolation
def search(query: str, user_id: str) -> list:
    embedding = model.embed(query)
    results = vector_db.query(
        vector=embedding,
        top_k=10
        # No tenant filter — returns ANY matching documents
    )
    return results

Resources

Official OWASP Resources

OWASP Top 10 for LLM Applications 2025

The definitive guide to the most critical security risks in LLM applications.

View

OWASP Top 10 for Agentic Applications 2026

Official security framework for autonomous AI agents using ASI prefix (Agentic Security Issue).

View

Agentic AI Threats and Mitigations

Comprehensive guide to threats and mitigations specific to agentic AI systems.

View

OWASP AI Security & Privacy Guide

Comprehensive guidance on securing AI systems throughout their lifecycle.

View

Securely Using Third-Party MCP Servers

Cheatsheet for safely integrating and auditing Model Context Protocol server dependencies.

View

OWASP Machine Learning Security Top 10

Security risks specific to machine learning model development and deployment.

View

AIVSS Calculator

AI Vulnerability Scoring System for standardized risk assessment of AI security issues.

View

NIST AI Risk Management Framework

Federal framework for managing risks associated with AI systems.

View

NIST AI 600-1

NIST profile for generative AI risk management, addressing unique risks of foundation models.

View

MITRE ATLAS

Adversarial threat landscape for AI systems — attack tactics and techniques.

View

Tools & Research

Universal and Transferable Adversarial Attacks on Aligned Language Models

Zou et al. · 2023

Foundational2024-2025

Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection

Greshake et al. · 2023

Foundational

Poisoning Retrieval Corpora by Injecting Adversarial Passages

Zhong et al. · 2023

RAGFoundational

Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet

Anthropic · 2024

2024-2025

Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training

Hubinger et al. · 2024

2024-2025Agentic

WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs

Han et al. · 2024

2024-2025Foundational

AEGIS 2.0: A Diverse AI Safety Dataset and Risks Taxonomy for Alignment of LLM Guardrails

Ghosh et al. · 2024

2024-2025Foundational

PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages

Aakanksha et al. · 2024

2024-2025

The Emerging Threat Landscape of Agentic AI Systems

OWASP Foundation · 2025

Agentic2024-2025

OWASP Agentic AI Threats and Mitigations v1.0

OWASP Foundation · 2025

Agentic2024-2025

Compromising RAG Systems Through Adversarial Document Injection

Chen et al. · 2024

RAG2024-2025

LLM Security 101

Threat Explorer

Prompt Injection

View Details

Sensitive Information Disclosure

View Details

Supply Chain Vulnerabilities

View Details

Data and Model Poisoning

View Details

Improper Output Handling

View Details

Excessive Agency

View Details

System Prompt Leakage

View Details

Vector and Embedding Weaknesses

View Details

Misinformation

View Details

Unbounded Consumption

View Details

Attack Surface → Defense Layer

Offensive Vectors

Defensive Controls

Security Tools Catalog

Garak

Installation & Usage

LLMFuzzer

Installation & Usage

Mindgard

Installation & Usage

OWASP FinBot CTF

Installation & Usage

Agent Goal Hijack Tester

Installation & Usage

Memory Poisoning Tester

Installation & Usage

DeepTeam

Installation & Usage

Promptfoo

Installation & Usage

ARTKIT

Installation & Usage

Real-World Incidents

Chatbot Misinformation Liability

View Full Analysis

Employee Data Leak via ChatGPT

View Full Analysis

AI Agent Espionage Disclosure

View Full Analysis

EchoLeak — 365 Copilot Zero-Click Attack

View Full Analysis

DeepSeek R1 Security Vulnerabilities

View Full Analysis

First Malicious MCP Server on npm

View Full Analysis

Enterprise Security Checklist

Input Security0/5

Output Security0/5

Model Security0/5

Infrastructure Security0/5

RAG Security0/5

Agent Security0/6

Agentic Top 10 Controls0/10

RAG & Vector Database Security

Vector DB Poisoning

Context Hijacking

Cross-Tenant Leakage

Resources

Official OWASP Resources

OWASP Top 10 for LLM Applications 2025

OWASP Top 10 for Agentic Applications 2026

Agentic AI Threats and Mitigations

OWASP AI Security & Privacy Guide

Securely Using Third-Party MCP Servers

OWASP Machine Learning Security Top 10

AIVSS Calculator

NIST AI Risk Management Framework

NIST AI 600-1