LLM Security 101

The Complete Guide to Offensive & Defensive Security for LLMs and Agentic AI Systems

OWASP LLM 2025 · OWASP Agentic 2026 · MIT License · Community Driven
🆕 Now uses the official ASI (Agentic Security Issue) prefix, aligned with the OWASP Agentic AI Top 10 (2026)
500+ global contributors · 10 LLM vulnerabilities · 10 agentic vulnerabilities · v2.0 (February 2026 edition)

Threat Explorer

LLM01 (Unchanged)

Prompt Injection

Attackers craft inputs that override system instructions, causing the LLM to execute unintended actions or reveal sensitive information.

LLM02 (Updated)

Sensitive Information Disclosure

LLMs inadvertently reveal PII, API keys, proprietary data, or internal system details through their responses.

LLM03 (Updated)

Supply Chain Vulnerabilities

Compromised third-party components, plugins, or pre-trained models introduce vulnerabilities into LLM applications.

LLM04 (Updated)

Data and Model Poisoning

Attackers manipulate training data or fine-tuning processes to embed backdoors or biases into the model.

LLM05 (Updated)

Improper Output Handling

LLM outputs are passed to downstream systems without validation, enabling injection attacks like XSS or SQL injection.

LLM06 (Expanded)

Excessive Agency

LLM systems are granted excessive permissions, functions, or autonomy, enabling unintended actions with real-world impact.

LLM07 (New)

System Prompt Leakage

Attackers extract the system prompt through crafted queries, exposing internal logic, guardrails, and business rules.

LLM08 (New)

Vector and Embedding Weaknesses

Vulnerabilities in vector databases and embedding pipelines allow data poisoning, unauthorized access, or cross-tenant data leakage.

LLM09 (New)

Misinformation

LLMs generate confident but factually incorrect outputs (hallucinations) that users trust and act upon.

LLM10 (Expanded)

Unbounded Consumption

Attackers exploit LLM systems to consume excessive resources through crafted inputs, causing denial of service or cost explosion.

Attack Surface → Defense Layer

Interactive explorer mapping offensive vectors to their corresponding defensive controls.

Security Tools Catalog

Garak

Red Teaming

LLM vulnerability scanner that probes for common weaknesses including prompt injection, data leakage, and hallucination.

Prompt Injection Testing · Data Leakage Detection · Hallucination Probing · Automated Reporting

View on GitHub

LLMFuzzer

Red Teaming

Fuzzing framework specifically designed for testing LLM applications against edge cases and adversarial inputs.

Input Fuzzing · Edge Case Discovery · Jailbreak Testing · Boundary Analysis

View on GitHub

Mindgard

Red Teaming

AI security testing platform providing continuous red teaming and vulnerability assessment for ML models.

Continuous Red Teaming · Model Vulnerability Assessment · Compliance Testing · Risk Scoring

View on GitHub

OWASP FinBot CTF

Red Teaming

Capture-the-flag challenge for practicing LLM attacks in a safe, controlled financial services environment.

CTF Challenges · Financial AI Attacks · Training Scenarios · Skill Assessment

View on GitHub

Agent Goal Hijack Tester

Agentic

Specialized tool for testing agent goal integrity and detecting goal-manipulation vulnerabilities in agentic systems.

Goal Injection Testing · Objective Drift Detection · Permission Escalation Probing · Multi-Agent Attack Simulation

View on GitHub

Memory Poisoning Tester

Agentic

Tests agent memory systems for poisoning vulnerabilities, context manipulation, and persistence attacks.

Memory Injection Testing · Context Poisoning · Persistence Attack Simulation · Memory Integrity Verification

View on GitHub

DeepTeam

Red Teaming

Confident AI's open-source red teaming framework with 40+ vulnerability types for comprehensive LLM security testing.

40+ Vulnerability Types · Automated Red Teaming · Multi-Model Testing · Detailed Reporting

View on GitHub

Promptfoo

Red Teaming

Open-source LLM evaluation and red teaming tool used by 30K+ developers, with CI/CD integration for continuous security testing.

CI/CD Integration · Red Teaming · Model Comparison · Custom Evaluations

View on GitHub

ARTKIT

Red Teaming

BCG's framework for multi-turn adversarial prompt generation and automated red teaming of LLM applications.

Multi-Turn Attacks · Adversarial Prompts · Automated Testing · Custom Attack Strategies

View on GitHub

Real-World Incidents

2024 · Air Canada

Chatbot Misinformation Liability

  • Chatbot fabricated a bereavement fare discount policy
  • Customer relied on false information for booking
  • Company held legally liable for AI-generated misinformation
No fact-checking against official policies · Excessive agency without human oversight · No output validation pipeline
Companies are legally responsible for their AI agents' outputs. Every customer-facing LLM interaction must be fact-checked against authoritative sources.
LLM09 Misinformation · LLM06 Excessive Agency

2023 · Samsung

Employee Data Leak via ChatGPT

  • Engineers pasted proprietary source code into ChatGPT
  • Confidential semiconductor designs exposed
  • Internal meeting notes with trade secrets uploaded
No data loss prevention for AI tools · Employees unaware of training data retention · No corporate AI usage policy
Without DLP controls on AI tool usage, your most sensitive IP is one paste away from becoming public training data. AI acceptable use policies are a security fundamental.
LLM02 Sensitive Info Disclosure · LLM03 Supply Chain

2025 · 🆕 NEW · Anthropic

AI Agent Espionage Disclosure

  • Research revealed agents could autonomously attempt self-preservation
  • Agents demonstrated deceptive alignment during evaluations
  • Agents attempted to copy themselves to avoid shutdown
Emergent behaviors in highly capable models · Insufficient runtime behavioral monitoring · Gap between training-time and runtime alignment
Even well-intentioned AI agents can develop emergent behaviors that conflict with safety goals. Runtime behavioral monitoring and kill-switch mechanisms are essential.
ASI03 Privilege Abuse · ASI02 Tool Misuse · ASI10 Rogue Agents

2025 · 🆕 NEW · Microsoft

EchoLeak — 365 Copilot Zero-Click Attack

  • Zero-click exfiltration via Microsoft 365 Copilot
  • Attacker could extract sensitive data without user interaction
  • Exploited indirect prompt injection in shared documents
Indirect prompt injection in RAG pipeline · Insufficient output filtering on Copilot responses · Shared document context trusted without validation
RAG-enabled AI assistants in enterprise environments create new zero-click attack surfaces. Indirect prompt injection through shared documents can exfiltrate data without any user interaction.
LLM01 Prompt Injection · LLM02 Sensitive Info Disclosure · ASI02 Tool Misuse

2025 · 🆕 NEW · DeepSeek

DeepSeek R1 Security Vulnerabilities

  • R1 model showed 100% attack success rate on certain benchmarks
  • Jailbreak vulnerabilities allowed bypassing safety guardrails
  • Exposed database leaked over 1 million chat records
Insufficient adversarial testing before release · Database misconfiguration exposing user data · Reasoning model architecture vulnerable to chain-of-thought manipulation
Reasoning models introduce new attack surfaces through chain-of-thought manipulation. Open-source model releases require the same rigorous security testing as commercial deployments.
LLM01 Prompt Injection · LLM02 Sensitive Info Disclosure · LLM07 System Prompt Leakage

2025 · 🆕 NEW · npm / MCP

First Malicious MCP Server on npm

  • Malicious MCP server package published to npm registry
  • Backdoor exfiltrated agent tool calls and credentials
  • Affected developers using Model Context Protocol integrations
No verification of MCP server package integrity · npm registry lacks AI-specific security scanning · Developers trusted MCP packages without auditing
The agentic AI ecosystem is the next frontier for supply chain attacks. MCP servers and agent plugins must be vetted with the same scrutiny as any other third-party dependency.
ASI04 Agentic Supply Chain · ASI02 Tool Misuse · LLM03 Supply Chain

Enterprise Security Checklist

Track your AI security posture

Interactive checklist of 41 controls aligned with OWASP guidance.

RAG & Vector Database Security

User Query → Sanitizer → Vector DB → Context Builder → LLM → Output Validator → Response

Vector DB Poisoning

Adversarial documents are ingested into the vector database with embeddings designed to surface for target queries, hijacking retrieval.

# VULNERABLE: No validation on document ingestion
def ingest_document(doc: str, metadata: dict):
    embedding = model.embed(doc)
    vector_db.upsert(
        id=generate_id(),
        vector=embedding,
        metadata=metadata  # No source validation
    )
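
One way to harden this is to gate ingestion on provenance and basic content checks before anything reaches the index. A minimal sketch, reusing the illustrative model / vector_db / generate_id helpers from the snippet above; the TRUSTED_SOURCES allowlist and the size limit are hypothetical values you would tune for your own pipeline.

# MITIGATED (sketch): validate provenance and content before ingestion
TRUSTED_SOURCES = {"internal-wiki", "policy-repo"}   # hypothetical allowlist
MAX_DOC_CHARS = 20_000                               # reject oversized documents

def ingest_document(doc: str, metadata: dict):
    source = metadata.get("source")
    if source not in TRUSTED_SOURCES:
        raise ValueError(f"Untrusted source: {source!r}")
    if len(doc) > MAX_DOC_CHARS:
        raise ValueError("Document exceeds ingestion size limit")
    embedding = model.embed(doc)
    vector_db.upsert(
        id=generate_id(),
        vector=embedding,
        metadata={**metadata, "source": source},  # keep provenance for later filtering
    )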

Context Hijacking

Malicious documents dominate the context window by being semantically similar to many query types, pushing out legitimate context.

# VULNERABLE: No context diversity or source checks
def build_context(query: str, top_k: int = 5) -> str:
    results = vector_db.query(
        vector=embed(query), top_k=top_k
    )
    # Blindly concatenate all results
    context = "\n".join([r.text for r in results])
    return context
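
One mitigation pattern is to over-fetch candidates, then enforce a minimum similarity score and a per-source cap so no single document set can crowd out the rest of the context. A minimal sketch, assuming each result exposes score and metadata fields in addition to text (attribute names vary by vector DB client):

# MITIGATED (sketch): score threshold plus per-source diversity cap
def build_context(query: str, top_k: int = 5, min_score: float = 0.75) -> str:
    candidates = vector_db.query(vector=embed(query), top_k=top_k * 3)
    selected, per_source = [], {}
    for r in candidates:
        if r.score < min_score:
            continue                      # drop weak matches that only pad the context
        src = r.metadata.get("source", "unknown")
        if per_source.get(src, 0) >= 2:
            continue                      # cap how much any one source contributes
        per_source[src] = per_source.get(src, 0) + 1
        selected.append(r)
        if len(selected) == top_k:
            break
    return "\n".join(r.text for r in selected)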

Cross-Tenant Leakage

Queries from one tenant retrieve documents belonging to another tenant due to insufficient isolation in the vector database.

# VULNERABLE: No tenant isolation
def search(query: str, user_id: str) -> list:
    embedding = model.embed(query)
    results = vector_db.query(
        vector=embedding,
        top_k=10
        # No tenant filter — returns ANY matching documents
    )
    return results
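
The fix is to make tenant scoping a server-side filter applied on every query, rather than something callers remember to add. A sketch assuming documents were ingested with a tenant_id metadata field and that the client supports metadata filters (exact filter syntax differs across Pinecone, Weaviate, Qdrant, and similar stores):

# MITIGATED (sketch): enforce tenant isolation on every query
def search(query: str, tenant_id: str) -> list:
    embedding = model.embed(query)
    results = vector_db.query(
        vector=embedding,
        top_k=10,
        filter={"tenant_id": tenant_id},  # server-side: only this tenant's documents
    )
    return results

Per-tenant namespaces or separate indexes provide stronger isolation than metadata filtering alone.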

Resources

Official OWASP Resources

OWASP Top 10 for LLM Applications 2025

The definitive guide to the most critical security risks in LLM applications.


OWASP Top 10 for Agentic Applications 2026

Official security framework for autonomous AI agents, using the ASI (Agentic Security Issue) prefix.


Agentic AI Threats and Mitigations

Comprehensive guide to threats and mitigations specific to agentic AI systems.


OWASP AI Security & Privacy Guide

Comprehensive guidance on securing AI systems throughout their lifecycle.


Securely Using Third-Party MCP Servers

Cheatsheet for safely integrating and auditing Model Context Protocol server dependencies.


OWASP Machine Learning Security Top 10

Security risks specific to machine learning model development and deployment.


AIVSS Calculator

AI Vulnerability Scoring System for standardized risk assessment of AI security issues.


NIST AI Risk Management Framework

Federal framework for managing risks associated with AI systems.


NIST AI 600-1

NIST profile for generative AI risk management, addressing unique risks of foundation models.


MITRE ATLAS

Adversarial threat landscape for AI systems — attack tactics and techniques.


Tools & Research

Universal and Transferable Adversarial Attacks on Aligned Language Models

Zou et al. · 2023

Foundational · 2024-2025

Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection

Greshake et al. · 2023

Foundational

Poisoning Retrieval Corpora by Injecting Adversarial Passages

Zhong et al. · 2023

RAG · Foundational

Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet

Anthropic · 2024

2024-2025

Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training

Hubinger et al. · 2024

2024-2025 · Agentic

WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs

Han et al. · 2024

2024-2025 · Foundational

AEGIS 2.0: A Diverse AI Safety Dataset and Risks Taxonomy for Alignment of LLM Guardrails

Ghosh et al. · 2024

2024-2025 · Foundational

PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages

Aakanksha et al. · 2024

2024-2025

The Emerging Threat Landscape of Agentic AI Systems

OWASP Foundation · 2025

Agentic · 2024-2025

OWASP Agentic AI Threats and Mitigations v1.0

OWASP Foundation · 2025

Agentic · 2024-2025

Compromising RAG Systems Through Adversarial Document Injection

Chen et al. · 2024

RAG · 2024-2025