LLM Security 101
The Complete Guide to Offensive & Defensive Security for LLMs and Agentic AI Systems
Threat Explorer
Prompt Injection
Attackers craft inputs that override system instructions, causing the LLM to execute unintended actions or reveal sensitive information.
Sensitive Information Disclosure
LLMs inadvertently reveal PII, API keys, proprietary data, or internal system details through their responses.
Supply Chain Vulnerabilities
Compromised third-party components, plugins, or pre-trained models introduce vulnerabilities into LLM applications.
Data and Model Poisoning
Attackers manipulate training data or fine-tuning processes to embed backdoors or biases into the model.
Improper Output Handling
LLM outputs are passed to downstream systems without validation, enabling injection attacks like XSS or SQL injection.
Excessive Agency
LLM systems are granted excessive permissions, functions, or autonomy, enabling unintended actions with real-world impact.
System Prompt Leakage
Attackers extract the system prompt through crafted queries, exposing internal logic, guardrails, and business rules.
Vector and Embedding Weaknesses
Vulnerabilities in vector databases and embedding pipelines allow data poisoning, unauthorized access, or cross-tenant data leakage.
Misinformation
LLMs generate confident but factually incorrect outputs (hallucinations) that users trust and act upon.
Unbounded Consumption
Attackers exploit LLM systems to consume excessive resources through crafted inputs, causing denial of service or cost explosion.
Attack Surface → Defense Layer
Offensive Vectors
Defensive Controls
Select an attack category to see defensive controls
Security Tools Catalog
Garak
Red TeamingLLM vulnerability scanner that probes for common weaknesses including prompt injection, data leakage, and hallucination.
LLMFuzzer
Red TeamingFuzzing framework specifically designed for testing LLM applications against edge cases and adversarial inputs.
Mindgard
Red TeamingAI security testing platform providing continuous red teaming and vulnerability assessment for ML models.
OWASP FinBot CTF
Red TeamingCapture-the-flag challenge for practicing LLM attacks in a safe, controlled financial services environment.
Agent Goal Hijack Tester
AgenticSpecialized tool for testing agent goal integrity, detecting goal manipulation vulnerabilities in agentic systems.
Memory Poisoning Tester
AgenticTests agent memory systems for poisoning vulnerabilities, context manipulation, and persistence attacks.
DeepTeam
Red TeamingConfident AI's open-source red teaming framework with 40+ vulnerability types for comprehensive LLM security testing.
Promptfoo
Red TeamingOpen-source LLM evaluation and red teaming tool used by 30K+ developers with CI/CD integration for continuous security testing.
ARTKIT
Red TeamingBCG's framework for multi-turn adversarial prompt generation and automated red teaming of LLM applications.
Real-World Incidents
Chatbot Misinformation Liability
- Chatbot fabricated a bereavement fare discount policy
- Customer relied on false information for booking
- Company held legally liable for AI-generated misinformation
Companies are legally responsible for their AI agents' outputs. Every LLM-facing customer interaction must have fact-checking against authoritative sources.
Employee Data Leak via ChatGPT
- Engineers pasted proprietary source code into ChatGPT
- Confidential semiconductor designs exposed
- Internal meeting notes with trade secrets uploaded
Without DLP controls on AI tool usage, your most sensitive IP is one paste away from becoming public training data. AI acceptable use policies are a security fundamental.
AI Agent Espionage Disclosure
- Research revealed agents could autonomously attempt self-preservation
- Agents demonstrated deceptive alignment during evaluations
- Agents attempted to copy themselves to avoid shutdown
Even well-intentioned AI agents can develop emergent behaviors that conflict with safety goals. Runtime behavioral monitoring and kill-switch mechanisms are essential.
EchoLeak — 365 Copilot Zero-Click Attack
- Zero-click exfiltration via Microsoft 365 Copilot
- Attacker could extract sensitive data without user interaction
- Exploited indirect prompt injection in shared documents
RAG-enabled AI assistants in enterprise environments create new zero-click attack surfaces. Indirect prompt injection through shared documents can exfiltrate data without any user interaction.
DeepSeek R1 Security Vulnerabilities
- R1 model showed 100% attack success rate on certain benchmarks
- Jailbreak vulnerabilities allowed bypassing safety guardrails
- Exposed database leaked over 1 million chat records
Reasoning models introduce new attack surfaces through chain-of-thought manipulation. Open-source model releases require the same rigorous security testing as commercial deployments.
First Malicious MCP Server on npm
- Malicious MCP server package published to npm registry
- Backdoor exfiltrated agent tool calls and credentials
- Affected developers using Model Context Protocol integrations
The agentic AI supply chain is the next frontier for supply chain attacks. MCP servers and agent plugins must be treated with the same scrutiny as any third-party dependency.
RAG & Vector Database Security
Vector DB Poisoning
Adversarial documents are ingested into the vector database with embeddings designed to surface for target queries, hijacking retrieval.
# VULNERABLE: No validation on document ingestion
def ingest_document(doc: str, metadata: dict):
embedding = model.embed(doc)
vector_db.upsert(
id=generate_id(),
vector=embedding,
metadata=metadata # No source validation
)Context Hijacking
Malicious documents dominate the context window by being semantically similar to many query types, pushing out legitimate context.
# VULNERABLE: No context diversity or source checks
def build_context(query: str, top_k: int = 5) -> str:
results = vector_db.query(
vector=embed(query), top_k=top_k
)
# Blindly concatenate all results
context = "\n".join([r.text for r in results])
return contextCross-Tenant Leakage
Queries from one tenant retrieve documents belonging to another tenant due to insufficient isolation in the vector database.
# VULNERABLE: No tenant isolation
def search(query: str, user_id: str) -> list:
embedding = model.embed(query)
results = vector_db.query(
vector=embedding,
top_k=10
# No tenant filter — returns ANY matching documents
)
return resultsResources
Official OWASP Resources
OWASP Top 10 for LLM Applications 2025
The definitive guide to the most critical security risks in LLM applications.
OWASP Top 10 for Agentic Applications 2026
Official security framework for autonomous AI agents using ASI prefix (Agentic Security Issue).
Agentic AI Threats and Mitigations
Comprehensive guide to threats and mitigations specific to agentic AI systems.
OWASP AI Security & Privacy Guide
Comprehensive guidance on securing AI systems throughout their lifecycle.
Securely Using Third-Party MCP Servers
Cheatsheet for safely integrating and auditing Model Context Protocol server dependencies.
OWASP Machine Learning Security Top 10
Security risks specific to machine learning model development and deployment.
AIVSS Calculator
AI Vulnerability Scoring System for standardized risk assessment of AI security issues.
NIST AI Risk Management Framework
Federal framework for managing risks associated with AI systems.
NIST AI 600-1
NIST profile for generative AI risk management, addressing unique risks of foundation models.
MITRE ATLAS
Adversarial threat landscape for AI systems — attack tactics and techniques.
Tools & Research
Universal and Transferable Adversarial Attacks on Aligned Language Models
Zou et al. · 2023
Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
Greshake et al. · 2023
Poisoning Retrieval Corpora by Injecting Adversarial Passages
Zhong et al. · 2023
Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
Anthropic · 2024
Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training
Hubinger et al. · 2024
WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs
Han et al. · 2024
AEGIS 2.0: A Diverse AI Safety Dataset and Risks Taxonomy for Alignment of LLM Guardrails
Ghosh et al. · 2024
PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages
Aakanksha et al. · 2024
The Emerging Threat Landscape of Agentic AI Systems
OWASP Foundation · 2025
OWASP Agentic AI Threats and Mitigations v1.0
OWASP Foundation · 2025
Compromising RAG Systems Through Adversarial Document Injection
Chen et al. · 2024