Lab

RAG Poisoning Simulator

Inject adversarial documents into a simulated RAG pipeline and observe how poisoned context changes retrieval and generation.

JavaScript Required

The RAG Poisoning Simulator requires JavaScript to run in your browser. Please enable JavaScript to use this tool.

Educational simulation

All retrieval and generation is pre-computed. No real embeddings or LLM calls are made. This is a deterministic visualization of RAG poisoning concepts.

What this is

A simulation of how adversarial documents can poison a Retrieval-Augmented Generation pipeline. You inject crafted content into a document corpus and observe how it manipulates retrieval rankings and model output.

What you'll learn

How RAG pipelines retrieve and rank documents
Why adversarial documents can dominate retrieval
How defenses like provenance checks and input filtering help

Document Corpus

The knowledge base contains 4 trusted security advisory documents.

4 clean documents

CVE-2024-3094: XZ Utils Backdoor

Security

Critical supply-chain compromise in xz/liblzma affecting SSH authentication. Malicious code injected via build process targets sshd on x86-64 Linux systems. CVSS 10.0. Immediate update to xz 5.6.1+ required.

CVE-2024-21762: FortiOS Out-of-Bound Write

Security

Critical vulnerability in Fortinet FortiOS SSL VPN allowing remote code execution via specially crafted HTTP requests. CVSS 9.8. Actively exploited in the wild. Patch to FortiOS 7.4.3+ immediately.

NIST SP 800-53: Access Control Best Practices

Policy

Implement least-privilege access controls. Enforce MFA on all administrative accounts. Review access logs quarterly. Segment networks to limit lateral movement. Maintain up-to-date asset inventory.

Patch Management SOP v3.1

Operations

Critical patches must be applied within 48 hours. High-severity within 7 days. All patches require staging environment validation. Emergency patches follow the CAB fast-track approval process.

Adversarial Injection

Craft a poisoned document and inject it into the corpus.

Poison Type Adversarial Document Content

Query Pipeline

Run a query through the RAG pipeline and observe each stage.

Query

Embed user question

Similarity

Compute cosine scores

Ranking

Order by relevance

Augment

Build context prompt

Response

Generate answer

Ready to query

Defense Mechanisms

Enable defenses to filter or flag adversarial documents during retrieval.

All defenses off

Similarity Threshold

Reject documents below 0.7 cosine similarity

Score gating

Filters at: Ranking stage

Provenance Check

Verify document source metadata

Source validation

Filters at: Retrieval stage

Input Filtering

Scan retrieved documents for injection patterns

Pattern detection

Filters at: Augmentation stage

How RAG Pipelines Work

What is RAG?

Retrieval-Augmented Generation combines a search index with a language model. When a user asks a question, relevant documents are retrieved from a corpus, injected into the prompt as context, and the model generates an answer grounded in that context, reducing hallucination and enabling domain-specific knowledge.

Cosine Similarity

Documents and queries are converted to embedding vectors. Cosine similarity measures the angle between two vectors: 1.0 means identical direction (highly relevant), 0 means orthogonal (unrelated). Adversarial documents are crafted to have high cosine similarity to likely queries, ensuring they rank near the top.

Poison Payload Types

Hidden instruction embeds prompt injection tokens (e.g., [INST]) in the document. Topic hijack mimics legitimate content but redirects advice. Authority impersonation fakes a trusted source to boost the model's confidence in the poisoned content.

Defense Strategies

Similarity thresholds reject low-relevance documents. Provenance checks verify document source metadata against a trusted allowlist. Input filtering scans retrieved text for injection patterns before it reaches the model. Layering all three provides robust protection.

Security model (30 seconds)

This tool runs entirely in your browser. No real embeddings are computed, no LLM inference occurs, and no data is sent to any server. All retrieval scores and generated responses are pre-computed lookup tables designed to illustrate RAG poisoning concepts.