Three Prompt Injection Patterns Your AI Security Detection Stack Misses

Prompt injection has become the AI security gap that detection engineering teams have no tooling for yet. The attack doesn’t arrive at the firewall looking like an attack. It arrives as a PDF a customer uploaded, a task description an AI agent retrieved, or a calendar invite the scheduling assistant processed. By the time the malicious instruction executes, it has already been treated as trusted input by the model itself. Prompt security is an architecture problem, but the detection-engineering gap runs deeper than most threat models acknowledge. Three prompt injection variants operating in production environments today cannot be surfaced by traditional security tooling.

Why Prompt Injection Misses WAF and EDR Coverage

Web application firewalls (WAFs) and endpoint detection and response (EDR) platforms were built to pattern-match against known malicious payloads in network traffic and process execution. Prompt injection bypasses both because the payload IS the content the system is supposed to process. An indirect injection embedded in a retrieved document doesn’t create a malicious network connection; it arrives as legitimate HTTP from the LLM’s retrieval call. A jailbreak via conversation-history poisoning doesn’t spawn a suspicious process; it modifies the behavioral context of the next inference call. The signal the detection toolchain expects – anomalous traffic, unexpected execution – is absent.

According to OWASP’s 2025 Top 10 for Large Language Model Applications, prompt injection ranks as the most severe risk category for LLM deployments. MITRE ATLAS, the adversary tactics taxonomy for ML systems, mirrors the MITRE ATT&CK structure and lists three prompt injection subtechniques under the initial-access tactic. Those are: direct injection via user input (AML.T0054.001), indirect injection via external content retrieval (AML.T0054.002), and prompt extraction to surface system instructions (AML.T0054.003). None of the three generate the telemetry signals that SOC detection stacks are calibrated to alert on.

Three Injection Patterns and the Detection Logic for Each

The three variants below are distinct enough in mechanism that each requires a separate detection approach. Treating them as a single “prompt injection” category produces a detection blind spot for the variants you’re not specifically watching.

Indirect injection via retrieved document content. An attacker embeds a malicious instruction in a document the LLM will retrieve at query time – a PDF in a customer file store, a Jira ticket body, a Confluence page in the knowledge base. When the model processes the retrieval result, the embedded instruction executes with the privileges of the retrieval session. Detection approach: log every retrieval operation that feeds a model context window, including source URL, document hash, and the content slice retrieved. Alert on retrieved content that contains high-density imperative verb structures against a pattern set built from known injection signatures. The telemetry source is retrieval-augmented generation (RAG) pipeline logs, not network or process logs.

Second-order injection through AI agent tool calls. In agentic deployments where the LLM can call external tools – web search, code execution, database query – an attacker injects malicious instructions into the output of a tool the agent is likely to call. The agent calls the tool legitimately; the tool returns attacker-controlled content; the agent executes the embedded instruction in its next reasoning step. Detection approach: log every tool-call input and output pair with the originating task context. Alert on tool outputs that produce instruction-formatted text in the next turn of the agent’s reasoning trace. The telemetry source is agent orchestration logs, specifically the tool-call and tool-response pairs, not the underlying tool’s own logs.

Conversation-history poisoning for behavioral context shift. In persistent multi-turn LLM deployments, an attacker uses a sequence of seemingly benign turns to progressively shift the model’s behavioral context before the target instruction is issued. By the time the harmful direction is given, the model’s prior context treats it as consistent with an established pattern. Detection approach: segment conversation histories into rolling 10-turn windows and compare the model’s behavioral profile across windows. Alert on sessions where the ratio of user-to-assistant turn length inverts sharply, where topic entropy drops to near-zero, or where system-role language appears in user turns. The telemetry source is conversation-session metadata plus turn-level behavioral analysis.

Closing the AI Security Detection Gap for Prompt Injection

The three tooling gaps above share a common architecture requirement: visibility into what the LLM is receiving and reasoning about, not just what arrives at the perimeter. Start with retrieval pipeline logging because indirect injection via document content is the highest-volume attack path in current production deployments, per OWASP’s threat model ranking.

Instrument the RAG retrieval pipeline before any other detection investment. Log document source, content hash, character-level similarity to known injection signatures, and whether the retrieved content produced an instruction-formatted output in the next model turn. This creates the baseline dataset for threshold-setting. Most enterprise RAG deployments have no retrieval logging at all; adding it costs one logging sink and a structured-output format change to the retrieval wrapper.

Build agent tool-call logging as a first-class telemetry source. Agent frameworks like LangChain, LangGraph, and AutoGen all emit tool-call events; most organizations route these to application logs rather than to the SOC’s SIEM. Move tool-call pairs into the SIEM. Write a detection rule that flags instruction-formatted content in tool outputs within three turns of a sensitive agent action: credential retrieval, file write, or external API call. This detection pattern won’t produce zero false positives on day one; the first 30 days of alert triage produce the calibration dataset for threshold refinement.

Add conversation-session behavioral analysis to the AI security monitoring stack. Groups that have implemented this earliest – primarily in financial services, where AI agent deployments serve customer-facing workflows – report that conversation-history poisoning attempts are identifiable in session metadata even when individual turns look benign. The prompt injection detection gap is not a tooling-vendor problem yet; it’s a detection-engineering priority-setting problem that security teams can close without waiting for a product release.

Join our LinkedIn group Information Security Community!

Holger Schulze
Holger Schulze is the founder and publisher of Cybersecurity Insiders, an independent cybersecurity media and research company. The publication centers on the security domains under the most pressure from AI: identity and phishing resistance, incident response velocity, application security, and threat intelligence tradecraft. Coverage maps the readiness gap between where CISO teams sit today and where AI-era attack speed is pushing them, and which moves close it fastest. Writing here applies Cybersecurity Insiders' Capability and Coherence Maturity Model to primary-research data and named incident analysis, evaluating security programs across the reactive, managed, and adaptive maturity tiers. Holger moderates the Information Security Community on LinkedIn, one of the largest cybersecurity professional networks. Connect at linkedin.com/in/holger-schulze.

No posts to display