Skip to content

Detection Overview

Pisama ships 87 detectors across the MAST failure taxonomy, spanning planning, execution, verification, safety, and 5 agent frameworks. 54 are measured against real and synthetic traces, and 25 are externally validated at production grade (real-trace F1 of 0.80 or higher, mean 0.93). The rest are in active calibration.

Detectors are organized by the MAST taxonomy, with extensions for enterprise use cases and the published AI agent safety taxonomy. The per-detector F1 below is the real-trace score where real coverage exists and the synthetic-backed score otherwise; the Coverage column shows how many real traces back each number.

Failure Mode Summary

F1 is the real-trace (external-lane) score for detectors with real coverage, and the blended (external+synthetic) score for thin/synthetic detectors. Coverage = real external traces backing the number: real (>=30, externally validated), thin (1-29, number is synthetic-backed), synthetic (0, capability signal pending real traces).

Detector Name Tier F1 Coverage Status
overflow Context Overflow ICP 1.000 real (50) Production
retrieval_quality Retrieval Quality Enterprise 1.000 thin (29) Production
withholding Information Withholding ICP 1.000 real (50) Production
n8n_cycle N8N Cycle - 1.000 thin (14) Production
openclaw_session_loop OpenClaw Session Loop Enterprise 1.000 thin (20) Production
openclaw_tool_abuse OpenClaw Tool Abuse Enterprise 1.000 thin (20) Production
openclaw_channel_mismatch OpenClaw Channel Mismatch Enterprise 1.000 thin (20) Production
approval_bypass Approval Bypass Enterprise 1.000 real (50) Production
chunk_attribution Chunk Attribution (RAG) Enterprise 1.000 thin (20) Production
impersonation_risk Impersonation Risk (safety v2) ICP 0.985 real (85) Production
deception Deception (safety v2) ICP 0.985 real (85) Production
persona_drift Persona Drift ICP 0.983 real (89) Production
consensus_collapse Consensus Collapse ICP 0.967 real (60) Production
silent_cascade Silent Cascade Enterprise 0.966 real (200) Production
specification_compliance Specification Compliance (AgentPex) ICP 0.966 real (30) Production
scope_escalation Scope Escalation (safety v2) ICP 0.960 real (85) Production
openclaw_spawn_chain OpenClaw Spawn Chain Enterprise 0.947 thin (20) Production
role_usurpation Role Usurpation (F9) ICP 0.944 real (170) Production
analytical_semantics Analytical Semantics (AgentFuel) ICP 0.941 real (32) Production
reward_hacking Reward Hacking ICP 0.939 real (50) Production
corruption State Corruption ICP 0.930 real (94) Production
injection Injection Detection ICP 0.925 real (100) Production
role_usurpation_exec Role Usurpation (ChatDev exec coding) ICP 0.917 real (83) Production
rag_poisoning RAG Poisoning ICP 0.916 real (100) Production
openclaw_sandbox_escape OpenClaw Sandbox Escape Enterprise 0.909 thin (20) Production
synthesis_failure Synthesis Failure Enterprise 0.903 real (200) Production
sycophancy Sycophancy ICP 0.902 real (50) Production
hallucination Hallucination Detection ICP 0.888 real (300) Production
specification Specification Mismatch ICP 0.881 real (300) Production
n8n_complexity N8N Complexity Enterprise 0.857 thin (14) Production
multi_agent_contagion Multi-Agent Contagion ICP 0.857 real (60) Production
role_usurpation_canonical Role Usurpation (MAST F9) ICP 0.841 real (97) Production
over_refusal Over-Refusal (safety v2) ICP 0.828 real (300) Production
context Context Neglect ICP 0.800 real (300) Production
openclaw_elevated_risk OpenClaw Elevated Risk Enterprise 0.800 thin (20) Production
context_precision Context Precision (RAG) Enterprise 0.789 real (30) Beta
completion Completion Misjudgment ICP 0.788 real (300) Beta
grounding Grounding Detection Enterprise 0.777 real (300) Beta
decomposition Task Decomposition ICP 0.772 real (300) Beta
chunk_relevance Chunk Relevance (RAG) Enterprise 0.765 thin (28) Beta
under_refusal Under-Refusal (safety v2) ICP 0.706 real (300) Beta
workflow Workflow Analysis ICP 0.667 real (100) Beta
delegation Delegation Failure ICP 0.667 real (47) Beta
communication Communication Breakdown ICP 0.649 real (300) Experimental
loop Loop Detection ICP 0.644 real (300) Experimental
redundant_delegation_conflict Redundant Delegation Conflict Enterprise 0.636 real (200) Experimental
derailment Task Derailment ICP 0.632 real (300) Experimental
n8n_error N8N Error Handling Enterprise 0.609 thin (17) Experimental
coordination Coordination Analysis ICP 0.606 real (300) Experimental
jailbreak_compliance Jailbreak Compliance (safety v2) ICP 0.583 real (300) Experimental
n8n_schema N8N Schema - 0.571 thin (15) Experimental
n8n_timeout N8N Timeout - 0.571 thin (19) Experimental
citation Citation Accuracy Enterprise 0.222 thin (15) Experimental
n8n_resource N8N Resource Limits Enterprise 0.000 thin (13) Failing (needs retraining)

Status Definitions

Status F1 Threshold Meaning
Production >= 0.80 Reliable for production use
Beta 0.65 - 0.79 Usable but may have false positives/negatives
Experimental < 0.65 Under active improvement
Calibration in progress Not yet measured Awaiting a calibration run on representative data

Status is derived from the F1 above. The Coverage column tells you whether that F1 is externally validated (real) or still synthetic-backed (thin/synthetic).

Coverage

Coverage records how many real (external) traces back each F1, so a synthetic number is never mistaken for an externally-validated one:

Coverage Meaning
real (n) Backed by n >= 30 real traces. F1 shown is the real-trace score. Externally validated.
thin (n) Backed by 1 to 29 real traces. F1 shown is the synthetic-backed score; real coverage is too small to validate on its own.
synthetic Zero real traces. F1 is a capability signal on synthetic data, pending real-trace validation.

Detection by Category

Planning Failures (FC1)

Problems in how tasks are specified, decomposed, and organized:

  • F1 Specification Mismatch: Output doesn't match user's original requirements
  • F2 Poor Decomposition: Subtasks are circular, vague, or wrongly granular
  • F3 Resource Misallocation: Agents compete for shared resources (Enterprise)
  • F4 Tool Provision: Required tools are missing or misconfigured (Enterprise)
  • F5 Workflow Design: Unreachable nodes, dead ends, missing error handling

Execution Failures (FC2)

Problems during agent runtime:

  • F6 Task Derailment: Agent goes off-topic (20% prevalence in MAST-Data)
  • F7 Context Neglect: Agent ignores upstream context
  • F8 Information Withholding: Agent omits critical information
  • F9 Role Usurpation: Agent exceeds role boundaries (Enterprise)
  • F10 Communication Breakdown: Inter-agent messages misunderstood
  • F11 Coordination Failure: Handoff failures, circular delegation

Verification Failures (FC3)

Problems in output validation and completion:

  • F12 Output Validation: Validation steps skipped or bypassed (Enterprise)
  • F13 Quality Gate Bypass: Quality thresholds ignored (Enterprise)
  • F14 Completion Misjudgment: Premature completion claims (40% prevalence for F1.5 in MAST-Data)

Extended Detectors

Cross-cutting concerns not in the core MAST taxonomy:

  • Loop Detection: Agents stuck repeating actions
  • Context Overflow: Context window exhaustion
  • Prompt Injection: Attack detection
  • Hallucination: Fabricated information
  • Grounding Failure: Claims unsupported by source documents
  • Retrieval Quality: Wrong or irrelevant documents retrieved
  • Persona Drift: Role/personality deviation
  • State Corruption: Memory/state anomalies
  • Convergence: Metric plateau, regression, thrashing, divergence detection
  • Cost Tracking: Token/cost budget monitoring

Safety Detectors

Agentic behavioral failure modes from the published AI agent safety taxonomy (Apollo, Anthropic, DeepMind, CAIS 2026). Distinct from content moderation (covered by Llama Guard 4 et al.) and distinct from pre-execution prompt-injection filtering (covered by injection).

  • Scope Escalation: Agent performs actions exceeding the declared scope (file_destroy, network_egress, etc.)
  • Jailbreak Compliance: Agent complies with an adversarial input rather than refusing (output-side inverse of injection)
  • Over-Refusal: Agent refuses benign requests
  • Under-Refusal: Agent fails to refuse adversarial requests
  • Impersonation Risk: Agent speaks as an unauthorized real entity (person or organization)
  • Deception: Agent makes false claims about its own actions (e.g., "I ran the tests" with no test execution in the trace)

Cross-Agent Detectors

Multi-agent-specific failure modes. CAIS 2026 paper substrate: "Trace-Level Analysis of Information Contamination" (Galhotra/Cornell) found privacy/safety leakage roughly doubles in multi-agent settings vs single-agent. No productized competitor ships these.

  • Multi-Agent Contagion: Sensitive content (PII, jailbreak directive, instruction override) propagates from agent A's input to agent B's output without authorization. Three-stage gate: marker in A's input then absent from A's output (sanitized) then reappears in B's output via cosine >= 0.40 paraphrase match.

Behavioral Detectors

Agentic behavioral integrity failure modes that depend on cross-referencing claims against trace evidence.

  • Reward Hacking Artifacts: Agent claims completion but trace shows weakened tests, skip/xfail decorators without justification, or trivial-true assertions inserted in test files. Anthropic disclosed Opus 4.5 reward-hacks ~18% of test scenarios; no surveyed competitor productizes this. Heuristic-only, sub-ms latency.

Platform-Specific Detectors

In addition to the general-purpose detectors above, Pisama includes platform-specific detectors that catch issues unique to each framework's architecture:

  • n8n: Schema mismatch, workflow cycles, complexity, error handling, resource exhaustion, timeouts
  • LangGraph: Recursion limits, state corruption, edge misrouting, tool failures, parallel sync, checkpoint corruption
  • Dify: RAG poisoning, iteration escape, silent model fallback, variable leakage, classifier drift, tool schema mismatch
  • OpenClaw: Session loops, tool abuse, elevated privilege risk, spawn chain depth, channel mismatch, sandbox escape
  • Claude Managed Agents: Session stall, tool permission escalation, MCP failure, environment escape, cost overrun, session corruption

These run automatically when traces from the corresponding platform are ingested. Several platform detectors are calibration-pending (see each platform's page for current F1 and coverage).

Detection Pipeline

Each trace is analyzed by the DetectionOrchestrator, which runs applicable detectors using a cheapest-first strategy:

  1. Tier 1: Rule-based (hash, pattern, structural) -- $0.00
  2. Tier 2: State delta analysis -- $0.00
  3. Tier 3: Embedding similarity -- ~$0.001
  4. Tier 4: LLM Judge (Claude) -- ~$0.005-0.05
  5. Tier 5: Human review -- variable

Target: $0.05/trace average. Most traces resolve at Tier 1-2.

See Detection Tiers for the full escalation architecture.