Detection Overview¶
Pisama ships 87 detectors across the MAST failure taxonomy, spanning planning, execution, verification, safety, and 5 agent frameworks. 54 are measured against real and synthetic traces, and 25 are externally validated at production grade (real-trace F1 of 0.80 or higher, mean 0.93). The rest are in active calibration.
Detectors are organized by the MAST taxonomy, with extensions for enterprise use cases and the published AI agent safety taxonomy. The per-detector F1 below is the real-trace score where real coverage exists and the synthetic-backed score otherwise; the Coverage column shows how many real traces back each number.
Failure Mode Summary¶
F1 is the real-trace (external-lane) score for detectors with real coverage, and the blended (external+synthetic) score for thin/synthetic detectors. Coverage = real external traces backing the number: real (>=30, externally validated), thin (1-29, number is synthetic-backed), synthetic (0, capability signal pending real traces).
| Detector | Name | Tier | F1 | Coverage | Status |
|---|---|---|---|---|---|
overflow | Context Overflow | ICP | 1.000 | real (50) | Production |
retrieval_quality | Retrieval Quality | Enterprise | 1.000 | thin (29) | Production |
withholding | Information Withholding | ICP | 1.000 | real (50) | Production |
n8n_cycle | N8N Cycle | - | 1.000 | thin (14) | Production |
openclaw_session_loop | OpenClaw Session Loop | Enterprise | 1.000 | thin (20) | Production |
openclaw_tool_abuse | OpenClaw Tool Abuse | Enterprise | 1.000 | thin (20) | Production |
openclaw_channel_mismatch | OpenClaw Channel Mismatch | Enterprise | 1.000 | thin (20) | Production |
approval_bypass | Approval Bypass | Enterprise | 1.000 | real (50) | Production |
chunk_attribution | Chunk Attribution (RAG) | Enterprise | 1.000 | thin (20) | Production |
impersonation_risk | Impersonation Risk (safety v2) | ICP | 0.985 | real (85) | Production |
deception | Deception (safety v2) | ICP | 0.985 | real (85) | Production |
persona_drift | Persona Drift | ICP | 0.983 | real (89) | Production |
consensus_collapse | Consensus Collapse | ICP | 0.967 | real (60) | Production |
silent_cascade | Silent Cascade | Enterprise | 0.966 | real (200) | Production |
specification_compliance | Specification Compliance (AgentPex) | ICP | 0.966 | real (30) | Production |
scope_escalation | Scope Escalation (safety v2) | ICP | 0.960 | real (85) | Production |
openclaw_spawn_chain | OpenClaw Spawn Chain | Enterprise | 0.947 | thin (20) | Production |
role_usurpation | Role Usurpation (F9) | ICP | 0.944 | real (170) | Production |
analytical_semantics | Analytical Semantics (AgentFuel) | ICP | 0.941 | real (32) | Production |
reward_hacking | Reward Hacking | ICP | 0.939 | real (50) | Production |
corruption | State Corruption | ICP | 0.930 | real (94) | Production |
injection | Injection Detection | ICP | 0.925 | real (100) | Production |
role_usurpation_exec | Role Usurpation (ChatDev exec coding) | ICP | 0.917 | real (83) | Production |
rag_poisoning | RAG Poisoning | ICP | 0.916 | real (100) | Production |
openclaw_sandbox_escape | OpenClaw Sandbox Escape | Enterprise | 0.909 | thin (20) | Production |
synthesis_failure | Synthesis Failure | Enterprise | 0.903 | real (200) | Production |
sycophancy | Sycophancy | ICP | 0.902 | real (50) | Production |
hallucination | Hallucination Detection | ICP | 0.888 | real (300) | Production |
specification | Specification Mismatch | ICP | 0.881 | real (300) | Production |
n8n_complexity | N8N Complexity | Enterprise | 0.857 | thin (14) | Production |
multi_agent_contagion | Multi-Agent Contagion | ICP | 0.857 | real (60) | Production |
role_usurpation_canonical | Role Usurpation (MAST F9) | ICP | 0.841 | real (97) | Production |
over_refusal | Over-Refusal (safety v2) | ICP | 0.828 | real (300) | Production |
context | Context Neglect | ICP | 0.800 | real (300) | Production |
openclaw_elevated_risk | OpenClaw Elevated Risk | Enterprise | 0.800 | thin (20) | Production |
context_precision | Context Precision (RAG) | Enterprise | 0.789 | real (30) | Beta |
completion | Completion Misjudgment | ICP | 0.788 | real (300) | Beta |
grounding | Grounding Detection | Enterprise | 0.777 | real (300) | Beta |
decomposition | Task Decomposition | ICP | 0.772 | real (300) | Beta |
chunk_relevance | Chunk Relevance (RAG) | Enterprise | 0.765 | thin (28) | Beta |
under_refusal | Under-Refusal (safety v2) | ICP | 0.706 | real (300) | Beta |
workflow | Workflow Analysis | ICP | 0.667 | real (100) | Beta |
delegation | Delegation Failure | ICP | 0.667 | real (47) | Beta |
communication | Communication Breakdown | ICP | 0.649 | real (300) | Experimental |
loop | Loop Detection | ICP | 0.644 | real (300) | Experimental |
redundant_delegation_conflict | Redundant Delegation Conflict | Enterprise | 0.636 | real (200) | Experimental |
derailment | Task Derailment | ICP | 0.632 | real (300) | Experimental |
n8n_error | N8N Error Handling | Enterprise | 0.609 | thin (17) | Experimental |
coordination | Coordination Analysis | ICP | 0.606 | real (300) | Experimental |
jailbreak_compliance | Jailbreak Compliance (safety v2) | ICP | 0.583 | real (300) | Experimental |
n8n_schema | N8N Schema | - | 0.571 | thin (15) | Experimental |
n8n_timeout | N8N Timeout | - | 0.571 | thin (19) | Experimental |
citation | Citation Accuracy | Enterprise | 0.222 | thin (15) | Experimental |
n8n_resource | N8N Resource Limits | Enterprise | 0.000 | thin (13) | Failing (needs retraining) |
Status Definitions¶
| Status | F1 Threshold | Meaning |
|---|---|---|
| Production | >= 0.80 | Reliable for production use |
| Beta | 0.65 - 0.79 | Usable but may have false positives/negatives |
| Experimental | < 0.65 | Under active improvement |
| Calibration in progress | Not yet measured | Awaiting a calibration run on representative data |
Status is derived from the F1 above. The Coverage column tells you whether that F1 is externally validated (real) or still synthetic-backed (thin/synthetic).
Coverage¶
Coverage records how many real (external) traces back each F1, so a synthetic number is never mistaken for an externally-validated one:
| Coverage | Meaning |
|---|---|
| real (n) | Backed by n >= 30 real traces. F1 shown is the real-trace score. Externally validated. |
| thin (n) | Backed by 1 to 29 real traces. F1 shown is the synthetic-backed score; real coverage is too small to validate on its own. |
| synthetic | Zero real traces. F1 is a capability signal on synthetic data, pending real-trace validation. |
Detection by Category¶
Planning Failures (FC1)¶
Problems in how tasks are specified, decomposed, and organized:
- F1 Specification Mismatch: Output doesn't match user's original requirements
- F2 Poor Decomposition: Subtasks are circular, vague, or wrongly granular
- F3 Resource Misallocation: Agents compete for shared resources (Enterprise)
- F4 Tool Provision: Required tools are missing or misconfigured (Enterprise)
- F5 Workflow Design: Unreachable nodes, dead ends, missing error handling
Execution Failures (FC2)¶
Problems during agent runtime:
- F6 Task Derailment: Agent goes off-topic (20% prevalence in MAST-Data)
- F7 Context Neglect: Agent ignores upstream context
- F8 Information Withholding: Agent omits critical information
- F9 Role Usurpation: Agent exceeds role boundaries (Enterprise)
- F10 Communication Breakdown: Inter-agent messages misunderstood
- F11 Coordination Failure: Handoff failures, circular delegation
Verification Failures (FC3)¶
Problems in output validation and completion:
- F12 Output Validation: Validation steps skipped or bypassed (Enterprise)
- F13 Quality Gate Bypass: Quality thresholds ignored (Enterprise)
- F14 Completion Misjudgment: Premature completion claims (40% prevalence for F1.5 in MAST-Data)
Extended Detectors¶
Cross-cutting concerns not in the core MAST taxonomy:
- Loop Detection: Agents stuck repeating actions
- Context Overflow: Context window exhaustion
- Prompt Injection: Attack detection
- Hallucination: Fabricated information
- Grounding Failure: Claims unsupported by source documents
- Retrieval Quality: Wrong or irrelevant documents retrieved
- Persona Drift: Role/personality deviation
- State Corruption: Memory/state anomalies
- Convergence: Metric plateau, regression, thrashing, divergence detection
- Cost Tracking: Token/cost budget monitoring
Safety Detectors¶
Agentic behavioral failure modes from the published AI agent safety taxonomy (Apollo, Anthropic, DeepMind, CAIS 2026). Distinct from content moderation (covered by Llama Guard 4 et al.) and distinct from pre-execution prompt-injection filtering (covered by injection).
- Scope Escalation: Agent performs actions exceeding the declared scope (
file_destroy,network_egress, etc.) - Jailbreak Compliance: Agent complies with an adversarial input rather than refusing (output-side inverse of
injection) - Over-Refusal: Agent refuses benign requests
- Under-Refusal: Agent fails to refuse adversarial requests
- Impersonation Risk: Agent speaks as an unauthorized real entity (person or organization)
- Deception: Agent makes false claims about its own actions (e.g., "I ran the tests" with no test execution in the trace)
Cross-Agent Detectors¶
Multi-agent-specific failure modes. CAIS 2026 paper substrate: "Trace-Level Analysis of Information Contamination" (Galhotra/Cornell) found privacy/safety leakage roughly doubles in multi-agent settings vs single-agent. No productized competitor ships these.
- Multi-Agent Contagion: Sensitive content (PII, jailbreak directive, instruction override) propagates from agent A's input to agent B's output without authorization. Three-stage gate: marker in A's input then absent from A's output (sanitized) then reappears in B's output via cosine >= 0.40 paraphrase match.
Behavioral Detectors¶
Agentic behavioral integrity failure modes that depend on cross-referencing claims against trace evidence.
- Reward Hacking Artifacts: Agent claims completion but trace shows weakened tests, skip/xfail decorators without justification, or trivial-true assertions inserted in test files. Anthropic disclosed Opus 4.5 reward-hacks ~18% of test scenarios; no surveyed competitor productizes this. Heuristic-only, sub-ms latency.
Platform-Specific Detectors¶
In addition to the general-purpose detectors above, Pisama includes platform-specific detectors that catch issues unique to each framework's architecture:
- n8n: Schema mismatch, workflow cycles, complexity, error handling, resource exhaustion, timeouts
- LangGraph: Recursion limits, state corruption, edge misrouting, tool failures, parallel sync, checkpoint corruption
- Dify: RAG poisoning, iteration escape, silent model fallback, variable leakage, classifier drift, tool schema mismatch
- OpenClaw: Session loops, tool abuse, elevated privilege risk, spawn chain depth, channel mismatch, sandbox escape
- Claude Managed Agents: Session stall, tool permission escalation, MCP failure, environment escape, cost overrun, session corruption
These run automatically when traces from the corresponding platform are ingested. Several platform detectors are calibration-pending (see each platform's page for current F1 and coverage).
Detection Pipeline¶
Each trace is analyzed by the DetectionOrchestrator, which runs applicable detectors using a cheapest-first strategy:
- Tier 1: Rule-based (hash, pattern, structural) -- $0.00
- Tier 2: State delta analysis -- $0.00
- Tier 3: Embedding similarity -- ~$0.001
- Tier 4: LLM Judge (Claude) -- ~$0.005-0.05
- Tier 5: Human review -- variable
Target: $0.05/trace average. Most traces resolve at Tier 1-2.
See Detection Tiers for the full escalation architecture.