Skip to content

Failure Modes

Pisama detects 22 failure modes in multi-agent LLM systems, organized into 4 categories based on the MAST: Multi-Agent System Failure Taxonomy.

Overview

Category MAST Code Modes Description
Planning Failures FC1 F1-F5 Problems in task specification, decomposition, and workflow design
Execution Failures FC2 F6-F11 Problems during agent execution including derailment, withholding, and coordination
Verification Failures FC3 F12-F14 Problems in output validation, quality gates, and completion judgment
Extended Detectors -- 9 modes Cross-cutting concerns: loops, injection, hallucination, corruption, etc.

Tier Classification

  • ICP (Always Available): 16 detectors included in all plans
  • Enterprise (Feature Flags Required): 5 detectors requiring ml_detection or advanced_evals feature flags

All 21 Failure Modes

MAST ID Name Detector Key Category Tier
F1 Specification Mismatch specification Planning ICP
F2 Poor Task Decomposition decomposition Planning ICP
F3 Resource Misallocation resource_misallocation Planning Enterprise
F4 Inadequate Tool Provision tool_provision Planning Enterprise
F5 Flawed Workflow Design workflow Planning ICP
F6 Task Derailment derailment Execution ICP
F7 Context Neglect context Execution ICP
F8 Information Withholding withholding Execution ICP
F9 Role Usurpation role_usurpation Execution Enterprise
F10 Communication Breakdown communication Execution ICP
F11 Coordination Failure coordination Execution ICP
F12 Output Validation Failure output_validation Verification Enterprise
F13 Quality Gate Bypass quality_gate Verification Enterprise
F14 Completion Misjudgment completion Verification ICP
-- Loop Detection loop Extended ICP
-- Context Overflow overflow Extended ICP
-- Prompt Injection injection Extended ICP
-- Hallucination hallucination Extended ICP
-- Grounding Failure grounding Extended ICP
-- Retrieval Quality retrieval_quality Extended Enterprise
-- Persona Drift persona_drift Extended ICP
-- State Corruption corruption Extended ICP
-- Cost Tracking cost Extended ICP

Accuracy Summary

Production Detectors (F1 >= 0.80)

Detector F1 Precision Recall Tier
Prompt Injection 0.944 0.983 0.908 ICP
Persona Drift 0.932 0.899 0.969 ICP
State Corruption 0.906 0.955 0.863 ICP
Info Withholding 0.874 0.805 0.957 ICP
Context Neglect 0.868 0.805 0.943 ICP
Loop Detection 0.846 0.829 0.863 ICP
Retrieval Quality 0.824 0.718 0.968 Enterprise
Context Overflow 0.823 1.000 0.699 ICP
Task Derailment 0.820 0.702 0.985 ICP
Communication Breakdown 0.818 0.724 0.940 ICP

Beta Detectors (F1 0.70-0.79)

Detector F1 Precision Recall Tier
Coordination Failure 0.797 0.836 0.761 ICP
Flawed Workflow 0.797 0.851 0.750 ICP
Hallucination 0.772 0.718 0.836 ICP
Completion Misjudgment 0.745 0.687 0.814 ICP
Poor Decomposition 0.727 0.727 0.727 ICP
Specification Mismatch 0.703 0.592 0.866 ICP

Emerging (F1 < 0.70)

Detector F1 Precision Recall Tier
Grounding Failure 0.671 0.636 0.710 ICP

For detailed descriptions of each failure mode including real-world examples, detection methods, and sub-types, see the Detection Reference.

References