Planning Failures (FC1)¶
Planning failures occur when the task specification, decomposition, or workflow design is flawed before any agent begins execution.
F1: Specification Mismatch¶
| Field | Value |
|---|---|
| Detector key | specification |
| Tier | ICP |
| Severity | Medium |
| Accuracy | F1 0.857, P 0.923, R 0.800 |
| MAST mapping | FM-1.1 Disobey Task Specification |
Plain language: The agent delivered something different from what was asked for. Like ordering a blue car and receiving a red one -- the work got done, but it doesn't match the original request.
Technical: Measures semantic coverage between the user's specification and the agent's output using embedding similarity, keyword matching, and structural analysis. Detects scope drift, missing requirements, and constraint violations.
Examples (non-technical):
- You ask for a 500-word summary and get back 150 words
- You request a report in Spanish but receive it in English
- The agent completes only 3 of 5 requested tasks
Examples (technical):
- User specifies Python implementation but agent delivers TypeScript
- Agent output uses deprecated
printstatement syntax (Python 2 vs 3) - Task requires REST API endpoints but agent generates GraphQL schema
Detection methods:
- Semantic Coverage: Measures how well output covers each requirement using embeddings
- Keyword Matching: Checks for presence of required elements, topics, and constraints
- Code Quality Checks: Validates language match, deprecated syntax, stub implementations
- Numeric Tolerance: Handles approximate constraints like word counts (within 20%)
Sub-types: scope_drift, missing_requirement, ambiguous_spec, conflicting_spec, overspecified
F2: Poor Task Decomposition¶
| Field | Value |
|---|---|
| Detector key | decomposition |
| Tier | ICP |
| Severity | Medium |
| Accuracy | F1 1.000, P 1.000, R 1.000 |
| MAST mapping | FM-1.2 |
Plain language: The agent broke a big task into smaller pieces badly. Some pieces depend on each other in circles, some are too vague to act on, and the overall breakdown doesn't make sense for the complexity of the work.
Technical: Analyzes task decomposition graphs for structural issues including circular dependencies, granularity mismatches, and vague subtask definitions using dependency analysis and complexity estimation.
Examples (non-technical):
- A project plan where Step 3 requires Step 5 to be done first, but Step 5 requires Step 3
- A subtask that just says "handle the infrastructure" with no specifics
- A simple button change broken into 15 steps when 3 would do
Examples (technical):
- Subtask dependency graph contains cycle:
parse_data→validate_schema→parse_data - Subtask description uses non-actionable language: "etc.", "various components", "if necessary"
- Complex distributed system redesign decomposed into only 2 subtasks, each too broad for a single agent
Detection methods:
- Dependency Analysis: Detects circular, missing, or impossible dependencies
- Granularity Check: Validates task-aware decomposition depth (complex vs simple)
- Vagueness Detection: Flags non-actionable steps using indicator words
- Complexity Estimation: Identifies subtasks too broad for single execution
Sub-types: impossible_subtask, missing_dependency, circular_dependency, duplicate_work, wrong_granularity, missing_subtask, vague_subtask, overly_complex
F3: Resource Misallocation (Enterprise)¶
| Field | Value |
|---|---|
| Detector key | resource_misallocation |
| Tier | Enterprise |
| Severity | High |
| Accuracy | Benchmarking in progress |
| MAST mapping | FM-1.3 |
Plain language: Multiple agents are fighting over the same resources, like two people trying to use one printer at the same time. This causes delays, deadlocks, or wasted capacity.
Technical: Tracks concurrent resource access patterns across agents, detecting contention, starvation conditions, circular wait (deadlock), and inefficient allocation using resource graph analysis.
Examples (non-technical):
- Three agents all need database access at once, causing everything to slow down
- One agent holds a lock indefinitely, preventing all other agents from working
- Most agents sit idle while one agent is completely overloaded
Examples (technical):
- Three agents simultaneously request the same database connection pool, causing pool exhaustion
- Agent A holds write lock on
userstable while Agent B holds lock onorders, both waiting for the other (deadlock) - Load balancer routes 90% of requests to one agent instance while others have zero utilization
Detection methods:
- Contention Analysis: Tracks concurrent resource access requests
- Starvation Detection: Identifies agents that never acquire needed resources
- Deadlock Graph: Analyzes circular wait conditions
- Efficiency Scoring: Measures resource utilization distribution
Sub-types: contention, starvation, deadlock_risk, inefficient_allocation, excessive_wait, resource_exhaustion
F4: Inadequate Tool Provision (Enterprise)¶
| Field | Value |
|---|---|
| Detector key | tool_provision |
| Tier | Enterprise |
| Severity | High |
| Accuracy | Benchmarking in progress |
| MAST mapping | FM-1.4 |
Plain language: The agent doesn't have the right tools to do its job. It's like asking someone to build furniture but not giving them a screwdriver -- they'll either fail or improvise badly.
Technical: Compares attempted tool invocations against the provisioned tool inventory, detecting hallucinated tool names, capability gaps, and manual workarounds that indicate missing tools.
Examples (non-technical):
- Agent tries to search a database but that capability was never set up
- Agent manually copies data from a website because it lacks a proper data connector
- Agent keeps failing because the tool it needs doesn't support the required file format
Examples (technical):
- Agent calls
search_database()but no such tool exists in its tool registry - Agent hallucinates tool name
web_search_v2-- onlyweb_searchis provisioned - Agent writes raw HTTP requests to scrape data because it lacks an API client tool
Detection methods:
- Tool Inventory Check: Compares attempted tool calls against available tools
- Hallucinated Tool Detection: Identifies tool names not in the provisioned set
- Workaround Detection: Flags manual approaches that suggest missing tools
- Capability Gap Analysis: Matches task requirements against tool capabilities
Sub-types: missing_tool, hallucinated_tool, tool_capability_gap, workaround_detected, tool_call_failure
F5: Flawed Workflow Design¶
| Field | Value |
|---|---|
| Detector key | workflow |
| Tier | ICP |
| Severity | High |
| Accuracy | F1 0.667, P 0.517, R 0.938 |
| MAST mapping | FM-1.5 |
Plain language: The workflow itself is badly designed -- some steps can never be reached, some paths have no ending, and there's no plan for what happens when things go wrong.
Technical: Performs graph traversal on workflow DAGs to detect unreachable nodes, dead-end paths, missing error handlers, bottleneck nodes, and excessive sequential depth.
Examples (non-technical):
- A workflow step exists but no path ever leads to it -- it's orphaned
- A workflow path has no finish line -- the process never ends
- If any single step fails, the entire workflow crashes because there's no error handling
Examples (technical):
- Node
validate_outputis unreachable from the start node in the workflow DAG - Execution path through
process → transform → enrichhas no terminal node - AI processing nodes lack try/catch -- a single
APIErrorcrashes the entire pipeline - All 8 parallel paths funnel through a single
aggregatenode (bottleneck)
Detection methods:
- Graph Traversal: Checks reachability of all nodes from start
- Dead End Detection: Identifies paths with no terminal nodes
- Error Handler Audit: Verifies error handling on critical nodes
- Bottleneck Analysis: Detects nodes with disproportionate in-degree
- Depth Analysis: Flags excessively deep sequential chains
Sub-types: unreachable_node, dead_end, missing_error_handling, infinite_loop_risk, bottleneck, missing_termination, orphan_node, excessive_depth