Capabilities table

F1 is the real-trace (external-lane) score for detectors with real coverage, and the blended (external + synthetic) score for thin and synthetic detectors. Coverage describes how many real external traces back the F1 figure: real (>= 30 traces, externally validated), thin (1-29 traces, so the F1 is synthetic-backed), or synthetic (0 traces, so the F1 is a capability signal pending real traces). Where a number appears in parentheses in the Coverage column, it is the real external trace count.

| Detector | F1 | Coverage | Status |
|---|---|---|---|
| overflow | 1.000 | real (50) | Production |
| withholding | 1.000 | real (50) | Production |
| approval_bypass | 1.000 | real (50) | Production |
| chunk_attribution | 1.000 | thin (20) | Production |
| openclaw_channel_mismatch | 0.993 | thin (20) | Production |
| mcp_protocol | 0.990 | synthetic | Production |
| impersonation_risk | 0.985 | real (85) | Production |
| deception | 0.985 | real (85) | Production |
| openclaw_session_loop | 0.985 | thin (20) | Production |
| task_starvation | 0.980 | synthetic | Production |
| tool_error_recovery | 0.980 | synthetic | Production |
| computer_use | 0.977 | synthetic | Production |
| entity_confusion | 0.969 | synthetic | Production |
| critic_quality | 0.967 | synthetic | Production |
| consensus_collapse | 0.967 | real (60) | Production |
| specification_compliance | 0.966 | real (30) | Production |
| scope_escalation | 0.960 | real (85) | Production |
| reward_hacking | 0.960 | real (50) | Production |
| n8n_timeout | 0.959 | thin (19) | Production |
| dify_rag_poisoning | 0.952 | synthetic | Production |
| analytical_semantics | 0.947 | real (32) | Production |
| langgraph_tool_failure | 0.944 | synthetic | Production |
| scheduled_task | 0.942 | synthetic | Production |
| citation | 0.935 | thin (15) | Production |
| openclaw_spawn_chain | 0.933 | thin (20) | Production |
| n8n_resource | 0.931 | thin (13) | Production |
| delegation | 0.930 | real (47) | Production |
| corruption | 0.930 | real (94) | Production |
| retrieval_quality | 0.930 | thin (29) | Production |
| planning_fallacy | 0.920 | synthetic | Production |
| persona_drift | 0.920 | real (89) | Production |
| rag_poisoning | 0.916 | real (100) | Production |
| sycophancy | 0.902 | real (50) | Production |
| dify_classifier_drift | 0.900 | synthetic | Production |
| langgraph_parallel_sync | 0.899 | synthetic | Production |
| langgraph_checkpoint_corruption | 0.896 | synthetic | Production |
| subagent_boundary | 0.896 | synthetic | Production |
| multi_chain | 0.891 | synthetic | Production |
| propagation | 0.889 | synthetic | Production |
| adaptive_thinking | 0.887 | synthetic | Production |
| openclaw_elevated_risk | 0.886 | thin (20) | Production |
| dify_variable_leak | 0.886 | synthetic | Production |
| convergence | 0.883 | synthetic | Production |
| openclaw_tool_abuse | 0.882 | thin (20) | Production |
| orchestration_quality | 0.880 | synthetic | Production |
| injection | 0.878 | real (100) | Production |
| n8n_complexity | 0.876 | thin (14) | Production |
| langgraph_edge_misroute | 0.869 | synthetic | Production |
| dispatch_async | 0.869 | synthetic | Production |
| openclaw_sandbox_escape | 0.864 | thin (20) | Production |
| hallucination | 0.862 | real (250) | Production |
| parallel_consistency | 0.860 | synthetic | Production |
| role_usurpation | 0.857 | real (170) | Production |
| multi_agent_contagion | 0.857 | real (60) | Production |
| role_usurpation_exec | 0.851 | real (83) | Production |
| role_usurpation_canonical | 0.849 | real (97) | Production |
| compaction_quality | 0.847 | synthetic | Production |
| n8n_error | 0.844 | thin (17) | Production |
| over_refusal | 0.843 | real (300) | Production |
| context | 0.838 | real (221) | Production |
| langgraph_state_corruption | 0.833 | synthetic | Production |
| memory_staleness | 0.822 | synthetic | Production |
| reasoning_consistency | 0.816 | synthetic | Production |
| grounding | 0.807 | real (215) | Production |
| specification | 0.806 | real (276) | Production |
| routing | 0.800 | synthetic | Production |
| loop | 0.794 | real (129) | Beta |
| cowork_safety | 0.792 | synthetic | Beta |
| context_precision | 0.789 | real (30) | Beta |
| n8n_cycle | 0.781 | thin (14) | Beta |
| completion | 0.777 | real (199) | Beta |
| n8n_schema | 0.765 | thin (15) | Beta |
| chunk_relevance | 0.765 | thin (28) | Beta |
| decomposition | 0.738 | real (232) | Beta |
| exploration_safety | 0.716 | synthetic | Beta |
| model_selection | 0.683 | synthetic | Beta |
| workflow | 0.667 | real (100) | Beta |
| communication | 0.664 | real (213) | Beta |
| authority_gradient | 0.641 | synthetic | Experimental |
| derailment | 0.623 | real (223) | Experimental |
| coordination | 0.591 | real (235) | Experimental |
| under_refusal | 0.562 | real (300) | Experimental |
| jailbreak_compliance | 0.507 | real (300) | Experimental |
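
The coverage rule described above the table (real at 30 or more external traces, thin at 1-29, synthetic at 0) and the choice of which F1 to report can be sketched in a few lines of Python. This is only an illustration of the stated thresholds, not the project's actual code: the function names `coverage_label` and `reported_f1` and their parameters are hypothetical.

```python
from typing import Optional

# Threshold taken from the table caption: >= 30 real external traces is "real".
REAL_MIN_TRACES = 30


def coverage_label(real_traces: int) -> str:
    """Map a real external trace count to the label used in the Coverage column."""
    if real_traces < 0:
        raise ValueError("trace count cannot be negative")
    if real_traces >= REAL_MIN_TRACES:
        return f"real ({real_traces})"
    if real_traces >= 1:
        return f"thin ({real_traces})"
    return "synthetic"


def reported_f1(real_traces: int,
                real_trace_f1: Optional[float],
                blended_f1: float) -> float:
    """Pick the F1 to show: real-trace score for real coverage, blended otherwise.

    Hypothetical helper: assumes both scores were computed elsewhere.
    """
    if real_traces >= REAL_MIN_TRACES:
        if real_trace_f1 is None:
            raise ValueError("real coverage requires a real-trace F1")
        return real_trace_f1
    return blended_f1


if __name__ == "__main__":
    for count in (0, 19, 29, 30, 250):
        print(count, "->", coverage_label(count))
```

The boundary cases match the table: `specification_compliance` and `context_precision` sit at exactly 30 traces and are labelled real, while `retrieval_quality` at 29 is labelled thin.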