capabilities table

F1 is the real-trace (external-lane) score for detectors with real coverage, and the blended (external + synthetic) score for detectors with thin or synthetic coverage. Coverage is the number of real external traces backing the F1 figure, shown in parentheses where it is nonzero: real (>= 30 traces, externally validated), thin (1-29 traces, so the number is synthetic-backed), synthetic (0 traces, a capability signal pending real traces).
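The coverage rule above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds, not the project's actual code: the function names `coverage_tier` and `reported_f1`, and the idea that both an external and a blended score are available as plain floats, are assumptions made for the example.

```python
from typing import Optional

# Threshold taken from the text: "real" coverage means >= 30 external traces.
REAL_MIN_TRACES = 30


def coverage_tier(real_traces: int) -> str:
    """Map a count of real external traces to a coverage label (hypothetical helper)."""
    if real_traces < 0:
        raise ValueError("trace count cannot be negative")
    if real_traces >= REAL_MIN_TRACES:
        return "real"       # externally validated
    if real_traces >= 1:
        return "thin"       # 1-29 traces, number is synthetic-backed
    return "synthetic"      # 0 traces, pending real traces


def reported_f1(real_traces: int,
                external_f1: Optional[float],
                blended_f1: float) -> float:
    """Pick which F1 to report: external-lane for real coverage, blended otherwise.

    Hypothetical helper; the inputs are assumed to be precomputed scores.
    """
    if coverage_tier(real_traces) == "real":
        if external_f1 is None:
            raise ValueError("real coverage requires an external-lane F1")
        return external_f1
    return blended_f1
```

For instance, a detector with 29 real traces is still labeled thin and reports its blended score, while one with 30 is labeled real and reports its external-lane score.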

Detector                         F1     Coverage    Status
overflow                         1.000  real (50)   Production
withholding                      1.000  real (50)   Production
approval_bypass                  1.000  real (50)   Production
chunk_attribution                1.000  thin (20)   Production
openclaw_channel_mismatch        0.993  thin (20)   Production
mcp_protocol                     0.990  synthetic   Production
impersonation_risk               0.985  real (85)   Production
deception                        0.985  real (85)   Production
openclaw_session_loop            0.985  thin (20)   Production
task_starvation                  0.980  synthetic   Production
tool_error_recovery              0.980  synthetic   Production
computer_use                     0.977  synthetic   Production
entity_confusion                 0.969  synthetic   Production
critic_quality                   0.967  synthetic   Production
consensus_collapse               0.967  real (60)   Production
specification_compliance         0.966  real (30)   Production
scope_escalation                 0.960  real (85)   Production
reward_hacking                   0.960  real (50)   Production
n8n_timeout                      0.959  thin (19)   Production
dify_rag_poisoning               0.952  synthetic   Production
analytical_semantics             0.947  real (32)   Production
langgraph_tool_failure           0.944  synthetic   Production
scheduled_task                   0.942  synthetic   Production
citation                         0.935  thin (15)   Production
openclaw_spawn_chain             0.933  thin (20)   Production
n8n_resource                     0.931  thin (13)   Production
delegation                       0.930  real (47)   Production
corruption                       0.930  real (94)   Production
retrieval_quality                0.930  thin (29)   Production
planning_fallacy                 0.920  synthetic   Production
persona_drift                    0.920  real (89)   Production
rag_poisoning                    0.916  real (100)  Production
sycophancy                       0.902  real (50)   Production
dify_classifier_drift            0.900  synthetic   Production
langgraph_parallel_sync          0.899  synthetic   Production
langgraph_checkpoint_corruption  0.896  synthetic   Production
subagent_boundary                0.896  synthetic   Production
multi_chain                      0.891  synthetic   Production
propagation                      0.889  synthetic   Production
adaptive_thinking                0.887  synthetic   Production
openclaw_elevated_risk           0.886  thin (20)   Production
dify_variable_leak               0.886  synthetic   Production
convergence                      0.883  synthetic   Production
openclaw_tool_abuse              0.882  thin (20)   Production
orchestration_quality            0.880  synthetic   Production
injection                        0.878  real (100)  Production
n8n_complexity                   0.876  thin (14)   Production
langgraph_edge_misroute          0.869  synthetic   Production
dispatch_async                   0.869  synthetic   Production
openclaw_sandbox_escape          0.864  thin (20)   Production
hallucination                    0.862  real (250)  Production
parallel_consistency             0.860  synthetic   Production
role_usurpation                  0.857  real (170)  Production
multi_agent_contagion            0.857  real (60)   Production
role_usurpation_exec             0.851  real (83)   Production
role_usurpation_canonical        0.849  real (97)   Production
compaction_quality               0.847  synthetic   Production
n8n_error                        0.844  thin (17)   Production
over_refusal                     0.843  real (300)  Production
context                          0.838  real (221)  Production
langgraph_state_corruption       0.833  synthetic   Production
memory_staleness                 0.822  synthetic   Production
reasoning_consistency            0.816  synthetic   Production
grounding                        0.807  real (215)  Production
specification                    0.806  real (276)  Production
routing                          0.800  synthetic   Production
loop                             0.794  real (129)  Beta
cowork_safety                    0.792  synthetic   Beta
context_precision                0.789  real (30)   Beta
n8n_cycle                        0.781  thin (14)   Beta
completion                       0.777  real (199)  Beta
n8n_schema                       0.765  thin (15)   Beta
chunk_relevance                  0.765  thin (28)   Beta
decomposition                    0.738  real (232)  Beta
exploration_safety               0.716  synthetic   Beta
model_selection                  0.683  synthetic   Beta
workflow                         0.667  real (100)  Beta
communication                    0.664  real (213)  Beta
authority_gradient               0.641  synthetic   Experimental
derailment                       0.623  real (223)  Experimental
coordination                     0.591  real (235)  Experimental
under_refusal                    0.562  real (300)  Experimental
jailbreak_compliance             0.507  real (300)  Experimental
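
Because every row follows the same four-field shape (detector, F1, coverage, status), the table can be read mechanically. The following Python sketch parses one whitespace-separated row and checks that its coverage label agrees with the trace count in parentheses. The names `Row`, `parse_row`, and `tier_matches_count` are hypothetical, and the regular expression assumes rows look exactly like the ones above; the header line does not match and raises `ValueError`.

```python
import re
from typing import NamedTuple

# Matches e.g. "hallucination 0.862 real (250) Production" with any
# amount of whitespace between fields. The "(N)" count is optional
# because synthetic rows omit it.
ROW_RE = re.compile(
    r"^(?P<detector>\S+)\s+(?P<f1>\d\.\d{3})\s+"
    r"(?P<tier>real|thin|synthetic)(?:\s+\((?P<traces>\d+)\))?\s+"
    r"(?P<status>\S+)$"
)


class Row(NamedTuple):
    detector: str
    f1: float
    tier: str
    traces: int   # 0 when the row gives no count
    status: str


def parse_row(line: str) -> Row:
    """Parse one table row; raise ValueError if it does not fit the pattern."""
    m = ROW_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised row: {line!r}")
    traces = int(m.group("traces")) if m.group("traces") else 0
    return Row(m.group("detector"), float(m.group("f1")),
               m.group("tier"), traces, m.group("status"))


def tier_matches_count(row: Row) -> bool:
    """Check the label against the thresholds in the legend (>=30, 1-29, 0)."""
    if row.tier == "real":
        return row.traces >= 30
    if row.tier == "thin":
        return 1 <= row.traces <= 29
    return row.traces == 0
```

A check like this is a cheap way to catch a row whose label and count drift apart when the table is updated by hand.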