Detection Tiers

Pisama uses a tiered escalation system to balance detection cost and accuracy. Each trace is analyzed starting from the cheapest tier, escalating only when lower tiers are inconclusive.

Tier Overview

Tier	Method	Cost per Trace	When Used
Tier 1	Rule-based detection (hashing, pattern matching)	< $0.001	Always -- fastest, cheapest
Tier 2	State delta analysis	$0.005 - $0.01	When Tier 1 confidence is low
Tier 3	Embedding / ML detection	$0.01 - $0.02	When Tier 2 is inconclusive
Tier 4	LLM-as-Judge	$0.05 - $0.10	Gray zone cases requiring reasoning
Tier 5	Human review	Variable	When all automated tiers are uncertain

Target average cost: $0.05 per trace. Most traces resolve at Tier 1 or Tier 2.
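
Because a trace pays for every tier it passes through, the expected cost per trace is the sum over tiers of (fraction of traces reaching that tier) × (that tier's cost). The sketch below works through that arithmetic. The costs are the upper bounds from the table above; the `REACH` fractions are hypothetical placeholders chosen for illustration, not measured Pisama figures.

```python
# Upper-bound cost per trace for each automated tier, taken from the table above.
TIER_COST = {1: 0.001, 2: 0.01, 3: 0.02, 4: 0.10}

# HYPOTHETICAL fraction of traces that reach each tier (illustration only).
# Every trace reaches Tier 1, so its fraction is 1.0.
REACH = {1: 1.0, 2: 0.40, 3: 0.15, 4: 0.05}


def expected_cost(tier_cost: dict, reach: dict) -> float:
    """Expected spend per trace: each tier's cost weighted by how often it runs."""
    return sum(tier_cost[tier] * reach[tier] for tier in tier_cost)


avg = expected_cost(TIER_COST, REACH)
print(f"expected cost per trace: ${avg:.4f}")
```

Under these assumed fractions the average stays well below the $0.05 target, because the expensive tiers run on only a small share of traces.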

Tier Details

Tier 1: Rule-Based Detection

The fastest and cheapest tier, using deterministic algorithms with zero LLM cost.

Hash collision: SHA256 hash of the normalized state delta -- if two deltas hash identically, they contain the same data
Pattern matching: Regex patterns for injection detection (60+ patterns across 6 attack categories)
Structural matching: Compares agent IDs and state delta keys between consecutive states
Threshold checks: Token counts, cost budgets, context window utilization

Typical confidence when matched: 0.80 - 0.96
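
The hash check can be sketched as follows. This is a minimal illustration, not Pisama's implementation: the normalization step (canonical JSON with sorted keys) and the helper names `delta_hash` and `find_repeats` are assumptions, since the source does not specify how deltas are normalized.

```python
import hashlib
import json


def delta_hash(state_delta: dict) -> str:
    """SHA256 over a canonical JSON encoding of the delta (assumed normalization)."""
    # sort_keys and fixed separators make the encoding independent of key order.
    canonical = json.dumps(state_delta, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_repeats(deltas: list) -> list:
    """Return (first_index, repeat_index) pairs for deltas that hash identically."""
    seen = {}
    repeats = []
    for index, delta in enumerate(deltas):
        digest = delta_hash(delta)
        if digest in seen:
            repeats.append((seen[digest], index))
        else:
            seen[digest] = index
    return repeats


deltas = [
    {"step": "search", "query": "refund policy"},
    {"step": "summarize"},
    {"query": "refund policy", "step": "search"},  # same data, different key order
]
print(find_repeats(deltas))  # [(0, 2)]
```

Hashing a canonical encoding means the comparison costs one digest per state rather than a field-by-field diff, which is what keeps this tier cheap.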

Tier 2: State Delta Analysis

Analyzes differences between consecutive states to detect anomalies.

Cross-field consistency: Validates relationships (start_date < end_date, min < max)
Domain constraint validation: Age 0-150, price >= 0, valid email/URL formats
Velocity analysis: Detects abnormal rate of state changes
Type drift detection: Catches when a numeric field suddenly contains a string
Null/disappearance detection: Flags when 3+ fields vanish simultaneously

Typical confidence when matched: 0.70 - 0.90
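
Three of the checks above (type drift, field disappearance, and cross-field consistency) can be sketched with plain dictionary comparisons. The function names, the finding strings, and the assumption that dates arrive as ISO-8601 strings are all illustrative; only the 3-field disappearance limit and the `start_date < end_date` rule come from the list above.

```python
def kind(value) -> str:
    """Coarse type label; int and float both count as 'number'."""
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, (int, float)):
        return "number"
    return type(value).__name__


def analyze_delta(prev: dict, curr: dict, vanish_limit: int = 3) -> list:
    """Compare two consecutive states and return human-readable findings."""
    findings = []

    # Type drift: a field present in both states changes kind.
    for key in sorted(prev.keys() & curr.keys()):
        before, after = prev[key], curr[key]
        if before is not None and after is not None and kind(before) != kind(after):
            findings.append(f"type drift: {key} {kind(before)} -> {kind(after)}")

    # Disappearance: fields that had a value and are now missing or None.
    vanished = sorted(
        key for key, value in prev.items()
        if value is not None and curr.get(key) is None
    )
    if len(vanished) >= vanish_limit:
        findings.append(f"fields vanished: {', '.join(vanished)}")

    # Cross-field consistency, assuming ISO-8601 date strings (they sort chronologically).
    start, end = curr.get("start_date"), curr.get("end_date")
    if isinstance(start, str) and isinstance(end, str) and not start < end:
        findings.append("start_date is not before end_date")

    return findings


prev = {"price": 19.99, "email": "a@example.com", "sku": "X1", "owner": "ops",
        "start_date": "2024-01-01", "end_date": "2024-02-01"}
curr = {"price": "19.99", "start_date": "2024-03-01", "end_date": "2024-02-01"}
for finding in analyze_delta(prev, curr):
    print(finding)
```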

Tier 3: Embedding / ML Detection

Uses embedding models to detect semantic patterns invisible to rules.

Semantic similarity: Embedding distance between task description and output (derailment, context neglect)
Semantic clustering: KMeans clustering on state embeddings to detect loop patterns
Role embedding comparison: Compares agent behavior vectors against role definitions (persona drift)
Grounding score: Measures output alignment against source documents

Models used:

E5-large-instruct (1024 dimensions)
nomic-embed-text-v1.5 (768 dimensions)
Ensemble mode for higher accuracy

Typical confidence when matched: 0.65 - 0.85
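
To illustrate the semantic similarity check, the sketch below compares a task vector against two output vectors. It assumes cosine similarity as the distance measure and uses toy 3-dimensional vectors in place of real embeddings; the `DERAILMENT_THRESHOLD` value is hypothetical and not taken from Pisama.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy vectors standing in for real embeddings (which would have 1024 or 768 dimensions).
task_vec = [0.9, 0.1, 0.0]
on_topic = [0.8, 0.2, 0.1]
off_topic = [0.0, 0.2, 0.9]

DERAILMENT_THRESHOLD = 0.5  # hypothetical cut-off, for illustration only

for name, vec in [("on_topic", on_topic), ("off_topic", off_topic)]:
    similarity = cosine_similarity(task_vec, vec)
    label = "possible derailment" if similarity < DERAILMENT_THRESHOLD else "ok"
    print(f"{name}: {similarity:.3f} ({label})")
```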

Tier 4: LLM-as-Judge

When Tier 3 is still uncertain (confidence in the gray zone 0.35-0.65), the LLM Judge provides reasoning-based verification.

Model routing by failure mode:

Judge Tier	Failure Modes	Model	Cost per 1M tokens (input / output)
Tier 1 (low-stakes)	F3, F7, F11, F12	Gemini Flash Lite	$0.10 / $0.40
Tier 2 (default)	F1, F2, F4, F5, F10, F13	Claude Sonnet	$3 / $15
Tier 3 (high-stakes)	F6, F8, F9, F14	Claude Sonnet (thinking)	Extended thinking
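
The routing table can be expressed as a simple lookup. In this sketch the tier keys and the `route_judge` helper are hypothetical, the model strings are the display names from the table rather than provider API model IDs, and falling back to the default tier for unlisted failure modes is an assumption.

```python
# Failure modes and model labels copied from the routing table above.
JUDGE_ROUTES = {
    "low_stakes": {"modes": {"F3", "F7", "F11", "F12"},
                   "model": "Gemini Flash Lite"},
    "default": {"modes": {"F1", "F2", "F4", "F5", "F10", "F13"},
                "model": "Claude Sonnet"},
    "high_stakes": {"modes": {"F6", "F8", "F9", "F14"},
                    "model": "Claude Sonnet (thinking)"},
}


def route_judge(failure_mode: str) -> tuple:
    """Return (judge_tier, model_label) for a failure mode code such as 'F6'."""
    for tier, route in JUDGE_ROUTES.items():
        if failure_mode in route["modes"]:
            return tier, route["model"]
    # ASSUMPTION: failure modes not listed in the table use the default judge tier.
    return "default", JUDGE_ROUTES["default"]["model"]


print(route_judge("F7"))
print(route_judge("F9"))
```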

Features:

RAG retrieval: Few-shot examples from pgvector (similarity >= 0.65)
Caching: SHA256-based cache with LRU eviction (max 1000 entries)
Cost tracking: Per-tier and per-provider spend recorded via JudgeCostTracker
No-downgrade set: Some detectors (coordination, grounding) have high precision from rules -- the LLM judge can only boost confidence, never reduce it
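
The caching and no-downgrade behaviors can be sketched together. The `JudgeCache` class, its key layout, and `merge_confidence` are hypothetical names for illustration; the source states only that the cache is SHA256-keyed with LRU eviction at 1000 entries, and that detectors in the no-downgrade set can be boosted but not reduced. That the judge's confidence otherwise replaces the earlier value is an assumption.

```python
import hashlib
from collections import OrderedDict


class JudgeCache:
    """SHA256-keyed cache of judge confidences with least-recently-used eviction."""

    def __init__(self, max_entries: int = 1000):
        self.max_entries = max_entries
        self._entries = OrderedDict()

    @staticmethod
    def key(detector: str, prompt: str) -> str:
        # ASSUMPTION: the cache key covers the detector name and the judge prompt.
        return hashlib.sha256(f"{detector}\n{prompt}".encode("utf-8")).hexdigest()

    def get(self, key: str):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key: str, confidence: float) -> None:
        self._entries[key] = confidence
        self._entries.move_to_end(key)
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict the least recently used entry


# Detectors named in the no-downgrade bullet above.
NO_DOWNGRADE = {"coordination", "grounding"}


def merge_confidence(detector: str, rule_conf: float, judge_conf: float) -> float:
    """Combine the earlier confidence with the judge's, honoring the no-downgrade set."""
    if detector in NO_DOWNGRADE:
        return max(rule_conf, judge_conf)  # the judge may boost, never reduce
    return judge_conf  # ASSUMPTION: otherwise the judge's value replaces the earlier one
```

For example, `merge_confidence("grounding", 0.9, 0.4)` keeps 0.9, while the same inputs for a detector outside the set yield 0.4.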

Tier 5: Human Review

Reserved for critical decisions where all automated tiers remain uncertain. The detection is routed to a human reviewer through the dashboard or a webhook notification.

Gray Zone Handling

When a detector's confidence falls in the range [0.35, 0.65], the result is classified as "uncertain" and escalated to the next tier. This prevents the system from committing to low-confidence decisions.

Confidence < 0.35  →  Classified as negative (no failure)
Confidence 0.35-0.65  →  Gray zone → escalate to next tier
Confidence > 0.65  →  Classified as positive (failure detected)
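
The three bands and the escalation rule can be sketched as a small loop. The `classify` and `run_tiers` names and the stub detectors are hypothetical; the 0.35 and 0.65 bounds, with both ends inclusive in the gray zone, follow the rules above.

```python
GRAY_ZONE_LOWER = 0.35
GRAY_ZONE_UPPER = 0.65


def classify(confidence: float) -> str:
    """Map a confidence score to negative / uncertain / positive."""
    if confidence < GRAY_ZONE_LOWER:
        return "negative"
    if confidence > GRAY_ZONE_UPPER:
        return "positive"
    return "uncertain"  # inclusive gray zone [0.35, 0.65]


def run_tiers(trace, tiers: list) -> tuple:
    """Run detectors cheapest-first; stop at the first tier that is not uncertain.

    `tiers` is an ordered list of (name, detector) pairs, where detector(trace)
    returns a confidence in [0, 1].
    """
    for name, detector in tiers:
        verdict = classify(detector(trace))
        if verdict != "uncertain":
            return name, verdict
    # Every automated tier landed in the gray zone: hand off to a human.
    return "human_review", "uncertain"


# Stub detectors with fixed scores, for illustration only.
tiers = [
    ("tier_1", lambda trace: 0.50),  # gray zone, escalates
    ("tier_2", lambda trace: 0.80),  # confident positive, stops here
    ("tier_3", lambda trace: 0.10),  # never reached
]
print(run_tiers({"id": "trace-1"}, tiers))  # ('tier_2', 'positive')
```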

Tier Configuration

The tiered system is configured per detector type:

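from dataclasses import dataclass


@dataclass
class TierConfig:
    # Confidence thresholds for the rule-based, cheap AI, and expensive AI stages
    rule_confidence_threshold: float = 0.7
    cheap_ai_confidence_threshold: float = 0.8
    expensive_ai_confidence_threshold: float = 0.85
    # Gray zone bounds: confidences in [lower, upper] are treated as uncertain
    gray_zone_lower: float = 0.35
    gray_zone_upper: float = 0.65
    # Toggles for the optional escalation stages
    enable_cheap_ai: bool = True
    enable_expensive_ai: bool = True
    enable_human_escalation: bool = True
    # Whether detection costs are tracked
    track_costs: bool = True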
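
One way to hold per-detector overrides is a lookup keyed by detector type, with `dataclasses.replace` deriving each variant from the defaults. This is a sketch under assumptions: the `DETECTOR_CONFIGS` registry, the `config_for` helper, the detector keys, and the override values are all hypothetical, and the source does not show how Pisama registers these configs. `TierConfig` is repeated here only so the example runs on its own.

```python
from dataclasses import dataclass, replace


@dataclass
class TierConfig:
    rule_confidence_threshold: float = 0.7
    cheap_ai_confidence_threshold: float = 0.8
    expensive_ai_confidence_threshold: float = 0.85
    gray_zone_lower: float = 0.35
    gray_zone_upper: float = 0.65
    enable_cheap_ai: bool = True
    enable_expensive_ai: bool = True
    enable_human_escalation: bool = True
    track_costs: bool = True


DEFAULT_CONFIG = TierConfig()

# HYPOTHETICAL overrides, keyed by detector type.
DETECTOR_CONFIGS = {
    # Skip the expensive AI stage for this detector.
    "injection": replace(DEFAULT_CONFIG, enable_expensive_ai=False),
    # Use a wider gray zone so more borderline results escalate.
    "hallucination": replace(DEFAULT_CONFIG, gray_zone_lower=0.30, gray_zone_upper=0.70),
}


def config_for(detector_type: str) -> TierConfig:
    """Return the override for a detector type, or the defaults if none is set."""
    return DETECTOR_CONFIGS.get(detector_type, DEFAULT_CONFIG)


print(config_for("injection").enable_expensive_ai)  # False
print(config_for("loop").gray_zone_upper)  # 0.65
```

`replace` returns a new instance, so the shared defaults are never mutated by an override.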

Feature Availability by Plan

Feature	Free	Startup	Growth	Enterprise
Loop detection	Yes	Yes	Yes	Yes
State corruption	Yes	Yes	Yes	Yes
Persona drift	Yes	Yes	Yes	Yes
Coordination analysis	Yes	Yes	Yes	Yes
Hallucination	Yes	Yes	Yes	Yes
Injection detection	Yes	Yes	Yes	Yes
Context overflow	Yes	Yes	Yes	Yes
Task derailment	Yes	Yes	Yes	Yes
ML-based detection	--	--	--	Yes
Tiered LLM-judge	--	--	--	Yes
Turn-aware detection	--	--	--	Yes
Quality gate	--	--	--	Yes

Enterprise features require the FEATURE_ML_DETECTION or FEATURE_ADVANCED_EVALS flags. See Configuration for details.