check_compliance (beta)¶
pisama_agent_sdk.check_compliance runs Pisama's specification-compliance detector on an agent trace. It extracts behavioural rules from the agent's system prompt and reports any rule violations the trace contains.
Beta API
This call is gated behind a feature flag and the response shape may change before it reaches GA. Pin the SDK version if you depend on the field set.
What it does¶
Given the agent's system prompt and a list of trace events (tool calls, agent messages, user messages, tool results), the detector runs a two-stage LLM pipeline:
- The first stage extracts behavioural rules from the system prompt: things the agent must do, must not do, or must do conditionally.
- The second stage scans the trace for each rule. Concrete rules (such as a forbidden tool name) match deterministically. Ambiguous rules escalate to a single LLM call per rule.
The return value is a ComplianceResult containing the detection verdict, each detected violation with evidence, the full extracted rule set (for transparency), and the LLM cost of the run.
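The deterministic half of stage two can be illustrated with a stdlib-only sketch. This is an assumption-laden illustration, not the SDK's implementation: the real matcher runs server-side, and the event shape below simply mirrors the minimal example later on this page.

```python
def forbidden_tool_violations(trace_events, forbidden_tools):
    """Deterministically flag tool calls whose name is on a forbidden list.

    Sketch only: the real stage-two matcher is internal to the backend and
    also escalates ambiguous rules to one LLM call per rule.
    """
    violations = []
    for event in trace_events:
        # A concrete rule like "never call drop_table" needs no LLM:
        # a plain name comparison against each tool_call event suffices.
        if event.get("type") == "tool_call" and event.get("name") in forbidden_tools:
            violations.append(event["name"])
    return violations


events = [
    {"type": "user_message", "content": "Clean up old records."},
    {"type": "tool_call", "name": "drop_table", "args": {"name": "users"}},
]
print(forbidden_tool_violations(events, {"drop_table"}))  # ['drop_table']
```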
Enable the feature flag¶
Set PISAMA_ENABLE_CHECK_COMPLIANCE=1 in the environment that runs the SDK. Any of 1, true, or yes (case-insensitive) works. Without the flag, the call raises PisamaFeatureNotEnabledError immediately, before any network round-trip.
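If you set the flag from Python rather than the shell, do so before calling the SDK. The `flag_enabled` helper below is hypothetical, written only to mirror the truthy values the docs list; the SDK performs its own check internally.

```python
import os

# Enable the beta flag for this process before any check_compliance call.
os.environ["PISAMA_ENABLE_CHECK_COMPLIANCE"] = "true"


def flag_enabled(name: str = "PISAMA_ENABLE_CHECK_COMPLIANCE") -> bool:
    # Hypothetical helper mirroring the documented truthy values:
    # "1", "true", "yes", case-insensitive.
    return os.environ.get(name, "").strip().lower() in {"1", "true", "yes"}


print(flag_enabled())  # True
```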
Minimal example¶
```python
import asyncio

from pisama_agent_sdk import check_compliance


async def main():
    result = await check_compliance(
        system_prompt=(
            "You are a careful database assistant. "
            "Never call drop_table. Always confirm with the user "
            "before running destructive SQL."
        ),
        trace_events=[
            {"type": "user_message", "content": "Clean up old records."},
            {"type": "tool_call", "name": "drop_table", "args": {"name": "users"}},
        ],
    )
    if result.detected:
        for v in result.violations:
            print(f"{v.rule_id}: {v.explanation}")
            print(f"  evidence: {v.evidence}")
            print(f"  confidence: {v.confidence:.2f}")
    print(f"cost: ${result.cost_usd:.4f} ({result.tokens_used} tokens)")


asyncio.run(main())
```
Result shape¶
ComplianceResult has these fields:
| Field | Type | Description |
|---|---|---|
| detected | bool | True if any rule was violated. |
| confidence | float | 0.0 to 1.0; the maximum violation confidence. |
| violations | list[Violation] | Each detected rule violation. |
| extracted_rules | list[BehavioralRule] | Every rule the extractor pulled from the prompt. |
| tokens_used | int | Total LLM tokens across both pipeline stages. |
| cost_usd | float | Total LLM cost in USD. |
Violation carries rule_id, evidence, explanation, and confidence. BehavioralRule carries rule_id, description, trigger, optional required_action and forbidden_action, plus a severity of critical, high, medium, or low.
Errors¶
| Exception | When |
|---|---|
| PisamaFeatureNotEnabledError | Flag is unset or set to a falsy value. |
| urllib.error.URLError | Backend is unreachable. |
| TimeoutError | Backend call exceeds timeout_ms (default 60 seconds). |
A backend response with malformed JSON is treated as a soft failure: the SDK returns an empty ComplianceResult rather than crashing the agent.
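A caller that wants the same soft-failure behaviour for the raised exceptions can wrap the call. The sketch below stubs out check_compliance and the exception class so it is self-contained; in real code you would import both from pisama_agent_sdk, and the wrapper name safe_check is a hypothetical choice.

```python
import asyncio


class PisamaFeatureNotEnabledError(RuntimeError):
    """Stand-in for the SDK's exception, so this sketch runs standalone."""


async def check_compliance(**kwargs):
    # Placeholder for pisama_agent_sdk.check_compliance; behaves as the
    # real call would when the feature flag is unset.
    raise PisamaFeatureNotEnabledError("PISAMA_ENABLE_CHECK_COMPLIANCE is not set")


async def safe_check(**kwargs):
    """Return a result, or None when compliance checking cannot run."""
    try:
        return await check_compliance(**kwargs)
    except PisamaFeatureNotEnabledError:
        return None  # flag unset: skip the check rather than crash the agent
    except TimeoutError:
        return None  # backend exceeded timeout_ms


result = asyncio.run(safe_check(system_prompt="...", trace_events=[]))
print(result)  # None
```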
Related¶
- SDK Quickstart for the always-on check() call.
- Failure modes for the broader detector catalogue.