Pisama

Failure detection for multi-agent LLM systems. 27 production-grade detectors across 4 frameworks. Open source. MIT licensed.

Get started in 3 minutes · View on GitHub

  • Python SDK


    Run pip install pisama and detect failures in 3 lines. No server needed.

    SDK Quickstart

  • Full Platform


    Dashboard, tiered detection, self-healing, REST API. Docker Compose.

    Platform Quickstart

  • API Reference


    REST API for traces, detections, healing, and integrations.

    API docs

  • Detection Reference


    Per-detector documentation with F1 scores and accuracy benchmarks.

    Detectors

  • Cookbook


    Integration examples for LangGraph, n8n, Dify, OpenClaw, CrewAI, and Claude.

    Cookbook

  • OSS vs Cloud


    Compare the free SDK with the full platform.

    Compare


What Pisama detects

LLM agents fail silently. A coding agent loops for 40 minutes. A research agent hallucinates citations. A support agent drifts from its persona. Standard monitoring misses all of it.

Pisama provides 41 calibrated detectors (17 core + 24 framework-specific) built on the MAST taxonomy:

  • 27 production-grade (F1 >= 0.80): loop detection, state corruption, coordination failure, persona drift, and more
  • 5-tier escalation: Hash ($0.00) → state delta → embeddings → LLM judge ($0.02) → human review
  • 4 frameworks: purpose-built detectors for LangGraph, n8n, Dify, and OpenClaw
  • $0.05 avg/trace: 90%+ of traces resolved at Tiers 1-2 with zero LLM cost
  • OTEL native: OpenTelemetry with gen_ai.* semantic conventions
  • Self-healing: fix generation, approval workflows, rollback
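
The tiered escalation idea can be illustrated with a short sketch: run the cheapest check first and pay for a more expensive tier only when the cheaper ones find nothing. This is not Pisama's implementation or API. Every name here (Verdict, hash_tier, state_delta_tier, llm_judge_stub, escalate) is hypothetical, the trace format (a list of state dicts) is an assumption, and the embeddings and human-review tiers are omitted. Only the $0.00 and $0.02 figures come from the list above; the $0.00 for the state-delta tier is inferred from the "zero LLM cost" claim for Tiers 1-2.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Verdict:
    tier: str               # last tier that was consulted
    failure: Optional[str]  # failure label, or None if nothing was found
    cost_usd: float         # cost accrued across every tier that ran


def state_hash(state: dict) -> str:
    """Stable SHA-256 digest of one agent state."""
    payload = json.dumps(state, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def hash_tier(trace, max_repeats=3):
    """Tier 1 (assumed): flag a loop when an identical state recurs max_repeats times."""
    counts = {}
    for state in trace:
        digest = state_hash(state)
        counts[digest] = counts.get(digest, 0) + 1
        if counts[digest] >= max_repeats:
            return "loop"
    return None


def state_delta_tier(trace, window=3, ignore=("step", "timestamp")):
    """Tier 2 (assumed): flag a stall when the last `window` states differ
    only in volatile fields such as a step counter."""
    if len(trace) < window:
        return None
    recent = [{k: v for k, v in s.items() if k not in ignore} for s in trace[-window:]]
    return "stalled_state" if all(r == recent[0] for r in recent) else None


def llm_judge_stub(trace):
    """Placeholder for an LLM judge; a real one would call a model here."""
    return None


# (tier name, cost per trace in USD, check function), cheapest first.
TIERS = [
    ("hash", 0.00, hash_tier),
    ("state_delta", 0.00, state_delta_tier),
    ("llm_judge", 0.02, llm_judge_stub),
]


def escalate(trace, tiers=TIERS) -> Verdict:
    """Run tiers in order and stop at the first one that reports a failure."""
    spent = 0.0
    name = "none"
    for name, cost, check in tiers:
        spent += cost
        failure = check(trace)
        if failure is not None:
            return Verdict(name, failure, spent)
    return Verdict(name, None, spent)


looping = [{"tool": "search", "query": "x"}] * 3
stalled = [{"step": i, "tool": "search", "query": "x"} for i in range(3)]
healthy = [{"step": i, "tool": "search", "query": str(i)} for i in range(3)]

for label, trace in [("looping", looping), ("stalled", stalled), ("healthy", healthy)]:
    print(label, escalate(trace))
```

In this sketch the looping trace is caught by the hash tier and the stalled trace by the state-delta tier, so neither incurs the judge cost. Only the healthy trace falls through to the stubbed judge.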

Supported frameworks

LangGraph · n8n · Dify · OpenClaw · CrewAI · AutoGen · Claude Code · Claude Managed Agents · Any OTEL source