BLACKBOX
flight recorder for AI agents
INTERACTIVE TUTORIAL · 15 MINUTES

Build a tamper-evident AI agent recorder

From zero to a fully hash-chained, auditable AI agent run — with chain verification, Merkle anchoring, and a compliance report. Every step produces real output you can inspect.

⏱ 15 min · Python 3.11+ · No prior experience needed · Real API · Real data
1. Install
2. Configure
3. First Run
4. Add Events
5. Instrument
6. Verify Chain
7. Dashboard
8. Alerts
9. Report
10. Production
What you'll build
A hash-chained run
Every event is cryptographically linked: tamper with one byte and the chain breaks detectably.
A public audit certificate
A URL you can share with any auditor. No auth. Chain validity is computed fresh on every request.
A Merkle anchor
Proof that your run existed before a given timestamp, verifiable with O(log n) hashes.
A compliance report
A date-filtered PDF with fault rates, model breakdown, and chain integrity for risk committees.
01

Install the SDK

The BLACKBOX SDK has zero external dependencies (it uses only the stdlib urllib) and works in any Python 3.11+ environment, including Lambda, Cloud Run, and Docker.

terminal
pip install blackbox-sdk
verify installation
python -c "import blackbox_sdk as bb; print(bb.__version__)"
# → 0.3.1
TypeScript? Install with npm install @blackbox-ai/sdk — same API, same chain guarantees.
02

Configure the client

Point the SDK at your BLACKBOX instance. In production, use an environment variable.

agent.py
import blackbox_sdk as bb

# Point to your BLACKBOX instance
bb.configure(api_url="https://blackbox-gold.vercel.app")

# Verify connectivity
import urllib.request, json
resp = urllib.request.urlopen("https://blackbox-gold.vercel.app/healthz")
print(json.loads(resp.read()))  # → {"status": "ok"}
Production tip: Set BLACKBOX_API_URL as an env var. The SDK reads it automatically — no hardcoded URLs in code.
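If you want an explicit fallback rather than relying on the SDK's automatic lookup, the pattern can be sketched as below. Only the BLACKBOX_API_URL variable name comes from this tutorial; the helper name resolve_api_url and the localhost default are hypothetical.

```python
import os

# Hypothetical local default; replace with your own instance.
DEFAULT_API_URL = "http://localhost:8000"

def resolve_api_url(env=None):
    """Return the BLACKBOX URL from the environment, falling back to a default."""
    env = os.environ if env is None else env
    url = env.get("BLACKBOX_API_URL", "").strip()
    # Strip a trailing slash so paths can be appended uniformly.
    return (url or DEFAULT_API_URL).rstrip("/")

print(resolve_api_url({"BLACKBOX_API_URL": "https://blackbox-gold.vercel.app/"}))
# → https://blackbox-gold.vercel.app
```

Passing the result to bb.configure(api_url=...) keeps the URL out of source control while still giving local development a sensible default.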
03

Record your first run

A run is one complete agent session — from genesis (the system prompt and config) to seal (the final tamper-evident hash). Use the context manager: it seals automatically on exit and records a fault event if an unhandled exception occurs.

agent.py
import blackbox_sdk as bb

bb.configure(api_url="https://blackbox-gold.vercel.app")

SYSTEM_PROMPT = """You are a credit risk assessment agent. Evaluate loan applications."""

with bb.run(
    "credit_risk_001",
    model="claude-sonnet-4-6",
    system_prompt=SYSTEM_PROMPT,
    tools=["data_lookup", "risk_calculator"],
    sampling={"temperature": 0.1, "max_tokens": 2048},
) as run:
    # Your agent logic goes here
    pass
# ↑ Sealed automatically. Chain hash computed and stored.

print("Run sealed: credit_risk_001")
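To make the seal-on-exit behaviour concrete, here is a toy context manager that mimics what the text describes: a genesis event on entry, a fault event if the body raises, and a seal in every case. This is not the SDK's implementation. The name toy_run, the event fields, and the choice to re-raise the exception are all assumptions made for illustration.

```python
from contextlib import contextmanager

@contextmanager
def toy_run(run_id, log):
    """Illustrative stand-in for bb.run(): genesis on entry, seal on exit."""
    log.append({"run_id": run_id, "type": "genesis"})
    try:
        yield log
    except Exception as exc:
        # Record the failure, then re-raise so the caller still sees it.
        log.append({"run_id": run_id, "type": "fault", "error": repr(exc)})
        raise
    finally:
        # The seal is appended whether or not the body raised.
        log.append({"run_id": run_id, "type": "seal"})

events = []
try:
    with toy_run("demo_run", events):
        raise RuntimeError("tool timed out")
except RuntimeError:
    pass

print([e["type"] for e in events])  # ['genesis', 'fault', 'seal']
```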
What just happened?
BLACKBOX created a genesis event — the immutable record of your agent's starting configuration. The system prompt, tool manifest, model, and sampling parameters are all hashed and sealed. This is the root of your chain:
genesis hash (example)
hash_0 = sha256("GENESIS" + canonical_json({
  "run_id": "credit_risk_001",
  "type": "genesis",
  "system_prompt": "You are a credit risk assessment agent. Evaluate loan applications.",
  ...
}))
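The formula above can be tried locally. This sketch assumes canonical_json means sorted keys with compact separators, which is the form the verification script later in this tutorial uses. The two-field event is an illustrative subset, not the full genesis payload.

```python
import hashlib
import json

def canonical_json(obj):
    # Sorted keys and compact separators give one byte string per object.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))

def genesis_hash(event):
    return hashlib.sha256(("GENESIS" + canonical_json(event)).encode()).hexdigest()

# Illustrative subset of fields; the real genesis event carries more.
a = {"run_id": "credit_risk_001", "type": "genesis"}
b = {"type": "genesis", "run_id": "credit_risk_001"}  # same data, different key order

print(genesis_hash(a) == genesis_hash(b))  # True
```

Canonicalisation matters because the hash covers bytes, not data structures: without a fixed key order, two semantically identical events could hash differently.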
04

Add reasoning, tool calls, and output

Real agent runs have multiple steps. Each one becomes a chained event — every hash includes the previous hash, so the entire sequence is tamper-evident.

agent.py — full run
import blackbox_sdk as bb

bb.configure(api_url="https://blackbox-gold.vercel.app")

with bb.run(
    "credit_risk_001_full",
    model="claude-sonnet-4-6",
    system_prompt="You are a credit risk assessment agent. Evaluate loan applications.",
) as run:

    # Step 1: reasoning — record what the agent is thinking
    run.reasoning(
        "Analysing input data to identify risk factors",
        latency_ms=340,
    )

    # Step 2: tool call — record what tools were used and the results
    run.tool_call(
        "data_lookup",
        inputs={"entity_id": "LOAN-9182", "fields": ["credit_score", "income"]},
        result={"credit_score": 710, "income": 85000, "debt_ratio": 0.32},
        latency_ms=820,
    )

    # Step 3: another reasoning step
    run.reasoning(
        "Credit score 710 with 32% debt ratio — borderline. Requesting additional verification.",
        latency_ms=290,
    )

    # Step 4: output — the agent's final response
    run.output(
        "Application LOAN-9182: CONDITIONAL APPROVAL. "
        "Credit score meets minimum threshold. Debt ratio requires verification.",
        tokens_in=1240,
        tokens_out=420,
        cost_usd=0.0089,
        latency_ms=2100,
    )

# Each event's hash = sha256(prev_hash + canonical_json(event))
# Alter any event → chain breaks from that point forward
Your chain so far
GENESIS
h0 = sha256("GENESIS" + …)
REASONING
h1 = sha256(h0 + …)
TOOL_CALL
h2 = sha256(h1 + …)
REASONING
h3 = sha256(h2 + …)
OUTPUT
h4 = sha256(h3 + …)
SEAL
h5 = sha256(h4 + …)
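The linkage shown above can be reproduced in a few lines. This is a local sketch, not the SDK's code: build_chain and first_break are hypothetical helpers, and the events carry only a type and a sequence number.

```python
import hashlib
import json

def canonical(obj):
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))

def build_chain(events):
    """Return copies of events with a 'hash' field linking each to the previous one."""
    prev, chained = "GENESIS", []
    for seq, event in enumerate(events):
        body = {**event, "sequence": seq}
        digest = hashlib.sha256((prev + canonical(body)).encode()).hexdigest()
        chained.append({**body, "hash": digest})
        prev = digest
    return chained

def first_break(chained):
    """Return the sequence number of the first mismatching event, or None."""
    prev = "GENESIS"
    for event in chained:
        body = {k: v for k, v in event.items() if k != "hash"}
        if hashlib.sha256((prev + canonical(body)).encode()).hexdigest() != event["hash"]:
            return body["sequence"]
        prev = event["hash"]
    return None

types = ["genesis", "reasoning", "tool_call", "reasoning", "output", "seal"]
chain = build_chain([{"type": t} for t in types])
print(first_break(chain))  # None

chain[2]["type"] = "tampered"
print(first_break(chain))  # 2
```

An attacker who also recomputes the hash of event 2 only moves the break to event 3, because h3 was derived from the original h2.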
05

Auto-instrumentation — zero code changes

If you're using OpenAI or Anthropic directly, bb.instrument() patches the client at startup. Every client.messages.create() call is automatically recorded as a BLACKBOX run. No context managers, no changes to existing agent code.

agent.py — auto-instrument
import anthropic
import blackbox_sdk as bb

bb.configure(api_url="https://blackbox-gold.vercel.app")
bb.instrument()  # ← one line at startup. That's it.

# Now every Anthropic or OpenAI call is auto-recorded:
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a credit risk assessment agent. Evaluate loan applications.",
    messages=[{"role": "user", "content": "Evaluate LOAN-9182"}],
)
# ↑ Automatically sealed in BLACKBOX. View it at /dashboard.
How it works: bb.instrument() monkey-patches anthropic.resources.messages.Messages.create and openai.resources.chat.completions.Completions.create. Each call gets its own run ID, genesis event, and seal. Fail-open — if BLACKBOX is unreachable, your agent continues uninterrupted.
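The monkey-patching and fail-open behaviour described above follow a common pattern, sketched here against a fake client class so it runs without any LLM SDK installed. FakeMessages, the instrument helper, and the recorder signature are hypothetical and do not reflect BLACKBOX internals.

```python
import functools

class FakeMessages:
    """Stand-in for an LLM client resource; not a real SDK class."""
    def create(self, **kwargs):
        return {"text": "ok", "model": kwargs.get("model")}

def instrument(cls, method_name, record):
    """Replace cls.method_name with a wrapper that records each call, fail-open."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        result = original(self, *args, **kwargs)
        try:
            record(kwargs, result)
        except Exception:
            # Fail-open: a recorder outage must never break the agent.
            pass
        return result

    setattr(cls, method_name, wrapper)

def broken_recorder(request, response):
    raise ConnectionError("recorder unreachable")

instrument(FakeMessages, "create", broken_recorder)
print(FakeMessages().create(model="demo-model"))
# → {'text': 'ok', 'model': 'demo-model'}
```

The trade-off of fail-open is that a recorder outage leaves gaps in the audit trail, so it is worth monitoring recorder errors separately.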
06

Verify the hash chain

Chain verification re-derives every hash from scratch and compares it to the stored value. One altered byte anywhere in the chain produces a mismatch. Validity is recomputed on every check rather than read from a stored flag, so there is no boolean to falsify.

verify programmatically
import urllib.request, json

run_id = "credit_risk_001_full"
url = f"https://blackbox-gold.vercel.app/v1/audit/{run_id}"

cert = json.loads(urllib.request.urlopen(url).read())

print(cert["chain_valid"])      # True — every hash checks out
print(cert["event_count"])      # 6 events
print(cert["root_hash"])        # sha256 of the final seal
print(cert["merkle_proof"])     # sibling-path inclusion proof

# Share this URL with any auditor — no auth required:
print(url)
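A sibling-path inclusion proof like merkle_proof can be checked with a short loop. The sketch below builds a small tree and verifies one leaf. The proof layout (a list of sibling hash plus side), the duplicate-last-node rule for odd levels, and the absence of leaf/node domain separation are assumptions for illustration. BLACKBOX's actual encoding may differ.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root_and_proof(leaves, index):
    """Return (root, proof); proof is a list of (sibling_hash, sibling_is_left).

    Requires at least one leaf.
    """
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # assumed rule: duplicate the last node on odd levels
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof

def verify_inclusion(leaf, proof, root):
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

leaves = [f"run-{i}".encode() for i in range(5)]
root, proof = merkle_root_and_proof(leaves, 3)
print(len(proof), verify_inclusion(leaves[3], proof, root))  # 3 True
print(verify_inclusion(b"forged", proof, root))              # False
```

Five leaves need only three sibling hashes, which is where the logarithmic proof size comes from.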
verify from scratch — stdlib only
import hashlib, json, urllib.request

def canonical(obj):
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))

def verify_chain(events):
    prev = "GENESIS"
    for e in events:
        # Hash everything except the stored hash, without mutating the input
        body = {k: v for k, v in e.items() if k != "hash"}
        computed = hashlib.sha256((prev + canonical(body)).encode()).hexdigest()
        if computed != e["hash"]:
            # Explicit raise: unlike assert, this is not stripped by python -O
            raise ValueError(f"CHAIN BREAK at seq {e['sequence']}")
        prev = e["hash"]
    return True

# Fetch events
with urllib.request.urlopen(
    "https://blackbox-gold.vercel.app/v1/runs/credit_risk_001_full"
) as resp:
    events = json.load(resp)["events"]

print(verify_chain(events))  # True, or raises ValueError naming the break location
The key insight
This verification requires only Python stdlib. No BLACKBOX infrastructure. An auditor with the raw JSON and 15 lines of Python can independently verify the chain without trusting BLACKBOX, your infrastructure, or anyone else.
07

View your run in the dashboard

Every recorded run appears in the dashboard immediately. The run detail page shows the full event timeline, chain visualization, cost breakdown, and links to the public audit certificate.

08

Set up fault alerts

BLACKBOX detects hallucinations, policy violations, and prompt injection attempts in real time. Configure webhooks to alert your team instantly via Slack, PagerDuty, or any HTTP endpoint.

curl — create a Slack alert rule
curl -X POST https://blackbox-gold.vercel.app/v1/alerts/rules \
  -H "Content-Type: application/json" \
  -d '{
    "name": "High-severity fault → Slack",
    "condition": "fault_rate > 0.1 OR severity = critical",
    "channel": "slack",
    "webhook_url": "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK",
    "enabled": true
  }'
Slack Block Kit: Alerts are automatically formatted with run ID, fault class, severity, and a direct link to the run. No custom formatting needed.
→ Configure alerts in the UI
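If your deployment tooling is Python rather than shell, the same rule can be created with the standard library. This sketch mirrors the curl example above field for field and only builds the request. The helper name build_alert_rule_request is hypothetical, and any authentication headers your instance requires are not shown.

```python
import json
import urllib.request

def build_alert_rule_request(api_url, webhook_url):
    """Build (but do not send) the POST request from the curl example."""
    rule = {
        "name": "High-severity fault → Slack",
        "condition": "fault_rate > 0.1 OR severity = critical",
        "channel": "slack",
        "webhook_url": webhook_url,
        "enabled": True,
    }
    return urllib.request.Request(
        f"{api_url.rstrip('/')}/v1/alerts/rules",
        data=json.dumps(rule).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_alert_rule_request(
    "https://blackbox-gold.vercel.app",
    "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK",
)
print(req.get_method(), req.full_url)
# → POST https://blackbox-gold.vercel.app/v1/alerts/rules
# To send it: urllib.request.urlopen(req, timeout=10)
```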
09

Generate a compliance report

Compliance reports are date-filtered PDFs suitable for risk committees, regulators, and auditors. They include fault rates, model breakdown, chain integrity status, cost attribution, and Merkle anchor hashes.

curl — generate report
curl "https://blackbox-gold.vercel.app/v1/report?from_date=2026-01-01&to_date=2026-12-31" \
  | python3 -m json.tool

# Key fields in the JSON response:
# total_runs, total_events, total_faults, fault_rate
# chain_integrity: "all_valid" or violation count
# by_model: per-model fault rates, quality scores, costs
# fault_patterns: most common fault classes
# merkle_anchors: daily root hashes for external verification
→ Generate a report in the UI
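A report response can also gate an automated check. The sketch below builds the same query URL as the curl example and flags two conditions using the chain_integrity and fault_rate fields listed above. The check_report helper, the 5% threshold, and the sample payload are invented for illustration and are not real BLACKBOX output.

```python
import urllib.parse

def report_url(api_url, from_date, to_date):
    query = urllib.parse.urlencode({"from_date": from_date, "to_date": to_date})
    return f"{api_url.rstrip('/')}/v1/report?{query}"

def check_report(report, max_fault_rate=0.05):
    """Return a list of human-readable problems; empty means the report passes."""
    problems = []
    if report.get("chain_integrity") != "all_valid":
        problems.append(f"chain integrity: {report.get('chain_integrity')}")
    if report.get("fault_rate", 0.0) > max_fault_rate:
        problems.append(f"fault rate {report['fault_rate']:.1%} exceeds {max_fault_rate:.1%}")
    return problems

print(report_url("https://blackbox-gold.vercel.app", "2026-01-01", "2026-12-31"))

# Made-up payload for illustration only; not real BLACKBOX output.
sample = {"total_runs": 200, "total_faults": 30, "fault_rate": 0.15, "chain_integrity": "all_valid"}
print(check_report(sample))  # ['fault rate 15.0% exceeds 5.0%']
```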
10

Production checklist

Before going live, run through this checklist:

Create a scoped API key
Use role=agent (write-only) for your agents. Never use an admin key in production agent code.
Admin → API Keys
Set BLACKBOX_API_URL env var
Never hardcode the API URL. Use environment variables so you can swap endpoints without code changes.
Enable fault alerts
At minimum, alert on severity=critical. Configure a Slack or PagerDuty webhook.
Configure alerts
Schedule daily anchors
Run POST /v1/anchor/{date} nightly (cron). Publish the root hash externally for independent verification.
Set a retention policy
Configure data lifecycle rules — especially crypto-shred for GDPR right-to-erasure.
Retention policies
Test chain verification
Run the verification script in your CI pipeline so that you are alerted if a deployment accidentally breaks chain integrity.
Configure legal hold
For regulated runs, enable legal hold to prevent any retention policy from touching them.
Admin → Governance
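For the "Schedule daily anchors" item, a nightly cron job could build its request as sketched below. Only the POST /v1/anchor/{date} route comes from the checklist. The ISO date format, the choice to anchor the previous day, the empty request body, and the helper names are assumptions, and any API key header your instance requires is not shown.

```python
import datetime
import urllib.request

def anchor_request(api_url, day):
    """Build (but do not send) the POST /v1/anchor/{date} request for one day."""
    return urllib.request.Request(
        f"{api_url.rstrip('/')}/v1/anchor/{day.isoformat()}",
        data=b"",
        method="POST",
    )

def yesterday(today=None):
    today = today or datetime.date.today()
    return today - datetime.timedelta(days=1)

req = anchor_request("https://blackbox-gold.vercel.app", yesterday(datetime.date(2026, 3, 1)))
print(req.get_method(), req.full_url)
# → POST https://blackbox-gold.vercel.app/v1/anchor/2026-02-28
```

Anchoring the previous day after midnight means the anchored set of runs is already closed when the root hash is computed.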
You're done. Here's what's next.