INTERACTIVE TUTORIAL · 15 MINUTES

Build a tamper-evident AI agent recorder

From zero to a fully hash-chained, auditable AI agent run — with chain verification, Merkle anchoring, and a compliance report. Every step produces real output you can inspect.

⏱ 15 min · Python 3.11+ · No prior experience needed · Real API · Real data

What you'll build

⛓

A hash-chained run

Every event is cryptographically linked: tamper with one byte and the chain breaks detectably.

🔍

A public audit certificate

A URL you can share with any auditor. No auth. Chain validity computed fresh on every request.

📅

A Merkle anchor

Proof that your run existed before a given timestamp, verifiable with O(log n) hashes.

📄

A compliance report

Date-filtered PDF with fault rates, model breakdown, and chain integrity for risk committees.

Install the SDK

The BLACKBOX SDK has zero external dependencies (stdlib urllib only). It works in any Python 3.11+ environment, including Lambda, Cloud Run, and Docker.

terminal

pip install blackbox-sdk

verify installation

python -c "import blackbox_sdk as bb; print(bb.__version__)"
# → 0.3.1

TypeScript? Install with npm install @blackbox-ai/sdk — same API, same chain guarantees.

Configure the client

Point the SDK at your BLACKBOX instance. In production, use an environment variable.

agent.py

import json
import urllib.request

import blackbox_sdk as bb

# Point to your BLACKBOX instance
bb.configure(api_url="https://blackbox-gold.vercel.app")

# Verify connectivity
resp = urllib.request.urlopen("https://blackbox-gold.vercel.app/healthz")
print(json.loads(resp.read()))  # → {"status": "ok"}

Production tip: Set BLACKBOX_API_URL as an env var. The SDK reads it automatically — no hardcoded URLs in code.
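
If you prefer to resolve the URL yourself before calling bb.configure(), the pattern might look like the sketch below. The helper name resolve_api_url and the use of the tutorial instance as a fallback are assumptions made for illustration; they are not part of the SDK.

```python
import os

# Assumption: fall back to the tutorial instance when the variable is unset.
DEFAULT_API_URL = "https://blackbox-gold.vercel.app"

def resolve_api_url(env=None):
    """Return BLACKBOX_API_URL if it is set and non-empty, else the default."""
    env = os.environ if env is None else env
    url = env.get("BLACKBOX_API_URL", "").strip()
    # Drop a trailing slash so paths can be appended consistently.
    return (url or DEFAULT_API_URL).rstrip("/")

print(resolve_api_url({"BLACKBOX_API_URL": "https://bb.internal.example/"}))
# → https://bb.internal.example
```

Passing the environment as an argument keeps the helper easy to test without touching the real process environment.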

Record your first run

A run is one complete agent session — from genesis (the system prompt and config) to seal (the final tamper-evident hash). Use the context manager: it seals automatically on exit and records a fault event if an unhandled exception occurs.

agent.py

import blackbox_sdk as bb

bb.configure(api_url="https://blackbox-gold.vercel.app")

SYSTEM_PROMPT = """You are a credit risk assessment agent. Evaluate loan applications."""

with bb.run(
    "credit_risk_001",
    model="claude-sonnet-4-6",
    system_prompt=SYSTEM_PROMPT,
    tools=["data_lookup", "risk_calculator"],
    sampling={"temperature": 0.1, "max_tokens": 2048},
) as run:
    # Your agent logic goes here
    pass
# ↑ Sealed automatically. Chain hash computed and stored.

print("Run sealed: credit_risk_001")
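
To make the "seals on exit, records a fault on exception" behaviour concrete, here is a toy in-memory context manager. It is only a sketch: the class name RecordedRun, the event shapes, and the choice to re-raise the exception are assumptions, not the SDK's actual implementation.

```python
class RecordedRun:
    """Toy in-memory stand-in for a run context manager (hypothetical)."""

    def __init__(self, run_id):
        self.run_id = run_id
        self.events = [{"type": "genesis", "run_id": run_id}]

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Record a fault event first if the block raised.
        if exc_type is not None:
            self.events.append({"type": "fault", "error": exc_type.__name__})
        # Always seal, whether or not the block succeeded.
        self.events.append({"type": "seal"})
        return False  # assumption: do not swallow the agent's exception

run = RecordedRun("demo")
try:
    with run:
        raise ValueError("agent crashed")
except ValueError:
    pass

print([e["type"] for e in run.events])  # → ['genesis', 'fault', 'seal']
```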

What just happened?

BLACKBOX created a genesis event — the immutable record of your agent's starting configuration. The system prompt, tool manifest, model, and sampling parameters are all hashed and sealed. This is the root of your chain:

genesis hash (example)

hash_0 = sha256("GENESIS" + canonical_json({
  "run_id": "credit_risk_001",
  "type": "genesis",
  "system_prompt": "You are a credit risk assessment agent. Evaluate loan applications.",
  ...
}))
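
The pseudocode above can be turned into a runnable sketch. It assumes canonical_json means sorted keys with compact separators (the form used by the verification script later in this tutorial) and it only hashes the three fields shown, so the result will not match a hash stored by a real BLACKBOX instance, whose genesis event contains more fields.

```python
import hashlib
import json

def canonical_json(obj):
    # Assumed canonical form: sorted keys, no insignificant whitespace.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))

def genesis_hash(event):
    """sha256 over the literal prefix "GENESIS" plus the canonical event JSON."""
    payload = "GENESIS" + canonical_json(event)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Only the fields visible in the example; the elided ones are left out.
genesis = {
    "run_id": "credit_risk_001",
    "type": "genesis",
    "system_prompt": "You are a credit risk assessment agent. Evaluate loan applications.",
}

h0 = genesis_hash(genesis)
print(len(h0))  # → 64 (hex characters of a SHA-256 digest)
```

Because keys are sorted before hashing, the order in which the dictionary was built does not affect the hash, while any change to a value does.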

Add reasoning, tool calls, and output

Real agent runs have multiple steps. Each one becomes a chained event — every hash includes the previous hash, so the entire sequence is tamper-evident.

agent.py — full run

import blackbox_sdk as bb

bb.configure(api_url="https://blackbox-gold.vercel.app")

with bb.run(
    "credit_risk_001_full",
    model="claude-sonnet-4-6",
    system_prompt="You are a credit risk assessment agent. Evaluate loan applications.",
) as run:

    # Step 1: reasoning — record what the agent is thinking
    run.reasoning(
        "Analysing input data to identify risk factors",
        latency_ms=340,
    )

    # Step 2: tool call — record what tools were used and the results
    run.tool_call(
        "data_lookup",
        inputs={"entity_id": "LOAN-9182", "fields": ["credit_score", "income"]},
        result={"credit_score": 710, "income": 85000, "debt_ratio": 0.32},
        latency_ms=820,
    )

    # Step 3: another reasoning step
    run.reasoning(
        "Credit score 710 with 32% debt ratio — borderline. Requesting additional verification.",
        latency_ms=290,
    )

    # Step 4: output — the agent's final response
    run.output(
        "Application LOAN-9182: CONDITIONAL APPROVAL. "
        "Credit score meets minimum threshold. Debt ratio requires verification.",
        tokens_in=1240,
        tokens_out=420,
        cost_usd=0.0089,
        latency_ms=2100,
    )

# Each event's hash = sha256(prev_hash + canonical_json(event))
# Alter any event → chain breaks from that point forward

Your chain so far

GENESIS

h0 = sha256("GENESIS" + …)

→

REASONING

h1 = sha256(h0 + …)

→

TOOL_CALL

h2 = sha256(h1 + …)

→

REASONING

h3 = sha256(h2 + …)

→

OUTPUT

h4 = sha256(h3 + …)

→

SEAL

h5 = sha256(h4 + …)
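
The chain above can be reproduced locally to see why tampering is detectable. This is a minimal sketch that assumes the sha256(prev_hash + canonical_json(event)) rule stated earlier and uses simplified placeholder events; build_chain and first_break are hypothetical helper names.

```python
import hashlib
import json

def canonical(obj):
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))

def build_chain(events):
    """Return the hashes h0..hn, each covering the previous hash and the event."""
    prev, hashes = "GENESIS", []
    for event in events:
        prev = hashlib.sha256((prev + canonical(event)).encode("utf-8")).hexdigest()
        hashes.append(prev)
    return hashes

def first_break(events, stored_hashes):
    """Index of the first event whose recomputed hash differs, or None."""
    for i, (got, want) in enumerate(zip(build_chain(events), stored_hashes)):
        if got != want:
            return i
    return None

types = ["genesis", "reasoning", "tool_call", "reasoning", "output", "seal"]
events = [{"sequence": i, "type": t} for i, t in enumerate(types)]
stored = build_chain(events)

events[2]["type"] = "tool_call_edited"  # tamper with the third event
print(first_break(events, stored))      # → 2
```

Every hash from the tampered event onward changes, because each one includes the hash before it.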

Auto-instrumentation — zero code changes

If you're using OpenAI or Anthropic directly, bb.instrument() patches the client at startup. Every client.messages.create() call is automatically recorded as a BLACKBOX run. No context managers, no changes to existing agent code.

agent.py — auto-instrument

import anthropic
import blackbox_sdk as bb

bb.configure(api_url="https://blackbox-gold.vercel.app")
bb.instrument()  # ← one line at startup. That's it.

# Now every Anthropic or OpenAI call is auto-recorded:
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a credit risk assessment agent. Evaluate loan applications.",
    messages=[{"role": "user", "content": "Evaluate LOAN-9182"}],
)
# ↑ Automatically sealed in BLACKBOX. View it at /dashboard.

How it works: bb.instrument() monkey-patches anthropic.resources.messages.Messages.create and openai.resources.chat.completions.Completions.create. Each call gets its own run ID, genesis event, and seal. Recording is fail-open: if BLACKBOX is unreachable, your agent continues uninterrupted.
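
The general technique, monkey-patching a method with a fail-open wrapper, can be sketched without any provider SDK installed. The Messages class below is a stand-in, not the real anthropic class, and the instrument helper is a hypothetical illustration rather than the code inside bb.instrument().

```python
import functools

class Messages:
    """Stand-in for a provider client class (not the real anthropic class)."""

    def create(self, **kwargs):
        return {"echo": kwargs.get("model")}

def instrument(cls, method_name, record):
    """Replace cls.method_name with a wrapper that records each call, failing open."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        result = original(self, *args, **kwargs)
        try:
            record(kwargs, result)
        except Exception:
            pass  # fail-open: a recorder outage must not break the agent
        return result

    setattr(cls, method_name, wrapper)
    return original  # returned so the caller can restore it later

def unreachable_recorder(request, response):
    raise ConnectionError("recorder unreachable")

instrument(Messages, "create", unreachable_recorder)
print(Messages().create(model="demo-model"))  # → {'echo': 'demo-model'}
```

The call still returns its normal result even though the recorder raised, which is the property "fail-open" refers to.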

Verify the hash chain

Chain verification re-derives every hash from scratch and compares it to the stored value. One altered byte — anywhere in the chain — produces a mismatch. No stored booleans to falsify.

verify programmatically

import urllib.request, json

run_id = "credit_risk_001_full"
url = f"https://blackbox-gold.vercel.app/v1/audit/{run_id}"

cert = json.loads(urllib.request.urlopen(url).read())

print(cert["chain_valid"])      # True — every hash checks out
print(cert["event_count"])      # 6 events
print(cert["root_hash"])        # sha256 of the final seal
print(cert["merkle_proof"])     # sibling-path inclusion proof

# Share this URL with any auditor — no auth required:
print(url)
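
For readers unfamiliar with sibling-path inclusion proofs, the sketch below shows the general idea on a small tree. The proof encoding (a list of sibling hash and side pairs), the duplication of the last node on odd levels, and the leaf hashing are all assumptions; the format of the merkle_proof field returned by BLACKBOX may differ.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root_and_proof(leaves, index):
    """Return (root, proof) for leaves[index].

    The proof lists (sibling_hash, sibling_is_left) pairs from bottom to top.
    """
    level, proof = list(leaves), []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # assumption: duplicate last node on odd levels
        sibling = index ^ 1          # the other node in the same pair
        proof.append((level[sibling], sibling < index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof

def verify_inclusion(leaf, proof, root):
    """Recompute the path from leaf to root using only the sibling hashes."""
    node = leaf
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

leaves = [h(f"run-{i}".encode()) for i in range(8)]
root, proof = merkle_root_and_proof(leaves, 5)
print(len(proof), verify_inclusion(leaves[5], proof, root))  # → 3 True
```

With 8 leaves the proof holds 3 hashes, which is the logarithmic proof size that makes anchors cheap to verify.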

verify from scratch — stdlib only

import hashlib, json, urllib.request

def canonical(obj):
    # Canonical JSON: sorted keys, no insignificant whitespace
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))

def verify_chain(events):
    prev = "GENESIS"
    for e in events:
        # Hash everything except the stored hash, without mutating the event
        body = {k: v for k, v in e.items() if k != "hash"}
        computed = hashlib.sha256((prev + canonical(body)).encode()).hexdigest()
        if computed != e["hash"]:
            # Explicit raise: assert statements are skipped under python -O
            raise ValueError(f"CHAIN BREAK at seq {e['sequence']}")
        prev = e["hash"]
    return True

# Fetch events
events = json.loads(urllib.request.urlopen(
    "https://blackbox-gold.vercel.app/v1/runs/credit_risk_001_full"
).read())["events"]

print(verify_chain(events))  # True, or raises ValueError with the break location

The key insight

This verification requires only Python stdlib. No BLACKBOX infrastructure. An auditor with the raw JSON and 15 lines of Python can independently verify the chain without trusting BLACKBOX, your infrastructure, or anyone else.

View your run in the dashboard

Every recorded run appears in the dashboard immediately. The run detail page shows the full event timeline, chain visualization, cost breakdown, and links to the public audit certificate.

▣

Dashboard

All runs · fault rates · cost · latency

⛓

Chain Integrity Panorama

Every run · computed chain validity · break detection

📊

Leaderboard

Model quality scores · fault rates · p99 latency

⏱

Timeline

Chronological event view across all runs

Set up fault alerts

BLACKBOX detects hallucinations, policy violations, and prompt injection attempts in real time. Configure webhooks to alert your team instantly via Slack, PagerDuty, or any HTTP endpoint.

curl — create a Slack alert rule

curl -X POST https://blackbox-gold.vercel.app/v1/alerts/rules \
  -H "Content-Type: application/json" \
  -d '{
    "name": "High-severity fault → Slack",
    "condition": "fault_rate > 0.1 OR severity = critical",
    "channel": "slack",
    "webhook_url": "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK",
    "enabled": true
  }'

Slack Block Kit: Alerts are automatically formatted with run ID, fault class, severity, and a direct link to the run. No custom formatting needed.
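
If you would rather create the rule from Python than from curl, a stdlib-only sketch might look like this. It mirrors the endpoint and JSON fields from the curl example; the helper name build_alert_rule_request is hypothetical, and the request is only built here, not sent.

```python
import json
import urllib.request

def build_alert_rule_request(api_url, webhook_url):
    """Build (but do not send) the POST request from the curl example."""
    rule = {
        "name": "High-severity fault → Slack",
        "condition": "fault_rate > 0.1 OR severity = critical",
        "channel": "slack",
        "webhook_url": webhook_url,
        "enabled": True,
    }
    return urllib.request.Request(
        f"{api_url.rstrip('/')}/v1/alerts/rules",
        data=json.dumps(rule).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_alert_rule_request(
    "https://blackbox-gold.vercel.app",
    "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK",
)
print(req.get_method(), req.full_url)
# → POST https://blackbox-gold.vercel.app/v1/alerts/rules
# To send it: urllib.request.urlopen(req, timeout=10)
```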

→ Configure alerts in the UI

Generate a compliance report

Compliance reports are date-filtered PDFs suitable for risk committees, regulators, and auditors. They include fault rates, model breakdown, chain integrity status, cost attribution, and Merkle anchor hashes. The /v1/report endpoint used in the example returns the report fields as JSON.

curl — generate report

curl "https://blackbox-gold.vercel.app/v1/report?from_date=2026-01-01&to_date=2026-12-31" \
  | python3 -m json.tool

# Key fields in the response:
# total_runs, total_events, total_faults, fault_rate
# chain_integrity: "all_valid" or violation count
# by_model: per-model fault rates, quality scores, costs
# fault_patterns: most common fault classes
# merkle_anchors: daily root hashes for external verification
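
When the date range comes from code rather than a hand-typed URL, it helps to validate it before calling the endpoint. The sketch below builds the same query string as the curl example; report_url is a hypothetical helper, and the assumption is that the endpoint expects ISO dates (YYYY-MM-DD) as shown above.

```python
from datetime import date
from urllib.parse import urlencode

def report_url(api_url, from_date, to_date):
    """Build the /v1/report URL after checking both dates parse and are ordered."""
    start = date.fromisoformat(from_date)  # raises ValueError on malformed input
    end = date.fromisoformat(to_date)
    if start > end:
        raise ValueError("from_date must not be after to_date")
    query = urlencode({"from_date": start.isoformat(), "to_date": end.isoformat()})
    return f"{api_url.rstrip('/')}/v1/report?{query}"

print(report_url("https://blackbox-gold.vercel.app", "2026-01-01", "2026-12-31"))
# → https://blackbox-gold.vercel.app/v1/report?from_date=2026-01-01&to_date=2026-12-31
```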

→ Generate a report in the UI

Production checklist

Before going live, run through this checklist:

□

Create a scoped API key

Use role=agent (write-only) for your agents. Never use an admin key in production agent code.

→ Admin → API Keys

□

Set BLACKBOX_API_URL env var

Never hardcode the API URL. Use environment variables so you can swap endpoints without code changes.

□

Enable fault alerts

At minimum, alert on severity=critical. Configure a Slack or PagerDuty webhook.

→ Configure alerts

□

Schedule daily anchors

Run POST /v1/anchor/{date} nightly (cron). Publish the root hash externally for independent verification.

□

Set a retention policy

Configure data lifecycle rules — especially crypto-shred for GDPR right-to-erasure.

→ Retention policies

□

Test chain verification

Run the verification script in your CI pipeline so you are alerted if a deployment accidentally breaks chain integrity.

□

Configure legal hold

For regulated runs, enable legal hold to prevent any retention policy from touching them.

→ Admin → Governance
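
For the "Schedule daily anchors" item above, a nightly job could build its request along these lines. This is a sketch under assumptions: the {date} path segment is taken to be an ISO date (YYYY-MM-DD), the job anchors the previous UTC day, and the helper names are hypothetical. Authentication headers, if your instance requires them, are not shown.

```python
from datetime import date, datetime, timedelta, timezone
import urllib.request

def yesterday_utc(now=None):
    """Return the previous calendar day in UTC."""
    now = now or datetime.now(timezone.utc)
    return (now - timedelta(days=1)).date()

def anchor_request(api_url, day):
    """Build (but do not send) POST /v1/anchor/{date} for the given day."""
    url = f"{api_url.rstrip('/')}/v1/anchor/{day.isoformat()}"
    return urllib.request.Request(url, method="POST")

req = anchor_request("https://blackbox-gold.vercel.app", date(2026, 1, 1))
print(req.get_method(), req.full_url)
# → POST https://blackbox-gold.vercel.app/v1/anchor/2026-01-01
# In the cron job: urllib.request.urlopen(anchor_request(url, yesterday_utc()), timeout=30)
```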

You're done. Here's what's next.

Champion vs. Challenger

Re-run a recorded run against a different model and compare outputs side-by-side with a word diff.

Model Upgrade Regression

Before upgrading GPT-4 → GPT-4o or Haiku → Sonnet, test all your recorded runs and get a PROCEED / HOLD verdict.

Prompt Registry

Version-control your system prompts. Certify them on-chain. Compare versions with a line diff.

SOC 2 Narrative

Pre-written TSC control narratives for CC6–CC9 mapped to BLACKBOX's technical controls.

API Reference

Full endpoint documentation with request/response schemas, auth patterns, and error codes.

Talk to us

Deploying at scale or need enterprise features? We'll get you set up. hello@promptblackbox.com