BLACKBOX
flight recorder for AI agents

Champion vs Challenger

APPROXIMATE RE-EXECUTION
Pick a recorded run and a challenger model. BLACKBOX replays the run's original context against the challenger, verifies the output, and permanently records the challenger's run, so the comparison itself becomes part of the audit trail. Because model providers are non-deterministic, results are approximate.
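
To make the flow concrete, here is a minimal Python sketch of one way a champion vs challenger re-execution could work. It is an illustration under stated assumptions, not BLACKBOX's actual implementation: `RecordedRun`, `AuditTrail`, `re_execute`, the entry fields, and the hash chaining used to stand in for "permanent" recording are all hypothetical names and choices. The challenger model and the verifier are passed in as plain callables, since the source does not say how either is invoked.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical record of a past run; the real BLACKBOX schema is not shown.
@dataclass(frozen=True)
class RecordedRun:
    run_id: str
    model: str
    context: str  # the original context captured when the run was recorded
    output: str


# Hypothetical append-only log. Chaining each entry to the previous hash is
# an assumption used here to illustrate tamper-evident, permanent recording.
@dataclass
class AuditTrail:
    entries: List[Dict] = field(default_factory=list)

    def append(self, entry: Dict) -> Dict:
        prev = self.entries[-1]["hash"] if self.entries else ""
        payload = json.dumps(entry, sort_keys=True) + prev
        digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        stored = {**entry, "prev": prev, "hash": digest}
        self.entries.append(stored)
        return stored


def re_execute(
    champion: RecordedRun,
    challenger_name: str,
    challenger: Callable[[str], str],
    verify: Callable[[str], bool],
    trail: AuditTrail,
) -> Dict:
    """Replay the champion's context against the challenger and log the result."""
    output = challenger(champion.context)
    entry = {
        "source_run": champion.run_id,
        "champion_model": champion.model,
        "challenger_model": challenger_name,
        "output": output,
        "verified": verify(output),
        # Providers are non-deterministic, so an exact match is only a hint.
        "matches_champion": output == champion.output,
        "approximate": True,
    }
    return trail.append(entry)
```

A caller would build a `RecordedRun` from a stored run, pass a function that calls the challenger model, and read the returned entry. Because the entry is appended to the same trail that holds the original runs, the comparison is auditable in the same way as any other run.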