BLACKBOX
flight recorder for AI agents

Upgrade Test

APPROXIMATE RE-EXECUTION
Test a new model version against your recorded run history. Each run's genesis context and retrieval set is re-submitted to the new model, the output is verified, and the verdict is compared against the original. Provider non-determinism means results are approximate — labeled accordingly.
QUICK START