News
v0.9.0 · February 22, 2026

Validator Snapshots & Sandbox Audit

TraceCore now treats validator verdicts and sandbox declarations as first-class audit data. v0.9.0 ships normalized validator snapshots, manifest-enforced sandbox allowlists, and IO audits that flow into run artifacts, trust bundles, and the documentation.


Why this release matters

Deterministic contracts are only trustworthy when the final verdict and access surface are provable. Prior releases captured the action trace but left validator payloads and sandbox declarations implicit. v0.9.0 closes that gap so trace diffs, bundle replays, and trust bundles all share a common source of truth.

For incident auditors and CI pipelines, this means two things: when a validator halts a run, you know exactly why; and when a task promises "only /app" filesystem access, you can prove it never strayed.

What shipped

Validator snapshots in every artifact

agent_bench/runner/runner.py now normalizes validator payloads before emitting the final result. Unknown failure_type strings fall back to taxonomy classifications, failure_reason inherits from message/error, and the entire snapshot is stored under the top-level validator key.

Baseline bundles mirror the same snapshot (validator.json), so replay/strict can diff validator verdicts in addition to trace steps.
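A strict comparison of two validator snapshots could be sketched roughly as follows. `diff_validator` is a hypothetical helper, not TraceCore's replay code; it assumes only that each snapshot is a flat JSON object like the one stored in validator.json.

```python
from typing import Any, Mapping


def diff_validator(
    baseline: Mapping[str, Any], current: Mapping[str, Any]
) -> dict[str, tuple[Any, Any]]:
    """Return {field: (baseline_value, current_value)} for every differing field.

    Hypothetical helper: a field missing on one side is reported as None.
    """
    keys = sorted(set(baseline) | set(current))
    return {
        key: (baseline.get(key), current.get(key))
        for key in keys
        if baseline.get(key) != current.get(key)
    }


baseline = {"ok": True, "terminal": True}
current = {"ok": False, "terminal": True, "failure_type": "logic_failure"}
changes = diff_validator(baseline, current)
```

An empty result means the two verdicts agree field for field; any non-empty result is something a strict replay would presumably flag.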

validator snapshot excerpt
"validator": {
  "ok": false,
  "terminal": true,
  "message": "bad_output",
  "failure_reason": "bad_output",
  "failure_type": "logic_failure",
  "termination_reason": "logic_failure"
}
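The normalization rules described above can be illustrated with a minimal sketch. This is not the code in runner.py: the `KNOWN_FAILURE_TYPES` set, the single `FALLBACK_FAILURE_TYPE`, and the defaults for `terminal` and `termination_reason` are assumptions made for illustration.

```python
from typing import Any, Mapping

# Hypothetical taxonomy; the real TraceCore failure types may differ.
KNOWN_FAILURE_TYPES = {"logic_failure", "timeout", "sandbox_violation"}
# Assumed fallback for unrecognized failure_type strings.
FALLBACK_FAILURE_TYPE = "logic_failure"


def normalize_validator(payload: Mapping[str, Any]) -> dict[str, Any]:
    """Build a normalized validator snapshot from a raw validator payload."""
    ok = bool(payload.get("ok", False))
    # message falls back to error when the validator only reported an error.
    message = payload.get("message") or payload.get("error")
    snapshot: dict[str, Any] = {
        "ok": ok,
        # Assumption: a failed verdict is terminal unless stated otherwise.
        "terminal": bool(payload.get("terminal", not ok)),
        "message": message,
    }
    if not ok:
        failure_type = payload.get("failure_type")
        if failure_type not in KNOWN_FAILURE_TYPES:
            failure_type = FALLBACK_FAILURE_TYPE
        # failure_reason inherits from message/error when it is absent.
        snapshot["failure_reason"] = payload.get("failure_reason") or message
        snapshot["failure_type"] = failure_type
        # Assumption: termination_reason defaults to the failure type.
        snapshot["termination_reason"] = (
            payload.get("termination_reason") or failure_type
        )
    return snapshot


snapshot = normalize_validator(
    {"ok": False, "message": "bad_output", "failure_type": "something_odd"}
)
```

Under these assumptions, the example payload normalizes to the same fields and values shown in the excerpt.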

Manifest-enforced sandbox allowlists

Deterministic task manifests now require two declarations:

  • `filesystem_roots` — absolute path prefixes agents may traverse
  • `network_hosts` — literal or wildcard hostnames permitted for outbound calls

GuardedEnv enforces these allowlists at runtime and emits io_audit entries whenever the harness touches the filesystem or network. The allowlist is mirrored into the run artifact's sandbox field and the bundle manifest, letting trust bundles prove the episode never exceeded its declared surface.
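To make the allowlist semantics concrete, here is a minimal sketch of prefix and wildcard checks that record an audit entry per access. It is not GuardedEnv: the `SandboxAllowlist` class, its method names, and the shape of each `io_audit` entry are hypothetical, and only the `filesystem_roots` and `network_hosts` field names come from the release notes.

```python
import posixpath
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


@dataclass
class SandboxAllowlist:
    """Hypothetical stand-in for a manifest's sandbox declarations."""

    filesystem_roots: list[str]
    network_hosts: list[str]
    io_audit: list[dict] = field(default_factory=list)

    def check_path(self, path: str) -> bool:
        # Collapse ".." segments so "/app/../etc" cannot pass as "/app".
        # Symlinks are not resolved in this sketch.
        target = PurePosixPath(posixpath.normpath(path))
        allowed = any(
            target == PurePosixPath(root) or PurePosixPath(root) in target.parents
            for root in self.filesystem_roots
        )
        self.io_audit.append(
            {"kind": "filesystem", "target": path, "allowed": allowed}
        )
        return allowed

    def check_host(self, host: str) -> bool:
        # Hostnames are case-insensitive; patterns may be literal or wildcard.
        name = host.lower()
        allowed = any(
            fnmatchcase(name, pattern.lower()) for pattern in self.network_hosts
        )
        self.io_audit.append({"kind": "network", "target": host, "allowed": allowed})
        return allowed


box = SandboxAllowlist(
    filesystem_roots=["/app"],
    network_hosts=["*.example.com", "localhost"],
)
box.check_path("/app/data/out.json")   # inside a declared root
box.check_path("/app/../etc/passwd")   # escapes the root after normalization
box.check_host("api.example.com")      # matches the wildcard pattern
```

Comparing whole path components rather than raw string prefixes keeps a root of /app from also admitting /application, which is the main design choice in this sketch.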

Docs & tooling updates

  • `docs/trace_artifacts.md` now documents the `sandbox` and `validator` top-level fields plus IO audit lines in `tool_calls.jsonl`.
  • `docs/runner.md` explains how validator normalization works and how sandbox allowlists feed GuardedEnv.
  • `docs/record_mode.md` status banner flipped to v0.9.0 with explicit allowlist enforcement notes.
  • README installation guidance now covers `uv tool install --editable .` for local shims ahead of the PyPI release.

Highlights

  • Validator verdicts normalized + persisted in both run artifacts and bundles.
  • Mandatory sandbox allowlists with GuardedEnv enforcement and IO audit propagation.
  • Trust bundle v0.9.0 includes metadata + JSON artifacts for the fresh manual verification runs.
  • Full documentation sweep: runner, trace artifacts, record mode, README, and changelog.

What’s next

With validator verdicts and sandbox declarations now canonical, the next milestone is surfacing IO audit diffs in the Trace Viewer UI alongside bundle comparisons. Expect deeper ledger integrations and artifact signing so trust bundles feel as first-class as source tarballs.