Trace Artifacts

Every run writes a JSON artifact to the .agent_bench/runs/ directory.

Top-Level Fields

  • task_id (string)
  • version (int)
  • seed (int)
  • success (bool)
  • termination_reason (string)
  • failure_reason (string | null)
  • failure_type (string | null)
  • steps_used (int)
  • tool_calls_used (int)
  • metrics (object)
  • action_trace (array)
  • run_id (string, UUID hex)
  • trace_id (string, UUID hex)
  • agent (string, path)
  • task_ref (string, <id>@<version>)
  • started_at (string, ISO 8601)
  • completed_at (string, ISO 8601)
  • harness_version (string)
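
As a rough illustration of how a consumer might sanity-check these fields, the sketch below loads an artifact and compares each top-level key against the type listed above. The names EXPECTED_FIELDS, check_artifact, and load_artifact are hypothetical helpers, not part of the harness, and the sketch assumes every listed field is present in every artifact, which the list above does not explicitly guarantee.

```python
import json
from pathlib import Path

# Field names and types mirror the list above. Treating all of them as
# required is an assumption made for this sketch.
EXPECTED_FIELDS = {
    "task_id": str,
    "version": int,
    "seed": int,
    "success": bool,
    "termination_reason": str,
    "failure_reason": (str, type(None)),
    "failure_type": (str, type(None)),
    "steps_used": int,
    "tool_calls_used": int,
    "metrics": dict,
    "action_trace": list,
    "run_id": str,
    "trace_id": str,
    "agent": str,
    "task_ref": str,
    "started_at": str,
    "completed_at": str,
    "harness_version": str,
}


def load_artifact(path):
    """Read one run artifact from disk and return it as a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def check_artifact(artifact):
    """Return a list of problems; an empty list means the envelope looks valid."""
    problems = []
    for name, expected in EXPECTED_FIELDS.items():
        if name not in artifact:
            problems.append(f"missing field: {name}")
            continue
        value = artifact[name]
        # bool is a subclass of int in Python, so reject it for int fields.
        if expected is int and isinstance(value, bool):
            problems.append(f"wrong type for {name}: bool")
        elif not isinstance(value, expected):
            problems.append(f"wrong type for {name}: {type(value).__name__}")
    return problems
```

Unknown extra keys are deliberately not reported, so the check keeps working when new fields are added.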

Trace Entry Schema

Each element of action_trace is an object shaped like the following example (trace_entry.json):
{
  "step": 1,
  "observation": {
    "step": 1,
    "task": { "id": "filesystem_hidden_config", "description": "..." },
    "last_action": null,
    "last_action_result": null,
    "visible_state": {},
    "budget_remaining": { "steps": 199, "tool_calls": 39 }
  },
  "action": { "type": "list_dir", "args": { "path": "." } },
  "result": { "ok": true, "files": ["config", "readme.txt"] },
  "budget_after_step": { "steps": 199, "tool_calls": 39 }
}
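
One way to consume entries like the one above is to walk the list and rely only on the envelope keys (step, action, result, budget_after_step). The sketch below is illustrative: summarize_trace is a hypothetical helper, and the "type" and "ok" keys are taken from the sample entry, so they are treated as optional because payloads may differ per task.

```python
from collections import Counter


def summarize_trace(action_trace):
    """Summarize a list of trace entries using the standardized envelope keys."""
    action_counts = Counter()
    failed_steps = []
    for entry in action_trace:
        action = entry["action"]
        result = entry["result"]
        # "type" and "ok" appear in the sample entry, but payloads are
        # task-defined, so fall back gracefully when they are absent.
        action_counts[action.get("type", "<unknown>")] += 1
        if result.get("ok") is False:
            failed_steps.append(entry["step"])
    # The last entry carries the budget left when the run stopped.
    final_budget = action_trace[-1]["budget_after_step"] if action_trace else None
    return {
        "entries": len(action_trace),
        "action_counts": dict(action_counts),
        "failed_steps": failed_steps,
        "final_budget": final_budget,
    }
```

Passing the action_trace array of a loaded artifact returns a small dict that is convenient for quick comparisons between runs.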

Compatibility Notes

  • Additive fields are allowed. Removals or renames require a version bump and a changelog entry.
  • Consumers should ignore unknown keys to remain forward compatible.
  • The metrics object is reserved for derived values; do not store raw run data in it.
  • Action/result payloads are task-defined; only the surrounding envelope is standardized.
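
The advice to ignore unknown keys can be sketched as follows: a consumer picks out only the fields it knows about and silently drops the rest, so an artifact with newly added fields still parses. RunSummary and parse_summary are hypothetical names chosen for this example, and the selection of fields is arbitrary.

```python
from dataclasses import dataclass, fields


@dataclass
class RunSummary:
    # A hypothetical consumer-side view holding a subset of top-level fields.
    task_id: str
    version: int
    success: bool
    steps_used: int
    tool_calls_used: int


def parse_summary(artifact):
    """Build a RunSummary from an artifact dict, ignoring keys it does not know."""
    known = {f.name for f in fields(RunSummary)}
    return RunSummary(**{k: v for k, v in artifact.items() if k in known})
```

A missing known field still raises a TypeError from the dataclass constructor, which surfaces removals or renames instead of hiding them.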