Trace Artifacts
Every run produces a JSON artifact in .agent_bench/runs/.
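As a rough illustration of consuming these artifacts, the sketch below walks the runs directory and parses each file. The helper name `iter_artifacts` is hypothetical, and the `*.json` filename pattern is an assumption; this document does not specify how artifact files are named.

```python
import json
from pathlib import Path
from typing import Any, Iterator


def iter_artifacts(runs_dir: str = ".agent_bench/runs") -> Iterator[tuple[Path, dict[str, Any]]]:
    """Yield (path, parsed artifact) for each JSON file in the runs directory."""
    # Assumption: artifacts are stored as *.json files directly in this directory.
    for path in sorted(Path(runs_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            yield path, json.load(handle)
```

A caller could then filter on any top-level field, for example collecting the paths of runs where `success` is false.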
Top-Level Fields
- task_id (string)
- version (int)
- seed (int)
- success (bool)
- termination_reason (string)
- failure_reason (string | null)
- failure_type (string | null)
- steps_used (int)
- tool_calls_used (int)
- metrics (object)
- action_trace (array)
- run_id (string, UUID hex)
- trace_id (string, UUID hex)
- agent (string, path)
- task_ref (string, <id>@<version>)
- started_at (string, ISO 8601)
- completed_at (string, ISO 8601)
- harness_version (string)
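The field list above can be turned into a lightweight type check. The following is a minimal sketch, not part of the harness: `TOP_LEVEL_FIELDS` and `check_top_level` are hypothetical names, and treating every listed field as required is an assumption. Unknown keys are deliberately not reported.

```python
from typing import Any

# Expected top-level fields and their Python types, transcribed from the list above.
TOP_LEVEL_FIELDS = {
    "task_id": str,
    "version": int,
    "seed": int,
    "success": bool,
    "termination_reason": str,
    "failure_reason": (str, type(None)),
    "failure_type": (str, type(None)),
    "steps_used": int,
    "tool_calls_used": int,
    "metrics": dict,
    "action_trace": list,
    "run_id": str,
    "trace_id": str,
    "agent": str,
    "task_ref": str,
    "started_at": str,
    "completed_at": str,
    "harness_version": str,
}


def check_top_level(artifact: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the envelope looks fine."""
    problems = []
    for name, expected in TOP_LEVEL_FIELDS.items():
        # Assumption: every listed field is required.
        if name not in artifact:
            problems.append(f"missing field: {name}")
            continue
        value = artifact[name]
        type_ok = isinstance(value, expected)
        # bool is a subclass of int in Python, so reject booleans for int fields.
        if expected is int and isinstance(value, bool):
            type_ok = False
        if not type_ok:
            problems.append(f"{name}: unexpected type {type(value).__name__}")
    return problems
```

The check covers only JSON types. Format constraints such as UUID hex, ISO 8601 timestamps, or the `<id>@<version>` shape of `task_ref` would need separate validation.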
Trace Entry Schema
trace_entry.json
{
  "step": 1,
  "observation": {
    "step": 1,
    "task": { "id": "filesystem_hidden_config", "description": "..." },
    "last_action": null,
    "last_action_result": null,
    "visible_state": {},
    "budget_remaining": { "steps": 199, "tool_calls": 39 }
  },
  "action": { "type": "list_dir", "args": { "path": "." } },
  "result": { "ok": true, "files": ["config", "readme.txt"] },
  "budget_after_step": { "steps": 199, "tool_calls": 39 }
}

Compatibility Notes
- Additive fields are allowed. Removals or renames require a version bump and changelog entry.
- Consumers should ignore unknown keys to remain forward compatible.
- metrics is reserved for derived values.
- Action/result payloads are task-defined; only the surrounding envelope is standardized.
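One way a consumer might follow these notes is sketched below. It reads only the envelope keys of a trace entry, drops anything it does not recognize, and keeps the action and result payloads as opaque dictionaries. `TraceEntry`, `ENVELOPE_KEYS`, and `parse_entry` are hypothetical names, not part of the harness.

```python
from dataclasses import dataclass
from typing import Any

# Envelope keys taken from the trace entry example in this document.
ENVELOPE_KEYS = ("step", "observation", "action", "result", "budget_after_step")


@dataclass(frozen=True)
class TraceEntry:
    step: int
    observation: dict[str, Any]
    action: dict[str, Any]   # task-defined payload, kept opaque
    result: dict[str, Any]   # task-defined payload, kept opaque
    budget_after_step: dict[str, int]


def parse_entry(raw: dict[str, Any]) -> TraceEntry:
    """Build a TraceEntry from the envelope keys, ignoring any unknown keys."""
    # A missing envelope key raises KeyError, which surfaces a breaking change early.
    return TraceEntry(**{key: raw[key] for key in ENVELOPE_KEYS})
```

Because unknown keys are dropped rather than rejected, an entry that later gains an additive field would still parse with this reader, while a removed or renamed envelope key fails loudly.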