CI Workflow
Use the reusable GitHub Actions workflow to run a task and compare results against a baseline.
GitHub Actions
.github/workflows/tracecore-ci.yml
name: tracecore-ci
on:
  pull_request:
  workflow_dispatch:
jobs:
  tracecore-compare:
    uses: ./.github/workflows/baseline-compare.yml
    with:
      agent_path: agents/chain_agent.py
      task_ref: rate_limited_chain@1
      seed: "0"
      baseline: .agent_bench/baselines/rate_limited_chain_chain_agent.json
      require_success: "true"
      max_steps: "180"
      max_tool_calls: "60"
      max_step_delta: "10"
      max_tool_call_delta: "5"

Exit codes: 0 = identical, 1 = different, 2 = incompatible task/agent.
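If a wrapper script needs to act on these exit codes, the mapping can be written down explicitly. The following is a minimal sketch, assuming the codes listed above are the ones returned by the baseline comparison step; the names `CompareResult` and `describe_exit_code` are hypothetical and not part of the project.

```python
from enum import IntEnum


class CompareResult(IntEnum):
    """Exit codes as documented above (hypothetical wrapper, not project code)."""

    IDENTICAL = 0
    DIFFERENT = 1
    INCOMPATIBLE = 2


def describe_exit_code(code: int) -> str:
    """Return a short human-readable label for a comparison exit code."""
    try:
        result = CompareResult(code)
    except ValueError:
        # Anything outside 0-2 is not documented, so report it as-is.
        return f"unexpected exit code {code}"
    labels = {
        CompareResult.IDENTICAL: "identical to baseline",
        CompareResult.DIFFERENT: "differs from baseline",
        CompareResult.INCOMPATIBLE: "incompatible task/agent",
    }
    return labels[result]


if __name__ == "__main__":
    for code in (0, 1, 2, 3):
        print(code, describe_exit_code(code))
```

Treating undocumented codes as a separate case keeps a crash or an unrelated failure from being misread as a plain "different" result.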
GitLab CI
.gitlab-ci.yml
stages:
  - run
  - compare
  - gate

run_agent:
  stage: run
  script:
    - pip install -e .[dev]
    - agent-bench run --agent agents/chain_agent.py --task rate_limited_chain@1 --seed 0 > run.json
  artifacts:
    paths:
      - run.json
      - .agent_bench/runs/

compare_baseline:
  stage: compare
  needs: [run_agent]
  script:
    - pip install -e .[dev]
    - agent-bench baseline --compare .agent_bench/baselines/rate_limited_chain_chain_agent.json $(python -c "import json;print(json.load(open('run.json'))['run_id'])")

policy_gates:
  stage: gate
  # With needs, a job downloads artifacts only from the jobs it lists,
  # so run_agent must be listed here for run.json to be available.
  needs: [run_agent, compare_baseline]
  script:
    - pip install -e .[dev]
    - python scripts/policy_gate.py --run-json run.json --baseline .agent_bench/baselines/rate_limited_chain_chain_agent.json --max-steps 180 --max-step-delta 10

Reusable Workflow Parameters
- agent_path: Path to agent file
- task_ref: Task reference (e.g., filesystem_hidden_config@1)
- seed: Random seed for deterministic run
- baseline: Path to baseline JSON
- require_success: Whether baseline must show success
- max_steps / max_tool_calls: Budget thresholds
- max_step_delta / max_tool_call_delta: Allowed deviation from baseline
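
To make the difference between budget thresholds and baseline deltas concrete, here is a minimal sketch of how such gates could be evaluated. It is not the project's `scripts/policy_gate.py`. The JSON field names `steps`, `tool_calls`, and `success` are assumptions, as is treating a delta as an absolute difference from the baseline. The names `Limits` and `check_gates` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    """Gate settings, named after the workflow parameters listed above."""

    max_steps: int
    max_tool_calls: int
    max_step_delta: int
    max_tool_call_delta: int


def check_gates(run: dict, baseline: dict, limits: Limits,
                require_success: bool = True) -> list[str]:
    """Return a list of violations; an empty list means every gate passed."""
    violations: list[str] = []

    # Assumption: the baseline JSON carries a boolean "success" field.
    if require_success and not baseline.get("success", False):
        violations.append("baseline does not show success")

    # Budget thresholds: hard caps on the new run, independent of the baseline.
    # Assumption: counts are stored under "steps" and "tool_calls".
    if run["steps"] > limits.max_steps:
        violations.append(
            f"steps {run['steps']} exceeds max_steps {limits.max_steps}")
    if run["tool_calls"] > limits.max_tool_calls:
        violations.append(
            f"tool_calls {run['tool_calls']} exceeds max_tool_calls {limits.max_tool_calls}")

    # Deltas: allowed deviation from the baseline (assumed absolute difference).
    step_delta = abs(run["steps"] - baseline["steps"])
    if step_delta > limits.max_step_delta:
        violations.append(
            f"steps deviates from baseline by {step_delta} (allowed {limits.max_step_delta})")
    call_delta = abs(run["tool_calls"] - baseline["tool_calls"])
    if call_delta > limits.max_tool_call_delta:
        violations.append(
            f"tool_calls deviates from baseline by {call_delta} (allowed {limits.max_tool_call_delta})")

    return violations


if __name__ == "__main__":
    # Limits taken from the GitHub Actions example; run and baseline numbers are made up.
    limits = Limits(max_steps=180, max_tool_calls=60,
                    max_step_delta=10, max_tool_call_delta=5)
    run = {"steps": 120, "tool_calls": 48}
    baseline = {"steps": 112, "tool_calls": 40, "success": True}
    for violation in check_gates(run, baseline, limits):
        print(violation)
    # prints: tool_calls deviates from baseline by 8 (allowed 5)
```

In this made-up example the run stays well inside both budgets (120 of 180 steps, 48 of 60 tool calls) but still fails, because its tool-call count drifts 8 away from the baseline while only 5 is allowed. The two kinds of limit catch different problems: budgets bound absolute cost, deltas flag behavioral drift.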