Deterministic Test Runner for Agent Loops
TraceCore v1.0 freezes the Deterministic Episode Runtime spec, ships a first-class `tracecore` CLI, enforces wall-clock telemetry, and adds batch execution plus live metrics for CI confidence.
Artifacts declare `spec_version`, `runtime_identity`, and required fields such as `wall_clock_elapsed_s`.
`tracecore run batch --workers N` launches isolated subprocesses with enforced timeouts.
CLI, REST, and dashboard metrics surface success rates, budgets, and recovery time.
    $ python -m venv .venv && .venv\Scripts\activate
    $ pip install tracecore
    $ tracecore version
    runtime: 1.0.0 spec: tracecore-spec-v1.0
    $ tracecore run batch --workers 4 --strict-spec
    Batch summary: passed=4 failed=0 p95_wall=4.2s
    $ tracecore runs metrics --format table
    task                    success rate
    log_stream_monitor@1    1.00
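The success rates and P95 wall-clock figures reported by the metrics commands are ordinary aggregates over per-run records. The Python sketch below computes both. It assumes a hypothetical `RunRecord` shape (only `wall_clock_elapsed_s` comes from the spec) and uses the nearest-rank definition of P95; TraceCore's own percentile method is not described here and may differ. The timings are made up.

```python
import math
from dataclasses import dataclass


# Hypothetical per-run record; only wall_clock_elapsed_s is named by the v1 spec.
@dataclass(frozen=True)
class RunRecord:
    task: str
    success: bool
    wall_clock_elapsed_s: float


def success_rate(runs: list[RunRecord]) -> float:
    """Fraction of runs that succeeded (0.0 for an empty list)."""
    if not runs:
        return 0.0
    return sum(r.success for r in runs) / len(runs)


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile: the smallest sample with >= 95% of samples at or below it."""
    if not values:
        raise ValueError("p95 of an empty list is undefined")
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]


# Made-up runs for illustration.
runs = [
    RunRecord("log_stream_monitor@1", True, 3.5),
    RunRecord("log_stream_monitor@1", True, 4.1),
    RunRecord("log_stream_monitor@1", False, 3.8),
    RunRecord("log_stream_monitor@1", True, 4.4),
]
print(success_rate(runs))                           # 0.75
print(p95([r.wall_clock_elapsed_s for r in runs]))  # 4.4
```

With only a handful of runs, nearest-rank P95 is simply the slowest run, which is why tail metrics become meaningful only as batches grow.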
// capabilities
Built for traceable reliability
TraceCore v1.0: deterministic episodes, spec contracts, and CI-grade telemetry.
Spec-locked artifacts
tracecore-spec v1.0 enforces wall_clock_elapsed_s, runtime_identity, and task hashes in every artifact.
Batch isolation
tracecore run batch spawns clean subprocess workers with per-job wall-clock timeouts and summaries.
Live metrics
CLI + REST metrics expose reproducibility rates, budget P95, and MTTR for every agent/task pair.
Immutable registry
Frozen task hashes, SPEC_FREEZE docs, and trust bundles make CI gating and audits boring-but-provable.
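A spec-locked artifact pays off only when CI rejects artifacts that drift from the spec or from the frozen task registry. The Python sketch below checks for the required fields named above and compares a task hash against a frozen registry entry. The flat JSON layout, the `task_hash` key, the canonical-JSON SHA-256 hashing scheme, and the `validate_artifact` helper are all assumptions for illustration, not TraceCore's published artifact format.

```python
import hashlib
import json

# spec_version, runtime_identity, and wall_clock_elapsed_s are named on this page;
# "task_hash" is a hypothetical key for the task hash the spec requires.
REQUIRED_FIELDS = ("spec_version", "runtime_identity", "wall_clock_elapsed_s", "task_hash")


def task_hash(task_definition: dict) -> str:
    """SHA-256 of canonical JSON (sorted keys, compact separators), so key order is irrelevant."""
    canonical = json.dumps(task_definition, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def validate_artifact(artifact: dict, registry: dict, task_id: str) -> list:
    """Return a list of problems; an empty list means the artifact passes."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in artifact]
    expected = registry.get(task_id)
    if expected is None:
        problems.append(f"unknown task: {task_id}")
    elif artifact.get("task_hash") != expected:
        problems.append(f"task hash mismatch: {task_id}")
    return problems


# Hypothetical task definition and its frozen registry entry.
task_def = {"name": "filesystem_hidden_config", "version": 1, "max_steps": 150}
registry = {"filesystem_hidden_config@1": task_hash(task_def)}

artifact = {
    "spec_version": "tracecore-spec-v1.0",
    "runtime_identity": "example-runtime",  # placeholder value
    "wall_clock_elapsed_s": 4.21,
    "task_hash": task_hash(task_def),
}
print(validate_artifact(artifact, registry, "filesystem_hidden_config@1"))  # []
```

Canonicalizing before hashing matters: without `sort_keys=True`, two semantically identical task definitions could hash differently and trip the gate.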
// diff
What makes TraceCore different
Operations-first. Deterministic. No vibes.
| Dimension | TraceCore | Typical agent benchmark |
|---|---|---|
| Validation | Deterministic validators | LLM judges + heuristics |
| Reproducibility | Seed + version = identical | Variable by design |
| Actions | Structured, explicit schema | Free-form natural language |
| Focus | Operational reliability | Broad capability |
| Budget enforcement | Hard limits, deterministic stop | Tracked but often soft |
| Sandbox posture | Explicit anti-cheating | Varies by benchmark |
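The "Seed + version = identical" and "Hard limits, deterministic stop" rows describe a general pattern that can be shown without TraceCore itself. This Python sketch uses a toy random-walk episode (a hypothetical stand-in for a real task) driven by a private seeded RNG. It stops hard at the step budget and hashes the full trace so two runs can be compared exactly.

```python
import hashlib
import json
import random


def run_episode(seed: int, max_steps: int) -> dict:
    """Toy episode: a seeded random walk toward a goal, with a hard step budget.

    The environment and policy are hypothetical; the point is the pattern:
    a private seeded RNG, a hard stop at the budget, and a recorded trace.
    """
    rng = random.Random(seed)  # private RNG, isolated from the global random state
    position, goal = 0, 5
    trace = []
    for step in range(1, max_steps + 1):
        action = rng.choice([-1, 1])
        position += action
        trace.append({"step": step, "action": action, "position": position})
        if position == goal:
            return {"success": True, "steps": step, "trace": trace}
    # Budget exhausted: stop deterministically and report failure.
    return {"success": False, "steps": max_steps, "trace": trace}


def trace_digest(result: dict) -> str:
    """Hash the whole result so any divergence between runs is detected."""
    return hashlib.sha256(json.dumps(result, sort_keys=True).encode("utf-8")).hexdigest()


a = run_episode(seed=42, max_steps=150)
b = run_episode(seed=42, max_steps=150)
print(trace_digest(a) == trace_digest(b))  # True: same seed and same code give the same trace
```

Using a dedicated `random.Random(seed)` instead of the module-level functions keeps the episode's randomness independent of any other code that touches the global RNG.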
// quickstart
First spec-verified run in three steps
Install the v1 CLI, verify the spec, and collect metrics without leaving your terminal.
    $ pip install tracecore
    $ tracecore version
    runtime: 1.0.0 spec: tracecore-spec-v1.0
    $ tracecore run pairing log_stream_monitor --strict-spec
    Run 41c2 completed: success=True, steps=37/150, wall_clock_elapsed_s=4.21
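For a quick CI gate, the one-line run summary in the sample output can be parsed and checked against thresholds. This is only a sketch: the regular expression is derived from that single sample line, `parse_summary` and `ci_gate` are hypothetical helpers, and reading the run artifact directly is likely sturdier than scraping terminal text.

```python
import re

# Pattern derived from one sample line; the real output format may differ.
SUMMARY = re.compile(
    r"Run (?P<run_id>\w+) completed: success=(?P<success>True|False), "
    r"steps=(?P<steps>\d+)/(?P<budget>\d+), wall_clock_elapsed_s=(?P<elapsed>[\d.]+)"
)


def parse_summary(line: str) -> dict:
    """Turn a run summary line into typed fields, or raise if it does not match."""
    m = SUMMARY.search(line)
    if m is None:
        raise ValueError(f"unrecognized summary line: {line!r}")
    return {
        "run_id": m["run_id"],
        "success": m["success"] == "True",
        "steps": int(m["steps"]),
        "budget": int(m["budget"]),
        "wall_clock_elapsed_s": float(m["elapsed"]),
    }


def ci_gate(line: str, max_elapsed_s: float) -> bool:
    """Pass only if the run succeeded, stayed in budget, and finished in time."""
    s = parse_summary(line)
    return s["success"] and s["steps"] <= s["budget"] and s["wall_clock_elapsed_s"] <= max_elapsed_s


sample = "Run 41c2 completed: success=True, steps=37/150, wall_clock_elapsed_s=4.21"
print(ci_gate(sample, max_elapsed_s=10.0))  # True
```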
install
Published PyPI wheel
pip install tracecore
batch
Spawn-isolated workers with enforced timeouts
tracecore run batch --workers 4 --strict-spec
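The isolation pattern behind `tracecore run batch` (one fresh subprocess per job, each with its own wall-clock timeout) can be sketched with the Python standard library. This is not TraceCore's implementation. `run_job` and `run_batch` are hypothetical helpers, and the three jobs are throwaway Python one-liners standing in for episodes.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_job(argv: list, timeout_s: float) -> dict:
    """Run one job in a fresh subprocess; kill it if it exceeds its wall-clock timeout."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
        return {"argv": argv, "status": "passed" if proc.returncode == 0 else "failed"}
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the child before re-raising TimeoutExpired.
        return {"argv": argv, "status": "timeout"}


def run_batch(jobs: list, workers: int, timeout_s: float) -> dict:
    """Run jobs with at most `workers` in flight and tally the outcomes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda argv: run_job(argv, timeout_s), jobs))
    summary = {"passed": 0, "failed": 0, "timeout": 0}
    for r in results:
        summary[r["status"]] += 1
    return summary


jobs = [
    [sys.executable, "-c", "pass"],                   # exits 0
    [sys.executable, "-c", "raise SystemExit(1)"],    # exits 1
    [sys.executable, "-c", "import time; time.sleep(10)"],  # exceeds the timeout
]
summary = run_batch(jobs, workers=2, timeout_s=2.0)
print(summary)  # {'passed': 1, 'failed': 1, 'timeout': 1}
```

A thread pool is enough here because the threads only wait on child processes. The isolation comes from each job being a separate OS process, so a crash or leaked state in one job cannot affect another.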
Prefer the classic form?
    $ tracecore run \
        --agent agents/toy_agent.py \
        --task filesystem_hidden_config@1 \
        --seed 42
// agent_interface
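    class Agent:
        # Start a new episode from the task specification.
        def reset(self, task_spec: dict) -> None: ...
        # Receive the latest observation.
        def observe(self, observation: dict) -> None: ...
        # Return the next structured action.
        def act(self) -> dict: ...
No async. No callbacks. No streaming. Just reset, observe, act.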
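A minimal sketch of an agent that satisfies this interface, followed by a stand-in driver loop. The task-spec key (`target`), the observation key (`counter`), the action schema (`{"type": ...}`), and the driver itself are hypothetical; real TraceCore tasks define their own specs and observations.

```python
class CounterAgent:
    """Hypothetical toy agent implementing reset/observe/act.

    It reads a target from the task spec and asks to increment a counter
    until the last observed value reaches the target, then asks to stop.
    """

    def reset(self, task_spec: dict) -> None:
        self.target = int(task_spec["target"])
        self.counter = 0

    def observe(self, observation: dict) -> None:
        self.counter = int(observation["counter"])

    def act(self) -> dict:
        if self.counter >= self.target:
            return {"type": "stop"}
        return {"type": "increment"}


# Stand-in driver loop playing the runtime's role (hypothetical).
agent = CounterAgent()
agent.reset({"target": 3})
counter, actions = 0, []
for _ in range(10):  # hard step budget
    agent.observe({"counter": counter})
    action = agent.act()
    actions.append(action["type"])
    if action["type"] == "stop":
        break
    counter += 1
print(actions)  # ['increment', 'increment', 'increment', 'stop']
```

Because the agent holds no hidden timers or background tasks, the same sequence of observations always yields the same sequence of actions.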
// articles
Practical essays for teams building real agents
Long-form writing on deterministic agents, trustable evaluation, and why operational discipline matters more than vibes.
Why TraceCore did not exist before
Deterministic agent runtimes arrived late because the ecosystem optimized for demos, benchmark outcomes, and observability before it optimized for reproducible execution contracts.