v1.0 // deterministic episode runtime

Deterministic Test Runner
for Agent Loops

TraceCore v1.0 freezes the Deterministic Episode Runtime spec, ships a first-class `tracecore` CLI, enforces wall-clock telemetry, and adds batch execution plus live metrics for CI confidence.

Spec contract

v1.0

tracecore-spec

Wall-clock telemetry

100%

artifact field

Batch workers

spawned

process isolation


signal

Spec-locked artifacts

Artifacts declare spec_version, runtime_identity, and required fields like "wall_clock_elapsed_s".

signal

Parallel batch execution

tracecore run batch --workers N launches isolated subprocesses with enforced timeouts.

signal

Metrics & MTTR

CLI, REST, and dashboard metrics surface success rates, budgets, and recovery time.

[terminal]

$ python -m venv .venv && .venv\Scripts\activate
$ pip install tracecore
$ tracecore version
runtime: 1.0.0  spec: tracecore-spec-v1.0
$ tracecore run batch --workers 4 --strict-spec
Batch summary: passed=4 failed=0 p95_wall=4.2s
$ tracecore runs metrics --format table
task success rate  log_stream_monitor@1   1.00

// capabilities

Built for traceable reliability

TraceCore v1.0: deterministic episodes, spec contracts, and CI-grade telemetry.

Spec-locked artifacts

tracecore-spec v1.0 enforces wall_clock_elapsed_s, runtime_identity, and task hashes in every artifact.
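
To make the idea concrete, here is a minimal sketch of a required-field check on an artifact. It is not TraceCore's own validator: the helper `missing_fields`, the key name `task_hash`, and the shape of the sample artifact are assumptions for illustration; only `spec_version`, `runtime_identity`, and `wall_clock_elapsed_s` are field names taken from this page.

```python
# Hypothetical required-field check. "task_hash" is an assumed key name;
# the page only says that task hashes are enforced.
REQUIRED_FIELDS = ("spec_version", "runtime_identity", "wall_clock_elapsed_s", "task_hash")


def missing_fields(artifact: dict) -> list[str]:
    """Return required fields that are absent or None, in declaration order."""
    return [name for name in REQUIRED_FIELDS if artifact.get(name) is None]


# Sample artifact with an assumed layout; it deliberately omits the task hash.
artifact = {
    "spec_version": "tracecore-spec-v1.0",
    "runtime_identity": {"name": "tracecore", "version": "1.0.0"},
    "wall_clock_elapsed_s": 4.21,
}

print(missing_fields(artifact))  # ['task_hash']
```

A strict mode would presumably fail the run whenever this list is non-empty, rather than logging a warning.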

Batch isolation

tracecore run batch spawns clean subprocess workers with per-job wall-clock timeouts and summaries.
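
The isolation pattern described here can be sketched with the Python standard library. This is an illustration of the general technique (fresh subprocess per job, wall-clock timeout, pass/fail summary), not the internals of `tracecore run batch`; `run_job`, `run_batch`, and the inline job snippets are hypothetical.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_job(code: str, timeout_s: float) -> dict:
    """Run one job in a fresh Python subprocess with a wall-clock timeout."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed when this expires
        )
        return {"passed": proc.returncode == 0, "timed_out": False}
    except subprocess.TimeoutExpired:
        return {"passed": False, "timed_out": True}


def run_batch(jobs: list[str], workers: int, timeout_s: float) -> dict:
    """Run jobs across a pool of workers and summarize the outcomes."""
    # Threads only wait on child processes; each job still runs in its own process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda code: run_job(code, timeout_s), jobs))
    passed = sum(r["passed"] for r in results)
    return {"passed": passed, "failed": len(results) - passed}


# One job that exits cleanly and one that exits with a non-zero status.
summary = run_batch(["print('ok')", "raise SystemExit(1)"], workers=2, timeout_s=30.0)
print(summary)  # {'passed': 1, 'failed': 1}
```

Because every job starts from a clean interpreter, state leaked by one job (globals, open files, monkeypatches) cannot affect the next.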

Live metrics

CLI + REST metrics expose reproducibility rates, budget P95, and MTTR for every agent/task pair.
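
As a rough illustration of what these numbers mean, the sketch below computes a success rate, a nearest-rank P95, and MTTR from a list of run records. The record layout (`finished_at`, `success`, `wall_clock_elapsed_s` as plain dict keys), the sample values, and the MTTR definition used here (time from the first failure in a streak to the next success) are assumptions, not TraceCore's documented formulas.

```python
import math


def success_rate(runs: list[dict]) -> float:
    """Fraction of runs that succeeded."""
    return sum(r["success"] for r in runs) / len(runs)


def p95(values: list[float]) -> float:
    """95th percentile using the nearest-rank method."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def mttr(runs: list[dict]):
    """Mean seconds from the first failure in a streak to the next success."""
    recoveries = []
    failed_at = None
    for run in sorted(runs, key=lambda r: r["finished_at"]):
        if not run["success"] and failed_at is None:
            failed_at = run["finished_at"]
        elif run["success"] and failed_at is not None:
            recoveries.append(run["finished_at"] - failed_at)
            failed_at = None
    return sum(recoveries) / len(recoveries) if recoveries else None


# Made-up records for one agent/task pair; timestamps are seconds.
runs = [
    {"finished_at": 0, "success": True, "wall_clock_elapsed_s": 4.0},
    {"finished_at": 60, "success": False, "wall_clock_elapsed_s": 6.0},
    {"finished_at": 180, "success": True, "wall_clock_elapsed_s": 4.2},
    {"finished_at": 240, "success": True, "wall_clock_elapsed_s": 4.1},
]

print(success_rate(runs))                             # 0.75
print(p95([r["wall_clock_elapsed_s"] for r in runs]))  # 6.0
print(mttr(runs))                                     # 120.0
```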

Immutable registry

Frozen task hashes, SPEC_FREEZE docs, and trust bundles make CI gating and audits boring-but-provable.
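
One common way to freeze a task definition is to hash a canonical serialization and compare it against a stored value at gate time. The sketch below assumes that approach; `task_hash`, `is_unmodified`, the registry layout, and the sample task fields are hypothetical, and the page does not say how TraceCore actually computes its hashes.

```python
import hashlib
import json


def task_hash(task_definition: dict) -> str:
    """SHA-256 over canonical JSON, so key order does not change the hash."""
    canonical = json.dumps(task_definition, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_unmodified(task_ref: str, task_definition: dict, registry: dict) -> bool:
    """True when the definition still matches the hash frozen in the registry."""
    return registry.get(task_ref) == task_hash(task_definition)


# Illustrative task definition and a registry frozen from it.
task = {"id": "log_stream_monitor", "version": 1, "budget": {"steps": 150}}
frozen_registry = {"log_stream_monitor@1": task_hash(task)}

tampered = {**task, "budget": {"steps": 500}}
print(is_unmodified("log_stream_monitor@1", task, frozen_registry))      # True
print(is_unmodified("log_stream_monitor@1", tampered, frozen_registry))  # False
```

A CI gate built on this idea would refuse to run any task whose current hash differs from the frozen one.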

// diff

What makes TraceCore different

Operations-first. Deterministic. No vibes.

Dimension	TraceCore	Typical benchmark
Validation	Deterministic validators	LLM judges + heuristics
Reproducibility	Seed + version = identical	Variable by design
Actions	Structured, explicit schema	Free-form natural language
Focus	Operational reliability	Broad capability
Budget Enforcement	Hard limits, deterministic stop	Tracked but often soft
Sandbox Posture	Explicit anti-cheating	Varies by benchmark
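
Three rows of this table (deterministic validators, seed-based reproducibility, hard budgets) can be illustrated with a toy episode. Everything below is a hypothetical stand-in, not TraceCore code: `run_episode` draws from a seeded generator, stops at a fixed step limit, and decides success with an explicit predicate; `fingerprint` hashes the result so two runs can be compared byte for byte.

```python
import hashlib
import json
import random


def run_episode(seed: int, max_steps: int) -> dict:
    """Toy episode: all randomness flows from the seed, and the step budget is a hard stop."""
    rng = random.Random(seed)
    trace = []
    for step in range(max_steps):
        value = rng.randint(0, 9)
        trace.append(value)
        if value == 7:  # deterministic validator: an explicit predicate, no judge model
            return {"success": True, "steps": step + 1, "trace": trace}
    return {"success": False, "steps": max_steps, "trace": trace}


def fingerprint(result: dict) -> str:
    """Hash a result so two runs can be compared exactly."""
    return hashlib.sha256(json.dumps(result, sort_keys=True).encode("utf-8")).hexdigest()


first = run_episode(seed=42, max_steps=150)
second = run_episode(seed=42, max_steps=150)
print(fingerprint(first) == fingerprint(second))  # True: same seed, same code, same result
```

The same comparison fails as soon as any step depends on wall-clock time, unseeded randomness, or a non-deterministic judge.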

// quickstart

First spec-verified run in three steps

Install the v1 CLI, verify the spec, and collect metrics without leaving your terminal.

[quickstart]

$ pip install tracecore
$ tracecore version
runtime: 1.0.0  spec: tracecore-spec-v1.0
$ tracecore run pairing log_stream_monitor --strict-spec
Run 41c2 completed: success=True, steps=37/150, wall_clock_elapsed_s=4.21

install

Published PyPI wheel

pip install tracecore

batch

Spawn-isolated workers with enforced timeouts

tracecore run batch --workers 4 --strict-spec

Prefer the classic form?

[classic-form]

$ tracecore run \
  --agent agents/toy_agent.py \
  --task filesystem_hidden_config@1 \
  --seed 42

// agent_interface

class Agent:
    def reset(self, task_spec: dict) -> None: ...
    def observe(self, observation: dict) -> None: ...
    def act(self) -> dict: ...

No async. No callbacks. No streaming. Just reset, observe, act.
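
A minimal sketch of an agent that satisfies this interface, driven by a synchronous loop, might look like the following. The toy countdown task, the observation key `counter`, the action dictionaries, and the `run_loop` driver are all assumptions made for illustration; the real task specs and action schema are not shown on this page.

```python
class CountdownAgent:
    """Toy agent: decrements until the observed counter reaches zero, then stops."""

    def reset(self, task_spec: dict) -> None:
        self.counter = None

    def observe(self, observation: dict) -> None:
        self.counter = observation["counter"]

    def act(self) -> dict:
        if self.counter > 0:
            return {"type": "decrement"}
        return {"type": "stop"}


def run_loop(agent, task_spec: dict, max_steps: int) -> dict:
    """Hypothetical driver: reset once, then observe/act until stop or budget exhaustion."""
    counter = task_spec["start"]
    agent.reset(task_spec)
    for step in range(1, max_steps + 1):
        agent.observe({"counter": counter})
        action = agent.act()
        if action["type"] == "stop":
            return {"success": counter == 0, "steps": step}
        counter -= 1
    return {"success": False, "steps": max_steps}


result = run_loop(CountdownAgent(), {"start": 3}, max_steps=150)
print(result)  # {'success': True, 'steps': 4}
```

Because each step is a plain function call, the driver sees every observation and action in order, which is what makes a full trace easy to record and replay.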

// articles

Practical essays for teams building real agents

Long-form writing on deterministic agents, trustable evaluation, and why operational discipline matters more than vibes.


Featured essay · March 21, 2026

Why TraceCore did not exist before

Deterministic agent runtimes arrived late because the ecosystem optimized for demos, benchmark outcomes, and observability before it optimized for reproducible execution contracts.

#tracecore #determinism #agent-runtime #evaluation

