v1.0 // deterministic episode runtime

Deterministic Test Runner
for Agent Loops

TraceCore v1.0 freezes the Deterministic Episode Runtime spec, ships a first-class `tracecore` CLI, enforces wall-clock telemetry, and adds batch execution plus live metrics for CI confidence.

Spec contract: v1.0 (tracecore-spec)
Wall-clock telemetry: 100% (artifact field)
Batch workers: spawned (process isolation)

Spec-locked artifacts

Artifacts declare spec_version, runtime_identity, and required fields like "wall_clock_elapsed_s".

Parallel batch execution

tracecore run batch --workers N launches isolated subprocesses with enforced timeouts.

Metrics & MTTR

CLI, REST, and dashboard metrics surface success rates, budgets, and recovery time.

[terminal]
$ python -m venv .venv && .venv\Scripts\activate
$ pip install tracecore
$ tracecore version
runtime: 1.0.0 spec: tracecore-spec-v1.0
$ tracecore run batch --workers 4 --strict-spec
Batch summary: passed=4 failed=0 p95_wall=4.2s
$ tracecore runs metrics --format table
task success rate log_stream_monitor@1 1.00

// capabilities

Built for traceable reliability

TraceCore v1.0: deterministic episodes, spec contracts, and CI-grade telemetry.

01

Spec-locked artifacts

tracecore-spec v1.0 enforces wall_clock_elapsed_s, runtime_identity, and task hashes in every artifact.
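
As a rough illustration of what a required-field check on an artifact could look like, the sketch below validates a plain dict. Only `spec_version`, `runtime_identity`, and `wall_clock_elapsed_s` are named on this page; the `task_hash` key, the `missing_fields` helper, and the sample values are hypothetical and are not taken from the actual tracecore-spec schema.

```python
# Hypothetical sketch: not the real tracecore-spec schema or validator.
# "task_hash" is an assumed key name standing in for "task hashes".
REQUIRED_FIELDS = (
    "spec_version",
    "runtime_identity",
    "wall_clock_elapsed_s",
    "task_hash",
)


def missing_fields(artifact: dict) -> list[str]:
    """Return the required field names absent from the artifact, in spec order."""
    return [name for name in REQUIRED_FIELDS if name not in artifact]


# Sample artifact with made-up values; it deliberately omits "task_hash".
artifact = {
    "spec_version": "tracecore-spec-v1.0",
    "runtime_identity": "1.0.0",
    "wall_clock_elapsed_s": 4.21,
}

print(missing_fields(artifact))  # ['task_hash']
```

A strict mode would presumably reject any artifact for which this list is non-empty, though how `--strict-spec` actually behaves is not described here.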

02

Batch isolation

tracecore run batch spawns clean subprocess workers with per-job wall-clock timeouts and summaries.
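
The general pattern of subprocess workers with per-job wall-clock timeouts can be sketched with the Python standard library alone. This is an illustration of the technique under stated assumptions, not TraceCore's implementation: `run_job`, `run_batch`, and the inline job strings are hypothetical names invented for the example.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_job(code: str, timeout_s: float) -> str:
    """Run one job in a fresh Python subprocess and classify the outcome."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            timeout=timeout_s,      # wall-clock limit; the child is killed on expiry
            capture_output=True,
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    return "passed" if result.returncode == 0 else "failed"


def run_batch(jobs: list[str], workers: int, timeout_s: float) -> dict[str, int]:
    """Run jobs on up to `workers` concurrent subprocesses and count outcomes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outcomes = list(pool.map(lambda job: run_job(job, timeout_s), jobs))
    return {s: outcomes.count(s) for s in ("passed", "failed", "timeout")}


# Two quick hypothetical jobs: one exits cleanly, one exits with status 1.
jobs = ["print('ok')", "raise SystemExit(1)"]
print(run_batch(jobs, workers=2, timeout_s=30.0))
# {'passed': 1, 'failed': 1, 'timeout': 0}
```

Threads are used here only to wait on the child processes; each job still runs in its own interpreter, so a crash or hang in one job cannot corrupt another. A job that outlives `timeout_s` is counted under `timeout`.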

03

Live metrics

CLI + REST metrics expose reproducibility rates, budget P95, and MTTR for every agent/task pair.
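
To make the three metric names concrete, here is a minimal sketch of how a success rate, a nearest-rank P95, and an MTTR could be computed from a list of run records. The record layout, the sample numbers, and the definition of recovery time (first failure in a streak to the next success) are assumptions for illustration; this page does not specify how TraceCore defines or computes them.

```python
import math

# Hypothetical run records: (finished_at_s, success, wall_clock_elapsed_s)
runs = [
    (0.0, True, 4.0),
    (10.0, False, 6.0),
    (25.0, True, 4.5),
    (30.0, True, 4.2),
    (40.0, False, 7.0),
    (50.0, True, 4.1),
]


def success_rate(records) -> float:
    """Fraction of runs that succeeded."""
    return sum(1 for _, ok, _ in records if ok) / len(records)


def p95(values) -> float:
    """Nearest-rank 95th percentile."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def mttr(records) -> float:
    """Mean time from the first failure in a streak to the next success."""
    recoveries, failed_at = [], None
    for finished_at, ok, _ in sorted(records):
        if not ok and failed_at is None:
            failed_at = finished_at
        elif ok and failed_at is not None:
            recoveries.append(finished_at - failed_at)
            failed_at = None
    return sum(recoveries) / len(recoveries) if recoveries else 0.0


print(round(success_rate(runs), 2))          # 0.67
print(p95([elapsed for _, _, elapsed in runs]))  # 7.0
print(mttr(runs))                            # 12.5
```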

04

Immutable registry

Frozen task hashes, SPEC_FREEZE docs, and trust bundles make CI gating and audits boring-but-provable.

// diff

What makes TraceCore different

Operations-first. Deterministic. No vibes.

| Dimension | TraceCore | Typical |
|---|---|---|
| Validation | Deterministic validators | LLM judges + heuristics |
| Reproducibility | Seed + version = identical | Variable by design |
| Actions | Structured, explicit schema | Free-form natural language |
| Focus | Operational reliability | Broad capability |
| Budget Enforcement | Hard limits, deterministic stop | Tracked but often soft |
| Sandbox Posture | Explicit anti-cheating | Varies by benchmark |
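
The "Seed + version = identical" row describes a general property that is easy to demonstrate in isolation. The sketch below is a toy, not TraceCore code: `episode_fingerprint` is a hypothetical helper that derives a pseudo-random trace from a seed, mixes in a version string, and hashes the result, so that equal inputs always give equal fingerprints.

```python
import hashlib
import random


def episode_fingerprint(seed: int, version: str, steps: int = 5) -> str:
    """Hash a seeded pseudo-random trace together with a version string."""
    rng = random.Random(seed)  # local RNG: no dependence on global random state
    trace = [rng.randint(0, 99) for _ in range(steps)]
    payload = f"{version}:{trace}".encode()
    return hashlib.sha256(payload).hexdigest()


# Same seed and version: identical fingerprint on every call.
print(episode_fingerprint(42, "1.0.0") == episode_fingerprint(42, "1.0.0"))  # True
# Changing the version changes the fingerprint even with the same seed.
print(episode_fingerprint(42, "1.0.0") == episode_fingerprint(42, "1.0.1"))  # False
```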

// quickstart

First spec-verified run in three steps

Install the v1 CLI, verify the spec, and collect metrics without leaving your terminal.

[quickstart]
$ pip install tracecore
$ tracecore version
runtime: 1.0.0 spec: tracecore-spec-v1.0
$ tracecore run pairing log_stream_monitor --strict-spec
Run 41c2 completed: success=True, steps=37/150, wall_clock_elapsed_s=4.21

install

Published PyPI wheel

pip install tracecore

batch

Spawn-isolated workers with enforced timeouts

tracecore run batch --workers 4 --strict-spec

Prefer the classic form?

[classic-form]
$ tracecore run \
    --agent agents/toy_agent.py \
    --task filesystem_hidden_config@1 \
    --seed 42

// agent_interface

class Agent:
    def reset(self, task_spec: dict) -> None: ...
    def observe(self, observation: dict) -> None: ...
    def act(self) -> dict: ...

No async. No callbacks. No streaming. Just reset, observe, act.
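
To show how small an agent against this interface can be, here is a toy implementation together with a driver loop that enforces a hard step budget. Only the three method signatures come from the interface above; `CountingAgent`, `run_episode`, the `target` key in the task spec, and the shape of the action and observation dicts are hypothetical and do not describe the real runtime.

```python
class CountingAgent:
    """Toy agent: waits until it has seen `target` observations, then finishes."""

    def reset(self, task_spec: dict) -> None:
        self.target = task_spec["target"]  # assumed task_spec key
        self.seen = 0

    def observe(self, observation: dict) -> None:
        self.seen += 1

    def act(self) -> dict:
        return {"type": "done"} if self.seen >= self.target else {"type": "wait"}


def run_episode(agent, task_spec: dict, max_steps: int) -> dict:
    """Drive reset/observe/act synchronously and stop hard at max_steps."""
    agent.reset(task_spec)
    for step in range(1, max_steps + 1):
        agent.observe({"step": step})
        action = agent.act()
        if action["type"] == "done":
            return {"success": True, "steps": step}
    return {"success": False, "steps": max_steps}  # budget exhausted


print(run_episode(CountingAgent(), {"target": 3}, max_steps=150))
# {'success': True, 'steps': 3}
print(run_episode(CountingAgent(), {"target": 3}, max_steps=2))
# {'success': False, 'steps': 2}
```

Because the loop is synchronous and the agent holds no hidden state beyond what `reset` sets up, running the same agent on the same task spec yields the same step count every time.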