TraceCore Documentation

A deterministic, budgeted, and auditable test runner for agent control loops.

What is TraceCore?

TraceCore is best understood as a deterministic test runner for agent loops — think "pytest for agents." It does not measure broad intelligence. It validates whether an agent can operate correctly, repeatedly, under constraints.

The invariant core is a Deterministic Episode Runtime: a bounded runtime that executes agent-environment interaction with fixed inputs and emits replayable traces plus a structured verdict. TraceCore v1.0 locks this behavior behind the tracecore-spec v1.0 contract and `tracecore run --strict-spec`.

Key Concepts

  • Episodes are the smallest valid execution unit.
  • Each episode runs Agent + Environment + Seed + Budgets and produces a trace + outcome.
  • Validation is always deterministic and mechanical — no LLM judges.
  • Artifacts are machine-readable and CI-friendly.
  • Replay is a first-class property, not a convenience feature.

Quick Start

terminal
pip install tracecore
tracecore version
tracecore run pairing log_stream_monitor --strict-spec

Practical Value

  • Regression detection with stable seeds and baseline compare workflows.
  • Actionable failures via structured taxonomy and full trace context.
  • CI-native gating using deterministic pass/fail and policy thresholds.
  • Auditable evidence through persisted run artifacts and replayability.

One-Line Mental Model

If pytest tests functions, TraceCore executes deterministic episodes. If Docker packages containers, TraceCore packages bounded agent-environment interactions.