Task Catalog

Deterministic, closed-world tasks with frozen versions, constrained actions, and mechanical validators.

Registry & Plugin Workflow

  • tasks/registry.json is the manifest that keeps documentation and specs in sync.
  • Each task directory includes a task.toml manifest describing budgets, entrypoints, and deterministic behavior.
  • External task packages can register via the agent_bench.tasks entry-point group.
  • The loader merges the bundled manifest rows with the plugin descriptors.

filesystem_hidden_config@1

Property       Value
Suite          filesystem
Deterministic  Yes
Path           tasks/filesystem_hidden_config/

Forces agents to plan cautious filesystem exploration to recover API_KEY without brute-force traversal.

Skills stressed: Stateful search across nested directories, budget-aware exploration, validating when a clue resolves the goal.
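As a rough illustration of budget-aware exploration, the sketch below follows clues from file to file instead of listing every directory. Everything about the data shape is an assumption made for the example: the in-memory `files` mapping, the `see: <path>` hint format, and the `API_KEY=<value>` line are hypothetical and are not taken from the task's actual fixtures.

```python
import re


def find_api_key(files, start, budget):
    """Follow hint lines from `start` until an API_KEY line is found.

    `files` maps a path to its text content (hypothetical fixture).
    Each file read costs one action; exceeding `budget` raises.
    Returns (key, actions_used), or (None, actions_used) when the
    trail of clues ends without a key.
    """
    path, steps = start, 0
    seen = set()
    while path is not None and path not in seen:
        if steps >= budget:
            raise RuntimeError("action budget exhausted")
        seen.add(path)  # never re-read a file: guards against clue loops
        steps += 1
        text = files.get(path, "")
        key = re.search(r"^API_KEY=(\S+)$", text, re.MULTILINE)
        if key:
            return key.group(1), steps  # stop as soon as the goal resolves
        hint = re.search(r"^see: (\S+)$", text, re.MULTILINE)
        path = hint.group(1) if hint else None
    return None, steps
```

The point of the design is that the cost grows with the length of the clue chain, not with the size of the tree, which is what separates planned exploration from brute-force traversal.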

rate_limited_api@1

Property       Value
Suite          api
Deterministic  Yes
Path           tasks/rate_limited_api/

Single-endpoint API that enforces strict quotas and transient failures; agents must respect retry_after windows.
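One way an agent might respect retry_after windows is sketched below. The response shape is an assumption for illustration (a dict with a "status" field and an optional "retry_after" number of seconds); the task's real tool interface may differ, and `call_with_retry` is a hypothetical helper name.

```python
import time


def call_with_retry(call, max_attempts=5, sleep=time.sleep):
    """Invoke `call()` until it succeeds, honoring server-provided waits.

    Assumption: `call()` returns a dict such as {"status": "ok", ...} or
    {"status": "rate_limited", "retry_after": 2}. A failure without a
    retry_after value is treated as non-retryable.
    """
    for attempt in range(1, max_attempts + 1):
        resp = call()
        if resp.get("status") == "ok":
            return resp
        wait = resp.get("retry_after")
        if wait is None:
            raise RuntimeError(f"non-retryable response: {resp}")
        if attempt < max_attempts:
            sleep(wait)  # wait exactly as long as the server asked
    raise RuntimeError(f"gave up after {max_attempts} attempts")
```

Passing `sleep` as a parameter keeps the helper deterministic under test, since a recording stub can replace real waiting.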

rate_limited_chain@1

Property       Value
Suite          api
Deterministic  Yes
Path           tasks/rate_limited_chain/

Extends the rate-limited API with a handshake template and chained endpoints that expire.

Skills stressed: Parsing templates, tracking handshake lifetimes, differentiating errors.

deterministic_rate_service@1

Property       Value
Suite          api
Deterministic  Yes
Path           tasks/deterministic_rate_service/

Deterministic yet unforgiving service combining handshake confirmation, required payload templates, rate limiting, and a guaranteed transient hiccup.

log_alert_triage@1

Property       Value
Suite          operations
Deterministic  Yes
Path           tasks/log_alert_triage/

Walk deterministic log artifacts and recover the final ALERT_CODE used for escalation.

config_drift_remediation@1

Property       Value
Suite          operations
Deterministic  Yes
Path           tasks/config_drift_remediation/

Compare desired vs. live configuration and output the exact remediation patch line.
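The comparison can be pictured as a key-by-key diff. The sketch below assumes both configurations are flat key/value mappings and that a patch line has the form `key=value`; both are illustrative assumptions, since the task's actual patch format is defined by its validator.

```python
def remediation_lines(desired, live):
    """Return one `key=value` line per setting that drifted from `desired`.

    Assumptions: flat dict configs and a hypothetical `key=value` patch
    format. Keys missing from `live` count as drift; keys that exist only
    in `live` are ignored here.
    """
    return [
        f"{key}={desired[key]}"
        for key in sorted(desired)  # sorted for deterministic output
        if live.get(key) != desired[key]
    ]
```

For a task that expects a single patch line, the result should contain exactly one entry; more or fewer suggests the agent has misread one of the configurations.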

incident_recovery_chain@1

Property       Value
Suite          operations
Deterministic  Yes
Path           tasks/incident_recovery_chain/

Follow a deterministic recovery handoff chain to extract the final RECOVERY_TOKEN.

log_stream_monitor@1

Property       Value
Suite          operations
Deterministic  Yes
Path           tasks/log_stream_monitor/

Poll a seeded, paginated log stream across multiple pages, filter out INFO/WARN noise, and emit the STREAM_CODE embedded in the first CRITICAL entry.

Skills stressed: Cursor-based pagination without over-fetching, signal/noise discrimination across a multi-page stream, stopping immediately once the trigger condition is met.

Why it matters: Mirrors production monitoring loops where agents must watch a live stream, ignore routine events, and fire exactly once on a critical signal — without exhausting tool-call budgets on noise.

Quick start: agent-bench run pairing log_stream_monitor
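The polling loop this task rewards can be sketched as below. The page shape is assumed for illustration: `fetch_page(cursor)` is a hypothetical callable returning a dict with an "entries" list (each entry having "level" and "message") and a "next_cursor" that is None on the last page. The `STREAM_CODE=<value>` message format is also an assumption.

```python
import re


def first_critical_code(fetch_page, max_pages=10):
    """Page through the stream and return the code from the first CRITICAL entry.

    Stops fetching as soon as a CRITICAL entry is seen, so no tool calls
    are spent on later pages. Returns None if no CRITICAL entry appears
    within `max_pages`, or if the first one carries no code.
    """
    cursor = None  # assumption: None requests the first page
    for _ in range(max_pages):
        page = fetch_page(cursor)
        for entry in page["entries"]:
            if entry["level"] != "CRITICAL":
                continue  # INFO/WARN entries are noise
            match = re.search(r"STREAM_CODE=(\S+)", entry["message"])
            return match.group(1) if match else None
        cursor = page.get("next_cursor")
        if cursor is None:
            return None  # stream ended without a CRITICAL entry
    return None
```

Returning from inside the inner loop is what enforces the "fire exactly once" behavior: later CRITICAL entries, even on the same page, are never inspected.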

runbook_verifier@1

Property       Value
Suite          operations
Deterministic  Yes
Path           tasks/runbook_verifier/

Confirm that every incident runbook phase executed in order, then emit the RUNBOOK_CHECKSUM (phase codes + ACK + handoff token) by stitching evidence across the README, runbook index, per-phase manifests, sequence logs, timeline, and final handoff instructions.

Skills stressed: Maintaining strict ordering under tight tool-call budgets, validating that each phase completed, and refusing to emit outputs when any artifact is missing.

Why it matters: Mirrors audit workflows where operators must prove each mitigation phase ran before handing off an incident—with zero tolerance for missing steps.
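A minimal sketch of the verify-then-emit logic follows. The inputs and the output format are assumptions for illustration: the hyphen-joined checksum, the argument names, and the idea that evidence has already been parsed into Python values are all hypothetical, and the task's validator defines the real format.

```python
def runbook_checksum(expected_phases, executed, phase_codes, ack, handoff_token):
    """Build the checksum only when every piece of evidence is present.

    expected_phases: ordered phase names from the runbook index
    executed:        phase names in the order the sequence log shows
    phase_codes:     mapping of phase name -> code from per-phase manifests
    Raises ValueError instead of emitting a partial result.
    """
    if list(executed) != list(expected_phases):
        raise ValueError("phases missing or executed out of order")
    missing = [p for p in expected_phases if not phase_codes.get(p)]
    if missing:
        raise ValueError(f"no code recorded for phases: {missing}")
    if not ack or not handoff_token:
        raise ValueError("ACK or handoff token missing")
    # Assumed format: phase codes in runbook order, then ACK, then token.
    parts = [phase_codes[p] for p in expected_phases] + [ack, handoff_token]
    return "-".join(parts)
```

Raising on any gap, rather than returning a best-effort string, mirrors the requirement to refuse output when an artifact is missing.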