Safety & mutation control
This is the highest-stakes surface of an agent-CLI, and the one the real-world crop most often gets wrong (see Antipatterns). It refines invariants I2 and I9.
Read-only by default
Section titled “Read-only by default”An agent acts on inference and can be wrong or adversarially steered, so the safe path is the default:
- State-changing operations MUST be off unless the caller explicitly opts in — a single global
gate (
--allow-mutations/--write) read by every mutating command is the clearest shape. - A blocked mutation returns a structured error with a dedicated exit code (e.g.
MUTATION_BLOCKED), so the agent can distinguish “blocked — ask permission” from a real failure. - The guarantee lives in the binary. Putting confirmation only in a wrapper/skill is an antipattern: an agent that shells out directly bypasses it.
Read-only-only tools
Section titled “Read-only-only tools”If a tool has no mutating commands by design (a pure state-gatherer), the gate becomes an input
validator: a generic passthrough (tool run "<cmd>") refuses anything that isn’t a read,
returning a structured WRITE_REFUSED, and there is deliberately no escape flag — “no writes”
is the product boundary, not a toggle. Gate operationally heavy reads (debug, full dumps) behind
their own explicit opt-in.
Confirming a mutation an agent requested
Section titled “Confirming a mutation an agent requested”A human types y; an agent cannot. So:
--yes/--forceMAY bypass a confirmation for scripted/agent use.- A bare
--yesMUST NOT act on an implicit or wildcard target — the dangerous target must be named explicitly. - Scale friction to severity: trivial change → just do it; destructive change → require the target be named, or a typed confirmation token that is still scriptable.
--dry-run and reviewed-plan apply
Section titled “--dry-run and reviewed-plan apply”- Every mutating command SHOULD support
--dry-run: perform the full plan, print what would change (as JSON under--json), change nothing, exit 0. - For high-stakes changes, prefer reviewed-plan apply:
--dry-runemits a plan plus a hash; a separateapply <hash>executes only that exact plan. This closes the time-of-check / time-of-use gap that a blind--yesopens — the artifact reviewed is the artifact executed (the model Terraform uses with saved plan files). - Prefer idempotent/declarative verbs (
ensure,apply,sync) so an agent’s retries don’t double-act; “delete an already-gone thing” is a soft success, not an error.
Fan-out safety
Section titled “Fan-out safety”When a tool acts across many targets (a fleet), isolate per-target failures — one unreachable host must not fail the others — and return a partial-success exit code when any target failed. Default targeting must never silently fan a mutation out to everything.
Prompt-injection fencing
Section titled “Prompt-injection fencing”Output that originates from an external, attacker-influenceable source — message bodies, interface descriptions, neighbor names, logs, web pages, ticket text — is a prompt-injection vector. In agent-facing output it MUST be fenced as untrusted:
- Wrap raw free text in explicit
BEGIN/END UNTRUSTED …markers, and/or set an"untrusted": trueflag on the carrying JSON object. - Default this on in agent mode. Of the real-world crop, essentially one tool fenced untrusted content at all, and even then it was opt-in — make it the default.
Capability assumption: fencing matters as long as agents may follow instructions found in their context. If models become reliably immune to injected instructions, this relaxes — see Evolution.