May 19, 2026

Tiered Agent Autonomy — Routine Acts Auto, Decisions Through Approval

A studio puts an AI agent into a client chat to take the routine load off the lead producer. The first week the agent is set to fully autonomous — it can read, reply, do whatever the chat needs. By Wednesday the agent has sent the client a preliminary render that wasn't ready, named a delivery date the team hadn't agreed to, and acknowledged a scope change as if it were approved. Each individual action looked reasonable in isolation. None of them were the agent's to make. The operator pulls back to "manual mode" — every reply now waits for them — and the agent loses its reason to exist: it's slower than typing the reply yourself.

What follows is the pattern I call Tiered Agent Autonomy. The agent's full action space is split, explicitly and ahead of time, into two tiers: actions it can perform without approval, and actions that must route through a human operator. The split is not "everything but high-stakes"; it is enumerated, per chat and per direction, in the agent's operational config (in the OpenClaw deployment, AGENTS.md). The agent's job is to know which tier any given action belongs to and, when in doubt, to default to the approval-gated tier.

The mechanism is enumeration plus a STOP fallback. The config lists the autonomous actions positively (acknowledge received feedback, send status updates, request files or specs from the client, relay client feedback to the team by rewriting it in the agent's own words, probe team members for blockers) and the approval-gated actions positively (work results delivered to the client, requests for feedback from the client, deadlines, scope, decisions, any mention of cost). Any situation that isn't covered by either list triggers a third state, STOP: do not reply, notify the operator, wait for instruction. The two-tier list plus STOP is the whole governance model.
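The lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not the OpenClaw implementation: the action labels, the `Tier` enum, and the `classify` function are hypothetical names, and the sketch assumes that some earlier step (not shown) has already mapped the situation to an action label.

```python
from enum import Enum


class Tier(Enum):
    AUTONOMOUS = "autonomous"
    APPROVAL = "approval"
    STOP = "stop"


# Hypothetical action labels mirroring the two enumerated lists.
AUTONOMOUS_ACTIONS = {
    "acknowledge_feedback",
    "send_status_update",
    "request_files",
    "relay_feedback_to_team",
    "probe_blockers",
}
APPROVAL_GATED_ACTIONS = {
    "deliver_work_result",
    "request_client_feedback",
    "mention_deadline",
    "mention_scope",
    "make_decision",
    "mention_cost",
}


def classify(action: str) -> Tier:
    """Return the tier for an action label; unlisted actions fall to STOP."""
    # The gated list is checked first, so an action that is accidentally
    # present in both lists resolves to the more cautious tier.
    if action in APPROVAL_GATED_ACTIONS:
        return Tier.APPROVAL
    if action in AUTONOMOUS_ACTIONS:
        return Tier.AUTONOMOUS
    return Tier.STOP
```

The point of the sketch is that neither list is defined as "everything else": an unlisted label such as `"share_opinion"` lands in `Tier.STOP` rather than in either tier.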

Why one tier always fails

A single tier of autonomy fails one of two ways. Full autonomy fails by overreach — the agent generates plausible-sounding decisions in the operator's voice, and the chat treats them as authoritative; recovery is expensive because the words are out. Zero autonomy fails by elimination — every reply waits on the human, so the agent's value flattens to autocomplete and the operator stops trusting that anything is happening without them.

The split between tiers is the production answer. The autonomous tier is the floor of trivial-to-reverse actions where speed matters: an acknowledgement that arrives in two minutes does more for the relationship than a perfect reply six hours later. The approval-gated tier is the ceiling of expensive-to-reverse actions where correctness matters more than speed: the deadline that an agent fabricates becomes a deadline the client expects.

Per-chat tiering

Tiers are not global. The case-study agent runs three concurrent projects with three different tier configurations:

The Swiss-watch project has two chats tiered differently in the same project (see two-chat-architecture). Internal team chat: fully autonomous on coordination, status, ack, blocker probes. Client chat: every single reply gates through the operator, even the acknowledgement.
The retailer project runs the client chat in a third mode — pure observation. The agent reads everything, replies only on direct address with a tiny set of acknowledgements ("received", "we'll get back to you", "thanks, we have it"). All judgement is gated.
The hiring channel runs as a pipeline — autonomy is scoped to the stages of a funnel and the agent must escalate when a candidate response doesn't fit a stage transition.

This is the operational insight: the right autonomy tier depends on the chat's role in the project, not on the agent's capability. The same agent runs all three configurations from the same identity.
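One way to picture per-chat tiering is a policy table keyed by chat. The sketch below is an assumption-laden illustration: the chat ids, the action labels, and the `ChatPolicy` and `tier_for` names are hypothetical, and the hiring pipeline is left out because its stage logic is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChatPolicy:
    """Enumerated tiers for one chat; anything unlisted falls to STOP."""
    autonomous: frozenset
    approval_gated: frozenset


# Chat ids and action labels are hypothetical placeholders.
POLICIES = {
    "watch_team": ChatPolicy(
        autonomous=frozenset({"acknowledge", "send_status_update", "probe_blockers"}),
        approval_gated=frozenset({"assign_task"}),
    ),
    "watch_client": ChatPolicy(
        # Every reply is gated in this chat, even the acknowledgement.
        autonomous=frozenset(),
        approval_gated=frozenset({"acknowledge", "send_status_update", "deliver_work_result"}),
    ),
    "retailer_client": ChatPolicy(
        # Observation mode: only a small set of acknowledgements is autonomous.
        autonomous=frozenset({"acknowledge"}),
        approval_gated=frozenset({"send_status_update", "answer_question"}),
    ),
}


def tier_for(chat_id: str, action: str) -> str:
    """Resolve the tier for an action in a given chat."""
    policy = POLICIES.get(chat_id)
    if policy is None:
        return "stop"  # an unconfigured chat is never autonomous by default
    if action in policy.approval_gated:
        return "approval"
    if action in policy.autonomous:
        return "autonomous"
    return "stop"
```

The same `"acknowledge"` label resolves to `"autonomous"` in the team chat and to `"approval"` in the watch client chat, which is the sense in which the tier belongs to the chat rather than to the agent.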

Per-direction tiering

Within a single project, tiers can also split by direction. The two-chat architecture in the watch project makes this explicit: information flows client → agent → team on the autonomous tier (the agent rewrites client feedback into Russian for the team without approval); information flows team → agent → client on the approval-gated tier (the agent never delivers a team result to the client without operator review). The same agent, the same context, opposite directions, different tiers.

The reason is asymmetric reversibility. An imperfect internal-team rewrite of a client message can be corrected in the team chat without damage. A premature client-facing message can't.
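The directional split can be expressed as a lookup keyed by source and destination. This is a sketch under assumptions: the party names and the `tier_for_route` helper are hypothetical, and any route that is not enumerated falls back to STOP.

```python
# Hypothetical route table: (source, destination) -> tier.
ROUTE_TIERS = {
    ("client", "team"): "autonomous",  # rewrite client feedback for the team
    ("team", "client"): "approval",    # team results need operator review first
}


def tier_for_route(source: str, destination: str) -> str:
    """Resolve the tier for relaying information from source to destination."""
    # Routes that were never enumerated fall back to STOP.
    return ROUTE_TIERS.get((source, destination), "stop")
```

Keying on the ordered pair keeps the asymmetry explicit: swapping source and destination changes the tier even though the parties are the same.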

What goes in each tier (working list from the case study)

Autonomous — the case-study agent does these without notifying the operator:

Acknowledging received feedback ("Received, sent to the team")
Status updates in response to direct asks ("Status: Ivan is working on the corrections, expecting renders today")
Requesting files, specs, references from the client
Relaying client feedback to the team by rewriting it in the agent's own words (never forwarding the original message)
Proactive probes of team members for blockers and progress
Hourly heartbeat presence (silent if nothing requires reply)

Approval-gated — the case-study agent stops and notifies on these:

Team results being delivered to the client
Requests for feedback from the client
Any mention of deadlines, scope, or commitments
Any mention of cost, additional cost, or pricing
Any new task assignment to a team member
Any reply to a question framed as a decision ("what do you think — should we…?")
Any conflict-tinted message in the chat

STOP — anything not in either list, including ambiguous cases. The agent does not improvise.

The STOP protocol as the load-bearing third state

Two-tier systems usually fail at the seam: a situation arises that doesn't fit either tier, and the agent guesses. The third state — STOP — is the load-bearing piece. Without it, the agent treats out-of-scope situations as autonomous-by-default ("I'll just acknowledge politely") and quietly oversteps.

The STOP rule is: if a situation is not covered by the autonomous list and not covered by the approval-gated list, do not respond. Notify the operator through the private notification channel. Wait for instruction. The operator's job in setting up the agent is to enumerate generously enough that STOP is rare, but not so generously that the autonomous tier accumulates power it shouldn't have.
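A rough sketch of how the three states might be dispatched follows. The `Outbox` class is an in-memory stand-in for the chat, the approval queue, and the private notification channel, and the `dispatch` function and tier strings are hypothetical names rather than part of any real deployment.

```python
from dataclasses import dataclass, field


@dataclass
class Outbox:
    """In-memory stand-ins for the chat, approval queue, and operator channel."""
    sent: list = field(default_factory=list)
    pending_approval: list = field(default_factory=list)
    operator_notices: list = field(default_factory=list)


def dispatch(tier: str, draft: str, situation: str, outbox: Outbox) -> None:
    """Route a drafted reply according to its tier."""
    if tier == "autonomous":
        outbox.sent.append(draft)
    elif tier == "approval":
        # Hold the draft and tell the operator a review is waiting.
        outbox.pending_approval.append(draft)
        outbox.operator_notices.append(f"Approval needed: {situation}")
    else:
        # STOP (or any unrecognised tier): nothing goes to the chat, not even
        # a polite acknowledgement. The operator is notified and the agent waits.
        outbox.operator_notices.append(f"STOP: {situation}")
```

In this sketch the draft is discarded on STOP, so the only side effect is the operator notice. Treating any unrecognised tier string as STOP is a design assumption that keeps a misconfigured tier from sending anything.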

In the case study, the STOP protocol fired most often on cost questions ("how much for an extra animation?"), opinion solicitations ("what do you think we should do here?"), and conflict-tinted exchanges. None of those have a safe autonomous-tier answer.

When to expand the autonomous tier

The autonomous tier should grow when patterns repeat enough to be promoted. After a month of operation in the case study, the autonomous list grew by three items — promoted up from the approval-gated tier — because the operator noticed they were being approved verbatim every time. Each promotion came with a worked example in the config: "Status responses of the form 'X is working on Y, expected Z' may be sent autonomously when the team has confirmed status in the internal chat within the past 24 hours."
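The worked example quoted above can be read as a conditional promotion: the status reply is autonomous only while the team's confirmation is fresh. A minimal sketch, assuming the agent tracks the time of the last team confirmation; the function name, the inclusive 24-hour boundary, and the fallback to approval when the status is stale are all assumptions.

```python
from datetime import datetime, timedelta
from typing import Optional

# Freshness window taken from the worked example in the config.
STATUS_FRESHNESS = timedelta(hours=24)


def status_reply_tier(last_team_confirmation: Optional[datetime], now: datetime) -> str:
    """Tier for a status reply of the form 'X is working on Y, expected Z'."""
    if last_team_confirmation is None:
        return "approval"  # no confirmation on record, so do not promote
    age = now - last_team_confirmation
    # A timestamp in the future is treated as suspect and stays gated.
    if timedelta(0) <= age <= STATUS_FRESHNESS:
        return "autonomous"
    return "approval"
```

Falling back to the approval-gated tier, the tier the item was promoted from, means a stale or missing confirmation costs only a review step.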

The autonomous tier should shrink — items demoted to approval-gated — when an incident shows the autonomous version was unsafe. The pattern here is the same as incident-driven-configuration.

When the pattern fits

Production-line agentic systems where the same agent must run multiple concurrent contexts with different governance.
Customer-facing copilots that should be fast on FAQs and acknowledgements but cautious on anything contractual.
Coding agents in shared repos where autonomous small commits are useful but autonomous merges to main are catastrophic.
Operations agents for ticket triage, where acknowledging and tagging is autonomous but routing and resolution decisions are not.

It is weakest in single-task agents with no governance dimension — a one-shot research summariser doesn't need tiers because it never sends anything to a third party.

Three failure modes

Tiers without enumeration. A config that says "use judgement, escalate when unsure" inherits all the problems of one-tier autonomy because the model interprets "judgement" as confidence. Fix: name the autonomous actions and approval-gated actions positively, with examples, and use STOP for the residual.
Same tiers for all chats. A global "be autonomous on status, escalate on cost" rule fails when one of the chats is a client chat where even the status updates need review. Fix: tier per chat (and per direction in chats where direction matters).
No STOP state. The agent treats "not in approval-gated list" as "is autonomous". Anything ambiguous gets a polite autonomous reply that is technically a decision. Fix: explicit STOP rule; trust the operator to be reachable.