May 19, 2026

Incident-Driven Configuration — Growing an Agent's Rules by Precedent, Not by Design

The first instinct when deploying an autonomous agent is to design its rules upfront. The team sits down with a whiteboard, lists every situation the agent might encounter, writes a comprehensive set of guidelines, ships it. By week two the agent has done three things the team did not anticipate — sent a duplicate message because the proxy timed out, mentioned a project name from deleted memory, confused UTC and MSK on a deadline calculation — and the team realises that the design was a guess. None of those situations were in the original list. The whiteboard didn't fail because it was lazy; it failed because the space of possible incidents is bigger than upfront design can cover.

What follows is a pattern I call Incident-Driven Configuration. The agent's rules are not designed; they are accumulated. Each incident (a moment when the agent's behaviour produced a problem) is logged in a dedicated .learnings/ directory. The operator analyses the pattern, then promotes a hard rule into the agent's main configuration files. The system grows by precedent: each case becomes a new line in the rules. The shape is closer to a body of case law than to a software specification.

The mechanism is small but disciplined. A three-file .learnings/ directory holds the working notes: LEARNINGS.md (behaviour corrections), ERRORS.md (technical failures), FEATURE_REQUESTS.md (capabilities the agent is missing). When an incident happens, the operator (sometimes the agent itself) appends an entry. When the entry is analysed and the lesson generalised, the rule is promoted into one of the main config files: behavioural rules into SOUL.md, process rules into AGENTS.md, technical workarounds into TOOLS.md, schedule changes into HEARTBEAT.md. The promotion is the moment the case becomes law.

Why upfront design under-covers

The space of incidents an agent will produce in production is shaped by three layers the designer can only partially see: the model's defaults, the operating environment's quirks, and the specific dynamics of the chats the agent enters. The first is hard because the model surfaces new failure modes as context grows. The second is hard because environments are unstable: the proxy that worked yesterday drops every third request today. The third is hard because the chats are made up of people, and the dynamics of people cannot be designed in advance.

The upfront-design failure mode looks like this: a thirty-line config that anticipates the obvious cases, misspecifies three of them in subtle ways, and is entirely silent on the cases that actually occur. The team either fixes the config reactively (in which case they are already practising incident-driven configuration, just without naming it) or convinces themselves the agent is "working" while it accumulates small slips.

The incident-driven path inverts the work. Start with a deliberately minimal config — the few rules that are obvious enough to bet on — and let the rest accumulate. Each rule has provenance: the file says why the rule exists, and which incident produced it. This is what makes the config grow without bloat: when an old rule no longer applies, the operator can read its provenance and decide to drop it.

The three-file learnings drawer

The intermediate stage between incident and rule is the .learnings/ directory. Three files, each a long-running log:

LEARNINGS.md: behavioural corrections. The agent kept doing X and it caused a problem; the rule is to do Y instead. Example entries from the case study: "Before sending any message, dump the conversation context (last 20–30 messages)." "After every send, verify exactly one message was sent."
ERRORS.md: technical failures. A tool, script, or environment broke; the entry records what happened and how it was diagnosed. Example: Telethon timeouts when the HTTP proxy resets every 3–4 requests; the mitigation is reconnecting every 2–3 operations.
FEATURE_REQUESTS.md: capabilities the agent is missing. Things the operator noticed during operation that would be useful but aren't necessary now. Example: Telethon daemon → webhook for real-time monitoring; Gmail integration through OpenClaw hooks.

The drawer is a staging area. Not every entry becomes a rule — some are observations that turn out to be one-offs, some are feature requests deferred to a roadmap. The promotion to main config is the discipline.
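Appending to the drawer can be sketched in a few lines of Python. This is a minimal sketch, not the case-study tooling: the file names come from the text above, while the `Entry` fields, the kind labels, the `pattern` tag, and the Markdown layout of an entry are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import Path

# File names are from the article; the kind labels are hypothetical.
DRAWER_FILES = {
    "behaviour": "LEARNINGS.md",
    "error": "ERRORS.md",
    "feature": "FEATURE_REQUESTS.md",
}


@dataclass
class Entry:
    kind: str      # one of the DRAWER_FILES keys
    pattern: str   # short tag used later to spot repeats (assumed convention)
    summary: str   # what happened, in a sentence or two
    day: date


def format_entry(entry: Entry) -> str:
    """Render one entry as a small Markdown section (assumed layout)."""
    return f"## {entry.day.isoformat()} [{entry.pattern}]\n{entry.summary}\n\n"


def append_entry(root: Path, entry: Entry) -> Path:
    """Append the entry to the matching file under root/.learnings/."""
    drawer = root / ".learnings"
    drawer.mkdir(parents=True, exist_ok=True)
    target = drawer / DRAWER_FILES[entry.kind]
    with target.open("a", encoding="utf-8") as fh:
        fh.write(format_entry(entry))
    return target


if __name__ == "__main__":
    append_entry(
        Path("."),
        Entry("error", "proxy-timeout",
              "Telethon timed out after the HTTP proxy reset.", date.today()),
    )
```

The log is append-only on purpose: nothing in the drawer is edited or deduplicated at write time, so repeats stay visible when the operator reviews it.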

The promotion step

When an entry has accumulated enough evidence — usually after the same pattern shows up twice — the operator promotes it. The destination depends on the kind of rule:

A behavioural correction (tone, what to say, what never to say) goes into SOUL.md as a hard prohibition. Examples: "Never use an exclamation mark." "Always use the feminine grammatical gender." See persona-through-prohibitions.
A process correction (when to act, in which order, with which sequence) goes into AGENTS.md. Examples: "Before sending in a forum chat, verify the topic ID matches the subject." "After every send, check that exactly one message left."
A technical workaround (how to use a tool, how to handle a failure) goes into TOOLS.md. Example: "Reset http_proxy env vars to empty; pass proxy directly into Telethon constructor."
A schedule change goes into HEARTBEAT.md. Example: "Crons run daily, including weekends", promoted after the agent missed Saturday reminders.

The promotion is small: one or two lines. The new rule includes its provenance, often as a comment: (after the 22.03 incident with a three-message duplicate in chat X). Provenance is what lets the operator audit the rules months later and remove ones whose original cause no longer applies.
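The routing and the provenance comment can be sketched as a small helper. This is a hedged illustration: the four destination files are the ones named above, but the kind labels, the `promote` and `format_rule` names, and the "- rule (after ...)" line format are assumptions, not the case-study agent's actual tooling.

```python
from pathlib import Path

# Destination per rule kind, as described in the article.
# The kind labels themselves are hypothetical.
DESTINATIONS = {
    "behaviour": "SOUL.md",
    "process": "AGENTS.md",
    "technical": "TOOLS.md",
    "schedule": "HEARTBEAT.md",
}


def format_rule(rule: str, provenance: str) -> str:
    """One config line: the rule, then its provenance in parentheses."""
    if not provenance.strip():
        raise ValueError("refusing to promote a rule without provenance")
    return f"- {rule.strip()} (after {provenance.strip()})\n"


def promote(config_dir: Path, kind: str, rule: str, provenance: str) -> Path:
    """Append the rule to the config file that matches its kind."""
    target = config_dir / DESTINATIONS[kind]
    line = format_rule(rule, provenance)
    existing = target.read_text(encoding="utf-8") if target.exists() else ""
    if line in existing:
        return target  # already promoted; keep the operation idempotent
    with target.open("a", encoding="utf-8") as fh:
        if existing and not existing.endswith("\n"):
            fh.write("\n")  # avoid gluing the rule onto an unterminated line
        fh.write(line)
    return target
```

Refusing an empty provenance at the point of promotion is one way to make the discipline mechanical rather than a matter of memory.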

A worked example — one month of precedent

The case-study agent's config grew by these rules in the first month of operation, each tied to a specific incident:

| Incident | Promoted rule | Destination |
|---|---|---|
| Triple-sent message after proxy timeout | After every send, verify one message left. If duplicated, delete immediately. | HEARTBEAT.md |
| Leaked name of a deleted project | Deleted projects are removed from everywhere: files, memory, tools, persona. Never mention. | SOUL.md |
| UTC vs MSK confusion in deadlines | Always compute and display in MSK. Never confuse with UTC. | SOUL.md |
| Missed Saturday reminders | Crons run daily, including weekends. | HEARTBEAT.md |
| 18 template replies without reading chat history | Before any send, dump the last 20–30 messages of the chat. | HEARTBEAT.md |
| Forum topic-ID mismatch | Before sending to a forum chat, verify the topic ID matches the subject. | AGENTS.md |
| Wrong-context status update to client | Status responses to client require operator approval. | AGENTS.md |

Each rule is a single line in the config. Together they are the accumulated experience of the deployment, encoded in a form the agent reads at each cron firing. The config is short — twenty to forty lines — because the rules are specific. The body of "things the agent has learned" lives in .learnings/ and stays there unless the pattern is general enough to promote.

The case-law analogue

The pattern's nearest analogue outside computing is common-law jurisprudence. Each case produces a ruling; the ruling becomes precedent; over time the body of precedent shapes the behaviour of the court without ever being codified upfront. The strength of the analogy is in how the rules accumulate — case by case, with provenance — and how they are narrow: each rule covers the specific situation that produced it, not a hypothetical wider class.

The weakness of the analogue is that case law has a doctrine of stare decisis: courts defer to past rulings. Agent configurations need active maintenance: when an environment changes, rules tied to the old environment need to be retired, not deferred to. The case-study config has examples of retired rules: "monitor weekdays only" was retired the day "monitor daily including weekends" was promoted.
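Retiring a rule is the inverse of promoting one. The sketch below assumes the config is plain text with one rule per line; the `retire_rule` name and the substring match are illustrative choices, not how the case-study operator actually edits the files.

```python
from typing import Optional


def retire_rule(config_text: str, old_rule: str,
                new_line: Optional[str] = None) -> str:
    """Drop every line containing old_rule.

    If new_line is given, it takes the place of the first retired line,
    so a superseding rule lands where the old one used to be.
    """
    kept = []
    replaced = False
    for line in config_text.splitlines():
        if old_rule in line:
            if new_line is not None and not replaced:
                kept.append(new_line)
                replaced = True
            continue  # the retired line itself is never kept
        kept.append(line)
    return "\n".join(kept) + ("\n" if kept else "")


if __name__ == "__main__":
    before = "- Monitor weekdays only.\n- Always compute and display in MSK.\n"
    after = retire_rule(
        before,
        "weekdays only",
        "- Crons run daily, including weekends. (after missed Saturday reminders)",
    )
    print(after)
```

Replacing in place, instead of deleting and appending, keeps the superseding rule next to its neighbours, which makes the change easier to read in a diff.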

When the pattern fits

Long-running agent deployments where the operating environment is too complex for upfront design.
High-trust contexts where the operator can be in the loop on each incident and is the one analysing and promoting rules.
Systems with a clear distinction between configuration and code — files the agent reads at startup, separable from the agent's logic.

It is weakest in fully unattended systems where there is no operator to do the promotion step. There, you either need a different mechanism (LLM-based self-reflection, automated rule generation) or you stay with upfront design and accept its under-coverage.

Two failure modes

Promotion without provenance. A rule lands in the config with no record of the incident that produced it. Months later, the operator can't tell whether the rule is still relevant. Fix: every promoted rule includes a one-line comment naming the incident.
Drawer without promotion. Incidents accumulate in .learnings/ but never make it into the main config. The agent keeps making the same mistakes. Fix: the operator's job is the promotion step; review the drawer at a known cadence (the case-study operator does this weekly).
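Both failure modes can be caught by a small audit run at the review cadence. The sketch below rests on two assumed conventions that are not specified in the case study: config rules are lines starting with "- " and ending in an "(after ...)" provenance note, and each drawer entry begins with a header of the form "## <date> [<pattern-tag>]". The function names are hypothetical.

```python
import re
from collections import Counter

# Assumed conventions, see the lead-in: "(after ...)" closes a rule line,
# and "## <date> [<tag>]" opens a drawer entry.
PROVENANCE = re.compile(r"\(after [^)]+\)\s*$")
ENTRY_HEADER = re.compile(r"^## \S+ \[([^\]]+)\]", re.MULTILINE)


def rules_without_provenance(config_text: str) -> list[str]:
    """Rule lines (starting with '- ') that carry no '(after ...)' note."""
    return [
        line for line in config_text.splitlines()
        if line.startswith("- ") and not PROVENANCE.search(line)
    ]


def promotion_candidates(drawer_text: str, promoted: set[str],
                         threshold: int = 2) -> list[str]:
    """Pattern tags logged at least `threshold` times and not yet promoted."""
    counts = Counter(ENTRY_HEADER.findall(drawer_text))
    return sorted(
        tag for tag, n in counts.items()
        if n >= threshold and tag not in promoted
    )
```

The default threshold of two mirrors the rule of thumb above that a pattern is usually promoted once it has shown up twice; the set of already-promoted tags has to be maintained by the operator.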