Persona Through Prohibitions — Engineering an AI Identity by What It May Never Do
A team deploys an AI agent that has to read and reply in a real work chat. The brief is: be friendly, professional, a bit warm, like a junior producer. By message fifty the agent is doing what LLMs do: dropping in an exclamation mark when it gets excited about a fast reply, opening with "Of course! Happy to help.", reaching for an em-dash because the prose feels nicer with one, slipping into the masculine past tense (in Russian) when it forgets that the character is a woman. None of these is objectively wrong; together they make the agent read as an AI. The persona collapses because the rules were prescriptive ("be friendly"), and friendly is whatever the underlying model thinks friendly looks like.
What follows is the pattern I call Persona Through Prohibitions. Identity does not survive prescriptions like "be warm" or "match the team's energy". It survives a small, named set of hard prohibitions, each one targeting a specific surface where the underlying model's default behaviour would unmask the persona. The prescriptions can stay, since they set the colour, but the load-bearing part is the list of things the agent may never do, written explicitly in the system prompt with banned phrases included verbatim.
The mechanism is small. The persona file (in the OpenClaw deployment, SOUL.md) contains a section titled something like "Hard rules" followed by a list: never use an exclamation mark; never say "of course", "with pleasure", "happy to help"; never use a long em-dash, only a short hyphen or en-dash; never use an emoji except a rare 👍; in Russian, women's grammatical gender always — поняла, проверила, отправила, never the masculine forms. The list is short. Each rule names a specific phrase or character that the model would otherwise generate by default. The rest of the persona — tone, role, what to talk about — sits on top of this floor and is allowed to vary; the floor never does.
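The floor described above lives in the system prompt, but the same list can also be written down as data and used to audit outgoing messages. The following is a minimal sketch under that assumption, not the OpenClaw implementation: the names `BANNED_PHRASES`, `BANNED_CHARS`, `ALLOWED_EMOJI` and `violations` are hypothetical, and the emoji pattern is a rough approximation that covers only the main pictograph blocks.

```python
import re

# Hypothetical encoding of the "Hard rules" floor. The rule contents come from
# the persona list in the text; the names and structure are illustrative only.
BANNED_PHRASES = ("of course", "with pleasure", "happy to help")
BANNED_CHARS = {"!": "exclamation mark", "\u2014": "em-dash"}
ALLOWED_EMOJI = {"\U0001F44D"}  # the rare thumbs-up

# Rough emoji detector (assumption): main pictograph blocks, not every emoji.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def violations(message: str) -> list[str]:
    """Return the hard-rule violations found in an outgoing message."""
    found = []
    lowered = message.lower()
    for phrase in BANNED_PHRASES:
        if phrase in lowered:
            found.append(f"banned phrase: {phrase}")
    for char, name in BANNED_CHARS.items():
        if char in message:
            found.append(f"banned character: {name}")
    for emoji in EMOJI_RE.findall(message):
        if emoji not in ALLOWED_EMOJI:
            found.append(f"banned emoji: {emoji}")
    return found


print(violations("Of course! Happy to help."))
print(violations("Received, checking now."))
```

A check like this does not replace the prompt-level rules. It only makes slips countable, which is useful when deciding whether a rule is actually holding.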
Why prohibitions hold and prescriptions drift
Generative models are trained to be helpful, fluent, warm. Prescriptive guidance ("be professional but friendly") collides with the model's defaults at the prompt level and gets compiled into something near the model's own idea of friendly-professional, which is mostly the model's voice, slightly tinted. Prohibitions work at a different layer: they name specific characters or phrases and, in effect, remove them from the legal output space. "Never use an exclamation mark" is a syntactic constraint the model can apply uniformly; "be warm" is an aesthetic instruction the model interprets afresh each turn.
The result is asymmetric consistency. A prescription-only persona drifts run to run: today the agent is a bit more enthusiastic, tomorrow it's terser, by Friday it has reverted to assistant-default. A prohibition-grounded persona stays inside the same envelope for thousands of messages, because the envelope is defined by what cannot be inside it, not by what should be.
The unmaskability surface
In Russian, past-tense verbs are gendered. Поняла (I, woman, understood) and понял (I, man, understood) are one suffix apart. A single masculine verb form from an agent whose character is described as a woman is enough to break the illusion — the moment registers as a typo to a casual reader and as a glitch to a careful one. The Russian agent in this case study lives in a chat with thirty native speakers; if it slips into the masculine even once a week, the persona is finished.
The fix is the hardest possible prohibition. The persona file states: "Women's grammatical gender ALWAYS — поняла, проверила, отправила, never понял." The example forms are listed verbatim because the prohibition has to be specific enough that the model cannot satisfy it with a near-miss. A vague instruction ("speak as a woman") would not catch it; a list of forbidden forms does.
This generalises. Any persona has a small set of unmaskability surfaces — places where one slip is enough to break the character. For a Russian persona, gender of past-tense verbs is one. For an English persona, the agent's default em-dash habit is another. For a "not-a-bot" persona, the phrase "As an AI" is a fatal slip. The prohibition list should enumerate every unmaskability surface explicitly, with example phrases.
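The gender surface can be made concrete with a small detector for masculine past-tense forms. This is a sketch under stated assumptions: the three verb pairs mirror the examples quoted from the persona file, the names `MASCULINE_TO_FEMININE` and `masculine_slips` are hypothetical, and the check is deliberately naive. It would also flag a legitimate third-person use such as "он понял", so it suits reviewing the agent's first-person replies rather than arbitrary text.

```python
import re

# Masculine -> feminine past-tense forms. The pairs mirror the persona file's
# examples; a real list would be longer (assumption).
MASCULINE_TO_FEMININE = {
    "понял": "поняла",
    "проверил": "проверила",
    "отправил": "отправила",
}

# \b is Unicode-aware for str patterns, so "понял" does not match inside "поняла".
MASCULINE_RE = re.compile(
    r"\b(" + "|".join(MASCULINE_TO_FEMININE) + r")\b", re.IGNORECASE
)


def masculine_slips(message: str) -> list[tuple[str, str]]:
    """Return (found form, expected feminine form) for each masculine slip."""
    return [
        (m.group(0), MASCULINE_TO_FEMININE[m.group(0).lower()])
        for m in MASCULINE_RE.finditer(message)
    ]


print(masculine_slips("Понял, уже отправил."))
print(masculine_slips("Поняла, проверила, отправила."))
```

The same shape applies to any other unmaskability surface: a short table of forbidden forms, matched on whole words, with the expected replacement alongside.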
Cycle: incident → prohibition
Prohibitions are not designed upfront. They are added in response to incidents — moments when the persona slipped in a way an observer noticed. In the case study, a triple-sent message ("OK, received…" three times because the proxy timed out and the script retried) produced a prohibition: "After every send, verify exactly one message left. If duplicated, delete immediately, no questions." A leaked project name from deleted memory produced a prohibition: "Project <X> is completely removed from everywhere: files, memory, tools, persona, tasks, chats. Never mention this name." A UTC/MSK time confusion produced a prohibition: "Always compute and display time in MSK. Never confuse with UTC."
The growth pattern works by precedent: each incident produces one new line in the persona file. After a month of operation the file has 20–30 hard rules, each one tied to a specific past mistake. The file stays short because the rules are specific. (See incident-driven-configuration: same engine, slightly wider scope.)
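The incident-to-rule cycle can be sketched as an append-only register in which every rule stays attached to the slip that produced it. The class and method names below (`Incident`, `PersonaFloor`, `record`, `render`) are hypothetical, and the rendered layout is only an assumption about how the section might look in the persona file. The rule wording is quoted from the case study.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Incident:
    what_happened: str  # the slip an observer noticed
    rule: str           # the one line it adds to the persona file


@dataclass
class PersonaFloor:
    incidents: list[Incident] = field(default_factory=list)

    def record(self, what_happened: str, rule: str) -> None:
        """Append exactly one rule per incident; existing rules are not edited."""
        self.incidents.append(Incident(what_happened, rule))

    def render(self) -> str:
        """Render the section as it might appear in the persona file."""
        return "\n".join(["Hard rules"] + [f"- {i.rule}" for i in self.incidents])


floor = PersonaFloor()
floor.record(
    "message sent three times after a proxy timeout",
    "After every send, verify exactly one message left. "
    "If duplicated, delete immediately, no questions.",
)
floor.record(
    "UTC/MSK time confusion",
    "Always compute and display time in MSK. Never confuse with UTC.",
)
print(floor.render())
```

Keeping the incident next to the rule means a later reader can see why each line exists before deciding whether it is still needed.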
What prescriptions are still good for
The prohibition-grounded persona still needs a role and a default tone. The case-study agent's persona file starts with: dry-professional by default, warmer if the client is warm. That sentence does work: it shapes the surface above the floor. What it cannot do is hold the persona alone. The prohibitions hold it; the prescription colours it.
A useful test: if every prescriptive line in the persona were deleted, would the persona still be recognisable? With a strong prohibition floor, yes: not-quite-bubbly, no exclamation marks, no "of course", short responses, and women's grammatical gender already add up to a recognisable voice. Without prohibitions, prescriptions cannot do the work.
When the pattern fits
The pattern is strongest in three settings:
- Agents that must pass as human in chats where coworkers are not informed of the AI nature. The case study sits here.
- Brand voice in long-running narrative content where consistency across chapters, episodes, or seasons matters more than expressive range. The lock-layer-pattern-paper formalises the wider case of preservation-as-layer; persona-through-prohibitions is its agent-identity instance.
- Customer-facing assistants where any AI tic ("I'm here to help!", "Let me know if you have any other questions!") would signal a chatbot and lose user trust.
It is weakest where the persona is supposed to vary — creative-collaborator agents, character chatbots, role-playing systems. There, prescriptions are the load-bearing part and prohibitions are the safety floor.
Two failure modes
- Too few prohibitions. The persona file contains five rules. Three of the actual unmaskability surfaces are uncovered. The persona drifts on those three surfaces and the work to find the slip is reactive. Fix: enumerate every unmaskability surface explicitly before deployment; expect to add 3–5 more in the first month from observed incidents.
- Prohibitions without examples. Rules like "no overly chatty greetings" leave the model free to interpret. The fix: list the banned phrases and surfaces verbatim. "Never write Of course!, Happy to help!, Great question!, Let me know if you need anything else!" is enforceable. "Be sober in tone" is not.
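The second failure mode can be caught mechanically if each rule is stored together with the verbatim items it bans. The sketch below assumes such a structure. `Rule`, its `banned` field and `unenforceable` are hypothetical names, and an empty `banned` tuple is used as a simple proxy for "leaves the model free to interpret".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    text: str
    banned: tuple[str, ...] = ()  # verbatim phrases or characters the rule forbids


def unenforceable(rules: list[Rule]) -> list[str]:
    """Return the text of every rule that names no verbatim banned item."""
    return [r.text for r in rules if not r.banned]


rules = [
    Rule("No overly chatty greetings"),
    Rule("Be sober in tone"),
    Rule(
        "Never write these greetings",
        ("Of course!", "Happy to help!", "Great question!",
         "Let me know if you need anything else!"),
    ),
]
print(unenforceable(rules))
```

Running a check like this before deployment lists the rules that still need examples attached.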
See also
- lock-layer-pattern-paper — the broader pattern of preservation-as-separable-layer; the persona-prohibition list is one instance
- two-stage-architect-pattern-paper — system-prompt charter discipline; the prohibition list is a charter for the agent's voice
- incident-driven-configuration — the cycle that grows the prohibition list over time
- openclaw-autonomous-agent-paper — full case study where this pattern was deployed