How to Write AI Role Instructions That Actually Work
You write detailed instructions for your AI agent. You’re specific about the role, the format, the quality expectations. And then the agent does something completely different in the next session.
The problem usually isn’t the AI. It’s the instructions. Specifically, it’s the soft language hiding inside instructions that look precise but aren’t.
We audited ten of our own role instruction files and found over 95 instances of language that an AI would interpret differently across sessions. Here’s what we found, the patterns behind it, and a framework for writing instructions that produce consistent behavior every time.
The Test: Could Two Sessions Interpret This Differently?
Before getting into patterns, here’s the universal test for any instruction you write:
If two different AI sessions could reasonably interpret this instruction differently, it’s too soft.
“Write in a professional tone” is soft. What counts as professional? Formal business English? Friendly but competent? Technical but approachable? Three sessions, three interpretations.
“Write at a 9th-12th grade reading level. Use technical vocabulary without defining common terms (prompt, token, API, model). Keep paragraphs to 2-4 sentences.” That’s hard. Any session reading this will produce similar output.
The 7 Patterns of Soft Language
After we categorized all 95+ instances, seven distinct patterns emerged.
1. Subjective Judgment Without Criteria
The instruction asks the AI to evaluate something but gives no rubric.
Soft vs Hard: Subjective Judgment
SOFT: "Only include high-quality content."
HARD: "Score every article on six dimensions (1-5 scale). Articles scoring below 75/100 weighted go back for revision. Articles above 85 move to review with confidence."

SOFT: "Make sure the article is good enough to publish."
HARD: "Run the final checklist: valid YAML frontmatter, 800-1500 word count, no em dashes, no hype words, hero image marker present, voice matches brand profile."
The fix: replace “good” and “quality” with measurable criteria. If you can’t measure it, define it with examples of what passes and what doesn’t.
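To make the contrast concrete, here's a minimal sketch of that "hard" final checklist as an executable quality gate. The field names (`frontmatter`, `body`) and the exact checks are illustrative assumptions, not a real pipeline:

```python
def passes_quality_gate(article: dict) -> list[str]:
    """Return a list of failures; an empty list means the article passes.

    Hypothetical gate mirroring the hard checklist: frontmatter present,
    800-1500 words, no em dashes.
    """
    failures = []
    body = article.get("body", "")
    word_count = len(body.split())
    if not article.get("frontmatter"):          # valid YAML frontmatter present
        failures.append("missing frontmatter")
    if not 800 <= word_count <= 1500:           # hard word-count range
        failures.append(f"word count {word_count} outside 800-1500")
    if "—" in body:                             # no em dashes
        failures.append("contains em dash")
    return failures
```

The point isn't that the AI runs this code; it's that a checklist this concrete *could* be run as code. That's the bar for "hard."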
2. Circular Definitions
The instruction defines a concept using itself.
Soft vs Hard: Circular Definitions
SOFT: "Ensure content is brand-appropriate."
HARD: "Check content against the brand's voice profile (memory/roles/_shared/voice-af1.md). Verify: reading level matches target (9th-12th grade), no forbidden patterns listed in the profile, CTA style matches brand spec."

SOFT: "Write in the brand voice."
HARD: "Apply the AF1 voice profile: peer-to-peer tone, skip basics, include actual prompts/configs, be opinionated, short paragraphs."
The fix: unpack the circular term into its component requirements. “Brand-appropriate” means nothing until you specify what the brand requires.
3. Undefined Scope
The instruction uses words like “relevant,” “appropriate,” “as needed” without boundaries.
Soft vs Hard: Undefined Scope
SOFT: "Add relevant context to the card."
HARD: "Add: brand assignment (AF99/AF1/CLife), type (article/checklist/quickstart), angle (2+ sentences), key points (3-5 bullets), notes (related articles, timing, sibling potential)."

SOFT: "Check other sources as needed."
HARD: "Check the target site's published articles for topic overlap. If overlap exists, verify the new angle is sufficiently different (covers at least 2 points the existing article doesn't)."
The fix: list exactly what’s in scope. If the scope varies by situation, define the situations and what’s in scope for each.
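An enumerated scope is mechanically checkable. Here's a hedged sketch of the card-context example as a required-field check; the field names mirror the example above and are assumptions:

```python
def missing_context(card: dict) -> list[str]:
    """List the scope items a card is missing, per the explicit checklist."""
    problems = []
    if card.get("brand") not in {"AF99", "AF1", "CLife"}:
        problems.append("brand must be AF99, AF1, or CLife")
    if card.get("type") not in {"article", "checklist", "quickstart"}:
        problems.append("type must be article, checklist, or quickstart")
    if len(card.get("key_points", [])) not in range(3, 6):  # 3-5 bullets
        problems.append("need 3-5 key points")
    return problems
```

If you can't write a check like this for an instruction, the scope is still undefined.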
4. Unquantified Thresholds
The instruction uses “some,” “enough,” “a few,” “too many” without numbers.
Soft vs Hard: Unquantified Thresholds
SOFT: "Use shortcodes liberally."
HARD: "Use 7-12 shortcodes per article. Promptboxes for every copy-paste prompt, callouts for tips/warnings, checklists for action steps."

SOFT: "Keep articles a reasonable length."
HARD: "800-1500 words for articles, 400-800 for checklists/quickstarts."

SOFT: "Don't use too many tags."
HARD: "1-3 tags per article: exactly 1 broad (required), 0-1 general (optional), 0-1 format (optional)."
The fix: put a number on it. If you can’t decide on an exact number, give a range with clear upper and lower bounds.
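Numeric thresholds reduce to simple comparisons. A minimal sketch, assuming the shortcode and tag counts from the example above are already counted:

```python
def within_thresholds(shortcode_count: int, tag_counts: dict) -> bool:
    """Apply the numeric bounds: 7-12 shortcodes; 1-3 tags with exactly
    one broad tag, at most one general, at most one format."""
    if not 7 <= shortcode_count <= 12:
        return False
    broad = tag_counts.get("broad", 0)
    general = tag_counts.get("general", 0)
    fmt = tag_counts.get("format", 0)
    return broad == 1 and general <= 1 and fmt <= 1
```

Note how "exactly 1 broad (required)" becomes `== 1` while "0-1 general (optional)" becomes `<= 1`; the soft versions ("liberally," "too many") have no translation at all.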
5. Speculative Rules
The instruction describes something that “might” or “could” happen without stating when to act.
Soft vs Hard: Speculative Rules
SOFT: "If the card seems weak, consider archiving it."
HARD: "Archive a card when: topic is already well-covered on the target site, angle fails the test 'If I read this I will learn [specific outcome]', premise was time-sensitive and the window has passed, or it duplicates an existing card."

SOFT: "You might want to check for duplicates."
HARD: "Before creating any new card, scan IDEAS for existing cards with the same topic. If a match exists, merge new context into the existing card instead of creating a duplicate."
The fix: replace “might” and “consider” with concrete trigger conditions. If X happens, do Y. No ambiguity about when to act.
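Trigger conditions are just a disjunction: if any one fires, act. A sketch of the archive rule above, with flag names as illustrative assumptions:

```python
def should_archive(card: dict) -> bool:
    """'If X happens, do Y': archive when any concrete trigger fires."""
    triggers = [
        card.get("already_covered", False),        # topic well-covered on site
        not card.get("passes_learning_test", True),  # fails "I will learn X"
        card.get("window_passed", False),          # time-sensitive window gone
        card.get("duplicate_of") is not None,      # duplicates an existing card
    ]
    return any(triggers)
```

"Seems weak" can't be written this way, which is exactly why two sessions interpret it differently.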
6. Metaphorical Instructions
The instruction uses a metaphor instead of literal steps.
The 'Soft Headline, Hard Body' Pattern
Example: “Be the guardrail, not the gate” is a great principle name. But the instruction file needs to follow it with: “Flag quality concerns to the human reviewer with specific issues noted. Do not block pipeline movement. Only reject cards that fail the quality gate (missing brand, missing angle, duplicate of existing card).”
The fix: keep the metaphor as a memorable headline, then spell out the literal steps it stands for. A metaphor without its literal expansion is just another soft instruction.
7. Missing Decision Matrices
The instruction presents options but no criteria for choosing between them.
Soft vs Hard: Decision Matrices
SOFT: "Choose the right approach for the situation."
HARD: "If card has 3+ key points still relevant after research: refresh the angle and update. If fewer than half are relevant: archive with reason noted. If all points are outdated: archive and create a new card if the core topic still matters."

SOFT: "Use your judgment on priority."
HARD: "Priority order: (1) cards with due_date in the next 7 days, (2) foundation articles other articles need to link to, (3) gap-fill articles for sites with fewer than 10 published pieces, (4) flywheel articles that will spawn 3+ new ideas, (5) quick wins under 800 words."
The fix: for every fork in the decision tree, define the branching criteria. “When A, do X. When B, do Y. When neither, do Z.”
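A well-specified priority order is an explicit ranking function: two sessions (or two runs) order the same backlog identically. Here's a sketch of the five-rung ladder above; the field names are assumptions:

```python
from datetime import date, timedelta

def priority(card: dict, today: date) -> int:
    """Lower number = worked first, following the five-rung ladder."""
    due = card.get("due_date")
    if due is not None and due <= today + timedelta(days=7):
        return 1                                    # due within 7 days
    if card.get("is_foundation"):
        return 2                                    # others must link to it
    if card.get("site_published_count", 99) < 10:
        return 3                                    # gap-fill for thin sites
    if card.get("expected_spawned_ideas", 0) >= 3:
        return 4                                    # flywheel article
    if card.get("word_target", 9999) < 800:
        return 5                                    # quick win
    return 6                                        # everything else
```

Sorting a backlog with `sorted(cards, key=lambda c: priority(c, today))` then gives the same order every time, which "use your judgment" never will.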
When Soft Language Is Actually Fine
Not everything needs to be hard. Two places where soft language works:
Vision statements and north stars. “Make AI useful for people who don’t consider themselves tech people” is deliberately soft. It’s a compass heading, not a turn-by-turn direction. Vision statements set direction; operational instructions execute it.
Summaries before detail. Starting a section with “Keep the pipeline balanced” before diving into the exact 40/30/30 brand split and monitoring process is fine. The soft summary helps the reader (or the AI) understand the intent before the specifics. Just make sure the specifics follow.
The pattern: soft for context, hard for action. If the line tells the AI what to think about, it can be soft. If it tells the AI what to do, it needs to be hard.
The Audit Process
Run this on your own instruction files:
- Read every line and apply the two-session test
- Highlight anything a second session might interpret differently
- Categorize each soft spot into one of the 7 patterns
- Rewrite with the pattern-specific fix
- Test the rewritten instructions in a fresh session
- Compare output consistency before and after
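Step 2 of the audit can be bootstrapped with a rough first-pass scanner that flags lines containing common soft-language markers. The marker list below is an assumption drawn from the seven patterns; it supplements the two-session test, it doesn't replace it:

```python
import re

# Illustrative soft-language markers; extend with your own recurring offenders.
SOFT_MARKERS = re.compile(
    r"\b(appropriate|relevant|as needed|high-quality|good enough|"
    r"some|a few|too many|might|consider|reasonable)\b",
    re.IGNORECASE,
)

def flag_soft_lines(text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs that contain a soft-language marker."""
    return [
        (i, line)
        for i, line in enumerate(text.splitlines(), start=1)
        if SOFT_MARKERS.search(line)
    ]
```

A regex can't catch circular definitions or missing decision matrices, so treat its output as candidates for the two-session test, not verdicts.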
We found an average of 9-10 soft spots per instruction file across ten files. The highest-impact fixes were in Pattern 1 (subjective judgment) and Pattern 4 (unquantified thresholds) because those patterns appear most frequently and cause the most visible inconsistency.
After hardening our files, role behavior became noticeably more consistent across sessions. The editor catches the same issues every time. The content writer hits the word count target. The curator applies the same quality gate. That’s the goal: same instructions, same behavior, regardless of which session runs them.
See the System
These role instructions are part of a larger multi-agent architecture.