How to Write AI Role Instructions That Actually Work
You write detailed instructions for your AI agent. You’re specific about the role, the format, the quality expectations. And then the agent does something completely different in the next session.
The problem usually isn’t the AI. It’s the instructions. Specifically, it’s the soft language hiding inside instructions that look precise but aren’t.
We audited ten of our own role instruction files and found over 95 instances of language that an AI would interpret differently across sessions. Here’s what we found, the patterns behind it, and a framework for writing instructions that produce consistent behavior every time.
The Test: Could Two Sessions Interpret This Differently?
Before getting into patterns, here’s the universal test for any instruction you write:
If two different AI sessions could reasonably interpret this instruction differently, it’s too soft.
“Write in a professional tone” is soft. What counts as professional? Formal business English? Friendly but competent? Technical but approachable? Three sessions, three interpretations.
“Write at a 9th-12th grade reading level. Use technical vocabulary without defining common terms (prompt, token, API, model). Keep paragraphs to 2-4 sentences.” That’s hard. Any session reading this will produce similar output.
The 7 Patterns of Soft Language
After we categorized all 95+ instances, seven distinct patterns emerged.
1. Subjective Judgment Without Criteria
The instruction asks the AI to evaluate something but gives no rubric.
Soft vs Hard: Subjective Judgment
SOFT: "Only include high-quality content."
HARD: "Score every article on six dimensions (1-5 scale). Articles scoring below 75/100 weighted go back for revision. Articles above 85 move to review with confidence."

SOFT: "Make sure the article is good enough to publish."
HARD: "Run the final checklist: valid YAML frontmatter, 800-1500 word count, no em dashes, no hype words, hero image marker present, voice matches brand profile."
The fix: replace “good” and “quality” with measurable criteria. If you can’t measure it, define it with examples of what passes and what doesn’t.
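To make the contrast concrete, here's a minimal sketch of that "hard" final checklist as an executable quality gate. The field names (`frontmatter`, `body`) and the exact checks are illustrative assumptions, not a real pipeline:

```python
def passes_quality_gate(article: dict) -> list[str]:
    """Return a list of failures; an empty list means the article passes.

    Hypothetical gate mirroring the hard checklist: frontmatter present,
    800-1500 words, no em dashes.
    """
    failures = []
    body = article.get("body", "")
    word_count = len(body.split())
    if not article.get("frontmatter"):          # valid YAML frontmatter present
        failures.append("missing frontmatter")
    if not 800 <= word_count <= 1500:           # hard word-count range
        failures.append(f"word count {word_count} outside 800-1500")
    if "—" in body:                             # no em dashes
        failures.append("contains em dash")
    return failures
```

The point isn't that the AI runs this code; it's that a checklist this concrete *could* be run as code. That's the bar for "hard."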
2. Circular Definitions
The instruction defines a concept using itself.
Soft vs Hard: Circular Definitions
SOFT: "Ensure content is brand-appropriate."
HARD: "Check content against the brand's voice profile (memory/roles/_shared/voice-af1.md). Verify: reading level matches target (9th-12th grade), no forbidden patterns listed in the profile, CTA style matches brand spec."

SOFT: "Write in the brand voice."
HARD: "Apply the AF1 voice profile: peer-to-peer tone, skip basics, include actual prompts/configs, be opinionated, short paragraphs."
The fix: unpack the circular term into its component requirements. “Brand-appropriate” means nothing until you specify what the brand requires.
3. Undefined Scope
The instruction uses words like “relevant,” “appropriate,” “as needed” without boundaries.
Soft vs Hard: Undefined Scope
SOFT: "Add relevant context to the card."
HARD: "Add: brand assignment (AF99/AF1/CLife), type (article/checklist/quickstart), angle (2+ sentences), key points (3-5 bullets), notes (related articles, timing, sibling potential)."

SOFT: "Check other sources as needed."
HARD: "Check the target site's published articles for topic overlap. If overlap exists, verify the new angle is sufficiently different (covers at least 2 points the existing article doesn't)."
The fix: list exactly what’s in scope. If the scope varies by situation, define the situations and what’s in scope for each.
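An enumerated scope is mechanically checkable. Here's a hedged sketch of the card-context example as a required-field check; the field names mirror the example above and are assumptions:

```python
def missing_context(card: dict) -> list[str]:
    """List the scope items a card is missing, per the explicit checklist."""
    problems = []
    if card.get("brand") not in {"AF99", "AF1", "CLife"}:
        problems.append("brand must be AF99, AF1, or CLife")
    if card.get("type") not in {"article", "checklist", "quickstart"}:
        problems.append("type must be article, checklist, or quickstart")
    if len(card.get("key_points", [])) not in range(3, 6):  # 3-5 bullets
        problems.append("need 3-5 key points")
    return problems
```

If you can't write a check like this for an instruction, the scope is still undefined.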
4. Unquantified Thresholds
The instruction uses “some,” “enough,” “a few,” “too many” without numbers.
Soft vs Hard: Unquantified Thresholds
SOFT: "Use shortcodes liberally."
HARD: "Use 7-12 shortcodes per article. Promptboxes for every copy-paste prompt, callouts for tips/warnings, checklists for action steps."

SOFT: "Keep articles a reasonable length."
HARD: "800-1500 words for articles, 400-800 for checklists/quickstarts."

SOFT: "Don't use too many tags."
HARD: "1-3 tags per article: exactly 1 broad (required), 0-1 general (optional), 0-1 format (optional)."
The fix: put a number on it. If you can’t decide on an exact number, give a range with clear upper and lower bounds.
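Numeric thresholds reduce to simple comparisons. A minimal sketch, assuming the shortcode and tag counts from the example above are already counted:

```python
def within_thresholds(shortcode_count: int, tag_counts: dict) -> bool:
    """Apply the numeric bounds: 7-12 shortcodes; 1-3 tags with exactly
    one broad tag, at most one general, at most one format."""
    if not 7 <= shortcode_count <= 12:
        return False
    broad = tag_counts.get("broad", 0)
    general = tag_counts.get("general", 0)
    fmt = tag_counts.get("format", 0)
    return broad == 1 and general <= 1 and fmt <= 1
```

Note how "exactly 1 broad (required)" becomes `== 1` while "0-1 general (optional)" becomes `<= 1`; the soft versions ("liberally," "too many") have no translation at all.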
5. Speculative Rules
The instruction describes something that “might” or “could” happen without stating when to act.
Soft vs Hard: Speculative Rules
SOFT: "If the card seems weak, consider archiving it."
HARD: "Archive a card when: topic is already well-covered on the target site, angle fails the test 'If I read this I will learn [specific outcome]', premise was time-sensitive and the window has passed, or it duplicates an existing card."

SOFT: "You might want to check for duplicates."
HARD: "Before creating any new card, scan IDEAS for existing cards with the same topic. If a match exists, merge new context into the existing card instead of creating a duplicate."
The fix: replace “might” and “consider” with concrete trigger conditions. If X happens, do Y. No ambiguity about when to act.
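Trigger conditions are just a disjunction: if any one fires, act. A sketch of the archive rule above, with flag names as illustrative assumptions:

```python
def should_archive(card: dict) -> bool:
    """'If X happens, do Y': archive when any concrete trigger fires."""
    triggers = [
        card.get("already_covered", False),        # topic well-covered on site
        not card.get("passes_learning_test", True),  # fails "I will learn X"
        card.get("window_passed", False),          # time-sensitive window gone
        card.get("duplicate_of") is not None,      # duplicates an existing card
    ]
    return any(triggers)
```

"Seems weak" can't be written this way, which is exactly why two sessions interpret it differently.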
6. Metaphorical Instructions
The instruction uses a metaphor instead of literal steps.
The 'Soft Headline, Hard Body' Pattern
Example: “Be the guardrail, not the gate” is a great principle name. But the instruction file needs to follow it with: “Flag quality concerns to the human reviewer with specific issues noted. Do not block pipeline movement. Only reject cards that fail the quality gate (missing brand, missing angle, duplicate of existing card).”
The fix: keep the metaphor as a memorable headline, then spell out the literal steps it stands for. A metaphor without its literal expansion is just another soft instruction.
7. Missing Decision Matrices
The instruction presents options but no criteria for choosing between them.
Soft vs Hard: Decision Matrices
SOFT: "Choose the right approach for the situation."
HARD: "If card has 3+ key points still relevant after research: refresh the angle and update. If fewer than half are relevant: archive with reason noted. If all points are outdated: archive and create a new card if the core topic still matters."

SOFT: "Use your judgment on priority."
HARD: "Priority order: (1) cards with due_date in the next 7 days, (2) foundation articles other articles need to link to, (3) gap-fill articles for sites with fewer than 10 published pieces, (4) flywheel articles that will spawn 3+ new ideas, (5) quick wins under 800 words."
The fix: for every fork in the decision tree, define the branching criteria. “When A, do X. When B, do Y. When neither, do Z.”
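A well-specified priority order is an explicit ranking function: two sessions (or two runs) order the same backlog identically. Here's a sketch of the five-rung ladder above; the field names are assumptions:

```python
from datetime import date, timedelta

def priority(card: dict, today: date) -> int:
    """Lower number = worked first, following the five-rung ladder."""
    due = card.get("due_date")
    if due is not None and due <= today + timedelta(days=7):
        return 1                                    # due within 7 days
    if card.get("is_foundation"):
        return 2                                    # others must link to it
    if card.get("site_published_count", 99) < 10:
        return 3                                    # gap-fill for thin sites
    if card.get("expected_spawned_ideas", 0) >= 3:
        return 4                                    # flywheel article
    if card.get("word_target", 9999) < 800:
        return 5                                    # quick win
    return 6                                        # everything else
```

Sorting a backlog with `sorted(cards, key=lambda c: priority(c, today))` then gives the same order every time, which "use your judgment" never will.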
When Soft Language Is Actually Fine
Not everything needs to be hard. Two places where soft language works:
Vision statements and north stars. “Make AI useful for people who don’t consider themselves tech people” is deliberately soft. It’s a compass heading, not a turn-by-turn direction. Vision statements set direction; operational instructions execute it.
Summaries before detail. Starting a section with “Keep the pipeline balanced” before diving into the exact 40/30/30 brand split and monitoring process is fine. The soft summary helps the reader (or the AI) understand the intent before the specifics. Just make sure the specifics follow.
The pattern: soft for context, hard for action. If the line tells the AI what to think about, it can be soft. If it tells the AI what to do, it needs to be hard.
The Audit Process
Run this on your own instruction files:
- Read every line and apply the two-session test
- Highlight anything a second session might interpret differently
- Categorize each soft spot into one of the 7 patterns
- Rewrite with the pattern-specific fix
- Test the rewritten instructions in a fresh session
- Compare output consistency before and after
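Step 2 of the audit can be bootstrapped with a rough first-pass scanner that flags lines containing common soft-language markers. The marker list below is an assumption drawn from the seven patterns; it supplements the two-session test, it doesn't replace it:

```python
import re

# Illustrative soft-language markers; extend with your own recurring offenders.
SOFT_MARKERS = re.compile(
    r"\b(appropriate|relevant|as needed|high-quality|good enough|"
    r"some|a few|too many|might|consider|reasonable)\b",
    re.IGNORECASE,
)

def flag_soft_lines(text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs that contain a soft-language marker."""
    return [
        (i, line)
        for i, line in enumerate(text.splitlines(), start=1)
        if SOFT_MARKERS.search(line)
    ]
```

A regex can't catch circular definitions or missing decision matrices, so treat its output as candidates for the two-session test, not verdicts.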
We found an average of 9-10 soft spots per instruction file across ten files. The highest-impact fixes were in Pattern 1 (subjective judgment) and Pattern 4 (unquantified thresholds) because those patterns appear most frequently and cause the most visible inconsistency.
After hardening our files, role behavior became noticeably more consistent across sessions. The editor catches the same issues every time. The content writer hits the word count target. The curator applies the same quality gate. That’s the goal: same instructions, same behavior, regardless of which session runs them.
See the System
These role instructions are part of a larger multi-agent architecture.