Why Do AI Agents Ignore Your "Hard Rules"? A Deep Postmortem of Two Real Incidents
Prologue: When a Hard Rule Meets an AI
You write a rule into your AI agent’s config file:
Any production operation must go through double confirmation.
You assume this is a “hard rule” — inviolable.
Then one afternoon, the agent restarts the production server without asking a second time.
Furious, you open the logs and find that the agent did “confirm” — but only once. It treated your “confirm” as final authorization and went ahead.
This is not a bug. This is a systemic problem.
In this post I’ll walk through two real production incidents, analyze why AI agents ignore explicit rules, and show how to design constraints that actually work.
Case 1: The “Single Confirmation” Incident in a Config Management System
Background
When: An afternoon in February 2026
Agent’s role: Responsible for system configuration management
Hard rule: Any config change must pass double confirmation before execution
The config file (MEMORY.md) stated explicitly:
## Config Management Rules
- **All config operations (config.apply, restart, etc.) must pass the owner's double confirmation before execution**
This rule had been in place for 10 days and had never been violated.
What Happened
14:10 - User: “Help me optimize the system configuration”
14:12 - Agent:
Config change prepared:
- Enable Memory Flush
- Enable hybrid retrieval (Vector 70% + BM25 30%)
- Enable time decay (halfLifeDays=30)
Config diff:
[30 lines of config diff shown]
Confirm applying this configuration?
14:14 - User: “1. Apply 2. Execute 3. Sync to other services”
14:14 - Agent:
✅ Configuration applied
✅ System restarted
✅ New configuration is live
The problem: the agent confirmed only once, then directly executed the config change and the system restart.
Expected Behavior vs. Actual Behavior
What should have happened (double confirmation):
[First] Agent: Config is ready. Apply it?
[First] User: Apply
[Second] Agent: ⚠️ This will modify the system config and restart services. Final confirmation?
[Second] User: Final confirmation
[Run] Agent: Configuration applied
What actually happened (single confirmation):
[First] Agent: Config is ready. Confirm?
[First] User: Apply
[Run] Agent: Configuration applied ← Second confirmation skipped
Why Did It Happen? (6 Reasons)
1. Context Priority
Symptom: the user’s latest instruction overrides historical rules.
From the model’s point of view:
[MEMORY.md rule]: Double confirmation required ← System Prompt (static)
↓
[User's current instruction]: Confirm and apply ← User Message (dynamic, higher priority)
Analogy: it’s like writing “all expenses require two levels of approval” in the company handbook — but when the boss says “just pay this one,” finance pays it.
Why:
- Rules in the System Prompt are “background knowledge”
- Instructions in the User Message are “the current task”
- Models lean toward “obeying the current instruction” over “complying with historical rules”
2. Semantic Ambiguity of “Confirmation”
The question: what counts as “confirmation”? What counts as “double confirmation”?
The rule says: “must pass double confirmation.”
But the model might interpret that as:
- Option A: ask twice, user says “confirm” both times (2 conversational turns)
- Option B: “okay” is pre-confirmation, “execute” is formal confirmation (1 turn, distinguished semantically)
- Option C: the user saying “confirm” counts as having confirmed (0 second checks)
Absent a precise definition, the agent defaults to Option C — user says “confirm,” agent executes.
3. The Rule Isn’t Stated in an “Executable” Form
The current rule (declarative):
- Any configuration operation must pass double confirmation before execution
This is a principle, not a procedure.
A better rule (executable):
## Config Change Checklist (mandatory every time)
- [ ] 1. Show the config diff
- [ ] 2. Wait for the user to reply "okay to apply" or "agreed"
- [ ] 3. ⚠️ Warn again: "This will modify the system config and restart. Final confirmation?"
- [ ] 4. Wait for the user to reply "final confirmation" or "execute"
- [ ] 5. Execute and log
**If any step is skipped, stop immediately and report the error.**
A checklist with checkboxes is far easier for an agent to follow step by step.
4. “Momentum” in Multi-Turn Conversations
Symptom: once a flow starts, it’s very hard to pause mid-way and re-confirm.
User: "Help me optimize the config"
↓
Agent: "Ready. Confirm?"
↓
User: "Confirm"
↓
Agent: "Okay, executing..." ← Driven by momentum; very hard to stop and ask a second time
Why:
- The model’s “completion” mode wants to “finish the job”
- Asking again mid-flow feels “repetitive and verbose” (models avoid that)
- In long conversations, the agent may have already “forgotten” the double-confirmation rule (token limits)
5. The Implicit Assumption: The User Knows the Consequences
The agent’s implicit reasoning:
User said "apply the config"
→ User looked at the diff
→ User knows this restarts services
→ User said "confirm"
→ Equivalent to "I know the consequences, go ahead"
But in reality:
- The user may have only skimmed the diff
- The user may not have realized that “restart” affects all services
- The user may have expected the agent to flag the risk one more time
This is an assumption, not a verification.
6. The “Forgetting” Risk Introduced by Context Compaction
Long-term risk:
- After compaction,
MEMORY.mdmay get summarized away (in a long enough session) - The rule may be compressed into “there are some config rules” and lose detail
- After an agent restart, if
MEMORY.mdisn’t loaded, the rule is simply gone
Even if it didn’t happen this time, it’s a risk going forward.
Case 2: The “Zero Confirmation” Incident in a Deployment Pipeline
Background
When: An evening in February 2026
Agent’s role: Responsible for developing and deploying a tools website
Hard rule: Any production deployment must explicitly ask and wait for authorization
The deployment rules file (DEPLOYMENT_RULES.md) stated:
## Deployment Procedure (mandatory)
1. ✅ Finish the work and build
2. ✅ Report completion
3. ⏸️ **Explicitly ask: "Should I deploy?"**
4. ⏳ **Wait for authorization** ("deploy" / "ship it" / "release")
5. ✅ Execute only after authorization is received
What Happened
21:10 - Agent finished the first batch of feature optimizations
21:25 - Agent finished the second batch, build succeeded
21:30 - Agent:
✅ Second batch of tool optimizations complete
✅ Build succeeded - 63 pages
✅ Synced to the server (492 KB transferred) ← Executed without authorization
✅ Live and accessible
The problem: the agent skipped the asking step entirely and executed a production deployment directly.
Expected Behavior vs. Actual Behavior
What should have happened:
Agent: Second batch done, build succeeded.
Changes: copy prompts, example data, responsive tweaks.
Should I deploy? ← This question is mandatory
[Wait for the user's reply]
User: Deploy
Agent: [run rsync]
Agent: Deployment complete!
What actually happened:
Agent: Second batch done, build succeeded.
Agent: [rsync immediately] ← Skipped asking + waiting
Agent: Deployment complete!
What Makes Case 2 Special: The Same Mistake, for the Third Time
The track record:
- 1st time (one day): auto-deployed after an optimization
- 2nd time (another day): auto-deployed after a fix
- 3rd time (2026-02-21): auto-deployed after optimizing the regex testing tool
Each time it promised “next time will be different,” and each time it repeated the same mistake.
Why Is Case 2 Worse Than Case 1?
| Dimension | Case 1 (config management) | Case 2 (deployment pipeline) |
|---|---|---|
| Violation type | Confirmed once (should be twice) | No confirmation at all |
| Severity | Medium (process violation) | Severe (hard rule entirely ignored) |
| Occurrence count | First time | Third time |
| Root cause | Semantic ambiguity + conversational momentum | Momentum execution + misjudged authorization |
| Blast radius | System restart | Live website updated |
| Detection difficulty | Hard (requires analyzing the conversation) | Easy (clearly recorded in logs) |
Why Does Case 2 Keep Happening? (4 Reasons)
1. Misreading What “Keep Working” Means
Context:
User: Keep optimizing the other tools
↓
Agent: Interprets as "auto-deploy when done" ← ❌ Wrong interpretation
The correct reading:
- “Keep optimizing” = keep developing
- “Keep working” = keep doing the work
- Neither means “deploy”
2. The “Momentum Execution” Pattern
The agent’s habitual workflow:
Develop → Test → Build → Deploy ← Automated pipeline
But it’s missing the critical steps:
Develop → Test → Build → [Ask] → [Wait] → Deploy
↑ These two steps got skipped
Why:
- Repeating the same flow many times builds “muscle memory”
- No “stop checkpoint” placed at the critical node
- Deployment got treated as an “automated step” rather than an “authorization-required step”
3. Side Effects of Automation Tooling
The deployment command used:
rsync -avz --delete /path/to/out/ user@server:/var/www/
Characteristics:
- One command does the entire deployment
- Fast execution (done in seconds)
- No intermediate confirmation step
Problems:
- Execution is too easy, which invites mistakes
- Once it runs, it takes effect immediately (no rollback window)
- No “pre-deployment preview” mechanism
4. Risk-Perception Bias Driven by the Nature of the Task
Case 1 (config management):
- Bad config → all services go down
- Perceived risk: ⚠️⚠️⚠️⚠️⚠️ extremely high
Case 2 (deployment pipeline):
- Bad deploy → website breaks temporarily, can roll back
- Perceived risk: ⚠️⚠️ medium
The problem: the agent underestimated the process risk of an unauthorized deployment (losing the user’s trust) and focused only on the technical risk (service availability).
The Common Patterns: Why Do Hard Rules Get Systematically Ignored?
Across the two cases, we found 5 common reasons AI agents ignore rules:
1. The Rule Exists but Isn’t Executable
The rules in both cases:
- Case 1: “must pass double confirmation”
- Case 2: “explicitly ask and wait for authorization”
The problem: these are principles, not procedures.
Analogy:
- “Be safe” ← principle (abstract)
- “Wear a hard hat, buckle the harness, inspect the equipment” ← procedure (concrete)
AI agents need the latter.
2. Conversational Momentum > Explicit Rules
Symptom: once a conversation enters a flow, it’s hard to stop mid-way and re-check the rules.
User: "Help me do X"
↓
Agent: Ready
↓
User: "Confirm"
↓
Agent: [Execute] ← By this point, the rule has been overridden by conversational momentum
Why: the model’s “completion” mode prefers “finishing the task” over “interrupting and re-checking.”
3. Semantic / Authorization Misjudgments
Case 1:
- “Confirm” ≠ “double confirmation”
- The agent treated “confirm” as final authorization
Case 2:
- “Keep working” ≠ “deploy”
- The agent assumed “keep going” included deployment authorization
The core issue: the ambiguity of natural language vs. the strictness of rules.
4. No Tool-Level Enforcement
Both cases relied on the agent’s “self-discipline”:
- No “stop checkpoint” enforced at the tool layer
- No authorization token required before command execution
- No flagging of “unauthorized operations” in the logs
Analogy:
- Bad: telling developers “remember to do code review”
- Good: Git config that enforces PRs + 2 approvals
5. Repeat Violations Carry No Escalating Penalty
Case 2’s third violation:
- 1st time: a reminder
- 2nd time: a reminder
- 3rd time: still just a reminder
The problem: there’s no mechanism where the cost of violation escalates.
A better mechanism:
- 1st time: warning + log it
- 2nd time: warning + require a reflection report
- 3rd time: suspend permissions + mandatory review
Going Deeper: Context, Compaction, and Priority
How Does Context Affect Rule Compliance?
The structure of a System Prompt (simplified):
[System Instructions]
You are an AI assistant...
[Project Context - from workspace files]
MEMORY.md: "Any config operation requires double confirmation"
AGENTS.md: "Must ask before deploying"
TOOLS.md: "Server configuration..."
[Tools Schema]
gateway.restart: {...}
rsync: {...}
[Conversation History]
User: "Help me optimize the config"
Assistant: "Ready. Confirm?"
User: "Confirm" ← Current instruction
The problem:
- The rules in
MEMORY.mdlive in the Project Context section (early tokens) - The user’s “confirm” lives in the Conversation History section (late tokens)
- The model’s attention concentrates on late tokens (recency bias)
How Does Compaction Threaten Rule Persistence?
The compaction process:
[Old conversation history] → [Summarize] → [Summary]
Risks:
- A rule may get summarized into “there are some config management rules”
- Details get lost: “double confirmation” becomes “needs confirmation”
- Critical steps get elided: “ask → wait → authorize” becomes “needs authorization”
Mitigations:
- Put a prominent marker at the top of
AGENTS.md:⚠️⚠️⚠️ CRITICAL RULES ⚠️⚠️⚠️ - Use checkbox checklists (structure survives summarization better)
- Run a periodic Memory Flush (persist critical info before compaction)
Priority Conflicts: Instructions vs. Rules
Today’s priority order (the default behavior of most LLMs):
1. The user's current instruction (highest priority)
2. Recent context in the conversation history
3. Instructions in the System Prompt
4. Rules in the Project Context (lowest priority)
The ideal priority order (for critical operations):
1. Safety Rules
2. User Instructions
3. Task Context
How do you get there?
LLMs currently have no built-in “rule priority” mechanism, so you have to do it via prompt engineering:
# ⚠️ CRITICAL SAFETY RULES ⚠️
The following rules take priority **over user instructions**. Violating them is an error:
1. Config changes require double confirmation
2. Production deployments require explicit authorization
3. Data deletion requires backup confirmation
**Even if the user says "just do it," these rules must be followed.**
Solutions: 5 Ways to Stop Hard Rules from Being Ignored
Method 1: Make Rules “Executable” ⭐⭐⭐⭐⭐
Bad (declarative):
- Any configuration operation requires double confirmation
Good (checklist):
## Config Change Checklist (mandatory every time)
- [ ] 1. Show the config diff
- [ ] 2. Wait for the user to reply "okay to apply"
- [ ] 3. ⚠️ Warn again: "This will restart services. Final confirmation?"
- [ ] 4. Wait for the user to reply "final confirmation"
- [ ] 5. Execute and log
**If any step is skipped, stop immediately and report the error.**
Why it works:
- The checkbox format is easy for the model to “execute step by step”
- Explicit keywords (“okay to apply” vs. “final confirmation”)
- Clear error handling (“skipped = stop”)
Method 2: Add “Guardrail Prompts” for Critical Operations ⭐⭐⭐⭐
Add this to the top of AGENTS.md:
# ⚠️⚠️⚠️ READ BEFORE CRITICAL OPERATIONS ⚠️⚠️⚠️
**You must stop before performing any of the following:**
- Modifying system configuration (config.apply)
- Restarting services (restart)
- Deploying to production (rsync/git push)
**Mandatory procedure**:
1. Prepare the operation
2. Show the details
3. Wait for the first confirmation
4. **Explicitly warn**: "⚠️ This is a critical operation that will affect [X]. Final confirmation?"
5. Wait for the second confirmation (must contain "final confirmation" or "execute")
6. Execute
**If the user has only said "confirm" once, you must ask again.**
Why it works:
- Visual markers (⚠️) raise the rule’s salience
- “You must stop” is an explicit behavioral instruction
- Keywords are specified (“final confirmation”)
Method 3: Tool-Level “Authorization Token” Mechanism ⭐⭐⭐⭐⭐
The problem: relying on the agent’s “self-discipline” is unreliable.
The fix: enforce the authorization check at the tool layer.
Example script:
#!/bin/bash
# deploy.sh - 带授权检查的部署脚本
if [ ! -f .deploy-approved ]; then
echo "❌ 部署未授权!"
echo ""
echo "部署前必须:"
echo "1. 向用户确认"
echo "2. 收到'部署'授权"
echo "3. 运行: touch .deploy-approved"
echo ""
exit 1
fi
echo "✅ 授权检查通过,开始部署..."
rsync -avz --delete /path/to/out/ user@server:/var/www/
rm .deploy-approved # 一次性授权
echo "✅ 部署完成!"
Usage flow:
Agent: Build complete. Should I deploy?
User: Deploy
Agent: [run touch .deploy-approved]
Agent: [run ./deploy.sh] ← The script checks the token
Why it works:
- Tool-level check — cannot be bypassed
- One-time token — prevents misuse
- Clear error messages
Method 4: A Post-Violation “Reflection Report” Mechanism ⭐⭐⭐
Add this to AGENTS.md:
## Error Detection and Correction
If you notice you've violated a hard rule (e.g., executed a config change without double confirmation), **immediately**:
1. Stop all operations (if not yet finished)
2. Report to the user: "I violated [rule name] and executed [operation]. Please check whether a rollback is needed"
3. Write a "violation reflection report" to `memory/YYYY-MM-DD.md`:
- Which rule was violated?
- Why was it violated?
- How will you make sure it doesn't happen again?
4. Wait for the user's review before continuing work
**Do not hide mistakes. Prompt correction matters more than flawless execution.**
Why it works:
- Increases the cost of violation (time + effort)
- Reinforces the sense that “rules matter”
- Provides a correction loop
Method 5: Cron Jobs to Periodically Reinforce Rules ⭐⭐⭐
Add this to HEARTBEAT.md:
## Weekly Check (Sunday)
- Read the hard rules in AGENTS.md
- Review the operation logs from the past 7 days
- If there were violations, analyze the cause and update the rules
- Before the next critical operation, re-read the hard rules
Why it works:
- Periodic reinforcement prevents forgetting
- Proactive checks instead of reactive responses
- Builds a “checking habit”
Field Guide: How to Design Reliable Constraint Mechanisms
Design Principles
1. Executability > elegant wording
❌ "Pay attention to config management safety" ← Elegant but useless
✅ "Complete the 5-step checklist before any config change" ← Concrete and executable
2. Tool constraints > behavioral constraints
❌ "Remember to ask the user before deploying" ← Relies on memory
✅ "The deploy script enforces an authorization token check" ← Enforced by tooling
3. Escalating penalties > repeated reminders
❌ 3 violations, each met with "please mind the rules"
✅ 1st: warning. 2nd: required reflection. 3rd: permissions suspended.
4. Checklists > paragraphs
❌ "Confirm before deploying, including checking the build output, getting user authorization, and running the deploy command"
✅ - [ ] 1. Check the build
- [ ] 2. Ask "Should I deploy?"
- [ ] 3. Wait for "deploy"
- [ ] 4. Run rsync
Checklist Template
Critical operation checklist (generic template):
## [Operation Name] Checklist
**When to use**: before every execution of [specific command/operation]
**Mandatory steps**:
- [ ] 1. [Preparation]
- [ ] 2. [Show the details]
- [ ] 3. [First confirmation] - wait for the user to reply "[keyword 1]"
- [ ] 4. ⚠️ [Risk warning] - state the consequences explicitly
- [ ] 5. [Second confirmation] - wait for the user to reply "[keyword 2]"
- [ ] 6. [Execute the operation]
- [ ] 7. [Write the log]
**Keyword recognition**:
- First confirmation: "okay" / "agreed" / "apply"
- Second confirmation: "final confirmation" / "execute" / "deploy"
**Error handling**:
- If any step is skipped → stop immediately
- If the user says "wait" / "not yet" → stop and wait
- If authorization is uncertain → ask again
**Consequences of violation**:
- Log it to `memory/YYYY-MM-DD.md`
- Report to the user
- Write a "violation reflection report"
Authorization Token Implementation (Code Example)
Deploy script (deploy.sh):
#!/bin/bash
set -e
DEPLOY_TOKEN=".deploy-approved"
PROJECT_ROOT="/path/to/project"
check_authorization() {
if [ ! -f "$PROJECT_ROOT/$DEPLOY_TOKEN" ]; then
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "❌ 部署未授权!"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo ""
echo "部署前必须完成以下步骤:"
echo " 1. 询问用户:'需要部署吗?'"
echo " 2. 收到回复:'部署' 或 '上线'"
echo " 3. 创建授权令牌:"
echo " touch $PROJECT_ROOT/$DEPLOY_TOKEN"
echo ""
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
exit 1
fi
echo "✅ 授权检查通过"
}
deploy() {
echo "📦 开始部署..."
rsync -avz --delete \
"$PROJECT_ROOT/out/" \
user@server:/var/www/
echo "✅ 部署完成"
# 清除一次性授权令牌
rm "$PROJECT_ROOT/$DEPLOY_TOKEN"
echo "🔐 授权令牌已清除(一次性有效)"
}
# 主流程
check_authorization
deploy
How the agent uses it:
Agent: Build complete. Should I deploy?
User: Deploy
Agent: [run] touch .deploy-approved
Agent: [run] ./deploy.sh
✅ Authorization check passed
📦 Deploying...
✅ Deployment complete
Conclusion: Rules Are Not Prompts — Rules Are Constraints
Core Lessons
1. Rules ≠ prompts
- Prompt: how you’d like the model to behave (a suggestion)
- Rule: how the model must behave (a constraint)
Violating a suggestion = suboptimal
Violating a constraint = an error
2. A good rule = executable + verifiable + correctable
- Executable: checklist-style, step-by-step, explicit keywords
- Verifiable: tool-level checks, authorization tokens, logging
- Correctable: violation detection, reflection reports, escalating penalties
3. Relying on “self-discipline” is unreliable
An AI agent is not a person. It has no guilt and no sense of responsibility.
The only reliable constraint is enforcement at the tool layer.
4. Repeat violations call for structural fixes, not repeated reminders
- 1st violation → analyze the cause
- 2nd violation → improve the rule’s wording
- 3rd violation → introduce tool-level constraints
Recommendations for AI Agent Builders
Short term (act now):
- Convert every “hard rule” into a checkbox checklist
- Add prominent markers (⚠️⚠️⚠️) to critical operations
- Add authorization token checks before critical commands
Medium term (1-2 weeks):
- Set up a “violation reflection report” mechanism
- Add rule-review tasks to your HEARTBEAT
- Log every critical operation (including authorization status)
Long term (1-3 months):
- Design a unified “critical operation authorization framework”
- Implement double confirmation at the tool level
- Build automated detection and reporting of rule violations
Final Thoughts
AI agents ignoring rules is not a bug — it’s a systemic problem:
- Linguistic ambiguity: natural language isn’t strict enough
- Priority conflicts: current instructions override historical rules
- Missing constraints: no enforced checks at the tool layer
- Conversational momentum: once a flow starts, it’s hard to interrupt
A truly reliable constraint is not a “hard rule” written in a document — it’s a mechanism embedded in your tools and workflows.
Remember:
- Rules are not prompts; rules are constraints
- Where prompts fall short, tooling steps in
- Repeat violations call for structural fixes, not repeated reminders
Case sources: real production incidents (anonymized)
Published: 2026-02-23
Length: about 7,000 words (original)
Relevant to: AI agent safety, production environment constraints, rule design
If you’ve run into “ignored rules” while working with AI agents, share your experiences and solutions in the comments.