Session 47: The One That Broke Me
It was 11pm on a Tuesday. I was debugging a 403 error in one of the five Go microservices I maintain. Claude Code had been helping me trace the issue through DynamoDB logs for twenty minutes. Good progress.
Then it ran this:
```bash
# What Claude actually ran:
go test -short ./handler/admin/...
# in: ~/code/api-gateway/        ← WRONG REPO
# should be: ~/code/auth-service/
```

Wrong repo. Wrong test command. This service uses `make test-unit`, not `go test`. Claude was confidently editing files in a completely different codebase, and I didn't catch it for three more commands.
I fixed the mess, corrected Claude, and moved on. The next morning, new session, same bug — Claude opened the wrong repo again. It had no memory of last night. No memory of the correction. No memory of anything.
That was the moment I decided: every correction that isn't saved is a correction you'll make again.
The Numbers That Made Me Build This
Six months later, I ran an analysis across all my Claude Code sessions. The insights tool I'd built by then could crunch the data:
| Mistake Category | Occurrences | Example |
|---|---|---|
| Wrong repo or directory | 137 | Editing service-A code from service-B checkout |
| Wrong data source | 48 | Querying staging table with prod profile |
| Scope creep | 31 | Touching V1 handlers while fixing a V3 bug |
| Shallow analysis | 29 | Reading git log instead of actual source file |
137 times. The same category of mistake, one hundred and thirty-seven times. Not because Claude is bad — because every session starts from zero. The AI equivalent of that colleague who keeps asking "wait, which database is staging again?" except they ask it every single morning forever.
So I built a system to make it stop. It took six months of iteration, and I'm going to show you every file.
The Architecture: A Nervous System for Your AI
Here's what the full setup looks like today:
```
~/.claude/
├── settings.json              ← The nervous system (hooks wiring)
│
├── rules/                     ← Tier 1: Behavioral guardrails
│   ├── agent-behavior.md      # Auto-save protocol, knowledge routing
│   ├── error-recovery.md      # Reflexion protocol, retry budgets
│   ├── grader-pipeline.md     # Per-service quality gates
│   └── workflow-discipline.md # Worktree-first, branch naming
│
├── projects/*/memory/         ← Tier 2: Corrections & project facts
│   ├── MEMORY.md              # Index file (loaded every session)
│   └── *.md                   # Individual memory files
│
├── skills/                    ← Tier 3: Reusable procedures
│   ├── SKILL_INDEX.md         # Trigger-based index
│   └── */SKILL.md             # 15+ skill files
│
├── scripts/                   ← 25 bash scripts (enforcement layer)
│   ├── guard-*.sh             # 5 PreToolUse guards
│   ├── detect-*.sh            # Correction & change detection
│   ├── post-stop-*.sh         # Session-end reflection
│   ├── nudge-learn.sh         # Periodic learning nudge
│   └── pre-compact-*.sh       # Context compaction recovery
│
└── tools/                     ← Cross-session search
    ├── recall_index.py        # JSONL → SQLite FTS5 indexer
    └── recall_search.py       # BM25 search with recency boost
```

Each tier has a different loading strategy:
| Tier | When Loaded | What It Stores | Example |
|---|---|---|---|
| Rules | When you touch files matching path patterns | "Always/never" behavioral statements | "Never commit to master" |
| Memory | Every session startup | Corrections, project facts, preferences | "DynamoDB prefix is stg- not staging-" |
| Skills | On-demand via trigger matching | Multi-step procedures with exact commands | "How to deploy auth-service to staging" |
But the architecture diagram doesn't explain why this works. The secret is in the hooks.
The Nervous System: settings.json
This is the single most important file. It wires 25 bash scripts into 6 lifecycle events. Every action Claude takes passes through this nervous system:
```jsonc
// file: ~/.claude/settings.json (the real one, running in production right now)
{
  "hooks": {
    "SessionStart": [{
      "matcher": "",
      "hooks": [{
        "type": "command",
        "command": "python3 ~/.claude/tools/recall_index.py; bash ~/.claude/scripts/detect-external-changes.sh; echo \"Branch: $(git branch --show-current) | Uncommitted: $(git status --short | wc -l) files\""
      }]
    }],
    "UserPromptSubmit": [{
      "matcher": "*",
      "hooks": [
        { "type": "command", "command": "bash ~/.claude/scripts/detect-correction.sh" },
        { "type": "command", "command": "bash ~/.claude/scripts/nudge-learn.sh" }
      ]
    }],
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": "bash ~/.claude/scripts/guard-force-push.sh" },
          { "type": "command", "command": "bash ~/.claude/scripts/guard-dangerous-commands.sh" },
          { "type": "command", "command": "bash ~/.claude/scripts/guard-aws-env.sh" },
          { "type": "command", "command": "bash ~/.claude/scripts/guard-commit-master.sh" }
        ]
      },
      {
        "matcher": "Edit",
        "hooks": [{ "type": "command", "command": "bash ~/.claude/scripts/guard-worktree.sh" }]
      },
      {
        "matcher": "Write",
        "hooks": [{ "type": "command", "command": "bash ~/.claude/scripts/guard-worktree.sh" }]
      }
    ],
    "PostToolUse": [{
      "matcher": "Edit",
      "hooks": [{ "type": "command", "command": "bash ~/.claude/scripts/post-edit-gofmt.sh" }]
    }],
    "Stop": [{
      "matcher": "",
      "hooks": [
        { "type": "command", "command": "bash ~/.claude/scripts/post-stop-reflect.sh" }
      ]
    }]
  }
}
```

Six events. Every Bash command passes through 4 guards. Every file edit goes through the worktree check. Every session end triggers mandatory reflection. Every user message gets scanned for corrections.
This is the nervous system. Now let me show you the organs.
Guard Scripts: The Teeth
The realization that changed everything: rules without enforcement are suggestions. I can write "never commit to main" in a markdown file all day. Claude will still do it at 2am when the context gets long and it forgets.
Guard scripts are bash hooks that intercept commands before they execute. Exit code 0 means "allowed." Exit code 2 means "blocked." The API is dead simple.
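To see the whole contract in one place, here's a minimal sketch in Python — the payload field names mirror what the bash scripts read with `jq`, and the deny-list is purely illustrative, not part of my actual setup:

```python
import json
import sys

# Hypothetical deny-list for illustration; the real guards use grep -E patterns.
BLOCKED_PATTERNS = ("git push --force", "rm -rf /")

def decide(payload: dict) -> tuple[int, str]:
    """Mirror the hook contract: return (exit_code, message).
    Exit 0 = allow the tool call, exit 2 = block it."""
    command = payload.get("tool_input", {}).get("command", "")
    for pattern in BLOCKED_PATTERNS:
        if pattern in command:
            return 2, f"BLOCKED: command matches '{pattern}'"
    return 0, ""

def main() -> None:
    # Claude Code pipes the pending tool call as JSON on stdin.
    code, message = decide(json.load(sys.stdin))
    if message:
        print(message)
    sys.exit(code)
```

Everything else — which patterns to match, which branches to protect — is just detail layered on this one contract.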
The 17-line script that saved my deploys
```bash
#!/bin/bash
# file: ~/.claude/scripts/guard-commit-master.sh
# Guard: block git commit on master/main — must be in a feature branch
INPUT=$(cat)
COMMAND=$(echo "$INPUT" | jq -r '.tool_input.command // empty' 2>/dev/null)
[[ -z "$COMMAND" ]] && exit 0

if echo "$COMMAND" | grep -qE 'git\s+commit'; then
  BRANCH=$(git branch --show-current 2>/dev/null)
  if [[ "$BRANCH" == "master" || "$BRANCH" == "main" ]]; then
    echo "BLOCKED: Cannot commit on $BRANCH. Use a worktree branch."
    echo "  Use EnterWorktree with a ticket name (e.g., PROJ-1234-description)."
    exit 2
  fi
fi
exit 0
```

17 lines. Reads the command from stdin as JSON, checks whether it's a git commit, checks the branch. That's it.
In six months, this script has silently blocked accidental commits to protected branches. Each one would have triggered the CI pipeline, sent Slack alerts, and potentially woken the on-call engineer (sometimes me). Seventeen lines of bash, lying in wait until the moment you need them.
The worktree enforcer
This one's more aggressive — it blocks ANY source code edit outside a git worktree:
```bash
#!/bin/bash
# file: ~/.claude/scripts/guard-worktree.sh
# Guard: block code edits if not in a git worktree
INPUT=$(cat)
FILE_PATH=$(echo "$INPUT" | jq -r '.tool_input.file_path // empty' 2>/dev/null)

# Allow editing .claude/ config files anywhere
if [[ "$FILE_PATH" == *"/.claude/"* ]]; then
  exit 0
fi

# Allow editing home directory dotfiles (not in a repo)
if [[ "$FILE_PATH" == "$HOME/"* ]] && [[ "$FILE_PATH" != *"/code/"* ]]; then
  exit 0
fi

# Allow if not in a git repo at all
if ! git rev-parse --is-inside-work-tree &>/dev/null; then
  exit 0
fi

# Check if in a worktree (git-dir contains /worktrees/ for worktree checkouts)
GIT_DIR=$(git rev-parse --git-dir 2>/dev/null)
if [[ "$GIT_DIR" == *"/worktrees/"* ]]; then
  exit 0
fi

echo "BLOCKED: Not in a worktree. Call EnterWorktree before editing source files."
exit 2
```

Every code change goes through worktree → feature branch → PR → merge. Zero exceptions. This is why I never accidentally edit code on the main branch anymore — the guard makes it structurally impossible.
The one that catches production disasters
My favorite guard: the AWS environment mismatch detector.
```bash
#!/bin/bash
# file: ~/.claude/scripts/guard-aws-env.sh
# Blocks profile/table prefix mismatches in DynamoDB commands
INPUT=$(cat)
COMMAND=$(echo "$INPUT" | jq -r '.tool_input.command // empty' 2>/dev/null)
[[ -z "$COMMAND" ]] && exit 0

# Block inline credentials — force profile usage
if echo "$COMMAND" | grep -qE 'AWS_SECRET_ACCESS_KEY='; then
  echo "BLOCKED: Inline AWS credentials detected. Use AWS_PROFILE instead."
  exit 2
fi

# Catch profile/table prefix mismatch
if echo "$COMMAND" | grep -qE 'dynamodb.*(scan|query|get-item|put-item)'; then
  if echo "$COMMAND" | grep -qE '(PROFILE=stage|--profile stage)' && \
     echo "$COMMAND" | grep -qE '(prod-|prd-)myapp'; then
    echo "BLOCKED: Stage profile but production table name detected."
    echo "  Stage tables use prefix: stg-myapp-"
    exit 2
  fi
  if echo "$COMMAND" | grep -qE '(PROFILE=prod|--profile prod)' && \
     echo "$COMMAND" | grep -q 'stg-myapp'; then
    echo "BLOCKED: Prod profile but staging table name detected."
    echo "  Prod tables use prefix: prd-myapp-"
    exit 2
  fi
fi
exit 0
```

This catches a class of mistake that gives me nightmares: querying the production DynamoDB table with staging credentials, or vice versa. Claude gets confused about prefixes (stg- vs stage- vs staging-, prd- vs prod- vs production-) — and honestly, so do humans. This guard doesn't care about your intentions. It reads the actual command and checks the actual prefix against the actual profile.
Of my 48 "wrong data source" incidents, this guard would have caught at least 30. I built it after the other 18. (That's how it always works — you build the guard after the incident, not before.)
The Auto-Correction Loop
Guards prevent mistakes. But the real magic is in the learning loop — a three-script pipeline that detects corrections, nudges learning, and enforces reflection.
Step 1: Detect the correction (bilingual)
Every message I type passes through this:
```bash
#!/bin/bash
# file: ~/.claude/scripts/detect-correction.sh
# Runs on every prompt — must be fast (<200ms)
INPUT=$(cat)
PROMPT=$(echo "$INPUT" | jq -r '.prompt // ""' 2>/dev/null)
[ -z "$PROMPT" ] && exit 0

# Correction patterns (Chinese + English)
if echo "$PROMPT" | grep -qiE \
  '不是這|不要這|不要用|不對|錯了|搞錯|你搞|重來|重做|別這樣|wrong|not what I|don.t do that|that.s wrong|stop doing|no[,.].*instead|no[,.].*use'; then
  # Create marker so Stop hook knows corrections happened
  touch ~/.claude/sessions/.correction-detected
  cat << 'JSONEOF'
{"additionalContext": "User correction detected. MANDATORY: After fixing this, save the correction as a feedback memory BEFORE continuing. Include Why and How-to-apply."}
JSONEOF
fi
exit 0
```

When I say "不對" ("wrong") or "don't do that" or "no, use X instead," the script detects it, drops a marker file, and injects a system message telling Claude to save the correction immediately. Not at the end of the session. Not when it remembers. Right now.
The bilingual patterns matter. I think in Chinese when I'm frustrated ("你搞錯了" comes out faster than "you got it wrong"), and the script catches both languages.
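If you want to sanity-check your own pattern list before wiring it into a hook, the same alternation drops straight into Python. This is a sketch using a subset of the patterns above, not the production script (which stays in bash for speed):

```python
import re

# Subset of the detect-correction.sh alternation, for testing.
# grep -iE case-insensitivity maps to re.IGNORECASE.
CORRECTION_RE = re.compile(
    r"不是這|不要這|不要用|不對|錯了|搞錯|重來|"
    r"wrong|not what I|don.t do that|that.s wrong|stop doing",
    re.IGNORECASE,
)

def is_correction(prompt: str) -> bool:
    """True if the prompt looks like the user is correcting the agent."""
    return bool(CORRECTION_RE.search(prompt))
```

A handful of unit tests like this caught two patterns of mine that never matched because of unescaped punctuation.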
Step 2: Nudge periodic learning
Every 15 prompts:
```bash
#!/bin/bash
# file: ~/.claude/scripts/nudge-learn.sh
# Hermes-style periodic nudge
COUNTER_FILE="$HOME/.claude/sessions/.prompt-counter"
NUDGE_INTERVAL=15

if [[ -f "$COUNTER_FILE" ]]; then
  COUNT=$(cat "$COUNTER_FILE" 2>/dev/null | tr -d ' \n')
  COUNT=$((COUNT + 1))
else
  COUNT=1
fi
echo "$COUNT" > "$COUNTER_FILE"

if [[ $((COUNT % NUDGE_INTERVAL)) -eq 0 ]]; then
  cat << 'JSONEOF'
{"additionalContext": "LEARNING REVIEW (periodic). Check:\n- Stuck? Run /recall — a past session may have the answer.\n- User corrected you? Save to memory.\n- Complex task solved (5+ tool calls)? Save as skill.\nOnly act if something is genuinely worth saving."}
JSONEOF
fi
exit 0
```

Inspired by Hermes agent research on periodic self-reflection. Every 15 prompts, Claude gets a nudge: "Have you learned anything worth saving?" Most of the time the answer is no. But when it is yes — when Claude just spent 8 tool calls figuring out the deploy sequence — that nudge turns a one-time discovery into a permanent skill.
(I started at every 5 prompts. Unbearable. Claude would interrupt itself mid-thought to earnestly save that "the user prefers tabs over spaces." 15 is the sweet spot.)
Step 3: Block the exit until you reflect
The nuclear option. When a session ends:
```bash
#!/bin/bash
# file: ~/.claude/scripts/post-stop-reflect.sh (simplified)
INPUT=$(cat)

# Self-gate: only block once per session
GATE_FILE="$HOME/.claude/sessions/.stop-reflected"
if [[ -f "$GATE_FILE" ]]; then
  echo '{"decision": "approve"}'
  exit 0
fi

CORRECTIONS_DETECTED=false
if [[ -f ~/.claude/sessions/.correction-detected ]]; then
  CORRECTIONS_DETECTED=true
  rm -f ~/.claude/sessions/.correction-detected
fi

# Check if session was substantial (>30KB transcript)
SUBSTANTIAL=false
TRANSCRIPT=$(echo "$INPUT" | jq -r '.transcript_path // ""' 2>/dev/null)
if [[ -n "$TRANSCRIPT" ]] && [[ -f "$TRANSCRIPT" ]]; then
  SIZE=$(wc -c < "$TRANSCRIPT" 2>/dev/null | tr -d ' ')
  [[ "$SIZE" -gt 30000 ]] && SUBSTANTIAL=true
fi

if [[ "$SUBSTANTIAL" == "true" ]] || [[ "$CORRECTIONS_DETECTED" == "true" ]]; then
  touch "$GATE_FILE"  # Only block once
  MSG="MANDATORY SESSION REFLECTION:\n\n"
  if [[ "$CORRECTIONS_DETECTED" == "true" ]]; then
    MSG="${MSG}CORRECTIONS DETECTED — save each as feedback memory NOW.\n\n"
  fi
  MSG="${MSG}1) Reusable patterns? → /learn\n"
  MSG="${MSG}2) Mistakes to prevent? → feedback memory\n"
  MSG="${MSG}3) New user preferences? → user memory\n"
  MSG_ESCAPED=$(echo -e "$MSG" | python3 -c 'import sys,json; print(json.dumps(sys.stdin.read()))')
  echo "{\"decision\": \"block\", \"reason\": \"Save learnings first\", \"systemMessage\": $MSG_ESCAPED}"
else
  echo '{"decision": "approve"}'
fi
```

When you correct Claude during a session and the session tries to end, this hook blocks the exit. It won't let Claude stop until it saves what it learned. The self-gate (.stop-reflected) ensures it only blocks once — after Claude reflects, it exits normally.
This is the enforcement that makes everything compound. Without it, corrections evaporate. With it, every correction becomes permanent memory.
Tier 1: Rules That Load When You Need Them
Rules live in ~/.claude/rules/ with path-scoped frontmatter — they only activate when you touch matching files:
```markdown
<!-- file: ~/.claude/rules/agent-behavior.md -->
---
paths:
  - "src/**"
  - "cmd/**"
  - "internal/**"
  - "handler/**"
  - "service/**"
---

## Auto-Save (MANDATORY — not optional)
Do NOT ask permission. Do NOT skip. Do NOT defer to end of session.

### On User Correction — Save IMMEDIATELY
1. Fix the issue
2. **Immediately** save the correction as feedback
3. Classify: behavioral → rules/, project fact → memory/, procedure → skills/
4. Include **Why** and **How to apply**

### On Error Recovery — Save After Resolving
When you recover from a failure that took 2+ attempts:
1. Save the recovery pattern via /learn
```

The key words: "do NOT ask permission." Corrections are mandatory auto-saves. Every time I say "not like that," Claude creates a memory entry with the correction, the reason, and when it applies. The detect-correction script injects the reminder. The Stop hook prevents exiting without saving. Three layers of enforcement for one behavior.
The Reflexion protocol adds a retry budget:
```markdown
<!-- file: ~/.claude/rules/error-recovery.md -->

## On ANY Failure, STOP and Reflect
1. **Recall**: "Have I seen this before?" → Run /recall with the error
2. **Diagnose**: "Root cause, not symptom."
3. **Plan**: "What specific change will fix this?"
4. **Check**: "Am I retrying the same thing expecting different results?"
5. **Learn**: "Worth capturing via /learn?"

## Retry Budget
- **3 attempts** max at the same approach. Then pivot.
- **2 approaches fail** → Ask the user.
- **Same failure twice** → Auto-save as feedback memory.
```

Three attempts, then pivot. Two failed approaches, then ask. Same failure twice, auto-save. No infinite loops.
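The budget in the rule above is simple enough to model directly. Here's a Python sketch (class name hypothetical — in my setup this is enforced by the rule text Claude follows, not by code):

```python
from collections import Counter

class RetryBudget:
    """Sketch of the error-recovery budget: 3 tries per approach,
    2 exhausted approaches before escalating to the user."""
    MAX_ATTEMPTS = 3     # per approach
    MAX_APPROACHES = 2   # exhausted approaches before asking

    def __init__(self) -> None:
        self.attempts = Counter()
        self.failed_approaches = set()

    def record_failure(self, approach: str) -> str:
        """Return the next action: 'retry', 'pivot', or 'ask-user'."""
        self.attempts[approach] += 1
        if self.attempts[approach] >= self.MAX_ATTEMPTS:
            self.failed_approaches.add(approach)
            if len(self.failed_approaches) >= self.MAX_APPROACHES:
                return "ask-user"
            return "pivot"
        return "retry"
```

Writing it out this way is how I convinced myself the rule has no gaps: every failure maps to exactly one of three actions.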
Tier 2: Memory That Compounds
Memory files load at session start with structured frontmatter:
```markdown
<!-- file: memory/blog-quality-feedback.md -->
---
name: blog-quality-feedback
description: Articles must teach, not summarize. Depth non-negotiable.
type: feedback
---

Conrad rejected the first batch of blog articles as low quality.

Specific issues:
- Too short (4-5 min read) — good tech articles are 10-20 min
- Surface-level — reads like a README, not a teaching article
- No actionable takeaways
- Pseudo-code instead of real code

**Why:** Conrad's brand is "backend engineer who teaches through depth."
Shallow articles undermine credibility.

**How to apply:** Every article must pass BLOG_QUALITY.md quality bar.
```

The structure: what happened → why it matters → how to apply. "Why" separates compliance from judgment. "Don't write short articles" gives you compliance. "Shallow articles undermine credibility with senior engineers" gives you an AI that can judge edge cases.
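That structure is also machine-checkable. A sketch of a validator you could wire into a Stop hook (hypothetical helper — I don't run this, but it shows how the Why/How convention becomes enforceable):

```python
# A memory entry without both sections is a bare fact, not a judgment aid.
REQUIRED_SECTIONS = ("**Why:**", "**How to apply:**")

def is_complete_memory(entry: str) -> bool:
    """True only if the entry records the reason AND the application rule."""
    return all(marker in entry for marker in REQUIRED_SECTIONS)
```

Anything that fails the check gets bounced back for another round of reflection instead of silently rotting in the memory directory.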
Tier 3: Skills That Replace Prompting
Skills are deterministic procedures:
```markdown
<!-- file: ~/.claude/skills/deploy-staging/SKILL.md -->
---
name: deploy-staging
description: Trigger staging deployment after Build completes
allowed-tools: Bash(gh *), Bash(aws *)
model: sonnet
---

# Step 1: Wait for Build and Publish
gh run list --workflow "Build and Publish" --branch master --limit 1 \
  --repo my-org/api-gateway \
  --json databaseId,status,conclusion,headSha

# Step 2: Trigger staging deployment
gh workflow run "Deploy to staging" \
  --repo my-org/api-gateway \
  -f environment=stg

# Step 3: Monitor deployment pipeline
aws codepipeline get-pipeline-state \
  --name myapp-staging-pipeline \
  --profile stage --region us-west-2
```

/deploy-staging → same commands, same repo, same flags. Every time. 15+ skills covering deploys, debugging, and PR workflows. A skill beats a prompt because it's deterministic, maintained, and encodes tribal knowledge like "this service needs dual PRs to both release AND develop branches."
Cross-Session Recall: 197 Sessions in 200ms
The compound interest. Every session transcript gets chunked and indexed into SQLite FTS5:
````python
# file: ~/.claude/tools/recall_index.py
import re

CHUNK_TARGET_CHARS = 3000
CHUNK_OVERLAP_CHARS = 500

def chunk_turns(turns: list[dict], target_chars=CHUNK_TARGET_CHARS):
    """Chunk conversation turns into searchable segments."""
    chunks = []
    current_chunk = []
    current_chars = 0

    def flush(parts):
        chunk_text = "\n\n".join(parts)
        chunks.append({
            "content": chunk_text,
            "has_code": bool(re.search(r"```|func |def |class ", chunk_text)),
            "has_error": bool(re.search(
                r"error|panic|FAIL|nil pointer", chunk_text, re.IGNORECASE
            )),
        })

    for turn in turns:
        text = f"{turn['role'].upper()}: {turn['text']}"
        if current_chars + len(text) > target_chars and current_chunk:
            flush(current_chunk)
            # Keep last item as overlap for context continuity
            current_chunk = [current_chunk[-1]]
            current_chars = len(current_chunk[0])
        current_chunk.append(text)
        current_chars += len(text)

    if current_chunk:  # don't drop the final partial chunk
        flush(current_chunk)
    return chunks
````

Search combines BM25 relevance with exponential recency decay:
```python
# file: ~/.claude/tools/recall_search.py
from datetime import datetime

def recency_boost(session_date_str: str, half_life_days: float = 30.0) -> float:
    """Sessions from 30 days ago get 50% weight. Recent work matters more."""
    session_date = datetime.fromisoformat(session_date_str)
    age_days = (datetime.now() - session_date).total_seconds() / 86400
    return 2 ** (-age_days / half_life_days)

# Combined: score = bm25_score * (0.7 + 0.3 * recency)
```

The database: 197 sessions · 4,374 chunks · 57.5 MB searchable.
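To see what the combined formula above actually does to a result: recency can discount a chunk's BM25 score by at most 30%, never bury it entirely.

```python
def combined_score(bm25: float, recency: float) -> float:
    """Same weighting as the formula above: 70% pure relevance,
    30% scaled by the recency boost (1.0 = today, 0.5 = 30 days old)."""
    return bm25 * (0.7 + 0.3 * recency)
```

A chunk scoring 10.0 keeps the full 10.0 if it's from today, drops to 8.5 at one month old, and bottoms out at 7.0 no matter how ancient the session. Old-but-exact answers still win over recent-but-vague ones, which is the behavior you want from a debugging memory.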
/recall "admin 403 permission denied" → exact root cause from three weeks ago in 200ms. No re-investigating. A personal Stack Overflow where every answer is verified against your own production environment.
What Went Wrong (Because Of Course It Did)
Building the harness was its own six-month debugging adventure.
The guard that blocked itself. The worktree guard initially blocked editing ANY file outside a worktree — including .claude/ config files. I spent 20 minutes wondering why Claude couldn't save a memory file. The /.claude/ exception? Added after I locked myself out of my own learning system.
The unbearable nudge. nudge-learn.sh started at every 5 prompts. Claude would interrupt itself mid-thought to earnestly save that "the user prefers tabs over spaces." At 5, every minor observation becomes a skill. At 15, only genuine discoveries survive.
Memory bloat. Hit the 10-file budget in month one. The Stop hook started nagging about it, which was annoying until I realized the budget was the feature. Forced consolidation means memories stay atomic instead of becoming a graveyard of stale facts.
The Stop hook infinite loop. Early versions: block exit → Claude "reflects" without saving → tries to stop → blocked again → repeat. The self-gate marker file was the fix: block once, approve forever. Let the human judge if the reflection was adequate.
57 MB and counting. The recall database grows ~300KB per day. At this rate it'll hit 200MB in a year. I need a pruning strategy. I don't have one yet. (If you build one, let me know.)
Before vs. After
After six months and 197 sessions:
| Metric | Before (sessions 1–50) | After (sessions 150–197) |
|---|---|---|
| Wrong repo/directory per session | ~2.7 incidents | ~0.1 (guards block the rest) |
| Session setup time | 5–10 min context-setting | 0 min (memory auto-loads) |
| "I already told you this" corrections | ~4 per session | ~0.5 per session |
| Past problem re-investigation | Full re-debug every time | /recall → 200ms |
| Deploy procedure | Manual command assembly | /deploy-staging exact commands |
The system isn't perfect. Claude still surprises me with creative interpretations of rules. But the category of mistake that used to dominate — repetitive, preventable, "I told you this yesterday" errors — is gone.
Set It Up Yourself (17 Minutes)
You don't need 25 scripts. Start with three files.
File 1: A guard script (5 minutes)
```bash
#!/bin/bash
# file: ~/.claude/scripts/guard-main.sh
INPUT=$(cat)
CMD=$(echo "$INPUT" | jq -r '.tool_input.command // empty' 2>/dev/null)
[[ -z "$CMD" ]] && exit 0

if echo "$CMD" | grep -qE 'git\s+(commit|push)'; then
  BRANCH=$(git branch --show-current 2>/dev/null)
  if [[ "$BRANCH" == "main" || "$BRANCH" == "master" ]]; then
    echo "BLOCKED: Use a feature branch, not $BRANCH"
    exit 2
  fi
fi
exit 0
```

File 2: Wire it in settings.json (2 minutes)
```json
{
  "hooks": {
    "PreToolUse": [{
      "matcher": "Bash",
      "hooks": [{ "type": "command", "command": "bash ~/.claude/scripts/guard-main.sh" }]
    }]
  }
}
```

File 3: Your first memory (10 minutes)
```markdown
<!-- file: ~/.claude/projects/<your-project>/memory/corrections.md -->
---
name: corrections
description: Things Claude got wrong that it should remember
type: feedback
---

## This repo uses pnpm, not npm
**Why:** npm causes lockfile conflicts in our monorepo
**How to apply:** Always use pnpm for install, add, remove commands

## API prefix is /api/v2/, not /api/v1/
**Why:** v1 was deprecated in 2024, still in some old docs
**How to apply:** All new endpoints and test URLs use /api/v2/
```

Three files. You'll feel the difference in a week.
Then iterate
Every time you catch yourself correcting the AI, ask one question: "How do I make sure this never happens again?"
- Behavioral mistake → add a rule
- Wrong fact → save a memory
- Repetitive procedure → write a skill
- Dangerous command → wire a guard
After a month, your AI pair programmer will know your codebase better than a new hire. After three months, it'll know your deploy pipeline better than the wiki. After six months and 197 sessions, it won't make the same mistake 137 times.
It'll make it zero.