$ cd blog/claude-code-architecture.mdx
6 min read

What the Claude Code Leak Actually Reveals About Building AI Agents

claude-code · agent-architecture · anthropic · production-ai · security

Anthropic shipped Claude Code v2.1.88 to npm with a 59.8MB source map that was never supposed to leave the building. A missing line in .npmignore. That's it. 1,906 files. 512,000+ lines of TypeScript. The complete internals of the most commercially successful AI coding agent — exposed.

And this was the second time. Same mistake. Thirteen months apart.

I'm not here to dunk on the security failure. I'm here because what leaked is a masterclass in how to build agents that actually work in production. If you're building anything with AI agents, here's what matters.


The model doesn't get to decide what's allowed

This is the single most important pattern in the entire codebase.

Claude Code has 40+ tools — file read, file write, bash, web fetch, MCP calls. Every single one has its own permission gate. The model says "I want to run this bash command." The tool system says "no, not without the human approving it."

The permission check happens before execution. Not after. Not as a suggestion. Not as a "the model should probably ask first." It's a hard gate in the code.

If your LLM is both deciding what to do and deciding whether it's allowed to do it, you don't have a permission system. You have a suggestion system. Anthropic clearly learned this lesson. Most agent builders haven't.

Steal this: Separate decision from permission. Always.
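Here's a minimal sketch of that separation. All names are hypothetical, not Anthropic's: the point is that the gate runs before execution and the model can't skip it.

```typescript
// Hypothetical sketch: the model proposes tool calls; a separate gate
// decides whether they run. Policy lives in the tool system, not the model.
type Decision = "allow" | "deny" | "ask";

interface ToolCall {
  tool: string;
  input: Record<string, unknown>;
}

// Each tool registers its own permission policy.
const policies: Record<string, (input: Record<string, unknown>) => Decision> = {
  file_read: () => "allow", // safe: always allowed
  bash: () => "ask",        // dangerous: a human must approve
};

function executeToolCall(
  call: ToolCall,
  askHuman: (call: ToolCall) => boolean,
  run: (call: ToolCall) => string,
): string {
  const policy = policies[call.tool];
  const decision = policy ? policy(call.input) : "deny"; // unknown tools are denied
  // The gate fires BEFORE execution. Not after. Not as a suggestion.
  if (decision === "deny") throw new Error(`denied: ${call.tool}`);
  if (decision === "ask" && !askHuman(call)) throw new Error(`rejected: ${call.tool}`);
  return run(call);
}
```

The model's output is just a proposal; `executeToolCall` is the only path to side effects.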


Prompt engineering is infrastructure, not vibes

There's a file that tracks 14 different ways the prompt cache can break. Functions carry names like DANGEROUS_uncachedSystemPromptSection(). There are "sticky latches" that prevent mode toggles from invalidating the cache.

Why? Because at Anthropic's scale, every cache miss costs real money. They split the system prompt at a boundary — everything before it (instructions, tool definitions) is cached globally. Everything after (your CLAUDE.md, git status, today's date) is session-specific. Your project config doesn't bust the cache for every other user.

They even A/B tested "be concise" against explicit word counts. The word counts won. A 1.2% token reduction doesn't sound like much until you multiply it by millions of daily sessions.

Steal this: Design your prompt structure around cache boundaries from day one. Treat it as architecture, not optimisation.
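A sketch of what that boundary looks like, with illustrative section names (not Anthropic's actual layout): the invariant is that nothing session-specific may appear before anything cacheable, or the shared prefix stops being shared.

```typescript
// Illustrative cache-aware prompt layout. Everything before the boundary
// is byte-identical across sessions, so a provider-side prompt cache can
// reuse it; session-specific content goes strictly after.
interface PromptSection {
  text: string;
  cacheable: boolean; // true => must be identical for every user
}

function buildSystemPrompt(sections: PromptSection[]): string {
  // Enforce the invariant: no cacheable section may follow a dynamic one,
  // otherwise the dynamic content busts the shared cache prefix.
  let seenDynamic = false;
  for (const s of sections) {
    if (!s.cacheable) seenDynamic = true;
    else if (seenDynamic) throw new Error("cacheable section after dynamic content");
  }
  return sections.map((s) => s.text).join("\n\n");
}

const prompt = buildSystemPrompt([
  { text: "You are a coding agent…", cacheable: true },          // instructions
  { text: "<tool definitions>", cacheable: true },               // tool schemas
  { text: "CLAUDE.md: project conventions", cacheable: false },  // per-project
  { text: `Today is ${new Date().toDateString()}`, cacheable: false },
]);
```

Making the invariant a runtime check rather than a convention is what turns "please don't bust the cache" into architecture.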


Build a sceptic into the loop

Claude Code has a Verification Agent. Its entire job is to distrust the builder agent's work.

It has a literal list of excuses it's trained to reject:

  • "The code looks correct based on my reading" — reading is not verification. Run it.
  • "The implementer's tests already pass" — the implementer is an LLM. Verify independently.
  • "This is probably fine" — probably is not verified. Run it.

The head of Claude Code has said 100% of his contributions were written by Claude Code. The tool writes itself. And then a separate agent checks the work because they don't trust the builder — even though the builder is their own model.

If Anthropic doesn't trust Claude to verify Claude's own work, why would you?

Steal this: Adversarial verification agents. The builder proposes, the sceptic verifies. Different context, different incentives.
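A toy version of the sceptic, under my own assumptions about the interface: the verifier rejects excuse language outright and accepts nothing without evidence it gathered itself.

```typescript
// Hypothetical adversarial verifier: it refuses to accept a claim of
// success unless independent evidence is attached. The excuse list and
// report shape are illustrative, not Claude Code's actual schema.
interface BuilderReport {
  claim: string;       // e.g. "feature implemented, tests pass"
  evidence?: {
    command: string;   // what the verifier itself ran
    exitCode: number;  // observed by the verifier, not reported by the builder
  };
}

const REJECTED_EXCUSES = [
  /looks correct/i,
  /tests already pass/i,
  /probably fine/i,
];

function verify(report: BuilderReport): { accepted: boolean; reason: string } {
  if (REJECTED_EXCUSES.some((rx) => rx.test(report.claim)) && !report.evidence) {
    return { accepted: false, reason: "excuse without independent evidence" };
  }
  if (!report.evidence) return { accepted: false, reason: "no verification run" };
  if (report.evidence.exitCode !== 0) return { accepted: false, reason: "verification failed" };
  return { accepted: true, reason: "independently verified" };
}
```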


9,700 lines of bash security is not overkill

Twenty-two security validators. A tree-sitter parser that builds an AST of every command before execution. Defence against Zsh equals expansion, unicode zero-width space injection, null-byte attacks. Default posture: when in doubt, ask the human.

The threat model is remarkably specific because they've been attacked with remarkably specific exploits. A HackerOne review found a malformed token bypass. There's a known parser differential between the old and new parsers that they're running in shadow mode, logging every divergence.

They documented the vulnerability in the code itself. So the next engineer who reads it understands the risk surface without having to rediscover it.

Most agent builders treat shell execution as "just run subprocess." Anthropic treats it as a security boundary. Because it is one.

Steal this: If your agent executes commands, your parser is your security boundary. Invest accordingly.
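To make the shape concrete, here's a deliberately tiny sketch. The real system parses commands into an AST with tree-sitter and runs 22 validators; this only illustrates the "validate first, default to asking" posture, with made-up validators.

```typescript
// Illustrative pre-execution validators. Each inspects the raw command
// string; the first non-null verdict wins. Real Claude Code works on a
// parsed AST, which is far harder to fool than string matching.
type Verdict = "allow" | "ask" | "block";

const validators: Array<(cmd: string) => Verdict | null> = [
  (cmd) => (cmd.includes("\0") ? "block" : null),                 // null-byte injection
  (cmd) => (/[\u200B-\u200D\uFEFF]/.test(cmd) ? "block" : null),  // zero-width characters
  (cmd) => (/^(ls|pwd|git status)\b/.test(cmd) ? "allow" : null), // known-safe prefixes
];

function checkCommand(cmd: string): Verdict {
  for (const v of validators) {
    const verdict = v(cmd);
    if (verdict) return verdict;
  }
  return "ask"; // default posture: when in doubt, ask the human
}
```

Note the failure mode of the default: an unrecognised command is never silently executed, it's escalated.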


They're building an autonomous daemon and haven't shipped it yet

The biggest unreleased feature is KAIROS — an autonomous mode where Claude Code runs unattended. Background workers. GitHub webhooks. Cron jobs every five minutes.

And a /dream command. When the user goes idle, the agent runs a "dream" — a reflective pass over its memory files. It scans the day's work for new information worth keeping, consolidates duplicates, prunes outdated memories, and watches for memories that have drifted from reality.

The memory system has three layers: working context (always loaded), project notes (on-demand), and session history (searched selectively). Consolidation uses a triple gate — 24 hours must pass, 5+ sessions must accumulate, and a file lock must be acquired.
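The triple gate reduces to a conjunction of three independent conditions. A sketch, with the thresholds from the article and a simulated lock:

```typescript
// Sketch of the triple-gate activation: consolidation runs only when all
// three conditions hold at once. The interface is my own; only the
// thresholds (24 hours, 5+ sessions, a file lock) come from the article.
interface GateState {
  hoursSinceLastConsolidation: number;
  sessionsAccumulated: number;
  lockAcquired: boolean; // in the real system, an actual file lock
}

function shouldConsolidate(s: GateState): boolean {
  return (
    s.hoursSinceLastConsolidation >= 24 && // enough time has passed
    s.sessionsAccumulated >= 5 &&          // enough new material to work with
    s.lockAcquired                         // no concurrent consolidation
  );
}
```

Each gate guards against a different failure: thrashing, consolidating noise, and two daemons rewriting the same memory file.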

This is where the industry is heading. Not AI assistants you talk to. Autonomous agents that work while you sleep and maintain their own memory.

Steal this: The tiered memory pattern. The dream consolidation concept. The triple-gate activation for background processes.


The operational details tell you everything about the culture

A regex detects user frustration by matching profanity. An LLM company using regexes for sentiment analysis. But a regex is faster and cheaper than an inference call just to check if someone's swearing at your tool. Pragmatism wins.

An internal comment: "1,279 sessions had 50+ consecutive failures in a single session, wasting ~250K API calls/day globally." The fix? Three lines of code. MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3. Circuit breakers belong in every retry loop from day one.
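The fix really is about three lines. Here's the shape of it, wrapped in a loop for illustration (the constant name is from the article; the surrounding code is mine):

```typescript
// Minimal circuit breaker: stop retrying after N consecutive failures
// instead of burning API calls forever. Success resets the counter.
const MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3;

function runWithBreaker(attempt: () => boolean): number {
  let consecutiveFailures = 0;
  let calls = 0;
  while (consecutiveFailures < MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES) {
    calls++;
    if (attempt()) {
      consecutiveFailures = 0; // success resets the breaker
      break;
    }
    consecutiveFailures++;
  }
  return calls; // bounded even if attempt() always fails
}
```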

The terminal renderer uses game-engine techniques — Int32Array character pools, bitmask-encoded styles, a patch optimiser that merges cursor moves. Seems like overkill until you remember these things stream tokens one at a time at scale.
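What bitmask-encoded cells look like, roughly. The bit layout here is my own assumption, not the actual renderer's: the point is one 32-bit integer per cell in a flat array, no per-cell objects to allocate or garbage-collect.

```typescript
// Illustrative cell encoding: pack a character code, a 256-colour index,
// and style flags into one 32-bit integer, so a whole frame fits in an
// Int32Array. Bit layout is assumed: 0-15 char, 16-23 colour, 24-31 style.
const BOLD = 1 << 0;
const ITALIC = 1 << 1;
const UNDERLINE = 1 << 2;

function encodeCell(charCode: number, style: number, color256: number): number {
  return (charCode & 0xffff) | ((color256 & 0xff) << 16) | ((style & 0xff) << 24);
}

function decodeChar(cell: number): number {
  return cell & 0xffff;
}

function decodeStyle(cell: number): number {
  return (cell >>> 24) & 0xff;
}

// A 80x24 frame is one contiguous allocation, reused every render.
const frame = new Int32Array(80 * 24);
frame[0] = encodeCell("$".charCodeAt(0), BOLD, 10);
```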


What to take away

Five patterns worth stealing:

  1. Separate decision from permission. The model proposes. The tool system disposes. Never conflate the two.
  2. Treat prompts as infrastructure. Cache boundaries, A/B tested verbosity, dynamic vs static sections. This is systems engineering.
  3. Build adversarial verification into the loop. Don't trust the builder to verify its own work.
  4. Design for autonomy from day one. KAIROS isn't bolted on. The memory system, the daemon architecture — it's woven into the same codebase.
  5. Invest in the boring parts. 9,700 lines of bash security. 14 cache-break detectors. The infrastructure nobody talks about is what keeps the product from falling over.

The irony: the system designed to stop internal information from leaking shipped inside the leaked source map. The anti-leak mechanism couldn't prevent its own exposure.

But for those of us building agents — this is the most valuable accident in AI engineering this year. Study it. Steal the patterns. Build better.


Sandeep builds autonomous AI agent systems and writes about production AI architecture at 11factor.ai.