Section 2 — Dispatch
14 Execution Modes
A single cli.tsx dispatches to 14 distinct execution paths via feature() gates and argv inspection.
compile-time dead code elimination
The feature() function comes from bun:bundle, Bun's built-in compile-time constant resolver. When a feature flag evaluates to false at bundle time, the entire branch — including all imports it pulls in — is physically removed from the output bundle.
// bun:bundle compile-time gate
if (feature('DAEMON')) {
const { startDaemonWorker } = await import('./daemon/worker');
await startDaemonWorker(argv);
process.exit(0);
}
// ↑ entire block absent from external builds
modes are physically absent, not runtime-gated
This is not a runtime flag check. The compiled bundle shipped to npm users contains zero bytes of daemon supervisor code, zero bytes of bridge sync, zero bytes of BYOC runner. You cannot enable them at runtime — the code does not exist in your binary. This lets Anthropic ship specialized internal builds (e.g. cloud-hosted runners, enterprise bridge nodes) from the exact same source tree without any conditional logic at runtime.
system context: two memoized builders
Before the first API call, two memoized functions assemble the system prompt. getSystemContext() runs git status and includes the output, truncated to 2,000 chars to avoid blowing the context window on large repos. getUserContext() walks the directory tree collecting every CLAUDE.md file from repo root up to filesystem root, then appends today's date. Both results are cached for the process lifetime.
// Memoized — computed once, reused every turn
const getSystemContext = memoize(async () => {
const gitStatus = await runGit(['status', '--short']);
return truncate(gitStatus, 2000); // hard cap
});
const getUserContext = memoize(async () => {
const claudeMds = await collectClaudeMds(); // cwd → /
return claudeMds + '\n\n# currentDate\n' + new Date().toISOString();
});
init() is memoized — runs exactly once
The top-level init() function in main.tsx is wrapped in a memoize guard so it cannot run twice, even if the module is re-evaluated. Its sequence is carefully ordered: API preconnect fires first — opening TCP and completing the TLS handshake before any token is typed — so that latency on the first real request is dominated by model compute, not network setup.
// init() — runs exactly once, guards via memoize
const init = memoize(async () => {
await preconnectApi(); // TCP+TLS handshake — overlaps with startup
await loadConfig(); // merge global + project config
await getSystemContext(); // warm the cache
await getUserContext(); // warm the cache
initTelemetry(); // non-blocking
});
// Entry — safe to call from multiple code paths
await init();
Section 3 — Renderer
The Terminal GPU
Claude Code doesn't print text. It renders frames.
double buffering — frontFrame / backFrame swap
The renderer maintains two full-screen buffers: frontFrame (what the terminal currently shows) and backFrame (what React just computed). After diffing, the references swap — zero copy, zero allocation. Frames are written on a 16 ms budget (60 fps), and writes are batched inside DEC 2026 synchronized-output markers (BSU/ESU) when per-terminal detection shows support.
// Double-buffer swap — zero allocation per frame
const temp = this.frontFrame;
this.frontFrame = this.backFrame; // promote back → front
this.backFrame = temp; // recycle old front
this.backFrame.clear(); // reset for next React reconcile
packed typed arrays — 8 bytes per cell, no GC pressure
A 200×120 screen is 24,000 cells. Instead of 24,000 JavaScript objects, the renderer allocates two Int32Array words per cell — 192 KB of typed memory that lives entirely off the GC heap. Bulk fills use BigInt64Array to write both words in a single 64-bit store. Clean subtrees are blitted via TypedArray.set() — the primary steady-state optimization.
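The packed layout can be sketched as follows. Class and field names are illustrative, not the actual renderer internals, and the bulk-fill path assumes a little-endian platform:

```typescript
// Two 32-bit words per cell: word 0 holds the interned char ID,
// word 1 holds the style/hyperlink ID. 200×120 cells = 192 KB total.
class PackedCells {
  readonly words: Int32Array;   // 2 words per cell
  readonly wide: BigInt64Array; // same buffer, 64-bit view for bulk fills

  constructor(readonly cols: number, readonly rows: number) {
    const buf = new ArrayBuffer(cols * rows * 8); // 8 bytes per cell
    this.words = new Int32Array(buf);
    this.wide = new BigInt64Array(buf);
  }

  set(x: number, y: number, charId: number, styleId: number): void {
    const i = (y * this.cols + x) * 2;
    this.words[i] = charId;
    this.words[i + 1] = styleId;
  }

  charAt(x: number, y: number): number {
    return this.words[(y * this.cols + x) * 2];
  }

  // Bulk clear: one 64-bit store per cell instead of two 32-bit stores.
  // Assumes little-endian packing (low word = charId, high word = styleId).
  fill(charId: number, styleId: number): void {
    const packed = (BigInt(styleId) << 32n) | BigInt(charId >>> 0);
    this.wide.fill(packed);
  }
}
```

Because both views share one ArrayBuffer, no cell data is ever copied between them; the GC sees a single fixed allocation for the whole screen.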
three interning pools — reset every 5 minutes
Strings never touch the packed cell. Instead, three pools convert strings to compact integer IDs: CharPool (ASCII 0–127 pre-interned, O(1) fast-path), StylePool (bit 0 encodes space-visibility for whitespace-only diff skipping), and HyperlinkPool. All three pools reset every 5 minutes via migrateScreenPools, which re-interns live cells to prevent unbounded growth.
// Pool reset — every 5 minutes
const migrateScreenPools = () => {
const newCharPool = new CharPool(); // fresh pool
const newStylePool = new StylePool();
// Re-intern every live cell in frontFrame
for (let i = 0; i < screen.width * screen.height; i++) {
const char = oldCharPool.get(cell.charId);
cell.charId = newCharPool.intern(char); // new ID
}
oldCharPool = newCharPool; // swap — old GC'd naturally
};
damage-rect diffing + blitting
The diff pass only iterates cells within the dirty bounding rectangle — the damage rect tracked by React reconciliation. Unchanged subtrees (components that didn't re-render) skip the diff entirely: their cell ranges are blitted from the previous frame via TypedArray.set() in a single memcpy-like operation. This is the dominant optimization in steady state where most of the screen is static.
// Blit unchanged subtree — O(n) memcpy, not O(n) object traversal
if (component.isClean) {
backFrame.cells.set(
frontFrame.cells.subarray(component.startIdx, component.endIdx),
component.startIdx
);
continue; // skip diff for this subtree
}
pure typescript yoga — no WASM
Layout is computed by a pure TypeScript port of Meta's Yoga flexbox engine. No WebAssembly, no native bindings, no WASM startup cost. This keeps the renderer fully self-contained in the Bun bundle — no dynamic loading, no platform-specific binaries, no async initialization. The Yoga port handles the full flexbox spec including wrapping, gap, align/justify, and absolute positioning within terminal cell coordinates.
The Tool Machine
60+ tools, streaming execution, and a concurrency model that would make Go developers jealous.
The tool abstraction is one of the most carefully engineered parts of Claude Code. The core Tool interface has roughly 95 members — inputs, outputs, metadata, permissions, concurrency hints, and progress streaming. The generic signature is Tool<Input, Output, Progress>, where Progress carries streaming partial results back to the UI while a long-running tool is still executing.
Every tool is created through a buildTool() factory that enforces fail-closed defaults: not concurrent, not read-only, not destructive unless you explicitly set those flags. If you forget to annotate a new tool, it gets the safest possible permissions. This means a newly added tool is blocked from running in parallel, won't modify any files, and requires explicit user confirmation before executing — all without any extra code.
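A minimal sketch of the fail-closed factory idea, with hypothetical field names (the real Tool interface is far larger):

```typescript
// Illustrative shape: only three of the ~95 members, enough to show
// the fail-closed defaulting. Names are assumptions, not the real API.
interface ToolSpec<I, O> {
  name: string;
  run: (input: I) => Promise<O>;
  isConcurrencySafe?: boolean; // may run in parallel with siblings
  isReadOnly?: boolean;        // never modifies the filesystem
  needsConfirmation?: boolean; // requires explicit user approval
}

function buildTool<I, O>(spec: ToolSpec<I, O>): Required<ToolSpec<I, O>> {
  return {
    // Omitted flags default to the SAFEST value, not the most convenient one.
    isConcurrencySafe: false,
    isReadOnly: false,
    needsConfirmation: true,
    ...spec,
  } as Required<ToolSpec<I, O>>;
}
```

A tool author who writes `buildTool({ name, run })` and nothing else gets a serialized, confirmation-gated, write-capable-but-confirmed tool; opting into parallelism or silent execution requires an explicit flag.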
Input validation uses Zod v4 with a custom lazySchema() wrapper for deferred initialization. This matters because all 60+ tool schemas would otherwise be materialized at startup, adding noticeable boot latency. Lazy schemas are evaluated on first use and then cached.
Two coercion helpers handle the fact that language models occasionally emit the wrong types. semanticNumber() converts string values like "42" or "3.14" into actual numbers. semanticBoolean() converts "true", "yes", "1" into proper booleans. Without these, any schema mismatch would crash the tool call even when the model's intent was perfectly clear.
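Plausible implementations of the two coercion helpers, inferred from the behavior described above rather than copied from source:

```typescript
// Accepts numbers, or strings the model meant as numbers ("42", "3.14").
// Returns undefined when not coercible, so schema validation can report it.
function semanticNumber(v: unknown): number | undefined {
  if (typeof v === 'number' && Number.isFinite(v)) return v;
  if (typeof v === 'string' && v.trim() !== '') {
    const n = Number(v.trim());
    if (Number.isFinite(n)) return n;
  }
  return undefined;
}

// Accepts booleans, or common string spellings of them.
function semanticBoolean(v: unknown): boolean | undefined {
  if (typeof v === 'boolean') return v;
  if (typeof v === 'string') {
    const s = v.trim().toLowerCase();
    if (['true', 'yes', '1', 'on'].includes(s)) return true;
    if (['false', 'no', '0', 'off'].includes(s)) return false;
  }
  return undefined;
}
```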
Tools don't wait for the model to finish its response before executing. As tokens stream in, tool calls are extracted from the stream and dispatched immediately. Read-only tools run in parallel — three GlobTool, GrepTool, and ReadTool calls can all fly concurrently. The moment a write tool appears in the stream, a concurrency barrier is raised: all concurrent siblings must complete before the write executes. After the write, the executor is free to parallelize again.
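The write-barrier scheduling can be sketched as a simple executor; the Call shape and function names here are illustrative:

```typescript
type Call = { name: string; readOnly: boolean; run: () => Promise<string> };

// Read-only calls fan out in parallel; the first write call drains
// all in-flight siblings before it runs (the "barrier").
async function executeStreamed(calls: Call[]): Promise<string[]> {
  const results: string[] = new Array(calls.length);
  let inflight: Promise<void>[] = [];

  for (let i = 0; i < calls.length; i++) {
    const call = calls[i];
    if (call.readOnly) {
      inflight.push(call.run().then((r) => { results[i] = r; }));
    } else {
      // Barrier: all concurrent siblings must complete first.
      await Promise.all(inflight);
      inflight = [];
      results[i] = await call.run();
    }
  }
  await Promise.all(inflight);
  return results;
}
```

In the real system the calls arrive incrementally from the token stream rather than as a pre-built array, but the barrier logic is the same: parallelism between writes, strict ordering around them.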
Claude Code ships with 27 deferred tools that are not included in the base system prompt. Including all tool schemas upfront would push the prompt over 200,000 tokens — an expensive and slow baseline for every conversation. Instead, these tools are loaded on demand via ToolSearch.
ToolSearch automatically activates when MCP tools exceed 10% of the context window. Once active, the model can call ToolSearch with a query to discover and load tools that fit the current task. This keeps the base prompt lean while preserving access to the full tool surface.
Tools are sorted alphabetically in the prompt. This is not cosmetic — it ensures prompt cache stability. If tool order changed between turns, every cached prefix would be invalidated, eliminating the latency and cost savings of KV caching.
Not all tool errors are created equal. BashTool errors are treated as fatal to the concurrent group — if Bash fails, all sibling tools running in parallel are cancelled. The reasoning is that a Bash failure likely indicates a broken environment that makes other reads unreliable too.
ReadTool and WebFetchTool errors, by contrast, are non-cascading. A missing file or a dead URL are recoverable conditions — the model can observe the error and try an alternative. Killing all siblings on a 404 would be overly aggressive and would interrupt unrelated work.
Raw tool output can be enormous — a grep over a large repo might return megabytes. Sending all of that to the model would fill the context window and add cost. Each tool has a configured maxResultSizeChars: when output exceeds the limit, the full result is saved to disk and only a ~2 KB preview is sent to the model. The model sees a truncation notice and the preview, and can request the rest if needed.
The one exception is FileReadTool, which sets maxResultSizeChars = Infinity. The reasoning in the source code is explicit: persisting a file read to disk and then having the model read that file would create a circular dependency loop. FileRead is expected to self-bound by the natural size of files.
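The size-cap policy might be sketched like this; names and the preview format are illustrative, and the disk write is stubbed out:

```typescript
interface ToolResult {
  preview: string;
  truncated: boolean;
  savedPath?: string;
}

// Over-limit output is persisted and replaced by a notice + short preview.
function capResult(
  toolName: string,
  output: string,
  maxResultSizeChars: number,
  previewChars = 2_000,
): ToolResult {
  if (output.length <= maxResultSizeChars) {
    return { preview: output, truncated: false };
  }
  const savedPath = `/tmp/${toolName}-result.txt`; // illustrative path
  // fs.writeFileSync(savedPath, output) — omitted in this sketch
  return {
    preview:
      `[Result truncated: ${output.length} chars, full output at ${savedPath}]\n` +
      output.slice(0, previewChars),
    truncated: true,
    savedPath,
  };
}
```

With maxResultSizeChars = Infinity, as FileReadTool sets, the first branch always wins and output passes through untouched, which is exactly the non-circular behavior the source comment calls for.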
Memory & Dreams
A three-layer memory system with an autonomous consolidation daemon.
Claude Code's persistent memory is organized into three layers with different access patterns and lifetimes. Layer 1 is always loaded — it is the index that keeps the model oriented. Layers 2 and 3 are accessed on demand, keeping the base context small.
The Sonnet relevance selector scans Layer 2 headers on session start and picks up to 5 topic files to inject. It never loads the full Layer 2 corpus — only the files whose frontmatter scores highest against the current task context.
Memories are classified into four types at extraction time. The type determines where the memory is stored, how long it lives, and under what conditions it is surfaced in future sessions.
Even when a user explicitly asks Claude Code to remember something, certain categories are excluded from the memory store: code patterns, git history, and debugging solutions.
The reasoning is practical. Code patterns change too fast — a remembered pattern from last month may be obsolete or actively misleading by next week. Git history is already authoritative in the repository itself; duplicating it in memory would create a stale shadow copy. Debugging solutions are heavily context-dependent: the fix for a specific stack trace on a specific dependency version is rarely transferable, and storing it risks false confidence in future sessions.
The autoDream consolidation daemon runs in the background and compresses raw session transcripts into durable topic files. Before it runs, it must pass six sequential gate checks. All six must pass — any failure aborts the consolidation entirely.
Memory persistence introduces a new attack surface: a malicious repository could attempt to manipulate the agent's long-term state. Claude Code applies several layers of defense.
Path traversal is caught at validation time, not at the filesystem level: a write targeting ../../../.ssh/authorized_keys is rejected before it reaches disk.
Memory files are opened with O_NOFOLLOW semantics. Symlink traversal is blocked, preventing a shared team directory from redirecting writes outside its intended scope.
projectSettings is explicitly excluded from the autoMemoryDirectory override. Without this guard, a malicious repo could set autoMemoryDirectory: ~/.ssh and cause the agent to write memory entries into the user's SSH directory.
extractMemories runs as a forked agent after each query loop, with mutual exclusion and maxTurns: 5. The forked process cannot write back to the parent session — memory extraction is fully isolated from the active conversation context.
Oversized memory files are truncated, and a WARNING is appended to the content so the model knows it is reading a truncated view. This prevents a crafted oversized MEMORY.md from consuming the entire context window.
Context Tetris
Six compaction strategies keep the conversation alive when context runs out.
A 200K-token context window sounds large until you account for the system prompt, full tool schemas, and a multi-hour conversation. Claude Code tracks fill in real time. The compaction threshold is set at effectiveContextWindow − 13,000 tokens — a 13K buffer reserved so there is always room for the model's next response before the window overflows.
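The trigger condition reduces to a one-line check; shouldCompact is a hypothetical name, while the 13K buffer value comes from the article:

```typescript
const COMPACTION_BUFFER_TOKENS = 13_000;

// Compaction fires once usage reaches the effective window minus the buffer,
// guaranteeing room for the model's next response.
function shouldCompact(usedTokens: number, effectiveContextWindow: number): boolean {
  return usedTokens >= effectiveContextWindow - COMPACTION_BUFFER_TOKENS;
}
// With a 200K window, compaction triggers at 187,000 tokens used.
```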
Compaction is not a single operation — it is an escalating ladder of six strategies ordered by severity. Cheap options run first; expensive ones (LLM summarization, emergency collapse) are reserved for genuine pressure. Each strategy is tried in sequence until context fits within the threshold again.
Microcompact is the lightest touch. It only targets results from tools that produce large, low-information outputs: FileRead, Shell, Grep, Glob, WebSearch, WebFetch, FileEdit, and FileWrite. Tool results from reasoning-heavy exchanges — the model's own thinking steps, user messages — are never touched.
The cached variant uses the Anthropic editing API to remove results from an existing cached prefix without invalidating it. This means microcompaction can happen at near-zero cost: the prefix KV cache remains valid, and only the delta is re-sent.
When simple truncation is not enough, Claude Code forks a secondary agent to summarize the conversation. The fork shares the current prompt cache prefix, so the system prompt and tool schemas are not re-tokenized. The summarizer receives a special preamble — NO_TOOLS_PREAMBLE — that prevents it from issuing any tool calls during the pass. It is allowed to write prose only.
The output token budget for the summary is set at MAX_OUTPUT_TOKENS_FOR_SUMMARY = 20,000. This was derived empirically: the p99.99 measured summary length across production sessions was 17,387 tokens. The 20K cap provides headroom without padding the context unnecessarily.
MAX_OUTPUT_TOKENS_FOR_SUMMARY = 20_000 // p99.99 = 17,387 tokens
NO_TOOLS_PREAMBLE                      // no tool calls during summarization
fork shares prompt cache prefix        // system prompt re-tokenization avoided
Auto-compaction failures are not silent. Claude Code tracks consecutive failures with MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3. After three consecutive failures, auto-compaction is disabled for the session. Without this guard, a pathological session could retry failed compaction indefinitely — the engineering team estimated this would waste roughly 250,000 API calls per day globally if left unchecked.
After compaction, Claude Code restores working context so the session can continue without losing important state. The recovery pass reloads up to 5 files that were referenced in the compacted history, capped at 50,000 tokens total. This keeps the most recently accessed code in view without reflating the context back to its pre-compaction size.
Every session is written to a transcript on disk. Writes are fire-and-forget: the main thread never blocks waiting for the write to complete. A 100 ms flush timer batches entries before hitting the filesystem. Individual chunks are capped at 100 MB to prevent single-session files from growing unbounded.
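A minimal sketch of the batched, fire-and-forget writer; the API shape is illustrative:

```typescript
// Callers append and return immediately; a timer batches entries so the
// filesystem sees one write per flush window instead of one per entry.
class TranscriptWriter {
  private pending: string[] = [];
  private timer: ReturnType<typeof setTimeout> | null = null;

  constructor(
    private flushFn: (batch: string[]) => void, // e.g. an fs append
    private flushMs = 100,
  ) {}

  append(entry: string): void {
    this.pending.push(entry);
    // Arm the flush timer only if one isn't already running.
    this.timer ??= setTimeout(() => this.flush(), this.flushMs);
  }

  flush(): void {
    if (this.timer) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = [];
    this.flushFn(batch); // one filesystem write per batch
  }
}
```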
The FileHistory subsystem backs up file contents before every edit. Snapshots use deterministic filenames in the form {hash}@v{N} so they can be reconstructed without a separate index. The maximum number of retained snapshots per file is capped at MAX_SNAPSHOTS = 100.
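Deterministic naming means a snapshot's filename is computable from the file path and version alone, so no separate index is needed to find one. This sketch assumes a truncated SHA-256 of the path, which is a guess about the hash choice:

```typescript
import { createHash } from 'node:crypto';

const MAX_SNAPSHOTS = 100; // retained snapshots per file, from the article

// {hash}@v{N}: same path always yields the same hash prefix,
// so any version of a file is addressable without an index file.
function snapshotName(filePath: string, version: number): string {
  const hash = createHash('sha256').update(filePath).digest('hex').slice(0, 12);
  return `${hash}@v${version}`;
}
```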
The Agent Swarm
One coordinator, many workers. A push-notification IPC model, prompt cache sharing, and three isolation modes — from shared filesystem to fully remote.
In coordinator mode, the orchestrating agent is restricted to just three tools: Agent (to spawn workers), SendMessage (IPC), and TaskStop. It never directly touches the filesystem. Workers receive a full tool suite and operate independently, pushing completion notifications back as structured XML.
The coordinator never polls. When a worker finishes, it emits a <task-notification> XML block containing status, summary, result, and usage. The coordinator processes these at its next tool boundary — a clean event-driven model with no busy-waiting.
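The notification shape and a minimal way to consume it can be sketched as follows; the field names come from the article, but the exact schema and parsing are internal:

```typescript
// Example payload a worker might push (contents are illustrative).
const notification = `
<task-notification>
  <status>completed</status>
  <summary>Refactored auth middleware</summary>
  <result>3 files changed</result>
  <usage>12451 tokens</usage>
</task-notification>`;

// A minimal field extractor; a real implementation would use a proper
// XML parser rather than a regex.
function field(xml: string, tag: string): string | undefined {
  const m = xml.match(new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>`));
  return m?.[1].trim();
}
```

Because the coordinator only reads these at its next tool boundary, the worker side is fully fire-and-forget: no shared locks, no polling loop.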
How much isolation workers get from each other depends on the mode. Default mode shares the filesystem — fast but all workers see each other's writes. Worktree mode gives each worker its own git branch; worktrees with no changes are auto-removed after the task. Remote mode (CCR) runs agents on separate machines entirely, used for ULTRAPLAN 30-minute planning sessions.
The coordinator system prompt is roughly 6,000 characters and defines four workflow phases: Research → Synthesis → Implementation → Verification. Each phase has explicit entry criteria, required tool calls, and exit conditions. The coordinator never moves to the next phase until the current one is complete.
When the coordinator spawns children, all fork children share a byte-identical system prompt and tools schema. This means the prompt cache prefix is shared across all workers — the KV cache is computed once and reused. Only the final per-task directive differs, which is tiny. Workers get full parallel execution at near-zero marginal prompt cost.
Inbox files are opened with O_NOFOLLOW to prevent symlink attacks. Each task payload is capped at 5 GB.
Messages are delivered to .claude/teams/{team}/inboxes/{agent}.json. Delivery retries 10 times with 5–100 ms exponential backoff.
createSubagentContext() creates a deeply isolated execution context for each worker. Only setAppStateForTasks can bridge back to the root store — everything else is scoped. This prevents workers from accidentally contaminating each other's state or the coordinator's view of the world.
Claude Code ships with 5 built-in agent types. One-shot agents skip the trailer section to save roughly 135 characters per run — at 34 million runs per week, this represents significant token savings at scale. KAIROS mode forces all agents to operate asynchronously, enabling maximum parallelism across the swarm.
Permissions & Hooks
A six-layer permission pipeline with programmable hooks at every boundary.
Every tool call travels through a fixed sequence of checkpoints. Each gate can independently reject the call — and two of them produce side-channel annotations the model sees but the user does not.
The permission mode is a session-level dial. It doesn't replace the rule engine — it controls what happens when a call reaches the user-prompt stage. Most production CI setups run dontAsk or bypassPermissions with a tight alwaysDeny list.
Rules use a Tool(glob*) syntax. The tool name is literal; the argument is a glob matched against the serialized call. Parentheses in the pattern must be escaped.
# Always allow: safe read-only git commands
"alwaysAllow": [
  "Bash(git log*)",
  "Bash(git diff*)",
  "Bash(git status*)",
  "Read(*)",
  "Glob(*)"
]

# Never allow: destructive operations
"alwaysDeny": [
  "Bash(git push --force*)",
  "Bash(rm -rf*)",
  "Bash(sudo *)"
]

# Patterns with literal parentheses — escaped
"alwaysAllow": [
  "Bash(node -e \(*\))"
]
Glob matching uses minimatch. A trailing * matches any suffix including spaces and flags — Bash(git push*) blocks git push --force origin main.
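The trailing-* semantics can be approximated without minimatch for illustration (the real engine supports full glob syntax, including escapes):

```typescript
// Simplified rule matcher: tool name is literal, argument is a glob.
// Only the trailing-* case is handled here; minimatch does the rest.
function matchesRule(rule: string, call: string): boolean {
  const m = rule.match(/^(\w+)\((.*)\)$/);
  if (!m) return false;
  const [, tool, pattern] = m;
  if (!call.startsWith(`${tool}(`) || !call.endsWith(')')) return false;
  const arg = call.slice(tool.length + 1, -1);
  if (pattern.endsWith('*')) {
    // '*' matches any suffix, including spaces and flags.
    return arg.startsWith(pattern.slice(0, -1));
  }
  return arg === pattern;
}
```

This is why `Bash(git push*)` in alwaysDeny also blocks `git push --force origin main`: the suffix after the prefix match is irrelevant.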
Five layers, evaluated in order. Each layer can add alwaysAllow, alwaysDeny, and permissions objects. Later layers override earlier ones. Policy settings are injected externally and cannot be bypassed by any user-controlled layer.
Before any permission rule is evaluated, the tool call is pattern-matched against known dangerous invocation forms. Matching a pattern doesn't automatically block the call — it upgrades it to the ask tier, preventing silent auto-approval even if the glob rule would permit it.
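The upgrade-to-ask behavior, sketched with hypothetical tier names and example patterns:

```typescript
type Tier = 'allow' | 'ask' | 'deny';

// Illustrative dangerous-invocation patterns (the real list is internal).
const DANGEROUS_PATTERNS = [/rm\s+-rf/, /curl .*\|\s*(ba)?sh/, /chmod\s+777/];

// A dangerous match never blocks outright; it only forbids silent
// auto-approval by upgrading 'allow' to 'ask'. Denies stay denies.
function finalTier(ruleTier: Tier, command: string): Tier {
  const dangerous = DANGEROUS_PATTERNS.some((p) => p.test(command));
  if (dangerous && ruleTier === 'allow') return 'ask';
  return ruleTier;
}
```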
Hooks are the escape hatch for everything the permission rules can't express. They run at 26 lifecycle events, in 4 execution styles, with 3 advanced scheduling modifiers. The right mental model: hooks are middleware for the agent runtime.
// .claude/settings.json
{
"hooks": {
"PreToolUse": [
{
"matcher": "Bash",
"hooks": [
{
"type": "command",
"command": "node scripts/audit-bash.js"
}
]
}
],
"PostToolUse": [
{
"matcher": "Edit",
"hooks": [
{
"type": "command",
"command": "pnpm run lint --fix",
"async": true
}
]
}
],
"SessionStart": [
{
"hooks": [
{
"type": "command",
"command": "cat .context/project-summary.md",
"once": true
}
]
}
]
}
}
When Anthropic employees contribute to open source projects, Claude operates in Undercover Mode. It actively conceals its own identity to protect internal information. There is no force-off — if the system is not confident it is in an internal repo, it stays undercover.
CRITICAL: There is NO force-OFF. If we're not confident we're in an internal repo, we stay undercover.
Skills are reusable prompt templates that ship as slash commands. Plugins bundle skills, hooks, and MCP server registrations into a single installable unit. Both are subject to the same permission pipeline as every other tool call.
Skills declared with context: 'fork' run in an isolated subagent with their own token budget. They cannot read the parent conversation. Expensive skills don't pollute the main context window — and if they blow up, the parent session survives.
The claudeai-proxy transport tunnels MCP calls through the Claude.ai session. Internal transports (internal:fork, internal:ipc) are not exposed in the public plugin API.
7 Lessons for Agent Builders
Patterns distilled from 500K lines of the most battle-tested AI agent in production. Every principle below is backed by a specific architectural decision in Claude Code.
Loop dumb, prompt smart
The core agent loop is a while(true) around an API call in a single 68KB file. No framework, no orchestration layer, no state machine. The loop calls the model, dispatches tool calls, feeds results back, and repeats. If your agent framework has more logic than your prompts, reconsider what belongs where.
Tool prompts encode behavior
buildTool() enforces fail-closed defaults: any new tool is blocked from running in parallel, won't modify files, and requires user confirmation — all without extra code.
Cache drives architecture
The system prompt is split at a single marker: __SYSTEM_PROMPT_DYNAMIC_BOUNDARY__. Everything above it (identity, tools, behavioral rules) is static and cacheable across the entire fleet. Everything below it (your cwd, git status, memory) changes per session. Moving the agent type list from tool schemas into messages saved 10.2% of fleet cache costs. Tools are sorted alphabetically to prevent cache invalidation from ordering changes. One structural change saved more than months of prompt engineering.
Context management > orchestration
Multi-agent is a cache-sharing problem
Defense in depth, not defense in one
Measure cache breaks
Build observability into your agent from day one.
You can't optimize what you don't measure.
The Hidden Layer
Engineering decisions that weren't in any changelog — from binary attestation to cache economics to anti-poisoning pipelines.
Poisoning the Well
Claude Code actively defends against competitors scraping its API outputs for training data. The system injects fake/decoy tools into API responses via getExtraBodyParams(). Anyone harvesting these outputs to train a rival model trains on poisoned data — tool definitions that don't exist, capabilities that don't work, patterns that lead nowhere.
The feature is gated behind tengu_anti_distill_fake_tool_injection and only active for 1st-party CLI builds. There's also a secondary mechanism: connector-text summarization in the beta headers, which obscures the chain-of-thought reasoning that would be most valuable to a distillation pipeline.
20 Turns Ahead
While you're reviewing step 1, Claude Code is already executing steps 2 through 20. The speculative execution engine runs up to 20 turns / 100 messages ahead of your approval cursor.
The key insight is the file overlay system. All writes go to a temporary directory, not the real filesystem. Read-only tools (Read, Glob, Grep, ToolSearch, LSP) are auto-approved and run against the real filesystem. Write tools are captured in the overlay. If you accept, the overlay is applied atomically. If you reject, it's discarded — zero side effects.
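The overlay idea can be sketched with an in-memory map standing in for the temp directory the article describes:

```typescript
// Writes land in the overlay; reads prefer the overlay over the real fs.
// accept() commits everything, reject() discards everything.
class FileOverlay {
  private writes = new Map<string, string>();

  constructor(private realRead: (path: string) => string | undefined) {}

  write(path: string, content: string): void {
    this.writes.set(path, content);
  }

  read(path: string): string | undefined {
    return this.writes.get(path) ?? this.realRead(path); // overlay wins
  }

  accept(apply: (path: string, content: string) => void): void {
    for (const [p, c] of this.writes) apply(p, c); // commit all captured writes
    this.writes.clear();
  }

  reject(): void {
    this.writes.clear(); // zero side effects on the real filesystem
  }
}
```

Speculative turns see their own writes through read() (so later steps build on earlier ones), while the real filesystem stays untouched until the user's approval cursor catches up.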
A Hidden Gacha Companion
Claude Code ships a hidden companion system with 18 species across 5 rarity tiers. The interesting engineering detail: a companion's "soul" (name and personality) is model-generated, but its "bones" (species, rarity, stats) are deterministically derived from the user's userId hash on every session start. Users cannot edit their way to a legendary — the bones are never persisted.
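The derived-not-persisted property can be illustrated with a hash-based sketch; the tier names and counts come from the article, but the actual derivation details are unknown:

```typescript
import { createHash } from 'node:crypto';

const RARITIES = ['common', 'uncommon', 'rare', 'epic', 'legendary'] as const;
const SPECIES_COUNT = 18;

// "Bones" are a pure function of userId: re-derived on every session start,
// never written to disk, so no save-file edit can change them.
function companionBones(userId: string) {
  const h = createHash('sha256').update(userId).digest();
  return {
    species: h[0] % SPECIES_COUNT,
    rarity: RARITIES[h[1] % RARITIES.length],
  };
}
```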
Species names are hex-encoded in source because one collides with an internal model codename flagged in excluded-strings.txt. This is the same anti-leak pipeline that catches references to unreleased models anywhere in the codebase.
The Details That Weren't in Any Announcement
Tacit saves every agent session, searchable forever.
We built Tacit because we understand this system at the source level.
Session memory for Claude Code.
Every session saved. Every decision tracked. Every pattern discovered.
Get Tacit — $29 → Early adopter price. One-time purchase. Lifetime updates.