14 execution modes. GPU rendering in your terminal. 48 unreleased feature flags. The complete technical anatomy of the most sophisticated AI coding agent ever built.

by Ketan Khairnar · tacit.sh · ~15 min read
500K lines of source · 1,884 .ts/.tsx files · 14 ways to run · 60+ available tools · 48+ unreleased features · 6 compaction strategies

Section 02 — Dispatch

14 Execution Modes

A single cli.tsx dispatches to 14 distinct execution paths via feature() gates and argv inspection.

14 EXECUTION MODES — cli.tsx DISPATCH TREE
SYSTEM PROMPT TOKEN BUDGET (~46,000 TOKENS)

compile-time dead code elimination

The feature() function comes from bun:bundle, Bun's built-in compile-time constant resolver. When a feature flag evaluates to false at bundle time, the entire branch — including all imports it pulls in — is physically removed from the output bundle.

// bun:bundle compile-time gate
if (feature('DAEMON')) {
  const { startDaemonWorker } = await import('./daemon/worker');
  await startDaemonWorker(argv);
  process.exit(0);
}
// ↑ entire block absent from external builds

modes are physically absent, not runtime-gated

This is not a runtime flag check. The compiled bundle shipped to npm users contains zero bytes of daemon supervisor code, zero bytes of bridge sync, zero bytes of BYOC runner. You cannot enable them at runtime — the code does not exist in your binary. This lets Anthropic ship specialized internal builds (e.g. cloud-hosted runners, enterprise bridge nodes) from the exact same source tree without any conditional logic at runtime.

system context: two memoized builders

Before the first API call, two memoized functions assemble the system prompt. getSystemContext() runs git status and includes the output, truncated to 2,000 chars to avoid blowing the context window on large repos. getUserContext() walks the directory tree collecting every CLAUDE.md file from repo root up to filesystem root, then appends today's date. Both results are cached for the process lifetime.

// Memoized — computed once, reused every turn
const getSystemContext = memoize(async () => {
  const gitStatus = await runGit(['status', '--short']);
  return truncate(gitStatus, 2000); // hard cap
});

const getUserContext = memoize(async () => {
  const claudeMds = await collectClaudeMds(); // cwd → /
  return claudeMds + '\n\n# currentDate\n' + new Date().toISOString();
});

init() is memoized — runs exactly once

The top-level init() function in main.tsx is wrapped in a memoize guard so it cannot run twice, even if the module is re-evaluated. Its sequence is carefully ordered: API preconnect fires first — opening TCP and completing the TLS handshake before any token is typed — so that latency on the first real request is dominated by model compute, not network setup.

// init() — runs exactly once, guarded via memoize
const init = memoize(async () => {
  await preconnectApi();          // TCP+TLS handshake — overlaps with startup
  await loadConfig();             // merge global + project config
  await getSystemContext();       // warm the cache
  await getUserContext();         // warm the cache
  initTelemetry();                // non-blocking
});

// Entry — safe to call from multiple code paths
await init();

Section 03 — Renderer

The Terminal GPU

Claude Code doesn't print text. It renders frames.

FRAME PIPELINE (16ms budget, ~60fps)
PACKED CELL LAYOUT (8 bytes per cell)
So what?
If your CLI app re-renders frequently, consider typed arrays over JS objects. At 24K cells, the difference between 192KB of Int32Arrays (off-heap) and 24K JS objects (GC-managed) is the difference between smooth 60fps and periodic GC pauses that stutter the UI.

double buffering — frontFrame / backFrame swap

The renderer maintains two full-screen buffers: frontFrame (what the terminal currently shows) and backFrame (what React just computed). After diffing, the references swap — zero copy, zero allocation. Writes are batched within the 16 ms frame budget (~60 fps) using DEC 2026 synchronized output (BSU/ESU), detected per terminal.

// Double-buffer swap — zero allocation per frame
const temp  = this.frontFrame;
this.frontFrame = this.backFrame;  // promote back → front
this.backFrame  = temp;            // recycle old front
this.backFrame.clear();            // reset for next React reconcile

packed typed arrays — 8 bytes per cell, no GC pressure

A 200×120 screen is 24,000 cells. Instead of 24,000 JavaScript objects, the renderer allocates two Int32Array words per cell — 192 KB of typed memory that lives entirely off the GC heap. Bulk fills use BigInt64Array to write both words in a single 64-bit store. Clean subtrees are blitted via TypedArray.set() — the primary steady-state optimization.

200 × 120 = 24,000 cells · 8 bytes each = 192 KB typed arrays, not 24,000 JS objects
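A minimal sketch of what an 8-byte packed cell could look like. The field layout here is an assumption — the source only states "two Int32Array words per cell" — so assigning word 0 to the char ID and word 1 to the style ID is illustrative, not the real format:

```typescript
// Hypothetical packed-cell layout: word 0 = interned char ID,
// word 1 = interned style ID. (Field assignment is an assumption.)
const WIDTH = 200, HEIGHT = 120;
const WORDS_PER_CELL = 2;
const cells = new Int32Array(WIDTH * HEIGHT * WORDS_PER_CELL); // 192 KB, off the GC heap

function setCell(x: number, y: number, charId: number, styleId: number): void {
  const base = (y * WIDTH + x) * WORDS_PER_CELL;
  cells[base] = charId;
  cells[base + 1] = styleId;
}

function getCharId(x: number, y: number): number {
  return cells[(y * WIDTH + x) * WORDS_PER_CELL];
}

// Bulk fills via a BigInt64Array view over the same buffer:
// one 64-bit store writes both words of a cell at once.
const cells64 = new BigInt64Array(cells.buffer);
function clearAll(fill: bigint = 0n): void {
  cells64.fill(fill);
}
```

The typed-array view trick is why bulk clears are cheap: `cells64.fill()` touches the same memory as `cells`, two words at a time.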

three interning pools — reset every 5 minutes

Strings never touch the packed cell. Instead, three pools convert strings to compact integer IDs: CharPool (ASCII 0–127 pre-interned, O(1) fast-path), StylePool (bit 0 encodes space-visibility for whitespace-only diff skipping), and HyperlinkPool. All three pools reset every 5 minutes via migrateScreenPools, which re-interns live cells to prevent unbounded growth.

// Pool reset — every 5 minutes
const migrateScreenPools = () => {
  const newCharPool  = new CharPool();   // fresh pools
  const newStylePool = new StylePool();
  // Re-intern every live cell in frontFrame
  for (let i = 0; i < screen.width * screen.height; i++) {
    const cell   = frontFrame.cellAt(i); // cell accessor elided in source
    cell.charId  = newCharPool.intern(oldCharPool.get(cell.charId));
    cell.styleId = newStylePool.intern(oldStylePool.get(cell.styleId));
  }
  oldCharPool  = newCharPool;   // swap — old pools GC'd naturally
  oldStylePool = newStylePool;
};
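A minimal interning pool, sketched from the description above. The real CharPool internals are not public; the only stated behaviors — ASCII 0–127 pre-interned with an O(1) fast path, and stable string→ID round-trips — are what this toy version reproduces:

```typescript
// Sketch of an interning pool (internal API assumed).
// ASCII 0–127 is pre-interned so a single ASCII char resolves to its
// code point with no map lookup at all.
class CharPool {
  private ids = new Map<string, number>();
  private strings: string[] = [];

  constructor() {
    // IDs 0–127 are the ASCII code points themselves.
    for (let c = 0; c < 128; c++) this.strings.push(String.fromCharCode(c));
  }

  intern(s: string): number {
    // Fast path: single ASCII char → its own code point, O(1).
    if (s.length === 1) {
      const code = s.charCodeAt(0);
      if (code < 128) return code;
    }
    let id = this.ids.get(s);
    if (id === undefined) {
      id = this.strings.length;
      this.ids.set(s, id);
      this.strings.push(s);
    }
    return id;
  }

  get(id: number): string {
    return this.strings[id];
  }
}
```

The 5-minute reset in migrateScreenPools works precisely because IDs are only meaningful relative to a pool instance: re-interning every live cell into a fresh pool discards every string no cell references anymore.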

damage-rect diffing + blitting

The diff pass only iterates cells within the dirty bounding rectangle — the damage rect tracked by React reconciliation. Unchanged subtrees (components that didn't re-render) skip the diff entirely: their cell ranges are blitted from the previous frame via TypedArray.set() in a single memcpy-like operation. This is the dominant optimization in steady state where most of the screen is static.

// Blit unchanged subtree — one memcpy-style copy, not per-cell traversal
if (component.isClean) {
  backFrame.cells.set(
    frontFrame.cells.subarray(component.startIdx, component.endIdx),
    component.startIdx
  );
  continue; // skip diff for this subtree
}

pure typescript yoga — no WASM

Layout is computed by a pure TypeScript port of Meta's Yoga flexbox engine. No WebAssembly, no native bindings, no WASM startup cost. This keeps the renderer fully self-contained in the Bun bundle — no dynamic loading, no platform-specific binaries, no async initialization. The Yoga port handles the full flexbox spec including wrapping, gap, align/justify, and absolute positioning within terminal cell coordinates.

Section 04 — Tool Execution

The Tool Machine

60+ tools, streaming execution, and a concurrency model that would make Go developers jealous.

The Tool Interface

The tool abstraction is one of the most carefully engineered parts of Claude Code. The core Tool interface has roughly 95 members — inputs, outputs, metadata, permissions, concurrency hints, and progress streaming. The generic signature is Tool<Input, Output, Progress>, where Progress carries streaming partial results back to the UI while a long-running tool is still executing.

Every tool is created through a buildTool() factory that enforces fail-closed defaults: not concurrent, not read-only, not destructive unless you explicitly set those flags. If you forget to annotate a new tool, it gets the safest possible permissions. This means a newly added tool is blocked from running in parallel, won't modify any files, and requires explicit user confirmation before executing — all without any extra code.
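A fail-closed factory is easy to sketch. The field names and signature below are assumptions — the source describes only the defaults, not the real interface — but the principle is exactly as stated: every flag defaults to the safest value, and a caller must opt out explicitly:

```typescript
// Sketch of a fail-closed tool factory (field names are assumptions).
interface ToolSpec<I, O> {
  name: string;
  run: (input: I) => Promise<O>;
  isConcurrencySafe?: boolean; // may this run in parallel with siblings?
  isReadOnly?: boolean;        // does it avoid modifying files?
  needsConfirmation?: boolean; // require explicit user approval?
}

function buildTool<I, O>(spec: ToolSpec<I, O>): Required<ToolSpec<I, O>> {
  return {
    name: spec.name,
    run: spec.run,
    // Safest possible defaults: an unannotated tool is serialized,
    // treated as a writer, and gated behind user confirmation.
    isConcurrencySafe: spec.isConcurrencySafe ?? false,
    isReadOnly: spec.isReadOnly ?? false,
    needsConfirmation: spec.needsConfirmation ?? true,
  };
}
```

The design choice worth copying: safety comes from omission, not from remembering to add it.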

Schemas and Coercions

Input validation uses Zod v4 with a custom lazySchema() wrapper for deferred initialization. This matters because all 60+ tool schemas would otherwise be materialized at startup, adding noticeable boot latency. Lazy schemas are evaluated on first use and then cached.

Two coercion helpers handle the fact that language models occasionally emit the wrong types. semanticNumber() converts string values like "42" or "3.14" into actual numbers. semanticBoolean() converts "true", "yes", "1" into proper booleans. Without these, any schema mismatch would crash the tool call even when the model's intent was perfectly clear.
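Plausible sketches of the two helpers, inferred from the behavior described above (the "on"/"off" synonyms are my addition; the source lists "true", "yes", "1"):

```typescript
// Sketch of semanticNumber: accept real numbers and numeric strings.
function semanticNumber(v: unknown): number | undefined {
  if (typeof v === "number" && Number.isFinite(v)) return v;
  if (typeof v === "string") {
    const s = v.trim();
    const n = Number(s);
    if (s !== "" && Number.isFinite(n)) return n;
  }
  return undefined; // leave genuinely unparseable values to schema validation
}

// Sketch of semanticBoolean: map common truthy/falsy spellings.
function semanticBoolean(v: unknown): boolean | undefined {
  if (typeof v === "boolean") return v;
  if (typeof v === "string") {
    const s = v.trim().toLowerCase();
    if (["true", "yes", "1", "on"].includes(s)) return true;
    if (["false", "no", "0", "off"].includes(s)) return false;
  }
  return undefined;
}
```

Returning undefined rather than throwing lets the Zod schema produce the actual validation error when coercion fails.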

Streaming Tool Executor

Tools don't wait for the model to finish its response before executing. As tokens stream in, tool calls are extracted from the stream and dispatched immediately. Read-only tools run in parallel — GlobTool, GrepTool, and ReadTool calls can all fly concurrently. The moment a write tool appears in the stream, a concurrency barrier is raised: all concurrent siblings must complete before the write executes. After the write, the executor is free to parallelize again.
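The read-parallel/write-barrier rule can be sketched in a few lines. This version takes calls from an async iterable rather than extracting them from a token stream, and the shapes here are illustrative, not the real executor's types:

```typescript
// Sketch of the read-parallel / write-barrier scheduling rule.
interface ToolCall {
  name: string;
  readOnly: boolean;
  run: () => Promise<string>;
}

async function executeStreaming(calls: AsyncIterable<ToolCall>): Promise<string[]> {
  const results: Promise<string>[] = [];
  let inFlight: Promise<string>[] = [];

  for await (const call of calls) {
    if (call.readOnly) {
      const p = call.run();        // read-only: fire immediately, in parallel
      inFlight.push(p);
      results.push(p);
    } else {
      await Promise.all(inFlight); // barrier: drain all concurrent siblings
      inFlight = [];
      const r = await call.run();  // write runs alone
      results.push(Promise.resolve(r));
    }
  }
  return Promise.all(results);
}
```

The barrier is what makes the model's writes safe: a FileEdit never races a Grep that might be reading the same file.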

STREAMING TOOL EXECUTOR
So what?
The streaming executor is the key latency win. Instead of waiting for the model to finish its entire response before running tools, results start flowing back while the model is still generating. Read-only tools run in parallel — write tools raise a barrier. If you're building an agent, this pattern alone can cut perceived latency by 40-60%.
27 Deferred Tools and ToolSearch

Claude Code ships with 27 deferred tools that are not included in the base system prompt. Including all tool schemas upfront would push the prompt over 200,000 tokens — an expensive and slow baseline for every conversation. Instead, these tools are loaded on demand via ToolSearch.

ToolSearch automatically activates when MCP tools exceed 10% of the context window. Once active, the model can call ToolSearch with a query to discover and load tools that fit the current task. This keeps the base prompt lean while preserving access to the full tool surface.

Tools are sorted alphabetically in the prompt. This is not cosmetic — it ensures prompt cache stability. If tool order changed between turns, every cached prefix would be invalidated, eliminating the latency and cost savings of KV caching.

Error Cascading

Not all tool errors are created equal. BashTool errors are treated as fatal to the concurrent group — if Bash fails, all sibling tools running in parallel are cancelled. The reasoning is that a Bash failure likely indicates a broken environment that makes other reads unreliable too.

ReadTool and WebFetchTool errors, by contrast, are non-cascading. A missing file or a dead URL are recoverable conditions — the model can observe the error and try an alternative. Killing all siblings on a 404 would be overly aggressive and would interrupt unrelated work.

Tool Result Budgets

Raw tool output can be enormous — a grep over a large repo might return megabytes. Sending all of that to the model would fill the context window and add cost. Each tool has a configured maxResultSizeChars: when output exceeds the limit, the full result is saved to disk and only a ~2 KB preview is sent to the model. The model sees a truncation notice and the preview, and can request the rest if needed.

The one exception is FileReadTool, which sets maxResultSizeChars = Infinity. The reasoning in the source code is explicit: persisting a file read to disk and then having the model read that file would create a circular dependency loop. FileRead is expected to self-bound by the natural size of files.
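The spill-to-disk shape is simple to sketch. The preview size and file naming below are assumptions — the source states only a ~2 KB preview, a per-tool cap, and persistence of the full result:

```typescript
// Sketch of a per-tool result budget with spill-to-disk.
import { writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

const PREVIEW_CHARS = 2_000; // ~2 KB preview (assumed exact value)

function budgetResult(toolName: string, output: string, maxResultSizeChars: number): string {
  if (output.length <= maxResultSizeChars) return output;

  // Over budget: persist the full output, send preview + truncation notice.
  const spillPath = join(tmpdir(), `${toolName}-${Date.now()}.out`);
  writeFileSync(spillPath, output);
  return (
    output.slice(0, PREVIEW_CHARS) +
    `\n[truncated: ${output.length} chars total, full result saved to ${spillPath}]`
  );
}
```

Note how FileReadTool's maxResultSizeChars = Infinity falls out naturally: the length check never trips, so nothing is ever spilled.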

TOOL RESULT BUDGETS

Section 05 — Memory

Memory & Dreams

A three-layer memory system with an autonomous consolidation daemon.

Three-Layer Memory Architecture

Claude Code's persistent memory is organized into three layers with different access patterns and lifetimes. Layer 1 is always loaded — it is the index that keeps the model oriented. Layers 2 and 3 are accessed on demand, keeping the base context small.

THREE-LAYER MEMORY ARCHITECTURE

The Sonnet relevance selector scans Layer 2 headers on session start and picks up to 5 topic files to inject. It never loads the full Layer 2 corpus — only the files whose frontmatter scores highest against the current task context.

Four Memory Types

Memories are classified into four types at extraction time. The type determines where the memory is stored, how long it lives, and under what conditions it is surfaced in future sessions.

user

Long-term preferences, working style, communication patterns. Persists across all projects.

feedback

Corrections and adjustments the user has made to model behavior. Shapes future responses.

project

Codebase-specific context: architecture decisions, conventions, known pitfalls.

reference

External facts, documentation snippets, or domain knowledge worth retaining.

What Is Never Saved

Even when a user explicitly asks Claude Code to remember something, certain categories are excluded from the memory store: code patterns, git history, and debugging solutions.

The reasoning is practical. Code patterns change too fast — a remembered pattern from last month may be obsolete or actively misleading by next week. Git history is already authoritative in the repository itself; duplicating it in memory would create a stale shadow copy. Debugging solutions are heavily context-dependent: the fix for a specific stack trace on a specific dependency version is rarely transferable, and storing it risks false confidence in future sessions.

autoDream Gate Checks

The autoDream consolidation daemon runs in the background and compresses raw session transcripts into durable topic files. Before it runs, it must pass six sequential gate checks. All six must pass — any failure aborts the consolidation entirely.

1. Feature gate — tengu_onyx_plover
2. KAIROS / remote guard — skips remote agents entirely
3. Time gate — ≥ 24 hours since last consolidation
4. Scan throttle — 10 min minimum between scans
5. Session gate — ≥ 5 sessions before first dream
6. Lock — PID written to .consolidate-lock
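The gate sequence reduces to a short-circuiting predicate. Field and function names below are assumptions derived from the gate list; only the thresholds and the all-must-pass semantics come from the source:

```typescript
// Sketch of the six sequential autoDream gates (names assumed).
// Any failure aborts consolidation entirely.
interface DreamState {
  featureEnabled: boolean;     // tengu_onyx_plover flag
  isRemoteAgent: boolean;      // KAIROS / remote guard
  hoursSinceLastDream: number;
  minutesSinceLastScan: number;
  sessionCount: number;
  acquireLock: () => boolean;  // writes PID to .consolidate-lock
}

function canDream(s: DreamState): boolean {
  if (!s.featureEnabled) return false;           // 1. feature gate
  if (s.isRemoteAgent) return false;             // 2. remote guard
  if (s.hoursSinceLastDream < 24) return false;  // 3. time gate
  if (s.minutesSinceLastScan < 10) return false; // 4. scan throttle
  if (s.sessionCount < 5) return false;          // 5. session gate
  return s.acquireLock();                        // 6. PID lock, last
}
```

Taking the lock last matters: the five cheap checks run before anything touches the filesystem.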
Dream phases: Orient → Gather → Consolidate → Prune
So what?
Memory without pruning becomes noise. Claude Code's autoDream consolidation daemon compresses raw transcripts into durable topic files — but only after passing 6 sequential gates (feature flag, remote guard, 24h cooldown, 10min scan throttle, 5-session minimum, PID lock). If you're building persistent memory for an agent, design the eviction policy before the write policy.
Security Details

Memory persistence introduces a new attack surface: a malicious repository could attempt to manipulate the agent's long-term state. Claude Code applies several layers of defense.

Path Security
The memory write path rejects relative paths, null bytes, and tilde expansion before any file operation. An attempted write to ../../../.ssh/authorized_keys is caught at validation, not at the filesystem level.
Team Memory Symlink Protection
Team memory directories are accessed with O_NOFOLLOW semantics. Symlink traversal is blocked, preventing a shared team directory from redirecting writes outside its intended scope.
autoMemoryDirectory Override Prevention
projectSettings is explicitly excluded from the autoMemoryDirectory override. Without this guard, a malicious repo could set autoMemoryDirectory: ~/.ssh and cause the agent to write memory entries into the user's SSH directory.
extractMemories Isolation
extractMemories runs as a forked agent after each query loop with mutual exclusion and maxTurns: 5. The forked process cannot write back to the parent session — memory extraction is fully isolated from the active conversation context.
truncateEntrypointContent()
The MEMORY.md entrypoint is truncated in two passes: line-truncate first (200 lines), byte-truncate second (25 KB). If either cap is hit, a WARNING is appended to the content so the model knows it is reading a truncated view. This prevents a crafted oversized MEMORY.md from consuming the entire context window.
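The two-pass shape is worth seeing concretely. The function body below is a sketch — only the caps (200 lines, 25 KB), the pass ordering, and the appended warning come from the source; the warning text is invented:

```typescript
// Sketch of two-pass entrypoint truncation: lines first, then bytes,
// with a warning appended whenever either cap fires.
const MAX_LINES = 200;
const MAX_BYTES = 25_000;
const WARNING = "\n\n[WARNING: MEMORY.md truncated — you are reading a partial view]";

function truncateEntrypointContent(content: string): string {
  let truncated = false;

  // Pass 1: line cap
  const lines = content.split("\n");
  if (lines.length > MAX_LINES) {
    content = lines.slice(0, MAX_LINES).join("\n");
    truncated = true;
  }

  // Pass 2: byte cap (measured in UTF-8 bytes, not JS string length)
  const bytes = Buffer.from(content, "utf8");
  if (bytes.length > MAX_BYTES) {
    content = bytes.subarray(0, MAX_BYTES).toString("utf8");
    truncated = true;
  }

  return truncated ? content + WARNING : content;
}
```

Line-truncating first means pathological inputs (one enormous line, or thousands of tiny ones) are each caught by the pass built for them.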
Section 06 — Context

Context Tetris

Six compaction strategies keep the conversation alive when context runs out.

Context Window — Fill Model

A 200K-token context window sounds large until you account for the system prompt, full tool schemas, and a multi-hour conversation. Claude Code tracks fill in real time. The compaction threshold is set at effectiveContextWindow − 13,000 tokens — a 13K buffer reserved so there is always room for the model's next response before the window overflows.

CONTEXT WINDOW UTILIZATION
Six-Strategy Compaction Hierarchy

Compaction is not a single operation — it is an escalating ladder of six strategies ordered by severity. Cheap options run first; expensive ones (LLM summarization, emergency collapse) are reserved for genuine pressure. Each strategy is tried in sequence until context fits within the threshold again.

1. Microcompact — time-based, 60-min idle gap

Clears old tool results, keeps the last 5. Only targets COMPACTABLE_TOOLS: FileRead, Shell, Grep, Glob, WebSearch, WebFetch, FileEdit, FileWrite. Cached variant uses the editing API to remove results without invalidating the prefix cache.

2. Snip compaction — early pressure

Truncates the oldest history entries. No summarization — raw truncation of messages from the front of the conversation.

3. Session Memory compact — mid pressure, session memory present

Prunes conversation history using the accumulated session memory content as a guide for what is safe to remove. More targeted than snip compaction.

4. Full compaction — high pressure

LLM summarization via a forked agent that shares the current prompt cache. MAX_OUTPUT_TOKENS_FOR_SUMMARY = 20,000 tokens (based on p99.99 measured at 17,387 tokens). NO_TOOLS_PREAMBLE is injected to prevent the summarizing agent from making tool calls during the compaction pass.

5. Context Collapse — emergency, projection threshold exceeded

Projection-based emergency system. When the projected context size at next tool boundary exceeds the threshold, collapse is triggered regardless of other strategy state.

6. Reactive compaction — API 413 error received

Last resort. Triggered by an HTTP 413 (Payload Too Large) response from the API. The context was already too large to send — compaction happens retroactively before retrying.
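The ladder's control flow is the interesting part: try the cheapest strategy, re-measure, and only escalate while the context still doesn't fit. A minimal sketch, with strategy internals elided and token accounting simplified to a single number:

```typescript
// Sketch of the escalating compaction ladder: cheapest strategy first,
// stop as soon as the context fits under the threshold.
type Strategy = { name: string; apply: (tokens: number) => number };

function compact(tokens: number, threshold: number, ladder: Strategy[]): number {
  for (const strategy of ladder) {
    if (tokens <= threshold) break;   // cheap exit as soon as we fit
    tokens = strategy.apply(tokens);  // escalate to the next, pricier strategy
  }
  return tokens;
}
```

With the real ladder, apply() for Microcompact clears old tool results while Full compaction forks an LLM summarizer — same interface, wildly different cost.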

So what?
Most agent frameworks treat context overflow as a single problem with a single solution. Claude Code's 6-strategy ladder shows the right approach: cheap operations first (clear old tool results), expensive ones last (LLM summarization). The cached microcompact variant uses the editing API to modify content without invalidating the prefix cache — near-zero-cost compaction.
Microcompact — COMPACTABLE_TOOLS

Microcompact is the lightest touch. It only targets results from tools that produce large, low-information outputs: FileRead, Shell, Grep, Glob, WebSearch, WebFetch, FileEdit, and FileWrite. Tool results from reasoning-heavy exchanges — the model's own thinking steps, user messages — are never touched.

The cached variant uses the Anthropic editing API to remove results from an existing cached prefix without invalidating it. This means microcompaction can happen at near-zero cost: the prefix KV cache remains valid, and only the delta is re-sent.

Full Compaction — Forked Summarizer

When simple truncation is not enough, Claude Code forks a secondary agent to summarize the conversation. The fork shares the current prompt cache prefix, so the system prompt and tool schemas are not re-tokenized. The summarizer receives a special preamble — NO_TOOLS_PREAMBLE — that prevents it from issuing any tool calls during the pass. It is allowed to write prose only.

The output token budget for the summary is set at MAX_OUTPUT_TOKENS_FOR_SUMMARY = 20,000. This was derived empirically: the p99.99 measured summary length across production sessions was 17,387 tokens. The 20K cap provides headroom without padding the context unnecessarily.

MAX_OUTPUT_TOKENS_FOR_SUMMARY = 20_000   // p99.99 = 17,387 tokens
NO_TOOLS_PREAMBLE                        // no tool calls during summarization
fork shares prompt cache prefix          // system prompt re-tokenization avoided
Circuit Breaker — Failure Guard

Auto-compaction failures are not silent. Claude Code tracks consecutive failures with MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3. After three consecutive failures, auto-compaction is disabled for the session. Without this guard, a pathological session could retry failed compaction indefinitely — the engineering team estimated this would waste roughly 250,000 API calls per day globally if left unchecked.

MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3 — prevents ~250K wasted API calls/day globally on failure loops
Post-Compact Recovery

After compaction, Claude Code restores working context so the session can continue without losing important state. The recovery pass reloads up to 5 files that were referenced in the compacted history, capped at 50,000 tokens total. This keeps the most recently accessed code in view without reflating the context back to its pre-compaction size.

Max files restored: 5 — most recently referenced files in compacted history
Max restore budget: 50K tokens — hard cap so recovery cannot re-trigger compaction
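Selection under a dual cap is a small greedy pass. The recency-sorted ordering and skip-on-overflow behavior here are assumptions — the source specifies only the two limits:

```typescript
// Sketch of post-compact file restoration: at most 5 files,
// never exceeding the 50K-token restore budget.
interface RefFile { path: string; tokens: number; lastReferenced: number }

const MAX_FILES = 5;
const MAX_TOKENS = 50_000;

function selectRestoreSet(files: RefFile[]): RefFile[] {
  const sorted = [...files].sort((a, b) => b.lastReferenced - a.lastReferenced);
  const chosen: RefFile[] = [];
  let budget = 0;
  for (const f of sorted) {
    if (chosen.length >= MAX_FILES) break;
    if (budget + f.tokens > MAX_TOKENS) continue; // skip files that would bust the cap
    chosen.push(f);
    budget += f.tokens;
  }
  return chosen;
}
```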
Transcript System — Fire-and-Forget Writes

Every session is written to a transcript on disk. Writes are fire-and-forget: the main thread never blocks waiting for the write to complete. A 100 ms flush timer batches entries before hitting the filesystem. Individual chunks are capped at 100 MB to prevent single-session files from growing unbounded.

The FileHistory subsystem backs up file contents before every edit. Snapshots use deterministic filenames in the form {hash}@v{N} so they can be reconstructed without a separate index. The maximum number of retained snapshots per file is capped at MAX_SNAPSHOTS = 100.

Write mode — fire-and-forget
Flush timer — 100 ms
Max chunk size — 100 MB
MAX_SNAPSHOTS — 100 per file
Backup timing — before every edit
Filename format — {hash}@v{N}
Section 07 — Agent Swarm

The Agent Swarm

One coordinator, many workers. A push-notification IPC model, prompt cache sharing, and three isolation modes — from shared filesystem to fully remote.

Coordinator Mode — Agent Tree

In coordinator mode, the orchestrating agent is restricted to just three tools: Agent (to spawn workers), SendMessage (IPC), and TaskStop. It never directly touches the filesystem. Workers receive a full tool suite and operate independently, pushing completion notifications back as structured XML.

COORDINATOR MODE
Push Notifications — No Polling

The coordinator never polls. When a worker finishes, it emits a <task-notification> XML block containing status, summary, result, and usage. The coordinator processes these at its next tool boundary — a clean event-driven model with no busy-waiting.

<task-notification>
<status>complete</status>
<summary>Refactored auth module, 3 files changed</summary>
<result>success</result>
<usage>tokens_in=4200 tokens_out=810</usage>
</task-notification>
Three Isolation Modes

How much isolation workers get from each other depends on the mode. Default mode shares the filesystem — fast but all workers see each other's writes. Worktree mode gives each worker its own git branch; worktrees with no changes are auto-removed after the task. Remote mode (CCR) runs agents on separate machines entirely, used for ULTRAPLAN 30-minute planning sessions.

default — shared fs
Workers share the filesystem. Fast, but no git isolation.

worktree — isolated git branch
Each worker gets its own git worktree. Auto-removed if no changes.

remote (CCR) — fully remote
ULTRAPLAN: 30-min planning session via Claude Code Remote.

Coordinator System Prompt

The coordinator system prompt is roughly 6,000 characters and defines four workflow phases: Research → Synthesis → Implementation → Verification. Each phase has explicit entry criteria, required tool calls, and exit conditions. The coordinator never moves to the next phase until the current one is complete.

Phase 01 — Research: gather context, read code, search docs
Phase 02 — Synthesis: plan the approach, define subtasks
Phase 03 — Implementation: spawn workers, await notifications
Phase 04 — Verification: validate results, run tests
"Parallelism is basically free."
Fork Subagent Cache Sharing

When the coordinator spawns children, all fork children share a byte-identical system prompt and tools schema. This means the prompt cache prefix is shared across all workers — the KV cache is computed once and reused. Only the final per-task directive differs, which is tiny. Workers get full parallel execution at near-zero marginal prompt cost.

FORK SUBAGENT CACHE SHARING
So what?
Multi-agent is an economics problem, not a routing problem. If all your workers share the same system prompt bytes, the KV cache is computed once and reused across every fork. The marginal cost of spawning a new worker is just the per-task directive — typically under 200 tokens. Design your multi-agent architecture around what can be cached.
Task IPC and Teammate Mailbox
File-based IPC
Task communication uses the filesystem with O_NOFOLLOW to prevent symlink attacks. Each task payload is capped at 5 GB.
Teammate Mailbox
Stored at .claude/teams/{team}/inboxes/{agent}.json. Delivery retries 10 times with 5–100ms exponential backoff.
SendMessage Routing
Running agent → message queued at next tool boundary. Stopped agent → auto-resumed. Evicted agent → resumed from disk transcript. No message is ever dropped.
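The mailbox delivery retry is a textbook bounded-backoff loop. The exact backoff curve is an assumption; the 10-attempt cap and 5–100 ms range come from the source:

```typescript
// Sketch of mailbox delivery: up to 10 attempts, exponential backoff
// clamped to the 5–100 ms range. (Exact curve assumed.)
const sleep = (ms: number) => new Promise<void>(r => setTimeout(r, ms));

async function deliverWithRetry(send: () => Promise<void>, maxAttempts = 10): Promise<void> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      await send();
      return; // delivered
    } catch (err) {
      if (attempt === maxAttempts - 1) throw err; // out of retries
      const backoff = Math.min(100, 5 * 2 ** attempt); // 5, 10, 20, 40, 80, 100, 100…
      await sleep(backoff);
    }
  }
}
```

Clamping to 100 ms keeps worst-case delivery latency bounded even across all 10 attempts — under a second total.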
createSubagentContext — Deep Isolation

createSubagentContext() creates a deeply isolated execution context for each worker. Only setAppStateForTasks can bridge back to the root store — everything else is scoped. This prevents workers from accidentally contaminating each other's state or the coordinator's view of the world.

5 Built-in Agent Types

Claude Code ships with 5 built-in agent types. One-shot agents skip the trailer section to save roughly 135 characters per run — at 34 million runs per week, this represents significant token savings at scale. KAIROS mode forces all agents to operate asynchronously, enabling maximum parallelism across the swarm.

Coordinator
One-shot
KAIROS
ULTRAPLAN
Teammate
Section 08 — Security

Permissions & Hooks

A six-layer permission pipeline with programmable hooks at every boundary.

Permission decision pipeline

Every tool call travels through a fixed sequence of checkpoints. Each gate can independently reject the call — and two of them produce side-channel annotations the model sees but the user does not.

PERMISSION DECISION PIPELINE
Six permission modes

The permission mode is a session-level dial. It doesn't replace the rule engine — it controls what happens when a call reaches the user-prompt stage. Most production CI setups run dontAsk or bypassPermissions with a tight alwaysDeny list.

default — Prompt user for approval on every unrecognized tool call.
acceptEdits (--accept-edits) — Auto-approve all file edits. User still sees read operations.
bypassPermissions (--dangerously-skip-permissions) — All tool calls approved automatically.
plan (--plan) — Read-only mode. No writes, no bash, no network. Safe exploration.
dontAsk (--dont-ask) — Silently deny any tool call that would normally require a prompt.
auto (feature-gated) — AI classifier decides per-call. Not in public builds.
Rule syntax — glob patterns

Rules use a Tool(glob*) syntax. The tool name is literal; the argument is a glob matched against the serialized call. Parentheses in the pattern must be escaped.

# Always allow: safe read-only git commands
"alwaysAllow": [
  "Bash(git log*)",
  "Bash(git diff*)",
  "Bash(git status*)",
  "Read(*)",
  "Glob(*)"
]

# Never allow: destructive operations
"alwaysDeny": [
  "Bash(git push --force*)",
  "Bash(rm -rf*)",
  "Bash(sudo *)"
]

# Patterns with literal parentheses — escaped
"alwaysAllow": [
  "Bash(node -e \(*\))"
]

Glob matching uses minimatch. A trailing * matches any suffix, including spaces and flags — a deny rule of Bash(git push*) blocks git push --force origin main.
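To make the matching semantics concrete, here is a hand-rolled sketch. The real implementation uses minimatch; this toy version supports only the `*` wildcard and the literal-tool-name rule, so it illustrates the shape rather than the full glob grammar:

```typescript
// Toy sketch of Tool(glob) rule matching. Real rules go through minimatch;
// this version handles only `*`.
function ruleMatches(rule: string, tool: string, args: string): boolean {
  const m = rule.match(/^([A-Za-z]+)\((.*)\)$/);
  if (!m) return false;
  const [, ruleTool, glob] = m;
  if (ruleTool !== tool) return false; // tool name is literal, never globbed

  // Compile the glob: escape regex metachars, then turn * into .*
  const re = new RegExp(
    "^" + glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&").replace(/\*/g, ".*") + "$"
  );
  return re.test(args);
}
```

This is also why the trailing `*` catches flags: `git push*` compiles to `^git push.*$`, which matches any suffix at all.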

Config cascade — five layers

Five layers, evaluated in order. Each layer can add alwaysAllow, alwaysDeny, and permissions objects. Later layers override earlier ones. Policy settings are injected externally and cannot be bypassed by any user-controlled layer.

userSettings
~/.claude/settings.json
Personal defaults. Lowest precedence.
projectSettings
.claude/settings.json
Checked-in project rules. Shared with team.
localSettings
.claude/settings.local.json
Local overrides. gitignored by default.
flagSettings
CLI flags
Runtime overrides via --dangerously-skip-permissions, --plan, etc.
policySettings
Managed policy
Enterprise MDM / Cloudflare Access injection. Highest precedence. Cannot be overridden.
Dangerous pattern detection

Before any permission rule is evaluated, the tool call is pattern-matched against known dangerous invocation forms. Matching a pattern doesn't automatically block the call — it upgrades it to the ask tier, preventing silent auto-approval even if the glob rule would permit it.

Interpreters
python · python3 · node · ruby · perl · php · lua · tcl · sh
Package Runners
npx · bunx · yarn dlx · pnpx · pipx · uvx
Shells
bash · sh · zsh · fish · dash · ksh · csh · tcsh
Privileged / Network
sudo · su · doas · ssh · scp · rsync · curl | bash · wget | sh
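The upgrade-to-ask semantics can be sketched directly. The regex lists below abbreviate the tables above and the tier type is an assumption — the key behavior from the source is that a match never blocks outright, it only removes silent auto-approval:

```typescript
// Sketch of the dangerous-pattern pre-check: a match upgrades an
// "allow" decision to "ask" — it never denies by itself.
type Tier = "allow" | "ask" | "deny";

const DANGEROUS: RegExp[] = [
  /^(python3?|node|ruby|perl|php|lua)\b/,          // interpreters (abbreviated)
  /^(npx|bunx|pnpx|pipx|uvx)\b|^yarn dlx\b/,       // package runners
  /^(sudo|su|doas|ssh|scp|rsync)\b/,               // privileged / network
  /\bcurl\b.*\|\s*(ba)?sh\b|\bwget\b.*\|\s*sh\b/,  // pipe-to-shell
];

function applyDangerCheck(command: string, ruleDecision: Tier): Tier {
  const isDangerous = DANGEROUS.some(re => re.test(command.trim()));
  // Upgrade a silent allow to an explicit prompt; deny stays deny.
  if (isDangerous && ruleDecision === "allow") return "ask";
  return ruleDecision;
}
```

Running this before the glob rules means even an overly broad alwaysAllow entry cannot silently approve `curl … | bash`.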
Hook system

Hooks are the escape hatch for everything the permission rules can't express. They run at 26 lifecycle events, in 4 execution styles, with 3 advanced scheduling modifiers. The right mental model: hooks are middleware for the agent runtime.

26 hook events — highlights
PreToolUse — Fires before every tool call. Can modify input or deny.
PostToolUse — Fires after every tool call. Receives input + output.
SessionStart — Once per session. Used to inject context or set up environment.
SubagentStart — Fires when a subagent is spawned by the coordinator.
PreCompact — Fires before context compaction. Can add persistent memory.
PermissionRequest — Fires when a call reaches the user-prompt stage.
FileChanged — Watches the workspace. Fires on any file modification.
CwdChanged — Fires when the working directory changes mid-session.
4 hook types
command — Shell subprocess. stdin receives JSON payload; stdout is parsed for an allow/deny/modify response. Exit 0 = pass-through.
prompt — Single-turn LLM call. Default model: Claude Haiku. Receives the serialized tool call, returns a JSON decision. Cheapest inference-based hook.
http — POST request. Body is the hook payload JSON. SSRF guard rejects private IP ranges; internal endpoints require an explicit allowlist.
agent — Multi-turn subagent. Max 50 turns, runs in dontAsk permission mode, can use any registered tool. Heaviest option — use for complex review logic.
Advanced modifiers
async — Hook runs in background. Tool executes immediately without waiting for the hook result.
asyncRewake — Background hook. If it exits with code 2, the model is woken and given the hook output as a new message.
once — Hook auto-removes itself after first successful execution. Used for one-time setup tasks.
// .claude/settings.json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "node scripts/audit-bash.js"
          }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Edit",
        "hooks": [
          {
            "type": "command",
            "command": "pnpm run lint --fix",
            "async": true
          }
        ]
      }
    ],
    "SessionStart": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "cat .context/project-summary.md",
            "once": true
          }
        ]
      }
    ]
  }
}
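As a concrete illustration, a command hook like the audit-bash.js referenced above could look like this. The payload shape and response fields are assumptions — the source specifies only JSON on stdin, an allow/deny decision on stdout, and exit 0 as pass-through:

```typescript
// Sketch of a PreToolUse command hook (payload/response shapes assumed).
interface HookPayload { tool_name: string; tool_input: { command?: string } }

function decide(payload: HookPayload): { decision: "allow" | "deny"; reason?: string } {
  const cmd = payload.tool_input.command ?? "";
  if (/rm\s+-rf\s+\//.test(cmd)) {
    return { decision: "deny", reason: "refusing recursive delete from root" };
  }
  return { decision: "allow" };
}

// Wiring when run as a standalone hook script: read the JSON payload
// from stdin, write the decision JSON to stdout, exit 0.
function main(): void {
  let raw = "";
  process.stdin.on("data", chunk => (raw += chunk));
  process.stdin.on("end", () => {
    process.stdout.write(JSON.stringify(decide(JSON.parse(raw))));
  });
}
```

Keeping the decision logic in a pure function (and the stdin wiring in main()) makes the hook trivially unit-testable.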
Undercover mode — internal opsec

When Anthropic employees contribute to open source projects, Claude operates in Undercover Mode. It actively conceals its own identity to protect internal information. There is no force-off — if the system is not confident it is in an internal repo, it stays undercover.

Doesn't know what model it is
No model codenames, no unreleased versions
No internal repository names
No Slack channel references
No "Co-Authored-By" lines in commits

CRITICAL: There is NO force-OFF. If we're not confident we're in an internal repo, we stay undercover.

Skills & plugins

Skills are reusable prompt templates that ship as slash commands. Plugins bundle skills, hooks, and MCP server registrations into a single installable unit. Both are subject to the same permission pipeline as every other tool call.

5 skill discovery sources
user
~/.claude/skills/
Personal, across all projects.
project
.claude/skills/
Checked into the repo. Team-shared.
managed
MDM / policy inject
Pushed by enterprise policy. Read-only.
bundled
Built-in
Shipped with Claude Code. Always available.
plugin
Plugin bundle
Installed plugin contributes its own skills.
context: 'fork'

Skills declared with context: 'fork' run in an isolated subagent with their own token budget. They cannot read the parent conversation. Expensive skills don't pollute the main context window — and if they blow up, the parent session survives.

Plugin system — security notes
Homograph attack detection
Plugin names are normalized to ASCII before comparison. Lookalike Unicode characters (е vs e) are rejected.
Official impersonation protection
Plugin names matching Anthropic-reserved namespaces (anthropic:*, claude:*) require a signed manifest.
Bundle: skills + hooks + MCP servers
A single plugin can contribute all three. All are subject to the same permission pipeline as first-party code.
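The two name checks compose naturally. A sketch of the idea, with the normalization strategy (NFKC, then reject anything still outside printable ASCII) as an assumption:

```typescript
// Homograph rejection sketch: NFKC-normalize, then reject any plugin
// name that still contains characters outside printable ASCII.
// Cyrillic 'е' (U+0435) survives normalization and is caught here.
function isSafePluginName(name: string): boolean {
  const normalized = name.normalize('NFKC');
  return /^[\x20-\x7E]+$/.test(normalized);
}

// Reserved namespaces require a signed manifest.
function requiresSignedManifest(name: string): boolean {
  return /^(anthropic|claude):/.test(name.normalize('NFKC'));
}
```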
7+ MCP transports
stdio · sse · http · ws · sdk · claudeai-proxy · internal:fork · internal:ipc

The claudeai-proxy transport tunnels MCP calls through the Claude.ai session. Internal transports (internal:fork, internal:ipc) are not exposed in the public plugin API.

Section 10 — The Playbook

7 Lessons for Agent Builders

Patterns distilled from 500K lines of the most battle-tested AI agent in production. Every principle below is backed by a specific architectural decision in Claude Code.

1

Loop dumb, prompt smart

The core loop should fit on a napkin. All intelligence lives in prompts and tool descriptions.
Claude Code's entire agent is a while(true) around an API call in a single 68KB file. No framework, no orchestration layer, no state machine. The loop calls the model, dispatches tool calls, feeds results back, and repeats. If your agent framework has more logic than your prompts, reconsider what belongs where.
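The shape of that loop fits in a few lines. A minimal sketch, with every name illustrative rather than Claude Code's actual API:

```typescript
// Minimal agent loop: call the model, dispatch tool calls, feed results
// back, repeat until the model stops asking for tools.
type ToolCall = { name: string; input: string };
type ModelTurn = { text: string; toolCalls: ToolCall[] };
type Model = (history: string[]) => Promise<ModelTurn>;
type Tools = Record<string, (input: string) => Promise<string>>;

async function runAgent(callModel: Model, tools: Tools, userMessage: string): Promise<string> {
  const history = [userMessage];
  while (true) {
    const turn = await callModel(history);
    if (turn.toolCalls.length === 0) return turn.text; // model is done
    for (const call of turn.toolCalls) {
      const result = await tools[call.name](call.input);
      history.push(`${call.name} → ${result}`); // feed the result back
    }
  }
}
```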
2

Tool prompts encode behavior

Write tool descriptions like you're training a new hire — anti-patterns, cross-references, failure modes.
The Bash tool prompt alone is 5,282 tokens — larger than all behavioral instructions in the system prompt combined. It doesn't just describe what Bash does; it encodes when not to use it, how it relates to other tools (prefer Read over cat, prefer Grep over rg), and what failure looks like. buildTool() enforces fail-closed defaults: any new tool is blocked from running in parallel, won't modify files, and requires user confirmation — all without extra code.
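The fail-closed idea is simple to sketch. The field names below are assumptions for illustration; the principle is that anything a tool author does not explicitly opt into defaults to the safest value:

```typescript
// Fail-closed tool registration sketch (field names assumed).
interface ToolDef {
  name: string;
  isConcurrencySafe?: boolean; // safe to run in parallel?
  isReadOnly?: boolean;        // guaranteed not to modify files?
  needsPermission?: boolean;   // requires user confirmation?
}

function buildTool(def: ToolDef): Required<ToolDef> {
  return {
    name: def.name,
    isConcurrencySafe: def.isConcurrencySafe ?? false, // no parallelism
    isReadOnly: def.isReadOnly ?? false,               // assume it writes
    needsPermission: def.needsPermission ?? true,      // always ask
  };
}
```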
3

Cache drives architecture

Design your prompt structure around what can be cached. Static first, dynamic last, clear boundary.
Claude Code splits the system prompt at __SYSTEM_PROMPT_DYNAMIC_BOUNDARY__. Everything above (identity, tools, behavioral rules) is static and cacheable across the entire fleet. Everything below (your cwd, git status, memory) changes per session. Moving the agent type list from tool schemas into messages saved 10.2% of fleet cache costs. Tools are sorted alphabetically to prevent cache invalidation from ordering changes. One structural change saved more than months of prompt engineering.
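Only the boundary marker name here comes from the source; the builder around it is an illustrative sketch of the split:

```typescript
// Static first (cacheable fleet-wide), dynamic last (per session),
// with an explicit boundary between them.
const BOUNDARY = '__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__';

function buildSystemPrompt(staticPart: string, dynamicPart: string): string {
  return `${staticPart}\n${BOUNDARY}\n${dynamicPart}`;
}

// Everything up to and including the boundary is shared across sessions.
function cacheablePrefix(prompt: string): string {
  return prompt.slice(0, prompt.indexOf(BOUNDARY) + BOUNDARY.length);
}
```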
4

Context management > orchestration

How you manage the context window matters more than how you route between agents.
Six compaction strategies, cheapest first: microcompact clears old tool results (near-zero cost via the editing API, which preserves the prefix cache). Then snip truncation, session-memory-guided pruning, forked LLM summarization (shares prompt cache, capped at 20K output tokens), projection-based collapse, and reactive emergency compaction on HTTP 413. A circuit breaker stops after 3 consecutive failures to prevent ~250K wasted API calls/day globally.
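The cheapest strategy is easy to sketch. Assuming a simple message shape (an illustration, not the real one), microcompact replaces old tool results with a stub while preserving message count and ordering, which is what keeps the cached prefix structure intact:

```typescript
// Microcompact sketch: clear all but the most recent tool results.
type Msg = { role: 'user' | 'assistant' | 'tool'; content: string };

function microcompact(messages: Msg[], keepRecent: number): Msg[] {
  const toolIdxs = messages.flatMap((m, i) => (m.role === 'tool' ? [i] : []));
  const keep = new Set(toolIdxs.slice(-keepRecent)); // protect the newest
  return messages.map((m, i) =>
    m.role === 'tool' && !keep.has(i)
      ? { ...m, content: '[old tool result cleared]' }
      : m,
  );
}
```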
5

Multi-agent is a cache-sharing problem

Fork exists for cost. Same prompt bytes = shared KV cache. Parallelism is basically free.
When the coordinator spawns workers, all children share a byte-identical system prompt and tools schema. The KV cache is computed once and reused — only the per-task directive differs. Workers get full parallel execution at near-zero marginal prompt cost. The coordinator itself is restricted to 3 tools (Agent, SendMessage, TaskStop) and never touches the filesystem. Push-based notifications, not polling.
6

Defense in depth, not defense in one

A 10-step permission cascade. Bypass-immune safety checks. Multiple independent gates.
Every tool call travels through: Zod schema validation → permission rule matching → PreToolUse hooks → permission mode resolution → execution → PostToolUse hooks. Dangerous pattern detection (interpreters, package runners, shells, privileged commands) upgrades calls to the "ask" tier even if a glob rule would permit them. Tree-sitter AST parsing catches injection in generated code. Classifier denial tracking caps at 3 consecutive / 20 total before escalation.
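The key property is that no single gate can override the others. A simplified sketch of the "upgrade to ask" behavior, with illustrative patterns:

```typescript
// A dangerous pattern escalates to 'ask' even when a glob rule would
// allow; an explicit deny always wins.
type Verdict = 'allow' | 'ask' | 'deny';

const DANGEROUS = [/\bsudo\b/, /\bnpx\b/, /curl\b.*\|\s*(ba)?sh/];

function resolveVerdict(command: string, ruleVerdict: Verdict): Verdict {
  if (ruleVerdict === 'deny') return 'deny';
  if (DANGEROUS.some((re) => re.test(command))) return 'ask';
  return ruleVerdict;
}
```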
7

Measure cache breaks

Instrument your cache hit rate. 77% of breaks are text changes, not tool adds/removes.
Claude Code hashes the system prompt, per-tool schemas, model ID, betas, and effort level. On a cache break, it diffs to identify exactly which tool description changed. Analysis showed 77% of cache breaks came from schema text changes, not structural changes like adding or removing tools.
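A sketch of that attribution step, assuming components are hashed individually (the hashing granularity here is an illustration):

```typescript
// Hash each component separately so a cache break can be blamed on the
// exact component whose text changed.
import { createHash } from 'node:crypto';

function fingerprint(text: string): string {
  return createHash('sha256').update(text).digest('hex').slice(0, 12);
}

function blameCacheBreak(
  prev: Record<string, string>, // component name → description text
  next: Record<string, string>,
): string[] {
  return Object.keys(next).filter(
    (key) => fingerprint(prev[key] ?? '') !== fingerprint(next[key]),
  );
}
```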
Meta-principle

Build observability into your agent from day one.
You can't optimize what you don't measure.

Section 09 — Hidden Details

The Hidden Layer

Engineering decisions that weren't in any changelog — from binary attestation to cache economics to anti-poisoning pipelines.

C — Anti-Distillation

Poisoning the Well

Claude Code actively defends against competitors scraping its API outputs for training data. The system injects fake/decoy tools into API responses via getExtraBodyParams(). Anyone harvesting these outputs to train a rival model trains on poisoned data — tool definitions that don't exist, capabilities that don't work, patterns that lead nowhere.

The feature is gated behind tengu_anti_distill_fake_tool_injection and only active for 1st-party CLI builds. There's also a secondary mechanism: connector-text summarization in the beta headers, which obscures the chain-of-thought reasoning that would be most valuable to a distillation pipeline.

Why this matters
If you ship an AI product, consider what happens when competitors scrape your API outputs for training data. Anti-distillation is a defensive posture most teams don't think about until it's too late.
D — Speculative Execution

20 Turns Ahead

While you're reviewing step 1, Claude Code is already executing steps 2 through 20. The speculative execution engine runs up to 20 turns / 100 messages ahead of your approval cursor.

The key insight is the file overlay system. All writes go to a temporary directory, not the real filesystem. Read-only tools (Read, Glob, Grep, ToolSearch, LSP) are auto-approved and run against the real filesystem. Write tools are captured in the overlay. If you accept, the overlay is applied atomically. If you reject, it's discarded — zero side effects.

// speculation.ts
MAX_SPECULATION_TURNS = 20
SAFE_READ_ONLY_TOOLS = ['Read', 'Glob', 'Grep', 'ToolSearch', 'LSP']
WRITE_TOOLS → overlay to temp dir, apply on accept
Why this matters
If your agent needs human approval for risky actions, don't block the entire pipeline. Speculatively execute safe read-only actions in the background and capture writes in an overlay. The user sees instant results while retaining full veto power.
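The overlay pattern itself is a few lines. A minimal in-memory sketch (the real implementation uses a temp directory, not a map):

```typescript
// Writes land in the overlay, not on disk; reads prefer the overlay.
// Accept flushes everything; reject discards with zero side effects.
class FileOverlay {
  private pending = new Map<string, string>();

  write(path: string, content: string): void {
    this.pending.set(path, content); // captured, not applied
  }

  read(path: string, readDisk: (p: string) => string): string {
    return this.pending.get(path) ?? readDisk(path);
  }

  accept(applyDisk: (p: string, c: string) => void): void {
    for (const [p, c] of this.pending) applyDisk(p, c);
    this.pending.clear();
  }

  reject(): void {
    this.pending.clear();
  }
}
```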
A — The Buddy System

A Hidden Gacha Companion

Claude Code ships a hidden companion system with 18 species across 5 rarity tiers. The interesting engineering detail: a companion's "soul" (name and personality) is model-generated, but its "bones" (species, rarity, stats) are deterministically derived from the user's userId hash on every session start. Users cannot edit their way to a legendary — the bones are never persisted.

Species names are hex-encoded in source because one collides with an internal model codename flagged in excluded-strings.txt. This is the same anti-leak pipeline that catches references to unreleased models anywhere in the codebase.
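The derivation pattern is worth sketching: hash the userId and index species and rarity from the digest bytes. The function below is an illustration (the real hash and byte layout are not public); the point is that the output is a pure function of the input, so the same user gets the same companion every session and there is nothing persisted to edit.

```typescript
// Deterministic "bones" sketch: userId → species index and rarity tier.
import { createHash } from 'node:crypto';

const RARITIES = ['common', 'uncommon', 'rare', 'epic', 'legendary'] as const;

function rollBones(userId: string, speciesCount = 18) {
  const digest = createHash('sha256').update(userId).digest();
  return {
    species: digest[0] % speciesCount,        // one of 18 species
    rarity: RARITIES[digest[1] % RARITIES.length], // one of 5 tiers
  };
}
```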

B — Engineering Deep Cuts

The Details That Weren't in Any Announcement

Binary attestation via Zig
The native HTTP layer is compiled with Zig. A placeholder byte sequence in the binary is overwritten at release time with the attestation hash — the value never appears in source code or build logs.
Anti-training-data poisoning
When the model generates synthetic tool results, those outputs are exported into a structured dataset used for HFI (Human Feedback on Inaccuracy) rejection training. Poisoned tool outputs get flagged before they can contaminate future model weights.
Numeric length anchors
System prompts use quantitative directives like "~1.2% output token reduction" instead of qualitative "be concise." Research showed numeric anchors produce measurably tighter compliance than natural-language instructions.
Memoized session date
The current date injected into the system prompt is memoized for the entire process lifetime. After midnight, the date is stale — but a stale date beats busting the entire conversation's prompt cache prefix. Cache economics > correctness for a non-critical field.
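The memoization is one line. A sketch of the trade: freshness after midnight is sacrificed for a byte-stable prompt prefix.

```typescript
// Compute the date string once, reuse it for the process lifetime.
let cachedDate: string | undefined;

function sessionDate(): string {
  cachedDate ??= new Date().toDateString();
  return cachedDate;
}
```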
@[MODEL LAUNCH] markers
Scattered throughout the codebase — TODO markers for work gated on the next model release. A grep for this string shows the full pending surface area for any model ship.

Tacit saves every agent session, searchable forever.

Get Tacit — $29 →

// built by people who read the source

We built Tacit because we understand this system at the source level.

Session memory for Claude Code.

Every session saved. Every decision tracked. Every pattern discovered.


Early adopter price. One-time purchase. Lifetime updates.