Flue: When the Astro Team Builds a Headless Claude Code
The withastro org shipped Flue — a TypeScript framework that takes Claude Code's harness shape (sandbox, tools, sessions, skills, AGENTS.md), strips the TUI, and makes it deployable to Node.js or Cloudflare Workers from the same source. This is what 'agents are directories that compile to servers' looks like in practice. v0.3.5, Apache-2.0, 7,013 SDK lines, zero tests.
TL;DR
The team behind Astro shipped Flue, a TypeScript framework for headless agents. The git log shows Fred K. Schott — Astro’s co-founder/CEO — is one of the two primary committers. It is, in a literal sense, Claude Code with the TUI removed and a build target bolted on.
A repo with agents/*.ts, roles/*.md, optional AGENTS.md, and .agents/skills/*/SKILL.md compiles via flue build into either a single Node .mjs server or a Cloudflare Workers + Durable Objects bundle. Same source. Same handler signature. Two production targets.
Three architectural decisions matter:
- `SessionEnv` is the only sandbox interface the core knows about. In-memory just-bash, host-mounted just-bash, Cloudflare Sandbox DO over RPC, Daytona — every one of them adapts to the same nine methods. Branch once at the boundary, never inside. This is the same shape Vercel’s just-bash ships, and it’s what makes the “virtual sandbox by default” claim cheap enough to mean something.
- Build plugins, not config flags. Node and Cloudflare are two `BuildPlugin` implementations choosing between `bundle: 'esbuild'` (Flue owns the bundle) and `bundle: 'none'` (wrangler owns it). The Cloudflare path emits one Durable Object class per webhook agent and lets wrangler bundle the rest. Pre-bundling on top of wrangler caused `nodejs_compat` failures with packages like `tar` that import bare `fs`/`zlib`/`assert` — so they stopped doing it.
- Filesystem-as-config, discovered at runtime. `AGENTS.md`, `CLAUDE.md`, and `SKILL.md` files are read by `discoverSessionContext()` on every `init()` against the resolved cwd. A skill can live at `.agents/skills/triage/SKILL.md` inside the cloned repo and the agent picks it up — without a redeploy. This is the Claude Code mental model, ported.
What it’s not: a router framework (no LangGraph), a multi-agent orchestrator (no CrewAI), or an SDK-over-HTTP wrapper (no Vercel AI SDK). It’s a runtime + build target, and that’s the whole pitch.
The status flag is honest:

> Experimental — Flue is under active development. APIs may change.

v0.3.5 SDK, 760 GitHub stars, 133 commits, zero unit tests. Treat as L7 reading material, not a production dependency yet.
What Flue Actually Is
The framing on flueframework.com:
“The Agent Harness Framework. Not another SDK. Build powerful, autonomous agents with Flue’s programmable TypeScript harness. Write once, deploy anywhere.”
The framing inside the repo’s own SDK README:
“The Sandbox Agent Framework. If you know how to use Claude Code (or OpenCode, Codex, Gemini, etc)… then you already know the basics of how to build agents with Flue.”
Note the slip: the marketing site says harness, the SDK README says sandbox. They’re not contradicting — they’re pointing at two layers of the same thing. A Sandbox Agent (OpenAI’s term, from the Agents API docs) pairs an agent harness with a secure isolated workspace. Flue ships both halves and lets you choose how heavy each one needs to be.
The minimum viable agent is 20 lines:
// .flue/agents/hello-world.ts
import type { FlueContext } from '@flue/sdk/client';
import * as v from 'valibot';
export const triggers = { webhook: true };
export default async function ({ init, payload }: FlueContext) {
const agent = await init({ model: 'anthropic/claude-sonnet-4-6' });
const session = await agent.session();
const result = await session.prompt(
`Translate this to ${payload.language}: "${payload.text}"`,
{
result: v.object({
translation: v.string(),
confidence: v.picklist(['low', 'medium', 'high']),
}),
},
);
return result;
}
What flue build --target node does with that file:
- Discovers `agents/hello-world.ts` (and any `roles/*.md`).
- Generates `dist/_entry_server.ts` — a Hono server that mounts every webhook-triggered agent at `POST /agents/<name>/<id>`.
- Runs esbuild with `platform: 'node'`, `target: 'node22'`; externalizes the user’s direct deps, bundles Flue infra.
- Emits `dist/server.mjs` (a single self-contained module) plus `dist/manifest.json`.
What flue build --target cloudflare does with the same file:
- Same discovery.
- Generates `dist/_entry.ts` — a Worker `fetch` handler that calls `routeAgentRequest()` from the `agents` SDK and emits one Durable Object class per webhook agent (PascalCase from kebab-case).
- Reads the user’s `wrangler.jsonc`, validates it (compat date floor `2026-04-01`, `nodejs_compat` flag), merges Flue’s DO bindings + migrations, and writes `dist/wrangler.jsonc`.
- Does not bundle. Wrangler bundles at deploy time.
Same source. Two artifacts. The user’s handler doesn’t change.
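The PascalCase-from-kebab-case naming step is mechanical enough to sketch. `kebabToPascal` is a hypothetical helper name for illustration, not Flue's actual export:

```typescript
// Sketch of the kebab-case → PascalCase mapping the Cloudflare build
// target applies when naming Durable Object classes from agent filenames.
function kebabToPascal(name: string): string {
  return name
    .split('-')
    .filter(Boolean) // tolerate stray/duplicate dashes
    .map((part) => part[0].toUpperCase() + part.slice(1))
    .join('');
}

// agents/hello-world.ts → a DO class named HelloWorld,
// routed at POST /agents/hello-world/<id>.
console.log(kebabToPascal('hello-world')); // HelloWorld
```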
Why Astro Built This
The most distinctive fact about Flue is that the withastro org shipped it: the framework is published under the same GitHub organization that owns Astro itself, not spun out into a separate org. The git log shows Fred K. Schott (Astro’s co-founder, CEO of The Astro Foundation) as one of the two primary committers alongside Brian Giori.
This is a real bet, not a side project. The pattern is familiar: Astro grew up in the era of “the website is a build artifact.” Pages compile to static HTML, server functions, or edge handlers from one source. The pitch was write once, deploy anywhere. That’s the same phrase on flueframework.com, applied to agents.
Three things this signals:
- The framework playbook generalizes. Same source → multiple deploy targets is a pattern Astro proved for pages. Flue is the experiment: does it work for agent runtimes? The architectural moves are the same — content collections become `agents/`, integrations become connectors, adapters become build plugins.
- The Cloudflare-first bias is intentional. Astro’s deepest deploy adapter is Cloudflare; their docs lead with it; their team has Cloudflare-specific expertise. Flue inherits that bias — the Cloudflare target is the more opinionated half of the framework, and the dev/deploy parity (`unstable_startWorker`, `bundle: 'none'`, `nodejsCompatMode: 'v2'` hardcoded for compat-date floors past the v1→v2 cutover) shows operators who have shipped real workers.
- The “agents are the new pages” thesis is implicit. Astro’s value-add for the static-site era was treating pages as compile-time artifacts. Flue’s value-add for the agent era is treating agents as compile-time artifacts. If you squint, `agents/hello.ts` is a page, `roles/triager.md` is a layout, and `.agents/skills/` is `src/content/`. Whether or not that thesis pays off is an open question — but the org is putting capital behind it, not just an OSS bet.
The honest read: Flue is Astro’s swing at owning the agent-framework primitive layer the way Astro owns the static-site primitive layer. v0.3.5 is the very early innings; the bet is on the pattern, not the current code.
The Three Layers Worth Reading
Flue is small enough to internalize: 7,013 lines of TypeScript across the SDK, plus a CLI and a connectors package. But three pieces are worth slowing down on.
Layer 1 — SessionEnv: the universal sandbox interface
Every sandbox in Flue — in-memory just-bash, 'local' (cwd-mounted just-bash), Cloudflare Sandbox DO over RPC, Daytona via @daytona/sdk — adapts to a single nine-method interface:
interface SessionEnv {
exec(command, options?): Promise<ShellResult>;
scope?(options?: { commands?: Command[] }): Promise<SessionEnv>;
readFile(path): Promise<string>;
readFileBuffer(path): Promise<Uint8Array>;
writeFile(path, content): Promise<void>;
stat(path): Promise<FileStat>;
readdir(path): Promise<string[]>;
exists(path): Promise<boolean>;
mkdir(path, options?): Promise<void>;
rm(path, options?): Promise<void>;
cwd: string;
resolvePath(p): string;
cleanup(): Promise<void>;
}
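To see why adapting is cheap, here is a minimal Map-backed adapter covering just the file-op subset of the shape above. `FileEnv` and `InMemoryEnv` are illustrative names for this sketch; real Flue adapters implement the full interface:

```typescript
// A toy adapter: just enough of the SessionEnv shape to show the pattern.
// Every backend (just-bash, Daytona, Cloudflare Sandbox DO) reduces to
// "something that can answer these calls"; the core never branches on which.
interface FileEnv {
  readFile(path: string): Promise<string>;
  writeFile(path: string, content: string): Promise<void>;
  exists(path: string): Promise<boolean>;
  cwd: string;
}

class InMemoryEnv implements FileEnv {
  private files = new Map<string, string>();
  cwd = '/workspace';

  async writeFile(path: string, content: string): Promise<void> {
    this.files.set(path, content);
  }
  async readFile(path: string): Promise<string> {
    const content = this.files.get(path);
    if (content === undefined) throw new Error(`ENOENT: ${path}`);
    return content;
  }
  async exists(path: string): Promise<boolean> {
    return this.files.has(path);
  }
}
```

Nothing here leaves the process, which is the whole point of the virtual-sandbox default.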
This is the same architectural move every serious agent platform converges on — you can name the interface differently (LangChain’s BaseTool env, Anthropic’s bash/text_editor tool surface, Tensorlake’s ToolRunner), but the shape is identical: file ops + exec + scoping + cleanup.
What’s interesting is what’s optional: scope?(). It’s the per-call command-injection hook. BashFactory-backed envs implement it (they spin up a fresh just-bash with a registered command for that one prompt); remote sandboxes don’t, because workerd RPC and Daytona’s API don’t expose a way to scope a command to one tool call. So createScopedEnv() throws loudly if you try to pass commands: [gh, npm] to a Daytona session — no silent fallback, no half-feature. It’s the kind of design choice you only make after you’ve watched a half-feature confuse three users in production.
Layer 2 — withScopedRuntime: the per-call save/restore harness
Inside session.ts:489, every prompt/skill/task call enters this routine:
1. Save previous { tools, model, systemPrompt }.
2. Resolve model with precedence call > role > agent.
3. Compose new system prompt from agent base + <role> overlay.
4. Build tool set: built-ins (read/write/edit/bash/grep/glob/task)
bound to a scoped env (commands registered for THIS CALL ONLY)
+ custom ToolDef[] (agent-wide + per-call).
5. Run harness.prompt(text), wait for idle.
6. Restore previous state in finally.
The thing it’s solving — agent-wide defaults you can override per call without leaking into stored history or the next call — is the same thing Claude Code solves with its --system-prompt overlays and per-agent model config. Flue makes the precedence explicit (call > role > agent) and the rollback mandatory.
The cost is one mutable harness instance shared across calls. The benefit is no per-call agent re-init, which on a Cloudflare Durable Object — where init time is the hot path — is a real choice.
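The save/override/restore routine reduces to a small pattern, sketched here with a stand-in `HarnessState` type rather than pi-agent-core's real API:

```typescript
// Sketch of the per-call save/restore discipline: apply call-scoped
// overrides to one shared mutable state object, run, then roll back in
// finally so the next call sees the agent-wide defaults again.
type HarnessState = { model: string; systemPrompt: string };

async function withScopedState<T>(
  state: HarnessState,
  overrides: Partial<HarnessState>,
  run: () => Promise<T>,
): Promise<T> {
  const saved = { ...state };       // 1. save previous state
  Object.assign(state, overrides);  // 2–4. apply call-scoped config
  try {
    return await run();             // 5. run the prompt
  } finally {
    Object.assign(state, saved);    // 6. restore, even if run() throws
  }
}
```

The `finally` is the load-bearing part: a throw mid-prompt must not leak a per-call model into the next call.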
Layer 3 — Compaction with append-only history
Long sessions outgrow the model context. Flue’s answer is SessionHistory — a flat array of entries linked by parentId, with a leafId pointer at the active tip:
type SessionEntry = MessageEntry | CompactionEntry | BranchSummaryEntry;
interface CompactionEntry {
type: 'compaction';
summary: string;
firstKeptEntryId: string; // pointer back into the tree
tokensBefore: number;
details?: { readFiles: string[]; modifiedFiles: string[] };
}
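Resolving the active message list from this append-only structure is a walk from `leafId` back along `parentId` links, then a reverse. A sketch with simplified stand-in types:

```typescript
// Simplified stand-in for SessionEntry: just enough to show how a flat
// append-only array plus a leafId pointer yields a linear conversation.
type Entry = { id: string; parentId?: string; text: string };

function activePath(entries: Entry[], leafId: string): Entry[] {
  const byId = new Map(entries.map((e) => [e.id, e]));
  const path: Entry[] = [];
  // Walk from the tip back to the root...
  for (
    let cur = byId.get(leafId);
    cur;
    cur = cur.parentId ? byId.get(cur.parentId) : undefined
  ) {
    path.push(cur);
  }
  // ...then flip into chronological order.
  return path.reverse();
}
```

Because history is never rewritten, a branch would just be a second leaf pointing into the same array; the current code keeps exactly one.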
When tokens approach contextWindow - reserveTokens (default 16,384), compact() runs: a CompactionEntry is emitted, its summary field replays as a synthetic user message at the start of the next context window, and everything before firstKeptEntryId is excluded from the runtime message list.
The structure supports branches (BranchSummaryEntry.fromId), but the current code never emits one — every append moves leafId forward. The branch slot is a deliberate placeholder: the data model is ready for a feature the implementation doesn’t ship yet.
Token estimation is chars/4 (compaction.ts:68) — conservative, routinely overestimates non-English text. Tool result truncation cap is 2,000 chars before summarization. The summarization model is the harness’s current model, so a cheap default summarizing a complex coding session can produce poor recoveries. This is the same tradeoff Claude Code makes in its /compact flow; Flue inherits the failure mode.
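The heuristic and the trigger condition can be written down directly. The constants mirror the defaults stated above; the function names are illustrative, not Flue's exports:

```typescript
// chars/4 token estimate and the compaction trigger described above.
const RESERVE_TOKENS = 16_384; // documented default for reserveTokens

function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4); // crude; accuracy degrades on non-English text
}

function shouldCompact(historyText: string, contextWindow: number): boolean {
  // Compact when the estimate approaches the window minus the reserve.
  return estimateTokens(historyText) >= contextWindow - RESERVE_TOKENS;
}
```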
The Two-Sandbox Bet
Most agent frameworks make you choose. Either you get a real Linux container (Daytona, E2B, Modal, Cloudflare Sandbox), or you get a “tools-only” runtime (Vercel AI SDK, LangChain). Flue’s bet is that the right answer is both, gated by one knob:
const agent = await init({ sandbox: 'empty' }); // virtual sandbox (default)
const agent = await init({ sandbox: 'local' }); // host filesystem mounted at /workspace
const agent = await init({ sandbox: daytona(...) }); // real container
const agent = await init({ sandbox: getVirtualSandbox(env.R2_BUCKET) }); // R2 mounted as fs
The default is the virtual sandbox: in-memory just-bash with full POSIX-like behavior — grep, glob, read, write, bash all work, but nothing leaves the process. The pitch on the Cloudflare side is:
“A virtual sandbox is going to be dramatically faster, cheaper, and more scalable than running a full container for every agent, which makes it perfect for building high-traffic/high-scale agents.”
What the virtual sandbox enables that nobody else cleanly delivers: a knowledge-base agent where R2 is the filesystem. getVirtualSandbox(env.KNOWLEDGE_BASE) mounts an R2 bucket directly into the agent’s just-bash. The agent does grep "billing question" / and it works. No vector DB, no embeddings pipeline, no separate retrieval service. The retrieval is filesystem traversal, and the LLM is already trained to use grep.
The tradeoff: a virtual sandbox can’t run git, npm install, or anything that wants real subprocesses. For coding agents you escape-hatch to Daytona (or Cloudflare Sandbox, when its DO RPC adapter is wired). The decision tree is therefore:
| Agent kind | Sandbox | Why |
|---|---|---|
| Translation, classification, structured extraction | 'empty' (virtual) | No filesystem needed, just text in / text out |
| Knowledge-base support agent | getVirtualSandbox(R2) | Filesystem is the knowledge base |
| Issue triage in CI | 'local' | Repo already cloned, gh/npm are host CLIs |
| Coding agent with git push | daytona(...) | Need a real container with persistent state |
This decision tree is the framework’s actual contribution. Most agent SDKs collapse the choice; Flue makes it the first parameter of init().
The Cloudflare Bet (and a One-Line Heuristic Holding Up Production)
Flue’s Cloudflare target is the more opinionated half of the framework. The opinions are visible in the code:
One Durable Object class per webhook agent. The generated entry emits:
export class Hello extends Agent {
async onRequest(request) { return handleAgentRequest(request, this, 'hello', handler_hello); }
async onFiberRecovered(ctx) { /* flue:webhook:* fibers */ }
}
PascalCase from kebab-case. Every webhook agent is its own DO class with its own session storage, its own SQL store, its own fiber recovery. The agents SDK from Cloudflare routes POST /agents/hello/<id> to the right DO. Flue doesn’t reinvent routing — it bolts on top of the official agents package.
bundle: 'none' for Cloudflare. This is the choice with the most history. The commit log shows two prior attempts to pre-bundle on top of wrangler that both broke nodejs_compat. The fix was to stop bundling at all on the Cloudflare path:
“Pre-bundling on top of wrangler caused `nodejs_compat` resolution failures (e.g. the `tar` package using bare `fs`/`zlib`/`assert`). Letting wrangler be the only bundler eliminates that whole class of problem and makes dev/deploy paths identical.”
The downside: Flue gives up control of the bundle, dist size, sourcemaps. The upside: dev and deploy go through identical wrangler invocations. What works in flue dev --target cloudflare works in production. That’s the contract.
The one-line heuristic. This is the part of the codebase that will haunt someone in 18 months:
// resolveSandbox in build-plugin-cloudflare.ts
if (Object.getPrototypeOf(sandbox)?.constructor?.name === 'DurableObject') {
return cfSandboxToSessionEnv(sandbox);
}
Workerd’s RPC stub for a Cloudflare Sandbox DO is a proxy. Structural duck-typing returns true for any property check (the proxy lies). instanceof DurableObject from cloudflare:workers returns false (different identity). Empirically, the only reliable signal is the constructor name string of the prototype. This is documented (07-danger-zones.md), and the commit history shows the prior attempt — 8120f17 fix(cloudflare): use instanceof DurableObject — was reverted by 0a8a646 fix(cloudflare): use prototype constructor name for sandbox detection. If workerd renames its internal class, every Flue Cloudflare user breaks at runtime, with no compile-time signal.
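A small self-contained demonstration of why the duck-typing route fails: a Proxy can claim to own every property, while the prototype's constructor name stays honest. This mock only mimics the behavior described; workerd's real stub internals are more involved:

```typescript
// A stand-in for an RPC stub: a Proxy whose `has` trap claims every
// property exists and whose `get` trap makes everything look callable.
const lyingProxy = new Proxy({} as Record<string, unknown>, {
  has: () => true,            // every `in` check passes
  get: () => () => undefined, // every property access "works"
});

// Structural duck-typing is fooled: any property name appears present.
console.log('exec' in lyingProxy); // true

// The prototype-constructor-name check is not fooled: the proxy's target
// is a plain object, so its prototype's constructor is Object.
console.log(Object.getPrototypeOf(lyingProxy)?.constructor?.name); // 'Object'

// A real class instance reports its own constructor name.
class FakeDO {} // illustrative stand-in, not cloudflare:workers' class
console.log(Object.getPrototypeOf(new FakeDO())?.constructor?.name); // 'FakeDO'
```

The same mechanism explains the fragility: the check is only as stable as the string workerd uses for its internal class name.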
This is what the L7 takeaway looks like: a framework that’s honest about which integration points are duct tape, and which aren’t.
What Flue Is Replacing (Per Their Own Site)
The flueframework.com page lists three named replacement targets:
“Replacing: Dosu, Greptile, CodeRabbit”
Three vertical AI tools — issue triage, code search, code review. These are not Mastra or LangGraph competitors. They’re SaaS endpoints. The pitch is: stop renting an opinionated SaaS for narrow workflows, build a custom Sandbox Agent that does the same thing in 200 lines of TypeScript and ship it as a Cloudflare Worker.
The implicit comparison set:
| Framework | What it gives you | What it doesn’t |
|---|---|---|
| Mastra | TypeScript agent framework, opinionated DAG runtime, deploy adapters | No sandbox abstraction; tools are just functions |
| LangGraph | Stateful graph orchestration, explicit checkpointer, time-travel | Python-first; no built-in agent harness; no sandbox |
| Vercel AI SDK | Streaming UI, tool calls, model routing | No agent harness, no sessions, no sandbox |
| Claude Code | Full agent harness, sandbox, skills, AGENTS.md — but for the developer | Not headless, not deployable as a server |
| Flue | Claude-Code-shaped harness + sandbox + sessions + skills, headless and deployable | No graph orchestration, no built-in HITL, no eval gate |
The slot Flue fills: “I want Claude Code’s developer ergonomics, but for an HTTP endpoint my customers hit, deployed on infra I trust.” That slot was empty until April 2026. Mastra is the closest competitor, but Mastra doesn’t have the sandbox/just-bash bet, the AGENTS.md/skills runtime discovery, or the Astro-team distribution.
What Flue Doesn’t Do (Yet)
Reading honest framework code is partly about reading what’s missing. From a deep walk of the SDK:
No HITL primitive. There’s no pause(), resume(), no resume-token issuer, no Slack/Linear adapter. If you need humans in the loop, you build the broker yourself. (See agent-infrastructure-foundation for the interface shape this needs.)
No eval gate. No promotion pipeline, no shadow mode, no canary. flue dev and flue deploy are the only gates. For a framework that wants to ship to production, this is the obvious next interface to grow.
No durable execution layer. Sessions persist on Cloudflare via Durable Objects; on Node, they default to InMemorySessionStore (process lifetime). For long-running workflows (>1 hour, multi-region, retry semantics), you’d reach for Temporal or Inngest yourself.
No tests. Zero unit tests in the repo. The examples/hello-world/ directory has 11 example agents that double as integration tests, but there is no CI-runnable suite. The README is explicit: examples/hello-world/.flue/agents/compaction-test.ts is what you run when you change compaction logic.
Cron triggers are manifest-only. export const triggers = { cron: '*/5 * * * *' } is recorded in dist/manifest.json and never wired to anything. The build emits the schedule, the runtime ignores it. (Cloudflare cron triggers exist; the wiring is just not done.)
MCP is one-direction. connectMcpServer() calls out to remote MCP servers. Flue does not expose your agents as MCP servers. No stdio MCP server spawning, no OAuth callback handling. This is a v1 scope choice and the README says so.
flue run is Node-only. Cloudflare target is rejected up front. If you want to test the Cloudflare path one-shot, you have to use flue dev --target cloudflare, which is interactive.
The Provenance Trail
The deps file tells you the architecture:
"dependencies": {
"@cloudflare/shell": "^0.3.2",
"@hono/node-server": "^1.14.0",
"@mariozechner/pi-agent-core": "*",
"@mariozechner/pi-ai": "*",
"@modelcontextprotocol/sdk": "^1.29.0",
"@valibot/to-json-schema": "^1.0.0",
"esbuild": "^0.25.0",
"hono": "^4.7.0",
"just-bash": "^2.14.2",
"package-up": "^5.0.0",
"valibot": "^1.0.0"
}
Two names matter:
@mariozechner/pi-agent-core and @mariozechner/pi-ai — Mario Zechner’s pi-coding-agent toolkit. The actual harness.prompt() / harness.waitForIdle() event loop driving every Flue session is pi-agent-core. The model abstraction (Model, Type schema, completeSimple) is pi-ai. Flue’s contribution is the shell around pi-agent-core: sessions, build plugins, sandbox interface, AGENTS.md discovery, compaction.
just-bash — Vercel Labs’ in-memory POSIX-ish bash runtime. This is the engine behind 'empty' and 'local' sandboxes. Without just-bash, Flue would have to ship a container per agent. With just-bash, the default sandbox is a single in-process JS object with no subprocess and no syscall — orders of magnitude lighter than spawning a Daytona or Cloudflare Sandbox container.
Knowing the provenance reframes what Flue is: a productized harness pattern on top of two upstream libraries that already had the hard parts. The Astro team’s value-add is the build pipeline, the deploy targets, the Cloudflare-DO topology, and the API ergonomics.
The Honest Adoption Calculus
When to bring Flue into a real system, in 2026:
Yes, adopt now if:
- You’re building a self-hosted equivalent of Dosu/Greptile/CodeRabbit and want to own the agent. The sandbox abstraction + AGENTS.md/skills runtime discovery + Cloudflare DO topology compress months of plumbing into a weekend.
- You already use Cloudflare Workers + Durable Objects in production and want first-class agent state on the same primitives. Flue is the only framework that natively assumes DO-backed sessions.
- You want a single TypeScript codebase that builds for both local Node CI and Cloudflare production. Mastra can do this with adapter packages; Flue does it with one knob.
Wait if:
- You need HITL approval flows, eval gates, or compliance-grade audit. Build them yourself or pick a framework that already has them (Phoenix, Braintrust, Temporal).
- You’re allergic to pre-1.0 APIs. The README is explicit: APIs may change. v0.0.x users had to migrate; v0.3.x users will too.
- You need Python. There’s no Python port.
- You want eval-gated promotions, shadow traffic, or canary rollouts. The pipeline isn’t there yet.
Don’t adopt if:
- You need predictable latency per agent invocation. Cloudflare DO cold start + `discoverSessionContext()` (which does `readdir(cwd)` plus reads `AGENTS.md` and every `SKILL.md` against the resolved cwd) + harness wiring is non-trivial overhead on the first request. Bench it on your own data before committing; for raw chat-completion latency you’d skip the harness entirely.
- You need multi-tenant isolation guarantees stronger than “one Durable Object per agent ID.” Flue doesn’t sandbox per tenant — it sandboxes per agent. If two tenants share an agent, they share a DO.
- You expect real test coverage from your dependencies. Flue has 0 unit tests. The bus factor is the maintainer count of the withastro org.
Patterns Worth Stealing
Three patterns from the SDK worth extracting into your own internal framework, whether or not you adopt Flue:
- Filesystem-as-knowledge-base beats a vector DB for long-tail tasks. Flue’s R2-as-virtual-sandbox example is the cleanest illustration. The agent uses `grep` because the LLM already knows how to use `grep`. No embeddings pipeline, no chunking heuristic, no retrieval tuning. For a knowledge base where the corpus fits comfortably in a Cloudflare R2 bucket and the agent’s queries are mostly substring/regex-shaped, this is now the lazy path. The threshold where vector indexing wins is higher than most teams think.
- Per-call model precedence as an explicit, named rule. `call > role > agent` is what Flue documents in `roles.ts` and enforces in `withScopedRuntime`. Most agent frameworks let model choice happen ambiently — wherever you happen to set it last wins. Naming the precedence and enforcing rollback in `finally` blocks is how you stop “why did this prompt use Haiku instead of Opus?” from becoming a 2-hour Datadog session. Steal the rule even if you don’t steal the code.
- Generated TS entries are the third rail to design around. Both Flue build plugins emit TypeScript strings the user never sees. Type errors in those strings surface only at user runtime, when they trip esbuild or wrangler. The Flue maintainers are explicit about it: “any edit to these strings should be paired with a manual run of `flue run hello --target node` and the equivalent CF dev path”. If you ship code-generated entry points, the generated artifact needs to be diff-stable, regenerated on every build, and exercised by an example that actually compiles. Otherwise you eat the cost the next person inherits.
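The precedence rule in the second pattern is a one-liner once it has a name. A sketch with illustrative names (the real enforcement lives inside `withScopedRuntime`):

```typescript
// call > role > agent, made explicit as a pure function.
interface ModelConfig {
  agentModel: string;  // agent-wide default
  roleModel?: string;  // from the active role, if any
  callModel?: string;  // per-call override
}

function resolveModel({ agentModel, roleModel, callModel }: ModelConfig): string {
  // Nullish coalescing encodes the precedence chain directly.
  return callModel ?? roleModel ?? agentModel;
}
```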
The summary, then, isn’t “use Flue.” It’s: the shape Flue is endorsing is the shape your own internal framework should converge on if you’re building agents to ship. A nine-method sandbox interface. AGENTS.md + skills discovered at runtime. A scoped runtime that saves and restores per call. Build plugins, not config flags. Same source, two targets.
Whether you npm install @flue/sdk or rebuild it from the same primitives is a separate decision. But the patterns are now public, named, Apache-2.0, and 7,013 lines of TypeScript away from being copied into your own monorepo.
Sources & Provenance
Verifiable sources. Dates matter. Credibility assessed.