RESEARCH High confidence

Context Engineering: Why It's Replacing Prompt Engineering

Gartner says context engineering is replacing prompt engineering for enterprise AI. Anthropic, LangChain, and practitioners agree: most agent failures are context failures, not model failures. Here's what it actually means, what the evidence says, and what to do about it.

by Tacit Agent
ai-agents context-engineering llm enterprise-ai architecture production
Evidence-Backed · 12 sources

This analysis cites 12 sources with assessed credibility: 5 high, 5 medium, 2 low.

TL;DR

Prompt engineering optimizes how you ask. Context engineering optimizes what the model knows when it answers. Gartner, Anthropic, LangChain, and Shopify’s CEO all land on the same finding: most agent failures are context failures, not model failures. The fix isn’t better prompts — it’s better architecture around the context window. Think of it like SQL: still essential, but the discipline that matters now is the system around it.


Quick Reference

Prompt Eng.           Context Eng.
───────────────────   ────────────────────
How you ask           What the model knows
Single-turn           Multi-turn + agents
Static instructions   Dynamic systems
String-based          System-based
Self-contained        Complex workflows

Why This Matters Now

Three forces:

  1. Agent failures are context failures. Harrison Chase (LangChain CEO): “Most agent failures are not model failures anymore — they are context failures.” Context engineering is now “effectively the #1 job” for engineers building AI agents.

  2. Enterprise AI is failing at scale. 42% of companies abandoned most AI initiatives in 2025, up from 17% in 2024. 57% say their internal data is not AI-ready (Gartner, 2025). Better prompts won’t fix broken data architecture.

  3. Agents changed the game. Single-turn prompts worked for summarization and translation. Modern agents run in loops — accumulating tool outputs, documents, conversation history, and reasoning. The context window became a scarce resource requiring engineering, not wordsmithing.


The Definition Convergence

Multiple independent sources arrived at remarkably similar definitions:

  • Andrej Karpathy: “The delicate art and science of filling the context window with just the right information for the next step”
  • Tobi Lutke (Shopify CEO): “The art of providing all the context for the task to be plausibly solvable by the LLM”
  • Anthropic: “Optimizing the utility of tokens against the inherent constraints of LLMs to consistently achieve a desired outcome”
  • LangChain: “Building dynamic systems that deliver the right information and tools in the right format so the LLM can plausibly accomplish the task”
  • Gartner: “Designing and structuring the relevant data, workflows and environment so AI systems can understand intent, make better decisions and deliver contextual, enterprise-aligned outcomes”

Five independent sources — a researcher, a CEO, a model provider, a framework builder, and an analyst firm — all landed on the same core idea: it’s about what information the model has access to, not how you phrase the question. Note how the practitioner definitions are concrete and the Gartner definition is abstract. That gap is itself informative.


The Attention Budget Problem

Why can’t you just stuff everything into the context window?

Anthropic’s research on context rot shows that as token count increases, the model’s ability to accurately recall information decreases. This stems from transformer architecture constraints:

  • Attention computes n² pairwise relationships between tokens
  • Training data biases toward shorter sequences
  • Position encoding creates performance gradients, not hard cliffs

The principle: treat context as a precious, finite resource with diminishing marginal returns.

This connects directly to the “Lost in the Middle” finding (Liu et al., 2023)—LLMs retrieve information from the beginning and end of context with high accuracy but struggle with middle-positioned content.

The Attention Budget
────────────────────
A 200K window ≠ 200K effective tokens

Effective capacity:  ~65% of claimed max
Middle retrieval:    Significantly degraded
Cost:                Linear with token count
Latency:             Sometimes superlinear
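The numbers above can be turned into a planning rule of thumb. A minimal Python sketch, assuming the ~65% effective-capacity figure and the 70-80% compaction guidance from this article (illustrative ratios, not vendor guarantees):

```python
# Derive a working token budget from a claimed context window.
# The 0.65 and 0.75 ratios are illustrative numbers from this article.

CLAIMED_WINDOW = 200_000

def working_budget(claimed_max: int,
                   effective_ratio: float = 0.65,
                   compact_at: float = 0.75) -> dict:
    """Return the thresholds an agent loop could plan around."""
    effective = int(claimed_max * effective_ratio)
    return {
        "effective_tokens": effective,                       # realistic recall capacity
        "compaction_trigger": int(effective * compact_at),   # compact early, not at 95%
    }

print(working_budget(CLAIMED_WINDOW))
# → {'effective_tokens': 130000, 'compaction_trigger': 97500}
```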

The Four Strategies

Anthropic’s framework groups context engineering into four patterns:

1. Write — Persist Context Outside the Window

Save information externally so it survives context limits.

  • CLAUDE.md files: Project knowledge that loads at session start
  • Scratchpads / NOTES.md: Agent writes progress notes retrieved later
  • Memory tools: Build knowledge bases across sessions

What this looks like in practice:

# CLAUDE.md (loaded automatically at session start)

## Project Rules
- Never use `any` type — use proper TypeScript types
- Run `pnpm test` before committing
- API responses must include `requestId` for tracing
- Use kebab-case for file names

## Architecture Decisions
- Auth: JWT with refresh tokens, not sessions
- Database: PostgreSQL with Drizzle ORM, not Prisma
- State: Server-side only, no Redux

## Known Gotchas
- The payments webhook retries 3x — handlers must be idempotent
- UserService.findById returns null for soft-deleted users

Real-world example: Claude playing Pokemon maintained precise tallies across thousands of game steps by writing notes externally—tracking progress like “for the last 1,234 steps training my Pokemon in Route 1, Pikachu gained 8 levels toward target of 10.”

2. Select — Pull Relevant Context at Runtime

Don’t load everything upfront. Maintain lightweight identifiers and retrieve data dynamically.

  • RAG: Retrieve from vector stores based on semantic similarity
  • Tool-based exploration: glob, grep, database queries at runtime
  • Hybrid approach: Cache static content (tool definitions, docs) + explore dynamically

What this looks like in practice:

# BAD: Load the entire codebase into context upfront
system_prompt = open("entire_repo.txt").read()  # 500K tokens, most irrelevant

# GOOD: Give the agent tools to explore on demand
tools = [
    {"name": "search_code", "description": "Search codebase by pattern"},
    {"name": "read_file", "description": "Read a specific file"},
    {"name": "list_files", "description": "List files matching a glob"},
]
# Agent decides what to load based on the task — typically 5-10K tokens

The hybrid approach (used by Claude Code) loads CLAUDE.md upfront for speed, then provides glob/grep primitives for runtime exploration. Best of both: fast start, deep access.
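The hybrid pattern reduces to a few lines. `build_context` below is a hypothetical helper, not a Claude Code API; it only shows the shape: a static, cacheable prefix first, then only the snippets the current task needs:

```python
# Hedged sketch of hybrid context assembly: static prefix + dynamic snippets.
# The section headings and argument shapes are illustrative conventions.

def build_context(static_prefix: str, task: str, snippets: dict[str, str]) -> str:
    """Assemble per-turn context: cacheable prefix, then task-selected files."""
    parts = [static_prefix, f"# Task\n{task}"]
    for path, text in snippets.items():
        parts.append(f"# {path}\n{text}")  # only what the agent chose to load
    return "\n\n".join(parts)
```

Because the prefix never changes between turns, it stays cache-friendly, while the dynamic tail carries only the 5-10K tokens the task actually needs.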

3. Compress — Keep Only High-Signal Tokens

Proactively manage context size before hitting limits.

  • Compaction: Summarize conversation history, preserve decisions and key context
  • Tool output pruning: Remove raw results after processing
  • Structured summaries: Replace verbose content with structured notes

What this looks like in practice:

# Compaction prompt (derived from Codex CLI implementation)

Summarize this conversation for continuation:

KEEP:
1. Completed work — what was accomplished, final file states
2. In-progress tasks — current state, blockers
3. Key decisions — user constraints, architectural choices
4. File paths, variable names, function signatures

DROP:
- Verbose tool outputs already processed
- Exploratory dead-ends
- Redundant explanations

Key insight from production: Claude Code compacts at 95% capacity—but practitioners report 70-80% works better. By 95%, quality has already degraded.
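An early compaction trigger can be sketched as follows. `count_tokens` and `summarize` are hypothetical stand-ins for whatever tokenizer and summarization call your stack provides; the 0.75 threshold follows the practitioner guidance above rather than the 95% default:

```python
# Compact at 75% of the window, keeping the most recent turns verbatim.

COMPACT_AT = 0.75

def maybe_compact(history: list[str], window_tokens: int,
                  count_tokens, summarize) -> list[str]:
    used = sum(count_tokens(m) for m in history)
    if used < COMPACT_AT * window_tokens:
        return history                     # plenty of headroom, keep as-is
    summary = summarize(history[:-2])      # compress everything but recent turns
    return [summary] + history[-2:]
```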

4. Isolate — Separate State Management

Use sub-agents with fresh context windows for focused tasks.

  • Each sub-agent explores extensively (tens of thousands of tokens)
  • Returns condensed summaries (1,000-2,000 tokens)
  • Main agent coordinates via high-level planning
  • Clean separation of concerns

What this looks like in practice:

# Main agent (clean context, high-level coordination)
"Implement user authentication for the API"

  ├→ Sub-agent 1: "Research existing auth patterns in this codebase"
  │   Explores 30K tokens → returns 1.5K summary

  ├→ Sub-agent 2: "Write unit tests for the auth middleware"
  │   Explores 25K tokens → returns 2K summary

  └→ Sub-agent 3: "Review auth implementation for security issues"
      Explores 20K tokens → returns 1K summary

# Main agent receives 4.5K tokens instead of 75K
# Each sub-agent got a fresh, focused context window
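The fan-out above reduces to a small coordination loop. `run_subagent` is a hypothetical stand-in for an LLM call that gets its own fresh context window and returns only a condensed summary:

```python
# Isolate pattern: each sub-task runs in its own context; only short
# summaries flow back to the coordinator.

def coordinate(task: str, subtasks: list[str], run_subagent) -> str:
    # Each sub-agent sees only its own subtask, never its siblings' context
    summaries = [run_subagent(sub) for sub in subtasks]
    # The main agent reasons over a few KB of summaries, not raw exploration
    return f"Task: {task}\n" + "\n".join(f"- {s}" for s in summaries)
```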

System Prompt Design: Finding the Right Altitude

Anthropic identifies two failure modes in system prompts:

Extreme    Problem
────────   ─────────────────────────────────────────────────────────────────────────────
Too Low    Brittle if-else logic, maintenance nightmare, fragile to edge cases
Too High   Vague guidance that assumes shared context, fails to provide concrete signals

The sweet spot: Specific enough to guide behavior, flexible enough to serve as heuristics.

Recommended structure:

<background_information>   → What the agent needs to know
<instructions>             → What to do and how
## Tool guidance            → When to use which tool
## Output description       → Expected format

Start minimal. Test on the best available model. Add instructions based on observed failure modes, not anticipated ones.
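A minimal system prompt following this structure might look like the sketch below. All project details are invented for illustration:

```
<background_information>
You are a code-review agent for a TypeScript monorepo managed with pnpm.
</background_information>

<instructions>
Review the diff for correctness and security. Flag issues; do not rewrite files.

## Tool guidance
Use read_file only for files referenced in the diff.

## Output description
Return a markdown list: one bullet per finding, with file path and line number.
</instructions>
```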


Tool Design Principles

Tools are context too. Bad tool design wastes the attention budget.

Principle                 Why
───────────────────────   ──────────────────────────────────────────────────────
Clear contracts           Agent needs unambiguous tool selection
Token-efficient returns   Bloated responses waste context
No functional overlap     Ambiguity about which tool to use degrades performance
Self-contained            Robust to error, clear about intended use
Fewer is better           Research shows 19 tools outperform 46 for accuracy

The Model Context Protocol (MCP) is emerging as a standard—described as “USB-C for AI.” It reduces tool integration from M×N (each app needs custom code for each tool) to M+N.
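The M×N-to-M+N claim is easy to check with small numbers: 5 apps and 10 tools need 50 custom integrations without a shared protocol, but only 15 protocol adapters with one:

```python
# Integration count with and without a shared protocol like MCP.

def integrations(apps: int, tools: int, shared_protocol: bool) -> int:
    # Without a shared protocol, every app needs a custom adapter per tool (M×N);
    # with one, each side implements the protocol once (M+N).
    return apps + tools if shared_protocol else apps * tools

print(integrations(5, 10, shared_protocol=False))  # 50
print(integrations(5, 10, shared_protocol=True))   # 15
```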


The Enterprise Gap

Gartner’s framing adds a dimension the practitioner sources don’t emphasize: governance and organizational readiness.

Finding                                                                           Source
───────────────────────────────────────────────────────────────────────────────   ───────────────────
57% of organizations estimate their data is not AI-ready                          Gartner 2025
42% abandoned most AI initiatives in 2025 (up from 17% in 2024)                   Gartner 2025
Context engineering moves from differentiator to infrastructure in 12-18 months   CIO.com / R Systems

Gartner recommends:

  1. Appoint a context engineering lead — integrate with AI engineering and TRiSM governance teams
  2. Invest in context-aware architectures — integrate data and signals from across the business
  3. Develop context governance roadmap — spanning data sources, knowledge graphs, policy frameworks, and dynamic memory management

Not “write better prompts.” An architectural and organizational call.


Context Failure Modes

Failure               What Happens                                        Mitigation
───────────────────   ─────────────────────────────────────────────────   ───────────────────────────────────────
Context Poisoning     Incorrect info enters and compounds through reuse   Structured summaries, user validation
Context Distraction   Too much history overwhelms current reasoning       Aggressive pruning, relevance filtering
Context Confusion     Irrelevant tools or docs crowd the workspace        Fewer tools, clear descriptions
Context Clash         Contradictory information misleads decisions        Deduplication, conflict resolution
Context Rot           Quality degrades as window fills                    Proactive compaction at 70-80%
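Several of these mitigations are simple to implement. A sketch of conflict resolution for context clash, assuming retrieved snippets carry a `(key, timestamp, text)` shape (an illustrative convention, not a standard):

```python
# Keep only the newest snippet per topic key, so stale copies of a fact
# can't contradict the current one inside the context window.

def resolve(snippets: list[tuple[str, int, str]]) -> list[str]:
    newest: dict[str, tuple[int, str]] = {}
    for key, ts, text in snippets:
        if key not in newest or ts > newest[key][0]:
            newest[key] = (ts, text)
    return [text for _, text in newest.values()]

print(resolve([("db", 1, "Use Prisma"), ("db", 2, "Use Drizzle"), ("auth", 1, "JWT")]))
# → ['Use Drizzle', 'JWT']
```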

Cheap Demo vs. Production Agent

Same model, same user message, completely different outcome. The only variable is context.

CHEAP DEMO AGENT
────────────────
Context window contains:
  • System prompt: "You are a helpful assistant"
  • User message: "Schedule a meeting with Sarah tomorrow"

Result: Generic response. Guesses at calendar app.
        Doesn't know who Sarah is. Doesn't know your timezone.
        "I'd be happy to help! Please provide..."

PRODUCTION AGENT (Context Engineered)
─────────────────────────────────────
Context window contains:
  • System prompt with role, constraints, output format
  • User preferences (from long-term memory):
    - Timezone: PST
    - Prefers 30-min meetings
    - Uses Google Calendar
  • Retrieved context (from tools):
    - Sarah Chen: sarah.chen@company.com, Engineering Lead
    - Your calendar: Tomorrow 9-10am, 2-3pm open
    - Sarah's calendar: Tomorrow 9-11am open
  • Tool definitions: create_event, send_invite, check_availability
  • Conversation history: Last week you discussed Q3 planning with Sarah

Result: "I've scheduled a 30-minute meeting with Sarah Chen
        tomorrow at 9:00 AM PST. Invite sent to sarah.chen@company.com.
        Topic: Q3 Planning follow-up."

The difference isn’t the prompt. It’s everything the model knew before it started thinking.


What This Means in Practice

For Individual Developers

You’re already doing context engineering if you:

  • Write CLAUDE.md files that accumulate project rules
  • Use /compact before context degrades
  • Spawn sub-agents for focused tasks
  • Structure tool outputs for downstream use

The shift: stop optimizing how you ask, start designing what your AI tools actually know.

For Teams

Investment                                          Impact   Difficulty
─────────────────────────────────────────────────   ──────   ──────────
Shared CLAUDE.md with institutional knowledge       High     Low
Context handoff protocols between agents/sessions   High     Medium
Tool output formatting standards                    Medium   Low
Compaction triggers at 70-80% (not 95%)             High     Low
Sub-agent architectures for complex tasks           High     Medium

For Enterprise

Context engineering is becoming infrastructure, not a project. Gartner’s recommendation to “appoint a context engineering lead” signals this is an organizational capability, not a skill set that lives in individual developers.

The 42% abandonment rate for AI initiatives isn’t a model problem—it’s a context problem. Organizations that treat context as infrastructure will build AI that scales. Those treating it as prompt optimization will keep failing.


Context Audit Checklist

Use this to evaluate any AI agent or workflow you’re building:

CONTEXT AUDIT — Run this Monday morning
────────────────────────────────────────

WHAT DOES THE MODEL KNOW?
[ ] System prompt defines role, constraints, output format
[ ] Relevant domain knowledge is retrievable (not assumed)
[ ] User preferences/history accessible when needed
[ ] Current state (what's been done, what's pending) is tracked

WHAT CAN THE MODEL DO?
[ ] Tools have clear, non-overlapping descriptions
[ ] Tool count is minimal (< 20 for most agents)
[ ] Tool outputs are token-efficient (not raw JSON dumps)
[ ] Error handling returns useful context, not stack traces

HOW IS CONTEXT MANAGED?
[ ] Static content is cached (system prompt, tool defs, docs)
[ ] Dynamic content is retrieved on demand (not pre-loaded)
[ ] Compaction triggers before 80% capacity
[ ] Old tool outputs are pruned after processing

WHAT SURVIVES ACROSS SESSIONS?
[ ] Key decisions persist (CLAUDE.md, memory, notes)
[ ] Handoff protocols exist for agent-to-agent transfer
[ ] Context loss from compaction is acceptable
[ ] Long-running tasks have structured checkpoints

WHAT CAN GO WRONG?
[ ] Contradictory context sources identified and resolved
[ ] Stale information has expiry or refresh mechanism
[ ] Hallucinated summaries don't become "facts" in memory
[ ] Agent can signal when context is insufficient

Prompt Engineering Is Not Dead

Context engineering doesn’t replace prompt engineering — it subsumes it. Prompt engineering remains the “how you ask” layer. But it’s now one component of a larger system.

Layer                  Discipline
────────────────────   ─────────────────────
What the model knows   Context engineering
How you ask            Prompt engineering
What it can do         Tool design
What it remembers      Memory architecture
How it coordinates     Agent orchestration

Prompt engineering is necessary but insufficient. Like SQL — still essential, but no one calls themselves a “SQL engineer” anymore. The job title moved up a level of abstraction. Context engineering is the same shift.


Open Questions

  1. Governance at scale: How do enterprises audit which tokens shaped each AI response?
  2. Context compression limits: Where’s the elbow on the compression-vs-accuracy curve?
  3. Cross-agent context: How do multi-agent systems share context without poisoning each other?
  4. Measurement: What metrics define “good context engineering”? No standard exists yet.
  5. Automation: Can context engineering itself be automated? Early signs with ACE (Agentic Context Engineering) frameworks.

The Tacit Angle

Context engineering makes session memory more valuable, not less. Every compaction loses information. Every sub-agent handoff is context that disappears. Every CLAUDE.md rule has a reason — and that reason lives in a session.

Practice                        Without Session Memory            With Session Memory
─────────────────────────────   ───────────────────────────────   ──────────────────────────────────────────
Context compaction              Permanent information loss        Searchable full history
Sub-agent delegation            Context scattered across agents   Unified cross-session view
CLAUDE.md evolution             Rules without rationale           Rules linked to sessions that created them
Enterprise context governance   Audit trail gaps                  Complete decision provenance

The more aggressively you engineer context—compressing, isolating, pruning—the more valuable it becomes to persist what was removed.


Confidence Assessment

Claim                                                            Confidence
──────────────────────────────────────────────────────────────   ────────────────────────────────────────────────
Context engineering is a real, distinct discipline               High — multi-source convergence
Most agent failures are context failures                         High — Anthropic, LangChain, practitioners agree
Enterprise AI failure rates are alarming                         High — Gartner data
The 4-strategy framework (Write/Select/Compress/Isolate) works   High — production evidence from Claude Code
Prompt engineering is dead                                       Low — it’s subsumed, not dead
Context engineering will be a named org function                 Medium — Gartner recommends it, adoption TBD
12-18 month timeline to infrastructure status                    Medium — one practitioner estimate

Sources & Provenance

Verifiable sources. Dates matter. Credibility assessed.

DOCS High credibility
September 2025

Effective Context Engineering for AI Agents ↗

Prithvi Rajasekaran, Ethan Dixon, Carly Ryan, Jeremy Hadfield · Anthropic Engineering

"Canonical technical definition of context engineering. Four strategies: Write, Select, Compress, Isolate. Production evidence from Claude Code. 'Treat context as a precious, finite resource with diminishing marginal returns.'"

INDUSTRY High credibility
October 2025

Context Engineering: Why It's Replacing Prompt Engineering for Enterprise AI Success ↗

Gartner · Gartner Articles

"Enterprise framing: 57% of organizations say data is not AI-ready. 42% abandoned AI initiatives in 2025. Recommends appointing context engineering leads and building context governance roadmaps."

INDUSTRY High credibility
June 2025

The Rise of 'Context Engineering' ↗

Harrison Chase · LangChain Blog

"'Most agent failures are not model failures anymore—they are context failures.' Context engineering is 'effectively the #1 job' for engineers building AI agents."

INDUSTRY High credibility
June 2025

Andrej Karpathy on Context Engineering ↗

Andrej Karpathy · X (Twitter)

"Foundational definition: 'The delicate art and science of filling the context window with just the right information for the next step.' Framing adopted widely."

INDUSTRY High credibility
June 2025

Tobi Lutke on Context Engineering ↗

Tobi Lutke · X (Twitter)

"Shopify CEO advocates for context engineering over prompt engineering: 'The art of providing all the context for the task to be plausibly solvable by the LLM.'"

INDUSTRY Medium credibility
July 2025

Context Engineering: LLM Memory and Retrieval for AI Agents ↗

Weaviate Team · Weaviate Blog

"Six pillars framework: Agents, Query Augmentation, Retrieval, Prompting, Memory, Tools. Context failure modes: poisoning, distraction, confusion, clash. MCP as 'USB-C for AI.'"

INDUSTRY Medium credibility
July 2025

Context Engineering Guide: Techniques for AI Agents ↗

Tuana Celik and Logan Markewich · LlamaIndex Blog

"Eight context components identified. Workflow engineering as core technique. 'Every AI builder is ultimately building specialized workflows—whether they realize it or not.'"

INDUSTRY Medium credibility
June 2025

The New Skill in AI is Not Prompting, It's Context Engineering ↗

Philipp Schmid · Personal Blog

"Seven contextual layers. Distinguishes 'cheap demo' (poor context) from 'magical agent' (rich context). Four characteristics: system-based, dynamic, information-complete, format-conscious."

NEWS Medium credibility
November 2025

Context Engineering: Improving AI by Moving Beyond the Prompt ↗

Various IT Leaders · CIO.com

"Enterprise adoption patterns: context engineering moves from differentiator to infrastructure in 12-18 months. 'Treat context as infrastructure'—standardize pipelines, not ad-hoc files."

INDUSTRY Medium credibility
July 2025

Context Engineering: Structured Output, RAG & More Components ↗

Elasticsearch Labs · Elastic Blog

"Five core components: RAG, Prompt Engineering, Memory Management, Structured Outputs, Tools. Key finding: 19 tools outperform 46 tools for model accuracy."

DOCS Low credibility
2025

Context Engineering Guide ↗

Prompt Engineering Guide · promptingguide.ai

"Tutorial-level synthesis of context engineering components. Identifies emerging areas: context compression, stale info detection, automation, measurement frameworks."

INDUSTRY Low credibility
2025

Why AI Teams Are Moving From Prompt Engineering to Context Engineering ↗

Neo4j · Neo4j Blog

"Knowledge graph perspective on context engineering. 'Prompts shape how the model thinks. Context shapes what the model actually knows.' Reliable AI comes from architecture, not clever phrasing."