Ever watched an agent “use tools” and had the same nagging thought I’ve had: why are we shoving the same lump of JSON through the model again and again just so it can call the next tool? It feels like making someone rewrite a grocery list after every aisle. That annoying drag is exactly what Code Mode tries to get rid of. And honestly, it’s starting to look like the default way people will use MCP once things move from cute demos to real systems.
Cloudflare doesn’t tiptoe around it. Their take is basically: we’ve been using MCP wrong. Not because MCP is broken, but because asking an LLM to juggle raw tool-call plumbing is a weird interface compared to what it actually “speaks” fluently… code.
So Code Mode flips the whole interaction. Instead of the model spitting out tool-call JSON one call at a time, you let it write TypeScript that calls an API generated from MCP tools, run that code in a sandbox, and then show the model only the results it truly needs. Cloudflare says the results are “striking” and that agents can handle more tools and more complex tools when they’re presented as a TypeScript API instead of direct tool calls. Source is Cloudflare’s post, [“Code Mode: the better way to use MCP”].
What is Code Mode for MCP
Code Mode is an MCP usage pattern where MCP tools show up as a normal programming-language API, often TypeScript, and the model generates executable code that calls those tools in a sandbox instead of emitting tool-call JSON step by step.
Why bother?
- LLMs have seen a ridiculous amount of real-world code during training.
- Tool calling leans on special non-text “tool tokens” plus synthetic examples, which makes it easier to fumble.
- Code can stash intermediate data inside the sandbox, which is cheap, instead of stuffing it into the context window, which is expensive.
Why direct MCP tool calling gets painful fast
Direct tool calling looks clean when you’ve got three tools and a tidy prompt. Then you hook an agent up to a bunch of servers and suddenly you’re knee-deep in it.
Cloudflare and Anthropic keep pointing at the same two problems.
1) Tool definitions can blow up your context window
Anthropic notes many MCP clients load all tool definitions upfront. If you’ve got “hundreds or thousands of tools,” the agent may chew through hundreds of thousands of tokens before it even starts doing the user’s job. Source is Anthropic’s engineering write-up, [“Code execution with MCP”].
2) Intermediate results eat tokens and accuracy
Anthropic gives a very real example: move a meeting transcript from Google Drive into Salesforce. With direct tool calls, you pull the transcript into context… then you basically duplicate it when it gets written into the next tool call.
They even quantify it. A 2-hour sales meeting transcript could mean ~50,000 additional tokens processed just from the “pass-through twice” pattern. Big docs can also overflow the context window and wreck the workflow. Same Anthropic source as above.
Cloudflare says it in simpler terms: once you’re “stringing together multiple calls,” every tool output gets fed back into the model “just to be copied over” into the next call. That’s wasted time, energy, and tokens. Source is Cloudflare again, [Code Mode].
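In raw tool-call terms, that “copied over” pattern looks roughly like this (illustrative message shapes only, not a real wire format):

```typescript
// Illustrative only: a direct tool-call exchange in which a large result
// is echoed straight back into the next call's arguments.
const turn1 = {
  role: "tool_result",
  tool: "gdrive.getDocument",
  content: "<entire 2-hour transcript…>", // enters the context once
};

const turn2 = {
  role: "tool_call",
  tool: "salesforce.updateRecord",
  // …and the model has to reproduce it, token by token, on the way out
  arguments: { data: { Notes: "<entire 2-hour transcript…>" } },
};
```

Every character of that transcript is paid for twice: once coming in, once going out.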
Code Mode with MCP: what actually changes in the architecture?
MCP still does the stuff it’s good at. You keep it for:
- a uniform way to connect to tools across an ecosystem
- tool documentation
- authorization handled out-of-band, which matters a lot in real deployments
But you stop asking the model to emit tool-call JSON.
Instead, the flow looks more like this:
- Convert MCP tool schemas into a normal API surface. TypeScript is common.
- Give a sandbox runtime “bindings,” Cloudflare’s word, for the MCP servers you’re connected to.
- Ask the model to write code to orchestrate the calls.
- Run it in the sandbox and return the final result, or a small summary, back to the model.
Cloudflare’s key implementation detail is worth keeping verbatim: “we give the sandbox access to bindings representing the MCP servers it is connected to” so the agent can use them directly during code execution. Source is [Cloudflare Code Mode].
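Put together, that flow is just a loop. A minimal sketch, where `generateCode` (the LLM call) and `executeInSandbox` (your runtime) are hypothetical stand-ins, not real APIs:

```typescript
// Hypothetical generate → run → report loop.
interface RunResult {
  ok: boolean;
  summary: string;
}

async function agentStep(
  task: string,
  generateCode: (task: string) => Promise<string>,
  executeInSandbox: (source: string) => Promise<RunResult>,
): Promise<string> {
  const source = await generateCode(task); // model writes TypeScript
  const result = await executeInSandbox(source); // sandbox runs it against MCP bindings
  // Only the compact result re-enters the model's context, not raw tool output.
  return result.ok ? result.summary : `run failed: ${result.summary}`;
}
```

The structural point: tool outputs live and die inside `executeInSandbox`; the model only ever sees the summary string.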
How to implement Code Mode for MCP
If you’re building agents, you don’t need fairy dust. You need a tight loop: generate → run → report.
Step 1: Expose MCP tools as a TypeScript API
Anthropic shows a clean pattern. Generate a file tree where each tool becomes a module. Example structure, from [Anthropic engineering]:
```
servers/
  google-drive/
    getDocument.ts
    index.ts
  salesforce/
    updateRecord.ts
    index.ts
```

And each module wraps a generic “call tool” function:
```typescript
// ./servers/google-drive/getDocument.ts
import { callMCPTool } from "../client.js";

interface GetDocumentInput {
  documentId: string;
}

interface GetDocumentResponse {
  content: string;
}

export async function getDocument(
  input: GetDocumentInput,
): Promise<GetDocumentResponse> {
  return callMCPTool<GetDocumentResponse>("google_drive__get_document", input);
}
```

Step 2: Provide a sandbox with MCP bindings
This is where Code Mode becomes “oh, now I get it.”
Your runtime should do a few obvious things:
- run TypeScript/JS, or compile TS to JS
- allow network only through the MCP client, not arbitrary egress unless you truly intend that
- enforce timeouts and resource limits
- capture stdout/stderr so debugging isn’t a guessing game
Cloudflare’s “bindings” model is a good mental image. The sandbox gets handles to the MCP servers it’s connected to. Source is [Cloudflare Code Mode].
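What “bindings plus guardrails” might look like in miniature. All names here (`ToolFn`, `runInSandbox`) are hypothetical, and the real isolation work, process/VM boundaries and network policy, is left out of this sketch:

```typescript
// Hypothetical sketch: hand generated code a set of tool bindings, enforce
// an allowlist, and kill the run on a hard timeout.
type ToolFn = (input: unknown) => Promise<unknown>;

interface SandboxOptions {
  timeoutMs: number;
  allowedTools: Set<string>;
}

function makeBindings(
  tools: Record<string, ToolFn>,
  allowed: Set<string>,
): Record<string, ToolFn> {
  const bindings: Record<string, ToolFn> = {};
  for (const [name, fn] of Object.entries(tools)) {
    // Non-allowlisted tools are still visible but always reject.
    bindings[name] = allowed.has(name)
      ? fn
      : async () => {
          throw new Error(`tool not allowlisted: ${name}`);
        };
  }
  return bindings;
}

async function runInSandbox<T>(
  code: (bindings: Record<string, ToolFn>) => Promise<T>,
  tools: Record<string, ToolFn>,
  opts: SandboxOptions,
): Promise<T> {
  const bindings = makeBindings(tools, opts.allowedTools);
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("sandbox timeout")), opts.timeoutMs);
  });
  try {
    // Race the generated code against the deadline.
    return await Promise.race([code(bindings), timeout]);
  } finally {
    clearTimeout(timer);
  }
}
```

In a production setup the `code` argument would be model-generated source executed in a real isolate, not a closure, but the allowlist-and-deadline shape is the same.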
Step 3: Let the model write orchestration code
Here’s the Google Drive → Salesforce example as code, Anthropic’s version:
```typescript
import * as gdrive from "./servers/google-drive";
import * as salesforce from "./servers/salesforce";

const transcript = (await gdrive.getDocument({ documentId: "abc123" })).content;

await salesforce.updateRecord({
  objectType: "SalesMeeting",
  recordId: "00Q5f000001abcXYZ",
  data: { Notes: transcript },
});
```

What you don’t see is the transcript getting pasted into the model context twice. That’s the whole point.
Step 4: Return only what the model needs
I’ve seen people miss this part, then wonder why the gains feel “meh.” The win isn’t just “the LLM writes code.” The win is the sandbox doing the noisy, heavy lifting while the model stays focused.
Patterns that work well:
- Filter huge docs down to a summary in code.
- Extract structured fields before sending anything back.
- Return a short audit trail, like which tools were called, whether they succeeded, maybe timings.
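The patterns above are ordinary code once you’re in the sandbox. A hypothetical sketch, where `summarizeForModel` and the `ACTION:`/`TODO:` line convention are made up for illustration:

```typescript
// Hypothetical sketch: trim a large tool result down to a small structured
// object before anything re-enters the model context.
interface Transcript {
  content: string;
}

function summarizeForModel(doc: Transcript, maxChars = 500) {
  // Extract structured fields in code, cheaply, instead of in the model.
  const actionItems = doc.content
    .split("\n")
    .filter((line) => /^(TODO|ACTION):/i.test(line));
  return {
    preview: doc.content.slice(0, maxChars), // short excerpt, not the firehose
    actionItems,
    totalChars: doc.content.length, // audit-trail style metadata
  };
}
```

The model sees a few hundred tokens of structure; the 10 MB stays in the sandbox.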
Benefits of Code Mode for MCP (it’s not only about tokens)
Token savings are great, sure. But the practical wins go beyond the bill.
- Reliability. Models are generally better at writing idiomatic API calls than emitting precise tool-call JSON tokens. Cloudflare flat-out argues tool-call tokens aren’t “seen in the wild,” while real-world code is. Source is Cloudflare Code Mode.
- Tool scale. Anthropic notes developers now build agents with hundreds or thousands of tools spread across many MCP servers. Code execution helps by loading only what’s needed and by processing data outside the context window. Source is Anthropic engineering.
- Composable orchestration. Loops, retries, caching, pagination, diffing… this is regular programming stuff. Code handles it without making everything feel like a ritual.
Community chatter on HN, Reddit, and LinkedIn keeps circling one idea: having the LLM generate code on top of a normal API instead of calling external functions one by one just feels like common sense. I mostly agree. MCP wasn’t the mistake. Code Mode is just a more natural UI for it.
Best practices for Code Mode MCP (stuff I’d actually enforce)
Sandbox guardrails
You want the sandbox to be useful, but not a free-for-all.
- hard timeouts per run
- memory limits
- deterministic dependencies, so no random `npm install` at runtime
- file system isolation
- an explicit allowlist of which MCP servers and tools are reachable
Design the generated API like a real SDK
Boring is good here.
Keep names consistent (getDocument, updateRecord). Generate TypeScript types from tool schemas. Put docs in JSDoc so the model “sees” usage patterns while it’s writing code.
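A generated module might look like this sketch, with schema-derived types and JSDoc the model can read while writing code. The shapes and the stubbed return value are assumptions for illustration, not real codegen output:

```typescript
// Sketch of a generated SDK-style wrapper: consistent verb-noun naming,
// types derived from the tool schema, JSDoc with a usage example.
export interface UpdateRecordInput {
  objectType: string;
  recordId: string;
  data: Record<string, unknown>;
}

/**
 * Update a Salesforce record via the MCP server.
 *
 * @example
 * await updateRecord({
 *   objectType: "SalesMeeting",
 *   recordId: "00Q5f000001abcXYZ",
 *   data: { Notes: "…" },
 * });
 */
export async function updateRecord(
  input: UpdateRecordInput,
): Promise<{ success: boolean }> {
  // Stubbed here; a real build would delegate to a generic MCP client call.
  return { success: true };
}
```

The JSDoc isn’t decoration: it’s the documentation the model has in front of it at generation time.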
Keep the model out of the data firehose
If a tool returns 10 MB, don’t shovel it into the model.
Have the code extract what matters, compress or aggregate, then return a minimal object. This lines up with Anthropic’s warning that large intermediate results can blow context limits and increase copying mistakes. Source is Anthropic engineering.
Keep reading
If you’re thinking about the broader shift away from direct tool calling, this internal post pairs well: Anthropic killed tool calling? what it means
Primary sources, straight from the folks who built it:
- Cloudflare: “Code Mode: the better way to use MCP”
- Anthropic: “Code execution with MCP”
Code Mode is basically the “normal programming” version of MCP
MCP gives you a standard way to connect tools. Code Mode gives you a standard way to use them without drowning the model in tool definitions and intermediate results.
If you already have an MCP client, try it as a weekend refactor. Pick one workflow chaining 3+ tools, wrap those tools as a small TypeScript API, run it in a sandbox, and see what breaks first. Because, yeah, that’s usually where the real engineering is hiding.