What is Code Mode in Bifrost MCP Gateway?
Code Mode in Bifrost MCP Gateway lets AI agents write Python to orchestrate tools, cutting token usage by up to 92% without sacrificing capability.
Code Mode in Bifrost MCP Gateway is a new execution model for agents that replaces the default approach of injecting every tool definition into the model's context on every request. Instead of exposing hundreds of tool schemas directly, Code Mode exposes four lightweight meta-tools and lets the model write a short Python (Starlark) script to orchestrate the work. In controlled benchmarks across 500+ tools, this approach has reduced input tokens by up to 92.8% while holding pass rate at 100%. For teams running production AI agents across multiple Model Context Protocol servers, Code Mode is the difference between a manageable AI bill and an unmanageable one.
What is Code Mode in Bifrost MCP Gateway
Code Mode in Bifrost MCP Gateway is an orchestration mode where the AI model writes Python code to call MCP tools instead of invoking them one at a time through the standard function-calling interface. Bifrost exposes connected MCP servers as a virtual filesystem of Python stub files (.pyi signatures). The model discovers and reads only the tools it needs, writes a script that chains them together, and Bifrost executes that script in a sandboxed Starlark interpreter. Only the final result flows back into the model's context.
This design directly addresses the context-bloat problem that appears as soon as a team connects more than a handful of MCP servers. In classic MCP execution, every tool definition from every connected server is loaded into the prompt on every turn. With 5 servers and 30 tools each, that is 150 schemas before the model has even read the user's request. Code Mode breaks that coupling: context cost is bounded by what the model actually reads, not by how many tools exist in the registry.
Why the Default MCP Execution Model Has a Cost Problem
The standard way to use MCP is to let the gateway inject all available tool schemas into every LLM call. This works for demos and prototypes. In production, three things go wrong:
- Token spend compounds with each connected server. Classic MCP flow sends the full tool list on every request and every intermediate turn of an agent loop. Adding more MCP servers makes the problem worse, not better.
- Latency grows with context size. Long tool catalogs increase prompt length, which increases time-to-first-token and end-to-end request latency.
- "Just trim the tool list" is a tradeoff, not a fix. Removing tools to control cost means removing capability. Teams end up maintaining separate, artificially small tool sets for different agents.
These problems were quantified in public work from Anthropic's engineering team, which reported a drop from 150,000 to 2,000 tokens on a Google Drive to Salesforce workflow when tool calls were replaced with code execution, and from Cloudflare, which explored a similar approach using a TypeScript runtime. Bifrost's Code Mode takes the same insight and builds it natively into the Bifrost MCP gateway, with two deliberate differences: Python instead of JavaScript (LLMs are trained on substantially more Python), and a dedicated documentation meta-tool that further reduces context.
How Code Mode Works: The Four Meta-Tools
When Code Mode is enabled on an MCP client, Bifrost automatically adds four generic meta-tools to every request, replacing the direct tool schemas that would otherwise be injected.
| Meta-tool | Purpose |
|---|---|
| `listToolFiles` | Discover which servers and tools are available as virtual `.pyi` stub files |
| `readToolFile` | Load compact Python function signatures for a specific server or tool |
| `getToolDocs` | Fetch detailed documentation for a specific tool before using it |
| `executeToolCode` | Run an orchestration script against the live tool bindings |
The model navigates the tool catalog on demand. It lists stub files, reads only the signatures it needs, optionally fetches detailed docs for a specific tool, and then writes a short Python script that Bifrost executes in a sandbox. Bifrost supports both server-level and tool-level bindings: one stub per server for compact discovery, or one stub per tool for more granular lookups. Both modes share the same four-tool interface. This flexibility is documented in detail in the Code Mode configuration reference.
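The discovery loop described above can be sketched with the four meta-tools mocked in plain Python. This is illustrative only: the stub contents, server names, and tool signatures below are hypothetical, not real Bifrost output.

```python
# Illustrative sketch of the Code Mode discovery loop. The four meta-tool
# names come from Bifrost's interface; everything else (server names like
# "crm", functions like search_customers) is a hypothetical example.

VIRTUAL_FS = {
    "crm.pyi": "def search_customers(query: str) -> list: ...\n"
               "def apply_discount(customer_id: str, pct: float) -> dict: ...",
    "email.pyi": "def send_email(to: str, subject: str, body: str) -> dict: ...",
}

def listToolFiles() -> list:
    """Discover available servers/tools as virtual .pyi stub files."""
    return sorted(VIRTUAL_FS)

def readToolFile(path: str) -> str:
    """Load compact function signatures for one server or tool."""
    return VIRTUAL_FS[path]

# One agent turn: list the stubs, then read only the one that is needed.
files = listToolFiles()
signatures = readToolFile("crm.pyi")

# Only these few signature lines enter the model's context, instead of the
# full JSON schemas for every tool on every connected server.
print(files)
print(signatures)
```

The point of the sketch is the shape of the loop: context cost scales with the stubs the model chooses to read, not with the size of the registry behind `VIRTUAL_FS`.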
The Sandbox: What Code Can and Cannot Do
Bifrost executes model-generated scripts inside a Starlark interpreter. Starlark is a deterministic, Python-like language originally designed at Google for build-system configuration. The sandbox is intentionally constrained:
- No imports
- No file I/O
- No network access
- Only tool calls against the allowed bindings and basic Python-like logic
This makes execution fast, deterministic, and safe to run under Agent Mode with auto-execution. The listToolFiles, readToolFile, and getToolDocs meta-tools are always auto-executable because they are read-only. executeToolCode becomes auto-executable only when every tool the generated script calls is on the configured allow-list.
How Code Mode Cuts Token Costs in Practice
Consider a multi-step e-commerce workflow: look up a customer, check their order history, apply a discount, send a confirmation. The difference between classic MCP and Code Mode shows up in the shape of the context, not just the output.
Classic MCP flow: Every turn carries the full tool list. Every intermediate tool result flows back through the model. With 10 MCP servers and 100+ tools, most of each prompt is spent on tool definitions.
Code Mode flow: The model reads one stub file, writes a single script that chains the calls together, and Bifrost executes the script in the sandbox. Intermediate results stay inside the sandbox. Only the final compact output reaches the model's context.
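Under those constraints, the single script the model hands to `executeToolCode` might look like the following. The tool names are hypothetical, and the sandbox is simulated here with in-memory Python stubs so the sketch is runnable; the script body itself sticks to the Starlark-compatible subset (no imports, no I/O, just tool calls and basic logic).

```python
# Simulated tool bindings (hypothetical names; in Bifrost these would be
# live MCP tool calls executed inside the Starlark sandbox).
def get_customer(email):
    return {"id": "c_42", "email": email}

def list_orders(customer_id):
    return [{"id": "o_1", "total": 120.0}, {"id": "o_2", "total": 80.0}]

def apply_discount(customer_id, pct):
    return {"customer_id": customer_id, "discount_pct": pct}

def send_email(to, subject):
    return {"status": "sent", "to": to}

# --- the model-written orchestration script: plain loops and conditionals,
# --- no imports, no file or network I/O.
customer = get_customer("ada@example.com")
orders = list_orders(customer["id"])
total_spend = 0.0
for order in orders:
    total_spend += order["total"]
if total_spend >= 150.0:
    apply_discount(customer["id"], 10.0)
receipt = send_email(customer["email"], "Your discount is applied")

# Only this final, compact value flows back into the model's context; the
# intermediate customer and order payloads stay inside the sandbox.
result = {"customer": customer["id"], "spend": total_spend, "emailed": receipt["status"]}
print(result)
```

In the classic flow, each of those four tool results would round-trip through the model's context; here the model sees one small dictionary.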
Bifrost published three rounds of controlled benchmarks comparing Code Mode on and off, scaling tool count between rounds:
| Scenario | Input tokens (off) | Input tokens (on) | Token reduction | Cost reduction |
|---|---|---|---|---|
| 96 tools / 6 servers | 19.9M | 8.3M | -58.2% | -55.7% |
| 251 tools / 11 servers | 35.7M | 5.5M | -84.5% | -83.4% |
| 508 tools / 16 servers | 75.1M | 5.4M | -92.8% | -92.2% |
Savings compound as tool count grows, because the classic flow loads every definition on every call while Code Mode's cost is bounded by what the model actually reads. Pass rate held at 100% across all three rounds, confirming that accuracy is not being traded away for efficiency. The full methodology and results are available in the Bifrost MCP Code Mode benchmark report.
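The compounding effect can be made concrete with a toy cost model. The token counts below are invented for illustration and are not Bifrost's actual accounting; the structural point is that classic cost is a function of total tool count, while Code Mode cost is not.

```python
# Toy context-cost model (illustrative only). Classic MCP resends every
# tool schema on every agent turn; Code Mode sends four small meta-tool
# schemas per turn plus only the stubs the model actually reads.

def classic_tokens(n_tools, turns, schema_tokens=350):
    # Full catalog injected on every turn of the agent loop.
    return n_tools * schema_tokens * turns

def code_mode_tokens(stubs_read, turns, meta_tokens=4 * 150, stub_tokens=120):
    # Four meta-tools per turn, plus the handful of stubs actually read.
    return meta_tokens * turns + stubs_read * stub_tokens

for n_tools in (96, 251, 508):
    off = classic_tokens(n_tools, turns=6)
    on = code_mode_tokens(stubs_read=3, turns=6)
    print(f"{n_tools} tools: classic={off:,} tokens, code mode={on:,} tokens")
```

Note that `code_mode_tokens` does not even take `n_tools` as a parameter: adding servers grows the classic column but leaves the Code Mode column flat, which is the same shape the benchmark table shows.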
The complete story on how this plays out in production, including cost governance, access control, and per-tool pricing, is covered in the Bifrost MCP Gateway launch post.
Why Code Mode Matters for Enterprise AI Teams
Token cost is only one of the reasons Code Mode matters in production. For platform and infrastructure teams managing AI agents at scale, Code Mode unlocks a set of operational properties that classic MCP execution cannot offer:
- Capability without cost penalty. Teams can connect every MCP server they need (internal APIs, search, databases, filesystem, CRM) without paying a per-request token tax for each tool definition.
- Predictable scaling. Adding a new MCP server does not inflate the context window of every downstream agent. The per-request cost profile stays flat.
- Faster execution. Fewer, larger model turns with sandboxed orchestration between them reduce end-to-end latency compared to multi-turn tool-by-tool execution.
- Deterministic workflows. Orchestration logic sits in a deterministic Starlark script rather than being reconstructed across multiple stochastic model turns.
- Auditable execution. Every tool call inside a Code Mode script is still a first-class log entry in Bifrost, with tool name, server, arguments, result, latency, virtual key, and parent LLM request captured.
Combined with Bifrost's virtual keys and governance, Code Mode fits into the broader pattern enterprise AI teams need: capability, cost control, and governance enforced at the infrastructure layer rather than bolted onto each agent.
How to Enable Code Mode on a Bifrost MCP Client
Code Mode is a per-client toggle. Any MCP client connected to Bifrost (STDIO, HTTP, SSE, or in-process via the Go SDK) can be switched between classic mode and Code Mode without redeployment or schema changes.
Step 1: Connect an MCP server
In the Bifrost dashboard, navigate to the MCP section and add a client. Give it a name, pick the connection type, and enter the endpoint or command. Bifrost discovers the server's tools and syncs them on a configurable interval, visible in the client list with a live health indicator. Full setup details are in the connecting to MCP servers guide.
Step 2: Toggle Code Mode on
Open the client's settings and enable Code Mode. Bifrost immediately stops injecting the full tool catalog into context for that client. From the next request onwards, the model receives the four meta-tools and navigates the tool filesystem on demand. Token usage on agent loops drops right away.
Step 3: Configure auto-execution
By default, tool calls require manual approval. To run the agent loop autonomously, allowlist specific tools under the auto-execute settings. Allowlisting is per-tool, so filesystem_read can auto-execute while filesystem_write stays behind an approval gate. In Code Mode, the three read-only meta-tools are always auto-executable, and executeToolCode becomes auto-executable only when all tools its script calls are allow-listed.
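The gating rule described here can be sketched as a small predicate. The helper function and the example allow-list are hypothetical, not Bifrost internals; only the rule itself (read-only meta-tools always auto-execute, `executeToolCode` only when every called tool is allow-listed) comes from the documentation above.

```python
# Sketch of the auto-execution rule: listToolFiles, readToolFile, and
# getToolDocs are read-only and always auto-execute; executeToolCode
# auto-executes only when ALL tools its script calls are allow-listed.

READ_ONLY_META = {"listToolFiles", "readToolFile", "getToolDocs"}

def can_auto_execute(tool_name, script_tool_calls=None, allow_list=frozenset()):
    if tool_name in READ_ONLY_META:
        return True  # read-only discovery is always safe to auto-run
    if tool_name == "executeToolCode":
        # Every tool the submitted script would call must be allow-listed.
        return all(t in allow_list for t in (script_tool_calls or []))
    return tool_name in allow_list  # classic per-tool gating

# Example: reads auto-execute, writes stay behind the approval gate.
allow = {"filesystem_read", "crm_search"}
print(can_auto_execute("getToolDocs"))
print(can_auto_execute("executeToolCode", ["filesystem_read"], allow))
print(can_auto_execute("executeToolCode", ["filesystem_read", "filesystem_write"], allow))
```

This is why per-tool allowlisting composes cleanly with Code Mode: a script that touches even one gated tool falls back to manual approval as a whole.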
Step 4: Scope access with virtual keys
Pair Code Mode with virtual keys to scope tool access per consumer. A virtual key for a customer-facing agent can be restricted to a specific subset of tools, while an internal admin key gets broader access. The model never sees definitions for tools outside the virtual key's scope, eliminating prompt-level workarounds.
Getting Started with Code Mode in Bifrost MCP Gateway
Code Mode is the practical answer to the question every team running MCP in production eventually asks: how do we keep adding capability without watching our token bill go exponential? By moving orchestration from prompts into sandboxed Python, Bifrost's Code Mode delivers up to 92% lower token costs, faster agent execution, and full auditability, all through a single per-client toggle. It works with any MCP server, integrates with virtual keys and tool groups for access control, and slots cleanly into the MCP gateway architecture alongside Bifrost's LLM routing, fallbacks, and observability.
To see how Code Mode in Bifrost MCP Gateway performs on your own agent workloads, book a Bifrost demo with the team.