How Bifrost MCP Gateway Cuts Token Costs in Claude Code and Codex CLI
Bifrost MCP Gateway reduces token costs in coding agents like Claude Code and Codex CLI by up to 92% through Code Mode, tool filtering, and centralized governance.
Coding agents like Claude Code and Codex CLI run hot on tokens. Every time you connect a new MCP server for filesystem access, GitHub, internal APIs, or database tooling, the entire tool catalog gets injected into the model's context on every turn of the agent loop. Most teams see this only when the invoice arrives. Bifrost MCP Gateway fixes the root cause by changing how tools are exposed to the model, combining Code Mode with per-consumer virtual keys and tool filtering so coding agents consume a small fraction of the tokens they would otherwise burn. In controlled benchmarks at 508 tools across 16 MCP servers, token usage dropped by 92.8% with pass rate held at 100%.
Why MCP Tool Bloat Drains Tokens in Coding Agents
Classic MCP has a costly default: every tool definition from every connected server is sent into the model's context on every single request. For a coding agent with five MCP servers and thirty tools each, that is 150 tool definitions loaded before the model reads a single line of your prompt. Scale to 16 servers with around 500 tools and it gets worse, because classic MCP loads every tool definition on every request regardless of which tools the model actually needs.
Anthropic's engineering team documented this directly. In a recent writeup on code execution with MCP, they showed a Google Drive to Salesforce workflow where context dropped from 150,000 tokens to 2,000 tokens when tool definitions were loaded on demand rather than upfront. The same pattern hits Claude Code and Codex CLI users who wire up many MCP servers: most of the token spend goes to reading tool catalogs the model never uses on that turn.
Two consequences follow. First, inference cost scales with the size of your MCP footprint, not with the work you actually want the agent to do. Second, coding agents become slower as their tool catalog grows, because the model spends more of its budget parsing schemas instead of reasoning about code. Claude Code's own docs note that tool search is on by default precisely to mitigate this, but individual client-side mitigations do not solve the problem when many teams, agents, and customers share the same tool fleet.
The Hidden Cost Math for Claude Code and Codex CLI Users
The pattern that shows up most often in coding agent deployments looks like this:
- A developer connects Claude Code or Codex CLI to a filesystem MCP server, a GitHub server, and a few internal tool servers.
- Each server exposes between ten and fifty tools.
- The agent loop takes six to ten turns to complete a non-trivial task.
- Every turn resends the full tool list back to the model.
With 150 tool definitions averaging a few hundred tokens each, a single ten-turn coding task can easily spend 300K input tokens before the model produces any useful output. Across hundreds of daily runs per engineer, that compounds into thousands of dollars per month of pure schema overhead. It also degrades tool selection accuracy, because the model must choose between dozens of irrelevant options alongside the one it actually needs.
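The overhead above can be sanity-checked with a back-of-the-envelope calculation. A minimal sketch, assuming 200 tokens per tool definition (an illustrative figure; real schemas vary):

```python
# Back-of-the-envelope schema overhead for a multi-turn coding task.
# The per-definition size (200 tokens) is an illustrative assumption.
TOOL_DEFINITIONS = 150        # 5 servers x 30 tools each
TOKENS_PER_DEFINITION = 200   # "a few hundred tokens" per JSON schema
TURNS = 10                    # agent-loop turns for a non-trivial task

schema_tokens_per_turn = TOOL_DEFINITIONS * TOKENS_PER_DEFINITION
schema_tokens_per_task = schema_tokens_per_turn * TURNS

print(f"{schema_tokens_per_turn:,} schema tokens resent every turn")   # 30,000
print(f"{schema_tokens_per_task:,} schema tokens per 10-turn task")    # 300,000
```

None of those 300K tokens contribute to the task itself; they are pure catalog overhead, resent on every turn.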
How Bifrost MCP Gateway Reduces Token Costs at the Source
Bifrost is the open-source AI gateway by Maxim AI, built in Go with 11 microseconds of overhead at 5,000 requests per second. It acts as both an MCP client (connecting to upstream tool servers) and an MCP server (exposing a single /mcp endpoint to Claude Code, Codex CLI, Cursor, and other clients). Cost reduction for coding agents comes from three layers working together.
Code Mode: stub files instead of schema dumps
Code Mode is the core mechanism. Instead of injecting every tool definition into the context, Bifrost exposes your MCP servers as a virtual filesystem of lightweight Python stub files. The model gets four meta-tools and navigates the tool catalog on demand:
- listToolFiles: discover which servers and tools are available
- readToolFile: load compact Python function signatures for a specific server or tool
- getToolDocs: fetch detailed documentation for a single tool before using it
- executeToolCode: run an orchestration script against live tool bindings in a sandboxed Starlark interpreter
The model reads only the stubs it needs, writes a short script to orchestrate the tools, and submits it through executeToolCode. Bifrost runs the script in a sandbox, chains the tool calls, and returns only the final output to the model. Intermediate results never flow back through the context.
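To make the flow concrete, here is a simulation of the kind of script a model might submit through executeToolCode. The tool names and bindings are hypothetical, and the live Starlark sandbox is stood in for by local fake functions so the sketch runs on its own:

```python
# Simulation of a Code Mode orchestration script. In Bifrost, code like the
# body of run() executes in a sandboxed Starlark interpreter against live
# tool bindings; here the bindings are faked locally so the sketch runs.
# All tool names (list_open_prs, post_message) are hypothetical.

def list_open_prs(repo):
    # stand-in for a GitHub MCP tool binding
    return [{"number": 42, "title": "Fix flaky test"},
            {"number": 43, "title": "Bump deps"}]

def post_message(channel, text):
    # stand-in for a Slack MCP tool binding
    return {"ok": True, "channel": channel}

def run():
    # The model writes only this orchestration logic. The intermediate
    # PR list is chained between tools inside the sandbox and never
    # re-enters the model's context; only run()'s return value does.
    prs = list_open_prs("acme/api")
    summary = "; ".join(f"#{p['number']} {p['title']}" for p in prs)
    return post_message("#eng", f"Open PRs: {summary}")

result = run()
print(result)
```

The savings come from that last comment: a classic MCP loop would round-trip the full PR list through the model's context before the Slack call; here it stays inside the sandbox.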
Code Mode supports two binding levels. Server-level binding groups all tools from a server into a single stub file, which works well for servers with modest tool counts. Tool-level binding gives each tool its own stub, useful when a server exposes thirty-plus tools with complex schemas. Both use the same four meta-tools.
Tool filtering: scope what each coding agent can see
Claude Code and Codex CLI do not always need access to every tool connected through the gateway. Bifrost's tool filtering lets you define, per virtual key, exactly which MCP tools are exposed. A key provisioned for a CI agent might only see read-only tools. A key provisioned for a human developer's Claude Code session might see the full set. The model only ever sees the tools it is allowed to call, which keeps both context size and blast radius under control.
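The scoping logic can be sketched as a per-key allowlist. Key names and tool identifiers below are hypothetical; Bifrost applies this filtering at the gateway, so each client only ever discovers its allowed subset:

```python
# Illustrative sketch of per-virtual-key tool filtering. The virtual key
# names and tool identifiers are assumptions for the example.
ALL_TOOLS = ["fs.read", "fs.write", "github.create_pr", "db.query", "db.migrate"]

VIRTUAL_KEY_TOOLSETS = {
    "vk-ci-agent":  {"fs.read", "db.query"},   # read-only CI scope
    "vk-dev-alice": set(ALL_TOOLS),            # full developer scope
}

def visible_tools(virtual_key):
    # A client authenticated with this key can only see (and call)
    # the intersection of the fleet with its allowed set.
    allowed = VIRTUAL_KEY_TOOLSETS.get(virtual_key, set())
    return [t for t in ALL_TOOLS if t in allowed]

print(visible_tools("vk-ci-agent"))   # ['fs.read', 'db.query']
```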
Centralized discovery through one /mcp endpoint
Rather than configuring multiple MCP servers individually in every coding agent's config, teams point Claude Code or Codex CLI at Bifrost's single /mcp endpoint. All connected servers are discovered and governed centrally. Adding a new MCP server to Bifrost makes it available to every connected coding agent automatically, without client-side config changes.
Benchmark Results: 92% Lower Token Costs at Scale
Bifrost ran three rounds of controlled benchmarks with Code Mode on and off, scaling tool count between rounds to measure how savings change as the MCP footprint grows:
| Round | Tools × Servers | Input Tokens (OFF) | Input Tokens (ON) | Token Reduction | Cost Reduction | Pass Rate |
|---|---|---|---|---|---|---|
| 1 | 96 tools · 6 servers | 19.9M | 8.3M | −58.2% | −55.7% | 100% |
| 2 | 251 tools · 11 servers | 35.7M | 5.5M | −84.5% | −83.4% | 100% |
| 3 | 508 tools · 16 servers | 75.1M | 5.4M | −92.8% | −92.2% | 100% |
Two things stand out. First, the savings are not linear: they compound as the MCP footprint grows, because classic MCP loads every tool definition on every request while Code Mode's cost is bounded by what the model actually reads. Second, accuracy is not traded off to get there: pass rate held at 100% in every round. The full report is available in the Bifrost MCP Code Mode benchmarks repo.
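The reduction column follows directly from the input-token figures, which makes the table easy to verify. The stated percentages are rounded, so the check below uses a small tolerance:

```python
# Recomputing the token-reduction column from the benchmark table.
# Input-token figures are in millions; stated reductions are rounded,
# so we verify to within a small tolerance.
rounds = [
    (1, 19.9, 8.3, 58.2),
    (2, 35.7, 5.5, 84.5),
    (3, 75.1, 5.4, 92.8),
]

for rnd, tokens_off, tokens_on, stated_pct in rounds:
    reduction = (1 - tokens_on / tokens_off) * 100
    assert abs(reduction - stated_pct) < 0.2, (rnd, reduction)
    print(f"Round {rnd}: {reduction:.1f}% fewer input tokens")
```

Note how the OFF column grows roughly with tool count (19.9M to 75.1M) while the ON column stays flat (8.3M down to 5.4M): Code Mode's cost tracks the work, not the catalog.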
For a detailed breakdown of how Code Mode interacts with governance and audit, the Bifrost MCP Gateway overview post covers access control, cost tracking, and tool groups in depth.
Setting Up Bifrost MCP Gateway for Claude Code and Codex CLI
Moving Claude Code or Codex CLI behind Bifrost takes a few minutes. The Claude Code integration guide and Codex CLI integration guide walk through the full config. The key steps:
- Run Bifrost locally or in your VPC and connect your upstream MCP servers through the dashboard (HTTP, SSE, or STDIO transports are supported).
- Toggle Code Mode on per MCP client. No schema changes or redeployment required.
- Create a virtual key for each consumer (developer, CI agent, customer integration) and attach the tool set it is allowed to call.
- Point Claude Code or Codex CLI at Bifrost's /mcp endpoint using the virtual key as the credential.
- Optionally, use MCP Tool Groups to manage access at team or customer scope rather than per individual key.
Once the coding agent is connected, every tool call is logged as a first-class entry: tool name, source server, arguments, result, latency, virtual key, and the parent LLM request that initiated the loop. This gives you token cost tracking and per-tool cost tracking side by side, so you can attribute spend accurately.
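Because each log entry carries both the tool identity and the virtual key, spend attribution reduces to a group-by over the log stream. A minimal sketch, with assumed field names mirroring the entries described above:

```python
# Illustrative aggregation of per-virtual-key tool spend from gateway logs.
# The record fields mirror what the text describes (tool, virtual key,
# token counts); the exact field names here are assumptions.
from collections import defaultdict

logs = [
    {"virtual_key": "vk-ci-agent",  "tool": "fs.read",          "input_tokens": 1200},
    {"virtual_key": "vk-ci-agent",  "tool": "db.query",         "input_tokens": 800},
    {"virtual_key": "vk-dev-alice", "tool": "github.create_pr", "input_tokens": 2500},
]

spend = defaultdict(int)
for entry in logs:
    # Attribute every tool call's token cost to (consumer, tool).
    spend[(entry["virtual_key"], entry["tool"])] += entry["input_tokens"]

for (key, tool), tokens in sorted(spend.items()):
    print(f"{key:14s} {tool:18s} {tokens:,} tokens")
```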
What You Get Beyond Token Savings
Cost reduction is the visible win, but coding agents running through Bifrost MCP Gateway also pick up infrastructure that most teams end up building themselves:
- Scoped access: every coding agent sees only the tools it should.
- Audit trails: every tool execution is logged with full arguments and results, useful for security review and debugging.
- Health monitoring: automatic reconnection on upstream server failure, with periodic refresh to pick up new tools.
- OAuth 2.0 with PKCE: for MCP servers that require user-scoped auth, including dynamic client registration and auto token refresh.
- Unified model routing: the same gateway that governs MCP traffic also handles provider routing, failover, and load balancing across 20+ LLM providers.
For teams running Claude Code or Codex CLI at scale, the Bifrost MCP gateway resource page and the Claude Code integration resource cover deployment patterns and cost-saving configurations in more depth.
Start Cutting Coding Agent Token Costs with Bifrost
Token cost in coding agents is not a rounding error at production scale. When Claude Code, Codex CLI, and every agent in between are sending full tool catalogs on every turn, the bill grows faster than the value. Bifrost MCP Gateway gets token costs back under control by loading tool definitions on demand, scoping access through virtual keys, and centralizing every MCP server behind a single endpoint, without trading off capability or accuracy.
To see how Bifrost can reduce token costs across your coding agent fleet, book a demo with the Bifrost team.