How to Track Per-Tool Costs Across MCP Servers

Track per-tool costs across MCP servers through Bifrost's gateway: per-execution audit logs, tool-scoped budgets, and unified cost attribution.

As AI agents connect to more Model Context Protocol (MCP) servers, cost stops being a simple LLM token problem. A single agent run now spans filesystem calls, web searches, database queries, and paid external APIs, each with its own pricing and its own contribution to the monthly bill. Teams that want to track per-tool costs across MCP servers quickly discover that standard observability stacks attribute charges to the model, not to the tools the model called. Bifrost, the open-source AI gateway by Maxim AI, closes that gap by routing every MCP tool invocation through a single audit layer that captures the tool name, calling server, virtual key, latency, token impact, and dollar cost of each execution. This guide covers why per-tool cost tracking is difficult, how Bifrost's MCP gateway solves it, and how to configure budgets, filters, and exports to keep spend attributable at the tool level.

Why Per-Tool Cost Tracking Across MCP Servers Is Hard

A naive agent architecture sends requests directly from the application to both the LLM provider and each MCP server. That setup produces three problems for cost attribution:

  • No single audit trail: Each MCP server writes its own logs, usually without the virtual key, team, or agent session that triggered the call.
  • Tool name collisions: Multiple MCP servers commonly expose tools with identical short names (for example, a GitHub server and a Jira server may both expose a search tool), making it impossible to attribute cost correctly from aggregated logs.
  • Blended model and tool costs: LLM token spend appears in the model provider's dashboard. Paid MCP tool calls (search APIs, geolocation APIs, SaaS integrations, image models) appear in separate vendor invoices. There is no shared schema tying them to the same agent run.

Without a unified layer, a finance question like "which tool drove the $8,000 spike last week?" becomes a multi-day investigation across logs from ten different systems.

What "Cost" Actually Means for MCP Tools

Accurate MCP cost tracking has to account for two distinct cost drivers on every tool call:

  • Model-side costs: Every tool exposed to the LLM consumes tokens. A catalog of 16 MCP servers with 500 total tools can burn 600+ tokens per turn on tool definitions alone, before any real work happens.
  • Tool-side costs: Many MCP servers wrap paid APIs, such as web search, geocoding, SaaS CRM queries, or image generation. Each successful tool call becomes a separate invoice item.

A useful cost-tracking system captures both, ties them to the same request identifier, and attributes them to the same virtual key, team, or customer. The Bifrost MCP gateway is built around that model: one gateway, one log schema, both cost types attributed per tool.
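A minimal sketch of that join, using hypothetical log rows (the field names here are illustrative, not Bifrost's actual log schema): model-side and tool-side spend land in separate streams, and the shared request identifier is what lets them roll up into one number per run.

```python
from collections import defaultdict

# Hypothetical log rows; field names are illustrative, not Bifrost's schema.
model_costs = [
    {"request_id": "req-1", "usd": 0.0420},  # LLM token spend for the turn
    {"request_id": "req-2", "usd": 0.0105},
]
tool_costs = [
    {"request_id": "req-1", "tool": "websearch_query", "usd": 0.0050},
    {"request_id": "req-1", "tool": "geocode_lookup", "usd": 0.0010},
]

def total_cost_per_request(model_rows, tool_rows):
    """Join model-side and tool-side spend on the shared request ID."""
    totals = defaultdict(float)
    for row in model_rows:
        totals[row["request_id"]] += row["usd"]
    for row in tool_rows:
        totals[row["request_id"]] += row["usd"]
    return dict(totals)
```

Without the shared request ID, the two `usd` columns live in different vendor dashboards and the join is impossible.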

How Bifrost Tracks Per-Tool Costs Across MCP Servers

Bifrost acts as both an MCP client (to upstream tool servers) and an MCP server (to Claude Code, Claude Desktop, Cursor, and other MCP-compatible clients). Every request and every tool call passes through the gateway, which means every tool invocation enters a single structured log stream.

One audit trail per tool execution

When an LLM response includes tool calls and the application executes them through the gateway, Bifrost records:

  • The MCP client (upstream server) the tool belongs to
  • The fully qualified tool name
  • The arguments passed to the tool
  • The virtual key used on the request
  • Latency, token impact, and final status
  • Any guardrail decisions applied before or after execution

Tool names are automatically prefixed with the MCP client name (for example, filesystem_list_directory or github_search), which ensures uniqueness across servers. That prefixing is what makes per-tool cost attribution actually work at scale: every cost row in the log maps to exactly one upstream server and one tool.
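The prefixing idea can be sketched in a few lines (this is an illustration of the technique, not Bifrost's implementation): two servers that both expose a `search` tool produce distinct qualified names, so their cost rows never collide.

```python
def qualify_tool_names(server_name: str, tools: list[str]) -> dict[str, str]:
    """Map short tool names to server-prefixed names so identically named
    tools on different MCP servers stay distinguishable in cost logs."""
    return {tool: f"{server_name}_{tool}" for tool in tools}

# Two servers both expose a `search` tool; prefixing keeps the cost rows apart.
github = qualify_tool_names("github", ["search", "list_prs"])
jira = qualify_tool_names("jira", ["search"])
```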

LLM token cost plus tool execution cost

For each request, Bifrost calculates LLM cost from real-time provider pricing data, token usage, request type (chat, embedding, speech, transcription), cache status, and batch discounts. For MCP tool calls, Bifrost records execution metadata on the same request trace, so platform teams can join model spend with tool spend using a shared request ID. Tool execution stays explicit (the application, not the gateway, decides when to execute), which also means every call that contributes to the bill has a clear call-site.
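The token-cost arithmetic looks roughly like the sketch below. The prices, the cache-discount factor, and the function shape are all assumptions for illustration; real pricing comes from the provider's published rates, which the gateway keeps current.

```python
# Illustrative per-million-token prices; real values come from live provider
# pricing data, not this hard-coded table.
PRICING = {
    "gpt-4o": {"input": 2.50, "output": 10.00},  # USD per 1M tokens (assumed)
}

def llm_request_cost(model: str, input_tokens: int, output_tokens: int,
                     cached_input_tokens: int = 0,
                     cached_discount: float = 0.5) -> float:
    """Compute the token cost of one request, applying an assumed discount
    to cached input tokens, in the spirit of gateway-side cost accounting."""
    p = PRICING[model]
    uncached = input_tokens - cached_input_tokens
    cost = (uncached * p["input"]
            + cached_input_tokens * p["input"] * cached_discount
            + output_tokens * p["output"]) / 1_000_000
    return round(cost, 6)
```

The same request ID that carries this model-side number also carries the tool execution metadata, which is what makes the later join trivial.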

Setting Up Per-Tool Cost Attribution with Virtual Keys

Virtual keys are the primary governance and attribution entity in Bifrost. Each virtual key carries its own budget, rate limits, tool allowlist, and reporting scope, so any tool call through that key is attributable to the team, project, or customer who owns it.

Attach one virtual key per team, feature, or environment. The budget and limits system supports a hierarchical structure:

  • Customer: Independent budget for an external account or internal business unit
  • Team: One or more teams under a customer, each with its own budget
  • Virtual key: Per-consumer budget, rate limits, and tool access
  • Provider config: Per-provider budgets inside a virtual key, so OpenAI and Anthropic spend are tracked independently

When a tool call arrives, Bifrost checks every applicable budget in the hierarchy. Any single budget failure blocks the request, and costs are deducted from each level so a platform team sees both the per-tool bill and the rolled-up team total. Reset durations are flexible (1m, 1h, 1d, 1w, 1M, 1Y), and calendar-aligned budgets reset at UTC boundaries for clean monthly reporting.
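The check-then-deduct behavior across the hierarchy can be sketched like this (a simplified model, not Bifrost's internal data structures): every level is checked first, one failure blocks the request, and only a fully passing request is charged at each level.

```python
from dataclasses import dataclass

@dataclass
class Budget:
    """One level of the hierarchy: customer, team, or virtual key."""
    name: str
    limit_usd: float
    spent_usd: float = 0.0

def check_and_charge(hierarchy: list[Budget], cost: float) -> bool:
    """Check every applicable budget; any single failure blocks the request,
    otherwise the cost is deducted at every level of the hierarchy."""
    if any(b.spent_usd + cost > b.limit_usd for b in hierarchy):
        return False  # one exceeded budget blocks the whole request
    for b in hierarchy:
        b.spent_usd += cost
    return True
```

Deducting at every level is what lets a platform team read both the per-key bill and the rolled-up team and customer totals from the same counters.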

Tool access is scoped per virtual key. The MCP tool filtering configuration lets each virtual key expose a different slice of the tool catalog, so cost tracking stays meaningful. A production-support key that only has filesystem_read_file and github_list_prs will never silently run up charges on a paid search API, because that API is not in its tool scope.
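In effect, the filter is an allowlist check before execution. A minimal sketch, with hypothetical key and tool names (not Bifrost's configuration format):

```python
# Hypothetical allowlists keyed by virtual key; names are illustrative.
TOOL_SCOPE = {
    "vk-prod-support": {"filesystem_read_file", "github_list_prs"},
}

def is_tool_allowed(virtual_key: str, tool: str) -> bool:
    """A tool call is permitted only if it sits in the key's scoped
    allowlist; anything else, including paid APIs, is rejected before
    it can generate a cost row."""
    return tool in TOOL_SCOPE.get(virtual_key, set())
```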

Metrics and Telemetry for Tool-Level Cost Analysis

Logs alone are not enough for production cost work. Teams need aggregate metrics, traces that connect agent reasoning to tool execution, and a way to export all of it into the company's standard observability stack. Bifrost supports four complementary channels:

  • Native Prometheus metrics: Scraping and Push Gateway modes, with counters and histograms per tool and per virtual key
  • OpenTelemetry tracing: OTLP export to any OTel-compatible backend, including Grafana Tempo, Honeycomb, and New Relic
  • Datadog connector: Native integration for APM traces, LLM observability, and metrics, without a separate collector
  • Log exports: Automated export to storage systems and data lakes for long-horizon analysis
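The shape of a per-tool, per-virtual-key cost metric is worth seeing concretely. The sketch below is a toy stand-in for a labeled Prometheus counter (the class and metric name are invented for illustration; Bifrost's actual metric names may differ):

```python
from collections import Counter

class LabeledCounter:
    """Minimal stand-in for a Prometheus counter with labels, showing how
    cost samples accumulate per (tool, virtual_key) label set."""
    def __init__(self, name: str):
        self.name = name
        self.samples = Counter()

    def inc(self, amount: float = 1.0, **labels):
        key = tuple(sorted(labels.items()))  # one series per label combination
        self.samples[key] += amount

# Hypothetical metric name; two tool calls accumulate into one series.
tool_cost_usd = LabeledCounter("mcp_tool_cost_usd_total")
tool_cost_usd.inc(0.005, tool="github_search", virtual_key="vk-ml-team")
tool_cost_usd.inc(0.002, tool="github_search", virtual_key="vk-ml-team")
```

With labels on both the tool and the virtual key, a single PromQL-style sum over either dimension answers "which tool?" or "which team?" directly.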

With traces tied to a shared request ID, a platform engineer can answer concrete cost questions in seconds: which agent session triggered the most expensive tool call last Tuesday, which virtual key drove the 40% spike in GitHub API spend, or which MCP server has the highest average per-call latency. The Bifrost MCP Gateway deep-dive walks through how cost attribution flows from the audit logs into production dashboards, along with benchmark results showing up to 92% lower token costs on agent-heavy workloads.

For compliance environments, content logging can be disabled per environment while still preserving tool name, server, latency, and status, keeping cost data complete without capturing sensitive payloads.
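The redaction idea reduces to keeping the attribution fields and dropping the payload fields. A sketch with illustrative field names (not Bifrost's log schema):

```python
def redact_for_compliance(log_row: dict) -> dict:
    """Drop payload fields (arguments, results) while keeping the metadata
    needed for cost attribution; field names are illustrative."""
    keep = {"tool", "server", "latency_ms", "status", "virtual_key", "cost_usd"}
    return {k: v for k, v in log_row.items() if k in keep}
```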

From Tracking Costs to Reducing Them

Tracking per-tool costs is the first step; cutting them is the payoff. Two Bifrost features move directly from attribution to reduction.

Code Mode attacks the single largest hidden cost in multi-server setups: tool definition tokens in context. When 16 MCP servers expose roughly 500 tools, classic MCP dumps hundreds of tool schemas into every LLM turn. Code Mode replaces that pattern with four meta-tools that let the model write Python in a Starlark sandbox and call many tools in one orchestrated script. The measured impact across multi-server workflows is roughly a 50% reduction in total request tokens and 30 to 40% faster execution, with reductions up to 92% on agent-heavy workloads at the 500-tool scale.

Semantic caching at the gateway layer removes duplicate tool-triggering LLM calls entirely. When two requests are semantically similar, Bifrost returns the cached response without invoking the model or the downstream tools, which shows up directly in the per-tool cost logs as fewer paid tool calls for the same workload.
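The core mechanic is a similarity check against previously cached requests before any model or tool is invoked. A toy sketch (the embeddings and threshold here are placeholders; a production gateway would use a real embedding model and a vector index):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

class SemanticCache:
    """Toy semantic cache: a hit returns the stored response, skipping the
    model call and every downstream tool call that it would have triggered."""
    def __init__(self, threshold: float = 0.95):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def get(self, embedding):
        for cached_emb, response in self.entries:
            if cosine(embedding, cached_emb) >= self.threshold:
                return response  # cache hit: no model, no tool spend
        return None

    def put(self, embedding, response):
        self.entries.append((embedding, response))
```

Every hit subtracts one model invocation and its whole chain of tool calls from the bill, which is exactly the drop that shows up in the per-tool cost logs.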

Together, the two features let a team take the per-tool cost picture from audit logs and turn it into a targeted reduction plan, using the same dataset that drove the visibility in the first place.

Start Tracking Per-Tool Costs Across MCP Servers

Per-tool cost tracking across MCP servers is a solvable problem once every tool invocation flows through a single gateway with structured logs, attributed metrics, and hierarchical budgets. Bifrost captures LLM tokens and tool execution costs in one audit trail, scopes every cost to a virtual key, and exports the resulting dataset into Prometheus, OpenTelemetry, or Datadog so platform and finance teams work from the same numbers. The open-source release on GitHub deploys in one command.

To see how Bifrost can give your team full per-tool cost visibility across MCP servers, including clustering, federated authentication, and enterprise support, book a demo with the Bifrost team.