Top 5 Enterprise AI Gateways to Track Claude Code Costs
Claude Code is one of the most capable AI coding agents in production today, but it creates a cost visibility problem that catches engineering teams off guard. Each agentic session triggers dozens of API calls for file operations, terminal commands, and code editing, often using high-cost models like Claude Opus or Sonnet. On API pricing, the average cost is roughly $6 per developer per day, but heavy users can exceed that significantly. Anthropic's billing console shows total spend, but it does not break costs down by session, team, project, or developer. For enterprises running Claude Code across 50 or 200 engineers, the question "where is our AI budget going?" has no clean answer out of the box.
An enterprise AI gateway solves this by sitting between Claude Code and the LLM provider, intercepting every API call to log token consumption, enforce budgets, and attribute costs at a granular level. This article covers five enterprise AI gateways that support Claude Code cost tracking: Bifrost, Cloudflare AI Gateway, Kong AI Gateway, OpenRouter, and LiteLLM.
What to Look for in a Claude Code Cost Tracking Gateway
Before evaluating individual platforms, enterprise teams should assess gateways against these cost management criteria:
- Per-developer and per-team attribution: Can you break down spend by individual developer, team, or project?
- Budget enforcement: Does the gateway automatically block requests when spending limits are reached, or does it only report after the fact?
- Hierarchical cost controls: Can you set budgets at multiple levels (developer, team, organization)?
- Real-time visibility: Is cost data available in real time, or only after batch processing?
- Integration with existing observability: Does the gateway export data to Prometheus, Datadog, Grafana, or OpenTelemetry?
- Self-hosted deployment: Can you run the gateway within your VPC for data residency and compliance requirements?
- Claude Code compatibility: Does it work with Claude Code's ANTHROPIC_BASE_URL override without breaking tool calling, streaming, or agentic workflows?
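In practice, the compatibility criterion comes down to two environment variables. A minimal sketch, where the gateway URL and key are placeholders for your own deployment rather than real endpoints:

```shell
# Redirect all of Claude Code's Anthropic traffic through a gateway.
# Both values below are placeholders for your own gateway deployment.
export ANTHROPIC_BASE_URL="http://gateway.internal:8080/anthropic"
export ANTHROPIC_API_KEY="gateway-issued-key"

# Quick sanity check before launching Claude Code.
echo "Routing Claude Code via: $ANTHROPIC_BASE_URL"
```

Every gateway in this article is configured this way; only the URL shape and the kind of key differ.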
Gartner predicts that by 2028, 90% of enterprise software engineers will use AI code assistants. Cost governance is not optional at that adoption rate.
1. Bifrost
Bifrost is an open-source, high-performance AI gateway built in Go by Maxim AI. It is purpose-built for enterprise cost tracking and governance across AI coding agents, including native Claude Code integration.
Cost tracking capabilities
Bifrost provides a persistent, queryable audit trail that captures cost, latency, tokens, input, output, and status for every request. The log store supports SQLite and PostgreSQL backends, and aggregated stats (total requests, success rate, average latency, total tokens, total cost) are queryable through a search API.
A built-in model catalog auto-syncs pricing data from all providers every 24 hours, so cost calculations stay accurate without manual updates. If you use semantic caching, Bifrost calculates costs correctly for cache hits versus misses.
Budget enforcement
Bifrost's virtual key governance provides a four-tier budget hierarchy: Customer, Team, Virtual Key, and Provider Configuration. You can set dollar-amount budgets with configurable reset durations (hourly, daily, weekly, monthly) at each level. When a budget is exhausted, Bifrost blocks subsequent requests automatically before additional charges accumulate.
This means an engineering manager can cap each developer at $500/month, set a team ceiling of $5,000/month, and set a global organization limit, all enforced in real time.
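As a sketch, that hierarchy might be laid out as follows. The JSON schema here is illustrative only, not Bifrost's actual configuration format; consult the Bifrost docs for the real virtual-key setup:

```shell
# Illustrative only: a three-level budget layout matching the example
# above. The schema is invented for clarity -- check Bifrost's docs
# for the actual virtual-key configuration format.
cat > budgets.json <<'EOF'
{
  "organization": { "limit_usd": 20000, "reset": "monthly" },
  "teams": [
    { "name": "platform", "limit_usd": 5000, "reset": "monthly" }
  ],
  "virtual_keys": [
    { "developer": "alice", "limit_usd": 500, "reset": "monthly" }
  ]
}
EOF
echo "wrote $(wc -c < budgets.json) bytes of budget config"
```

The point is the nesting: a request is only admitted if it fits under the developer cap, the team ceiling, and the organization limit simultaneously.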
Enterprise features
- Automatic failover and load balancing across 20+ providers
- MCP gateway for centralized tool management with per-developer tool filtering
- Native Prometheus metrics, OpenTelemetry, and Datadog integration
- In-VPC deployments, audit logs, SSO with Okta and Microsoft Entra
- Guardrails for content safety enforcement
- Roughly 11 microseconds of added overhead at 5,000 RPS in published benchmarks
Claude Code setup
export ANTHROPIC_BASE_URL=http://your-bifrost-instance:8080/anthropic
export ANTHROPIC_API_KEY=your-bifrost-virtual-key
All Claude Code traffic flows through Bifrost with zero code changes. The Bifrost CLI automates this setup entirely, including model selection and virtual key configuration.
Best for: Enterprises that need hierarchical budget enforcement, per-developer cost attribution, self-hosted deployment, and compliance-grade audit trails. The LLM Gateway Buyer's Guide provides a detailed comparison of how Bifrost's governance capabilities compare across platforms.
2. Cloudflare AI Gateway
Cloudflare AI Gateway is a managed proxy built on Cloudflare's global network infrastructure. It provides analytics, caching, and rate limiting for AI API traffic across multiple providers.
Cost tracking capabilities
Cloudflare's dashboard tracks requests, token usage, costs, and errors across all configured providers. Custom metadata tagging allows teams to label requests with user IDs, team information, or project identifiers for filtering in analytics. Cost tracking supports custom pricing overrides for teams operating under negotiated rates.
Persistent logs are available on all plans with a free allocation (100,000 logs/month on the free tier, 1,000,000 on Workers Paid). Logpush is available on the Workers Paid plan for exporting logs to external systems.
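Attribution hinges on that metadata tagging. A sketch of tagging requests with developer and team fields via Cloudflare's custom metadata header (header name per Cloudflare's custom metadata docs; the account and gateway IDs are placeholders, and the curl command is echoed rather than executed since it needs a live gateway):

```shell
# Attribution metadata to attach to each request. Field names are
# your choice; they become filter dimensions in Cloudflare analytics.
METADATA='{"developer":"alice","team":"platform","project":"checkout"}'
GATEWAY_URL="https://gateway.ai.cloudflare.com/v1/ACCOUNT_ID/GATEWAY_ID/anthropic"

# Shown rather than executed: attach the cf-aig-metadata header to
# each Anthropic call routed through the gateway.
echo "curl $GATEWAY_URL/v1/messages -H 'cf-aig-metadata: $METADATA'"
```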
Budget enforcement
Cloudflare provides rate limiting at the gateway level but does not offer per-developer or per-team budget enforcement with automatic request blocking. Cost controls are reactive (analytics-based) rather than proactive (enforcement-based). There are no hierarchical budget structures.
Claude Code setup
Point ANTHROPIC_BASE_URL to your Cloudflare AI Gateway endpoint. Cloudflare supports the Anthropic provider natively.
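A minimal sketch, assuming Cloudflare's documented per-provider endpoint pattern; substitute your own account ID and gateway name:

```shell
# Placeholders: replace ACCOUNT_ID and GATEWAY_NAME with your own.
# The URL shape follows Cloudflare's per-provider endpoint pattern.
export ANTHROPIC_BASE_URL="https://gateway.ai.cloudflare.com/v1/ACCOUNT_ID/GATEWAY_NAME/anthropic"
export ANTHROPIC_API_KEY="your-anthropic-key"
echo "$ANTHROPIC_BASE_URL"
```

Because Cloudflare proxies to Anthropic rather than terminating billing, the API key is still your Anthropic key.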
Best for: Teams already embedded in the Cloudflare ecosystem that need low-friction cost analytics without self-hosting. Note the lack of per-developer budget enforcement and self-hosted deployment options.
3. Kong AI Gateway
Kong AI Gateway extends Kong's mature enterprise API management platform with AI-specific plugins for LLM traffic routing, rate limiting, and analytics.
Cost tracking capabilities
Kong's AI Proxy plugin logs token usage statistics for every request, including prompt tokens, completion tokens, total tokens, and cost. The File Log plugin captures full request and response metadata for audit trails. Enterprise analytics dashboards track AI consumption as API requests and token usage.
Budget enforcement
Kong provides token-based rate limiting through the AI Rate Limiting Advanced plugin, which operates on actual token consumption rather than raw request counts. Model-level rate limits can be set per model for cost-aligned enforcement. Semantic caching reduces redundant calls.
However, advanced AI-specific cost features like token-based rate limiting are restricted to the Enterprise tier. Kong's pricing model is per-service, meaning each LLM provider endpoint counts as a separate service, and annual Enterprise licenses can exceed $50,000 for mid-sized deployments.
Claude Code setup
Configure Claude Code's ANTHROPIC_BASE_URL to point at the Kong gateway endpoint. Kong handles authentication upstream and routes to Anthropic.
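A minimal sketch; the route path here is an assumption and should match whatever route you expose through the AI Proxy plugin:

```shell
# Placeholder Kong route fronting Anthropic via the AI Proxy plugin.
export ANTHROPIC_BASE_URL="http://kong-gateway.internal:8000/anthropic"
# Kong can inject the upstream Anthropic credential itself, so the key
# here may be a Kong consumer credential rather than a real Anthropic key.
export ANTHROPIC_API_KEY="kong-consumer-key"
echo "$ANTHROPIC_BASE_URL"
```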
Best for: Organizations already standardized on Kong for API management that want to extend their existing infrastructure to handle LLM traffic without introducing a separate gateway. Not practical for teams without existing Kong infrastructure due to complexity and cost.
4. OpenRouter
OpenRouter is a managed routing service providing a single API endpoint for accessing 290+ models across major providers. It handles billing aggregation and model availability tracking through a hosted proxy.
Cost tracking capabilities
OpenRouter's Activity Dashboard shows per-request cost data in real time. Every API response includes total_cost and usage fields, enabling per-request cost attribution at the application level. Spending can be tracked per model and per API key. Separate API keys can be created per environment (dev, staging, production) with individual caps and alerts.
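Those response fields make it straightforward to wire per-request costs into internal dashboards. A sketch using a fabricated sample payload; the exact field placement should be checked against OpenRouter's usage accounting docs:

```shell
# Fabricated sample response body: the real shape should be verified
# against OpenRouter's usage accounting documentation.
RESPONSE='{"id":"gen-123","usage":{"prompt_tokens":812,"completion_tokens":204,"total_cost":0.0042}}'

# Pull the per-request cost out of the usage block.
COST=$(echo "$RESPONSE" | python3 -c 'import json,sys; print(json.load(sys.stdin)["usage"]["total_cost"])')
echo "request cost: \$${COST}"
```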
Budget enforcement
OpenRouter supports spending limits and credit allocation across team members. Usage alerts notify when spend approaches limits. However, OpenRouter lacks hierarchical budget structures (no team-level or organization-level controls), virtual keys, RBAC, or audit logging at the enterprise level. SSO (SAML) is available only on Enterprise plans.
Claude Code setup
export ANTHROPIC_BASE_URL=https://openrouter.ai/api
export ANTHROPIC_API_KEY=your-openrouter-key
Note: OpenRouter has known issues with streaming function call arguments, which can cause failures in tool-heavy Claude Code workflows.
Best for: Individual developers and smaller teams that want instant multi-model access with per-request cost data and unified billing. Not suited for enterprises requiring self-hosted deployment, hierarchical governance, or compliance audit trails.
5. LiteLLM
LiteLLM is an open-source Python proxy that provides a unified interface for 100+ LLM providers. Anthropic's own Claude Code documentation references LiteLLM as a cost tracking option for teams using Bedrock, Vertex, and Foundry deployments.
Cost tracking capabilities
LiteLLM tracks spend per virtual key, per team, and per model through its admin dashboard. It logs token usage and cost for every request routed through the proxy. Spend reports can be filtered by key, team, or model.
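Per-developer tracking hangs off virtual keys minted through the proxy's key management endpoint. A sketch: the team name and budget value are placeholders, and since the call needs a running proxy and the master key, the command is echoed rather than executed:

```shell
# Sketch: mint a virtual key with a spend cap via LiteLLM's
# /key/generate endpoint. Placeholder values; shown, not executed.
LITELLM_URL="http://0.0.0.0:4000"
CMD="curl $LITELLM_URL/key/generate -H 'Authorization: Bearer <master-key>' -d '{\"team_id\":\"platform\",\"max_budget\":100}'"
echo "$CMD"
```

Requests made with the resulting key are then attributed to that key and team in the admin dashboard.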
Budget enforcement
LiteLLM supports virtual key-based spend tracking with budget limits. However, it lacks SSO integration, RBAC, guardrails, audit logging, and the hierarchical cost control depth of purpose-built enterprise gateways.
Claude Code setup
export ANTHROPIC_BASE_URL=http://0.0.0.0:4000
export ANTHROPIC_AUTH_TOKEN=$LITELLM_MASTER_KEY
Best for: Python-native teams comfortable with self-hosting that need basic cost tracking and multi-provider support without enterprise governance features. Teams outgrowing LiteLLM's capabilities may want to evaluate migration options.
Comparison Summary
Here is how the five gateways compare on key cost tracking criteria for Claude Code deployments:
- Hierarchical budget enforcement: Bifrost (four-tier hierarchy with automatic blocking) stands alone. Kong offers token-based rate limiting on Enterprise. Cloudflare, OpenRouter, and LiteLLM provide analytics without hierarchical enforcement.
- Per-developer attribution: Bifrost (via virtual keys), OpenRouter (via separate API keys), and LiteLLM (via virtual keys) support per-developer cost breakdowns. Cloudflare supports custom metadata tagging. Kong tracks at the service level.
- Self-hosted deployment: Bifrost, Kong, and LiteLLM can be self-hosted. Cloudflare and OpenRouter are managed services only.
- Claude Code tool calling compatibility: Bifrost and Cloudflare handle Claude Code's streaming tool calls reliably. Kong works through its AI Proxy plugin. OpenRouter has known issues with streaming function call arguments. LiteLLM works through its Anthropic pass-through endpoint.
- Observability integrations: Bifrost supports Prometheus, OpenTelemetry, and Datadog natively. Kong integrates with its existing analytics ecosystem. Cloudflare provides Logpush. OpenRouter provides per-response cost metadata. LiteLLM provides a built-in dashboard.
- Performance overhead: Bifrost adds 11 microseconds at 5,000 RPS. Kong, Cloudflare, and OpenRouter add varying levels of latency depending on deployment topology. LiteLLM's Python-based proxy introduces higher overhead under load.
Start Tracking Claude Code Costs with Bifrost
Enterprise teams running Claude Code at scale need more than analytics dashboards. They need real-time budget enforcement, per-developer cost attribution, hierarchical spending controls, and compliance-grade audit trails. Bifrost delivers all of this as an open-source platform with 11 microseconds of overhead and zero disruption to developer workflows.
Book a demo with the Bifrost team to see how your organization can take control of Claude Code costs at enterprise scale.