
How to Govern Claude Code Usage Across Engineering Teams

Govern Claude Code usage across engineering teams with virtual keys, hierarchical budgets, and tool filtering. Full Bifrost playbook for platform teams.

Claude Code has become the default terminal coding agent in most engineering organizations, and the bill is following. Anthropic's own enterprise data puts the average at $13 per developer per active day and $150 to $250 per developer per month. For a 200-engineer org, that is $30,000 to $50,000 monthly on a tool that most platform teams still cannot attribute to a team, a project, or a developer. Governing Claude Code usage across engineering teams is what turns that line item from a CFO conversation into an operational control. This guide covers what governance actually requires, where Anthropic's native controls stop, and how Bifrost, the open-source AI gateway by Maxim AI, gives platform teams hierarchical budgets, per-team rate limits, tool filtering, and full observability without changing a single developer's workflow.

Why Claude Code Needs Governance at All

Claude Code is powerful because it is autonomous. It reads entire repositories, runs terminal commands, edits files, calls MCP tools, and produces pull requests from a single CLI session. Every one of those actions consumes input and output tokens, and the usage patterns that drive cost most (extended thinking budgets, cache invalidation after idle time, subagent fan-out) are invisible to the developer running the session.

Anthropic's official cost documentation states that 90 percent of users stay below $30 per active day, but that still leaves a long tail of sessions that cost hundreds or thousands of dollars each. At organizational scale, three patterns recur:

Runaway sessions: subagent loops, autocompact cascades, and context resubmission spikes that rack up five-figure bills before anyone notices
Attribution gaps: the Anthropic console shows total spend, not team, project, or developer-level breakdowns
Uncontrolled access: every developer has the same model access and the same rate limits, regardless of whether they are a senior IC working on a production migration or an intern running exploratory scripts

Anthropic's Team and Enterprise plans solve some of this with admin controls and enterprise security, but they do not solve the platform team's core problem: fine-grained, hierarchical governance that works across provider boundaries and produces auditable evidence. That belongs in the gateway layer.

What "Governing Claude Code" Actually Means

Effective Claude Code governance covers five distinct controls:

Identity and attribution: every request is tied to a person, team, or project, not a shared API key
Spend controls: hard budget limits at multiple organizational levels, not just monitoring alerts
Access controls: which developers can use which models, providers, and tools
Rate limits: per-key throughput ceilings that contain runaway sessions
Audit and observability: request-level logs with session, model, token, and cost detail for compliance and chargeback

Without all five, governance is a dashboard, not a control plane.

The Claude Code Governance Model

Before enforcing anything, platform teams need a model for how Claude Code access maps to the organization. A workable model looks like this:

Customer or Business Unit level: top of the hierarchy, represents a major cost center (a product line, a business unit, or an external customer for SaaS embedding)
Team level: an engineering team or squad with its own budget
Virtual Key level: scoped to a use case or individual developer
Provider Config level: specifies which providers and models the key is allowed to use

Each level enforces its own policies. A request that passes the virtual key's check but exceeds the team's monthly budget is blocked at the team level. This is how large organizations prevent one team from burning another team's allocation.
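The layered check described above can be sketched in a few lines. This is a minimal illustration, not Bifrost's implementation: the `BudgetNode` class, the node names, and the idea of checking an estimated cost before the request is sent are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BudgetNode:
    """One level of the hierarchy (customer, team, virtual key, or provider config)."""
    name: str
    limit_usd: float
    spent_usd: float = 0.0
    parent: Optional["BudgetNode"] = None

    def chain(self):
        """Yield this node and every ancestor, leaf first."""
        node = self
        while node is not None:
            yield node
            node = node.parent


def check_request(leaf: BudgetNode, estimated_cost_usd: float) -> Optional[str]:
    """Return the name of the first level that would be exceeded, or None if allowed."""
    for node in leaf.chain():
        if node.spent_usd + estimated_cost_usd > node.limit_usd:
            return node.name
    return None


def record_spend(leaf: BudgetNode, cost_usd: float) -> None:
    """Charge a completed request to every level in the chain."""
    for node in leaf.chain():
        node.spent_usd += cost_usd


# Hypothetical hierarchy: business unit -> team -> one developer's key.
unit = BudgetNode("customer:payments", limit_usd=20_000)
team = BudgetNode("team:platform", limit_usd=3_000, spent_usd=2_999, parent=unit)
key = BudgetNode("vk:alice", limit_usd=300, spent_usd=40, parent=team)

blocked_at = check_request(key, estimated_cost_usd=2.50)
print(blocked_at)  # team:platform
```

Here the developer's own key has plenty of headroom, but the team is $1 away from its cap, so the request is rejected at the team level.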

How Bifrost Enforces the Model

Bifrost implements this governance model on top of virtual keys, its primary governance entity. The full governance stack is native to the gateway, which means enforcement happens in real time on every request, not in a separate monitoring loop.

The primitives are:

Virtual keys scoped per developer, per team, or per project
Hierarchical budgets with independent limits at Customer, Team, Virtual Key, and Provider Config levels
Rate limits at Virtual Key and Provider Config levels
Tool filtering per virtual key for MCP tool access control
Provider access restrictions that lock a key to specific providers and models

Developers authenticate with Claude Code the normal way (API key or browser OAuth for Max subscriptions). Bifrost handles the actual provider credentials, so revoking or reconfiguring a virtual key takes effect on the next request with no key rotation ceremony and no environment variable updates across developer machines.

Step 1: Route Claude Code Through Bifrost

The integration is a two-line environment change:

export ANTHROPIC_BASE_URL=http://localhost:8080/anthropic
export ANTHROPIC_API_KEY=vk_<bifrost-virtual-key>

Claude Code sends the virtual key as its API key; Bifrost looks up the key's policies and routes the request to Anthropic (or another configured provider) with the real credentials. The Claude Code integration guide covers the full setup including cloud provider passthrough for Bedrock and Vertex.
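Conceptually, the gateway's job at this step is a credential swap. The sketch below is a simplified illustration under stated assumptions: the `KeyPolicy` class, the in-memory stores, and the placeholder secret are hypothetical, and header matching is kept case-sensitive for brevity. It is not Bifrost's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyPolicy:
    owner: str
    team: str
    active: bool = True


# Hypothetical in-memory stores; a real gateway would persist these.
POLICIES = {"vk_demo_alice": KeyPolicy(owner="alice", team="platform")}
PROVIDER_SECRETS = {"anthropic": "real-provider-key-placeholder"}


def rewrite_headers(incoming: dict) -> dict:
    """Swap the caller's virtual key for the real provider credential."""
    virtual_key = incoming.get("x-api-key", "")
    policy = POLICIES.get(virtual_key)
    if policy is None or not policy.active:
        raise PermissionError("unknown or revoked virtual key")
    # Copy every other header through unchanged.
    outgoing = {k: v for k, v in incoming.items() if k != "x-api-key"}
    outgoing["x-api-key"] = PROVIDER_SECRETS["anthropic"]
    return outgoing


upstream = rewrite_headers(
    {"x-api-key": "vk_demo_alice", "anthropic-version": "2023-06-01"}
)
```

Because the real credential never leaves the gateway, marking a policy inactive is enough to cut off access on the next request.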

Because Bifrost exposes an Anthropic-compatible endpoint, Claude Code itself does not change. Streaming, tool calling, and extended thinking all continue to work.

Step 2: Model Your Organization with Virtual Keys

The simplest effective pattern is one virtual key per developer, grouped under team-level parent budgets. For a platform engineering team of 12 people:

Team budget: $3,000 per month (shared cap for the whole team)
Per-developer virtual key budget: $300 per month (individual ceiling)
Rate limit per virtual key: 60 requests per minute
Model access: Sonnet 4.6 and Haiku 4.5 only; Opus 4.7 restricted to senior engineers

Each developer's usage counts against both their individual budget and the team budget. When either is exhausted, subsequent requests are blocked at the gateway before they reach Anthropic. Calendar-aligned budgets reset monthly on the first day at 00:00 UTC, which maps cleanly to finance reporting cycles.
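Two details of this example are easy to check with arithmetic. The individual ceilings add up to more than the team cap, so the team cap is the binding limit if everyone spends heavily in the same month. The reset boundary is also easy to compute. The helper below is an illustrative sketch (the function name is an assumption, not a Bifrost API):

```python
from datetime import datetime, timezone


def next_monthly_reset(now: datetime) -> datetime:
    """First day of the following month at 00:00 UTC."""
    now = now.astimezone(timezone.utc)
    year, month = (now.year + 1, 1) if now.month == 12 else (now.year, now.month + 1)
    return datetime(year, month, 1, tzinfo=timezone.utc)


TEAM_BUDGET_USD = 3_000
PER_KEY_BUDGET_USD = 300
TEAM_SIZE = 12

# Sum of individual ceilings versus the shared team cap.
sum_of_key_budgets = TEAM_SIZE * PER_KEY_BUDGET_USD
print(sum_of_key_budgets, sum_of_key_budgets > TEAM_BUDGET_USD)  # 3600 True

reset = next_monthly_reset(datetime(2025, 12, 17, 9, 30, tzinfo=timezone.utc))
print(reset)  # 2026-01-01 00:00:00+00:00
```

Oversubscribing like this is a deliberate choice: it lets a few heavy users reach $300 without reserving that amount for every seat.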

For multi-product organizations, add a Customer or Business Unit level above Team: a fintech company might model Trading, Risk, and Customer Success as three separate customers, each with its own budget, and teams underneath.

Step 3: Apply Rate Limits to Contain Runaway Sessions

Rate limits are the cheapest defense against subagent loops and autocompact cascades. A reasonable default for Claude Code:

60 requests per minute per virtual key for interactive use
20 requests per minute for automation keys used by CI pipelines
Burst tolerance for the first few seconds to handle rapid tool-call sequences

In Bifrost, rate limits apply only at the Virtual Key and Provider Config levels, which is the right shape: budgets at every level cap how much money can be spent in total, while rate limits on the key cap how fast it can be spent. This prevents one misbehaving session from exhausting a team budget in minutes while other developers are still working.
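One common way to combine a sustained rate with burst tolerance is a token bucket. The sketch below assumes that algorithm for illustration; the source does not say which algorithm Bifrost uses, and the burst size of 10 is an arbitrary example value.

```python
import time


class TokenBucket:
    """Token-bucket limiter: sustained `rate_per_minute`, up to `burst` requests at once."""

    def __init__(self, rate_per_minute: float, burst: int, clock=time.monotonic):
        self.refill_per_second = rate_per_minute / 60.0
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        """Refill based on elapsed time, then spend one token if available."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# A fake clock makes the behavior deterministic for the demonstration.
fake_now = [0.0]
bucket = TokenBucket(rate_per_minute=60, burst=10, clock=lambda: fake_now[0])

burst_results = [bucket.allow() for _ in range(12)]  # 10 pass, then 2 are rejected
fake_now[0] += 1.0                                   # one second later, one token refills
after_wait = bucket.allow()
```

A looping subagent drains the bucket within its first few calls and is then held to one request per second, while a normal interactive session rarely notices the limit.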

Step 4: Restrict Model Access Per Role

Not every developer needs Opus 4.7 at $5 input / $25 output per million tokens. A tiered access model contains cost without blocking work:

Senior engineers and staff engineers: full access to Opus 4.7, Sonnet 4.6, Haiku 4.5
Everyone else: Sonnet 4.6 and Haiku 4.5 by default
Automation and CI pipelines: Haiku 4.5 only for code review bots, syntax checks, and routine operations

Bifrost enforces this through provider configuration on each virtual key. Developers can still use the /model command inside Claude Code, but switching to a model outside their virtual key's allow-list returns a clear error instead of silently burning budget.
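The allow-list check itself is simple. In the sketch below, the role names, the shorthand model labels, and the `ModelNotAllowed` exception are illustrative assumptions. They are not real API model identifiers or Bifrost configuration keys.

```python
# Hypothetical role-to-model mapping mirroring the tiers above.
ROLE_MODELS = {
    "senior": {"opus-4.7", "sonnet-4.6", "haiku-4.5"},
    "default": {"sonnet-4.6", "haiku-4.5"},
    "ci": {"haiku-4.5"},
}


class ModelNotAllowed(Exception):
    """Raised when a key requests a model outside its allow-list."""


def authorize_model(role: str, requested_model: str) -> str:
    # Unknown roles get an empty allow-list, so the check is deny-by-default.
    allowed = ROLE_MODELS.get(role, set())
    if requested_model not in allowed:
        raise ModelNotAllowed(
            f"model '{requested_model}' is not allowed for role '{role}'; "
            f"allowed: {', '.join(sorted(allowed)) or 'none'}"
        )
    return requested_model


granted = authorize_model("default", "sonnet-4.6")
```

Returning an explicit error that names the permitted models tells the developer what to switch back to, rather than leaving them with an opaque failure.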

Step 5: Filter MCP Tools Per Team

Claude Code's tool surface grows fast once MCP servers enter the picture. Filesystem, database, web search, GitHub, Slack, Jira, and a dozen internal MCP servers can all connect at once. Not every developer should see every tool.

Bifrost's tool filtering controls which MCP tools are available per virtual key or per request. A customer support agent developer gets CRM and ticketing tools. An SRE gets Kubernetes and Datadog tools. A finance agent developer gets accounting tools. The full capability set exists in the gateway; the visible surface is scoped per role. For teams planning a multi-agent rollout, the MCP Gateway resource page covers the broader architecture.
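Per-key scoping of this kind amounts to filtering a shared catalog. The sketch below assumes tools are named `server/tool` and allow-lists are glob patterns. Both conventions, and the key names, are hypothetical choices for the example rather than Bifrost's configuration format.

```python
from fnmatch import fnmatchcase

# Hypothetical per-key allow-lists, written as glob patterns over "server/tool" names.
TOOL_ALLOWLIST = {
    "vk_support_dev": ["crm/*", "ticketing/*"],
    "vk_sre": ["kubernetes/*", "datadog/*"],
}


def visible_tools(virtual_key: str, all_tools: list[str]) -> list[str]:
    """Return only the tools this key may see; unknown keys see nothing."""
    patterns = TOOL_ALLOWLIST.get(virtual_key, [])
    return [t for t in all_tools if any(fnmatchcase(t, p) for p in patterns)]


catalog = [
    "crm/lookup_customer",
    "ticketing/create_ticket",
    "kubernetes/get_pods",
    "datadog/query_metrics",
    "accounting/post_journal",
]

sre_tools = visible_tools("vk_sre", catalog)
print(sre_tools)  # ['kubernetes/get_pods', 'datadog/query_metrics']
```

Filtering before the tool list reaches the model also keeps unused tool definitions out of the context window.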

Step 6: Instrument Spend and Usage

Governance without observability is policy theater. Bifrost exposes native Prometheus metrics and OpenTelemetry traces that carry virtual key, team, customer, model, and tool labels on every metric, which means dashboards can answer:

Cost per developer per day, week, and month
Cost per team against team budget
Cost per MCP tool call (useful for identifying expensive automations)
Top sessions by token consumption for anomaly review
Cache hit rate impact on effective cost
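Most of these questions reduce to grouping request-level cost by one or two labels. The sketch below runs that aggregation over a handful of made-up records. The field names and values are hypothetical and do not reflect Bifrost's actual metric or log schema.

```python
from collections import defaultdict

# Hypothetical request-log records; field names and values are illustrative only.
records = [
    {"day": "2025-06-02", "developer": "alice", "team": "platform", "tool": "github", "cost_usd": 1.25},
    {"day": "2025-06-02", "developer": "alice", "team": "platform", "tool": None, "cost_usd": 0.75},
    {"day": "2025-06-02", "developer": "bob", "team": "platform", "tool": "jira", "cost_usd": 0.50},
    {"day": "2025-06-03", "developer": "bob", "team": "platform", "tool": "github", "cost_usd": 2.00},
]


def total_by(rows, *fields):
    """Sum cost_usd grouped by the given label fields."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[f] for f in fields)] += row["cost_usd"]
    return dict(totals)


per_developer_day = total_by(records, "developer", "day")
per_team = total_by(records, "team")
per_tool = total_by([r for r in records if r["tool"]], "tool")

print(per_developer_day[("alice", "2025-06-02")])  # 2.0
print(per_team[("platform",)])                     # 4.5
print(per_tool[("github",)])                       # 3.25
```

In a Prometheus-backed setup the same grouping would typically be expressed as a query over labeled counters rather than in application code.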

For regulated workloads, audit logs provide immutable, request-level records suitable for SOC 2, HIPAA, GDPR, and ISO 27001 evidence, covering every Claude Code session including model, tokens, tool calls, and user attribution.

Step 7: Stack With Cost Optimizations

Governance limits the blast radius of cost. Optimizations reduce the bill on every request that passes governance. Two that apply directly to Claude Code:

Semantic caching: Bifrost's semantic caching captures repeated queries across developers on the same team, which is common on shared codebases where multiple engineers ask Claude Code variations of the same question
Code Mode for MCP: For teams using three or more MCP servers, Code Mode reduces token usage by up to 92 percent by letting the model write orchestration code in a sandbox instead of loading every tool definition into context on every turn

Both compound the savings from governance controls without requiring any developer workflow change.
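To make the idea of semantic caching concrete, here is a toy sketch: a cached response is returned when a new query's embedding is close enough to a stored one. The hand-written three-dimensional vectors stand in for the output of a real embedding model, and the class name and 0.9 threshold are assumptions for illustration, not Bifrost's implementation or defaults.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def put(self, embedding, response):
        self.entries.append((list(embedding), response))

    def get(self, embedding):
        """Return the closest cached response if it clears the threshold, else None."""
        best = max(self.entries, key=lambda e: cosine(e[0], embedding), default=None)
        if best is not None and cosine(best[0], embedding) >= self.threshold:
            return best[1]
        return None


cache = SemanticCache(threshold=0.9)
cache.put([1.0, 0.0, 0.2], "cached explanation of the auth middleware")

near_hit = cache.get([0.9, 0.1, 0.2])  # similar question from a teammate
miss = cache.get([0.0, 1.0, 0.0])      # unrelated question
```

The threshold is the main tuning knob: set it too low and developers receive answers to questions they did not ask, set it too high and the cache rarely hits.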

What to Expect After Rollout

Platform teams that govern Claude Code through Bifrost typically see the following within 30 days:

Per-developer spend visibility: every request attributed to an owner
Eliminated runaway sessions: hard budget caps stop five-figure anomalies at the gateway
30 to 50 percent cost reduction when semantic caching and Code Mode are enabled alongside governance
Zero workflow disruption: developers use Claude Code the same way; only the base URL changes
Clean chargeback and audit evidence: Prometheus and audit log data ready for finance and compliance review

Start Governing Claude Code Across Your Engineering Teams

Governing Claude Code usage across engineering teams is a control plane problem, not an application problem. The Anthropic console shows the bill; it does not stop the bill from running away, attribute it to the right team, or enforce model access policies. Bifrost gives platform teams hierarchical budgets, per-developer rate limits, tool filtering, audit logs, and semantic caching in one open-source gateway, all behind the same Anthropic-compatible endpoint Claude Code already knows how to talk to. The Bifrost governance page covers the full capability set, and the LLM Gateway Buyer's Guide compares Bifrost against other gateways on governance depth.

To see Claude Code governance working on your engineering org's actual usage, book a demo with the Bifrost team.