Bring Your Own Model to Claude Code: A Setup Guide for Engineering Teams
Bring your own model to Claude Code with Bifrost. A complete setup guide covering routing, governance, observability, and team rollout patterns.
Engineering teams adopting Claude Code want the agentic coding workflow without the constraint of a single provider. The phrase "bring your own model to Claude Code" captures a specific operational requirement: route Claude Code requests through a model your team controls, hosted on the cloud account your team uses, governed by the policies your team enforces. Out of the box, Claude Code talks directly to api.anthropic.com. For most production engineering organizations, that single path does not match how AI infrastructure is actually managed. This guide walks through why teams adopt a BYOM pattern with Claude Code, how Bifrost enables it at the transport layer, and the configuration steps to roll it out across an engineering org. Bifrost, the open-source AI gateway by Maxim AI, sits between Claude Code and any provider you authorize, with 11-microsecond overhead at 5,000 RPS and full governance built in.
Understanding the Claude Code BYOM Challenge
Claude Code is built around Anthropic's Claude family, organized into three model tiers: Sonnet (default), Opus (complex reasoning), and Haiku (fast, lightweight tasks). The client expects an Anthropic-format API endpoint, sends requests there, and parses Anthropic-format responses. That tight coupling is good for first-time setup. It is restrictive for teams that need:
- Cloud account alignment: Existing committed spend on AWS Bedrock, Google Vertex AI, or Azure that should absorb Claude usage.
- Data residency: Inference traffic that must stay within a specific region or VPC for compliance.
- Model diversity: The ability to pin Sonnet to one provider and Haiku to another, or substitute non-Anthropic models entirely for specific task types.
- Self-hosted models: Open-source coding models running on internal GPUs for sensitive codebases.
- Cost optimization: Routing simple completions to cheaper models without the developer having to manage that decision per session.
Anthropic's own Claude Code on Vertex AI documentation acknowledges third-party platforms like Bedrock and Vertex through environment variables, but a multi-provider, governed deployment requires more than CLI flags. It requires a layer that translates between API formats, applies policy, and gives platform teams a single place to manage everything. That layer is an AI gateway.
Approaches to Claude Code BYOM
Engineering teams typically evaluate three patterns when bringing custom models to Claude Code:
- Direct provider configuration: Use environment variables like CLAUDE_CODE_USE_BEDROCK=1 or CLAUDE_CODE_USE_VERTEX=1 to point Claude Code at one cloud provider. This works for single-provider rollouts but locks the team to one path with no failover, no per-developer governance, and no cross-provider routing.
- Per-developer custom setups: Each engineer manages their own credentials, base URLs, and model pins. This scales poorly, leaks secrets, and makes attribution and cost tracking nearly impossible at the org level.
- Centralized AI gateway: A gateway runs as a single endpoint that Claude Code points to. The gateway holds all provider credentials, enforces policy, and routes traffic to the right model based on rules. This is the only pattern that scales across multiple teams and multiple providers.
Bifrost takes the gateway approach and standardizes it. Claude Code sends Anthropic-formatted requests to a Bifrost endpoint. Bifrost translates them to the target provider's format (OpenAI, Bedrock, Vertex, Azure, Groq, Mistral, or any of 20+ supported providers), forwards them, and translates responses back into Anthropic format. Claude Code never knows the difference, while platform teams get full control over the inference layer.
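The transparency of that swap can be made concrete with a short sketch. The request body Claude Code builds follows the Anthropic Messages API either way; only the base URL and credential change. The endpoint path and virtual-key value below are illustrative placeholders, not values from a real deployment:

```python
# Sketch: pointing Claude Code at a gateway changes the transport target,
# not the payload. Endpoint paths and key values here are illustrative.

DIRECT_BASE_URL = "https://api.anthropic.com"
GATEWAY_BASE_URL = "http://localhost:8080/anthropic"  # Bifrost's Anthropic-compatible endpoint

def build_request(base_url: str, api_key: str, prompt: str) -> dict:
    """Assemble an Anthropic Messages API request as Claude Code would."""
    return {
        "url": f"{base_url}/v1/messages",
        "headers": {"x-api-key": api_key, "anthropic-version": "2023-06-01"},
        "body": {
            "model": "claude-sonnet-4-6",
            "max_tokens": 1024,
            "messages": [{"role": "user", "content": prompt}],
        },
    }

direct = build_request(DIRECT_BASE_URL, "sk-ant-...", "refactor this function")
via_gateway = build_request(GATEWAY_BASE_URL, "vk-team-backend", "refactor this function")

# The body is identical; only the URL and key differ.
assert direct["body"] == via_gateway["body"]
assert direct["url"] != via_gateway["url"]
```

Because the payload is unchanged, Claude Code needs no patching: the gateway handles any provider-side translation after the request arrives.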
How Bifrost Enables BYOM for Claude Code
Bifrost is designed as a drop-in replacement for direct provider SDKs, with native Anthropic API compatibility for Claude Code. The integration works through two environment variables in Claude Code's settings.json and a virtual key created in Bifrost. Claude Code's traffic flows through the gateway, picking up routing, governance, and observability without any changes to the Claude Code binary.
The core mechanism is straightforward. Bifrost exposes an Anthropic-compatible endpoint at /anthropic. Claude Code is pointed at that endpoint via ANTHROPIC_BASE_URL, and a virtual key replaces the real API key in ANTHROPIC_API_KEY. The virtual key abstracts the underlying provider credentials and carries the governance scope: which models the developer can use, which budget caps apply, and which rate limits the gateway enforces.
Beyond routing, Bifrost provides the production capabilities that make BYOM viable at scale:
- Multi-provider model pinning: Map Claude Code's Sonnet and Haiku slots to any model on any provider through ANTHROPIC_DEFAULT_SONNET_MODEL and ANTHROPIC_DEFAULT_HAIKU_MODEL.
- Automatic failover: If a primary provider rejects requests, Bifrost's fallback chains reroute to a secondary without surfacing errors to the developer.
- Virtual keys with hierarchical budgets: Per-developer or per-team scopes with independent spend caps that cascade across customer, team, virtual key, and provider tiers.
- Built-in observability: Every Claude Code request is logged with model, provider, token usage, cost, and latency, viewable in the Bifrost dashboard or exported via OpenTelemetry.
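The failover behavior described above follows a familiar pattern: try providers in order and return the first success. The sketch below illustrates that logic in miniature; it is a simplification for explanation, not Bifrost's actual implementation:

```python
# Illustrative fallback chain: try each provider in order, return the
# first success, and only fail if the entire chain is exhausted.

class ProviderError(Exception):
    pass

def call_with_fallback(chain, request):
    """chain: ordered list of (name, handler) provider callables."""
    errors = []
    for name, handler in chain:
        try:
            return name, handler(request)
        except ProviderError as exc:
            errors.append((name, str(exc)))  # record and try the next provider
    raise ProviderError(f"all providers failed: {errors}")

# Simulate a Bedrock outage: the request lands on Vertex instead,
# and the caller (here, Claude Code) never sees the error.
def bedrock(req):
    raise ProviderError("503 from upstream")

def vertex(req):
    return {"content": "ok", "provider": "vertex"}

provider, response = call_with_fallback([("bedrock", bedrock), ("vertex", vertex)], {})
assert provider == "vertex"
```

The key property is that the error never propagates to the developer's session; the chain absorbs it at the gateway.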
Bifrost adds 11 microseconds of overhead per request at 5,000 RPS in sustained benchmarks. Independent performance benchmarks confirm this meets the bar for production-grade developer tooling, where any meaningful gateway latency would show up as visible slowness in the IDE.
Step-by-Step Setup for Claude Code BYOM
The setup takes three steps: install Claude Code and Bifrost, configure routing rules in the Bifrost dashboard, and update Claude Code's settings.json to point at the gateway. Detailed configuration patterns are documented in the Claude Code integration guide.
1. Install both tools
Install Claude Code globally:
npm install -g @anthropic-ai/claude-code
Run Bifrost locally with zero configuration:
npx -y @maximhq/bifrost
Bifrost listens on port 8080 by default. The dashboard is accessible at http://localhost:8080, where you configure providers, virtual keys, and routing rules.
2. Create a virtual key
In the Bifrost dashboard, create a virtual key for the Claude Code workflow. Configure the providers it can route to (for example, AWS Bedrock with Claude on Bedrock, or Vertex AI with Claude on Vertex), set monthly budget caps, and apply rate limits using Bifrost's governance controls. The virtual key value will replace the real API key in Claude Code's settings.
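Conceptually, a virtual key bundles provider access, model allowlist, budget, and rate limits into one credential. The fragment below is a rough sketch of that scope; the field names are illustrative, not Bifrost's actual schema, and the real values are configured in the dashboard:

```json
{
  "name": "claude-code-backend-team",
  "providers": ["bedrock", "vertex"],
  "allowed_models": [
    "bedrock/global.anthropic.claude-sonnet-4-6",
    "bedrock/global.anthropic.claude-haiku-4-6"
  ],
  "budget": { "monthly_usd": 500 },
  "rate_limits": { "requests_per_minute": 120, "tokens_per_minute": 200000 }
}
```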
3. Update Claude Code settings.json
Global settings.json lives at ~/.claude/settings.json on macOS, Linux, or WSL, and %USERPROFILE%\.claude\settings.json on Windows. Merge the following into the existing top-level object (do not paste it as a standalone file):
"env": {
"ANTHROPIC_BASE_URL": "http://localhost:8080/anthropic",
"ANTHROPIC_API_KEY": "your-virtual-key",
"ANTHROPIC_DEFAULT_HAIKU_MODEL": "claude-haiku-4-6",
"ANTHROPIC_DEFAULT_SONNET_MODEL": "claude-sonnet-4-6"
}
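Merged into an otherwise minimal settings.json, the result looks like this (any keys already present in your file stay alongside the env block):

```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://localhost:8080/anthropic",
    "ANTHROPIC_API_KEY": "your-virtual-key",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "claude-haiku-4-6",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "claude-sonnet-4-6"
  }
}
```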
For Anthropic models on Bedrock, prefix the model IDs with bedrock/. For Vertex AI, use vertex/. For Azure, use azure/. For example, the Bedrock variant looks like:
"env": {
"ANTHROPIC_BASE_URL": "http://localhost:8080/anthropic",
"ANTHROPIC_API_KEY": "your-virtual-key",
"ANTHROPIC_DEFAULT_HAIKU_MODEL": "bedrock/global.anthropic.claude-haiku-4-6",
"ANTHROPIC_DEFAULT_SONNET_MODEL": "bedrock/global.anthropic.claude-sonnet-4-6"
}
Make sure no model field is set in settings.json, since that would override the env block. After updating settings, run claude, execute /logout, restart, and select API key authentication when prompted.
For dynamic model selection mid-session, use the /model command to switch between providers without losing conversation context. Switches happen instantly and Claude Code continues the conversation against the new model.
Governance and Observability for BYOM Rollouts
Routing alone is the table-stakes capability. The reason engineering teams centralize Claude Code through a gateway is the governance layer that comes with it. A 2026 Cloud Security Alliance survey found that 82% of organizations have discovered AI agents running in their IT environments that security and IT did not previously know about. Gateway-level controls close that visibility gap for Claude Code traffic specifically.
Bifrost provides four pillars of governance for Claude Code BYOM deployments:
- Virtual keys: Per-developer or per-team API keys with independent budgets, rate limits, and access scopes. Configurable through virtual keys.
- Hierarchical budgets: Customer, team, virtual key, and provider-level budget enforcement. When a budget is exhausted, requests are blocked at the gateway before they reach the provider.
- Rate limits: Token-per-minute and request-per-minute caps applied per virtual key. A runaway agent loop hits a clear ceiling instead of consuming the team's monthly quota.
- Tool and model filtering: Restrict which models a key can route to and which MCP tools it can invoke, so contractors and interns get a different surface than senior engineers.
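The hierarchical budget behavior is worth spelling out: a request is admitted only when every tier in the hierarchy still has headroom, and admission charges all tiers at once. The sketch below illustrates that cascade; it is a simplification for explanation, not Bifrost's implementation:

```python
# Illustrative hierarchical budget check: block at the gateway if any
# tier (customer, team, virtual key, provider) would be exhausted.

BUDGETS = {  # remaining spend in USD per tier
    "customer:acme": 10_000.0,
    "team:backend": 400.0,
    "vk:claude-code-alice": 25.0,
    "provider:bedrock": 5_000.0,
}

def admit(request_cost: float, tiers: list) -> bool:
    """Admit only if every tier has headroom; charge all tiers on admit."""
    if any(BUDGETS[t] < request_cost for t in tiers):
        return False  # blocked before the request reaches the provider
    for t in tiers:
        BUDGETS[t] -= request_cost
    return True

tiers = ["customer:acme", "team:backend", "vk:claude-code-alice", "provider:bedrock"]
assert admit(20.0, tiers) is True   # fits every tier
assert admit(20.0, tiers) is False  # the virtual key is down to $5, so the cascade blocks
```

Note that the narrowest tier wins: the developer's virtual key runs out long before the team or customer budget does, which is exactly the containment behavior a runaway agent loop needs.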
For observability, every Claude Code request that flows through Bifrost is logged with input messages, model parameters, provider context, token usage, costs, and latency. The built-in observability dashboard shows real-time agent activity. For production rollouts, native Prometheus metrics and OpenTelemetry export to Datadog, Grafana, New Relic, or Honeycomb provide the same telemetry inside existing monitoring stacks.
For regulated workloads, the Bifrost LLM Gateway Buyer's Guide details enterprise capabilities like in-VPC deployment, audit logs for SOC 2 and HIPAA, RBAC with SSO via Okta and Entra, and HashiCorp Vault integration for secret management.
Real-World Benefits of Centralizing Claude Code Through Bifrost
Engineering teams that adopt the BYOM pattern with Bifrost typically see four practical outcomes. Cloud spend consolidates onto existing committed contracts because Claude Code traffic now bills through Bedrock, Vertex, or Azure rather than a separate Anthropic invoice. Reliability improves because failover chains catch provider outages before developers notice. Cost attribution becomes precise because virtual keys map cleanly to teams and projects, making chargeback and forecasting tractable. Compliance posture strengthens because a single audited path replaces a sprawl of individual developer configurations.
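Cost attribution falls out of the gateway logs almost for free: because every request carries a virtual key and its team scope, chargeback is a simple aggregation. The field names below are assumptions about the log shape, used only to illustrate the rollup:

```python
# Illustrative chargeback from gateway request logs. Log field names
# are assumptions for the sketch, not Bifrost's exact export format.

from collections import defaultdict

request_log = [
    {"virtual_key": "vk-backend", "team": "backend", "cost_usd": 0.042},
    {"virtual_key": "vk-backend", "team": "backend", "cost_usd": 0.013},
    {"virtual_key": "vk-frontend", "team": "frontend", "cost_usd": 0.027},
]

spend_by_team = defaultdict(float)
for entry in request_log:
    spend_by_team[entry["team"]] += entry["cost_usd"]

assert round(spend_by_team["backend"], 3) == 0.055
assert round(spend_by_team["frontend"], 3) == 0.027
```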
The operational model also matters. Because virtual keys are managed centrally, policy changes propagate immediately. Revoking a contractor's access, reducing a budget cap, or restricting which models a team can use takes effect on the next request, with no key rotation ceremony or environment variable distribution.
Start Building with Bifrost
Bringing your own model to Claude Code is no longer a workaround. With Bifrost, it is a standard production pattern: one configuration change in settings.json, one virtual key per developer or team, and a centralized gateway that routes, governs, and observes every request. Engineering teams keep the Claude Code experience their developers want and gain the infrastructure controls their organization needs. To see how Bifrost handles Claude Code BYOM at scale, book a demo with the Bifrost team.