Try Bifrost Enterprise free for 14 days. Request access

Route Claude Code Through Google Gemini Models Using Bifrost

Route Claude Code Through Google Gemini Models Using Bifrost
Bifrost is an open-source AI gateway that lets you route Claude Code sessions through Google Gemini models with a single environment variable change and zero code modifications.

Claude Code is a terminal-based coding agent that communicates exclusively with Anthropic's API by default. For engineering teams running high-volume coding tasks, this creates a hard dependency on a single provider: Anthropic's pricing, rate limits, and availability. Bifrost, the open-source AI gateway built in Go by Maxim AI, breaks that dependency by acting as a protocol translation layer between Claude Code and any provider it's configured for, including Google Gemini. This post walks through how the translation works, how to configure Gemini as a provider, and how to pin Claude Code's model tiers to specific Gemini models.

Why Route Claude Code Through Gemini

Engineering teams adopt a multi-provider strategy for Claude Code for reasons that are specific to their cost and infrastructure profile:

  • Cost management: Gemini models, particularly Gemini 2.5 Flash, offer competitive pricing for high-volume coding tasks. Routing routine file operations and completions through Gemini while reserving Claude Sonnet for complex multi-step reasoning can reduce overall token spend without changing developer workflows.
  • Latency optimization: Depending on geographic region and workload type, Gemini endpoints may offer lower response times for specific tasks. Teams running infrastructure in Google Cloud regions benefit from reduced round-trip latency.
  • Model benchmarking: Running identical coding prompts through both Claude and Gemini gives engineering teams data to make informed decisions about which model performs best for their codebase and task types.
  • Provider redundancy: Relying on a single provider creates a single point of failure. If Anthropic's API experiences rate limiting during a high-traffic deploy window, Claude Code stops working. Bifrost's automatic fallback handles the switch transparently.

How Bifrost Handles the Protocol Translation

Google Gemini's API structure differs substantially from Anthropic's. Bifrost performs the full conversion at request time, with no changes required in Claude Code:

  • Role remapping: Anthropic's "assistant" role is remapped to Gemini's "model" role. System messages are converted to Gemini's systemInstruction field format.
  • Parameter renaming: Anthropic's max_completion_tokens becomes Gemini's maxOutputTokens; stop becomes stopSequences.
  • Tool call handling: Function call structures are restructured to match Gemini's functionDeclarations format, with tool choice values mapped from Anthropic's auto/none/required to Gemini's AUTO/NONE/ANY.
  • Response normalization: Gemini's finishReason values (STOP, MAX_TOKENS, SAFETY) are mapped back to Anthropic's stop, length, and content_filter values before the response reaches Claude Code.

Bifrost handles this translation at 11 microseconds of added overhead at 5,000 RPS, which means the translation cost is negligible compared to Gemini API latency.

The supported operations for Gemini through Bifrost include chat completions (streaming and non-streaming), the Responses API, embeddings, image generation (both standard Gemini and Imagen formats), transcription, and speech. For Claude Code's core use cases, chat completions and streaming are what matter.

Step 1: Start Bifrost

The fastest way to run Bifrost locally is with NPX:

npx -y @maximhq/bifrost

Or with Docker, mounting a local data directory for persistent configuration:

docker run -p 8080:8080 -v $(pwd)/data:/app/data maximhq/bifrost

Both commands start the gateway at http://localhost:8080. The web UI is available at the same address and provides a visual interface for provider configuration, routing rules, and observability.

For production deployments, see the Kubernetes deployment guide.

Step 2: Configure Gemini as a Provider

Add Gemini to Bifrost using the web UI, the configuration API, or a config.json file.

Using the Web UI

Navigate to Models > Model Providers, find Google Gemini under Configured Providers (or click Add New Provider), then click Add Key. Paste your Gemini API key directly or reference it as an environment variable using the env. prefix (e.g., env.GEMINI_API_KEY). Set Allowed Models to All Models to permit any Gemini model, or specify an explicit model allowlist.

Using config.json

{
  "providers": {
    "gemini": {
      "keys": [
        {
          "name": "gemini-key-1",
          "value": "env.GEMINI_API_KEY",
          "models": ["*"],
          "weight": 1.0
        }
      ]
    }
  }
}

Using the API

curl --location '<http://localhost:8080/api/providers>' \
--header 'Content-Type: application/json' \
--data '{
  "provider": "gemini",
  "keys": [
    {
      "name": "gemini-key-1",
      "value": "env.GEMINI_API_KEY",
      "models": ["*"],
      "weight": 1.0
    }
  ]
}'

Bifrost automatically handles Gemini's authentication: standard endpoints use the x-goog-api-key header, while Imagen and custom endpoints use query-parameter authentication. The provider configuration handles both methods without any additional setup.

Step 3: Create a Virtual Key

Virtual keys are the primary governance mechanism in Bifrost. Each developer or team receives a virtual key that encodes their access policy: which providers and models they can reach, their budget ceiling, and their rate limit. The actual Gemini API key is stored in the gateway and never distributed to individual users.

Create a virtual key in the Bifrost web UI under Virtual Keys. Configure it to restrict access to the gemini provider and set a monthly spending cap appropriate for your team's usage.

Step 4: Configure Claude Code to Route Through Bifrost

Claude Code's model routing is controlled via settings.json, located at ~/.claude/settings.json on macOS/Linux. Two configuration approaches work for Gemini.

Option A: Provider-Specific Model Pinning

Pin Claude Code's Haiku and Sonnet tiers directly to Gemini models:

"env": {
  "ANTHROPIC_BASE_URL": "<http://localhost:8080/anthropic>",
  "ANTHROPIC_AUTH_TOKEN": "your-virtual-key",
  "ANTHROPIC_DEFAULT_HAIKU_MODEL": "gemini/gemini-2.5-flash",
  "ANTHROPIC_DEFAULT_SONNET_MODEL": "gemini/gemini-2.5-pro"
}

ANTHROPIC_AUTH_TOKEN is the recommended authentication method. Claude Code sends this value in the Authorization: Bearer header, and Bifrost uses it as the virtual key for routing and governance. No Anthropic account login is required.

Option B: Dynamic Aliasing via Routing Rules

For teams that want to change the Gemini model target without touching developer machines, use routing rules in Bifrost to map alias names to provider/model targets at request time:

  1. In the Bifrost dashboard, create a routing rule for sonnet-model with a user-agent condition matching claude-cli, mapped to gemini/gemini-2.5-pro.
  2. Create a second rule for haiku-model mapped to gemini/gemini-2.5-flash.
  3. Update settings.json:
"env": {
  "ANTHROPIC_BASE_URL": "<http://localhost:8080/anthropic>",
  "ANTHROPIC_AUTH_TOKEN": "your-virtual-key",
  "ANTHROPIC_DEFAULT_HAIKU_MODEL": "haiku-model",
  "ANTHROPIC_DEFAULT_SONNET_MODEL": "sonnet-model"
}

The alias approach allows model targets to be updated centrally in the gateway without redeploying or reconfiguring individual developer environments.

Verify the Configuration

After updating settings.json, start Claude Code and confirm the active model:

claude --model gemini/gemini-2.5-pro

Or switch mid-session:

/model gemini/gemini-2.5-flash

Run /model without arguments to confirm the current active model. Claude Code will show the Gemini model name if routing is working correctly.

Option: Use Bifrost CLI Instead of Editing settings.json

Steps 3 and 4 above require manually editing settings.json and setting environment variables. If you prefer not to touch those files, the Bifrost CLI is an interactive terminal tool that handles the Claude Code configuration automatically, once the Bifrost gateway is already running.

With the gateway running (from Step 1), launch the CLI in a second terminal:

npx -y @maximhq/bifrost-cli

The CLI walks through a four-step interactive setup:

  1. Base URL — enter your Bifrost gateway address (default: http://localhost:8080)
  2. Virtual key — enter your virtual key if authentication is enabled, or press Enter to skip
  3. Harness — select Claude Code from the agent list; the CLI shows installation status and will install it via npm if missing
  4. Model — the CLI fetches available models from your running gateway and presents a searchable list; type gemini/gemini-2.5-pro or gemini/gemini-2.5-flash to filter and select

After confirming the summary screen, the CLI launches Claude Code with all required environment variables set automatically (ANTHROPIC_BASE_URL, ANTHROPIC_AUTH_TOKEN, Gemini model identifiers) and auto-attaches Bifrost as an MCP server inside Claude Code, so configured MCP tools are available without additional setup. No settings.json edits required.

The CLI stores its configuration in ~/.bifrost/config.json and keeps virtual keys in your OS keyring. On subsequent runs it returns to the summary screen with your last selections intact. Press Enter to re-launch immediately, or press m to pick a different Gemini model first.

For teams running multiple concurrent sessions, the CLI provides a tabbed terminal UI. Press Ctrl+B to open the tab bar, n to start a new session, and number keys to jump between tabs.

The CLI and the manual settings.json approach produce identical routing behavior through Bifrost. The CLI is the faster path for individual developers; the manual approach is better suited for shared team configurations or GitOps-managed environments.

Gemini-Specific Capabilities Available Through Bifrost

Beyond standard chat completions, Bifrost exposes several Gemini capabilities through the same API surface:

  • Reasoning/Thinking: Gemini's extended thinking mode maps to Bifrost's reasoning parameter. Setting reasoning.effort to "high" maps to Gemini's thinkingConfig.thinkingLevel = HIGH, with reasoning.max_tokens controlling the thinking budget.
  • Multimodal inputs: Image inputs in URL or base64 format, video content with fps and offset metadata, and PDF files via file references all pass through Bifrost's content conversion layer.
  • Embeddings: Bifrost routes embedding requests to Gemini's embedContent endpoint, with dimensions mapping to outputDimensionality.
  • Image generation: Both standard Gemini image generation and Imagen model requests are supported. Bifrost automatically selects the correct endpoint (/generateContent vs. /predict) based on the model name.

For Claude Code's primary use cases, chat completions and tool calling are the critical path. Bifrost maps Claude Code's tool schemas to Gemini's functionDeclarations format and handles the response conversion back to Anthropic format transparently.

Governance and Cost Controls Across Gemini Traffic

Running Claude Code against Gemini through Bifrost enables cost and access controls that aren't available when calling Gemini directly:

  • Per-developer budget limits: Set daily or monthly spending caps on each virtual key. When a developer hits their budget ceiling, Bifrost rejects requests with a policy error instead of accumulating additional spend.
  • Model access restrictions: Restrict a virtual key to specific models. A junior engineer's key might permit only gemini/gemini-2.5-flash, while a senior engineer's key adds gemini/gemini-2.5-pro.
  • Rate limiting: Apply request-per-minute or token-per-day limits at the virtual key or provider level to distribute capacity across teams.
  • Automatic fallback: Configure Bifrost to fall back from Gemini to Anthropic (or another provider) when Gemini returns 5xx errors or rate limit responses. The fallback configuration requires no changes in Claude Code.

The full governance feature set covers budget hierarchies at the customer, team, virtual key, and provider configuration levels. For enterprise teams with SSO requirements, Bifrost Enterprise adds OIDC integration and RBAC on top of the same governance stack.

For teams evaluating AI gateways for their Claude Code infrastructure, the LLM Gateway Buyer's Guide covers the full capability matrix across routing, governance, observability, and deployment options.

Production Considerations

A few configuration details are worth addressing before rolling out Gemini routing in production:

Tool calling compatibility: Claude Code relies heavily on tool use for file edits, bash commands, and code operations. Gemini 2.5 Pro and Flash both support function calling, but not all Claude-specific server-side tools (web_search, computer_use) are available on non-Claude models. Validate your specific workflow against Gemini before switching tiers in production.

Streaming finish reason timing: Gemini only includes finish_reason in the final stream chunk. Bifrost normalizes this to match Anthropic's streaming format, so Claude Code receives stop at the correct position, but it is worth noting if you are debugging stream behavior.

Consecutive tool responses: Gemini merges consecutive tool response messages into a single user message. Bifrost handles this conversion automatically, but it changes the message count in observability logs compared to direct Anthropic calls.

For deeper observability into Gemini traffic routing through Bifrost, the observability features expose per-request latency, token usage by provider, and error rates in the built-in dashboard and via OpenTelemetry and Prometheus.

Summary

Routing Claude Code through Google Gemini models requires three configuration steps: starting Bifrost, adding the Gemini provider with an API key, and updating settings.json to point Claude Code at the Bifrost endpoint with Gemini model identifiers. Bifrost handles the full Anthropic-to-Gemini protocol translation, including role remapping, parameter renaming, tool schema conversion, and response normalization.

Virtual keys add budget limits, model access restrictions, and rate limits on top of the routing, giving platform teams control over Gemini spend without changing developer workflows. The same gateway handles automatic fallback to Anthropic or any other configured provider when Gemini is unavailable.

Bifrost is open-source with full documentation and benchmarks available for teams evaluating gateway options; the LLM Gateway Buyer's Guide covers the capability comparison in detail. To explore enterprise features, including adaptive load balancing, SSO integration, and in-VPC deployment for production Claude Code workflows, book a demo with the Bifrost team.