Try Bifrost Enterprise free for 14 days. Request access

Route Claude Code Through Groq Using Bifrost

Route Claude Code Through Groq Using Bifrost
Bifrost is an open-source AI gateway that lets you redirect Claude Code traffic to any provider, including Groq, without changing how Claude Code itself works. This guide walks through the complete setup.

Claude Code connects to Anthropic's API by default and uses Claude models exclusively. For most individual developers, that default works well enough. For teams that need low-latency inference on open-weight models for cost-sensitive tasks, or that want to route specific workloads to purpose-built inference hardware, the default becomes a constraint. Bifrost, the open-source AI gateway built in Go by Maxim AI, sits between Claude Code and any LLM provider and translates requests at the transport layer. By pointing Claude Code at Bifrost's Anthropic-compatible endpoint, you can route inference to Groq, or any other configured provider, without touching the Claude Code binary.

Why Route Claude Code Through Groq

Groq operates a cloud inference platform built on Language Processing Units (LPUs), custom ASICs designed specifically for transformer inference. The LPU architecture delivers deterministic latency: every token takes a consistent amount of time to generate, which eliminates the tail-latency variability common on GPU-based providers. Independent benchmarks place Groq's output throughput at 4-7x faster than the fastest GPU-based inference providers, with time-to-first-token 3-4x lower under comparable conditions.

For Claude Code workflows, that speed difference is meaningful in a few specific scenarios:

  • Background or exploratory tasks where you want fast iteration on open-weight models (Llama 4 Scout, llama-3.3-70b-versatile, DeepSeek R1 Distill) at lower token cost than frontier model pricing
  • Multi-step agent chains where tool calls require sequential LLM invocations; faster inference per step compounds across the chain
  • Cost tiering where Haiku-equivalent tasks can be served by a fast open-weight model, reserving Anthropic credits for reasoning-heavy work

One important caveat: Groq's API does not support image inputs, embeddings, speech synthesis, or transcription. It handles chat completions and tool calling on text-only inputs. Claude-specific server-side tools like computer_use are also unavailable. Route Groq for text-generation and tool-use tasks; keep Anthropic in the mix for tasks that depend on those capabilities.

How Bifrost Connects Claude Code to Groq

Bifrost exposes an Anthropic-compatible endpoint at /anthropic on its local gateway. Claude Code, configured to point at that endpoint instead of Anthropic's production API, treats Bifrost as its backend. Bifrost receives the incoming Anthropic-format request, routes it to whichever provider and model you have configured, and returns the response in the format Claude Code expects.

Groq is an OpenAI-compatible provider. Bifrost's Groq integration delegates to its OpenAI implementation with parameter handling adjusted for Groq's specifics: unsupported fields like store, service_tier, and prompt_cache_key are silently dropped, and streaming uses Groq's SSE format. Tool calling is fully supported. The provider configuration and the routing layer are separate concerns in Bifrost; adding Groq as a provider and routing Claude Code to it are two independent steps.

Step 1: Install and Start Bifrost

The fastest way to run Bifrost locally is via npx. You need Node.js 18 or later.

npx -y @maximhq/bifrost -app-dir ./bifrost-data

This starts the gateway at http://localhost:8080 and creates a bifrost-data directory for configuration and storage. Open http://localhost:8080 in your browser to access the dashboard.

Alternatively, run Bifrost with Docker:

docker run -p 8080:8080 maximhq/bifrost:latest

Step 2: Configure Groq as a Provider

Bifrost supports provider configuration through the web UI or a config.json file. Using the web UI: navigate to Models > Model Providers, find Groq under configured providers or add it via Add New Provider, then add your Groq API key (direct value or as env.GROQ_API_KEY). Set Allowed Models to All Models or specify a model allowlist.

Using config.json, create or edit the file in your bifrost-data directory:

{
  "providers": {
    "groq": {
      "keys": [
        {
          "name": "groq-key-1",
          "value": "env.GROQ_API_KEY",
          "models": ["*"],
          "weight": 1.0
        }
      ]
    }
  }
}

Export your Groq API key before starting Bifrost:

export GROQ_API_KEY="your-groq-api-key"

You can verify the provider is active by listing models from the Bifrost dashboard under Models > Model Providers > Groq.

Step 3: Create a Virtual Key

Virtual keys are Bifrost's mechanism for authenticating Claude Code to the gateway without exposing your actual provider API keys. Each virtual key can carry routing rules, budget limits, and rate limits.

In the Bifrost dashboard, navigate to Governance > Virtual Keys and create a new key. Assign it a name, set the allowed providers to include Groq, and copy the generated key value. This key is what you will pass to Claude Code in the next step.

Step 4: Connect Claude Code to Bifrost

With the gateway running and Groq configured, you need to point Claude Code at Bifrost. There are two ways to do this: editing settings.json manually, or using Bifrost CLI, an interactive terminal tool that configures Claude Code for you without touching any environment variables or config files.

Option A: Manual (settings.json)

Claude Code reads environment configuration from its settings.json file. The location depends on your OS and whether you want a global or project-scoped config:

  • macOS / Linux / WSL global: ~/.claude/settings.json
  • Windows global: %USERPROFILE%\.claude\settings.json
  • Project-scoped: .claude/settings.json in your project root

Add the following to the env block in your settings.json. The snippet shows only the env key; merge it into your existing top-level object rather than replacing the whole file.

"env": {
  "ANTHROPIC_BASE_URL": "<http://localhost:8080/anthropic>",
  "ANTHROPIC_AUTH_TOKEN": "your-bifrost-virtual-key",
  "ANTHROPIC_DEFAULT_HAIKU_MODEL": "groq/llama-3.3-70b-versatile",
  "ANTHROPIC_DEFAULT_SONNET_MODEL": "groq/llama-3.3-70b-versatile"
}

ANTHROPIC_BASE_URL redirects Claude Code's API calls to Bifrost's local Anthropic-compatible handler. ANTHROPIC_AUTH_TOKEN passes the virtual key as the Authorization: Bearer header, which Bifrost uses for authentication and routing. The Haiku and Sonnet slots are pinned to Groq models using the groq/ prefix.

If Claude Code was running, restart it after editing settings.json. To avoid caching issues, run /logout inside Claude Code before restarting, then relaunch.

Option B: Bifrost CLI (no environment variables needed)

Bifrost CLI is an interactive terminal launcher that connects Claude Code to a running Bifrost gateway without you editing any config files or setting environment variables. It handles base URL configuration, virtual key storage, model selection, and MCP auto-attachment in a guided flow.

In a second terminal (with the gateway already running), launch the CLI:

npx -y @maximhq/bifrost-cli

The interactive setup flow walks through five steps:

  1. Enter your Bifrost gateway URL (default: http://localhost:8080)
  2. Enter your virtual key from Step 3, or press Enter to skip
  3. Select Claude Code as your harness
  4. Type groq/ in the model search to filter available Groq models, then select one (e.g., groq/llama-3.3-70b-versatile)
  5. Press Enter on the summary screen to launch

The CLI sets all required environment variables for Claude Code automatically, registers Bifrost's MCP server endpoint so your configured tools are available, and stores your selections in ~/.bifrost/config.json for subsequent sessions. Virtual keys are stored in your OS keyring, never in plaintext on disk.

Model selection note: Claude Code relies on tool calling for file operations, bash, and code editing. Groq supports tool calling on its text completion models. Confirm that the model you select supports tool use for the operations you intend to run. You can review available Groq models and their capabilities in the GroqCloud developer console.

Step 5: Verify the Routing

After connecting Claude Code via either method, run a quick test:

claude --model groq/llama-3.3-70b-versatile

Inside the session, run /model to confirm the active model. The Bifrost dashboard under Logs will show the incoming request routed through the Groq provider. If the request appears in logs with a 200 response, the routing is working correctly.

To switch models mid-session without restarting, use the /model command:

/model groq/llama-3.3-70b-versatile

Routing Rules and Fallbacks

One of the advantages of routing Claude Code through Bifrost is access to automatic fallbacks. If Groq returns a 5xx error or rate-limit response, Bifrost can fall back to Anthropic or another configured provider automatically, without Claude Code noticing. Configure a fallback chain in the Bifrost dashboard under Features > Fallbacks, or define it in config.json:

{
  "fallbacks": [
    {
      "from": "groq/llama-3.3-70b-versatile",
      "to": ["anthropic/claude-haiku-4-5"]
    }
  ]
}

With routing rules, you can also define condition-based overrides. For example, route requests from Claude Code (identifiable by the claude-cli user-agent) to Groq by default, while routing other clients to Anthropic. This gives you per-client model governance without separate deployments.

Adding Governance and Observability

With Claude Code routed through Bifrost, you gain the full governance layer for free. Budget and rate limits on virtual keys prevent runaway costs. Audit logs track every request with model, provider, token count, and latency. For teams with multiple developers on Claude Code, the LLM Gateway Buyer's Guide covers how to structure virtual key hierarchies by team, project, or user.

Observability is available out of the box through Bifrost's dashboard. Prometheus metrics are exposed at /metrics for integration with existing monitoring stacks. OpenTelemetry export is also supported for distributed tracing into Datadog, Grafana, or any OTLP-compatible backend.

For teams that want to centralize MCP tool access alongside model routing, Bifrost also supports acting as an MCP gateway: connecting to upstream MCP servers and exposing them through a single /mcp endpoint that Claude Code can register with a single claude mcp add command. This separates inference routing from tool governance into two clean configuration points rather than one sprawling Claude Code config.

Summary

Routing Claude Code through Groq via Bifrost involves three shared setup steps (start the gateway, configure Groq as a provider, create a virtual key) followed by connecting Claude Code to Bifrost, either by editing settings.json directly or using Bifrost CLI for a no-environment-variable interactive flow. After that, Claude Code routes through Groq by default, with fallback, governance, and observability available without additional setup.

The approach works for any other provider Bifrost supports: swap the model prefix to anthropic/, openai/, bedrock/, vertex/, or any of the 20+ configured providers, and Claude Code follows without further changes.

To explore enterprise features like adaptive load balancing, clustering, vault-backed secrets, SSO, and in-VPC deployment for your Claude Code infrastructure, book a demo with the Bifrost team.