Self-Hosted AI Gateway for Cursor with Claude or Ollama

Self-Hosted AI Gateway for Cursor with Claude or Ollama

Run Cursor with Claude, Ollama, and 20+ providers through a self-hosted AI gateway. Configure custom models, virtual keys, and unified routing with Bifrost.

Engineering teams adopting Cursor as their primary AI IDE quickly hit the limits of its default model picker. Cost overruns from agent-mode runs, model lock-in to a single provider, no visibility into per-developer token spend, and an inability to use locally hosted models for sensitive code are common complaints. A self-hosted AI gateway for Cursor fixes all of these in a single layer: it lets a team point Cursor at one internal endpoint and then route every Chat, Agent, Inline Edit, and Tab Completion request to Claude, GPT-5, Gemini, or a local Ollama instance, with governance, observability, and failover applied at the gateway. Bifrost is the open-source AI gateway that makes this configuration possible in minutes.

Why Teams Want a Self-Hosted Gateway for Cursor

Cursor ships with a hosted backend and a curated list of models, but enterprise and security-conscious teams almost always want more control over the data path. A self-hosted gateway addresses several recurring problems:

  • Provider lock-in: The hosted Cursor backend abstracts away which provider serves a request, making it hard to enforce policies like "Claude only for repos that touch payment code" or "local Ollama for proprietary algorithms."
  • Cost opacity: Without a gateway, individual developer spending against Anthropic, OpenAI, or Google bills is invisible until the monthly statement arrives.
  • Data residency: Some workloads cannot leave a VPC. Routing Cursor through a self-hosted gateway with an Ollama backend keeps all inference local.
  • Failover during outages: When a single provider goes down, Cursor with the default backend stalls. A gateway with automatic fallbacks keeps developers productive by silently swapping to a secondary model.
  • Audit and compliance: SOC 2, HIPAA, and GDPR audits require request-level logs, which the gateway can produce centrally.

Bifrost runs as a self-hosted gateway that sits between Cursor and every supported provider, so all of these controls become configurable from a single dashboard.

What Is a Self-Hosted AI Gateway

A self-hosted AI gateway is a proxy that you run on your own infrastructure (local machine, VPC, or Kubernetes cluster) and that exposes a unified, OpenAI-compatible API to clients like Cursor, while routing requests to one or more LLM providers behind the scenes. It centralizes authentication, model routing, rate limits, caching, and observability so that every AI-powered client in your organization talks to one endpoint.

In Cursor's case, the gateway intercepts requests that would normally go to api.openai.com, translates them to whichever provider you have selected, and returns an OpenAI-shaped response so Cursor never knows the difference.

How Bifrost Connects Cursor to Claude, Ollama, and Other Providers

Bifrost is a high-performance, open-source AI gateway built by Maxim AI that exposes a single OpenAI-compatible HTTP API and routes requests to 20+ providers. Because Cursor allows you to override the OpenAI base URL globally, Bifrost slots in without any modifications to Cursor itself.

Bifrost supports the following providers using a provider/model-name format that Cursor recognizes natively:

  • Anthropic: anthropic/claude-sonnet-4-5-20250929, anthropic/claude-opus-4-5
  • OpenAI: openai/gpt-5, openai/gpt-4.1
  • Google Gemini: gemini/gemini-2.5-pro, gemini/gemini-2.5-flash
  • AWS Bedrock: bedrock/anthropic.claude-3-5-sonnet
  • **Ollama (local)**: ollama/llama-3.3-70b, ollama/qwen2.5-coder
  • Groq, Mistral, Cohere, xAI, Cerebras, Perplexity, Azure OpenAI, OpenRouter, vLLM, Hugging Face, and more

Bifrost adds only 11 microseconds of overhead per request at 5,000 RPS in sustained performance benchmarks, which means the gateway is invisible to Cursor users even under heavy agent workloads.

Setting Up Bifrost as a Self-Hosted Gateway for Cursor

The end-to-end setup takes under ten minutes for a local installation, longer only if you are deploying behind a public hostname for a team. The four phases are: run Bifrost, configure the providers you want, expose Bifrost to Cursor, and point Cursor at the gateway.

Step 1: Run Bifrost locally or in your VPC

Bifrost ships as a single binary, an NPX package, and a Docker image. The fastest local start is:

docker run -p 8080:8080 -v $(pwd)/data:/app/data maximhq/bifrost

This starts the gateway on http://localhost:8080 with a persistent data volume. You can also run it via npx @maximhq/bifrost or deploy to Kubernetes for team-wide access. Bifrost works as a drop-in replacement for the OpenAI SDK, so no client code changes are required anywhere downstream.

Step 2: Add Claude and Ollama as providers

Open the Bifrost dashboard at http://localhost:8080, go to Providers, and add:

  • Anthropic: Paste your Anthropic API key. Bifrost will route any anthropic/... model through this provider.
  • Ollama: Set the base URL to http://localhost:11434 (or wherever your Ollama instance is running). No API key is required for local Ollama.

You can add more providers in the same workflow. Bifrost's provider routing rules let you assign weights, fallbacks, and per-model overrides so that, for example, every anthropic/claude-sonnet-4-5-20250929 request can fall back to openai/gpt-5 if Anthropic is degraded.

Step 3: Expose Bifrost to Cursor

Cursor requires a publicly accessible URL for its base URL override, because its desktop client routes API traffic through Cursor's servers when used with hosted accounts. There are three common ways to expose a self-hosted Bifrost instance:

  • **Cloudflare Tunnel or ngrok** for individual developers and pilots
  • Internal load balancer with a private DNS entry like https://bifrost.internal.company.com for team deployments
  • Kubernetes ingress for production-grade installations (see the k8s deployment guide)

Whichever path you choose, the endpoint should accept HTTPS traffic and forward to Bifrost's port 8080.

Step 4: Connect Cursor to your self-hosted gateway

In Cursor, press Cmd+, (macOS) or Ctrl+, (Windows/Linux), navigate to Models, and complete these settings:

  1. In the OpenAI API Key field, paste a Bifrost virtual key or a raw provider API key.
  2. Toggle Override OpenAI Base URL to ON and enter your Bifrost endpoint (for example, https://bifrost.example.com/cursor).
  3. In Add or search model, enter the models you want available using the provider/model-name format: anthropic/claude-sonnet-4-5-20250929, openai/gpt-5, ollama/qwen2.5-coder, gemini/gemini-2.5-pro.

Cursor assigns models separately to Chat, Agent, Inline Edit, and Tab Completion. You can now mix providers per feature: use Claude for Agent mode, a fast Groq-hosted Llama for Tab Completion, and a local Ollama model for any work involving sensitive code. Non-native models must support tool use for Agent mode and Inline Edit to function correctly. For broader patterns across all supported coding agents, see Bifrost's CLI agents resource page.

For the full configuration walkthrough including screenshots, see Bifrost's Cursor integration guide in the docs.

Governance and Cost Control for Cursor Teams

The biggest operational benefit of running Cursor behind a self-hosted gateway is centralized governance. Bifrost's virtual keys are the primary governance entity: each developer or team gets a virtual key that maps to specific permissions instead of a raw provider API key.

Virtual keys let platform teams enforce:

  • Per-developer budgets: Cap monthly Cursor spend per engineer.
  • Model allowlists: Restrict Cursor's Agent mode to specific models for cost reasons, or restrict access to Claude Opus for sensitive workloads only.
  • Rate limits: Prevent runaway agent loops from exhausting an entire team's quota.
  • Provider scoping: Allow only Ollama models for engineers working on regulated repositories.

Bifrost's governance feature set supports hierarchical budgets at the virtual key, team, and customer level, so an organization can manage Cursor spend at the same level of granularity as cloud spend in AWS.

Observability for Cursor Requests Through Bifrost

Every Cursor request that flows through Bifrost is logged with the prompt, response, latency, token counts, and provider used. The Bifrost dashboard at http://localhost:8080/logs lets you filter by provider, model, or conversation content, and the observability stack exports native Prometheus metrics and OpenTelemetry traces.

Common observability use cases for Cursor teams include:

  • Identifying which Cursor features (Tab vs Agent vs Chat) drive the most cost
  • Comparing real-world latency of Claude versus Gemini for Inline Edit operations
  • Detecting prompt injection attempts in agent runs through log analysis
  • Tracking adoption per team to inform license and quota decisions

Bifrost integrates with Datadog, Grafana, New Relic, and Honeycomb, so Cursor telemetry can land in the same dashboards your platform team already uses for backend services. Teams running Maxim AI's evaluation platform can also pipe Bifrost traces into Maxim for production-grade prompt and agent evaluation.

Reliability with Automatic Failover

Cursor sessions stall when the active provider goes down. With Bifrost's automatic fallbacks, you can define a fallback chain (for example, Anthropic, then OpenAI, then Ollama as a last resort) and Bifrost transparently retries failed requests against the next provider in the chain. The developer in Cursor sees an uninterrupted response.

This is particularly valuable for Agent mode runs that may take minutes to complete: a single 503 from Anthropic would normally force a full restart, but the fallback returns a usable result and preserves the agent's context.

Start Using Cursor with Your Self-Hosted Gateway

A self-hosted AI gateway for Cursor turns the IDE from a single-provider tool into a fully governed, multi-model platform that any engineering organization can scale. Bifrost gives Cursor users access to Claude, Ollama, OpenAI, Gemini, and 16+ other providers behind a single endpoint, with virtual keys for governance, semantic caching for cost reduction, automatic failover for reliability, and built-in observability for compliance.

To see how Bifrost can become the self-hosted gateway behind your team's Cursor deployment, book a demo with the Bifrost team or explore the open-source repo on GitHub.