Manual Approval vs Autonomous Tool Execution: Designing Safe AI Agent Loops
Designing safe AI agent loops means choosing where to gate tool execution. A practical guide to manual approval, autonomous execution, and the middle path.
Every production AI agent eventually has to answer the same question: when the model proposes a tool call, who decides whether it runs? The two ends of the spectrum, fully manual approval and fully autonomous execution, both fail in predictable ways. Designing safe AI agent loops is the work of building the structured middle, where low-risk tool calls run on their own, high-risk ones pause for human review, and every decision is logged. Bifrost, the open-source AI gateway by Maxim AI, implements this pattern at the infrastructure layer through its MCP gateway, so platform teams can enforce execution policies once and have every agent across the organization inherit them.
This piece breaks down the design space, the failure modes on each side, and the controls Bifrost provides to ship agents that are both capable and safe.
What Safe AI Agent Loops Actually Require
A safe AI agent loop is one where tool execution is gated by a policy that matches each action's risk to the right level of human oversight, with every decision auditable end to end. Four properties define a loop that holds up in production:
- Default-deny: no tool runs unless an explicit policy permits it.
- Risk-tiered execution: read operations can auto-run, while writes, deletes, and external side effects require approval.
- Scoped credentials: each agent or consumer can only see and call the tools its role permits.
- Complete audit trail: every tool suggestion, approval, execution, and result is logged with the parent LLM request that triggered it.
These four properties hold whether the agent is an internal coding assistant, a customer-facing support bot, or an infrastructure automation pipeline.
The Failure Modes at Each End of the Spectrum
Both extremes of the manual-vs-autonomous design choice break in production, but in different ways.
Pure manual approval breaks at scale
If every tool call queues for human review, three failures emerge. Reviewers stop reading and start rubber-stamping, producing what MIT Technology Review described in April 2026 as "HITL theater," the illusion of oversight without the substance. Agents stall whenever a reviewer is unavailable, defeating the latency advantage that made the agent useful in the first place. And once the volume crosses a few hundred actions per day, organizations either staff a permanent review desk or quietly raise the auto-approval threshold until the policy is meaningless.
Pure manual approval also fails to scale across tool types. Asking a human to review a read_file call that returns a public README imposes the same UI friction as reviewing a delete_database call. The signal-to-noise ratio collapses.
Pure autonomous execution is unsafe in production
The opposite extreme is more dangerous. When an agent can execute any tool the model proposes, the model's mistakes become real-world consequences. In March 2026, an ML engineer used Claude Code with Terraform to wipe out an entire AWS production deployment after self-approving the agent's plan, recoverable only because a backup existed. Around the same time, an internal AI agent at Meta posted an incorrect answer to a confidential engineering forum, which then granted broad access to user data for over two hours.
The pattern in both cases is the same: an agent had write or destructive permissions it did not need, the human approving the action lacked the context to evaluate it, and there was no policy layer between the model's suggestion and the system's response. The OWASP Top 10 for LLM Applications flags exactly this risk category as excessive agency, and the EU AI Act's Article 14 makes human oversight a legal requirement for high-risk systems starting August 2026.
A Risk-Tiered Approach to Tool Execution
The middle path is to classify tools by risk and let policy, not the model, decide which calls run autonomously. A useful classification looks like this:
- Tier 1, read-only: list_directory, read_file, search, query_readonly. Safe to auto-execute. Reversible, no side effects.
- Tier 2, scoped writes: create_ticket, post_comment, write_file to a sandboxed path. Auto-executable for trusted virtual keys; otherwise requires approval.
- Tier 3, destructive or external: delete_record, send_email, execute_command, publish_release, anything that hits a paid external API. Always requires explicit human approval.
This tiering replaces a binary choice with a deterministic policy. The model still proposes tool calls. The gateway, not the model, decides which proposals reach the system.
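As a rough illustration of the tiering above, the policy can be written as a lookup table plus a decision function. This is a minimal sketch, not Bifrost's implementation: the `Tier` enum, `TOOL_TIERS` table, `decide` function, and `key_is_trusted` flag are hypothetical names introduced here, and the tool names are simply the examples listed in the tiers.

```python
from enum import Enum


class Tier(Enum):
    READ_ONLY = 1
    SCOPED_WRITE = 2
    DESTRUCTIVE = 3


# Hypothetical classification table; tool names are the examples from the tiers above.
TOOL_TIERS = {
    "list_directory": Tier.READ_ONLY,
    "read_file": Tier.READ_ONLY,
    "search": Tier.READ_ONLY,
    "query_readonly": Tier.READ_ONLY,
    "create_ticket": Tier.SCOPED_WRITE,
    "post_comment": Tier.SCOPED_WRITE,
    "write_file": Tier.SCOPED_WRITE,
    "delete_record": Tier.DESTRUCTIVE,
    "send_email": Tier.DESTRUCTIVE,
    "execute_command": Tier.DESTRUCTIVE,
    "publish_release": Tier.DESTRUCTIVE,
}


def decide(tool_name: str, key_is_trusted: bool) -> str:
    """Return 'deny', 'auto', or 'approve' for a proposed tool call."""
    tier = TOOL_TIERS.get(tool_name)
    if tier is None:
        return "deny"  # default-deny: unclassified tools never run
    if tier is Tier.READ_ONLY:
        return "auto"  # Tier 1 always auto-executes
    if tier is Tier.SCOPED_WRITE and key_is_trusted:
        return "auto"  # Tier 2 auto-executes only for trusted keys
    return "approve"  # Tier 3, and Tier 2 on untrusted keys, wait for a human
```

Because the decision depends only on the table and the key, the same proposal always gets the same answer, regardless of how the model phrased its request.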
How Bifrost Implements Safe AI Agent Loops
Bifrost ships with the primitives needed to operate this policy at the gateway layer, without per-agent code. The same MCP gateway that handles tool discovery and execution also enforces the approval policy, scopes credentials, and writes the audit trail.
Default-deny tool execution
By default, Bifrost does not execute the tool calls an LLM proposes. When a model returns a tool call, Bifrost returns it to the application, which must explicitly hit the tool execution endpoint to run it. This security-first default means an agent connected to a fleet of MCP servers cannot accidentally invoke anything: the application is always in the loop.
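On the application side, that default might look something like the sketch below. The function and parameter names are hypothetical: `execute_tool` stands in for whatever request the application sends to the tool execution endpoint, and `should_execute` stands in for the application's own policy or reviewer prompt. The shape of each tool call (a dict with a `name` key) is also an assumption.

```python
from typing import Callable


def handle_model_response(
    tool_calls: list[dict],
    should_execute: Callable[[dict], bool],
    execute_tool: Callable[[dict], object],
) -> list[dict]:
    """Run only the proposals the application explicitly chooses to execute."""
    outcomes = []
    for call in tool_calls:
        if should_execute(call):
            # Explicit opt-in: the application triggers execution itself.
            outcomes.append(
                {"tool": call["name"], "executed": True, "result": execute_tool(call)}
            )
        else:
            # Nothing happens to proposals the application does not act on.
            outcomes.append({"tool": call["name"], "executed": False, "result": None})
    return outcomes
```

The point of the sketch is that a proposal is inert data until the application acts on it; there is no code path where a tool runs as a side effect of the model's response.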
Agent Mode for supervised autonomy
For workflows where not every tool call can realistically pause for approval, Bifrost's Agent Mode introduces configurable auto-approval. Each MCP client has a tools_to_auto_execute field, and a tool must appear in both tools_to_execute (the allow-list) and tools_to_auto_execute (the auto-approval list) to run without manual intervention. Anything else is returned to the application for review.
A safe Agent Mode configuration follows the risk tiering above:
{
  "name": "filesystem",
  "tools_to_execute": ["read_file", "list_directory", "write_file"],
  "tools_to_auto_execute": ["read_file", "list_directory"]
}
Here, write_file is permitted but always pauses for approval. The model can still propose it and the application still sees the proposal, but execution requires an explicit call. Read-only operations flow through automatically.
Agent Mode also runs auto-approved tools in parallel within each iteration, which keeps latency low for the safe operations that make up the majority of any real workflow. Mixed iterations are supported: in a single step, Bifrost will auto-execute the read calls, return the writes for approval, and resume once the application responds.
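The both-lists rule and the mixed-iteration behavior can be sketched as a simple partition over the proposed calls. This is an illustrative approximation, not Bifrost's code: `partition_tool_calls` is a hypothetical helper, and treating tools outside the allow-list as blocked is an assumption of this sketch.

```python
def partition_tool_calls(
    client_config: dict, proposed: list[str]
) -> tuple[list[str], list[str], list[str]]:
    """Split proposed tool names into (auto_run, needs_approval, blocked)."""
    allowed = set(client_config.get("tools_to_execute", []))
    auto = set(client_config.get("tools_to_auto_execute", []))

    auto_run, needs_approval, blocked = [], [], []
    for name in proposed:
        if name not in allowed:
            blocked.append(name)  # assumption: not on the allow-list means never runnable
        elif name in auto:
            auto_run.append(name)  # on both lists: runs without intervention
        else:
            needs_approval.append(name)  # allowed, but returned for review
    return auto_run, needs_approval, blocked
```

With the filesystem configuration shown earlier, a step that proposes read_file and write_file together would land read_file in the first list and write_file in the second. A tool listed only in tools_to_auto_execute would still be blocked, because it must appear on both lists.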
Tool filtering for least-privilege access
Risk-tiered execution alone is not sufficient if every agent can see every tool. Bifrost layers three levels of tool filtering on top of Agent Mode:
- Client-level: the baseline set of tools registered on each MCP client.
- Request-level: per-request narrowing via headers (x-bf-mcp-include-tools) or SDK context values.
- Virtual key-level: per-consumer scoping that overrides everything else.
Virtual keys are Bifrost's primary unit of governance. Each key is scoped at the tool level, not just the server level: a key can be granted filesystem_read_file without being granted filesystem_write_file from the same MCP server. A customer-facing agent simply cannot reach internal admin tooling, even if the model hallucinates a call. The full governance model covers virtual keys, tool groups, budget caps, and rate limits in one configuration surface.
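One way to picture how the three filtering levels combine is as set operations. The sketch below is a guess at the semantics under stated assumptions, not a description of Bifrost internals: `effective_tools` is a hypothetical function, and "overrides everything else" is interpreted here as meaning a virtual key's scope replaces any request-level narrowing while still being bounded by the client's registered tools.

```python
from typing import Optional


def effective_tools(
    client_tools: set[str],
    request_include: Optional[set[str]] = None,
    virtual_key_scope: Optional[set[str]] = None,
) -> set[str]:
    """Compute which tools a request can reach under the three filter levels."""
    if virtual_key_scope is not None:
        # Assumption: the virtual key scope takes precedence over request filters.
        return client_tools & virtual_key_scope
    if request_include is not None:
        # Request-level filters can narrow the baseline but never widen it.
        return client_tools & request_include
    return set(client_tools)
```

Under this reading, a key scoped to filesystem_read_file cannot reach filesystem_write_file even if the request asks for it, and no level can expose a tool that the client never registered.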
OAuth 2.1 for per-user identity
For tools that act on external systems on behalf of a specific user, Bifrost's OAuth integration binds tool calls to the user's own credentials with PKCE and automatic token refresh. This closes the gap where a single shared service account would otherwise be the principal for every action, which collapses the audit trail and breaks compliance.
Implementation Pattern: From Default-Deny to Supervised Autonomy
The rollout pattern that works in practice is sequential, not all-at-once:
- Start with default-deny. Connect MCP servers through Bifrost's MCP gateway. Do not enable Agent Mode. Every tool call returns to the application for explicit execution.
- Classify tools by risk tier. Walk every connected MCP server and assign each tool to Tier 1, 2, or 3. Document the classification in the same place as the virtual key configuration.
- Enable Agent Mode for Tier 1 only. Add read-only tools to tools_to_auto_execute. Leave everything else gated.
- Issue scoped virtual keys. Create one virtual key per consumer (team, agent, customer integration). Scope each key to the tools it actually needs.
- Promote tools to Tier 2 selectively. For specific virtual keys where a write operation is routine and reversible, expand tools_to_auto_execute. Keep Tier 3 tools in manual approval for every key, no exceptions.
- Wire the approval UI. When Bifrost returns a non-auto tool call, the application surfaces it to the right reviewer. The agent run pauses, state is preserved, execution resumes once approval arrives.
Most teams will run for weeks at step 1 or 2 before moving on. That is the right pace. The cost of one production incident is higher than the latency overhead of an extra approval click.
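The last rollout step, pausing a run and resuming it after review, could be modeled in the application roughly as follows. Everything here is hypothetical scaffolding (`PendingCall`, `AgentRun`, and the injected `execute` callable are names invented for this sketch); a real deployment would also persist this state rather than hold it in memory.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PendingCall:
    call_id: str
    tool: str
    arguments: dict
    status: str = "pending"  # pending -> approved | rejected


@dataclass
class AgentRun:
    run_id: str
    pending: dict[str, PendingCall] = field(default_factory=dict)
    results: dict[str, object] = field(default_factory=dict)

    def pause_for(self, call: PendingCall) -> None:
        """Park a non-auto tool call until a reviewer decides."""
        self.pending[call.call_id] = call

    def resolve(
        self, call_id: str, approved: bool, execute: Callable[[str, dict], object]
    ) -> None:
        """Record the reviewer's decision and execute only on approval."""
        call = self.pending.pop(call_id)
        if approved:
            call.status = "approved"
            self.results[call_id] = execute(call.tool, call.arguments)
        else:
            call.status = "rejected"
            self.results[call_id] = {"error": "rejected by reviewer"}

    @property
    def can_resume(self) -> bool:
        """The run continues only once no call is awaiting review."""
        return not self.pending
```

Feeding a rejection back into `results` rather than dropping the call gives the model something to react to when the loop resumes.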
Audit Trails and Compliance for Agent Loops
Even with the right execution policy, the loop is not safe if the system cannot reconstruct what happened. Bifrost logs every tool call as a first-class entry: tool name, MCP server, arguments, result, latency, the virtual key that triggered it, and the parent LLM request that initiated the agent loop. The audit logs are immutable and designed for SOC 2 Type II, GDPR, HIPAA, and ISO 27001 scope. Content logging can be disabled per environment while still capturing the metadata required for attribution.
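To make the shape of such an entry concrete, here is a minimal sketch of an audit record carrying the fields listed above. The class name, field names, and the `log_content` switch are assumptions made for illustration; they are not Bifrost's actual log schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class ToolCallAuditRecord:
    tool_name: str
    mcp_server: str
    arguments: Optional[dict]
    result: Optional[Any]
    latency_ms: float
    virtual_key: str
    parent_request_id: str


def to_log_line(record: ToolCallAuditRecord, log_content: bool = True) -> str:
    """Serialize a record, optionally dropping content but keeping metadata."""
    entry = asdict(record)
    if not log_content:
        # Content logging disabled: attribution metadata survives, payloads do not.
        entry["arguments"] = None
        entry["result"] = None
    return json.dumps(entry, sort_keys=True)
```

Keeping the virtual key and parent request ID in every entry, even with content logging off, is what lets a reviewer trace a tool call back to the consumer and the LLM request that caused it.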
For deeper context on how this governance layer compounds with cost controls, the Bifrost engineering writeup on MCP gateway access control, cost governance, and 92% lower token costs at scale walks through the same primitives applied to the cost side of the same problem. Access control and cost governance share the same infrastructure surface in Bifrost, which is the point: a single gateway enforces both, and every agent run produces one coherent record.
Start Building Safe AI Agent Loops with Bifrost
Safe AI agent loops are not a model problem. They are an infrastructure problem. The model will keep proposing tool calls, some of them correct, some of them not. What determines whether those proposals become safe outcomes is the policy layer between the model and the tools, and that layer belongs in the gateway, not in every agent's code. Bifrost provides that layer (default-deny execution, risk-tiered Agent Mode, scoped virtual keys, tool filtering, OAuth-bound identity, and immutable audit trails) in a single open-source binary.
To see how Bifrost can help your team ship safer AI agents without sacrificing autonomy, book a Bifrost demo with the team.