Human-in-the-Loop AI: Ensuring Safety and Control in Agent Workflows
By Chris Moen • Published 2026-03-25
Discover how human-in-the-loop (HITL) workflows add crucial checkpoints to AI agents, ensuring safety, auditability, and production readiness through approvals, waits, and callbacks.
Quick Answer
Human-in-the-loop workflows add deliberate checkpoints to AI agents. You pause execution, ask for approval or input, and resume later by callback or manual action. Approvals, waits, and callbacks make agent work safe, auditable, and production-ready.
What This Means in Practice
Human-in-the-loop (HITL) puts people and systems in the loop on purpose. The agent proposes. The workflow waits. A person confirms or amends. Then the run continues.
Common building blocks:
- Approvals. Pause until a human accepts or rejects a proposed action. Tools like n8n let you require human approval before an agent executes a tool call. See the n8n note on requiring approval before tool execution.
- Waits. Park the run until an event arrives. For example, “wait for reviewer input” or “wait for a background job to finish.” Restate documents pause and resume patterns for approvals and signals. See Restate’s Approvals with Pause & Resume.
- Callbacks. Continue only when an external system posts back to a callback URL. You see this in “task token” style designs and transport layers for AI tools. For example, Ably shows OpenAI human‑in‑the‑loop approval and callback patterns. Step Functions guides also discuss task tokens and the callback pattern for human approvals.
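The three building blocks above can be sketched together as a minimal in-memory “task token” design. This is an illustrative sketch, not any vendor’s API: the names `Orchestrator`, `PendingRun`, `pause_for_approval`, and `callback` are all assumptions made up for the example.

```python
# Minimal sketch of the task-token callback pattern: pause a run, hand out a
# token, and resume only when someone posts that token back.
import uuid
from dataclasses import dataclass, field

@dataclass
class PendingRun:
    token: str
    status: str = "waiting"      # waiting -> approved / rejected
    result: dict = field(default_factory=dict)

class Orchestrator:
    def __init__(self):
        self.runs: dict[str, PendingRun] = {}

    def pause_for_approval(self) -> str:
        """Park the run and return the token an approver must post back."""
        token = uuid.uuid4().hex
        self.runs[token] = PendingRun(token)
        return token

    def callback(self, token: str, approved: bool, result: dict) -> PendingRun:
        """An external system or human posts back; the run resumes."""
        run = self.runs[token]
        run.status = "approved" if approved else "rejected"
        run.result = result
        return run

orch = Orchestrator()
token = orch.pause_for_approval()            # run is now parked
run = orch.callback(token, True, {"diff": "ok"})
```

In production the token would travel in a webhook or approval link, and the parked state would live in durable storage rather than a dict, but the shape is the same: no work proceeds until the token comes back.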
Linked References:
- Human approval before tool execution: Human-in-the-loop for AI tool calls on n8n.
- Pause and resume: Approvals with Pause & Resume on Restate.
- OpenAI human-in-the-loop approval and callbacks: Ably guide.
- Step Functions task tokens and callbacks: overview of task tokens and the callback pattern.
Why This Matters for Production Workflows
HITL turns fragile demos into durable operations:
- Safety. Gate high-impact actions like code changes, billing, and customer messages.
- Clarity. Show the exact change, diff, or outbound payload before it ships.
- Audit. Keep a run history of who approved what and when.
- Control. Set timeouts, escalate when needed, and roll forward only on clear signal.
- Longevity. Let jobs run for minutes or hours without blocking a live connection.
- Fit for real teams. Blend agents, people, and external systems in one path.
What to Look for in a HITL Platform
Focus on primitives and operations, not demos:
- First-class approvals and waits. Not bolt-ons.
- Deterministic orchestration and clear step history.
- External callbacks that safely resume paused runs.
- Versioned workflows with draft vs live targets.
- Long-running job support without keeping a step open.
- Resource handling for large artifacts, not just tiny JSON.
- Connection and secret management separate from logic.
- Agent-friendly CLI or API with stable JSON output.
- Triggers for manual, schedule, and webhook events.
How Breyta Fits This Use Case
Breyta is a workflow and agent orchestration platform for coding agents. It is built for multi-step automations, agent workflows, and autonomous jobs that need structure, state, and release control.
For human-in-the-loop patterns, Breyta provides:
- Approvals and waits. Approvals and wait steps are first-class. Flows can pause for human confirmation or external signals and resume later with state intact.
- Callback-driven resumes. Long-running work can post back to a callback URL that continues the flow.
- Remote-agent pattern. Kick off work on a VM over SSH, pause with a wait, and resume on callback. This is the documented approach for long jobs and VM-backed agents.
- Local-agent orchestration. Hand work to a local coding agent, wait, and continue when it finishes.
- Deterministic runs and history. Inspect step outputs and decisions across every run.
- Versioned flows. Iterate in draft. Release to live when approved. Runs are pinned to the resolved release at start time.
- Resource model. Persist large outputs, pass compact res:// refs between steps, and keep artifacts inspectable.
- Agent-first CLI. Commands return stable JSON for agents to parse and operate the workflow lifecycle.
- Triggers. Manual, schedule, and webhook/event triggers are supported.
- Billing nuance. Triggers, waits, and approval steps do not count as billable step executions.
Examples Breyta supports today:
- Autonomous review-and-apply flows that ask for human approval before release and promotion.
- Content operators on a VM that generate drafts, persist memory, request approval, and dispatch approved posts.
- Coding-agent runs on VMs that start detached workers, wait for callbacks, and return structured PR payloads.
Approvals, Waits, and Callbacks in a Real Flow
A simple production path looks like this:
- Trigger. Manual button, webhook from your app, or a schedule.
- Gather context. Fetch data, search, or run an LLM step.
- Propose. Prepare the diff, outbound message, or code change.
- Approval gate. Pause for a human to review. Show the exact change and impact.
- Wait state. If more work is needed, park the run until an external event or a human response arrives.
- Callback resume. Continue only when a service completes and posts results.
- Apply or abort. If approved, apply the change. If rejected, record the reason and stop.
- Notify and persist. Save artifacts as resources. Send notifications.
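The path above can be condensed into a small sketch of the approval gate itself. Everything here is illustrative: `run_flow`, the payload shape, and the decision dict are assumptions for the example, not a real platform API.

```python
# Sketch of an approval-gated flow: gather -> propose -> human gate -> apply/abort.
def run_flow(payload: dict, approve) -> dict:
    context = {"data": payload}                        # gather context
    proposal = {"change": f"update record {payload['id']}"}  # propose the change
    decision = approve(proposal)                       # approval gate: a human
                                                       # sees the exact proposal
    if decision["approved"]:
        return {"status": "applied", "change": proposal["change"]}
    return {"status": "aborted", "reason": decision.get("reason", "rejected")}

# The approver callback stands in for a human clicking approve in a UI.
result = run_flow({"id": 42}, lambda proposal: {"approved": True})
rejected = run_flow({"id": 7}, lambda p: {"approved": False, "reason": "wrong env"})
```

Note that the rejection branch records a reason rather than silently stopping; that record is what makes the run auditable later.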
What makes this durable in Breyta:
- The wait and approval steps keep state across time.
- Large artifacts are stored as resources, not pushed through every step.
- Each step is logged and inspectable in run history.
- The live version stays stable until you promote a new release.
Long-Running and VM-Backed Agents
Some agent work takes time. Example: a code agent running tests and preparing a PR.
Breyta’s remote-agent pattern:
- Start a remote process over SSH with an SSH step.
- Pause the workflow with a wait step.
- The remote worker posts back to a callback URL with results.
- The flow resumes, evaluates output, and continues to approval or publish.
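As a self-contained sketch of that pattern, a background thread can stand in for the remote VM worker. In a real setup the worker runs over SSH and POSTs structured results to a callback URL; the thread and queue here are stand-ins so the example runs on its own.

```python
# Sketch of the remote-agent pattern: start detached work, park the flow,
# resume when the worker posts back. The worker function and result shape
# are illustrative.
import queue
import threading

callback_inbox = queue.Queue()   # stand-in for the flow's callback URL

def remote_worker(post_back):
    # Stand-in for work launched on a VM, e.g. running tests and prepping a PR.
    result = {"tests": "passed", "pr": "ready"}
    post_back(result)            # worker POSTs to the callback when done

def start_and_wait() -> dict:
    worker = threading.Thread(target=remote_worker, args=(callback_inbox.put,))
    worker.start()                           # detached work begins; no SSH
                                             # connection is held open
    result = callback_inbox.get(timeout=5)   # wait step: the run parks here
    worker.join()
    return result                            # flow resumes with structured output

outcome = start_and_wait()
```

The key property is that the waiting side holds no live connection to the worker; it only holds state and a place for results to land.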
Why this helps:
- No fragile long-lived SSH step.
- Workflow state persists while heavy work runs elsewhere.
- Results come back in structured form for clean handoff.
Callback-style designs are a common approach across the industry. You will see similar pause and resume patterns in systems like Restate, and HITL approval plus callback guidance in OpenAI transport guides such as Ably’s example.
How to Evaluate HITL Operations
Before you choose, check these details:
- Approval UX. Can approvers see the exact diff or payload? Can they comment or require changes?
- Timeouts and escalation. Can waits expire and route to a backup path?
- Notifications. Are notify steps and channels supported for both approvals and results?
- Version control. Are flows versioned with a draft vs live split, and are releases immutable?
- Run pinning. Are runs pinned to the resolved release at start time?
- Concurrency policy. Can you set safe limits and retries?
- Artifacts. Can you persist large outputs and pass lightweight refs?
- Secrets. Are connections and secrets managed outside workflow logic?
- Agent tooling. Is there a stable, scriptable CLI for agents and CI?
Where Breyta Is a Strong Fit
Use Breyta when you need:
- Approval-heavy flows that must pause, resume, and record decisions.
- Long-running agent jobs that run on local machines or VMs over SSH.
- Deterministic orchestration around coding agents you already use.
- Versioned releases with draft, review, and promote.
- Reuse through templates and published apps when you want to package a working pattern.
Breyta handles execution, state, retries, and recovery for the orchestration layer. You bring your external systems, APIs, local runners, or VMs when the job needs them.
FAQ
Do I need to keep a long SSH connection open during a wait?
No. Start remote work, pause the flow with a wait step, and resume when the worker posts back. The workflow keeps state the whole time.
What is the difference between a wait and an approval?
A wait pauses for any external signal. An approval is a structured human checkpoint with an explicit accept or reject outcome.
Can I iterate fast without risking production?
Yes. Breyta splits draft and live. You push changes to draft, test runs, and promote a version to live when approved.
How are large outputs handled?
Persist them as resources and pass compact res:// references. This keeps state lean and artifacts inspectable.
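A resource-reference pattern like this can be sketched in a few lines. The store API below (`persist`, `load`) is hypothetical and the dict stands in for durable storage; only the res:// notation comes from the article.

```python
# Illustrative sketch: persist a large artifact once, pass a compact reference
# between steps instead of the full payload.
STORE: dict[str, bytes] = {}

def persist(name: str, blob: bytes) -> str:
    """Store the artifact and return a compact reference."""
    STORE[name] = blob
    return f"res://{name}"       # only this small string travels between steps

def load(ref: str) -> bytes:
    """Resolve a reference back to the stored artifact."""
    return STORE[ref.removeprefix("res://")]

ref = persist("report.html", b"<html>...</html>")
```

Steps that never open the artifact pay only for the short reference, which is what keeps run state lean.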
Do waits and approvals count as billable steps in Breyta?
Triggers, waits, and approval steps do not count as billable step executions.