Achieving Idempotency: Safe Retries for Robust Workflows

By Chris Moen • Published 2026-03-24

Discover how to implement safe retries and strict idempotency in your workflows to prevent duplicate charges, emails, and records, ensuring system reliability.


Reliable workflows need safe retries and strict idempotency. The goal is simple: let failures retry without charging twice, sending two emails, or creating duplicate records. Use stable keys, conditional writes, and clear gates around side effects.

What it means in practice

  • Retry is the act of running a step again after a failure or timeout.
  • Idempotency means running an operation multiple times produces the same effect as running it once. It is how you make retries safe. For a practical overview of how it prevents duplicate actions, see Idempotency: Preventing Double Charges and Duplicate Actions.
  • In real systems you face slow networks, event retries, and user double-clicks. Each can fire the same workflow more than once.
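To make the distinction concrete, here is a minimal Python sketch. The Account record and both functions are hypothetical, chosen only to contrast an operation that sets state (idempotent) with one that appends to it (not idempotent) when the same step runs three times.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    # Hypothetical record used only for illustration.
    email_verified: bool = False
    charges: list[int] = field(default_factory=list)


def mark_verified(acct: Account) -> None:
    # Idempotent: assigning a fixed value has the same effect on every run.
    acct.email_verified = True


def add_charge(acct: Account, amount_cents: int) -> None:
    # Not idempotent: each call appends another charge.
    acct.charges.append(amount_cents)


acct = Account()
for _ in range(3):  # the same step, run once and then retried twice
    mark_verified(acct)
    add_charge(acct, 500)

print(acct.email_verified)  # True
print(acct.charges)  # [500, 500, 500]
```

Retrying mark_verified is harmless. Retrying add_charge produces exactly the "two charges for one order" symptom.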

Common duplicate symptoms:

  • Two charges for one order
  • Two welcome emails
  • Two rows for the same entity
  • The same job applied twice to a codebase

Why it matters for production workflows

Duplicates happen even when your code is correct.

  • Webhooks are often delivered with at-least-once semantics, so a provider may replay the same event if it thinks your system failed. See this n8n discussion about Stripe retries and at-least-once delivery.
  • Browsers resend on refresh or back. Users double-click when UIs lag.
  • Distributed systems add transient failures. Retries are required for resiliency, but without idempotency they repeat side effects.
  • Orchestrators that handle failures need a retry-safe plan. The Orkes blog walks through retry-safe distributed workflow patterns.

Patterns that prevent duplicates

Use layers that catch duplicates before they do harm. Combine several for defense in depth.

Edge and trigger patterns:

  • Stable idempotency keys. Use a unique request ID from the client or the event’s own ID. For Stripe webhooks, the event ID makes a solid key as noted in the n8n thread.
  • Dedupe log at the boundary. Record seen keys early. Drop repeats before any side effect.
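A minimal sketch of a dedupe log at the webhook boundary, assuming the provider sends a unique event ID in an id field. The DedupeLog class, the handler factory, and the event shape are hypothetical, and the in-memory set stands in for a durable store.

```python
from typing import Callable


class DedupeLog:
    """Remembers keys already accepted. In-memory stand-in for a durable store."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def first_time(self, key: str) -> bool:
        # True only the first time a key is recorded (single-threaded sketch).
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


def make_webhook_handler(
    log: DedupeLog, process: Callable[[dict], None]
) -> Callable[[dict], str]:
    def handle(event: dict) -> str:
        key = event["id"]  # the provider's own event ID is the stable key
        if not log.first_time(key):
            # Acknowledge without side effects so the provider stops retrying.
            return "duplicate: ignored"
        process(event)
        return "processed"

    return handle


sent_emails: list[str] = []
handler = make_webhook_handler(
    DedupeLog(), lambda evt: sent_emails.append(evt["data"]["email"])
)

# Hypothetical event; the provider delivers it twice (at-least-once).
event = {"id": "evt_123", "data": {"email": "a@example.com"}}
print(handler(event))  # processed
print(handler(event))  # duplicate: ignored
print(sent_emails)  # ['a@example.com']
```

Recording the key before process runs means a crash mid-processing would cause the replay to be dropped. A common refinement is to store a "processing" status first and mark the key "done" only after the side effect succeeds.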

Pre-side-effect gates:

  • Check and set in one step. Use an atomic check-and-set in a KV store or database before you write, charge, or send.
  • Conditional writes and unique constraints. Make the database reject duplicates with unique indexes. Use conditional updates or compare-and-swap so a second run does nothing. See the note on conditional writes and CAS in 7 patterns that make agent retries idempotent, not duplicative.
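Both gates can be sketched with Python's built-in sqlite3 module. This is an illustration under assumptions: the processed_keys and orders tables, the version column, and the IDs are hypothetical. The primary key turns a second claim of the same key into a no-op, and the versioned UPDATE is a compare-and-swap that applies only if the row is unchanged since it was read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE processed_keys (key TEXT PRIMARY KEY);
    CREATE TABLE orders (
        id TEXT PRIMARY KEY,
        status TEXT NOT NULL,
        version INTEGER NOT NULL
    );
    INSERT INTO orders (id, status, version) VALUES ('ord_1', 'pending', 1);
    """
)


def claim(key: str) -> bool:
    # Atomic check-and-set: the PRIMARY KEY makes a repeated insert a no-op,
    # and rowcount reports whether this call inserted the row.
    with conn:
        cur = conn.execute("INSERT OR IGNORE INTO processed_keys (key) VALUES (?)", (key,))
    return cur.rowcount == 1


def mark_paid(order_id: str, expected_version: int) -> bool:
    # Compare-and-swap: update only if the version is still the one we read.
    # A replay carrying the old version matches zero rows.
    with conn:
        cur = conn.execute(
            "UPDATE orders SET status = 'paid', version = version + 1 "
            "WHERE id = ? AND version = ?",
            (order_id, expected_version),
        )
    return cur.rowcount == 1


print(claim("evt_123"), claim("evt_123"))  # True False
print(mark_paid("ord_1", 1), mark_paid("ord_1", 1))  # True False
print(conn.execute("SELECT status, version FROM orders").fetchone())  # ('paid', 2)
```

Checking rowcount is what turns the database's rejection into a decision: a result of 0 means another run already did the work, so the caller skips the side effect.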

Action-level safety:

  • Pass idempotency keys to external APIs. Many APIs accept an Idempotency-Key header or a request-id field. Stripe is a common example highlighted in the n8n thread.
  • Inbox or outbox patterns. Persist the intent alongside your state change (outbox) so it is not lost. If an event must take effect once downstream, have the consumer store a processed key and skip repeats (inbox).
  • Leases and locks for critical regions. Short leases prevent two workers from doing the same job in parallel.
  • Saga and compensation. If a later step fails, roll back or apply a compensating action rather than repeating a side effect. Covered in the Medium patterns reference.
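The sketch below simulates why passing the key matters. FakePaymentAPI is a hypothetical stand-in that remembers responses by Idempotency-Key header, loosely modeled on how such APIs behave; real providers differ in key scope, expiry, and error handling. It creates the charge, then loses the response, as a timeout would.

```python
class FakePaymentAPI:
    """Hypothetical processor that replays saved responses per Idempotency-Key."""

    def __init__(self) -> None:
        self.charges: list[dict] = []
        self._responses: dict[str, dict] = {}
        self.lose_next_response = False  # simulate a timeout after charging

    def create_charge(self, amount: int, headers: dict[str, str]) -> dict:
        key = headers.get("Idempotency-Key")
        if key is not None and key in self._responses:
            return self._responses[key]  # repeat request: return the saved result
        charge = {"id": f"ch_{len(self.charges) + 1}", "amount": amount}
        self.charges.append(charge)
        if key is not None:
            self._responses[key] = charge
        if self.lose_next_response:
            self.lose_next_response = False
            raise TimeoutError("charge succeeded but the response was lost")
        return charge


def charge_with_retries(
    api: FakePaymentAPI, amount: int, key: str, attempts: int = 3
) -> dict:
    # The same key goes out on every attempt, so a retry cannot charge twice.
    for attempt in range(1, attempts + 1):
        try:
            return api.create_charge(amount, headers={"Idempotency-Key": key})
        except TimeoutError:
            if attempt == attempts:
                raise
    raise ValueError("attempts must be at least 1")


api = FakePaymentAPI()
api.lose_next_response = True
# Hypothetical: the webhook event ID doubles as the idempotency key.
result = charge_with_retries(api, 500, key="evt_123")
print(result)  # {'id': 'ch_1', 'amount': 500}
print(len(api.charges))  # 1
```

Without the header, the same lost response followed by a retry would create a second charge. Choosing the key once per logical operation, outside the retry loop, is what makes this work.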

Retry policy design:

  • Backoff and jitter. Space out retries to reduce contention.
  • Give up or escalate after a cap. Do not retry forever on permanent errors.
  • Choose blocking or non-blocking retries per use case. See Blocking vs. Non-Blocking Retry Patterns in event-driven commerce for tradeoffs.
  • Expect unexpected retries from callers and platforms. Microsoft’s guidance on Logic Apps notes using unique request IDs for safe handling when retries occur.
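A minimal sketch of such a policy in Python: exponential backoff with full jitter, a cap on attempts, and immediate failure for errors treated as permanent. The function name, the defaults, and the choice of ConnectionError and TimeoutError as the retryable set are assumptions to adapt, not recommendations from the sources above.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    op: Callable[[], T],
    *,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        # Exceptions outside retry_on are treated as permanent and propagate at once.
        try:
            return op()
        except retry_on:
            if attempt == max_attempts:
                raise  # cap reached: give up and let the caller escalate
            # Full jitter: wait a random time up to an exponentially growing ceiling.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")


calls = {"n": 0}


def flaky() -> str:
    # Fails twice with a transient error, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"


delays: list[float] = []
print(retry_with_backoff(flaky, sleep=delays.append))  # ok
print(calls["n"], len(delays))  # 3 2
```

Passing sleep as a parameter keeps the sketch testable. Backoff only controls when a retry happens; whether the retry is safe still depends on the operation being idempotent.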

How Breyta fits this use case

Breyta is a workflow and agent orchestration platform for coding agents. It is built for reliable, multi-step automations and long-running jobs with approvals, waits, and clear run history. Here is how Breyta supports retry-safe, idempotent execution:

  • Deterministic runtime and run history. You can inspect step outputs and decisions for every run. That makes duplicate diagnosis straightforward.
  • Versioned flows and releases. Draft vs live keeps behavior stable. Retries hit the same released logic, not today’s edits.
  • Concurrency policy at the flow level. You can control overlap for a flow so runs do not step on each other.
  • First-class triggers. Webhook, schedule, and manual triggers give a clean boundary to capture external ids and enforce dedupe early.
  • Step primitives that implement the patterns:
      • :kv or :db for check-and-set dedupe keys at the start of a run.
      • :http to pass idempotency keys to external APIs.
      • :wait with external callbacks to coordinate long-running work without redoing steps.
      • :ssh to start remote agents, then resume only once when the callback arrives.
      • :notify and approvals to add human gates before irreversible actions.
  • Long-running agent pattern. Kick off remote work, pause with :wait, and resume exactly once on callback. This avoids keeping a risky long-lived connection open and prevents accidental re-execution.
  • Resource refs for large artifacts. Breyta persists outputs as resources with compact refs. That keeps state inspectable and prevents rework loops tied to oversized payloads.
  • Agent-first CLI. Agents can script flows and read stable JSON output. That makes building dedupe checks and approval gates reproducible.


A concrete flow sketch in Breyta

Example shape for a webhook-triggered charge plus follow-ups:

  • Trigger: webhook receives event with event_id.
  • Dedupe gate: :kv or :db step does an atomic check-and-set with event_id. If seen, exit.
  • Payment: :http call to the processor. Include the event_id as an idempotency key if the API supports it.
  • Order write: :db step with a unique constraint or conditional upsert.
  • Notify: :notify step or approval if human confirmation is needed.
  • Long-running job: :ssh to start a remote worker if needed, then :wait for its callback token.
  • Finalize: mark completion, persist artifacts as resources, and return a compact result.

With these gates in place, failed steps can be retried without repeating the charge or the order write. The dedupe gate and conditional writes prevent repeat writes, the idempotency key protects the external charge, and the wait-and-callback pattern avoids duplicate resumes.
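The flow sketch above can be approximated in plain Python to see each layer at work. This is not Breyta syntax: it is a hypothetical sketch that uses sqlite3 for the dedupe gate, the order write, and an outbox-style notice table, plus a fake processor keyed by idempotency key. Every step loses its first response after its side effect has happened, and a simple retry helper re-runs it.

```python
import sqlite3
from typing import Callable, TypeVar

T = TypeVar("T")

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE seen_events (event_id TEXT PRIMARY KEY);
    CREATE TABLE orders (order_id TEXT PRIMARY KEY, charge_id TEXT NOT NULL);
    CREATE TABLE outbox (event_id TEXT PRIMARY KEY, message TEXT NOT NULL);
    """
)
processor: dict[str, dict] = {}  # fake processor state: idempotency key -> charge
lose_first_response = {"pay", "write_order", "notify"}  # simulated timeouts


def maybe_time_out(step: str) -> None:
    # Simulate a timeout *after* the side effect, the worst case for retries.
    if step in lose_first_response:
        lose_first_response.discard(step)
        raise TimeoutError(f"{step}: response lost")


def with_retry(step: Callable[[], T], attempts: int = 3) -> T:
    for attempt in range(1, attempts + 1):
        try:
            return step()
        except TimeoutError:
            if attempt == attempts:
                raise
    raise ValueError("attempts must be at least 1")


def dedupe_gate(event_id: str) -> bool:
    # Atomic check-and-set: only the first run for an event ID gets True.
    with db:
        cur = db.execute("INSERT OR IGNORE INTO seen_events VALUES (?)", (event_id,))
    return cur.rowcount == 1


def pay(key: str, amount: int) -> str:
    # A repeated idempotency key returns the original charge.
    charge = processor.setdefault(key, {"id": f"ch_{len(processor) + 1}", "amount": amount})
    maybe_time_out("pay")
    return charge["id"]


def write_order(order_id: str, charge_id: str) -> None:
    # The primary key turns a repeated insert into a no-op.
    with db:
        db.execute("INSERT OR IGNORE INTO orders VALUES (?, ?)", (order_id, charge_id))
    maybe_time_out("write_order")


def notify(event_id: str, message: str) -> None:
    # Outbox-style: record the notice once; a separate sender would deliver it.
    with db:
        db.execute("INSERT OR IGNORE INTO outbox VALUES (?, ?)", (event_id, message))
    maybe_time_out("notify")


def run_flow(event: dict) -> str:
    if not dedupe_gate(event["id"]):
        return "skipped"  # replayed webhook: exit before any side effect
    charge_id = with_retry(lambda: pay(event["id"], event["amount"]))
    with_retry(lambda: write_order(event["order_id"], charge_id))
    with_retry(lambda: notify(event["id"], f"order {event['order_id']} paid"))
    return "completed"


event = {"id": "evt_123", "order_id": "ord_1", "amount": 500}  # hypothetical
print(run_flow(event))  # completed (every step timed out once and was retried)
print(run_flow(event))  # skipped (webhook replay)
print(len(processor), db.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 1 1
```

In this sketch the gate records the event before the steps run, so a run that exhausts its retries has to be resumed at the step level rather than re-triggered by a webhook replay.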

What teams should look for

Choose tools and patterns that let you:

  • Attach a stable idempotency key to every request and event
  • Check and set dedupe state atomically before side effects
  • Pass idempotency keys to external APIs at the call site
  • Enforce uniqueness and conditional writes in your data layer
  • Add human approvals where business risk is high
  • Orchestrate long-running work with pause and resume, not long-held connections
  • Inspect run history to prove a duplicate was blocked
  • Control concurrency so runs do not overlap in unsafe ways

FAQ

Are retries and idempotency the same?

No. Retries help with transient failures. Idempotency makes retries safe by ensuring repeated runs do not repeat side effects.

What should I use as an idempotency key?

Use a stable unique value. Examples include a client-generated UUID, a payment intent id, or a webhook event id. The n8n thread shows using a Stripe event id as the key.
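One way to get a stable key without storing it first is to derive it deterministically. This Python sketch uses uuid.uuid5, which hashes a namespace and a name, so the same order always maps to the same key. The namespace UUID and the "charge:" prefix are hypothetical choices.

```python
import uuid

# Hypothetical namespace for this app's keys; any fixed UUID works.
KEY_NAMESPACE = uuid.UUID("6f1c2d3e-8a4b-4c5d-9e6f-0a1b2c3d4e5f")


def charge_key(order_id: str) -> str:
    # uuid5 hashes (namespace, name), so the same order yields the same key
    # on every retry and on every worker.
    return str(uuid.uuid5(KEY_NAMESPACE, f"charge:{order_id}"))


# Anti-pattern: calling uuid.uuid4() inside the retry loop would produce a new
# key per attempt, and the receiver would treat each retry as a new request.

print(charge_key("ord_1") == charge_key("ord_1"))  # True
print(charge_key("ord_1") == charge_key("ord_2"))  # False
```

If you use a client-generated random UUID instead, generate it once per logical operation and persist it with the request, so every retry reuses it.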

Where should I store dedupe state?

Use a durable store that supports atomic check-and-set. A database with a unique index or a KV store with set-if-not-exists both work. In Breyta, use :db or :kv steps to implement this gate near the start of the flow.
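A toy Python version of set-if-not-exists with expiry, to show the semantics. InMemoryKV is hypothetical and not durable; the lock is what makes check-then-set atomic, so when eight threads race on the same event ID, exactly one wins. The 24-hour TTL is an assumption: keep keys at least as long as retries for that key can still arrive.

```python
import threading
import time
from typing import Callable


class InMemoryKV:
    """Toy KV store with set-if-absent and expiry. Not durable."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._lock = threading.Lock()
        self._data: dict[str, tuple[str, float]] = {}  # key -> (value, expires_at)
        self._clock = clock

    def set_if_absent(self, key: str, value: str, ttl_seconds: float) -> bool:
        # Holding the lock makes "check, then set" a single atomic step.
        with self._lock:
            now = self._clock()
            existing = self._data.get(key)
            if existing is not None and existing[1] > now:
                return False  # key is present and not yet expired
            self._data[key] = (value, now + ttl_seconds)
            return True


kv = InMemoryKV()
winners: list[int] = []


def worker(n: int) -> None:
    if kv.set_if_absent("evt_123", f"worker-{n}", ttl_seconds=86_400):
        winners.append(n)


threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(winners))  # 1
```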

What if a platform retries without telling me?

Assume it will. Add keys and gates at your boundary. See the Azure note on unexpected retries and unique request IDs for why this is necessary.

Summary

Retries keep systems robust. Idempotency keeps them clean. Use stable keys, atomic gates, conditional writes, and safe retry policies to prevent duplicate workflow runs. Breyta gives you the workflow structure, primitives, waits, approvals, and run history to put these patterns into production with your coding agent.