Prevent Duplicate Runs: Idempotency & Retries for Resilient Workflows

By Chris Moen • Published 2026-05-11

Learn how to prevent duplicate workflow runs by combining retries with idempotency. Discover practical patterns like idempotency keys, conditional writes, and deterministic execution names to build resilient workflows with exactly-once effects.


The short answer

You prevent duplicate runs by pairing retries with idempotency. Use an idempotency key per logical operation, gate writes with conditions, and make execution names deterministic. Add a dedupe store and verify state before side effects. These patterns turn at-least-once delivery into exactly-once effects at the boundary.

What this means in practice

  • Retry: You re-attempt a request when the first attempt times out or fails. This is normal in real systems: retries are unavoidable, and idempotency is what makes them safe.
  • Idempotency: The same operation, sent once or many times, produces one effect. You achieve this with application logic, not only transport rules.
  • Duplicate run: The same trigger or request kicks off the same workflow more than once. This happens with flaky networks, webhook retries, or client-side resubmits.

See examples that echo these ideas:

  • Idempotency keys stop duplicate calls, and conditional writes stop duplicate effects. A Medium article on 7 patterns describes both keys and CAS writes.
  • Deterministic execution names reduce duplicates in managed runtimes, as the AWS Lambda durable execution docs describe.

Why it matters for production workflows

  • Networks drop responses. Clients retry and you may get two creates, two emails, or two charges if you do not guard the write.
  • Webhooks are at-least-once by design. Many providers resend events. Without dedupe, a single user action can create multiple runs.
  • Long-running jobs amplify risk. If a step times out, a retry can fork the flow unless you fence it with state checks.

These failure modes are common in distributed systems. Practical guidance on retry-safe orchestration appears in posts like Orkes on retry safety for workflows and Sketech on idempotency and circuit breakers.

Core patterns that prevent duplicate runs

Group 1. Identify the operation

  • Idempotency key. Generate a unique key for the logical operation. Carry it through the request, store, and response. If you see the same key again, return the recorded result instead of doing work. See the Medium piece on idempotency keys for agents.
  • Deterministic IDs. Derive the workflow execution ID from the external event ID or a stable composite, not a random UUID. This lets the platform treat duplicates as the same run when supported. AWS durable execution docs highlight this idea for Lambda. Cookbook advice from workflow tools says the same.
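
The two ideas above can be sketched in a few lines of Python. This is a minimal illustration, not any platform's API: the namespace URL, the `execution_id` helper, and the `IdempotentRunner` class are hypothetical names, and the in-memory dict stands in for a durable store.

```python
import uuid

# Hypothetical namespace for this application; any fixed UUID works.
WORKFLOW_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/workflows")


def execution_id(source: str, event_id: str) -> str:
    """Derive a stable execution ID from the external event, not a random UUID."""
    return str(uuid.uuid5(WORKFLOW_NAMESPACE, f"{source}:{event_id}"))


class IdempotentRunner:
    """Runs an operation once per key and replays the recorded result afterwards."""

    def __init__(self):
        self._results = {}  # in-memory stand-in for a durable store

    def run(self, key, operation):
        if key in self._results:
            return self._results[key]  # seen before: return the recorded result
        result = operation()
        self._results[key] = result
        return result
```

The check and the write here are not atomic, so this sketch is only safe for a single worker. With concurrent workers, the claim on the key would need to be a conditional write in the backing store.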

Group 2. Prevent duplicate effects

  • Conditional writes (CAS). Make writes conditional so only the first attempt can land. The Medium patterns article calls out CAS as a core guard. Example patterns:
      • Insert-if-not-exists on a unique key.
      • Update with a known version. If the version changed, reject and return the stored result.
      • Upsert with a status transition that only accepts valid previous states.
  • Inbox table per consumer. Record the event or idempotency key before any side effect. If it already exists, short-circuit.
  • Outbox for producers. Write the side effect to an outbox then deliver once. Retries re-deliver the same message with the same key.
  • Dedupe window. Keep a short TTL record of recent keys to absorb close-in duplicates caused by client retries or webhook backoffs.
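
As a rough sketch of the conditional-write patterns above, the following uses Python's built-in `sqlite3` module. The `ops` table, its columns, and the `claim` and `transition` helpers are assumptions made for illustration; the same shape applies to an inbox table keyed by event ID.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory table with a unique key and a version column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ops ("
        " key TEXT PRIMARY KEY,"
        " status TEXT NOT NULL,"
        " version INTEGER NOT NULL)"
    )
    return conn


def claim(conn: sqlite3.Connection, key: str) -> bool:
    """Insert-if-not-exists: True only for the first attempt with this key."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO ops (key, status, version) VALUES (?, 'created', 1)",
        (key,),
    )
    conn.commit()
    return cur.rowcount == 1


def transition(conn: sqlite3.Connection, key: str, expected_version: int, new_status: str) -> bool:
    """Versioned update: lands only if the row still has the version we read."""
    cur = conn.execute(
        "UPDATE ops SET status = ?, version = version + 1"
        " WHERE key = ? AND version = ?",
        (new_status, key, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1
```

A caller that gets `False` from either helper should read the stored row and return its result instead of repeating the side effect.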

Group 3. Make retries safe end to end

  • Check-before-send on external calls. Many APIs support idempotency keys. Use them. If not, protect calls with your own dedupe around a stable key.
  • Read-your-own-results. If the first call probably succeeded but the response was lost, fetch by key before sending again.
  • Notifications and emails. Include a stable message key. Providers often expose native dedupe or replacement by key.
  • Stateful steps. Persist progress after each durable step. On retry, skip completed steps and continue.
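
The "stateful steps" idea can be illustrated with a small step runner. This is a sketch under stated assumptions: `run_steps` is a hypothetical helper, and the `run_state` dict stands in for storage that survives across retries.

```python
def run_steps(run_state: dict, steps):
    """Run (name, fn) steps in order, skipping any already recorded in run_state.

    run_state stands in for durable storage; it must survive across retries.
    """
    for name, fn in steps:
        if name in run_state:
            continue  # completed on an earlier attempt, so do not repeat it
        run_state[name] = fn(run_state)  # record the output before moving on
    return run_state
```

If a step raises, the outputs of earlier steps remain in `run_state`, so calling `run_steps` again with the same state continues from the failed step.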

Group 4. Long-running jobs and callbacks

  • Token the handoff. When you start a remote job, mint a stable job token. Use it for the callback URL and for storage. If the kick-off step retries, the remote system sees the same token and does not start a new job.
  • Resume by key. When the callback arrives, search by token and resume a single waiting run. Discard late duplicates.

Community threads on webhook dedupe in workflow tools reflect this pattern.
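
A minimal sketch of the token handoff and resume-by-key steps might look like this. The `job_token` helper and the `RemoteJobs` and `WaitingRuns` classes are hypothetical stand-ins for the remote system and the orchestrator's wait table, not a real API.

```python
import hashlib


def job_token(run_id: str, step: str) -> str:
    """Stable token: the same run and step always yield the same token."""
    return hashlib.sha256(f"{run_id}:{step}".encode()).hexdigest()[:32]


class RemoteJobs:
    """Stand-in for the remote system; starts at most one job per token."""

    def __init__(self):
        self.started = {}

    def start(self, token: str, payload: dict) -> bool:
        if token in self.started:
            return False  # retried kick-off: the job already exists
        self.started[token] = payload
        return True


class WaitingRuns:
    """Tracks runs paused on a callback, keyed by job token."""

    def __init__(self):
        self._waiting = {}

    def wait(self, token: str, run_id: str) -> None:
        self._waiting.setdefault(token, run_id)

    def resume(self, token: str):
        """Return the run to resume, or None for an unknown or late duplicate callback."""
        return self._waiting.pop(token, None)
```

Because `resume` removes the token on first use, a second callback with the same token finds nothing to resume and can be discarded.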

Group 5. Conflict handling

  • Mismatched second request. If the same idempotency key appears with different payload, pick a rule. Many teams reject with 409 Conflict and return the original parameters. Others treat it as a new operation under a new key. The important part is to be explicit.
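
One way to make the conflict rule explicit is to store a fingerprint of the payload next to the key. The sketch below implements the "reject with 409" option; `fingerprint` and `KeyedRequests` are illustrative names, and the dict stands in for a durable store.

```python
import hashlib
import json


def fingerprint(payload: dict) -> str:
    """Canonical hash of the payload so that field order does not matter."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


class KeyedRequests:
    def __init__(self):
        self._seen = {}  # key -> (fingerprint, original payload, stored response)

    def handle(self, key: str, payload: dict, operation):
        """Return (http_status, body) for a request carrying an idempotency key."""
        fp = fingerprint(payload)
        if key in self._seen:
            stored_fp, original, response = self._seen[key]
            if stored_fp != fp:
                # Same key, different payload: reject and show the original parameters.
                return 409, {"error": "idempotency key reused with a different payload",
                             "original": original}
            return 200, response  # true duplicate: replay the stored response
        response = operation(payload)
        self._seen[key] = (fp, payload, response)
        return 200, response
```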

What teams should look for

  • Deterministic execution. Ability to pin runs to a stable ID.
  • Concurrency controls. A way to limit parallelism and prevent overlapping runs for a key.
  • Durable state. Step outputs and status stored durably so retries can skip already-completed work.
  • Native steps for checks. Easy reads and conditional writes against a KV, DB, or API.
  • Webhook and schedule triggers. With tooling to correlate duplicate events to the same run.
  • Long-running support. Waits and external callbacks with stable tokens and resumes.
  • Clear run history. Inspect what happened, when, and with which key to debug issues.

How Breyta fits this use case

Breyta is a workflow and agent orchestration platform for coding agents. It runs multi-step flows with deterministic execution, explicit state, and versioned releases. Here is how it supports retry and idempotency patterns:

  • Deterministic orchestration inside the flow. Flows are versioned and runs are pinned to a resolved release at start. That keeps behavior stable while you debug duplicates.
  • Triggers with control. Use manual, schedule, or webhook triggers. For event triggers, derive or pass a stable execution identifier from the event so replays target the same run.
  • Concurrency policy. Each flow has an explicit concurrency policy. Use it to prevent overlapping runs for a given key or scope.
  • Durable steps and state. You can persist outputs, use resources for large artifacts, and inspect step results in run history. Retries can skip work that already completed.
  • Waits, approvals, and callbacks. Use wait steps and approvals to fence final side effects. The long-running remote-agent pattern lets you kick off work over SSH, pause with a wait, then resume on callback. This reduces timeout-driven duplicate starts.
  • Data guards with built-in steps. Use :kv or :db steps to implement an inbox table or a “check-and-set” write. First write claims the key. Later retries read and return the stored result.
  • Secret and connection boundaries. Flows reference connections instead of raw secrets. That keeps dedupe logic and keys cleanly separated from credential handling.
  • Agent-first CLI. The CLI returns stable JSON, so your coding agent can set keys, check state, and control retries reliably.

Where Breyta fits best:

  • Use Breyta as the workflow layer around your coding agent.
  • Orchestrate local runs, VM-backed agents over SSH, and approval-heavy flows with clear state and history.

Practical blueprints

  • Single-charge API
      • Generate an idempotency key on the client.
      • Start a Breyta flow with that key.
      • First step writes the key to a KV with created status using CAS semantics.
      • Side-effect step charges the card and persists a resource ref to the receipt.
      • On retry, read status by key and return the existing receipt. See DZone’s overview of idempotency for the double-charge risk context.
  • Webhook-driven sync
      • Use the provider’s event ID as the flow execution ID.
      • First step checks an inbox table for that event ID.
      • Process only if not seen. Upsert the record with a state transition.
      • Emit notifications after state confirms the transition.
      • If the webhook retries, the inbox short-circuits. Threads like the n8n community discuss this exact pitfall.
  • Long-running code generation on a VM
      • Flow kicks off a remote agent over SSH with a stable job token.
      • Flow pauses on a wait step.
      • Remote agent calls back with the token. The flow resumes, verifies token state, and requests human approval before applying changes.
      • If the kick-off step retries, the remote side sees the same token and refuses to duplicate the job.
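
To make the single-charge API blueprint concrete, here is a generic Python sketch of its control flow. It does not use Breyta's step syntax. `FakeGateway` is a hypothetical payment provider that dedupes on the key it receives, and the `store` dict stands in for the KV, with `setdefault` playing the role of the CAS write.

```python
class FakeGateway:
    """Hypothetical payment provider that dedupes on the key it receives."""

    def __init__(self):
        self.charges = {}

    def charge(self, key: str, amount: int) -> dict:
        # The provider records at most one charge per key.
        self.charges.setdefault(key, {"receipt": f"rcpt-{key}", "amount": amount})
        return self.charges[key]


def charge_once(store: dict, gateway, key: str, amount: int) -> str:
    """Charge at most once per key and return the receipt reference."""
    # Claim the key with 'created' status; later attempts get the same record.
    record = store.setdefault(key, {"status": "created", "receipt": None})
    if record["status"] == "charged":
        return record["receipt"]  # retry after success: return the existing receipt
    result = gateway.charge(key, amount)  # the same key goes out on every attempt
    record["receipt"] = result["receipt"]
    record["status"] = "charged"
    return record["receipt"]
```

If the first attempt charges the card but loses the response, the record stays in `created` status. The retry then sends the same key, the provider returns the original charge, and the receipt is persisted once.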

Implementation tips

  • Choose your key format early. Include tenant, operation, and a stable natural key.
  • Bound your dedupe window. Keep a TTL if natural keys can repeat over time.
  • Log and expose the key. Make it easy to trace across systems and runs.
  • Decide the policy for conflicting payloads. Reject with 409 or treat as new. Stay consistent.
  • Push side effects late. Validate, persist intent, then act. Retries will be safer.
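
The first two tips can be sketched together: a composite key format and a bounded dedupe window. The `make_key` layout and the `DedupeWindow` class are assumptions for illustration, and the injected clock exists only to make the TTL behavior easy to exercise.

```python
import time


def make_key(tenant: str, operation: str, natural_key: str) -> str:
    """Composite key: tenant, operation, and a stable natural key."""
    return f"{tenant}:{operation}:{natural_key}"


class DedupeWindow:
    """Remembers keys for ttl_seconds; in-memory stand-in for a TTL store."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires = {}  # key -> expiry time

    def first_seen(self, key: str) -> bool:
        """True if the key is new within the window, and records it."""
        now = self._clock()
        # Drop expired entries so the store stays bounded.
        self._expires = {k: t for k, t in self._expires.items() if t > now}
        if key in self._expires:
            return False
        self._expires[key] = now + self._ttl
        return True
```

In production the same behavior usually comes from a store with native key expiry, so the pruning loop here is only a placeholder for that.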

FAQ

What if I cannot control the client to send an idempotency key?

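Derive a deterministic key from the payload or source event. Combine stable fields like event ID and user ID, and add a timestamp bucket only when nothing more stable exists, because a retry that crosses a bucket boundary produces a different key. Store the key and compare it on each request. If a natural key exists in your domain, prefer that.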

How do I handle steps that are not idempotent by nature?

Wrap them with checks. Before calling a non-idempotent API, read by key to see if a prior call completed. If the provider supports its own key, send it. If not, store your own record and only call once per key.

How do I balance throughput with dedupe?

Use a per-key concurrency fence. Allow broad parallelism across keys, but serialize work for the same key. Breyta’s explicit concurrency policy helps you make that trade-off per flow.
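
For intuition, a per-key fence inside a single process can be sketched with one lock per key. This is not Breyta's concurrency policy, only an illustration of the trade-off; `KeyFence` is a hypothetical name, and a multi-worker deployment would need the fence in a shared store instead.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager


class KeyFence:
    """Serializes work for the same key while letting different keys run in parallel."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the lock table itself
        self._locks = defaultdict(threading.Lock)

    @contextmanager
    def hold(self, key: str):
        with self._guard:
            lock = self._locks[key]  # one lock per key, created on first use
        with lock:
            yield
```

A worker wraps its per-key work in `with fence.hold(key):`. This sketch never removes locks for idle keys, so a long-lived process with many distinct keys would need a cleanup policy.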

Do I still need retries if I build idempotency?

Yes. Retries and idempotency complement each other. Retries mask transient failures, and idempotency prevents duplicate effects when those retries occur.

Where should I store large artifacts for replay?

Persist large outputs as resources and pass compact references downstream. Breyta supports resource refs so runs stay inspectable without bloating state.

Closing thought

You cannot stop retries. You can make them safe. Use keys, conditional writes, deterministic IDs, and durable waits. Then use Breyta’s structure to run those patterns with clear history, approvals, and reliable orchestration.