Retries and Idempotency: Building Resilient and Reliable Workflows

By Chris Moen • Published 2026-04-29

Learn how retries and idempotency work together to prevent duplicate actions and ensure consistent outcomes in your production workflows. Discover proven patterns and practical tips.


Retries reattempt work when calls fail or time out. Idempotency makes those retries safe so a repeated call produces the same outcome rather than a duplicate. Use both together to prevent double work and duplicate runs.

What these terms mean in practice

  • Retries: You rerun a step after a failure or timeout. That same step can run more than once. Without safeguards, it can repeat side effects like a second charge or a duplicate insert. The AWS Durable Execution guide notes that replay and retry can repeat side effects if you do not design for it (AWS).
  • Idempotency: A request or step can run once or many times and yield the same result. No extra charge. No extra record. This is the core idea behind preventing double actions (DZone).

Why this matters for production workflows

  • Networks fail. Clients retry. Workers restart.
  • Event delivery can be at-least-once.
  • Human-in-the-loop steps can bounce back for edits.
  • Without idempotency, you risk:
      • double charges and refunds
      • duplicate records and messages
      • spammy notifications
      • wasted spend on APIs and compute
  • With idempotency, retries become safe. You keep availability high and outcomes consistent.

Proven patterns to prevent duplicate runs

You rarely need only one. Mix patterns that fit your flow.

  • Idempotency keys on write requests
      • Accept a client-supplied key for each business action.
      • On duplicate keys, return the cached result instead of doing the work again.
      • This is the standard API approach (Zuplo guide, OneUptime Python).
  • Deterministic workflow IDs
      • Derive the workflow or run ID from the event’s unique ID.
      • If the same event arrives twice, you detect it and avoid starting a second run (useworkflow.dev).
  • Dedupe logs and the outbox pattern
      • Keep a log keyed by a business identifier. Record completed side effects.
      • Before each side effect, check the log. If present, skip.
      • Use an outbox to persist intended effects once, then deliver with retries.
      • These are common agent and integration patterns (Medium patterns).
  • Leases and single-flight locks
      • Acquire a short lease per key before a side effect.
      • If you cannot get the lease, another worker is handling it. Back off and retry later.
      • Helps with concurrent duplicate triggers (Medium patterns).
  • Sagas with compensating actions
      • Break big writes into steps. If a later step fails, run a compensating step to undo prior effects.
      • This contains damage when exactly-once is not possible in the stack (Medium patterns, Orkes overview).
  • Safe tool design
      • Expose idempotent operations where possible. Reads and upserts are safer than inserts without keys.
      • Make non-idempotent tools accept a key and return the same result on repeats (Medium patterns).
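
The dedupe log and outbox items above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under stated assumptions, not a production design: the table names (orders, outbox), the place_order and relay helpers, and the flaky send function are all hypothetical. The order and its intended effect are written in one transaction, and the relay marks a row as sent only after delivery succeeds, so delivery is at-least-once and the receiver dedupes on the message ID.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (order_id TEXT PRIMARY KEY, total_cents INTEGER);
    CREATE TABLE outbox (msg_id TEXT PRIMARY KEY, payload TEXT, sent INTEGER DEFAULT 0);
""")

def place_order(order_id: str, total_cents: int) -> bool:
    """Write the order and its intended effect in one transaction.

    Returns False if the order already exists, so a retried call is a no-op.
    """
    try:
        with db:  # commits on success, rolls back on exception
            db.execute("INSERT INTO orders VALUES (?, ?)", (order_id, total_cents))
            db.execute(
                "INSERT INTO outbox (msg_id, payload) VALUES (?, ?)",
                (f"order-confirmed:{order_id}", f"Order {order_id} confirmed"),
            )
        return True
    except sqlite3.IntegrityError:
        return False

received: list[str] = []     # what the downstream system actually processed
seen_ids: set[str] = set()   # receiver-side dedupe log keyed by message ID

def send(msg_id: str, payload: str, fail: bool) -> None:
    """Hypothetical delivery; fail=True simulates an ack lost after processing."""
    if msg_id not in seen_ids:
        seen_ids.add(msg_id)
        received.append(payload)
    if fail:
        raise TimeoutError("ack lost")

def relay(fail_first: bool = False) -> None:
    """Deliver unsent outbox rows; mark a row sent only after a successful send."""
    rows = db.execute("SELECT msg_id, payload FROM outbox WHERE sent = 0").fetchall()
    for msg_id, payload in rows:
        try:
            send(msg_id, payload, fail=fail_first)
        except TimeoutError:
            continue  # leave the row unsent; a later relay pass retries it
        with db:
            db.execute("UPDATE outbox SET sent = 1 WHERE msg_id = ?", (msg_id,))

place_order("o-1", 1200)
duplicate_accepted = place_order("o-1", 1200)  # duplicate request hits the primary key
relay(fail_first=True)   # delivered, but the ack is lost
relay()                  # retried; the receiver dedupes on msg_id
print(duplicate_accepted, received)  # False ['Order o-1 confirmed']
```

The primary key on orders plays the role of the dedupe log for the write, and the primary key on outbox keeps the intended effect from being recorded twice.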

Practical design tips for retries

  • Separate reads from writes. Retry reads freely. Guard writes with keys and checks.
  • Keep steps small and explicit. Fewer side effects per step are easier to protect.
  • Persist results you can return on duplicate keys. Cache final payloads for safe replays.
  • Use backoff and caps. Never retry forever. Keep retry windows inside your key retention period.
  • Log the business key with every run. Make audits fast and clean.
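
The "backoff and caps" tip above can be sketched as a small helper. This is one possible shape, not a prescribed implementation: the function name retry_with_backoff, its default values, and the choice of full jitter are assumptions made for illustration. It retries only on timeout and connection errors, and it is meant for reads or for writes already guarded by a key.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def retry_with_backoff(
    op: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call op until it succeeds or max_attempts is reached.

    Delays grow exponentially, are capped at max_delay, and use full jitter.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except (TimeoutError, ConnectionError):
            if attempt == max_attempts:
                raise  # attempts exhausted: surface the error, never retry forever
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")

# Usage: a read that times out twice, then succeeds.
calls = {"n": 0}

def flaky_read() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"

result = retry_with_backoff(flaky_read, sleep=lambda _: None)  # skip real sleeping in the demo
print(result, calls["n"])  # ok 3
```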

How this maps to agents and automations

Agents call tools. Tools fail. Agents retry. Without idempotency, a tool retry can send two invoices or post the same message twice. Patterns like keys, leases, and dedupe logs keep agent retries safe and quiet (Medium patterns).
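
A lease around an agent tool call might look like the sketch below. Everything here is hypothetical: the LeaseTable class, the send_invoice tool, and the in-memory sent list standing in for a dedupe log. A real deployment would keep leases in a shared store that offers an atomic set-if-absent with expiry, since an in-process lock only protects a single worker process.

```python
import threading
import time

class LeaseTable:
    """In-memory leases keyed by business key (illustrative only)."""

    def __init__(self, clock=time.monotonic):
        self._lock = threading.Lock()
        self._leases: dict[str, tuple[str, float]] = {}  # key -> (owner, expires_at)
        self._clock = clock

    def acquire(self, key: str, owner: str, ttl_seconds: float) -> bool:
        """Take the lease unless another owner holds a live one."""
        now = self._clock()
        with self._lock:
            holder = self._leases.get(key)
            if holder is not None and holder[1] > now and holder[0] != owner:
                return False
            self._leases[key] = (owner, now + ttl_seconds)
            return True

    def release(self, key: str, owner: str) -> None:
        """Drop the lease, but only if this owner still holds it."""
        with self._lock:
            holder = self._leases.get(key)
            if holder is not None and holder[0] == owner:
                del self._leases[key]

leases = LeaseTable()
sent: list[str] = []  # stand-in for a durable dedupe log

def send_invoice(invoice_id: str, worker: str) -> str:
    key = f"send-invoice:{invoice_id}"
    if not leases.acquire(key, worker, ttl_seconds=30):
        return "busy"  # another worker is handling it; back off and retry later
    try:
        if invoice_id in sent:  # dedupe check while holding the lease
            return "duplicate"
        sent.append(invoice_id)
        return "sent"
    finally:
        leases.release(key, worker)

print(send_invoice("inv-7", "agent-a"))  # sent
print(send_invoice("inv-7", "agent-b"))  # duplicate

leases.acquire("send-invoice:inv-8", "agent-a", ttl_seconds=30)  # agent-a is mid-flight
print(send_invoice("inv-8", "agent-b"))  # busy
```

The lease handles concurrent duplicates; the dedupe log handles duplicates that arrive after the first call has finished. The sketch uses both because neither covers the other's case.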

How Breyta fits this use case

Breyta is a workflow and agent orchestration platform for coding agents. It gives teams a reliable runtime for multi-step work with clear history, approvals, waits, and versioned releases.

You can apply idempotency and retry safety in Breyta with these shapes:

  • Triggers and IDs
      • Use webhook or event triggers. Derive a deterministic run key from the event ID. Avoid spawning duplicates from the same trigger.
  • Guarded side effects
      • Before a write step, store or check a dedupe marker using steps like :kv or :db. If present, short-circuit and return the prior result.
  • Concurrency policy
      • Set a conservative concurrency policy for flows that touch the same resource. This cuts down overlapping side effects while you check keys in-flow.
  • Long-running agents
      • For VM or local agents, kick off work, then use a :wait and resume by callback. Place idempotency checks right before apply steps. This prevents duplicate applies if the agent reports back twice.
  • Approvals in the loop
      • Gate irreversible steps behind an approval workflow. Retries can still happen, but no side effect occurs until a human approves.
  • Inspectability
      • Use run history and step outputs to see if a dedupe check fired, what key was used, and which branch ran. Deterministic orchestration keeps behavior consistent across retries.
  • Versioned releases
      • Test your idempotency logic in draft. Inspect outputs. Promote to live when ready. Runs are pinned to the resolved release at start time, which keeps behavior stable.

Example flow shape in Breyta:

  • Trigger by webhook with an idempotency key from the client.
  • :kv get to check if the key already has a stored result reference.
  • If present, return the stored res:// reference.
  • If absent, perform the side effect in :http or :db.
  • :persist or store the result reference.
  • :kv set the key to the result reference for future duplicates.

This uses Breyta’s workflow structure to keep retries safe. It keeps state across waits. It makes duplicate runs predictable and harmless.
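
The logic of that flow shape can be sketched in plain Python. This is not Breyta's API or syntax: the kv dict stands in for the :kv step, results for persisted result storage, and perform_side_effect for the :http or :db step. The exact res:// path format is invented for the example.

```python
kv: dict[str, str] = {}         # stand-in for the :kv step (key -> result reference)
results: dict[str, dict] = {}   # stand-in for persisted results
side_effects: list[dict] = []   # records each time the side effect really ran

def perform_side_effect(payload: dict) -> dict:
    """Stand-in for the :http or :db step."""
    side_effects.append(payload)
    return {"status": "created", "echo": payload}

def handle_webhook(idempotency_key: str, payload: dict) -> str:
    """Return a result reference, running the side effect once per key."""
    ref = kv.get(idempotency_key)               # check for a stored reference
    if ref is not None:
        return ref                              # duplicate: return the stored reference
    result = perform_side_effect(payload)       # absent: do the work
    ref = f"res://results/{len(results) + 1}"   # hypothetical reference format
    results[ref] = result                       # persist the result
    kv[idempotency_key] = ref                   # record the key for future duplicates
    return ref

first = handle_webhook("evt-123", {"invoice": "inv-7"})
second = handle_webhook("evt-123", {"invoice": "inv-7"})  # client retry
print(first == second, len(side_effects))  # True 1
```

This sequential sketch does not cover two duplicates arriving at the same moment, or a crash between the side effect and the key write. A lease or a conservative concurrency policy, as described earlier, is what narrows those gaps.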

Common pitfalls to avoid

  • Keys that are too broad or too narrow. Scope by business action and subject.
  • Forgetting key retention. Keep a key as long as retries may arrive.
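  • Side effects before checks. Always check first, then write. Where concurrent duplicates are possible, pair the check with a lease so two workers cannot both pass it.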
  • Non-idempotent third parties. Wrap them with your own key and cache.
  • Missing audits. Log the key and outcome with every run.
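
For the first pitfall, one way to scope a key by business action and subject is to derive it deterministically from those fields. The idempotency_key function and its field names below are illustrative assumptions, not a standard; the only library call used is hashlib.sha256.

```python
import hashlib

def idempotency_key(action: str, subject_id: str, event_id: str) -> str:
    """Derive a stable key from the business action, its subject, and the event ID."""
    # Join with the ASCII unit separator so ("ab", "c") and ("a", "bc") cannot collide,
    # assuming the fields themselves never contain that character.
    raw = "\x1f".join((action, subject_id, event_id))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()

a = idempotency_key("charge", "customer-9", "evt-123")
b = idempotency_key("charge", "customer-9", "evt-123")  # redelivery of the same event
c = idempotency_key("refund", "customer-9", "evt-123")  # different action, same event
print(a == b, a == c)  # True False
```

Leaving the action out would make the key too broad (a refund would be treated as a duplicate of the charge). Adding a timestamp or attempt number would make it too narrow, because every retry would look new.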

FAQ

Do I still need retries if I design for idempotency?

Yes. Retries raise success rates. Idempotency makes those retries safe.

How long should I keep idempotency keys?

Keep them for at least the maximum retry and replay window in your system. Base that window on the retry and redelivery policies of your queues, clients, and workers.
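
As a rough worked example of sizing that window, the sketch below bounds a client's retry window under capped exponential backoff and compares it with a queue's redelivery window. The numbers, the function names, and the safety factor of 2 are all assumptions for illustration; substitute the real policies of your own system.

```python
def worst_case_retry_window(
    max_attempts: int, base_delay: float, max_delay: float, attempt_timeout: float
) -> float:
    """Upper bound, in seconds, from the start of the first attempt to the end of the last."""
    waits = sum(min(max_delay, base_delay * 2 ** i) for i in range(max_attempts - 1))
    return waits + max_attempts * attempt_timeout

def min_key_retention(windows: list[float], safety_factor: float = 2.0) -> float:
    """Retention that covers the longest window, with an assumed safety margin."""
    return safety_factor * max(windows)

# Client: 5 attempts, delays of 0.5s doubling up to an 8s cap, 10s timeout per attempt.
client_window = worst_case_retry_window(5, 0.5, 8.0, 10.0)
# Queue: assume messages can be redelivered for up to 4 days.
queue_window = 4 * 24 * 3600.0

retention = min_key_retention([client_window, queue_window])
print(client_window, retention)  # 57.5 691200.0
```

In this example the queue, not the client, sets the retention period: about a minute of client retries is dwarfed by four days of possible redelivery.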

Can I avoid duplicates without keys?

Sometimes. Deterministic workflow IDs and strict concurrency help. Still, keys and dedupe logs are the most direct guard for side effects.