Stateful Multi‑Agent Patterns: Practical Designs for Developer AI
By Chris Moen • Published 2026-02-26
Stateful multi-agent patterns are repeatable designs for how coding agents hold, share, and recover state across steps. This guide defines the patterns, shows when to use them in developer AI (planning, refactors, tests), and explains orchestration, memory, error handling, and guardrails.
Quick answer: What are stateful multi‑agent patterns?
Stateful multi‑agent patterns are repeatable designs that let multiple coding agents read and write shared, durable state across steps to achieve a goal. They make long‑running developer AI workflows reliable by structuring memory, coordination, and recovery.
- Common patterns: blackboard memory with scoped writes; supervisor–worker with role mailboxes; plan–act–reflect loops over a task graph; event‑sourced logs with reducers; checkpoint and resume with durable storage.
- Good use cases: multi‑file refactors, code planning and patch sets, API research and spec drafting, test‑driven changes with reviews.
Why state matters for developer AI
Developer agents work across many steps and artifacts. Without explicit state, they drift, repeat work, or lose context. With state, they produce coherent plans and diffs, use tools safely, and recover faster after failures.
Model the agent state
Make state first‑class: small, typed, versioned, and queryable. Define explicit read and write paths.
- Goal: objective, constraints, acceptance tests
- Plan: steps, owners, status, dependencies
- Facts: environment info, APIs, repo paths
- Artifacts: files, patches, test results
- Decisions: rationale, assumptions, risks
- Metrics: token budget, time, tool calls
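The state model above can be sketched as a small set of typed, versioned objects. This is a minimal illustration using Python dataclasses; the field names and the `update` helper are one plausible layout, not a fixed schema.

```python
from dataclasses import dataclass, field

@dataclass
class TaskStep:
    step_id: str
    owner: str            # agent role responsible for the step
    status: str = "todo"  # todo | in_progress | done | blocked
    depends_on: list = field(default_factory=list)

@dataclass
class AgentState:
    version: int = 0
    goal: dict = field(default_factory=dict)       # objective, constraints, acceptance tests
    plan: list = field(default_factory=list)       # list of TaskStep
    facts: dict = field(default_factory=dict)      # environment info, APIs, repo paths
    artifacts: dict = field(default_factory=dict)  # files, patches, test results
    decisions: list = field(default_factory=list)  # rationale, assumptions, risks
    metrics: dict = field(default_factory=dict)    # token budget, time, tool calls

    def update(self, **changes) -> "AgentState":
        """Apply a scoped write and bump the version so readers can detect changes."""
        for key, value in changes.items():
            setattr(self, key, value)
        self.version += 1
        return self

state = AgentState(goal={"objective": "rename module", "acceptance": ["tests pass"]})
state.update(plan=[TaskStep(step_id="s1", owner="planner")])
```

Keeping the state this small and typed makes it cheap to serialize into a KV store and easy to validate before a write commits.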
Memory types that work
- Short‑term: the current context window for a step
- Working memory: scratchpad or blackboard for in‑progress notes
- Long‑term: vector or key‑value memory for facts and artifacts
- Structured memory: typed objects like Plan, Task, FileChange
- Episodic memory: an event log for replay and audit
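Episodic memory pairs naturally with the event-sourced-log-plus-reducer pattern mentioned earlier. A minimal sketch, with hypothetical event names, looks like this:

```python
# Episodic memory as an append-only event log. A reducer folds events into
# current state, so the same log supports replay, audit, and recovery.
def reduce_state(state: dict, event: dict) -> dict:
    kind = event["type"]
    if kind == "fact_added":
        state.setdefault("facts", {})[event["key"]] = event["value"]
    elif kind == "step_done":
        state.setdefault("done_steps", []).append(event["step_id"])
    return state

def replay(events: list) -> dict:
    """Rebuild state from scratch by folding the full log."""
    state: dict = {}
    for event in events:
        state = reduce_state(state, event)
    return state

log = [
    {"type": "fact_added", "key": "repo_path", "value": "/srv/app"},
    {"type": "step_done", "step_id": "s1"},
]
current = replay(log)
```

Because state is derived rather than stored, any point-in-time view is just a replay of a log prefix.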
Share state safely across agents
Give each agent a contract. Limit the scope of its reads and writes. Add review gates for risky changes.
- Blackboard store with namespaces per agent role
- Role mailboxes for messages and async handoff
- Pull‑based reads with queries; push‑based events on changes
- Write‑intent then commit after validation
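The write-intent-then-commit idea can be combined with namespaced storage in a small blackboard sketch. The validator below is a stand-in; in practice it might run lint, type checks, or tests before a write lands.

```python
# A blackboard with per-role namespaces and a two-phase write:
# agents stage an intent, a validator approves, then the write commits.
class Blackboard:
    def __init__(self):
        self._data: dict = {}     # namespace -> key -> value
        self._pending: list = []  # staged write intents

    def read(self, namespace: str, key: str):
        return self._data.get(namespace, {}).get(key)

    def propose(self, role: str, key: str, value) -> int:
        """Stage a write intent scoped to the writer's own namespace."""
        self._pending.append({"role": role, "key": key, "value": value})
        return len(self._pending) - 1

    def commit(self, intent_id: int, validator) -> bool:
        intent = self._pending[intent_id]
        if not validator(intent):
            return False  # rejected writes never reach shared state
        self._data.setdefault(intent["role"], {})[intent["key"]] = intent["value"]
        return True

bb = Blackboard()
iid = bb.propose("refactorer", "patch_1", "diff --git a/x b/x")
ok = bb.commit(iid, validator=lambda i: i["value"].startswith("diff"))
```

Scoping writes to the proposing role's namespace keeps one misbehaving agent from clobbering another's artifacts.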
Coordinate multi‑agent work
- Supervisor–worker with role prompts
- Planner–executor–critic loop
- Map–reduce style for parallel subtasks
- Chain of responsibility for escalations
- Stop conditions tied to tests or acceptance checks
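As one concrete coordination sketch, here is a supervisor–worker loop with a stop condition tied to an acceptance check. The worker functions are placeholders for real agent calls, and the task shape is illustrative.

```python
# The supervisor assigns open tasks to role-named workers and stops when
# the acceptance check passes or the step budget is exhausted.
def supervisor(tasks, workers, acceptance_check, max_steps=10):
    results = {}
    for _ in range(max_steps):
        open_tasks = [t for t in tasks if t["id"] not in results]
        if not open_tasks:
            break
        task = open_tasks[0]
        results[task["id"]] = workers[task["role"]](task)  # dispatch by role
        if acceptance_check(results):
            break  # stop condition tied to acceptance, not step count alone
    return results

workers = {
    "planner": lambda t: {"plan": ["rename", "run tests"]},
    "executor": lambda t: {"tests": "pass"},
}
tasks = [{"id": "t1", "role": "planner"}, {"id": "t2", "role": "executor"}]
done = supervisor(
    tasks, workers,
    acceptance_check=lambda r: r.get("t2", {}).get("tests") == "pass",
)
```

The same skeleton extends to planner–executor–critic by adding a critic role that can reopen tasks instead of only completing them.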
Persist and retrieve memories
Storage options
- Key‑value for session and state blobs
- Document store for plans, patches, logs
- Vector store for semantic search over docs and code
- Graph store for dependencies between tasks and files
- Object storage for large artifacts
Retrieval tips
- Index by session, entity, and task ID
- Use hybrid search; use exact keys for critical facts
- Summarize and prune to control size
- Keep provenance and timestamps
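The indexing and provenance tips can be folded into one small store. The composite `(session, entity, task)` key layout below is one plausible choice, not a standard; the point is that critical facts are fetched by exact key, not similarity search.

```python
import time

# Memory records indexed by (session, entity, task) with provenance
# and timestamps for audit and pruning decisions.
class MemoryStore:
    def __init__(self):
        self._records: dict = {}

    def put(self, session: str, entity: str, task: str, value, source: str):
        self._records[(session, entity, task)] = {
            "value": value,
            "source": source,   # provenance: which tool or agent wrote it
            "ts": time.time(),  # timestamp for freshness checks
        }

    def get(self, session: str, entity: str, task: str):
        return self._records.get((session, entity, task))

store = MemoryStore()
store.put("sess-1", "billing-service", "t42", {"api": "v2"}, source="spec-reader")
record = store.get("sess-1", "billing-service", "t42")
```

A vector index can sit alongside this store for semantic lookups, while exact keys remain the source of truth.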
Blackboard memory pattern: when to use it
The blackboard is a shared workspace where agents post facts and artifacts for others to build on. Use it when multiple roles collaborate on the same repo or task set.
- Code planning and patch sets
- API research and spec drafting
- Multi‑file refactors with tests
Plan–act–reflect loops: implementation steps
- Create a minimal plan with acceptance tests
- Execute a single task
- Reflect on results and risks
- Update plan and budgets
- Repeat until done or blocked
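The steps above can be sketched as a single loop. `act` and `reflect` are stubs for real agent calls, and the budget guards against runaway iteration.

```python
# Plan-act-reflect: execute one task at a time, reflect on the result,
# and update the plan and budget before continuing.
def plan_act_reflect(tasks, act, reflect, budget=20):
    plan = list(tasks)
    completed = []
    while plan and budget > 0:
        task = plan.pop(0)          # execute a single task
        result = act(task)
        verdict = reflect(task, result)
        budget -= 1                 # update budgets every iteration
        if verdict == "retry":
            plan.insert(0, task)    # put the task back at the front
        elif verdict == "blocked":
            break                   # stop and escalate instead of looping
        else:
            completed.append((task, result))
    return completed, plan, budget

acts = iter(["fail", "ok", "ok"])
completed, remaining, left = plan_act_reflect(
    ["t1", "t2"],
    act=lambda t: next(acts),
    reflect=lambda t, r: "retry" if r == "fail" else "done",
)
```

Returning the remaining plan and budget makes the loop resumable: a harness can checkpoint all three values and pick up where it stopped.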
Execution harness for long‑running agents
Use a harness that tracks progress, enforces budgets, and checkpoints so agents can pause, resume, and recover.
- Step limits, timeouts, and token budgets
- Idempotent tool calls with request IDs
- Durable checkpoints after each tool call
- Event log with replay and audit
- Health pings and backoff on errors
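Idempotent tool calls and durable checkpoints combine into a small harness sketch. Checkpoints live in a dict here for illustration; a real harness would persist them to durable storage.

```python
# Every tool call carries a request ID; results are checkpointed, and a
# resumed run replays checkpoints instead of re-running completed steps.
class Harness:
    def __init__(self, checkpoints=None):
        self.checkpoints = checkpoints if checkpoints is not None else {}

    def call_tool(self, request_id: str, tool, *args):
        if request_id in self.checkpoints:     # idempotent: replay hit
            return self.checkpoints[request_id]
        result = tool(*args)
        self.checkpoints[request_id] = result  # checkpoint after the call
        return result

calls = []
def tool(x):
    calls.append(x)  # track real executions to show idempotency
    return x * 2

h1 = Harness()
h1.call_tool("req-1", tool, 21)                  # executes the tool once
h2 = Harness(checkpoints=dict(h1.checkpoints))   # simulate crash + resume
resumed = h2.call_tool("req-1", tool, 21)        # replayed, tool not re-run
```

The request ID is the key: as long as retries and resumes reuse it, the tool's side effects happen at most once.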
Errors, retries, and idempotency
- Classify errors as transient, logic, or policy
- Retry transient errors with backoff; do not retry logic or policy failures
- Escalate logic errors to a supervisor
- Log all tool inputs and outputs
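The classify-then-retry policy can be sketched as a small wrapper. The exception names are illustrative; map your own tool errors onto the transient/logic/policy classes.

```python
import time

class TransientError(Exception): pass  # e.g. timeout, rate limit
class LogicError(Exception): pass      # e.g. bad patch, failing test

def with_retries(fn, max_attempts=3, base_delay=0.01):
    """Retry transient errors with exponential backoff; never retry logic errors."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts:
                raise  # budget exhausted, surface to the supervisor
            time.sleep(base_delay * 2 ** (attempt - 1))  # backoff
        except LogicError:
            raise  # escalate immediately -- retrying will not help

attempts = {"n": 0}
def sometimes_fails():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TransientError("timeout")
    return "ok"

result = with_retries(sometimes_fails)
```

Policy failures should follow the same path as logic errors: stop, log, and escalate rather than retry.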
Test and evaluate stateful agents
Use fixtures and golden outputs. Run reproducible scenarios and track outcome, cost, and drift.
- Unit tests for tools and reducers
- Scenario tests for end‑to‑end tasks
- Perturbation tests for flaky inputs
- Regression tests for plans and patches
- Offline replays from event logs
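An offline replay test can be as simple as feeding a recorded event log through the same reducer the agent uses and comparing against a golden snapshot. The reducer and events below are toy stand-ins for a real run's log.

```python
# Replay a recorded log and compare to a golden state: a regression
# guard that runs offline, with no model or tool calls.
def apply_event(state: dict, event: dict) -> dict:
    if event["type"] == "patch_applied":
        state["patches"] = state.get("patches", 0) + 1
    elif event["type"] == "tests_ran":
        state["tests"] = event["outcome"]
    return state

def replay_log(events: list) -> dict:
    state: dict = {}
    for e in events:
        state = apply_event(state, e)
    return state

recorded_log = [
    {"type": "patch_applied"},
    {"type": "patch_applied"},
    {"type": "tests_ran", "outcome": "pass"},
]
golden = {"patches": 2, "tests": "pass"}
assert replay_log(recorded_log) == golden
```

Because replays are deterministic, a failing comparison points at a reducer change or a schema drift rather than model flakiness.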
Guardrails and governance
Recommended guardrails
- Allowlists for tools and repos
- Sandboxed file writes; diff‑only commits
- Lint, type, and test gates before merge
- PII redaction in logs
- Human‑in‑the‑loop for releases
Common pitfalls
- Letting the LLM store raw prompts as truth
- No plan object, only prose
- No checkpoints, so no recovery
- Unbounded memory growth
Scale to teams and projects
- One session ID per feature or ticket
- Entity memory per service or repo path
- Shard vector and KV stores by team
- Use mailboxes and queues for backpressure
- Capture metrics per agent and per tool
Tools and libraries that help
You can build with general cloud and database tools, plus libraries that support stateful, cyclic workflows. Teams often choose frameworks that model agent graphs and checkpoints. Serverless platforms can host tools and manage session state. Pick tools that match your data model and guardrail needs.
Introduce state into an existing agent
Start small: add a plan object, an event log, and a KV store. Then add checkpoints and budgets. Later, add a critic and a test gate.
Control memory growth
Prune and summarize. Score importance and keep only what you need. Keep raw logs in cold storage; keep hot memory small and typed.
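One way to sketch importance-scored pruning: keep the top-k memories, summarize the rest into a single compact record, and hand the pruned entries to cold storage. The scoring function is a placeholder; real systems might weigh recency, references, and type.

```python
# Keep hot memory small: retain the highest-scoring records, replace the
# tail with one summary record, and return the tail for cold storage.
def prune(memories, keep=3, score=lambda m: m["importance"]):
    ranked = sorted(memories, key=score, reverse=True)
    hot, cold = ranked[:keep], ranked[keep:]
    if cold:
        summary = {"text": f"{len(cold)} low-priority notes archived",
                   "importance": 0}
        hot.append(summary)  # one compact record stands in for the tail
    return hot, cold

memories = [{"text": f"note {i}", "importance": i} for i in range(6)]
hot, cold = prune(memories, keep=3)
```

Running this on a schedule (or on a size threshold) keeps the hot store bounded while the raw records stay recoverable.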
Ensure reproducibility
Version prompts, tools, and models. Store seeds and config in the log. Checkpoint after each tool call and use replay to debug.
Orchestrating these patterns with Breyta
Breyta is a workflow and agent orchestration platform for coding agents. It is the workflow layer around the coding agent you already use, built for multi‑step automations, long‑running jobs, approval‑heavy flows, and agent orchestration. Breyta provides deterministic runtime behavior, explicit approvals and waits, versioned flow definitions, resource references, and clear run history. It can orchestrate local agents and VM‑backed agents over SSH, and offers an agent‑first CLI to build, run, and publish reliable workflows.
FAQ
What is the simplest way to start with state?
Add a single KV store for session state. Store the goal, plan, and last step. Then add checkpoints after each tool call.
Do I need many agents to get value?
No. You can get value with one agent and a clear state model. Add roles like a critic when you see gaps.
How do I choose between a blackboard and mailboxes?
Use a blackboard for shared artifacts such as diffs and tests. Use mailboxes when tasks are independent or need async handoff.
Should I use a vector store for all memory?
No. Use a vector store for semantic search over code and docs. Keep critical state in structured stores that support exact reads.
How do I prevent prompt drift across steps?
Pin a plan object and acceptance checks. Summarize context, not the source of truth. Reload facts from state at each step.