Stateful Multi‑Agent Patterns: Practical Designs for Developer AI
By Chris Moen • Published 2026-02-26
Stateful multi-agent patterns are repeatable designs for how coding agents hold, share, and recover state across steps. This guide defines the patterns, shows when to use them in developer AI (planning, refactors, tests), and explains orchestration, memory, error handling, and guardrails.
Quick answer: What are stateful multi‑agent patterns?
Stateful multi‑agent patterns are repeatable designs that let multiple coding agents read and write shared, durable state across steps to achieve a goal. They make long‑running developer AI workflows reliable by structuring memory, coordination, and recovery.
- Common patterns: blackboard memory with scoped writes; supervisor–worker with role mailboxes; plan–act–reflect loops over a task graph; event‑sourced logs with reducers; checkpoint and resume with durable storage.
- Good use cases: multi‑file refactors, code planning and patch sets, API research and spec drafting, test‑driven changes with reviews.
Why state matters for developer AI
Developer agents work across many steps and artifacts. Without explicit state, they drift, repeat work, or lose context. With state, they produce coherent plans and diffs, use tools safely, and recover faster after failures.
Model the agent state
Make state first‑class: small, typed, versioned, and queryable. Define explicit read and write paths.
- Goal: objective, constraints, acceptance tests
- Plan: steps, owners, status, dependencies
- Facts: environment info, APIs, repo paths
- Artifacts: files, patches, test results
- Decisions: rationale, assumptions, risks
- Metrics: token budget, time, tool calls
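The state model above can be sketched as a small set of typed, versioned objects. This is a minimal illustration using Python dataclasses; the field names and the `update` helper are one plausible layout, not a fixed schema.

```python
from dataclasses import dataclass, field

@dataclass
class TaskStep:
    step_id: str
    owner: str            # agent role responsible for the step
    status: str = "todo"  # todo | in_progress | done | blocked
    depends_on: list = field(default_factory=list)

@dataclass
class AgentState:
    version: int = 0
    goal: dict = field(default_factory=dict)       # objective, constraints, acceptance tests
    plan: list = field(default_factory=list)       # list of TaskStep
    facts: dict = field(default_factory=dict)      # environment info, APIs, repo paths
    artifacts: dict = field(default_factory=dict)  # files, patches, test results
    decisions: list = field(default_factory=list)  # rationale, assumptions, risks
    metrics: dict = field(default_factory=dict)    # token budget, time, tool calls

    def update(self, **changes) -> "AgentState":
        """Apply a scoped write and bump the version so readers can detect changes."""
        for key, value in changes.items():
            setattr(self, key, value)
        self.version += 1
        return self

state = AgentState(goal={"objective": "rename module", "acceptance": ["tests pass"]})
state.update(plan=[TaskStep(step_id="s1", owner="planner")])
```

Keeping the state this small and typed makes it cheap to serialize into a KV store and easy to validate before a write commits.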
Memory types that work
- Short‑term: the current context window for a step
- Working memory: scratchpad or blackboard for in‑progress notes
- Long‑term: vector or key‑value memory for facts and artifacts
- Structured memory: typed objects like Plan, Task, FileChange
- Episodic memory: an event log for replay and audit
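Episodic memory pairs naturally with the event-sourced-log-plus-reducer pattern mentioned earlier. A minimal sketch, with hypothetical event names, looks like this:

```python
# Episodic memory as an append-only event log. A reducer folds events into
# current state, so the same log supports replay, audit, and recovery.
def reduce_state(state: dict, event: dict) -> dict:
    kind = event["type"]
    if kind == "fact_added":
        state.setdefault("facts", {})[event["key"]] = event["value"]
    elif kind == "step_done":
        state.setdefault("done_steps", []).append(event["step_id"])
    return state

def replay(events: list) -> dict:
    """Rebuild state from scratch by folding the full log."""
    state: dict = {}
    for event in events:
        state = reduce_state(state, event)
    return state

log = [
    {"type": "fact_added", "key": "repo_path", "value": "/srv/app"},
    {"type": "step_done", "step_id": "s1"},
]
current = replay(log)
```

Because state is derived rather than stored, any point-in-time view is just a replay of a log prefix.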
Share state safely across agents
Give each agent a contract. Limit the scope of its reads and writes. Add review gates for risky changes.
- Blackboard store with namespaces per agent role
- Role mailboxes for messages and async handoff
- Pull‑based reads with queries; push‑based events on changes
- Write‑intent then commit after validation
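The write-intent-then-commit idea can be combined with namespaced storage in a small blackboard sketch. The validator below is a stand-in; in practice it might run lint, type checks, or tests before a write lands.

```python
# A blackboard with per-role namespaces and a two-phase write:
# agents stage an intent, a validator approves, then the write commits.
class Blackboard:
    def __init__(self):
        self._data: dict = {}     # namespace -> key -> value
        self._pending: list = []  # staged write intents

    def read(self, namespace: str, key: str):
        return self._data.get(namespace, {}).get(key)

    def propose(self, role: str, key: str, value) -> int:
        """Stage a write intent scoped to the writer's own namespace."""
        self._pending.append({"role": role, "key": key, "value": value})
        return len(self._pending) - 1

    def commit(self, intent_id: int, validator) -> bool:
        intent = self._pending[intent_id]
        if not validator(intent):
            return False  # rejected writes never reach shared state
        self._data.setdefault(intent["role"], {})[intent["key"]] = intent["value"]
        return True

bb = Blackboard()
iid = bb.propose("refactorer", "patch_1", "diff --git a/x b/x")
ok = bb.commit(iid, validator=lambda i: i["value"].startswith("diff"))
```

Scoping writes to the proposing role's namespace keeps one misbehaving agent from clobbering another's artifacts.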
Coordinate multi‑agent work
- Supervisor–worker with role prompts
- Planner–executor–critic loop
- Map–reduce style for parallel subtasks
- Chain of responsibility for escalations
- Stop conditions tied to tests or acceptance checks
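As one concrete coordination sketch, here is a supervisor–worker loop with a stop condition tied to an acceptance check. The worker functions are placeholders for real agent calls, and the task shape is illustrative.

```python
# The supervisor assigns open tasks to role-named workers and stops when
# the acceptance check passes or the step budget is exhausted.
def supervisor(tasks, workers, acceptance_check, max_steps=10):
    results = {}
    for _ in range(max_steps):
        open_tasks = [t for t in tasks if t["id"] not in results]
        if not open_tasks:
            break
        task = open_tasks[0]
        results[task["id"]] = workers[task["role"]](task)  # dispatch by role
        if acceptance_check(results):
            break  # stop condition tied to acceptance, not step count alone
    return results

workers = {
    "planner": lambda t: {"plan": ["rename", "run tests"]},
    "executor": lambda t: {"tests": "pass"},
}
tasks = [{"id": "t1", "role": "planner"}, {"id": "t2", "role": "executor"}]
done = supervisor(
    tasks, workers,
    acceptance_check=lambda r: r.get("t2", {}).get("tests") == "pass",
)
```

The same skeleton extends to planner–executor–critic by adding a critic role that can reopen tasks instead of only completing them.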
Persist and retrieve memories
Storage options
- Key‑value for session and state blobs
- Document store for plans, patches, logs
- Vector store for semantic search over docs and code
- Graph store for dependencies between tasks and files
- Object storage for large artifacts
Retrieval tips
- Index by session, entity, and task ID
- Use hybrid search; use exact keys for critical facts
- Summarize and prune to control size
- Keep provenance and timestamps
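The indexing and provenance tips can be folded into one small store. The composite `(session, entity, task)` key layout below is one plausible choice, not a standard; the point is that critical facts are fetched by exact key, not similarity search.

```python
import time

# Memory records indexed by (session, entity, task) with provenance
# and timestamps for audit and pruning decisions.
class MemoryStore:
    def __init__(self):
        self._records: dict = {}

    def put(self, session: str, entity: str, task: str, value, source: str):
        self._records[(session, entity, task)] = {
            "value": value,
            "source": source,   # provenance: which tool or agent wrote it
            "ts": time.time(),  # timestamp for freshness checks
        }

    def get(self, session: str, entity: str, task: str):
        return self._records.get((session, entity, task))

store = MemoryStore()
store.put("sess-1", "billing-service", "t42", {"api": "v2"}, source="spec-reader")
record = store.get("sess-1", "billing-service", "t42")
```

A vector index can sit alongside this store for semantic lookups, while exact keys remain the source of truth.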
Blackboard memory pattern: when to use it
The blackboard is a shared workspace where agents post facts and artifacts for others to build on. Use it when multiple roles collaborate on the same repo or task set.
- Code planning and patch sets
- API research and spec drafting
- Multi‑file refactors with tests
Plan–act–reflect loops: implementation steps
- Create a minimal plan with acceptance tests
- Execute a single task
- Reflect on results and risks
- Update plan and budgets
- Repeat until done or blocked
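The steps above can be sketched as a single loop. `act` and `reflect` are stubs for real agent calls, and the budget guards against runaway iteration.

```python
# Plan-act-reflect: execute one task at a time, reflect on the result,
# and update the plan and budget before continuing.
def plan_act_reflect(tasks, act, reflect, budget=20):
    plan = list(tasks)
    completed = []
    while plan and budget > 0:
        task = plan.pop(0)          # execute a single task
        result = act(task)
        verdict = reflect(task, result)
        budget -= 1                 # update budgets every iteration
        if verdict == "retry":
            plan.insert(0, task)    # put the task back at the front
        elif verdict == "blocked":
            break                   # stop and escalate instead of looping
        else:
            completed.append((task, result))
    return completed, plan, budget

acts = iter(["fail", "ok", "ok"])
completed, remaining, left = plan_act_reflect(
    ["t1", "t2"],
    act=lambda t: next(acts),
    reflect=lambda t, r: "retry" if r == "fail" else "done",
)
```

Returning the remaining plan and budget makes the loop resumable: a harness can checkpoint all three values and pick up where it stopped.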
Execution harness for long‑running agents
Use a harness that tracks progress, enforces budgets, and checkpoints so agents can pause, resume, and recover.
- Step limits, timeouts, and token budgets
- Idempotent tool calls with request IDs
- Durable checkpoints after each tool call
- Event log with replay and audit
- Health pings and backoff on errors
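Idempotent tool calls and durable checkpoints combine into a small harness sketch. Checkpoints live in a dict here for illustration; a real harness would persist them to durable storage.

```python
# Every tool call carries a request ID; results are checkpointed, and a
# resumed run replays checkpoints instead of re-running completed steps.
class Harness:
    def __init__(self, checkpoints=None):
        self.checkpoints = checkpoints if checkpoints is not None else {}

    def call_tool(self, request_id: str, tool, *args):
        if request_id in self.checkpoints:     # idempotent: replay hit
            return self.checkpoints[request_id]
        result = tool(*args)
        self.checkpoints[request_id] = result  # checkpoint after the call
        return result

calls = []
def tool(x):
    calls.append(x)  # track real executions to show idempotency
    return x * 2

h1 = Harness()
h1.call_tool("req-1", tool, 21)                  # executes the tool once
h2 = Harness(checkpoints=dict(h1.checkpoints))   # simulate crash + resume
resumed = h2.call_tool("req-1", tool, 21)        # replayed, tool not re-run
```

The request ID is the key: as long as retries and resumes reuse it, the tool's side effects happen at most once.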
Errors, retries, and idempotency
- Classify errors as transient, logic, or policy
- Retry transient errors with backoff; do not retry logic or policy failures
- Escalate logic errors to a supervisor
- Log all tool inputs and outputs
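The classify-then-retry policy can be sketched as a small wrapper. The exception names are illustrative; map your own tool errors onto the transient/logic/policy classes.

```python
import time

class TransientError(Exception): pass  # e.g. timeout, rate limit
class LogicError(Exception): pass      # e.g. bad patch, failing test

def with_retries(fn, max_attempts=3, base_delay=0.01):
    """Retry transient errors with exponential backoff; never retry logic errors."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts:
                raise  # budget exhausted, surface to the supervisor
            time.sleep(base_delay * 2 ** (attempt - 1))  # backoff
        except LogicError:
            raise  # escalate immediately -- retrying will not help

attempts = {"n": 0}
def sometimes_fails():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TransientError("timeout")
    return "ok"

result = with_retries(sometimes_fails)
```

Policy failures should follow the same path as logic errors: stop, log, and escalate rather than retry.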
Test and evaluate stateful agents
Use fixtures and golden outputs. Run reproducible scenarios and track outcome, cost, and drift.
- Unit tests for tools and reducers
- Scenario tests for end‑to‑end tasks
- Perturbation tests for flaky inputs
- Regression tests for plans and patches
- Offline replays from event logs
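An offline replay test can be as simple as feeding a recorded event log through the same reducer the agent uses and comparing against a golden snapshot. The reducer and events below are toy stand-ins for a real run's log.

```python
# Replay a recorded log and compare to a golden state: a regression
# guard that runs offline, with no model or tool calls.
def apply_event(state: dict, event: dict) -> dict:
    if event["type"] == "patch_applied":
        state["patches"] = state.get("patches", 0) + 1
    elif event["type"] == "tests_ran":
        state["tests"] = event["outcome"]
    return state

def replay_log(events: list) -> dict:
    state: dict = {}
    for e in events:
        state = apply_event(state, e)
    return state

recorded_log = [
    {"type": "patch_applied"},
    {"type": "patch_applied"},
    {"type": "tests_ran", "outcome": "pass"},
]
golden = {"patches": 2, "tests": "pass"}
assert replay_log(recorded_log) == golden
```

Because replays are deterministic, a failing comparison points at a reducer change or a schema drift rather than model flakiness.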
Guardrails and governance
Recommended guardrails
- Allowlists for tools and repos
- Sandboxed file writes; diff‑only commits
- Lint, type, and test gates before merge
- PII redaction in logs
- Human‑in‑the‑loop for releases
Common pitfalls
- Letting the LLM store raw prompts as truth
- No plan object, only prose
- No checkpoints, so no recovery
- Unbounded memory growth
Scale to teams and projects
- One session ID per feature or ticket
- Entity memory per service or repo path
- Shard vector and KV stores by team
- Use mailboxes and queues for backpressure
- Capture metrics per agent and per tool
Tools and libraries that help
You can build with general cloud and database tools, plus libraries that support stateful, cyclic workflows. Teams often choose frameworks that model agent graphs and checkpoints. Serverless platforms can host tools and manage session state. Pick tools that match your data model and guardrail needs.
Introduce state into an existing agent
Start small: add a plan object, an event log, and a KV store. Then add checkpoints and budgets. Later, add a critic and a test gate.
Control memory growth
Prune and summarize. Score importance and keep only what you need. Keep raw logs in cold storage; keep hot memory small and typed.
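One way to sketch importance-scored pruning: keep the top-k memories, summarize the rest into a single compact record, and hand the pruned entries to cold storage. The scoring function is a placeholder; real systems might weigh recency, references, and type.

```python
# Keep hot memory small: retain the highest-scoring records, replace the
# tail with one summary record, and return the tail for cold storage.
def prune(memories, keep=3, score=lambda m: m["importance"]):
    ranked = sorted(memories, key=score, reverse=True)
    hot, cold = ranked[:keep], ranked[keep:]
    if cold:
        summary = {"text": f"{len(cold)} low-priority notes archived",
                   "importance": 0}
        hot.append(summary)  # one compact record stands in for the tail
    return hot, cold

memories = [{"text": f"note {i}", "importance": i} for i in range(6)]
hot, cold = prune(memories, keep=3)
```

Running this on a schedule (or on a size threshold) keeps the hot store bounded while the raw records stay recoverable.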
Ensure reproducibility
Version prompts, tools, and models. Store seeds and config in the log. Checkpoint after each tool call and use replay to debug.
Orchestrating these patterns with Breyta
Breyta is a workflow and agent orchestration platform for coding agents. It is the workflow layer around the coding agent you already use, built for multi‑step automations, long‑running jobs, approval‑heavy flows, and agent orchestration. Breyta provides deterministic runtime behavior, explicit approvals and waits, versioned flow definitions, resource references, and clear run history. It can orchestrate local agents and VM‑backed agents over SSH, and offers an agent‑first CLI to build, run, and publish reliable workflows.
FAQ
What is the simplest way to start with state?
Add a single KV store for session state. Store the goal, plan, and last step. Then add checkpoints after each tool call.
Do I need many agents to get value?
No. You can get value with one agent and a clear state model. Add roles like a critic when you see gaps.
How do I choose between a blackboard and mailboxes?
Use a blackboard for shared artifacts such as diffs and tests. Use mailboxes when tasks are independent or need async handoff.
Should I use a vector store for all memory?
No. Use a vector store for semantic search over code and docs. Keep critical state in structured stores that support exact reads.
How do I prevent prompt drift across steps?
Pin a plan object and acceptance checks. Summarize context, not the source of truth. Reload facts from state at each step.