Claude Code Guide

Multi-Agent Dispatch

Six CLIs, one MCP stack, one skill library. Dispatch the right model for each task — Claude for architecture, GLM for implementation, Kimi for coding, Codex for review. All sharing the same memory, secrets, and code intelligence.

Three Tiers of Cognition

Not every task needs your most expensive model. Session digests don't need architectural reasoning. Code search doesn't need creative synthesis. Matching the model to the cognitive demand saves tokens and often produces better results — specialized agents with narrow focus outperform generalists.

DISPATCHER (your main Claude session — Opus)
├── "Digest this session log"
│   └── Haiku agent — mechanical, fast, cheap
├── "Search past sessions for X"
│   └── Sonnet agent — synthesis, judgment, cross-referencing
├── "Review this PR for security issues"
│   └── Opus agent — deep reasoning, multi-module analysis
└── "Index this codebase" + "Check infra health"
    ├── Haiku agent 1 — indexing (parallel)
    └── Haiku agent 2 — health checks (parallel)
| Tier | Model | Best for | Examples |
| --- | --- | --- | --- |
| Execution | Haiku | Mechanical, fast, single-purpose tasks | Session digests, log indexing, health checks, wiki page updates, implementation logging |
| Synthesis | Sonnet | Cross-referencing, judgment, moderate complexity | Issue triage, PR review, session oracle search, project status reports, debugging investigation |
| Complexity | Opus | Deep reasoning, multi-module architecture, security | Spec design, security audit, complex refactors, architectural decisions |
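The tier mapping above can be sketched as a simple routing function. This is a hypothetical helper, not a Claude Code API; the task categories and tier assignments mirror the table, but the function itself is purely illustrative:

```python
# Hypothetical dispatcher sketch: route a task category to a model tier.
# Categories and tiers come from the table above; the lookup itself
# is illustrative, not part of Claude Code.

TIER_BY_CATEGORY = {
    "session-digest": "haiku",   # mechanical, single-purpose
    "health-check": "haiku",
    "issue-triage": "sonnet",    # judgment, cross-referencing
    "pr-review": "sonnet",
    "security-audit": "opus",    # deep, multi-module reasoning
    "spec-design": "opus",
}

def pick_model(category: str) -> str:
    """Return the cheapest tier that fits the task; default to Sonnet."""
    return TIER_BY_CATEGORY.get(category, "sonnet")

print(pick_model("session-digest"))  # haiku
print(pick_model("security-audit"))  # opus
```

The default matters: when in doubt, route to the middle tier rather than burning Opus tokens on a task that might be mechanical.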
Defining Agents

Agents are markdown files in ~/.claude/agents/. Each file defines the agent's name, model, description, and instructions. Claude Code auto-discovers them.

~/.claude/agents/session-digest.md
---
name: session-digest
description: "Reads JSONL transcripts, writes summaries to vault and mem0."
model: haiku
---

# Session Digest Agent

You process Claude Code JSONL session transcripts into DocVault
daily digest entries and mem0 memories. You run in isolation
because session logs can be large.

## Step 1: Extract conversation
Run: python3 ~/.claude/scripts/extract-jsonl-session.py "$JSONL_FILE"

## Step 2: Write summary
200-300 word prose summary with actor attribution and wikilinks.

## Step 3: Save to DocVault + mem0
...

The model: haiku frontmatter field tells Claude Code to run this agent on Haiku instead of the current session's model. The agent runs in isolation — its context is separate from your main session, so large search results or log content don't eat your primary context window.

The agent matrix

| Agent | Model | Dispatched by |
| --- | --- | --- |
| session-digest | Haiku | legacy — replaced by session-rag MCP |
| health-monitor | Haiku | /prime at session start (background) |
| index-monitor | Haiku | /prime at session start (background) |
| implementation-logger | Haiku | /spec Phase 4 after each task |
| wiki-page-updater | Haiku | /wiki-update — one per page, parallel |
| firecrawl-scraper | Haiku | /web-search, /discover |
| vault-linker | Haiku | After issue creation |
| session-oracle | Sonnet | session-rag MCP provides this natively now (legacy agent) |
| code-oracle | Sonnet | /code-oracle for codebase questions |
| issue-triage | Sonnet | /issue-triage |
| project-manager | Sonnet | /weekly-standup, /after-action, /update-roadmap |
| pr-discover | Sonnet | /pr-resolve (gather phase) |
| pr-thread-resolver | Sonnet | /pr-resolve (one per thread, parallel) |
| systematic-debugger | Sonnet | /systematic-debugging |
| prime-status | Sonnet | /prime (status gathering) |
Dispatch Patterns

1. Parallel dispatch

When tasks are independent (no shared state, no file overlap), dispatch multiple agents simultaneously. Claude Code runs them concurrently and returns results as they complete.

# /prime dispatches TWO agents in parallel at session start:
Agent 1 (Haiku): index-monitor — re-index code in vector DB
Agent 2 (Haiku): health-monitor — check all API/infra endpoints

# Both run in background while you read the status report
# Results arrive independently — no blocking

2. Gather-then-fan-out

One agent gathers context, then multiple agents resolve individual items in parallel. Used for PR review where one agent reads all threads, then N agents fix each thread independently.

# /pr-resolve uses gather-then-fan-out:
Phase 1: pr-discover (Sonnet) — reads all review threads, Codacy findings
Phase 2: N × pr-thread-resolver (Sonnet) — one per thread, in parallel
         Each fixes code or drafts a reply independently

3. Isolation for context protection

Agents run in their own context windows. This is the key architectural insight — when a code-oracle agent searches through 50 files, those file contents stay in the agent's context, not yours. The agent returns a synthesized 200-word report instead of 50K tokens of raw search results.

# Without isolation:
You: "search for price extraction code"
Claude: [reads 50 files into YOUR context → burns 40K tokens]

# With isolation:
You: "search for price extraction code"
Claude: [dispatches code-oracle agent → agent reads 50 files
         in ITS context → returns 200-word summary to you]
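The 40K figure above is back-of-envelope arithmetic. The per-file token count and tokens-per-word ratio below are assumptions chosen to match the example, not measurements:

```python
# Back-of-envelope context savings from isolation (illustrative numbers).
files = 50
tokens_per_file = 800       # assumed average file size in tokens
summary_words = 200
tokens_per_word = 1.3       # rough tokens-per-word ratio (assumption)

raw = files * tokens_per_file                     # hits YOUR context without isolation
summary = round(summary_words * tokens_per_word)  # what the agent returns instead

print(f"without isolation: {raw} tokens")      # without isolation: 40000 tokens
print(f"with isolation:    {summary} tokens")  # with isolation:    260 tokens
```

Two orders of magnitude of context saved per search, which compounds quickly over a long session.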
The isolation principle
Every agent that touches large datasets (search results, log files, API responses) should run in isolation. The main session's context is precious — don't contaminate it with raw data that can be summarized by a cheaper model.
Beyond Subagents: Six CLIs, One Stack

The dispatch pattern extends beyond Claude's model tiers. Six CLIs share the same MCP infrastructure — memory, secrets, code search, quality gates, and the specflow workflow engine. The model is just the brain. The infrastructure is the nervous system.

| CLI | Model | Strengths | Use for |
| --- | --- | --- | --- |
| Claude Code | Opus 4.6 | Complex reasoning, multi-module, all tasks | Architecture, spec design, orchestration |
| OpenCode | 75+ models | Multi-model runtime switching, open source | Flexible sessions, model comparison, backup CLI |
| Kimi CLI | Kimi K2.5 | 262K context, native Claude skill support | Coding tasks, large-context work |
| claude-glm | GLM 4.7/5.1 | Strong coding, spec task execution | Implementation, PR work, experimental |
| Codex CLI | GPT-5.4 | Fast mechanical work, clear unambiguous specs | Single-file patches, reviews, rescue |
| Gemini CLI | 3.1 Pro | Good all-rounder, 1M context | Documentation, batch digests |

One MCP stack serves all

Every CLI connects to the same infrastructure — no per-tool silos:

specflow

Workflow orchestrator — specs, tasks, approvals. The shared protocol.

mem0

Cross-session memory. Every CLI reads and writes the same memories.

claude-context

Semantic code search (Milvus). One index, all agents query it.

infisical

Self-hosted secrets. API keys, tokens — no plaintext in configs.

Shared workspace, separate instructions

All six CLIs read from the same git repos and MCP servers. The spec documents (tasks.md) serve as the shared protocol — any agent can pick up a task, implement it, and log the result. Implementation logs prevent duplicate work.
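A tasks.md in this role might look like the sketch below. The exact checkbox format and task wording are assumptions for illustration; the guide only specifies that agents read the file, see pending vs. completed tasks, and log what they implement:

```markdown
# Tasks — STAK-498 (hypothetical example)

- [x] 1. Add price poller retry logic        <!-- done, logged by GLM -->
- [ ] 2. Expose latest-price endpoint        <!-- next pending task -->
- [ ] 3. Add integration tests for staleness
```

Because the file is plain markdown in the repo, any CLI that can read files participates in the protocol with zero extra tooling.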

# Claude plans, others implement:
1. Claude creates issue + spec (requirements, design, tasks)
2. Switch to GLM terminal: claude-glm → /spec STAK-498
3. GLM implements tasks, logs artifacts via specflow MCP
4. Switch back to Claude for code review + merge

# Parallel implementation across agents:
1. GLM implements Tasks 1-3 in worktree-a
2. Codex implements Tasks 4-6 in worktree-b
3. Both log artifacts — specflow prevents duplicate work

# Or use OpenCode with model switching:
1. Start in OpenCode, pick Claude for architecture (Ctrl+O)
2. Switch to GLM for implementation (Ctrl+O)
3. Switch to Kimi K2.5 for a different perspective (Ctrl+O)
4. All in one terminal, same MCP stack throughout
Create Your First Agent

1. Create the agents directory

mkdir -p ~/.claude/agents

2. Define an agent

# Create a simple summarization agent on Haiku
cat > ~/.claude/agents/summarizer.md << 'EOF'
---
name: summarizer
description: "Summarizes files or search results into concise reports."
model: haiku
---

# Summarizer Agent

You receive a topic and a list of files or search results.
Produce a 200-word summary covering: key findings, decisions
made, and open questions. Return structured output.

## Rules
- Maximum 200 words
- No raw data — only synthesized insights
- Include file paths for key findings
EOF

3. Dispatch it

Claude Code auto-discovers agents in ~/.claude/agents/. You can dispatch them via the Agent tool or through skills that reference them.

Prompt for Claude: Build an agent system
I want to create a multi-agent system for my Claude Code workflow. Help me define agents for:

1. Session digest (Haiku) — processes session transcripts into summaries
2. Code search (Sonnet) — searches codebase and returns synthesized findings
3. Health check (Haiku) — pings my infrastructure endpoints and reports status

Each agent should run in isolation (so large results don't eat my main context). Define them as ~/.claude/agents/{name}.md files with frontmatter specifying the model tier. Include clear instructions for what the agent receives, what it does, and what it returns.
Codex Reviews Inside Claude Code

The openai/codex-plugin-cc plugin lets you dispatch work to GPT-5.4 (Codex) directly from Claude Code. It shares your auth, config, and repo checkout — no separate terminal needed for reviews.

Three review modes

/codex:review

Standard code review of uncommitted or branch changes. Reads the diff, returns findings. Review-only — never applies fixes.

/codex:adversarial-review

Challenges the approach itself. Questions design choices, assumptions, tradeoffs, and where the design could fail under real-world conditions.

/codex:rescue

Delegates investigation or a fix to Codex. Runs as a background task — fire and forget, check back with /codex:status.

How it fits into the workflow

# After implementing a feature, get a second opinion:
> /codex:review --background

# Before merging, pressure-test the design:
> /codex:adversarial-review --background
# "Is this the right approach? What assumptions could break?"

# Stuck on a bug? Hand it off:
> /codex:rescue investigate why the price poller returns stale data
# Codex runs in background, you keep working
# Check back: /codex:status → /codex:result

The key insight: Codex reviews Claude's work, not the other way around. Claude writes the code (it has the full spec context), Codex reviews it with fresh eyes. Two different models catching different classes of issues — Claude tends to over-engineer, Codex catches unnecessary complexity. Codex misses architectural context, Claude catches integration risks.

Install
claude plugin add openai/codex-plugin-cc — then /codex:setup to verify auth and connectivity. Requires the Codex CLI installed locally (npm i -g @openai/codex).
Cross-Agent Spec Handoff

The spec documents are the handoff protocol. Any agent that can read markdown and call MCP tools can pick up a spec at any phase. This enables a workflow where different models handle different phases based on their strengths.

Cross-Agent Spec Lifecycle

Claude (Opus) — creates issue, writes requirements + design (best at: understanding intent, architecture)
Claude (Opus) — reviews design, writes tasks.md (best at: breaking down complexity)
Switch iTerm tab → Any agent can pick up tasks from here:
├── GLM (claude-glm) — implements tasks (tested, strong coder)
├── Kimi (kimi CLI) — implements tasks (262K context)
├── Codex (GPT-5.4) — implements tasks (mechanical, fast)
└── OpenCode (any) — pick model at runtime (Ctrl+O)
Claude (Opus) — /codex:review, merge, /wrap

How handoff works

There's no custom IPC or message passing. The spec files (requirements.md, design.md, tasks.md) sit in the repo's .spec-workflow/ directory. When you switch terminals:
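The on-disk layout could look like this. The three filenames and the .spec-workflow/ directory come from this guide; the intermediate specs/STAK-498/ nesting is an assumption:

```
repo/
└── .spec-workflow/
    └── specs/
        └── STAK-498/
            ├── requirements.md
            ├── design.md
            └── tasks.md
```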

# In Codex terminal:
codex "@spec STAK-498"
# Codex reads tasks.md, implements next pending task, logs artifacts

# In GLM terminal (Kimi or OpenCode work the same way):
claude-glm   # then: /spec STAK-498
kimi         # then: /spec STAK-498
opencode     # then: /spec STAK-498

# Every agent reads the same spec, sees what's [ ] vs [x]
# Implements the next pending task, logs artifacts
# Implementation logs prevent duplicate work

# In Claude terminal — review and merge:
/spec STAK-498
# Reads spec status, sees what was completed, reviews, merges

Implementation logging prevents conflicts

Every agent calls log-implementation after completing a task. The log records which files were modified, which API endpoints were created, and which components were built. The next agent reads these logs before starting — so it won't create a duplicate endpoint or reimplement a utility that another agent already built.
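The guard logic can be sketched as follows. This is illustrative only: the real check happens through the specflow MCP log-implementation tool, whose actual API is not shown in this guide, so the JSONL log format and helper functions here are invented for the example:

```python
# Illustrative duplicate-work guard, NOT the specflow API.
# Assume each agent appends one JSONL record per completed task;
# the next agent filters the pending list against what's logged.

import json

def completed_tasks(log_lines):
    """Task IDs already implemented, per the shared log (JSONL records)."""
    return {json.loads(line)["task_id"] for line in log_lines}

def next_pending(all_tasks, log_lines):
    """First task no agent has logged yet, or None if the spec is done."""
    done = completed_tasks(log_lines)
    return next((t for t in all_tasks if t not in done), None)

log = ['{"task_id": "STAK-498.1", "agent": "glm", "files": ["poller.py"]}']
print(next_pending(["STAK-498.1", "STAK-498.2"], log))  # STAK-498.2
```

Reading the log before starting is what lets two agents work the same spec from different worktrees without colliding.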

Session memory bridges the gap

The session memory pipeline ensures context transfers between agents. When Claude runs /wrap, it saves a digest to mem0 and the Obsidian vault. When any other CLI reads mem0, it gets the same memories. Same project tag, same mem0 account — cross-agent continuity across all six CLIs.

Start simple
You don't need six CLIs on day one. Start with Claude Code + /codex:review for second-opinion reviews. Add opencode as a backup CLI with model flexibility. Add Kimi or GLM when you want to delegate spec tasks. The MCP stack is the foundation — once it's set up, adding a new CLI is just copying a config file.