MCP Server Plugin • Open Source

Spec-driven development with
persistent project memory

AI agents lose context between sessions. They forget decisions, repeat mistakes, and drift from reality. SpecFlow gives them a structured lifecycle, persistent memory, and semantic code intelligence — self-hosted core with optional cloud integrations.

The problem with AI-assisted development

Every session starts from zero. Three failure modes compound over time.

🧊

Cold starts

The agent has no memory of yesterday. It doesn't know what it built, what broke, or what conventions you agreed on. You re-explain everything, every time.

💥

Context flooding

You stuff everything into CLAUDE.md — architecture, conventions, decisions, infrastructure. The file balloons. Token waste goes up. Attention quality goes down.

📉

No lifecycle

Agents jump straight to code. No requirements phase. No design review. No approval gates. The result: rework, scope creep, and features that don't match intent.

Four systems, one workflow

SpecFlow structures how work gets done. DocVault structures what the agent knows. Code Context gives it semantic understanding of your code. The Skill System routes procedural knowledge on demand. Together, they replace the bloated CLAUDE.md + ad-hoc prompting pattern.

SpecFlow — MCP Server

Spec-driven lifecycle

Requirements → Design → Tasks → Implementation with dashboard approvals at every gate. Works with Claude Code, Gemini CLI, and Codex CLI — any MCP-compatible agent.

DocVault — Obsidian Vault

Cross-project knowledge base

One Obsidian vault serves 8+ repos. Architecture, infrastructure, decisions, issues — all in structured markdown with graph visualization and wikilinks.

Code Context — Semantic Search

Vector-indexed code intelligence

Self-hosted Milvus vector database indexes every codebase. Agents search code by meaning, not just keywords. Forked from Zilliz, hardened with timeouts and stability fixes.

Skill System — 60+ Skills

Procedural knowledge routing

CLAUDE.md stays tiny — a routing table to skills. Each skill encodes a full workflow: debugging, deployment, PR resolution, infrastructure management.
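As a hedged illustration of what such a routing table might look like (the skill names and paths below are invented, not the actual 60+ skill set):

```markdown
# CLAUDE.md — routing only, no inline knowledge (hypothetical excerpt)

| When the task involves…      | Load this skill          |
| ---------------------------- | ------------------------ |
| Debugging a failing test     | skills/debugging.md      |
| Deploying a Docker stack     | skills/deployment.md     |
| Resolving PR review comments | skills/pr-resolution.md  |
| DNS, proxies, or VM changes  | skills/infrastructure.md |
```

The point is that the instruction file stays a cheap index; the expensive procedural detail loads only when a matching task appears.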

Multi-agent, one workflow

SpecFlow speaks MCP — the open protocol for agent-tool communication. Any agent that supports MCP gets the full spec lifecycle, shared knowledge base, and code intelligence.

Claude Code

Plugin marketplace

One-click install via the Claude Code marketplace. Full skill system with 60+ workflows, session lifecycle, and parallel subagent dispatch.

Gemini CLI

Manual MCP config

Add SpecFlow as an MCP server in Gemini’s settings. Full access to spec tools, DocVault, and the approval dashboard. Uses GEMINI.md for agent-specific behavior.
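A hedged sketch of that settings entry (the server name, package invocation, and project path are illustrative; check the SpecFlow README for the exact command):

```json
{
  "mcpServers": {
    "spec-workflow": {
      "command": "npx",
      "args": ["-y", "spec-workflow-mcp", "/path/to/your/project"]
    }
  }
}
```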

Codex CLI

Manual MCP config

Configure in Codex’s TOML config. Same MCP tools, same workflow, same dashboard. Uses CODEX.md for agent-specific behavior.
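A hedged sketch of the equivalent TOML entry (table name, package, and path are illustrative; exact keys may differ by Codex version):

```toml
# Illustrative entry in Codex's config.toml
[mcp_servers.spec-workflow]
command = "npx"
args = ["-y", "spec-workflow-mcp", "/path/to/your/project"]
```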

All three agents share the same DocVault knowledge base, spec workflow state, and dashboard — work started in one agent can be continued in another.

Cross-Agent Spec Handoff
Claude Code · Discovery & Design
/issue create → SWF-65 created in DocVault
/discover SWF-65 → Discovery brief with research
/spec SWF-65 → Requirements → Design → awaiting approval

Codex CLI · Task Generation
@spec resume → Reads spec from disk, picks up at Phase 3
→ Generates tasks.md → awaiting approval

Gemini CLI · Implementation
@spec resume → Reads spec from disk, picks up at Phase 4
→ Implements tasks → commits → logs artifacts

Spec state lives on disk in .spec-workflow/specs/ — not in any agent's memory. The dashboard tracks progress regardless of which agent is driving.

Three-tier memory architecture

Not everything belongs in one file. Each tier has a distinct purpose, a rank in the source-of-truth hierarchy, and a different retrieval cost.

Tier 1

DocVault — Ground truth

Human-curated Obsidian vault. Architecture, infrastructure topology, data models, deployment procedures. Version-controlled. Wins all tie-breaking conflicts. Read via Grep, Glob, or semantic search.

Tier 2

File Memory — Session context

Project-scoped markdown files at ~/.claude/projects/*/memory/. Agent-curated, human-reviewable. Loaded at session start. Stores user preferences, feedback, project state, and external references.

Tier 3

mem0 — Episodic recall

Cloud-based semantic memory. Automatically populated from session digests. Probabilistic retrieval via embedding similarity. Never authoritative — DocVault always wins. Best for past decisions, operational context, and gap-filling.

▼ conflict resolution: newest wins; on a timestamp tie, the more authoritative tier (Tier 1 > Tier 2 > Tier 3) wins
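The tie-breaking rule described above can be sketched in a few lines. This is a minimal illustration of the stated policy, not SpecFlow's actual implementation; the `Memory` type and `resolve` function are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Memory:
    tier: int          # 1 = DocVault (ground truth), 2 = file memory, 3 = mem0
    updated: datetime  # last modification time
    text: str

def resolve(candidates: list[Memory]) -> Memory:
    # Newest wins; a lower tier number (higher authority) breaks ties.
    return max(candidates, key=lambda m: (m.updated, -m.tier))

docvault = Memory(1, datetime(2026, 3, 1), "Postgres is the system of record")
mem0     = Memory(3, datetime(2026, 3, 1), "Discussed switching to SQLite")

winner = resolve([docvault, mem0])
print(winner.tier)  # same timestamp → the DocVault entry (tier 1) wins
```

With a newer mem0 entry, recency would win instead, matching "newest wins, tier breaks ties."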

Code intelligence, self-hosted

Agents shouldn't grep blindly through your codebase. Code Context gives them semantic understanding — search by meaning, not string matching. Backed by a self-hosted Milvus vector database. Embedding generation uses a cloud API or local model via Ollama.

Semantic Code Search

Search by meaning

Ask "find the authentication middleware" and get results even if the code never uses the word "auth." Vector embeddings capture semantic relationships across your entire codebase.
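To make "search by meaning" concrete, here is a toy sketch of embedding-based retrieval. The 3-dimensional vectors are hand-made stand-ins for real embeddings (Milvus stores much higher-dimensional ones), and the file names are invented:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: how aligned two vectors are, independent of length.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Pretend index: each file's embedding captures what the code *does*.
index = {
    "verify_token.py": [0.9, 0.1, 0.0],  # session/identity logic, no "auth" keyword
    "render_chart.py": [0.0, 0.2, 0.9],  # UI code
}
query = [0.8, 0.2, 0.1]  # embedding of "find the authentication middleware"

best = max(index, key=lambda name: cosine(query, index[name]))
print(best)  # → verify_token.py
```

Keyword grep would miss `verify_token.py` entirely; vector similarity surfaces it because the query and the code land near each other in embedding space.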

Self-Hosted Milvus

No collection limits

Cloud vector databases cap you at 4 collections on free tiers. Self-hosted Milvus on your own infrastructure gives unlimited project indices with full data sovereignty.

Hardened Fork

Production-stable

Forked from Zilliz's MCP server with critical fixes: 30s fetch timeouts, gRPC connection guards, and pinned npm versions. No more silent upstream breakage pulling broken builds into your sessions.

Persistent Indices

Index once, search forever

Code indices persist across sessions in Milvus. No re-indexing on every cold start. Incremental updates catch only what changed since the last index run.
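A common way to implement "only what changed" is a content hash per file, compared against the previous index run. This is a hypothetical sketch of that idea, not Code Context's actual schema:

```python
import hashlib

def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()

def files_to_reindex(previous: dict[str, str], current: dict[str, str]) -> list[str]:
    """Return paths that are new or whose content changed since the last index.

    previous: path -> stored content hash from the last run
    current:  path -> current file content
    """
    return [path for path, text in current.items()
            if previous.get(path) != content_hash(text)]

prev = {"a.py": content_hash("def f(): pass")}
curr = {
    "a.py": "def f(): pass",        # unchanged → skipped
    "b.py": "def g(): return 42",   # new → re-embedded
}
print(files_to_reindex(prev, curr))  # → ['b.py']
```

Only the returned paths get re-chunked and re-embedded, which is what keeps warm starts cheap.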

Code search tiers — cheapest first
# Tier 1: Structural graph (callers, imports, dead code)
code-graph-context → Neo4j — "what calls this function?"

# Tier 2: Semantic search (meaning-based)
code-context → Milvus — "find code related to payment processing"

# Tier 3: Literal search (exact strings)
grep / glob → Filesystem — "find all files matching *.test.ts"

# Tier 4: Deep analysis (only when tiers 1-3 leave gaps)
code-oracle agent → Subagent — combines all sources + AI reasoning

Spec workflow lifecycle

Every non-trivial feature follows the same path. Approvals required at each gate. No phase-skipping.

Phase 0 · /chat
Phase 1 · /discover
Phase 2 · /spec
Phase 3 · Design
Phase 4 · Implement
Phase 5 · /retro
Session lifecycle
# Explore an idea without committing to code
/chat "what if we added real-time price alerts?"

# Structured brainstorm with research agents
/discover STAK-42
Dispatching research agents...
Discovery Brief generated with all open questions resolved

# Create spec: Requirements → Design → Tasks
/spec STAK-42
Requirements approved ✓
Design approved ✓
Tasks generated ✓

# Implementation with parallel subagents
Task 1/4: WebSocket server       ■■■■■■■■■■ done
Task 2/4: Alert rules engine     ■■■■■■■■■■ done
Task 3/4: Push notification API  ■■■■■■■■■■ done
Task 4/4: Frontend alert panel   ■■■■■■■■■■ done

# Extract lessons, save to mem0
/retro
3 prescriptive lessons saved

Every session learns from the last

Most AI tools treat each session as isolated. SpecFlow creates a continuous learning loop: sessions end with knowledge extraction, and new sessions start pre-loaded with everything that matters.

The continuous context lifecycle
┌─────────────────────────────────────────────────────────┐
│              THE CONTINUOUS LEARNING LOOP               │
└─────────────────────────────────────────────────────────┘

Morning — /prime boots you up
  Indexes codebase (Code Context + Code Graph)
  Reads recent session digests from DocVault
  Pulls relevant memories from mem0
  Checks open issues + git status
  Presents: "Here's where you left off, here's what's next"

During the day — sessions compound
  Each session has full context from /prime
  Work produces commits, specs, implementation logs
  /handoff relays state between terminal sessions

End of session — /wrap orchestrates everything
  /wrap
    Step 1: Cleanup (stale branches, uncommitted work)
    Step 2: /vault-update (sync documentation)
    Step 3: /retro
      Extracts prescriptive lessons from conversation
      Saves to mem0 as structured retro-learning memories
    Step 4: /digest-session
      Reads JSONL session transcripts
      Summarizes via configurable LLM (local Ollama or cloud)
      Writes daily digest to DocVault/Daily Digests/
      Saves key facts to mem0 for cross-session recall

On-demand — /audit checks project health
  /audit
    Code quality + security scan
    Documentation drift detection
    Issue staleness check
    Actionable remediation report

┌─────────────────────────────────────────────────────────┐
│ /prime ──→ work ──→ /audit ──→ /wrap ──→ /prime ──→ ..  │
│ Tomorrow's /prime reads today's digest + retro lessons  │
└─────────────────────────────────────────────────────────┘
/prime

Fast session quick-start

Boots in ~15 seconds: indexes code, reads digests, pulls mem0 memories, checks issues and git status. Optional --deep mode for thorough codebase analysis. The agent starts every session knowing what happened yesterday.

/wrap

End-of-session orchestrator

One command to close a session cleanly: clean up stale branches, sync documentation via /vault-update, extract prescriptive lessons via /retro, then process session logs via /digest-session. Supersedes the deprecated /goodnight skill and the standalone /digest-session entry point.

/audit

On-demand project health check

Scans code quality, security posture, documentation drift, and issue staleness. Produces an actionable remediation report. Run anytime — before a release, after a sprint, or when things feel off.

/retro

Prescriptive lesson extraction

Not "what did we do" but "what should we do differently." Extracts actionable rules from the conversation and saves them as structured memories. Future sessions apply these lessons automatically.

One vault, all projects

DocVault is a single Obsidian vault that serves every project. No per-repo scaffolding. No duplicate context files. Infrastructure docs sit alongside application architecture.

DocVault/
  Projects/
    StakTrakr/          Overview.md, Architecture.md, Data Model.md, API.md...
    HexTrackr/          Overview.md, Architecture.md, Database.md, WebSocket.md...
    spec-workflow-mcp/  Overview.md, Tools & Prompts.md, Dashboard.md...
  Infrastructure/
    Host Inventory.md   ← every IP, port, and DNS record
    Stack Registry.md   ← 26 Docker stacks
    Portainer.md, Proxmox.md, Cloudflare.md, NPM.md...
  Architecture/
    Methodology.md, Memory Pipeline.md, Skill Matrix.md...
  Daily Digests/
    StakTrakr/          2026-03-29.md, 2026-03-28.md...  ← auto-generated
  Templates/
    requirements.md, design.md, tasks.md, steering.md...
  Issues.base           ← vault-based issue tracking

How it compares

Different approaches to the same problem: giving AI agents persistent project context.

Dimension | SpecKit | BMAD | GSD | Taskmaster | mex | Pimzino | SpecFlow
Approval gates | None | Advisory | UAT phase | None | None | Dashboard (blocking) | Dashboard + skill enforcement
Memory | constitution.md | Git docs | STATE.md | tasks.json | Scaffold files | Steering docs | 3-tier: DocVault + file + mem0
Session learning | None | None | None | None | GROW loop | None | /prime → /audit → /wrap
Code search | None | None | None | None | None | None | Semantic + structural
Multi-project | Per-repo | Per-repo | Per-repo | Per-repo | Per-repo | Per-repo | One vault, all repos
Infrastructure | Code only | Code only | Code only | Code only | Code only | Code only | Docker, DNS, VMs, proxies
Drift detection | None | None | None | None | 8 checkers | None | /vault-update gate
Multi-tool | Any AI tool | Any AI tool | Claude focused | Any AI tool | 4 tools | Claude + MCP | Claude + MCP ecosystem
Self-hosted | Files only | Files only | Files only | Files only | Files only | Node.js | Milvus, Neo4j, Ollama
Best for | Quick adoption | Enterprise teams | Solo context eng. | PRD pipelines | Per-repo memory | Structured workflow | Multi-project, high-governance

What ships in the box

SpecFlow is an MCP server that works with any MCP-compatible agent — Claude Code, Gemini CLI, and Codex CLI verified. DocVault is an Obsidian vault. Code Context is a semantic search engine. Together, they replace the bloated instruction file + ad-hoc prompting pattern.

MCP Tools & Prompts
Tools (6)
  spec-workflow-guide   Lifecycle orchestration instructions
  steering-guide        Steering document creation guidance
  spec-status           Phase, completion, implementation audit
  approvals             Dashboard approve/reject/request changes
  log-implementation    Record artifacts (functions, endpoints, tests)
  spec-list             List all specs with status

Prompts (10)
  create-spec           Create requirements, design, or tasks doc
  create-steering-doc   Create product/tech/structure steering
  implement-task        Dispatch implementation to subagents
  refresh-tasks         Re-sync task state from spec files
  wrap                  End-of-session orchestrator (cleanup, docs, retro, digest)
  prime                 Fast session quick-start (~15s, optional deep mode)
  audit                 On-demand project health check
  + 3 injection prompts Context injection for guides
Dashboard — localhost:5051
Real-time web UI for spec management
  View all specs across projects with phase indicators
  Approve, reject, or request changes at each gate
  Track implementation progress per task
  Implementation log audit trail
  Auto-starts with MCP server initialization

Parallel subagent dispatch

The biggest architectural difference from other spec workflows: tasks don't execute sequentially in one session. Each task runs through an isolated three-stage pipeline, and tasks with zero file overlap execute concurrently.

Subagent pipeline — per task
# Each task runs through 3 isolated stages
Orchestrator         reads task from spec
Implementer Agent    fresh context, writes code, tests, commits
Compliance Reviewer  reads actual code vs requirements
Quality Reviewer     architecture, error handling, production readiness
log-implementation   record artifacts, mark complete

# File Touch Map identifies zero-overlap tasks
# Those tasks dispatch simultaneously in batches
Batch 1: Task 1 + Task 3 + Task 5    parallel
Batch 2: Task 2 + Task 4             parallel
Batch 3: Task 6 (depends on Task 2)  sequential
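The batching idea can be sketched as a greedy pass over each task's declared file set: a task joins the first batch it shares no files with. This is an illustrative sketch of the File Touch Map concept, with invented task names; it is not SpecFlow's actual scheduler.

```python
def batch_tasks(touch_map: dict[str, set[str]]) -> list[list[str]]:
    """Greedily group tasks whose file sets don't overlap into parallel batches."""
    batches: list[tuple[list[str], set[str]]] = []  # (task names, union of files)
    for task, files in touch_map.items():
        for names, touched in batches:
            if not (files & touched):   # zero file overlap → safe to parallelize
                names.append(task)
                touched |= files
                break
        else:
            batches.append(([task], set(files)))  # conflicts with all → new batch
    return [names for names, _ in batches]

touch_map = {
    "task1": {"ws/server.ts"},
    "task2": {"rules/engine.ts"},
    "task3": {"api/push.ts"},
    "task4": {"rules/engine.ts", "rules/schema.ts"},  # overlaps task2
}
print(batch_tasks(touch_map))  # → [['task1', 'task2', 'task3'], ['task4']]
```

Each inner list can dispatch concurrently because no two of its tasks touch the same file; later batches wait for earlier ones.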
Case Study

Forge: Empty repo to deployed production app in 3 hours

23 tasks, 30 parallel subagent dispatches, 8 dashboard approval gates, zero file conflicts. Work that would have taken a full day sequentially completed in one afternoon.

Where this is going

Each component is independently useful today. The roadmap is consolidation: one install, one Docker container, zero cloud dependencies.

Consolidation roadmap
SpecFlow MCP server — spec lifecycle + dashboard               shipped
DocVault — cross-project knowledge vault                       shipped
Code Context — semantic search (Milvus, self-hosted)           shipped
Code Graph Context — structural search (Neo4j, local)          shipped
60+ skills — procedural knowledge routing                      shipped
Memory pipeline — session digests (local Ollama or cloud LLM)  shipped
Rebrand Code Context — merge into SpecFlow plugin              next
Self-host mem0 — fork + local deployment                       planned
Self-host CGC — fork + local Neo4j bundle                      planned
Unified Docker container — all services in one stack           planned
Rebrand + package — one-command install for any project        future

The end state: a single installable system that gives any AI agent persistent memory, semantic code search, structural code intelligence, and spec-driven workflow — self-hosted core, cloud optional.

Stop re-explaining your project every session

SpecFlow is open source. The core workflow runs entirely on your machine. Cloud integrations (mem0, embeddings, session digests) are optional and configurable — use local models or cloud APIs based on your preference and hardware.