MCP Server Plugin • Open Source

Spec-driven development with
persistent project memory

AI agents lose context between sessions. They forget decisions, repeat mistakes, and drift from reality. SpecFlow gives them a structured lifecycle, persistent memory, and semantic code intelligence — self-hosted core with optional cloud integrations.

The problem with AI-assisted development

Every session starts from zero. Three failure modes compound over time.

🧊

Cold starts

The agent has no memory of yesterday. It doesn't know what it built, what broke, or what conventions you agreed on. You re-explain everything, every time.

💥

Context flooding

You stuff everything into CLAUDE.md — architecture, conventions, decisions, infrastructure. The file balloons. Token waste goes up. Attention quality goes down.

📉

No lifecycle

Agents jump straight to code. No requirements phase. No design review. No approval gates. The result: rework, scope creep, and features that don't match intent.

Four systems, one workflow

SpecFlow structures how work gets done. DocVault structures what the agent knows. Code Context gives it semantic understanding of your code. The Skill System routes procedural knowledge on demand. Together, they replace the bloated CLAUDE.md + ad-hoc prompting pattern.

SpecFlow — MCP Server

Spec-driven lifecycle

Requirements → Design → Tasks → Implementation with dashboard approvals at every gate. Works with Claude Code, Gemini CLI, and Codex CLI — any MCP-compatible agent.

DocVault — Obsidian Vault

Cross-project knowledge base

One Obsidian vault serves 8+ repos. Architecture, infrastructure, decisions, issues — all in structured markdown with graph visualization and wikilinks.

Code Context — Semantic Search

Vector-indexed code intelligence

Self-hosted Milvus vector database indexes every codebase. Agents search code by meaning, not just keywords. Forked from Zilliz, hardened with timeouts and stability fixes.

Skill System — 60+ Skills

Procedural knowledge routing

CLAUDE.md stays tiny — a routing table to skills. Each skill encodes a full workflow: debugging, deployment, PR resolution, infrastructure management.
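As a hedged illustration of what such a routing table might look like (the skill names and paths below are invented, not the actual 60+ skill set):

```markdown
# CLAUDE.md — routing only, no inline knowledge (hypothetical excerpt)

| When the task involves…      | Load this skill          |
| ---------------------------- | ------------------------ |
| Debugging a failing test     | skills/debugging.md      |
| Deploying a Docker stack     | skills/deployment.md     |
| Resolving PR review comments | skills/pr-resolution.md  |
| DNS, proxies, or VM changes  | skills/infrastructure.md |
```

The point is that the instruction file stays a cheap index; the expensive procedural detail loads only when a matching task appears.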

Multi-agent, one workflow

SpecFlow speaks MCP — the open protocol for agent-tool communication. Any agent that supports MCP gets the full spec lifecycle, shared knowledge base, and code intelligence.

Claude Code

Plugin marketplace

One-click install via the Claude Code marketplace. Full skill system with 60+ workflows, session lifecycle, and parallel subagent dispatch.

Gemini CLI

Manual MCP config

Add SpecFlow as an MCP server in Gemini’s settings. Full access to spec tools, DocVault, and the approval dashboard. Uses GEMINI.md for agent-specific behavior.
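A hedged sketch of that settings entry (the server name, package invocation, and project path are illustrative; check the SpecFlow README for the exact command):

```json
{
  "mcpServers": {
    "spec-workflow": {
      "command": "npx",
      "args": ["-y", "spec-workflow-mcp", "/path/to/your/project"]
    }
  }
}
```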

Codex CLI

Manual MCP config

Configure in Codex’s TOML config. Same MCP tools, same workflow, same dashboard. Uses CODEX.md for agent-specific behavior.
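A hedged sketch of the equivalent TOML entry (table name, package, and path are illustrative; exact keys may differ by Codex version):

```toml
# Illustrative entry in Codex's config.toml
[mcp_servers.spec-workflow]
command = "npx"
args = ["-y", "spec-workflow-mcp", "/path/to/your/project"]
```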

All three agents share the same DocVault knowledge base, spec workflow state, and dashboard — work started in one agent can be continued in another.

Cross-Agent Spec Handoff
Claude Code · Discovery & Design
/issue create → SWF-65 created in DocVault
/discover SWF-65 → Discovery brief with research
/spec SWF-65 → Requirements → Design → awaiting approval

Codex CLI · Task Generation
@spec resume → Reads spec from disk, picks up at Phase 3
→ Generates tasks.md → awaiting approval

Gemini CLI · Implementation
@spec resume → Reads spec from disk, picks up at Phase 4
→ Implements tasks → commits → logs artifacts

Spec state lives on disk in .spec-workflow/specs/ — not in any agent's memory. The dashboard tracks progress regardless of which agent is driving.

Three-tier memory architecture

Not everything belongs in one file. Each tier has a distinct purpose, a rank in the source-of-truth hierarchy, and a different retrieval cost.

Tier 1

DocVault — Ground truth

Human-curated Obsidian vault. Architecture, infrastructure topology, data models, deployment procedures. Version-controlled. Wins all tie-breaking conflicts. Read via Grep, Glob, or semantic search.

Tier 2

File Memory — Session context

Project-scoped markdown files at ~/.claude/projects/*/memory/. Agent-curated, human-reviewable. Loaded at session start. Stores user preferences, feedback, project state, and external references.

Tier 3

mem0 — Episodic recall

Cloud-based semantic memory. Automatically populated from session digests. Probabilistic retrieval via embedding similarity. Never authoritative — DocVault always wins. Best for past decisions, operational context, and gap-filling.

▼ conflict resolution: newest wins; on a timestamp tie, the more authoritative tier (Tier 1 > Tier 2 > Tier 3) wins
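The tie-breaking rule described above can be sketched in a few lines. This is a minimal illustration of the stated policy, not SpecFlow's actual implementation; the `Memory` type and `resolve` function are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Memory:
    tier: int          # 1 = DocVault (ground truth), 2 = file memory, 3 = mem0
    updated: datetime  # last modification time
    text: str

def resolve(candidates: list[Memory]) -> Memory:
    # Newest wins; a lower tier number (higher authority) breaks ties.
    return max(candidates, key=lambda m: (m.updated, -m.tier))

docvault = Memory(1, datetime(2026, 3, 1), "Postgres is the system of record")
mem0     = Memory(3, datetime(2026, 3, 1), "Discussed switching to SQLite")

winner = resolve([docvault, mem0])
print(winner.tier)  # same timestamp → the DocVault entry (tier 1) wins
```

With a newer mem0 entry, recency would win instead, matching "newest wins, tier breaks ties."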

Code intelligence, self-hosted

Agents shouldn't grep blindly through your codebase. Code Context gives them semantic understanding — search by meaning, not string matching. Backed by a self-hosted Milvus vector database. Embedding generation uses a cloud API or local model via Ollama.

Semantic Code Search

Search by meaning

Ask "find the authentication middleware" and get results even if the code never uses the word "auth." Vector embeddings capture semantic relationships across your entire codebase.
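To make "search by meaning" concrete, here is a toy sketch of embedding-based retrieval. The 3-dimensional vectors are hand-made stand-ins for real embeddings (Milvus stores much higher-dimensional ones), and the file names are invented:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: how aligned two vectors are, independent of length.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Pretend index: each file's embedding captures what the code *does*.
index = {
    "verify_token.py": [0.9, 0.1, 0.0],  # session/identity logic, no "auth" keyword
    "render_chart.py": [0.0, 0.2, 0.9],  # UI code
}
query = [0.8, 0.2, 0.1]  # embedding of "find the authentication middleware"

best = max(index, key=lambda name: cosine(query, index[name]))
print(best)  # → verify_token.py
```

Keyword grep would miss `verify_token.py` entirely; vector similarity surfaces it because the query and the code land near each other in embedding space.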

Self-Hosted Milvus

No collection limits

Cloud vector databases cap you at 4 collections on free tiers. Self-hosted Milvus on your own infrastructure gives unlimited project indices with full data sovereignty.

Hardened Fork

Production-stable

Forked from Zilliz's MCP server with critical fixes: 30s fetch timeouts, gRPC connection guards, and pinned npm versions. No more silent upstream breakage pulling broken builds into your sessions.

Persistent Indices

Index once, search forever

Code indices persist across sessions in Milvus. No re-indexing on every cold start. Incremental updates catch only what changed since the last index run.
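A common way to implement "only what changed" is a content hash per file, compared against the previous index run. This is a hypothetical sketch of that idea, not Code Context's actual schema:

```python
import hashlib

def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()

def files_to_reindex(previous: dict[str, str], current: dict[str, str]) -> list[str]:
    """Return paths that are new or whose content changed since the last index.

    previous: path -> stored content hash from the last run
    current:  path -> current file content
    """
    return [path for path, text in current.items()
            if previous.get(path) != content_hash(text)]

prev = {"a.py": content_hash("def f(): pass")}
curr = {
    "a.py": "def f(): pass",        # unchanged → skipped
    "b.py": "def g(): return 42",   # new → re-embedded
}
print(files_to_reindex(prev, curr))  # → ['b.py']
```

Only the returned paths get re-chunked and re-embedded, which is what keeps warm starts cheap.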

Code search tiers — cheapest first
# Tier 1: Structural graph (callers, imports, dead code)
code-graph-context → Neo4j — "what calls this function?"

# Tier 2: Semantic search (meaning-based)
code-context → Milvus — "find code related to payment processing"

# Tier 3: Literal search (exact strings)
grep / glob → Filesystem — "find all files matching *.test.ts"

# Tier 4: Deep analysis (only when tiers 1-3 leave gaps)
code-oracle agent → Subagent — combines all sources + AI reasoning

Spec workflow lifecycle

Every non-trivial feature follows the same path. Approvals required at each gate. No phase-skipping.

Phase 0 · /chat
Phase 1 · /discover
Phase 2 · /spec
Phase 3 · Design
Phase 4 · Implement
Phase 5 · /retro
Session lifecycle
# Explore an idea without committing to code
/chat "what if we added real-time price alerts?"

# Structured brainstorm with research agents
/discover STAK-42
Dispatching research agents...
Discovery Brief generated with all open questions resolved

# Create spec: Requirements → Design → Tasks
/spec STAK-42
Requirements approved ✓
Design approved ✓
Tasks generated ✓

# Implementation with parallel subagents
Task 1/4: WebSocket server       ■■■■■■■■■■ done
Task 2/4: Alert rules engine     ■■■■■■■■■■ done
Task 3/4: Push notification API  ■■■■■■■■■■ done
Task 4/4: Frontend alert panel   ■■■■■■■■■■ done

# Extract lessons, save to mem0
/retro
3 prescriptive lessons saved

Every session learns from the last

Most AI tools treat each session as isolated. SpecFlow creates a continuous learning loop: sessions end with knowledge extraction, and new sessions start pre-loaded with everything that matters.

The continuous context lifecycle
┌─────────────────────────────────────────────────────────┐
│              THE CONTINUOUS LEARNING LOOP               │
└─────────────────────────────────────────────────────────┘

Morning — /prime boots you up
  Indexes codebase (Code Context + Code Graph)
  Reads recent session digests from DocVault
  Pulls relevant memories from mem0
  Checks open issues + git status
  Presents: "Here's where you left off, here's what's next"

During the day — sessions compound
  Each session has full context from /prime
  Work produces commits, specs, implementation logs
  /handoff relays state between terminal sessions

End of session — /wrap orchestrates everything
  /wrap
    Step 1: Cleanup (stale branches, uncommitted work)
    Step 2: /vault-update (sync documentation)
    Step 3: /retro
      Extracts prescriptive lessons from conversation
      Saves to mem0 as structured retro-learning memories
    Step 4: /digest-session
      Reads JSONL session transcripts
      Summarizes via configurable LLM (local Ollama or cloud)
      Writes daily digest to DocVault/Daily Digests/
      Saves key facts to mem0 for cross-session recall

On-demand — /audit checks project health
  /audit
    Code quality + security scan
    Documentation drift detection
    Issue staleness check
    Actionable remediation report

┌─────────────────────────────────────────────────────────┐
│ /prime ──→ work ──→ /audit ──→ /wrap ──→ /prime ──→ ..  │
│ Tomorrow's /prime reads today's digest + retro lessons  │
└─────────────────────────────────────────────────────────┘
/prime

Fast session quick-start

Boots in ~15 seconds: indexes code, reads digests, pulls mem0 memories, checks issues and git status. Optional --deep mode for thorough codebase analysis. The agent starts every session knowing what happened yesterday.

/wrap

End-of-session orchestrator

One command to close a session cleanly: clean up stale branches, sync documentation via /vault-update, extract prescriptive lessons via /retro, then process session logs via /digest-session. Supersedes the deprecated /goodnight skill and the standalone /digest-session entry point.

/audit

On-demand project health check

Scans code quality, security posture, documentation drift, and issue staleness. Produces an actionable remediation report. Run anytime — before a release, after a sprint, or when things feel off.

/retro

Prescriptive lesson extraction

Not "what did we do" but "what should we do differently." Extracts actionable rules from the conversation and saves them as structured memories. Future sessions apply these lessons automatically.

One vault, all projects

DocVault is a single Obsidian vault that serves every project. No per-repo scaffolding. No duplicate context files. Infrastructure docs sit alongside application architecture.

DocVault/
  Projects/
    StakTrakr/          Overview.md, Architecture.md, Data Model.md, API.md...
    HexTrackr/          Overview.md, Architecture.md, Database.md, WebSocket.md...
    spec-workflow-mcp/  Overview.md, Tools & Prompts.md, Dashboard.md...
  Infrastructure/
    Host Inventory.md   ← every IP, port, and DNS record
    Stack Registry.md   ← 26 Docker stacks
    Portainer.md, Proxmox.md, Cloudflare.md, NPM.md...
  Architecture/
    Methodology.md, Memory Pipeline.md, Skill Matrix.md...
  Daily Digests/
    StakTrakr/          2026-03-29.md, 2026-03-28.md...  ← auto-generated
  Templates/
    requirements.md, design.md, tasks.md, steering.md...
  Issues.base           ← vault-based issue tracking

How it compares

Different approaches to the same problem: giving AI agents persistent project context.

Dimension | SpecKit | BMAD | GSD | Taskmaster | mex | Pimzino | SpecFlow
Approval gates | None | Advisory | UAT phase | None | None | Dashboard (blocking) | Dashboard + skill enforcement
Memory | constitution.md | Git docs | STATE.md | tasks.json | Scaffold files | Steering docs | 3-tier: DocVault + file + mem0
Session learning | None | None | None | None | GROW loop | None | /prime → /audit → /wrap
Code search | None | None | None | None | None | None | Semantic + structural
Multi-project | Per-repo | Per-repo | Per-repo | Per-repo | Per-repo | Per-repo | One vault, all repos
Infrastructure | Code only | Code only | Code only | Code only | Code only | Code only | Docker, DNS, VMs, proxies
Drift detection | None | None | None | None | 8 checkers | None | /vault-update gate
Multi-tool | Any AI tool | Any AI tool | Claude focused | Any AI tool | 4 tools | Claude + MCP | Claude + MCP ecosystem
Self-hosted | Files only | Files only | Files only | Files only | Files only | Node.js | Milvus, Neo4j, Ollama
Best for | Quick adoption | Enterprise teams | Solo context eng. | PRD pipelines | Per-repo memory | Structured workflow | Multi-project, high-governance

What ships in the box

SpecFlow is an MCP server that works with any MCP-compatible agent — Claude Code, Gemini CLI, and Codex CLI verified. DocVault is an Obsidian vault. Code Context is a semantic search engine. Together, they replace the bloated instruction file + ad-hoc prompting pattern.

MCP Tools & Prompts
Tools (6)
  spec-workflow-guide   Lifecycle orchestration instructions
  steering-guide        Steering document creation guidance
  spec-status           Phase, completion, implementation audit
  approvals             Dashboard approve/reject/request changes
  log-implementation    Record artifacts (functions, endpoints, tests)
  spec-list             List all specs with status

Prompts (10)
  create-spec           Create requirements, design, or tasks doc
  create-steering-doc   Create product/tech/structure steering
  implement-task        Dispatch implementation to subagents
  refresh-tasks         Re-sync task state from spec files
  wrap                  End-of-session orchestrator (cleanup, docs, retro, digest)
  prime                 Fast session quick-start (~15s, optional deep mode)
  audit                 On-demand project health check
  + 3 injection prompts Context injection for guides
Dashboard — localhost:5051
Real-time web UI for spec management
  View all specs across projects with phase indicators
  Approve, reject, or request changes at each gate
  Track implementation progress per task
  Implementation log audit trail
  Auto-starts with MCP server initialization

Parallel subagent dispatch

The biggest architectural difference from other spec workflows: tasks don't execute sequentially in one session. Each task runs through an isolated three-stage pipeline, and tasks with zero file overlap execute concurrently.

Subagent pipeline — per task
# Each task runs through 3 isolated stages
Orchestrator         reads task from spec
Implementer Agent    fresh context, writes code, tests, commits
Compliance Reviewer  reads actual code vs requirements
Quality Reviewer     architecture, error handling, production readiness
log-implementation   record artifacts, mark complete

# File Touch Map identifies zero-overlap tasks
# Those tasks dispatch simultaneously in batches
Batch 1: Task 1 + Task 3 + Task 5    parallel
Batch 2: Task 2 + Task 4             parallel
Batch 3: Task 6 (depends on Task 2)  sequential
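The batching idea can be sketched as a greedy pass over each task's declared file set: a task joins the first batch it shares no files with. This is an illustrative sketch of the File Touch Map concept, with invented task names; it is not SpecFlow's actual scheduler.

```python
def batch_tasks(touch_map: dict[str, set[str]]) -> list[list[str]]:
    """Greedily group tasks whose file sets don't overlap into parallel batches."""
    batches: list[tuple[list[str], set[str]]] = []  # (task names, union of files)
    for task, files in touch_map.items():
        for names, touched in batches:
            if not (files & touched):   # zero file overlap → safe to parallelize
                names.append(task)
                touched |= files
                break
        else:
            batches.append(([task], set(files)))  # conflicts with all → new batch
    return [names for names, _ in batches]

touch_map = {
    "task1": {"ws/server.ts"},
    "task2": {"rules/engine.ts"},
    "task3": {"api/push.ts"},
    "task4": {"rules/engine.ts", "rules/schema.ts"},  # overlaps task2
}
print(batch_tasks(touch_map))  # → [['task1', 'task2', 'task3'], ['task4']]
```

Each inner list can dispatch concurrently because no two of its tasks touch the same file; later batches wait for earlier ones.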
Case Study

Forge: Empty repo to deployed production app in 3 hours

23 tasks, 30 parallel subagent dispatches, 8 dashboard approval gates, zero file conflicts. Work that would have taken a full day sequentially completed in one afternoon.

Where this is going

Each component is independently useful today. The roadmap is consolidation: one install, one Docker container, zero cloud dependencies.

Consolidation roadmap
SpecFlow MCP server — spec lifecycle + dashboard               shipped
DocVault — cross-project knowledge vault                       shipped
Code Context — semantic search (Milvus, self-hosted)           shipped
Code Graph Context — structural search (Neo4j, local)          shipped
60+ skills — procedural knowledge routing                      shipped
Memory pipeline — session digests (local Ollama or cloud LLM)  shipped
Rebrand Code Context — merge into SpecFlow plugin              next
Self-host mem0 — fork + local deployment                       planned
Self-host CGC — fork + local Neo4j bundle                      planned
Unified Docker container — all services in one stack           planned
Rebrand + package — one-command install for any project        future

The end state: a single installable system that gives any AI agent persistent memory, semantic code search, structural code intelligence, and spec-driven workflow — self-hosted core, cloud optional.

Stop re-explaining your project every session

SpecFlow is open source. The core workflow runs entirely on your machine. Cloud integrations (mem0, embeddings, session digests) are optional and configurable — use local models or cloud APIs based on your preference and hardware.