Session Memory Pipeline — Claude Code Guide

The Problem

Sessions Are Ephemeral

Every time you start a new Claude Code session, Claude has no memory of what happened before. Your CLAUDE.md provides static instructions, but it doesn't know what you worked on yesterday, what decisions you made, or what's left unfinished. You're constantly re-explaining context.

The typical workaround is to manually paste notes or dump entire conversation logs into the next session. This is noisy, burns tokens on irrelevant context, and doesn't scale.

What you actually want is a pipeline that automatically captures the important parts of each session, stores them in a searchable format, and loads relevant context when you start the next one.

Architecture

The Session Memory Pipeline

The pipeline has three layers: automatic context injection via hooks, semantic search over all past transcripts via session-rag, and lifecycle skills that load context at session start and capture knowledge at session end.

SESSION START
     │
     ▼
Hook injects mem0 automatically
└─ 5 recent memories, silent
     │
     ▼
/start quick resume (same-day)
├─ session-rag recent context
├─ mem0 project memories
└─ Git status + open issues
     │
     ▼
/prime full boot (days away)
├─ session-rag + mem0 context
├─ Code index health
└─ Full project status

SESSION END
     │
     ▼
/wrap captures knowledge:
├─ Retrospective lessons → mem0
├─ Curated summary → mem0
└─ Worktree + branch cleanup

ALWAYS RUNNING
     │
     ▼
session-rag MCP server:
Watches ~/.claude/projects/ for new JSONL transcripts,
indexes automatically via Milvus Lite + SQLite FTS5 hybrid
     │
COMPACTION?
     │
Hook fires again in HEAVY MODE
└─ Recovers handoff state + 10 memories

SessionStart hook

Automatic. Silently injects the 5 most recent mem0 memories at every session start. After compaction it fires again in heavy mode, recovering handoff state and 10 memories to restore the lost context.
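The hook's light/heavy switch can be sketched as a small Python script. This is a minimal sketch, not the author's hook: it assumes Claude Code passes the SessionStart event as JSON on stdin with a `source` field set to `"compact"` after compaction, and that text a SessionStart hook prints to stdout is added to the session's context. `fetch_memories` is a hypothetical stub standing in for a real mem0 query; the 5 and 10 limits come from the diagram.

```python
import json
import sys

# Memory limits taken from the pipeline diagram: light mode vs. heavy mode.
LIGHT_LIMIT = 5
HEAVY_LIMIT = 10


def choose_limit(payload: dict) -> int:
    """Return how many memories to inject for this SessionStart event.

    Assumes the payload carries a "source" field equal to "compact"
    when the session is resuming after compaction.
    """
    return HEAVY_LIMIT if payload.get("source") == "compact" else LIGHT_LIMIT


def fetch_memories(limit: int) -> list[str]:
    """Hypothetical stub: replace with a real mem0 query for this project."""
    return [f"memory {i + 1}" for i in range(limit)]


def build_context(payload: dict) -> str:
    """Format the memories (plus a handoff placeholder in heavy mode)."""
    limit = choose_limit(payload)
    heavy = limit == HEAVY_LIMIT
    lines = ["## Recovered context (heavy mode)" if heavy else "## Recent memories"]
    lines += [f"- {m}" for m in fetch_memories(limit)]
    if heavy:
        # The real hook also restores handoff state; this is only a placeholder.
        lines.append("- (handoff state would be appended here)")
    return "\n".join(lines)


def main() -> None:
    """Hook entry point: read the event JSON from stdin, print the context."""
    event = json.load(sys.stdin)
    print(build_context(event))
```

In the actual hook file you would end with `if __name__ == "__main__": main()` and register the script as a SessionStart command in your Claude Code settings.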

/start

Quick resume. Queries session-rag + mem0 for recent context, checks git state and open issues. Use for same-day work or short breaks.

/prime

Full boot. Queries session-rag + mem0, indexes codebase, checks code health, gathers project status. Use after days away or on new projects.

/wrap

Session close. Runs retrospective inline, saves lessons + curated summary to mem0, cleans up branches. No separate /retro needed.

session-rag

MCP server. Indexes JSONL transcripts into Milvus Lite with FTS5 hybrid search. Real-time file watcher, date range filtering, cross-session semantic search.

Layer 1

Session Search: session-rag

session-rag is an MCP server that indexes Claude Code JSONL transcripts directly into a Milvus Lite vector database with a SQLite FTS5 keyword sidecar, providing hybrid semantic + keyword search across all past sessions. It works natively with the JSONL format Claude Code already writes, indexes in real time via a file watcher, and supports date range filtering for historical queries.

Fork note

We run a fork of the original mwgreen/claude-code-session-rag. The fork adds ISO 8601 date range filtering (date_from/date_to) for historical search, global MCP server architecture (single Milvus DB at ~/.session-rag/ vs. per-project databases), and hybrid search using Reciprocal Rank Fusion to merge vector and FTS5 keyword results.

How it works

session-rag runs as an HTTP MCP server on port 7102. It watches ~/.claude/projects/ for new JSONL transcript files, chunks and embeds them using EmbeddingGemma-300M (optimized for Apple Silicon), and stores the vectors in a local Milvus Lite database at ~/.session-rag/. A parallel SQLite FTS5 index provides keyword search. Results from both engines are merged using Reciprocal Rank Fusion (RRF), so a query can match exact strings such as error messages, ticket IDs, or file names through FTS5 while still catching paraphrased discussions through vector similarity.
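RRF as it is usually defined scores each result as score(d) = Σ 1 / (k + rank_i(d)), summed over every ranked list d appears in. The sketch below implements that textbook formula; it is not session-rag's code, and the `k = 60` constant is an assumption (it is the commonly used default).

```python
def reciprocal_rank_fusion(
    ranked_lists: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Merge ranked result lists with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Documents found by both engines rise to the top.
    """
    scores: dict[str, float] = {}
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical hit lists from the vector index and the FTS5 index.
vector_hits = ["turn-17", "turn-42", "turn-08"]
keyword_hits = ["turn-42", "turn-99", "turn-17"]
print([doc for doc, _ in reciprocal_rank_fusion([vector_hits, keyword_hits])])
# ['turn-42', 'turn-17', 'turn-99', 'turn-08']
```

Because only ranks are used, RRF needs no calibration between cosine similarities and BM25 scores, which live on unrelated scales.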

MCP tools

Tool	Description
`search_session`	Search current session with recency bias. Finds past discussions, decisions, code snippets, error messages.
`search_all_sessions`	Cross-session semantic search without recency bias. Filters: `git_branch`, `project_root`, `date_from`, `date_to` (ISO 8601).
`get_turns`	Retrieve conversation turns surrounding a specific turn index. Use after search to see full context around a hit.
`get_session_stats`	Index statistics: total turns, session count, branches, breakdown by chunk type.
`cleanup_sessions`	Delete old data by age (`max_age_days`), specific `session_id`, or `git_branch`.
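The `date_from`/`date_to` filters can be pictured as a simple timestamp range check. session-rag's exact boundary semantics are not documented here, so this sketch assumes date-only bounds interpreted in UTC, an inclusive `date_to` that covers the whole day, and a per-chunk `timestamp` field; all three are assumptions.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Optional


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as "2026-03-14T09:30:00Z" as UTC."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def day_start(value: str) -> datetime:
    """Midnight UTC at the start of an ISO 8601 calendar date (YYYY-MM-DD)."""
    d = date.fromisoformat(value)
    return datetime(d.year, d.month, d.day, tzinfo=timezone.utc)


def filter_by_date(chunks: list[dict], date_from: Optional[str] = None,
                   date_to: Optional[str] = None) -> list[dict]:
    """Keep chunks whose timestamp lies within [date_from, date_to].

    date_to is inclusive of the whole day, so the exclusive upper limit
    is midnight at the start of the following day.
    """
    lower = day_start(date_from) if date_from else None
    upper = day_start(date_to) + timedelta(days=1) if date_to else None
    kept = []
    for chunk in chunks:
        ts = parse_timestamp(chunk["timestamp"])
        if lower is not None and ts < lower:
            continue
        if upper is not None and ts >= upper:
            continue
        kept.append(chunk)
    return kept
```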

Usage examples

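# Illustrative MCP tool calls, written the way Claude invokes session-rag's tools;
# these are not functions you can import and run in Python.

# Search across all sessions with date filtering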
search_all_sessions("what was the decision about auth middleware",
    date_from="2026-03-01", date_to="2026-03-15")

# Search within the current session
search_session("how did we fix the build error")

# Get full context around a search hit
get_turns(session_id="abc123", turn_index=42, context=3)

# Check index health
get_session_stats()

# Clean up old session data
cleanup_sessions(max_age_days=90)
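Assuming each session's turns are stored in order and `turn_index` is a turn's position in that list (an assumption; session-rag's storage layout is not described here), the windowing behind a call like `get_turns(..., context=3)` reduces to a clamped slice. `window_around` is a hypothetical helper, not a session-rag tool.

```python
def window_around(turns: list[dict], turn_index: int, context: int = 3) -> list[dict]:
    """Return the hit turn plus up to `context` turns on either side.

    The window is clamped at the start and end of the session, so hits near
    the edges return fewer neighbors instead of failing.
    """
    if not 0 <= turn_index < len(turns):
        raise IndexError(f"turn_index {turn_index} out of range")
    start = max(0, turn_index - context)
    end = min(len(turns), turn_index + context + 1)
    return turns[start:end]
```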

Why session-rag instead of log cleaning scripts?

Earlier versions of this pipeline used Python scripts to clean iTerm2 logs and JSONL transcripts, then indexed them via claude-context. session-rag eliminates all of that — it reads JSONL natively, indexes automatically via a file watcher, and stores everything in a local vector DB. No cleaning scripts, no manual indexing, no cron jobs. The old scripts are retired.
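Reading JSONL natively mostly means parsing one JSON object per line and keeping the conversational records. This sketch assumes the record shape Claude Code currently writes (a `type` of `user` or `assistant`, a `message.content` that is either a string or a list of content blocks, plus `sessionId`, `gitBranch`, and `timestamp`). The transcript format is not a documented, stable contract, so treat those field names as assumptions; `iter_turns` is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Iterator


def text_of(content) -> str:
    """Flatten message content: either a plain string or a list of blocks."""
    if isinstance(content, str):
        return content
    parts = []
    for block in content or []:
        # Keep only text blocks; tool calls and tool results are skipped here.
        if isinstance(block, dict) and block.get("type") == "text":
            parts.append(block.get("text", ""))
    return "\n".join(parts)


def iter_turns(path: Path) -> Iterator[dict]:
    """Yield user/assistant turns from a Claude Code JSONL transcript."""
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # tolerate a partially written final line
            if record.get("type") not in ("user", "assistant"):
                continue
            text = text_of((record.get("message") or {}).get("content"))
            if text.strip():
                yield {
                    "session_id": record.get("sessionId"),
                    "git_branch": record.get("gitBranch"),
                    "timestamp": record.get("timestamp"),
                    "role": record["type"],
                    "text": text,
                }
```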

Prompt for Claude: Set up session-rag

I want to set up session-rag as an MCP server for Claude Code. It indexes JSONL transcripts into a Milvus Lite vector DB with FTS5 hybrid search and provides semantic search via MCP tools. The server should:

1. Run as an HTTP MCP server (default port 7102)
2. Watch ~/.claude/projects/ for new JSONL files and index them automatically
3. Provide search_all_sessions with date range filtering (date_from/date_to) for cross-session queries
4. Provide search_session for current-session search with recency bias
5. Provide get_turns for retrieving verbatim conversation turns around a search hit
6. Provide get_session_stats for index health monitoring

Add it to your Claude Code MCP config alongside mem0. Together, session-rag (verbatim transcript search) and mem0 (curated episodic memory) are the two authoritative sources for session context.
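The incremental part of the watcher can be sketched with the standard library alone. This assumes simple polling rather than OS file events (the real session-rag watcher may work differently) and relies on transcripts growing by appended lines: remembering a byte offset per file means each pass reads only new turns. `TranscriptTailer` is a hypothetical name.

```python
from pathlib import Path


class TranscriptTailer:
    """Poll a directory tree for *.jsonl files and return only new lines.

    Keeps a byte offset per file so each poll reads just what was appended
    since the previous poll. A real server would pass lines to its indexer.
    """

    def __init__(self, root: Path):
        self.root = root
        self.offsets: dict[Path, int] = {}

    def poll(self) -> dict[Path, list[str]]:
        new_lines: dict[Path, list[str]] = {}
        for path in sorted(self.root.rglob("*.jsonl")):
            size = path.stat().st_size
            offset = self.offsets.get(path, 0)
            if size < offset:  # file was truncated or replaced: start over
                offset = 0
            if size == offset:
                continue
            with path.open("rb") as fh:
                fh.seek(offset)
                chunk = fh.read()
            # Consume only complete lines; leave a partial tail for next poll.
            end = chunk.rfind(b"\n") + 1
            if end == 0:
                continue
            self.offsets[path] = offset + end
            new_lines[path] = chunk[:end].decode("utf-8").splitlines()
        return new_lines
```

Usage would be a loop such as `tailer = TranscriptTailer(Path.home() / ".claude" / "projects")` followed by periodic `tailer.poll()` calls.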

Layer 2

Lifecycle Skills: /start, /prime, and /wrap

The real power comes from wiring the pipeline into your session lifecycle. Three skills handle this: /start for quick resumes, /prime for full boots, and /wrap at session end.

/start — Quick Resume

/start is the lightweight alternative to /prime. Use it for same-day work or resuming after a short break. Claude:

1. Queries session-rag for recent session context
2. Searches mem0 for project memories
3. Checks git state and open issues
4. Presents a quick briefing — no indexing, no health checks

   Use /start when you're continuing the same thread of work.
   Use /prime when you've been away for days or switching projects.
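The rule of thumb above can be written out as a tiny decision function. Both the function and its 24-hour cutoff are hypothetical illustrations of the guidance, not part of either skill.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical cutoff: the guide distinguishes "same-day" from "days away".
STALE_AFTER = timedelta(hours=24)


def choose_boot_skill(last_session_end: Optional[datetime], now: datetime,
                      same_project: bool) -> str:
    """Pick /start for a quick resume or /prime for a full boot."""
    if last_session_end is None or not same_project:
        return "/prime"  # no history or switching projects: full boot
    if now - last_session_end >= STALE_AFTER:
        return "/prime"  # been away too long: rebuild full context
    return "/start"
```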

/prime — Full Session Boot

When you start a session and type /prime, Claude:

1. Queries session-rag for recent session context
2. Searches mem0 for recent memories about this project
3. Checks git status, open PRs, and unfinished issues
4. Verifies code indexes are fresh (vector DB + graph DB)
5. Gathers full project status
6. Presents a unified startup briefing:

   "Last session you were working on STAK-498 (price scraper bug).
    PR #903 was merged. Tasks 4-5 are still open. The poller was
    redeployed to both Portainer and Fly.io. Code indexes are fresh."

/wrap — Session End

When you're done and type /wrap, Claude runs the retrospective inline (no separate /retro step) and saves everything to mem0:

1. Checks for uncommitted changes, open PRs, in-progress tasks
2. Asks what to do with any loose ends (commit? stash? discard?)
3. Runs retrospective inline — captures prescriptive lessons
   (what to do differently, not just what happened)
4. Saves retro lessons + curated session summary to mem0
5. Cleans up merged worktree branches
6. Prints a session recap with shipped/pending/next-session summary

   Note: /wrap saves to mem0 only. session-rag indexes the raw JSONL
   transcript automatically — no digest step needed.
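Step 1 of /wrap needs to know what is uncommitted. One way a skill could check is to parse `git status --porcelain` (format v1), where each line is an index-status letter, a work-tree-status letter, a space, and a path. The helper names below are hypothetical, and the parser keeps quoted paths and rename arrows (`old -> new`) verbatim rather than decoding them.

```python
import subprocess


def git_status_porcelain(repo: str = ".") -> str:
    """Run `git status --porcelain` in repo and return its raw output."""
    # Do not .strip() the output: a leading space is a meaningful status column.
    result = subprocess.run(["git", "status", "--porcelain"], cwd=repo,
                            capture_output=True, text=True, check=True)
    return result.stdout


def classify_porcelain(output: str) -> dict[str, list[str]]:
    """Group porcelain v1 lines into staged, unstaged, and untracked paths."""
    groups: dict[str, list[str]] = {"staged": [], "unstaged": [], "untracked": []}
    for line in output.splitlines():
        if len(line) < 4:
            continue
        x, y, path = line[0], line[1], line[3:]
        if x == "?" and y == "?":
            groups["untracked"].append(path)
            continue
        if x != " ":
            groups["staged"].append(path)    # change recorded in the index
        if y != " ":
            groups["unstaged"].append(path)  # change only in the work tree
    return groups


def has_loose_ends(groups: dict[str, list[str]]) -> bool:
    """True if anything would need a commit, stash, or discard decision."""
    return any(groups.values())
```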

Prompt for Claude: Build session lifecycle skills

I want three Claude Code skills for session lifecycle management:

/start (quick resume): Query session-rag for recent session context, search mem0 for project memories, check git status and open issues, and present a quick briefing. No indexing or health checks — use this for same-day work or short breaks.

/prime (full session boot): Query session-rag for recent session context, search mem0 for recent project memories, check git status, open PRs, and open issues, verify code indexes are fresh, and present a full briefing that tells me where I left off and what needs attention. Use this after days away or when switching projects.

/wrap (session end): Check for uncommitted work, capture retrospective lessons to mem0 (mistakes, successful approaches, preferences), save a curated session summary to mem0, clean up merged branches, and print a session recap. session-rag handles transcript indexing automatically — no digest agent needed.

All three skills should be saved as ~/.claude/skills/<name>/SKILL.md files. They should ask me before taking destructive actions (deleting branches, discarding changes).

Comparison

Approaches to Session Continuity

There are multiple valid approaches to solving the session memory problem. Here's how the lifecycle approach compares to other strategies.

Aspect	Lifecycle Pipeline	Universal Indexer
Philosophy	Capture at session boundaries with structured skills	Post-process logs into a symbolic ontology of primitives
When it runs	Integrated into workflow: `/prime` at start, `/wrap` at end	Batch processing — can run anytime against log corpus
What it stores	Tagged memories (mem0) + raw transcript index (session-rag)	Structured primitives: terms, decisions, patterns, operators
Search method	Hybrid vector + FTS5 (session-rag) + mem0 recall	Index lookup + differential tracking
Multi-agent	Same pipeline works across Claude, Codex, Gemini sessions	Designed for Claude + Codex with synced index copies
Survives compaction	Yes — lives in mem0 + session-rag, not conversation window	Yes — loaded from index files into CLAUDE.md/AGENTS.md
Best for	Workflow-integrated teams who want session bookends	Heavy research/exploration sessions needing deep corpus analysis

These approaches are complementary. You could run both — use lifecycle skills for day-to-day session management and a universal indexer for periodic deep analysis of your conversation corpus.

Quick Start

Get Started in 10 Minutes

1. Set up session-rag

Clone the session-rag fork and run it as an HTTP MCP server. It indexes your Claude Code JSONL transcripts into a local Milvus Lite vector database with FTS5 hybrid search. The file watcher picks up new sessions automatically — no cron jobs or manual indexing needed.

git clone https://github.com/lbruton/claude-code-session-rag.git
cd claude-code-session-rag
pip install -r requirements.txt
python http_server.py  # runs on port 7102
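Before adding the MCP config, it can help to confirm the server actually came up. This minimal stdlib check assumes the default port 7102 mentioned above; it only shows that something is accepting TCP connections there, not that session-rag's /mcp endpoint is healthy. `server_is_listening` is a hypothetical helper.

```python
import socket


def server_is_listening(host: str = "localhost", port: int = 7102,
                        timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    This proves only that the port is open, not which process owns it.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run `print(server_is_listening())` in another terminal while http_server.py is up.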

2. Set up mem0

mem0 provides cloud-hosted cross-session memory with automatic fact decomposition. Sign up at mem0.ai and add it as an MCP server in your Claude Code config. Together with session-rag, these form the two authoritative session context sources.

3. Add to MCP config

~/.claude.json (excerpt)

{
  "mcpServers": {
    "session-rag": {
      "type": "http",
      "url": "http://localhost:7102/mcp"
    },
    "mem0": {
      "command": "npx",
      "args": ["-y", "@mem0/mcp"],
      "env": { "MEM0_API_KEY": "your-key-here" }
    }
  }
}

4. Create the skills

mkdir -p ~/.claude/skills/start
mkdir -p ~/.claude/skills/prime
mkdir -p ~/.claude/skills/wrap
# Create SKILL.md in each directory with the lifecycle logic
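If you prefer to scaffold the three files from a script, a sketch follows. It assumes the Claude Code convention of a SKILL.md whose YAML frontmatter holds `name` and `description`; the descriptions paraphrase this guide and the bodies are placeholders, not the author's skill files. `scaffold_skills` is a hypothetical helper.

```python
from pathlib import Path

# Descriptions paraphrase the guide (colons avoided to keep the YAML plain).
SKILLS = {
    "start": "Quick resume. Query session-rag and mem0 for recent context, "
             "check git status and open issues, then brief me.",
    "prime": "Full boot. Query session-rag and mem0, check code index health "
             "and project status, then give a full briefing.",
    "wrap": "Session end. Check loose ends, save retro lessons and a curated "
            "summary to mem0, clean up merged branches, print a recap.",
}


def scaffold_skills(root: Path, overwrite: bool = False) -> list[Path]:
    """Write a SKILL.md with YAML frontmatter for each lifecycle skill."""
    written = []
    for name, description in SKILLS.items():
        skill_file = root / name / "SKILL.md"
        if skill_file.exists() and not overwrite:
            continue  # never clobber a skill you have already customized
        skill_file.parent.mkdir(parents=True, exist_ok=True)
        skill_file.write_text(
            f"---\nname: {name}\ndescription: {description}\n---\n\n"
            f"# /{name}\n\nTODO: step-by-step instructions for this skill.\n"
            "Ask before destructive actions (deleting branches, discarding changes).\n",
            encoding="utf-8",
        )
        written.append(skill_file)
    return written
```

For example, `scaffold_skills(Path.home() / ".claude" / "skills")` creates any missing files and leaves existing ones untouched.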

5. Try it

# Start a new Claude Code session
claude

# Quick resume (same-day work or short breaks)
> /start

# Or full boot (days away, new project)
> /prime

# ... do your work ...

# Capture everything before closing
> /wrap

Minimum viable version

You don't need all of this at once. Start with just session-rag — it gives you searchable history with zero manual work. Add mem0 for curated memory that survives across projects. Add the lifecycle skills last for the full session-bookend experience.