Layer 1
Log Cleaning Scripts
Raw session logs are noisy — full of ANSI escape codes, TUI spinner text, and control characters. These Python scripts clean them into digestible conversation text.
iTerm2 Session Log Cleaner
If you use iTerm2's "Automatically log session input/output" feature, your logs will be full of terminal noise. This script strips timestamps, ANSI codes, TUI elements (spinners, status bars), and reassembles single-character keystrokes back into typed words.
#!/usr/bin/env python3
"""
iTerm2 session log cleaner for Claude Code digest pipeline.
Strategy: strip timestamps and control chars, then KEEP only lines that
look like real conversational content. Everything else is TUI noise.
Usage:
python3 clean-session-log.py <input.log> [output.txt]
python3 clean-session-log.py <input.log> # prints to stdout
"""
import re, sys
from pathlib import Path
# --- Timestamp prefix (iTerm2 log format: "[MM/DD/YY, HH:MM:SS.mmm PM] ") ---
TS_RE = re.compile(r"^\[[\d/]+,\s*[\d:.]+ [AP]M\]\s*")
# --- Control character stripping ---
# CSI escape sequences -- the parameter class now also accepts the
# private-mode/extension prefixes "?", "=", ">", "<" so sequences like
# "\x1b[?25l" (cursor hide/show, emitted constantly by TUIs) are removed
# whole instead of leaving "[?25l" residue behind. The second alternative
# matches OSC sequences terminated by BEL or ST.
ANSI_RE = re.compile(r"\x1b\[[0-9;?=><]*[A-Za-z]|\x1b\].*?(\x07|\x1b\\)")
# Remaining C0 control chars and DEL (tab \x09, LF \x0a, CR \x0d are kept).
CTRL_CHARS_RE = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")
# Runs (2+) of box-drawing, block-element, and geometric-shape glyphs used
# for TUI borders, separators, and progress bars.
TUI_FILLER_RE = re.compile(r"[\u2500-\u257f\u2580-\u259f\u25a0-\u25ff]{2,}")
# --- Noise patterns (lines to DROP) ---
NOISE_PATTERNS = [
    # TUI spinner verbs (Claude Code rotates these)
    re.compile(r"^[·✢✳✶✻✽❯⏺\s]*(Embellishing|Grooving|Orbiting|Composing|"
               r"Thinking|Reflecting|Musing|Pondering|Weaving|Crafting|"
               r"Sculpting|Harmonizing|Polishing|Illuminating|Reasoning|"
               r"Brainstorming|Synthesizing|Considering|Generating|"
               r"Processing|Analyzing|Exploring|Initializing|Loading)"),
    # Editor mode indicators (vim-style status lines)
    re.compile(r"--\s*(INSERT|NORMAL|VISUAL|REPLACE)\s*--"),
    # Model name banners in the status bar
    re.compile(r"(Opus\s*4\.\d|Sonnet\s*4\.\d|Haiku\s*4\.\d)"),
    re.compile(r"^lbruton@\S+.*%"), # shell prompts
    re.compile(r"^(Last login:|You have mail)"),
    re.compile(r"claude\.ai/code/session_"),
    # Permission-prompt chrome
    re.compile(r"Do you want to proceed\?|Esc\s*to\s*cancel"),
    # Bare tool-name lines with no arguments
    re.compile(r"^(Read|Edit|Write|Bash|Glob|Grep)\s*(file)?\s*$"),
]
# An alphabetic "word" of 2+ letters; used to gauge conversational content.
WORD_RE = re.compile(r"\b[a-zA-Z]{2,}\b")
def strip_line(line):
    """Normalize one raw log line.

    Applies, in order: timestamp-prefix removal, ANSI escape removal,
    control-character removal, and TUI filler-glyph removal; then squeezes
    runs of 3+ spaces to a single space and trims trailing whitespace.
    """
    for pattern in (TS_RE, ANSI_RE, CTRL_CHARS_RE, TUI_FILLER_RE):
        line = pattern.sub("", line)
    return re.sub(r" {3,}", " ", line).rstrip()
def is_noise(line):
    """Return True when the line is blank, too short (< 4 chars after
    stripping), or matches any known TUI/noise pattern."""
    text = line.strip()
    if len(text) < 4:  # also covers the empty-string case
        return True
    for pattern in NOISE_PATTERNS:
        if pattern.search(text):
            return True
    return False
def has_content(line):
    """Return True when the line carries enough alphabetic words to look
    like real conversation.

    Lines containing backticks or slashes (likely code or paths) get a
    lower bar of 2 words; plain prose needs 3.
    """
    text = line.strip()
    required = 2 if ("`" in text or "/" in text) else 3
    return len(WORD_RE.findall(text)) >= required
def reassemble_keystrokes(lines):
    """Collapse runs of single-character lines into "[user typed: ...]" markers.

    iTerm2 logs each keystroke on its own line; consecutive single printable
    ASCII characters are buffered and joined. Assembled runs of 1-2 chars are
    dropped as accidental noise; all other lines pass through unchanged.
    """
    out = []
    pending = []

    def flush():
        # Emit the buffered keystrokes (if long enough to be a real word).
        typed = "".join(pending)
        if len(typed) > 2:
            out.append(f"[user typed: {typed}]")
        pending.clear()

    for raw in lines:
        ch = raw.strip()
        if len(ch) == 1 and ch.isprintable() and ord(ch) < 128:
            pending.append(ch)
            continue
        flush()
        out.append(raw)
    flush()  # trailing run at end of input
    return out
def clean_log(raw_text):
    """Run the full cleaning pipeline over a raw session log.

    Pipeline: strip each line -> reassemble keystroke runs -> keep only
    non-noise lines with real content -> collapse blank-line runs to one.
    Returns the cleaned text with outer whitespace trimmed.
    """
    stripped = [strip_line(raw) for raw in raw_text.splitlines()]
    assembled = reassemble_keystrokes(stripped)
    kept = [ln for ln in assembled if not is_noise(ln) and has_content(ln)]
    # Collapse runs of blank lines down to a single blank.
    collapsed = []
    blank_run = 0
    for ln in kept:
        if ln.strip():
            blank_run = 0
            collapsed.append(ln)
        else:
            blank_run += 1
            if blank_run == 1:
                collapsed.append("")
    return "\n".join(collapsed).strip()
if __name__ == "__main__":
    # CLI entry: clean-session-log.py <input.log> [output.txt]
    if len(sys.argv) < 2:
        print(f"Usage: {sys.argv[0]} <input.log> [output.txt]", file=sys.stderr)
        sys.exit(1)
    raw = Path(sys.argv[1]).read_text(errors="replace")
    cleaned = clean_log(raw)
    if len(sys.argv) >= 3:
        Path(sys.argv[2]).write_text(cleaned)
        r, c = len(raw.splitlines()), len(cleaned.splitlines())
        # Guard against an empty input file: r == 0 would otherwise raise
        # ZeroDivisionError when computing the reduction percentage.
        pct = (r - c) / r * 100 if r else 0.0
        print(f"Cleaned: {r} -> {c} lines ({pct:.0f}% reduction)", file=sys.stderr)
    else:
        print(cleaned)
JSONL Transcript Extractor
Claude Code stores session transcripts as JSONL files in ~/.claude/projects/. This script extracts clean conversation text with actor attribution, tool call summaries (not full output), and session metadata. It filters out thinking blocks, tool results (which can contain entire file contents), and system injections.
#!/usr/bin/env python3
"""
Extract clean conversation text from Claude Code JSONL transcripts.
Output format:
### User
<user message text>
### Assistant
<assistant response text>
### Tool: <tool_name>
<brief tool context — name + key input, NOT full output>
Filters out: thinking blocks, tool_result payloads, system messages.
"""
import json, sys
from datetime import datetime
from pathlib import Path
def extract_session_metadata(entries):
    """Scan parsed JSONL entries and collect session-level metadata.

    Returns a dict with:
        project    -- last non-empty path component of cwd, or "unknown"
        branch     -- first gitBranch seen
        cwd        -- first cwd seen
        start_time -- lexicographic min of ISO timestamps (ISO-8601 sorts
                      correctly as strings)
        end_time   -- lexicographic max of ISO timestamps
        tool_count -- number of tool_use blocks across all messages
        tools_used -- sorted list of distinct tool names
    """
    meta = {"project": "unknown", "branch": "", "cwd": "",
            "start_time": "", "end_time": "",
            "tool_count": 0, "tools_used": set()}
    for entry in entries:
        if not isinstance(entry, dict):
            continue  # tolerate stray non-object JSONL lines
        if not meta["cwd"] and entry.get("cwd"):
            meta["cwd"] = entry["cwd"]
        if not meta["branch"] and entry.get("gitBranch"):
            meta["branch"] = entry["gitBranch"]
        ts = entry.get("timestamp", "")
        if ts:
            if not meta["start_time"] or ts < meta["start_time"]:
                meta["start_time"] = ts
            if not meta["end_time"] or ts > meta["end_time"]:
                meta["end_time"] = ts
        msg = entry.get("message", {})
        if isinstance(msg, dict):
            for block in (msg.get("content") or []):
                if isinstance(block, dict) and block.get("type") == "tool_use":
                    meta["tool_count"] += 1
                    name = block.get("name", "")
                    if name:  # don't record nameless tool blocks
                        meta["tools_used"].add(name)
    # Derive project from the last non-empty path component so that "/",
    # trailing slashes, and an empty cwd all fall back to "unknown"
    # (the original could yield "" for cwd == "/").
    parts = [p for p in meta["cwd"].split("/") if p]
    if parts:
        meta["project"] = parts[-1]
    meta["tools_used"] = sorted(meta["tools_used"])
    return meta
def _user_text(content):
    """Extract plain text from a user message's content (str or block list)."""
    if isinstance(content, str):
        return content.strip()
    if isinstance(content, list):
        # Fix: use .get("text", "") — the original indexed b["text"] and
        # raised KeyError on a text block missing the key.
        return "\n".join(
            b.get("text", "") for b in content
            if isinstance(b, dict) and b.get("type") == "text"
        ).strip()
    return ""

def _tool_summary(block):
    """One-line summary for a tool_use block: tool name plus its key input
    (file path, truncated command, grep pattern, or agent description)."""
    name = block.get("name", "?")
    inp = block.get("input", {}) or {}
    ctx = ""
    if name in ("Read","Write","Edit"):
        ctx = inp.get("file_path", "")
    elif name == "Bash":
        cmd = inp.get("command", "")
        ctx = cmd[:120] + ("..." if len(cmd) > 120 else "")
    elif name == "Grep":
        ctx = f'pattern="{inp.get("pattern","")}"'
    elif name == "Agent":
        ctx = inp.get("description", "")[:80]
    return f"[Tool: {name} -> {ctx}]" if ctx else f"[Tool: {name}]"

def extract_conversation(jsonl_path):
    """Parse a Claude Code JSONL transcript into readable conversation text.

    Returns (text, meta): text is a session header plus "### User" /
    "### Assistant" sections with brief tool-call summaries (never full
    tool output); meta is extract_session_metadata()'s dict. Malformed
    JSON lines, non-dict entries, and non-text blocks are skipped.
    Returns ("", {}) when no entries parse.
    """
    entries = []
    for line in jsonl_path.read_text(errors="replace").splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            entries.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # tolerate truncated/corrupt lines
    if not entries:
        return "", {}
    meta = extract_session_metadata(entries)
    parts = [f"--- Session: {meta['project']} | "
             f"branch={meta['branch']} | "
             f"tools={','.join(meta['tools_used'][:10])} ---\n"]
    for entry in entries:
        if not isinstance(entry, dict):
            continue
        msg = entry.get("message", {})
        if not isinstance(msg, dict):
            continue  # guard: the original crashed on non-dict messages
        etype = entry.get("type", "")
        if etype == "user":
            text = _user_text(msg.get("content", ""))
            if text:
                parts.append(f"### User\n{text}\n")
        elif etype == "assistant":
            content = msg.get("content", [])
            if not isinstance(content, list):
                continue
            texts, tools = [], []
            for block in content:
                if not isinstance(block, dict):
                    continue
                btype = block.get("type")
                if btype == "text":
                    t = block.get("text", "").strip()
                    if t:
                        texts.append(t)
                elif btype == "tool_use":
                    tools.append(_tool_summary(block))
            if texts:
                parts.append("### Assistant\n" + "\n".join(texts) + "\n")
            if tools:
                parts.append(" " + " ".join(tools) + "\n")
    return "\n".join(parts), meta
if __name__ == "__main__":
    # CLI entry: extract a JSONL transcript; write to a file or print to stdout.
    args = sys.argv
    if len(args) < 2:
        print(f"Usage: {args[0]} <session.jsonl> [output.txt]",
              file=sys.stderr)
        sys.exit(1)
    text, meta = extract_conversation(Path(args[1]))
    if len(args) >= 3:
        Path(args[2]).write_text(text)
    else:
        print(text)
Where are the logs?
iTerm2 logs: Enable in iTerm2 Preferences → Profiles → Session → "Automatically log session input/output." Logs land in your configured directory (e.g.,
~/.claude/iterm2/).
JSONL transcripts: Claude Code stores these automatically at
~/.claude/projects/<project-hash>/*.jsonl. Each session is one file.