The 7-Layer Memory Stack

Clawpy implements a biologically-inspired, stratified memory architecture that gives every agent seven distinct layers of knowledge — from volatile session context to permanently-pinned canonical facts. This is not a single database with different labels. Each layer is a separate subsystem with its own storage engine, access pattern, and retention policy.

No other agentic platform implements memory at this depth.


Architecture Overview

┌────────────────────────────────────────────────────────────┐
│  Layer G — PARA Canonical Knowledge      (Permanent)       │
│  Storage: JSON + Markdown │ Module: para_manager.py        │
├────────────────────────────────────────────────────────────┤
│  Layer F — Temporal Knowledge Graph      (Persistent)      │
│  Storage: JSON DAG       │ Module: task_memory_graph.py    │
├────────────────────────────────────────────────────────────┤
│  Layer E — Cross-Session Context Engine  (Configurable)    │
│  Storage: ChromaDB       │ Module: chroma_context_engine.py│
├────────────────────────────────────────────────────────────┤
│  Layer D — Auto-Capture / Auto-Recall    (Automatic)       │
│  Storage: SQLite + RAM   │ Module: auto_capture + recall   │
├────────────────────────────────────────────────────────────┤
│  Layer C — Semantic Vector Search        (Configurable)    │
│  Storage: ChromaDB       │ Module: layer3_vector.py        │
├────────────────────────────────────────────────────────────┤
│  Layer B — Structured Event Ledger       (Configurable)    │
│  Storage: SQLite         │ Module: layer2_sqlite.py        │
├────────────────────────────────────────────────────────────┤
│  Layer A — Markdown Flat Files           (Session)         │
│  Storage: Filesystem     │ Module: layer1_markdown.py      │
└────────────────────────────────────────────────────────────┘

Layer A — Markdown Flat Files

Module: memory/layer1_markdown.py
Storage: Filesystem (.md files)
Retention: Session-scoped

The simplest layer. Each agent workspace contains flat Markdown files that serve as mutable state:

  • soul.md — The agent's identity, personality, and operating constraints. Read-only during execution, writable by the Wisdom Cascade teaching cycle.
  • memory.md — Accumulated daily extraction summaries. Appended nightly by the Memory Extractor.
  • heartbeat.md — Pending scheduled tasks. Read by the Heartbeat Protocol on every pulse.
  • comms.json — Communication configuration including reports_to for hierarchy resolution.
  • tools.json — Tool permissions, archetype classification, and model routing.

Layer A is the only layer that is directly editable by operators through the dashboard.
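The flat-file layout above can be sketched as a simple loader. This is a minimal illustration, not the real module: the file names come from the list above, but the return shape and missing-file defaults are assumptions.

```python
from pathlib import Path
import json

def load_workspace_state(workspace: Path) -> dict:
    """Load Layer A flat-file state for an agent workspace (illustrative)."""
    state = {}
    # Markdown state files: identity, accumulated memory, scheduled tasks
    for name in ("soul.md", "memory.md", "heartbeat.md"):
        path = workspace / name
        state[name.removesuffix(".md")] = path.read_text() if path.exists() else ""
    # JSON configuration files: hierarchy and tool permissions
    for name in ("comms.json", "tools.json"):
        path = workspace / name
        state[name.removesuffix(".json")] = json.loads(path.read_text()) if path.exists() else {}
    return state
```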


Layer B — Structured Event Ledger

Module: memory/layer2_sqlite.py (35,386 bytes)
Storage: SQLite
Retention: Configurable per workspace

Every significant runtime event is recorded as a structured row in SQLite — tool calls, validation runs, learning records, budget crossings, and memory events. This is Clawpy's audit trail and the raw material for the Self-Learning pipeline.

Key tables:

| Table            | Purpose                                            | Consumed by                |
|------------------|----------------------------------------------------|----------------------------|
| learning_records | Outcomes from validation, errors, budget incidents | Adaptation Engine          |
| validation_runs  | Every validate → heal → retry cycle                | Performance analytics      |
| memory_events    | PARA promotions, nightly extractions               | Memory lifecycle tracking  |
| cost_entries     | Per-request token costs and billing data           | Budget enforcement         |

The event ledger is the input to the Nightly Memory Extraction process, which synthesises durable facts from the raw event stream.
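A structured event ledger of this kind reduces to a small amount of SQLite. The sketch below is illustrative: the table name learning_records matches the docs, but the columns shown here are assumptions — the real layer2_sqlite.py schema is richer.

```python
import sqlite3

def open_ledger(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a minimal event ledger with one illustrative table."""
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS learning_records (
            id      INTEGER PRIMARY KEY,
            ts      REAL NOT NULL,   -- event timestamp
            kind    TEXT NOT NULL,   -- e.g. validation | error | budget
            payload TEXT NOT NULL    -- JSON blob consumed downstream
        )
    """)
    return conn

def record_event(conn: sqlite3.Connection, kind: str, payload: str, ts: float) -> None:
    """Append one structured event row to the ledger."""
    conn.execute(
        "INSERT INTO learning_records (ts, kind, payload) VALUES (?, ?, ?)",
        (ts, kind, payload),
    )
```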


Layer C — Semantic Vector Search

Module: memory/layer3_vector.py (12,244 bytes)
Storage: ChromaDB (L2 distance, HNSW index)
Retention: Configurable via retention tiers

Embeds text chunks into a vector space for similarity-based retrieval. This is the workhorse for "fuzzy recall" — when an agent needs to find relevant prior context without knowing the exact terms.

Uses an L2 distance threshold (MAX_DISTANCE = 1.5) to filter irrelevant results. Memories whose distance exceeds this threshold are discarded, preventing hallucinated context from being injected.
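The cutoff can be expressed as a one-line post-filter over query results. Only MAX_DISTANCE = 1.5 comes from the docs; pairing documents with precomputed L2 distances (as ChromaDB query results provide) is the assumed input shape.

```python
MAX_DISTANCE = 1.5  # L2 distance cutoff from the docs

def filter_by_distance(documents: list, distances: list, max_distance: float = MAX_DISTANCE) -> list:
    """Keep only results whose L2 distance is within the threshold.

    Lower distance = more similar, so anything above the cutoff is dropped.
    """
    return [doc for doc, dist in zip(documents, distances) if dist <= max_distance]
```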


Layer D — Auto-Capture & Auto-Recall Pipeline

Modules: memory/auto_capture.py (9,698 bytes) + memory/auto_recall.py (40,829 bytes)
Storage: In-memory buffers + SQLite
Retention: Automatic — the system decides what to capture

This is the pattern-based extraction layer that sits between raw conversation and permanent memory. It operates in two phases:

  1. Auto-Capture — Scans assistant messages for extractable patterns: decisions made, facts stated, corrections applied. These are tagged and stored without human intervention.

  2. Auto-Recall — Before every LLM call, queries all available memory layers and injects relevant context into the system prompt. This is the layer that decides what the agent remembers at any given moment.

The Auto-Recall module is over 40 KB of code because it manages the complex orchestration of querying Layers B, C, E, and G simultaneously, deduplicating results, and fitting them within the token budget.
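The merge-dedupe-budget step can be sketched as follows. This is a hedged simplification: the 4-characters-per-token estimate and the list-of-lists input shape are assumptions, not the real auto_recall.py logic.

```python
def assemble_recall(layer_results: list[list[str]], token_budget: int) -> list[str]:
    """Merge memory candidates from several layers, dedupe, fit a token budget."""
    seen, merged = set(), []
    for results in layer_results:      # e.g. Layers B, C, E, G in priority order
        for text in results:
            if text not in seen:       # drop duplicates found by multiple layers
                seen.add(text)
                merged.append(text)
    selected, used = [], 0
    for text in merged:
        cost = max(1, len(text) // 4)  # rough token estimate (assumption)
        if used + cost > token_budget:
            break                      # budget exhausted; remaining candidates dropped
        selected.append(text)
        used += cost
    return selected
```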


Layer E — Cross-Session Context Engine

Module: core/chroma_context_engine.py (12,262 bytes)
Storage: ChromaDB (clawpy_agent_memory collection)
Retention: Configurable

While Layer C handles within-session similarity search, Layer E provides cross-session, cross-task semantic recall. When an agent starts a new task, Layer E retrieves relevant memories from all previous tasks — even those belonging to different agents in the same workspace.

Key capabilities:

  • Task summary storage — After task completion, a compressed summary is stored as a durable memory entry.
  • Cloud backup — If Supabase is configured, task summaries are pushed to the cloud for cross-device recall.
  • Relevance filtering — Only memories within the L2 distance threshold (1.5) are injected, preventing noise.
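A task-summary entry destined for the clawpy_agent_memory collection might look like the record below. The field names and id scheme are purely illustrative; only the collection's purpose (compressed, durable task summaries) comes from the docs.

```python
import time

def make_task_summary(task_id: str, agent: str, summary: str) -> dict:
    """Build an illustrative cross-session task-summary memory entry."""
    return {
        "id": f"task_summary_{task_id}",          # id scheme is an assumption
        "document": summary,                       # compressed task summary text
        "metadata": {
            "agent": agent,                        # enables cross-agent recall
            "task_id": task_id,
            "stored_at": time.time(),
        },
    }
```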

Layer F — Temporal Knowledge Graph

Module: core/task_memory_graph.py (15,906 bytes)
Storage: JSON adjacency list (data/task_graph.json)
Retention: Persistent (never deleted, only soft-invalidated)

This is where Clawpy's memory becomes structural. Layer F tracks not just what happened, but how tasks relate to each other using a directed graph:

Node Model

TaskNode {
  task_id:     string     // Unique identifier
  title:       string     // Human-readable task name
  timestamp:   float      // Creation/completion time
  success:     bool|null  // Outcome (null = pending)
  description: string     // Task context
}

Edge Types

| Relationship | Meaning                                          |
|--------------|--------------------------------------------------|
| blocks       | Task A's failure directly blocked Task B         |
| related_to   | Tasks share domain, components, or codebase area |
| derived_from | Task B was spawned / split from Task A           |
| similar_to   | Auto-linked by semantic similarity threshold     |

Time-Decay Weighted Recall

When recalling past tasks, Layer F re-ranks ChromaDB results using a composite score that blends semantic similarity with recency:

combined_score = λ × semantic_score + (1 − λ) × recency_score

where:
  semantic_score = 1 − chroma_distance    (0 to 1, higher = more similar)
  recency_score  = exp(−age_days / τ)     (exponential decay)
  λ = SEMANTIC_WEIGHT  = 0.7              (configurable)
  τ = HALF_LIFE_DAYS   = 14              (memory half-life)

This means a task from yesterday with moderate similarity will score higher than a highly-similar task from six months ago — mimicking how human memory prioritises recent experience.
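The scoring formula above translates directly into code. The constants (λ = 0.7, τ = 14) come from the docs; the example inputs in the test are invented to show the recency effect.

```python
import math

SEMANTIC_WEIGHT = 0.7   # λ, from the docs (configurable)
HALF_LIFE_DAYS = 14.0   # τ, from the docs

def combined_score(chroma_distance: float, age_days: float) -> float:
    """Blend semantic similarity with exponential time decay."""
    semantic = 1.0 - chroma_distance            # higher = more similar
    recency = math.exp(-age_days / HALF_LIFE_DAYS)
    return SEMANTIC_WEIGHT * semantic + (1 - SEMANTIC_WEIGHT) * recency
```

With these weights, a moderately similar task from yesterday (distance 0.4, age 1 day) outranks a highly similar task from six months ago (distance 0.1, age 180 days), exactly the behaviour described above.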

1-Hop Neighbor Injection

Before returning recall results, Layer F walks the graph and injects 1-hop neighbors — tasks that are structurally connected to the recalled tasks but weren't found by vector search. This catches causal chains ("the deployment failed because the preceding build task failed") that pure similarity search would miss.
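The 1-hop walk is a straightforward expansion over the adjacency list. The edge shape {task_id: [neighbor_id, ...]} is an assumption for the sketch; the point is that neighbors are appended only if vector search did not already surface them.

```python
def inject_neighbors(recalled: list[str], edges: dict[str, list[str]]) -> list[str]:
    """Append 1-hop graph neighbors of recalled tasks, skipping duplicates."""
    result = list(recalled)
    seen = set(recalled)
    for task_id in recalled:
        for neighbor in edges.get(task_id, []):
            if neighbor not in seen:   # structurally connected, missed by vector search
                seen.add(neighbor)
                result.append(neighbor)
    return result
```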


Layer G — PARA Canonical Knowledge

Module: memory/para_manager.py (9,924 bytes)
Storage: JSON items + Markdown summaries
Retention: Permanent (facts are never deleted, only superseded)

The PARA system (Projects, Areas, Resources, Archives) stores canonical, immutable facts — the highest-fidelity knowledge an agent possesses:

  • Projects — Enduring project facts (e.g., "Clawpy uses FastAPI on port 8000")
  • Areas — Ongoing responsibilities and operating preferences
  • Resources — Reusable references and technical knowledge
  • Archives — Completed work preserved for future reference

Each fact is an atomic ParaFact object with metadata:

ParaFact {
  id:             "fact_a1b2c3d4"
  fact:           "Owner prefers Claude for code review"
  category:       "preference"
  source:         "memory_extractor:ceo"
  status:         "active" | "superseded"
  access_count:   7
  last_accessed:  "2026-04-19T22:00:00Z"
  superseded_by:  null | "fact_e5f6g7h8"
}

Facts are never deleted — when a fact becomes outdated, it is superseded by a new fact, preserving the full correction history. This creates an auditable knowledge provenance chain.

Nightly PARA Promotion

The Memory Extractor runs nightly and uses an LLM to evaluate whether any daily synthesis output is durable enough to promote into PARA:

Daily Ledger → Nightly Synthesis → LLM Evaluation → PARA Promotion
                                    ↓
                              Validation Loop
                         (cost-capped, retry-aware)

The promotion process is governed by the Validation Loop with a configurable cost budget (default: 12 cents) and maximum fact count (default: 8 per cycle).
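The two caps compose into a simple gate. The defaults (12 cents, 8 facts per cycle) come from the docs; representing candidates as (fact_text, evaluation_cost_usd) pairs is an assumption for the sketch.

```python
def promote_facts(candidates, cost_cap_usd: float = 0.12, max_facts: int = 8):
    """Promote candidate facts until either the cost cap or fact limit is hit."""
    promoted, spent = [], 0.0
    for fact, cost in candidates:
        if len(promoted) >= max_facts or spent + cost > cost_cap_usd:
            break   # stop on whichever budget is exhausted first
        promoted.append(fact)
        spent += cost
    return promoted, spent
```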


How the Layers Interact

A typical agent recall involves multiple layers working together:

  1. Auto-Recall (Layer D) orchestrates the query
  2. Vector Search (Layer C) finds semantically similar chunks
  3. Context Engine (Layer E) adds cross-session results
  4. Knowledge Graph (Layer F) re-ranks with time-decay and injects graph neighbors
  5. PARA (Layer G) provides canonical facts as ground truth
  6. Token budget constraints determine how many results survive into the prompt

This multi-layer approach means an agent can simultaneously draw on recent conversation context (Layer A), historical event data (Layer B), semantically similar past work (Layer C+E), causally related tasks (Layer F), and canonical project knowledge (Layer G) — all within a single LLM call.