Memory Architecture — Deep Dive

Clawpy's memory is not a single database. It is a 22-module cognitive memory system spanning 3 storage layers, 4 recall sources, multimodal indexing, lossless context compression, and a self-tuning feedback loop. Every module is purpose-built and backed by source code in the memory/ directory.

This page documents every memory subsystem, how they work together, and how they compare to competing frameworks.


The Storage Stack

Layer 1: Markdown Flat Files

Source: memory/layer1_markdown.py

Human-readable Markdown files — editable in Obsidian, VS Code, or any text editor. This is the most transparent layer: you can browse and modify memories directly on disk.

Layer 2: SQLite Event Ledger

Source: memory/layer2_sqlite.py (35KB, 950+ lines)

A structured event ledger powered by SQLite with FTS5 full-text search. This layer handles:

  • Structured event storage — every memory operation is recorded with timestamps, categories, and metadata
  • FTS5 full-text indexing — instant keyword search with BM25 relevance scoring
  • Recall telemetry — tracks which memories are recalled, how often, and from which source
  • Feedback modifiers — stores operator feedback (helpful/noisy/pin) for adaptive recall tuning
  • Recall count tracking — per-source hit counts for adaptive weighting

Layer 3: Vector Embeddings

Source: memory/layer3_vector.py + memory/embeddings.py

ChromaDB vector store for semantic similarity search. Memories are embedded as dense vectors and queried by cosine distance. This enables "find memories about this topic" — even when exact keywords don't match.


The Hybrid Search Engine

Source: memory/hybrid_search.py (351 lines)

This is the heart of Clawpy's recall system. It merges vector similarity with keyword matching in a single ranked result set:

final_score = (α × vector_score + (1−α) × keyword_score) × decay_multiplier

How It Works

  1. Vector search — ChromaDB cosine similarity (semantic meaning)
  2. Keyword search — SQLite FTS5 with BM25 scoring (exact matches)
  3. Weighted fusion — tunable alpha parameter (default 0.6 = slightly vector-heavy)
  4. Temporal decay — Ebbinghaus-inspired forgetting curve reduces scores for stale memories
  5. Access-aware boosting — every recall reduces effective age, so frequently-read memories decay slower
  6. MMR diversity re-ranking — Maximal Marginal Relevance prevents returning 3 near-identical results
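Step 6 can be sketched with the textbook greedy MMR loop. This is illustrative only; the function name, candidate format, and λ default are assumptions, not the actual code in memory/hybrid_search.py:

```python
def mmr_rerank(candidates, similarity, lambda_=0.7, k=5):
    """Greedy Maximal Marginal Relevance: pick high-relevance results
    that are dissimilar to results already selected.

    candidates: list of (doc_id, relevance) pairs
    similarity: callable(doc_a, doc_b) -> pairwise similarity in [0, 1]
    """
    selected = []
    pool = dict(candidates)
    while pool and len(selected) < k:
        best_doc, best_score = None, float("-inf")
        for doc, relevance in pool.items():
            # Penalize redundancy against everything already chosen.
            redundancy = max((similarity(doc, s) for s in selected), default=0.0)
            score = lambda_ * relevance - (1 - lambda_) * redundancy
            if score > best_score:
                best_doc, best_score = doc, score
        selected.append(best_doc)
        del pool[best_doc]
    return selected
```

With λ = 0.7 and two near-identical top hits, the duplicate is skipped in favor of a more diverse lower-ranked result.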

The Forgetting Curve

decay = exp(−age × ln(2) / half_life)
effective_age = max(0, age − access_count × boost_factor × half_life)
  • Half-life: 30 days (configurable)
  • Access boost: Each recall subtracts 0.15 × half_life from effective age
  • Result: A memory recalled 3 times has its effective age reduced by ~13.5 days — it stays relevant 45% longer
  • Never deleted: Memories are deprioritized, not removed. They can always be recovered
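The two formulas above fit in a few lines; the function names are illustrative, but the constants match the defaults stated above:

```python
import math

HALF_LIFE_DAYS = 30.0    # configurable half-life
BOOST_FACTOR = 0.15      # fraction of half-life subtracted per recall

def effective_age(age_days: float, access_count: int) -> float:
    # Each recall subtracts boost_factor * half_life from the raw age.
    return max(0.0, age_days - access_count * BOOST_FACTOR * HALF_LIFE_DAYS)

def decay_multiplier(age_days: float, access_count: int) -> float:
    # Ebbinghaus-style exponential forgetting: the score halves every half-life.
    age = effective_age(age_days, access_count)
    return math.exp(-age * math.log(2) / HALF_LIFE_DAYS)

# A 30-day-old memory recalled 3 times has effective age 30 - 13.5 = 16.5 days.
```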

No competitor implements access-aware temporal decay. OpenClaw, Hermes, Agent Zero, and Paperclip all treat memory as undifferentiated — no priority aging.


Auto-Capture: Intelligent Memory Extraction

Source: memory/auto_capture.py

Clawpy doesn't rely on the LLM to decide what to remember. A dedicated rule-based engine scans every user message and automatically captures facts worth preserving.

14 Trigger Patterns Across 4 Categories

| Category | What It Captures | Example Triggers |
| --- | --- | --- |
| Preferences | Likes, dislikes, habits | "I prefer...", "I always...", "my favorite..." |
| Decisions | Agreed-upon choices | "We decided...", "let's use...", "let's stick with..." |
| Entities | Names, contacts, identifiers | Email addresses, phone numbers, "my name is..." |
| Facts | Technical details, stack info | "Our API key is...", "the project uses...", "the database is..." |

Safety Gates

Before any memory is stored:

  1. Length bounds — too short (<10 chars) = noise, too long (>500 chars) = dump, not fact
  2. System content filter — skips injected memory blocks, XML-looking content, markdown-heavy output, emoji-heavy responses
  3. Injection detection — runs is_prompt_injection() — rejects adversarial payloads before storage
  4. Only user messages — never captures assistant output (prevents self-poisoning)
  5. Cosine dedup — searches existing memories; if similarity > 0.90, skips as duplicate
  6. Rate limit — max 3 captures per conversation turn to prevent flooding
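A minimal sketch of the gate sequence, with two hypothetical trigger patterns standing in for the real fourteen. The pattern wording, function signature, and parameter names are assumptions, and gates 2, 3, and 6 are omitted for brevity:

```python
import re

# Two hypothetical trigger patterns; the real auto_capture.py defines 14
# across the four categories in the table above.
TRIGGERS = [
    re.compile(r"\bI (?:prefer|always)\b", re.IGNORECASE),      # preferences
    re.compile(r"\bwe decided\b|\blet's (?:use|stick with)\b",  # decisions
               re.IGNORECASE),
]

def should_capture(message: str, role: str, best_similarity: float) -> bool:
    """Apply the safety gates before a candidate memory is stored."""
    if role != "user":                   # gate 4: never capture assistant output
        return False
    if not 10 <= len(message) <= 500:    # gate 1: length bounds
        return False
    if best_similarity > 0.90:           # gate 5: cosine dedup threshold
        return False
    return any(p.search(message) for p in TRIGGERS)
```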

Cloud Sync & Bubble Filter

Captured memories are:

  • Cloud-synced — fire-and-forget background push to Supabase for cross-device persistence
  • Bubble-evaluated — worker discoveries are scored and selectively promoted to leaders via the Wisdom Cascade

Auto-Recall: Multi-Source Context Injection

Source: memory/auto_recall.py (950 lines, 40KB)

Before every agent turn, this module searches 4 sources simultaneously and injects the most relevant memories into the system prompt.

4 Recall Sources

| Source | What It Searches | Weight |
| --- | --- | --- |
| Hybrid | ChromaDB vectors + SQLite FTS5 | 1.0 (baseline) |
| PARA | Durable canonical knowledge (Projects/Areas/Resources/Archives) | 1.08 (boosted) |
| Daily Notes | Recent operational residue (last 3 days) | 0.94 (slightly lower) |
| Cloud | Supabase remote memories (cross-device fallback) | 1.05 (slight boost) |

Adaptive Weighting

The weights above are base weights. Over time, the system observes which sources produce recalled memories and applies a bounded boost:

effective_weight = base_weight × (1 + max_boost × (source_hits / max_hits))

If PARA memories are consistently recalled, PARA gets a slight boost. If daily notes are rarely useful, their weight stays flat. Maximum adaptive boost: 18%.
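The bounded-boost formula above can be sketched directly; the function name is illustrative:

```python
def effective_weight(base_weight: float, source_hits: int, max_hits: int,
                     max_boost: float = 0.18) -> float:
    """Bounded, telemetry-driven source boost.

    A source responsible for every recorded hit gains at most max_boost
    (18%) over its base weight; a source with no hits keeps its base weight.
    """
    if max_hits == 0:
        return base_weight  # no telemetry yet, nothing to adapt
    return base_weight * (1 + max_boost * (source_hits / max_hits))
```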

Operator Feedback Loop

Operators can mark recalled memories as:

| Feedback | Score Delta | Effect |
| --- | --- | --- |
| Helpful | +0.05 | Memory appears more readily in future |
| Noisy | −0.15 | Memory is deprioritised (3× stronger than helpful) |
| Pin | +0.25 | Memory is strongly prioritised (5× helpful) |

Modifiers are bounded: maximum +0.45, minimum −0.35. They're applied per-subject, so a noisy memory from one topic doesn't affect others.
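The clamped accumulation can be sketched in two lines; the dictionary keys and function name are assumed, but the deltas and bounds are the ones stated above:

```python
# Per-subject feedback deltas from the table above.
DELTAS = {"helpful": +0.05, "noisy": -0.15, "pin": +0.25}

def apply_feedback(current_modifier: float, feedback: str) -> float:
    # Modifiers accumulate per subject but stay inside [-0.35, +0.45].
    return max(-0.35, min(0.45, current_modifier + DELTAS[feedback]))
```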

Leave-One-Out Backtesting

The evaluate_feedback_cases() method replays past feedback against baseline vs. adapted weights, measuring:

  • Baseline success rate vs. adapted success rate
  • Improved cases vs. regressed cases
  • Top-1 positive hit rate

This is a scientific validation loop for the recall system — it measures whether adaptation is actually improving recall, rather than assuming it.


PARA Canonical Knowledge

Source: memory/para_manager.py

A structured knowledge management system following the PARA method:

| Category | Purpose |
| --- | --- |
| Projects | Active work with deadlines and deliverables |
| Areas | Ongoing responsibilities (no end date) |
| Resources | Reference material and curated knowledge |
| Archives | Completed or inactive items |

Each entity has:

  • Summary — human-readable overview
  • Active facts — structured, searchable knowledge items
  • Access tracking — when was each fact last recalled?
  • Fact lifecycle — facts can be created, updated, archived, and restored
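A minimal data-model sketch of a PARA entity and its fact lifecycle; the class and method names are hypothetical, not taken from para_manager.py:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Fact:
    text: str
    archived: bool = False
    last_recalled: Optional[str] = None  # ISO timestamp of the last recall

@dataclass
class ParaEntity:
    name: str
    category: str        # "Projects", "Areas", "Resources", or "Archives"
    summary: str = ""
    facts: list = field(default_factory=list)

    def active_facts(self):
        return [f for f in self.facts if not f.archived]

    def archive_fact(self, index: int) -> None:
        self.facts[index].archived = True

    def restore_fact(self, index: int) -> None:
        self.facts[index].archived = False
```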

Cognitive Pager: Lossless Context Compression

Source: memory/cognitive_pager.py

Two techniques for preventing token window exhaustion without losing information:

State Folding

When the agent calls the same tool 4+ times consecutively (e.g., 4 failed attempts + 1 success), the older executions are collapsed into a single AST node:

```json
{
  "Task": "run_tests",
  "Attempts": 5,
  "Final_Result": "passed",
  "Action_Summary": "Attempted 5 times. Final state: passed"
}
```

The most recent result is preserved verbatim. Older attempts become [FOLDED].
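The folding rule can be sketched over plain dicts; fold_attempts is a hypothetical name, and the real cognitive_pager.py operates on its AST representation rather than this simplified shape:

```python
def fold_attempts(executions):
    """Collapse 4+ consecutive calls of the same tool into one summary node,
    keeping the most recent result verbatim.

    executions: list of {"tool": str, "result": str} dicts, oldest first.
    """
    if len(executions) < 4 or len({e["tool"] for e in executions}) != 1:
        return executions  # below the folding threshold, or mixed tools
    final = executions[-1]
    return [{
        "Task": final["tool"],
        "Attempts": len(executions),
        "Final_Result": final["result"],
        "Action_Summary": f"Attempted {len(executions)} times. "
                          f"Final state: {final['result']}",
    }]
```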

Semantic Pointer Swapping

When a tool output exceeds 1,000 characters:

  1. The full text is stored in SQLite
  2. The output is replaced with [REFER TO SQLite_DB id:ptr_abc123]
  3. If the LLM needs the full text later, resolve_pointer() retrieves it

This means zero information loss — the data is always accessible, just not consuming context tokens.
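The swap-and-resolve cycle can be sketched as follows. The in-memory dict stands in for the SQLite table, and deriving the pointer ID from a hash is an assumption; only the pointer format matches the example above:

```python
import hashlib

POINTER_THRESHOLD = 1_000   # characters
_store: dict = {}           # stand-in for the SQLite-backed table

def swap_pointer(output: str) -> str:
    if len(output) <= POINTER_THRESHOLD:
        return output       # small outputs pass through untouched
    ptr = "ptr_" + hashlib.sha1(output.encode()).hexdigest()[:8]
    _store[ptr] = output    # the full text stays recoverable
    return f"[REFER TO SQLite_DB id:{ptr}]"

def resolve_pointer(pointer: str) -> str:
    ptr = pointer.removeprefix("[REFER TO SQLite_DB id:").removesuffix("]")
    return _store[ptr]
```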


Context Compactor: LLM-Driven Summarization

Source: memory/context_compactor.py

When the conversation history exceeds the token budget (50% of the model's context window):

  1. Splits history into system prompt (always preserved), older messages (summarized), and recent 40% (preserved verbatim)
  2. Chains summaries — if a previous summary exists, it's incorporated and updated
  3. Preserves critical context — active tasks, decisions, code/file references, commitments, technical details
  4. Fallback — if LLM summarization fails, uses crude truncation with message counts
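Step 1's partition can be sketched like this; split_history is an illustrative name, and the summarization and fallback steps are not modeled:

```python
def split_history(messages, recent_fraction=0.4):
    """Partition chat history for compaction.

    Returns (system, older, recent): system prompts are always preserved,
    the most recent fraction of the rest is kept verbatim, and everything
    older becomes input to the LLM summarizer.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    keep = max(1, int(len(rest) * recent_fraction))
    return system, rest[:-keep], rest[-keep:]
```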

Query Expansion: LLM-Powered Recall Enhancement

Source: memory/query_expansion.py

Before searching memory, the query is expanded into 4 alternative phrasings using an LLM:

  • Each variant uses different vocabulary, angles, or specificity
  • Mixes broad and specific variants
  • Includes conceptual and implementation-level variants
  • Original query is always included as the first variant
  • Results across all variants are merged, deduplicated, and ranked

This dramatically improves recall on fuzzy or semantic searches.
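The merge-and-dedupe step can be sketched as follows, keeping each memory's best score across variants; the function name and result format are assumptions:

```python
def merge_variant_results(results_per_variant):
    """Merge ranked result lists from each query variant: dedupe by id,
    keep each memory's best score, and re-rank by that score."""
    best = {}
    for results in results_per_variant:
        for mem_id, score in results:
            if score > best.get(mem_id, float("-inf")):
                best[mem_id] = score
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)
```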


Multimodal Memory: Images & Audio

Source: memory/multimodal_memory.py

Agents can index and search visual and auditory content:

Images

Supported: JPG, PNG, WebP, GIF, HEIC, HEIF

  1. Image is base64-encoded
  2. Sent to a vision LLM (GPT-4o) with a detailed description prompt
  3. The description is embedded as a vector for semantic search
  4. Agents can later search by describing what they're looking for: "find the architecture diagram from last week"

Audio

Supported: MP3, WAV, OGG, OPUS, M4A, AAC, FLAC

  1. Audio is transcribed via Whisper
  2. Transcript is embedded as a vector
  3. Agents can search by content: "find the meeting where we discussed the deployment plan"

Wisdom Cascade: Hierarchical Knowledge Flow

Source: memory/wisdom_cascade.py (25KB) + memory/wisdom_teacher.py (13KB)

Knowledge doesn't just live in individual agents — it flows through the hierarchy:

Downward Flow (Teaching)

Leadership knowledge is injected into worker prompts with hop-based compression:

  • 100% fidelity for wisdom from the direct leader
  • 60% for wisdom originating one hop further up the hierarchy
  • 30% for wisdom originating two hops up
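The hop schedule above can be sketched as a lookup plus compression; truncation stands in here for real compression, since wisdom_cascade.py presumably summarizes semantically rather than cutting characters:

```python
# Hop distance -> retained fraction of the wisdom text, per the schedule above.
FIDELITY = {0: 1.0, 1: 0.6, 2: 0.3}

def compress_wisdom(wisdom: str, hops: int) -> str:
    # Beyond two hops, nothing is injected.
    fidelity = FIDELITY.get(hops, 0.0)
    return wisdom[: int(len(wisdom) * fidelity)]
```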

The wisdom_teacher.py module implements active teaching protocols — leaders don't just have wisdom, they actively push it to their reports.

Upward Flow (Bubble Filter)

Worker discoveries are evaluated against a quality threshold (score ≥ 7/10) and selectively promoted to leaders. Knowledge never leaks sideways — it flows strictly up and down.


Cloud Sync: Cross-Device Persistence

Source: memory/supabase_sync.py (26KB)

Full bidirectional synchronization with Supabase:

  • Local memories are pushed to the cloud in fire-and-forget background threads
  • Cloud memories are available as a fallback recall source
  • The sync_watchdog.py monitors sync health
  • Enables cross-device memory persistence without manual export/import
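The fire-and-forget pattern can be sketched with a daemon thread; push_fn stands in for the real Supabase client call, and returning the thread is an addition so callers can observe completion:

```python
import threading

def fire_and_forget_push(payload: dict, push_fn) -> threading.Thread:
    """Push one memory to the cloud without blocking the agent turn."""
    def _run():
        try:
            push_fn(payload)
        except Exception as exc:
            # A failed sync is logged, never raised into the agent loop.
            print(f"cloud sync failed: {exc}")

    thread = threading.Thread(target=_run, daemon=True)
    thread.start()
    return thread
```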

Competitor Comparison — Memory

| Memory Capability | Clawpy | OpenClaw | Hermes | Agent Zero | Paperclip |
| --- | --- | --- | --- | --- | --- |
| Storage layers | ✅ 3 (Markdown + SQLite + Vector) | ⚠️ 1 (Markdown only) | ⚠️ 2 (Markdown + SQLite) | ⚠️ 2 (Files + FAISS) | ⚠️ 2 (Knowledge Graph + Notes) |
| Hybrid search (vector + keyword) | ✅ Fused with tunable alpha | ⚠️ Plugin-based | ❌ SQLite FTS only | ❌ FAISS vector only | ❌ None (orchestration layer) |
| Temporal decay (forgetting curve) | ✅ Ebbinghaus-inspired, access-aware | ❌ None | ❌ None | ❌ None | ❌ None |
| Access-aware boosting | ✅ Each recall extends memory life | ❌ None | ❌ None | ❌ None | ❌ None |
| MMR diversity re-ranking | ✅ Built-in | ❌ None | ❌ None | ❌ None | ❌ None |
| Rule-based auto-capture | ✅ 14 patterns, 4 categories | ❌ LLM decides | ❌ Manual | ❌ LLM decides | ❌ None |
| Self-poisoning prevention | ✅ Only captures user messages | ❌ Not enforced | ❌ Not enforced | ❌ Not enforced | ❌ No agent runtime |
| Cosine dedup (>0.90) | ✅ Before storage | ❌ None | ❌ None | ❌ None | ❌ None |
| Multi-source recall | ✅ 4 sources | ❌ Single source | ❌ Single source | ❌ Single source | ❌ None |
| Adaptive source weighting | ✅ Bounded telemetry-driven | ❌ None | ❌ None | ❌ None | ❌ None |
| Operator feedback loop | ✅ Helpful/Noisy/Pin modifiers | ❌ None | ❌ None | ❌ None | ❌ None |
| Leave-one-out backtesting | ✅ Scientific validation | ❌ None | ❌ None | ❌ None | ❌ None |
| PARA canonical knowledge | ✅ 4 categories + fact lifecycle | ❌ None | ❌ None | ❌ None | ⚠️ PARA-inspired (similar concept) |
| Lossless context compression | ✅ State Folding + Pointer Swapping | ❌ Destructive compaction | ❌ None | ⚠️ Summarization only | ❌ None |
| LLM-driven history summarization | ✅ Chained summaries | ⚠️ Memory flush (pre-compaction) | ❌ None | ⚠️ Basic summarization | ❌ None |
| Query expansion | ✅ LLM-powered (4 variants) | ❌ None | ❌ None | ❌ None | ❌ None |
| Multimodal memory (images + audio) | ✅ Vision LLM + Whisper | ❌ Text only | ❌ Text only | ❌ Text only | ❌ None |
| Bidirectional wisdom flow | ✅ Down (hop-compressed) + Up (bubble) | ❌ None | ❌ None | ❌ None | ⚠️ Goals flow down, results up |
| Active teaching protocol | ✅ Leader → Worker injection | ❌ None | ❌ None | ❌ None | ❌ None |
| Cloud sync (cross-device) | ✅ Supabase bidirectional | ❌ Local only | ❌ Local only | ❌ Local only | ❌ Local only |
| Injection guard on memory | ✅ 11 patterns + sanitisation | ❌ None | ⚠️ Basic | ❌ None | ❌ None |
| Memory inspection via dashboard | ✅ Full GUI | ⚠️ Read Markdown files | ⚠️ Read Markdown files | ⚠️ Browse directories | ⚠️ React dashboard (goals/audit) |
| User modeling | ✅ Alfred relationship memory | ❌ None | ✅ Honcho dialectic modeling | ❌ None | ❌ None |

The Fundamental Difference

OpenClaw stores memories as Markdown and relies on the LLM to decide what to save. This is non-deterministic — the LLM might forget to save, or save the wrong things. Community plugins (Mem0, LCM) try to patch this, but they're external dependencies.

Hermes has the strongest user modeling through Honcho (dialectic dual-peer reasoning), but it operates as a single agent with no hierarchical knowledge flow, no temporal decay, no hybrid search, and no multi-source recall.

Agent Zero uses FAISS for vector search and has project-isolated workspaces, but has no hybrid fusion, no rule-based capture, no temporal decay, no feedback loop, and no lossless compression.

Paperclip has a PARA-inspired memory concept (Knowledge Graph + Daily Notes that distill into durable facts) and goals flow down the org chart while results flow up. But Paperclip is an orchestration-only layer — it doesn't run agents itself, so it has no vector search, no hybrid fusion, no auto-capture, no temporal decay, no query expansion, no multimodal memory, and no lossless compression. Memory is handled by whatever runtime you plug into it (Claude Code, OpenClaw, etc.).

Clawpy treats memory as a cognitive architecture — not a storage bucket. It independently captures, decays, boosts, expands, compresses, syncs, and validates memory across 22 interconnected modules. Knowledge flows through the hierarchy. The operator can inspect, correct, pin, or mark memories as noisy. And the system scientifically validates whether its adaptive recall is actually improving.