Memory Architecture — Deep Dive
Clawpy's memory is not a single database. It is a 22-module cognitive memory system spanning 3 storage layers, 4 recall sources, multimodal indexing, lossless context compression, and a self-tuning feedback loop. Every module is purpose-built and backed by source code in the memory/ directory.
This page documents every memory subsystem, how they work together, and how they compare to competing frameworks.
The Storage Stack
Layer 1: Markdown Flat Files
Source: memory/layer1_markdown.py
Human-readable Markdown files — editable in Obsidian, VS Code, or any text editor. This is the most transparent layer: you can browse and modify memories directly on disk.
Layer 2: SQLite Event Ledger
Source: memory/layer2_sqlite.py (35KB, 950+ lines)
A structured event ledger powered by SQLite with FTS5 full-text search. This layer handles:
- Structured event storage — every memory operation is recorded with timestamps, categories, and metadata
- FTS5 full-text indexing — instant keyword search with BM25 relevance scoring
- Recall telemetry — tracks which memories are recalled, how often, and from which source
- Feedback modifiers — stores operator feedback (helpful/noisy/pin) for adaptive recall tuning
- Recall count tracking — per-source hit counts for adaptive weighting
Layer 3: Vector Embeddings
Source: memory/layer3_vector.py + memory/embeddings.py
ChromaDB vector store for semantic similarity search. Memories are embedded as dense vectors and queried by cosine distance. This enables "find memories about this topic" — even when exact keywords don't match.
The Hybrid Search Engine
Source: memory/hybrid_search.py (351 lines)
This is the heart of Clawpy's recall system. It merges vector similarity with keyword matching in a single ranked result set:
final_score = α × vector_score + (1−α) × keyword_score × decay_multiplier
How It Works
- Vector search — ChromaDB cosine similarity (semantic meaning)
- Keyword search — SQLite FTS5 with BM25 scoring (exact matches)
- Weighted fusion — tunable `alpha` parameter (default 0.6 = slightly vector-heavy)
- Temporal decay — Ebbinghaus-inspired forgetting curve reduces scores for stale memories
- Access-aware boosting — every recall reduces effective age, so frequently-read memories decay slower
- MMR diversity re-ranking — Maximal Marginal Relevance prevents returning 3 near-identical results
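The fusion step can be sketched directly from the formula above. This is a minimal illustration; the function name and signature are assumptions, not the actual `hybrid_search.py` API:

```python
def fuse_scores(vector_score: float, keyword_score: float,
                decay_multiplier: float, alpha: float = 0.6) -> float:
    # Weighted fusion as written in the formula above: alpha (default 0.6)
    # tilts the blend slightly toward vector similarity, and the temporal
    # decay multiplier scales the keyword term.
    return alpha * vector_score + (1 - alpha) * keyword_score * decay_multiplier
```

With a perfect semantic match and a weaker keyword match, the vector term dominates: `fuse_scores(1.0, 0.5, 1.0)` yields 0.8.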
The Forgetting Curve
decay = exp(−age × ln(2) / half_life)
effective_age = max(0, age − access_count × boost_factor × half_life)
- Half-life: 30 days (configurable)
- Access boost: Each recall subtracts `0.15 × half_life` from effective age
- Result: A memory recalled 3 times has its effective age reduced by ~13.5 days — it stays relevant 45% longer
- Never deleted: Memories are deprioritized, not removed. They can always be recovered
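The two formulas above compose as follows. The constants match the documented defaults; the function name is illustrative:

```python
import math

HALF_LIFE_DAYS = 30.0   # configurable half-life
ACCESS_BOOST = 0.15     # each recall subtracts 0.15 x half_life

def decay_multiplier(age_days: float, access_count: int = 0) -> float:
    # Access-aware effective age: frequently recalled memories age slower.
    effective_age = max(0.0, age_days - access_count * ACCESS_BOOST * HALF_LIFE_DAYS)
    # Ebbinghaus-style exponential decay: the score halves every half-life.
    return math.exp(-effective_age * math.log(2) / HALF_LIFE_DAYS)
```

A 30-day-old memory never recalled decays to 0.5; the same memory recalled 3 times has its 13.5-day boost cancel half its age, so it decays far less.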
No competitor implements access-aware temporal decay. OpenClaw, Hermes, Agent Zero, and Paperclip all treat memory as undifferentiated — no priority aging.
Auto-Capture: Intelligent Memory Extraction
Source: memory/auto_capture.py
Clawpy doesn't rely on the LLM to decide what to remember. A dedicated rule-based engine scans every user message and automatically captures facts worth preserving.
14 Trigger Patterns Across 4 Categories
| Category | What It Captures | Example Triggers |
|---|---|---|
| Preferences | Likes, dislikes, habits | "I prefer...", "I always...", "my favorite..." |
| Decisions | Agreed-upon choices | "We decided...", "let's use...", "let's stick with..." |
| Entities | Names, contacts, identifiers | Email addresses, phone numbers, "my name is..." |
| Facts | Technical details, stack info | "Our API key is...", "the project uses...", "the database is..." |
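An illustrative subset of the trigger patterns, assuming simple regex matching. These four patterns are examples reconstructed from the table above, not the actual 14 patterns in `auto_capture.py`:

```python
import re

# Illustrative subset of capture triggers, one per category.
TRIGGERS = {
    "preference": re.compile(r"\b(i prefer|i always|my favorite)\b", re.I),
    "decision":   re.compile(r"\b(we decided|let's use|let's stick with)\b", re.I),
    "entity":     re.compile(r"\bmy name is\b|[\w.+-]+@[\w-]+\.[\w.]+", re.I),
    "fact":       re.compile(r"\b(the project uses|the database is)\b", re.I),
}

def classify(message: str) -> list:
    # Return every category whose trigger fires on this message.
    return [cat for cat, pat in TRIGGERS.items() if pat.search(message)]
```

A message can trip multiple categories; anything that matches nothing is simply not captured.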
Safety Gates
Before any memory is stored:
- Length bounds — too short (<10 chars) = noise, too long (>500 chars) = dump, not fact
- System content filter — skips injected memory blocks, XML-looking content, markdown-heavy output, emoji-heavy responses
- Injection detection — runs `is_prompt_injection()` to reject adversarial payloads before storage
- Only user messages — never captures assistant output (prevents self-poisoning)
- Cosine dedup — searches existing memories; if similarity > 0.90, skips as duplicate
- Rate limit — max 3 captures per conversation turn to prevent flooding
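The length, user-message, dedup, and rate-limit gates can be sketched as follows. The system-content and injection filters are omitted, and the function name and signature are hypothetical:

```python
def passes_gates(message: str, max_similarity: float,
                 captures_this_turn: int, is_user_message: bool) -> bool:
    # Length bounds: <10 chars = noise, >500 chars = dump, not fact.
    if len(message) < 10 or len(message) > 500:
        return False
    # Only user messages are ever captured (prevents self-poisoning).
    if not is_user_message:
        return False
    # Cosine dedup: skip near-duplicates of existing memories.
    if max_similarity > 0.90:
        return False
    # Rate limit: at most 3 captures per conversation turn.
    if captures_this_turn >= 3:
        return False
    return True
```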
Cloud Sync & Bubble Filter
Captured memories are:
- Cloud-synced — fire-and-forget background push to Supabase for cross-device persistence
- Bubble-evaluated — worker discoveries are scored and selectively promoted to leaders via the Wisdom Cascade
Auto-Recall: Multi-Source Context Injection
Source: memory/auto_recall.py (950 lines, 40KB)
Before every agent turn, this module searches 4 sources simultaneously and injects the most relevant memories into the system prompt.
4 Recall Sources
| Source | What It Searches | Weight |
|---|---|---|
| Hybrid | ChromaDB vectors + SQLite FTS5 | 1.0 (baseline) |
| PARA | Durable canonical knowledge (Projects/Areas/Resources/Archives) | 1.08 (boosted) |
| Daily Notes | Recent operational residue (last 3 days) | 0.94 (slightly lower) |
| Cloud | Supabase remote memories (cross-device fallback) | 1.05 (slight boost) |
Adaptive Weighting
The weights above are base weights. Over time, the system observes which sources produce recalled memories and applies a bounded boost:
effective_weight = base_weight × (1 + max_boost × (source_hits / max_hits))
If PARA memories are consistently recalled, PARA gets a slight boost. If daily notes are rarely useful, their weight stays flat. Maximum adaptive boost: 18%.
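The bounded-boost formula above can be sketched with the documented 18% cap. The function name is illustrative:

```python
MAX_BOOST = 0.18  # adaptive boost is capped at 18%

def effective_weight(base_weight: float, source_hits: int, max_hits: int) -> float:
    # A source's weight grows with its share of recall hits, never
    # exceeding base_weight * (1 + MAX_BOOST).
    if max_hits <= 0:
        return base_weight
    return base_weight * (1 + MAX_BOOST * (source_hits / max_hits))
```

A source that produces every recalled memory reaches the full 18% boost; a source with no hits stays at its base weight.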
Operator Feedback Loop
Operators can mark recalled memories as:
| Feedback | Score Delta | Effect |
|---|---|---|
| Helpful | +0.05 | Memory appears more readily in future |
| Noisy | −0.15 | Memory is deprioritised (3× stronger than helpful) |
| Pin | +0.25 | Memory is strongly prioritised (5× helpful) |
Modifiers are bounded: maximum +0.45, minimum −0.35. They're applied per-subject, so a noisy memory from one topic doesn't affect others.
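The modifier arithmetic above can be sketched with the documented deltas and bounds. Names are hypothetical:

```python
FEEDBACK_DELTAS = {"helpful": 0.05, "noisy": -0.15, "pin": 0.25}
MOD_MAX, MOD_MIN = 0.45, -0.35  # documented modifier bounds

def apply_feedback(current_modifier: float, feedback: str) -> float:
    # Per-subject modifier, clamped so repeated feedback cannot run away.
    new = current_modifier + FEEDBACK_DELTAS[feedback]
    return max(MOD_MIN, min(MOD_MAX, new))
```

Pinning an already-boosted memory saturates at +0.45 rather than growing without bound.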
Leave-One-Out Backtesting
The evaluate_feedback_cases() method replays past feedback against baseline vs. adapted weights, measuring:
- Baseline success rate vs. adapted success rate
- Improved cases vs. regressed cases
- Top-1 positive hit rate
This is a scientific validation loop for the recall system — it proves whether adaptation is actually working.
PARA Canonical Knowledge
Source: memory/para_manager.py
A structured knowledge management system following the PARA method:
| Category | Purpose |
|---|---|
| Projects | Active work with deadlines and deliverables |
| Areas | Ongoing responsibilities (no end date) |
| Resources | Reference material and curated knowledge |
| Archives | Completed or inactive items |
Each entity has:
- Summary — human-readable overview
- Active facts — structured, searchable knowledge items
- Access tracking — when was each fact last recalled?
- Fact lifecycle — facts can be created, updated, archived, and restored
Cognitive Pager: Lossless Context Compression
Source: memory/cognitive_pager.py
Two techniques for preventing token window exhaustion without losing information:
State Folding
When the agent calls the same tool 4+ times consecutively (e.g., 4 failed attempts + 1 success), the older executions are collapsed into a single AST node:
```json
{
  "Task": "run_tests",
  "Attempts": 5,
  "Final_Result": "passed",
  "Action_Summary": "Attempted 5 times. Final state: passed"
}
```
The most recent result is preserved verbatim. Older attempts become [FOLDED].
Semantic Pointer Swapping
When a tool output exceeds 1,000 characters:
- The full text is stored in SQLite
- The output is replaced with `[REFER TO SQLite_DB id:ptr_abc123]`
- If the LLM needs the full text later, `resolve_pointer()` retrieves it
This means zero information loss — the data is always accessible, just not consuming context tokens.
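The swap-and-resolve cycle can be sketched as follows, using an in-memory dict in place of the SQLite table and a hypothetical pointer scheme:

```python
import hashlib

STORE = {}    # stands in for the SQLite table holding full outputs
LIMIT = 1000  # character threshold that triggers a swap

def maybe_swap(output: str) -> str:
    if len(output) <= LIMIT:
        return output
    ptr = "ptr_" + hashlib.sha1(output.encode()).hexdigest()[:8]
    STORE[ptr] = output                       # full text kept losslessly
    return f"[REFER TO SQLite_DB id:{ptr}]"   # compact stand-in in context

def resolve_pointer(ptr: str) -> str:
    # Round-trip: the original output is always recoverable.
    return STORE[ptr]
```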
Context Compactor: LLM-Driven Summarization
Source: memory/context_compactor.py
When the conversation history exceeds the token budget (50% of the model's context window):
- Splits history into system prompt (always preserved), older messages (summarized), and recent 40% (preserved verbatim)
- Chains summaries — if a previous summary exists, it's incorporated and updated
- Preserves critical context — active tasks, decisions, code/file references, commitments, technical details
- Fallback — if LLM summarization fails, uses crude truncation with message counts
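The three-way split can be sketched under the documented 40% recent window. The summarization call itself is omitted, and names are hypothetical:

```python
def split_history(messages: list, recent_fraction: float = 0.4):
    # System prompt (index 0) is always preserved; the most recent 40%
    # of the remaining messages are kept verbatim; everything in between
    # is handed to the LLM summarizer.
    system, rest = messages[0], messages[1:]
    keep = max(1, int(len(rest) * recent_fraction))
    return system, rest[:-keep], rest[-keep:]
```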
Query Expansion: LLM-Powered Recall Enhancement
Source: memory/query_expansion.py
Before searching memory, the query is expanded into 4 alternative phrasings using an LLM:
- Each variant uses different vocabulary, angles, or specificity
- Mixes broad and specific variants
- Includes conceptual and implementation-level variants
- Original query is always included as the first variant
- Results across all variants are merged, deduplicated, and ranked
This dramatically improves recall on fuzzy or semantic searches.
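The merge-dedup-rank step can be sketched as follows, assuming each variant's search returns `(memory_id, score)` pairs. This is a simplification of the actual pipeline:

```python
def merge_variant_results(results_per_variant):
    # Merge ranked result lists from all query variants, deduplicate by
    # memory id, and keep the best score seen for each memory.
    best = {}
    for results in results_per_variant:
        for mem_id, score in results:
            if mem_id not in best or score > best[mem_id]:
                best[mem_id] = score
    # Return a single ranked list, highest score first.
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)
```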
Multimodal Memory: Images & Audio
Source: memory/multimodal_memory.py
Agents can index and search visual and auditory content:
Images
Supported: JPG, PNG, WebP, GIF, HEIC, HEIF
- Image is base64-encoded
- Sent to a vision LLM (GPT-4o) with a detailed description prompt
- The description is embedded as a vector for semantic search
- Agents can later search by describing what they're looking for: "find the architecture diagram from last week"
Audio
Supported: MP3, WAV, OGG, OPUS, M4A, AAC, FLAC
- Audio is transcribed via Whisper
- Transcript is embedded as a vector
- Agents can search by content: "find the meeting where we discussed the deployment plan"
Wisdom Cascade: Hierarchical Knowledge Flow
Source: memory/wisdom_cascade.py (25KB) + memory/wisdom_teacher.py (13KB)
Knowledge doesn't just live in individual agents — it flows through the hierarchy:
Downward Flow (Teaching)
Leadership knowledge is injected into worker prompts with hop-based compression:
- 100% fidelity at direct leader
- 60% one hop up
- 30% two hops up
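The fidelity schedule above can be sketched as hop-based compression, assuming items are pre-sorted by priority. Names are hypothetical:

```python
# Fidelity schedule matching the documented 100% / 60% / 30% per hop.
FIDELITY_BY_HOP = {0: 1.0, 1: 0.6, 2: 0.3}

def compress_for_hop(wisdom_items, hops):
    # Keep only the top fraction of items for this hop distance;
    # beyond two hops, nothing is forwarded.
    fidelity = FIDELITY_BY_HOP.get(hops, 0.0)
    keep = round(len(wisdom_items) * fidelity)
    return wisdom_items[:keep]
```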
The wisdom_teacher.py module implements active teaching protocols — leaders don't just have wisdom, they actively push it to their reports.
Upward Flow (Bubble Filter)
Worker discoveries are evaluated against a quality threshold (score ≥ 7/10) and selectively promoted to leaders. Knowledge never leaks sideways — it flows strictly up and down.
Cloud Sync: Cross-Device Persistence
Source: memory/supabase_sync.py (26KB)
Full bidirectional synchronization with Supabase:
- Local memories are pushed to the cloud in fire-and-forget background threads
- Cloud memories are available as a fallback recall source
- The `sync_watchdog.py` module monitors sync health
- Enables cross-device memory persistence without manual export/import
Competitor Comparison — Memory
| Memory Capability | Clawpy | OpenClaw | Hermes | Agent Zero | Paperclip |
|---|---|---|---|---|---|
| Storage layers | ✅ 3 (Markdown + SQLite + Vector) | ⚠️ 1 (Markdown only) | ⚠️ 2 (Markdown + SQLite) | ⚠️ 2 (Files + FAISS) | ⚠️ 2 (Knowledge Graph + Notes) |
| Hybrid search (vector + keyword) | ✅ Fused with tunable alpha | ⚠️ Plugin-based | ❌ SQLite FTS only | ❌ FAISS vector only | ❌ None (orchestration layer) |
| Temporal decay (forgetting curve) | ✅ Ebbinghaus-inspired, access-aware | ❌ None | ❌ None | ❌ None | ❌ None |
| Access-aware boosting | ✅ Each recall extends memory life | ❌ None | ❌ None | ❌ None | ❌ None |
| MMR diversity re-ranking | ✅ Built-in | ❌ None | ❌ None | ❌ None | ❌ None |
| Rule-based auto-capture | ✅ 14 patterns, 4 categories | ❌ LLM decides | ❌ Manual | ❌ LLM decides | ❌ None |
| Self-poisoning prevention | ✅ Only captures user messages | ❌ Not enforced | ❌ Not enforced | ❌ Not enforced | ❌ No agent runtime |
| Cosine dedup (>0.90) | ✅ Before storage | ❌ None | ❌ None | ❌ None | ❌ None |
| Multi-source recall | ✅ 4 sources | ❌ Single source | ❌ Single source | ❌ Single source | ❌ None |
| Adaptive source weighting | ✅ Bounded telemetry-driven | ❌ None | ❌ None | ❌ None | ❌ None |
| Operator feedback loop | ✅ Helpful/Noisy/Pin modifiers | ❌ None | ❌ None | ❌ None | ❌ None |
| Leave-one-out backtesting | ✅ Scientific validation | ❌ None | ❌ None | ❌ None | ❌ None |
| PARA canonical knowledge | ✅ 4 categories + fact lifecycle | ❌ None | ❌ None | ❌ None | ⚠️ PARA-inspired (similar concept) |
| Lossless context compression | ✅ State Folding + Pointer Swapping | ❌ Destructive compaction | ❌ None | ⚠️ Summarization only | ❌ None |
| LLM-driven history summarization | ✅ Chained summaries | ⚠️ Memory flush (pre-compaction) | ❌ None | ⚠️ Basic summarization | ❌ None |
| Query expansion | ✅ LLM-powered (4 variants) | ❌ None | ❌ None | ❌ None | ❌ None |
| Multimodal memory (images + audio) | ✅ Vision LLM + Whisper | ❌ Text only | ❌ Text only | ❌ Text only | ❌ None |
| Bidirectional wisdom flow | ✅ Down (hop-compressed) + Up (bubble) | ❌ None | ❌ None | ❌ None | ⚠️ Goals flow down, results up |
| Active teaching protocol | ✅ Leader → Worker injection | ❌ None | ❌ None | ❌ None | ❌ None |
| Cloud sync (cross-device) | ✅ Supabase bidirectional | ❌ Local only | ❌ Local only | ❌ Local only | ❌ Local only |
| Injection guard on memory | ✅ 11 patterns + sanitisation | ❌ None | ⚠️ Basic | ❌ None | ❌ None |
| Memory inspection via dashboard | ✅ Full GUI | ⚠️ Read Markdown files | ⚠️ Read Markdown files | ⚠️ Browse directories | ⚠️ React dashboard (goals/audit) |
| User modeling | ✅ Alfred relationship memory | ❌ None | ✅ Honcho dialectic modeling | ❌ None | ❌ None |
The Fundamental Difference
OpenClaw stores memories as Markdown and relies on the LLM to decide what to save. This is non-deterministic — the LLM might forget to save, or save the wrong things. Community plugins (Mem0, LCM) try to patch this, but they're external dependencies.
Hermes has the strongest user modeling through Honcho (dialectic dual-peer reasoning), but it operates as a single agent with no hierarchical knowledge flow, no temporal decay, no hybrid search, and no multi-source recall.
Agent Zero uses FAISS for vector search and has project-isolated workspaces, but has no hybrid fusion, no rule-based capture, no temporal decay, no feedback loop, and no lossless compression.
Paperclip has a PARA-inspired memory concept (Knowledge Graph + Daily Notes that distill into durable facts) and goals flow down the org chart while results flow up. But Paperclip is an orchestration-only layer — it doesn't run agents itself, so it has no vector search, no hybrid fusion, no auto-capture, no temporal decay, no query expansion, no multimodal memory, and no lossless compression. Memory is handled by whatever runtime you plug into it (Claude Code, OpenClaw, etc.).
Clawpy treats memory as a cognitive architecture — not a storage bucket. It independently captures, decays, boosts, expands, compresses, syncs, and validates memory across 22 interconnected modules. Knowledge flows through the hierarchy. The operator can inspect, correct, pin, or mark memories as noisy. And the system scientifically validates whether its adaptive recall is actually improving.