Temporal Knowledge Graph
While vector search (Layers C and E) answers "what past work is similar to this task?", the Temporal Knowledge Graph answers a fundamentally different question: "what past work is causally connected to this task?"
Layer F is a lightweight directed graph stored as JSON. It tracks task relationships, applies time-decay scoring, and injects graph neighbors into recall results — surfacing causal chains that pure similarity search cannot find.
Why a Graph?
Consider this scenario: a deployment task fails. A vector search for "deployment failure" might surface similar past failures — useful, but incomplete. The knowledge graph adds structural context:
```
Task: "Fix API authentication"
├── blocks → "Deploy v2.4 to staging"
├── related_to → "Refactor OAuth flow"
└── derived_from → "Security audit findings"
```
By walking 1-hop graph neighbors, Clawpy discovers that the authentication fix was derived from a security audit and is blocking a deployment — context that vector similarity alone cannot provide.
Data Model
Nodes
Every completed task becomes a node in the graph:
```
TaskNode {
  task_id: string        // Unique identifier (e.g., "task_a1b2c3")
  title: string          // "Fix API authentication"
  timestamp: float       // time.time() when created/completed
  success: bool | null   // true = passed, false = failed, null = pending
  description: string    // Optional context or notes
}
```
Edges
Directed edges encode four types of relationships:
| Relationship | Direction | Meaning |
|---|---|---|
| blocks | A → B | Task A's failure directly prevented Task B from proceeding |
| related_to | A ↔ B | Tasks share a domain, component, or codebase area |
| derived_from | B → A | Task B was spawned or split from Task A |
| similar_to | A ↔ B | Auto-linked when ChromaDB similarity exceeds a threshold |
Edges are deduplicated by (source, target, relationship) — the same relationship between two tasks is never recorded twice.
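A minimal way to enforce that dedup rule is to key edges on exactly that triple (the helper below is a hypothetical sketch, not the actual API):

```python
def add_edge(edges, seen, source_id, target_id, relationship):
    """Append an edge only if the (source, target, relationship) triple is new."""
    key = (source_id, target_id, relationship)
    if key in seen:
        return False  # duplicate: this exact relationship is already recorded
    seen.add(key)
    edges.append({"source_id": source_id, "target_id": target_id,
                  "relationship": relationship})
    return True

edges, seen = [], set()
add_edge(edges, seen, "task_a", "task_b", "blocks")  # recorded
add_edge(edges, seen, "task_a", "task_b", "blocks")  # ignored as duplicate
```

Note that direction matters: `("task_b", "task_a", "blocks")` would be a distinct key and therefore a distinct edge.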
Time-Decay Weighted Recall
The graph's most powerful feature is its time-weighted re-ranking of ChromaDB results. Raw vector search returns results ranked by semantic distance, but this ignores temporal relevance. The knowledge graph fixes this with a composite scoring formula:
combined_score = λ × semantic_score + (1 − λ) × recency_score
Where:
| Variable | Formula | Default |
|---|---|---|
| semantic_score | 1 − chroma_distance | Range: 0.0 – 1.0 |
| recency_score | exp(−age_days / τ) | Exponential decay |
| λ (lambda) | SEMANTIC_WEIGHT | 0.7 |
| τ (tau) | HALF_LIFE_DAYS | 14 days |
How Recency Scoring Works
With τ = 14 days, the exponential decay function gives:
| Age | Recency Score | Interpretation |
|---|---|---|
| 0 days (today) | 1.000 | Full recency weight |
| 7 days (1 week) | 0.607 | ~61% recency weight |
| 14 days (τ) | 0.368 | ~37% — one decay constant (e⁻¹) |
| 28 days (1 month) | 0.135 | ~14% recency weight |
| 90 days (3 months) | 0.002 | ~0.2% — nearly forgotten |
Note that despite the HALF_LIFE_DAYS name, τ is an e-folding time, not a true half-life: the score falls to e⁻¹ ≈ 0.368 (not 0.5) at 14 days, and actually halves after 14 × ln 2 ≈ 9.7 days. With λ = 0.7, semantic similarity dominates the score, but recency acts as a tiebreaker: when two past tasks are equally similar to the current task, the more recent one ranks higher.
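The table values follow directly from the decay formula; a small sketch using the τ from HALF_LIFE_DAYS above:

```python
import math

HALF_LIFE_DAYS = 14.0  # τ in the formula above

def recency_score(age_days: float) -> float:
    """Exponential time decay: 1.0 today, e^-1 ≈ 0.368 at age τ."""
    return math.exp(-age_days / HALF_LIFE_DAYS)

for age in (0, 7, 14, 28, 90):
    print(age, round(recency_score(age), 3))
# prints: 0 1.0 / 7 0.607 / 14 0.368 / 28 0.135 / 90 0.002
```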
Worked Example
Two recalled tasks for the query "Fix SSL certificate rotation":
Task A: "Configured Let's Encrypt auto-renewal" — 3 days ago, distance 0.4
semantic = 1 − 0.4 = 0.6
recency = exp(−3 / 14) = 0.807
combined = 0.7 × 0.6 + 0.3 × 0.807 = 0.420 + 0.242 = 0.662
Task B: "Set up TLS termination on load balancer" — 45 days ago, distance 0.3
semantic = 1 − 0.3 = 0.7
recency = exp(−45 / 14) = 0.040
combined = 0.7 × 0.7 + 0.3 × 0.040 = 0.490 + 0.012 = 0.502
Result: Task A (0.662) ranks above Task B (0.502) despite Task B being more semantically similar, because Task A is much more recent.
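The worked example can be reproduced directly from the composite formula, with λ and τ as defined above (the function name is illustrative):

```python
import math

SEMANTIC_WEIGHT = 0.7   # λ
HALF_LIFE_DAYS = 14.0   # τ

def combined_score(chroma_distance: float, age_days: float) -> float:
    """λ × semantic + (1 − λ) × recency, per the formula above."""
    semantic = 1.0 - chroma_distance
    recency = math.exp(-age_days / HALF_LIFE_DAYS)
    return SEMANTIC_WEIGHT * semantic + (1.0 - SEMANTIC_WEIGHT) * recency

task_a = combined_score(chroma_distance=0.4, age_days=3)   # ≈ 0.662
task_b = combined_score(chroma_distance=0.3, age_days=45)  # ≈ 0.502
```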
1-Hop Neighbor Injection
After scoring direct ChromaDB hits, the graph walks one hop in each direction and injects connected tasks that weren't found by vector search:
```
for neighbor in graph.neighbors(task_id):
    if neighbor.task_id in already_seen:
        continue
    # Penalize indirect results: halve the semantic score
    combined = λ × (semantic_score × 0.5) + (1 − λ) × recency_score
    results.append(neighbor_with_reduced_score)
```
Indirect neighbors receive a 50% penalty on their semantic component to avoid overwhelming the recall with loosely connected results. At most MAX_GRAPH_NEIGHBORS = 3 neighbors are injected per direct hit.
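As a self-contained sketch of the halving and cap rules (the adjacency map and per-task scores here are hypothetical stand-ins for the real graph and ChromaDB):

```python
import math

SEMANTIC_WEIGHT = 0.7     # λ
HALF_LIFE_DAYS = 14.0     # τ
MAX_GRAPH_NEIGHBORS = 3   # cap on injected neighbors per direct hit

# Hypothetical 1-hop adjacency and per-task (semantic_score, age_days) data
neighbors = {"task_a": ["task_b", "task_c", "task_d", "task_e"]}
scores = {"task_b": (0.8, 2.0), "task_c": (0.6, 10.0),
          "task_d": (0.9, 1.0), "task_e": (0.5, 30.0)}

def inject_neighbors(task_id, already_seen):
    injected = []
    for n in neighbors.get(task_id, [])[:MAX_GRAPH_NEIGHBORS]:
        if n in already_seen:
            continue  # already surfaced by direct vector search
        semantic, age_days = scores[n]
        recency = math.exp(-age_days / HALF_LIFE_DAYS)
        # Indirect hit: halve the semantic component before combining
        combined = SEMANTIC_WEIGHT * (semantic * 0.5) + (1 - SEMANTIC_WEIGHT) * recency
        injected.append((n, combined))
    return injected

hits = inject_neighbors("task_a", already_seen={"task_b"})
# task_e never considered (cap of 3); task_b skipped (already seen)
```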
Soft Invalidation
When a task's results become obsolete (e.g., a decision is reversed), the node is soft-invalidated rather than deleted:
```python
def soft_invalidate_node(self, task_id, reason, stamp=""):
    node = self.nodes[task_id]  # look up the node to annotate
    node.description += f"\n[invalidated:{stamp}] {reason}"
    node.success = None  # Reset outcome
    self._save()
```
This preserves the full audit trail while signaling to the recall system that this task's results should be treated with lower confidence.
Storage
The entire graph is stored as a single JSON file:
```json
{
  "nodes": {
    "task_a1b2c3": {
      "task_id": "task_a1b2c3",
      "title": "Fix API authentication",
      "timestamp": 1713564000.0,
      "success": true,
      "description": ""
    }
  },
  "edges": [
    {
      "source_id": "task_a1b2c3",
      "target_id": "task_d4e5f6",
      "relationship": "blocks",
      "created_at": 1713564100.0,
      "note": ""
    }
  ],
  "_meta": {
    "saved_at": 1713564200.0,
    "node_count": 42,
    "edge_count": 67
  }
}
```
Writes are atomic (write to .tmp, then rename) to prevent data corruption during crashes. The graph is loaded once at startup as a singleton and held in memory for fast traversal.
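A sketch of the write-to-temp-then-rename pattern using only the standard library (the filename and payload are illustrative):

```python
import json
import os

def atomic_save(path: str, payload: dict) -> None:
    """Write JSON to <path>.tmp, then atomically rename it over <path>."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(payload, f, indent=2)
        f.flush()
        os.fsync(f.fileno())  # ensure bytes hit disk before the rename
    os.replace(tmp_path, path)  # atomic on both POSIX and Windows

atomic_save("graph.json", {"nodes": {}, "edges": [], "_meta": {}})
```

Because `os.replace` is atomic, a crash mid-write leaves either the old file or the new one on disk, never a half-written graph.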