Adaptation Engine
The Adaptation Engine is Clawpy's autonomous self-improvement system. It captures runtime outcomes (failures, budget incidents, human approvals), synthesises them into improvement candidates, evaluates each candidate against held-out evidence, and — upon approval — promotes changes into the live system.
This is how Clawpy learns from its own mistakes; in Autonomous Mode (described below), it does so without human intervention.
The Adaptation Pipeline
```
Runtime Events (15+ types)
           │
           ▼
┌────────────────────────┐
│   Reflection Service   │ ← Captures high-signal events
│ (reflection_service.py)│   into structured learning records
└──────────┬─────────────┘
           │
           ▼
┌────────────────────────┐
│    Learning Digest     │ ← Synthesises learning records
│  (learning_digest.py)  │   into candidate proposals
└──────────┬─────────────┘
           │
           ▼
┌────────────────────────┐
│  Candidate Evaluation  │ ← Scores candidates against
│                        │   held-out evidence
└──────────┬─────────────┘
           │
           ▼
┌────────────────────────┐
│       Promotion        │ ← Approved candidates become
│                        │   active system modifications
└────────────────────────┘
```
Event Types
The Reflection Service captures 15+ distinct event types that signal learning opportunities:
| Event Type | Source | Signal |
|---|---|---|
| `failed_tdd` | Validation Loop | A test-driven repair cycle failed |
| `successful_heal` | Validation Loop | A validation failure was self-healed |
| `repeated_retry` | Validation Loop | Same error type recurred across multiple runs |
| `task_conflict` | Task Board | Two agents attempted conflicting work |
| `budget_incident` | Budget Service | An agent exhausted its budget |
| `human_approval` | Dashboard | A human approved a pending action |
| `agent_error` | Tool Executor | A tool call failed unexpectedly |
| `successful_bubble` | Bubble Filter | A worker learning was promoted to a leader |
| `guidance_applied` | Wisdom Cascade | Leadership guidance was injected into a prompt |
| `guidance_noisy` | Wisdom Cascade | Injected guidance was counterproductive |
| `stale_guidance` | Wisdom Cascade | Leadership guidance was outdated |
| `teaching_refresh` | Wisdom Teacher | A teaching cycle completed successfully |
| `cloud_sync_failure` | Supabase Sync | Cloud memory sync failed |
| `cloud_memory_recall_hit` | Context Engine | Cloud memory was successfully recalled |
| `tool_sequence` | Tool Executor | A repeated multi-step tool pattern was detected |
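Each captured event becomes a structured learning record. A minimal sketch of such a record follows; the field names are illustrative, not the actual reflection_service.py schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class LearningRecord:
    """One captured runtime event, ready for the Learning Digest."""
    event_type: str    # e.g. "failed_tdd", "budget_incident"
    source: str        # emitting subsystem, e.g. "Budget Service"
    score: float       # 0-100 signal strength assigned at capture time
    scope: str         # what the learning applies to, e.g. "agent:cto"
    source_label: str  # groups related records for later corroboration
    captured_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

record = LearningRecord("budget_incident", "Budget Service", 72.0,
                        "agent:cto", "budget_incident")
```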
Candidate Types
Learning records are synthesised into five candidate types:
1. Prompt Fragments
Injected into the system prompt for specific run kinds. Used to evolve how agents reason about tasks.
Sources: Research summaries, blueprint drafts, auto-reply evaluations, memory fact extraction, guidance events.
2. Fix Templates
Reusable repair playbooks for recurring failure patterns. Capture the successful repair strategy so it can be applied automatically next time.
Sources: Failed TDD runs, successful heals.
3. Routing Hints
Adjustments to the semantic router's behaviour. For example, routing ambiguous tasks to cheaper models after detecting budget pressure.
Sources: Budget incidents, task conflicts, successful bubbles.
4. Validator Policy Tweaks
Changes to validation parameters — tighter retry caps, adjusted timeout values, stricter guardrails.
Sources: Repeated retries, stale guidance, cloud sync failures, noisy guidance.
5. Flow Offloads
Proposals to convert repeated tool-call sequences into deterministic flows (see Flow Sequence Detector).
Sources: Detected tool sequence repetitions.
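Repetition detection for flow offloads can be sketched as n-gram counting over the tool-call stream (illustrative only, not the actual Flow Sequence Detector; the function name and thresholds are assumptions):

```python
from collections import Counter

def detect_repeated_sequences(tool_calls, *, window=3, min_repeats=3):
    # Count every contiguous n-gram of tool calls
    grams = Counter(tuple(tool_calls[i:i + window])
                    for i in range(len(tool_calls) - window + 1))
    # Keep sequences seen often enough to be worth offloading into a flow
    return [list(g) for g, n in grams.items() if n >= min_repeats]

calls = ["read_file", "run_tests", "patch", "read_file", "run_tests", "patch",
         "read_file", "run_tests", "patch"]
```

Here the `read_file → run_tests → patch` triple recurs three times and would be flagged as a flow-offload candidate.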
Candidate Lifecycle
detect → draft → score → corroborate → review → promote/reject
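The lifecycle can be modelled as a small state machine; the stage names come from the arrow diagram above, while the class itself is illustrative rather than Clawpy's actual code:

```python
from enum import Enum

class Stage(str, Enum):
    DETECT = "detect"
    DRAFT = "draft"
    SCORE = "score"
    CORROBORATE = "corroborate"
    REVIEW = "review"
    PROMOTE = "promote"
    REJECT = "reject"

# Legal forward transitions; review is the only fork
TRANSITIONS = {
    Stage.DETECT: {Stage.DRAFT},
    Stage.DRAFT: {Stage.SCORE},
    Stage.SCORE: {Stage.CORROBORATE},
    Stage.CORROBORATE: {Stage.REVIEW},
    Stage.REVIEW: {Stage.PROMOTE, Stage.REJECT},
}
```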
1. Detect
The Learning Digest scans recent learning records and identifies patterns. The body below is an illustrative sketch of that logic, assuming dict-shaped records:

```python
def digest_reflection_opportunities(opportunities, *, limit=5, min_score=60.0):
    # Filter: only records scoring above the threshold
    eligible = [o for o in opportunities if o["score"] >= min_score]
    # Deduplicate by (candidate_type, scope, source_label); higher scores win
    best = {(o["candidate_type"], o["scope"], o["source_label"]): o
            for o in sorted(eligible, key=lambda o: o["score"])}
    # Return top N candidate drafts, highest score first
    return sorted(best.values(), key=lambda o: o["score"], reverse=True)[:limit]
```
2. Draft
Each candidate is assigned a dedupe key to prevent duplicate proposals:
```python
dedupe_key = "{candidate_type}:{scope_type}:{scope_id}:{source_label}"
# e.g., "routing_hint:agent:cto:budget_incident"
```
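As a concrete helper (`make_dedupe_key` is a hypothetical name; only the key format comes from the source):

```python
def make_dedupe_key(candidate_type: str, scope_type: str,
                    scope_id: str, source_label: str) -> str:
    # Mirrors the dedupe key format shown above
    return f"{candidate_type}:{scope_type}:{scope_id}:{source_label}"

key = make_dedupe_key("routing_hint", "agent", "cto", "budget_incident")
# → "routing_hint:agent:cto:budget_incident"
```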
3. Score
Initial scoring uses the learning record's own score (0–100).
4. Corroborate
The candidate is evaluated against held-out evidence — learning records that weren't used to generate the candidate:
```python
# Match: same source_label, same scope, different record IDs
avg_score = mean(matching_record_scores)
corroboration_bonus = min(15.0, num_matches * 5.0)
candidate_score = min(100, avg_score * 0.85 + corroboration_bonus)
```
A candidate with three corroborating records receives the full +15 bonus. A candidate with zero corroboration is capped at 45 points, below the 60-point pass threshold, and therefore automatically fails.
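Putting the corroboration rules together as a runnable sketch (the zero-match branch is one reading of the 45-point cap, chosen because 45 sits below the 60-point pass score):

```python
from statistics import mean

def corroborate(own_score: float, matching_record_scores: list,
                *, pass_score: float = 60.0):
    """Score a candidate against held-out records per the rules above."""
    if not matching_record_scores:
        # No independent evidence: cap at 45, which is below the pass score
        return min(own_score, 45.0), False
    avg_score = mean(matching_record_scores)
    corroboration_bonus = min(15.0, len(matching_record_scores) * 5.0)
    candidate_score = min(100.0, avg_score * 0.85 + corroboration_bonus)
    return candidate_score, candidate_score >= pass_score
```

With three matches averaging 72, the candidate scores 72 × 0.85 + 15 = 76.2 and passes.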
5. Promote or Reject
Candidates scoring ≥ 60 (pass score) are promoted. The promotion effect depends on the candidate type:
| Candidate Type | Promotion Action |
|---|---|
| Prompt Fragment | Inject guidance into the Adaptation Overlay Store |
| Fix Template | Create TDD repair playbook entry |
| Routing Hint | Modify semantic router bias settings |
| Validator Policy Tweak | Override validation parameters for the run kind |
| Flow Offload | Register a new deterministic flow definition |
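Promotion can be pictured as a dispatch on candidate type. The registry names below are invented for illustration; the real promotion hooks live in the respective subsystems:

```python
# Maps each candidate type to the (hypothetical) registry it updates on promotion
PROMOTION_TARGETS = {
    "prompt_fragment": "overlay_store",
    "fix_template": "tdd_playbooks",
    "routing_hint": "router_bias",
    "validator_policy_tweak": "validator_overrides",
    "flow_offload": "flow_definitions",
}

def promote(candidate_type: str, payload: dict, registries: dict) -> str:
    """Append an approved candidate's payload to its target registry."""
    target = PROMOTION_TARGETS[candidate_type]
    registries.setdefault(target, []).append(payload)
    return target

registries = {}
promote("routing_hint", {"bias": "prefer_cheaper_model"}, registries)
```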
Adaptation Overlay Store
Promoted candidates are stored in the Adaptation Overlay Store (adaptation_overlay_store.py), which provides a persistent registry of active system modifications:
```python
entries = overlay_store.get_prompt_fragments(
    agent_id="introspection_loop",
    run_kind="introspection_evaluation",
)
# Returns: [{"guidance": ["Prefer concrete patterns...", "Only suggest..."]}]
```
These entries are queried at runtime and injected into the appropriate prompts, creating a feedback loop where past failures inform future reasoning.
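At injection time, the queried entries can be folded into a prompt along these lines (a sketch; `apply_prompt_fragments` is a hypothetical name, and the real injection happens inside Clawpy's prompt assembly):

```python
def apply_prompt_fragments(base_prompt: str, entries: list) -> str:
    """Append promoted guidance lines to a system prompt."""
    guidance = [g for entry in entries for g in entry.get("guidance", [])]
    if not guidance:
        return base_prompt
    bullets = "\n".join(f"- {g}" for g in guidance)
    return f"{base_prompt}\n\nLearned guidance:\n{bullets}"

entries = [{"guidance": ["Prefer concrete patterns over abstractions."]}]
prompt = apply_prompt_fragments("You are the introspection loop.", entries)
```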
Autonomous Mode
In Autonomous Mode, the Adaptation Engine auto-approves candidates that score above the pass threshold without waiting for human review. This enables fully self-improving operation where the system evolves its own behaviour based on observed outcomes.
In manual mode, candidates are queued in the dashboard for operator review before promotion.
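The two modes differ only at the review step, which a minimal gate captures (the function name and return labels are illustrative):

```python
def review_gate(candidate_score: float, *, autonomous: bool,
                pass_score: float = 60.0) -> str:
    """Decide a scored candidate's fate under the current mode."""
    if candidate_score < pass_score:
        return "reject"
    # Autonomous Mode skips the human review queue entirely
    return "promote" if autonomous else "queue_for_review"
```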