Introspection Loop
The Introspection Loop (introspection.py, 16,928 bytes) enables agents to pause and reflect on their own performance. After every N tool calls, the agent evaluates what worked, what failed, and whether any patterns are repeating — then stores the learnings in persistent memory.
In active mode, it can autonomously create new skills from detected patterns.
How It Works
```
Tool call #1  → record
Tool call #2  → record
...
Tool call #10 → record
          │
┌─────────▼────────────┐
│    INTROSPECTION     │
│                      │
│ 1. Summarise actions │
│ 2. LLM evaluation    │
│ 3. Extract learnings │
│ 4. Store in memory   │
│ 5. Create skill?     │
└──────────────────────┘
          │
Tool call #11 → record (counter resets)
...
```
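The record-and-trigger cycle above can be sketched as a simple counter that fires an introspection pass every N recorded calls. This is a minimal illustration; the class and method names here are hypothetical, not the actual introspection.py API:

```python
class IntrospectionCounter:
    """Sketch of the record/trigger cycle; names are illustrative."""

    def __init__(self, interval=10):
        self.interval = interval
        self.recent_tool_calls = []

    def record(self, tool_call):
        """Record one tool call; run introspection on every Nth call."""
        self.recent_tool_calls.append(tool_call)
        if len(self.recent_tool_calls) % self.interval == 0:
            return self.introspect()
        return None

    def introspect(self):
        # Placeholder for: summarise → evaluate → extract → store → maybe skill.
        return {"evaluated_calls": len(self.recent_tool_calls)}
```

With `interval=10`, calls 1–9 simply record; call 10 triggers a pass, and the cycle repeats from call 11.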
Three Modes
| Mode | Learning | Skill Creation | Use Case |
|---|---|---|---|
| `off` | ❌ | ❌ | Disable introspection entirely |
| `passive` | ✅ → Memory | ❌ | Learn from experience without modifying behaviour |
| `active` | ✅ → Memory | ✅ → Pending skill | Full autonomous evolution |
Mode is set per-agent in `comms.json`:

```json
{
  "self_learn": "active",
  "introspect_interval": 10
}
```
Evaluation Process
Step 1: Build Actions Summary
Recent tool calls are formatted into a readable timeline:
```
1. ✅ file_read(path="/api/routes.py")
   → [200 lines of Python code...]
2. ❌ bash_execute(cmd="pytest tests/")
   → FAILED: 3 tests failed
3. ✅ file_write(path="/api/routes.py", content="...")
   → File written successfully
4. ✅ bash_execute(cmd="pytest tests/")
   → All 15 tests passed
```
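A timeline like the one above can be produced with a short formatter. This sketch assumes each recorded call carries a tool name, a success flag, its arguments, and a result string; the actual record shape in introspection.py may differ:

```python
def build_actions_summary(tool_calls):
    """Format recorded tool calls as a numbered timeline (illustrative sketch)."""
    lines = []
    for i, call in enumerate(tool_calls, start=1):
        mark = "✅" if call["ok"] else "❌"
        args = ", ".join(f"{k}={v!r}" for k, v in call["args"].items())
        lines.append(f'{i}. {mark} {call["tool"]}({args})')
        lines.append(f'   → {call["result"]}')
    return "\n".join(lines)
```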
Step 2: LLM Self-Evaluation
The actions summary is sent to the LLM with a structured prompt:
```
Analyse these actions. What worked? What failed?
Are there patterns that keep repeating?
Should we create a skill to handle this better next time?
```
Expected JSON output:

```json
{
  "learnings": [
    "file_read before file_write consistently catches import issues",
    "Running pytest twice suggests a test-fix-verify pattern"
  ],
  "repeated_pattern": true,
  "pattern_description": "Read → Fix → Test → Verify loop",
  "suggested_skill": {
    "operation": "create",
    "name": "test-fix-verify",
    "category": "Development",
    "description": "Automated read-fix-test cycle",
    "trigger_phrases": ["fix and test", "TDD cycle"],
    "content": "# Test-Fix-Verify\n1. Read the failing test...",
    "support_files": []
  },
  "summary": "4 tool calls: 1 failure (test), self-healed with file edit"
}
```
Step 3: Validation
The LLM output passes through the Validation Loop with `JsonObjectValidator`:

- `learnings` must be a list of non-empty strings
- `repeated_pattern` must be a boolean
- `summary` is required
- If `repeated_pattern: true` and mode is `active`, `suggested_skill` must have a valid shape (name, content, operation)
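The shape checks listed above can be expressed as a small validation function. This mirrors the rules, not the `JsonObjectValidator` API itself, which is not shown in this document:

```python
def validate_introspection_output(data, mode="passive"):
    """Return a list of validation errors for an introspection result (sketch)."""
    errors = []
    learnings = data.get("learnings")
    if not isinstance(learnings, list) or not all(
        isinstance(s, str) and s.strip() for s in learnings
    ):
        errors.append("learnings must be a list of non-empty strings")
    if not isinstance(data.get("repeated_pattern"), bool):
        errors.append("repeated_pattern must be a boolean")
    if "summary" not in data:
        errors.append("summary is required")
    if data.get("repeated_pattern") and mode == "active":
        skill = data.get("suggested_skill") or {}
        if not all(skill.get(k) for k in ("name", "content", "operation")):
            errors.append("suggested_skill must include name, content, operation")
    return errors
```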
Step 4: Store Learnings
Each learning is sent to the Auto-Capture system as an [INTROSPECTION LEARNING] message:
```python
self.auto_capture.process_messages(
    [{"role": "system", "content": f"[INTROSPECTION LEARNING] {learning}"}],
    session_id="introspection_loop",
)
```
These learnings persist across sessions and become available for future recall.
Step 5: Skill Creation (Active Mode Only)
If a repeated pattern is detected, a skill proposal is created:
```python
proposal = create_skill_proposal(
    proposal=skill_draft,
    source_agent="introspection_loop",
    trigger="repeated_pattern",
    summary=summary,
    evidence=recent_tool_calls[-6:],
)
```
The proposal is queued for dashboard review — or auto-approved in Autonomous Mode:
```python
if AdaptationPolicyStore().is_autonomous_mode_enabled():
    approve_skill_proposal(proposal["id"], skills_root, review_notes="Auto-approved")
```
Prompt Fragment Integration
The Introspection Loop checks the Adaptation Overlay Store for active prompt fragments before running its evaluation:
```python
entries = self.overlay_store.get_prompt_fragments(
    agent_id="introspection_loop",
    run_kind="introspection_evaluation",
)
```
If the Adaptation Engine has previously promoted a guidance fragment for introspection (e.g., "Prefer concrete patterns over generic self-critique"), it will be injected into the evaluation prompt — making future introspections sharper based on past feedback.
This creates a self-improving introspection loop: the system introspects, learns, feeds those learnings back into the Adaptation Engine, which improves the introspection prompts, which produces better introspections.
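Injecting the retrieved fragments into the evaluation prompt might look like the sketch below. The `"fragment"` field on each entry is an assumption about the overlay store's entry shape:

```python
def inject_fragments(base_prompt, entries):
    """Prepend promoted guidance fragments to the evaluation prompt (sketch;
    the entry shape with a 'fragment' field is assumed)."""
    fragments = [e["fragment"] for e in entries if e.get("fragment")]
    if not fragments:
        return base_prompt
    guidance = "\n".join(f"- {f}" for f in fragments)
    return f"Guidance from past feedback:\n{guidance}\n\n{base_prompt}"
```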
Cost Efficiency
Introspection evaluations use the cheapest available model (resolved via the Archetype Registry as `ceo_junior`). With a typical validation budget of 8 cents and 3 max retries, each introspection cycle costs less than a penny in most cases.
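As a rough illustration of that budget check, the sum of per-attempt costs can be compared against the 8-cent cap (the figures come from this document; the accounting helper itself is hypothetical):

```python
def introspection_cost(attempt_costs_usd, budget_usd=0.08):
    """Sum per-attempt costs and flag whether the cycle stayed within budget."""
    total = round(sum(attempt_costs_usd), 6)
    return total, total <= budget_usd
```

For example, two attempts at a few tenths of a cent each stay comfortably under the cap.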