Introspection Loop

The Introspection Loop (introspection.py, 16,928 bytes) enables agents to pause and reflect on their own performance. After every N tool calls, the agent evaluates what worked, what failed, and whether any patterns are repeating — then stores the learnings in persistent memory.

In active mode, it can autonomously create new skills from detected patterns.


How It Works

Tool call #1    → record
Tool call #2    → record
...
Tool call #10   → record
                    │
         ┌──────────▼────────────┐
         │  INTROSPECTION        │
         │                       │
         │  1. Summarise actions │
         │  2. LLM evaluation    │
         │  3. Extract learnings │
         │  4. Store in memory   │
         │  5. Create skill?     │
         └───────────────────────┘
                    │
Tool call #11   → record (counter resets)
...
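The counting trigger above can be sketched in a few lines. This is a minimal illustration, not the actual introspection.py API — the class and method names are assumptions:

```python
# Minimal sketch of the counting trigger. Names are illustrative,
# not the real introspection.py interface.
class IntrospectionLoop:
    def __init__(self, interval=10):
        self.interval = interval
        self.recorded = []   # tool calls since the last introspection
        self.cycles = 0      # completed introspection cycles

    def record(self, tool_call):
        """Record one tool call; fire introspection every `interval` calls."""
        self.recorded.append(tool_call)
        if len(self.recorded) >= self.interval:
            self.introspect(self.recorded)
            self.recorded = []  # counter resets

    def introspect(self, calls):
        # Steps 1-5 (summarise, evaluate, extract, store, create skill)
        # would run here; this stub only counts completed cycles.
        self.cycles += 1

loop = IntrospectionLoop(interval=3)
for i in range(7):
    loop.record({"tool": "file_read", "call": i})
# cycles fire after calls 3 and 6; call 7 stays buffered
```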

Three Modes

| Mode | Learning | Skill Creation | Use Case |
|------|----------|----------------|----------|
| off | — | — | Disable introspection entirely |
| passive | ✅ → Memory | — | Learn from experience without modifying behaviour |
| active | ✅ → Memory | ✅ → Pending skill | Full autonomous evolution |
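The gating the table describes can be sketched as a small dispatch. Only the mode names come from the table; the function itself is illustrative:

```python
# Illustrative mode gating. Only the mode names ("off", "passive",
# "active") come from the docs; this dispatch is a sketch.
def run_introspection(mode, learnings, pattern_found):
    if mode == "off":
        return {"stored": [], "skill_proposed": False}
    stored = list(learnings)                    # passive and active both learn
    propose = mode == "active" and pattern_found  # only active proposes skills
    return {"stored": stored, "skill_proposed": propose}

result = run_introspection("passive", ["pattern X works"], pattern_found=True)
# passive stores learnings but never proposes a skill
```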

Mode is set per-agent in comms.json:

{
  "self_learn": "active",
  "introspect_interval": 10
}

Evaluation Process

Step 1: Build Actions Summary

Recent tool calls are formatted into a readable timeline:

1. ✅ file_read(path="/api/routes.py")
   → [200 lines of Python code...]
2. ❌ bash_execute(cmd="pytest tests/")
   → FAILED: 3 tests failed
3. ✅ file_write(path="/api/routes.py", content="...")
   → File written successfully
4. ✅ bash_execute(cmd="pytest tests/")
   → All 15 tests passed
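A formatter producing this timeline might look like the sketch below. The tool-call record fields (`tool`, `args`, `ok`, `result`) are assumptions, not the actual schema:

```python
# Sketch of Step 1: format recent tool calls into a numbered timeline.
# The record field names (tool, args, ok, result) are assumed.
def build_actions_summary(tool_calls, max_result_len=80):
    lines = []
    for i, call in enumerate(tool_calls, start=1):
        mark = "✅" if call["ok"] else "❌"
        args = ", ".join(f'{k}="{v}"' for k, v in call["args"].items())
        result = call["result"][:max_result_len]  # truncate long outputs
        lines.append(f'{i}. {mark} {call["tool"]}({args})')
        lines.append(f"   → {result}")
    return "\n".join(lines)

summary = build_actions_summary([
    {"tool": "bash_execute", "args": {"cmd": "pytest tests/"},
     "ok": False, "result": "FAILED: 3 tests failed"},
])
```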

Step 2: LLM Self-Evaluation

The actions summary is sent to the LLM with a structured prompt:

Analyse these actions. What worked? What failed?
Are there patterns that keep repeating?
Should we create a skill to handle this better next time?

Expected JSON output:

{
  "learnings": [
    "file_read before file_write consistently catches import issues",
    "Running pytest twice suggests a test-fix-verify pattern"
  ],
  "repeated_pattern": true,
  "pattern_description": "Read → Fix → Test → Verify loop",
  "suggested_skill": {
    "operation": "create",
    "name": "test-fix-verify",
    "category": "Development",
    "description": "Automated read-fix-test cycle",
    "trigger_phrases": ["fix and test", "TDD cycle"],
    "content": "# Test-Fix-Verify\n1. Read the failing test...",
    "support_files": []
  },
  "summary": "4 tool calls: 1 failure (test), self-healed with file edit"
}

Step 3: Validation

The LLM output passes through the Validation Loop with JsonObjectValidator:

  • learnings must be a list of non-empty strings
  • repeated_pattern must be a boolean
  • summary is required
  • If repeated_pattern: true and mode is active, suggested_skill must have a valid shape (name, content, operation)
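The rules above can be expressed as a plain validator function. This is a sketch of the checks only — the real system runs them through the Validation Loop's JsonObjectValidator:

```python
# Sketch of the Step 3 checks as a plain function; the real system
# uses the Validation Loop's JsonObjectValidator.
def validate_evaluation(output, mode):
    errors = []
    learnings = output.get("learnings")
    if not isinstance(learnings, list) or not all(
        isinstance(s, str) and s.strip() for s in learnings
    ):
        errors.append("learnings must be a list of non-empty strings")
    if not isinstance(output.get("repeated_pattern"), bool):
        errors.append("repeated_pattern must be a boolean")
    if "summary" not in output:
        errors.append("summary is required")
    if output.get("repeated_pattern") is True and mode == "active":
        skill = output.get("suggested_skill") or {}
        if not all(skill.get(k) for k in ("name", "content", "operation")):
            errors.append("suggested_skill needs name, content, operation")
    return errors

ok = validate_evaluation(
    {"learnings": ["x"], "repeated_pattern": False, "summary": "ok"},
    mode="passive",
)
```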

Step 4: Store Learnings

Each learning is sent to the Auto-Capture system as an [INTROSPECTION LEARNING] message:

self.auto_capture.process_messages(
    [{"role": "system", "content": f"[INTROSPECTION LEARNING] {learning}"}],
    session_id="introspection_loop",
)

These learnings persist across sessions and become available for future recall.

Step 5: Skill Creation (Active Mode Only)

If a repeated pattern is detected, a skill proposal is created:

proposal = create_skill_proposal(
    proposal=skill_draft,
    source_agent="introspection_loop",
    trigger="repeated_pattern",
    summary=summary,
    evidence=recent_tool_calls[-6:],
)

The proposal is queued for dashboard review — or auto-approved in Autonomous Mode:

if AdaptationPolicyStore().is_autonomous_mode_enabled():
    approve_skill_proposal(proposal["id"], skills_root, review_notes="Auto-approved")

Prompt Fragment Integration

The Introspection Loop checks the Adaptation Overlay Store for active prompt fragments before running its evaluation:

entries = self.overlay_store.get_prompt_fragments(
    agent_id="introspection_loop",
    run_kind="introspection_evaluation",
)

If the Adaptation Engine has previously promoted a guidance fragment for introspection (e.g., "Prefer concrete patterns over generic self-critique"), it will be injected into the evaluation prompt — making future introspections sharper based on past feedback.
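The injection step might look like the following sketch; the entry shape and join format are assumptions, not the overlay store's actual contract:

```python
# Sketch of injecting promoted prompt fragments into the evaluation
# prompt. The fragment shape and join format are assumed.
def build_evaluation_prompt(base_prompt, fragments):
    if not fragments:
        return base_prompt
    guidance = "\n".join(f"- {f}" for f in fragments)
    return f"{base_prompt}\n\nGuidance from past adaptation:\n{guidance}"

prompt = build_evaluation_prompt(
    "Analyse these actions. What worked? What failed?",
    ["Prefer concrete patterns over generic self-critique"],
)
```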

This creates a self-improving introspection loop: the system introspects, learns, feeds those learnings back into the Adaptation Engine, which improves the introspection prompts, which produces better introspections.


Cost Efficiency

Introspection evaluations use the cheapest available model (resolved via the Archetype Registry as ceo_junior). With a typical validation budget of 8 cents and 3 max retries, each introspection cycle costs less than a penny in most cases.
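A back-of-the-envelope check of that claim, with clearly placeholder prices (the per-token rates below are ASSUMED for illustration, not real model pricing):

```python
# Back-of-the-envelope cost check. The per-token prices below are
# ASSUMED placeholders, not real model pricing.
PRICE_PER_1K_INPUT = 0.0001   # assumed cheap-model input price ($/1k tokens)
PRICE_PER_1K_OUTPUT = 0.0004  # assumed cheap-model output price ($/1k tokens)

def cycle_cost(input_tokens, output_tokens, retries=0):
    one_call = (input_tokens / 1000) * PRICE_PER_1K_INPUT \
             + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT
    return one_call * (1 + retries)  # each retry repeats the full call

# even exhausting all 3 retries on a ~2k-token prompt stays well
# under both the $0.08 budget and the "less than a penny" claim
worst_case = cycle_cost(2000, 500, retries=3)
```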