Why Clawpy
Clawpy is not a wrapper around an LLM. It is a deeply engineered cognitive operating system for autonomous agent swarms, with transparency, control, and self-learning built into every layer.
This page explains what makes Clawpy different from other agentic frameworks, and why those differences matter. Every claim below is backed by specific source modules in the codebase.
Intelligent Model Routing
Most frameworks use a single model for everything. Clawpy assigns models automatically by role through the Archetype Registry (core/archetype_registry.py).
Every tier gets its own model. The operator doesn't choose per-agent models; the archetype defines them:
| Tier | Role | Primary Model | Fallback Model | Cost |
|---|---|---|---|---|
| CEO | Strategic decisions | claude-4.6-opus | GPT-5.4 | 💰💰💰 Highest |
| CTO | Tech leadership | GPT-5.4-Codex | claude-4.6-opus | 💰💰💰 High |
| CFO / COO / CMO | Division leadership | MiniMax-M2.7 | claude-4.6-Sonnet | 💰💰 Medium |
| Senior Leads | Department execution | claude-4.6-Sonnet | MiniMax-M2.7 | 💰💰 Medium |
| Workers | Implementation | Moonshot/K2.5 | MiniMax-M2.7 | 💰 Low |
| Junior Workers | Repetitive tasks | claude-4.6-haiku | GPT-5.4-codex-mini | 💰 Lowest |
| Butler (Alfred) | Personal assistant | MiniMax-M2.7-highspeed | Moonshot/K2.5 | 💰 Optimised for speed |
How Resolution Works
The ArchetypeRegistry.resolve() method follows a strict priority chain:
- Vault override: STACK_{ARCHETYPE_KEY} environment variable (full flexibility)
- CSV/core default: docs/archetypes.csv or hardcoded fallback table
- Global fallback: CLAWPY_MODEL or openai/gpt-4o
This means you can override any archetype's model with a single environment variable (STACK_CTO=your-preferred-model) while the system handles everything else automatically. The result: 50–80% cost savings compared to running the same model for every agent.
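The priority chain can be sketched in a few lines. This is an illustration only, not the actual ArchetypeRegistry.resolve() code: the function name resolve_model, the DEFAULT_MODELS dictionary, and the injectable env argument are assumptions made for the example.

```python
import os

# Hypothetical stand-in for docs/archetypes.csv or the hardcoded fallback table.
DEFAULT_MODELS = {"CEO": "claude-4.6-opus", "CTO": "GPT-5.4-Codex"}


def resolve_model(archetype_key, env=None, defaults=DEFAULT_MODELS):
    """Return the model for an archetype using the three-step priority chain."""
    env = os.environ if env is None else env
    key = archetype_key.upper()
    # 1. Vault override: STACK_{ARCHETYPE_KEY}
    override = env.get(f"STACK_{key}")
    if override:
        return override
    # 2. CSV/core default
    if key in defaults:
        return defaults[key]
    # 3. Global fallback: CLAWPY_MODEL, else openai/gpt-4o
    return env.get("CLAWPY_MODEL") or "openai/gpt-4o"
```

With STACK_CTO set, resolve_model("cto") returns the override; with nothing set and an unknown archetype, it falls through to openai/gpt-4o.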
Hierarchical Delegation & Escalation
Clawpy agents don't operate in a flat structure. They form a corporate-style hierarchy where work flows down and problems flow up.
Downward: Leaders Delegate to Workers
Confirmed in orchestration/main_agent.py, where the spawn_subagent tool (line 1086) explicitly enforces:
- Only leaders can spawn: "Workers cannot spawn further agents - only leads can."
- CPU/RAM resource checks before spawning: the system verifies host resources before allowing a new agent
- Lead agent owns the spawned worker: clear chain of responsibility
The CEO delegates complex tasks to division heads (CTO, COO, CMO). Division heads spawn worker agents for implementation. This is enforced by the archetype tier system: Worker and Junior Worker tiers are blocked from calling spawn_subagent.
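The three rules above could look roughly like the following. It is a sketch under stated assumptions, not the code in main_agent.py: the Agent dataclass, the tier names, the signature, and the 90% resource ceiling are all hypothetical, and the host readings are passed in rather than measured.

```python
from dataclasses import dataclass

# Tiers blocked from spawning, per the text; the string names are illustrative.
NON_SPAWNING_TIERS = {"worker", "junior_worker"}


@dataclass
class Agent:
    agent_id: str
    tier: str


def spawn_subagent(parent, child_id, cpu_percent, ram_percent, max_load=90.0):
    """Spawn a worker owned by `parent`, or raise if policy forbids it."""
    if parent.tier in NON_SPAWNING_TIERS:
        raise PermissionError("Workers cannot spawn further agents - only leads can.")
    # Host resource check; the 90% ceiling is an assumed value.
    if cpu_percent > max_load or ram_percent > max_load:
        raise RuntimeError("Insufficient host resources to spawn a new agent.")
    # The lead that spawned the worker is recorded as its owner.
    return {"agent": Agent(child_id, "worker"), "owner": parent.agent_id}
```

Rejecting the call at the tool boundary keeps the hierarchy rule independent of whatever the prompt says.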
CodingLane: Structured Delegation Pipeline
spawn_coding_lane (line 1032 in main_agent.py) implements a built-in three-stage pipeline:
- Senior Architect plans and writes the first pass
- Junior Developer iterates on bugs and refines
- Auditor reviews the output before returning the result
This is not a prompt hack. It is a real multi-agent pipeline with separate LLM calls at each stage, using the appropriate model tier for each role.
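To make the three-stage shape concrete, here is a minimal sketch with the LLM call abstracted as a plain function. The name run_coding_lane, the role strings, and the prompt wording are assumptions for illustration; the real spawn_coding_lane may differ.

```python
from typing import Callable

# An LLM call is modelled as a function: (role, prompt) -> text.
LLMCall = Callable[[str, str], str]


def run_coding_lane(task: str, call_llm: LLMCall) -> dict:
    """Run architect -> junior -> auditor as three separate calls."""
    draft = call_llm("senior_architect", f"Plan and write a first pass for: {task}")
    refined = call_llm("junior_developer", f"Fix bugs and refine:\n{draft}")
    review = call_llm("auditor", f"Review before returning:\n{refined}")
    return {"draft": draft, "refined": refined, "review": review}
```

Because each stage is its own call, each role can be routed to a different model tier by whatever function is passed as call_llm.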
Upward: Automatic Escalation
When a worker stalls or fails, the system automatically escalates up the chain. This is confirmed in multiple source modules:
core/heartbeat_monitor.py: if an agent hasn't produced meaningful output in 5+ minutes, the monitor:
- Force-pauses the agent
- Moves stalled tasks to dead_letters/
- Reads the agent's escalation_chain (e.g., ["cto", "ceo", "owner"])
- Escalates to the next authority in the chain
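The stall-handling steps can be sketched as a single check. This is an assumption-laden illustration rather than the HeartbeatMonitor source: AgentState, check_heartbeat, and the in-memory dead_letters list (standing in for the dead_letters/ directory) are hypothetical names.

```python
import time
from dataclasses import dataclass, field

STALL_SECONDS = 5 * 60  # "5+ minutes" without meaningful output


@dataclass
class AgentState:
    agent_id: str
    last_output_at: float
    escalation_chain: list
    tasks: list = field(default_factory=list)
    paused: bool = False


def check_heartbeat(agent, dead_letters, now=None):
    """Return the authority to escalate to, or None if the agent is healthy."""
    now = time.time() if now is None else now
    if now - agent.last_output_at < STALL_SECONDS:
        return None
    agent.paused = True               # force-pause the agent
    dead_letters.extend(agent.tasks)  # move stalled tasks aside
    agent.tasks.clear()
    # Escalate to the first authority in the chain, if there is one.
    return agent.escalation_chain[0] if agent.escalation_chain else None
```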
core/auto_reply/engine.py: if the auto-reply engine's confidence score is below the threshold, the engine:
- Drafts a reply but does NOT send it
- Publishes an ACP escalation message to the orchestrator
- Includes the draft, confidence score, and original payload for human review
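Confidence gating of this kind reduces to one branch. The sketch below assumes a 0.0 to 1.0 score and a threshold of 0.8; both, along with the function name and the dictionary keys, are illustrative and not taken from engine.py.

```python
def handle_incoming(payload, draft_reply, confidence, threshold=0.8):
    """Send the draft if confidence is high enough; otherwise escalate."""
    if confidence >= threshold:
        return {"action": "send", "reply": draft_reply}
    # Below threshold: keep the draft unsent and hand everything to the
    # orchestrator so a human can review it.
    return {
        "action": "escalate",
        "target": "orchestrator",
        "draft": draft_reply,
        "confidence": confidence,
        "payload": payload,
    }
```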
core/seed_workspaces.py: every seeded agent has a configured escalation_chain:
- CEO escalates to ["owner"]
- CTO/COO/CMO escalate to ["ceo", "owner"]
- Workers escalate to ["cto", "ceo", "owner"] or ["lead_id", "ceo", "owner"]
- Budget overruns > $10/hour: auto-escalate to CEO
core/learning_digest.py: budget pressure incidents trigger automatic guidance: "escalate ambiguous work to the senior route"
No human babysitting required. Problems find their way to the right level of authority.
Autonomy Control: From Supervised to Full Mad Dog
Clawpy provides a complete autonomy control system (core/autonomy_mode_store.py) with 4 named presets and granular scoping.
The 4 Presets
| Preset | Execution Profile | Governance | Confirmation | What It Means |
|---|---|---|---|---|
| Safe | Off (human review) | Manual | Strict | Human approval for all risky actions |
| Execution Autonomy | Mad Dog | Manual | Standard | Agents execute freely, but hiring/skills need approval |
| Governance Autonomy | Off | Auto-approve | Elevated | Skills and hiring auto-approved, but execution gated |
| Maximum Autonomy | Mad Dog | Auto-approve | Maximum | Full autonomy, constrained only by hard safety policies |
Scoped at Every Level
The killer feature: you can set different autonomy per scope:
- Global: default for everything
- Workspace: per-project override
- Agent: per-individual agent override
- Lucius (CEO): special scope for the CEO agent
- Flow: per-workflow override
- Task: per-task override
Resolution priority: Task → Flow → Agent → Lucius → Workspace → Global
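The resolution priority amounts to "most specific scope wins". A minimal sketch, assuming overrides arrive as a dictionary keyed by scope name and that "safe" is the default when nothing is set (both are assumptions, as is the function name):

```python
# Most specific scope first, matching the resolution priority above.
SCOPE_PRIORITY = ["task", "flow", "agent", "lucius", "workspace", "global"]


def resolve_preset(overrides, default="safe"):
    """Return the preset from the most specific scope that defines one."""
    for scope in SCOPE_PRIORITY:
        preset = overrides.get(scope)
        if preset is not None:
            return preset
    return default
```

For example, a workspace locked to "safe" can still run one task at "maximum" by setting only the task-level override.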
What This Enables
- Run your CEO in Safe mode (human approval for strategic decisions) while junior workers run in Maximum Autonomy (full mad dog on repetitive tasks)
- Lock down production workspaces while letting staging run wild
- Give a specific agent elevated trust for a specific workflow
- Override everything at the task level for one-off operations
Full Transparency
The operator can:
- See every decision, every cost, and every memory in the dashboard
- Override any agent's model, budget, or autonomy level at any time
- Go from "approve everything" to "let it rip" with a single preset switch
- Scope that control at any level of granularity: global, workspace, agent, or task
Self-Learning Architecture
Clawpy doesn't just execute; it learns from its own behaviour and evolves autonomously. Four systems work together:
1. Adaptation Engine
Confirmed in core/reflection_service.py and core/learning_digest.py.
Captures 15+ runtime event types (task failures, budget incidents, escalations, human approvals, successful completions) and synthesises them into improvement candidates:
- Prompt Fragments: injected into system prompts to improve reasoning
- Fix Templates: reusable repair playbooks for recurring failures
- Routing Hints: adjustments to how work is assigned across the hierarchy
- Validator Policy Tweaks: tighter guardrails or relaxed constraints based on evidence
- Flow Offloads: convert repeated patterns into deterministic workflows
Each candidate is scored, corroborated against held-out evidence, and promoted only if it demonstrates statistically significant improvement. See Adaptation Engine for the full pipeline.
2. Introspection Loop
Confirmed in memory/introspection.py.
Every N tool calls, each agent pauses to self-evaluate: What worked? What failed? Are patterns repeating? The introspection loop:
- Evaluates recent tool-call performance
- Extracts learnings as structured records
- In active mode, autonomously creates new skills from detected patterns
- Feeds learnings back into the Adaptation Engine
This is a self-improving feedback loop: introspection learnings feed the Adaptation Engine, which improves introspection prompts, which produces better introspections. See Introspection Loop.
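The "every N tool calls" trigger can be pictured as a small counter that emits a structured record when a batch fills up. This is a sketch only: the class name, the default N of 10, and the record fields are assumptions, and the real loop in memory/introspection.py evaluates far more than a success rate.

```python
class IntrospectionTrigger:
    """Collects tool-call outcomes and emits a summary every `every_n` calls."""

    def __init__(self, every_n=10):  # 10 is an assumed value for N
        self.every_n = every_n
        self.recent = []

    def record(self, tool_name, succeeded):
        """Record one call; return a learning record when a batch is full."""
        self.recent.append((tool_name, succeeded))
        if len(self.recent) < self.every_n:
            return None
        batch, self.recent = self.recent, []
        failed = [name for name, ok in batch if not ok]
        return {
            "calls": len(batch),
            "failed_tools": failed,
            "success_rate": (len(batch) - len(failed)) / len(batch),
        }
```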
3. Flow Sequence Detector
Confirmed in core/flow_sequence_detector.py.
Watches for repeated 3+ step tool-call sequences and proposes deterministic flow definitions:
- Detects repeating sequences using sliding-window pattern analysis
- Proposes flow definitions that replace LLM reasoning with cheap, predictable execution
- Estimated savings: 95% per execution on offloaded flows
- Flows are validated before activation to prevent regressions
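Sliding-window detection of repeated sequences can be sketched with a counter over fixed-size windows. The function name, the fixed window of 3, and the minimum of two repeats are assumptions for the example; the actual detector may use variable window sizes and additional filtering.

```python
from collections import Counter


def find_repeated_sequences(tool_calls, window=3, min_repeats=2):
    """Count each window of `window` consecutive calls; keep repeated ones."""
    counts = Counter(
        tuple(tool_calls[i:i + window])
        for i in range(len(tool_calls) - window + 1)
    )
    return {seq: n for seq, n in counts.items() if n >= min_repeats}
```

A history of search, read, write, search, read, write, deploy yields one candidate: the (search, read, write) sequence, seen twice.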
4. Wisdom Cascade
Confirmed in memory/wisdom_cascade.py.
Knowledge doesn't just live in individual agents; it flows through the hierarchy:
- Downward (M1 + M2): Leadership principles are injected into worker prompts with hop-based compression:
  - 100% fidelity at direct leader
  - 60% one hop up
  - 30% two hops up
- Upward (Bubble Filter): The Bubble Filter evaluates worker discoveries (threshold: score ≥ 7/10) and selectively promotes them to leaders
- Knowledge never leaks sideways; it flows strictly up and down the hierarchy
See Wisdom Cascade.
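The two directions can be sketched as a pair of small functions. Note the assumptions: "fidelity" is modelled here as the share of principles kept (the real cascade may summarise text instead), anything beyond two hops is assumed to be dropped, and the function names and discovery format are hypothetical.

```python
# Percent of a leader's principles kept at each distance from the worker.
HOP_FIDELITY_PERCENT = {0: 100, 1: 60, 2: 30}
BUBBLE_THRESHOLD = 7  # score out of 10


def compress_principles(principles, hops):
    """Keep the leading share of principles allowed at this hop distance."""
    percent = HOP_FIDELITY_PERCENT.get(hops, 0)  # assumed: nothing past 2 hops
    return principles[: len(principles) * percent // 100]


def bubble_up(discoveries):
    """Promote only discoveries scoring at or above the threshold."""
    return [d for d in discoveries if d["score"] >= BUBBLE_THRESHOLD]
```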
Long-Horizon Task Orchestration
Single-session AI agent demos collapse under real-world engineering work. Long-horizon tasks (refactoring an entire module, building a multi-file feature, or executing a 20-step deployment pipeline) create intense pressures around scope, context, and execution that simple chat interactions cannot handle.
Clawpy was architected from the ground up to survive multi-step, multi-hour autonomous execution. The system addresses three foundational pillars, confirmed in docs/agentorc.md and implemented across the orchestration layer:
Pillar 1: Scope Management
Confirmed in orchestration/main_agent.py and core/seed_workspaces.py.
Work is decomposed into small, verifiable units. Each sub-agent receives a narrow, bounded objective with explicit success criteria, eliminating the drift, infinite loops, and speculative rewrites that plague standard agents.
- PRDs and specifications define exactly what an agent is trying to accomplish
- Phase-based plans break complex objectives into isolated, individually verifiable steps
- Per-task workspaces prevent agents from touching unrelated files
- Narrow-goal sub-agents ensure no single agent is overwhelmed with too broad a scope
Pillar 2: Context Engineering
Confirmed in memory/ modules and core/reflection_service.py.
Context isn't just a prompt; it's a structurally engineered environment. Persistent memory, active plans, coding conventions, and past-action summaries are loaded dynamically so agents comprehend the full picture, not just the last message.
- Externalized context: Important knowledge exists as retrievable artifacts (state files, research notes, plans, git history, skills, summaries), not buried in ephemeral chat logs
- Layered memory retrieval: The 7-layer memory stack ensures the right context is loaded at the right time
- Session continuity: State is distributed across the Ticket (managerial), the Workspace (operational), and the Workflow Policy (behavioral), so it survives restarts and handoffs
Pillar 3: Verified Execution
Confirmed in core/heartbeat_monitor.py and orchestration/task_board.py.
Generation is easy; establishing correctness is hard. Clawpy closes the control loop that chat-based agents leave wide open:
- Automated tests, type-checking, and linting validate every generated step before the system advances
- Structured error reporting feeds failures back into the next iteration with full diagnostic context
- Stall detection (HeartbeatMonitor) catches agents that spin without progress and auto-escalates
- Human or AI-agent reviews gate critical transitions via the approval workflow
The Continuous Cycle
The three pillars form a continuous work loop, not a one-shot prompt:
Define Scope → Prepare Context → Execute Work → Verify Result → Update State → Continue or Split
- Scope → Context: Badly scoped tasks create noisy, chaotic context
- Context → Execution: Poor context leads to unreliable action
- Execution → Scope: Every action produces new state; repeated failures signal scope is too broad and trigger automatic decomposition
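The "continue or split" step of the cycle can be sketched as a queue that decomposes any task that keeps failing verification. The function name, the callback signatures, and the limit of two attempts are assumptions for illustration; context preparation and state updates are left out to keep the loop visible.

```python
def run_cycle(task, execute, verify, split, max_failures=2):
    """Work through a queue of tasks, splitting any that keep failing."""
    queue = [task]
    done = []
    while queue:
        current = queue.pop(0)
        for _ in range(max_failures):
            result = execute(current)
            if verify(result):
                done.append(current)  # verified: record and continue
                break
        else:
            # Repeated failures: treat the scope as too broad and decompose.
            queue.extend(split(current))
    return done
```

The loop terminates only if split eventually produces tasks that verify, or returns an empty list for work that cannot be decomposed further.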
Why This Matters
Standard chat-based agents (ChatGPT sessions, basic agent scripts) collapse under long-horizon pressure because they use ephemeral memory: the model forgets what it did 30 steps ago. Clawpy survives because it treats the repository itself as memory infrastructure, persisting state externally, delegating steps sequentially, and verifying them structurally. The human shifts from interactive prompter to strategic workflow manager.
Competitor Comparison
Does OpenClaw Do Any of This?
From our codebase audit: No.
- OpenClaw uses a flat agent model: all agents use the same model (or a user-specified model per agent). No automatic tiered routing by role.
- OpenClaw has no escalation chains: if an agent fails, the user must intervene.
- OpenClaw has no hierarchical delegation: there's no concept of leaders spawning workers or seniors handing tasks to juniors.
- OpenClaw has no CodingLane-style pipelines: no structured senior→junior→auditor flow.
- OpenClaw has no stall detection: no heartbeat monitor to auto-kill stuck agents.
- OpenClaw is either on or off: no tiered autonomy, no per-agent scoping, no preset system.
Full Feature Matrix
| Feature | Clawpy | OpenClaw | Hermes | Agent Zero | Paperclip |
|---|---|---|---|---|---|
| Named autonomy presets | ✅ 4 presets (Safe → Maximum) | ❌ No presets | ❌ No presets | ❌ No presets | ❌ No presets |
| Per-scope autonomy control | ✅ 6 scoping levels | ❌ Global only | ❌ Global only | ❌ N/A | ❌ N/A |
| Automatic tiered model routing | ✅ By archetype (6 tiers) | ⚠️ Manual per-agent | ⚠️ Single model | ❌ Single model | ⚠️ Per-agent config |
| Primary + fallback model chains | ✅ Per archetype | ❌ Single model | ❌ Single model | ❌ Single model | ❌ Single model |
| Hierarchical delegation | ✅ Leaders spawn workers (tier-enforced) | ⚠️ Coordinator pattern, no enforcement | ❌ Single-agent | ⚠️ call_subordinate (basic) | ✅ Org chart with CEO |
| CodingLane pipeline | ✅ Senior→Junior→Auditor | ❌ None | ❌ None | ❌ None | ❌ None |
| Automatic escalation chains | ✅ Per-agent, configurable | ❌ Manual intervention | ❌ No hierarchy | ❌ No hierarchy | ⚠️ Reporting lines |
| Budget enforcement | ✅ Soft/hard thresholds, auto-pause | ⚠️ Guidance only, no enforcement | ❌ None | ❌ None | ✅ Per-agent monthly budget |
| Stall detection & auto-kill | ✅ HeartbeatMonitor (5min threshold) | ❌ None | ❌ None | ❌ None | ⚠️ Heartbeat (scheduled, not real-time) |
| Self-learning (adaptation) | ✅ 15+ event types → 5 candidate types | ❌ None | ⚠️ Basic skill extraction | ❌ None | ❌ None |
| Self-learning (introspection) | ✅ Periodic + auto-skill creation | ❌ None | ⚠️ Learn loop (similar concept) | ❌ None | ❌ None |
| Flow offloading | ✅ Auto-detect → deterministic flows | ❌ None | ❌ None | ❌ None | ❌ None |
| Wisdom cascade (bidirectional) | ✅ Down (hop-compressed) + Bubble Filter up | ❌ None | ❌ None | ❌ None | ⚠️ Goals down, results up |
| Memory architecture | ✅ 7 layers + Ebbinghaus forgetting curve | ⚠️ Markdown + basic recall | ⚠️ SQLite + Markdown | ⚠️ Basic context | ⚠️ Knowledge Graph + Notes |
| Temporal knowledge graph | ✅ Time-decay weighted recall | ❌ None | ❌ None | ❌ None | ⚠️ Basic Knowledge Graph |
| Cryptographic intent binding | ✅ SHA-256 per-request cipher | ❌ None | ❌ None | ❌ None | ❌ None |
| Two-tier security scanner | ✅ Regex hard-block + LLM deep scan | ❌ Pattern blocking only | ⚠️ Dangerous pattern detection | ❌ None | ❌ No agent runtime |
| Dashboard observability | ✅ Full GUI (memory, costs, hierarchy) | ⚠️ Mission Control (logs/flows) | ❌ Terminal only | ⚠️ Web UI (basic) | ✅ React dashboard |
| Butler / personal assistant | ✅ Alfred (relationship-aware, memory-persistent) | ❌ None | ❌ None | ❌ None | ❌ None |
| Dual contact points | ✅ Alfred (ops) + Lucius (strategy) | ❌ Single agent | ❌ Single agent | ❌ Single agent | ⚠️ CEO only (user = Board) |
| Guided onboarding | ✅ Guardian walkthrough + Alfred guidance | ⚠️ Documentation-driven | ❌ CLI setup | ❌ CLI setup | ⚠️ Company setup wizard |
What This Means
Clawpy's competitors offer transparency as an afterthought: logs you can read, files you can inspect. Clawpy offers transparency as architecture:
- 4 named presets with clear, descriptive labels
- 6 scoping levels for granular control
- Per-agent autonomy overrides from the dashboard
- Full graphical observability into memory, costs, hierarchy, and learning
The closest competitor is Paperclip, which shares Clawpy's org-chart philosophy and has strong governance features (budgets, audit trails, heartbeats, React dashboard). However, Paperclip is an orchestration-only layer; it doesn't provide an agent runtime. It wraps Claude Code, OpenClaw, or other runtimes as "employees." This means it has no memory architecture of its own, no security stack, no self-learning, no intent cipher, and no butler. Clawpy is a complete, vertically integrated system: runtime, memory, security, learning, and observability in one package.
The operator can go from "approve everything" to "full mad dog" with a single setting, and scope that trust differently per workspace, per agent, or per task.
Out of the Box
Everything described on this page ships with Clawpy. No plugins to install. No external services to configure. No YAML to write.
| Capability | Status |
|---|---|
| 7-layer memory with Ebbinghaus forgetting curve | Built-in |
| Temporal knowledge graph with time-decay recall | Built-in |
| PARA canonical knowledge manager | Built-in |
| Wisdom cascade with bubble filter | Built-in |
| Archetype registry with tiered model routing | Built-in |
| Primary + fallback model chains per archetype | Built-in |
| Corporate hierarchy with delegation & escalation | Built-in |
| CodingLane (senior→junior→auditor pipeline) | Built-in |
| 4 autonomy presets with 6 scoping levels | Built-in |
| Adaptation engine with 5 candidate types | Built-in |
| Introspection loop with auto-skill creation | Built-in |
| Flow sequence detection and offloading | Built-in |
| Budget enforcement with soft-warn and hard-stop | Built-in |
| HeartbeatMonitor stall detection and auto-kill | Built-in |
| Guardian two-tier security scanner | Built-in |
| Cryptographic intent cipher (SHA-256) | Built-in |
| Docker sandbox isolation (DooD) | Built-in |
| Alfred personal butler | Built-in |
| Auto-reply with confidence-gated escalation | Built-in |
| Dashboard with full observability | Built-in |
| Discord, Telegram, Brave, X integrations | Built-in |