Why Clawpy

Clawpy is not a wrapper around an LLM. It is a deeply engineered cognitive operating system for autonomous agent swarms, with transparency, control, and self-learning built into every layer.

This page explains what makes Clawpy different from other agentic frameworks, and why those differences matter. Every claim below is backed by specific source modules in the codebase.


Intelligent Model Routing

Most frameworks use a single model for everything. Clawpy assigns models automatically by role through the Archetype Registry (core/archetype_registry.py).

Every tier gets a different model. The operator doesn't choose per-agent models; the archetype defines them:

| Tier | Role | Primary Model | Fallback Model | Cost |
|---|---|---|---|---|
| CEO | Strategic decisions | claude-4.6-opus | GPT-5.4 | 💰💰💰 Highest |
| CTO | Tech leadership | GPT-5.4-Codex | claude-4.6-opus | 💰💰💰 High |
| CFO / COO / CMO | Division leadership | MiniMax-M2.7 | claude-4.6-Sonnet | 💰💰 Medium |
| Senior Leads | Department execution | claude-4.6-Sonnet | MiniMax-M2.7 | 💰💰 Medium |
| Workers | Implementation | Moonshot/K2.5 | MiniMax-M2.7 | 💰 Low |
| Junior Workers | Repetitive tasks | claude-4.6-haiku | GPT-5.4-codex-mini | 💰 Lowest |
| Butler (Alfred) | Personal assistant | MiniMax-M2.7-highspeed | Moonshot/K2.5 | 💰 Optimised for speed |

How Resolution Works

The ArchetypeRegistry.resolve() method follows a strict priority chain:

  1. Vault override → STACK_{ARCHETYPE_KEY} environment variable (full flexibility)
  2. CSV/core default → docs/archetypes.csv or hardcoded fallback table
  3. Global fallback → CLAWPY_MODEL or openai/gpt-4o

This means you can override any archetype's model with a single environment variable (STACK_CTO=your-preferred-model) while the system handles everything else automatically. The result: 50–80% cost savings compared to running the same model for every agent.
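The priority chain can be sketched in a few lines. This is a minimal illustration, not the actual ArchetypeRegistry.resolve() implementation: the function name resolve_model, the DEFAULT_MODELS dictionary (standing in for docs/archetypes.csv or the hardcoded table), and the env parameter are all assumptions made for the example.

```python
import os

# Hypothetical stand-in for docs/archetypes.csv or the hardcoded fallback table.
DEFAULT_MODELS = {
    "CEO": "claude-4.6-opus",
    "CTO": "GPT-5.4-Codex",
}

GLOBAL_FALLBACK = "openai/gpt-4o"


def resolve_model(archetype_key, env=None, defaults=DEFAULT_MODELS):
    """Return the model for an archetype using the three-step priority chain."""
    env = os.environ if env is None else env
    key = archetype_key.upper()

    # 1. Vault override: STACK_{ARCHETYPE_KEY} wins if it is set and non-empty.
    override = env.get(f"STACK_{key}")
    if override:
        return override

    # 2. CSV/core default for this archetype.
    if key in defaults:
        return defaults[key]

    # 3. Global fallback: CLAWPY_MODEL, else openai/gpt-4o.
    return env.get("CLAWPY_MODEL") or GLOBAL_FALLBACK
```

Passing env explicitly (for example resolve_model("cto", env={"STACK_CTO": "your-preferred-model"})) keeps the sketch easy to exercise without touching the real process environment.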


Hierarchical Delegation & Escalation

Clawpy agents don't operate in a flat structure. They form a corporate-style hierarchy where work flows down and problems flow up.

Downward: Leaders Delegate to Workers

Confirmed in orchestration/main_agent.py, where the spawn_subagent tool (line 1086) explicitly enforces:

  • Only leaders can spawn: "Workers cannot spawn further agents - only leads can."
  • CPU/RAM resource checks before spawning: the system verifies host resources before allowing a new agent
  • Lead agent owns the spawned worker, giving a clear chain of responsibility

The CEO delegates complex tasks to division heads (CTO, COO, CMO). Division heads spawn worker agents for implementation. This is enforced by the archetype tier system: Worker and Junior Worker tiers are blocked from calling spawn_subagent.

CodingLane: Structured Delegation Pipeline

spawn_coding_lane (line 1032 in main_agent.py) implements a built-in three-stage pipeline:

  1. Senior Architect plans and writes the first pass
  2. Junior Developer iterates on bugs and refines
  3. Auditor reviews the output before returning the result

This is not a prompt hack. It's a real multi-agent pipeline with separate LLM calls at each stage, using the appropriate model tier for each role.
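The shape of such a pipeline, three separate calls with each stage consuming the previous stage's output, can be sketched as follows. This is an illustration only: run_coding_lane, the call_llm callback, and the role keys in models are hypothetical names, and the prompts are placeholders rather than what spawn_coding_lane actually sends.

```python
from typing import Callable

# Hypothetical callback type: takes (model, prompt) and returns the model's text.
LLMCall = Callable[[str, str], str]


def run_coding_lane(task: str, call_llm: LLMCall, models: dict[str, str]) -> dict[str, str]:
    """Run architect -> junior -> auditor as three separate LLM calls."""
    # Stage 1: the senior architect plans and writes the first pass.
    draft = call_llm(models["architect"], f"Plan and write a first pass for: {task}")
    # Stage 2: the junior developer iterates on bugs and refines the draft.
    refined = call_llm(models["junior"], f"Fix bugs and refine this draft:\n{draft}")
    # Stage 3: the auditor reviews the refined output before it is returned.
    review = call_llm(models["auditor"], f"Review this output before release:\n{refined}")
    return {"draft": draft, "refined": refined, "review": review}
```

Because the model for each role is passed in by the caller, each stage can use whatever tier the archetype table assigns to that role.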

Upward: Automatic Escalation

When a worker stalls or fails, the system automatically escalates up the chain. This is confirmed in multiple source modules:

core/heartbeat_monitor.py: If an agent hasn't produced meaningful output in 5+ minutes, the monitor:

  • Force-pauses the agent
  • Moves stalled tasks to dead_letters/
  • Reads the agent's escalation_chain (e.g., ["cto", "ceo", "owner"])
  • Escalates to the next authority in the chain
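A rough sketch of these four steps, assuming a simple in-memory model: the AgentState dataclass, the check_agent function, and the dead_letters list are hypothetical stand-ins for whatever core/heartbeat_monitor.py actually uses, and "next authority" is taken here to mean the first entry of the chain.

```python
import time
from dataclasses import dataclass, field

STALL_SECONDS = 5 * 60  # the 5-minute threshold described above


@dataclass
class AgentState:
    agent_id: str
    last_output_at: float          # timestamp of the last meaningful output
    escalation_chain: list[str]    # e.g. ["cto", "ceo", "owner"]
    paused: bool = False
    tasks: list[str] = field(default_factory=list)


def check_agent(agent, dead_letters, now=None):
    """Pause a stalled agent, dead-letter its tasks, and return who to escalate to."""
    now = time.time() if now is None else now
    if now - agent.last_output_at < STALL_SECONDS:
        return None  # still producing output, nothing to do
    agent.paused = True                  # force-pause the agent
    dead_letters.extend(agent.tasks)     # move stalled tasks aside
    agent.tasks.clear()
    # Assumption: the next authority is the first entry in the chain, if any.
    return agent.escalation_chain[0] if agent.escalation_chain else None
```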

core/auto_reply/engine.py: If the auto-reply engine's confidence score is below the threshold, it:

  • Drafts a reply but does NOT send it
  • Publishes an ACP escalation message to the orchestrator
  • Includes the draft, confidence score, and original payload for human review
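The gating logic amounts to a single comparison. In the sketch below, handle_reply, the returned dictionary shape, and the 0.8 default threshold are assumptions for illustration; the real threshold value and the ACP message format are not specified on this page.

```python
def handle_reply(draft: str, confidence: float, payload: dict, threshold: float = 0.8) -> dict:
    """Send confident replies; escalate the rest with full context for review."""
    if confidence >= threshold:
        return {"action": "send", "reply": draft}
    # Below threshold: the draft stays unsent and is handed over for human review
    # together with the confidence score and the original payload.
    return {
        "action": "escalate",
        "draft": draft,
        "confidence": confidence,
        "original_payload": payload,
    }
```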

core/seed_workspaces.py: Every seeded agent has a configured escalation_chain:

  • CEO escalates to ["owner"]
  • CTO/COO/CMO escalate to ["ceo", "owner"]
  • Workers escalate to ["cto", "ceo", "owner"] or ["lead_id", "ceo", "owner"]
  • Budget overruns > $10/hour: auto-escalate to CEO

core/learning_digest.py: Budget pressure incidents trigger the automatic guidance "escalate ambiguous work to the senior route".

No human babysitting required. Problems find their way to the right level of authority.


Autonomy Control: From Supervised to Full Mad Dog

Clawpy provides a complete autonomy control system (core/autonomy_mode_store.py) with 4 named presets and granular scoping.

The 4 Presets

| Preset | Execution Profile | Governance | Confirmation | What It Means |
|---|---|---|---|---|
| Safe | Off (human review) | Manual | Strict | Human approval for all risky actions |
| Execution Autonomy | Mad Dog 🐕 | Manual | Standard | Agents execute freely, but hiring/skills need approval |
| Governance Autonomy | Off | Auto-approve | Elevated | Skills and hiring auto-approved, but execution gated |
| Maximum Autonomy | Mad Dog 🐕 | Auto-approve | Maximum | Full autonomy, constrained only by hard safety policies |

Scoped at Every Level

The killer feature: you can set different autonomy per scope:

  • Global: default for everything
  • Workspace: per-project override
  • Agent: per-individual agent override
  • Lucius (CEO): special scope for the CEO agent
  • Flow: per-workflow override
  • Task: per-task override

Resolution priority: Task → Flow → Agent → Lucius → Workspace → Global
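The resolution order reads naturally as "most specific scope wins". A minimal sketch, assuming overrides are held in a plain dictionary keyed by scope name; resolve_autonomy, the lowercase scope keys, and the "Safe" default are illustrative choices, not the autonomy_mode_store.py API.

```python
# Most specific scope first, matching the resolution priority above.
SCOPE_PRIORITY = ["task", "flow", "agent", "lucius", "workspace", "global"]


def resolve_autonomy(overrides: dict[str, str], default: str = "Safe") -> str:
    """Return the preset from the most specific scope that sets one."""
    for scope in SCOPE_PRIORITY:
        preset = overrides.get(scope)
        if preset is not None:
            return preset
    # Assumption: fall back to the most restrictive preset when nothing is set.
    return default
```

For example, resolve_autonomy({"global": "Safe", "task": "Maximum Autonomy"}) picks the task-level preset, since Task outranks Global.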

What This Enables

  • Run your CEO in Safe mode (human approval for strategic decisions) while junior workers run in Maximum Autonomy (full mad dog on repetitive tasks)
  • Lock down production workspaces while letting staging run wild
  • Give a specific agent elevated trust for a specific workflow
  • Override everything at the task level for one-off operations

Full Transparency

The operator can:

  • See every decision, every cost, and every memory in the dashboard
  • Override any agent's model, budget, or autonomy level at any time
  • Go from "approve everything" to "let it rip" with a single preset switch
  • Scope that control at any level of granularity: global, workspace, agent, or task

Self-Learning Architecture

Clawpy doesn't just execute. It learns from its own behaviour and evolves autonomously. Four systems work together:

1. Adaptation Engine

Confirmed in core/reflection_service.py and core/learning_digest.py.

Captures 15+ runtime event types (task failures, budget incidents, escalations, human approvals, successful completions) and synthesises them into improvement candidates:

  • Prompt Fragments: Injected into system prompts to improve reasoning
  • Fix Templates: Reusable repair playbooks for recurring failures
  • Routing Hints: Adjustments to how work is assigned across the hierarchy
  • Validator Policy Tweaks: Tighter guardrails or relaxed constraints based on evidence
  • Flow Offloads: Convert repeated patterns into deterministic workflows

Each candidate is scored, corroborated against held-out evidence, and promoted only if it demonstrates statistically significant improvement. See Adaptation Engine for the full pipeline.

2. Introspection Loop

Confirmed in memory/introspection.py.

Every N tool calls, each agent pauses to self-evaluate: What worked? What failed? Are patterns repeating? The introspection loop:

  • Evaluates recent tool-call performance
  • Extracts learnings as structured records
  • In active mode, autonomously creates new skills from detected patterns
  • Feeds learnings back into the Adaptation Engine

This is a self-improving feedback loop: introspection learnings feed the Adaptation Engine, which improves introspection prompts, which produces better introspections. See Introspection Loop.

3. Flow Sequence Detector

Confirmed in core/flow_sequence_detector.py.

Watches for repeated 3+ step tool-call sequences and proposes deterministic flow definitions:

  • Detects repeating sequences using sliding-window pattern analysis
  • Proposes flow definitions that replace LLM reasoning with cheap, predictable execution
  • Estimated savings: 95% per execution on offloaded flows
  • Flows are validated before activation to prevent regressions
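One simple way to detect repeated sequences with a sliding window is to count every window of consecutive tool calls. The sketch below is an assumption about the general technique, not the code in core/flow_sequence_detector.py: find_repeated_sequences and its parameters are hypothetical, it checks a single window length, and overlapping occurrences are counted separately.

```python
from collections import Counter


def find_repeated_sequences(tool_calls, min_len=3, min_repeats=2):
    """Count every window of `min_len` consecutive calls; return those seen often."""
    windows = Counter(
        tuple(tool_calls[i:i + min_len])
        for i in range(len(tool_calls) - min_len + 1)
    )
    # Keep only the sequences that occurred at least `min_repeats` times.
    return {seq: n for seq, n in windows.items() if n >= min_repeats}


calls = ["read", "edit", "test", "commit", "read", "edit", "test", "deploy"]
print(find_repeated_sequences(calls))  # {('read', 'edit', 'test'): 2}
```

A fuller detector would likely scan several window lengths (3, 4, 5, ...) and prefer the longest repeating match before proposing a flow.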

See Flow Sequence Detector.

4. Wisdom Cascade

Confirmed in memory/wisdom_cascade.py.

Knowledge doesn't just live in individual agents. It flows through the hierarchy:

  • Downward (M1 + M2): Leadership principles are injected into worker prompts with hop-based compression:
    • 100% fidelity at direct leader
    • 60% one hop up
    • 30% two hops up
  • Upward (Bubble Filter): The Bubble Filter evaluates worker discoveries (threshold: score ≥ 7/10) and selectively promotes them to leaders
  • Knowledge never leaks sideways; it flows strictly up and down the hierarchy
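The two directions can be sketched with the numbers given above. This is a loose illustration under stated assumptions: "fidelity" is modelled here as the share of a leader's principles that is kept (the real module may summarise rather than truncate), anything beyond two hops is dropped, and compress_principles, bubble_up, and the "score" field are hypothetical names.

```python
# Fidelity by number of hops above the worker's direct leader.
HOP_FIDELITY = {0: 1.0, 1: 0.6, 2: 0.3}
BUBBLE_THRESHOLD = 7  # minimum score out of 10 for upward promotion


def compress_principles(principles, hops):
    """Keep a leading share of principles based on distance from the worker."""
    # Assumption: leaders further than two hops away contribute nothing.
    fidelity = HOP_FIDELITY.get(hops, 0.0)
    keep = round(len(principles) * fidelity)
    return principles[:keep]


def bubble_up(discoveries):
    """Return only the worker discoveries scoring at or above the threshold."""
    return [d for d in discoveries if d["score"] >= BUBBLE_THRESHOLD]
```

With ten principles, a direct leader passes on all ten, a leader one hop up passes on six, and a leader two hops up passes on three.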

See Wisdom Cascade.


Long-Horizon Task Orchestration

Single-session AI agent demos collapse under real-world engineering work. Long-horizon tasks (refactoring an entire module, building a multi-file feature, or executing a 20-step deployment pipeline) create intense pressures around scope, context, and execution that simple chat interactions cannot handle.

Clawpy was architected from the ground up to survive multi-step, multi-hour autonomous execution. The system addresses three foundational pillars, confirmed in docs/agentorc.md and implemented across the orchestration layer:

Pillar 1: Scope Management

Confirmed in orchestration/main_agent.py and core/seed_workspaces.py.

Work is decomposed into small, verifiable units. Each sub-agent receives a narrow, bounded objective with explicit success criteria, eliminating the drift, infinite loops, and speculative rewrites that plague standard agents.

  • PRDs and specifications define exactly what an agent is trying to accomplish
  • Phase-based plans break complex objectives into isolated, individually verifiable steps
  • Per-task workspaces prevent agents from touching unrelated files
  • Narrow-goal sub-agents ensure no single agent is overwhelmed with too broad a scope

Pillar 2: Context Engineering

Confirmed in memory/ modules and core/reflection_service.py.

Context isn't just a prompt. It's a structurally engineered environment. Persistent memory, active plans, coding conventions, and past-action summaries are loaded dynamically so agents comprehend the full picture, not just the last message.

  • Externalized context: Important knowledge exists as retrievable artifacts (state files, research notes, plans, git history, skills, summaries), not buried in ephemeral chat logs
  • Layered memory retrieval: The 7-layer memory stack ensures the right context is loaded at the right time
  • Session continuity: State is distributed across the Ticket (managerial), the Workspace (operational), and the Workflow Policy (behavioral), so it survives restarts and handoffs

Pillar 3: Verified Execution

Confirmed in core/heartbeat_monitor.py and orchestration/task_board.py.

Generation is easy; establishing correctness is hard. Clawpy closes the control loop that chat-based agents leave wide open:

  • Automated tests, type-checking, and linting validate every generated step before the system advances
  • Structured error reporting feeds failures back into the next iteration with full diagnostic context
  • Stall detection (HeartbeatMonitor) catches agents that spin without progress and auto-escalates
  • Human or AI-agent reviews gate critical transitions via the approval workflow

The Continuous Cycle

The three pillars form a continuous work loop, not a one-shot prompt:

Define Scope → Prepare Context → Execute Work → Verify Result → Update State → Continue or Split

  • Scope → Context: Badly scoped tasks create noisy, chaotic context
  • Context → Execution: Poor context leads to unreliable action
  • Execution → Scope: Every action produces new state; repeated failures signal scope is too broad and trigger automatic decomposition
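The "Continue or Split" step is the part that distinguishes this loop from a plain retry. A minimal sketch of that control flow, assuming caller-supplied execute, verify, and split callbacks (all hypothetical, as is the max_failures limit); context preparation and state persistence are left out to keep the example short.

```python
def run_cycle(task, execute, verify, split, max_failures=2):
    """Execute and verify a task; after repeated failures, split it and recurse."""
    failures = 0
    while failures < max_failures:
        result = execute(task)
        if verify(task, result):
            return [result]  # verified: this unit of work is done
        failures += 1

    # Repeated failures suggest the scope is too broad, so decompose the task.
    subtasks = split(task)
    if not subtasks or subtasks == [task]:
        raise RuntimeError(f"cannot decompose further: {task!r}")

    results = []
    for sub in subtasks:
        results.extend(run_cycle(sub, execute, verify, split, max_failures))
    return results
```

The guard against a split that returns nothing new is there so the sketch cannot recurse forever on a task that keeps failing.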

Why This Matters

Standard chat-based agents (ChatGPT sessions, basic agent scripts) collapse under long-horizon pressure because they use ephemeral memory: the model forgets what it did 30 steps ago. Clawpy survives because it treats the repository itself as memory infrastructure, persisting state externally, delegating steps sequentially, and verifying them structurally. The human shifts from interactive prompter to strategic workflow manager.


Competitor Comparison

Does OpenClaw Do Any of This?

From our codebase audit: No.

  • OpenClaw uses a flat agent model: all agents use the same model (or a user-specified model per agent). No automatic tiered routing by role.
  • OpenClaw has no escalation chains: if an agent fails, the user must intervene.
  • OpenClaw has no hierarchical delegation: there's no concept of leaders spawning workers or seniors handing tasks to juniors.
  • OpenClaw has no CodingLane-style pipelines: no structured senior→junior→auditor flow.
  • OpenClaw has no stall detection: no heartbeat monitor to auto-kill stuck agents.
  • OpenClaw is either on or off: no tiered autonomy, no per-agent scoping, no preset system.

Full Feature Matrix

| Feature | Clawpy | OpenClaw | Hermes | Agent Zero | Paperclip |
|---|---|---|---|---|---|
| Named autonomy presets | ✅ 4 presets (Safe → Maximum) | ❌ No presets | ❌ No presets | ❌ No presets | ❌ No presets |
| Per-scope autonomy control | ✅ 6 scoping levels | ❌ Global only | ❌ Global only | ❌ N/A | ❌ N/A |
| Automatic tiered model routing | ✅ By archetype (6 tiers) | ⚠️ Manual per-agent | ⚠️ Single model | ❌ Single model | ⚠️ Per-agent config |
| Primary + fallback model chains | ✅ Per archetype | ❌ Single model | ❌ Single model | ❌ Single model | ❌ Single model |
| Hierarchical delegation | ✅ Leaders spawn workers (tier-enforced) | ⚠️ Coordinator pattern, no enforcement | ❌ Single-agent | ⚠️ call_subordinate (basic) | ✅ Org chart with CEO |
| CodingLane pipeline | ✅ Senior→Junior→Auditor | ❌ None | ❌ None | ❌ None | ❌ None |
| Automatic escalation chains | ✅ Per-agent, configurable | ❌ Manual intervention | ❌ No hierarchy | ❌ No hierarchy | ⚠️ Reporting lines |
| Budget enforcement | ✅ Soft/hard thresholds, auto-pause | ⚠️ Guidance only, no enforcement | ❌ None | ❌ None | ✅ Per-agent monthly budget |
| Stall detection & auto-kill | ✅ HeartbeatMonitor (5 min threshold) | ❌ None | ❌ None | ❌ None | ⚠️ Heartbeat (scheduled, not real-time) |
| Self-learning (adaptation) | ✅ 15+ event types → 5 candidate types | ❌ None | ⚠️ Basic skill extraction | ❌ None | ❌ None |
| Self-learning (introspection) | ✅ Periodic + auto-skill creation | ❌ None | ⚠️ Learn loop (similar concept) | ❌ None | ❌ None |
| Flow offloading | ✅ Auto-detect → deterministic flows | ❌ None | ❌ None | ❌ None | ❌ None |
| Wisdom cascade (bidirectional) | ✅ Down (hop-compressed) + Bubble Filter up | ❌ None | ❌ None | ❌ None | ⚠️ Goals down, results up |
| Memory architecture | ✅ 7 layers + Ebbinghaus forgetting curve | ⚠️ Markdown + basic recall | ⚠️ SQLite + Markdown | ⚠️ Basic context | ⚠️ Knowledge Graph + Notes |
| Temporal knowledge graph | ✅ Time-decay weighted recall | ❌ None | ❌ None | ❌ None | ⚠️ Basic Knowledge Graph |
| Cryptographic intent binding | ✅ SHA-256 per-request cipher | ❌ None | ❌ None | ❌ None | ❌ None |
| Two-tier security scanner | ✅ Regex hard-block + LLM deep scan | ❌ Pattern blocking only | ⚠️ Dangerous pattern detection | ❌ None | ❌ No agent runtime |
| Dashboard observability | ✅ Full GUI (memory, costs, hierarchy) | ⚠️ Mission Control (logs/flows) | ❌ Terminal only | ⚠️ Web UI (basic) | ✅ React dashboard |
| Butler / personal assistant | ✅ Alfred (relationship-aware, memory-persistent) | ❌ None | ❌ None | ❌ None | ❌ None |
| Dual contact points | ✅ Alfred (ops) + Lucius (strategy) | ❌ Single agent | ❌ Single agent | ❌ Single agent | ⚠️ CEO only (user = Board) |
| Guided onboarding | ✅ Guardian walkthrough + Alfred guidance | ⚠️ Documentation-driven | ❌ CLI setup | ❌ CLI setup | ⚠️ Company setup wizard |

What This Means

Clawpy's competitors offer transparency as an afterthought: logs you can read, files you can inspect. Clawpy offers transparency as architecture:

  • 4 named presets with clear, descriptive labels
  • 6 scoping levels for granular control
  • Per-agent autonomy overrides from the dashboard
  • Full graphical observability into memory, costs, hierarchy, and learning

The closest competitor is Paperclip, which shares Clawpy's org-chart philosophy and has strong governance features (budgets, audit trails, heartbeats, React dashboard). However, Paperclip is an orchestration-only layer: it doesn't provide an agent runtime. Instead, it wraps Claude Code, OpenClaw, or other runtimes as "employees." This means it has no memory architecture of its own, no security stack, no self-learning, no intent cipher, and no butler. Clawpy is a complete, vertically integrated system, with runtime, memory, security, learning, and observability in one package.

The operator can go from "approve everything" to "full mad dog" with a single setting, and scope that trust differently per workspace, per agent, or per task.


Out of the Box

Everything described on this page ships with Clawpy. No plugins to install. No external services to configure. No YAML to write.

| Capability | Status |
|---|---|
| 7-layer memory with Ebbinghaus forgetting curve | Built-in |
| Temporal knowledge graph with time-decay recall | Built-in |
| PARA canonical knowledge manager | Built-in |
| Wisdom cascade with bubble filter | Built-in |
| Archetype registry with tiered model routing | Built-in |
| Primary + fallback model chains per archetype | Built-in |
| Corporate hierarchy with delegation & escalation | Built-in |
| CodingLane (senior→junior→auditor pipeline) | Built-in |
| 4 autonomy presets with 6 scoping levels | Built-in |
| Adaptation engine with 5 candidate types | Built-in |
| Introspection loop with auto-skill creation | Built-in |
| Flow sequence detection and offloading | Built-in |
| Budget enforcement with soft-warn and hard-stop | Built-in |
| HeartbeatMonitor stall detection and auto-kill | Built-in |
| Guardian two-tier security scanner | Built-in |
| Cryptographic intent cipher (SHA-256) | Built-in |
| Docker sandbox isolation (DooD) | Built-in |
| Alfred personal butler | Built-in |
| Auto-reply with confidence-gated escalation | Built-in |
| Dashboard with full observability | Built-in |
| Discord, Telegram, Brave, X integrations | Built-in |