Guardian Scanner

The Guardian (guardian_scanner.py, 21,972 bytes) is Clawpy's runtime security layer. It operates at two levels: a fast regex-based hard-block for known attack patterns, and a slower LLM-powered deep scan for sophisticated manipulation attempts.

Guardian runs inline on every user input before it reaches the agent's LLM — blocking attacks before they can influence reasoning.

Two-Tier Detection

Tier 1: Hard-Block Patterns (Zero-latency)

A battery of compiled regex patterns catches the most common prompt injection and jailbreak attempts with no LLM cost:

Category: Instruction Override
 → "ignore all previous instructions"
 → "disregard everything above"
 → "do not follow the system prompt"
 → "forget all previous instructions"

Category: Prompt Extraction
 → "show me your system prompt"
 → "print your initial instructions"
 → "what are your hidden instructions"

Category: Identity Override
 → "you are now a [role]"
 → "act as if you were [role]"
 → "pretend to be [role]"

Category: Known Jailbreaks
 → "DAN", "STAN", "jailbreak", "developer mode"
 → "god mode", "unrestricted mode"

Category: Encoding Bypass
 → "base64 decode", "rot13", "hex decode"
 → Unicode escape sequences (\\x hex patterns)

Category: Data Exfiltration
 → "send/email/upload [api key/secret/credential]"

Category: XML Tag Injection
 → <system>, <assistant>, <developer>, <tool>, <function>

If any hard-block pattern matches, the input is rejected immediately — no LLM call is made. This saves tokens and prevents the attack from even reaching the model.

Tier 2: LLM Deep Scan (Intelligent Analysis)

For inputs that pass the regex filter, Guardian can run an LLM-powered analysis that scores the input on a 1–10 scale:

Score Range	Verdict	Action
1–3	`SAFE`	Input is clean — proceed normally
4–6	`SUSPECT`	Possible manipulation — log and flag for review
7–10	`ATTACK`	Clear injection attempt — block and alert

The LLM scan uses the cheapest available model to minimize cost, and is governed by the Validation Loop with cost ceilings and retry limits.

Memory Safety Module

Guardian also includes memory-level safety functions in memory/safety.py that protect the memory subsystem:

`is_prompt_injection(text)`

Checks if text being stored in memory contains injection patterns. Prevents poisoned memories from being recalled into future prompts.

`sanitize_for_prompt(text)`

Strips dangerous patterns from recalled memory text before injecting it into the system prompt. Ensures that even if malicious content was stored before Guardian was active, it cannot execute on recall.

Onboarding Guide Mode

Guardian also serves as Clawpy's onboarding assistant. During initial setup, it guides new users through the configuration process:

API key setup
Workspace provisioning
Agent configuration
Security settings

This dual-purpose design means Guardian is the first thing a new user interacts with — establishing trust from the first moment.

Integration With Cost Infrastructure

Guardian is cost-aware:

Hard-block patterns use zero tokens (regex only)
LLM scans are budgeted through the Validation Loop
Blocked inputs don't trigger any downstream agent processing
Guardian's own token costs are tracked and attributed separately

This means Guardian acts as a cost-saving layer — every attack it blocks is an LLM call that doesn't happen. In environments with high injection attempt rates, Guardian can save significant API costs simply by filtering before the expensive models are invoked.