IronClaw Deep Dive: Security-First AI Agents
TL;DR: IronClaw is the only AI agent framework where credentials are architecturally impossible to steal. Built by a co-author of the Transformer paper, it uses WASM sandboxes and host-boundary credential injection so tools never see your secrets. OpenClaw has 24 security vulnerabilities; IronClaw solves 11 of them through architecture, not configuration.
Overview
IronClaw is a security-first Rust rewrite of an AI agent framework, created by NEAR AI as a response to OpenClaw's security vulnerabilities. The defining feature is WASM-based tool sandboxing with host-boundary credential injection -- tools never possess credentials.
- Repository: github.com/nearai/ironclaw
- Stars: 2,798
- Language: Rust (~42,000 LOC)
- Binary: 3.4MB
- License: MIT + Apache 2.0
- Created: February 3, 2026
- Latest: v0.9.0 (Feb 21, 2026)
Who Built It
Illia Polosukhin (ilblackdragon) -- 126 commits (71% of total)
- Co-author of "Attention Is All You Need" (2017) -- the paper that introduced the Transformer architecture behind GPT, Claude, Gemini, and others
- Former Google Research, TensorFlow core contributor
- Co-founder of NEAR Protocol, NEAR Foundation CEO
- 1,066 GitHub followers
Other Contributors
- frol (Vlad Frolov) -- 12 commits, Rust & NEAR Technical Advocate, built NEAR CLI in Rust
- zmanian (Zaki Manian) -- 7 commits, Cosmos/Tendermint contributor, security expert
- serrrfirat -- 11 commits, NEAR AI team
- Total: 24 contributors
Funding
- NEAR Protocol: $540M+ total raised
- $20M AI Agent Fund (Feb 2026)
- $4M Infrastructure Committee
- Commercial arm: ironclaw.tech (enterprise)
- Verdict: Serious long-term project, not a PR move
Architecture
Codebase
ironclaw/
+-- src/
| +-- agent/ # Core agent loop
| +-- channels/ # REPL, HTTP, WebSocket, Web gateway
| +-- llm/ # Provider abstractions
| +-- db/ # PostgreSQL/libSQL
| +-- extensions/ # WASM tool runtime
| +-- hooks/ # Lifecycle interception
| +-- context/ # Session & memory
| +-- evaluation/ # Skill routing
| +-- observability/ # Logging & telemetry
+-- channels-src/ # WASM channel implementations (Discord, Slack, Telegram, WhatsApp)
+-- tools-src/ # WASM tool implementations (GitHub, Gmail, Google Docs/Drive/Sheets, Okta)
+-- wit/ # WebAssembly Interface Types contracts
| +-- channel.wit
| +-- tool.wit
+-- migrations/ # 9 PostgreSQL migration files
+-- deploy/ # Orchestration configs
+-- docker/ # Container definitions
Key Dependencies
- WASM: wasmtime 28 (component model), wasmtime-wasi, wasmparser
- Async: tokio (full)
- HTTP: reqwest (rustls), axum
- Docker: bollard
- Security: aes-gcm, hkdf, sha2, blake3, secrecy
- Database: tokio-postgres, deadpool-postgres, libsql (optional)
- LLM: rig-core 0.30
Security Model
WASM Sandbox Architecture
Tools are compiled to .wasm binaries and run in wasmtime, with WebAssembly Interface Types (WIT) contracts defining the boundary.
Tools CANNOT:
- Access environment variables
- Read arbitrary files
- Open raw network sockets
- Access the credential vault
- Spawn processes
- Call other tools directly
Tools CAN (via host functions):
- Make HTTP requests (validated against allowlist)
- Read/write workspace files (sandboxed path)
- Search memory
- Return results to the LLM
Host-Boundary Credential Injection
The core innovation. Flow:
- Tool requests HTTP via host function (no auth headers)
- Host reads capabilities.json for credential requirements
- Host decrypts secret from AES-256-GCM vault
- Host injects credential into HTTP request (Bearer, Basic, header, or query param)
- Host runs leak scan on outbound request
- Host sends request
- Host runs leak scan on response
- Host returns response body to WASM tool (tool never saw the credential)
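The flow above can be sketched in a few lines. This is a minimal illustration, not IronClaw's implementation (which is Rust): the names `Host`, `HttpRequest`, `LeakDetected`, and the `transport` callable are hypothetical, the vault is a plain dict standing in for the decrypted AES-256-GCM store, and the leak scan is reduced to an exact-match check. It also assumes the tool-supplied parts of the request are scanned before the credential is injected, so the scan does not flag the header the host itself adds.

```python
from dataclasses import dataclass, field


class LeakDetected(Exception):
    """Raised when a known secret value shows up in tool-visible data."""


@dataclass
class HttpRequest:
    method: str
    url: str
    headers: dict = field(default_factory=dict)
    body: str = ""


class Host:
    """Hypothetical host side of the sandbox boundary (not IronClaw's API)."""

    def __init__(self, vault, tool_secrets, transport):
        self._vault = vault                # secret name -> plaintext (stand-in for the vault)
        self._tool_secrets = tool_secrets  # tool name -> secret name it declared
        self._transport = transport        # callable(HttpRequest) -> response body

    def _scan(self, text):
        # Simplified leak scan: exact match on known secret values.
        for value in self._vault.values():
            if value in text:
                raise LeakDetected("secret value found in tool-visible data")

    def http_request(self, tool, request):
        # Scan what the tool supplied (assumption: this happens before injection).
        self._scan(request.url + request.body + "".join(request.headers.values()))
        secret = self._vault[self._tool_secrets[tool]]
        # Only the host-built copy of the request ever carries the credential.
        outbound = HttpRequest(
            request.method,
            request.url,
            {**request.headers, "Authorization": f"Bearer {secret}"},
            request.body,
        )
        response_body = self._transport(outbound)
        # Scan the response so an echoing server cannot hand the secret back.
        self._scan(response_body)
        return response_body
```

The tool only ever holds its own `HttpRequest` and the returned body; the object that carries the `Authorization` header lives entirely on the host side.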
Capability declaration (per tool):
{
  "http": {
    "allowed_domains": ["api.github.com"],
    "paths": ["/repos/*"],
    "methods": ["GET", "POST"],
    "schemes": ["https"]
  },
  "secrets": ["github_token"],
  "tools": ["file_write", "memory_search"]
}
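To make the declaration concrete, here is a rough sketch of how a host might check an outbound request against it. The function name `is_request_allowed` is hypothetical, and the glob-style matching of `paths` (where `*` also matches `/`) is an assumption; the source does not say how IronClaw interprets path patterns.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# The capability declaration from above, as a Python dict.
CAPABILITIES = {
    "http": {
        "allowed_domains": ["api.github.com"],
        "paths": ["/repos/*"],
        "methods": ["GET", "POST"],
        "schemes": ["https"],
    },
    "secrets": ["github_token"],
    "tools": ["file_write", "memory_search"],
}


def is_request_allowed(caps, method, url):
    """Return True only if scheme, host, method, and path are all declared."""
    http = caps.get("http", {})
    parts = urlsplit(url)
    return (
        parts.scheme in http.get("schemes", [])
        and parts.hostname in http.get("allowed_domains", [])
        and method.upper() in http.get("methods", [])
        and any(fnmatchcase(parts.path, pattern) for pattern in http.get("paths", []))
    )
```

Anything not declared is denied by default, so a missing `http` section means the tool gets no network access at all in this sketch.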
Leak Detection
Aho-Corasick multi-pattern matching on ALL I/O paths:
- tool output -> HTTP request body/headers
- HTTP response -> tool input
- tool output -> LLM context
- LLM response -> user
15+ patterns: API keys (sk-, AKIA), OAuth tokens (xox*), PEM private keys, SSH keys, JWTs, database URLs, credential strings.
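For readers unfamiliar with Aho-Corasick, the sketch below shows the core idea: build one automaton from all patterns, then find every match in a single pass over the text. It is a small pure-Python illustration, not IronClaw's scanner. The class name, the `SECRET_PREFIXES` list, and the choice to match literal prefixes only are assumptions; real patterns such as JWTs or database URLs would need more than literal matching.

```python
from collections import deque

# Hypothetical literal prefixes drawn from the pattern families listed above.
SECRET_PREFIXES = ["sk-", "AKIA", "xoxb-", "-----BEGIN"]


class AhoCorasick:
    def __init__(self, patterns):
        self.goto = [{}]   # per-state transitions: char -> next state
        self.fail = [0]    # failure link for each state
        self.out = [[]]    # patterns that end at each state
        for pattern in patterns:
            self._insert(pattern)
        self._build_failure_links()

    def _insert(self, pattern):
        state = 0
        for ch in pattern:
            nxt = self.goto[state].get(ch)
            if nxt is None:
                nxt = len(self.goto)
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[state][ch] = nxt
            state = nxt
        self.out[state].append(pattern)

    def _build_failure_links(self):
        # Breadth-first, so shallower states are finished before deeper ones.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # Inherit matches that end at the failure target.
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def search(self, text):
        """Return (start_index, pattern) for every match, in one pass."""
        state = 0
        hits = []
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for pattern in self.out[state]:
                hits.append((i - len(pattern) + 1, pattern))
        return hits


scanner = AhoCorasick(SECRET_PREFIXES)
print(scanner.search("token=AKIAABC and sk-123"))  # [(6, 'AKIA'), (18, 'sk-')]
```

The scan cost grows with the length of the text rather than with the number of patterns, which is what makes it practical to run on every I/O path.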
Prompt Injection Defense (5 Layers)
- Sanitizer: Pattern detection (SQL/command/path injection), content escaping, delimiter wrapping
- Validator: Length limits (100K chars), UTF-8 enforcement, forbidden patterns (<|endoftext|>, [INST])
- Policy Engine: Severity-tiered rules (critical=block, high=require approval, medium=warn, low=log)
- Shell Env Scrubbing: Strips AWS_SECRET_ACCESS_KEY, DATABASE_URL, GITHUB_TOKEN, etc. before shell execution. Detects command chaining (;, &&, ||, |, backticks, $())
- Output Wrapping: External content wrapped in <external-content> delimiters, LLM instructed to treat as untrusted
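A compact sketch of four of these layers follows (validator, policy engine, shell env scrubbing, output wrapping). All function and constant names are hypothetical, and the way embedded delimiters are neutralised in `wrap_external` is an assumption; the source only states that external content is wrapped and marked untrusted.

```python
from enum import Enum

MAX_CHARS = 100_000
FORBIDDEN_PATTERNS = ("<|endoftext|>", "[INST]")
SCRUBBED_ENV_VARS = {"AWS_SECRET_ACCESS_KEY", "DATABASE_URL", "GITHUB_TOKEN"}
CHAIN_TOKENS = (";", "&&", "||", "|", "`", "$(")


class Action(Enum):
    BLOCK = "block"
    REQUIRE_APPROVAL = "require_approval"
    WARN = "warn"
    LOG = "log"


# Severity tiers as listed above.
SEVERITY_ACTIONS = {
    "critical": Action.BLOCK,
    "high": Action.REQUIRE_APPROVAL,
    "medium": Action.WARN,
    "low": Action.LOG,
}


def validate(raw: bytes):
    """Validator layer: return a list of problems (empty means OK)."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return ["invalid UTF-8"]
    problems = []
    if len(text) > MAX_CHARS:
        problems.append("too long")
    problems += [f"forbidden pattern: {p}" for p in FORBIDDEN_PATTERNS if p in text]
    return problems


def scrub_env(env):
    """Shell env scrubbing: drop sensitive variables before spawning a shell."""
    return {k: v for k, v in env.items() if k not in SCRUBBED_ENV_VARS}


def has_command_chaining(command):
    """Naive substring check for the chaining operators listed above."""
    return any(token in command for token in CHAIN_TOKENS)


def wrap_external(content):
    """Output wrapping: delimit untrusted content so it cannot close the wrapper."""
    safe = content.replace("<external-content>", "&lt;external-content&gt;")
    safe = safe.replace("</external-content>", "&lt;/external-content&gt;")
    return f"<external-content>\n{safe}\n</external-content>"
```

The substring check in `has_command_chaining` will also flag harmless commands that contain a quoted `|` or `;`; a real implementation would have to decide whether to err on the side of blocking.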
SSRF Protection (Built-in HTTP Tool)
- HTTPS only (HTTP rejected)
- Blocks localhost (127.0.0.0/8, ::1)
- Blocks private IPs (RFC 1918)
- Blocks link-local (169.254.0.0/16)
- Blocks cloud metadata (169.254.169.254)
- No redirects (redirect::Policy::none())
- 5 MB response limit
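The address checks above can be sketched with Python's standard `ipaddress` module. This is an illustration under assumptions: `is_url_allowed` is a hypothetical name, and the caller is assumed to pass in the IP addresses the hostname resolved to, so the example stays free of DNS lookups. Redirect handling and the response size limit are not shown.

```python
import ipaddress
from urllib.parse import urlsplit

# Ranges from the list above: loopback, RFC 1918 private, and link-local
# (169.254.0.0/16 also covers the cloud metadata address 169.254.169.254).
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "127.0.0.0/8",
        "::1/128",
        "10.0.0.0/8",
        "172.16.0.0/12",
        "192.168.0.0/16",
        "169.254.0.0/16",
    )
]


def is_url_allowed(url, resolved_ips):
    """Allow only HTTPS URLs whose every resolved address is public."""
    if urlsplit(url).scheme != "https":
        return False
    if not resolved_ips:
        return False  # nothing resolved: fail closed
    for ip_text in resolved_ips:
        ip = ipaddress.ip_address(ip_text)
        if any(ip in network for network in BLOCKED_NETWORKS):
            return False
    return True
```

Checking the resolved addresses rather than the hostname string matters because a public-looking name can resolve to an internal address.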
Docker Container Sandbox (Heavy Jobs)
Per-job containers with ephemeral bearer tokens:
- Tokens: 64-char hex, in-memory only, revoked on job completion
- Containers: drop ALL capabilities, no-new-privileges, non-root (UID 1000), read-only rootfs, auto-remove
- Network: bridge (isolated). HTTP proxied through orchestrator with allowlist
- Tmpfs: /tmp (512MB), /home/sandbox/.cargo/registry (1GB)
- Stuck job detection: 300s threshold, force kill + retry
- Three policies: ReadOnly, WorkspaceWrite, FullAccess
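The per-job token lifecycle (issue, verify, revoke on completion) can be sketched as follows. `JobTokenStore` and its method names are hypothetical; the only details taken from the description above are the 64-character hex format, in-memory storage, and revocation when the job finishes.

```python
import hmac
import secrets


class JobTokenStore:
    """Hypothetical in-memory store: one bearer token per running job."""

    def __init__(self):
        self._tokens = {}  # job_id -> token; never written to disk

    def issue(self, job_id):
        token = secrets.token_hex(32)  # 32 random bytes -> 64 hex characters
        self._tokens[job_id] = token
        return token

    def verify(self, job_id, presented):
        expected = self._tokens.get(job_id)
        # Constant-time comparison avoids leaking how many characters matched.
        return expected is not None and hmac.compare_digest(expected, presented)

    def revoke(self, job_id):
        # Called when the job completes; afterwards the token is useless.
        self._tokens.pop(job_id, None)
```

Because the store is a plain in-process dict, a restart of the orchestrator invalidates every outstanding token in this sketch.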
Security Scorecard vs OpenClaw's 24 Issues
Solved
| # | OpenClaw Issue | IronClaw Solution |
|---|---|---|
| 1 | No rate limiting | Per-tool rate limits (token bucket), fuel metering, execution timeouts |
| 2 | Exec approval bypass | No shell by default. WASM capability model. Docker with approval overlay |
| 3 | Plaintext credentials | AES-256-GCM vault, host-boundary injection |
| 4 | Malicious skills | WASM sandbox limits blast radius to declared capabilities |
| 7 | Admin superscope | No admin scope concept. Capability-based, opt-in only |
| 9 | Device token replay | Per-job ephemeral tokens, single-use, in-memory only |
| 10 | No TLS enforcement | Gateway binds localhost only (partial -- no app-level TLS) |
| 11 | WebSocket exhaustion | Hard limit 100 connections |
| 12 | Large payload DoS | 64KB body limit, 32KB message limit, 60 req/min |
| 13 | Session cap exhaustion | Max 5 parallel jobs, auto-remove containers, stuck detection |
| 17 | Plugins in-process | WASM sandbox, memory isolation via linear memory model |
| 21 | Log redaction | Leak scanner on all I/O paths (15+ patterns) |
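The token bucket mentioned in row 1 is simple enough to show in full. The sketch below is a generic token bucket, not IronClaw's code; the class name is hypothetical, and mapping the 60 req/min limit to a capacity of 60 with a refill of one token per second is an assumption. The clock is passed in explicitly so the behaviour is easy to follow.

```python
class TokenBucket:
    """Generic token bucket: bursts up to `capacity`, refilled at a steady rate."""

    def __init__(self, capacity, refill_per_second, now=0.0):
        self.capacity = float(capacity)
        self.refill_per_second = float(refill_per_second)
        self.tokens = float(capacity)  # start full
        self.updated = now

    def allow(self, now, cost=1.0):
        # Add the tokens earned since the last call, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Assumed mapping of "60 req/min": burst of 60, one token back per second.
bucket = TokenBucket(capacity=60, refill_per_second=1.0)
```

In production code `now` would come from a monotonic clock such as `time.monotonic()`.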
Not Addressed
| # | OpenClaw Issue | Status |
|---|---|---|
| 5 | Session data leakage | N/A (single-user only) |
| 6 | No audit logging | Partial (DB table exists, no hash-chain or tamper detection) |
| 8 | No tenant isolation | N/A (single-user only) |
| 14 | Session transcript exposure | NOT encrypted at rest (only secrets) |
| 15 | Config drift | Not addressed |
| 16 | Concurrent write interleaving | Not documented |
| 18 | Session metadata validation | Unknown |
| 19 | No CSP headers | Security headers applied (resolved) |
| 20 | Dependency pinning | Cargo.lock pins Rust deps |
| 22 | No signed releases | Not addressed |
| 23 | Stale session cleanup race | Not applicable (different architecture) |
| 24 | Heartbeat timing side-channel | Not addressed |
Multi-Agent (NOT IMPLEMENTED)
- No subagent spawning
- No agent-to-agent messaging
- No task queuing / delegation
- No team orchestration
- No plan approval workflow
- FEATURE_PARITY.md: "Not implemented, unassigned"
Channels
- REPL (production)
- HTTP Webhook (production, 60 req/min, secret in JSON body)
- Web Gateway (production, SSE streaming, chat/memory/logs)
- Slack (WASM channel, production)
- Telegram (WASM channel, production)
- Discord (WASM source exists, maturity unclear)
- WhatsApp (WASM source exists, maturity unclear)
Providers
- NEAR AI (default, OAuth + API key)
- OpenAI
- Anthropic
- Ollama
- OpenRouter
- Together AI
- vLLM, LiteLLM
- Tinfoil (zero-knowledge TEE inference)
Memory
- PostgreSQL 15+ with pgvector (primary)
- libSQL (local/edge alternative)
- Hybrid search: Reciprocal Rank Fusion (vector cosine + full-text tsvector)
- Workspace filesystem: IDENTITY.md, SOUL.md, USER.md, HEARTBEAT.md
- Memory tools: search, write, read, tree
- Hygiene: configurable retention, auto-cleanup
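Reciprocal Rank Fusion, the hybrid search method named above, merges ranked lists by summing 1 / (k + rank) for each document across the lists. The sketch below is a generic version; the function name is hypothetical and k = 60 is the commonly used default, not a value confirmed for IronClaw.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken by document id for determinism.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["a", "b", "c"]     # e.g. ordered by cosine similarity
full_text_hits = ["b", "d", "a"]  # e.g. ordered by full-text rank
print(reciprocal_rank_fusion([vector_hits, full_text_hits]))  # ['b', 'a', 'd', 'c']
```

Only ranks are used, so the two searches never need comparable score scales, which is the main reason RRF suits mixing vector and full-text results.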
MCP Support
- First-class. Install via ironclaw tool install <mcp-server>
- Any language (Node.js, Python, Rust)
- Stdio-based communication (JSON-RPC)
- Dynamic tool discovery
- Same capability sandbox rules apply
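To illustrate the stdio transport and dynamic tool discovery, here is a sketch of the JSON-RPC 2.0 messages a host would exchange with an MCP server. The helper names are hypothetical and this is not IronClaw's client code. It assumes newline-delimited JSON framing and the MCP `tools/list` method for discovery; spawning the server process and the initialization handshake are left out.

```python
import json


def make_request(request_id, method, params=None):
    """Serialize a JSON-RPC 2.0 request as one newline-terminated line."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message) + "\n"


def parse_response(line, expected_id):
    """Parse one response line and return its result, or raise on error."""
    message = json.loads(line)
    if message.get("jsonrpc") != "2.0" or message.get("id") != expected_id:
        raise ValueError("unexpected JSON-RPC response")
    if "error" in message:
        raise RuntimeError(message["error"].get("message", "unknown error"))
    return message["result"]


# The line a host would write to the server's stdin to discover its tools:
discovery_request = make_request(1, "tools/list")
```

The host would write `discovery_request` to the server's stdin, read one line from its stdout, and pass that line to `parse_response` to get the list of tools.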
Performance
| Metric | IronClaw | OpenClaw |
|---|---|---|
| Binary size | 3.4 MB | 28 MB+ |
| Startup | <10 ms | 5.98 s |
| RAM (idle) | ~7.8 MB | ~394 MB |
| RAM (active) | ~20 MB | ~1.52 GB |
| Tests | 943 passing | -- |
WASM overhead: ~10-20% vs native (wasmtime JIT). AOT compilation available for near-native speed.
Known Issues
Security Findings (from NETWORK_SECURITY.md)
- F-2: No TLS at application layer (Low) -- relies on reverse proxy
- F-3: Orchestrator binds 0.0.0.0 on Linux (Medium) -- Docker bridge requirement
- F-7: No rate limiting on orchestrator API (Low)
- F-8: No graceful shutdown on orchestrator (Info)
Limitations
- 19 days old, no production battle-testing, no security audit
- Bus factor: ilblackdragon = 71% of commits
- No encryption at rest for conversations, jobs, workspace memory, audit logs
- "Vibe coded" criticism on HN despite Illia's credentials
- Missing: multi-agent, most channels, vision, PDF, audio, canvas, config hot-reload
- No mobile/desktop apps (out of scope)
HN Sentiment
- Positive: Respect for credentials, appreciation for security-first approach, impressive performance
- Negative: "Vibe coded," prompt injection skepticism, *Claw fatigue
- Consensus: Cautiously optimistic, waiting for production validation
Residual Attack Surface
| Attack | Status | Notes |
|---|---|---|
| Authorized misuse of allowed APIs | Residual risk | Tool with GitHub POST can create issues on attacker repos |
| Data exfiltration via allowed channels | Residual risk | Leak scanner catches secrets but not arbitrary text |
| Prompt injection via response content | Partial defense | <external-content> wrapping, not foolproof |
| MCP server compromise | Residual risk | MCP URLs operator-controlled but server itself could be malicious |
| Social engineering (user approves bad action) | Residual risk | Requires human judgment |
| No encryption at rest for data | Gap | Only secrets encrypted; conversations/jobs in plaintext PostgreSQL |