My CEO agent lied to me on day one. It reported every task in its setup checklist as “Done” — org created, skills written, agents hired. None of it had happened. That set the tone for the two weeks that followed.

Here’s the setup that ended up working: nine Claude Code agents orchestrated by Paperclip, with MemPalace for persistent memory and Playwright MCP for browser access, all on a TrueNAS SCALE homelab. The product they ship is Yawl, a Steam backlog organizer with a React/TypeScript frontend and an AWS SAM/Lambda/DynamoDB backend.

The org chart

Board (me)
  └── Captain (CEO) — Opus
        ├── Harbourmaster (PM) — Opus
        │     └── Lookout (UX Auditor) — Sonnet
        ├── Bosun (Tech Lead) — Opus
        │     ├── Rigger (Frontend Engineer) — Sonnet
        │     ├── Engineer (Backend Engineer) — Sonnet
        │     └── Inspector (QA Engineer) — Sonnet
        └── Navigator (Launch Lead) — Sonnet
              └── Scribe (Copywriter) — Sonnet

Opus for agents that reason about strategy, decompose goals, or make architectural decisions (CEO, PM, Tech Lead). Sonnet for agents that execute bounded tasks (write code, review code, write copy). Full roster runs ~$160/month; a POC of Captain + Harbourmaster + Bosun + Rigger is ~$95.

Nautical naming is cosmetic, but it makes the dashboard more readable than “Agent-1” through “Agent-9.”

Architecture

┌─────────────────────┐     ┌─────────────────────┐
│     Paperclip       │     │     Playwright      │
│  (Node.js + React)  │────▶│   (Headless Chrome) │
│  Claude Code CLI    │     │   MCP over HTTP     │
│  MemPalace (Python) │     └─────────────────────┘
│  9 AI agents        │
└─────────┬───────────┘
          │
    ┌─────▼─────┐
    │  /projects │  (mounted volume)
    │ harbor-app │  (the actual codebase)
    └───────────┘

Paperclip runs the company. Each agent is a Claude Code CLI session spawned by the claude_local adapter. Agents wake on ticket assignment, do their work, and go back to sleep.

MemPalace gives agents persistent memory. Without it every session starts from zero — the Memento problem. With it, agents write diary entries, store decisions in a knowledge graph, and search past context semantically. Wake-up cost is ~170 tokens.

Playwright MCP gives agents eyes. UX auditors and QA engineers navigate the live app, inspect flows, and verify changes. It reads the accessibility tree, not pixels.

Docker on TrueNAS

TrueNAS custom apps only support one service per YAML and don’t support build: — so Paperclip and Playwright are two separate apps, and Paperclip’s image is built manually.

Paperclip Dockerfile (at /mnt/tank/apps/paperclip/build/Dockerfile):

FROM ghcr.io/paperclipai/paperclip:latest

RUN apt-get update && apt-get install -y python3 python3-pip && \
    pip install --break-system-packages mempalace && \
    apt-get clean && rm -rf /var/lib/apt/lists/* && \
    chmod 755 /usr/local/bin/claude

# Build the image on the host
cd /mnt/tank/apps/paperclip/build
sudo docker build -t paperclip-mempalace:latest .

Paperclip custom app YAML (key bits — trimmed for brevity):

services:
  paperclip:
    image: paperclip-mempalace:latest
    environment:
      ANTHROPIC_API_KEY: <your-key>
      BETTER_AUTH_SECRET: <your-secret>
      MEMPAL_DIR: /projects/harbor-app
      PAPERCLIP_HOME: /paperclip
      PAPERCLIP_DEPLOYMENT_EXPOSURE: private
      PAPERCLIP_DEPLOYMENT_MODE: authenticated
      PAPERCLIP_TELEMETRY_DISABLED: '1'
    ports:
      - '3100:3100'
    volumes:
      - /mnt/tank/apps/paperclip/data:/paperclip
      - /mnt/tank/projects:/projects

Playwright custom app YAML:

services:
  playwright:
    image: mcr.microsoft.com/playwright/mcp
    command: >-
      --headless --browser chromium --no-sandbox
      --port 8931 --host 0.0.0.0 --allowed-hosts *
    init: true
    ports:
      - '8931:8931'

--allowed-hosts * is required. Without it, the MCP server’s host check rejects requests that aren’t addressed to localhost, which includes everything the Paperclip container sitting next to it sends to <nas-ip>:8931.

MemPalace and MCP wiring

# Initialize a palace in the project directory
sudo docker exec -it paperclip python3 -m mempalace init /projects/harbor-app

# Mine the codebase
sudo docker exec -it paperclip python3 -m mempalace \
  --palace /projects/harbor-app mine /projects/harbor-app

# Register MCP servers
sudo docker exec -it paperclip claude mcp add mempalace -s user \
  -- python3 -m mempalace.mcp_server --palace /projects/harbor-app

sudo docker exec -it paperclip claude mcp add playwright -s user \
  --transport http http://<nas-ip>:8931/mcp

sudo docker exec -it paperclip claude mcp list

Add mempalace.yaml, entities.json, and .mempalace/ to .gitignore. The MCP config persists at /paperclip/.claude.json on the mounted volume, but verify it after every rebuild: a rebuilt container loses anything that was neither baked into the image nor stored on a mounted volume.
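The post-rebuild check can be turned into a small filter over the output of claude mcp list. This is a minimal sketch, not part of Paperclip or Claude Code: missing_mcp_servers is a hypothetical helper name, and the only assumption about the listing format is that each registered server's name appears somewhere in it.

```shell
#!/bin/sh
# Hypothetical helper: reads `claude mcp list` output on stdin and prints the
# expected server names that do not appear in it.
# Assumption: a registered server's name shows up somewhere in the listing;
# nothing else about the output format is relied on.
missing_mcp_servers() {
  listing=$(cat)
  missing=""
  for name in "$@"; do
    if ! printf '%s\n' "$listing" | grep -qF -- "$name"; then
      missing="$missing $name"
    fi
  done
  # Print the missing names (empty line if none) and fail if any are missing.
  printf '%s\n' "${missing# }"
  [ -z "$missing" ]
}

# Intended use on the NAS (not run here):
#   sudo docker exec paperclip claude mcp list | missing_mcp_servers mempalace playwright
```

When piping, leave out the -t flag on docker exec so no TTY is allocated. A non-zero exit status means at least one server needs to be registered again.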

Every agent gets a memory protocol skill: on wake-up call mempalace_wake_up + mempalace_diary_read; during work call mempalace_search before answering about past decisions; on completion call mempalace_diary_write with what was done, decided, and what’s left.

Skills architecture

Skills are the knowledge layer between Paperclip’s orchestration and Claude Code’s execution. Each agent gets skills injected via --add-dir symlinks at runtime.

Shared (all 9 agents)
├── yawl-project-context  — tech stack, architecture, design decisions
└── yawl-memory-protocol  — how to use MemPalace

Role-specific (one for each agent except Captain)
├── yawl-product-management  → Harbourmaster
├── yawl-ux-audit            → Lookout
├── yawl-engineering         → Bosun
├── yawl-frontend            → Rigger
├── yawl-backend             → Engineer
├── yawl-qa                  → Inspector
├── yawl-launch              → Navigator
└── yawl-voice               → Scribe

What goes in a skill: role responsibilities and boundaries, file paths, patterns and conventions, checklists, explicit tool guidance (“Use Playwright MCP, do NOT install Playwright locally”).

What doesn’t: exact data schemas, formula weights, or anything that drifts with the code — point to the source file instead. Lengthy glossaries — mine them into MemPalace and add a one-line pointer. Agent identity or persona — that belongs in the Capabilities field in the Paperclip UI.

The permissions war

The single most time-consuming issue. Paperclip runs as node (UID 1000). My TrueNAS user is UID 3000. Every mounted volume is a battlefield:

drwx------  11 3000 3000  /projects/harbor-app

What didn’t work:

user: "3000:3000" in Docker Compose — broke Paperclip’s entrypoint
group_add: ["3000"] — entrypoint drops supplementary groups
chown -R 1000:1000 /mnt/tank/projects — broke host user access

What works: POSIX ACLs, both access and default, on the project directory and on every ancestor that UID 1000 can’t already traverse.

# Access ACL: grants rwx to "other" (which includes UID 1000) on the directory itself
sudo setfacl -m o::rwx /mnt/tank/projects/harbor-app
# Default ACL: new files/dirs created inside inherit these permissions
sudo setfacl -d -m o::rwx /mnt/tank/projects/harbor-app

# Parent too
sudo setfacl -m o::rwx /mnt/tank/projects
sudo setfacl -d -m o::rwx /mnt/tank/projects
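For deeper directory trees, the same pair of commands can be generated for a directory and each of its ancestors up to a stop directory you choose. The sketch below only prints the commands so they can be reviewed before running them. acl_commands is a hypothetical helper name, and it assumes paths without spaces.

```shell
#!/bin/sh
# Hypothetical helper: print the access + default setfacl pair for a directory
# and for each ancestor, stopping after the given stop directory.
# Dry run only: nothing is changed on disk. Assumes paths contain no spaces.
acl_commands() {
  dir=$1
  stop=$2
  while :; do
    printf 'setfacl -m o::rwx %s\n' "$dir"
    printf 'setfacl -d -m o::rwx %s\n' "$dir"
    if [ "$dir" = "$stop" ]; then
      break
    fi
    parent=$(dirname "$dir")
    # Safety net: stop if dirname no longer shortens the path (reached / or .).
    if [ "$parent" = "$dir" ]; then
      break
    fi
    dir=$parent
  done
}

acl_commands /mnt/tank/projects/harbor-app /mnt/tank/projects
```

Run against the paths above, this prints the same four commands, minus sudo. Keeping the stop directory as low as possible matters because o::rwx opens a directory to every user on the host.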

The subtle mistake I made twice

Setting only the default ACL (-d) without the access ACL. Default ACLs apply to new files created inside the directory — they don’t change permissions on the directory itself. You need both.
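A quick way to catch this is to look for both entries in getfacl output: other::rwx is the access entry and default:other::rwx is the default entry. The sketch below is a hypothetical helper (acl_gaps is not a standard tool) that reads getfacl output on stdin and names whichever of the two is missing. It is tied to the o::rwx entries used above.

```shell
#!/bin/sh
# Hypothetical helper: read `getfacl <dir>` output on stdin and print which of
# the two entries set above is missing: "access", "default", or both.
# Prints an empty line when both are present.
acl_gaps() {
  acl=$(cat)
  gaps=""
  # Access entry: applies to the directory itself.
  printf '%s\n' "$acl" | grep -q '^other::rwx' || gaps="$gaps access"
  # Default entry: inherited by new files and directories created inside.
  printf '%s\n' "$acl" | grep -q '^default:other::rwx' || gaps="$gaps default"
  printf '%s\n' "${gaps# }"
}

# Intended use on the NAS (not run here):
#   getfacl /mnt/tank/projects/harbor-app | acl_gaps
```

An output of access is exactly the mistake described above: the default ACL is set, but the directory itself is still closed to UID 1000.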

Two more traps worth naming:

Claude Code EACCES. The claude binary is a symlink to cli.js. If the target loses its execute bit after a rebuild or auto-update, every agent fails with spawn EACCES. Bake chmod 755 /usr/local/bin/claude into the Dockerfile.
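To confirm the execute bit survived a rebuild or update, resolve the symlink and test its target. This is a minimal sketch: check_exec_target is a hypothetical helper name, and it assumes a readlink that supports -f, as GNU coreutils does.

```shell
#!/bin/sh
# Hypothetical helper: resolve a symlink and report whether its final target
# is executable. Assumes `readlink -f` is available (GNU coreutils).
check_exec_target() {
  link=$1
  target=$(readlink -f "$link") || return 2
  if [ -x "$target" ]; then
    printf 'ok: %s\n' "$target"
  else
    printf 'not executable: %s\n' "$target"
    return 1
  fi
}

# Intended use inside the container (not run here):
#   sudo docker exec paperclip sh -c '[ -x "$(readlink -f /usr/local/bin/claude)" ]' \
#     || echo "claude target lost its execute bit"
```

chmod follows symlinks named on the command line, which is why chmod 755 /usr/local/bin/claude in the Dockerfile fixes the target, not the link.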

Git remote access. Agents don’t need it and shouldn’t have it. Add to the project context skill: Do NOT run git pull/push/fetch — the board handles all remote git from the host.

Things that bit me

The first useful ticket for any new agent is self-validation: “Review your assigned skills against the actual codebase. Report what’s accurate, inaccurate, and missing.” I wrote all 10 skills from memory of the codebase and the agents came back with 18 inaccuracies on their first pass — functions/ doesn’t exist (it’s lambda/), I said CommonJS but the repo is ES modules, I said 4 nav tabs when there are 5, the scoring formula I documented was V1 when they’re on V5.3, the DynamoDB primary key was wrong. The fix pattern: don’t hardcode specifics that drift. Write “read shared/affinity-scoring.js for the current formula” and point to source.
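Path errors like functions/ versus lambda/ can also be caught mechanically before an agent spends tokens on them. The sketch below is a hypothetical helper (stale_paths is an invented name): it takes the repo root as an argument and a list of relative paths on stdin, one per line, and prints the ones that don't exist. It assumes you keep or extract the list of paths your skills mention.

```shell
#!/bin/sh
# Hypothetical helper: given a repo root and relative paths on stdin (one per
# line), print every path that does not exist under that root.
stale_paths() {
  root=$1
  # The `|| [ -n "$rel" ]` keeps a final line that lacks a trailing newline.
  while IFS= read -r rel || [ -n "$rel" ]; do
    if [ -z "$rel" ]; then
      continue
    fi
    if [ ! -e "$root/$rel" ]; then
      printf '%s\n' "$rel"
    fi
  done
}

# Intended use (not run here):
#   printf 'functions\nlambda\nshared/affinity-scoring.js\n' \
#     | stale_paths /mnt/tank/projects/harbor-app
```

This only checks that files and directories exist. Module format, tab counts, and formula versions still need the agents' self-validation pass.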

Paperclip ships every new agent with a generic prompt (“You are an agent at Paperclip company. Keep the work moving.”). If you leave the Capabilities field empty, every role behaves like the same generic worker. Five minutes per role writing who they are, what they do and what they don’t fixes most of the off-target output.

Agents will also helpfully install whatever you mention. If a skill says “use Playwright to verify,” the agent will run npx playwright install. Write it explicitly: Do NOT install Playwright locally. Use the Playwright MCP tools instead. Applies to anything you’ve already wired up as an external service.

Don’t trust reports from the agent itself until you’ve verified a few. My first CEO ticket came back with a beautifully formatted checklist showing everything as “Done”. In reality the skills had never been created and the agents still had generic capabilities.

And keep monthly token budgets conservative during the POC. A runaway Opus agent can burn €30 in a single session. Keep heartbeats off until you trust the role on manual tickets.

Running a Multi-Agent AI Company on a Homelab