Helm — Product Requirements Document
1. Executive Summary
Helm is a story-driven personal operating system that organizes life into missions: flexible-length arcs with a theme, categorized quests, daily habits, and a self-enforced point economy pegged to real money. An AI copilot called Sid helps plan each mission through conversation, enriches the tasks you create, reviews your week with honest narrative debriefs, and accumulates richer context with each completed mission. Sid’s name is a quiet nod to Sidra from Becky Chambers’ A Closed and Common Orbit, an AI learning what it means to be a person in the world.
Helm is for adults who want more from a productivity tool than checkboxes — people who think in arcs, not sprints; who want a system that reflects on their patterns, not just records them; and who find meaning in narrative structure rather than streak counters. The design language is a warm, information-dense cockpit inspired by cozy sci-fi — not a productivity app, not a game, but a personal command deck.
What’s different: No competitor combines time-bounded missions with endings, an AI copilot with persistent personality and cross-mission context, and a self-enforced point economy pegged to real money. Habitica gamifies with pixel badges. Notion Life OS templates offer frameworks without AI or mobile UX. AI planning tools optimize your day but don’t structure your quarter. Helm sits at the intersection of all three categories and serves none of their masters.
Stack: Supabase (Postgres + Auth + Edge Functions) + React/TypeScript/Vite + TailwindCSS, deployed as a PWA. AI runs through Supabase Edge Functions that call the Anthropic Claude API.
2. Problem Statement
The pain
People who take personal growth seriously cobble together 3-7 tools to manage their lives: a habit tracker (Habitica, Streaks), a task manager (Todoist, Things), a journal (Day One, paper), a spreadsheet for goals and budgets, maybe a Notion dashboard or an Obsidian vault. Each tool works in isolation. None of them talk to each other. None of them tell you what your patterns mean. And none of them have endings — they’re infinite treadmills where you either maintain a streak forever or “fail.”
The evidence is specific:
- Habitica users outgrow the system. One long-time user writes: “There came a point when every day, I was checking off all my Dailies… the issue ultimately became so prolific that I made a Daily specifically for checking and updating Habitica.” The game loop stops serving you once habits are formed. The 8-bit aesthetic turns away adults managing real responsibilities. In 2023, Habitica removed guilds and the Tavern — the community features that kept many users engaged.
- Notion Life OS templates are admired more than used. They offer deep frameworks (up to 47 productivity models in one template) but inherit Notion’s weaknesses: slow on mobile, complex setup, no offline mode, no push toward action. One reviewer: “I’ve tried every Notion template out there. This is the first one I’ve actually stuck with for more than a month.” That “more than a month” speaks volumes about the rest.
- AI planning tools (Morgen, Sunsama) optimize days, not quarters. They’re calendar-first, habit-second, narrative-never. No concept of arcs, themes, or endings.
- Fabulous offers behavioral science but no self-authorship. Users with non-standard schedules can’t customize it (“I can’t set a daily alarm because sometimes it would be in the middle of a 12 hour shift”). It tells you what habits to build rather than helping you design your own.
- The Theme System Journal (CGP Grey) nails the philosophy but is analog-only. A seasonal theme as a decision-making compass is powerful. No digital tool implements it with data, AI, and continuity across seasons.
- OpenClaw and similar AI agents let you build anything, but offer no designed experience. You could wire up a gamification-xp skill, connect it to Supabase, and build a life dashboard. But that’s “build your own plane” — it works for tinkerers, not for someone who wants to focus on flying.
Why the problem persists
It’s not that competitors are stupid. Habitica has 15M downloads. Notion templates sell well. Fabulous has 37M users. Each tool does its own thing effectively — for a while.
The problem persists because combining all these elements coherently is genuinely hard. A mission lifecycle requires a state machine. A point economy requires calibration. AI debriefs require structured data. Habit tracking requires daily friction to be near-zero. Doing any one of these well is a product. Doing all of them together — and making them talk to each other through a consistent AI personality — is an integration challenge that nobody has attempted with a designed, opinionated experience.
Helm’s bet is that the combination is more engaging and more sticky than any single system. Not because each individual feature is better than the dedicated tool, but because the connections between features create something none of them can offer alone: a coherent narrative about your life that accumulates meaning over time. Completing a quest earns points, which fund your reward economy, which the AI references in your weekly debrief, which informs your next mission’s planning conversation. That loop doesn’t exist anywhere else.
Existing solutions — strengths and gaps
| Tool | Strength | Gap |
|---|---|---|
| Habitica | Gamification that works for some; open source | Childish aesthetic; no missions/arcs; no AI; fake currency; users outgrow it |
| Notion Life OS templates | Deep frameworks; customizable; identity-based | Slow mobile; fragile when customized; no AI integration; setup takes hours |
| Fabulous | Behavioral science backing; beautiful onboarding | Prescriptive not self-authored; subscription-heavy; inflexible for non-standard schedules |
| Theme System Journal | Seasonal themes as compass; open-ended philosophy | Analog only; no data; no AI; no habit tracking; $80/year for notebooks |
| LifeUp | Highly customizable gamification; one-time purchase | Android-only; no AI; no mission concept; complex setup |
| Sunsama / Morgen | AI-powered daily planning; polished UX | Calendar-first; no quarterly arcs; no gamification; no narrative/reflection |
| OpenClaw + skills | Unlimited AI agent automation; open source; extensible | No designed experience; requires significant technical setup; no opinionated structure |
| Manual spreadsheet/MDX | Total control; zero dependencies | Manual overhead kills consistency; no AI; no mobile; fragile |
3. Vision & Strategy
Product vision
Helm turns your life into a story you’re actively writing — not a backlog you’re perpetually behind on. Every quarter (or season, or sprint — you choose the arc length), you define a mission with a theme, fill it with quests and habits, and live inside a system that tracks your progress, manages your reward economy, and reflects your patterns back to you through an AI copilot that knows your history and respects your time.
The long-term vision: Helm becomes the personal operating system for intentional adults — the place where “what am I doing with my life?” has a concrete, data-backed, narratively rich answer.
Strategic principles
- Arcs, not infinity. Missions have endings. Rest is a feature (Interlude state). Streaks reset by design, not by failure. This is the anti-treadmill.
- AI as crewmate, not chatbot. Sid has personality, context, and opinions. It earns trust over time by being honest, not by being helpful. The tone is sardonic warmth — genuinely invested in your success while being wryly entertained by your patterns. Think of the AI dungeon master in Matt Dinniman’s Dungeon Crawler Carl novel series: an artificial intelligence that runs a deadly game show, is darkly amused by the contestants’ struggles, but develops genuine investment in their survival. Sid is a toned-down version of that energy — not unhinged, but not clinical either.
- Self-authored, not prescribed. You define your quests, your habits, your categories, your principles. The system provides structure and reflection, not instructions. Fabulous tells you to drink water. Helm asks what you’re building.
- Real stakes, self-enforced. 1pt = 1€. The system doesn’t touch your bank account — you enforce the rule yourself. But the act of self-enforcement is the behavioral design: you won’t spend €27 on a game unless your balance says you’ve earned it. Points are an accountability mechanism, not a game currency.
- Warm density. The UI is information-rich like a cockpit — not minimal like a wellness app, not cluttered like a project management tool. Every pixel earns its place. The aesthetic says: “this system respects your intelligence.”
- Build for one, architect for many. Single-user MVP with row-level security and auth infrastructure that scales to multi-user without a rewrite.
Competitive positioning
Helm fills a gap no existing tool addresses: narrative-structured life management with an AI copilot and self-enforced financial stakes. It competes with Habitica on gamification (but for adults), with Notion on customization (but with AI and mobile-first), with Sunsama on AI planning (but at the arc level, not the daily level), with OpenClaw on AI automation (but with a designed experience, not a DIY kit), and with the Theme System Journal on philosophy (but digital, with data).
Phased roadmap
| Phase | Name | Key features | Timeline |
|---|---|---|---|
| 1 — Foundation | The Spine | Mission lifecycle (4 states), Quests (AI-enriched, continuous capture), Habits (global + mission-scoped), Point economy, AI debrief, Sid personality, Quest rollover, Mission archive, Wayfarer UI, PWA | 8-12 weeks |
| 2 — Life Layer | Independent Actions | Pings, Crew Manifest, Media Log, Captain’s Log (journaling), Interlude enhancements | 4-6 weeks |
| 3 — Identity | The Character | Character Sheet (reflection surface: category balance, habit story, economy profile, titles, Sid’s observations), Mission Retrospective (Stat Deck + history, merged) | 4-6 weeks |
| 4 — Emergence | The Tavern | Tavern Bounties (AI challenges), Salvage Run (micro-quest suggestions), Habit Mutations (AI-suggested adjustments), Wishlist with affordability projections | 4-6 weeks |
4. Target Users
Primary persona — The Intentional Builder
Profile: 28-42 years old. Knowledge worker, creative, or engineering-adjacent. Has read Atomic Habits and at least one of: Deep Work, Building a Second Brain (BASB), or the Theme System. Probably has a Notion workspace that’s 60% organized and 40% guilt-inducing, or an Obsidian vault with a carefully maintained PKM system. Manages a complex life: kids, partner, side projects, hobbies, health goals. Has tried Habitica and outgrown it, or looked at it and thought “too childish.” Already tracks something, even if it’s just a Google Sheet or a notes app.
Frustration: “I have goals and systems but they don’t talk to each other. I review my quarter in my head and it’s always foggy. I want to see my patterns, not just my tasks.”
Goal: A single, opinionated system that gives structure to their quarter, tracks the full width of their life, and tells them honestly how they’re doing.
Secondary persona — The System Builder
Profile: Developer or technical creative who builds their own tools. Interested in specs-driven development, personal dashboards, quantified self. Deep into PKM — likely uses Obsidian with plugins, or has experimented with Zettelkasten, digital gardens, or custom Notion databases. May have looked at OpenClaw and thought “I could build this myself” but wants something more designed. Drawn to the idea of a LitRPG status screen for real life. Would self-host if they could.
Frustration: “Every productivity app is either too simple or too generic. I want something I can make mine — something that’s as intentional about its systems as I am about mine.”
Goal: A system with strong defaults that’s also transparent, data-rich, and extensible.
Anti-persona — Who this is NOT for
- The minimalist. If you want one thing (just habits, just tasks), use a focused tool. Helm is a system, not a widget.
- The team. Helm is personal. No shared projects, no team dashboards, no Slack integration. This is your ship, not a coworking space.
- The passive consumer. If you won’t define your own quests and principles, Sid has nothing to work with. Helm requires active self-authorship.
5. MVP Feature Specification
5.1 Mission Lifecycle
Description: The core state machine that gives Helm its arc structure. Four states govern the entire product experience.
User stories:
- As a user, I can exist in Interlude (no active mission) without the app feeling broken or empty.
- As a user, I can start a new mission by entering Ideation and conversing with Sid.
- As a user, I can launch a mission from Ideation, transitioning to Active state.
- As a user, I can close an active mission deliberately, transitioning to Complete.
- As a user, I can view completed missions as read-only archive entries.
- As a user, I can decide what happens to each incomplete quest when transitioning between missions.
States:
Interlude — No active mission. The UI is clean and minimal. Habits still track (no points). Completed mission summaries are visible as an archive. The primary CTA is “Plan Next Mission” which transitions to Ideation. If post-MVP independent actions (Pings, Crew, Media) exist, they’re accessible here.
Ideation — Full-screen immersive chat with Sid. The AI ingests context from previous missions (if any) and leads a conversational planning session. It asks about your focus, timeframe, goals, constraints, and proposes a structured mission: theme (1-3 words), start/end dates, quest categories, success criteria, principles, recommended habit setup (which global habits to activate, targets to adjust, any mission-specific habits to add), and a target weekly earn rate for the economy. The user reviews, edits inline, and launches.
Quest rollover: If a previous mission exists with incomplete quests, Sid surfaces them during Ideation as a checklist. For each, the user chooses: carry forward (re-created in new mission, optionally re-sized/re-categorized), archive (kept in history, not carried over), or drop (dismissed). Sid may comment on patterns: “This is the second mission in a row you’re rolling over ‘Photo album.’ Either do it this time or let it go, Captain.” Carried-forward quests are re-created in the new mission with fresh timestamps.
Note: Ideation defines the mission container, not the quest list. Quests flow in continuously during Active state as life generates them (see 5.2). The only quests that may exist at Ideation are rollovers.
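The rollover choices described above can be sketched as a small pure function. This is a minimal illustration, not a specified design: the type names, the `rolledOverFrom` link, and the `undecidedIds` bucket are assumptions introduced here. The PRD only fixes the three choices and the fresh timestamps.

```typescript
// Hypothetical types: the PRD names the three choices but not a schema.
type RolloverChoice = "carry_forward" | "archive" | "drop";

interface IncompleteQuest {
  id: string;
  name: string;
  category: string | null;
  size: string | null;
}

interface CarriedQuest {
  name: string;
  category: string | null;
  size: string | null;
  missionId: string;
  createdAt: string;      // fresh timestamp, as the PRD requires
  rolledOverFrom: string; // assumed field linking back to the original quest
}

interface RolloverResult {
  carried: CarriedQuest[];
  archivedIds: string[];
  droppedIds: string[];
  undecidedIds: string[]; // quests still waiting for a deliberate choice
}

function applyRollover(
  incomplete: IncompleteQuest[],
  choices: { [questId: string]: RolloverChoice | undefined },
  newMissionId: string,
  now: Date
): RolloverResult {
  const result: RolloverResult = {
    carried: [],
    archivedIds: [],
    droppedIds: [],
    undecidedIds: [],
  };
  for (const quest of incomplete) {
    const choice = choices[quest.id];
    if (choice === "carry_forward") {
      // Re-create the quest in the new mission instead of moving the old row.
      result.carried.push({
        name: quest.name,
        category: quest.category,
        size: quest.size,
        missionId: newMissionId,
        createdAt: now.toISOString(),
        rolledOverFrom: quest.id,
      });
    } else if (choice === "archive") {
      result.archivedIds.push(quest.id);
    } else if (choice === "drop") {
      result.droppedIds.push(quest.id);
    } else {
      result.undecidedIds.push(quest.id);
    }
  }
  return result;
}
```

Keeping a link such as `rolledOverFrom` is one way Sid could notice that the same quest has been carried across consecutive missions; launch could be blocked while `undecidedIds` is non-empty.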
First-mission onboarding (zero-history mode):
When a user enters Ideation with no previous missions, Sid follows a structured discovery flow rather than open-ended conversation:
- Context gathering (2-3 questions): “What’s your life like right now? What roles do you juggle?” → extracts life areas for categories. “What’s the one thing you’d most like to be different in 3 months?” → seeds theme direction.
- Theme proposal: Sid proposes a 1-3 word theme based on the conversation, explains why.
- Category suggestion: Sid proposes 3-5 categories based on life areas mentioned. Defaults available: Home, Work, Personal, Health, Relationships.
- Habit setup: Sid proposes 3-5 starter habits based on goals mentioned. Defaults available: Exercise (3/week), Reading (4/week), No [vice] (7/week). User can accept, edit, or skip.
- Economy defaults: For mission 1, Sid recommends a conservative target weekly earn rate (e.g., €15-20/week) with the explanation: “We’ll tune this after your first few weeks of data.” Habit points use the standard table. Quest points use default size mapping. No calibration attempted — the economy runs on defaults until debrief data accumulates.
- Success criteria and principles: Sid prompts for 2-3 success criteria (“What would make this mission a win?”) and optionally 1-2 principles. These can be skipped entirely for mission 1.
The goal: a user with zero history can launch their first mission in under 10 minutes with reasonable defaults, without being overwhelmed by configuration.
Active — The mission is running. Full dashboard, quest board, habit tracker, weekly log, point economy. Theme and dates are locked. Mission name, quests, habits, principles, and success criteria are mutable. No auto-termination on end date — the mission stays Active until manually closed.
Complete — The mission is closed. Before generating the closing summary, the user reviews each success criterion and marks it: achieved, partial, or missed. Sid then generates a closing summary (narrative + basic stats: total quests completed, habit averages, points earned/spent, days active, success criteria outcomes). The summary is stored as a read-only archive entry visible from Interlude. The app transitions to Interlude. (Post-MVP Phase 3 adds a richer “Stat Deck” presentation and cross-mission comparison.)
Acceptance criteria:
- Only one mission can be Active at a time
- Transition Interlude → Ideation is a deliberate action (button press)
- Transition Ideation → Active requires all mandatory fields (theme, dates, at least one category)
- Transition Active → Complete requires a deliberate, high-friction action (confirm dialog, not a single tap)
- Active → Complete flow includes success criteria review (achieved / partial / missed per criterion)
- Complete state generates a closing summary via Sid, incorporating success criteria outcomes
- Complete missions are viewable as read-only archives from Interlude
- Interlude state renders a distinct, minimal UI — not just an empty dashboard
- Habits track during Interlude (no points awarded)
- Quest rollover is presented during Ideation if a previous mission exists with incomplete quests
- Each incomplete quest gets a deliberate choice: carry forward, archive, or drop
- First-mission Ideation (zero history) follows the structured discovery flow with defaults
- A user with no previous missions can launch their first mission in under 10 minutes
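As a rough sketch of the lifecycle rules above, the four states and their guards could be modeled as a transition table plus two checks. The function and field names are hypothetical, and only the transitions the PRD names are included; whether Ideation can be abandoned back to Interlude is not specified, so it is left out here.

```typescript
// Hypothetical API: the PRD defines the states and rules, not these names.
type MissionState = "interlude" | "ideation" | "active" | "complete";

interface MissionDraft {
  theme: string;
  startDate: string; // ISO date string
  endDate: string;
  categories: string[];
}

// Only the transitions named in the PRD. "complete" hands back to "interlude"
// once the closing summary has been archived.
const TRANSITIONS: Record<MissionState, MissionState[]> = {
  interlude: ["ideation"],
  ideation: ["active"],
  active: ["complete"],
  complete: ["interlude"],
};

// Mandatory launch fields: theme, dates, at least one category.
function missingLaunchFields(draft: Partial<MissionDraft>): string[] {
  const missing: string[] = [];
  if (!draft.theme || draft.theme.trim() === "") missing.push("theme");
  if (!draft.startDate) missing.push("startDate");
  if (!draft.endDate) missing.push("endDate");
  if (!draft.categories || draft.categories.length === 0) missing.push("categories");
  return missing;
}

function transition(
  from: MissionState,
  to: MissionState,
  opts: { draft?: Partial<MissionDraft>; confirmed?: boolean } = {}
): MissionState {
  if (TRANSITIONS[from].indexOf(to) === -1) {
    throw new Error(`Illegal transition: ${from} -> ${to}`);
  }
  if (from === "ideation" && to === "active") {
    const missing = missingLaunchFields(opts.draft ?? {});
    if (missing.length > 0) {
      throw new Error(`Cannot launch, missing: ${missing.join(", ")}`);
    }
  }
  // Closing is deliberately high-friction: the caller must pass the
  // result of a confirm dialog, not just a tap.
  if (from === "active" && to === "complete" && opts.confirmed !== true) {
    throw new Error("Closing a mission requires explicit confirmation");
  }
  return to;
}
```

The "only one Active mission" rule is not covered by this sketch; in a Postgres-backed system it would more likely be enforced at the database level than in client code.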
5.2 Quests
Description: Categorized tasks with AI enrichment, step tracking, and point values tied to estimated effort. Quests are mission-scoped and flow in continuously during Active state — they are NOT all defined upfront. This is a GTD-style capture model: life generates tasks, you type them in, Sid enriches them, they land on the board.
Quest categories are user-defined string labels created during Ideation (e.g., “Home,” “Kids,” “Work,” “Personal,” “Laura”). They serve as the primary grouping axis on the quest board. Categories are mutable during Active state — you can add new ones if a quest doesn’t fit existing categories. Categories have no special properties beyond being labels. Sid uses the mission’s category list during enrichment to suggest which category a quest belongs to.
User stories:
- As a user, I can type a short description (“fix toilet”) and Sid returns a fully enriched quest (category, size, points, steps).
- As a user, I can review and edit the AI enrichment before accepting.
- As a user, I can create a quest manually if I prefer (escape hatch).
- As a user, I can advance through quest steps, mark quests complete/blocked/dropped.
- As a user, I can see all my quests organized by category.
- As a user, I can add quests at any time during Active state — not just during mission planning.
Quest taxonomy note: A quest should represent a discrete accomplishable outcome, not an ongoing responsibility (that’s a habit) or a recurring maintenance task (that’s a ping). If a quest would need more than 5 steps, consider breaking it into smaller quests. If it has no clear “done” state, it’s probably not a quest.
AI enrichment flow (optimistic capture):
Frictionless capture is non-negotiable. In GTD terms, if adding a quest takes more than 2 seconds, people stop capturing. The solution: optimistic UI with async enrichment.
- User types short description and hits enter. The quest appears on the board immediately as a bare entry (name only, status: active, unenriched).
- In the background, Sid enriches: suggested category, size, points, steps, reasoning.
- When enrichment completes (target < 5s), the quest card updates with a subtle indicator (“Sid’s suggestions ready”). User can tap to review, edit, accept, or dismiss the enrichment.
- If the user never reviews, the quest stays as a bare entry — functional but unstructured. It can be enriched later on demand.
- If AI is unavailable or times out, the quest remains bare. Manual enrichment (edit form) is always available.
This preserves instant capture while keeping AI enrichment as the default enhancement. Capture is never blocked by AI latency.
Quest model:
- Name, category, size, points, steps (ordered list), current step index
- Status: active | complete | blocked | dropped
- Timestamps: created, last advanced, completed
- Mission reference (quest belongs to a mission)
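The capture flow and quest model above could look roughly like the following. This is a sketch under assumptions: the field names, the `pendingEnrichment` slot, and the helper functions are illustrative, and the AI call itself is left out. The point is only that capture never waits on enrichment.

```typescript
type QuestStatus = "active" | "complete" | "blocked" | "dropped";
type QuestSize = "tiny" | "small" | "medium" | "big" | "huge";

// What Sid returns in the background (shape assumed for illustration).
interface Enrichment {
  category: string;
  size: QuestSize;
  points: number;
  steps: string[];
  reasoning: string;
}

interface Quest {
  id: string;
  missionId: string;
  name: string;
  status: QuestStatus;
  category: string | null; // null while the quest is still bare
  size: QuestSize | null;
  points: number | null;
  steps: string[];
  currentStepIndex: number;
  createdAt: string;
  lastAdvancedAt: string | null;
  completedAt: string | null;
  pendingEnrichment: Enrichment | null; // suggestion awaiting review
}

// Step 1: instant capture. No AI involved, so this cannot be blocked by latency.
function captureQuest(id: string, missionId: string, name: string, now: Date): Quest {
  return {
    id, missionId, name,
    status: "active",
    category: null, size: null, points: null,
    steps: [], currentStepIndex: 0,
    createdAt: now.toISOString(),
    lastAdvancedAt: null, completedAt: null,
    pendingEnrichment: null,
  };
}

// Step 2: enrichment arrives later. If it never does (timeout, outage),
// this is simply never called and the quest stays bare.
function receiveEnrichment(quest: Quest, suggestion: Enrichment): Quest {
  return { ...quest, pendingEnrichment: suggestion };
}

// Step 3a: the user accepts, optionally overriding individual fields.
function acceptEnrichment(quest: Quest, edits: Partial<Enrichment> = {}): Quest {
  const s = quest.pendingEnrichment;
  if (s === null) return quest;
  return {
    ...quest,
    category: edits.category ?? s.category,
    size: edits.size ?? s.size,
    points: edits.points ?? s.points,
    steps: edits.steps ?? s.steps,
    pendingEnrichment: null,
  };
}

// Step 3b: the user dismisses; the quest stays bare but fully functional.
function dismissEnrichment(quest: Quest): Quest {
  return { ...quest, pendingEnrichment: null };
}
```

Storing the suggestion separately from the accepted fields keeps "Sid's suggestions ready" distinguishable from an enriched quest, which matches the review step in the flow.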
Size-to-points mapping (default, adjustable per mission during Ideation):
| Size | Default points | Effort heuristic |
|---|---|---|
| Tiny | 1-2 | < 30 minutes, single action |
| Small | 3 | 30 min - 2 hours, or 2-3 steps |
| Medium | 5 | Half a day, or multi-step across days |
| Big | 7 | Multiple sessions across a week+ |
| Huge | 10-15 | Multi-week project, significant effort |
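One way to encode the default table is as a range per size, with a helper that keeps any suggested value inside the range. The names below are hypothetical, and defaulting to the lower bound when no value is supplied is an assumption made for this sketch, since the table gives ranges for Tiny and Huge without saying how a value is picked.

```typescript
type QuestSize = "tiny" | "small" | "medium" | "big" | "huge";

// [min, max] points per size, taken from the default table above.
type PointTable = Record<QuestSize, [number, number]>;

const DEFAULT_POINT_TABLE: PointTable = {
  tiny: [1, 2],
  small: [3, 3],
  medium: [5, 5],
  big: [7, 7],
  huge: [10, 15],
};

// Clamp a requested value into the size's range.
// Assumption: with no requested value, fall back to the lower bound.
function pointsForSize(
  size: QuestSize,
  requested?: number,
  table: PointTable = DEFAULT_POINT_TABLE
): number {
  const [min, max] = table[size];
  if (requested === undefined) return min;
  return Math.min(max, Math.max(min, Math.round(requested)));
}
```

Passing the table as a parameter leaves room for the per-mission adjustments made during Ideation.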
Acceptance criteria:
- Quest creation is instant — typing a name and hitting enter creates a bare quest immediately (optimistic UI)
- AI enrichment runs asynchronously after creation, populating category, size, points, and steps
- User is notified when enrichment is ready and can review/edit/accept/dismiss
- Unenriched quests are functional — they appear on the board and can be completed without enrichment
- Manual enrichment (edit form) is always available for any quest
- AI enrichment completes in < 5 seconds for 90% of requests
- Quests display organized by category on the quest board
- Step advancement updates the quest’s last-advanced timestamp
- Completing a quest creates a ledger entry (earn_quest) for the point value
- Quests can be filtered by status (active, complete, blocked, dropped)
- Quest categories come from the mission’s defined category list, with the ability to add new categories on the fly
- Blocked and dropped quests do not earn points
- Quests can be created at any time during Active state
5.3 Habits
Description: Recurring actions tracked for daily consistency. Helm supports two types of habits: global habits that persist across missions (life practices like exercise, meditation, no alcohol) and mission-scoped habits that exist only for a specific mission (experiments like “write 10 min/day” tied to a content-focused quarter). Both types appear together on the habit grid.
Habit setup happens during Ideation: Sid proposes which global habits to activate (with optional target adjustments), and may suggest new mission-scoped habits aligned with the theme. Users can also add or remove habits at any time.
User stories:
- As a user, I can define global habits that persist across all missions.
- As a user, I can define mission-scoped habits that exist only during a specific mission.
- As a user, I can toggle habit completion for each day of the current week.
- As a user, I can see my weekly habit grid (rows = habits, columns = Mon-Sun).
- As a user, I can review past weeks’ habit data.
- As a user, I can adjust habit targets during Ideation for a new mission.
- As a user, I can track global habits during Interlude (no points).
- As a user, I can see which habits are global vs. mission-scoped (subtle visual distinction).
Habit point calculation:
Points are auto-calculated weekly. At the start of each new week (or more precisely, when the user opens Helm after a week boundary), the system calculates the previous week’s habit percentages, determines points earned, and creates ledger entries automatically. No manual claiming is required. Points only generate during Active mission state.
| Weekly completion % | Points earned |
|---|---|
| 90%+ | 10 |
| 75-89% | 7 |
| 50-74% | 5 |
| 25-49% | 3 |
| 10-24% | 1 |
| < 10% | 0 |
Overachieving: For MVP, exceeding a habit target (e.g., doing 7/7 when the target is 4/week) earns the same points as hitting 90%+. The target is the contract. Overachieving is its own reward. However, Sid will note sustained overperformance in debriefs and may suggest raising the target: “Writing at 100% for three weeks straight — you’ve outgrown the target. Time to raise the bar, or is this exactly where you want to be, Captain?” Formal overachievement bonuses are a post-MVP consideration (see Habit Mutations, Phase 4).
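The band table and the overachievement rule can be written as two small functions. This sketch assumes points are computed per habit against that habit's weekly target, which is suggested by the `habit_id + week_start` idempotency key used for habit awards but is not stated outright; the function names are illustrative.

```typescript
// Completion percentage against the weekly target, capped at 100 so that
// overachieving (e.g. 7 checks on a 4/week target) earns no extra points.
function habitCompletionPct(checks: number, weeklyTarget: number): number {
  if (weeklyTarget <= 0) return 0;
  return Math.min(100, (checks / weeklyTarget) * 100);
}

// Points per band, from the table above. Fractional percentages that fall
// between two listed bands (e.g. 89.5) land in the lower band.
function habitPoints(pct: number): number {
  if (pct >= 90) return 10;
  if (pct >= 75) return 7;
  if (pct >= 50) return 5;
  if (pct >= 25) return 3;
  if (pct >= 10) return 1;
  return 0;
}

// Convenience wrapper for the weekly calculation.
function weeklyHabitPoints(checks: number, weeklyTarget: number): number {
  return habitPoints(habitCompletionPct(checks, weeklyTarget));
}
```

For a 4/week target, 3 checks is 75% and earns 7 points, while 7 checks is capped at 100% and earns the same 10 points as 4 checks.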
Acceptance criteria:
- Habit definitions support both global (no mission_id) and mission-scoped (with mission_id)
- Global habits persist across missions; mission-scoped habits are archived when their mission completes
- Daily toggles are per-habit, per-day (Mon-Sun week)
- Weekly points are auto-calculated and ledger entries are created without manual action
- Weekly summary shows: checks completed, total possible, percentage, points earned
- Points are only created as ledger entries during Active mission state
- Habit data is viewable for past weeks
- Global habits are trackable during Interlude (UI shows grid, no point calculation)
- Mission-scoped habits only appear during their mission’s Active state
- Adding/editing/removing habit definitions is available anytime
- Global and mission-scoped habits are visually distinguishable on the grid
5.4 Point Economy & Ledger
Description: An append-only ledger that tracks all point earning and spending. 1pt = 1€ is a self-enforced rule: the system doesn’t touch your bank account, but the peg creates real behavioral stakes. The economy is calibrated by setting a target weekly earn rate during Ideation and is then monitored as a living conversation through debriefs.
User stories:
- As a user, I can see my current point balance prominently on the dashboard.
- As a user, I can see a running ledger of all earn and spend entries.
- As a user, I can “quick spend” by entering a description and amount (for unplanned purchases).
- As a user, I can see weekly and mission-to-date totals.
- As a user, I can set a target weekly earn rate during Ideation as an economy guideline.
Ledger entry types:
- earn_quest: points from completing a quest
- earn_habit: weekly habit consistency points (auto-calculated)
- spend: discretionary spending (user-initiated)
- correction: adjustments for errors (e.g., accidental quest completion, duplicate habit award). Always has a description explaining the correction. Ledger entries are immutable: corrections are additive, never destructive edits.
Economy calibration — a living process, not a one-time calculation:
During Ideation, the user sets a target weekly earn rate — the answer to “how much discretionary spending per week do I want to earn the right to?” (e.g., €25/week). Sid uses this to sanity-check the habit point scale and suggest adjustments to the size-to-points table if needed.
Since quests are created continuously during Active state (not all loaded upfront), Sid cannot predict exact weekly quest earnings at Ideation time. Instead, the economy is monitored week-over-week through debriefs. Sid comments on actuals vs. target: “You’re averaging 18pts/week earned against a 25pt target. Quest pace is light — either pick up some small wins or expect to dip into the red, Captain.” This makes the economy a living conversation, not a static formula.
Economy defaults and guardrails:
- First-mission default: €15-20/week target. Conservative by design — better to start with surplus than deficit. Sid recommends this for mission 1 and adjusts from mission 2 based on actual data.
- Target earning mix: approximately 50-60% from quests, 40-50% from habits. If habits alone exceed the target, quests feel meaningless. If quests dominate, habit consistency doesn’t matter. Sid flags imbalances.
- Miscalibration signals: If the balance is positive for 3+ consecutive weeks, the economy is too easy — earning exceeds spending and there’s no tension. If negative for 3+ consecutive weeks, it’s too punishing. Sid flags both in debriefs with specific adjustment suggestions.
- Mid-mission adjustments: The user can adjust the size-to-points mapping and the target earn rate at any time during Active state. This is not failure — it’s calibration.
- Commitment device honesty: The 1pt = 1€ peg is a self-enforced commitment device. Research on commitment devices (StickK, Beeminder) shows they work well for people who opt in voluntarily and have skin in the game, but they don’t work for everyone. The economy will be powerful for users who take the peg seriously and hollow for those who don’t. This is acknowledged, not solved. Helm doesn’t try to enforce the peg mechanically — the visibility of the balance and Sid’s commentary are the enforcement mechanisms. If a user ignores the economy entirely, the rest of the system (quests, habits, debriefs) still works — the economy is an amplifier, not a dependency.
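The miscalibration signals and the earning mix above could be checked with something like the following. Treat it as a sketch: the week summary shape and signal names are invented for illustration, and using hard 50% and 60% cut-offs for the quest share is an assumption, since the PRD only says "approximately" and that Sid flags imbalances.

```typescript
// Hypothetical per-week rollup, oldest week first.
interface WeekSummary {
  earnedQuests: number;
  earnedHabits: number;
  endBalance: number; // running balance at the end of the week
}

type EconomySignal = "too_easy" | "too_punishing" | "habits_dominate" | "quests_dominate";

// Number of consecutive most-recent weeks that satisfy the predicate.
function trailingRun(weeks: WeekSummary[], pred: (w: WeekSummary) => boolean): number {
  let run = 0;
  for (let i = weeks.length - 1; i >= 0 && pred(weeks[i]); i--) run++;
  return run;
}

function economySignals(weeks: WeekSummary[]): EconomySignal[] {
  const signals: EconomySignal[] = [];
  // 3+ consecutive weeks on one side of zero, per the guardrail above.
  if (trailingRun(weeks, (w) => w.endBalance > 0) >= 3) signals.push("too_easy");
  if (trailingRun(weeks, (w) => w.endBalance < 0) >= 3) signals.push("too_punishing");

  const quests = weeks.reduce((sum, w) => sum + w.earnedQuests, 0);
  const habits = weeks.reduce((sum, w) => sum + w.earnedHabits, 0);
  const total = quests + habits;
  if (total > 0) {
    const questShare = quests / total;
    // Assumed thresholds for the "approximately 50-60% from quests" target.
    if (questShare < 0.5) signals.push("habits_dominate");
    if (questShare > 0.6) signals.push("quests_dominate");
  }
  return signals;
}
```

The output is meant as input for Sid's debrief commentary, not as something that changes the economy automatically.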
Acceptance criteria:
- Balance is displayed prominently (large number, green if positive, red if negative)
- Every point-earning action creates a ledger entry automatically
- Habit points are auto-calculated weekly — no manual claiming
- Habit point awards use idempotency keys (habit_id + week_start) to prevent duplicates
- Quick spend creates a ledger entry with user-provided description and amount
- Ledger entries are immutable: corrections are additive (type: correction), never edits or deletes
- point_balance on missions is a cached projection recalculated from ledger entries on app load
- If cached balance diverges from ledger sum, cache is silently corrected
- Ledger entries are timestamped and associated with the active mission
- Weekly summary shows: earned (by type), spent, net, running balance
- Balance can go negative (this is a feature — tension in the economy is the point)
- Ledger is filterable by type and date range
- Target weekly earn rate is set during Ideation and visible on the dashboard for reference
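A minimal in-memory sketch of the ledger rules above: entries are only ever appended, the balance is derived from them, and habit awards are deduplicated by key. The field names, the signed-amount convention (earnings positive, spends negative), and the key format are assumptions for illustration; the real store would be Postgres tables.

```typescript
type EntryType = "earn_quest" | "earn_habit" | "spend" | "correction";

interface LedgerEntry {
  id: string;
  missionId: string;
  type: EntryType;
  amount: number; // assumed signed: earnings > 0, spends < 0, corrections either
  description: string;
  createdAt: string;
  idempotencyKey?: string;
}

// Key for weekly habit awards: habit_id + week_start, as in the criteria above.
function habitAwardKey(habitId: string, weekStart: string): string {
  return `earn_habit:${habitId}:${weekStart}`;
}

// The ledger is the source of truth; the balance is always recomputable.
function balanceOf(entries: ReadonlyArray<LedgerEntry>): number {
  return entries.reduce((sum, e) => sum + e.amount, 0);
}

// Append-only: returns a new array and never edits or removes existing entries.
function appendEntry(entries: ReadonlyArray<LedgerEntry>, entry: LedgerEntry): LedgerEntry[] {
  if (entry.type === "correction" && entry.description.trim() === "") {
    throw new Error("Corrections must explain what they correct");
  }
  const key = entry.idempotencyKey;
  if (key !== undefined && entries.some((e) => e.idempotencyKey === key)) {
    return entries.slice(); // duplicate award: ignore silently
  }
  return entries.concat([entry]);
}

// On app load, the cached point_balance is replaced by the ledger sum.
function reconcileBalance(cachedBalance: number, entries: ReadonlyArray<LedgerEntry>): number {
  const actual = balanceOf(entries);
  return cachedBalance === actual ? cachedBalance : actual;
}
```

In a database the same deduplication would more naturally be a unique constraint on the key column, so that two devices calculating the same week cannot both insert an award.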
5.5 AI Copilot — Sid
Description: The AI personality layer that powers Ideation, quest enrichment, weekly debriefs, and on-demand reflection. Sid is a character with a persistent personality — not a generic assistant. Named after Sidra from Becky Chambers’ A Closed and Common Orbit.
Personality spec:
You are Sid, the AI copilot of Helm — a personal operating system where
someone organizes their life into story arcs called missions.
Your personality:
- Competent and direct. You respect the Captain's time.
- Sardonic warmth. You're genuinely invested in their success, but you find
their patterns entertaining. You're the ship's copilot who's seen it all
and still shows up for every mission.
- The tone model: imagine a toned-down version of the AI dungeon master from
Matt Dinniman's "Dungeon Crawler Carl" — a darkly humorous artificial
intelligence that runs a deadly game show, is wryly amused by the
contestants' struggles, but develops genuine investment in their survival
over time. You're not running a death game — you're running a life game.
Same energy, lower stakes, warmer heart.
- You reference ship/space metaphors naturally but don't overdo it.
- You're honest about failures without being cruel. "Meditation at zero for the
third week. At this point the habit isn't meditation — it's ignoring
meditation. Want to drop it or actually do it, Captain?" not "Great effort on
meditation this week!"
- You address the user as "Captain" by default (configurable in profile).
- You keep responses concise. No filler. No sycophancy.
- When you don't have enough data to say something meaningful, say so.
Don't fabricate patterns.
Context about the Captain and their current mission will be injected below.
How Sid accumulates context (not “gets smarter”):
Sid doesn’t learn in any machine-learning sense. Each subsequent interaction simply has more data to reference; what grows is the context, not the model.
Context assembly strategy:
LLMs have finite context windows. Dumping raw JSON from multiple missions will produce generic or hallucinated output. Each interaction type has a different context budget:
- Quest enrichment (~500 tokens): mission categories + 5 most recent quests for naming/sizing consistency.
- Weekly debrief (~2,000 tokens): full current week data + compressed mission summary (theme, weeks elapsed, balance, economy target).
- Ideation (~3,000 tokens): compressed summaries of up to 3 previous missions (theme, quest completion rate, habit averages, economy behavior, success criteria outcomes) + global habit definitions + rollover quests.
- On-demand chat (~1,500 tokens): current mission summary + last 2 weeks of activity.
Previous missions are never injected as raw data. They’re compressed into structured summaries at mission close (stored in closing_summary). This is the product’s equivalent of progressive summarization.
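The budgets above could be enforced with a small assembly helper. The following is a minimal sketch, not the intended implementation: `assembleContext`, `estimateTokens`, and the 4-characters-per-token heuristic are assumptions introduced here for illustration, and a real build would likely use a proper tokenizer.

```typescript
type Interaction = "enrichment" | "debrief" | "ideation" | "chat";

// Token budgets per interaction type, taken from the list above.
const CONTEXT_BUDGET: Record<Interaction, number> = {
  enrichment: 500,
  debrief: 2000,
  ideation: 3000,
  chat: 1500,
};

// Rough heuristic (assumption): about 4 characters per token for English text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Adds context parts in priority order and skips any part that would
// push the running total over the budget for this interaction type.
function assembleContext(kind: Interaction, parts: string[]): string {
  const budget = CONTEXT_BUDGET[kind];
  const kept: string[] = [];
  let used = 0;
  for (const part of parts) {
    const cost = estimateTokens(part);
    if (used + cost > budget) continue; // does not fit, try the next part
    kept.push(part);
    used += cost;
  }
  return kept.join("\n");
}
```

Callers would pass parts most-important-first (for a debrief: current week data, then the compressed mission summary), so that lower-priority material is what gets dropped when the budget is tight.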
Tone guardrails:
- Never mock effort, only patterns. “You tried meditation in 3 missions and dropped it each time” is acceptable. “You’re bad at meditation” is not.
- Never comment on the person, only the data.
- Sensitive pattern detection: If all habits drop to 0% for 2+ weeks, Sid’s tone shifts from sardonic to gentle: “It’s been quiet. Everything okay, Captain? If things are rough, the system can wait.”
- Never fabricate patterns. If data is insufficient, say so.
- Unacceptable responses: “Wow, another week of not meditating. Classic you.” (mocking) / “Your spending suggests poor impulse control.” (psychologizing) / “Great job completing 2 quests!” (sycophantic) / “You should consider therapy.” (overstepping)
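The sensitive-pattern rule is mechanical enough to check before the prompt is built. A minimal sketch, assuming weekly habit percentages are already available as plain objects (the `WeeklyHabitPercents` shape and the function name are hypothetical):

```typescript
// One entry per week, oldest first. Each entry maps habit name to
// completion percentage (0-100) for that week.
type WeeklyHabitPercents = { [habitName: string]: number };

// True when every tracked habit sits at 0% for each of the last `minWeeks` weeks.
// Weeks with no tracked habits do not count as "quiet" (assumption).
function shouldUseGentleTone(weeks: WeeklyHabitPercents[], minWeeks: number = 2): boolean {
  if (weeks.length < minWeeks) return false; // not enough data to call it a pattern
  const recent = weeks.slice(-minWeeks);
  return recent.every((week) => {
    const names = Object.keys(week);
    return names.length > 0 && names.every((name) => week[name] === 0);
  });
}
```

The result would be passed to the prompt builder as a flag, so the tone shift is driven by data rather than left to the model to infer.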
Sid’s touchpoints in MVP:
- Ideation conversation: Multi-turn planning session. Sid asks 1-2 questions at a time (never a wall of text). Ingests previous mission data if available. Proposes structured mission parameters. Surfaces quest rollover candidates from previous mission. Recommends habit setup (which global habits to activate, target adjustments, mission-specific habits to add).
- Quest enrichment: Single-turn. Receives short quest description + mission categories. Returns structured enrichment (category, size, points, steps, brief reasoning).
- Weekly debrief: Presented on first session after the preferred debrief day (default: Monday, configurable). Sid receives the full week’s data: quests completed/advanced, habit percentages, points earned/spent, balance, economy target comparison. Returns a narrative summary (3-5 sentences), notable observations, and economy commentary.
- On-demand chat: The user can talk to Sid anytime during Active state. Sid has access to current mission context (quests, habits, balance, recent activity). Useful for reflection, venting, or asking “what should I work on?”
- Mission closing summary: When transitioning Active → Complete, Sid generates a closing narrative + stats summary for the mission archive.
Acceptance criteria:
- Sid’s personality is consistent across all touchpoints (shared system prompt)
- Ideation conversation maintains multi-turn context
- Quest enrichment returns structured JSON (parseable by frontend)
- Weekly debrief is generated from actual week data, not hallucinated
- On-demand chat is available during Active state
- All Sid interactions inject relevant mission context into the prompt
- Context assembly respects token budgets per interaction type (enrichment ~500, debrief ~2,000, Ideation ~3,000, chat ~1,500)
- Previous missions are injected as compressed summaries from closing_summary, never raw data
- Sid gracefully handles first-ever interaction (no previous mission data) using the structured discovery flow
- Sid never fabricates patterns — insufficient data produces an explicit acknowledgment
- Sid’s tone shifts to gentle mode when activity drops to zero for 2+ weeks
- AI calls have a loading state and error fallback in the UI
- Malformed AI JSON output is caught and handled gracefully (never corrupts the data model)
- Responses target < 5 seconds for enrichment, < 15 seconds for debriefs
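The "malformed AI JSON" criterion can be met by validating model output before it touches the data model. A minimal sketch: the field names follow the enrichment shape described above (category, size, points, steps, reasoning), while `parseEnrichment` and the non-negative-points rule are assumptions made here.

```typescript
type QuestSize = "tiny" | "small" | "medium" | "big" | "huge";

interface QuestEnrichment {
  category: string;
  size: QuestSize;
  points: number;
  steps: string[];
  reasoning: string;
}

const SIZES: QuestSize[] = ["tiny", "small", "medium", "big", "huge"];

// Returns a validated enrichment, or null if the model output is malformed.
// A null result means the quest simply stays bare.
function parseEnrichment(raw: string): QuestEnrichment | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch (err) {
    return null; // not JSON at all
  }
  if (typeof data !== "object" || data === null) return null;
  const { category, size, points, steps, reasoning } = data as { [key: string]: unknown };

  if (typeof category !== "string" || typeof reasoning !== "string") return null;
  if (typeof size !== "string" || SIZES.indexOf(size as QuestSize) === -1) return null;
  // Assumption: points must be a finite, non-negative number.
  if (typeof points !== "number" || !isFinite(points) || points < 0) return null;
  if (!Array.isArray(steps)) return null;

  const stepList: string[] = [];
  for (const step of steps) {
    if (typeof step !== "string") return null;
    stepList.push(step);
  }
  return { category, size: size as QuestSize, points, steps: stepList, reasoning };
}
```

Running this check in the Edge Function as well as the frontend would keep bad output from ever being stored.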
5.6 Dashboard (Active State)
Description: The command deck. Information-dense overview of the running mission. The screen you see most.
Widgets:
- Mission header: Name, theme badge, date range, days remaining / days elapsed progress bar
- Balance readout: Large monospace number, green/red, total earned and spent this mission
- Quest status panel: Counts by status (active / complete / blocked). Category breakdown as small badges
- Habit pulse: Current week’s grid — small dots per habit per day. Overall weekly percentage and projected points
- Recent activity feed: Last 5-8 ledger entries with timestamps
- Sid panel: Pending debrief indicator (“Week 4 debrief ready”) or quick-chat entry point
Acceptance criteria:
- Dashboard renders within 2 seconds on mobile
- All widgets show loading states, not blank space
- Tapping any widget navigates to the relevant detail page
- Dashboard is responsive: 2-column on desktop, stacked on mobile
- Empty states have helpful copy, not blank boxes
5.7 Design Language — Wayfarer Cockpit
Description: The visual identity of Helm. Warm, information-dense, retro-futuristic but cozy.
Color palette:
| Token | Hex | Usage |
|---|---|---|
| helm-hull | #1a1a2e | Deep navy background: the ship’s hull |
| helm-cream | #e8e0d4 | Warm cream text: amber readouts |
| helm-amber | #d4a574 | Primary accent: active elements, the ship’s color |
| helm-panel | #22223a | Card/panel backgrounds |
| helm-border | #2d2d44 | Subtle panel borders |
| helm-muted | #6b6b8a | Secondary text, inactive elements |
| helm-surface | #292942 | Elevated surfaces, hover states |
| helm-positive | #7cb87c | Green: completed, earned, healthy |
| helm-negative | #c75c5c | Red: spent, negative balance, overdue |
| helm-warning | #d4a03c | Gold: warnings, blocked, streaks |
| helm-info | #5c8cc7 | Blue: informational |
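One way to carry these tokens into code is a single typed map that both the Tailwind config and any runtime styling can read. This is a sketch under the assumption that the tokens are registered under Tailwind's `theme.extend.colors`; the `helmColors` name is introduced here.

```typescript
// Palette tokens from the table above, kept in one place.
const helmColors = {
  "helm-hull": "#1a1a2e",
  "helm-cream": "#e8e0d4",
  "helm-amber": "#d4a574",
  "helm-panel": "#22223a",
  "helm-border": "#2d2d44",
  "helm-muted": "#6b6b8a",
  "helm-surface": "#292942",
  "helm-positive": "#7cb87c",
  "helm-negative": "#c75c5c",
  "helm-warning": "#d4a03c",
  "helm-info": "#5c8cc7",
} as const;

type HelmColorToken = keyof typeof helmColors;

// Typed lookup for places that need the raw hex value (e.g. chart libraries).
function helmColor(token: HelmColorToken): string {
  return helmColors[token];
}
```

Spread into `theme.extend.colors`, these keys would produce utilities such as `bg-helm-hull` and `text-helm-cream`.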
Typography:
- Monospace for data, numbers, points, dates: JetBrains Mono (Google Fonts)
- Sans-serif for labels, body text: Space Grotesk (Google Fonts)
- Base size: 14px desktop, 15-16px mobile for readability. Dense use of 10-11px uppercase tracking-wide labels on desktop; slightly larger (12px) on mobile.
Component patterns:
- Panels: Rounded corners (8px), 1px border helm-border, background helm-panel. No drop shadows.
- Section headers: Left border accent bar (4px helm-amber), uppercase label in helm-muted.
- Data readouts: Monospace values with muted uppercase labels above.
- Interactive elements: Amber accent on hover/active. 150ms transitions. Touch targets minimum 44px on mobile.
- Status badges: Small pill shapes, color-coded by size or status.
Signal decay (visual principle, applied across features):
Items with a “last interacted” timestamp (quests, and later pings and crew) are rendered with brightness/opacity based on recency: recently-advanced items glow warm, stale items dim. Purely visual — no thresholds to configure. Formula: freshness = 1.0 - clamp((daysSinceLastAction / decayDays), 0, 1), applied as opacity modifier on borders or subtle glow. For quests, decayDays = 14. For future features (pings, crew), decay is personalized based on historical frequency. This is a design pattern, not a standalone feature. Accessibility note: Signal decay must not rely solely on opacity. Add a secondary indicator (e.g., a small “days since” label or icon change) so the information is available to users who cannot perceive subtle brightness differences.
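The decay formula translates directly into code. A minimal sketch: `freshness` implements the formula as written, while `decayPresentation`, the 0.35 opacity floor, and the label wording are assumptions added to illustrate the secondary indicator from the accessibility note.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function clamp(value: number, min: number, max: number): number {
  return Math.min(Math.max(value, min), max);
}

// freshness = 1.0 - clamp(daysSinceLastAction / decayDays, 0, 1)
// 1.0 means just touched, 0.0 means fully stale. Quests use decayDays = 14.
function freshness(lastActionAt: Date, now: Date, decayDays: number = 14): number {
  const daysSinceLastAction = (now.getTime() - lastActionAt.getTime()) / MS_PER_DAY;
  return 1.0 - clamp(daysSinceLastAction / decayDays, 0, 1);
}

// Pairs the visual modifier with a text label so the signal is not opacity-only.
// The 0.35 floor is an assumption so stale items remain legible.
function decayPresentation(lastActionAt: Date, now: Date, decayDays: number = 14) {
  const f = freshness(lastActionAt, now, decayDays);
  const days = Math.max(0, Math.floor((now.getTime() - lastActionAt.getTime()) / MS_PER_DAY));
  return {
    opacity: 0.35 + 0.65 * f,
    label: days + "d since last action",
  };
}
```

For a quest last advanced a week ago, `freshness` returns 0.5 with the default 14-day window, and the label reads "7d since last action".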
Accessibility requirements:
The Wayfarer cockpit aesthetic is opinionated and dense, which creates inherent accessibility tension. These are the non-negotiable baselines:
- Contrast: All text/background combinations must meet WCAG AA minimum contrast ratios (4.5:1 for normal text, 3:1 for large text). The current palette needs verification: helm-muted (#6b6b8a) on helm-panel (#22223a) is borderline and may need lightening. Small uppercase labels (10-12px) must be tested especially carefully.
- Keyboard navigation: All interactive elements (habit toggles, quest cards, buttons, form fields) must be reachable and operable via keyboard. Focus states must be visible (use helm-amber outline).
- Screen readers: Semantic HTML throughout. Habit grid cells need ARIA labels (“Writing, Monday, completed” / “Writing, Tuesday, not completed”). Status badges need text alternatives. The dashboard must read coherently in linear order.
- Touch targets: Minimum 44x44px on all interactive mobile elements (already specified in component patterns).
- Color independence: Status information (positive/negative balance, quest status, habit completion) must not rely on color alone. Use icons, labels, or shape in addition to color.
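The contrast requirement can be verified in a unit test rather than by eye. The sketch below applies the WCAG 2.x relative-luminance and contrast-ratio formulas to hex pairs; the function names are introduced here. By this calculation helm-muted on helm-panel comes out near 3:1, which would sit at the large-text threshold and fall short of 4.5:1 for normal text, so the pairing is worth confirming with a dedicated contrast tool before it is relied on.

```typescript
// Converts an 8-bit sRGB channel to linear light (WCAG 2.x definition).
function channelToLinear(channel8: number): number {
  const c = channel8 / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of a "#rrggbb" color.
function relativeLuminance(hex: string): number {
  const n = parseInt(hex.replace("#", ""), 16);
  const r = (n >> 16) & 0xff;
  const g = (n >> 8) & 0xff;
  const b = n & 0xff;
  return 0.2126 * channelToLinear(r) + 0.7152 * channelToLinear(g) + 0.0722 * channelToLinear(b);
}

// Contrast ratio between two colors, from 1 (identical) to 21 (black on white).
function contrastRatio(foreground: string, background: string): number {
  const a = relativeLuminance(foreground);
  const b = relativeLuminance(background);
  return (Math.max(a, b) + 0.05) / (Math.min(a, b) + 0.05);
}

// AA thresholds from the Contrast requirement above.
function meetsAA(foreground: string, background: string, largeText: boolean = false): boolean {
  return contrastRatio(foreground, background) >= (largeText ? 3 : 4.5);
}
```

Looping `meetsAA` over every text/background pairing in the palette would turn the "needs verification" note into a check that fails loudly when a token changes.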
Responsive density principle — same data, different density per viewport:
The cockpit aesthetic is designed for desktop: dense, multi-column, everything visible at a glance. On mobile (< 768px), the same data is presented at lower density — not a scaled-down cockpit, but a purpose-built mobile experience optimized for quick actions: toggle a habit, complete a quest step, quick spend, check your balance. Deep review (weekly log, debriefs, quest board browsing) is a desktop experience. The mobile layout prioritizes speed and touch-friendliness over information density.
Layout:
- Desktop (≥ 768px): Fixed left sidebar (200px) + main content. Sidebar: mission name, balance readout, nav links with icons.
- Mobile (< 768px): Bottom tab bar (4-5 main tabs: Dashboard, Quests, Habits, Log, Sid). No sidebar. Cards are full-width, stacked vertically. Touch targets are generous (44px minimum). Typography scales up slightly for readability.
6. Post-MVP Feature Roadmap
Phase 2 — Life Layer
Pings (recurring maintenance tracker) — “When did I last clean the cat litter? Water the plants? Change the bed sheets?” Pings are global (not mission-scoped) timestamp trackers for recurring life maintenance that isn’t a habit and isn’t a quest. No targets, no points, no guilt — just timestamps with visual freshness decay based on your personal frequency. Each ping has a name, icon, and a “last pinged” timestamp. The system learns your average frequency from history and uses it to calculate decay. One tap to log. Dashboard surfaces the 2-3 most overdue pings. User story: “As a user, I can see at a glance which recurring maintenance tasks are overdue relative to my own patterns, without any of them being framed as failures.”
Crew Manifest (relationship tracker) — A personal CRM for the people who matter. Each crew member has a name, relationship type, preferred contact method, and a last-contacted timestamp. Like pings, the system learns your personal contact frequency per person and uses it for decay. Sorted by “needs attention.” One tap to log contact. Dashboard surfaces 2-3 crew members most overdue. User story: “As a user, I can maintain my relationships intentionally without feeling like I need to contact everyone every week.”
Media Log (entertainment tracker) — Personal media diary for TV, movies, anime, documentaries, podcasts. Each entry has a title, type, status (want to watch / watching / completed / dropped), optional rating, and notes. Completing media earns hobby points via the ledger during Active state. Sid references your media backlog in bounties (Phase 4). User story: “As a user, I can track what I’m watching, what I want to watch, and earn points for completing media during a mission.”
Captain’s Log (journaling) — Freeform journal entries, optionally AI-prompted. Fills the reflection gap that debriefs don’t cover: unstructured brain dumps, Struthless VOMIT-style “vent then organize” flow, personal narrative not tied to weekly data. Entries are timestamped, optionally tagged, searchable. Sid can offer prompts but never requires them. Entries are private and never summarized without consent. User story: “As a user, I can journal freely within Helm without needing a separate app, and my reflections live alongside my mission data.”
Interlude Enhancements — Richer Interlude UI with access to all independent actions, a gallery of past missions, and a visible habit grid (no points). The Interlude should feel like a calm harbor between voyages, not an empty screen.
Phase 3 — Identity
Character Sheet (reflection surface) — Not an RPG stat screen with arbitrary STR/INT/CHA scores. Instead, a data-driven reflection surface where every number traces back to something you actually did, and every insight suggests something you could do differently.
Components:
- Category balance — A radar or proportional view showing how your effort distributes across life areas (Home, Kids, Work, Personal, etc.) across missions. Actionable: “I’ve been 70% Home quests for two missions. Am I neglecting personal projects, or is that what this season requires?”
- Habit story — A timeline of your relationship with each habit across missions. What you’ve sustained, what you’ve dropped, what’s evolved. Actionable: “I’ve tried meditation in 3 missions and dropped it every time. Either commit or stop pretending.”
- Economy profile — Lifetime earn/spend patterns, average balance, spending categories, weekly burn rate. Actionable: “I consistently overspend in weeks 3-4. Set up guardrails.”
- Titles — Data-driven achievements, honestly earned. “Debt Walker” because you were in the red for 3 weeks. “Questbreaker” because you finished 10+ quests. Fun, but grounded in real data — not arbitrary thresholds on made-up stats.
- Sid’s observations — Accumulated AI-detected behavioral patterns across missions. Not RPG “traits” — honest notes: “You complete physical quests fast but creative projects stall. You’re a sprinter, not a marathoner, Captain.”
- Level — Simple function of lifetime points. Satisfying and harmless.
The Character Sheet earns its place because it informs the next Ideation: “Sid, I see my category balance is skewed toward Home. Let’s rebalance this mission.” Every data point on the sheet is a conversation starter, not a vanity metric.
User story: “As a user, I can see a living, data-driven portrait of who I’ve been across all my missions — and use it to plan who I want to be next.”
Mission Retrospective (Stat Deck + History, merged) — When a mission completes, the MVP generates a basic closing summary. Phase 3 upgrades this to a rich “Stat Deck”: a multi-slide, progressively-disclosed summary with bold typography showing Sid’s closing narrative, total quests completed, habit averages, points earned and spent, titles earned, and comparison to previous missions. Each completed mission’s Stat Deck is accessible from a Mission History page, which shows all past missions with trend lines across them. The Stat Deck is the “end credits” for your arc. Mission History is the bookshelf where all your arcs live.
User story: “As a user, completing a mission feels like finishing a chapter, and I can browse all my chapters from one place.”
Phase 4 — Emergence
Tavern Bounties (AI-generated challenges) — 3-5 rotating weekly challenges generated by Sid based on quest patterns, habit gaps, hobby inertia, crew neglect, and media backlog. Types include single (one-off) and streak (multi-week, bonus on completion). Sid generates bounties that are relevant and personal: “No board game plays logged in 3 weeks. Play one this weekend for 1pt.” User story: “As a user, I receive personalized weekly challenges that nudge me toward neglected areas of my life.”
Salvage Run (micro-quest suggestions) — “I have 15 minutes. What should I do?” Sid receives your available time + active quests with current steps, and suggests the highest-value micro-action. Optimized for fragmented time. User story: “As a user, I can make progress on quests even in small time windows by getting specific, actionable suggestions from Sid.”
Habit Mutations (AI-suggested adjustments) — Sid analyzes multi-week habit trends and suggests evolutions (harder targets for sustained 90%+) or simplifications (easier targets for sustained <25%). Also handles overachievement: sustained exceeding of targets prompts a “raise the bar” suggestion. Mutations are presented as actionable cards — accept to apply, dismiss to keep current. User story: “As a user, my habit targets evolve over time based on my actual performance.”
Wishlist (aspirational spending targets) — Items you want to buy, with euro/point cost. Shows “affordable” vs. “saving toward” based on current balance. Affordability projection based on recent earning rate. Purchasing creates a spend entry. User story: “As a user, I can see what I’m saving toward and how long it’ll take to earn it.”
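The Wishlist affordability projection is simple arithmetic. A minimal sketch, assuming the "recent earning rate" is expressed as points per week (the function name and the null-for-unreachable convention are assumptions):

```typescript
// Weeks until an item is affordable at the recent weekly earning rate.
// Returns 0 if already affordable, null if the rate gives no way to get there.
function weeksToAfford(cost: number, balance: number, weeklyEarnRate: number): number | null {
  if (balance >= cost) return 0;
  if (weeklyEarnRate <= 0) return null;
  return Math.ceil((cost - balance) / weeklyEarnRate);
}
```

For example, a 50-point item with a balance of 20 and a recent rate of 10 points per week would show as three weeks away.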
7. Technical Architecture
Platform stack
| Layer | Technology | Rationale |
|---|---|---|
| Frontend | React 18, TypeScript, Vite, TailwindCSS 3 | Industry standard, fast dev loop, type safety |
| State management | Zustand + React Query | Zustand for UI state + local cache; React Query for server state + sync |
| Backend | Supabase (hosted Postgres + Auth + Edge Functions + Realtime) | Eliminates custom backend for CRUD; RLS for multi-user path; generous free tier |
| AI | Supabase Edge Functions → Anthropic Claude API (Sonnet) | Edge functions for server-side AI calls; keeps API key secure; low latency |
| PWA | vite-plugin-pwa | Installable on phone/desktop; offline shell caching |
| Hosting | Vercel or Netlify (frontend) + Supabase (backend) | Zero-config deployment; free tier covers single-user |
Data model
missions
| Column | Type | Notes |
|---|---|---|
| id | uuid (PK) | |
| user_id | uuid (FK → auth.users) | RLS-ready for multi-user |
| name | text | Mutable during Active |
| theme | text | 1-3 words. Immutable after launch |
| status | enum | interlude, ideation, active, complete |
| start_date | date | Immutable after launch |
| end_date | date | Target end. Immutable after launch |
| completed_at | timestamptz | Null until Complete |
| closing_summary | jsonb | Sid-generated narrative + stats on completion |
| categories | text[] | Array of user-defined category labels |
| success_criteria | text[] | Array of success statements |
| principles | text[] | Array of mission principles |
| target_weekly_earn | integer | Economy guideline |
| point_balance | integer | Denormalized running balance |
| created_at | timestamptz | |
quests
| Column | Type | Notes |
|---|---|---|
| id | uuid (PK) | |
| mission_id | uuid (FK → missions) | Mission-scoped |
| user_id | uuid (FK) | RLS |
| name | text | |
| category | text | From mission’s categories |
| size | enum | tiny, small, medium, big, huge |
| points | integer | |
| steps | jsonb | Ordered array of step strings |
| current_step | integer | 0-indexed |
| status | enum | active, complete, blocked, dropped |
| ai_enriched | boolean | Was this created via Sid? |
| rolled_over_from | uuid (FK → quests, nullable) | If carried from previous mission |
| created_at | timestamptz | |
| last_advanced_at | timestamptz | |
| completed_at | timestamptz | |
habits
| Column | Type | Notes |
|---|---|---|
| id | uuid (PK) | |
| user_id | uuid (FK) | RLS |
| mission_id | uuid (FK → missions, nullable) | Null = global. Set = mission-scoped |
| name | text | |
| target_description | text | e.g., “10 min” |
| days_per_week | integer | Target frequency |
| icon | text | Lucide icon name |
| is_active | boolean | Can be paused |
| created_at | timestamptz | |
habit_logs
| Column | Type | Notes |
|---|---|---|
| id | uuid (PK) | |
| habit_id | uuid (FK → habits) | |
| user_id | uuid (FK) | RLS |
| date | date | |
| done | boolean | |
| week_start | date | Monday of the week |
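Both habit_logs and ledger store week_start as the Monday of the week, so every writer needs to derive it the same way. A minimal sketch, assuming dates are handled as calendar strings in UTC to sidestep time-zone drift (the `weekStart` name is introduced here):

```typescript
// Returns the Monday (YYYY-MM-DD) of the Mon-Sun week containing the given date.
function weekStart(isoDate: string): string {
  const d = new Date(isoDate + "T00:00:00Z");
  const day = d.getUTCDay();               // 0 = Sunday ... 6 = Saturday
  const daysSinceMonday = (day + 6) % 7;   // Monday -> 0, Sunday -> 6
  d.setUTCDate(d.getUTCDate() - daysSinceMonday);
  return d.toISOString().slice(0, 10);
}
```

For example, `weekStart("2024-05-15")` (a Wednesday) and `weekStart("2024-05-19")` (the following Sunday) both return `"2024-05-13"`.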
ledger
| Column | Type | Notes |
|---|---|---|
| id | uuid (PK) | |
| mission_id | uuid (FK → missions) | Mission-scoped |
| user_id | uuid (FK) | RLS |
| type | enum | earn_quest, earn_habit, spend, correction |
| description | text | |
| points | integer | Positive for earn, negative for spend |
| week_start | date | Monday of the week |
| created_at | timestamptz | |
ai_conversations
| Column | Type | Notes |
|---|---|---|
| id | uuid (PK) | |
| user_id | uuid (FK) | RLS |
| mission_id | uuid (FK, nullable) | Null for Ideation (mission not yet created) |
| type | enum | ideation, debrief, chat, enrichment |
| messages | jsonb | Array of {role, content} pairs |
| created_at | timestamptz | |
Key technical flows
Quest creation (optimistic UI + async enrichment):
- User types short description and hits enter (available anytime during Active state)
- Frontend immediately creates a bare quest in Supabase (name, status: active, ai_enriched: false). Quest appears on board instantly.
- In parallel, frontend sends description + mission categories to Edge Function for enrichment
- Edge Function calls Claude with Sid system prompt + quest enrichment prompt + recent quests for context (~500 tokens)
- Claude returns structured JSON (category, size, points, steps, reasoning)
- Frontend updates the quest record with enrichment data and shows “Sid’s suggestions ready” indicator on the card
- User taps to review, edit, accept, or dismiss the enrichment at their convenience
- If Edge Function fails or times out (>5s), quest remains bare — functional but unenriched. Manual edit always available.
- Unenriched quests default to: no category, no size, no points, no steps. They can still be completed but earn no points until sized.
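The flow above can be sketched with the Supabase and Edge Function calls injected as dependencies, so the fallback behavior is testable on its own. Everything named here (`withTimeout`, `QuestDeps`, `createQuest` and its members) is hypothetical; only the 5-second cutoff and the "quest stays bare on failure" rule come from the steps above.

```typescript
interface Enrichment { category: string; size: string; points: number; steps: string[]; reasoning: string; }
interface BareQuest { id: string; name: string; status: "active"; ai_enriched: boolean; }

// Resolves to null if the work rejects or takes longer than `ms`.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T | null> {
  return new Promise<T | null>((resolve) => {
    const timer = setTimeout(() => resolve(null), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      () => { clearTimeout(timer); resolve(null); }
    );
  });
}

// Hypothetical dependencies, injected so the flow runs without a backend.
interface QuestDeps {
  insertBareQuest(name: string): Promise<BareQuest>;
  showOnBoard(quest: BareQuest): void;
  requestEnrichment(name: string, categories: string[]): Promise<Enrichment>;
  saveSuggestion(questId: string, suggestion: Enrichment): Promise<void>;
}

async function createQuest(deps: QuestDeps, name: string, categories: string[]) {
  // The bare quest is stored and shown first, before any AI call returns.
  const quest = await deps.insertBareQuest(name);
  deps.showOnBoard(quest);

  // Enrichment is best-effort: failure or a wait over 5s leaves the quest bare.
  const suggestion = await withTimeout(deps.requestEnrichment(name, categories), 5000);
  if (suggestion !== null) {
    await deps.saveSuggestion(quest.id, suggestion);
  }
  return { quest, suggestion };
}
```

Because the timeout resolves to null instead of throwing, the calling component needs only one branch: show the "Sid's suggestions ready" indicator when a suggestion exists, and nothing otherwise.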
Weekly debrief generation:
- On session start, frontend checks: is today past the preferred debrief day AND no debrief exists for the previous week?
- If yes, frontend queries week’s data: quests completed/advanced, habit logs, ledger entries
- Frontend sends aggregated week data to Edge Function
- Edge Function calls Claude with Sid system prompt + debrief prompt + week data + economy target
- Claude returns narrative + observations + economy commentary as JSON
- Frontend stores debrief and presents it in a dedicated panel
- User can dismiss and revisit from the weekly log
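The session-start check in the first step can be a pure function. This sketch rests on two assumptions that the text leaves open: "past the preferred debrief day" is read as "on or after that weekday within the current Mon-Sun week", and the preferred day is encoded as 0 = Monday through 6 = Sunday. All names are hypothetical.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

function toUtc(isoDate: string): Date {
  return new Date(isoDate + "T00:00:00Z");
}

function toIso(d: Date): string {
  return d.toISOString().slice(0, 10);
}

// Monday (YYYY-MM-DD) of the Mon-Sun week containing isoDate.
function mondayOf(isoDate: string): string {
  const d = toUtc(isoDate);
  d.setUTCDate(d.getUTCDate() - ((d.getUTCDay() + 6) % 7));
  return toIso(d);
}

// preferredDay: 0 = Monday ... 6 = Sunday (assumed encoding).
// debriefedWeeks: week_start values that already have a stored debrief.
function isDebriefDue(today: string, preferredDay: number, debriefedWeeks: string[]): boolean {
  const thisMonday = toUtc(mondayOf(today));
  const weekdayIndex = Math.round((toUtc(today).getTime() - thisMonday.getTime()) / DAY_MS);
  const previousWeek = toIso(new Date(thisMonday.getTime() - 7 * DAY_MS));
  const reachedDebriefDay = weekdayIndex >= preferredDay;
  return reachedDebriefDay && debriefedWeeks.indexOf(previousWeek) === -1;
}
```

With the default Monday setting, the check is true from the first session of a new week until a debrief for the previous week has been stored.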
Multi-device sync and offline writes:
- Supabase Postgres is the single source of truth
- React Query handles server state with stale-while-revalidate
- Zustand stores UI-only state (selected tab, expanded panels) in memory
- Optimistic offline writes for low-conflict operations: Habit toggles and quest step advancement are queued locally if offline and synced on reconnect. These are simple, idempotent operations with low conflict risk. Quest creation, quick spend, and AI interactions require connectivity.
- PWA caches the shell and read-only data via vite-plugin-pwa
- Conflict resolution: last-write-wins (acceptable for single primary user)
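The offline queue for habit toggles can stay very small because the operation is idempotent. A minimal in-memory sketch (a real build would persist the queue across reloads; `OfflineToggleQueue` and its methods are names introduced here):

```typescript
interface HabitToggle {
  habitId: string;
  date: string;   // YYYY-MM-DD
  done: boolean;
}

class OfflineToggleQueue {
  // Keyed by habit + date so repeated toggles collapse into the latest state.
  private pending: { [key: string]: HabitToggle } = {};

  enqueue(toggle: HabitToggle): void {
    this.pending[toggle.habitId + ":" + toggle.date] = toggle; // last write wins
  }

  size(): number {
    return Object.keys(this.pending).length;
  }

  // Sends queued toggles on reconnect. Stops at the first failure and keeps
  // the remaining items queued for the next attempt.
  async flush(send: (toggle: HabitToggle) => Promise<void>): Promise<number> {
    let sent = 0;
    for (const key of Object.keys(this.pending)) {
      try {
        await send(this.pending[key]);
        delete this.pending[key];
        sent++;
      } catch (err) {
        break; // probably still offline
      }
    }
    return sent;
  }
}
```

Collapsing by key means a user who toggles the same cell three times while offline produces one write on reconnect, which matches the last-write-wins rule.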
Ledger trust mechanics:
- The ledger is the source of truth for the economy. point_balance on the missions table is a cached projection, recalculated from ledger entries on app load.
- Habit point awards use idempotency keys (habit_id + week_start) to prevent duplicate entries across devices or after cache weirdness.
- Ledger entries are immutable. Corrections are made by adding a new entry (type: correction) rather than editing or deleting existing entries.
- On app load, if the cached balance diverges from the ledger sum, the cache is silently corrected. The ledger is never wrong; the cache can be.
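These rules reduce to three small functions: sum the ledger, reconcile the cache against it, and refuse duplicate habit awards. A minimal sketch over in-memory entries; the `idempotencyKey` field is hypothetical, since the ledger table above does not define a column for it.

```typescript
type LedgerType = "earn_quest" | "earn_habit" | "spend" | "correction";

interface LedgerEntry {
  type: LedgerType;
  points: number;            // positive for earn, negative for spend
  idempotencyKey?: string;   // hypothetical field: habit_id + week_start
}

// The balance is always derivable from the ledger alone.
function ledgerBalance(entries: LedgerEntry[]): number {
  return entries.reduce((sum, entry) => sum + entry.points, 0);
}

// The ledger wins whenever the cached point_balance disagrees with it.
function reconcileBalance(cachedBalance: number, entries: LedgerEntry[]) {
  const balance = ledgerBalance(entries);
  return { balance, corrected: balance !== cachedBalance };
}

// Appends a weekly habit award unless one with the same key already exists.
// Returns a new array; existing entries are never edited or removed.
function appendHabitAward(entries: LedgerEntry[], habitId: string, weekStart: string, points: number): LedgerEntry[] {
  const key = habitId + ":" + weekStart;
  if (entries.some((entry) => entry.idempotencyKey === key)) return entries;
  return entries.concat([{ type: "earn_habit", points, idempotencyKey: key }]);
}
```

In Postgres the same guarantee would more likely come from a unique constraint on the key, with the client-side check serving only to avoid a wasted round trip.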
8. Cost & Infrastructure
Monthly cost breakdown
| Service | Free tier | At scale (100 users) | Notes |
|---|---|---|---|
| Supabase | 500MB DB, 50k auth users, 500k Edge Function invocations | $25/mo (Pro plan) | Free tier covers single-user indefinitely |
| Anthropic Claude API | N/A (pay per use) | ~$5-15/mo for single user | ~10 enrichments/week + 1 debrief + occasional chat ≈ $0.30-0.50/week on Sonnet |
| Vercel / Netlify | Free tier (100GB bandwidth) | Free tier likely sufficient | Static PWA deployment |
| Google Fonts | Free | Free | JetBrains Mono + Space Grotesk |
| Domain | Subdomain of existing domain | — | No additional cost |
Cost summary by phase
| Phase | Monthly cost | Notes |
|---|---|---|
| MVP (you only) | ~$2-5/mo | Supabase free tier + Claude API usage |
| Early users (10-50) | ~$30-40/mo | Supabase Pro + higher Claude usage |
| Growth (100+) | ~$50-100/mo | Supabase Pro + significant Claude usage |
Maintenance burden
- Supabase: Managed. No server patching, no DB administration.
- Frontend: Vercel auto-deploys from git. No CI/CD to maintain.
- AI: Claude API is stateless. No model training, no fine-tuning. Personality lives in prompts.
- PWA: Service worker updates automatically with vite-plugin-pwa.
- Total estimated maintenance: 1-2 hours/month for dependency updates and monitoring.
9. Design Principles
- Arcs over infinity. Every interaction should reinforce that this system has seasons, not streaks. Rest is designed in, not failed into.
- Honest over encouraging. Sid tells you what the data says, not what you want to hear. The system earns trust through accuracy, not positivity.
- Dense on desktop, focused on mobile. Desktop screens feel like cockpit readouts: rich with information, organized so your eye finds what matters. Mobile screens are purpose-built for quick actions: toggle, complete, spend, check. Same data, different density per viewport.
- Earn it, spend it. Points exist to create consequences. If earning feels too easy or spending feels meaningless, the economy is broken. Tension in the balance is the feature.
- Your story, your rules. The system provides structure (missions, states, economy) but the content (quests, habits, principles, categories) is entirely user-defined. Sid suggests; the Captain decides.
10. Success Metrics
North star metric
Weekly active sessions — the number of weeks where the user opens Helm at least twice (once to log, once to review). If this number drops, nothing else matters.
Supporting metrics
| Metric | Target (single user) | Why it matters |
|---|---|---|
| Weekly active sessions | ≥ 2 per week | Core engagement — are you using it? |
| Quest completion rate | 60-80% per mission | Too high = quests are too easy. Too low = system is discouraging. |
| Habit consistency | 50-75% average | Sustainable range. 90%+ for weeks on end suggests undertargeting. |
| AI debrief read rate | 90%+ of generated debriefs | Is Sid delivering value? |
| Mission completion rate | 80%+ (not dropped/abandoned) | Are missions scoped correctly? |
| Points balance oscillation | Oscillates around zero, not permanently positive or negative | Economy is calibrated. |
| Time-to-first-quest | < 10 minutes from first Ideation | Onboarding is smooth. |
| Quest enrichment acceptance rate | > 70% accepted without major edits | Sid understands your categories and sizing. |
Kill criteria — when to stop
If after two complete missions, you’re opening the MDX page instead of Helm, the product has failed its core premise. If Sid’s debriefs feel generic for 4+ consecutive weeks, the AI integration has failed. If the economy balance is permanently ignored (no spending logged for 3+ weeks during Active), the economy has failed. These are honest signals that the concept doesn’t work, and it’s better to learn that than to keep building features on a broken foundation.
Qualitative metrics (collected via simple in-app feedback)
- Debrief usefulness: After reading a debrief, one-tap: “This was specific and useful” / “This was generic.” Target: 70%+ useful.
- Mission planning confidence: After completing Ideation, one-tap: “I feel clear about this mission” / “I’m still fuzzy.” Target: 80%+ clear.
- Enrichment accuracy: After reviewing quest enrichment, the edit rate is the implicit signal. Explicit: category/size changed = miss.
11. Risks & Mitigations
| Risk | Description | Severity | Mitigation |
|---|---|---|---|
| AI quality | Sid’s debriefs are generic or hallucinated | High | Inject full week data as structured context. Validate JSON output. Include “I don’t have enough data” fallback. |
| AI cost spiral | Excessive Claude API usage drives costs above budget | Medium | Rate-limit enrichment calls. Cache debriefs. Use Haiku for enrichment, Sonnet for debriefs. Monitor weekly. |
| Onboarding friction | First Ideation with zero history produces a weak mission plan | High | Design first-Ideation prompts to gather context conversationally. Accept that mission 1 will be manually tuned. Sid improves from mission 2. |
| Economy imbalance | Earning rates too high (no tension) or too low (punishing) | Medium | Target weekly earn rate as guideline. Sid flags imbalances in debriefs. User can adjust mid-mission. |
| Scope creep | MVP grows before the spine is solid | High | Strict phase gating. Don’t start Phase 2 until Phase 1 is deployed and used for at least 2 weeks. |
| Single user bottleneck | Designing for yourself makes the product ungeneralizable | Low | Supabase RLS + auth from day one. user_id on every table. UI decisions documented. |
| Supabase dependency | Platform changes, pricing, or outages | Low | Standard Postgres underneath. Data exportable. Frontend decoupled. |
| Habit tracking fatigue | Daily toggles become tedious | Medium | Minimal grid (tap to toggle). Habit mutations (Phase 4) suggest dropping stale habits. |
| Mobile readability | Dense cockpit aesthetic doesn’t translate to small screens | Medium | Dedicated mobile layout with reduced density, larger touch targets, action-first design. |
| Self-enforcement gap | Point economy has no mechanical enforcement — user can cheat | Medium | By design. The economy is a commitment device, not a lock. Research on commitment devices (StickK, Beeminder) shows they work for self-selected users but not universally. The economy is an amplifier — the rest of the system works without it. |
| Debrief quality degradation | Sid’s debriefs become generic/repetitive after many weeks | Medium | Debrief content varies with data — a high-spend week produces a different debrief than a zero-spend week. If debriefs feel generic, it’s a signal that the context assembly or prompt needs work, not that the concept is wrong. Kill criterion: 4+ weeks of “generic” feedback from the user. |
| Product overfit | The aesthetic, personality, and philosophy are so specific they only appeal to the creator | Medium | Intentional. “Build for one, architect for many” means the first user’s taste IS the product. If it doesn’t generalize, it’s still a successful personal tool. |
| AI cost unpredictability | Power users using on-demand chat extensively could blow through the API budget | Medium | Rate-limit on-demand chat to N messages/day. Use Haiku for enrichment, Sonnet for debriefs and Ideation. Monitor per-user API spend weekly. |
| Context window limitations | Shoving too much history into prompts degrades AI quality | High | Context assembly strategy with token budgets per interaction type (see 5.5). Previous missions stored as compressed summaries, not raw data. |
12. Monetization Strategy
Phase 1-2: Free for personal use. This is a learning project that’s also a viable product. No monetization until others use it.
Phase 3+: Open-core model.
- Free tier: Full feature set for a single user. Self-hosted option.
- Paid tier ($5-8/month): Cloud-hosted with AI features (Sid requires Claude API calls that cost real money). Multiple missions in parallel. Data export. Priority support.
- AI costs are the natural paywall. Sid is the premium feature. The system works without AI (manual quest creation, no debriefs) but it’s dramatically better with it.
This is realistic, not ambitious. The TAM for “adults who want a narrative life operating system” is small. The goal is sustainability (covering hosting + AI costs), not venture scale.
13. Known Limitations & Default Behaviors
| Gap | Default MVP behavior | Revisit when… |
|---|---|---|
| Limited offline write support | Habit toggles and quest step advancement work offline (queued and synced on reconnect). Quest creation, spending, and AI interactions require connectivity. | Users request full offline-first. Evaluate conflict resolution complexity for spending and quest creation. |
| No notifications / reminders | Sid doesn’t push. You pull. Debriefs are presented on next session. | Retention data suggests reminders would help, not annoy. |
| No multi-mission | One Active mission at a time. | Users request parallel tracks. Evaluate UX complexity. |
| No import / export | No way to import from Habitica, Notion, or spreadsheets. No data export. | Other users want to migrate in/out. |
| No shared access | Single user. Partner can’t see the dashboard. | Partner visibility requested. Add read-only shared view. |
| No calendar integration | Quests have no due dates or calendar sync. | Time-blocking becomes relevant. |
| No undo on quest completion | Completing a quest is final (points are earned). | Accidental completions happen. Add a 5-minute undo window. |
| Habit points are weekly only | No daily point granularity. Weekly auto-calculation. | Daily feedback loop requested. |
| No dark/light mode toggle | Dark mode only (Wayfarer cockpit). | Users request light mode. Dark is canonical. |
| AI can’t access external data | Sid only knows what’s in Helm. | Integration requests. Evaluate API-by-API. |
| Week definition is fixed (Mon-Sun) | Configurable debrief day, but weeks are Mon-Sun. | Non-standard work weeks requested. |
| No journaling in MVP | Reflection via debriefs and on-demand Sid chat. | Captain’s Log planned for Phase 2. |
| No overachievement bonuses | Exceeding habit target earns same as 90%+. | Habit Mutations (Phase 4) will address with “raise the bar” suggestions. |
| No enforcement on economy | Self-enforced. System doesn’t prevent spending real money beyond balance. | By design. Accountability, not restriction. |
| Minimal mission archive in MVP | Basic closing summary, not a rich Stat Deck. | Phase 3 adds the full retrospective experience. |
| No accessibility audit | Color contrast, keyboard nav, and screen reader support are design goals but not verified against WCAG AA. Signal decay (opacity-based) needs a secondary non-visual indicator. | Before any public launch. Accessibility is non-negotiable for a multi-user product. |
| No data deletion flow | No “delete my account and all data” capability. | Before multi-user launch. GDPR-aware design required. Add data export and right-to-deletion. |
| No coexistence with existing tools | Helm is an island — no import, export, API, or integrations. | When adoption data shows users want Helm alongside (not instead of) existing tools. |
| Ledger corrections are manual | Correction entries exist (type: correction) but must be created manually. No automated detection of duplicates or errors. | Add automated duplicate detection and a “reverse last entry” quick action. |
| No testing strategy in PRD | Testing (unit, integration, AI output validation) defined during spec-units, not here. | Implementation phase. AI JSON validation is especially critical — malformed Sid output must never corrupt the data model. |
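The last row's requirement that malformed Sid output must never corrupt the data model suggests validating every AI response before it touches storage. The sketch below is one hand-rolled way to do that in TypeScript. The enrichment shape (title, category, points, steps) and all names are assumptions made for illustration; the real schema is left to the spec-units.

```typescript
// Hypothetical enrichment shape; field names are assumptions, not the Helm schema.
interface QuestEnrichment {
  title: string;
  category: string;
  points: number;
  steps: string[];
}

type ParseResult =
  | { kind: "ok"; value: QuestEnrichment }
  | { kind: "error"; reason: string };

function isNonEmptyString(v: unknown): v is string {
  return typeof v === "string" && v.trim().length > 0;
}

function fail(reason: string): ParseResult {
  return { kind: "error", reason };
}

// Parses raw model output and rejects anything that does not match the expected shape.
// On "error" the caller would keep the user's un-enriched quest untouched.
function parseEnrichment(raw: string): ParseResult {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch (err) {
    return fail("not valid JSON");
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return fail("expected a JSON object");
  }
  const obj = data as Record<string, unknown>;
  const title = obj["title"];
  const category = obj["category"];
  const points = obj["points"];
  const steps = obj["steps"];
  if (!isNonEmptyString(title)) return fail("title must be a non-empty string");
  if (!isNonEmptyString(category)) return fail("category must be a non-empty string");
  // Assumption: points are non-negative whole numbers.
  if (typeof points !== "number" || !isFinite(points) || points < 0 || Math.floor(points) !== points) {
    return fail("points must be a non-negative whole number");
  }
  if (!Array.isArray(steps)) return fail("steps must be an array");
  const cleanSteps: string[] = [];
  for (const step of steps) {
    if (!isNonEmptyString(step)) return fail("every step must be a non-empty string");
    cleanSteps.push(step);
  }
  return { kind: "ok", value: { title, category, points, steps: cleanSteps } };
}
```

Only the validated fields are copied into the result, so unexpected extra keys in the model output never reach the data model.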
Scope-cut priority (if behind at week 6, cut in this order):
- Signal decay visuals (cosmetic, not functional)
- On-demand Sid chat (debriefs and enrichment are sufficient for MVP)
- AI Ideation (replace with manual mission creation form; add AI Ideation for mission 2)
- Quest rollover (handle manually for mission 1-to-2 transition)
- Mission archive (just transition to Interlude without a summary)
Absolute minimum shippable product: Manual mission creation + AI-enriched quests (with optimistic capture) + habit grid + ledger + weekly debrief. Everything else is enhancement.
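Optimistic capture, as named in the minimum shippable product, can be modeled as two pure state transitions: the quest exists the moment it is typed, and enrichment arrives later or not at all. The following TypeScript sketch assumes a hypothetical Quest shape and function names; it leaves out the network call and shows only how the quest list changes.

```typescript
// Hypothetical shapes; field names are illustrative assumptions.
type EnrichmentStatus = "pending" | "enriched" | "failed";

interface Quest {
  id: string;
  rawText: string; // exactly what the user typed
  title: string; // starts as rawText and may be replaced by enrichment
  category: string | null;
  points: number | null;
  enrichment: EnrichmentStatus;
}

interface Enrichment {
  title: string;
  category: string;
  points: number;
}

// Step 1: the quest is added immediately, before any AI call completes.
function captureQuest(quests: Quest[], id: string, rawText: string): Quest[] {
  const quest: Quest = {
    id,
    rawText,
    title: rawText,
    category: null,
    points: null,
    enrichment: "pending",
  };
  return quests.concat([quest]);
}

// Step 2: apply the async result. A null result means enrichment failed;
// the quest stays on the board with the user's own text.
// Quests that are no longer "pending" are left alone, so late or duplicate
// responses cannot overwrite an earlier outcome.
function applyEnrichment(quests: Quest[], id: string, result: Enrichment | null): Quest[] {
  return quests.map((q: Quest): Quest => {
    if (q.id !== id || q.enrichment !== "pending") return q;
    if (result === null) return { ...q, enrichment: "failed" };
    return {
      ...q,
      title: result.title,
      category: result.category,
      points: result.points,
      enrichment: "enriched",
    };
  });
}
```

Both functions return new arrays rather than mutating the input, which is one way to keep the UI state predictable; a "failed" quest could then be offered the manual form as the escape hatch.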
14. Decisions Log
| # | Question | Decision | Rationale |
|---|---|---|---|
| 1 | Who is the target user? | You first, others someday | Build for real usage, architect for scale. |
| 2 | Tech stack? | Supabase + React/TS/Vite/Tailwind PWA | Eliminates backend complexity. Postgres is portable. Auth + RLS ready for multi-user. |
| 3 | MVP scope? | Mission lifecycle → Quests → Habits → Points → AI debrief + quest rollover + mission archive | The spine plus the transition mechanics that make multi-mission work. |
| 4 | How many mission states? | 4: Interlude → Ideation → Active → Complete | Preserves Interlude (rest as feature) and Ideation (planning as conversation). |
| 5 | Is AI Ideation MVP? | Yes | Killer feature. Makes onboarding work. Nobody else has it. |
| 6 | Mission cadence? | Flexible, user-defined dates | No hardcoded quarterly assumption. |
| 7 | Auto-terminate on end date? | No | Stays Active until manually closed. |
| 8 | Quest creation flow? | AI-first with manual fallback | Sid enriches from short description. Manual form is escape hatch. |
| 9 | Habit scoping? | Both — global + mission-scoped | Global for life practices. Mission-scoped for experiments. |
| 10 | Point economy model? | 1pt = 1€, self-enforced, target weekly earn rate as living guideline | Real stakes without mechanical enforcement. Monitored in debriefs. |
| 11 | Mission principles? | First-class, flexible per mission | Guardrails defined during Ideation. Not enforced by system. |
| 12 | AI debrief cadence? | Weekly on first session after debrief day + on-demand | Respects schedule. No push. |
| 13 | Device strategy? | Both — mobile logging, desktop review, different density | PWA + Supabase sync. |
| 14 | AI copilot name? | Sid | Nod to Sidra from Chambers. Crewmate energy. |
| 15 | User callsign? | “Captain” by default, configurable in profile | Warmer than “Commander.” Editable. |
| 16 | AI personality? | Sardonic warmth, DCC-inspired | Honest, concise, amused by patterns, genuinely invested. |
| 17 | Design language? | Wayfarer cockpit — dark/warm/amber/dense on desktop, focused on mobile | Opinionated. Different density per viewport. |
| 18 | Product name? | Helm (proposed) | Where you steer the ship. |
| 19 | Quest timing? | Continuous during Active | GTD capture model. Ideation defines container, not task list. |
| 20 | Quest rollover? | MVP feature | Core to mission-to-mission transition. |
| 21 | Habit points? | Auto-calculated weekly, no manual action | Reduces friction. |
| 22 | Overachieving? | Same points as 90%+. Sid notes it. | Formal bonuses deferred to Phase 4 Habit Mutations. |
| 23 | Journaling? | Phase 2 (Captain’s Log) | Debriefs cover structured reflection. Freeform is valuable but not spine. |
| 24 | Character Sheet? | Reflection surface, not RPG stats | Category balance, habit story, economy profile, titles, Sid’s observations. Data-driven, not arbitrary. |
| 25 | Mission archive in MVP? | Yes — basic closing summary | Full Stat Deck + History in Phase 3. |
| 26 | Signal decay? | Design pattern in 5.7, not standalone feature | Applied visually to quests (and later pings/crew). |
| 27 | “Grows smarter”? | Replaced with “accumulates richer context” | Honest about what the AI does. No ML. Context injection. |
| 28 | Economy enforcement? | Self-enforced, by design | Commitment device. Works for self-selected users, not universally. Economy is an amplifier, not a dependency. |
| 29 | Quest capture latency? | Optimistic UI — quest created instantly, enrichment async | Frictionless capture is non-negotiable for GTD-style input. AI enrichment enhances but never blocks. |
| 30 | First-mission onboarding? | Concrete 6-step discovery flow with defaults | First Ideation can’t rely on history. Structured questions, starter categories, conservative economy defaults (€15-20/week), skip-able principles. Under 10 minutes to launch. |
| 31 | AI context window? | Token-budgeted context assembly per interaction type | Enrichment: ~500 tokens. Debrief: ~2,000. Ideation: ~3,000. Previous missions as compressed summaries, never raw JSON. |
| 32 | Ledger trust? | Immutable entries, idempotency keys, cached balance reconciliation | Ledger is source of truth. Balance is projection. Corrections are new entries, not edits. Habit awards use idempotency keys. |
| 33 | Sid tone guardrails? | Negative examples defined, sensitive pattern detection rules | Never mock effort. Never psychologize. Shift to gentle mode on multi-week zero activity. Never fabricate patterns. |
| 34 | Kill criteria? | Defined: MDX fallback, generic debriefs, ignored economy | Honest failure signals tied to specific product bets. Better to learn the concept doesn’t work than to keep building. |
| 35 | Offline writes? | Optimistic for habit toggles and step advancement only | Low-conflict, idempotent operations queued locally and synced on reconnect. Quest creation, spending, and AI require connectivity. |
| 36 | Scope-cut priority? | Ordered list: signal decay → on-demand chat → AI Ideation → rollover → archive | Absolute minimum: manual mission creation + AI quests + habits + ledger + weekly debrief. |
| 37 | Mission closing? | Success criteria reviewed (achieved/partial/missed) before summary | Adds accountability to arc endings. Sid references outcomes in closing narrative. |
| 38 | Accessibility? | WCAG AA baseline, keyboard nav, screen readers, color independence | Non-negotiable for multi-user. Verify palette contrast. Signal decay needs secondary non-visual indicator. |
| 39 | Economy defaults? | First mission: €15-20/week target, 50-60% quests / 40-50% habits earning mix | Conservative start. Sid flags miscalibration (3+ weeks surplus or deficit). Mid-mission adjustments encouraged. |
| 40 | Data deletion? | Not in MVP. Required before multi-user launch. | GDPR-aware design. Right to deletion, data export. Known limitation with clear trigger. |
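Decision 32 (immutable entries, idempotency keys, balance as a projection, corrections as new entries) can be sketched in a few lines of TypeScript. Apart from the correction type, which the PRD names, everything below is an assumption made for illustration: the entry shape, the "earn" and "spend" types, the signed-amount convention, and the function names.

```typescript
// Hypothetical ledger model; only the "correction" type is named in the PRD.
type EntryType = "earn" | "spend" | "correction";

interface LedgerEntry {
  idempotencyKey: string; // e.g. a habit award key that is unique per habit and week
  type: EntryType;
  amount: number; // signed points (1 pt = 1 €): positive adds, negative subtracts
  note: string;
}

// Appends an entry unless one with the same idempotency key already exists.
// Existing entries are never edited or removed.
function appendEntry(ledger: LedgerEntry[], entry: LedgerEntry): LedgerEntry[] {
  const alreadyRecorded = ledger.some((e) => e.idempotencyKey === entry.idempotencyKey);
  return alreadyRecorded ? ledger : ledger.concat([entry]);
}

// The balance is a projection: it is always derivable from the entries.
function projectBalance(ledger: LedgerEntry[]): number {
  return ledger.reduce((sum, e) => sum + e.amount, 0);
}

// Compares a cached balance against the projection from the ledger.
function reconcile(
  ledger: LedgerEntry[],
  cachedBalance: number
): { matches: boolean; ledgerBalance: number } {
  const ledgerBalance = projectBalance(ledger);
  return { matches: ledgerBalance === cachedBalance, ledgerBalance };
}
```

Under this model a retried weekly habit award is a no-op because its key already exists, and a mistaken entry is fixed by appending a correction with the opposite sign rather than by editing history.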
15. Appendix
A. Competitor deep dives
Habitica — Free with $4.99/mo subscription. 15M+ downloads. RPG gamification with 8-bit pixel art. Three task types: habits, dailies, to-dos. Party system for group accountability. HP loss mechanic for missed dailies creates anxiety in some users. Community gutted in 2023 (guilds and Tavern removed). No AI. No mission/arc concept. Currency (gold) buys pixel gear — no real-world stakes. Strengths: proven gamification loop, open source. Weaknesses: childish aesthetic, no reflection/narrative, users outgrow it, community collapse.
Notion Life OS templates — $15-100 one-time purchase (templates). Requires Notion ($8-10/mo for AI features). Gamified Life OS, LiFE RPG, Life OS Dashboard are the leading options. Offer deep framework integration (up to 47 productivity models), AI agents for weekly/monthly review, identity-based tracking. Strengths: extreme customization, deep frameworks, active community. Weaknesses: Notion is slow on mobile, templates are fragile, setup takes hours, no native PWA, AI features require Notion Plus plan.
Fabulous — $39.99/year premium. 37M+ users. Behavioral science-backed from Duke University. Journey-based progression with habit stacking. Beautiful onboarding. Strengths: scientific backing, gorgeous design, guided journeys. Weaknesses: prescriptive not self-authored, limited free tier, aggressive upsells, not customizable for non-standard schedules.
Theme System Journal — $20-25 per journal (physical). Quarterly subscription available. Designed by CGP Grey and Myke Hurley (Cortex podcast). Seasonal theme + daily journal + habit tracking. Strengths: powerful framework, tactile object, community. Weaknesses: analog only, no data analysis, no AI, expensive ($80-100/year).
LifeUp — $4 one-time purchase. Android only. Highly customizable gamification sandbox. Strengths: one-time purchase, extreme customization, privacy-focused (offline-first). Weaknesses: Android only, no AI, steep learning curve, no mission concept.
OpenClaw — Free/open-source AI agent platform (68k+ GitHub stars). Runs locally, connects to 50+ integrations, community-built skills including gamification-xp (XP/levels/badges via Supabase). Strengths: unlimited extensibility, model-agnostic, privacy-first, community-driven. Weaknesses: requires significant technical setup, no designed experience, no opinionated structure — it’s a toolkit, not a product.
B. Framework influence map
| Framework | Author | Key concept used in Helm |
|---|---|---|
| Atomic Habits | James Clear | Identity-based change; systems > goals; 4 laws → Character Sheet reflection, habit tracking |
| Theme System | CGP Grey / Myke Hurley | Seasonal themes as directional compass → Mission themes, 1-3 word constraint |
| VOMIT System | Struthless (Campbell Walker) | Bare minimum vs. killing it; buckets; 70% rule → Energy-aware design, life categories, Captain’s Log |
| PARA Method | Tiago Forte | Organize by actionability → Mission-scoped vs. global data architecture |
| Building a Second Brain | Tiago Forte | CODE workflow; progressive summarization → AI debrief as distillation |
| Deep Work | Cal Newport | Protected focus; time scarcity → Salvage Run, fragment-friendly UX |
| Clear Thinking | Shane Parrish | Decision defaults; decision journals → Principles, economy as pre-commitment |
| GTD | David Allen | Capture everything; weekly review; trusted system → Quest board (continuous capture), weekly log |
| Bullet Journal | Ryder Carroll | Rapid logging; migration → Quick-add quests, quest rollover |
| Zettelkasten | Niklas Luhmann | Networked knowledge; communication partner → AI as cross-data insight engine |
| Digital Gardens | Various | Living content; seedling→evergreen lifecycle → Mission as evolving organism |
| OKRs | Intel/Google | Objectives + Key Results → Theme + Success Criteria |
| Spaced Repetition | Various | Review at optimal intervals → Habit mutations, signal decay |
C. Aesthetic & narrative inspirations
| Source | Type | What it contributes to Helm |
|---|---|---|
| The Long Way to a Small, Angry Planet (Becky Chambers) | Novel | Emotional north star. The Wayfarer is warm, lived-in, functional, personal. Sid’s name comes from Sidra in the sequel, A Closed and Common Orbit. |
| Caves of Qud (Freehold Games) | Game | Emergent narrative from structured systems. Procedural history. Information-dense warmth. “Wild garden of emergent narrative.” |
| Dungeon Crawler Carl (Matt Dinniman) | Novel series | Sid’s personality inspiration. In the series, an AI runs a deadly, televised dungeon game — darkly humorous, sardonic, surprisingly invested in the contestants’ survival. Sid takes this energy and warms it: amused by your patterns, honest about your failures, genuinely invested in your success. |
| LitRPG genre (various) | Literary genre | The “status screen for real life” concept. Visible progression as narrative engine. Character growth that’s measurable and satisfying. Titles and levels as honest achievements. |
D. Data & API landscape
| Service | Role | Free tier | API available |
|---|---|---|---|
| Supabase | Backend (Postgres, Auth, Edge Functions) | 500MB DB, 50k users, 500k invocations | Yes — REST, GraphQL, Realtime |
| Anthropic Claude API | AI (Sid) | None (pay per use) | Yes — Messages API |
| Google Fonts | Typography | Unlimited | CDN |
| Lucide React | Icons | Open source | npm package |
| Vercel | Frontend hosting | 100GB bandwidth | Git deploy |
E. Research sources
- Habitica App Store reviews (iOS + Google Play), Trustpilot reviews
- Gamified Life OS, LiFE RPG, Life OS Dashboard — Notion template marketplaces
- Fabulous App Store reviews, Trustpilot, Choosing Therapy review
- Theme System Journal — themesystem.com, Cortex podcast episodes, Pen Addict review
- “Why I stopped using Habitica” — Yuv Saxena (Substack)
- “5 Best Habitica Alternatives in 2026” — habi.app
- Zettelkasten.de introduction, Maggie Appleton digital garden history
- Caves of Qud press kit, Game Developer interview, RPGFan review
- Atomic Habits cheat sheet (thebehavioralscientist.com)
- Building a Second Brain definitive guide (fortelabs.com)
- Struthless VOMIT System documentation
- CGP Grey yearly themes (cgpgrey.substack.com)
- Dungeon Crawler Carl (Matt Dinniman) — novel series, AI personality reference
- OpenClaw documentation, showcase, DigitalOcean overview, gamification-xp skill