--- name: implement-plan description: Use when a plan folder (from create-plan) exists and needs to be executed in an isolated git worktree with iterative cross-model milestone review. ALWAYS invoke when user says "implement the plan", "execute the plan", "start implementation", "resume the plan", or similar execution requests. --- # Implement Plan (Claude Code) Execute an existing plan (created by `create-plan`) in an isolated git worktree, with iterative cross-model review at each milestone boundary. ## Prerequisite Check (MANDATORY) Required: - Plan folder exists under `ai_plan/` at project root - `continuation-runbook.md` exists in plan folder - `milestone-plan.md` exists in plan folder - `story-tracker.md` exists in plan folder - Git repo with worktree support: `git worktree list` - Superpowers execution skills: - `superpowers:executing-plans` - `superpowers:using-git-worktrees` - `superpowers:verification-before-completion` - `superpowers:finishing-a-development-branch` If any dependency is missing, stop immediately and return: "Missing dependency: [specific missing item]. Ensure all prerequisites are met, then retry." If no plan folder exists: "No plan found under `ai_plan/`. Run `create-plan` first." ## Process ### Phase 1: Locate Plan 1. Scan `ai_plan/` for plan directories (most recent first by date prefix). 2. If multiple plans exist, ask user which one to implement. 3. If no plan exists, stop: "No plan found. Run create-plan first." 4. Read `continuation-runbook.md` first (source of truth). 5. Read `story-tracker.md` to detect resume state (`in-dev` or `completed` stories). 6. Read `milestone-plan.md` for implementation details. ### Phase 2: Configure Reviewer If the user has already specified a reviewer CLI and model (e.g., "implement the plan, review with claude sonnet"), use those values. Otherwise, ask: 1. **Which CLI should review each milestone?** - `codex` — OpenAI Codex CLI (`codex exec`) - `claude` — Claude Code CLI (`claude -p`) - `cursor` — Cursor Agent CLI (`cursor-agent -p`) - `skip` — No external review, proceed with user approval only 2. **Which model?** (only if a CLI was chosen) - For `codex`: default `o4-mini`, alternatives: `gpt-5.3-codex`, `o3` - For `claude`: default `sonnet`, alternatives: `opus`, `haiku` - For `cursor`: **run `cursor-agent models` first** to see available models - Accept any model string the user provides 3. **Max review rounds per milestone?** (default: 10) - If the user does not provide a value, set `MAX_ROUNDS=10`. Store `REVIEWER_CLI`, `REVIEWER_MODEL`, and `MAX_ROUNDS`. These values are fixed for the entire run. ### Phase 3: Set Up Worktree (REQUIRED SUB-SKILL) Invoke `superpowers:using-git-worktrees` explicitly. 1. Branch naming: `implement/` (e.g., `implement/2026-03-04-auth-system`). 2. Follow worktree skill's directory priority: `.worktrees/` > `worktrees/` > CLAUDE.md > ask user. 3. Verify `.gitignore` covers worktree directory. 4. Run project setup (auto-detect: `npm install`, `cargo build`, `pip install`, etc.). 5. Verify clean baseline (run tests). **Resume detection:** If `story-tracker.md` shows `in-dev` or `completed` stories, check if worktree branch already exists (`git worktree list`). If so, `cd` into existing worktree instead of creating a new one. ### Phase 4: Execute Milestones (Loop) For each milestone (M1, M2, ...): #### Step 1: Read Milestone Spec Read the milestone section from `milestone-plan.md`. #### Step 2: Update Tracker Mark first story `in-dev` in `story-tracker.md`. #### Step 3: Implement Stories Execute each story in order. After completing each story: 1. Mark `in-dev` -> `completed` in `story-tracker.md` 2. Update counts 3. Mark next story `in-dev` Commit hashes are not available yet — they are backfilled in Step 6 after the milestone is approved and committed. #### Step 4: Verify Milestone (REQUIRED SUB-SKILL) Invoke `superpowers:verification-before-completion` explicitly. ```bash # Lint changed files # Typecheck # Run tests (targeted first, then full suite) ``` All must pass before proceeding. If failures: fix, re-verify. Do NOT proceed to review with failures. #### Step 5: Milestone Review Loop Send to reviewer for approval **before committing**. See Phase 5 for details. The review payload uses working-tree diffs (`git diff` for unstaged, `git diff --staged` for staged changes). **Skip this step if reviewer was set to `skip`.** When skipped, present the milestone summary to the user and ask for approval directly. #### Step 6: Commit & Approve Only after the reviewer approves (or user overrides at max rounds): ```bash git add git commit -m "feat(): implement milestone M - " ``` Do NOT push. After committing: 1. Backfill the commit hash into the Notes column for all stories in this milestone in `story-tracker.md`. 2. Mark milestone as `approved` in `story-tracker.md`. 3. Move to next milestone. ### Phase 5: Milestone Review Loop (Detail) **Skip this phase entirely if reviewer was set to `skip`.** #### Step 1: Generate Session ID ```bash REVIEW_ID=$(uuidgen | tr '[:upper:]' '[:lower:]' | head -c 8) ``` Use `REVIEW_ID` for all milestone review temp file paths: - `/tmp/milestone-${REVIEW_ID}.md` - `/tmp/milestone-review-${REVIEW_ID}.md` - `/tmp/milestone-review-${REVIEW_ID}.json` - `/tmp/milestone-review-${REVIEW_ID}.stderr` - `/tmp/milestone-review-${REVIEW_ID}.status` - `/tmp/milestone-review-${REVIEW_ID}.runner.out` - `/tmp/milestone-review-${REVIEW_ID}.sh` Resolve the shared runtime helper path before writing the command script: ```bash REVIEWER_RUNTIME=~/.claude/skills/reviewer-runtime/run-review.sh ``` Set helper success-artifact args before writing the command script: ```bash HELPER_SUCCESS_FILE_ARGS=() case "$REVIEWER_CLI" in codex) HELPER_SUCCESS_FILE_ARGS+=(--success-file /tmp/milestone-review-${REVIEW_ID}.md) ;; cursor) HELPER_SUCCESS_FILE_ARGS+=(--success-file /tmp/milestone-review-${REVIEW_ID}.json) ;; esac ``` #### Step 2: Write Review Payload Write to `/tmp/milestone-${REVIEW_ID}.md`: ```markdown # Milestone M Review: ## Milestone Spec (from plan) [Copy milestone section from milestone-plan.md] ## Acceptance Criteria [Copy acceptance criteria checkboxes] ## Changes Made (git diff) [Output of: git diff -- for unstaged changes, or git diff --staged for staged changes] ## Verification Output ### Lint [lint output] ### Typecheck [typecheck output] ### Tests [test output with pass/fail counts] ``` #### Review Contract (Applies to Every Round) The reviewer response must use this structure: ```text ## Summary ... ## Findings ### P0 - ... ### P1 - ... ### P2 - ... ### P3 - ... ## Verdict VERDICT: APPROVED ``` Rules: - Order findings from `P0` to `P3`. - `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have. - Use `- None.` when a severity has no findings. - `VERDICT: APPROVED` is allowed only when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking. - The calling agent should still try to fix `P3` findings when they are cheap and safe. #### Liveness Contract (Applies While Review Is Running) - The shared reviewer runtime emits `state=in-progress note="In progress N"` heartbeats every 60 seconds while the reviewer child is alive. - The calling agent must keep waiting as long as a fresh `In progress N` heartbeat keeps arriving roughly once per minute. - Do not abort just because the review is slow, a soft timeout fired, or a `stall-warning` line appears, as long as the `In progress N` heartbeat continues. - Treat missing heartbeats, `state=failed`, `state=completed-empty-output`, and `state=needs-operator-decision` as escalation signals. #### Step 3: Submit to Reviewer (Round 1) Write the reviewer invocation to `/tmp/milestone-review-${REVIEW_ID}.sh` as a bash script: ```bash #!/usr/bin/env bash set -euo pipefail ``` **If `REVIEWER_CLI` is `codex`:** ```bash codex exec \ -m ${REVIEWER_MODEL} \ -s read-only \ -o /tmp/milestone-review-${REVIEW_ID}.md \ "Review this milestone implementation. The spec, acceptance criteria, git diff, and verification output are in /tmp/milestone-${REVIEW_ID}.md. Evaluate: 1. Correctness — Does the implementation match the milestone spec? 2. Acceptance criteria — Are all criteria met? 3. Code quality — Clean, maintainable, no obvious issues? 4. Test coverage — Are changes adequately tested? 5. Security — Any security concerns introduced? Return exactly these sections in order: ## Summary ## Findings ### P0 ### P1 ### P2 ### P3 ## Verdict Rules: - Order findings from highest severity to lowest. - Use `- None.` when a severity has no findings. - `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have. - End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE` - `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking." ``` Do not try to capture the Codex session ID yet. When using the helper, extract it from `/tmp/milestone-review-${REVIEW_ID}.runner.out` after the command completes (look for `session id: <uuid>`), then store it as `CODEX_SESSION_ID` for resume in subsequent rounds. **If `REVIEWER_CLI` is `claude`:** ```bash claude -p \ "Review this milestone implementation using the following spec, acceptance criteria, git diff, and verification output: $(cat /tmp/milestone-${REVIEW_ID}.md) Evaluate: 1. Correctness — Does the implementation match the milestone spec? 2. Acceptance criteria — Are all criteria met? 3. Code quality — Clean, maintainable, no obvious issues? 4. Test coverage — Are changes adequately tested? 5. Security — Any security concerns introduced? Return exactly these sections in order: ## Summary ## Findings ### P0 ### P1 ### P2 ### P3 ## Verdict Rules: - Order findings from highest severity to lowest. - Use `- None.` when a severity has no findings. - `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have. - End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE` - `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking." \ --model ${REVIEWER_MODEL} \ --strict-mcp-config \ --setting-sources user ``` **If `REVIEWER_CLI` is `cursor`:** ```bash cursor-agent -p \ --mode=ask \ --model ${REVIEWER_MODEL} \ --trust \ --output-format json \ "Read the file /tmp/milestone-${REVIEW_ID}.md and review this milestone implementation. Evaluate: 1. Correctness — Does the implementation match the milestone spec? 2. Acceptance criteria — Are all criteria met? 3. Code quality — Clean, maintainable, no obvious issues? 4. Test coverage — Are changes adequately tested? 5. Security — Any security concerns introduced? Return exactly these sections in order: ## Summary ## Findings ### P0 ### P1 ### P2 ### P3 ## Verdict Rules: - Order findings from highest severity to lowest. - Use `- None.` when a severity has no findings. - `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have. - End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE` - `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking." \ > /tmp/milestone-review-${REVIEW_ID}.json ``` For `cursor`, the command script writes raw JSON to `/tmp/milestone-review-${REVIEW_ID}.json`. Do not run `jq` extraction until after the helper or fallback execution completes. If `jq` is not installed, inform the user: `brew install jq` (macOS) or equivalent. Run the command script through the shared helper when available: ```bash if [ -x "$REVIEWER_RUNTIME" ]; then "$REVIEWER_RUNTIME" \ --command-file /tmp/milestone-review-${REVIEW_ID}.sh \ --stdout-file /tmp/milestone-review-${REVIEW_ID}.runner.out \ --stderr-file /tmp/milestone-review-${REVIEW_ID}.stderr \ --status-file /tmp/milestone-review-${REVIEW_ID}.status \ "${HELPER_SUCCESS_FILE_ARGS[@]}" else echo "Warning: reviewer runtime helper not found at $REVIEWER_RUNTIME; falling back to direct synchronous review." >&2 bash /tmp/milestone-review-${REVIEW_ID}.sh >/tmp/milestone-review-${REVIEW_ID}.runner.out 2>/tmp/milestone-review-${REVIEW_ID}.stderr fi ``` Run the helper in the foreground and watch its live stdout for `state=in-progress` heartbeats. If your agent environment buffers command output until exit, start the helper in the background and poll `/tmp/milestone-review-${REVIEW_ID}.status` separately instead of treating heartbeats as post-hoc-only data. After the command completes: - If `REVIEWER_CLI=cursor`, extract the final review text: ```bash CURSOR_SESSION_ID=$(jq -r '.session_id' /tmp/milestone-review-${REVIEW_ID}.json) jq -r '.result' /tmp/milestone-review-${REVIEW_ID}.json > /tmp/milestone-review-${REVIEW_ID}.md ``` - If `REVIEWER_CLI=codex`, extract `CODEX_SESSION_ID` from `/tmp/milestone-review-${REVIEW_ID}.runner.out` after the helper or fallback run. If the review text is only in `.runner.out`, move or copy the actual review body into `/tmp/milestone-review-${REVIEW_ID}.md` before verdict parsing. - If `REVIEWER_CLI=claude`, promote stdout captured by the helper or fallback runner into the markdown review file: ```bash cp /tmp/milestone-review-${REVIEW_ID}.runner.out /tmp/milestone-review-${REVIEW_ID}.md ``` Fallback is allowed only when the helper is missing or not executable. #### Step 4: Read Review & Check Verdict 1. Read `/tmp/milestone-review-${REVIEW_ID}.md` 2. If the review failed, produced empty output, or reached helper timeout, also read: - `/tmp/milestone-review-${REVIEW_ID}.stderr` - `/tmp/milestone-review-${REVIEW_ID}.status` - `/tmp/milestone-review-${REVIEW_ID}.runner.out` 3. Present review to the user: ``` ## Milestone Review — Round N (reviewer: ${REVIEWER_CLI} / ${REVIEWER_MODEL}) [Reviewer feedback] ``` 4. While the reviewer is still running, keep waiting as long as fresh `state=in-progress note="In progress N"` heartbeats continue to appear roughly once per minute. 5. Check verdict: - **VERDICT: APPROVED** with no `P0`, `P1`, or `P2` findings -> proceed to Phase 4 Step 6 (commit & approve) - **VERDICT: APPROVED** with only `P3` findings -> optionally fix the `P3` items if they are cheap and safe, then proceed - **VERDICT: REVISE** or any `P0`, `P1`, or `P2` finding -> go to Step 5 - No clear verdict but `P0`, `P1`, and `P2` are all `- None.` -> treat as approved - Helper state `completed-empty-output` -> treat as failed review attempt, surface stderr/status, fix invocation or prompt handling, then retry - Helper state `needs-operator-decision` -> surface status log and decide whether to extend the timeout, abort, or retry with different helper parameters - Max rounds (`MAX_ROUNDS`) reached -> present to user for manual decision (proceed or stop) #### Step 5: Address Feedback & Re-verify 1. Address the reviewer findings in priority order (`P0` -> `P1` -> `P2`, then `P3` when practical) (do NOT commit yet). 2. Re-run verification (lint/typecheck/tests) — all must pass. 3. Update `/tmp/milestone-${REVIEW_ID}.md` with new diff and verification output. Summarize revisions for the user: ``` ### Revisions (Round N) - [Change and reason, one bullet per issue addressed] ``` If a revision contradicts the user's explicit requirements, skip it and note it for the user. #### Step 6: Re-submit to Reviewer (Rounds 2-N) Rewrite `/tmp/milestone-review-${REVIEW_ID}.sh` for the next round. The script should contain the reviewer invocation only; do not run it directly. **If `REVIEWER_CLI` is `codex`:** Resume the existing session: ```bash codex exec resume ${CODEX_SESSION_ID} \ -o /tmp/milestone-review-${REVIEW_ID}.md \ "I've addressed your feedback. Updated diff and verification output are in /tmp/milestone-${REVIEW_ID}.md. Changes made: [List specific changes] Re-review using the same `## Summary`, `## Findings`, and `## Verdict` structure as before. Keep findings ordered `P0` to `P3`, use `- None.` when a severity has no findings, and only use `VERDICT: APPROVED` when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking." ``` If resume fails (session expired), fall back to fresh `codex exec` with context about prior rounds. **If `REVIEWER_CLI` is `claude`:** Fresh call with accumulated context (Claude CLI has no session resume): ```bash claude -p \ "You previously reviewed milestone M<N> and requested revisions. Previous feedback summary: [key points from last review] I've addressed your feedback. Updated diff and verification output are below. $(cat /tmp/milestone-${REVIEW_ID}.md) Changes made: [List specific changes] Re-review using the same `## Summary`, `## Findings`, and `## Verdict` structure as before. Keep findings ordered `P0` to `P3`, use `- None.` when a severity has no findings, and only use `VERDICT: APPROVED` when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking." \ --model ${REVIEWER_MODEL} \ --strict-mcp-config \ --setting-sources user ``` **If `REVIEWER_CLI` is `cursor`:** Resume the existing session: ```bash cursor-agent --resume ${CURSOR_SESSION_ID} -p \ --mode=ask \ --model ${REVIEWER_MODEL} \ --trust \ --output-format json \ "I've addressed your feedback. Updated diff and verification output are in /tmp/milestone-${REVIEW_ID}.md. Changes made: [List specific changes] Re-review using the same `## Summary`, `## Findings`, and `## Verdict` structure as before. Keep findings ordered `P0` to `P3`, use `- None.` when a severity has no findings, and only use `VERDICT: APPROVED` when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking." \ > /tmp/milestone-review-${REVIEW_ID}.json ``` If resume fails, fall back to fresh `cursor-agent -p` with context about prior rounds. Do not run `jq` extraction until after the helper or fallback execution completes, then extract `/tmp/milestone-review-${REVIEW_ID}.md` from the JSON response. After updating `/tmp/milestone-review-${REVIEW_ID}.sh`, run the same helper/fallback flow from Round 1. Return to Step 4. #### Step 7: Cleanup Per Milestone ```bash rm -f \ /tmp/milestone-${REVIEW_ID}.md \ /tmp/milestone-review-${REVIEW_ID}.md \ /tmp/milestone-review-${REVIEW_ID}.json \ /tmp/milestone-review-${REVIEW_ID}.stderr \ /tmp/milestone-review-${REVIEW_ID}.status \ /tmp/milestone-review-${REVIEW_ID}.runner.out \ /tmp/milestone-review-${REVIEW_ID}.sh ``` If the round failed, produced empty output, or reached operator-decision timeout, keep `.stderr`, `.status`, and `.runner.out` until the issue is diagnosed instead of deleting them immediately. ### Phase 6: Completion (REQUIRED SUB-SKILL) After all milestones are approved and committed: 1. Invoke `superpowers:finishing-a-development-branch` explicitly. 2. Run full test suite one final time — all must pass. 3. Merge the worktree branch into the parent branch: ```bash # From the main repo (not the worktree) git merge implement/<plan-folder-name> ``` 4. Delete the worktree and its branch: ```bash git worktree remove <worktree-path> git branch -d implement/<plan-folder-name> ``` 5. Mark plan status as `completed` in `story-tracker.md`. ### Phase 7: Final Report Present summary: ``` ## Implementation Complete **Plan:** <plan-folder-name> **Milestones:** <N> completed, <N> approved **Review rounds:** <total across all milestones> **Branch:** implement/<plan-folder-name> (merged and deleted) ``` ### Phase 8: Telegram Completion Notification (MANDATORY) Resolve the Telegram notifier helper from the installed Claude Code skills directory: ```bash TELEGRAM_NOTIFY_RUNTIME=~/.claude/skills/reviewer-runtime/notify-telegram.sh ``` On every terminal outcome for the implement-plan run (fully completed, stopped after max rounds, skipped reviewer, or failure), send a Telegram summary if the helper exists and both `TELEGRAM_BOT_TOKEN` and `TELEGRAM_CHAT_ID` are configured: ```bash if [ -x "$TELEGRAM_NOTIFY_RUNTIME" ] && [ -n "${TELEGRAM_BOT_TOKEN:-}" ] && [ -n "${TELEGRAM_CHAT_ID:-}" ]; then "$TELEGRAM_NOTIFY_RUNTIME" --message "implement-plan completed for <plan-folder-name>: <status summary>" fi ``` Rules: - Telegram is the only supported completion notification path. Do not use desktop notifications, `say`, email, or any other notifier. - Notification failures are non-blocking, but they must be surfaced to the user. - If Telegram is not configured, state that no completion notification was sent. ## Tracker Discipline (MANDATORY) **ALWAYS update `story-tracker.md` before/after each story. NEVER proceed with stale tracker state.** Before starting any story: 1. Open `story-tracker.md` 2. Mark story `in-dev` 3. Add notes if relevant 4. Then begin implementation After completing any story: 1. Mark story `completed` 2. Review pending stories 3. Update Last Updated and Stories Complete counts Note: Commit hashes are backfilled into story Notes after the milestone commit (Step 6), not per-story. ## Verification Checklist - [ ] Plan folder located and all required files present - [ ] Reviewer configured or explicitly skipped - [ ] Max review rounds confirmed (default: 10) - [ ] Worktree created with branch `implement/<plan-folder-name>` - [ ] Worktree directory verified in .gitignore - [ ] Baseline tests pass in worktree - [ ] Each milestone: stories tracked (in-dev -> completed) - [ ] Each milestone: lint/typecheck/tests pass before review - [ ] Each milestone: reviewer approved (or max rounds + user override) - [ ] Each milestone: committed locally only after approval - [ ] Each milestone: marked approved in story-tracker.md - [ ] All milestones completed, approved, and committed - [ ] Final test suite passes - [ ] Worktree branch merged to parent and worktree deleted - [ ] Story tracker updated with final status - [ ] Telegram completion notification attempted if configured