| name | description |
|---|---|
| implement-plan | Use when a plan folder (from create-plan) exists and needs to be executed in an isolated git worktree with iterative cross-model milestone review. ALWAYS invoke when user says "implement the plan", "execute the plan", "start implementation", "resume the plan", or similar execution requests. |
Implement Plan (Codex Native Superpowers)
Execute an existing plan (created by create-plan) in an isolated git worktree, with iterative cross-model review at each milestone boundary.
Overview
This skill wraps the Superpowers execution flow for Codex:
- Locate plan files under ai_plan/
- Set up an isolated git worktree
- Execute milestones one-by-one with lint/typecheck/test gates
- Review each milestone with a second model/provider
- Commit approved milestones, merge to parent branch, and delete worktree
Core principle: Codex uses native skill discovery from ~/.agents/skills/. Do not use deprecated superpowers-codex bootstrap or use-skill CLI commands.
Prerequisite Check (MANDATORY)
Required:
- Plan folder exists under ai_plan/ at project root
- continuation-runbook.md exists in plan folder
- milestone-plan.md exists in plan folder
- story-tracker.md exists in plan folder
- Git repo with worktree support: git worktree list
- Superpowers skills symlink: ~/.agents/skills/superpowers -> ~/.codex/superpowers/skills
- Superpowers execution skills:
  - superpowers:executing-plans
  - superpowers:using-git-worktrees
  - superpowers:verification-before-completion
  - superpowers:finishing-a-development-branch
Verify before proceeding:
test -L ~/.agents/skills/superpowers
test -f ~/.agents/skills/superpowers/executing-plans/SKILL.md
test -f ~/.agents/skills/superpowers/using-git-worktrees/SKILL.md
test -f ~/.agents/skills/superpowers/verification-before-completion/SKILL.md
test -f ~/.agents/skills/superpowers/finishing-a-development-branch/SKILL.md
If any dependency is missing, stop and return:
Missing dependency: [specific missing item]. Ensure all prerequisites are met, then retry.
If no plan folder exists:
No plan found under ai_plan/. Run create-plan first.
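The checks above can be folded into one function that prints the matching stop message for the first missing item. This is a minimal sketch, not part of the skill itself: the function name `check_prereqs` and its two arguments (skills directory and plan folder) are assumptions made for illustration, and it does not cover the `git worktree list` check.

```shell
# Sketch: report the first missing prerequisite and return non-zero,
# or return 0 silently when everything is present.
# check_prereqs and its arguments are hypothetical names.
check_prereqs() {
  local skills_dir="$1"   # e.g. ~/.agents/skills/superpowers
  local plan_dir="$2"     # e.g. ai_plan/<plan-folder-name>
  local f skill

  if [ ! -d "$plan_dir" ]; then
    echo "No plan found under ai_plan/. Run create-plan first."
    return 1
  fi
  for f in continuation-runbook.md milestone-plan.md story-tracker.md; do
    if [ ! -f "$plan_dir/$f" ]; then
      echo "Missing dependency: $plan_dir/$f. Ensure all prerequisites are met, then retry."
      return 1
    fi
  done
  if [ ! -L "$skills_dir" ]; then
    echo "Missing dependency: $skills_dir symlink. Ensure all prerequisites are met, then retry."
    return 1
  fi
  for skill in executing-plans using-git-worktrees \
               verification-before-completion finishing-a-development-branch; do
    if [ ! -f "$skills_dir/$skill/SKILL.md" ]; then
      echo "Missing dependency: superpowers:$skill. Ensure all prerequisites are met, then retry."
      return 1
    fi
  done
  return 0
}
```

A caller would run something like `check_prereqs ~/.agents/skills/superpowers "ai_plan/<plan-folder-name>" || exit 1` before Phase 1.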
Required Skill Invocation Rules
- Invoke relevant skills through native discovery (no CLI wrapper).
- Announce skill usage explicitly:
I've read the [Skill Name] skill and I'm using it to [purpose].
- For skills with checklists, track checklist items with update_plan todos.
- Tool mapping for Codex:
  - TodoWrite -> update_plan
  - Task subagents -> unavailable in Codex; do the work directly and state the limitation
  - Skill -> use native skill discovery from ~/.agents/skills/
Process
Phase 1: Locate Plan
- Scan ai_plan/ for plan directories (most recent first by date prefix).
- If multiple plans exist, ask user which one to implement.
- If no plan exists, stop: "No plan found. Run create-plan first."
- Read continuation-runbook.md first (source of truth).
- Read story-tracker.md to detect resume state (in-dev or completed stories).
- Read milestone-plan.md for implementation details.
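The scan in the first step can be sketched as below. It assumes plan folder names start with an ISO date (YYYY-MM-DD), as in the example 2026-03-04-auth-system, so a reverse string sort is also a newest-first sort. The function name `list_plans` is hypothetical.

```shell
# Sketch: print plan folder names newest-first.
# Assumes each folder name begins with a YYYY-MM-DD prefix.
list_plans() {
  local root="${1:-ai_plan}"
  local d
  [ -d "$root" ] || return 1
  for d in "$root"/*/; do
    # The glob stays literal when the directory is empty, so re-check it.
    [ -d "$d" ] && basename "$d"
  done | sort -r
}
```

If `list_plans` prints more than one line, ask the user which plan to implement; if it prints nothing, stop with the "No plan found" message.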
Phase 2: Configure Reviewer
If the user has already specified a reviewer CLI and model (e.g., "implement the plan, review with claude sonnet"), use those values. Otherwise, ask:
- Which CLI should review each milestone?
  - codex: OpenAI Codex CLI (codex exec)
  - claude: Claude Code CLI (claude -p)
  - cursor: Cursor Agent CLI (cursor-agent -p)
  - skip: No external review, proceed with user approval only
- Which model? (only if a CLI was chosen)
  - For codex: default o4-mini, alternatives: gpt-5.3-codex, o3
  - For claude: default sonnet, alternatives: opus, haiku
  - For cursor: run cursor-agent models first to see available models
  - Accept any model string the user provides
- Max review rounds per milestone? (default: 10)
  - If the user does not provide a value, set MAX_ROUNDS=10.
Store REVIEWER_CLI, REVIEWER_MODEL, and MAX_ROUNDS. These values are fixed for the entire run.
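The defaults listed above can be captured in a small lookup. This is an illustrative sketch only: `default_reviewer_model` is a hypothetical helper, and it returns an empty string for cursor and skip because the text names no default model for either.

```shell
# Sketch: map a reviewer CLI to the default model named in this phase.
# default_reviewer_model is a hypothetical helper.
default_reviewer_model() {
  case "$1" in
    codex)  echo "o4-mini" ;;
    claude) echo "sonnet" ;;
    cursor) echo "" ;;   # no default; choose from the output of: cursor-agent models
    skip)   echo "" ;;   # no external review, so no model
    *)      echo "unknown reviewer CLI: $1" >&2; return 1 ;;
  esac
}

# Fall back to 10 rounds when the user gave no value.
MAX_ROUNDS="${MAX_ROUNDS:-10}"
```

For example, `REVIEWER_MODEL=$(default_reviewer_model "$REVIEWER_CLI")` fills the model when the user only named a CLI.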
Phase 3: Set Up Worktree (REQUIRED SUB-SKILL)
Invoke superpowers:using-git-worktrees.
- Branch naming: implement/<plan-folder-name> (e.g., implement/2026-03-04-auth-system).
- Follow worktree skill's directory priority: .worktrees/ > worktrees/ > CLAUDE.md > ask user.
- Verify .gitignore covers worktree directory.
- Run project setup (auto-detect: npm install, cargo build, pip install, etc.).
- Verify clean baseline (run tests).
Resume detection: If story-tracker.md shows in-dev or completed stories, check if worktree branch already exists (git worktree list). If so, cd into existing worktree instead of creating a new one.
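Resume detection can be sketched by parsing the machine-readable form of the worktree list. The filter below reads `git worktree list --porcelain` output on stdin and prints the path of the worktree checked out on a given branch. The function name `worktree_path_for_branch` is an assumption made for illustration.

```shell
# Sketch: find the worktree path for a branch in porcelain output read from stdin.
# Porcelain records look like:
#   worktree /path/to/worktree
#   HEAD <sha>
#   branch refs/heads/<name>
worktree_path_for_branch() {
  local branch="$1"
  awk -v ref="refs/heads/${branch}" '
    $1 == "worktree" { path = substr($0, 10) }       # text after "worktree "
    $1 == "branch" && $2 == ref { print path; exit }
  '
}
```

Usage: `git worktree list --porcelain | worktree_path_for_branch "implement/<plan-folder-name>"`. Empty output means no worktree exists for that branch yet, so a new one should be created.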
Phase 4: Execute Milestones (Loop)
For each milestone (M1, M2, ...):
Step 1: Read Milestone Spec
Read the milestone section from milestone-plan.md.
Step 2: Update Tracker
Mark first story in-dev in story-tracker.md.
Step 3: Implement Stories
Execute each story in order. After completing each story:
- Mark in-dev -> completed in story-tracker.md
- Update counts
- Mark next story in-dev
Commit hashes are not available yet — they are backfilled in Step 6 after the milestone is approved and committed.
Step 4: Verify Milestone (REQUIRED SUB-SKILL)
Invoke superpowers:verification-before-completion.
# Lint changed files
# Typecheck
# Run tests (targeted first, then full suite)
All must pass before proceeding. If failures: fix, re-verify. Do NOT proceed to review with failures.
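The gate sequence can be sketched as a runner that stops at the first failure. The actual lint, typecheck, and test commands are project-specific and not given here, so they are passed in as arguments; `run_gates` is a hypothetical name.

```shell
# Sketch: run each gate command in order and stop at the first failure.
# The commands themselves are supplied by the caller (project-specific).
run_gates() {
  local gate
  for gate in "$@"; do
    echo "gate: ${gate}"
    if ! sh -c "$gate"; then
      echo "gate failed: ${gate}" >&2
      return 1
    fi
  done
  echo "all gates passed"
}
```

A Node project might call `run_gates "npm run lint" "npm run typecheck" "npm test"`; those three commands are examples, not something this skill prescribes. A non-zero return means fix and re-verify before review.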
Step 5: Milestone Review Loop
Send the milestone to the reviewer for approval before committing. See Phase 5 for details. The review payload uses working-tree diffs (git diff for unstaged changes, git diff --staged for staged changes).
Skip this step if reviewer was set to skip. When skipped, present the milestone summary to the user and ask for approval directly.
Step 6: Commit & Approve
Only after the reviewer approves (or user overrides at max rounds):
git add <changed-files>
git commit -m "feat(<scope>): implement milestone M<N> - <description>"
Do NOT push. After committing:
- Backfill the commit hash into the Notes column for all stories in this milestone in story-tracker.md.
- Mark milestone as approved in story-tracker.md.
- Move to next milestone.
Phase 5: Milestone Review Loop (Detail)
Skip this phase entirely if reviewer was set to skip.
Step 1: Generate Session ID
REVIEW_ID=$(uuidgen | tr '[:upper:]' '[:lower:]' | head -c 8)
Use REVIEW_ID for all milestone review temp file paths:
- /tmp/milestone-${REVIEW_ID}.md
- /tmp/milestone-review-${REVIEW_ID}.md
- /tmp/milestone-review-${REVIEW_ID}.json
- /tmp/milestone-review-${REVIEW_ID}.stderr
- /tmp/milestone-review-${REVIEW_ID}.status
- /tmp/milestone-review-${REVIEW_ID}.runner.out
- /tmp/milestone-review-${REVIEW_ID}.sh
Resolve the shared runtime helper path before writing the command script:
REVIEWER_RUNTIME=~/.codex/skills/reviewer-runtime/run-review.sh
Set helper success-artifact args before writing the command script:
HELPER_SUCCESS_FILE_ARGS=()
case "$REVIEWER_CLI" in
codex)
HELPER_SUCCESS_FILE_ARGS+=(--success-file /tmp/milestone-review-${REVIEW_ID}.md)
;;
cursor)
HELPER_SUCCESS_FILE_ARGS+=(--success-file /tmp/milestone-review-${REVIEW_ID}.json)
;;
esac
Step 2: Write Review Payload
Write to /tmp/milestone-${REVIEW_ID}.md:
# Milestone M<N> Review: <title>
## Milestone Spec (from plan)
[Copy milestone section from milestone-plan.md]
## Acceptance Criteria
[Copy acceptance criteria checkboxes]
## Changes Made (git diff)
[Output of: git diff -- for unstaged changes, or git diff --staged for staged changes]
## Verification Output
### Lint
[lint output]
### Typecheck
[typecheck output]
### Tests
[test output with pass/fail counts]
Review Contract (Applies to Every Round)
The reviewer response must use this structure:
## Summary
...
## Findings
### P0
- ...
### P1
- ...
### P2
- ...
### P3
- ...
## Verdict
VERDICT: APPROVED
Rules:
- Order findings from P0 to P3.
- P0 = total blocker, P1 = major risk, P2 = must-fix before approval, P3 = cosmetic / nice to have.
- Use `- None.` when a severity has no findings.
- VERDICT: APPROVED is allowed only when no P0, P1, or P2 findings remain.
- P3 findings are non-blocking.
- The calling agent should still try to fix P3 findings when they are cheap and safe.
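The contract can be checked mechanically. The sketch below reads a review file in the structure shown above and prints approved, revise, or unclear. It counts any bullet under P0, P1, or P2 other than `- None.` as blocking, so an APPROVED verdict that still lists a blocking finding is reported as revise. The function name `review_verdict` is hypothetical.

```shell
# Sketch: classify a review file that follows the Summary / Findings / Verdict structure.
# Prints: approved | revise | unclear
review_verdict() {
  local review_file="$1"
  awk '
    /^### P[0-3]/        { sev = substr($2, 1, 2); next }   # entering a severity section
    /^## /               { sev = "" }                        # any other top-level section
    sev ~ /^P[012]$/ && /^- / && !/^- None\.[ \t\r]*$/ { blocking++ }
    /^VERDICT: APPROVED/ { verdict = "approved" }
    /^VERDICT: REVISE/   { verdict = "revise" }
    END {
      if (blocking > 0 || verdict == "revise") print "revise"
      else if (verdict == "approved")          print "approved"
      else                                     print "unclear"
    }
  ' "$review_file"
}
```

P3 bullets never affect the result, which matches their non-blocking status.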
Liveness Contract (Applies While Review Is Running)
- The shared reviewer runtime emits state=in-progress note="In progress N" heartbeats every 60 seconds while the reviewer child is alive.
- The calling agent must keep waiting as long as a fresh In progress N heartbeat keeps arriving roughly once per minute.
- Do not abort just because the review is slow, a soft timeout fired, or a stall-warning line appears, as long as the In progress N heartbeat continues.
- Treat missing heartbeats, state=failed, state=completed-empty-output, and state=needs-operator-decision as escalation signals.
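One way to apply these rules is to classify the most recent state line together with its age. This is a sketch under stated assumptions: the caller has already extracted the last `state=` line and computed how many seconds ago it arrived, and the 180-second tolerance for a "missing" heartbeat is an illustrative value, not one defined by the runtime. `liveness_action` is a hypothetical name.

```shell
# Sketch: decide whether to keep waiting or escalate.
# $1 = latest state line, $2 = its age in seconds, $3 = max tolerated gap (assumed default: 180)
liveness_action() {
  local last_line="$1" age_seconds="$2" max_gap="${3:-180}"
  case "$last_line" in
    *state=failed*|*state=completed-empty-output*|*state=needs-operator-decision*)
      echo "escalate" ;;
    *state=in-progress*)
      # A heartbeat only counts while it is fresh.
      if [ "$age_seconds" -le "$max_gap" ]; then echo "wait"; else echo "escalate"; fi ;;
    *)
      echo "unknown" ;;
  esac
}
```

Slow reviews and stall-warning lines do not change the result as long as the latest heartbeat is recent.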
Step 3: Submit to Reviewer (Round 1)
Write the reviewer invocation to /tmp/milestone-review-${REVIEW_ID}.sh as a bash script:
#!/usr/bin/env bash
set -euo pipefail
If REVIEWER_CLI is codex:
codex exec \
-m ${REVIEWER_MODEL} \
-s read-only \
-o /tmp/milestone-review-${REVIEW_ID}.md \
"Review this milestone implementation. The spec, acceptance criteria, git diff, and verification output are in /tmp/milestone-${REVIEW_ID}.md.
Evaluate:
1. Correctness — Does the implementation match the milestone spec?
2. Acceptance criteria — Are all criteria met?
3. Code quality — Clean, maintainable, no obvious issues?
4. Test coverage — Are changes adequately tested?
5. Security — Any security concerns introduced?
Return exactly these sections in order:
## Summary
## Findings
### P0
### P1
### P2
### P3
## Verdict
Rules:
- Order findings from highest severity to lowest.
- Use `- None.` when a severity has no findings.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`
- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking."
Do not try to capture the Codex session ID yet. When using the helper, extract it from /tmp/milestone-review-${REVIEW_ID}.runner.out after the command completes (look for session id: <uuid>), then store it as CODEX_SESSION_ID for resume in subsequent rounds.
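Extracting the session ID after the run can be sketched as follows. It assumes the runner output contains a line with the literal text "session id: " followed by a 36-character UUID, as described above. The function name `extract_codex_session_id` is hypothetical.

```shell
# Sketch: print the first UUID that follows "session id: " in the runner output.
# Returns non-zero when no such line exists.
extract_codex_session_id() {
  local runner_out="$1" id
  id=$(sed -nE 's/.*session id: *([0-9a-fA-F-]{36}).*/\1/p' "$runner_out" | head -n 1)
  [ -n "$id" ] || return 1
  printf '%s\n' "$id"
}
```

For example: `CODEX_SESSION_ID=$(extract_codex_session_id "/tmp/milestone-review-${REVIEW_ID}.runner.out")`. If this fails, later rounds cannot resume and need a fresh codex exec call instead.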
If REVIEWER_CLI is claude:
claude -p \
"Review this milestone implementation using the following spec, acceptance criteria, git diff, and verification output:
$(cat /tmp/milestone-${REVIEW_ID}.md)
Evaluate:
1. Correctness — Does the implementation match the milestone spec?
2. Acceptance criteria — Are all criteria met?
3. Code quality — Clean, maintainable, no obvious issues?
4. Test coverage — Are changes adequately tested?
5. Security — Any security concerns introduced?
Return exactly these sections in order:
## Summary
## Findings
### P0
### P1
### P2
### P3
## Verdict
Rules:
- Order findings from highest severity to lowest.
- Use `- None.` when a severity has no findings.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`
- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking." \
--model ${REVIEWER_MODEL} \
--strict-mcp-config \
--setting-sources user
If REVIEWER_CLI is cursor:
cursor-agent -p \
--mode=ask \
--model ${REVIEWER_MODEL} \
--trust \
--output-format json \
"Read the file /tmp/milestone-${REVIEW_ID}.md and review this milestone implementation.
Evaluate:
1. Correctness — Does the implementation match the milestone spec?
2. Acceptance criteria — Are all criteria met?
3. Code quality — Clean, maintainable, no obvious issues?
4. Test coverage — Are changes adequately tested?
5. Security — Any security concerns introduced?
Return exactly these sections in order:
## Summary
## Findings
### P0
### P1
### P2
### P3
## Verdict
Rules:
- Order findings from highest severity to lowest.
- Use `- None.` when a severity has no findings.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`
- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking." \
> /tmp/milestone-review-${REVIEW_ID}.json
For cursor, the command script writes raw JSON to /tmp/milestone-review-${REVIEW_ID}.json. Do not run jq extraction until after the helper or fallback execution completes. If jq is not installed, inform the user: brew install jq (macOS) or equivalent.
Run the command script through the shared helper when available:
if [ -x "$REVIEWER_RUNTIME" ]; then
"$REVIEWER_RUNTIME" \
--command-file /tmp/milestone-review-${REVIEW_ID}.sh \
--stdout-file /tmp/milestone-review-${REVIEW_ID}.runner.out \
--stderr-file /tmp/milestone-review-${REVIEW_ID}.stderr \
--status-file /tmp/milestone-review-${REVIEW_ID}.status \
"${HELPER_SUCCESS_FILE_ARGS[@]}"
else
echo "Warning: reviewer runtime helper not found at $REVIEWER_RUNTIME; falling back to direct synchronous review." >&2
bash /tmp/milestone-review-${REVIEW_ID}.sh >/tmp/milestone-review-${REVIEW_ID}.runner.out 2>/tmp/milestone-review-${REVIEW_ID}.stderr
fi
Run the helper in the foreground and watch its live stdout for state=in-progress heartbeats. If your agent environment buffers command output until exit, start the helper in the background and poll /tmp/milestone-review-${REVIEW_ID}.status separately instead of treating heartbeats as post-hoc-only data.
After the command completes:
- If REVIEWER_CLI=cursor, extract the final review text:
CURSOR_SESSION_ID=$(jq -r '.session_id' /tmp/milestone-review-${REVIEW_ID}.json)
jq -r '.result' /tmp/milestone-review-${REVIEW_ID}.json > /tmp/milestone-review-${REVIEW_ID}.md
- If REVIEWER_CLI=codex, extract CODEX_SESSION_ID from /tmp/milestone-review-${REVIEW_ID}.runner.out after the helper or fallback run. If the review text is only in .runner.out, move or copy the actual review body into /tmp/milestone-review-${REVIEW_ID}.md before verdict parsing.
- If REVIEWER_CLI=claude, promote stdout captured by the helper or fallback runner into the markdown review file:
cp /tmp/milestone-review-${REVIEW_ID}.runner.out /tmp/milestone-review-${REVIEW_ID}.md
Fallback is allowed only when the helper is missing or not executable.
Step 4: Read Review & Check Verdict
- Read /tmp/milestone-review-${REVIEW_ID}.md
- If the review failed, produced empty output, or reached helper timeout, also read:
  - /tmp/milestone-review-${REVIEW_ID}.stderr
  - /tmp/milestone-review-${REVIEW_ID}.status
  - /tmp/milestone-review-${REVIEW_ID}.runner.out
- Present review to the user:
## Milestone Review — Round N (reviewer: ${REVIEWER_CLI} / ${REVIEWER_MODEL})
[Reviewer feedback]
- While the reviewer is still running, keep waiting as long as fresh state=in-progress note="In progress N" heartbeats continue to appear roughly once per minute.
- Check verdict:
  - VERDICT: APPROVED with no P0, P1, or P2 findings -> proceed to Phase 4 Step 6 (commit & approve)
  - VERDICT: APPROVED with only P3 findings -> optionally fix the P3 items if they are cheap and safe, then proceed
  - VERDICT: REVISE or any P0, P1, or P2 finding -> go to Step 5
  - No clear verdict but P0, P1, and P2 are all `- None.` -> treat as approved
  - Helper state completed-empty-output -> treat as failed review attempt, surface stderr/status, fix invocation or prompt handling, then retry
  - Helper state needs-operator-decision -> surface status log and decide whether to extend the timeout, abort, or retry with different helper parameters
  - Max rounds (MAX_ROUNDS) reached -> present to user for manual decision (proceed or stop)
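The verdict branches can be summarized as a small decision function. This sketch covers only the verdict and round-limit cases; the two helper states (completed-empty-output and needs-operator-decision) still need the manual handling described above. The name `next_review_action` and its argument order are assumptions for illustration.

```shell
# Sketch: pick the next step after a review round.
# $1 = verdict (approved | revise | none), $2 = count of P0-P2 findings,
# $3 = current round, $4 = MAX_ROUNDS (default 10)
# Prints: commit | revise | ask-user
next_review_action() {
  local verdict="$1" blocking="$2" round="$3" max_rounds="${4:-10}"
  if [ "$blocking" -gt 0 ] || [ "$verdict" = "revise" ]; then
    # Still blocked: revise again unless the round budget is spent.
    if [ "$round" -ge "$max_rounds" ]; then echo "ask-user"; else echo "revise"; fi
  else
    # Approved, or no clear verdict with P0-P2 all empty.
    echo "commit"
  fi
}
```

Here commit maps to Phase 4 Step 6, revise to Step 5, and ask-user to the manual proceed-or-stop decision.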
Step 5: Address Feedback & Re-verify
- Address the reviewer findings in priority order (P0 -> P1 -> P2, then P3 when practical). Do NOT commit yet.
- Re-run verification (lint/typecheck/tests); all must pass.
- Update /tmp/milestone-${REVIEW_ID}.md with new diff and verification output.
Summarize revisions for the user:
### Revisions (Round N)
- [Change and reason, one bullet per issue addressed]
If a revision contradicts the user's explicit requirements, skip it and note it for the user.
Step 6: Re-submit to Reviewer (Rounds 2-N)
Rewrite /tmp/milestone-review-${REVIEW_ID}.sh for the next round. The script should contain the reviewer invocation only; do not run it directly.
If REVIEWER_CLI is codex:
Resume the existing session:
codex exec resume ${CODEX_SESSION_ID} \
-o /tmp/milestone-review-${REVIEW_ID}.md \
"I've addressed your feedback. Updated diff and verification output are in /tmp/milestone-${REVIEW_ID}.md.
Changes made:
[List specific changes]
Re-review using the same `## Summary`, `## Findings`, and `## Verdict` structure as before.
Keep findings ordered `P0` to `P3`, use `- None.` when a severity has no findings, and only use `VERDICT: APPROVED` when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking."
If resume fails (session expired), fall back to fresh codex exec with context about prior rounds.
If REVIEWER_CLI is claude:
Fresh call with accumulated context (Claude CLI has no session resume):
claude -p \
"You previously reviewed milestone M<N> and requested revisions.
Previous feedback summary: [key points from last review]
I've addressed your feedback. Updated diff and verification output are below.
$(cat /tmp/milestone-${REVIEW_ID}.md)
Changes made:
[List specific changes]
Re-review using the same `## Summary`, `## Findings`, and `## Verdict` structure as before.
Keep findings ordered `P0` to `P3`, use `- None.` when a severity has no findings, and only use `VERDICT: APPROVED` when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking." \
--model ${REVIEWER_MODEL} \
--strict-mcp-config \
--setting-sources user
If REVIEWER_CLI is cursor:
Resume the existing session:
cursor-agent --resume ${CURSOR_SESSION_ID} -p \
--mode=ask \
--model ${REVIEWER_MODEL} \
--trust \
--output-format json \
"I've addressed your feedback. Updated diff and verification output are in /tmp/milestone-${REVIEW_ID}.md.
Changes made:
[List specific changes]
Re-review using the same `## Summary`, `## Findings`, and `## Verdict` structure as before.
Keep findings ordered `P0` to `P3`, use `- None.` when a severity has no findings, and only use `VERDICT: APPROVED` when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking." \
> /tmp/milestone-review-${REVIEW_ID}.json
If resume fails, fall back to fresh cursor-agent -p with context about prior rounds.
Do not run jq extraction until after the helper or fallback execution completes, then extract /tmp/milestone-review-${REVIEW_ID}.md from the JSON response.
After updating /tmp/milestone-review-${REVIEW_ID}.sh, run the same helper/fallback flow from Round 1.
Return to Step 4.
Step 7: Cleanup Per Milestone
rm -f \
/tmp/milestone-${REVIEW_ID}.md \
/tmp/milestone-review-${REVIEW_ID}.md \
/tmp/milestone-review-${REVIEW_ID}.json \
/tmp/milestone-review-${REVIEW_ID}.stderr \
/tmp/milestone-review-${REVIEW_ID}.status \
/tmp/milestone-review-${REVIEW_ID}.runner.out \
/tmp/milestone-review-${REVIEW_ID}.sh
If the round failed, produced empty output, or reached operator-decision timeout, keep .stderr, .status, and .runner.out until the issue is diagnosed instead of deleting them immediately.
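The conditional cleanup can be sketched as one function with a keep-diagnostics switch. The function name `cleanup_review_files`, the yes/no flag, and the optional directory argument (defaulting to /tmp) are assumptions added for illustration.

```shell
# Sketch: remove per-milestone review files, optionally keeping diagnostics.
# $1 = REVIEW_ID, $2 = "yes" to keep .stderr/.status/.runner.out, $3 = directory (default /tmp)
cleanup_review_files() {
  local review_id="$1" keep_diagnostics="${2:-no}" dir="${3:-/tmp}"
  rm -f \
    "$dir/milestone-${review_id}.md" \
    "$dir/milestone-review-${review_id}.md" \
    "$dir/milestone-review-${review_id}.json" \
    "$dir/milestone-review-${review_id}.sh"
  if [ "$keep_diagnostics" != "yes" ]; then
    rm -f \
      "$dir/milestone-review-${review_id}.stderr" \
      "$dir/milestone-review-${review_id}.status" \
      "$dir/milestone-review-${review_id}.runner.out"
  fi
}
```

Call it with `yes` after a failed, empty, or timed-out round, and again with `no` once the issue has been diagnosed.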
Phase 6: Completion (REQUIRED SUB-SKILL)
After all milestones are approved and committed:
- Invoke superpowers:finishing-a-development-branch.
- Run full test suite one final time; all must pass.
- Merge the worktree branch into the parent branch:
# From the main repo (not the worktree)
git merge implement/<plan-folder-name>
- Delete the worktree and its branch:
git worktree remove <worktree-path>
git branch -d implement/<plan-folder-name>
- Mark plan status as completed in story-tracker.md.
Phase 7: Final Report
Present summary:
## Implementation Complete
**Plan:** <plan-folder-name>
**Milestones:** <N> completed, <N> approved
**Review rounds:** <total across all milestones>
**Branch:** implement/<plan-folder-name> (merged and deleted)
Phase 8: Telegram Notification (MANDATORY)
Resolve the Telegram notifier helper from the installed Codex skills directory:
TELEGRAM_NOTIFY_RUNTIME=~/.codex/skills/reviewer-runtime/notify-telegram.sh
On every terminal outcome for the implement-plan run (fully completed, stopped after max rounds, skipped reviewer, or failure), send a Telegram summary if the helper exists and both TELEGRAM_BOT_TOKEN and TELEGRAM_CHAT_ID are configured:
if [ -x "$TELEGRAM_NOTIFY_RUNTIME" ] && [ -n "${TELEGRAM_BOT_TOKEN:-}" ] && [ -n "${TELEGRAM_CHAT_ID:-}" ]; then
"$TELEGRAM_NOTIFY_RUNTIME" --message "implement-plan completed for <plan-folder-name>: <status summary>"
fi
Rules:
- Telegram is the only supported notification path. Do not use desktop notifications, say, email, or any other notifier.
- Notification failures are non-blocking, but they must be surfaced to the user.
- Before stopping for any user interaction, approval, or manual decision, send a Telegram summary first if configured.
- If Telegram is not configured, state that no Telegram notification was sent.
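These rules can be wrapped in one function that always returns success but reports what happened. This is a sketch only: the name `notify_telegram` and the exact status wording are assumptions, and it relies on the same helper path, `--message` flag, and environment variables shown above.

```shell
# Sketch: non-blocking Telegram notification that always reports its outcome.
notify_telegram() {
  local message="$1"
  local helper="${TELEGRAM_NOTIFY_RUNTIME:-$HOME/.codex/skills/reviewer-runtime/notify-telegram.sh}"
  if [ ! -x "$helper" ] || [ -z "${TELEGRAM_BOT_TOKEN:-}" ] || [ -z "${TELEGRAM_CHAT_ID:-}" ]; then
    echo "Telegram not configured; no notification sent."
    return 0
  fi
  if "$helper" --message "$message"; then
    echo "Telegram notification sent."
  else
    # Surface the failure, but never block the run on it.
    echo "Telegram notification failed (non-blocking)." >&2
  fi
  return 0
}
```

Calling it before every stop for user interaction, and on every terminal outcome, keeps the notification behavior in one place.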
Quick Reference
| Phase | Action | Required Output |
|---|---|---|
| 1 | Locate plan in ai_plan/ | Plan folder identified, files read |
| 2 | Configure reviewer CLI and model | REVIEWER_CLI, REVIEWER_MODEL, MAX_ROUNDS |
| 3 | Invoke superpowers:using-git-worktrees | Worktree created, baseline passing |
| 4 | Execute milestones (loop) | Stories tracked, verified, reviewed, committed |
| 5 | Milestone review loop (per milestone) | Reviewer approval or max rounds + user override |
| 6 | Invoke superpowers:finishing-a-development-branch | Branch merged to parent, worktree deleted |
| 7 | Final report | Summary presented |
| 8 | Send Telegram notification | User notified or notification status reported |
Tracker Discipline (MANDATORY)
ALWAYS update story-tracker.md before/after each story. NEVER proceed with stale tracker state.
Before starting any story:
- Open story-tracker.md
- Mark story in-dev
- Add notes if relevant
- Then begin implementation
After completing any story:
- Mark story completed
- Review pending stories
- Update Last Updated and Stories Complete counts
Note: Commit hashes are backfilled into story Notes after the milestone commit (Phase 4, Step 6), not per-story.
Common Mistakes
- Using deprecated commands like superpowers-codex bootstrap or superpowers-codex use-skill.
- Proceeding to milestone review with failing tests.
- Pushing commits before milestone approval.
- Skipping worktree setup and working directly on the main branch.
- Not capturing the Codex session ID for resume in subsequent review rounds.
- Forgetting to update story-tracker.md between stories.
- Creating a new worktree when one already exists for a resumed plan.
- Using any notification path other than Telegram.
Rationalizations and Counters
| Rationalization | Counter |
|---|---|
| "Bootstrap CLI is faster" | Deprecated for Codex; native discovery is the supported path. |
| "I can skip the worktree for small plans" | Worktree isolation is mandatory — it protects the main branch. |
| "Tests passed earlier, I can skip re-verification" | Each milestone must be independently verified before review. |
| "The reviewer approved, I can skip my own validation" | Reviewer feedback supplements but does not replace your own verification. |
| "I can commit before the reviewer approves" | Code must only be committed after milestone approval — reviewers evaluate uncommitted diffs. |
Red Flags - Stop and Correct
- You are about to run any superpowers-codex command.
- You are pushing commits without user approval.
- You did not announce which skill you invoked and why.
- You are proceeding to review with failing lint/typecheck/tests.
- You are skipping the worktree and working on the main branch.
- You are applying a reviewer suggestion that contradicts user requirements.
Verification Checklist
- Plan folder located and all required files present
- Reviewer configured or explicitly skipped
- Max review rounds confirmed (default: 10)
- Worktree created with branch implement/<plan-folder-name>
- Worktree directory verified in .gitignore
- Baseline tests pass in worktree
- Each milestone: stories tracked (in-dev -> completed)
- Each milestone: lint/typecheck/tests pass before review
- Each milestone: reviewer approved (or max rounds + user override)
- Each milestone: committed locally only after approval
- Each milestone: marked approved in story-tracker.md
- All milestones completed, approved, and committed
- Final test suite passes
- Worktree branch merged to parent and worktree deleted
- Story tracker updated with final status
- Telegram notification attempted if configured