Align reviewer runtime and Telegram notifications

2026-03-24 11:45:58 -05:00
parent 4d37674626
commit 63a048a26c
17 changed files with 1756 additions and 200 deletions
@@ -47,7 +47,10 @@ If the user has already specified a reviewer CLI and model (e.g., "create a plan
   - For `cursor`: **run `cursor-agent models` first** to see your account's available models (availability varies by subscription)
   - Accept any model string the user provides

-Store the chosen `REVIEWER_CLI` and `REVIEWER_MODEL` for Phase 6 (Iterative Plan Review).
+3. **Max review rounds for the plan?** (default: 10)
+   - If the user does not provide a value, set `MAX_ROUNDS=10`.
+
+Store the chosen `REVIEWER_CLI`, `REVIEWER_MODEL`, and `MAX_ROUNDS` for Phase 6 (Iterative Plan Review).
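
The default described above could be applied with a small guard like the one below. This is a minimal sketch, not part of the commit: `resolve_max_rounds` is a hypothetical helper name, and falling back to 10 for non-numeric or zero input is an assumption (the skill text only specifies the default for a missing value).

```bash
# Hypothetical helper (not in the commit): normalize the user's answer to a
# positive integer. Assumption: anything empty, non-numeric, or below 1
# falls back to the documented default of 10.
resolve_max_rounds() {
  local requested="${1:-}"
  case "$requested" in
    ''|*[!0-9]*)
      echo 10                      # empty or non-numeric: use the default
      ;;
    *)
      if [ "$requested" -ge 1 ]; then
        echo "$requested"
      else
        echo 10                    # zero rounds would skip review entirely
      fi
      ;;
  esac
}

MAX_ROUNDS="$(resolve_max_rounds "${MAX_ROUNDS:-}")"
```

Keeping the normalization in one function means Phase 6 can compare the round counter against `MAX_ROUNDS` without re-validating it.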

 ### Phase 4: Design (REQUIRED SUB-SKILL)
 - Invoke `superpowers:brainstorming` explicitly.
@@ -61,7 +64,7 @@ Store the chosen `REVIEWER_CLI` and `REVIEWER_MODEL` for Phase 6 (Iterative Plan

 ### Phase 6: Iterative Plan Review

-Send the plan to the configured reviewer CLI for feedback. Revise and re-submit until approved (max 5 rounds).
+Send the plan to the configured reviewer CLI for feedback. Revise and re-submit until approved, up to `MAX_ROUNDS` rounds (default 10).

 **Skip this phase entirely if reviewer was set to `skip`.**

@@ -86,10 +89,60 @@ Resolve the shared reviewer helper from the installed Claude Code skills directo
 REVIEWER_RUNTIME=~/.claude/skills/reviewer-runtime/run-review.sh
 ```

+Set helper success-artifact args before writing the command script:
+
+```bash
+# Only codex and cursor get a --success-file argument; other reviewers leave the array empty.
+HELPER_SUCCESS_FILE_ARGS=()
+case "$REVIEWER_CLI" in
+  codex)
+    HELPER_SUCCESS_FILE_ARGS+=(--success-file "/tmp/plan-review-${REVIEW_ID}.md")
+    ;;
+  cursor)
+    HELPER_SUCCESS_FILE_ARGS+=(--success-file "/tmp/plan-review-${REVIEW_ID}.json")
+    ;;
+esac
+```
+
 #### Step 2: Write Plan to Temp File

 Write the complete plan (milestones, stories, design decisions, specs) to `/tmp/plan-${REVIEW_ID}.md`.

+#### Review Contract (Applies to Every Round)
+
+The reviewer response must use this structure:
+
+```text
+## Summary
+...
+
+## Findings
+### P0
+- ...
+### P1
+- ...
+### P2
+- ...
+### P3
+- ...
+
+## Verdict
+VERDICT: APPROVED
+```
+
+Rules:
+- Order findings from `P0` to `P3`.
+- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
+- Use `- None.` when a severity has no findings.
+- `VERDICT: APPROVED` is allowed only when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking.
+- The calling agent should still try to fix `P3` findings when they are cheap and safe.
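
The approval rule above can be checked mechanically. The following is a rough sketch only, assuming the reviewer followed the structure exactly; `review_is_approved` is a hypothetical function name, and the check counts only top-level `- ` bullets as findings, so continuation lines under a bullet are ignored.

```bash
# Hypothetical helper (not in the commit): exit 0 only when the review file
# contains exactly one "VERDICT: APPROVED" line, no "VERDICT: REVISE" line,
# and the P0, P1, and P2 sections hold nothing but "- None." bullets.
review_is_approved() {
  local review_file="$1"
  awk '
    # Entering a severity section: remember which one (P0..P3).
    /^### P[0-3][[:space:]]*$/ { section = $2; next }
    # Any level-2 heading (Summary, Findings, Verdict) ends the section.
    /^## / { section = ""; next }
    /^VERDICT: APPROVED[[:space:]]*$/ { approved++; next }
    /^VERDICT: REVISE[[:space:]]*$/ { revise++; next }
    # A bullet other than "- None." inside P0, P1, or P2 is blocking.
    section ~ /^P[0-2]$/ && /^- / && !/^- None\.[[:space:]]*$/ { blocking++ }
    END { exit !(approved == 1 && revise == 0 && blocking == 0) }
  ' "$review_file"
}
```

A caller might use it as `if review_is_approved "/tmp/plan-review-${REVIEW_ID}.md"; then ...`, treating any non-zero exit as a signal to revise rather than as proof of a malformed review.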
+
+#### Liveness Contract (Applies While Review Is Running)
+
+- The shared reviewer runtime emits `state=in-progress note="In progress N"` heartbeats every 60 seconds while the reviewer child is alive.
+- The calling agent must keep waiting as long as a fresh `In progress N` heartbeat keeps arriving roughly once per minute.
+- Do not abort just because the review is slow, a soft timeout fired, or a `stall-warning` line appears, as long as the `In progress N` heartbeat continues.
+- Treat missing heartbeats, `state=failed`, `state=completed-empty-output`, and `state=needs-operator-decision` as escalation signals.
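
One way to apply the waiting rule above is to compare the heartbeat counter between polls. This is a sketch under assumptions: it presumes the status file accumulates the same `note="In progress N"` lines that appear on stdout, which this document does not confirm, and both function names are hypothetical.

```bash
# Hypothetical helpers (not in the commit). Assumption: the status file
# contains heartbeat lines of the form: state=in-progress note="In progress N"

# Print the N from the most recent heartbeat line, or nothing if none exists.
latest_heartbeat() {
  local status_file="$1"
  grep -o 'note="In progress [0-9][0-9]*"' "$status_file" 2>/dev/null \
    | tail -n 1 \
    | tr -dc '0-9'
}

# Succeed when the counter moved forward since the previous poll.
heartbeat_advanced() {
  local previous="${1:-0}" current="${2:-0}"
  [ "$current" -gt "$previous" ]
}

# Example polling step, run roughly once per minute by the caller:
#   current="$(latest_heartbeat "/tmp/plan-review-${REVIEW_ID}.status")"
#   if heartbeat_advanced "$previous" "$current"; then previous="$current"; fi
```

Comparing counters rather than timestamps avoids depending on `stat` flags, which differ between GNU and BSD systems.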
+
 #### Step 3: Submit to Reviewer (Round 1)

 Write the reviewer invocation to `/tmp/plan-review-${REVIEW_ID}.sh` as a bash script:
@@ -113,8 +166,21 @@ codex exec \
 4. Alternatives — Is there a simpler or better approach?
 5. Security — Any security concerns?

-Be specific and actionable. If the plan is solid, end with exactly: VERDICT: APPROVED
-If changes are needed, end with exactly: VERDICT: REVISE"
+Return exactly these sections in order:
+## Summary
+## Findings
+### P0
+### P1
+### P2
+### P3
+## Verdict
+
+Rules:
+- Order findings from highest severity to lowest.
+- Use `- None.` when a severity has no findings.
+- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
+- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`
+- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking."
 ```

 Do not try to capture the Codex session ID yet. When using the helper, extract it from `/tmp/plan-review-${REVIEW_ID}.runner.out` after the command completes (look for `session id: <uuid>`), then store it as `CODEX_SESSION_ID` for resume in subsequent rounds.
@@ -133,8 +199,21 @@ $(cat /tmp/plan-${REVIEW_ID}.md)
 4. Alternatives — Is there a simpler or better approach?
 5. Security — Any security concerns?

-Be specific and actionable. If the plan is solid, end with exactly: VERDICT: APPROVED
-If changes are needed, end with exactly: VERDICT: REVISE" \
+Return exactly these sections in order:
+## Summary
+## Findings
+### P0
+### P1
+### P2
+### P3
+## Verdict
+
+Rules:
+- Order findings from highest severity to lowest.
+- Use `- None.` when a severity has no findings.
+- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
+- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`
+- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking." \
  --model ${REVIEWER_MODEL} \
  --strict-mcp-config \
  --setting-sources user
@@ -155,8 +234,21 @@ cursor-agent -p \
 4. Alternatives — Is there a simpler or better approach?
 5. Security — Any security concerns?

-Be specific and actionable. If the plan is solid, end with exactly: VERDICT: APPROVED
-If changes are needed, end with exactly: VERDICT: REVISE" \
+Return exactly these sections in order:
+## Summary
+## Findings
+### P0
+### P1
+### P2
+### P3
+## Verdict
+
+Rules:
+- Order findings from highest severity to lowest.
+- Use `- None.` when a severity has no findings.
+- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
+- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`
+- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking." \
  > /tmp/plan-review-${REVIEW_ID}.json
 ```

@@ -170,13 +262,16 @@ if [ -x "$REVIEWER_RUNTIME" ]; then
    --command-file /tmp/plan-review-${REVIEW_ID}.sh \
    --stdout-file /tmp/plan-review-${REVIEW_ID}.runner.out \
    --stderr-file /tmp/plan-review-${REVIEW_ID}.stderr \
-    --status-file /tmp/plan-review-${REVIEW_ID}.status
+    --status-file /tmp/plan-review-${REVIEW_ID}.status \
+    "${HELPER_SUCCESS_FILE_ARGS[@]}"
 else
  echo "Warning: reviewer runtime helper not found at $REVIEWER_RUNTIME; falling back to direct synchronous review." >&2
  bash /tmp/plan-review-${REVIEW_ID}.sh >/tmp/plan-review-${REVIEW_ID}.runner.out 2>/tmp/plan-review-${REVIEW_ID}.stderr
 fi
 ```

+Run the helper in the foreground and watch its live stdout for `state=in-progress` heartbeats. If your agent environment buffers command output until exit, start the helper in the background and poll `/tmp/plan-review-${REVIEW_ID}.status` instead: heartbeats that only become visible after the helper exits cannot serve as liveness signals.
+
 After the command completes:
 - If `REVIEWER_CLI=cursor`, extract the final review text:

@@ -186,6 +281,13 @@ jq -r '.result' /tmp/plan-review-${REVIEW_ID}.json > /tmp/plan-review-${REVIEW_I
 ```

 - If `REVIEWER_CLI=codex`, extract `CODEX_SESSION_ID` from `/tmp/plan-review-${REVIEW_ID}.runner.out` after the helper or fallback run. If the review text is only in `.runner.out`, move or copy the actual review body into `/tmp/plan-review-${REVIEW_ID}.md` before verdict parsing.
+- If `REVIEWER_CLI=claude`, promote stdout captured by the helper or fallback runner into the markdown review file:
+
+```bash
+cp /tmp/plan-review-${REVIEW_ID}.runner.out /tmp/plan-review-${REVIEW_ID}.md
+```
+
+Fall back to running the command script directly only when the helper is missing or not executable.

 #### Step 4: Read Review & Check Verdict

@@ -202,17 +304,19 @@ jq -r '.result' /tmp/plan-review-${REVIEW_ID}.json > /tmp/plan-review-${REVIEW_I
 [Reviewer feedback]
 ```

-3. Check verdict:
-   - **VERDICT: APPROVED** → proceed to Phase 7 (Initialize workspace)
-   - **VERDICT: REVISE** → go to Step 5
-   - No clear verdict but positive / no actionable items → treat as approved
+3. While the reviewer is still running, keep waiting as long as fresh `state=in-progress note="In progress N"` heartbeats continue to appear roughly once per minute.
+4. Check verdict:
+   - **VERDICT: APPROVED** with no `P0`, `P1`, or `P2` findings → proceed to Phase 7 (Initialize workspace)
+   - **VERDICT: APPROVED** with only `P3` findings → optionally fix the `P3` items if they are cheap and safe, then proceed
+   - **VERDICT: REVISE** or any `P0`, `P1`, or `P2` finding → go to Step 5
+   - No clear verdict but `P0`, `P1`, and `P2` are all `- None.` → treat as approved
   - Helper state `completed-empty-output` → treat as failed review attempt, surface stderr/status, fix invocation or prompt handling, then retry
-   - Helper state `needs-operator-decision` → surface status log and decide whether to keep waiting, abort, or retry with different helper parameters
-   - Max rounds (5) reached → proceed with warning
+   - Helper state `needs-operator-decision` → surface status log and decide whether to extend the timeout, abort, or retry with different helper parameters
+   - Max rounds (`MAX_ROUNDS`) reached → present the outcome to the user for a manual decision (proceed or stop)

 #### Step 5: Revise the Plan

-Address each issue the reviewer raised. Update the plan in conversation context and rewrite `/tmp/plan-${REVIEW_ID}.md`.
+Address the reviewer findings in priority order (`P0` → `P1` → `P2`, then `P3` when practical). Update the plan in conversation context and rewrite `/tmp/plan-${REVIEW_ID}.md`.

 Summarize revisions for the user:

@@ -223,7 +327,9 @@ Summarize revisions for the user:

 If a revision contradicts the user's explicit requirements, skip it and note it for the user.

-#### Step 6: Re-submit to Reviewer (Rounds 2-5)
+#### Step 6: Re-submit to Reviewer (Rounds 2-N)
+
+Rewrite `/tmp/plan-review-${REVIEW_ID}.sh` for the next round. The script should contain the reviewer invocation only; do not run it directly.

 **If `REVIEWER_CLI` is `codex`:**

@@ -237,8 +343,8 @@ codex exec resume ${CODEX_SESSION_ID} \
 Changes made:
 [List specific changes]

-Re-review. If solid, end with: VERDICT: APPROVED
-If more changes needed, end with: VERDICT: REVISE"
+Re-review using the same `## Summary`, `## Findings`, and `## Verdict` structure as before.
+Keep findings ordered `P0` to `P3`, use `- None.` when a severity has no findings, and only use `VERDICT: APPROVED` when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking."
 ```

 If resume fails (session expired), fall back to fresh `codex exec` with context about prior rounds.
@@ -260,8 +366,8 @@ $(cat /tmp/plan-${REVIEW_ID}.md)
 Changes made:
 [List specific changes]

-Re-review the full plan. If solid, end with: VERDICT: APPROVED
-If more changes needed, end with: VERDICT: REVISE" \
+Re-review the full plan using the same `## Summary`, `## Findings`, and `## Verdict` structure as before.
+Keep findings ordered `P0` to `P3`, use `- None.` when a severity has no findings, and only use `VERDICT: APPROVED` when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking." \
  --model ${REVIEWER_MODEL} \
  --strict-mcp-config \
  --setting-sources user
@@ -282,8 +388,8 @@ cursor-agent --resume ${CURSOR_SESSION_ID} -p \
 Changes made:
 [List specific changes]

-Re-review. If solid, end with: VERDICT: APPROVED
-If more changes needed, end with: VERDICT: REVISE" \
+Re-review using the same `## Summary`, `## Findings`, and `## Verdict` structure as before.
+Keep findings ordered `P0` to `P3`, use `- None.` when a severity has no findings, and only use `VERDICT: APPROVED` when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking." \
  > /tmp/plan-review-${REVIEW_ID}.json

 jq -r '.result' /tmp/plan-review-${REVIEW_ID}.json > /tmp/plan-review-${REVIEW_ID}.md
@@ -302,7 +408,7 @@ Return to Step 4.

 **Status:** Approved after N round(s)
 [or]
-**Status:** Max rounds (5) reached — not fully approved
+**Status:** Max rounds (`MAX_ROUNDS`) reached — not fully approved

 [Final feedback / remaining concerns]
 ```
@@ -354,6 +460,27 @@ When handing off to execution, instruct:

 Private plan files under `~/.claude/plans/` are planning artifacts and must not be used as execution source of truth.

+### Phase 10: Telegram Completion Notification (MANDATORY)
+
+Resolve the Telegram notifier helper from the installed Claude Code skills directory:
+
+```bash
+TELEGRAM_NOTIFY_RUNTIME=~/.claude/skills/reviewer-runtime/notify-telegram.sh
+```
+
+On every terminal outcome for the create-plan run (approved, max rounds reached, skipped reviewer, or failure), send a Telegram summary if the helper is executable and both `TELEGRAM_BOT_TOKEN` and `TELEGRAM_CHAT_ID` are set:
+
+```bash
+if [ -x "$TELEGRAM_NOTIFY_RUNTIME" ] && [ -n "${TELEGRAM_BOT_TOKEN:-}" ] && [ -n "${TELEGRAM_CHAT_ID:-}" ]; then
+  "$TELEGRAM_NOTIFY_RUNTIME" --message "create-plan completed for <plan-folder-name>: <status summary>"
+fi
+```
+
+Rules:
+- Telegram is the only supported completion notification path. Do not use desktop notifications, `say`, email, or any other notifier.
+- Notification failures are non-blocking, but they must be surfaced to the user.
+- If Telegram is not configured, state that no completion notification was sent.
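
The three rules above could be combined in a wrapper along these lines. It is a sketch, not the commit's implementation: `notify_completion` is a hypothetical name, the status messages are illustrative, and only the `--message` flag and the helper path are taken from the text above.

```bash
# Hypothetical wrapper (not in the commit): never fails the caller, reports
# a skipped notification on stdout and a failed one on stderr.
notify_completion() {
  local message="$1"
  local notifier="${TELEGRAM_NOTIFY_RUNTIME:-$HOME/.claude/skills/reviewer-runtime/notify-telegram.sh}"

  # Not configured: say so explicitly instead of staying silent.
  if [ ! -x "$notifier" ] || [ -z "${TELEGRAM_BOT_TOKEN:-}" ] || [ -z "${TELEGRAM_CHAT_ID:-}" ]; then
    echo "Telegram not configured; no completion notification was sent."
    return 0
  fi

  if "$notifier" --message "$message"; then
    echo "Telegram completion notification sent."
  else
    # $? here is the exit status of the notifier call in the condition.
    echo "Warning: Telegram notification failed (exit $?); continuing." >&2
  fi
  return 0
}
```

Returning 0 in every branch keeps a notifier outage from changing the outcome of the create-plan run, while the printed lines give the agent something concrete to relay to the user.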
+
 ## Tracker Discipline (MANDATORY)

 **ALWAYS update `story-tracker.md` before/after each story. NEVER proceed with stale tracker state.**
@@ -392,6 +519,7 @@ After completing any story:
 - [ ] `.gitignore` ignore-rule commit was created if needed
 - [ ] Plan directory created under `ai_plan/YYYY-MM-DD-<short-title>/`
 - [ ] Reviewer configured or explicitly skipped
+- [ ] Max review rounds confirmed (default: 10)
 - [ ] Plan review completed (approved or max rounds) — or skipped
 - [ ] `original-plan.md` copied from `~/.claude/plans/` plan file
 - [ ] `final-transcript.md` present
@@ -399,6 +527,7 @@ After completing any story:
 - [ ] `story-tracker.md` created with all stories as `pending`
 - [ ] `continuation-runbook.md` present
 - [ ] Handoff explicitly says to read runbook first and execute from plan folder
+- [ ] Telegram completion notification attempted if configured

 ## Exit Triggers for Question Phase
 User says: "ready", "done", "let's plan", "proceed", "enough questions"