Align reviewer runtime and Telegram notifications

Stefano Fiorini
2026-03-24 11:45:58 -05:00
parent 4d37674626
commit 63a048a26c
17 changed files with 1756 additions and 200 deletions


@@ -18,6 +18,7 @@ Create structured implementation plans with milestone and story tracking, and op
- Claude Code: `~/.claude/skills/reviewer-runtime/run-review.sh` - Claude Code: `~/.claude/skills/reviewer-runtime/run-review.sh`
- OpenCode: `~/.config/opencode/skills/reviewer-runtime/run-review.sh` - OpenCode: `~/.config/opencode/skills/reviewer-runtime/run-review.sh`
- Cursor: `.cursor/skills/reviewer-runtime/run-review.sh` or `~/.cursor/skills/reviewer-runtime/run-review.sh` - Cursor: `.cursor/skills/reviewer-runtime/run-review.sh` or `~/.cursor/skills/reviewer-runtime/run-review.sh`
- Telegram notification setup is documented in [TELEGRAM-NOTIFICATIONS.md](./TELEGRAM-NOTIFICATIONS.md)
If dependencies are missing, stop and return: If dependencies are missing, stop and return:
@@ -115,10 +116,13 @@ Verify Superpowers dependencies exist in your agent skills root:
- Creates plans under `ai_plan/YYYY-MM-DD-<short-title>/`. - Creates plans under `ai_plan/YYYY-MM-DD-<short-title>/`.
- Ensures `/ai_plan/` is in `.gitignore`. - Ensures `/ai_plan/` is in `.gitignore`.
- Commits `.gitignore` update locally when added. - Commits `.gitignore` update locally when added.
- Asks which reviewer CLI and model to use (or accepts `skip` for no review). - Asks which reviewer CLI, model, and max rounds to use (or accepts `skip` for no review).
- Iteratively reviews the plan with the chosen reviewer (max 5 rounds) before generating files. - Iteratively reviews the plan with the chosen reviewer (default max 10 rounds) before generating files.
- Runs reviewer commands through `reviewer-runtime/run-review.sh` when available, with fallback to direct synchronous execution only if the helper is missing. - Runs reviewer commands through `reviewer-runtime/run-review.sh` when available, with fallback to direct synchronous execution only if the helper is missing.
- Waits as long as the reviewer runtime keeps emitting per-minute `In progress N` heartbeats.
- Requires reviewer findings to be ordered `P0` through `P3`, with `P3` explicitly non-blocking.
- Captures reviewer stderr and helper status logs for diagnostics and retains them on failed, empty-output, or operator-decision review rounds. - Captures reviewer stderr and helper status logs for diagnostics and retains them on failed, empty-output, or operator-decision review rounds.
- Sends completion notifications through Telegram only when the shared setup in [TELEGRAM-NOTIFICATIONS.md](./TELEGRAM-NOTIFICATIONS.md) is installed and configured.
- Produces: - Produces:
- `original-plan.md` - `original-plan.md`
- `final-transcript.md` - `final-transcript.md`
@@ -130,13 +134,24 @@ Verify Superpowers dependencies exist in your agent skills root:
After the plan is created (design + milestones + stories), the skill sends it to a second model for review: After the plan is created (design + milestones + stories), the skill sends it to a second model for review:
1. **Configure** — user picks a reviewer CLI (`codex`, `claude`, `cursor`) and model, or skips 1. **Configure** — user picks a reviewer CLI (`codex`, `claude`, `cursor`), a model, and optional max rounds (default 10), or skips
2. **Prepare** — plan payload and a bash reviewer command script are written to temp files 2. **Prepare** — plan payload and a bash reviewer command script are written to temp files
3. **Run** — the command script is executed through `reviewer-runtime/run-review.sh` when installed 3. **Run** — the command script is executed through `reviewer-runtime/run-review.sh` when installed
4. **Feedback** — reviewer evaluates correctness, risks, missing steps, alternatives, security 4. **Feedback** — reviewer evaluates correctness, risks, missing steps, alternatives, security, and returns `## Summary`, `## Findings`, and `## Verdict`
5. **Revise** — the planning agent addresses each issue and re-submits 5. **Prioritize** — findings are ordered `P0`, `P1`, `P2`, `P3`
6. **Repeat** — up to 5 rounds until the reviewer returns `VERDICT: APPROVED` 6. **Revise** — the planning agent addresses findings in priority order and re-submits
7. **Finalize** — approved plan is used to generate the plan file package 7. **Repeat** — up to max rounds until the reviewer returns `VERDICT: APPROVED`
8. **Finalize** — approved plan is used to generate the plan file package
### Reviewer Output Contract
- `P0` = total blocker
- `P1` = major risk
- `P2` = must-fix before approval
- `P3` = cosmetic / nice to have
- Each severity section should use `- None.` when empty
- `VERDICT: APPROVED` is valid only when no `P0`, `P1`, or `P2` findings remain
- `P3` findings are non-blocking, but the caller should still try to fix them when cheap and safe
### Runtime Artifacts ### Runtime Artifacts
@@ -153,16 +168,18 @@ The review flow may create these temp artifacts:
Status log lines use this format: Status log lines use this format:
```text ```text
ts=<ISO-8601> level=<info|warn|error> state=<running-silent|running-active|stall-warning|completed|completed-empty-output|failed|needs-operator-decision> elapsed_s=<int> pid=<int> stdout_bytes=<int> stderr_bytes=<int> note="<short message>" ts=<ISO-8601> level=<info|warn|error> state=<running-silent|running-active|in-progress|stall-warning|completed|completed-empty-output|failed|needs-operator-decision> elapsed_s=<int> pid=<int> stdout_bytes=<int> stderr_bytes=<int> note="<short message>"
``` ```
`stall-warning` is a heartbeat/status-log state only. It is not a terminal review result. `in-progress` is the liveness heartbeat emitted roughly once per minute with `note="In progress N"`.
`stall-warning` is a non-terminal status-log state only. It does not mean the caller should stop waiting if `in-progress` heartbeats continue.
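A caller that polls the status log needs to pull individual fields out of each line. The following is a minimal sketch, assuming the line follows the documented `key=value` layout with `note` as the only quoted field and the last one on the line. `status_field` is a hypothetical name used for illustration and is not part of `run-review.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of run-review.sh): print the value of one
# key from a status log line in the documented key=value format.
status_field() {
  local key="$1" line="$2" token
  local -a tokens
  if [ "$key" = "note" ]; then
    # Assumption: note is the only quoted field and ends the line.
    case "$line" in
      *' note="'*'"')
        line="${line#* note=\"}"
        printf '%s\n' "${line%\"}"
        return 0 ;;
    esac
    return 1
  fi
  # Drop the quoted note so its free text cannot shadow a real field.
  line="${line%% note=\"*}"
  read -r -a tokens <<< "$line"
  for token in "${tokens[@]}"; do
    case "$token" in
      "$key"=*) printf '%s\n' "${token#*=}"; return 0 ;;
    esac
  done
  return 1
}

# Example line with made-up values.
sample='ts=2026-03-24T11:45:58-05:00 level=info state=in-progress elapsed_s=120 pid=4242 stdout_bytes=0 stderr_bytes=0 note="In progress 2"'
status_field state "$sample"
status_field note "$sample"
```

The function returns a non-zero status when the key is absent, so a caller can tell a missing field from an empty one.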
### Failure Handling ### Failure Handling
- `completed-empty-output` means the reviewer exited without producing review text; surface `.stderr` and `.status`, then retry only after diagnosing the cause. - `completed-empty-output` means the reviewer exited without producing review text; surface `.stderr` and `.status`, then retry only after diagnosing the cause.
- `needs-operator-decision` means the helper reached hard-timeout escalation; surface `.status` and decide whether to keep waiting, abort, or retry with different parameters. - `needs-operator-decision` means the helper reached hard-timeout escalation; surface `.status` and decide whether to extend the timeout, abort, or retry with different parameters.
- Successful rounds clean up temp artifacts. Failed, empty-output, and operator-decision rounds should retain `.stderr`, `.status`, and `.runner.out` until diagnosed. - Successful rounds clean up temp artifacts. Failed, empty-output, and operator-decision rounds should retain `.stderr`, `.status`, and `.runner.out` until diagnosed.
- As long as fresh `in-progress` heartbeats continue to arrive roughly once per minute, the caller should keep waiting.
### Supported Reviewer CLIs ### Supported Reviewer CLIs
@@ -178,6 +195,12 @@ For all three CLIs, the preferred execution path is:
2. run that script through `reviewer-runtime/run-review.sh` 2. run that script through `reviewer-runtime/run-review.sh`
3. fall back to direct synchronous execution only if the helper is missing or not executable 3. fall back to direct synchronous execution only if the helper is missing or not executable
## Notifications
- Telegram is the only supported completion notification path.
- Shared setup: [TELEGRAM-NOTIFICATIONS.md](./TELEGRAM-NOTIFICATIONS.md)
- Notification failures are non-blocking, but they must be surfaced to the user.
## Template Guardrails ## Template Guardrails
All plan templates now include guardrail sections that enforce: All plan templates now include guardrail sections that enforce:


@@ -25,6 +25,7 @@ Execute an existing plan (created by `create-plan`) in an isolated git worktree,
- Claude Code: `~/.claude/skills/reviewer-runtime/run-review.sh` - Claude Code: `~/.claude/skills/reviewer-runtime/run-review.sh`
- OpenCode: `~/.config/opencode/skills/reviewer-runtime/run-review.sh` - OpenCode: `~/.config/opencode/skills/reviewer-runtime/run-review.sh`
- Cursor: `.cursor/skills/reviewer-runtime/run-review.sh` or `~/.cursor/skills/reviewer-runtime/run-review.sh` - Cursor: `.cursor/skills/reviewer-runtime/run-review.sh` or `~/.cursor/skills/reviewer-runtime/run-review.sh`
- Telegram notification setup is documented in [TELEGRAM-NOTIFICATIONS.md](./TELEGRAM-NOTIFICATIONS.md)
If dependencies are missing, stop and return: If dependencies are missing, stop and return:
@@ -133,10 +134,13 @@ Verify Superpowers execution dependencies exist in your agent skills root:
- Runs lint/typecheck/tests as a gate before each milestone review. - Runs lint/typecheck/tests as a gate before each milestone review.
- Sends each milestone to a reviewer CLI for approval (max rounds configurable, default 10). - Sends each milestone to a reviewer CLI for approval (max rounds configurable, default 10).
- Runs reviewer commands through `reviewer-runtime/run-review.sh` when available, with fallback to direct synchronous execution only if the helper is missing. - Runs reviewer commands through `reviewer-runtime/run-review.sh` when available, with fallback to direct synchronous execution only if the helper is missing.
- Waits as long as the reviewer runtime keeps emitting per-minute `In progress N` heartbeats.
- Requires reviewer findings to be ordered `P0` through `P3`, with `P3` explicitly non-blocking.
- Captures reviewer stderr and helper status logs for diagnostics and retains them on failed, empty-output, or operator-decision review rounds. - Captures reviewer stderr and helper status logs for diagnostics and retains them on failed, empty-output, or operator-decision review rounds.
- Commits each milestone locally only after reviewer approval (does not push). - Commits each milestone locally only after reviewer approval (does not push).
- After all milestones approved, merges worktree branch to parent and deletes worktree. - After all milestones approved, merges worktree branch to parent and deletes worktree.
- Supports resume: detects existing worktree and `in-dev`/`completed` stories. - Supports resume: detects existing worktree and `in-dev`/`completed` stories.
- Sends completion notifications through Telegram only when the shared setup in [TELEGRAM-NOTIFICATIONS.md](./TELEGRAM-NOTIFICATIONS.md) is installed and configured.
## Milestone Review Loop ## Milestone Review Loop
@@ -145,11 +149,22 @@ After each milestone is implemented and verified, the skill sends it to a second
1. **Configure** — user picks a reviewer CLI (`codex`, `claude`, `cursor`) and model, or skips 1. **Configure** — user picks a reviewer CLI (`codex`, `claude`, `cursor`) and model, or skips
2. **Prepare** — milestone payload and a bash reviewer command script are written to temp files 2. **Prepare** — milestone payload and a bash reviewer command script are written to temp files
3. **Run** — the command script is executed through `reviewer-runtime/run-review.sh` when installed 3. **Run** — the command script is executed through `reviewer-runtime/run-review.sh` when installed
4. **Feedback** — reviewer evaluates correctness, acceptance criteria, code quality, test coverage, security 4. **Feedback** — reviewer evaluates correctness, acceptance criteria, code quality, test coverage, security, and returns `## Summary`, `## Findings`, and `## Verdict`
5. **Revise** — the implementing agent addresses each issue, re-verifies, and re-submits 5. **Prioritize** — findings are ordered `P0`, `P1`, `P2`, `P3`
6. **Repeat** — up to max rounds (default 10) until the reviewer returns `VERDICT: APPROVED` 6. **Revise** — the implementing agent addresses findings in priority order, re-verifies, and re-submits
7. **Repeat** — up to max rounds (default 10) until the reviewer returns `VERDICT: APPROVED`
7. **Approve** — milestone is marked approved in `story-tracker.md` 7. **Approve** — milestone is marked approved in `story-tracker.md`
### Reviewer Output Contract
- `P0` = total blocker
- `P1` = major risk
- `P2` = must-fix before approval
- `P3` = cosmetic / nice to have
- Each severity section should use `- None.` when empty
- `VERDICT: APPROVED` is valid only when no `P0`, `P1`, or `P2` findings remain
- `P3` findings are non-blocking, but the caller should still try to fix them when cheap and safe
### Runtime Artifacts ### Runtime Artifacts
The milestone review flow may create these temp artifacts: The milestone review flow may create these temp artifacts:
@@ -165,16 +180,18 @@ The milestone review flow may create these temp artifacts:
Status log lines use this format: Status log lines use this format:
```text ```text
ts=<ISO-8601> level=<info|warn|error> state=<running-silent|running-active|stall-warning|completed|completed-empty-output|failed|needs-operator-decision> elapsed_s=<int> pid=<int> stdout_bytes=<int> stderr_bytes=<int> note="<short message>" ts=<ISO-8601> level=<info|warn|error> state=<running-silent|running-active|in-progress|stall-warning|completed|completed-empty-output|failed|needs-operator-decision> elapsed_s=<int> pid=<int> stdout_bytes=<int> stderr_bytes=<int> note="<short message>"
``` ```
`stall-warning` is a heartbeat/status-log state only. It is not a terminal review result. `in-progress` is the liveness heartbeat emitted roughly once per minute with `note="In progress N"`.
`stall-warning` is a non-terminal status-log state only. It does not mean the caller should stop waiting if `in-progress` heartbeats continue.
### Failure Handling ### Failure Handling
- `completed-empty-output` means the reviewer exited without producing review text; surface `.stderr` and `.status`, then retry only after diagnosing the cause. - `completed-empty-output` means the reviewer exited without producing review text; surface `.stderr` and `.status`, then retry only after diagnosing the cause.
- `needs-operator-decision` means the helper reached hard-timeout escalation; surface `.status` and decide whether to keep waiting, abort, or retry with different parameters. - `needs-operator-decision` means the helper reached hard-timeout escalation; surface `.status` and decide whether to extend the timeout, abort, or retry with different parameters.
- Successful rounds clean up temp artifacts. Failed, empty-output, and operator-decision rounds should retain `.stderr`, `.status`, and `.runner.out` until diagnosed. - Successful rounds clean up temp artifacts. Failed, empty-output, and operator-decision rounds should retain `.stderr`, `.status`, and `.runner.out` until diagnosed.
- As long as fresh `in-progress` heartbeats continue to arrive roughly once per minute, the caller should keep waiting.
### Supported Reviewer CLIs ### Supported Reviewer CLIs
@@ -190,6 +207,12 @@ For all three CLIs, the preferred execution path is:
2. run that script through `reviewer-runtime/run-review.sh` 2. run that script through `reviewer-runtime/run-review.sh`
3. fall back to direct synchronous execution only if the helper is missing or not executable 3. fall back to direct synchronous execution only if the helper is missing or not executable
## Notifications
- Telegram is the only supported completion notification path.
- Shared setup: [TELEGRAM-NOTIFICATIONS.md](./TELEGRAM-NOTIFICATIONS.md)
- Notification failures are non-blocking, but they must be surfaced to the user.
The helper also supports manual override flags for diagnostics: The helper also supports manual override flags for diagnostics:
```bash ```bash


@@ -7,6 +7,7 @@ This directory contains user-facing docs for each skill.
- [ATLASSIAN.md](./ATLASSIAN.md) — Includes requirements, generated bundle sync, install, auth, safety rules, and usage examples for the Atlassian skill. - [ATLASSIAN.md](./ATLASSIAN.md) — Includes requirements, generated bundle sync, install, auth, safety rules, and usage examples for the Atlassian skill.
- [CREATE-PLAN.md](./CREATE-PLAN.md) — Includes requirements, install, verification, and execution workflow for create-plan. - [CREATE-PLAN.md](./CREATE-PLAN.md) — Includes requirements, install, verification, and execution workflow for create-plan.
- [IMPLEMENT-PLAN.md](./IMPLEMENT-PLAN.md) — Includes requirements, install, verification, and milestone review workflow for implement-plan. - [IMPLEMENT-PLAN.md](./IMPLEMENT-PLAN.md) — Includes requirements, install, verification, and milestone review workflow for implement-plan.
- [TELEGRAM-NOTIFICATIONS.md](./TELEGRAM-NOTIFICATIONS.md) — Shared Telegram notification setup used by reviewer-driven skills.
- [WEB-AUTOMATION.md](./WEB-AUTOMATION.md) — Includes requirements, install, dependency verification, and usage examples for web-automation. - [WEB-AUTOMATION.md](./WEB-AUTOMATION.md) — Includes requirements, install, dependency verification, and usage examples for web-automation.
## Repo Setup ## Repo Setup


@@ -0,0 +1,96 @@
# TELEGRAM-NOTIFICATIONS
## Purpose
Shared setup for Telegram completion notifications used by reviewer-driven skills such as `create-plan` and `implement-plan`.
## Requirements
- Telegram bot token in `TELEGRAM_BOT_TOKEN`
- Telegram chat id in `TELEGRAM_CHAT_ID`
- Notification helper installed beside the shared reviewer runtime:
- Codex: `~/.codex/skills/reviewer-runtime/notify-telegram.sh`
- Claude Code: `~/.claude/skills/reviewer-runtime/notify-telegram.sh`
- OpenCode: `~/.config/opencode/skills/reviewer-runtime/notify-telegram.sh`
- Cursor: `.cursor/skills/reviewer-runtime/notify-telegram.sh` or `~/.cursor/skills/reviewer-runtime/notify-telegram.sh`
## Install
The helper ships from `skills/reviewer-runtime/` together with `run-review.sh`.
### Codex
```bash
mkdir -p ~/.codex/skills/reviewer-runtime
cp -R skills/reviewer-runtime/* ~/.codex/skills/reviewer-runtime/
```
### Claude Code
```bash
mkdir -p ~/.claude/skills/reviewer-runtime
cp -R skills/reviewer-runtime/* ~/.claude/skills/reviewer-runtime/
```
### OpenCode
```bash
mkdir -p ~/.config/opencode/skills/reviewer-runtime
cp -R skills/reviewer-runtime/* ~/.config/opencode/skills/reviewer-runtime/
```
### Cursor
Repo-local install:
```bash
mkdir -p .cursor/skills/reviewer-runtime
cp -R skills/reviewer-runtime/* .cursor/skills/reviewer-runtime/
```
Global install:
```bash
mkdir -p ~/.cursor/skills/reviewer-runtime
cp -R skills/reviewer-runtime/* ~/.cursor/skills/reviewer-runtime/
```
## Verify Installation
```bash
test -x ~/.codex/skills/reviewer-runtime/notify-telegram.sh || true
test -x ~/.claude/skills/reviewer-runtime/notify-telegram.sh || true
test -x ~/.config/opencode/skills/reviewer-runtime/notify-telegram.sh || true
test -x .cursor/skills/reviewer-runtime/notify-telegram.sh || test -x ~/.cursor/skills/reviewer-runtime/notify-telegram.sh || true
```
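Because each check above ends in `|| true`, the commands exit successfully and print nothing whether or not the helper is present. A small sketch that reports the result per location may be easier to read. The paths are the ones listed under Requirements; `report_helper` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: say whether an executable helper exists at a path.
report_helper() {
  local path="$1"
  if [ -x "$path" ]; then
    printf 'installed: %s\n' "$path"
  else
    printf 'missing: %s\n' "$path"
  fi
}

# Check every documented install location, one line of output per path.
for candidate in \
  "$HOME/.codex/skills/reviewer-runtime/notify-telegram.sh" \
  "$HOME/.claude/skills/reviewer-runtime/notify-telegram.sh" \
  "$HOME/.config/opencode/skills/reviewer-runtime/notify-telegram.sh" \
  ".cursor/skills/reviewer-runtime/notify-telegram.sh" \
  "$HOME/.cursor/skills/reviewer-runtime/notify-telegram.sh"
do
  report_helper "$candidate"
done
```

Only the locations for the agents you actually use need to show `installed`.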
## Configure Telegram
Export the required variables before running a skill that sends completion notifications:
```bash
export TELEGRAM_BOT_TOKEN="<bot-token>"
export TELEGRAM_CHAT_ID="<chat-id>"
```
Optional:
```bash
export TELEGRAM_API_BASE_URL="https://api.telegram.org"
```
## Test the Helper
Example:
```bash
TELEGRAM_BOT_TOKEN="<bot-token>" \
TELEGRAM_CHAT_ID="<chat-id>" \
skills/reviewer-runtime/notify-telegram.sh --message "Telegram notification test"
```
## Rules
- Telegram is the only supported completion notification path for these skills.
- Notification failures are non-blocking, but they must be surfaced to the user.
- Skills should report when Telegram is not configured instead of silently pretending a notification was sent.
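The rules above can be sketched as a thin wrapper around the helper. This is an illustration under stated assumptions, not the skills' actual implementation: `notify_completion` and `NOTIFY_HELPER` are hypothetical names, the default path is the Claude Code location from Requirements, and only the `--message` flag shown in the test example is used.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper; NOTIFY_HELPER defaults to the Claude Code install path.
NOTIFY_HELPER="${NOTIFY_HELPER:-$HOME/.claude/skills/reviewer-runtime/notify-telegram.sh}"

# Send a completion message if Telegram is configured.
# Always returns 0 so a notification problem never blocks the skill,
# but every skipped or failed send is reported on stderr.
notify_completion() {
  local message="$1"
  if [ -z "${TELEGRAM_BOT_TOKEN:-}" ] || [ -z "${TELEGRAM_CHAT_ID:-}" ]; then
    echo "Telegram not configured (TELEGRAM_BOT_TOKEN / TELEGRAM_CHAT_ID); no notification sent." >&2
    return 0
  fi
  if [ ! -x "$NOTIFY_HELPER" ]; then
    echo "Telegram helper not executable at $NOTIFY_HELPER; no notification sent." >&2
    return 0
  fi
  if ! "$NOTIFY_HELPER" --message "$message"; then
    echo "Telegram notification failed; continuing without it." >&2
  fi
  return 0
}
```

A skill would call `notify_completion "create-plan finished"` at the end of its run and relay anything written to stderr to the user.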


@@ -47,7 +47,10 @@ If the user has already specified a reviewer CLI and model (e.g., "create a plan
- For `cursor`: **run `cursor-agent models` first** to see your account's available models (availability varies by subscription) - For `cursor`: **run `cursor-agent models` first** to see your account's available models (availability varies by subscription)
- Accept any model string the user provides - Accept any model string the user provides
Store the chosen `REVIEWER_CLI` and `REVIEWER_MODEL` for Phase 6 (Iterative Plan Review). 3. **Max review rounds for the plan?** (default: 10)
- If the user does not provide a value, set `MAX_ROUNDS=10`.
Store the chosen `REVIEWER_CLI`, `REVIEWER_MODEL`, and `MAX_ROUNDS` for Phase 6 (Iterative Plan Review).
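One way to apply the default is sketched below. The text only specifies the default of 10 when no value is given; falling back to 10 for non-numeric or zero input is an assumption of this sketch, and `resolve_max_rounds` and `USER_MAX_ROUNDS` are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: normalize a user-supplied round limit.
resolve_max_rounds() {
  local requested="${1:-}"
  case "$requested" in
    ''|*[!0-9]*)
      # Empty or non-numeric input: use the documented default.
      echo 10 ;;
    *)
      # Assumption: a limit below 1 is treated like a missing value.
      if [ "$requested" -ge 1 ]; then echo "$requested"; else echo 10; fi ;;
  esac
}

# USER_MAX_ROUNDS stands in for whatever the user answered, if anything.
MAX_ROUNDS="$(resolve_max_rounds "${USER_MAX_ROUNDS:-}")"
```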
### Phase 4: Design (REQUIRED SUB-SKILL) ### Phase 4: Design (REQUIRED SUB-SKILL)
- Invoke `superpowers:brainstorming` explicitly. - Invoke `superpowers:brainstorming` explicitly.
@@ -61,7 +64,7 @@ Store the chosen `REVIEWER_CLI` and `REVIEWER_MODEL` for Phase 6 (Iterative Plan
### Phase 6: Iterative Plan Review ### Phase 6: Iterative Plan Review
Send the plan to the configured reviewer CLI for feedback. Revise and re-submit until approved (max 5 rounds). Send the plan to the configured reviewer CLI for feedback. Revise and re-submit until approved (default max 10 rounds).
**Skip this phase entirely if reviewer was set to `skip`.** **Skip this phase entirely if reviewer was set to `skip`.**
@@ -86,10 +89,60 @@ Resolve the shared reviewer helper from the installed Claude Code skills directo
REVIEWER_RUNTIME=~/.claude/skills/reviewer-runtime/run-review.sh REVIEWER_RUNTIME=~/.claude/skills/reviewer-runtime/run-review.sh
``` ```
Set helper success-artifact args before writing the command script:
```bash
HELPER_SUCCESS_FILE_ARGS=()
case "$REVIEWER_CLI" in
codex)
HELPER_SUCCESS_FILE_ARGS+=(--success-file /tmp/plan-review-${REVIEW_ID}.md)
;;
cursor)
HELPER_SUCCESS_FILE_ARGS+=(--success-file /tmp/plan-review-${REVIEW_ID}.json)
;;
esac
```
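For `claude` the array stays empty. In bash releases before 4.4, expanding an empty array as `"${HELPER_SUCCESS_FILE_ARGS[@]}"` under `set -u` fails with an unbound-variable error, so callers that enable `set -u` may want the guarded expansion sketched here. The function names are hypothetical and exist only to make the behavior observable.

```bash
#!/usr/bin/env bash
# Same case statement as above, wrapped in a function for illustration.
build_success_args() {
  local cli="$1" review_id="$2"
  HELPER_SUCCESS_FILE_ARGS=()
  case "$cli" in
    codex)
      HELPER_SUCCESS_FILE_ARGS+=(--success-file "/tmp/plan-review-${review_id}.md") ;;
    cursor)
      HELPER_SUCCESS_FILE_ARGS+=(--success-file "/tmp/plan-review-${review_id}.json") ;;
  esac
}

# Print how many arguments were received.
count_args() { echo "$#"; }

# Runs in a subshell so set -u does not leak into the caller.
count_success_args() (
  set -u
  build_success_args "$1" "$2"
  # The ${array[@]+...} guard expands to nothing when the array is empty,
  # which also works on bash versions older than 4.4.
  count_args ${HELPER_SUCCESS_FILE_ARGS[@]+"${HELPER_SUCCESS_FILE_ARGS[@]}"}
)

count_success_args claude demo
count_success_args codex demo
```

The same guarded form can replace the plain array expansion wherever the arguments are passed to the helper.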
#### Step 2: Write Plan to Temp File #### Step 2: Write Plan to Temp File
Write the complete plan (milestones, stories, design decisions, specs) to `/tmp/plan-${REVIEW_ID}.md`. Write the complete plan (milestones, stories, design decisions, specs) to `/tmp/plan-${REVIEW_ID}.md`.
#### Review Contract (Applies to Every Round)
The reviewer response must use this structure:
```text
## Summary
...
## Findings
### P0
- ...
### P1
- ...
### P2
- ...
### P3
- ...
## Verdict
VERDICT: APPROVED
```
Rules:
- Order findings from `P0` to `P3`.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- Use `- None.` when a severity has no findings.
- `VERDICT: APPROVED` is allowed only when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking.
- The calling agent should still try to fix `P3` findings when they are cheap and safe.
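The approval rule can be checked mechanically before trusting a verdict line. The sketch below assumes the review file follows the structure above exactly, with one finding per `- ` bullet. The function names are hypothetical and the parsing is deliberately simple.

```bash
#!/usr/bin/env bash
# Count bullets under P0, P1, and P2 that are not the "- None." placeholder.
count_blocking_findings() {
  awk '
    /^### P[0-3][ \t]*$/ { section = $2; next }
    /^## /               { section = ""; next }
    section ~ /^P[012]$/ && /^- / && $0 !~ /^- None\.[ \t]*$/ { n++ }
    END { print n + 0 }
  ' "$1"
}

# Print the last verdict keyword found (APPROVED or REVISE), or nothing.
review_verdict() {
  awk '/^VERDICT: (APPROVED|REVISE)[ \t]*$/ { v = $2 } END { print v }' "$1"
}

# Succeed only when the verdict is APPROVED and nothing blocking remains.
approval_is_valid() {
  [ "$(review_verdict "$1")" = "APPROVED" ] &&
    [ "$(count_blocking_findings "$1")" -eq 0 ]
}
```

For example, `approval_is_valid /tmp/plan-review-${REVIEW_ID}.md || echo "approval rejected: blocking findings remain"` would flag a reviewer that approved while still listing a `P1` item.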
#### Liveness Contract (Applies While Review Is Running)
- The shared reviewer runtime emits `state=in-progress note="In progress N"` heartbeats every 60 seconds while the reviewer child is alive.
- The calling agent must keep waiting as long as a fresh `In progress N` heartbeat keeps arriving roughly once per minute.
- Do not abort just because the review is slow, a soft timeout fired, or a `stall-warning` line appears, as long as the `In progress N` heartbeat continues.
- Treat missing heartbeats, `state=failed`, `state=completed-empty-output`, and `state=needs-operator-decision` as escalation signals.
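The liveness rules reduce to a small decision on the latest state and the age of the last heartbeat. The sketch below is one possible reading: the 150-second tolerance (two missed beats plus slack) is an assumption, as are the names `liveness_action` and `HEARTBEAT_MAX_GAP_S`. The caller is expected to track the epoch time at which it last saw an `In progress N` line.

```bash
#!/usr/bin/env bash
# Assumption: allow up to 150 seconds between heartbeats before escalating.
HEARTBEAT_MAX_GAP_S="${HEARTBEAT_MAX_GAP_S:-150}"

# Decide what the caller should do next.
# Arguments: latest state, epoch of last heartbeat, current epoch.
liveness_action() {
  local state="$1" last_heartbeat_epoch="$2" now_epoch="$3"
  case "$state" in
    failed|completed-empty-output|needs-operator-decision)
      echo escalate
      return ;;
    completed)
      echo done
      return ;;
  esac
  # Slow reviews and stall-warning lines are fine while heartbeats are fresh.
  if [ $(( now_epoch - last_heartbeat_epoch )) -le "$HEARTBEAT_MAX_GAP_S" ]; then
    echo wait
  else
    echo escalate
  fi
}

liveness_action stall-warning 1000 1060
```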
#### Step 3: Submit to Reviewer (Round 1) #### Step 3: Submit to Reviewer (Round 1)
Write the reviewer invocation to `/tmp/plan-review-${REVIEW_ID}.sh` as a bash script: Write the reviewer invocation to `/tmp/plan-review-${REVIEW_ID}.sh` as a bash script:
@@ -113,8 +166,21 @@ codex exec \
4. Alternatives — Is there a simpler or better approach? 4. Alternatives — Is there a simpler or better approach?
5. Security — Any security concerns? 5. Security — Any security concerns?
Be specific and actionable. If the plan is solid, end with exactly: VERDICT: APPROVED Return exactly these sections in order:
If changes are needed, end with exactly: VERDICT: REVISE" ## Summary
## Findings
### P0
### P1
### P2
### P3
## Verdict
Rules:
- Order findings from highest severity to lowest.
- Use `- None.` when a severity has no findings.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`
- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking."
``` ```
Do not try to capture the Codex session ID yet. When using the helper, extract it from `/tmp/plan-review-${REVIEW_ID}.runner.out` after the command completes (look for `session id: <uuid>`), then store it as `CODEX_SESSION_ID` for resume in subsequent rounds. Do not try to capture the Codex session ID yet. When using the helper, extract it from `/tmp/plan-review-${REVIEW_ID}.runner.out` after the command completes (look for `session id: <uuid>`), then store it as `CODEX_SESSION_ID` for resume in subsequent rounds.
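Extracting the session ID might look like the sketch below. It assumes the `session id:` line carries a standard 36-character UUID and takes the first match. `extract_codex_session_id` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first UUID that follows "session id: ".
extract_codex_session_id() {
  sed -n 's/.*session id: \([0-9a-fA-F-]\{36\}\).*/\1/p' "$1" | head -n 1
}

# Usage after the review command has completed:
#   CODEX_SESSION_ID="$(extract_codex_session_id /tmp/plan-review-${REVIEW_ID}.runner.out)"
```

An empty result means no session line was found, in which case the next round should fall back to a fresh `codex exec` rather than a resume.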
@@ -133,8 +199,21 @@ $(cat /tmp/plan-${REVIEW_ID}.md)
4. Alternatives — Is there a simpler or better approach? 4. Alternatives — Is there a simpler or better approach?
5. Security — Any security concerns? 5. Security — Any security concerns?
Be specific and actionable. If the plan is solid, end with exactly: VERDICT: APPROVED Return exactly these sections in order:
If changes are needed, end with exactly: VERDICT: REVISE" \ ## Summary
## Findings
### P0
### P1
### P2
### P3
## Verdict
Rules:
- Order findings from highest severity to lowest.
- Use `- None.` when a severity has no findings.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`
- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking." \
--model ${REVIEWER_MODEL} \ --model ${REVIEWER_MODEL} \
--strict-mcp-config \ --strict-mcp-config \
--setting-sources user --setting-sources user
@@ -155,8 +234,21 @@ cursor-agent -p \
4. Alternatives — Is there a simpler or better approach? 4. Alternatives — Is there a simpler or better approach?
5. Security — Any security concerns? 5. Security — Any security concerns?
Be specific and actionable. If the plan is solid, end with exactly: VERDICT: APPROVED Return exactly these sections in order:
If changes are needed, end with exactly: VERDICT: REVISE" \ ## Summary
## Findings
### P0
### P1
### P2
### P3
## Verdict
Rules:
- Order findings from highest severity to lowest.
- Use `- None.` when a severity has no findings.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`
- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking." \
> /tmp/plan-review-${REVIEW_ID}.json > /tmp/plan-review-${REVIEW_ID}.json
``` ```
@@ -170,13 +262,16 @@ if [ -x "$REVIEWER_RUNTIME" ]; then
--command-file /tmp/plan-review-${REVIEW_ID}.sh \ --command-file /tmp/plan-review-${REVIEW_ID}.sh \
--stdout-file /tmp/plan-review-${REVIEW_ID}.runner.out \ --stdout-file /tmp/plan-review-${REVIEW_ID}.runner.out \
--stderr-file /tmp/plan-review-${REVIEW_ID}.stderr \ --stderr-file /tmp/plan-review-${REVIEW_ID}.stderr \
--status-file /tmp/plan-review-${REVIEW_ID}.status --status-file /tmp/plan-review-${REVIEW_ID}.status \
"${HELPER_SUCCESS_FILE_ARGS[@]}"
else else
echo "Warning: reviewer runtime helper not found at $REVIEWER_RUNTIME; falling back to direct synchronous review." >&2 echo "Warning: reviewer runtime helper not found at $REVIEWER_RUNTIME; falling back to direct synchronous review." >&2
bash /tmp/plan-review-${REVIEW_ID}.sh >/tmp/plan-review-${REVIEW_ID}.runner.out 2>/tmp/plan-review-${REVIEW_ID}.stderr bash /tmp/plan-review-${REVIEW_ID}.sh >/tmp/plan-review-${REVIEW_ID}.runner.out 2>/tmp/plan-review-${REVIEW_ID}.stderr
fi fi
``` ```
Run the helper in the foreground and watch its live stdout for `state=in-progress` heartbeats. If your agent environment buffers command output until exit, start the helper in the background and poll `/tmp/plan-review-${REVIEW_ID}.status` separately instead of treating heartbeats as post-hoc-only data.
After the command completes: After the command completes:
- If `REVIEWER_CLI=cursor`, extract the final review text: - If `REVIEWER_CLI=cursor`, extract the final review text:
@@ -186,6 +281,13 @@ jq -r '.result' /tmp/plan-review-${REVIEW_ID}.json > /tmp/plan-review-${REVIEW_I
``` ```
- If `REVIEWER_CLI=codex`, extract `CODEX_SESSION_ID` from `/tmp/plan-review-${REVIEW_ID}.runner.out` after the helper or fallback run. If the review text is only in `.runner.out`, move or copy the actual review body into `/tmp/plan-review-${REVIEW_ID}.md` before verdict parsing. - If `REVIEWER_CLI=codex`, extract `CODEX_SESSION_ID` from `/tmp/plan-review-${REVIEW_ID}.runner.out` after the helper or fallback run. If the review text is only in `.runner.out`, move or copy the actual review body into `/tmp/plan-review-${REVIEW_ID}.md` before verdict parsing.
- If `REVIEWER_CLI=claude`, promote stdout captured by the helper or fallback runner into the markdown review file:
```bash
cp /tmp/plan-review-${REVIEW_ID}.runner.out /tmp/plan-review-${REVIEW_ID}.md
```
Fallback is allowed only when the helper is missing or not executable.
#### Step 4: Read Review & Check Verdict #### Step 4: Read Review & Check Verdict
@@ -202,17 +304,19 @@ jq -r '.result' /tmp/plan-review-${REVIEW_ID}.json > /tmp/plan-review-${REVIEW_I
[Reviewer feedback] [Reviewer feedback]
``` ```
3. Check verdict: 4. While the reviewer is still running, keep waiting as long as fresh `state=in-progress note="In progress N"` heartbeats continue to appear roughly once per minute.
- **VERDICT: APPROVED** → proceed to Phase 7 (Initialize workspace) 5. Check verdict:
- **VERDICT: REVISE** → go to Step 5 - **VERDICT: APPROVED** with no `P0`, `P1`, or `P2` findings → proceed to Phase 7 (Initialize workspace)
- No clear verdict but positive / no actionable items → treat as approved - **VERDICT: APPROVED** with only `P3` findings → optionally fix the `P3` items if they are cheap and safe, then proceed
- **VERDICT: REVISE** or any `P0`, `P1`, or `P2` finding → go to Step 5
- No clear verdict but `P0`, `P1`, and `P2` are all `- None.` → treat as approved
- Helper state `completed-empty-output` → treat as failed review attempt, surface stderr/status, fix invocation or prompt handling, then retry - Helper state `completed-empty-output` → treat as failed review attempt, surface stderr/status, fix invocation or prompt handling, then retry
- Helper state `needs-operator-decision` → surface status log and decide whether to keep waiting, abort, or retry with different helper parameters - Helper state `needs-operator-decision` → surface status log and decide whether to extend the timeout, abort, or retry with different helper parameters
- Max rounds (5) reached → proceed with warning - Max rounds (`MAX_ROUNDS`) reached → present the outcome to the user for a manual decision (proceed or stop)
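The verdict branches in the new wording can be condensed into one function, sketched here for illustration. It covers only the verdict and round-limit cases, with helper states such as `completed-empty-output` handled separately. `next_review_action` and its output keywords are hypothetical.

```bash
#!/usr/bin/env bash
# Decide the next step after a review round.
# Arguments: verdict (APPROVED, REVISE, or empty), number of P0-P2 findings,
# current round, and the round limit (defaults to 10).
next_review_action() {
  local verdict="$1" blocking="$2" round="$3" max_rounds="${4:-10}"
  if [ "$verdict" != "REVISE" ] && [ "$blocking" -eq 0 ]; then
    # Approved, or no clear verdict with P0-P2 all empty.
    echo proceed
  elif [ "$round" -ge "$max_rounds" ]; then
    # Limit reached: hand the decision to the user.
    echo ask-user
  else
    echo revise
  fi
}

next_review_action APPROVED 0 1 10
next_review_action APPROVED 2 1 10
```

An `APPROVED` verdict that still lists blocking findings is routed to `revise`, which matches the rule that approval is valid only when no `P0`, `P1`, or `P2` findings remain.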
#### Step 5: Revise the Plan #### Step 5: Revise the Plan
Address each issue the reviewer raised. Update the plan in conversation context and rewrite `/tmp/plan-${REVIEW_ID}.md`. Address the reviewer findings in priority order (`P0` → `P1` → `P2`, then `P3` when practical). Update the plan in conversation context and rewrite `/tmp/plan-${REVIEW_ID}.md`.
Summarize revisions for the user: Summarize revisions for the user:
@@ -223,7 +327,9 @@ Summarize revisions for the user:
If a revision contradicts the user's explicit requirements, skip it and note it for the user. If a revision contradicts the user's explicit requirements, skip it and note it for the user.
#### Step 6: Re-submit to Reviewer (Rounds 2-5) #### Step 6: Re-submit to Reviewer (Rounds 2-N)
Rewrite `/tmp/plan-review-${REVIEW_ID}.sh` for the next round. The script should contain the reviewer invocation only; do not run it directly.
**If `REVIEWER_CLI` is `codex`:** **If `REVIEWER_CLI` is `codex`:**
@@ -237,8 +343,8 @@ codex exec resume ${CODEX_SESSION_ID} \
Changes made: Changes made:
[List specific changes] [List specific changes]
Re-review. If solid, end with: VERDICT: APPROVED Re-review using the same `## Summary`, `## Findings`, and `## Verdict` structure as before.
If more changes needed, end with: VERDICT: REVISE" Keep findings ordered `P0` to `P3`, use `- None.` when a severity has no findings, and only use `VERDICT: APPROVED` when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking."
``` ```
If resume fails (session expired), fall back to fresh `codex exec` with context about prior rounds. If resume fails (session expired), fall back to fresh `codex exec` with context about prior rounds.
@@ -260,8 +366,8 @@ $(cat /tmp/plan-${REVIEW_ID}.md)
Changes made: Changes made:
[List specific changes] [List specific changes]
Re-review the full plan. If solid, end with: VERDICT: APPROVED Re-review the full plan using the same `## Summary`, `## Findings`, and `## Verdict` structure as before.
If more changes needed, end with: VERDICT: REVISE" \ Keep findings ordered `P0` to `P3`, use `- None.` when a severity has no findings, and only use `VERDICT: APPROVED` when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking." \
--model ${REVIEWER_MODEL} \ --model ${REVIEWER_MODEL} \
--strict-mcp-config \ --strict-mcp-config \
--setting-sources user --setting-sources user
@@ -282,8 +388,8 @@ cursor-agent --resume ${CURSOR_SESSION_ID} -p \
Changes made: Changes made:
[List specific changes] [List specific changes]
Re-review. If solid, end with: VERDICT: APPROVED Re-review using the same `## Summary`, `## Findings`, and `## Verdict` structure as before.
If more changes needed, end with: VERDICT: REVISE" \ Keep findings ordered `P0` to `P3`, use `- None.` when a severity has no findings, and only use `VERDICT: APPROVED` when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking." \
> /tmp/plan-review-${REVIEW_ID}.json > /tmp/plan-review-${REVIEW_ID}.json
jq -r '.result' /tmp/plan-review-${REVIEW_ID}.json > /tmp/plan-review-${REVIEW_ID}.md jq -r '.result' /tmp/plan-review-${REVIEW_ID}.json > /tmp/plan-review-${REVIEW_ID}.md
@@ -302,7 +408,7 @@ Return to Step 4.
**Status:** Approved after N round(s) **Status:** Approved after N round(s)
[or] [or]
**Status:** Max rounds (5) reached — not fully approved **Status:** Max rounds (`MAX_ROUNDS`) reached — not fully approved
[Final feedback / remaining concerns] [Final feedback / remaining concerns]
``` ```
@@ -354,6 +460,27 @@ When handing off to execution, instruct:
Private plan files under `~/.claude/plans/` are planning artifacts and must not be used as execution source of truth. Private plan files under `~/.claude/plans/` are planning artifacts and must not be used as execution source of truth.
### Phase 10: Telegram Completion Notification (MANDATORY)
Resolve the Telegram notifier helper from the installed Claude Code skills directory:
```bash
TELEGRAM_NOTIFY_RUNTIME=~/.claude/skills/reviewer-runtime/notify-telegram.sh
```
On every terminal outcome for the create-plan run (approved, max rounds reached, skipped reviewer, or failure), send a Telegram summary if the helper exists and both `TELEGRAM_BOT_TOKEN` and `TELEGRAM_CHAT_ID` are configured:
```bash
if [ -x "$TELEGRAM_NOTIFY_RUNTIME" ] && [ -n "${TELEGRAM_BOT_TOKEN:-}" ] && [ -n "${TELEGRAM_CHAT_ID:-}" ]; then
"$TELEGRAM_NOTIFY_RUNTIME" --message "create-plan completed for <plan-folder-name>: <status summary>"
fi
```
Rules:
- Telegram is the only supported completion notification path. Do not use desktop notifications, `say`, email, or any other notifier.
- Notification failures are non-blocking, but they must be surfaced to the user.
- If Telegram is not configured, state that no completion notification was sent.
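The three rules above imply three reportable outcomes. One way to capture them is a thin wrapper around the notifier call. This is a sketch only: `notify_completion` is a hypothetical name, and the only notifier flag it relies on is the `--message` option already shown above.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of reviewer-runtime).
# Prints one of: sent, failed, not-configured.
# It always returns 0 so that a notification failure cannot block the run;
# the notifier's stderr is left untouched so the failure stays visible.
notify_completion() {
  local message=$1
  if [ -x "${TELEGRAM_NOTIFY_RUNTIME:-}" ] \
    && [ -n "${TELEGRAM_BOT_TOKEN:-}" ] \
    && [ -n "${TELEGRAM_CHAT_ID:-}" ]; then
    if "$TELEGRAM_NOTIFY_RUNTIME" --message "$message" >/dev/null; then
      echo sent
    else
      echo failed
    fi
  else
    echo not-configured
  fi
}
```

The caller can then report the printed outcome to the user, for example by stating that no completion notification was sent when the result is `not-configured`.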
## Tracker Discipline (MANDATORY) ## Tracker Discipline (MANDATORY)
**ALWAYS update `story-tracker.md` before/after each story. NEVER proceed with stale tracker state.** **ALWAYS update `story-tracker.md` before/after each story. NEVER proceed with stale tracker state.**
@@ -392,6 +519,7 @@ After completing any story:
- [ ] `.gitignore` ignore-rule commit was created if needed - [ ] `.gitignore` ignore-rule commit was created if needed
- [ ] Plan directory created under `ai_plan/YYYY-MM-DD-<short-title>/` - [ ] Plan directory created under `ai_plan/YYYY-MM-DD-<short-title>/`
- [ ] Reviewer configured or explicitly skipped - [ ] Reviewer configured or explicitly skipped
- [ ] Max review rounds confirmed (default: 10)
- [ ] Plan review completed (approved or max rounds) — or skipped - [ ] Plan review completed (approved or max rounds) — or skipped
- [ ] `original-plan.md` copied from `~/.claude/plans/` plan file - [ ] `original-plan.md` copied from `~/.claude/plans/` plan file
- [ ] `final-transcript.md` present - [ ] `final-transcript.md` present
@@ -399,6 +527,7 @@ After completing any story:
- [ ] `story-tracker.md` created with all stories as `pending` - [ ] `story-tracker.md` created with all stories as `pending`
- [ ] `continuation-runbook.md` present - [ ] `continuation-runbook.md` present
- [ ] Handoff explicitly says to read runbook first and execute from plan folder - [ ] Handoff explicitly says to read runbook first and execute from plan folder
- [ ] Telegram completion notification attempted if configured
## Exit Triggers for Question Phase ## Exit Triggers for Question Phase
User says: "ready", "done", "let's plan", "proceed", "enough questions" User says: "ready", "done", "let's plan", "proceed", "enough questions"


@@ -72,7 +72,10 @@ If the user has already specified a reviewer CLI and model (e.g., "create a plan
- For `cursor`: **run `cursor-agent models` first** to see your account's available models (availability varies by subscription) - For `cursor`: **run `cursor-agent models` first** to see your account's available models (availability varies by subscription)
- Accept any model string the user provides - Accept any model string the user provides
Store the chosen `REVIEWER_CLI` and `REVIEWER_MODEL` for Phase 6 (Iterative Plan Review). 3. **Max review rounds for the plan?** (default: 10)
- If the user does not provide a value, set `MAX_ROUNDS=10`.
Store the chosen `REVIEWER_CLI`, `REVIEWER_MODEL`, and `MAX_ROUNDS` for Phase 6 (Iterative Plan Review).
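The default for `MAX_ROUNDS` could be resolved along these lines. This is a hedged sketch: `resolve_max_rounds` and `USER_MAX_ROUNDS` are hypothetical names, and falling back to 10 on non-numeric or zero input is an assumption rather than documented behavior.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the user's value when it is a positive integer,
# otherwise print the documented default of 10.
resolve_max_rounds() {
  local value=${1:-10}
  case "$value" in
    *[!0-9]*|0) echo 10 ;;        # assumption: invalid input falls back to the default
    *)          echo "$value" ;;
  esac
}

# USER_MAX_ROUNDS is an illustrative variable holding the user's answer, if any.
MAX_ROUNDS=$(resolve_max_rounds "${USER_MAX_ROUNDS:-}")
```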
### Phase 4: Design (REQUIRED SUB-SKILL) ### Phase 4: Design (REQUIRED SUB-SKILL)
@@ -84,7 +87,7 @@ Invoke `superpowers:writing-plans`, then break work into milestones and bite-siz
### Phase 6: Iterative Plan Review ### Phase 6: Iterative Plan Review
Send the plan to the configured reviewer CLI for feedback. Revise and re-submit until approved (max 5 rounds). Send the plan to the configured reviewer CLI for feedback. Revise and re-submit until approved (default max 10 rounds).
**Skip this phase entirely if reviewer was set to `skip`.** **Skip this phase entirely if reviewer was set to `skip`.**
@@ -109,10 +112,60 @@ Resolve the shared reviewer helper from the installed Codex skills directory:
REVIEWER_RUNTIME=~/.codex/skills/reviewer-runtime/run-review.sh REVIEWER_RUNTIME=~/.codex/skills/reviewer-runtime/run-review.sh
``` ```
Set helper success-artifact args before writing the command script:
```bash
HELPER_SUCCESS_FILE_ARGS=()
case "$REVIEWER_CLI" in
codex)
HELPER_SUCCESS_FILE_ARGS+=(--success-file /tmp/plan-review-${REVIEW_ID}.md)
;;
cursor)
HELPER_SUCCESS_FILE_ARGS+=(--success-file /tmp/plan-review-${REVIEW_ID}.json)
;;
esac
```
#### Step 2: Write Plan to Temp File #### Step 2: Write Plan to Temp File
Write the complete plan (milestones, stories, design decisions, specs) to `/tmp/plan-${REVIEW_ID}.md`. Write the complete plan (milestones, stories, design decisions, specs) to `/tmp/plan-${REVIEW_ID}.md`.
#### Review Contract (Applies to Every Round)
The reviewer response must use this structure:
```text
## Summary
...
## Findings
### P0
- ...
### P1
- ...
### P2
- ...
### P3
- ...
## Verdict
VERDICT: APPROVED
```
Rules:
- Order findings from `P0` to `P3`.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- Use `- None.` when a severity has no findings.
- `VERDICT: APPROVED` is allowed only when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking.
- The calling agent should still try to fix `P3` findings when they are cheap and safe.
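Before parsing a verdict, it may help to confirm that the reviewer actually followed the structure above. The check below is a sketch, not part of the shared runtime: `review_has_required_sections` is a hypothetical name, and it assumes each heading appears on a line of its own with no trailing whitespace.

```bash
#!/usr/bin/env bash
# Hypothetical check: succeed only if every heading required by the
# review contract appears as an exact, whole line in the review file.
review_has_required_sections() {
  local file=$1 heading
  for heading in '## Summary' '## Findings' '### P0' '### P1' '### P2' '### P3' '## Verdict'; do
    grep -qxF -- "$heading" "$file" || return 1
  done
}
```

A failing check is a reasonable trigger for treating the round as a failed review attempt instead of guessing at a verdict.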
#### Liveness Contract (Applies While Review Is Running)
- The shared reviewer runtime emits `state=in-progress note="In progress N"` heartbeats every 60 seconds while the reviewer child is alive.
- The calling agent must keep waiting as long as a fresh `In progress N` heartbeat keeps arriving roughly once per minute.
- Do not abort just because the review is slow, a soft timeout fired, or a `stall-warning` line appears, as long as the `In progress N` heartbeat continues.
- Treat missing heartbeats, `state=failed`, `state=completed-empty-output`, and `state=needs-operator-decision` as escalation signals.
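The liveness rules can be read as a mapping from the latest helper state to an action. The sketch below assumes the status file is line-oriented and that each line carries a `state=<value>` token as in the heartbeat format above; `latest_state` and `next_action` are hypothetical names, and heartbeat freshness (the once-per-minute cadence) is not checked here.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of reviewer-runtime).

# Print the state token from the last line of the status file,
# "missing" if the file does not exist, or nothing if no token is found.
latest_state() {
  local status_file=$1
  [ -f "$status_file" ] || { echo missing; return; }
  tail -n 1 "$status_file" | sed -n 's/.*state=\([a-z-]*\).*/\1/p'
}

# Map a state to the action the calling agent should take.
next_action() {
  case "$1" in
    in-progress) echo wait ;;
    failed|completed-empty-output|needs-operator-decision|missing|'') echo escalate ;;
    *) echo inspect ;;   # states not listed in the contract need a closer look
  esac
}
```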
#### Step 3: Submit to Reviewer (Round 1) #### Step 3: Submit to Reviewer (Round 1)
Write the reviewer invocation to `/tmp/plan-review-${REVIEW_ID}.sh` as a bash script: Write the reviewer invocation to `/tmp/plan-review-${REVIEW_ID}.sh` as a bash script:
@@ -136,8 +189,21 @@ codex exec \
4. Alternatives — Is there a simpler or better approach? 4. Alternatives — Is there a simpler or better approach?
5. Security — Any security concerns? 5. Security — Any security concerns?
Be specific and actionable. If the plan is solid, end with exactly: VERDICT: APPROVED Return exactly these sections in order:
If changes are needed, end with exactly: VERDICT: REVISE" ## Summary
## Findings
### P0
### P1
### P2
### P3
## Verdict
Rules:
- Order findings from highest severity to lowest.
- Use `- None.` when a severity has no findings.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`
- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking."
``` ```
Do not try to capture the Codex session ID yet. When using the helper, extract it from `/tmp/plan-review-${REVIEW_ID}.runner.out` after the command completes (look for `session id: <uuid>`), then store it as `CODEX_SESSION_ID` for resume in subsequent rounds. Do not try to capture the Codex session ID yet. When using the helper, extract it from `/tmp/plan-review-${REVIEW_ID}.runner.out` after the command completes (look for `session id: <uuid>`), then store it as `CODEX_SESSION_ID` for resume in subsequent rounds.
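Extracting the session ID after the run might look like the following. This is a sketch that assumes the runner output contains a line with `session id: <uuid>` in the standard 36-character UUID form; `extract_codex_session_id` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first UUID that follows "session id: "
# in the runner output, or nothing if no such line exists.
extract_codex_session_id() {
  local runner_out=$1
  sed -n 's/^.*session id: \([0-9A-Fa-f-]\{36\}\).*$/\1/p' "$runner_out" | head -n 1
}
```

Typical use would be `CODEX_SESSION_ID=$(extract_codex_session_id /tmp/plan-review-${REVIEW_ID}.runner.out)`, followed by a check that the result is non-empty before attempting a resume.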
@@ -156,8 +222,21 @@ $(cat /tmp/plan-${REVIEW_ID}.md)
4. Alternatives — Is there a simpler or better approach? 4. Alternatives — Is there a simpler or better approach?
5. Security — Any security concerns? 5. Security — Any security concerns?
Be specific and actionable. If the plan is solid, end with exactly: VERDICT: APPROVED Return exactly these sections in order:
If changes are needed, end with exactly: VERDICT: REVISE" \ ## Summary
## Findings
### P0
### P1
### P2
### P3
## Verdict
Rules:
- Order findings from highest severity to lowest.
- Use `- None.` when a severity has no findings.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`
- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking." \
--model ${REVIEWER_MODEL} \ --model ${REVIEWER_MODEL} \
--strict-mcp-config \ --strict-mcp-config \
--setting-sources user --setting-sources user
@@ -178,8 +257,21 @@ cursor-agent -p \
4. Alternatives — Is there a simpler or better approach? 4. Alternatives — Is there a simpler or better approach?
5. Security — Any security concerns? 5. Security — Any security concerns?
Be specific and actionable. If the plan is solid, end with exactly: VERDICT: APPROVED Return exactly these sections in order:
If changes are needed, end with exactly: VERDICT: REVISE" \ ## Summary
## Findings
### P0
### P1
### P2
### P3
## Verdict
Rules:
- Order findings from highest severity to lowest.
- Use `- None.` when a severity has no findings.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`
- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking." \
> /tmp/plan-review-${REVIEW_ID}.json > /tmp/plan-review-${REVIEW_ID}.json
``` ```
@@ -193,13 +285,16 @@ if [ -x "$REVIEWER_RUNTIME" ]; then
--command-file /tmp/plan-review-${REVIEW_ID}.sh \ --command-file /tmp/plan-review-${REVIEW_ID}.sh \
--stdout-file /tmp/plan-review-${REVIEW_ID}.runner.out \ --stdout-file /tmp/plan-review-${REVIEW_ID}.runner.out \
--stderr-file /tmp/plan-review-${REVIEW_ID}.stderr \ --stderr-file /tmp/plan-review-${REVIEW_ID}.stderr \
--status-file /tmp/plan-review-${REVIEW_ID}.status --status-file /tmp/plan-review-${REVIEW_ID}.status \
"${HELPER_SUCCESS_FILE_ARGS[@]}"
else else
echo "Warning: reviewer runtime helper not found at $REVIEWER_RUNTIME; falling back to direct synchronous review." >&2 echo "Warning: reviewer runtime helper not found at $REVIEWER_RUNTIME; falling back to direct synchronous review." >&2
bash /tmp/plan-review-${REVIEW_ID}.sh >/tmp/plan-review-${REVIEW_ID}.runner.out 2>/tmp/plan-review-${REVIEW_ID}.stderr bash /tmp/plan-review-${REVIEW_ID}.sh >/tmp/plan-review-${REVIEW_ID}.runner.out 2>/tmp/plan-review-${REVIEW_ID}.stderr
fi fi
``` ```
Run the helper in the foreground and watch its live stdout for `state=in-progress` heartbeats. If your agent environment buffers command output until exit, start the helper in the background and poll `/tmp/plan-review-${REVIEW_ID}.status` separately instead of treating heartbeats as post-hoc-only data.
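For environments that buffer output, the background-and-poll approach could be sketched as below. `poll_until_exit` is a hypothetical name, the 60-second default interval mirrors the heartbeat cadence described in the liveness contract, and the caller is assumed to have started the helper with `&` in the same shell so that `wait` can collect its exit status.

```bash
#!/usr/bin/env bash
# Hypothetical polling loop (not part of reviewer-runtime).
# Arguments: PID of the backgrounded helper, status file path,
# and an optional poll interval in seconds (default 60).
# Prints the latest status line on every poll and returns the helper's exit status.
poll_until_exit() {
  local pid=$1 status_file=$2 interval=${3:-60}
  while kill -0 "$pid" 2>/dev/null; do
    if [ -f "$status_file" ]; then
      tail -n 1 "$status_file"
    fi
    sleep "$interval"
  done
  wait "$pid"
}
```

Call the function directly in the shell that launched the helper. Inside a command substitution the helper is no longer a child of the calling shell, so `wait` cannot report its exit status.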
After the command completes: After the command completes:
- If `REVIEWER_CLI=cursor`, extract the final review text: - If `REVIEWER_CLI=cursor`, extract the final review text:
@@ -209,6 +304,11 @@ jq -r '.result' /tmp/plan-review-${REVIEW_ID}.json > /tmp/plan-review-${REVIEW_I
``` ```
- If `REVIEWER_CLI=codex`, extract `CODEX_SESSION_ID` from `/tmp/plan-review-${REVIEW_ID}.runner.out` after the helper or fallback run. If the review text is only in `.runner.out`, move or copy the actual review body into `/tmp/plan-review-${REVIEW_ID}.md` before verdict parsing. - If `REVIEWER_CLI=codex`, extract `CODEX_SESSION_ID` from `/tmp/plan-review-${REVIEW_ID}.runner.out` after the helper or fallback run. If the review text is only in `.runner.out`, move or copy the actual review body into `/tmp/plan-review-${REVIEW_ID}.md` before verdict parsing.
- If `REVIEWER_CLI=claude`, promote stdout captured by the helper or fallback runner into the markdown review file:
```bash
cp /tmp/plan-review-${REVIEW_ID}.runner.out /tmp/plan-review-${REVIEW_ID}.md
```
Fallback is allowed only when the helper is missing or not executable. Fallback is allowed only when the helper is missing or not executable.
@@ -227,17 +327,19 @@ Fallback is allowed only when the helper is missing or not executable.
[Reviewer feedback] [Reviewer feedback]
``` ```
3. Check verdict: 4. While the reviewer is still running, keep waiting as long as fresh `state=in-progress note="In progress N"` heartbeats continue to appear roughly once per minute.
- **VERDICT: APPROVED** → proceed to Phase 7 (Initialize workspace) 5. Check verdict:
- **VERDICT: REVISE** → go to Step 5 - **VERDICT: APPROVED** with no `P0`, `P1`, or `P2` findings → proceed to Phase 7 (Initialize workspace)
- No clear verdict but positive / no actionable items → treat as approved - **VERDICT: APPROVED** with only `P3` findings → optionally fix the `P3` items if they are cheap and safe, then proceed
- **VERDICT: REVISE** or any `P0`, `P1`, or `P2` finding → go to Step 5
- No clear verdict but `P0`, `P1`, and `P2` are all `- None.` → treat as approved
- Helper state `completed-empty-output` → treat as failed review attempt, surface stderr/status, fix invocation or prompt handling, then retry - Helper state `completed-empty-output` → treat as failed review attempt, surface stderr/status, fix invocation or prompt handling, then retry
- Helper state `needs-operator-decision` → surface status log and decide whether to keep waiting, abort, or retry with different helper parameters - Helper state `needs-operator-decision` → surface status log and decide whether to extend the timeout, abort, or retry with different helper parameters
- Max rounds (5) reached → proceed with warning - Max rounds (`MAX_ROUNDS`) reached → present the outcome to the user for a manual decision (proceed or stop)
#### Step 5: Revise the Plan #### Step 5: Revise the Plan
Address each issue the reviewer raised. Update the plan in conversation context and rewrite `/tmp/plan-${REVIEW_ID}.md`. Address the reviewer findings in priority order (`P0` → `P1` → `P2`, then `P3` when practical). Update the plan in conversation context and rewrite `/tmp/plan-${REVIEW_ID}.md`.
Summarize revisions for the user: Summarize revisions for the user:
@@ -248,7 +350,9 @@ Summarize revisions for the user:
If a revision contradicts the user's explicit requirements, skip it and note it for the user. If a revision contradicts the user's explicit requirements, skip it and note it for the user.
#### Step 6: Re-submit to Reviewer (Rounds 2-5) #### Step 6: Re-submit to Reviewer (Rounds 2-N)
Rewrite `/tmp/plan-review-${REVIEW_ID}.sh` for the next round. The script should contain the reviewer invocation only; do not run it directly.
**If `REVIEWER_CLI` is `codex`:** **If `REVIEWER_CLI` is `codex`:**
@@ -262,8 +366,8 @@ codex exec resume ${CODEX_SESSION_ID} \
Changes made: Changes made:
[List specific changes] [List specific changes]
Re-review. If solid, end with: VERDICT: APPROVED Re-review using the same `## Summary`, `## Findings`, and `## Verdict` structure as before.
If more changes needed, end with: VERDICT: REVISE" Keep findings ordered `P0` to `P3`, use `- None.` when a severity has no findings, and only use `VERDICT: APPROVED` when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking."
``` ```
If resume fails (session expired), fall back to fresh `codex exec` with context about prior rounds. If resume fails (session expired), fall back to fresh `codex exec` with context about prior rounds.
@@ -285,8 +389,8 @@ $(cat /tmp/plan-${REVIEW_ID}.md)
Changes made: Changes made:
[List specific changes] [List specific changes]
Re-review the full plan. If solid, end with: VERDICT: APPROVED Re-review the full plan using the same `## Summary`, `## Findings`, and `## Verdict` structure as before.
If more changes needed, end with: VERDICT: REVISE" \ Keep findings ordered `P0` to `P3`, use `- None.` when a severity has no findings, and only use `VERDICT: APPROVED` when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking." \
--model ${REVIEWER_MODEL} \ --model ${REVIEWER_MODEL} \
--strict-mcp-config \ --strict-mcp-config \
--setting-sources user --setting-sources user
@@ -307,8 +411,8 @@ cursor-agent --resume ${CURSOR_SESSION_ID} -p \
Changes made: Changes made:
[List specific changes] [List specific changes]
Re-review. If solid, end with: VERDICT: APPROVED Re-review using the same `## Summary`, `## Findings`, and `## Verdict` structure as before.
If more changes needed, end with: VERDICT: REVISE" \ Keep findings ordered `P0` to `P3`, use `- None.` when a severity has no findings, and only use `VERDICT: APPROVED` when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking." \
> /tmp/plan-review-${REVIEW_ID}.json > /tmp/plan-review-${REVIEW_ID}.json
jq -r '.result' /tmp/plan-review-${REVIEW_ID}.json > /tmp/plan-review-${REVIEW_ID}.md jq -r '.result' /tmp/plan-review-${REVIEW_ID}.json > /tmp/plan-review-${REVIEW_ID}.md
@@ -327,7 +431,7 @@ Return to Step 4.
**Status:** Approved after N round(s) **Status:** Approved after N round(s)
[or] [or]
**Status:** Max rounds (5) reached — not fully approved **Status:** Max rounds (`MAX_ROUNDS`) reached — not fully approved
[Final feedback / remaining concerns] [Final feedback / remaining concerns]
``` ```
@@ -374,19 +478,41 @@ Always instruct the executing agent:
Do not rely on planner-private files during implementation. Do not rely on planner-private files during implementation.
### Phase 10: Telegram Completion Notification (MANDATORY)
Resolve the Telegram notifier helper from the installed Codex skills directory:
```bash
TELEGRAM_NOTIFY_RUNTIME=~/.codex/skills/reviewer-runtime/notify-telegram.sh
```
On every terminal outcome for the create-plan run (approved, max rounds reached, skipped reviewer, or failure), send a Telegram summary if the helper exists and both `TELEGRAM_BOT_TOKEN` and `TELEGRAM_CHAT_ID` are configured:
```bash
if [ -x "$TELEGRAM_NOTIFY_RUNTIME" ] && [ -n "${TELEGRAM_BOT_TOKEN:-}" ] && [ -n "${TELEGRAM_CHAT_ID:-}" ]; then
"$TELEGRAM_NOTIFY_RUNTIME" --message "create-plan completed for <plan-folder-name>: <status summary>"
fi
```
Rules:
- Telegram is the only supported completion notification path. Do not use desktop notifications, `say`, email, or any other notifier.
- Notification failures are non-blocking, but they must be surfaced to the user.
- If Telegram is not configured, state that no completion notification was sent.
## Quick Reference ## Quick Reference
| Phase | Action | Required Output | | Phase | Action | Required Output |
|---|---|---| |---|---|---|
| 1 | Analyze codebase/context | Constraints and known patterns | | 1 | Analyze codebase/context | Constraints and known patterns |
| 2 | Gather requirements (one question at a time) | Confirmed scope and success criteria | | 2 | Gather requirements (one question at a time) | Confirmed scope and success criteria |
| 3 | Configure reviewer CLI and model | `REVIEWER_CLI` and `REVIEWER_MODEL` (or `skip`) | | 3 | Configure reviewer CLI and model | `REVIEWER_CLI`, `REVIEWER_MODEL`, `MAX_ROUNDS` (or `skip`) |
| 4 | Invoke `superpowers:brainstorming` | Chosen design approach | | 4 | Invoke `superpowers:brainstorming` | Chosen design approach |
| 5 | Invoke `superpowers:writing-plans` | Milestones and bite-sized stories | | 5 | Invoke `superpowers:writing-plans` | Milestones and bite-sized stories |
| 6 | Iterative plan review (max 5 rounds) | Reviewer approval or max-rounds warning | | 6 | Iterative plan review (max `MAX_ROUNDS` rounds) | Reviewer approval or max-rounds warning |
| 7 | Initialize `ai_plan/` + `.gitignore` | Local planning workspace ready | | 7 | Initialize `ai_plan/` + `.gitignore` | Local planning workspace ready |
| 8 | Build plan package from templates | Full plan folder with required files | | 8 | Build plan package from templates | Full plan folder with required files |
| 9 | Handoff with runbook-first instruction | Resumable execution context | | 9 | Handoff with runbook-first instruction | Resumable execution context |
| 10 | Send Telegram completion notification | User notified or notification status reported |
## Execution Rules to Include in Plan (MANDATORY) ## Execution Rules to Include in Plan (MANDATORY)
@@ -413,6 +539,7 @@ Do not rely on planner-private files during implementation.
- Handoff without explicit "read runbook first" direction. - Handoff without explicit "read runbook first" direction.
- Skipping the reviewer phase without explicit user opt-out. - Skipping the reviewer phase without explicit user opt-out.
- Not capturing the Codex session ID for resume in subsequent review rounds. - Not capturing the Codex session ID for resume in subsequent review rounds.
- Using any completion notification path other than Telegram.
## Rationalizations and Counters ## Rationalizations and Counters
@@ -440,6 +567,7 @@ Do not rely on planner-private files during implementation.
- [ ] `.gitignore` ignore-rule commit was created if needed - [ ] `.gitignore` ignore-rule commit was created if needed
- [ ] Plan directory created under `ai_plan/YYYY-MM-DD-<short-title>/` - [ ] Plan directory created under `ai_plan/YYYY-MM-DD-<short-title>/`
- [ ] Reviewer configured or explicitly skipped - [ ] Reviewer configured or explicitly skipped
- [ ] Max review rounds confirmed (default: 10)
- [ ] Plan review completed (approved or max rounds) — or skipped - [ ] Plan review completed (approved or max rounds) — or skipped
- [ ] `original-plan.md` present - [ ] `original-plan.md` present
- [ ] `final-transcript.md` present - [ ] `final-transcript.md` present
@@ -447,3 +575,4 @@ Do not rely on planner-private files during implementation.
- [ ] `story-tracker.md` present - [ ] `story-tracker.md` present
- [ ] `continuation-runbook.md` present - [ ] `continuation-runbook.md` present
- [ ] Handoff explicitly says to read runbook first and execute from plan folder - [ ] Handoff explicitly says to read runbook first and execute from plan folder
- [ ] Telegram completion notification attempted if configured


@@ -73,7 +73,10 @@ If the user has already specified a reviewer CLI and model (e.g., "create a plan
- For `cursor`: **run `cursor-agent models` first** to see your account's available models (availability varies by subscription) - For `cursor`: **run `cursor-agent models` first** to see your account's available models (availability varies by subscription)
- Accept any model string the user provides - Accept any model string the user provides
Store the chosen `REVIEWER_CLI` and `REVIEWER_MODEL` for Phase 6 (Iterative Plan Review). 3. **Max review rounds for the plan?** (default: 10)
- If the user does not provide a value, set `MAX_ROUNDS=10`.
Store the chosen `REVIEWER_CLI`, `REVIEWER_MODEL`, and `MAX_ROUNDS` for Phase 6 (Iterative Plan Review).
### Phase 4: Design (REQUIRED SUB-SKILL) ### Phase 4: Design (REQUIRED SUB-SKILL)
@@ -86,7 +89,7 @@ Story IDs: `S-{milestone}{sequence}`.
### Phase 6: Iterative Plan Review ### Phase 6: Iterative Plan Review
Send the plan to the configured reviewer CLI for feedback. Revise and re-submit until approved (max 5 rounds). Send the plan to the configured reviewer CLI for feedback. Revise and re-submit until approved (default max 10 rounds).
**Skip this phase entirely if reviewer was set to `skip`.** **Skip this phase entirely if reviewer was set to `skip`.**
@@ -115,10 +118,60 @@ else
fi fi
``` ```
Set helper success-artifact args before writing the command script:
```bash
HELPER_SUCCESS_FILE_ARGS=()
case "$REVIEWER_CLI" in
codex)
HELPER_SUCCESS_FILE_ARGS+=(--success-file /tmp/plan-review-${REVIEW_ID}.md)
;;
cursor)
HELPER_SUCCESS_FILE_ARGS+=(--success-file /tmp/plan-review-${REVIEW_ID}.json)
;;
esac
```
#### Step 2: Write Plan to Temp File #### Step 2: Write Plan to Temp File
Write the complete plan (milestones, stories, design decisions, specs) to `/tmp/plan-${REVIEW_ID}.md`. Write the complete plan (milestones, stories, design decisions, specs) to `/tmp/plan-${REVIEW_ID}.md`.
#### Review Contract (Applies to Every Round)
The reviewer response must use this structure:
```text
## Summary
...
## Findings
### P0
- ...
### P1
- ...
### P2
- ...
### P3
- ...
## Verdict
VERDICT: APPROVED
```
Rules:
- Order findings from `P0` to `P3`.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- Use `- None.` when a severity has no findings.
- `VERDICT: APPROVED` is allowed only when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking.
- The calling agent should still try to fix `P3` findings when they are cheap and safe.
#### Liveness Contract (Applies While Review Is Running)
- The shared reviewer runtime emits `state=in-progress note="In progress N"` heartbeats every 60 seconds while the reviewer child is alive.
- The calling agent must keep waiting as long as a fresh `In progress N` heartbeat keeps arriving roughly once per minute.
- Do not abort just because the review is slow, a soft timeout fired, or a `stall-warning` line appears, as long as the `In progress N` heartbeat continues.
- Treat missing heartbeats, `state=failed`, `state=completed-empty-output`, and `state=needs-operator-decision` as escalation signals.
#### Step 3: Submit to Reviewer (Round 1) #### Step 3: Submit to Reviewer (Round 1)
Write the reviewer invocation to `/tmp/plan-review-${REVIEW_ID}.sh` as a bash script: Write the reviewer invocation to `/tmp/plan-review-${REVIEW_ID}.sh` as a bash script:
@@ -142,8 +195,21 @@ codex exec \
4. Alternatives — Is there a simpler or better approach? 4. Alternatives — Is there a simpler or better approach?
5. Security — Any security concerns? 5. Security — Any security concerns?
Be specific and actionable. If the plan is solid, end with exactly: VERDICT: APPROVED Return exactly these sections in order:
If changes are needed, end with exactly: VERDICT: REVISE" ## Summary
## Findings
### P0
### P1
### P2
### P3
## Verdict
Rules:
- Order findings from highest severity to lowest.
- Use `- None.` when a severity has no findings.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`
- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking."
``` ```
Do not try to capture the Codex session ID yet. When using the helper, extract it from `/tmp/plan-review-${REVIEW_ID}.runner.out` after the command completes (look for `session id: <uuid>`), then store it as `CODEX_SESSION_ID` for resume in subsequent rounds. Do not try to capture the Codex session ID yet. When using the helper, extract it from `/tmp/plan-review-${REVIEW_ID}.runner.out` after the command completes (look for `session id: <uuid>`), then store it as `CODEX_SESSION_ID` for resume in subsequent rounds.
@@ -162,8 +228,21 @@ $(cat /tmp/plan-${REVIEW_ID}.md)
4. Alternatives — Is there a simpler or better approach? 4. Alternatives — Is there a simpler or better approach?
5. Security — Any security concerns? 5. Security — Any security concerns?
Be specific and actionable. If the plan is solid, end with exactly: VERDICT: APPROVED Return exactly these sections in order:
If changes are needed, end with exactly: VERDICT: REVISE" \ ## Summary
## Findings
### P0
### P1
### P2
### P3
## Verdict
Rules:
- Order findings from highest severity to lowest.
- Use `- None.` when a severity has no findings.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`
- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking." \
--model ${REVIEWER_MODEL} \ --model ${REVIEWER_MODEL} \
--strict-mcp-config \ --strict-mcp-config \
--setting-sources user --setting-sources user
@@ -184,8 +263,21 @@ cursor-agent -p \
4. Alternatives — Is there a simpler or better approach? 4. Alternatives — Is there a simpler or better approach?
5. Security — Any security concerns? 5. Security — Any security concerns?
Be specific and actionable. If the plan is solid, end with exactly: VERDICT: APPROVED Return exactly these sections in order:
If changes are needed, end with exactly: VERDICT: REVISE" \ ## Summary
## Findings
### P0
### P1
### P2
### P3
## Verdict
Rules:
- Order findings from highest severity to lowest.
- Use `- None.` when a severity has no findings.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`
- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking." \
> /tmp/plan-review-${REVIEW_ID}.json > /tmp/plan-review-${REVIEW_ID}.json
``` ```
@@ -205,13 +297,16 @@ if [ -x "$REVIEWER_RUNTIME" ]; then
--command-file /tmp/plan-review-${REVIEW_ID}.sh \ --command-file /tmp/plan-review-${REVIEW_ID}.sh \
--stdout-file /tmp/plan-review-${REVIEW_ID}.runner.out \ --stdout-file /tmp/plan-review-${REVIEW_ID}.runner.out \
--stderr-file /tmp/plan-review-${REVIEW_ID}.stderr \ --stderr-file /tmp/plan-review-${REVIEW_ID}.stderr \
--status-file /tmp/plan-review-${REVIEW_ID}.status --status-file /tmp/plan-review-${REVIEW_ID}.status \
"${HELPER_SUCCESS_FILE_ARGS[@]}"
else else
echo "Warning: reviewer runtime helper not found at $REVIEWER_RUNTIME; falling back to direct synchronous review." >&2 echo "Warning: reviewer runtime helper not found at $REVIEWER_RUNTIME; falling back to direct synchronous review." >&2
bash /tmp/plan-review-${REVIEW_ID}.sh >/tmp/plan-review-${REVIEW_ID}.runner.out 2>/tmp/plan-review-${REVIEW_ID}.stderr bash /tmp/plan-review-${REVIEW_ID}.sh >/tmp/plan-review-${REVIEW_ID}.runner.out 2>/tmp/plan-review-${REVIEW_ID}.stderr
fi fi
``` ```
Run the helper in the foreground and watch its live stdout for `state=in-progress` heartbeats. If your agent environment buffers command output until exit, start the helper in the background and poll `/tmp/plan-review-${REVIEW_ID}.status` separately instead of treating heartbeats as post-hoc-only data.
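The background-and-poll variant described above can be sketched as follows. This is only a sketch: `latest_state` is a hypothetical name, and it assumes the status file is line-oriented with a `state=<value>` token on each line, as the heartbeat format suggests. The actual layout of the status file is defined by `run-review.sh`.

```bash
# Hypothetical helper: print the most recent state recorded in a status file.
# Assumption: each status line contains a token of the form state=<value>.
latest_state() {
  sed -n 's/.*state=\([A-Za-z-]*\).*/\1/p' "$1" 2>/dev/null | tail -n 1
}

# Usage sketch (commented out, not run here): start the helper in the
# background with the same arguments as above, then poll once per minute.
#   "$REVIEWER_RUNTIME" <helper args as above> &
#   HELPER_PID=$!
#   while kill -0 "$HELPER_PID" 2>/dev/null; do
#     echo "reviewer state: $(latest_state /tmp/plan-review-${REVIEW_ID}.status)"
#     sleep 60
#   done
#   wait "$HELPER_PID"
```

Polling the status file rather than the helper's stdout keeps the liveness signal visible even when the agent environment only releases command output at exit.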
After the command completes: After the command completes:
- If `REVIEWER_CLI=cursor`, extract the final review text: - If `REVIEWER_CLI=cursor`, extract the final review text:
@@ -221,6 +316,11 @@ jq -r '.result' /tmp/plan-review-${REVIEW_ID}.json > /tmp/plan-review-${REVIEW_I
``` ```
- If `REVIEWER_CLI=codex`, extract `CODEX_SESSION_ID` from `/tmp/plan-review-${REVIEW_ID}.runner.out` after the helper or fallback run. If the review text is only in `.runner.out`, move or copy the actual review body into `/tmp/plan-review-${REVIEW_ID}.md` before verdict parsing. - If `REVIEWER_CLI=codex`, extract `CODEX_SESSION_ID` from `/tmp/plan-review-${REVIEW_ID}.runner.out` after the helper or fallback run. If the review text is only in `.runner.out`, move or copy the actual review body into `/tmp/plan-review-${REVIEW_ID}.md` before verdict parsing.
- If `REVIEWER_CLI=claude`, promote stdout captured by the helper or fallback runner into the markdown review file:
```bash
cp /tmp/plan-review-${REVIEW_ID}.runner.out /tmp/plan-review-${REVIEW_ID}.md
```
#### Step 4: Read Review & Check Verdict #### Step 4: Read Review & Check Verdict
@@ -237,17 +337,19 @@ jq -r '.result' /tmp/plan-review-${REVIEW_ID}.json > /tmp/plan-review-${REVIEW_I
[Reviewer feedback] [Reviewer feedback]
``` ```
3. Check verdict: 4. While the reviewer is still running, keep waiting as long as fresh `state=in-progress note="In progress N"` heartbeats continue to appear roughly once per minute.
- **VERDICT: APPROVED** → proceed to Phase 7 (Initialize workspace) 5. Check verdict:
- **VERDICT: REVISE** → go to Step 5 - **VERDICT: APPROVED** with no `P0`, `P1`, or `P2` findings → proceed to Phase 7 (Initialize workspace)
- No clear verdict but positive / no actionable items → treat as approved - **VERDICT: APPROVED** with only `P3` findings → optionally fix the `P3` items if they are cheap and safe, then proceed
- **VERDICT: REVISE** or any `P0`, `P1`, or `P2` finding → go to Step 5
- No clear verdict but `P0`, `P1`, and `P2` are all `- None.` → treat as approved
- Helper state `completed-empty-output` → treat as failed review attempt, surface stderr/status, fix invocation or prompt handling, then retry - Helper state `completed-empty-output` → treat as failed review attempt, surface stderr/status, fix invocation or prompt handling, then retry
- Helper state `needs-operator-decision` → surface status log and decide whether to keep waiting, abort, or retry with different helper parameters - Helper state `needs-operator-decision` → surface status log and decide whether to extend the timeout, abort, or retry with different helper parameters
- Max rounds (5) reached → proceed with warning - Max rounds (`MAX_ROUNDS`) reached → present the outcome to the user for a manual decision (proceed or stop)
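The verdict rules above can be sketched as a small shell function. This is a minimal sketch, not part of the skill: `check_verdict` is a hypothetical name, and it assumes the review file follows the Review Contract exactly (one `### P0` to `### P3` heading per severity, findings written as `- ` bullets, and `- None.` for empty severities).

```bash
# Hypothetical helper: classify a review file as approved / revise / unknown.
# Assumption: the file follows the Review Contract layout described in this skill.
check_verdict() (
  review_file=$1
  [ -r "$review_file" ] || { echo unknown; exit 0; }

  # Emit "<blocking findings> <'- None.' placeholders>" for the P0-P2 sections.
  set -- $(awk '
    /^### P[0-2] *$/ { in_blocking = 1; next }
    /^##/            { in_blocking = 0 }
    in_blocking && $0 == "- None." { nones++; next }
    in_blocking && /^- /           { blocking++ }
    END { print blocking + 0, nones + 0 }
  ' "$review_file")
  blocking=$1
  nones=$2

  # Use the last well-formed verdict line, if there is one.
  verdict=$(grep -E '^VERDICT: (APPROVED|REVISE)$' "$review_file" | tail -n 1)

  if [ "$blocking" -gt 0 ] || [ "$verdict" = "VERDICT: REVISE" ]; then
    echo revise
  elif [ "$verdict" = "VERDICT: APPROVED" ]; then
    echo approved
  elif [ "$nones" -eq 3 ]; then
    echo approved   # no clear verdict, but P0, P1, and P2 are all "- None."
  else
    echo unknown
  fi
)
```

A result of `unknown` is a cue to read the review by hand; the helper states `completed-empty-output` and `needs-operator-decision` come from the status file and are outside what this function inspects.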
#### Step 5: Revise the Plan #### Step 5: Revise the Plan
Address each issue the reviewer raised. Update the plan in conversation context and rewrite `/tmp/plan-${REVIEW_ID}.md`. Address the reviewer findings in priority order (`P0` → `P1` → `P2`, then `P3` when practical). Update the plan in conversation context and rewrite `/tmp/plan-${REVIEW_ID}.md`.
Summarize revisions for the user: Summarize revisions for the user:
@@ -258,7 +360,9 @@ Summarize revisions for the user:
If a revision contradicts the user's explicit requirements, skip it and note it for the user. If a revision contradicts the user's explicit requirements, skip it and note it for the user.
#### Step 6: Re-submit to Reviewer (Rounds 2-5) #### Step 6: Re-submit to Reviewer (Rounds 2-N)
Rewrite `/tmp/plan-review-${REVIEW_ID}.sh` for the next round. The script should contain the reviewer invocation only; do not run it directly.
**If `REVIEWER_CLI` is `codex`:** **If `REVIEWER_CLI` is `codex`:**
@@ -272,8 +376,8 @@ codex exec resume ${CODEX_SESSION_ID} \
Changes made: Changes made:
[List specific changes] [List specific changes]
Re-review. If solid, end with: VERDICT: APPROVED Re-review using the same `## Summary`, `## Findings`, and `## Verdict` structure as before.
If more changes needed, end with: VERDICT: REVISE" Keep findings ordered `P0` to `P3`, use `- None.` when a severity has no findings, and only use `VERDICT: APPROVED` when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking."
``` ```
If resume fails (session expired), fall back to fresh `codex exec` with context about prior rounds. If resume fails (session expired), fall back to fresh `codex exec` with context about prior rounds.
@@ -295,8 +399,8 @@ $(cat /tmp/plan-${REVIEW_ID}.md)
Changes made: Changes made:
[List specific changes] [List specific changes]
Re-review the full plan. If solid, end with: VERDICT: APPROVED Re-review the full plan using the same `## Summary`, `## Findings`, and `## Verdict` structure as before.
If more changes needed, end with: VERDICT: REVISE" \ Keep findings ordered `P0` to `P3`, use `- None.` when a severity has no findings, and only use `VERDICT: APPROVED` when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking." \
--model ${REVIEWER_MODEL} \ --model ${REVIEWER_MODEL} \
--strict-mcp-config \ --strict-mcp-config \
--setting-sources user --setting-sources user
@@ -317,8 +421,8 @@ cursor-agent --resume ${CURSOR_SESSION_ID} -p \
Changes made: Changes made:
[List specific changes] [List specific changes]
Re-review. If solid, end with: VERDICT: APPROVED Re-review using the same `## Summary`, `## Findings`, and `## Verdict` structure as before.
If more changes needed, end with: VERDICT: REVISE" \ Keep findings ordered `P0` to `P3`, use `- None.` when a severity has no findings, and only use `VERDICT: APPROVED` when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking." \
> /tmp/plan-review-${REVIEW_ID}.json > /tmp/plan-review-${REVIEW_ID}.json
jq -r '.result' /tmp/plan-review-${REVIEW_ID}.json > /tmp/plan-review-${REVIEW_ID}.md jq -r '.result' /tmp/plan-review-${REVIEW_ID}.json > /tmp/plan-review-${REVIEW_ID}.md
@@ -337,7 +441,7 @@ Return to Step 4.
**Status:** Approved after N round(s) **Status:** Approved after N round(s)
[or] [or]
**Status:** Max rounds (5) reached — not fully approved **Status:** Max rounds (`MAX_ROUNDS`) reached — not fully approved
[Final feedback / remaining concerns] [Final feedback / remaining concerns]
``` ```
@@ -384,19 +488,45 @@ Always instruct the executing agent:
Do not rely on planner-private files during implementation. Do not rely on planner-private files during implementation.
### Phase 10: Telegram Completion Notification (MANDATORY)
Resolve the Telegram notifier helper from Cursor's installed skills directory:
```bash
if [ -x .cursor/skills/reviewer-runtime/notify-telegram.sh ]; then
TELEGRAM_NOTIFY_RUNTIME=.cursor/skills/reviewer-runtime/notify-telegram.sh
else
TELEGRAM_NOTIFY_RUNTIME=~/.cursor/skills/reviewer-runtime/notify-telegram.sh
fi
```
On every terminal outcome for the create-plan run (approved, max rounds reached, skipped reviewer, or failure), send a Telegram summary if the helper exists and both `TELEGRAM_BOT_TOKEN` and `TELEGRAM_CHAT_ID` are configured:
```bash
if [ -x "$TELEGRAM_NOTIFY_RUNTIME" ] && [ -n "${TELEGRAM_BOT_TOKEN:-}" ] && [ -n "${TELEGRAM_CHAT_ID:-}" ]; then
"$TELEGRAM_NOTIFY_RUNTIME" --message "create-plan completed for <plan-folder-name>: <status summary>"
fi
```
Rules:
- Telegram is the only supported completion notification path. Do not use desktop notifications, `say`, email, or any other notifier.
- Notification failures are non-blocking, but they must be surfaced to the user.
- If Telegram is not configured, state that no completion notification was sent.
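The three rules above can be combined in one wrapper. This is a sketch under stated assumptions: `notify_completion` is a hypothetical name, the only helper flag used is the `--message` flag shown above, and the helper is assumed to exit non-zero when delivery fails.

```bash
# Hypothetical wrapper: attempt the Telegram notification and always report the outcome.
# Assumption: notify-telegram.sh exits non-zero on failure. Never aborts the caller.
notify_completion() {
  message=$1
  if [ ! -x "${TELEGRAM_NOTIFY_RUNTIME:-}" ] ||
     [ -z "${TELEGRAM_BOT_TOKEN:-}" ] || [ -z "${TELEGRAM_CHAT_ID:-}" ]; then
    echo "Telegram not configured; no completion notification was sent."
  elif "$TELEGRAM_NOTIFY_RUNTIME" --message "$message" >/dev/null 2>&1; then
    echo "Telegram completion notification sent."
  else
    # Non-blocking, but surfaced to the user on stderr.
    echo "Warning: Telegram notification failed (non-blocking)." >&2
  fi
  return 0
}
```

Returning 0 in every branch keeps a notification problem from failing the create-plan run, while the printed line gives the agent something concrete to relay to the user.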
## Quick Reference ## Quick Reference
| Phase | Action | Required Output | | Phase | Action | Required Output |
|---|---|---| |---|---|---|
| 1 | Analyze codebase/context | Constraints and known patterns | | 1 | Analyze codebase/context | Constraints and known patterns |
| 2 | Gather requirements (one question at a time) | Confirmed scope and success criteria | | 2 | Gather requirements (one question at a time) | Confirmed scope and success criteria |
| 3 | Configure reviewer CLI and model | `REVIEWER_CLI` and `REVIEWER_MODEL` (or `skip`) | | 3 | Configure reviewer CLI and model | `REVIEWER_CLI`, `REVIEWER_MODEL`, `MAX_ROUNDS` (or `skip`) |
| 4 | Invoke `superpowers:brainstorming` | Chosen design approach | | 4 | Invoke `superpowers:brainstorming` | Chosen design approach |
| 5 | Invoke `superpowers:writing-plans` | Milestones and bite-sized stories | | 5 | Invoke `superpowers:writing-plans` | Milestones and bite-sized stories |
| 6 | Iterative plan review (max 5 rounds) | Reviewer approval or max-rounds warning | | 6 | Iterative plan review (max `MAX_ROUNDS` rounds) | Reviewer approval or max-rounds warning |
| 7 | Initialize `ai_plan/` + `.gitignore` | Local planning workspace ready | | 7 | Initialize `ai_plan/` + `.gitignore` | Local planning workspace ready |
| 8 | Build plan package from templates | Full plan folder with required files | | 8 | Build plan package from templates | Full plan folder with required files |
| 9 | Handoff with runbook-first instruction | Resumable execution context | | 9 | Handoff with runbook-first instruction | Resumable execution context |
| 10 | Send Telegram completion notification | User notified or notification status reported |
## Tracker Discipline (MANDATORY) ## Tracker Discipline (MANDATORY)
@@ -437,6 +567,7 @@ After completing any story:
- Omitting one or more required files in the plan package. - Omitting one or more required files in the plan package.
- Handoff without explicit "read runbook first" direction. - Handoff without explicit "read runbook first" direction.
- Skipping the reviewer phase without explicit user opt-out. - Skipping the reviewer phase without explicit user opt-out.
- Using any completion notification path other than Telegram.
## Red Flags - Stop and Correct ## Red Flags - Stop and Correct
@@ -454,6 +585,7 @@ After completing any story:
- [ ] `.gitignore` ignore-rule commit was created if needed - [ ] `.gitignore` ignore-rule commit was created if needed
- [ ] Plan directory created under `ai_plan/YYYY-MM-DD-<short-title>/` - [ ] Plan directory created under `ai_plan/YYYY-MM-DD-<short-title>/`
- [ ] Reviewer configured or explicitly skipped - [ ] Reviewer configured or explicitly skipped
- [ ] Max review rounds confirmed (default: 10)
- [ ] Plan review completed (approved or max rounds) — or skipped - [ ] Plan review completed (approved or max rounds) — or skipped
- [ ] `original-plan.md` present - [ ] `original-plan.md` present
- [ ] `final-transcript.md` present - [ ] `final-transcript.md` present
@@ -461,6 +593,7 @@ After completing any story:
- [ ] `story-tracker.md` present - [ ] `story-tracker.md` present
- [ ] `continuation-runbook.md` present - [ ] `continuation-runbook.md` present
- [ ] Handoff explicitly says to read runbook first and execute from plan folder - [ ] Handoff explicitly says to read runbook first and execute from plan folder
- [ ] Telegram completion notification attempted if configured
## Exit Triggers for Question Phase ## Exit Triggers for Question Phase
User says: "ready", "done", "let's plan", "proceed", "enough questions" User says: "ready", "done", "let's plan", "proceed", "enough questions"


@@ -59,7 +59,10 @@ If the user has already specified a reviewer CLI and model (e.g., "create a plan
- For `cursor`: **run `cursor-agent models` first** to see your account's available models (availability varies by subscription) - For `cursor`: **run `cursor-agent models` first** to see your account's available models (availability varies by subscription)
- Accept any model string the user provides - Accept any model string the user provides
Store the chosen `REVIEWER_CLI` and `REVIEWER_MODEL` for Phase 7 (Iterative Plan Review). 3. **Max review rounds for the plan?** (default: 10)
- If the user does not provide a value, set `MAX_ROUNDS=10`.
Store the chosen `REVIEWER_CLI`, `REVIEWER_MODEL`, and `MAX_ROUNDS` for Phase 7 (Iterative Plan Review).
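The default handling for `MAX_ROUNDS` might be sketched like this. `resolve_max_rounds` is a hypothetical name, and falling back to 10 for non-numeric or zero input is an assumption of this sketch; the skill itself only specifies the default for a missing value.

```bash
# Hypothetical helper: normalize the user's answer to a positive integer.
# Assumption: invalid input is treated like a missing value and falls back to 10.
resolve_max_rounds() {
  case "$1" in
    ''|*[!0-9]*) echo 10 ;;   # empty or non-numeric: use the default
    *) if [ "$1" -gt 0 ]; then echo "$1"; else echo 10; fi ;;
  esac
}

# Usage sketch: USER_ANSWER stands in for whatever the user replied (may be empty).
MAX_ROUNDS=$(resolve_max_rounds "${USER_ANSWER:-}")
```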
### Phase 5: Design (REQUIRED SUB-SKILL) ### Phase 5: Design (REQUIRED SUB-SKILL)
@@ -78,7 +81,7 @@ Story IDs: `S-{milestone}{sequence}`.
### Phase 7: Iterative Plan Review ### Phase 7: Iterative Plan Review
Send the plan to the configured reviewer CLI for feedback. Revise and re-submit until approved (max 5 rounds). Send the plan to the configured reviewer CLI for feedback. Revise and re-submit until approved (default max 10 rounds).
**Skip this phase entirely if reviewer was set to `skip`.** **Skip this phase entirely if reviewer was set to `skip`.**
@@ -103,10 +106,60 @@ Resolve the shared reviewer helper from the installed OpenCode skills directory:
REVIEWER_RUNTIME=~/.config/opencode/skills/reviewer-runtime/run-review.sh REVIEWER_RUNTIME=~/.config/opencode/skills/reviewer-runtime/run-review.sh
``` ```
Set helper success-artifact args before writing the command script:
```bash
HELPER_SUCCESS_FILE_ARGS=()
case "$REVIEWER_CLI" in
codex)
HELPER_SUCCESS_FILE_ARGS+=(--success-file /tmp/plan-review-${REVIEW_ID}.md)
;;
cursor)
HELPER_SUCCESS_FILE_ARGS+=(--success-file /tmp/plan-review-${REVIEW_ID}.json)
;;
esac
```
#### Step 2: Write Plan to Temp File #### Step 2: Write Plan to Temp File
Write the complete plan (milestones, stories, design decisions, specs) to `/tmp/plan-${REVIEW_ID}.md`. Write the complete plan (milestones, stories, design decisions, specs) to `/tmp/plan-${REVIEW_ID}.md`.
#### Review Contract (Applies to Every Round)
The reviewer response must use this structure:
```text
## Summary
...
## Findings
### P0
- ...
### P1
- ...
### P2
- ...
### P3
- ...
## Verdict
VERDICT: APPROVED
```
Rules:
- Order findings from `P0` to `P3`.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- Use `- None.` when a severity has no findings.
- `VERDICT: APPROVED` is allowed only when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking.
- The calling agent should still try to fix `P3` findings when they are cheap and safe.
#### Liveness Contract (Applies While Review Is Running)
- The shared reviewer runtime emits `state=in-progress note="In progress N"` heartbeats every 60 seconds while the reviewer child is alive.
- The calling agent must keep waiting as long as a fresh `In progress N` heartbeat keeps arriving roughly once per minute.
- Do not abort just because the review is slow, a soft timeout fired, or a `stall-warning` line appears, as long as the `In progress N` heartbeat continues.
- Treat missing heartbeats, `state=failed`, `state=completed-empty-output`, and `state=needs-operator-decision` as escalation signals.
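One way to apply the "fresh heartbeat" rule is to compare the heartbeat counter between polls. The sketch below assumes heartbeat lines literally contain `note="In progress N"`, as stated above, and that they are readable from the file being polled; `last_heartbeat` and `heartbeat_advanced` are hypothetical names.

```bash
# Hypothetical helper: print the counter N from the most recent heartbeat line.
last_heartbeat() {
  sed -n 's/.*note="In progress \([0-9][0-9]*\)".*/\1/p' "$1" 2>/dev/null | tail -n 1
}

# A heartbeat counts as fresh when a counter exists and it moved since the previous poll.
heartbeat_advanced() {
  previous=$1
  current=$2
  [ -n "$current" ] && [ "$current" != "$previous" ]
}
```

Comparing counters avoids depending on file modification times, which differ in how they are queried across platforms.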
#### Step 3: Submit to Reviewer (Round 1) #### Step 3: Submit to Reviewer (Round 1)
Write the reviewer invocation to `/tmp/plan-review-${REVIEW_ID}.sh` as a bash script: Write the reviewer invocation to `/tmp/plan-review-${REVIEW_ID}.sh` as a bash script:
@@ -130,8 +183,21 @@ codex exec \
4. Alternatives — Is there a simpler or better approach? 4. Alternatives — Is there a simpler or better approach?
5. Security — Any security concerns? 5. Security — Any security concerns?
Be specific and actionable. If the plan is solid, end with exactly: VERDICT: APPROVED Return exactly these sections in order:
If changes are needed, end with exactly: VERDICT: REVISE" ## Summary
## Findings
### P0
### P1
### P2
### P3
## Verdict
Rules:
- Order findings from highest severity to lowest.
- Use `- None.` when a severity has no findings.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`
- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking."
``` ```
Do not try to capture the Codex session ID yet. When using the helper, extract it from `/tmp/plan-review-${REVIEW_ID}.runner.out` after the command completes (look for `session id: <uuid>`), then store it as `CODEX_SESSION_ID` for resume in subsequent rounds. Do not try to capture the Codex session ID yet. When using the helper, extract it from `/tmp/plan-review-${REVIEW_ID}.runner.out` after the command completes (look for `session id: <uuid>`), then store it as `CODEX_SESSION_ID` for resume in subsequent rounds.
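Extracting the session ID after the run could look like the sketch below. `extract_codex_session_id` is a hypothetical name, and the pattern assumes the ID is a standard 36-character UUID printed on a line containing `session id: `, as described above.

```bash
# Hypothetical helper: print the first "session id: <uuid>" value found in the runner output.
# Assumption: the ID is a 36-character UUID made of hex digits and dashes.
extract_codex_session_id() {
  sed -n 's/.*session id: \([0-9a-fA-F-]\{36\}\).*/\1/p' "$1" 2>/dev/null | head -n 1
}

# Usage sketch (commented out, not run here):
#   CODEX_SESSION_ID=$(extract_codex_session_id /tmp/plan-review-${REVIEW_ID}.runner.out)
```

An empty result means no session ID was found, in which case later rounds would need a fresh `codex exec` instead of a resume.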
@@ -150,8 +216,21 @@ $(cat /tmp/plan-${REVIEW_ID}.md)
4. Alternatives — Is there a simpler or better approach? 4. Alternatives — Is there a simpler or better approach?
5. Security — Any security concerns? 5. Security — Any security concerns?
Be specific and actionable. If the plan is solid, end with exactly: VERDICT: APPROVED Return exactly these sections in order:
If changes are needed, end with exactly: VERDICT: REVISE" \ ## Summary
## Findings
### P0
### P1
### P2
### P3
## Verdict
Rules:
- Order findings from highest severity to lowest.
- Use `- None.` when a severity has no findings.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`
- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking." \
--model ${REVIEWER_MODEL} \ --model ${REVIEWER_MODEL} \
--strict-mcp-config \ --strict-mcp-config \
--setting-sources user --setting-sources user
@@ -172,8 +251,21 @@ cursor-agent -p \
4. Alternatives — Is there a simpler or better approach? 4. Alternatives — Is there a simpler or better approach?
5. Security — Any security concerns? 5. Security — Any security concerns?
Be specific and actionable. If the plan is solid, end with exactly: VERDICT: APPROVED Return exactly these sections in order:
If changes are needed, end with exactly: VERDICT: REVISE" \ ## Summary
## Findings
### P0
### P1
### P2
### P3
## Verdict
Rules:
- Order findings from highest severity to lowest.
- Use `- None.` when a severity has no findings.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`
- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking." \
> /tmp/plan-review-${REVIEW_ID}.json > /tmp/plan-review-${REVIEW_ID}.json
``` ```
@@ -187,13 +279,16 @@ if [ -x "$REVIEWER_RUNTIME" ]; then
--command-file /tmp/plan-review-${REVIEW_ID}.sh \ --command-file /tmp/plan-review-${REVIEW_ID}.sh \
--stdout-file /tmp/plan-review-${REVIEW_ID}.runner.out \ --stdout-file /tmp/plan-review-${REVIEW_ID}.runner.out \
--stderr-file /tmp/plan-review-${REVIEW_ID}.stderr \ --stderr-file /tmp/plan-review-${REVIEW_ID}.stderr \
--status-file /tmp/plan-review-${REVIEW_ID}.status --status-file /tmp/plan-review-${REVIEW_ID}.status \
"${HELPER_SUCCESS_FILE_ARGS[@]}"
else else
echo "Warning: reviewer runtime helper not found at $REVIEWER_RUNTIME; falling back to direct synchronous review." >&2 echo "Warning: reviewer runtime helper not found at $REVIEWER_RUNTIME; falling back to direct synchronous review." >&2
bash /tmp/plan-review-${REVIEW_ID}.sh >/tmp/plan-review-${REVIEW_ID}.runner.out 2>/tmp/plan-review-${REVIEW_ID}.stderr bash /tmp/plan-review-${REVIEW_ID}.sh >/tmp/plan-review-${REVIEW_ID}.runner.out 2>/tmp/plan-review-${REVIEW_ID}.stderr
fi fi
``` ```
Run the helper in the foreground and watch its live stdout for `state=in-progress` heartbeats. If your agent environment buffers command output until exit, start the helper in the background and poll `/tmp/plan-review-${REVIEW_ID}.status` separately instead of treating heartbeats as post-hoc-only data.
After the command completes: After the command completes:
- If `REVIEWER_CLI=cursor`, extract the final review text: - If `REVIEWER_CLI=cursor`, extract the final review text:
@@ -203,6 +298,11 @@ jq -r '.result' /tmp/plan-review-${REVIEW_ID}.json > /tmp/plan-review-${REVIEW_I
``` ```
- If `REVIEWER_CLI=codex`, extract `CODEX_SESSION_ID` from `/tmp/plan-review-${REVIEW_ID}.runner.out` after the helper or fallback run. If the review text is only in `.runner.out`, move or copy the actual review body into `/tmp/plan-review-${REVIEW_ID}.md` before verdict parsing. - If `REVIEWER_CLI=codex`, extract `CODEX_SESSION_ID` from `/tmp/plan-review-${REVIEW_ID}.runner.out` after the helper or fallback run. If the review text is only in `.runner.out`, move or copy the actual review body into `/tmp/plan-review-${REVIEW_ID}.md` before verdict parsing.
- If `REVIEWER_CLI=claude`, promote stdout captured by the helper or fallback runner into the markdown review file:
```bash
cp /tmp/plan-review-${REVIEW_ID}.runner.out /tmp/plan-review-${REVIEW_ID}.md
```
#### Step 4: Read Review & Check Verdict #### Step 4: Read Review & Check Verdict
@@ -219,17 +319,19 @@ jq -r '.result' /tmp/plan-review-${REVIEW_ID}.json > /tmp/plan-review-${REVIEW_I
[Reviewer feedback] [Reviewer feedback]
``` ```
3. Check verdict: 4. While the reviewer is still running, keep waiting as long as fresh `state=in-progress note="In progress N"` heartbeats continue to appear roughly once per minute.
- **VERDICT: APPROVED** → proceed to Phase 8 (Initialize workspace) 5. Check verdict:
- **VERDICT: REVISE** → go to Step 5 - **VERDICT: APPROVED** with no `P0`, `P1`, or `P2` findings → proceed to Phase 8 (Initialize workspace)
- No clear verdict but positive / no actionable items → treat as approved - **VERDICT: APPROVED** with only `P3` findings → optionally fix the `P3` items if they are cheap and safe, then proceed
- **VERDICT: REVISE** or any `P0`, `P1`, or `P2` finding → go to Step 5
- No clear verdict but `P0`, `P1`, and `P2` are all `- None.` → treat as approved
- Helper state `completed-empty-output` → treat as failed review attempt, surface stderr/status, fix invocation or prompt handling, then retry - Helper state `completed-empty-output` → treat as failed review attempt, surface stderr/status, fix invocation or prompt handling, then retry
- Helper state `needs-operator-decision` → surface status log and decide whether to keep waiting, abort, or retry with different helper parameters - Helper state `needs-operator-decision` → surface status log and decide whether to extend the timeout, abort, or retry with different helper parameters
- Max rounds (5) reached → proceed with warning - Max rounds (`MAX_ROUNDS`) reached → present the outcome to the user for a manual decision (proceed or stop)
#### Step 5: Revise the Plan #### Step 5: Revise the Plan
Address each issue the reviewer raised. Update the plan in conversation context and rewrite `/tmp/plan-${REVIEW_ID}.md`. Address the reviewer findings in priority order (`P0` → `P1` → `P2`, then `P3` when practical). Update the plan in conversation context and rewrite `/tmp/plan-${REVIEW_ID}.md`.
Summarize revisions for the user: Summarize revisions for the user:
@@ -240,7 +342,9 @@ Summarize revisions for the user:
If a revision contradicts the user's explicit requirements, skip it and note it for the user. If a revision contradicts the user's explicit requirements, skip it and note it for the user.
#### Step 6: Re-submit to Reviewer (Rounds 2-5) #### Step 6: Re-submit to Reviewer (Rounds 2-N)
Rewrite `/tmp/plan-review-${REVIEW_ID}.sh` for the next round. The script should contain the reviewer invocation only; do not run it directly.
**If `REVIEWER_CLI` is `codex`:** **If `REVIEWER_CLI` is `codex`:**
@@ -254,8 +358,8 @@ codex exec resume ${CODEX_SESSION_ID} \
Changes made: Changes made:
[List specific changes] [List specific changes]
Re-review. If solid, end with: VERDICT: APPROVED Re-review using the same `## Summary`, `## Findings`, and `## Verdict` structure as before.
If more changes needed, end with: VERDICT: REVISE" Keep findings ordered `P0` to `P3`, use `- None.` when a severity has no findings, and only use `VERDICT: APPROVED` when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking."
``` ```
If resume fails (session expired), fall back to fresh `codex exec` with context about prior rounds. If resume fails (session expired), fall back to fresh `codex exec` with context about prior rounds.
@@ -277,8 +381,8 @@ $(cat /tmp/plan-${REVIEW_ID}.md)
Changes made: Changes made:
[List specific changes] [List specific changes]
Re-review the full plan. If solid, end with: VERDICT: APPROVED Re-review the full plan using the same `## Summary`, `## Findings`, and `## Verdict` structure as before.
If more changes needed, end with: VERDICT: REVISE" \ Keep findings ordered `P0` to `P3`, use `- None.` when a severity has no findings, and only use `VERDICT: APPROVED` when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking." \
--model ${REVIEWER_MODEL} \ --model ${REVIEWER_MODEL} \
--strict-mcp-config \ --strict-mcp-config \
--setting-sources user --setting-sources user
@@ -299,8 +403,8 @@ cursor-agent --resume ${CURSOR_SESSION_ID} -p \
Changes made: Changes made:
[List specific changes] [List specific changes]
Re-review. If solid, end with: VERDICT: APPROVED Re-review using the same `## Summary`, `## Findings`, and `## Verdict` structure as before.
If more changes needed, end with: VERDICT: REVISE" \ Keep findings ordered `P0` to `P3`, use `- None.` when a severity has no findings, and only use `VERDICT: APPROVED` when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking." \
> /tmp/plan-review-${REVIEW_ID}.json > /tmp/plan-review-${REVIEW_ID}.json
jq -r '.result' /tmp/plan-review-${REVIEW_ID}.json > /tmp/plan-review-${REVIEW_ID}.md jq -r '.result' /tmp/plan-review-${REVIEW_ID}.json > /tmp/plan-review-${REVIEW_ID}.md
@@ -319,7 +423,7 @@ Return to Step 4.
**Status:** Approved after N round(s) **Status:** Approved after N round(s)
[or] [or]
**Status:** Max rounds (5) reached — not fully approved **Status:** Max rounds (`MAX_ROUNDS`) reached — not fully approved
[Final feedback / remaining concerns] [Final feedback / remaining concerns]
``` ```
@@ -364,6 +468,27 @@ Use templates from this skill's `templates/` folder.
Always instruct the executing agent: Always instruct the executing agent:
> Read `ai_plan/YYYY-MM-DD-<short-title>/continuation-runbook.md` first, then execute from `ai_plan` files only. > Read `ai_plan/YYYY-MM-DD-<short-title>/continuation-runbook.md` first, then execute from `ai_plan` files only.
### Phase 11: Telegram Completion Notification (MANDATORY)
Resolve the Telegram notifier helper from the installed OpenCode skills directory:
```bash
TELEGRAM_NOTIFY_RUNTIME=~/.config/opencode/skills/reviewer-runtime/notify-telegram.sh
```
On every terminal outcome for the create-plan run (approved, max rounds reached, skipped reviewer, or failure), send a Telegram summary if the helper exists and both `TELEGRAM_BOT_TOKEN` and `TELEGRAM_CHAT_ID` are configured:
```bash
if [ -x "$TELEGRAM_NOTIFY_RUNTIME" ] && [ -n "${TELEGRAM_BOT_TOKEN:-}" ] && [ -n "${TELEGRAM_CHAT_ID:-}" ]; then
"$TELEGRAM_NOTIFY_RUNTIME" --message "create-plan completed for <plan-folder-name>: <status summary>"
fi
```
Rules:
- Telegram is the only supported completion notification path. Do not use desktop notifications, `say`, email, or any other notifier.
- Notification failures are non-blocking, but they must be surfaced to the user.
- If Telegram is not configured, state that no completion notification was sent.
## Tracker Discipline (MANDATORY) ## Tracker Discipline (MANDATORY)
Before starting any story: Before starting any story:
@@ -400,6 +525,7 @@ After completing any story:
- [ ] `.gitignore` ignore-rule commit was created if needed - [ ] `.gitignore` ignore-rule commit was created if needed
- [ ] Plan directory created under `ai_plan/YYYY-MM-DD-<short-title>/` - [ ] Plan directory created under `ai_plan/YYYY-MM-DD-<short-title>/`
- [ ] Reviewer configured or explicitly skipped - [ ] Reviewer configured or explicitly skipped
- [ ] Max review rounds confirmed (default: 10)
- [ ] Plan review completed (approved or max rounds) — or skipped - [ ] Plan review completed (approved or max rounds) — or skipped
- [ ] `original-plan.md` present - [ ] `original-plan.md` present
- [ ] `final-transcript.md` present - [ ] `final-transcript.md` present
@@ -407,6 +533,7 @@ After completing any story:
- [ ] `story-tracker.md` created with all stories as `pending` - [ ] `story-tracker.md` created with all stories as `pending`
- [ ] `continuation-runbook.md` present - [ ] `continuation-runbook.md` present
- [ ] Handoff explicitly says to read runbook first and execute from plan folder - [ ] Handoff explicitly says to read runbook first and execute from plan folder
- [ ] Telegram completion notification attempted if configured
## Exit Triggers for Question Phase ## Exit Triggers for Question Phase
User says: "ready", "done", "let's plan", "proceed", "enough questions" User says: "ready", "done", "let's plan", "proceed", "enough questions"


@@ -151,6 +151,20 @@ Resolve the shared runtime helper path before writing the command script:
REVIEWER_RUNTIME=~/.claude/skills/reviewer-runtime/run-review.sh REVIEWER_RUNTIME=~/.claude/skills/reviewer-runtime/run-review.sh
``` ```
Set helper success-artifact args before writing the command script:
```bash
HELPER_SUCCESS_FILE_ARGS=()
case "$REVIEWER_CLI" in
codex)
HELPER_SUCCESS_FILE_ARGS+=(--success-file /tmp/milestone-review-${REVIEW_ID}.md)
;;
cursor)
HELPER_SUCCESS_FILE_ARGS+=(--success-file /tmp/milestone-review-${REVIEW_ID}.json)
;;
esac
```
#### Step 2: Write Review Payload #### Step 2: Write Review Payload
Write to `/tmp/milestone-${REVIEW_ID}.md`: Write to `/tmp/milestone-${REVIEW_ID}.md`:
@@ -176,6 +190,42 @@ Write to `/tmp/milestone-${REVIEW_ID}.md`:
[test output with pass/fail counts] [test output with pass/fail counts]
``` ```
#### Review Contract (Applies to Every Round)
The reviewer response must use this structure:
```text
## Summary
...
## Findings
### P0
- ...
### P1
- ...
### P2
- ...
### P3
- ...
## Verdict
VERDICT: APPROVED
```
Rules:
- Order findings from `P0` to `P3`.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- Use `- None.` when a severity has no findings.
- `VERDICT: APPROVED` is allowed only when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking.
- The calling agent should still try to fix `P3` findings when they are cheap and safe.
#### Liveness Contract (Applies While Review Is Running)
- The shared reviewer runtime emits `state=in-progress note="In progress N"` heartbeats every 60 seconds while the reviewer child is alive.
- The calling agent must keep waiting as long as a fresh `In progress N` heartbeat keeps arriving roughly once per minute.
- Do not abort just because the review is slow, a soft timeout fired, or a `stall-warning` line appears, as long as the `In progress N` heartbeat continues.
- Treat missing heartbeats, `state=failed`, `state=completed-empty-output`, and `state=needs-operator-decision` as escalation signals.
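As an illustration of the liveness rules, the sketch below reads the most recent state and heartbeat counter and maps the state to an action. It assumes the status file contains the same `state=... note="In progress N"` lines described above; the function names and the `wait` / `escalate` / `check-output` labels are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (assumption): the status file is assumed to hold
# lines such as: state=in-progress note="In progress 3"

# Print the most recent state value found in the status file.
latest_state() {
  grep -o 'state=[a-z-]*' "$1" | tail -n 1 | cut -d= -f2
}

# Print the most recent heartbeat counter N from "In progress N".
latest_heartbeat() {
  grep -o 'In progress [0-9]*' "$1" | tail -n 1 | awk '{ print $3 }'
}

# Map a state to what the calling agent should do next.
classify_state() {
  case "$1" in
    in-progress) echo wait ;;
    failed|completed-empty-output|needs-operator-decision) echo escalate ;;
    *) echo check-output ;;
  esac
}
```

A caller could compare `latest_heartbeat` between two polls about a minute apart: a counter that has not advanced is the "missing heartbeat" case, which the contract treats as an escalation signal.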
#### Step 3: Submit to Reviewer (Round 1)
Write the reviewer invocation to `/tmp/milestone-review-${REVIEW_ID}.sh` as a bash script:
@@ -201,8 +251,22 @@ Evaluate:
4. Test coverage — Are changes adequately tested?
5. Security — Any security concerns introduced?
Return exactly these sections in order:
## Summary
## Findings
### P0
### P1
### P2
### P3
## Verdict
Rules:
- Order findings from highest severity to lowest.
- Use `- None.` when a severity has no findings.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`
- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking."
```
Do not try to capture the Codex session ID yet. When using the helper, extract it from `/tmp/milestone-review-${REVIEW_ID}.runner.out` after the command completes (look for `session id: <uuid>`), then store it as `CODEX_SESSION_ID` for resume in subsequent rounds.
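One way to pull the session ID out of the runner output is sketched below. It assumes the line looks like `session id: <uuid>` with a standard 36-character UUID; `extract_codex_session_id` is a hypothetical name, and the exact Codex output layout is not confirmed here.

```bash
#!/usr/bin/env bash
# Hypothetical helper (assumption): prints the first UUID that follows
# "session id: " in the given file, or nothing when no such line exists.
extract_codex_session_id() {
  local runner_out="$1"
  sed -n 's/.*session id: \([0-9a-fA-F-]\{36\}\).*/\1/p' "$runner_out" | head -n 1
}

# Example use after the review command has completed:
#   CODEX_SESSION_ID="$(extract_codex_session_id "/tmp/milestone-review-${REVIEW_ID}.runner.out")"
```

An empty result means no session ID was found, in which case later rounds cannot resume and would need a fresh `codex exec`.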
@@ -222,8 +286,22 @@ Evaluate:
4. Test coverage — Are changes adequately tested?
5. Security — Any security concerns introduced?
Return exactly these sections in order:
## Summary
## Findings
### P0
### P1
### P2
### P3
## Verdict
Rules:
- Order findings from highest severity to lowest.
- Use `- None.` when a severity has no findings.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`
- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking." \
--model ${REVIEWER_MODEL} \
--strict-mcp-config \
--setting-sources user
@@ -246,8 +324,22 @@ Evaluate:
4. Test coverage — Are changes adequately tested?
5. Security — Any security concerns introduced?
Return exactly these sections in order:
## Summary
## Findings
### P0
### P1
### P2
### P3
## Verdict
Rules:
- Order findings from highest severity to lowest.
- Use `- None.` when a severity has no findings.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`
- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking." \
> /tmp/milestone-review-${REVIEW_ID}.json
```
@@ -261,13 +353,16 @@ if [ -x "$REVIEWER_RUNTIME" ]; then
--command-file /tmp/milestone-review-${REVIEW_ID}.sh \
--stdout-file /tmp/milestone-review-${REVIEW_ID}.runner.out \
--stderr-file /tmp/milestone-review-${REVIEW_ID}.stderr \
--status-file /tmp/milestone-review-${REVIEW_ID}.status \
"${HELPER_SUCCESS_FILE_ARGS[@]}"
else
echo "Warning: reviewer runtime helper not found at $REVIEWER_RUNTIME; falling back to direct synchronous review." >&2
bash /tmp/milestone-review-${REVIEW_ID}.sh >/tmp/milestone-review-${REVIEW_ID}.runner.out 2>/tmp/milestone-review-${REVIEW_ID}.stderr
fi
```
Run the helper in the foreground and watch its live stdout for `state=in-progress` heartbeats. If your agent environment buffers command output until exit, start the helper in the background and poll `/tmp/milestone-review-${REVIEW_ID}.status` separately instead of treating heartbeats as post-hoc-only data.
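For environments that buffer output until exit, the background-and-poll approach might look like the sketch below. `run_and_poll` is a hypothetical wrapper, not part of `reviewer-runtime`; it starts any command in the background, prints the last line of a status file at a fixed interval, and returns the command's exit status.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (assumption): usage is
#   run_and_poll STATUS_FILE INTERVAL_SECONDS COMMAND [ARGS...]
run_and_poll() {
  local status_file="$1" interval="$2"
  shift 2
  "$@" &                      # start the command in the background
  local pid=$!
  # Poll while the background process is still alive.
  while kill -0 "$pid" 2>/dev/null; do
    [ -f "$status_file" ] && tail -n 1 "$status_file"
    sleep "$interval"
  done
  wait "$pid"                 # propagate the command's exit status
}
```

With the paths used in this section, the call would wrap the helper invocation and pass `/tmp/milestone-review-${REVIEW_ID}.status` as the status file; an interval shorter than the 60-second heartbeat period keeps the heartbeats visible as they arrive.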
After the command completes:
- If `REVIEWER_CLI=cursor`, extract the final review text:
@@ -277,6 +372,11 @@ jq -r '.result' /tmp/milestone-review-${REVIEW_ID}.json > /tmp/milestone-review-
```
- If `REVIEWER_CLI=codex`, extract `CODEX_SESSION_ID` from `/tmp/milestone-review-${REVIEW_ID}.runner.out` after the helper or fallback run. If the review text is only in `.runner.out`, move or copy the actual review body into `/tmp/milestone-review-${REVIEW_ID}.md` before verdict parsing.
- If `REVIEWER_CLI=claude`, promote stdout captured by the helper or fallback runner into the markdown review file:
```bash
cp /tmp/milestone-review-${REVIEW_ID}.runner.out /tmp/milestone-review-${REVIEW_ID}.md
```
Fallback is allowed only when the helper is missing or not executable.
@@ -295,17 +395,19 @@ Fallback is allowed only when the helper is missing or not executable.
[Reviewer feedback]
```
4. While the reviewer is still running, keep waiting as long as fresh `state=in-progress note="In progress N"` heartbeats continue to appear roughly once per minute.
5. Check verdict:
- **VERDICT: APPROVED** with no `P0`, `P1`, or `P2` findings -> proceed to Phase 4 Step 6 (commit & approve)
- **VERDICT: APPROVED** with only `P3` findings -> optionally fix the `P3` items if they are cheap and safe, then proceed
- **VERDICT: REVISE** or any `P0`, `P1`, or `P2` finding -> go to Step 5
- No clear verdict but `P0`, `P1`, and `P2` are all `- None.` -> treat as approved
- Helper state `completed-empty-output` -> treat as failed review attempt, surface stderr/status, fix invocation or prompt handling, then retry
- Helper state `needs-operator-decision` -> surface status log and decide whether to extend the timeout, abort, or retry with different helper parameters
- Max rounds (`MAX_ROUNDS`) reached -> present to user for manual decision (proceed or stop)
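The verdict checks above amount to a small decision table. The sketch below is one possible reading of it, not the skill's own implementation: `next_action`, its argument order, and the action labels are hypothetical, and routing `failed` to the operator is an assumption based on the liveness contract rather than on this list.

```bash
#!/usr/bin/env bash
# Hypothetical decision helper (assumption). Arguments:
#   $1 helper state, $2 verdict (APPROVED, REVISE, or empty),
#   $3 number of P0-P2 findings, $4 current round, $5 MAX_ROUNDS
next_action() {
  local helper_state="$1" verdict="$2" blocking_count="$3" round="$4" max_rounds="$5"
  case "$helper_state" in
    completed-empty-output) echo retry-review; return ;;
    failed|needs-operator-decision) echo ask-operator; return ;;
  esac
  if [ "$blocking_count" -eq 0 ] && [ "$verdict" != "REVISE" ]; then
    echo proceed-to-commit      # approved, or no clear verdict with no blocking findings
  elif [ "$round" -ge "$max_rounds" ]; then
    echo ask-user               # max rounds reached without approval
  else
    echo address-feedback       # REVISE or any P0-P2 finding
  fi
}
```

Approval is checked before the round limit here, so a review approved on the final round still proceeds; a stricter ordering would be an equally valid design choice.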
#### Step 5: Address Feedback & Re-verify
1. Address the reviewer findings in priority order (`P0` -> `P1` -> `P2`, then `P3` when practical) (do NOT commit yet).
2. Re-run verification (lint/typecheck/tests) — all must pass.
3. Update `/tmp/milestone-${REVIEW_ID}.md` with new diff and verification output.
@@ -334,8 +436,8 @@ codex exec resume ${CODEX_SESSION_ID} \
Changes made:
[List specific changes]
Re-review using the same `## Summary`, `## Findings`, and `## Verdict` structure as before.
Keep findings ordered `P0` to `P3`, use `- None.` when a severity has no findings, and only use `VERDICT: APPROVED` when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking."
```
If resume fails (session expired), fall back to fresh `codex exec` with context about prior rounds.
@@ -357,12 +459,11 @@ $(cat /tmp/milestone-${REVIEW_ID}.md)
Changes made:
[List specific changes]
Re-review using the same `## Summary`, `## Findings`, and `## Verdict` structure as before.
Keep findings ordered `P0` to `P3`, use `- None.` when a severity has no findings, and only use `VERDICT: APPROVED` when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking." \
--model ${REVIEWER_MODEL} \
--strict-mcp-config \
--setting-sources user
```
**If `REVIEWER_CLI` is `cursor`:**
@@ -380,8 +481,8 @@ cursor-agent --resume ${CURSOR_SESSION_ID} -p \
Changes made:
[List specific changes]
Re-review using the same `## Summary`, `## Findings`, and `## Verdict` structure as before.
Keep findings ordered `P0` to `P3`, use `- None.` when a severity has no findings, and only use `VERDICT: APPROVED` when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking." \
> /tmp/milestone-review-${REVIEW_ID}.json
```
@@ -443,6 +544,27 @@ Present summary:
**Branch:** implement/<plan-folder-name> (merged and deleted)
```
### Phase 8: Telegram Completion Notification (MANDATORY)
Resolve the Telegram notifier helper from the installed Claude Code skills directory:
```bash
TELEGRAM_NOTIFY_RUNTIME=~/.claude/skills/reviewer-runtime/notify-telegram.sh
```
On every terminal outcome for the implement-plan run (fully completed, stopped after max rounds, skipped reviewer, or failure), send a Telegram summary if the helper exists and both `TELEGRAM_BOT_TOKEN` and `TELEGRAM_CHAT_ID` are configured:
```bash
if [ -x "$TELEGRAM_NOTIFY_RUNTIME" ] && [ -n "${TELEGRAM_BOT_TOKEN:-}" ] && [ -n "${TELEGRAM_CHAT_ID:-}" ]; then
"$TELEGRAM_NOTIFY_RUNTIME" --message "implement-plan completed for <plan-folder-name>: <status summary>"
fi
```
Rules:
- Telegram is the only supported completion notification path. Do not use desktop notifications, `say`, email, or any other notifier.
- Notification failures are non-blocking, but they must be surfaced to the user.
- If Telegram is not configured, state that no completion notification was sent.
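The rules above imply three outcomes: sent, failed but non-blocking, and not configured. A minimal sketch that makes all three visible follows; `send_completion_notification` and its message wording are hypothetical, while the `--message` flag and the two environment variables are the ones named in this phase.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (assumption): always returns 0 so a notification
# problem never blocks the run, but reports what happened.
send_completion_notification() {
  local notifier="$1" message="$2"
  if [ ! -x "$notifier" ] || [ -z "${TELEGRAM_BOT_TOKEN:-}" ] || [ -z "${TELEGRAM_CHAT_ID:-}" ]; then
    echo "Telegram is not configured; no completion notification was sent."
    return 0
  fi
  if ! "$notifier" --message "$message"; then
    # Surface the failure without failing the overall run.
    echo "Warning: Telegram notification failed; continuing." >&2
  fi
  return 0
}
```

The calling agent would pass `"$TELEGRAM_NOTIFY_RUNTIME"` as the first argument and relay either printed message to the user.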
## Tracker Discipline (MANDATORY)
**ALWAYS update `story-tracker.md` before/after each story. NEVER proceed with stale tracker state.**
@@ -477,3 +599,4 @@ Note: Commit hashes are backfilled into story Notes after the milestone commit (
- [ ] Final test suite passes
- [ ] Worktree branch merged to parent and worktree deleted
- [ ] Story tracker updated with final status
- [ ] Telegram completion notification attempted if configured


@@ -184,6 +184,20 @@ Resolve the shared runtime helper path before writing the command script:
REVIEWER_RUNTIME=~/.codex/skills/reviewer-runtime/run-review.sh
```
Set helper success-artifact args before writing the command script:
```bash
HELPER_SUCCESS_FILE_ARGS=()
case "$REVIEWER_CLI" in
codex)
HELPER_SUCCESS_FILE_ARGS+=(--success-file /tmp/milestone-review-${REVIEW_ID}.md)
;;
cursor)
HELPER_SUCCESS_FILE_ARGS+=(--success-file /tmp/milestone-review-${REVIEW_ID}.json)
;;
esac
```
#### Step 2: Write Review Payload
Write to `/tmp/milestone-${REVIEW_ID}.md`:
@@ -209,6 +223,42 @@ Write to `/tmp/milestone-${REVIEW_ID}.md`:
[test output with pass/fail counts]
```
#### Review Contract (Applies to Every Round)
The reviewer response must use this structure:
```text
## Summary
...
## Findings
### P0
- ...
### P1
- ...
### P2
- ...
### P3
- ...
## Verdict
VERDICT: APPROVED
```
Rules:
- Order findings from `P0` to `P3`.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- Use `- None.` when a severity has no findings.
- `VERDICT: APPROVED` is allowed only when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking.
- The calling agent should still try to fix `P3` findings when they are cheap and safe.
#### Liveness Contract (Applies While Review Is Running)
- The shared reviewer runtime emits `state=in-progress note="In progress N"` heartbeats every 60 seconds while the reviewer child is alive.
- The calling agent must keep waiting as long as a fresh `In progress N` heartbeat keeps arriving roughly once per minute.
- Do not abort just because the review is slow, a soft timeout fired, or a `stall-warning` line appears, as long as the `In progress N` heartbeat continues.
- Treat missing heartbeats, `state=failed`, `state=completed-empty-output`, and `state=needs-operator-decision` as escalation signals.
#### Step 3: Submit to Reviewer (Round 1)
Write the reviewer invocation to `/tmp/milestone-review-${REVIEW_ID}.sh` as a bash script:
@@ -234,8 +284,22 @@ Evaluate:
4. Test coverage — Are changes adequately tested?
5. Security — Any security concerns introduced?
Return exactly these sections in order:
## Summary
## Findings
### P0
### P1
### P2
### P3
## Verdict
Rules:
- Order findings from highest severity to lowest.
- Use `- None.` when a severity has no findings.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`
- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking."
```
Do not try to capture the Codex session ID yet. When using the helper, extract it from `/tmp/milestone-review-${REVIEW_ID}.runner.out` after the command completes (look for `session id: <uuid>`), then store it as `CODEX_SESSION_ID` for resume in subsequent rounds.
@@ -255,8 +319,22 @@ Evaluate:
4. Test coverage — Are changes adequately tested?
5. Security — Any security concerns introduced?
Return exactly these sections in order:
## Summary
## Findings
### P0
### P1
### P2
### P3
## Verdict
Rules:
- Order findings from highest severity to lowest.
- Use `- None.` when a severity has no findings.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`
- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking." \
--model ${REVIEWER_MODEL} \
--strict-mcp-config \
--setting-sources user
@@ -279,8 +357,22 @@ Evaluate:
4. Test coverage — Are changes adequately tested?
5. Security — Any security concerns introduced?
Return exactly these sections in order:
## Summary
## Findings
### P0
### P1
### P2
### P3
## Verdict
Rules:
- Order findings from highest severity to lowest.
- Use `- None.` when a severity has no findings.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`
- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking." \
> /tmp/milestone-review-${REVIEW_ID}.json
```
@@ -294,13 +386,16 @@ if [ -x "$REVIEWER_RUNTIME" ]; then
--command-file /tmp/milestone-review-${REVIEW_ID}.sh \
--stdout-file /tmp/milestone-review-${REVIEW_ID}.runner.out \
--stderr-file /tmp/milestone-review-${REVIEW_ID}.stderr \
--status-file /tmp/milestone-review-${REVIEW_ID}.status \
"${HELPER_SUCCESS_FILE_ARGS[@]}"
else
echo "Warning: reviewer runtime helper not found at $REVIEWER_RUNTIME; falling back to direct synchronous review." >&2
bash /tmp/milestone-review-${REVIEW_ID}.sh >/tmp/milestone-review-${REVIEW_ID}.runner.out 2>/tmp/milestone-review-${REVIEW_ID}.stderr
fi
```
Run the helper in the foreground and watch its live stdout for `state=in-progress` heartbeats. If your agent environment buffers command output until exit, start the helper in the background and poll `/tmp/milestone-review-${REVIEW_ID}.status` separately instead of treating heartbeats as post-hoc-only data.
After the command completes:
- If `REVIEWER_CLI=cursor`, extract the final review text:
@@ -310,6 +405,11 @@ jq -r '.result' /tmp/milestone-review-${REVIEW_ID}.json > /tmp/milestone-review-
```
- If `REVIEWER_CLI=codex`, extract `CODEX_SESSION_ID` from `/tmp/milestone-review-${REVIEW_ID}.runner.out` after the helper or fallback run. If the review text is only in `.runner.out`, move or copy the actual review body into `/tmp/milestone-review-${REVIEW_ID}.md` before verdict parsing.
- If `REVIEWER_CLI=claude`, promote stdout captured by the helper or fallback runner into the markdown review file:
```bash
cp /tmp/milestone-review-${REVIEW_ID}.runner.out /tmp/milestone-review-${REVIEW_ID}.md
```
Fallback is allowed only when the helper is missing or not executable.
@@ -328,17 +428,19 @@ Fallback is allowed only when the helper is missing or not executable.
[Reviewer feedback]
```
4. While the reviewer is still running, keep waiting as long as fresh `state=in-progress note="In progress N"` heartbeats continue to appear roughly once per minute.
5. Check verdict:
- **VERDICT: APPROVED** with no `P0`, `P1`, or `P2` findings -> proceed to Phase 4 Step 6 (commit & approve)
- **VERDICT: APPROVED** with only `P3` findings -> optionally fix the `P3` items if they are cheap and safe, then proceed
- **VERDICT: REVISE** or any `P0`, `P1`, or `P2` finding -> go to Step 5
- No clear verdict but `P0`, `P1`, and `P2` are all `- None.` -> treat as approved
- Helper state `completed-empty-output` -> treat as failed review attempt, surface stderr/status, fix invocation or prompt handling, then retry
- Helper state `needs-operator-decision` -> surface status log and decide whether to extend the timeout, abort, or retry with different helper parameters
- Max rounds (`MAX_ROUNDS`) reached -> present to user for manual decision (proceed or stop)
#### Step 5: Address Feedback & Re-verify
1. Address the reviewer findings in priority order (`P0` -> `P1` -> `P2`, then `P3` when practical) (do NOT commit yet).
2. Re-run verification (lint/typecheck/tests) — all must pass.
3. Update `/tmp/milestone-${REVIEW_ID}.md` with new diff and verification output.
@@ -367,8 +469,8 @@ codex exec resume ${CODEX_SESSION_ID} \
Changes made:
[List specific changes]
Re-review using the same `## Summary`, `## Findings`, and `## Verdict` structure as before.
Keep findings ordered `P0` to `P3`, use `- None.` when a severity has no findings, and only use `VERDICT: APPROVED` when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking."
```
If resume fails (session expired), fall back to fresh `codex exec` with context about prior rounds.
@@ -390,12 +492,11 @@ $(cat /tmp/milestone-${REVIEW_ID}.md)
Changes made:
[List specific changes]
Re-review using the same `## Summary`, `## Findings`, and `## Verdict` structure as before.
Keep findings ordered `P0` to `P3`, use `- None.` when a severity has no findings, and only use `VERDICT: APPROVED` when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking." \
--model ${REVIEWER_MODEL} \
--strict-mcp-config \
--setting-sources user
```
**If `REVIEWER_CLI` is `cursor`:**
@@ -413,8 +514,8 @@ cursor-agent --resume ${CURSOR_SESSION_ID} -p \
Changes made:
[List specific changes]
Re-review using the same `## Summary`, `## Findings`, and `## Verdict` structure as before.
Keep findings ordered `P0` to `P3`, use `- None.` when a severity has no findings, and only use `VERDICT: APPROVED` when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking." \
> /tmp/milestone-review-${REVIEW_ID}.json
```
@@ -476,6 +577,27 @@ Present summary:
**Branch:** implement/<plan-folder-name> (merged and deleted)
```
### Phase 8: Telegram Completion Notification (MANDATORY)
Resolve the Telegram notifier helper from the installed Codex skills directory:
```bash
TELEGRAM_NOTIFY_RUNTIME=~/.codex/skills/reviewer-runtime/notify-telegram.sh
```
On every terminal outcome for the implement-plan run (fully completed, stopped after max rounds, skipped reviewer, or failure), send a Telegram summary if the helper exists and both `TELEGRAM_BOT_TOKEN` and `TELEGRAM_CHAT_ID` are configured:
```bash
if [ -x "$TELEGRAM_NOTIFY_RUNTIME" ] && [ -n "${TELEGRAM_BOT_TOKEN:-}" ] && [ -n "${TELEGRAM_CHAT_ID:-}" ]; then
"$TELEGRAM_NOTIFY_RUNTIME" --message "implement-plan completed for <plan-folder-name>: <status summary>"
fi
```
Rules:
- Telegram is the only supported completion notification path. Do not use desktop notifications, `say`, email, or any other notifier.
- Notification failures are non-blocking, but they must be surfaced to the user.
- If Telegram is not configured, state that no completion notification was sent.
## Quick Reference
| Phase | Action | Required Output |
@@ -487,6 +609,7 @@ Present summary:
| 5 | Milestone review loop (per milestone) | Reviewer approval or max rounds + user override |
| 6 | Invoke `superpowers:finishing-a-development-branch` | Branch merged to parent, worktree deleted |
| 7 | Final report | Summary presented |
| 8 | Send Telegram completion notification | User notified or notification status reported |
## Tracker Discipline (MANDATORY)
@@ -514,6 +637,7 @@ Note: Commit hashes are backfilled into story Notes after the milestone commit (
- Not capturing the Codex session ID for resume in subsequent review rounds.
- Forgetting to update `story-tracker.md` between stories.
- Creating a new worktree when one already exists for a resumed plan.
- Using any completion notification path other than Telegram.
## Rationalizations and Counters
@@ -551,3 +675,4 @@ Note: Commit hashes are backfilled into story Notes after the milestone commit (
- [ ] Final test suite passes
- [ ] Worktree branch merged to parent and worktree deleted
- [ ] Story tracker updated with final status
- [ ] Telegram completion notification attempted if configured


@@ -188,6 +188,20 @@ else
fi
```
Set helper success-artifact args before writing the command script:
```bash
HELPER_SUCCESS_FILE_ARGS=()
case "$REVIEWER_CLI" in
codex)
HELPER_SUCCESS_FILE_ARGS+=(--success-file /tmp/milestone-review-${REVIEW_ID}.md)
;;
cursor)
HELPER_SUCCESS_FILE_ARGS+=(--success-file /tmp/milestone-review-${REVIEW_ID}.json)
;;
esac
```
#### Step 2: Write Review Payload
Write to `/tmp/milestone-${REVIEW_ID}.md`:
@@ -213,6 +227,42 @@ Write to `/tmp/milestone-${REVIEW_ID}.md`:
[test output with pass/fail counts]
```
#### Review Contract (Applies to Every Round)
The reviewer response must use this structure:
```text
## Summary
...
## Findings
### P0
- ...
### P1
- ...
### P2
- ...
### P3
- ...
## Verdict
VERDICT: APPROVED
```
Rules:
- Order findings from `P0` to `P3`.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- Use `- None.` when a severity has no findings.
- `VERDICT: APPROVED` is allowed only when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking.
- The calling agent should still try to fix `P3` findings when they are cheap and safe.
#### Liveness Contract (Applies While Review Is Running)
- The shared reviewer runtime emits `state=in-progress note="In progress N"` heartbeats every 60 seconds while the reviewer child is alive.
- The calling agent must keep waiting as long as a fresh `In progress N` heartbeat keeps arriving roughly once per minute.
- Do not abort just because the review is slow, a soft timeout fired, or a `stall-warning` line appears, as long as the `In progress N` heartbeat continues.
- Treat missing heartbeats, `state=failed`, `state=completed-empty-output`, and `state=needs-operator-decision` as escalation signals.
#### Step 3: Submit to Reviewer (Round 1)
Write the reviewer invocation to `/tmp/milestone-review-${REVIEW_ID}.sh` as a bash script:
@@ -238,8 +288,22 @@ Evaluate:
4. Test coverage — Are changes adequately tested?
5. Security — Any security concerns introduced?
Return exactly these sections in order:
## Summary
## Findings
### P0
### P1
### P2
### P3
## Verdict
Rules:
- Order findings from highest severity to lowest.
- Use `- None.` when a severity has no findings.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`
- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking."
``` ```
Do not try to capture the Codex session ID yet. When using the helper, extract it from `/tmp/milestone-review-${REVIEW_ID}.runner.out` after the command completes (look for `session id: <uuid>`), then store it as `CODEX_SESSION_ID` for resume in subsequent rounds. Do not try to capture the Codex session ID yet. When using the helper, extract it from `/tmp/milestone-review-${REVIEW_ID}.runner.out` after the command completes (look for `session id: <uuid>`), then store it as `CODEX_SESSION_ID` for resume in subsequent rounds.
@@ -259,8 +323,22 @@ Evaluate:
 4. Test coverage — Are changes adequately tested?
 5. Security — Any security concerns introduced?
-Be specific and actionable. If solid, end with exactly: VERDICT: APPROVED
-If changes are needed, end with exactly: VERDICT: REVISE" \
+Return exactly these sections in order:
+## Summary
+## Findings
+### P0
+### P1
+### P2
+### P3
+## Verdict
+Rules:
+- Order findings from highest severity to lowest.
+- Use `- None.` when a severity has no findings.
+- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
+- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`
+- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking." \
 --model ${REVIEWER_MODEL} \
 --strict-mcp-config \
 --setting-sources user
@@ -283,8 +361,22 @@ Evaluate:
 4. Test coverage — Are changes adequately tested?
 5. Security — Any security concerns introduced?
-Be specific and actionable. If solid, end with exactly: VERDICT: APPROVED
-If changes are needed, end with exactly: VERDICT: REVISE" \
+Return exactly these sections in order:
+## Summary
+## Findings
+### P0
+### P1
+### P2
+### P3
+## Verdict
+Rules:
+- Order findings from highest severity to lowest.
+- Use `- None.` when a severity has no findings.
+- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
+- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`
+- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking." \
 > /tmp/milestone-review-${REVIEW_ID}.json
 ```
@@ -304,13 +396,16 @@ if [ -x "$REVIEWER_RUNTIME" ]; then
 --command-file /tmp/milestone-review-${REVIEW_ID}.sh \
 --stdout-file /tmp/milestone-review-${REVIEW_ID}.runner.out \
 --stderr-file /tmp/milestone-review-${REVIEW_ID}.stderr \
---status-file /tmp/milestone-review-${REVIEW_ID}.status
+--status-file /tmp/milestone-review-${REVIEW_ID}.status \
+"${HELPER_SUCCESS_FILE_ARGS[@]}"
 else
 echo "Warning: reviewer runtime helper not found at $REVIEWER_RUNTIME; falling back to direct synchronous review." >&2
 bash /tmp/milestone-review-${REVIEW_ID}.sh >/tmp/milestone-review-${REVIEW_ID}.runner.out 2>/tmp/milestone-review-${REVIEW_ID}.stderr
 fi
 ```
+Run the helper in the foreground and watch its live stdout for `state=in-progress` heartbeats. If your agent environment buffers command output until exit, start the helper in the background and poll `/tmp/milestone-review-${REVIEW_ID}.status` separately instead of treating heartbeats as post-hoc-only data.
 After the command completes:
 - If `REVIEWER_CLI=cursor`, extract the final review text:
@@ -320,6 +415,11 @@ jq -r '.result' /tmp/milestone-review-${REVIEW_ID}.json > /tmp/milestone-review-
 ```
 - If `REVIEWER_CLI=codex`, extract `CODEX_SESSION_ID` from `/tmp/milestone-review-${REVIEW_ID}.runner.out` after the helper or fallback run. If the review text is only in `.runner.out`, move or copy the actual review body into `/tmp/milestone-review-${REVIEW_ID}.md` before verdict parsing.
+- If `REVIEWER_CLI=claude`, promote stdout captured by the helper or fallback runner into the markdown review file:
+```bash
+cp /tmp/milestone-review-${REVIEW_ID}.runner.out /tmp/milestone-review-${REVIEW_ID}.md
+```
 Fallback is allowed only when the helper is missing or not executable.
@@ -338,17 +438,19 @@ Fallback is allowed only when the helper is missing or not executable.
 [Reviewer feedback]
 ```
-4. Check verdict:
-- **VERDICT: APPROVED** -> proceed to Phase 4 Step 6 (commit & approve)
-- **VERDICT: REVISE** -> go to Step 5
-- No clear verdict but positive / no actionable items -> treat as approved
+4. While the reviewer is still running, keep waiting as long as fresh `state=in-progress note="In progress N"` heartbeats continue to appear roughly once per minute.
+5. Check verdict:
+- **VERDICT: APPROVED** with no `P0`, `P1`, or `P2` findings -> proceed to Phase 4 Step 6 (commit & approve)
+- **VERDICT: APPROVED** with only `P3` findings -> optionally fix the `P3` items if they are cheap and safe, then proceed
+- **VERDICT: REVISE** or any `P0`, `P1`, or `P2` finding -> go to Step 5
+- No clear verdict but `P0`, `P1`, and `P2` are all `- None.` -> treat as approved
 - Helper state `completed-empty-output` -> treat as failed review attempt, surface stderr/status, fix invocation or prompt handling, then retry
-- Helper state `needs-operator-decision` -> surface status log, note any `stall-warning` heartbeat lines as non-terminal operator hints, and decide whether to keep waiting, abort, or retry with different helper parameters
+- Helper state `needs-operator-decision` -> surface status log and decide whether to extend the timeout, abort, or retry with different helper parameters
 - Max rounds (`MAX_ROUNDS`) reached -> present to user for manual decision (proceed or stop)
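The verdict rules above can be approximated in shell. This is a sketch under stated assumptions, not the skill's own parser: `severity_findings` and `review_decision` are hypothetical names, the review file is assumed to use the exact headings `### P0` through `### P3`, and the `unclear` outcome is an invented label for reviews that match none of the listed cases.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; they assume the review file uses the exact headings
# `### P0` .. `### P3` and writes `- None.` for an empty severity.

# Print the real findings (non-blank lines other than `- None.`) under one severity.
severity_findings() {
  local severity=$1
  local review_file=$2
  awk -v heading="### $severity" '
    /^#/ { in_section = ($0 == heading); next }  # every heading opens or closes the section
    in_section && NF && $0 != "- None." { print }
  ' "$review_file"
}

# Print `revise`, `approved`, or `unclear` for a finished review.
review_decision() {
  local review_file=$1
  local severity
  for severity in P0 P1 P2; do
    if [ -n "$(severity_findings "$severity" "$review_file")" ]; then
      echo revise   # any P0/P1/P2 finding blocks approval, whatever the verdict line says
      return 0
    fi
  done
  if grep -q '^VERDICT: REVISE' "$review_file"; then
    echo revise
  elif grep -q '^VERDICT: APPROVED' "$review_file"; then
    echo approved
  elif grep -qx '### P0' "$review_file" && grep -qx '### P1' "$review_file" && grep -qx '### P2' "$review_file"; then
    echo approved  # no verdict line, but every blocking severity is present and empty
  else
    echo unclear   # assumption: hand anything else back to the operator
  fi
}
```

`P3` findings are deliberately ignored by `review_decision`, which mirrors their non-blocking status; a caller that wants to fix them can list them with `severity_findings P3`.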
 #### Step 5: Address Feedback & Re-verify
-1. Address each issue the reviewer raised (do NOT commit yet).
+1. Address the reviewer findings in priority order (`P0` -> `P1` -> `P2`, then `P3` when practical) (do NOT commit yet).
 2. Re-run verification (lint/typecheck/tests) — all must pass.
 3. Update `/tmp/milestone-${REVIEW_ID}.md` with new diff and verification output.
@@ -377,8 +479,8 @@ codex exec resume ${CODEX_SESSION_ID} \
 Changes made:
 [List specific changes]
-Re-review. If solid, end with: VERDICT: APPROVED
-If more changes needed, end with: VERDICT: REVISE"
+Re-review using the same `## Summary`, `## Findings`, and `## Verdict` structure as before.
+Keep findings ordered `P0` to `P3`, use `- None.` when a severity has no findings, and only use `VERDICT: APPROVED` when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking."
 ```
 If resume fails (session expired), fall back to fresh `codex exec` with context about prior rounds.
@@ -400,12 +502,11 @@ $(cat /tmp/milestone-${REVIEW_ID}.md)
 Changes made:
 [List specific changes]
-Re-review. If solid, end with: VERDICT: APPROVED
-If more changes needed, end with: VERDICT: REVISE" \
+Re-review using the same `## Summary`, `## Findings`, and `## Verdict` structure as before.
+Keep findings ordered `P0` to `P3`, use `- None.` when a severity has no findings, and only use `VERDICT: APPROVED` when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking." \
 --model ${REVIEWER_MODEL} \
 --strict-mcp-config \
---setting-sources user \
--> /tmp/milestone-review-${REVIEW_ID}.md
+--setting-sources user
 ```
 **If `REVIEWER_CLI` is `cursor`:**
@@ -423,8 +524,8 @@ cursor-agent --resume ${CURSOR_SESSION_ID} -p \
 Changes made:
 [List specific changes]
-Re-review. If solid, end with: VERDICT: APPROVED
-If more changes needed, end with: VERDICT: REVISE" \
+Re-review using the same `## Summary`, `## Findings`, and `## Verdict` structure as before.
+Keep findings ordered `P0` to `P3`, use `- None.` when a severity has no findings, and only use `VERDICT: APPROVED` when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking." \
 > /tmp/milestone-review-${REVIEW_ID}.json
 ```
@@ -486,6 +587,31 @@ Present summary:
 **Branch:** implement/<plan-folder-name> (merged and deleted)
 ```
+### Phase 8: Telegram Completion Notification (MANDATORY)
+Resolve the Telegram notifier helper from Cursor's installed skills directory:
+```bash
+if [ -x .cursor/skills/reviewer-runtime/notify-telegram.sh ]; then
+TELEGRAM_NOTIFY_RUNTIME=.cursor/skills/reviewer-runtime/notify-telegram.sh
+else
+TELEGRAM_NOTIFY_RUNTIME=~/.cursor/skills/reviewer-runtime/notify-telegram.sh
+fi
+```
+On every terminal outcome for the implement-plan run (fully completed, stopped after max rounds, skipped reviewer, or failure), send a Telegram summary if the helper exists and both `TELEGRAM_BOT_TOKEN` and `TELEGRAM_CHAT_ID` are configured:
+```bash
+if [ -x "$TELEGRAM_NOTIFY_RUNTIME" ] && [ -n "${TELEGRAM_BOT_TOKEN:-}" ] && [ -n "${TELEGRAM_CHAT_ID:-}" ]; then
+"$TELEGRAM_NOTIFY_RUNTIME" --message "implement-plan completed for <plan-folder-name>: <status summary>"
+fi
+```
+Rules:
+- Telegram is the only supported completion notification path. Do not use desktop notifications, `say`, email, or any other notifier.
+- Notification failures are non-blocking, but they must be surfaced to the user.
+- If Telegram is not configured, state that no completion notification was sent.
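The three rules above can be combined into one wrapper. This is a minimal sketch, assuming a hypothetical function name `send_completion_notice`; the only interface it takes from the skill is the `--message` flag of `notify-telegram.sh` and the two environment variables named above.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of the skill): send the completion summary and
# always report what happened, without failing the surrounding run.
send_completion_notice() {
  local helper=$1
  local message=$2
  if [ ! -x "$helper" ] || [ -z "${TELEGRAM_BOT_TOKEN:-}" ] || [ -z "${TELEGRAM_CHAT_ID:-}" ]; then
    echo "Telegram is not configured; no completion notification was sent."
    return 0
  fi
  local status=0
  "$helper" --message "$message" || status=$?
  if [ "$status" -eq 0 ]; then
    echo "Telegram completion notification sent."
  else
    # Non-blocking: surface the failure to the user and keep going.
    echo "Warning: Telegram notification failed (exit $status)." >&2
  fi
  return 0
}
```

Called as `send_completion_notice "$TELEGRAM_NOTIFY_RUNTIME" "implement-plan completed for <plan-folder-name>: <status summary>"`, it always returns 0, so a failed notification cannot abort the run, yet every outcome produces a line the agent can relay to the user.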
 ## Quick Reference
 | Phase | Action | Required Output |
@@ -497,6 +623,7 @@ Present summary:
 | 5 | Milestone review loop (per milestone) | Reviewer approval or max rounds + user override |
 | 6 | Invoke `superpowers:finishing-a-development-branch` | Branch merged to parent, worktree deleted |
 | 7 | Final report | Summary presented |
+| 8 | Send Telegram completion notification | User notified or notification status reported |
 ## Tracker Discipline (MANDATORY)
@@ -522,6 +649,7 @@ Note: Commit hashes are backfilled into story Notes after the milestone commit (
 - Skipping worktree setup and working directly on the main branch.
 - Forgetting to update `story-tracker.md` between stories.
 - Creating a new worktree when one already exists for a resumed plan.
+- Using any completion notification path other than Telegram.
 ## Red Flags - Stop and Correct
@@ -550,3 +678,4 @@ Note: Commit hashes are backfilled into story Notes after the milestone commit (
 - [ ] Final test suite passes
 - [ ] Worktree branch merged to parent and worktree deleted
 - [ ] Story tracker updated with final status
+- [ ] Telegram completion notification attempted if configured


@@ -169,6 +169,20 @@ Resolve the shared runtime helper path before writing the command script:
 REVIEWER_RUNTIME=~/.config/opencode/skills/reviewer-runtime/run-review.sh
 ```
+Set helper success-artifact args before writing the command script:
+```bash
+HELPER_SUCCESS_FILE_ARGS=()
+case "$REVIEWER_CLI" in
+codex)
+HELPER_SUCCESS_FILE_ARGS+=(--success-file /tmp/milestone-review-${REVIEW_ID}.md)
+;;
+cursor)
+HELPER_SUCCESS_FILE_ARGS+=(--success-file /tmp/milestone-review-${REVIEW_ID}.json)
+;;
+esac
+```
 #### Step 2: Write Review Payload
 Write to `/tmp/milestone-${REVIEW_ID}.md`:
@@ -194,6 +208,42 @@ Write to `/tmp/milestone-${REVIEW_ID}.md`:
 [test output with pass/fail counts]
 ```
+#### Review Contract (Applies to Every Round)
+The reviewer response must use this structure:
+```text
+## Summary
+...
+## Findings
+### P0
+- ...
+### P1
+- ...
+### P2
+- ...
+### P3
+- ...
+## Verdict
+VERDICT: APPROVED
+```
+Rules:
+- Order findings from `P0` to `P3`.
+- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
+- Use `- None.` when a severity has no findings.
+- `VERDICT: APPROVED` is allowed only when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking.
+- The calling agent should still try to fix `P3` findings when they are cheap and safe.
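Before trusting a verdict, the calling agent could first confirm that a response follows the contract at all. The check below is a sketch only: `review_has_contract_layout` is a hypothetical name, and it assumes each heading sits alone on its line with no trailing whitespace.

```bash
#!/usr/bin/env bash
# Hypothetical validator; assumes headings appear alone on their lines with no
# trailing whitespace.
review_has_contract_layout() {
  local review_file=$1
  local expected='## Summary|## Findings|### P0|### P1|### P2|### P3|## Verdict'
  local actual verdict_lines
  # Collect the contract headings in the order they occur, joined with `|`.
  actual=$(grep -E '^(## (Summary|Findings|Verdict)|### P[0-3])$' "$review_file" | paste -sd '|' - || true)
  # Count verdict lines; exactly one is allowed.
  verdict_lines=$(grep -cE '^VERDICT: (APPROVED|REVISE)$' "$review_file" || true)
  [ "$actual" = "$expected" ] && [ "$verdict_lines" -eq 1 ]
}
```

A response that fails this check is arguably closer to a failed review attempt than to a `REVISE` verdict, so retrying the round may be more useful than acting on its findings.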
+#### Liveness Contract (Applies While Review Is Running)
+- The shared reviewer runtime emits `state=in-progress note="In progress N"` heartbeats every 60 seconds while the reviewer child is alive.
+- The calling agent must keep waiting as long as a fresh `In progress N` heartbeat keeps arriving roughly once per minute.
+- Do not abort just because the review is slow, a soft timeout fired, or a `stall-warning` line appears, as long as the `In progress N` heartbeat continues.
+- Treat missing heartbeats, `state=failed`, `state=completed-empty-output`, and `state=needs-operator-decision` as escalation signals.
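The state-based escalation signals in this contract can be detected by reading the most recent state from the status file. This is a hedged sketch: `latest_state` and `needs_escalation` are hypothetical names, and it assumes each status line carries a `state=<name>` token. Missing heartbeats need a separate time-based check, which this sketch leaves out.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; they assume each status line carries a `state=<name>` token.

# Print the most recent state recorded in the status file (empty if none).
latest_state() {
  local status_file=$1
  grep -o 'state=[a-z-]*' "$status_file" 2>/dev/null | tail -n 1 | cut -d= -f2
}

# Succeed when the latest state is one of the escalation signals.
needs_escalation() {
  case "$(latest_state "$1")" in
    failed|completed-empty-output|needs-operator-decision) return 0 ;;
    *) return 1 ;;
  esac
}
```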
 #### Step 3: Submit to Reviewer (Round 1)
 Write the reviewer invocation to `/tmp/milestone-review-${REVIEW_ID}.sh` as a bash script:
@@ -219,8 +269,22 @@ Evaluate:
 4. Test coverage — Are changes adequately tested?
 5. Security — Any security concerns introduced?
-Be specific and actionable. If solid, end with exactly: VERDICT: APPROVED
-If changes are needed, end with exactly: VERDICT: REVISE"
+Return exactly these sections in order:
+## Summary
+## Findings
+### P0
+### P1
+### P2
+### P3
+## Verdict
+Rules:
+- Order findings from highest severity to lowest.
+- Use `- None.` when a severity has no findings.
+- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
+- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`
+- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking."
 ```
 Do not try to capture the Codex session ID yet. When using the helper, extract it from `/tmp/milestone-review-${REVIEW_ID}.runner.out` after the command completes (look for `session id: <uuid>`), then store it as `CODEX_SESSION_ID` for resume in subsequent rounds.
@@ -240,8 +304,22 @@ Evaluate:
 4. Test coverage — Are changes adequately tested?
 5. Security — Any security concerns introduced?
-Be specific and actionable. If solid, end with exactly: VERDICT: APPROVED
-If changes are needed, end with exactly: VERDICT: REVISE" \
+Return exactly these sections in order:
+## Summary
+## Findings
+### P0
+### P1
+### P2
+### P3
+## Verdict
+Rules:
+- Order findings from highest severity to lowest.
+- Use `- None.` when a severity has no findings.
+- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
+- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`
+- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking." \
 --model ${REVIEWER_MODEL} \
 --strict-mcp-config \
 --setting-sources user
@@ -264,8 +342,22 @@ Evaluate:
 4. Test coverage — Are changes adequately tested?
 5. Security — Any security concerns introduced?
-Be specific and actionable. If solid, end with exactly: VERDICT: APPROVED
-If changes are needed, end with exactly: VERDICT: REVISE" \
+Return exactly these sections in order:
+## Summary
+## Findings
+### P0
+### P1
+### P2
+### P3
+## Verdict
+Rules:
+- Order findings from highest severity to lowest.
+- Use `- None.` when a severity has no findings.
+- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
+- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`
+- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking." \
 > /tmp/milestone-review-${REVIEW_ID}.json
 ```
@@ -279,13 +371,16 @@ if [ -x "$REVIEWER_RUNTIME" ]; then
 --command-file /tmp/milestone-review-${REVIEW_ID}.sh \
 --stdout-file /tmp/milestone-review-${REVIEW_ID}.runner.out \
 --stderr-file /tmp/milestone-review-${REVIEW_ID}.stderr \
---status-file /tmp/milestone-review-${REVIEW_ID}.status
+--status-file /tmp/milestone-review-${REVIEW_ID}.status \
+"${HELPER_SUCCESS_FILE_ARGS[@]}"
 else
 echo "Warning: reviewer runtime helper not found at $REVIEWER_RUNTIME; falling back to direct synchronous review." >&2
 bash /tmp/milestone-review-${REVIEW_ID}.sh >/tmp/milestone-review-${REVIEW_ID}.runner.out 2>/tmp/milestone-review-${REVIEW_ID}.stderr
 fi
 ```
+Run the helper in the foreground and watch its live stdout for `state=in-progress` heartbeats. If your agent environment buffers command output until exit, start the helper in the background and poll `/tmp/milestone-review-${REVIEW_ID}.status` separately instead of treating heartbeats as post-hoc-only data.
 After the command completes:
 - If `REVIEWER_CLI=cursor`, extract the final review text:
@@ -295,6 +390,11 @@ jq -r '.result' /tmp/milestone-review-${REVIEW_ID}.json > /tmp/milestone-review-
 ```
 - If `REVIEWER_CLI=codex`, extract `CODEX_SESSION_ID` from `/tmp/milestone-review-${REVIEW_ID}.runner.out` after the helper or fallback run. If the review text is only in `.runner.out`, move or copy the actual review body into `/tmp/milestone-review-${REVIEW_ID}.md` before verdict parsing.
+- If `REVIEWER_CLI=claude`, promote stdout captured by the helper or fallback runner into the markdown review file:
+```bash
+cp /tmp/milestone-review-${REVIEW_ID}.runner.out /tmp/milestone-review-${REVIEW_ID}.md
+```
 Fallback is allowed only when the helper is missing or not executable.
@@ -313,17 +413,19 @@ Fallback is allowed only when the helper is missing or not executable.
 [Reviewer feedback]
 ```
-4. Check verdict:
-- **VERDICT: APPROVED** -> proceed to Phase 5 Step 6 (commit & approve)
-- **VERDICT: REVISE** -> go to Step 5
-- No clear verdict but positive / no actionable items -> treat as approved
+4. While the reviewer is still running, keep waiting as long as fresh `state=in-progress note="In progress N"` heartbeats continue to appear roughly once per minute.
+5. Check verdict:
+- **VERDICT: APPROVED** with no `P0`, `P1`, or `P2` findings -> proceed to Phase 5 Step 6 (commit & approve)
+- **VERDICT: APPROVED** with only `P3` findings -> optionally fix the `P3` items if they are cheap and safe, then proceed
+- **VERDICT: REVISE** or any `P0`, `P1`, or `P2` finding -> go to Step 5
+- No clear verdict but `P0`, `P1`, and `P2` are all `- None.` -> treat as approved
 - Helper state `completed-empty-output` -> treat as failed review attempt, surface stderr/status, fix invocation or prompt handling, then retry
-- Helper state `needs-operator-decision` -> surface status log, note any `stall-warning` heartbeat lines as non-terminal operator hints, and decide whether to keep waiting, abort, or retry with different helper parameters
+- Helper state `needs-operator-decision` -> surface status log and decide whether to extend the timeout, abort, or retry with different helper parameters
 - Max rounds (`MAX_ROUNDS`) reached -> present to user for manual decision (proceed or stop)
 #### Step 5: Address Feedback & Re-verify
-1. Address each issue the reviewer raised (do NOT commit yet).
+1. Address the reviewer findings in priority order (`P0` -> `P1` -> `P2`, then `P3` when practical) (do NOT commit yet).
 2. Re-run verification (lint/typecheck/tests) — all must pass.
 3. Update `/tmp/milestone-${REVIEW_ID}.md` with new diff and verification output.
@@ -352,8 +454,8 @@ codex exec resume ${CODEX_SESSION_ID} \
 Changes made:
 [List specific changes]
-Re-review. If solid, end with: VERDICT: APPROVED
-If more changes needed, end with: VERDICT: REVISE"
+Re-review using the same `## Summary`, `## Findings`, and `## Verdict` structure as before.
+Keep findings ordered `P0` to `P3`, use `- None.` when a severity has no findings, and only use `VERDICT: APPROVED` when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking."
 ```
 If resume fails (session expired), fall back to fresh `codex exec` with context about prior rounds.
@@ -375,12 +477,11 @@ $(cat /tmp/milestone-${REVIEW_ID}.md)
 Changes made:
 [List specific changes]
-Re-review. If solid, end with: VERDICT: APPROVED
-If more changes needed, end with: VERDICT: REVISE" \
+Re-review using the same `## Summary`, `## Findings`, and `## Verdict` structure as before.
+Keep findings ordered `P0` to `P3`, use `- None.` when a severity has no findings, and only use `VERDICT: APPROVED` when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking." \
 --model ${REVIEWER_MODEL} \
 --strict-mcp-config \
---setting-sources user \
--> /tmp/milestone-review-${REVIEW_ID}.md
+--setting-sources user
 ```
 **If `REVIEWER_CLI` is `cursor`:**
@@ -398,8 +499,8 @@ cursor-agent --resume ${CURSOR_SESSION_ID} -p \
 Changes made:
 [List specific changes]
-Re-review. If solid, end with: VERDICT: APPROVED
-If more changes needed, end with: VERDICT: REVISE" \
+Re-review using the same `## Summary`, `## Findings`, and `## Verdict` structure as before.
+Keep findings ordered `P0` to `P3`, use `- None.` when a severity has no findings, and only use `VERDICT: APPROVED` when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking." \
 > /tmp/milestone-review-${REVIEW_ID}.json
 ```
@@ -461,6 +562,27 @@ Present summary:
 **Branch:** implement/<plan-folder-name> (merged and deleted)
 ```
+### Phase 9: Telegram Completion Notification (MANDATORY)
+Resolve the Telegram notifier helper from the installed OpenCode skills directory:
+```bash
+TELEGRAM_NOTIFY_RUNTIME=~/.config/opencode/skills/reviewer-runtime/notify-telegram.sh
+```
+On every terminal outcome for the implement-plan run (fully completed, stopped after max rounds, skipped reviewer, or failure), send a Telegram summary if the helper exists and both `TELEGRAM_BOT_TOKEN` and `TELEGRAM_CHAT_ID` are configured:
+```bash
+if [ -x "$TELEGRAM_NOTIFY_RUNTIME" ] && [ -n "${TELEGRAM_BOT_TOKEN:-}" ] && [ -n "${TELEGRAM_CHAT_ID:-}" ]; then
+"$TELEGRAM_NOTIFY_RUNTIME" --message "implement-plan completed for <plan-folder-name>: <status summary>"
+fi
+```
+Rules:
+- Telegram is the only supported completion notification path. Do not use desktop notifications, `say`, email, or any other notifier.
+- Notification failures are non-blocking, but they must be surfaced to the user.
+- If Telegram is not configured, state that no completion notification was sent.
 ## Tracker Discipline (MANDATORY)
 Before starting any story:
@@ -493,3 +615,4 @@ Note: Commit hashes are backfilled into story Notes after the milestone commit (
 - [ ] Final test suite passes
 - [ ] Worktree branch merged to parent and worktree deleted
 - [ ] Story tracker updated with final status
+- [ ] Telegram completion notification attempted if configured


@@ -0,0 +1,99 @@
#!/usr/bin/env bash
set -euo pipefail
DEFAULT_API_BASE_URL="https://api.telegram.org"
DEFAULT_PARSE_MODE="HTML"
MAX_MESSAGE_LENGTH=4096
BOT_TOKEN=${TELEGRAM_BOT_TOKEN:-}
CHAT_ID=${TELEGRAM_CHAT_ID:-}
API_BASE_URL=${TELEGRAM_API_BASE_URL:-$DEFAULT_API_BASE_URL}
PARSE_MODE=${TELEGRAM_PARSE_MODE:-$DEFAULT_PARSE_MODE}
MESSAGE=""
MESSAGE_FILE=""
usage() {
cat <<'EOF'
Usage:
notify-telegram.sh --message <text> [--bot-token <token>] [--chat-id <id>] [--api-base-url <url>]
notify-telegram.sh --message-file <path> [--bot-token <token>] [--chat-id <id>] [--api-base-url <url>]
Environment fallbacks:
TELEGRAM_BOT_TOKEN
TELEGRAM_CHAT_ID
TELEGRAM_API_BASE_URL
TELEGRAM_PARSE_MODE
EOF
}
fail_usage() {
echo "Error: $*" >&2
usage >&2
exit 2
}
parse_args() {
while [[ $# -gt 0 ]]; do
case "$1" in
--bot-token)
BOT_TOKEN=${2:-}
shift 2
;;
--chat-id)
CHAT_ID=${2:-}
shift 2
;;
--api-base-url)
API_BASE_URL=${2:-}
shift 2
;;
--message)
MESSAGE=${2:-}
shift 2
;;
--message-file)
MESSAGE_FILE=${2:-}
shift 2
;;
--help|-h)
usage
exit 0
;;
*)
fail_usage "unknown argument: $1"
;;
esac
done
if [[ -n "$MESSAGE" && -n "$MESSAGE_FILE" ]]; then
fail_usage "use either --message or --message-file, not both"
fi
if [[ -n "$MESSAGE_FILE" ]]; then
[[ -r "$MESSAGE_FILE" ]] || fail_usage "message file is not readable: $MESSAGE_FILE"
MESSAGE=$(<"$MESSAGE_FILE")
fi
[[ -n "$MESSAGE" ]] || fail_usage "message is required"
[[ -n "$BOT_TOKEN" ]] || fail_usage "bot token is required (use --bot-token or TELEGRAM_BOT_TOKEN)"
[[ -n "$CHAT_ID" ]] || fail_usage "chat id is required (use --chat-id or TELEGRAM_CHAT_ID)"
command -v curl >/dev/null 2>&1 || fail_usage "curl is required"
if [[ ${#MESSAGE} -gt "$MAX_MESSAGE_LENGTH" ]]; then
MESSAGE=${MESSAGE:0:$MAX_MESSAGE_LENGTH}
fi
}
main() {
parse_args "$@"
curl -fsS -X POST \
"${API_BASE_URL%/}/bot${BOT_TOKEN}/sendMessage" \
--data-urlencode "chat_id=${CHAT_ID}" \
--data-urlencode "text=${MESSAGE}" \
--data-urlencode "parse_mode=${PARSE_MODE}" \
--data-urlencode "disable_web_page_preview=true" \
>/dev/null
}
main "$@"
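Because the script defaults `parse_mode` to `HTML`, Telegram interprets `<`, `>`, and `&` in the message text as markup, so free-form summaries may need escaping before they are passed to `--message`. The helper below is a sketch only: `escape_telegram_html` is a hypothetical name and is not part of `notify-telegram.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of notify-telegram.sh): escape the three
# characters that Telegram's HTML parse mode treats as markup.
escape_telegram_html() {
  # `&` must be replaced first so the entities added afterwards stay intact.
  printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}
```

A caller could then run `notify-telegram.sh --message "$(escape_telegram_html "$summary")"`, where `summary` is a placeholder variable. Note that the script truncates messages to 4096 characters after this step, so a very long escaped message could in principle be cut in the middle of an entity.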


@@ -2,6 +2,7 @@
 set -euo pipefail
 DEFAULT_POLL_SECONDS=10
+DEFAULT_HEARTBEAT_SECONDS=60
 DEFAULT_SOFT_TIMEOUT_SECONDS=600
 DEFAULT_STALL_WARNING_SECONDS=300
 DEFAULT_HARD_TIMEOUT_SECONDS=1800
@@ -12,7 +13,9 @@ COMMAND_FILE=""
 STDOUT_FILE=""
 STDERR_FILE=""
 STATUS_FILE=""
+SUCCESS_FILES=()
 POLL_SECONDS=$DEFAULT_POLL_SECONDS
+HEARTBEAT_SECONDS=$DEFAULT_HEARTBEAT_SECONDS
 SOFT_TIMEOUT_SECONDS=$DEFAULT_SOFT_TIMEOUT_SECONDS
 STALL_WARNING_SECONDS=$DEFAULT_STALL_WARNING_SECONDS
 HARD_TIMEOUT_SECONDS=$DEFAULT_HARD_TIMEOUT_SECONDS
@@ -29,7 +32,9 @@ Usage:
 --stdout-file <path> \
 --stderr-file <path> \
 --status-file <path> \
+[--success-file <path>] \
 [--poll-seconds <int>] \
+[--heartbeat-seconds <int>] \
 [--soft-timeout-seconds <int>] \
 [--stall-warning-seconds <int>] \
 [--hard-timeout-seconds <int>]
@@ -55,6 +60,25 @@ escape_note() {
 printf '%s' "$note"
 }
+join_success_files() {
+if [[ ${#SUCCESS_FILES[@]} -eq 0 ]]; then
+printf ''
+return 0
+fi
+local joined=""
+local path
+for path in "${SUCCESS_FILES[@]}"; do
+if [[ -n "$joined" ]]; then
+joined+=", "
+fi
+joined+="$path"
+done
+printf '%s' "$joined"
+}
 iso_timestamp() {
 date -u +"%Y-%m-%dT%H:%M:%SZ"
 }
@@ -147,10 +171,18 @@ parse_args() {
 STATUS_FILE=${2:-}
 shift 2
 ;;
+--success-file)
+SUCCESS_FILES+=("${2:-}")
+shift 2
+;;
 --poll-seconds)
 POLL_SECONDS=${2:-}
 shift 2
 ;;
+--heartbeat-seconds)
+HEARTBEAT_SECONDS=${2:-}
+shift 2
+;;
 --soft-timeout-seconds)
 SOFT_TIMEOUT_SECONDS=${2:-}
 shift 2
@@ -179,11 +211,13 @@ parse_args() {
 [[ -n "$STATUS_FILE" ]] || fail_usage "--status-file is required"
 require_integer "poll-seconds" "$POLL_SECONDS"
+require_integer "heartbeat-seconds" "$HEARTBEAT_SECONDS"
 require_integer "soft-timeout-seconds" "$SOFT_TIMEOUT_SECONDS"
 require_integer "stall-warning-seconds" "$STALL_WARNING_SECONDS"
 require_integer "hard-timeout-seconds" "$HARD_TIMEOUT_SECONDS"
 [[ "$POLL_SECONDS" -gt 0 ]] || fail_usage "poll-seconds must be > 0"
+[[ "$HEARTBEAT_SECONDS" -gt 0 ]] || fail_usage "heartbeat-seconds must be > 0"
 [[ "$SOFT_TIMEOUT_SECONDS" -gt 0 ]] || fail_usage "soft-timeout-seconds must be > 0"
 [[ "$STALL_WARNING_SECONDS" -gt 0 ]] || fail_usage "stall-warning-seconds must be > 0"
 [[ "$HARD_TIMEOUT_SECONDS" -gt 0 ]] || fail_usage "hard-timeout-seconds must be > 0"
@@ -227,8 +261,10 @@ main() {
local last_stdout_bytes=0
local last_stderr_bytes=0
local last_output_change_time=$START_TIME
local last_heartbeat_time=$START_TIME
local soft_timeout_logged=0
local stall_warning_logged=0
local heartbeat_count=0
while kill -0 "$CHILD_PID" 2>/dev/null; do
sleep "$POLL_SECONDS"
@@ -239,6 +275,12 @@ main() {
stdout_bytes=$(file_bytes "$STDOUT_FILE")
stderr_bytes=$(file_bytes "$STDERR_FILE")
if [[ $((now - last_heartbeat_time)) -ge "$HEARTBEAT_SECONDS" ]]; then
heartbeat_count=$((heartbeat_count + 1))
append_status info in-progress "In progress ${heartbeat_count}"
last_heartbeat_time=$now
fi
if [[ "$stdout_bytes" -ne "$last_stdout_bytes" || "$stderr_bytes" -ne "$last_stderr_bytes" ]]; then
last_output_change_time=$now
stall_warning_logged=0
@@ -285,6 +327,7 @@ main() {
trap - EXIT
local final_stdout_bytes final_stderr_bytes
local success_file success_bytes
final_stdout_bytes=$(file_bytes "$STDOUT_FILE")
final_stderr_bytes=$(file_bytes "$STDERR_FILE")
@@ -294,6 +337,16 @@ main() {
exit 0
fi
if [[ ${#SUCCESS_FILES[@]} -gt 0 ]]; then
for success_file in "${SUCCESS_FILES[@]}"; do
success_bytes=$(file_bytes "$success_file")
if [[ "$success_bytes" -gt 0 ]]; then
append_status info completed "reviewer completed successfully via success file $(join_success_files)"
exit 0
fi
done
fi
append_status error completed-empty-output "reviewer exited successfully with empty stdout"
exit "$EXIT_COMPLETED_EMPTY_OUTPUT"
fi
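
Reviewer note: the heartbeat check in the hunk above runs once per poll iteration, so heartbeats land on poll ticks rather than on exact `--heartbeat-seconds` boundaries. The sketch below simulates that cadence with a fake clock. `count_heartbeats` and its arguments are hypothetical names used for illustration; they are not part of `run-review.sh`, and the simulation ignores when the child process actually exits.

```shell
#!/usr/bin/env bash
# Hypothetical simulation of the heartbeat cadence; not part of run-review.sh.
# Arguments: total run time, poll interval, heartbeat interval (all in seconds).
count_heartbeats() {
  local duration=$1 poll=$2 heartbeat=$3
  local now=0 last_heartbeat=0 count=0

  while (( now < duration )); do
    # Stand-in for `sleep "$POLL_SECONDS"`: advance the fake clock by one poll.
    now=$((now + poll))
    if (( now - last_heartbeat >= heartbeat )); then
      count=$((count + 1))
      last_heartbeat=$now
    fi
  done

  printf '%s\n' "$count"
}

count_heartbeats 10 2 5   # prints 1 (fires at t=6; the next would be t=12)
count_heartbeats 10 5 2   # prints 2 (cannot fire more often than the poll)
```

In this model a heartbeat interval shorter than the poll interval is effectively rounded up to one heartbeat per poll, which is presumably why the smoke test sets both to 1 second.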


@@ -34,6 +34,9 @@ check_skill_file() {
assert_contains "$file" '$(cat /tmp/'
assert_contains "$file" "--strict-mcp-config"
assert_contains "$file" "--setting-sources user"
assert_contains "$file" "### P0"
assert_contains "$file" "In progress N"
assert_contains "$file" "notify-telegram.sh"
assert_not_contains "$file" "--allowedTools Read"
}

skills/reviewer-runtime/tests/smoke-test.sh Normal file → Executable file

@@ -66,6 +66,28 @@ run_helper() {
return "$exit_code"
}
run_helper_allowing_success_file() {
local command_file=$1
local stdout_file=$2
local stderr_file=$3
local status_file=$4
local success_file=$5
shift 5
set +e
"$HELPER_PATH" \
--command-file "$command_file" \
--stdout-file "$stdout_file" \
--stderr-file "$stderr_file" \
--status-file "$status_file" \
--success-file "$success_file" \
"$@"
local exit_code=$?
set -e
return "$exit_code"
}
test_delayed_success() {
local dir=$1
local command_file=$dir/delayed-success.sh
@@ -120,6 +142,36 @@ printf "completed after soft timeout\n"
assert_file_contains "$status_file" "state=completed"
}
test_in_progress_heartbeats() {
local dir=$1
local command_file=$dir/in-progress-heartbeats.sh
local stdout_file=$dir/in-progress-heartbeats.stdout
local stderr_file=$dir/in-progress-heartbeats.stderr
local status_file=$dir/in-progress-heartbeats.status
make_command "$command_file" '
sleep 3
printf "finished with heartbeat coverage\n"
'
if run_helper "$command_file" "$stdout_file" "$stderr_file" "$status_file" \
--poll-seconds 1 \
--heartbeat-seconds 1 \
--soft-timeout-seconds 5 \
--stall-warning-seconds 4 \
--hard-timeout-seconds 10; then
local exit_code=0
else
local exit_code=$?
fi
assert_exit_code "$exit_code" 0
assert_file_contains "$stdout_file" "finished with heartbeat coverage"
assert_file_contains "$status_file" "state=in-progress"
assert_file_contains "$status_file" "In progress 1"
assert_file_contains "$status_file" "In progress 2"
}
test_nonzero_failure() {
local dir=$1
local command_file=$dir/nonzero-failure.sh
@@ -173,6 +225,34 @@ exit 0
assert_file_contains "$status_file" "state=completed-empty-output"
}
test_success_file_allows_empty_stdout() {
local dir=$1
local command_file=$dir/success-file.sh
local stdout_file=$dir/success-file.stdout
local stderr_file=$dir/success-file.stderr
local status_file=$dir/success-file.status
local success_file=$dir/review-output.md
make_command "$command_file" "
printf 'review body from redirected file\\n' > \"$success_file\"
exit 0
"
if run_helper_allowing_success_file "$command_file" "$stdout_file" "$stderr_file" "$status_file" "$success_file" \
--poll-seconds 1 \
--soft-timeout-seconds 5 \
--stall-warning-seconds 3 \
--hard-timeout-seconds 10; then
local exit_code=0
else
local exit_code=$?
fi
assert_exit_code "$exit_code" 0
assert_file_contains "$success_file" "review body from redirected file"
assert_file_contains "$status_file" "state=completed"
}
test_signal_cleanup() {
local dir=$1
local command_file=$dir/signal-child.sh
@@ -252,8 +332,10 @@ main() {
test_delayed_success "$tmp_dir"
test_soft_timeout_continues "$tmp_dir"
test_in_progress_heartbeats "$tmp_dir"
test_nonzero_failure "$tmp_dir"
test_empty_output_is_terminal "$tmp_dir"
test_success_file_allows_empty_stdout "$tmp_dir"
test_signal_cleanup "$tmp_dir"
test_hard_timeout_escalation "$tmp_dir"
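
Reviewer note: as far as the visible hunks show, `test_empty_output_is_terminal` and `test_success_file_allows_empty_stdout` together pin down the rule applied after a zero exit. Non-empty stdout counts as success; otherwise any non-empty `--success-file` counts as success; otherwise the run is reported as `completed-empty-output`. A minimal sketch of that decision follows, assuming a hypothetical `classify_completion` function. The real helper calls `append_status` and exits rather than printing a state, and it measures size with its own `file_bytes` helper instead of `-s`.

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration; not part of run-review.sh.
# Usage: classify_completion <stdout-file> [success-file ...]
classify_completion() {
  local stdout_file=$1
  shift

  # -s is true when the file exists and has a size greater than zero.
  if [[ -s "$stdout_file" ]]; then
    printf 'completed\n'
    return 0
  fi

  # Empty stdout: accept the run if any declared success file has content.
  local success_file
  for success_file in "$@"; do
    if [[ -s "$success_file" ]]; then
      printf 'completed\n'
      return 0
    fi
  done

  printf 'completed-empty-output\n'
  return 1
}
```

This matches the case the new test exercises, where the reviewer command redirects its output into a file and leaves stdout empty.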


@@ -0,0 +1,158 @@
#!/usr/bin/env bash
set -euo pipefail
SCRIPT_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
HELPER_PATH=$(cd "$SCRIPT_DIR/.." && pwd)/notify-telegram.sh
fail() {
echo "FAIL: $*" >&2
exit 1
}
assert_equals() {
local actual=$1
local expected=$2
if [[ "$actual" != "$expected" ]]; then
fail "expected '$expected', got '$actual'"
fi
}
assert_file_contains() {
local file=$1
local pattern=$2
if ! grep -qF -- "$pattern" "$file"; then
echo "--- $file ---" >&2
sed -n '1,200p' "$file" >&2 || true
fail "expected '$pattern' in $file"
fi
}
capture_curl() {
local bin_dir=$1
local curl_args_file=$2
mkdir -p "$bin_dir"
cat >"$bin_dir/curl" <<EOF
#!/usr/bin/env bash
printf '%s\n' "\$@" >"$curl_args_file"
printf '{"ok":true}\n'
EOF
chmod +x "$bin_dir/curl"
}
read_curl_value() {
local file=$1
local key=$2
awk -v prefix="${key}=" 'index($0, prefix) == 1 { print substr($0, length(prefix) + 1); exit }' "$file"
}
test_missing_credentials() {
local tmp_dir=$1
local output_file=$tmp_dir/missing-credentials.out
set +e
env -u TELEGRAM_BOT_TOKEN -u TELEGRAM_CHAT_ID \
"$HELPER_PATH" --message "hello" >"$output_file" 2>&1
local exit_code=$?
set -e
[[ "$exit_code" -eq 2 ]] || fail "expected exit code 2 for missing credentials, got $exit_code"
assert_file_contains "$output_file" "bot token is required"
}
test_rejects_message_and_message_file_together() {
local tmp_dir=$1
local message_file=$tmp_dir/message.txt
local output_file=$tmp_dir/message-and-file.out
printf 'hello from file\n' >"$message_file"
set +e
TELEGRAM_BOT_TOKEN=test-token \
TELEGRAM_CHAT_ID=123456 \
"$HELPER_PATH" --message "hello" --message-file "$message_file" >"$output_file" 2>&1
local exit_code=$?
set -e
[[ "$exit_code" -eq 2 ]] || fail "expected exit code 2 for mutually exclusive arguments, got $exit_code"
assert_file_contains "$output_file" "use either --message or --message-file, not both"
}
test_successful_request() {
local tmp_dir=$1
local bin_dir=$tmp_dir/bin
local curl_args_file=$tmp_dir/curl-args.txt
capture_curl "$bin_dir" "$curl_args_file"
PATH="$bin_dir:$PATH" \
TELEGRAM_BOT_TOKEN=test-token \
TELEGRAM_CHAT_ID=123456 \
"$HELPER_PATH" --message "Plan completed"
assert_file_contains "$curl_args_file" "https://api.telegram.org/bottest-token/sendMessage"
assert_file_contains "$curl_args_file" "chat_id=123456"
assert_file_contains "$curl_args_file" "text=Plan completed"
assert_file_contains "$curl_args_file" "disable_web_page_preview=true"
assert_file_contains "$curl_args_file" "parse_mode=HTML"
}
test_message_file_and_custom_api_base() {
local tmp_dir=$1
local bin_dir=$tmp_dir/bin-message-file
local curl_args_file=$tmp_dir/curl-message-file.txt
local message_file=$tmp_dir/telegram-message.txt
capture_curl "$bin_dir" "$curl_args_file"
printf 'Plan completed from file\n' >"$message_file"
PATH="$bin_dir:$PATH" \
TELEGRAM_BOT_TOKEN=test-token \
TELEGRAM_CHAT_ID=654321 \
"$HELPER_PATH" \
--message-file "$message_file" \
--api-base-url "https://telegram.example.test/custom"
assert_file_contains "$curl_args_file" "https://telegram.example.test/custom/bottest-token/sendMessage"
assert_file_contains "$curl_args_file" "chat_id=654321"
assert_file_contains "$curl_args_file" "text=Plan completed from file"
}
test_truncates_long_message() {
local tmp_dir=$1
local bin_dir=$tmp_dir/bin-truncate
local curl_args_file=$tmp_dir/curl-truncate.txt
local long_message_file=$tmp_dir/long-message.txt
local truncated_message
capture_curl "$bin_dir" "$curl_args_file"
python3 - <<'PY' >"$long_message_file"
print("A" * 5000, end="")
PY
PATH="$bin_dir:$PATH" \
TELEGRAM_BOT_TOKEN=test-token \
TELEGRAM_CHAT_ID=123456 \
"$HELPER_PATH" --message-file "$long_message_file"
truncated_message=$(read_curl_value "$curl_args_file" "text")
assert_equals "${#truncated_message}" "4096"
}
main() {
[[ -x "$HELPER_PATH" ]] || fail "helper is not executable: $HELPER_PATH"
local tmp_dir
tmp_dir=$(mktemp -d)
trap "rm -rf '$tmp_dir'" EXIT
test_missing_credentials "$tmp_dir"
test_rejects_message_and_message_file_together "$tmp_dir"
test_successful_request "$tmp_dir"
test_message_file_and_custom_api_base "$tmp_dir"
test_truncates_long_message "$tmp_dir"
echo "PASS: telegram notifier tests"
}
main "$@"
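
Reviewer note: the tests above never reach the network. `capture_curl` shadows `curl` with a stub placed first on `PATH`, and each assertion inspects the arguments the stub recorded. A stripped-down sketch of the same technique is shown below; the URL and form field are placeholders chosen for illustration and do not come from `notify-telegram.sh`.

```shell
#!/usr/bin/env bash
set -euo pipefail

stub_dir=$(mktemp -d)
args_file=$stub_dir/curl-args.txt
trap 'rm -rf "$stub_dir"' EXIT

# Fake curl: record one argument per line, then print a canned JSON reply.
# \$@ is escaped so it expands when the stub runs, while $args_file is
# expanded now and baked into the stub.
cat >"$stub_dir/curl" <<EOF
#!/usr/bin/env bash
printf '%s\n' "\$@" >"$args_file"
printf '{"ok":true}\n'
EOF
chmod +x "$stub_dir/curl"

# Run the code under test with the stub first on PATH. The command
# substitution is a subshell, so the exported PATH does not leak out.
# The URL and field below are placeholders, not the notifier's real request.
response=$(
  export PATH="$stub_dir:$PATH"
  curl --data-urlencode "text=hello" "https://example.invalid/send"
)

printf '%s\n' "$response"
grep -F -- "text=hello" "$args_file"
```

Writing one argument per line is what lets a helper like `read_curl_value` pull out a single `key=value` pair with a prefix match.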