diff --git a/docs/IMPLEMENT-PLAN.md b/docs/IMPLEMENT-PLAN.md index 2a0cb31..1ed7f12 100644 --- a/docs/IMPLEMENT-PLAN.md +++ b/docs/IMPLEMENT-PLAN.md @@ -132,6 +132,8 @@ Verify Superpowers execution dependencies exist in your agent skills root: - Executes milestones one-by-one, tracking stories in `story-tracker.md`. - Runs lint/typecheck/tests as a gate before each milestone review. - Sends each milestone to a reviewer CLI for approval (max rounds configurable, default 10). +- Runs reviewer commands through `reviewer-runtime/run-review.sh` when available, with fallback to direct synchronous execution only if the helper is missing. +- Captures reviewer stderr and helper status logs for diagnostics and retains them on failed, empty-output, or operator-decision review rounds. - Commits each milestone locally only after reviewer approval (does not push). - After all milestones approved, merges worktree branch to parent and deletes worktree. - Supports resume: detects existing worktree and `in-dev`/`completed` stories. @@ -141,11 +143,38 @@ Verify Superpowers execution dependencies exist in your agent skills root: After each milestone is implemented and verified, the skill sends it to a second model for review: 1. **Configure** — user picks a reviewer CLI (`codex`, `claude`, `cursor`) and model, or skips -2. **Submit** — milestone spec, acceptance criteria, git diff, and verification output are written to a temp file and sent to the reviewer in read-only/ask mode -3. **Feedback** — reviewer evaluates correctness, acceptance criteria, code quality, test coverage, security -4. **Revise** — the implementing agent addresses each issue, re-verifies, and re-submits -5. **Repeat** — up to max rounds (default 10) until the reviewer returns `VERDICT: APPROVED` -6. **Approve** — milestone is marked approved in `story-tracker.md` +2. **Prepare** — milestone payload and a bash reviewer command script are written to temp files +3. **Run** — the command script is executed through `reviewer-runtime/run-review.sh` when installed +4. **Feedback** — reviewer evaluates correctness, acceptance criteria, code quality, test coverage, security +5. **Revise** — the implementing agent addresses each issue, re-verifies, and re-submits +6. **Repeat** — up to max rounds (default 10) until the reviewer returns `VERDICT: APPROVED` +7. **Approve** — milestone is marked approved in `story-tracker.md` + +### Runtime Artifacts + +The milestone review flow may create these temp artifacts: + +- `/tmp/milestone-.md` — milestone review payload +- `/tmp/milestone-review-.md` — normalized review text +- `/tmp/milestone-review-.json` — raw Cursor JSON output +- `/tmp/milestone-review-.stderr` — reviewer stderr +- `/tmp/milestone-review-.status` — helper heartbeat/status log +- `/tmp/milestone-review-.runner.out` — helper-managed stdout from the reviewer command process +- `/tmp/milestone-review-.sh` — reviewer command script + +Status log lines use this format: + +```text +ts= level= state= elapsed_s= pid= stdout_bytes= stderr_bytes= note="" +``` + +`stall-warning` is a heartbeat/status-log state only. It is not a terminal review result. + +### Failure Handling + +- `completed-empty-output` means the reviewer exited without producing review text; surface `.stderr` and `.status`, then retry only after diagnosing the cause. +- `needs-operator-decision` means the helper reached hard-timeout escalation; surface `.status` and decide whether to keep waiting, abort, or retry with different parameters. +- Successful rounds clean up temp artifacts. Failed, empty-output, and operator-decision rounds should retain `.stderr`, `.status`, and `.runner.out` until diagnosed. ### Supported Reviewer CLIs @@ -155,6 +184,26 @@ After each milestone is implemented and verified, the skill sends it to a second | `claude` | `claude -p --model --allowedTools Read` | No (fresh call each round) | `--allowedTools Read` | | `cursor` | `cursor-agent -p --mode=ask --model --trust --output-format json` | Yes (`--resume `) | `--mode=ask` | +For all three CLIs, the preferred execution path is: + +1. write the reviewer command to a bash script +2. run that script through `reviewer-runtime/run-review.sh` +3. fall back to direct synchronous execution only if the helper is missing or not executable + +The helper also supports manual override flags for diagnostics: + +```bash +run-review.sh \ + --command-file \ + --stdout-file \ + --stderr-file \ + --status-file \ + --poll-seconds 10 \ + --soft-timeout-seconds 600 \ + --stall-warning-seconds 300 \ + --hard-timeout-seconds 1800 +``` + ## Variant Hardening Notes ### Claude Code diff --git a/skills/implement-plan/claude-code/SKILL.md b/skills/implement-plan/claude-code/SKILL.md index e739dc8..1f702b1 100644 --- a/skills/implement-plan/claude-code/SKILL.md +++ b/skills/implement-plan/claude-code/SKILL.md @@ -136,7 +136,20 @@ Do NOT push. After committing: REVIEW_ID=$(uuidgen | tr '[:upper:]' '[:lower:]' | head -c 8) ``` -Use for all temp file paths: `/tmp/milestone-${REVIEW_ID}.md` and `/tmp/milestone-review-${REVIEW_ID}.md`. +Use `REVIEW_ID` for all milestone review temp file paths: +- `/tmp/milestone-${REVIEW_ID}.md` +- `/tmp/milestone-review-${REVIEW_ID}.md` +- `/tmp/milestone-review-${REVIEW_ID}.json` +- `/tmp/milestone-review-${REVIEW_ID}.stderr` +- `/tmp/milestone-review-${REVIEW_ID}.status` +- `/tmp/milestone-review-${REVIEW_ID}.runner.out` +- `/tmp/milestone-review-${REVIEW_ID}.sh` + +Resolve the shared runtime helper path before writing the command script: + +```bash +REVIEWER_RUNTIME=~/.claude/skills/reviewer-runtime/run-review.sh +``` #### Step 2: Write Review Payload @@ -165,6 +178,13 @@ Write to `/tmp/milestone-${REVIEW_ID}.md`: #### Step 3: Submit to Reviewer (Round 1) +Write the reviewer invocation to `/tmp/milestone-review-${REVIEW_ID}.sh` as a bash script: + +```bash +#!/usr/bin/env bash +set -euo pipefail +``` + **If `REVIEWER_CLI` is `codex`:** ```bash @@ -185,7 +205,7 @@ Be specific and actionable. If solid, end with exactly: VERDICT: APPROVED If changes are needed, end with exactly: VERDICT: REVISE" ``` -Capture the Codex session ID from output (line `session id: `). Store as `CODEX_SESSION_ID`. +Do not try to capture the Codex session ID yet. When using the helper, extract it from `/tmp/milestone-review-${REVIEW_ID}.runner.out` after the command completes (look for `session id: `), then store it as `CODEX_SESSION_ID` for resume in subsequent rounds. **If `REVIEWER_CLI` is `claude`:** @@ -203,8 +223,7 @@ Evaluate: Be specific and actionable. If solid, end with exactly: VERDICT: APPROVED If changes are needed, end with exactly: VERDICT: REVISE" \ --model ${REVIEWER_MODEL} \ - --allowedTools Read \ - > /tmp/milestone-review-${REVIEW_ID}.md + --allowedTools Read ``` **If `REVIEWER_CLI` is `cursor`:** @@ -229,19 +248,43 @@ If changes are needed, end with exactly: VERDICT: REVISE" \ > /tmp/milestone-review-${REVIEW_ID}.json ``` -Extract session ID and review text (requires `jq`): +For `cursor`, the command script writes raw JSON to `/tmp/milestone-review-${REVIEW_ID}.json`. Do not run `jq` extraction until after the helper or fallback execution completes. If `jq` is not installed, inform the user: `brew install jq` (macOS) or equivalent. + +Run the command script through the shared helper when available: + +```bash +if [ -x "$REVIEWER_RUNTIME" ]; then + "$REVIEWER_RUNTIME" \ + --command-file /tmp/milestone-review-${REVIEW_ID}.sh \ + --stdout-file /tmp/milestone-review-${REVIEW_ID}.runner.out \ + --stderr-file /tmp/milestone-review-${REVIEW_ID}.stderr \ + --status-file /tmp/milestone-review-${REVIEW_ID}.status +else + echo "Warning: reviewer runtime helper not found at $REVIEWER_RUNTIME; falling back to direct synchronous review." >&2 + bash /tmp/milestone-review-${REVIEW_ID}.sh >/tmp/milestone-review-${REVIEW_ID}.runner.out 2>/tmp/milestone-review-${REVIEW_ID}.stderr +fi +``` + +After the command completes: +- If `REVIEWER_CLI=cursor`, extract the final review text: ```bash CURSOR_SESSION_ID=$(jq -r '.session_id' /tmp/milestone-review-${REVIEW_ID}.json) jq -r '.result' /tmp/milestone-review-${REVIEW_ID}.json > /tmp/milestone-review-${REVIEW_ID}.md ``` -If `jq` is not installed, inform the user: `brew install jq` (macOS) or equivalent. +- If `REVIEWER_CLI=codex`, extract `CODEX_SESSION_ID` from `/tmp/milestone-review-${REVIEW_ID}.runner.out` after the helper or fallback run. If the review text is only in `.runner.out`, move or copy the actual review body into `/tmp/milestone-review-${REVIEW_ID}.md` before verdict parsing. + +Fallback is allowed only when the helper is missing or not executable. #### Step 4: Read Review & Check Verdict 1. Read `/tmp/milestone-review-${REVIEW_ID}.md` -2. Present review to the user: +2. If the review failed, produced empty output, or reached helper timeout, also read: + - `/tmp/milestone-review-${REVIEW_ID}.stderr` + - `/tmp/milestone-review-${REVIEW_ID}.status` + - `/tmp/milestone-review-${REVIEW_ID}.runner.out` +3. Present review to the user: ``` ## Milestone Review — Round N (reviewer: ${REVIEWER_CLI} / ${REVIEWER_MODEL}) @@ -249,10 +292,12 @@ If `jq` is not installed, inform the user: `brew install jq` (macOS) or equivale [Reviewer feedback] ``` -3. Check verdict: +4. Check verdict: - **VERDICT: APPROVED** -> proceed to Phase 4 Step 6 (commit & approve) - **VERDICT: REVISE** -> go to Step 5 - No clear verdict but positive / no actionable items -> treat as approved + - Helper state `completed-empty-output` -> treat as failed review attempt, surface stderr/status, fix invocation or prompt handling, then retry + - Helper state `needs-operator-decision` -> surface status log, note any `stall-warning` heartbeat lines as non-terminal operator hints, and decide whether to keep waiting, abort, or retry with different helper parameters - Max rounds (`MAX_ROUNDS`) reached -> present to user for manual decision (proceed or stop) #### Step 5: Address Feedback & Re-verify @@ -272,6 +317,8 @@ If a revision contradicts the user's explicit requirements, skip it and note it #### Step 6: Re-submit to Reviewer (Rounds 2-N) +Rewrite `/tmp/milestone-review-${REVIEW_ID}.sh` for the next round. The script should contain the reviewer invocation only; do not run it directly. + **If `REVIEWER_CLI` is `codex`:** Resume the existing session: @@ -330,20 +377,31 @@ Changes made: Re-review. If solid, end with: VERDICT: APPROVED If more changes needed, end with: VERDICT: REVISE" \ > /tmp/milestone-review-${REVIEW_ID}.json - -jq -r '.result' /tmp/milestone-review-${REVIEW_ID}.json > /tmp/milestone-review-${REVIEW_ID}.md ``` If resume fails, fall back to fresh `cursor-agent -p` with context about prior rounds. +Do not run `jq` extraction until after the helper or fallback execution completes, then extract `/tmp/milestone-review-${REVIEW_ID}.md` from the JSON response. + +After updating `/tmp/milestone-review-${REVIEW_ID}.sh`, run the same helper/fallback flow from Round 1. + Return to Step 4. #### Step 7: Cleanup Per Milestone ```bash -rm -f /tmp/milestone-${REVIEW_ID}.md /tmp/milestone-review-${REVIEW_ID}.md /tmp/milestone-review-${REVIEW_ID}.json +rm -f \ + /tmp/milestone-${REVIEW_ID}.md \ + /tmp/milestone-review-${REVIEW_ID}.md \ + /tmp/milestone-review-${REVIEW_ID}.json \ + /tmp/milestone-review-${REVIEW_ID}.stderr \ + /tmp/milestone-review-${REVIEW_ID}.status \ + /tmp/milestone-review-${REVIEW_ID}.runner.out \ + /tmp/milestone-review-${REVIEW_ID}.sh ``` +If the round failed, produced empty output, or reached operator-decision timeout, keep `.stderr`, `.status`, and `.runner.out` until the issue is diagnosed instead of deleting them immediately. + ### Phase 6: Completion (REQUIRED SUB-SKILL) After all milestones are approved and committed: diff --git a/skills/implement-plan/codex/SKILL.md b/skills/implement-plan/codex/SKILL.md index e16b3be..f748376 100644 --- a/skills/implement-plan/codex/SKILL.md +++ b/skills/implement-plan/codex/SKILL.md @@ -169,7 +169,20 @@ Do NOT push. After committing: REVIEW_ID=$(uuidgen | tr '[:upper:]' '[:lower:]' | head -c 8) ``` -Use for all temp file paths: `/tmp/milestone-${REVIEW_ID}.md` and `/tmp/milestone-review-${REVIEW_ID}.md`. +Use `REVIEW_ID` for all milestone review temp file paths: +- `/tmp/milestone-${REVIEW_ID}.md` +- `/tmp/milestone-review-${REVIEW_ID}.md` +- `/tmp/milestone-review-${REVIEW_ID}.json` +- `/tmp/milestone-review-${REVIEW_ID}.stderr` +- `/tmp/milestone-review-${REVIEW_ID}.status` +- `/tmp/milestone-review-${REVIEW_ID}.runner.out` +- `/tmp/milestone-review-${REVIEW_ID}.sh` + +Resolve the shared runtime helper path before writing the command script: + +```bash +REVIEWER_RUNTIME=~/.codex/skills/reviewer-runtime/run-review.sh +``` #### Step 2: Write Review Payload @@ -198,6 +211,13 @@ Write to `/tmp/milestone-${REVIEW_ID}.md`: #### Step 3: Submit to Reviewer (Round 1) +Write the reviewer invocation to `/tmp/milestone-review-${REVIEW_ID}.sh` as a bash script: + +```bash +#!/usr/bin/env bash +set -euo pipefail +``` + **If `REVIEWER_CLI` is `codex`:** ```bash @@ -218,7 +238,7 @@ Be specific and actionable. If solid, end with exactly: VERDICT: APPROVED If changes are needed, end with exactly: VERDICT: REVISE" ``` -Capture the Codex session ID from output (line `session id: `). Store as `CODEX_SESSION_ID`. +Do not try to capture the Codex session ID yet. When using the helper, extract it from `/tmp/milestone-review-${REVIEW_ID}.runner.out` after the command completes (look for `session id: `), then store it as `CODEX_SESSION_ID` for resume in subsequent rounds. **If `REVIEWER_CLI` is `claude`:** @@ -236,8 +256,7 @@ Evaluate: Be specific and actionable. If solid, end with exactly: VERDICT: APPROVED If changes are needed, end with exactly: VERDICT: REVISE" \ --model ${REVIEWER_MODEL} \ - --allowedTools Read \ - > /tmp/milestone-review-${REVIEW_ID}.md + --allowedTools Read ``` **If `REVIEWER_CLI` is `cursor`:** @@ -262,19 +281,43 @@ If changes are needed, end with exactly: VERDICT: REVISE" \ > /tmp/milestone-review-${REVIEW_ID}.json ``` -Extract session ID and review text (requires `jq`): +For `cursor`, the command script writes raw JSON to `/tmp/milestone-review-${REVIEW_ID}.json`. Do not run `jq` extraction until after the helper or fallback execution completes. If `jq` is not installed, inform the user: `brew install jq` (macOS) or equivalent. + +Run the command script through the shared helper when available: + +```bash +if [ -x "$REVIEWER_RUNTIME" ]; then + "$REVIEWER_RUNTIME" \ + --command-file /tmp/milestone-review-${REVIEW_ID}.sh \ + --stdout-file /tmp/milestone-review-${REVIEW_ID}.runner.out \ + --stderr-file /tmp/milestone-review-${REVIEW_ID}.stderr \ + --status-file /tmp/milestone-review-${REVIEW_ID}.status +else + echo "Warning: reviewer runtime helper not found at $REVIEWER_RUNTIME; falling back to direct synchronous review." >&2 + bash /tmp/milestone-review-${REVIEW_ID}.sh >/tmp/milestone-review-${REVIEW_ID}.runner.out 2>/tmp/milestone-review-${REVIEW_ID}.stderr +fi +``` + +After the command completes: +- If `REVIEWER_CLI=cursor`, extract the final review text: ```bash CURSOR_SESSION_ID=$(jq -r '.session_id' /tmp/milestone-review-${REVIEW_ID}.json) jq -r '.result' /tmp/milestone-review-${REVIEW_ID}.json > /tmp/milestone-review-${REVIEW_ID}.md ``` -If `jq` is not installed, inform the user: `brew install jq` (macOS) or equivalent. +- If `REVIEWER_CLI=codex`, extract `CODEX_SESSION_ID` from `/tmp/milestone-review-${REVIEW_ID}.runner.out` after the helper or fallback run. If the review text is only in `.runner.out`, move or copy the actual review body into `/tmp/milestone-review-${REVIEW_ID}.md` before verdict parsing. + +Fallback is allowed only when the helper is missing or not executable. #### Step 4: Read Review & Check Verdict 1. Read `/tmp/milestone-review-${REVIEW_ID}.md` -2. Present review to the user: +2. If the review failed, produced empty output, or reached helper timeout, also read: + - `/tmp/milestone-review-${REVIEW_ID}.stderr` + - `/tmp/milestone-review-${REVIEW_ID}.status` + - `/tmp/milestone-review-${REVIEW_ID}.runner.out` +3. Present review to the user: ``` ## Milestone Review — Round N (reviewer: ${REVIEWER_CLI} / ${REVIEWER_MODEL}) @@ -282,10 +325,12 @@ If `jq` is not installed, inform the user: `brew install jq` (macOS) or equivale [Reviewer feedback] ``` -3. Check verdict: +4. Check verdict: - **VERDICT: APPROVED** -> proceed to Phase 4 Step 6 (commit & approve) - **VERDICT: REVISE** -> go to Step 5 - No clear verdict but positive / no actionable items -> treat as approved + - Helper state `completed-empty-output` -> treat as failed review attempt, surface stderr/status, fix invocation or prompt handling, then retry + - Helper state `needs-operator-decision` -> surface status log, note any `stall-warning` heartbeat lines as non-terminal operator hints, and decide whether to keep waiting, abort, or retry with different helper parameters - Max rounds (`MAX_ROUNDS`) reached -> present to user for manual decision (proceed or stop) #### Step 5: Address Feedback & Re-verify @@ -305,6 +350,8 @@ If a revision contradicts the user's explicit requirements, skip it and note it #### Step 6: Re-submit to Reviewer (Rounds 2-N) +Rewrite `/tmp/milestone-review-${REVIEW_ID}.sh` for the next round. The script should contain the reviewer invocation only; do not run it directly. + **If `REVIEWER_CLI` is `codex`:** Resume the existing session: @@ -363,20 +410,31 @@ Changes made: Re-review. If solid, end with: VERDICT: APPROVED If more changes needed, end with: VERDICT: REVISE" \ > /tmp/milestone-review-${REVIEW_ID}.json - -jq -r '.result' /tmp/milestone-review-${REVIEW_ID}.json > /tmp/milestone-review-${REVIEW_ID}.md ``` If resume fails, fall back to fresh `cursor-agent -p` with context about prior rounds. +Do not run `jq` extraction until after the helper or fallback execution completes, then extract `/tmp/milestone-review-${REVIEW_ID}.md` from the JSON response. + +After updating `/tmp/milestone-review-${REVIEW_ID}.sh`, run the same helper/fallback flow from Round 1. + Return to Step 4. #### Step 7: Cleanup Per Milestone ```bash -rm -f /tmp/milestone-${REVIEW_ID}.md /tmp/milestone-review-${REVIEW_ID}.md /tmp/milestone-review-${REVIEW_ID}.json +rm -f \ + /tmp/milestone-${REVIEW_ID}.md \ + /tmp/milestone-review-${REVIEW_ID}.md \ + /tmp/milestone-review-${REVIEW_ID}.json \ + /tmp/milestone-review-${REVIEW_ID}.stderr \ + /tmp/milestone-review-${REVIEW_ID}.status \ + /tmp/milestone-review-${REVIEW_ID}.runner.out \ + /tmp/milestone-review-${REVIEW_ID}.sh ``` +If the round failed, produced empty output, or reached operator-decision timeout, keep `.stderr`, `.status`, and `.runner.out` until the issue is diagnosed instead of deleting them immediately. + ### Phase 6: Completion (REQUIRED SUB-SKILL) After all milestones are approved and committed: diff --git a/skills/implement-plan/cursor/SKILL.md b/skills/implement-plan/cursor/SKILL.md index 9191171..17f4846 100644 --- a/skills/implement-plan/cursor/SKILL.md +++ b/skills/implement-plan/cursor/SKILL.md @@ -169,7 +169,24 @@ Do NOT push. After committing: REVIEW_ID=$(uuidgen | tr '[:upper:]' '[:lower:]' | head -c 8) ``` -Use for all temp file paths: `/tmp/milestone-${REVIEW_ID}.md` and `/tmp/milestone-review-${REVIEW_ID}.md`. +Use `REVIEW_ID` for all milestone review temp file paths: +- `/tmp/milestone-${REVIEW_ID}.md` +- `/tmp/milestone-review-${REVIEW_ID}.md` +- `/tmp/milestone-review-${REVIEW_ID}.json` +- `/tmp/milestone-review-${REVIEW_ID}.stderr` +- `/tmp/milestone-review-${REVIEW_ID}.status` +- `/tmp/milestone-review-${REVIEW_ID}.runner.out` +- `/tmp/milestone-review-${REVIEW_ID}.sh` + +Resolve the shared runtime helper path before writing the command script: + +```bash +if [ -x .cursor/skills/reviewer-runtime/run-review.sh ]; then + REVIEWER_RUNTIME=.cursor/skills/reviewer-runtime/run-review.sh +else + REVIEWER_RUNTIME=~/.cursor/skills/reviewer-runtime/run-review.sh +fi +``` #### Step 2: Write Review Payload @@ -198,6 +215,13 @@ Write to `/tmp/milestone-${REVIEW_ID}.md`: #### Step 3: Submit to Reviewer (Round 1) +Write the reviewer invocation to `/tmp/milestone-review-${REVIEW_ID}.sh` as a bash script: + +```bash +#!/usr/bin/env bash +set -euo pipefail +``` + **If `REVIEWER_CLI` is `codex`:** ```bash @@ -218,7 +242,7 @@ Be specific and actionable. If solid, end with exactly: VERDICT: APPROVED If changes are needed, end with exactly: VERDICT: REVISE" ``` -Capture the Codex session ID from output (line `session id: `). Store as `CODEX_SESSION_ID`. +Do not try to capture the Codex session ID yet. When using the helper, extract it from `/tmp/milestone-review-${REVIEW_ID}.runner.out` after the command completes (look for `session id: `), then store it as `CODEX_SESSION_ID` for resume in subsequent rounds. **If `REVIEWER_CLI` is `claude`:** @@ -236,8 +260,7 @@ Evaluate: Be specific and actionable. If solid, end with exactly: VERDICT: APPROVED If changes are needed, end with exactly: VERDICT: REVISE" \ --model ${REVIEWER_MODEL} \ - --allowedTools Read \ - > /tmp/milestone-review-${REVIEW_ID}.md + --allowedTools Read ``` **If `REVIEWER_CLI` is `cursor`:** @@ -262,12 +285,7 @@ If changes are needed, end with exactly: VERDICT: REVISE" \ > /tmp/milestone-review-${REVIEW_ID}.json ``` -Extract session ID and review text: - -```bash -CURSOR_SESSION_ID=$(jq -r '.session_id' /tmp/milestone-review-${REVIEW_ID}.json) -jq -r '.result' /tmp/milestone-review-${REVIEW_ID}.json > /tmp/milestone-review-${REVIEW_ID}.md -``` +For `cursor`, the command script writes raw JSON to `/tmp/milestone-review-${REVIEW_ID}.json`. Do not run `jq` extraction until after the helper or fallback execution completes. Notes on Cursor flags: - `--mode=ask` — read-only mode, no file modifications @@ -275,10 +293,41 @@ Notes on Cursor flags: - `-p` / `--print` — non-interactive mode, output to stdout - `--output-format json` — structured output with `session_id` and `result` fields +Run the command script through the shared helper when available: + +```bash +if [ -x "$REVIEWER_RUNTIME" ]; then + "$REVIEWER_RUNTIME" \ + --command-file /tmp/milestone-review-${REVIEW_ID}.sh \ + --stdout-file /tmp/milestone-review-${REVIEW_ID}.runner.out \ + --stderr-file /tmp/milestone-review-${REVIEW_ID}.stderr \ + --status-file /tmp/milestone-review-${REVIEW_ID}.status +else + echo "Warning: reviewer runtime helper not found at $REVIEWER_RUNTIME; falling back to direct synchronous review." >&2 + bash /tmp/milestone-review-${REVIEW_ID}.sh >/tmp/milestone-review-${REVIEW_ID}.runner.out 2>/tmp/milestone-review-${REVIEW_ID}.stderr +fi +``` + +After the command completes: +- If `REVIEWER_CLI=cursor`, extract the final review text: + +```bash +CURSOR_SESSION_ID=$(jq -r '.session_id' /tmp/milestone-review-${REVIEW_ID}.json) +jq -r '.result' /tmp/milestone-review-${REVIEW_ID}.json > /tmp/milestone-review-${REVIEW_ID}.md +``` + +- If `REVIEWER_CLI=codex`, extract `CODEX_SESSION_ID` from `/tmp/milestone-review-${REVIEW_ID}.runner.out` after the helper or fallback run. If the review text is only in `.runner.out`, move or copy the actual review body into `/tmp/milestone-review-${REVIEW_ID}.md` before verdict parsing. + +Fallback is allowed only when the helper is missing or not executable. + #### Step 4: Read Review & Check Verdict 1. Read `/tmp/milestone-review-${REVIEW_ID}.md` -2. Present review to the user: +2. If the review failed, produced empty output, or reached helper timeout, also read: + - `/tmp/milestone-review-${REVIEW_ID}.stderr` + - `/tmp/milestone-review-${REVIEW_ID}.status` + - `/tmp/milestone-review-${REVIEW_ID}.runner.out` +3. Present review to the user: ``` ## Milestone Review — Round N (reviewer: ${REVIEWER_CLI} / ${REVIEWER_MODEL}) @@ -286,10 +335,12 @@ Notes on Cursor flags: [Reviewer feedback] ``` -3. Check verdict: +4. Check verdict: - **VERDICT: APPROVED** -> proceed to Phase 4 Step 6 (commit & approve) - **VERDICT: REVISE** -> go to Step 5 - No clear verdict but positive / no actionable items -> treat as approved + - Helper state `completed-empty-output` -> treat as failed review attempt, surface stderr/status, fix invocation or prompt handling, then retry + - Helper state `needs-operator-decision` -> surface status log, note any `stall-warning` heartbeat lines as non-terminal operator hints, and decide whether to keep waiting, abort, or retry with different helper parameters - Max rounds (`MAX_ROUNDS`) reached -> present to user for manual decision (proceed or stop) #### Step 5: Address Feedback & Re-verify @@ -309,6 +360,8 @@ If a revision contradicts the user's explicit requirements, skip it and note it #### Step 6: Re-submit to Reviewer (Rounds 2-N) +Rewrite `/tmp/milestone-review-${REVIEW_ID}.sh` for the next round. The script should contain the reviewer invocation only; do not run it directly. + **If `REVIEWER_CLI` is `codex`:** Resume the existing session: @@ -367,20 +420,31 @@ Changes made: Re-review. If solid, end with: VERDICT: APPROVED If more changes needed, end with: VERDICT: REVISE" \ > /tmp/milestone-review-${REVIEW_ID}.json - -jq -r '.result' /tmp/milestone-review-${REVIEW_ID}.json > /tmp/milestone-review-${REVIEW_ID}.md ``` If resume fails, fall back to fresh `cursor-agent -p` with context about prior rounds. +Do not run `jq` extraction until after the helper or fallback execution completes, then extract `/tmp/milestone-review-${REVIEW_ID}.md` from the JSON response. + +After updating `/tmp/milestone-review-${REVIEW_ID}.sh`, run the same helper/fallback flow from Round 1. + Return to Step 4. #### Step 7: Cleanup Per Milestone ```bash -rm -f /tmp/milestone-${REVIEW_ID}.md /tmp/milestone-review-${REVIEW_ID}.md /tmp/milestone-review-${REVIEW_ID}.json +rm -f \ + /tmp/milestone-${REVIEW_ID}.md \ + /tmp/milestone-review-${REVIEW_ID}.md \ + /tmp/milestone-review-${REVIEW_ID}.json \ + /tmp/milestone-review-${REVIEW_ID}.stderr \ + /tmp/milestone-review-${REVIEW_ID}.status \ + /tmp/milestone-review-${REVIEW_ID}.runner.out \ + /tmp/milestone-review-${REVIEW_ID}.sh ``` +If the round failed, produced empty output, or reached operator-decision timeout, keep `.stderr`, `.status`, and `.runner.out` until the issue is diagnosed instead of deleting them immediately. + ### Phase 6: Completion (REQUIRED SUB-SKILL) After all milestones are approved and committed: diff --git a/skills/implement-plan/opencode/SKILL.md b/skills/implement-plan/opencode/SKILL.md index 13d1a43..8efb202 100644 --- a/skills/implement-plan/opencode/SKILL.md +++ b/skills/implement-plan/opencode/SKILL.md @@ -154,7 +154,20 @@ Do NOT push. After committing: REVIEW_ID=$(uuidgen | tr '[:upper:]' '[:lower:]' | head -c 8) ``` -Use for all temp file paths: `/tmp/milestone-${REVIEW_ID}.md` and `/tmp/milestone-review-${REVIEW_ID}.md`. +Use `REVIEW_ID` for all milestone review temp file paths: +- `/tmp/milestone-${REVIEW_ID}.md` +- `/tmp/milestone-review-${REVIEW_ID}.md` +- `/tmp/milestone-review-${REVIEW_ID}.json` +- `/tmp/milestone-review-${REVIEW_ID}.stderr` +- `/tmp/milestone-review-${REVIEW_ID}.status` +- `/tmp/milestone-review-${REVIEW_ID}.runner.out` +- `/tmp/milestone-review-${REVIEW_ID}.sh` + +Resolve the shared runtime helper path before writing the command script: + +```bash +REVIEWER_RUNTIME=~/.config/opencode/skills/reviewer-runtime/run-review.sh +``` #### Step 2: Write Review Payload @@ -183,6 +196,13 @@ Write to `/tmp/milestone-${REVIEW_ID}.md`: #### Step 3: Submit to Reviewer (Round 1) +Write the reviewer invocation to `/tmp/milestone-review-${REVIEW_ID}.sh` as a bash script: + +```bash +#!/usr/bin/env bash +set -euo pipefail +``` + **If `REVIEWER_CLI` is `codex`:** ```bash @@ -203,7 +223,7 @@ Be specific and actionable. If solid, end with exactly: VERDICT: APPROVED If changes are needed, end with exactly: VERDICT: REVISE" ``` -Capture the Codex session ID from output (line `session id: `). Store as `CODEX_SESSION_ID`. +Do not try to capture the Codex session ID yet. When using the helper, extract it from `/tmp/milestone-review-${REVIEW_ID}.runner.out` after the command completes (look for `session id: `), then store it as `CODEX_SESSION_ID` for resume in subsequent rounds. **If `REVIEWER_CLI` is `claude`:** @@ -221,8 +241,7 @@ Evaluate: Be specific and actionable. If solid, end with exactly: VERDICT: APPROVED If changes are needed, end with exactly: VERDICT: REVISE" \ --model ${REVIEWER_MODEL} \ - --allowedTools Read \ - > /tmp/milestone-review-${REVIEW_ID}.md + --allowedTools Read ``` **If `REVIEWER_CLI` is `cursor`:** @@ -247,19 +266,43 @@ If changes are needed, end with exactly: VERDICT: REVISE" \ > /tmp/milestone-review-${REVIEW_ID}.json ``` -Extract session ID and review text (requires `jq`): +For `cursor`, the command script writes raw JSON to `/tmp/milestone-review-${REVIEW_ID}.json`. Do not run `jq` extraction until after the helper or fallback execution completes. If `jq` is not installed, inform the user: `brew install jq` (macOS) or equivalent. + +Run the command script through the shared helper when available: + +```bash +if [ -x "$REVIEWER_RUNTIME" ]; then + "$REVIEWER_RUNTIME" \ + --command-file /tmp/milestone-review-${REVIEW_ID}.sh \ + --stdout-file /tmp/milestone-review-${REVIEW_ID}.runner.out \ + --stderr-file /tmp/milestone-review-${REVIEW_ID}.stderr \ + --status-file /tmp/milestone-review-${REVIEW_ID}.status +else + echo "Warning: reviewer runtime helper not found at $REVIEWER_RUNTIME; falling back to direct synchronous review." >&2 + bash /tmp/milestone-review-${REVIEW_ID}.sh >/tmp/milestone-review-${REVIEW_ID}.runner.out 2>/tmp/milestone-review-${REVIEW_ID}.stderr +fi +``` + +After the command completes: +- If `REVIEWER_CLI=cursor`, extract the final review text: ```bash CURSOR_SESSION_ID=$(jq -r '.session_id' /tmp/milestone-review-${REVIEW_ID}.json) jq -r '.result' /tmp/milestone-review-${REVIEW_ID}.json > /tmp/milestone-review-${REVIEW_ID}.md ``` -If `jq` is not installed, inform the user: `brew install jq` (macOS) or equivalent. +- If `REVIEWER_CLI=codex`, extract `CODEX_SESSION_ID` from `/tmp/milestone-review-${REVIEW_ID}.runner.out` after the helper or fallback run. If the review text is only in `.runner.out`, move or copy the actual review body into `/tmp/milestone-review-${REVIEW_ID}.md` before verdict parsing. + +Fallback is allowed only when the helper is missing or not executable. #### Step 4: Read Review & Check Verdict 1. Read `/tmp/milestone-review-${REVIEW_ID}.md` -2. Present review to the user: +2. If the review failed, produced empty output, or reached helper timeout, also read: + - `/tmp/milestone-review-${REVIEW_ID}.stderr` + - `/tmp/milestone-review-${REVIEW_ID}.status` + - `/tmp/milestone-review-${REVIEW_ID}.runner.out` +3. Present review to the user: ``` ## Milestone Review — Round N (reviewer: ${REVIEWER_CLI} / ${REVIEWER_MODEL}) @@ -267,10 +310,12 @@ If `jq` is not installed, inform the user: `brew install jq` (macOS) or equivale [Reviewer feedback] ``` -3. Check verdict: +4. Check verdict: - **VERDICT: APPROVED** -> proceed to Phase 5 Step 6 (commit & approve) - **VERDICT: REVISE** -> go to Step 5 - No clear verdict but positive / no actionable items -> treat as approved + - Helper state `completed-empty-output` -> treat as failed review attempt, surface stderr/status, fix invocation or prompt handling, then retry + - Helper state `needs-operator-decision` -> surface status log, note any `stall-warning` heartbeat lines as non-terminal operator hints, and decide whether to keep waiting, abort, or retry with different helper parameters - Max rounds (`MAX_ROUNDS`) reached -> present to user for manual decision (proceed or stop) #### Step 5: Address Feedback & Re-verify @@ -290,6 +335,8 @@ If a revision contradicts the user's explicit requirements, skip it and note it #### Step 6: Re-submit to Reviewer (Rounds 2-N) +Rewrite `/tmp/milestone-review-${REVIEW_ID}.sh` for the next round. The script should contain the reviewer invocation only; do not run it directly. + **If `REVIEWER_CLI` is `codex`:** Resume the existing session: @@ -348,20 +395,31 @@ Changes made: Re-review. If solid, end with: VERDICT: APPROVED If more changes needed, end with: VERDICT: REVISE" \ > /tmp/milestone-review-${REVIEW_ID}.json - -jq -r '.result' /tmp/milestone-review-${REVIEW_ID}.json > /tmp/milestone-review-${REVIEW_ID}.md ``` If resume fails, fall back to fresh `cursor-agent -p` with context about prior rounds. +Do not run `jq` extraction until after the helper or fallback execution completes, then extract `/tmp/milestone-review-${REVIEW_ID}.md` from the JSON response. + +After updating `/tmp/milestone-review-${REVIEW_ID}.sh`, run the same helper/fallback flow from Round 1. + Return to Step 4. #### Step 7: Cleanup Per Milestone ```bash -rm -f /tmp/milestone-${REVIEW_ID}.md /tmp/milestone-review-${REVIEW_ID}.md /tmp/milestone-review-${REVIEW_ID}.json +rm -f \ + /tmp/milestone-${REVIEW_ID}.md \ + /tmp/milestone-review-${REVIEW_ID}.md \ + /tmp/milestone-review-${REVIEW_ID}.json \ + /tmp/milestone-review-${REVIEW_ID}.stderr \ + /tmp/milestone-review-${REVIEW_ID}.status \ + /tmp/milestone-review-${REVIEW_ID}.runner.out \ + /tmp/milestone-review-${REVIEW_ID}.sh ``` +If the round failed, produced empty output, or reached operator-decision timeout, keep `.stderr`, `.status`, and `.runner.out` until the issue is diagnosed instead of deleting them immediately. + ### Phase 7: Completion (REQUIRED SUB-SKILL) After all milestones are approved and committed: