feat(implement-plan): route milestone review through shared runtime

2026-03-05 23:23:42 -06:00
parent c2d47487e0
commit 253a4f31e2
5 changed files with 340 additions and 53 deletions
@@ -136,7 +136,20 @@ Do NOT push. After committing:
 REVIEW_ID=$(uuidgen | tr '[:upper:]' '[:lower:]' | head -c 8)
 ```

-Use for all temp file paths: `/tmp/milestone-${REVIEW_ID}.md` and `/tmp/milestone-review-${REVIEW_ID}.md`.
+Use `REVIEW_ID` for all milestone review temp file paths:
+- `/tmp/milestone-${REVIEW_ID}.md`
+- `/tmp/milestone-review-${REVIEW_ID}.md`
+- `/tmp/milestone-review-${REVIEW_ID}.json`
+- `/tmp/milestone-review-${REVIEW_ID}.stderr`
+- `/tmp/milestone-review-${REVIEW_ID}.status`
+- `/tmp/milestone-review-${REVIEW_ID}.runner.out`
+- `/tmp/milestone-review-${REVIEW_ID}.sh`
+
+Resolve the shared runtime helper path before writing the command script:
+
+```bash
+REVIEWER_RUNTIME=~/.claude/skills/reviewer-runtime/run-review.sh
+```

 #### Step 2: Write Review Payload

@@ -165,6 +178,13 @@ Write to `/tmp/milestone-${REVIEW_ID}.md`:

 #### Step 3: Submit to Reviewer (Round 1)

+Write the reviewer invocation to `/tmp/milestone-review-${REVIEW_ID}.sh` as a bash script:
+
+```bash
+#!/usr/bin/env bash
+set -euo pipefail
+```
+
 **If `REVIEWER_CLI` is `codex`:**

 ```bash
@@ -185,7 +205,7 @@ Be specific and actionable. If solid, end with exactly: VERDICT: APPROVED
 If changes are needed, end with exactly: VERDICT: REVISE"
 ```

-Capture the Codex session ID from output (line `session id: <uuid>`). Store as `CODEX_SESSION_ID`.
+Do not try to capture the Codex session ID yet. When using the helper, extract it from `/tmp/milestone-review-${REVIEW_ID}.runner.out` after the command completes (look for `session id: <uuid>`), then store it as `CODEX_SESSION_ID` for resume in subsequent rounds.

 **If `REVIEWER_CLI` is `claude`:**

@@ -203,8 +223,7 @@ Evaluate:
 Be specific and actionable. If solid, end with exactly: VERDICT: APPROVED
 If changes are needed, end with exactly: VERDICT: REVISE" \
  --model ${REVIEWER_MODEL} \
-  --allowedTools Read \
-  > /tmp/milestone-review-${REVIEW_ID}.md
+  --allowedTools Read
 ```

 **If `REVIEWER_CLI` is `cursor`:**
@@ -229,19 +248,43 @@ If changes are needed, end with exactly: VERDICT: REVISE" \
  > /tmp/milestone-review-${REVIEW_ID}.json
 ```

-Extract session ID and review text (requires `jq`):
+For `cursor`, the command script writes raw JSON to `/tmp/milestone-review-${REVIEW_ID}.json`. Do not run `jq` extraction until after the helper or fallback execution completes. If `jq` is not installed, inform the user: `brew install jq` (macOS) or equivalent.
+
+Run the command script through the shared helper when available:
+
+```bash
+if [ -x "$REVIEWER_RUNTIME" ]; then
+  "$REVIEWER_RUNTIME" \
+    --command-file /tmp/milestone-review-${REVIEW_ID}.sh \
+    --stdout-file /tmp/milestone-review-${REVIEW_ID}.runner.out \
+    --stderr-file /tmp/milestone-review-${REVIEW_ID}.stderr \
+    --status-file /tmp/milestone-review-${REVIEW_ID}.status
+else
+  echo "Warning: reviewer runtime helper not found at $REVIEWER_RUNTIME; falling back to direct synchronous review." >&2
+  bash /tmp/milestone-review-${REVIEW_ID}.sh >/tmp/milestone-review-${REVIEW_ID}.runner.out 2>/tmp/milestone-review-${REVIEW_ID}.stderr
+fi
+```
+
+After the command completes:
+- If `REVIEWER_CLI=cursor`, extract the final review text:

 ```bash
 CURSOR_SESSION_ID=$(jq -r '.session_id' /tmp/milestone-review-${REVIEW_ID}.json)
 jq -r '.result' /tmp/milestone-review-${REVIEW_ID}.json > /tmp/milestone-review-${REVIEW_ID}.md
 ```

-If `jq` is not installed, inform the user: `brew install jq` (macOS) or equivalent.
+- If `REVIEWER_CLI=codex`, extract `CODEX_SESSION_ID` from `/tmp/milestone-review-${REVIEW_ID}.runner.out` after the helper or fallback run. If the review text is only in `.runner.out`, move or copy the actual review body into `/tmp/milestone-review-${REVIEW_ID}.md` before verdict parsing.
+
+Fallback is allowed only when the helper is missing or not executable.

 #### Step 4: Read Review & Check Verdict

 1. Read `/tmp/milestone-review-${REVIEW_ID}.md`
-2. Present review to the user:
+2. If the review failed, produced empty output, or reached helper timeout, also read:
+   - `/tmp/milestone-review-${REVIEW_ID}.stderr`
+   - `/tmp/milestone-review-${REVIEW_ID}.status`
+   - `/tmp/milestone-review-${REVIEW_ID}.runner.out`
+3. Present review to the user:

 ```
 ## Milestone Review — Round N (reviewer: ${REVIEWER_CLI} / ${REVIEWER_MODEL})
@@ -249,10 +292,12 @@ If `jq` is not installed, inform the user: `brew install jq` (macOS) or equivale
 [Reviewer feedback]
 ```

-3. Check verdict:
+4. Check verdict:
   - **VERDICT: APPROVED** -> proceed to Phase 4 Step 6 (commit & approve)
   - **VERDICT: REVISE** -> go to Step 5
   - No clear verdict but positive / no actionable items -> treat as approved
+   - Helper state `completed-empty-output` -> treat as failed review attempt, surface stderr/status, fix invocation or prompt handling, then retry
+   - Helper state `needs-operator-decision` -> surface status log, note any `stall-warning` heartbeat lines as non-terminal operator hints, and decide whether to keep waiting, abort, or retry with different helper parameters
   - Max rounds (`MAX_ROUNDS`) reached -> present to user for manual decision (proceed or stop)

 #### Step 5: Address Feedback & Re-verify
@@ -272,6 +317,8 @@ If a revision contradicts the user's explicit requirements, skip it and note it

 #### Step 6: Re-submit to Reviewer (Rounds 2-N)

+Rewrite `/tmp/milestone-review-${REVIEW_ID}.sh` for the next round. The script should contain the reviewer invocation only; do not run it directly.
+
 **If `REVIEWER_CLI` is `codex`:**

 Resume the existing session:
@@ -330,20 +377,31 @@ Changes made:
 Re-review. If solid, end with: VERDICT: APPROVED
 If more changes needed, end with: VERDICT: REVISE" \
  > /tmp/milestone-review-${REVIEW_ID}.json
-
-jq -r '.result' /tmp/milestone-review-${REVIEW_ID}.json > /tmp/milestone-review-${REVIEW_ID}.md
 ```

 If resume fails, fall back to fresh `cursor-agent -p` with context about prior rounds.

+Do not run `jq` extraction until after the helper or fallback execution completes, then extract `/tmp/milestone-review-${REVIEW_ID}.md` from the JSON response.
+
+After updating `/tmp/milestone-review-${REVIEW_ID}.sh`, run the same helper/fallback flow from Round 1.
+
 Return to Step 4.

 #### Step 7: Cleanup Per Milestone

 ```bash
-rm -f /tmp/milestone-${REVIEW_ID}.md /tmp/milestone-review-${REVIEW_ID}.md /tmp/milestone-review-${REVIEW_ID}.json
+rm -f \
+  /tmp/milestone-${REVIEW_ID}.md \
+  /tmp/milestone-review-${REVIEW_ID}.md \
+  /tmp/milestone-review-${REVIEW_ID}.json \
+  /tmp/milestone-review-${REVIEW_ID}.stderr \
+  /tmp/milestone-review-${REVIEW_ID}.status \
+  /tmp/milestone-review-${REVIEW_ID}.runner.out \
+  /tmp/milestone-review-${REVIEW_ID}.sh
 ```

+If the round failed, produced empty output, or reached operator-decision timeout, keep `.stderr`, `.status`, and `.runner.out` until the issue is diagnosed instead of deleting them immediately.
+
 ### Phase 6: Completion (REQUIRED SUB-SKILL)

 After all milestones are approved and committed:
@@ -169,7 +169,20 @@ Do NOT push. After committing:
 REVIEW_ID=$(uuidgen | tr '[:upper:]' '[:lower:]' | head -c 8)
 ```

-Use for all temp file paths: `/tmp/milestone-${REVIEW_ID}.md` and `/tmp/milestone-review-${REVIEW_ID}.md`.
+Use `REVIEW_ID` for all milestone review temp file paths:
+- `/tmp/milestone-${REVIEW_ID}.md`
+- `/tmp/milestone-review-${REVIEW_ID}.md`
+- `/tmp/milestone-review-${REVIEW_ID}.json`
+- `/tmp/milestone-review-${REVIEW_ID}.stderr`
+- `/tmp/milestone-review-${REVIEW_ID}.status`
+- `/tmp/milestone-review-${REVIEW_ID}.runner.out`
+- `/tmp/milestone-review-${REVIEW_ID}.sh`
+
+Resolve the shared runtime helper path before writing the command script:
+
+```bash
+REVIEWER_RUNTIME=~/.codex/skills/reviewer-runtime/run-review.sh
+```

 #### Step 2: Write Review Payload

@@ -198,6 +211,13 @@ Write to `/tmp/milestone-${REVIEW_ID}.md`:

 #### Step 3: Submit to Reviewer (Round 1)

+Write the reviewer invocation to `/tmp/milestone-review-${REVIEW_ID}.sh` as a bash script:
+
+```bash
+#!/usr/bin/env bash
+set -euo pipefail
+```
+
 **If `REVIEWER_CLI` is `codex`:**

 ```bash
@@ -218,7 +238,7 @@ Be specific and actionable. If solid, end with exactly: VERDICT: APPROVED
 If changes are needed, end with exactly: VERDICT: REVISE"
 ```

-Capture the Codex session ID from output (line `session id: <uuid>`). Store as `CODEX_SESSION_ID`.
+Do not try to capture the Codex session ID yet. When using the helper, extract it from `/tmp/milestone-review-${REVIEW_ID}.runner.out` after the command completes (look for `session id: <uuid>`), then store it as `CODEX_SESSION_ID` for resume in subsequent rounds.

 **If `REVIEWER_CLI` is `claude`:**

@@ -236,8 +256,7 @@ Evaluate:
 Be specific and actionable. If solid, end with exactly: VERDICT: APPROVED
 If changes are needed, end with exactly: VERDICT: REVISE" \
  --model ${REVIEWER_MODEL} \
-  --allowedTools Read \
-  > /tmp/milestone-review-${REVIEW_ID}.md
+  --allowedTools Read
 ```

 **If `REVIEWER_CLI` is `cursor`:**
@@ -262,19 +281,43 @@ If changes are needed, end with exactly: VERDICT: REVISE" \
  > /tmp/milestone-review-${REVIEW_ID}.json
 ```

-Extract session ID and review text (requires `jq`):
+For `cursor`, the command script writes raw JSON to `/tmp/milestone-review-${REVIEW_ID}.json`. Do not run `jq` extraction until after the helper or fallback execution completes. If `jq` is not installed, inform the user: `brew install jq` (macOS) or equivalent.
+
+Run the command script through the shared helper when available:
+
+```bash
+if [ -x "$REVIEWER_RUNTIME" ]; then
+  "$REVIEWER_RUNTIME" \
+    --command-file /tmp/milestone-review-${REVIEW_ID}.sh \
+    --stdout-file /tmp/milestone-review-${REVIEW_ID}.runner.out \
+    --stderr-file /tmp/milestone-review-${REVIEW_ID}.stderr \
+    --status-file /tmp/milestone-review-${REVIEW_ID}.status
+else
+  echo "Warning: reviewer runtime helper not found at $REVIEWER_RUNTIME; falling back to direct synchronous review." >&2
+  bash /tmp/milestone-review-${REVIEW_ID}.sh >/tmp/milestone-review-${REVIEW_ID}.runner.out 2>/tmp/milestone-review-${REVIEW_ID}.stderr
+fi
+```
+
+After the command completes:
+- If `REVIEWER_CLI=cursor`, extract the final review text:

 ```bash
 CURSOR_SESSION_ID=$(jq -r '.session_id' /tmp/milestone-review-${REVIEW_ID}.json)
 jq -r '.result' /tmp/milestone-review-${REVIEW_ID}.json > /tmp/milestone-review-${REVIEW_ID}.md
 ```

-If `jq` is not installed, inform the user: `brew install jq` (macOS) or equivalent.
+- If `REVIEWER_CLI=codex`, extract `CODEX_SESSION_ID` from `/tmp/milestone-review-${REVIEW_ID}.runner.out` after the helper or fallback run. If the review text is only in `.runner.out`, move or copy the actual review body into `/tmp/milestone-review-${REVIEW_ID}.md` before verdict parsing.
+
+Fallback is allowed only when the helper is missing or not executable.

 #### Step 4: Read Review & Check Verdict

 1. Read `/tmp/milestone-review-${REVIEW_ID}.md`
-2. Present review to the user:
+2. If the review failed, produced empty output, or reached helper timeout, also read:
+   - `/tmp/milestone-review-${REVIEW_ID}.stderr`
+   - `/tmp/milestone-review-${REVIEW_ID}.status`
+   - `/tmp/milestone-review-${REVIEW_ID}.runner.out`
+3. Present review to the user:

 ```
 ## Milestone Review — Round N (reviewer: ${REVIEWER_CLI} / ${REVIEWER_MODEL})
@@ -282,10 +325,12 @@ If `jq` is not installed, inform the user: `brew install jq` (macOS) or equivale
 [Reviewer feedback]
 ```

-3. Check verdict:
+4. Check verdict:
   - **VERDICT: APPROVED** -> proceed to Phase 4 Step 6 (commit & approve)
   - **VERDICT: REVISE** -> go to Step 5
   - No clear verdict but positive / no actionable items -> treat as approved
+   - Helper state `completed-empty-output` -> treat as failed review attempt, surface stderr/status, fix invocation or prompt handling, then retry
+   - Helper state `needs-operator-decision` -> surface status log, note any `stall-warning` heartbeat lines as non-terminal operator hints, and decide whether to keep waiting, abort, or retry with different helper parameters
   - Max rounds (`MAX_ROUNDS`) reached -> present to user for manual decision (proceed or stop)

 #### Step 5: Address Feedback & Re-verify
@@ -305,6 +350,8 @@ If a revision contradicts the user's explicit requirements, skip it and note it

 #### Step 6: Re-submit to Reviewer (Rounds 2-N)

+Rewrite `/tmp/milestone-review-${REVIEW_ID}.sh` for the next round. The script should contain the reviewer invocation only; do not run it directly.
+
 **If `REVIEWER_CLI` is `codex`:**

 Resume the existing session:
@@ -363,20 +410,31 @@ Changes made:
 Re-review. If solid, end with: VERDICT: APPROVED
 If more changes needed, end with: VERDICT: REVISE" \
  > /tmp/milestone-review-${REVIEW_ID}.json
-
-jq -r '.result' /tmp/milestone-review-${REVIEW_ID}.json > /tmp/milestone-review-${REVIEW_ID}.md
 ```

 If resume fails, fall back to fresh `cursor-agent -p` with context about prior rounds.

+Do not run `jq` extraction until after the helper or fallback execution completes, then extract `/tmp/milestone-review-${REVIEW_ID}.md` from the JSON response.
+
+After updating `/tmp/milestone-review-${REVIEW_ID}.sh`, run the same helper/fallback flow from Round 1.
+
 Return to Step 4.

 #### Step 7: Cleanup Per Milestone

 ```bash
-rm -f /tmp/milestone-${REVIEW_ID}.md /tmp/milestone-review-${REVIEW_ID}.md /tmp/milestone-review-${REVIEW_ID}.json
+rm -f \
+  /tmp/milestone-${REVIEW_ID}.md \
+  /tmp/milestone-review-${REVIEW_ID}.md \
+  /tmp/milestone-review-${REVIEW_ID}.json \
+  /tmp/milestone-review-${REVIEW_ID}.stderr \
+  /tmp/milestone-review-${REVIEW_ID}.status \
+  /tmp/milestone-review-${REVIEW_ID}.runner.out \
+  /tmp/milestone-review-${REVIEW_ID}.sh
 ```

+If the round failed, produced empty output, or reached operator-decision timeout, keep `.stderr`, `.status`, and `.runner.out` until the issue is diagnosed instead of deleting them immediately.
+
 ### Phase 6: Completion (REQUIRED SUB-SKILL)

 After all milestones are approved and committed:
@@ -169,7 +169,24 @@ Do NOT push. After committing:
 REVIEW_ID=$(uuidgen | tr '[:upper:]' '[:lower:]' | head -c 8)
 ```

-Use for all temp file paths: `/tmp/milestone-${REVIEW_ID}.md` and `/tmp/milestone-review-${REVIEW_ID}.md`.
+Use `REVIEW_ID` for all milestone review temp file paths:
+- `/tmp/milestone-${REVIEW_ID}.md`
+- `/tmp/milestone-review-${REVIEW_ID}.md`
+- `/tmp/milestone-review-${REVIEW_ID}.json`
+- `/tmp/milestone-review-${REVIEW_ID}.stderr`
+- `/tmp/milestone-review-${REVIEW_ID}.status`
+- `/tmp/milestone-review-${REVIEW_ID}.runner.out`
+- `/tmp/milestone-review-${REVIEW_ID}.sh`
+
+Resolve the shared runtime helper path before writing the command script:
+
+```bash
+if [ -x .cursor/skills/reviewer-runtime/run-review.sh ]; then
+  REVIEWER_RUNTIME=.cursor/skills/reviewer-runtime/run-review.sh
+else
+  REVIEWER_RUNTIME=~/.cursor/skills/reviewer-runtime/run-review.sh
+fi
+```

 #### Step 2: Write Review Payload

@@ -198,6 +215,13 @@ Write to `/tmp/milestone-${REVIEW_ID}.md`:

 #### Step 3: Submit to Reviewer (Round 1)

+Write the reviewer invocation to `/tmp/milestone-review-${REVIEW_ID}.sh` as a bash script:
+
+```bash
+#!/usr/bin/env bash
+set -euo pipefail
+```
+
 **If `REVIEWER_CLI` is `codex`:**

 ```bash
@@ -218,7 +242,7 @@ Be specific and actionable. If solid, end with exactly: VERDICT: APPROVED
 If changes are needed, end with exactly: VERDICT: REVISE"
 ```

-Capture the Codex session ID from output (line `session id: <uuid>`). Store as `CODEX_SESSION_ID`.
+Do not try to capture the Codex session ID yet. When using the helper, extract it from `/tmp/milestone-review-${REVIEW_ID}.runner.out` after the command completes (look for `session id: <uuid>`), then store it as `CODEX_SESSION_ID` for resume in subsequent rounds.

 **If `REVIEWER_CLI` is `claude`:**

@@ -236,8 +260,7 @@ Evaluate:
 Be specific and actionable. If solid, end with exactly: VERDICT: APPROVED
 If changes are needed, end with exactly: VERDICT: REVISE" \
  --model ${REVIEWER_MODEL} \
-  --allowedTools Read \
-  > /tmp/milestone-review-${REVIEW_ID}.md
+  --allowedTools Read
 ```

 **If `REVIEWER_CLI` is `cursor`:**
@@ -262,12 +285,7 @@ If changes are needed, end with exactly: VERDICT: REVISE" \
  > /tmp/milestone-review-${REVIEW_ID}.json
 ```

-Extract session ID and review text:
-
-```bash
-CURSOR_SESSION_ID=$(jq -r '.session_id' /tmp/milestone-review-${REVIEW_ID}.json)
-jq -r '.result' /tmp/milestone-review-${REVIEW_ID}.json > /tmp/milestone-review-${REVIEW_ID}.md
-```
+For `cursor`, the command script writes raw JSON to `/tmp/milestone-review-${REVIEW_ID}.json`. Do not run `jq` extraction until after the helper or fallback execution completes.

 Notes on Cursor flags:
 - `--mode=ask` — read-only mode, no file modifications
@@ -275,10 +293,41 @@ Notes on Cursor flags:
 - `-p` / `--print` — non-interactive mode, output to stdout
 - `--output-format json` — structured output with `session_id` and `result` fields

+Run the command script through the shared helper when available:
+
+```bash
+if [ -x "$REVIEWER_RUNTIME" ]; then
+  "$REVIEWER_RUNTIME" \
+    --command-file /tmp/milestone-review-${REVIEW_ID}.sh \
+    --stdout-file /tmp/milestone-review-${REVIEW_ID}.runner.out \
+    --stderr-file /tmp/milestone-review-${REVIEW_ID}.stderr \
+    --status-file /tmp/milestone-review-${REVIEW_ID}.status
+else
+  echo "Warning: reviewer runtime helper not found at $REVIEWER_RUNTIME; falling back to direct synchronous review." >&2
+  bash /tmp/milestone-review-${REVIEW_ID}.sh >/tmp/milestone-review-${REVIEW_ID}.runner.out 2>/tmp/milestone-review-${REVIEW_ID}.stderr
+fi
+```
+
+After the command completes:
+- If `REVIEWER_CLI=cursor`, extract the final review text:
+
+```bash
+CURSOR_SESSION_ID=$(jq -r '.session_id' /tmp/milestone-review-${REVIEW_ID}.json)
+jq -r '.result' /tmp/milestone-review-${REVIEW_ID}.json > /tmp/milestone-review-${REVIEW_ID}.md
+```
+
+- If `REVIEWER_CLI=codex`, extract `CODEX_SESSION_ID` from `/tmp/milestone-review-${REVIEW_ID}.runner.out` after the helper or fallback run. If the review text is only in `.runner.out`, move or copy the actual review body into `/tmp/milestone-review-${REVIEW_ID}.md` before verdict parsing.
+
+Fallback is allowed only when the helper is missing or not executable.
+
 #### Step 4: Read Review & Check Verdict

 1. Read `/tmp/milestone-review-${REVIEW_ID}.md`
-2. Present review to the user:
+2. If the review failed, produced empty output, or reached helper timeout, also read:
+   - `/tmp/milestone-review-${REVIEW_ID}.stderr`
+   - `/tmp/milestone-review-${REVIEW_ID}.status`
+   - `/tmp/milestone-review-${REVIEW_ID}.runner.out`
+3. Present review to the user:

 ```
 ## Milestone Review — Round N (reviewer: ${REVIEWER_CLI} / ${REVIEWER_MODEL})
@@ -286,10 +335,12 @@ Notes on Cursor flags:
 [Reviewer feedback]
 ```

-3. Check verdict:
+4. Check verdict:
   - **VERDICT: APPROVED** -> proceed to Phase 4 Step 6 (commit & approve)
   - **VERDICT: REVISE** -> go to Step 5
   - No clear verdict but positive / no actionable items -> treat as approved
+   - Helper state `completed-empty-output` -> treat as failed review attempt, surface stderr/status, fix invocation or prompt handling, then retry
+   - Helper state `needs-operator-decision` -> surface status log, note any `stall-warning` heartbeat lines as non-terminal operator hints, and decide whether to keep waiting, abort, or retry with different helper parameters
   - Max rounds (`MAX_ROUNDS`) reached -> present to user for manual decision (proceed or stop)

 #### Step 5: Address Feedback & Re-verify
@@ -309,6 +360,8 @@ If a revision contradicts the user's explicit requirements, skip it and note it

 #### Step 6: Re-submit to Reviewer (Rounds 2-N)

+Rewrite `/tmp/milestone-review-${REVIEW_ID}.sh` for the next round. The script should contain the reviewer invocation only; do not run it directly.
+
 **If `REVIEWER_CLI` is `codex`:**

 Resume the existing session:
@@ -367,20 +420,31 @@ Changes made:
 Re-review. If solid, end with: VERDICT: APPROVED
 If more changes needed, end with: VERDICT: REVISE" \
  > /tmp/milestone-review-${REVIEW_ID}.json
-
-jq -r '.result' /tmp/milestone-review-${REVIEW_ID}.json > /tmp/milestone-review-${REVIEW_ID}.md
 ```

 If resume fails, fall back to fresh `cursor-agent -p` with context about prior rounds.

+Do not run `jq` extraction until after the helper or fallback execution completes, then extract `/tmp/milestone-review-${REVIEW_ID}.md` from the JSON response.
+
+After updating `/tmp/milestone-review-${REVIEW_ID}.sh`, run the same helper/fallback flow from Round 1.
+
 Return to Step 4.

 #### Step 7: Cleanup Per Milestone

 ```bash
-rm -f /tmp/milestone-${REVIEW_ID}.md /tmp/milestone-review-${REVIEW_ID}.md /tmp/milestone-review-${REVIEW_ID}.json
+rm -f \
+  /tmp/milestone-${REVIEW_ID}.md \
+  /tmp/milestone-review-${REVIEW_ID}.md \
+  /tmp/milestone-review-${REVIEW_ID}.json \
+  /tmp/milestone-review-${REVIEW_ID}.stderr \
+  /tmp/milestone-review-${REVIEW_ID}.status \
+  /tmp/milestone-review-${REVIEW_ID}.runner.out \
+  /tmp/milestone-review-${REVIEW_ID}.sh
 ```

+If the round failed, produced empty output, or reached operator-decision timeout, keep `.stderr`, `.status`, and `.runner.out` until the issue is diagnosed instead of deleting them immediately.
+
 ### Phase 6: Completion (REQUIRED SUB-SKILL)

 After all milestones are approved and committed:
@@ -154,7 +154,20 @@ Do NOT push. After committing:
 REVIEW_ID=$(uuidgen | tr '[:upper:]' '[:lower:]' | head -c 8)
 ```

-Use for all temp file paths: `/tmp/milestone-${REVIEW_ID}.md` and `/tmp/milestone-review-${REVIEW_ID}.md`.
+Use `REVIEW_ID` for all milestone review temp file paths:
+- `/tmp/milestone-${REVIEW_ID}.md`
+- `/tmp/milestone-review-${REVIEW_ID}.md`
+- `/tmp/milestone-review-${REVIEW_ID}.json`
+- `/tmp/milestone-review-${REVIEW_ID}.stderr`
+- `/tmp/milestone-review-${REVIEW_ID}.status`
+- `/tmp/milestone-review-${REVIEW_ID}.runner.out`
+- `/tmp/milestone-review-${REVIEW_ID}.sh`
+
+Resolve the shared runtime helper path before writing the command script:
+
+```bash
+REVIEWER_RUNTIME=~/.config/opencode/skills/reviewer-runtime/run-review.sh
+```

 #### Step 2: Write Review Payload

@@ -183,6 +196,13 @@ Write to `/tmp/milestone-${REVIEW_ID}.md`:

 #### Step 3: Submit to Reviewer (Round 1)

+Write the reviewer invocation to `/tmp/milestone-review-${REVIEW_ID}.sh` as a bash script:
+
+```bash
+#!/usr/bin/env bash
+set -euo pipefail
+```
+
 **If `REVIEWER_CLI` is `codex`:**

 ```bash
@@ -203,7 +223,7 @@ Be specific and actionable. If solid, end with exactly: VERDICT: APPROVED
 If changes are needed, end with exactly: VERDICT: REVISE"
 ```

-Capture the Codex session ID from output (line `session id: <uuid>`). Store as `CODEX_SESSION_ID`.
+Do not try to capture the Codex session ID yet. When using the helper, extract it from `/tmp/milestone-review-${REVIEW_ID}.runner.out` after the command completes (look for `session id: <uuid>`), then store it as `CODEX_SESSION_ID` for resume in subsequent rounds.

 **If `REVIEWER_CLI` is `claude`:**

@@ -221,8 +241,7 @@ Evaluate:
 Be specific and actionable. If solid, end with exactly: VERDICT: APPROVED
 If changes are needed, end with exactly: VERDICT: REVISE" \
  --model ${REVIEWER_MODEL} \
-  --allowedTools Read \
-  > /tmp/milestone-review-${REVIEW_ID}.md
+  --allowedTools Read
 ```

 **If `REVIEWER_CLI` is `cursor`:**
@@ -247,19 +266,43 @@ If changes are needed, end with exactly: VERDICT: REVISE" \
  > /tmp/milestone-review-${REVIEW_ID}.json
 ```

-Extract session ID and review text (requires `jq`):
+For `cursor`, the command script writes raw JSON to `/tmp/milestone-review-${REVIEW_ID}.json`. Do not run `jq` extraction until after the helper or fallback execution completes. If `jq` is not installed, inform the user: `brew install jq` (macOS) or equivalent.
+
+Run the command script through the shared helper when available:
+
+```bash
+if [ -x "$REVIEWER_RUNTIME" ]; then
+  "$REVIEWER_RUNTIME" \
+    --command-file /tmp/milestone-review-${REVIEW_ID}.sh \
+    --stdout-file /tmp/milestone-review-${REVIEW_ID}.runner.out \
+    --stderr-file /tmp/milestone-review-${REVIEW_ID}.stderr \
+    --status-file /tmp/milestone-review-${REVIEW_ID}.status
+else
+  echo "Warning: reviewer runtime helper not found at $REVIEWER_RUNTIME; falling back to direct synchronous review." >&2
+  bash /tmp/milestone-review-${REVIEW_ID}.sh >/tmp/milestone-review-${REVIEW_ID}.runner.out 2>/tmp/milestone-review-${REVIEW_ID}.stderr
+fi
+```
+
+After the command completes:
+- If `REVIEWER_CLI=cursor`, extract the final review text:

 ```bash
 CURSOR_SESSION_ID=$(jq -r '.session_id' /tmp/milestone-review-${REVIEW_ID}.json)
 jq -r '.result' /tmp/milestone-review-${REVIEW_ID}.json > /tmp/milestone-review-${REVIEW_ID}.md
 ```

-If `jq` is not installed, inform the user: `brew install jq` (macOS) or equivalent.
+- If `REVIEWER_CLI=codex`, extract `CODEX_SESSION_ID` from `/tmp/milestone-review-${REVIEW_ID}.runner.out` after the helper or fallback run. If the review text is only in `.runner.out`, move or copy the actual review body into `/tmp/milestone-review-${REVIEW_ID}.md` before verdict parsing.
+
+Fallback is allowed only when the helper is missing or not executable.

 #### Step 4: Read Review & Check Verdict

 1. Read `/tmp/milestone-review-${REVIEW_ID}.md`
-2. Present review to the user:
+2. If the review failed, produced empty output, or reached helper timeout, also read:
+   - `/tmp/milestone-review-${REVIEW_ID}.stderr`
+   - `/tmp/milestone-review-${REVIEW_ID}.status`
+   - `/tmp/milestone-review-${REVIEW_ID}.runner.out`
+3. Present review to the user:

 ```
 ## Milestone Review — Round N (reviewer: ${REVIEWER_CLI} / ${REVIEWER_MODEL})
@@ -267,10 +310,12 @@ If `jq` is not installed, inform the user: `brew install jq` (macOS) or equivale
 [Reviewer feedback]
 ```

-3. Check verdict:
+4. Check verdict:
   - **VERDICT: APPROVED** -> proceed to Phase 5 Step 6 (commit & approve)
   - **VERDICT: REVISE** -> go to Step 5
   - No clear verdict but positive / no actionable items -> treat as approved
+   - Helper state `completed-empty-output` -> treat as failed review attempt, surface stderr/status, fix invocation or prompt handling, then retry
+   - Helper state `needs-operator-decision` -> surface status log, note any `stall-warning` heartbeat lines as non-terminal operator hints, and decide whether to keep waiting, abort, or retry with different helper parameters
   - Max rounds (`MAX_ROUNDS`) reached -> present to user for manual decision (proceed or stop)

 #### Step 5: Address Feedback & Re-verify
@@ -290,6 +335,8 @@ If a revision contradicts the user's explicit requirements, skip it and note it

 #### Step 6: Re-submit to Reviewer (Rounds 2-N)

+Rewrite `/tmp/milestone-review-${REVIEW_ID}.sh` for the next round. The script should contain the reviewer invocation only; do not run it directly.
+
 **If `REVIEWER_CLI` is `codex`:**

 Resume the existing session:
@@ -348,20 +395,31 @@ Changes made:
 Re-review. If solid, end with: VERDICT: APPROVED
 If more changes needed, end with: VERDICT: REVISE" \
  > /tmp/milestone-review-${REVIEW_ID}.json
-
-jq -r '.result' /tmp/milestone-review-${REVIEW_ID}.json > /tmp/milestone-review-${REVIEW_ID}.md
 ```

 If resume fails, fall back to fresh `cursor-agent -p` with context about prior rounds.

+Do not run `jq` extraction until after the helper or fallback execution completes, then extract `/tmp/milestone-review-${REVIEW_ID}.md` from the JSON response.
+
+After updating `/tmp/milestone-review-${REVIEW_ID}.sh`, run the same helper/fallback flow from Round 1.
+
 Return to Step 4.

 #### Step 7: Cleanup Per Milestone

 ```bash
-rm -f /tmp/milestone-${REVIEW_ID}.md /tmp/milestone-review-${REVIEW_ID}.md /tmp/milestone-review-${REVIEW_ID}.json
+rm -f \
+  /tmp/milestone-${REVIEW_ID}.md \
+  /tmp/milestone-review-${REVIEW_ID}.md \
+  /tmp/milestone-review-${REVIEW_ID}.json \
+  /tmp/milestone-review-${REVIEW_ID}.stderr \
+  /tmp/milestone-review-${REVIEW_ID}.status \
+  /tmp/milestone-review-${REVIEW_ID}.runner.out \
+  /tmp/milestone-review-${REVIEW_ID}.sh
 ```

+If the round failed, produced empty output, or reached operator-decision timeout, keep `.stderr`, `.status`, and `.runner.out` until the issue is diagnosed instead of deleting them immediately.
+
 ### Phase 7: Completion (REQUIRED SUB-SKILL)

 After all milestones are approved and committed: