feat(implement-plan): route milestone review through shared runtime
This commit is contained in:
@@ -132,6 +132,8 @@ Verify Superpowers execution dependencies exist in your agent skills root:
|
||||
- Executes milestones one-by-one, tracking stories in `story-tracker.md`.
|
||||
- Runs lint/typecheck/tests as a gate before each milestone review.
|
||||
- Sends each milestone to a reviewer CLI for approval (max rounds configurable, default 10).
|
||||
- Runs reviewer commands through `reviewer-runtime/run-review.sh` when available, with fallback to direct synchronous execution only if the helper is missing.
|
||||
- Captures reviewer stderr and helper status logs for diagnostics and retains them on failed, empty-output, or operator-decision review rounds.
|
||||
- Commits each milestone locally only after reviewer approval (does not push).
|
||||
- After all milestones approved, merges worktree branch to parent and deletes worktree.
|
||||
- Supports resume: detects existing worktree and `in-dev`/`completed` stories.
|
||||
@@ -141,11 +143,38 @@ Verify Superpowers execution dependencies exist in your agent skills root:
|
||||
After each milestone is implemented and verified, the skill sends it to a second model for review:
|
||||
|
||||
1. **Configure** — user picks a reviewer CLI (`codex`, `claude`, `cursor`) and model, or skips
|
||||
2. **Submit** — milestone spec, acceptance criteria, git diff, and verification output are written to a temp file and sent to the reviewer in read-only/ask mode
|
||||
3. **Feedback** — reviewer evaluates correctness, acceptance criteria, code quality, test coverage, security
|
||||
4. **Revise** — the implementing agent addresses each issue, re-verifies, and re-submits
|
||||
5. **Repeat** — up to max rounds (default 10) until the reviewer returns `VERDICT: APPROVED`
|
||||
6. **Approve** — milestone is marked approved in `story-tracker.md`
|
||||
2. **Prepare** — milestone payload and a bash reviewer command script are written to temp files
|
||||
3. **Run** — the command script is executed through `reviewer-runtime/run-review.sh` when installed
|
||||
4. **Feedback** — reviewer evaluates correctness, acceptance criteria, code quality, test coverage, security
|
||||
5. **Revise** — the implementing agent addresses each issue, re-verifies, and re-submits
|
||||
6. **Repeat** — up to max rounds (default 10) until the reviewer returns `VERDICT: APPROVED`
|
||||
7. **Approve** — milestone is marked approved in `story-tracker.md`
|
||||
|
||||
### Runtime Artifacts
|
||||
|
||||
The milestone review flow may create these temp artifacts:
|
||||
|
||||
- `/tmp/milestone-<id>.md` — milestone review payload
|
||||
- `/tmp/milestone-review-<id>.md` — normalized review text
|
||||
- `/tmp/milestone-review-<id>.json` — raw Cursor JSON output
|
||||
- `/tmp/milestone-review-<id>.stderr` — reviewer stderr
|
||||
- `/tmp/milestone-review-<id>.status` — helper heartbeat/status log
|
||||
- `/tmp/milestone-review-<id>.runner.out` — helper-managed stdout from the reviewer command process
|
||||
- `/tmp/milestone-review-<id>.sh` — reviewer command script
|
||||
|
||||
Status log lines use this format:
|
||||
|
||||
```text
|
||||
ts=<ISO-8601> level=<info|warn|error> state=<running-silent|running-active|stall-warning|completed|completed-empty-output|failed|needs-operator-decision> elapsed_s=<int> pid=<int> stdout_bytes=<int> stderr_bytes=<int> note="<short message>"
|
||||
```
|
||||
|
||||
`stall-warning` is a heartbeat/status-log state only. It is not a terminal review result.
|
||||
|
||||
### Failure Handling
|
||||
|
||||
- `completed-empty-output` means the reviewer exited without producing review text; surface `.stderr` and `.status`, then retry only after diagnosing the cause.
|
||||
- `needs-operator-decision` means the helper reached hard-timeout escalation; surface `.status` and decide whether to keep waiting, abort, or retry with different parameters.
|
||||
- Successful rounds clean up temp artifacts. Failed, empty-output, and operator-decision rounds should retain `.stderr`, `.status`, and `.runner.out` until diagnosed.
|
||||
|
||||
### Supported Reviewer CLIs
|
||||
|
||||
@@ -155,6 +184,26 @@ After each milestone is implemented and verified, the skill sends it to a second
|
||||
| `claude` | `claude -p --model <model> --allowedTools Read` | No (fresh call each round) | `--allowedTools Read` |
|
||||
| `cursor` | `cursor-agent -p --mode=ask --model <model> --trust --output-format json` | Yes (`--resume <id>`) | `--mode=ask` |
|
||||
|
||||
For all three CLIs, the preferred execution path is:
|
||||
|
||||
1. write the reviewer command to a bash script
|
||||
2. run that script through `reviewer-runtime/run-review.sh`
|
||||
3. fall back to direct synchronous execution only if the helper is missing or not executable
|
||||
|
||||
The helper also supports manual override flags for diagnostics:
|
||||
|
||||
```bash
|
||||
run-review.sh \
|
||||
--command-file <path> \
|
||||
--stdout-file <path> \
|
||||
--stderr-file <path> \
|
||||
--status-file <path> \
|
||||
--poll-seconds 10 \
|
||||
--soft-timeout-seconds 600 \
|
||||
--stall-warning-seconds 300 \
|
||||
--hard-timeout-seconds 1800
|
||||
```
|
||||
|
||||
## Variant Hardening Notes
|
||||
|
||||
### Claude Code
|
||||
|
||||
Reference in New Issue
Block a user