Align reviewer runtime and Telegram notifications

This commit is contained in:
Stefano Fiorini
2026-03-24 11:45:58 -05:00
parent 4d37674626
commit 63a048a26c
17 changed files with 1756 additions and 200 deletions

View File

@@ -25,6 +25,7 @@ Execute an existing plan (created by `create-plan`) in an isolated git worktree,
- Claude Code: `~/.claude/skills/reviewer-runtime/run-review.sh`
- OpenCode: `~/.config/opencode/skills/reviewer-runtime/run-review.sh`
- Cursor: `.cursor/skills/reviewer-runtime/run-review.sh` or `~/.cursor/skills/reviewer-runtime/run-review.sh`
- Telegram notification setup is documented in [TELEGRAM-NOTIFICATIONS.md](./TELEGRAM-NOTIFICATIONS.md)
If dependencies are missing, stop and return:
@@ -133,10 +134,13 @@ Verify Superpowers execution dependencies exist in your agent skills root:
- Runs lint/typecheck/tests as a gate before each milestone review.
- Sends each milestone to a reviewer CLI for approval (max rounds configurable, default 10).
- Runs reviewer commands through `reviewer-runtime/run-review.sh` when available, with fallback to direct synchronous execution only if the helper is missing.
- Waits as long as the reviewer runtime keeps emitting per-minute `In progress N` heartbeats.
- Requires reviewer findings to be ordered `P0` through `P3`, with `P3` explicitly non-blocking.
- Captures reviewer stderr and helper status logs for diagnostics and retains them on failed, empty-output, or operator-decision review rounds.
- Commits each milestone locally only after reviewer approval (does not push).
- After all milestones approved, merges worktree branch to parent and deletes worktree.
- Supports resume: detects existing worktree and `in-dev`/`completed` stories.
- Sends completion notifications through Telegram only when the shared setup in [TELEGRAM-NOTIFICATIONS.md](./TELEGRAM-NOTIFICATIONS.md) is installed and configured.
## Milestone Review Loop
@@ -145,11 +149,22 @@ After each milestone is implemented and verified, the skill sends it to a second
1. **Configure** — user picks a reviewer CLI (`codex`, `claude`, `cursor`) and model, or skips
2. **Prepare** — milestone payload and a bash reviewer command script are written to temp files
3. **Run** — the command script is executed through `reviewer-runtime/run-review.sh` when installed
4. **Feedback** — reviewer evaluates correctness, acceptance criteria, code quality, test coverage, security
5. **Revise** — the implementing agent addresses each issue, re-verifies, and re-submits
6. **Repeat**up to max rounds (default 10) until the reviewer returns `VERDICT: APPROVED`
4. **Feedback** — reviewer evaluates correctness, acceptance criteria, code quality, test coverage, security, and returns `## Summary`, `## Findings`, and `## Verdict`
5. **Prioritize** — findings are ordered `P0`, `P1`, `P2`, `P3`
6. **Revise**the implementing agent addresses findings in priority order, re-verifies, and re-submits
7. **Repeat** — up to max rounds (default 10) until the reviewer returns `VERDICT: APPROVED`
7. **Approve** — milestone is marked approved in `story-tracker.md`
### Reviewer Output Contract
- `P0` = total blocker
- `P1` = major risk
- `P2` = must-fix before approval
- `P3` = cosmetic / nice to have
- Each severity section should use `- None.` when empty
- `VERDICT: APPROVED` is valid only when no `P0`, `P1`, or `P2` findings remain
- `P3` findings are non-blocking, but the caller should still try to fix them when cheap and safe
### Runtime Artifacts
The milestone review flow may create these temp artifacts:
@@ -165,16 +180,18 @@ The milestone review flow may create these temp artifacts:
Status log lines use this format:
```text
ts=<ISO-8601> level=<info|warn|error> state=<running-silent|running-active|stall-warning|completed|completed-empty-output|failed|needs-operator-decision> elapsed_s=<int> pid=<int> stdout_bytes=<int> stderr_bytes=<int> note="<short message>"
ts=<ISO-8601> level=<info|warn|error> state=<running-silent|running-active|in-progress|stall-warning|completed|completed-empty-output|failed|needs-operator-decision> elapsed_s=<int> pid=<int> stdout_bytes=<int> stderr_bytes=<int> note="<short message>"
```
`stall-warning` is a heartbeat/status-log state only. It is not a terminal review result.
`in-progress` is the liveness heartbeat emitted roughly once per minute with `note="In progress N"`.
`stall-warning` is a non-terminal status-log state only. It does not mean the caller should stop waiting if `in-progress` heartbeats continue.
### Failure Handling
- `completed-empty-output` means the reviewer exited without producing review text; surface `.stderr` and `.status`, then retry only after diagnosing the cause.
- `needs-operator-decision` means the helper reached hard-timeout escalation; surface `.status` and decide whether to keep waiting, abort, or retry with different parameters.
- `needs-operator-decision` means the helper reached hard-timeout escalation; surface `.status` and decide whether to extend the timeout, abort, or retry with different parameters.
- Successful rounds clean up temp artifacts. Failed, empty-output, and operator-decision rounds should retain `.stderr`, `.status`, and `.runner.out` until diagnosed.
- As long as fresh `in-progress` heartbeats continue to arrive roughly once per minute, the caller should keep waiting.
### Supported Reviewer CLIs
@@ -190,6 +207,12 @@ For all three CLIs, the preferred execution path is:
2. run that script through `reviewer-runtime/run-review.sh`
3. fall back to direct synchronous execution only if the helper is missing or not executable
## Notifications
- Telegram is the only supported completion notification path.
- Shared setup: [TELEGRAM-NOTIFICATIONS.md](./TELEGRAM-NOTIFICATIONS.md)
- Notification failures are non-blocking, but they must be surfaced to the user.
The helper also supports manual override flags for diagnostics:
```bash