Align reviewer runtime and Telegram notifications

This commit is contained in:
Stefano Fiorini
2026-03-24 11:45:58 -05:00
parent 4d37674626
commit 63a048a26c
17 changed files with 1756 additions and 200 deletions

View File

@@ -151,6 +151,20 @@ Resolve the shared runtime helper path before writing the command script:
REVIEWER_RUNTIME=~/.claude/skills/reviewer-runtime/run-review.sh
```
Set helper success-artifact args before writing the command script:
```bash
HELPER_SUCCESS_FILE_ARGS=()
case "$REVIEWER_CLI" in
codex)
HELPER_SUCCESS_FILE_ARGS+=(--success-file /tmp/milestone-review-${REVIEW_ID}.md)
;;
cursor)
HELPER_SUCCESS_FILE_ARGS+=(--success-file /tmp/milestone-review-${REVIEW_ID}.json)
;;
esac
```
#### Step 2: Write Review Payload
Write to `/tmp/milestone-${REVIEW_ID}.md`:
@@ -176,6 +190,42 @@ Write to `/tmp/milestone-${REVIEW_ID}.md`:
[test output with pass/fail counts]
```
#### Review Contract (Applies to Every Round)
The reviewer response must use this structure:
```text
## Summary
...
## Findings
### P0
- ...
### P1
- ...
### P2
- ...
### P3
- ...
## Verdict
VERDICT: APPROVED
```
Rules:
- Order findings from `P0` to `P3`.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- Use `- None.` when a severity has no findings.
- `VERDICT: APPROVED` is allowed only when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking.
- The calling agent should still try to fix `P3` findings when they are cheap and safe.
#### Liveness Contract (Applies While Review Is Running)
- The shared reviewer runtime emits `state=in-progress note="In progress N"` heartbeats every 60 seconds while the reviewer child is alive.
- The calling agent must keep waiting as long as a fresh `In progress N` heartbeat keeps arriving roughly once per minute.
- Do not abort just because the review is slow, a soft timeout fired, or a `stall-warning` line appears, as long as the `In progress N` heartbeat continues.
- Treat missing heartbeats, `state=failed`, `state=completed-empty-output`, and `state=needs-operator-decision` as escalation signals.
#### Step 3: Submit to Reviewer (Round 1)
Write the reviewer invocation to `/tmp/milestone-review-${REVIEW_ID}.sh` as a bash script:
@@ -201,8 +251,22 @@ Evaluate:
4. Test coverage — Are changes adequately tested?
5. Security — Any security concerns introduced?
Be specific and actionable. If solid, end with exactly: VERDICT: APPROVED
If changes are needed, end with exactly: VERDICT: REVISE"
Return exactly these sections in order:
## Summary
## Findings
### P0
### P1
### P2
### P3
## Verdict
Rules:
- Order findings from highest severity to lowest.
- Use `- None.` when a severity has no findings.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`
- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking."
```
Do not try to capture the Codex session ID yet. When using the helper, extract it from `/tmp/milestone-review-${REVIEW_ID}.runner.out` after the command completes (look for `session id: <uuid>`), then store it as `CODEX_SESSION_ID` for resume in subsequent rounds.
@@ -222,8 +286,22 @@ Evaluate:
4. Test coverage — Are changes adequately tested?
5. Security — Any security concerns introduced?
Be specific and actionable. If solid, end with exactly: VERDICT: APPROVED
If changes are needed, end with exactly: VERDICT: REVISE" \
Return exactly these sections in order:
## Summary
## Findings
### P0
### P1
### P2
### P3
## Verdict
Rules:
- Order findings from highest severity to lowest.
- Use `- None.` when a severity has no findings.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`
- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking." \
--model ${REVIEWER_MODEL} \
--strict-mcp-config \
--setting-sources user
@@ -246,8 +324,22 @@ Evaluate:
4. Test coverage — Are changes adequately tested?
5. Security — Any security concerns introduced?
Be specific and actionable. If solid, end with exactly: VERDICT: APPROVED
If changes are needed, end with exactly: VERDICT: REVISE" \
Return exactly these sections in order:
## Summary
## Findings
### P0
### P1
### P2
### P3
## Verdict
Rules:
- Order findings from highest severity to lowest.
- Use `- None.` when a severity has no findings.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`
- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking." \
> /tmp/milestone-review-${REVIEW_ID}.json
```
@@ -261,13 +353,16 @@ if [ -x "$REVIEWER_RUNTIME" ]; then
--command-file /tmp/milestone-review-${REVIEW_ID}.sh \
--stdout-file /tmp/milestone-review-${REVIEW_ID}.runner.out \
--stderr-file /tmp/milestone-review-${REVIEW_ID}.stderr \
--status-file /tmp/milestone-review-${REVIEW_ID}.status
--status-file /tmp/milestone-review-${REVIEW_ID}.status \
"${HELPER_SUCCESS_FILE_ARGS[@]}"
else
echo "Warning: reviewer runtime helper not found at $REVIEWER_RUNTIME; falling back to direct synchronous review." >&2
bash /tmp/milestone-review-${REVIEW_ID}.sh >/tmp/milestone-review-${REVIEW_ID}.runner.out 2>/tmp/milestone-review-${REVIEW_ID}.stderr
fi
```
Run the helper in the foreground and watch its live stdout for `state=in-progress` heartbeats. If your agent environment buffers command output until exit, start the helper in the background and poll `/tmp/milestone-review-${REVIEW_ID}.status` separately instead of treating heartbeats as post-hoc-only data.
After the command completes:
- If `REVIEWER_CLI=cursor`, extract the final review text:
@@ -277,6 +372,11 @@ jq -r '.result' /tmp/milestone-review-${REVIEW_ID}.json > /tmp/milestone-review-
```
- If `REVIEWER_CLI=codex`, extract `CODEX_SESSION_ID` from `/tmp/milestone-review-${REVIEW_ID}.runner.out` after the helper or fallback run. If the review text is only in `.runner.out`, move or copy the actual review body into `/tmp/milestone-review-${REVIEW_ID}.md` before verdict parsing.
- If `REVIEWER_CLI=claude`, promote stdout captured by the helper or fallback runner into the markdown review file:
```bash
cp /tmp/milestone-review-${REVIEW_ID}.runner.out /tmp/milestone-review-${REVIEW_ID}.md
```
Fallback is allowed only when the helper is missing or not executable.
@@ -295,17 +395,19 @@ Fallback is allowed only when the helper is missing or not executable.
[Reviewer feedback]
```
4. Check verdict:
- **VERDICT: APPROVED** -> proceed to Phase 4 Step 6 (commit & approve)
- **VERDICT: REVISE** -> go to Step 5
- No clear verdict but positive / no actionable items -> treat as approved
4. While the reviewer is still running, keep waiting as long as fresh `state=in-progress note="In progress N"` heartbeats continue to appear roughly once per minute.
5. Check verdict:
- **VERDICT: APPROVED** with no `P0`, `P1`, or `P2` findings -> proceed to Phase 4 Step 6 (commit & approve)
- **VERDICT: APPROVED** with only `P3` findings -> optionally fix the `P3` items if they are cheap and safe, then proceed
- **VERDICT: REVISE** or any `P0`, `P1`, or `P2` finding -> go to Step 5
- No clear verdict but `P0`, `P1`, and `P2` are all `- None.` -> treat as approved
- Helper state `completed-empty-output` -> treat as failed review attempt, surface stderr/status, fix invocation or prompt handling, then retry
- Helper state `needs-operator-decision` -> surface status log, note any `stall-warning` heartbeat lines as non-terminal operator hints, and decide whether to keep waiting, abort, or retry with different helper parameters
- Helper state `needs-operator-decision` -> surface status log and decide whether to extend the timeout, abort, or retry with different helper parameters
- Max rounds (`MAX_ROUNDS`) reached -> present to user for manual decision (proceed or stop)
#### Step 5: Address Feedback & Re-verify
1. Address each issue the reviewer raised (do NOT commit yet).
1. Address the reviewer findings in priority order (`P0` -> `P1` -> `P2`, then `P3` when practical) (do NOT commit yet).
2. Re-run verification (lint/typecheck/tests) — all must pass.
3. Update `/tmp/milestone-${REVIEW_ID}.md` with new diff and verification output.
@@ -334,8 +436,8 @@ codex exec resume ${CODEX_SESSION_ID} \
Changes made:
[List specific changes]
Re-review. If solid, end with: VERDICT: APPROVED
If more changes needed, end with: VERDICT: REVISE"
Re-review using the same `## Summary`, `## Findings`, and `## Verdict` structure as before.
Keep findings ordered `P0` to `P3`, use `- None.` when a severity has no findings, and only use `VERDICT: APPROVED` when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking."
```
If resume fails (session expired), fall back to fresh `codex exec` with context about prior rounds.
@@ -357,12 +459,11 @@ $(cat /tmp/milestone-${REVIEW_ID}.md)
Changes made:
[List specific changes]
Re-review. If solid, end with: VERDICT: APPROVED
If more changes needed, end with: VERDICT: REVISE" \
Re-review using the same `## Summary`, `## Findings`, and `## Verdict` structure as before.
Keep findings ordered `P0` to `P3`, use `- None.` when a severity has no findings, and only use `VERDICT: APPROVED` when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking." \
--model ${REVIEWER_MODEL} \
--strict-mcp-config \
--setting-sources user \
> /tmp/milestone-review-${REVIEW_ID}.md
--setting-sources user
```
**If `REVIEWER_CLI` is `cursor`:**
@@ -380,8 +481,8 @@ cursor-agent --resume ${CURSOR_SESSION_ID} -p \
Changes made:
[List specific changes]
Re-review. If solid, end with: VERDICT: APPROVED
If more changes needed, end with: VERDICT: REVISE" \
Re-review using the same `## Summary`, `## Findings`, and `## Verdict` structure as before.
Keep findings ordered `P0` to `P3`, use `- None.` when a severity has no findings, and only use `VERDICT: APPROVED` when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking." \
> /tmp/milestone-review-${REVIEW_ID}.json
```
@@ -443,6 +544,27 @@ Present summary:
**Branch:** implement/<plan-folder-name> (merged and deleted)
```
### Phase 8: Telegram Completion Notification (MANDATORY)
Resolve the Telegram notifier helper from the installed Claude Code skills directory:
```bash
TELEGRAM_NOTIFY_RUNTIME=~/.claude/skills/reviewer-runtime/notify-telegram.sh
```
On every terminal outcome for the implement-plan run (fully completed, stopped after max rounds, skipped reviewer, or failure), send a Telegram summary if the helper exists and both `TELEGRAM_BOT_TOKEN` and `TELEGRAM_CHAT_ID` are configured:
```bash
if [ -x "$TELEGRAM_NOTIFY_RUNTIME" ] && [ -n "${TELEGRAM_BOT_TOKEN:-}" ] && [ -n "${TELEGRAM_CHAT_ID:-}" ]; then
"$TELEGRAM_NOTIFY_RUNTIME" --message "implement-plan completed for <plan-folder-name>: <status summary>"
fi
```
Rules:
- Telegram is the only supported completion notification path. Do not use desktop notifications, `say`, email, or any other notifier.
- Notification failures are non-blocking, but they must be surfaced to the user.
- If Telegram is not configured, state that no completion notification was sent.
## Tracker Discipline (MANDATORY)
**ALWAYS update `story-tracker.md` before/after each story. NEVER proceed with stale tracker state.**
@@ -477,3 +599,4 @@ Note: Commit hashes are backfilled into story Notes after the milestone commit (
- [ ] Final test suite passes
- [ ] Worktree branch merged to parent and worktree deleted
- [ ] Story tracker updated with final status
- [ ] Telegram completion notification attempted if configured