Files

Stefano Fiorini 63a048a26c Align reviewer runtime and Telegram notifications

2026-03-24 11:45:58 -05:00

22 KiB

Raw Blame History

name, description

name	description
implement-plan	Use when a plan folder (from create-plan) exists and needs to be executed in an isolated git worktree with iterative cross-model milestone review in OpenCode workflows. ALWAYS invoke when user says "implement the plan", "execute the plan", "start implementation", "resume the plan", or similar execution requests.

Implement Plan (OpenCode)

Execute an existing plan (created by create-plan) in an isolated git worktree, with iterative cross-model review at each milestone boundary.

Prerequisite Check (MANDATORY)

This OpenCode variant depends on Superpowers execution skills being installed via OpenCode's native skill system.

Required:

Plan folder exists under ai_plan/ at project root
continuation-runbook.md exists in plan folder
milestone-plan.md exists in plan folder
story-tracker.md exists in plan folder
Git repo with worktree support: git worktree list
OpenCode Superpowers skills symlink: ~/.config/opencode/skills/superpowers
Superpowers execution skills:
- superpowers/executing-plans
- superpowers/using-git-worktrees
- superpowers/verification-before-completion
- superpowers/finishing-a-development-branch

Verify before proceeding:

ls -l ~/.config/opencode/skills/superpowers

If dependencies are missing, stop immediately and return:

"Missing dependency: OpenCode Superpowers execution skills are required (superpowers/executing-plans, superpowers/using-git-worktrees, superpowers/verification-before-completion, superpowers/finishing-a-development-branch). Install from https://github.com/obra/superpowers (OpenCode setup), then retry."

If no plan folder exists:

"No plan found under ai_plan/. Run create-plan first."

Process

Phase 1: Bootstrap Superpowers Context (REQUIRED)

Use OpenCode's native skill tool:

list skills
verify superpowers/executing-plans, superpowers/using-git-worktrees, superpowers/verification-before-completion, and superpowers/finishing-a-development-branch are discoverable

Phase 2: Locate Plan

Scan ai_plan/ for plan directories (most recent first by date prefix).
If multiple plans exist, ask user which one to implement.
If no plan exists, stop: "No plan found. Run create-plan first."
Read continuation-runbook.md first (source of truth).
Read story-tracker.md to detect resume state (in-dev or completed stories).
Read milestone-plan.md for implementation details.

Phase 3: Configure Reviewer

If the user has already specified a reviewer CLI and model (e.g., "implement the plan, review with codex o4-mini"), use those values. Otherwise, ask:

Which CLI should review each milestone?
- codex — OpenAI Codex CLI (codex exec)
- claude — Claude Code CLI (claude -p)
- cursor — Cursor Agent CLI (cursor-agent -p)
- skip — No external review, proceed with user approval only
Which model? (only if a CLI was chosen)
- For codex: default o4-mini, alternatives: gpt-5.3-codex, o3
- For claude: default sonnet, alternatives: opus, haiku
- For cursor: run cursor-agent models first to see available models
- Accept any model string the user provides
Max review rounds per milestone? (default: 10)
- If the user does not provide a value, set MAX_ROUNDS=10.

Store REVIEWER_CLI, REVIEWER_MODEL, and MAX_ROUNDS. These values are fixed for the entire run.

Phase 4: Set Up Worktree (REQUIRED SUB-SKILL)

Use OpenCode's native skill tool to load:

superpowers/using-git-worktrees

Then:

Branch naming: implement/<plan-folder-name> (e.g., implement/2026-03-04-auth-system).
Follow worktree skill's directory priority: .worktrees/ > worktrees/ > CLAUDE.md > ask user.
Verify .gitignore covers worktree directory.
Run project setup (auto-detect: npm install, cargo build, pip install, etc.).
Verify clean baseline (run tests).

Resume detection: If story-tracker.md shows in-dev or completed stories, check if worktree branch already exists (git worktree list). If so, cd into existing worktree instead of creating a new one.

Phase 5: Execute Milestones (Loop)

For each milestone (M1, M2, ...):

Step 1: Read Milestone Spec

Read the milestone section from milestone-plan.md.

Step 2: Update Tracker

Mark first story in-dev in story-tracker.md.

Step 3: Implement Stories

Execute each story in order. After completing each story:

Mark in-dev -> completed in story-tracker.md
Update counts
Mark next story in-dev

Commit hashes are not available yet — they are backfilled in Step 6 after the milestone is approved and committed.

Step 4: Verify Milestone (REQUIRED SUB-SKILL)

Use OpenCode's native skill tool to load:

superpowers/verification-before-completion

# Lint changed files
# Typecheck
# Run tests (targeted first, then full suite)

All must pass before proceeding. If failures: fix, re-verify. Do NOT proceed to review with failures.

Step 5: Milestone Review Loop

Send to reviewer for approval before committing. See Phase 6 for details. The review payload uses working-tree diffs (git diff for unstaged, git diff --staged for staged changes).

Skip this step if reviewer was set to skip. When skipped, present the milestone summary to the user and ask for approval directly.

Step 6: Commit & Approve

Only after the reviewer approves (or user overrides at max rounds):

git add <changed-files>
git commit -m "feat(<scope>): implement milestone M<N> - <description>"

Do NOT push. After committing:

Backfill the commit hash into the Notes column for all stories in this milestone in story-tracker.md.
Mark milestone as approved in story-tracker.md.
Move to next milestone.

Phase 6: Milestone Review Loop (Detail)

Skip this phase entirely if reviewer was set to skip.

Step 1: Generate Session ID

REVIEW_ID=$(uuidgen | tr '[:upper:]' '[:lower:]' | head -c 8)

Use REVIEW_ID for all milestone review temp file paths:

/tmp/milestone-${REVIEW_ID}.md
/tmp/milestone-review-${REVIEW_ID}.md
/tmp/milestone-review-${REVIEW_ID}.json
/tmp/milestone-review-${REVIEW_ID}.stderr
/tmp/milestone-review-${REVIEW_ID}.status
/tmp/milestone-review-${REVIEW_ID}.runner.out
/tmp/milestone-review-${REVIEW_ID}.sh

Resolve the shared runtime helper path before writing the command script:

REVIEWER_RUNTIME=~/.config/opencode/skills/reviewer-runtime/run-review.sh

Set helper success-artifact args before writing the command script:

HELPER_SUCCESS_FILE_ARGS=()
case "$REVIEWER_CLI" in
  codex)
    HELPER_SUCCESS_FILE_ARGS+=(--success-file /tmp/milestone-review-${REVIEW_ID}.md)
    ;;
  cursor)
    HELPER_SUCCESS_FILE_ARGS+=(--success-file /tmp/milestone-review-${REVIEW_ID}.json)
    ;;
esac

Step 2: Write Review Payload

Write to /tmp/milestone-${REVIEW_ID}.md:

# Milestone M<N> Review: <title>

## Milestone Spec (from plan)
[Copy milestone section from milestone-plan.md]

## Acceptance Criteria
[Copy acceptance criteria checkboxes]

## Changes Made (git diff)
[Output of: git diff -- for unstaged changes, or git diff --staged for staged changes]

## Verification Output
### Lint
[lint output]
### Typecheck
[typecheck output]
### Tests
[test output with pass/fail counts]

Review Contract (Applies to Every Round)

The reviewer response must use this structure:

## Summary
...

## Findings
### P0
- ...
### P1
- ...
### P2
- ...
### P3
- ...

## Verdict
VERDICT: APPROVED

Rules:

Order findings from P0 to P3.
P0 = total blocker, P1 = major risk, P2 = must-fix before approval, P3 = cosmetic / nice to have.
Use - None. when a severity has no findings.
VERDICT: APPROVED is allowed only when no P0, P1, or P2 findings remain. P3 findings are non-blocking.
The calling agent should still try to fix P3 findings when they are cheap and safe.

Liveness Contract (Applies While Review Is Running)

The shared reviewer runtime emits state=in-progress note="In progress N" heartbeats every 60 seconds while the reviewer child is alive.
The calling agent must keep waiting as long as a fresh In progress N heartbeat keeps arriving roughly once per minute.
Do not abort just because the review is slow, a soft timeout fired, or a stall-warning line appears, as long as the In progress N heartbeat continues.
Treat missing heartbeats, state=failed, state=completed-empty-output, and state=needs-operator-decision as escalation signals.

Step 3: Submit to Reviewer (Round 1)

Write the reviewer invocation to /tmp/milestone-review-${REVIEW_ID}.sh as a bash script:

#!/usr/bin/env bash
set -euo pipefail

If REVIEWER_CLI is codex:

codex exec \
  -m ${REVIEWER_MODEL} \
  -s read-only \
  -o /tmp/milestone-review-${REVIEW_ID}.md \
  "Review this milestone implementation. The spec, acceptance criteria, git diff, and verification output are in /tmp/milestone-${REVIEW_ID}.md.

Evaluate:
1. Correctness — Does the implementation match the milestone spec?
2. Acceptance criteria — Are all criteria met?
3. Code quality — Clean, maintainable, no obvious issues?
4. Test coverage — Are changes adequately tested?
5. Security — Any security concerns introduced?


Return exactly these sections in order:
## Summary
## Findings
### P0
### P1
### P2
### P3
## Verdict

Rules:
- Order findings from highest severity to lowest.
- Use `- None.` when a severity has no findings.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`
- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking."

Do not try to capture the Codex session ID yet. When using the helper, extract it from /tmp/milestone-review-${REVIEW_ID}.runner.out after the command completes (look for session id: <uuid>), then store it as CODEX_SESSION_ID for resume in subsequent rounds.

If REVIEWER_CLI is claude:

claude -p \
  "Review this milestone implementation using the following spec, acceptance criteria, git diff, and verification output:

$(cat /tmp/milestone-${REVIEW_ID}.md)

Evaluate:
1. Correctness — Does the implementation match the milestone spec?
2. Acceptance criteria — Are all criteria met?
3. Code quality — Clean, maintainable, no obvious issues?
4. Test coverage — Are changes adequately tested?
5. Security — Any security concerns introduced?


Return exactly these sections in order:
## Summary
## Findings
### P0
### P1
### P2
### P3
## Verdict

Rules:
- Order findings from highest severity to lowest.
- Use `- None.` when a severity has no findings.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`
- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking." \
  --model ${REVIEWER_MODEL} \
  --strict-mcp-config \
  --setting-sources user

If REVIEWER_CLI is cursor:

cursor-agent -p \
  --mode=ask \
  --model ${REVIEWER_MODEL} \
  --trust \
  --output-format json \
  "Read the file /tmp/milestone-${REVIEW_ID}.md and review this milestone implementation.

Evaluate:
1. Correctness — Does the implementation match the milestone spec?
2. Acceptance criteria — Are all criteria met?
3. Code quality — Clean, maintainable, no obvious issues?
4. Test coverage — Are changes adequately tested?
5. Security — Any security concerns introduced?


Return exactly these sections in order:
## Summary
## Findings
### P0
### P1
### P2
### P3
## Verdict

Rules:
- Order findings from highest severity to lowest.
- Use `- None.` when a severity has no findings.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`
- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking." \
  > /tmp/milestone-review-${REVIEW_ID}.json

For cursor, the command script writes raw JSON to /tmp/milestone-review-${REVIEW_ID}.json. Do not run jq extraction until after the helper or fallback execution completes. If jq is not installed, inform the user: brew install jq (macOS) or equivalent.

Run the command script through the shared helper when available:

if [ -x "$REVIEWER_RUNTIME" ]; then
  "$REVIEWER_RUNTIME" \
    --command-file /tmp/milestone-review-${REVIEW_ID}.sh \
    --stdout-file /tmp/milestone-review-${REVIEW_ID}.runner.out \
    --stderr-file /tmp/milestone-review-${REVIEW_ID}.stderr \
    --status-file /tmp/milestone-review-${REVIEW_ID}.status \
    "${HELPER_SUCCESS_FILE_ARGS[@]}"
else
  echo "Warning: reviewer runtime helper not found at $REVIEWER_RUNTIME; falling back to direct synchronous review." >&2
  bash /tmp/milestone-review-${REVIEW_ID}.sh >/tmp/milestone-review-${REVIEW_ID}.runner.out 2>/tmp/milestone-review-${REVIEW_ID}.stderr
fi

Run the helper in the foreground and watch its live stdout for state=in-progress heartbeats. If your agent environment buffers command output until exit, start the helper in the background and poll /tmp/milestone-review-${REVIEW_ID}.status separately instead of treating heartbeats as post-hoc-only data.

After the command completes:

If REVIEWER_CLI=cursor, extract the final review text:

CURSOR_SESSION_ID=$(jq -r '.session_id' /tmp/milestone-review-${REVIEW_ID}.json)
jq -r '.result' /tmp/milestone-review-${REVIEW_ID}.json > /tmp/milestone-review-${REVIEW_ID}.md

If REVIEWER_CLI=codex, extract CODEX_SESSION_ID from /tmp/milestone-review-${REVIEW_ID}.runner.out after the helper or fallback run. If the review text is only in .runner.out, move or copy the actual review body into /tmp/milestone-review-${REVIEW_ID}.md before verdict parsing.
If REVIEWER_CLI=claude, promote stdout captured by the helper or fallback runner into the markdown review file:

cp /tmp/milestone-review-${REVIEW_ID}.runner.out /tmp/milestone-review-${REVIEW_ID}.md

Fallback is allowed only when the helper is missing or not executable.

Step 4: Read Review & Check Verdict

Read /tmp/milestone-review-${REVIEW_ID}.md
If the review failed, produced empty output, or reached helper timeout, also read:
- /tmp/milestone-review-${REVIEW_ID}.stderr
- /tmp/milestone-review-${REVIEW_ID}.status
- /tmp/milestone-review-${REVIEW_ID}.runner.out
Present review to the user:

## Milestone Review — Round N (reviewer: ${REVIEWER_CLI} / ${REVIEWER_MODEL})

[Reviewer feedback]

While the reviewer is still running, keep waiting as long as fresh state=in-progress note="In progress N" heartbeats continue to appear roughly once per minute.
Check verdict:
- VERDICT: APPROVED with no P0, P1, or P2 findings -> proceed to Phase 5 Step 6 (commit & approve)
- VERDICT: APPROVED with only P3 findings -> optionally fix the P3 items if they are cheap and safe, then proceed
- VERDICT: REVISE or any P0, P1, or P2 finding -> go to Step 5
- No clear verdict but P0, P1, and P2 are all - None. -> treat as approved
- Helper state completed-empty-output -> treat as failed review attempt, surface stderr/status, fix invocation or prompt handling, then retry
- Helper state needs-operator-decision -> surface status log and decide whether to extend the timeout, abort, or retry with different helper parameters
- Max rounds (MAX_ROUNDS) reached -> present to user for manual decision (proceed or stop)

Step 5: Address Feedback & Re-verify

Address the reviewer findings in priority order (P0 -> P1 -> P2, then P3 when practical) (do NOT commit yet).
Re-run verification (lint/typecheck/tests) — all must pass.
Update /tmp/milestone-${REVIEW_ID}.md with new diff and verification output.

Summarize revisions for the user:

### Revisions (Round N)
- [Change and reason, one bullet per issue addressed]

If a revision contradicts the user's explicit requirements, skip it and note it for the user.

Step 6: Re-submit to Reviewer (Rounds 2-N)

Rewrite /tmp/milestone-review-${REVIEW_ID}.sh for the next round. The script should contain the reviewer invocation only; do not run it directly.

If REVIEWER_CLI is codex:

Resume the existing session:

codex exec resume ${CODEX_SESSION_ID} \
  -o /tmp/milestone-review-${REVIEW_ID}.md \
  "I've addressed your feedback. Updated diff and verification output are in /tmp/milestone-${REVIEW_ID}.md.

Changes made:
[List specific changes]

Re-review using the same `## Summary`, `## Findings`, and `## Verdict` structure as before.
Keep findings ordered `P0` to `P3`, use `- None.` when a severity has no findings, and only use `VERDICT: APPROVED` when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking."

If resume fails (session expired), fall back to fresh codex exec with context about prior rounds.

If REVIEWER_CLI is claude:

Fresh call with accumulated context (Claude CLI has no session resume):

claude -p \
  "You previously reviewed milestone M<N> and requested revisions.

Previous feedback summary: [key points from last review]

I've addressed your feedback. Updated diff and verification output are below.

$(cat /tmp/milestone-${REVIEW_ID}.md)

Changes made:
[List specific changes]

Re-review using the same `## Summary`, `## Findings`, and `## Verdict` structure as before.
Keep findings ordered `P0` to `P3`, use `- None.` when a severity has no findings, and only use `VERDICT: APPROVED` when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking." \
  --model ${REVIEWER_MODEL} \
  --strict-mcp-config \
  --setting-sources user

If REVIEWER_CLI is cursor:

Resume the existing session:

cursor-agent --resume ${CURSOR_SESSION_ID} -p \
  --mode=ask \
  --model ${REVIEWER_MODEL} \
  --trust \
  --output-format json \
  "I've addressed your feedback. Updated diff and verification output are in /tmp/milestone-${REVIEW_ID}.md.

Changes made:
[List specific changes]

Re-review using the same `## Summary`, `## Findings`, and `## Verdict` structure as before.
Keep findings ordered `P0` to `P3`, use `- None.` when a severity has no findings, and only use `VERDICT: APPROVED` when no `P0`, `P1`, or `P2` findings remain. `P3` findings are non-blocking." \
  > /tmp/milestone-review-${REVIEW_ID}.json

If resume fails, fall back to fresh cursor-agent -p with context about prior rounds.

Do not run jq extraction until after the helper or fallback execution completes, then extract /tmp/milestone-review-${REVIEW_ID}.md from the JSON response.

After updating /tmp/milestone-review-${REVIEW_ID}.sh, run the same helper/fallback flow from Round 1.

Return to Step 4.

Step 7: Cleanup Per Milestone

rm -f \
  /tmp/milestone-${REVIEW_ID}.md \
  /tmp/milestone-review-${REVIEW_ID}.md \
  /tmp/milestone-review-${REVIEW_ID}.json \
  /tmp/milestone-review-${REVIEW_ID}.stderr \
  /tmp/milestone-review-${REVIEW_ID}.status \
  /tmp/milestone-review-${REVIEW_ID}.runner.out \
  /tmp/milestone-review-${REVIEW_ID}.sh

If the round failed, produced empty output, or reached operator-decision timeout, keep .stderr, .status, and .runner.out until the issue is diagnosed instead of deleting them immediately.

Phase 7: Completion (REQUIRED SUB-SKILL)

After all milestones are approved and committed:

Use OpenCode's native skill tool to load: superpowers/finishing-a-development-branch
Run full test suite one final time — all must pass.
Merge the worktree branch into the parent branch:

# From the main repo (not the worktree)
git merge implement/<plan-folder-name>

Delete the worktree and its branch:

git worktree remove <worktree-path>
git branch -d implement/<plan-folder-name>

Mark plan status as completed in story-tracker.md.

Phase 8: Final Report

Present summary:

## Implementation Complete

**Plan:** <plan-folder-name>
**Milestones:** <N> completed, <N> approved
**Review rounds:** <total across all milestones>
**Branch:** implement/<plan-folder-name> (merged and deleted)

Phase 9: Telegram Completion Notification (MANDATORY)

Resolve the Telegram notifier helper from the installed OpenCode skills directory:

TELEGRAM_NOTIFY_RUNTIME=~/.config/opencode/skills/reviewer-runtime/notify-telegram.sh

On every terminal outcome for the implement-plan run (fully completed, stopped after max rounds, skipped reviewer, or failure), send a Telegram summary if the helper exists and both TELEGRAM_BOT_TOKEN and TELEGRAM_CHAT_ID are configured:

if [ -x "$TELEGRAM_NOTIFY_RUNTIME" ] && [ -n "${TELEGRAM_BOT_TOKEN:-}" ] && [ -n "${TELEGRAM_CHAT_ID:-}" ]; then
  "$TELEGRAM_NOTIFY_RUNTIME" --message "implement-plan completed for <plan-folder-name>: <status summary>"
fi