Stefano Fiorini be993429c1 feat(M2): Documentation flow, accuracy, consistency cleanup, and cross-platform shell portability

2026-05-03 20:14:44 -05:00

name	description
do-task	Execute a single user-supplied prompt end-to-end with two reviewer loops (plan review + implementation review) in OpenCode. ALWAYS invoke when the user says `/do-task`, "do this task", "do task ...", "execute this task", or "make it so". Also invoke on the hint phrase "just do ...". Do NOT invoke on "implement this" (that phrase is reserved for implement-plan).

Do Task (OpenCode)

Execute an ad-hoc user prompt end-to-end: parse → clarify → plan (with reviewer loop) → implement (TDD-first where applicable) → verify → implementation review loop → commit → optional push → notify.

This is a single-artifact sibling of create-plan + implement-plan. Unlike implement-plan, do-task operates on one persistent task-plan.md (not a full milestone plan) and defaults to the current branch (not a worktree).

Core principle: OpenCode loads skills through its native skill tool. Local skills live under ~/.config/opencode/skills/, and OpenCode can also expose shared agent skills from ~/.agents/skills/. Sub-skill invocations use OpenCode's native mechanism — not Claude's Skill tool, not Cursor's discovery mechanism.

Prerequisite Check (MANDATORY)

Required:

OpenCode CLI: opencode --version (install via your package manager or brew install opencode).
Superpowers repo: https://github.com/obra/superpowers
OpenCode Superpowers skills available at ~/.agents/skills/superpowers or ~/.config/opencode/skills/superpowers
superpowers/brainstorming
superpowers/test-driven-development
superpowers/verification-before-completion
superpowers/finishing-a-development-branch
superpowers/using-git-worktrees (only when the prompt opts in to a worktree)
Shared reviewer runtime: ~/.config/opencode/skills/reviewer-runtime/run-review.sh
Telegram notifier helper: ~/.config/opencode/skills/reviewer-runtime/notify-telegram.sh

Verify before proceeding:

opencode --version
test -f ~/.agents/skills/superpowers/brainstorming/SKILL.md || test -f ~/.config/opencode/skills/superpowers/brainstorming/SKILL.md
test -f ~/.agents/skills/superpowers/test-driven-development/SKILL.md || test -f ~/.config/opencode/skills/superpowers/test-driven-development/SKILL.md
test -f ~/.agents/skills/superpowers/verification-before-completion/SKILL.md || test -f ~/.config/opencode/skills/superpowers/verification-before-completion/SKILL.md
test -f ~/.agents/skills/superpowers/finishing-a-development-branch/SKILL.md || test -f ~/.config/opencode/skills/superpowers/finishing-a-development-branch/SKILL.md

If any required dependency is missing, stop immediately and return:

Missing dependency: [specific missing item]. Install required OpenCode Superpowers skills (https://github.com/obra/superpowers, OpenCode setup) and the reviewer-runtime helper, then retry.
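
The four `test -f` checks above can be folded into one loop. The following is a minimal sketch, not the skill's actual implementation; `SKILL_ROOTS`, `have_superpowers_skill`, and `first_missing_skill` are hypothetical names introduced here for illustration.

```shell
# Return 0 when the named Superpowers skill has a SKILL.md in either skill root.
# SKILL_ROOTS is a hypothetical colon-separated override, used here for testing.
have_superpowers_skill() {
  skill_name=$1
  roots=${SKILL_ROOTS:-"$HOME/.agents/skills:$HOME/.config/opencode/skills"}
  old_ifs=$IFS
  IFS=:
  for root in $roots; do
    if [ -f "$root/superpowers/$skill_name/SKILL.md" ]; then
      IFS=$old_ifs
      return 0
    fi
  done
  IFS=$old_ifs
  return 1
}

# Print the first missing required skill and return 1, or return 0 silently.
first_missing_skill() {
  for skill in brainstorming test-driven-development \
               verification-before-completion finishing-a-development-branch; do
    if ! have_superpowers_skill "$skill"; then
      printf 'Missing dependency: superpowers/%s\n' "$skill"
      return 1
    fi
  done
}
```

Calling `first_missing_skill` before Phase 1 gives a single place to produce the specific item for the error message.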

Required Skill Invocation Rules

Invoke relevant skills through OpenCode's native skill tool.
Announce skill usage explicitly:
- I've read the [Skill Name] skill and I'm using it to [purpose].
For skills with checklists, track checklist items explicitly in conversation.
Do NOT use Claude's Skill tool syntax or Cursor's discovery mechanism. OpenCode's skill system may expose shared files from ~/.agents/skills/, but invocation still goes through OpenCode's native skill mechanism.

Trigger Phrase Detection

Binding triggers (always invoke this skill):

/do-task
"do this task"
"do task ..."
"execute this task"
"make it so"

Hint trigger (invoke unless context clearly maps to another skill):

"just do ..."

Escape phrases (skip the Phase 2 clarifying-question loop):

--no-questions
"just do it:"
"just do this:"
"no questions:"

Excluded (do NOT trigger do-task):

"implement this" — reserved for implement-plan.

Dropped defaults (explicitly NOT binding triggers):

"work on ..."
"handle this"
"take care of ..."
"get this done"

Worktree opt-in phrases (Phase 4 takes the worktree branch):

"in a worktree"
"use a worktree"
"on an isolated branch"
"on a new branch called X"

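The phrase lists above can be sketched as plain `case` matching. This is an illustrative sketch only: the function names are hypothetical, matching is by case-insensitive substring, and checking binding triggers before the excluded phrase is an assumption the lists do not spell out.

```shell
# Classify a prompt as binding, excluded, hint, or none.
classify_prompt() {
  # Lowercase the prompt so matching is case-insensitive.
  prompt=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$prompt" in
    /do-task*|*"do this task"*|*"do task "*|*"execute this task"*|*"make it so"*)
      echo binding ;;
    *"implement this"*)
      echo excluded ;;   # reserved for implement-plan
    *"just do "*)
      echo hint ;;
    *)
      echo none ;;       # includes dropped defaults such as "handle this"
  esac
}

# Return 0 when the prompt contains an escape phrase (skip clarifying questions).
skips_questions() {
  prompt=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$prompt" in
    *--no-questions*|*"just do it:"*|*"just do this:"*|*"no questions:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Return 0 when the prompt opts in to a worktree.
wants_worktree() {
  prompt=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$prompt" in
    *"in a worktree"*|*"use a worktree"*|*"on an isolated branch"*|*"on a new branch called "*)
      return 0 ;;
    *) return 1 ;;
  esac
}
```
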
Process

Phase 1: Preflight (includes Bootstrap Superpowers Context)

Bootstrap Superpowers context — use OpenCode's native skill tool to list installed skills and confirm superpowers/brainstorming, superpowers/test-driven-development, superpowers/verification-before-completion, and superpowers/finishing-a-development-branch are discoverable. If any is missing, stop with the Prerequisite Check error message.
Verify git repo: git rev-parse --is-inside-work-tree.
Verify /ai_plan/ is present in .gitignore. If missing:
- Append /ai_plan/ to .gitignore.
- Commit that infra change immediately with message chore(gitignore): ignore ai_plan local planning artifacts.
- This infra commit is EXPLICITLY separate from the task commit in Phase 9. It may occur even when the final task ends up aborted or failed.
Announce each sub-skill before invocation using: I've read the [Skill Name] skill and I'm using it to [purpose].
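
The `.gitignore` step can be made idempotent. A minimal sketch follows, assuming an exact `/ai_plan/` line is the only form that counts as present; `ensure_ai_plan_ignored` is a hypothetical helper name.

```shell
# Append /ai_plan/ to <repo_root>/.gitignore unless an exact line already exists.
# Prints "added" or "present" so the caller knows whether the infra commit is needed.
ensure_ai_plan_ignored() {
  repo_root=${1:-.}
  ignore_file="$repo_root/.gitignore"
  if [ -f "$ignore_file" ] && grep -qxF '/ai_plan/' "$ignore_file"; then
    echo present
  else
    # If the file exists and does not end with a newline, add one before appending.
    if [ -s "$ignore_file" ] && [ -n "$(tail -c 1 "$ignore_file")" ]; then
      printf '\n' >>"$ignore_file"
    fi
    printf '/ai_plan/\n' >>"$ignore_file"
    echo added
  fi
}
```

Only when the helper prints `added` would the separate `chore(gitignore): ignore ai_plan local planning artifacts` commit be created.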

Phase 2: Parse Prompt and Question

Capture the exact user prompt verbatim.
Detect trigger phrase (see above) and record which one matched.
Detect any escape phrase. If one is present, skip clarifying questions entirely.
Apply the ask-first heuristic:
- Skip clarifying questions ONLY if ALL are true:
  - Prompt names a concrete target (file, feature, or function).
  - Prompt names a concrete outcome (what success looks like).
  - Prompt has no ambiguous scope (no "and maybe also ...").
  - All identifiers in the prompt are resolvable against the codebase.
- Otherwise, ask 1-3 clarifying questions, ONE AT A TIME, multiple-choice preferred.
- Empty prompt → ask exactly once: "what task?".
Invoke superpowers/brainstorming via OpenCode's native skill tool for any behavior-changing task — feature creation, bug fix with multiple plausible approaches, refactor, design decision. Present 2-3 approaches and recommend one before finalizing the plan. The ONLY skip conditions are the same ones that allow TDD auto-skip: pure-documentation and pure-comment-whitespace-rename. When skipping, record the skip reason in the Interpretation section of task-plan.md.

Phase 3: Configure Reviewer

If the user has already specified a reviewer CLI and model (e.g., "do task X, review with codex gpt-5.4"), use those values. If the user says "use defaults" or otherwise opts out of explicit configuration, proceed with REVIEWER_CLI=codex, REVIEWER_MODEL=gpt-5.4, and MAX_ROUNDS=10. Otherwise, ask:

Which CLI should review both the plan and the implementation?
- codex — OpenAI Codex CLI (codex exec)
- claude — Claude Code CLI (claude -p)
- cursor — Cursor Agent CLI (cursor-agent -p)
- opencode — OpenCode CLI (opencode run)
- skip — No external review, proceed with user approval only at each loop.
Which model? (only if a CLI was chosen)
- For codex: default gpt-5.4, alternatives: gpt-5.3-codex, o4-mini, o3.
- For claude: default sonnet, alternatives: opus, haiku.
- For cursor: run cursor-agent models first to see available models.
- For opencode: provider-qualified form <provider>/<model> (e.g., anthropic/claude-sonnet-4-5, openai/gpt-5.4). Run opencode models to list available models.
- Accept any model string the user provides.
Max review rounds shared across both loops? (default: 10)
- If the user does not provide a value, set MAX_ROUNDS=10.

Store REVIEWER_CLI, REVIEWER_MODEL, and MAX_ROUNDS for Phases 5 and 8.

Reviewer CLI: codex, claude, cursor, opencode, pi, or skip.

If REVIEWER_CLI=pi, verify the Pi reviewer binary before entering the review loop:

pi --version

For shorthand pi/<pi-model-name>, split only on the first slash when the prefix is exactly pi; store the complete remainder in REVIEWER_MODEL. Examples: pi/claude-opus-4-7 -> claude-opus-4-7, pi/anthropic/claude-opus-4-7 -> anthropic/claude-opus-4-7, and pi/openrouter/anthropic/claude-opus-4-7 -> openrouter/anthropic/claude-opus-4-7.
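
The first-slash rule can be expressed with a prefix strip. This is a sketch under the assumption that the shorthand arrives as a single string; `parse_pi_shorthand` is a hypothetical helper name.

```shell
# Split a reviewer spec such as "pi/anthropic/claude-opus-4-7".
# Only the exact prefix "pi" is split off; the complete remainder becomes the model.
# Returns 1 (and leaves the variables untouched) for any other spec.
parse_pi_shorthand() {
  spec=$1
  case "$spec" in
    pi/?*)
      REVIEWER_CLI=pi
      REVIEWER_MODEL=${spec#pi/}   # strip only the leading "pi/"
      ;;
    *)
      return 1
      ;;
  esac
}
```

Because `${spec#pi/}` removes only the shortest leading match, any later slashes stay in `REVIEWER_MODEL`, which matches the three examples above.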

When REVIEWER_CLI=pi, the reviewer model is configured independently from the model running this workflow. If the model/provider is unavailable, surface helper stderr/status and use pi --list-models [search] to inspect configured models.

Phase 4: Initialize Plan Workspace

OpenCode has no plan-mode concept; there is no plan-mode guard here.

Steps:

Compute slug: YYYY-MM-DD-<slug>, where <slug> is a kebab-case rendering of the task goal (lowercase, alphanumeric + hyphens only).
Compute plan folder: ai_plan/YYYY-MM-DD-<slug>/.
Resume detection: If the folder already exists, read task-plan.md:
- If Status is draft, plan-approved, implementation-in-progress, or implementation-approved: offer to resume, pick a new suffix (<slug>-v2), or abort. Default is resume.
- If Status is any terminal value (pushed, local-only, aborted-*, failed): offer a new suffix or abort. Default is new suffix.
If not resuming, create the folder and write task-plan.md from the template at templates/task-plan.md (this skill's template folder; falls back to ~/.config/opencode/skills/do-task/templates/task-plan.md when installed directly).
Fill in:
- Metadata block.
- Prompt (verbatim).
- Interpretation, Assumptions, Files, Approach, TDD Approach, Acceptance Criteria, Verification, Rollback.
- Leave Runtime State, Review History, Final Status empty (skill updates these).
Set Status: draft.
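
The slug and folder computation can be sketched as below. It assumes the slug is derived directly from the task goal text; `slugify`, `plan_dir_for`, and the `PLAN_DATE` override are hypothetical names used only for illustration.

```shell
# Lowercase the goal, collapse every run of non-alphanumerics into one hyphen,
# and trim leading/trailing hyphens.
slugify() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//'
}

# Build the dated plan folder path.
# PLAN_DATE is a hypothetical override so the result is reproducible in tests.
plan_dir_for() {
  printf 'ai_plan/%s-%s/\n' "${PLAN_DATE:-$(date +%Y-%m-%d)}" "$(slugify "$1")"
}
```

Resume detection would then check whether the directory printed by `plan_dir_for` already exists before writing a new task-plan.md.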

Worktree branch: If the prompt opts in to a worktree (see Trigger Phrase Detection), invoke superpowers/using-git-worktrees via OpenCode's native skill tool before proceeding. Otherwise continue on the current branch.

Phase 5: Plan Review Loop

If REVIEWER_CLI=skip, present task-plan.md to the user and proceed only after explicit user approval.

Otherwise, invoke the Review Loop (Shared Subroutine) with:

REVIEW_KIND     = plan
REVIEW_ID       = $(uuidgen | tr '[:upper:]' '[:lower:]' | head -c 8)
PAYLOAD_PATH    = /tmp/do-task-plan-${REVIEW_ID}.md
PROMPT_TEMPLATE = PLAN_REVIEW_PROMPT  (see below)
SESSION_ID_VAR  = CODEX_PLAN_SESSION_ID | CURSOR_PLAN_SESSION_ID | OPENCODE_PLAN_SESSION_ID

Payload is the current task-plan.md with the Runtime State and Review History blocks stripped before writing to PAYLOAD_PATH. Those two blocks contain reviewer session IDs and scan outcomes that must never be sent back to any reviewer CLI. Reviewers only need the Prompt, Interpretation, Assumptions, Files, Approach, TDD Approach, Acceptance Criteria, Verification, Rollback, and Metadata sections.
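
Stripping the two blocks can be sketched with `awk`. This assumes each block starts at a level-2 heading named exactly `## Runtime State` or `## Review History` and runs until the next level-2 heading; the template itself is not shown here, so treat those heading names as assumptions. `strip_private_sections` is a hypothetical helper name.

```shell
# Print task-plan.md without the Runtime State and Review History sections.
# A section is everything from its "## " heading up to the next "## " heading.
strip_private_sections() {
  awk '
    /^## / { skip = ($0 == "## Runtime State" || $0 == "## Review History") }
    !skip
  ' "$1"
}
```

Typical use would be `strip_private_sections ai_plan/<folder>/task-plan.md > "$PAYLOAD_PATH"`. A `## ` line inside a fenced code block would also be treated as a heading, so a real implementation may need to track fences.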

PLAN_REVIEW_PROMPT:

Review this task plan for completeness, correctness, and risk. Focus on:
1. Does the plan match the user's prompt?
2. Are all assumptions surfaced?
3. Are acceptance criteria testable?
4. Is the TDD approach appropriate per the TDD Approach section?
5. Are there missing files, risks, or security concerns?

Return exactly these sections in order:
## Summary
## Findings
### P0
### P1
### P2
### P3
## Verdict

Rules:
- Order findings from highest severity to lowest.
- Use `- None.` when a severity has no findings.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`.
- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking.

On APPROVED:

Set Status: plan-approved.
Append APPROVED row to Review History.
Proceed to Phase 6.

On MAX_ROUNDS:

Set Status: aborted-plan-review.
Send Telegram summary before stopping.
Ask the user whether to override and proceed, restart, or abort.

Phase 6: Execute (TDD-first where applicable)

Native orchestration — do not invoke superpowers/executing-plans.

Set Status: implementation-in-progress.
For every behavior-changing file edit:
- Invoke superpowers/test-driven-development via OpenCode's native skill tool.
- Write the failing test first. Run it. Confirm it fails.
- Implement the minimal code to make it pass. Run the test. Confirm green.
- Do NOT commit yet — a single task commit happens in Phase 9.
Auto-skip of TDD is permitted ONLY for tasks classified in task-plan.md TDD Approach as:
- pure-documentation
- pure-comment-whitespace-rename
Any other skip (including pure-config-addition) requires explicit user approval recorded in task-plan.md with an ISO-8601 timestamp.
Update task-plan.md after each logical step: add notes to Approach, check off Acceptance Criteria items as they complete.

Phase 7: Verification Gate

Invoke superpowers/verification-before-completion via OpenCode's native skill tool.

Run the commands listed in the Verification section of task-plan.md:

Lint (changed files first).
Typecheck.
Tests (targeted first, then broader suite if quick).

All must pass. If a command fails:

Fix the issue.
Re-run that command.
Increment verification_attempts in Runtime State.

If verification_attempts exceeds 3 without green:

Set Status: aborted-verification.
Send Telegram summary.
Ask the user whether to retry, override, or abort.

Phase 8: Implementation Review Loop

If REVIEWER_CLI=skip, present a diff + verification summary to the user and proceed only after explicit user approval.

Otherwise, invoke the Review Loop (Shared Subroutine) with:

REVIEW_KIND     = implementation
REVIEW_ID       = $(uuidgen | tr '[:upper:]' '[:lower:]' | head -c 8)   # distinct from plan-review ID
PAYLOAD_PATH    = /tmp/do-task-implementation-${REVIEW_ID}.md
PROMPT_TEMPLATE = IMPL_REVIEW_PROMPT  (see below)
SESSION_ID_VAR  = CODEX_IMPL_SESSION_ID | CURSOR_IMPL_SESSION_ID | OPENCODE_IMPL_SESSION_ID

Payload contents (assembled by the skill):

# Implementation Review: [Short Title]

## Task Plan (the plan that was approved)
<embed approved task-plan.md, excluding Runtime State block>

## Changes Made (git diff)
<output of: `git diff` for unstaged + `git diff --staged` for staged>

## Verification Output
### Lint
<lint output>
### Typecheck
<typecheck output>
### Tests
<test output, pass/fail counts>

IMPL_REVIEW_PROMPT:

Review this implementation against the task plan. Focus on:
1. Correctness — Does the diff satisfy the Acceptance Criteria?
2. Code quality — Clean, maintainable, no obvious issues?
3. Test coverage — Are behavior changes adequately tested (per the plan's TDD Approach)?
4. Security — Any security concerns introduced?
5. Regressions — Does the diff risk breaking unrelated code?

Return exactly these sections in order:
## Summary
## Findings
### P0
### P1
### P2
### P3
## Verdict

Rules:
- Order findings from highest severity to lowest.
- Use `- None.` when a severity has no findings.
- `P0` = total blocker, `P1` = major risk, `P2` = must-fix before approval, `P3` = cosmetic / nice to have.
- End with exactly one verdict line: `VERDICT: APPROVED` or `VERDICT: REVISE`.
- `VERDICT: APPROVED` is allowed only when there are no `P0`, `P1`, or `P2` findings. `P3` findings are non-blocking.

On APPROVED:

Set Status: implementation-approved.
Append APPROVED row to Review History.
Proceed to Phase 9.

On MAX_ROUNDS:

Set Status: aborted-impl-review.
Send Telegram summary.
Ask the user whether to override and commit anyway, restart, or abort.

Phase 9: Commit + Push Ask

Invoke superpowers/finishing-a-development-branch via OpenCode's native skill tool.

Stage all changed files explicitly (avoid git add -A).
Single commit with message derived from the task goal:
- Format: <type>(<scope>): <short description>
- Example: feat(auth): add session token rotation
Do NOT push. Update Status: local-only.
Ask the user: "Push to remote? (yes / no)"
- On explicit yes → push, then set Status: pushed.
- Any other response → leave Status: local-only.
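
A quick shape check for the commit subject can catch a malformed message before committing. This is only a sketch: the accepted character sets for type and scope are assumptions, and `is_task_commit_subject` is a hypothetical helper name.

```shell
# Return 0 when the subject looks like "<type>(<scope>): <short description>".
is_task_commit_subject() {
  printf '%s\n' "$1" \
    | grep -Eq '^[a-z]+\([a-z0-9._-]+\): [^[:space:]].*$'
}
```

For example, `is_task_commit_subject "feat(auth): add session token rotation"` succeeds, while a subject without a scope does not.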

Phase 10: Telegram Notification + Finalize

Resolve the notifier helper:

TELEGRAM_NOTIFY_RUNTIME=~/.config/opencode/skills/reviewer-runtime/notify-telegram.sh

On every terminal outcome (pushed, local-only, aborted-*, failed), send a Telegram summary if both TELEGRAM_BOT_TOKEN and TELEGRAM_CHAT_ID are set:

if [ -x "$TELEGRAM_NOTIFY_RUNTIME" ] && [ -n "${TELEGRAM_BOT_TOKEN:-}" ] && [ -n "${TELEGRAM_CHAT_ID:-}" ]; then
  "$TELEGRAM_NOTIFY_RUNTIME" --message "do-task <slug>: <status summary>"
fi

Rules:

Telegram is the only supported notification path.
Notification failures are non-blocking but must be surfaced to the user.
Before stopping for any user interaction, approval, or manual decision, send a Telegram summary first if configured.
If Telegram is not configured, state that no Telegram notification was sent.

Fill in Final Status in task-plan.md (include commit hash if any). Do NOT delete the plan folder — it stays as a record.

Review Loop (Shared Subroutine)

This subroutine is invoked twice per do-task run: once in Phase 5 (REVIEW_KIND=plan) and once in Phase 8 (REVIEW_KIND=implementation). Separate session IDs are used for each loop so reviewer context never leaks across loops.

Subroutine Inputs

| Variable | Purpose |
| --- | --- |
| `REVIEW_KIND` | `plan` or `implementation` |
| `REVIEW_ID` | 8-char hex (from `uuidgen`); reused across rounds of the same loop |
| `PAYLOAD_PATH` | `/tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md` |
| `PROMPT_TEMPLATE` | `PLAN_REVIEW_PROMPT` or `IMPL_REVIEW_PROMPT` |
| `REVIEWER_CLI` | `codex` \| `claude` \| `cursor` \| `opencode` \| `pi` |
| `REVIEWER_MODEL` | Model name |
| `MAX_ROUNDS` | Default 10 |
| `SESSION_ID_VAR` | `CODEX_PLAN_SESSION_ID` \| `CODEX_IMPL_SESSION_ID` \| `CURSOR_PLAN_SESSION_ID` \| `CURSOR_IMPL_SESSION_ID` \| `OPENCODE_PLAN_SESSION_ID` \| `OPENCODE_IMPL_SESSION_ID` |

Temp artifact paths (per loop):

/tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md — payload
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md — normalized review text
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json — raw reviewer JSON (written only for cursor, and for opencode when it runs with --format json)
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.stderr
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.status
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out
/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.sh

Resolve the shared helper:

REVIEWER_RUNTIME=~/.config/opencode/skills/reviewer-runtime/run-review.sh

Set helper success-artifact args before writing the command script:

HELPER_SUCCESS_FILE_ARGS=()
case "$REVIEWER_CLI" in
  codex)
    HELPER_SUCCESS_FILE_ARGS+=(--success-file /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md)
    ;;
  cursor)
    HELPER_SUCCESS_FILE_ARGS+=(--success-file /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json)
    ;;
esac

Step 1: Write Payload

Write the full payload for this round to PAYLOAD_PATH.

Step 1a: Secret Scan (per-payload, no caching)

BEFORE sending the payload to any reviewer CLI, scan it for secrets. This scan runs EVERY round — no results are cached. Rationale: Phase 8 payloads include newly-introduced diff content that earlier rounds never saw.

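Run the secret scan with all of the regexes below. They are unanchored, so a match anywhere on a line counts. Write them to a temporary pattern file and run grep -En with -f against the payload file: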

SECRET_REGEX_FILE=$(mktemp)
cat >"$SECRET_REGEX_FILE" <<'EOF'
AKIA[0-9A-Z]{16}
"type"[[:space:]]*:[[:space:]]*"service_account"
(ghp|gho|ghs|ghu|ghr)_[A-Za-z0-9]{36,}
xox[abpsr]-[0-9]+-[0-9]+-[0-9]+-[A-Za-z0-9]{24,}
xox[abpsr]-[A-Za-z0-9]{10,48}
sk-(proj-)?[A-Za-z0-9_-]{20,}
sk-ant-(api|admin)[0-9]+-[A-Za-z0-9_-]{20,}
-----BEGIN ([A-Z0-9]+ )*PRIVATE KEY-----
(TOKEN|SECRET|PASSWORD|API_?KEY|ACCESS_?KEY)[[:space:]]*=[[:space:]]*["']?[A-Za-z0-9+/=_-]{8,}
eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+
EOF

SCAN_MATCHES=$(grep -Ensf "$SECRET_REGEX_FILE" "$PAYLOAD_PATH" || true)
rm -f "$SECRET_REGEX_FILE"

If SCAN_MATCHES is non-empty:

Redact the matched text before surfacing — never echo the raw secret to the user, chat log, terminal scrollback, or any persistent file. Replace each matched substring with a fixed token that preserves only the fact of a match: [REDACTED:<pattern-label>:<match-length>-chars]. Example: a matched AWS key becomes [REDACTED:aws-access-key:20-chars]. Keep the file path and line number; they are useful for the user and not secret.

Present the redacted match summary to the user using this exact wording:

SECRET-SCAN MATCH in outbound reviewer payload (loop: ${REVIEW_KIND}, round: N):
<file>:<line>: [REDACTED:<pattern-label>:<match-length>-chars]
...
Proceed with sending this payload to ${REVIEWER_CLI}? (yes / no / redact)

Pattern labels: aws-access-key, gcp-service-account, github-token, slack-token, openai-key, anthropic-key, pem-private-key, dotenv-style, jwt.

Wait for user response.
On yes: record last_scan_outcome_${REVIEW_KIND}=user-approved-with-matches in Runtime State, and proceed.
On redact: ask the user to supply redactions, apply them to PAYLOAD_PATH, re-scan (this step), record last_scan_outcome_${REVIEW_KIND}=redacted-and-approved.
On no: stop the loop, set Status: failed, send Telegram, return to the user.

If SCAN_MATCHES is empty, record last_scan_outcome_${REVIEW_KIND}=clean and proceed.
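
The redaction rule can be sketched per pattern: run `grep -Eno` so only the matched substring is captured, then report its length instead of its text. This is an illustrative sketch, not the skill's implementation; `scan_redacted` is a hypothetical helper, and the key in the usage note is AWS's published documentation example, not a real credential.

```shell
# Print "<file>:<line>: [REDACTED:<label>:<n>-chars]" for every match of one pattern.
# The raw match is only ever held in a shell variable and is never printed.
scan_redacted() {
  file=$1
  label=$2
  pattern=$3
  # -o prints only the matched text, -n prefixes it with "<line>:".
  # -e keeps patterns that start with "-" from being parsed as options.
  grep -Eno -e "$pattern" "$file" | while IFS=: read -r line match; do
    printf '%s:%s: [REDACTED:%s:%s-chars]\n' "$file" "$line" "$label" "${#match}"
  done
}
```

For a payload whose second line contains `AKIAIOSFODNN7EXAMPLE`, calling `scan_redacted "$PAYLOAD_PATH" aws-access-key 'AKIA[0-9A-Z]{16}'` reports line 2 with a 20-character redaction token. Running it once per pattern label keeps the label attached to each match, which the combined `-f` scan cannot do on its own.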

Step 2: Generate Reviewer Command Script

Write the reviewer invocation to /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.sh as a bash script starting with:

#!/usr/bin/env bash
set -euo pipefail

If REVIEWER_CLI is pi:

Fresh call every round (Pi reviewer calls do not use session resume):

pi --no-session --no-skills --no-prompt-templates --no-extensions --no-context-files \
  --model "$REVIEWER_MODEL" \
  --tools read,grep,find,ls \
  -p "Read the file /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md and review. Return exactly the required ## Summary, ## Findings, and ## Verdict structure."

If REVIEWER_CLI is codex:

Round 1 — fresh codex exec:

codex exec \
  -m ${REVIEWER_MODEL} \
  -s read-only \
  -o /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md \
  "Review the ${REVIEW_KIND} payload in /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md.

${PROMPT_TEMPLATE}"

Do not capture the Codex session ID yet. After Round 1 completes, extract it from /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out (look for session id: <uuid>) and persist it to Runtime State under ${SESSION_ID_VAR}.
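
Extracting the session ID from `.runner.out` can be sketched as follows, assuming the line has the form `session id: <uuid>` with a standard 36-character hexadecimal UUID. `extract_codex_session_id` is a hypothetical helper name.

```shell
# Print the first "session id: <uuid>" value found in the runner output, if any.
extract_codex_session_id() {
  grep -Eio 'session id: [0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' "$1" \
    | head -n 1 \
    | sed -E 's/^.*: //'
}
```

An empty result means no ID was found, in which case later rounds would use the fresh `codex exec` fallback rather than resume.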

Round 2 and later — resume session:

codex exec resume ${SESSION_ID_VAR_VALUE} \
  -o /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md \
  "I've revised based on your feedback. Updated payload is in /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md.

Changes made:
[List specific changes]

Re-review using the same ## Summary, ## Findings, and ## Verdict structure as before.
Keep findings ordered P0 to P3, use '- None.' when a severity has no findings, and only use VERDICT: APPROVED when no P0, P1, or P2 findings remain."

If resume fails, fall back to fresh codex exec with prior-round context.

If REVIEWER_CLI is claude:

Fresh call every round (this workflow does not use session resume for Claude reviews):

claude -p \
  "${ROUND_PREFIX}Review the following ${REVIEW_KIND} payload.

$(cat /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md)

${PROMPT_TEMPLATE}" \
  --model ${REVIEWER_MODEL} \
  --strict-mcp-config \
  --setting-sources user

Where ${ROUND_PREFIX} is empty for Round 1 and "You previously reviewed this ${REVIEW_KIND} and requested revisions. Previous feedback summary: [key points]. " for subsequent rounds.

If REVIEWER_CLI is cursor:

Round 1:

cursor-agent -p \
  --mode=ask \
  --model ${REVIEWER_MODEL} \
  --trust \
  --output-format json \
  "Read the file /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md and review.

${PROMPT_TEMPLATE}" \
  > /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json

Round 2 and later — resume:

cursor-agent --resume ${SESSION_ID_VAR_VALUE} -p \
  --mode=ask \
  --model ${REVIEWER_MODEL} \
  --trust \
  --output-format json \
  "I've revised based on your feedback. Updated payload is in /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md.

Changes made:
[List specific changes]

Re-review using the same ## Summary, ## Findings, and ## Verdict structure as before." \
  > /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json

If resume fails, fall back to fresh cursor-agent -p.

After the command completes, extract the session id and review text:

CURSOR_SID=$(jq -r '.session_id' /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json)
jq -r '.result' /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json \
  > /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md

Persist CURSOR_SID to Runtime State under ${SESSION_ID_VAR} on Round 1.

If REVIEWER_CLI is opencode:

OpenCode does not expose a dedicated read-only flag at the CLI level; use the built-in plan primary agent (--agent plan) for review, which is read-oriented and does not modify files. Session resume is supported via -s <session-id>, but the most reliable pattern for non-interactive review is a fresh call each round (as with claude), because opencode's session lifecycle and ID capture are less standardized than those of codex and cursor for headless runs. Skills MAY opt in to session resume once they have verified that the installed opencode version exposes a stable session id in --format json output.

Round 1 (preferred, fresh call):

opencode run \
  -m ${REVIEWER_MODEL} \
  --agent plan \
  --format json \
  "Read the file /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md and review.

${PROMPT_TEMPLATE}" \
  > /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json

Round 2 and later (fresh-call fallback path — recommended default):

opencode run \
  -m ${REVIEWER_MODEL} \
  --agent plan \
  --format json \
  "You previously reviewed this ${REVIEW_KIND} and requested revisions.

Previous feedback summary: [key points from last review]

I've revised. Updated payload is below.

$(cat /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md)

Changes made:
[List specific changes]

Re-review using the same ## Summary, ## Findings, and ## Verdict structure as before." \
  > /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json

Optional session-resume path (only if the installed opencode reliably emits a session id in --format json output and accepts it back via -s):

# Round 2+ with resume
opencode run \
  -s ${SESSION_ID_VAR_VALUE} \
  -m ${REVIEWER_MODEL} \
  --agent plan \
  --format json \
  "I've revised. Updated payload is in /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md.

Changes made:
[List specific changes]

Re-review using the same ## Summary, ## Findings, and ## Verdict structure as before." \
  > /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json

Extract the review body (the JSON stream emits events; the final assistant message contains the review text):

jq -r '.[] | select(.type == "message" and .role == "assistant") | .content' \
  /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json \
  > /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md \
  || cp /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json \
        /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md

If the JSON parse falls through, promote the raw JSON file as the review output and surface a warning to the user. On any opencode CLI or JSON parsing failure, treat this loop round as completed-empty-output and follow the helper-failure escalation in Step 6.

Step 3: Run via `run-review.sh`

Run the command script through the shared helper when available:

if [ -x "$REVIEWER_RUNTIME" ]; then
  "$REVIEWER_RUNTIME" \
    --command-file /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.sh \
    --stdout-file /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out \
    --stderr-file /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.stderr \
    --status-file /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.status \
    "${HELPER_SUCCESS_FILE_ARGS[@]}"
else
  echo "Warning: reviewer runtime helper not found at $REVIEWER_RUNTIME; falling back to direct synchronous review." >&2
  bash /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.sh \
    >/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out \
    2>/tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.stderr
fi

Run the helper in the foreground and watch live stdout for state=in-progress heartbeats. If your agent environment buffers command output until exit, start the helper in the background and poll the .status file instead of treating heartbeats as post-hoc-only data.

Step 4: Promote Reviewer Output + Capture Session ID

After the command completes:

cursor: already promoted in Step 2 via jq -r '.result' .... Also capture session_id if first round.
codex: extract CODEX_SESSION_ID from /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out after the helper or fallback run. If the review text lives only in .runner.out, cp it into the .md file:
```
cp /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out \
   /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md
```

claude or pi: promote .runner.out into the .md file:

cp /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out \
   /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md

opencode: already promoted in Step 2 via jq on the JSON stream. If opt-in session-resume is active and the JSON includes a stable session id, capture it and persist to ${SESSION_ID_VAR}.

On Round 1, persist the captured session ID (if any) into task-plan.md's Runtime State under ${SESSION_ID_VAR}.

Step 5: Parse Verdict + Update Review History

Read /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md.
Append one row to task-plan.md Review History:
- Timestamp (ISO-8601 UTC).
- Loop (plan or implementation).
- Round number.
- Verdict (APPROVED or REVISE).
- Summary (first line of the ## Summary section).
Increment plan_review_round or implementation_review_round in Runtime State.

Step 6: Branch APPROVED / REVISE / MAX_ROUNDS

Verdict rules:

VERDICT: APPROVED with no P0, P1, or P2 findings → exit the subroutine with APPROVED.
VERDICT: APPROVED with only P3 findings → optionally fix the P3 items if cheap and safe, then exit with APPROVED.
VERDICT: REVISE or any P0, P1, or P2 finding → go to revision (see below), then return to Step 1 for the next round.
No clear verdict but P0, P1, and P2 are all - None. → treat as APPROVED.
Helper state completed-empty-output → treat as failed review attempt, surface .stderr/.status, fix invocation or prompt handling, then retry.
Helper state needs-operator-decision → surface status log and decide whether to extend the timeout, abort, or retry with different helper parameters.
Round counter ≥ MAX_ROUNDS → exit the subroutine with MAX_ROUNDS. Caller decides next action per Phase 5 or Phase 8.
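
The review-text part of the verdict rules above can be sketched as one function. It assumes the reviewer followed the required layout (plain `### P0` to `### P3` headings, findings as `- ` bullets, a bare `VERDICT:` line); `review_outcome` and the `UNCLEAR` result are hypothetical names, and helper states and the round counter are handled elsewhere.

```shell
# Print APPROVED, REVISE, or UNCLEAR for a normalized review file.
review_outcome() {
  review_file=$1
  # Last well-formed verdict line, if any.
  verdict=$(grep -E '^VERDICT: (APPROVED|REVISE)[[:space:]]*$' "$review_file" \
    | tail -n 1 | sed -E 's/^VERDICT: ([A-Z]+).*$/\1/')
  # counts = "<blocking findings in P0-P2> <P0-P2 sections marked '- None.'>"
  counts=$(awk '
    /^### P[0-3]/ { sev = substr($0, 5, 2); next }
    /^## /        { sev = "" }
    sev ~ /^P[0-2]$/ && $0 == "- None." { none++; next }
    sev ~ /^P[0-2]$/ && /^- /           { blocking++ }
    END { print blocking + 0, none + 0 }
  ' "$review_file")
  blocking=${counts% *}
  none=${counts#* }
  if [ "$verdict" = REVISE ] || [ "$blocking" -gt 0 ]; then
    echo REVISE        # any P0/P1/P2 finding forces a revision
  elif [ "$verdict" = APPROVED ]; then
    echo APPROVED      # P3 findings are non-blocking
  elif [ "$none" -eq 3 ]; then
    echo APPROVED      # no clear verdict, but P0-P2 are all "- None."
  else
    echo UNCLEAR
  fi
}
```
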

Revision: The caller (Phase 5 for plan, Phase 6/7 for implementation) applies findings in priority order (P0 → P1 → P2 → P3). For implementation review revisions, Phase 7 verification must be re-run after every revision before returning to Step 1.

Step 7: Liveness Contract (during Step 3)

The shared reviewer runtime emits state=in-progress note="In progress N" heartbeats every 60 seconds while the reviewer child is alive.
Keep waiting as long as a fresh In progress N heartbeat keeps arriving roughly once per minute.
Do not abort just because the review is slow, a soft timeout fired, or a stall-warning line appears, as long as the In progress N heartbeat continues.
Treat missing heartbeats, state=failed, state=completed-empty-output, and state=needs-operator-decision as escalation signals.
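
When polling rather than watching live output, the escalation check can be sketched as below. It assumes the `.status` file carries the same `state=<name>` tokens as the heartbeat lines, which is not confirmed here; `last_helper_state` and `needs_escalation` are hypothetical helper names. Missing heartbeats are time-based and are not covered by this sketch.

```shell
# Print the most recent state recorded in a helper status/log file.
last_helper_state() {
  grep -Eo 'state=[a-z-]+' "$1" 2>/dev/null | tail -n 1 | cut -d= -f2
}

# Return 0 when the most recent state is one of the escalation signals.
needs_escalation() {
  case "$(last_helper_state "$1")" in
    failed|completed-empty-output|needs-operator-decision) return 0 ;;
    *) return 1 ;;
  esac
}
```
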

Step 8: Cleanup (on successful round exit)

rm -f /tmp/do-task-${REVIEW_KIND}-${REVIEW_ID}.md \
      /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.md \
      /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.json \
      /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.stderr \
      /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.status \
      /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.runner.out \
      /tmp/do-task-${REVIEW_KIND}-review-${REVIEW_ID}.sh

If the round failed, produced empty output, or reached operator-decision timeout, KEEP .stderr, .status, and .runner.out until the issue is diagnosed instead of deleting them.

Resume Semantics

Detect existing plan folder by slug at Phase 4.
Read task-plan.md → Status.

Decide next action:

| Status | Action |
| --- | --- |
| `draft` | Resume at Phase 5 (plan review) |
| `plan-approved` | Resume at Phase 6 (execute) |
| `implementation-in-progress` | Resume at Phase 6 (continue execute) |
| `implementation-approved` | Resume at Phase 9 (commit + push ask) |
| `pushed` \| `local-only` | Ask user: new suffix, abort, or replay for reference only |
| `aborted-*` \| `failed` | Offer new suffix or full restart |

When resuming, read Runtime State for CODEX_PLAN_SESSION_ID, CODEX_IMPL_SESSION_ID, CURSOR_PLAN_SESSION_ID, CURSOR_IMPL_SESSION_ID, OPENCODE_PLAN_SESSION_ID, OPENCODE_IMPL_SESSION_ID, and the round counters. If a session ID is populated, use it for the first revision round in that loop (Round 2) via codex exec resume, cursor-agent --resume, or opencode run -s <id> as applicable.
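The status table above maps naturally onto a case statement. The following is a sketch, not the skill's actual dispatch code: `resume_action` and its output labels (`phase-5`, `ask-user`, and so on) are hypothetical names chosen for illustration.

```shell
# Hypothetical helper: map a task-plan.md Status value to the next action.
# Output labels are illustrative; unknown statuses return non-zero.
resume_action() {
  case "$1" in
    draft)                      echo "phase-5" ;;       # plan review
    plan-approved)              echo "phase-6" ;;       # execute
    implementation-in-progress) echo "phase-6" ;;       # continue execute
    implementation-approved)    echo "phase-9" ;;       # commit + push ask
    pushed|local-only)          echo "ask-user" ;;      # new suffix, abort, or replay
    aborted-*|failed)           echo "offer-restart" ;; # new suffix or full restart
    *)                          echo "unknown-status"; return 1 ;;
  esac
}
```

Returning non-zero for an unrecognized status keeps the caller from silently resuming at the wrong phase.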

Tracker Discipline (MANDATORY)

ALWAYS update task-plan.md before/after each phase transition. NEVER proceed with stale state.

Before starting any phase:

Update Status if it transitions.
Update last_phase_entered in Runtime State.

After completing any phase:

Update Status if it transitions.
Append notes to the relevant section of task-plan.md.

Review History is append-only.
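A tracker update can be sketched as an in-place field rewrite. This example assumes, purely for illustration, that task-plan.md stores each field on its own line as `key: value` (for example `Status: draft`); the real file layout may differ, and `set_plan_field` is a hypothetical helper name.

```shell
# Hypothetical helper: replace the value of a "key: value" line in a file.
# Assumes one field per line starting at column 1; fails without touching
# the file when the key is not present.
set_plan_field() {
  file=$1
  key=$2
  value=$3
  tmp=$(mktemp) || return 1
  awk -v k="$key" -v v="$value" '
    index($0, k ":") == 1 { print k ": " v; found = 1; next }
    { print }
    END { if (!found) exit 2 }
  ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Write back through the original path so file permissions are preserved.
  cat "$tmp" > "$file" && rm -f "$tmp"
}
```

Under that assumption, entering Phase 5 would be `set_plan_field task-plan.md last_phase_entered 5`, with a separate call for Status when it transitions. Using a temp file instead of `sed -i` avoids the flag differences between GNU and BSD sed.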

Execution Workflow Rules

Current branch is the default; worktree is opt-in only.
Do NOT push without explicit "yes".
Secret scan runs per-payload, no caching — every round, including revisions.
Review loops use MAX_ROUNDS=10 by default; the same limit applies to both the plan review loop and the implementation review loop.
The task commit is a single commit created in Phase 9; interim WIP commits are NOT created.
The .gitignore infra commit in Phase 1 is explicitly separate from the task commit and is allowed even on abort.

Verification Checklist

ai_plan/ exists and /ai_plan/ is in .gitignore
task-plan.md created under ai_plan/YYYY-MM-DD-<slug>/
Reviewer CLI + model + MAX_ROUNDS configured (or skip)
Secret scan ran on every outbound reviewer payload
Plan review completed (APPROVED, MAX_ROUNDS handled, or skipped)
Phase 6 executed TDD-first for all behavior-changing steps (or documented skip)
Phase 7 verification green before Phase 8
Implementation review completed (APPROVED, MAX_ROUNDS handled, or skipped)
Single task commit created locally, no push without explicit yes
Telegram notification attempted if configured
task-plan.md Final Status filled in
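The first checklist item is easy to verify mechanically. The following sketch uses a hypothetical helper name, `check_ai_plan_ignored`, and assumes the ignore entry appears as an exact `/ai_plan/` line in the repository's top-level .gitignore.

```shell
# Hypothetical helper: confirm ai_plan/ exists and is ignored.
# Takes the repository root as an optional argument (default: current dir).
check_ai_plan_ignored() {
  root=${1:-.}
  if [ ! -d "$root/ai_plan" ]; then
    echo "missing ai_plan/"
    return 1
  fi
  # -x matches the whole line, -F treats the pattern as a fixed string.
  if ! grep -qxF '/ai_plan/' "$root/.gitignore" 2>/dev/null; then
    echo "/ai_plan/ not in .gitignore"
    return 1
  fi
  echo ok
}
```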

Variant Hardening Notes — OpenCode

Must use OpenCode's native skill tool for sub-skill invocation. Do NOT use Claude's Skill tool syntax. OpenCode may load shared skill files from ~/.agents/skills/, but invocation is still OpenCode-native.
Phase 1 includes a Bootstrap Superpowers Context step that lists installed skills and confirms superpowers/brainstorming, superpowers/test-driven-development, superpowers/verification-before-completion, and superpowers/finishing-a-development-branch are discoverable before any other phase runs.
Helper paths are ~/.config/opencode/skills/reviewer-runtime/{run-review.sh,notify-telegram.sh}.
OpenCode reviewer CLI branch (when REVIEWER_CLI=opencode):
- Binary: opencode. Non-interactive: opencode run "<message>".
- Model: -m <provider>/<model> (e.g., openai/gpt-5.4, anthropic/claude-sonnet-4-5).
- Read-only posture: --agent plan (uses OpenCode's built-in plan primary agent; no explicit --read-only flag exists).
- Output: --format json for structured output. Extraction uses jq against the JSON event stream.
- Session resume: -s <session-id> or --continue. A fresh call each round is the recommended default, since session ID capture for headless runs is less standardized than with codex or cursor.
No plan-mode guard (OpenCode has no plan-mode concept).
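The reviewer flags listed above can be assembled as follows. This is a sketch under stated assumptions: `build_opencode_reviewer_cmd` is a hypothetical helper, it only prints the argument list (one argument per line) rather than running opencode, and the flag order shown is one plausible arrangement of the flags named in this section.

```shell
# Hypothetical helper: print the argv for a read-only opencode reviewer call.
# Rejects model strings that are not provider-qualified (<provider>/<model>).
build_opencode_reviewer_cmd() {
  model=$1
  message=$2
  case "$model" in
    ?*/?*) ;;  # provider-qualified, e.g. openai/gpt-5.4
    *)
      echo "model must be <provider>/<model>: $model" >&2
      return 1 ;;
  esac
  # --agent plan gives the read-only posture; --format json gives
  # structured output for later extraction with jq.
  printf '%s\n' opencode run --agent plan -m "$model" --format json "$message"
}
```

Validating the model string before the call catches the unqualified-model mistake (gpt-5.4 instead of openai/gpt-5.4) early, and hard-coding `--agent plan` means a reviewer call cannot fall back to the default build agent.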

Common Mistakes

Skipping the Bootstrap Superpowers Context step in Phase 1 (breaks native skill discovery).
Using Claude Skill tool syntax, or treating shared ~/.agents/skills/ files as anything other than OpenCode-native skill entries.
Forgetting to set --agent plan on opencode reviewer calls (the call would then use the default build agent, which can write files).
Asking multiple clarifying questions in a single message.
Skipping the per-payload secret scan because "the previous round was clean".
Pushing the task commit without explicit user approval.
Using a non-provider-qualified model string for opencode (e.g., gpt-5.4 instead of openai/gpt-5.4).

Red Flags — Stop and Correct

You are invoking sub-skills via Claude's Skill tool or Codex native-discovery paths instead of OpenCode's native skill tool.
You are running an opencode reviewer call without --agent plan.
You did not announce which skill you invoked and why.
You are proceeding to implementation review with failing lint/typecheck/tests.
You are echoing raw secret-scan matches to the user or logs.
You are pushing without explicit user approval.
